Iterate 2.A — Branch-probe of sub_821CB030 → sub_821CBA08 → sub_821CC3F8 → sub_821C4EB0
Date: 2026-05-21
Mode: WRITE (investigation + branch-probe configuration; no engine LOC change).
Sources: xenia-canary/.../xenia_canary_i2a.exe, xenia-rs/target/release/xrs-i2a,
xenia-rs/sylpheed.db, prior Step 2 report at audit-runs/review-a-step2-natural-trigger/step2-report.md.
TL;DR
The plan's framing — "find the conditional branch inside sub_821CB030 → … → sub_821C4EB0
where canary takes the bl NtSetEvent arm and ours takes the bl NtReleaseSemaphore arm" — does
not match the actual control-flow lattice. Across the entire call-graph reachable from those
four functions (depth ≤ 6, 736 functions scanned, each scanned for conditional branches whose
two arms reach the two wrappers within ≤ 4 BBs), exactly one candidate branch exists:
PC=0x82452E0C inside sub_82452DC0, with
- taken-eq (r3==0) arm → reaches bl 0x824AB158 (NtReleaseSemaphore) via sub_8245B000 / sub_82450218;
- not-taken (r3!=0) arm → reaches bl 0x824AA2F0 (NtSetEvent) via sub_82452AB8 / sub_8245FEB8.
The branch-probe (canary + ours) was run at that PC plus six context PCs. Canary's branch at
0x82452E0C ALWAYS takes the eq arm (r3=0, 44 fires across tids 6/17/18/26, never the
NtSetEvent arm). Ours's JIT only emits BB-entry PCs in the probe, so 0x82452E0C did not fire
directly, but sub_82452DC0 recursion arrived via lr=0x82452E64 (the recursive call site at
0x82452E60 inside the eq-taken arm) on ours tid=13 once and on tid=1 multiple times — confirming
both engines also take the eq arm at 0x82452E0C in their executions.
The candidate branch is NOT divergent at runtime. The Step 2 framing of "NtSetEvent vs NtReleaseSemaphore is an if/else inside this chain" is falsified at the source level: those two operations live on disjoint call-graph paths, NOT on alternate arms of the same branch.
The actual divergence is loop iteration count, not branch direction:
- canary tid=17 calls sub_82452DC0 (and thus NtReleaseSemaphore via sub_82450218) multiple times across its 154 ms lifetime, then upstream sub_821CBA08 calls sub_82453910 AFTER sub_821CB030 returns; that's where NtSetEvent at canary idx=347 originates.
- ours tid=13 calls sub_82452DC0 ONCE (fires once at cycle=7963 from lr=0x821CB1D0), executes the eq-arm path, fires NtReleaseSemaphore at 0x82452F8C, then wedges in the NtWait at sub_821CB030+0x1AC (0x821CB1DC) before sub_821CB030 can return to sub_821CBA08 and call sub_82453910.
Recommended next iterate: 2.B (NtQueryFullAttributesFile arg/return capture) or
2.C (ctx-field read-probe) to identify the upstream state that gates whether the wedge wait
ever gets signaled. The wedge itself was already correctly identified in AUDIT-069 S5: ours has
1 "other producer" vs canary's 25; the missing 24 producers are not present because their guest
state is downstream of the same tid=13 wedge (circular). The fix path traces to what signals
event d5e23609d3948568 in canary that doesn't in ours.
Step 1 — Candidate branch enumeration
Initial pass (target fns only, depth 4)
Conditional branches inside sub_821CB030, sub_821CBA08, sub_821CC3F8, sub_821C4EB0
where taken-arm first call reaches NtSetEvent wrapper (0x824AA2F0) AND not-taken-arm reaches
NtReleaseSemaphore wrapper (0x824AB158), or vice versa, within ≤ 4 BBs per arm.
Result: 0 candidate branches.
This is the Step 1 pivot trigger from the plan — broaden search.
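The enumeration criterion above can be sketched as a bounded breadth-first search over a basic-block graph. This is a minimal sketch, not the scanner actually run against sylpheed.db: the block names, edges, and the `cfg` / `calls` / `branches` shapes are hypothetical, calls are treated as direct (no call-graph resolution), and only the two wrapper addresses come from this report.

```python
from collections import deque

NT_SET_EVENT = 0x824AA2F0          # wrapper addresses taken from the report
NT_RELEASE_SEMAPHORE = 0x824AB158

def reaches(cfg, calls, start, target, max_bbs=4):
    """True if `target` is called within `max_bbs` basic blocks starting at `start`."""
    seen, queue = {start}, deque([(start, 1)])
    while queue:
        bb, depth = queue.popleft()
        if target in calls.get(bb, ()):
            return True
        if depth == max_bbs:
            continue                      # arm budget exhausted on this path
        for succ in cfg.get(bb, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append((succ, depth + 1))
    return False

def candidate_branches(cfg, calls, branches, max_bbs=4):
    """Return branch PCs whose two arms reach the two different wrappers."""
    out = []
    for pc, (taken, not_taken) in branches.items():
        t_set = reaches(cfg, calls, taken, NT_SET_EVENT, max_bbs)
        t_rel = reaches(cfg, calls, taken, NT_RELEASE_SEMAPHORE, max_bbs)
        n_set = reaches(cfg, calls, not_taken, NT_SET_EVENT, max_bbs)
        n_rel = reaches(cfg, calls, not_taken, NT_RELEASE_SEMAPHORE, max_bbs)
        if (t_set and n_rel) or (t_rel and n_set):
            out.append(pc)
    return out

# Hypothetical toy graph: blocks are labelled by name, not by real addresses.
cfg = {"T0": ["T1"], "T1": [], "N0": ["N1"], "N1": [], "X0": [], "X1": []}
calls = {"T1": [NT_RELEASE_SEMAPHORE], "N1": [NT_SET_EVENT], "X0": [NT_SET_EVENT]}
branches = {"br_a": ("T0", "N0"), "br_b": ("X0", "X1")}
print(candidate_branches(cfg, calls, branches))
```

In the toy graph only `br_a` qualifies, because `br_b` reaches one wrapper on one arm and nothing on the other. Shrinking `max_bbs` to 1 removes `br_a` as well, which is the same effect that made the initial target-only pass return 0 candidates.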
Broadened pass (call-graph depth 6, arm reach depth 4)
Reachable function set: 736 functions.
Result: 1 candidate branch.
| PC | Function | Branch type | Taken arm first call | Not-taken arm first call | Set-event reach | Release-sem reach |
|---|---|---|---|---|---|---|
| 0x82452E0C | sub_82452DC0 | bc 12,4*cr6+eq | 0x8245B000 (eq) | 0x82452AB8 (ne) | not-taken | taken |
Disassembly context:
0x82452DF8 bl sub_82452200 ; r3 = sub_82452200(...)
0x82452DFC addis r11, r0, 0x41FF
0x82452E00 addi r28, r0, 0
0x82452E04 ori r23, r11, 0xFFFD
0x82452E08 cmpli cr6, 0, r3, 0x0
0x82452E0C bc 12, 4*cr6+eq, 0x82452E1C ; if r3==0 → eq-arm to 0x82452E1C
0x82452E10 or r24, r28, r28
0x82452E14 or r29, r3, r3
0x82452E18 b 0x82452E88 ; not-eq → NtSetEvent path
0x82452E1C ... ; eq → NtReleaseSemaphore path
CSV saved at candidate-branches.csv.
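For readers less familiar with the PowerPC encoding, the compare-and-branch pair at 0x82452E08 / 0x82452E0C can be modelled as below. This is a sketch under stated assumptions: it models only the BO=12 and BO=4 encodings (no CTR handling), ignores the SO bit, and the helper names are hypothetical.

```python
BI_CR6_EQ = 4 * 6 + 2   # "4*cr6+eq": CR field 6, bit 2 (lt=0, gt=1, eq=2, so=3)

def cmplwi(ra: int, uimm: int) -> dict:
    """Model `cmpli crN, 0, rA, UIMM`: unsigned 32-bit compare into a CR field."""
    a = ra & 0xFFFFFFFF
    return {"lt": a < uimm, "gt": a > uimm, "eq": a == uimm}

def bc_taken(bo: int, cr_bit: bool) -> bool:
    """Model only the two BO encodings relevant here (CTR is not touched)."""
    if bo == 12:            # branch if the selected CR bit is set
        return cr_bit
    if bo == 4:             # branch if the selected CR bit is clear
        return not cr_bit
    raise ValueError(f"BO={bo} not modelled in this sketch")

def next_pc(r3: int) -> int:
    """PC reached after 0x82452E0C for a given r3 returned by sub_82452200."""
    cr6 = cmplwi(r3, 0)                  # 0x82452E08
    if bc_taken(12, cr6["eq"]):          # 0x82452E0C
        return 0x82452E1C                # eq arm (NtReleaseSemaphore path)
    return 0x82452E10                    # fall-through, ne arm (NtSetEvent path)

print(hex(next_pc(0)), hex(next_pc(1)))
```

With r3=0, as observed in every canary fire, the model lands on 0x82452E1C; any non-zero r3 falls through to 0x82452E10.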
Step 2 — Branch-probe both engines (cold boot)
Probe PCs: 0x82452E0C, 0x821CB1DC, 0x82452F10, 0x82452F8C, 0x82453910, 0x824539A4, 0x82452DC0
Canary command (cold, 120 s wallclock):
wine xenia_canary_i2a.exe ".../Project Sylpheed (...).iso" \
--mute=true \
--audit_61_branch_probe_pcs="0x82452E0C,0x821CB1DC,0x82452F10,0x82452F8C,0x82453910,0x824539A4,0x82452DC0"
Ours command (cold, n=50M instructions):
xrs-i2a exec ".../Project Sylpheed (...).iso" -n 50000000 \
--branch-probe="0x82452E0C,0x821CB1DC,0x82452F10,0x82452F8C,0x82453910,0x824539A4,0x82452DC0"
Canary fires (109 total)
Per-PC per-tid:
| PC | tid=6 | tid=17 | tid=18 | tid=26 |
|---|---|---|---|---|
| 0x82452DC0 (fn entry) | 11 | 16 | 15 | 2 |
| 0x82452E0C (branch) | 11 | 16 | 15 | 2 |
| 0x82452F10 (NtReleaseSem #1) | 1 | 8 | 5 | 1 |
| 0x82452F8C (NtReleaseSem #2) | 4 | 0 | 0 | 0 |
| 0x821CB1DC (wedge NtWait) | 0 | 1 | 0 | 1 |
| 0x82453910 (NtSetEvent helper entry) | 0 | 0 | 0 | 0 |
| 0x824539A4 (the branch INSIDE sub_82453910) | 0 | 0 | 0 | 0 |
Branch decision at 0x82452E0C (44 fires across all tids): r3=0x00000000, cr6=..E (equal) 100%.
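The totals quoted for the canary run (109 fires overall, 44 at the branch) follow directly from the per-PC per-tid table. A small cross-check, with the counts transcribed from the table above:

```python
# Canary fires per probe PC, keyed by canary tid (values copied from the table).
canary_fires = {
    0x82452DC0: {6: 11, 17: 16, 18: 15, 26: 2},
    0x82452E0C: {6: 11, 17: 16, 18: 15, 26: 2},
    0x82452F10: {6: 1, 17: 8, 18: 5, 26: 1},
    0x82452F8C: {6: 4, 17: 0, 18: 0, 26: 0},
    0x821CB1DC: {6: 0, 17: 1, 18: 0, 26: 1},
    0x82453910: {6: 0, 17: 0, 18: 0, 26: 0},
    0x824539A4: {6: 0, 17: 0, 18: 0, 26: 0},
}

per_pc = {pc: sum(by_tid.values()) for pc, by_tid in canary_fires.items()}
total = sum(per_pc.values())

for pc, n in per_pc.items():
    print(f"0x{pc:08X}: {n}")
print("total:", total)
```

Every function entry at 0x82452DC0 is matched by a fire at 0x82452E0C (44 each), which is consistent with the branch being reached on every call.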
Notable absences in canary's probe: sub_82453910 entry never fires. The NtSetEvent at canary
tid=17 idx=347 in Step 2's timeline must therefore reach the NtSetEvent wrapper via a path that
does NOT pass through any PC in our probe set. In other words, the canary tid=17 NtSetEvent at
idx=347 is NOT from this iteration's upstream chain at all; it is likely from a DIFFERENT call
site under a different parent function.
Ours fires (13 total BRANCH-PROBE lines)
| PC | tid=1 | tid=13 |
|---|---|---|
| 0x82452DC0 (fn entry) | 11 | 2 |
| 0x82452E0C | 0 | 0 |
| 0x82452F10 | 0 | 0 |
| 0x82452F8C | 0 | 0 |
| 0x821CB1DC (wedge NtWait) | 0 | 0 |
| 0x82453910 | 0 | 0 |
| 0x824539A4 | 0 | 0 |
Ours's JIT only updates ctx.pc at BB boundaries, so interior PCs do not fire in
fire_branch_probe_if_match even though the guest executes them. Only the function-entry
PC 0x82452DC0 (a JIT lookup target) fires.
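The effect can be illustrated with a toy executor that checks the probe set either per instruction or only at basic-block entry. This is a sketch, not the xenia-rs implementation: the `run` helper and the block layout are hypothetical (the real BB split around the bl at 0x82452DF8 is not modelled).

```python
def run(blocks, entry, probes, per_instruction):
    """Walk a chain of basic blocks and return the probe PCs that fired.

    blocks: {bb_start_pc: (instruction_pcs, next_bb_start_or_None)}
    """
    fired, bb = [], entry
    while bb is not None:
        pcs, nxt = blocks[bb]
        if per_instruction:
            # Interpreter-style: the probe set is consulted at every instruction.
            fired += [pc for pc in pcs if pc in probes]
        elif bb in probes:
            # BB-granular: pc is only visible (and checked) at block entry.
            fired.append(bb)
        bb = nxt
    return fired

# Hypothetical layout: one block from the function entry through the branch,
# followed by the eq-arm block.
blocks = {
    0x82452DC0: (list(range(0x82452DC0, 0x82452E10, 4)), 0x82452E1C),
    0x82452E1C: (list(range(0x82452E1C, 0x82452E24, 4)), None),
}
probes = {0x82452DC0, 0x82452E0C}

print([hex(pc) for pc in run(blocks, 0x82452DC0, probes, per_instruction=True)])
print([hex(pc) for pc in run(blocks, 0x82452DC0, probes, per_instruction=False)])
```

In the BB-granular mode only 0x82452DC0 is reported, matching the shape of the ours table, where every interior probe PC shows 0.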
However, one of ours tid=13's two sub_82452DC0 entries has lr=0x82452E64 (return address
from the recursive bl 0x82452DC0 at 0x82452E60), which is INSIDE the eq-taken arm of
0x82452E0C. This confirms that ours's tid=13 entered sub_82452DC0 from sub_821CB030 (cycle=7963,
lr=0x821CB1D0) and then recursively re-entered it (cycle=20030, lr=0x82452E64), meaning ours took
the same eq-arm direction at 0x82452E0C that canary took.
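The inference from lr to branch direction can be written down explicitly. A minimal sketch, assuming the address range between the eq-arm target (0x82452E1C) and the ne-arm join point (0x82452E88) is reachable only through the eq arm; that contiguity is an assumption read off the disassembly excerpt, not something the probe verified.

```python
EQ_ARM_START = 0x82452E1C   # branch target of 0x82452E0C
NE_ARM_JOIN = 0x82452E88    # where the not-taken arm jumps to

def call_site(lr: int) -> int:
    """PowerPC `bl` stores the address of the following instruction in LR."""
    return (lr - 4) & 0xFFFFFFFF

def in_eq_arm(pc: int) -> bool:
    """Assumption: [EQ_ARM_START, NE_ARM_JOIN) is only entered via the eq arm."""
    return EQ_ARM_START <= pc < NE_ARM_JOIN

for lr in (0x821CB1D0, 0x82452E64):
    site = call_site(lr)
    print(f"lr=0x{lr:08X} call_site=0x{site:08X} eq_arm={in_eq_arm(site)}")
```

The first entry (lr=0x821CB1D0) resolves to a call site in the external caller, while the second resolves to 0x82452E60 inside the assumed eq-arm range.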
Step 3 — First divergent branch — NOT FOUND
The single candidate branch is not divergent at runtime. Both engines select the eq-arm
(r3=0 returned by sub_82452200) at 0x82452E0C in their first traversal.
This is the "broaden the search" pivot from the plan (If 0, ... OR call-resolution heuristic missed it). The broadened search to depth 6 found 1 candidate, but that candidate is not
runtime-divergent.
Step 4 / 5 — Re-attributing the divergence
The Step 2 report framed canary's NtSetEvent (idx=347) vs ours's NtReleaseSemaphore
(idx=429) as alternate arms of the same branch inside sub_821CB030's chain. Re-analysis of
the source disasm + branch-probe data shows this is incorrect:
- The only function on the chain reaching NtReleaseSemaphore is sub_82450218 (called from sub_82452DC0 at 0x82452F10 and 0x82452F8C). Ours fires this once on tid=13 in iteration 1.
- The only functions on the chain reaching NtSetEvent are sub_8245FEB8, sub_82453910, sub_82458A70, sub_8245D9D8 (from the reach analysis). Of these, sub_82453910 is directly called from sub_821CBA08 at 0x821CBBF0, but only AFTER sub_821CB030 returns. Ours never reaches that line because sub_821CB030 wedges on its NtWait at +0x1AC.
- The canary tid=17 NtSetEvent at idx=347 is NOT from sub_82453910 (whose entry probe at 0x82453910 fired 0× in our canary probe). It must be from one of the other 3 NtSetEvent callers in the reach set, or from a sub_82453910 not on tid=17 at the moment of idx=347 (idx is global, so tid attribution requires a per-event check; handled in the Step 2 csv).
The real divergence is loop iteration count of sub_82452DC0 and its upstream caller
sub_821CB030 / sub_821CBA08. Each iteration of canary tid=17's body calls
bl 0x82452F8C → bl 0x824AB158 (NtReleaseSemaphore) and then waits at sub_821CB030+0x1AC,
which RETURNS quickly because a peer thread has already signaled. Ours's iteration-1 wait at
that PC never returns because the corresponding signaler never fires.
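The difference between the two engines at this wait can be sketched with host threads standing in for guest threads. This is an illustration only: `run_scenario` and its events are hypothetical names, Python's threading.Event stands in for the guest kernel event, and the timeouts exist purely so the sketch terminates instead of hanging.

```python
import threading

def run_scenario(peer_blocked_on_waiter: bool, timeout_s: float = 0.2) -> bool:
    """Return True if the waiter's wait is satisfied, False if it times out (wedge)."""
    wait_event = threading.Event()    # the event the waiter blocks on
    waiter_done = threading.Event()   # work the waiter only does after its wait

    def peer():
        if peer_blocked_on_waiter:
            # The peer needs the waiter's downstream work first: a circular dependency.
            if not waiter_done.wait(timeout_s * 2):
                return                # never gets to signal
        wait_event.set()

    t = threading.Thread(target=peer)
    t.start()
    satisfied = wait_event.wait(timeout_s)
    if satisfied:
        waiter_done.set()
    t.join()
    return satisfied

print("peer signals independently:", run_scenario(False))
print("peer depends on waiter:    ", run_scenario(True, timeout_s=0.05))
```

The first scenario corresponds to canary, where the wait returns because a peer has already signaled; the second corresponds to the wedge, where the signaler sits downstream of the very thread that is waiting.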
Cause attribution
Per the plan's Step 5 framework, attribute to one of 3 candidate causes:
- NtQueryFullAttributesFile: NOT directly evidenced by this iterate. The probe didn't capture file-attribute returns.
- Shared CS-protected ctx field set by another tid: STILL UNTESTED. ours's tid=13 wait on event d5e23609d3948568 depends on another tid signaling it. AUDIT-069 S5's "25 producers vs 1" finding confirms ours has 24 missing peer-producers, meaning peer tids in ours aren't reaching the signal call sites.
- Vtable: NOT directly evidenced.
- Loop-count circular wedge (NEW): ours tid=13 wedges on first wait because peer producers (themselves blocked downstream of tid=13's blocked work) never fire. The originating peer producer is on canary tid=4/10/14 (per AUDIT-069 / step2 report), all of which are alive in ours but doing different work (per AUDIT-069 S2: ours's tid=5 fires γ-signalers 81× vs canary tid=10's 492×, so ours is under-producing signals by ~84%).
This iterate's negative result on the branch-arm hypothesis sharpens the picture: the divergence is NOT a single-branch lattice mismatch inside sub_821CB030's chain. It's a distributed multi-thread producer underrun, with the wedge a downstream symptom of upstream under-signaling on peer tids that ARE running in ours but executing a different (shorter) trajectory.
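The two underrun figures cited from AUDIT-069 can be reproduced from the raw counts quoted in this report:

```python
# Counts quoted from AUDIT-069 S2 and S5 in this report.
ours_signals, canary_signals = 81, 492
ours_producers, canary_producers = 1, 25

shortfall = 1 - ours_signals / canary_signals     # fraction of signals missing in ours
missing_producers = canary_producers - ours_producers

print(f"signal shortfall: {shortfall:.1%}")
print(f"missing producers: {missing_producers}")
```

The shortfall rounds to the ~84% stated above, and the producer gap is the 24 missing peer-producers.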
Tripstones honored
- #28 (per-engine tid is not stable cross-engine identity): confirmed by Step 2 finding that ours tid=13 ≡ canary tid=17 (same entry sub_821748F0). Branch-probe data uses per-engine tid; cross-engine comparison done by (entry_pc, lr) tuple, not raw tid.
- #32 (canary jitter in contention regions): not relevant here — both engines select eq arm at 0x82452E0C 100% across all observed fires. No jitter.
- #37 (vtable base vs slot-N): not encountered (no vtable read at 0x82452E0C).
- #39 (composite progression vs matched-prefix): this iterate produces neither; an informative null at the source-control-flow lens.
- #40 (single-keystone hypothesis falsified before): Step 2's "single branch arm divergence" framing was itself a candidate keystone. Falsified here.
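Tripstone #28 in practice means keying threads by guest entry PC rather than by raw tid. A minimal sketch of that matching, where `match_threads` is a hypothetical helper, the 13/17 pairing and entry 0x821748F0 come from the Step 2 finding, and the extra canary thread is invented to show an unmatched case:

```python
def match_threads(ours, canary):
    """Pair per-engine tids that share a guest entry PC.

    Both inputs map tid -> entry_pc; the result maps each ours tid to the
    list of canary tids with the same entry PC (possibly empty).
    """
    by_entry = {}
    for tid, entry in canary.items():
        by_entry.setdefault(entry, []).append(tid)
    return {tid: by_entry.get(entry, []) for tid, entry in ours.items()}

ours = {13: 0x821748F0}
canary = {17: 0x821748F0, 99: 0xDEAD0000}   # tid 99 / 0xDEAD0000 are hypothetical

print(match_threads(ours, canary))
```

Several threads can share one entry PC, which is why the report compares on the (entry_pc, lr) tuple and not on entry PC alone.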
Cascade
- A (identify candidate branch PCs in DB): PASS with caveat. 1 candidate at depth 6. Initial depth-4 scan returned 0 — pivot trigger fired.
- B (run both engines with branch-probe): PASS. 109 fires canary, 13 fires ours.
- C (find FIRST divergent branch in candidates): NEGATIVE / informative null. The single candidate is not divergent (both engines take eq arm).
- D (attribute to one of 3 candidate causes): MEDIUM. Reframed as "loop-count circular wedge with under-producing peer tids", which subsumes candidate 2 (shared CS-protected ctx).
- E (recommend specific next iterate with LOC estimate): PASS (see below).
Recommended next iterate
Option 2.B — Args/return-value capture for NtQueryFullAttributesFile and key kernel APIs (~30–50 LOC canary)
Extend canary's Phase A event log to populate args_resolved and return_value for:
- NtQueryFullAttributesFile
- NtCreateFile (cache:\ paths)
- NtReadFile, NtWriteFile
Compare canary tid=17's 9 NtQueryFullAttributesFile invocations against ours tid=13's 1 to find the first divergence in cache-state. Cheap, high signal.
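Once both engines log arguments and return values, the comparison reduces to finding the first index at which the two per-thread call sequences differ. A sketch under assumptions: the record shape (api, path, status) and the sample values are hypothetical, since the real args_resolved / return_value layout is not specified here.

```python
from itertools import zip_longest

def first_divergence(canary_calls, ours_calls):
    """Return (index, canary_record, ours_record) for the first mismatch, else None.

    A missing record on either side (sequence ended early) is reported as None.
    """
    for i, (c, o) in enumerate(zip_longest(canary_calls, ours_calls)):
        if c != o:
            return i, c, o
    return None

# Hypothetical records: (api, path, status).
canary_calls = [
    ("NtQueryFullAttributesFile", "cache:\\a", 0),
    ("NtQueryFullAttributesFile", "cache:\\b", 0),
]
ours_calls = [
    ("NtQueryFullAttributesFile", "cache:\\a", 0),
]

print(first_divergence(canary_calls, ours_calls))
```

With 9 invocations on the canary side against 1 on ours, the interesting cases are a mismatch inside the first record or the sequence simply ending early, which this helper reports as a None on the shorter side.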
Option 2.C — read-probe on ours tid=13 wait event memory (~20 LOC reusing AUDIT-068 S3)
Use audit_68_host_mem_read_probe to sample event handle d5e23609d3948568's underlying
KEVENT struct in ours, at 100 µs cadence over the 3 ms wedge window. Capture the moment
(if any) when its Header.SignalState would transition. Validates whether the kernel
plumbing is correct vs the producer is simply absent.
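At 100 µs cadence a 3 ms window yields 30 samples, and the analysis step is just locating the first state change. A sketch of that post-processing, assuming samples arrive as (t_us, signal_state) pairs; the sample data and `first_transition` are hypothetical, and the sketch does not call the actual audit_68_host_mem_read_probe.

```python
WINDOW_US, CADENCE_US = 3000, 100
n_samples = WINDOW_US // CADENCE_US   # number of reads across the wedge window

def first_transition(samples):
    """samples: list of (t_us, signal_state). Return (t_us, old, new) of the first change."""
    for (_, s0), (t1, s1) in zip(samples, samples[1:]):
        if s1 != s0:
            return t1, s0, s1
    return None

# Hypothetical captures: one where the event is never signaled, one where it is.
never_signaled = [(i * CADENCE_US, 0) for i in range(n_samples)]
signaled_late = [(i * CADENCE_US, 0 if i < 12 else 1) for i in range(n_samples)]

print(n_samples, first_transition(never_signaled), first_transition(signaled_late))
```

A None result over the whole window would point at an absent producer; a transition that the waiter never observes would instead point at the kernel plumbing.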
Option 2.D — peer-producer LR trace (~0 LOC; reuses existing --lr-trace infra)
Per AUDIT-069 S5, ours has 1 producer where canary has 25. Use existing --lr-trace at the
NtReleaseSemaphore call site 0x82450310 + NtSetEvent wrappers on ALL tids in ours, capture
which guest LRs fire during the 0–3 ms window. Diff vs canary's audit_69_event_signal_watch
JSONL → find which peer-tid call site is MISSING in ours.
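The diff step itself is small once both captures are reduced to lists of guest LRs. A sketch with invented LR values; parsing of the --lr-trace output and of the audit_69_event_signal_watch JSONL is deliberately left out because their exact formats are not given here.

```python
from collections import Counter

def missing_producers(canary_lrs, ours_lrs):
    """Return {lr: shortfall} for call sites that fire less often in ours than in canary."""
    canary, ours = Counter(canary_lrs), Counter(ours_lrs)
    return {lr: n - ours[lr] for lr, n in canary.items() if ours[lr] < n}

# Hypothetical LR samples; real values would come from the two trace captures.
canary_lrs = [0x1000, 0x1000, 0x2000, 0x3000]
ours_lrs = [0x1000, 0x3000]

for lr, short in missing_producers(canary_lrs, ours_lrs).items():
    print(f"lr=0x{lr:08X} missing {short} fire(s) in ours")
```

Call sites that appear with a shortfall equal to their full canary count are the ones entirely absent in ours, which is the set 2.D is meant to surface.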
Best minimum-LOC next step: 2.D (zero LOC, existing instrumentation; capture peer-producer absence directly).
Best disambiguating step: 2.B (~30–50 LOC) to pin upstream cache-state divergence.
Honest assessment
- The 2-hour timebox was respected.
- Step 1 returned 0 candidates at initial depth; broadened to find 1; that 1 is non-divergent.
- The Step 2 report's source-level branch framing does not survive contact with the call-graph. The control-flow divergence is at a higher level (loop count, not branch arm).
- The wedge at sub_821CB030+0x1AC remains the symptom; the cause is the absence of a signaler on a peer tid in the 0–3 ms window. That peer-tid absence is what 2.D would directly identify.
- Confidence in pivoting to 2.D/2.B: HIGH.
Artifacts produced
All under xenia-rs/audit-runs/iterate-2A-branch-probe/:
- candidate-branches.csv: Step 1 broadened search result (1 row).
- canary-probe.stdout / .stderr / .lines: canary 120 s cold run with branch probe.
- ours-probe.stdout / .stderr: ours -n 50M cold run with branch probe.
- run-commands.txt: exact CLIs used.
- iterate2A-report.md: this report.
LOC delta: 0 to engine code, 0 to canary code. Read-only investigation.
xenia-rs HEAD UNCHANGED. canary HEAD UNCHANGED. Both binaries (xenia_canary_i2a.exe,
xrs-i2a) are renamed copies; original binaries untouched.