# Iterate 2.A — Branch-probe of sub_821CB030 → sub_821CBA08 → sub_821CC3F8 → sub_821C4EB0

**Date:** 2026-05-21

**Mode:** WRITE (investigation + branch-probe configuration; no engine LOC change).

**Sources:** `xenia-canary/.../xenia_canary_i2a.exe`, `xenia-rs/target/release/xrs-i2a`,
`xenia-rs/sylpheed.db`, prior Step 2 report at `audit-runs/review-a-step2-natural-trigger/step2-report.md`.

## TL;DR

The plan's framing ("find the conditional branch inside `sub_821CB030 → … → sub_821C4EB0`
where canary takes the `bl NtSetEvent` arm and ours takes the `bl NtReleaseSemaphore` arm") **does
not match the actual control-flow lattice**. Across the entire call graph reachable from those
four functions (depth ≤ 6, 736 functions, each scanned for conditional branches whose
two arms reach the two wrappers within ≤ 4 BBs), **exactly one candidate branch exists**:

PC=`0x82452E0C` inside `sub_82452DC0`, with

- taken-eq (r3==0) arm → reaches `bl 0x824AB158` (NtReleaseSemaphore) via `sub_8245B000` /
  `sub_82450218`;
- not-taken (r3!=0) arm → reaches `bl 0x824AA2F0` (NtSetEvent) via `sub_82452AB8` /
  `sub_8245FEB8`.

The branch-probe (canary + ours) was run at that PC plus six context PCs. **Canary's branch at
0x82452E0C ALWAYS takes the eq arm** (r3=0, 44 fires across tids 6/17/18/26, never the
NtSetEvent arm). In ours, the JIT only emits BB-entry PCs to the probe, so 0x82452E0C did not fire
directly. However, `sub_82452DC0` was re-entered via `lr=0x82452E64` (the return address of the
recursive call at 0x82452E60, which sits inside the eq-taken arm) once on ours tid=13 and multiple
times on tid=1. This confirms that ours, like canary, takes the eq arm at 0x82452E0C.

**The candidate branch is NOT divergent at runtime.** The Step 2 framing ("NtSetEvent vs
NtReleaseSemaphore is an if/else inside this chain") is **falsified at the source level**: the
two operations live on disjoint call-graph paths, NOT on alternate arms of the same branch.

The actual divergence is **loop iteration count**, not branch direction:

- canary tid=17 calls `sub_82452DC0` (and thus NtReleaseSemaphore via `sub_82450218`) multiple
  times across its 154 ms lifetime, and its `sub_821CB030` call returns to upstream `sub_821CBA08`.
  Canary's NtSetEvent at idx=347 is emitted after that return, though not via `sub_82453910`,
  whose entry probe fired 0× in canary (see Step 4/5).
- ours tid=13 calls `sub_82452DC0` once from `sub_821CB030` (cycle=7963, lr=0x821CB1D0; its only
  other entry is the recursive re-entry), executes the eq-arm path, fires NtReleaseSemaphore at
  0x82452F8C, then wedges in the NtWait at `sub_821CB030+0x1AC` (0x821CB1DC) before
  `sub_821CB030` can return to `sub_821CBA08` and call `sub_82453910`.

Recommended next iterate: **2.B (NtQueryFullAttributesFile arg/return capture)** or
**2.C (read-probe of the wait event's memory)** to identify the upstream state that gates whether
the wedge wait ever gets signaled, with **2.D (peer-producer LR trace)** as the zero-LOC first step.
The wedge itself was already correctly identified in AUDIT-069 S5: ours has 1 "other producer" vs
canary's 25; the missing 24 producers are absent because their guest state is downstream of the
same tid=13 wedge (circular). The fix path traces to *what signals event `d5e23609d3948568`* in
canary but not in ours.

## Step 1 — Candidate branch enumeration

### Initial pass (target fns only, depth 4)

Conditional branches inside `sub_821CB030`, `sub_821CBA08`, `sub_821CC3F8`, `sub_821C4EB0`
where the taken arm's first call reaches the NtSetEvent wrapper (`0x824AA2F0`) AND the not-taken
arm's reaches the NtReleaseSemaphore wrapper (`0x824AB158`), or vice versa, within ≤ 4 BBs per arm.

**Result: 0 candidate branches.**

This is the Step 1 pivot trigger from the plan: broaden the search.
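The arm-reach criterion can be sketched as a bounded breadth-first walk over each arm's basic blocks, combined with a transitive check on the call graph. This is a minimal illustration under stated assumptions: the CFG and call-graph dicts and all helper names are hypothetical (not the `sylpheed.db` schema), and "first call" is simplified to "any call within the first `max_bbs` blocks" of an arm.

```python
from collections import deque

SET_EVENT = 0x824AA2F0    # NtSetEvent wrapper (from the report)
RELEASE_SEM = 0x824AB158  # NtReleaseSemaphore wrapper (from the report)


def reaches(call_graph, fn, wrapper):
    """Does `fn` call `wrapper`, directly or transitively? Iterative DFS, cycle-safe."""
    stack, seen = [fn], {fn}
    while stack:
        f = stack.pop()
        if f == wrapper:
            return True
        for callee in call_graph.get(f, ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return False


def arm_reaches(start_bb, succs, bb_calls, call_graph, wrapper, max_bbs=4):
    """Does any call made within `max_bbs` basic blocks of `start_bb` reach `wrapper`?

    succs:    hypothetical CFG, BB start PC -> successor BB start PCs
    bb_calls: hypothetical map, BB start PC -> call targets made inside that BB
    """
    seen = {start_bb}
    queue = deque([(start_bb, 1)])  # the arm's first block counts as block 1
    while queue:
        bb, depth = queue.popleft()
        if any(reaches(call_graph, c, wrapper) for c in bb_calls.get(bb, ())):
            return True
        if depth < max_bbs:
            for nxt in succs.get(bb, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return False


def is_candidate(taken_bb, not_taken_bb, succs, bb_calls, call_graph, max_bbs=4):
    """The two arms must reach the two different wrappers, in either order."""
    def arm(bb, wrapper):
        return arm_reaches(bb, succs, bb_calls, call_graph, wrapper, max_bbs)

    return (arm(taken_bb, SET_EVENT) and arm(not_taken_bb, RELEASE_SEM)) or (
        arm(taken_bb, RELEASE_SEM) and arm(not_taken_bb, SET_EVENT)
    )
```

The broadened pass keeps the per-arm limit at 4 BBs and widens only the set of functions whose branches are scanned.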

### Broadened pass (call-graph depth 6, arm reach depth 4)

Reachable function set: 736 functions.
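The reachable set is the kind of result a depth-limited breadth-first walk of the call graph produces, starting from the four target functions. A minimal sketch, assuming the call graph is available as a plain `dict` adjacency list (caller → callees) and that depth 0 means the roots themselves; both are assumptions, not a description of the actual tooling.

```python
from collections import deque

ROOTS = [0x821CB030, 0x821CBA08, 0x821CC3F8, 0x821C4EB0]  # the four target functions


def reachable_functions(call_graph, roots, max_depth=6):
    """Functions reachable from `roots` through at most `max_depth` call edges."""
    depth = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        fn = queue.popleft()
        if depth[fn] == max_depth:
            continue  # do not expand past the depth limit
        for callee in call_graph.get(fn, ()):
            if callee not in depth:  # BFS: the first visit is at the minimum depth
                depth[callee] = depth[fn] + 1
                queue.append(callee)
    return set(depth)
```

Calling `reachable_functions(call_graph, ROOTS)` on the real call graph would give the function set whose conditional branches are then scanned with the per-arm criterion described above.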

**Result: 1 candidate branch.**

| PC | Function | Branch type | Taken arm first call | Not-taken arm first call | Set-event reach | Release-sem reach |
|---|---|---|---|---|---|---|
| `0x82452E0C` | `sub_82452DC0` | `bc 12,4*cr6+eq` | `0x8245B000` (eq) | `0x82452AB8` (ne) | not-taken | taken |

Disassembly context:

```
0x82452DF8 bl sub_82452200 ; r3 = sub_82452200(...)
0x82452DFC addis r11, r0, 0x41FF
0x82452E00 addi r28, r0, 0
0x82452E04 ori r23, r11, 0xFFFD
0x82452E08 cmpli cr6, 0, r3, 0x0
0x82452E0C bc 12, 4*cr6+eq, 0x82452E1C ; if r3==0 → eq-arm to 0x82452E1C
0x82452E10 or r24, r28, r28
0x82452E14 or r29, r3, r3
0x82452E18 b 0x82452E88 ; not-eq → NtSetEvent path
0x82452E1C ... ; eq → NtReleaseSemaphore path
```
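To make the branch semantics explicit: in `bc BO, BI, target`, BO=12 means "branch if CR bit BI is set", and BI = 4*cr6+eq = 4*6+2 = 26 selects the EQ bit of cr6, so the instruction is `beq cr6`; `cmpli` is an unsigned compare. The sketch below is a hypothetical Python model of just this instruction window (not code from either engine), showing which arm a given r3 selects and what the `addis`/`ori` pair leaves in r23.

```python
EQ_TARGET = 0x82452E1C  # eq arm (NtReleaseSemaphore path, per the report)
NE_TARGET = 0x82452E88  # target of `b 0x82452E88` on the not-eq path (NtSetEvent path)
CR_EQ = 2               # bit order within a 4-bit CR field: LT=0, GT=1, EQ=2, SO=3


def cmpli(ra, uimm):
    """Unsigned (logical) 32-bit compare; returns (lt, gt, eq)."""
    ra &= 0xFFFFFFFF
    return ra < uimm, ra > uimm, ra == uimm


def model_window(r3):
    """Model 0x82452DFC..0x82452E18 for a given r3 (the value returned by sub_82452200)."""
    r11 = 0x41FF << 16      # addis r11, r0, 0x41FF  (rA=r0 reads as literal 0)
    r28 = 0                 # addi  r28, r0, 0
    r23 = r11 | 0xFFFD      # ori   r23, r11, 0xFFFD -> 0x41FFFFFD
    cr6 = cmpli(r3, 0x0)    # cmpli cr6, 0, r3, 0x0
    bi = 4 * 6 + CR_EQ      # BI operand 4*cr6+eq == 26
    if cr6[CR_EQ]:          # BO=12: branch if CR bit BI is set (beq cr6)
        return {"bi": bi, "r23": r23, "next_pc": EQ_TARGET}
    # Fall-through: or r24,r28,r28 ; or r29,r3,r3 ; b 0x82452E88
    return {"bi": bi, "r23": r23, "r24": r28, "r29": r3 & 0xFFFFFFFF, "next_pc": NE_TARGET}


print(hex(model_window(0)["next_pc"]))  # 0x82452e1c
```

With the probe's observed r3=0x00000000, the model lands on 0x82452E1C, the eq arm.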

CSV saved at `candidate-branches.csv`.

## Step 2 — Branch-probe both engines (cold boot)

Probe PCs: `0x82452E0C, 0x821CB1DC, 0x82452F10, 0x82452F8C, 0x82453910, 0x824539A4, 0x82452DC0`

Canary command (cold, 120 s wallclock):

```
wine xenia_canary_i2a.exe ".../Project Sylpheed (...).iso" \
  --mute=true \
  --audit_61_branch_probe_pcs="0x82452E0C,0x821CB1DC,0x82452F10,0x82452F8C,0x82453910,0x824539A4,0x82452DC0"
```

Ours command (cold, n=50M instructions):

```
xrs-i2a exec ".../Project Sylpheed (...).iso" -n 50000000 \
  --branch-probe="0x82452E0C,0x821CB1DC,0x82452F10,0x82452F8C,0x82453910,0x824539A4,0x82452DC0"
```

### Canary fires (109 total)

Per-PC per-tid:

| PC | tid=6 | tid=17 | tid=18 | tid=26 |
|---|---|---|---|---|
| 0x82452DC0 (fn entry) | 11 | 16 | 15 | 2 |
| 0x82452E0C (branch) | 11 | 16 | 15 | 2 |
| 0x82452F10 (NtReleaseSem #1) | 1 | 8 | 5 | 1 |
| 0x82452F8C (NtReleaseSem #2) | 4 | 0 | 0 | 0 |
| 0x821CB1DC (wedge NtWait) | 0 | 1 | 0 | 1 |
| 0x82453910 (NtSetEvent helper entry) | 0 | 0 | 0 | 0 |
| 0x824539A4 (the branch INSIDE sub_82453910) | 0 | 0 | 0 | 0 |

Branch decision at 0x82452E0C (44 fires across all tids): **r3=0x00000000, cr6=..E (equal) 100%**.
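The table's totals can be cross-checked mechanically. The per-PC, per-tid counts below are copied from the table above; the sketch only confirms that they add up to the 109 fires in the heading and the 44 fires at the branch, and that every function entry is matched by a branch fire on every tid.

```python
# Per-PC, per-tid canary fire counts copied from the table (tids 6, 17, 18, 26).
CANARY_FIRES = {
    0x82452DC0: (11, 16, 15, 2),  # fn entry
    0x82452E0C: (11, 16, 15, 2),  # branch
    0x82452F10: (1, 8, 5, 1),     # NtReleaseSem #1
    0x82452F8C: (4, 0, 0, 0),     # NtReleaseSem #2
    0x821CB1DC: (0, 1, 0, 1),     # wedge NtWait
    0x82453910: (0, 0, 0, 0),     # NtSetEvent helper entry
    0x824539A4: (0, 0, 0, 0),     # branch inside sub_82453910
}

total = sum(sum(row) for row in CANARY_FIRES.values())
branch_fires = sum(CANARY_FIRES[0x82452E0C])
# Every entry into sub_82452DC0 reached its branch at 0x82452E0C, tid by tid.
entries_match_branch = CANARY_FIRES[0x82452DC0] == CANARY_FIRES[0x82452E0C]
print(total, branch_fires, entries_match_branch)  # 109 44 True
```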

Notable absence in canary's probe: the `sub_82453910` entry never fires. The NtSetEvent at canary
tid=17 idx=347 in Step 2's timeline must therefore reach NtSetEvent via a path that bypasses
`sub_82453910` and lies outside our probe set. In other words, it is NOT from this iteration's
upstream chain at all; it likely comes from a DIFFERENT call site under a different parent function.

### Ours fires (13 total BRANCH-PROBE lines)

| PC | tid=1 | tid=13 |
|---|---|---|
| 0x82452DC0 (fn entry) | 11 | 2 |
| 0x82452E0C | 0 | 0 |
| 0x82452F10 | 0 | 0 |
| 0x82452F8C | 0 | 0 |
| 0x821CB1DC (wedge NtWait) | 0 | 0 |
| 0x82453910 | 0 | 0 |
| 0x824539A4 | 0 | 0 |

**In ours, the JIT only updates `ctx.pc` at BB boundaries, so interior PCs never match in
`fire_branch_probe_if_match`, even when execution passes through them.** Only the function-entry
PC 0x82452DC0 (a JIT lookup target) fires.
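A toy model of why interior PCs are invisible: if the probe check runs only when `ctx.pc` is written, and the JIT writes `ctx.pc` only on entry to a translated block, then a probe PC in the middle of a block never matches. The block boundaries and the helper below are hypothetical; only the PCs come from the report, and the real `fire_branch_probe_if_match` is not reproduced here.

```python
def fired_probes(blocks_executed, probe_pcs, per_instruction=False):
    """Return the probe PCs that would fire, in execution order.

    blocks_executed: list of (start_pc, end_pc) half-open ranges of 4-byte instructions.
    per_instruction=False models a check only at block entry (ctx.pc set at BB boundaries);
    per_instruction=True models a check before every instruction.
    """
    fired = []
    for start, end in blocks_executed:
        pcs = range(start, end, 4) if per_instruction else [start]
        fired.extend(pc for pc in pcs if pc in probe_pcs)
    return fired


PROBES = {0x82452DC0, 0x82452E0C}
# Hypothetical: one translated block from the function entry through the `bc` at 0x82452E0C.
BLOCK = (0x82452DC0, 0x82452E10)

print([hex(pc) for pc in fired_probes([BLOCK], PROBES)])                        # ['0x82452dc0']
print([hex(pc) for pc in fired_probes([BLOCK], PROBES, per_instruction=True)])  # ['0x82452dc0', '0x82452e0c']
```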

However, **one of ours tid=13's two `sub_82452DC0` entries has `lr=0x82452E64`** (the return
address of the recursive `bl 0x82452DC0` at 0x82452E60), which lies INSIDE the eq-taken arm of
0x82452E0C. In ours, tid=13 entered sub_82452DC0 from sub_821CB030 (cycle=7963,
lr=0x821CB1D0) and then re-entered it recursively (cycle=20030, lr=0x82452E64), so ours took
the **same eq-arm direction** at 0x82452E0C that canary took.
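The attribution rests on a simple rule: on PowerPC, `bl` writes the address of the next instruction to LR, so a callee's entry LR minus 4 is its call site. A minimal sketch applying that rule to the two tid=13 entries reported above. It assumes that no other branch targets the span [0x82452E1C, 0x82452E88), which the not-eq path jumps over (`b 0x82452E88` at 0x82452E18); under that assumption, a call site in that span implies the eq arm was taken.

```python
SUB_821CB030 = 0x821CB030
# Code the not-eq path jumps over (it goes from 0x82452E18 straight to 0x82452E88).
SKIPPED_BY_NE_PATH = range(0x82452E1C, 0x82452E88)

# (cycle, lr) of ours tid=13's two entries into sub_82452DC0, as reported above.
TID13_ENTRIES = [(7963, 0x821CB1D0), (20030, 0x82452E64)]


def call_site(lr):
    """`bl` sets LR to the address after the bl, so the call site is LR - 4."""
    return lr - 4


for cycle, lr in TID13_ENTRIES:
    site = call_site(lr)
    if site in SKIPPED_BY_NE_PATH:
        where = "eq arm of 0x82452E0C"
    else:
        where = f"sub_821CB030+{site - SUB_821CB030:#x}"  # valid only for this caller
    print(cycle, hex(site), where)
# 7963 0x821cb1cc sub_821CB030+0x19c
# 20030 0x82452e60 eq arm of 0x82452E0C
```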

## Step 3 — First divergent branch — NOT FOUND

The single candidate branch is **not divergent** at runtime. Both engines select the eq arm
(r3=0 returned by `sub_82452200`) at 0x82452E0C in their first traversal.

This is the "broaden the search" pivot from the plan (`If 0, ... OR call-resolution heuristic
missed it`). The broadened search to depth 6 found 1 candidate, but that candidate is not
runtime-divergent.

## Step 4 / 5 — Re-attributing the divergence

The Step 2 report framed the `NtSetEvent` in canary (idx=347) and the `NtReleaseSemaphore` in ours
(idx=429) as alternate arms of the same branch inside `sub_821CB030`'s chain. Re-analysis of
the source disassembly and branch-probe data shows this is **incorrect**:

1. The only function on the chain reaching NtReleaseSemaphore is `sub_82450218` (called from
   `sub_82452DC0` at 0x82452F10 and 0x82452F8C). Ours fires it once on tid=13 in iteration 1.
2. The only functions on the chain reaching NtSetEvent are `sub_8245FEB8`, `sub_82453910`,
   `sub_82458A70`, and `sub_8245D9D8` (from the reach analysis). Of these, `sub_82453910` is
   called directly from `sub_821CBA08` at 0x821CBBF0, but only AFTER `sub_821CB030` returns. Ours
   never reaches that call because `sub_821CB030` wedges on its NtWait at +0x1AC.
3. The canary tid=17 NtSetEvent at idx=347 is NOT from `sub_82453910` (whose entry probe at
   0x82453910 fired 0× in our canary run). It must come from one of the other three NtSetEvent
   callers in the reach set, or from a `sub_82453910` call *not on tid=17 at the moment of idx=347*
   (idx is global; tid attribution requires a per-event check, handled in the Step 2 csv).

The real divergence is the **loop iteration count** of `sub_82452DC0` and its upstream callers
`sub_821CB030` / `sub_821CBA08`. Each iteration of canary tid=17's body releases the semaphore
through `sub_82450218` (`bl 0x824AB158`, NtReleaseSemaphore; the probe records tid=17's releases
at the 0x82452F10 call site) and then waits at `sub_821CB030+0x1AC`, which RETURNS quickly because
a peer thread has already signaled. In ours, the iteration-1 wait at
that PC never returns because the corresponding signaler never fires.
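The difference between the two engines can be reproduced in miniature with host threads: a worker that releases a semaphore and then waits on an event makes progress only if some peer sets that event. The sketch uses Python's `threading.Semaphore` and `threading.Event` as stand-ins for the guest NtReleaseSemaphore / NtWait objects; the function name, iteration count, and the timeout used to detect the wedge are illustrative assumptions.

```python
import threading


def run(iterations, with_peer, timeout_s=0.5):
    """Count worker iterations that complete before a wait times out (a "wedge")."""
    sem = threading.Semaphore(0)  # stand-in for the semaphore released via sub_82450218
    wake = threading.Event()      # stand-in for the event awaited at sub_821CB030+0x1AC
    completed = 0

    def peer():
        # Peer producer: signal the event once per release it observes.
        for _ in range(iterations):
            if not sem.acquire(timeout=timeout_s):
                return
            wake.set()

    if with_peer:
        threading.Thread(target=peer, daemon=True).start()

    for _ in range(iterations):
        sem.release()                 # "NtReleaseSemaphore"
        if not wake.wait(timeout_s):  # "NtWait"; False means it timed out
            break                     # wedge: no peer ever signaled
        wake.clear()                  # auto-reset, so the next wait blocks again
        completed += 1
    return completed


print(run(5, with_peer=True), run(5, with_peer=False))  # 5 0
```

The worker code is identical in both calls; only the presence of the signaler differs, which mirrors the tid=13 wedge in ours.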

## Cause attribution

Per the plan's Step 5 framework, attribute to one of 3 candidate causes (item 4 is a new, fourth candidate):

1. **NtQueryFullAttributesFile**: NOT directly evidenced by this iterate. The probe didn't
   capture file-attribute returns.
2. **Shared CS-protected ctx field set by another tid**: STILL UNTESTED. In ours, tid=13's wait
   on event `d5e23609d3948568` depends on another tid signaling it. AUDIT-069 S5's
   "25 producers vs 1" finding confirms ours has 24 missing peer producers, meaning peer
   tids in ours aren't reaching the signal call sites.
3. **Vtable**: NOT directly evidenced.
4. **Loop-count circular wedge (NEW)**: ours tid=13 wedges on its first wait because peer
   producers (themselves blocked downstream of tid=13's blocked work) never fire. The
   originating peer producer is on canary tid=4/10/14 (per AUDIT-069 / step2 report), all of
   which are alive in ours but doing different work (per AUDIT-069 S2: tid=5 in ours fires
   γ-signalers 81× vs 492× for canary tid=10, so ours is **under-producing** signals by ~84%).
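Both ratios quoted above follow directly from the AUDIT-069 counts; a quick arithmetic check (the counts are copied from the text, nothing new is measured):

```python
ours_signals, canary_signals = 81, 492     # AUDIT-069 S2: γ-signaler fires (ours tid=5 vs canary tid=10)
ours_producers, canary_producers = 1, 25   # AUDIT-069 S5: "other producers"

shortfall = 1 - ours_signals / canary_signals
missing_producers = canary_producers - ours_producers
print(f"{shortfall:.1%}", missing_producers)  # 83.5% 24
```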

This iterate's negative result on the branch-arm hypothesis sharpens the picture: the
divergence is NOT a single-branch lattice mismatch inside sub_821CB030's chain. **It is a
distributed multi-thread producer underrun**, with the wedge a downstream symptom of upstream
under-signaling by peer tids that ARE running in ours but executing a different (shorter)
trajectory.

## Tripstones honored

- **#28 (per-engine tid is not stable cross-engine identity)**: confirmed by the Step 2 finding
  that ours tid=13 ≡ canary tid=17 (same entry sub_821748F0). Branch-probe data uses
  per-engine tids; cross-engine comparison is done by (entry_pc, lr) tuple, not raw tid.
- **#32 (canary jitter in contention regions)**: not relevant here; both engines select the eq
  arm at 0x82452E0C on 100% of observed fires. No jitter.
- **#37 (vtable base vs slot-N)**: not encountered (no vtable read at 0x82452E0C).
- **#39 (composite progression vs matched-prefix)**: this iterate produces neither; it is an
  informative null through the source-control-flow lens.
- **#40 (single-keystone hypothesis falsified before)**: Step 2's "single branch arm
  divergence" framing was itself a candidate keystone. It is falsified here.
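Matching threads across engines by a stable key rather than by raw tid can be sketched as a join on the thread entry PC, and probe records can be grouped by `(entry_pc, lr)` with the engine-local tid dropped. The record shape (dicts with `tid`, `entry_pc`, `lr` keys) and both helper names are hypothetical; the only real values used are the entry sub_821748F0 shared by ours tid=13 and canary tid=17.

```python
from collections import defaultdict


def match_threads(ours_entries, canary_entries):
    """Pair per-engine tids whose threads start at the same guest entry PC.

    Both arguments map tid -> thread entry PC. Returns {ours_tid: [canary_tids]};
    a list because an entry PC need not be unique to one thread.
    """
    by_entry = defaultdict(list)
    for tid, entry in canary_entries.items():
        by_entry[entry].append(tid)
    return {tid: by_entry.get(entry, []) for tid, entry in ours_entries.items()}


def group_by_key(records, key_fields=("entry_pc", "lr")):
    """Group probe records by a cross-engine key; the engine-local tid is not part of it."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in key_fields)].append(rec)
    return groups


# Step 2 finding: ours tid=13 and canary tid=17 both start at sub_821748F0.
print(match_threads({13: 0x821748F0}, {17: 0x821748F0}))  # {13: [17]}
```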

## Cascade

- A (identify candidate branch PCs in DB): **PASS** with caveat. 1 candidate at depth 6.
  The initial depth-4 scan returned 0, so the pivot trigger fired.
- B (run both engines with branch-probe): **PASS**. 109 fires canary, 13 fires ours.
- C (find FIRST divergent branch in candidates): **NEGATIVE / informative null**. The single
  candidate is not divergent (both engines take the eq arm).
- D (attribute to one of 3 candidate causes): **MEDIUM**. Reframed as "loop-count
  circular wedge with under-producing peer tids", which subsumes candidate 2 (shared
  CS-protected ctx).
- E (recommend specific next iterate with LOC estimate): **PASS** (see below).

## Recommended next iterate

### Option 2.B — Args/return-value capture for NtQueryFullAttributesFile and key kernel APIs (~30–50 LOC canary)

Extend canary's Phase A event log to populate `args_resolved` and `return_value` for:

- `NtQueryFullAttributesFile`
- `NtCreateFile` (`cache:\<hash>` paths)
- `NtReadFile`, `NtWriteFile`

Compare canary tid=17's 9 NtQueryFullAttributesFile invocations against ours tid=13's 1 to find
the first divergence in cache state. Cheap, high signal.
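Once both logs carry arguments and return values, finding the first divergence is a lockstep comparison of the two per-thread call sequences. A minimal sketch, assuming each logged call has been reduced to a hashable tuple such as `(api, path, status)`; that record shape and the helper name are hypothetical, not the Phase A schema.

```python
from itertools import zip_longest


def first_divergence(canary_calls, ours_calls):
    """Index and pair of the first call that differs between two per-thread sequences.

    Returns None if the sequences are identical. A call missing on one side shows up
    as None in the returned pair, so a sequence that simply stops early is still reported.
    """
    for i, (c, o) in enumerate(zip_longest(canary_calls, ours_calls)):
        if c != o:
            return i, c, o
    return None
```

With 9 canary calls against 1 in ours, the result is either a mismatch at index 0 or, if the first calls agree, index 1 with `None` on the ours side.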

### Option 2.C — read-probe on ours tid=13 wait event memory (~20 LOC reusing AUDIT-068 S3)

Use `audit_68_host_mem_read_probe` to sample the KEVENT struct underlying event handle
`d5e23609d3948568` in ours at 100 µs cadence over the 3 ms wedge window. Capture the moment
(if any) when its `Header.SignalState` would transition. This distinguishes whether the kernel
plumbing is broken or the producer is simply absent.
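At 100 µs cadence, the 3 ms window yields 30 samples. A sketch of the post-processing side only: given `(t_us, signal_state)` samples, report the transitions. The sample format is an assumption, `Header.SignalState` is treated as a plain integer (non-zero meaning signaled), and nothing here reproduces `audit_68_host_mem_read_probe` itself.

```python
CADENCE_US = 100
WINDOW_US = 3_000
N_SAMPLES = WINDOW_US // CADENCE_US  # 30 samples


def transitions(samples):
    """Return (t_us, old_state, new_state) for each change in SignalState between samples."""
    out = []
    for (_, prev), (t, cur) in zip(samples, samples[1:]):
        if cur != prev:
            out.append((t, prev, cur))
    return out


def classify(samples):
    """Report the first 0 -> non-zero edge, if any."""
    rises = [t for t, old, new in transitions(samples) if old == 0 and new != 0]
    return f"signaled at {rises[0]} us" if rises else "never signaled"
```

A rise in SignalState while tid=13 stays blocked would point at the wait plumbing; a flat trace points at the missing producer.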

### Option 2.D — peer-producer LR trace (~0 LOC; reuses existing `--lr-trace` infra)

Per AUDIT-069 S5, ours has 1 producer where canary has 25. Use the existing `--lr-trace` at the
NtReleaseSemaphore call site `0x82450310` plus the NtSetEvent wrappers on ALL tids in ours, and
capture which guest LRs fire during the 0–3 ms window. Diff against canary's
`audit_69_event_signal_watch` JSONL to find which peer-tid call site is MISSING in ours.
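The diff step reduces to a set difference on call sites. A sketch, assuming both traces have already been reduced to iterables of guest LR values (the JSONL layout of `audit_69_event_signal_watch` is not shown in this report, so parsing is left out). Because `bl` leaves LR = call site + 4, the sketch reports call sites; the LR values in the example are placeholders, not data from either trace.

```python
from collections import Counter


def missing_call_sites(canary_lrs, ours_lrs):
    """Call sites (LR - 4) that signal in canary but never in ours, most frequent first."""
    canary = Counter(lr - 4 for lr in canary_lrs)
    ours = {lr - 4 for lr in ours_lrs}
    return [(site, n) for site, n in canary.most_common() if site not in ours]


# Placeholder LRs for illustration only (not from either trace).
canary_lrs = [0x1004, 0x1004, 0x2008, 0x300C]
ours_lrs = [0x1004]
print([(hex(site), n) for site, n in missing_call_sites(canary_lrs, ours_lrs)])
```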

**Best minimum-LOC next step: 2.D** (zero LOC, existing instrumentation; captures peer-producer
absence directly).

**Best disambiguating step: 2.B** (~30–50 LOC) to pin down the upstream cache-state divergence.

## Honest assessment

- The 2-hour timebox was respected.
- Step 1 returned 0 candidates at the initial depth; the broadened search found 1, and that 1 is
  non-divergent.
- The Step 2 report's source-level branch framing **does not survive contact with the
  call graph**. The control-flow divergence is at a higher level (loop count, not branch arm).
- The wedge at `sub_821CB030+0x1AC` remains the symptom; the cause is the **absence of a
  signaler on a peer tid in the 0–3 ms window**. That peer-tid absence is what 2.D would
  identify directly.
- Confidence in pivoting to 2.D/2.B: **HIGH**.

## Artifacts produced

All under `xenia-rs/audit-runs/iterate-2A-branch-probe/`:

- `candidate-branches.csv` — Step 1 broadened search result (1 row).
- `canary-probe.stdout` / `.stderr` / `.lines` — canary 120 s cold run with branch probe.
- `ours-probe.stdout` / `.stderr` — ours `-n 50M` cold run with branch probe.
- `run-commands.txt` — exact CLIs used.
- `iterate2A-report.md` — this report.

LOC delta: 0 to engine code, 0 to canary code. Read-only investigation.

xenia-rs HEAD UNCHANGED. canary HEAD UNCHANGED. Both binaries (`xenia_canary_i2a.exe`,
`xrs-i2a`) are renamed copies; original binaries untouched.