# Canary Variance Characterization: Reading-Error #32

## Source data

Re-analysis of the C+22 archived jitter jsonls + ours-cold.jsonl from `xenia-rs/audit-runs/phase-c22-rtl-enter-leave-control-flow/`. No fresh runs done in this session; the C+22 samples (4 canary cold runs + 1 ours cold run) are sufficient to characterize.

## Files inspected

- `canary-cold-trunc.jsonl` (494 MB, truncated to ~250k tid=6 events): fresh c22
- `ours-cold.jsonl` (28 MB, 121,569 events)
- Archived: jitter-1, jitter-2, jitter-3 (referenced in C+22 memory + `investigation.md`)
- `cold-vs-cold-result.md`: variance table

## Variance summary at tid=6 idx 104,604..104,620

Pattern of `import.call` events (E = RtlEnterCriticalSection, L = RtlLeaveCriticalSection):

| sample | observed pattern | wait.begin slow-path? | notes |
|---|---|---|---|
| C+21 archived (jitter-2 equivalent) | E E L L | no | fast-path acquire, fast-path nested-acquire, two releases |
| canary jitter-1 | E **wait.begin** E L L | yes (between first E's call and return) | slow-path on the OUTER acquire |
| canary jitter-2 | E E L L | no | same as C+21 |
| canary jitter-3 | E E L L (shifted by +3 indices upstream) | no | upstream tid=6 events have different ordering |
| fresh c22 | E **wait.begin** E L L | yes | same shape as jitter-1 |
| **ours cold** | **E L NtClose** | no | NO nested acquire; releases and proceeds to close |

## Key observations

1. **Canary 5/5 samples** have the second (nested) `E` regardless of whether the outer acquire took the slow path. The nested-Enter is canary-structural, not jitter.
2. **wait.begin presence varies**: 2 of 5 canary samples emit it, 3 of 5 don't. The C+21 floating absorber correctly masks both cases via the shared-global SID `75ae880ec432eb36`.
3. **Ours-cold takes a different control-flow path**: no second E, no nested cleanup, proceeds straight to NtClose. This is `RtlLeaveCriticalSection` followed by `NtClose` on the Event handle that the CS was protecting.
4. The C+21 floating-absorb engages correctly in all canary samples (`floating_create (c/o) = 1/0`, `floating_wait (c/o)` varies 0-3/0). Matched-prefix is invariant at 104,607 across all canary cold samples after absorption.

## The structural divergence

After the C+21 absorber runs, the next event index on each side is:

- **Canary**: `import.call RtlEnterCriticalSection` (the nested second E at canary idx 104,610, post-absorption-aligned to ours idx 104,607).
- **Ours**: `import.call RtlLeaveCriticalSection` (the simple release at ours idx 104,607).

These are different guest control-flow paths. Both are correct executions of the SAME guest code under different scheduling assumptions:

- **Canary path**: tid=6 blocked on the dispatcher Event while another guest thread acquired the CS, mutated protected state (queue ptr / refcount / signaled flag), released, transferred the CS to tid=6. tid=6 woke, post-acquire branch reads MUTATED state, takes nested-cleanup path.
- **Ours path**: tid=1 (mapped from canary tid=6) was running monolithically under the cooperative scheduler. No other thread ran during what would have been the wait window. Post-acquire branch reads PRE-WAIT state (unchanged), takes simple-release path.

## Variance taxonomy

| variance dimension | observable | absorbable by current diff tool? | root cause |
|---|---|---|---|
| Whether wait.begin event fires | yes (event present/absent) | YES (C+21 absorber, shared-global SID) | host-OS scheduler decided contention/no-contention timing |
| Index offset in upstream events | yes (idx shifts ±3 across samples) | partial (C+21 absorbs ≤1 floating per side) | upstream contention propagates index drift |
| Whether nested Enter/Leave block fires | yes (E E L L vs E L) | NO (would cross reading-error #23) | post-wait state mutation by another thread; real guest control-flow |
| First-toucher tid for shared dispatcher | yes (varies tid=9, others) | YES (C+18 shared-global SID scheduling-invariant) | host-OS scheduler decided first-thread-touches-dispatcher |
| handle.create raw_handle_id | yes (differs across runs) | YES (SKIP_PAYLOAD_FIELDS) | canary stashes handle-table slot; ours uses dispatcher VA |
| KeQuerySystemTime returned value | yes (wallclock vs fixed) | partial (already-known void-export pattern from C+1) | canary wallclock vs ours fixed FILETIME |

## What this means for the plan

The C+21 absorber handles the *observation-side* jitter (the wait.begin event itself; the upstream index drift) up to the boundary of reading-error #23. Past 104,607, the variance becomes *state-side*: canary's tid=6 reads mutated protected state, ours's tid=1 doesn't. No event-level absorption can hide a different sequence of guest-code-executed instructions.

This is why the plan recommends approach H' (manifest replay): make ours produce the same state-side outcome (mutated CS state after a real wait) so that ours's tid=1 takes the same nested-cleanup path canary's tid=6 takes. The absorber stays unchanged; ours's events become structurally identical to canary's.

## Fresh re-runs not performed

This session is plan-only: no fresh `wine xenia_canary --mute=true` cold runs. The C+22 jitter-1/2/3 + c21 + c22 samples are sufficient to characterize variance for plan-design purposes. Fresh re-runs will happen during Stage 0 spike and Stage 1 implementation per the validation criteria in `plan.md`.

## Reading-error #32 status

**MITIGATED** at the diff-tool layer for shared-global SIDs (C+18) and wait.begin (C+21). Residual variance at 104,607 is OUT of #32 scope; it's state-mutation timing, addressed by the plan's Stage 3 forced-contention replay.
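
For reference, the E/L patterns in the variance summary, and the "nested Enter present?" check that separates canary from ours, could be reproduced with something like the sketch below. It is not the diff tool's code. The jsonl field names (`tid`, `idx`, `kind`, `name`) are assumptions, since the real event schema is not reproduced in this note, and the sample events are synthetic stand-ins shaped like the jitter-1 and ours-cold rows.

```python
import json

# Abbreviations used in the variance summary table.
ABBREV = {"RtlEnterCriticalSection": "E", "RtlLeaveCriticalSection": "L"}


def window_pattern(lines, tid, lo, hi):
    """Collapse one thread's events in idx range [lo, hi] into pattern tokens.

    Assumed (hypothetical) schema: each jsonl line is an object with
    `tid`, `idx`, `kind` and, for import.call events, `name`.
    """
    tokens = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ev = json.loads(line)
        if ev.get("tid") != tid or not (lo <= ev.get("idx", -1) <= hi):
            continue
        kind = ev.get("kind")
        if kind == "wait.begin":
            tokens.append("wait.begin")
        elif kind == "import.call":
            name = ev.get("name", "?")
            tokens.append(ABBREV.get(name, name))
    return tokens


def has_nested_enter(tokens):
    """True if a second Enter happens while the first is still held.

    wait.begin tokens are ignored, so the slow-path and fast-path
    canary shapes classify the same way.
    """
    depth = 0
    for tok in tokens:
        if tok == "E":
            depth += 1
            if depth >= 2:
                return True
        elif tok == "L":
            depth = max(0, depth - 1)
    return False


def _ev(tid, idx, kind, name=None):
    """Build one synthetic jsonl line (illustration only)."""
    ev = {"tid": tid, "idx": idx, "kind": kind}
    if name is not None:
        ev["name"] = name
    return json.dumps(ev)


# Synthetic samples; indices are placeholders inside the window.
CANARY_SLOW = [
    _ev(6, 104604, "import.call", "RtlEnterCriticalSection"),
    _ev(6, 104605, "wait.begin"),
    _ev(6, 104606, "import.call", "RtlEnterCriticalSection"),
    _ev(6, 104607, "import.call", "RtlLeaveCriticalSection"),
    _ev(6, 104608, "import.call", "RtlLeaveCriticalSection"),
]
OURS_COLD = [
    _ev(1, 104604, "import.call", "RtlEnterCriticalSection"),
    _ev(1, 104605, "import.call", "RtlLeaveCriticalSection"),
    _ev(1, 104606, "import.call", "NtClose"),
]

if __name__ == "__main__":
    for label, sample, tid in (("canary", CANARY_SLOW, 6), ("ours", OURS_COLD, 1)):
        pat = window_pattern(sample, tid, 104604, 104620)
        print(label, " ".join(pat), "nested:", has_nested_enter(pat))
```

Ignoring `wait.begin` in the nesting check mirrors the split in the taxonomy: the wait event is observation-side jitter, while the presence of the second Enter is the state-side difference that no event-level absorption can hide.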