# Phase D Stage 3 (+ Stage 4) - Contention Replay: Result

**Date**: 2026-05-18

**Outcome**: **LANDED.** Stage 3 ours-side contention-replay infrastructure + Stage 4 diff-tool engine-local kind. Default-mode digest **byte-identical** to pre-Stage-3 baseline. Replay-mode digest stable × 3. Sister chains preserved.

**Main matched-prefix at 104,607, unchanged from baseline.** Stage 3 lands as infrastructure; it does not unblock the cap on its own because the 104,607 divergence is **upstream** of any `contention.observed` event in the canary trace.

## Engine source change

| file | LOC | purpose |
|---|---|---|
| [xenia-rs/crates/xenia-kernel/src/contention_manifest.rs](../../crates/xenia-kernel/src/contention_manifest.rs) | **new file, 280** | manifest loader + `(tid, idx) → Entry` HashMap behind Mutex + `consume_at_peek(tid, peek_idx)` with per-tid emit-count translation back to canary's idx space + 12 unit tests |
| [xenia-rs/crates/xenia-kernel/src/lib.rs](../../crates/xenia-kernel/src/lib.rs) | +1 | `pub mod contention_manifest;` |
| [xenia-rs/crates/xenia-kernel/src/event_log.rs](../../crates/xenia-kernel/src/event_log.rs) | +35 | `object_type::CRITICAL_SECTION = 0x0C` enum value + `emit_contention_observed(tid, guest_cycle, cs_ptr, contended)` helper (mirrors canary's `EmitContentionObserved`) |
| [xenia-rs/crates/xenia-kernel/src/state.rs](../../crates/xenia-kernel/src/state.rs) | +20 | `KernelState.contention_manifest: Option<…>` field + `install_contention_manifest()` setter |
| [xenia-rs/crates/xenia-kernel/src/exports.rs](../../crates/xenia-kernel/src/exports.rs) | +75 | `rtl_enter_critical_section` body consults the manifest via `consume_at_peek`, emits `contention.observed` on hit, falls through to natural code (conservative deadlock-safe mode); aggressive mode behind `XENIA_CONTENTION_AGGRESSIVE=1` env var (no-go per probe; see below) |
| [xenia-rs/crates/xenia-app/src/main.rs](../../crates/xenia-app/src/main.rs) | +30 | `XENIA_CONTENTION_MANIFEST_PATH` env-var loader + tracing log on success/failure |
| **Engine total** | **~440 LOC additive across 6 files** | |

| diff-tool file | LOC | purpose (Stage 4 bundled) |
|---|---|---|
| [xenia-rs/tools/diff-events/diff_events.py](../../tools/diff-events/diff_events.py) | +30 | new `ENGINE_LOCAL_KINDS = {"contention.observed"}` set + skip branches in `diff_one_tid` so the per-tid pointer advances past these events on EITHER side without comparison |
| [xenia-rs/tools/diff-events/test_diff_events.py](../../tools/diff-events/test_diff_events.py) | +65 | 2 new tests covering both-sides and one-sided engine-local skip |
| [xenia-rs/tools/diff-events/build_contention_manifest.py](../../tools/diff-events/build_contention_manifest.py) | +45 | added `--tid-map CANARY=OURS,...` translation so the manifest stores ours-side tids (so the Stage-3 lookup keys on the consumer's native current_tid) |
| [xenia-rs/tools/diff-events/test_build_manifest.py](../../tools/diff-events/test_build_manifest.py) | +50 | 2 new tests covering tid-map translation + unmapped-tid drop |
| **Tooling total** | **~190 LOC additive** | |

## Tests

- `cargo test -p xenia-kernel --lib`: **216 PASS** (was 213; +3 new tests in contention_manifest: `consume_at_peek_translates_idx`, `consume_at_peek_miss_does_not_bump_emit_count`, `consume_at_peek_per_tid_independent`, plus 9 earlier ones covering load, consume, peek, version-check, missing-fields-error, cs_ptr-parse).
- `cargo test -p xenia-cpu --lib`: 300 PASS (unchanged from Stage 0).
- `python3 test_build_manifest.py`: **11 PASS** (was 9, +2 for tid-map).
- `python3 test_diff_events.py`: all pre-existing PASS + 2 new (`test_engine_local_contention_observed_skipped_both_sides`, `test_engine_local_one_sided_contention_observed`).
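
The Stage 4 engine-local skip listed in the tooling table can be pictured with a minimal sketch. This is not the code in `diff_events.py`: the `kind` field name, the whole-dict equality comparison, and the helper names `skip_engine_local` and `matched_prefix` are assumptions made for illustration. Only the `ENGINE_LOCAL_KINDS` set and the "advance past these events on either side" behavior come from the table above.

```python
# Event kinds that one engine may emit and the other may not; they are
# stepped over rather than compared.
ENGINE_LOCAL_KINDS = {"contention.observed"}


def skip_engine_local(events, i):
    """Return the first position >= i that is not an engine-local event."""
    while i < len(events) and events[i]["kind"] in ENGINE_LOCAL_KINDS:
        i += 1
    return i


def matched_prefix(canary, ours):
    """Count comparable events that agree before the first divergence.

    Assumption: events are plain dicts compared for full equality. The real
    tool compares a chosen set of deterministic fields instead.
    """
    i = j = matched = 0
    while True:
        # Advance each side independently past engine-local events.
        i = skip_engine_local(canary, i)
        j = skip_engine_local(ours, j)
        if i >= len(canary) or j >= len(ours):
            return matched
        if canary[i] != ours[j]:
            return matched
        matched += 1
        i += 1
        j += 1


canary = [
    {"kind": "import.call", "name": "RtlEnter"},
    {"kind": "contention.observed"},            # present on one side only
    {"kind": "kernel.return", "value": 0},
    {"kind": "import.call", "name": "RtlEnter"},
]
ours = [
    {"kind": "import.call", "name": "RtlEnter"},
    {"kind": "kernel.return", "value": 0},
    {"kind": "import.call", "name": "RtlLeave"},
]
print(matched_prefix(canary, ours))
```

Because the skip runs on both sides before every comparison, a one-sided `contention.observed` neither counts as a match nor triggers a divergence, which is the property the two new `test_diff_events.py` tests are described as covering.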
## Cold-run validation

### Gate 1: default-mode digest byte-identical to Stage 0 baseline

Without `XENIA_CONTENTION_MANIFEST_PATH` set:

```
$ XENIA_CACHE_WIPE=1 xenia-rs-stage3 exec --phase-a-event-log ours.jsonl \
    -n 50000000 --quiet Sylpheed.iso
$ det_digest.py ours.jsonl
det_fields_md5 = ba5b5e0795ccb32966a49d3b2917a30d  <-- same as Stage 0 baseline
total_events   = 121569
```

✓ Stage 3's manifest-check fast-path costs ≈ one `Option::as_ref().and_then(…)` per `rtl_enter_critical_section` call. Default-mode behavior preserved bit-for-bit. Phase B `image_loaded_sha256 = ea8d160e…` UNCHANGED.

### Gate 2: replay-mode digest stable × 3

With manifest installed (807 entries → 284 entries after tid-map filter):

| run | digest | total_events |
|---|---|---|
| 1 | `1d7c6b4592d024405cd9d86eb79f5307` | 121571 |
| 2 | `1d7c6b4592d024405cd9d86eb79f5307` | 121571 |
| 3 | `1d7c6b4592d024405cd9d86eb79f5307` | 121571 |

✓ Bit-stable × 3 under replay. A new digest is expected: the 2 `contention.observed` emits add two events (121,569 → 121,571 total) and shift later per-tid idx values by up to +2 starting at idx 102,788, producing a different but deterministic byte sequence.

### Gate 3: replay-mode matched-prefix vs canary cvar-ON

```
$ python3 diff_events.py \
    --canary canary-cvaron-trunc.jsonl \
    --ours stage3-replay.jsonl \
    --tid-map 6=1,7=2,4=11,12=7,14=9,15=10

| canary_tid | ours_tid | matched | first_divergence_at |
|---|---|---|---|
| 4 | 11 | 11 | — |
| 6 | 1 | 104607 | 104607 |
| 7 | 2 | 32 | — |
| 12 | 7 | 4 | 4 |
| 14 | 9 | 41 | 41 |
| 15 | 10 | 16 | — |
```

✓ Main matched-prefix **104,607, same as pre-Phase-D baseline.** Sister chains preserved (11/32/4/41/16, identical to C+22 baseline). Stage 3 does not break the prefix; nor does it advance it.

## Why the cap isn't unblocked

The 104,607 divergence is at canary's tid=6 idx 104,610 (nested RtlEnter) vs ours's tid=1 idx 104,608 (RtlLeave). Both engines completed the **outer** RtlEnter (at idx 104,608 in canary, 104,606 in ours) with `return_value=0` and no `contention.observed` event.

The nearest canary contention past that point is at idx **104,664**: AFTER the cap divergence, on a DEEPER RtlEnter call further into the control-flow branch that ours never enters.

In other words: **the cap is upstream of any contention.** Replaying canary's `contention.observed` events at 102,788 and 104,664 happens either too early (102,788, well before the cap) or too late (104,664, by which point ours has already diverged at 104,607 and never reaches that ordinal in the same logical position). The manifest mechanism is correctly aligned to canary's contention events; the cap simply isn't a contention event.

The 104,610 nested RtlEnter in canary vs the RtlLeave in ours is guest-code-driven: same PPC code, same outer-Enter return value (0), different next-call decision. Likely cause: some other guest memory state (not the CS struct itself) has diverged between canary and ours upstream of idx 104,610, and the guest's branch decision reads that state. That's a state-divergence root cause, NOT a scheduling-determinism one.

## Aggressive mode probe (XENIA_CONTENTION_AGGRESSIVE=1)

Tested behind an env-var gate to confirm: forcing the park *unconditionally* when the manifest hits (even when the CS is free in guest memory) **regresses** the trace catastrophically.

- main matched-prefix: 102,789 (-1,818 vs baseline)
- ours_total events: 1,019,208 (vs 121,569 default; about 8.4× larger)
- Sister chains: 4 of 5 entirely absent (tid=11/7/9/10 produced zero events)
- Cause: tid=1 force-parks on a free CS at idx 102,788. No peer touches the CS during the wait. `Scheduler::unblock_on_deadlock` eventually recovers with `owner=0`, but downstream guest state is now corrupted (the RtlEnter returned with owner unset). The other tids never reach the spawn point because tid=1 was supposed to drive their setup.

Aggressive mode is gated off by default (`XENIA_CONTENTION_AGGRESSIVE=1` is an explicit opt-in). Conservative mode is the landed default.
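
The difference between the two modes comes down to one decision per manifest hit. The sketch below is a Python model of that decision, not the Rust code in `exports.rs`: the function name `should_park` and its parameters are hypothetical, and the peer-owned branch is an assumption read off the probe description (aggressive parks "even when CS is free", so conservative is taken to park only when a peer actually holds the lock).

```python
def should_park(manifest_hit, cs_owner, current_tid, aggressive=False):
    """Decide whether a replayed contention should block the calling thread.

    manifest_hit : True when the manifest has an entry for this (tid, idx)
    cs_owner     : owning tid read from the guest CS, 0 meaning free
    current_tid  : tid of the thread calling RtlEnterCriticalSection
    aggressive   : models XENIA_CONTENTION_AGGRESSIVE=1
    """
    if not manifest_hit:
        # No replay entry: always take the natural path.
        return False
    if aggressive:
        # Probe behavior: park on every hit, even on a free CS.
        return True
    # Conservative: a free or self-owned CS has no peer that could ever
    # release it, so parking would deadlock. Skip the replay instead.
    if cs_owner == 0 or cs_owner == current_tid:
        return False
    # Assumption: a peer-owned CS is the only case where waiting is safe.
    return True


# The idx 102,788 hit from the probe: tid=1, CS free (owner=0).
print(should_park(True, cs_owner=0, current_tid=1))                   # conservative
print(should_park(True, cs_owner=0, current_tid=1, aggressive=True))  # aggressive
```

Under this model the conservative call declines to park on the free CS while the aggressive call parks, which matches the force-park on a free CS that the probe identifies as the cause of the regression.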
## What the manifest *did* observe

Per the run log (debug level):

```
manifest cs_ptr cross-engine divergence at tid=1 idx=102788: manifest 0xbc65c890, ours 0x40544890 (allocator ε)
manifest hit at tid=1 idx=102788 cs=0x40544890 but CS is free/self-owned (owner=0); replay skipped (state-divergence, not schedule-divergence)
manifest cs_ptr cross-engine divergence at tid=1 idx=104664: manifest 0xbc65c890, ours 0x828f39d0 (allocator ε)
manifest hit at tid=1 idx=104664 cs=0x828f39d0 but CS is free/self-owned (owner=0); replay skipped (state-divergence, not schedule-divergence)
```

Two hits fired, both at the right ordinal. Both fell into the conservative "skip" branch because:

1. The cs_ptr canary recorded (`0xbc65c890`) and the cs_ptr ours sees (`0x40544890` / `0x828f39d0`) differ. This is the AUDIT-043 allocator ε divergence. We don't gate on it (we trust the `(tid, idx)` alignment), but it's logged.
2. In ours's guest memory the CS owner is 0 (free), so no peer is holding the lock. Force-parking here would deadlock; the conservative branch skips and falls through to the natural fast-path.

## Reading-error class

No new class. Existing protocols:

- **#28** verify source first: read `rtl_enter_critical_section`, `event_log.rs`, `KernelState` end-to-end before editing.
- **#32** canary contention jitter: handled at the manifest layer (per-tid + idx key, no ordinal hardcoding).
- **AUDIT-043** allocator ε: manifest cs_ptr (canary heap) ≠ ours cs_ptr (different heap); the diff tool handles this for `kernel.return` via per-tid allocator ordinal canonicalization. The Stage-3 manifest doesn't have that translation but doesn't need it: matching on `(tid, idx)` is sufficient because both engines emit the RtlEnter at the same per-tid ordinal within the matched prefix.
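
As a rough illustration of what "allocator ordinal canonicalization" can mean, the sketch below rewrites each pointer as the ordinal of its first appearance, so two engines that allocate from different heaps still produce comparable sequences. This is an assumption about the general technique, not a copy of the diff tool's code: the function name `canonicalize_pointers` is hypothetical and the addresses are made up for the example.

```python
def canonicalize_pointers(pointers):
    """Map each pointer to the ordinal of its first appearance in the stream.

    Two streams that allocate and reuse objects in the same order get the
    same ordinal sequence even when the raw addresses differ.
    """
    ordinals = {}
    out = []
    for p in pointers:
        if p not in ordinals:
            ordinals[p] = len(ordinals)  # next unused ordinal
        out.append(ordinals[p])
    return out


# Made-up addresses: same allocation pattern, different heap bases.
canary_ptrs = [0xB0000010, 0xB0000020, 0xB0000010]
ours_ptrs = [0x40000010, 0x40000020, 0x40000010]

print(canonicalize_pointers(canary_ptrs))
print(canonicalize_pointers(ours_ptrs))
```

Applied per tid, a mapping like this lets return values that carry heap addresses compare equal across engines. The Stage-3 manifest sidesteps the problem entirely by keying on `(tid, idx)` rather than on the pointer.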
## Artifacts

- [contention_manifest.json](../phase-d-stage1/contention_manifest.json): 284 entries (tid-map applied)
- `/tmp/stage3-conservative-r{1,2,3}.jsonl`: replay-mode cold runs (digest `1d7c6b45…` × 3)
- `/tmp/stage3-default-r1.jsonl`: default-mode cold run (digest `ba5b5e07…`)
- `/tmp/stage3-aggressive-r1.jsonl`: aggressive-mode probe (-1,818 prefix; do NOT use as baseline)

## Post-landing divergence forensics (root-cause clue)

Trace inspection of the divergent region in BOTH engines:

**Ours's tid=1 idx 104,604..104,613**:

```
import.call RtlEnter → kernel.return →
import.call RtlLeave → kernel.return →
import.call NtClose → handle.destroy {SID: f02c5bda6f21992e, raw: 0x1068, prior_refcount: 1}
```

**Canary's tid=6 idx 104,607..104,622**:

```
import.call RtlEnter → kernel.return →
import.call RtlEnter → kernel.return   (NESTED)
import.call RtlLeave → kernel.return   (inner)
import.call RtlLeave → kernel.return   (outer)
import.call NtClose → handle.destroy on the SAME logical Event
```

The handle being closed is an **Event** (object_type=1). Its lineage in ours's trace:

| idx | host_ns | event |
|---|---|---|
| 104,387 | 515.2ms | `handle.create` Event 0x1068 |
| 104,572 | 516.4ms | `wait.begin` tid=1 on 0x1068, `timeout_ns=-1` (indefinite) |
| 104,612 | 519.7ms | `handle.destroy` 0x1068 |

The Event is signaled between the `wait.begin` and the destroy. Scanning ours's trace for the signaler during the wait window (516.4-519.7ms) found: **tid=5 calls `NtSetEvent` at host_ns 519.3ms**, about 2.9ms into the wait and at most 0.4ms before tid=1 wakes and proceeds to the Enter/Leave/Close sequence.

So the divergence picture:

1. tid=1 creates Event E (notification primitive).
2. tid=1 blocks on E.
3. tid=5 (or another peer) signals E via `NtSetEvent`.
4. tid=1 wakes, acquires CS, **checks some queue/list protected by the CS**, optionally does **nested cleanup work**, releases CS, closes E.
5. The "optionally does nested work" branch **fires only in canary** because canary's peer tids have produced more work items by the time tid=1 wakes. Ours's peer tids produce fewer items in the same wall-time window.

**Conclusion**: the 104,607 cap is a **workload-interleaving / state-accumulation divergence**, not a scheduling-determinism one. Stage 3's contention replay doesn't address it because the contention canary observes (at idx 104,664) is INSIDE the nested-cleanup branch ours never enters.

This is C+22's "state-mutation-during-wait" hypothesis confirmed at the trace level: peer tids in canary mutate more shared state during the wait window than peer tids in ours do.

## Decision

Stage 3 + 4 LAND as infrastructure. The 104,607 cap is **NOT** unblocked by this work alone because the divergence is upstream of any contention.

Next steps (in priority order):

1. **Stage 5**: per-CS hardcoded yield. The plan's fallback: hardcode a yield (NOT a park) at a specific CS pointer / call site near idx 104,610 in ours to shift the scheduling enough that some other tid runs and mutates state, hopefully advancing the prefix. ~30 LOC, narrow scope.
2. **State-divergence root-cause investigation**: disassemble guest code at the call site of idx 104,610 to identify what state the guest reads to decide nested-Enter vs Leave. Likely some shared variable/refcount mutated by another tid in canary but not in ours.
3. **D-extension**: extend the diff tool's `wait.begin` absorber to also fold the post-acquire `E E L L` nested-cleanup block when followed by the matching `E L NtClose` pattern. The plan tags this as a band-aid crossing reading-error #23.

## Phase B image hash

`image_loaded_sha256 = ea8d160e…` (UNCHANGED).

## Next session

**Stage 5 OR state-divergence investigation**, per the user's call.