Phase D Stage 1 — Canary Contention Emitter: Result
Date: 2026-05-18
Outcome: LANDED. Canary now emits contention.observed when
RtlEnterCriticalSection_entry falls through to xeKeWaitForSingleObject.
Default cvar-OFF behavior byte-identical to pre-Stage-1 canary.
Engine source change
| file | edit | LOC |
|---|---|---|
| xenia-canary/src/xenia/cpu/cpu_flags.cc | DEFINE_bool(kernel_emit_contention, false, …) | +8 |
| xenia-canary/src/xenia/kernel/event_log.h | kObjCriticalSection = 0x0C + EmitContentionObserved decl | +25 |
| xenia-canary/src/xenia/kernel/event_log.cc | DECLARE_bool + EmitContentionObserved impl | +22 |
| xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc | #include event_log.h + emit at line 624 | +8 |
| schema-v1.md | new §"contention.observed (v1.4)" | +95 (doc) |
| Total | | ~58 LOC engine + ~95 LOC doc |
Build clean: ninja -f build-Debug.ninja xenia_canary.exe → 10 objects re-compiled, links cleanly. Binary renamed to xenia_canary_stage1.exe per stop-hook discipline.
Validation
Gate 1: cvar OFF emits zero contention events
$ wine xenia_canary_stage1.exe --mute=true \
--phase_a_event_log_path=.../canary-cvaroff.jsonl \
"Sylpheed.iso" # 120s timeout → 4.4 GB / 18,616,162 events
$ grep -c "contention.observed" canary-cvaroff.jsonl
0
✓ Zero new event kinds in default cvar-OFF cold run. Pre-Stage-1 byte path preserved (cvar check short-circuits before IsEnabled()).
Gate 2: cvar ON emits contention at the 104,607 region
$ wine xenia_canary_stage1.exe --mute=true --kernel_emit_contention=true \
--phase_a_event_log_path=.../canary-cvaron.jsonl \
"Sylpheed.iso" # 120s timeout → 4.2 GB / ~17 M events
$ grep -c "contention.observed" canary-cvaron.jsonl
7135
Per-tid distribution:
| tid | count | first idx | last idx |
|---|---|---|---|
| 6 (main, ↔ ours tid=1) | 341 | 102,788 | 315,950 |
| 9 | 109 | 386 | 8,217 |
| 10 | 50 | 838 | 41,860 |
| 11 | 7 | 131 | 4,896 |
| 13 | 340 | 281 | 37,591 |
| 14 | 2,506 | 13,342 | 5,710,659 |
| 16 | 3,317 | 339 | 1,810,380 |
| 17 | 27 | 461 | 4,134 |
| 18 | 72 | 360 | 33,086 |
| 22 | 2 | 17 | 37 |
| 26 | 18 | 494 | 6,478 |
| 29 | 346 | 17 | 84,214 |
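A per-tid table like the one above can be derived from the trace with a short script. The following is a minimal sketch, not the tool used for this run: the top-level field names `kind`, `tid`, and `tid_event_idx` are assumptions about the JSONL layout, and `per_tid_distribution` is a hypothetical helper name.

```python
import json


def per_tid_distribution(lines, kind="contention.observed"):
    """Return {tid: (count, first_idx, last_idx)} for events of `kind`.

    ASSUMPTION: each JSONL record carries top-level "kind", "tid" and
    "tid_event_idx" fields; adjust the keys to the real schema.
    """
    stats = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        ev = json.loads(line)
        if ev.get("kind") != kind:
            continue
        tid, idx = ev["tid"], ev["tid_event_idx"]
        count, first, last = stats.get(tid, (0, idx, idx))
        stats[tid] = (count + 1, min(first, idx), max(last, idx))
    return stats
```

Passing an open file handle (for example `open("canary-cvaron.jsonl")`) streams the trace line by line, which matters for multi-GB files.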
Gate 3: contention.observed fires near the 104,607 cap
$ python3 -c "..." < canary-cvaron.jsonl # filter tid=6, 104400 ≤ idx ≤ 104900
104,664 {'cs_ptr': '0xbc65c890', 'site_sid': 'c26a128bf45411f7', 'contended': True}
✓ Exactly one contention event at tid=6 idx 104,664, on cs_ptr 0xbc65c890. The 104,607 cap divergence is canary's tid=6 nested-RtlEnter after this very contention.
Per memory + C+22 analysis: canary's tid=6 contends → blocks on the shared CS dispatcher → another guest thread mutates the protected state → the post-wake, post-acquire branch reads the mutated value → nested-cleanup path (E E L L). On ours, tid=1 takes the fast path: no contention, reads the pre-wait value, simple-release path (E L NtClose). The contention.observed event at idx 104,664 is the marker the Stage-3 manifest will key on.
The plan predicted "near 104,605"; the actual idx is 104,664. The 59-idx offset is within the contention jitter covered by reading-error #32 (3 canary cold samples in C+22 showed similar drift). The Stage 2 manifest builder should NOT hardcode the ordinal; it should consume whatever the cold canary trace reports.
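A Gate 3 style check can treat the predicted ordinal as the centre of a tolerance window instead of an exact match. The sketch below assumes the same hypothetical JSONL layout (top-level `kind`, `tid`, `tid_event_idx`, plus a `payload` object holding `cs_ptr`, `site_sid`, and `contended`); `contention_near` is an illustrative name, and the default tolerance of 100 mirrors the ±100 jitter accepted under #32.

```python
import json


def contention_near(lines, tid, predicted_idx, tolerance=100):
    """Return [(tid_event_idx, payload)] for contention.observed events on
    `tid` whose per-tid index lies within +/- `tolerance` of `predicted_idx`.

    ASSUMPTION: field names ("kind", "tid", "tid_event_idx", "payload")
    are illustrative, not taken from schema-v1.md.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ev = json.loads(line)
        if ev.get("kind") != "contention.observed" or ev.get("tid") != tid:
            continue
        idx = ev["tid_event_idx"]
        if abs(idx - predicted_idx) <= tolerance:
            hits.append((idx, ev.get("payload", {})))
    return hits
```

With the numbers from this run, `contention_near(trace, tid=6, predicted_idx=104605)` would accept the event at 104,664 because the 59-idx offset is inside the window.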
site_sid stability
All tid=6 contention events at cs_ptr 0xbc65c890 use the same site_sid c26a128bf45411f7 — the FNV-1a hash is deterministic over (0xC01AB005, 0, 0xbc65c890, 0x0C). Cross-tid contentions on the same CS produce the same SID (see tid=9 / tid=10's first events at the same cs_ptr / site_sid). Stage 3's manifest lookup can therefore use either field as a key.
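The stability argument rests on FNV-1a being a pure function of its input bytes. The sketch below implements standard 64-bit FNV-1a (the 16-hex-digit SID suggests the 64-bit variant) and hashes a four-field tuple. How canary actually serializes the tuple is not stated here, so the little-endian u32 packing and the name `site_sid_sketch` are assumptions, and the output is not expected to equal c26a128bf45411f7.

```python
import struct

FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Standard 64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def site_sid_sketch(fields) -> str:
    """Hash a 4-tuple of u32 values and format it as 16 hex digits.

    ASSUMPTION: fields are packed as four little-endian u32s; the real
    byte layout used by the emitter may differ.
    """
    payload = struct.pack("<4I", *fields)
    return f"{fnv1a_64(payload):016x}"


if __name__ == "__main__":
    key = (0xC01AB005, 0, 0xBC65C890, 0x0C)
    # Same inputs always give the same SID, regardless of which tid emits.
    print(site_sid_sketch(key) == site_sid_sketch(key))
```

Because the tid is not part of the hashed tuple, two threads contending on the same cs_ptr necessarily produce the same SID.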
Phase B image hash
image_loaded_sha256 = ea8d160e… — UNCHANGED (Stage 1 touches Phase A only).
Reading-error class
No new class earned. Existing protocols applied:
- #28 verify source first: read xboxkrnl_rtl.cc end-to-end before editing; confirmed exact line numbers.
- #32 canary cold-run non-determinism: accepted that the contention idx jitters by ±100; manifest builder is index-aware.
- #33 canary cache lives in binary dir under wine: backed up + restored both xenia-canary/build-cross/bin/Windows/Debug/cache/ and ~/.local/share/Xenia/ before the wipe.
- #34 use .iso not loose .xex: both cold runs against .iso.
Artifacts
- canary-cvaroff-trunc.jsonl: 131 MB truncated cvar-OFF trace (0 contention events)
- canary-cvaron-trunc.jsonl: 133 MB truncated cvar-ON trace (807 contention events post-truncation; full had 7,135)
- /tmp/stage1-canary-binary-cache-backup.tar.gz: pre-stage1 canary binary cache
- /tmp/stage1-canary-xdg-cache-backup.tar.gz: pre-stage1 canary XDG cache
- (Pre-truncation 4.4 GB cvar-OFF + 4.2 GB cvar-ON raw jsonls deleted after truncation to free 8.2 GB disk.)
What's deferred
- The kObjCriticalSection = 0x0C enum value must also be added to ours (event_log.rs) in Stage 3, alongside the symmetric emit. Single-LOC change there.
- Stage 4 will add contention.observed to ENGINE_LOCAL_KINDS in diff_events.py so per-tid ordinals advance past these events without comparison. Until Stage 4 lands, do NOT diff cvar-ON canary traces against ours (the kind is unrecognized).
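The Stage 4 behaviour can be illustrated as follows. This is a sketch of the idea only, not the contents of diff_events.py: the `kind` field name, the `first_divergence` helper, and the choice to compare only the kind are all assumptions made for illustration.

```python
# ASSUMPTION: mirrors the intent of ENGINE_LOCAL_KINDS, not its real contents.
ENGINE_LOCAL_KINDS = {"contention.observed"}


def first_divergence(canary, ours, local_kinds=ENGINE_LOCAL_KINDS):
    """Walk two per-tid event lists and return (canary_idx, ours_idx) of the
    first mismatch, or None if no mismatch is found.

    Events whose kind is engine-local advance that side's ordinal but are
    never compared. Trailing unmatched events are ignored in this sketch.
    """
    i = j = 0
    while i < len(canary) and j < len(ours):
        if canary[i]["kind"] in local_kinds:
            i += 1  # ordinal advances, no comparison
            continue
        if ours[j]["kind"] in local_kinds:
            j += 1
            continue
        if canary[i]["kind"] != ours[j]["kind"]:
            return (i, j)
        i += 1
        j += 1
    return None
```

Without the skip, the first contention.observed event in a cvar-ON canary trace would be reported as a divergence against ours, which is why diffing is blocked until Stage 4 lands.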
Next session
Stage 2 — manifest builder (~150 LOC python at xenia-rs/tools/diff-events/build_contention_manifest.py). Distills cvar-ON canary jsonl into a contention_manifest.json keyed on (tid, tid_event_idx, site_sid). Filters contended=true (only kind v1.4 emits anyway). Sorts by (tid, tid_event_idx).
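The core of such a builder could look like the sketch below. It is a starting point under stated assumptions, not the planned ~150 LOC tool: the record layout (top-level `kind`, `tid`, `tid_event_idx`, and a `payload` object with `site_sid` and `contended`) is hypothetical, as is the function name `build_contention_manifest`.

```python
import json


def build_contention_manifest(lines):
    """Distill JSONL trace lines into a sorted list of manifest entries.

    ASSUMPTION: records look like
      {"kind": ..., "tid": ..., "tid_event_idx": ..., "payload": {...}}
    with "site_sid" and "contended" inside "payload".
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ev = json.loads(line)
        if ev.get("kind") != "contention.observed":
            continue
        payload = ev.get("payload", {})
        if not payload.get("contended"):
            continue  # defensive: v1.4 only emits contended=true
        entries.append({
            "tid": ev["tid"],
            "tid_event_idx": ev["tid_event_idx"],
            "site_sid": payload["site_sid"],
        })
    # Ordinals come from the trace itself; nothing is hardcoded.
    entries.sort(key=lambda e: (e["tid"], e["tid_event_idx"]))
    return entries


def write_manifest(entries, path):
    """Serialize the manifest entries as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(entries, fh, indent=2)
```

Keeping the builder a pure function of the input trace means a re-run against a fresh cold canary trace picks up whatever ordinals that run produced.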