xenia-rs/audit-runs/phase-d-stage1/result.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Phase D Stage 1 — Canary Contention Emitter: Result

Date: 2026-05-18

Outcome: LANDED. Canary now emits contention.observed when RtlEnterCriticalSection_entry falls through to xeKeWaitForSingleObject. Default (cvar-OFF) behavior is byte-identical to the pre-Stage-1 canary.

Engine source change

| File | Edit | LOC |
|---|---|---|
| xenia-canary/src/xenia/cpu/cpu_flags.cc | DEFINE_bool(kernel_emit_contention, false, …) | +8 |
| xenia-canary/src/xenia/kernel/event_log.h | kObjCriticalSection = 0x0C + EmitContentionObserved decl | +25 |
| xenia-canary/src/xenia/kernel/event_log.cc | DECLARE_bool + EmitContentionObserved impl | +22 |
| xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc | #include event_log.h + emit at line 624 | +8 |
| schema-v1.md | new §"contention.observed (v1.4)" | +95 (doc) |
| Total | | ~58 LOC engine + ~95 LOC doc |

Build clean: ninja -f build-Debug.ninja xenia_canary.exe → 10 objects recompiled, links cleanly. Binary renamed to xenia_canary_stage1.exe per stop-hook discipline.

Validation

Gate 1: cvar OFF emits zero contention events

```
$ wine xenia_canary_stage1.exe --mute=true \
    --phase_a_event_log_path=.../canary-cvaroff.jsonl \
    "Sylpheed.iso"   # 120s timeout → 4.4 GB / 18,616,162 events
$ grep -c "contention.observed" canary-cvaroff.jsonl
0
```

✓ Zero new event kinds in the default cvar-OFF cold run. The pre-Stage-1 byte path is preserved: the cvar check short-circuits before IsEnabled().
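The short-circuit ordering behind Gate 1 can be modeled in a few lines. This is a sketch of the control flow only, not canary's C++: FakeLog, emit_contention_observed and the dict event shape are hypothetical stand-ins for the Phase A event log, EmitContentionObserved and IsEnabled().

```python
from dataclasses import dataclass, field


@dataclass
class FakeLog:
    """Hypothetical stand-in for the Phase A event log."""
    enabled: bool = True
    is_enabled_calls: int = 0
    events: list = field(default_factory=list)

    def is_enabled(self) -> bool:
        # Count queries so the short-circuit is observable.
        self.is_enabled_calls += 1
        return self.enabled


def emit_contention_observed(log: FakeLog, kernel_emit_contention: bool,
                             cs_ptr: int, site_sid: str) -> None:
    """Hypothetical model of the emit at the RtlEnter wait fall-through."""
    # Cvar first: with it OFF the log is never queried and nothing is written.
    if not kernel_emit_contention:
        return
    if not log.is_enabled():
        return
    log.events.append({
        "kind": "contention.observed",
        "cs_ptr": hex(cs_ptr),
        "site_sid": site_sid,
        "contended": True,
    })
```

Because the cvar test comes first, a cvar-OFF run never consults the log at all, so the default path does no additional work.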

Gate 2: cvar ON emits contention at the 104,607 region

```
$ wine xenia_canary_stage1.exe --mute=true --kernel_emit_contention=true \
    --phase_a_event_log_path=.../canary-cvaron.jsonl \
    "Sylpheed.iso"   # 120s timeout → 4.2 GB / ~17 M events
$ grep -c "contention.observed" canary-cvaron.jsonl
7135
```

Per-tid distribution:

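| tid | count | first idx | last idx |
|---|---|---|---|
| 6 (main, ↔ ours tid=1) | 341 | 102,788 | 315,950 |
| 9 | 109 | 386 | 8,217 |
| 10 | 50 | 838 | 41,860 |
| 11 | 7 | 131 | 4,896 |
| 13 | 340 | 281 | 37,591 |
| 14 | 2,506 | 13,342 | 5,710,659 |
| 16 | 3,317 | 339 | 1,810,380 |
| 17 | 27 | 461 | 4,134 |
| 18 | 72 | 360 | 33,086 |
| 22 | 2 | 17 | 37 |
| 26 | 18 | 494 | 6,478 |
| 29 | 346 | 17 | 84,214 |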
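A per-tid table like this one can be produced in a single streaming pass over the jsonl, which matters for multi-GB traces. The field names kind, tid and tid_event_idx are assumptions about the Phase A event shape, not taken from schema-v1.md.

```python
import json
from collections import defaultdict


def per_tid_distribution(lines, kind="contention.observed"):
    """Return {tid: (count, first_idx, last_idx)} for events of `kind`.

    Assumes (hypothetically) one JSON object per line with "kind", "tid"
    and "tid_event_idx" fields.
    """
    stats = defaultdict(lambda: [0, None, None])
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ev = json.loads(line)
        if ev.get("kind") != kind:
            continue
        entry = stats[ev["tid"]]
        idx = ev["tid_event_idx"]
        entry[0] += 1
        entry[1] = idx if entry[1] is None else min(entry[1], idx)
        entry[2] = idx if entry[2] is None else max(entry[2], idx)
    return {tid: tuple(v) for tid, v in sorted(stats.items())}
```

Passing an open file object (for example the result of open("canary-cvaron.jsonl")) keeps memory flat, since lines are consumed one at a time.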

Gate 3: contention.observed fires near the 104,607 cap

```
$ python3 -c "..." < canary-cvaron.jsonl  # filter tid=6, 104400 ≤ idx ≤ 104900
104,664 {'cs_ptr': '0xbc65c890', 'site_sid': 'c26a128bf45411f7', 'contended': True}
```

✓ Exactly one contention event in that window: tid=6, idx 104,664, on cs_ptr 0xbc65c890. The 104,607 cap divergence is canary's tid=6 nested RtlEnter that follows this very contention.

Per memory + C+22 analysis: canary's tid=6 contends → blocks on the shared CS dispatcher → another guest thread mutates the protected state → the post-wake, post-acquire branch reads the mutated value → nested-cleanup path (E E L L). In ours, tid=1 fast-paths instead: no contention, so it reads the pre-wait value and takes the simple-release path (E L NtClose). The contention.observed event at idx 104,664 is the marker the Stage-3 manifest will key on.

The plan predicted "near 104,605"; the actual idx is 104,664. The 59-idx offset is within reading-error #32 contention jitter (3 canary cold samples in C+22 showed similar drift). The Stage 2 manifest builder should therefore NOT hardcode the ordinal; it should consume whatever the cold canary trace reports.
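One way to honor that rule is to treat the plan's ordinal only as a search hint and return whatever idx the trace actually contains. The sketch assumes flat per-event dicts with kind, tid, tid_event_idx and site_sid fields, and borrows the ±100 jitter bound from reading-error #32 as its default tolerance; both choices are illustrative, and find_contention_marker is a hypothetical helper.

```python
def find_contention_marker(events, tid, site_sid, expected_idx, tolerance=100):
    """Return the tid_event_idx of the matching contention event closest to
    expected_idx, or None if none lies within +/- tolerance.

    expected_idx is only a hint; the returned value always comes from the
    trace itself.
    """
    best = None
    for ev in events:
        if (ev.get("kind") != "contention.observed"
                or ev.get("tid") != tid
                or ev.get("site_sid") != site_sid):
            continue
        idx = ev["tid_event_idx"]
        dist = abs(idx - expected_idx)
        if dist > tolerance:
            continue
        if best is None or dist < abs(best - expected_idx):
            best = idx
    return best
```

Returning None rather than falling back to the hint keeps a missing marker visible instead of silently reintroducing a hardcoded ordinal.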

site_sid stability

All tid=6 contention events at cs_ptr 0xbc65c890 use the same site_sid, c26a128bf45411f7, because the FNV-1a hash is deterministic over (0xC01AB005, 0, 0xbc65c890, 0x0C). Cross-tid contentions on the same CS produce the same SID (see the first tid=9 and tid=10 events at the same cs_ptr / site_sid). Stage 3's manifest lookup can therefore use either field (cs_ptr or site_sid) as a key.
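FNV-1a itself is small enough to sketch directly, which makes the determinism argument concrete. The byte layout below (each of the four fields packed as a little-endian u64) is an assumption, as are the parameter names for the first two fields, whose meaning is not stated here; the sketch is not expected to reproduce c26a128bf45411f7.

```python
import struct

FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv1a_64(data: bytes) -> int:
    """Standard 64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV64_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def compute_site_sid(magic: int, field1: int, cs_ptr: int, obj_kind: int) -> str:
    """Hypothetical SID derivation over the four-field tuple.

    ASSUMPTION: fields serialized as little-endian u64; canary's real layout
    may differ.
    """
    payload = struct.pack("<4Q", magic, field1, cs_ptr, obj_kind)
    return f"{fnv1a_64(payload):016x}"
```

If, as the tuple suggests, the other three inputs are constant for critical sections, every thread contending on the same cs_ptr hashes to the same SID, which matches the cross-tid observation.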

Phase B image hash

image_loaded_sha256 = ea8d160e… — UNCHANGED (Stage 1 touches Phase A only).

Reading-error class

No new class earned. Existing protocols applied:

  • #28 verify source first — read xboxkrnl_rtl.cc end-to-end before editing; confirmed exact line numbers.
  • #32 canary cold-run non-determinism — accepted that the contention idx jitters by ±100; manifest builder is index-aware.
  • #33 canary cache lives in binary dir under wine — backed up + restored both xenia-canary/build-cross/bin/Windows/Debug/cache/ and ~/.local/share/Xenia/ before the wipe.
  • #34 use .iso not loose .xex — both cold runs against .iso.

Artifacts

  • canary-cvaroff-trunc.jsonl — 131 MB truncated cvar-OFF trace (0 contention events)
  • canary-cvaron-trunc.jsonl — 133 MB truncated cvar-ON trace (807 contention events post-truncation; full had 7,135)
  • /tmp/stage1-canary-binary-cache-backup.tar.gz — pre-stage1 canary binary cache
  • /tmp/stage1-canary-xdg-cache-backup.tar.gz — pre-stage1 canary XDG cache
  • (Pre-truncation 4.4 GB cvar-OFF + 4.2 GB cvar-ON raw jsonls deleted after truncation to free 8.2 GB disk.)

What's deferred

  • The kObjCriticalSection = 0x0C enum value must also be added to ours (event_log.rs) in Stage 3, alongside the symmetric emit. Single-LOC change there.
  • Stage 4 will add contention.observed to ENGINE_LOCAL_KINDS in diff_events.py so per-tid ordinals advance past these events without comparison. Until Stage 4 lands, do NOT diff cvar-ON canary traces against ours (the kind is unrecognized).
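The intended Stage 4 behavior can be sketched as a filter applied before per-tid comparison. This is not diff_events.py's actual code: the event dicts, the placeholder kinds cs.enter and cs.leave, and the comparable and first_mismatch helpers are hypothetical. The canary ↔ ours tid pairing (6 ↔ 1) comes from the per-tid table.

```python
from itertools import zip_longest

ENGINE_LOCAL_KINDS = frozenset({"contention.observed"})


def comparable(events, tid, local_kinds=ENGINE_LOCAL_KINDS):
    """Events for one tid, minus kinds that only one engine emits."""
    return [ev for ev in events
            if ev["tid"] == tid and ev["kind"] not in local_kinds]


def first_mismatch(canary_events, ours_events, canary_tid, ours_tid,
                   local_kinds=ENGINE_LOCAL_KINDS):
    """Return the first differing (canary, ours) pair, or None if in step."""
    a = comparable(canary_events, canary_tid, local_kinds)
    b = comparable(ours_events, ours_tid, local_kinds)
    for x, y in zip_longest(a, b):
        # zip_longest pads the shorter side with None, so a length
        # mismatch is also reported as a divergence.
        if x is None or y is None or x["kind"] != y["kind"]:
            return x, y
    return None
```

Without the filter, the extra canary event shifts every later comparison on that tid by one, which is why cvar-ON traces should not be diffed before Stage 4.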

Next session

Stage 2: manifest builder (~150 LOC of Python at xenia-rs/tools/diff-events/build_contention_manifest.py). It distills the cvar-ON canary jsonl into a contention_manifest.json keyed on (tid, tid_event_idx, site_sid), filters on contended=true (the only value v1.4 emits anyway), and sorts by (tid, tid_event_idx).
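A minimal sketch of that distillation step, assuming each jsonl line is a flat JSON object carrying kind, tid, tid_event_idx, site_sid, cs_ptr and contended. The real Phase A schema may nest the payload differently, and the entry layout here is illustrative rather than the planned contention_manifest.json format.

```python
import json


def build_contention_manifest(lines):
    """Distill contended contention.observed events into sorted entries.

    Hypothetical input schema: one flat JSON object per line.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ev = json.loads(line)
        # Keep only contention events that actually contended.
        if ev.get("kind") != "contention.observed" or not ev.get("contended"):
            continue
        entries.append({
            "tid": ev["tid"],
            "tid_event_idx": ev["tid_event_idx"],
            "site_sid": ev["site_sid"],
            "cs_ptr": ev["cs_ptr"],
        })
    # Deterministic order for downstream lookups.
    entries.sort(key=lambda e: (e["tid"], e["tid_event_idx"]))
    return entries
```

Writing the result with json.dump(entries, f, indent=2) would produce the manifest file; because JSON objects cannot use tuple keys, the (tid, tid_event_idx, site_sid) key travels inside each entry.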