xenia-rs/audit-runs/phase-d-stage2/result.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Phase D Stage 2 — Manifest Builder: Result

Date: 2026-05-18
Outcome: LANDED. Python builder distills Stage-1's cvar-ON canary JSONL into a replay-ready contention_manifest.json. 9/9 unit tests pass.

Source change

| file | LOC | purpose |
|---|---|---|
| xenia-rs/tools/diff-events/build_contention_manifest.py | 175 | the builder itself (parser + filter + sort + dedupe + sha256 + summary) |
| xenia-rs/tools/diff-events/test_build_manifest.py | 240 | 9 tests: basic extract, kind filter, contended=false filter, sort by (tid,idx), dedupe, missing-field skip, bad-json skip, summary rendering, empty input |
| Total | ~415 | tooling only, zero engine LOC |

Engine source UNCHANGED. This is pure Python tooling.
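
The parser + filter + sort + dedupe + sha256 + summary pipeline listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of build_contention_manifest.py: the function name `distill`, the `"kind"` field, its value `"contention.observed"`, and the choice to count missing-field events as bad lines are all assumptions. It also omits the `source_canary_jsonl` and `built_at_host_unix` fields for brevity.

```python
import hashlib
import json
from collections import Counter

# Assumed event-kind value; the real builder's filter may differ.
CONTENTION_KIND = "contention.observed"
REQUIRED = ("tid", "tid_event_idx", "site_sid", "cs_ptr")


def distill(jsonl_text):
    """Reduce canary JSONL text to a manifest-shaped dict (sketch only)."""
    kept = {}  # (tid, tid_event_idx) -> entry; first occurrence wins
    total = bad = dupes = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # blank lines are not counted as events
        total += 1
        try:
            ev = json.loads(line)
        except json.JSONDecodeError:
            bad += 1
            continue
        # Filter: only contention events that were actually contended.
        if not isinstance(ev, dict) or ev.get("kind") != CONTENTION_KIND:
            continue
        if ev.get("contended") is not True:
            continue
        if any(field not in ev for field in REQUIRED):
            bad += 1  # assumption: missing fields count as skipped lines
            continue
        key = (ev["tid"], ev["tid_event_idx"])
        if key in kept:
            dupes += 1
            continue
        kept[key] = {**{f: ev[f] for f in REQUIRED}, "contended": True}

    # Sorting the tuple keys orders entries by tid, then tid_event_idx.
    entries = [kept[k] for k in sorted(kept)]
    per_tid = Counter(e["tid"] for e in entries)
    return {
        "version": 1,
        "source_canary_sha256": hashlib.sha256(
            jsonl_text.encode("utf-8")).hexdigest(),
        "summary": {
            "total_input_events": total,
            "total_contention_events_kept": len(entries),
            "skipped_bad_lines": bad,
            "skipped_duplicate_keys": dupes,
            "per_tid_counts": {str(t): n for t, n in sorted(per_tid.items())},
        },
        "entries": entries,
    }
```

Keeping the first occurrence of a duplicate key is one possible policy; the source only states that duplicates are skipped and counted.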

Manifest schema (v1)

{
  "version": 1,
  "source_canary_jsonl": "<absolute path>",
  "source_canary_sha256": "<hex>",
  "built_at_host_unix": <int>,
  "summary": {
    "total_input_events": <int>,
    "total_contention_events_kept": <int>,
    "skipped_bad_lines": <int>,
    "skipped_duplicate_keys": <int>,
    "per_tid_counts": { "<tid>": <int>, ... }
  },
  "entries": [
    { "tid": <int>, "tid_event_idx": <int>, "site_sid": "<hex16>",
      "cs_ptr": "0xHHHHHHHH", "contended": true },
    ...
  ]
}

Entries sorted by (tid asc, tid_event_idx asc). Stage 3's ours-side loader keys on (tid, tid_event_idx) for O(1) lookup. The site_sid field is the cross-engine identity (C+18 shared-global recipe); the cs_ptr is the guest VA (also identical across engines because the guest manages the struct).
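
As a rough illustration of the (tid, tid_event_idx) keying, the sketch below builds a dict index from a loaded manifest and checks the documented sort order along the way. It is written in Python for consistency with the tooling; the Stage 3 loader itself is planned as engine code, and the function name `build_index` is hypothetical.

```python
def build_index(manifest):
    """Map (tid, tid_event_idx) -> entry for O(1) lookup (sketch only)."""
    if manifest.get("version") != 1:
        raise ValueError("unsupported manifest version")
    index = {}
    prev = None
    for entry in manifest["entries"]:
        key = (entry["tid"], entry["tid_event_idx"])
        # The schema promises (tid asc, tid_event_idx asc) with no
        # duplicate keys, so each key must be strictly greater than the last.
        if prev is not None and key <= prev:
            raise ValueError(f"entries out of order or duplicated at {key}")
        prev = key
        index[key] = entry
    return index
```

A lookup is then `index.get((tid, idx))`, which returns `None` for any (tid, idx) pair the canary did not record as contended.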

Tests

$ python3 xenia-rs/tools/diff-events/test_build_manifest.py
PASS test_basic_extract
PASS test_filters_non_contention_kinds
PASS test_filters_contended_false
PASS test_sorts_by_tid_then_idx
PASS test_deduplicates_same_tid_idx
PASS test_skips_missing_fields
PASS test_handles_bad_json_lines
PASS test_render_summary_human_readable
PASS test_empty_input_yields_zero_kept
ALL 9 TESTS PASS

End-to-end run against Stage 1's cvar-ON trace

$ python3 build_contention_manifest.py \
    --canary-jsonl xenia-rs/audit-runs/phase-d-stage1/canary-cvaron-trunc.jsonl \
    --out          xenia-rs/audit-runs/phase-d-stage1/contention_manifest.json
contention manifest built from <abs path>
  source sha256:                80b9b1901c6b95461d7702c1923f79c44e34778d1f716431d0a8ce99f5945115
  total input events scanned:   569,360
  contention events kept:       807
  bad/skipped lines:            0
  duplicate (tid,idx) skipped:  0
  per-tid counts:
    tid=   6  276    <- main, ↔ ours tid=1
    tid=   9  109
    tid=  10  34
    tid=  11  7
    tid=  13  180
    tid=  14  8
    tid=  16  35
    tid=  17  27
    tid=  18  22
    tid=  22  2
    tid=  26  18
    tid=  29  89

Manifest file size: 122,846 bytes (about 120 KiB), trivial to load in Stage 3.

Critical entry preserved

The plan calls for the manifest to include a tid=6 entry at idx≈104,605, near the 104,607 cap. The nearest tid=6 entry in the manifest is at idx 104,664. Verified:

>>> hit = [e for e in manifest['entries']
...         if e['tid']==6 and e['tid_event_idx']==104664]
>>> hit
[{'tid': 6, 'tid_event_idx': 104664, 'site_sid': 'c26a128bf45411f7',
  'cs_ptr': '0xbc65c890', 'contended': True}]

This is the entry Stage 3's rtl_enter_critical_section will key on: when ours-side tid=1 reaches per-tid ordinal 104,664 on a CS at 0xbc65c890, force a park via BlockReason::CriticalSection.
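
The forced-park decision described above might look like the following, shown in Python purely as a model of the logic. The tid translation table, the function name `should_force_park`, and the choice to require a matching cs_ptr are assumptions; the source only states that canary tid=6 corresponds to ours tid=1.

```python
# Assumed ours-tid -> canary-tid mapping; only the 1 <-> 6 pair is
# stated in the source, other threads would need their own entries.
OURS_TO_CANARY_TID = {1: 6}


def should_force_park(index, ours_tid, ordinal, cs_ptr):
    """Return True if the manifest records contention at this point.

    `index` maps (canary_tid, tid_event_idx) -> manifest entry.
    `cs_ptr` is the guest VA of the critical section as an int.
    """
    canary_tid = OURS_TO_CANARY_TID.get(ours_tid)
    if canary_tid is None:
        return False  # thread has no known canary counterpart
    entry = index.get((canary_tid, ordinal))
    if entry is None:
        return False  # canary saw no contention at this ordinal
    # int(..., 16) accepts the "0x" prefix used by the manifest.
    return int(entry["cs_ptr"], 16) == cs_ptr
```

Comparing cs_ptr as well as the ordinal is a conservative choice: if the two engines have drifted so that the ordinal lands on a different critical section, the sketch declines to park rather than forcing contention on the wrong lock.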

Manifest distribution observations

20 distinct cs_ptrs / 20 distinct site_sids (1-to-1, as expected from the deterministic SID recipe). The first 3 tid=6 entries all target the same CS 0xbc65c890:

| idx | cs_ptr |
|---|---|
| 102,788 | 0xbc65c890 |
| 104,664 | 0xbc65c890 |
| 106,368 | 0xbc65c890 |

So 0xbc65c890 is a hot CS contended 3 times across tid=6's first ~106k events. The last 3 tid=6 entries are on DIFFERENT CSes:

| idx | cs_ptr |
|---|---|
| 248,056 | 0xbca44fc8 |
| 249,124 | 0xbccc5508 |
| 249,671 | 0xbccc5508 |

Manifest is therefore non-trivial — not a single-CS pattern. Stage 3's loader must consult the manifest by (tid, idx) and NOT assume any particular CS is "the contended one."
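
The 1-to-1 observation between cs_ptr and site_sid can be re-checked against any manifest with a short script along these lines. This is a sketch with a hypothetical function name, not part of the landed tooling.

```python
from collections import defaultdict


def sid_ptr_mapping(entries):
    """Return (distinct cs_ptrs, distinct site_sids, is_one_to_one)."""
    sids_by_ptr = defaultdict(set)
    ptrs_by_sid = defaultdict(set)
    for entry in entries:
        sids_by_ptr[entry["cs_ptr"]].add(entry["site_sid"])
        ptrs_by_sid[entry["site_sid"]].add(entry["cs_ptr"])
    # 1-to-1 means every pointer has exactly one SID and vice versa.
    one_to_one = (all(len(s) == 1 for s in sids_by_ptr.values())
                  and all(len(p) == 1 for p in ptrs_by_sid.values()))
    return len(sids_by_ptr), len(ptrs_by_sid), one_to_one
```

Run over `manifest["entries"]`, the result for the Stage 1 manifest should be `(20, 20, True)` if the figures reported above hold.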

Numbers vs plan estimate

Plan estimated "<100 contention entries across the whole boot given the wait-light profile." Actual = 807 in the truncated trace (250k tid=6 events / 20k per sister). Full untruncated trace = 7,135 events. The plan was therefore more than 8× low on the truncated trace, and roughly 71× low against the full-trace figure. The larger manifest is fine: JSON parses in <1ms, lookups are O(1) via dict.

Phase B image hash

image_loaded_sha256 = ea8d160e… — UNCHANGED (no Phase B touchpoints).

Reading-error class

No new class. Pure-Python tooling, no engine sources touched.

Artifacts

Next session

Stage 3 = ours-side OrderMode::ContentionReplay mode + manifest loader + forced-park branch in rtl_enter_critical_section (~250 LOC across scheduler.rs, new contention_manifest.rs module in xenia-kernel, exports.rs:2886-2946).

Acceptance: main matched-prefix advances past 104,607 (target ≥106,000) with a stable digest across 3 cold runs under replay mode. Default mode (no manifest passed) must stay byte-identical to current ba5b5e07….

Stage 4 (diff-tool ENGINE_LOCAL_KINDS for contention.observed) lands before any cross-engine diff over a cvar-ON trace.