ITERATE-2.V: scheduler priority aging closes 18-day AUDIT-049 wedge

Priority aging in xenia-cpu/scheduler.rs:pick_runnable
(effective_priority = base + age_bonus(now_round - last_run_round),
capped at +31, AGING_ROUNDS_PER_BONUS=1). Strict-priority was parking
priority=0 threads behind CPU-bound priority=15 audio mixer
(sub_824D1328 guest spinwait at PC=0x824d1404 on CPU5). Aging
eventually picks the starved thread, breaking the producer-consumer
cycle that caused 5-tid wedge at PC=0x824ac578 since AUDIT-049 (10 May).
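
A minimal Python sketch of the aging rule described above, for
illustration only. The real logic lives in Rust in
xenia-cpu/scheduler.rs; the GuestThread type, the per-round loop, and the
lower-tid tie-break are assumptions, and only the formula, the +31 cap,
and AGING_ROUNDS_PER_BONUS=1 come from this message.

```python
from dataclasses import dataclass

AGING_ROUNDS_PER_BONUS = 1  # value quoted in the commit message
MAX_AGE_BONUS = 31          # "capped at +31"


@dataclass
class GuestThread:          # hypothetical stand-in for the scheduler's thread record
    tid: int
    base_priority: int
    last_run_round: int = 0


def age_bonus(rounds_waiting: int) -> int:
    """One bonus point per AGING_ROUNDS_PER_BONUS rounds waited, capped."""
    return min(rounds_waiting // AGING_ROUNDS_PER_BONUS, MAX_AGE_BONUS)


def effective_priority(t: GuestThread, now_round: int) -> int:
    return t.base_priority + age_bonus(now_round - t.last_run_round)


def pick_runnable(threads: list[GuestThread], now_round: int) -> GuestThread:
    # Higher effective priority wins; ties go to the lower tid (an assumption).
    chosen = max(threads, key=lambda t: (effective_priority(t, now_round), -t.tid))
    chosen.last_run_round = now_round
    return chosen


# A CPU-bound priority=15 thread that is always runnable versus a priority=0 thread.
mixer = GuestThread(tid=1, base_priority=15)
starved = GuestThread(tid=2, base_priority=0)
first_starved_round = None
for now_round in range(1, 41):
    if pick_runnable([mixer, starved], now_round) is starved:
        first_starved_round = now_round
        break
```

Under strict priority the priority=0 thread would never be chosen in this
loop; with aging its bonus keeps growing while the mixer's resets each time
it runs, so the starved thread is eventually picked.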

Cascade observed: tid=13 clean exit; events 121K -> 13M (107x); last
host_ns 767ms -> 51,011ms (66x); 8 new threads spawn; VdSwap 1 -> 2.

Complete two-day iterate sequence (2026-05-27 -> 2026-05-28):
- 2.F: VdSwap drain timeout 900ms -> 1ms (xenia-gpu/handle.rs); 876x
       perf win on VdSwap kernel callback
- 2.H: vA0000000 physical heap bucket added (state.rs, exports.rs);
       ctx_ptrs now in 0xA0000000-0xBFFFFFFF range matching canary
- 2.L: Phase-A diff harness categorized [return_value mismatch],
       [status mismatch], [args_resolved.path mismatch] tags
       (tools/diff-events/diff_events.py); closes reading-error #41
       (silent test-harness state leak invalidating trace diffs)
- 2.M: always-on exit-thread-state.json sibling to Phase-A JSONL
       (event_log.rs + xenia-app/main.rs); closes reading-error #42
       (Phase-A blind to blocked-forever waits)
- 2.Q: signal.match kernel instrumentation in NtSetEvent /
       NtReleaseSemaphore / KeSetEvent / KeReleaseSemaphore
       (exports.rs); emits target_handle + waiter_count + waiter_tids
- 2.T: wake.requested kernel instrumentation in wake_eligible_waiters
       (exports.rs); emits target_tid + transition + new_state
- 2.V: scheduler priority aging (xenia-cpu/scheduler.rs) [keystone]

Plus accumulated WIP from earlier May (contention_manifest,
phase_b_snapshot, xam/xaudio enhancements, analysis db, xex loader,
xenia-app main loop, etc.). The audit-runs/ artifacts remain untracked
per project convention.

Tests: 300 xenia-cpu / 227 xenia-kernel / 5 xenia-app / 19 xenia-path
/ 30+ smaller suites -- all PASS, 0 regressions. Determinism preserved
(2x cold runs bit-identical at 13,003,881 events post-2.V).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
MechaCat02
2026-05-29 07:27:26 +02:00
parent e6d43a23ac
commit ad45873a1b
50 changed files with 14389 additions and 506 deletions


@@ -0,0 +1,73 @@
# diff_events.py — Phase A event-log diff tool
A stdlib-only Python tool that diffs two schema-v1 JSONL event logs (one per engine) and reports the **first behavioral divergence per guest thread**. Built for the Phase A diff harness — see `audit-runs/phase-a-diff-harness/README.md` and `schema-v1.md`.
## What it does
1. Reads two JSONL files. Validates each begins with a `schema_version=1` header event.
2. Builds per-thread streams keyed by `tid_event_idx` (the schema's per-tid monotonic counter).
3. Maps canary-tid ↔ ours-tid (auto-pairs by first `kernel.call` name in each stream, or manual via `--tid-map`).
4. Walks each mapped pair in parallel, comparing events with rules from the schema (raw_handle_id skipped, host_ns skipped, wait_duration_cycles skipped, etc.).
5. On first divergence: prints 5-event pre-context + the divergent event + the next event from each. Stops that thread's walk.
6. Writes a markdown report.
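
The steps above can be sketched roughly as follows. This is not the code in `diff_events.py`: the function names are hypothetical, header validation, tid mapping, and the skipped-field rules are left out, and every event is assumed to carry `tid`, `tid_event_idx`, `kind`, and `payload`.

```python
import json
from collections import defaultdict


def build_streams(jsonl_lines):
    """Group events by tid and order each group by tid_event_idx."""
    streams = defaultdict(list)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        streams[event["tid"]].append(event)
    for events in streams.values():
        events.sort(key=lambda e: e["tid_event_idx"])
    return dict(streams)


def first_divergence(canary_events, ours_events):
    """Return the position of the first mismatch, or None if the streams agree."""
    for pos, (c, o) in enumerate(zip(canary_events, ours_events)):
        if c["kind"] != o["kind"] or c.get("payload") != o.get("payload"):
            return pos
    # One stream ending early also counts as a divergence.
    if len(canary_events) != len(ours_events):
        return min(len(canary_events), len(ours_events))
    return None


def make_line(tid, idx, kind, payload):
    return json.dumps({"tid": tid, "tid_event_idx": idx, "kind": kind, "payload": payload})


canary = build_streams([
    make_line(6, 1, "kernel.call", {"name": "B"}),
    make_line(6, 0, "kernel.call", {"name": "A"}),
])
ours = build_streams([
    make_line(1, 0, "kernel.call", {"name": "A"}),
    make_line(1, 1, "kernel.call", {"name": "C"}),
])
divergence = first_divergence(canary[6], ours[1])  # canary tid 6 paired with ours tid 1
```

Stopping at the first mismatch keeps the report focused on the root divergence, since later events on the same thread usually differ only as a consequence of it.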
## Usage
```bash
# Default — auto-map tids, write markdown to stdout
python3 diff_events.py --canary canary.jsonl --ours ours.jsonl
# Write report to a file
python3 diff_events.py --canary c.jsonl --ours o.jsonl --out report.md
# Manual tid map
python3 diff_events.py --canary c.jsonl --ours o.jsonl --tid-map 6=1,7=2
# Negative-test mode — exit non-zero on ANY divergence (gate-4)
python3 diff_events.py --canary c.jsonl --ours o.jsonl --validate-identical
```
## How it compares
These fields are **skipped** when comparing payloads:
- Top-level: `engine`, `host_ns`, `guest_cycle`, `deterministic`.
- `handle.create`/`handle.destroy`: `raw_handle_id`, `handle_semantic_id` (engine-local).
- `wait.begin`: `handles_semantic_ids` (engine-local SIDs).
- `wait.end`: `wait_duration_cycles` (depends on host scheduling), `woken_by_semantic_id`.
The `tid_event_idx` field is the **alignment key**. Two events at the same `tid_event_idx` on a mapped pair of tids are expected to be the same logical event. The `kind` must match; the `payload` must match field-by-field (except skipped fields).
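
As an illustration of these rules, here is a small sketch of a comparison that ignores the skipped payload fields. The table mirrors the list above; the function names and the `status` field in the sample events are assumptions, not taken from `diff_events.py`.

```python
# Payload fields ignored per event kind, as listed in this section.
SKIPPED_PAYLOAD_FIELDS = {
    "handle.create": {"raw_handle_id", "handle_semantic_id"},
    "handle.destroy": {"raw_handle_id", "handle_semantic_id"},
    "wait.begin": {"handles_semantic_ids"},
    "wait.end": {"wait_duration_cycles", "woken_by_semantic_id"},
}


def comparable_payload(event):
    """Return the payload with engine-local or timing-dependent fields removed."""
    skip = SKIPPED_PAYLOAD_FIELDS.get(event["kind"], frozenset())
    payload = event.get("payload") or {}
    return {k: v for k, v in payload.items() if k not in skip}


def events_match(canary_event, ours_event):
    # Top-level engine, host_ns, guest_cycle, and deterministic are never
    # inspected here, which is how this sketch "skips" them.
    return (
        canary_event["kind"] == ours_event["kind"]
        and comparable_payload(canary_event) == comparable_payload(ours_event)
    )


canary_wait = {"kind": "wait.end", "host_ns": 10,
               "payload": {"status": 0, "wait_duration_cycles": 1200}}
ours_wait = {"kind": "wait.end", "host_ns": 99,
             "payload": {"status": 0, "wait_duration_cycles": 7}}
ours_timeout = {"kind": "wait.end", "host_ns": 99,
                "payload": {"status": 258, "wait_duration_cycles": 7}}

same = events_match(canary_wait, ours_wait)          # differs only in skipped fields
different = events_match(canary_wait, ours_timeout)  # status differs
```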
## Phase C+18 — Cross-tid floating `handle.create` (shared-global dispatchers)
Process-global kernel dispatcher objects (`KEVENT`/`KSEMAPHORE` etc. that game code creates with `KeInitializeEvent` or static-allocs and shares across multiple guest threads) are lazy-wrapped on **first guest-thread touch**, by `XObject::GetNativeObject` in canary and by `ensure_dispatcher_object` in ours. Whichever thread happens to touch the dispatcher first synthesizes the wrapper and emits the `handle.create` event. Which thread wins is timing-dependent, so canary and ours may disagree.
The SID for these synthesized handles is computed via a **scheduling-invariant recipe** keyed on `(pointer, object_type)` only (see schema-v1.md §"Shared-global SIDs"). The same dispatcher therefore yields the same SID in both engines regardless of the first-toucher thread.
The diff tool detects shared-global `handle.create` events by recomputing the deterministic SID from the event's `(raw_handle_id, object_type)` payload and matching against the emitted `handle_semantic_id`. When per-tid alignment finds one side has an "extra" `handle.create` event whose SID is in the global set, the tool **advances only that side's stream pointer past the floating event** and re-compares — preserving strict alignment for everything else.
The summary table shows per-pair `floating_skipped (c/o)` counts so you can see how many events were absorbed by this mechanism.
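
A rough sketch of the pointer-advancing idea, under stated assumptions: the set of shared-global SIDs is passed in directly (the real tool recomputes SIDs with the recipe in schema-v1.md, which is not reproduced here), events are matched by position rather than by `tid_event_idx`, and all function names are hypothetical.

```python
def is_floating(event, global_sids):
    """True for a handle.create whose SID belongs to the shared-global set."""
    payload = event.get("payload") or {}
    return event["kind"] == "handle.create" and payload.get("handle_semantic_id") in global_sids


def walk_with_floating(canary, ours, global_sids, same):
    """Walk two streams; return (divergence, (skipped_canary, skipped_ours)).

    divergence is None when no mismatch was found before either stream ended,
    otherwise the (canary_pos, ours_pos) pair of the first real mismatch.
    """
    i = j = 0
    skipped_c = skipped_o = 0
    while i < len(canary) and j < len(ours):
        if same(canary[i], ours[j]):
            i += 1
            j += 1
        elif is_floating(canary[i], global_sids):
            i += 1            # advance only the canary side past the floating event
            skipped_c += 1
        elif is_floating(ours[j], global_sids):
            j += 1            # advance only the ours side
            skipped_o += 1
        else:
            return (i, j), (skipped_c, skipped_o)
    return None, (skipped_c, skipped_o)


def same_kind_and_name(c, o):
    # Simplified comparison for the example below.
    return c["kind"] == o["kind"] and c.get("payload", {}).get("name") == o.get("payload", {}).get("name")


call_a = {"kind": "kernel.call", "payload": {"name": "A"}}
call_b = {"kind": "kernel.call", "payload": {"name": "B"}}
floating = {"kind": "handle.create", "payload": {"handle_semantic_id": "sid-global"}}

result, skipped = walk_with_floating(
    [call_a, floating, call_b], [call_a, call_b], {"sid-global"}, same_kind_and_name
)
```

In this example the canary stream carries the extra `handle.create`, so only its pointer moves and the following `kernel.call` events still line up.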
## Known limitations (v1)
- **Auto tid-map is naive**: pairs canary-tid with ours-tid by the first `kernel.call` name on each thread. Works for boot when the same initial call happens on each engine's primary thread; can mis-pair if two threads start with the same first-call name or if a thread spawns earlier on one engine. Use `--tid-map` to override.
- **No streaming**: loads both files fully into memory. Acceptable for boot-window runs; the canary log is ~370 MB for a 12 s run.
- **First-divergence only**: per-thread walk stops at first divergence. Subsequent divergences on the same thread are not reported (a sliding-window mode could be added later if needed).
- **Schema v1 only**: refuses to parse v2 inputs (forward-incompat is intentional).
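
To make the auto tid-map limitation concrete, here is a sketch of first-call pairing that also flags names shared by more than one thread. The ambiguity check and the function names are assumptions for illustration; the description above does not say whether `diff_events.py` detects this case.

```python
from collections import defaultdict


def first_kernel_call_name(events):
    for event in events:
        if event["kind"] == "kernel.call":
            return (event.get("payload") or {}).get("name")
    return None


def tids_by_first_call(streams):
    by_name = defaultdict(list)
    for tid in sorted(streams):
        name = first_kernel_call_name(streams[tid])
        if name is not None:
            by_name[name].append(tid)
    return by_name


def auto_tid_map(canary_streams, ours_streams):
    """Pair tids whose first kernel.call name is unique on both sides."""
    canary_names = tids_by_first_call(canary_streams)
    ours_names = tids_by_first_call(ours_streams)
    mapping, ambiguous = {}, []
    for name, canary_tids in canary_names.items():
        ours_tids = ours_names.get(name, [])
        if len(canary_tids) == 1 and len(ours_tids) == 1:
            mapping[canary_tids[0]] = ours_tids[0]
        elif ours_tids:
            ambiguous.append(name)  # several threads start with the same call
    return mapping, sorted(ambiguous)


def call(name):
    return {"kind": "kernel.call", "payload": {"name": name}}


canary_streams = {6: [call("Foo")], 7: [call("Bar")], 8: [call("Bar")]}
ours_streams = {1: [call("Foo")], 2: [call("Bar")], 3: [call("Bar")]}
mapping, ambiguous = auto_tid_map(canary_streams, ours_streams)
tid_map_arg = ",".join(f"{c}={o}" for c, o in sorted(mapping.items()))
```

Threads left in the ambiguous set are the ones worth pairing by hand with `--tid-map`, using the same `CANARY=OURS` format that `tid_map_arg` produces.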
## Files
- `diff_events.py` — single-file CLI, stdlib only (json, argparse, pathlib).
- `README.md` — this file.
## Test it
```bash
# Self-diff (compare a file against itself) should report 0 divergences.
python3 diff_events.py --canary x.jsonl --ours x.jsonl --validate-identical
echo "exit=$?" # expect 0
# Negative test: corrupt one event and confirm the tool reports it.
sed '50s/"kernel.call"/"kernel.CORRUPT"/' x.jsonl > /tmp/x-corrupt.jsonl
python3 diff_events.py --canary x.jsonl --ours /tmp/x-corrupt.jsonl --validate-identical
echo "exit=$?" # expect 1
```


@@ -0,0 +1,253 @@
#!/usr/bin/env python3
"""Phase D Stage 2 — contention-manifest builder.

Reads a Phase A JSONL event log produced by canary with cvar
`kernel_emit_contention=true` (Stage 1) and distills it to a
replay-ready manifest for Stage 3 to consume.

Output schema (`contention_manifest.json`):

    {
      "version": 1,
      "source_canary_jsonl": "<absolute path>",
      "source_canary_sha256": "<hex>",
      "built_at_host_unix": <int>,
      "tid_map": { "<canary tid>": <ours tid>, ... } | null,
      "summary": {
        "total_input_events": <int>,
        "total_contention_events_kept": <int>,
        "skipped_bad_lines": <int>,
        "skipped_unmapped_tids": <int>,
        "skipped_duplicate_keys": <int>,
        "per_tid_counts": { "<tid>": <int>, ... }
      },
      "entries": [
        { "tid": 6, "tid_event_idx": 104664, "site_sid": "c26a128bf45411f7",
          "cs_ptr": "0xbc65c890", "contended": true },
        ...
      ]
    }

Entries are sorted by (tid asc, tid_event_idx asc). Stage 3's ours-side
replay loader keys on `(tid, tid_event_idx)`. Unless `--tid-map` is
given, the tid is the *native* tid emitted by canary (no display-mapping
is applied here — see investigation.md §"Tid mapping is per-engine
native").

Only events with `kind == "contention.observed"` and `contended == true`
are kept. Stage 1's emitter never emits `contended=false`, so this
filter is paranoid-defensive. Schema events / handle events / wait
events are dropped.

Usage:
    python3 build_contention_manifest.py \\
        --canary-jsonl path/to/canary-cvaron-trunc.jsonl \\
        --out path/to/contention_manifest.json

Exit 0 on success. Exit 1 if the input is not a file, `--tid-map` is
malformed, or the manifest is empty (no contention events found —
likely cvar wasn't enabled when the trace was captured). Malformed
JSONL lines are skipped and counted, not treated as fatal.
"""

import argparse
import hashlib
import json
import sys
import time
from pathlib import Path


def parse_args() -> argparse.Namespace:
    p = argparse.ArgumentParser(description=__doc__.splitlines()[0])
    p.add_argument(
        "--canary-jsonl",
        required=True,
        help="Path to canary Phase A JSONL log (with cvar=true).",
    )
    p.add_argument(
        "--out",
        required=True,
        help="Output path for contention_manifest.json.",
    )
    p.add_argument(
        "--tid-map",
        default="",
        help=(
            "Optional canary→ours tid translation. Format "
            "'CANARY=OURS,CANARY=OURS,...' (e.g. '6=1,7=2,4=11'). When "
            "supplied, manifest entries are emitted with the ours-side tid "
            "so the Stage-3 consumer can key on its own native current_tid. "
            "Entries on a canary tid NOT in the map are dropped with a "
            "warning. Same format as diff_events.py."
        ),
    )
    p.add_argument(
        "--quiet",
        action="store_true",
        help="Suppress the human-readable summary on stderr.",
    )
    return p.parse_args()


def parse_tid_map(s: str) -> dict[int, int] | None:
    """Parse 'a=b,c=d' into {a: b, c: d}. Empty/None → None."""
    s = s.strip()
    if not s:
        return None
    out: dict[int, int] = {}
    for piece in s.split(","):
        piece = piece.strip()
        if not piece:
            continue
        if "=" not in piece:
            raise ValueError(f"bad tid-map fragment: {piece!r}")
        l, r = piece.split("=", 1)
        out[int(l.strip())] = int(r.strip())
    return out


def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so large traces are not read into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(
    jsonl_path: Path,
    tid_map: dict[int, int] | None = None,
) -> dict:
    """Read `jsonl_path` and return a manifest dict.

    If `tid_map` (canary_tid → ours_tid) is provided, entries are written
    with the translated ours-side tid. Entries on a canary tid not in
    the map are dropped (counted in `summary.skipped_unmapped_tids`).
    When `tid_map` is None, manifest tids are canary's native values
    (back-compat with Stage 2's first iteration).

    Raises FileNotFoundError if `jsonl_path` does not exist. Lines that
    are not valid JSON objects are skipped and counted in
    `summary.skipped_bad_lines` rather than raised.
    """
    entries: list[dict] = []
    total_input = 0
    bad_lines = 0
    unmapped = 0
    with jsonl_path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            total_input += 1
            try:
                ev = json.loads(line)
            except json.JSONDecodeError:
                bad_lines += 1
                continue
            # A line can be valid JSON without being an event object.
            if not isinstance(ev, dict):
                bad_lines += 1
                continue
            if ev.get("kind") != "contention.observed":
                continue
            payload = ev.get("payload") or {}
            if payload.get("contended") is not True:
                continue
            canary_tid = int(ev["tid"])
            if tid_map is not None:
                if canary_tid not in tid_map:
                    unmapped += 1
                    continue
                tid = tid_map[canary_tid]
            else:
                tid = canary_tid
            entry = {
                "tid": tid,
                "tid_event_idx": int(ev["tid_event_idx"]),
                "site_sid": str(payload.get("site_sid", "")),
                "cs_ptr": str(payload.get("cs_ptr", "")),
                "contended": True,
            }
            # Defensive: every Stage 1 event carries cs_ptr + site_sid.
            # If either is missing, skip rather than emit a broken entry.
            if not entry["site_sid"] or not entry["cs_ptr"]:
                bad_lines += 1
                continue
            entries.append(entry)

    # Stable sort by (tid, tid_event_idx). Same (tid, idx) pair is not
    # expected — the per-tid counter is monotone — but if duplicates
    # appear (e.g. mis-merged jsonls), keep the first; later phases would
    # otherwise see ambiguous manifest keys.
    entries.sort(key=lambda e: (e["tid"], e["tid_event_idx"]))
    deduped: list[dict] = []
    seen: set[tuple[int, int]] = set()
    dup_count = 0
    for e in entries:
        key = (e["tid"], e["tid_event_idx"])
        if key in seen:
            dup_count += 1
            continue
        seen.add(key)
        deduped.append(e)

    per_tid: dict[str, int] = {}
    for e in deduped:
        per_tid[str(e["tid"])] = per_tid.get(str(e["tid"]), 0) + 1

    return {
        "version": 1,
        "source_canary_jsonl": str(jsonl_path.resolve()),
        "source_canary_sha256": sha256_of(jsonl_path),
        "built_at_host_unix": int(time.time()),
        "tid_map": tid_map,
        "summary": {
            "total_input_events": total_input,
            "total_contention_events_kept": len(deduped),
            "skipped_bad_lines": bad_lines,
            "skipped_unmapped_tids": unmapped,
            "skipped_duplicate_keys": dup_count,
            "per_tid_counts": per_tid,
        },
        "entries": deduped,
    }


def render_summary(manifest: dict) -> str:
    s = manifest["summary"]
    lines = [
        f"contention manifest built from {manifest['source_canary_jsonl']}",
        f" source sha256: {manifest['source_canary_sha256']}",
        f" total input events scanned: {s['total_input_events']}",
        f" contention events kept: {s['total_contention_events_kept']}",
        f" bad/skipped lines: {s['skipped_bad_lines']}",
        f" duplicate (tid,idx) skipped: {s['skipped_duplicate_keys']}",
        " per-tid counts:",
    ]
    # per_tid_counts keys are strings; sort them numerically.
    for tid, count in sorted(s["per_tid_counts"].items(),
                             key=lambda kv: int(kv[0])):
        lines.append(f" tid={int(tid):4d} {count}")
    return "\n".join(lines)


def main() -> int:
    args = parse_args()
    src = Path(args.canary_jsonl)
    if not src.is_file():
        print(f"error: not a file: {src}", file=sys.stderr)
        return 1
    try:
        tid_map = parse_tid_map(args.tid_map)
    except ValueError as e:
        print(f"error: --tid-map: {e}", file=sys.stderr)
        return 1
    manifest = build_manifest(src, tid_map=tid_map)
    if manifest["summary"]["total_contention_events_kept"] == 0:
        print(
            "error: 0 contention.observed events found — was the trace "
            "captured with --kernel_emit_contention=true?",
            file=sys.stderr,
        )
        return 1
    out = Path(args.out)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
        f.write("\n")
    if not args.quiet:
        print(render_summary(manifest), file=sys.stderr)
    return 0


if __name__ == "__main__":
    sys.exit(main())


@@ -0,0 +1,299 @@
#!/usr/bin/env python3
"""Unit tests for `build_contention_manifest.py`.

Run as `python3 test_build_manifest.py` — prints `PASS` per test.
"""

import json
import sys
import tempfile
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent))
from build_contention_manifest import build_manifest, render_summary  # noqa: E402


def write_jsonl(lines: list[str]) -> Path:
    tmp = tempfile.NamedTemporaryFile(
        mode="w", suffix=".jsonl", delete=False, encoding="utf-8"
    )
    for line in lines:
        tmp.write(line + "\n")
    tmp.close()
    return Path(tmp.name)


def mk_event(
    kind: str,
    tid: int,
    idx: int,
    payload: dict,
    engine: str = "canary",
) -> str:
    return json.dumps(
        {
            "schema_version": 1,
            "engine": engine,
            "kind": kind,
            "tid": tid,
            "tid_event_idx": idx,
            "guest_cycle": 0,
            "host_ns": 0,
            "deterministic": True,
            "payload": payload,
        }
    )


def test_basic_extract() -> None:
    src = write_jsonl([
        mk_event("import.call", 6, 0, {"name": "Foo"}),
        mk_event(
            "contention.observed",
            6,
            104664,
            {"cs_ptr": "0xbc65c890", "site_sid": "c26a128b", "contended": True},
        ),
        mk_event("import.call", 6, 1, {"name": "Bar"}),
    ])
    m = build_manifest(src)
    assert m["version"] == 1
    assert m["summary"]["total_input_events"] == 3
    assert m["summary"]["total_contention_events_kept"] == 1
    assert m["summary"]["per_tid_counts"] == {"6": 1}
    e = m["entries"][0]
    assert e["tid"] == 6 and e["tid_event_idx"] == 104664
    assert e["site_sid"] == "c26a128b" and e["cs_ptr"] == "0xbc65c890"
    assert e["contended"] is True
    print("PASS test_basic_extract")


def test_filters_non_contention_kinds() -> None:
    src = write_jsonl([
        mk_event("handle.create", 6, 0, {"handle_semantic_id": "x"}),
        mk_event("wait.begin", 6, 1, {"handles_semantic_ids": ["x"]}),
        mk_event("kernel.call", 6, 2, {"name": "X"}),
        mk_event(
            "contention.observed",
            7,
            42,
            {"cs_ptr": "0x1000", "site_sid": "deadbeef", "contended": True},
        ),
    ])
    m = build_manifest(src)
    assert m["summary"]["total_contention_events_kept"] == 1
    assert m["entries"][0]["tid"] == 7
    print("PASS test_filters_non_contention_kinds")


def test_filters_contended_false() -> None:
    # Stage 1's emitter never emits contended=false today, but defensive
    # filter must skip those if a future variant adds them.
    src = write_jsonl([
        mk_event(
            "contention.observed",
            6,
            10,
            {"cs_ptr": "0xa", "site_sid": "11", "contended": False},
        ),
        mk_event(
            "contention.observed",
            6,
            11,
            {"cs_ptr": "0xa", "site_sid": "11", "contended": True},
        ),
    ])
    m = build_manifest(src)
    assert m["summary"]["total_contention_events_kept"] == 1
    assert m["entries"][0]["tid_event_idx"] == 11
    print("PASS test_filters_contended_false")


def test_sorts_by_tid_then_idx() -> None:
    src = write_jsonl([
        mk_event(
            "contention.observed",
            9,
            5,
            {"cs_ptr": "0x9", "site_sid": "99", "contended": True},
        ),
        mk_event(
            "contention.observed",
            6,
            200,
            {"cs_ptr": "0xb", "site_sid": "bb", "contended": True},
        ),
        mk_event(
            "contention.observed",
            6,
            100,
            {"cs_ptr": "0xa", "site_sid": "aa", "contended": True},
        ),
    ])
    m = build_manifest(src)
    keys = [(e["tid"], e["tid_event_idx"]) for e in m["entries"]]
    assert keys == [(6, 100), (6, 200), (9, 5)], keys
    print("PASS test_sorts_by_tid_then_idx")


def test_deduplicates_same_tid_idx() -> None:
    src = write_jsonl([
        mk_event(
            "contention.observed",
            6,
            42,
            {"cs_ptr": "0xa", "site_sid": "aa", "contended": True},
        ),
        mk_event(
            "contention.observed",
            6,
            42,
            {"cs_ptr": "0xb", "site_sid": "bb", "contended": True},
        ),
    ])
    m = build_manifest(src)
    assert m["summary"]["total_contention_events_kept"] == 1
    assert m["summary"]["skipped_duplicate_keys"] == 1
    # Keeps the first occurrence.
    assert m["entries"][0]["cs_ptr"] == "0xa"
    print("PASS test_deduplicates_same_tid_idx")


def test_skips_missing_fields() -> None:
    src = write_jsonl([
        # Missing site_sid.
        mk_event(
            "contention.observed",
            6,
            1,
            {"cs_ptr": "0xa", "contended": True},
        ),
        # Missing cs_ptr.
        mk_event(
            "contention.observed",
            6,
            2,
            {"site_sid": "aa", "contended": True},
        ),
        # Both present — kept.
        mk_event(
            "contention.observed",
            6,
            3,
            {"cs_ptr": "0xb", "site_sid": "bb", "contended": True},
        ),
    ])
    m = build_manifest(src)
    assert m["summary"]["total_contention_events_kept"] == 1
    assert m["summary"]["skipped_bad_lines"] == 2
    print("PASS test_skips_missing_fields")


def test_handles_bad_json_lines() -> None:
    src = write_jsonl([
        "not-json",
        mk_event(
            "contention.observed",
            6,
            1,
            {"cs_ptr": "0xa", "site_sid": "aa", "contended": True},
        ),
        "{\"truncated\":",
    ])
    m = build_manifest(src)
    assert m["summary"]["total_contention_events_kept"] == 1
    assert m["summary"]["skipped_bad_lines"] == 2
    print("PASS test_handles_bad_json_lines")


def test_render_summary_human_readable() -> None:
    src = write_jsonl([
        mk_event(
            "contention.observed",
            6,
            1,
            {"cs_ptr": "0xa", "site_sid": "aa", "contended": True},
        ),
        mk_event(
            "contention.observed",
            14,
            100,
            {"cs_ptr": "0xb", "site_sid": "bb", "contended": True},
        ),
    ])
    m = build_manifest(src)
    out = render_summary(m)
    assert "contention events kept: 2" in out
    # render_summary formats the tid with `:4d`, so it is right-aligned
    # in a 4-character field.
    assert "tid=   6 1" in out
    assert "tid=  14 1" in out
    print("PASS test_render_summary_human_readable")


def test_empty_input_yields_zero_kept() -> None:
    src = write_jsonl([mk_event("import.call", 0, 0, {"name": "X"})])
    m = build_manifest(src)
    assert m["summary"]["total_contention_events_kept"] == 0
    assert m["entries"] == []
    print("PASS test_empty_input_yields_zero_kept")


def test_tid_map_translates_canary_to_ours() -> None:
    src = write_jsonl([
        mk_event(
            "contention.observed",
            6,
            104664,
            {"cs_ptr": "0xbc65c890", "site_sid": "c26a128bf45411f7", "contended": True},
        ),
        mk_event(
            "contention.observed",
            7,
            10,
            {"cs_ptr": "0xa", "site_sid": "aa", "contended": True},
        ),
    ])
    m = build_manifest(src, tid_map={6: 1, 7: 2})
    assert m["entries"][0]["tid"] == 1, m["entries"][0]
    assert m["entries"][1]["tid"] == 2
    print("PASS test_tid_map_translates_canary_to_ours")


def test_tid_map_drops_unmapped_canary_tids() -> None:
    src = write_jsonl([
        mk_event(
            "contention.observed",
            6,
            100,
            {"cs_ptr": "0xa", "site_sid": "aa", "contended": True},
        ),
        mk_event(
            "contention.observed",
            99,
            200,
            {"cs_ptr": "0xb", "site_sid": "bb", "contended": True},
        ),
    ])
    m = build_manifest(src, tid_map={6: 1})
    assert m["summary"]["total_contention_events_kept"] == 1
    assert m["summary"]["skipped_unmapped_tids"] == 1
    assert m["entries"][0]["tid"] == 1
    print("PASS test_tid_map_drops_unmapped_canary_tids")


if __name__ == "__main__":
    tests = [
        test_basic_extract,
        test_filters_non_contention_kinds,
        test_filters_contended_false,
        test_sorts_by_tid_then_idx,
        test_deduplicates_same_tid_idx,
        test_skips_missing_fields,
        test_handles_bad_json_lines,
        test_render_summary_human_readable,
        test_empty_input_yields_zero_kept,
        test_tid_map_translates_canary_to_ours,
        test_tid_map_drops_unmapped_canary_tids,
    ]
    for t in tests:
        t()
    print(f"\nALL {len(tests)} TESTS PASS")
