xenia-rs/tools/diff-state
MechaCat02 ad45873a1b ITERATE-2.V: scheduler priority aging closes 18-day AUDIT-049 wedge
Priority aging in xenia-cpu/scheduler.rs:pick_runnable
(effective_priority = base + age_bonus(now_round - last_run_round),
capped at +31, AGING_ROUNDS_PER_BONUS=1). Strict-priority scheduling was
parking priority=0 threads behind the CPU-bound priority=15 audio mixer
(sub_824D1328 guest spinwait at PC=0x824d1404 on CPU5). Aging
eventually picks the starved thread, breaking the producer-consumer
cycle behind the 5-tid wedge at PC=0x824ac578 present since AUDIT-049 (10 May).

Cascade observed: tid=13 clean exit; events 121K -> 13M (107x); last
host_ns 767ms -> 51,011ms (66x); 8 new threads spawn; VdSwap 1 -> 2.

Complete two-day iterate sequence (2026-05-27 -> 2026-05-28):
- 2.F: VdSwap drain timeout 900ms -> 1ms (xenia-gpu/handle.rs); 876x
       perf win on VdSwap kernel callback
- 2.H: vA0000000 physical heap bucket added (state.rs, exports.rs);
       ctx_ptrs now in 0xA0000000-0xBFFFFFFF range matching canary
- 2.L: Phase-A diff harness categorized [return_value mismatch],
       [status mismatch], [args_resolved.path mismatch] tags
       (tools/diff-events/diff_events.py); closes reading-error #41
       (silent test-harness state leak invalidating trace diffs)
- 2.M: always-on exit-thread-state.json sibling to Phase-A JSONL
       (event_log.rs + xenia-app/main.rs); closes reading-error #42
       (Phase-A blind to blocked-forever waits)
- 2.Q: signal.match kernel instrumentation in NtSetEvent /
       NtReleaseSemaphore / KeSetEvent / KeReleaseSemaphore
       (exports.rs); emits target_handle + waiter_count + waiter_tids
- 2.T: wake.requested kernel instrumentation in wake_eligible_waiters
       (exports.rs); emits target_tid + transition + new_state
- 2.V: scheduler priority aging (xenia-cpu/scheduler.rs) [keystone]

Plus accumulated WIP from earlier May (contention_manifest,
phase_b_snapshot, xam/xaudio enhancements, analysis db, xex loader,
xenia-app main loop, etc.). Audit-runs/ artifacts remain untracked
per project convention.

Tests: 300 xenia-cpu / 227 xenia-kernel / 5 xenia-app / 19 xenia-path
/ 30+ smaller suites -- all PASS, 0 regressions. Determinism preserved
(2x cold runs bit-identical at 13,003,881 events post-2.V).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 07:27:26 +02:00

diff-state

Phase B initial-state snapshot diff tool. Stdlib-only Python. Mirrors the shape of tools/diff-events/ but operates on the static structural snapshots that phase_b_snapshot emits immediately before the first guest PPC instruction at the XEX entry_point executes.

Usage

```sh
python3 tools/diff-state/diff_state.py \
  --canary <snapshot_dir>/canary \
  --ours   <snapshot_dir>/ours \
  --out    <snapshot_dir>/report.md
```

Writes:

  • <snapshot_dir>/report.md — human-readable divergence catalog
  • <snapshot_dir>/report.json — machine-readable sibling (same content)
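The sketch below shows the command-line surface above using argparse. The flag names come from the usage example. The parser itself is illustrative and not taken from diff_state.py. The rule that report.json is derived from --out by swapping the suffix is an assumption.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the three flags shown in the usage example (sketch only)."""
    parser = argparse.ArgumentParser(
        description="Phase B initial-state snapshot diff (illustrative parser)"
    )
    parser.add_argument("--canary", type=Path, required=True,
                        help="snapshot directory from the reference (canary) engine")
    parser.add_argument("--ours", type=Path, required=True,
                        help="snapshot directory from xenia-rs")
    parser.add_argument("--out", type=Path, required=True,
                        help="path of the human-readable Markdown report")
    return parser.parse_args(argv)


def report_paths(out):
    """Assumption: report.json is written next to --out with a .json suffix."""
    return out, out.with_suffix(".json")
```

For example, `report_paths(parse_args().out)` on the usage line above would yield `<snapshot_dir>/report.md` and `<snapshot_dir>/report.json`.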

Exit codes

| code | meaning |
|---|---|
| 0 | no divergence (or --validate-identical succeeded) |
| 1 | divergences found |
| 2 | STOP triggered (image_loaded_sha256 / xex_entry_point / iso_sha256 mismatch) |
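The exit-code table reduces to a small precedence rule. A STOP field wins over everything, any other divergence yields 1, and an empty result yields 0. The sketch below assumes a hypothetical divergence record with a "field" key holding the leaf field name. It does not model --validate-identical.

```python
# Fields whose mismatch triggers STOP (exit 2), per the exit-code table.
STOP_FIELDS = frozenset({"image_loaded_sha256", "xex_entry_point", "iso_sha256"})


def exit_code(divergences):
    """Map a list of divergence records to the documented exit codes.

    Assumption: each record is a dict with a "field" key; the real records
    in diff_state.py may be shaped differently.
    """
    fields = [d["field"] for d in divergences]
    if any(f in STOP_FIELDS for f in fields):
        return 2  # STOP: snapshots are not of the same image/entry point
    if fields:
        return 1  # at least one divergence
    return 0      # no divergence
```

A driver would end with `sys.exit(exit_code(divergences))`.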

Field-comparison rules

The rules live at the top of diff_state.py as Python constants; read those for the authoritative spec. Summary:

  • engine, schema_version, deterministic_skip are always skipped.
  • cpu_state.json: skip hw_id.
  • kernel.json: skip raw_handle_id, exports_registered_count.
  • config.json: skip build_id, iso_path, host_ns_at_snapshot, wall_clock_iso8601, cli_argv, cvars.phase_b_snapshot_dir.
  • Each snapshot's deterministic_skip array is honored too.
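The skip rules above combine into one effective skip set per file. That set holds the global skips, the per-file skips, and whatever each snapshot lists in its own deterministic_skip array. The constant names ALWAYS_SKIP and PER_FILE_SKIP are assumptions for illustration, as is the dotted-path matching in should_compare. The real constants in diff_state.py remain authoritative.

```python
# Illustrative constants mirroring the summary above; the real names and
# values are the constants at the top of diff_state.py.
ALWAYS_SKIP = frozenset({"engine", "schema_version", "deterministic_skip"})
PER_FILE_SKIP = {
    "cpu_state.json": frozenset({"hw_id"}),
    "kernel.json": frozenset({"raw_handle_id", "exports_registered_count"}),
    "config.json": frozenset({
        "build_id", "iso_path", "host_ns_at_snapshot",
        "wall_clock_iso8601", "cli_argv", "cvars.phase_b_snapshot_dir",
    }),
}


def skipped_fields(filename, canary_doc, ours_doc):
    """Effective skip set: global + per-file + both deterministic_skip arrays."""
    skip = set(ALWAYS_SKIP) | PER_FILE_SKIP.get(filename, frozenset())
    for doc in (canary_doc, ours_doc):
        skip.update(doc.get("deterministic_skip", []))
    return skip


def should_compare(path, skip):
    """Assumed matching rule: skip on the full dotted path or its leaf name."""
    return path not in skip and path.split(".")[-1] not in skip
```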

Set vs sequence semantics

  • Set (sort by key, then positional compare):
    • kernel.json::objects (key=handle_semantic_id)
    • kernel.json::handle_name_table (key=name)
    • vfs.json::cache_root_listing (key=relpath)
    • memory.json::heaps (key=base)
  • Sequence (positional compare): everything else, including memory.json::regions (which both engines emit pre-sorted by (start, end)).
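Here is a sketch of the two alignment modes. Set fields are sorted by their key. Elements whose key exists on one side only are split off, since they become σ-structural findings. The remaining elements are paired positionally. Everything else is paired by index. SET_KEYS and the function names are hypothetical, and keys are assumed unique and hashable within each array.

```python
# Hypothetical registry of set-semantics arrays: (file, field) -> sort key.
SET_KEYS = {
    ("kernel.json", "objects"): "handle_semantic_id",
    ("kernel.json", "handle_name_table"): "name",
    ("vfs.json", "cache_root_listing"): "relpath",
    ("memory.json", "heaps"): "base",
}


def align_set(canary, ours, key):
    """Sort by key, split off one-sided keys, pair the rest positionally.

    After one-sided elements are removed, both sorted lists carry the same
    key sequence, so positional pairing matches elements with equal keys.
    """
    c = sorted(canary, key=lambda e: e[key])
    o = sorted(ours, key=lambda e: e[key])
    c_keys = {e[key] for e in c}
    o_keys = {e[key] for e in o}
    only_canary = [e for e in c if e[key] not in o_keys]
    only_ours = [e for e in o if e[key] not in c_keys]
    pairs = list(zip([e for e in c if e[key] in o_keys],
                     [e for e in o if e[key] in c_keys]))
    return pairs, only_canary, only_ours


def align_sequence(canary, ours):
    """Pair by index; the longer list's tail is a length mismatch."""
    n = min(len(canary), len(ours))
    return list(zip(canary, ours)), canary[n:], ours[n:]


def align(filename, field, canary, ours):
    """Dispatch to set or sequence alignment for one array field."""
    key = SET_KEYS.get((filename, field))
    if key is None:
        return align_sequence(canary, ours)
    return align_set(canary, ours, key)
```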

Classification

| class | trigger | priority / handling |
|---|---|---|
| σ-structural | field missing/extra; sequence-length mismatch; set element only in one engine | 1 (always report) |
| δ-content-STOP | image_loaded_sha256 / xex_entry_point / iso_sha256 mismatch | STOP (exit 2) |
| δ-content | any other *_sha256 field differs | 2 |
| γ-kernel-content | objects[].details field differs | 2 (primary Phase C target) |
| κ-cache | non-empty cache_root_listing on either side | re-run after rm -rf of caches |
| ε-host-allocator | heap base/region start differs but sha256 agrees | catalog only |
| τ-host-timing | deterministic_skip-listed timing field | silent unless verbose |
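The classification table can be expressed as code as well. This sketch assumes a hypothetical divergence record with kind, file and path, plus optional sha256_agrees and skip_listed flags. The checks run from most to least severe. κ-cache is modeled as a snapshot-level check rather than a per-field class. The "unclassified" fallback is an addition of this sketch, not a documented class.

```python
# Fields whose mismatch triggers STOP (exit 2), per the table above.
STOP_FIELDS = frozenset({"image_loaded_sha256", "xex_entry_point", "iso_sha256"})


def classify(d):
    """Return the class label for one divergence record.

    Hypothetical record shape (not diff_state.py's real format):
      kind           "structural" (missing/extra field, length mismatch,
                     set element in one engine only) or "value"
      file           snapshot file name, e.g. "kernel.json"
      path           dotted field path, e.g. "objects.3.details.thread_id"
      sha256_agrees  optional; True if contents match despite an address diff
      skip_listed    optional; True for a deterministic_skip-listed timing field
    """
    if d["kind"] == "structural":
        return "σ-structural"
    path = d["path"]
    leaf = path.split(".")[-1]
    if leaf in STOP_FIELDS:
        return "δ-content-STOP"
    if d.get("skip_listed"):
        return "τ-host-timing"
    if leaf.endswith("_sha256"):
        return "δ-content"
    if d["file"] == "kernel.json" and path.startswith("objects.") and ".details." in path:
        return "γ-kernel-content"
    if (d["file"] == "memory.json" and leaf in ("base", "start")
            and d.get("sha256_agrees")):
        return "ε-host-allocator"
    # Not covered by the table; a fallback added by this sketch.
    return "unclassified"


def cache_dirty(canary_vfs, ours_vfs):
    """κ-cache: either engine reports a non-empty cache_root_listing."""
    return bool(canary_vfs.get("cache_root_listing")) or bool(
        ours_vfs.get("cache_root_listing")
    )
```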

Negative-test recipe

To verify that the tool catches a hand-made mutation:

```sh
cp -r snap-001/ours snap-001/ours-mut
# GNU sed. The ([^0-9]|$) anchor leaves values such as 10 or 12 untouched.
sed -i -E 's/"thread_id": 1([^0-9]|$)/"thread_id": 999\1/' snap-001/ours-mut/kernel.json
python3 tools/diff-state/diff_state.py \
  --canary snap-001/ours --ours snap-001/ours-mut --out /tmp/r.md
echo "exit=$?"
# exit code 1; report names objects[handle_semantic_id=...] details.thread_id
```
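A text substitution like the sed line depends on how kernel.json happens to be formatted. A json-based mutation instead matches the integer exactly. It also reports how many objects it changed, which guards against a vacuous negative test. The sketch assumes the objects[].details.thread_id layout implied by the expected report line. mutate_thread_ids and mutate_file are hypothetical helpers.

```python
import json
from pathlib import Path


def mutate_thread_ids(doc, old=1, new=999):
    """Set objects[].details.thread_id from old to new in a parsed kernel.json.

    Returns the number of objects changed so a caller can assert the
    mutation actually happened.
    """
    hits = 0
    for obj in doc.get("objects", []):
        details = obj.get("details")
        if isinstance(details, dict) and details.get("thread_id") == old:
            details["thread_id"] = new
            hits += 1
    return hits


def mutate_file(path, old=1, new=999):
    """Read, mutate and rewrite a kernel.json file; returns the hit count."""
    path = Path(path)
    doc = json.loads(path.read_text())
    hits = mutate_thread_ids(doc, old, new)
    path.write_text(json.dumps(doc, indent=2) + "\n")
    return hits
```

For example, run `assert mutate_file("snap-001/ours-mut/kernel.json") > 0` after the cp step and before invoking diff_state.py.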