xenia-rs/tools/diff-state/README.md
MechaCat02 ad45873a1b ITERATE-2.V: scheduler priority aging closes 18-day AUDIT-049 wedge
Priority aging in xenia-cpu/scheduler.rs:pick_runnable
(effective_priority = base + age_bonus(now_round - last_run_round),
capped at +31, AGING_ROUNDS_PER_BONUS=1). Strict-priority was parking
priority=0 threads behind CPU-bound priority=15 audio mixer
(sub_824D1328 guest spinwait at PC=0x824d1404 on CPU5). Aging
eventually picks the starved thread, breaking the producer-consumer
cycle that caused 5-tid wedge at PC=0x824ac578 since AUDIT-049 (10 May).

Cascade observed: tid=13 clean exit; events 121K -> 13M (107x); last
host_ns 767ms -> 51,011ms (66x); 8 new threads spawn; VdSwap 1 -> 2.

Complete two-day iterate sequence (2026-05-27 -> 2026-05-28):
- 2.F: VdSwap drain timeout 900ms -> 1ms (xenia-gpu/handle.rs); 876x
       perf win on VdSwap kernel callback
- 2.H: vA0000000 physical heap bucket added (state.rs, exports.rs);
       ctx_ptrs now in 0xA0000000-0xBFFFFFFF range matching canary
- 2.L: Phase-A diff harness categorized [return_value mismatch],
       [status mismatch], [args_resolved.path mismatch] tags
       (tools/diff-events/diff_events.py); closes reading-error #41
       (silent test-harness state leak invalidating trace diffs)
- 2.M: always-on exit-thread-state.json sibling to Phase-A JSONL
       (event_log.rs + xenia-app/main.rs); closes reading-error #42
       (Phase-A blind to blocked-forever waits)
- 2.Q: signal.match kernel instrumentation in NtSetEvent /
       NtReleaseSemaphore / KeSetEvent / KeReleaseSemaphore
       (exports.rs); emits target_handle + waiter_count + waiter_tids
- 2.T: wake.requested kernel instrumentation in wake_eligible_waiters
       (exports.rs); emits target_tid + transition + new_state
- 2.V: scheduler priority aging (xenia-cpu/scheduler.rs) [keystone]

Plus accumulated WIP from earlier May (contention_manifest,
phase_b_snapshot, xam/xaudio enhancements, analysis db, xex loader,
xenia-app main loop, etc.). Audit-runs/ artifacts remain untracked
per project convention.

Tests: 300 xenia-cpu / 227 xenia-kernel / 5 xenia-app / 19 xenia-path
/ 30+ smaller suites -- all PASS, 0 regressions. Determinism preserved
(2x cold runs bit-identical at 13,003,881 events post-2.V).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 07:27:26 +02:00

# diff-state
Phase B initial-state snapshot diff tool. Stdlib-only Python. Mirrors the
shape of `tools/diff-events/` but operates on the *static structural*
snapshots emitted by `phase_b_snapshot` at the moment immediately before
the first guest PPC instruction of the XEX entry_point executes.
## Usage
```bash
python3 tools/diff-state/diff_state.py \
    --canary <snapshot_dir>/canary \
    --ours <snapshot_dir>/ours \
    --out <snapshot_dir>/report.md
```
Writes:
- `<snapshot_dir>/report.md` — human-readable divergence catalog
- `<snapshot_dir>/report.json` — machine-readable sibling (same content)
## Exit codes
| code | meaning |
|---|---|
| 0 | no divergence (or `--validate-identical` succeeded) |
| 1 | divergences found |
| 2 | STOP triggered (`image_loaded_sha256` / `xex_entry_point` / `iso_sha256` mismatch) |
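
A wrapper script or CI step can branch on these codes. The following is a minimal sketch, not part of the tool: `describe_exit` and `run_diff_state` are hypothetical helper names, and the only facts taken from this README are the three exit codes and the `--canary` / `--ours` / `--out` flags.

```python
import subprocess
import sys

# Exit-code meanings copied from the table above.
EXIT_MEANINGS = {
    0: "no divergence",
    1: "divergences found",
    2: "STOP triggered",
}


def describe_exit(code: int) -> str:
    """Map a diff_state.py exit code to a short human-readable label."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_diff_state(canary: str, ours: str, out: str) -> int:
    """Run the tool from the repository root and return its exit code."""
    cmd = [
        sys.executable, "tools/diff-state/diff_state.py",
        "--canary", canary,
        "--ours", ours,
        "--out", out,
    ]
    # No check=True: codes 1 and 2 are expected outcomes, not crashes.
    return subprocess.run(cmd).returncode
```

A caller would typically treat 2 as a hard failure (the two engines did not load the same image) and 1 as "read the report".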
## Field-comparison rules
The rules live at the top of `diff_state.py` as Python constants; read those
for the authoritative spec. Summary:
- `engine`, `schema_version`, `deterministic_skip` are always skipped.
- `cpu_state.json`: skip `hw_id`.
- `kernel.json`: skip `raw_handle_id`, `exports_registered_count`.
- `config.json`: skip `build_id`, `iso_path`, `host_ns_at_snapshot`,
`wall_clock_iso8601`, `cli_argv`, `cvars.phase_b_snapshot_dir`.
- Each snapshot's `deterministic_skip` array is honored too.
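
As an illustration of how the summary above could be applied, here is a rough sketch. The constant and function names (`ALWAYS_SKIP`, `PER_FILE_SKIP`, `skipped_fields`, `prune`) are hypothetical and need not match `diff_state.py`. The matching rule is also an assumption: a key is skipped when either its leaf name or its dotted path from the file root appears in the skip set.

```python
# Field names taken from the summary above; the structure is illustrative.
ALWAYS_SKIP = {"engine", "schema_version", "deterministic_skip"}
PER_FILE_SKIP = {
    "cpu_state.json": {"hw_id"},
    "kernel.json": {"raw_handle_id", "exports_registered_count"},
    "config.json": {
        "build_id", "iso_path", "host_ns_at_snapshot",
        "wall_clock_iso8601", "cli_argv", "cvars.phase_b_snapshot_dir",
    },
}


def skipped_fields(filename, snapshot):
    """Union of global, per-file, and snapshot-declared skip entries."""
    skip = set(ALWAYS_SKIP)
    skip |= PER_FILE_SKIP.get(filename, set())
    skip |= set(snapshot.get("deterministic_skip", []))
    return skip


def prune(node, skip, path=""):
    """Return a copy of node with skipped keys removed (assumed matching rule)."""
    if isinstance(node, dict):
        kept = {}
        for key, value in node.items():
            dotted = f"{path}.{key}" if path else key
            if key in skip or dotted in skip:
                continue
            kept[key] = prune(value, skip, dotted)
        return kept
    if isinstance(node, list):
        # List elements inherit the path of the list itself.
        return [prune(item, skip, path) for item in node]
    return node
```

Pruning both snapshots with the same skip set before comparing keeps the comparison logic itself free of special cases.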
## Set vs sequence semantics
- **Set** (sort by key, then positional compare):
  - `kernel.json::objects` (key=`handle_semantic_id`)
  - `kernel.json::handle_name_table` (key=`name`)
  - `vfs.json::cache_root_listing` (key=`relpath`)
  - `memory.json::heaps` (key=`base`)
- **Sequence** (positional compare): everything else, including
  `memory.json::regions` (which both engines emit pre-sorted by
  `(start, end)`).
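
The difference between the two modes can be sketched as follows. This is an illustrative reading of the rules above, not the tool's code: `SET_KEYS`, `normalize`, and `positional_diff` are hypothetical names, and the sketch assumes every element of a set-like list carries its key with mutually comparable values.

```python
# (file, field) -> sort key, taken from the Set bullets above.
SET_KEYS = {
    ("kernel.json", "objects"): "handle_semantic_id",
    ("kernel.json", "handle_name_table"): "name",
    ("vfs.json", "cache_root_listing"): "relpath",
    ("memory.json", "heaps"): "base",
}


def normalize(filename, field, items):
    """Sort set-like lists by their key; leave sequences in emitted order."""
    key = SET_KEYS.get((filename, field))
    if key is None:
        return list(items)
    return sorted(items, key=lambda item: item[key])


def positional_diff(canary, ours):
    """Return (length_mismatch, indices whose elements differ)."""
    differing = [i for i, (a, b) in enumerate(zip(canary, ours)) if a != b]
    return len(canary) != len(ours), differing
```

With this approach, two engines that emit the same handle-name entries in a different order compare as equal, while the same reordering inside `memory.json::regions` is reported.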
## Classification
| class | trigger | priority |
|---|---|---|
| σ-structural | field missing/extra; sequence-length mismatch; set element only in one engine | 1 (always report) |
| δ-content-STOP | `image_loaded_sha256` / `xex_entry_point` / `iso_sha256` mismatch | STOP (exit 2) |
| δ-content | other `*_sha256` field differs | 2 |
| γ-kernel-content | `objects[].details` field differs | 2 — primary Phase C target |
| κ-cache | non-empty `cache_root_listing` on either side | re-run after `rm -rf` of caches |
| ε-host-allocator | heap base/region start differs but sha256 agrees | catalog only |
| τ-host-timing | `deterministic_skip`-listed timing field | silent unless verbose |
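
The three content-level rows of the table can be expressed as a small decision function. This is only a sketch: `classify_content_diff` is a hypothetical name, the precedence (STOP fields first, then other `*_sha256` fields, then `objects[].details`) is an assumption, and the σ, κ, ε, and τ classes are deliberately left out because they depend on structure rather than on a single field name.

```python
# Fields whose mismatch triggers STOP, per the table above.
STOP_FIELDS = {"image_loaded_sha256", "xex_entry_point", "iso_sha256"}


def classify_content_diff(field_name, in_object_details=False):
    """Classify one differing leaf field; returns None for rows not sketched here."""
    if field_name in STOP_FIELDS:
        return "δ-content-STOP"
    if field_name.endswith("_sha256"):
        return "δ-content"
    if in_object_details:
        return "γ-kernel-content"
    return None
```

Checking the STOP fields first matters because two of them also end in `_sha256` and would otherwise be reported as ordinary δ-content.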
## Negative-test recipe
To verify the tool catches a hand-mutation:
```bash
cp -r snap-001/ours snap-001/ours-mut
sed -i 's/"thread_id": 1/"thread_id": 999/' snap-001/ours-mut/kernel.json
python3 tools/diff-state/diff_state.py \
    --canary snap-001/ours --ours snap-001/ours-mut --out /tmp/r.md
# exit code 1; report names objects[handle_semantic_id=...] details.thread_id
```
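
BSD and macOS `sed` require a backup-suffix argument after `-i`, so the recipe above is GNU-specific as written. A stdlib-only Python equivalent of the copy-and-mutate step might look like the sketch below. `make_mutated_copy` is a hypothetical helper, not something shipped with the tool. Like the `sed` expression, it replaces the first match on each line, and it also matches longer IDs that merely start with `1` (for example `"thread_id": 12`).

```python
import shutil
from pathlib import Path


def make_mutated_copy(src, dst,
                      old='"thread_id": 1', new='"thread_id": 999'):
    """Copy a snapshot directory and mutate kernel.json in the copy.

    Returns the number of lines that were changed.
    """
    src, dst = Path(src), Path(dst)
    shutil.copytree(src, dst)  # fails if dst already exists
    target = dst / "kernel.json"
    lines = target.read_text(encoding="utf-8").splitlines(keepends=True)
    # Replace only the first occurrence per line, as sed does without /g.
    mutated = [line.replace(old, new, 1) for line in lines]
    target.write_text("".join(mutated), encoding="utf-8")
    return sum(1 for a, b in zip(lines, mutated) if a != b)
```

A return value of 0 means the pattern did not match and the negative test would pass vacuously, so it is worth checking before running the diff.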