ITERATE-2.V: scheduler priority aging closes 18-day AUDIT-049 wedge

Add priority aging to xenia-cpu/scheduler.rs:pick_runnable
(effective_priority = base + age_bonus(now_round - last_run_round),
capped at +31, AGING_ROUNDS_PER_BONUS=1). Strict-priority scheduling
was parking priority=0 threads behind the CPU-bound priority=15 audio
mixer (sub_824D1328 guest spinwait at PC=0x824d1404 on CPU5). With
aging, the starved thread is eventually picked. That breaks the
producer-consumer cycle behind the 5-tid wedge at PC=0x824ac578, which
had persisted since AUDIT-049 (10 May).
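A minimal Python sketch of the aging rule. It assumes that higher numbers mean higher priority (implied by priority=15 starving priority=0) and that age_bonus is linear, giving one point per AGING_ROUNDS_PER_BONUS rounds. Thread, the tie-break, and this pick_runnable are hypothetical stand-ins for the Rust code, not a copy of it:

```python
from dataclasses import dataclass

AGING_ROUNDS_PER_BONUS = 1  # value from the commit message
MAX_AGE_BONUS = 31          # "+31" cap from the commit message


@dataclass
class Thread:  # hypothetical stand-in for the scheduler's thread record
    tid: int
    base: int
    last_run_round: int


def age_bonus(rounds_waiting: int) -> int:
    # Assumed linear: one bonus point per AGING_ROUNDS_PER_BONUS rounds, capped.
    return min(rounds_waiting // AGING_ROUNDS_PER_BONUS, MAX_AGE_BONUS)


def effective_priority(t: Thread, now_round: int) -> int:
    return t.base + age_bonus(now_round - t.last_run_round)


def pick_runnable(runnable: list[Thread], now_round: int) -> Thread:
    # Highest effective priority wins; ties go to the longest-waiting thread.
    best = max(runnable, key=lambda t: (effective_priority(t, now_round), -t.last_run_round))
    best.last_run_round = now_round
    return best
```

In this toy model, a base-15 thread that runs every round sits at an effective priority of 16. A base-0 thread ages by one point per round and first gets picked on round 16. Under strict priority it would never run.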

Cascade observed: tid=13 exits cleanly; events 121K -> 13M (107x); last
host_ns 767ms -> 51,011ms (66x); 8 new threads spawn; VdSwap 1 -> 2.

Complete two-day iterate sequence (2026-05-27 -> 2026-05-28):
- 2.F: VdSwap drain timeout 900ms -> 1ms (xenia-gpu/handle.rs); 876x
  perf win on the VdSwap kernel callback
- 2.H: vA0000000 physical heap bucket added (state.rs, exports.rs);
  ctx_ptrs now fall in the 0xA0000000-0xBFFFFFFF range, matching canary
- 2.L: Phase-A diff harness emits categorized
  [return_value mismatch], [status mismatch] and
  [args_resolved.path mismatch] tags (tools/diff-events/diff_events.py);
  closes reading-error #41 (silent test-harness state leak invalidating
  trace diffs)
- 2.M: always-on exit-thread-state.json written as a sibling to the
  Phase-A JSONL (event_log.rs + xenia-app/main.rs); closes reading-error
  #42 (Phase-A was blind to blocked-forever waits)
- 2.Q: signal.match kernel instrumentation in NtSetEvent /
  NtReleaseSemaphore / KeSetEvent / KeReleaseSemaphore (exports.rs);
  emits target_handle + waiter_count + waiter_tids
- 2.T: wake.requested kernel instrumentation in wake_eligible_waiters
  (exports.rs); emits target_tid + transition + new_state
- 2.V: scheduler priority aging (xenia-cpu/scheduler.rs) [keystone]

Also includes accumulated WIP from earlier in May (contention_manifest,
phase_b_snapshot, xam/xaudio enhancements, analysis db, xex loader,
xenia-app main loop, etc.). Audit-runs/ artifacts remain untracked
per project convention.

Tests: 300 xenia-cpu / 227 xenia-kernel / 5 xenia-app / 19 xenia-path
/ 30+ smaller suites -- all PASS, 0 regressions. Determinism preserved
(two cold runs bit-identical at 13,003,881 events post-2.V).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

tools/diff-state/README.md (new file, 75 lines)

# diff-state

Phase B initial-state snapshot diff tool. Stdlib-only Python. Mirrors the
shape of `tools/diff-events/` but operates on the *static structural*
snapshots emitted by `phase_b_snapshot` at the moment immediately before
the first guest PPC instruction at the XEX `entry_point` executes.

## Usage

```bash
python3 tools/diff-state/diff_state.py \
  --canary <snapshot_dir>/canary \
  --ours <snapshot_dir>/ours \
  --out <snapshot_dir>/report.md
```

Writes:

- `<snapshot_dir>/report.md`: human-readable divergence catalog
- `<snapshot_dir>/report.json`: machine-readable sibling (same content)
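A minimal sketch of the I/O side. It assumes each engine's snapshot directory holds flat JSON files (`kernel.json`, `memory.json`, and so on) and that `report.json` is derived from the `--out` path by swapping the suffix. `load_snapshot`, `pair_snapshots`, and `report_paths` are hypothetical helpers, not functions taken from `diff_state.py`:

```python
import json
from pathlib import Path


def load_snapshot(snapshot_dir):
    """Parse every top-level *.json file in one engine's snapshot directory."""
    return {
        p.name: json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(Path(snapshot_dir).glob("*.json"))
    }


def pair_snapshots(canary_dir, ours_dir):
    """Pair same-named files; names present on only one side are structural."""
    canary, ours = load_snapshot(canary_dir), load_snapshot(ours_dir)
    one_sided = sorted(canary.keys() ^ ours.keys())
    paired = {
        name: (canary[name], ours[name])
        for name in sorted(canary.keys() & ours.keys())
    }
    return paired, one_sided


def report_paths(out_md):
    """Assumed layout: report.json sits next to the --out Markdown file."""
    md = Path(out_md)
    return md, md.with_suffix(".json")
```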

## Exit codes

| code | meaning |
|---|---|
| 0 | no divergence (or `--validate-identical` succeeded) |
| 1 | divergences found |
| 2 | STOP triggered (`image_loaded_sha256` / `xex_entry_point` / `iso_sha256` mismatch) |
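The table implies a precedence: a STOP field outranks ordinary divergences. The hypothetical sketch below assumes divergences are reported as dotted field paths and does not model `--validate-identical`:

```python
STOP_FIELDS = {"image_loaded_sha256", "xex_entry_point", "iso_sha256"}


def exit_code(divergent_paths):
    """Map the dotted paths of all divergent fields to the documented exit code."""
    leaves = {p.rsplit(".", 1)[-1] for p in divergent_paths}
    if leaves & STOP_FIELDS:
        return 2  # STOP: runs disagree on the image, ISO, or entry point
    if leaves:
        return 1  # at least one ordinary divergence
    return 0  # nothing diverged
```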

## Field-comparison rules

The rules live at the top of `diff_state.py` as Python constants; read those
for the authoritative spec. Summary:

- `engine`, `schema_version`, `deterministic_skip` are always skipped.
- `cpu_state.json`: skip `hw_id`.
- `kernel.json`: skip `raw_handle_id`, `exports_registered_count`.
- `config.json`: skip `build_id`, `iso_path`, `host_ns_at_snapshot`,
  `wall_clock_iso8601`, `cli_argv`, `cvars.phase_b_snapshot_dir`.
- Each snapshot's `deterministic_skip` array is honored too.
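A sketch of how these skip rules might compose. It assumes dotted field paths such as `objects[3].raw_handle_id`, that a rule matches either the whole path or its trailing segments, and that `deterministic_skip` entries use the same dotted form. The constant names and `is_skipped` are hypothetical; the real constants in `diff_state.py` remain authoritative:

```python
ALWAYS_SKIP = {"engine", "schema_version", "deterministic_skip"}
PER_FILE_SKIP = {
    "cpu_state.json": {"hw_id"},
    "kernel.json": {"raw_handle_id", "exports_registered_count"},
    "config.json": {
        "build_id", "iso_path", "host_ns_at_snapshot",
        "wall_clock_iso8601", "cli_argv", "cvars.phase_b_snapshot_dir",
    },
}


def _matches(field_path, rule):
    # Whole-path or trailing-suffix match, so "raw_handle_id" also
    # covers "objects[3].raw_handle_id".
    return field_path == rule or field_path.endswith("." + rule)


def is_skipped(file_name, field_path, snapshot):
    """True if field_path in file_name should be excluded from comparison."""
    rules = (
        ALWAYS_SKIP
        | PER_FILE_SKIP.get(file_name, set())
        # Assumed: the snapshot's own list holds dotted paths as well.
        | set(snapshot.get("deterministic_skip", []))
    )
    return any(_matches(field_path, rule) for rule in rules)
```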

## Set vs sequence semantics

- **Set** (sort by key, then positional compare):
  - `kernel.json::objects` (key=`handle_semantic_id`)
  - `kernel.json::handle_name_table` (key=`name`)
  - `vfs.json::cache_root_listing` (key=`relpath`)
  - `memory.json::heaps` (key=`base`)
- **Sequence** (positional compare): everything else, including
  `memory.json::regions` (which both engines emit pre-sorted by
  `(start, end)`).
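A sketch of the two alignment modes. For set arrays it indexes both sides by key (assumed unique) and walks the keys in sorted order. That is one way to realize "sort by key, then positional compare" while still naming elements present in only one engine, and the real tool may align differently. `SET_KEYS` and `align` are hypothetical names:

```python
# Hypothetical mirror of the set-keyed arrays listed above.
SET_KEYS = {
    ("kernel.json", "objects"): "handle_semantic_id",
    ("kernel.json", "handle_name_table"): "name",
    ("vfs.json", "cache_root_listing"): "relpath",
    ("memory.json", "heaps"): "base",
}


def align(file_name, array_name, canary, ours):
    """Return (pairs, only_canary, only_ours) for one array from each engine."""
    key = SET_KEYS.get((file_name, array_name))
    if key is None:
        # Sequence: compare by position; a longer side leaves a structural tail.
        n = min(len(canary), len(ours))
        return list(zip(canary, ours)), canary[n:], ours[n:]
    # Set: index by key (assumed unique), then walk keys in sorted order.
    c = {elem[key]: elem for elem in canary}
    o = {elem[key]: elem for elem in ours}
    pairs = [(c[k], o[k]) for k in sorted(c.keys() & o.keys())]
    only_canary = [c[k] for k in sorted(c.keys() - o.keys())]
    only_ours = [o[k] for k in sorted(o.keys() - c.keys())]
    return pairs, only_canary, only_ours
```

Anything in `only_canary` or `only_ours` corresponds to the σ-structural "set element only in one engine" case.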

## Classification

| class | trigger | priority / handling |
|---|---|---|
| σ-structural | field missing/extra; sequence-length mismatch; set element only in one engine | 1 (always report) |
| δ-content-STOP | `image_loaded_sha256` / `xex_entry_point` / `iso_sha256` mismatch | STOP (exit 2) |
| δ-content | other `*_sha256` field differs | 2 |
| γ-kernel-content | `objects[].details` field differs | 2 (primary Phase C target) |
| κ-cache | non-empty `cache_root_listing` on either side | re-run after `rm -rf` of caches |
| ε-host-allocator | heap base/region start differs but sha256 agrees | catalog only |
| τ-host-timing | `deterministic_skip`-listed timing field | silent unless verbose |

## Negative-test recipe

To verify the tool catches a hand-mutation:

```bash
cp -r snap-001/ours snap-001/ours-mut
# \b (GNU sed) keeps the pattern from also rewriting thread_id 10, 11, ...
sed -i 's/"thread_id": 1\b/"thread_id": 999/' snap-001/ours-mut/kernel.json
# Diff the unmutated snapshot against the mutated copy.
python3 tools/diff-state/diff_state.py \
  --canary snap-001/ours --ours snap-001/ours-mut --out /tmp/r.md
echo "exit=$?"
# exit code 1; report names objects[handle_semantic_id=...] details.thread_id
```
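The `sed` edit works on raw text, so formatting differences can make it miss or over-match. A JSON-aware alternative is sketched below. It assumes `kernel.json` has an `objects` array whose entries carry a `details` object with `thread_id`, as the report path in the recipe suggests. `mutate_thread_id` is a hypothetical helper:

```python
import json
from pathlib import Path


def mutate_thread_id(kernel_json, old, new):
    """Change details.thread_id from old to new on every kernel object; return the count."""
    path = Path(kernel_json)
    doc = json.loads(path.read_text(encoding="utf-8"))
    changed = 0
    for obj in doc.get("objects", []):
        details = obj.get("details")
        # Exact value comparison, so thread_id 10 or 12 is never touched.
        if isinstance(details, dict) and details.get("thread_id") == old:
            details["thread_id"] = new
            changed += 1
    # Re-serializing changes whitespace only; the diff tool parses JSON anyway.
    path.write_text(json.dumps(doc, indent=2) + "\n", encoding="utf-8")
    return changed
```

For example, `mutate_thread_id("snap-001/ours-mut/kernel.json", 1, 999)` can stand in for the `sed` line, with the rest of the recipe unchanged.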