Iterate 2.M — Always-on structured exit-state dump (writer report)
Date: 2026-05-28. LOC delta: engine +143 (xenia-app main.rs +128, xenia-kernel event_log.rs +15). Tests: xenia-kernel 227/227 PASS + xenia-app 5/5 + 2 ignored + 1 ignored = ZERO regressions. Cascade: N/A — diagnostic, not investigation (tripstone #40).
Headline
STRUCTURED-EXIT-DUMP-LANDED. Every exec invocation now emits
<phase-A-trace-dir>/exit-thread-state.json at exit time, regardless
of --quiet. The dump contains every alive thread (tid, hw_id, idx,
pc, lr, sp, priority, affinity, suspend_count, state) plus a
wedge_map cross-referencing every blocked-forever wait into
{waiter_tid, waiter_pc, handle, handle_type, signaler_tid_if_known,
human summary}. Closes reading-error #42 — Phase-A JSONL is now never
the sole source of exit-time ground truth.
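As a rough illustration of how a wedge_map row could be derived, the sketch below joins each blocked thread's wait handles against an object table. The WaitObject, BlockedThread and WedgeEntry types are hypothetical stand-ins, not the engine's real types. Treating a Thread handle's target as its signaler is an assumption, and tid 8's PC is a placeholder because the report does not quote it.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified kernel object that a thread can block on.
enum WaitObject {
    Thread { thread_id: u32 },
    Semaphore { count: i32, max: i32 },
    Event { signaled: bool },
}

/// Hypothetical record of one blocked thread and the handles it waits on.
struct BlockedThread {
    tid: u32,
    pc: u32,
    handles: Vec<u32>,
}

/// One wedge_map row, using the field names from the headline.
struct WedgeEntry {
    waiter_tid: u32,
    waiter_pc: u32,
    handle: u32,
    handle_type: &'static str,
    signaler_tid_if_known: Option<u32>,
    summary: String,
}

fn build_wedge_map(
    threads: &[BlockedThread],
    objects: &BTreeMap<u32, WaitObject>,
) -> Vec<WedgeEntry> {
    let mut out = Vec::new();
    for t in threads {
        for &handle in &t.handles {
            // Handles with no known object are skipped in this sketch.
            if let Some(obj) = objects.get(&handle) {
                let (handle_type, signaler, desc) = match obj {
                    // Assumption: a thread handle is signaled by that thread exiting.
                    WaitObject::Thread { thread_id } => {
                        ("Thread", Some(*thread_id), format!("Thread(id={thread_id})"))
                    }
                    WaitObject::Semaphore { count, max } => {
                        ("Semaphore", None, format!("Semaphore({count}/{max})"))
                    }
                    WaitObject::Event { signaled } => {
                        ("Event", None, format!("Event(sig={signaled})"))
                    }
                };
                out.push(WedgeEntry {
                    waiter_tid: t.tid,
                    waiter_pc: t.pc,
                    handle,
                    handle_type,
                    signaler_tid_if_known: signaler,
                    summary: format!("tid={} → {}", t.tid, desc),
                });
            }
        }
    }
    out
}

/// Sample data taken from the spot-check values in this report.
fn sample() -> (Vec<BlockedThread>, BTreeMap<u32, WaitObject>) {
    let threads = vec![
        BlockedThread { tid: 1, pc: 0x824a_c578, handles: vec![0x12c8] },
        BlockedThread { tid: 13, pc: 0x824a_c578, handles: vec![0x12d0] },
        // Placeholder PC: the report does not quote tid 8's PC.
        BlockedThread { tid: 8, pc: 0, handles: vec![0x10ec, 0x10d8] },
    ];
    let mut objects = BTreeMap::new();
    objects.insert(0x12c8, WaitObject::Thread { thread_id: 13 });
    objects.insert(0x12d0, WaitObject::Event { signaled: false });
    objects.insert(0x10d8, WaitObject::Semaphore { count: 0, max: i32::MAX });
    (threads, objects)
}

fn main() {
    let (threads, objects) = sample();
    for e in build_wedge_map(&threads, &objects) {
        println!(
            "tid={} pc={:#010x} handle={:#010x} type={} signaler={:?} | {}",
            e.waiter_tid, e.waiter_pc, e.handle, e.handle_type,
            e.signaler_tid_if_known, e.summary
        );
    }
}
```

The type of handle 0x10ec is not quoted in this report, so the sample leaves it out of the object table and the sketch skips it.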
Mode
Engine code change in xenia-rs/crates/:
- xenia-kernel/src/event_log.rs:7-22, 48-53, 79-89. Record the Phase-A trace path passed to init() so the dump can derive a sibling path; expose pub fn output_path() -> Option<&'static Path>. ~15 LOC net.
- xenia-app/src/main.rs:4460-4583. New fn write_thread_state_dump(kernel: &KernelState) that builds JSON via serde_json from kernel.scheduler.slots[*].runqueue[*] + kernel.objects[h] and writes to <phase-A-dir>/exit-thread-state.json (CWD fallback when Phase-A is disabled). Always-on (no quiet gate). ~110 LOC body + 13 LOC docstring.
- xenia-app/src/main.rs:2161-2164, 4525-4527. Wire the call into both post-run paths (headless cmd_exec_inner and run_with_ui), immediately after dump_thread_diagnostic. Existing plain-text diagnostic untouched.
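The path tracker and sibling-path derivation described above could look roughly like the following. This is a minimal sketch under assumptions, not the code in event_log.rs. Only the output_path() signature comes from this report; the OnceLock storage, the init() signature and the dump_path helper are hypothetical.

```rust
use std::path::{Path, PathBuf};
use std::sync::OnceLock;

/// Remembers the Phase-A trace path handed to init(); set at most once.
static OUTPUT_PATH: OnceLock<PathBuf> = OnceLock::new();

/// Hypothetical stand-in for the event log's init(): records the path.
pub fn init(trace_path: &Path) {
    // A second call is ignored; the first recorded path wins.
    let _ = OUTPUT_PATH.set(trace_path.to_path_buf());
}

/// Accessor with the signature quoted in this report.
pub fn output_path() -> Option<&'static Path> {
    OUTPUT_PATH.get().map(PathBuf::as_path)
}

/// Hypothetical helper: place the dump next to the trace file, or fall back
/// to a CWD-relative file name when no Phase-A trace path was recorded.
pub fn dump_path(trace: Option<&Path>) -> PathBuf {
    const NAME: &str = "exit-thread-state.json";
    match trace.and_then(Path::parent) {
        Some(dir) => dir.join(NAME),
        None => PathBuf::from(NAME),
    }
}

fn main() {
    // Before init(), the CWD fallback applies.
    println!("{}", dump_path(output_path()).display());

    init(Path::new("audit-runs/iterate-2M-exit-state-dump/ours-cold.jsonl"));
    println!("{}", dump_path(output_path()).display());
}
```

Storing the path in a process-wide static is what allows the accessor to hand out a 'static borrow without the caller holding any logger state.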
Verification gate
Same invocation as 2.J/2.K with no extra flags:
XENIA_CACHE_WIPE=1 timeout 600 ./target/release/xenia-rs exec \
-n 50000000 --quiet \
--phase-a-event-log audit-runs/iterate-2M-exit-state-dump/ours-cold.jsonl \
"<iso>"
Run completed EXIT=0. Stderr emitted (under --quiet):
exit-thread-state: wrote 13 thread(s), 10 wedge entr(ies) to \
audit-runs/iterate-2M-exit-state-dump/exit-thread-state.json
Gate criteria — all PASS
| criterion | result |
|---|---|
| Dump emitted at <output-dir>/exit-thread-state.json without extra flags | PASS |
| Contains all 13 alive threads (matches 2.K's plain-text dump count) | PASS |
| 5 blocked tids at PC 0x824ac578 present and tagged state=Blocked | PASS (tid 1, 13, 4, 5, 3) |
| Wedge map cross-references handle → type → signaler_tid_if_known | PASS (10 entries, all blocked-forever waits) |
| tid=1 → Thread(id=13) circular wait surfaced | PASS (summary: "tid=1 → Thread(id=13)") |
| tid=8 → Semaphore(0/2^31-1) AUDIT-069 work-sem visible | PASS (summary: "tid=8 → Semaphore(0/2147483647)") |
| tid=13 → Event(sig=false) signaler-unknown surfaced | PASS (signaler_tid_if_known: null) |
| Existing === Final State === / === Thread diagnostics === / -- Handle waiter lists -- blocks preserved under non-quiet | PASS (3 grep hits in non-quiet stdout) |
| Structured dump ALSO emits under non-quiet (emission is independent of the quiet flag) | PASS |
Bit-for-bit match against 2.K's exit-diag-full.log
Each of the 8 blocked tids in 2.K's plain-text dump appears in 2.M's
wedge_map/alive_threads with identical handle ids, identical
handle types, identical PC/LR/SP values, identical waiter membership.
Spot-check:
| 2.K plain-text line | 2.M JSON |
|---|---|
| tid=1 ... handles: [4808] ... pc=0x824ac578 | {"tid":1, "handle":"0x000012c8", "pc":"0x824ac578"} (4808=0x12c8) |
| tid=13 ... handles: [4816] ... pc=0x824ac578 | {"tid":13, "handle":"0x000012d0", "pc":"0x824ac578"} (4816=0x12d0) |
| tid=8 ... handles: [4332, 4312] | [{"handle":"0x000010ec"},{"handle":"0x000010d8"}] (4332=0x10ec, 4312=0x10d8) |
| tid=4 ... handles: [4136] | {"tid":4, "handle":"0x00001028"} (4136=0x1028) |
| tid=5 ... handles: [4836] | {"tid":5, "handle":"0x000012e4"} (4836=0x12e4) |
| tid=3 ... handles: [4128] | {"tid":3, "handle":"0x00001020"} (4128=0x1020) |
| tid=8 ... 0x10d8 Semaphore(0/2147483647) | {"type":"Semaphore","count":0,"max":2147483647} |
| 0x12c8 Thread(id=13, exit=None) | {"type":"Thread","thread_id":13,"exited":false} |
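The decimal-to-hex correspondences in the spot-check can be reproduced mechanically. The helper below is a hypothetical illustration of the zero-padded 8-digit format seen in the JSON column; it is not taken from the dump code itself.

```rust
/// Format a handle id as "0x" plus eight lowercase hex digits,
/// matching the shape of the "handle" strings in the JSON column.
fn handle_hex(handle: u32) -> String {
    format!("0x{handle:08x}")
}

fn main() {
    // Decimal handle ids as they appear in 2.K's plain-text dump.
    for dec in [4808u32, 4816, 4332, 4312, 4136, 4836, 4128] {
        println!("{dec} = {}", handle_hex(dec));
    }
}
```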
Existing-mechanism
fn dump_thread_diagnostic (main.rs:3933-4453) produces the plain-text
=== Thread diagnostics === + -- Handle waiter lists -- block when
!quiet. 2.K's exit-diag-full.log was a manual non-quiet re-run.
2.M extends by adding a sibling structured emitter that is always
on; the existing plain-text path is unchanged (still off under
--quiet, still emits identically under non-quiet).
Relationship: the plain-text dump remains the human-readable
walk-the-log artifact; the new JSON is the machine-readable harness
input. They produce the same content from the same KernelState
snapshot; choosing JSON for the new sibling matches Phase-A JSONL's
schema-versioned input style and is jq-friendly.
Test results
- cargo build --release -p xenia-app: OK, 1 pre-existing unrelated warning (phase_b_snapshot.rs::walk_committed_regions dead_code).
- cargo test --release -p xenia-kernel -p xenia-app: 0 failed; 232 passed and 3 ignored out of 235 (227 lib + 5 passed, 2 + 1 ignored, 0 doc).
Use cases
- Next iterate can run jq '.wedge_map[] | select(.waiter_pc == "0x824ac578")' to get the wedge tid set in one line.
- Cross-engine diff: pair canary's analogous exit-state JSON (TBD) with ours via a tools/diff-events-style diff to identify missing-thread (canary tids 15/27/28 = sub_825070F0 family) and missing-signaler (Event handles with waiters_tid ≠ [] and no producer in our trace).
- No more 2.J-class misreadings: a Phase-A trace ending with kernel.return success at the matched-prefix tail will be immediately contradicted by exit-thread-state.json showing those same tids parked indefinitely. The reading-error #42 surface is closed at the output level.
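For consumers that prefer typed code over jq, the same selection can be expressed in a few lines. This is a sketch only: WedgeEntry is a hypothetical stand-in holding just two fields, the PC is modelled as a u32 rather than the hex string the JSON carries, and tid 8's PC is a placeholder because the report does not quote it.

```rust
use std::collections::BTreeSet;

/// Minimal stand-in for a parsed wedge_map entry (only the fields used here).
struct WedgeEntry {
    waiter_tid: u32,
    waiter_pc: u32,
}

/// Collect the distinct tids whose blocked wait sits at the given PC.
fn tids_parked_at(wedge_map: &[WedgeEntry], pc: u32) -> BTreeSet<u32> {
    wedge_map
        .iter()
        .filter(|e| e.waiter_pc == pc)
        .map(|e| e.waiter_tid)
        .collect()
}

/// The five tids reported at PC 0x824ac578, plus tid 8 with a placeholder PC.
fn sample() -> Vec<WedgeEntry> {
    let mut map: Vec<WedgeEntry> = [1u32, 13, 4, 5, 3]
        .iter()
        .map(|&tid| WedgeEntry { waiter_tid: tid, waiter_pc: 0x824a_c578 })
        .collect();
    // tid 8 waits on two handles; its PC is not quoted in the report.
    map.push(WedgeEntry { waiter_tid: 8, waiter_pc: 0 });
    map.push(WedgeEntry { waiter_tid: 8, waiter_pc: 0 });
    map
}

fn main() {
    let tids = tids_parked_at(&sample(), 0x824a_c578);
    println!("{tids:?}");
}
```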
Tripstone audit
- #28 (cross-engine tid stability): JSON keys tids by raw integer, which is acceptable for ours-only intra-run reads. For cross-engine diffs against canary, downstream tooling must continue to key on (entry_pc, ctx_ptr); that is a 2.M+1 concern, not a 2.M one. The dump preserves enough columns (hw_id, idx, pc, lr, sp, affinity_mask) for the consumer to do its own re-keying.
- #39 (progression class): 2.M is methodology, not progression. No cascade A/B/C/D claim made. Headline does NOT claim VdSwap/draw movement.
- #40 (single-keystone framing): not applicable — diagnostic, not single-cause investigation.
- #42 (Phase-A blind to blocked-forever waits): CLOSED at the output level by this iterate. Future investigations now have an always-on machine-readable wedge snapshot.
Confidence
- HIGH that the dump emits on every exec run with no extra flags (verified empirically under --quiet AND non-quiet).
- HIGH that content matches 2.K's plain-text dump bit-for-bit (every handle id, every PC, every waiter list line cross-checked).
- HIGH that existing diagnostic mechanism is unbroken (plain-text still emits 3 sections under non-quiet, JSON also emits).
- HIGH that there are ZERO test regressions (0 failed; 232 passed and 3 ignored out of 235).
Artifacts
Under xenia-rs/audit-runs/iterate-2M-exit-state-dump/:
- ours-cold.jsonl (Phase-A trace, 121,569 events, ~28MB, bit-equal to 2.J/2.K)
- ours-cold.stdout.log (empty: quiet mode preserved)
- ours-cold.stderr.log (single line: dump emission notice)
- exit-thread-state.json (the new artifact, 9651 bytes, 13 threads + 10 wedge entries)
- ours-cold-nonquiet.stdout.log / .stderr.log (regression check: existing plain-text diagnostic preserved)
- writer-report.md (this file)
Patch:
- xenia-rs/crates/xenia-kernel/src/event_log.rs (path tracker + accessor)
- xenia-rs/crates/xenia-app/src/main.rs (dump function + 2 call sites)
Next iterate enabler
exit-thread-state.json is now a stable input for:
- Canary parity: add the analogous emitter to canary's exit path so cross-engine wedge-map diffs become trivial.
- Per-handle signaler hunt: for each wedge entry with handle_type=Event, signaler_tid_if_known=null, walk the Phase-A trace for canary's handle-equivalent (semantic_id) signal source; this directly identifies which canary thread/path is missing in ours.
- Regression alarm: a CI step can refuse to merge if len(wedge_map) > N for the boot-replay scenario, preventing silent re-wedges.
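The regression alarm and the signaler hunt both reduce to simple predicates over the wedge map. The sketch below assumes the entries have already been parsed into a hypothetical WedgeEntry type. The budget value is an arbitrary placeholder, since N is not fixed in this report, and the three sample entries stand in for the full 10-entry map.

```rust
use std::process::ExitCode;

/// Minimal stand-in for a parsed wedge_map entry; field names follow the report.
struct WedgeEntry {
    handle_type: &'static str,
    signaler_tid_if_known: Option<u32>,
}

/// Return Err when the wedge map exceeds the allowed budget (the N of the CI rule).
fn check_wedge_budget(wedge_map: &[WedgeEntry], max_allowed: usize) -> Result<(), String> {
    if wedge_map.len() > max_allowed {
        Err(format!(
            "wedge_map has {} entries, budget is {}",
            wedge_map.len(),
            max_allowed
        ))
    } else {
        Ok(())
    }
}

/// Count Event waits with no known signaler: the candidates for the signaler hunt.
fn events_without_signaler(wedge_map: &[WedgeEntry]) -> usize {
    wedge_map
        .iter()
        .filter(|e| e.handle_type == "Event" && e.signaler_tid_if_known.is_none())
        .count()
}

fn main() -> ExitCode {
    // Three illustrative entries modelled on the gate table above.
    let wedge_map = vec![
        WedgeEntry { handle_type: "Thread", signaler_tid_if_known: Some(13) },
        WedgeEntry { handle_type: "Event", signaler_tid_if_known: None },
        WedgeEntry { handle_type: "Semaphore", signaler_tid_if_known: None },
    ];
    println!("events without signaler: {}", events_without_signaler(&wedge_map));

    // Placeholder budget; a real CI step would pick N for the boot-replay scenario.
    match check_wedge_budget(&wedge_map, 10) {
        Ok(()) => ExitCode::SUCCESS,
        Err(msg) => {
            eprintln!("regression alarm: {msg}");
            ExitCode::FAILURE
        }
    }
}
```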