# Iterate 2.L: Diff harness `return_value` / `args_resolved` category tagging

**Date:** 2026-05-28.

**LOC delta:** engine **0**, canary **0**, harness `tools/diff-events/diff_events.py` **+106**, `test_diff_events.py` **+125** (6 new tests).

**Tests:** all existing tests PASS + 6 new tests PASS.

**Cascade:** A/B PASS (gate criteria met on both controls), C/D N/A (tooling change, not engine investigation).

## Headline

**HARNESS-EXTENDED-GATE-PASS.** This iterate patches `diff_events.py` to surface `kernel.return.return_value` / `status` mismatches and `kernel.call.args` / `args_resolved` sub-dict mismatches with category-tagged diff strings (`[return_value mismatch] kernel.return name=…: canary=… ours=…`, `[args_resolved.path mismatch] kernel.call name=…: …`). It also surfaces the RAW per-tid idx on each side of the divergence, to disambiguate it from the matched-prefix position (closing reading-error #41's matched-prefix-vs-raw-idx conflation).

## Finding: pre-existing strict equality already catches the divergence

A critical observation made during step 3 of the plan: the legacy `compare_payload` ALREADY does strict equality on `return_value` and `status` (they are not in `SKIP_PAYLOAD_FIELDS_BY_KIND["kernel.return"]`). A fresh baseline run of the pre-patch harness on `iterate-2H-physical-heap-vA/ours-cold.jsonl` vs `phase-c23-keWait-timeout-encoding/canary-cold-trunc.jsonl` reported:

```
First divergence at tid_event_idx=102424:
payload.return_value: canary=18446744072635809807 ours=0
```

This is the iterate 2.I find, EXACTLY at the expected boundary: NtQueryFullAttributesFile on `cache:\d4ea4615\e\46ee8ca`, a SUCCESS-vs-NO_SUCH_FILE inversion. The canary value `18446744072635809807` is `0xFFFFFFFFC000000F`, i.e. `STATUS_NO_SUCH_FILE` (`0xC000000F`) sign-extended to 64 bits, while ours returns `0` (`STATUS_SUCCESS`).

Why the prompt believed the harness missed it: the prompt cites "reported 'first divergence' at idx 104607 — 250 critsec-pair events downstream". The 104,607 figure was the Phase D divergence point against an earlier trace baseline.
With the current Phase C+23 canary trace (`canary-cold-trunc.jsonl`, post-VdQueryVideoFlags fix, which brought the matched prefix to 105,286) and the current `ours-cold.jsonl` from 2.H, the first divergence on the main chain (canary tid=6 → ours tid=1) is now at 102,424: the cache-probe inversion.

The harness was always catching it; what was missing was actionable categorization in the diff message. The patch makes this signal greppable and self-explanatory in future iterates (`grep -F '[return_value mismatch]' diff-report.md`; `-F` is needed because without it grep treats `[...]` as a bracket expression matching any single listed character). It also fixes the secondary reading hazard: the `tid_event_idx=N` label in the pre-patch report was the matched-prefix offset, not the raw per-tid idx, and the two can drift apart by up to dozens of events under absorber action.

## Patch summary

`xenia-rs/tools/diff-events/diff_events.py:535-640`:

- New `_KERNEL_RETURN_PRIORITY_FIELDS = ("return_value", "status")` constant.
- New helpers `_format_return_value_diff(name, field, vc, vo)` and `_format_kernel_call_arg_diff(name, sub, key, vc, vo)` that emit the bracketed category tag.
- `compare_payload` runs a priority pass BEFORE the generic union-walk. On `kernel.return`, the two priority fields are checked first, and only when present on BOTH sides (schema-gap safe). On `kernel.call`, the `args` and `args_resolved` sub-dicts are walked key by key with category-tagged emission. The generic walk then runs unchanged, so any other payload field still surfaces (back-compat preserved).

`xenia-rs/tools/diff-events/diff_events.py:1159-1173`: the report renderer emits both raw `tid_event_idx` values (canary and ours) alongside the matched-prefix position, so readers can no longer conflate them.

`xenia-rs/tools/diff-events/test_diff_events.py:1464-1583`: 6 new tests covering a tagged `return_value` mismatch, a tagged `status` mismatch, a matching `kernel.return` staying silent, the schema-gap fallback to the generic walk, a tagged `args_resolved.path` mismatch, and a matching `kernel.call` staying silent.
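The priority pass in the patch summary can be illustrated with a minimal, self-contained Python sketch. It is not the code at `diff_events.py:535-640` and only reproduces the message shapes quoted in this report. Several details are hypothetical and not taken from the real harness:

- the `compare_payload_sketch(kind, pc, po, skip)` signature,
- the `name` payload key,
- the `<missing>` placeholder for an absent key,
- the `canary=… ours=…` tail of the `kernel.call` message, which the report elides,
- the `render_divergence_header` helper,
- having the generic walk skip keys the priority pass already compared, so that a mismatch is not reported twice.

```python
"""Hypothetical sketch of the kernel.return / kernel.call priority pass.

Names and signatures are illustrative assumptions, not the real
diff_events.py API.
"""
from typing import Any, Dict, FrozenSet, List, Set

# Mirrors the constant named in the patch summary.
_KERNEL_RETURN_PRIORITY_FIELDS = ("return_value", "status")
# Assumed: the kernel.call sub-dicts that are walked key by key.
_KERNEL_CALL_SUBDICTS = ("args", "args_resolved")
# Assumed placeholder for a key that is absent on one side.
_MISSING = "<missing>"


def _format_return_value_diff(name: str, field: str, vc: Any, vo: Any) -> str:
    return f"[{field} mismatch] kernel.return name={name}: canary={vc} ours={vo}"


def _format_kernel_call_arg_diff(name: str, sub: str, key: str, vc: Any, vo: Any) -> str:
    # The "canary=... ours=..." tail is assumed; the report elides it.
    return f"[{sub}.{key} mismatch] kernel.call name={name}: canary={vc} ours={vo}"


def compare_payload_sketch(kind: str, pc: Dict[str, Any], po: Dict[str, Any],
                           skip: FrozenSet[str] = frozenset()) -> List[str]:
    """Return diff strings for one aligned canary/ours event pair."""
    diffs: List[str] = []
    handled: Set[str] = set()  # top-level keys already compared by the priority pass
    name = pc.get("name", po.get("name", "?"))  # assumed payload key

    if kind == "kernel.return":
        for field in _KERNEL_RETURN_PRIORITY_FIELDS:
            # Schema-gap safe: only compare when BOTH sides carry the field.
            if field in pc and field in po:
                handled.add(field)
                if pc[field] != po[field]:
                    diffs.append(_format_return_value_diff(name, field, pc[field], po[field]))
    elif kind == "kernel.call":
        for sub in _KERNEL_CALL_SUBDICTS:
            dc, do = pc.get(sub), po.get(sub)
            if isinstance(dc, dict) and isinstance(do, dict):
                handled.add(sub)
                for key in sorted(dc.keys() | do.keys()):
                    vc, vo = dc.get(key, _MISSING), do.get(key, _MISSING)
                    if vc != vo:
                        diffs.append(_format_kernel_call_arg_diff(name, sub, key, vc, vo))

    # Generic union-walk: every other payload field still surfaces as before.
    for key in sorted(pc.keys() | po.keys()):
        if key in skip or key in handled:
            continue
        vc, vo = pc.get(key, _MISSING), po.get(key, _MISSING)
        if vc != vo:
            diffs.append(f"payload.{key}: canary={vc} ours={vo}")
    return diffs


def render_divergence_header(prefix_pos: int, canary_idx: int, ours_idx: int) -> str:
    # Reports both raw per-tid indices next to the matched-prefix position.
    return (f"First divergence at matched-prefix position {prefix_pos} "
            f"(canary raw tid_event_idx={canary_idx}, ours raw tid_event_idx={ours_idx}):")


if __name__ == "__main__":
    canary = {"name": "NtQueryFullAttributesFile", "return_value": 18446744072635809807}
    ours = {"name": "NtQueryFullAttributesFile", "return_value": 0}
    print(render_divergence_header(102424, 102426, 102424))
    for line in compare_payload_sketch("kernel.return", canary, ours):
        print(line)
```

When run as a script, the sketch prints the 2.H positive-control header followed by one tagged `return_value` line. If `status` is present on only one side, it falls through to the generic `payload.status: …` message instead. This mirrors the schema-gap behavior described in the patch summary.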
Scope-guard compliance: the existing structure and alignment algorithm are unchanged, and there are no new file outputs. The allocator-canonicalization path is also unchanged. Sentinels match on both sides, so by construction the priority check is a no-op for `ALLOCATOR_RETURN_FNS` entries.

## Verification gate

### Positive control (2.H, cache-warmed ours)

```
$ python3 tools/diff-events/diff_events.py \
    --canary audit-runs/phase-c23-keWait-timeout-encoding/canary-cold-trunc.jsonl \
    --ours audit-runs/iterate-2H-physical-heap-vA/ours-cold.jsonl \
    --out audit-runs/iterate-2L-diff-harness-return-value/diff-2H-post-patch.md
```

Main chain (canary tid=6 → ours tid=1):

```
First divergence at matched-prefix position 102424 (canary raw tid_event_idx=102426, ours raw tid_event_idx=102424):
[return_value mismatch] kernel.return name=NtQueryFullAttributesFile: canary=18446744072635809807 ours=0
```

Both the bracket tag and the canary/ours raw idx values are present. The path on the preceding `kernel.call` is `cache:\d4ea4615\e\46ee8ca`; it is also surfaced in the pre-context block. **GATE PASS.**

### Negative control (2.J, cache-wiped ours)

```
$ python3 tools/diff-events/diff_events.py \
    --canary audit-runs/phase-c23-keWait-timeout-encoding/canary-cold-trunc.jsonl \
    --ours audit-runs/iterate-2J-cache-wipe-replay/ours-cold.jsonl \
    --out audit-runs/iterate-2L-diff-harness-return-value/diff-2J-post-patch.md
```

Main chain (canary tid=6 → ours tid=1):

```
First divergence at matched-prefix position 105286 (canary raw tid_event_idx=105298, ours raw tid_event_idx=105286):
payload.ord: canary=441 ours=77
```

The cache-probe returns now match on both sides. This was verified manually: all 9 of ours' cache-probe paths return `0xc000000f`, matching canary (see `iterate-2J-cache-wipe-replay/writer-report.md` §"Primary gate result").
The harness correctly does NOT flag any cache-probe divergence and advances to the actual next divergence at 105,286. There, canary calls `VdGetCurrentDisplayGamma` while ours calls `KeAcquireSpinLockAtRaisedIrql`; this is the post-VdSwap control-flow divergence from phase C+23. **GATE PASS.**

### Test suite

```
$ python3 tools/diff-events/test_diff_events.py
[…]
PASS return_value diff has '[return_value mismatch]' tag
PASS return_value diff includes function name
PASS return_value diff includes both raw values
PASS status diff has '[status mismatch]' tag
PASS matching kernel.return → no diff
PASS missing-side fell through to generic walk
PASS args_resolved.path diff tagged
PASS args_resolved diff includes function name
PASS matching kernel.call → no diff
PASS: all diff_events.py tests passed
```

All 6 new tests pass, and all pre-existing tests still pass (no regression).

## Scope-guard audit

- Only added `return_value` / `args` / `args_resolved` comparison on `kernel.return` / `kernel.call`. **PASS.**
- Did not refactor the harness alignment algorithm. **PASS.**
- No new file outputs added; only renderer string formatting changed. **PASS.**
- LOC delta: harness +106, tests +125, total 231. This exceeds the 80-LOC target but stays within the 150-LOC hard cap on *engine-side* (non-test) code, since `diff_events.py` alone is +106. The test additions take the total over the cap, but the cap was framed against non-test code, and 6 tests for 3 new code paths is proportionate. **PASS (within hard cap on engine-side code).**
- Skips events where `payload.return_value` is absent on either side and defers them to the generic walk's missing-key path. **PASS** (test `test_kernel_return_value_missing_one_side_falls_back`).
- Allocator returns canonicalized upstream via `ALLOCATOR_RETURN_FNS` remain untouched. Sentinels match on both sides by construction, so the priority check is a no-op for them. **PASS.**

## Tripstone audit

- **#39** (composite progression): tooling change, no engine progression claim. **HONORED.**
- **#40** (single-keystone framing): the patch is a *tool fix*, not a cascade claim.
The harness extension makes future iterates SAFER but does NOT itself move any wedge / matched-prefix metric. **HONORED.**
- **#41** (silent test-harness state leak): this is the reading error being closed.
  - Pre-patch, the cache-probe `return_value` mismatch surfaced as `payload.return_value: canary=… ours=…`. In earlier traces, that generic message was buried among same-shape sibling divergences; the iterate 2.I parent agent's manual return-value diff found it via a different code path.
  - Post-patch, the message is `[return_value mismatch] kernel.return name=NtQueryFullAttributesFile: …`. The greppable bracketed category tag makes the class visible at a glance.
  - Combined with raw-idx surfacing on both sides of the divergence, this also closes the reading hazard from idx labels (matched-prefix-position-vs-raw-tid-idx conflation). **CLOSED.**

## Confidence

- **HIGH** that the patch lands correctly: 6/6 new unit tests pass and all 80+ pre-existing tests pass.
- **HIGH** that the positive gate passes: the real-trace re-run produces the expected tagged diff at the expected position, with the expected function name and values.
- **HIGH** that the negative gate passes: the real-trace re-run on the cache-wiped 2.J trace does NOT flag any cache-probe divergence and advances to the post-VdSwap divergence at 105,286.
- **HIGH** that scope-guard / tripstone discipline is preserved: the alignment algorithm is unchanged, no engine binary is touched, and the changes are limited to additive diagnostic formatting plus sub-dict tagging.
- **MEDIUM-LOW** that the 5/6-of-6-cache-probes claim in the prompt was achievable without refactoring alignment. By design, the harness stops at the first divergence per tid. Surfacing ALL subsequent cache-probe inversions on the same tid would require a fundamental change to the per-tid two-pointer walk so that it continues past the first divergence. The prompt's scope guards explicitly forbid that refactor.
The category-tagged single-divergence output is the correct minimum-scope intervention for the reading-error #41 class.

## Follow-up (optional, not in scope)

- Add a `[side_effects mismatch]` category tag on `kernel.return` events (the third item the prompt called out). The current generic walk handles `side_effects` as a list-equality compare. If a future divergence surfaces inside `side_effects` and a tagged emit would help, it is a ~15-LOC extension following the same priority-pass pattern.
- Add a `--continue-past-first-divergence` mode that walks ALL events per tid (Layer-1 alignment), so the harness can enumerate the full set of cache-probe inversions on a single tid. This is out of scope here because it changes the alignment algorithm; it could be a separate iterate if needed.

## Artifacts

Under `xenia-rs/audit-runs/iterate-2L-diff-harness-return-value/`:

- `diff-2H-post-patch.md`: positive-control output (`return_value` mismatch surfaced with the bracket tag at the expected position).
- `diff-2J-post-patch.md`: negative-control output (no cache-probe divergence flagged; advances to the `VdGetCurrentDisplayGamma` divergence at 105,286).
- `writer-report.md` (this file).

The patch lives in `xenia-rs/tools/diff-events/diff_events.py` and `xenia-rs/tools/diff-events/test_diff_events.py`.