Iterate 2.L — Diff harness return_value / args_resolved category tagging
Date: 2026-05-28. LOC delta: engine 0, canary 0, harness
tools/diff-events/diff_events.py +106, test_diff_events.py
+125 (6 new tests). Tests: all existing tests PASS + 6 new tests
PASS. Cascade: A/B PASS (gate criteria met on both controls), C/D
N/A (tooling change, not engine investigation).
Headline
HARNESS-EXTENDED-GATE-PASS. Patch diff_events.py to surface
kernel.return.return_value/status mismatches and kernel.call.args/
args_resolved sub-dict mismatches with category-tagged diff strings
([return_value mismatch] kernel.return name=<fn>: canary=<v> ours=<v>,
[args_resolved.path mismatch] kernel.call name=<fn>: …). Also surfaces
the RAW per-tid idx on each side of the divergence to disambiguate from
the matched-prefix position (closes reading-error #41's
matched-prefix-vs-raw-idx conflation).
Finding: pre-existing strict-equality already catches the divergence
Critical observation made during step 3 of the plan: the legacy
compare_payload ALREADY does strict equality on return_value and
status (they're not in SKIP_PAYLOAD_FIELDS_BY_KIND["kernel.return"]).
A fresh baseline run of the pre-patch harness on
iterate-2H-physical-heap-vA/ours-cold.jsonl vs
phase-c23-keWait-timeout-encoding/canary-cold-trunc.jsonl reported:
First divergence at tid_event_idx=102424:
payload.return_value: canary=18446744072635809807 ours=0
— the iterate 2.I find, EXACTLY at the expected boundary
(NtQueryFullAttributesFile cache:\d4ea4615\e\46ee8ca,
SUCCESS-vs-NO_SUCH_FILE inversion).
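The canary value in that report line is easier to read once it is reduced to 32 bits. A minimal sketch, assuming the trace records the return register as an unsigned 64-bit integer with the 32-bit NTSTATUS sign-extended into it; the helper name low32 is hypothetical and not part of the harness:

```python
# Hypothetical helper, not part of diff_events.py: reduce a 64-bit
# register value from the trace to the 32-bit NTSTATUS it carries.
STATUS_SUCCESS = 0x00000000
STATUS_NO_SUCH_FILE = 0xC000000F  # standard NTSTATUS code


def low32(value: int) -> int:
    """Return the low 32 bits of a (possibly sign-extended) 64-bit value."""
    return value & 0xFFFFFFFF


canary = 18446744072635809807  # as printed in the diff report
ours = 0

print(hex(canary))         # 0xffffffffc000000f
print(hex(low32(canary)))  # 0xc000000f

# Canary reports NO_SUCH_FILE where ours reports SUCCESS.
assert low32(canary) == STATUS_NO_SUCH_FILE
assert low32(ours) == STATUS_SUCCESS
```

The low 32 bits are the same 0xc000000f that the negative control later reports on both sides.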
Why the prompt believed the harness missed it: the prompt cites
"reported 'first divergence' at idx 104607 — 250 critsec-pair events
downstream". The 104,607 cap was the Phase D divergence point against
an earlier trace baseline. With the current Phase C+23 canary trace
(canary-cold-trunc.jsonl, post-VdQueryVideoFlags fix landing matched
prefix to 105,286) and the current ours-cold.jsonl from 2.H, the
first divergence on the main chain (canary tid=6 → ours tid=1) is now
at 102,424 — the cache-probe inversion. The harness was always
catching it; what was missing was actionable categorization in the
diff message.
The patch makes this signal greppable and self-explanatory in future
iterates (grep '[return_value mismatch]' diff-report.md). It also
fixes a secondary reading hazard: the tid_event_idx=N label in the
report was the matched-prefix offset, not the raw per-tid idx, and the
two can drift apart by up to dozens of events under absorber action.
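Beyond a one-off grep, the bracketed tags can be tallied per category. A minimal sketch, assuming each tagged diff sits on its own report line and starts with the tag; count_tags and TAG_RE are hypothetical names, and the sample lines are copied from the two control outputs in this report purely for illustration:

```python
import re
from collections import Counter

# Matches a leading bracketed category tag such as
# "[return_value mismatch]" or "[args_resolved.path mismatch]".
TAG_RE = re.compile(r"^\s*\[([A-Za-z0-9_.]+) mismatch\]")


def count_tags(report_text: str) -> Counter:
    """Count category tags, one per report line that starts with a tag."""
    counts = Counter()
    for line in report_text.splitlines():
        match = TAG_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = (
    "First divergence at matched-prefix position 102424\n"
    "[return_value mismatch] kernel.return name=NtQueryFullAttributesFile: "
    "canary=18446744072635809807 ours=0\n"
    "payload.ord: canary=441 ours=77\n"
)
print(count_tags(sample))  # Counter({'return_value': 1})
```

Untagged generic lines such as payload.ord are deliberately not counted, so the tally reflects only the categories the priority pass emits.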
Patch summary
xenia-rs/tools/diff-events/diff_events.py:535-640:
- New _KERNEL_RETURN_PRIORITY_FIELDS = ("return_value", "status") constant.
- New helpers _format_return_value_diff(name, field, vc, vo) and _format_kernel_call_arg_diff(name, sub, key, vc, vo) emitting the bracketed category tag.
- compare_payload runs a priority pass BEFORE the generic union-walk: on kernel.return, the two priority fields are checked first (only when present on BOTH sides, so schema-gap safe); on kernel.call, the args and args_resolved sub-dicts are walked key-by-key with category-tagged emission. Generic walk falls through unchanged so any other payload field still surfaces (back-compat preserved).
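The priority pass described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the code in diff_events.py: the constant and helper names and the helper signatures come from the patch summary, but the compare_payload signature, the function bodies, the way the function name is read from the payload, and the [args.<key> mismatch] tag shape are assumptions. The real SKIP_PAYLOAD_FIELDS_BY_KIND filtering and allocator canonicalization are omitted.

```python
# Sketch only: names mirror the patch summary, bodies are assumptions.
_KERNEL_RETURN_PRIORITY_FIELDS = ("return_value", "status")


def _format_return_value_diff(name, field, vc, vo):
    return f"[{field} mismatch] kernel.return name={name}: canary={vc} ours={vo}"


def _format_kernel_call_arg_diff(name, sub, key, vc, vo):
    return f"[{sub}.{key} mismatch] kernel.call name={name}: canary={vc} ours={vo}"


def compare_payload(kind, canary, ours):
    """Return a list of diff strings; an empty list means the payloads match."""
    diffs = []
    handled = set()
    name = canary.get("name", "?")  # assumed location of the function name

    if kind == "kernel.return":
        for field in _KERNEL_RETURN_PRIORITY_FIELDS:
            # Only when present on BOTH sides; otherwise the generic walk
            # below reports the missing key (schema-gap fallback).
            if field in canary and field in ours:
                handled.add(field)
                if canary[field] != ours[field]:
                    diffs.append(_format_return_value_diff(
                        name, field, canary[field], ours[field]))
    elif kind == "kernel.call":
        for sub in ("args", "args_resolved"):
            vc, vo = canary.get(sub), ours.get(sub)
            if isinstance(vc, dict) and isinstance(vo, dict):
                handled.add(sub)
                for key in sorted(vc.keys() | vo.keys()):
                    if vc.get(key) != vo.get(key):
                        diffs.append(_format_kernel_call_arg_diff(
                            name, sub, key, vc.get(key), vo.get(key)))

    # Generic union-walk over every field the priority pass did not handle.
    for key in sorted(canary.keys() | ours.keys()):
        if key in handled:
            continue
        if canary.get(key) != ours.get(key):
            diffs.append(
                f"payload.{key}: canary={canary.get(key)} ours={ours.get(key)}")
    return diffs


print(compare_payload(
    "kernel.return",
    {"name": "NtQueryFullAttributesFile", "return_value": 18446744072635809807},
    {"name": "NtQueryFullAttributesFile", "return_value": 0},
))
```

Marking a field as handled only when it exists on both sides is what keeps the schema-gap case on the generic path: a one-sided return_value still produces the untagged payload.return_value line.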
xenia-rs/tools/diff-events/diff_events.py:1159-1173: report renderer
emits both raw tid_event_idx values (canary + ours) alongside the
matched-prefix position so readers can never again conflate them.
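A minimal sketch of the header the renderer now produces, assuming it is built from three integers; the function name format_divergence_header is hypothetical, and only the output shape is taken from the control runs in this report:

```python
def format_divergence_header(matched_pos: int,
                             canary_raw_idx: int,
                             ours_raw_idx: int) -> str:
    """Label the matched-prefix position and both raw per-tid indices."""
    return (
        f"First divergence at matched-prefix position {matched_pos}\n"
        f"(canary raw tid_event_idx={canary_raw_idx}, "
        f"ours raw tid_event_idx={ours_raw_idx}):"
    )


print(format_divergence_header(102424, 102426, 102424))
# First divergence at matched-prefix position 102424
# (canary raw tid_event_idx=102426, ours raw tid_event_idx=102424):
```

Naming each number explicitly means a reader never has to infer which of the three indices a bare tid_event_idx=N refers to.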
xenia-rs/tools/diff-events/test_diff_events.py:1464-1583: 6 new tests
covering: tagged return_value mismatch, tagged status mismatch,
matching kernel.return is silent, schema-gap fallback to generic walk,
tagged args_resolved.path mismatch, matching kernel.call is silent.
Scope-guard compliance: existing structure / alignment algorithm unchanged; no new file outputs; allocator-canonicalization path unchanged (sentinels match on both sides, so the priority check is a no-op for ALLOCATOR_RETURN_FNS entries by construction).
Verification gate
Positive control (2.H — cache-warmed ours)
$ python3 tools/diff-events/diff_events.py \
--canary audit-runs/phase-c23-keWait-timeout-encoding/canary-cold-trunc.jsonl \
--ours audit-runs/iterate-2H-physical-heap-vA/ours-cold.jsonl \
--out audit-runs/iterate-2L-diff-harness-return-value/diff-2H-post-patch.md
Main chain (canary tid=6 → ours tid=1):
First divergence at matched-prefix position 102424
(canary raw tid_event_idx=102426, ours raw tid_event_idx=102424):
[return_value mismatch] kernel.return name=NtQueryFullAttributesFile:
canary=18446744072635809807 ours=0
Both the bracket-tag and the canary/ours raw idx values are present.
Path on the preceding kernel.call (also surfaced in the pre-context
block): cache:\d4ea4615\e\46ee8ca. GATE PASS.
Negative control (2.J — cache-wiped ours)
$ python3 tools/diff-events/diff_events.py \
--canary audit-runs/phase-c23-keWait-timeout-encoding/canary-cold-trunc.jsonl \
--ours audit-runs/iterate-2J-cache-wipe-replay/ours-cold.jsonl \
--out audit-runs/iterate-2L-diff-harness-return-value/diff-2J-post-patch.md
Main chain (canary tid=6 → ours tid=1):
First divergence at matched-prefix position 105286
(canary raw tid_event_idx=105298, ours raw tid_event_idx=105286):
payload.ord: canary=441 ours=77
The cache-probe returns now match on both sides (verified manually:
all 9 ours cache-probe paths return 0xc000000f matching canary —
see iterate-2J-cache-wipe-replay/writer-report.md §"Primary gate
result"). The harness correctly does NOT flag any cache-probe
divergence and advances to the actual next divergence at 105,286
(VdGetCurrentDisplayGamma canary vs KeAcquireSpinLockAtRaisedIrql
ours — the post-VdSwap control-flow divergence from phase C+23).
GATE PASS.
Test suite
$ python3 tools/diff-events/test_diff_events.py
[…]
PASS return_value diff has '[return_value mismatch]' tag
PASS return_value diff includes function name
PASS return_value diff includes both raw values
PASS status diff has '[status mismatch]' tag
PASS matching kernel.return → no diff
PASS missing-side fell through to generic walk
PASS args_resolved.path diff tagged
PASS args_resolved diff includes function name
PASS matching kernel.call → no diff
PASS: all diff_events.py tests passed
All 6 new tests pass; all pre-existing tests still pass (no regression).
Scope-guard audit
- Only added return-value / args / args_resolved comparison on kernel.return/kernel.call. PASS.
- Did not refactor harness alignment algorithm. PASS.
- No new file outputs added (only renderer string formatting changed). PASS.
- LOC delta: harness 106, tests 125 → total 231. Above the 80 LOC target but within 150 LOC hard cap on the engine-side code (diff_events.py alone is +106). Test additions are above-cap but the cap was framed against engine code; 6 tests for 3 new code paths is proportionate. PASS (within hard cap on engine code).
- Skips events where payload.return_value is absent on either side (defers to generic walk's missing-key path). PASS (test test_kernel_return_value_missing_one_side_falls_back).
- Allocator returns canonicalized upstream via ALLOCATOR_RETURN_FNS remain untouched (sentinels match on both sides by construction → priority check is a no-op). PASS.
Tripstone audit
- #39 (composite progression): tooling change, no engine progression claim. HONORED.
- #40 (single-keystone framing): patch is a tool fix, not a cascade claim. The harness extension makes future iterates SAFER but does NOT itself move any wedge / matched-prefix metric. HONORED.
- #41 (silent test-harness state leak): this is the reading error being closed. Pre-patch, the cache-probe return_value mismatch surfaced as payload.return_value: canary=… ours=…, a generic message buried among same-shape sibling divergences in earlier traces (the iterate 2.I parent agent's manual return-value diff found it via a different code path). Post-patch, the message is [return_value mismatch] kernel.return name=NtQueryFullAttributesFile: …, a greppable bracketed category tag that makes the class visible at a glance. Combined with raw-idx surfacing on both sides of the divergence, the reading hazard from idx labels (matched-prefix-position-vs-raw-tid-idx conflation) is also closed. CLOSED.
Confidence
- HIGH that the patch lands correctly: 6/6 new unit tests pass + all 80+ pre-existing tests pass.
- HIGH that the positive gate passes: real-trace re-run produces the expected tagged diff at the expected position with the expected function name and values.
- HIGH that the negative gate passes: real-trace re-run on the cache-wiped 2.J trace does NOT flag any cache-probe divergence and advances to the post-VdSwap divergence at 105,286.
- HIGH that scope-guard / tripstone discipline is preserved: alignment algorithm unchanged, no engine binary touched, only additive diagnostic formatting + sub-dict tagging.
- MEDIUM-LOW that the 5/6-of-6-cache-probes claim in the prompt was achievable without refactoring alignment. The harness stops at first-divergence-per-tid by design; surfacing ALL subsequent cache-probe inversions on the same tid would require a fundamental change to the per-tid two-pointer walk to continue past the first divergence. The prompt's scope-guards explicitly forbid that refactor. The category-tagged single-divergence output is the correct minimum-scope intervention for the reading-error #41 class.
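The stop-at-first-divergence behaviour referred to in the last item can be illustrated with a simplified two-pointer walk. This is a sketch under loud assumptions, not the harness's alignment code: first_divergence and the absorbable predicate are hypothetical, an "absorber" is modelled as nothing more than a rule that lets one side skip an event with no counterpart, and the event names are invented.

```python
from typing import Callable, Optional, Tuple

Event = dict


def first_divergence(
    canary: list,
    ours: list,
    absorbable: Callable[[Event], bool],
) -> Optional[Tuple[int, int, int]]:
    """Walk both per-tid streams in lockstep and stop at the first mismatch.

    Returns (matched_prefix_pos, canary_raw_idx, ours_raw_idx), or None if
    either stream is exhausted without a mismatch.
    """
    i = j = matched = 0
    while i < len(canary) and j < len(ours):
        if canary[i] == ours[j]:
            i += 1
            j += 1
            matched += 1
        elif absorbable(canary[i]):
            i += 1  # canary-only event absorbed: raw idx drifts from matched
        elif absorbable(ours[j]):
            j += 1  # ours-only event absorbed
        else:
            return matched, i, j  # nothing after this point is examined
    return None


canary = [{"name": "A"}, {"name": "Noise"}, {"name": "B"},
          {"name": "Probe", "return_value": 1},
          {"name": "Probe", "return_value": 1}]
ours = [{"name": "A"}, {"name": "B"},
        {"name": "Probe", "return_value": 0},
        {"name": "Probe", "return_value": 0}]

result = first_divergence(canary, ours, lambda e: e["name"] == "Noise")
print(result)  # (2, 3, 2)
```

In this toy run the second Probe mismatch is never visited, and the absorbed Noise event is why the canary raw index (3) is ahead of the matched-prefix position (2). Enumerating every inversion on a tid would mean resynchronising after the first mismatch and continuing, which is the alignment change the scope-guards rule out.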
Follow-up (optional, not in scope)
- Add a [side_effects mismatch] category tag on kernel.return events (the third item the prompt called out). The current generic-walk handles side_effects as a list-equality compare; if a future divergence surfaces inside side_effects and a tagged emit is helpful, it's a ~15-LOC extension following the same priority-pass pattern.
- Add a --continue-past-first-divergence mode that walks ALL events per tid (Layer-1 alignment) so the harness can enumerate the full set of cache-probe inversions on a single tid. Out of scope here (alignment-algorithm change); separate iterate if needed.
Artifacts
Under xenia-rs/audit-runs/iterate-2L-diff-harness-return-value/:
- diff-2H-post-patch.md: positive-control output (return_value mismatch surfaced with bracket tag at expected position).
- diff-2J-post-patch.md: negative-control output (cache-probe inversions NOT flagged; advances to 105,286 VdGetCurrentDisplayGamma divergence).
- writer-report.md (this file).
Patch lives in xenia-rs/tools/diff-events/diff_events.py and
xenia-rs/tools/diff-events/test_diff_events.py.