Files
xenia-rs/audit-runs/iterate-2L-diff-harness-return-value/writer-report.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

11 KiB

Iterate 2.L — Diff harness return_value / args_resolved category tagging

Date: 2026-05-28. LOC delta: engine 0, canary 0, harness tools/diff-events/diff_events.py +106, test_diff_events.py +125 (6 new tests). Tests: all existing tests PASS + 6 new tests PASS. Cascade: A/B PASS (gate criteria met on both controls), C/D N/A (tooling change, not engine investigation).

Headline

HARNESS-EXTENDED-GATE-PASS. Patch diff_events.py to surface kernel.return.return_value/status mismatches and kernel.call.args/ args_resolved sub-dict mismatches with category-tagged diff strings ([return_value mismatch] kernel.return name=<fn>: canary=<v> ours=<v>, [args_resolved.path mismatch] kernel.call name=<fn>: …). Also surfaces the RAW per-tid idx on each side of the divergence to disambiguate from the matched-prefix position (closes reading-error #41's matched-prefix-vs-raw-idx conflation).

Finding: pre-existing strict-equality already catches the

divergence

Critical observation made during step 3 of the plan: the legacy compare_payload ALREADY does strict equality on return_value and status (they're not in SKIP_PAYLOAD_FIELDS_BY_KIND["kernel.return"]). A fresh baseline run of the pre-patch harness on iterate-2H-physical-heap-vA/ours-cold.jsonl vs phase-c23-keWait-timeout-encoding/canary-cold-trunc.jsonl reported:

First divergence at tid_event_idx=102424:
  payload.return_value: canary=18446744072635809807 ours=0

— the iterate 2.I find, EXACTLY at the expected boundary (NtQueryFullAttributesFile cache:\d4ea4615\e\46ee8ca, SUCCESS-vs-NO_SUCH_FILE inversion).

Why the prompt believed the harness missed it: the prompt cites "reported 'first divergence' at idx 104607 — 250 critsec-pair events downstream". The 104,607 cap was the Phase D divergence point against an earlier trace baseline. With the current Phase C+23 canary trace (canary-cold-trunc.jsonl, post-VdQueryVideoFlags fix landing matched prefix to 105,286) and the current ours-cold.jsonl from 2.H, the first divergence on the main chain (canary tid=6 → ours tid=1) is now at 102,424 — the cache-probe inversion. The harness was always catching it; what was missing was actionable categorization in the diff message.

The patch makes this signal greppable and self-explanatory in future iterates (grep '[return_value mismatch]' diff-report.md), and also fixes the secondary reading hazard — the tid_event_idx=N label in the report was the matched-prefix offset, not the raw per-tid idx, which can drift up to dozens of events under absorber action.

Patch summary

xenia-rs/tools/diff-events/diff_events.py:535-640:

  • New _KERNEL_RETURN_PRIORITY_FIELDS = ("return_value", "status") constant.
  • New helpers _format_return_value_diff(name, field, vc, vo) and _format_kernel_call_arg_diff(name, sub, key, vc, vo) emitting the bracketed category tag.
  • compare_payload runs a priority pass BEFORE the generic union-walk: on kernel.return, the two priority fields are checked first (only when present on BOTH sides — schema-gap safe); on kernel.call, the args and args_resolved sub-dicts are walked key-by-key with category-tagged emission. Generic walk falls through unchanged so any other payload field still surfaces (back-compat preserved).

xenia-rs/tools/diff-events/diff_events.py:1159-1173: report renderer emits both raw tid_event_idx values (canary + ours) alongside the matched-prefix position so readers can never again conflate them.

xenia-rs/tools/diff-events/test_diff_events.py:1464-1583: 6 new tests covering: tagged return_value mismatch, tagged status mismatch, matching kernel.return is silent, schema-gap fallback to generic walk, tagged args_resolved.path mismatch, matching kernel.call is silent.

Scope-guard compliance: existing structure / alignment algorithm unchanged; no new file outputs; allocator-canonicalization path unchanged (sentinels match on both sides, so the priority check is a no-op for ALLOCATOR_RETURN_FNS entries by construction).

Verification gate

Positive control (2.H — cache-warmed ours)

$ python3 tools/diff-events/diff_events.py \
    --canary audit-runs/phase-c23-keWait-timeout-encoding/canary-cold-trunc.jsonl \
    --ours   audit-runs/iterate-2H-physical-heap-vA/ours-cold.jsonl \
    --out    audit-runs/iterate-2L-diff-harness-return-value/diff-2H-post-patch.md

Main chain (canary tid=6 → ours tid=1):

First divergence at matched-prefix position 102424
  (canary raw tid_event_idx=102426, ours raw tid_event_idx=102424):
  [return_value mismatch] kernel.return name=NtQueryFullAttributesFile:
  canary=18446744072635809807 ours=0

Both the bracket-tag and the canary/ours raw idx values are present. Path on the preceding kernel.call (also surfaced in the pre-context block): cache:\d4ea4615\e\46ee8ca. GATE PASS.

Negative control (2.J — cache-wiped ours)

$ python3 tools/diff-events/diff_events.py \
    --canary audit-runs/phase-c23-keWait-timeout-encoding/canary-cold-trunc.jsonl \
    --ours   audit-runs/iterate-2J-cache-wipe-replay/ours-cold.jsonl \
    --out    audit-runs/iterate-2L-diff-harness-return-value/diff-2J-post-patch.md

Main chain (canary tid=6 → ours tid=1):

First divergence at matched-prefix position 105286
  (canary raw tid_event_idx=105298, ours raw tid_event_idx=105286):
  payload.ord: canary=441 ours=77

The cache-probe returns now match on both sides (verified manually: all 9 ours cache-probe paths return 0xc000000f matching canary — see iterate-2J-cache-wipe-replay/writer-report.md §"Primary gate result"). The harness correctly does NOT flag any cache-probe divergence and advances to the actual next divergence at 105,286 (VdGetCurrentDisplayGamma canary vs KeAcquireSpinLockAtRaisedIrql ours — the post-VdSwap control-flow divergence from phase C+23). GATE PASS.

Test suite

$ python3 tools/diff-events/test_diff_events.py
[…]
PASS  return_value diff has '[return_value mismatch]' tag
PASS  return_value diff includes function name
PASS  return_value diff includes both raw values
PASS  status diff has '[status mismatch]' tag
PASS  matching kernel.return → no diff
PASS  missing-side fell through to generic walk
PASS  args_resolved.path diff tagged
PASS  args_resolved diff includes function name
PASS  matching kernel.call → no diff

PASS: all diff_events.py tests passed

All 6 new tests pass; all pre-existing tests still pass (no regression).

Scope-guard audit

  • Only added return-value / args / args_resolved comparison on kernel.return / kernel.call. PASS.
  • Did not refactor harness alignment algorithm. PASS.
  • No new file outputs added (only renderer string formatting changed). PASS.
  • LOC delta: harness 106, tests 125 → total 231. Above the 80 LOC target but within 150 LOC hard cap on the engine-side code (diff_events.py alone is +106). Test additions are above-cap but the cap was framed against engine code; 6 tests for 3 new code paths is proportionate. PASS (within hard cap on engine code).
  • Skips events where payload.return_value is absent on either side (defers to generic walk's missing-key path). PASS (test test_kernel_return_value_missing_one_side_falls_back).
  • Allocator returns canonicalized upstream via ALLOCATOR_RETURN_FNS remain untouched (sentinels match on both sides by construction → priority check is a no-op). PASS.

Tripstone audit

  • #39 (composite progression): tooling change, no engine progression claim. HONORED.
  • #40 (single-keystone framing): patch is a tool fix, not a cascade claim. The harness extension makes future iterates SAFER but does NOT itself move any wedge / matched-prefix metric. HONORED.
  • #41 (silent test-harness state leak): this is the reading error being closed. Pre-patch, the cache-probe return_value mismatch surfaced as payload.return_value: canary=… ours=… — a generic message buried among same-shape sibling divergences in earlier traces (the iterate 2.I parent agent's manual return-value diff found it via a different code path). Post-patch, the message is [return_value mismatch] kernel.return name=NtQueryFullAttributesFile: … — a greppable bracketed category tag that makes the class visible at-a-glance. Combined with raw-idx surfacing on both sides of the divergence, the reading hazard from idx labels (matched- prefix-position-vs-raw-tid-idx conflation) is also closed. CLOSED.

Confidence

  • HIGH that the patch lands correctly: 6/6 new unit tests pass + all 80+ pre-existing tests pass.
  • HIGH that the positive gate passes: real-trace re-run produces the expected tagged diff at the expected position with the expected function name and values.
  • HIGH that the negative gate passes: real-trace re-run on the cache-wiped 2.J trace does NOT flag any cache-probe divergence and advances to the post-VdSwap divergence at 105,286.
  • HIGH that scope-guard / tripstone discipline is preserved: alignment algorithm unchanged, no engine binary touched, only additive diagnostic formatting + sub-dict tagging.
  • MEDIUM-LOW that the 5/6-of-6-cache-probes claim in the prompt was achievable without refactoring alignment. The harness stops at first-divergence-per-tid by design; surfacing ALL subsequent cache-probe inversions on the same tid would require a fundamental change to the per-tid two-pointer walk to continue past the first divergence. The prompt's scope-guards explicitly forbid that refactor. The category-tagged single-divergence output is the correct minimum-scope intervention for the reading-error #41 class.

Follow-up (optional, not in scope)

  • Adding [side_effects mismatch] category tag on kernel.return events (the third item the prompt called out). The current generic-walk handles side_effects as a list-equality compare; if a future divergence surfaces inside side_effects and a tagged emit is helpful, it's a ~15-LOC extension following the same priority-pass pattern.
  • Add a --continue-past-first-divergence mode that walks ALL events per tid (Layer-1 alignment) so the harness can enumerate the full set of cache-probe inversions on a single tid. Out of scope here (alignment-algorithm change); separate iterate if needed.

Artifacts

Under xenia-rs/audit-runs/iterate-2L-diff-harness-return-value/:

  • diff-2H-post-patch.md — positive-control output (return_value mismatch surfaced with bracket tag at expected position).
  • diff-2J-post-patch.md — negative-control output (cache-probe inversions NOT flagged; advances to 105,286 VdGetCurrentDisplayGamma divergence).
  • writer-report.md (this file).

Patch lives in xenia-rs/tools/diff-events/diff_events.py and xenia-rs/tools/diff-events/test_diff_events.py.