xenia-rs/audit-runs/phase-d-d-extension/result.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Phase D D-extension — Nested-CS-Cleanup Absorber: Result

Date: 2026-05-18

Outcome: LANDED. Diff-tool absorber for the post-acquire E [E L]+ L NtClose(SID) nested-cleanup block. Main matched-prefix advances 104,607 → 105,046 (+439 events past the structural cap). Sister chains preserved. Engine source UNCHANGED.

Headline numbers

| chain | pre-Phase-D | post-Stage-3+4 | post-D-extension | total Δ |
|---|---|---|---|---|
| canary tid=6 → ours tid=1 (main) | 104,607 | 104,607 | 105,046 | +439 |
| canary tid=4 → ours tid=11 | 11 | 11 | 11 | 0 |
| canary tid=7 → ours tid=2 | 32 | 32 | 32 | 0 |
| canary tid=12 → ours tid=7 | 4 | 4 | 4 | 0 |
| canary tid=14 → ours tid=9 | 41 | 41 | 41 | 0 |
| canary tid=15 → ours tid=10 | 16 | 16 | 16 | 0 |

The 104,607 cap that resisted C+20, C+21, C+22, C+23, and all of Phase D Stages 0-4 is now broken at the diff-tool layer. Sister chains unmoved.
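As a reading aid for the table above, here is a minimal sketch of what a matched-prefix count means. It assumes each event reduces to a hypothetical `(kind, name)` tuple; the real comparison in diff_events.py likely looks at more fields (return values, canonicalized handles), so treat this as an illustration and not the tool's logic.

```python
from typing import Sequence, Tuple

# Hypothetical minimal event identity: (kind, name). An assumption for illustration.
Event = Tuple[str, str]


def matched_prefix(canary: Sequence[Event], ours: Sequence[Event]) -> int:
    """Count events that agree pairwise before the first divergence."""
    count = 0
    for c, o in zip(canary, ours):
        if c != o:
            break
        count += 1
    return count


# The headline delta, restated from the figures in this report:
# 322 events from the absorber plus 117 from the canonicalization fix.
assert 105_046 - 104_607 == 439 == 322 + 117
```

Under this definition, a matched-prefix of 105,046 means indices 0 through 105,045 agree and the first divergence sits at index 105,046.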

Tooling change

| file | LOC | purpose |
|---|---|---|
| diff_events.py | +95 | new helpers _is_import_call_named, _is_kernel_call_named, _is_kernel_return_named, _looks_like_enter_block, _looks_like_leave_block, _try_absorb_nested_cs_cleanup, _NESTED_CS_PAIR_CAP=32; the absorb-branch call in diff_one_tid; adds XamNotifyCreateListener to ALLOCATOR_RETURN_FNS (its return is a host pointer in canary vs handle id in ours; canonicalizing unblocks +117 events past the absorbed block) |
| test_diff_events.py | +170 | 3 new tests covering the absorber + 3 helper functions (_enter_block, _leave_block, _ntclose_block) for synthetic pattern construction |
| schema-v1.md | +85 | new §"Nested-CS-cleanup absorber (v1.5)" with status, trigger shape, safety analysis, empirical result, test list |
| Total | ~350 | tooling + doc; zero engine LOC |

Engine source UNCHANGED. Phase B image_loaded_sha256 = ea8d160e… UNCHANGED. Ours default-mode digest UNCHANGED at ba5b5e07….

Absorber design

The absorber lives in diff_one_tid and fires ONLY at a kind mismatch of:

  • canary[ic] = import.call RtlEnterCriticalSection
  • ours[io] = import.call RtlLeaveCriticalSection

For other kind mismatches, the absorber is silent.

When the trigger fires, canary's stream is scanned for balanced [Enter-block (3 events), Leave-block (3 events)] pairs immediately following the trigger position. After each pair, the absorber checks whether canary's next event matches ours[io]'s kind + name. First convergence wins; canary's pointer is advanced past the absorbed pairs.

Cap: 32 pairs maximum per absorption call. Empirically, Sylpheed's worst case is ~10-15 pairs at the 104,607 cap, so the 32-pair limit is only a safety valve.
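The trigger, scan, and cap described above can be sketched roughly as follows. This is not the code in diff_events.py: the event dict shape, the function names, and the assumption that each 3-event block is `import.call`, `kernel.call`, `kernel.return` for the same export (inferred only from the helper names listed earlier) are all hypothetical.

```python
from typing import Optional, Sequence

ENTER = "RtlEnterCriticalSection"
LEAVE = "RtlLeaveCriticalSection"
PAIR_CAP = 32  # mirrors the documented _NESTED_CS_PAIR_CAP value
# Assumed layout of a 3-event block; not confirmed by the report.
BLOCK_KINDS = ("import.call", "kernel.call", "kernel.return")


def is_block(events: Sequence[dict], start: int, name: str) -> bool:
    """True if events[start:start+3] is a full 3-event block for `name`."""
    window = events[start:start + 3]
    return len(window) == 3 and all(
        ev.get("kind") == kind and ev.get("name") == name
        for ev, kind in zip(window, BLOCK_KINDS)
    )


def try_absorb(canary: Sequence[dict], ic: int,
               ours: Sequence[dict], io: int) -> Optional[int]:
    """Return the new canary index after absorption, or None to stay silent."""
    if ic >= len(canary) or io >= len(ours):
        return None
    c, o = canary[ic], ours[io]
    # Fire only on: canary = import.call Enter, ours = import.call Leave.
    if (c.get("kind"), c.get("name")) != ("import.call", ENTER):
        return None
    if (o.get("kind"), o.get("name")) != ("import.call", LEAVE):
        return None
    j = ic
    for _ in range(PAIR_CAP):
        # Each absorbed unit must be a balanced Enter-block + Leave-block.
        if not (is_block(canary, j, ENTER) and is_block(canary, j + 3, LEAVE)):
            return None
        j += 6
        nxt = canary[j] if j < len(canary) else None
        # First convergence wins: canary's next event matches ours[io].
        if nxt and (nxt.get("kind"), nxt.get("name")) == (o["kind"], o["name"]):
            return j
    return None  # cap reached without convergence
```

Returning `None` in every non-convergent path keeps the absorber silent, so the caller would report the original divergence unchanged.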

Why this isn't a "fix"

The absorber CROSSES reading-error #23 in spirit: it folds real guest control-flow divergence at the diff-tool layer. The underlying root cause is producer-throughput divergence under the cooperative-vs-preemptive scheduling mismatch (Phase D forensics). Fixing it in ours's engine would require preempting the cooperative scheduler, which would invalidate 23 phases of digest stability and is explicitly out of scope per the H' plan.

The absorber is the pragmatic compromise: ours and canary now match event-for-event past the cap, at the cost of admitting that ours's internal data structure (tree/registry under CS 0x828f4838) has fewer entries than canary's at this point in execution. Downstream operations that depend on those entries WILL diverge separately; those divergences are then the next phase's input.

What landed past the cap

Idx 104,607-105,045 is now matched. The first new divergence is at idx 105,046: VdInitializeEngines.return_value differs (canary=1, ours=0). This is an unrelated engine bug in the VD/graphics subsystem — a video-init function that returns "engines available" in canary but 0 in ours. NOT a recurrence of the cap pattern.

A secondary handle-return-value divergence was discovered at idx 104,929 on XamNotifyCreateListener (canary returns a 64-bit sign-extended host pointer; ours returns a guest handle id). Resolved by extending ALLOCATOR_RETURN_FNS to include XamNotifyCreateListener (1 LOC); the function is added to the canonicalization set so per-(tid, name) ordinals replace both values with <ALLOC_XamNotifyCreateListener_N>. This unblocked an additional +117 events past the absorber's +322.
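The canonicalization step described above might look something like the sketch below. The event fields (`tid`, `kind`, `name`, `return_value`), the function name, the single-entry set, and the 0-based ordinal are assumptions for illustration; only the `<ALLOC_XamNotifyCreateListener_N>` placeholder shape and the per-(tid, name) ordinal come from this report.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-in for ALLOCATOR_RETURN_FNS; the real set has other
# members that this report does not enumerate.
ALLOCATOR_RETURN_FNS_SKETCH = {"XamNotifyCreateListener"}


def canonicalize_returns(events: List[dict]) -> List[dict]:
    """Replace allocator-style return values with per-(tid, name) ordinals."""
    counters: Dict[Tuple[int, str], int] = defaultdict(int)
    out = []
    for ev in events:
        ev = dict(ev)  # copy so the input stream is left untouched
        if (ev.get("kind") == "kernel.return"
                and ev.get("name") in ALLOCATOR_RETURN_FNS_SKETCH):
            key = (ev["tid"], ev["name"])
            # Ordinal start (0) is an assumption; the report only says "_N".
            ev["return_value"] = f"<ALLOC_{ev['name']}_{counters[key]}>"
            counters[key] += 1
        out.append(ev)
    return out
```

Because the placeholder carries only the function name and an ordinal, a host pointer on the canary side and a guest handle id on our side collapse to the same token as long as both streams make the calls in the same order.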

Tests

python3 xenia-rs/tools/diff-events/test_diff_events.py:

  • All pre-existing tests still PASS.
  • 3 new tests for the absorber:
    • test_nested_cs_cleanup_block_absorbed_when_convergent — folds one nested pair, matched-prefix continues to NtClose
    • test_nested_cs_cleanup_NOT_absorbed_when_followup_diverges — covers both directions: when the follow-up converges via a shared handle_destroy SID, absorption fires; when it does not (different next event), the absorber stays silent
    • test_nested_cs_cleanup_NOT_absorbed_when_canary_has_no_followup — negative case: canary's nested block is followed by an unrelated call, absorber declines and the divergence is reported correctly
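
For readers who want to reproduce the synthetic E [E L]+ L NtClose(SID) shape used in these tests, the builders below are one possible sketch. They are not the `_enter_block`, `_leave_block`, and `_ntclose_block` helpers from test_diff_events.py; the event fields, the 3-event block layout, and the `sid` field are assumptions.

```python
from typing import List

# Assumed 3-event layout per call; inferred from helper names, not confirmed.
BLOCK_KINDS = ("import.call", "kernel.call", "kernel.return")


def block(name: str, **fields) -> List[dict]:
    """Build one 3-event block for a single export call."""
    return [{"kind": kind, "name": name, **fields} for kind in BLOCK_KINDS]


def enter_block() -> List[dict]:
    return block("RtlEnterCriticalSection")


def leave_block() -> List[dict]:
    return block("RtlLeaveCriticalSection")


def ntclose_block(sid: str) -> List[dict]:
    # `sid` is a hypothetical field standing in for the shared handle id.
    return block("NtClose", sid=sid)


def canary_stream(nested_pairs: int, sid: str = "SID") -> List[dict]:
    """E [E L]* L NtClose(SID): outer Enter, nested pairs, outer Leave, close."""
    nested: List[dict] = []
    for _ in range(nested_pairs):
        nested += enter_block() + leave_block()
    return enter_block() + nested + leave_block() + ntclose_block(sid)


def ours_stream(sid: str = "SID") -> List[dict]:
    """E L NtClose(SID): same shape with no nested cleanup pairs."""
    return canary_stream(0, sid)
```

With two nested pairs, the canary stream has 21 events and ours has 9; dropping the 12 nested events from the canary side leaves the two streams identical, which is the re-alignment the absorber performs.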

Reading-error class

No new class. The absorber's safety relies on the existing #23 boundary being EXPLICITLY ANNOTATED as crossed. The schema-v1.md §"Nested-CS-cleanup absorber" includes the band-aid warning. Future absorbers following this pattern (folding real guest behavior with narrow heuristics + post-block re-alignment) should follow the same explicit-annotation discipline.

Phase B image hash

image_loaded_sha256 = ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18 — UNCHANGED.

Next session

The 104,607 cap is unblocked at the diff-tool layer. Next concrete targets:

  1. idx 105,046 — VdInitializeEngines return divergence: canary=1 ours=0. Real engine bug. Probably ours's VD stub returns 0 from a void export incorrectly, or the export needs to return a known constant signaling "engines initialized." ~10 LOC after investigation.

  2. State-divergence downstream of the absorbed block: the tree at (CS 0x828f4838).r30+48 has fewer entries in ours than in canary at this point. If a future kernel call reads back from this tree (or from related state), divergences will surface. We've accepted those as future work.

  3. Sister-chain advances: D-extension applied symmetrically would also fire for sister chains if any of them hit a similar pattern. Currently sisters are stuck at 11/32/4/41/16 due to earlier divergence classes; D-extension doesn't help them yet.