xenia-rs/audit-runs/phase-d-d-extension/result.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


# Phase D D-extension — Nested-CS-Cleanup Absorber: Result
**Date**: 2026-05-18
**Outcome**: **LANDED.** Diff-tool absorber for the post-acquire
`E [E L]+ L NtClose(SID)` nested-cleanup block. **Main matched-prefix
advances 104,607 → 105,046 (+439 events past the structural cap).**
Sister chains preserved. Engine source UNCHANGED.
## Headline numbers
| chain | pre-Phase-D | post-Stage-3+4 | post-D-extension | total Δ |
|---|---|---|---|---|
| canary tid=6 → ours tid=1 main | 104,607 | 104,607 | **105,046** | **+439** |
| canary tid=4 → ours tid=11 | 11 | 11 | 11 | 0 |
| canary tid=7 → ours tid=2 | 32 | 32 | 32 | 0 |
| canary tid=12 → ours tid=7 | 4 | 4 | 4 | 0 |
| canary tid=14 → ours tid=9 | 41 | 41 | 41 | 0 |
| canary tid=15 → ours tid=10 | 16 | 16 | 16 | 0 |
The 104,607 cap that resisted C+20, C+21, C+22, C+23, and all of Phase D
Stages 0-4 is now **broken** at the diff-tool layer. Sister chains
unmoved.
## Tooling change
| file | LOC | purpose |
|---|---|---|
| [diff_events.py](../../tools/diff-events/diff_events.py) | +95 | new helpers `_is_import_call_named`, `_is_kernel_call_named`, `_is_kernel_return_named`, `_looks_like_enter_block`, `_looks_like_leave_block`, `_try_absorb_nested_cs_cleanup`, and the constant `_NESTED_CS_PAIR_CAP=32`, plus the absorb-branch call in `diff_one_tid`. Also adds `XamNotifyCreateListener` to `ALLOCATOR_RETURN_FNS` (its return value is a host pointer in canary but a handle id in ours; canonicalizing it unblocks +117 events past the absorbed block) |
| [test_diff_events.py](../../tools/diff-events/test_diff_events.py) | +170 | 3 new tests covering the absorber, plus 3 helper functions (`_enter_block`, `_leave_block`, `_ntclose_block`) that construct synthetic patterns |
| [schema-v1.md](../phase-a-diff-harness/schema-v1.md) | +85 | new §"Nested-CS-cleanup absorber (v1.5)" with status, trigger shape, safety analysis, empirical result, test list |
| **Total** | **~350 LOC tooling + doc** | zero engine LOC |
Engine source UNCHANGED. Phase B `image_loaded_sha256 = ea8d160e…`
UNCHANGED. Ours default-mode digest UNCHANGED at `ba5b5e07…`.
## Absorber design
The absorber lives in `diff_one_tid` and fires ONLY at a kind mismatch
of:
- canary[ic] = `import.call RtlEnterCriticalSection`
- ours[io] = `import.call RtlLeaveCriticalSection`
For other kind mismatches, the absorber is silent.
When the trigger fires, canary's stream is scanned for balanced
`[Enter-block (3 events), Leave-block (3 events)]` pairs immediately
following the trigger position. After each pair, the absorber checks
whether canary's next event matches ours[io]'s kind + name. First
convergence wins; canary's pointer is advanced past the absorbed pairs.
Pair limit: at most 32 pairs per absorption call. Empirically, Sylpheed's
worst case is ~10-15 pairs at the 104,607 cap, so the 32-pair limit acts
only as a safety valve.
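
The scan described above can be sketched roughly as follows. This is a minimal illustration, not the code in `diff_events.py`: the event shape (dicts with `kind` and `name`), the assumed 3-event block layout (`import.call`, `kernel.call`, `kernel.return`), and the names `is_block` and `try_absorb` are all assumptions made for the example.

```python
from typing import Optional

NESTED_CS_PAIR_CAP = 32  # mirrors the documented _NESTED_CS_PAIR_CAP
BLOCK_LEN = 3            # events per Enter-block / Leave-block, per the text

ENTER = "RtlEnterCriticalSection"
LEAVE = "RtlLeaveCriticalSection"
# Assumed (hypothetical) layout of one 3-event block.
BLOCK_KINDS = ("import.call", "kernel.call", "kernel.return")


def is_block(events, start, name):
    """True if events[start:start+3] is one complete block for `name`."""
    window = events[start:start + BLOCK_LEN]
    return len(window) == BLOCK_LEN and all(
        ev["kind"] == kind and ev["name"] == name
        for ev, kind in zip(window, BLOCK_KINDS)
    )


def try_absorb(canary, ic, ours, io) -> Optional[int]:
    """Return canary's new index after absorbing pairs, or None to decline."""
    # Trigger: canary is entering a CS where ours is already leaving one.
    trigger = (
        canary[ic]["kind"] == "import.call" and canary[ic]["name"] == ENTER
        and ours[io]["kind"] == "import.call" and ours[io]["name"] == LEAVE
    )
    if not trigger:
        return None

    target = (ours[io]["kind"], ours[io]["name"])
    pos = ic
    for _ in range(NESTED_CS_PAIR_CAP):
        # Require a balanced [Enter-block, Leave-block] pair at `pos`.
        if not (is_block(canary, pos, ENTER)
                and is_block(canary, pos + BLOCK_LEN, LEAVE)):
            return None
        pos += 2 * BLOCK_LEN
        # First convergence wins: canary's next event matches ours[io].
        if pos < len(canary) and (canary[pos]["kind"], canary[pos]["name"]) == target:
            return pos
    return None  # pair limit exhausted without convergence
```

Returning `None` on any unbalanced pair keeps the sketch conservative: the caller would then report the original kind mismatch unchanged.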
## Why this isn't a "fix"
The absorber CROSSES reading-error #23 in spirit: it folds real guest
control-flow divergence at the diff-tool layer. The underlying root
cause is **producer-throughput divergence** under the
cooperative-vs-preemptive scheduling mismatch (Phase D forensics).
Fixing it in ours's engine would require preempting the cooperative
scheduler, which would invalidate 23 phases of digest stability; that is
explicitly out of scope per the H' plan.
The absorber is the **pragmatic compromise**: ours and canary now match
event-for-event past the cap, at the cost of admitting that ours's
internal data structure (tree/registry under CS `0x828f4838`) has
fewer entries than canary's at this point in execution. Downstream
operations that depend on those entries WILL diverge separately;
those divergences are then the next phase's input.
## What landed past the cap
Idx 104,607-105,045 is now matched. The first new divergence is at
**idx 105,046**: `VdInitializeEngines.return_value` differs (canary=1,
ours=0). This is an unrelated engine bug in the VD/graphics subsystem
— a video-init function that returns "engines available" in canary but
0 in ours. NOT a recurrence of the cap pattern.
A secondary handle-return-value divergence was discovered at idx 104,929
on `XamNotifyCreateListener` (canary returns a 64-bit sign-extended host
pointer; ours returns a guest handle id). Resolved by extending
`ALLOCATOR_RETURN_FNS` to include `XamNotifyCreateListener` (1 LOC); the
function is added to the canonicalization set so per-(tid, name)
ordinals replace both values with `<ALLOC_XamNotifyCreateListener_N>`.
This unblocked an additional +117 events past the absorber's +322.
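
The per-(tid, name) ordinal canonicalization can be illustrated with a small sketch. The event fields (`tid`, `kind`, `name`, `return_value`), the function name `canonicalize_returns`, the zero-based ordinal, and the sample pointer and handle values are assumptions for illustration; only the set name `ALLOCATOR_RETURN_FNS` and the `<ALLOC_..._N>` placeholder format come from the notes above.

```python
from collections import defaultdict

# Illustrative subset only; the real set in diff_events.py has more entries.
ALLOCATOR_RETURN_FNS = {"XamNotifyCreateListener"}


def canonicalize_returns(events):
    """Replace allocator-style return values with per-(tid, name) ordinals.

    Returns a new list; the input events are not mutated.
    """
    counters = defaultdict(int)
    out = []
    for ev in events:
        ev = dict(ev)  # shallow copy so the caller's event stays intact
        if ev.get("kind") == "kernel.return" and ev.get("name") in ALLOCATOR_RETURN_FNS:
            key = (ev["tid"], ev["name"])
            # Zero-based ordinal is an assumption of this sketch.
            ev["return_value"] = f"<ALLOC_{ev['name']}_{counters[key]}>"
            counters[key] += 1
        out.append(ev)
    return out


# Canary returns a sign-extended host pointer, ours a guest handle id
# (both values here are made up for the example).
canary = [{"tid": 6, "kind": "kernel.return",
           "name": "XamNotifyCreateListener", "return_value": 0xFFFFFFFF80001234}]
ours = [{"tid": 1, "kind": "kernel.return",
         "name": "XamNotifyCreateListener", "return_value": 0x1004}]

same = (canonicalize_returns(canary)[0]["return_value"]
        == canonicalize_returns(ours)[0]["return_value"])
```

Because each stream keeps its own counters, the Nth call on a matched thread gets the same placeholder on both sides even though the raw values can never agree.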
## Tests
`python3 xenia-rs/tools/diff-events/test_diff_events.py`:
- All pre-existing tests still PASS.
- 3 new tests for the absorber:
  - `test_nested_cs_cleanup_block_absorbed_when_convergent`: folds one
    nested pair; the matched prefix continues through to NtClose.
  - `test_nested_cs_cleanup_NOT_absorbed_when_followup_diverges`: when the
    follow-up CONVERGES via a shared handle_destroy SID, absorption fires;
    when it DOESN'T (different next event), the absorber stays silent.
  - `test_nested_cs_cleanup_NOT_absorbed_when_canary_has_no_followup`:
    negative case. Canary's nested block is followed by an unrelated
    call, so the absorber declines and the divergence is reported correctly.
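
For orientation, synthetic streams of the kind these tests exercise might be built as below. This is a hedged sketch: the builders `enter_block`, `leave_block`, and `ntclose_block` are hypothetical stand-ins for the real `_enter_block`, `_leave_block`, and `_ntclose_block` helpers, whose signatures are not shown in this note. The 3-event block layout, the `cs` and `sid` fields, and the SID value are likewise assumptions.

```python
def _block(name, **fields):
    """Build one assumed 3-event block: import.call, kernel.call, kernel.return."""
    kinds = ("import.call", "kernel.call", "kernel.return")
    return [{"kind": k, "name": name, **fields} for k in kinds]


def enter_block(cs):
    return _block("RtlEnterCriticalSection", cs=cs)


def leave_block(cs):
    return _block("RtlLeaveCriticalSection", cs=cs)


def ntclose_block(sid):
    return _block("NtClose", sid=sid)


def matched_prefix(canary, ours):
    """Length of the common prefix under plain event-for-event comparison."""
    n = 0
    for a, b in zip(canary, ours):
        if a != b:
            break
        n += 1
    return n


CS = 0x828F4838   # the critical section named in this note
SID = 0x10        # made-up handle SID for the example

# Canary: E [E L]+ L NtClose(SID), with two nested pairs.
canary = (enter_block(CS)
          + (enter_block(CS) + leave_block(CS)) * 2
          + leave_block(CS) + ntclose_block(SID))
# Ours: the same outer acquire/release with no nested pairs.
ours = enter_block(CS) + leave_block(CS) + ntclose_block(SID)

stall = matched_prefix(canary, ours)
```

Without an absorber, the plain comparison stalls right after the outer Enter-block, which is the kind mismatch the absorber's trigger condition targets.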
## Reading-error class
No new class. The absorber's safety relies on the existing #23 boundary
being EXPLICITLY ANNOTATED as crossed. The schema-v1.md
§"Nested-CS-cleanup absorber" includes the band-aid warning. Future
absorbers following this pattern (folding real guest behavior with narrow
heuristics + post-block re-alignment) should follow the same
explicit-annotation discipline.
## Phase B image hash
`image_loaded_sha256 = ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18`
— UNCHANGED.
## Next session
The 104,607 cap is unblocked at the diff-tool layer. Next concrete
targets:
1. **idx 105,046, VdInitializeEngines return divergence**: canary=1,
   ours=0. Real engine bug. Probably ours's VD stub returns 0 from a
   `void` export incorrectly, or the export needs to return a known
   constant signaling "engines initialized." ~10 LOC after investigation.
2. **State-divergence downstream of the absorbed block**: the tree at
   `(CS 0x828f4838).r30+48` has fewer entries in ours than in canary at
   this point. If a future kernel call reads back from this tree (or
   from related state), divergences will surface. We've accepted those
   as future work.
3. **Sister-chain advances**: D-extension applied symmetrically would
   also fire for sister chains if any of them hit a similar pattern.
   Currently sisters are stuck at 11/32/4/41/16 due to earlier
   divergence classes; D-extension doesn't help them yet.