xenia-rs/audit-runs/phase-c22-rtl-enter-leave-control-flow/investigation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Phase C+22 investigation — RtlEnter/RtlLeave post-wait control-flow divergence (2026-05-18)

TL;DR

ESCALATE. The divergence at tid=6→1 idx=104,607 (canary import.call RtlEnterCriticalSection vs ours import.call RtlLeaveCriticalSection) is a downstream effect of the same scheduler-determinism asymmetry that C+20 escalated. C+21's floating-absorb correctly removes the visible wait.begin jitter event from the diff (floating_wait (c/o) = 2/0 engaged on this chain in the fresh c22 sample). The post-wait guest-code branch is a different matter: canary takes it because shared state was mutated during the wait, so it is NOT an observation artifact. It is a structural behavioral consequence of scheduler interleaving and cannot be papered over at the diff layer without falsely matching genuinely different guest behavior.

Verification: NOT jitter

Per reading-error #32 discipline, sampled 4 canary cold streams + 1 fresh ours cold. The Enter/Leave PATTERN in the post-wait region is structurally consistent across all canary samples:
| sample | events 104,604-104,615 (tid=6, import.call only) |
|---|---|
| c21 archived | E E L L (nested pair after acquire) |
| jitter-1 | E (wait.begin slow-path) E L L |
| jitter-2 | E E L L (same as c21) |
| jitter-3 (index-shifted +3) | E E L L |
| fresh c22 | E (wait.begin slow-path) E L L |

All canary samples take an EXTRA nested RtlEnter after the post-loop E at 104,604. Ours never does: it goes E L NtClose.

The two canary jitter shapes (with vs without the wait.begin emission inside the first E pair) are the C+21 absorption target; both shapes converge to the same post-wait nested-Enter behavior.
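The shape comparison above can be sketched as a small reduction over the event stream. This is a minimal illustration, not the harness's actual code: the `Event` struct and the `ev` helper are hypothetical stand-ins for the real trace record, and only the event kinds and export names quoted in this note are used.

```rust
/// Hypothetical stand-in for one trace record (not the real schema).
#[derive(Debug, Clone)]
pub struct Event {
    pub kind: &'static str, // e.g. "import.call", "wait.begin"
    pub name: &'static str, // export name for import.call, "" otherwise
}

/// Convenience constructor used only for building small samples.
pub fn ev(kind: &'static str, name: &'static str) -> Event {
    Event { kind, name }
}

/// Reduce a window of events to its Enter/Leave shape.
/// Non-import events (such as wait.begin) and unrelated imports
/// (such as NtClose) are dropped, so the two canary jitter shapes
/// collapse to the same string.
pub fn enter_leave_shape(events: &[Event]) -> String {
    events
        .iter()
        .filter(|e| e.kind == "import.call")
        .filter_map(|e| match e.name {
            "RtlEnterCriticalSection" => Some('E'),
            "RtlLeaveCriticalSection" => Some('L'),
            _ => None,
        })
        .collect()
}
```

Under these assumptions, a canary window with or without the embedded wait.begin reduces to `EELL`, while an ours window of Enter, Leave, NtClose reduces to `EL`.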

Mechanism (classification: A + B-via-A)

C+21 absorption confirmed working — the diff harness correctly folds the wait.begin and handle.create events on shared-global dispatcher sid=75ae880ec432eb36 / raw=0xf8000034 (an Event dispatcher used cross-tid) into the matched prefix:

fresh c22 floating_create (c/o) = 1/0
fresh c22 floating_wait   (c/o) = 2/0

Result: matched prefix advances to 104,607 (canary stream internally at idx 104,610 after C+21 unfolds the 3 absorbed events).

The remaining divergence is:

canary [104,610] import.call RtlEnterCriticalSection  (nested 2nd acquire)
ours   [104,607] import.call RtlLeaveCriticalSection  (release first acquire)

This is NOT a "ghost" event. It's a real divergence in guest control flow at the same logical execution point.
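To make the index bookkeeping concrete, here is a hedged sketch of a matched-prefix walk with floating absorption. It is an assumption about how such a fold could work, not a copy of the C+21 implementation; the `Ev` struct, the `ev` helper, and `matched_prefix` are hypothetical names.

```rust
/// Hypothetical event record; field names are illustrative only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Ev {
    pub kind: &'static str, // "import.call", "wait.begin", "handle.create"
    pub name: &'static str, // export name, or "" for non-import events
    pub sid: u64,           // dispatcher sid, 0 when not applicable
}

pub fn ev(kind: &'static str, name: &'static str, sid: u64) -> Ev {
    Ev { kind, name, sid }
}

/// Walk both streams in lockstep. A canary-only wait.begin or
/// handle.create on the floating sid is absorbed; any other mismatch
/// ends the matched prefix.
/// Returns (ours index at stop, canary index at stop, absorbed count).
pub fn matched_prefix(canary: &[Ev], ours: &[Ev], floating_sid: u64) -> (usize, usize, usize) {
    let (mut c, mut o, mut absorbed) = (0usize, 0usize, 0usize);
    while c < canary.len() {
        let ce = &canary[c];
        if o < ours.len() && *ce == ours[o] {
            // Same event on both sides: extend the matched prefix.
            c += 1;
            o += 1;
        } else if ce.sid == floating_sid && matches!(ce.kind, "wait.begin" | "handle.create") {
            // Scheduling-correlated observation: skip it on the canary side only.
            c += 1;
            absorbed += 1;
        } else {
            // Real divergence (for example Enter vs Leave): stop here.
            break;
        }
    }
    (o, c, absorbed)
}
```

With one handle.create and two wait.begin events absorbed, the canary index ends three ahead of the ours index, which mirrors the 104,610 vs 104,607 offset reported above. An import.call mismatch is never absorbed in this sketch.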

Why it happens

Sylpheed's guest code at this PC, after the post-loop CS acquire, reads a state value (e.g. a queue pointer, a reference count, an event-signaled flag) protected by that CS. Based on the value, it either:

  • (canary's path): re-enters a nested CS to drain or clean up additional state, then releases both levels.
  • (ours's path): proceeds directly to release the outer CS and close the Event handle.

In canary's contended scenario, while tid=6 was blocked on the shared dispatcher at 104,608 (the embedded DISPATCHER_HEADER of the CS object — its wait.begin was on sid=75ae880ec432eb36, the canary's first-toucher SID for this Event), another guest thread held the CS and may have mutated the protected state. When tid=6 resumes and the slow-path RtlEnter completes acquisition, the state value that the post-acquire branch reads has changed, and the branch takes the nested-cleanup path.

In ours, tid=1 never blocked here. No other thread had a chance to mutate the protected state during a wait window. The state value the branch reads is the pre-wait value, and the branch takes the simple-release path.

This is the same downstream effect that the C+20 escalation analysis predicted: "That requires ours to schedule tid=9 ahead of (or concurrently with) tid=1's RtlEnter, exactly as canary's host scheduler did. Ours's deterministic single-stepping scheduler runs tid=1 near-monolithically through this region — tid=9 has no opportunity to claim the CS before tid=1 fast-paths through."

The classification is class A in the C+22 prompt taxonomy: ours's RtlEnter takes a fast path (uncontended) that canary's contended path doesn't — same root cause as C+20.
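The mechanism can be modeled with a toy simulation. This is a sketch under stated assumptions: the protected value is represented by a hypothetical `pending` counter (the note does not identify the real field), and the "other thread mutated state during the wait" step is reduced to a single increment.

```rust
/// Hypothetical model of the guest's post-acquire branch.
/// `pending` stands in for whatever CS-protected value the guest reads.
pub fn post_acquire_calls(pending: u32) -> Vec<&'static str> {
    // The post-loop acquire happens on both paths.
    let mut calls = vec!["RtlEnterCriticalSection"];
    if pending > 0 {
        // Nested-cleanup path: second acquire, then release both levels.
        calls.push("RtlEnterCriticalSection");
        calls.push("RtlLeaveCriticalSection");
        calls.push("RtlLeaveCriticalSection");
    } else {
        // Simple-release path: release the outer CS and close the Event.
        calls.push("RtlLeaveCriticalSection");
        calls.push("NtClose");
    }
    calls
}

/// `blocked` models whether the acquiring thread had to wait.
/// Assumption: a wait window lets another thread mutate the protected state.
pub fn run(blocked: bool) -> Vec<&'static str> {
    let mut pending = 0u32;
    if blocked {
        pending += 1; // mutation by the thread that held the CS
    }
    post_acquire_calls(pending)
}
```

In this model, `run(true)` yields the canary-like E E L L sequence and `run(false)` yields the ours-like E L NtClose sequence. The guest code is identical in both cases; only the interleaving differs.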

Why this can't be absorbed in the diff tool (reading-error #23 risk)

Unlike the wait.begin event itself (which is a transient observation directly correlated to scheduling), the post-divergence Enter / Leave sequence corresponds to distinct guest code paths. Folding canary's extra RtlEnter at idx 104,610 + matching RtlLeave at 104,613 into the matched prefix would require the diff tool to over-absorb a 6-event block per contention occurrence, regardless of whether ours's code path ACTUALLY corresponds to canary's contended path. This crosses the line from "scheduling-jitter mitigation" to "matching genuinely different guest behavior" — reading-error #23 in action.

The C+21 absorb is justified because the wait.begin event is guaranteed to be a no-op observation if/when it fires (canary's xeKeWaitForSingleObject is the slow path that the fast path trivially skips). The post-wait Enter / Leave block is the opposite: real work, real guest code execution.

Engine-side fixes considered and rejected

(i) Wire wait.begin into ours's rtl_enter_critical_section park path

Symmetric to canary, but does NOT fix the divergence at idx 104,607 because ours doesn't park here at all. The patch would be inert in this case; the divergence persists. Useful prophylactic but not the C+22 target.

(ii) Force ours to spin-wait briefly at every RtlEnter to give other tids a chance to claim the CS

Extremely fragile, no guarantee of matching canary's exact interleave. Likely shifts divergence elsewhere without resolving it.

(iii) Implement deterministic CS-priority scheduling where any other tid that has a pending wait on the same CS gets to run before the current tid's fast-path

Would change ours's scheduler semantics broadly. Multi-thousand-LOC scope. Explicitly NOT authorized per the C+22 prompt:

You may NOT (without escalating): Refactor scheduler / thread-model.

(iv) Record canary's contention trace and replay it in ours ("scheduling-trace replay")

A new subsystem; recorded under C+20 escalation already.

(v) Modify Sylpheed's guest code at the post-loop branch to force the simple-release path

Would require modifying guest binary: outside scope and defeats the parity goal.

(vi) Add a no-op cs_ptr Phase A emitter additive for diagnosis

~30 LOC each engine + canary recompile. Cvar-OFF zero-cost. Would allow future investigation to distinguish whether canary's nested RtlEnter at 104,610 is on the SAME CS pointer (recursive bump) or a DIFFERENT CS (nested cleanup lock). Deferred: not needed for the escalation decision because the mechanism (post-wait state mutation) is already established by the C+20 analysis; the additional cs_ptr data would only refine the cause-of-branch story.
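If option (vi) is picked up later, the emitter and the question it answers could look roughly like the sketch below. Everything here is hypothetical: `emit_cs_ptr`, `classify_nested`, the `enabled` flag standing in for the cvar, and the output format are illustrations, not existing engine code.

```rust
/// The two outcomes the deferred diagnosis would distinguish.
#[derive(Debug, PartialEq, Eq)]
pub enum NestedKind {
    RecursiveSameCs, // nested Enter on the same CS pointer (recursive bump)
    DifferentCs,     // nested Enter on another CS (nested cleanup lock)
}

/// Compare the guest CS pointers of the outer and the nested acquire.
pub fn classify_nested(outer_cs_ptr: u32, inner_cs_ptr: u32) -> NestedKind {
    if outer_cs_ptr == inner_cs_ptr {
        NestedKind::RecursiveSameCs
    } else {
        NestedKind::DifferentCs
    }
}

/// Cvar-style gate: when disabled the emitter returns immediately and
/// writes nothing, so default runs keep their existing event stream.
pub fn emit_cs_ptr(enabled: bool, cs_ptr: u32, sink: &mut Vec<String>) {
    if !enabled {
        return;
    }
    sink.push(format!("cs_ptr=0x{:08x}", cs_ptr));
}
```

Keeping the emitter behind an off-by-default gate is what would preserve the byte-identical ours-cold baseline while the extra field is unused.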

Cascade outcome (per C+22 prompt)

  • A=verify divergence is NOT jitter: PASS (4 canary cold samples agree on EE-LL nested pattern; C+21 absorber engaged floating_wait (c/o) = 2/0 and matched prefix is 104,607 exactly).
  • B=classify (A/B/C/D): PASS — (A) ours's RtlEnter fast-paths while canary's contends → downstream state mutation during the wait → different post-acquire branch in guest code.
  • C=land fix or escalate cleanly: ESCALATION (per C+22 prompt authorized fallback).
  • D=main matched-prefix > 104,607: N/A (no engine change).

Cold-vs-cold gate matrix (escalation-mode)

| gate | result |
|---|---|
| ours-cold byte-identical to c19 | YES (121,569 events match) |
| Main matched-prefix | 104,607 (= C+21) |
| Sister chains | 11/32/3/41/16 ✓ |
| Phase B image_loaded_sha256 | unchanged ✓ |
| Engine source | UNCHANGED |
| C+21 absorber engagement | 1/0 + 2/0 (fired) |

Per-chain delta vs C+21 baseline

NONE. All chains identical to C+21:

| chain | C+21 | C+22 (this) | delta |
|---|---|---|---|
| canary tid=6 → ours tid=1 main | 104,607 | 104,607 | 0 |
| canary tid=4 → ours tid=11 | 11 | 11 | 0 |
| canary tid=7 → ours tid=2 | 32 | 32 | 0 |
| canary tid=12 → ours tid=7 | 3 | 3 | 0 |
| canary tid=14 → ours tid=9 | 41 | 41 | 0 |
| canary tid=15 → ours tid=10 | 16 | 16 | 0 |

Methodology note — reading-error class #34

#34 (NEW): cold-run determinism depends on input path form. Running ours against default.xex directly (extracted file) produces a different boot trajectory than running against the parent .iso containing it. The C+19 / C+21 baselines used the .iso path; the .xex direct path yields 40x more imports and 1.6M unimpl warnings (CPU stuck/looping in a probe that doesn't fire on the iso). All cold-vs-cold protocol entries MUST use the iso path. Reproduces deterministically: ours-cold against .iso is byte-identical to the c19 archived ours-cold modulo host_ns/guest_cycle fields (verified 121,569 events all match post-normalization).
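The "byte-identical modulo host_ns/guest_cycle" check can be sketched as a normalize-then-compare pass. This is a minimal illustration: the `TraceEvent` struct is a hypothetical record shape, and only the two field names host_ns and guest_cycle come from this note.

```rust
/// Hypothetical trace record; only host_ns and guest_cycle are named in the note.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceEvent {
    pub tid: u32,
    pub kind: String,
    pub payload: String,
    pub host_ns: u64,     // wall-clock field, varies run to run
    pub guest_cycle: u64, // cycle counter field, varies run to run
}

/// Zero the run-dependent fields so the rest can be compared exactly.
pub fn normalized(e: &TraceEvent) -> TraceEvent {
    TraceEvent { host_ns: 0, guest_cycle: 0, ..e.clone() }
}

/// Index of the first event that differs after normalization,
/// or None when both streams agree over their full length.
pub fn first_mismatch(a: &[TraceEvent], b: &[TraceEvent]) -> Option<usize> {
    let n = a.len().min(b.len());
    for i in 0..n {
        if normalized(&a[i]) != normalized(&b[i]) {
            return Some(i);
        }
    }
    if a.len() != b.len() { Some(n) } else { None }
}
```

A `None` result over all events corresponds to the "all match post-normalization" outcome; a length difference is reported as a mismatch at the shorter stream's end.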

Likely cause: the iso path triggers xenia_vfs::disc_image::DiscImageDevice::open at main.rs:1397-1400, mounting a full disc VFS at d:\ / \Device\Cdrom0\. The bare-xex path skips this and leaves the VFS unmounted for most disc-prefixed opens, causing different boot-validator branches.

This affects ALL future cold-vs-cold protocol runs — always pass the .iso path, not the loose .xex.
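One way to make the rule hard to violate is a guard at the start of any cold-vs-cold driver. The function below is a hypothetical helper, not part of the existing tooling; it only inspects the file extension using the standard library.

```rust
use std::path::Path;

/// Reject any input that is not an .iso (case-insensitive extension check).
/// Hypothetical guard for cold-vs-cold protocol runs.
pub fn require_iso(path: &Path) -> Result<(), String> {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("iso") => Ok(()),
        _ => Err(format!(
            "cold-vs-cold runs must use the .iso path, got {}",
            path.display()
        )),
    }
}
```

An extension check is deliberately crude: it catches the loose default.xex case described above but does not verify that the file is a valid disc image.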

Recommendation for next sessions

This is the SECOND C-series session (after C+20) classified as scheduler-determinism in the post-loop RtlEnter region near idx 104,607. The pattern is stable and well-understood. Recommended next-target sequence:

  1. C+23 = D-NEW-2 (KeWaitForSingleObject timeout_ns sign/scale asymmetry on tid=12→7 idx=3): canary=-30000000 vs ours=429466729600. Small ε-class encoding fix in ke_wait_for_single_object's timeout-pointer dereference. Independent of scheduler determinism. Out of scope for C+22 per prompt's explicit "You may NOT ... Fix D-NEW-2 in this session."

  2. C+24 = D-NEW-3 (canary tid=14 → ours tid=9 idx=41: canary calls XAudioGetVoiceCategoryVolumeChangeMask while ours calls RtlEnterCriticalSection). Pre-context shows identical KeReleaseSpinLockFromRaisedIrql + KfLowerIrql pair; the next branch picks completely different exports. Likely a missing/stubbed XAudio export in ours that, when absent, causes a fallback to a different code path.

  3. Open the parallel scheduler-determinism track to attack the C+20 / C+22 family at the root. Estimated multi-session refactor; per prompt this is "a separate session."
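For item 1 (D-NEW-2), the two reported values are arithmetically related, which may help narrow the search. The sketch below is an observation about the numbers only: dividing canary's value by 100, viewing the result as an unsigned 32-bit integer, and multiplying by 100 again reproduces ours's value. Whether ours's ke_wait_for_single_object actually performs these steps is an assumption the source does not confirm; `wrapped_scaled` is a hypothetical name.

```rust
/// Reproduce ours's reported value from canary's reported value.
/// Assumed steps (not confirmed by the source): scale by 1/100,
/// lose the sign through a 32-bit unsigned view, scale back by 100.
pub fn wrapped_scaled(timeout: i64) -> u64 {
    let scaled = timeout / 100;         // -30_000_000 -> -300_000
    let wrapped = scaled as i32 as u32; // -300_000 -> 2^32 - 300_000
    u64::from(wrapped) * 100            // widen without sign, scale back
}
```

For the input -30000000 this returns 429466729600, the value reported for ours, which is consistent with a 32-bit sign loss somewhere in the timeout conversion rather than a pure unit mismatch.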

Files

  • diff-cold-vs-cold.md — full diff report.
  • cold-vs-cold-result.md — matched-prefix table + gates.
  • canary-binary-cache-pre-wipe.tar.gz — pre-wipe oracle backup.
  • canary-xdg-cache-pre-wipe.tar.gz — pre-wipe XDG oracle.
  • escalation.md — this document's TL;DR + recommended next.