Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

# Phase C+22 investigation: RtlEnter/RtlLeave post-wait control-flow divergence (2026-05-18)

## TL;DR

**ESCALATE.** The divergence at tid=6→1 idx=104,607 (canary
`import.call RtlEnterCriticalSection` vs ours
`import.call RtlLeaveCriticalSection`) is a downstream effect of the
**same scheduler-determinism asymmetry** that C+20 escalated. C+21's
floating-absorb correctly removes the visible `wait.begin` jitter
event from the diff (`floating_wait (c/o) = 2/0` engaged on this
chain in the fresh c22 sample), but the **post-wait guest-code
branch** taken in canary because shared state was mutated during
the wait is NOT an observation artifact: it is a structural
behavioral consequence of scheduler interleaving and cannot be
papered over at the diff layer without falsely matching genuinely
different guest behavior.

## Verification: NOT jitter

Per reading-error #32 discipline, sampled 4 canary cold streams
plus 1 fresh ours cold. The Enter/Leave PATTERN in the post-wait
region is structurally consistent across all canary samples:

| sample        | events 104,604-104,615 (tid=6, import.call only) |
|---------------|--------------------------------------------------|
| c21 archived  | E E L L (nested pair after acquire)              |
| jitter-1      | E (wait.begin slow-path) E L L                   |
| jitter-2      | E E L L (same as c21)                            |
| jitter-3      | (index-shifted +3) E E L L                       |
| fresh c22     | E (wait.begin slow-path) E L L                   |

All canary samples take an EXTRA nested RtlEnter after the
post-loop `E` at 104,604. Ours never does; it goes `E L NtClose`.

The two canary jitter shapes (with vs without the wait.begin
emission inside the first E pair) are the C+21 absorption target;
both shapes converge to the same post-wait nested-Enter behavior.

## Mechanism (classification: A + B-via-A)

C+21 absorption confirmed working: the diff harness correctly
folds the wait.begin and handle.create events on shared-global
dispatcher `sid=75ae880ec432eb36 / raw=0xf8000034` (an Event
dispatcher used cross-tid) into the matched prefix:

```
fresh c22 floating_create (c/o) = 1/0
fresh c22 floating_wait   (c/o) = 2/0
```

Result: matched prefix advances to 104,607 (canary stream
internally at idx 104,610 after C+21 unfolds the 3 absorbed
events).

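The absorb-then-compare behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the diff harness's actual code: the `Event` shape, `is_floating`, `matched_prefix`, and the sample streams are all hypothetical, and the real absorber keys on the shared dispatcher SID rather than on event kind alone. It shows why the canary index ends up ahead of the ours index by exactly the number of absorbed events (1 create + 2 waits = 3, matching 104,610 vs 104,607).

```rust
/// One trace event, reduced to the fields this sketch compares (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Event {
    kind: &'static str,   // "import.call", "wait.begin", "handle.create", ...
    target: &'static str, // export name or dispatcher id
}

/// Assumed absorb rule: pure scheduling observations may be skipped on one side.
fn is_floating(e: &Event) -> bool {
    matches!(e.kind, "wait.begin" | "handle.create")
}

/// Where the two streams stop agreeing, plus how many events were absorbed.
#[derive(Debug, PartialEq)]
struct Prefix {
    canary_idx: usize,
    ours_idx: usize,
    absorbed_canary: usize,
    absorbed_ours: usize,
}

fn matched_prefix(canary: &[Event], ours: &[Event]) -> Prefix {
    let (mut c, mut o) = (0usize, 0usize);
    let (mut absorbed_canary, mut absorbed_ours) = (0usize, 0usize);
    while c < canary.len() && o < ours.len() {
        if canary[c] == ours[o] {
            c += 1;
            o += 1;
        } else if is_floating(&canary[c]) {
            c += 1;
            absorbed_canary += 1;
        } else if is_floating(&ours[o]) {
            o += 1;
            absorbed_ours += 1;
        } else {
            break; // two non-floating events disagree: a real divergence
        }
    }
    Prefix { canary_idx: c, ours_idx: o, absorbed_canary, absorbed_ours }
}

fn ev(kind: &'static str, target: &'static str) -> Event {
    Event { kind, target }
}

/// Toy streams shaped like the post-loop region (not real trace data).
fn sample_streams() -> (Vec<Event>, Vec<Event>) {
    let canary = vec![
        ev("import.call", "RtlEnterCriticalSection"),
        ev("wait.begin", "shared-dispatcher"),
        ev("handle.create", "shared-dispatcher"),
        ev("wait.begin", "shared-dispatcher"),
        ev("import.call", "RtlEnterCriticalSection"),
        ev("import.call", "RtlLeaveCriticalSection"),
        ev("import.call", "RtlLeaveCriticalSection"),
    ];
    let ours = vec![
        ev("import.call", "RtlEnterCriticalSection"),
        ev("import.call", "RtlLeaveCriticalSection"),
        ev("import.call", "NtClose"),
    ];
    (canary, ours)
}

fn main() {
    let (canary, ours) = sample_streams();
    println!("{:?}", matched_prefix(&canary, &ours));
}
```

In the toy streams the walk stops with the canary cursor three positions ahead of the ours cursor, on an `RtlEnterCriticalSection` vs `RtlLeaveCriticalSection` pair that neither side may absorb.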
The remaining divergence is:

```
canary [104,610] import.call RtlEnterCriticalSection  (nested 2nd acquire)
ours   [104,607] import.call RtlLeaveCriticalSection  (release first acquire)
```

This is NOT a "ghost" event. It's a real divergence in **guest
control flow** at the same logical execution point.

### Why it happens

Sylpheed's guest code at this PC, after the post-loop CS acquire,
reads a state value (e.g. a queue pointer, a reference count, an
event-signaled flag) protected by that CS. Based on the value, it
either:

- (canary's path): re-enters a nested CS to drain or clean up
  additional state, then releases both levels.
- (ours's path): proceeds directly to release the outer CS and
  close the Event handle.

In canary's contended scenario, while tid=6 was blocked on the
shared dispatcher at 104,608 (the embedded `DISPATCHER_HEADER` of
the CS object; its `wait.begin` was on `sid=75ae880ec432eb36`,
the canary's first-toucher SID for this Event), **another guest
thread held the CS and may have mutated the protected state**.
When tid=6 resumes and the slow-path RtlEnter completes
acquisition, the state value that the post-acquire branch reads
has changed, and the branch takes the nested-cleanup path.

In ours, tid=1 never blocked here. No other thread had a chance
to mutate the protected state during a wait window. The state
value the branch reads is the pre-wait value, and the branch
takes the simple-release path.

This is the same downstream effect that the C+20 escalation
analysis predicted: *"That requires ours to schedule tid=9 ahead
of (or concurrently with) tid=1's RtlEnter, exactly as canary's
host scheduler did. Ours's deterministic single-stepping
scheduler runs tid=1 near-monolithically through this region —
tid=9 has no opportunity to claim the CS before tid=1 fast-paths
through."*

The classification is class A in the C+22 prompt taxonomy:
**ours's RtlEnter takes a fast path (uncontended) that canary's
contended path doesn't; same root cause as C+20.**

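The branch behavior described in this subsection can be modeled with a small sketch. Everything here is an assumption made for illustration: the guest code has not been decompiled in this note, so `Protected`, its `pending` counter, and `post_loop_trace` are hypothetical stand-ins for "some CS-protected state value" and "the post-acquire branch". The model only shows how the same code yields `E E L L` when the acquire was contended and `E L NtClose` when it was not.

```rust
/// Hypothetical stand-in for the state guarded by the critical section.
struct Protected {
    pending: u32,
}

/// Import calls emitted by the post-loop region in this toy model.
/// `contended` models whether the outer acquire had to wait for another owner.
fn post_loop_trace(state: &mut Protected, contended: bool) -> Vec<&'static str> {
    let mut trace = vec!["RtlEnterCriticalSection"]; // post-loop acquire

    if contended {
        // Assumption: while this thread waits, the current owner mutates the
        // guarded state before releasing the CS.
        state.pending += 1;
    }

    if state.pending > 0 {
        // Nested-cleanup path (the shape seen in every canary sample).
        trace.push("RtlEnterCriticalSection");
        state.pending = 0;
        trace.push("RtlLeaveCriticalSection");
        trace.push("RtlLeaveCriticalSection");
    } else {
        // Simple-release path (the shape seen in ours).
        trace.push("RtlLeaveCriticalSection");
        trace.push("NtClose");
    }
    trace
}

fn main() {
    let mut canary_like = Protected { pending: 0 };
    let mut ours_like = Protected { pending: 0 };
    println!("contended:   {:?}", post_loop_trace(&mut canary_like, true));
    println!("uncontended: {:?}", post_loop_trace(&mut ours_like, false));
}
```

The point of the model is that the divergence comes from the value read after the acquire, not from the wait event itself, which is why removing the wait from the diff cannot make the two traces agree.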
### Why this can't be absorbed in the diff tool (reading-error #23 risk)

Unlike the wait.begin event itself (which is a transient
observation directly correlated to scheduling), the
post-divergence Enter / Leave sequence corresponds to **distinct
guest code paths**. Folding canary's extra RtlEnter at idx
104,610 + matching RtlLeave at 104,613 into the matched prefix
would require the diff tool to over-absorb a 6-event block per
contention occurrence, regardless of whether ours's code path
ACTUALLY corresponds to canary's contended path. This crosses
the line from "scheduling-jitter mitigation" to "matching
genuinely different guest behavior": reading-error #23 in
action.

The C+21 absorb is justified because the wait.begin event is
guaranteed to be a no-op observation if/when it fires (canary's
xeKeWaitForSingleObject is the slow path that the fast path
trivially skips). The post-wait Enter / Leave block is the
opposite: real work, real guest code execution.

## Engine-side fixes considered and rejected

### (i) Wire wait.begin into ours's `rtl_enter_critical_section` park path

Symmetric to canary, but does NOT fix the divergence at idx
104,607 because ours doesn't park here at all. The patch would
be inert in this case; the divergence persists. Useful
prophylactic but not the C+22 target.

### (ii) Force ours to spin-wait briefly at every RtlEnter to give other tids a chance to claim the CS

Extremely fragile, no guarantee of matching canary's exact
interleave. Likely shifts divergence elsewhere without resolving
it.

### (iii) Implement deterministic CS-priority scheduling where any other tid that has a pending wait on the same CS gets to run before the current tid's fast-path

Would change ours's scheduler semantics broadly.
Multi-thousand-LOC scope. Explicitly NOT authorized per the C+22
prompt:

> You may NOT (without escalating): Refactor scheduler /
> thread-model.

### (iv) Record canary's contention trace and replay it in ours ("scheduling-trace replay")

A new subsystem; recorded under C+20 escalation already.

### (v) Modify Sylpheed's guest code at the post-loop branch to force the simple-release path

Would require modifying the guest binary, which is outside scope
and defeats the parity goal.

### (vi) Add a no-op `cs_ptr` Phase A emitter additive for diagnosis

~30 LOC each engine + canary recompile. Cvar-OFF zero-cost.
Would allow future investigation to distinguish whether
canary's nested RtlEnter at 104,610 is on the SAME CS pointer
(recursive bump) or a DIFFERENT CS (nested cleanup lock).
Deferred: not needed for the escalation decision because the
mechanism (post-wait state mutation) is already established by
the C+20 analysis; the additional `cs_ptr` data would only
refine the cause-of-branch story.

## Cascade outcome (per C+22 prompt)

- A=verify divergence is NOT jitter: PASS (4 canary cold samples
  agree on EE-LL nested pattern; C+21 absorber engaged
  `floating_wait (c/o) = 2/0` and matched prefix is 104,607
  exactly).
- B=classify (A/B/C/D): PASS. **(A) ours's RtlEnter fast-paths
  while canary's contends → downstream state mutation during the
  wait → different post-acquire branch in guest code.**
- C=land fix or escalate cleanly: ESCALATION (per C+22 prompt
  authorized fallback).
- D=main matched-prefix > 104,607: N/A (no engine change).

## Cold-vs-cold gate matrix (escalation-mode)

| gate                                | result                     |
|-------------------------------------|----------------------------|
| ours-cold byte-identical to c19     | YES (121,569 events match) |
| Main matched-prefix                 | 104,607 (= C+21)           |
| Sister chains                       | 11/32/3/41/16 ✓            |
| Phase B `image_loaded_sha256`       | unchanged ✓                |
| Engine source                       | UNCHANGED                  |
| C+21 absorber engagement            | 1/0 + 2/0 (fired)          |

## Per-chain delta vs C+21 baseline

NONE. All chains identical to C+21:

| chain                          | C+21    | C+22 (this) | delta |
|--------------------------------|---------|-------------|-------|
| canary tid=6 → ours tid=1 main | 104,607 | 104,607     | 0     |
| canary tid=4 → ours tid=11     | 11      | 11          | 0     |
| canary tid=7 → ours tid=2      | 32      | 32          | 0     |
| canary tid=12 → ours tid=7     | 3       | 3           | 0     |
| canary tid=14 → ours tid=9     | 41      | 41          | 0     |
| canary tid=15 → ours tid=10    | 16      | 16          | 0     |

## Methodology note: reading-error class #34

**#34 (NEW): cold-run determinism depends on input path form.**
Running ours against `default.xex` directly (extracted file)
produces a different boot trajectory than running against the
parent `.iso` containing it. The C+19 / C+21 baselines used the
`.iso` path; the `.xex` direct path yields 40x more imports and
1.6M unimpl warnings (CPU stuck/looping in a probe that doesn't
fire on the iso). All cold-vs-cold protocol entries MUST use
the iso path. Reproduces deterministically: ours-cold against
`.iso` is byte-identical to the c19 archived ours-cold modulo
host_ns/guest_cycle fields (verified 121,569 events all match
post-normalization).

Likely cause: the iso path triggers
`xenia_vfs::disc_image::DiscImageDevice::open` at
main.rs:1397-1400, mounting a full disc VFS at `d:\` /
`\Device\Cdrom0\`. The bare-xex path skips this and leaves the
VFS unmounted for most disc-prefixed opens, causing different
boot-validator branches.

This affects ALL future cold-vs-cold protocol runs: always
pass the .iso path, not the loose .xex.

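The "byte-identical modulo host_ns/guest_cycle" check mentioned above can be sketched like this. It is a hedged illustration, not the harness's real normalizer: only the `host_ns` and `guest_cycle` field names come from this note, while the `Event` struct, its other fields, and the `normalize` / `first_mismatch` helpers are hypothetical.

```rust
/// Hypothetical trace record; only `host_ns` and `guest_cycle` are named in the note.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    idx: u64,
    tid: u32,
    kind: String,
    host_ns: u64,     // wall-clock dependent, expected to differ between runs
    guest_cycle: u64, // timing dependent, expected to differ between runs
}

/// Zero the timing fields so that only deterministic content is compared.
fn normalize(e: &Event) -> Event {
    Event { host_ns: 0, guest_cycle: 0, ..e.clone() }
}

/// Index of the first event that differs after normalization, or `None`
/// when both runs match event-for-event and have the same length.
fn first_mismatch(a: &[Event], b: &[Event]) -> Option<usize> {
    let common = a.len().min(b.len());
    for i in 0..common {
        if normalize(&a[i]) != normalize(&b[i]) {
            return Some(i);
        }
    }
    if a.len() != b.len() { Some(common) } else { None }
}

fn ev(idx: u64, kind: &str, host_ns: u64, guest_cycle: u64) -> Event {
    Event { idx, tid: 1, kind: kind.to_string(), host_ns, guest_cycle }
}

fn main() {
    // Two toy runs that differ only in timing fields.
    let run_a = vec![ev(0, "import.call", 100, 7), ev(1, "wait.begin", 250, 19)];
    let run_b = vec![ev(0, "import.call", 180, 9), ev(1, "wait.begin", 990, 42)];
    println!("first mismatch: {:?}", first_mismatch(&run_a, &run_b));
}
```

A comparison of this kind reports no mismatch for two `.iso` cold runs that differ only in timing, and would flag the first diverging event if one run had been started from the loose `.xex` instead.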
## Recommendation for next sessions

This is the SECOND C-series session (after C+20) classified as
scheduler-determinism in the post-loop RtlEnter region near idx
104,607. The pattern is stable and well-understood. Recommended
next-target sequence:

1. **C+23 = D-NEW-2** (`KeWaitForSingleObject` `timeout_ns`
   sign/scale asymmetry on tid=12→7 idx=3): canary=-30000000
   vs ours=429466729600. Small ε-class encoding fix in
   `ke_wait_for_single_object`'s timeout-pointer dereference.
   Independent of scheduler determinism. Out of scope for C+22
   per prompt's explicit "You may NOT ... Fix D-NEW-2 in this
   session."

2. **C+24 = D-NEW-3** (canary tid=14 → ours tid=9 idx=41:
   canary calls `XAudioGetVoiceCategoryVolumeChangeMask` while
   ours calls `RtlEnterCriticalSection`). Pre-context shows
   identical KeReleaseSpinLockFromRaisedIrql + KfLowerIrql
   pair; the next branch picks completely different exports.
   Likely a missing/stubbed XAudio export in ours that, when
   absent, causes a fallback to a different code path.

3. **Open the parallel scheduler-determinism track** to attack
   the C+20 / C+22 family at the root. Estimated multi-session
   refactor; per prompt this is "a separate session."

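For item 1 (D-NEW-2), the two reported `timeout_ns` values are arithmetically consistent with one specific reading, offered here as a hypothesis to check rather than a confirmed diagnosis: a raw relative timeout of -300000 ticks of 100 ns (30 ms), which canary scales with its sign intact, while ours keeps only the low 32 bits, zero-extends them, and then scales. The function names below are hypothetical and do not correspond to code in either engine.

```rust
/// Scale a raw NT-style timeout (100 ns ticks, negative = relative) to
/// nanoseconds, preserving the sign.
fn timeout_ns_signed(raw_ticks: i64) -> i64 {
    raw_ticks * 100
}

/// Hypothetical faulty variant: truncate to the low 32 bits, zero-extend,
/// then scale. A small negative value becomes a large positive one.
fn timeout_ns_zero_extended_low32(raw_ticks: i64) -> i64 {
    (raw_ticks as u32 as i64) * 100
}

fn main() {
    let raw_ticks: i64 = -300_000; // assumed raw value: 30 ms relative wait
    println!("signed:        {}", timeout_ns_signed(raw_ticks));
    println!("zero-extended: {}", timeout_ns_zero_extended_low32(raw_ticks));
}
```

Under that assumption the signed path yields -30000000 and the zero-extended path yields (2^32 - 300000) * 100 = 429466729600, which are exactly the canary and ours values quoted in item 1. If the real raw value differs, the same two functions give a quick way to test other candidates.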
## Files

- `diff-cold-vs-cold.md`: full diff report.
- `cold-vs-cold-result.md`: matched-prefix table + gates.
- `canary-binary-cache-pre-wipe.tar.gz`: pre-wipe oracle backup.
- `canary-xdg-cache-pre-wipe.tar.gz`: pre-wipe XDG oracle.
- `escalation.md`: this document's TL;DR + recommended next.