handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,52 @@
# Phase C+20 cold-vs-cold result (2026-05-14)

## No engine changes

C+20 produces NO source modifications to either `xenia-rs` or
`xenia-canary`. The session was a verify-only iteration concluding
in an escalation decision (see `investigation.md`).

## Matched-prefix table (vs C+19 baseline)

| chain                          | C+19    | C+20    | delta |
|--------------------------------|---------|---------|-------|
| canary tid=6 → ours tid=1 main | 104,606 | 104,606 | 0     |
| canary tid=4 → ours tid=11     | 11      | 11      | 0     |
| canary tid=7 → ours tid=2      | 32      | 32      | 0     |
| canary tid=12 → ours tid=7     | 3       | 3       | 0     |
| canary tid=14 → ours tid=9     | 41      | 41      | 0     |
| canary tid=15 → ours tid=10    | 16      | 16      | 0     |

## Cold-stable digest

`e1dfcb1559f987b35012a7f2dc6d93f5`: unchanged from
C+13/C+15-α/C+16/C+17/C+18/C+19 (no source changes; digest cannot drift).

## Phase B image hash

`ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18`:
unchanged (no source changes).

## Tests

Kernel: 204 PASS (unchanged from C+19, no test additions).

## Canary non-determinism observation (NEW; RE class #32)

Cross-validated across 3 fresh canary cold jsonls
(`canary-jitter-{1,2,3}.jsonl` from C+19 jitter probe, all wiped-cache
cold). At tid=6 idx 104,606:

- jitter-1: `wait.begin sid=75ae880ec432eb36 timeout=-1`
- jitter-2: `kernel.return RtlEnterCriticalSection rv=0` ← matches ours
- jitter-3: `kernel.call RtlLeaveCriticalSection` (offset shift; the
  wait.begin moved to idx 104,603 with a different SID)

The contention pattern in canary's `RtlEnterCriticalSection` is
host-scheduler-dependent. The matched-prefix metric is unreliable in
this region.
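
For readers unfamiliar with the metric, here is a minimal sketch of how a matched-prefix count could be computed, assuming it means "the number of leading events on which the two per-thread streams agree". The `Event` shape and the `ev` helper are hypothetical reductions for illustration; the real jsonl schema and diff tool are not shown in these notes.

```rust
/// One trace event, reduced to two compared fields.
/// Hypothetical shape: the real jsonl records carry more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub kind: String,   // e.g. "kernel.call", "wait.begin"
    pub detail: String, // e.g. "RtlEnterCriticalSection"
}

/// Convenience constructor (illustrative only).
pub fn ev(kind: &str, detail: &str) -> Event {
    Event { kind: kind.to_string(), detail: detail.to_string() }
}

/// Number of leading events on which both streams agree.
/// If the streams diverge, the first differing event sits at this index.
pub fn matched_prefix(canary: &[Event], ours: &[Event]) -> usize {
    canary
        .iter()
        .zip(ours.iter())
        .take_while(|(c, o)| c == o)
        .count()
}
```

Under this definition a matched prefix of 104,606 places the first differing event at idx 104,606, which is consistent with the divergence index reported above. A single jittery event is enough to cap the count, which is why the metric is fragile in contention-dependent regions.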

## Outcome

ESCALATE to scheduler-determinism track (separate session, larger
scope).
271
audit-runs/phase-c20-rtl-enter-cs-wait/investigation.md
Normal file
@@ -0,0 +1,271 @@
# Phase C+20 investigation: RtlEnterCriticalSection wait.begin (2026-05-14)

## Framing verification (reading-error #28 discipline)

### Canary's RtlEnterCriticalSection: xboxkrnl_rtl.cc:596-633

```cpp
void RtlEnterCriticalSection_entry(pointer_t<X_RTL_CRITICAL_SECTION> cs) {
  if (!cs.guest_address()) { ... return; }
  CriticalSectionPrefetchW(&cs->lock_count);
  uint32_t cur_thread = XThread::GetCurrentThread()->guest_object();
  uint32_t spin_count = cs->header.absolute * 256;

  if (cs->owning_thread == cur_thread) {          // RECURSIVE FAST PATH
    xe::atomic_inc(&cs->lock_count);
    cs->recursion_count++;
    return;
  }

  // Spin loop
  while (spin_count--) {
    if (xe::atomic_cas(-1, 0, &cs->lock_count)) { // UNCONTENDED FAST PATH
      cs->owning_thread = cur_thread;
      cs->recursion_count = 1;
      return;
    }
  }

  if (xe::atomic_inc(&cs->lock_count) != 0) {     // CONTENDED SLOW PATH
    // Create a full waiter.
    xeKeWaitForSingleObject(reinterpret_cast<void*>(cs.host_address()), 8, 0, 0,
                            nullptr);
  }
  assert_true(cs->owning_thread == 0);
  cs->owning_thread = cur_thread;
  cs->recursion_count = 1;
}
```

Canary **only** emits `wait.begin` on the contended slow path (via the
`xeKeWaitForSingleObject` call). The wait handle is the CS struct
pointer; `xeKeWaitForSingleObject` resolves it via `XObject::GetNativeObject`,
which lazy-wraps the embedded `DISPATCHER_HEADER` (first 12 bytes of the
CS struct) as an `XEvent`. The SID `75ae880ec432eb36` (object_type=1,
raw_handle=0xf8000044) seen at canary tid=9 idx=295 IS this Event,
synthesized on first contention.

### xeKeWaitForSingleObject emit point: xboxkrnl_threading.cc:969-991

```cpp
uint32_t xeKeWaitForSingleObject(void* object_ptr, uint32_t wait_reason, ...) {
  auto object = XObject::GetNativeObject<XObject>(kernel_state(), object_ptr);
  if (!object) { assert_always(); return X_STATUS_ABANDONED_WAIT_0; }

  if (phase_a::IsEnabled()) {
    uint64_t sid = 0;
    if (!object->handles().empty()) {
      sid = phase_a::LookupHandleSemanticId(object->handles()[0]);
    }
    int64_t timeout_ns = timeout_ptr ? (*timeout_ptr * 100) : -1;
    phase_a::EmitWaitBegin(&sid, 1, timeout_ns, alertable != 0, false);
  }

  X_STATUS result = object->Wait(...);
  ...
}
```

Confirms: `wait.begin` fires only when the slow path is taken.
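
The `timeout_ns` line in the snippet above also explains the `timeout=-1` field in the trace: the CS slow path passes a null timeout pointer, so the emit falls back to the -1 sentinel. A minimal Rust sketch of that conversion follows, assuming the timeout is read as a signed 64-bit count of 100 ns ticks. The function name and the saturating multiply are choices made for this illustration, not taken from either engine.

```rust
/// Convert an optional timeout in 100 ns ticks to nanoseconds.
/// `None` models a null `timeout_ptr` and yields the -1 sentinel that
/// appears as `timeout=-1` in the trace.
/// Assumption: ticks are treated as i64 and the multiply saturates; the
/// C++ shown above uses a plain multiplication.
pub fn timeout_ns(timeout_ticks: Option<i64>) -> i64 {
    match timeout_ticks {
        Some(ticks) => ticks.saturating_mul(100),
        None => -1,
    }
}
```

Note that the sign of the tick count is preserved by the scaling, so a one-tick negative timeout becomes -100 rather than colliding with the -1 sentinel. Any encoding comparison between the two engines has to agree on both the scale factor and the sentinel.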

### Ours's rtl_enter_critical_section: exports.rs:2886-2946

It has three branches:

1. `owner == 0 || !owner_is_live` → claim uncontended.
2. `owner == current_tid` → recursive bump.
3. otherwise → park current thread on `cs_waiters` via
   `state.scheduler.park_current(BlockReason::CriticalSection(cs_ptr))`.

The park path does NOT emit `wait.begin`. It is semantically symmetric
to canary's slow path, but emits no schema event.
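
The three branches listed above can be modeled as a small decision function. This is a sketch under stated assumptions, not the code in `exports.rs`: the `CriticalSection` struct, the `EnterOutcome` enum, and passing `owner_is_live` as a plain boolean are all hypothetical simplifications, and the real park call goes through the scheduler rather than returning a value.

```rust
/// Outcome of one enter attempt, one variant per branch.
#[derive(Debug, PartialEq, Eq)]
pub enum EnterOutcome {
    ClaimedUncontended,
    RecursiveBump,
    Parked,
}

/// Minimal critical-section state (illustrative field names).
#[derive(Debug, Default)]
pub struct CriticalSection {
    pub owner: u32,     // 0 means unowned
    pub recursion: u32, // nesting depth held by `owner`
}

/// Decide which branch an enter attempt takes and update the state.
/// `owner_is_live` stands in for the liveness lookup on the owner tid.
pub fn enter(cs: &mut CriticalSection, current_tid: u32, owner_is_live: bool) -> EnterOutcome {
    if cs.owner == 0 || !owner_is_live {
        // Branch 1: free, or held by a thread that no longer exists.
        cs.owner = current_tid;
        cs.recursion = 1;
        EnterOutcome::ClaimedUncontended
    } else if cs.owner == current_tid {
        // Branch 2: re-entry by the current owner.
        cs.recursion += 1;
        EnterOutcome::RecursiveBump
    } else {
        // Branch 3: the real code parks the thread here; per the notes,
        // no wait.begin is emitted on this path.
        EnterOutcome::Parked
    }
}
```

Only the third outcome corresponds to canary's slow path, so it is the only place where a `wait.begin` emit could ever be symmetric with canary.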

## Divergent event observed (fresh canary cold + fresh ours cold)

```
[104604] ours+canary  import.call   RtlEnterCriticalSection
[104605] ours+canary  kernel.call   RtlEnterCriticalSection
[104606] CANARY       wait.begin    sid=75ae880ec432eb36 timeout=-1 wait_type=any
[104606] OURS         kernel.return RtlEnterCriticalSection rv=0
[104607] CANARY       kernel.return RtlEnterCriticalSection rv=0
```

## Classification

This is a **(B) Real contention difference**, NOT (A) always-wait, NOT
(C) emit gap.

Evidence:

1. Canary's RtlEnterCriticalSection source code provably only emits
   wait.begin in the contended branch. The two neighboring
   RtlEnterCriticalSection sequences (canary tid=6 idx=104,598-600 and
   idx=104,608-610) BOTH fast-path (no wait.begin), proving canary's
   path is conditional on contention.

2. SID `75ae880ec432eb36` appears 15 times in canary, on 4 different
   tids (tid=6/9/10/18). Always with object_type=1 (Event). All 15
   are `wait.begin` (or 1 `handle.create` first-touch). This is a
   shared CS used across the title's thread pool.

3. At canary's idx 104,604, the CS is contended because tid=9 is
   simultaneously doing cache-file work (NtCreateFile
   cache:\69d8e45ce534ffea.tmp at canary tid=9 idx=305) that almost
   certainly enters the same CS first. Canary's host_ns gap between
   ours-idx 104,603 (RtlLeave) and 104,604 (RtlEnter) is **268.2 ms**,
   during which thousands of other-tid events fire.

4. At ours's idx 104,604, only tid=1 and tid=5 are active in a 1 ms
   window around the call. tid=5 is in `MmFreePhysicalMemory` and is not
   touching this CS. Ours's gap between idx 104,603→104,604 is
   **7.6 μs**. Effectively single-threaded.

5. Ours has no other live thread holding this CS, so the fast path is
   the correct semantic result for ours's scheduling.
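
Evidence item 2 rests on a per-SID tally of the canary trace. A minimal sketch of such a tally is shown below, assuming each record can be reduced to a thread id, an event kind, and an SID string. The `SidEvent` and `SidUsage` types are hypothetical; the actual analysis tooling is not part of these notes.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical reduced record: who emitted what for which SID.
pub struct SidEvent<'a> {
    pub tid: u32,
    pub kind: &'a str,
    pub sid: &'a str,
}

/// Summary of how one SID is used across a trace.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct SidUsage {
    pub total: usize,                     // number of events carrying the SID
    pub tids: BTreeSet<u32>,              // distinct threads that touched it
    pub by_kind: BTreeMap<String, usize>, // event count per event kind
}

/// Tally every event that references `sid`.
pub fn sid_usage(events: &[SidEvent<'_>], sid: &str) -> SidUsage {
    let mut usage = SidUsage::default();
    for e in events.iter().filter(|e| e.sid == sid) {
        usage.total += 1;
        usage.tids.insert(e.tid);
        *usage.by_kind.entry(e.kind.to_string()).or_insert(0) += 1;
    }
    usage
}
```

An SID that shows up on several tids and is dominated by `wait.begin` is the profile described in item 2 for a shared critical section.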

## Why this is scheduler determinism

The contention pattern emerges from the **interleaving** of multiple
guest threads racing on a shared CS. To make ours produce the same
event sequence as canary at this idx, we would need:

- tid=9 (or another holder) to be currently inside its critical
  section block when tid=1 reaches idx 104,604.
- That requires ours to schedule tid=9 ahead of (or concurrently
  with) tid=1's RtlEnter, exactly as canary's host scheduler did.
- Ours's deterministic single-stepping scheduler runs tid=1
  near-monolithically through this region, so tid=9 has no opportunity
  to claim the CS before tid=1 fast-paths through.

This is the canonical signature of **cross-thread scheduling
asymmetry**. Fixing it requires one of:

(i) Reworking ours's scheduler to interleave threads at finer
    granularity matching canary's preemption points: a substantial
    refactor of `xenia-cpu::scheduler`.

(ii) Recording a "scheduling trace" from canary (which thread holds
     which CS at which guest_cycle) and replaying it in ours: a new
     subsystem.

(iii) Forcing ours to spin-wait briefly at every RtlEnter so other
      tids get a chance to claim the CS: extremely fragile, with no
      guarantee of matching canary's exact interleave.

None of these are scoped for a single phase-C iteration. The prompt's
authorized scope explicitly says:

> You may NOT refactor thread scheduling (escalation: scheduler
> determinism is a separate session).

> Escalation: if classification is (B) and scheduler determinism is
> required, escalate cleanly — don't push through.

## Decision: ESCALATE + diff-tool TODO

C+20 produces no engine change. The classification, supporting
evidence, and recommended escalation path are recorded for a future
"scheduler-determinism" milestone.

**Additional diff-tool action (NOT executed in C+20 per scope)**: the
diff tool should be taught to absorb cross-tid race-window
`wait.begin` events on shared CS dispatchers (analogous to C+18's
shared-global SID floating-absorb for `handle.create`). The
divergence at idx 104,606 is a strict sub-case of class #30
(scheduling-determinism observation artifact). A follow-up phase
(C+20.5 or part of the scheduler-determinism track) should:

1. Detect `wait.begin` events with SID matching the canary jitter-1's
   `75ae880ec432eb36` pattern (multi-tid usage, type=1 Event,
   first-touched by `GetNativeObject` from an RtlEnter slow path).
2. Mark as "scheduling-jitter-window" and floating-absorb in the diff
   walk so matched-prefix doesn't anchor to it.

This would reveal the true next divergence beyond the jitter cloud.
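
The absorb step proposed above could look roughly like the following walk. It is a sketch under assumptions: the `Event` shape, the `ev` helper, and the idea of passing a precomputed set of jitter SIDs are all hypothetical, and it simply skips matching `wait.begin` events on either side rather than implementing whatever floating-absorb logic C+18 actually uses.

```rust
use std::collections::HashSet;

/// Hypothetical reduced trace event.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub kind: String,
    pub sid: Option<String>,
    pub detail: String,
}

/// Convenience constructor (illustrative only).
pub fn ev(kind: &str, sid: Option<&str>, detail: &str) -> Event {
    Event {
        kind: kind.to_string(),
        sid: sid.map(|s| s.to_string()),
        detail: detail.to_string(),
    }
}

/// True for a wait.begin whose SID was classified as a jitter-window SID.
fn is_jitter_wait(e: &Event, jitter_sids: &HashSet<String>) -> bool {
    e.kind == "wait.begin" && e.sid.as_ref().map_or(false, |s| jitter_sids.contains(s))
}

/// Count matched event pairs, skipping jitter-window wait.begin events on
/// either stream so they cannot anchor the first divergence.
pub fn matched_prefix_absorbing(
    canary: &[Event],
    ours: &[Event],
    jitter_sids: &HashSet<String>,
) -> usize {
    let (mut i, mut j, mut matched) = (0, 0, 0);
    loop {
        while i < canary.len() && is_jitter_wait(&canary[i], jitter_sids) {
            i += 1;
        }
        while j < ours.len() && is_jitter_wait(&ours[j], jitter_sids) {
            j += 1;
        }
        if i >= canary.len() || j >= ours.len() || canary[i] != ours[j] {
            return matched;
        }
        matched += 1;
        i += 1;
        j += 1;
    }
}
```

The design risk is that an over-broad SID set would hide a genuine missing wait, so the set should be populated only from SIDs that meet the detection criteria in step 1.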

## Risk of "partial" fixes considered

### Could we just always emit wait.begin in ours's rtl_enter_critical_section?

No. It would produce phantom wait.begin events on the fast path where
canary correctly emits none, and would regress at the very next
RtlEnterCriticalSection that ours fast-paths (e.g., ours idx 104,598
where canary also fast-paths). Net effect: shifts the divergence
elsewhere, doesn't fix it.

### Could we wire wait.begin into ours's park_current(CriticalSection)?

Yes. This would be semantically symmetric to canary and is a small
patch (~25 LOC). But it would NOT fix the divergence at idx 104,606,
because ours doesn't park at this call site at all. The patch would
be inert until a different test case exposes a path where ours
*does* park on a CS. Useful prophylactic, but not the C+20 target.

### Could we remove the `owner_is_live` shortcut?

The `!owner_is_live` heuristic in ours treats `owner != 0 &&
find_by_tid(owner).is_none()` as "free". At idx 104,604, this is not
the triggered branch: the CS is genuinely uncontended (`owner == 0`
on the first probe), so removing it doesn't change behavior here.

## Reading-error class #31 (documented per prompt) + #32 (NEW)

**#31 Stale-canary-jsonl trap: always re-run canary fresh for cold-vs-cold
measurements.** The prompt established this.

**#32 (NEW) Canary itself is non-deterministic across cold runs in
contention-dependent regions.** Cross-checking the 3 fresh canary jitter
jsonls at tid=6 idx 104,595-104,612 confirms canary is structurally
non-deterministic here:

| jitter | idx 104,606 event |
|--------|-------------------|
| 1      | `wait.begin sid=75ae880ec432eb36` |
| 2      | `kernel.return RtlEnterCriticalSection` (fast path, no wait!) |
| 3      | `kernel.call RtlLeaveCriticalSection` (sequence shifted; the wait.begin shifted to idx 104,603 with sid=a25a16a4f6f547aa) |

jitter-2's behavior at idx 104,606 is **bit-identical to ours**. jitter-3
has the wait.begin at a different idx with a different SID, proving the
contention pattern is host-scheduler-dependent in canary itself.

This means:

1. The prompt's framing ("canary emits wait.begin, ours emits
   kernel.return") was based on ONE jitter sample (jitter-1). It is not
   a stable structural property of canary.
2. Matched-prefix as a cross-engine metric is **unreliable** in regions
   where canary's contention is host-scheduler-driven.
3. There is NO real engine bug to fix here. Ours's behavior matches
   canary jitter-2 at idx 104,606 verbatim.

**Reading-error class #32**: assuming canary determinism by sampling
ONE cold run; need ≥2-3 cold samples to distinguish "real divergence"
from "scheduler-driven jitter window".
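
The multi-sample rule can be written down as a small classifier. This is a sketch of the reasoning only, assuming each sample is reduced to a string describing the event at the index under test; the `Verdict` names and the two-sample minimum are choices made for this illustration.

```rust
/// Verdict for one index, given ours and several cold canary samples.
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    /// All canary samples agree with each other and with ours.
    Match,
    /// Canary samples disagree among themselves: a jitter window.
    JitterWindow,
    /// Canary samples agree with each other but not with ours.
    RealDivergence,
    /// Fewer than two canary samples: canary stability is unknown.
    Inconclusive,
}

/// Classify the event at one index across cold runs.
pub fn classify(ours: &str, canary_samples: &[&str]) -> Verdict {
    if canary_samples.len() < 2 {
        return Verdict::Inconclusive;
    }
    let canary_stable = canary_samples.windows(2).all(|w| w[0] == w[1]);
    if !canary_stable {
        Verdict::JitterWindow
    } else if canary_samples[0] == ours {
        Verdict::Match
    } else {
        Verdict::RealDivergence
    }
}
```

Feeding it the three jitter events from the table at idx 104,606 yields `JitterWindow`, whereas the original single-sample reading (jitter-1 alone) would be reported as `Inconclusive` rather than as a divergence.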

## Cascade outcome

- A=verify canary's RtlEnterCriticalSection impl: PASS.
- B=classify (A/B/C): PASS; (B), real contention.
- C=land fix (or clean escalation): ESCALATION (per the prompt's
  authorized scope).
- D=main matched-prefix > 104,606: N/A (no code change).

## Recommendation for next session

C+20-escalation = open a parallel **scheduler-determinism** track:

1. Add a per-CS-pointer "expected contention" inference from canary
   logs.
2. Drive ours's scheduler to preempt tid=1 at each RtlEnter site
   where canary's matched call exhibits a wait.begin.
3. Verify diff-tool absorbs as a structured "scheduling-trace replay"
   event class.
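
As a starting point for step 1, contended enters can be recovered from a single-thread canary stream by looking for a `wait.begin` between a `kernel.call RtlEnterCriticalSection` and its `kernel.return`. The sketch below keys results by call index rather than by CS pointer, because the reduced `Event` record used here (a hypothetical shape) does not carry the pointer; a real implementation would need that field from the log.

```rust
/// Hypothetical reduced record for one thread's event stream.
pub struct Event<'a> {
    pub kind: &'a str, // e.g. "kernel.call", "wait.begin", "kernel.return"
    pub name: &'a str, // export name, empty for events without one
}

/// Indices of `kernel.call RtlEnterCriticalSection` events that reach a
/// `wait.begin` before their matching `kernel.return`.
pub fn contended_enter_calls(events: &[Event<'_>]) -> Vec<usize> {
    let mut contended = Vec::new();
    // Index of the enter call currently in flight, if any.
    let mut open_call: Option<usize> = None;
    for (idx, e) in events.iter().enumerate() {
        match (e.kind, e.name) {
            ("kernel.call", "RtlEnterCriticalSection") => open_call = Some(idx),
            ("wait.begin", _) => {
                // A wait inside an open enter call marks it as contended.
                if let Some(call_idx) = open_call.take() {
                    contended.push(call_idx);
                }
            }
            ("kernel.return", "RtlEnterCriticalSection") => open_call = None,
            _ => {}
        }
    }
    contended
}
```

Because canary's contention varies between cold runs, any such inference describes one sample only and would have to be paired with the trace it was derived from.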

In parallel, address **D-NEW-2** (`KeWaitForSingleObject` `timeout_ns`
sign/scale asymmetry on tid=12→7 idx=3), a small ε-class encoding
fix that's independent of scheduler determinism.

Also worth landing as a small prophylactic patch (NOT in C+20): wire
`wait.begin` into ours's `rtl_enter_critical_section` park path so
that whenever the slow path IS triggered, ours emits the schema event.
Defer until the first such case manifests.