# Phase C+21 investigation: wait.begin floating-absorb (2026-05-14)

## Framing (extends C+20 reading-error #32)

C+20 escalated the divergence at canary tid=6 idx=104,606 to "scheduler
determinism" because the cross-3-cold-run canary jitter survey showed
the wait.begin was **host-scheduler-driven** in canary itself:

| jitter | idx 104,606 event                                                 |
|--------|-------------------------------------------------------------------|
| 1      | `wait.begin sid=75ae880ec432eb36`                                 |
| 2      | `kernel.return RtlEnterCriticalSection` (fast path; matches ours) |
| 3      | offset-shifted; wait.begin at idx 104,603 with different SID      |

The diff harness's per-`tid_event_idx` matching anchors to whichever
canary cold sample is chosen. The bug is observational: ours' behavior
is structurally equivalent to canary's fast path, and the divergence is
induced by canary's slow-path entry on contention, which does not
reproduce in ours.

C+20 deferred a fix because the scope said "no scheduler
determinism". C+21 takes the lighter-weight option per the prompt:
**extend the C+18 floating-absorb pattern to wait.begin events
referencing shared-global SIDs**. This works because:

1. The SIDs at issue are shared-global dispatcher SIDs (same
   pointer-derived recipe as the `KEVENT`/`KSEMAPHORE` cases C+18
   addressed).
2. The wait.begin events themselves are observation-side artifacts of
   thread-scheduling contention, so they belong to the same
   harness-observation-error class as the floating handle.create events.

## Verified observational

Verified against 3 archived canary cold jitter jsonls plus a fresh C+21
canary cold run captured under wiped-cache conditions:

```
SID `75ae880ec432eb36`:
  - canary-jitter-1: handle.create on tid=9 idx=295; wait.begin on
    tid=6/9/10/18 (15× total)
  - canary-jitter-2: similar multi-tid usage (NOT at idx 104,606)
  - canary-jitter-3: similar; wait.begin shifted to idx 104,603
  - canary-cold-c21 (fresh): same SID pattern; idx 104,606 fast-paths
  - ours-cold-c19: SID never appears (contention never reproduces)
```

Multi-tid SID usage on tids 6/9/10/18 (or 6/9/10/16/17/18/26 for the
related SID `a25a16a4f6f547aa`) is a robust shared-global signature.

The canary `EmitHandleCreateSharedGlobal` (`event_log.cc:435`) is
asymmetric: it hashes the dispatcher VA but stashes `object->handle()`
as raw_handle_id. As a result, canary's shared-global handle.create
events are NOT self-recognizable by the C+18 recipe check alone. The
C+21 fix adds a complementary cross-tid usage heuristic that detects
them through their multi-tid presence in either handle.create OR
wait.begin events.

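The cross-tid heuristic described above can be sketched roughly as follows. This is a minimal illustration, not the code in `diff_events.py`: the flat event-dict layout, the `semantic_id` field on handle.create events, and the helper names `sids_of`, `cross_tid_sids`, and `collect_shared_global_sids_sketch` are assumptions made for the example. Only `kind` values and `handles_semantic_ids` come from this note. The C+18 recipe check is not reproduced; its result is passed in as `recipe_sids`.

```python
from collections import defaultdict


def sids_of(ev):
    """Return the SIDs an event references (field names are assumed)."""
    if ev["kind"] == "handle.create":
        return [ev["semantic_id"]]
    if ev["kind"] == "wait.begin":
        return list(ev["handles_semantic_ids"])
    return []


def cross_tid_sids(by_tid, min_tids=2):
    """SIDs referenced by handle.create or wait.begin on min_tids+ distinct tids."""
    tids_per_sid = defaultdict(set)
    for tid, events in by_tid.items():
        for ev in events:
            for sid in sids_of(ev):
                tids_per_sid[sid].add(tid)
    return {sid for sid, tids in tids_per_sid.items() if len(tids) >= min_tids}


def collect_shared_global_sids_sketch(canary_by_tid, ours_by_tid,
                                      recipe_sids=frozenset()):
    """Union of recipe-matched SIDs (passed in) and cross-tid SIDs of either engine."""
    return (set(recipe_sids)
            | cross_tid_sids(canary_by_tid)
            | cross_tid_sids(ours_by_tid))


# Toy input: the first SID is the one from this note; "1111111111111111"
# is a made-up single-tid SID used only to show that it is NOT collected.
canary = {
    6: [{"kind": "wait.begin", "handles_semantic_ids": ["75ae880ec432eb36"]}],
    9: [{"kind": "handle.create", "semantic_id": "75ae880ec432eb36"},
        {"kind": "handle.create", "semantic_id": "1111111111111111"}],
}
ours = {1: [{"kind": "kernel.return"}]}

shared = collect_shared_global_sids_sketch(canary, ours)
print(sorted(shared))
```

Counting distinct tids per SID, rather than total references, is what keeps a per-thread object that is waited on many times by a single thread out of the floating set.
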
## The fix (diff tool only, no engine changes)

`xenia-rs/tools/diff-events/diff_events.py`:

1. `collect_shared_global_sids(canary_by_tid, ours_by_tid)`: new
   pre-pass union of (a) recipe-matching handle.create SIDs (C+18)
   AND (b) any SID referenced by handle.create OR wait.begin on 2+
   distinct tids in either engine (C+21 cross-tid heuristic).
2. `is_shared_global_wait_begin(ev, shared_sids)`: classifies a
   wait.begin event as floating if ANY of its
   `handles_semantic_ids` is in `shared_sids` (covers `wait_type=any`
   and `wait_type=all`).
3. `diff_one_tid`: extends the two-pointer walk to absorb floating
   wait.begin events on kind mismatch, mirroring the C+18
   handle.create absorption logic. Per-thread waits remain strict;
   only shared-global waits float.

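Steps 2 and 3 can be illustrated with a simplified two-pointer walk. This is a sketch under stated assumptions, not the actual `diff_one_tid`: the function `matched_prefix`, the `name` field, and the match rule (equal `kind` and `name`) are hypothetical simplifications, and the C+18 handle.create absorption is left out.

```python
def is_shared_global_wait_begin(ev, shared_sids):
    """True if ev is a wait.begin that references any shared-global SID."""
    if ev.get("kind") != "wait.begin":
        return False
    return any(sid in shared_sids for sid in ev.get("handles_semantic_ids", []))


def matched_prefix(canary, ours, shared_sids):
    """Two-pointer walk; returns (matched, absorbed_canary, absorbed_ours)."""
    i = j = matched = abs_c = abs_o = 0
    while i < len(canary) and j < len(ours):
        c, o = canary[i], ours[j]
        if c["kind"] == o["kind"] and c.get("name") == o.get("name"):
            matched += 1
            i += 1
            j += 1
        elif is_shared_global_wait_begin(c, shared_sids):
            abs_c += 1
            i += 1  # skip the floating canary-side wait
        elif is_shared_global_wait_begin(o, shared_sids):
            abs_o += 1
            j += 1  # skip the floating ours-side wait
        else:
            break  # strict mismatch: the matched prefix ends here
    return matched, abs_c, abs_o


shared = {"75ae880ec432eb36"}
ours = [
    {"kind": "kernel.call", "name": "RtlEnterCriticalSection"},
    {"kind": "kernel.return", "name": "RtlEnterCriticalSection"},
]
# Canary sample that hit contention: an extra wait.begin in the middle.
contended = [
    {"kind": "kernel.call", "name": "RtlEnterCriticalSection"},
    {"kind": "wait.begin", "handles_semantic_ids": ["75ae880ec432eb36"]},
    {"kind": "kernel.return", "name": "RtlEnterCriticalSection"},
]
# Canary sample that took the fast path: same events as ours.
fast_path = list(ours)

print(matched_prefix(contended, ours, shared))  # (2, 1, 0)
print(matched_prefix(fast_path, ours, shared))  # (2, 0, 0)
print(matched_prefix(contended, ours, set()))   # (1, 0, 0)
```

With the SID in the shared set, the contended and fast-path samples yield the same matched count. With an empty set, the wait is treated as a per-thread wait and the walk stops at it, which is the strict behavior the list above preserves.
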
Engine source UNCHANGED. Wire format UNCHANGED (`schema_version=1`
holds; payload structure is identical).

Total LOC: ~140 lines additive across `diff_events.py`,
`test_diff_events.py`, and `schema-v1.md`. 16 new diff-tool test
assertions on top of the existing 14, for 30 total, all PASS.

## 3-jitter verification (per RE class #32 discipline)

Pre-C+21 jitter-1 result (from C+19/C+20 baseline): tid=6→1 main
matched 102,553 (C+18) or 104,606 (C+20, different SID).

Post-C+21:

| run            | tid=6→1 matched | floating_wait (canary / ours) |
|----------------|-----------------|-------------------------------|
| jitter-1       | **104,607**     | 1 / 0                         |
| jitter-2       | **104,607**     | 0 / 0                         |
| jitter-3       | **104,607**     | 3 / 0                         |
| fresh cold-c21 | **104,607**     | 0 / 0                         |

All four canary cold samples converge on the SAME matched-prefix
(104,607). The C+21 absorb is doing exactly what it should:

- jitter-1 contended → 1 wait.begin absorbed → advance past 104,606.
- jitter-2 fast-pathed → 0 absorbed; matches strictly.
- jitter-3 had 3 absorbable contended waits scattered → 3 absorbed.
- fresh c21 fast-pathed → 0 absorbed; matches strictly.

Sister chains UNCHANGED:

| chain   | C+19/C+20 | C+21 | delta |
|---------|-----------|------|-------|
| 4 → 11  | 11        | 11   | 0     |
| 7 → 2   | 32        | 32   | 0     |
| 12 → 7  | 3         | 3    | 0     |
| 14 → 9  | 41        | 41   | 0     |
| 15 → 10 | 16        | 16   | 0     |

The `floating_create` column shows `0/1` on tid=15→10 (C+18's fix
still operating) and `1/0` on tid=6→1 of jitter-3 (jitter-3 had an
extra canary-side handle.create that C+21's recipe match detected).
No spurious absorption.

## The new divergence beyond the jitter cloud

At canary tid=6 idx 104,607 (ours tid=1 idx 104,607 post-absorb):

```
[104604] ours+canary  import.call    RtlEnterCriticalSection
[104605] ours+canary  kernel.call    RtlEnterCriticalSection
[104606] ours+canary  kernel.return  RtlEnterCriticalSection  (both fast-path)
[104607] canary       import.call    RtlEnterCriticalSection  (ANOTHER CS)
[104607] ours         import.call    RtlLeaveCriticalSection  (leaves CS)
```

This is a **REAL structural divergence**: canary entered a different
CS while ours moved on to leave one. Not in scope for C+21; it will be
addressed in the C+22 framing.

## Reading-error class #32: locked in

The C+20 documentation introduced #32. C+21 confirms its taxonomy
applies broadly:

> **#32 Canary itself is non-deterministic across cold runs in
> contention-dependent regions. Single-canary-cold-run sampling is
> unreliable for matched-prefix in those regions.**

The C+21 fix is the diff-tool counter-measure: SIDs referenced from
multiple tids are treated as floating, and their wait.begin events get
the same observation-side treatment as their handle.create events. The
matched-prefix metric becomes **deterministic across canary cold
samples** within shared-global contention windows.

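The kind of cross-run survey that exposes a #32 region can be sketched as a comparison of the same tid's event stream across several canary cold runs. This is an illustrative helper, not part of the diff tool: `first_unstable_index` and the toy event lists are assumptions, and only event kinds are compared.

```python
def first_unstable_index(runs):
    """Index of the first event whose kind differs across runs, or None."""
    shortest = min(len(r) for r in runs)
    for idx in range(shortest):
        kinds = {r[idx]["kind"] for r in runs}
        if len(kinds) > 1:
            return idx
    return None


# Toy per-tid streams from three hypothetical canary cold runs.
run_a = [{"kind": "kernel.call"}, {"kind": "wait.begin"}, {"kind": "kernel.return"}]
run_b = [{"kind": "kernel.call"}, {"kind": "kernel.return"}]
run_c = [{"kind": "kernel.call"}, {"kind": "kernel.return"}]

print(first_unstable_index([run_a, run_b, run_c]))  # 1
print(first_unstable_index([run_b, run_c]))         # None
```

An index reported here is a position where a single canary sample cannot be trusted as the reference for strict index matching.
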
## Cascade outcome

- A=design floating-absorb extension: PASS.
- B=implement + test in diff tool: PASS (~140 LOC, 16 new tests).
- C=verifies across all 3 jitter jsonls: PASS; all yield 104,607.
- D=fresh canary measurement: matched-prefix > 104,606: PASS (104,607).

## Scope adherence

- Engine sources: UNCHANGED.
- Diff tool: `diff_events.py` + `test_diff_events.py` only.
- Docs: `schema-v1.md` v1.3 + this audit-run dir.
- GPU/audio/HID: untouched.
- D-NEW-2 (`KeWaitForSingleObject` timeout_ns mismatch on tid=12→7
  idx=3): NOT fixed in C+21. It is still the next downstream divergence
  on the tid=12→7 chain (matched=3).