Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase C+23 cold-vs-cold result (2026-05-18)
Outcome: ENGINE FIX LANDED
addis sign-extension fix at xenia-cpu/src/interpreter.rs resolves
D-NEW-2 (ε-class timeout sign-extension on the canary tid=12 → ours
tid=7 sister chain). 5 LOC effective. Determinism preserved (3× cold
runs byte-identical post-fix).
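For reference, the PowerPC ISA defines `addis rD, rA, SIMM` as `(rA|0) + EXTS(SIMM || 0x0000)`, so in 64-bit mode the shifted immediate is sign-extended to the full doubleword. The sketch below is not the code in interpreter.rs: `addis_sign_extended`, `addis_truncated` and `lis_ori` are hypothetical names, the truncating variant is only an assumed shape of the bug class, and the constants `0xFFFB` / `0x6C20` are inferred from the trace values on the assumption that the guest passes the timeout in 100 ns units.

```rust
/// addis per the PowerPC ISA: rD = (rA|0) + EXTS(SIMM || 0x0000).
/// Hypothetical helper, not the function in interpreter.rs.
fn addis_sign_extended(ra: u64, simm: i16) -> u64 {
    // i16 -> i64 sign-extends; the shift places SIMM in bits 16..31.
    let imm = ((simm as i64) << 16) as u64;
    ra.wrapping_add(imm)
}

/// Assumed shape of the bug class: the result is truncated to 32 bits.
fn addis_truncated(ra: u64, simm: i16) -> u64 {
    addis_sign_extended(ra, simm) & 0xFFFF_FFFF
}

/// lis rD, hi ; ori rD, rD, lo  (lis is addis with rA = 0).
fn lis_ori(addis: fn(u64, i16) -> u64, hi: u16, lo: u16) -> i64 {
    (addis(0, hi as i16) | lo as u64) as i64
}

fn main() {
    // -300000 is 0xFFFF_FFFF_FFFB_6C20 as a 64-bit two's-complement value.
    println!("sign-extended: {}", lis_ori(addis_sign_extended, 0xFFFB, 0x6C20));
    println!("truncated:     {}", lis_ori(addis_truncated, 0xFFFB, 0x6C20));
}
```

With sign extension the doubleword stored by a following `std` reads back as a negative (relative) timeout; with truncation the same bit pattern reads back as a large positive value.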
Matched-prefix table (vs C+22 baseline)
| chain | C+22 | C+23 (fresh) | delta |
|---|---|---|---|
| canary tid=6 → ours tid=1 main | 104,607 | 104,607 | 0 |
| canary tid=4 → ours tid=11 | 11 | 11 | 0 |
| canary tid=7 → ours tid=2 | 32 | 32 | 0 |
| canary tid=12 → ours tid=7 | 3 | 4 | +1 |
| canary tid=14 → ours tid=9 | 41 | 41 | 0 |
| canary tid=15 → ours tid=10 | 16 | 16 | 0 |
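A matched-prefix count can be read as the number of leading events on which the two traces agree. The sketch below is a minimal illustration, not the diff tool: `Event`, `same`, `matched_prefix` and `chain` are hypothetical, events are assumed to compare on kind plus a payload string, and `handle.create` SIDs are assumed to be excluded from comparison. The data is the tid=12 → tid=7 chain reported in this file; the pre-fix chain is an assumption built by swapping in only the pre-fix `timeout_ns` value.

```rust
#[derive(Debug, Clone)]
struct Event {
    kind: &'static str,
    payload: &'static str,
}

/// Compare two events, ignoring the payload of handle.create
/// (assumption: SIDs differ between engines and are not compared).
fn same(a: &Event, b: &Event) -> bool {
    a.kind == b.kind && (a.kind == "handle.create" || a.payload == b.payload)
}

/// Number of leading events on which both traces agree.
fn matched_prefix(canary: &[Event], ours: &[Event]) -> usize {
    canary.iter().zip(ours).take_while(|(c, o)| same(c, o)).count()
}

/// The five tid=12 / tid=7 events, parameterised on the fields that vary.
fn chain(sid: &'static str, timeout_ns: &'static str, ret: &'static str) -> Vec<Event> {
    vec![
        Event { kind: "import.call", payload: "KeWaitForSingleObject" },
        Event { kind: "kernel.call", payload: "KeWaitForSingleObject" },
        Event { kind: "handle.create", payload: sid },
        Event { kind: "wait.begin", payload: timeout_ns },
        Event { kind: "kernel.return", payload: ret },
    ]
}

fn main() {
    let canary = chain("c49d8f0ab90401ea", "-30000000", "258");
    let ours_post = chain("6e3d96c5a52bf429", "-30000000", "0");
    let ours_pre = chain("6e3d96c5a52bf429", "429466729600", "0");
    println!("post-fix: {}", matched_prefix(&canary, &ours_post));
    println!("pre-fix:  {}", matched_prefix(&canary, &ours_pre));
}
```

This simple prefix walk does not model the floating-event absorbers; it only shows why a single mismatching field caps the count at the index where it occurs.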
Floating-event absorption counts (fresh c23)
| chain | floating_create (c/o) | floating_wait (c/o) |
|---|---|---|
| canary tid=6 → ours tid=1 main | 2 / 0 | 3 / 0 |
| canary tid=15 → ours tid=10 | 0 / 1 | 0 / 0 |
| others | 0 / 0 | 0 / 0 |
C+18 absorber engaged on main chain (2 canary handle.create floated) and on tid=15→10 (1 ours handle.create floated). C+21 absorber engaged on main chain (3 canary wait.begin events floated — this canary cold sample took the contended slow path 3 times).
Cold-stable invariants
- ours-cold byte-identical (det-fields) across 3 runs: digest `23cf4c4cbf61a577caa4118ab2308ba6`. Replaces C+22's `e1dfcb1559f987b35012a7f2dc6d93f5` baseline (digest moved due to engine source change). New baseline anchored here.
- Event count unchanged: 121,569 ours events (matches C+22).
- Phase B `image_canonical_sha256=ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18`: UNCHANGED. Image-loading path untouched.
- Engine source change: `xenia-cpu/src/interpreter.rs::addis` (5 LOC effective, ~25 LOC including comment + commented-out truncation). No `xenia-canary` source changes. No diff-tool changes.
- Tests: kernel 204 unchanged; cpu 288 → 291 (3 new regression tests for the addis fix).
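The byte-identical check can be pictured as hashing only the deterministic fields of every event and comparing the result across runs. The sketch below is an assumption-heavy illustration: the `Record` layout, the `host_ts_us` field and `det_digest` are hypothetical, and FNV-1a is used purely as a self-contained stand-in rather than the digest algorithm that produced the values above.

```rust
/// Hypothetical trace record: `host_ts_us` stands in for any
/// run-dependent field that a det-fields digest would skip.
#[allow(dead_code)]
struct Record {
    tid: u32,
    kind: &'static str,
    payload: &'static str,
    host_ts_us: u64,
}

/// FNV-1a 64-bit, used here only as a self-contained stand-in hash.
fn fnv1a(hash: u64, bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(hash, |h, b| (h ^ *b as u64).wrapping_mul(0x0000_0100_0000_01B3))
}

/// Digest over the deterministic fields only.
fn det_digest(records: &[Record]) -> u64 {
    let mut h: u64 = 0xCBF2_9CE4_8422_2325; // FNV-1a offset basis
    for r in records {
        h = fnv1a(h, &r.tid.to_le_bytes());
        h = fnv1a(h, r.kind.as_bytes());
        h = fnv1a(h, &[0]); // field separator
        h = fnv1a(h, r.payload.as_bytes());
        h = fnv1a(h, &[0]);
        // host_ts_us is deliberately not hashed.
    }
    h
}

/// Build a two-event run whose host timestamps depend on `clock`.
fn run(clock: u64, timeout: &'static str) -> Vec<Record> {
    vec![
        Record { tid: 7, kind: "kernel.call", payload: "KeWaitForSingleObject", host_ts_us: clock },
        Record { tid: 7, kind: "wait.begin", payload: timeout, host_ts_us: clock + 3 },
    ]
}

fn main() {
    let digests: Vec<u64> = [10, 20, 30]
        .iter()
        .map(|&c| det_digest(&run(c, "-30000000")))
        .collect();
    println!("cold-stable: {}", digests.windows(2).all(|w| w[0] == w[1]));
}
```

Under these assumptions, a change in a deterministic field (for example the `timeout_ns` payload) moves the digest, while host-clock jitter does not.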
Direct fix-verification at the divergence point
ours-cold post-fix, tid=7 events 0-4:
[0] import.call KeWaitForSingleObject
[1] kernel.call KeWaitForSingleObject
[2] handle.create sid=6e3d96c5a52bf429
[3] wait.begin {timeout_ns: -30000000, alertable: false, wait_type: any}
[4] kernel.return return_value=0 status=0x00000000
canary-cold, tid=12 events 0-4:
[0] import.call KeWaitForSingleObject
[1] kernel.call KeWaitForSingleObject
[2] handle.create sid=c49d8f0ab90401ea (different SID, absorbed)
[3] wait.begin {timeout_ns: -30000000, alertable: false, wait_type: any}
[4] kernel.return return_value=258 status=0x00000102 (TIMEOUT)
timeout_ns: -30000000 MATCHES across engines (was 429466729600 pre-fix).
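The two values are consistent with one 64-bit register being read with and without sign extension. NT-style kernel waits take the timeout as a signed 64-bit count of 100 ns units, with negative values meaning a relative wait. The sketch assumes the trace field `timeout_ns` is that count multiplied by 100, which is an inference from the numbers rather than something this file states; `ticks_to_ns` and `zero_extend_low32` are hypothetical helpers.

```rust
/// Convert a timeout in 100 ns ticks to nanoseconds
/// (assumed relationship between the guest value and `timeout_ns`).
fn ticks_to_ns(ticks: i64) -> i64 {
    ticks * 100
}

/// Keep only the low 32 bits and zero-extend, mimicking a register
/// whose upper half was lost before the 64-bit store.
fn zero_extend_low32(value: i64) -> i64 {
    (value as u32) as i64
}

fn main() {
    let ticks: i64 = -300_000; // 30 ms relative wait in 100 ns ticks
    println!("sign-extended: {}", ticks_to_ns(ticks));
    println!("zero-extended: {}", ticks_to_ns(zero_extend_low32(ticks)));
}
```

The zero-extended result is (2^32 − 300000) × 100 ns, roughly a 429-second absolute-looking value instead of a 30 ms relative wait.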
New downstream divergence at idx=4 (C+23 → C+24+ target)
The advance reveals the next-class issue at idx=4:
canary: [4] kernel.return KeWaitForSingleObject return_value=258 (TIMEOUT)
ours: [4] kernel.return KeWaitForSingleObject return_value=0 (SUCCESS)
Classification: (A) scheduler-determinism, the same family as the C+20 and C+22 escalations. The monolithic-thread runner in ours never lets the 30 ms timeout window elapse without a signaler, so the wait returns SUCCESS. Two explanations remain open: either the event was already signaled on entry, or the wait was served implicitly on a fast path. Canary's contended scheduler lets the timeout fire. An engine-side fix requires the parallel scheduler-determinism track (a multi-session refactor).
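The two return values correspond to the two possible outcomes of a timed wait. The toy model below runs over virtual time and is not either engine's scheduler; `wait_status` and its parameters are hypothetical, and only the status codes (0 and 0x102 = 258) come from the traces.

```rust
const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_TIMEOUT: u32 = 0x0000_0102; // 258

/// Toy model over virtual time: `signaled_at_ns` is when (if ever) the
/// event becomes signaled, measured from wait entry.
fn wait_status(signaled_at_ns: Option<u64>, timeout_ns: u64) -> u32 {
    match signaled_at_ns {
        // Signaled at entry or before the deadline: the wait is satisfied.
        Some(t) if t <= timeout_ns => STATUS_SUCCESS,
        // Never signaled, or signaled too late: the timeout fires.
        _ => STATUS_TIMEOUT,
    }
}

fn main() {
    let timeout_ns = 30_000_000; // 30 ms
    // No signaler inside the window: matches the canary return value.
    println!("no signaler:        {}", wait_status(None, timeout_ns));
    // Signaled at entry: one hypothesis for the SUCCESS seen in ours.
    println!("signaled at entry:  {}", wait_status(Some(0), timeout_ns));
}
```

In this model the outcome depends only on whether a signal lands inside the window, which is why the divergence is filed under scheduler-determinism rather than under the timeout encoding.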
Verification that fix is NOT diff-tool jitter
Multiple distinct evidences:
- Direct ours-cold inspection: the `wait.begin.timeout_ns` field is read directly from ours-cold.jsonl (no diff-tool interpretation), and it's now -30000000.
- Unit tests: `lis_ori_std_negative_timeout_writes_sign_extended_doubleword` in xenia-cpu asserts the architectural fact directly.
- Determinism: 3× cold runs produce byte-identical det-fields digest. The fix isn't a race that flickered on this one sample.
- Phase B image hash unchanged: the fix is purely behavioral on the JIT layer, not a re-link or image change.
Cascade outcome
- A=verify canary's timeout read logic: PASS (identical formula).
- B=identify encoding bug class: PASS — (d) sign-extension.
- C=land fix: PASS — 5 LOC + 3 tests.
- D=tid=12→7 advances past 3: PASS (3 → 4).
- E=no regression on main or other sisters: PASS (all preserved).
Files
- `investigation.md`
- `cold-vs-cold-result.md` (this file)
- `diff-cold-vs-cold.md`
- `re-validation.md`
- `ours-cold.jsonl` / `ours-cold-stdout.log` / `ours-cold-stderr.log`
- `canary-cold-trunc.jsonl` / `canary-cold-stdout.log`
- `canary-binary-cache-pre-wipe.tar.gz` / `canary-xdg-cache-pre-wipe.tar.gz`
- `digest-cold-stable-1.json` / `-2.json` / `-3.json`
- `fix.diff`
Next-target recommendation
- C+24 = D-NEW-3 (canary tid=14 → ours tid=9 idx=41): canary calls `XAudioGetVoiceCategoryVolumeChangeMask`; ours calls `RtlEnterCriticalSection`. Likely missing/stubbed XAudio export in ours causing fallback. Independent of scheduler-determinism.
- Parallel scheduler-determinism track: tackle the C+20/C+22 + the newly-surfaced C+23-idx=4 family at the root via a per-CS-pointer expected-contention inference layer. Multi-session.