# Phase C+23 cold-vs-cold result (2026-05-18)

## Outcome: ENGINE FIX LANDED

`addis` sign-extension fix at `xenia-cpu/src/interpreter.rs` resolves
D-NEW-2 (ε-class timeout sign-extension on the canary tid=12 → ours
tid=7 sister chain). 5 LOC effective. Determinism preserved (3× cold
runs byte-identical post-fix).

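For orientation, a minimal sketch of the architectural rule the fix restores: on a 64-bit PowerPC core, `addis rD, rA, SIMM` adds `SIMM` shifted left by 16 and sign-extended to 64 bits. The function names and the `Option<u64>` encoding of the `rA = 0` case are illustrative assumptions, and `addis_truncating` is only a guess at the shape of the pre-fix behaviour; neither is the code in `interpreter.rs`.

```rust
/// Sketch of `addis rD, rA, SIMM`: rD = (rA|0) + EXTS(SIMM || 0x0000).
/// `ra` is `None` when the rA field is 0, which is the `lis` form.
fn addis(ra: Option<u64>, simm: i16) -> u64 {
    // Shift inside i32, then widen as signed so bit 31 fills bits 32..63.
    let imm = ((simm as i32) << 16) as i64 as u64;
    ra.unwrap_or(0).wrapping_add(imm)
}

/// Hypothetical pre-fix shape: the result is truncated to 32 bits,
/// so the upper word is lost for negative immediates.
fn addis_truncating(ra: Option<u64>, simm: i16) -> u64 {
    addis(ra, simm) & 0xFFFF_FFFF
}

fn main() {
    // `lis rD, 0xFFFB` is `addis rD, 0, -5`.
    assert_eq!(addis(None, -5), 0xFFFF_FFFF_FFFB_0000);
    assert_eq!(addis_truncating(None, -5), 0x0000_0000_FFFB_0000);
    // Positive immediates are unaffected by the difference.
    assert_eq!(addis(None, 1), addis_truncating(None, 1));
}
```

The two variants only disagree when bit 15 of the immediate is set, which is why the bug surfaced on a negative timeout constant and not on ordinary address arithmetic.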
## Matched-prefix table (vs C+22 baseline)

| chain                          | C+22    | C+23 (fresh) | delta  |
|--------------------------------|---------|--------------|--------|
| canary tid=6 → ours tid=1 main | 104,607 | 104,607      | 0      |
| canary tid=4 → ours tid=11     | 11      | 11           | 0      |
| canary tid=7 → ours tid=2      | 32      | 32           | 0      |
| canary tid=12 → ours tid=7     | 3       | **4**        | **+1** |
| canary tid=14 → ours tid=9     | 41      | 41           | 0      |
| canary tid=15 → ours tid=10    | 16      | 16           | 0      |

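As a reading aid, the matched-prefix figure can be thought of as the number of leading events on which the two chains agree. The sketch below is an assumption about that definition, not the diff tool's code: the `Event` struct and `matched_prefix` are hypothetical, and the SID of `handle.create` is left out of the comparison on the assumption that absorbed SIDs do not count as a mismatch.

```rust
/// Hypothetical, simplified event: only the fields compared here.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    kind: &'static str,
    detail: &'static str,
}

/// Number of leading positions at which both chains hold equal events.
fn matched_prefix(canary: &[Event], ours: &[Event]) -> usize {
    canary.iter().zip(ours).take_while(|(c, o)| c == o).count()
}

fn main() {
    let ev = |kind, detail| Event { kind, detail };
    // Events 0..=3 are shared by both chains (SID omitted for handle.create).
    let common = vec![
        ev("import.call", "KeWaitForSingleObject"),
        ev("kernel.call", "KeWaitForSingleObject"),
        ev("handle.create", ""),
        ev("wait.begin", "timeout_ns=-30000000"),
    ];
    let mut canary = common.clone();
    canary.push(ev("kernel.return", "return_value=258"));
    let mut ours = common;
    ours.push(ev("kernel.return", "return_value=0"));

    // The chains agree on indices 0..=3 and diverge at index 4.
    assert_eq!(matched_prefix(&canary, &ours), 4);
}
```

With the post-fix tid=12 → tid=7 events from this report, the sketch yields 4, in line with the C+23 column.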
## Floating-event absorption counts (fresh c23)

| chain                          | floating_create (canary / ours) | floating_wait (canary / ours) |
|--------------------------------|---------------------------------|-------------------------------|
| canary tid=6 → ours tid=1 main | 2 / 0                           | 3 / 0                         |
| canary tid=15 → ours tid=10    | 0 / 1                           | 0 / 0                         |
| others                         | 0 / 0                           | 0 / 0                         |

C+18 absorber engaged on main chain (2 canary handle.create floated)
and on tid=15→10 (1 ours handle.create floated). C+21 absorber engaged
on main chain (3 canary wait.begin events floated; this canary cold
sample took the contended slow path 3 times).

## Cold-stable invariants

- **ours-cold byte-identical (det-fields) across 3 runs**:
  digest `23cf4c4cbf61a577caa4118ab2308ba6`. Replaces C+22's
  `e1dfcb1559f987b35012a7f2dc6d93f5` baseline (digest moved due
  to engine source change). New baseline anchored here.
- **Event count** unchanged: 121,569 ours events (matches C+22).
- **Phase B `image_canonical_sha256` =
  `ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18`**:
  UNCHANGED. Image-loading path untouched.
- **Engine source change**: `xenia-cpu/src/interpreter.rs::addis`
  (5 LOC effective, ~25 LOC including comment + commented-out
  truncation). No `xenia-canary` source changes. No diff-tool changes.
- **Tests**: kernel 204 unchanged; cpu 288 → 291 (3 new regression
  tests for the addis fix).

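The byte-identical check above amounts to hashing only the deterministic fields of each run and comparing the results. A minimal sketch of that idea follows; the `TraceEvent` layout, the `host_ts_ns` field, and the use of `DefaultHasher` are all assumptions made for illustration (the real digests in this report are 128-bit values from a different hash, and the actual det-field selection is not shown here).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical trace event: two deterministic fields plus a host
/// timestamp that is expected to differ from run to run.
struct TraceEvent {
    kind: &'static str,
    payload: &'static str,
    host_ts_ns: u64,
}

/// Digest over deterministic fields only; `host_ts_ns` is skipped.
/// DefaultHasher is a stand-in for the tool's real digest function.
fn det_digest(events: &[TraceEvent]) -> u64 {
    let mut h = DefaultHasher::new();
    for e in events {
        e.kind.hash(&mut h);
        e.payload.hash(&mut h);
    }
    h.finish()
}

/// True when every run produced the same digest.
fn all_identical(digests: &[u64]) -> bool {
    digests.windows(2).all(|w| w[0] == w[1])
}

fn main() {
    // Three "cold runs" that differ only in host timestamps.
    let run = |ts| vec![
        TraceEvent { kind: "kernel.call", payload: "KeWaitForSingleObject", host_ts_ns: ts },
        TraceEvent { kind: "wait.begin", payload: "timeout_ns=-30000000", host_ts_ns: ts + 7 },
    ];
    let digests: Vec<u64> = [100, 250, 999].iter().map(|&ts| det_digest(&run(ts))).collect();
    assert!(all_identical(&digests));
}
```

Any change to a deterministic field moves the digest, which is the expected reason the baseline shifted from C+22 to C+23 after the engine source change.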
## Direct fix-verification at the divergence point

ours-cold post-fix, tid=7 events 0-4:

```
[0] import.call KeWaitForSingleObject
[1] kernel.call KeWaitForSingleObject
[2] handle.create sid=6e3d96c5a52bf429
[3] wait.begin {timeout_ns: -30000000, alertable: false, wait_type: any}
[4] kernel.return return_value=0 status=0x00000000
```

canary-cold, tid=12 events 0-4:

```
[0] import.call KeWaitForSingleObject
[1] kernel.call KeWaitForSingleObject
[2] handle.create sid=c49d8f0ab90401ea (different SID, absorbed)
[3] wait.begin {timeout_ns: -30000000, alertable: false, wait_type: any}
[4] kernel.return return_value=258 status=0x00000102 (TIMEOUT)
```

`timeout_ns: -30000000` MATCHES across engines (was `429466729600` pre-fix).

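The two traced values are arithmetically consistent with one 32-bit constant being widened in two different ways. The worked check below assumes the guest timeout is a 64-bit count of 100 ns ticks that the trace converts to nanoseconds, and that the low word is `0xFFFB6C20` (that is, -300,000); both are inferences from the numbers in this report, not values read from the guest code.

```rust
/// Assumption: guest timeouts are counted in 100 ns ticks.
const TICK_NS: i64 = 100;

/// Interpret the stored doubleword as a signed tick count, in ns.
fn timeout_ns(raw_doubleword: u64) -> i64 {
    (raw_doubleword as i64) * TICK_NS
}

fn main() {
    // -300_000 ticks (30 ms) as a 32-bit two's-complement pattern.
    let low32: u32 = 0xFFFB_6C20;
    assert_eq!(low32 as i32, -300_000);

    // Post-fix: upper word filled with the sign bit.
    let sign_extended = low32 as i32 as i64 as u64;
    assert_eq!(timeout_ns(sign_extended), -30_000_000);

    // Pre-fix: upper word left as zero, giving a large positive value.
    let zero_extended = low32 as u64;
    assert_eq!(timeout_ns(zero_extended), 429_466_729_600);
}
```

Under these assumptions the pre-fix value is exactly (2^32 - 300,000) × 100, which matches the sign-extension classification.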
## New downstream divergence at idx=4 (C+23 → C+24+ target)

The advance reveals the next-class issue at idx=4:

```
canary: [4] kernel.return KeWaitForSingleObject return_value=258 (TIMEOUT)
ours:   [4] kernel.return KeWaitForSingleObject return_value=0 (SUCCESS)
```

Classification: **(A) scheduler-determinism**, same family as the C+20
and C+22 escalations. The monolithic-thread runner in ours doesn't allow
the 30 ms timeout window to elapse with no signaler, so the wait
returns SUCCESS, either because the event was already signaled at
entry (unconfirmed) or because the wait was implicitly fast-served.
Canary's contended scheduler lets the timeout fire. An engine-side fix
requires the parallel scheduler-determinism track (multi-session
refactor).

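To make the two outcomes concrete, here is a toy model of a relative-timeout wait on a single event. It is not either engine's scheduler: `wait_status` and its `signal_at_ns` parameter are hypothetical, and only the two status codes (0 and 0x102 = 258) are taken from the traces above.

```rust
const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_TIMEOUT: u32 = 0x0000_0102; // 258

/// Toy model: `signal_at_ns` is the time after wait entry at which the
/// event becomes signaled (`Some(0)` = already signaled, `None` = never).
/// `timeout_ns` is negative for a relative interval, as in the trace;
/// absolute (positive) timeouts are not modelled here.
fn wait_status(signal_at_ns: Option<u64>, timeout_ns: i64) -> u32 {
    let budget_ns = timeout_ns.unsigned_abs();
    match signal_at_ns {
        Some(t) if t <= budget_ns => STATUS_SUCCESS,
        _ => STATUS_TIMEOUT,
    }
}

fn main() {
    // Canary-like outcome: nobody signals within the 30 ms window.
    assert_eq!(wait_status(None, -30_000_000), 258);
    // Ours-like outcome if the event is already signaled at entry.
    assert_eq!(wait_status(Some(0), -30_000_000), 0);
}
```

In this model the timeout value is identical on both sides, so the remaining difference has to come from when (or whether) the event is signaled relative to the wait, which is what the scheduler-determinism classification points at.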
## Verification that fix is NOT diff-tool jitter

Four independent lines of evidence:

1. **Direct ours-cold inspection**: the `wait.begin.timeout_ns`
   field is read directly from ours-cold.jsonl (no diff-tool
   interpretation), and it's now -30000000.
2. **Unit tests**: `lis_ori_std_negative_timeout_writes_sign_extended_doubleword`
   in xenia-cpu asserts the architectural fact directly.
3. **Determinism**: 3× cold runs produce a byte-identical det-fields
   digest. The fix isn't a race that flickered on this one sample.
4. **Phase B image hash unchanged**: the fix is purely behavioral,
   in the CPU interpreter, not a re-link or image change.

## Cascade outcome

- A=verify canary's timeout read logic: PASS (identical formula).
- B=identify encoding bug class: PASS, class (d) sign-extension.
- C=land fix: PASS (5 LOC + 3 tests).
- D=tid=12→7 advances past 3: PASS (3 → 4).
- E=no regression on main or other sisters: PASS (all preserved).

## Files

- `investigation.md`
- `cold-vs-cold-result.md` (this file)
- `diff-cold-vs-cold.md`
- `re-validation.md`
- `ours-cold.jsonl` / `ours-cold-stdout.log` / `ours-cold-stderr.log`
- `canary-cold-trunc.jsonl` / `canary-cold-stdout.log`
- `canary-binary-cache-pre-wipe.tar.gz` / `canary-xdg-cache-pre-wipe.tar.gz`
- `digest-cold-stable-1.json` / `-2.json` / `-3.json`
- `fix.diff`

## Next-target recommendation

- **C+24 = D-NEW-3** (canary tid=14 → ours tid=9 idx=41): canary
  calls `XAudioGetVoiceCategoryVolumeChangeMask`; ours calls
  `RtlEnterCriticalSection`. Likely a missing or stubbed XAudio export
  in ours causing a fallback. Independent of scheduler-determinism.
- **Parallel scheduler-determinism track**: tackle the C+20/C+22
  family, plus the newly surfaced C+23 idx=4 divergence, at the root
  via a per-CS-pointer expected-contention inference layer. Multi-session.