handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,134 @@
# Phase C+23 cold-vs-cold result (2026-05-18)

## Outcome: ENGINE FIX LANDED

The `addis` sign-extension fix in `xenia-cpu/src/interpreter.rs` resolves
D-NEW-2 (ε-class timeout sign-extension on the canary tid=12 → ours
tid=7 sister chain). 5 LOC effective. Determinism preserved (3× cold
runs byte-identical post-fix).
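
As a rough illustration of the bug class, the sketch below models `addis` in 64-bit mode, where the 16-bit immediate is shifted left by 16 and then sign-extended before the add. The function names, the `Option` encoding of the `rA = 0` case, and the truncating variant are assumptions made for this note, not the actual code in `interpreter.rs`. The immediates `0xFFFB` / `0x6C20` are illustrative values, chosen because they reproduce the pre-fix and post-fix timeouts reported in this file if the guest timeout is a count of 100 ns ticks.

```rust
/// Sketch of PowerPC `addis rD, rA, SIMM` in 64-bit mode:
/// rD = (rA|0) + EXTS(SIMM || 0x0000).
/// `ra` is `None` when the rA field is 0 (the `lis` form).
pub fn addis(ra: Option<u64>, simm: i16) -> u64 {
    // Shift as i64 so the sign of SIMM propagates into bits 32..63.
    let shifted = ((simm as i64) << 16) as u64;
    ra.unwrap_or(0).wrapping_add(shifted)
}

/// ASSUMED shape of the pre-fix behavior: result truncated to 32 bits.
pub fn addis_truncating(ra: Option<u64>, simm: i16) -> u64 {
    addis(ra, simm) & 0xFFFF_FFFF
}

/// `ori rA, rS, UIMM`: OR with the zero-extended 16-bit immediate.
pub fn ori(rs: u64, uimm: u16) -> u64 {
    rs | uimm as u64
}

/// `lis` + `ori` pair that materializes a 32-bit constant in a 64-bit GPR.
/// `fixed` selects the sign-extending or the truncating `addis` model.
pub fn lis_ori(hi: i16, lo: u16, fixed: bool) -> u64 {
    let upper = if fixed {
        addis(None, hi)
    } else {
        addis_truncating(None, hi)
    };
    ori(upper, lo)
}
```

With the sign-extending model, `lis_ori(0xFFFBu16 as i16, 0x6C20, true)` yields `0xFFFF_FFFF_FFFB_6C20`, which is -300000 as a signed doubleword. The truncating model yields `0x0000_0000_FFFB_6C20`, a large positive value once a subsequent `std` writes all 64 bits.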

## Matched-prefix table (vs C+22 baseline)

| chain                          | C+22    | C+23 (fresh) | delta  |
|--------------------------------|---------|--------------|--------|
| canary tid=6 → ours tid=1 main | 104,607 | 104,607      | 0      |
| canary tid=4 → ours tid=11     | 11      | 11           | 0      |
| canary tid=7 → ours tid=2      | 32      | 32           | 0      |
| canary tid=12 → ours tid=7     | 3       | **4**        | **+1** |
| canary tid=14 → ours tid=9     | 41      | 41           | 0      |
| canary tid=15 → ours tid=10    | 16      | 16           | 0      |

## Floating-event absorption counts (fresh c23)

| chain                          | floating_create (canary / ours) | floating_wait (canary / ours) |
|--------------------------------|---------------------------------|-------------------------------|
| canary tid=6 → ours tid=1 main | 2 / 0                           | 3 / 0                         |
| canary tid=15 → ours tid=10    | 0 / 1                           | 0 / 0                         |
| others                         | 0 / 0                           | 0 / 0                         |

The C+18 absorber engaged on the main chain (2 canary `handle.create`
events floated) and on tid=15→10 (1 ours `handle.create` event floated).
The C+21 absorber engaged on the main chain (3 canary `wait.begin` events
floated; this canary cold sample took the contended slow path 3 times).
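
The interaction between prefix matching and floating-event absorption can be sketched as a two-pointer walk over the two event streams. This is a simplified, hypothetical model: the `Event` and `PrefixReport` types, the `floating` kind list, and the tie-break rule (skip the canary side first) are assumptions for illustration and do not describe the real diff tool.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub kind: String,    // e.g. "handle.create", "wait.begin"
    pub payload: String, // deterministic fields only (hypothetical encoding)
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct PrefixReport {
    pub matched: usize,        // events that paired up across engines
    pub floated_canary: usize, // canary-only events absorbed as floating
    pub floated_ours: usize,   // ours-only events absorbed as floating
}

/// Walk both streams in lockstep. On a mismatch, absorb one event whose
/// kind is listed in `floating`; stop at the first mismatch that cannot
/// be absorbed.
pub fn matched_prefix(canary: &[Event], ours: &[Event], floating: &[&str]) -> PrefixReport {
    let mut report = PrefixReport::default();
    let (mut i, mut j) = (0, 0);
    while i < canary.len() && j < ours.len() {
        if canary[i] == ours[j] {
            report.matched += 1;
            i += 1;
            j += 1;
        } else if floating.contains(&canary[i].kind.as_str()) {
            // Assumed tie-break: try skipping the canary-side event first.
            report.floated_canary += 1;
            i += 1;
        } else if floating.contains(&ours[j].kind.as_str()) {
            report.floated_ours += 1;
            j += 1;
        } else {
            break; // genuine divergence: the matched prefix ends here
        }
    }
    report
}
```

In this model, a row such as "2 / 0" in the table above would correspond to `floated_canary = 2` and `floated_ours = 0` for that chain.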

## Cold-stable invariants

- **ours-cold byte-identical (det-fields) across 3 runs**:
  digest `23cf4c4cbf61a577caa4118ab2308ba6`. Replaces C+22's
  `e1dfcb1559f987b35012a7f2dc6d93f5` baseline (the digest moved because
  the engine source changed). New baseline anchored here.
- **Event count** unchanged: 121,569 ours events (matches C+22).
- **Phase B `image_canonical_sha256` =
  `ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18`**:
  UNCHANGED. Image-loading path untouched.
- **Engine source change**: `xenia-cpu/src/interpreter.rs::addis`
  (5 LOC effective, ~25 LOC including comment + commented-out
  truncation). No `xenia-canary` source changes. No diff-tool changes.
- **Tests**: kernel 204 unchanged; cpu 288 → 291 (3 new regression
  tests for the addis fix).

## Direct fix-verification at the divergence point

ours-cold post-fix, tid=7 events 0-4:

```
[0] import.call    KeWaitForSingleObject
[1] kernel.call    KeWaitForSingleObject
[2] handle.create  sid=6e3d96c5a52bf429
[3] wait.begin     {timeout_ns: -30000000, alertable: false, wait_type: any}
[4] kernel.return  return_value=0 status=0x00000000
```

canary-cold, tid=12 events 0-4:

```
[0] import.call    KeWaitForSingleObject
[1] kernel.call    KeWaitForSingleObject
[2] handle.create  sid=c49d8f0ab90401ea (different SID, absorbed)
[3] wait.begin     {timeout_ns: -30000000, alertable: false, wait_type: any}
[4] kernel.return  return_value=258 status=0x00000102 (TIMEOUT)
```

`timeout_ns: -30000000` now MATCHES across engines (ours reported `429466729600` pre-fix).
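
The two `timeout_ns` values are arithmetically consistent with a single 32-bit constant that was either sign-extended or zero-extended before being stored as a doubleword. The sketch below works that through, assuming the guest timeout is a signed count of 100 ns ticks (the NT convention, where negative means relative). `timeout_ns` here is a hypothetical helper name, not a function from the codebase.

```rust
/// NT-style relative timeouts are negative counts of 100 ns ticks.
pub const NS_PER_TICK: i64 = 100;

/// Hypothetical helper: interpret the doubleword the guest stored as a
/// signed tick count and convert it to nanoseconds.
pub fn timeout_ns(raw_doubleword: u64) -> i64 {
    // Reinterpret the bits as signed, then scale. Saturate rather than
    // overflow on extreme inputs.
    (raw_doubleword as i64).saturating_mul(NS_PER_TICK)
}
```

A sign-extended doubleword `0xFFFF_FFFF_FFFB_6C20` is -300000 ticks, giving -30000000 ns (a 30 ms relative wait). The zero-extended `0x0000_0000_FFFB_6C20` is 4294667296 ticks, giving 429466729600 ns, which is the pre-fix value quoted above.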

## New downstream divergence at idx=4 (C+23 → C+24+ target)

The advance exposes the next class of issue at idx=4:

```
canary: [4] kernel.return KeWaitForSingleObject return_value=258 (TIMEOUT)
ours:   [4] kernel.return KeWaitForSingleObject return_value=0   (SUCCESS)
```

Classification: **(A) scheduler-determinism**, same family as the C+20
and C+22 escalations. The monolithic-thread runner in ours doesn't let
the 30 ms timeout window elapse with no signaler, so the wait returns
SUCCESS. Two candidate causes remain unconfirmed: the event was already
signaled at entry, or the wait was served on an implicit fast path.
Canary's contended scheduler lets the timeout fire. An engine-side fix
requires the parallel scheduler-determinism track (a multi-session refactor).
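
To make the two outcomes concrete, here is a toy model of a single relative-timeout wait on one event. It is a sketch under stated assumptions, not either engine's scheduler: `wait_status` and its `signal_at_ns` parameter are hypothetical, and only negative (relative) timeouts are modeled. The status codes are the ones shown in the traces above.

```rust
pub const STATUS_SUCCESS: u32 = 0x0000_0000;
pub const STATUS_TIMEOUT: u32 = 0x0000_0102; // 258 in the trace

/// Toy model of one wait on one event.
/// `signal_at_ns`: nanoseconds after wait entry at which the event becomes
/// signaled; `Some(0)` means already signaled, `None` means never.
/// `timeout_ns`: negative relative timeout, as in `wait.begin.timeout_ns`.
pub fn wait_status(signal_at_ns: Option<u64>, timeout_ns: i64) -> u32 {
    let window_ns = timeout_ns.unsigned_abs();
    match signal_at_ns {
        Some(0) => STATUS_SUCCESS,                   // signaled at entry
        Some(t) if t < window_ns => STATUS_SUCCESS,  // signaled in time
        _ => STATUS_TIMEOUT,                         // no signaler in window
    }
}
```

In these terms, canary's result corresponds to `wait_status(None, -30_000_000)`, while the result in ours is what the model returns when the event is signaled at entry or early in the window. Which of those actually happens in ours is the open question for the scheduler-determinism track.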

## Verification that fix is NOT diff-tool jitter

Multiple independent lines of evidence:

1. **Direct ours-cold inspection**: the `wait.begin.timeout_ns`
   field is read directly from `ours-cold.jsonl` (no diff-tool
   interpretation), and it is now -30000000.
2. **Unit tests**: `lis_ori_std_negative_timeout_writes_sign_extended_doubleword`
   in xenia-cpu asserts the architectural fact directly.
3. **Determinism**: 3× cold runs produce a byte-identical det-fields
   digest. The fix isn't a race that flickered on this one sample.
4. **Phase B image hash unchanged**: the fix is purely behavioral
   in the CPU instruction layer (`xenia-cpu/src/interpreter.rs`), not a
   re-link or image change.

## Cascade outcome

- A = verify canary's timeout read logic: PASS (identical formula).
- B = identify encoding bug class: PASS, class (d) sign-extension.
- C = land fix: PASS, 5 LOC + 3 tests.
- D = tid=12→7 advances past 3: PASS (3 → 4).
- E = no regression on main or other sisters: PASS (all preserved).

## Files

- `investigation.md`
- `cold-vs-cold-result.md` (this file)
- `diff-cold-vs-cold.md`
- `re-validation.md`
- `ours-cold.jsonl` / `ours-cold-stdout.log` / `ours-cold-stderr.log`
- `canary-cold-trunc.jsonl` / `canary-cold-stdout.log`
- `canary-binary-cache-pre-wipe.tar.gz` / `canary-xdg-cache-pre-wipe.tar.gz`
- `digest-cold-stable-1.json` / `-2.json` / `-3.json`
- `fix.diff`

## Next-target recommendation

- **C+24 = D-NEW-3** (canary tid=14 → ours tid=9 idx=41): canary
  calls `XAudioGetVoiceCategoryVolumeChangeMask`; ours calls
  `RtlEnterCriticalSection`. Likely a missing or stubbed XAudio export
  in ours causing a fallback. Independent of scheduler-determinism.
- **Parallel scheduler-determinism track**: tackle the C+20/C+22 family,
  plus the newly surfaced C+23 idx=4 divergence, at the root via a
  per-CS-pointer expected-contention inference layer. Multi-session.