handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation.

Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,83 @@
# Phase C+18 cold-vs-cold result (2026-05-14)

## Matched-prefix table

| canary_tid | ours_tid | C+17 | C+18 | delta | first_divergence_at | kind |
|------------|----------|---------|---------|----------|---------------------|------|
| 6 | 1 | 102,553 | 102,553 | 0 | 102,553 | `NtDuplicateObject` no `handle.create` (D-NEW-1, unchanged) |
| 4 | 11 | 11 | 11 | 0 | — | no divergence in 11 events |
| 7 | 2 | 32 | 32 | 0 | — | no divergence in 32 events |
| 12 | 7 | 3 | 3 | 0 | 3 | `timeout_ns` mismatch (D-NEW-2, unchanged) |
| 14 | 9 | 41 | 41 | 0 | 41 | unrelated `XAudioGetVoiceCategoryVolumeChangeMask` |
| 15 | 10 | 2 | **16** | **+14** | — | **regression RESTORED** (D-NEW-3 fix landed) |

**tid=15→10 RESTORED**: matched-prefix advances 2 → 16 (+14), back to
the C+16 baseline. 1 ours-side `handle.create` floating event absorbed
by Phase C+18 cross-tid SID matching (`floating_skipped` column =
`0/1`). No other chains regress. Main chain unchanged at 102,553.

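The matched-prefix and `floating_skipped` numbers above can be illustrated with a small sketch. This is not the diff tool's code: the `Event` type, the `matched_prefix` function, and the rule used to decide that a create is "floating" (its SID was created somewhere in the peer engine's log) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str  # e.g. "handle.create", "wait" (illustrative kinds)
    sid: str   # stable object id carried by the event


def matched_prefix(canary, ours, canary_created_sids, ours_created_sids):
    """Walk two per-thread event chains in lockstep.

    Returns (matched, first_divergence_at, (canary_skipped, ours_skipped)).
    first_divergence_at is None when one chain is exhausted without a
    mismatch. A `handle.create` whose SID the peer engine created on some
    thread is treated as floating and skipped instead of counted as a
    divergence (hypothetical rule).
    """
    i = j = matched = 0
    c_skipped = o_skipped = 0
    while i < len(canary) and j < len(ours):
        c, o = canary[i], ours[j]
        if c == o:
            matched += 1
            i += 1
            j += 1
        elif o.kind == "handle.create" and o.sid in canary_created_sids:
            o_skipped += 1  # ours-side floating create absorbed
            j += 1
        elif c.kind == "handle.create" and c.sid in ours_created_sids:
            c_skipped += 1  # canary-side floating create absorbed
            i += 1
        else:
            return matched, matched, (c_skipped, o_skipped)
    return matched, None, (c_skipped, o_skipped)
```

With one extra ours-side create whose SID canary also created elsewhere, the sketch reports no divergence and a skip count of `(0, 1)`, which is the shape of the `0/1` entry reported for tid=15→10.
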
## Acceptance gates

- **Gate 1 (default-off digest)**: PASS — 3× reproducible at
  `e1dfcb1559f987b35012a7f2dc6d93f5` (unchanged from
  C+13/C+15-α/C+16/C+17 baseline). The fix is observation-only at the
  digest level; the new SID recipe is a string change in the event-log
  emit, NOT a guest-behavior change.
- **Gate 2 (cvar-on emit)**: PASS — ours 121,544 events (unchanged
  from C+17 — same emit count, different SID values), canary
  3,517,980 events in ~90s.
- **Gate 3 (diff tool)**: PASS — produces 6-chain report with new
  `floating_skipped (c/o)` column. tid=15→10 shows `0/1` — exactly
  one ours-side floating create absorbed.
- **Gate 4 (cold-vs-cold)**: PASS — main matched prefix unchanged at
  102,553, tid=15→10 restored from 2 → 16, all other sister chains
  unchanged.
- **Gate 5 (build)**: PASS — both engines build clean (only the
  pre-existing `dead_code` warning on `walk_committed_regions`).
- **Gate 6 (tests)**: PASS — ours kernel tests 191 → 193 (+2 new SID
  determinism tests). Diff-tool tests: 14/14 PASS (new
  `test_diff_events.py`).
- **Gate 7 (Phase B image hash)**: PASS — `image_loaded_sha256` =
  `ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18`
  (unchanged).
- **Gate 8 (event-log determinism)**: PASS — `handle.create` emit
  count unchanged (121,544 → 121,544 in ours). The new SID recipe is
  bit-deterministic over `(pointer, object_type)`.

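Gate 8's claim that the SID is bit-deterministic over `(pointer, object_type)` can be pictured as follows. The notes do not spell out the actual recipe, so the hash choice, the byte layout, the 16-character truncation, and the name `deterministic_sid` are all assumptions; the only property carried over from the notes is that nothing but the pointer and the object type feeds the result.

```python
import hashlib
import struct


def deterministic_sid(pointer: int, object_type: str) -> str:
    """Hypothetical SID recipe: a pure function of (pointer, object_type).

    No thread id, timestamp, or allocation counter is mixed in, so two
    engines that agree on the pointer and the type emit the same string.
    """
    # Fixed-width little-endian pointer followed by the type name.
    payload = struct.pack("<Q", pointer) + object_type.encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]
```

Because the inputs exclude the creating thread, which thread wins a creation race no longer changes the emitted SID, only the thread under which the `handle.create` event appears.
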
## Sister-chain analysis

- **tid=4→11** (no divergence): unchanged 11 events matched.
- **tid=7→2** (no divergence): unchanged 32 events matched.
- **tid=12→7** (`timeout_ns` mismatch): unchanged at idx=3. D-NEW-2 is
  next-after-D-NEW-1 in the queue.
- **tid=14→9** (audio export): unchanged at idx=41. D-NEW-something
  to be triaged later.
- **tid=15→10** (RESTORED): the diff tool's floating-create absorb
  pulled tid=15's matched count back up to 16, which matches the full
  canary tid=15 stream length within the 20k truncation cap; the next
  divergence is presumably beyond the truncation window.

## Refcount and stability audit

The fix touches the SID computation only — the `handle_refcount` and
`state.objects` insertion logic is unchanged. The C+17 refcount-leak
risk audit (5 tests) continues to apply unchanged.

The deterministic SID is a fresh value computed at first-touch and
overwrites the registry entry. The old per-tid SID is never seen by
the diff tool. No double-insertion or stale-mapping issues.

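The first-touch overwrite described above can be sketched as a registry whose entries keep their refcount while only the SID field changes. Everything here is hypothetical: `Registry`, `ObjectEntry`, the placeholder per-tid SID format, and the stand-in deterministic SID string are illustrative names, not the kernel's actual types.

```python
from dataclasses import dataclass


@dataclass
class ObjectEntry:
    sid: str
    handle_refcount: int = 1


class Registry:
    def __init__(self):
        self.objects = {}  # pointer -> ObjectEntry

    def insert(self, pointer: int, tid: int, seq: int) -> None:
        # Insertion path (unchanged by the fix): the provisional SID here
        # depends on the creating thread, so it differs between runs.
        self.objects[pointer] = ObjectEntry(sid=f"tid{tid}#{seq}")

    def first_touch_sid(self, pointer: int, object_type: str) -> str:
        # Overwrite the SID in place. The entry itself is reused, so there
        # is no second insertion and the refcount is left untouched.
        entry = self.objects[pointer]
        entry.sid = f"{object_type}@{pointer:#010x}"  # stand-in recipe
        return entry.sid
```

Two registries that inserted the same pointer from different threads end up with identical SIDs after `first_touch_sid`, and each still holds exactly one entry with its original refcount.
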
## D-NEW-3 status

**RESOLVED**. The race is now invisible to the diff tool. Both engines
emit the same SID for the same dispatcher; the diff tool absorbs the
floating-tid mismatch via the cross-tid match.

## Next target

**C+19 = D-NEW-1 (`NtDuplicateObject` `handle.create`)**, unchanged
from C+17 plan. Canary's `ObjectTable::DuplicateHandle` allocates a
fresh slot via `AddHandle` (emits `handle.create`); ours'
`nt_duplicate_object` aliases via `dup_id=source_id` per AUDIT-062 and
does NOT emit a new event. The tradeoff is between mirroring canary
(~30-40 LOC, risk = AUDIT-062 worker-cluster regression) and
suppressing the event in the diff tool (band-aid).

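The two duplication behaviors behind D-NEW-1 can be contrasted in a toy handle table. This is a sketch under stated assumptions, not either engine's implementation: `HandleTable`, `duplicate_fresh`, `duplicate_alias`, the starting handle value, and the event tuple shape are all hypothetical.

```python
class HandleTable:
    def __init__(self):
        self.slots = {}       # handle id -> object
        self.log = []         # emitted events, in order
        self._next_id = 0x1000  # arbitrary starting handle value

    def add_handle(self, obj) -> int:
        handle_id = self._next_id
        self._next_id += 1
        self.slots[handle_id] = obj
        self.log.append(("handle.create", handle_id))
        return handle_id

    def duplicate_fresh(self, source_id: int) -> int:
        # Canary-style: allocate a new slot for the same object, which
        # emits a second `handle.create`.
        return self.add_handle(self.slots[source_id])

    def duplicate_alias(self, source_id: int) -> int:
        # Ours-style: dup_id = source_id, so no slot and no event.
        if source_id not in self.slots:
            raise KeyError(source_id)
        return source_id
```

Running both paths on the same source handle leaves the fresh-slot table with two `handle.create` events and the aliasing table with one, which is the extra canary-side event that ends the main chain's matched prefix at 102,553.
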