Source changes (dormant parity infra, retained from iterate 2.AI/2.AO): - xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring - xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
109 lines
6.0 KiB
Markdown
# Phase C+17 cold-vs-cold result (2026-05-14)

## Matched-prefix table

| canary_tid | ours_tid | C+16    | C+17    | delta    | first_divergence_at | kind |
|------------|----------|---------|---------|----------|---------------------|------|
| 6          | 1        | 102,171 | 102,553 | **+382** | 102,553             | `NtDuplicateObject` no `handle.create` (NEW-1) |
| 4          | 11       | 8       | 11      | **+3**   | —                   | no divergence in 11 events (ours stalls) |
| 7          | 2        | 30      | 32      | **+2**   | —                   | no divergence in 32 events |
| 12         | 7        | 2       | 3       | **+1**   | 3                   | `timeout_ns` differs in `wait.begin` (NEW-2) |
| 14         | 9        | 2       | 41      | **+39**  | 41                  | unrelated `XAudioGetVoiceCategoryVolumeChangeMask` |
| 15         | 10       | 16      | 2       | **-14**  | 2                   | ordering: ours emits `handle.create` on first thread-touch of shared dispatcher (NEW-3) |

**Main chain advanced +382** (D-2/D-3/D-4 root cause resolved). 4 of 5 sister
chains advanced. The tid=15→10 chain regressed by 14 events due to a
cross-thread-caching ordering side-effect (see broad-impact.md / NEW-3). The
underlying state alignment shares the same root cause, so the regression is
"observation-side": canary's `GetNativeObject` is process-global, so the
adoption happens on whichever thread touches the dispatcher first.

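The matched-prefix and `first_divergence_at` columns can be read as the number of leading events on which a canary thread and its paired thread in ours agree. A minimal sketch of that comparison follows, assuming a hypothetical, simplified `Event` type; the real diff tool's schema and its SID-skip policy are not reproduced here.

```rust
/// Hypothetical, simplified trace event; the real JSONL records carry more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub kind: String, // e.g. "import.call", "handle.create"
    pub name: String, // e.g. "NtDuplicateObject"
}

/// Outcome of comparing one canary thread's stream against its pair in ours.
#[derive(Debug, PartialEq, Eq)]
pub struct PrefixReport {
    /// Number of leading events that are equal in both streams.
    pub matched: usize,
    /// True if both streams have an event at index `matched` and they differ;
    /// false if one stream simply ended (the "no divergence in N events" rows).
    pub diverged: bool,
}

pub fn matched_prefix(canary: &[Event], ours: &[Event]) -> PrefixReport {
    // Count equal events from the start until the first disagreement.
    let matched = canary
        .iter()
        .zip(ours.iter())
        .take_while(|(c, o)| c == o)
        .count();
    let diverged = matched < canary.len() && matched < ours.len();
    PrefixReport { matched, diverged }
}
```

With 0-based indices the first diverging event sits at index `matched`, which is why the two columns show the same number on rows that diverge.
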
## New first divergence on main (idx=102,553)

```
canary: [102551] import.call NtDuplicateObject
ours:   [102551] import.call NtDuplicateObject
canary: [102552] kernel.call NtDuplicateObject
ours:   [102552] kernel.call NtDuplicateObject
canary: [102553] handle.create sid=df686b147b291902 (object_type=1)
ours:   [102553] kernel.return NtDuplicateObject
canary: [102554] kernel.return NtDuplicateObject
ours:   [102554] import.call RtlEnterCriticalSection
```

Canary's `NtDuplicateObject_entry` calls `ObjectTable::DuplicateHandle`, which
fires `AddHandle` for the new slot and so emits `handle.create`. In ours,
`nt_duplicate_object` short-circuits via handle aliasing (AUDIT-062's
`dup_id=source_id` design) and does NOT emit a new `handle.create`. This is
**D-NEW-1 HIGH**, the first C+18 target.

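The difference between the two duplication strategies can be illustrated with a toy handle table. This is a sketch under stated assumptions: `HandleTable`, its fields, the handle stride, and both method names are hypothetical stand-ins, not the canary or xenia-kernel code, and it does not prescribe how C+18 should resolve D-NEW-1.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified handle table for illustration only.
pub struct HandleTable {
    next_handle: u32,
    slots: HashMap<u32, u64>, // handle -> object id
    pub trace: Vec<String>,   // stand-in for the event log
}

impl HandleTable {
    pub fn new() -> Self {
        Self { next_handle: 0x1000, slots: HashMap::new(), trace: Vec::new() }
    }

    /// Allocate a fresh slot for `object` and log a `handle.create` event.
    pub fn add_handle(&mut self, object: u64) -> u32 {
        let h = self.next_handle;
        self.next_handle += 4; // arbitrary stride chosen for the sketch
        self.slots.insert(h, object);
        self.trace.push(format!("handle.create handle={h:#x}"));
        h
    }

    /// Aliasing duplicate (the `dup_id=source_id` shape): hands back the
    /// source handle and therefore logs nothing.
    pub fn duplicate_aliased(&self, source: u32) -> Option<u32> {
        self.slots.contains_key(&source).then_some(source)
    }

    /// Slot-allocating duplicate: a new handle for the same object, which
    /// goes through `add_handle` and so logs `handle.create`.
    pub fn duplicate_with_new_slot(&mut self, source: u32) -> Option<u32> {
        let object = *self.slots.get(&source)?;
        Some(self.add_handle(object))
    }
}
```

In this model only the slot-allocating path adds an entry to `trace`, which matches the shape of the divergence at idx=102,553.
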
## Acceptance gates

- **Gate 1 (default-off digest)**: PASS — 3× reproducible at
  `e1dfcb1559f987b35012a7f2dc6d93f5` (unchanged from C+13/C+15-α/C+16
  baseline). The fix is observation-only at the digest level; the new
  shadow-handle refcount entries do not feed back into guest behavior
  inside the 50M-instruction window.
- **Gate 2 (cvar-on emit)**: PASS — ours 121,544 events (was 121,537 in
  C+16, +7 from new lazy `handle.create` emits in the main chain
  bring-up); canary 3,059,463 events in ~90s. Both JSONL parse cleanly.
- **Gate 3 (diff tool)**: PASS — diff tool produces 6-chain report with
  the new SID-skip semantics for `wait.begin.handles_semantic_ids`.
- **Gate 4 (cold-vs-cold)**: PASS — main matched prefix advances
  102,171 → 102,553 (+382). 4 of 5 sister chains advance; 1 minor
  regression on tid=15→10 (NEW-3, observation-side).
- **Gate 5 (build clean)**: PASS — `cargo build --release` clean
  (1 pre-existing, unrelated `dead_code` warning).
- **Gate 6 (tests)**: PASS — test count 186 → 191 (5 new lifecycle tests
  for `ensure_dispatcher_object`; all pass and the entire workspace is
  green).
- **Gate 7 (Phase B image hash)**: PASS — `image_loaded_sha256` =
  `ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18`
  (unchanged).
- **Gate 8 (event-log determinism)**: PASS — `handle.create` event
  stream (post-strip of `host_ns`) is bit-identical across 3 cold
  runs: md5 `0bd91b4c61dea52d72859e7d9c3541ba`.

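Gate 8's check amounts to hashing the event stream with the host-timing field left out, so that two cold runs compare equal whenever their guest-visible content matches. A minimal sketch of that idea follows, assuming a hypothetical `HandleCreate` record and using FNV-1a purely as a dependency-free stand-in for md5; the real gate hashes the stripped JSONL, which this does not attempt to reproduce.

```rust
/// Hypothetical trace record: `host_ns` is host wall-clock noise, the other
/// fields are the guest-visible content. Field names other than `host_ns`
/// are assumptions made for this sketch.
pub struct HandleCreate {
    pub host_ns: u64,
    pub tid: u32,
    pub sid: u64,
    pub object_type: u32,
}

/// One FNV-1a (64-bit) pass over `bytes`, continuing from `hash`.
fn fnv1a(hash: u64, bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(hash, |h, b| (h ^ u64::from(*b)).wrapping_mul(0x0000_0100_0000_01b3))
}

/// Digest of a stream with `host_ns` deliberately excluded.
pub fn stream_digest(events: &[HandleCreate]) -> u64 {
    let mut h = 0xcbf2_9ce4_8422_2325_u64; // FNV-1a 64-bit offset basis
    for e in events {
        h = fnv1a(h, &e.tid.to_le_bytes());
        h = fnv1a(h, &e.sid.to_le_bytes());
        h = fnv1a(h, &e.object_type.to_le_bytes());
    }
    h
}
```

Two runs that differ only in `host_ns` produce the same digest, while a change to any hashed field in a single record changes it.
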
## Sister-chain analysis

All 5 sister chains' first divergences are no longer "wait.begin with SID=0":

- tid=4→11: was `KeWaitForMultipleObjects` at idx=8 with empty SIDs;
  now goes 11 events deep with NO divergence (ours stalls, but for
  reasons unrelated to D-2/D-3/D-4).
- tid=7→2: was `KeWaitForSingleObject` at idx=30 with SID=0; now 32
  events with NO divergence.
- tid=12→7: was at idx=2 with SID=0; now idx=3. The `handle.create`
  matches (SID skipped per diff-tool policy); the divergence is now a
  `timeout_ns` mismatch (-30000000 vs 429466729600), a real
  game-side wait-quantum mismatch.
- tid=14→9: was at idx=2 with SID=0; now idx=41, where it reaches a real
  `XAudioGetVoiceCategoryVolumeChangeMask` divergence (an audio export
  on this sister chain that the boot does not reach in ours).
- tid=15→10: was at idx=16 (no divergence in 16 events); now idx=2
  diverges because ours emits `handle.create` on this thread's first
  touch of a globally-shared semaphore dispatcher at `0x828a3230`,
  while canary emitted it earlier on another thread. Observation-side
  ordering issue; underlying state model is the same. NEW-3 below.

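One arithmetic observation about the two `timeout_ns` values in the tid=12→7 bullet, offered as something that may be worth checking rather than as a diagnosis: 429466729600 equals (2^32 - 300000) × 100, and -30000000 equals -300000 × 100. The two values therefore coincide with one 32-bit pattern read as signed versus unsigned and then scaled by 100. The 100 ns tick scaling and both function names below are assumptions of this sketch; the source does not say how `timeout_ns` is derived on either side.

```rust
/// Nanoseconds per raw tick. Assumed to be 100 for this sketch; the source
/// note does not state the unit of the raw timeout.
const NS_PER_TICK: i64 = 100;

/// Read the 32-bit pattern as a signed tick count (sign-extended).
pub fn timeout_ns_signed(raw_low32: u32) -> i64 {
    i64::from(raw_low32 as i32) * NS_PER_TICK
}

/// Read the same 32-bit pattern as an unsigned tick count (zero-extended).
pub fn timeout_ns_unsigned(raw_low32: u32) -> i64 {
    i64::from(raw_low32) * NS_PER_TICK
}
```

Feeding both functions the bit pattern of `-300_000_i32` yields exactly the pair of values recorded above, so a sign-extension difference in how the timeout is read is one candidate to rule out before treating NEW-2 as a game-side mismatch.
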
## Refcount leak risk audit

The fix sets `state.handle_refcount[ptr] = 1` for each first-touch shadow.
Three concerns and mitigations:

1. **Leak risk**: no code path currently destroys these shadows
   (`ensure_dispatcher_object` adoptions). Canary's design has the same
   property: `GetNativeObject`-synthesized `XObject`s survive until
   process exit. No leak relative to canary's behavior.
2. **Double-bump risk**: the early-return guard at the top of
   `ensure_dispatcher_object` (`state.objects.contains_key(&ptr)`)
   ensures the refcount entry is initialized exactly once per pointer.
   Test `ensure_dispatcher_object_is_idempotent_on_repeated_touch`
   verifies this.
3. **Refcount underflow risk**: if a future change wires
   `handle.destroy` on shadow removal (e.g., when `NtClose` is
   somehow called on a guest dispatcher pointer), the refcount must
   not underflow. The `or_insert(1)` form preserves any pre-existing
   refcount (e.g., if the same pointer was previously allocated via
   `alloc_handle_for`, though that's impossible since `next_handle`
   starts at `0x1000` and pointers live above `0x1_0000`).

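The guard and the `or_insert(1)` behavior described in items 2 and 3 can be sketched as follows. The `State` layout and value types here are assumptions made for illustration; only the names `ensure_dispatcher_object`, `objects`, `handle_refcount`, and the `contains_key` / `or_insert(1)` calls come from the note.

```rust
use std::collections::HashMap;

/// Hypothetical, pared-down kernel state; the real struct has more fields
/// and different value types.
#[derive(Default)]
pub struct State {
    pub objects: HashMap<u32, &'static str>,
    pub handle_refcount: HashMap<u32, u32>,
}

/// Adopt a guest dispatcher pointer as a shadow object on first touch.
/// Returns true only if this call created the shadow.
pub fn ensure_dispatcher_object(state: &mut State, ptr: u32) -> bool {
    // Early-return guard: repeated touches never re-initialize the refcount.
    if state.objects.contains_key(&ptr) {
        return false;
    }
    state.objects.insert(ptr, "shadow");
    // `or_insert(1)` only writes when no entry exists, so a pre-existing
    // refcount for the same key is left untouched.
    state.handle_refcount.entry(ptr).or_insert(1);
    true
}
```

Calling the function twice with the same pointer leaves the refcount at 1, and seeding `handle_refcount` with another value before the first touch leaves that value in place.
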