handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
121
audit-runs/phase-c15a-schema-wiring/new-divergences.md
Normal file
@@ -0,0 +1,121 @@
# Phase C+15-α — New Divergence Catalog (2026-05-14)

Surfaced by the schema-v1.1 wiring of `handle.create/destroy`,
`thread.create/exit`, and `wait.begin` in both engines.

## Cold-vs-cold matched-prefix table (post-wiring)

| canary_tid | ours_tid | matched | first_divergence_at | divergence kind |
|------------|----------|---------|---------------------|------------------------|
| 6 | 1 | 102,168 | 102,168 | extra `handle.destroy` in ours (XamTaskCloseHandle refcount mismatch) |
| 15 | 10 | 16 | — | no divergence in 16 events (canary continues to 3.6M; ours stalls) |
| 7 | 2 | 30 | 30 | KeWaitForSingleObject native-obj handle (class E) |
| 4 | 11 | 8 | 8 | KeWaitForMultipleObjects native-obj handle (class E) |
| 12 | 7 | 2 | 2 | KeWaitForSingleObject native-obj handle (class E) |
| 14 | 9 | 2 | 2 | KeWaitForSingleObject native-obj handle (class E) |

Main matched prefix dropped from **104,574 (C+13/C+14)** to **102,168** — a
regression of 2,406 events. This is the expected outcome: state divergences
that were previously invisible are now visible.
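
As a rough illustration of how the `matched` and `first_divergence_at` columns relate, the sketch below compares two per-thread event streams. The `Event` struct and both function names are hypothetical stand-ins, not the diff tool's actual types; the real tool also skips some payload fields per kind.

```rust
/// Minimal stand-in for one log record; the real schema carries more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub kind: String,    // e.g. "handle.destroy", "kernel.return"
    pub payload: String, // already-normalized payload text
}

impl Event {
    pub fn new(kind: &str, payload: &str) -> Self {
        Event { kind: kind.to_string(), payload: payload.to_string() }
    }
}

/// Number of leading events that are identical in both streams.
pub fn matched_prefix(canary: &[Event], ours: &[Event]) -> usize {
    canary
        .iter()
        .zip(ours.iter())
        .take_while(|(c, o)| c == o)
        .count()
}

/// Index of the first differing event, or `None` when the shorter stream is
/// a clean prefix of the longer one (a stall rather than a divergence).
pub fn first_divergence_at(canary: &[Event], ours: &[Event]) -> Option<usize> {
    let n = matched_prefix(canary, ours);
    if n < canary.len() && n < ours.len() {
        Some(n)
    } else {
        None
    }
}
```

Under this model, an extra event on one side (as in D-1) yields `Some(idx)`, while a stream that simply stops early (as on the 15 ↔ 10 chain) yields `None`.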

## Cataloged divergences (priority-ordered for future iterate)

### D-1 (HIGH) — main chain idx=102,168: extra `handle.destroy` on `XamTaskCloseHandle`

- **Chain**: canary tid=6 ↔ ours tid=1.
- **Event**:
  - ours: `handle.destroy sid=b53a312c0ac30f49` then `kernel.return XamTaskCloseHandle return=1`
  - canary: `kernel.return XamTaskCloseHandle return=1` (no `handle.destroy`)
- **Hypothesis**: Ours's `xam_task_close_handle` (xam.rs:300-344) decrements the
  refcount and destroys the handle when it reaches 0. Canary's
  `XamTaskCloseHandle_entry` → `NtClose` → `ObjectTable::ReleaseHandle` follows
  the same rule, but canary's spawned thread keeps an additional ref on the
  thread handle (`object->Retain()` in `XThread::Create` line 408 via
  `RetainHandle()`). Ours's refcount of 1 at this point is wrong; it should be
  2 (user ref + spawned-thread ref), so ours destroys the handle prematurely.
- **Impact**: causes downstream divergences; the spawned thread is left with a
  dangling handle reference.
- **Fix scope**: ~20 LOC in `xam_task_schedule` / `ex_create_thread` —
  add explicit `state.handle_refcount[handle] += 1` after spawn for the
  XThread's own ref. Verify against canary's `RetainHandle()` semantics.
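
The refcount argument in D-1 can be sketched with a toy handle table. Everything here (`HandleTable`, `retain`, `release`, `close_after_spawn`) is a hypothetical model for illustration, not the code in `xam.rs` or canary's `ObjectTable`.

```rust
use std::collections::HashMap;

/// Hypothetical handle table: handle value -> reference count.
#[derive(Default)]
pub struct HandleTable {
    refcount: HashMap<u32, u32>,
    /// Handles for which a `handle.destroy` event would be emitted.
    pub destroyed: Vec<u32>,
}

impl HandleTable {
    /// Create a handle holding the caller's (user) reference.
    pub fn create(&mut self, handle: u32) {
        self.refcount.insert(handle, 1);
    }

    /// Take an extra reference, e.g. the spawned thread's own ref.
    pub fn retain(&mut self, handle: u32) {
        if let Some(count) = self.refcount.get_mut(&handle) {
            *count += 1;
        }
    }

    /// Drop one reference and destroy the handle only when the count
    /// reaches zero. Returns true when the handle was destroyed.
    pub fn release(&mut self, handle: u32) -> bool {
        let remaining = match self.refcount.get_mut(&handle) {
            Some(count) => {
                *count -= 1;
                *count
            }
            None => return false, // unknown handle: nothing to do
        };
        if remaining == 0 {
            self.refcount.remove(&handle);
            self.destroyed.push(handle);
            true
        } else {
            false
        }
    }
}

/// Spawn-then-close sequence; returns whether the close destroyed the handle.
pub fn close_after_spawn(thread_takes_ref: bool) -> bool {
    let mut table = HandleTable::default();
    let handle = 0x1000; // arbitrary example value
    table.create(handle);
    if thread_takes_ref {
        table.retain(handle); // the proposed fix: the thread's own ref
    }
    table.release(handle) // models XamTaskCloseHandle
}
```

With `thread_takes_ref = false` the close destroys the handle, which matches the extra `handle.destroy` seen in ours; with `true` the handle survives the close, which matches canary's stream.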

### D-2 (HIGH) — chain canary tid=4 ↔ ours tid=11: ours stops at idx=8

- **Chain**: canary tid=4 ↔ ours tid=11.
- **Event**:
  - ours: `kernel.return KeWaitForMultipleObjects status=0` at idx=8, then
    stream ends (9 total events).
  - canary: `handle.create sid=bcaf14d76932b128 (Event)` at idx=8, then
    `handle.create sid=0760e947bacff199` at idx=9, then continues for 151,690
    events.
- **Hypothesis (class E asymmetry)**: Canary's `KeWaitForMultipleObjects_entry`
  iterates the object pointer array and calls
  `XObject::GetNativeObject<XObject>(kernel_state, object_ptr, -1, true)`
  for each. When the object has not yet been wrapped in an `XObject*`, this
  creates a new XObject (and thus a new handle). Ours's `do_wait_multiple`
  uses `resolve_pseudo_handle`, which does not create a new XObject; it
  looks up the existing handle. The "handle for the native dispatcher object"
  is an engine-architectural difference: canary wraps lazily, ours
  pre-registers.
- **Impact**: every Ke*Wait* that takes object pointers (not handles) creates
  N extra `handle.create` events on the canary side. Ours emits none.
- **Fix scope**: this is class E (intentional asymmetry). Recommended action:
  add `Ke{Wait,Set,Reset,...}*Object*` exports that take object pointers to a
  diff-tool **suppress-handle-create-side-effect** list, OR have ours emit
  a synthetic `handle.create` when `resolve_pseudo_handle` first encounters
  a new pointer. The latter aligns better with canary's view. ~30-50 LOC.
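
The second option in the D-2 fix scope (emit a synthetic `handle.create` on first sight of a pointer) might look roughly like the sketch below. `WaitTracer`, `note_pointer`, the event text format, and the example addresses are all assumptions made for illustration; the real hook would live near `resolve_pseudo_handle`.

```rust
use std::collections::HashSet;

/// Hypothetical tracer that remembers which guest object pointers have
/// already been announced in the event log.
#[derive(Default)]
pub struct WaitTracer {
    seen_ptrs: HashSet<u32>,
    /// Emitted event lines, in order.
    pub events: Vec<String>,
}

impl WaitTracer {
    /// Call once per object pointer passed to a Ke*Wait* export.
    /// Returns true when a synthetic `handle.create` was emitted.
    pub fn note_pointer(&mut self, object_ptr: u32) -> bool {
        // `insert` returns true only the first time a value is added.
        let first_sight = self.seen_ptrs.insert(object_ptr);
        if first_sight {
            self.events
                .push(format!("handle.create ptr={:#010x} synthetic", object_ptr));
        }
        first_sight
    }

    /// Model of a multi-object wait: announce new pointers, then return.
    pub fn wait_multiple(&mut self, object_ptrs: &[u32]) {
        for &ptr in object_ptrs {
            self.note_pointer(ptr);
        }
        self.events
            .push("kernel.return KeWaitForMultipleObjects status=0".to_string());
    }
}
```

A second wait on the same pointer emits nothing extra, mirroring canary's behavior of wrapping each native object only once.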

### D-3 (HIGH) — same class on chains 7→2 (idx=30), 12→7 (idx=2), 14→9 (idx=2)

Same root cause as D-2: `KeWaitForSingleObject` with a raw object pointer.
Canary's `xeKeWaitForSingleObject` calls `GetNativeObject`, which creates a
handle for the dispatcher; ours's `resolve_pseudo_handle` does not.

Group all 4 chains under one fix in D-2.

### D-4 (MEDIUM) — wait.begin SID `0000000000000000` on tid=10 of ours

- **Chain**: canary tid=15 ↔ ours tid=10 (the only thread where the prefix
  didn't regress, but ours stalls at idx=16).
- **Event** at idx=2: both engines emit `wait.begin`, but ours's
  `handles_semantic_ids = ["0000000000000000"]` while canary's is real.
- **Hypothesis**: SID = 0 means `lookup_handle_semantic_id` returned 0 (handle
  not registered). The handle being waited on was either created before
  the event_log SID registry was active (during boot / init), or it is a
  pseudo-handle from `resolve_pseudo_handle`. Pseudo-handles aren't real
  handles in our model.
- **Fix scope**: when `lookup_handle_semantic_id(h) == 0`, lazy-emit a
  synthetic `handle.create` for `h` (with a default object_type per
  `state.objects[h]`'s schema kind). Aligns with D-2 fix. ~10 LOC.

### D-5 (LOW) — chains 7→2, 12→7, 14→9: ours streams truncated

- Ours's tid=2/7/9/10 streams are 32/4/76/16 events long; the matching canary
  streams (tid=7/12/14/15) are 32/27,834/4,733,192/3,610,535. Ours's worker
  threads stall early.
- **Hypothesis**: Downstream of D-2 / D-1. Once the main thread or peer
  workers diverge, downstream threads block on signals that never come.
- **Fix scope**: deferred until D-1/D-2 land; likely no separate fix needed.

## Acceptance gate status

- **Gate 1 (default-off digest)**: PASS — 3× reproducible at
  `e1dfcb1559f987b35012a7f2dc6d93f5` (unchanged from C+13 baseline).
- **Gate 2 (cvar-on emit)**: PASS — canary and ours produce 14M+ and 121K
  events respectively; JSONL parses cleanly; all new kinds present.
- **Gate 3 (diff tool)**: PASS — diff tool consumes new kinds, produces
  6-chain divergence report. Cross-engine SID skip-comparison documented in
  `SKIP_PAYLOAD_FIELDS_BY_KIND`.
- **Gate 4 (cold-vs-cold)**: PASS (with regression as designed) — main chain
  prefix 104,574 → 102,168 (-2,406 events). Divergence catalog produced.
- **Gate 5 (build clean)**: PASS — canary + ours both build.
- **Gate 6 (tests)**: PASS — 181 → 181 passing (no new tests added; existing
  tests unchanged).
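
For Gate 3, the cross-engine SID skip-comparison can be pictured as a per-kind list of payload fields that the diff ignores. The sketch below is an assumption about the shape of that logic only: the kind-to-field mapping, the `type` field, and the `Payload` alias are hypothetical, and the real `SKIP_PAYLOAD_FIELDS_BY_KIND` table lives in the diff tool.

```rust
use std::collections::BTreeMap;

/// Hypothetical flattened payload: field name -> value text.
pub type Payload = BTreeMap<String, String>;

/// Assumed skip list; SIDs are engine-specific, so they are not compared.
pub fn skip_fields(kind: &str) -> &'static [&'static str] {
    match kind {
        "handle.create" | "handle.destroy" => &["sid"],
        "wait.begin" => &["handles_semantic_ids"],
        _ => &[],
    }
}

/// Compare two payloads of the same kind, ignoring the skipped fields.
pub fn payloads_match(kind: &str, canary: &Payload, ours: &Payload) -> bool {
    let skip = skip_fields(kind);
    // Keep only the fields that take part in the comparison.
    let keep = |p: &Payload| -> Vec<(String, String)> {
        p.iter()
            .filter(|(k, _)| !skip.contains(&k.as_str()))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    };
    keep(canary) == keep(ours)
}

/// Small helper for building payloads in examples.
pub fn payload(pairs: &[(&str, &str)]) -> Payload {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

Two `handle.create` payloads that differ only in `sid` compare equal under this scheme, while the same difference on a kind with no skip entry is reported as a divergence.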

## Reading-error class avoided

**Class #29 — per-host-thread tid_event_idx counter for shared synthetic tids**:
canary's pre-session `thread_local uint64_t t_tid_event_idx` was correct for
guest-tid events (1 tid : 1 host_thread) but broken for boot-time emissions
with `tid=0`, because boot init runs on multiple host threads. Symptom: the
diff tool rejected the canary log with "events out of order at index 8".
Fixed via a tid-keyed global map (matching ours's design).
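
The tid-keyed design can be illustrated with a small Rust sketch: one shared map hands out indices per tid, so several host threads emitting for `tid=0` never reuse an index. `TidEventIndex` and `boot_indices` are hypothetical names, and this is a model of the idea rather than either engine's actual code.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Per-tid event index shared by all host threads.
#[derive(Default)]
pub struct TidEventIndex {
    next: Mutex<HashMap<u32, u64>>,
}

impl TidEventIndex {
    /// Return the next index for `tid`, starting at 0.
    pub fn next_idx(&self, tid: u32) -> u64 {
        let mut map = self.next.lock().unwrap();
        let slot = map.entry(tid).or_insert(0);
        let idx = *slot;
        *slot += 1;
        idx
    }
}

/// Emit `per_thread` events for tid=0 from each of `host_threads` threads
/// and return every assigned index, sorted.
pub fn boot_indices(host_threads: usize, per_thread: usize) -> Vec<u64> {
    let counter = Arc::new(TidEventIndex::default());
    let workers: Vec<_> = (0..host_threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                (0..per_thread)
                    .map(|_| counter.next_idx(0))
                    .collect::<Vec<u64>>()
            })
        })
        .collect();
    let mut all: Vec<u64> = workers
        .into_iter()
        .flat_map(|w| w.join().unwrap())
        .collect();
    all.sort_unstable();
    all
}
```

With a per-host-thread counter, each worker would restart at 0 and the merged `tid=0` stream would contain duplicate indices; with the shared map, the sorted result is a gap-free run from 0.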