xenia-rs/audit-runs/phase-c15a-schema-wiring/new-divergences.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Phase C+15-α — New Divergence Catalog (2026-05-14)
Surfaced by the schema-v1.1 wiring of `handle.create/destroy`,
`thread.create/exit`, `wait.begin` in both engines.
## Cold-vs-cold matched-prefix table (post-wiring)
| canary_tid | ours_tid | matched | first_divergence_at | divergence kind |
|------------|----------|---------|---------------------|------------------------|
| 6 | 1 | 102,168 | 102,168 | extra `handle.destroy` in ours (XamTaskCloseHandle refcount mismatch) |
| 15 | 10 | 16 | — | no divergence in 16 evts (canary 3.6M, ours stalls) |
| 7 | 2 | 30 | 30 | KeWaitForSingleObject native-obj handle (class E) |
| 4 | 11 | 8 | 8 | KeWaitForMultipleObjects native-obj handle (class E) |
| 12 | 7 | 2 | 2 | KeWaitForSingleObject native-obj handle (class E) |
| 14 | 9 | 2 | 2 | KeWaitForSingleObject native-obj handle (class E) |
The main-chain matched prefix dropped from **104,574 (C+13/C+14)** to
**102,168**, a regression of 2,406 events. This is the expected outcome: the
newly wired event kinds expose state divergences that were previously
invisible to the diff.
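
To make the "matched" and "first_divergence_at" columns concrete, here is a minimal sketch of how a matched prefix could be computed for one thread pair. It is an illustration under assumptions, not the diff tool's actual code: the function name `matched_prefix`, the tuple event shape, and the `kernel.call` kind are made up for the example.

```python
def matched_prefix(canary_events, ours_events):
    """Return (matched, first_divergence_at) for one canary/ours thread pair.

    first_divergence_at is None when the shorter stream ends before any
    mismatch, which is how this sketch models a stalled stream.
    """
    matched = 0
    for canary_evt, ours_evt in zip(canary_events, ours_events):
        if canary_evt != ours_evt:
            return matched, matched
        matched += 1
    return matched, None


# Made-up events: ours emits one extra handle.destroy before the return.
canary = [("kernel.call", "XamTaskCloseHandle"),
          ("kernel.return", "XamTaskCloseHandle")]
ours = [("kernel.call", "XamTaskCloseHandle"),
        ("handle.destroy", "XamTaskCloseHandle"),
        ("kernel.return", "XamTaskCloseHandle")]

print(matched_prefix(canary, ours))        # (1, 1)
print(matched_prefix(canary, canary[:1]))  # (1, None)
```

A single extra event on one side ends the prefix at that index, which is why wiring new event kinds can shorten the matched prefix without any behaviour change in either engine.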
## Cataloged divergences (priority-ordered for future iterate)
### D-1 (HIGH) — main chain idx=102,168: extra `handle.destroy` on `XamTaskCloseHandle`
- **Chain**: canary tid=6 ↔ ours tid=1.
- **Event**:
  - ours: `handle.destroy sid=b53a312c0ac30f49` then `kernel.return XamTaskCloseHandle return=1`
  - canary: `kernel.return XamTaskCloseHandle return=1` (no `handle.destroy`)
- **Hypothesis**: Ours's `xam_task_close_handle` (xam.rs:300-344) decrements
the refcount and destroys the handle when it reaches 0. Canary's
`XamTaskCloseHandle_entry` → `NtClose` → `ObjectTable::ReleaseHandle` path
applies the same rule, but canary's spawned thread keeps an additional
ref on the thread handle (`object->Retain()` in `XThread::Create` line 408
via `RetainHandle()`), so the close leaves canary's refcount at 1. Ours's
refcount of 1 at this point is wrong: it should be 2 (user ref +
spawned-thread ref), so ours destroys prematurely.
- **Impact**: cascades into downstream divergences; the spawned thread is left
holding a dangling handle reference.
- **Fix scope**: ~20 LOC in `xam_task_schedule` / `ex_create_thread`: add an
explicit `state.handle_refcount[handle] += 1` after spawn for the
XThread's own ref. Verify against canary's `RetainHandle()` semantics.
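
The refcount hypothesis can be sketched with a toy handle table. This is a simplified model, not code from either engine: `HandleTable`, `schedule_task`, and the handle value are hypothetical names chosen for illustration, and the real fix would live in the Rust exports named above.

```python
class HandleTable:
    """Toy handle table: refcounts plus a log of emitted events."""

    def __init__(self):
        self.refcount = {}
        self.events = []

    def create(self, handle):
        self.refcount[handle] = 1  # the caller's (user) reference
        self.events.append(("handle.create", handle))

    def retain(self, handle):
        self.refcount[handle] += 1

    def close(self, handle):
        self.refcount[handle] -= 1
        if self.refcount[handle] == 0:
            del self.refcount[handle]
            self.events.append(("handle.destroy", handle))


def schedule_task(table, handle, thread_retains):
    """Create a task handle; optionally take the spawned thread's own ref."""
    table.create(handle)
    if thread_retains:
        table.retain(handle)


HANDLE = 0x1234  # made-up handle value

without_ref = HandleTable()
schedule_task(without_ref, HANDLE, thread_retains=False)
without_ref.close(HANDLE)  # refcount 1 -> 0: handle.destroy is emitted

with_ref = HandleTable()
schedule_task(with_ref, HANDLE, thread_retains=True)
with_ref.close(HANDLE)     # refcount 2 -> 1: no handle.destroy

print(without_ref.events[-1])  # ('handle.destroy', 4660)
print(with_ref.events[-1])     # ('handle.create', 4660)
```

With the extra retain, the close call produces no `handle.destroy`, which matches the canary event sequence described for this chain.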
### D-2 (HIGH) — chain tid=4 / canary, tid=11 / ours: ours stops at idx=8
- **Chain**: canary tid=4 ↔ ours tid=11.
- **Event**:
  - ours: `kernel.return KeWaitForMultipleObjects status=0` at idx=8, then
    stream ends (9 total events).
  - canary: `handle.create sid=bcaf14d76932b128 (Event)` at idx=8, then
    `handle.create sid=0760e947bacff199` at idx=9, then continues for 151,690
    events.
- **Hypothesis (class E asymmetry)**: Canary's `KeWaitForMultipleObjects_entry`
iterates the object pointer array and calls
`XObject::GetNativeObject<XObject>(kernel_state, object_ptr, -1, true)`
for each — when the object has not yet been wrapped in an `XObject*`, this
CREATES a new XObject (and thus a new handle). Ours's `do_wait_multiple`
uses `resolve_pseudo_handle` which does NOT create a new XObject — it
looks up the existing handle. The "handle for the native dispatcher object"
is an engine-architectural difference: canary lazily wraps,
ours pre-registers.
- **Impact**: every Ke*Wait* that takes object pointers (not handles) creates
N extra handle.create events on the canary side. Ours emits none.
- **Fix scope**: this is class E (intentional asymmetry). Recommended action:
either add the `Ke{Wait,Set,Reset,...}*Object*` exports that take object
pointers to a diff-tool **suppress-handle-create-side-effect** list, or have
ours emit a synthetic `handle.create` when `resolve_pseudo_handle` first
encounters a new pointer. The latter aligns better with canary's view.
~30-50 LOC.
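
The second option (emit a synthetic `handle.create` on first sight of a pointer) can be sketched as follows. This is a toy model under assumptions: `WaitModel`, its flag, and the pointer values are invented for the example, and the event ordering simply mirrors the canary sequence quoted above (creates first, then the return).

```python
class WaitModel:
    """Toy model of a pointer-taking wait export and the events it emits."""

    def __init__(self, emit_on_first_sight):
        self.emit_on_first_sight = emit_on_first_sight
        self.seen_ptrs = set()
        self.events = []

    def ke_wait_for_multiple_objects(self, object_ptrs):
        for ptr in object_ptrs:
            if ptr not in self.seen_ptrs:
                self.seen_ptrs.add(ptr)
                if self.emit_on_first_sight:
                    # Lazy wrap (canary) or synthetic emit (proposed for ours).
                    self.events.append(("handle.create", hex(ptr)))
        self.events.append(("kernel.return", "KeWaitForMultipleObjects"))


PTRS = [0x1000, 0x2000]  # made-up guest object pointers

canary_like = WaitModel(emit_on_first_sight=True)
ours_today = WaitModel(emit_on_first_sight=False)
ours_fixed = WaitModel(emit_on_first_sight=True)

for model in (canary_like, ours_today, ours_fixed):
    model.ke_wait_for_multiple_objects(PTRS)
    model.ke_wait_for_multiple_objects(PTRS)  # second wait: nothing new to wrap

print(len(canary_like.events))                  # 4
print(len(ours_today.events))                   # 2
print(ours_fixed.events == canary_like.events)  # True
```

The synthetic emit only fires the first time a pointer is seen, so repeated waits on the same objects stay aligned in both streams.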
### D-3 (HIGH) — same class on chains 7→2 (idx=30), 12→7 (idx=2), 14→9 (idx=2)
Same root cause as D-2 — `KeWaitForSingleObject` with raw object pointer.
Canary's `xeKeWaitForSingleObject` calls `GetNativeObject` which creates a
handle for the dispatcher; ours's `resolve_pseudo_handle` does not.
Group all 4 chains under one fix in D-2.
### D-4 (MEDIUM) — wait.begin SID `0000000000000000` on tid=10 of ours
- **Chain**: canary tid=15 ↔ ours tid=10 (the only thread where prefix didn't
regress — but ours stalls at idx=16).
- **Event** at idx=2: both engines emit `wait.begin` but ours's
`handles_semantic_ids = ["0000000000000000"]` while canary's is real.
- **Hypothesis**: SID = 0 means `lookup_handle_semantic_id` returned 0 (handle
not registered). The handle being waited on must have been created before
the event_log SID registry was active (during boot / init), OR it's a
pseudo-handle from `resolve_pseudo_handle`. Pseudo-handles aren't real
handles in our model.
- **Fix scope**: when `lookup_handle_semantic_id(h) == 0`, lazy-emit a
synthetic `handle.create` for `h` (with a default object_type per
`state.objects[h]`'s schema kind). Aligns with D-2 fix. ~10 LOC.
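
A minimal sketch of the lazy-emit idea, assuming a registry keyed by handle. Everything here is illustrative: `EventLog`, `toy_sid`, and the payload field layout are hypothetical, and the hash-based SID derivation is a stand-in, not the engines' real SID scheme.

```python
import hashlib

ZERO_SID = "0" * 16


def toy_sid(handle, object_type):
    # Hypothetical derivation for the example only, NOT the real SID scheme.
    digest = hashlib.sha256(f"{object_type}:{handle:#x}".encode()).hexdigest()
    return digest[:16]


class EventLog:
    def __init__(self, objects):
        self.objects = objects  # handle -> object type, stands in for state.objects
        self.sids = {}          # handle -> registered SID
        self.events = []

    def lookup_handle_semantic_id(self, handle):
        return self.sids.get(handle, ZERO_SID)

    def sid_or_lazy_create(self, handle):
        sid = self.lookup_handle_semantic_id(handle)
        if sid == ZERO_SID:
            # Handle predates the registry (or is a pseudo-handle): register it
            # now and emit a synthetic handle.create before the wait event.
            object_type = self.objects.get(handle, "Unknown")
            sid = toy_sid(handle, object_type)
            self.sids[handle] = sid
            self.events.append(
                {"kind": "handle.create", "sid": sid, "object_type": object_type}
            )
        return sid

    def wait_begin(self, handle):
        sid = self.sid_or_lazy_create(handle)
        self.events.append({"kind": "wait.begin", "handles_semantic_ids": [sid]})


log = EventLog(objects={0x1C: "Event"})  # made-up handle created "before" logging
log.wait_begin(0x1C)
log.wait_begin(0x1C)

print([e["kind"] for e in log.events])
# ['handle.create', 'wait.begin', 'wait.begin']
```

The synthetic `handle.create` is emitted once per handle, so later waits on the same handle report the same non-zero SID.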
### D-5 (LOW) — chains 7→2, 12→7, 14→9: ours streams truncated
- Ours's tid=2/7/9/10 streams are 32/4/76/16 events long; the matching canary
streams (tid=7/12/14/15) are 32/27,834/4,733,192/3,610,535 events long.
Ours's worker threads stall early.
- **Hypothesis**: Downstream of D-2 / D-1 — once the main thread or peer
workers diverge, downstream threads block on signals that never come.
- **Fix scope**: deferred until D-1/D-2 land; likely no separate fix needed.
## Acceptance gate status
- **Gate 1 (default-off digest)**: PASS — 3× reproducible at
`e1dfcb1559f987b35012a7f2dc6d93f5` (unchanged from C+13 baseline).
- **Gate 2 (cvar-on emit)**: PASS — canary produces 14M+ events and ours
121K; JSONL parses cleanly; all new kinds present.
- **Gate 3 (diff tool)**: PASS — diff tool consumes new kinds, produces
6-chain divergence report. Cross-engine SID skip-comparison documented in
`SKIP_PAYLOAD_FIELDS_BY_KIND`.
- **Gate 4 (cold-vs-cold)**: PASS (with regression as designed) — main chain
prefix 104,574 → 102,168 (-2,406 events). Divergence catalog produced.
- **Gate 5 (build clean)**: PASS — canary + ours both build.
- **Gate 6 (tests)**: PASS — 181 → 181 passing (no new tests added; existing
unchanged).
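
The skip-comparison mentioned under Gate 3 could look roughly like the sketch below. Only the name `SKIP_PAYLOAD_FIELDS_BY_KIND` comes from this note; the table entries, the event dict shape, and the helper functions are assumptions made for illustration.

```python
# Assumed contents for illustration; the real table may list other fields.
SKIP_PAYLOAD_FIELDS_BY_KIND = {
    "handle.create": {"sid"},
    "handle.destroy": {"sid"},
    "wait.begin": {"handles_semantic_ids"},
}


def comparable(event):
    """Drop the payload fields that are not compared across engines."""
    skip = SKIP_PAYLOAD_FIELDS_BY_KIND.get(event["kind"], set())
    return {key: value for key, value in event.items() if key not in skip}


def events_match(canary_evt, ours_evt):
    return comparable(canary_evt) == comparable(ours_evt)


# Same kind and type, different engine-specific SIDs: treated as a match.
a = {"kind": "handle.create", "sid": "bcaf14d76932b128", "object_type": "Event"}
b = {"kind": "handle.create", "sid": "1111111111111111", "object_type": "Event"}
# Kinds without a skip entry are compared field by field.
c = {"kind": "kernel.return", "name": "XamTaskCloseHandle", "return": 1}
d = {"kind": "kernel.return", "name": "XamTaskCloseHandle", "return": 0}

print(events_match(a, b))  # True
print(events_match(c, d))  # False
```

Keeping the skipped fields in one table per kind makes the cross-engine exemptions auditable in a single place.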
## Reading-error class avoided
**Class #29 — per-host-thread tid_event_idx counter for shared synthetic tids**:
canary's pre-session `thread_local uint64_t t_tid_event_idx` was correct for
guest-tid events (1 tid : 1 host_thread) but broken for boot-time emissions
with `tid=0` because boot init runs on multiple host threads. Symptom: the
diff tool rejected the canary log with "events out of order at index 8".
Fixed via tid-keyed global map (matches ours's design).
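
The failure mode can be reproduced with a small model. This is a sketch, not canary's C++: the class names are hypothetical, and the two host threads are run one after the other so the output is deterministic.

```python
import threading
from collections import defaultdict


class ThreadLocalCounter:
    """One counter per host thread, like a thread_local index."""

    def __init__(self):
        self._local = threading.local()

    def next_idx(self, tid):
        idx = getattr(self._local, "idx", 0)
        self._local.idx = idx + 1
        return idx


class TidKeyedCounter:
    """One counter per tid, shared by all host threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = defaultdict(int)

    def next_idx(self, tid):
        with self._lock:
            idx = self._next[tid]
            self._next[tid] += 1
            return idx


def emit_from_two_host_threads(counter, tid=0, per_thread=2):
    """Emit per_thread events for the same tid from two host threads."""
    indices = []

    def worker():
        for _ in range(per_thread):
            indices.append(counter.next_idx(tid))

    for _ in range(2):
        thread = threading.Thread(target=worker)
        thread.start()
        thread.join()  # sequential on purpose: keeps the result deterministic
    return indices


print(emit_from_two_host_threads(ThreadLocalCounter()))  # [0, 1, 0, 1]
print(emit_from_two_host_threads(TidKeyedCounter()))     # [0, 1, 2, 3]
```

The thread-local variant restarts at 0 on the second host thread, which is the kind of duplicate index that makes a consumer report events as out of order for `tid=0`.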