handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
102
audit-runs/phase-c15a-schema-wiring/audit.md
Normal file
@@ -0,0 +1,102 @@
# Phase C+15-α Schema-Wiring Audit (2026-05-14)

## Phase 1: Wired/unwired matrix (pre-session)

| Kind             | Canary emits? | Ours emits? | Status (pre)  | Priority |
|------------------|---------------|-------------|---------------|----------|
| `schema_version` | yes           | yes         | wired         | -        |
| `import.call`    | yes           | yes         | wired         | -        |
| `kernel.call`    | yes           | yes         | wired (+C+10) | -        |
| `kernel.return`  | yes           | yes         | wired         | -        |
| `handle.create`  | declared      | declared    | **stubbed**   | HIGH     |
| `handle.destroy` | declared      | declared    | **stubbed**   | HIGH     |
| `thread.create`  | declared      | declared    | **stubbed**   | HIGH     |
| `thread.exit`    | declared      | declared    | **stubbed**   | HIGH     |
| `wait.begin`     | declared      | declared    | **stubbed**   | HIGH     |
| `wait.end`       | declared      | declared    | **stubbed**   | HIGH     |
| `thread.suspend` | declared      | not in API  | unwired       | LOW      |
| `thread.resume`  | declared      | not in API  | unwired       | LOW      |
| `vfs.open`       | declared      | not in API  | redundant?    | MEDIUM   |
| `vfs.read`       | declared      | not in API  | high-vol      | LOW      |
| `vfs.close`      | declared      | not in API  | redundant?    | MEDIUM   |
| `mem.write`      | declared      | not in API  | opt-in        | LOW      |

## Phase 2/3: Kinds wired this session

Wired symmetrically in both engines (cvar-gated, default-off):

- **`handle.create`**: emitted from `KernelState::alloc_handle_for` (ours) /
  `ObjectTable::AddHandle` (canary). 39+ call sites covered via centralized hook.
- **`handle.destroy`**: emitted from `nt_close` + `xam_task_close_handle` (ours) /
  `ObjectTable::RemoveHandle` (canary).
- **`thread.create`**: emitted from `ex_create_thread` (ours) / `ExCreateThread`
  in `xboxkrnl_threading.cc` (canary). Emitted after the spawn succeeds.
- **`thread.exit`**: emitted from `ex_terminate_thread` (ours) / `XThread::Exit`
  (canary). Canary's `XThread::Exit` covers both explicit `ExTerminateThread`
  and implicit thread-entry returns.
- **`wait.begin`**: emitted from `nt_wait_for_single_object_ex` +
  `ke_wait_for_single_object` (ours) / `xeKeWaitForSingleObject` +
  `NtWaitForSingleObjectEx` (canary).
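
The centralized-hook approach used for `handle.create` can be illustrated with a small Python model. This is a toy sketch, not the Rust or C++ code: `HandleTable`, `create_event`, `create_thread`, the `emit_enabled` flag (standing in for the cvar), the counter-based semantic id, and `object_type=2` are all hypothetical. The real SID derivation is not shown in these notes.

```python
from typing import Dict, List


class HandleTable:
    """Toy handle table in which every allocation goes through one hook."""

    def __init__(self, emit_enabled: bool = False) -> None:
        self.emit_enabled = emit_enabled   # stand-in for the default-off cvar
        self.events: List[dict] = []       # in-memory event log
        self._sids: Dict[int, str] = {}    # raw handle -> semantic id
        self._next_handle = 0x1000
        self._next_sid = 1

    def alloc_handle_for(self, object_type: int) -> int:
        """Single allocation entry point; emits handle.create when enabled."""
        handle = self._next_handle
        self._next_handle += 4
        if self.emit_enabled:
            sid = f"{self._next_sid:016x}"  # hypothetical SID scheme
            self._next_sid += 1
            self._sids[handle] = sid
            self.events.append({
                "kind": "handle.create",
                "handle_semantic_id": sid,
                "object_type": object_type,
            })
        return handle

    def close(self, handle: int) -> None:
        """Close path; emits handle.destroy for handles that were registered."""
        sid = self._sids.pop(handle, None)
        if self.emit_enabled and sid is not None:
            self.events.append({"kind": "handle.destroy", "handle_semantic_id": sid})


# Two unrelated call sites; neither needs its own emit code.
def create_event(table: HandleTable) -> int:
    return table.alloc_handle_for(object_type=1)


def create_thread(table: HandleTable) -> int:
    return table.alloc_handle_for(object_type=2)  # type id is illustrative
```

Because the emit lives in the allocator, new call sites are covered without further changes, and a table constructed with the flag off records nothing.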

Deferred (v1.2):

- **`wait.end`**: design challenge. A wait can park the guest thread, and the
  wake-status path differs between engines. The synchronous outcome status is
  already captured in the immediately following `kernel.return`; the
  asynchronous wake outcome surfaces in subsequent events.
- **`thread.suspend` / `thread.resume`**: low-frequency; defer until needed.
- **`vfs.*`**: redundant with `kernel.call` for Nt*File. Skip per schema-v1
  audit recommendation.
- **`mem.write`**: opt-in only (separate cvar); high-volume.

## Code summary

### Ours (~140 LOC)

- `crates/xenia-kernel/src/event_log.rs`: registry + auto helpers
  (`register_handle_semantic_id`, `lookup_handle_semantic_id`,
  `forget_handle_semantic_id`, `emit_handle_create_auto`,
  `emit_handle_destroy_auto`). +85 LOC.
- `crates/xenia-kernel/src/objects.rs`: `KernelObject::schema_object_type()`.
  +14 LOC.
- `crates/xenia-kernel/src/state.rs`: `alloc_handle_for` emit hook. +24 LOC.
- `crates/xenia-kernel/src/exports.rs`: `nt_close` destroy emit,
  `ex_create_thread` thread.create emit, `ex_terminate_thread` thread.exit emit,
  `nt_wait_for_single_object_ex` + `ke_wait_for_single_object` wait.begin emits,
  plus the `decode_timeout_ns` helper. +85 LOC.
- `crates/xenia-kernel/src/xam.rs`: `xam_task_close_handle` destroy emit. +14 LOC.
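
The `decode_timeout_ns` helper itself is not shown in this audit. As a hedged sketch, assuming the guest passes an NT-style signed 64-bit big-endian timeout in 100 ns ticks (negative meaning relative), decoding could look like the Python below. `decode_low32_only` is a deliberately wrong variant added here only for comparison. The two results reproduce the `timeout_ns` values in the canary tid=12 / ours tid=7 `wait.begin` payloads of the cold-vs-cold report (-30000000 vs 429466729600), which is consistent with a dropped sign extension in ours but does not prove it.

```python
import struct


def decode_timeout_ns(raw_be: bytes) -> int:
    """Decode a big-endian signed 64-bit count of 100 ns ticks into nanoseconds."""
    (ticks,) = struct.unpack(">q", raw_be)
    return ticks * 100


def decode_low32_only(raw_be: bytes) -> int:
    """Deliberately wrong: keeps only the low 32 bits, zero-extended."""
    (ticks,) = struct.unpack(">q", raw_be)
    return (ticks & 0xFFFF_FFFF) * 100


raw = struct.pack(">q", -300_000)   # relative wait of 30 ms
print(decode_timeout_ns(raw))       # -30000000
print(decode_low32_only(raw))       # 429466729600
```

The arithmetic behind the second value is (2^32 - 300000) × 100 = 4294667296 × 100 = 429466729600.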

### Canary (~130 LOC)

- `src/xenia/kernel/event_log.h`: registry API (`RegisterHandleSemanticId`,
  `LookupHandleSemanticId`, `ForgetHandleSemanticId`, `EmitHandleCreateAuto`,
  `EmitHandleDestroyAuto`). +20 LOC.
- `src/xenia/kernel/event_log.cc`: per-tid counter map (previously a
  per-host-thread `thread_local`, which produced duplicate `tid_event_idx`
  values for tid=0 across host threads; a bug in the pre-session
  implementation), `CurrentTid` made non-asserting via the new
  `XThread::TryGetCurrentThread`, registry helpers, auto-emit wrappers.
  +60 LOC net.
- `src/xenia/kernel/xthread.h` + `xthread.cc`: `TryGetCurrentThread` accessor
  plus `XThread::Exit` thread.exit emit. +12 LOC.
- `src/xenia/kernel/util/object_table.cc`: `AddHandle`/`RemoveHandle` hooks
  plus `SchemaObjectType` mapping. +35 LOC.
- `src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc`: `ExCreateThread`
  thread.create emit, `xeKeWaitForSingleObject` + `NtWaitForSingleObjectEx`
  wait.begin emits. +30 LOC.

### Diff tool

- `tools/diff-events/diff_events.py`: `SKIP_PAYLOAD_FIELDS_BY_KIND` now skips
  `handle_semantic_id` (cross-engine `creating_tid` differs, so SIDs are
  engine-local), `parent_tid`, `handles_semantic_ids`, `woken_by_semantic_id`.
  +6 LOC.
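
A minimal sketch of how a per-kind skip list can be applied during comparison follows. The four field names come from the entry above; the mapping of fields to kinds, and the helpers `comparable_payload` and `events_match`, are assumptions made for illustration, since the actual layout inside `diff_events.py` is not shown here.

```python
from typing import Dict, FrozenSet

# Assumed shape: event kind -> payload fields ignored when comparing engines.
# Which kind owns which field is a guess; only the field names are from the audit.
SKIP_PAYLOAD_FIELDS_BY_KIND: Dict[str, FrozenSet[str]] = {
    "handle.create": frozenset({"handle_semantic_id"}),
    "handle.destroy": frozenset({"handle_semantic_id"}),
    "thread.create": frozenset({"parent_tid"}),
    "wait.begin": frozenset({"handles_semantic_ids"}),
    "wait.end": frozenset({"woken_by_semantic_id"}),
}


def comparable_payload(event: dict) -> dict:
    """Return the payload with engine-local fields removed."""
    skip = SKIP_PAYLOAD_FIELDS_BY_KIND.get(event["kind"], frozenset())
    return {k: v for k, v in event.get("payload", {}).items() if k not in skip}


def events_match(a: dict, b: dict) -> bool:
    """Two events match when kind and the non-skipped payload fields agree."""
    return a["kind"] == b["kind"] and comparable_payload(a) == comparable_payload(b)


canary_wait = {"kind": "wait.begin", "payload": {
    "alertable": False, "handles_semantic_ids": ["e1f14feb316c28dd"],
    "timeout_ns": -1, "wait_type": "any"}}
ours_wait = {"kind": "wait.begin", "payload": {
    "alertable": False, "handles_semantic_ids": ["0000000000000000"],
    "timeout_ns": -1, "wait_type": "any"}}
print(events_match(canary_wait, ours_wait))  # True: only the SIDs differ
```

Fields outside the skip list, such as `timeout_ns`, still count toward the comparison in this sketch.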

## Bug found and fixed this session

**Pre-session bug**: canary's `t_tid_event_idx` was a host-thread-local global,
not a tid-keyed counter. When `AddHandle` runs from multiple host threads with
tid==0 (boot init + early XThread bootstrap before the guest tid is assigned),
each host thread had its own counter starting at 0, producing duplicate
`tid_event_idx` values within the tid=0 stream. The diff tool rejected the
file with "events out of order at index 8". Fixed by replacing the `thread_local`
with a tid-keyed `std::unordered_map` + mutex (matches ours's design).
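
The difference between the two counter designs can be reproduced in a few lines of Python. This is an analogy rather than the C++ code: `ThreadLocalCounter`, `TidKeyedCounter`, and `emit_from_two_host_threads` are hypothetical names, with `threading.local` standing in for `thread_local` and a locked dict standing in for the `std::unordered_map` + mutex.

```python
import threading
from collections import defaultdict
from typing import List


class ThreadLocalCounter:
    """Pre-session behaviour: one counter per host thread, whatever the tid."""

    def __init__(self) -> None:
        self._local = threading.local()

    def next_idx(self, tid: int) -> int:
        idx = getattr(self._local, "idx", 0)
        self._local.idx = idx + 1
        return idx


class TidKeyedCounter:
    """Fixed behaviour: one counter per tid, shared by all host threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._next = defaultdict(int)

    def next_idx(self, tid: int) -> int:
        with self._lock:
            idx = self._next[tid]
            self._next[tid] += 1
            return idx


def emit_from_two_host_threads(counter, per_thread: int = 3) -> List[int]:
    """Emit tid=0 events from two host threads; return the sorted indices."""
    out: List[int] = []
    out_lock = threading.Lock()

    def worker() -> None:
        for _ in range(per_thread):
            idx = counter.next_idx(0)
            with out_lock:
                out.append(idx)

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(out)


print(emit_from_two_host_threads(ThreadLocalCounter()))  # [0, 0, 1, 1, 2, 2]
print(emit_from_two_host_threads(TidKeyedCounter()))     # [0, 1, 2, 3, 4, 5]
```

The duplicated indices in the first result are the kind of input that an ordering check on `tid_event_idx` rejects.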
189
audit-runs/phase-c15a-schema-wiring/diff-cold-vs-cold.md
Normal file
@@ -0,0 +1,189 @@
# Phase A diff report

**This report is the output of Phase A's diff harness. Divergences
shown here are INPUT for Phase B (first-divergence localization),
not findings of Phase A.** Phase A's job is to make the harness
itself correct, not to analyze what it surfaces.

## Summary

| canary_tid | ours_tid | matched | canary_total | ours_total | first_divergence_at |
|---|---|---|---|---|---|
| 4 | 11 | 8 | 151690 | 9 | 8 |
| 6 | 1 | 102168 | 432396 | 108490 | 102168 |
| 7 | 2 | 30 | 32 | 32 | 30 |
| 12 | 7 | 2 | 27834 | 4 | 2 |
| 14 | 9 | 2 | 4733192 | 76 | 2 |
| 15 | 10 | 16 | 3610535 | 16 | - |
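
The `matched` and `first_divergence_at` columns can be read as the output of a prefix comparison over two per-thread event streams. The sketch below is an assumption about the general technique, not the harness's code: `event_key`, `matched_prefix`, and `ev` are hypothetical, and the real comparison also looks at payload fields beyond the export name.

```python
from typing import List, Optional, Tuple


def event_key(event: dict) -> Tuple[str, Optional[str]]:
    """Comparison key: event kind plus the export name, when the payload has one."""
    return event["kind"], event.get("payload", {}).get("name")


def matched_prefix(canary: List[dict], ours: List[dict]) -> Tuple[int, Optional[int]]:
    """Return (matched, first_divergence_at); the latter is None if none is found."""
    compared = min(len(canary), len(ours))
    for idx in range(compared):
        if event_key(canary[idx]) != event_key(ours[idx]):
            return idx, idx
    return compared, None


def ev(kind: str, name: Optional[str] = None) -> dict:
    """Build a minimal event for the example."""
    return {"kind": kind, "payload": {} if name is None else {"name": name}}


# Shaped like the canary_tid=12 / ours_tid=7 chain in this report.
canary = [ev("import.call", "KeWaitForSingleObject"),
          ev("kernel.call", "KeWaitForSingleObject"),
          ev("handle.create"),
          ev("wait.begin")]
ours = [ev("import.call", "KeWaitForSingleObject"),
        ev("kernel.call", "KeWaitForSingleObject"),
        ev("wait.begin"),
        ev("kernel.return", "KeWaitForSingleObject")]

print(matched_prefix(canary, ours))      # (2, 2)
print(matched_prefix(canary, ours[:2]))  # (2, None): shorter stream ends first
```

The second call mirrors the canary_tid=15 row, where the shorter stream ends before any mismatch is seen.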

## canary_tid=4 → ours_tid=11

First divergence at `tid_event_idx=8`: kind: canary='handle.create' ours='kernel.return'

**Pre-context (last 5 matching events):**
```
canary: [3] import.call KeSetEvent
ours: [3] import.call KeSetEvent
canary: [4] kernel.call KeSetEvent
ours: [4] kernel.call KeSetEvent
canary: [5] kernel.return KeSetEvent
ours: [5] kernel.return KeSetEvent
canary: [6] import.call KeWaitForMultipleObjects
ours: [6] import.call KeWaitForMultipleObjects
canary: [7] kernel.call KeWaitForMultipleObjects
ours: [7] kernel.call KeWaitForMultipleObjects
```

**Divergent event:**
```
canary: [8] handle.create sid=bcaf14d76932b128
ours: [8] kernel.return KeWaitForMultipleObjects
```

**Next event after the divergence (if any):**
```
canary: [9] handle.create sid=0760e947bacff199
ours: <end of stream>
```

**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1896894000, "kind": "handle.create", "payload": {"handle_semantic_id": "bcaf14d76932b128", "object_name": null, "object_type": 1, "raw_handle_id": "0xf800009c"}, "schema_version": 1, "tid": 4, "tid_event_idx": 8}
{"deterministic": true, "engine": "ours", "guest_cycle": 91, "host_ns": 1693823256, "kind": "kernel.return", "payload": {"name": "KeWaitForMultipleObjects", "return_value": 0, "side_effects": [], "status": "0x00000000"}, "schema_version": 1, "tid": 11, "tid_event_idx": 8}
```

## canary_tid=6 → ours_tid=1

First divergence at `tid_event_idx=102168`: kind: canary='kernel.return' ours='handle.destroy'

**Pre-context (last 5 matching events):**
```
canary: [102163] kernel.call XamTaskSchedule
ours: [102163] kernel.call XamTaskSchedule
canary: [102164] handle.create sid=097dca960c32feb2
ours: [102164] handle.create sid=b53a312c0ac30f49
canary: [102165] kernel.return XamTaskSchedule
ours: [102165] kernel.return XamTaskSchedule
canary: [102166] import.call XamTaskCloseHandle
ours: [102166] import.call XamTaskCloseHandle
canary: [102167] kernel.call XamTaskCloseHandle
ours: [102167] kernel.call XamTaskCloseHandle
```

**Divergent event:**
```
canary: [102168] kernel.return XamTaskCloseHandle
ours: [102168] handle.destroy sid=b53a312c0ac30f49
```

**Next event after the divergence (if any):**
```
canary: [102169] import.call KeWaitForSingleObject
ours: [102169] kernel.return XamTaskCloseHandle
```

**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1473555500, "kind": "kernel.return", "payload": {"name": "XamTaskCloseHandle", "return_value": 1, "side_effects": [], "status": "0x00000001"}, "schema_version": 1, "tid": 6, "tid_event_idx": 102168}
{"deterministic": true, "engine": "ours", "guest_cycle": 0, "host_ns": 495859368, "kind": "handle.destroy", "payload": {"handle_semantic_id": "b53a312c0ac30f49", "prior_refcount": 1, "raw_handle_id": "0x00001018"}, "schema_version": 1, "tid": 1, "tid_event_idx": 102168}
```

## canary_tid=7 → ours_tid=2

First divergence at `tid_event_idx=30`: kind: canary='handle.create' ours='wait.begin'

**Pre-context (last 5 matching events):**
```
canary: [25] import.call KeSetEvent
ours: [25] import.call KeSetEvent
canary: [26] kernel.call KeSetEvent
ours: [26] kernel.call KeSetEvent
canary: [27] kernel.return KeSetEvent
ours: [27] kernel.return KeSetEvent
canary: [28] import.call KeWaitForSingleObject
ours: [28] import.call KeWaitForSingleObject
canary: [29] kernel.call KeWaitForSingleObject
ours: [29] kernel.call KeWaitForSingleObject
```

**Divergent event:**
```
canary: [30] handle.create sid=e1f14feb316c28dd
ours: [30] wait.begin {'handles_semantic_ids': ['0000000000000000'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
```

**Next event after the divergence (if any):**
```
canary: [31] wait.begin {'handles_semantic_ids': ['e1f14feb316c28dd'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
ours: [31] kernel.return KeWaitForSingleObject
```

**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1475668700, "kind": "handle.create", "payload": {"handle_semantic_id": "e1f14feb316c28dd", "object_name": null, "object_type": 1, "raw_handle_id": "0xf800001c"}, "schema_version": 1, "tid": 7, "tid_event_idx": 30}
{"deterministic": true, "engine": "ours", "guest_cycle": 0, "host_ns": 496144562, "kind": "wait.begin", "payload": {"alertable": false, "handles_semantic_ids": ["0000000000000000"], "timeout_ns": -1, "wait_type": "any"}, "schema_version": 1, "tid": 2, "tid_event_idx": 30}
```

## canary_tid=12 → ours_tid=7

First divergence at `tid_event_idx=2`: kind: canary='handle.create' ours='wait.begin'

**Pre-context (last 5 matching events):**
```
canary: [0] import.call KeWaitForSingleObject
ours: [0] import.call KeWaitForSingleObject
canary: [1] kernel.call KeWaitForSingleObject
ours: [1] kernel.call KeWaitForSingleObject
```

**Divergent event:**
```
canary: [2] handle.create sid=750aad55e1061f0a
ours: [2] wait.begin {'handles_semantic_ids': ['0000000000000000'], 'timeout_ns': 429466729600, 'alertable': False, 'wait_type': 'any'}
```

**Next event after the divergence (if any):**
```
canary: [3] wait.begin {'handles_semantic_ids': ['750aad55e1061f0a'], 'timeout_ns': -30000000, 'alertable': False, 'wait_type': 'any'}
ours: [3] kernel.return KeWaitForSingleObject
```

**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1660019000, "kind": "handle.create", "payload": {"handle_semantic_id": "750aad55e1061f0a", "object_name": null, "object_type": 1, "raw_handle_id": "0xf8000068"}, "schema_version": 1, "tid": 12, "tid_event_idx": 2}
{"deterministic": true, "engine": "ours", "guest_cycle": 0, "host_ns": 528900173, "kind": "wait.begin", "payload": {"alertable": false, "handles_semantic_ids": ["0000000000000000"], "timeout_ns": 429466729600, "wait_type": "any"}, "schema_version": 1, "tid": 7, "tid_event_idx": 2}
```

## canary_tid=14 → ours_tid=9

First divergence at `tid_event_idx=2`: kind: canary='handle.create' ours='wait.begin'

**Pre-context (last 5 matching events):**
```
canary: [0] import.call KeWaitForSingleObject
ours: [0] import.call KeWaitForSingleObject
canary: [1] kernel.call KeWaitForSingleObject
ours: [1] kernel.call KeWaitForSingleObject
```

**Divergent event:**
```
canary: [2] handle.create sid=3df8ca649bf76cc8
ours: [2] wait.begin {'handles_semantic_ids': ['0000000000000000'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
```

**Next event after the divergence (if any):**
```
canary: [3] wait.begin {'handles_semantic_ids': ['3df8ca649bf76cc8'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
ours: [3] kernel.return KeWaitForSingleObject
```

**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1788314200, "kind": "handle.create", "payload": {"handle_semantic_id": "3df8ca649bf76cc8", "object_name": null, "object_type": 1, "raw_handle_id": "0xf8000098"}, "schema_version": 1, "tid": 14, "tid_event_idx": 2}
{"deterministic": true, "engine": "ours", "guest_cycle": 0, "host_ns": 1655554743, "kind": "wait.begin", "payload": {"alertable": false, "handles_semantic_ids": ["0000000000000000"], "timeout_ns": -1, "wait_type": "any"}, "schema_version": 1, "tid": 9, "tid_event_idx": 2}
```

## canary_tid=15 → ours_tid=10

No divergence within the 16 compared events (canary has 3610535, ours has 16).
@@ -0,0 +1,10 @@
{
  "instructions": 50000007,
  "imports": 40390,
  "unimpl": 0,
  "draws": 0,
  "swaps": 1,
  "unique_render_targets": 0,
  "shader_blobs_live": 0,
  "texture_cache_entries": 0
}
@@ -0,0 +1,10 @@
{
  "instructions": 50000007,
  "imports": 40390,
  "unimpl": 0,
  "draws": 0,
  "swaps": 1,
  "unique_render_targets": 0,
  "shader_blobs_live": 0,
  "texture_cache_entries": 0
}
@@ -0,0 +1,10 @@
{
  "instructions": 50000007,
  "imports": 40390,
  "unimpl": 0,
  "draws": 0,
  "swaps": 1,
  "unique_render_targets": 0,
  "shader_blobs_live": 0,
  "texture_cache_entries": 0
}
121
audit-runs/phase-c15a-schema-wiring/new-divergences.md
Normal file
@@ -0,0 +1,121 @@
# Phase C+15-α: New Divergence Catalog (2026-05-14)

Surfaced by the schema-v1.1 wiring of `handle.create/destroy`,
`thread.create/exit`, `wait.begin` in both engines.

## Cold-vs-cold matched-prefix table (post-wiring)

| canary_tid | ours_tid | matched | first_divergence_at | divergence kind |
|------------|----------|---------|---------------------|-----------------|
| 6 | 1 | 102,168 | 102,168 | extra `handle.destroy` in ours (XamTaskCloseHandle refcount mismatch) |
| 15 | 10 | 16 | - | no divergence in 16 events (canary 3.6M, ours stalls) |
| 7 | 2 | 30 | 30 | KeWaitForSingleObject native-obj handle (class E) |
| 4 | 11 | 8 | 8 | KeWaitForMultipleObjects native-obj handle (class E) |
| 12 | 7 | 2 | 2 | KeWaitForSingleObject native-obj handle (class E) |
| 14 | 9 | 2 | 2 | KeWaitForSingleObject native-obj handle (class E) |

Main matched prefix dropped from **104,574 (C+13/C+14)** to **102,168**, a
regression of 2,406 events. This is the expected outcome: state divergences
that were previously invisible are now visible.

## Cataloged divergences (priority-ordered for future iterate)

### D-1 (HIGH): main chain idx=102,168, extra `handle.destroy` on `XamTaskCloseHandle`

- **Chain**: canary tid=6 ↔ ours tid=1.
- **Event**:
  - ours: `handle.destroy sid=b53a312c0ac30f49` then `kernel.return XamTaskCloseHandle return=1`
  - canary: `kernel.return XamTaskCloseHandle return=1` (no `handle.destroy`)
- **Hypothesis**: Ours's `xam_task_close_handle` (xam.rs:300-344) decrements
  the refcount and destroys the handle when it reaches 0. Canary's
  `XamTaskCloseHandle_entry` → `NtClose` → `ObjectTable::ReleaseHandle` applies
  the same rule, but canary's spawned thread keeps an additional ref on the
  thread handle (`object->Retain()` in `XThread::Create` line 408 via
  `RetainHandle()`), so the close does not bring the count to 0. Ours's
  refcount of 1 at this point is wrong; it should be 2 (user ref +
  spawned-thread ref), so ours destroys prematurely.
- **Impact**: causes downstream divergences; the spawned thread is left with a
  dangling handle reference.
- **Fix scope**: ~20 LOC in `xam_task_schedule` / `ex_create_thread`. Add an
  explicit `state.handle_refcount[handle] += 1` after spawn for the
  XThread's own ref. Verify against canary's `RetainHandle()` semantics.
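
The D-1 hypothesis can be checked against a toy refcount model. This is a sketch under the stated hypothesis, not code from either engine: `RefcountedHandles` and its `thread_holds_ref` parameter are hypothetical, and the handle value is taken from the raw event only for flavour.

```python
from typing import Dict, List, Tuple


class RefcountedHandles:
    """Toy model: a handle is destroyed only when its refcount reaches 0."""

    def __init__(self) -> None:
        self.refcount: Dict[int, int] = {}
        self.events: List[Tuple[str, ...]] = []

    def schedule_task(self, handle: int, thread_holds_ref: bool) -> None:
        # One ref for the caller, plus one if the spawned thread retains its handle.
        self.refcount[handle] = 2 if thread_holds_ref else 1

    def close_handle(self, handle: int) -> None:
        prior = self.refcount[handle]
        self.refcount[handle] = prior - 1
        if self.refcount[handle] == 0:
            del self.refcount[handle]
            self.events.append(("handle.destroy", f"prior_refcount={prior}"))
        self.events.append(("kernel.return", "XamTaskCloseHandle"))


ours_like = RefcountedHandles()
ours_like.schedule_task(0x1018, thread_holds_ref=False)
ours_like.close_handle(0x1018)

canary_like = RefcountedHandles()
canary_like.schedule_task(0x1018, thread_holds_ref=True)
canary_like.close_handle(0x1018)

print(ours_like.events)    # destroy, then return
print(canary_like.events)  # return only
```

With the extra ref the close produces only `kernel.return`, matching the canary side of the divergent event; without it, a `handle.destroy` with `prior_refcount=1` appears first, as in ours.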

### D-2 (HIGH): chain canary tid=4 / ours tid=11, ours stops at idx=8

- **Chain**: canary tid=4 ↔ ours tid=11.
- **Event**:
  - ours: `kernel.return KeWaitForMultipleObjects status=0` at idx=8, then
    stream ends (9 total events).
  - canary: `handle.create sid=bcaf14d76932b128 (Event)` at idx=8, then
    `handle.create sid=0760e947bacff199` at idx=9, then continues to a total
    of 151,690 events.
- **Hypothesis (class E asymmetry)**: Canary's `KeWaitForMultipleObjects_entry`
  iterates the object pointer array and calls
  `XObject::GetNativeObject<XObject>(kernel_state, object_ptr, -1, true)`
  for each. When the object has not yet been wrapped in an `XObject*`, this
  CREATES a new XObject (and thus a new handle). Ours's `do_wait_multiple`
  uses `resolve_pseudo_handle`, which does NOT create a new XObject; it
  looks up the existing handle. The "handle for the native dispatcher object"
  is an engine-architectural difference: canary lazily wraps,
  ours pre-registers.
- **Impact**: every Ke*Wait* that takes object pointers (not handles) creates
  N extra `handle.create` events on the canary side. Ours emits none.
- **Fix scope**: this is class E (intentional asymmetry). Recommended action:
  add `Ke{Wait,Set,Reset,...}*Object*` exports that take object pointers to a
  diff-tool **suppress-handle-create-side-effect** list, OR have ours emit
  a synthetic `handle.create` when `resolve_pseudo_handle` first encounters
  a new pointer. The latter aligns better with canary's view. ~30-50 LOC.
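
The first option in the D-2 fix scope, a diff-tool suppression list, could be sketched as a pre-filter over one thread's stream. Everything here is hypothetical: the list name `SUPPRESS_HANDLE_CREATE_IN`, the function `suppress_lazy_handle_creates`, the assumption that `kernel.call` payloads carry a `name` field, and the trailing canary `kernel.return` in the example, which is not part of the excerpted report.

```python
from typing import List, Optional

# Hypothetical suppression list: exports whose lazy object wrapping creates
# handles on the canary side only.
SUPPRESS_HANDLE_CREATE_IN = frozenset({
    "KeWaitForSingleObject",
    "KeWaitForMultipleObjects",
})


def suppress_lazy_handle_creates(events: List[dict]) -> List[dict]:
    """Drop handle.create events emitted while a listed export is executing."""
    out: List[dict] = []
    active: Optional[str] = None  # listed export currently between call and return
    for event in events:
        kind = event["kind"]
        name = event.get("payload", {}).get("name")
        if kind == "kernel.call" and name in SUPPRESS_HANDLE_CREATE_IN:
            active = name
        elif kind == "kernel.return" and active is not None and name == active:
            active = None
        elif kind == "handle.create" and active is not None:
            continue  # side effect of lazy wrapping: hide it from the diff
        out.append(event)
    return out


canary_stream = [
    {"kind": "import.call", "payload": {"name": "KeWaitForMultipleObjects"}},
    {"kind": "kernel.call", "payload": {"name": "KeWaitForMultipleObjects"}},
    {"kind": "handle.create", "payload": {"handle_semantic_id": "bcaf14d76932b128"}},
    {"kind": "handle.create", "payload": {"handle_semantic_id": "0760e947bacff199"}},
    # Assumed for the example; the report excerpt stops before this event.
    {"kind": "kernel.return", "payload": {"name": "KeWaitForMultipleObjects"}},
]
filtered = suppress_lazy_handle_creates(canary_stream)
print([e["kind"] for e in filtered])
```

A real filter would also have to renumber `tid_event_idx` after dropping events, and it hides genuine handle creations inside the listed exports, which is one reason the synthetic-emit option may be preferable.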

### D-3 (HIGH): same class on chains 7→2 (idx=30), 12→7 (idx=2), 14→9 (idx=2)

Same root cause as D-2: `KeWaitForSingleObject` with a raw object pointer.
Canary's `xeKeWaitForSingleObject` calls `GetNativeObject`, which creates a
handle for the dispatcher; ours's `resolve_pseudo_handle` does not.

Group all 4 chains (these three plus the D-2 chain) under the single D-2 fix.

### D-4 (MEDIUM): wait.begin SID `0000000000000000` on tid=10 of ours

- **Chain**: canary tid=15 ↔ ours tid=10 (the only thread where the prefix
  didn't regress, but ours stalls at idx=16).
- **Event** at idx=2: both engines emit `wait.begin`, but ours's
  `handles_semantic_ids = ["0000000000000000"]` while canary's is real.
- **Hypothesis**: SID = 0 means `lookup_handle_semantic_id` returned 0 (handle
  not registered). The handle being waited on must have been created before
  the event_log SID registry was active (during boot / init), OR it's a
  pseudo-handle from `resolve_pseudo_handle`. Pseudo-handles aren't real
  handles in our model.
- **Fix scope**: when `lookup_handle_semantic_id(h) == 0`, lazy-emit a
  synthetic `handle.create` for `h` (with a default object_type per
  `state.objects[h]`'s schema kind). Aligns with D-2 fix. ~10 LOC.
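
The lazy-emit idea in the D-4 fix scope amounts to a lookup that registers on a miss. The sketch below is a Python illustration only: `SidRegistry`, `lookup_or_emit`, the `synthetic` marker field, and the counter-based SID are assumptions, not the Rust helpers named in the audit.

```python
from typing import Dict, List


class SidRegistry:
    """Toy SID registry where 0 means 'handle not registered'."""

    def __init__(self) -> None:
        self._sids: Dict[int, int] = {}
        self._next_sid = 1
        self.events: List[dict] = []

    def lookup(self, handle: int) -> int:
        """Return the SID for a handle, or 0 if it was never registered."""
        return self._sids.get(handle, 0)

    def lookup_or_emit(self, handle: int, object_type: int) -> int:
        """On a miss, register the handle and emit a synthetic handle.create."""
        sid = self.lookup(handle)
        if sid == 0:
            sid = self._next_sid          # hypothetical SID scheme
            self._next_sid += 1
            self._sids[handle] = sid
            self.events.append({
                "kind": "handle.create",
                "synthetic": True,        # marker field is an assumption
                "handle_semantic_id": f"{sid:016x}",
                "object_type": object_type,
            })
        return sid


registry = SidRegistry()
first = registry.lookup_or_emit(0x1018, object_type=1)
second = registry.lookup_or_emit(0x1018, object_type=1)
print(first, second, len(registry.events))  # 1 1 1
```

Only the first wait on an unregistered handle emits, so repeated waits do not flood the stream with synthetic events.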

### D-5 (LOW): chains 7→2, 12→7, 14→9, ours streams truncated

- Ours's tid=2/7/9/10 streams are 32/4/76/16 events long; canary's are
  32/27,834/4,733,192/3,610,535. Ours's worker threads stall early.
- **Hypothesis**: Downstream of D-2 / D-1. Once the main thread or peer
  workers diverge, downstream threads block on signals that never come.
- **Fix scope**: deferred until D-1/D-2 land; likely no separate fix needed.

## Acceptance gate status

- **Gate 1 (default-off digest)**: PASS. 3× reproducible at
  `e1dfcb1559f987b35012a7f2dc6d93f5` (unchanged from C+13 baseline).
- **Gate 2 (cvar-on emit)**: PASS. Canary and ours produce 14M+ and 121K
  events respectively; JSONL parses cleanly; all new kinds present.
- **Gate 3 (diff tool)**: PASS. Diff tool consumes new kinds, produces
  6-chain divergence report. Cross-engine SID skip-comparison documented in
  `SKIP_PAYLOAD_FIELDS_BY_KIND`.
- **Gate 4 (cold-vs-cold)**: PASS (with regression as designed). Main chain
  prefix 104,574 → 102,168 (-2,406 events). Divergence catalog produced.
- **Gate 5 (build clean)**: PASS. Canary + ours both build.
- **Gate 6 (tests)**: PASS. 181 → 181 passing (no new tests added; existing
  unchanged).

## Reading-error class avoided

**Class #29 (per-host-thread `tid_event_idx` counter for shared synthetic tids)**:
canary's pre-session `thread_local uint64_t t_tid_event_idx` was correct for
guest-tid events (1 tid : 1 host_thread) but broken for boot-time emissions
with `tid=0` because boot init runs on multiple host threads. Symptom: the
diff tool rejected the canary log with "events out of order at index 8".
Fixed via tid-keyed global map (matches ours's design).