handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
134
audit-runs/phase-c17-keWait-native-object/broad-impact.md
Normal file
@@ -0,0 +1,134 @@
# Phase C+17 - Broad-impact catalog (2026-05-14)

The C+17 fix touches a widely-used primitive (`ensure_dispatcher_object`,
called by `Ke{Wait,Set,Reset,Pulse}Event`, `Ke{Wait,Release}Semaphore`, etc.).
This catalog enumerates the divergences surfaced post-fix, per chain.

## Resolved (3 of 5 catalogued in C+15-α)

### D-2 / D-3 / D-4 - KeWait*ForSingleObject native-obj handle (all 5 chains)

Class E asymmetry. Canary's `xeKeWaitForSingleObject` /
`KeWaitForMultipleObjects_entry` calls `XObject::GetNativeObject`, which
emits `handle.create` for the synthesized wrapper; ours's
`ensure_dispatcher_object` did the same shadow synthesis but never emitted
the schema event. Fix: emit `handle.create` (with the appropriate
`object_type` from `KernelObject::schema_object_type`) on first
adoption, and register the SID so subsequent `wait.begin` events resolve
non-zero `handles_semantic_ids[]`.

Observed: all 5 chains' divergences move past the wait-begin idx that was
previously blocked at SID=0.

## Advanced

### Main tid=6→1 (+382)

102,171 → 102,553. The 382 newly matching events between the two indexes are
mostly `kernel.{call,return}`, `import.call`, `RtlEnter/LeaveCriticalSection`,
plus the now-aligned `handle.create`+`wait.begin` pairs from
`KeWaitForSingleObject` and `KeWaitForMultipleObjects` calls. Several
new shadow `handle.create` events fire on first encounter of
specific PKEVENT/PKSEMAPHORE pointers in the game's init path.

### Sister chains (+3 / +2 / +1 / +39)

- tid=4→11 +3: matches all 11 events that ours emits.
- tid=7→2 +2: matches all 32 compared events.
- tid=12→7 +1: matches through `handle.create` at idx=2.
- tid=14→9 +39: walks past all the now-aligned `KeWait*` framing into the
  audio subsystem.

## Persisted (pre-existing bugs unaffected)

None of the C+15-α catalog's other groups are touched.

## NEW divergences (cataloged for future iterates)

### D-NEW-1 (HIGH) - main idx=102,553: `NtDuplicateObject` no `handle.create`

Canary's `NtDuplicateObject_entry` → `ObjectTable::DuplicateHandle`
allocates a new slot via `AddHandle(object, &new_handle)`
(util/object_table.cc:148-201), which fires the C+15-α-wired
`phase_a::EmitHandleCreateAuto`. Ours's `nt_duplicate_object`
(in exports.rs) implements the AUDIT-062 alias-on-dup
semantics: `dup_id = source_id`, i.e. a refcount-bumped re-use of the same
slot. No new `handle.create` fires.

This is a genuine engine-architectural difference. Mirror options:
- (a) Make ours allocate a fresh handle on `NtDuplicateObject` and emit
  `handle.create` (mirror canary). ~30-40 LOC; the downstream impact on
  every existing AUDIT-062-dependent code path needs an audit.
- (b) Have the diff tool suppress this `handle.create` site. Band-aid.

Recommendation: (a). C+18 target. Trade-off: AUDIT-062's "alias on dup"
was implemented to handle a specific worker-cluster handle-aliasing
issue; undoing it may surface a different regression. The risk
profile is similar to C+15-α: invisible state divergences become
visible. ~30 LOC fix or ~30 LOC tactical revert.

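
The difference between the two duplicate strategies can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in exports.rs: `HandleTable`, `alloc`, `duplicate_alias`, and `duplicate_fresh` are hypothetical names, the handle step size is assumed, and the object payload is omitted so that only the id, the refcount, and the emit behaviour are modelled.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified handle table; not the real `KernelState`.
struct HandleTable {
    next_handle: u32,
    refcount: HashMap<u32, u32>,
    /// Stand-in for the event log: raw ids of emitted `handle.create` events.
    emitted_creates: Vec<u32>,
}

impl HandleTable {
    fn new() -> Self {
        Self {
            next_handle: 0x1000, // start value taken from the notes
            refcount: HashMap::new(),
            emitted_creates: Vec::new(),
        }
    }

    /// Allocate a fresh slot and record a `handle.create` (canary-like AddHandle).
    fn alloc(&mut self) -> u32 {
        let id = self.next_handle;
        self.next_handle += 1; // step size is an assumption
        self.refcount.insert(id, 1);
        self.emitted_creates.push(id);
        id
    }

    /// Alias-on-dup (AUDIT-062 style): same id, refcount bump, no emit.
    fn duplicate_alias(&mut self, source: u32) -> Option<u32> {
        let count = self.refcount.get_mut(&source)?;
        *count += 1;
        Some(source)
    }

    /// Option (a): a fresh slot per duplicate, which records `handle.create`.
    fn duplicate_fresh(&mut self, source: u32) -> Option<u32> {
        if !self.refcount.contains_key(&source) {
            return None; // unknown source handle
        }
        Some(self.alloc())
    }
}
```

With aliasing the emit count stays flat across a duplicate; with a fresh slot it grows by one, which is the event canary logs at idx=102,553.
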
### D-NEW-2 (MEDIUM) - tid=12→7 idx=3: `wait.begin.timeout_ns` mismatch

```
canary: wait.begin handles_semantic_ids=[SID-A] timeout_ns=-30000000
ours: wait.begin handles_semantic_ids=[SID-B] timeout_ns=429466729600
```

The SIDs differ (skipped per diff policy). The `timeout_ns` is the issue:
canary logs a 30 ms relative timeout; ours logs a positive value of about
429.47 s, which reads as an absolute-time encoding. A first suspect is
ours's `decode_timeout_ns` (exports.rs:4890), which returns the raw
`mem.read_u64(timeout_ptr) as i64 * 100` without applying the
"negative=relative / positive=absolute" semantics. However, canary's
threading.cc emit code also passes `(*timeout_ptr) * 100` directly without
sign conversion, so the divergence is more likely upstream, in how each engine
**writes** the TIMEOUT* struct. Probably ε-class (game-side state
encoding).

C+19 target estimate. ~10-30 LOC investigation.

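
A quick arithmetic check narrows the search. The value ours logs equals (2^32 − 300000) × 100, which is what results when the low 32 bits of −300000 (canary's 30 ms relative timeout in 100 ns ticks) are zero-extended to 64 bits before scaling. That is consistent with the high dword of the guest timeout being lost or read as zero somewhere on ours's path, though which read or write is responsible is not established here. A minimal sketch, with hypothetical helper names (`ticks_to_ns`, `zero_extended_low32`) that are not from exports.rs:

```rust
/// Convert a raw NT timeout (100 ns ticks, negative = relative) to nanoseconds.
fn ticks_to_ns(raw_ticks: i64) -> i64 {
    raw_ticks * 100
}

/// Hypothetical failure mode: only the low 32 bits survive and are
/// zero-extended instead of sign-extended.
fn zero_extended_low32(raw_ticks: i64) -> i64 {
    (raw_ticks as u32) as i64
}
```

Feeding −300000 through `ticks_to_ns` gives canary's value; feeding it through `zero_extended_low32` first gives the value ours logs.
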
### D-NEW-3 (LOW) - tid=15→10 idx=2: `handle.create` ordering on shared dispatcher

Canary's `GetNativeObject` is **process-global**: once any thread adopts
a dispatcher pointer (stashing `kXObjSignature` in the wait_list), all
subsequent threads find the existing handle and do NOT re-emit. Canary's
`handle.create` for the semaphore at guest pointer `0x828a3230` (XAudio
voice volume changemask?) was emitted earlier on a different thread; on tid=15
the first wait skips straight to `wait.begin`.

Ours's `ensure_dispatcher_object` is also process-global (the `state.objects`
map is shared in `KernelState`). However, the **timing of first adoption**
differs because thread interleaving / boot ordering between the two engines
isn't bit-identical. Ours's tid=10 happens to be the first to touch
`0x828a3230`, so it emits `handle.create` at idx=2; canary's tid=15
arrived after another canary thread (probably tid=6 or tid=10) had already
adopted it.

This is a **timing-induced ordering** divergence, not a state-model
asymmetry. It's the inverse of the typical D-1/D-2 class: both engines
emit the SAME total number of `handle.create` events; the issue is which
thread happens to be the "first toucher". The diff tool currently treats
this as a divergence because it compares per-tid sequences strictly.

Three possible mitigations:
- (a) Diff-tool: relax ordering for `handle.create` emits when the
  "next thread" event is `wait.begin` on the same dispatcher. Complex.
- (b) Suppress `handle.create` from the per-thread sequence entirely;
  treat it as a global emit and only diff `wait.begin` SIDs against a
  process-global SID-registry. Could work via a `SKIP_PAYLOAD_FIELDS_BY_KIND`
  extension to drop the event from per-tid alignment.
- (c) Live with the +0/-14 trade-off on tid=15→10; the main chain
  improvement dwarfs it.

Recommendation: (c) for now; C+20+ if the chain becomes load-bearing.

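
To make mitigation (b) concrete, here is a minimal sketch of how dropping `handle.create` before alignment changes the matched prefix. It models events by kind only and is not the actual diff tool; `matched_prefix` and `without_handle_create` are hypothetical names, and payload comparison (including the SID-skip policy) is left out.

```rust
/// Length of the common prefix of two per-thread event-kind sequences.
fn matched_prefix(canary: &[&str], ours: &[&str]) -> usize {
    canary
        .iter()
        .zip(ours.iter())
        .take_while(|(c, o)| c == o)
        .count()
}

/// Mitigation (b): drop `handle.create` before per-tid alignment.
fn without_handle_create<'a>(events: &[&'a str]) -> Vec<&'a str> {
    events
        .iter()
        .copied()
        .filter(|kind| *kind != "handle.create")
        .collect()
}
```

Applied to the first four events of each side on tid=15→10, the strict comparison stops at 2 while the filtered comparison continues through `wait.begin`.
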
## Reading-error register

- **Reading-error #28 (verify framing first)**: FOLLOWED. Canary's
  `GetNativeObject` was read end-to-end before any code change.
- **Reading-error #23 (widely-used primitive flip)**: MITIGATED. Cold-vs-cold
  gate caught no main-chain regression; minor sister-chain regression on
  tid=15→10 is documented as NEW-3.
- **Reading-error #19 (host-side emits)**: FOLLOWED. `event_log::is_enabled()`
  guards every new emit; the default-off cost is one relaxed atomic-bool
  check (effectively zero when disabled).

108
audit-runs/phase-c17-keWait-native-object/cold-vs-cold-result.md
Normal file
@@ -0,0 +1,108 @@
# Phase C+17 cold-vs-cold result (2026-05-14)

## Matched-prefix table

| canary_tid | ours_tid | C+16    | C+17    | delta    | first_divergence_at | kind |
|------------|----------|---------|---------|----------|---------------------|------|
| 6          | 1        | 102,171 | 102,553 | **+382** | 102,553             | `NtDuplicateObject` no `handle.create` (NEW-1) |
| 4          | 11       | 8       | 11      | **+3**   | -                   | no divergence in 11 events (ours stalls) |
| 7          | 2        | 30      | 32      | **+2**   | -                   | no divergence in 32 events |
| 12         | 7        | 2       | 3       | **+1**   | 3                   | `timeout_ns` differs in `wait.begin` (NEW-2) |
| 14         | 9        | 2       | 41      | **+39**  | 41                  | unrelated `XAudioGetVoiceCategoryVolumeChangeMask` |
| 15         | 10       | 16      | 2       | **-14**  | 2                   | ordering: ours emits `handle.create` on first thread-touch of shared dispatcher (NEW-3) |

**Main chain advanced +382** (D-2/D-3/D-4 root cause resolved). 4 of 5 sister
chains advanced. The tid=15→10 chain regressed by 14 events due to a
cross-thread-caching ordering side-effect (see broad-impact.md / NEW-3). The
underlying state model is the same in both engines, so the regression is
"observation-side": canary's `GetNativeObject` is process-global, so the
adoption happens on whichever thread touches the dispatcher first.

## New first divergence on main (idx=102,553)

```
canary: [102551] import.call NtDuplicateObject
ours: [102551] import.call NtDuplicateObject
canary: [102552] kernel.call NtDuplicateObject
ours: [102552] kernel.call NtDuplicateObject
canary: [102553] handle.create sid=df686b147b291902 (object_type=1)
ours: [102553] kernel.return NtDuplicateObject
canary: [102554] kernel.return NtDuplicateObject
ours: [102554] import.call RtlEnterCriticalSection
```

Canary's `NtDuplicateObject_entry` calls `ObjectTable::DuplicateHandle`, which
fires `AddHandle` for the new slot, emitting `handle.create`. Ours's
`nt_duplicate_object` short-circuits via handle aliasing (AUDIT-062's
`dup_id=source_id` design) and does NOT emit a new `handle.create`. This is
**D-NEW-1 HIGH**, the first C+18 target.

## Acceptance gates

- **Gate 1 (default-off digest)**: PASS. 3× reproducible at
  `e1dfcb1559f987b35012a7f2dc6d93f5` (unchanged from C+13/C+15-α/C+16
  baseline). The fix is observation-only at the digest level; the new
  shadow-handle refcount entries do not feed back into guest behavior
  inside the 50M-instruction window.
- **Gate 2 (cvar-on emit)**: PASS. Ours 121,544 events (was 121,537 in
  C+16, +7 from new lazy `handle.create` emits in the main chain
  bring-up); canary 3,059,463 events in ~90s. Both JSONL parse cleanly.
- **Gate 3 (diff tool)**: PASS. Diff tool produces 6-chain report with
  the new SID-skip semantics for `wait.begin.handles_semantic_ids`.
- **Gate 4 (cold-vs-cold)**: PASS. Main matched prefix advances
  102,171 → 102,553 (+382). 4 of 5 sister chains advance; 1 minor
  regression on tid=15→10 (NEW-3, observation-side).
- **Gate 5 (build clean)**: PASS. `cargo build --release` clean
  (1 pre-existing dead_code warning unrelated).
- **Gate 6 (tests)**: PASS. 186 → 191 (added 5 new lifecycle tests for
  `ensure_dispatcher_object`; all pass + entire workspace green).
- **Gate 7 (Phase B image hash)**: PASS. `image_loaded_sha256` =
  `ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18`
  (unchanged).
- **Gate 8 (event-log determinism)**: PASS. `handle.create` event
  stream (post-strip of `host_ns`) is bit-identical across 3 cold
  runs: md5 `0bd91b4c61dea52d72859e7d9c3541ba`.

## Sister-chain analysis

All 5 sister chains' first divergences are no longer "wait.begin with SID=0":

- tid=4→11: was `KeWaitForMultipleObjects` at idx=8 with empty SIDs;
  now goes 11 events deep with NO divergence (ours stalls, but for
  reasons unrelated to D-2/D-3/D-4).
- tid=7→2: was `KeWaitForSingleObject` at idx=30 with SID=0; now 32
  events with NO divergence.
- tid=12→7: was at idx=2 with SID=0; now idx=3. The `handle.create`
  matches (SID skipped per diff-tool policy); the divergence is now a
  `timeout_ns` mismatch (-30000000 vs 429466729600), a real
  game-side wait-quantum mismatch.
- tid=14→9: was at idx=2 with SID=0; now idx=41. It reached a real
  `XAudioGetVoiceCategoryVolumeChangeMask` divergence (sister-chain
  audio export the boot doesn't reach in ours).
- tid=15→10: was at idx=16 (no divergence in 16 events); now idx=2
  diverges because ours emits `handle.create` on this thread's first
  touch of a globally-shared semaphore dispatcher at `0x828a3230`,
  while canary emitted it earlier on another thread. Observation-side
  ordering issue; underlying state model is the same. See NEW-3 in
  broad-impact.md.

## Refcount leak risk audit

The fix seeds `state.handle_refcount[ptr] = 1` for each first-touch shadow.
Three concerns and mitigations:

1. **Leak risk**: no code path currently destroys these shadows
   (`ensure_dispatcher_object` adoptions). Canary's design has the same
   property: `GetNativeObject`-synthesized `XObject`s survive until
   process exit. No leak relative to canary's behavior.
2. **Double-bump risk**: the early-return guard at the top of
   `ensure_dispatcher_object` (`state.objects.contains_key(&ptr)`)
   ensures the refcount entry is initialized exactly once per pointer.
   Test `ensure_dispatcher_object_is_idempotent_on_repeated_touch`
   verifies this.
3. **Refcount underflow risk**: if a future change wires
   `handle.destroy` on shadow removal (e.g., when `NtClose` is
   somehow called on a guest dispatcher pointer), the refcount must
   not underflow. The `or_insert(1)` form preserves any pre-existing
   refcount. Such an entry could only exist if the same value had
   previously been allocated via `alloc_handle_for`, which should not
   happen because `next_handle` starts at `0x1000` and pointers live
   above `0x1_0000`.

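
The point in item 3 rests on the standard `HashMap` entry API: `entry(k).or_insert(v)` only writes when the key is absent, whereas `insert(k, v)` overwrites. A minimal sketch, where `seed_refcount` is a hypothetical helper and the map is a stand-in for `state.handle_refcount`:

```rust
use std::collections::HashMap;

/// Seed the refcount for a first-touch shadow without clobbering an
/// existing entry; returns the refcount now stored for `ptr`.
fn seed_refcount(handle_refcount: &mut HashMap<u32, u32>, ptr: u32) -> u32 {
    *handle_refcount.entry(ptr).or_insert(1)
}
```

If an entry with a higher count already exists, it is returned unchanged, so a later decrement path would not start from an artificially reset value.
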
159
audit-runs/phase-c17-keWait-native-object/diff-cold-vs-cold.md
Normal file
@@ -0,0 +1,159 @@
# Phase A diff report

**This report is the output of Phase A's diff harness. Divergences
shown here are INPUT for Phase B (first-divergence localization),
not findings of Phase A.** Phase A's job is to make the harness
itself correct, not to analyze what it surfaces.

## Summary

| canary_tid | ours_tid | matched | canary_total | ours_total | first_divergence_at |
|---|---|---|---|---|---|
| 4 | 11 | 11 | 39205 | 11 | - |
| 6 | 1 | 102553 | 330781 | 108490 | 102553 |
| 7 | 2 | 32 | 32 | 33 | - |
| 12 | 7 | 3 | 6014 | 5 | 3 |
| 14 | 9 | 41 | 1049995 | 77 | 41 |
| 15 | 10 | 2 | 713071 | 17 | 2 |

## canary_tid=4 → ours_tid=11

No divergence within the 11 compared events (canary has 39205, ours has 11).

## canary_tid=6 → ours_tid=1

First divergence at `tid_event_idx=102553`: kind: canary='handle.create' ours='kernel.return'

**Pre-context (last 5 matching events):**
```
canary: [102548] import.call RtlLeaveCriticalSection
ours: [102548] import.call RtlLeaveCriticalSection
canary: [102549] kernel.call RtlLeaveCriticalSection
ours: [102549] kernel.call RtlLeaveCriticalSection
canary: [102550] kernel.return RtlLeaveCriticalSection
ours: [102550] kernel.return RtlLeaveCriticalSection
canary: [102551] import.call NtDuplicateObject
ours: [102551] import.call NtDuplicateObject
canary: [102552] kernel.call NtDuplicateObject
ours: [102552] kernel.call NtDuplicateObject
```

**Divergent event:**
```
canary: [102553] handle.create sid=df686b147b291902
ours: [102553] kernel.return NtDuplicateObject
```

**Next event after the divergence (if any):**
```
canary: [102554] kernel.return NtDuplicateObject
ours: [102554] import.call RtlEnterCriticalSection
```

**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1393604400, "kind": "handle.create", "payload": {"handle_semantic_id": "df686b147b291902", "object_name": null, "object_type": 1, "raw_handle_id": "0xf8000044"}, "schema_version": 1, "tid": 6, "tid_event_idx": 102553}
{"deterministic": true, "engine": "ours", "guest_cycle": 5398419, "host_ns": 473009661, "kind": "kernel.return", "payload": {"name": "NtDuplicateObject", "return_value": 0, "side_effects": [], "status": "0x00000000"}, "schema_version": 1, "tid": 1, "tid_event_idx": 102553}
```

## canary_tid=7 → ours_tid=2

No divergence within the 32 compared events (canary has 32, ours has 33).

## canary_tid=12 → ours_tid=7

First divergence at `tid_event_idx=3`: payload.timeout_ns: canary=-30000000 ours=429466729600

**Pre-context (last 5 matching events):**
```
canary: [0] import.call KeWaitForSingleObject
ours: [0] import.call KeWaitForSingleObject
canary: [1] kernel.call KeWaitForSingleObject
ours: [1] kernel.call KeWaitForSingleObject
canary: [2] handle.create sid=750aad55e1061f0a
ours: [2] handle.create sid=b6ff5e6c9ca50ba1
```

**Divergent event:**
```
canary: [3] wait.begin {'handles_semantic_ids': ['750aad55e1061f0a'], 'timeout_ns': -30000000, 'alertable': False, 'wait_type': 'any'}
ours: [3] wait.begin {'handles_semantic_ids': ['b6ff5e6c9ca50ba1'], 'timeout_ns': 429466729600, 'alertable': False, 'wait_type': 'any'}
```

**Next event after the divergence (if any):**
```
canary: [4] kernel.return KeWaitForSingleObject
ours: [4] kernel.return KeWaitForSingleObject
```

**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1543612200, "kind": "wait.begin", "payload": {"alertable": false, "handles_semantic_ids": ["750aad55e1061f0a"], "timeout_ns": -30000000, "wait_type": "any"}, "schema_version": 1, "tid": 12, "tid_event_idx": 3}
{"deterministic": true, "engine": "ours", "guest_cycle": 0, "host_ns": 497636971, "kind": "wait.begin", "payload": {"alertable": false, "handles_semantic_ids": ["b6ff5e6c9ca50ba1"], "timeout_ns": 429466729600, "wait_type": "any"}, "schema_version": 1, "tid": 7, "tid_event_idx": 3}
```

## canary_tid=14 → ours_tid=9

First divergence at `tid_event_idx=41`: payload.ord: canary=503 ours=293

**Pre-context (last 5 matching events):**
```
canary: [36] kernel.call KeReleaseSpinLockFromRaisedIrql
ours: [36] kernel.call KeReleaseSpinLockFromRaisedIrql
canary: [37] kernel.return KeReleaseSpinLockFromRaisedIrql
ours: [37] kernel.return KeReleaseSpinLockFromRaisedIrql
canary: [38] import.call KfLowerIrql
ours: [38] import.call KfLowerIrql
canary: [39] kernel.call KfLowerIrql
ours: [39] kernel.call KfLowerIrql
canary: [40] kernel.return KfLowerIrql
ours: [40] kernel.return KfLowerIrql
```

**Divergent event:**
```
canary: [41] import.call XAudioGetVoiceCategoryVolumeChangeMask
ours: [41] import.call RtlEnterCriticalSection
```

**Next event after the divergence (if any):**
```
canary: [42] kernel.call XAudioGetVoiceCategoryVolumeChangeMask
ours: [42] kernel.call RtlEnterCriticalSection
```

**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1766886300, "kind": "import.call", "payload": {"module": "xboxkrnl.exe", "name": "XAudioGetVoiceCategoryVolumeChangeMask", "ord": 503}, "schema_version": 1, "tid": 14, "tid_event_idx": 41}
{"deterministic": true, "engine": "ours", "guest_cycle": 417, "host_ns": 1698700479, "kind": "import.call", "payload": {"module": "xboxkrnl.exe", "name": "RtlEnterCriticalSection", "ord": 293}, "schema_version": 1, "tid": 9, "tid_event_idx": 41}
```

## canary_tid=15 → ours_tid=10

First divergence at `tid_event_idx=2`: kind: canary='wait.begin' ours='handle.create'

**Pre-context (last 5 matching events):**
```
canary: [0] import.call KeWaitForSingleObject
ours: [0] import.call KeWaitForSingleObject
canary: [1] kernel.call KeWaitForSingleObject
ours: [1] kernel.call KeWaitForSingleObject
```

**Divergent event:**
```
canary: [2] wait.begin {'handles_semantic_ids': ['66ae1b598f928969'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
ours: [2] handle.create sid=b9e6799594b746ee
```

**Next event after the divergence (if any):**
```
canary: [3] kernel.return KeWaitForSingleObject
ours: [3] wait.begin {'handles_semantic_ids': ['b9e6799594b746ee'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
```

**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1665434200, "kind": "wait.begin", "payload": {"alertable": false, "handles_semantic_ids": ["66ae1b598f928969"], "timeout_ns": -1, "wait_type": "any"}, "schema_version": 1, "tid": 15, "tid_event_idx": 2}
{"deterministic": true, "engine": "ours", "guest_cycle": 0, "host_ns": 1630381728, "kind": "handle.create", "payload": {"handle_semantic_id": "b9e6799594b746ee", "object_name": null, "object_type": 3, "raw_handle_id": "0x828a3230"}, "schema_version": 1, "tid": 10, "tid_event_idx": 2}
```
@@ -0,0 +1,10 @@
{
  "instructions": 50000007,
  "imports": 40390,
  "unimpl": 0,
  "draws": 0,
  "swaps": 1,
  "unique_render_targets": 0,
  "shader_blobs_live": 0,
  "texture_cache_entries": 0
}
@@ -0,0 +1,10 @@
{
  "instructions": 50000007,
  "imports": 40390,
  "unimpl": 0,
  "draws": 0,
  "swaps": 1,
  "unique_render_targets": 0,
  "shader_blobs_live": 0,
  "texture_cache_entries": 0
}
@@ -0,0 +1,10 @@
{
  "instructions": 50000007,
  "imports": 40390,
  "unimpl": 0,
  "draws": 0,
  "swaps": 1,
  "unique_render_targets": 0,
  "shader_blobs_live": 0,
  "texture_cache_entries": 0
}
187
audit-runs/phase-c17-keWait-native-object/investigation.md
Normal file
@@ -0,0 +1,187 @@
# Phase C+17 Investigation - KeWait native-object handle synthesis (2026-05-14)

## Framing verification (reading-error #28 discipline)

C+15-α / C+16 catalog D-2/D-3/D-4 hypothesis: ours's `KeWait*` doesn't emit
`handle.create` when passed a raw native dispatcher object pointer (PKEVENT /
PKSEMAPHORE), while canary's `xeKeWaitForSingleObject` /
`KeWaitForMultipleObjects_entry` call `XObject::GetNativeObject`, which
lazy-synthesizes an `XEvent`/`XSemaphore`/`XMutant`/`XTimer` wrapper and
inserts it in the object table, where `ObjectTable::AddHandle` fires
`phase_a::EmitHandleCreateAuto` (object_table.cc:191-198).

### Canary's `GetNativeObject` semantics (xobject.cc:397-483)

Triggered when `KeWait*` (and family) is called with a raw kernel-object
pointer. The first action of `xeKeWaitForSingleObject` is to call
`XObject::GetNativeObject<XObject>(kernel_state, object_ptr)`
(threading.cc:972, threading.cc:1070).

`GetNativeObject(kernel_state, native_ptr, as_type=-1, already_locked=false)`:

1. Read `X_DISPATCH_HEADER` at `native_ptr`. `as_type` defaults to
   `header->type` (the dispatcher-type byte: 0=manual event, 1=auto event,
   2=mutant, 5=semaphore, 8/9=timer).
2. Check the `wait_list.flink_ptr` magic: if it equals `kXObjSignature`
   (`'X','E','N','\0'` = 0x58454E00) the dispatcher has already been adopted;
   read the existing handle from `wait_list.blink_ptr` and return the existing
   `XObject` via `LookupObject<XObject>(handle, true)`.
3. Otherwise this is FIRST USE, so synthesize:
   - case 0 / 1: `new XEvent(kernel_state)` → calls
     `XEvent::InitializeNative(native_ptr, header)` then assigns to result.
   - case 2: `new XMutant` + `InitializeNative` (but the body asserts:
     unsupported).
   - case 5: `new XSemaphore` + `InitializeNative` (semaphore->limit /
     signal_state).
   - case 3/4/6/7/8/9/18..24: `assert_always()`. Timer not handled here.
4. After construction, call `StashHandle(header, object->handle())`, which writes
   `kXObjSignature` to `wait_list.flink_ptr` and the new handle to
   `wait_list.blink_ptr`. This guarantees idempotency: the next call returns the
   same handle.

Crucially, the `XObject` ctor `XObject(KernelState*, Type, host_object)`
(xobject.cc:35-48) **always** calls `kernel_state->object_table()->AddHandle(this, nullptr)`,
which (C+15-α-wired) **emits `handle.create`** via
`phase_a::EmitHandleCreateAuto` (object_table.cc:148-201).

So: first call → 1× `handle.create` emit; subsequent calls (signature
matches) → 0 emits.

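
The adopt-once check in steps 2 and 4 can be sketched as follows. This is a minimal Rust illustration rather than canary's C++: `WaitList`, `stashed_handle`, and `stash_handle` are hypothetical names, and only the signature value is taken from the notes above.

```rust
/// ASCII 'X','E','N','\0' packed big-endian, as given in the notes.
const K_XOBJ_SIGNATURE: u32 = 0x5845_4E00;

/// Hypothetical, reduced view of the two `wait_list` words in the header.
struct WaitList {
    flink_ptr: u32,
    blink_ptr: u32,
}

/// Step 2: return the stashed handle if this dispatcher was already adopted.
fn stashed_handle(wait_list: &WaitList) -> Option<u32> {
    (wait_list.flink_ptr == K_XOBJ_SIGNATURE).then_some(wait_list.blink_ptr)
}

/// Step 4: mark the dispatcher as adopted by writing signature and handle.
fn stash_handle(wait_list: &mut WaitList, handle: u32) {
    wait_list.flink_ptr = K_XOBJ_SIGNATURE;
    wait_list.blink_ptr = handle;
}
```

Because the marker lives in guest memory next to the dispatcher itself, any thread that reaches the same pointer sees the adoption, which is why the emit happens at most once per dispatcher.
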
### Canary KeWaitForSingleObject entry ordering (threading.cc:969-1013)

```
xeKeWaitForSingleObject(object_ptr, ...):
  auto object = XObject::GetNativeObject<XObject>(kernel_state(), object_ptr);
  // ^^^ emits handle.create on first use (object_type=1 / 3 / etc)
  if (!object) { return X_STATUS_ABANDONED_WAIT_0; }
  if (phase_a::IsEnabled()) {
    uint64_t sid = 0;
    if (!object->handles().empty()) {
      sid = phase_a::LookupHandleSemanticId(object->handles()[0]);
    }
    phase_a::EmitWaitBegin(&sid, 1, ...);  // wait.begin with real SID
  }
  result = object->Wait(...);
```

So canary's emit order on first use is: `handle.create` → `wait.begin`,
exactly as observed on the cold log (idx=102171 → 102172).

### Lifetime / refcount

The synthesized `XObject` lives until its `handle_ref_count` reaches 0. Since
`AddHandle` initializes it to 1, and there's no balancing `RemoveHandle`
elsewhere in the lazy-wrap path, the wrapper survives for the rest of the
session (no `handle.destroy` is emitted by canary either, as confirmed by its
absence in canary's log post-102171). This is structurally consistent with
canary's "stash the handle in the dispatcher; reuse forever" pattern.

For ours we mirror this: emit one `handle.create` on first
`ensure_dispatcher_object` adoption; no `handle.destroy` thereafter.

### Object-type mapping

| dispatcher header.type | canary symbol            | ours `KernelObject` variant  | ours object_type code (event_log) |
|------------------------|--------------------------|------------------------------|-----------------------------------|
| 0 (manual event)       | XEvent (notification)    | Event { manual_reset=true }  | EVENT = 1                         |
| 1 (auto event)         | XEvent (synchronization) | Event { manual_reset=false } | EVENT = 1                         |
| 5 (semaphore)          | XSemaphore               | Semaphore { .. }             | SEMAPHORE = 3                     |
| 8 (notif timer)        | XTimer (canary asserts)  | Timer { manual_reset=true }  | TIMER = 4                         |
| 9 (sync timer)         | XTimer (canary asserts)  | Timer { manual_reset=false } | TIMER = 4                         |
| 2 (mutant)             | XMutant (canary asserts) | (no shadow: return early)    | n/a                               |

Note that canary's `GetNativeObject` calls `assert_always()` for timer types
8/9: it panics on unsupported dispatcher types. Sylpheed apparently never hits these
in canary (canary keeps running, so the assert is never tripped in our cold
log). Ours's `ensure_dispatcher_object` historically supports timer types 8/9 via
the shadow path; we keep that for ours's robustness and emit
`object_type=TIMER` for them. Cross-engine SID matching only matters for
codes both engines emit; ours's extra timer emits would surface as new
divergences (acceptable per the catalog).

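
The mapping in the table can be written as a single match. This is a sketch under stated assumptions: `object_type_for_header` is a hypothetical free function, not the `KernelObject::schema_object_type` method mentioned elsewhere in these notes, and the numeric codes are copied from the table.

```rust
/// Event-log `object_type` codes, as listed in the mapping table.
const EVENT: u32 = 1;
const SEMAPHORE: u32 = 3;
const TIMER: u32 = 4;

/// Map a dispatcher `header.type` byte to ours's event-log object_type code.
/// Returns `None` where ours creates no shadow (for example mutant, type 2).
fn object_type_for_header(header_type: u8) -> Option<u32> {
    match header_type {
        0 | 1 => Some(EVENT),   // manual-reset and auto-reset events
        5 => Some(SEMAPHORE),
        8 | 9 => Some(TIMER),   // ours only; canary asserts on these
        _ => None,
    }
}
```

Returning `Option` keeps the "no shadow, return early" rows explicit at the call site instead of encoding them as a sentinel code.
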
## Ours's pre-fix behavior

- `resolve_pseudo_handle` (exports.rs:4321): only translates the magic
  `0xFFFF_FFFF` / `0xFFFF_FFFE` self-handle. For any other value it's a
  pass-through. Native dispatcher pointers and real handles both reach the
  next step unchanged.
- `ensure_dispatcher_object` (exports.rs:4363): on first encounter of a guest
  pointer (`ptr >= 0x1_0000` and not already in `state.objects`), reads the
  dispatcher header, creates the shadow `KernelObject::{Event, Semaphore,
  Timer}`, inserts into `state.objects`, stamps `kXObjSignature` at
  `+0x08/+0x0C`. **Does NOT emit `handle.create`.** **Does NOT bump
  `handle_refcount`** (entry stays absent).
- `ke_wait_for_single_object` (exports.rs:4954): calls `resolve_pseudo_handle`
  → `ensure_dispatcher_object` → `refresh_pkevent_shadow_from_guest` →
  emits `wait.begin` with `lookup_handle_semantic_id(handle) = 0`
  (since no SID was ever registered) → calls `do_wait_single`.

Result observed at idx=102171: ours emits `wait.begin
handles_semantic_ids=['0000000000000000']` and zero `handle.create` events.

## Fix shape

Symmetric: extend `ensure_dispatcher_object` to do the equivalent of
canary's `XObject::AddHandle` post-construction emit. Specifically:

1. After inserting the shadow into `state.objects` (existing line ~4409),
   **and** when this is a fresh adoption (the inserted-before check is the
   guard at line 4367), seed `handle_refcount.insert(ptr, 1)` for lifecycle
   symmetry (no canary-side `handle.destroy` is expected, but consistency
   with `alloc_handle_for` is worth ~1 LOC).
2. When `event_log::is_enabled()`, call
   `event_log::emit_handle_create_auto(tid, cycle, /* pc */ 0, object_type,
   raw_handle_id=ptr, object_name=None)`. The chosen `object_type` matches
   the variant: Event=1, Semaphore=3, Timer=4. This both emits the event AND
   registers the SID in the registry so the subsequent `wait.begin` resolves
   non-zero.

Order in `ke_wait_for_single_object` already matches canary: synth (now
emits `handle.create`) before `wait.begin`. No re-ordering needed.

For `ke_wait_for_multiple_objects` the same applies: the loop already calls
`ensure_dispatcher_object` per pointer (exports.rs:5022). Each first
adoption emits one `handle.create` and the SID array used by `wait.begin`
becomes non-zero per element.

### Idempotency / refcount lifecycle

- First-touch: shadow inserted + `handle_refcount[ptr] = 1` + emit
  `handle.create`.
- Re-touch (same pointer): early return at the `contains_key` guard → no
  emit, no refcount change. Matches canary's "already-initialized" branch.
- Destroy: there is no path that destroys these shadows in ours today
  (parity with canary). If someone later wires `handle.destroy` on
  shadow-removal, the refcount will be present and decrement-to-zero will
  fire the symmetric event. Not in scope here.

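
The first-touch / re-touch behaviour described above can be sketched as follows. This is a reduced model, not the function in exports.rs: `State`, its fields, and `ensure_dispatcher_object_sketch` are hypothetical, the caller passes the `object_type` instead of it being read from a guest header, and a `Vec` stands in for the event log and SID registry.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced stand-in for the kernel state touched by the fix.
#[derive(Default)]
struct State {
    objects: HashMap<u32, u32>,         // guest ptr -> object_type code
    handle_refcount: HashMap<u32, u32>, // guest ptr -> refcount
    log_enabled: bool,                  // stands in for event_log::is_enabled()
    emitted: Vec<(u32, u32)>,           // (raw_handle_id, object_type) per handle.create
}

/// Adopt a guest dispatcher pointer on first touch; no-op on re-touch.
fn ensure_dispatcher_object_sketch(state: &mut State, ptr: u32, object_type: u32) {
    // Values below 0x1_0000 are treated as real handles; known pointers return early.
    if ptr < 0x1_0000 || state.objects.contains_key(&ptr) {
        return;
    }
    state.objects.insert(ptr, object_type);
    state.handle_refcount.insert(ptr, 1);
    if state.log_enabled {
        state.emitted.push((ptr, object_type)); // one handle.create per adoption
    }
}
```

Calling it twice with the same pointer records exactly one emit and leaves the refcount at 1, which mirrors the two bullets for first-touch and re-touch.
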
### Scope

C+17 strictly addresses D-2/D-3/D-4. We **do not** touch:

- `NtWait*` (handle-based; already SID-resolves through the registry once
  the underlying `Nt*Create*` emit fires `handle.create`).
- `Ke{Set,Reset,Pulse}Event` / `KeReleaseSemaphore` paths that also call
  `ensure_dispatcher_object`. These will now emit `handle.create` on their
  first-touch. That is EXPECTED engine-symmetric behavior, and matches
  canary (every entry into `GetNativeObject` may emit). The wait-side has
  pre-context emits in both engines, so observable order is preserved.

## Tripstone register

- Reading-error #28 (canary semantics first): VERIFIED.
- Reading-error #23 (widely-used primitive flip): MITIGATED via cold-vs-cold
  gate and HARD-REVERT-IF-MAIN-REGRESSES discipline.
- Reading-error #19 (host-side emits): `event_log::is_enabled()` guard
  preserved on every new emit, so the default-off cost is a single flag
  check (effectively zero).
- Refcount semantics: matches canary's "stash forever" lazy-wrap pattern;
  not symmetric with `alloc_handle_for`'s NtClose-balanced lifecycle (which
  is correct, since these are different kinds of handles).

## Cascade prediction (for the run)

- A = verify canary's GetNativeObject semantics: DONE.
- B = land symmetric ~30-50 LOC fix: PENDING.
- C = main matched-prefix > 102,171: ~75%.
- D = sister chains advance (4 chains): ~75%.
- E = NEW divergences surface (downstream): ~80% (intended).