# Phase A — Event Log Schema v1

**Status:** frozen for Phase A and Phase B. Adding a new event kind requires a `schema_version` bump and a coordinated update in both engines + the diff tool.

## Wire format

JSONL — one JSON object per line, UTF-8, `\n`-terminated. Both engines emit the same byte format.

The **first line** of every event-log file MUST be a `schema_version` event:

```json
{"schema_version":1,"engine":"canary","kind":"schema_version","tid":0,"tid_event_idx":0,"guest_cycle":0,"host_ns":0,"deterministic":true,"payload":{"version":1,"emitter_build":""}}
```

The diff tool refuses to parse a file whose first event is not `schema_version` with version `1`.

## Common fields (every event)

| Field | Type | Notes |
|---|---|---|
| `schema_version` | int | always `1` in this phase |
| `engine` | string | `"canary"` or `"ours"` |
| `kind` | string | one of the v1 kinds below |
| `tid` | int | guest thread id of the calling thread (host TID never logged) |
| `tid_event_idx` | int | **per-tid monotonic, starts at 0** — the diff key |
| `guest_cycle` | int | per-engine monotonic guest-instruction count; `0` if the engine cannot supply one (see "Cycle source" below). NOT used by the diff tool for correctness — `tid_event_idx` is the canonical key. |
| `host_ns` | int | host monotonic-clock ns since process start; debug only, never compared by diff |
| `deterministic` | bool | `false` if any payload field is derived from host time / raw allocator address / RNG / etc. Diff tool skip-compares non-deterministic fields. |
| `payload` | object | kind-specific (see below) |

## Cycle source notes

- **canary**: the PPC `tb` (timebase) register can be read from the PPCContext passed into shim handlers. If a hook is on a path that does not have access to a PPCContext (e.g. a host-side handle-table destructor), the emitter MUST set `guest_cycle = 0` and leave `deterministic = false` on the payload-side metadata.
  The diff tool ignores `guest_cycle` for ordering — `tid_event_idx` is canonical.
- **ours**: `scheduler.thread(current_ref()).ctx.timebase` (already maintained per guest thread).

## Per-tid event index

Both engines maintain a per-tid monotonic counter starting at `0`. Each event takes the counter's current value, and the counter is then bumped (post-increment) **before** the event is serialized, so the first event for tid `N` has `tid_event_idx = 0`.

The `schema_version` event is special: it is emitted by the writer thread (typically the boot thread before any guest code has run) with `tid = 0` and `tid_event_idx = 0`. The actual guest thread `0` does not exist; the diff tool treats `tid = 0` as the schema header only.

## Handle semantic ID

Canary and ours produce guest handles in different ranges (canary: `0xF8xxxxxx` region; ours: bump-allocated `0x4, 0x8, 0xC, …`). Raw handle IDs are unsuitable as a cross-engine identity.

Instead, both engines compute a stable **handle semantic ID** at handle creation time using **FNV-1a 64-bit** over a fixed-format byte string. FNV-1a is used (not SHA256) because both engines can implement it in <10 lines with no dependency, and the diff tool only needs a deterministic identity hash — not a crypto property.

```
input_bytes = le_u32(create_site_pc)
            ‖ le_u32(creating_tid)
            ‖ le_u64(tid_event_idx_at_creation)
            ‖ le_u32(object_type)
hash = 0xCBF29CE484222325
for each byte b in input_bytes:
    hash = (hash XOR b) * 0x100000001B3 mod 2^64
handle_semantic_id = format("{:016x}", hash)
```

Both engines MUST emit the lowercase 16-hex-char form. The `create_site_pc` is the guest PC at the call site of the kernel call that created the handle: in canary, `PPCContext::lr - 4` (the `bl` to the import stub); in ours, the equivalent return address from the syscall dispatcher.
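The recipe above can be sketched in Python, the diff tool's language. This is a minimal illustration, not either engine's code: the function names `fnv1a_64` and `handle_semantic_id` are hypothetical, and only the constants and the byte layout come from the recipe.

```python
import struct

# FNV-1a 64-bit parameters, as given in the recipe.
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """FNV-1a: XOR each byte into the hash, then multiply, modulo 2^64."""
    h = FNV64_OFFSET_BASIS
    for b in data:
        h = ((h ^ b) * FNV64_PRIME) & MASK64
    return h


def handle_semantic_id(create_site_pc: int, creating_tid: int,
                       tid_event_idx_at_creation: int, object_type: int) -> str:
    """Hypothetical helper: per-thread (v1) SID as 16 lowercase hex chars."""
    # "<IIQI" = little-endian u32, u32, u64, u32 with no padding (20 bytes).
    data = struct.pack("<IIQI", create_site_pc, creating_tid,
                       tid_event_idx_at_creation, object_type)
    return format(fnv1a_64(data), "016x")


# Example inputs are made up: a File handle (object type 0x06) created by
# tid 1 at its event index 0 from a call site at 0x82001234.
sid = handle_semantic_id(0x82001234, 1, 0, 0x06)
```

Because `creating_tid` is one of the hash inputs, two engines that number guest threads differently will generally produce different SIDs for the same logical handle.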
**Object type codes** (v1 — both engines agree):

| Code | Type |
|---|---|
| `0x00` | Unknown |
| `0x01` | Event |
| `0x02` | Mutant |
| `0x03` | Semaphore |
| `0x04` | Timer |
| `0x05` | Thread |
| `0x06` | File |
| `0x07` | IoCompletion |
| `0x08` | Module |
| `0x09` | EnumState |
| `0x0A` | Section |
| `0x0B` | Notification |

All subsequent events that reference a handle emit BOTH `handle_semantic_id` (the diff key) and `raw_handle_id` (engine-local, never compared).

## Event kinds (v1)

### `schema_version`

Header event. `payload = {"version": 1, "emitter_build": ""}`.

### `thread.create`

Emitted by the **parent** thread at the kernel call that creates the new thread.

```json
"payload": {
  "handle_semantic_id": "0123456789abcdef",
  "parent_tid": 1,
  "entry_pc": "0x82001234",
  "ctx_ptr": "0xbce25340",
  "priority": 0,
  "affinity": 1,
  "stack_size": 65536,
  "suspended": false
}
```

### `thread.exit`

Emitted by the **exiting** thread (last event before tid disappears).

```json
"payload": {"exit_code": 0}
```

### `thread.suspend` / `thread.resume`

```json
"payload": {"target_tid": 13}
```

### `kernel.call`

Emit at handler entry, **before** any side effects.

```json
"payload": {
  "name": "NtCreateFile",
  "args": {"file_handle_ptr": "0x70000010", "desired_access": "0x80100080", "obj_attr_ptr": "0x70000020", ...},
  "args_resolved": {"path": "\\Device\\Cdrom0\\dat\\movie\\opening.bik"}
}
```

- Numeric args use `0x`-prefixed hex strings if pointer-typed; ints stay as ints.
- `args_resolved` is a best-effort dereference (strings, struct dumps, buffer summaries). Optional.

### `kernel.return`

Emit at handler exit, **after** all side effects committed.

```json
"payload": {
  "name": "NtCreateFile",
  "return_value": 0,
  "status": "0x00000000",
  "side_effects": [
    {"kind": "handle.create", "handle_semantic_id": "...", "object_type": 6, "raw_handle_id": "0x40"}
  ]
}
```

The `side_effects` array MAY duplicate events also emitted as standalone (`handle.create`).
The diff tool treats both as authoritative; duplicates do not cause divergence.

### `handle.create`

For host-side creates not tied to a kernel call (rare).

```json
"payload": {
  "handle_semantic_id": "0123456789abcdef",
  "object_type": 1,
  "object_name": null,
  "raw_handle_id": "0xf8000048"
}
```

### `handle.destroy`

```json
"payload": {
  "handle_semantic_id": "0123456789abcdef",
  "raw_handle_id": "0xf8000048",
  "prior_refcount": 1
}
```

### `wait.begin`

```json
"payload": {
  "handles_semantic_ids": ["0123...", "abcd..."],
  "timeout_ns": -1,
  "alertable": false,
  "wait_type": "any"
}
```

`timeout_ns = -1` means INFINITE. `wait_type` is `"any"` or `"all"`.

### `wait.end`

```json
"payload": {
  "status": "0x00000000",
  "woken_by_semantic_id": "0123456789abcdef",
  "wait_duration_cycles": 12345
}
```

`wait_duration_cycles` is `deterministic = false` (host scheduling affects it). `woken_by_semantic_id` is null on timeout / alerted.

### `mem.write`

**OPT-IN — gated by a separate cvar (`phase_a_event_log_mem_writes`, default false).** In Phase A this kind is reserved; emitters MAY ship a TODO stub. Schema:

```json
"payload": {
  "guest_addr": "0x82000000",
  "value": "0x12345678",
  "size": 4,
  "source": "guest_jit"
}
```

### `vfs.open` / `vfs.read` / `vfs.close`

File-IO events, separate from `kernel.call` so the diff tool can match on canonical path:

```json
"payload": {"canonical_path": "\\Device\\Cdrom0\\dat\\movie\\opening.bik", "raw_handle_id": "0x40", "handle_semantic_id": "..."}
```

### `import.call`

Emitted at the syscall dispatcher (ours) or the import-stub JIT trap (canary), one per imported function invocation, **before** the implementing `kernel.call`.
```json
"payload": { "module": "xboxkrnl.exe", "ord": 0x101, "name": "NtCreateFile" }
```

## Diff-tool field-comparison rules

| Field | Rule |
|---|---|
| `engine` | skipped (always differs) |
| `host_ns` | skipped (host-clock) |
| `guest_cycle` | skipped (engines disagree on absolute count; diff uses `tid_event_idx`) |
| `raw_handle_id` | skipped (engines use different handle namespaces) |
| `handle_semantic_id` | **C+15-α: skipped** (engine-local — see below) |
| `handles_semantic_ids` (wait.begin) | **C+15-α: skipped** (same reason) |
| `parent_tid` (thread.create) | **C+15-α: skipped** (engine-local guest tids) |
| `ctx_ptr` (thread.create) | **C+22 v1.7: per-(tid, kind, field) ordinal sentinel** (``) — host-heap-derived VA, AUDIT-043 ε class |
| `woken_by_semantic_id` (wait.end) | **C+15-α: skipped** (engine-local SID) |
| `deterministic` (event-level field) | skipped (metadata) |
| Any payload field listed under a non-deterministic kind | skipped where flagged |
| All other payload fields | strict equality |

### Phase C+15-α note on `handle_semantic_id`

The SID computation includes `creating_tid` as input, but guest TIDs differ between engines (canary's tid=6 maps to ours's tid=1 on the main chain). Both engines compute SIDs **using their own local tids**, so the same logical handle gets two different SIDs across engines. The diff tool skip-compares SID fields and relies on `tid_event_idx + object_type` for alignment.

A future schema v2 could canonicalize SIDs via the diff tool's tid map and restore strict comparison. For v1.1 the simpler skip-policy suffices.

## Shared-global SIDs (v1.2 — added in Phase C+18)

A subset of guest kernel dispatcher objects (`KEVENT`, `KSEMAPHORE`, `KTIMER`, `KMUTANT`) are **process-global**: they live in statically-initialized or pre-allocated guest memory and are touched by MULTIPLE guest threads during boot. Examples include the XAudio voice-volume change-mask semaphore at `0x828a3230` in Sylpheed.
Canary's `XObject::GetNativeObject` (`src/xenia/kernel/xobject.cc:397-483`) and ours's `ensure_dispatcher_object` (`crates/xenia-kernel/src/exports.rs:4363`) **lazy-wrap** these dispatchers on **first guest-thread touch**: the first `KeWait*` invocation that passes the raw kernel-object pointer synthesizes the `XObject` wrapper, stamps the `X_DISPATCH_HEADER` with the `kXObjSignature` marker (`'X','E','N','\0' = 0x58454E00`), stashes the handle, and emits `handle.create`. Subsequent touches find the marker and short-circuit without emit (per-pointer idempotent).

### The first-toucher race

**Which** guest thread wins the "first toucher" race is **timing-dependent**:

- Canary and ours have different host schedulers, JIT throughput, and guest-thread bootstrap ordering.
- Even within the same engine across runs the first-toucher can differ — but each engine produces a deterministic per-run total ordering, so cold-vs-cold reproducibility holds.

The per-thread SID recipe `semantic_id(create_site_pc, creating_tid, tid_event_idx_at_creation, object_type)` (v1) depends on BOTH `creating_tid` and `tid_event_idx_at_creation`, so:

- Same dispatcher → DIFFERENT SIDs in each engine (race-dependent).
- `handle.create` for the same object lands on different per-tid streams in canary vs ours.

The C+17 fix made ours emit `handle.create` for these synthesized shadows, but the C+17 D-NEW-3 regression on tid=15→10 was exactly the first-toucher race: ours's tid=10 was the first toucher locally; canary's tid=15 was NOT the first toucher in its run — some other canary tid had already adopted `0x828a3230`. ours's tid=10 emitted an "extra" `handle.create` that canary's tid=15 lacked, and the diff tool flagged a kind mismatch at idx=2.

### The C+18 fix: deterministic SID recipe

Process-global dispatchers use a **second** SID recipe that is scheduling-invariant.
Both engines now use:

```
SHARED_GLOBAL_SID_MARKER = 0xC01AB005  (fixed sentinel, both engines)

input_bytes = le_u32(SHARED_GLOBAL_SID_MARKER)   // 4 bytes — "create_site_pc" slot
            ‖ le_u32(0)                          // 4 bytes — "creating_tid" slot
            ‖ le_u64(pointer)                    // 8 bytes — "tid_event_idx" slot
            ‖ le_u32(object_type)                // 4 bytes
hash = FNV-1a-64(input_bytes)
shared_global_sid = format("{:016x}", hash)
```

The marker `0xC01AB005` is outside any plausible guest-PC range (PPC text 0x82000000-0x82FFFFFF; XEX header 0x3001xxxx; heap 0x4xxxxxxx), so it can never collide with a regular per-thread SID (which uses a real guest PC as `create_site_pc`).

Both engines compute the SAME SID for the same dispatcher pointer regardless of:

- which guest thread is the first toucher,
- the `tid_event_idx_at_creation`,
- the per-engine scheduling order.

### Which call sites use which recipe

| Call site | SID recipe |
|---|---|
| `KernelState::alloc_handle_for` (ours) | per-thread |
| `ObjectTable::AddHandle` direct (canary) | per-thread |
| `ensure_dispatcher_object` (ours) | **shared-global** |
| `XObject::GetNativeObject` synthesized (canary) | **shared-global** |

Regular per-thread `handle.create` events (file open, thread create, named-event create, etc.) keep the v1 per-thread recipe. The shared-global recipe is restricted to lazy-wrap synthesis.

### Diff tool: cross-tid floating `handle.create` matching

The diff tool pre-pass collects all shared-global SIDs in either engine's stream. A `handle.create` event is detected as shared-global by recomputing the deterministic SID from its `(raw_handle_id, object_type)` payload and comparing against the event's `handle_semantic_id`. Regular per-thread SIDs cannot match this check by construction.
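A minimal Python sketch of the shared-global recipe and the recompute-and-compare check described above. The names `shared_global_sid` and `is_shared_global_create` are hypothetical, and the real pre-pass in the diff tool may be structured differently; only the marker constant, the byte layout, and the detection idea come from the text.

```python
import struct

SHARED_GLOBAL_SID_MARKER = 0xC01AB005  # fixed sentinel from the recipe
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    h = 0xCBF29CE484222325
    for b in data:
        h = ((h ^ b) * 0x100000001B3) & MASK64
    return h


def shared_global_sid(pointer: int, object_type: int) -> str:
    """Hypothetical helper: marker in the PC slot, 0 in the tid slot,
    the dispatcher pointer in the 64-bit index slot."""
    data = struct.pack("<IIQI", SHARED_GLOBAL_SID_MARKER, 0, pointer, object_type)
    return format(fnv1a_64(data), "016x")


def is_shared_global_create(event: dict) -> bool:
    """Recompute the SID from (raw_handle_id, object_type) and compare it
    with the SID the event carries."""
    if event.get("kind") != "handle.create":
        return False
    payload = event.get("payload", {})
    raw = payload.get("raw_handle_id")
    object_type = payload.get("object_type")
    if not isinstance(raw, str) or not isinstance(object_type, int):
        return False
    try:
        pointer = int(raw, 16)  # accepts the "0x" prefix
    except ValueError:
        return False
    return shared_global_sid(pointer, object_type) == payload.get("handle_semantic_id")


# Example: the semaphore (object type 0x03) at 0x828a3230 mentioned earlier.
example = {
    "kind": "handle.create",
    "payload": {
        "handle_semantic_id": shared_global_sid(0x828A3230, 0x03),
        "object_type": 3,
        "raw_handle_id": "0x828a3230",
    },
}
```

Note that the check only succeeds when `raw_handle_id` carries the same pointer that was fed into the recipe.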
When per-tid alignment finds a kind mismatch and one side has a shared-global `handle.create` whose SID is in the floating set:

- The diff tool advances ONLY that side's stream pointer past the floating event.
- Re-compare at the same canonical position.

The diff report's summary table shows a `floating_skipped (c/o)` column for visibility — counts of absorbed events per side.

### Index relaxation

The C+18 fix relaxes the legacy diff-tool rule that requires `canary.tid_event_idx == ours.tid_event_idx` for matching events. With floating absorption, the per-tid indices can drift by 1 between the two sides — but the `kind` and `payload` comparisons remain strict. The raw indices are still preserved on the events themselves (useful for debugging and report context).

### Backward compatibility

- Wire format unchanged. `schema_version` is still `1`.
- Pre-C+18 event logs (no shared-global SIDs in the stream) trigger the legacy code path automatically — the floating set is empty.
- The marker constant `0xC01AB005` MUST be exactly this value in both engines and the diff tool. Tests in both engines plus `tools/diff-events/test_diff_events.py` lock it in.

## Wait-begin floating absorb (v1.3 — added in Phase C+21)

### Motivation

Canary's `RtlEnterCriticalSection` (and its symmetric counterparts — `KeWaitForSingleObject` invoked on a process-global dispatcher, mutex/semaphore contended-acquire paths) emits `wait.begin` **only on the contended slow path**. The fast path (uncontended atomic-CAS, or recursive bump) emits NO `wait.begin` and only the `kernel.call` → `kernel.return` pair. Which path is taken depends on whether ANOTHER guest thread is currently holding the dispatcher when the wait is attempted — i.e. it is **host-scheduler-driven**, varying across cold runs of the same engine.
Reading-error class **#32** (documented in C+20's `investigation.md`) captures this: cross-checking 3 fresh canary cold runs at canary tid=6 idx 104,606 showed:

- jitter-1: `wait.begin sid=75ae880ec432eb36` (contended)
- jitter-2: `kernel.return` (fast — matches ours)
- jitter-3: offset-shifted wait.begin at a different idx with a different SID

The matched-prefix metric is unreliable inside such regions if the diff tool treats wait.begin events as strictly positional.

### The fix

A `wait.begin` event is **floating** if at least one of its `payload.handles_semantic_ids` references a shared-global SID (see §"Shared-global SIDs"). During the per-tid two-pointer walk:

- If one side has a floating `wait.begin` and the other has a different kind at the same canonical position, advance ONLY the wait.begin side's pointer and re-compare.

`wait_type=all` waits are floating as long as ANY single handle in the set is shared-global — the entire wait's blocking behavior is timing-dependent if even one of its handles is on a process-global dispatcher.

### Shared-global SID detection (extended in C+21)

The diff tool's `collect_shared_global_sids` pre-pass now unions TWO sources:

1. **Recipe-matching `handle.create` events** (Phase C+18 — direct). This catches ours's `ensure_dispatcher_object` output where `raw_handle_id == ptr` (the recipe-input pointer).
2. **Cross-tid usage heuristic** (Phase C+21 — indirect). Any SID referenced via `handle.create` OR `wait.begin` on **two or more distinct guest tids** in EITHER engine is treated as shared-global.

The cross-tid heuristic exists because canary's `EmitHandleCreateSharedGlobal` (`event_log.cc:435`) emits the SID computed from the dispatcher VA but stashes `object->handle()` (a handle-table slot in the `0xF8xxxxxx` region) as `raw_handle_id`. Those two values DIFFER, so canary's shared-global `handle.create` events are NOT recipe-recognizable from their payload alone.
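The two-source union and the floating-wait test can be sketched as follows. This is an illustration under assumptions: the function names and signatures are hypothetical and do not reproduce the real `collect_shared_global_sids`, and the recipe-matched SIDs are passed in as a precomputed set.

```python
from collections import defaultdict


def cross_tid_sids(events):
    """SIDs seen via handle.create or wait.begin on two or more distinct tids
    within one engine's stream."""
    tids_by_sid = defaultdict(set)
    for ev in events:
        payload = ev.get("payload", {})
        if ev.get("kind") == "handle.create":
            sids = [payload.get("handle_semantic_id")]
        elif ev.get("kind") == "wait.begin":
            sids = payload.get("handles_semantic_ids", [])
        else:
            continue
        for sid in sids:
            if isinstance(sid, str):
                tids_by_sid[sid].add(ev["tid"])
    return {sid for sid, tids in tids_by_sid.items() if len(tids) >= 2}


def collect_shared_global_sids_sketch(canary_events, ours_events, recipe_sids=frozenset()):
    """Union of recipe-matched SIDs and the per-engine cross-tid sets."""
    return set(recipe_sids) | cross_tid_sids(canary_events) | cross_tid_sids(ours_events)


def is_floating_wait_begin(event, shared_global):
    """A wait.begin floats if ANY of its handles is shared-global."""
    if event.get("kind") != "wait.begin":
        return False
    sids = event.get("payload", {}).get("handles_semantic_ids", [])
    return any(sid in shared_global for sid in sids)


# Made-up stream: SID "aaaa" appears on tids 6 and 9, SID "bbbb" only on tid 9.
canary = [
    {"tid": 6, "kind": "handle.create", "payload": {"handle_semantic_id": "aaaa"}},
    {"tid": 9, "kind": "wait.begin", "payload": {"handles_semantic_ids": ["aaaa", "bbbb"]}},
    {"tid": 9, "kind": "wait.begin", "payload": {"handles_semantic_ids": ["bbbb"]}},
]
shared = collect_shared_global_sids_sketch(canary, [])
```

Here `shared` contains only `"aaaa"`, so the first `wait.begin` on tid 9 is floating and the second is not.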
Multi-tid SID usage is a robust observational signal: per-thread SIDs by construction stay on the single creating tid (their hash inputs include `creating_tid`), so any cross-tid SID usage indicates a process-global dispatcher.

### Risk of over-absorption (and why it's bounded)

The cross-tid heuristic could in principle mis-classify a per-thread SID that one thread creates and another thread waits on — a legitimate cross-thread synchronization pattern. The floating-absorb, however, only fires on a **kind mismatch** at the canonical position. Per-thread waits that match strictly on both sides advance normally without any absorb. The heuristic only loosens alignment when one side is missing a `handle.create` or `wait.begin` — exactly the scheduling-jitter window the C+21 fix targets.

### Diff-tool report changes

The summary table's `floating_skipped (c/o)` column is split into two columns:

- `floating_create (c/o)` — C+18 `handle.create` absorptions.
- `floating_wait (c/o)` — C+21 `wait.begin` absorptions.

Both per-side and observation-only — counts may legitimately be non-zero in a clean run.

### Backward compatibility

- Wire format unchanged. `schema_version` is still `1`.
- Pre-C+21 event logs (no `wait.begin` events that reference shared-global SIDs) trigger no new behavior — the wait absorption branches are inert.
- The C+18 floating-create logic is unchanged; the C+21 fix is strictly additive.
- Engine source is UNCHANGED in C+21 — the fix is in the diff tool only.

## contention.observed (v1.4 — added in Phase D Stage 1, 2026-05-18)

### Motivation

The 104,607 cap is canary's tid=6 contending on a CS while ours's tid=1 fast-paths through the same call (Phase C+22). Schedules diverge for host-OS reasons, so neither engine is "wrong" — but matched-prefix stalls. Phase D's H' approach makes ours's `rtl_enter_critical_section` *replay* canary's contention by consulting a per-call manifest built from canary's contention trace.
Stage 1 (this section) introduces the canary-side **emitter** for that manifest: a new event kind `contention.observed` that fires from `RtlEnterCriticalSection_entry` (`xboxkrnl_rtl.cc:596-633`) just before the call falls through to `xeKeWaitForSingleObject` after spin-loop exhaustion. Cvar-gated (`kernel_emit_contention`, default false) so default canary behavior is byte-identical.

### Event shape

```json
{
  "schema_version": 1,
  "engine": "canary",
  "kind": "contention.observed",
  "tid": ,
  "tid_event_idx": ,
  "guest_cycle": 0,
  "host_ns": ,
  "deterministic": true,
  "payload": {
    "cs_ptr": "0xHHHHHHHH",           // guest VA of the RTL_CRITICAL_SECTION
    "site_sid": "HHHHHHHHHHHHHHHH",   // shared-global SID (see below)
    "contended": true                 // always true at v1.4 (uncontended is implicit)
  }
}
```

`site_sid` is computed via the **C+18 shared-global SID recipe**:

```
site_sid = FNV-1a-64 over (
    kSharedGlobalSidMarker  [u32 LE]   // 0xC01AB005
  , 0                       [u32 LE]   // creating_tid (unused)
  , cs_ptr as u64           [u64 LE]   // pointer-as-idx
  , kObjCriticalSection     [u32 LE]   // 0x0C, new in v1.4
)
```

Both engines compute the same SID for the same CS pointer. The marker constant `kObjCriticalSection = 0x0C` is the new ObjectType value introduced for this kind; it does NOT correspond to a real XObject (CS lives as a guest-memory struct, not a handle-tabled object).

### When emitted (canary)

In `RtlEnterCriticalSection_entry`:

1. Recursive-lock fast path (already own lock) → **NO emit** (not contention).
2. Spin-loop succeeds (`atomic_cas` flips `lock_count` from -1 → 0) → **NO emit** (fast acquire).
3. Spin-loop exhausted **AND** `atomic_inc(&cs->lock_count) != 0` → **EMIT** with `contended=true`, then `xeKeWaitForSingleObject`.
4. Spin-loop exhausted **AND** `atomic_inc(...) == 0` (CS became free between spin and inc) → **NO emit** (we won the race after spin).
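The four cases and the payload shape can be modeled in a few lines of Python. This is a decision-table sketch, not canary's C++: `emits_contention` and `contention_payload` are hypothetical names, the boolean inputs stand in for the state `RtlEnterCriticalSection_entry` observes, and the example pointer is made up.

```python
import struct

K_SHARED_GLOBAL_SID_MARKER = 0xC01AB005
K_OBJ_CRITICAL_SECTION = 0x0C  # ObjectType value introduced in v1.4
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    h = 0xCBF29CE484222325
    for b in data:
        h = ((h ^ b) * 0x100000001B3) & MASK64
    return h


def emits_contention(recursive_owner: bool, spin_acquired: bool,
                     inc_result: int, cvar_enabled: bool = True) -> bool:
    """True only for case 3: spin exhausted and atomic_inc returned non-zero."""
    if not cvar_enabled:      # kernel_emit_contention=false: never emit
        return False
    if recursive_owner:       # case 1: recursive fast path
        return False
    if spin_acquired:         # case 2: spin-loop CAS succeeded
        return False
    return inc_result != 0    # case 3 emits, case 4 (== 0) does not


def contention_payload(cs_ptr: int) -> dict:
    """Payload with site_sid from the shared-global recipe over the CS pointer."""
    data = struct.pack("<IIQI", K_SHARED_GLOBAL_SID_MARKER, 0, cs_ptr,
                       K_OBJ_CRITICAL_SECTION)
    return {
        "cs_ptr": f"0x{cs_ptr:08x}",
        "site_sid": format(fnv1a_64(data), "016x"),
        "contended": True,
    }


payload = contention_payload(0x82ABC000)  # illustrative guest VA
```

Keeping the decision as a pure function of the three observations makes it easy to check that cases 1, 2, and 4 stay silent.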
The emit point sits **between** atomic_inc's positive result and the `xeKeWaitForSingleObject` call, so the new event always precedes the existing `wait.begin` event in the per-tid ordinal.

### When emitted (ours, Stage 3 — pending)

Stage 3 will add a symmetric emit in `rtl_enter_critical_section` (`xenia-rs/crates/xenia-kernel/src/exports.rs:2886-2946`) at the forced-park branch driven by the manifest. This keeps per-tid ordinals aligned across engines after replay.

### Diff-tool treatment (Stage 4 — pending)

`contention.observed` will be added to `ENGINE_LOCAL_KINDS` in `diff_events.py`: the per-tid pointer advances past these events on either side without comparison. This keeps matched-prefix counts unchanged when ONE side emits the event (Stage 1's canary-only world) or when BOTH emit at the same ordinal (Stage 3's parity world).

### Cvar default + byte-identity

`kernel_emit_contention=false` by default. With cvar=false, the helper `phase_a::EmitContentionObserved` short-circuits at the cvar check before any `IsEnabled()` lookup. The pre-Stage-1 canary code path is preserved byte-for-byte; cvar-OFF cold runs produce zero `contention.observed` events (validated on the Stage 1 cold run: 0 occurrences in a 4.4 GB / 18.6 M event trace).

## Nested-CS-cleanup absorber (v1.5 — added in Phase D D-extension, 2026-05-18)

### Status

**Band-aid.** Explicit annotation: this absorber CROSSES the reading-error #23 boundary in spirit. It folds real guest control-flow divergence at the diff-tool layer. It exists because the underlying root cause — producer-throughput divergence under the cooperative-vs-preemptive scheduling mismatch (see Phase D forensics) — is **explicitly out of scope** for the H' plan: fixing it in ours's engine would require preempting the cooperative scheduler, which invalidates 23 phases of digest stability. The absorber is the practical compromise.
### Trigger shape

The absorber fires ONLY at a kind mismatch of:

- canary[ic] = `import.call` with `payload.name == "RtlEnterCriticalSection"`
- ours[io] = `import.call` with `payload.name == "RtlLeaveCriticalSection"`

For any other kind mismatch, the absorber is silent. This narrowness is intentional: real engine divergences appear in other shapes and must still surface.

### Behavior

When the trigger pattern matches, canary's stream is scanned for one or more balanced `[Enter-block, Leave-block]` pairs immediately following the trigger position:

- An Enter-block is 3 consecutive events: `import.call RtlEnterCriticalSection → kernel.call RtlEnterCriticalSection → kernel.return RtlEnterCriticalSection`.
- A Leave-block is 3 consecutive events with `RtlLeaveCriticalSection`.

The absorber consumes pairs greedily up to a cap of `_NESTED_CS_PAIR_CAP = 32` pairs (empirically, Sylpheed's worst-case is ~10-15 pairs at the 104,607 cap). After consuming each pair, it checks whether canary's next event has the SAME `kind` AND same `payload.name` as ours[io]. The first convergence wins; canary's pointer is advanced past the absorbed pairs.

If no convergence is found within the cap, the absorber returns None and the divergence falls through to normal reporting.

### Why this is safe (within #23's spirit)

1. The absorption only happens when canary's stream re-aligns with ours's stream past the nested block. If it doesn't re-align, the real divergence is reported.
2. The nested-block shape matches a specific PPC pattern: the consumer thread in canary acquires a CS, calls a helper that iterates a tree/registry, takes the nested-CS-enter path for each item, and releases the outer CS. Ours's tree is shorter so it skips this. The net effect on guest state is bounded: ours has fewer items processed in this iteration, but the EVENT stream past the absorption resumes the same logical operation.
3. The Phase B `image_loaded_sha256` is the foundational invariant.
   It's unaffected by this absorber (no engine source change).

### Why this is NOT safe in the general sense

- Diverging downstream state IS lost: ours's tree has fewer entries than canary's after the absorbed block. Subsequent ours operations that touch the tree will behave differently. Other absorbers / fixes will be needed if those state-differences manifest later.
- A future engine bug that produces a spuriously nested Enter+Leave pair could be falsely absorbed. Mitigation: the absorber requires canary's post-block stream to re-align with ours's; spurious nested pairs without re-alignment fall through to normal divergence reporting.

### Empirical result (Sylpheed 104,607 cap)

Pre-absorber (post-Stage-3+4): main matched-prefix = 104,607 (cap). Post-absorber: main matched-prefix = **105,046 (+439 events)**. The next divergence is at idx 105,046 on `VdInitializeEngines.return_value` (canary=1, ours=0) — an unrelated engine bug in the video subsystem, NOT a recurrence of the cap pattern. Sister chains preserved (11/32/4/41/16).

### Tests

Three unit tests in `test_diff_events.py`:

- `test_nested_cs_cleanup_block_absorbed_when_convergent` — folds one nested pair
- `test_nested_cs_cleanup_NOT_absorbed_when_followup_diverges` — confirms re-alignment requirement
- `test_nested_cs_cleanup_NOT_absorbed_when_canary_has_no_followup` — negative case

## sema.release (v1.6 — added in AUDIT-069 Session 6, 2026-05-21)

### Motivation

AUDIT-069 Sessions 1-5 established that ours under-produces semaphore releases by ~80% on the work-semaphore vs canary (`99 vs 414` in S5, refined in S6 to `83 vs 414` apples-to-apples on the work semaphore alone). The measurement infrastructure was a one-off cvar (`audit_70_semaphore_release_watch`, hand-built per-handle log lines) plus an ours-side `--lr-trace` capture at the wrapper-entry PC. Future AUDIT-070+ sessions and any general regression triage need this metric to be diff-visible without bespoke cvars per investigation.
`sema.release` lifts the AUDIT-070 cvar's signal into the Phase A schema as a **symmetric** event kind in both engines.

### Event shape

```json
{
  "schema_version": 1,
  "engine": "canary",
  "kind": "sema.release",
  "tid": ,
  "tid_event_idx": ,
  "guest_cycle": ,
  "host_ns": ,
  "deterministic": true,
  "payload": {
    "handle_semantic_id": "HHHHHHHHHHHHHHHH",  // shared-global SID for the work-sem
    "raw_handle_id": "0xHHHHHHHH",             // engine-local
    "release_count": 1,                        // games typically release 1
    "previous_count": 0,                       // semaphore count BEFORE release
    "caller_pc": "0xHHHHHHHH"                  // guest LR at release time
  }
}
```

### SID recipe

The work-semaphore in Sylpheed (canary handle `0xF800003C`, ours handle `0x1044`) is a **process-global dispatcher** in the C+18 sense: it lives in pre-allocated guest memory and is touched by multiple guest threads (main, worker, cache-thread, other producers). Its `handle_semantic_id` SHOULD use the **shared-global recipe** (`ComputeSharedGlobalSemanticId(dispatcher_ptr, kObjSemaphore=0x03)`) so canary and ours produce the same SID for the same guest dispatcher.

Per-thread semaphores (rare in Sylpheed) MAY use the v1 per-thread recipe; the diff tool does NOT compare SIDs for `sema.release` (the kind is engine-local positionally — see below).

### Why engine-local

Per AUDIT-069 H3 and S6's first-N=20 measurement, the cadence and ordinal interleaving of releases between the worker, main, and cache-thread are **timing-dependent**: the first 20 releases match perfectly across engines, but worker tid diverges at canary ord=83 when the cache-thread's first release fires (which ours never emits because ours's cache-thread wedges at `sub_821CB030+0x1AC`). Strict positional alignment would always trip on this known divergence.

`sema.release` is therefore in `ENGINE_LOCAL_KINDS` in the diff tool (alongside `contention.observed`): both engines emit, but the diff tool advances past these events on either side without alignment.
The **count** is surfaced in the report's "Counted engine-local kinds" summary table (per-tid + total per engine) so cadence regressions are diff-visible at-a-glance.

### Emit points (planned, NOT yet wired)

- **Canary**: extend `audit_70_semaphore_release_watch` to call `phase_a::EmitSemaRelease(handle, count, prev_count)` from `NtReleaseSemaphore_entry` + `xeKeReleaseSemaphore`. Cvar gating remains the existing `audit_70_semaphore_release_watch` (or a new `phase_a_event_log_sema_releases=false` for finer control).
- **Ours**: emit `sema.release` from `nt_release_semaphore` in `crates/xenia-kernel/src/exports.rs` and from `KSemaphore::release` (kernel-mode equivalent). Default-off via a runtime flag; default cold runs must remain digest-stable.

Both engines MUST emit at handler entry (not wrapper-internal) so the event count corresponds 1:1 to guest `NtReleaseSemaphore` invocations, matching the canary cvar's existing semantics.

### Status

- **Diff tool**: support landed (this session, v1.6). `sema.release` in `ENGINE_LOCAL_KINDS` + `COUNTED_ENGINE_LOCAL_KINDS`; counts surfaced in report summary; 3 new tests in `test_diff_events.py`.
- **Canary emit**: NOT YET WIRED. Planned for AUDIT-070+ when the root cause investigation requires it. Existing cvar `audit_70_semaphore_release_watch` continues to emit non-schema log lines (used by S5/S6 captures).
- **Ours emit**: NOT YET WIRED. See above.

### Backward compatibility

- Wire format unchanged. `schema_version` is still `1`.
- Pre-v1.6 event logs (no `sema.release` events) trigger no new behavior — the engine-local skip branches are inert; the "Counted engine-local kinds" report section is suppressed when no counted-kind events exist.
- Diff tool changes are purely additive: existing engine binaries diff identically pre- and post-v1.6.
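The skip-and-count treatment of engine-local kinds can be illustrated with a small Python sketch. It is a simplification under assumptions: the set contents below are inferred from this document, `split_engine_local` is a hypothetical name, and the sketch pre-filters a stream where the real tool advances a per-tid pointer during the walk.

```python
from collections import Counter

# Assumed contents, based on the kinds this document places in each set.
ENGINE_LOCAL_KINDS = frozenset({"contention.observed", "sema.release"})
COUNTED_ENGINE_LOCAL_KINDS = frozenset({"sema.release"})


def split_engine_local(events):
    """Return (events to align, per-(tid, kind) counts of counted kinds).

    Engine-local events are dropped from alignment; counted ones are tallied
    so a report can show cadence per tid and in total.
    """
    aligned = []
    counts = Counter()
    for ev in events:
        kind = ev["kind"]
        if kind in ENGINE_LOCAL_KINDS:
            if kind in COUNTED_ENGINE_LOCAL_KINDS:
                counts[(ev["tid"], kind)] += 1
            continue  # never compared positionally
        aligned.append(ev)
    return aligned, counts


# Made-up stream for one engine.
stream = [
    {"tid": 6, "kind": "kernel.call", "payload": {"name": "NtReleaseSemaphore"}},
    {"tid": 6, "kind": "sema.release", "payload": {"release_count": 1}},
    {"tid": 6, "kind": "contention.observed", "payload": {"contended": True}},
    {"tid": 7, "kind": "sema.release", "payload": {"release_count": 1}},
    {"tid": 6, "kind": "kernel.return", "payload": {"name": "NtReleaseSemaphore"}},
]
aligned, counts = split_engine_local(stream)
total_releases = sum(n for (_, kind), n in counts.items() if kind == "sema.release")
```

With this input, only the `kernel.call` / `kernel.return` pair remains for alignment, and the release total is 2 (one on tid 6, one on tid 7).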
## Host-heap payload-field canonicalization (v1.7 — added in Phase C+22, 2026-05-26)

### Motivation

C+2 (`ALLOCATOR_RETURN_FNS`) canonicalizes `kernel.return.return_value` for a known set of host-allocator-returning exports (`MmAllocatePhysicalMemoryEx`, `RtlAllocateHeap`, …). That covers the case where the allocated VA appears as the function's *return* value. But the same allocator-drift class (AUDIT-043 ε: canary's BC physical heap `0xBCxxxxxx` vs ours's unified user heap `0x4xxxxxxx`) ALSO surfaces inside **typed event payloads** of non-allocator exports — most notably the `thread.create.ctx_ptr` field, which holds the host-allocated TLS/context block that `ExCreateThread` passes to the new guest thread's r3.

Empirical surface (C+22 cold-vs-cold idx 105,128 on the Sylpheed audio-stack worker `ExCreateThread(entry_pc=0x824cd458)`):

| field | canary | ours |
|---|---|---|
| `ctx_ptr` | `0xbe56bb3c` (BC physical heap) | `0x42453b3c` (unified user heap) |
| `entry_pc` | `0x824cd458` | `0x824cd458` (bit-identical — game code) |
| `priority` | `0` | `0` |
| `affinity` | `4` | `4` |
| `stack_size` | `32768` | `32768` |
| `suspended` | `false` | `false` |

The C+2 `ALLOCATOR_RETURN_FNS` mechanism doesn't help here because `ExCreateThread`'s return value is the new thread's *handle* (canary's `0xF8xxxxxx` vs ours's `0x4, 0x8, …`), already covered by `handle_semantic_id` skip-policy. The host-heap-allocated context block is a side-channel field inside the `thread.create` event payload.

### The fix

`HOST_HEAP_PAYLOAD_FIELDS_BY_KIND` maps event kind → tuple of payload field names. Each listed field's value (expected `0x`-prefixed hex string) is rewritten to a per-(tid, kind, field) ordinal sentinel `__>` BEFORE payload comparison. The mechanism mirrors `canonicalize_allocator_returns` exactly, restricted to typed payload fields.
Initial set (v1.7):

```python
HOST_HEAP_PAYLOAD_FIELDS_BY_KIND = {
    "thread.create": ("ctx_ptr",),
}
```

### Strict-field preservation

For each canonicalized event kind, the **strict** fields (game-visible attributes that MUST match across engines) are untouched. For `thread.create` these are:

- `entry_pc` — guest VA of the new thread's entry function, bit-identical in both engines because both engines load the same XEX and the entry comes from guest code.
- `priority`, `affinity`, `stack_size`, `suspended` — game-visible thread attributes the guest passes to `ExCreateThread`.

Skip-policy fields (`handle_semantic_id`, `parent_tid`) continue to be skipped via `SKIP_PAYLOAD_FIELDS_BY_KIND` (unchanged from C+15-α — see "Diff-tool field-comparison rules" above).

### Why `parent_tid` does NOT need new canonicalization

Per the C+15-α skip-policy table, `parent_tid` is already in `SKIP_PAYLOAD_FIELDS_BY_KIND["thread.create"]`. The diff tool pairs guest TIDs at the chain level (`--tid-map` or `auto_tid_map`), and the per-event `parent_tid` is engine-local (canary tid=6 vs ours tid=1 for the same logical "main thread" chain). Skipping is sufficient — no ordinal sentinel needed.

Could a future schema v2 canonicalize `parent_tid` via the tid map? Yes, but it would surface mismatches as a *map gap* rather than as a clearer per-tid alignment failure that's already visible at chain boundaries. The v1.x skip-policy is the simpler choice; tests pin the existing behavior so it doesn't regress.

### Ordinal-count contract

As with `ALLOCATOR_RETURN_FNS`: if one engine emits MORE `thread.create` events on a given tid than the other, ordinals drift and the next typed event surfaces a divergence against whatever the other side has at that position. Ordinal-count mismatch IS a behavioral divergence — the canonicalization preserves divergence detection, only collapsing host-allocator-VA noise.
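A minimal sketch of the ordinal-sentinel rewrite, assuming a hypothetical function name and a hypothetical sentinel spelling (the real sentinel format in the diff tool may differ). Ordinals are counted per (tid, kind, field), while the sentinel text itself omits the tid so that mapped tids in the two engines can compare equal.

```python
HOST_HEAP_PAYLOAD_FIELDS_BY_KIND = {
    "thread.create": ("ctx_ptr",),
}


def canonicalize_host_heap_fields_sketch(events):
    """Replace listed host-heap VA fields with ordinal sentinels.

    Returns new event dicts; the inputs are not mutated.
    """
    next_ordinal = {}  # (tid, kind, field) -> next ordinal to hand out
    out = []
    for ev in events:
        payload = dict(ev.get("payload", {}))
        for field in HOST_HEAP_PAYLOAD_FIELDS_BY_KIND.get(ev.get("kind"), ()):
            value = payload.get(field)
            if not isinstance(value, str):
                continue  # non-string or missing: untouched, no ordinal consumed
            key = (ev["tid"], ev["kind"], field)
            n = next_ordinal.get(key, 0)
            # Hypothetical sentinel spelling, for illustration only.
            payload[field] = f"<host_heap:{ev['kind']}.{field}#{n}>"
            next_ordinal[key] = n + 1
        out.append({**ev, "payload": payload})
    return out


# The two ctx_ptr values from the empirical table, on mapped tids 6 and 1.
canary = [{"tid": 6, "kind": "thread.create",
           "payload": {"ctx_ptr": "0xbe56bb3c", "entry_pc": "0x824cd458"}}]
ours = [{"tid": 1, "kind": "thread.create",
         "payload": {"ctx_ptr": "0x42453b3c", "entry_pc": "0x824cd458"}}]
same = (canonicalize_host_heap_fields_sketch(canary)[0]["payload"]
        == canonicalize_host_heap_fields_sketch(ours)[0]["payload"])
```

After the rewrite the two payloads compare equal, while a mismatch in a strict field such as `entry_pc` would still surface because those fields are never touched.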
### Defensive value handling

If `ctx_ptr` is non-string (`None`, int, missing) — pre-C+22 event logs whose emitter omits the field — the canonicalizer leaves it untouched and does NOT consume an ordinal. The next string-typed value gets ordinal 0. This keeps pre-v1.7 logs diffable without forcing an emitter retrofit.

### Backward compatibility

- Wire format unchanged. `schema_version` is still `1`.
- Pre-C+22 event logs whose `thread.create.ctx_ptr` happens to bit-match (e.g. static-allocator addresses like `0x828F3D08` that BOTH engines use for the pre-XEX kernel-state ctxs) still match strictly via the ordinal sentinel — they get the same ordinal in both engines.
- The `--no-canonicalize-host-heap-fields` CLI flag disables the pass (reverts to raw-VA comparison), mirroring the existing `--no-canonicalize-allocators`. Used by gate tests and investigation rerun.
- Engine source is UNCHANGED in C+22 — the fix is in the diff tool only.

### Extension shape

The map shape `kind -> (field, …)` is intentionally minimal: each entry is one event kind plus the fields on it that hold host-heap VAs. Future entries could include e.g. `thread.create.tls_ptr` (if such a field is added to the schema) or a hypothetical `vfs.mmap.host_ptr`. Strict-field policy remains: any field NOT listed here is compared bit-identically.

## Forward compatibility

Phase A's original schema-v1 declared 13 sections (16 distinct kind strings); Phase A wired 4 of them. Phase C+15-α wired an additional 5 (`handle.create`, `handle.destroy`, `thread.create`, `thread.exit`, `wait.begin`). `wait.end`, `thread.suspend/resume`, `mem.write`, `vfs.open/read/close` remain declared but unwired; adding them is additive surface area at schema v1.1+.

A future schema v2 may break wire format (e.g. canonical SIDs, structured args). Both engines pin `schema_version = 1` in this phase; the diff tool refuses to mix v1 and v2 inputs.