xenia-rs/audit-runs/phase-a-diff-harness/schema-v1.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Phase A — Event Log Schema v1
**Status:** frozen for Phase A and Phase B. Adding a new event kind requires a `schema_version` bump and a coordinated update in both engines + the diff tool.
## Wire format
JSONL — one JSON object per line, UTF-8, `\n`-terminated. Both engines emit the same byte format.
The **first line** of every event-log file MUST be a `schema_version` event:
```json
{"schema_version":1,"engine":"canary","kind":"schema_version","tid":0,"tid_event_idx":0,"guest_cycle":0,"host_ns":0,"deterministic":true,"payload":{"version":1,"emitter_build":"<commit-or-build-id>"}}
```
The diff tool refuses to parse a file whose first event is not `schema_version` with version `1`.
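The header rule above can be sketched as a small check. This is only an illustration of the stated rule, not the diff tool's actual code; the function name `check_header` is hypothetical.

```python
import json


def check_header(first_line: str) -> dict:
    """Parse the first JSONL line and reject anything that is not a v1 header.

    Hypothetical helper: it only mirrors the rule that the first event must be
    a `schema_version` event whose payload version is 1.
    """
    event = json.loads(first_line)
    if event.get("kind") != "schema_version":
        raise ValueError("first event is not a schema_version event")
    if event.get("payload", {}).get("version") != 1:
        raise ValueError("unsupported schema version")
    return event


header = (
    '{"schema_version":1,"engine":"canary","kind":"schema_version",'
    '"tid":0,"tid_event_idx":0,"guest_cycle":0,"host_ns":0,'
    '"deterministic":true,"payload":{"version":1,"emitter_build":"abc123"}}'
)
parsed = check_header(header)
```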
## Common fields (every event)
| Field | Type | Notes |
|---|---|---|
| `schema_version` | int | always `1` in this phase |
| `engine` | string | `"canary"` or `"ours"` |
| `kind` | string | one of the v1 kinds below |
| `tid` | int | guest thread id of the calling thread (host TID never logged) |
| `tid_event_idx` | int | **per-tid monotonic, starts at 0** — the diff key |
| `guest_cycle` | int | per-engine monotonic guest-instruction count; `0` if the engine cannot supply one (see "Cycle source" below). NOT used by the diff tool for correctness — `tid_event_idx` is the canonical key. |
| `host_ns` | int | host monotonic-clock ns since process start; debug only, never compared by diff |
| `deterministic` | bool | `false` if any payload field is derived from host time / raw allocator address / RNG / etc. Diff tool skip-compares non-deterministic fields. |
| `payload` | object | kind-specific (see below) |
## Cycle source notes
- **canary**: the PPC `tb` (timebase) register can be read from the PPCContext passed into shim handlers. If a hook is on a path that does not have access to a PPCContext (e.g. a host-side handle-table destructor), the emitter MUST set `guest_cycle = 0` and set the event's `deterministic` field to `false`. The diff tool ignores `guest_cycle` for ordering; `tid_event_idx` is canonical.
- **ours**: `scheduler.thread(current_ref()).ctx.timebase` (already maintained per guest thread).
## Per-tid event index
Both engines maintain a per-tid monotonic counter starting at `0`. Each event is stamped with the counter's current value and the counter is bumped **after** the event is serialized, so the first event for tid `N` has `tid_event_idx = 0`.
The `schema_version` event is special: it is emitted by the writer thread (typically the boot thread before any guest code has run) with `tid = 0` and `tid_event_idx = 0`. The actual guest thread `0` does not exist; the diff tool treats `tid = 0` as the schema header only.
## Handle semantic ID
Canary and ours produce guest handles in different ranges (canary: `0xF8xxxxxx` region; ours: bump-allocated `0x4, 0x8, 0xC, …`). Raw handle IDs are unsuitable as a cross-engine identity. Instead, both engines compute a stable **handle semantic ID** at handle creation time using **FNV-1a 64-bit** over a fixed-format byte string. FNV-1a is used (not SHA256) because both engines can implement it in <10 lines with no dependency, and the diff tool only needs a deterministic identity hash — not a crypto property.
```
input_bytes = le_u32(create_site_pc) ‖ le_u32(creating_tid) ‖ le_u64(tid_event_idx_at_creation) ‖ le_u32(object_type)
hash = 0xCBF29CE484222325
for each byte b in input_bytes:
hash = (hash XOR b) * 0x100000001B3 mod 2^64
handle_semantic_id = format("{:016x}", hash)
```
Both engines MUST emit the lowercase 16-hex-char form. The `create_site_pc` is the guest PC at the call site of the kernel call that created the handle: in canary, `PPCContext::lr - 4` (the `bl` to the import stub); in ours, the equivalent return address from the syscall dispatcher.
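A minimal Python sketch of the recipe above, assuming the four inputs are already known as integers. The function names are illustrative and are not taken from either engine; `struct.pack` with the `<` prefix gives the unpadded little-endian layout (4 + 4 + 8 + 4 = 20 bytes).

```python
import struct

FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """FNV-1a 64-bit: XOR each byte in, then multiply by the prime mod 2^64."""
    h = FNV64_OFFSET_BASIS
    for b in data:
        h = ((h ^ b) * FNV64_PRIME) & MASK64
    return h


def handle_semantic_id(create_site_pc: int, creating_tid: int,
                       tid_event_idx_at_creation: int, object_type: int) -> str:
    """Per-thread SID: hash of le_u32 | le_u32 | le_u64 | le_u32, as 16 lowercase hex chars."""
    data = struct.pack("<IIQI", create_site_pc, creating_tid,
                       tid_event_idx_at_creation, object_type)
    return format(fnv1a_64(data), "016x")


# Example inputs are made up: an Event (0x01) created by tid 1 at its event 42.
sid = handle_semantic_id(0x82001234, 1, 42, 0x01)
```

Because `creating_tid` is part of the input, changing only the tid changes the SID, which is the property the C+15-α skip policy later has to work around.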
**Object type codes** (v1 — both engines agree):
| Code | Type |
|---|---|
| `0x00` | Unknown |
| `0x01` | Event |
| `0x02` | Mutant |
| `0x03` | Semaphore |
| `0x04` | Timer |
| `0x05` | Thread |
| `0x06` | File |
| `0x07` | IoCompletion |
| `0x08` | Module |
| `0x09` | EnumState |
| `0x0A` | Section |
| `0x0B` | Notification |
All subsequent events that reference a handle emit BOTH `handle_semantic_id` (the diff key) and `raw_handle_id` (engine-local, never compared).
## Event kinds (v1)
### `schema_version`
Header event. `payload = {"version": 1, "emitter_build": "<string>"}`.
### `thread.create`
Emitted by the **parent** thread at the kernel call that creates the new thread.
```json
"payload": {
"handle_semantic_id": "0123456789abcdef",
"parent_tid": 1,
"entry_pc": "0x82001234",
"ctx_ptr": "0xbce25340",
"priority": 0,
"affinity": 1,
"stack_size": 65536,
"suspended": false
}
```
### `thread.exit`
Emitted by the **exiting** thread (last event before tid disappears).
```json
"payload": {"exit_code": 0}
```
### `thread.suspend` / `thread.resume`
```json
"payload": {"target_tid": 13}
```
### `kernel.call`
Emit at handler entry, **before** any side effects.
```json
"payload": {
"name": "NtCreateFile",
"args": {"file_handle_ptr": "0x70000010", "desired_access": "0x80100080", "obj_attr_ptr": "0x70000020", ...},
"args_resolved": {"path": "\\Device\\Cdrom0\\dat\\movie\\opening.bik"}
}
```
- Numeric args use `0x`-prefixed hex strings if pointer-typed; ints stay as ints.
- `args_resolved` is a best-effort dereference (strings, struct dumps, buffer summaries). Optional.
### `kernel.return`
Emit at handler exit, **after** all side effects committed.
```json
"payload": {
"name": "NtCreateFile",
"return_value": 0,
"status": "0x00000000",
"side_effects": [
{"kind": "handle.create", "handle_semantic_id": "...", "object_type": 6, "raw_handle_id": "0x40"}
]
}
```
The `side_effects` array MAY duplicate events also emitted as standalone (`handle.create`). The diff tool treats both as authoritative; duplicates do not cause divergence.
### `handle.create`
For host-side creates not tied to a kernel call (rare).
```json
"payload": {
"handle_semantic_id": "0123456789abcdef",
"object_type": 1,
"object_name": null,
"raw_handle_id": "0xf8000048"
}
```
### `handle.destroy`
```json
"payload": {
"handle_semantic_id": "0123456789abcdef",
"raw_handle_id": "0xf8000048",
"prior_refcount": 1
}
```
### `wait.begin`
```json
"payload": {
"handles_semantic_ids": ["0123...", "abcd..."],
"timeout_ns": -1,
"alertable": false,
"wait_type": "any"
}
```
`timeout_ns = -1` means INFINITE. `wait_type` is `"any"` or `"all"`.
### `wait.end`
```json
"payload": {
"status": "0x00000000",
"woken_by_semantic_id": "0123456789abcdef",
"wait_duration_cycles": 12345
}
```
`wait_duration_cycles` is `deterministic = false` (host scheduling affects it). `woken_by_semantic_id` is null on timeout / alerted.
### `mem.write`
**OPT-IN — gated by a separate cvar (`phase_a_event_log_mem_writes`, default false).** In Phase A this kind is reserved; emitters MAY ship a TODO stub. Schema:
```json
"payload": {
"guest_addr": "0x82000000",
"value": "0x12345678",
"size": 4,
"source": "guest_jit"
}
```
### `vfs.open` / `vfs.read` / `vfs.close`
File-IO events, separate from `kernel.call` so the diff tool can match on canonical path:
```json
"payload": {"canonical_path": "\\Device\\Cdrom0\\dat\\movie\\opening.bik", "raw_handle_id": "0x40", "handle_semantic_id": "..."}
```
### `import.call`
Emitted at the syscall dispatcher (ours) or the import-stub JIT trap (canary), one per imported function invocation, **before** the implementing `kernel.call`.
```json
"payload": {
"module": "xboxkrnl.exe",
"ord": 0x101,
"name": "NtCreateFile"
}
```
## Diff-tool field-comparison rules
| Field | Rule |
|---|---|
| `engine` | skipped (always differs) |
| `host_ns` | skipped (host-clock) |
| `guest_cycle` | skipped (engines disagree on absolute count; diff uses `tid_event_idx`) |
| `raw_handle_id` | skipped (engines use different handle namespaces) |
| `handle_semantic_id` | **C+15-α: skipped** (engine-local — see below) |
| `handles_semantic_ids` (wait.begin) | **C+15-α: skipped** (same reason) |
| `parent_tid` (thread.create) | **C+15-α: skipped** (engine-local guest tids) |
| `ctx_ptr` (thread.create) | **C+22 v1.7: per-(tid, kind, field) ordinal sentinel** (`<HOSTHEAP_thread.create_ctx_ptr_N>`) — host-heap-derived VA, AUDIT-043 ε class |
| `woken_by_semantic_id` (wait.end) | **C+15-α: skipped** (engine-local SID) |
| `deterministic` (event-level field) | skipped (metadata) |
| Any payload field listed under a non-deterministic kind | skipped where flagged |
| All other payload fields | strict equality |
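
The rules in this table can be sketched as a comparison function. This is a simplified illustration, not the diff tool's code: the skip list is flattened across kinds, nested objects such as `side_effects` are compared as-is, and the `ctx_ptr` sentinel rewrite and the tid map are left out. `first_payload_difference` is a hypothetical name.

```python
# Flat skip list taken from the table above. The real tool may scope these
# per event kind; this sketch does not.
SKIP_PAYLOAD_FIELDS = {
    "raw_handle_id",
    "handle_semantic_id",
    "handles_semantic_ids",
    "parent_tid",
    "woken_by_semantic_id",
    "wait_duration_cycles",  # flagged non-deterministic under wait.end
}


def first_payload_difference(canary_event: dict, ours_event: dict):
    """Return the first strictly compared field that differs, or None on a match.

    Top-level `engine`, `host_ns`, `guest_cycle` and `deterministic` are never
    inspected, which is how this sketch expresses "skipped".
    """
    if canary_event["kind"] != ours_event["kind"]:
        return "kind"
    cp = canary_event.get("payload", {})
    op = ours_event.get("payload", {})
    for field in sorted((set(cp) | set(op)) - SKIP_PAYLOAD_FIELDS):
        if cp.get(field) != op.get(field):
            return "payload." + field
    return None


canary = {"engine": "canary", "kind": "kernel.return", "host_ns": 10,
          "payload": {"name": "NtClose", "return_value": 0,
                      "raw_handle_id": "0xf8000048"}}
ours = {"engine": "ours", "kind": "kernel.return", "host_ns": 99,
        "payload": {"name": "NtClose", "return_value": 0,
                    "raw_handle_id": "0x40"}}
result = first_payload_difference(canary, ours)  # None: only skipped fields differ
```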
### Phase C+15-α note on `handle_semantic_id`
The SID computation includes `creating_tid` as input, but guest TIDs differ
between engines (canary's tid=6 maps to ours's tid=1 on the main chain).
Both engines compute SIDs **using their own local tids**, so the same logical
handle gets two different SIDs across engines. The diff tool skip-compares
SID fields and relies on `tid_event_idx + object_type` for alignment.
A future schema v2 could canonicalize SIDs via the diff tool's tid map and
restore strict comparison. For v1.1 the simpler skip-policy suffices.
## Shared-global SIDs (v1.2 — added in Phase C+18)
A subset of guest kernel dispatcher objects (`KEVENT`, `KSEMAPHORE`,
`KTIMER`, `KMUTANT`) are **process-global**: they live in
statically-initialized or pre-allocated guest memory and are touched
by MULTIPLE guest threads during boot. Examples include the XAudio
voice-volume change-mask semaphore at `0x828a3230` in Sylpheed.
Canary's `XObject::GetNativeObject` (`src/xenia/kernel/xobject.cc:397-483`)
and ours's `ensure_dispatcher_object` (`crates/xenia-kernel/src/exports.rs:4363`)
**lazy-wrap** these dispatchers on **first guest-thread touch**: the
first `KeWait*` invocation that passes the raw kernel-object pointer
synthesizes the `XObject` wrapper, stamps the `X_DISPATCH_HEADER` with
the `kXObjSignature` marker (`'X','E','N','\0' = 0x58454E00`), stashes
the handle, and emits `handle.create`. Subsequent touches find the
marker and short-circuit without emit (per-pointer idempotent).
### The first-toucher race
**Which** guest thread wins the "first toucher" race is
**timing-dependent**:
- Canary and ours have different host schedulers, JIT throughput, and
guest-thread bootstrap ordering.
- Even within the same engine across runs the first-toucher can
differ — but each engine produces a deterministic per-run total
ordering, so cold-vs-cold reproducibility holds.
The per-thread SID recipe `semantic_id(create_site_pc, creating_tid,
tid_event_idx_at_creation, object_type)` (v1) depends on BOTH
`creating_tid` and `tid_event_idx_at_creation`, so:
- Same dispatcher → DIFFERENT SIDs in each engine (race-dependent).
- `handle.create` for the same object lands on different per-tid
streams in canary vs ours.
The C+17 fix made ours emit `handle.create` for these synthesized
shadows, but the C+17 D-NEW-3 regression on tid=15→10 was
exactly the first-toucher race: ours's tid=10 was the first toucher
locally; canary's tid=15 was NOT the first toucher in its run — some
other canary tid had already adopted `0x828a3230`. ours's tid=10
emitted an "extra" `handle.create` that canary's tid=15 lacked, and
the diff tool flagged a kind mismatch at idx=2.
### The C+18 fix: deterministic SID recipe
Process-global dispatchers use a **second** SID recipe that is
scheduling-invariant. Both engines now use:
```
SHARED_GLOBAL_SID_MARKER = 0xC01AB005 (fixed sentinel, both engines)
input_bytes =
le_u32(SHARED_GLOBAL_SID_MARKER) // 4 bytes — "create_site_pc" slot
‖ le_u32(0) // 4 bytes — "creating_tid" slot
‖ le_u64(pointer) // 8 bytes — "tid_event_idx" slot
‖ le_u32(object_type) // 4 bytes
hash = FNV-1a-64(input_bytes)
shared_global_sid = format("{:016x}", hash)
```
The marker `0xC01AB005` is outside any plausible guest-PC range
(PPC text 0x82000000-0x82FFFFFF; XEX header 0x3001xxxx; heap
0x4xxxxxxx), so it can never collide with a regular per-thread SID
(which uses a real guest PC as `create_site_pc`).
Both engines compute the SAME SID for the same dispatcher pointer
regardless of:
- which guest thread is the first toucher,
- the `tid_event_idx_at_creation`,
- the per-engine scheduling order.
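
A minimal Python sketch of the shared-global recipe, reusing the same 20-byte layout as the per-thread recipe. The function names are illustrative; only the marker value and the field layout come from the text above.

```python
import struct

SHARED_GLOBAL_SID_MARKER = 0xC01AB005  # fixed sentinel from the recipe above
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """FNV-1a 64-bit over a byte string."""
    h = 0xCBF29CE484222325
    for b in data:
        h = ((h ^ b) * 0x100000001B3) & MASK64
    return h


def shared_global_sid(pointer: int, object_type: int) -> str:
    """Scheduling-invariant SID: marker | 0 | dispatcher pointer | object type."""
    data = struct.pack("<IIQI", SHARED_GLOBAL_SID_MARKER, 0, pointer, object_type)
    return format(fnv1a_64(data), "016x")


# The semaphore at 0x828a3230 (object type 0x03) hashes identically no matter
# which tid touches it first, because no tid or event index enters the input.
sid = shared_global_sid(0x828A3230, 0x03)
```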
### Which call sites use which recipe
| Call site | SID recipe |
|--------------------------------------------------------|-------------------|
| `KernelState::alloc_handle_for` (ours) | per-thread |
| `ObjectTable::AddHandle` direct (canary) | per-thread |
| `ensure_dispatcher_object` (ours) | **shared-global** |
| `XObject::GetNativeObject` synthesized (canary) | **shared-global** |
Regular per-thread `handle.create` events (file open, thread create,
named-event create, etc.) keep the v1 per-thread recipe. The
shared-global recipe is restricted to lazy-wrap synthesis.
### Diff tool: cross-tid floating `handle.create` matching
The diff tool pre-pass collects all shared-global SIDs in either
engine's stream. A `handle.create` event is detected as shared-global
by recomputing the deterministic SID from its `(raw_handle_id,
object_type)` payload and comparing against the event's
`handle_semantic_id`. Regular per-thread SIDs cannot match this check
by construction.
When per-tid alignment finds a kind mismatch and one side has a
shared-global `handle.create` whose SID is in the floating set:
- The diff tool advances ONLY that side's stream pointer past the
floating event.
- Re-compare at the same canonical position.
The diff report's summary table shows a `floating_skipped (c/o)`
column for visibility — counts of absorbed events per side.
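
The absorption step can be sketched as a two-pointer walk over one tid's pair of streams. This is a simplified illustration that compares only `kind` (payload comparison is omitted) and takes the floating set as a precomputed input; `align_with_floating_creates` is a hypothetical name, not the diff tool's function.

```python
def align_with_floating_creates(canary, ours, floating_sids):
    """Walk both streams; return (matched, skipped_canary, skipped_ours).

    On a kind mismatch, a shared-global `handle.create` on either side is
    absorbed by advancing only that side's pointer, then re-comparing.
    """
    def is_floating(event):
        return (event["kind"] == "handle.create"
                and event["payload"].get("handle_semantic_id") in floating_sids)

    ic = io = matched = skipped_c = skipped_o = 0
    while ic < len(canary) and io < len(ours):
        c, o = canary[ic], ours[io]
        if c["kind"] == o["kind"]:
            matched += 1
            ic += 1
            io += 1
        elif is_floating(c):
            ic += 1
            skipped_c += 1
        elif is_floating(o):
            io += 1
            skipped_o += 1
        else:
            break  # a real divergence: stop and let the caller report it
    return matched, skipped_c, skipped_o


canary_stream = [{"kind": "kernel.call", "payload": {}},
                 {"kind": "kernel.return", "payload": {}}]
ours_stream = [{"kind": "kernel.call", "payload": {}},
               {"kind": "handle.create",
                "payload": {"handle_semantic_id": "aaaaaaaaaaaaaaaa"}},
               {"kind": "kernel.return", "payload": {}}]
result = align_with_floating_creates(canary_stream, ours_stream,
                                     {"aaaaaaaaaaaaaaaa"})
```

With the extra `handle.create` in the floating set, both real events still match and one event is counted as skipped on the ours side; with an empty set the walk stops at the mismatch.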
### Index relaxation
The C+18 fix relaxes the legacy diff-tool rule that requires
`canary.tid_event_idx == ours.tid_event_idx` for matching events.
With floating absorption, the per-tid indices can drift by one per absorbed
event between the two sides, but the `kind` and `payload` comparisons remain
strict. The raw indices are still preserved on the events themselves
(useful for debugging and report context).
### Backward compatibility
- Wire format unchanged. `schema_version` is still `1`.
- Pre-C+18 event logs (no shared-global SIDs in the stream) trigger
the legacy code path automatically — the floating set is empty.
- The marker constant `0xC01AB005` MUST be exactly this value in both
engines and the diff tool. Tests in both engines plus
`tools/diff-events/test_diff_events.py` lock it in.
## Wait-begin floating absorb (v1.3 — added in Phase C+21)
### Motivation
Canary's `RtlEnterCriticalSection` (and its symmetric counterparts —
`KeWaitForSingleObject` invoked on a process-global dispatcher,
mutex/semaphore contended-acquire paths) emits `wait.begin` **only on
the contended slow path**. The fast path (uncontended atomic-CAS, or
recursive bump) emits NO `wait.begin` and only the `kernel.call` /
`kernel.return` pair. Which path is taken depends on whether ANOTHER
guest thread is currently holding the dispatcher when the wait is
attempted — i.e. it is **host-scheduler-driven**, varying across cold
runs of the same engine.
Reading-error class **#32** (documented in C+20's
`investigation.md`) captures this: cross-checking 3 fresh canary cold
runs at canary tid=6 idx 104,606 showed:
- jitter-1: `wait.begin sid=75ae880ec432eb36` (contended)
- jitter-2: `kernel.return` (fast — matches ours)
- jitter-3: offset-shifted wait.begin at a different idx with a
different SID
The matched-prefix metric is unreliable inside such regions if the
diff tool treats wait.begin events as strictly positional.
### The fix
A `wait.begin` event is **floating** if at least one of its
`payload.handles_semantic_ids` references a shared-global SID
(see §"Shared-global SIDs"). During the per-tid two-pointer walk:
- If one side has a floating `wait.begin` and the other has a
different kind at the same canonical position, advance ONLY the
wait.begin side's pointer and re-compare.
`wait_type=all` waits are floating as long as ANY single handle in
the set is shared-global — the entire wait's blocking behavior is
timing-dependent if even one of its handles is on a process-global
dispatcher.
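
The floating test for `wait.begin` reduces to a membership check over the handle list. A minimal sketch, assuming the shared-global SID set has already been collected; `is_floating_wait` is a hypothetical name.

```python
def is_floating_wait(event: dict, shared_global_sids: set) -> bool:
    """True if the event is a wait.begin that references any shared-global SID.

    `wait_type` is deliberately not consulted: an "all" wait floats as soon
    as a single handle in its set is shared-global.
    """
    if event.get("kind") != "wait.begin":
        return False
    sids = event.get("payload", {}).get("handles_semantic_ids", [])
    return any(sid in shared_global_sids for sid in sids)


shared = {"75ae880ec432eb36"}
wait_all = {"kind": "wait.begin",
            "payload": {"handles_semantic_ids": ["0123456789abcdef",
                                                 "75ae880ec432eb36"],
                        "timeout_ns": -1, "alertable": False,
                        "wait_type": "all"}}
floating = is_floating_wait(wait_all, shared)
```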
### Shared-global SID detection (extended in C+21)
The diff tool's `collect_shared_global_sids` pre-pass now unions
TWO sources:
1. **Recipe-matching `handle.create` events** (Phase C+18 — direct).
This catches ours's `ensure_dispatcher_object` output where
`raw_handle_id == ptr` (the recipe-input pointer).
2. **Cross-tid usage heuristic** (Phase C+21 — indirect). Any SID
referenced via `handle.create` OR `wait.begin` on **two or more
distinct guest tids** in EITHER engine is treated as shared-global.
The cross-tid heuristic exists because canary's
`EmitHandleCreateSharedGlobal` (`event_log.cc:435`) emits the SID
computed from the dispatcher VA but stashes
`object->handle()` (a handle-table slot in the `0xF8xxxxxx`
region) as `raw_handle_id`. Those two values DIFFER, so canary's
shared-global `handle.create` events are NOT recipe-recognizable
from their payload alone. Multi-tid SID usage is a robust
observational signal: per-thread SIDs by construction stay on the
single creating tid (their hash inputs include `creating_tid`),
so any cross-tid SID usage indicates a process-global dispatcher.
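
The cross-tid heuristic can be sketched as a single pass that records which tids reference each SID. This is an illustration only: `cross_tid_shared_sids` is a hypothetical name, and it would be run once per engine stream with the results unioned.

```python
from collections import defaultdict


def cross_tid_shared_sids(events) -> set:
    """Return SIDs referenced by handle.create or wait.begin on >= 2 distinct tids."""
    tids_by_sid = defaultdict(set)
    for event in events:
        payload = event.get("payload", {})
        if event["kind"] == "handle.create":
            sids = [payload.get("handle_semantic_id")]
        elif event["kind"] == "wait.begin":
            sids = payload.get("handles_semantic_ids", [])
        else:
            continue
        for sid in sids:
            if sid is not None:
                tids_by_sid[sid].add(event["tid"])
    return {sid for sid, tids in tids_by_sid.items() if len(tids) >= 2}


stream = [
    {"tid": 6, "kind": "handle.create",
     "payload": {"handle_semantic_id": "aaaaaaaaaaaaaaaa"}},
    {"tid": 15, "kind": "wait.begin",
     "payload": {"handles_semantic_ids": ["aaaaaaaaaaaaaaaa"]}},
    {"tid": 6, "kind": "handle.create",
     "payload": {"handle_semantic_id": "bbbbbbbbbbbbbbbb"}},
    {"tid": 6, "kind": "wait.begin",
     "payload": {"handles_semantic_ids": ["bbbbbbbbbbbbbbbb"]}},
]
shared = cross_tid_shared_sids(stream)
```

Here only the first SID is classified as shared-global, because the second one never leaves tid 6.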
### Risk of over-absorption (and why it's bounded)
The cross-tid heuristic could in principle mis-classify a per-thread
SID that one thread creates and another thread waits on — a
legitimate cross-thread synchronization pattern. The floating-absorb,
however, only fires on a **kind mismatch** at the canonical position.
Per-thread waits that match strictly on both sides advance normally
without any absorb. The heuristic only loosens alignment when one
side is missing a `handle.create` or `wait.begin` — exactly the
scheduling-jitter window the C+21 fix targets.
### Diff-tool report changes
The summary table's `floating_skipped (c/o)` column is split into
two columns:
- `floating_create (c/o)` — C+18 `handle.create` absorptions.
- `floating_wait (c/o)` — C+21 `wait.begin` absorptions.
Both columns are per-side and observation-only; counts may legitimately be
non-zero in a clean run.
### Backward compatibility
- Wire format unchanged. `schema_version` is still `1`.
- Pre-C+21 event logs (no `wait.begin` events that reference
shared-global SIDs) trigger no new behavior — the wait absorption
branches are inert.
- The C+18 floating-create logic is unchanged; the C+21 fix is
strictly additive.
- Engine source is UNCHANGED in C+21 — the fix is in the diff tool
only.
## contention.observed (v1.4 — added in Phase D Stage 1, 2026-05-18)
### Motivation
The 104,607 cap is canary's tid=6 contending on a CS while ours's tid=1
fast-paths through the same call (Phase C+22). Schedules diverge for
host-OS reasons, so neither engine is "wrong" — but matched-prefix
stalls. Phase D's H' approach makes ours's `rtl_enter_critical_section`
*replay* canary's contention by consulting a per-call manifest built
from canary's contention trace.
Stage 1 (this section) introduces the canary-side **emitter** for that
manifest: a new event kind `contention.observed` that fires from
`RtlEnterCriticalSection_entry` (`xboxkrnl_rtl.cc:596-633`) just before
the call falls through to `xeKeWaitForSingleObject` after spin-loop
exhaustion. Cvar-gated (`kernel_emit_contention`, default false) so
default canary behavior is byte-identical.
### Event shape
```json
{
"schema_version": 1,
"engine": "canary",
"kind": "contention.observed",
"tid": <guest tid of caller>,
"tid_event_idx": <per-tid ordinal consumes one slot>,
"guest_cycle": 0,
"host_ns": <emit timestamp>,
"deterministic": true,
"payload": {
"cs_ptr": "0xHHHHHHHH", // guest VA of the RTL_CRITICAL_SECTION
"site_sid": "HHHHHHHHHHHHHHHH", // shared-global SID (see below)
"contended": true // always true at v1.4 (uncontended is implicit)
}
}
```
`site_sid` is computed via the **C+18 shared-global SID recipe**:
```
site_sid = FNV-1a-64 over
( kSharedGlobalSidMarker [u32 LE] // 0xC01AB005
, 0 [u32 LE] // creating_tid (unused)
, cs_ptr as u64 [u64 LE] // pointer-as-idx
, kObjCriticalSection [u32 LE] // 0x0C, new in v1.4
)
```
Both engines compute the same SID for the same CS pointer. The
constant `kObjCriticalSection = 0x0C` is the new ObjectType value
introduced for this kind; it does NOT correspond to a real XObject
(CS lives as a guest-memory struct, not a handle-tabled object).
### When emitted (canary)
In `RtlEnterCriticalSection_entry`:
1. Recursive-lock fast path (already own lock) → **NO emit** (not contention).
2. Spin-loop succeeds (`atomic_cas` flips `lock_count` from -1 → 0) → **NO emit** (fast acquire).
3. Spin-loop exhausted **AND** `atomic_inc(&cs->lock_count) != 0` → **EMIT** with `contended=true`, then `xeKeWaitForSingleObject`.
4. Spin-loop exhausted **AND** `atomic_inc(...) == 0` (CS became free between spin and inc) → **NO emit** (we won the race after spin).
The emit point sits **between** atomic_inc's positive result and the
`xeKeWaitForSingleObject` call, so the new event always precedes the
existing `wait.begin` event in the per-tid ordinal.
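
The four cases above, plus the cvar gate, can be condensed into one predicate. This is a Python model of the decision only, not canary's C++ code; the parameter names are assumptions chosen for the illustration.

```python
def emits_contention_observed(cvar_enabled: bool, owns_lock: bool,
                              spin_acquired: bool, inc_result: int) -> bool:
    """Model of when `contention.observed` fires in RtlEnterCriticalSection_entry.

    cvar_enabled  -- value of kernel_emit_contention (default false)
    owns_lock     -- caller already owns the CS (recursive fast path)
    spin_acquired -- the spin loop's atomic_cas took the lock
    inc_result    -- result of atomic_inc(&cs->lock_count) after spin exhaustion
    """
    if not cvar_enabled:
        return False  # default behavior: never emit
    if owns_lock:
        return False  # case 1: recursion is not contention
    if spin_acquired:
        return False  # case 2: fast acquire
    return inc_result != 0  # case 3 emits, case 4 (== 0) does not


contended = emits_contention_observed(True, False, False, 1)
```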
### When emitted (ours, Stage 3 — pending)
Stage 3 will add a symmetric emit in `rtl_enter_critical_section`
(`xenia-rs/crates/xenia-kernel/src/exports.rs:2886-2946`) at the
forced-park branch driven by the manifest. This keeps per-tid ordinals
aligned across engines after replay.
### Diff-tool treatment (Stage 4 — pending)
`contention.observed` will be added to `ENGINE_LOCAL_KINDS` in
`diff_events.py`: the per-tid pointer advances past these events on
either side without comparison. This keeps matched-prefix counts
unchanged when ONE side emits the event (Stage 1's canary-only world)
or when BOTH emit at the same ordinal (Stage 3's parity world).
### Cvar default + byte-identity
`kernel_emit_contention=false` by default. With cvar=false, the helper
`phase_a::EmitContentionObserved` short-circuits at the cvar check
before any `IsEnabled()` lookup. The pre-Stage-1 canary code path is
preserved byte-for-byte; cvar-OFF cold runs produce zero
`contention.observed` events (validated on the Stage 1 cold run:
0 occurrences in a 4.4 GB / 18.6 M event trace).
## Nested-CS-cleanup absorber (v1.5 — added in Phase D D-extension, 2026-05-18)
### Status
**Band-aid.** Explicit annotation: this absorber CROSSES the reading-error
#23 boundary in spirit. It folds real guest control-flow divergence at
the diff-tool layer. It exists because the underlying root cause —
producer-throughput divergence under the cooperative-vs-preemptive
scheduling mismatch (see Phase D forensics) — is **explicitly out of
scope** for the H' plan: fixing it in ours's engine would require
preempting the cooperative scheduler, which invalidates 23 phases of
digest stability. The absorber is the practical compromise.
### Trigger shape
The absorber fires ONLY at a kind mismatch of:
- canary[ic] = `import.call` with `payload.name == "RtlEnterCriticalSection"`
- ours[io] = `import.call` with `payload.name == "RtlLeaveCriticalSection"`
For any other kind mismatch, the absorber is silent. This narrowness is
intentional: real engine divergences appear in other shapes and must
still surface.
### Behavior
When the trigger pattern matches, canary's stream is scanned for one or
more balanced `[Enter-block, Leave-block]` pairs immediately following
the trigger position:
- An Enter-block is 3 consecutive events:
`import.call RtlEnterCriticalSection → kernel.call RtlEnterCriticalSection → kernel.return RtlEnterCriticalSection`.
- A Leave-block is 3 consecutive events with `RtlLeaveCriticalSection`.
The absorber consumes pairs greedily up to a cap of `_NESTED_CS_PAIR_CAP
= 32` pairs (empirically, Sylpheed's worst-case is ~10-15 pairs at the
104,607 cap). After consuming each pair, it checks whether canary's next
event has the SAME `kind` AND same `payload.name` as ours[io]. The first
convergence wins; canary's pointer is advanced past the absorbed pairs.
If no convergence is found within the cap, the absorber returns None
and the divergence falls through to normal reporting.
### Why this is safe (within #23's spirit)
1. The absorption only happens when canary's stream re-aligns with
ours's stream past the nested block. If it doesn't re-align, the
real divergence is reported.
2. The nested-block shape matches a specific PPC pattern: the consumer
thread in canary acquires a CS, calls a helper that iterates a
tree/registry, takes the nested-CS-enter path for each item, and
releases the outer CS. Ours's tree is shorter so it skips this.
The net effect on guest state is bounded: ours has fewer items
processed in this iteration, but the EVENT stream past the
absorption resumes the same logical operation.
3. The Phase B `image_loaded_sha256` is the foundational invariant.
It's unaffected by this absorber (no engine source change).
### Why this is NOT safe in the general sense
- Diverging downstream state IS lost: ours's tree has fewer entries
than canary's after the absorbed block. Subsequent ours operations
that touch the tree will behave differently. Other absorbers / fixes
will be needed if those state-differences manifest later.
- A future engine bug that produces a spuriously nested Enter+Leave
pair could be falsely absorbed. Mitigation: the absorber requires
canary's post-block stream to re-align with ours's; spurious nested
pairs without re-alignment fall through to normal divergence
reporting.
### Empirical result (Sylpheed 104,607 cap)
Pre-absorber (post-Stage-3+4): main matched-prefix = 104,607 (cap).
Post-absorber: main matched-prefix = **105,046 (+439 events)**.
The next divergence is at idx 105,046 on `VdInitializeEngines.return_value`
(canary=1, ours=0) — an unrelated engine bug in the video subsystem,
NOT a recurrence of the cap pattern. Sister chains preserved
(11/32/4/41/16).
### Tests
Three unit tests in `test_diff_events.py`:
- `test_nested_cs_cleanup_block_absorbed_when_convergent` — folds one nested pair
- `test_nested_cs_cleanup_NOT_absorbed_when_followup_diverges` — confirms re-alignment requirement
- `test_nested_cs_cleanup_NOT_absorbed_when_canary_has_no_followup` — negative case
## sema.release (v1.6 — added in AUDIT-069 Session 6, 2026-05-21)
### Motivation
AUDIT-069 Sessions 1-5 established that ours under-produces semaphore
releases by ~80% on the work-semaphore vs canary (`99 vs 414` in S5,
refined in S6 to `83 vs 414` apples-to-apples on the work semaphore
alone). The measurement infrastructure was a one-off cvar
(`audit_70_semaphore_release_watch`, hand-built per-handle log lines)
plus an ours-side `--lr-trace` capture at the wrapper-entry PC. Future
AUDIT-070+ sessions and any general regression triage need this metric
to be diff-visible without bespoke cvars per investigation.
`sema.release` lifts the AUDIT-070 cvar's signal into the Phase A
schema as a **symmetric** event kind in both engines.
### Event shape
```json
{
"schema_version": 1,
"engine": "canary",
"kind": "sema.release",
"tid": <guest tid of caller>,
"tid_event_idx": <per-tid ordinal consumes one slot>,
"guest_cycle": <PPC timebase>,
"host_ns": <emit timestamp>,
"deterministic": true,
"payload": {
"handle_semantic_id": "HHHHHHHHHHHHHHHH", // shared-global SID for the work-sem
"raw_handle_id": "0xHHHHHHHH", // engine-local
"release_count": 1, // games typically release 1
"previous_count": 0, // semaphore count BEFORE release
"caller_pc": "0xHHHHHHHH" // guest LR at release time
}
}
```
### SID recipe
The work-semaphore in Sylpheed (canary handle `0xF800003C`, ours
handle `0x1044`) is a **process-global dispatcher** in the C+18 sense:
it lives in pre-allocated guest memory and is touched by multiple
guest threads (main, worker, cache-thread, other producers). Its
`handle_semantic_id` SHOULD use the **shared-global recipe**
(`ComputeSharedGlobalSemanticId(dispatcher_ptr, kObjSemaphore=0x03)`)
so canary and ours produce the same SID for the same guest dispatcher.
Per-thread semaphores (rare in Sylpheed) MAY use the v1 per-thread
recipe; the diff tool does NOT compare SIDs for `sema.release` (the
kind is engine-local positionally — see below).
### Why engine-local
Per AUDIT-069 H3 and S6's first-N=20 measurement, the cadence and
ordinal interleaving of releases between the worker, main, and
cache-thread are **timing-dependent**: the first 20 releases match
perfectly across engines, but worker tid diverges at canary ord=83
when the cache-thread's first release fires (which ours never
emits because ours's cache-thread wedges at `sub_821CB030+0x1AC`).
Strict positional alignment would always trip on this known
divergence.
`sema.release` is therefore in `ENGINE_LOCAL_KINDS` in the diff tool
(alongside `contention.observed`): both engines emit, but the diff
tool advances past these events on either side without alignment.
The **count** is surfaced in the report's "Counted engine-local
kinds" summary table (per-tid + total per engine) so cadence
regressions are diff-visible at-a-glance.
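
The skip-and-count treatment can be sketched as a pre-filter over one engine's stream. The set names `ENGINE_LOCAL_KINDS` and `COUNTED_ENGINE_LOCAL_KINDS` appear in this document; their exact contents in the tool, the function name `split_engine_local`, and the filtering approach are assumptions made for the illustration.

```python
from collections import Counter

# Assumed contents, based only on the kinds this document names.
ENGINE_LOCAL_KINDS = {"contention.observed", "sema.release"}
COUNTED_ENGINE_LOCAL_KINDS = {"sema.release"}


def split_engine_local(events):
    """Return (events left for alignment, per-(kind, tid) counts of counted kinds)."""
    aligned = []
    counts = Counter()
    for event in events:
        kind = event["kind"]
        if kind in ENGINE_LOCAL_KINDS:
            if kind in COUNTED_ENGINE_LOCAL_KINDS:
                counts[(kind, event["tid"])] += 1
            continue  # never aligned or compared
        aligned.append(event)
    return aligned, counts


stream = [
    {"tid": 1, "kind": "kernel.call", "payload": {"name": "NtReleaseSemaphore"}},
    {"tid": 1, "kind": "sema.release", "payload": {"release_count": 1}},
    {"tid": 1, "kind": "contention.observed", "payload": {"contended": True}},
    {"tid": 2, "kind": "sema.release", "payload": {"release_count": 1}},
    {"tid": 1, "kind": "sema.release", "payload": {"release_count": 1}},
]
aligned, counts = split_engine_local(stream)
total_releases = sum(n for (kind, _tid), n in counts.items()
                     if kind == "sema.release")
```

Comparing `total_releases` between the two engines gives the cadence signal (for example the `83 vs 414` gap) without forcing positional alignment.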
### Emit points (planned, NOT yet wired)
- **Canary**: extend `audit_70_semaphore_release_watch` to call
`phase_a::EmitSemaRelease(handle, count, prev_count)` from
`NtReleaseSemaphore_entry` + `xeKeReleaseSemaphore`. Cvar gating
remains the existing `audit_70_semaphore_release_watch` (or a new
`phase_a_event_log_sema_releases=false` for finer control).
- **Ours**: emit `sema.release` from `nt_release_semaphore` in
`crates/xenia-kernel/src/exports.rs` and from
`KSemaphore::release` (kernel-mode equivalent). Default-off via a
runtime flag; default cold runs must remain digest-stable.
Both engines MUST emit at handler entry (not wrapper-internal) so the
event count corresponds 1:1 to guest `NtReleaseSemaphore` invocations,
matching the canary cvar's existing semantics.
### Status
- **Diff tool**: support landed (this session, v1.6). `sema.release`
in `ENGINE_LOCAL_KINDS` + `COUNTED_ENGINE_LOCAL_KINDS`; counts
surfaced in report summary; 3 new tests in `test_diff_events.py`.
- **Canary emit**: NOT YET WIRED. Planned for AUDIT-070+ when the
root cause investigation requires it. Existing cvar
`audit_70_semaphore_release_watch` continues to emit non-schema
log lines (used by S5/S6 captures).
- **Ours emit**: NOT YET WIRED. See above.
### Backward compatibility
- Wire format unchanged. `schema_version` is still `1`.
- Pre-v1.6 event logs (no `sema.release` events) trigger no new
behavior — the engine-local skip branches are inert; the
"Counted engine-local kinds" report section is suppressed when
no counted-kind events exist.
- Diff tool changes are purely additive: existing engine binaries
diff identically pre- and post-v1.6.
## Host-heap payload-field canonicalization (v1.7 — added in Phase C+22, 2026-05-26)
### Motivation
C+2 (`ALLOCATOR_RETURN_FNS`) canonicalizes `kernel.return.return_value`
for a known set of host-allocator-returning exports
(`MmAllocatePhysicalMemoryEx`, `RtlAllocateHeap`, …). That covers
the case where the allocated VA appears as the function's *return*
value. But the same allocator-drift class (AUDIT-043 ε:
canary's BC physical heap `0xBCxxxxxx` vs ours's unified user heap
`0x4xxxxxxx`) ALSO surfaces inside **typed event payloads** of
non-allocator exports — most notably the `thread.create.ctx_ptr`
field, which holds the host-allocated TLS/context block that
`ExCreateThread` passes to the new guest thread's r3.
Empirical surface (C+22 cold-vs-cold idx 105,128 on the Sylpheed
audio-stack worker `ExCreateThread(entry_pc=0x824cd458)`):
| field | canary | ours |
|---|---|---|
| `ctx_ptr` | `0xbe56bb3c` (BC physical heap) | `0x42453b3c` (unified user heap) |
| `entry_pc` | `0x824cd458` | `0x824cd458` (bit-identical — game code) |
| `priority` | `0` | `0` |
| `affinity` | `4` | `4` |
| `stack_size` | `32768` | `32768` |
| `suspended` | `false` | `false` |
The C+2 `ALLOCATOR_RETURN_FNS` mechanism doesn't help here because
`ExCreateThread`'s return value is the new thread's *handle*
(canary's `0xF8xxxxxx` vs ours's `0x4, 0x8, …`), already covered
by `handle_semantic_id` skip-policy. The host-heap-allocated
context block is a side-channel field inside the
`thread.create` event payload.
### The fix
`HOST_HEAP_PAYLOAD_FIELDS_BY_KIND` maps event kind → tuple of
payload field names. Each listed field's value (expected
`0x`-prefixed hex string) is rewritten to a per-(tid, kind, field)
ordinal sentinel `<HOSTHEAP_<KIND>_<FIELD>_<ORDINAL>>` BEFORE
payload comparison. The mechanism mirrors
`canonicalize_allocator_returns` exactly, restricted to typed
payload fields.
Initial set (v1.7):
```python
HOST_HEAP_PAYLOAD_FIELDS_BY_KIND = {
"thread.create": ("ctx_ptr",),
}
```
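
A minimal sketch of the rewrite pass, under stated assumptions: the function name `canonicalize_host_heap_fields` is hypothetical, and the way `<KIND>` and `<FIELD>` are rendered inside the sentinel (upper-cased, dots replaced by underscores) is a guess at the format rather than the diff tool's confirmed output.

```python
import copy
from collections import defaultdict

HOST_HEAP_PAYLOAD_FIELDS_BY_KIND = {
    "thread.create": ("ctx_ptr",),
}


def canonicalize_host_heap_fields(events):
    """Return a copy of `events` with listed host-heap fields replaced by sentinels."""
    next_ordinal = defaultdict(int)  # keyed by (tid, kind, field)
    out = []
    for ev in events:
        ev = copy.deepcopy(ev)  # never mutate the caller's events
        payload = ev.get("payload", {})
        for field in HOST_HEAP_PAYLOAD_FIELDS_BY_KIND.get(ev["kind"], ()):
            value = payload.get(field)
            if not isinstance(value, str):
                continue  # non-string values are left untouched
            key = (ev["tid"], ev["kind"], field)
            ordinal = next_ordinal[key]
            next_ordinal[key] += 1
            # Assumed rendering of <KIND>_<FIELD>: upper-case, dots to underscores.
            tag = f"{ev['kind']}_{field}".replace(".", "_").upper()
            payload[field] = f"<HOSTHEAP_{tag}_{ordinal}>"
        out.append(ev)
    return out
```

Because the sentinel carries only the ordinal and not the tid, the first `thread.create` on canary's tid 6 and on ours's tid 1 both rewrite to the same string, so the payloads compare equal while `entry_pc` and the other fields pass through unchanged.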
### Strict-field preservation
For each canonicalized event kind, the **strict** fields (game-visible
attributes that MUST match across engines) are untouched. For
`thread.create` these are:
- `entry_pc` — guest VA of the new thread's entry function, bit-
identical in both engines because both engines load the same XEX
and the entry comes from guest code.
- `priority`, `affinity`, `stack_size`, `suspended` — game-visible
thread attributes the guest passes to `ExCreateThread`.
Skip-policy fields (`handle_semantic_id`, `parent_tid`) continue
to be skipped via `SKIP_PAYLOAD_FIELDS_BY_KIND` (unchanged from
C+15-α — see "Diff-tool field-comparison rules" above).
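
The split between strict, skipped, and canonicalized fields can be sketched as follows. The helper `strict_field_mismatches` is hypothetical, and the `SKIP_PAYLOAD_FIELDS_BY_KIND` entry shown holds only the two fields named in this section; the real table may list more.

```python
HOST_HEAP_PAYLOAD_FIELDS_BY_KIND = {"thread.create": ("ctx_ptr",)}
# Assumed subset: only the fields this section names.
SKIP_PAYLOAD_FIELDS_BY_KIND = {"thread.create": ("handle_semantic_id", "parent_tid")}

_MISSING = object()  # distinguishes "absent" from an explicit None


def strict_field_mismatches(kind, canary_payload, ours_payload):
    """Return the sorted names of strict fields whose values differ."""
    non_strict = set(HOST_HEAP_PAYLOAD_FIELDS_BY_KIND.get(kind, ()))
    non_strict |= set(SKIP_PAYLOAD_FIELDS_BY_KIND.get(kind, ()))
    names = (set(canary_payload) | set(ours_payload)) - non_strict
    return sorted(
        name
        for name in names
        if canary_payload.get(name, _MISSING) != ours_payload.get(name, _MISSING)
    )
```

Any field not named in either table falls into the strict set by default, which matches the policy that unlisted fields are compared bit-identically.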
### Why `parent_tid` does NOT need new canonicalization
Per the C+15-α skip-policy table, `parent_tid` is already in
`SKIP_PAYLOAD_FIELDS_BY_KIND["thread.create"]`. The diff tool
pairs guest TIDs at the chain level (`--tid-map` or
`auto_tid_map`), and the per-event `parent_tid` is engine-local
(canary tid=6 vs ours tid=1 for the same logical "main thread"
chain). Skipping is sufficient — no ordinal sentinel needed.
A future schema v2 could canonicalize `parent_tid` via the tid
map, but a mismatch would then surface as a *map gap* rather
than as the clearer per-tid alignment failure that is already
visible at chain boundaries. The v1.x skip-policy is the
simpler choice; tests pin the existing behavior so it doesn't
regress.
### Ordinal-count contract
As with `ALLOCATOR_RETURN_FNS`: if one engine emits MORE
`thread.create` events on a given tid than the other, ordinals
drift and the next typed event surfaces a divergence against
whatever the other side has at that position. Ordinal-count
mismatch IS a behavioral divergence — the canonicalization
preserves divergence detection, only collapsing
host-allocator-VA noise.
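
As a worked example of the contract, the sketch below walks two hypothetical per-tid streams that have already been canonicalized, each reduced to `(kind, ctx_ptr ordinal or None)`. The helper `first_divergence` and the stream contents are illustrative assumptions, not the diff tool's alignment code.

```python
from itertools import zip_longest


def first_divergence(canary_stream, ours_stream):
    """Return (index, canary_item, ours_item) at the first mismatch, else None."""
    pairs = zip_longest(canary_stream, ours_stream)
    for idx, (c_item, o_item) in enumerate(pairs):
        if c_item != o_item:
            return idx, c_item, o_item
    return None


# Canary emits one extra thread.create on this tid.
canary = [("thread.create", 0), ("thread.create", 1), ("wait.begin", None)]
ours = [("thread.create", 0), ("wait.begin", None)]

divergence = first_divergence(canary, ours)
```

Here `divergence` lands at index 1, where canary's second `thread.create` is compared against ours's `wait.begin`: the extra event is reported as a divergence even though every `ctx_ptr` value was collapsed to an ordinal.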
### Defensive value handling
If `ctx_ptr` is not a string (`None`, an int, or missing, as in pre-C+22
event logs whose emitter omits the field), the canonicalizer
leaves it untouched and does NOT consume an ordinal. The next
string-typed value takes the next unused ordinal (0 if none has
been assigned yet). This keeps pre-v1.7 logs
diffable without forcing an emitter retrofit.
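
The ordinal bookkeeping for mixed-type values can be shown in isolation. This is a simplified sketch for a single (tid, kind, field) slot: `assign_ordinals` is a hypothetical helper, and the `<ORDINAL_n>` placeholder stands in for the full sentinel.

```python
def assign_ordinals(values):
    """Map each string value to an ordinal placeholder; pass non-strings through."""
    out, next_ordinal = [], 0
    for value in values:
        if isinstance(value, str):
            out.append(f"<ORDINAL_{next_ordinal}>")
            next_ordinal += 1
        else:
            # None / int / absent-field marker: untouched, no ordinal consumed.
            out.append(value)
    return out
```

For the input `[None, 7, "0xbe56bb3c", "0x42453b3c"]` the first two entries pass through unchanged and the two strings become `<ORDINAL_0>` and `<ORDINAL_1>`.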
### Backward compatibility
- Wire format unchanged. `schema_version` is still `1`.
- Pre-C+22 event logs whose `thread.create.ctx_ptr` happens to
bit-match (e.g. static-allocator addresses like `0x828F3D08`
that BOTH engines use for the pre-XEX kernel-state ctxs)
still match strictly via the ordinal sentinel — they get the
same ordinal in both engines.
- The `--no-canonicalize-host-heap-fields` CLI flag disables the
pass (reverts to raw-VA comparison), mirroring the existing
`--no-canonicalize-allocators`. Used by gate tests and
investigation reruns.
- Engine source is UNCHANGED in C+22 — the fix is in the diff
tool only.
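
One way the two opt-out flags could be wired with `argparse` is sketched below. Only the flag names come from this document; the program name `diff_events`, the `dest` attribute names, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser where both canonicalization passes default to enabled."""
    parser = argparse.ArgumentParser(prog="diff_events")  # hypothetical prog name
    parser.add_argument(
        "--no-canonicalize-allocators",
        dest="canonicalize_allocators",
        action="store_false",
        help="compare allocator return values as raw VAs",
    )
    parser.add_argument(
        "--no-canonicalize-host-heap-fields",
        dest="canonicalize_host_heap_fields",
        action="store_false",
        help="compare host-heap payload fields as raw VAs",
    )
    return parser
```

With `store_false`, each attribute defaults to `True`, so canonicalization stays on unless the corresponding flag is passed, and the two passes can be disabled independently.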
### Extension shape
The map shape `kind -> (field, …)` is intentionally minimal:
each entry is one event kind plus the fields on it that hold
host-heap VAs. Future entries could include e.g.
`thread.create.tls_ptr` (if such a field is added to the schema)
or a hypothetical `vfs.mmap.host_ptr`. Strict-field policy
remains: any field NOT listed here is compared bit-identically.
## Forward compatibility
Phase A's original schema-v1 declared 13 sections (16 distinct kind strings);
Phase A wired 4 of them. Phase C+15-α wired an additional 5 (`handle.create`,
`handle.destroy`, `thread.create`, `thread.exit`, `wait.begin`). `wait.end`,
`thread.suspend/resume`, `mem.write`, `vfs.open/read/close` remain declared
but unwired; adding them is additive surface area at schema v1.1+.
A future schema v2 may break wire format (e.g. canonical SIDs, structured args).
Both engines pin `schema_version = 1` in this phase; the diff tool refuses to
mix v1 and v2 inputs.
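
The version gate can be sketched as a check on the first JSONL line of each input. The function names and error messages are hypothetical; the header shape follows the `schema_version` event defined under "Wire format".

```python
import json

SUPPORTED_VERSION = 1


def read_schema_version(first_line):
    """Parse the first JSONL line and return its declared schema version."""
    event = json.loads(first_line)
    if event.get("kind") != "schema_version":
        raise ValueError("first event is not schema_version")
    return event["payload"]["version"]


def check_inputs(canary_first_line, ours_first_line):
    """Raise unless both logs declare the single supported schema version."""
    versions = {
        read_schema_version(canary_first_line),
        read_schema_version(ours_first_line),
    }
    if versions != {SUPPORTED_VERSION}:
        raise ValueError(f"unsupported or mixed schema versions: {sorted(versions)}")
```

Comparing the set of declared versions against `{SUPPORTED_VERSION}` rejects both a mixed v1/v2 pair and a pair of matching but unsupported versions in one step.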