xenia-rs/audit-runs/phase-a-diff-harness/schema-v1.md

Phase A — Event Log Schema v1

Status: frozen for Phase A and Phase B. Adding a new event kind requires a schema_version bump and a coordinated update in both engines + the diff tool.

Wire format

JSONL — one JSON object per line, UTF-8, \n-terminated. Both engines emit the same byte format.

The first line of every event-log file MUST be a schema_version event:

{"schema_version":1,"engine":"canary","kind":"schema_version","tid":0,"tid_event_idx":0,"guest_cycle":0,"host_ns":0,"deterministic":true,"payload":{"version":1,"emitter_build":"<commit-or-build-id>"}}

The diff tool refuses to parse a file whose first event is not schema_version with version 1.
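The header rule above can be sketched as a small check. This is a minimal illustration, not the diff tool's actual code; the function name check_header is hypothetical.

```python
import json


def check_header(first_line: str) -> dict:
    """Parse the first log line; raise ValueError unless it is a v1 schema_version event.

    Hypothetical helper, named here for illustration only.
    """
    try:
        event = json.loads(first_line)
    except json.JSONDecodeError as exc:
        raise ValueError(f"first line is not valid JSON: {exc}") from exc
    if not isinstance(event, dict) or event.get("kind") != "schema_version":
        raise ValueError("first event must have kind 'schema_version'")
    payload = event.get("payload")
    version = payload.get("version") if isinstance(payload, dict) else None
    if version != 1:
        raise ValueError(f"unsupported schema version: {version!r}")
    return event
```

A caller would read the first line of each input file, pass it to this check, and only then start streaming the remaining events.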

Common fields (every event)

| Field | Type | Notes |
| --- | --- | --- |
| schema_version | int | always 1 in this phase |
| engine | string | "canary" or "ours" |
| kind | string | one of the v1 kinds below |
| tid | int | guest thread id of the calling thread (host TID never logged) |
| tid_event_idx | int | per-tid monotonic, starts at 0; the diff key |
| guest_cycle | int | per-engine monotonic guest-instruction count; 0 if the engine cannot supply one (see "Cycle source notes" below). NOT used by the diff tool for correctness; tid_event_idx is the canonical key. |
| host_ns | int | host monotonic-clock ns since process start; debug only, never compared by diff |
| deterministic | bool | false if any payload field is derived from host time / raw allocator address / RNG / etc. The diff tool skip-compares non-deterministic fields. |
| payload | object | kind-specific (see below) |

Cycle source notes

  • canary: the PPC tb (timebase) register can be read from the PPCContext passed into shim handlers. If a hook is on a path that does not have access to a PPCContext (e.g. a host-side handle-table destructor), the emitter MUST set guest_cycle = 0 and set deterministic = false on the event. The diff tool ignores guest_cycle for ordering; tid_event_idx is canonical.
  • ours: scheduler.thread(current_ref()).ctx.timebase (already maintained per guest thread).

Per-tid event index

Both engines maintain a per-tid monotonic counter starting at 0. The counter's current value is stamped on the event and then incremented (post-increment), so the first event for tid N has tid_event_idx = 0.

The schema_version event is special: it is emitted by the writer thread (typically the boot thread before any guest code has run) with tid = 0 and tid_event_idx = 0. The actual guest thread 0 does not exist; the diff tool treats tid = 0 as the schema header only.
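A minimal sketch of a per-tid counter whose first value is 0, as the text requires. The class name TidEventIndexer is hypothetical; real emitters live in the engines (C++ and Rust), so this Python version only models the behavior.

```python
from collections import defaultdict


class TidEventIndexer:
    """Hands out per-tid event indices: 0, 1, 2, ... independently for each tid."""

    def __init__(self) -> None:
        self._next = defaultdict(int)  # tid -> index the next event will receive

    def next_index(self, tid: int) -> int:
        idx = self._next[tid]      # value stamped on the event being emitted
        self._next[tid] = idx + 1  # advance for the following event on this tid
        return idx
```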

Handle semantic ID

Canary and ours produce guest handles in different ranges (canary: 0xF8xxxxxx region; ours: bump-allocated 0x4, 0x8, 0xC, …). Raw handle IDs are unsuitable as a cross-engine identity. Instead, both engines compute a stable handle semantic ID at handle creation time using FNV-1a 64-bit over a fixed-format byte string. FNV-1a is used (not SHA256) because both engines can implement it in <10 lines with no dependency, and the diff tool only needs a deterministic identity hash — not a crypto property.

input_bytes = le_u32(create_site_pc) ‖ le_u32(creating_tid) ‖ le_u64(tid_event_idx_at_creation) ‖ le_u32(object_type)
hash = 0xCBF29CE484222325
for each byte b in input_bytes:
    hash = (hash XOR b) * 0x100000001B3   mod 2^64
handle_semantic_id = format("{:016x}", hash)

Both engines MUST emit the lowercase 16-hex-char form. The create_site_pc is the guest PC at the call site of the kernel call that created the handle: in canary, PPCContext::lr - 4 (the bl to the import stub); in ours, the equivalent return address from the syscall dispatcher.
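The recipe above can be written out as a short Python sketch, for example on the diff-tool side. The function names are illustrative; the constants and field order come from the pseudocode above.

```python
import struct

FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """FNV-1a 64-bit: XOR each byte in, then multiply by the prime modulo 2^64."""
    h = FNV64_OFFSET_BASIS
    for b in data:
        h = ((h ^ b) * FNV64_PRIME) & MASK64
    return h


def handle_semantic_id(create_site_pc: int, creating_tid: int,
                       tid_event_idx_at_creation: int, object_type: int) -> str:
    """Per-thread SID: le_u32 pc, le_u32 tid, le_u64 idx, le_u32 type (20 bytes)."""
    data = struct.pack("<IIQI", create_site_pc, creating_tid,
                       tid_event_idx_at_creation, object_type)
    return f"{fnv1a_64(data):016x}"
```

The "<" prefix in the struct format selects little-endian packing without padding, so the input is exactly 20 bytes.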

Object type codes (v1 — both engines agree):

| Code | Type |
| --- | --- |
| 0x00 | Unknown |
| 0x01 | Event |
| 0x02 | Mutant |
| 0x03 | Semaphore |
| 0x04 | Timer |
| 0x05 | Thread |
| 0x06 | File |
| 0x07 | IoCompletion |
| 0x08 | Module |
| 0x09 | EnumState |
| 0x0A | Section |
| 0x0B | Notification |

All subsequent events that reference a handle emit BOTH handle_semantic_id (the diff key) and raw_handle_id (engine-local, never compared).

Event kinds (v1)

schema_version

Header event. payload = {"version": 1, "emitter_build": "<string>"}.

thread.create

Emitted by the parent thread at the kernel call that creates the new thread.

"payload": {
  "handle_semantic_id": "0123456789abcdef",
  "parent_tid": 1,
  "entry_pc": "0x82001234",
  "ctx_ptr": "0xbce25340",
  "priority": 0,
  "affinity": 1,
  "stack_size": 65536,
  "suspended": false
}

thread.exit

Emitted by the exiting thread (last event before tid disappears).

"payload": {"exit_code": 0}

thread.suspend / thread.resume

"payload": {"target_tid": 13}

kernel.call

Emit at handler entry, before any side effects.

"payload": {
  "name": "NtCreateFile",
  "args": {"file_handle_ptr": "0x70000010", "desired_access": "0x80100080", "obj_attr_ptr": "0x70000020", ...},
  "args_resolved": {"path": "\\Device\\Cdrom0\\dat\\movie\\opening.bik"}
}
  • Numeric args use 0x-prefixed hex strings if pointer-typed; ints stay as ints.
  • args_resolved is a best-effort dereference (strings, struct dumps, buffer summaries). Optional.

kernel.return

Emit at handler exit, after all side effects committed.

"payload": {
  "name": "NtCreateFile",
  "return_value": 0,
  "status": "0x00000000",
  "side_effects": [
    {"kind": "handle.create", "handle_semantic_id": "...", "object_type": 6, "raw_handle_id": "0x40"}
  ]
}

The side_effects array MAY duplicate events also emitted as standalone (handle.create). The diff tool treats both as authoritative; duplicates do not cause divergence.

handle.create

For host-side creates not tied to a kernel call (rare).

"payload": {
  "handle_semantic_id": "0123456789abcdef",
  "object_type": 1,
  "object_name": null,
  "raw_handle_id": "0xf8000048"
}

handle.destroy

"payload": {
  "handle_semantic_id": "0123456789abcdef",
  "raw_handle_id": "0xf8000048",
  "prior_refcount": 1
}

wait.begin

"payload": {
  "handles_semantic_ids": ["0123...", "abcd..."],
  "timeout_ns": -1,
  "alertable": false,
  "wait_type": "any"
}

timeout_ns = -1 means INFINITE. wait_type is "any" or "all".

wait.end

"payload": {
  "status": "0x00000000",
  "woken_by_semantic_id": "0123456789abcdef",
  "wait_duration_cycles": 12345
}

wait_duration_cycles is deterministic = false (host scheduling affects it). woken_by_semantic_id is null on timeout / alerted.

mem.write

OPT-IN — gated by a separate cvar (phase_a_event_log_mem_writes, default false). In Phase A this kind is reserved; emitters MAY ship a TODO stub. Schema:

"payload": {
  "guest_addr": "0x82000000",
  "value": "0x12345678",
  "size": 4,
  "source": "guest_jit"
}

vfs.open / vfs.read / vfs.close

File-IO events, separate from kernel.call so the diff tool can match on canonical path:

"payload": {"canonical_path": "\\Device\\Cdrom0\\dat\\movie\\opening.bik", "raw_handle_id": "0x40", "handle_semantic_id": "..."}

import.call

Emitted at the syscall dispatcher (ours) or the import-stub JIT trap (canary), one per imported function invocation, before the implementing kernel.call.

"payload": {
  "module": "xboxkrnl.exe",
  "ord": 0x101,
  "name": "NtCreateFile"
}

Diff-tool field-comparison rules

| Field | Rule |
| --- | --- |
| engine | skipped (always differs) |
| host_ns | skipped (host-clock) |
| guest_cycle | skipped (engines disagree on absolute count; diff uses tid_event_idx) |
| raw_handle_id | skipped (engines use different handle namespaces) |
| handle_semantic_id | C+15-α: skipped (engine-local; see below) |
| handles_semantic_ids (wait.begin) | C+15-α: skipped (same reason) |
| parent_tid (thread.create) | C+15-α: skipped (engine-local guest tids) |
| ctx_ptr (thread.create) | C+22 v1.7: per-(tid, kind, field) ordinal sentinel (<HOSTHEAP_thread.create_ctx_ptr_N>); host-heap-derived VA, AUDIT-043 ε class |
| woken_by_semantic_id (wait.end) | C+15-α: skipped (engine-local SID) |
| deterministic (event-level field) | skipped (metadata) |
| Any payload field listed under a non-deterministic kind | skipped where flagged |
| All other payload fields | strict equality |
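A minimal sketch of how these rules might be applied to a pair of aligned events. SKIP_PAYLOAD_FIELDS_BY_KIND is a name used later in this document, but its contents here, SKIP_PAYLOAD_ALWAYS, and both function names are assumptions for illustration. The sketch deliberately omits the ctx_ptr sentinel rewrite, the "skipped where flagged" rule, and nested side_effects entries.

```python
# Payload fields skipped on every kind (raw_handle_id and handle_semantic_id rows).
SKIP_PAYLOAD_ALWAYS = frozenset({"raw_handle_id", "handle_semantic_id"})

# Kind-specific skips, mirroring the table rows that name a kind.
SKIP_PAYLOAD_FIELDS_BY_KIND = {
    "thread.create": frozenset({"parent_tid"}),
    "wait.begin": frozenset({"handles_semantic_ids"}),
    "wait.end": frozenset({"woken_by_semantic_id"}),
}


def comparable_payload(kind: str, payload: dict) -> dict:
    """Drop skipped fields; everything left is compared with strict equality."""
    skip = SKIP_PAYLOAD_ALWAYS | SKIP_PAYLOAD_FIELDS_BY_KIND.get(kind, frozenset())
    return {k: v for k, v in payload.items() if k not in skip}


def events_match(canary_ev: dict, ours_ev: dict) -> bool:
    """Top-level engine, host_ns, guest_cycle and deterministic are never consulted."""
    if canary_ev["kind"] != ours_ev["kind"]:
        return False
    kind = canary_ev["kind"]
    return (comparable_payload(kind, canary_ev["payload"])
            == comparable_payload(kind, ours_ev["payload"]))
```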

Phase C+15-α note on handle_semantic_id

The SID computation includes creating_tid as input, but guest TIDs differ between engines (canary's tid=6 maps to ours's tid=1 on the main chain). Both engines compute SIDs using their own local tids, so the same logical handle gets two different SIDs across engines. The diff tool skip-compares SID fields and relies on tid_event_idx + object_type for alignment.

A future schema v2 could canonicalize SIDs via the diff tool's tid map and restore strict comparison. For v1.1 the simpler skip-policy suffices.

Shared-global SIDs (v1.2 — added in Phase C+18)

A subset of guest kernel dispatcher objects (KEVENT, KSEMAPHORE, KTIMER, KMUTANT) are process-global: they live in statically-initialized or pre-allocated guest memory and are touched by MULTIPLE guest threads during boot. Examples include the XAudio voice-volume change-mask semaphore at 0x828a3230 in Sylpheed.

Canary's XObject::GetNativeObject (src/xenia/kernel/xobject.cc:397-483) and ours's ensure_dispatcher_object (crates/xenia-kernel/src/exports.rs:4363) lazy-wrap these dispatchers on first guest-thread touch: the first KeWait* invocation that passes the raw kernel-object pointer synthesizes the XObject wrapper, stamps the X_DISPATCH_HEADER with the kXObjSignature marker ('X','E','N','\0' = 0x58454E00), stashes the handle, and emits handle.create. Subsequent touches find the marker and short-circuit without emit (per-pointer idempotent).

The first-toucher race

Which guest thread wins the "first toucher" race is timing-dependent:

  • Canary and ours have different host schedulers, JIT throughput, and guest-thread bootstrap ordering.
  • Even within the same engine across runs the first-toucher can differ — but each engine produces a deterministic per-run total ordering, so cold-vs-cold reproducibility holds.

The per-thread SID recipe semantic_id(create_site_pc, creating_tid, tid_event_idx_at_creation, object_type) (v1) depends on BOTH creating_tid and tid_event_idx_at_creation, so:

  • Same dispatcher → DIFFERENT SIDs in each engine (race-dependent).
  • handle.create for the same object lands on different per-tid streams in canary vs ours.

The C+17 fix made ours emit handle.create for these synthesized shadows, but the C+17 D-NEW-3 regression on tid=15→10 was exactly the first-toucher race: ours's tid=10 was the first toucher locally; canary's tid=15 was NOT the first toucher in its run — some other canary tid had already adopted 0x828a3230. ours's tid=10 emitted an "extra" handle.create that canary's tid=15 lacked, and the diff tool flagged a kind mismatch at idx=2.

The C+18 fix: deterministic SID recipe

Process-global dispatchers use a second SID recipe that is scheduling-invariant. Both engines now use:

SHARED_GLOBAL_SID_MARKER = 0xC01AB005  (fixed sentinel, both engines)

input_bytes =
    le_u32(SHARED_GLOBAL_SID_MARKER)   // 4 bytes — "create_site_pc" slot
  ‖ le_u32(0)                          // 4 bytes — "creating_tid" slot
  ‖ le_u64(pointer)                    // 8 bytes — "tid_event_idx" slot
  ‖ le_u32(object_type)                // 4 bytes

hash = FNV-1a-64(input_bytes)
shared_global_sid = format("{:016x}", hash)

The marker 0xC01AB005 is outside any plausible guest-PC range (PPC text 0x82000000-0x82FFFFFF; XEX header 0x3001xxxx; heap 0x4xxxxxxx), so it can never collide with a regular per-thread SID (which uses a real guest PC as create_site_pc).

Both engines compute the SAME SID for the same dispatcher pointer regardless of:

  • which guest thread is the first toucher,
  • the tid_event_idx_at_creation,
  • the per-engine scheduling order.
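A Python sketch of the shared-global recipe, following the byte layout given above. The function names are illustrative; the example pointer is the semaphore address mentioned earlier in this section.

```python
import struct

SHARED_GLOBAL_SID_MARKER = 0xC01AB005  # fixed sentinel from the recipe above


def fnv1a_64(data: bytes) -> int:
    """FNV-1a 64-bit over the input bytes."""
    h = 0xCBF29CE484222325
    for b in data:
        h = ((h ^ b) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def shared_global_sid(pointer: int, object_type: int) -> str:
    """Marker in the pc slot, 0 in the tid slot, dispatcher pointer in the idx slot."""
    data = struct.pack("<IIQI", SHARED_GLOBAL_SID_MARKER, 0, pointer, object_type)
    return f"{fnv1a_64(data):016x}"


# Example: the XAudio semaphore at 0x828a3230, object type 0x03 (Semaphore).
example_sid = shared_global_sid(0x828A3230, 0x03)
```

Because neither a tid nor a per-tid index appears in the input, the result depends only on the pointer and the object type.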

Which call sites use which recipe

| Call site | SID recipe |
| --- | --- |
| KernelState::alloc_handle_for (ours) | per-thread |
| ObjectTable::AddHandle direct (canary) | per-thread |
| ensure_dispatcher_object (ours) | shared-global |
| XObject::GetNativeObject synthesized (canary) | shared-global |

Regular per-thread handle.create events (file open, thread create, named-event create, etc.) keep the v1 per-thread recipe. The shared-global recipe is restricted to lazy-wrap synthesis.

Diff tool: cross-tid floating handle.create matching

The diff tool pre-pass collects all shared-global SIDs in either engine's stream. A handle.create event is detected as shared-global by recomputing the deterministic SID from its (raw_handle_id, object_type) payload and comparing against the event's handle_semantic_id. Regular per-thread SIDs cannot match this check by construction.

When per-tid alignment finds a kind mismatch and one side has a shared-global handle.create whose SID is in the floating set:

  • The diff tool advances ONLY that side's stream pointer past the floating event.
  • Re-compare at the same canonical position.
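The absorb step can be sketched as a two-pointer walk over one tid's paired streams. This is an illustrative model, not the diff tool's code: the function name is hypothetical, floating_sids is assumed to come from the pre-pass, and only kinds are compared (payload comparison is omitted for brevity).

```python
def align_with_floating_creates(canary, ours, floating_sids):
    """Walk both streams; on a kind mismatch, skip a floating handle.create on one side.

    Returns (matched_count, skipped_per_side, divergence), where divergence is
    None or the (canary_index, ours_index) pair at which alignment failed.
    """
    def is_floating(ev):
        return (ev["kind"] == "handle.create"
                and ev["payload"].get("handle_semantic_id") in floating_sids)

    ic = io = matched = 0
    skipped = {"canary": 0, "ours": 0}
    while ic < len(canary) and io < len(ours):
        c, o = canary[ic], ours[io]
        if c["kind"] == o["kind"]:
            matched += 1
            ic += 1
            io += 1
        elif is_floating(c):
            ic += 1                  # advance ONLY the canary pointer
            skipped["canary"] += 1
        elif is_floating(o):
            io += 1                  # advance ONLY the ours pointer
            skipped["ours"] += 1
        else:
            return matched, skipped, (ic, io)
    return matched, skipped, None
```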

The diff report's summary table shows a floating_skipped (c/o) column for visibility — counts of absorbed events per side.

Index relaxation

The C+18 fix relaxes the legacy diff-tool rule that requires canary.tid_event_idx == ours.tid_event_idx for matching events. With floating absorption, the per-tid indices can drift by 1 between the two sides — but the kind and payload comparisons remain strict. The raw indices are still preserved on the events themselves (useful for debugging and report context).

Backward compatibility

  • Wire format unchanged. schema_version is still 1.
  • Pre-C+18 event logs (no shared-global SIDs in the stream) trigger the legacy code path automatically — the floating set is empty.
  • The marker constant 0xC01AB005 MUST be exactly this value in both engines and the diff tool. Tests in both engines plus tools/diff-events/test_diff_events.py lock it in.

Wait-begin floating absorb (v1.3 — added in Phase C+21)

Motivation

Canary's RtlEnterCriticalSection (and its symmetric counterparts: KeWaitForSingleObject invoked on a process-global dispatcher, mutex/semaphore contended-acquire paths) emits wait.begin only on the contended slow path. The fast path (uncontended atomic-CAS, or recursive bump) emits NO wait.begin and only the kernel.call / kernel.return pair. Which path is taken depends on whether ANOTHER guest thread is currently holding the dispatcher when the wait is attempted; that is, it is host-scheduler-driven and varies across cold runs of the same engine.

Reading-error class #32 (documented in C+20's investigation.md) captures this: cross-checking 3 fresh canary cold runs at canary tid=6 idx 104,606 showed:

  • jitter-1: wait.begin sid=75ae880ec432eb36 (contended)
  • jitter-2: kernel.return (fast — matches ours)
  • jitter-3: offset-shifted wait.begin at a different idx with a different SID

The matched-prefix metric is unreliable inside such regions if the diff tool treats wait.begin events as strictly positional.

The fix

A wait.begin event is floating if at least one of its payload.handles_semantic_ids references a shared-global SID (see §"Shared-global SIDs"). During the per-tid two-pointer walk:

  • If one side has a floating wait.begin and the other has a different kind at the same canonical position, advance ONLY the wait.begin side's pointer and re-compare.

wait_type=all waits are floating as long as ANY single handle in the set is shared-global — the entire wait's blocking behavior is timing-dependent if even one of its handles is on a process-global dispatcher.
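The floating test for wait.begin reduces to a single membership check. A minimal sketch, assuming shared_global_sids is the set produced by the pre-pass; the function name is hypothetical.

```python
def is_floating_wait_begin(event: dict, shared_global_sids: set) -> bool:
    """True if the event is a wait.begin touching at least one shared-global SID.

    wait_type is deliberately ignored: "any" and "all" waits are treated alike.
    """
    if event.get("kind") != "wait.begin":
        return False
    sids = event.get("payload", {}).get("handles_semantic_ids", [])
    return any(sid in shared_global_sids for sid in sids)
```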

Shared-global SID detection (extended in C+21)

The diff tool's collect_shared_global_sids pre-pass now unions TWO sources:

  1. Recipe-matching handle.create events (Phase C+18 — direct). This catches ours's ensure_dispatcher_object output where raw_handle_id == ptr (the recipe-input pointer).

  2. Cross-tid usage heuristic (Phase C+21 — indirect). Any SID referenced via handle.create OR wait.begin on two or more distinct guest tids in EITHER engine is treated as shared-global.

The cross-tid heuristic exists because canary's EmitHandleCreateSharedGlobal (event_log.cc:435) emits the SID computed from the dispatcher VA but stashes object->handle() (a handle-table slot in the 0xF8xxxxxx region) as raw_handle_id. Those two values DIFFER, so canary's shared-global handle.create events are NOT recipe-recognizable from their payload alone. Multi-tid SID usage is a robust observational signal: per-thread SIDs by construction stay on the single creating tid (their hash inputs include creating_tid), so any cross-tid SID usage indicates a process-global dispatcher.
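The cross-tid usage heuristic can be sketched as follows. This is an illustration under the assumption that each engine's full event list is available in memory; the function name is hypothetical and is not the diff tool's collect_shared_global_sids.

```python
from collections import defaultdict


def cross_tid_sids(events) -> set:
    """SIDs referenced by handle.create or wait.begin on two or more distinct tids."""
    tids_by_sid = defaultdict(set)
    for ev in events:
        payload = ev.get("payload", {})
        if ev["kind"] == "handle.create":
            sids = [payload.get("handle_semantic_id")]
        elif ev["kind"] == "wait.begin":
            sids = payload.get("handles_semantic_ids", [])
        else:
            continue
        for sid in sids:
            if sid:  # ignore missing or null SIDs
                tids_by_sid[sid].add(ev["tid"])
    return {sid for sid, tids in tids_by_sid.items() if len(tids) >= 2}
```

Since the rule applies to EITHER engine, a caller would take the union of the per-engine results, for example cross_tid_sids(canary_events) | cross_tid_sids(ours_events).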

Risk of over-absorption (and why it's bounded)

The cross-tid heuristic could in principle mis-classify a per-thread SID that one thread creates and another thread waits on — a legitimate cross-thread synchronization pattern. The floating-absorb, however, only fires on a kind mismatch at the canonical position. Per-thread waits that match strictly on both sides advance normally without any absorb. The heuristic only loosens alignment when one side is missing a handle.create or wait.begin — exactly the scheduling-jitter window the C+21 fix targets.

Diff-tool report changes

The summary table's floating_skipped (c/o) column is split into two columns:

  • floating_create (c/o) — C+18 handle.create absorptions.
  • floating_wait (c/o) — C+21 wait.begin absorptions.

Both columns are per-side and observation-only; counts may legitimately be non-zero in a clean run.

Backward compatibility

  • Wire format unchanged. schema_version is still 1.
  • Pre-C+21 event logs (no wait.begin events that reference shared-global SIDs) trigger no new behavior — the wait absorption branches are inert.
  • The C+18 floating-create logic is unchanged; the C+21 fix is strictly additive.
  • Engine source is UNCHANGED in C+21 — the fix is in the diff tool only.

contention.observed (v1.4 — added in Phase D Stage 1, 2026-05-18)

Motivation

The 104,607 cap is canary's tid=6 contending on a CS while ours's tid=1 fast-paths through the same call (Phase C+22). Schedules diverge for host-OS reasons, so neither engine is "wrong" — but matched-prefix stalls. Phase D's H' approach makes ours's rtl_enter_critical_section replay canary's contention by consulting a per-call manifest built from canary's contention trace.

Stage 1 (this section) introduces the canary-side emitter for that manifest: a new event kind contention.observed that fires from RtlEnterCriticalSection_entry (xboxkrnl_rtl.cc:596-633) just before the call falls through to xeKeWaitForSingleObject after spin-loop exhaustion. Cvar-gated (kernel_emit_contention, default false) so default canary behavior is byte-identical.

Event shape

{
  "schema_version": 1,
  "engine": "canary",
  "kind": "contention.observed",
  "tid": <guest tid of caller>,
  "tid_event_idx": <per-tid ordinal; consumes one slot>,
  "guest_cycle": 0,
  "host_ns": <emit timestamp>,
  "deterministic": true,
  "payload": {
    "cs_ptr": "0xHHHHHHHH",        // guest VA of the RTL_CRITICAL_SECTION
    "site_sid": "HHHHHHHHHHHHHHHH", // shared-global SID (see below)
    "contended": true              // always true at v1.4 (uncontended is implicit)
  }
}

site_sid is computed via the C+18 shared-global SID recipe:

site_sid = FNV-1a-64 over
  ( kSharedGlobalSidMarker [u32 LE]    // 0xC01AB005
  , 0 [u32 LE]                          // creating_tid (unused)
  , cs_ptr as u64 [u64 LE]              // pointer-as-idx
  , kObjCriticalSection [u32 LE]        // 0x0C, new in v1.4
  )

Both engines compute the same SID for the same CS pointer. The constant kObjCriticalSection = 0x0C is the new ObjectType code introduced for this kind, extending the v1 table (which ends at 0x0B); it does NOT correspond to a real XObject (a CS lives as a guest-memory struct, not a handle-tabled object).

When emitted (canary)

In RtlEnterCriticalSection_entry:

  1. Recursive-lock fast path (already own lock) → NO emit (not contention).
  2. Spin-loop succeeds (atomic_cas flips lock_count from -1 → 0) → NO emit (fast acquire).
  3. Spin-loop exhausted AND atomic_inc(&cs->lock_count) != 0 → EMIT with contended=true, then xeKeWaitForSingleObject.
  4. Spin-loop exhausted AND atomic_inc(...) == 0 (CS became free between spin and inc) → NO emit (we won the race after spin).

The emit point sits between atomic_inc's positive result and the xeKeWaitForSingleObject call, so the new event always precedes the existing wait.begin event in the per-tid ordinal.
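The four cases above reduce to a small decision function. The sketch below is a Python model of that decision only; the real emitter is in canary's C++ source, and the function and parameter names here are hypothetical.

```python
def should_emit_contention(owns_lock: bool, spin_acquired: bool, inc_result: int) -> bool:
    """Model of the emit decision in RtlEnterCriticalSection_entry.

    owns_lock      caller already holds the CS (recursive fast path)
    spin_acquired  the spin loop's compare-and-swap took the lock
    inc_result     value returned by atomic_inc after the spin loop gave up
    """
    if owns_lock:
        return False        # case 1: recursion is not contention
    if spin_acquired:
        return False        # case 2: fast acquire
    return inc_result != 0  # case 3 emits; case 4 (== 0) won the race after spinning
```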

When emitted (ours, Stage 3 — pending)

Stage 3 will add a symmetric emit in rtl_enter_critical_section (xenia-rs/crates/xenia-kernel/src/exports.rs:2886-2946) at the forced-park branch driven by the manifest. This keeps per-tid ordinals aligned across engines after replay.

Diff-tool treatment (Stage 4 — pending)

contention.observed will be added to ENGINE_LOCAL_KINDS in diff_events.py: the per-tid pointer advances past these events on either side without comparison. This keeps matched-prefix counts unchanged when ONE side emits the event (Stage 1's canary-only world) or when BOTH emit at the same ordinal (Stage 3's parity world).

Cvar default + byte-identity

kernel_emit_contention=false by default. With cvar=false, the helper phase_a::EmitContentionObserved short-circuits at the cvar check before any IsEnabled() lookup. The pre-Stage-1 canary code path is preserved byte-for-byte; cvar-OFF cold runs produce zero contention.observed events (validated on the Stage 1 cold run: 0 occurrences in a 4.4 GB / 18.6 M event trace).

Nested-CS-cleanup absorber (v1.5 — added in Phase D D-extension, 2026-05-18)

Status

Band-aid. Explicit annotation: this absorber CROSSES the reading-error #23 boundary in spirit. It folds real guest control-flow divergence at the diff-tool layer. It exists because the underlying root cause — producer-throughput divergence under the cooperative-vs-preemptive scheduling mismatch (see Phase D forensics) — is explicitly out of scope for the H' plan: fixing it in ours's engine would require preempting the cooperative scheduler, which invalidates 23 phases of digest stability. The absorber is the practical compromise.

Trigger shape

The absorber fires ONLY at a kind mismatch of:

  • canary[ic] = import.call with payload.name == "RtlEnterCriticalSection"
  • ours[io] = import.call with payload.name == "RtlLeaveCriticalSection"

For any other kind mismatch, the absorber is silent. This narrowness is intentional: real engine divergences appear in other shapes and must still surface.

Behavior

When the trigger pattern matches, canary's stream is scanned for one or more balanced [Enter-block, Leave-block] pairs immediately following the trigger position:

  • An Enter-block is 3 consecutive events: import.call RtlEnterCriticalSection → kernel.call RtlEnterCriticalSection → kernel.return RtlEnterCriticalSection.
  • A Leave-block is 3 consecutive events with RtlLeaveCriticalSection.

The absorber consumes pairs greedily up to a cap of _NESTED_CS_PAIR_CAP = 32 pairs (empirically, Sylpheed's worst-case is ~10-15 pairs at the 104,607 cap). After consuming each pair, it checks whether canary's next event has the SAME kind AND same payload.name as ours[io]. The first convergence wins; canary's pointer is advanced past the absorbed pairs.

If no convergence is found within the cap, the absorber returns None and the divergence falls through to normal reporting.
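The scan described above can be sketched like this. It is a model under stated assumptions, not the code in diff_events.py: the function names are hypothetical, and the trigger event at canary[ic] is assumed to be the first event of the first Enter-block.

```python
_NESTED_CS_PAIR_CAP = 32
_BLOCK_KINDS = ("import.call", "kernel.call", "kernel.return")
_ENTER = "RtlEnterCriticalSection"
_LEAVE = "RtlLeaveCriticalSection"


def _is_block(events, start, name):
    """True if events[start:start+3] is import.call, kernel.call, kernel.return for name."""
    if start + 3 > len(events):
        return False
    return all(events[start + i]["kind"] == kind
               and events[start + i]["payload"].get("name") == name
               for i, kind in enumerate(_BLOCK_KINDS))


def absorb_nested_cs(canary, ic, ours, io):
    """Return canary's new index past the absorbed pairs, or None if not absorbable."""
    c, o = canary[ic], ours[io]
    if not (c["kind"] == "import.call" and c["payload"].get("name") == _ENTER):
        return None
    if not (o["kind"] == "import.call" and o["payload"].get("name") == _LEAVE):
        return None
    pos = ic
    for _ in range(_NESTED_CS_PAIR_CAP):
        if not (_is_block(canary, pos, _ENTER) and _is_block(canary, pos + 3, _LEAVE)):
            return None  # stream stopped looking like balanced pairs
        pos += 6         # consume one Enter-block plus one Leave-block
        if pos < len(canary):
            nxt = canary[pos]
            if (nxt["kind"] == o["kind"]
                    and nxt["payload"].get("name") == o["payload"].get("name")):
                return pos  # first convergence wins
    return None
```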

Why this is safe (within #23's spirit)

  1. The absorption only happens when canary's stream re-aligns with ours's stream past the nested block. If it doesn't re-align, the real divergence is reported.
  2. The nested-block shape matches a specific PPC pattern: the consumer thread in canary acquires a CS, calls a helper that iterates a tree/registry, takes the nested-CS-enter path for each item, and releases the outer CS. Ours's tree is shorter so it skips this. The net effect on guest state is bounded: ours has fewer items processed in this iteration, but the EVENT stream past the absorption resumes the same logical operation.
  3. The Phase B image_loaded_sha256 is the foundational invariant. It's unaffected by this absorber (no engine source change).

Why this is NOT safe in the general sense

  • Diverging downstream state IS lost: ours's tree has fewer entries than canary's after the absorbed block. Subsequent ours operations that touch the tree will behave differently. Other absorbers / fixes will be needed if those state-differences manifest later.
  • A future engine bug that produces a spuriously nested Enter+Leave pair could be falsely absorbed. Mitigation: the absorber requires canary's post-block stream to re-align with ours's; spurious nested pairs without re-alignment fall through to normal divergence reporting.

Empirical result (Sylpheed 104,607 cap)

Pre-absorber (post-Stage-3+4): main matched-prefix = 104,607 (cap). Post-absorber: main matched-prefix = 105,046 (+439 events).

The next divergence is at idx 105,046 on VdInitializeEngines.return_value (canary=1, ours=0) — an unrelated engine bug in the video subsystem, NOT a recurrence of the cap pattern. Sister chains preserved (11/32/4/41/16).

Tests

Three unit tests in test_diff_events.py:

  • test_nested_cs_cleanup_block_absorbed_when_convergent — folds one nested pair
  • test_nested_cs_cleanup_NOT_absorbed_when_followup_diverges — confirms re-alignment requirement
  • test_nested_cs_cleanup_NOT_absorbed_when_canary_has_no_followup — negative case

sema.release (v1.6 — added in AUDIT-069 Session 6, 2026-05-21)

Motivation

AUDIT-069 Sessions 1-5 established that ours under-produces semaphore releases by ~80% on the work-semaphore vs canary (99 vs 414 in S5, refined in S6 to 83 vs 414 apples-to-apples on the work semaphore alone). The measurement infrastructure was a one-off cvar (audit_70_semaphore_release_watch, hand-built per-handle log lines) plus an ours-side --lr-trace capture at the wrapper-entry PC. Future AUDIT-070+ sessions and any general regression triage need this metric to be diff-visible without bespoke cvars per investigation.

sema.release lifts the AUDIT-070 cvar's signal into the Phase A schema as a symmetric event kind in both engines.

Event shape

{
  "schema_version": 1,
  "engine": "canary",
  "kind": "sema.release",
  "tid": <guest tid of caller>,
  "tid_event_idx": <per-tid ordinal; consumes one slot>,
  "guest_cycle": <PPC timebase>,
  "host_ns": <emit timestamp>,
  "deterministic": true,
  "payload": {
    "handle_semantic_id": "HHHHHHHHHHHHHHHH",  // shared-global SID for the work-sem
    "raw_handle_id": "0xHHHHHHHH",             // engine-local
    "release_count": 1,                         // games typically release 1
    "previous_count": 0,                        // semaphore count BEFORE release
    "caller_pc": "0xHHHHHHHH"                   // guest LR at release time
  }
}

SID recipe

The work-semaphore in Sylpheed (canary handle 0xF800003C, ours handle 0x1044) is a process-global dispatcher in the C+18 sense: it lives in pre-allocated guest memory and is touched by multiple guest threads (main, worker, cache-thread, other producers). Its handle_semantic_id SHOULD use the shared-global recipe (ComputeSharedGlobalSemanticId(dispatcher_ptr, kObjSemaphore=0x03)) so canary and ours produce the same SID for the same guest dispatcher.

Per-thread semaphores (rare in Sylpheed) MAY use the v1 per-thread recipe; the diff tool does NOT compare SIDs for sema.release (the kind is engine-local positionally — see below).

Why engine-local

Per AUDIT-069 H3 and S6's first-N=20 measurement, the cadence and ordinal interleaving of releases between the worker, main, and cache-thread are timing-dependent: the first 20 releases match perfectly across engines, but worker tid diverges at canary ord=83 when the cache-thread's first release fires (which ours never emits because ours's cache-thread wedges at sub_821CB030+0x1AC). Strict positional alignment would always trip on this known divergence.

sema.release is therefore in ENGINE_LOCAL_KINDS in the diff tool (alongside contention.observed): both engines emit, but the diff tool advances past these events on either side without alignment. The count is surfaced in the report's "Counted engine-local kinds" summary table (per-tid + total per engine) so cadence regressions are diff-visible at-a-glance.
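A sketch of the skip-and-count behavior for one engine's stream. ENGINE_LOCAL_KINDS and COUNTED_ENGINE_LOCAL_KINDS are names from this document, but their exact contents here are assumptions (in particular, whether contention.observed is counted is not stated), and the function name is hypothetical.

```python
from collections import Counter

ENGINE_LOCAL_KINDS = frozenset({"contention.observed", "sema.release"})
COUNTED_ENGINE_LOCAL_KINDS = frozenset({"sema.release"})  # assumed membership


def strip_engine_local(events):
    """Split a stream into (events kept for alignment, per-(tid, kind) counts)."""
    kept = []
    counts = Counter()
    for ev in events:
        if ev["kind"] in ENGINE_LOCAL_KINDS:
            if ev["kind"] in COUNTED_ENGINE_LOCAL_KINDS:
                counts[(ev["tid"], ev["kind"])] += 1
            continue  # never participates in positional alignment
        kept.append(ev)
    return kept, counts
```

Summing the counts per engine gives totals that can be placed side by side in a report, so a cadence gap such as 83 vs 414 releases is visible without aligning the individual events.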

Emit points (planned, NOT yet wired)

  • Canary: extend audit_70_semaphore_release_watch to call phase_a::EmitSemaRelease(handle, count, prev_count) from NtReleaseSemaphore_entry + xeKeReleaseSemaphore. Cvar gating remains the existing audit_70_semaphore_release_watch (or a new phase_a_event_log_sema_releases=false for finer control).
  • Ours: emit sema.release from nt_release_semaphore in crates/xenia-kernel/src/exports.rs and from KSemaphore::release (kernel-mode equivalent). Default-off via a runtime flag; default cold runs must remain digest-stable.

Both engines MUST emit at handler entry (not wrapper-internal) so the event count corresponds 1:1 to guest NtReleaseSemaphore invocations, matching the canary cvar's existing semantics.

Status

  • Diff tool: support landed (this session, v1.6). sema.release in ENGINE_LOCAL_KINDS + COUNTED_ENGINE_LOCAL_KINDS; counts surfaced in report summary; 3 new tests in test_diff_events.py.
  • Canary emit: NOT YET WIRED. Planned for AUDIT-070+ when the root cause investigation requires it. Existing cvar audit_70_semaphore_release_watch continues to emit non-schema log lines (used by S5/S6 captures).
  • Ours emit: NOT YET WIRED. See above.

Backward compatibility

  • Wire format unchanged. schema_version is still 1.
  • Pre-v1.6 event logs (no sema.release events) trigger no new behavior — the engine-local skip branches are inert; the "Counted engine-local kinds" report section is suppressed when no counted-kind events exist.
  • Diff tool changes are purely additive: existing engine binaries diff identically pre- and post-v1.6.

Host-heap payload-field canonicalization (v1.7 — added in Phase C+22, 2026-05-26)

Motivation

C+2 (ALLOCATOR_RETURN_FNS) canonicalizes kernel.return.return_value for a known set of host-allocator-returning exports (MmAllocatePhysicalMemoryEx, RtlAllocateHeap, …). That covers the case where the allocated VA appears as the function's return value. But the same allocator-drift class (AUDIT-043 ε: canary's BC physical heap 0xBCxxxxxx vs ours's unified user heap 0x4xxxxxxx) ALSO surfaces inside typed event payloads of non-allocator exports — most notably the thread.create.ctx_ptr field, which holds the host-allocated TLS/context block that ExCreateThread passes to the new guest thread's r3.

Empirical surface (C+22 cold-vs-cold idx 105,128 on the Sylpheed audio-stack worker ExCreateThread(entry_pc=0x824cd458)):

| field | canary | ours |
| --- | --- | --- |
| ctx_ptr | 0xbe56bb3c (BC physical heap) | 0x42453b3c (unified user heap) |
| entry_pc | 0x824cd458 | 0x824cd458 (bit-identical, game code) |
| priority | 0 | 0 |
| affinity | 4 | 4 |
| stack_size | 32768 | 32768 |
| suspended | false | false |

The C+2 ALLOCATOR_RETURN_FNS mechanism doesn't help here because ExCreateThread's return value is the new thread's handle (canary's 0xF8xxxxxx vs ours's 0x4, 0x8, …), already covered by handle_semantic_id skip-policy. The host-heap-allocated context block is a side-channel field inside the thread.create event payload.

The fix

HOST_HEAP_PAYLOAD_FIELDS_BY_KIND maps event kind → tuple of payload field names. Each listed field's value (expected 0x-prefixed hex string) is rewritten to a per-(tid, kind, field) ordinal sentinel <HOSTHEAP_<KIND>_<FIELD>_<ORDINAL>> BEFORE payload comparison. The mechanism mirrors canonicalize_allocator_returns exactly, restricted to typed payload fields.

Initial set (v1.7):

HOST_HEAP_PAYLOAD_FIELDS_BY_KIND = {
    "thread.create": ("ctx_ptr",),
}

Strict-field preservation

For each canonicalized event kind, the strict fields (game-visible attributes that MUST match across engines) are untouched. For thread.create these are:

  • entry_pc: guest VA of the new thread's entry function, bit-identical in both engines because both engines load the same XEX and the entry comes from guest code.
  • priority, affinity, stack_size, suspended: game-visible thread attributes the guest passes to ExCreateThread.

Skip-policy fields (handle_semantic_id, parent_tid) continue to be skipped via SKIP_PAYLOAD_FIELDS_BY_KIND (unchanged from C+15-α — see "Diff-tool field-comparison rules" above).

Why parent_tid does NOT need new canonicalization

Per the C+15-α skip-policy table, parent_tid is already in SKIP_PAYLOAD_FIELDS_BY_KIND["thread.create"]. The diff tool pairs guest TIDs at the chain level (--tid-map or auto_tid_map), and the per-event parent_tid is engine-local (canary tid=6 vs ours tid=1 for the same logical "main thread" chain). Skipping is sufficient — no ordinal sentinel needed.

Could a future schema v2 canonicalize parent_tid via the tid map? Yes, but it would surface mismatches as a map gap rather than as a clearer per-tid alignment failure that's already visible at chain boundaries. The v1.x skip-policy is the simpler choice; tests pin the existing behavior so it doesn't regress.

Ordinal-count contract

As with ALLOCATOR_RETURN_FNS: if one engine emits MORE thread.create events on a given tid than the other, ordinals drift and the next typed event surfaces a divergence against whatever the other side has at that position. Ordinal-count mismatch IS a behavioral divergence — the canonicalization preserves divergence detection, only collapsing host-allocator-VA noise.

Defensive value handling

If ctx_ptr is non-string (None, int, missing) — pre-C+22 event logs whose emitter omits the field — the canonicalizer leaves it untouched and does NOT consume an ordinal. The next string-typed value gets ordinal 0. This keeps pre-v1.7 logs diffable without forcing an emitter retrofit.
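The rewrite, including the defensive handling of non-string values, can be sketched as below. This is an illustrative model rather than the diff tool's implementation; the function name is hypothetical, and the sentinel text follows the <HOSTHEAP_thread.create_ctx_ptr_N> form shown in the field-comparison rules.

```python
from collections import defaultdict

HOST_HEAP_PAYLOAD_FIELDS_BY_KIND = {
    "thread.create": ("ctx_ptr",),
}


def canonicalize_host_heap_fields(events):
    """Return copies of events with listed string fields replaced by ordinal sentinels."""
    ordinals = defaultdict(int)  # (tid, kind, field) -> next ordinal
    out = []
    for ev in events:
        payload = dict(ev.get("payload", {}))
        for field in HOST_HEAP_PAYLOAD_FIELDS_BY_KIND.get(ev["kind"], ()):
            value = payload.get(field)
            if not isinstance(value, str):
                continue  # None / int / missing: untouched, no ordinal consumed
            key = (ev["tid"], ev["kind"], field)
            payload[field] = f"<HOSTHEAP_{ev['kind']}_{field}_{ordinals[key]}>"
            ordinals[key] += 1
        out.append({**ev, "payload": payload})
    return out
```

Each engine's stream is canonicalized independently, so the comparison sees equal sentinels whenever both sides emit the same number of thread.create events on the paired tid.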

Backward compatibility

  • Wire format unchanged. schema_version is still 1.
  • Pre-C+22 event logs whose thread.create.ctx_ptr happens to bit-match (e.g. static-allocator addresses like 0x828F3D08 that BOTH engines use for the pre-XEX kernel-state ctxs) still match strictly via the ordinal sentinel — they get the same ordinal in both engines.
  • The --no-canonicalize-host-heap-fields CLI flag disables the pass (reverts to raw-VA comparison), mirroring the existing --no-canonicalize-allocators. Used by gate tests and investigation rerun.
  • Engine source is UNCHANGED in C+22 — the fix is in the diff tool only.

Extension shape

The map shape kind -> (field, …) is intentionally minimal: each entry is one event kind plus the fields on it that hold host-heap VAs. Future entries could include e.g. thread.create.tls_ptr (if such a field is added to the schema) or a hypothetical vfs.mmap.host_ptr. Strict-field policy remains: any field NOT listed here is compared bit-identically.

Forward compatibility

Phase A's original schema-v1 declared 13 sections (16 distinct kind strings); Phase A wired 4 of them. Phase C+15-α wired an additional 5 (handle.create, handle.destroy, thread.create, thread.exit, wait.begin). wait.end, thread.suspend/resume, mem.write, vfs.open/read/close remain declared but unwired; adding them is additive surface area at schema v1.1+.

A future schema v2 may break wire format (e.g. canonical SIDs, structured args). Both engines pin schema_version = 1 in this phase; the diff tool refuses to mix v1 and v2 inputs.