Phase A — Event Log Schema v1
Status: frozen for Phase A and Phase B. Adding a new event kind requires a schema_version bump and a coordinated update in both engines + the diff tool.
Wire format
JSONL — one JSON object per line, UTF-8, \n-terminated. Both engines emit the same byte format.
The first line of every event-log file MUST be a schema_version event:
{"schema_version":1,"engine":"canary","kind":"schema_version","tid":0,"tid_event_idx":0,"guest_cycle":0,"host_ns":0,"deterministic":true,"payload":{"version":1,"emitter_build":"<commit-or-build-id>"}}
The diff tool refuses to parse a file whose first event is not schema_version with version 1.
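The header rule above can be sketched as a small check. This is a minimal illustration, not the diff tool's actual code; the function name `check_header` is hypothetical.

```python
import json

def check_header(first_line: str) -> dict:
    """Parse the first JSONL line; reject anything but a v1 schema_version event."""
    event = json.loads(first_line)
    if event.get("kind") != "schema_version":
        raise ValueError("first event is not schema_version")
    if event.get("payload", {}).get("version") != 1:
        raise ValueError("unsupported schema version")
    return event
```

A caller would read only the first line of each input file and pass it to this check before streaming the rest.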
Common fields (every event)
| Field | Type | Notes |
|---|---|---|
| schema_version | int | always 1 in this phase |
| engine | string | "canary" or "ours" |
| kind | string | one of the v1 kinds below |
| tid | int | guest thread id of the calling thread (host TID never logged) |
| tid_event_idx | int | per-tid monotonic, starts at 0; the diff key |
| guest_cycle | int | per-engine monotonic guest-instruction count; 0 if the engine cannot supply one (see "Cycle source" below). NOT used by the diff tool for correctness; tid_event_idx is the canonical key. |
| host_ns | int | host monotonic-clock ns since process start; debug only, never compared by diff |
| deterministic | bool | false if any payload field is derived from host time / raw allocator address / RNG / etc. Diff tool skip-compares non-deterministic fields. |
| payload | object | kind-specific (see below) |
Cycle source notes
- canary: the PPC tb (timebase) register can be read from the PPCContext passed into shim handlers. If a hook is on a path that does not have access to a PPCContext (e.g. a host-side handle-table destructor), the emitter MUST set guest_cycle = 0 and leave deterministic = false on the payload-side metadata. The diff tool ignores guest_cycle for ordering; tid_event_idx is canonical.
- ours: scheduler.thread(current_ref()).ctx.timebase (already maintained per guest thread).
Per-tid event index
Both engines maintain a per-tid monotonic counter starting at 0. Each event takes the counter's current value and the counter is then bumped (post-increment) before the event is serialized, so the first event for tid N has tid_event_idx = 0.
The schema_version event is special: it is emitted by the writer thread (typically the boot thread before any guest code has run) with tid = 0 and tid_event_idx = 0. The actual guest thread 0 does not exist; the diff tool treats tid = 0 as the schema header only.
Handle semantic ID
Canary and ours produce guest handles in different ranges (canary: 0xF8xxxxxx region; ours: bump-allocated 0x4, 0x8, 0xC, …). Raw handle IDs are unsuitable as a cross-engine identity. Instead, both engines compute a stable handle semantic ID at handle creation time using FNV-1a 64-bit over a fixed-format byte string. FNV-1a is used (not SHA256) because both engines can implement it in <10 lines with no dependency, and the diff tool only needs a deterministic identity hash — not a crypto property.
input_bytes = le_u32(create_site_pc) ‖ le_u32(creating_tid) ‖ le_u64(tid_event_idx_at_creation) ‖ le_u32(object_type)
hash = 0xCBF29CE484222325
for each byte b in input_bytes:
hash = (hash XOR b) * 0x100000001B3 mod 2^64
handle_semantic_id = format("{:016x}", hash)
Both engines MUST emit the lowercase 16-hex-char form. The create_site_pc is the guest PC at the call site of the kernel call that created the handle: in canary, PPCContext::lr - 4 (the bl to the import stub); in ours, the equivalent return address from the syscall dispatcher.
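The recipe above translates directly into a few lines of Python. This is a sketch for cross-checking emitter output, not either engine's implementation; the function names are illustrative.

```python
import struct

FNV_OFFSET_BASIS = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1

def fnv1a_64(data: bytes) -> int:
    """FNV-1a 64-bit: XOR each byte in, then multiply by the prime mod 2^64."""
    h = FNV_OFFSET_BASIS
    for b in data:
        h = ((h ^ b) * FNV_PRIME) & MASK64
    return h

def handle_semantic_id(create_site_pc: int, creating_tid: int,
                       tid_event_idx_at_creation: int, object_type: int) -> str:
    """Per-thread SID: little-endian u32, u32, u64, u32 (20 bytes), hashed, lowercase hex."""
    data = struct.pack("<IIQI", create_site_pc, creating_tid,
                       tid_event_idx_at_creation, object_type)
    return f"{fnv1a_64(data):016x}"
```

The `<` prefix in the struct format disables padding, so the input is exactly 20 bytes in the field order given by the recipe.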
Object type codes (v1 — both engines agree):
| Code | Type |
|---|---|
| 0x00 | Unknown |
| 0x01 | Event |
| 0x02 | Mutant |
| 0x03 | Semaphore |
| 0x04 | Timer |
| 0x05 | Thread |
| 0x06 | File |
| 0x07 | IoCompletion |
| 0x08 | Module |
| 0x09 | EnumState |
| 0x0A | Section |
| 0x0B | Notification |
All subsequent events that reference a handle emit BOTH handle_semantic_id (the diff key) and raw_handle_id (engine-local, never compared).
Event kinds (v1)
schema_version
Header event. payload = {"version": 1, "emitter_build": "<string>"}.
thread.create
Emitted by the parent thread at the kernel call that creates the new thread.
"payload": {
"handle_semantic_id": "0123456789abcdef",
"parent_tid": 1,
"entry_pc": "0x82001234",
"ctx_ptr": "0xbce25340",
"priority": 0,
"affinity": 1,
"stack_size": 65536,
"suspended": false
}
thread.exit
Emitted by the exiting thread (last event before tid disappears).
"payload": {"exit_code": 0}
thread.suspend / thread.resume
"payload": {"target_tid": 13}
kernel.call
Emit at handler entry, before any side effects.
"payload": {
"name": "NtCreateFile",
"args": {"file_handle_ptr": "0x70000010", "desired_access": "0x80100080", "obj_attr_ptr": "0x70000020", ...},
"args_resolved": {"path": "\\Device\\Cdrom0\\dat\\movie\\opening.bik"}
}
- Numeric args use 0x-prefixed hex strings if pointer-typed; ints stay as ints.
- args_resolved is a best-effort dereference (strings, struct dumps, buffer summaries). Optional.
kernel.return
Emit at handler exit, after all side effects committed.
"payload": {
"name": "NtCreateFile",
"return_value": 0,
"status": "0x00000000",
"side_effects": [
{"kind": "handle.create", "handle_semantic_id": "...", "object_type": 6, "raw_handle_id": "0x40"}
]
}
The side_effects array MAY duplicate events also emitted as standalone (handle.create). The diff tool treats both as authoritative; duplicates do not cause divergence.
handle.create
For host-side creates not tied to a kernel call (rare).
"payload": {
"handle_semantic_id": "0123456789abcdef",
"object_type": 1,
"object_name": null,
"raw_handle_id": "0xf8000048"
}
handle.destroy
"payload": {
"handle_semantic_id": "0123456789abcdef",
"raw_handle_id": "0xf8000048",
"prior_refcount": 1
}
wait.begin
"payload": {
"handles_semantic_ids": ["0123...", "abcd..."],
"timeout_ns": -1,
"alertable": false,
"wait_type": "any"
}
timeout_ns = -1 means INFINITE. wait_type is "any" or "all".
wait.end
"payload": {
"status": "0x00000000",
"woken_by_semantic_id": "0123456789abcdef",
"wait_duration_cycles": 12345
}
wait_duration_cycles is deterministic = false (host scheduling affects it). woken_by_semantic_id is null on timeout / alerted.
mem.write
OPT-IN — gated by a separate cvar (phase_a_event_log_mem_writes, default false). In Phase A this kind is reserved; emitters MAY ship a TODO stub. Schema:
"payload": {
"guest_addr": "0x82000000",
"value": "0x12345678",
"size": 4,
"source": "guest_jit"
}
vfs.open / vfs.read / vfs.close
File-IO events, separate from kernel.call so the diff tool can match on canonical path:
"payload": {"canonical_path": "\\Device\\Cdrom0\\dat\\movie\\opening.bik", "raw_handle_id": "0x40", "handle_semantic_id": "..."}
import.call
Emitted at the syscall dispatcher (ours) or the import-stub JIT trap (canary), one per imported function invocation, before the implementing kernel.call.
"payload": {
"module": "xboxkrnl.exe",
"ord": 0x101,
"name": "NtCreateFile"
}
Diff-tool field-comparison rules
| Field | Rule |
|---|---|
| engine | skipped (always differs) |
| host_ns | skipped (host-clock) |
| guest_cycle | skipped (engines disagree on absolute count; diff uses tid_event_idx) |
| raw_handle_id | skipped (engines use different handle namespaces) |
| handle_semantic_id | C+15-α: skipped (engine-local; see below) |
| handles_semantic_ids (wait.begin) | C+15-α: skipped (same reason) |
| parent_tid (thread.create) | C+15-α: skipped (engine-local guest tids) |
| ctx_ptr (thread.create) | C+22 v1.7: per-(tid, kind, field) ordinal sentinel (<HOSTHEAP_thread.create_ctx_ptr_N>); host-heap-derived VA, AUDIT-043 ε class |
| woken_by_semantic_id (wait.end) | C+15-α: skipped (engine-local SID) |
| deterministic (event-level field) | skipped (metadata) |
| Any payload field listed under a non-deterministic kind | skipped where flagged |
| All other payload fields | strict equality |
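A minimal sketch of how the table could be applied to one pair of events. The name SKIP_PAYLOAD_FIELDS_BY_KIND appears later in this document, but the contents shown here, the ALWAYS_SKIPPED set, and both helper functions are assumptions made for illustration. The ctx_ptr sentinel rewrite is left out of this sketch.

```python
# Assumed contents, derived from the table above; the real tables may differ.
ALWAYS_SKIPPED = {"raw_handle_id", "handle_semantic_id"}
SKIP_PAYLOAD_FIELDS_BY_KIND = {
    "thread.create": {"parent_tid"},
    "wait.begin": {"handles_semantic_ids"},
    "wait.end": {"woken_by_semantic_id", "wait_duration_cycles"},
}

def comparable_payload(event: dict) -> dict:
    """Drop skipped fields; everything that remains is compared strictly."""
    skip = ALWAYS_SKIPPED | SKIP_PAYLOAD_FIELDS_BY_KIND.get(event["kind"], set())
    return {k: v for k, v in event.get("payload", {}).items() if k not in skip}

def events_match(canary: dict, ours: dict) -> bool:
    """Event-level fields (engine, host_ns, guest_cycle, deterministic) are never consulted."""
    return (canary["kind"] == ours["kind"]
            and comparable_payload(canary) == comparable_payload(ours))
```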
Phase C+15-α note on handle_semantic_id
The SID computation includes creating_tid as input, but guest TIDs differ
between engines (canary's tid=6 maps to ours's tid=1 on the main chain).
Both engines compute SIDs using their own local tids, so the same logical
handle gets two different SIDs across engines. The diff tool skip-compares
SID fields and relies on tid_event_idx + object_type for alignment.
A future schema v2 could canonicalize SIDs via the diff tool's tid map and restore strict comparison. For v1.1 the simpler skip-policy suffices.
Shared-global SIDs (v1.2 — added in Phase C+18)
A subset of guest kernel dispatcher objects (KEVENT, KSEMAPHORE,
KTIMER, KMUTANT) are process-global: they live in
statically-initialized or pre-allocated guest memory and are touched
by MULTIPLE guest threads during boot. Examples include the XAudio
voice-volume change-mask semaphore at 0x828a3230 in Sylpheed.
Canary's XObject::GetNativeObject (src/xenia/kernel/xobject.cc:397-483)
and ours's ensure_dispatcher_object (crates/xenia-kernel/src/exports.rs:4363)
lazy-wrap these dispatchers on first guest-thread touch: the
first KeWait* invocation that passes the raw kernel-object pointer
synthesizes the XObject wrapper, stamps the X_DISPATCH_HEADER with
the kXObjSignature marker ('X','E','N','\0' = 0x58454E00), stashes
the handle, and emits handle.create. Subsequent touches find the
marker and short-circuit without emit (per-pointer idempotent).
The first-toucher race
Which guest thread wins the "first toucher" race is timing-dependent:
- Canary and ours have different host schedulers, JIT throughput, and guest-thread bootstrap ordering.
- Even within the same engine across runs the first-toucher can differ — but each engine produces a deterministic per-run total ordering, so cold-vs-cold reproducibility holds.
The per-thread SID recipe semantic_id(create_site_pc, creating_tid, tid_event_idx_at_creation, object_type) (v1) depends on BOTH
creating_tid and tid_event_idx_at_creation, so:
- Same dispatcher → DIFFERENT SIDs in each engine (race-dependent).
- handle.create for the same object lands on different per-tid streams in canary vs ours.
The C+17 fix made ours emit handle.create for these synthesized
shadows, but the C+17 D-NEW-3 regression on tid=15→10 was
exactly the first-toucher race: ours's tid=10 was the first toucher
locally; canary's tid=15 was NOT the first toucher in its run — some
other canary tid had already adopted 0x828a3230. ours's tid=10
emitted an "extra" handle.create that canary's tid=15 lacked, and
the diff tool flagged a kind mismatch at idx=2.
The C+18 fix: deterministic SID recipe
Process-global dispatchers use a second SID recipe that is scheduling-invariant. Both engines now use:
SHARED_GLOBAL_SID_MARKER = 0xC01AB005 (fixed sentinel, both engines)
input_bytes =
le_u32(SHARED_GLOBAL_SID_MARKER) // 4 bytes — "create_site_pc" slot
‖ le_u32(0) // 4 bytes — "creating_tid" slot
‖ le_u64(pointer) // 8 bytes — "tid_event_idx" slot
‖ le_u32(object_type) // 4 bytes
hash = FNV-1a-64(input_bytes)
shared_global_sid = format("{:016x}", hash)
The marker 0xC01AB005 is outside any plausible guest-PC range
(PPC text 0x82000000-0x82FFFFFF; XEX header 0x3001xxxx; heap
0x4xxxxxxx), so it can never collide with a regular per-thread SID
(which uses a real guest PC as create_site_pc).
Both engines compute the SAME SID for the same dispatcher pointer regardless of:
- which guest thread is the first toucher,
- the tid_event_idx_at_creation,
- the per-engine scheduling order.
Which call sites use which recipe
| Call site | SID recipe |
|---|---|
| KernelState::alloc_handle_for (ours) | per-thread |
| ObjectTable::AddHandle direct (canary) | per-thread |
| ensure_dispatcher_object (ours) | shared-global |
| XObject::GetNativeObject synthesized (canary) | shared-global |
Regular per-thread handle.create events (file open, thread create,
named-event create, etc.) keep the v1 per-thread recipe. The
shared-global recipe is restricted to lazy-wrap synthesis.
Diff tool: cross-tid floating handle.create matching
The diff tool pre-pass collects all shared-global SIDs in either
engine's stream. A handle.create event is detected as shared-global
by recomputing the deterministic SID from its (raw_handle_id, object_type) payload and comparing against the event's
handle_semantic_id. Regular per-thread SIDs cannot match this check
by construction.
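The recompute-and-compare check can be sketched as follows. It assumes raw_handle_id carries the dispatcher pointer as a 0x-prefixed hex string; the function names are illustrative and not taken from the diff tool.

```python
import struct

SHARED_GLOBAL_SID_MARKER = 0xC01AB005
FNV_OFFSET_BASIS = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3

def fnv1a_64(data: bytes) -> int:
    h = FNV_OFFSET_BASIS
    for b in data:
        h = ((h ^ b) * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h

def shared_global_sid(pointer: int, object_type: int) -> str:
    """Marker in the PC slot, 0 in the tid slot, pointer in the idx slot."""
    data = struct.pack("<IIQI", SHARED_GLOBAL_SID_MARKER, 0, pointer, object_type)
    return f"{fnv1a_64(data):016x}"

def is_recipe_shared_global(event: dict) -> bool:
    """True if the event's SID equals the SID recomputed from its own payload."""
    if event.get("kind") != "handle.create":
        return False
    payload = event.get("payload", {})
    object_type = payload.get("object_type")
    if not isinstance(object_type, int):
        return False
    try:
        pointer = int(payload["raw_handle_id"], 16)
    except (KeyError, TypeError, ValueError):
        return False
    return shared_global_sid(pointer, object_type) == payload.get("handle_semantic_id")
```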
When per-tid alignment finds a kind mismatch and one side has a
shared-global handle.create whose SID is in the floating set:
- The diff tool advances ONLY that side's stream pointer past the floating event.
- Re-compare at the same canonical position.
The diff report's summary table shows a floating_skipped (c/o)
column for visibility — counts of absorbed events per side.
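A simplified sketch of the per-tid two-pointer walk with floating absorption. It compares kinds only (payload comparison is omitted for brevity) and the function name `align` is hypothetical.

```python
def align(canary: list, ours: list, floating_sids: set):
    """Walk one tid pair; return (matched, skipped_canary, skipped_ours)."""
    def is_floating(ev: dict) -> bool:
        return (ev["kind"] == "handle.create"
                and ev["payload"].get("handle_semantic_id") in floating_sids)

    ic = io = matched = skipped_c = skipped_o = 0
    while ic < len(canary) and io < len(ours):
        c, o = canary[ic], ours[io]
        if c["kind"] == o["kind"]:
            matched += 1
            ic += 1
            io += 1
        elif is_floating(c):
            ic += 1          # advance ONLY canary past the floating event
            skipped_c += 1
        elif is_floating(o):
            io += 1          # advance ONLY ours past the floating event
            skipped_o += 1
        else:
            break            # real divergence: stop and report
    return matched, skipped_c, skipped_o
```

The two skip counters correspond to the per-side counts shown in the report's floating_skipped (c/o) column.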
Index relaxation
The C+18 fix relaxes the legacy diff-tool rule that requires
canary.tid_event_idx == ours.tid_event_idx for matching events.
With floating absorption, the per-tid indices can drift by 1 between
the two sides — but the kind and payload comparisons remain
strict. The raw indices are still preserved on the events themselves
(useful for debugging and report context).
Backward compatibility
- Wire format unchanged. schema_version is still 1.
- Pre-C+18 event logs (no shared-global SIDs in the stream) trigger the legacy code path automatically; the floating set is empty.
- The marker constant 0xC01AB005 MUST be exactly this value in both engines and the diff tool. Tests in both engines plus tools/diff-events/test_diff_events.py lock it in.
Wait-begin floating absorb (v1.3 — added in Phase C+21)
Motivation
Canary's RtlEnterCriticalSection (and its symmetric counterparts —
KeWaitForSingleObject invoked on a process-global dispatcher,
mutex/semaphore contended-acquire paths) emits wait.begin only on
the contended slow path. The fast path (uncontended atomic-CAS, or
recursive bump) emits NO wait.begin and only the kernel.call →
kernel.return pair. Which path is taken depends on whether ANOTHER
guest thread is currently holding the dispatcher when the wait is
attempted — i.e. it is host-scheduler-driven, varying across cold
runs of the same engine.
Reading-error class #32 (documented in C+20's
investigation.md) captures this: cross-checking 3 fresh canary cold
runs at canary tid=6 idx 104,606 showed:
- jitter-1: wait.begin sid=75ae880ec432eb36 (contended)
- jitter-2: kernel.return (fast; matches ours)
- jitter-3: offset-shifted wait.begin at a different idx with a different SID
The matched-prefix metric is unreliable inside such regions if the diff tool treats wait.begin events as strictly positional.
The fix
A wait.begin event is floating if at least one of its
payload.handles_semantic_ids references a shared-global SID
(see §"Shared-global SIDs"). During the per-tid two-pointer walk:
- If one side has a floating wait.begin and the other has a different kind at the same canonical position, advance ONLY the wait.begin side's pointer and re-compare.
wait_type=all waits are floating as long as ANY single handle in
the set is shared-global — the entire wait's blocking behavior is
timing-dependent if even one of its handles is on a process-global
dispatcher.
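The floating test for wait.begin reduces to a membership check over the handle list. A minimal sketch (the function name is illustrative):

```python
def is_floating_wait(event: dict, shared_global_sids: set) -> bool:
    """A wait.begin floats if ANY of its handles is a shared-global SID,
    regardless of wait_type ("any" or "all")."""
    if event.get("kind") != "wait.begin":
        return False
    sids = event.get("payload", {}).get("handles_semantic_ids", [])
    return any(sid in shared_global_sids for sid in sids)
```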
Shared-global SID detection (extended in C+21)
The diff tool's collect_shared_global_sids pre-pass now unions
TWO sources:
- Recipe-matching handle.create events (Phase C+18, direct). This catches ours's ensure_dispatcher_object output where raw_handle_id == ptr (the recipe-input pointer).
- Cross-tid usage heuristic (Phase C+21, indirect). Any SID referenced via handle.create OR wait.begin on two or more distinct guest tids in EITHER engine is treated as shared-global.
The cross-tid heuristic exists because canary's
EmitHandleCreateSharedGlobal (event_log.cc:435) emits the SID
computed from the dispatcher VA but stashes
object->handle() (a handle-table slot in the 0xF8xxxxxx
region) as raw_handle_id. Those two values DIFFER, so canary's
shared-global handle.create events are NOT recipe-recognizable
from their payload alone. Multi-tid SID usage is a robust
observational signal: per-thread SIDs by construction stay on the
single creating tid (their hash inputs include creating_tid),
so any cross-tid SID usage indicates a process-global dispatcher.
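The cross-tid heuristic can be sketched as a single pass over one engine's stream; running it on each engine and taking the union gives the "EITHER engine" behavior. The function name `cross_tid_sids` is hypothetical and is not the diff tool's collect_shared_global_sids.

```python
from collections import defaultdict

def cross_tid_sids(events: list) -> set:
    """Return SIDs referenced by handle.create or wait.begin on >= 2 distinct tids."""
    tids_by_sid = defaultdict(set)
    for ev in events:
        payload = ev.get("payload", {})
        if ev.get("kind") == "handle.create":
            sids = [payload.get("handle_semantic_id")]
        elif ev.get("kind") == "wait.begin":
            sids = payload.get("handles_semantic_ids", [])
        else:
            continue
        for sid in sids:
            if sid is not None:
                tids_by_sid[sid].add(ev["tid"])
    return {sid for sid, tids in tids_by_sid.items() if len(tids) >= 2}
```

Usage under these assumptions: `floating = cross_tid_sids(canary_events) | cross_tid_sids(ours_events)`.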
Risk of over-absorption (and why it's bounded)
The cross-tid heuristic could in principle mis-classify a per-thread
SID that one thread creates and another thread waits on — a
legitimate cross-thread synchronization pattern. The floating-absorb,
however, only fires on a kind mismatch at the canonical position.
Per-thread waits that match strictly on both sides advance normally
without any absorb. The heuristic only loosens alignment when one
side is missing a handle.create or wait.begin — exactly the
scheduling-jitter window the C+21 fix targets.
Diff-tool report changes
The summary table's floating_skipped (c/o) column is split into
two columns:
- floating_create (c/o): C+18 handle.create absorptions.
- floating_wait (c/o): C+21 wait.begin absorptions.

Both columns are per-side and observation-only; counts may legitimately be non-zero in a clean run.
Backward compatibility
- Wire format unchanged. schema_version is still 1.
- Pre-C+21 event logs (no wait.begin events that reference shared-global SIDs) trigger no new behavior; the wait absorption branches are inert.
- The C+18 floating-create logic is unchanged; the C+21 fix is strictly additive.
- Engine source is UNCHANGED in C+21; the fix is in the diff tool only.
contention.observed (v1.4 — added in Phase D Stage 1, 2026-05-18)
Motivation
The 104,607 cap is canary's tid=6 contending on a CS while ours's tid=1
fast-paths through the same call (Phase C+22). Schedules diverge for
host-OS reasons, so neither engine is "wrong" — but matched-prefix
stalls. Phase D's H' approach makes ours's rtl_enter_critical_section
replay canary's contention by consulting a per-call manifest built
from canary's contention trace.
Stage 1 (this section) introduces the canary-side emitter for that
manifest: a new event kind contention.observed that fires from
RtlEnterCriticalSection_entry (xboxkrnl_rtl.cc:596-633) just before
the call falls through to xeKeWaitForSingleObject after spin-loop
exhaustion. Cvar-gated (kernel_emit_contention, default false) so
default canary behavior is byte-identical.
Event shape
{
"schema_version": 1,
"engine": "canary",
"kind": "contention.observed",
"tid": <guest tid of caller>,
"tid_event_idx": <per-tid ordinal — consumes one slot>,
"guest_cycle": 0,
"host_ns": <emit timestamp>,
"deterministic": true,
"payload": {
"cs_ptr": "0xHHHHHHHH", // guest VA of the RTL_CRITICAL_SECTION
"site_sid": "HHHHHHHHHHHHHHHH", // shared-global SID (see below)
"contended": true // always true at v1.4 (uncontended is implicit)
}
}
site_sid is computed via the C+18 shared-global SID recipe:
site_sid = FNV-1a-64 over
( kSharedGlobalSidMarker [u32 LE] // 0xC01AB005
, 0 [u32 LE] // creating_tid (unused)
, cs_ptr as u64 [u64 LE] // pointer-as-idx
, kObjCriticalSection [u32 LE] // 0x0C, new in v1.4
)
Both engines compute the same SID for the same CS pointer. The marker
constant kObjCriticalSection = 0x0C is the new ObjectType value
introduced for this kind; it does NOT correspond to a real XObject
(CS lives as a guest-memory struct, not a handle-tabled object).
When emitted (canary)
In RtlEnterCriticalSection_entry:
- Recursive-lock fast path (already own lock) → NO emit (not contention).
- Spin-loop succeeds (atomic_cas flips lock_count from -1 → 0) → NO emit (fast acquire).
- Spin-loop exhausted AND atomic_inc(&cs->lock_count) != 0 → EMIT with contended=true, then xeKeWaitForSingleObject.
- Spin-loop exhausted AND atomic_inc(...) == 0 (CS became free between spin and inc) → NO emit (we won the race after spin).
The emit point sits between atomic_inc's positive result and the
xeKeWaitForSingleObject call, so the new event always precedes the
existing wait.begin event in the per-tid ordinal.
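The four cases above collapse into a small decision function. This is a model of the documented conditions only, not canary's code; the function and parameter names are hypothetical.

```python
def should_emit_contention(already_owner: bool, spin_acquired: bool,
                           inc_result: int) -> bool:
    """inc_result models the value atomic_inc(&cs->lock_count) yields after
    the spin loop is exhausted; it is ignored on the two fast paths."""
    if already_owner:
        return False          # recursive-lock fast path
    if spin_acquired:
        return False          # spin loop won the lock
    return inc_result != 0    # non-zero: still contended, emit then wait
```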
When emitted (ours, Stage 3 — pending)
Stage 3 will add a symmetric emit in rtl_enter_critical_section
(xenia-rs/crates/xenia-kernel/src/exports.rs:2886-2946) at the
forced-park branch driven by the manifest. This keeps per-tid ordinals
aligned across engines after replay.
Diff-tool treatment (Stage 4 — pending)
contention.observed will be added to ENGINE_LOCAL_KINDS in
diff_events.py: the per-tid pointer advances past these events on
either side without comparison. This keeps matched-prefix counts
unchanged when ONE side emits the event (Stage 1's canary-only world)
or when BOTH emit at the same ordinal (Stage 3's parity world).
Cvar default + byte-identity
kernel_emit_contention=false by default. With cvar=false, the helper
phase_a::EmitContentionObserved short-circuits at the cvar check
before any IsEnabled() lookup. The pre-Stage-1 canary code path is
preserved byte-for-byte; cvar-OFF cold runs produce zero
contention.observed events (validated on the Stage 1 cold run:
0 occurrences in a 4.4 GB / 18.6 M event trace).
Nested-CS-cleanup absorber (v1.5 — added in Phase D D-extension, 2026-05-18)
Status
Band-aid. Explicit annotation: this absorber CROSSES the reading-error #23 boundary in spirit. It folds real guest control-flow divergence at the diff-tool layer. It exists because the underlying root cause — producer-throughput divergence under the cooperative-vs-preemptive scheduling mismatch (see Phase D forensics) — is explicitly out of scope for the H' plan: fixing it in ours's engine would require preempting the cooperative scheduler, which invalidates 23 phases of digest stability. The absorber is the practical compromise.
Trigger shape
The absorber fires ONLY at a kind mismatch of:
- canary[ic] = import.call with payload.name == "RtlEnterCriticalSection"
- ours[io] = import.call with payload.name == "RtlLeaveCriticalSection"
For any other kind mismatch, the absorber is silent. This narrowness is intentional: real engine divergences appear in other shapes and must still surface.
Behavior
When the trigger pattern matches, canary's stream is scanned for one or
more balanced [Enter-block, Leave-block] pairs immediately following
the trigger position:
- An Enter-block is 3 consecutive events: import.call RtlEnterCriticalSection → kernel.call RtlEnterCriticalSection → kernel.return RtlEnterCriticalSection.
- A Leave-block is 3 consecutive events with RtlLeaveCriticalSection.
The absorber consumes pairs greedily up to a cap of _NESTED_CS_PAIR_CAP = 32 pairs (empirically, Sylpheed's worst-case is ~10-15 pairs at the
104,607 cap). After consuming each pair, it checks whether canary's next
event has the SAME kind AND same payload.name as ours[io]. The first
convergence wins; canary's pointer is advanced past the absorbed pairs.
If no convergence is found within the cap, the absorber returns None and the divergence falls through to normal reporting.
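A sketch of the absorber's scan, assuming the first Enter-block begins at the trigger position canary[ic]. The helper names are illustrative; only the cap value and the block shapes come from the text above.

```python
NESTED_CS_PAIR_CAP = 32  # mirrors _NESTED_CS_PAIR_CAP
ENTER, LEAVE = "RtlEnterCriticalSection", "RtlLeaveCriticalSection"
BLOCK_KINDS = ("import.call", "kernel.call", "kernel.return")

def _is_block(events: list, start: int, name: str) -> bool:
    """True if events[start:start+3] is import.call, kernel.call, kernel.return for `name`."""
    block = events[start:start + 3]
    return len(block) == 3 and all(
        ev["kind"] == kind and ev["payload"].get("name") == name
        for ev, kind in zip(block, BLOCK_KINDS))

def absorb_nested_cs(canary: list, ic: int, ours: list, io: int):
    """Return canary's new index past the absorbed pairs, or None."""
    c, o = canary[ic], ours[io]
    if not (c["kind"] == o["kind"] == "import.call"
            and c["payload"].get("name") == ENTER
            and o["payload"].get("name") == LEAVE):
        return None                       # trigger shape not met
    pos = ic
    for _ in range(NESTED_CS_PAIR_CAP):
        if not (_is_block(canary, pos, ENTER) and _is_block(canary, pos + 3, LEAVE)):
            return None                   # unbalanced: report the divergence
        pos += 6
        if pos < len(canary):
            nxt = canary[pos]
            if (nxt["kind"] == o["kind"]
                    and nxt["payload"].get("name") == o["payload"].get("name")):
                return pos                # first convergence wins
    return None
```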
Why this is safe (within #23's spirit)
- The absorption only happens when canary's stream re-aligns with ours's stream past the nested block. If it doesn't re-align, the real divergence is reported.
- The nested-block shape matches a specific PPC pattern: the consumer thread in canary acquires a CS, calls a helper that iterates a tree/registry, takes the nested-CS-enter path for each item, and releases the outer CS. Ours's tree is shorter so it skips this. The net effect on guest state is bounded: ours has fewer items processed in this iteration, but the EVENT stream past the absorption resumes the same logical operation.
- The Phase B image_loaded_sha256 is the foundational invariant. It's unaffected by this absorber (no engine source change).
Why this is NOT safe in the general sense
- Diverging downstream state IS lost: ours's tree has fewer entries than canary's after the absorbed block. Subsequent ours operations that touch the tree will behave differently. Other absorbers / fixes will be needed if those state-differences manifest later.
- A future engine bug that produces a spuriously nested Enter+Leave pair could be falsely absorbed. Mitigation: the absorber requires canary's post-block stream to re-align with ours's; spurious nested pairs without re-alignment fall through to normal divergence reporting.
Empirical result (Sylpheed 104,607 cap)
Pre-absorber (post-Stage-3+4): main matched-prefix = 104,607 (cap). Post-absorber: main matched-prefix = 105,046 (+439 events).
The next divergence is at idx 105,046 on VdInitializeEngines.return_value
(canary=1, ours=0) — an unrelated engine bug in the video subsystem,
NOT a recurrence of the cap pattern. Sister chains preserved
(11/32/4/41/16).
Tests
Three unit tests in test_diff_events.py:
- test_nested_cs_cleanup_block_absorbed_when_convergent: folds one nested pair
- test_nested_cs_cleanup_NOT_absorbed_when_followup_diverges: confirms re-alignment requirement
- test_nested_cs_cleanup_NOT_absorbed_when_canary_has_no_followup: negative case
sema.release (v1.6 — added in AUDIT-069 Session 6, 2026-05-21)
Motivation
AUDIT-069 Sessions 1-5 established that ours under-produces semaphore
releases by ~80% on the work-semaphore vs canary (99 vs 414 in S5,
refined in S6 to 83 vs 414 apples-to-apples on the work semaphore
alone). The measurement infrastructure was a one-off cvar
(audit_70_semaphore_release_watch, hand-built per-handle log lines)
plus an ours-side --lr-trace capture at the wrapper-entry PC. Future
AUDIT-070+ sessions and any general regression triage need this metric
to be diff-visible without bespoke cvars per investigation.
sema.release lifts the AUDIT-070 cvar's signal into the Phase A
schema as a symmetric event kind in both engines.
Event shape
{
"schema_version": 1,
"engine": "canary",
"kind": "sema.release",
"tid": <guest tid of caller>,
"tid_event_idx": <per-tid ordinal — consumes one slot>,
"guest_cycle": <PPC timebase>,
"host_ns": <emit timestamp>,
"deterministic": true,
"payload": {
"handle_semantic_id": "HHHHHHHHHHHHHHHH", // shared-global SID for the work-sem
"raw_handle_id": "0xHHHHHHHH", // engine-local
"release_count": 1, // games typically release 1
"previous_count": 0, // semaphore count BEFORE release
"caller_pc": "0xHHHHHHHH" // guest LR at release time
}
}
SID recipe
The work-semaphore in Sylpheed (canary handle 0xF800003C, ours
handle 0x1044) is a process-global dispatcher in the C+18 sense:
it lives in pre-allocated guest memory and is touched by multiple
guest threads (main, worker, cache-thread, other producers). Its
handle_semantic_id SHOULD use the shared-global recipe
(ComputeSharedGlobalSemanticId(dispatcher_ptr, kObjSemaphore=0x03))
so canary and ours produce the same SID for the same guest dispatcher.
Per-thread semaphores (rare in Sylpheed) MAY use the v1 per-thread
recipe; the diff tool does NOT compare SIDs for sema.release (the
kind is engine-local positionally — see below).
Why engine-local
Per AUDIT-069 H3 and S6's first-N=20 measurement, the cadence and
ordinal interleaving of releases between the worker, main, and
cache-thread are timing-dependent: the first 20 releases match
perfectly across engines, but worker tid diverges at canary ord=83
when the cache-thread's first release fires (which ours never
emits because ours's cache-thread wedges at sub_821CB030+0x1AC).
Strict positional alignment would always trip on this known
divergence.
sema.release is therefore in ENGINE_LOCAL_KINDS in the diff tool
(alongside contention.observed): both engines emit, but the diff
tool advances past these events on either side without alignment.
The count is surfaced in the report's "Counted engine-local
kinds" summary table (per-tid + total per engine) so cadence
regressions are diff-visible at-a-glance.
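A sketch of the skip-and-count treatment. The two set names come from this document, but their contents here and the helper `strip_engine_local` are assumptions for illustration.

```python
from collections import Counter

# Assumed contents; the document names these sets but does not list them in full.
ENGINE_LOCAL_KINDS = {"contention.observed", "sema.release"}
COUNTED_ENGINE_LOCAL_KINDS = {"sema.release"}

def strip_engine_local(events: list):
    """Drop engine-local events before alignment; count the counted kinds per (tid, kind)."""
    kept, counts = [], Counter()
    for ev in events:
        if ev["kind"] in ENGINE_LOCAL_KINDS:
            if ev["kind"] in COUNTED_ENGINE_LOCAL_KINDS:
                counts[(ev["tid"], ev["kind"])] += 1
            continue
        kept.append(ev)
    return kept, counts
```

Summing the counter's values per engine gives the totals that a "Counted engine-local kinds" table would display.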
Emit points (planned, NOT yet wired)
- Canary: extend audit_70_semaphore_release_watch to call phase_a::EmitSemaRelease(handle, count, prev_count) from NtReleaseSemaphore_entry + xeKeReleaseSemaphore. Cvar gating remains the existing audit_70_semaphore_release_watch (or a new phase_a_event_log_sema_releases=false for finer control).
- Ours: emit sema.release from nt_release_semaphore in crates/xenia-kernel/src/exports.rs and from KSemaphore::release (kernel-mode equivalent). Default-off via a runtime flag; default cold runs must remain digest-stable.
Both engines MUST emit at handler entry (not wrapper-internal) so the
event count corresponds 1:1 to guest NtReleaseSemaphore invocations,
matching the canary cvar's existing semantics.
Status
- Diff tool: support landed (this session, v1.6). sema.release in ENGINE_LOCAL_KINDS + COUNTED_ENGINE_LOCAL_KINDS; counts surfaced in report summary; 3 new tests in test_diff_events.py.
- Canary emit: NOT YET WIRED. Planned for AUDIT-070+ when the root cause investigation requires it. Existing cvar audit_70_semaphore_release_watch continues to emit non-schema log lines (used by S5/S6 captures).
- Ours emit: NOT YET WIRED. See above.
Backward compatibility
- Wire format unchanged. schema_version is still 1.
- Pre-v1.6 event logs (no sema.release events) trigger no new behavior; the engine-local skip branches are inert, and the "Counted engine-local kinds" report section is suppressed when no counted-kind events exist.
- Diff tool changes are purely additive: existing engine binaries diff identically pre- and post-v1.6.
Host-heap payload-field canonicalization (v1.7 — added in Phase C+22, 2026-05-26)
Motivation
C+2 (ALLOCATOR_RETURN_FNS) canonicalizes kernel.return.return_value
for a known set of host-allocator-returning exports
(MmAllocatePhysicalMemoryEx, RtlAllocateHeap, …). That covers
the case where the allocated VA appears as the function's return
value. But the same allocator-drift class (AUDIT-043 ε:
canary's BC physical heap 0xBCxxxxxx vs ours's unified user heap
0x4xxxxxxx) ALSO surfaces inside typed event payloads of
non-allocator exports — most notably the thread.create.ctx_ptr
field, which holds the host-allocated TLS/context block that
ExCreateThread passes to the new guest thread's r3.
Empirical surface (C+22 cold-vs-cold idx 105,128 on the Sylpheed
audio-stack worker ExCreateThread(entry_pc=0x824cd458)):
| field | canary | ours |
|---|---|---|
| ctx_ptr | 0xbe56bb3c (BC physical heap) | 0x42453b3c (unified user heap) |
| entry_pc | 0x824cd458 | 0x824cd458 (bit-identical; game code) |
| priority | 0 | 0 |
| affinity | 4 | 4 |
| stack_size | 32768 | 32768 |
| suspended | false | false |
The C+2 ALLOCATOR_RETURN_FNS mechanism doesn't help here because
ExCreateThread's return value is the new thread's handle
(canary's 0xF8xxxxxx vs ours's 0x4, 0x8, …), already covered
by handle_semantic_id skip-policy. The host-heap-allocated
context block is a side-channel field inside the
thread.create event payload.
The fix
HOST_HEAP_PAYLOAD_FIELDS_BY_KIND maps event kind → tuple of
payload field names. Each listed field's value (expected
0x-prefixed hex string) is rewritten to a per-(tid, kind, field)
ordinal sentinel <HOSTHEAP_<KIND>_<FIELD>_<ORDINAL>> BEFORE
payload comparison. The mechanism mirrors
canonicalize_allocator_returns exactly, restricted to typed
payload fields.
Initial set (v1.7):
HOST_HEAP_PAYLOAD_FIELDS_BY_KIND = {
"thread.create": ("ctx_ptr",),
}
Strict-field preservation
For each canonicalized event kind, the strict fields (game-visible
attributes that MUST match across engines) are untouched. For
thread.create these are:
- entry_pc: guest VA of the new thread's entry function, bit-identical in both engines because both engines load the same XEX and the entry comes from guest code.
- priority, affinity, stack_size, suspended: game-visible thread attributes the guest passes to ExCreateThread.
Skip-policy fields (handle_semantic_id, parent_tid) continue
to be skipped via SKIP_PAYLOAD_FIELDS_BY_KIND (unchanged from
C+15-α — see "Diff-tool field-comparison rules" above).
Why parent_tid does NOT need new canonicalization
Per the C+15-α skip-policy table, parent_tid is already in
SKIP_PAYLOAD_FIELDS_BY_KIND["thread.create"]. The diff tool
pairs guest TIDs at the chain level (--tid-map or
auto_tid_map), and the per-event parent_tid is engine-local
(canary tid=6 vs ours tid=1 for the same logical "main thread"
chain). Skipping is sufficient — no ordinal sentinel needed.
Could a future schema v2 canonicalize parent_tid via the tid
map? Yes, but it would surface mismatches as a map gap rather
than as a clearer per-tid alignment failure that's already
visible at chain boundaries. The v1.x skip-policy is the
simpler choice; tests pin the existing behavior so it doesn't
regress.
Ordinal-count contract
As with ALLOCATOR_RETURN_FNS: if one engine emits MORE
thread.create events on a given tid than the other, ordinals
drift and the next typed event surfaces a divergence against
whatever the other side has at that position. Ordinal-count
mismatch IS a behavioral divergence — the canonicalization
preserves divergence detection, only collapsing
host-allocator-VA noise.
Defensive value handling
If ctx_ptr is non-string (None, int, missing) — pre-C+22
event logs whose emitter omits the field — the canonicalizer
leaves it untouched and does NOT consume an ordinal. The next
string-typed value gets ordinal 0. This keeps pre-v1.7 logs
diffable without forcing an emitter retrofit.
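The canonicalization pass, including the defensive handling of non-string values, can be sketched as below. The map name and sentinel format follow this document; the function name and the in-place mutation are assumptions of this sketch.

```python
from collections import defaultdict

HOST_HEAP_PAYLOAD_FIELDS_BY_KIND = {
    "thread.create": ("ctx_ptr",),
}

def canonicalize_host_heap_fields(events: list) -> list:
    """Rewrite listed fields to per-(tid, kind, field) ordinal sentinels, in place."""
    ordinals = defaultdict(int)
    for ev in events:
        for field in HOST_HEAP_PAYLOAD_FIELDS_BY_KIND.get(ev["kind"], ()):
            value = ev.get("payload", {}).get(field)
            if not isinstance(value, str):
                continue  # None / int / missing: untouched, no ordinal consumed
            key = (ev["tid"], ev["kind"], field)
            ev["payload"][field] = f"<HOSTHEAP_{ev['kind']}_{field}_{ordinals[key]}>"
            ordinals[key] += 1
    return events
```

Run once per engine stream before payload comparison; two streams that allocate the same number of context blocks on a tid end up with identical sentinels even though the raw VAs differ.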
Backward compatibility
- Wire format unchanged. schema_version is still 1.
- Pre-C+22 event logs whose thread.create.ctx_ptr happens to bit-match (e.g. static-allocator addresses like 0x828F3D08 that BOTH engines use for the pre-XEX kernel-state ctxs) still match strictly via the ordinal sentinel; they get the same ordinal in both engines.
- The --no-canonicalize-host-heap-fields CLI flag disables the pass (reverts to raw-VA comparison), mirroring the existing --no-canonicalize-allocators. Used by gate tests and investigation rerun.
- Engine source is UNCHANGED in C+22; the fix is in the diff tool only.
Extension shape
The map shape kind -> (field, …) is intentionally minimal:
each entry is one event kind plus the fields on it that hold
host-heap VAs. Future entries could include e.g.
thread.create.tls_ptr (if such a field is added to the schema)
or a hypothetical vfs.mmap.host_ptr. Strict-field policy
remains: any field NOT listed here is compared bit-identically.
Forward compatibility
Phase A's original schema-v1 declared 13 sections (16 distinct kind strings);
Phase A wired 4 of them. Phase C+15-α wired an additional 5 (handle.create,
handle.destroy, thread.create, thread.exit, wait.begin). wait.end,
thread.suspend/resume, mem.write, vfs.open/read/close remain declared
but unwired; adding them is additive surface area at schema v1.1+.
A future schema v2 may break wire format (e.g. canonical SIDs, structured args).
Both engines pin schema_version = 1 in this phase; the diff tool refuses to
mix v1 and v2 inputs.