MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-05 07:19:08 +02:00

Phase A — Event Log Schema v1

Status: frozen for Phase A and Phase B. Adding a new event kind requires a schema_version bump and a coordinated update in both engines + the diff tool.

Wire format

JSONL — one JSON object per line, UTF-8, \n-terminated. Both engines emit the same byte format.

The first line of every event-log file MUST be a schema_version event:

{"schema_version":1,"engine":"canary","kind":"schema_version","tid":0,"tid_event_idx":0,"guest_cycle":0,"host_ns":0,"deterministic":true,"payload":{"version":1,"emitter_build":"<commit-or-build-id>"}}

The diff tool refuses to parse a file whose first event is not schema_version with version 1.
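The header rule above can be sketched as a small check. This is a minimal illustration, not the diff tool's actual code; the function name `check_header` is hypothetical.

```python
import json


def check_header(first_line: str) -> dict:
    """Parse the first JSONL line and insist on a v1 schema_version event.

    Hypothetical helper: the name and the error messages are assumptions.
    """
    event = json.loads(first_line)
    if event.get("kind") != "schema_version":
        raise ValueError("first event is not schema_version")
    if event.get("payload", {}).get("version") != 1:
        raise ValueError("unsupported schema version")
    return event
```

A caller would read only the first line of the file, pass it to the check, and refuse to continue on `ValueError`.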

Common fields (every event)

Field	Type	Notes
`schema_version`	int	always `1` in this phase
`engine`	string	`"canary"` or `"ours"`
`kind`	string	one of the v1 kinds below
`tid`	int	guest thread id of the calling thread (host TID never logged)
`tid_event_idx`	int	per-tid monotonic, starts at 0 — the diff key
`guest_cycle`	int	per-engine monotonic guest-instruction count; `0` if the engine cannot supply one (see "Cycle source" below). NOT used by the diff tool for correctness — `tid_event_idx` is the canonical key.
`host_ns`	int	host monotonic-clock ns since process start; debug only, never compared by diff
`deterministic`	bool	`false` if any payload field is derived from host time / raw allocator address / RNG / etc. Diff tool skip-compares non-deterministic fields.
`payload`	object	kind-specific (see below)

Cycle source notes

canary: the PPC tb (timebase) register can be read from the PPCContext passed into shim handlers. If a hook is on a path that does not have access to a PPCContext (e.g. a host-side handle-table destructor), the emitter MUST set guest_cycle = 0 and leave deterministic = false on the payload-side metadata. The diff tool ignores guest_cycle for ordering — tid_event_idx is canonical.
ours: scheduler.thread(current_ref()).ctx.timebase (already maintained per guest thread).

Per-tid event index

Both engines maintain a per-tid monotonic counter starting at 0. Each event is stamped with the counter's current value and the counter is bumped afterwards, so the first event for tid N has tid_event_idx = 0.

The schema_version event is special: it is emitted by the writer thread (typically the boot thread before any guest code has run) with tid = 0 and tid_event_idx = 0. The actual guest thread 0 does not exist; the diff tool treats tid = 0 as the schema header only.
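As a rough model of the per-tid index, the sketch below hands out indices so that the first event on each tid gets 0. The class name `TidIndexer` is hypothetical, and neither engine is claimed to be structured this way.

```python
from collections import defaultdict


class TidIndexer:
    """Hands out per-tid event indices; the first index on every tid is 0."""

    def __init__(self) -> None:
        self._next = defaultdict(int)  # tid -> next index to hand out

    def take(self, tid: int) -> int:
        idx = self._next[tid]
        self._next[tid] = idx + 1
        return idx
```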

Handle semantic ID

Canary and ours produce guest handles in different ranges (canary: 0xF8xxxxxx region; ours: bump-allocated 0x4, 0x8, 0xC, …). Raw handle IDs are unsuitable as a cross-engine identity. Instead, both engines compute a stable handle semantic ID at handle creation time using FNV-1a 64-bit over a fixed-format byte string. FNV-1a is used (not SHA256) because both engines can implement it in <10 lines with no dependency, and the diff tool only needs a deterministic identity hash — not a crypto property.

input_bytes = le_u32(create_site_pc) ‖ le_u32(creating_tid) ‖ le_u64(tid_event_idx_at_creation) ‖ le_u32(object_type)
hash = 0xCBF29CE484222325
for each byte b in input_bytes:
    hash = (hash XOR b) * 0x100000001B3   mod 2^64
handle_semantic_id = format("{:016x}", hash)

Both engines MUST emit the lowercase 16-hex-char form. The create_site_pc is the guest PC at the call site of the kernel call that created the handle: in canary, PPCContext::lr - 4 (the bl to the import stub); in ours, the equivalent return address from the syscall dispatcher.
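The per-thread recipe above translates directly into Python. This is a sketch for reference and testing, not code from either engine; the function names are hypothetical, while the constants are the standard FNV-1a 64-bit offset basis and prime quoted above.

```python
import struct

FNV_OFFSET_BASIS = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """FNV-1a 64-bit: XOR each byte in, then multiply by the prime mod 2^64."""
    h = FNV_OFFSET_BASIS
    for b in data:
        h = ((h ^ b) * FNV_PRIME) & MASK64
    return h


def per_thread_sid(create_site_pc: int, creating_tid: int,
                   tid_event_idx_at_creation: int, object_type: int) -> str:
    """Pack u32, u32, u64, u32 little-endian (20 bytes, no padding) and hash."""
    data = struct.pack("<IIQI", create_site_pc, creating_tid,
                       tid_event_idx_at_creation, object_type)
    return format(fnv1a_64(data), "016x")
```

For example, `per_thread_sid(0x82001234, 6, 42, 0x01)` yields a 16-character lowercase hex string; changing only `creating_tid` yields a different one, which is the cross-engine mismatch the C+15-α note describes.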

Object type codes (v1 — both engines agree):

Code	Type
`0x00`	Unknown
`0x01`	Event
`0x02`	Mutant
`0x03`	Semaphore
`0x04`	Timer
`0x05`	Thread
`0x06`	File
`0x07`	IoCompletion
`0x08`	Module
`0x09`	EnumState
`0x0A`	Section
`0x0B`	Notification

All subsequent events that reference a handle emit BOTH handle_semantic_id (the diff key) and raw_handle_id (engine-local, never compared).

Event kinds (v1)

`schema_version`

Header event. payload = {"version": 1, "emitter_build": "<string>"}.

`thread.create`

Emitted by the parent thread at the kernel call that creates the new thread.

"payload": {
  "handle_semantic_id": "0123456789abcdef",
  "parent_tid": 1,
  "entry_pc": "0x82001234",
  "ctx_ptr": "0xbce25340",
  "priority": 0,
  "affinity": 1,
  "stack_size": 65536,
  "suspended": false
}

`thread.exit`

Emitted by the exiting thread (last event before tid disappears).

"payload": {"exit_code": 0}

`thread.suspend` / `thread.resume`

"payload": {"target_tid": 13}

`kernel.call`

Emit at handler entry, before any side effects.

"payload": {
  "name": "NtCreateFile",
  "args": {"file_handle_ptr": "0x70000010", "desired_access": "0x80100080", "obj_attr_ptr": "0x70000020", ...},
  "args_resolved": {"path": "\\Device\\Cdrom0\\dat\\movie\\opening.bik"}
}

Numeric args use 0x-prefixed hex strings if pointer-typed; ints stay as ints.
args_resolved is a best-effort dereference (strings, struct dumps, buffer summaries). Optional.

`kernel.return`

Emit at handler exit, after all side effects committed.

"payload": {
  "name": "NtCreateFile",
  "return_value": 0,
  "status": "0x00000000",
  "side_effects": [
    {"kind": "handle.create", "handle_semantic_id": "...", "object_type": 6, "raw_handle_id": "0x40"}
  ]
}

The side_effects array MAY duplicate events also emitted as standalone (handle.create). The diff tool treats both as authoritative; duplicates do not cause divergence.

`handle.create`

For host-side creates not tied to a kernel call (rare).

"payload": {
  "handle_semantic_id": "0123456789abcdef",
  "object_type": 1,
  "object_name": null,
  "raw_handle_id": "0xf8000048"
}

`handle.destroy`

"payload": {
  "handle_semantic_id": "0123456789abcdef",
  "raw_handle_id": "0xf8000048",
  "prior_refcount": 1
}

`wait.begin`

"payload": {
  "handles_semantic_ids": ["0123...", "abcd..."],
  "timeout_ns": -1,
  "alertable": false,
  "wait_type": "any"
}

timeout_ns = -1 means INFINITE. wait_type is "any" or "all".

`wait.end`

"payload": {
  "status": "0x00000000",
  "woken_by_semantic_id": "0123456789abcdef",
  "wait_duration_cycles": 12345
}

wait_duration_cycles is non-deterministic (host scheduling affects it), so the diff tool skips it. woken_by_semantic_id is null on timeout / alerted.

`mem.write`

OPT-IN — gated by a separate cvar (phase_a_event_log_mem_writes, default false). In Phase A this kind is reserved; emitters MAY ship a TODO stub. Schema:

"payload": {
  "guest_addr": "0x82000000",
  "value": "0x12345678",
  "size": 4,
  "source": "guest_jit"
}

`vfs.open` / `vfs.read` / `vfs.close`

File-IO events, separate from kernel.call so the diff tool can match on canonical path:

"payload": {"canonical_path": "\\Device\\Cdrom0\\dat\\movie\\opening.bik", "raw_handle_id": "0x40", "handle_semantic_id": "..."}

`import.call`

Emitted at the syscall dispatcher (ours) or the import-stub JIT trap (canary), one per imported function invocation, before the implementing kernel.call.

"payload": {
  "module": "xboxkrnl.exe",
  "ord": 257,
  "name": "NtCreateFile"
}

Diff-tool field-comparison rules

Field	Rule
`engine`	skipped (always differs)
`host_ns`	skipped (host-clock)
`guest_cycle`	skipped (engines disagree on absolute count; diff uses `tid_event_idx`)
`raw_handle_id`	skipped (engines use different handle namespaces)
`handle_semantic_id`	C+15-α: skipped (engine-local — see below)
`handles_semantic_ids` (wait.begin)	C+15-α: skipped (same reason)
`parent_tid` (thread.create)	C+15-α: skipped (engine-local guest tids)
`ctx_ptr` (thread.create)	C+22 v1.7: per-(tid, kind, field) ordinal sentinel (`<HOSTHEAP_thread.create_ctx_ptr_N>`) — host-heap-derived VA, AUDIT-043 ε class
`woken_by_semantic_id` (wait.end)	C+15-α: skipped (engine-local SID)
`deterministic` (event-level field)	skipped (metadata)
Any payload field listed under a non-deterministic kind	skipped where flagged
All other payload fields	strict equality
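The payload half of the table can be sketched as a filter followed by strict equality. The table contents below are transcribed from the rules above rather than copied from diff_events.py, and the helper names are hypothetical. The ctx_ptr sentinel rewrite is a separate step and is not modeled here.

```python
# Payload fields skipped for every kind (assumed grouping, from the table).
SKIP_PAYLOAD_FIELDS_ALL = frozenset({"raw_handle_id", "handle_semantic_id"})

# Payload fields skipped only for specific kinds (contents are illustrative).
SKIP_PAYLOAD_FIELDS_BY_KIND = {
    "wait.begin": frozenset({"handles_semantic_ids"}),
    "thread.create": frozenset({"parent_tid"}),
    "wait.end": frozenset({"woken_by_semantic_id", "wait_duration_cycles"}),
}


def comparable_payload(kind: str, payload: dict) -> dict:
    """Drop every field the rules mark as skipped for this kind."""
    skipped = SKIP_PAYLOAD_FIELDS_ALL | SKIP_PAYLOAD_FIELDS_BY_KIND.get(kind, frozenset())
    return {k: v for k, v in payload.items() if k not in skipped}


def payloads_match(kind: str, canary_payload: dict, ours_payload: dict) -> bool:
    """Strict equality on whatever survives the skip filter."""
    return comparable_payload(kind, canary_payload) == comparable_payload(kind, ours_payload)
```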

Phase C+15-α note on `handle_semantic_id`

The SID computation includes creating_tid as input, but guest TIDs differ between engines (canary's tid=6 maps to ours's tid=1 on the main chain). Both engines compute SIDs using their own local tids, so the same logical handle gets two different SIDs across engines. The diff tool skip-compares SID fields and relies on tid_event_idx + object_type for alignment.

A future schema v2 could canonicalize SIDs via the diff tool's tid map and restore strict comparison. For v1.1 the simpler skip-policy suffices.

Shared-global SIDs (v1.2 — added in Phase C+18)

A subset of guest kernel dispatcher objects (KEVENT, KSEMAPHORE, KTIMER, KMUTANT) are process-global: they live in statically-initialized or pre-allocated guest memory and are touched by MULTIPLE guest threads during boot. Examples include the XAudio voice-volume change-mask semaphore at 0x828a3230 in Sylpheed.

Canary's XObject::GetNativeObject (src/xenia/kernel/xobject.cc:397-483) and ours's ensure_dispatcher_object (crates/xenia-kernel/src/exports.rs:4363) lazy-wrap these dispatchers on first guest-thread touch: the first KeWait* invocation that passes the raw kernel-object pointer synthesizes the XObject wrapper, stamps the X_DISPATCH_HEADER with the kXObjSignature marker ('X','E','N','\0' = 0x58454E00), stashes the handle, and emits handle.create. Subsequent touches find the marker and short-circuit without emit (per-pointer idempotent).

The first-toucher race

Which guest thread wins the "first toucher" race is timing-dependent:

Canary and ours have different host schedulers, JIT throughput, and guest-thread bootstrap ordering.
Even within the same engine across runs the first-toucher can differ — but each engine produces a deterministic per-run total ordering, so cold-vs-cold reproducibility holds.

The per-thread SID recipe semantic_id(create_site_pc, creating_tid, tid_event_idx_at_creation, object_type) (v1) depends on BOTH creating_tid and tid_event_idx_at_creation, so:

Same dispatcher → DIFFERENT SIDs in each engine (race-dependent).
handle.create for the same object lands on different per-tid streams in canary vs ours.

The C+17 fix made ours emit handle.create for these synthesized shadows, but the C+17 D-NEW-3 regression on tid=15→10 was exactly the first-toucher race: ours's tid=10 was the first toucher locally; canary's tid=15 was NOT the first toucher in its run — some other canary tid had already adopted 0x828a3230. ours's tid=10 emitted an "extra" handle.create that canary's tid=15 lacked, and the diff tool flagged a kind mismatch at idx=2.

The C+18 fix: deterministic SID recipe

Process-global dispatchers use a second SID recipe that is scheduling-invariant. Both engines now use:

SHARED_GLOBAL_SID_MARKER = 0xC01AB005  (fixed sentinel, both engines)

input_bytes =
    le_u32(SHARED_GLOBAL_SID_MARKER)   // 4 bytes — "create_site_pc" slot
  ‖ le_u32(0)                          // 4 bytes — "creating_tid" slot
  ‖ le_u64(pointer)                    // 8 bytes — "tid_event_idx" slot
  ‖ le_u32(object_type)                // 4 bytes

hash = FNV-1a-64(input_bytes)
shared_global_sid = format("{:016x}", hash)

The marker 0xC01AB005 is outside any plausible guest-PC range (PPC text 0x82000000-0x82FFFFFF; XEX header 0x3001xxxx; heap 0x4xxxxxxx), so a shared-global hash input can never equal the input of a regular per-thread SID (which uses a real guest PC as create_site_pc). Short of a 64-bit hash collision, the two SID families stay disjoint.

Both engines compute the SAME SID for the same dispatcher pointer regardless of:

which guest thread is the first toucher,
the tid_event_idx_at_creation,
the per-engine scheduling order.
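The shared-global recipe reuses the v1 byte layout with fixed values in the first two slots. A minimal sketch follows; `shared_global_sid` is a hypothetical name, and the hash helper is repeated so the block stands alone.

```python
import struct

SHARED_GLOBAL_SID_MARKER = 0xC01AB005
FNV_OFFSET_BASIS = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    h = FNV_OFFSET_BASIS
    for b in data:
        h = ((h ^ b) * FNV_PRIME) & MASK64
    return h


def shared_global_sid(pointer: int, object_type: int) -> str:
    """Marker in the PC slot, 0 in the tid slot, dispatcher pointer in the idx slot."""
    data = struct.pack("<IIQI", SHARED_GLOBAL_SID_MARKER, 0, pointer, object_type)
    return format(fnv1a_64(data), "016x")
```

Because no thread id or event index enters the hash, `shared_global_sid(0x828A3230, 0x03)` gives the same string no matter which thread touches the semaphore first.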

Which call sites use which recipe

Call site	SID recipe
`KernelState::alloc_handle_for` (ours)	per-thread
`ObjectTable::AddHandle` direct (canary)	per-thread
`ensure_dispatcher_object` (ours)	shared-global
`XObject::GetNativeObject` synthesized (canary)	shared-global

Regular per-thread handle.create events (file open, thread create, named-event create, etc.) keep the v1 per-thread recipe. The shared-global recipe is restricted to lazy-wrap synthesis.

Diff tool: cross-tid floating `handle.create` matching

The diff tool pre-pass collects all shared-global SIDs in either engine's stream. A handle.create event is detected as shared-global by recomputing the deterministic SID from its (raw_handle_id, object_type) payload and comparing against the event's handle_semantic_id. Regular per-thread SIDs cannot match this check by construction.

When per-tid alignment finds a kind mismatch and one side has a shared-global handle.create whose SID is in the floating set:

The diff tool advances ONLY that side's stream pointer past the floating event.
Re-compare at the same canonical position.

The diff report's summary table shows a floating_skipped (c/o) column for visibility — counts of absorbed events per side.
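The absorb step can be pictured as a two-pointer walk over one aligned tid pair. The sketch below compares kinds only (payload comparison is omitted for brevity), and the function name and return shape are hypothetical rather than taken from diff_events.py.

```python
def walk_with_floating_create(canary, ours, floating_sids):
    """Align two per-tid streams, absorbing floating handle.create events.

    Returns matched count, per-side skip counts, and the (ic, io) position
    of the first unabsorbed kind mismatch, or None if none was found.
    """
    def is_floating(ev):
        return (ev["kind"] == "handle.create"
                and ev["payload"].get("handle_semantic_id") in floating_sids)

    ic = io = matched = 0
    skipped = {"canary": 0, "ours": 0}
    while ic < len(canary) and io < len(ours):
        c, o = canary[ic], ours[io]
        if c["kind"] == o["kind"]:
            matched += 1
            ic += 1
            io += 1
        elif is_floating(c):
            ic += 1  # advance ONLY the canary side, then re-compare
            skipped["canary"] += 1
        elif is_floating(o):
            io += 1  # advance ONLY the ours side, then re-compare
            skipped["ours"] += 1
        else:
            return {"matched": matched, "skipped": skipped, "divergence": (ic, io)}
    return {"matched": matched, "skipped": skipped, "divergence": None}
```

With an empty `floating_sids` set the walk degenerates to strict positional alignment, which is the legacy behavior described under backward compatibility.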

Index relaxation

The C+18 fix relaxes the legacy diff-tool rule that requires canary.tid_event_idx == ours.tid_event_idx for matching events. With floating absorption, the per-tid indices can drift by 1 between the two sides — but the kind and payload comparisons remain strict. The raw indices are still preserved on the events themselves (useful for debugging and report context).

Backward compatibility

Wire format unchanged. schema_version is still 1.
Pre-C+18 event logs (no shared-global SIDs in the stream) trigger the legacy code path automatically — the floating set is empty.
The marker constant 0xC01AB005 MUST be exactly this value in both engines and the diff tool. Tests in both engines plus tools/diff-events/test_diff_events.py lock it in.

Wait-begin floating absorb (v1.3 — added in Phase C+21)

Motivation

Canary's RtlEnterCriticalSection (and its symmetric counterparts — KeWaitForSingleObject invoked on a process-global dispatcher, mutex/semaphore contended-acquire paths) emits wait.begin only on the contended slow path. The fast path (uncontended atomic-CAS, or recursive bump) emits NO wait.begin and only the kernel.call → kernel.return pair. Which path is taken depends on whether ANOTHER guest thread is currently holding the dispatcher when the wait is attempted — i.e. it is host-scheduler-driven, varying across cold runs of the same engine.

Reading-error class #32 (documented in C+20's investigation.md) captures this: cross-checking 3 fresh canary cold runs at canary tid=6 idx 104,606 showed:

jitter-1: wait.begin sid=75ae880ec432eb36 (contended)
jitter-2: kernel.return (fast — matches ours)
jitter-3: offset-shifted wait.begin at a different idx with a different SID

The matched-prefix metric is unreliable inside such regions if the diff tool treats wait.begin events as strictly positional.

The fix

A wait.begin event is floating if at least one of its payload.handles_semantic_ids references a shared-global SID (see §"Shared-global SIDs"). During the per-tid two-pointer walk:

If one side has a floating wait.begin and the other has a different kind at the same canonical position, advance ONLY the wait.begin side's pointer and re-compare.

wait_type=all waits are floating as long as ANY single handle in the set is shared-global — the entire wait's blocking behavior is timing-dependent if even one of its handles is on a process-global dispatcher.
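The floating test for wait.begin reduces to a membership check over the handle list. A minimal sketch, with a hypothetical function name:

```python
def is_floating_wait_begin(event: dict, shared_global_sids: set) -> bool:
    """True if the event is a wait.begin touching at least one shared-global SID.

    wait_type is deliberately ignored: one shared-global handle is enough
    for both "any" and "all" waits.
    """
    if event.get("kind") != "wait.begin":
        return False
    sids = event.get("payload", {}).get("handles_semantic_ids", [])
    return any(sid in shared_global_sids for sid in sids)
```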

Shared-global SID detection (extended in C+21)

The diff tool's collect_shared_global_sids pre-pass now unions TWO sources:

Recipe-matching handle.create events (Phase C+18 — direct). This catches ours's ensure_dispatcher_object output where raw_handle_id == ptr (the recipe-input pointer).
Cross-tid usage heuristic (Phase C+21 — indirect). Any SID referenced via handle.create OR wait.begin on two or more distinct guest tids in EITHER engine is treated as shared-global.

The cross-tid heuristic exists because canary's EmitHandleCreateSharedGlobal (event_log.cc:435) emits the SID computed from the dispatcher VA but stashes object->handle() (a handle-table slot in the 0xF8xxxxxx region) as raw_handle_id. Those two values DIFFER, so canary's shared-global handle.create events are NOT recipe-recognizable from their payload alone. Multi-tid SID usage is a robust observational signal: per-thread SIDs by construction stay on the single creating tid (their hash inputs include creating_tid), so any cross-tid SID usage indicates a process-global dispatcher.
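The cross-tid heuristic can be sketched as a per-engine grouping of SIDs by the tids that reference them. The names below are hypothetical; the recipe-matched set from C+18 is taken as an input rather than recomputed.

```python
from collections import defaultdict


def cross_tid_sids(events) -> set:
    """SIDs referenced by handle.create or wait.begin on two or more tids."""
    tids_by_sid = defaultdict(set)
    for ev in events:
        payload = ev.get("payload", {})
        if ev["kind"] == "handle.create":
            sids = [payload.get("handle_semantic_id")]
        elif ev["kind"] == "wait.begin":
            sids = payload.get("handles_semantic_ids", [])
        else:
            continue
        for sid in sids:
            if sid is not None:
                tids_by_sid[sid].add(ev["tid"])
    return {sid for sid, tids in tids_by_sid.items() if len(tids) >= 2}


def collect_shared_global_sids_sketch(canary, ours, recipe_matched=frozenset()) -> set:
    """Union of the direct (recipe) source and the indirect (cross-tid) source."""
    return set(recipe_matched) | cross_tid_sids(canary) | cross_tid_sids(ours)
```

The heuristic is evaluated per engine because guest tids are engine-local; a SID seen on two tids in either stream is enough.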

Risk of over-absorption (and why it's bounded)

The cross-tid heuristic could in principle mis-classify a per-thread SID that one thread creates and another thread waits on — a legitimate cross-thread synchronization pattern. The floating-absorb, however, only fires on a kind mismatch at the canonical position. Per-thread waits that match strictly on both sides advance normally without any absorb. The heuristic only loosens alignment when one side is missing a handle.create or wait.begin — exactly the scheduling-jitter window the C+21 fix targets.

Diff-tool report changes

The summary table's floating_skipped (c/o) column is split into two columns:

floating_create (c/o) — C+18 handle.create absorptions.
floating_wait (c/o) — C+21 wait.begin absorptions.

Both columns are per-side and observation-only; counts may legitimately be non-zero in a clean run.

Backward compatibility

Wire format unchanged. schema_version is still 1.
Pre-C+21 event logs (no wait.begin events that reference shared-global SIDs) trigger no new behavior — the wait absorption branches are inert.
The C+18 floating-create logic is unchanged; the C+21 fix is strictly additive.
Engine source is UNCHANGED in C+21 — the fix is in the diff tool only.

contention.observed (v1.4 — added in Phase D Stage 1, 2026-05-18)

Motivation

The 104,607 cap is canary's tid=6 contending on a CS while ours's tid=1 fast-paths through the same call (Phase C+22). Schedules diverge for host-OS reasons, so neither engine is "wrong" — but matched-prefix stalls. Phase D's H' approach makes ours's rtl_enter_critical_section replay canary's contention by consulting a per-call manifest built from canary's contention trace.

Stage 1 (this section) introduces the canary-side emitter for that manifest: a new event kind contention.observed that fires from RtlEnterCriticalSection_entry (xboxkrnl_rtl.cc:596-633) just before the call falls through to xeKeWaitForSingleObject after spin-loop exhaustion. Cvar-gated (kernel_emit_contention, default false) so default canary behavior is byte-identical.

Event shape

{
  "schema_version": 1,
  "engine": "canary",
  "kind": "contention.observed",
  "tid": <guest tid of caller>,
  "tid_event_idx": <per-tid ordinal — consumes one slot>,
  "guest_cycle": 0,
  "host_ns": <emit timestamp>,
  "deterministic": true,
  "payload": {
    "cs_ptr": "0xHHHHHHHH",        // guest VA of the RTL_CRITICAL_SECTION
    "site_sid": "HHHHHHHHHHHHHHHH", // shared-global SID (see below)
    "contended": true              // always true at v1.4 (uncontended is implicit)
  }
}

site_sid is computed via the C+18 shared-global SID recipe:

site_sid = FNV-1a-64 over
  ( kSharedGlobalSidMarker [u32 LE]    // 0xC01AB005
  , 0 [u32 LE]                          // creating_tid (unused)
  , cs_ptr as u64 [u64 LE]              // pointer-as-idx
  , kObjCriticalSection [u32 LE]        // 0x0C, new in v1.4
  )

Both engines compute the same SID for the same CS pointer. kObjCriticalSection = 0x0C is a new object-type code introduced for this kind (it extends the v1 table, which ends at 0x0B); it does NOT correspond to a real XObject (a CS lives as a guest-memory struct, not a handle-tabled object).

When emitted (canary)

In RtlEnterCriticalSection_entry:

Recursive-lock fast path (already own lock) → NO emit (not contention).
Spin-loop succeeds (atomic_cas flips lock_count from -1 → 0) → NO emit (fast acquire).
Spin-loop exhausted AND atomic_inc(&cs->lock_count) != 0 → EMIT with contended=true, then xeKeWaitForSingleObject.
Spin-loop exhausted AND atomic_inc(...) == 0 (CS became free between spin and inc) → NO emit (we won the race after spin).

The emit point sits between atomic_inc's non-zero result and the xeKeWaitForSingleObject call, so the new event always precedes the existing wait.begin event in the per-tid ordinal.
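The four cases above collapse into a single predicate. The sketch below is a model of the decision table only, not a port of the canary handler; the function and parameter names are hypothetical.

```python
def should_emit_contention(already_owner: bool, spin_acquired: bool,
                           inc_result: int) -> bool:
    """Model of the emit decision in RtlEnterCriticalSection_entry.

    already_owner: recursive-lock fast path was taken.
    spin_acquired: the spin loop's CAS flipped lock_count from -1 to 0.
    inc_result:    value returned by atomic_inc after spin exhaustion
                   (only meaningful when the two flags are both False).
    """
    if already_owner or spin_acquired:
        return False  # fast paths: no contention event
    return inc_result != 0  # 0 means the CS became free after the spin
```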

When emitted (ours, Stage 3 — pending)

Stage 3 will add a symmetric emit in rtl_enter_critical_section (xenia-rs/crates/xenia-kernel/src/exports.rs:2886-2946) at the forced-park branch driven by the manifest. This keeps per-tid ordinals aligned across engines after replay.

Diff-tool treatment (Stage 4 — pending)

contention.observed will be added to ENGINE_LOCAL_KINDS in diff_events.py: the per-tid pointer advances past these events on either side without comparison. This keeps matched-prefix counts unchanged when ONE side emits the event (Stage 1's canary-only world) or when BOTH emit at the same ordinal (Stage 3's parity world).

Cvar default + byte-identity

kernel_emit_contention=false by default. With cvar=false, the helper phase_a::EmitContentionObserved short-circuits at the cvar check before any IsEnabled() lookup. The pre-Stage-1 canary code path is preserved byte-for-byte; cvar-OFF cold runs produce zero contention.observed events (validated on the Stage 1 cold run: 0 occurrences in a 4.4 GB / 18.6 M event trace).

Nested-CS-cleanup absorber (v1.5 — added in Phase D D-extension, 2026-05-18)

Status

Band-aid. Explicit annotation: this absorber CROSSES the reading-error #23 boundary in spirit. It folds real guest control-flow divergence at the diff-tool layer. It exists because the underlying root cause — producer-throughput divergence under the cooperative-vs-preemptive scheduling mismatch (see Phase D forensics) — is explicitly out of scope for the H' plan: fixing it in ours's engine would require preempting the cooperative scheduler, which invalidates 23 phases of digest stability. The absorber is the practical compromise.

Trigger shape

The absorber fires ONLY at a kind mismatch of:

canary[ic] = import.call with payload.name == "RtlEnterCriticalSection"
ours[io] = import.call with payload.name == "RtlLeaveCriticalSection"

For any other kind mismatch, the absorber is silent. This narrowness is intentional: real engine divergences appear in other shapes and must still surface.

Behavior

When the trigger pattern matches, canary's stream is scanned for one or more balanced [Enter-block, Leave-block] pairs immediately following the trigger position:

An Enter-block is 3 consecutive events: import.call RtlEnterCriticalSection → kernel.call RtlEnterCriticalSection → kernel.return RtlEnterCriticalSection.
A Leave-block is 3 consecutive events with RtlLeaveCriticalSection.

The absorber consumes pairs greedily up to a cap of _NESTED_CS_PAIR_CAP = 32 pairs (empirically, Sylpheed's worst-case is ~10-15 pairs at the 104,607 cap). After consuming each pair, it checks whether canary's next event has the SAME kind AND same payload.name as ours[io]. The first convergence wins; canary's pointer is advanced past the absorbed pairs.

If no convergence is found within the cap, the absorber returns None and the divergence falls through to normal reporting.
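The scan can be sketched as follows. This is an illustration under two assumptions: that the first Enter-block starts at the trigger event itself, and that convergence is checked only after each complete pair. The function names are hypothetical; only the cap value comes from the text above.

```python
NESTED_CS_PAIR_CAP = 32
ENTER = "RtlEnterCriticalSection"
LEAVE = "RtlLeaveCriticalSection"
BLOCK_KINDS = ("import.call", "kernel.call", "kernel.return")


def _is_block(events, start, name):
    """True if events[start:start+3] is import.call, kernel.call, kernel.return for name."""
    if start + len(BLOCK_KINDS) > len(events):
        return False
    return all(events[start + i]["kind"] == kind
               and events[start + i]["payload"].get("name") == name
               for i, kind in enumerate(BLOCK_KINDS))


def absorb_nested_cs(canary, ic, ours, io):
    """Return canary's new index past the absorbed pairs, or None."""
    c, o = canary[ic], ours[io]
    if not (c["kind"] == "import.call" and c["payload"].get("name") == ENTER
            and o["kind"] == "import.call" and o["payload"].get("name") == LEAVE):
        return None  # any other mismatch shape: stay silent
    pos = ic
    for _ in range(NESTED_CS_PAIR_CAP):
        if not (_is_block(canary, pos, ENTER) and _is_block(canary, pos + 3, LEAVE)):
            return None  # not a balanced pair: report the real divergence
        pos += 6
        if pos < len(canary):
            nxt = canary[pos]
            if (nxt["kind"] == o["kind"]
                    and nxt["payload"].get("name") == o["payload"].get("name")):
                return pos  # first convergence wins
    return None  # cap reached without convergence
```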

Why this is safe (within #23's spirit)

The absorption only happens when canary's stream re-aligns with ours's stream past the nested block. If it doesn't re-align, the real divergence is reported.
The nested-block shape matches a specific PPC pattern: the consumer thread in canary acquires a CS, calls a helper that iterates a tree/registry, takes the nested-CS-enter path for each item, and releases the outer CS. Ours's tree is shorter so it skips this. The net effect on guest state is bounded: ours has fewer items processed in this iteration, but the EVENT stream past the absorption resumes the same logical operation.
The Phase B image_loaded_sha256 is the foundational invariant. It's unaffected by this absorber (no engine source change).

Why this is NOT safe in the general sense

Diverging downstream state IS lost: ours's tree has fewer entries than canary's after the absorbed block. Subsequent ours operations that touch the tree will behave differently. Other absorbers / fixes will be needed if those state-differences manifest later.
A future engine bug that produces a spuriously nested Enter+Leave pair could be falsely absorbed. Mitigation: the absorber requires canary's post-block stream to re-align with ours's; spurious nested pairs without re-alignment fall through to normal divergence reporting.

Empirical result (Sylpheed 104,607 cap)

Pre-absorber (post-Stage-3+4): main matched-prefix = 104,607 (cap). Post-absorber: main matched-prefix = 105,046 (+439 events).

The next divergence is at idx 105,046 on VdInitializeEngines.return_value (canary=1, ours=0) — an unrelated engine bug in the video subsystem, NOT a recurrence of the cap pattern. Sister chains preserved (11/32/4/41/16).

Tests

Three unit tests in test_diff_events.py:

test_nested_cs_cleanup_block_absorbed_when_convergent — folds one nested pair
test_nested_cs_cleanup_NOT_absorbed_when_followup_diverges — confirms re-alignment requirement
test_nested_cs_cleanup_NOT_absorbed_when_canary_has_no_followup — negative case

sema.release (v1.6 — added in AUDIT-069 Session 6, 2026-05-21)

Motivation

AUDIT-069 Sessions 1-5 established that ours under-produces semaphore releases by ~80% on the work-semaphore vs canary (99 vs 414 in S5, refined in S6 to 83 vs 414 apples-to-apples on the work semaphore alone). The measurement infrastructure was a one-off cvar (audit_70_semaphore_release_watch, hand-built per-handle log lines) plus an ours-side --lr-trace capture at the wrapper-entry PC. Future AUDIT-070+ sessions and any general regression triage need this metric to be diff-visible without bespoke cvars per investigation.

sema.release lifts the AUDIT-070 cvar's signal into the Phase A schema as a symmetric event kind in both engines.

Event shape

{
  "schema_version": 1,
  "engine": "canary",
  "kind": "sema.release",
  "tid": <guest tid of caller>,
  "tid_event_idx": <per-tid ordinal — consumes one slot>,
  "guest_cycle": <PPC timebase>,
  "host_ns": <emit timestamp>,
  "deterministic": true,
  "payload": {
    "handle_semantic_id": "HHHHHHHHHHHHHHHH",  // shared-global SID for the work-sem
    "raw_handle_id": "0xHHHHHHHH",             // engine-local
    "release_count": 1,                         // games typically release 1
    "previous_count": 0,                        // semaphore count BEFORE release
    "caller_pc": "0xHHHHHHHH"                   // guest LR at release time
  }
}

SID recipe

The work-semaphore in Sylpheed (canary handle 0xF800003C, ours handle 0x1044) is a process-global dispatcher in the C+18 sense: it lives in pre-allocated guest memory and is touched by multiple guest threads (main, worker, cache-thread, other producers). Its handle_semantic_id SHOULD use the shared-global recipe (ComputeSharedGlobalSemanticId(dispatcher_ptr, kObjSemaphore=0x03)) so canary and ours produce the same SID for the same guest dispatcher.

Per-thread semaphores (rare in Sylpheed) MAY use the v1 per-thread recipe; the diff tool does NOT compare SIDs for sema.release (the kind is engine-local positionally — see below).

Why engine-local

Per AUDIT-069 H3 and S6's first-N=20 measurement, the cadence and ordinal interleaving of releases between the worker, main, and cache-thread are timing-dependent: the first 20 releases match perfectly across engines, but worker tid diverges at canary ord=83 when the cache-thread's first release fires (which ours never emits because ours's cache-thread wedges at sub_821CB030+0x1AC). Strict positional alignment would always trip on this known divergence.

sema.release is therefore in ENGINE_LOCAL_KINDS in the diff tool (alongside contention.observed): both engines emit, but the diff tool advances past these events on either side without alignment. The count is surfaced in the report's "Counted engine-local kinds" summary table (per-tid + total per engine) so cadence regressions are diff-visible at-a-glance.
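The skip-and-count treatment can be sketched as a pre-filter over one engine's stream. The real tool advances its pointers during the walk instead; the filter form is used here only for brevity. The set names come from this document, their exact contents are assumed (in particular, whether contention.observed is also counted is not stated), and the function name is hypothetical.

```python
from collections import Counter

ENGINE_LOCAL_KINDS = frozenset({"contention.observed", "sema.release"})
# Assumption: only sema.release is counted in the report.
COUNTED_ENGINE_LOCAL_KINDS = frozenset({"sema.release"})


def strip_engine_local(events):
    """Drop engine-local events and count the counted kinds per (tid, kind)."""
    kept = []
    counts = Counter()
    for ev in events:
        if ev["kind"] in ENGINE_LOCAL_KINDS:
            if ev["kind"] in COUNTED_ENGINE_LOCAL_KINDS:
                counts[(ev["tid"], ev["kind"])] += 1
            continue  # never participates in alignment
        kept.append(ev)
    return kept, counts
```

Summing the counter per engine gives the totals for the "Counted engine-local kinds" table, so a cadence gap such as 83 vs 414 shows up without affecting matched-prefix.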

Emit points (planned, NOT yet wired)

Canary: extend audit_70_semaphore_release_watch to call phase_a::EmitSemaRelease(handle, count, prev_count) from NtReleaseSemaphore_entry + xeKeReleaseSemaphore. Cvar gating remains the existing audit_70_semaphore_release_watch (or a new phase_a_event_log_sema_releases=false for finer control).
Ours: emit sema.release from nt_release_semaphore in crates/xenia-kernel/src/exports.rs and from KSemaphore::release (kernel-mode equivalent). Default-off via a runtime flag; default cold runs must remain digest-stable.

Both engines MUST emit at handler entry (not wrapper-internal) so the event count corresponds 1:1 to guest NtReleaseSemaphore invocations, matching the canary cvar's existing semantics.

Status

Diff tool: support landed (this session, v1.6). sema.release in ENGINE_LOCAL_KINDS + COUNTED_ENGINE_LOCAL_KINDS; counts surfaced in report summary; 3 new tests in test_diff_events.py.
Canary emit: NOT YET WIRED. Planned for AUDIT-070+ when the root cause investigation requires it. Existing cvar audit_70_semaphore_release_watch continues to emit non-schema log lines (used by S5/S6 captures).
Ours emit: NOT YET WIRED. See above.

Backward compatibility

Wire format unchanged. schema_version is still 1.
Pre-v1.6 event logs (no sema.release events) trigger no new behavior — the engine-local skip branches are inert; the "Counted engine-local kinds" report section is suppressed when no counted-kind events exist.
Diff tool changes are purely additive: existing engine binaries diff identically pre- and post-v1.6.

Host-heap payload-field canonicalization (v1.7 — added in Phase C+22, 2026-05-26)

Motivation

C+2 (ALLOCATOR_RETURN_FNS) canonicalizes kernel.return.return_value for a known set of host-allocator-returning exports (MmAllocatePhysicalMemoryEx, RtlAllocateHeap, …). That covers the case where the allocated VA appears as the function's return value. But the same allocator-drift class (AUDIT-043 ε: canary's BC physical heap 0xBCxxxxxx vs ours's unified user heap 0x4xxxxxxx) ALSO surfaces inside typed event payloads of non-allocator exports — most notably the thread.create.ctx_ptr field, which holds the host-allocated TLS/context block that ExCreateThread passes to the new guest thread's r3.

Empirical surface (C+22 cold-vs-cold idx 105,128 on the Sylpheed audio-stack worker ExCreateThread(entry_pc=0x824cd458)):

field	canary	ours
`ctx_ptr`	`0xbe56bb3c` (BC physical heap)	`0x42453b3c` (unified user heap)
`entry_pc`	`0x824cd458`	`0x824cd458` (bit-identical — game code)
`priority`	`0`	`0`
`affinity`	`4`	`4`
`stack_size`	`32768`	`32768`
`suspended`	`false`	`false`

The C+2 ALLOCATOR_RETURN_FNS mechanism doesn't help here because ExCreateThread's return value is the new thread's handle (canary's 0xF8xxxxxx vs ours's 0x4, 0x8, …), already covered by handle_semantic_id skip-policy. The host-heap-allocated context block is a side-channel field inside the thread.create event payload.

The fix

HOST_HEAP_PAYLOAD_FIELDS_BY_KIND maps event kind → tuple of payload field names. Each listed field's value (expected 0x-prefixed hex string) is rewritten to a per-(tid, kind, field) ordinal sentinel <HOSTHEAP_<KIND>_<FIELD>_<ORDINAL>> BEFORE payload comparison. The mechanism mirrors canonicalize_allocator_returns exactly, restricted to typed payload fields.

Initial set (v1.7):

HOST_HEAP_PAYLOAD_FIELDS_BY_KIND = {
    "thread.create": ("ctx_ptr",),
}

Strict-field preservation

For each canonicalized event kind, the strict fields (game-visible attributes that MUST match across engines) are untouched. For thread.create these are:

entry_pc: guest VA of the new thread's entry function, bit-identical in both engines because both engines load the same XEX and the entry comes from guest code.
priority, affinity, stack_size, suspended: game-visible thread attributes the guest passes to ExCreateThread.

Skip-policy fields (handle_semantic_id, parent_tid) continue to be skipped via SKIP_PAYLOAD_FIELDS_BY_KIND (unchanged from C+15-α — see "Diff-tool field-comparison rules" above).
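The split between strict and skipped fields can be illustrated with a small comparison helper. This is a sketch under stated assumptions: `payload_mismatches` is a hypothetical name, and the `thread.create` entry below lists only the two skip-policy fields named in this section, so the real table may contain more.

```python
SKIP_PAYLOAD_FIELDS_BY_KIND = {
    # Only the two fields this section names; the real table may hold more.
    "thread.create": ("handle_semantic_id", "parent_tid"),
}

_MISSING = object()  # distinguishes "field absent" from "field is None"


def payload_mismatches(kind, canary_payload, ours_payload):
    """Return the sorted names of strict fields whose values differ.

    Hypothetical helper: skip-policy fields are ignored, and every other
    field must be equal (a field present on only one side counts as a mismatch).
    """
    skipped = set(SKIP_PAYLOAD_FIELDS_BY_KIND.get(kind, ()))
    names = (set(canary_payload) | set(ours_payload)) - skipped
    return sorted(
        name
        for name in names
        if canary_payload.get(name, _MISSING) != ours_payload.get(name, _MISSING)
    )
```

With ctx_ptr already rewritten to a sentinel, two thread.create payloads that differ only in parent_tid and handle_semantic_id produce an empty list, whereas a differing stack_size is reported.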

Why `parent_tid` does NOT need new canonicalization

Per the C+15-α skip-policy table, parent_tid is already in SKIP_PAYLOAD_FIELDS_BY_KIND["thread.create"]. The diff tool pairs guest TIDs at the chain level (--tid-map or auto_tid_map), and the per-event parent_tid is engine-local (canary tid=6 vs ours tid=1 for the same logical "main thread" chain). Skipping is sufficient — no ordinal sentinel needed.

Could a future schema v2 canonicalize parent_tid via the tid map? Yes, but it would surface mismatches as a map gap rather than as a clearer per-tid alignment failure that's already visible at chain boundaries. The v1.x skip-policy is the simpler choice; tests pin the existing behavior so it doesn't regress.

Ordinal-count contract

As with ALLOCATOR_RETURN_FNS: if one engine emits MORE thread.create events on a given tid than the other, ordinals drift and the next typed event surfaces a divergence against whatever the other side has at that position. Ordinal-count mismatch IS a behavioral divergence — the canonicalization preserves divergence detection, only collapsing host-allocator-VA noise.
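A toy model of one tid's event chain shows why an ordinal-count mismatch still surfaces. The helper names `with_ordinals` and `first_divergence` are hypothetical, events are reduced to their kind strings, and the sentinel spelling is an assumption.

```python
def with_ordinals(kinds):
    """Attach a ctx_ptr sentinel to each thread.create on one tid, in emission order."""
    ordinal = 0
    out = []
    for kind in kinds:
        if kind == "thread.create":
            out.append((kind, f"<HOSTHEAP_THREAD_CREATE_CTX_PTR_{ordinal}>"))
            ordinal += 1
        else:
            out.append((kind, None))
    return out


def first_divergence(canary_kinds, ours_kinds):
    """Return the first per-tid index where the two chains differ, or None."""
    canary, ours = with_ordinals(canary_kinds), with_ordinals(ours_kinds)
    for idx, (a, b) in enumerate(zip(canary, ours)):
        if a != b:
            return idx
    # One chain is a strict prefix of the other: diverge at the first extra event.
    return None if len(canary) == len(ours) else min(len(canary), len(ours))
```

If canary emits `thread.create, thread.create, wait.begin` and ours emits `thread.create, wait.begin`, the chains agree at index 0 and diverge at index 1, where canary's second thread.create faces ours's wait.begin.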

Defensive value handling

If ctx_ptr is not a string (None, an int, or missing, as in pre-C+22 event logs whose emitter omits the field), the canonicalizer leaves it untouched and does NOT consume an ordinal, so the next string-typed value still gets the next unused ordinal (0 if none has been assigned yet). This keeps pre-v1.7 logs diffable without forcing an emitter retrofit.

Backward compatibility

Wire format unchanged. schema_version is still 1.
Pre-C+22 event logs whose thread.create.ctx_ptr happens to bit-match (e.g. static-allocator addresses like 0x828F3D08 that BOTH engines use for the pre-XEX kernel-state ctxs) still match strictly via the ordinal sentinel — they get the same ordinal in both engines.
The --no-canonicalize-host-heap-fields CLI flag disables the pass (reverts to raw-VA comparison), mirroring the existing --no-canonicalize-allocators. It is used by gate tests and investigation reruns.
Engine source is UNCHANGED in C+22 — the fix is in the diff tool only.
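One way the two opt-out flags could be wired, assuming the diff tool parses its command line with `argparse`. The flag names come from this document, while `build_parser`, the `dest` names, and the help strings are illustrative assumptions.

```python
import argparse


def build_parser():
    """Build a parser exposing both canonicalization opt-outs (sketch only)."""
    parser = argparse.ArgumentParser(description="event-log diff (sketch)")
    parser.add_argument(
        "--no-canonicalize-allocators",
        dest="canonicalize_allocators",
        action="store_false",  # the pass is on by default; the flag turns it off
        help="compare raw allocator return values",
    )
    parser.add_argument(
        "--no-canonicalize-host-heap-fields",
        dest="canonicalize_host_heap_fields",
        action="store_false",
        help="compare raw host-heap VAs in typed payload fields",
    )
    return parser
```

Both passes default to enabled, so a plain invocation canonicalizes and each flag independently reverts its own pass to raw-VA comparison.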

Extension shape

The map shape kind -> (field, …) is intentionally minimal: each entry is one event kind plus the fields on it that hold host-heap VAs. Future entries could include e.g. thread.create.tls_ptr (if such a field is added to the schema) or a hypothetical vfs.mmap.host_ptr. Strict-field policy remains: any field NOT listed here is compared bit-identically.

Forward compatibility

Phase A's original schema-v1 declared 13 sections (16 distinct kind strings); Phase A wired 4 of them. Phase C+15-α wired an additional 5 (handle.create, handle.destroy, thread.create, thread.exit, wait.begin). wait.end, thread.suspend/resume, mem.write, vfs.open/read/close remain declared but unwired; adding them is additive surface area at schema v1.1+.

A future schema v2 may break wire format (e.g. canonical SIDs, structured args). Both engines pin schema_version = 1 in this phase; the diff tool refuses to mix v1 and v2 inputs.
