# diff_events.py: Phase A event-log diff tool

A stdlib-only Python tool that diffs two schema-v1 JSONL event logs (one per engine) and reports the **first behavioral divergence per guest thread**.

Built for the Phase A diff harness; see `audit-runs/phase-a-diff-harness/README.md` and `schema-v1.md`.

## What it does

1. Reads two JSONL files and validates that each begins with a `schema_version=1` header event.
2. Builds per-thread streams keyed by `tid_event_idx` (the schema's per-tid monotonic counter).
3. Maps canary-tid ↔ ours-tid (auto-pairs by the first `kernel.call` name in each stream, or manually via `--tid-map`).
4. Walks each mapped pair in parallel, comparing events with rules from the schema (`raw_handle_id`, `host_ns`, `wait_duration_cycles`, etc. are skipped).
5. On the first divergence, prints five events of pre-context, the divergent event, and the next event from each side, then stops that thread's walk.
6. Writes a markdown report.

## Usage

```bash
# Default: auto-map tids, write markdown to stdout
python3 diff_events.py --canary canary.jsonl --ours ours.jsonl

# Write report to a file
python3 diff_events.py --canary c.jsonl --ours o.jsonl --out report.md

# Manual tid map
python3 diff_events.py --canary c.jsonl --ours o.jsonl --tid-map 6=1,7=2

# Negative-test mode: exit non-zero on ANY divergence (gate-4)
python3 diff_events.py --canary c.jsonl --ours o.jsonl --validate-identical
```

## How it compares

These fields are **skipped** when comparing payloads:

- Top-level: `engine`, `host_ns`, `guest_cycle`, `deterministic`.
- `handle.create`/`handle.destroy`: `raw_handle_id`, `handle_semantic_id` (engine-local).
- `wait.begin`: `handles_semantic_ids` (engine-local SIDs).
- `wait.end`: `wait_duration_cycles` (depends on host scheduling), `woken_by_semantic_id`.

The `tid_event_idx` field is the **alignment key**. Two events at the same `tid_event_idx` on a mapped pair of tids are expected to be the same logical event. The `kind` must match; the `payload` must match field-by-field (except skipped fields).

## Phase C+18: Cross-tid floating `handle.create` (shared-global dispatchers)

Process-global kernel dispatcher objects (`KEVENT`/`KSEMAPHORE` etc. that game code creates with `KeInitializeEvent` or allocates statically, and shares across multiple guest threads) are lazy-wrapped on **first guest-thread touch** by `XObject::GetNativeObject` in canary and by `ensure_dispatcher_object` in ours. Whichever thread touches the dispatcher first synthesizes the wrapper and emits the `handle.create` event. Which thread wins is timing-dependent, so canary and ours may disagree.

The SID for these synthesized handles is computed via a **scheduling-invariant recipe** keyed on `(pointer, object_type)` only (see schema-v1.md §"Shared-global SIDs"). The same dispatcher therefore yields the same SID in both engines regardless of which thread touched it first.

The diff tool detects shared-global `handle.create` events by recomputing the deterministic SID from the event's `(raw_handle_id, object_type)` payload and matching it against the emitted `handle_semantic_id`. When per-tid alignment finds that one side has an "extra" `handle.create` event whose SID is in the global set, the tool **advances only that side's stream pointer past the floating event** and re-compares, preserving strict alignment for everything else.

The summary table shows per-pair `floating_skipped (c/o)` counts so you can see how many events were absorbed by this mechanism.

## Known limitations (v1)

- **Auto tid-map is naive**: pairs canary-tid with ours-tid by the first `kernel.call` name on each thread. Works for boot when the same initial call happens on each engine's primary thread; can mis-pair if two threads start with the same first-call name or if a thread spawns earlier on one engine. Use `--tid-map` to override.
- **No streaming**: loads both files fully into memory. Acceptable for boot-window runs; the canary log is ~370 MB for a 12 s run.
- **First-divergence only**: the per-thread walk stops at the first divergence. Subsequent divergences on the same thread are not reported (a sliding-window mode could be added later if needed).
- **Schema v1 only**: refuses to parse v2 inputs (the forward incompatibility is intentional).

## Files

- `diff_events.py`: single-file CLI, stdlib only (json, argparse, pathlib).
- `README.md`: this file.

## Test it

```bash
# Self-diff (compare a file against itself) should report 0 divergences.
python3 diff_events.py --canary x.jsonl --ours x.jsonl --validate-identical
echo "exit=$?"   # expect 0

# Negative test: corrupt one event and confirm the tool reports it.
sed '50s/"kernel.call"/"kernel.CORRUPT"/' x.jsonl > /tmp/x-corrupt.jsonl
python3 diff_events.py --canary x.jsonl --ours /tmp/x-corrupt.jsonl --validate-identical
echo "exit=$?"   # expect 1
```
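
## Sketch: the comparison rule

For readers who want a feel for the rule described under "How it compares", here is a minimal sketch of comparing `kind` plus a filtered `payload` and stopping at the first mismatch. It is not the code in `diff_events.py`: the helper names (`filtered_payload`, `events_match`, `first_divergence`) are hypothetical, and it assumes each event is a dict with `kind` and `payload` keys and that each stream is already ordered by `tid_event_idx`. It also leaves out the floating `handle.create` handling.

```python
# Illustrative sketch only. Helper names and the event layout (dicts with
# "kind" and "payload" keys) are assumptions, not taken from diff_events.py.

# Payload fields skipped per event kind, as listed in "How it compares".
PAYLOAD_SKIP = {
    "handle.create": {"raw_handle_id", "handle_semantic_id"},
    "handle.destroy": {"raw_handle_id", "handle_semantic_id"},
    "wait.begin": {"handles_semantic_ids"},
    "wait.end": {"wait_duration_cycles", "woken_by_semantic_id"},
}


def filtered_payload(event):
    """Return the event's payload without the fields skipped for its kind."""
    skip = PAYLOAD_SKIP.get(event.get("kind"), set())
    return {k: v for k, v in event.get("payload", {}).items() if k not in skip}


def events_match(canary_event, ours_event):
    """Kinds must be equal and the filtered payloads must be equal."""
    return (
        canary_event.get("kind") == ours_event.get("kind")
        and filtered_payload(canary_event) == filtered_payload(ours_event)
    )


def first_divergence(canary_stream, ours_stream):
    """Return the position of the first mismatch, or None if the streams agree.

    Both streams are lists of events for one mapped tid pair, assumed to be
    ordered by tid_event_idx. A length mismatch counts as a divergence at the
    end of the shorter stream.
    """
    for i, (c, o) in enumerate(zip(canary_stream, ours_stream)):
        if not events_match(c, o):
            return i
    if len(canary_stream) != len(ours_stream):
        return min(len(canary_stream), len(ours_stream))
    return None
```

Only `kind` and `payload` are compared here, so top-level fields such as `engine` or `host_ns` never enter the comparison and need no explicit skip list in this sketch.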