AUDIT-069 Session 2 — writer report v2
Date: 2026-05-20
xenia-rs HEAD: e6d43a23ac393004d2e5adf2f0395fd0b5e6448b (UNCHANGED from S1)
git diff HEAD | sha256sum: ed30fd526643918f67311caff0a10d1346d73fd0c0323e02477883cf5ff20357 (UNCHANGED from S1 end)
No canary instrumentation added this session.
Headline
S1's framing is FALSIFIED. ours does NOT lack a "canary-tid=10 equivalent" thread. The spawn chain executes identically:
main (ours tid=1) → sub_8244FEA8 → sub_8244FF50 → ExCreateThread(entry=0x82450A28, ctx=0x828F3B68) → ours tid=5 starts → sub_82450A28 (1×) → sub_82450A68 (1×) → γ-signaler family (sub_8245D9D8 6×, sub_8245DA78 1×, sub_8245DB40 75×)
This is bit-equivalent to canary's chain, modulo the tid label (canary calls it tid=10, ours calls it tid=5 — same entry, same ctx, same dispatch loop, same γ-signaler family fires from inside it).
The signaler spawn-chain is NOT the bug. S1's "the bug is at the thread-spawn layer" hypothesis is wrong.
Spawn chain (DB-derived, READ-ONLY DuckDB)
| Fn | callers in DB | role |
|---|---|---|
| 0x82450A28 | 1 ref-edge from 0x8244FFF8 (sub_8244FF50+0xA8) | thread entry (data ptr only) |
| 0x8244FF50 | 1 call-edge from 0x8244FEE8 (sub_8244FEA8+0x40) | ExCreateThread caller |
| 0x8244FEA8 | 11 call-edges (10 unique callers: sub_821A5150, sub_821CB968, sub_821CC2E8, sub_821D2850, sub_82237EC8, sub_8225EE20, sub_822E0350, sub_824528A8, sub_82452DC0 (2×), sub_8245E528) | spawn helper |
Per-PC fire counts (ours-cold, 1.5B instr, fresh today)
| PC | symbol | fires | tid |
|---|---|---|---|
| 0x8244FEA8 | sub_8244FEA8 (spawn helper) | 7 | 1 |
| 0x8244FF50 | sub_8244FF50 (ExCreateThread caller) | 1 | 1 |
| 0x82450A28 | sub_82450A28 (thread entry) | 1 | 5 |
| 0x82450A68 | sub_82450A68 (worker dispatch loop) | 1 | 5 |
| 0x8245D9D8 | γ-signaler D | 6 | 5 |
| 0x8245DA78 | γ-signaler D-B | 1 | 5 |
| 0x8245DB40 | γ-signaler D-NEW | 75 | 5 |
Spawn event log confirms ExCreateThread: tid=5 handle=0x1050 entry=0x82450a28 start_ctx=0x828f3b68.
Total kernel.calls{name=ExCreateThread} = 10.
Comparison with canary (S1 data — fresh today, not stale)
| metric | canary | ours |
|---|---|---|
| thread with entry=0x82450A28 | tid=10 | tid=5 |
| start_ctx | 0x828F3B68 | 0x828F3B68 |
| γ-D family signaler firings | all on tid=10 | all on tid=5 |
| NtSetEvent fires from γ-D (via wrapper 0x824AA2F0) | confirmed | confirmed |
The spawn chain and γ-signaler invocation match. The only divergence at the signaler call site is which handle gets signaled, not whether the signaler runs.
Divergence point (parent fires, child also fires)
NONE — every node in the spawn chain fires in ours. The S1-prescribed "first ancestor that fires while child does not" never materialises because the entire chain is reached identically.
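The "first ancestor that fires while child does not" walk can be sketched as below. This is a minimal illustration, not the tooling used in this session: the fire counts are copied from the per-PC table above, while the edge list and the function name `first_silent_child` are assumptions made for the example.

```python
# Fire counts per PC for ours-cold, copied from the per-PC table in this report.
FIRES = {
    0x8244FEA8: 7,   # spawn helper
    0x8244FF50: 1,   # ExCreateThread caller
    0x82450A28: 1,   # thread entry
    0x82450A68: 1,   # worker dispatch loop
    0x8245D9D8: 6,   # gamma-signaler D
    0x8245DA78: 1,   # gamma-signaler D-B
    0x8245DB40: 75,  # gamma-signaler D-NEW
}

# Parent -> child edges of the spawn chain, in walk order (assumed layout:
# the three signalers are treated as siblings under the dispatch loop).
EDGES = [
    (0x8244FEA8, 0x8244FF50),
    (0x8244FF50, 0x82450A28),
    (0x82450A28, 0x82450A68),
    (0x82450A68, 0x8245D9D8),
    (0x82450A68, 0x8245DA78),
    (0x82450A68, 0x8245DB40),
]


def first_silent_child(edges, fires):
    """Return (parent, child) for the first edge whose parent fires while
    the child never does, or None when every node in the chain fires."""
    for parent, child in edges:
        if fires.get(parent, 0) > 0 and fires.get(child, 0) == 0:
            return parent, child
    return None


result = first_silent_child(EDGES, FIRES)
```

With the counts above, `result` is `None`, which is the "no divergence in the spawn chain" outcome reported here.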
The actual divergence is downstream of the spawn-chain — at the handle-selection step inside the γ-signaler family, per AUDIT-062's prior finding ("ours's γ-signalers signal WRONG handles — neighbors of the wedge handle, not the wedge itself").
Gate condition
There is no gate that ours fails. The control flow reaches the γ-signaler
and invokes the NtSetEvent wrapper (sub_824AA2F0) with bit-identical
control flow. The argument to NtSetEvent (the handle) is the
divergent term.
In the AUDIT-062 archive ours-ntset.jsonl, the γ-D signaler on ours tid=5
calls NtSetEvent on handles 0x103C, 0x1068, 0x106C, 0x1094, ...
These are guest-side handle slots that the waiter is NOT waiting on.
Per S1, canary's wedge waiter (tid=17, tid=26) waits on F80000A4 and
F8000110. Note that canary's handles are pseudo-handles (high-bit
encoded), while ours's slot allocator hands out normal 0x10xx IDs —
a known cross-engine handle convention mismatch already documented
in AUDIT-019/043/062.
The semantic question is therefore: what does the producer compute as the "next handle to signal", and is the computation reading a different value of the bookkeeping struct in ours vs canary? This is the question AUDIT-062 hit and parked; it must be re-opened now that S1 has clarified the producer thread is reached identically.
ours-side analog status
The relevant kernel handlers are:
- NtSetEvent: ours (xenia-kernel/src/exports.rs) is, per the AUDIT-062 archive, bit-equivalent to canary in semantics (signals the event, schedules wakeup). Returns SUCCESS in both.
- ExCreateThread: ours is bit-equivalent (S2 spawn matches canary trajectory: ctx + entry + suspended flag).
- xeKeWaitForSingleObject (wedge wait at 0x821CB1DC): ours behaviour matches per AUDIT-049/065 prior work; the WAIT itself is fine, what remains broken is the signaler picking the right handle on tid=5.
Net: NO kernel handler bug. The divergence is guest-state computed inside the γ-signaler family at sub_8245D9D8 / sub_8245DA78 / sub_8245DB40, i.e. data that lives in the queue/list dispatched by sub_82450A68.
Reading-error #28 reclassification
S1 inadvertently committed the same class of error documented as #28 in prior audit memory: "treating per-engine tid label numerically across engines without a tid-mapping translation." S1 used canary's "tid=10" verbatim and AUDIT-062's "tid=10: 0 fires" verbatim, concluding "ours's thread set lacks the canary-tid=10 equivalent." In reality the same guest thread exists on both, with renumbered host-side tid labels.
The correct cross-engine identity is (entry_pc, start_ctx), not the
tid integer. S2 re-validates by entry=0x82450a28 ∧ ctx=0x828f3b68,
which uniquely identifies the spawn on both engines.
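A tid-mapping translation keyed on (entry_pc, start_ctx) might look like the following sketch. The record layout (`tid`, `entry`, `start_ctx` keys) and the helper names are hypothetical; only the single spawn pair documented in this report is used as data.

```python
def index_by_key(spawns):
    """Index one engine's spawn records by (entry, start_ctx).

    Raises if two spawns share the same key, since the mapping would
    then be ambiguous."""
    index = {}
    for s in spawns:
        key = (s["entry"], s["start_ctx"])
        if key in index:
            raise ValueError(f"ambiguous thread identity: {key}")
        index[key] = s["tid"]
    return index


def build_tid_map(canary_spawns, ours_spawns):
    """Map canary tid -> ours tid for threads present on both engines."""
    canary = index_by_key(canary_spawns)
    ours = index_by_key(ours_spawns)
    return {tid: ours[key] for key, tid in canary.items() if key in ours}


# The one spawn pair documented in this report (field names are assumed).
CANARY_SPAWNS = [{"tid": 10, "entry": 0x82450A28, "start_ctx": 0x828F3B68}]
OURS_SPAWNS = [{"tid": 5, "entry": 0x82450A28, "start_ctx": 0x828F3B68}]

TID_MAP = build_tid_map(CANARY_SPAWNS, OURS_SPAWNS)
```

Here `TID_MAP` is `{10: 5}`. Any per-tid statistic from canary would be looked up through this map before being compared against ours, rather than matched on the raw integer.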
Do NOT register a new reading-error #; this is the existing #28 surface.
Session 3 recommendation (refined)
Drop the spawn-chain investigation entirely. The producer thread runs.
Path A (RECOMMENDED, ~80 LOC ours-only): build a probe of the
handle-passed-to-NtSetEvent on tid=5 (ours) inside the γ-signaler
PCs, paired with the symmetric audit_69_event_signal_watch capture
from S1 in canary. Compare the sequence of handle IDs per signaler
invocation. The first mismatch identifies the guest-state divergence
that drives wrong-handle selection.
Plumbing path: extend --lr-trace in ours (crates/xenia-app/src/main.rs:233-243)
to also capture r3 snapshot at multiple PCs, matching canary's
audit_69 wrapper-entry capture. The per-record fields already exist
(M12 lr_trace lists pc/tid/hw/cycle/r3/r4/r5/r6/lr), so the remaining work
is probing ours at the 0x824AA2F0 and 0x824AAF50 entry PCs.
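The Path A comparison could be sketched as follows. Because canary and ours use different handle conventions, raw handle IDs would mismatch at index 0; this sketch therefore compares the shape of each sequence by relabelling every handle with the index of its first appearance. That normalisation is an assumption of the example, not something prescribed by S1, and the sequences shown are placeholders rather than trace data.

```python
from itertools import zip_longest


def canonicalize(handles):
    """Replace each handle with the index of its first appearance, so two
    sequences can be compared independently of the engine's handle IDs."""
    first_seen = {}
    return [first_seen.setdefault(h, len(first_seen)) for h in handles]


def first_mismatch(canary_handles, ours_handles):
    """Return the index of the first signaler invocation whose canonical
    handle differs between engines, or None if the sequences agree.
    A length difference counts as a mismatch at the shorter length."""
    a = canonicalize(canary_handles)
    b = canonicalize(ours_handles)
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return i
    return None


# Placeholder sequences (illustrative only, not taken from any trace):
# canary re-signals its first handle on the third call, ours does not.
canary_seq = ["A", "B", "A", "C"]
ours_seq = ["p", "q", "r", "s"]
divergence_index = first_mismatch(canary_seq, ours_seq)
```

For the placeholder input, `divergence_index` is 2: the third invocation is the first point where one engine reuses a handle and the other selects a new one.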
Path B (~50 LOC diff-tool): extend the diff-events JSONL absorber to treat the canary→ours handle-ID mapping as a runtime-discovered alias when the underlying dispatcher pointer matches. Doesn't fix the bug, absorbs the symptom.
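A rough sketch of the Path B alias discovery is shown below. The event layout (`handle` and `dispatcher` keys) and both function names are hypothetical, since the diff-events absorber's real record format is not described here; the values are placeholders.

```python
def discover_aliases(canary_events, ours_events):
    """Build a canary-handle -> ours-handle alias table by pairing events
    whose underlying dispatcher pointer is identical on both engines."""
    ours_by_ptr = {}
    for ev in ours_events:
        # Keep the first handle seen for each dispatcher pointer.
        ours_by_ptr.setdefault(ev["dispatcher"], ev["handle"])

    aliases = {}
    for ev in canary_events:
        ours_handle = ours_by_ptr.get(ev["dispatcher"])
        if ours_handle is not None:
            aliases.setdefault(ev["handle"], ours_handle)
    return aliases


def same_event(canary_ev, ours_ev, aliases):
    """True when the ours handle is the discovered alias of the canary one."""
    return aliases.get(canary_ev["handle"]) == ours_ev["handle"]


# Placeholder events (illustrative only, not trace data).
canary_events = [
    {"handle": "C1", "dispatcher": "D1"},
    {"handle": "C2", "dispatcher": "D2"},
]
ours_events = [{"handle": "O1", "dispatcher": "D1"}]
aliases = discover_aliases(canary_events, ours_events)
```

In this example only `C1` gains an alias (`O1`); `C2` has no counterpart and would still be reported as a difference, so unmatched handles remain visible rather than being silently absorbed.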
Path C (root-cause, larger): walk sub_82450A68 dispatch loop body disassembly + AUDIT-062 archive to identify which guest-memory struct holds the queue of "handles to signal." The wrong handles on ours mean this struct gets populated wrong somewhere upstream of tid=5's dispatch loop — likely from sub_8244FEA8's 7 fires (which call sites enqueue work, and what data is enqueued).
LOC budget for S3: Path A ~80, Path B ~50, Path C unknown (~200+).
Cascade A/B/C/D
- A (DB-derived spawn chain): PASS (11 call-edges into sub_8244FEA8, 1 unique call edge to sub_8244FF50).
- B (per-fn fire counts ours+canary): PASS (ours fresh, canary from S1 fresh).
- C (divergence-point identification): N/A — no divergence in spawn chain; S1 framing falsified. Re-direction recommended.
- D (kernel-handler bit-equivalence check): PASS (NtSetEvent / ExCreateThread per AUDIT-062 archive; no new kernel bug detected).
Net: 3/4 PASS, 1/4 N/A (because the postulated divergence wasn't there).
Discipline
- xenia-rs HEAD UNCHANGED (sha256 of git diff HEAD matches S1 end).
- No canary instrumentation added this session; S1's data is fresh.
- ours-rs ran with --ctor-probe (read-only, lockstep-digest-unaffected flag already in main.rs:194).
- No source modifications to ours.
- ours-rs cache (none on this host); no canary run, no canary cache to wipe.
Artifacts
audit-runs/audit-069-wait-signal-producer/
  session-2-spawn-walk.log (combined probe + DB queries + fires table)
  writer-report-v2.md (this file)
  s2/ours-probe.stdout (780 lines, 91 CTOR-PROBE records)
  s2/ours-probe.stderr (241 lines, all spawn events + summary)
No fix-canary-v2.diff (no canary instrumentation added).