Source changes (dormant parity infra, retained from iterate 2.AI/2.AO): - xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring - xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
AUDIT-069 Session 3 — writer report v3
Date: 2026-05-20
xenia-rs HEAD: e6d43a23ac393004d2e5adf2f0395fd0b5e6448b (UNCHANGED from S1/S2)
git diff HEAD | sha256sum: ed30fd526643918f67311caff0a10d1346d73fd0c0323e02477883cf5ff20357
(UNCHANGED at start AND end of S3)
No canary instrumentation added this session.
No ours source modifications. --lr-trace is a runtime flag (main.rs:233-243).
Headline (HIGH confidence, direct measurement)
ours's tid=5 (= canary tid=10 by entry/ctx identity) fires the γ-signaler
family from the SAME guest LRs as canary — but only 81 times where
canary fires 492 times (16%). This is NOT a "wrong-handle" bug — it is
a producer-loop underrun. The dispatch loop in sub_82450A68 exits
early or starves; consumer threads then block on events that ours never
gets to signal.
S2's "the producer fires identically, just selects wrong handles" framing is REFINED, not falsified: the producer reaches the wrappers via the EXACT same call sites but completes ~6× fewer iterations (81 vs 492).
Method
Read-only --lr-trace=0x824AA2F0,0x824AAF50 on cold ours boot, 1.5B
instructions / 47 s wallclock (and re-validated at 5B / 159s — same 81
fires, same handle universe, same import_calls=39290 → no new work after
the producer's initial burst). JSONL output to s3/ours-lr-trace.jsonl.
Cross-engine paired against S1's signal-probe-correlated.log (canary
data, fresh 2026-05-20).
Per-LR fire counts
| caller LR | symbol | wrapper PC | canary tid=10 | ours tid=5 | ratio |
|---|---|---|---|---|---|
| 0x8245DA44 | γ-D-A (sub_8245D9D8) | 0x824AA2F0 | 23 | 5 | 22% |
| 0x8245DB08 | γ-D-B (sub_8245DA78) | 0x824AA2F0 | 8 | 1 | 12% |
| 0x8245DC5C | γ-DB40 (sub_8245DB40 NEW) | 0x824AAF50 | 461 | 75 | 16% |
| TOTAL | | | 492 | 81 | 16% |
ours runs the same producer code, but the loop terminates early. S2's per-PC fire-count table shows ours = 6/1/75 for the three γ-fns, against 5/1/75 here, so this S3 data agrees with S2 on the wrapper-entry side to within one γ-D-A fire.
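The per-LR table can be re-derived from the trace with a few lines of Python. This is a minimal sketch: the record shape (`tid` and `lr` keys, one JSON object per line) is an assumption about ours-lr-trace.jsonl, not something the report specifies, and the helper names are illustrative. The counts fed to `ratio_rows` are copied from the table above.

```python
import json
from collections import Counter

# Assumed record shape (not confirmed by the report): {"tid": int, "lr": "0x..."}.
def fires_per_lr(lines, tid):
    """Count trace records per caller LR for one thread."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec["tid"] == tid:
            counts[rec["lr"]] += 1
    return counts

def ratio_rows(canary, ours):
    """Return (lr, canary_count, ours_count, percent) rows plus a TOTAL row."""
    rows = [(lr, c, ours.get(lr, 0), round(100 * ours.get(lr, 0) / c))
            for lr, c in canary.items()]
    canary_total, ours_total = sum(canary.values()), sum(ours.values())
    rows.append(("TOTAL", canary_total, ours_total,
                 round(100 * ours_total / canary_total)))
    return rows

# Counts copied from the per-LR table in this report.
CANARY = {"0x8245DA44": 23, "0x8245DB08": 8, "0x8245DC5C": 461}
OURS = {"0x8245DA44": 5, "0x8245DB08": 1, "0x8245DC5C": 75}

for row in ratio_rows(CANARY, OURS):
    print(row)
```

Keying on the caller LR rather than on the handle keeps the comparison valid across engines, since the two handle namespaces differ.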
Handle namespaces are incomparable by raw ID
- canary uses XEvent::native_object() pseudo-handles F8000xxx (high bit set, encodes a synthetic ID assigned by XObject::GetNativeObject).
- ours uses normal slot IDs 0x10xx from the handle-slot allocator.
Comparison must be by (a) position in the per-LR sequence and (b) call args (r5: size or signal kind; r4 pointers compared by type, not by address).
Position-0 args MATCH (HIGH confidence, direct measurement)
| LR | r5 (size / kind) | matches? |
|---|---|---|
| 0x8245DC5C | ours=0x800 / canary=0x800 | YES |
| 0x8245DA44 | ours=2 (Set) / canary=2 | YES |
| 0x8245DB08 | ours=2 / canary=2 | YES |
r4 (buffer/ctx pointers) DIFFER in absolute address (different memory layouts) but TYPE-shaped identically. The first invocation of each signaler is structurally identical. The divergence is in COUNT of subsequent loop iterations, not in handle-selection of position-0.
See s3/handle-sequence-diff.md for full position-aligned table.
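A position-aligned comparison of this kind can be sketched as follows. The function name and the sample sequences are illustrative only: the report confirms the r5 value at position 0, not at later positions, so the inputs below are made-up stand-ins for one LR's r5 sequence on each engine.

```python
from itertools import zip_longest

def first_mismatch(ours_seq, canary_seq):
    """Index of the first position where the sequences differ, or None.

    A side that runs out early counts as a mismatch at that position,
    which is how a pure iteration-count divergence shows up.
    """
    missing = object()  # sentinel that never equals a real argument value
    pairs = zip_longest(ours_seq, canary_seq, fillvalue=missing)
    for index, (ours_arg, canary_arg) in enumerate(pairs):
        if ours_arg != canary_arg:
            return index
    return None

# Hypothetical r5 sequences for one LR: identical until ours stops firing.
ours_r5 = [0x800, 0x800]
canary_r5 = [0x800, 0x800, 0x800]
print(first_mismatch(ours_r5, canary_r5))
```

When the first mismatch index equals the length of the shorter sequence, the divergence is in count rather than in argument selection.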
γ-DB40 signal-target distribution (the 461-vs-75 case)
| canary handle | count | ours handle | count |
|---|---|---|---|
| F80000C8 | 229 | 0x000010E0 | 69 |
| F80000DC | 79 | 0x00001040 | 1 |
| F8000078 | 71 | 0x0000105C | 1 |
| F80000BC | 39 | 0x00001098 | 1 |
| F800012C | 28 | 0x000010AC | 1 |
| F80000B4 | 7 | 0x000010D0 | 1 |
| F8000044 | 4 | 0x0000121C | 1 |
Shape: both have one dominant handle plus a long tail, but the dominant share differs (canary 229/461 = 50%, ours 69/75 = 92%). ours's tail is truncated: only 7 distinct handles in γ-DB40 vs canary's 10+.
This is consistent with the producer enqueuing the same kinds of work
items while the upstream feeder under-fires: the dominant work item
(handle 0x10E0 ≈ F80000C8 by position) gets some iterations,
the next-most-common items are truncated to 1×, and the long tail
(canary's F80000DC 79× / F8000078 71×) is mostly missing.
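The shape figures can be checked directly from the distribution table. In this sketch the counts are copied from the table above and the totals (461 and 75) from the per-LR table; the `shape` helper is an illustrative name.

```python
# Counts copied from the γ-DB40 distribution table in this report.
CANARY = {"F80000C8": 229, "F80000DC": 79, "F8000078": 71, "F80000BC": 39,
          "F800012C": 28, "F80000B4": 7, "F8000044": 4}
OURS = {"0x10E0": 69, "0x1040": 1, "0x105C": 1, "0x1098": 1,
        "0x10AC": 1, "0x10D0": 1, "0x121C": 1}

def shape(counts, total):
    """Summarise a handle distribution against the known total fire count."""
    ranked = sorted(counts.values(), reverse=True)
    return {
        "dominant_pct": round(100 * ranked[0] / total),
        "listed_handles": len(ranked),
        # Fires that belong to handles not shown in the table.
        "unlisted_fires": total - sum(ranked),
    }

print(shape(CANARY, 461))
print(shape(OURS, 75))
```

The non-zero `unlisted_fires` on the canary side is the part of the tail that the table does not list; on the ours side the seven listed handles account for all 75 fires.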
Wedge handle status (HIGH confidence)
AUDIT-062 archive recorded ours wedge handles 0x12AC and 0x12B8 with
<NO_SIGNALS_DESPITE_WAITS> annotation in a deeper-boot run.
In S3's lr-trace: handle 0x12AC count = 0, handle 0x12B8 count = 0. No handle above 0x121C appears in tid=5's signal trace at all.
Max handle observed in this run: 0x121C (cache:/aab216c3 NtCreateFile).
The wedge handles are NEVER allocated in this 5B-instruction run, because
boot terminates before the trajectory that would create them. The
producer fires 81 times, then tid=5 goes quiet; the import_call counter
freezes at 39,290; --halt-on-deadlock does NOT trigger (consumers wait
on existing events that were never the wedge in this run).
This is a stronger statement than "the wedge handle is never signaled": the wedge handle is never even CREATED, because the boot never reaches the point of creating it. ours's boot trajectory is truncated by the producer underrun upstream.
Classification: producer-loop underrun (HIGH confidence)
NOT a race (timing-dependent), NOT a wrong-handle bug (the args at matching positions are structurally identical), NOT a missing-kernel-handler bug (the signals that DO fire pass through bit-equivalent wrappers).
It is producer-loop underrun: sub_82450A68's dispatch loop iterates fewer times. Either:
- The work queue (read from guest memory by sub_82450A68) is populated with fewer items by some upstream feeder.
- The dispatch loop's exit condition trips early.
- The thread blocks on a dispatcher event that never gets re-signaled.
Mechanism candidates (S4 to discriminate):
- upstream feeder: callers of sub_8244FEA8 (11 sites in DB); one enqueues less work in ours. Most likely the audio cluster (sub_8225EE20) or sub_82452DC0 (2 calls), given they relate to APUBUG-PRODUCER-001 territory.
- dispatch loop exit: the loop reads a flag from the dispatcher struct at 0x828F3B68 + offset; a state divergence there exits early.
- inner KeWait at 0x824AB240 (mentioned in S1 spawn-chain notes): if this wait times out / fails differently in ours, the loop exits.
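The three candidates yield the same fire count and differ only in why the loop stops, which is what S4 has to discriminate. The toy model below illustrates that point. It is not a reconstruction of sub_82450A68: the loop structure, parameter names and exit reasons are all assumptions made for illustration.

```python
from collections import deque

def run_dispatch(work, exit_after=None, wait_results=None):
    """Toy dispatch loop (hypothetical, not derived from the guest code).

    work         -- items the upstream feeder enqueued
    exit_after   -- fire count at which an exit flag trips (None = never)
    wait_results -- per-iteration wait outcomes; once exhausted or False,
                    the loop is treated as blocked (None = waits always succeed)
    Returns (fires, reason).
    """
    queue = deque(work)
    waits = iter(wait_results) if wait_results is not None else None
    fires = 0
    while True:
        if exit_after is not None and fires >= exit_after:
            return fires, "exit-flag"
        if waits is not None and not next(waits, False):
            return fires, "wait-blocked"
        if not queue:
            return fires, "queue-empty"
        queue.popleft()
        fires += 1

# All three mechanisms reproduce the observed 81 fires.
print(run_dispatch(range(81)))                            # feeder under-fires
print(run_dispatch(range(492), exit_after=81))            # exit condition trips
print(run_dispatch(range(492), wait_results=[True] * 81)) # wait never re-signaled
```

Since the count alone cannot separate the cases, the discriminating evidence has to be the exit location or the state of the wait, not the number of signals.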
Reading-error registry
NO new reading-error class needed. This session confirms one existing class:
- #28 cross-engine tid label mismatch — used correctly here (compared by entry/ctx, not by tid integer).
- AUDIT-062 "wrong handles" framing is a SYMPTOM of the producer underrun (fewer signals → some handles signaled, others starved), not a separate bug.
Cascade
- A (capture ours per-PC signaler firings): PASS (137 records, 81 on tid=5).
- B (parallel canary sequence from S1): PASS (492 records on tid=10).
- C (first-mismatch identification): PASS — divergence is in iteration count, not in handle-at-position-0. Position-0 args match structurally.
- D (race-vs-missing-signal classification): PASS — neither pure race nor pure missing-signal. It is producer-loop underrun (boot doesn't reach the wedge-handle-creating subsystem).
Net 4/4 PASS.
S4 recommendation (refined)
Drop the "wrong-handles-from-γ-signaler" framing. Focus upstream on WHY tid=5's dispatch loop runs ~6× fewer iterations (81 vs 492).
Path A (RECOMMENDED, ~30 LOC ours-only diagnostic, no source mod)
Use --lr-trace=0x82450A68 (the dispatch-loop body PC) plus the existing
--branch-probe to see WHERE in the loop body ours exits. If the loop has
a backward branch at offset X and ours's last fire is at offset Y < X, the
loop is exiting early. Pair with the inner bl 0x824AB240 (KeWaitForMultipleObjects)
to see if the loop blocks on a wait that returns differently than canary.
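The Y < X test described above reduces to a comparison on the last recorded offset. A minimal sketch, assuming the probe output has already been reduced to a list of loop-body offsets relative to 0x82450A68 in firing order; the offsets used here are made up, and the format of the probe output is not specified in this report.

```python
def classify_exit(hit_offsets, backedge_offset):
    """Classify where the loop body was last seen relative to its backward branch.

    hit_offsets     -- loop-body offsets in the order they fired (hypothetical input)
    backedge_offset -- offset X of the loop's backward branch
    """
    if not hit_offsets:
        return "never-entered"
    last = hit_offsets[-1]  # offset Y of the last recorded fire
    if last < backedge_offset:
        return "exited-before-backedge"
    return "reached-backedge"

# Made-up offsets: the last fire lands before a backward branch at 0x1A0.
print(classify_exit([0x10, 0x48, 0x9C], backedge_offset=0x1A0))
```

A "reached-backedge" result on the final pass would point away from an early exit and toward the inner wait or the feeder.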
Path B (~80 LOC ours-only) — feeder-side capture
--lr-trace=0x8244FEA8 on cold ours AND canary. The spawn-helper has 11
static call sites in the DB-derived caller list; at runtime it fires 7× in
S2's ours run. Pair r3/r4 (the spawned thread's start_ctx args) with
canary's equivalent. ours may be missing one or more enqueues; a missing
enqueue would be the upstream root cause.
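Finding the missing enqueue is a multiset difference over the paired calls. The sketch below keys each call on (caller LR, start_ctx); that choice of key is an assumption, and every LR value in the sample is made up (only 0x828F3B68 appears in this report).

```python
from collections import Counter

def missing_enqueues(canary_calls, ours_calls):
    """Calls canary made that ours did not match, with multiplicity."""
    return Counter(canary_calls) - Counter(ours_calls)

# Hypothetical (caller_lr, start_ctx) pairs; the LRs are placeholders.
canary_calls = [
    (0x82000010, 0x828F3B68),
    (0x82000020, 0x828F3B68),
    (0x82000020, 0x828F3B68),
]
ours_calls = [
    (0x82000010, 0x828F3B68),
    (0x82000020, 0x828F3B68),
]

for (lr, ctx), count in missing_enqueues(canary_calls, ours_calls).items():
    print(f"missing {count}x from LR {lr:#010x} ctx {ctx:#010x}")
```

Counter subtraction drops zero and negative entries, so extra calls on the ours side do not mask a missing one.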
Path C (~250 LOC, larger) — work-queue struct disassembly
Disassemble sub_82450A68 body, identify the work-queue struct it reads
from (likely at [r29 + N] where r29 = start_ctx 0x828F3B68 or a derived
pointer). Watch the struct with --mem-watch to identify the populator
(which fn writes the queue items). Trace that populator upstream.
LOC budget for S4: Path A ~30, Path B ~80, Path C ~250.
Path A first: it gives the precise exit condition (loop-body branch vs inner-wait timeout) at the lowest cost (~30 LOC, no source modification).
Discipline
- xenia-rs HEAD UNCHANGED (sha256 of git diff HEAD matches S1/S2 end).
- No source modifications. --lr-trace is read-only, lockstep-digest-unaffected (per state.rs:1463-1500).
- No canary run this session (S1's data is fresh).
- No canary cache to wipe (no canary run).
- ours runs cold (no cache pre-population).
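The unchanged-tree check is the hex digest that `git diff HEAD | sha256sum` prints. A small sketch of the same check in Python follows; the function names and the `repo` parameter are illustrative, and the reference digest is the one recorded at the top of this report.

```python
import hashlib
import subprocess

def digest(data: bytes) -> str:
    """SHA-256 hex digest, the same value sha256sum prints for these bytes."""
    return hashlib.sha256(data).hexdigest()

def working_tree_digest(repo: str = ".") -> str:
    """Digest of `git diff HEAD` for the repository at `repo`."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "HEAD"],
        check=True,
        capture_output=True,
    )
    return digest(result.stdout)

# Digest recorded in this report for the start and end of S3.
EXPECTED = "ed30fd526643918f67311caff0a10d1346d73fd0c0323e02477883cf5ff20357"

def tree_unchanged(repo: str = ".") -> bool:
    """True when the working-tree diff hashes to the recorded value."""
    return working_tree_digest(repo) == EXPECTED
```

Running `tree_unchanged()` before and after a session gives the same start-and-end guarantee the header of this report states.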
Artifacts
audit-runs/audit-069-wait-signal-producer/s3/
ours-lr-trace.jsonl (137 records, both PCs, all tids)
ours-lr-trace.stderr (run log + counters)
ours-lr-trace.stdout (empty under --quiet)
ours-lr-trace-824AA2F0.log (60 records, NtSetEvent wrapper)
ours-lr-trace-824AAF50.log (77 records, Ke wrapper)
ours-lr-trace-extended.{jsonl,stderr,stdout} (5B-instr re-validation: same 81 fires)
handle-sequence-diff.md (parallel comparison + first-mismatch table)
writer-report-v3.md (this file)
No fresh canary run was needed — S1's signal-probe-correlated.log
(154,187 lines) carries all canary signal-probe data.
Summary of S1 → S2 → S3 progression
- S1: identified canary's tid=10 as the signaler; claimed ours lacks this thread (FALSIFIED by S2).
- S2: spawn-chain runs identically on ours tid=5; refined to "wrong-handle selection" downstream (REFINED by S3).
- S3: ours runs identical PC/LR chain but with ~6× fewer iterations (81 vs 492). Loop underrun classification. Wedge handle never even gets created in ours's truncated boot trajectory.
The bug is upstream of the γ-signaler: in WHAT the dispatch loop reads from the work queue, or in the loop's exit condition.