xenia-rs/audit-runs/iterate-2S-longbudget-signal-match/writer-report.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Iterate 2.S — Long-budget (500M) replay with 2.Q `signal.match` active (writer report)
**Date:** 2026-05-28. **LOC delta:** engine **0**, canary **0**, tooling **0**.
Pure measurement.
**Tests:** N/A (no source modifications).
**Cascade:** N/A — observability replay only.
## Headline
**BUDGET-CAP-FALSIFIED / C-2-SCHEDULER-FAIRNESS-CONFIRMED-STRUCTURAL.**
500M-instruction replay (10× 2.Q's 50M) under `XENIA_CACHE_WIPE=1` with
2.Q `signal.match` instrumentation active emits **121,605 events,
bit-identical to 2.Q's 50M run.** Run terminates `EXIT=0` at wallclock
13.7 s on `reached max instruction count limit=500000000`. **Zero
`signal.match` events on wedge handle `0x000012e4` (or any of the
4 unsignaled wedge handles {0x12c8, 0x12d0, 0x12e4, 0x1020}) in the
entire 500M-instruction window.** Exit-state thread geometry bit-identical
to 2.M/2.N/2.Q (13 threads, 10 wedge entries, same wedge map). **tid=6
remains `Ready` on `hw_id=5` with no resumption** despite the engine
having 10× the instruction budget (~13.7 s of wallclock) to schedule it. Combined with 2.K's
identical "zero new events 50M→500M" result, this **definitively rules
out the C-1 burst-then-halt subclass framing as a budget-truncation
artifact** and **confirms 2.R's C-2 (Ready-but-not-running on CPU5) as
structural**. Next iterate should be 2.T (`wake.requested`
instrumentation in `wake_eligible_waiters`) to decisively distinguish
kernel-wake-call-not-issued vs scheduler-pick-skipping-Ready-tid-6.
## Mode
ZERO LOC. Invocation (identical to 2.K except cwd):
```
XENIA_CACHE_WIPE=1 timeout 600 ./target/release/xenia-rs exec \
-n 500000000 --quiet \
--phase-a-event-log audit-runs/iterate-2S-longbudget-signal-match/ours-cold.jsonl \
"../Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso"
```
Engine binary `xenia-rs/target/release/xenia-rs` from May 28 19:51 carries
the uncommitted 2.Q `signal.match` patch (working tree HEAD
`e6d43a23…` + diff sha256 `e81a4b84…`). XDG cache
`/home/fabi/.local/share/xenia-rs/cache/` was empty before the run;
`XENIA_CACHE_WIPE=1` was set anyway as a belt-and-braces measure.
Run completed `EXIT=0`. Diagnostic re-run (non-quiet) captured:
`reached max instruction count limit=500000000` ... `exec complete
wall_ms=13705 instructions=500000004 import_calls=40390 unimplemented=0`.
Instruction budget hit cleanly, no panic / fault / SIGSEGV / timeout.
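The clean-termination check can be scripted against the quoted `exec complete` line. The sketch below is a minimal illustration, not part of the engine tooling: the helper name `parse_counters` is hypothetical, and it assumes the summary line keeps the `key=integer` shape quoted above.

```python
import re

# Line quoted from the diagnostic (non-quiet) re-run above.
LINE = ("exec complete wall_ms=13705 instructions=500000004 "
        "import_calls=40390 unimplemented=0")
BUDGET = 500_000_000  # value passed via -n in the invocation

def parse_counters(line):
    """Return every key=<integer> pair on the line as a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}

stats = parse_counters(LINE)
overshoot = stats["instructions"] - BUDGET  # instructions executed past the limit
print(stats["wall_ms"] / 1000, overshoot, stats["unimplemented"])
```

With the quoted line this reports 13.705 s of wallclock, a 4-instruction overshoot past the limit, and zero unimplemented imports.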
## Primary gate results
### Gate 1 — `signal.match` events on wedge handle `0x000012e4`
| metric | value |
|---|---:|
| `signal.match` on `0x000012e4` whole run | **0** |
| `signal.match` on `0x000012e4` in [1.0, 5.0] s | **0** |
| `signal.match` total (run-wide) | 36 |
**Same as 2.Q.** No signaler ever produces `0x000012e4`. The disambiguation
gate from the goal-spec resolves to "C-2 confirmed structural", since
neither the 10× budget nor the `signal.match` instrumentation changed
the observed outcome.
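The Gate 1 counts come from a scan of the JSONL event log. A minimal sketch of such a scan follows; the field names `kind`, `handle`, and `host_ns` are assumptions made for illustration (the Phase-A schema is not reproduced in this report), and the sample lines are synthetic rather than taken from `ours-cold.jsonl`.

```python
import json

def count_signal_matches(lines, handle, window_s=None):
    """Count `signal.match` events on `handle`, optionally inside a host-time window.

    Assumed (hypothetical) fields per event: kind, handle (hex string), host_ns.
    `window_s` is an inclusive (start, end) pair in seconds.
    """
    target = int(handle, 16)
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("kind") != "signal.match":
            continue
        if int(event["handle"], 16) != target:
            continue
        if window_s is not None:
            seconds = event["host_ns"] / 1e9
            if not (window_s[0] <= seconds <= window_s[1]):
                continue
        count += 1
    return count

# Synthetic sample lines, not real trace content.
sample = [
    '{"kind": "signal.match", "handle": "0x000010b4", "host_ns": 856600000}',
    '{"kind": "kernel.call", "handle": "0x000012e4", "host_ns": 1200000000}',
    '{"kind": "signal.match", "handle": "0x00001028", "host_ns": 493700000}',
]
print(count_signal_matches(sample, "0x12e4"))
print(count_signal_matches(sample, "0x10b4", window_s=(1.0, 5.0)))
```

Comparing handles as integers rather than strings avoids false negatives between `0x12e4` and the zero-padded `0x000012e4` spelling used elsewhere in this report.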
### Gate 2 — Exit-thread-state on tid=6 (and other wedge tids)
`exit-thread-state.json` 9651 bytes, bit-identical to 2.M/2.N/2.Q. tid=6
state and full wedge geometry unchanged:
| tid | state | hw_id | affinity | last_pc | wedge waiting on |
|---:|---|---:|---|---|---|
| 1 | Blocked | 0 | 0xff | 0x824ac578 | 0x12c8 = Thread(13) |
| 2 | Blocked | 1 | 0xff | 0x824a95f8 | 0x8287093c = Event |
| 3 | Blocked | 5 | 0x20 | 0x824ac578 | 0x1020 = Event |
| 4 | Blocked | 3 | 0x08 | 0x824ac578 | 0x1028 = Semaphore(0/2³¹-1) |
| 5 | Blocked | 3 | 0x08 | 0x824ac578 | **0x12e4 = Event** |
| **6** | **Ready** | **5** | **0x20** | **0x824ab214** | **—** |
| 7 | Blocked | 2 | 0x04 | 0x824cd4f4 | 0xbe8cbb5c = Event |
| 8 | Blocked | 2 | 0x04 | 0x824ab214 | 0x10ec = Event + 0x10d8 = Semaphore |
| 9 | Ready | 4 | 0x10 | 0x824d1404 | — |
| 10 | Ready | 5 | 0x20 | 0x824d1404 | — |
| 11 | Blocked | 0 | 0xff | 0x824d2a94 | 0x828a3244 + 0x828a3220 |
| 12 | Ready | 5 | 0x20 | 0x824aa6a4 | — |
| 13 | Blocked | 1 | 0x02 | 0x824ac578 | 0x12d0 = Event |
**tid=6 STILL Ready at hw_id=5** — exactly as 2.R observed. The 10× budget
did not allow the scheduler to resume tid=6.
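The contention on `hw_id=5` can be read directly off the table. The sketch below hard-codes the (tid, state, hw_id) columns from the Gate 2 table and groups the Ready threads by hardware thread; it does not parse `exit-thread-state.json`, whose layout is not shown here, and the helper name `ready_by_hw` is hypothetical.

```python
from collections import defaultdict

# (tid, state, hw_id) rows copied from the Gate 2 table above.
THREADS = [
    (1, "Blocked", 0), (2, "Blocked", 1), (3, "Blocked", 5), (4, "Blocked", 3),
    (5, "Blocked", 3), (6, "Ready", 5), (7, "Blocked", 2), (8, "Blocked", 2),
    (9, "Ready", 4), (10, "Ready", 5), (11, "Blocked", 0), (12, "Ready", 5),
    (13, "Blocked", 1),
]

def ready_by_hw(threads):
    """Map hw_id -> list of tids that are Ready on that hardware thread."""
    grouped = defaultdict(list)
    for tid, state, hw_id in threads:
        if state == "Ready":
            grouped[hw_id].append(tid)
    return dict(grouped)

contention = ready_by_hw(THREADS)
print(contention)  # {5: [6, 10, 12], 4: [9]}
```

Three Ready threads (6, 10, 12) share `hw_id=5`, while tid=9 is the only Ready thread on `hw_id=4`.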
### Gate 3 — tid=5 last guest_cycle
| metric | 2.R (50M jitter sample) | 2.S (500M run) | delta |
|---|---:|---:|---:|
| tid=5 last guest_cycle | not reported separately (wedge wait at host_ns 1,007,809,113) | **486,334** | n/a |
| tid=5 last host_ns | 1,007,809,113 (2.R) | **859,219,713** | LOWER (jitter, not regression) |
Note: `host_ns` is wallclock-derived and varies from run to run (jitter).
`guest_cycle` is the deterministic guest-side counter; tid=5's last
guest_cycle of 486,334 is bit-equivalent across 2.Q / 2.S (same Phase-A
event content).
## Secondary gate results
### Total event counts
| metric | 2.K (500M baseline, no signal.match) | 2.Q (50M+signal.match) | 2.S (500M+signal.match) |
|---|---:|---:|---:|
| total events | 121,569 | 121,605 | **121,605** |
| `signal.match` events | 0 (kind not emitted) | 36 | **36** |
| baseline events (ex signal.match) | 121,569 | 121,569 | **121,569** |
| Phase-A delta 50M→500M | 0 (vs 2.J) | n/a | **0** |
| Wallclock | 13.96 s | not reported (~5s) | **13.7 s** |
| Termination reason | `reached max instruction count limit` | (50M) | **`reached max instruction count limit=500000000`** |
**Bit-identical event count to 2.Q.** The 10× instruction budget cost more
wallclock (13.7 s vs 2.Q's estimated ~5 s) but produced **zero additional
Phase-A events**.
### `signal.match` by signaler tid (whole run)
| tid | count | target handles |
|---:|---:|---|
| 5 | 19 | 0x1028×7, 0x10b4×5, 0x103c, 0x1068, 0x10a0, 0x10fc, 0x1128, 0x1160, 0x11a0 |
| 1 | 9 | 0x1044×7, 0x10d8, 0x10dc |
| 6 | **3** | 0x10ac, 0x1108, 0x116c |
| 13 | 1 | 0x1044 |
| 2 | 1 | 0x8287094c |
| 11 | 1 | 0x828a3254 |
| 9 | 1 | 0x828a3230 |
| 8 | 1 | 0x000012c0 |
**tid=6 fires only 3 `signal.match` events**: 3 of its 41 `NtSetEvent`
calls land on a parked waiter, namely tid=5's 3 satisfied
`NtWaitForSingleObjectEx` calls per [[iterate_2R_missing_producer_2026_05_28]]'s
per-wait table. The other 38 `NtSetEvent` calls land on events that are
already signaled or have no waiter. This is consistent with the
polling-loop behavior of canary tid=11 (the analog of ours tid=6), which
issues many "ensure signaled" sets on a manual-reset event that has no
waiter.
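The tallies in this section can be cross-checked with a few lines of arithmetic. The sketch below only re-adds numbers already quoted in the tables above; the variable names are illustrative.

```python
# Per-signaler `signal.match` counts copied from the table above.
SIGNAL_MATCH_BY_TID = {5: 19, 1: 9, 6: 3, 13: 1, 2: 1, 11: 1, 9: 1, 8: 1}

# tid=5 handle breakdown: 0x1028 x7, 0x10b4 x5, plus seven single-hit handles.
TID5_BREAKDOWN = [7, 5] + [1] * 7

BASELINE_EVENTS = 121_569   # events excluding signal.match
TID6_NTSETEVENT_CALLS = 41  # total NtSetEvent calls issued by tid=6

total_matches = sum(SIGNAL_MATCH_BY_TID.values())
total_events = BASELINE_EVENTS + total_matches
tid6_unmatched = TID6_NTSETEVENT_CALLS - SIGNAL_MATCH_BY_TID[6]

print(total_matches, total_events, tid6_unmatched)  # 36 121605 38
```

The per-tid counts sum to the run-wide 36, which added to the 121,569 baseline events gives the reported 121,605.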
### tid=5 NtReleaseSemaphore on handle 0x000010b4 (the tid=6 backlog feeder)
`signal.match` on `0x000010b4` (tid=6 is the sole waiter per 2.Q snapshot):
| host time (ms) | signaler | waiters |
|---:|---:|---|
| 493.7 | tid=5 | [6] |
| 493.8 | tid=5 | [6] |
| 520.9 | tid=5 | [6] |
| 719.9 | tid=5 | [6] |
| **856.6** | **tid=5** | **[6]** |
**5 releases on 0x10b4 targeting tid=6 as parked waiter.** Critically,
**the tid=5 release at 856.6 ms fires AFTER tid=6's last `NtSetEvent`
`signal.match` at 723.5 ms.** That release should wake tid=6, but tid=6
never reschedules: its last event of any kind is at 723.6 ms, ~133 ms
before the 856.6 ms release, and the run-end at 859.2 ms follows only
~2.6 ms after the release with no tid=6 activity. This is
the same starvation pattern 2.R documented for a different jitter
sample (2.R had tid=5 issuing 76 releases in [880, 991] ms). **2.S
confirms the pattern is reproducible across jitter samples and across
budgets.**
(2.R's "76 releases" appears to have come from raw `kernel.call` args
parsing rather than `signal.match`; 2.S only has 5 because `signal.match`
filters to events where waiter_count ≥ 1 — the other ~70 releases must
have been on different handles or with no waiter present at signal time.
Either way the wake-pattern conclusion is the same.)
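The timing argument reduces to two subtractions over values quoted above. A small sketch, using only the release timestamps from the 0x10b4 table and the tid=6 / run-end times from this report:

```python
# Host-time stamps in milliseconds, copied from this report.
RELEASES_MS = [493.7, 493.8, 520.9, 719.9, 856.6]  # tid=5 releases on 0x10b4
TID6_LAST_EVENT_MS = 723.6                         # tid=6's last logged event
RUN_END_MS = 859.2                                 # last logged event overall

# Releases that arrive after tid=6 has already gone quiet.
late_releases = [t for t in RELEASES_MS if t > TID6_LAST_EVENT_MS]

gap_before_release = round(late_releases[0] - TID6_LAST_EVENT_MS, 1)
tail_after_release = round(RUN_END_MS - late_releases[0], 1)

print(late_releases, gap_before_release, tail_after_release)
```

Exactly one release (856.6 ms) lands after tid=6's last event, 133.0 ms into its silence and 2.6 ms before the last logged event of the run.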
### Per-tid event counts and last activity
| tid | events | last host_ns (ms) | last guest_cycle |
|---:|---:|---:|---:|
| 1 | 108,516 | 852.3 | 9,169,116 |
| 5 | 10,031 | 859.2 | 486,334 |
| 4 | 2,075 | 859.2 | 92,705 |
| 13 | 436 | 855.3 | 27,211 |
| 6 | **318** | **723.6** | **6,020,629** |
| 9 | 78 | 819.3 | 689 |
| 8 | 38 | 852.0 | 443 |
| 3 | 37 | 468.5 | 1,030 |
| 2 | 34 | 468.1 | 4,273 |
| 10 | 17 | 819.4 | 103 |
| 11 | 12 | 819.0 | 91 |
| 12 | 6 | 851.6 | 45 |
| 7 | 5 | 500.7 | 30 |
tid=6's 318 events in 723.6 ms of host time is its **complete observable
lifetime in this run**, with the rest of the 13,700 ms wallclock budget
contributing zero further tid=6 events. tid=1 (the main bootstrap) and
tid=5 (the AUDIT-068 dispatcher) continue logging events until ~852-859 ms
host_ns, well past tid=6's quiescence — proving the trace isn't truncated
early; tid=6 specifically is starved.
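A per-tid table like the one above can be derived from the event log with a single pass. The sketch below assumes each JSONL event carries `tid` and `host_ns` fields; both names are assumptions, the helper names are hypothetical, and the sample events are synthetic.

```python
import json
from collections import defaultdict

def per_tid_summary(lines):
    """Return {tid: {"events": n, "last_host_ns": t}} from JSONL event lines."""
    stats = defaultdict(lambda: {"events": 0, "last_host_ns": 0})
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        entry = stats[event["tid"]]
        entry["events"] += 1
        entry["last_host_ns"] = max(entry["last_host_ns"], event["host_ns"])
    return dict(stats)

def quiet_tids(stats, min_gap_ns):
    """Tids whose last event precedes the trace end by more than min_gap_ns."""
    trace_end = max(entry["last_host_ns"] for entry in stats.values())
    return sorted(tid for tid, entry in stats.items()
                  if trace_end - entry["last_host_ns"] > min_gap_ns)

# Synthetic events shaped like the assumed schema.
sample = [
    '{"tid": 6, "host_ns": 498000000}',
    '{"tid": 6, "host_ns": 723600000}',
    '{"tid": 5, "host_ns": 856600000}',
    '{"tid": 5, "host_ns": 859200000}',
    '{"tid": 1, "host_ns": 852300000}',
]
summary = per_tid_summary(sample)
print(quiet_tids(summary, min_gap_ns=100_000_000))  # [6]
```

A tid that goes quiet long before other tids stop logging points at starvation of that thread rather than at a truncated trace.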
## Disambiguation result vs goal-spec
| outcome | gate predicate | result | conclusion |
|---|---|---|---|
| BUDGET-CAP-WAS-ISSUE-WEDGE-DISSOLVED | `signal.match` on 0x12e4 in [1.0, 5.0]s > 0 OR tid=6 last event > 888.5 ms OR exit state changes | NO (0 on 0x12e4; tid=6 last 723.6 ms; exit state bit-identical to 2.M/2.N/2.Q) | **FALSIFIED** |
| C-2-CONFIRMED-STRUCTURAL | 0 signals on 0x12e4 AND tid=6 still Ready/idle at exit | YES + YES | **CONFIRMED** |
| NEW-BEHAVIOR-OBSERVED | event count or wedge map differs from 2.Q | NO (event count identical, wedge map identical) | NOT TRIGGERED |
| RUN-FAILED | non-zero exit / crash / hang | NO (EXIT=0, wall_ms=13,705) | NOT TRIGGERED |
**Result: C-2 (scheduler-fairness Ready-but-not-running on CPU5) confirmed
structural** by 500M-budget reproduction.
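The gate predicates in the table can be written out as boolean expressions. The sketch below is an illustrative encoding only: the dictionary keys are hypothetical names, and the observed values are the ones reported in this document.

```python
def evaluate_gates(obs):
    """Evaluate each goal-spec outcome predicate against observed values."""
    return {
        "BUDGET-CAP-WAS-ISSUE-WEDGE-DISSOLVED": (
            obs["signals_on_12e4_in_window"] > 0
            or obs["tid6_last_event_ms"] > 888.5
            or obs["exit_state_changed"]
        ),
        "C-2-CONFIRMED-STRUCTURAL": (
            obs["signals_on_12e4_total"] == 0 and obs["tid6_ready_at_exit"]
        ),
        "NEW-BEHAVIOR-OBSERVED": (
            obs["event_count"] != obs["event_count_2q"]
            or obs["wedge_map_changed"]
        ),
        "RUN-FAILED": obs["exit_code"] != 0,
    }

# Values reported for the 2.S run.
observed = {
    "signals_on_12e4_in_window": 0,  # signal.match on 0x12e4 in [1.0, 5.0] s
    "signals_on_12e4_total": 0,
    "tid6_last_event_ms": 723.6,
    "tid6_ready_at_exit": True,
    "exit_state_changed": False,     # bit-identical to 2.M/2.N/2.Q
    "event_count": 121_605,
    "event_count_2q": 121_605,
    "wedge_map_changed": False,
    "exit_code": 0,
}
triggered = [name for name, hit in evaluate_gates(observed).items() if hit]
print(triggered)  # ['C-2-CONFIRMED-STRUCTURAL']
```

Returning every predicate's value, rather than a single verdict, avoids assuming a priority order between outcomes that the goal-spec table does not state.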
The C-1 (burst-then-halt by backlog drain) subclass framing **survives as
a partial-cause description** (tid=6's ~228 ms burst, from roughly 498 ms
to 723 ms of host time, is real and finite and matches a backlog-drain
shape), but **cannot be the SOLE cause** because:
1. tid=5 issues a release on tid=6's waited semaphore 0x10b4 at 856.6 ms
(133 ms after tid=6 goes quiescent), which by C-1 alone should rescue tid=6;
2. The 500M budget gives the scheduler ~13s of wallclock to pick tid=6,
which has affinity 0x20 = CPU5 (shared with two other Ready tids
tid=10 and tid=12 — three Ready threads on one HW thread);
3. No tid=6 events appear in the entire post-723.6ms window.
The mechanism that must explain (1)+(2)+(3) is the C-2 scheduler-fairness
issue: tid=6 is on the Ready queue for hw_id=5 but the scheduler is not
context-switching to it. The 5th tid=5 release on 0x10b4 emits a
`signal.match` with tid=6 in the waiter list, yet tid=6 is never
woken and rescheduled before the budget runs out.
Open: whether (a) `wake_eligible_waiters` is correctly transitioning
tid=6 from Blocked→Ready and the scheduler then never re-picks it
(pure scheduler bug), OR (b) `wake_eligible_waiters` is failing to even
issue the wake-request for tid=6 (wake-call bug masquerading as scheduler
issue). 2.T (`wake.requested` instrumentation) decisively distinguishes
these.
## Comparison: 2.K → 2.Q → 2.S
| gate | 2.K (500M, no signal.match) | 2.Q (50M + signal.match) | 2.S (500M + signal.match) |
|------|---:|---:|---:|
| total events | 121,569 | 121,605 | **121,605** |
| baseline events | 121,569 | 121,569 | **121,569** |
| `signal.match` events | n/a | 36 | **36** |
| `signal.match` on 0x12e4 | n/a | 0 | **0** |
| Phase-A events 50M→500M | 0 (vs 2.J) | n/a | **0** |
| exit-state size | 9651 | 9651 | **9651** |
| wedge tids parked at 0x824ac578 | 5 | 5 | **5** |
| tid=6 final state | Ready | Ready | **Ready** |
| Termination | budget hit | (50M) | **budget hit (500M)** |
| Wallclock | 13.96 s | ~5 s | **13.7 s** |
| Engine binary HEAD | `e6d43a23` | `e6d43a23` + 2.Q patch | **`e6d43a23` + 2.Q patch** |
**Bit-equivalent to 2.Q** on every observable. Bit-equivalent to 2.K on
non-`signal.match` events. The 10× budget is observability-null, AND the
2.Q `signal.match` adds no events in the 50M→500M window.
## Tripstone audit
- **#28** (cross-engine tid stability): No cross-engine tid claims made.
Comparisons across 2.K/2.Q/2.S all on ours-side; ours-side scheduler
tids stable for this trajectory.
- **#39** (composite progression IS progression): NO progression claim.
VdSwap=1, draws=0, render_targets=0 — bit-identical to 2.J/2.K/2.Q/2.N.
Matched-prefix unchanged.
- **#40** (single-keystone framing): Carefully NOT collapsing into a
single-cause story. C-2 framing is *confirmed structural* via budget
reproduction, but the underlying mechanism (wake-call-not-issued vs
scheduler-skip) remains open and is what 2.T will distinguish. C-1
burst-then-halt remains partially descriptive (the burst exists) but
cannot be sole cause given (1)+(2)+(3) above.
- **#41** (categorized diff tags): `signal.match` is ENGINE_LOCAL in
the diff harness; doesn't affect matched-prefix.
- **#42** (Phase-A blind to blocked-forever waits): Used 2.M
`exit-thread-state.json` as authoritative for tid=6's Ready state
(Phase-A would have shown only the wait-loop completion events, missing
the actual final Ready geometry). Confirms tid=6 is NOT Blocked, it's
Ready-and-skipped.
## Reading-error #43 candidate — REJECTED
Goal-spec floated reading-error #43 as a candidate if budget-cap dissolved
the wedge. **The wedge did NOT dissolve at 10× budget.** Reading-error #43
is NOT triggered. **Inverse risk** (NOT a new reading-error, but worth
noting): be skeptical of "budget cap probably explains this" framings for
wait-loop wedges — 2.K already showed this, 2.S re-confirms. Any future
iterate that argues "we need more budget" must clear a high bar after
two consecutive 500M reproductions show zero new events.
## Confidence
- **HIGH** that 500M budget was hit cleanly (`exec complete wall_ms=13705
instructions=500000004`, diagnostic re-run).
- **HIGH** that event count is bit-identical to 2.Q (121,605 = 121,605
per `wc -l`).
- **HIGH** that exit-state thread geometry is bit-identical to 2.M/2.N/2.Q
(9651 bytes file, 13 threads, 10 wedge entries, same wedge_map).
- **HIGH** that `signal.match` on 0x000012e4 is 0 in entire 500M window
(exhaustive grep via python jsonl scan).
- **HIGH** that tid=6 last event at 723.6 ms is well below the 859.2 ms
trace-end, proving tid=6 specifically is starved (not a trace
truncation).
- **HIGH** that 2.R's C-2 framing is confirmed structural by this
reproduction — budget cap is NOT the cause.
- **MEDIUM-HIGH** that the underlying mechanism is scheduler-pick-skipping
Ready tid=6 (vs wake-call-not-issued); 2.T will distinguish.
- **HIGH** that 2.R's C-1 partial framing (burst-then-halt) is real but
cannot be sole cause given the 856.6 ms release evidence.
## Next iterate recommendation
**2.T — `wake.requested` instrumentation in `xenia-kernel/exports.rs`
`wake_eligible_waiters` (~80-150 LOC).** Emit a new schema-v1
`wake.requested` event per waiter the wake-loop touches, carrying
{signaler_tid, target_tid, handle, sid, prior_state, post_state,
context_switch_scheduled, ready_queue_position}. This decisively
distinguishes:
- **C-2a (wake-call not issued)**: `signal.match` shows tid=6 in waiter
list at ns=856.6 ms but no corresponding `wake.requested` event for
tid=6 → bug is in `wake_eligible_waiters` waiter-iteration or
per-handle waiter-list registration.
- **C-2b (wake-call issued, scheduler-skip)**: `wake.requested` fires
with `prior_state=Blocked, post_state=Ready, context_switch_scheduled=true`
but tid=6 never actually executes — bug is in the scheduler ready-queue
pick logic (hw_id=5 with affinity-0x20 contention).
C-2a vs C-2b have totally different fix paths (kernel-handle-list bug vs
scheduler-fairness bug), so this disambiguation is high-value. Same
observability-only pattern as 2.Q (zero semantic change). Estimated
~80-150 LOC, ~30 min to implement + ~10 min run + report.
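The C-2a / C-2b decision rule above can be sketched as an offline classifier over the proposed events. This is a hypothetical illustration in Python, not the planned engine change: `WakeRequested` carries only a subset of the proposed fields (`sid` and `ready_queue_position` are omitted), and `classify_wake` and its return labels are invented for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WakeRequested:
    """Subset of the proposed `wake.requested` payload (illustrative only)."""
    signaler_tid: int
    target_tid: int
    handle: int
    prior_state: str
    post_state: str
    context_switch_scheduled: bool

def classify_wake(target_tid, handle, wakes, ran_afterwards):
    """Classify one signal.match that listed `target_tid` as a waiter on `handle`."""
    matching = [w for w in wakes
                if w.target_tid == target_tid and w.handle == handle]
    if not matching:
        return "C-2a"        # wake call never issued for this waiter
    if ran_afterwards:
        return "no-wedge"    # waiter was woken and actually executed
    if any(w.prior_state == "Blocked" and w.post_state == "Ready"
           for w in matching):
        return "C-2b"        # woken to Ready but never picked by the scheduler
    return "inconclusive"

# Hypothetical event for the 856.6 ms release on 0x10b4.
wake = WakeRequested(signaler_tid=5, target_tid=6, handle=0x10B4,
                     prior_state="Blocked", post_state="Ready",
                     context_switch_scheduled=True)
print(classify_wake(6, 0x10B4, [], ran_afterwards=False))      # C-2a
print(classify_wake(6, 0x10B4, [wake], ran_afterwards=False))  # C-2b
```

Keeping the classifier outside the engine preserves the observability-only character of the iterate: the engine would only emit events, and the verdict is computed from the log.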
Alternative deferred:
- **2.U — closure / commit 2.Q patch.** Per [[iterate_2Q_signal_match_2026_05_28]]
the patch is uncommitted in working tree. Commit hygiene if no immediate
follow-up work needed.
- **2.V — canary `signal.match` mirror (~30-60 LOC C++).** Adds parity
for cross-engine SID diff (per AUDIT-062 wrong-slot vs missing-producer
question). Higher long-term ROI than 2.T, but lower immediate
disambiguation value.
**Recommended:** 2.T first (~30-40 min total), then commit 2.Q + 2.T
together as a single observability batch.
## Artifacts
Under `xenia-rs/audit-runs/iterate-2S-longbudget-signal-match/`:
- `ours-cold.jsonl` (28.7 MB, 121,605 events, 500M-instr quiet run)
- `ours-cold.stdout.log` (empty — quiet mode)
- `ours-cold.stderr.log` (single 2.M emission notice line — bit-equivalent
to 2.Q's stderr)
- `exit-thread-state.json` (9651 bytes; bit-identical to 2.M/2.N/2.Q —
13 threads + 10 wedge entries)
- `writer-report.md` (this file)
Engine HEAD `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` + uncommitted
diff sha256 `e81a4b84224ab07330a0af259589e928` (2.Q `signal.match` patch
+ prior retained 2.F/2.H/2.L/2.M patches). xenia-canary UNCHANGED.