# Iterate 2.S: Long-budget (500M) replay with 2.Q `signal.match` active (writer report)

**Date:** 2026-05-28. **LOC delta:** engine **0**, canary **0**, tooling **0**.
Pure measurement.
**Tests:** N/A (no source modifications).
**Cascade:** N/A (observability replay only).

## Headline

**BUDGET-CAP-FALSIFIED / C-2-SCHEDULER-FAIRNESS-CONFIRMED-STRUCTURAL.**
500M-instruction replay (10× 2.Q's 50M) under `XENIA_CACHE_WIPE=1` with
2.Q `signal.match` instrumentation active emits **121,605 events,
bit-identical to 2.Q's 50M run.** Run terminates `EXIT=0` at wallclock
13.7 s on `reached max instruction count limit=500000000`. **Zero
`signal.match` events on wedge handle `0x000012e4` (or any of the
4 unsignaled wedge handles {0x12c8, 0x12d0, 0x12e4, 0x1020}) in the
entire 500M-instruction window.** Exit-state thread geometry is bit-identical
to 2.M/2.N/2.Q (13 threads, 10 wedge entries, same wedge map). **tid=6
remains `Ready` on `hw_id=5` with no resumption** despite the engine
having 10× the instruction budget (13.7 s of wallclock) in which to
schedule it. Combined with 2.K's identical "zero new events 50M→500M"
result, this **definitively rules out reading the C-1 burst-then-halt
pattern as a budget-truncation artifact** and **confirms 2.R's C-2
(Ready-but-not-running on CPU5) as structural**. Next iterate should be
2.T (`wake.requested` instrumentation in `wake_eligible_waiters`) to
decisively distinguish kernel-wake-call-not-issued from
scheduler-pick-skipping-Ready-tid-6.

## Mode

ZERO LOC. Invocation (identical to 2.K except cwd):

```
XENIA_CACHE_WIPE=1 timeout 600 ./target/release/xenia-rs exec \
  -n 500000000 --quiet \
  --phase-a-event-log audit-runs/iterate-2S-longbudget-signal-match/ours-cold.jsonl \
  "../Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso"
```

Engine binary `xenia-rs/target/release/xenia-rs` from May 28 19:51 carries
the uncommitted 2.Q `signal.match` patch (working tree HEAD
`e6d43a23…` + diff sha256 `e81a4b84…`). XDG cache
`/home/fabi/.local/share/xenia-rs/cache/` was empty before the run;
`XENIA_CACHE_WIPE=1` was set as belt-and-braces.

Run completed `EXIT=0`. Diagnostic re-run (non-quiet) captured:
`reached max instruction count limit=500000000` ...
`exec complete wall_ms=13705 instructions=500000004 import_calls=40390 unimplemented=0`.
Instruction budget hit cleanly, no panic / fault / SIGSEGV / timeout.

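
The "budget hit cleanly" check can be sketched as a small parser over the `exec complete` line quoted above. This is only an illustration: the helper name `parse_exec_complete` is hypothetical, and it assumes the log line keeps the `key=value` layout shown in the diagnostic re-run.

```python
import re

# Log line as quoted in the diagnostic re-run above.
LINE = ("exec complete wall_ms=13705 instructions=500000004 "
        "import_calls=40390 unimplemented=0")
BUDGET = 500_000_000  # value passed via -n in the invocation


def parse_exec_complete(line: str) -> dict:
    """Extract integer key=value pairs (hypothetical helper, assumed layout)."""
    return {key: int(val) for key, val in re.findall(r"(\w+)=(\d+)", line)}


stats = parse_exec_complete(LINE)
overshoot = stats["instructions"] - BUDGET
# A clean budget hit: the limit was reached and nothing was unimplemented.
clean = overshoot >= 0 and stats["unimplemented"] == 0
print(stats["wall_ms"], overshoot, clean)
```

The small overshoot past the limit is what the report's `instructions=500000004` implies; the sketch does not explain why the engine stops a few instructions late.
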
## Primary gate results

### Gate 1: `signal.match` events on wedge handle `0x000012e4`

| metric | value |
|---|---:|
| `signal.match` on `0x000012e4` whole run | **0** |
| `signal.match` on `0x000012e4` in [1.0, 5.0] s | **0** |
| `signal.match` total (run-wide) | 36 |

**Same as 2.Q.** No signaler ever targets `0x000012e4`. The disambiguation
gate from the goal-spec resolves to "C-2 confirmed structural", since
neither the larger budget nor the added signal observability changed the
outcome.

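
A minimal sketch of how the Gate 1 counts could be taken from the Phase-A JSONL log. The field names `kind`, `handle` and `host_ns` are assumptions about the event schema (the report does not spell it out), and the two sample lines are synthetic, not taken from `ours-cold.jsonl`.

```python
import json


def count_signal_match(lines, handle, window_s=None):
    """Count signal.match events on `handle`, optionally inside [lo, hi] seconds.

    Assumed (hypothetical) schema: each JSONL line has `kind`, `handle`
    (hex string or int) and `host_ns` (nanoseconds since run start).
    """
    count = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        ev = json.loads(raw)
        if ev.get("kind") != "signal.match":
            continue
        h = ev.get("handle")
        h = int(h, 16) if isinstance(h, str) else h
        if h != handle:
            continue
        if window_s is not None:
            t = ev.get("host_ns", 0) / 1e9
            if not (window_s[0] <= t <= window_s[1]):
                continue
        count += 1
    return count


# Synthetic sample lines, for illustration only.
sample = [
    '{"kind": "signal.match", "handle": "0x000010b4", "host_ns": 856600000}',
    '{"kind": "kernel.call", "handle": "0x000012e4", "host_ns": 900000000}',
]
print(count_signal_match(sample, 0x12E4))
print(count_signal_match(sample, 0x10B4))
print(count_signal_match(sample, 0x10B4, (1.0, 5.0)))
```

On a real log the same function would be fed an open file handle, since it only iterates over lines.
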
### Gate 2: Exit-thread-state on tid=6 (and other wedge tids)

`exit-thread-state.json` is 9651 bytes, bit-identical to 2.M/2.N/2.Q. tid=6
state and full wedge geometry are unchanged:

| tid | state | hw_id | affinity | last_pc | wedge waiting on |
|---:|---|---:|---|---|---|
| 1 | Blocked | 0 | 0xff | 0x824ac578 | 0x12c8 = Thread(13) |
| 2 | Blocked | 1 | 0xff | 0x824a95f8 | 0x8287093c = Event |
| 3 | Blocked | 5 | 0x20 | 0x824ac578 | 0x1020 = Event |
| 4 | Blocked | 3 | 0x08 | 0x824ac578 | 0x1028 = Semaphore(0/2³¹-1) |
| 5 | Blocked | 3 | 0x08 | 0x824ac578 | **0x12e4 = Event** |
| **6** | **Ready** | **5** | **0x20** | **0x824ab214** | none |
| 7 | Blocked | 2 | 0x04 | 0x824cd4f4 | 0xbe8cbb5c = Event |
| 8 | Blocked | 2 | 0x04 | 0x824ab214 | 0x10ec = Event + 0x10d8 = Sem |
| 9 | Ready | 4 | 0x10 | 0x824d1404 | none |
| 10 | Ready | 5 | 0x20 | 0x824d1404 | none |
| 11 | Blocked | 0 | 0xff | 0x824d2a94 | 0x828a3244 + 0x828a3220 |
| 12 | Ready | 5 | 0x20 | 0x824aa6a4 | none |
| 13 | Blocked | 1 | 0x02 | 0x824ac578 | 0x12d0 = Event |

**tid=6 STILL Ready at hw_id=5**, exactly as 2.R observed. Even with 10×
the budget, the scheduler did not resume tid=6.

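
The Ready-thread contention on `hw_id=5` can be read straight off the Gate 2 table. The sketch below hard-codes the table rows rather than parsing `exit-thread-state.json`, whose layout is not shown in this report; the helper names are illustrative.

```python
from collections import defaultdict

# (tid, state, hw_id, affinity) rows copied from the Gate 2 table.
THREADS = [
    (1, "Blocked", 0, 0xFF), (2, "Blocked", 1, 0xFF), (3, "Blocked", 5, 0x20),
    (4, "Blocked", 3, 0x08), (5, "Blocked", 3, 0x08), (6, "Ready", 5, 0x20),
    (7, "Blocked", 2, 0x04), (8, "Blocked", 2, 0x04), (9, "Ready", 4, 0x10),
    (10, "Ready", 5, 0x20), (11, "Blocked", 0, 0xFF), (12, "Ready", 5, 0x20),
    (13, "Blocked", 1, 0x02),
]


def ready_by_hw(threads):
    """Group Ready tids by the hardware thread they sit on."""
    groups = defaultdict(list)
    for tid, state, hw_id, _affinity in threads:
        if state == "Ready":
            groups[hw_id].append(tid)
    return dict(groups)


def pinned_cpu(affinity):
    """Return the CPU index for a single-bit affinity mask, else None."""
    if affinity and affinity & (affinity - 1) == 0:
        return affinity.bit_length() - 1
    return None


contended = {hw: tids for hw, tids in ready_by_hw(THREADS).items() if len(tids) > 1}
print(contended)         # {5: [6, 10, 12]}
print(pinned_cpu(0x20))  # 5
```

Three Ready threads pinned to one hardware thread is the geometry the C-2 framing rests on; the sketch only restates it and says nothing about why tid=6 is not picked.
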
### Gate 3: tid=5 last guest_cycle

| metric | 2.R (50M jitter sample) | 2.S (500M run) | delta |
|---|---:|---:|---:|
| tid=5 last guest_cycle | not reported separately (wedge wait at host_ns 1,007,809,113) | **486,334** | n/a |
| tid=5 last host_ns | 1,007,809,113 | **859,219,713** | lower (jitter, not regression) |

Note: `host_ns` is wallclock-derived and varies from one jitter sample to
the next. `guest_cycle` is the deterministic guest-side counter; tid=5's
last guest_cycle of 486,334 is bit-equivalent across 2.Q / 2.S (same
Phase-A event content).

## Secondary gate results

### Total event counts

| metric | 2.K (500M baseline, no signal.match) | 2.Q (50M + signal.match) | 2.S (500M + signal.match) |
|---|---:|---:|---:|
| total events | 121,569 | 121,605 | **121,605** |
| `signal.match` events | 0 (kind not emitted) | 36 | **36** |
| baseline events (excluding signal.match) | 121,569 | 121,569 | **121,569** |
| Phase-A delta 50M→500M | 0 (vs 2.J) | n/a | **0** |
| Wallclock | 13.96 s | not reported (~5 s) | **13.7 s** |
| Termination reason | `reached max instruction count limit` | (50M) | **`reached max instruction count limit=500000000`** |

**Bit-identical event count to 2.Q.** The 10× budget cost more wallclock
(13.7 s against roughly 5 s for 2.Q, which was not formally reported)
but produced **zero additional Phase-A events**.

### `signal.match` by signaler tid (whole run)

| tid | count | target handles |
|---:|---:|---|
| 5 | 19 | 0x1028×7, 0x10b4×5, 0x103c, 0x1068, 0x10a0, 0x10fc, 0x1128, 0x1160, 0x11a0 |
| 1 | 9 | 0x1044×7, 0x10d8, 0x10dc |
| 6 | **3** | 0x10ac, 0x1108, 0x116c |
| 13 | 1 | 0x1044 |
| 2 | 1 | 0x8287094c |
| 11 | 1 | 0x828a3254 |
| 9 | 1 | 0x828a3230 |
| 8 | 1 | 0x000012c0 |

**tid=6 fires only 3 `signal.match` events**: 3 of its 41 `NtSetEvent`
calls land on a parked waiter, namely tid=5's 3 satisfied
`NtWaitForSingleObjectEx` calls per [[iterate_2R_missing_producer_2026_05_28]]'s
per-wait table. The other 38 `NtSetEvent` calls land on already-signaled
or no-waiter events. This is consistent with the canary tid=11 polling-loop
behavior (the analog of ours tid=6) issuing many "ensure signaled" sets
on a manual-reset event that has no waiter.

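
As a consistency check, the per-signaler rows above should add up to the run-wide total of 36, and subtracting that from 121,605 should give the 121,569 baseline. A short sketch using only numbers from the two tables:

```python
# tid -> (reported count, per-handle multiplicities from the table above)
BY_SIGNALER = {
    5: (19, [7, 5, 1, 1, 1, 1, 1, 1, 1]),
    1: (9, [7, 1, 1]),
    6: (3, [1, 1, 1]),
    13: (1, [1]),
    2: (1, [1]),
    11: (1, [1]),
    9: (1, [1]),
    8: (1, [1]),
}

TOTAL_EVENTS = 121_605  # 2.S total event count

# Each row's handle multiplicities must match its reported count.
for tid, (count, mults) in BY_SIGNALER.items():
    assert sum(mults) == count, f"row for tid={tid} is inconsistent"

signal_match_total = sum(count for count, _ in BY_SIGNALER.values())
baseline = TOTAL_EVENTS - signal_match_total
print(signal_match_total, baseline)  # 36 121569
```
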
### tid=5 NtReleaseSemaphore on handle 0x000010b4 (the tid=6 backlog feeder)

`signal.match` on `0x000010b4` (tid=6 is the sole waiter per 2.Q snapshot):

| host_ns (ms) | signaler | waiters |
|---:|---:|---|
| 493.7 | tid=5 | [6] |
| 493.8 | tid=5 | [6] |
| 520.9 | tid=5 | [6] |
| 719.9 | tid=5 | [6] |
| **856.6** | **tid=5** | **[6]** |

**5 releases on 0x10b4 targeting tid=6 as parked waiter.** Critically,
**the tid=5 release at 856.6 ms fires AFTER tid=6's last `NtSetEvent`
`signal.match` at 723.5 ms.** That release should wake tid=6, but tid=6
never reschedules: its last event of any kind is at 723.6 ms, ~133 ms
before the 856.6 ms release, and the trace ends at 859.2 ms, only ~2.6 ms
after the release, with no tid=6 activity. This is the same starvation
pattern 2.R documented for a different jitter sample (2.R had tid=5
issuing 76 releases in [880, 991] ms). **2.S confirms the pattern is
reproducible across jitter samples and across budgets.**

(2.R's "76 releases" appears to have come from parsing raw `kernel.call`
args rather than `signal.match`; 2.S has only 5 because `signal.match`
filters to events where waiter_count ≥ 1. The other ~70 releases must
have been on different handles or had no waiter present at signal time.
Either way the wake-pattern conclusion is the same.)

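
The timing argument above is plain subtraction on three numbers from this report. A worked version, with the values in milliseconds of `host_ns`:

```python
# Release times on 0x10b4 (ms), from the table above.
RELEASES_MS = [493.7, 493.8, 520.9, 719.9, 856.6]
TID6_LAST_EVENT_MS = 723.6  # tid=6's last event of any kind
TRACE_END_MS = 859.2        # last event in the trace (tid=5 / tid=4)

# Releases that arrive after tid=6 has gone quiet.
late = [t for t in RELEASES_MS if t > TID6_LAST_EVENT_MS]

# Gap between tid=6's last event and the late release, and the time left
# in the trace after that release (rounded to 0.1 ms like the report).
quiet_gap_ms = round(late[0] - TID6_LAST_EVENT_MS, 1)
tail_ms = round(TRACE_END_MS - late[0], 1)
print(late, quiet_gap_ms, tail_ms)  # [856.6] 133.0 2.6
```

Only one of the five releases lands after tid=6's last event, which is why the whole wake-or-skip question hinges on the 856.6 ms release.
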
### Per-tid event counts and last activity

| tid | events | last host_ns (ms) | last guest_cycle |
|---:|---:|---:|---:|
| 1 | 108,516 | 852.3 | 9,169,116 |
| 5 | 10,031 | 859.2 | 486,334 |
| 4 | 2,075 | 859.2 | 92,705 |
| 13 | 436 | 855.3 | 27,211 |
| 6 | **318** | **723.6** | **6,020,629** |
| 9 | 78 | 819.3 | 689 |
| 8 | 38 | 852.0 | 443 |
| 3 | 37 | 468.5 | 1,030 |
| 2 | 34 | 468.1 | 4,273 |
| 10 | 17 | 819.4 | 103 |
| 11 | 12 | 819.0 | 91 |
| 12 | 6 | 851.6 | 45 |
| 7 | 5 | 500.7 | 30 |

tid=6's 318 events in 723.6 ms of host time are its **complete observable
lifetime in this run**; the rest of the 13,700 ms wallclock budget
contributes zero further tid=6 events. tid=1 (the main bootstrap) and
tid=5 (the AUDIT-068 dispatcher) continue logging events until ~852-859 ms
host_ns, well past tid=6's quiescence. This shows the trace is not
truncated early; tid=6 specifically is starved.

## Disambiguation result vs goal-spec

| outcome | gate predicate | result | conclusion |
|---|---|---|---|
| BUDGET-CAP-WAS-ISSUE-WEDGE-DISSOLVED | `signal.match` on 0x12e4 in [1.0, 5.0] s > 0 OR tid=6 last event > 888.5 ms OR exit state changes | NO (0 on 0x12e4; tid=6 last 723.6 ms; exit state bit-identical to 2.M/2.N/2.Q) | **FALSIFIED** |
| C-2-CONFIRMED-STRUCTURAL | 0 signals on 0x12e4 AND tid=6 still Ready/idle at exit | YES + YES | **CONFIRMED** |
| NEW-BEHAVIOR-OBSERVED | event count or wedge map differs from 2.Q | NO (event count identical, wedge map identical) | NOT TRIGGERED |
| RUN-FAILED | non-zero exit / crash / hang | NO (EXIT=0, wall_ms=13,705) | NOT TRIGGERED |

**Result: C-2 (scheduler-fairness Ready-but-not-running on CPU5) confirmed
structural** by 500M-budget reproduction.

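
The gate predicates in the table can be written down as a small decision function. This is a sketch, not the goal-spec's own tooling: the `RunFacts` type and its field names are hypothetical, the evaluation order is an assumption (the table does not rank the outcomes), and the "crash / hang" part of RUN-FAILED is reduced to the exit code.

```python
from dataclasses import dataclass


@dataclass
class RunFacts:
    """Hypothetical container for the observables the gates refer to."""
    exit_code: int
    wedge_matches_in_window: int  # signal.match on 0x12e4 in [1.0, 5.0] s
    wedge_matches_total: int      # signal.match on 0x12e4, whole run
    tid6_last_event_ms: float
    tid6_exit_state: str
    exit_state_unchanged: bool    # vs 2.M/2.N/2.Q
    event_count: int
    prior_event_count: int        # 2.Q
    wedge_map_unchanged: bool


def classify(f: RunFacts) -> str:
    # Order of checks is an assumption made for this sketch.
    if f.exit_code != 0:
        return "RUN-FAILED"
    if (f.wedge_matches_in_window > 0 or f.tid6_last_event_ms > 888.5
            or not f.exit_state_unchanged):
        return "BUDGET-CAP-WAS-ISSUE-WEDGE-DISSOLVED"
    if f.event_count != f.prior_event_count or not f.wedge_map_unchanged:
        return "NEW-BEHAVIOR-OBSERVED"
    if f.wedge_matches_total == 0 and f.tid6_exit_state == "Ready":
        return "C-2-CONFIRMED-STRUCTURAL"
    return "UNCLASSIFIED"


# Values reported for 2.S in this document.
run_2s = RunFacts(0, 0, 0, 723.6, "Ready", True, 121_605, 121_605, True)
print(classify(run_2s))  # C-2-CONFIRMED-STRUCTURAL
```
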
The C-1 (burst-then-halt by backlog drain) subclass framing **survives as
a partial-cause description** (tid=6's 228 ms burst from ns=498 to 723 is
real and finite, and matches a backlog-drain shape), but **cannot be the
SOLE cause** because:

1. tid=5 issues a release on tid=6's waited semaphore 0x10b4 at ns=856.6 ms
   (133 ms after tid=6 went quiescent), which by C-1 alone should rescue tid=6;
2. The 500M budget gives the scheduler ~13 s of wallclock to pick tid=6,
   which has affinity 0x20 = CPU5 (shared with two other Ready tids,
   tid=10 and tid=12: three Ready threads on one HW thread);
3. No tid=6 events appear in the entire post-723.6 ms window.

The mechanism that must explain (1)+(2)+(3) is the C-2 scheduler-fairness
issue: tid=6 is on the Ready queue for hw_id=5 but the scheduler is not
context-switching to it. The 5th tid=5 release on 0x10b4 produces a
`signal.match` event with tid=6 in the waiter list, yet tid=6 is not
actually woken and rescheduled before the budget runs out.

Open: whether (a) `wake_eligible_waiters` is correctly transitioning
tid=6 from Blocked→Ready and the scheduler then never re-picks it
(pure scheduler bug), OR (b) `wake_eligible_waiters` is failing to even
issue the wake-request for tid=6 (wake-call bug masquerading as scheduler
issue). 2.T (`wake.requested` instrumentation) decisively distinguishes
these.

## Comparison: 2.K → 2.Q → 2.S

| gate | 2.K (500M, no signal.match) | 2.Q (50M + signal.match) | 2.S (500M + signal.match) |
|------|---:|---:|---:|
| total events | 121,569 | 121,605 | **121,605** |
| baseline events | 121,569 | 121,569 | **121,569** |
| `signal.match` events | n/a | 36 | **36** |
| `signal.match` on 0x12e4 | n/a | 0 | **0** |
| Phase-A events 50M→500M | 0 (vs 2.J) | n/a | **0** |
| exit-state size | 9651 | 9651 | **9651** |
| wedge tids parked at 0x824ac578 | 5 | 5 | **5** |
| tid=6 final state | Ready | Ready | **Ready** |
| Termination | budget hit | (50M) | **budget hit (500M)** |
| Wallclock | 13.96 s | ~5 s | **13.7 s** |
| Engine binary HEAD | `e6d43a23` | `e6d43a23` + 2.Q patch | **`e6d43a23` + 2.Q patch** |

**Bit-equivalent to 2.Q** on every observable. Bit-equivalent to 2.K on
non-`signal.match` events. The 10× budget is observability-null, AND the
2.Q `signal.match` adds no events in the 50M→500M window.

## Tripstone audit

- **#28** (cross-engine tid stability): No cross-engine tid claims made.
  Comparisons across 2.K/2.Q/2.S are all on ours-side; ours-side scheduler
  tids are stable for this trajectory.
- **#39** (composite progression IS progression): NO progression claim.
  VdSwap=1, draws=0, render_targets=0, bit-identical to 2.J/2.K/2.Q/2.N.
  Matched-prefix unchanged.
- **#40** (single-keystone framing): Carefully NOT collapsing into a
  single-cause story. C-2 framing is *confirmed structural* via budget
  reproduction, but the underlying mechanism (wake-call-not-issued vs
  scheduler-skip) remains open and is what 2.T will distinguish. C-1
  burst-then-halt remains partially descriptive (the burst exists) but
  cannot be sole cause given (1)+(2)+(3) above.
- **#41** (categorized diff tags): `signal.match` is ENGINE_LOCAL in
  the diff harness; it does not affect matched-prefix.
- **#42** (Phase-A blind to blocked-forever waits): Used 2.M
  `exit-thread-state.json` as authoritative for tid=6's Ready state
  (Phase-A would have shown only the wait-loop completion events, missing
  the actual final Ready geometry). Confirms tid=6 is NOT Blocked; it is
  Ready-and-skipped.

## Reading-error #43 candidate: REJECTED

Goal-spec floated reading-error #43 as a candidate if budget-cap dissolved
the wedge. **The wedge did NOT dissolve at 10× budget.** Reading-error #43
is NOT triggered. **Inverse risk** (NOT a new reading-error, but worth
noting): be skeptical of "budget cap probably explains this" framings for
wait-loop wedges. 2.K already showed this, and 2.S re-confirms it. Any future
iterate that argues "we need more budget" must clear a high bar after
two consecutive 500M reproductions show zero new events.

## Confidence

- **HIGH** that the 500M budget was hit cleanly (`exec complete wall_ms=13705
  instructions=500000004`, diagnostic re-run).
- **HIGH** that the event count is bit-identical to 2.Q (121,605 = 121,605
  per `wc -l`).
- **HIGH** that exit-state thread geometry is bit-identical to 2.M/2.N/2.Q
  (9651-byte file, 13 threads, 10 wedge entries, same wedge_map).
- **HIGH** that `signal.match` on 0x000012e4 is 0 in the entire 500M window
  (exhaustive grep via python jsonl scan).
- **HIGH** that tid=6's last event at 723.6 ms is well below the 859.2 ms
  trace-end, showing that tid=6 specifically is starved (not a trace
  truncation).
- **HIGH** that 2.R's C-2 framing is confirmed structural by this
  reproduction; budget cap is NOT the cause.
- **MEDIUM-HIGH** that the underlying mechanism is scheduler-pick-skipping
  Ready tid=6 (vs wake-call-not-issued); 2.T will distinguish.
- **HIGH** that 2.R's C-1 partial framing (burst-then-halt) is real but
  cannot be sole cause given the 856.6 ms release evidence.

## Next iterate recommendation

**2.T: `wake.requested` instrumentation in `xenia-kernel/exports.rs`
`wake_eligible_waiters` (~80-150 LOC).** Emit a new schema-v1
`wake.requested` event per waiter the wake-loop touches, carrying
{signaler_tid, target_tid, handle, sid, prior_state, post_state,
context_switch_scheduled, ready_queue_position}. This decisively
distinguishes:

- **C-2a (wake-call not issued)**: `signal.match` shows tid=6 in the waiter
  list at ns=856.6 ms but there is no corresponding `wake.requested` event
  for tid=6 → bug is in `wake_eligible_waiters` waiter-iteration or
  per-handle waiter-list registration.
- **C-2b (wake-call issued, scheduler-skip)**: `wake.requested` fires
  with `prior_state=Blocked, post_state=Ready, context_switch_scheduled=true`
  but tid=6 never actually executes → bug is in the scheduler ready-queue
  pick logic (hw_id=5 with affinity-0x20 contention).

C-2a and C-2b have totally different fix paths (kernel-handle-list bug vs
scheduler-fairness bug), so this disambiguation is high-value. Same
observability-only pattern as 2.Q (zero semantic change). Estimated
~80-150 LOC, ~30 min to implement + ~10 min run + report.

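
Once `wake.requested` exists, the C-2a / C-2b split could be decided offline along these lines. This is a sketch under stated assumptions: `wake.requested` is not implemented yet, its fields are taken from the proposal above, the `waiters` field on `signal.match` is assumed, and the sample events are synthetic. Matching a wake to a signal by `handle` alone is a simplification; a real pass would likely also correlate on `sid` or timestamp.

```python
def classify_wake(signal, wakes, target_tid, ran_afterwards):
    """Classify one signal.match against the proposed wake.requested events.

    signal: dict with assumed keys `handle` and `waiters`.
    wakes: list of dicts shaped like the proposed wake.requested payload.
    ran_afterwards: True if target_tid logged any event after the signal.
    """
    if target_tid not in signal["waiters"]:
        return "not-a-waiter"
    hits = [w for w in wakes
            if w["target_tid"] == target_tid and w["handle"] == signal["handle"]]
    if not hits:
        return "C-2a"  # waiter listed, but no wake was ever requested
    last = hits[-1]
    woke = (last["prior_state"] == "Blocked"
            and last["post_state"] == "Ready"
            and last["context_switch_scheduled"])
    if woke and not ran_afterwards:
        return "C-2b"  # wake issued, thread never ran
    return "inconclusive"


# Synthetic events, for illustration only.
sig = {"handle": 0x10B4, "waiters": [6]}
wake = {"signaler_tid": 5, "target_tid": 6, "handle": 0x10B4,
        "prior_state": "Blocked", "post_state": "Ready",
        "context_switch_scheduled": True}

print(classify_wake(sig, [], 6, ran_afterwards=False))      # C-2a
print(classify_wake(sig, [wake], 6, ran_afterwards=False))  # C-2b
```
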
Alternative deferred:

- **2.U: closure / commit 2.Q patch.** Per [[iterate_2Q_signal_match_2026_05_28]]
  the patch is uncommitted in the working tree. Commit hygiene if no immediate
  follow-up work is needed.
- **2.V: canary `signal.match` mirror (~30-60 LOC C++).** Adds parity
  for cross-engine SID diff (per AUDIT-062 wrong-slot vs missing-producer
  question). Higher long-term ROI but lower than 2.T's immediate
  disambiguation value.

**Recommended:** 2.T first (~30-40 min total), then commit 2.Q + 2.T
together as a single observability batch.

## Artifacts

Under `xenia-rs/audit-runs/iterate-2S-longbudget-signal-match/`:

- `ours-cold.jsonl` (28.7 MB, 121,605 events, 500M-instr quiet run)
- `ours-cold.stdout.log` (empty; quiet mode)
- `ours-cold.stderr.log` (single 2.M emission notice line, bit-equivalent
  to 2.Q's stderr)
- `exit-thread-state.json` (9651 bytes; bit-identical to 2.M/2.N/2.Q;
  13 threads + 10 wedge entries)
- `writer-report.md` (this file)

Engine HEAD `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` + uncommitted
diff sha256 `e81a4b84224ab07330a0af259589e928` (2.Q `signal.match` patch
+ prior retained 2.F/2.H/2.L/2.M patches). xenia-canary UNCHANGED.