
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-05 07:19:08 +02:00


Iterate 2.S — Long-budget (500M) replay with 2.Q `signal.match` active (writer report)

Date: 2026-05-28. LOC delta: engine 0, canary 0, tooling 0. Pure measurement. Tests: N/A (no source modifications). Cascade: N/A — observability replay only.

Headline

BUDGET-CAP-FALSIFIED / C-2-SCHEDULER-FAIRNESS-CONFIRMED-STRUCTURAL. A 500M-instruction replay (10× 2.Q's 50M) under XENIA_CACHE_WIPE=1 with 2.Q signal.match instrumentation active emits 121,605 events, bit-identical to 2.Q's 50M run. The run terminates with EXIT=0 after 13.7 s of wallclock on `reached max instruction count limit=500000000`. There are zero signal.match events on wedge handle 0x000012e4 (or on any of the 4 unsignaled wedge handles {0x12c8, 0x12d0, 0x12e4, 0x1020}) in the entire 500M-instruction window. Exit-state thread geometry is bit-identical to 2.M/2.N/2.Q (13 threads, 10 wedge entries, same wedge map). tid=6 remains Ready on hw_id=5 and never resumes, even though the engine had ~13.7 s of wallclock (vs ~5 s for 2.Q) in which to schedule it. Combined with 2.K's identical "zero new events 50M→500M" result, this rules out budget truncation as the explanation for the C-1 burst-then-halt pattern and confirms 2.R's C-2 (Ready-but-not-running on CPU5) as structural. The next iterate should be 2.T (wake.requested instrumentation in wake_eligible_waiters) to decisively distinguish kernel-wake-call-not-issued from scheduler-pick-skipping-Ready-tid-6.

Mode

ZERO LOC. Invocation (identical to 2.K except cwd):

XENIA_CACHE_WIPE=1 timeout 600 ./target/release/xenia-rs exec \
  -n 500000000 --quiet \
  --phase-a-event-log audit-runs/iterate-2S-longbudget-signal-match/ours-cold.jsonl \
  "../Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso"

Engine binary xenia-rs/target/release/xenia-rs (dated May 28 19:51) carries the uncommitted 2.Q signal.match patch (working tree HEAD e6d43a23… + diff sha256 e81a4b84…). The XDG cache /home/fabi/.local/share/xenia-rs/cache/ was empty before the run; XENIA_CACHE_WIPE=1 was set for belt-and-braces.

Run completed EXIT=0. Diagnostic re-run (non-quiet) captured: reached max instruction count limit=500000000 ... exec complete wall_ms=13705 instructions=500000004 import_calls=40390 unimplemented=0. Instruction budget hit cleanly, no panic / fault / SIGSEGV / timeout.
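The clean-stop check can be made mechanical. The sketch below relies only on the `key=value` shape of the quoted `exec complete` line; `parse_kv` and `budget_hit_cleanly` are hypothetical helpers, not xenia-rs functions. The report treats 500000004 executed instructions against a 500000000 budget as a clean stop, so the overshoot tolerance passed in is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Collects the numeric `key=value` tokens of a summary line; bare words such
/// as "exec" or "complete" are skipped.
pub fn parse_kv(line: &str) -> HashMap<String, u64> {
    line.split_whitespace()
        .filter_map(|tok| {
            let (key, value) = tok.split_once('=')?;
            value.parse::<u64>().ok().map(|n| (key.to_string(), n))
        })
        .collect()
}

/// True if the run reached the instruction budget (allowing a small overshoot,
/// whose size is an assumption here) and hit no unimplemented imports.
pub fn budget_hit_cleanly(kv: &HashMap<String, u64>, budget: u64, max_overshoot: u64) -> bool {
    match (kv.get("instructions"), kv.get("unimplemented")) {
        (Some(&executed), Some(&unimplemented)) => {
            executed >= budget && executed - budget <= max_overshoot && unimplemented == 0
        }
        _ => false,
    }
}
```

A run that stopped early (fewer instructions than the budget) or touched an unimplemented import fails the check.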

Primary gate results

Gate 1 — `signal.match` events on wedge handle `0x000012e4`

metric	value
`signal.match` on `0x000012e4` whole run	0
`signal.match` on `0x000012e4` in [1.0, 5.0] s	0
`signal.match` total (run-wide)	36

Same as 2.Q: no signaler ever signals 0x000012e4. The goal-spec's disambiguation gate therefore resolves to "C-2 confirmed structural": raising the budget cap 10× changed neither the signal observations nor the wedge.
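Gate 1 reduces to a filter-and-count over the event stream. A minimal sketch, assuming events have already been decoded into a hypothetical `Event` struct with `kind`, `host_ns` and `handle` fields; the real ours-cold.jsonl field names may differ, and the report's own count came from a Python JSONL scan.

```rust
/// Decoded Phase-A event (hypothetical field names, not the real schema).
pub struct Event {
    pub kind: &'static str,
    pub host_ns: u64,
    pub handle: u32,
}

/// Counts `signal.match` events on `handle`; `window` is an optional
/// inclusive [start_ns, end_ns] host-time range.
pub fn count_signal_match(events: &[Event], handle: u32, window: Option<(u64, u64)>) -> usize {
    events
        .iter()
        .filter(|e| e.kind == "signal.match" && e.handle == handle)
        .filter(|e| match window {
            Some((start, end)) => start <= e.host_ns && e.host_ns <= end,
            None => true,
        })
        .count()
}
```

The two Gate 1 rows correspond to `window = None` and `window = Some((1_000_000_000, 5_000_000_000))`.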

Gate 2 — Exit-thread-state on tid=6 (and other wedge tids)

exit-thread-state.json (9651 bytes) is bit-identical to 2.M/2.N/2.Q; tid=6's state and the full wedge geometry are unchanged:

tid	state	hw_id	affinity	last_pc	wedge waiting on
1	Blocked	0	0xff	0x824ac578	0x12c8 = Thread(13)
2	Blocked	1	0xff	0x824a95f8	0x8287093c = Event
3	Blocked	5	0x20	0x824ac578	0x1020 = Event
4	Blocked	3	0x08	0x824ac578	0x1028 = Semaphore(0/2³¹-1)
5	Blocked	3	0x08	0x824ac578	0x12e4 = Event
6	Ready	5	0x20	0x824ab214	—
7	Blocked	2	0x04	0x824cd4f4	0xbe8cbb5c = Event
8	Blocked	2	0x04	0x824ab214	0x10ec=Event + 0x10d8=Sem
9	Ready	4	0x10	0x824d1404	—
10	Ready	5	0x20	0x824d1404	—
11	Blocked	0	0xff	0x824d2a94	0x828a3244 + 0x828a3220
12	Ready	5	0x20	0x824aa6a4	—
13	Blocked	1	0x02	0x824ac578	0x12d0 = Event

tid=6 STILL Ready at hw_id=5 — exactly as 2.R observed. The 10× budget did not allow the scheduler to resume tid=6.
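The affinity column is a bitmask in which bit i selects hardware thread i, so 0x20 selects CPU5 only, as the report reads it. The sketch below (hypothetical helper names) decodes masks, assuming Xenon's six hardware threads, and groups the exit-time Ready tids by hw_id. On the table above it places tid=6, tid=10 and tid=12 together on hw_id=5.

```rust
use std::collections::BTreeMap;

/// Xenon exposes six hardware threads (3 cores x 2); bits above 5 are ignored.
const HW_THREADS: u8 = 6;

/// Hardware threads selected by an affinity mask (bit i = hw thread i).
pub fn hw_threads(mask: u8) -> Vec<u8> {
    (0..HW_THREADS).filter(|&i| (mask & (1u8 << i)) != 0).collect()
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum State {
    Ready,
    Blocked,
}

/// Groups the tids that are Ready at exit by the hw_id they sit on.
pub fn ready_by_hw(threads: &[(u32, State, u8)]) -> BTreeMap<u8, Vec<u32>> {
    let mut by_hw: BTreeMap<u8, Vec<u32>> = BTreeMap::new();
    for &(tid, state, hw_id) in threads {
        if state == State::Ready {
            by_hw.entry(hw_id).or_default().push(tid);
        }
    }
    by_hw
}
```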

Gate 3 — tid=5 last guest_cycle

metric	2.R (50M jitter sample)	2.S (500M run)	delta
tid=5 last guest_cycle	not separately reported (wedge wait at 1,007,809,113 host_ns)	486,334	—
tid=5 last host_ns	1,007,809,113 (2.R)	859,219,713	LOWER (jitter, not regression)

Note: host_ns is wallclock-derived and varies between jitter samples. guest_cycle is the deterministic guest-side counter; tid=5's last guest_cycle, 486,334, is identical across 2.Q and 2.S (same Phase-A event content).

Secondary gate results

Total event counts

metric	2.K (500M, no signal.match)	2.Q (50M + signal.match)	2.S (500M + signal.match)
total events	121,569	121,605	121,605
`signal.match` events	0 (kind not emitted)	36	36
baseline events (ex signal.match)	121,569	121,569	121,569
Phase-A delta 50M→500M	0 (vs 2.J)	n/a	0
Wallclock	13.96 s	not reported (~5s)	13.7 s
Termination reason	`reached max instruction count limit`	(50M)	`reached max instruction count limit=500000000`

The event count is bit-identical to 2.Q. The 10× instruction budget (13.7 s of wallclock vs ~5 s for 2.Q) produced zero additional Phase-A events.
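The "baseline events" row is simply total minus signal.match. The arithmetic below copies the table's numbers into a hypothetical `RunCounts` struct and checks that all three runs share the same 121,569-event baseline. It compares counts only, not event contents.

```rust
/// Totals from the event-count table (numbers copied from the report).
pub struct RunCounts {
    pub total: u64,
    pub signal_match: u64,
}

impl RunCounts {
    /// Baseline = every event except the 2.Q `signal.match` kind.
    pub fn baseline(&self) -> u64 {
        self.total - self.signal_match
    }
}

/// True when all runs share one baseline count, i.e. neither the larger
/// budget nor the extra instrumentation changed the pre-existing event count.
pub fn baselines_identical(runs: &[RunCounts]) -> bool {
    runs.windows(2).all(|pair| pair[0].baseline() == pair[1].baseline())
}
```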

`signal.match` by signaler tid (whole run)

tid	count	target handles
5	19	0x1028×7, 0x10b4×5, 0x103c, 0x1068, 0x10a0, 0x10fc, 0x1128, 0x1160, 0x11a0
1	9	0x1044×7, 0x10d8, 0x10dc
6	3	0x10ac, 0x1108, 0x116c
13	1	0x1044
2	1	0x8287094c
11	1	0x828a3254
9	1	0x828a3230
8	1	0x000012c0

tid=6 produces only 3 signal.match events: 3 of its 41 NtSetEvent calls land on a parked waiter, namely tid=5's 3 satisfied NtWaitForSingleObjectEx calls per the per-wait table in iterate_2R_missing_producer_2026_05_28. The other 38 NtSetEvent calls land on events that are already signaled or have no waiter. This is consistent with the canary's tid=11 polling loop (the analog of ours tid=6), which issues many "ensure signaled" sets on a manual-reset event that has no waiter.
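Why most of tid=6's NtSetEvent calls leave no signal.match record can be illustrated with a toy event object. This is a hypothetical model, not the xenia-rs kernel code. It assumes standard NT semantics (a manual-reset event releases every waiter and stays signaled; an auto-reset event releases one waiter and resets) and that signal.match is recorded only when at least one waiter is present, as the report describes for the waiter_count ≥ 1 filter.

```rust
/// Toy kernel event (hypothetical; not the xenia-rs object).
pub struct KEvent {
    pub manual_reset: bool,
    pub signaled: bool,
    /// tids parked on the event, in wait order.
    pub waiters: Vec<u32>,
}

#[derive(Debug, PartialEq)]
pub enum SetOutcome {
    /// At least one waiter was released: the case a signal.match record covers.
    Match(Vec<u32>),
    /// Nobody was waiting: the event just ends up signaled, nothing is recorded.
    NoWaiter,
}

impl KEvent {
    pub fn set(&mut self) -> SetOutcome {
        if self.waiters.is_empty() {
            self.signaled = true;
            return SetOutcome::NoWaiter;
        }
        if self.manual_reset {
            // Manual-reset (notification) event: wake all waiters, stay signaled.
            self.signaled = true;
            SetOutcome::Match(std::mem::take(&mut self.waiters))
        } else {
            // Auto-reset (synchronization) event: wake one waiter; the signal is consumed.
            self.signaled = false;
            SetOutcome::Match(vec![self.waiters.remove(0)])
        }
    }
}
```

Under this model, 38 "ensure signaled" sets with nobody parked produce 38 `NoWaiter` outcomes and no trace records.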

tid=5 NtReleaseSemaphore on handle 0x000010b4 (the tid=6 backlog feeder)

signal.match on 0x000010b4 (tid=6 is the sole waiter per 2.Q snapshot):

host_ns (ms)	signaler	waiters
493.7	tid=5	[6]
493.8	tid=5	[6]
520.9	tid=5	[6]
719.9	tid=5	[6]
856.6	tid=5	[6]

5 releases on 0x10b4 list tid=6 as the parked waiter. Critically, tid=5's release at ns=856.6 ms fires AFTER tid=6's last NtSetEvent signal.match at ns=723.5 ms. That release should wake tid=6, but tid=6 never reschedules: its last event of any kind is at 723.6 ms, ~133 ms before the 856.6 ms release, and the trace ends at 859.2 ms, ~2.6 ms after the release, with no further tid=6 activity. This is the same starvation pattern 2.R documented for a different jitter sample (in 2.R, tid=5 issued 76 releases in [880, 991] ms). 2.S shows the pattern reproduces across jitter samples and across budgets.

(2.R's "76 releases" appear to have come from parsing raw kernel.call args rather than from signal.match. 2.S has only 5 because signal.match records only signals where waiter_count ≥ 1; the remaining ~70 releases were presumably on other handles or found no waiter at signal time. Either way, the wake-pattern conclusion is the same.)
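The timeline argument can be checked directly against the 0x10b4 table: keep only releases that list tid=6 as a waiter and occur after tid=6's last event, then measure the gaps. A minimal sketch with a hypothetical `Release` type, using integer microseconds to avoid floating-point rounding:

```rust
/// One signal.match on the semaphore: host time in microseconds plus the
/// tids listed as waiters (hypothetical shape).
pub struct Release {
    pub at_us: u64,
    pub waiters: Vec<u32>,
}

/// For each release naming `tid` that happens after `tid`'s last event, returns
/// (release time, gap since tid went quiet, time left before trace end), in us.
pub fn wakes_after_quiescence(
    releases: &[Release],
    tid: u32,
    tid_last_us: u64,
    trace_end_us: u64,
) -> Vec<(u64, u64, u64)> {
    releases
        .iter()
        .filter(|r| r.at_us > tid_last_us && r.waiters.contains(&tid))
        .map(|r| (r.at_us, r.at_us - tid_last_us, trace_end_us.saturating_sub(r.at_us)))
        .collect()
}
```

With the five releases from the table, tid=6's last event at 723.6 ms and the trace end at 859.2 ms, only the 856.6 ms release survives, 133.0 ms after tid=6 went quiet and 2.6 ms before the trace ends.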

Per-tid event counts and last activity

tid	events	last host_ns (ms)	last guest_cycle
1	108,516	852.3	9,169,116
5	10,031	859.2	486,334
4	2,075	859.2	92,705
13	436	855.3	27,211
6	318	723.6	6,020,629
9	78	819.3	689
8	38	852.0	443
3	37	468.5	1,030
2	34	468.1	4,273
10	17	819.4	103
11	12	819.0	91
12	6	851.6	45
7	5	500.7	30

tid=6's 318 events, all within the first 723.6 ms of host time, are its complete observable lifetime in this run; the rest of the 13,705 ms wallclock produces no further tid=6 events. tid=1 (the main bootstrap) and tid=5 (the AUDIT-068 dispatcher) keep logging events until ~852-859 ms host_ns, well past tid=6's quiescence. The trace is therefore not truncated early; tid=6 specifically is starved.
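The "starved, not truncated" inference combines two facts: the thread is Ready at exit, and its last event lags the trace end by far more than its peers'. A sketch of that rule follows; the 100 ms threshold and all names are assumptions for illustration, and the trace end is taken as the latest event of any thread.

```rust
/// Per-thread summary (hypothetical): Ready at exit? and last event host time in us.
pub struct ThreadSummary {
    pub tid: u32,
    pub ready_at_exit: bool,
    pub last_us: u64,
}

/// Ready-at-exit threads whose last event precedes the trace end (the latest
/// event of any thread) by more than `threshold_us`.
pub fn starved_ready(threads: &[ThreadSummary], threshold_us: u64) -> Vec<u32> {
    let trace_end = threads.iter().map(|t| t.last_us).max().unwrap_or(0);
    threads
        .iter()
        .filter(|t| t.ready_at_exit && trace_end - t.last_us > threshold_us)
        .map(|t| t.tid)
        .collect()
}
```

Blocked threads that went quiet early (tid=2, tid=3, tid=7) are excluded by design, since a blocked thread is expected to be silent.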

Disambiguation result vs goal-spec

outcome	gate predicate	result	conclusion
BUDGET-CAP-WAS-ISSUE-WEDGE-DISSOLVED	`signal.match` on 0x12e4 in [1.0, 5.0]s > 0 OR tid=6 last event > 888.5 ms OR exit state changes	NO (0 on 0x12e4; tid=6 last 723.6 ms; exit state bit-identical to 2.M/2.N/2.Q)	FALSIFIED
C-2-CONFIRMED-STRUCTURAL	0 signals on 0x12e4 AND tid=6 still Ready/idle at exit	YES + YES	CONFIRMED
NEW-BEHAVIOR-OBSERVED	event count or wedge map differs from 2.Q	NO (event count identical, wedge map identical)	NOT TRIGGERED
RUN-FAILED	non-zero exit / crash / hang	NO (EXIT=0, wall_ms=13,705)	NOT TRIGGERED

Result: C-2 (scheduler-fairness Ready-but-not-running on CPU5) confirmed structural by 500M-budget reproduction.
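The gate table can be encoded as a small decision function. The predicates and the 888.5 ms threshold follow the table; the evaluation order (run failure first, then wedge dissolved, then new behavior, then C-2) and every name below are assumptions of this sketch.

```rust
#[derive(Debug, PartialEq)]
pub enum Outcome {
    RunFailed,
    BudgetCapWasIssueWedgeDissolved,
    NewBehaviorObserved,
    C2ConfirmedStructural,
    Inconclusive,
}

/// Observables for one replay (names hypothetical).
pub struct Observed {
    pub run_failed: bool,                  // non-zero exit, crash or hang
    pub sm_12e4_in_window: u32,            // signal.match on 0x12e4 within [1.0, 5.0] s
    pub sm_12e4_total: u32,                // signal.match on 0x12e4, whole run
    pub tid6_last_us: u64,                 // tid=6's last event, host microseconds
    pub tid6_ready_at_exit: bool,
    pub exit_state_changed: bool,
    pub events_or_wedge_map_changed: bool, // vs 2.Q
}

pub fn classify_outcome(o: &Observed) -> Outcome {
    if o.run_failed {
        return Outcome::RunFailed;
    }
    if o.sm_12e4_in_window > 0 || o.tid6_last_us > 888_500 || o.exit_state_changed {
        return Outcome::BudgetCapWasIssueWedgeDissolved;
    }
    if o.events_or_wedge_map_changed {
        return Outcome::NewBehaviorObserved;
    }
    if o.sm_12e4_total == 0 && o.tid6_ready_at_exit {
        return Outcome::C2ConfirmedStructural;
    }
    Outcome::Inconclusive
}
```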

The C-1 (burst-then-halt by backlog drain) subclass framing survives as a partial-cause description (tid=6's 228 ms burst from ns=498 to 723 ms is real, finite, and matches a backlog-drain shape), but it cannot be the SOLE cause because:

(1) tid=5 issues a release on tid=6's waited semaphore 0x10b4 at ns=856.6 ms (133 ms after tid=6 went quiescent), which under C-1 alone should rescue tid=6;
(2) the 500M budget gives the scheduler ~13.7 s of wallclock to pick tid=6, whose affinity 0x20 = CPU5 is shared with two other Ready tids (tid=10 and tid=12), i.e. three Ready threads on one HW thread;
(3) no tid=6 events appear anywhere in the post-723.6 ms window.

The mechanism that must explain (1)+(2)+(3) is the C-2 scheduler-fairness issue: tid=6 is on the Ready queue for hw_id=5, but the scheduler never context-switches to it. The 5th tid=5 release on 0x10b4 emits a signal.match with tid=6 in its waiter list, yet tid=6 is never actually woken and rescheduled before the budget runs out.
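For contrast, a minimal round-robin ready queue shows the fairness property that C-2 says is being violated: with three Ready tids on one hardware thread, each should get a slice within one rotation. This is a hypothetical illustration, not the xenia-rs scheduler, and it ignores priorities and preemption.

```rust
use std::collections::VecDeque;

/// Hypothetical per-hardware-thread ready queue with round-robin picking.
#[derive(Default)]
pub struct HwReadyQueue {
    queue: VecDeque<u32>,
}

impl HwReadyQueue {
    pub fn new() -> Self {
        Self::default()
    }

    /// Blocked -> Ready: enqueue the tid once (a duplicate wake is a no-op).
    pub fn make_ready(&mut self, tid: u32) {
        if !self.queue.contains(&tid) {
            self.queue.push_back(tid);
        }
    }

    /// Gives the head of the queue one time slice, then rotates it to the back,
    /// so every Ready tid runs once per rotation.
    pub fn pick(&mut self) -> Option<u32> {
        let tid = self.queue.pop_front()?;
        self.queue.push_back(tid);
        Some(tid)
    }
}
```

Under this policy, tids 10, 12 and 6 on hw_id=5 would each run within three picks. A picker that keeps returning 10 and 12 while 6 sits in the queue is the C-2b signature.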

Open: whether (a) wake_eligible_waiters is correctly transitioning tid=6 from Blocked→Ready and the scheduler then never re-picks it (pure scheduler bug), OR (b) wake_eligible_waiters is failing to even issue the wake-request for tid=6 (wake-call bug masquerading as scheduler issue). 2.T (wake.requested instrumentation) decisively distinguishes these.

Comparison: 2.K → 2.Q → 2.S

gate	2.K (500M, no signal.match)	2.Q (50M + signal.match)	2.S (500M + signal.match)
total events	121,569	121,605	121,605
baseline events	121,569	121,569	121,569
`signal.match` events	n/a	36	36
`signal.match` on 0x12e4	n/a	0	0
Phase-A events 50M→500M	0 (vs 2.J)	n/a	0
exit-state size	9651	9651	9651
wedge tids parked at 0x824ac578	5	5	5
tid=6 final state	Ready	Ready	Ready
Termination	budget hit	(50M)	budget hit (500M)
Wallclock	13.96 s	~5 s	13.7 s
Engine binary HEAD	`e6d43a23`	`e6d43a23` + 2.Q patch	`e6d43a23` + 2.Q patch

2.S is bit-equivalent to 2.Q on every observable, and bit-equivalent to 2.K on all non-signal.match events. The 10× budget is observability-null, and 2.Q's signal.match adds no events in the 50M→500M window.

Tripstone audit

#28 (cross-engine tid stability): No cross-engine tid claims made. Comparisons across 2.K/2.Q/2.S all on ours-side; ours-side scheduler tids stable for this trajectory.
#39 (composite progression IS progression): NO progression claim. VdSwap=1, draws=0, render_targets=0 — bit-identical to 2.J/2.K/2.Q/2.N. Matched-prefix unchanged.
#40 (single-keystone framing): Carefully NOT collapsing into a single-cause story. C-2 framing is confirmed structural via budget reproduction, but the underlying mechanism (wake-call-not-issued vs scheduler-skip) remains open and is what 2.T will distinguish. C-1 burst-then-halt remains partially descriptive (the burst exists) but cannot be sole cause given (1)+(2)+(3) above.
#41 (categorized diff tags): signal.match is ENGINE_LOCAL in the diff harness; doesn't affect matched-prefix.
#42 (Phase-A blind to blocked-forever waits): Used 2.M exit-thread-state.json as authoritative for tid=6's Ready state (Phase-A would have shown only the wait-loop completion events, missing the actual final Ready geometry). Confirms tid=6 is NOT Blocked, it's Ready-and-skipped.

Reading-error #43 candidate — REJECTED

Goal-spec floated reading-error #43 as a candidate if budget-cap dissolved the wedge. The wedge did NOT dissolve at 10× budget. Reading-error #43 is NOT triggered. Inverse risk (NOT a new reading-error, but worth noting): be skeptical of "budget cap probably explains this" framings for wait-loop wedges — 2.K already showed this, 2.S re-confirms. Any future iterate that argues "we need more budget" must clear a high bar after two consecutive 500M reproductions show zero new events.

Confidence

HIGH that 500M budget was hit cleanly (exec complete wall_ms=13705 instructions=500000004, diagnostic re-run).
HIGH that event count is bit-identical to 2.Q (121,605 = 121,605 per wc -l).
HIGH that exit-state thread geometry is bit-identical to 2.M/2.N/2.Q (9651 bytes file, 13 threads, 10 wedge entries, same wedge_map).
HIGH that signal.match on 0x000012e4 is 0 in entire 500M window (exhaustive grep via python jsonl scan).
HIGH that tid=6 last event at 723.6 ms is well below the 859.2 ms trace-end, proving tid=6 specifically is starved (not a trace truncation).
HIGH that 2.R's C-2 framing is confirmed structural by this reproduction — budget cap is NOT the cause.
MEDIUM-HIGH that the underlying mechanism is scheduler-pick-skipping Ready tid=6 (vs wake-call-not-issued); 2.T will distinguish.
HIGH that 2.R's C-1 partial framing (burst-then-halt) is real but cannot be sole cause given the 856.6 ms release evidence.

Next iterate recommendation

2.T — wake.requested instrumentation in xenia-kernel/exports.rs wake_eligible_waiters (~80-150 LOC). Emit a new schema-v1 wake.requested event per waiter the wake-loop touches, carrying {signaler_tid, target_tid, handle, sid, prior_state, post_state, context_switch_scheduled, ready_queue_position}. This decisively distinguishes:

C-2a (wake-call not issued): signal.match shows tid=6 in waiter list at ns=856.6 ms but no corresponding wake.requested event for tid=6 → bug is in wake_eligible_waiters waiter-iteration or per-handle waiter-list registration.
C-2b (wake-call issued, scheduler-skip): wake.requested fires with prior_state=Blocked, post_state=Ready, context_switch_scheduled=true but tid=6 never actually executes — bug is in the scheduler ready-queue pick logic (hw_id=5 with affinity-0x20 contention).
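The C-2a/C-2b split is a join between signal.match and the proposed wake.requested events. The sketch below performs that join on hypothetical minimal event structs, since the 2.T schema does not exist yet. It assumes `matches` is in time order and treats any tid=6 event after the signal as "tid=6 ran". A fuller check would also require post_state=Ready and context_switch_scheduled=true before calling it C-2b.

```rust
/// Minimal views of the two event kinds (hypothetical; 2.T's schema is not final).
pub struct SignalMatch {
    pub at_us: u64,
    pub handle: u32,
    pub waiters: Vec<u32>,
}

pub struct WakeRequested {
    pub at_us: u64,
    pub handle: u32,
    pub target_tid: u32,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// No signal.match on `handle` lists the tid as a waiter.
    NoSignal,
    /// The tid logged events after the signal: it was woken and ran.
    Resumed,
    /// A wake.requested exists for the tid, but it never ran afterwards.
    C2bSchedulerSkip,
    /// The signal listed the tid, but no wake.requested followed.
    C2aWakeNotIssued,
}

pub fn classify_c2(
    matches: &[SignalMatch],
    wakes: &[WakeRequested],
    tid: u32,
    handle: u32,
    tid_event_us: &[u64],
) -> Verdict {
    // Latest signal on `handle` that lists `tid` as a parked waiter.
    let sig = match matches
        .iter()
        .rev()
        .find(|m| m.handle == handle && m.waiters.contains(&tid))
    {
        Some(m) => m,
        None => return Verdict::NoSignal,
    };
    if tid_event_us.iter().any(|&t| t > sig.at_us) {
        return Verdict::Resumed;
    }
    let wake_issued = wakes
        .iter()
        .any(|w| w.handle == handle && w.target_tid == tid && w.at_us >= sig.at_us);
    if wake_issued {
        Verdict::C2bSchedulerSkip
    } else {
        Verdict::C2aWakeNotIssued
    }
}
```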

C-2a vs C-2b have totally different fix paths (kernel-handle-list bug vs scheduler-fairness bug), so this disambiguation is high-value. Same observability-only pattern as 2.Q (zero semantic change). Estimated ~80-150 LOC, ~30 min to implement + ~10 min run + report.
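For the emission side, one possible shape for the schema-v1 wake.requested record is sketched below. The field list comes from the 2.T proposal. The field types, the `kind` key, the hex handle formatting and the std-only JSON serialization are all assumptions; a real implementation would more likely reuse whatever serializer Phase-A already uses.

```rust
/// Thread states the record reports (hypothetical subset).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ThreadState {
    Blocked,
    Ready,
}

/// One schema-v1 wake.requested record (field list from the 2.T proposal;
/// types are assumptions).
#[derive(Clone, Copy)]
pub struct WakeRequestedRecord {
    pub signaler_tid: u32,
    pub target_tid: u32,
    pub handle: u32,
    pub sid: u64,
    pub prior_state: ThreadState,
    pub post_state: ThreadState,
    pub context_switch_scheduled: bool,
    /// Position in the target hw thread's ready queue, or None if not enqueued.
    pub ready_queue_position: Option<u32>,
}

impl WakeRequestedRecord {
    /// Renders one JSON Lines record using only std formatting.
    pub fn to_jsonl(&self) -> String {
        let position = match self.ready_queue_position {
            Some(p) => p.to_string(),
            None => "null".to_string(),
        };
        format!(
            "{{\"kind\":\"wake.requested\",\"signaler_tid\":{},\"target_tid\":{},\"handle\":\"0x{:08x}\",\"sid\":{},\"prior_state\":\"{:?}\",\"post_state\":\"{:?}\",\"context_switch_scheduled\":{},\"ready_queue_position\":{}}}",
            self.signaler_tid,
            self.target_tid,
            self.handle,
            self.sid,
            self.prior_state,
            self.post_state,
            self.context_switch_scheduled,
            position
        )
    }
}
```

One record per waiter touched by the wake loop keeps the join with signal.match a simple match on (handle, target_tid).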

Alternative deferred:

2.U — closure / commit 2.Q patch. Per iterate_2Q_signal_match_2026_05_28 the patch is uncommitted in working tree. Commit hygiene if no immediate follow-up work needed.
2.V — canary signal.match mirror (~30-60 LOC C++). Adds parity for cross-engine SID diff (per AUDIT-062 wrong-slot vs missing-producer question). Higher long-term ROI but lower than 2.T's immediate disambiguation value.

Recommended: 2.T first (~30-40 min total), then commit 2.Q + 2.T together as a single observability batch.

Artifacts

Under xenia-rs/audit-runs/iterate-2S-longbudget-signal-match/:

ours-cold.jsonl (28.7 MB, 121,605 events, 500M-instr quiet run)
ours-cold.stdout.log (empty — quiet mode)
ours-cold.stderr.log (single 2.M emission notice line — bit-equivalent to 2.Q's stderr)
exit-thread-state.json (9651 bytes; bit-identical to 2.M/2.N/2.Q — 13 threads + 10 wedge entries)
writer-report.md (this file)

Engine HEAD e6d43a23ac393004d2e5adf2f0395fd0b5e6448b + uncommitted diff sha256 e81a4b84224ab07330a0af259589e928 (2.Q signal.match patch + prior retained 2.F/2.H/2.L/2.M patches). xenia-canary UNCHANGED.
