
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-05 07:19:08 +02:00

5.2 KiB


AUDIT-069 Session 6 — Time-ordered first-N release diff

Source data

- Canary: xenia-rs/audit-runs/audit-069-wait-signal-producer/s5/canary-release-trace.log (414 AUDIT-070-RELEASE events on work-semaphore handle 0xF800003C).
- Ours: xenia-rs/audit-runs/audit-069-wait-signal-producer/s5/ours-release-trace.jsonl (99 --lr-trace events at PC 0x824ab158, the NtReleaseSemaphore wrapper entry; 83 of them are on handle 0x1044, ours's analog of canary's work semaphore 0xF800003C).

The apples-to-apples comparison is canary's 414 releases against ours's 83 on the equivalent work-semaphore handles (0xF800003C in canary, 0x1044 in ours). Ratio = 83/414 = 20.0%, slightly below the S5-reported "24%" headline figure, which counted ours's 99 releases across ALL handles against canary's 414 on a single handle (99/414 = 23.9%).
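The two ratios can be reproduced from the counts quoted above. This is only an arithmetic sketch; the constant and function names are illustrative and do not come from the audit tooling.

```python
# Release counts quoted in this section.
CANARY_WORK_SEM = 414    # canary releases on handle 0xF800003C
OURS_WORK_SEM = 83       # ours releases on handle 0x1044
OURS_ALL_HANDLES = 99    # ours releases across all handles (S5 headline basis)


def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    return round(100.0 * numerator / denominator, 1)


like_for_like = pct(OURS_WORK_SEM, CANARY_WORK_SEM)    # same-handle comparison
s5_headline = pct(OURS_ALL_HANDLES, CANARY_WORK_SEM)   # mixed-handle comparison
print(like_for_like, s5_headline)  # 20.0 23.9
```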

Per-tid release totals

Canary tid	Role	Canary releases	Ours tid (map)	Ours releases	Delta
6	main	7	1	7	0
10	worker	382	5	75	+307 canary
17	cache producer	9	13	1	+8 canary
18	(other producer)	14	—	0	+14 canary (no ours analog)
16	(other producer)	1	—	0	+1 canary
26	(other producer)	1	—	0	+1 canary

Main-thread releases match exactly (7 = 7): on this path, ours's main thread emits the same number of releases as canary's.
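As a consistency check on the table, the per-tid counts can be summed and differenced. The sketch below hard-codes the table rows; the tuple layout and names are assumptions made for illustration.

```python
# (canary_tid, role, canary_releases, ours_tid or None, ours_releases)
ROWS = [
    (6, "main", 7, 1, 7),
    (10, "worker", 382, 5, 75),
    (17, "cache producer", 9, 13, 1),
    (18, "other producer", 14, None, 0),
    (16, "other producer", 1, None, 0),
    (26, "other producer", 1, None, 0),
]


def deltas(rows):
    """Map canary tid -> (canary releases - ours releases)."""
    return {can_tid: can_n - ours_n
            for can_tid, _role, can_n, _ours_tid, ours_n in rows}


canary_total = sum(row[2] for row in ROWS)   # should equal the 414 canary events
ours_total = sum(row[4] for row in ROWS)     # should equal the 83 ours events
per_tid_delta = deltas(ROWS)
print(canary_total, ours_total, per_tid_delta)
```

The column sums agree with the 414 and 83 totals given under "Source data", so the table accounts for every work-semaphore release in both traces.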

FIRST-N=20 time-ordered diff

Time-ordered by canary host_ns and aligned to ours via the AUDIT-068/069 documented tid map (6↔1, 10↔5, 17↔13):

can_ord	can_tid	ours_tid	ours_ord (on tid)	status	canary host_ns
0	6	1	0	MATCHED	6,600
1	10	5	0	MATCHED	9,503,200
2	6	1	1	MATCHED	44,374,500
3	10	5	1	MATCHED	45,152,800
4	6	1	2	MATCHED	56,846,700
5	10	5	2	MATCHED	105,855,200
6	6	1	3	MATCHED	188,211,400
7	10	5	3	MATCHED	192,596,400
8	6	1	4	MATCHED	194,344,500
9	10	5	4	MATCHED	195,199,800
10	6	1	5	MATCHED	196,786,900
11	10	5	5	MATCHED	197,419,200
12	6	1	6	MATCHED	335,050,200
13	10	5	6	MATCHED	336,046,100
14	10	5	7	MATCHED	337,214,700
15	10	5	8	MATCHED	337,443,900
16	10	5	9	MATCHED	337,674,900
17	10	5	10	MATCHED	337,900,800
18	10	5	11	MATCHED	338,123,800
19	10	5	12	MATCHED	338,350,000

All 20 match: the bootstrap release order is identical across the first 20 releases.
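The alignment behind this table can be sketched as follows. This is not the audit's actual diff tool: it assumes the only matching criterion is that the mapped ours tid still has an unconsumed release at the next per-tid ordinal, and the function and variable names are hypothetical.

```python
from collections import defaultdict

TID_MAP = {6: 1, 10: 5, 17: 13}  # canary tid -> ours tid (AUDIT-068/069 map)


def align(canary_events, ours_counts, tid_map=TID_MAP):
    """Walk canary events in host_ns order and pair each with the next
    unconsumed release on the mapped ours tid.

    canary_events: iterable of (host_ns, canary_tid)
    ours_counts:   dict of ours tid -> number of releases that tid emitted
    Returns rows of (can_ord, can_tid, ours_tid, ours_ord, status).
    """
    next_ord = defaultdict(int)  # per-ours-tid ordinal of the next release
    rows = []
    for can_ord, (_host_ns, can_tid) in enumerate(sorted(canary_events)):
        ours_tid = tid_map.get(can_tid)
        if ours_tid is not None and next_ord[ours_tid] < ours_counts.get(ours_tid, 0):
            rows.append((can_ord, can_tid, ours_tid, next_ord[ours_tid], "MATCHED"))
            next_ord[ours_tid] += 1
        else:
            rows.append((can_ord, can_tid, ours_tid, None, "UNMATCHED"))
    return rows


# First 20 canary events, copied from the table above: (host_ns, canary tid).
FIRST_20 = [
    (6_600, 6), (9_503_200, 10), (44_374_500, 6), (45_152_800, 10),
    (56_846_700, 6), (105_855_200, 10), (188_211_400, 6), (192_596_400, 10),
    (194_344_500, 6), (195_199_800, 10), (196_786_900, 6), (197_419_200, 10),
    (335_050_200, 6), (336_046_100, 10), (337_214_700, 10), (337_443_900, 10),
    (337_674_900, 10), (337_900_800, 10), (338_123_800, 10), (338_350_000, 10),
]
OURS_COUNTS = {1: 7, 5: 75, 13: 1}  # per-tid release totals on handle 0x1044

rows = align(FIRST_20, OURS_COUNTS)
print(all(status == "MATCHED" for *_rest, status in rows))  # True
```

Under this criterion, a canary event goes unmatched either when its tid has no entry in the map (tids 18/16/26) or when the mapped ours tid has already used up all of its releases.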

First divergence

Extending the walk past the first 20:

First time-ordered canary event NOT matched in ours:
    canary ord = 83   tid = 10 (worker)   host_ns = 372,415,500
    reason     = ours's tid=5 worker has produced ALL of its 75 releases by this point

But the causal divergence is one ord earlier:

canary ord = 82   tid = 17 (cache-thread)   host_ns = 372,105,500   lr = 0x824AB168
    → canary's tid=17 emits its FIRST work-sem release at 372 ms
    → ours's tid=13 (cache-thread analog) emitted its only release at cycle=26,803 (LR 0x82450314)
       early in bootstrap, then NEVER releases again — it wedges at sub_821CB030+0x1AC
       (per AUDIT-069 S1 wait-site, AUDIT-049 wedge family).

Canary tid=17's 9 releases (ords 82, 84, 86, 88, 92, 93, 94, 95, 96) feed the work semaphore between roughly 372 ms and 399 ms of host time (host_ns 372,105,500 onward). These supply work items to canary's worker tid=10, which then produces roughly 300 further releases (382 − 75 = 307) as it processes the queued items.

Ours's tid=13 is silent after its bootstrap-time release. The worker tid=5 runs out of work and halts at 75 releases — the moment it finishes consuming items produced before tid=13 wedged.

Interpretation vs S5 H3 ("systemic under-production")

H3 predicted a "systemic" under-production across all producers. The first-20 diff REFINES H3:

- The first 20 releases match cleanly across both engines. The system is NOT broken at boot.
- The under-production is concentrated on the cache-thread (canary tid=17 / ours tid=13). That thread's failure to produce 8 more releases (after its 1st) cascades into roughly 300 missing worker releases (382 − 75 = 307).
- Canary tids 18/16/26 (14+1+1 = 16 additional releases from "other producers") have no observable ours analog. This measurement does not determine whether ours never spawned analogs or whether those threads exist but never reach the release site.

H3 is therefore PARTIALLY CONFIRMED with refinement: the dominant under-production source is the cache-thread (tid=17/13), not a generic systemic deficit. The remaining 16 releases from canary-only producer tids (18/16/26) are the secondary contribution.
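The split between the dominant and secondary contributions can be quantified from the per-tid totals. The sketch below follows this section's attribution of the whole worker gap to the cache-thread wedge, which is an interpretation rather than something measured directly; the variable names are illustrative.

```python
CANARY_TOTAL, OURS_TOTAL = 414, 83
deficit = CANARY_TOTAL - OURS_TOTAL      # 331 releases missing in ours

worker_gap = 382 - 75                    # canary tid 10 vs ours tid 5
cache_gap = 9 - 1                        # canary tid 17 vs ours tid 13
canary_only = 14 + 1 + 1                 # canary tids 18, 16, 26 (no ours analog)

# Assumption: every missing worker release is downstream of the cache-thread
# wedge, so the worker and cache gaps are counted as one causal chain.
cache_chain = worker_gap + cache_gap

share_cache_chain = round(100.0 * cache_chain / deficit, 1)
share_canary_only = round(100.0 * canary_only / deficit, 1)
print(deficit, share_cache_chain, share_canary_only)  # 331 95.2 4.8
```

If some of the worker's missing releases were instead driven by work items from tids 18/16/26, the first share would shrink accordingly; the counts alone cannot separate the two.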

Recommended AUDIT-070 next steps

- Probe ours tid=13 between cycle 26,803 (its first release) and its wedge at sub_821CB030+0x1AC. Identify why the cache-thread reaches its release site once in ours but 9 times in canary. AUDIT-069 S4's hypothesis (work-sem over-release causing the producer to never re-enter its wait) is now FALSIFIED by the S5+S6 data; the producer simply never gets back to its release site.
- Inventory canary tids 18/16/26. Identify their entry PCs in canary, then check whether ours spawns analogs at all (thread.create events in a Phase A event log).
- The schema bridge wired in this session (see summary.md) makes future regressions in semaphore-release cadence diff-visible without ad-hoc cvars.
