xenia-rs/audit-runs/audit-059-gamma-wedge/ours-summary.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


AUDIT-059 — γ-wedge Probe O Summary

  • Date: 2026-05-11
  • Mode: READ-ONLY (xenia-rs HEAD untouched)
  • Branch: chore/portable-snapshot @ e6d43a2
  • Binary: xenia-rs/target/release/xenia-rs-probe (renamed to survive Stop hook)
  • Inputs: Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso, xenia-rs/sylpheed.db

Phase 1 — wedge identification (--halt-on-deadlock, --trace-handles)

Run halts on deadlock well before n=500M. All 12 HW threads parked; 9 Blocked + 3 Ready (spin?). Snapshot reproduces identically at -n=100M and -n=500M.

Blocked-thread inventory at halt

| hw/idx | tid | PC | Handle(s) waited | Notes |
|---|---|---|---|---|
| 0/0 | 1 | 0x824ac578 | 0x000012a4 (Thread, id=13) | main thread join on tid=13 |
| 0/1 | 11 | 0x824d2a94 | 0x828a3244 + 0x828a3220 | audio host-pump pair (AUDIT-032/048) |
| 1/0 | 2 | 0x824a95f8 | 0x8287093c | helper |
| 1/1 | 13 | 0x824ac578 | 0x000012ac (Event/Auto) | keystone γ-wedge |
| 2/0 | 7 | 0x824cd4f4 | 0x42450b5c (deadline) | audio? has deadline |
| 2/1 | 8 | 0x824ab214 | 0x000010e4 + 0x000010d0 (WaitAll) | sema OK + manual-event NO_SIG |
| 3/0 | 4 | 0x824ac578 | 0x00001028 (Semaphore) | sema released 7×, consumed 8; race? |
| 3/1 | 5 | 0x824ac578 | 0x000012b8 (Event/Auto) | worker-cluster γ-wedge |
| 5/0 | 3 | 0x824ac578 | 0x00001020 (Event/Manual) | NO_SIG |

Per-handle audit (--trace-handles-focus)

signal_attempts (primary + ghost) for each wedge at halt:

| Handle | Kind | Waiters | signal_attempts | Verdict |
|---|---|---|---|---|
| 0x1020 | Event/Manual | 1 (tid=3) | 0 | γ-wedge |
| 0x1040 | Event/Auto | 0 (32 waits historic) | 0 | γ-wedge |
| 0x10a8 | Event/Auto | 0 (7 waits historic) | 0 | γ-wedge |
| 0x10e4 | Event/Manual | 1 (tid=8) | 0 | γ-wedge |
| 0x12a4 | Thread | 1 (tid=1, main) | 0 | downstream of 0x12ac |
| 0x12ac | Event/Auto | 1 (tid=13) | 0 | keystone γ-wedge |
| 0x12b8 | Event/Auto | 1 (tid=5) | 0 | worker-cluster γ-wedge |
| 0x1028 | Semaphore | 1 (tid=4) | 7 (works) | sema not the bug |
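
The verdict column can be reproduced mechanically from the other columns. The following is a minimal sketch, assuming the rule the table implies: an Event handle that has been waited on (now or historically) but has signal_attempts == 0 is a γ-wedge. The `HandleAudit` record and its field names are hypothetical and do not reflect the probe's actual output format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HandleAudit:
    # Hypothetical record; field names are illustrative only.
    handle: int
    kind: str
    waiters: int                   # threads blocked on the handle at halt
    historic_waits: Optional[int]  # None where the table does not report it
    signal_attempts: int


# Values transcribed from the per-handle audit table.
AUDIT = [
    HandleAudit(0x1020, "Event/Manual", 1, None, 0),
    HandleAudit(0x1040, "Event/Auto", 0, 32, 0),
    HandleAudit(0x10A8, "Event/Auto", 0, 7, 0),
    HandleAudit(0x10E4, "Event/Manual", 1, None, 0),
    HandleAudit(0x12A4, "Thread", 1, None, 0),
    HandleAudit(0x12AC, "Event/Auto", 1, None, 0),
    HandleAudit(0x12B8, "Event/Auto", 1, None, 0),
    HandleAudit(0x1028, "Semaphore", 1, None, 7),
]


def gamma_wedges(rows: List[HandleAudit]) -> List[int]:
    """Return handles that were waited on but never signalled (assumed rule)."""
    out = []
    for r in rows:
        waited = r.waiters > 0 or (r.historic_waits or 0) > 0
        if r.kind.startswith("Event") and waited and r.signal_attempts == 0:
            out.append(r.handle)
    return out
```

Under this rule the Thread handle 0x12a4 is excluded (it is downstream of 0x12ac rather than a wedge in its own right), and the semaphore 0x1028 is excluded because it has been signalled.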

Phase 2 — create-site triangulation (focus dump + lr-trace)

Handle 0x12AC (tid=13 keystone wedge)

  • Create-call-site PC: 0x821cb158 = sub_821CB030+0x128 (bl NtCreateEvent wrapper sub_824A9F18).
  • Wait-call-site PC: 0x821cb1dc = sub_821CB030+0x1AC (bl sub_824AC540 INFINITE wait).
  • Created on stack frame: r3=0x715a7a60 (stack-local OUT handle slot, tid=13's stack).
  • Creator full chain (frames 1..5 from per-handle created stack):
    sub_821CB030+0x12c  (this fn creates AND waits)
    sub_821CBA08+0xd8
    sub_821CC3F8+0x5c   (GamePart_Title)
    sub_821C4EB0+0x68   (UImpl@GamePart_Title@silph)   <- vtable class .?AUImpl@GamePart_Title@silph@@
    sub_821749C0+0xc0
    
  • Class identification (from wait-thread frame-3/4 saved-r29 vtable probes):
    • frame 3 r29 vtable 0x820a3dc8 = .?AVGamePart_Title@silph@@
    • frame 4 r29 vtable 0x820a3e00 = .?AUImpl@GamePart_Title@silph@@
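
The call-site PCs and the creator-chain frame entries differ by 4 because a PowerPC `bl` stores the address of the following instruction in LR. A small sketch of that arithmetic, using only addresses quoted above; the helper names are illustrative and not part of the probe:

```python
PPC_INSN_BYTES = 4  # PowerPC instructions are fixed-width (4 bytes)


def offset_in(func_base: int, pc: int) -> str:
    """Format a PC as sub_<base>+0x<offset>, matching the notation used here."""
    return f"sub_{func_base:08X}+0x{pc - func_base:X}"


def call_site_from_lr(lr: int) -> int:
    """Recover the address of the bl instruction from the saved return address."""
    return lr - PPC_INSN_BYTES


SUB_821CB030 = 0x821CB030
print(offset_in(SUB_821CB030, 0x821CB158))  # create call site
print(offset_in(SUB_821CB030, 0x821CB1DC))  # wait call site
# Frame 1 of the creator chain is a return address (+0x12c); the bl sits 4 bytes earlier.
print(offset_in(SUB_821CB030, call_site_from_lr(SUB_821CB030 + 0x12C)))
```

The same conversion applies when reading lr-trace output: a recorded LR points one instruction past the call that produced it.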

Handle 0x12B8 (tid=5 worker-cluster wedge)

  • Create-call-site PC: inside sub_82458068+0x8C (bl NtCreateEvent wrapper).
  • Wait-call-site PC: inside sub_82458B08+0x2C (bl wait wrapper).
  • Creator full chain:
    sub_82458068+0x8c
    sub_82458960+0x94
    sub_82451238+0x1c8
    sub_82450B68+0x1a0
    sub_82450A68+0xcc
    
  • Lives entirely in worker cluster 0x82450000-0x8245C000.

Handle 0x12A4 (tid=1 main thread join)

  • Created via XCreateThread (Thread kind), reference id 13.
  • Wait chain (from WAIT-THREAD):
    sub_82173990+0x2d4  (program top — AUDIT-033 gateway)
    sub_822F1AA8+0xa8
    sub_8216EA68+0x3ac
    entry_point+0x198
    
  • Wait-frame-3 r29 vtable 0x820a183c = .?AVSilph@silph@@.
  • Matches the AUDIT-049 finding that handle 0x1280 was the thread handle (0x12A4 is the corresponding handle in this run). Downstream of 0x12AC: wake tid=13 and the main thread wakes.
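
The dependency between the keystone event and the main-thread join can be written down as a tiny wait-for walk. This is a sketch only: the two dictionaries are transcribed from the Phase 1 inventory, and it assumes that tid=13 runs to completion once woken, so that its thread handle becomes signalled.

```python
from typing import Dict, List, Optional

# tid -> handle it is blocked on at halt (subset of the Phase 1 inventory).
WAITS_ON: Dict[int, int] = {1: 0x12A4, 13: 0x12AC, 5: 0x12B8}
# tid -> its own thread handle (assumed to signal when the thread exits).
THREAD_HANDLE_OF: Dict[int, int] = {13: 0x12A4}


def wake_chain(signaled_handle: int) -> List[int]:
    """List the tids that wake, in order, if signaled_handle is signalled."""
    chain: List[int] = []
    handle: Optional[int] = signaled_handle
    while handle is not None:
        waiters = [t for t, h in WAITS_ON.items() if h == handle and t not in chain]
        if not waiters:
            break
        tid = waiters[0]  # sketch follows a single waiter per handle
        chain.append(tid)
        handle = THREAD_HANDLE_OF.get(tid)  # next link: that thread's own handle
    return chain
```

Calling `wake_chain(0x12AC)` yields tid=13 followed by tid=1, while `wake_chain(0x12B8)` yields only tid=5, which is why the two wedges are treated as independent.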

Phase 3 — candidate-signaler fire counts (lr-trace)

| Producer | Fires | Distinct LRs | AUDIT-050 reachability | Comment |
|---|---|---|---|---|
| sub_82452DC0 | 8 | 0x82448120 (4), 0x82460cc8 (2), 0x821790b8 (1), 0x821cb1d0 (1) | Only reachable NtSetEvent caller in 0x82450000-0x8245C000 (AUDIT-050) | Tid=13 itself calls it 1× from sub_821CB030+0x19C right before waiting on 0x12AC. Submits work, never gets reply. |
| sub_82458B90 | 1 | 0x82457f18 (sub_82457EF0+0x24) | direct NtSetEvent caller | fires once but not on 0x12AC |
| sub_82453910 | 0 | | direct NtSetEvent caller; 5 static callers (sub_821A5150, sub_821C8388, sub_821CBA08+0x1E8, sub_82173990+0x208, sub_821C4AE0+0xE8) | inert; sub_821CBA08+0x1E8 is in the 0x12AC chain |
| sub_82458A70 | 0 | | called from sub_82450B68+0x310 AND sub_82450550+0x44 | inert; sub_82450B68 is in 0x12B8's create-chain |
| sub_824566D0 | 0 | | direct | inert |
| sub_824500E8 | 0 | | direct (0 static callers; dead?) | inert |
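
The fire counts above are a straightforward group-by over the lr-trace records. A minimal sketch follows, assuming each fire can be reduced to a (producer PC, LR) pair; that flattened shape is hypothetical and is not the actual schema of signaler-fires.jsonl. The pairs are rebuilt here from the counts in the table.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical flattened fires: one (producer_pc, lr) pair per recorded hit.
FIRES: List[Tuple[int, int]] = (
    [(0x82452DC0, 0x82448120)] * 4
    + [(0x82452DC0, 0x82460CC8)] * 2
    + [(0x82452DC0, 0x821790B8)]
    + [(0x82452DC0, 0x821CB1D0)]
    + [(0x82458B90, 0x82457F18)]
)

CANDIDATES = [0x82452DC0, 0x82458B90, 0x82453910,
              0x82458A70, 0x824566D0, 0x824500E8]


def fire_counts(fires: List[Tuple[int, int]], candidates: List[int]) -> Dict[int, int]:
    """Count fires per candidate producer, keeping zero-fire entries."""
    per_pc = Counter(pc for pc, _ in fires)
    return {pc: per_pc.get(pc, 0) for pc in candidates}


def inert(fires: List[Tuple[int, int]], candidates: List[int]) -> List[int]:
    """Candidates that never fired in the run."""
    return [pc for pc, n in fire_counts(fires, candidates).items() if n == 0]
```

Keeping the zero-fire candidates in the result matters here, since the inert producers are the interesting ones.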

Static-graph triangulation for 0x12AC signaler

  • sub_82452DC0 has 34 static callers, including 2 sites inside sub_821CB030 (+0x19C and +0x2BC). Tid=13 already drives the +0x19C site once. The signal that should wake tid=13 must therefore originate on a worker thread, somewhere among sub_82452DC0's bl descendants: the work-submitter's queue is supposed to hand the job to a worker thread that ultimately calls NtSetEvent on the same KEVENT registered at [guest-context-base + N].
  • sub_82453910 is statically reachable from sub_821CBA08+0x1E8, and sub_821CBA08 is a frame in the 0x12AC creator chain. It fires 0 times in our run even though that chain executes (sub_821CBA08 runs at least once on tid=13's path to the 0x12AC creation). Worth tracing why the sub_821CBA08+0x1E8 site is never reached.

Top wedges + signaler shortlist for AUDIT-060

  • Keystone γ-wedge: handle 0x12AC (Event/Auto), created at sub_821CB030+0x128 and waited at sub_821CB030+0x1AC. Class context silph::GamePart_Title::UImpl. signal_attempts=0. Waking it unblocks tid=13 → tid=1 (0x12A4 Thread) → main thread.
  • Secondary γ-wedge (independent): handle 0x12B8 (Event/Auto), created at sub_82458068+0x8C, waited at sub_82458B08+0x2C, entirely within worker cluster on tid=5.

Best-candidate NtSetEvent producers (shortlist of 5)

  1. sub_82452DC0 (PC 0x82452DC0) — the master work-submitter, 8 fires in ours vs ~50-60 canary (AUDIT-056). Sole statically-reachable NtSetEvent caller per AUDIT-050. The expected signaler chain is inside its callee tree, fired from a worker thread that consumes the queued job. Investigate why our 8 fires don't produce a wake on 0x12AC.
  2. sub_82453910 (NtSetEvent caller) — reachable from sub_821CBA08+0x1E8 (same chain as 0x12AC creator). 0 fires in ours. Possibly the direct signaler for 0x12AC if the chain executes far enough.
  3. sub_82458A70 (NtSetEvent caller) — reachable from sub_82450B68+0x310 (same chain as 0x12B8 creator). 0 fires. Likely the direct signaler for 0x12B8.
  4. sub_82458B90 (NtSetEvent caller) — 1 fire from sub_82457EF0+0x24 in our run. Not on tracked handles; possibly auxiliary.
  5. sub_824566D0 (NtSetEvent caller) — 0 fires; called from sub_82456AD0/sub_82456670/sub_82456AA4. Auxiliary.

Cross-handle BFS observation

The 0x12AC keystone wedge and the 0x12B8 worker-cluster wedge live in different islands (GamePart_Title vs raw worker cluster). The fact that the four NtSetEvent producers most-statically-linked to the wedge create-chains all fire 0× in our run (only sub_82452DC0 and sub_82458B90 fire, neither on the wedge handles) confirms AUDIT-050's framing: the cluster is half-bootstrapped — work-submitter live, downstream worker callbacks dead.
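
The "half-bootstrapped" symptom can be illustrated with a toy host-side model. This is not guest code and makes no claim about how the title's queue is implemented; it only shows that a submitter which enqueues work and then waits will hang if no consumer ever dequeues. The guest wait is INFINITE, so a timeout stands in for the deadlock here.

```python
import queue
import threading


def submit_and_wait(start_worker: bool, timeout: float) -> bool:
    """Enqueue one job, then wait for its completion event.

    Returns True if the event was signalled, False if the wait timed out.
    """
    jobs: "queue.Queue" = queue.Queue()
    done = threading.Event()  # stands in for the event the submitter waits on

    def worker() -> None:
        job = jobs.get()  # dequeue the submitted job
        job()             # processing the job is what signals the event

    if start_worker:
        threading.Thread(target=worker, daemon=True).start()

    jobs.put(done.set)         # the submitter enqueues work...
    return done.wait(timeout)  # ...then blocks until signalled or timed out


print(submit_and_wait(start_worker=True, timeout=2.0))   # consumer running
print(submit_and_wait(start_worker=False, timeout=0.1))  # consumer never started
```

With the consumer running the wait returns promptly; without it the submitter's job sits in the queue and the wait expires, which mirrors the pattern of a live work-submitter with zero downstream signaler fires.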

Surprises / notes

  • Phase 1 with --quiet produced 0-byte output. --quiet suppresses the deadlock-halt diagnostic dump too — drop it for any future deadlock investigation runs. (Re-ran without --quiet; 466 lines.)
  • --lr-trace=0x824a9f6c (mid-function PC) recorded lr=0x824a9f6c self-reference instead of caller LR — would have been useless for caller triangulation. The created stack (6 frames) dump in --trace-handles-focus is the better data source.
  • Handle namespace per-run drift confirmed: AUDIT-049 saw 0x1280/0x1288, this probe sees 0x12A4/0x12AC. AUDIT-058 saw 0x12A4. The keystone-wedge function context (sub_821CB030 / sub_821C4EB0 / silph::GamePart_Title::UImpl) is stable across all three audits.
  • AUDIT-049/050/058's claim that the cluster is half-bootstrapped is reinforced by Phase 3 fire counts: the work-submitter fires, but none of its downstream NtSetEvent producers fire. This is exactly the symptom expected if the work-submitter enqueues but the worker-side dequeue/process loop never runs (or runs on the wrong queue).

Artifacts

xenia-rs/audit-runs/audit-059-gamma-wedge/
  ours-phase1-500m.stdout / .stderr         (500M-instr halt-on-deadlock dump)
  ours-phase1.stdout / .stderr              (100M repro, identical wedges)
  ours-phase2.stdout / .stderr              (focus + create stacks; lr-create.jsonl)
  ours-phase2b.stdout / .stderr             (NtCreateEvent ENTRY lr-trace; lr-create-entry.jsonl)
  ours-phase3.stdout / .stderr              (signaler-fires.jsonl: 8+1+0+0+0+0)
  ours-summary.md                            (this file)

Recommended AUDIT-060: trace sub_82452DC0's callee tree on tid=13 (the +0x19C fire) and walk the work-queue consumer in the worker cluster; identify which worker thread is supposed to dequeue and signal 0x12AC, and why none do. Cross-reference with AUDIT-056's canary 5.6× throughput gap.