xenia-rs/audit-runs/audit-059-gamma-wedge/ours-summary.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# AUDIT-059 — γ-wedge Probe O Summary
Date: 2026-05-11
Mode: READ-ONLY (xenia-rs HEAD untouched). Branch `chore/portable-snapshot @ e6d43a2`.
Binary: `xenia-rs/target/release/xenia-rs-probe` (renamed to survive Stop hook).
Inputs: `Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso`, `xenia-rs/sylpheed.db`.
## Phase 1 — wedge identification (`--halt-on-deadlock`, `--trace-handles`)
Run halts on deadlock well before n=500M. All 12 HW threads parked; 9 Blocked + 3 Ready (spin?).
Snapshot reproduces identically at -n=100M and -n=500M.
### Blocked-thread inventory at halt
| hw/idx | tid | PC | Handle(s) waited | Notes |
|--------|-----|-------------|--------------------------------------------------------|-------|
| 0/0 | 1 | 0x824ac578 | 0x000012a4 (Thread, id=13) | **main thread join on tid=13** |
| 0/1 | 11 | 0x824d2a94 | 0x828a3244 + 0x828a3220 | audio host-pump pair (AUDIT-032/048) |
| 1/0 | 2 | 0x824a95f8 | 0x8287093c | helper |
| 1/1 | 13 | 0x824ac578 | **0x000012ac (Event/Auto)** | **keystone γ-wedge** |
| 2/0 | 7 | 0x824cd4f4 | 0x42450b5c (deadline) | audio? has deadline |
| 2/1 | 8 | 0x824ab214 | 0x000010e4 + 0x000010d0 (WaitAll) | sema OK + manual-event NO_SIG |
| 3/0 | 4 | 0x824ac578 | 0x00001028 (Semaphore) | sema released 7× consumed 8 — race? |
| 3/1 | 5 | 0x824ac578 | **0x000012b8 (Event/Auto)** | worker-cluster γ-wedge |
| 5/0 | 3 | 0x824ac578 | 0x00001020 (Event/Manual) | NO_SIG |
### Per-handle audit (`--trace-handles-focus`)
`signal_attempts` (primary + ghost) for each wedge at halt:
| Handle | Kind | Waiters | signal_attempts | Verdict |
|--------|--------------|---------|-----------------|---------|
| 0x1020 | Event/Manual | 1 (tid=3) | 0 | γ-wedge |
| 0x1040 | Event/Auto | 0 (32 waits historic) | 0 | γ-wedge |
| 0x10a8 | Event/Auto | 0 (7 waits historic) | 0 | γ-wedge |
| 0x10e4 | Event/Manual | 1 (tid=8) | 0 | γ-wedge |
| 0x12a4 | Thread | 1 (tid=1, main) | 0 | downstream of 0x12ac |
| **0x12ac** | **Event/Auto** | **1 (tid=13)** | **0** | **keystone γ-wedge** |
| 0x12b8 | Event/Auto | 1 (tid=5) | 0 | worker-cluster γ-wedge |
| 0x1028 | Semaphore | 1 (tid=4) | 7 (works) | sema not the bug |
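The verdict column above follows a simple rule: a handle that has been waited on but has zero `signal_attempts` counts as starved. A minimal sketch of that rule, assuming a hypothetical `HandleStat` record (this is not the probe's actual output schema) and filling `historic_waits` only where the table reports it:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandleStat:
    """Hypothetical record mirroring one row of the per-handle table."""
    handle: int
    kind: str
    live_waiters: int
    historic_waits: int  # 0 here means "not reported in the table"
    signal_attempts: int


def verdict(stat: HandleStat) -> str:
    """Return 'starved' if the handle was waited on but never signalled."""
    ever_waited = stat.live_waiters > 0 or stat.historic_waits > 0
    if ever_waited and stat.signal_attempts == 0:
        return "starved"
    return "signalled"


# Values copied from the table above.
STATS = [
    HandleStat(0x1020, "Event/Manual", 1, 0, 0),
    HandleStat(0x1040, "Event/Auto", 0, 32, 0),
    HandleStat(0x10A8, "Event/Auto", 0, 7, 0),
    HandleStat(0x10E4, "Event/Manual", 1, 0, 0),
    HandleStat(0x12A4, "Thread", 1, 0, 0),
    HandleStat(0x12AC, "Event/Auto", 1, 0, 0),
    HandleStat(0x12B8, "Event/Auto", 1, 0, 0),
    HandleStat(0x1028, "Semaphore", 1, 0, 7),
]

starved = [hex(s.handle) for s in STATS if verdict(s) == "starved"]
```

The rule also flags the Thread handle 0x12A4. The table treats that one as downstream of 0x12AC rather than as an independent wedge, so the rule is only a first filter.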
## Phase 2 — create-site triangulation (focus dump + lr-trace)
### Handle 0x12AC (tid=13 keystone wedge)
- **Create-call-site PC**: `0x821cb158` = `sub_821CB030+0x128` (bl NtCreateEvent wrapper `sub_824A9F18`).
- **Wait-call-site PC**: `0x821cb1dc` = `sub_821CB030+0x1AC` (bl `sub_824AC540` INFINITE wait).
- **Created on stack frame**: r3=0x715a7a60 (stack-local OUT handle slot, tid=13's stack).
- **Creator full chain** (frames 1..5 from per-handle `created stack`):
```
sub_821CB030+0x12c (this fn creates AND waits)
sub_821CBA08+0xd8
sub_821CC3F8+0x5c (GamePart_Title)
sub_821C4EB0+0x68 (UImpl@GamePart_Title@silph) <- vtable class .?AUImpl@GamePart_Title@silph@@
sub_821749C0+0xc0
```
- Class identification (from wait-thread frame-3/4 saved-r29 vtable probes):
  - frame 3 r29 vtable 0x820a3dc8 = `.?AVGamePart_Title@silph@@`
  - frame 4 r29 vtable 0x820a3e00 = `.?AUImpl@GamePart_Title@silph@@`
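The call-site PCs above are `function base + offset`, while the `created stack` dump shows `+0x12c` for the same function. One reading, assumed here rather than confirmed by the probe, is that the stack frames record return addresses, which sit one 4-byte PowerPC instruction after the `bl`. A quick arithmetic check (helper names are illustrative):

```python
PPC_INSN_BYTES = 4  # PowerPC instructions, including bl, are 4 bytes wide


def site_pc(func_base: int, offset: int) -> int:
    """Absolute PC of a site given its function base and offset."""
    return func_base + offset


def return_address(call_pc: int) -> int:
    """Address the link register holds after a bl at call_pc."""
    return call_pc + PPC_INSN_BYTES


SUB_821CB030 = 0x821CB030

create_pc = site_pc(SUB_821CB030, 0x128)  # bl NtCreateEvent wrapper
wait_pc = site_pc(SUB_821CB030, 0x1AC)    # bl INFINITE wait

assert create_pc == 0x821CB158
assert wait_pc == 0x821CB1DC
# The created-stack frame for this function shows +0x12c, i.e. call site + 4.
assert return_address(create_pc) - SUB_821CB030 == 0x12C
```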
### Handle 0x12B8 (tid=5 worker-cluster wedge)
- **Create-call-site PC**: inside `sub_82458068+0x8C` (bl NtCreateEvent wrapper).
- **Wait-call-site PC**: inside `sub_82458B08+0x2C` (bl wait wrapper).
- **Creator full chain**:
```
sub_82458068+0x8c
sub_82458960+0x94
sub_82451238+0x1c8
sub_82450B68+0x1a0
sub_82450A68+0xcc
```
- Lives entirely in worker cluster 0x82450000-0x8245C000.
### Handle 0x12A4 (tid=1 main thread join)
- Created via `XCreateThread` (Thread kind), reference id 13.
- Wait chain (from WAIT-THREAD):
```
sub_82173990+0x2d4 (program top — AUDIT-033 gateway)
sub_822F1AA8+0xa8
sub_8216EA68+0x3ac
entry_point+0x198
```
- Wait-frame-3 r29 vtable 0x820a183c = `.?AVSilph@silph@@`.
- Consistent with the AUDIT-049 finding that handle `0x1280` was the thread handle (handle values drift between runs). This wait is downstream of 0x12AC: waking tid=13 in turn wakes the main thread.
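The dependency between the main-thread join and the keystone event can be expressed as a small wait-for walk. The sketch below assumes a hypothetical `WAITS` map (waiter tid to handle and owning tid) holding only the two edges described above. It is not a structure the probe emits.

```python
# waiter tid -> (handle waited on, tid that owns/signals it, or None if unknown)
WAITS = {
    1: (0x12A4, 13),     # main thread joins thread id=13
    13: (0x12AC, None),  # tid=13 waits on an event with no observed signaler
}


def keystone(tid, waits):
    """Follow the wait-for chain from tid to the last blocked thread.

    Returns (blocked_tid, handle) for the wait that nothing in the map
    can satisfy. Raises KeyError if tid is not blocked at all.
    """
    seen = set()
    while True:
        seen.add(tid)
        handle, owner = waits[tid]
        # Stop when the owner is unknown, not blocked itself, or a cycle.
        if owner is None or owner not in waits or owner in seen:
            return tid, handle
        tid = owner


root_tid, root_handle = keystone(1, WAITS)
```

Starting from the main thread, the walk ends at tid=13 and handle 0x12AC, which is why that event is treated as the keystone.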
## Phase 3 — candidate-signaler fire counts (lr-trace)
| Producer | Fires | Distinct LRs | AUDIT-050 reachability | Comment |
|----------|-------|--------------|------------------------|---------|
| sub_82452DC0 | **8** | 0x82448120 (4), 0x82460cc8 (2), 0x821790b8 (1), **0x821cb1d0 (1)** | Only reachable NtSetEvent caller in 0x82450000-0x8245C000 (AUDIT-050) | Tid=13 itself calls it 1× from sub_821CB030+0x19C right before waiting on 0x12AC. Submits work, never gets reply. |
| sub_82458B90 | 1 | 0x82457f18 (sub_82457EF0+0x24) | direct NtSetEvent caller | fires once but not on 0x12AC |
| sub_82453910 | 0 | — | direct NtSetEvent caller; 5 static callers (sub_821A5150, sub_821C8388, sub_821CBA08+0x1E8, sub_82173990+0x208, sub_821C4AE0+0xE8) | **inert** — sub_821CBA08+0x1E8 is in the 0x12AC chain |
| sub_82458A70 | 0 | — | called from sub_82450B68+0x310 AND sub_82450550+0x44 | **inert** — sub_82450B68 is in 0x12B8's create-chain |
| sub_824566D0 | 0 | — | direct | inert |
| sub_824500E8 | 0 | — | direct (0 static callers — dead?) | inert |
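The Fires and Distinct LRs columns are a group-by over individual lr-trace hits. A minimal sketch of that aggregation, assuming a hypothetical `(producer, lr)` record shape (the real `signaler-fires.jsonl` schema is not shown in this summary) and using the counts from the table:

```python
from collections import Counter, defaultdict

# One tuple per recorded fire: (producer function, link register value).
RECORDS = (
    [("sub_82452DC0", 0x82448120)] * 4
    + [("sub_82452DC0", 0x82460CC8)] * 2
    + [("sub_82452DC0", 0x821790B8)]
    + [("sub_82452DC0", 0x821CB1D0)]
    + [("sub_82458B90", 0x82457F18)]
)


def fires_by_producer(records):
    """Group fires per producer, counting hits for each distinct LR."""
    table = defaultdict(Counter)
    for producer, lr in records:
        table[producer][lr] += 1
    return table


def call_site(lr: int) -> int:
    """PC of the bl that produced this LR (one 4-byte instruction earlier)."""
    return lr - 4


table = fires_by_producer(RECORDS)
totals = {producer: sum(lrs.values()) for producer, lrs in table.items()}

# LR 0x821cb1d0 maps back to the sub_821CB030+0x19C call site.
tid13_site_offset = call_site(0x821CB1D0) - 0x821CB030
```

Producers with 0 fires never appear in `RECORDS`, so they have to be added from the static candidate list when the table is rendered.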
### Static-graph triangulation for 0x12AC signaler
- **`sub_82452DC0`** has 34 static callers, including 2 sites inside `sub_821CB030` (+0x19C and +0x2BC). Tid=13 already drives the +0x19C site once. The signal that should wake tid=13 must originate on a worker thread, inside one of sub_82452DC0's `bl` descendants: the work-submitter is supposed to queue work for a worker thread, which ultimately calls NtSetEvent on the same KEVENT registered at `[guest-context-base + N]`.
- **`sub_82453910`** is statically reachable from `sub_821CBA08+0x1E8` (a 0x12AC creator-chain frame). It records 0 fires in our run even though the chain executes (sub_821CBA08 fires at least once on tid=13's path through 0x12AC creation). Worth tracing why the `sub_821CBA08+0x1E8` call site is never reached.
## Top wedges + signaler shortlist for AUDIT-060
- **Keystone γ-wedge**: handle **0x12AC** (Event/Auto), created at `sub_821CB030+0x128` and waited at `sub_821CB030+0x1AC`. Class context `silph::GamePart_Title::UImpl`. signal_attempts=0. Waking it unblocks tid=13, which in turn releases the main thread (tid=1) from its join on thread handle 0x12A4.
- **Secondary γ-wedge (independent)**: handle **0x12B8** (Event/Auto), created at `sub_82458068+0x8C`, waited at `sub_82458B08+0x2C`, entirely within worker cluster on tid=5.
### Best-candidate NtSetEvent producers (shortlist of 5)
1. **`sub_82452DC0`** (PC 0x82452DC0) — the master work-submitter, 8 fires in ours vs ~50-60 canary (AUDIT-056). Sole statically-reachable NtSetEvent caller per AUDIT-050. The expected signaler chain is *inside* its callee tree, fired from a worker thread that consumes the queued job. **Investigate why our 8 fires don't produce a wake on 0x12AC.**
2. **`sub_82453910`** (NtSetEvent caller) — reachable from `sub_821CBA08+0x1E8` (same chain as 0x12AC creator). 0 fires in ours. Possibly the *direct* signaler for 0x12AC if the chain executes far enough.
3. **`sub_82458A70`** (NtSetEvent caller) — reachable from `sub_82450B68+0x310` (same chain as 0x12B8 creator). 0 fires. Likely the *direct* signaler for 0x12B8.
4. **`sub_82458B90`** (NtSetEvent caller) — 1 fire from `sub_82457EF0+0x24` in our run. Not on tracked handles; possibly auxiliary.
5. **`sub_824566D0`** (NtSetEvent caller) — 0 fires; called from sub_82456AD0/sub_82456670/sub_82456AA4. Auxiliary.
### Cross-handle BFS observation
The 0x12AC keystone wedge and the 0x12B8 worker-cluster wedge live in *different islands* (GamePart_Title vs raw worker cluster). Only `sub_82452DC0` and `sub_82458B90` fire in our run, neither on the wedge handles. **The other four NtSetEvent producers all fire 0×**, including the two that are statically linked to the wedge create-chains (`sub_82453910` for 0x12AC, `sub_82458A70` for 0x12B8). This confirms AUDIT-050's framing: **the cluster is half-bootstrapped, with the work-submitter live and the downstream worker callbacks dead**.
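The island observation can be checked with a breadth-first search over the static call edges. The sketch below uses only the edges quoted in this summary (create-chains plus the listed producer call sites), so the graph is deliberately partial. The `CALLS` map and helper name are illustrative.

```python
from collections import deque

# caller -> callees, restricted to edges named in this summary.
CALLS = {
    # 0x12AC create-chain (GamePart_Title island)
    "sub_821749C0": ["sub_821C4EB0"],
    "sub_821C4EB0": ["sub_821CC3F8"],
    "sub_821CC3F8": ["sub_821CBA08"],
    "sub_821CBA08": ["sub_821CB030", "sub_82453910"],
    "sub_821CB030": ["sub_82452DC0"],
    # 0x12B8 create-chain (worker-cluster island)
    "sub_82450A68": ["sub_82450B68"],
    "sub_82450B68": ["sub_82451238", "sub_82458A70"],
    "sub_82451238": ["sub_82458960"],
    "sub_82458960": ["sub_82458068"],
    "sub_82450550": ["sub_82458A70"],
}

PRODUCERS = {
    "sub_82452DC0", "sub_82453910", "sub_82458A70",
    "sub_82458B90", "sub_824566D0", "sub_824500E8",
}


def reachable(root, calls):
    """Breadth-first set of functions reachable from root (root included)."""
    seen = {root}
    queue = deque([root])
    while queue:
        fn = queue.popleft()
        for callee in calls.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


title_island = reachable("sub_821749C0", CALLS) & PRODUCERS
worker_island = reachable("sub_82450A68", CALLS) & PRODUCERS
```

With these edges the two create-chains reach disjoint producer sets, which matches the "different islands" reading. A full static graph could of course add cross-links that this partial view does not contain.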
## Surprises / notes
- Phase 1 with `--quiet` produced 0-byte output. `--quiet` suppresses the deadlock-halt diagnostic dump too — drop it for any future deadlock investigation runs. (Re-ran without `--quiet`; 466 lines.)
- `--lr-trace=0x824a9f6c` (mid-function PC) recorded `lr=0x824a9f6c` self-reference instead of caller LR — would have been useless for caller triangulation. The `created stack (6 frames)` dump in `--trace-handles-focus` is the better data source.
- Handle namespace per-run drift confirmed: AUDIT-049 saw 0x1280/0x1288, this probe sees 0x12A4/0x12AC. AUDIT-058 saw 0x12A4. The keystone-wedge *function context* (sub_821CB030 / sub_821C4EB0 / `silph::GamePart_Title::UImpl`) is stable across all three audits.
- AUDIT-049/050/058's claim that the cluster is half-bootstrapped is reinforced by Phase 3 fire counts: the work-submitter fires, but **none of the NtSetEvent producers linked to the wedge create-chains fire** (`sub_82458B90` fires once, but not on a wedge handle). This is exactly the symptom expected if the work-submitter enqueues but the worker-side dequeue/process loop never runs (or runs on the wrong queue).
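The half-bootstrapped symptom can be reproduced in a toy model: a submitter enqueues a job whose completion sets an event, and the waiter only wakes if a worker actually drains the queue. This is a host-side illustration with Python's `threading` and `queue` modules, not guest code. The short timeout stands in for the guest's INFINITE wait so that the demo terminates.

```python
import queue
import threading


def run(start_worker: bool, timeout: float) -> bool:
    """Submit one job and wait for its completion event.

    Returns True if the event was signalled before the timeout.
    """
    jobs = queue.Queue()
    done = threading.Event()

    def worker():
        job = jobs.get()  # dequeue one job
        job()             # running it signals the waiter

    if start_worker:
        threading.Thread(target=worker, daemon=True).start()

    jobs.put(done.set)         # submitter side: always enqueues
    return done.wait(timeout)  # waiter side: blocks until signalled


healthy = run(start_worker=True, timeout=2.0)
wedged = run(start_worker=False, timeout=0.05)
```

In the wedged case the submit succeeds and nothing errors. The only visible effect is a wait that never completes, which is the same shape as signal_attempts=0 on 0x12AC.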
## Artifacts
```
xenia-rs/audit-runs/audit-059-gamma-wedge/
ours-phase1-500m.stdout / .stderr (500M-instr halt-on-deadlock dump)
ours-phase1.stdout / .stderr (100M repro, identical wedges)
ours-phase2.stdout / .stderr (focus + create stacks; lr-create.jsonl)
ours-phase2b.stdout / .stderr (NtCreateEvent ENTRY lr-trace; lr-create-entry.jsonl)
ours-phase3.stdout / .stderr (signaler-fires.jsonl: 8+1+0+0+0+0)
ours-summary.md (this file)
```
Recommended AUDIT-060: trace `sub_82452DC0`'s callee tree on tid=13 (the +0x19C fire) and walk the work-queue consumer in the worker cluster; identify which worker thread is supposed to dequeue and signal 0x12AC, and why none do. Cross-reference with AUDIT-056's canary 5.6× throughput gap.