Merge audit-2026-05-fix/p2-session-closeout

MechaCat02
2026-05-03 17:35:37 +02:00


@@ -3890,3 +3890,87 @@ plateau persists because:
the addic/subfic class (canary semantics are a 64-bit add against
guest memory the Mm layer doesn't fully model yet).
---
## Follow-up session 2026-05-03 — outcome
Three audit IDs closed across 3 commits, merged to master with `--no-ff`.
HEAD: `8668550`. Tests: 556 → 561 (+5 from new wall-clock + ghost-trail tests).
### Audit IDs landed
| ID | Commit | Description |
|---|---|---|
| **GPUBUG-DRAIN-001** | `7a1b6b3` | VdSwap PM4 fallback warning silenced under `--parallel`. New `drain_until_wptr(target, time_budget)` mirrors canary's `WorkerThreadMain` predicate; vd_swap skips PM4 ring injection (unreliable when ring backs up under --parallel) and uses direct `notify_xe_swap`. The slot-0 fetch-constant patch is deferred (GPUBUG-FETCH-PATCH-001). DrainFence handler publishes the digest mirror before reply (was racing the CPU's post-drain digest_snapshot read). |
| **KRNBUG-AUDIT-001** | `d1105aa` | Diagnostic instrumentation: `--trace-handles-focus=<LIST>` flag + per-handle DIAGNOSIS report. `record_signal` falls through to ghost-trail capture for focused handles even when no `record_create` exists. Producer-class classification (GuestExport / KernelInternal). Distinguishes "guest never tried" from "signal landed but missed waiter" in one run. |
| **KRNBUG-D08** | `27d3608` | V-sync wall-clock under `--parallel`. Lockstep stays on the deterministic instruction-count proxy (sylpheed goldens unchanged). `--parallel` switches to wall-clock via `tick_vsync_wallclock`, raising delivered v-syncs from ~2 to 17 at -n 30M. `INTERRUPT_QUEUE_CAP=4` still bottlenecks burst delivery. |
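
The GPUBUG-DRAIN-001 row describes a drain that stops at a target write pointer or when a time budget runs out. A minimal sketch of that shape follows. Only the name `drain_until_wptr` and its two parameters come from the notes; `Ring`, `step`, and `DrainOutcome` are hypothetical stand-ins, and the sketch makes no claim to match canary's `WorkerThreadMain` predicate.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the command ring; the real type differs.
pub struct Ring {
    pub rptr: u32,
    pub wptr: u32,
}

impl Ring {
    /// Consume one packet if any is pending. Returns false when the ring is empty.
    fn step(&mut self) -> bool {
        if self.rptr == self.wptr {
            return false;
        }
        self.rptr = self.rptr.wrapping_add(1);
        true
    }
}

/// Why the drain loop stopped (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum DrainOutcome {
    ReachedTarget,
    RingEmpty,
    BudgetExhausted,
}

/// Drain packets until the read pointer reaches `target`, the ring runs dry,
/// or `time_budget` has elapsed, whichever comes first.
pub fn drain_until_wptr(ring: &mut Ring, target: u32, time_budget: Duration) -> DrainOutcome {
    let deadline = Instant::now() + time_budget;
    loop {
        if ring.rptr == target {
            return DrainOutcome::ReachedTarget;
        }
        if Instant::now() >= deadline {
            return DrainOutcome::BudgetExhausted;
        }
        if !ring.step() {
            return DrainOutcome::RingEmpty;
        }
    }
}
```

Returning a distinct outcome for each exit lets the caller decide whether a fallback (such as the direct `notify_xe_swap` path) is needed instead of logging a warning.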
### Parked-waiter producer-trace finding
Empirical run at -n 500M lockstep with the new
`--trace-handles-focus=0x1004,0x100c,0x15e4,0x42450b5c`:
```
handle=0x00001004 kind=Event/Manual waiters=1 signaled=false
signal_attempts=0 (primary=0, ghost=0) waits=1 wakes=0
created cycle=0 tid=1 lr=0x824a9f6c src=NtCreateEvent
timeline: cycle=0 tid=10 lr=0x824ac578 src=do_wait_single[wait]
GuestExport=0 KernelInternal=0 waits=1
=> producer is a missing kernel signal source (or BST-paradox upstream)
```
Same shape for 0x100c and 0x15e4. 0x42450b5c shows `<UNCREATED>` +
`<AUDIT_BLIND>` (waiter parked via a non-`do_wait_single` path).
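
The report above reduces to a few counters per handle. The sketch below shows one way those counters could map to a verdict, assuming the field names from the report; the `HandleStats` and `Diagnosis` types are hypothetical, and the `<UNCREATED>` / `<AUDIT_BLIND>` case is deliberately not modelled.

```rust
/// Per-handle counters, named after the fields in the DIAGNOSIS report.
/// The struct itself is a hypothetical illustration.
#[derive(Debug, Default, Clone, Copy)]
pub struct HandleStats {
    pub guest_export_signals: u32,
    pub kernel_internal_signals: u32,
    pub waits: u32,
    pub wakes: u32,
}

/// Hypothetical verdicts; the real report prints free-form text.
#[derive(Debug, PartialEq, Eq)]
pub enum Diagnosis {
    /// No wait was recorded through the audited path.
    NoRecordedWait,
    /// A waiter is parked and nothing ever tried to signal the handle.
    ProducerMissing,
    /// Signals were attempted but no waiter woke up.
    SignalMissedWaiter,
    /// At least one waiter was woken.
    Woken,
}

pub fn diagnose(s: &HandleStats) -> Diagnosis {
    let signal_attempts = s.guest_export_signals + s.kernel_internal_signals;
    if s.waits == 0 {
        Diagnosis::NoRecordedWait
    } else if signal_attempts == 0 {
        Diagnosis::ProducerMissing
    } else if s.wakes == 0 {
        Diagnosis::SignalMissedWaiter
    } else {
        Diagnosis::Woken
    }
}
```

With the counters reported for `0x1004` (`waits=1`, `wakes=0`, no signal attempts of either class), this sketch yields `ProducerMissing`, matching the "missing kernel signal source" line in the report.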
**Conclusion**: hypothesis (A) confirmed for 3 of the 4 handles. The
producer code path is genuinely missing — NO Nt/KeSetEvent /
KePulseEvent / KeReleaseSemaphore call EVER targets these handles
during 500M instructions of execution. The PPC-vs-Rust traversal
paradox (BST-bug from `project_xenia_rs_sylpheed_event_chain_2026_04_29`)
is **NOT** the cause for these specific handles. The 3 handles share
the same creator (lr=0x824a9f6c, tid=1, all at cycle=0) and the same
wait-call wrapper (lr=0x824ac578) — likely 3 sibling worker threads
all waiting for "work to do" notifications that never come. Most
likely producer-class candidates for next session:
- File I/O completion (`signal_io_completion_event`) — currently a
real implementation but possibly never reached; trace `NtReadFile`
paths to see if completion events would target these handles.
- XAM async task completion — F2/F3 deferred from prior sprint.
- Audio buffer-complete — `XAudioRegisterRenderDriverClient` is a
one-shot stub.
- Timer DPCs: `KeSetTimer` has a real implementation, but APC
delivery may be routed incorrectly.
### Acceptance criteria
| # | Criterion | Met? |
|---|-----------|------|
| 1 | Phase 1: zero "PM4_XE_SWAP not consumed" warnings under canonical invocation | ✅ |
| 2 | Phase 2: per-handle DIAGNOSIS for all four parked handles | ✅ |
| 3 | Phase 3: vsync rate restored under --parallel; n2m golden untouched | ✅ partial — rate up but FIFO cap=4 still bottlenecks |
| 4 | cargo test ≥556 | ✅ 561 |
| 5 | All work merged to master | ✅ |
| 6 | **STRETCH** ≥1 of 4 handles signals | ❌ (the diagnostic run shows why: the producer is missing, so this is not a wake-eligibility bug) |
| 7 | **STRETCH** draws > 0 at -n 100M lockstep | ❌ — gating remains parked-waiter handles |
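
Criterion 3 depends on keeping two timebases apart: an instruction count in lockstep (deterministic, so goldens stay stable) and elapsed wall-clock time under `--parallel`. The sketch below shows one ticker that works over either timebase. `VsyncTicker` and the 60 Hz period are assumptions made for illustration; this is not the body of `tick_vsync_wallclock`.

```rust
/// Hypothetical periodic ticker over an abstract, monotonically increasing
/// timebase: retired instructions in lockstep, elapsed nanoseconds otherwise.
pub struct VsyncTicker {
    period: u64,
    next_due: u64,
}

/// Assumed 60 Hz refresh, expressed in nanoseconds (integer division).
pub const WALLCLOCK_PERIOD_NS: u64 = 1_000_000_000 / 60;

impl VsyncTicker {
    pub fn new(period: u64) -> Self {
        assert!(period > 0, "period must be non-zero");
        Self { period, next_due: period }
    }

    /// Number of v-syncs that became due between the previous call and `now`.
    pub fn due(&mut self, now: u64) -> u64 {
        if now < self.next_due {
            return 0;
        }
        // Count every whole period that has elapsed, so a late caller
        // sees a burst instead of silently losing ticks.
        let n = (now - self.next_due) / self.period + 1;
        self.next_due += n * self.period;
        n
    }
}
```

A caller would feed `due` the instruction counter in lockstep and the elapsed nanoseconds since start under `--parallel`. Because a late call reports several ticks at once, whatever queue receives them needs enough capacity for the burst.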
### Recommended next session
1. **Producer hunt** for the 3 Event/Manual handles. With the
diagnostic in place, the hunt can be focused: identify the guest
function at `lr=0x824ac578` (the shared wait-call wrapper), walk its
callers, and find which kernel signal source should be wired for
each handle. Likely starting points: file I/O completion
(`signal_io_completion_event`), XamTaskSchedule callback (F2),
XAudio buffer-complete.
2. **Raise INTERRUPT_QUEUE_CAP** for `--parallel` workloads — the
3044 dropped vsyncs at -n 30M --parallel suggest the FIFO is the
next bottleneck.
3. **F2/F3** (XAM async completion) per the still-deferred list,
especially if Phase 2 of next session pinpoints a missing XAM
producer.
4. **GPUBUG-FETCH-PATCH-001**: re-enable the PM4_TYPE0
fetch-constant patch via a side-channel (GpuCommand variant)
when draws actually start firing — relevant for bloom/blur N+1.
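
Item 2 concerns a bounded interrupt FIFO that drops entries once full. The sketch below illustrates how a small cap turns a burst into drops and how a counter makes that visible. The `InterruptQueue` type and its drop-newest policy are assumptions; the notes only state that the cap is 4 and that v-syncs were dropped.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded FIFO with a drop counter.
pub struct InterruptQueue<T> {
    cap: usize,
    items: VecDeque<T>,
    dropped: u64,
}

impl<T> InterruptQueue<T> {
    pub fn new(cap: usize) -> Self {
        Self { cap, items: VecDeque::with_capacity(cap), dropped: 0 }
    }

    /// Enqueue an interrupt. Returns false and counts a drop when the queue
    /// is full (assumed policy: the newest entry is the one discarded).
    pub fn push(&mut self, item: T) -> bool {
        if self.items.len() >= self.cap {
            self.dropped += 1;
            return false;
        }
        self.items.push_back(item);
        true
    }

    /// Dequeue the oldest pending interrupt, if any.
    pub fn pop(&mut self) -> Option<T> {
        self.items.pop_front()
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    pub fn dropped(&self) -> u64 {
        self.dropped
    }
}
```

With a cap of 4, a burst of 10 pushes with no intervening pop keeps 4 entries and drops 6. Logging `dropped()` at the end of a run would show whether a larger cap actually removes the bottleneck.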