handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,181 @@
# AUDIT-059 PROBE C — canary γ-wedge signaler triangulation
Date: 2026-05-11
Mode: READ-ONLY canary instrumentation (patch reverted clean).
Canary HEAD before/after: `6de80dffe` (clean tree confirmed).
Patch: audit-030 `--log_lr_on_pc` (30 LOC across 4 files; saved to `canary-patches-applied.diff`).
Build: `cd build && ninja -f build-Debug.ninja xenia_canary` → copied to `xenia-canary-probe`.
## Phase 1 — handle creation at `sub_821CB030+0x128` (post-bl probe PC `0x821CB15C`)
Probe target: PC `0x821CB15C` (post-bl after `bl 0x824A9F18` NtCreateEvent wrapper).
At this PC, `r3` = freshly-created event handle.
**2 fires captured in 130 seconds** (`canary-ntcreate.log`):
| # | Log position | tid (canary) | r3 (handle) | r31 (stack) |
|---|---------------|--------------|-------------|-------------|
| 1 | line 2058 | F8000090 | **0xF8000098** | 0x7064FA70 |
| 2 | line 10567 | F80000CC | **0xF8000108** | 0x708FF990 |
Both fires precede a **synchronous file-IO sequence** (RtlInitAnsiString → NtQueryFullAttributesFile → NtCreateFile for `cache:\aab216c3\5\...` paths).
Both events are then `NtDuplicateObject`'d (the duplicate is the real wait target):
| Original handle | Dup target | Wait-site |
|-----------------|------------|-----------|
| `F8000098` (XObject) | `F80000A0` (XEvent) | tid F8000090, NtClose@line 2081 (fast) |
| `F8000108` (XObject) | `F8000110` (XEvent) | tid F80000CC, NtClose@line 10605 |
## Phase 1b — wait-site at `sub_821CB030+0x1AC` (PC `0x821CB1DC`)
Verifies the wait fires in canary too. 2 fires, both with `lr=0x821CB1D0`:
```
i> F8000090 TRACE-PC-LR pc=821CB1DC lr=821CB1D0 r3=F8000098 r4=FFFFFFFF r5=BC65CDC0
i> F80000C8 TRACE-PC-LR pc=821CB1DC lr=821CB1D0 r3=F8000108 r4=FFFFFFFF r5=BC667CC0
```
`r4=FFFFFFFF` → INFINITE wait timeout. Wait DOES execute in canary — but completes
(matched by subsequent NtClose). This is the AUDIT-041 wait-site `bl 0x824AA330`.
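
A minimal sketch of how the `TRACE-PC-LR` lines above could be parsed for later filtering. The line layout is inferred only from the two sample lines shown here, and `parse_trace_line` and `INFINITE_TIMEOUT` are hypothetical names, not part of the canary patch.

```python
import re

# Layout inferred from the sample lines: "i> <tid> TRACE-PC-LR key=hex key=hex ...".
# Other probes may log a different register set, so fields are parsed generically.
LINE_RE = re.compile(r"^i> (?P<tid>[0-9A-Fa-f]{8}) TRACE-PC-LR (?P<fields>.+)$")

INFINITE_TIMEOUT = 0xFFFFFFFF  # r4 value that the note reads as an infinite wait


def parse_trace_line(line):
    """Return a dict of integer fields, or None if the line is not a TRACE-PC-LR record."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    record = {"tid": int(m.group("tid"), 16)}
    for token in m.group("fields").split():
        key, sep, value = token.partition("=")
        if sep and value:
            record[key] = int(value, 16)
    return record


sample = "i> F8000090 TRACE-PC-LR pc=821CB1DC lr=821CB1D0 r3=F8000098 r4=FFFFFFFF r5=BC65CDC0"
rec = parse_trace_line(sample)
print(hex(rec["pc"]), rec["r4"] == INFINITE_TIMEOUT)
```

Parsing every `key=value` token generically keeps the same helper usable for the NtSetEvent probes, which log `r31` instead of `r4`.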
## Phase 2 — NtSetEvent triangulation
Probe target: NtSetEvent thunk PC `0x8284DF5C` (53,701 fires in 130s).
Cross-checked against the `sub_824AA2F0` (NtSetEvent wrapper) entry probe (20,919 fires).
### Identification of wedge-equivalent handle by NtSetEvent fire pattern
Hypothesis: the dup-XEvent (target of NtDuplicateObject) is what gets signaled.
In `canary-ntsetevent.log`, **dup handle `F8000110`** appears in NtSetEvent exactly **2×**:
```
i> F8000054 TRACE-PC-LR pc=8284DF5C lr=824AA304 r3=F8000110 r5=BC32CC60 r31=7036FDC0
i> F8000084 TRACE-PC-LR pc=8284DF5C lr=824AA304 r3=F8000110 r5=00000002 r31=705AF860
```
`lr=824AA304` = wrapper-internal post-bl PC inside `sub_824AA2F0` (NtSetEvent wrapper).
To get the **caller LR** (i.e. who called the wrapper), probe the wrapper entry `0x824AA2F0`.
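
The "appears exactly 2×" filter can be sketched as a per-handle fire count over the probe log. This assumes the line layout shown above; `fires_per_handle` and `handles_with_fire_count` are hypothetical helper names, and the `F8000020` lines are made-up filler so that the filter has something to reject.

```python
import re
from collections import Counter

# Matches the r3 field only; "r31=" does not match because of the trailing "=".
R3_RE = re.compile(r"TRACE-PC-LR .*\br3=([0-9A-F]{8})\b")


def fires_per_handle(lines):
    """Count probe fires per r3 value (the handle passed to NtSetEvent)."""
    counts = Counter()
    for line in lines:
        m = R3_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def handles_with_fire_count(lines, n):
    """Return the handles that fired exactly n times, sorted for stable output."""
    return sorted(h for h, c in fires_per_handle(lines).items() if c == n)


sample_lines = [
    "i> F8000054 TRACE-PC-LR pc=8284DF5C lr=824AA304 r3=F8000110 r5=BC32CC60 r31=7036FDC0",
    "i> F8000084 TRACE-PC-LR pc=8284DF5C lr=824AA304 r3=F8000110 r5=00000002 r31=705AF860",
    # Filler records (not from the real log): a handle that fires three times.
    "i> F8000010 TRACE-PC-LR pc=8284DF5C lr=824AA304 r3=F8000020 r5=00000000 r31=70000000",
    "i> F8000010 TRACE-PC-LR pc=8284DF5C lr=824AA304 r3=F8000020 r5=00000000 r31=70000000",
    "i> F8000010 TRACE-PC-LR pc=8284DF5C lr=824AA304 r3=F8000020 r5=00000000 r31=70000000",
]
print(handles_with_fire_count(sample_lines, 2))
```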
### Wrapper-entry probe — cross-run structural correlation
In the wrapper-entry run, the handle namespace shifted slightly (per-run slab-allocator
nondeterminism), but the **r31 stack invariant** matches across runs.
The two-fire handle in the wrapper-entry run, `F8000118`, carries exactly the r31 stack frames
`7036FDC0` and `705AF860`:
```
i> F8000054 TRACE-PC-LR pc=824AA2F0 lr=82458D14 r3=F8000118 r4=BC369420 r5=BC32CC60 r31=7036FDC0
i> F8000084 TRACE-PC-LR pc=824AA2F0 lr=8245ED80 r3=F8000118 r4=705AF8B0 r5=00000002 r31=705AF860
```
**Cross-run match by (tid, r31)**: `F8000054@7036FDC0` and `F8000084@705AF860` are the same
two threads/stack-frames signaling the cache-IO completion event in both runs.
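
The (tid, r31) join can be sketched as follows. The records are hand-copied from the four log lines quoted above; `correlate_runs` is a hypothetical helper, and it assumes each run has already been filtered down to the two-fire handle so that (tid, r31) is unique within a run.

```python
def correlate_runs(thunk_run, wrapper_run):
    """Join two probe runs on (tid, r31), which stays stable while handle values shift."""
    index = {(r["tid"], r["r31"]): r for r in wrapper_run}
    matches = []
    for r in thunk_run:
        other = index.get((r["tid"], r["r31"]))
        if other is not None:
            matches.append({
                "tid": r["tid"],
                "r31": r["r31"],
                "thunk_handle": r["r3"],
                "wrapper_handle": other["r3"],
                "caller_lr": other["lr"],  # LR at wrapper entry identifies the caller
            })
    return matches


# Thunk-run records (PC 0x8284DF5C) and wrapper-entry records (PC 0x824AA2F0).
thunk_run = [
    {"tid": 0xF8000054, "r3": 0xF8000110, "r31": 0x7036FDC0},
    {"tid": 0xF8000084, "r3": 0xF8000110, "r31": 0x705AF860},
]
wrapper_run = [
    {"tid": 0xF8000054, "lr": 0x82458D14, "r3": 0xF8000118, "r31": 0x7036FDC0},
    {"tid": 0xF8000084, "lr": 0x8245ED80, "r3": 0xF8000118, "r31": 0x705AF860},
]

for m in correlate_runs(thunk_run, wrapper_run):
    print(f"tid={m['tid']:08X} r31={m['r31']:08X} caller_lr={m['caller_lr']:08X}")
```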
### Resolved canary signalers
| LR | Caller function | Pre-bl insn | Call target |
|----|-----------------|-------------|-----------|
| `0x82458D14` | **`sub_82458B90`** | `bl 0x824AA2F0` @ 0x82458D10 | NtSetEvent wrapper call |
| `0x8245ED80` | **`sub_8245EC10`** | `bl 0x824AA2F0` @ 0x8245ED7C | NtSetEvent wrapper call |
Both LRs are return addresses of NtSetEvent-wrapper call sites (the `bl` sits at LR - 4, as the pre-bl column shows). Each fires once per wedge instance.
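
The step from a captured LR to its call site is plain arithmetic: PowerPC instructions are 4 bytes wide, so the `bl` sits at LR - 4. The sketch below assumes the `sub_XXXXXXXX` names encode the function start address, as they do throughout this note; `call_site_from_lr` and `describe_call_site` are hypothetical helper names.

```python
def call_site_from_lr(lr):
    """Address of the bl instruction that produced this return address."""
    return (lr - 4) & 0xFFFFFFFF  # fixed 4-byte PowerPC instruction width


def describe_call_site(lr, fn_start):
    """Format the call site as function+offset, given the containing function's start."""
    site = call_site_from_lr(lr)
    return f"sub_{fn_start:08X}+0x{site - fn_start:X} (bl @ 0x{site:08X})"


# LRs and function starts taken from the resolved-signalers table.
print(describe_call_site(0x82458D14, 0x82458B90))
print(describe_call_site(0x8245ED80, 0x8245EC10))
```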
## Cross-reference with ours-side (sibling PROBE O findings)
From `ours-summary.md` (Phase 3 candidate-signaler table):
| Producer | Fires in ours | Distinct LRs | Notes |
|----------|---------------|--------------|-------|
| `sub_82458B90` | **1** | 0x82457f18 (sub_82457EF0+0x24) | direct NtSetEvent caller; **fires once but NOT on wedge handle** |
| `sub_8245EC10` | **0** | — | **0 static callers** — indirect-dispatch-only (audit-050 dead) |
### Static caller chains in the ours-side database
```
sub_82458B90 callers:
  └─ sub_82457EF0+0x24 (only caller; sub_82457EF0 itself has 0 static callers; fnptr-array only)
sub_8245EC10 callers:
  └─ NONE STATICALLY
     Located in dispatch_table @ 0x820B5830 [slot 1]
       slot 0: sub_8245F1D0
       slot 1: sub_8245EC10
     Table referenced from:
       - sub_8245F1D0+0x1C (self-ref recursive)
       - sub_8245FEB8+0x100 (stw r11, 0(r31) at 0x8245FFC0; class vptr install)
         sub_8245FEB8 callers: sub_8245FB68 (2 sites), sub_824601A0 (1 site)
           sub_8245FB68 callers: sub_8245F880, sub_8245FAB0
           sub_824601A0 callers: sub_82460118
```
Both signaler functions live in the worker cluster `0x82458xxx-0x8245Exxx`. `sub_8245EC10` is
the slot-1 entry of a 2-slot dispatch_table at `0x820B5830`; the table pointer is installed at
struct offset 0 (vptr) by the constructor `sub_8245FEB8`. The only static caller chain of
`sub_82458B90` goes up through `sub_82457EF0`, which itself has 0 static callers.
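
The "chain dies" observation can be sketched as an upward walk over the static caller map until no further caller is recorded. The map below contains only the edges listed in this note; functions missing from it (for example `sub_8245F880`) simply have no callers recorded here, which is not the same as having none. `chain_tops` is a hypothetical helper name.

```python
def chain_tops(callers, fn):
    """Walk static callers upward; return the functions where the recorded chain ends."""
    tops, seen, stack = set(), set(), [fn]
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue  # guards against cycles such as self-referencing tables
        seen.add(cur)
        parents = callers.get(cur)
        if not parents:
            tops.add(cur)  # zero static callers, or none recorded in this note
        else:
            stack.extend(parents)
    return tops


# Edges copied from the static caller chains above (callee -> direct static callers).
callers = {
    "sub_82458B90": ["sub_82457EF0"],
    "sub_82457EF0": [],  # fnptr-array only
    "sub_8245EC10": [],  # dispatch_table slot 1 only
    "sub_8245FEB8": ["sub_8245FB68", "sub_824601A0"],
    "sub_8245FB68": ["sub_8245F880", "sub_8245FAB0"],
    "sub_824601A0": ["sub_82460118"],
}

print(sorted(chain_tops(callers, "sub_82458B90")))
print(sorted(chain_tops(callers, "sub_8245FEB8")))
```

A function that is its own chain top, like `sub_8245EC10`, is reachable only through indirect dispatch as far as the static graph can tell.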
## Findings
1. **Wedge structural identification**: `sub_821CB030+0x128` creates a per-call file-IO
completion XEvent that is immediately duplicated and submitted to a worker via the call to
`sub_82452DC0` at `sub_821CB030+0x19C` for asynchronous file load. The wait at +0x1AC blocks until
the worker signals the duplicate XEvent.
2. **Canary signalers (the missing piece)**: Two distinct call-sites signal the wedge
in canary. Both wrap `bl 0x824AA2F0` (NtSetEvent wrapper), and each fires once per file-IO completion:
   - `sub_82458B90` (= LR `0x82458D14`)
   - `sub_8245EC10` (= LR `0x8245ED80`)
3. **Static-graph triangulation for ours**:
   - `sub_82458B90` has 1 static caller (`sub_82457EF0+0x24`); the chain dies because
     `sub_82457EF0` has 0 static callers (fnptr-array activation).
   - `sub_8245EC10` has 0 static callers: it is vtable slot 1 in dispatch_table `0x820B5830`,
     installed by the `sub_8245FEB8` ctor; the ctor's reachability chain also dies in the
     `0x82458xxx-0x8245Fxxx` cluster.
4. **The wedge is downstream of AUDIT-050's unreachability island**. Both canary
signalers live in the half-bootstrapped worker cluster. The work-submitter
(`sub_82452DC0`) DOES fire in ours (8× per PROBE O) on tid=13 — but the queued
work never reaches a worker that calls `sub_82458B90` or `sub_8245EC10` because
the worker-side dispatch infrastructure (vtable install via `sub_8245FEB8` ctor;
fnptr-array activation of `sub_82457EF0`) never runs in ours.
5. **AUDIT-058's `sub_825070F0` activation hypothesis is corroborated**: `sub_825070F0`
(AUDIT-057's top missing-thread spawner, 4 workers @ ctx 0xBCE25340) is the
plausible bootstrap for the workers that would receive the queued work and run
the dispatch_table @ `0x820B5830` callbacks. Until that spawn happens in ours,
the worker side stays dead → signal never lands.
## Recommended AUDIT-060
1. **Direct path**: probe `sub_82452DC0+0x19C bl` site in canary (with our existing
`--log_lr_on_pc=0x82452E5C` or post-bl PC) to trace what happens after work submission.
Find which worker thread (one of the 4 spawned by `sub_825070F0`) dequeues the job
and ultimately calls `sub_82458B90` or `sub_8245EC10`.
2. **Indirect path**: probe `sub_8245FEB8` (vptr installer for dispatch_table `0x820B5830`)
in canary AND ours. If it fires in canary but not ours, that confirms the worker-class
constructor is in the unreachability island.
3. **Bootstrap path**: trace what activates `sub_825070F0` in canary (per AUDIT-058 it
fires 1× post-`\\dat\\movie` ResolvePath). Capture LR at `sub_825070F0` entry in
canary, then check that LR's caller-fn for fire count in ours.
## Artifacts
```
xenia-rs/audit-runs/audit-059-gamma-wedge/
  canary-patches-applied.diff   (audit-030 patch record before revert)
  canary-ntcreate.log/.err      (Phase 1: PC 0x821CB15C, 2 fires)
  canary-waitsite.log/.err      (Phase 1b: PC 0x821CB1DC, 2 fires)
  canary-ntsetevent.log/.err    (NtSetEvent thunk PC 0x8284DF5C; 53,701 fires; r3=F8000110 ×2)
  canary-setwrapper.log/.err    (NtSetEvent wrapper PC 0x824AA2F0; 20,919 fires; r3=F8000118 ×2)
  canary-summary.md             (this file)
  ours-summary.md               (sibling PROBE O ours-side findings)
```
Canary HEAD verified `6de80dffe`, working tree clean. xenia-rs untouched.