xenia-rs/audit-runs/audit-059-gamma-wedge/canary-summary.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# AUDIT-059 PROBE C — canary γ-wedge signaler triangulation
Date: 2026-05-11
Mode: READ-ONLY canary instrumentation (patch reverted clean).
Canary HEAD before/after: `6de80dffe` (clean tree confirmed).
Patch: audit-030 `--log_lr_on_pc` (30 LOC across 4 files; saved to `canary-patches-applied.diff`).
Build: `cd build && ninja -f build-Debug.ninja xenia_canary` → copied to `xenia-canary-probe`.
## Phase 1 — handle creation at `sub_821CB030+0x128` (PC `0x821CB15C`)
Probe target: PC `0x821CB15C` (post-bl after `bl 0x824A9F18` NtCreateEvent wrapper).
At this PC, `r3` = freshly-created event handle.
**2 fires captured in 130 seconds** (`canary-ntcreate.log`):
| # | Wallclock pos | tid (canary) | r3 (handle) | r31 (stack) |
|---|---------------|--------------|-------------|-------------|
| 1 | line 2058 | F8000090 | **0xF8000098** | 0x7064FA70 |
| 2 | line 10567 | F80000CC | **0xF8000108** | 0x708FF990 |
Both fires precede a **synchronous file-IO sequence** (RtlInitAnsiString → NtQueryFullAttributesFile → NtCreateFile for `cache:\aab216c3\5\...` paths).
Both events are then `NtDuplicateObject`'d (the duplicate is the real wait target):
| Original handle | Dup target | Wait-site |
|-----------------|------------|-----------|
| `F8000098` (XObject) | `F80000A0` (XEvent) | tid F8000090, NtClose@line 2081 (fast) |
| `F8000108` (XObject) | `F8000110` (XEvent) | tid F80000CC, NtClose@line 10605 |
## Phase 1b — wait-site at `sub_821CB030+0x1AC` (PC `0x821CB1DC`)
Verifies the wait fires in canary too. 2 fires, both with `lr=0x821CB1D0`:
```
i> F8000090 TRACE-PC-LR pc=821CB1DC lr=821CB1D0 r3=F8000098 r4=FFFFFFFF r5=BC65CDC0
i> F80000C8 TRACE-PC-LR pc=821CB1DC lr=821CB1D0 r3=F8000108 r4=FFFFFFFF r5=BC667CC0
```
`r4=FFFFFFFF` → INFINITE wait timeout. The wait DOES execute in canary, but it completes
(each wait is matched by a subsequent NtClose). This is the AUDIT-041 wait-site `bl 0x824AA330`.
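
The register read-out above can be checked mechanically. The following is a minimal sketch, assuming every probe record has the `i> <tid> TRACE-PC-LR name=hex ...` shape seen in the excerpt; `parse_trace_line` and `INFINITE_TIMEOUT` are illustrative names, not part of the canary patch.

```python
import re

# Assumed record shape, taken from the excerpt above:
#   i> <tid> TRACE-PC-LR pc=<hex> lr=<hex> r3=<hex> ...
TRACE_RE = re.compile(r"^i> (?P<tid>[0-9A-Fa-f]{8}) TRACE-PC-LR (?P<fields>.*)$")

INFINITE_TIMEOUT = 0xFFFFFFFF  # r4 value read as INFINITE in the text above


def parse_trace_line(line):
    """Return {'tid': int, 'pc': int, ...}, or None if the line is not a trace record."""
    m = TRACE_RE.match(line.strip())
    if m is None:
        return None
    rec = {"tid": int(m.group("tid"), 16)}
    for token in m.group("fields").split():
        name, sep, value = token.partition("=")
        if not sep or not value:
            continue  # ignore anything that is not name=value
        rec[name] = int(value, 16)
    return rec


sample = [
    "i> F8000090 TRACE-PC-LR pc=821CB1DC lr=821CB1D0 r3=F8000098 r4=FFFFFFFF r5=BC65CDC0",
    "i> F80000C8 TRACE-PC-LR pc=821CB1DC lr=821CB1D0 r3=F8000108 r4=FFFFFFFF r5=BC667CC0",
]
records = [parse_trace_line(s) for s in sample]
infinite_waits = [r for r in records if r["r4"] == INFINITE_TIMEOUT]
print(len(infinite_waits), "of", len(records), "wait-site fires use an INFINITE timeout")
```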
## Phase 2 — NtSetEvent triangulation
Probe target: NtSetEvent thunk PC `0x8284DF5C` (53,701 fires in 130s).
Cross-checked against the `sub_824AA2F0` (NtSetEvent wrapper) entry probe (20,919 fires).
### Identification of wedge-equivalent handle by NtSetEvent fire pattern
Hypothesis: the dup-XEvent (target of NtDuplicateObject) is what gets signaled.
In `canary-ntsetevent.log`, **dup handle `F8000110`** appears in NtSetEvent exactly **2×**:
```
i> F8000054 TRACE-PC-LR pc=8284DF5C lr=824AA304 r3=F8000110 r5=BC32CC60 r31=7036FDC0
i> F8000084 TRACE-PC-LR pc=8284DF5C lr=824AA304 r3=F8000110 r5=00000002 r31=705AF860
```
`lr=824AA304` = wrapper-internal post-bl PC inside `sub_824AA2F0` (NtSetEvent wrapper).
To get the **caller LR** (i.e. who called the wrapper), probe the wrapper entry `0x824AA2F0`.
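
The "exactly 2×" observation is a per-handle fire count over the thunk log. A minimal sketch of that count follows, assuming the log line shape shown above; `count_fires_by_handle` and `handles_with_fire_count` are hypothetical helper names, and the sample holds only the two lines quoted in this note rather than the full 53,701-fire log.

```python
import re
from collections import Counter


def count_fires_by_handle(lines, pc):
    """Count TRACE-PC-LR fires at `pc`, keyed by the r3 (handle) value."""
    pat = re.compile(
        rf"TRACE-PC-LR pc={pc:08X}\b.*?\br3=([0-9A-Fa-f]{{8}})\b", re.IGNORECASE
    )
    fires = Counter()
    for line in lines:
        m = pat.search(line)
        if m:
            fires[m.group(1).upper()] += 1
    return fires


def handles_with_fire_count(lines, pc, count):
    """Return the sorted handles that fired exactly `count` times at `pc`."""
    fires = count_fires_by_handle(lines, pc)
    return sorted(h for h, n in fires.items() if n == count)


sample = [
    "i> F8000054 TRACE-PC-LR pc=8284DF5C lr=824AA304 r3=F8000110 r5=BC32CC60 r31=7036FDC0",
    "i> F8000084 TRACE-PC-LR pc=8284DF5C lr=824AA304 r3=F8000110 r5=00000002 r31=705AF860",
]
two_fire_handles = handles_with_fire_count(sample, 0x8284DF5C, 2)
print(two_fire_handles)
```

On a full log the two-fire set may contain more than one handle, so the count only narrows the candidates; the dup-handle correlation from Phase 1 is still what picks `F8000110`.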
### Wrapper-entry probe — cross-run structural correlation
In the wrapper-entry run, the handle namespace shifted slightly (per-run slab-allocator
nondeterminism), but the **r31 stack invariant** matches across runs.
The wrapper-entry run contains one handle with exactly two fires whose r31 stack frames
match `7036FDC0` and `705AF860` exactly:
```
i> F8000054 TRACE-PC-LR pc=824AA2F0 lr=82458D14 r3=F8000118 r4=BC369420 r5=BC32CC60 r31=7036FDC0
i> F8000084 TRACE-PC-LR pc=824AA2F0 lr=8245ED80 r3=F8000118 r4=705AF8B0 r5=00000002 r31=705AF860
```
**Cross-run match by (tid, r31)**: `F8000054@7036FDC0` and `F8000084@705AF860` are the same
two threads/stack-frames signaling the cache-IO completion event in both runs.
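
The cross-run correlation is a join on the `(tid, r31)` pair. A minimal sketch of that join follows, using only the four log lines quoted above; `index_by_tid_r31` is a hypothetical helper and assumes the same line shape in both runs.

```python
import re

FIELD_RE = re.compile(r"\b(\w+)=([0-9A-Fa-f]+)\b")


def index_by_tid_r31(lines):
    """Map (tid, r31) -> register fields for each TRACE-PC-LR line.

    Keeps only the last record per key, which is enough for the two-line
    excerpts here; a full log would need a list of records per key.
    """
    index = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 3 or parts[2] != "TRACE-PC-LR":
            continue
        fields = dict(FIELD_RE.findall(line))
        if "r31" in fields:
            index[(parts[1], fields["r31"])] = fields
    return index


thunk_run = [
    "i> F8000054 TRACE-PC-LR pc=8284DF5C lr=824AA304 r3=F8000110 r5=BC32CC60 r31=7036FDC0",
    "i> F8000084 TRACE-PC-LR pc=8284DF5C lr=824AA304 r3=F8000110 r5=00000002 r31=705AF860",
]
wrapper_run = [
    "i> F8000054 TRACE-PC-LR pc=824AA2F0 lr=82458D14 r3=F8000118 r4=BC369420 r5=BC32CC60 r31=7036FDC0",
    "i> F8000084 TRACE-PC-LR pc=824AA2F0 lr=8245ED80 r3=F8000118 r4=705AF8B0 r5=00000002 r31=705AF860",
]

thunk = index_by_tid_r31(thunk_run)
wrapper = index_by_tid_r31(wrapper_run)

# Join the two runs on (tid, r31); the wrapper-run LR is the caller of interest.
matches = {
    key: (thunk[key]["r3"], wrapper[key]["r3"], wrapper[key]["lr"])
    for key in sorted(thunk.keys() & wrapper.keys())
}
for (tid, r31), (h_thunk, h_wrapper, caller_lr) in matches.items():
    print(f"{tid}@{r31}: handle {h_thunk} <-> {h_wrapper}, caller LR {caller_lr}")
```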
### Resolved canary signalers
| LR | Caller function | Pre-bl insn | Call target |
|----|-----------------|-------------|-----------|
| `0x82458D14` | **`sub_82458B90`** | `bl 0x824AA2F0` @ 0x82458D10 | NtSetEvent wrapper call |
| `0x8245ED80` | **`sub_8245EC10`** | `bl 0x824AA2F0` @ 0x8245ED7C | NtSetEvent wrapper call |
Both LRs are NtSetEvent-wrapper call sites. Each fires once per wedge instance.
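
The table rows follow from two facts: PowerPC instructions are 4 bytes wide, so the `bl` sits at `LR - 4`, and the containing function is the nearest entry point at or below that address. A minimal sketch follows, assuming a sorted list of function entry points; `FUNC_STARTS` holds only entry points named in this note, so with a real symbol database the lookup table would be far larger.

```python
from bisect import bisect_right

# Entry points named in this note, sorted ascending. This partial list is an
# assumption for illustration; nearest-start lookup is only reliable with a
# complete function table.
FUNC_STARTS = [0x821CB030, 0x82457EF0, 0x82458B90, 0x8245EC10, 0x824AA2F0]


def resolve_lr(lr):
    """Return (function name, bl address, offset of the bl inside the function)."""
    call_pc = lr - 4  # LR holds the address of the instruction after the bl
    idx = bisect_right(FUNC_STARTS, call_pc) - 1
    if idx < 0:
        raise ValueError(f"{call_pc:#010x} is below every known function start")
    start = FUNC_STARTS[idx]
    return f"sub_{start:08X}", call_pc, call_pc - start


for lr in (0x82458D14, 0x8245ED80):
    name, call_pc, offset = resolve_lr(lr)
    print(f"LR {lr:#010x}: bl @ {call_pc:#010x} = {name}+{offset:#x}")
```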
## Cross-reference with ours-side (sibling PROBE O findings)
From `ours-summary.md` (Phase 3 candidate-signaler table):
| Producer | Fires in ours | Distinct LRs | Notes |
|----------|---------------|--------------|-------|
| `sub_82458B90` | **1** | 0x82457f18 (sub_82457EF0+0x24) | direct NtSetEvent caller; **fires once but NOT on wedge handle** |
| `sub_8245EC10` | **0** | — | **0 static callers** — indirect-dispatch-only (audit-050 dead) |
### Static caller chains in ours's database
```
sub_82458B90 callers:
  └─ sub_82457EF0+0x24 (only caller; sub_82457EF0 itself has 0 static callers — fnptr-array only)
sub_8245EC10 callers:
  └─ NONE STATICALLY
     Located in dispatch_table @ 0x820B5830 [slot 1]
       slot 0: sub_8245F1D0
       slot 1: sub_8245EC10
     Table referenced from:
       - sub_8245F1D0+0x1C (self-ref recursive)
       - sub_8245FEB8+0x100 (stw r11, 0(r31) at 0x8245FFC0 — class vptr install)
         sub_8245FEB8 callers: sub_8245FB68 (2 sites), sub_824601A0 (1 site)
           sub_8245FB68 callers: sub_8245F880, sub_8245FAB0
           sub_824601A0 callers: sub_82460118
```
Both signaler functions live in the worker cluster `0x82458xxx-0x8245Exxx`. `sub_8245EC10` is
the slot-1 entry of the 2-slot dispatch_table at `0x820B5830`, which the `sub_8245FEB8`
constructor installs at struct offset 0 (vptr). `sub_82458B90`'s only static caller chain goes up
through `sub_82457EF0`, which itself has 0 static callers.
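
The "chain dies" reasoning is an upward walk over the static call graph until a function with no recorded caller is reached. A minimal sketch follows, with the caller map transcribed from the listing above; `walk_up` is a hypothetical helper, and functions whose callers this note does not list are reported as unresolved rather than assumed to be roots.

```python
# Static caller map transcribed from the listing above. An empty list means
# "0 static callers"; a function missing from the map means this note does not
# say who calls it.
CALLERS = {
    "sub_82458B90": ["sub_82457EF0"],
    "sub_82457EF0": [],
    "sub_8245EC10": [],
    "sub_8245FEB8": ["sub_8245FB68", "sub_824601A0"],
    "sub_8245FB68": ["sub_8245F880", "sub_8245FAB0"],
    "sub_824601A0": ["sub_82460118"],
}


def walk_up(fn, callers):
    """Walk static callers upward; return (dead-end roots, unresolved functions)."""
    roots, unresolved, seen = set(), set(), set()
    stack = [fn]
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        if cur not in callers:
            unresolved.add(cur)  # callers not recorded in this note
        elif not callers[cur]:
            roots.add(cur)  # reachable only indirectly (fnptr array / vtable)
        else:
            stack.extend(callers[cur])
    return roots, unresolved


for target in ("sub_82458B90", "sub_8245EC10", "sub_8245FEB8"):
    roots, unresolved = walk_up(target, CALLERS)
    print(target, "roots:", sorted(roots), "unresolved:", sorted(unresolved))
```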
## Findings
1. **Wedge structural identification**: `sub_821CB030+0x128` creates a per-call file-IO
   completion XEvent that is immediately duplicated and submitted to a worker
   (`sub_82452DC0` @ +0x19C) for asynchronous file load. The wait at +0x1AC blocks until
   the worker signals the duplicate XEvent.
2. **Canary signalers (the missing piece)**: Two distinct call-sites signal the wedge
   event in canary:
   - `sub_82458B90` (= LR `0x82458D14`)
   - `sub_8245EC10` (= LR `0x8245ED80`)

   Both are `bl 0x824AA2F0` (NtSetEvent wrapper) call sites. Each fires once per file-IO completion.
3. **Static-graph triangulation for ours**:
   - `sub_82458B90` has 1 static caller (`sub_82457EF0+0x24`); the chain dies because
     `sub_82457EF0` has 0 static callers (fnptr-array activation).
   - `sub_8245EC10` has 0 static callers: it is vtable slot 1 in dispatch_table `0x820B5830`,
     installed by the `sub_8245FEB8` ctor; the ctor's reachability chain also dies in the
     `0x82458xxx-0x8245Fxxx` cluster.
4. **The wedge is downstream of AUDIT-050's unreachability island**. Both canary
   signalers live in the half-bootstrapped worker cluster. The work-submitter
   (`sub_82452DC0`) DOES fire in ours (8× per PROBE O) on tid=13, but the queued
   work never reaches a worker that calls `sub_82458B90` or `sub_8245EC10` because
   the worker-side dispatch infrastructure (vtable install via the `sub_8245FEB8` ctor;
   fnptr-array activation of `sub_82457EF0`) never runs in ours.
5. **AUDIT-058's `sub_825070F0` activation hypothesis is corroborated**: `sub_825070F0`
   (AUDIT-057's top missing-thread spawner, 4 workers @ ctx 0xBCE25340) is the
   plausible bootstrap for the workers that would receive the queued work and run
   the dispatch_table @ `0x820B5830` callbacks. Until that spawn happens in ours,
   the worker side stays dead and the signal never lands.
## Recommended AUDIT-060
1. **Direct path**: probe `sub_82452DC0+0x19C bl` site in canary (with our existing
`--log_lr_on_pc=0x82452E5C` or post-bl PC) to trace what happens after work submission.
Find which worker thread (one of the 4 spawned by `sub_825070F0`) dequeues the job
and ultimately calls `sub_82458B90` or `sub_8245EC10`.
2. **Indirect path**: probe `sub_8245FEB8` (vptr installer for dispatch_table `0x820B5830`)
in canary AND ours. If it fires in canary but not ours, that confirms the worker-class
constructor is in the unreachability island.
3. **Bootstrap path**: trace what activates `sub_825070F0` in canary (per AUDIT-058 it
fires 1× post-`\\dat\\movie` ResolvePath). Capture LR at `sub_825070F0` entry in
canary, then check that LR's caller-fn for fire count in ours.
## Artifacts
```
xenia-rs/audit-runs/audit-059-gamma-wedge/
  canary-patches-applied.diff   (audit-030 patch record before revert)
  canary-ntcreate.log/.err      (Phase 1: PC 0x821CB15C, 2 fires)
  canary-waitsite.log/.err      (Phase 1b: PC 0x821CB1DC, 2 fires)
  canary-ntsetevent.log/.err    (NtSetEvent thunk PC 0x8284DF5C; 53,701 fires; r3=F8000110 ×2)
  canary-setwrapper.log/.err    (NtSetEvent wrapper PC 0x824AA2F0; 20,919 fires; r3=F8000118 ×2)
  canary-summary.md             (this file)
  ours-summary.md               (sibling PROBE O ours-side findings)
```
Canary HEAD verified `6de80dffe`, working tree clean. xenia-rs untouched.