[AUDIT-059 R-A] Phase A backward-trace: divergence is sub_822F1AA8 loop exit, not factory/registry

Round-37 anchor reframe: both engines install the SAME static .rdata vtable 0x820A183C at [0x828E1F08]. Instance VAs differ only because of ε-class allocator divergence (audit-043); the vtable bytes are byte-identical. The user prompt's "factory/registry" framing was falsified.

Phase A walkthrough (rounds A1..A8):
- A.1 canary --audit_jit_prolog_pc=0x821741C8: tid=6, r3=0xBCCC4A80 (= inner sub-object of [0x828E1F08]'s singleton), LR=0x822F1D5C (return-from-bctrl inside sub_822F1AA8)
- A.2 found the tid=6 spawn site sub_821746B0 at PC 0x82174824, spawning entry=sub_821748F0 ctx=BC365700/BC366DA0. sub_822F1AA8 ALSO spawns a second thread (entry=sub_822F1EE0 ctx=BCE24A40) at PC 0x822F1B08
- A.3 sub_822F1AA8 has 2 callers, both in sub_8216EA68 (whose sole caller is sub_824AB748 = entry_point)
- A.4 ours mirror probe: sub_821746B0 enters, the [0x828E2B14] gate passes, and ExCreateThread fires, returning handle 0x1070 (= tid=13). Ours' tid=13 IS the same logical thread as canary's spawned silph initializer
- A.5 canary --audit_jit_prolog_pc=0x821749C0: fires only 2×, on short-lived tid=17 and tid=26 (the spawned initializers, NOT tid=6)
- A.6 canary --audit_jit_prolog_pc=0x822F1AA8: fires 1× on tid=6 with r3=0xBCE24A40 LR=0x8216EE14 (the second sub_822F1AA8 call site)
- A.7 canary --audit_jit_prolog_pc=0x824AB748 (entry_point): fires on tid=00000006. CONFIRMS canary's tid=6 = canary's main thread.

Verdict: identical call chain entry_point → sub_8216EA68 → sub_822F1AA8 in both engines; same controller (ε-divergent VA, byte-identical fields). Canary's main thread stays in sub_822F1AA8's dispatcher loop, firing sub_821741C8 ~1678×/30s. Ours' main thread exits the loop and thread-joins on the spawned initializer (tid=13), which is itself wedged on handle 0x1078 forever.

Loop exit is gated by bit 28 of [r30+0] (the controller's flag word), which holds the same value 0x21 at function entry in both engines. Some code between entry and the loop check sets bit 28 in ours but not in canary. Mem-watch on 0x40d09a40 shows zero guest stores in ours' 50M-instruction parallel run, so the setter is either a kernel-side store, a computed alias, or a probe-quantum-elided JIT store.

Phase B classification: Class 3a (state-divergence on the controller object). The vtable is the same; the controller's bit 28 evolves differently during sub_822F1AA8 setup. Class 4 (synthesis) is now less attractive, since we correctly reach the dispatcher with the right inputs; we just exit too soon. Phase C will need either JIT instrumentation to identify the bit-28 setter, or a kernel-side hook to clear bit 28 on entry to the loop check site.

Findings notes:
- round-A4b-ours-spawn-gate/FINDINGS.md (spawn topology + tid mapping)
- round-A8-ours-822F1AA8-trace/FINDINGS.md (full loop structure + bit-28 gate)

New reading-error class #18: probe-output anchor misframing ("singleton[VA]=X vtable=Y" was misread as "Y is a canary-only vtable" when Y is the same .rdata vtable in both engines).

Branch: iterate-2C/silph-ui-spawn-trace off master @ 229b46c.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-11 17:02:20 +02:00
parent 229b46c765
commit 52c30d82a7
9 changed files with 259662 additions and 0 deletions
--- a/audit-runs/audit-059-handle-disambiguation/round-A8-ours-822F1AA8-trace/FINDINGS.md
+++ b/audit-runs/audit-059-handle-disambiguation/round-A8-ours-822F1AA8-trace/FINDINGS.md
@@ -0,0 +1,136 @@
+# Phase A synthesis — canary tid=6 IS the main thread; the wedge is sub_822F1AA8's loop exit
+
+## Top-line finding
+
+**Canary's `tid=6` is canary's main thread.** Confirmed by probing `entry_point`
+(`sub_824AB748`) with `--audit_jit_prolog_pc=0x824AB748`: fires 1× on
+`tid=00000006` with `lr=BCBCBCBC` (= OS-initial / no caller). Ours numbers
+its main thread `tid=1`. Same logical thread; different label.
+
+Therefore round 33's "tid=6 fires sub_821741C8 471×" means **the main
+thread** loops inside `sub_822F1AA8` firing `sub_821741C8`, about 1678× per
+30s in canary. In ours, the main thread (tid=1) enters the loop-hosting
+`sub_822F1AA8` call once, leaves it without running a single loop
+iteration, and proceeds to thread-join on the spawned init thread (handle
+0x1070 = tid=13), which is itself blocked forever on handle 0x1078.
+
+## Call chain (identical in both engines, different runtime behavior)
+
+```
+entry_point (sub_824AB748)
+  │
+  ├─ sub_824ACB38           CRT-driven fnptr-array iterator (audit-050 region)
+  ├─ ...
+  └─ sub_8216EA68           Many local calls including:
+        ├─ ExCreateThread(entry=sub_8217F0F8 ...)      ; sibling thread
+        ├─ sub_822F1AA8(controller=...)                ; FIRST call (PC 0x8216ECCC)
+        └─ sub_822F1AA8(controller=0xBCE24A40 canary / ; SECOND call (PC 0x8216EE10)
+                                  0x40d09a40 ours)        ↑ this is the loop
+```
+
+The SECOND call is what runs the dispatcher loop. Its LR = 0x8216EE14.
+Confirmed in both engines.
+
+## sub_822F1AA8 loop structure
+
+```
+0x822F1AA8: entry, r30 = r3 (controller)
+0x822F1AEC-0x822F1B08: ExCreateThread(entry=sub_822F1EE0, ctx=r30) → r29 = handle
+0x822F1B30-0x822F1B34: bl 0x824AA8B0(r3=r29)              ; ?
+0x822F1B38-0x822F1B4C: first bctrl → vtable[+0] of [0x828E1F08]
+0x822F1B50-0x822F1B74: setup, bl 0x824AA330 INFINITE wait on [r22+32]
+0x822F1B80-0x822F1BA8: post-wait setup; [r30+0] |= 0x2
+0x822F1BB0-0x822F1BBC: TOP-OF-LOOP CHECK: if [r30+0] & 0x10000000 → goto 0x822F1E10 (exit)
+0x822F1BCC..0x822F1DEC: loop body (includes the vtable[+8] bctrl → sub_821741C8 at PC 0x822F1D58)
+0x822F1DEC-0x822F1DFC: bl 0x824AA330 INFINITE wait on [r23+0]
+0x822F1E00-0x822F1E0C: END-OF-ITERATION CHECK: if [r30+0] & 0x10000000 == 0 → goto 0x822F1BCC (re-loop)
+0x822F1E10-0x822F1E18: EXIT: [r30+0] |= 0x02000000 (sets bit 25 LSB / bit 6 MSB)
+0x822F1E1C-0x822F1E24: release something via bl 0x824AA2F0
+0x822F1E28-0x822F1E30: bl 0x824AA330 INFINITE on [r30+28] = SPAWNED THREAD HANDLE (thread join!)
+0x822F1E40: bl 0x824AA3E0
+0x822F1E44-0x822F1E5C: final cleanup: vtable[+24] bctrl on [0x828E1F08]
+0x822F1E60-0x822F1E78: [r30+0] = 0, then [r30+0] |= 1; bl 0x824567E0
+0x822F1E7C-0x822F1E88: epilogue
+```
+
+**Loop exit gate**: `[r30+0] & 0x10000000` (bit 28 LSB / bit 3 MSB). Set →
+exit. Both top-of-loop check (0x822F1BBC) and end-of-iteration check
+(0x822F1E0C) gate on the same bit.
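+
+The gate logic above can be modelled in a few lines. This is a hypothetical
+Python model reconstructed from the listing, not guest code: waits, bctrl
+dispatches and thread creation are collapsed into a `setup` callback (the
+unknown code between entry and the top-of-loop check) and a `body` callback
+(one loop iteration). `run_dispatcher` and its parameters are made up for
+illustration; `max_iters` exists only because canary's real loop never
+stops on its own.
+
+```python
+EXIT_BIT = 0x10000000   # bit 28 LSB / bit 3 MSB: loop-exit gate
+EXITED = 0x02000000     # bit 25 LSB / bit 6 MSB: set on the exit path
+
+
+def run_dispatcher(flags, body, setup=lambda f: f, max_iters=1000):
+    """Model the flag-word logic of one sub_822F1AA8 call.
+
+    Returns (flags, iterations, exited). exited=False means the model was
+    still looping when max_iters ran out.
+    """
+    flags |= 0x2                 # post-wait setup, 0x822F1B80-0x822F1BA8
+    flags = setup(flags)         # hypothetical: whatever sets bit 28 before the check
+    if flags & EXIT_BIT:         # top-of-loop check, 0x822F1BB0-0x822F1BBC
+        return flags | EXITED, 0, True
+    for i in range(max_iters):
+        flags = body(flags, i)   # loop body from 0x822F1BCC, incl. the wait on [r23+0]
+        if flags & EXIT_BIT:     # end-of-iteration check, 0x822F1E00-0x822F1E0C
+            return flags | EXITED, i + 1, True
+    return flags, max_iters, False
+
+
+def idle(flags, i):
+    return flags                 # loop body that never touches bit 28
+
+
+canary = run_dispatcher(0x21, idle)
+ours = run_dispatcher(0x21, idle, setup=lambda f: f | EXIT_BIT)
+print(hex(canary[0]), canary[1:])   # 0x23 (1000, False)
+print(hex(ours[0]), ours[1:])       # 0x12000023 (0, True)
+```
+
+Putting the setter in `setup` reproduces ours' zero-iteration exit; putting
+it in `body` would give a non-zero iteration count instead, which ours'
+loop-body probes do not show. The model ignores the post-join reset of
+`[r30+0]` at 0x822F1E60.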
+
+## What's different between engines
+
+| Engine | [r30+0] at entry | Loop iterations | Exits sub_822F1AA8? |
+|--------|------------------|------------------|----------------------|
+| canary | 0x21 (per probe)  | ~1678+ in 30s    | NO (stays in loop)   |
+| ours   | 0x21 (per probe)  | 0 (probes show none of the loop-body PCs fire after entry) | YES (exits quickly) |
+
+Both engines have `[r30+0]=0x21` at entry — bit 28 NOT set. After the `ori
+r11, r11, 0x2` at 0x822F1B90, both should have `[r30+0]=0x23`. Bit 28 still
+not set.
+
+So **some code sets bit 28 on [r30+0] between sub_822F1AA8 entry and the
+loop check** in ours but not in canary.
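+
+The bit arithmetic behind the paragraphs above, as a quick check. PowerPC
+documentation numbers bits from the most significant end (MSB-0), so for a
+32-bit word the MSB index is `31 - LSB index`. The helper names are
+illustrative only.
+
+```python
+def set_bits_lsb0(value):
+    """LSB-0 indices of the set bits in a 32-bit word, ascending."""
+    return [b for b in range(32) if (value >> b) & 1]
+
+
+def lsb0_to_msb0(index, width=32):
+    """Convert an LSB-0 bit index to MSB-0 (PowerPC-style) numbering."""
+    return width - 1 - index
+
+
+EXIT_BIT = 0x10000000
+ENTRY = 0x21                  # [r30+0] at entry, both engines
+AFTER_ORI = ENTRY | 0x2       # ori r11, r11, 0x2 at 0x822F1B90
+
+print(hex(AFTER_ORI), set_bits_lsb0(AFTER_ORI))      # 0x23 [0, 1, 5]
+print(set_bits_lsb0(EXIT_BIT), lsb0_to_msb0(28))     # [28] 3
+print(set_bits_lsb0(0x02000000), lsb0_to_msb0(25))   # [25] 6
+print(bool(AFTER_ORI & EXIT_BIT))                    # False
+```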
+
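+Mem-watch on 0x40d09a40 (ours' controller VA) shows **zero guest writes** in
+my 50M-instruction parallel run. That negative result is weak: the
+function's own `[r30+0] |= 0x2` (0x822F1B80-0x822F1BA8) ends in an ordinary
+guest store to the same word and should have been recorded too, unless the
+run stopped before reaching it. Possible reasons for the silence: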
+- The setter writes from kernel/runtime code that mem-watch doesn't capture
+  (kernel-host store, not guest JIT store)
+- The setter writes via a computed alias (different VA but same backing)
+- The bit IS set via a probe-quantum-elided JIT store
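+
+A toy model of why the first two explanations would leave mem-watch silent.
+It assumes the watch is keyed on a guest virtual address and hooks only
+guest stores; both are assumptions about mem-watch, not confirmed
+behaviour. The class, the page size, the alias VA 0x50d09a40 and the
+backing address are all made up.
+
+```python
+PAGE = 0x1000
+
+
+class ToyMemory:
+    """Hypothetical guest memory where several virtual pages may share one backing page."""
+
+    def __init__(self):
+        self.page_map = {}     # virtual page base -> backing page base
+        self.backing = {}      # backing address -> 32-bit word
+        self.va_watch = set()  # watched guest virtual addresses
+        self.hits = []         # virtual addresses of stores that tripped the watch
+
+    def translate(self, va):
+        return self.page_map[va & ~(PAGE - 1)] | (va & (PAGE - 1))
+
+    def guest_store32(self, va, value):
+        """Guest store: checked against the VA-keyed watch, then written."""
+        if va in self.va_watch:
+            self.hits.append(va)
+        self.backing[self.translate(va)] = value & 0xFFFFFFFF
+
+    def host_write32(self, backing_addr, value):
+        """Kernel/host-side write straight into backing: no guest watch runs."""
+        self.backing[backing_addr] = value & 0xFFFFFFFF
+
+    def load32(self, va):
+        return self.backing.get(self.translate(va), 0)
+
+
+mem = ToyMemory()
+mem.page_map[0x40D09000] = 0x00D09000   # controller's page
+mem.page_map[0x50D09000] = 0x00D09000   # hypothetical alias of the same backing
+mem.va_watch.add(0x40D09A40)
+
+mem.guest_store32(0x50D09A40, 0x10000023)     # store through the alias
+mem.host_write32(0x00D09A40, 0x10000023)      # host-side store
+print(mem.hits, hex(mem.load32(0x40D09A40)))  # [] 0x10000023
+```
+
+Keying the watch on the translated backing address would catch the alias
+case; only a hook on the host write path catches the second.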
+
+## Phase B classification
+
+**Class 3a — state-divergence on the controller object**. The vtable
+identity is the same (round-37 confirmed `0x820A183C` in both). The
+controller object's bit 28 of `[+0]` evolves differently during the setup
+between sub_822F1AA8 entry and the loop check.
+
+Class 4 (synthesis) is now LESS attractive: ours' main thread DOES reach
+sub_822F1AA8 with the right controller. We don't need to spawn the
+dispatcher — we need to PREVENT the main thread from exiting the loop.
+
+## Pragmatic next step — JIT instrumentation to find bit-28 setter
+
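+Most direct diagnostic: add a JIT hook in xenia-cpu that, for guest stores
+executed at PCs in [0x822F1AA8, 0x822F1E10), captures the guest PC and the
+written value whenever the store would set bit 28 of its target word. This
+identifies the exact PC that sets the loop-exit bit, provided the store
+sits in `sub_822F1AA8` itself. A setter inside a callee (e.g.
+`sub_82173990` or the kernel calls below) or on another thread (e.g. the
+spawned `sub_822F1EE0`, which receives the controller as its ctx) falls
+outside that PC range, so the filter would need to key on the controller's
+address instead.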
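+
+A minimal sketch of the filter such a hook would apply, assuming it can
+hand each guest store to a callback as (PC, target address, old word, new
+word). `GuestStore` and `sets_exit_bit` are hypothetical names, not
+xenia-cpu APIs, and the trace below uses made-up PCs. The second record
+comes from a PC outside the window and is not reported, which is the blind
+spot of filtering on PC alone.
+
+```python
+from dataclasses import dataclass
+
+EXIT_BIT = 0x10000000
+FN_START, FN_END = 0x822F1AA8, 0x822F1E10   # [start, end) PC window
+
+
+@dataclass(frozen=True)
+class GuestStore:
+    pc: int     # guest PC of the store instruction
+    addr: int   # target address
+    old: int    # word before the store
+    new: int    # word after the store
+
+
+def sets_exit_bit(store, lo=FN_START, hi=FN_END, mask=EXIT_BIT):
+    """True if the store runs at a PC in [lo, hi) and flips a bit of mask from 0 to 1."""
+    return lo <= store.pc < hi and bool(~store.old & store.new & mask)
+
+
+trace = [
+    GuestStore(0x822F1B94, 0x40D09A40, 0x21, 0x23),        # made-up PC for the |= 0x2 store
+    GuestStore(0x82173A00, 0x40D09A40, 0x23, 0x10000023),  # made-up PC outside the window
+    GuestStore(0x822F1C00, 0x40D09A40, 0x23, 0x10000023),  # made-up PC inside the window
+]
+for s in trace:
+    if sets_exit_bit(s):
+        print(hex(s.pc), hex(s.new))   # 0x822f1c00 0x10000023
+```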
+
+Alternative: extend `--mem-watch` to also capture kernel-side stores by
+hooking the GuestMemory write path at the kernel-state level.
+
+Even simpler: add a one-shot `--bit-watch=ADDR:MASK` cvar that fires when
+the value at ADDR has any bit in MASK transition from 0→1, regardless of
+who wrote it. This is the cleanest diagnostic for this exact pattern.
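+
+A minimal sketch of the proposed `--bit-watch` semantics. The cvar does not
+exist yet; the `ADDR:MASK` parsing and the `BitWatch` class are
+hypothetical. The watch compares successive observed values of the word, so
+it fires on the 0→1 transition whether a guest store, a host write or an
+aliased store produced it. It assumes the emulator calls `observe` often
+enough (for example after every store, or at every quantum boundary).
+
+```python
+def parse_bit_watch(spec):
+    """Parse 'ADDR:MASK' with 0x prefixes, e.g. '0x40d09a40:0x10000000'."""
+    addr_text, mask_text = spec.split(":")
+    addr, mask = int(addr_text, 0), int(mask_text, 0)
+    if mask == 0:
+        raise ValueError("MASK must select at least one bit")
+    return addr, mask
+
+
+class BitWatch:
+    """Report 0->1 transitions of any bit in mask at one address."""
+
+    def __init__(self, addr, mask, initial, one_shot=True):
+        self.addr, self.mask, self.one_shot = addr, mask, one_shot
+        self.last = initial & mask
+        self.armed = True
+        self.events = []   # (where, rising bits)
+
+    def observe(self, value, where):
+        """Feed the word's current value; return True if the watch fires."""
+        current = value & self.mask
+        rising = current & ~self.last
+        self.last = current
+        if not (self.armed and rising):
+            return False
+        self.events.append((where, rising))
+        if self.one_shot:
+            self.armed = False
+        return True
+
+
+addr, mask = parse_bit_watch("0x40d09a40:0x10000000")
+watch = BitWatch(addr, mask, initial=0x21)
+watch.observe(0x23, "after ori at 0x822F1B90")   # no bit-28 change
+watch.observe(0x10000023, "before 0x822F1BB0")   # fires
+print(watch.events)   # [('before 0x822F1BB0', 268435456)]
+```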
+
+## Fix shape (when bit-28 setter is identified)
+
+If the bit-28 setter is inside the vtable[+0] dispatch chain at 0x822F1B4C
+(target sub_82173990), then the same guest code is taking a different path
+in ours, and the fix might be a state-init issue in the kernel/runtime
+state that feeds it.
+
+If the bit-28 setter is inside the inner wait or one of the kernel calls
+(`bl 0x824AA8B0`, `bl 0x824AA330`), the fix might be a missing event signal
+or a wrong handle-state evolution.
+
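+If we can't identify the setter cleanly, the synthesis fallback is to
+**inject a kernel-side hook that clears bit 28 of [r30+0] on every entry to
+sub_822F1AA8's bit-check sites**. Both sites need it: the top-of-loop check
+(0x822F1BB0) runs only once per call, because the re-loop branch at
+0x822F1E0C targets 0x822F1BCC, past that check; every later iteration is
+gated by the end-of-iteration check (0x822F1E00). Crude, but it should keep
+the main thread in the loop.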
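+
+A minimal sketch of that hook, assuming the JIT can run a host callback
+with the guest register file just before the instruction at a given PC
+executes. `make_clear_exit_bit_hook`, the dict-based memory and the
+register map are hypothetical stand-ins, not xenia APIs. The sketch covers
+both bit-check sites from the loop listing (0x822F1BB0 and 0x822F1E00).
+
+```python
+EXIT_BIT = 0x10000000
+CHECK_SITES = (0x822F1BB0, 0x822F1E00)   # top-of-loop and end-of-iteration checks
+
+
+def make_clear_exit_bit_hook(memory):
+    """Build a pre-execution hook that clears bit 28 of [r30+0] at the check sites.
+
+    memory maps guest address -> 32-bit word; regs maps register name -> value.
+    """
+    def hook(pc, regs):
+        if pc in CHECK_SITES:
+            controller = regs["r30"]   # r30 = controller, set at function entry
+            memory[controller] = memory.get(controller, 0) & ~EXIT_BIT & 0xFFFFFFFF
+    return hook
+
+
+guest = {0x40D09A40: 0x10000023}
+hook = make_clear_exit_bit_hook(guest)
+hook(0x822F1E00, {"r30": 0x40D09A40})
+print(hex(guest[0x40D09A40]))   # 0x23
+```
+
+Because it clears the bit unconditionally, the hook also blocks any
+intended exit through bit 28; that is the crude part.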
+
+## Why this is a clearer wedge picture than rounds 22-33
+
+Rounds 22-33 chased the audit-049 wedge from various angles. The diagnoses
+landed on different layers:
+- R22: "wrong cluster targeted" (cluster A vs B)
+- R26-30: "state-machine progression bug"
+- R32-33: "pool 3 starvation; bootstrap walk-back"
+
+This round establishes the simplest possible framing:
+
+> **Canary's main thread loops forever in a dispatcher; ours' main thread
+> exits the loop after one setup phase. The exit is gated by a single bit
+> on the controller's flag word.**
+
+If bit 28 of `[controller+0]` could be kept clear, we expect ours' main
+thread to stay in the loop, sub_821741C8 to dispatch, signals to flow,
+tid=13 to complete, and draws to happen.