# Phase A synthesis: canary tid=6 IS the main thread; the wedge is sub_822F1AA8's loop exit

## Top-line finding

**Canary's `tid=6` is canary's main thread.** This was confirmed by probing `entry_point` (`sub_824AB748`) with `--audit_jit_prolog_pc=0x824AB748`. The probe fires once, on `tid=00000006`, with `lr=BCBCBCBC` (the OS-initial value, i.e. no caller). Ours numbers its main thread `tid=1`. It is the same logical thread with a different label.

Therefore "tid=6 fires sub_821741C8 471×" (round 33) means that in canary **the main thread** loops inside `sub_822F1AA8`, firing `sub_821741C8` ~1678× per 30s. In ours, the main thread (tid=1) runs `sub_822F1AA8` ONCE, exits the loop, and proceeds to a thread join on the spawned init thread (handle 0x1070 = tid=13). That thread is itself blocked forever on handle 0x1078.

## Call chain (identical in both engines, different runtime behavior)

```
entry_point (sub_824AB748)
 │
 ├─ sub_824ACB38   CRT-driven fnptr-array iterator (audit-050 region)
 ├─ ...
 └─ sub_8216EA68   Many local calls including:
      ├─ ExCreateThread(entry=sub_8217F0F8 ...)    ; sibling thread
      ├─ sub_822F1AA8(controller=...)              ; FIRST call (PC 0x8216ECCC)
      └─ sub_822F1AA8(controller=0xBCE24A40 canary /
                                 0x40d09a40 ours)  ; SECOND call (PC 0x8216EE10)
                                                   ;   ↑ this is the loop
```

The SECOND call is the one that runs the dispatcher loop. Its LR is 0x8216EE14 (call site + 4). Confirmed in both engines.

## sub_822F1AA8 loop structure

```
0x822F1AA8:             entry, r30 = r3 (controller)
0x822F1AEC-0x822F1B08:  ExCreateThread(entry=sub_822F1EE0, ctx=r30) → r29 = handle
0x822F1B30-0x822F1B34:  bl 0x824AA8B0(r3=r29)   ; ?
0x822F1B38-0x822F1B4C:  first bctrl → vtable[+0] of [0x828E1F08]
0x822F1B50-0x822F1B74:  setup, bl 0x824AA330 INFINITE wait on [r22+32]
0x822F1B80-0x822F1BA8:  post-wait setup; [r30+0] |= 0x2
0x822F1BB0-0x822F1BBC:  TOP-OF-LOOP CHECK: if ([r30+0] & 0x10000000) → goto 0x822F1E10 (exit)
0x822F1BCC..0x822F1DEC: loop body (includes the vtable[+8] bctrl → sub_821741C8 at PC 0x822F1D58)
0x822F1DEC-0x822F1DFC:  bl 0x824AA330 INFINITE wait on [r23+0]
0x822F1E00-0x822F1E0C:  END-OF-ITERATION CHECK: if (([r30+0] & 0x10000000) == 0) → goto 0x822F1BCC (re-loop)
0x822F1E10-0x822F1E18:  EXIT: [r30+0] |= 0x02000000 (MSB-0 bit 6 = LSB-0 bit 25)
0x822F1E1C-0x822F1E24:  release something via bl 0x824AA2F0
0x822F1E28-0x822F1E30:  bl 0x824AA330 INFINITE on [r30+28] = SPAWNED THREAD HANDLE (thread join!)
0x822F1E40:             bl 0x824AA3E0
0x822F1E44-0x822F1E5C:  final cleanup: vtable[+24] bctrl on [0x828E1F08]
0x822F1E60-0x822F1E78:  [r30+0] = 0, then [r30+0] |= 1; bl 0x824567E0
0x822F1E7C-0x822F1E88:  epilogue
```

**Loop exit gate**: `[r30+0] & 0x10000000` (bit 28 in LSB-0 numbering, bit 3 in MSB-0 numbering). Set → exit. The top-of-loop check (0x822F1BBC) and the end-of-iteration check (0x822F1E0C) both gate on this same bit.

## What's different between engines

| Engine | `[r30+0]` at entry | Loop iterations | Exits sub_822F1AA8? |
|--------|--------------------|-----------------|---------------------|
| canary | 0x21 (per probe) | ~1678+ in 30s | NO (stays in loop) |
| ours | 0x21 (per probe) | 0 (probes show none of the loop-body PCs fire after entry) | YES (exits quickly) |

Both engines have `[r30+0]=0x21` at entry, so bit 28 is NOT set. After the `ori r11, r11, 0x2` at 0x822F1B90, both should hold `[r30+0]=0x23`, and bit 28 is still not set. So **some code sets bit 28 of [r30+0] between sub_822F1AA8's entry and the loop check** in ours but not in canary.

Mem-watch on 0x40d09a40 (ours' controller VA) shows **zero guest writes** in my 50M-instruction parallel run. That result is suspicious in its own right. If ours reached the exit path during that run, the `[r30+0] |= 0x2` store near 0x822F1B90 and the `|= 0x02000000` store at 0x822F1E10 are both guest stores to this word. A working watch should have logged at least those two.
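The following minimal C++ sketch makes the single-bit gate concrete by modeling the control flow of sub_822F1AA8. All of its names (`ModelDispatcher`, `FlagWriter`, `LoopResult`, the mask constants) are hypothetical. The waits, thread creation, join and vtable calls are omitted. The flag word is treated as a plain host integer, so guest big-endian memory is ignored. `FlagWriter` stands in for the unknown bit-28 setter.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Masks taken from the sub_822F1AA8 disassembly notes.
constexpr uint32_t kStarted       = 0x00000002;  // [r30+0] |= 0x2 near 0x822F1B90
constexpr uint32_t kExitRequested = 0x10000000;  // gate tested at 0x822F1BB0 and 0x822F1E00
constexpr uint32_t kExited        = 0x02000000;  // set on the exit path at 0x822F1E10

// PowerPC documentation numbers bits MSB-first: bit 0 is 0x80000000.
constexpr int MsbBitFromLsbBit(int lsb_bit) { return 31 - lsb_bit; }
static_assert(MsbBitFromLsbBit(28) == 3, "0x10000000 is MSB-0 bit 3");
static_assert(MsbBitFromLsbBit(25) == 6, "0x02000000 is MSB-0 bit 6");

struct LoopResult {
  uint32_t flags;   // final model value of [r30+0]
  int iterations;   // loop-body runs (each includes the vtable[+8] call to sub_821741C8)
  bool exited;      // true only if the bit-28 gate ended the loop
};

// Hypothetical stand-in for whatever code writes [r30+0].
// Called with step 0 after the post-wait setup, then with step N after body N.
using FlagWriter = std::function<void(uint32_t& flags, int step)>;

// Control-flow model only. `max_iterations` exists solely so that the
// never-exiting (canary-like) case terminates.
LoopResult ModelDispatcher(uint32_t flags, int max_iterations, const FlagWriter& writer) {
  assert(max_iterations >= 0);
  flags |= kStarted;                                     // post-wait setup
  writer(flags, 0);
  bool exit_requested = (flags & kExitRequested) != 0;  // top-of-loop check
  int iterations = 0;
  while (!exit_requested && iterations < max_iterations) {
    ++iterations;                                        // loop body
    writer(flags, iterations);
    exit_requested = (flags & kExitRequested) != 0;      // end-of-iteration check
  }
  if (exit_requested) flags |= kExited;                  // exit path
  return {flags, iterations, exit_requested};
}
```

Consider entry flags 0x21 with a writer that does nothing. The model stays in the loop until its cap, holding flags 0x23, which is the canary-like case. Now give it a writer that sets 0x10000000 at step 0. The model runs zero iterations and finishes with flags 0x12000023, which is the ours-like case.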
Possible reasons:

- The setter writes from kernel/runtime code that mem-watch doesn't capture (a kernel-host store, not a guest JIT store).
- The setter writes through a computed alias (a different VA with the same backing memory).
- The bit IS set by a JIT store that the probe quantum elides.

## Phase B classification

**Class 3a: state divergence on the controller object.** The vtable identity is the same (round 37 confirmed `0x820A183C` in both). What diverges is bit 28 of the controller's `[+0]` word. It evolves differently during the setup between sub_822F1AA8's entry and the loop check.

Class 4 (synthesis) is now LESS attractive, because ours' main thread DOES reach sub_822F1AA8 with the right controller. We don't need to spawn the dispatcher. We need to PREVENT the main thread from exiting the loop.

## Pragmatic next step: JIT instrumentation to find the bit-28 setter

Most direct diagnostic: add a JIT hook in xenia-cpu that watches guest stores issued from the range [0x822F1AA8, 0x822F1E10). Whenever such a store would set bit 28 of its target word, the hook captures the guest PC and the written value. This identifies the exact PC that sets the loop-exit bit. One caveat: a PC range covers only sub_822F1AA8's own body. A store made from a callee (e.g. the vtable[+0] target sub_82173990) or from another thread would be missed. Keying on the target address 0x40d09a40 instead avoids that gap.

Alternative: extend `--mem-watch` to also capture kernel-side stores by hooking the GuestMemory write path at the kernel-state level.

Even simpler: add a one-shot `--bit-watch=ADDR:MASK` cvar that fires when any bit in MASK of the value at ADDR transitions from 0→1, regardless of who wrote it. This is the cleanest diagnostic for this exact pattern.

## Fix shape (once the bit-28 setter is identified)

If the bit-28 setter is inside the vtable[+0] dispatch chain at 0x822F1B4C (target sub_82173990), the fix might be a state-init issue in the kernel/runtime.

If the bit-28 setter is inside the inner wait or one of the kernel calls (`bl 0x824AA8B0`, `bl 0x824AA330`), the fix might be a missing event signal or a wrong handle-state evolution.
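The proposed `--bit-watch=ADDR:MASK` cvar needs very little logic: parse the spec, then do a rising-edge test on every write to the watched word. The C++ below is a sketch under assumptions. xenia has no such cvar today, and `BitWatch`, `ParseBitWatch`, `RisingBits` and `OnWatchedWrite` are hypothetical names. Values are assumed to be already byte-swapped out of guest big-endian order. "One-shot" is read here as "disarm after the first hit".

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical watch spec, e.g. --bit-watch=0x40d09a40:0x10000000
struct BitWatch {
  uint32_t guest_address;  // e.g. ours' controller VA
  uint32_t mask;           // e.g. the loop-exit bit 28
  bool armed = true;       // one-shot: cleared after the first hit
};

// Parses "ADDR:MASK"; base 0 accepts hex with a 0x prefix or decimal.
std::optional<BitWatch> ParseBitWatch(const std::string& spec) {
  const auto colon = spec.find(':');
  if (colon == std::string::npos) return std::nullopt;
  const std::string addr_text = spec.substr(0, colon);
  const std::string mask_text = spec.substr(colon + 1);
  char* end = nullptr;
  const unsigned long addr = std::strtoul(addr_text.c_str(), &end, 0);
  if (addr_text.empty() || *end != '\0') return std::nullopt;
  const unsigned long mask = std::strtoul(mask_text.c_str(), &end, 0);
  if (mask_text.empty() || *end != '\0' || mask == 0) return std::nullopt;
  return BitWatch{static_cast<uint32_t>(addr), static_cast<uint32_t>(mask)};
}

// Bits within `mask` that went 0 -> 1 across a single write.
constexpr uint32_t RisingBits(uint32_t old_value, uint32_t new_value, uint32_t mask) {
  return ~old_value & new_value & mask;
}

// Meant to be called from every path that updates guest memory (JIT stores,
// kernel/host-side writes), which is what lets it catch a setter mem-watch misses.
bool OnWatchedWrite(BitWatch& w, uint32_t address, uint32_t old_value,
                    uint32_t new_value, uint32_t guest_pc, const char* source) {
  if (!w.armed || address != w.guest_address) return false;
  const uint32_t rising = RisingBits(old_value, new_value, w.mask);
  if (rising == 0) return false;
  w.armed = false;
  std::fprintf(stderr, "bit-watch: [%08X] %08X -> %08X rising=%08X pc=%08X via %s\n",
               static_cast<unsigned>(address), static_cast<unsigned>(old_value),
               static_cast<unsigned>(new_value), static_cast<unsigned>(rising),
               static_cast<unsigned>(guest_pc), source);
  return true;
}
```

The check is on value transitions rather than on store PCs, so it does not care whether the write came from sub_822F1AA8, a callee, another thread, or host-side kernel code. The expected `|= 0x2` write (0x21 → 0x23) does not fire, because bit 28 stays clear.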
If we can't identify the setter cleanly, the synthesis fallback is to **inject a kernel-side hook that clears bit 28 of [r30+0] just before each of sub_822F1AA8's bit-check sites**: the top-of-loop check at 0x822F1BB0 and the end-of-iteration check at 0x822F1E00. Hooking only 0x822F1BB0 is not enough, because a bit set during the loop body would still take the exit at 0x822F1E0C. Crude, but it should keep the main thread in the loop.

## Why this is a clearer wedge picture than rounds 22-33

Rounds 22-33 chased the audit-049 wedge from various angles, and the diagnoses landed on different layers:

- R22: "wrong cluster targeted" (cluster A vs B)
- R26-30: "state-machine progression bug"
- R32-33: "pool 3 starvation; bootstrap walk-back"

This round establishes the simplest possible framing:

> **Canary's main thread loops forever in a dispatcher; ours' main thread
> exits the loop after one setup phase. The exit is gated by a single bit
> on the controller's flag word.**

If bit 28 of `[controller+0]` could be kept permanently clear, the expected chain is: ours' main thread stays in the loop, sub_821741C8 dispatches, signals flow, tid=13 completes, and draws happen.
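Keeping bit 28 permanently clear, as in the crude fallback, amounts to a pre-execution hook on the two check sites. The C++ below sketches that logic only. `GuestState` and `ForceStayInLoop` are hypothetical stand-ins, not xenia APIs. Guest memory is modeled as a map of already byte-swapped words. The sketch assumes r30 still holds the controller pointer at both check PCs. It would also suppress a legitimate shutdown request that uses the same bit.

```cpp
#include <cstdint>
#include <unordered_map>

constexpr uint32_t kLoopExitBit = 0x10000000;  // bit 28 of [r30+0]
constexpr uint32_t kTopCheckPc  = 0x822F1BB0;  // first PC of the top-of-loop check
constexpr uint32_t kEndCheckPc  = 0x822F1E00;  // first PC of the end-of-iteration check

// Hypothetical view of one guest thread, for illustration only.
struct GuestState {
  uint32_t r30 = 0;                              // controller pointer in sub_822F1AA8
  std::unordered_map<uint32_t, uint32_t> words;  // guest words, host byte order
};

// Runs before the instruction at `pc`; returns true if it scrubbed the exit bit.
bool ForceStayInLoop(GuestState& s, uint32_t pc) {
  if (pc != kTopCheckPc && pc != kEndCheckPc) return false;
  auto it = s.words.find(s.r30);
  if (it == s.words.end() || (it->second & kLoopExitBit) == 0) return false;
  it->second &= ~kLoopExitBit;  // the check that follows now sees bit 28 clear
  return true;
}
```

Hooking both sites matters. The end-of-iteration check re-enters the body directly at 0x822F1BCC, so a bit raised mid-body would otherwise leave through 0x822F1E0C without ever passing 0x822F1BB0 again.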