[iterate-2M] PCR+0x10C (PRCB.current_cpu): init per-HW-thread to unwedge spin-barrier

Ours never initialized the PRCB `current_cpu` byte at PCR+0x10C (prcb_data@0x100 + current_cpu@0xC). Canary sets it from `GetFakeCpuNumber(affinity)` (xthread.cc:847 `pcr->prcb_data.current_cpu = cpu_index`), which equals the HW thread id ours already writes at PCR+0x2C. Left unwritten it read 0 for every thread.

Guest spin-barrier `sub_824D1328` (used by the audio/update pump threads at entries 0x824D2878 / 0x824D2940, ours tid 9 / tid 10) indexes a per-HW-thread occupancy byte array via `lbz r11, 268(r13)` then `stbx ..., [array+index]`. With index 0 for all threads, every thread marked slot 0; the multi-byte rendezvous signature it then spins on (`ld [obj+0x164]` compared against the packed per-slot expectation) could never assemble. Both pump threads spun at pc 0x824d140c/0x824d1410 forever (Ready, 5M+ barrier iterations) and never ran their `KeSetEvent` loops, so the events they signal (the 21k-per-thread heartbeat in canary) never fired, starving the downstream worker handshake.

Fix: write `hw_id` to PCR+0x10C alongside PCR+0x2C in both the static thread image init (thread.rs) and the dynamic PcrWriter (state.rs, used by scheduler spawn + affinity migration) so the two stay in sync.

Runtime-verified BOTH engines. Post-fix the pump threads escape the barrier (barrier iterations 5M+ -> 3) and advance into their loop bodies, now correctly Blocked(WaitAny) at pc 0x824d28d0 / 0x824d29c0 (was spinning at 0x824d140c). imports at n50M 339,766 -> 451,508; deterministic (two cold runs byte-identical). draws still 0 (a later, separate render gate). golden re-baselined. cargo test --workspace: 672 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
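The failure mode in this message can be modeled in a few lines. This is a minimal sketch, not the guest's actual barrier: the marker value `1`, the slot count, and the names `mark_slot`, `rendezvous_complete`, and `barrier_round` are assumptions for illustration. Only the indexing by the byte read from PCR+0x10C comes from the description.

```rust
/// Number of hardware threads modeled. The Xbox 360 exposes 6; treating that
/// as the barrier's slot count is an assumption of this sketch.
const HW_THREADS: usize = 6;

/// Each participant marks the occupancy byte selected by the value it reads
/// from PCR+0x10C, mirroring the `lbz r11, 268(r13)` + `stbx` pair.
fn mark_slot(occupancy: &mut [u8; HW_THREADS], current_cpu: u8) {
    occupancy[current_cpu as usize] = 1;
}

/// The rendezvous completes only when every expected slot has been marked.
fn rendezvous_complete(occupancy: &[u8; HW_THREADS], expected: &[u8]) -> bool {
    expected.iter().all(|&cpu| occupancy[cpu as usize] == 1)
}

/// Runs one barrier round for threads whose true HW ids are `hw_ids`.
/// `read_cpu` models what each thread reads back from PCR+0x10C.
fn barrier_round(hw_ids: &[u8], read_cpu: impl Fn(u8) -> u8) -> bool {
    let mut occupancy = [0u8; HW_THREADS];
    for &hw in hw_ids {
        mark_slot(&mut occupancy, read_cpu(hw));
    }
    rendezvous_complete(&occupancy, hw_ids)
}
```

With `read_cpu` fixed at 0 (the unwritten byte), two threads on HW ids 4 and 5 both mark slot 0 and the round never completes; with `read_cpu` returning the true HW id, it completes on the first pass.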
[iterate-2K] GPU physical-mirror aliasing: ring/IB/RPtr/resolve read wrong host region
2026-06-13 18:08:46 +02:00 · 2026-06-13 13:39:57 +02:00 · 2026-06-13 11:54:44 +02:00 · 2026-06-13 10:53:54 +02:00 · 2026-06-13 10:38:17 +02:00 · 2026-06-13 10:02:02 +02:00
53 changed files with 694918 additions and 206 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -11,3 +11,8 @@ audit-*.md
 *.stdout
 *.stderr
 *.log
 # Runtime cache artifacts (vkd3d-proton / DXVK shader caches dropped into the
 # working dir by the Wine canary build)
 vkd3d-proton.cache*
 *.dxvk-cache
--- a/audit-runs/audit-009/branch-probe.trace
+++ b/audit-runs/audit-009/branch-probe.trace
--- a/audit-runs/audit-059-handle-disambiguation/FINDINGS.md
+++ b/audit-runs/audit-059-handle-disambiguation/FINDINGS.md
@@ -0,0 +1,131 @@
 # AUDIT-059 — handle disambiguation (iterate 2.BD)
 **Date:** 2026-06-06. **Engines:** ours `target/release/xenia-rs -n 50M` (3.9 s wall, 50M instr, 40k import calls), canary Wine `xenia_canary.exe --mute=true --audit_handle_lifecycle=true` (~35 s wall, 34k log lines, 0 fatals).
 ## Verdict — HANDOFF's wedge handles are stale
 HANDOFF said: *"opt_callback signals 0x108c, tid=1 wedges on 0x10e8."* Both IDs are now `<UNCREATED>` in ours, along with `0x1090 / 0x10dc / 0x10fc / 0x1104` (also in HANDOFF's adjacent list). The allocation order shifted since that snapshot.
 ## Real wedges, current code state
 | Handle | Kind | Engine state | Waiter | Notes |
 |---|---|---|---|---|
 | **0x12a4** | `<UNCREATED>` | `<AUDIT_BLIND>`, waiters=1 | **tid=1 main**, pc=0x824ac578 | Wait went via `do_wait_single` but creation never hit `NtCreateEvent` — `KeInitializeEvent` path. **This is the iterate-2.BC wedge** (recorded as "0x10e8" in HANDOFF — same site, different ID). |
 | **0x12ac** | Event/Auto | `<NO_SIGNALS_DESPITE_WAITS>`, waiters=1 | **tid=13** silph UI cluster, pc=0x824ac578 lr=0x821cb1e0 | Frame trail: `0x821cb1e0 → 0x821cbae0 → 0x821cc454 → 0x821c4f18 → 0x82174a80`. Frames 3-5 carry `silph::UImpl@GamePart_Title` / `silph::VGamePart_Title` vtables — **audit-049's cluster, unchanged**. |
 | 0x12b8 | Event/Auto | NO_SIGNALS, waiters=1 | (tid TBD) | Sibling, 0xC bytes from 0x12ac. |
 | 0x1020 | Event/Manual | NO_SIGNALS, waiters=1 | — | γ-class. |
 | 0x1040 | Event/Auto | NO_SIGNALS, waits=32 (hot poll) | — | Heavy wait, no signal. |
 | 0x10a8 | Event/Auto | NO_SIGNALS, waits=7 | — | γ-class. |
 | 0x10e4 | Event/Manual | NO_SIGNALS, waiters=1, waits=2 | — | γ-class. |
 **Working handles** (sanity baseline): 0x1028 (Sema, 8 waits / 7 signals / 7 wakes), 0x10d0 (Sema, 2 waits / 1 signal / 1 wake), 0x10f0 (Event/Auto, 1/1/1 ✓ marked `<SUSPECT>` but actually fine), 0x10e0 (Event/Manual, 32 primary signals from somewhere).
 ## GPU interrupt delivery — the iterate-2.BC delta confirmed
 | Engine | gpu.interrupt.delivered (vsync) | EmulateCPInterruptDPC / vblank pump |
 |---|---:|---:|
 | **ours** | 54 (source=0) + 1 (source=1) | — |
 | **canary** | — | **4712** in 30 s ≈ 157 Hz |
 **~87× ratio.** Confirms HANDOFF's diagnosis: ours' victim-thread injector dies once guest threads all park; canary's host frame-limiter thread keeps firing regardless.
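The two figures quoted above follow from integer division of the counts in the table. A small sketch of that arithmetic; the helper names `rate_hz` and `ratio` are hypothetical, and both round down:

```rust
/// Deliveries per second over a sampling window, rounded down.
fn rate_hz(count: u64, window_secs: u64) -> u64 {
    count / window_secs
}

/// How many times larger `a` is than `b`, rounded down.
fn ratio(a: u64, b: u64) -> u64 {
    a / b
}
```

`rate_hz(4712, 30)` gives 157 and `ratio(4712, 54)` gives 87, matching the table. Counting the single source=1 delivery as well (55 total) would give 85.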
 ## Canary signaler attribution
 Top KeSetEvent guest_ptrs in canary (30 s window):
 | guest_ptr | KeSetEvent fires | Inferred role |
 |---|---:|---|
 | `0x828A3254` | 5729 | Audio host-pump worker (per AUDIT-032: `r3=0x828A3230` region) |
 | `0x828A3244` | 5728 | Audio host-pump sibling |
 | `0x828A3244` + 16-byte stride | — | Static XEX-image audio event struct |
 | `0xBCE25234` | 1301 | **silph UI cluster PKEVENT** (heap-allocated, 0x10 stride). Likely ours' 0x12ac analog. |
 | `0xBCE25214 / 0xBCE25244 / 0xBCE25224` | 648 / 603 / 603 | Sibling silph UI PKEVENTs (0x10 stride struct). Likely ours' 0x12a4 / 0x12b8 / 0x1040 analogs. |
 Ours signals every one of those equivalents **0 times**.
 ## Round 2 — LR-extended probes name the producer
 Extended the canary probes with guest-LR capture (5 sites in `xboxkrnl_threading.cc`, 10 LOC). Re-ran the harness. Now each `KeSetEvent` line carries the guest function that signaled the event. Result for the silph UI cluster:
 | PKEVENT | KeSetEvent count | Producer LR(s) |
 |---|---:|---|
 | `0xBCE25214` | 574 | `0x82508510` (single producer) |
 | `0xBCE25224` | 565 | `0x82508358` (single producer) |
 | `0xBCE25234` | 1153 | `0x82506C90` (579) + `0x82508524` (574) |
 | `0xBCE25244` | 570 | `0x82506F9C` (single producer) |
 | `0xBCE25284` | 1 | `0x82507ABC` (one-shot 5th-worker init?) |
 All 6 producer LRs sit in `0x82506000–0x82509000`. **This is exactly the `sub_825070F0` worker thread cluster** that audit-057/058 already named:
 > *audit-057: "sub_825070F0 (4 missing, initializes 4 workers w/ shared ctx 0xBCE25340, entries 0x82506528/58/88/B8)"*
 The 4 worker entries (`0x82506528/58/88/B8`) are inside `sub_82506xxx` — exactly where the producer LRs `0x82506C90`/`0x82506F9C` live. The other producer LRs `0x825083xx` / `0x825085xx` are in downstream callees (workers call deeper code which itself calls KeSetEvent).
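The range claim can be checked mechanically against the LRs in the table. A minimal sketch; the constant and function names are hypothetical, and the window is treated as half-open:

```rust
/// Producer link-register values reported in the Round 2 table.
const PRODUCER_LRS: [u32; 6] = [
    0x8250_8510, 0x8250_8358, 0x8250_6C90, 0x8250_8524, 0x8250_6F9C, 0x8250_7ABC,
];

/// Address window quoted for the worker cluster (upper bound exclusive,
/// which is an assumption of this sketch).
const CLUSTER: std::ops::Range<u32> = 0x8250_6000..0x8250_9000;

/// Returns true when every LR falls inside the cluster window.
fn all_in_cluster(lrs: &[u32]) -> bool {
    lrs.iter().all(|lr| CLUSTER.contains(lr))
}
```

The audio host-pump LRs (for example `0x824D2A44`) fall outside this window, which is what separates the two producer groups.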
 For comparison the audio host-pump pair gets a single sharp producer too:
 - `0x828A3254` × 5271 ← `lr=0x824D2A44`
 - `0x828A3244` × 5271 ← `lr=0x824D292C`
 (These match AUDIT-032's PC `0x824D229C / r3=0x828A3230` region — already-understood audio host-pump.)
 ## Verdict — 2.BE is INSUFFICIENT for the silph UI wedge
 The silph UI PKEVENTs are signaled exclusively by threads spawned by `sub_825070F0`. Per audit-057/058, **`sub_825070F0` fires 0× in ours** — those 4 worker threads never spawn. Therefore the PKEVENTs are never signaled. Therefore tid=13 (`0x12ac` in ours) wedges forever.
 **`sub_825070F0`'s call chain is gated by the audit-009 "unreachability island"** — a CRT-driven fnptr-array bootstrap that ours fails to enumerate. VSync delivery is irrelevant to that bootstrap; the host frame-limiter thread does not drive CRT initializers.
 Therefore:
 - **2.BE alone CANNOT unwedge tid=13.** It will close the 54-vs-4712 VSync delivery gap and may unblock things downstream of vsync, but the silph UI wedge has an independent missing-signaler root cause.
 - **2.BE may still unwedge tid=1 main on `0x12a4`** — that wait went via `KeInitializeEvent` (handle never hit `NtCreateEvent` in ours, hence `<AUDIT_BLIND>`). Whether `0x12a4`'s signaler depends on VSync is unknown without further probing.
 ## Implications for next moves
 A single fix won't take us to draws > 0. We need at least two:
 1. **2.BE (VSync delivery)** — still worth landing for the architectural correctness it brings, AND because it's the only fix that can unwedge tid=1 main's `0x12a4` if that's vsync-derived. ~60–80 LOC per Agent C's plan.
 2. **2.BF (sub_825070F0 activation)** — this is the audit-058 unfinished business. Options:
   - (a) **Static work:** trace canary's CRT-driven fnptr-array path that activates the silph UI bootstrap; backport the missing init into ours. High info, slow. Requires more probing.
   - (b) **Direct synthetic spawn:** ours injects host-side `ExCreateThread` calls for the 4 worker entries at boot completion, mirroring AUDIT-048's audio-host-pump precedent. Pragmatic; ~40 LOC; risks getting context (`0xBCE25340`) wrong.
 A possible third move:
 3. **Re-probe with LR on Wait paths** (we already added it but didn't grep for it) — to tell us whether tid=1's wait on `0x12a4` is the same LR as `sub_825070F0`-chain or a totally different signaler. If different, it's a 3rd missing producer.
 ## Round 4 — wait-side guest LR via one-frame back-chain walk
 After fixing the PPC stack-walk offset (Xbox 360 stores saved LR at `[prev_sp - 8]`, not at the `+4` offset used by the 32-bit SysV convention), wait-side LR comes through cleanly.
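A one-frame walk with that offset might look like the sketch below. It assumes `[sp + 0]` holds the back chain (the caller's stack pointer) and reads the saved LR 8 bytes below it; `GuestMem` is a toy stand-in for guest RAM and `wait_side_lr` is a hypothetical name, not the probe's real function.

```rust
use std::collections::HashMap;

/// Toy guest memory: address -> 32-bit word, already byte-swapped to host
/// order. A real emulator would read from its mapped guest RAM instead.
type GuestMem = HashMap<u32, u32>;

/// Walks one frame up the back chain and returns the saved LR.
/// `[sp + 0]` is taken as the caller's stack pointer; the saved LR is read
/// at `[prev_sp - 8]`, per the offset described above.
fn wait_side_lr(mem: &GuestMem, sp: u32) -> Option<u32> {
    let prev_sp = *mem.get(&sp)?;
    let lr_slot = prev_sp.checked_sub(8)?;
    mem.get(&lr_slot).copied()
}
```

Returning `Option` keeps the walk safe on a corrupt or zero back chain, which is the common case for threads that have not yet built a frame.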
 **Canary's top wait sites:**
 | canary handle | wait count | guest_lr | LR region | mapping |
 |---|---:|---|---|---|
 | `F800005C` | 1635 | `0x8216EE14` | kernel early-boot infra | unrelated |
 | `F800000C` | 1597 | `0x824AFFC4` | xboxkrnl wrapper (scheduler / work-queue?) | unrelated |
 | **`F80000DC`** | **476** | **`0x821C7D3C`** | **silph::UImpl/GamePart** | **= ours' 0x12ac silph UI wedge** |
 | `F80000B0` | 6 (across all listed LRs) | `0x821CBAE0` + `0x821CC19C` + `0x822DFE2x/D0` | **exact match with audit-049's frame trail** | sibling silph UI wait |
 Identity proof: ours' audit-049 frame trail for the silph UI wedge was `0x821cb1e0 / 0x821cbae0 / 0x821cc454 / 0x821c4f18 / 0x82174a80`. Round 4 captures `0x821CBAE0` and `0x821CC19C` (adjacent PCs) as wait LRs in canary — same cluster, same code.
 **Refined verdict.** ours' `0x12a4` (tid=1 main, AUDIT_BLIND) and `0x12ac` (tid=13 silph UI) are 8 bytes apart — likely sibling KEVENT fields in the same silph UI struct. canary's analogs are in the `F80000xx` namespace, similarly clustered. The single fix that addresses both:
 > **2.BF (b)** — synthetic host-side spawn of `sub_825070F0`'s 4 workers at the audit-058-identified context (`0xBCE25340`), entries `0x82506528/58/88/B8`. Once those workers run, they signal the silph UI PKEVENT cluster, unwedging BOTH tid=1 main and tid=13 silph UI in one shot.
 2.BE (host-driven VSync ISR delivery) becomes follow-on work after the UI bootstrap completes and frame pacing actually matters.
 ## Open questions for iterate 2.BD′ / 2.BE planning
 1. **Does 2.BE alone unwedge tid=13?** Cheapest verification path: land 2.BE and re-run audit-059, see whether `0x12ac` signal count goes 0 → non-zero.
 2. **What is the LR-pattern of canary's `KeSetEvent guest_ptr=0xBCE25234` callers?** The current probe doesn't capture LR — extending the cvar to do so on a filtered subset would let us name the producer function in canary's namespace.
 3. **Does the GPU frame-limiter's CP interrupt actually walk into the silph UI cluster?** I.e., does `EmulateCPInterruptDPC` → `interrupt_callback` → guest code ever hit `sub_821CB030` or its callees? An LR probe inside `EmulateCPInterruptDPC` would answer this.
 ## Artifacts
 - `canary.log` 2.2 MB / 34,095 lines / 32,977 AUDIT-HLC lines
 - `canary.stdout` 2.2 MB (duplicate of canary.log due to log_file fallback)
 - `canary.stderr` 8.4 KB (Wine diagnostics)
 - `ours.log` 479 lines (focus ledger + thread diagnostics + final state)
 - `ours.stderr` 317 lines (kernel-call counters)
 - `vkd3d-proton.cache.write` 15 KB (build artifact, ignored)
 Commits in play (xenia-canary, fork-local only):
 - `03362b59f` cross-build-wine (cross-compile toolchain)
 - `d031d7c51` audit-handle-lifecycle-probes (this audit's probes)
--- a/audit-runs/audit-059-handle-disambiguation/ROUND_34_PLAN.md
+++ b/audit-runs/audit-059-handle-disambiguation/ROUND_34_PLAN.md
@@ -0,0 +1,116 @@
 # Round 34 — silph_ui_synth.rs (cluster B sibling) — DEFERRED PLAN
 ## Background
 Rounds 23-33 drove γ-cluster #2 down to the actual gate: **`sub_821741C8`** (silph worker-dispatch loop) fires 0× in ours / 471× in canary (tid=6). It's invoked via dynamic vtable slot 9 from `sub_821752C0` thunk. The vtable writer is in the audit-050 unreachability island — there's no static caller chain to hook into.
 The fix shape is a synth module analogous to `silph_synth.rs` (rounds 18-21):
 - Synthesize a singleton-like object with the right vtable
 - Spawn a guest thread at the right entry with this object as r3
 - Let the dispatch chain do the rest
 Rounds 18-21 took 4 rounds to land cluster A's analog and ended at "workers run live but idle" because of missing foreign-pointer fields. Cluster B will face similar challenges.
 ## Sub-round breakdown (estimated 5-8 rounds)
 ### 34.α — Probe canary's dispatcher singleton (1 round)
 Capture canary's runtime state at `sub_821741C8` entry:
 - `r3 = 0xBCA44C00` (canary tid=6's dispatcher singleton)
 - Dump `r3..r3+0x80` to identify all fields
 - Note vtable address at `[r3+0]`
 ```bash
 WINEDEBUG=-all wine xenia_canary.exe --mute=true --audit_handle_lifecycle=true \
  --audit_jit_prolog_pc=0x821741C8 --audit_jit_prolog_r3_bytes=128 \
  --audit_jit_prolog_mem_dump=<vtable_va_from_r3+0> \
  ...
 ```
 ### 34.β — Probe full vtable layout (1 round)
 Read the vtable bytes statically from the PE (canary's `[r3+0]` IS a static XEX VA — same trick as round 21):
 - Read 32-64 slots from PE at file offset = vtable VA - 0x82000000
 - Confirm slot 9 = `sub_821C7CB8` and `vtable+0x24` thunk to `sub_821741C8`
 - Look at all other slots — do any reference deep guest code that needs more init?
 Cross-reference each slot's DB reach. If a slot is the dispatcher's own method body, it'll be called from within the chain — needs to exist.
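The static read in 34.β could be sketched as follows. It takes the plan's `file offset = vtable VA - 0x82000000` mapping at face value and ignores section headers, which is an assumption; `read_vtable_slots` and `IMAGE_BASE` are hypothetical names. Slots are decoded as big-endian, as guest pointers are stored in the image.

```rust
/// Image base the plan subtracts from a vtable VA to get a file offset.
const IMAGE_BASE: u32 = 0x8200_0000;

/// Reads `count` big-endian 32-bit vtable slots from `image`, starting at the
/// file offset derived from `vtable_va`. Returns `None` when the VA is below
/// the image base or the requested range runs past the end of the buffer.
fn read_vtable_slots(image: &[u8], vtable_va: u32, count: usize) -> Option<Vec<u32>> {
    let start = vtable_va.checked_sub(IMAGE_BASE)? as usize;
    let end = start.checked_add(count.checked_mul(4)?)?;
    let bytes = image.get(start..end)?;
    Some(
        bytes
            .chunks_exact(4)
            .map(|w| u32::from_be_bytes([w[0], w[1], w[2], w[3]]))
            .collect(),
    )
}
```

Slot `n` of the result corresponds to `vtable + 4*n`, so the plan's slot 9 check is a comparison against index 9.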
 ### 34.γ — Skeleton synth + thread spawn (1 round)
 Create `crates/xenia-kernel/src/silph_ui_synth.rs` mirroring `silph_synth.rs` structure:
 ```rust
 pub fn spawn_silph_ui_dispatcher(state: &mut KernelState, mem: &GuestMemory, scheduler: &mut Scheduler) -> Result<u32, &'static str> {
    if state.silph_ui_synth_done { return Ok(state.silph_ui_synth_ctx); }
    // Allocate ~0x100-0x200 bytes for the dispatcher singleton
    let ctx = state.heap_alloc(0x200, 16)?;
    mem.write_zeros(ctx, 0x200);
    // Install static-XEX vtable at [+0]
    mem.write_u32(ctx + 0x00, VTABLE_VA);  // discovered in 34.β
    // Other init fields from 34.α dump
    // ...
    // Spawn dispatcher thread at sub_821748F0 with r3=ctx
    scheduler.spawn(SpawnParams{
        entry: 0x821748F0,
        start_context: ctx,
        create_suspended: false,
        ...
    })?;
    state.silph_ui_synth_done = true;
    state.silph_ui_synth_ctx = ctx;
    Ok(ctx)
 }
 ```
 Hook point: first reach of `sub_821CB030` in the existing silph factory chain (the call site that should normally trigger this dispatcher's creation in canary).
 Add 3-mode env gate: `XENIA_SILPH_UI_SYNTH={unset|=suspend|=1}`.
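One way to parse the 3-mode gate is sketched below. The `SynthMode` enum and function names are hypothetical, and treating any unrecognized value as off is an assumption chosen so the default behavior stays unchanged.

```rust
/// The three modes named in the plan's env gate.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum SynthMode {
    Off,
    Suspend,
    On,
}

/// Maps the raw variable value to a mode. Unrecognized values fall back to
/// `Off` (an assumption of this sketch).
fn parse_synth_mode(raw: Option<&str>) -> SynthMode {
    match raw {
        Some("1") => SynthMode::On,
        Some("suspend") => SynthMode::Suspend,
        _ => SynthMode::Off,
    }
}

/// Reads `XENIA_SILPH_UI_SYNTH` from the process environment.
fn synth_mode_from_env() -> SynthMode {
    parse_synth_mode(std::env::var("XENIA_SILPH_UI_SYNTH").ok().as_deref())
}
```

Keeping the parse separate from the environment read makes the gate testable without mutating process state.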
 ### 34.δ — Run + diagnose first crash (1 round)
 Almost certainly crashes on a NULL deref of one of the singleton's fields. Use round 19's pattern:
 - Probe at thread entry + early BB heads
 - Identify the offset that's accessed
 - Compare to canary's value at that offset
 ### 34.ε..η — Iterate on field fills (2-4 rounds)
 Each crash identifies one more required field. Fill it. Re-run. Continue until workers idle (verdict D analog).
 ### 34.θ — Producer-side seeding (1 round)
 Even with the dispatcher running, work-items may not flow. Per round 32 it's pool 3 that's starved (271 fires in canary). The producers are `sub_821CBEA8 / sub_821D24A0 / sub_821CD458` — they may need their own bootstrap. Probe what triggers them in canary.
 ## Verification at each stage
 After every commit:
 - `cargo test --release --workspace` — 765/765 must pass
 - `XENIA_CACHE_PERSIST=1 XENIA_SILPH_UI_SYNTH=1 ./target/release/xenia-rs exec <ISO> -n 50000000 --trace-handles-focus=0x1218,0x1224,0x12a4,0x12ac`
 - Check:
  - No crash
  - `sub_821741C8` fires
  - `sub_82450b68` r4=3 fires increase
  - Handle 0x1224 / 0x1218 transition out of NO_SIGNALS_DESPITE_WAITS
  - Eventually: `VdSwap > 1, draws > 0`
 ## Risk register
 - **High**: dispatcher singleton may require many more fields than the analog WorkerCtx (rounds 18-21 needed 8 KEVENTs + ring + descriptors + index table; UI dispatcher likely has similar scope)
 - **High**: foreign-arena pointers in canary's heap (similar to round 19's `[+0x28/+0x2C/+0x30]`) may need their own synthesis
 - **Medium**: cluster B's worker may itself spawn threads which need contexts which need... cascading scope
 - **Low**: workspace tests breaking (probe infrastructure is solid)
 - **Low**: existing iterate-2BE work regressing (it's on a separate branch)
 ## Off-ramps
 If we hit a wall at any sub-round, the off-ramps are:
 1. Land the infrastructure as opt-in (rounds 18-21 pattern) and ship cluster A + cluster B both as opt-in env vars
 2. Drop cluster B entirely and PR the iterate-2BE work to master (production-ready architectural fix)
 3. Pivot to lockstep diff of inflate function (round 30 hypothesis (i)) if cluster B keeps producing crash-fix layers
 ## Branch plan
 New branch: `iterate-2BF/silph-ui-synth` off `iterate-2BF/synthetic-silph-spawn` HEAD `40f208e`. Each sub-round = 1 commit. All commits opt-in via env var; default behavior unchanged.
 ## When ready to execute
 Dispatch with the prompt at the round-33 agent's recommendation, starting at sub-round 34.α.
--- a/audit-runs/audit-059-handle-disambiguation/round-A1-canary-dispatcher-entry/canary.stdout
+++ b/audit-runs/audit-059-handle-disambiguation/round-A1-canary-dispatcher-entry/canary.stdout
--- a/audit-runs/audit-059-handle-disambiguation/round-A4-ours-chain-probe/ours.log
+++ b/audit-runs/audit-059-handle-disambiguation/round-A4-ours-chain-probe/ours.log
@@ -0,0 +1,66 @@
 AUDIT-PC-PROBE pc=0x8216ea68 tid=1 hw=0 cycle=5362918 lr=0x824ab8e0 r3=0x00000000 r11=0x00000000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
 AUDIT-PC-PROBE pc=0x822f1aa8 tid=1 hw=0 cycle=6181256 lr=0x8216ee14 r3=0x40d09a40 r11=0x40111910 [r3+0]=0x00000021 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x40541a40 [r3+0x30]=0x00000000
 AUDIT-PC-PROBE pc=0x822f1b38 tid=1 hw=0 cycle=6181641 lr=0x822f1b38 r3=0x00000001 r11=0x824b0000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
 AUDIT-PC-PROBE pc=0x821746b0 tid=1 hw=0 cycle=9229300 lr=0x82173c38 r3=0x40ba9a80 r11=0x00000000 [r3+0]=0x40111910 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
 AUDIT-PC-PROBE pc=0x821748f0 tid=13 hw=1 cycle=0 lr=0xbcbcbcbc r3=0x4024a840 r11=0x00000000 [r3+0]=0x40ba9a80 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x4250dec0
 === Final State ===
 PC:  0x00000000
 LR:  0xbcbcbcbc
 CTR: 0x00000000
 CR:  0x00000000
 XER: CA=0 OV=0 SO=0
 === Thread diagnostics ===
  hw=0 idx=0 tid=1 state=Blocked(WaitAny { handles: [4208], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x700ff6e0
     r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a72328
     r8=0x43b77284 r9=0x43b77328 r10=0x00000001 r11=0x00000103 r12=0x82173c64 r13=0x7fff0000
  hw=0 idx=1 tid=11 state=Blocked(WaitAny { handles: [2190094916, 2190094880], deadline: None }) pc=0x824d2a94 lr=0x824d2a94 sp=0x71497d90
     r0=0x00000000 r3=0x00000000 r4=0x71497de0 r5=0x00000001 r6=0x00000003 r7=0x00000001
     r8=0x00000000 r9=0x00000000 r10=0x71497df0 r11=0x828a3244 r12=0xbcbcbcbc r13=0x4b9f1000
  hw=1 idx=0 tid=2 state=Blocked(WaitAny { handles: [2189887804], deadline: None }) pc=0x824a95f8 lr=0x824a95f8 sp=0x710ffd20
     r0=0x0000030c r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x00000000
     r8=0x00000001 r9=0x6f000000 r10=0x824a9178 r11=0x82870000 r12=0x824a94f0 r13=0x4acc3000
  hw=1 idx=1 tid=13 state=Blocked(WaitAny { handles: [4216], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x715a7a20
     r0=0x821511d0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
     r8=0x43b77334 r9=0x43b77334 r10=0x40541f80 r11=0x00000001 r12=0x821cb1e0 r13=0x4d1d4000
  hw=2 idx=0 tid=7 state=Blocked(WaitAny { handles: [1111821148], deadline: Some(42946672) }) pc=0x824cd4f4 lr=0x824cd4f4 sp=0x71187e60
     r0=0x00000000 r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x71187eb0
     r8=0x00000000 r9=0x00000000 r10=0x00000002 r11=0x00000002 r12=0xbcbcbcbc r13=0x4b1d6000
  hw=2 idx=1 tid=8 state=Blocked(WaitAny { handles: [4176, 4128], deadline: None }) pc=0x824ab214 lr=0x824ab214 sp=0x71287c90
     r0=0x00000000 r3=0x00000000 r4=0x71287cf0 r5=0x00000001 r6=0x00000001 r7=0x00000000
     r8=0x00000000 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x822f1ff0 r13=0x4b90a000
  hw=3 idx=0 tid=4 state=Blocked(WaitAny { handles: [4120], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7112fb80
     r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
     r8=0x43b7732c r9=0x828f0000 r10=0x00000008 r11=0x00000000 r12=0x8245a660 r13=0x4adc6000
  hw=3 idx=1 tid=5 state=Blocked(WaitAny { handles: [4224], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7116fbe0
     r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
     r8=0x43b7732c r9=0x828f0000 r10=0x00000001 r11=0x00000000 r12=0x82458b34 r13=0x4adc8000
  hw=4 idx=0 tid=9 state=Ready pc=0x824d140c lr=0x824d22b4 sp=0x71387df0
     r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
     r8=0x4b9ec000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ec000
  hw=5 idx=0 tid=3 state=Blocked(WaitAny { handles: [4112], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7111fdf0
     r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x00000a10
     r8=0x00000010 r9=0x00000000 r10=0x00009030 r11=0x00000000 r12=0x82181988 r13=0x4adc4000
  hw=5 idx=1 tid=6 state=Ready pc=0x824ab214 lr=0x824ab214 sp=0x7117fc60
     r0=0x821511a0 r3=0x00000001 r4=0x7117fcc0 r5=0x00000001 r6=0x00000001 r7=0x00000000
     r8=0x7117fcb0 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x82458d68 r13=0x4adca000
  hw=5 idx=2 tid=10 state=Ready pc=0x824d1404 lr=0x824d22b4 sp=0x71487e00
     r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
     r8=0x4b9ee000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ee000
  hw=5 idx=3 tid=12 state=Ready pc=0x824aa6a4 lr=0x824aa6a4 sp=0x714a7da0
     r0=0x00000000 r3=0x000000ff r4=0x00000020 r5=0x714a7df4 r6=0x00000000 r7=0x00000000
     r8=0x00000000 r9=0x00000000 r10=0x00000000 r11=0x00000001 r12=0x8217898c r13=0x4d1d2000
  -- Handle waiter lists --
    handle=0x00001020 Semaphore(0/2147483647) waiters(tid)=[8]
    handle=0x42450b5c Event(sig=false, mr=true) waiters(tid)=[7]
    handle=0x828a3244 Event(sig=false, mr=false) waiters(tid)=[11]
    handle=0x00001018 Semaphore(0/2147483647) waiters(tid)=[4]
    handle=0x8287093c Event(sig=false, mr=false) waiters(tid)=[2]
    handle=0x00001070 Thread(id=13, exit=None) waiters(tid)=[1]
    handle=0x00001080 Event(sig=false, mr=false) waiters(tid)=[5]
    handle=0x00001078 Event(sig=false, mr=false) waiters(tid)=[13]
    handle=0x828a3220 Event(sig=false, mr=true) waiters(tid)=[11]
    handle=0x00001050 Event(sig=false, mr=true) waiters(tid)=[8]
    handle=0x00001010 Event(sig=false, mr=true) waiters(tid)=[3]
--- a/audit-runs/audit-059-handle-disambiguation/round-A4b-ours-spawn-gate/FINDINGS.md
+++ b/audit-runs/audit-059-handle-disambiguation/round-A4b-ours-spawn-gate/FINDINGS.md
@@ -0,0 +1,167 @@
 # Round-A1..A4 findings — canary tid=6 spawn chain & divergence frontier
 ## Anchor reframe (round-37 misread corrected)
 The "factory/registry layer divergence at [0x828E1F08]" framing is falsified.
 Both engines install the SAME static-XEX `.rdata` vtable `0x820A183C` at the
 singleton's `[+0]`. The instance VAs differ only because of ε-class allocator
 divergence (audit-043).
 | Probe                      | Canary               | Ours                 |
 |----------------------------|----------------------|----------------------|
 | `[0x828E1F08]`             | 0xBC22C910 (heap)    | 0x40111910 (heap)    |
 | `[[0x828E1F08]+0]` vtable  | 0x820A183C           | 0x820A183C (SAME)    |
 | `vtable[+0]` thunk         | 0x82175330           | 0x82175330 (SAME)    |
 | `vtable[+8]` thunk         | 0x82175340 → b sub_821741C8 | SAME (vtable bytes from XEX `.rdata`) |
 The thunks at 0x82175330+ are 8-byte `lwz r3, 8(r3); b <real_method>`
 trampolines. Slot 2 (`+0x08`) is the worker dispatch entry that round 33
 identified as 471× in canary tid=6 / 0× in ours.
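The slot-to-thunk relationship in the table can be sanity-checked with simple arithmetic. This sketch assumes the thunks are laid out contiguously in slot order, which matches the two rows shown but is not confirmed beyond them; the constant and function names are hypothetical.

```rust
/// Address of the first thunk, i.e. the value stored in `vtable[+0]`.
const THUNK_BASE: u32 = 0x8217_5330;
/// Each thunk is two 4-byte PowerPC instructions (`lwz` + `b`).
const THUNK_SIZE: u32 = 8;
/// Each vtable entry is a 32-bit guest pointer.
const SLOT_SIZE: u32 = 4;

/// Converts a vtable byte offset to a slot index.
fn slot_index(vtable_offset: u32) -> u32 {
    vtable_offset / SLOT_SIZE
}

/// Predicts the thunk address for a slot under the contiguous-layout
/// assumption stated above.
fn thunk_addr(slot: u32) -> u32 {
    THUNK_BASE + slot * THUNK_SIZE
}
```

Offset `+0x08` maps to slot 2, and slot 2 maps to `0x82175340`, the thunk that branches to `sub_821741C8`.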
 ## A.1 — Canary dispatcher loop is in sub_822F1AA8 on tid=6
 Probe `--audit_jit_prolog_pc=0x821741C8 --audit_jit_prolog_r3_bytes=256` on
 canary (35 s):
 - ~1678 fires of sub_821741C8 on **tid=6**
 - r3 at entry = `0xBCCC4A80` (the inner sub-object of the silph::UImpl
  singleton — extracted via the thunk's `lwz r3, 8(r3)`)
 - LR at entry = `0x822F1D5C` (return PC after the `bctrl` at 0x822F1D58 inside
  sub_822F1AA8)
 - Singleton's `[+C0..+D0]` UTF-16 spells "HF Frequency" (a UI label)
 The dispatch site in canary (the `bctrl`) is at PC 0x822F1D58 inside
 sub_822F1AA8:
 ```
 0x822F1D40:  lwz     r3, 7944(r25)        ; r3 = [r25+0x1F08] = [0x828E1F08]
 0x822F1D4C:  lwz     r11, 0(r3)           ; vtable
 0x822F1D50:  lwz     r11, 8(r11)          ; vtable[+8] = thunk 0x82175340
 0x822F1D54:  mtctr   r11
 0x822F1D58:  bctrl                         ; → 0x82175340 → b 0x821741C8
 ```
 ## A.2 — Canary tid=6 spawn site is sub_821746B0 at PC 0x82174824
 Enumeration of `ExCreateThread` calls in canary (35 s, 21 unique tuples):
 ```
 entry=821748F0 start_ctx=BC365700 lr=824AC5F0 guest_lr=82174828  ← silph dispatcher #1
 entry=821748F0 start_ctx=BC366DA0 lr=824AC5F0 guest_lr=82174828  ← silph dispatcher #2
 ```
 PC `0x82174824` is the `bl 0x82172370` (the `ExCreateThread` thunk) inside
 `sub_821746B0`. The setup is:
 ```
 0x8217480C:  lis     r11, 0x8217
 0x82174810:  li      r7, 0
 0x82174814:  li      r6, 4               ; priority
 0x82174818:  mr      r5, r29             ; start_ctx
 0x8217481C:  addi    r4, r11, 18672      ; r4 = 0x821748F0 (entry)
 0x82174820:  li      r3, 0
 0x82174824:  bl      0x82172370          ; ExCreateThread
 ```
 The entry `0x821748F0` is a thread main that calls `bl 0x821749C0` (the
 inner dispatch).
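The entry address in the annotation comes from the usual `lis` + `addi` pair: `lis` loads the upper 16 bits and `addi` adds a sign-extended 16-bit immediate. A small sketch for re-deriving such constants from a disassembly listing; `lis_addi` is a hypothetical helper name and the result is truncated to the 32-bit guest address space.

```rust
/// Computes the 32-bit result of `lis rX, hi` followed by `addi rY, rX, lo`.
/// `lis` places `hi` in the upper 16 bits; `addi` adds `lo` sign-extended.
fn lis_addi(hi: u16, lo: i16) -> u32 {
    ((hi as u32) << 16).wrapping_add(lo as i32 as u32)
}
```

`lis_addi(0x8217, 18672)` yields `0x821748F0`, the thread entry passed in r4. Because the immediate is signed, a listing that shows a negative `addi` operand means the `lis` value is one higher than the final upper half.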
 ## A.3 — sub_822F1AA8 spawns a SECOND thread at 0x822F1B08
 The dispatch-loop function `sub_822F1AA8` itself ALSO spawns a thread at
 PC 0x822F1B08 with entry=`sub_822F1EE0` and `start_ctx=BCE24A40`:
 ```
 0x822F1AEC:  lis     r11, 0x822F
 0x822F1AFC:  addi    r4, r11, 7904        ; r4 = 0x822F1EE0
 0x822F1B08:  bl      0x82172370           ; ExCreateThread
 ```
 sub_822F1EE0 → sub_822F1F20 contains its own atomic state-machine + wait loop.
 ## A.3' — sub_822F1AA8 has exactly 2 callers, both in sub_8216EA68
 ```
 source=0x8216ECCC source_func=0x8216EA68 kind=call
 source=0x8216EE10 source_func=0x8216EA68 kind=call
 ```
 So sub_8216EA68 is the only function that drives sub_822F1AA8.
 ## A.4 — Ours' divergence is INSIDE the spawned thread, NOT at the spawn
 Mirror-probed ours at `sub_821746B0` body BB heads (parallel mode, 50M
 instructions, XENIA_CACHE_PERSIST=1):
 | PC          | Fires | Notes                                          |
 |-------------|-------|------------------------------------------------|
 | 0x821746B0  | 1     | Entry. r3=0x40ba9a80                           |
 | 0x821746E0  | 1     | After `bl 0x8284DCFC` (critical-section)       |
 | 0x82174798  | 1     | After the early `beq` (r28==0 branch)          |
 | 0x821747B8  | 1     | **Past the gate**: `[0x828E2B14]=0x40105000` non-NULL; `bl 0x82150EF8` returned r3=0x4024a840 (NON-NULL) |
 | 0x821747D8  | 1     | After the inner `bl 0x821723F0`                |
 | 0x8217480C  | 1     | Enters the spawn block                         |
 | 0x82174828  | 1     | **Post-`bl ExCreateThread`**, r3=0x1070 = thread handle |
 **OURS DOES SPAWN THE THREAD VIA THIS SITE.** The returned handle 0x1070 is
 **tid=13's thread handle** (per round 37 final state). So **ours' tid=13 IS
 the same logical thread as canary's tid=6** — spawned by the identical call
 site with the same entry (0x821748F0).
 ## A.4' — Divergence is INSIDE the spawned thread's body
 Round 37's frame trail for ours' tid=13 wedge:
 `0x821CB1E0 → 0x821CBAE0 → 0x821CC454 → 0x821C4F18 → 0x82174A80`
 The LAST frame `0x82174A80` is **inside sub_821749C0** (= the inner dispatch
 called from sub_821748F0). It's right after the vtable dispatch at
 0x82174A7C (`bctrl` to `[[r30+0]+16]`):
 ```
 0x82174a64:  mr      r3, r30              ; r3 = some object
 0x82174a68:  lwz     r11, 0(r30)
 0x82174a6c:  lwz     r4, 4(r29)
 0x82174a70:  lwz     r5, 8(r31)
 0x82174a74:  lwz     r11, 16(r11)         ; r11 = vtable[+0x10]
 0x82174a78:  mtctr   r11
 0x82174a7c:  bctrl                         ; dispatch
 0x82174a80:  lwz     r3, 0(r29)           ; ← wedge frame top (LR after bctrl)
 ```
 So `sub_821749C0`'s vtable[+0x10] dispatch on tid=13/tid=6's `r30` object
 lands at audit-049 territory in ours (chain through sub_821CB030+0x128 that
 ends waiting forever on handle 0x1078). In canary, the same dispatch on the
 same object SHOULD land somewhere that ultimately reaches sub_822F1AA8's
 dispatch loop and runs sub_821741C8 1678× via vtable[+8].
 **The object `r30` is the result of `bl 0x821CF3F0`** at PC 0x821749DC. So
 sub_821CF3F0 returns a registry-lookup object, and the body of the method in
 that object's vtable slot +0x10 determines whether the thread wedges or runs.
 ## Phase B classification
 Class 3 — **Missing init-time precondition**. Ours reaches the spawn site,
 ours' tid=13 enters the chain, ours' tid=13 enters sub_821749C0, but the
 vtable[+0x10] dispatch at PC 0x82174A7C in ours lands in audit-049 territory
 (wait forever on 0x1078) rather than continuing through the canonical chain
 toward sub_822F1AA8's outer dispatch loop.
 Possible classes to refine in next round:
 - **3a**: same vtable but state-dependent — `r30`'s field at a specific offset
  differs in ours vs canary, causing the method body to take a different
  branch.
 - **3b**: the vtable in `r30` is DIFFERENT in ours vs canary (e.g., ours has
  a base-class vtable but canary has a derived-class vtable).
 - **4**: synthesis fallback — spawn a SECOND thread that runs sub_822F1AA8's
  dispatch loop directly, bypassing the wedged sub_821749C0 chain.
 ## Next probe (A.4.5)
 Probe both engines at sub_821749C0 entry filtering tid=13 (ours) / tid=6
 (canary), capturing:
 - `r3` and `r4` at entry (the factory-output object and the ctx)
 - After the `bl 0x821CF3F0` at 0x821749DC: capture r30 (= sub_821CF3F0
  return — the object whose vtable is dispatched at 0x82174A78)
 - At PC 0x82174A7C (the divergent bctrl): r30, `[r30+0]` (vtable), and vtable[+0x10]
  (the dispatch target)
 If ours and canary have IDENTICAL `vtable[+0x10]` targets but the method
 body's behavior differs → class 3a (state divergence). If targets differ →
 class 3b (vtable identity divergence).
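The decision rule for the next probe reduces to one comparison. A minimal sketch, with the enum and function names hypothetical; it assumes the dispatch targets are static XEX code addresses and therefore directly comparable across engines, unlike heap instance VAs.

```rust
/// Outcome classes for the A.4.5 probe, named after the classes above.
#[derive(Debug, PartialEq, Eq)]
enum Divergence {
    /// Same dispatch target in both engines: the difference is object state.
    State3a,
    /// Different dispatch targets: the objects carry different vtables.
    VtableIdentity3b,
}

/// Classifies the divergence from the `vtable[+0x10]` target captured in
/// each engine at the dispatch site.
fn classify(ours_target: u32, canary_target: u32) -> Divergence {
    if ours_target == canary_target {
        Divergence::State3a
    } else {
        Divergence::VtableIdentity3b
    }
}
```

A 3a result points the following round at field-by-field diffs of the `r30` object; a 3b result points it at whatever installed the vtable.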
--- a/audit-runs/audit-059-handle-disambiguation/round-A4b-ours-spawn-gate/ours.log
+++ b/audit-runs/audit-059-handle-disambiguation/round-A4b-ours-spawn-gate/ours.log
@@ -0,0 +1,91 @@
 AUDIT-PC-PROBE pc=0x821746b0 tid=1 hw=0 cycle=9228833 lr=0x82173c38 r3=0x40ba9a80 r11=0x00000000 [r3+0]=0x40111910 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
 AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x821746b0 tid=1 cycle=9228833
 AUDIT-PC-PROBE pc=0x821746e0 tid=1 hw=0 cycle=9228856 lr=0x821746e0 r3=0x00000000 r11=0x00000000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
 AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x821746e0 tid=1 cycle=9228856
 AUDIT-PC-PROBE pc=0x82174798 tid=1 hw=0 cycle=9228859 lr=0x821746e0 r3=0x00000000 r11=0x00000000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
 AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x82174798 tid=1 cycle=9228859
 AUDIT-PC-PROBE pc=0x821747b8 tid=1 hw=0 cycle=9229012 lr=0x821747ac r3=0x4024a840 r11=0x4024a840 [r3+0]=0x4024ace0 [[r3+0]+24]=0x43777290 [r3+0x0C]=0x4024a820 [r3+0x30]=0x4250dec0
 AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x821747b8 tid=1 cycle=9229012
 AUDIT-PC-PROBE pc=0x821747d8 tid=1 hw=0 cycle=9229440 lr=0x821747cc r3=0x4024a840 r11=0xffffffff [r3+0]=0x40ba9a80 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x4250dec0
 AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x821747d8 tid=1 cycle=9229440
 AUDIT-PC-PROBE pc=0x8217480c tid=1 hw=0 cycle=9229443 lr=0x821747cc r3=0x4024a840 r11=0xffffffff [r3+0]=0x40ba9a80 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x4250dec0
 AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x8217480c tid=1 cycle=9229443
 AUDIT-PC-PROBE pc=0x82174828 tid=1 hw=0 cycle=9229509 lr=0x82174828 r3=0x00001070 r11=0x824b0000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
 AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x82174828 tid=1 cycle=9229509
 === Final State ===
 PC:  0x824ac578
 LR:  0x824ac578
 CTR: 0x82153bf0
 CR:  0x24000028
 XER: CA=0 OV=0 SO=0
 r0 : 0x0000000082153bf0
 r1 : 0x00000000700ff6e0
 r2 : 0x0000000020000000
 r4 : 0x0000000000000001
 r7 : 0x0000000003a72328
 r8 : 0x0000000043b77284
 r9 : 0x0000000043b77328
 r10: 0x0000000000000001
 r11: 0x0000000000000103
 r12: 0x0000000082173c64
 r13: 0x000000007fff0000
 r18: 0x0000000040d09a7c
 r23: 0x00000000828f3844
 r26: 0x000000004024a620
 r27: 0x00000000820a17a8
 r31: 0x0000000000001070
 === Thread diagnostics ===
  hw=0 idx=0 tid=1 state=Blocked(WaitAny { handles: [4208], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x700ff6e0
     r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a72328
     r8=0x43b77284 r9=0x43b77328 r10=0x00000001 r11=0x00000103 r12=0x82173c64 r13=0x7fff0000
  hw=0 idx=1 tid=11 state=Blocked(WaitAny { handles: [2190094916, 2190094880], deadline: None }) pc=0x824d2a94 lr=0x824d2a94 sp=0x71497d90
     r0=0x00000000 r3=0x00000000 r4=0x71497de0 r5=0x00000001 r6=0x00000003 r7=0x00000001
     r8=0x00000000 r9=0x00000000 r10=0x71497df0 r11=0x828a3244 r12=0xbcbcbcbc r13=0x4b9f1000
  hw=1 idx=0 tid=2 state=Blocked(WaitAny { handles: [2189887804], deadline: None }) pc=0x824a95f8 lr=0x824a95f8 sp=0x710ffd20
     r0=0x0000030c r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x00000000
     r8=0x00000001 r9=0x6f000000 r10=0x824a9178 r11=0x82870000 r12=0x824a94f0 r13=0x4acc3000
  hw=1 idx=1 tid=13 state=Blocked(WaitAny { handles: [4216], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x715a7a20
     r0=0x821511d0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
     r8=0x43b77334 r9=0x43b77334 r10=0x40541f80 r11=0x00000001 r12=0x821cb1e0 r13=0x4d1d4000
  hw=2 idx=0 tid=7 state=Blocked(WaitAny { handles: [1111821148], deadline: Some(42946672) }) pc=0x824cd4f4 lr=0x824cd4f4 sp=0x71187e60
     r0=0x00000000 r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x71187eb0
     r8=0x00000000 r9=0x00000000 r10=0x00000002 r11=0x00000002 r12=0xbcbcbcbc r13=0x4b1d6000
  hw=2 idx=1 tid=8 state=Blocked(WaitAny { handles: [4176, 4132], deadline: None }) pc=0x824ab214 lr=0x824ab214 sp=0x71287c90
     r0=0x00000000 r3=0x00000000 r4=0x71287cf0 r5=0x00000001 r6=0x00000001 r7=0x00000000
     r8=0x00000000 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x822f1ff0 r13=0x4b90a000
  hw=3 idx=0 tid=4 state=Blocked(WaitAny { handles: [4120], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7112fb80
     r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
     r8=0x43b7732c r9=0x828f0000 r10=0x00000008 r11=0x00000000 r12=0x8245a660 r13=0x4adc6000
  hw=3 idx=1 tid=5 state=Blocked(WaitAny { handles: [4224], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7116fbe0
     r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
     r8=0x43b7732c r9=0x828f0000 r10=0x00000001 r11=0x00000000 r12=0x82458b34 r13=0x4adc8000
  hw=4 idx=0 tid=9 state=Ready pc=0x824d140c lr=0x824d22b4 sp=0x71387df0
     r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
     r8=0x4b9ec000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ec000
  hw=5 idx=0 tid=3 state=Blocked(WaitAny { handles: [4112], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7111fdf0
     r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x00000a10
     r8=0x00000010 r9=0x00000000 r10=0x00009030 r11=0x00000000 r12=0x82181988 r13=0x4adc4000
  hw=5 idx=1 tid=6 state=Ready pc=0x824ab214 lr=0x824ab214 sp=0x7117fc60
     r0=0x821511a0 r3=0x00000001 r4=0x7117fcc0 r5=0x00000001 r6=0x00000001 r7=0x00000000
     r8=0x7117fcb0 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x82458d68 r13=0x4adca000
  hw=5 idx=2 tid=10 state=Ready pc=0x824d140c lr=0x824d22b4 sp=0x71487e00
     r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
     r8=0x4b9ee000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ee000
  hw=5 idx=3 tid=12 state=Ready pc=0x824aa6a4 lr=0x824aa6a4 sp=0x714a7da0
     r0=0x00000000 r3=0x000000ff r4=0x00000020 r5=0x714a7df4 r6=0x00000000 r7=0x00000000
     r8=0x00000000 r9=0x00000000 r10=0x00000000 r11=0x00000001 r12=0x8217898c r13=0x4d1d2000
  -- Handle waiter lists --
    handle=0x00001024 Semaphore(0/2147483647) waiters(tid)=[8]
    handle=0x00001010 Event(sig=false, mr=true) waiters(tid)=[3]
    handle=0x00001070 Thread(id=13, exit=None) waiters(tid)=[1]
    handle=0x00001080 Event(sig=false, mr=false) waiters(tid)=[5]
    handle=0x828a3244 Event(sig=false, mr=false) waiters(tid)=[11]
    handle=0x00001018 Semaphore(0/2147483647) waiters(tid)=[4]
    handle=0x00001050 Event(sig=false, mr=true) waiters(tid)=[8]
    handle=0x00001078 Event(sig=false, mr=false) waiters(tid)=[13]
    handle=0x8287093c Event(sig=false, mr=false) waiters(tid)=[2]
    handle=0x828a3220 Event(sig=false, mr=true) waiters(tid)=[11]
    handle=0x42450b5c Event(sig=false, mr=true) waiters(tid)=[7]
--- a/audit-runs/audit-059-handle-disambiguation/round-A5-canary-sub821749C0/canary.stdout
+++ b/audit-runs/audit-059-handle-disambiguation/round-A5-canary-sub821749C0/canary.stdout
--- a/audit-runs/audit-059-handle-disambiguation/round-A6-canary-822F1AA8/canary.stdout
+++ b/audit-runs/audit-059-handle-disambiguation/round-A6-canary-822F1AA8/canary.stdout
--- a/audit-runs/audit-059-handle-disambiguation/round-A7-canary-entry-point/canary.stdout
+++ b/audit-runs/audit-059-handle-disambiguation/round-A7-canary-entry-point/canary.stdout
--- a/audit-runs/audit-059-handle-disambiguation/round-A8-ours-822F1AA8-trace/FINDINGS.md
+++ b/audit-runs/audit-059-handle-disambiguation/round-A8-ours-822F1AA8-trace/FINDINGS.md
@@ -0,0 +1,136 @@
 # Phase A synthesis — canary tid=6 IS the main thread; the wedge is sub_822F1AA8's loop exit
 ## Top-line finding
 **Canary's `tid=6` is canary's main thread.** Confirmed by probing `entry_point`
 (`sub_824AB748`) with `--audit_jit_prolog_pc=0x824AB748`: fires 1× on
 `tid=00000006` with `lr=BCBCBCBC` (= OS-initial / no caller). Ours numbers
 its main thread `tid=1`. Same logical thread; different label.
 Therefore "tid=6 fires sub_821741C8 471×" (round 33) means **the main thread**
 loops inside `sub_822F1AA8` firing `sub_821741C8` ~1678×/30s in canary. In
 ours, the main thread (tid=1) runs `sub_822F1AA8` ONCE, exits the loop, and
 proceeds to thread-join on the spawned init thread (handle 0x1070 = tid=13),
 which is itself blocked forever on handle 0x1078.
 ## Call chain (identical in both engines, different runtime behavior)
 ```
 entry_point (sub_824AB748)
  │
  ├─ sub_824ACB38           CRT-driven fnptr-array iterator (audit-050 region)
  ├─ ...
  └─ sub_8216EA68           Many local calls including:
        ├─ ExCreateThread(entry=sub_8217F0F8 ...)      ; sibling thread
        ├─ sub_822F1AA8(controller=...)                ; FIRST call (PC 0x8216ECCC)
        └─ sub_822F1AA8(controller=0xBCE24A40 canary / ; SECOND call (PC 0x8216EE10)
                                  0x40d09a40 ours)        ↑ this is the loop
 ```
 The SECOND call is what runs the dispatcher loop. Its LR = 0x8216EE14.
 Confirmed in both engines.
 ## sub_822F1AA8 loop structure
 ```
 0x822F1AA8: entry, r30 = r3 (controller)
 0x822F1AEC-0x822F1B08: ExCreateThread(entry=sub_822F1EE0, ctx=r30) → r29 = handle
 0x822F1B30-0x822F1B34: bl 0x824AA8B0(r3=r29)              ; ?
 0x822F1B38-0x822F1B4C: first bctrl → vtable[+0] of [0x828E1F08]
 0x822F1B50-0x822F1B74: setup, bl 0x824AA330 INFINITE wait on [r22+32]
 0x822F1B80-0x822F1BA8: post-wait setup; [r30+0] |= 0x2
 0x822F1BB0-0x822F1BBC: TOP-OF-LOOP CHECK: if [r30+0] & 0x10000000 → goto 0x822F1E10 (exit)
 0x822F1BCC..0x822F1DEC: loop body (includes the vtable[+8] bctrl → sub_821741C8 at PC 0x822F1D58)
 0x822F1DEC-0x822F1DFC: bl 0x824AA330 INFINITE wait on [r23+0]
 0x822F1E00-0x822F1E0C: END-OF-ITERATION CHECK: if [r30+0] & 0x10000000 == 0 → goto 0x822F1BCC (re-loop)
 0x822F1E10-0x822F1E18: EXIT: [r30+0] |= 0x02000000 (set MSB-6 = LSB-25)
 0x822F1E1C-0x822F1E24: release something via bl 0x824AA2F0
 0x822F1E28-0x822F1E30: bl 0x824AA330 INFINITE on [r30+28] = SPAWNED THREAD HANDLE (thread join!)
 0x822F1E40: bl 0x824AA3E0
 0x822F1E44-0x822F1E5C: final cleanup: vtable[+24] bctrl on [0x828E1F08]
 0x822F1E60-0x822F1E78: [r30+0] = 0, then [r30+0] |= 1; bl 0x824567E0
 0x822F1E7C-0x822F1E88: epilogue
 ```
 **Loop exit gate**: `[r30+0] & 0x10000000` (bit 28 LSB / bit 3 MSB). Set →
 exit. Both top-of-loop check (0x822F1BBC) and end-of-iteration check
 (0x822F1E0C) gate on the same bit.
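 As a cross-check of the bit arithmetic, here is a minimal Rust sketch of the gate. The mask and flag values come from the listing above; the function and constant names are made up for illustration and do not exist in the codebase.
 ```rust
 /// Loop-exit flag tested at both check sites (value from the listing above).
 const EXIT_MASK: u32 = 0x1000_0000;
 /// Bit OR-ed into [r30+0] during post-wait setup.
 const SETUP_BIT: u32 = 0x2;
 /// Converts a single-bit mask to PowerPC MSB-first numbering (bit 0 = most significant).
 fn ppc_msb_index(mask: u32) -> u32 {
     assert!(mask.is_power_of_two(), "expected a single-bit mask");
     mask.leading_zeros()
 }
 /// True when the dispatcher loop would branch to its exit path at 0x822F1E10.
 fn loop_should_exit(flags: u32) -> bool {
     (flags & EXIT_MASK) != 0
 }
 ```
 With the entry value 0x21 plus the setup bit, the flag word is 0x23 and `loop_should_exit` is false, which matches the expectation that the loop body should run at least once.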
 ## What's different between engines
 | Engine | [r30+0] at entry | Loop iterations | Exits sub_822F1AA8? |
 |--------|------------------|------------------|----------------------|
 | canary | 0x21 (per probe)  | ~1678+ in 30s    | NO (stays in loop)   |
 | ours   | 0x21 (per probe)  | 0 (probes show none of the loop-body PCs fire after entry) | YES (exits quickly) |
 Both engines have `[r30+0]=0x21` at entry — bit 28 NOT set. After the `ori
 r11, r11, 0x2` at 0x822F1B90, both should have `[r30+0]=0x23`. Bit 28 still
 not set.
 So **some code sets bit 28 on [r30+0] between sub_822F1AA8 entry and the
 loop check** in ours but not in canary.
 Mem-watch on 0x40d09a40 (ours' controller VA) shows **zero guest writes** in
 my 50M-instruction parallel run. Possible reasons:
 - The setter writes from kernel/runtime code that mem-watch doesn't capture
  (kernel-host store, not guest JIT store)
 - The setter writes via a computed alias (different VA but same backing)
 - The bit IS set via a probe-quantum-elided JIT store
 ## Phase B classification
 **Class 3a — state-divergence on the controller object**. The vtable
 identity is the same (round-37 confirmed `0x820A183C` in both). The
 controller object's bit 28 of `[+0]` evolves differently during the setup
 between sub_822F1AA8 entry and the loop check.
 Class 4 (synthesis) is now LESS attractive: ours' main thread DOES reach
 sub_822F1AA8 with the right controller. We don't need to spawn the
 dispatcher — we need to PREVENT the main thread from exiting the loop.
 ## Pragmatic next step — JIT instrumentation to find bit-28 setter
 Most direct diagnostic: add a JIT hook in xenia-cpu that, for guest stores
 executed from PCs in the range [0x822F1AA8, 0x822F1E10), captures the guest PC
 and the written value whenever the store would set bit 28 at any address. This
 identifies the exact PC that sets the loop-exit bit.
 Alternative: extend `--mem-watch` to also capture kernel-side stores by
 hooking the GuestMemory write path at the kernel-state level.
 Even simpler: add a one-shot `--bit-watch=ADDR:MASK` cvar that fires when
 the value at ADDR has any bit in MASK transition from 0→1, regardless of
 who wrote it. This is the cleanest diagnostic for this exact pattern.
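 A rough sketch of the 0→1 transition check such a cvar would need. `BitWatch` is a hypothetical type, not an existing one, and this version assumes the watcher is fed the current value by polling rather than by hooking stores.
 ```rust
 /// Hypothetical watcher for one address; not an existing type in the codebase.
 struct BitWatch {
     mask: u32,
     prev: u32,
 }
 impl BitWatch {
     fn new(mask: u32, initial: u32) -> Self {
         BitWatch { mask, prev: initial }
     }
     /// Takes the current value at the watched address and returns the watched
     /// bits that went 0 -> 1 since the previous observation, if any.
     fn observe(&mut self, current: u32) -> Option<u32> {
         let rising = !self.prev & current & self.mask;
         self.prev = current;
         if rising != 0 { Some(rising) } else { None }
     }
 }
 ```
 Because it compares values instead of intercepting writes, this shape is indifferent to whether the guest JIT or the kernel side performed the store. The trade-off is that a bit set and cleared between two observations goes unnoticed.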
 ## Fix shape (when bit-28 setter is identified)
 If the bit-28 setter is inside the vtable[+0] dispatch chain at 0x822F1B4C
 (target sub_82173990), then the fix might be a state-init issue in the
 kernel/runtime.
 If the bit-28 setter is inside the inner wait or one of the kernel calls
 (`bl 0x824AA8B0`, `bl 0x824AA330`), the fix might be a missing event signal
 or a wrong handle-state evolution.
 If we can't identify the setter cleanly, the synthesis fallback is to
 **inject a kernel-side hook that clears bit 28 of [r30+0] on every entry to
 sub_822F1AA8's bit-check site (0x822F1BB0)**. Crude but should keep the
 main thread in the loop.
 ## Why this is a clearer wedge picture than rounds 22-33
 Rounds 22-33 chased the audit-049 wedge from various angles. The diagnoses
 landed on different layers:
 - R22: "wrong cluster targeted" (cluster A vs B)
 - R26-30: "state-machine progression bug"
 - R32-33: "pool 3 starvation; bootstrap walk-back"
 This round establishes the simplest possible framing:
 > **Canary's main thread loops forever in a dispatcher; ours' main thread
 > exits the loop after one setup phase. The exit is gated by a single bit
 > on the controller's flag word.**
 If bit 28 of `[controller+0]` could be permanently cleared, ours' main
 thread would stay in the loop, sub_821741C8 would dispatch, signals would
 flow, tid=13 would complete, draws would happen.
--- a/audit-runs/audit-059-handle-disambiguation/round-A8-ours-822F1AA8-trace/ours.log
+++ b/audit-runs/audit-059-handle-disambiguation/round-A8-ours-822F1AA8-trace/ours.log
@@ -0,0 +1,79 @@
 AUDIT-PC-PROBE pc=0x822f1aa8 tid=1 hw=0 cycle=6180796 lr=0x8216ee14 r3=0x40d09a40 r11=0x40111910 [r3+0]=0x00000021 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x40541a40 [r3+0x30]=0x00000000
 AUDIT-PC-PROBE pc=0x822f1b38 tid=1 hw=0 cycle=6181181 lr=0x822f1b38 r3=0x00000001 r11=0x824b0000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
 === Final State ===
 PC:  0x824ac578
 LR:  0x824ac578
 CTR: 0x82153bf0
 CR:  0x24000028
 XER: CA=0 OV=0 SO=0
 r0 : 0x0000000082153bf0
 r1 : 0x00000000700ff6e0
 r2 : 0x0000000020000000
 r4 : 0x0000000000000001
 r7 : 0x0000000003a72328
 r8 : 0x0000000043b77284
 r9 : 0x0000000043b77328
 r10: 0x0000000000000001
 r11: 0x0000000000000103
 r12: 0x0000000082173c64
 r13: 0x000000007fff0000
 r18: 0x0000000040d09a7c
 r23: 0x00000000828f3844
 r26: 0x000000004024a4e0
 r27: 0x00000000820a17a8
 r31: 0x0000000000001070
 === Thread diagnostics ===
  hw=0 idx=0 tid=1 state=Blocked(WaitAny { handles: [4208], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x700ff6e0
     r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a72328
     r8=0x43b77284 r9=0x43b77328 r10=0x00000001 r11=0x00000103 r12=0x82173c64 r13=0x7fff0000
  hw=0 idx=1 tid=11 state=Blocked(WaitAny { handles: [2190094916, 2190094880], deadline: None }) pc=0x824d2a94 lr=0x824d2a94 sp=0x71497d90
     r0=0x00000000 r3=0x00000000 r4=0x71497de0 r5=0x00000001 r6=0x00000003 r7=0x00000001
     r8=0x00000000 r9=0x00000000 r10=0x71497df0 r11=0x828a3244 r12=0xbcbcbcbc r13=0x4b9f1000
  hw=1 idx=0 tid=2 state=Blocked(WaitAny { handles: [2189887804], deadline: None }) pc=0x824a95f8 lr=0x824a95f8 sp=0x710ffd20
     r0=0x0000030c r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x00000000
     r8=0x00000001 r9=0x6f000000 r10=0x824a9178 r11=0x82870000 r12=0x824a94f0 r13=0x4acc3000
  hw=1 idx=1 tid=13 state=Blocked(WaitAny { handles: [4216], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x715a7a20
     r0=0x821511d0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
     r8=0x43b77334 r9=0x43b77334 r10=0x40541f80 r11=0x00000001 r12=0x821cb1e0 r13=0x4d1d4000
  hw=2 idx=0 tid=7 state=Blocked(WaitAny { handles: [1111821148], deadline: Some(42946672) }) pc=0x824cd4f4 lr=0x824cd4f4 sp=0x71187e60
     r0=0x00000000 r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x71187eb0
     r8=0x00000000 r9=0x00000000 r10=0x00000002 r11=0x00000002 r12=0xbcbcbcbc r13=0x4b1d6000
  hw=2 idx=1 tid=8 state=Blocked(WaitAny { handles: [4176, 4132], deadline: None }) pc=0x824ab214 lr=0x824ab214 sp=0x71287c90
     r0=0x00000000 r3=0x00000000 r4=0x71287cf0 r5=0x00000001 r6=0x00000001 r7=0x00000000
     r8=0x00000000 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x822f1ff0 r13=0x4b90a000
  hw=3 idx=0 tid=4 state=Blocked(WaitAny { handles: [4120], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7112fb80
     r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
     r8=0x43b7732c r9=0x828f0000 r10=0x00000008 r11=0x00000000 r12=0x8245a660 r13=0x4adc6000
  hw=3 idx=1 tid=5 state=Blocked(WaitAny { handles: [4224], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7116fbe0
     r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
     r8=0x43b7732c r9=0x828f0000 r10=0x00000001 r11=0x00000000 r12=0x82458b34 r13=0x4adc8000
  hw=4 idx=0 tid=9 state=Ready pc=0x824d1404 lr=0x824d22b4 sp=0x71387df0
     r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
     r8=0x4b9ec000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ec000
  hw=5 idx=0 tid=3 state=Blocked(WaitAny { handles: [4112], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7111fdf0
     r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x00000a10
     r8=0x00000010 r9=0x00000000 r10=0x00009030 r11=0x00000000 r12=0x82181988 r13=0x4adc4000
  hw=5 idx=1 tid=6 state=Ready pc=0x824ab214 lr=0x824ab214 sp=0x7117fc60
     r0=0x821511a0 r3=0x00000001 r4=0x7117fcc0 r5=0x00000001 r6=0x00000001 r7=0x00000000
     r8=0x7117fcb0 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x82458d68 r13=0x4adca000
  hw=5 idx=2 tid=10 state=Ready pc=0x824d1404 lr=0x824d22b4 sp=0x71487e00
     r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
     r8=0x4b9ee000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ee000
  hw=5 idx=3 tid=12 state=Ready pc=0x824aa6a4 lr=0x824aa6a4 sp=0x714a7da0
     r0=0x00000000 r3=0x000000ff r4=0x00000020 r5=0x714a7df4 r6=0x00000000 r7=0x00000000
     r8=0x00000000 r9=0x00000000 r10=0x00000000 r11=0x00000001 r12=0x8217898c r13=0x4d1d2000
  -- Handle waiter lists --
    handle=0x00001018 Semaphore(0/2147483647) waiters(tid)=[4]
    handle=0x8287093c Event(sig=false, mr=false) waiters(tid)=[2]
    handle=0x00001070 Thread(id=13, exit=None) waiters(tid)=[1]
    handle=0x42450b5c Event(sig=false, mr=true) waiters(tid)=[7]
    handle=0x00001078 Event(sig=false, mr=false) waiters(tid)=[13]
    handle=0x00001080 Event(sig=false, mr=false) waiters(tid)=[5]
    handle=0x828a3244 Event(sig=false, mr=false) waiters(tid)=[11]
    handle=0x00001024 Semaphore(0/2147483647) waiters(tid)=[8]
    handle=0x828a3220 Event(sig=false, mr=true) waiters(tid)=[11]
    handle=0x00001010 Event(sig=false, mr=true) waiters(tid)=[3]
    handle=0x00001050 Event(sig=false, mr=true) waiters(tid)=[8]
--- a/audit-runs/audit-059-handle-disambiguation/round-C1-setter-validation/FINDINGS.md
+++ b/audit-runs/audit-059-handle-disambiguation/round-C1-setter-validation/FINDINGS.md
@@ -0,0 +1,127 @@
 # Phase C.1 — Validation refutes Phase A's bit-28 setter hypothesis
 ## TL;DR
 Phase A claimed: "bit 28 of `[0x40d09a40]` (controller word) gets set in ours, causing sub_822F1AA8's dispatcher loop to exit early; candidate setter is `sub_821B55D8` at PC `0x821B5DA4`."
 **Phase C.1 falsifies this in 4 sub-rounds:**
 1. **`sub_821B55D8` is dead code** in both engines — its `XamInputSetState` wrapper `sub_824AA858` fires 0× in both.
 2. **`[0x40d09a40]` is never set to anything with bit 28** — `--dump-addr` at end of run shows `+0x00 = 0x00000021`, the entry value. Bit 28 is NEVER set.
 3. **The actual wedge is at the `bcctrl` at PC `0x822F1B4C`** (inside sub_822F1AA8 setup, BEFORE the dispatcher loop). tid=1 never reaches the loop top-check.
 4. **The bcctrl calls `sub_82173990`** (vtable[0] of the dispatcher singleton at `[0x828E1F08]`), which eventually waits for tid=13 to terminate. tid=13 wedges in the audit-049 silph::UImpl@GamePart_Title chain on handle `0x1078`.
 The C.2 force-clear POC (the planned next step) would have **zero effect** because bit 28 is never set. Skipped per plan stopping criterion.
 ## Probe-fire counts (ours, 50M-instr parallel)
 | PC | sub-round | fires | meaning |
 |---|---|---|---|
 | `0x821B55D8` (Phase A candidate fn entry) | 1 | **0** | function never reached → β/γ |
 | `0x821B5D98,DA0,DAC,D48` (loop BB heads)  | 1 | **0** | function never reached |
 | `0x822F1AA8` (sub_822F1AA8 entry)         | 2,3,4 | 2-3 | reached |
 | `0x822F1B38` (post-`bl 0x824AA8B0`)       | 4 | 2 | reached |
 | `0x822F1B50` (post-`bcctrl`)              | 4 | **0** | **bcctrl never returns** |
 | `0x822F1B60,B78,B80,BBC` (loop setup/top) | 3 | 0 | unreachable past bcctrl |
 | `0x822F1E10` (loop exit cleanup)          | 2 | 0 | loop never entered, never exited |
 | `0x822F1E34` (post-thread-join)           | 2 | 0 | never reached |
 | `0x82173990` (vtable[0] target)           | 4 | 2 | called via bcctrl, r3=singleton (LR=0x822F1B50) |
 | `0x821748F0` (tid=13 entry)               | 4 | 2 | tid=13 runs |
 | `0x821C4EB0` (silph::UImpl@GamePart_Title) | 4 | 2 | audit-009/049 reached on tid=13 |
 | `0x82457388,0x824574C0,0x82457408,0x82457490` (other oris candidates) | 2 | 0 | unreachable |
 ## Canary probe results
 | PC | fires | meaning |
 |---|---|---|
 | `0x824AA858` (XamInputSetState wrapper) | **0** | sub_821B55D8 chain is dead code in CANARY too |
 | `0x822F1B50` (post-bcctrl, attempted) | **0** | canary's JitProlog only fires at function entries, so not directly testable; but per audit round-33 sub_821741C8 fires 471× in canary → bcctrl DOES return in canary |
 ## Critical evidence: `--dump-addr=0x40d09a40` at end of run
 ```
 addr=0x40d09a40
  +0x00: 00 00 00 21 00 00 00 01 42 44 df 00 40 54 1a 40
                  ^^^^^^^^^^^                 ^^^^^^^^^^^
  +0x10: 40 54 1b 40 40 54 1b 80 40 54 1b c0 00 00 10 54
  +0x20: 00 00 00 00 40 24 a8 20 00 00 00 08 00 00 00 00
 ```
 - `[+0x00] = 0x00000021` ← bit 28 (mask 0x10000000) is NOT SET. Same value as at sub_822F1AA8 entry.
 - `[+0x1c] = 0x00001054` ← spawned init thread handle (= tid=8's thread handle, NOT 0x1070)
 - Thread state: tid=1 waits on handle `0x1070`, tid=13 waits on handle `0x1078`.
 Handle `0x1070` is **tid=13's thread handle** (per stderr: `ExCreateThread: tid=13 handle=0x1070 entry=0x821748f0 ctx=0x4024a840 suspended=true`). So tid=1's wait at the wedge point is a **thread-join on tid=13**, NOT a thread-join on the dispatcher init thread (tid=8, handle 0x1054).
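 The two fields read off the dump can be re-derived mechanically. The sketch below hard-codes the three dump rows and decodes them as big-endian words; `DUMP` and `be_u32` are names chosen here for illustration.
 ```rust
 /// The three dump rows above, flattened: 0x30 bytes starting at 0x40d09a40.
 const DUMP: [u8; 48] = [
     0x00, 0x00, 0x00, 0x21, 0x00, 0x00, 0x00, 0x01, 0x42, 0x44, 0xdf, 0x00, 0x40, 0x54, 0x1a, 0x40,
     0x40, 0x54, 0x1b, 0x40, 0x40, 0x54, 0x1b, 0x80, 0x40, 0x54, 0x1b, 0xc0, 0x00, 0x00, 0x10, 0x54,
     0x00, 0x00, 0x00, 0x00, 0x40, 0x24, 0xa8, 0x20, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00,
 ];
 /// Reads a big-endian u32 at `offset`, the byte order a guest `lwz` would see.
 fn be_u32(bytes: &[u8], offset: usize) -> u32 {
     u32::from_be_bytes([
         bytes[offset],
         bytes[offset + 1],
         bytes[offset + 2],
         bytes[offset + 3],
     ])
 }
 ```
 Decoding offset 0x00 gives 0x00000021 (bit 28 clear) and offset 0x1c gives 0x00001054, in agreement with the bullets above.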
 ## Wedge path (corrected)
 ```
 entry_point (sub_824AB748)  [tid=1 main]
  └─ sub_8216EA68
        └─ sub_822F1AA8(controller=0x40d09a40)       [LR=0x8216EE14]
              ├─ ExCreateThread(entry=sub_822F1EE0, ctx=controller)  [PC 0x822F1B08]
              │     ⇒ tid=8 spawn, handle=0x1054  (suspended)
              ├─ bl 0x824AA8B0 (no-op probe)                          [PC 0x822F1B34]
              └─ bcctrl on vtable[+0] of [0x828E1F08] singleton       [PC 0x822F1B4C]
                    │
                    └─ sub_82173990(r3=singleton)   [r3=0x40ba9a80, vtable=0x40111910]
                          └─ ... (768-byte function with ≥18 calls; calls sub_82448AA0, sub_824AA7A0,
                              sub_82448BC8, sub_82448C50, sub_8216F218, sub_8217C850, sub_82178E50,
                              sub_821835E0, ...)
                              └─ ... → KeWaitForSingleObject INFINITE on handle 0x1070
                                       (= tid=13's thread handle, thread-join)
                                       ⇒ WEDGE — tid=13 never exits
 (Concurrently — spawned somewhere else, not from sub_822F1AA8:)
 [tid=13, spawn-handle=0x1070, ctx=0x4024a840]
  └─ sub_821748F0 (worker boilerplate, entry from ExCreateThread)
        ├─ sub_82172798, sub_82172818
        └─ sub_821749C0
              └─ sub_821CF3F0
                    └─ ... → sub_821C4EB0 (UImpl@GamePart_Title@silph)   [audit-009/049!]
                          └─ ... → sub_821CB030 (creates KEVENT at +0x128)
                                ⇒ KeWaitForSingleObject INFINITE on handle 0x1078
                                ⇒ WEDGE — handle 0x1078 is never signaled in ours
 ```
 ## Why Phase A's hypothesis is wrong
 Phase A:
 1. Disassembled sub_822F1AA8's body, observed the bit-28 loop-exit check at `0x822F1BB8` and end-of-iter check at `0x822F1E0C`.
 2. Mem-watch on `0x40d09a40` showed zero stores → inferred "the setter writes via some path mem-watch doesn't capture."
 3. DB-scanned `oris ?, ?, 0x1000` (49 sites), found `sub_821B55D8 + 0x821B5DA4` with pattern `bl sub_824AA858 ; if r3 == 0xAA: oris r11, 0x1000 ; stw`.
 4. Concluded `sub_821B55D8` was the setter.
 What Phase A missed:
 - Mem-watch's 0-stores result was correct: **NO setter exists**. Bit 28 is never set in either engine. The mem-watch null-result was a hint that the bit-28 hypothesis itself was wrong, but Phase A interpreted it as "mem-watch misses something."
 - The disasm-based hypothesis was visually compelling (a loop iterating arrays and setting bit 28 when a kernel call returns 0xAA) but was never verified at runtime.
 - `sub_821B55D8` is itself dead code in both engines.
 ## Reading-error class #19: disasm-pattern-match without runtime verification
 When scanning for a hypothesized signal source via DB pattern-match (`oris ?, ?, 0x1000`), the analyst must run a probe to verify the suspected site is *both reached* and *takes the suspected path* before declaring it the cause. Phase A bypassed both checks. The single `--dump-addr=0x40d09a40` flag in sub-round 2 (one extra argument on the existing probe command) revealed the central assumption was wrong.
 ## Real divergence (handed to next session)
 This is the **same wedge as audit-049/058/059**: tid=13 wedges in the silph::UImpl@GamePart_Title cluster on handle `0x1078`. tid=1 wedges on tid=13's thread-handle (`0x1070`) inside `sub_82173990`'s call chain.
 `sub_82173990` is vtable[0] of the dispatcher singleton at `[0x828E1F08]`. It's a 768-byte function with ≥18 calls; the actual wait site is somewhere down its tree. To localize where in `sub_82173990` the wait happens, probe its BB heads + the `KeWaitForSingleObject` thunks (`sub_824AA330`, `sub_824AA708`).
 The fix-shape is **NOT** "force-clear bit 28." The fix-shape is **"signal handle 0x1078 in the audit-049 cluster, or short-circuit tid=13's wait."** Round 22 (silph_synth.rs) attempted the cluster-A version of this. Cluster B (silph::UImpl) needs its own synthesis or a kernel-side signal of handle 0x1078.
 ## Phase C verdict
 - C.1: 4 sub-rounds executed (within budget).
 - C.2: **NOT EXECUTED** — POC would be no-op since bit 28 is never set. Per plan stopping criterion, do not proceed to C.2 blind when C.1 refutes the diagnosis.
 - C.3: not applicable.
 - Branch state: no source changes. Audit artifacts only.
 ## Files in this directory
 - `ours-c1-probe.log/stderr` — sub-round 1, probe at sub_821B55D8 BB heads (0 fires)
 - `ours-sr2-confirm-bit28.log/stderr` — sub-round 2, probe loop top/exit + dump-addr (bit 28 NEVER SET)
 - `ours-sr3-wait-trace.log/stderr` — sub-round 3, probe wait site + handle 0x1070 trace
 - `ours-sr4-bcctrl-trace.log/stderr` — sub-round 4, probe pre/post bcctrl + sub_82173990 entry + tid=13 entry (decisive)
 - canary side in `../round-C1-setter-validation-canary/`:
  - `canary-824AA858.log` — XamInputSetState wrapper fires 0× in canary too
  - `canary-822F1B50.log` — JitProlog can't probe at BB-internal PCs (function-entry-only)
--- a/audit-runs/audit-059-handle-disambiguation/round-D2-autosignal-poc/FINDINGS.md
+++ b/audit-runs/audit-059-handle-disambiguation/round-D2-autosignal-poc/FINDINGS.md
@@ -0,0 +1,144 @@
 # Phase D — Audit-049 Auto-Signal POC — FINDINGS
 **Branch**: `iterate-2C/silph-ui-spawn-trace` (extends Phase C `481591f`)
 **Date**: 2026-06-11
 **Sub-rounds**: D2.SR1 → D2.SR4 (4/4 used)
 **Verdict**: **B — partial unwedge**
 ## Mission
 Phase C diagnosed the audit-049 wedge as tid=13 (silph::UImpl@GamePart_Title) waiting INFINITE on a KEVENT created at `sub_821CB030+0x128` (`lr=0x821cb15c`, post-bl PC). The Phase D POC tests this diagnosis by hooking `NtCreateEvent` from that exact call site and auto-signaling the resulting handle after a configurable delay (`XENIA_SILPH_UI_AUTOSIGNAL_DELAY` instructions).
 If tid=13 unblocks, the diagnosis is confirmed. If new wedges or new threads appear downstream, even better — that's actual game progression past the wedge.
 ## Result summary
 | Symptom | SR2/SR3 baseline | SR4 (POC firing) |
 |---|---|---|
 | `silph autosignal: scheduled handle=0x1078 caller_lr=0x821cb15c` | yes (SR2/SR3) | yes |
 | `silph autosignal: firing handle=0x1078` | NO | **yes (cycle 16326209)** |
 | handle 0x1078 final | `signaled=false waiters=1 <NO_SIGNALS_DESPITE_WAITS>` | `signal_attempts=1 waiters=0` |
 | tid=13 final state | `Blocked(WaitAny[0x1078])` | **`Ready` pc=0x824a9108** |
 | tid=1 final state | `Blocked(WaitAny[0x1070])` thread-join | `Blocked(WaitAny[0x1070])` (tid=13 not yet exited) |
 | ExCreateThread total | 10 | **12 (+tid=14, +tid=15)** |
 | New downstream wedges | none past 0x1078 | **0x1084 (Event/Auto), 0x1088 (Event/Manual)** |
 | `cxx_throw` runtime_error decoded | none | **yes, stack depth 6, top L0=0x82612b50 → L4=sub_82450B60+0x1A8 → L6=sub_82450a50** |
 | VdSwap | 1 | 1 |
 | gpu.interrupt.delivered{source=0} | 6393 | 4539 (different trajectory, no draws) |
 **Conclusion**: tid=13 unwedged cleanly from the audit-049 wait, spawned two follow-on threads (tid=14 entry=`silph` ctx=`0x40929c00`, tid=15 a worker), and progressed deep enough into the silph::UImpl state machine to throw a `runtime_error` from sub_82450a50 → sub_82450B60+0x1A8 (the dispatcher cluster from round 26). The auto-signal **is not** the proper signaler: it lets tid=13 proceed, but the downstream state-machine invariants that the real (missing) signaler would have established are not in place, so the dispatcher trips on a "not-registered instance" lookup.
 This is a **clean confirmation** of the Phase C diagnosis: the wedge handle, the wait site, and the LR filter are all correct. The fix shape is:
 - Either: synthesize the missing signaler properly (cluster-B silph_ui_synth.rs analogue from R33's deferred plan)
 - Or: track what the auto-signal needed to write into the work-item state (`[+8]` field per R26) BEFORE signaling, so the dispatcher's BST lookup succeeds
 ## Sub-round detail
 ### D2.SR1 — initial run, hook never fires (wrong LR filter)
 Filter checked `creator_lr ∈ [0x821CB15C, 0x821CB160]` against `ctx.lr` at `nt_create_event` entry. But `ctx.lr` is the **thunk wrapper return slot** (`0x824a9f6c`), not the guest caller's post-bl PC. Confirmed via handle-audit `created stack` dump: frame 0 lr=`0x824a9f6c`, frame 1 lr=`0x821cb15c`. The guest caller's LR lives one frame up the PPC EABI back-chain.
 Diagnosis classification: **D (filter mismatch)**. Reading-error class #20 (new).
 ### D2.SR2 — frame-1-LR fix; hook schedules, never fires
 Refactored `maybe_register_silph_autosignal` to take `(ctx, mem)`, walk the back-chain one step via the existing `walk_guest_back_chain`, and match the saved LR. The registration hook now triggers and schedules the signal:
 ```
 silph autosignal: scheduled handle=0x1078 caller_lr=0x821cb15c for cycle 10000 (now=0, delay=10000)
 ```
 But no "firing" log appears, and tid=13 stays Blocked. Classification: **D (drain site never reached)**.
 ### D2.SR3 — diagnostic added; confirms drain site never visited
 Added a one-shot info-level "tick (first visit, none due)" log inside `fire_due_silph_autosignals` when pending is non-empty but nothing due. Re-ran. **The tick-diagnostic never fired either** — proving the function isn't being called at all in `--parallel` mode.
 Root cause: `--parallel` dispatches to `run_execution_parallel` (line 2928 of main.rs), which has its own outer loop at line 3186. My Phase D wiring only touched the lockstep path at line 2763. Classification: **D (wrong code path wired)**.
 ### D2.SR4 — parallel-path wiring added; hook fires; tid=13 unblocks
 Added the same `set_now_cycle_hint` + `fire_due_silph_autosignals` calls inside the parallel outer loop, right after `coord_pre_round` (and under the same `kernel_arc` guard, so no extra locking). Re-built, re-ran.
 Now all three log lines appear:
 ```
 silph autosignal: scheduled handle=0x1078 caller_lr=0x821cb15c for cycle 16326202 (now=16316202, delay=10000)
 silph autosignal: tick (first visit, none due) now=16316213 pending=1 first_deadline=16326202
 silph autosignal: firing handle=0x1078 prev_signaled=Some(false) at cycle 16326209
 ```
 `now=16316202` at schedule time confirms `set_now_cycle_hint` is wired through correctly (in SR2/SR3 it was only called from the lockstep loop, which `--parallel` never runs, hence `now=0`). Fire at cycle 16326209 = deadline 16326202 + 7 cycles of scheduler granularity. Diagnostic classification: **B (partial unwedge; new waits and cxx_throw downstream)**.
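The schedule / tick / fire sequence in the SR4 log follows a plain deadline rule. Below is a minimal Python model of it, assuming the pending list is drained once per outer-loop round and an entry fires on the first visit where `now >= deadline`. The class and method names are illustrative stand-ins, not the Rust implementation in `state.rs`.

```python
from dataclasses import dataclass, field


@dataclass
class AutoSignalQueue:
    """Toy model of a deferred-signal list drained once per scheduler round."""
    now: int = 0
    pending: list = field(default_factory=list)  # (deadline, handle) pairs

    def set_now(self, cycle):
        # Stands in for set_now_cycle_hint: must run before register/fire_due.
        self.now = cycle

    def register(self, handle, delay):
        # Deadline is computed from whatever `now` was last set to.
        deadline = self.now + delay
        self.pending.append((deadline, handle))
        return deadline

    def fire_due(self):
        """Return handles whose deadline has passed; keep the rest pending."""
        due = [h for d, h in self.pending if d <= self.now]
        self.pending = [(d, h) for d, h in self.pending if d > self.now]
        return due


q = AutoSignalQueue()
q.set_now(16316202)
deadline = q.register(0x1078, 10000)  # cycle values taken from the SR4 log
q.set_now(16316213)
early = q.fire_due()                  # first visit, nothing due yet
q.set_now(16326209)
fired = q.fire_due()                  # first round at or past the deadline
print(deadline, early, [hex(h) for h in fired])
```

If `set_now` is never called, `register` computes the deadline from `now=0`, which reproduces the `for cycle 10000 (now=0, delay=10000)` line seen in SR2.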
 ## Code shape
 POC adds 128 lines across four files (sum of the table below), all env-gated. Default off.
 | File | Change | Lines |
 |---|---|---|
 | `crates/xenia-cpu/src/scheduler.rs` | `GuestThread.start_entry/start_context` fields; `spawn()` populates; `current_thread_entry_and_ctx()` helper | +18 |
 | `crates/xenia-kernel/src/state.rs` | `AutoSignalPending` struct; `silph_autosignal_*` fields; `set_now_cycle_hint`, `maybe_register_silph_autosignal`, `fire_due_silph_autosignals` methods | +95 |
 | `crates/xenia-kernel/src/exports.rs` | Hook in `nt_create_event` | +3 |
 | `crates/xenia-app/src/main.rs` | Fire-site wiring in lockstep loop (line 2788) **and** parallel loop (line 3215) | +12 |
 Tests stay green at **655/655**.
 ## Reading-error class #20 (new)
 **`ctx.lr` at kernel export entry ≠ guest caller's post-bl PC.** When a guest `bl` calls an export thunk, the thunk-wrapper has its own frame between the guest caller and the export body. At export-body entry, `ctx.lr` holds the *wrapper's* return slot, not the guest caller's post-bl PC.
 To match a specific guest call site by LR, the export must walk one step up the back-chain (`walk_guest_back_chain(ctx.gpr[1], ctx.lr, mem, 2)`) and use `frames[1].lr`.
 SR1 burned one full sub-round on this. Detect early in future POCs by comparing `ctx.lr` against the handle-audit's `created stack` frame dump for a known-good event (e.g. one created from a labelled site).
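The difference between the SR1 and SR2 filters can be shown on the frames from the handle-audit `created stack` dump. This is a Python sketch over an already-walked frame list, not the real `walk_guest_back_chain`. The helper names `in_filter` and `guest_caller_lr` are hypothetical, and treating the filter range as inclusive is an assumption.

```python
# (fp, lr) pairs copied from the handle-audit "created stack" dump for 0x1078.
# Frame 0 is what ctx.lr holds at nt_create_event entry (thunk return slot).
CREATED_STACK = [
    (0x715A7A10, 0x824A9F6C),
    (0x715A7AA0, 0x821CB15C),
    (0x715A7BD0, 0x821CBAE0),
]
FILTER_RANGE = (0x821CB15C, 0x821CB160)  # assumed inclusive on both ends


def in_filter(lr, lo_hi=FILTER_RANGE):
    """True when lr falls inside the creator-LR filter range."""
    lo, hi = lo_hi
    return lo <= lr <= hi


def guest_caller_lr(frames):
    """Return the saved LR one frame above the thunk wrapper, or None."""
    return frames[1][1] if len(frames) > 1 else None


ctx_lr = CREATED_STACK[0][1]
sr1_match = in_filter(ctx_lr)                          # SR1: filter on ctx.lr
sr2_match = in_filter(guest_caller_lr(CREATED_STACK))  # SR2: filter on frame 1
print(sr1_match, sr2_match)
```

Filtering on frame 0 misses because `0x824a9f6c` lies outside the range, while frame 1 (`0x821cb15c`) is the range's lower bound and matches.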
 ## Reading-error class #21 (new)
 **`--parallel` and lockstep have separate outer loops in main.rs.** They share `coord_pre_round` (carved out exactly for this reason), but anything wired adjacent to that call site only takes effect on the path it's wired on. Lockstep is `run_execution` (line 2706, outer loop at 2763). Parallel is `run_execution_parallel` (line 2928, outer loop at 3186).
 Per-round hooks must be wired in **both** paths, regardless of which run mode they were developed against. SR2/SR3 burned two sub-rounds on this.
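One way to make this class of error harder to repeat is to route per-round hooks through a single function that both outer loops call, in the same spirit as `coord_pre_round`. The Python sketch below only illustrates that shape. The loop and function names are hypothetical, and the real loops in `main.rs` do considerably more per round.

```python
def per_round_hooks(calls, now):
    """Hooks that must run every round regardless of execution mode."""
    calls.append(("set_now_cycle_hint", now))
    calls.append(("fire_due_silph_autosignals", now))


def run_lockstep(rounds, calls):
    for now in rounds:
        per_round_hooks(calls, now)
        # ... step guest threads in lockstep here ...


def run_parallel(rounds, calls):
    for now in rounds:
        per_round_hooks(calls, now)
        # ... dispatch guest threads to worker threads here ...


lockstep_calls, parallel_calls = [], []
run_lockstep([100, 200], lockstep_calls)
run_parallel([100, 200], parallel_calls)
print(lockstep_calls == parallel_calls)
```

With the hooks in one place, a hook added for one mode is automatically visited by the other, so a missing call site cannot silently disable it.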
 ## Files modified + LR mapping (for follow-up sessions)
 **Wedge handle creation** (confirmed by handle-audit dump):
 ```
 created cycle=0 tid=13 lr=0x824a9f6c [src=NtCreateEvent thunk return]
 created stack (6 frames):
   [ 0] fp=0x715a7a10 lr=0x824a9f6c   ← ctx.lr at nt_create_event
   [ 1] fp=0x715a7aa0 lr=0x821cb15c   ← guest caller's post-bl PC (filter on this)
   [ 2] fp=0x715a7bd0 lr=0x821cbae0   ← sub_821CBA08 frame
   [ 3] fp=0x715a7cd0 lr=0x821cc454   ← sub_821CC3F8 frame
   [ 4] fp=0x715a7d60 lr=0x821c4f18   ← sub_821C4EB0 frame (silph::UImpl@GamePart_Title)
   [ 5] fp=0x715a7e00 lr=0x82174a80   ← sub_821748F0 trampoline frame
 ```
 **Downstream cxx_throw stack** (after auto-signal fires, tid=5 throws runtime_error):
 ```
 L0 lr=0x82612b50  std::exception throw path
 L1 lr=0x825f2444
 L2 lr=0x824547e8
 L3 lr=0x82451418
 L4 lr=0x82450d08  ← sub_82450B60+0x1A8 (dispatcher, audit-059 R26)
 L5 lr=0x82450b34
 L6 lr=0x82450a50  ← sub_82450a50 (worker dispatch)
 cxx_throw runtime_error decoded magic=0x19930520
 cxx_throw BST ceil search candidate_key=0x828e2b2c match_found=false
 cxx_throw lhs (not-registered instance) lhs=0x715a7af0
 ```
 This confirms the dispatcher reached audit-049 territory (R26's `sub_82450B60+0x1A8` PC `0x82450D08`), looked up a runtime instance in its BST keyed by VA, and the instance was never registered. **The auto-signal bypassed an upstream registration step** the real signaler would have driven.
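The two `cxx_throw` log lines are consistent with a ceiling lookup that finds a neighbouring key rather than the requested one. The Python sketch below shows that behaviour under two assumptions: that `lhs` is the key being looked up, and that `candidate_key` is the smallest registered key at or above it. The registry contents are hypothetical apart from the one key that appears in the log.

```python
from bisect import bisect_left


def ceil_lookup(sorted_keys, key):
    """Return (candidate, match_found) for the smallest registered key >= key."""
    i = bisect_left(sorted_keys, key)
    if i == len(sorted_keys):
        return None, False
    candidate = sorted_keys[i]
    return candidate, candidate == key


# Hypothetical registry: only 0x828E2B2C is known from the log.
registry = [0x828E2B2C]
candidate, found = ceil_lookup(registry, 0x715A7AF0)
print(hex(candidate), found)
```

Had the instance at `0x715a7af0` been registered beforehand, the same lookup would return it with `match_found` true, which is the precondition the real signaler presumably establishes.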
 ## Recommendation
 Ship the POC env-gated (default off; no behavior change unless opted in). The verdict-B result makes it a useful diagnostic flag for further audit-049 work: setting `XENIA_SILPH_UI_AUTOSIGNAL_DELAY=10000` skips the wedge so downstream behavior can be probed without first writing the proper signaler.
 Long-term fix path remains the R33 silph_ui_synth.rs analogue: synthesize the missing signaler + its precondition state (BST instance registration at `0x715a7af0`-equivalent, work-item state `[+8]` per R26). The auto-signal POC is **not** the final fix — it confirms diagnosis but doesn't honor the dispatcher's BST registry invariant.
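For reference, "default off" gating on an environment variable can be modelled as below. This is a Python sketch only. Treating an unset, non-numeric, or non-positive value as "disabled" is an assumption about reasonable behaviour, not a description of what the Rust code does with `XENIA_SILPH_UI_AUTOSIGNAL_DELAY`.

```python
import os


def autosignal_delay(env=os.environ):
    """Return the delay in cycles, or None when the POC should stay off."""
    raw = env.get("XENIA_SILPH_UI_AUTOSIGNAL_DELAY")
    if raw is None:
        return None          # unset: feature disabled (the default)
    try:
        value = int(raw)
    except ValueError:
        return None          # assumed: unparsable values also disable it
    return value if value > 0 else None


print(autosignal_delay({}))
print(autosignal_delay({"XENIA_SILPH_UI_AUTOSIGNAL_DELAY": "10000"}))
```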
 ## Artifacts
 - `poc-sr1.log`, `poc-sr1.stderr` — initial run, filter mismatch (D)
 - `poc-sr2.log`, `poc-sr2.stderr` — frame-1-LR fix, no fire (D)
 - `poc-sr3.log`, `poc-sr3.stderr` — diagnostic added, no fire (D, parallel path unwired)
 - `poc-sr4.log`, `poc-sr4.stderr` — parallel-path wired, **fires + partial unwedge (B)**
 All `.log`/`.stderr` files are `.gitignore`d; this `FINDINGS.md` is the only artifact-side commit.
--- a/audit-runs/audit-059-handle-disambiguation/round27-state-advance/disasm-sub82450B60.txt
+++ b/audit-runs/audit-059-handle-disambiguation/round27-state-advance/disasm-sub82450B60.txt
@@ -0,0 +1,200 @@
  0x82450b60:  lwz     r18, 9792(r31)
  0x82450b64:  lwz     r16, 13880(r14)
  0x82450b68:  mflr    r12
  0x82450b6c:  bl      0x825F0F74
  0x82450b70:  subi    r31, r1, 176
  0x82450b74:  stwu    r1, -176(r1)
  0x82450b78:  mr      r29, r4
  0x82450b7c:  mr      r27, r3
  0x82450b80:  cmpwi   cr6, r29, 5
  0x82450b84:  bne     cr6, 0x82450B94
  0x82450b88:  addi    r28, r27, 196
  0x82450b8c:  addi    r26, r27, 28
  0x82450b90:  b       0x82450BAC
  0x82450b94:  slwi    r11, r29, 2
  0x82450b98:  mr      r26, r27
  0x82450b9c:  add     r11, r29, r11
  0x82450ba0:  slwi    r11, r11, 2
  0x82450ba4:  add     r11, r11, r27
  0x82450ba8:  addi    r28, r11, 96
  0x82450bac:  addi    r23, r27, 56
  0x82450bb0:  mr      r3, r23
  0x82450bb4:  stw     r23, 84(r31)
  0x82450bb8:  bl      0x8284DCFC
  0x82450bbc:  mr      r3, r26
  0x82450bc0:  bl      0x8284DCFC
  0x82450bc4:  lwz     r7, 16(r28)
  0x82450bc8:  cntlzw  r11, r7
  0x82450bcc:  extrwi  r11, r11, 1, 26
  0x82450bd0:  cmplwi  cr6, r11, 0x0
  0x82450bd4:  beq     cr6, 0x82450BEC
  0x82450bd8:  mr      r3, r26
  0x82450bdc:  bl      0x8284DD0C
  0x82450be0:  mr      r3, r23
  0x82450be4:  bl      0x8284DD0C
  0x82450be8:  b       0x82450EE8
  0x82450bec:  lwz     r11, 12(r28)
  0x82450bf0:  lwz     r9, 8(r28)
  0x82450bf4:  srwi    r10, r11, 2
  0x82450bf8:  clrlwi  r8, r11, 30
  0x82450bfc:  cmplw   cr6, r9, r10
  0x82450c00:  bgt     cr6, 0x82450C08
  0x82450c04:  sub     r10, r10, r9
  0x82450c08:  lwz     r9, 4(r28)
  0x82450c0c:  slwi    r10, r10, 2
  0x82450c10:  slwi    r8, r8, 2
  0x82450c14:  lwz     r6, 8(r28)
  0x82450c18:  addi    r11, r11, 1
  0x82450c1c:  slwi    r6, r6, 2
  0x82450c20:  li      r24, 0
  0x82450c24:  lwzx    r10, r10, r9
  0x82450c28:  cmplw   cr6, r6, r11
  0x82450c2c:  lwzx    r30, r10, r8
  0x82450c30:  stw     r11, 12(r28)
  0x82450c34:  stw     r30, 80(r31)
  0x82450c38:  bgt     cr6, 0x82450C40
  0x82450c3c:  stw     r24, 12(r28)
  0x82450c40:  subic.  r11, r7, 1
  0x82450c44:  stw     r11, 16(r28)
  0x82450c48:  bne     0x82450C50
  0x82450c4c:  stw     r24, 12(r28)
  0x82450c50:  addi    r25, r27, 28
  0x82450c54:  mr      r3, r25
  0x82450c58:  bl      0x8284DCFC
  0x82450c5c:  mr      r3, r25
  0x82450c60:  stw     r30, 216(r27)
  0x82450c64:  bl      0x8284DD0C
  0x82450c68:  mr      r3, r26
  0x82450c6c:  bl      0x8284DD0C
  0x82450c70:  lwz     r11, 28(r30)
  0x82450c74:  clrlwi  r11, r11, 31
  0x82450c78:  cmplwi  cr6, r11, 0x0
  0x82450c7c:  bne     cr6, 0x82450D30
  0x82450c80:  lwz     r11, 8(r30)
  0x82450c84:  cmplwi  cr6, r11, 0x1
  0x82450c88:  blt     cr6, 0x82450CE4
  0x82450c8c:  bne     cr6, 0x82450D3C
  0x82450c90:  lwz     r11, 28(r30)
  0x82450c94:  rlwinm  r11, r11, 0, 29, 29
  0x82450c98:  cmplwi  cr6, r11, 0x0
  0x82450c9c:  beq     cr6, 0x82450CB0
  0x82450ca0:  mr      r4, r30
  0x82450ca4:  mr      r3, r27
  0x82450ca8:  bl      0x824510E0
  0x82450cac:  b       0x82450CBC
  0x82450cb0:  mr      r4, r30
  0x82450cb4:  mr      r3, r27
  0x82450cb8:  bl      0x824517B0
  0x82450cbc:  stw     r29, 220(r27)
  0x82450cc0:  bl      0x824AA830
  0x82450cc4:  mr      r11, r3
  0x82450cc8:  lwz     r3, 92(r27)
  0x82450ccc:  li      r5, 0
  0x82450cd0:  addi    r11, r11, 66
  0x82450cd4:  li      r4, 1
  0x82450cd8:  stw     r11, 224(r27)
  0x82450cdc:  bl      0x824AB158
  0x82450ce0:  b       0x82450D3C
  0x82450ce4:  lwz     r11, 28(r30)
  0x82450ce8:  mr      r4, r30
  0x82450cec:  mr      r3, r27
  0x82450cf0:  rlwinm  r11, r11, 0, 29, 29
  0x82450cf4:  cmplwi  cr6, r11, 0x0
  0x82450cf8:  beq     cr6, 0x82450D04
  0x82450cfc:  bl      0x82450F68
  0x82450d00:  b       0x82450D08
  0x82450d04:  bl      0x82451238
  0x82450d08:  stw     r29, 220(r27)
  0x82450d0c:  bl      0x824AA830
  0x82450d10:  mr      r11, r3
  0x82450d14:  lwz     r3, 92(r27)
  0x82450d18:  li      r5, 0
  0x82450d1c:  addi    r11, r11, 66
  0x82450d20:  li      r4, 1
  0x82450d24:  stw     r11, 224(r27)
  0x82450d28:  bl      0x824AB158
  0x82450d2c:  b       0x82450D3C
  0x82450d30:  lwz     r11, 28(r30)
  0x82450d34:  ori     r11, r11, 0x2
  0x82450d38:  stw     r11, 28(r30)
  0x82450d3c:  lwz     r11, 8(r30)
  0x82450d40:  mr      r29, r24
  0x82450d44:  cmpwi   cr6, r11, 2
  0x82450d48:  blt     cr6, 0x82450E08
  0x82450d4c:  cmpwi   cr6, r11, 3
  0x82450d50:  ble     cr6, 0x82450DA0
  0x82450d54:  cmpwi   cr6, r11, 4
  0x82450d58:  bne     cr6, 0x82450E08
  0x82450d5c:  lwz     r11, 28(r30)
  0x82450d60:  rlwinm  r11, r11, 0, 29, 29
  0x82450d64:  cmplwi  cr6, r11, 0x0
  0x82450d68:  bne     cr6, 0x82450D98
  0x82450d6c:  lwz     r29, 36(r30)
  0x82450d70:  mr      r3, r29
  0x82450d74:  lwz     r11, 0(r29)
  0x82450d78:  lwz     r11, 4(r11)
  0x82450d7c:  mtctr   r11
  0x82450d80:  bctrl
  0x82450d84:  clrlwi  r11, r3, 24
  0x82450d88:  cmplwi  cr6, r11, 0x0
  0x82450d8c:  beq     cr6, 0x82450D98
  0x82450d90:  mr      r3, r29
  0x82450d94:  bl      0x8244FB38
  0x82450d98:  li      r29, 1
  0x82450d9c:  b       0x82450E28
  0x82450da0:  addi    r3, r30, 40
  0x82450da4:  bl      0x82451DB8
  0x82450da8:  lwz     r11, 32(r30)
  0x82450dac:  cmplwi  cr6, r11, 0x0
  0x82450db0:  beq     cr6, 0x82450DCC
  0x82450db4:  rlwinm  r11, r11, 0, 0, 31
  0x82450db8:  lwz     r10, 4(r30)
  0x82450dbc:  lwz     r11, 4(r11)
  0x82450dc0:  cmplw   cr6, r10, r11
  0x82450dc4:  li      r11, 1
  0x82450dc8:  beq     cr6, 0x82450DD0
  0x82450dcc:  mr      r11, r24
  0x82450dd0:  clrlwi  r11, r11, 24
  0x82450dd4:  cmplwi  cr6, r11, 0x0
  0x82450dd8:  beq     cr6, 0x82450E00
  0x82450ddc:  lwz     r4, 8(r30)
  0x82450de0:  lwz     r5, 0(r30)
  0x82450de4:  lwz     r3, 32(r30)
  0x82450de8:  cmpwi   cr6, r4, 1
  0x82450dec:  ble     cr6, 0x82450DFC
  0x82450df0:  bl      0x8245D9D8
  0x82450df4:  li      r29, 1
  0x82450df8:  b       0x82450E28
  0x82450dfc:  stw     r4, 8(r3)
  0x82450e00:  li      r29, 1
  0x82450e04:  b       0x82450E28
  0x82450e08:  mr      r3, r26
  0x82450e0c:  stw     r26, 88(r31)
  0x82450e10:  bl      0x8284DCFC
  0x82450e14:  addi    r4, r31, 80
  0x82450e18:  mr      r3, r28
  0x82450e1c:  bl      0x823232C0
  0x82450e20:  mr      r3, r26
  0x82450e24:  bl      0x8284DD0C
  0x82450e28:  clrlwi  r11, r29, 24
  0x82450e2c:  cmplwi  cr6, r11, 0x0
  0x82450e30:  beq     cr6, 0x82450ECC
  0x82450e34:  lwz     r11, 28(r30)
  0x82450e38:  rlwinm  r11, r11, 0, 30, 30
  0x82450e3c:  cmplwi  cr6, r11, 0x0
  0x82450e40:  beq     cr6, 0x82450E68
  0x82450e44:  mr      r3, r26
  0x82450e48:  stw     r26, 88(r31)
  0x82450e4c:  bl      0x8284DCFC
  0x82450e50:  addi    r4, r31, 80
  0x82450e54:  mr      r3, r28
  0x82450e58:  bl      0x823232C0
  0x82450e5c:  mr      r3, r26
  0x82450e60:  bl      0x8284DD0C
  0x82450e64:  b       0x82450ECC
  0x82450e68:  lwz     r11, 40(r30)
  0x82450e6c:  cmplwi  cr6, r11, 0x0
  0x82450e70:  beq     cr6, 0x82450EA4
  0x82450e74:  rlwinm  r3, r11, 0, 0, 31
  0x82450e78:  bl      0x82458A70
  0x82450e7c:  lwz     r29, 40(r30)
--- a/audit-runs/audit-059-handle-disambiguation/round27-state-advance/disasm-sub82451238.txt
+++ b/audit-runs/audit-059-handle-disambiguation/round27-state-advance/disasm-sub82451238.txt
@@ -0,0 +1,80 @@
  0x82451238:  mflr    r12
  0x8245123c:  li      r0, 0
  0x82451240:  stw     r0, 4(r1)
  0x82451244:  bl      0x825F0F80
  0x82451248:  subi    r31, r1, 160
  0x8245124c:  stwu    r1, -160(r1)
  0x82451250:  mr      r30, r4
  0x82451254:  li      r9, 1
  0x82451258:  lwz     r10, 32(r30)
  0x8245125c:  stw     r30, 188(r31)
  0x82451260:  stw     r9, 8(r30)
  0x82451264:  cmplwi  cr6, r10, 0x0
  0x82451268:  beq     cr6, 0x82451288
  0x8245126c:  lwz     r11, 4(r30)
  0x82451270:  lwz     r8, 4(r10)
  0x82451274:  cmplw   cr6, r11, r8
  0x82451278:  bne     cr6, 0x82451288
  0x8245127c:  mr      r11, r9
  0x82451280:  li      r26, 0
  0x82451284:  b       0x82451290
  0x82451288:  li      r26, 0
  0x8245128c:  mr      r11, r26
  0x82451290:  clrlwi  r11, r11, 24
  0x82451294:  cmplwi  cr6, r11, 0x0
  0x82451298:  beq     cr6, 0x824512A0
  0x8245129c:  stw     r9, 8(r10)
  0x824512a0:  lwz     r3, 36(r30)
  0x824512a4:  lwz     r11, 0(r3)
  0x824512a8:  lwz     r11, 32(r11)
  0x824512ac:  mtctr   r11
  0x824512b0:  bctrl
  0x824512b4:  mr      r27, r3
  0x824512b8:  stw     r26, 84(r31)
  0x824512bc:  stw     r27, 96(r31)
  0x824512c0:  bl      0x82454498
  0x824512c4:  addi    r4, r31, 84
  0x824512c8:  bl      0x82454580
  0x824512cc:  stw     r26, 92(r31)
  0x824512d0:  addi    r11, r27, 2047
  0x824512d4:  lis     r10, 0x2
  0x824512d8:  clrrwi  r11, r11, 11
  0x824512dc:  cmplw   cr6, r11, r10
  0x824512e0:  stw     r11, 100(r31)
  0x824512e4:  ble     cr6, 0x824512F4
  0x824512e8:  lis     r11, 0x8207
  0x824512ec:  addi    r11, r11, 6724
  0x824512f0:  b       0x824512F8
  0x824512f4:  addi    r11, r31, 100
  0x824512f8:  addi    r3, r31, 84
  0x824512fc:  lwz     r4, 0(r11)
  0x82451300:  bl      0x82454B08
  0x82451304:  mr      r8, r8
  0x82451308:  mr      r28, r3
  0x8245130c:  stw     r28, 92(r31)
  0x82451310:  b       0x82451324
  0x82451314:  lwz     r30, 188(r31)
  0x82451318:  lwz     r27, 96(r31)
  0x8245131c:  li      r26, 0
  0x82451320:  lwz     r28, 92(r31)
  0x82451324:  addi    r3, r31, 84
  0x82451328:  bl      0x82454AA0
  0x8245132c:  mr      r29, r3
  0x82451330:  cmplwi  cr6, r28, 0x0
  0x82451334:  beq     cr6, 0x82451684
  0x82451338:  lwz     r3, 36(r30)
  0x8245133c:  li      r8, 0
  0x82451340:  addi    r7, r31, 88
  0x82451344:  mr      r6, r29
  0x82451348:  mr      r5, r29
  0x8245134c:  mr      r4, r28
  0x82451350:  lwz     r11, 0(r3)
  0x82451354:  lwz     r11, 28(r11)
  0x82451358:  mtctr   r11
  0x8245135c:  bctrl
  0x82451360:  clrlwi  r11, r3, 24
  0x82451364:  cmplwi  cr6, r11, 0x0
  0x82451368:  beq     cr6, 0x82451684
  0x8245136c:  lwz     r11, 28(r30)
  0x82451370:  rlwinm  r11, r11, 0, 28, 28
  0x82451374:  cmplwi  cr6, r11, 0x0
--- a/audit-runs/audit-059-handle-disambiguation/round35-lockstep-inflate/diff.out
+++ b/audit-runs/audit-059-handle-disambiguation/round35-lockstep-inflate/diff.out
@@ -0,0 +1,52 @@
 === Fire counts ===
  ours:   3
  canary: 7
 === Per-LR breakdown ===
  ours:
    lr=0x82458674: 3
  canary:
    lr=0x82457bd4: 2
    lr=0x82458674: 5
 === Side-by-side first 5 fires (entry registers) ===
 --- fire #0 ---
  ours:   tid=6   cycle=363        lr=0x82458674 r3=0x40ba9ac0
          dump: 419fecda 000007f6 00000000 41d7dd10 00001688 00000000 00000000 41f5dd80 82457958 823f53f0 00000000 00000000 00000001 00000000 00000000 4024a5c0
  canary: tid=11  cycle=<unk>      lr=0x82458674 r3=0xbccc4ac0 r4=0x00000000 r5=0x00000001 r6=0x00000001 r7=0x00000000
          dump: bdb19cda 000007f6 00000000 bde98d10 00001688 00000000 00000000 be078d80 82457958 823f53f0 00000000 00000000 00000001 00000000 00000000 bc365760
 --- fire #1 ---
  ours:   tid=6   cycle=140548     lr=0x82458674 r3=0x40ba9b80
          dump: 42c0f09a 00018ff6 00000000 43777210 0004d055 00000000 00000000 41f60d80 82457958 823f53f0 00000000 00000000 00000001 00000000 00000000 4024a960
  canary: tid=11  cycle=<unk>      lr=0x82458674 r3=0xbccc4b80 r4=0x00000000 r5=0x00000001 r6=0x00000001 r7=0x00000000
          dump: bed2a09a 00018ff6 00000000 bf892210 0004d055 00000000 00000000 be07bd80 82457958 823f53f0 00000000 00000000 00000001 00000000 00000000 bc365840
 --- fire #2 ---
  ours:   tid=6   cycle=5957876    lr=0x82458674 r3=0x40ba9b80
          dump: 419fecda 000007f6 00000000 414f5f70 000003b9 00000000 00000000 41f60d80 82457958 823f53f0 00000000 00000040 00000001 00000000 00000000 4024a980
  canary: tid=11  cycle=<unk>      lr=0x82458674 r3=0xbccc4b80 r4=0x00000000 r5=0x00000001 r6=0x00000001 r7=0x00000000
          dump: bdb19cda 000007f6 00000000 bd610b90 000003b9 00000000 00000000 be07bd80 82457958 823f53f0 00000000 00000040 00000001 00000000 00000000 bc365860
 --- fire #3 ---
  ours:   <no fire>
  canary: tid=11  cycle=<unk>      lr=0x82458674 r3=0xbccc5300 r4=0x00000000 r5=0x00000001 r6=0x00000001 r7=0x00000000
          dump: bdb1acda 000007f6 00000000 bce24ed0 00000167 00000000 00000000 be07bd80 82457958 823f53f0 00000000 00000000 00000001 00000000 00000000 bc365f40
 --- fire #4 ---
  ours:   <no fire>
  canary: tid=6   cycle=<unk>      lr=0x82457bd4 r3=0x701cf3c0 r4=0x00000004 r5=0x00002530 r6=0x00008000 r7=0x00000001
          dump: be95af9a 0000c170 00000000 b2050010 000681e9 00000000 00000000 be07bd80 82457958 823f53f0 00000000 0000c17a 00000001 701cf4e0 00000000 be95af90
 === Equivalence check: u32 lanes at +0x04 and +0x10 (work-item magic + counter) ===
  Both fields are stable identifiers across engines (host VAs differ but data should match).
  Index of fields:
    [+0x04] = work-item 'size?' (looks like a length field)
    [+0x10] = state counter (per round 30, this is [+128/4 ?]) — but in dump it's u32[4]
  ours [+04,+10]:   [(2038, 5768), (102390, 315477), (2038, 953)]
  canary [+04,+10]: [(2038, 5768), (102390, 315477), (2038, 953), (2038, 359), (49520, 426473), (232195, 999643), (6134, 13763)]
  ours fires whose [+04,+10] match a canary fire: 3/3
--- a/audit-runs/audit-059-handle-disambiguation/round35-lockstep-inflate/diff.py
+++ b/audit-runs/audit-059-handle-disambiguation/round35-lockstep-inflate/diff.py
@@ -0,0 +1,175 @@
 #!/usr/bin/env python3
 """Round 35 lockstep diff: align sub_8280AD40 entry fires between
 ours (--audit-pc-probe-hex AUDIT-PC-PROBE / AUDIT-R3-DUMP) and
 canary (AUDIT-HLC JitProlog).
 Outputs side-by-side rendering of:
  - per-fire entry register snapshot (r3..r10, lr)
  - 64-byte r3 dump (u32 lanes, big-endian)
 Alignment is by invocation order only; tids differ between engines (no input-equivalence required).
 """
 import re
 import sys
 import os
 THIS_DIR = os.path.dirname(os.path.abspath(__file__))
 OURS_LOG = os.path.join(THIS_DIR, "ours.log")
 CANARY_LOG = os.path.join(
    os.path.dirname(THIS_DIR), "round35-lockstep-inflate-canary", "canary.log"
 )
 PC_TARGET = 0x8280AD40
 def parse_ours(path):
    """Pair AUDIT-PC-PROBE lines with their following AUDIT-R3-DUMP lines."""
    fires = []
    cur = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("AUDIT-PC-PROBE"):
                m = re.search(
                    r"pc=0x([0-9a-f]+) tid=(\d+) hw=\d+ cycle=(\d+) lr=0x([0-9a-f]+) r3=0x([0-9a-f]+) r11=0x([0-9a-f]+)",
                    line,
                )
                if not m:
                    continue
                pc = int(m.group(1), 16)
                if pc != PC_TARGET:
                    cur = None
                    continue
                cur = {
                    "tid": int(m.group(2)),
                    "cycle": int(m.group(3)),
                    "lr": int(m.group(4), 16),
                    "r3": int(m.group(5), 16),
                    "dump": [],
                }
                fires.append(cur)
            elif line.startswith("AUDIT-R3-DUMP") and cur is not None:
                lanes = re.findall(r"\+0x[0-9a-f]+=0x([0-9a-f]+)", line)
                cur["dump"] = [int(x, 16) for x in lanes]
                cur = None
    return fires
 def parse_canary(path):
    """Pair AUDIT-HLC JitProlog header lines with following r3+NN dump lines."""
    fires = []
    cur = None
    hdr_re = re.compile(
        r"AUDIT-HLC JitProlog pc=8280AD40 tid=([0-9A-F]+) r3=([0-9A-F]+) r4=([0-9A-F]+) "
        r"r5=([0-9A-F]+) r6=([0-9A-F]+) r7=([0-9A-F]+) r8=([0-9A-F]+) r9=([0-9A-F]+) r10=([0-9A-F]+) lr=([0-9A-F]+)"
    )
    dump_re = re.compile(
        r"AUDIT-HLC JitProlog pc=8280AD40 r3\+([0-9A-F]+): ([0-9A-F]+) ([0-9A-F]+) ([0-9A-F]+) ([0-9A-F]+)"
    )
    with open(path) as f:
        for line in f:
            line = line.strip()
            m = hdr_re.search(line)
            if m:
                cur = {
                    "tid": int(m.group(1), 16),
                    "r3": int(m.group(2), 16),
                    "r4": int(m.group(3), 16),
                    "r5": int(m.group(4), 16),
                    "r6": int(m.group(5), 16),
                    "r7": int(m.group(6), 16),
                    "r8": int(m.group(7), 16),
                    "r9": int(m.group(8), 16),
                    "r10": int(m.group(9), 16),
                    "lr": int(m.group(10), 16),
                    "dump": [],
                }
                fires.append(cur)
                continue
            m = dump_re.search(line)
            if m and cur is not None:
                off = int(m.group(1), 16)
                for i in range(4):
                    word = int(m.group(2 + i), 16)
                    # extend dump to fit
                    idx = off // 4 + i
                    while len(cur["dump"]) <= idx:
                        cur["dump"].append(0)
                    cur["dump"][idx] = word
    return fires
 def fmt_dump(d):
    return " ".join(f"{w:08x}" for w in d[:16])
 def main():
    ours = parse_ours(OURS_LOG)
    canary = parse_canary(CANARY_LOG)
    print(f"=== Fire counts ===")
    print(f"  ours:   {len(ours)}")
    print(f"  canary: {len(canary)}")
    print()
    print(f"=== Per-LR breakdown ===")
    for label, fires in (("ours", ours), ("canary", canary)):
        lr_counts = {}
        for f in fires:
            lr_counts[f["lr"]] = lr_counts.get(f["lr"], 0) + 1
        print(f"  {label}:")
        for lr, n in sorted(lr_counts.items()):
            print(f"    lr=0x{lr:08x}: {n}")
    print()
    print(f"=== Side-by-side first 5 fires (entry registers) ===")
    n = max(len(ours), len(canary))
    n = min(n, 5)
    for i in range(n):
        print(f"\n--- fire #{i} ---")
        if i < len(ours):
            f = ours[i]
            print(
                f"  ours:   tid={f['tid']:<3} cycle={f['cycle']:<10} lr=0x{f['lr']:08x} r3=0x{f['r3']:08x}"
            )
            print(f"          dump: {fmt_dump(f['dump'])}")
        else:
            print(f"  ours:   <no fire>")
        if i < len(canary):
            f = canary[i]
            print(
                f"  canary: tid={f['tid']:<3} cycle=<unk>      lr=0x{f['lr']:08x} r3=0x{f['r3']:08x} "
                f"r4=0x{f['r4']:08x} r5=0x{f['r5']:08x} r6=0x{f['r6']:08x} r7=0x{f['r7']:08x}"
            )
            print(f"          dump: {fmt_dump(f['dump'])}")
        else:
            print(f"  canary: <no fire>")
    print()
    print("=== Equivalence check: u32 lanes at +0x04 and +0x10 (work-item magic + counter) ===")
    print("  Both fields are stable identifiers across engines (host VAs differ but data should match).")
    print()
    print("  Index of fields:")
    print("    [+0x04] = work-item 'size?' (looks like a length field)")
    print("    [+0x10] = state counter (per round 30, this is [+128/4 ?]) — but in dump it's u32[4]")
    print()
    # +0x04 is dump[1], +0x10 is dump[4]
    ours_keys = [(f["dump"][1], f["dump"][4]) if len(f["dump"]) > 4 else None for f in ours]
    canary_keys = [(f["dump"][1], f["dump"][4]) if len(f["dump"]) > 4 else None for f in canary]
    print(f"  ours [+04,+10]:   {ours_keys}")
    print(f"  canary [+04,+10]: {canary_keys}")
    print()
    # Cross-match: every ours key should appear in canary (canary is a superset)
    matched = []
    unmatched_ours = []
    for k in ours_keys:
        if k in canary_keys:
            matched.append(k)
        else:
            unmatched_ours.append(k)
    print(f"  ours fires whose [+04,+10] match a canary fire: {len(matched)}/{len(ours)}")
    if unmatched_ours:
        print(f"  ours fires with NO canary match: {unmatched_ours}")
 if __name__ == "__main__":
    main()
--- a/audit-runs/audit-059-handle-disambiguation/round36-dispatcher-ctx/first-fire.txt
+++ b/audit-runs/audit-059-handle-disambiguation/round36-dispatcher-ctx/first-fire.txt
@@ -0,0 +1,17 @@
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 tid=00000006 r3=BCCC4A80 r4=00000018 r5=828F3888 r6=701CF924 r7=82456F00 r8=00000000 r9=00000000 r10=00000018 lr=822F1D5C
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+00: BC22C910 00010004 00000000 000003E8
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+10: 0101FFFF 00000000 00000000 01010000
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+20: FFFFFFFF 00000000 00000000 00000000
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+30: 00000000 BC365BC0 00000000 00000000
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+40: 00000000 00000000 00000000 BDE9A398
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+50: BC365560 00000000 00000000 00000000
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+60: 00000000 00000000 00000000 01010040
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+70: 00000000 00000000 00000000 FFFFFFFF
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+80: 00000000 00000000 00000000 BC22C930
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+90: 00000000 00000001 00000800 00000000
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+A0: F800004C 00000000 00000000 BC365220
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+B0: BC3655C0 00000000 00000000 00000000
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+C0: 00CC0048 00460020 00460072 00650071
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+D0: 00750065 006E0063 00790000 01010000
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+E0: 00000000 00000000 00000000 FFFFFFFF
 K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+F0: 00000000 00000000 00000000 BD610B80
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.log.run1
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.log.run1
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.log.run2
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.log.run2
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.log.run3
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.log.run3
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.log.run4_toml
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.log.run4_toml
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.log.sub821B55D8
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.log.sub821B55D8
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.log.sub821B6DF4_zero
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.log.sub821B6DF4_zero
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stderr.run1
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stderr.run1
@@ -0,0 +1,89 @@
 warn:  CreateDXGIFactory2: Ignoring flags
 info:  Game: xenia_canary.exe
 info:  DXVK: v2.7.1
 info:  Build: x86_64 gcc 15.1.0
 info:  Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbd84000
 info:  Extension providers:
 info:    Platform WSI
 info:    OpenVR
 info:  OpenVR: could not open registry key, status 2
 info:  OpenVR: Failed to locate module
 info:    OpenXR
 info:  Enabled instance extensions:
 info:    VK_EXT_surface_maintenance1
 info:    VK_KHR_get_surface_capabilities2
 info:    VK_KHR_surface
 info:    VK_KHR_win32_surface
 info:  Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
 info:  Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
 info:    Skipping: Software driver
 info:  DXGI: Hiding actual GPU, reporting:
 info:    vendor ID: 0x1002
 info:    device ID: 0x73df
 warn:  DxgiAdapter::QueryInterface: Unknown interface query
 warn:  f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
 564.236:00dc:013c:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
 564.236:00dc:013c:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
 564.236:00dc:013c:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
 564.240:00dc:013c:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
 564.240:00dc:013c:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
 564.399:00dc:013c:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
 564.825:00dc:013c:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
 564.825:00dc:013c:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
 564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
 564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 564.839:00dc:013c:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
 564.839:00dc:013c:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
 564.839:00dc:013c:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
 564.840:00dc:013c:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
 564.840:00dc:013c:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
 564.843:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
 564.844:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: Promoting write cache to read cache. No need to merge any disk caches.
 564.844:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 1.012 ms.
 564.845:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.607 ms.
 564.845:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.370 ms.
 564.845:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
 564.903:00dc:013c:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
 564.903:00dc:013c:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
 564.946:00dc:013c:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
 565.065:00dc:013c:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
 565.065:00dc:013c:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
 565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
 565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 565.067:00dc:013c:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
 565.067:00dc:013c:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
 565.067:00dc:013c:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
 565.067:00dc:013c:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
 565.067:00dc:013c:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
 565.068:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
 565.068:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
 565.068:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.136 ms.
 565.068:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.221 ms.
 565.069:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.031 ms.
 565.069:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
 565.075:00dc:013c:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
 warn:  DXGIGetDebugInterface1: Stub
 info:  DXGI: Hiding actual GPU, reporting:
 info:    vendor ID: 0x1002
 info:    device ID: 0x73df
 565.173:00dc:00e0:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
 565.194:00dc:00e0:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
 565.195:00dc:00e0:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
 warn:  DXGI: MakeWindowAssociation: Ignoring flags
 warn:  DxgiOutput::WaitForVBlank: Inaccurate
 info:  Setting timer interval to 1000 us
 565.773:00dc:0164:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
 566.349:00dc:016c:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
 566.387:00dc:0164:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stderr.run2
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stderr.run2
@@ -0,0 +1,89 @@
 warn:  CreateDXGIFactory2: Ignoring flags
 info:  Game: xenia_canary.exe
 info:  DXVK: v2.7.1
 info:  Build: x86_64 gcc 15.1.0
 info:  Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbfb4000
 info:  Extension providers:
 info:    Platform WSI
 info:    OpenVR
 info:  OpenVR: could not open registry key, status 2
 info:  OpenVR: Failed to locate module
 info:    OpenXR
 info:  Enabled instance extensions:
 info:    VK_EXT_surface_maintenance1
 info:    VK_KHR_get_surface_capabilities2
 info:    VK_KHR_surface
 info:    VK_KHR_win32_surface
 info:  Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
 info:  Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
 info:    Skipping: Software driver
 info:  DXGI: Hiding actual GPU, reporting:
 info:    vendor ID: 0x1002
 info:    device ID: 0x73df
 warn:  DxgiAdapter::QueryInterface: Unknown interface query
 warn:  f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
 805.907:00d0:0124:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
 805.907:00d0:0124:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
 805.907:00d0:0124:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
 805.910:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
 805.910:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
 805.955:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
 806.100:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
 806.100:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
 806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
 806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 806.105:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
 806.105:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
 806.105:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
 806.105:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
 806.105:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
 806.106:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
 806.106:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
 806.106:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.161 ms.
 806.107:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.185 ms.
 806.107:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.028 ms.
 806.107:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
 806.154:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
 806.154:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
 806.197:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
 806.310:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
 806.310:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
 806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
 806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 806.312:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
 806.312:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
 806.312:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
 806.312:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
 806.312:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
 806.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
 806.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
 806.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.156 ms.
 806.314:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.659 ms.
 806.314:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.035 ms.
 806.314:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
 806.319:00d0:0124:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
 warn:  DXGIGetDebugInterface1: Stub
 info:  DXGI: Hiding actual GPU, reporting:
 info:    vendor ID: 0x1002
 info:    device ID: 0x73df
 806.408:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
 806.422:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
 806.423:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
 warn:  DXGI: MakeWindowAssociation: Ignoring flags
 warn:  DxgiOutput::WaitForVBlank: Inaccurate
 info:  Setting timer interval to 1000 us
 806.948:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
 807.499:00d0:0154:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
 807.521:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stderr.run3
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stderr.run3
@@ -0,0 +1,89 @@
 warn:  CreateDXGIFactory2: Ignoring flags
 info:  Game: xenia_canary.exe
 info:  DXVK: v2.7.1
 info:  Build: x86_64 gcc 15.1.0
 info:  Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbfb4000
 info:  Extension providers:
 info:    Platform WSI
 info:    OpenVR
 info:  OpenVR: could not open registry key, status 2
 info:  OpenVR: Failed to locate module
 info:    OpenXR
 info:  Enabled instance extensions:
 info:    VK_EXT_surface_maintenance1
 info:    VK_KHR_get_surface_capabilities2
 info:    VK_KHR_surface
 info:    VK_KHR_win32_surface
 info:  Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
 info:  Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
 info:    Skipping: Software driver
 info:  DXGI: Hiding actual GPU, reporting:
 info:    vendor ID: 0x1002
 info:    device ID: 0x73df
 warn:  DxgiAdapter::QueryInterface: Unknown interface query
 warn:  f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
 893.096:00d4:0128:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
 893.096:00d4:0128:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
 893.096:00d4:0128:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
 893.099:00d4:0128:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
 893.099:00d4:0128:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
 893.145:00d4:0128:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
 893.308:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
 893.308:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
 893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
 893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 893.310:00d4:0128:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
 893.310:00d4:0128:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
 893.310:00d4:0128:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
 893.310:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
 893.310:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
 893.311:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
 893.311:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
 893.311:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.187 ms.
 893.312:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.161 ms.
 893.312:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.040 ms.
 893.312:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
 893.360:00d4:0128:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
 893.360:00d4:0128:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
 893.405:00d4:0128:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
 893.520:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
 893.520:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
 893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
 893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 893.522:00d4:0128:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
 893.522:00d4:0128:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
 893.522:00d4:0128:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
 893.522:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
 893.522:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
 893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
 893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
 893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.153 ms.
 893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.199 ms.
 893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.034 ms.
 893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
 893.529:00d4:0128:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
 warn:  DXGIGetDebugInterface1: Stub
 info:  DXGI: Hiding actual GPU, reporting:
 info:    vendor ID: 0x1002
 info:    device ID: 0x73df
 893.622:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
 893.631:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
 893.632:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
 warn:  DXGI: MakeWindowAssociation: Ignoring flags
 warn:  DxgiOutput::WaitForVBlank: Inaccurate
 info:  Setting timer interval to 1000 us
 894.203:00d4:0150:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
 894.705:00d4:0158:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
 894.727:00d4:0150:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stderr.run4_toml
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stderr.run4_toml
@@ -0,0 +1,89 @@
 warn:  CreateDXGIFactory2: Ignoring flags
 info:  Game: xenia_canary.exe
 info:  DXVK: v2.7.1
 info:  Build: x86_64 gcc 15.1.0
 info:  Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbfb4000
 info:  Extension providers:
 info:    Platform WSI
 info:    OpenVR
 info:  OpenVR: could not open registry key, status 2
 info:  OpenVR: Failed to locate module
 info:    OpenXR
 info:  Enabled instance extensions:
 info:    VK_EXT_surface_maintenance1
 info:    VK_KHR_get_surface_capabilities2
 info:    VK_KHR_surface
 info:    VK_KHR_win32_surface
 info:  Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
 info:  Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
 info:    Skipping: Software driver
 info:  DXGI: Hiding actual GPU, reporting:
 info:    vendor ID: 0x1002
 info:    device ID: 0x73df
 warn:  DxgiAdapter::QueryInterface: Unknown interface query
 warn:  f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
 956.778:00d0:0124:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
 956.778:00d0:0124:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
 956.778:00d0:0124:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
 956.781:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
 956.781:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
 956.826:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
 956.983:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
 956.983:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
 956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
 956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 956.985:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
 956.985:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
 956.985:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
 956.985:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
 956.985:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
 956.985:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
 956.986:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
 956.986:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.171 ms.
 956.986:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.269 ms.
 956.986:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.028 ms.
 956.986:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
 957.031:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
 957.031:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
 957.075:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
 957.186:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
 957.186:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
 957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
 957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 957.188:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
 957.188:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
 957.188:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
 957.188:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
 957.188:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
 957.188:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
 957.188:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
 957.189:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.172 ms.
 957.189:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.231 ms.
 957.189:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.029 ms.
 957.189:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
 957.195:00d0:0124:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
 warn:  DXGIGetDebugInterface1: Stub
 info:  DXGI: Hiding actual GPU, reporting:
 info:    vendor ID: 0x1002
 info:    device ID: 0x73df
 957.285:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
 957.295:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
 957.295:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
 warn:  DXGI: MakeWindowAssociation: Ignoring flags
 warn:  DxgiOutput::WaitForVBlank: Inaccurate
 info:  Setting timer interval to 1000 us
 957.806:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
 958.343:00d0:0154:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
 958.382:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stderr.sub821B55D8
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stderr.sub821B55D8
@@ -0,0 +1,89 @@
 warn:  CreateDXGIFactory2: Ignoring flags
 info:  Game: xenia_canary.exe
 info:  DXVK: v2.7.1
 info:  Build: x86_64 gcc 15.1.0
 info:  Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbfb4000
 info:  Extension providers:
 info:    Platform WSI
 info:    OpenVR
 info:  OpenVR: could not open registry key, status 2
 info:  OpenVR: Failed to locate module
 info:    OpenXR
 info:  Enabled instance extensions:
 info:    VK_EXT_surface_maintenance1
 info:    VK_KHR_get_surface_capabilities2
 info:    VK_KHR_surface
 info:    VK_KHR_win32_surface
 info:  Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
 info:  Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
 info:    Skipping: Software driver
 info:  DXGI: Hiding actual GPU, reporting:
 info:    vendor ID: 0x1002
 info:    device ID: 0x73df
 warn:  DxgiAdapter::QueryInterface: Unknown interface query
 warn:  f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
 1217.108:00d4:0128:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
 1217.108:00d4:0128:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
 1217.108:00d4:0128:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
 1217.111:00d4:0128:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
 1217.111:00d4:0128:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
 1217.160:00d4:0128:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
 1217.307:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
 1217.307:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
 1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
 1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1217.309:00d4:0128:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
 1217.309:00d4:0128:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
 1217.309:00d4:0128:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
 1217.309:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
 1217.309:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
 1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
 1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
 1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.166 ms.
 1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.173 ms.
 1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.031 ms.
 1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
 1217.360:00d4:0128:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
 1217.360:00d4:0128:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
 1217.403:00d4:0128:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
 1217.515:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
 1217.515:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
 1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
 1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1217.516:00d4:0128:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
 1217.516:00d4:0128:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
 1217.516:00d4:0128:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
 1217.516:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
 1217.516:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
 1217.517:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
 1217.517:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
 1217.517:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.157 ms.
 1217.517:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.208 ms.
 1217.518:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.032 ms.
 1217.518:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
 1217.524:00d4:0128:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
 warn:  DXGIGetDebugInterface1: Stub
 info:  DXGI: Hiding actual GPU, reporting:
 info:    vendor ID: 0x1002
 info:    device ID: 0x73df
 1217.612:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
 1217.622:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
 1217.622:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
 warn:  DXGI: MakeWindowAssociation: Ignoring flags
 warn:  DxgiOutput::WaitForVBlank: Inaccurate
 info:  Setting timer interval to 1000 us
 1218.136:00d4:0150:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
 1218.678:00d4:0158:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
 1218.699:00d4:0150:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stderr.sub821B6DF4_zero
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stderr.sub821B6DF4_zero
@@ -0,0 +1,89 @@
 warn:  CreateDXGIFactory2: Ignoring flags
 info:  Game: xenia_canary.exe
 info:  DXVK: v2.7.1
 info:  Build: x86_64 gcc 15.1.0
 info:  Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbfb4000
 info:  Extension providers:
 info:    Platform WSI
 info:    OpenVR
 info:  OpenVR: could not open registry key, status 2
 info:  OpenVR: Failed to locate module
 info:    OpenXR
 info:  Enabled instance extensions:
 info:    VK_EXT_surface_maintenance1
 info:    VK_KHR_get_surface_capabilities2
 info:    VK_KHR_surface
 info:    VK_KHR_win32_surface
 info:  Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
 info:  Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
 info:    Skipping: Software driver
 info:  DXGI: Hiding actual GPU, reporting:
 info:    vendor ID: 0x1002
 info:    device ID: 0x73df
 warn:  DxgiAdapter::QueryInterface: Unknown interface query
 warn:  f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
 1413.916:00d0:0124:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
 1413.916:00d0:0124:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
 1413.916:00d0:0124:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
 1413.919:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
 1413.919:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
 1413.963:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
 1414.109:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
 1414.109:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
 1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
 1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1414.111:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
 1414.111:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
 1414.111:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
 1414.111:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
 1414.111:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
 1414.112:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
 1414.112:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
 1414.112:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.173 ms.
 1414.113:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.276 ms.
 1414.113:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.029 ms.
 1414.113:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
 1414.157:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
 1414.157:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
 1414.199:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
 1414.310:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
 1414.310:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
 1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
 1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
 1414.312:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
 1414.312:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
 1414.312:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
 1414.312:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
 1414.312:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
 1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
 1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
 1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.158 ms.
 1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.256 ms.
 1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.031 ms.
 1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
 1414.319:00d0:0124:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
 warn:  DXGIGetDebugInterface1: Stub
 info:  DXGI: Hiding actual GPU, reporting:
 info:    vendor ID: 0x1002
 info:    device ID: 0x73df
 1414.406:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
 1414.416:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
 1414.416:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
 warn:  DXGI: MakeWindowAssociation: Ignoring flags
 warn:  DxgiOutput::WaitForVBlank: Inaccurate
 info:  Setting timer interval to 1000 us
 1414.927:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
 1415.477:00d0:0154:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
 1415.500:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stdout.run1
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stdout.run1
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stdout.run2
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stdout.run2
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stdout.run3
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stdout.run3
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stdout.run4_toml
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stdout.run4_toml
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stdout.sub821B55D8
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stdout.sub821B55D8
--- a/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stdout.sub821B6DF4_zero
+++ b/audit-runs/audit-059-handle-disambiguation/round9-sub821B6DF4-caller/canary.stdout.sub821B6DF4_zero
--- a/audit-runs/iterate-2D-deferred-fixes/DEFERRED_FIXES.md
+++ b/audit-runs/iterate-2D-deferred-fixes/DEFERRED_FIXES.md
@@ -0,0 +1,47 @@
 # iterate-2D Deferred Structural Fixes — Outcome
 Branch `iterate-2D/subsystem-fixes`. After verification + the user's go-ahead:
 ## Issue 1 — 32-bit word-form ALU truncation (PPCBUG-020) — ✅ FIXED & LANDED
 Commit **341196a**. Confirmed load-bearing via runtime ours-vs-canary capture:
 Sylpheed's ms→LARGE_INTEGER converter `sub_824ACA88` (`clrldi; mulli r11,r11,-10000; std`)
 produced `0x00000000_FFFD8F00` in ours vs canary's correct `0xFFFFFFFF_FFFD8F00` for a 16 ms
 wait — a positive (absolute) timeout → ~26000× over-wait that froze the main frame loop.
 Fixed the 17 data-losing word-form ops (full 64-bit result, CA/OV/CR0 preserved byte-identical),
 updated 7 bug-asserting tests, re-baselined `sylpheed_n50m` (imports 40454→1790936), `sylpheed_n2m`
 unchanged. 660/660 + ignored oracle green; lockstep determinism preserved. Boot unwedged
 (parallel NtWaitForMultipleObjectsEx 94→30428; frozen worker/critical-section loops now run).
 VdSwap still 1 — rendering progression needs the out-of-scope acd1656 fixes (nt_create_event
 polarity + 2.AF), not in this branch.
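The Issue 1 arithmetic can be reproduced with a minimal sketch. It models only the symptom described above (a product truncated to 32 bits and then zero-extended), not the interpreter's actual opcode handlers. The function names `ms_to_interval_full` and `ms_to_interval_truncated` are hypothetical and do not appear in the codebase.

```rust
/// Convert a wait in milliseconds to a 100 ns interval count.
/// A negative result is a relative timeout; a positive one is absolute.
/// Hypothetical helper: models the corrected, full 64-bit product.
fn ms_to_interval_full(ms: u32) -> i64 {
    (ms as i64).wrapping_mul(-10_000)
}

/// Hypothetical helper: models the pre-fix symptom, where the product
/// is cut to its low 32 bits and zero-extended back to 64 bits.
fn ms_to_interval_truncated(ms: u32) -> i64 {
    let full = (ms as i64).wrapping_mul(-10_000);
    // `as u32` keeps the low word; `as i64` from u32 zero-extends,
    // so the sign is lost and the value becomes positive.
    (full as u32) as i64
}

/// How many times longer the corrupted wait is than the intended one.
fn over_wait_factor(ms: u32) -> i64 {
    ms_to_interval_truncated(ms) / ms_to_interval_full(ms).abs()
}
```

For a 16 ms wait the full product is -160000 (`0xFFFFFFFF_FFFD8F00`), while the truncated form is the positive value `0x00000000_FFFD8F00`, which is about 26842 times larger in magnitude. That is consistent with the ~26000× over-wait quoted above.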
 ## Issue 2 — Memory page-size per-region collapse — DEFERRED (verified NOT load-bearing)
 Sylpheed requests `MmAllocatePhysicalMemoryEx` with flags=0, alignment(r8)=0 (default); ours returns
 self-consistent 4K-aligned addresses and boots. Ours has no 0xA0/0xC0/0xE0 physical-region model at
 all, so a faithful fix is a region-model rewrite that shifts every physical guest VA (golden-breaking,
 invalidates the audit-059 VA map) with no demonstrated boot benefit. A partial page-size-only change
 would shift VAs for zero correctness gain — do NOT do it piecemeal. Pursue only if a render-path
 struct is proven to depend on physical region/alignment.
 ## Issue 3 — Timing — LEFT (not load-bearing / determinism-coupled)
 - 3d DPC/APC: INERT — the only timer (NtSetTimerEx) passes a NULL APC routine; no
  NtQueueApcThread/KeInsertQueueDpc imported.
 - 3b timeout sign: was a SYMPTOM of Issue 1 (the "positive absolute" timeouts were mulli-corruption
  artifacts) — resolved by the Issue 1 fix.
 - 3a/3c timebase/skew: timebase = instruction-count IS the deterministic lockstep clock; must not
  become wallclock. 2.AF deadline-drain already present. Not load-bearing for Sylpheed.
 ## Issue 4 — VFS synthesized-success-on-miss — LEFT (risky / coupled to Issue 1 trajectory)
 The synthesis fallback handles a MIX (writable-partition probes partition0/Cache0 + a genuine disc
 miss dat/files.tbl, verified absent from the ISO). Canary doesn't fire XamShowDirtyDiscErrorUI during
 boot (the one "DirtyDisc" log hit is the import-table declaration). Not cleanly separable without
 heuristic disc-vs-partition routing. Re-verify on the corrected post-Issue-1 (and post-acd1656)
 trajectory before changing.
 ## Issue 5 — Mutant object — SKIPPED (verified unused)
 Sylpheed's XEX import table contains NO mutant symbols (NtCreateMutant/NtReleaseMutant/KeReleaseMutant/
 KeInitializeMutant/NtQueryMutant) — the game cannot call them; unimplemented=0 across boot. A correct
 implementation needs mutant hand-off semantics + an owner-type redesign (the existing
 `Mutex { owner: Option<u8> }` tracks a HW slot, not a thread) in the determinism-critical wait path,
 for code that never executes. Per the mandate's skip-if-unused criterion, left unimplemented. Can be
 added on request as a pure canary-parity / future-title feature (determinism-safe since no Sylpheed
 mutant ever exists at runtime).
--- a/crates/xenia-app/src/main.rs
+++ b/crates/xenia-app/src/main.rs
@@ -1301,6 +1301,29 @@ fn cmd_exec_inner(
        }
    }
    // iterate-2E — pointer-chase probe. `XENIA_AUDIT_DEREF=<reg>:<off>`
    // (e.g. `4:36`). On each AUDIT-PC-PROBE fire, dumps gpr[reg] as a base
    // object, the sub-object at [base+off], and that sub-object's vtable.
    // Read-only; lockstep digest unaffected.
    if let Ok(spec) = std::env::var("XENIA_AUDIT_DEREF") {
        if !spec.is_empty() {
            let (rs, os) = spec
                .split_once(':')
                .ok_or_else(|| anyhow::anyhow!("XENIA_AUDIT_DEREF {spec:?}: expected <reg>:<off>"))?;
            let reg: u8 = rs.trim_start_matches('r').parse()
                .map_err(|e| anyhow::anyhow!("XENIA_AUDIT_DEREF reg {rs:?}: {e}"))?;
            let off: u32 = if let Some(h) = os.strip_prefix("0x") {
                u32::from_str_radix(h, 16)
            } else {
                os.parse::<u32>()
            }.map_err(|e| anyhow::anyhow!("XENIA_AUDIT_DEREF off {os:?}: {e}"))?;
            kernel.audit_deref = Some((reg, off));
            if !quiet {
                tracing::info!("audit-deref armed: r{} +0x{:x}", reg, off);
            }
        }
    }
    // Diagnostic. Parse `--dump-addr=0x828F3D08,...` (or
    // `XENIA_DUMP_ADDR=...`) into `kernel.dump_addrs`. The contents
    // are dumped at end-of-run by `dump_thread_diagnostic`. Pure
@@ -1474,16 +1497,28 @@ fn cmd_exec_inner(
                    mem.write_u32(addr, block);
                }
                ("xboxkrnl.exe", 0x00AD) => {
-                    // KeTimeStampBundle — 0x18 block with FILETIME at +0 and
-                    // interrupt-time u64 at +0x10. Mirrors the clock used by
-                    // KeQuerySystemTime so fast-path readers see consistent values.
+                    // KeTimeStampBundle — X_TIME_STAMP_BUNDLE (canary layout,
+                    // kernel_state.h): +0x00 interrupt_time u64, +0x08
+                    // system_time u64 (FILETIME 100ns), +0x10 tick_count u32
                    // (milliseconds since boot), +0x14 padding. The guest's
                    // worker-hub channel-dispatch loop (sub_82450A68 @
                    // 0x82450b10) polls [block+0x10] (tick_count) and gates
                    // dispatch on a `tick_count + 66` (ms) deadline. The block
                    // MUST be ticked over the run or that deadline never
                    // elapses (tid14 0x109c starvation gate). Initialize to a
                    // zero-uptime base; KernelState::update_timestamp_bundle
                    // ticks it every round from the deterministic global_clock.
                    let block = alloc_zero(0x18, &mut mem, &mut kernel);
                    if block != 0 {
-                        let fake_time: u64 = 132_500_000_000_000_000; // ~2021 FILETIME
-                        mem.write_u32(block, (fake_time >> 32) as u32);
-                        mem.write_u32(block + 4, fake_time as u32);
-                        mem.write_u32(block + 0x10, (fake_time >> 32) as u32);
-                        mem.write_u32(block + 0x14, fake_time as u32);
+                        // FILETIME base (~2021) so system_time is plausible.
+                        let fake_time: u64 = 132_500_000_000_000_000;
+                        mem.write_u32(block, 0); // interrupt_time hi
+                        mem.write_u32(block + 4, 0); // interrupt_time lo
+                        mem.write_u32(block + 0x08, (fake_time >> 32) as u32); // system_time hi
                        mem.write_u32(block + 0x0C, fake_time as u32); // system_time lo
                        mem.write_u32(block + 0x10, 0); // tick_count (ms) = 0 at boot
                        mem.write_u32(block + 0x14, 0); // padding
                        kernel.timestamp_bundle_addr = block;
                    }
                    mem.write_u32(addr, block);
                }
@@ -2124,6 +2159,27 @@ fn coord_pre_round(
    }
    kernel.fire_due_timers();
    // 2.AF — fire expired wait-deadlines under load. Without this drain,
    // `advance_to_next_wake_if_due` only runs in `coord_idle_advance` (the
    // no-Ready-threads path), so a thread whose `KeWait*`/`KeDelay` deadline
    // expires while other threads keep the scheduler busy sits Blocked
    // forever (observed: tid=5's 42.95ms deadline unfired 29s+). Drain every
    // entry whose deadline `<=` the current guest timebase — the same `now`
    // basis `fire_due_timers` uses, so the two stay in lock-step — and let
    // `handle_timeout_wake` stamp `STATUS_TIMEOUT` and scrub the waiter from
    // each handle. `advance_to_next_wake_if_due` pops at most one due wake
    // per call and returns `None` once the earliest remaining deadline is in
    // the future, so this loop terminates. Deterministic: `ctx(0).timebase`
    // is the guest-cycle timebase, not host_ns. This runs in `coord_pre_round`
    // which both the lockstep and parallel outer loops call every round.
    loop {
        let now = kernel.now_basis_at(0);
        let Some((r, reason)) = kernel.scheduler.advance_to_next_wake_if_due(now)
        else {
            break;
        };
        kernel.handle_timeout_wake(r, reason);
    }
    // Graphics-interrupt delivery is no longer done here — see
    // `dispatch_graphics_interrupts`, called from the outer loop with
    // `mem` and `&mut stats` in scope. The audio path still uses the
@@ -2575,6 +2631,10 @@ fn worker_prologue(
    match result {
        StepResult::Continue => {}
        StepResult::Yield => {
            // db16cyc spin-wait hint (per-instruction path): yield the slot.
            kernel.scheduler.yield_current();
        }
        StepResult::SystemCall => {
            tracing::warn!("SYSCALL at {:#010x} (hw={})", pc, hw_id);
        }
@@ -2654,6 +2714,11 @@ fn worker_epilogue(
    match result {
        StepResult::Continue => {}
        StepResult::Yield => {
            // db16cyc spin-wait hint: hand the slot to a Ready peer so the
            // spinner doesn't starve the co-located thread it is waiting on.
            kernel.scheduler.yield_current();
        }
        StepResult::SystemCall => {
            let last_pc = block.instrs.last().map(|i| i.addr).unwrap_or(pc_before);
            tracing::warn!("SYSCALL at {:#010x} (hw={})", last_pc, hw_id);
@@ -2780,6 +2845,32 @@ fn run_execution(
            RoundCtl::BreakOuter => break,
            RoundCtl::Continue => {}
        }
        // ITERATE-2C Phase D — deposit the current instruction count so
        // `nt_create_event` can compute absolute auto-signal deadlines,
        // then drain any pending auto-signals whose deadline has passed.
        // Both calls are no-ops when `XENIA_SILPH_UI_AUTOSIGNAL_DELAY`
        // is unset (the pending queue stays empty).
        kernel.set_now_cycle_hint(stats.instruction_count);
        // Drive the coherent monotonic "now" the kernel deadline-arithmetic
        // reads (`KernelState::now_basis_at` -> `Scheduler::global_clock`)
        // from the deterministic retired-instruction count. Floored up (never
        // backwards). This is the LOCKSTEP analogue of the parallel writeback's
        // `advance_global_clock`: a parked/poll thread computing a relative
        // timeout via `parse_timeout` now reads a real, non-zero, monotone
        // basis instead of `idle_ctx`'s timebase-0, so its deadline lands in
        // the future and `coord_idle_advance` stops re-arming the constant
        // past deadline forever (the timebase-desync livelock / render-gate
        // root). Pure function of guest instructions -> bit-reproducible.
        kernel
            .scheduler
            .advance_global_clock_to(stats.instruction_count);
        // ITERATE-2J — tick the KeTimeStampBundle (ordinal 0x00AD) from the
        // same deterministic clock so the guest's worker-hub tick_count
        // deadline gate (`[block+0x10] + 66` ms) actually elapses. Without
        // this the block is frozen at boot and the hub spins forever,
        // starving tid14 on event 0x109c.
        kernel.update_timestamp_bundle(mem, kernel.scheduler.global_clock());
        kernel.fire_due_silph_autosignals(stats.instruction_count);
        dispatch_graphics_interrupts(
            kernel,
            mem,
@@ -3118,6 +3209,16 @@ fn run_execution_parallel(
                                    .and_then(|t| guard.scheduler.find_by_tid(t))
                                    .unwrap_or(thread_ref);
                                *guard.scheduler.ctx_mut_ref(target_ref) = ctx_taken;
                                // Advance the parallel-mode coherent clock by
                                // the instructions this block retired. This is
                                // the single authoritative "now" the kernel
                                // deadline-arithmetic reads in parallel mode
                                // (per-thread `ctx.timebase` is incoherent here
                                // because peers extract/zero their slots) —
                                // keeping it monotonic breaks the timebase-
                                // desync livelock where a woken thread re-armed
                                // the same constant deadline forever.
                                guard.scheduler.advance_global_clock(executed);
                                // worker_epilogue's exit_current path
                                // expects scheduler.current to be set
                                // to the running thread.
@@ -3204,6 +3305,25 @@ fn run_execution_parallel(
            }
            let mut guard = pre_outcome.1;
            // ITERATE-2C Phase D — same auto-signal hook as the lockstep
            // path. Held under the same `kernel_arc` guard the rest of
            // this prologue runs under, so no extra locking.
            {
                let s = stats_mtx.lock().expect("stats mutex poisoned");
                guard.set_now_cycle_hint(s.instruction_count);
                guard.fire_due_silph_autosignals(s.instruction_count);
            }
            // ITERATE-2J — tick the KeTimeStampBundle (ordinal 0x00AD) from
            // the parallel-mode coherent global_clock (summed per-block
            // retired instructions). Same fix as the lockstep loop: keeps the
            // guest's worker-hub tick_count deadline gate advancing so it
            // dispatches channel-3 and unblocks tid14 on event 0x109c.
            {
                let clock = guard.scheduler.global_clock();
                guard.update_timestamp_bundle(mem, clock);
            }
            // Iterate-2.BE — host-driven synchronous ISR dispatch.
            // Runs under the kernel lock while workers are still parked
            // at the phaser B2 barrier (the coordinator hasn't published
@@ -3555,6 +3675,9 @@ fn dispatch_graphics_interrupts(
            isr_instrs += 1;
            match r {
                StepResult::Continue => {}
                // db16cyc inside the synchronous ISR has no slot to yield —
                // the ISR runs to completion on the borrowed context.
                StepResult::Yield => {}
                StepResult::SystemCall => {
                    tracing::warn!("graphics ISR hit `sc` instruction; aborting");
                    break;
--- a/crates/xenia-app/tests/golden/sylpheed_n50m.json
+++ b/crates/xenia-app/tests/golden/sylpheed_n50m.json
@@ -1,9 +1,9 @@
 {
-  "instructions": 50000001,
+  "instructions": 50000003,
-  "imports": 40454,
+  "imports": 451508,
  "unimpl": 0,
  "draws": 0,
-  "swaps": 1,
+  "swaps": 2,
  "unique_render_targets": 0,
  "shader_blobs_live": 0,
  "texture_cache_entries": 0
--- a/crates/xenia-cpu/src/interpreter.rs
+++ b/crates/xenia-cpu/src/interpreter.rs
@@ -28,6 +28,15 @@ pub enum StepResult {
    Trap,
    /// Execution halted (by debugger or error).
    Halted,
+    /// Executed the `db16cyc` spin-wait hint (`or r31,r31,r31`, encoding
+    /// `0x7FFFFB78`). The PC has already advanced past the hint; this is a
+    /// cooperative-yield signal so the scheduler hands the slot to a Ready
+    /// peer. On real hardware all six HW threads run concurrently and the
+    /// spin resolves naturally; under our round-robin lockstep a spinning
+    /// barrier/spinlock participant would otherwise monopolize its slot and
+    /// starve the co-located thread it is waiting on. Matches canary's
+    /// `InstrEmit_orx` db16cyc → `DelayExecution()` handling.
+    Yield,
 }
 /// Execute a single PPC instruction.
@@ -95,6 +104,9 @@ pub fn step_block(
        ctx.cycle_count += 1;
        ctx.timebase += 1;
        if !matches!(result, StepResult::Continue) {
+            // `Yield` (db16cyc spin hint) terminates the block here so the
+            // scheduler regains control and can rotate the slot; the PC has
+            // already advanced past the hint inside `execute`.
            return result;
        }
        // PC discontinuity within a block. By construction only the
@@ -117,65 +129,65 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.pc += 4;
        }
        PpcOpcode::addis => {
-            // Xbox 360 user mode is 32-bit ABI (MSR.SF=0), so addis must
+            // PPCBUG-020 fix: Xenon is a 64-bit core; `addis` produces the full
-            // produce a value whose upper 32 bits don't pollute downstream
+            // 64-bit `RA + (EXTS(SI) << 16)`. Matches canary
-            // 64-bit arithmetic. The PPC ISA in 64-bit mode sign-extends
+            // (`Add(RA, Int64(EXTS(imm) << 16))`, stores full 64-bit).
-            // simm16 before the shift, producing 0xFFFFFFFF_xxxx0000 for
-            // negative simm16 (high bit set). When this value flows into
-            // a 64-bit subfc against a zero-extended lwz value, the unsigned
-            // 64-bit comparison yields wrong CA. Truncate to 32 bits to
-            // simulate 32-bit ABI behavior.
            let ra_val = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] };
            let result = ra_val.wrapping_add((instr.simm16() as i64 as u64) << 16);
-            ctx.gpr[instr.rd()] = result as u32 as u64;
+            ctx.gpr[instr.rd()] = result;
            ctx.pc += 4;
        }
        PpcOpcode::addic => {
-            // PPCBUG-002: 32-bit ABI. CA must be from a 32-bit unsigned compare;
+            // PPCBUG-020 fix: full 64-bit `RA + EXTS(SI)` (canary `Add(RA,
-            // canary's `AddDidCarry` truncates both operands to int32 first.
+            // Int64(EXTS(imm)))`). CA stays a 32-bit unsigned compare to match
+            // canary's `AddDidCarry` (truncates operands to int32 first).
            let ra32 = ctx.gpr[instr.ra()] as u32;
            let imm32 = instr.simm16() as i32 as u32;
            let result32 = ra32.wrapping_add(imm32);
            ctx.xer_ca = if result32 < ra32 { 1 } else { 0 };
-            ctx.gpr[instr.rd()] = result32 as u64;
+            ctx.gpr[instr.rd()] = ctx.gpr[instr.ra()].wrapping_add(instr.simm16() as i64 as u64);
            ctx.pc += 4;
        }
        PpcOpcode::addicx => {
-            // PPCBUG-003: same fix as addic plus CR0 i32 view.
+            // PPCBUG-020 fix: full 64-bit result; CA 32-bit; CR0 32-bit i32 view
+            // (= low 32 of the result; unchanged from the pre-fix behaviour).
            let ra32 = ctx.gpr[instr.ra()] as u32;
            let imm32 = instr.simm16() as i32 as u32;
            let result32 = ra32.wrapping_add(imm32);
            ctx.xer_ca = if result32 < ra32 { 1 } else { 0 };
-            ctx.gpr[instr.rd()] = result32 as u64;
+            ctx.gpr[instr.rd()] = ctx.gpr[instr.ra()].wrapping_add(instr.simm16() as i64 as u64);
            ctx.update_cr_signed(0, result32 as i32 as i64);
            ctx.pc += 4;
        }
        PpcOpcode::subficx => {
-            // PPCBUG-005: 32-bit ABI. Sign-extended imm has bits 32-63 set for
+            // PPCBUG-020 fix: full 64-bit `EXTS(SI) - RA` (canary `Sub(Int64(
-            // negative SIMM, poisoning the writeback. Canary uses 32-bit form.
+            // EXTS(imm)), RA)`). CA stays a 32-bit compare.
            let ra32 = ctx.gpr[instr.ra()] as u32;
            let imm32 = instr.simm16() as i32 as u32;
            let result32 = imm32.wrapping_sub(ra32);
            ctx.xer_ca = if imm32 >= ra32 { 1 } else { 0 };
-            ctx.gpr[instr.rd()] = result32 as u64;
+            ctx.gpr[instr.rd()] = (instr.simm16() as i64 as u64).wrapping_sub(ctx.gpr[instr.ra()]);
            ctx.pc += 4;
        }
        PpcOpcode::mulli => {
-            // PPCBUG-004: 32-bit ABI. Read RA as i32 (low 32, sign-extended for
+            // PPCBUG-020 fix: full 64-bit low product of (full 64-bit RA) ×
-            // multiply), product fits in 32 bits per ISA (overflow wraps).
+            // EXTS(SI). Matches canary InstrEmit_mulli
-            let ra = ctx.gpr[instr.ra()] as i32 as i64;
+            // (`StoreGPR(Mul(LoadGPR(RA), Int64(EXTS(imm))))`).
+            let ra = ctx.gpr[instr.ra()] as i64;
            let imm = instr.simm16() as i64;
-            ctx.gpr[instr.rd()] = (ra.wrapping_mul(imm) as u32) as u64;
+            ctx.gpr[instr.rd()] = ra.wrapping_mul(imm) as u64;
            ctx.pc += 4;
        }
        // ===== ALU: Register =====
        PpcOpcode::addx => {
-            // PPCBUG-012+020: 32-bit ABI writeback truncation + CR0 i32 view.
+            // PPCBUG-020 fix: full 64-bit `RA + RB` (canary `Add(RA, RB)`).
+            // OV/CR0 keep their 32-bit computation (low 32 of the result is
+            // unchanged), so only the previously-zeroed upper 32 bits change.
            let ra32 = ctx.gpr[instr.ra()] as u32;
            let rb32 = ctx.gpr[instr.rb()] as u32;
            let result32 = ra32.wrapping_add(rb32);
-            ctx.gpr[instr.rd()] = result32 as u64;
+            ctx.gpr[instr.rd()] = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]);
            if instr.oe() {
                let true_sum = (ra32 as i32 as i128) + (rb32 as i32 as i128);
                overflow::apply(ctx, true_sum != (result32 as i32) as i128);
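Review note: the pattern this hunk applies to `addic`, `addx` and friends (full 64-bit writeback, carry still computed on the low 32 bits) can be sketched in isolation. This is a minimal standalone sketch, not the emulator's code: `addic_sketch` is a hypothetical helper name, and it takes plain integers instead of `PpcContext` and `DecodedInstr`.

```rust
/// Hypothetical helper (not part of the commit): models an `addic`-style add.
/// Returns the full 64-bit result and the carry computed on the low 32 bits,
/// mirroring the split the hunk describes.
fn addic_sketch(ra: u64, simm16: i16) -> (u64, u8) {
    // Writeback: full 64-bit RA + EXTS(SI).
    let result = ra.wrapping_add(simm16 as i64 as u64);

    // Carry: unsigned 32-bit compare on the truncated operands.
    let ra32 = ra as u32;
    let imm32 = simm16 as i32 as u32;
    let ca = (ra32.wrapping_add(imm32) < ra32) as u8;

    (result, ca)
}
```

With `ra = 0` and an immediate of `-1`, the sketch returns an all-ones 64-bit result with no carry, whereas a truncating writeback would have left the upper 32 bits zero.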
@@ -186,12 +198,13 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.pc += 4;
        }
        PpcOpcode::addcx => {
-            // PPCBUG-013+020: 32-bit truncation; CA from u32 unsigned compare.
+            // PPCBUG-020 fix: full 64-bit `RA + RB`; CA stays 32-bit (canary
+            // `AddDidCarry` truncates to int32). Low 32 of result unchanged.
            let ra32 = ctx.gpr[instr.ra()] as u32;
            let rb32 = ctx.gpr[instr.rb()] as u32;
            let result32 = ra32.wrapping_add(rb32);
            ctx.xer_ca = if result32 < ra32 { 1 } else { 0 };
-            ctx.gpr[instr.rd()] = result32 as u64;
+            ctx.gpr[instr.rd()] = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]);
            if instr.oe() {
                let true_sum = (ra32 as i32 as i128) + (rb32 as i32 as i128);
                overflow::apply(ctx, true_sum != (result32 as i32) as i128);
@@ -202,13 +215,15 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.pc += 4;
        }
        PpcOpcode::addex => {
-            // PPCBUG-014+020: 32-bit truncation; CA from u32 unsigned compare.
+            // PPCBUG-020 fix: full 64-bit `RA + RB + CA`; CA stays 32-bit.
            let ra32 = ctx.gpr[instr.ra()] as u32;
            let rb32 = ctx.gpr[instr.rb()] as u32;
            let ca = ctx.xer_ca as u32;
            let result32 = ra32.wrapping_add(rb32).wrapping_add(ca);
            ctx.xer_ca = if result32 < ra32 || (ca != 0 && result32 == ra32) { 1 } else { 0 };
-            ctx.gpr[instr.rd()] = result32 as u64;
+            ctx.gpr[instr.rd()] = ctx.gpr[instr.ra()]
+                .wrapping_add(ctx.gpr[instr.rb()])
+                .wrapping_add(ca as u64);
            if instr.oe() {
                let true_sum = (ra32 as i32 as i128) + (rb32 as i32 as i128) + (ca as i128);
                overflow::apply(ctx, true_sum != (result32 as i32) as i128);
@@ -219,12 +234,12 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.pc += 4;
        }
        PpcOpcode::addzex => {
-            // PPCBUG-015+020: 32-bit truncation.
+            // PPCBUG-020 fix: full 64-bit `RA + CA`; CA stays 32-bit.
            let ra32 = ctx.gpr[instr.ra()] as u32;
            let ca = ctx.xer_ca as u32;
            let result32 = ra32.wrapping_add(ca);
            ctx.xer_ca = if result32 < ra32 { 1 } else { 0 };
-            ctx.gpr[instr.rd()] = result32 as u64;
+            ctx.gpr[instr.rd()] = ctx.gpr[instr.ra()].wrapping_add(ca as u64);
            if instr.oe() {
                let true_sum = (ra32 as i32 as i128) + (ca as i128);
                overflow::apply(ctx, true_sum != (result32 as i32) as i128);
@@ -235,12 +250,12 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.pc += 4;
        }
        PpcOpcode::addmex => {
-            // PPCBUG-016+020: 32-bit truncation. RT = RA + CA - 1.
+            // PPCBUG-020 fix: full 64-bit `RA + CA - 1`; CA stays 32-bit.
            let ra32 = ctx.gpr[instr.ra()] as u32;
            let ca = ctx.xer_ca as u32;
            let result32 = ra32.wrapping_add(ca).wrapping_sub(1);
            ctx.xer_ca = if ra32 != 0 || ca != 0 { 1 } else { 0 };
-            ctx.gpr[instr.rd()] = result32 as u64;
+            ctx.gpr[instr.rd()] = ctx.gpr[instr.ra()].wrapping_add(ca as u64).wrapping_sub(1);
            if instr.oe() {
                let true_sum = (ra32 as i32 as i128) + (ca as i128) - 1;
                overflow::apply(ctx, true_sum != (result32 as i32) as i128);
@@ -251,11 +266,12 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.pc += 4;
        }
        PpcOpcode::subfx => {
-            // PPCBUG-017+020: 32-bit truncation.
+            // PPCBUG-020 fix: full 64-bit `RB - RA` (canary `Sub(RB, RA)`).
+            // OV/CR0 keep their 32-bit view (low 32 of result unchanged).
            let ra32 = ctx.gpr[instr.ra()] as u32;
            let rb32 = ctx.gpr[instr.rb()] as u32;
            let result32 = rb32.wrapping_sub(ra32);
-            ctx.gpr[instr.rd()] = result32 as u64;
+            ctx.gpr[instr.rd()] = ctx.gpr[instr.rb()].wrapping_sub(ctx.gpr[instr.ra()]);
            if instr.oe() {
                let true_diff = (rb32 as i32 as i128) - (ra32 as i32 as i128);
                overflow::apply(ctx, true_diff != (result32 as i32) as i128);
@@ -266,14 +282,13 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.pc += 4;
        }
        PpcOpcode::subfcx => {
-            // PPCBUG-007: 32-bit ABI. The `rb >= ra` u64 unsigned compare is
+            // PPCBUG-020 fix: full 64-bit `RB - RA`; CA stays a 32-bit `rb >= ra`
-            // exactly the shape that broke addis. Defensive 32-bit truncation
+            // compare (canary `SubDidCarry` truncates to int32).
-            // is required for correct CA even after upstream cleanup.
            let ra32 = ctx.gpr[instr.ra()] as u32;
            let rb32 = ctx.gpr[instr.rb()] as u32;
            let result32 = rb32.wrapping_sub(ra32);
            ctx.xer_ca = if rb32 >= ra32 { 1 } else { 0 };
-            ctx.gpr[instr.rd()] = result32 as u64;
+            ctx.gpr[instr.rd()] = ctx.gpr[instr.rb()].wrapping_sub(ctx.gpr[instr.ra()]);
            if instr.oe() {
                let true_diff = (rb32 as i32 as i128) - (ra32 as i32 as i128);
                overflow::apply(ctx, true_diff != (result32 as i32) as i128);
@@ -284,14 +299,16 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.pc += 4;
        }
        PpcOpcode::subfex => {
-            // PPCBUG-008: 32-bit ABI. Compute in u32 space — `!ra` on u64 always
+            // PPCBUG-020 fix: full 64-bit `~RA + RB + CA` (canary semantics).
-            // pollutes the upper 32 bits, making this an active poisoner.
+            // CA keeps its 32-bit compare. Low 32 of the result is unchanged.
            let ra32 = ctx.gpr[instr.ra()] as u32;
            let rb32 = ctx.gpr[instr.rb()] as u32;
            let ca = ctx.xer_ca as u32;
            let result32 = (!ra32).wrapping_add(rb32).wrapping_add(ca);
            ctx.xer_ca = if rb32 > ra32 || (rb32 == ra32 && ca != 0) { 1 } else { 0 };
-            ctx.gpr[instr.rd()] = result32 as u64;
+            ctx.gpr[instr.rd()] = (!ctx.gpr[instr.ra()])
+                .wrapping_add(ctx.gpr[instr.rb()])
+                .wrapping_add(ca as u64);
            if instr.oe() {
                // RT <- !RA + RB + CA  ==  RB - RA - 1 + CA  (32-bit semantics).
                let true_sum = (rb32 as i32 as i128) - (ra32 as i32 as i128) - 1 + (ca as i128);
@@ -303,14 +320,13 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.pc += 4;
        }
        PpcOpcode::subfzex => {
-            // PPCBUG-018: same active-poisoning shape as subfex; operate in u32.
+            // PPCBUG-020 fix: full 64-bit `~RA + CA` (canary semantics).
            let ra32 = ctx.gpr[instr.ra()] as u32;
            let ca = ctx.xer_ca as u32;
            let result32 = (!ra32).wrapping_add(ca);
-            // RT <- !RA + CA (no -1 term). 32-bit carry-out only when
+            // CA: 32-bit carry-out only when !ra32 = u32::MAX (ra32 = 0) AND ca = 1.
-            // !ra32 = u32::MAX (i.e. ra32 = 0) AND ca = 1.
            ctx.xer_ca = if ra32 == 0 && ca != 0 { 1 } else { 0 };
-            ctx.gpr[instr.rd()] = result32 as u64;
+            ctx.gpr[instr.rd()] = (!ctx.gpr[instr.ra()]).wrapping_add(ca as u64);
            if instr.oe() {
                let true_sum = -(ra32 as i32 as i128) - 1 + (ca as i128);
                overflow::apply(ctx, true_sum != (result32 as i32) as i128);
@@ -321,13 +337,13 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.pc += 4;
        }
        PpcOpcode::subfmex => {
-            // PPCBUG-019: also fixes the always-true CA edge — `!ra` on u64
+            // PPCBUG-020 fix: full 64-bit `~RA + CA - 1` (canary semantics). CA
-            // is non-zero when ra32==0xFFFFFFFF and ca==0, so CA was stuck at 1.
+            // uses the 32-bit `!ra32` so it isn't stuck at 1 from u64 inversion.
            let ra32 = ctx.gpr[instr.ra()] as u32;
            let ca = ctx.xer_ca as u32;
            let result32 = (!ra32).wrapping_add(ca).wrapping_sub(1);
            ctx.xer_ca = if (!ra32) != 0 || ca != 0 { 1 } else { 0 };
-            ctx.gpr[instr.rd()] = result32 as u64;
+            ctx.gpr[instr.rd()] = (!ctx.gpr[instr.ra()]).wrapping_add(ca as u64).wrapping_sub(1);
            if instr.oe() {
                let true_sum = -(ra32 as i32 as i128) - 2 + (ca as i128);
                overflow::apply(ctx, true_sum != (result32 as i32) as i128);
@@ -338,12 +354,11 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.pc += 4;
        }
        PpcOpcode::negx => {
-            // PPCBUG-006: 32-bit ABI. `(!ra).wrapping_add(1)` on u64 always
+            // PPCBUG-020 fix: full 64-bit `-RA` (canary `Sub(0, RA)`). OV keeps
-            // sets upper 32 bits — every neg poisoned the GPR. neg_ov also
+            // the 32-bit INT_MIN check (low 32 of the result is unchanged).
-            // checks at 64-bit INT_MIN; should be 32-bit INT_MIN.
            let ra32 = ctx.gpr[instr.ra()] as u32;
            let result32 = (!ra32).wrapping_add(1);
-            ctx.gpr[instr.rd()] = result32 as u64;
+            ctx.gpr[instr.rd()] = 0u64.wrapping_sub(ctx.gpr[instr.ra()]);
            if instr.oe() {
                overflow::apply(ctx, ra32 == 0x8000_0000);
            }
@@ -353,12 +368,15 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.pc += 4;
        }
        PpcOpcode::mullwx => {
-            // PPCBUG-009: 32-bit ABI. Truncate product to u32 — overflow detection
+            // PPCBUG-020 fix: full 64-bit low product of EXTS(RA[32:63]) ×
-            // (mullw_ov) still uses the full i64 product to catch the overflow.
+            // EXTS(RB[32:63]) (canary InstrEmit_mullwx stores the full i64
+            // product). A 32×32 product can occupy the upper 32 bits (e.g.
+            // 0x10000 × 0x10000 = 0x1_0000_0000); the old `as u32` dropped them.
+            // OV uses the full product; CR0 keeps its 32-bit (low-word) view.
            let ra = ctx.gpr[instr.ra()] as i32 as i64;
            let rb = ctx.gpr[instr.rb()] as i32 as i64;
            let product = ra.wrapping_mul(rb);
-            ctx.gpr[instr.rd()] = product as u32 as u64;
+            ctx.gpr[instr.rd()] = product as u64;
            if instr.oe() {
                overflow::apply(ctx, overflow::mullw_ov(product));
            }
@@ -542,6 +560,18 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] | ctx.gpr[instr.rb()];
            if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); }
            ctx.pc += 4;
+            // `or r31,r31,r31` with encoding 0x7FFFFB78 is the Xenon `db16cyc`
+            // spin-wait hint (a no-op write of r31 onto itself). Canary's
+            // `InstrEmit_orx` special-cases exactly this code → `DelayExecution()`.
+            // Under our round-robin lockstep, a guest spinlock/barrier loop that
+            // executes db16cyc would otherwise consume its whole block every round
+            // and starve the co-located thread it is waiting on (the lock holder /
+            // barrier peer). Surface it as a cooperative yield so the scheduler can
+            // hand the slot to a Ready peer. The semantic result of the op is
+            // already applied (r31 |= r31 is a no-op), so yielding is value-neutral.
+            if instr.raw == 0x7FFF_FB78 {
+                return StepResult::Yield;
+            }
        }
        PpcOpcode::orcx => {
            // PPCBUG-028: same shape as andcx — operate in u32.
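Review note: the exact-encoding check in the `orx` hunk above can be reproduced outside the interpreter. A minimal sketch, assuming the X-form layout already used by the tests in this diff (primary opcode 31, XO 444, Rc = 0); `encode_or_self` and `is_db16cyc` are hypothetical names introduced only for illustration.

```rust
/// Raw encoding of `or r31,r31,r31`, the db16cyc spin-wait hint.
const DB16CYC: u32 = 0x7FFF_FB78;

/// Hypothetical helper: encode `or rN,rN,rN` with Rc = 0
/// (primary opcode 31, extended opcode 444).
fn encode_or_self(r: u32) -> u32 {
    (31u32 << 26) | (r << 21) | (r << 16) | (r << 11) | (444u32 << 1)
}

/// Hypothetical helper: only the exact db16cyc word counts as a yield hint;
/// every other `or rN,rN,rN` stays a plain no-op.
fn is_db16cyc(raw: u32) -> bool {
    raw == DB16CYC
}
```

Comparing the whole instruction word, rather than decoding the register fields, keeps the check from matching any other `or rN,rN,rN` form, which is the invariant the added tests lock in.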
@@ -620,7 +650,12 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
        PpcOpcode::slwx => {
            // PPCBUG-044: 32-bit ABI CR0 view. A result with bit 31 set
            // (e.g. 0x80000000) is negative in i32 view but positive in i64.
-            let sh = ctx.gpr[instr.rb()] as u32;
+            // Shift amount is RB[58:63] (6 bits): if >=32 the result is zeroed,
+            // else shift by the low bits. Matches canary InstrEmit_slwx, which
+            // masks `rb & 0x3F` then tests bit 5 — NOT a full-u32 `< 32` test
+            // (a count like 0x40 has low-6-bits 0 and must pass the value
+            // through, not zero it).
+            let sh = ctx.gpr[instr.rb()] as u32 & 0x3F;
            ctx.gpr[instr.ra()] = if sh < 32 {
                ((ctx.gpr[instr.rs()] as u32) << sh) as u64
            } else { 0 };
@@ -630,7 +665,9 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
        PpcOpcode::srwx => {
            // PPCBUG-044: 32-bit ABI CR0 view (zero-extended right shift can never
            // have bit 31 set, but use the canonical form for consistency).
-            let sh = ctx.gpr[instr.rb()] as u32;
+            // Shift amount masked to RB[58:63] (6 bits) to match canary
+            // InstrEmit_srwx (`rb & 0x3F`, test bit 5).
+            let sh = ctx.gpr[instr.rb()] as u32 & 0x3F;
            ctx.gpr[instr.ra()] = if sh < 32 {
                ((ctx.gpr[instr.rs()] as u32) >> sh) as u64
            } else { 0 };
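Review note: the 6-bit shift-count masking introduced for `slw`/`srw` is easy to check on plain integers. A minimal sketch under that assumption; `slw_sketch` and `srw_sketch` are hypothetical helpers, not functions from the crate.

```rust
/// Hypothetical helper: `slw` on plain integers. The count is RB's low
/// 6 bits; a masked count of 32..=63 zeroes the result.
fn slw_sketch(rs: u64, rb: u64) -> u64 {
    let sh = (rb as u32) & 0x3F;
    if sh < 32 { ((rs as u32) << sh) as u64 } else { 0 }
}

/// Hypothetical helper: `srw`, same masking, logical right shift.
fn srw_sketch(rs: u64, rb: u64) -> u64 {
    let sh = (rb as u32) & 0x3F;
    if sh < 32 { ((rs as u32) >> sh) as u64 } else { 0 }
}
```

A count of `0x40` masks to 0 and passes the value through, while `0x60` masks to 32 and zeroes it; an unmasked `< 32` test would zero both.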
@@ -638,37 +675,46 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.pc += 4;
        }
        PpcOpcode::srawx => {
-            // PPCBUG-041+043 coupled: 32-bit ABI writeback truncation + CR0 i32.
+            // sraw: 32-bit arithmetic shift right. Per PowerISA the 32-bit result
-            // CA logic is independently correct (uses u32 shifted-out test).
+            // is SIGN-extended into the full 64-bit RA (`RA <- r&m | (i64.s)&¬m`),
+            // matching canary InstrEmit_srawx (`v = f.SignExtend(v, INT64_TYPE)`).
+            // Earlier ours zero-extended (`result as u32 as u64`) — the PPCBUG-041
+            // "writeback truncation" band-aid — which corrupts any negative shift
+            // result consumed as a 64-bit value. CA logic is independently correct
+            // (uses the u32 shifted-out test) and the CR0 view is unchanged (the
+            // sign-extended i64 has the same i32 view).
            let rs = ctx.gpr[instr.rs()] as i32;
            let sh = ctx.gpr[instr.rb()] as u32 & 0x3F;
-            if sh == 0 {
+            let result: i32 = if sh == 0 {
-                ctx.gpr[instr.ra()] = rs as u32 as u64;
                ctx.xer_ca = 0;
+                rs
            } else if sh < 32 {
-                let result = rs >> sh;
                ctx.xer_ca = if rs < 0 && (rs as u32) << (32 - sh) != 0 { 1 } else { 0 };
-                ctx.gpr[instr.ra()] = result as u32 as u64;
+                rs >> sh
            } else {
-                ctx.gpr[instr.ra()] = if rs < 0 { 0xFFFF_FFFFu64 } else { 0 };
+                // sh >= 32: result is all sign bits of rs.
                ctx.xer_ca = if rs < 0 { 1 } else { 0 };
-            }
+                rs >> 31
-            if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); }
+            };
+            ctx.gpr[instr.ra()] = result as i64 as u64;
+            if instr.rc_bit() { ctx.update_cr_signed(0, result as i64); }
            ctx.pc += 4;
        }
        PpcOpcode::srawix => {
-            // PPCBUG-042+043 coupled: same shape as srawx for the sh-immediate form.
+            // srawi: same as srawx for the sh-immediate form (sh in 0..31).
+            // Sign-extend the 32-bit result into the full 64-bit RA per PowerISA /
+            // canary InstrEmit_srawix.
            let rs = ctx.gpr[instr.rs()] as i32;
            let sh = instr.sh();
-            if sh == 0 {
+            let result: i32 = if sh == 0 {
-                ctx.gpr[instr.ra()] = rs as u32 as u64;
                ctx.xer_ca = 0;
+                rs
            } else {
-                let result = rs >> sh;
                ctx.xer_ca = if rs < 0 && (rs as u32) << (32 - sh) != 0 { 1 } else { 0 };
-                ctx.gpr[instr.ra()] = result as u32 as u64;
+                rs >> sh
-            }
+            };
-            if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); }
+            ctx.gpr[instr.ra()] = result as i64 as u64;
+            if instr.rc_bit() { ctx.update_cr_signed(0, result as i64); }
            ctx.pc += 4;
        }
        PpcOpcode::sldx => {
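Review note: the `sraw` change above (sign-extend the 32-bit result into the 64-bit register, carry only when a negative source shifts out 1 bits) can be modelled on plain integers. A minimal sketch, assuming the same three-way split on the masked count as the hunk; `sraw_sketch` is a hypothetical helper, not the crate's API.

```rust
/// Hypothetical helper: `sraw` on plain integers.
/// Returns the sign-extended 64-bit result and the carry bit.
fn sraw_sketch(rs: u64, rb: u64) -> (u64, u8) {
    let rs32 = rs as u32 as i32;
    let sh = (rb as u32) & 0x3F;
    let (result, ca) = if sh == 0 {
        (rs32, 0u8)
    } else if sh < 32 {
        // CA is set only if the source is negative and 1 bits are shifted out.
        let lost = (rs32 as u32) << (32 - sh) != 0;
        (rs32 >> sh, (rs32 < 0 && lost) as u8)
    } else {
        // Counts 32..=63: the result is all sign bits.
        (rs32 >> 31, (rs32 < 0) as u8)
    };
    // Sign-extend the 32-bit result into 64 bits (the previous code zero-extended).
    (result as i64 as u64, ca)
}
```

For a source of `0xFFFF_FFFE` shifted by 1 the sketch yields an all-ones 64-bit value with CA clear, where the earlier zero-extending writeback would have produced `0x0000_0000_FFFF_FFFF`.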
@@ -1605,7 +1651,12 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            match spr {
                crate::context::spr::XER => ctx.set_xer(val as u32),
                crate::context::spr::LR => ctx.lr = val,
-                crate::context::spr::CTR => ctx.ctr = val as u32 as u64,
+                // CTR is a 64-bit SPR — store the full GPR, matching canary
+                // InstrEmit_mtspr (`f.StoreCTR(rt)`, no truncation). The PPCBUG-054
+                // `val as u32 as u64` band-aid dropped the upper 32 bits, which a
+                // later `mfspr rX, CTR` would read back wrong. (bdnz/bcctr only
+                // ever consume CTR's low 32 bits, so branching is unaffected.)
+                crate::context::spr::CTR => ctx.ctr = val,
                crate::context::spr::DEC => ctx.dec = val as u32,
                crate::context::spr::TBL_WRITE => {
                    ctx.timebase = (ctx.timebase & 0xFFFF_FFFF_0000_0000) | (val & 0xFFFF_FFFF);
@@ -5015,6 +5066,106 @@ mod tests {
        assert_eq!(ctx.pc, 4);
    }
+    #[test]
+    fn test_db16cyc_yields() {
+        // `or r31,r31,r31` encoding 0x7FFFFB78 is the Xenon db16cyc spin hint.
+        // It must (a) be value-neutral (r31 unchanged), (b) advance PC, and
+        // (c) report StepResult::Yield so the scheduler can hand off the slot.
+        let mut ctx = PpcContext::new();
+        let mut mem = TestMem::new();
+        write_instr(&mut mem, 0, 0x7FFF_FB78);
+        ctx.pc = 0;
+        ctx.gpr[31] = 0x1234_5678_9ABC_DEF0;
+        let r = step(&mut ctx, &mut mem);
+        assert_eq!(ctx.gpr[31], 0x1234_5678_9ABC_DEF0, "db16cyc is value-neutral");
+        assert_eq!(ctx.pc, 4, "PC advances past the hint");
+        assert_eq!(r, StepResult::Yield, "db16cyc surfaces as a cooperative yield");
+    }
+    #[test]
+    fn test_plain_or_self_is_not_yield() {
+        // A regular `or rN,rN,rN` that is NOT the db16cyc encoding (e.g. r3)
+        // is an ordinary no-op move and must keep executing (Continue), so we
+        // only yield on the exact spin-hint code canary special-cases.
+        let mut ctx = PpcContext::new();
+        let mut mem = TestMem::new();
+        // or r3, r3, r3  (RT=RA=RB=3, Rc=0): 31<<26 | 3<<21 | 3<<16 | 3<<11 | 444<<1
+        let raw = (31u32 << 26) | (3 << 21) | (3 << 16) | (3 << 11) | (444 << 1);
+        write_instr(&mut mem, 0, raw);
+        ctx.pc = 0;
+        ctx.gpr[3] = 0xCAFE;
+        let r = step(&mut ctx, &mut mem);
+        assert_eq!(ctx.gpr[3], 0xCAFE);
+        assert_eq!(ctx.pc, 4);
+        assert_eq!(r, StepResult::Continue, "non-db16cyc or-self stays Continue");
+    }
+    #[test]
+    fn test_smt_priority_hints_are_nops_not_yields() {
+        // iterate-2H spin/yield/sync hint-class audit. The PowerPC SMT
+        // thread-priority hints `or 1,1,1` / `or 2,2,2` / `or 3,3,3` / `or 6,6,6`
+        // (and the db8cyc family `or 26..30`) are reserved no-op encodings.
+        // Canary's `InstrEmit_orx` emits `f.Nop()` for EVERY `or rX,rX,rX`
+        // (RT==RB==RA && !Rc) form EXCEPT the exact db16cyc code 0x7FFFFB78,
+        // which alone gets `f.DelayExecution()`. So ours must NOT yield on any
+        // of these — over-yielding would diverge from canary and perturb the
+        // deterministic schedule. (Audit evidence: none of 1/2/3/6/26..30 even
+        // appear in Sylpheed's image; only `or 31,31,31` (db16cyc) is used as a
+        // spin hint. This test locks the no-over-yield invariant regardless.)
+        for r in [1u32, 2, 3, 6, 26, 27, 28, 29, 30] {
+            let mut ctx = PpcContext::new();
+            let mut mem = TestMem::new();
+            // or rN,rN,rN, Rc=0: 31<<26 | r<<21 | r<<16 | r<<11 | 444<<1
+            let raw = (31u32 << 26) | (r << 21) | (r << 16) | (r << 11) | (444 << 1);
+            write_instr(&mut mem, 0, raw);
+            ctx.pc = 0;
+            ctx.gpr[r as usize] = 0xDEAD_BEEF_F00D_BA11;
+            let res = step(&mut ctx, &mut mem);
+            assert_eq!(
+                ctx.gpr[r as usize], 0xDEAD_BEEF_F00D_BA11,
+                "or {r},{r},{r} is value-neutral"
+            );
+            assert_eq!(ctx.pc, 4, "or {r},{r},{r} advances PC");
+            assert_eq!(
+                res,
+                StepResult::Continue,
+                "priority hint or {r},{r},{r} is a plain no-op (canary Nop), NOT a yield"
+            );
+        }
+    }
+    #[test]
+    fn test_lwsync_ptesync_eieio_isync_decode_as_benign_noops() {
+        // Memory/sync barrier class. Canary keys `sync` on XO=598 only, so
+        // sync (L=0), lwsync (L=1), ptesync (L=2) all map to the same
+        // `InstrEmit_sync` -> `MemoryBarrier`; `eieio` -> `MemoryBarrier`;
+        // `isync` -> `Nop`. Under our single-host interpreter every one is a
+        // value-neutral no-op that advances PC and must DECODE (never trap as
+        // unknown). This guards the L-field disambiguation and the decode path.
+        let cases: &[(u32, &str)] = &[
+            (0x7C00_04AC, "sync"),    // L=0
+            (0x7C20_04AC, "lwsync"),  // L=1
+            (0x7C40_04AC, "ptesync"), // L=2
+            (0x7C00_06AC, "eieio"),
+            (0x4C00_012C, "isync"),
+        ];
+        for &(raw, name) in cases {
+            let mut ctx = PpcContext::new();
+            let mut mem = TestMem::new();
+            let pre_xer = ctx.xer();
+            let pre_fpscr = ctx.fpscr;
+            let pre_gpr = ctx.gpr;
+            write_instr(&mut mem, 0x200, raw);
+            ctx.pc = 0x200;
+            let res = step(&mut ctx, &mut mem);
+            assert_eq!(res, StepResult::Continue, "{name} continues");
+            assert_eq!(ctx.pc, 0x204, "{name} advances PC (decoded, did not trap)");
+            assert_eq!(ctx.xer(), pre_xer, "{name} leaves XER");
+            assert_eq!(ctx.fpscr, pre_fpscr, "{name} leaves FPSCR");
+            assert_eq!(ctx.gpr, pre_gpr, "{name} leaves GPRs");
+        }
+    }
    #[test]
    fn test_fadd() {
        let mut ctx = PpcContext::new();
@@ -5332,15 +5483,17 @@ mod tests {
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        step(&mut ctx, &mut mem);
-        assert_eq!(ctx.xer_ov, 1);
+        assert_eq!(ctx.xer_ov, 1, "32-bit INT_MIN check (preserved) sets OV");
-        // -INT_MIN wraps to INT_MIN (low 32 bits) with upper 32 bits zero.
+        // PPCBUG-020 fix: neg is full 64-bit `0 - RA` (canary `Sub(0, RA)`).
-        assert_eq!(ctx.gpr[5], 0x0000_0000_8000_0000);
+        // RA = 0x0000_0000_8000_0000 → 0xFFFF_FFFF_8000_0000. (OV remains the
+        // preserved 32-bit INT_MIN flag.)
+        assert_eq!(ctx.gpr[5], 0xFFFF_FFFF_8000_0000);
    }
    #[test]
    fn neg_clean_input_no_upper_bits() {
-        // PPCBUG-006 regression: neg r3=5 must produce 0x00000000_FFFFFFFB,
+        // PPCBUG-020 fix: neg r3=5 = `0 - 5` = -5 = 0xFFFFFFFF_FFFFFFFB on a
-        // not 0xFFFFFFFF_FFFFFFFB (the 64-bit !ra-then-add-1 result).
+        // 64-bit core (canary `Sub(0, RA)`), not the truncated 0x00000000_FFFFFFFB.
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        ctx.gpr[3] = 5;
@@ -5348,7 +5501,7 @@ mod tests {
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        step(&mut ctx, &mut mem);
-        assert_eq!(ctx.gpr[5], 0x0000_0000_FFFF_FFFB);
+        assert_eq!(ctx.gpr[5], 0xFFFF_FFFF_FFFF_FFFB);
    }
    #[test]
@@ -5502,9 +5655,10 @@ mod tests {
    }
    #[test]
-    fn mullwx_overflow_truncates_to_32() {
+    fn mullwx_overflow_keeps_full_64bit_product() {
-        // PPCBUG-009: mullwo r5, r3, r4 with ra=0x10000, rb=0x10000 → product
+        // PPCBUG-020 fix: mullwo r5, r3, r4 with ra=0x10000, rb=0x10000 → full
-        // 0x100000000 (overflow). Low 32 = 0; OE must fire.
+        // 64-bit product 0x1_0000_0000 (canary stores the full i64 product, not
+        // the truncated low 32). OE still fires (the product overflows int32).
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        ctx.gpr[3] = 0x10000;
@@ -5514,7 +5668,7 @@ mod tests {
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        step(&mut ctx, &mut mem);
-        assert_eq!(ctx.gpr[5], 0, "low 32 bits = 0");
+        assert_eq!(ctx.gpr[5], 0x0000_0001_0000_0000, "full 64-bit product");
        assert_eq!(ctx.xer_ov, 1, "overflow detected");
    }
@@ -5536,9 +5690,74 @@ mod tests {
    }
    #[test]
-    fn srawx_negative_value_zero_extends_upper() {
+    fn slwx_shift_count_masks_to_6_bits() {
-        // PPCBUG-041+043: srawx of negative i32 by 1 produces a negative i32;
+        // slw masks the shift count to RB[58:63] (6 bits): a count of 0x40 has
-        // writeback must zero-extend to u64 (not sign-extend).
+        // low-6-bits 0, so the value passes through unchanged — it must NOT be
        // zeroed by a naive full-u32 `>= 32` test. Matches canary InstrEmit_slwx.
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        ctx.gpr[3] = 0x0000_1234u64;
        ctx.gpr[4] = 0x40; // count & 0x3F == 0 → shift by 0
        // slwx r5, r3, r4 (XO=24)
        let raw = (31u32 << 26) | (3 << 21) | (5 << 16) | (4 << 11) | (24 << 1);
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        step(&mut ctx, &mut mem);
        assert_eq!(ctx.gpr[5], 0x0000_1234u64, "0x40 masks to 0 → passthrough");
    }
    #[test]
    fn slwx_count_32_to_63_zeroes() {
        // A masked count in [32,63] (bit 5 set) zeroes the result.
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        ctx.gpr[3] = 0xFFFF_FFFFu64;
        ctx.gpr[4] = 0x60; // & 0x3F = 0x20 (32) → zero
        let raw = (31u32 << 26) | (3 << 21) | (5 << 16) | (4 << 11) | (24 << 1);
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        step(&mut ctx, &mut mem);
        assert_eq!(ctx.gpr[5], 0);
    }
    #[test]
    fn srwx_shift_count_masks_to_6_bits() {
        // srw, same 6-bit mask. Count 0x48 → low-6-bits = 8 → logical >> 8.
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        ctx.gpr[3] = 0x0000_FF00u64;
        ctx.gpr[4] = 0x48; // & 0x3F = 8
        // srwx r5, r3, r4 (XO=536)
        let raw = (31u32 << 26) | (3 << 21) | (5 << 16) | (4 << 11) | (536 << 1);
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        step(&mut ctx, &mut mem);
        assert_eq!(ctx.gpr[5], 0x0000_00FFu64, "0x48 masks to 8 → >>8");
    }
    #[test]
    fn rlwinm_mb_greater_than_me_wraparound_mask() {
        // rlwinm with MB > ME produces a wraparound mask covering bits
        // [0..ME] ∪ [MB..31] (a "split" mask). PowerISA MASK(mb,me) wraps when
        // mb > me. Here rotate by 0, MB=28, ME=3 → mask = 0xF000000F.
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        ctx.gpr[3] = 0xFFFF_FFFFu64;
        // rlwinm r5, r3, SH=0, MB=28, ME=3 (opcode 21)
        let raw = (21u32 << 26) | (3 << 21) | (5 << 16) | (0 << 11) | (28 << 6) | (3 << 1);
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        step(&mut ctx, &mut mem);
        assert_eq!(ctx.gpr[5], 0x0000_0000_F000_000Fu64,
                   "MB>ME wraparound mask = bits [0..3] | [28..31]");
    }
    #[test]
    fn srawx_negative_value_sign_extends_upper() {
        // sraw of negative i32 by 1 produces a negative i32 result that PowerISA
        // SIGN-extends into the full 64-bit RA (canary InstrEmit_srawx uses
        // `f.SignExtend`). 0x80000000 >> 1 = 0xC0000000 (i32) → 0xFFFFFFFF_C0000000.
        // (Was 0x00000000_C0000000 under the PPCBUG-041 zero-extend band-aid.)
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        ctx.gpr[3] = 0x8000_0000u64; // i32::MIN
@@ -5548,14 +5767,15 @@ mod tests {
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        step(&mut ctx, &mut mem);
-        assert_eq!(ctx.gpr[5], 0x0000_0000_C000_0000u64);
+        assert_eq!(ctx.gpr[5], 0xFFFF_FFFF_C000_0000u64);
        assert!(ctx.cr[0].lt);
    }
    #[test]
-    fn srawix_high_count_negative_input_yields_low32_all_ones() {
+    fn srawix_high_count_negative_input_sign_extends_all_ones() {
-        // PPCBUG-042+043: srawi with count=31 on negative input → low 32 bits
+        // srawi count=31 on negative input → result is -1 (0xFFFFFFFF as i32),
-        // all ones (0xFFFFFFFF), upper 32 zero (was u64::MAX before fix).
+        // sign-extended to the full 64-bit RA: 0xFFFFFFFF_FFFFFFFF (canary
+        // InstrEmit_srawix). Was 0x00000000_FFFFFFFF under the zero-extend band-aid.
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        ctx.gpr[3] = 0x8000_0000u64;
@@ -5564,7 +5784,7 @@ mod tests {
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        step(&mut ctx, &mut mem);
-        assert_eq!(ctx.gpr[5], 0x0000_0000_FFFF_FFFFu64);
+        assert_eq!(ctx.gpr[5], 0xFFFF_FFFF_FFFF_FFFFu64);
    }
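The shift and mask rules these tests pin down can be sketched in isolation. This is a minimal illustration, not the interpreter's code: the free functions `slw`, `srw`, `sraw` and `mask32` are hypothetical names, CA/CR updates are left out, and `mask32` assumes `mb` and `me` are both below 32.

```rust
/// slw: shift the low 32 bits left. The count is the low 6 bits of RB, so a
/// masked count of 32..=63 zeroes the result and 0x40 behaves like 0.
fn slw(rs: u64, rb: u64) -> u64 {
    let n = (rb & 0x3F) as u32;
    if n >= 32 { 0 } else { ((rs as u32) << n) as u64 }
}

/// srw: logical right shift of the low 32 bits, same 6-bit count mask.
fn srw(rs: u64, rb: u64) -> u64 {
    let n = (rb & 0x3F) as u32;
    if n >= 32 { 0 } else { ((rs as u32) >> n) as u64 }
}

/// sraw: arithmetic right shift of the low 32 bits, with the 32-bit result
/// sign-extended into the full 64-bit register.
fn sraw(rs: u64, rb: u64) -> u64 {
    let n = (rb & 0x3F) as u32;
    let v = rs as u32 as i32;
    // A count of 32..=63 leaves only copies of the sign bit.
    let shifted = if n >= 32 { v >> 31 } else { v >> n };
    shifted as i64 as u64
}

/// 32-bit MASK(mb, me) in big-endian bit numbering (bit 0 is the MSB).
/// When mb > me the mask wraps: bits 0..=me together with bits mb..=31.
fn mask32(mb: u32, me: u32) -> u32 {
    let from_mb = u32::MAX >> mb; // bits mb..=31
    let through_me = u32::MAX << (31 - me); // bits 0..=me
    if mb <= me { from_mb & through_me } else { from_mb | through_me }
}
```

The values used in the tests above (counts 0x40, 0x60 and 0x48, and MB=28 with ME=3) fall straight out of these helpers.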
    #[test]
@@ -5598,17 +5818,18 @@ mod tests {
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        step(&mut ctx, &mut mem);
-        // Result low 32: 0x00000001 + 0xFFFFFFFF = 0x00000000 with carry.
+        // PPCBUG-020 fix: full 64-bit `RA + EXTS(-1)` = 0xFFFFFFFF_00000001 +
-        assert_eq!(ctx.gpr[4], 0);
+        // 0xFFFFFFFF_FFFFFFFF = 0xFFFFFFFF_00000000 (canary). CA still comes
+        // from the 32-bit compare (low 32: 0x00000001 + 0xFFFFFFFF = 0, carry).
+        assert_eq!(ctx.gpr[4], 0xFFFFFFFF_00000000u64);
        assert_eq!(ctx.xer_ca, 1, "32-bit compare must see CA=1");
    }
    #[test]
-    fn mulli_overflow_wraps_to_32() {
+    fn mulli_full_64bit_product() {
-        // PPCBUG-004: mulli must truncate to 32 bits even when the upper 32 bits
+        // PPCBUG-020 fix: mulli uses the full 64-bit RA (canary
-        // of RA are polluted (e.g. by upstream bugs). Pre-fix: ra = u64::MAX as
+        // `Mul(LoadGPR(RA), Int64(EXTS(imm)))`). RA = u64::MAX = -1, × 2 = -2
-        // i64 = -1, * 2 = -2, written to GPR as `0xFFFFFFFF_FFFFFFFE`. Post-fix:
+        // = 0xFFFFFFFF_FFFFFFFE (full 64-bit), not the truncated 0xFFFFFFFE.
-        // truncated to `0xFFFFFFFE`. Discriminating regression test.
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        ctx.gpr[3] = u64::MAX;
@@ -5617,13 +5838,14 @@ mod tests {
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        step(&mut ctx, &mut mem);
-        assert_eq!(ctx.gpr[4], 0xFFFF_FFFEu64, "low 32 bits = -2 in i32; upper 32 zero");
+        assert_eq!(ctx.gpr[4], 0xFFFF_FFFF_FFFF_FFFEu64, "full 64-bit -2");
    }
    #[test]
-    fn subficx_neg_simm_zero_extends() {
+    fn subficx_full_64bit_result() {
-        // PPCBUG-005: subfic r4, r3, -1 with r3=5: imm-ra = 0xFFFFFFFF - 5 = 0xFFFFFFFA.
+        // PPCBUG-020 fix: subfic r4, r3, -1 with r3=5 = `EXTS(-1) - RA` =
-        // Buggy form: imm sign-extended to u64 0xFFFFFFFFFFFFFFFF - 5 = poisoned.
+        // 0xFFFFFFFF_FFFFFFFF - 5 = 0xFFFFFFFF_FFFFFFFA (canary `Sub(Int64(
+        // EXTS(imm)), RA)`). CA stays a 32-bit compare (0xFFFFFFFF >= 5 → 1).
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        ctx.gpr[3] = 5;
@@ -5632,7 +5854,7 @@ mod tests {
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        step(&mut ctx, &mut mem);
-        assert_eq!(ctx.gpr[4], 0x0000_0000_FFFF_FFFAu64);
+        assert_eq!(ctx.gpr[4], 0xFFFF_FFFF_FFFF_FFFAu64);
        assert_eq!(ctx.xer_ca, 1, "0xFFFFFFFF >= 5 → CA=1");
    }
@@ -6538,12 +6760,13 @@ mod tests {
        assert_eq!(ctx.pc, 4);
    }
-    // PPCBUG-054: mtspr CTR must truncate the source GPR to 32 bits, matching
+    // CTR is a 64-bit SPR. mtspr CTR stores the full GPR (canary
-    // canary's `f.Truncate(ctr, INT32_TYPE)`. Prevents upstream 64-bit GPR
+    // InstrEmit_mtspr: `f.StoreCTR(rt)`, no truncation). The bdnz/bclr zero-TEST
-    // pollution from poisoning the 32-bit CTR counter independently of the
+    // still truncates to 32 bits (separate, canary-faithful — see the bcx tests
-    // bcx zero-test fix.
+    // above); the earlier PPCBUG-054 store-side truncation was a band-aid that a
+    // later `mfspr rX, CTR` would read back wrong.
    #[test]
-    fn mtspr_ctr_truncates_to_32_bits() {
+    fn mtspr_ctr_keeps_full_64_bits() {
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        ctx.gpr[3] = 0xFFFF_FFFF_8000_0001;
@@ -6553,7 +6776,26 @@ mod tests {
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        step(&mut ctx, &mut mem);
-        assert_eq!(ctx.ctr, 0x8000_0001);
+        assert_eq!(ctx.ctr, 0xFFFF_FFFF_8000_0001);
    }
    // mfspr rX, CTR must read back the full 64-bit CTR (round-trips the value
    // mtspr stored). This is the observable consequence of the mtspr fix.
    #[test]
    fn mfspr_ctr_reads_full_64_bits() {
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        ctx.gpr[3] = 0xFFFF_FFFF_8000_0001;
        // mtspr CTR, r3 then mfspr r5, CTR
        let spr_swapped = ((9u32 & 0x1F) << 5) | ((9u32 >> 5) & 0x1F);
        let mt = (31u32 << 26) | (3 << 21) | (spr_swapped << 11) | (467 << 1);
        let mf = (31u32 << 26) | (5 << 21) | (spr_swapped << 11) | (339 << 1);
        write_instr(&mut mem, 0, mt);
        write_instr(&mut mem, 4, mf);
        ctx.pc = 0;
        step(&mut ctx, &mut mem);
        step(&mut ctx, &mut mem);
        assert_eq!(ctx.gpr[5], 0xFFFF_FFFF_8000_0001);
    }
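A small model of the split described in the comment above: the store and read-back keep all 64 bits, while only the branch zero-test looks at the low 32. The `Ctr` type and its method names are hypothetical, and the 64-bit wrapping decrement is an assumption made for illustration, not something this diff shows.

```rust
/// Hypothetical stand-in for the interpreter's CTR register.
struct Ctr(u64);

impl Ctr {
    /// mtspr CTR, rS: store the full 64-bit GPR, no truncation.
    fn mtspr(&mut self, rs: u64) {
        self.0 = rs;
    }

    /// mfspr rD, CTR: read back exactly what was stored.
    fn mfspr(&self) -> u64 {
        self.0
    }

    /// bdnz-style step: decrement (assumed 64-bit wrapping here), then
    /// report "branch taken" when the LOW 32 bits are non-zero.
    fn decrement_and_test(&mut self) -> bool {
        self.0 = self.0.wrapping_sub(1);
        (self.0 as u32) != 0
    }
}
```

Under this model a CTR of 0xFFFF_FFFF_0000_0001 ends the loop after one decrement even though the upper half is non-zero, and a later read still returns the full 64-bit value.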
    // ───────────────────────────────────────────────────────────────────────
@@ -7640,8 +7882,8 @@ mod tests {
            ctx.xer_ca = 0;
            step(&mut ctx, &mem);
            assert_eq!(ctx.xer_ca, 0, "ra=0, ca=0 should produce CA=0");
-            // PPCBUG-018: 32-bit ABI. !0u32 + 0 = u32::MAX, with upper 32 bits zero.
+            // PPCBUG-020 fix: full 64-bit `!RA + CA` = !0u64 + 0 = u64::MAX.
-            assert_eq!(ctx.gpr[3], 0xFFFF_FFFFu64, "result = !0u32 + 0 = u32::MAX");
+            assert_eq!(ctx.gpr[3], 0xFFFF_FFFF_FFFF_FFFFu64, "result = !0u64 + 0");
        }
        // Case 3: ra=1, ca=0 → CA=0  (old buggy code reported CA=1)
        {
@@ -7653,8 +7895,8 @@ mod tests {
            ctx.xer_ca = 0;
            step(&mut ctx, &mem);
            assert_eq!(ctx.xer_ca, 0, "ra=1, ca=0 should produce CA=0");
-            // PPCBUG-018: 32-bit ABI. !1u32 + 0 = u32::MAX - 1, with upper 32 bits zero.
+            // PPCBUG-020 fix: full 64-bit `!1u64 + 0` = u64::MAX - 1.
-            assert_eq!(ctx.gpr[3], 0xFFFF_FFFEu64, "result = !1u32 + 0 = u32::MAX - 1");
+            assert_eq!(ctx.gpr[3], 0xFFFF_FFFF_FFFF_FFFEu64, "result = !1u64 + 0");
        }
        // Case 4: ra=u32::MAX, ca=1 → CA=0; result = !u32::MAX + 1 = 1.
        {
@@ -7666,7 +7908,9 @@ mod tests {
            ctx.xer_ca = 1;
            step(&mut ctx, &mem);
            assert_eq!(ctx.xer_ca, 0, "ra=u32::MAX, ca=1 should produce CA=0");
-            assert_eq!(ctx.gpr[3], 1, "result = !u32::MAX + 1 = 1");
+            // PPCBUG-020 fix: full 64-bit `!RA + CA`. RA = 0x0000_0000_FFFF_FFFF
+            // → !RA = 0xFFFF_FFFF_0000_0000, + 1 = 0xFFFF_FFFF_0000_0001.
+            assert_eq!(ctx.gpr[3], 0xFFFF_FFFF_0000_0001u64, "result = !RA + 1");
        }
    }
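The full-64-bit results asserted in the tests above can be reproduced with plain integer arithmetic. The sketch below is illustrative only: the function names are hypothetical and none of this is the interpreter's code. The rule that CA comes from a 32-bit comparison is taken from the test comments, and CA-out for the `!RA + CA` form is left out.

```rust
/// neg: 0 - RA over the full 64-bit register.
fn neg(ra: u64) -> u64 {
    0u64.wrapping_sub(ra)
}

/// mulli: full 64-bit RA times the sign-extended 16-bit immediate.
fn mulli(ra: u64, simm: i16) -> u64 {
    (ra as i64).wrapping_mul(simm as i64) as u64
}

/// mullwo: signed product of the two low words, kept as a 64-bit value.
/// The overflow flag is set when the product does not fit in an i32.
fn mullwo(ra: u64, rb: u64) -> (u64, bool) {
    let product = (ra as u32 as i32 as i64) * (rb as u32 as i32 as i64);
    let overflow = product != (product as i32) as i64;
    (product as u64, overflow)
}

/// subfic: EXTS(imm) - RA in 64 bits; CA from a 32-bit unsigned compare.
fn subfic(ra: u64, simm: i16) -> (u64, bool) {
    let imm = simm as i64 as u64;
    (imm.wrapping_sub(ra), (imm as u32) >= (ra as u32))
}

/// addic: RA + EXTS(imm) in 64 bits; CA is the carry out of the low words.
fn addic(ra: u64, simm: i16) -> (u64, bool) {
    let imm = simm as i64 as u64;
    let carry = (ra as u32).checked_add(imm as u32).is_none();
    (ra.wrapping_add(imm), carry)
}

/// The `!RA + CA` form exercised by the last hunk (result only).
fn not_ra_plus_ca(ra: u64, ca_in: bool) -> u64 {
    (!ra).wrapping_add(ca_in as u64)
}
```

Each helper maps to one of the updated assertions, for example `neg(5)` and `subfic(5, -1)`.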
--- a/crates/xenia-cpu/src/scheduler.rs
+++ b/crates/xenia-cpu/src/scheduler.rs
@@ -35,6 +35,20 @@ pub const INITIAL_GUEST_TID: u32 = 1;
 /// Axis 1 carries the field on every thread but doesn't decrement yet.
 pub const QUANTUM_DEFAULT: u32 = 50_000;
 /// Anti-starvation floor. On a cooperative single-host slot, strict-priority
 /// `pick_runnable` lets a high-priority CPU-bound spinner (e.g. a pri-15
 /// time-critical poll loop pinned by affinity) win every round forever,
 /// permanently starving a co-located lower-priority peer that the spinner is
 /// actually *waiting on* — a deadlock that never occurs on real hardware,
 /// where SMT contexts run those threads concurrently.
 ///
 /// Once a Ready thread has been passed over this many consecutive slot
 /// visits, `pick_runnable` grants it ONE pick (then its counter resets). The
 /// limit is large enough that the genuinely-higher-priority thread still wins
 /// the overwhelming majority of visits (here: 4096 of every 4097); the boost only
 /// guarantees *bounded* forward progress, it does not invert priority.
 pub const STARVE_LIMIT: u32 = 4096;
 /// Above this depth, `spawn` prunes `Exited` entries from a slot's runqueue
 /// before pushing the new thread. Keeps peer `ThreadRef`s stable on the
 /// common (low-depth) path — a game that spawns a handful of long-lived
@@ -117,6 +131,20 @@ pub struct GuestThread {
    /// Axis 3 instruction budget. Decremented per retired step on this
    /// thread; on zero, slot rotates within same-priority tier.
    pub quantum_remaining: u32,
    /// Anti-starvation counter. Incremented each slot visit this thread is
    /// Ready but NOT picked; reset to 0 when picked. When it reaches
    /// `STARVE_LIMIT`, `pick_runnable` grants this thread one boosted pick so
    /// a monopolizing higher-priority peer on the same slot cannot starve it
    /// indefinitely. Deterministic: a pure function of pick history.
    pub steps_starved: u32,
    /// SpawnParams.entry — the BL target the trampoline jumped to.
    /// Persisted so kernel exports can filter syscalls by spawning
    /// chain (e.g. the silph UI auto-signal POC). 0 for the initial
    /// thread (uses `install_initial_thread`, not `spawn`).
    pub start_entry: u32,
    /// SpawnParams.start_context — initial r3 at spawn. Persisted for
    /// the same filtering reason as `start_entry`.
    pub start_context: u32,
 }
 impl GuestThread {
@@ -136,6 +164,9 @@ impl GuestThread {
            affinity_mask: 0xFF,
            ideal_processor: None,
            quantum_remaining: QUANTUM_DEFAULT,
            steps_starved: 0,
            start_entry: 0,
            start_context: 0,
        }
    }
 }
@@ -208,15 +239,35 @@ impl Default for HwSlot {
 impl HwSlot {
    /// Index of the highest-priority Ready/ServicingIrq thread in this
    /// slot's runqueue. Tiebreak: prefer lower index (deterministic).
    ///
    /// Selection is by *effective* priority: a Ready thread that has been
    /// passed over for `STARVE_LIMIT` consecutive visits is boosted so it
    /// wins exactly one pick, then [`Scheduler::begin_slot_visit`] resets its
    /// counter. This restores the guest-visible invariant that every Ready
    /// thread makes forward progress, without inverting the intended priority
    /// order (a starved thread only beats its monopolizer once per
    /// `STARVE_LIMIT` visits). The boost is a pure function of the per-thread
    /// counters/priority/index, so picks stay deterministic.
    pub fn pick_runnable(&self) -> Option<usize> {
        self.runqueue
            .iter()
            .enumerate()
            .filter(|(_, t)| matches!(t.state, HwState::Ready | HwState::ServicingIrq(_)))
-            .max_by_key(|(i, t)| (t.priority, -(*i as i64)))
+            .max_by_key(|(i, t)| (Self::effective_priority(t), -(*i as i64)))
            .map(|(i, _)| i)
    }
    /// Priority used for selection. A thread starved for `STARVE_LIMIT`
    /// visits is lifted to `i32::MAX` so it wins the next pick regardless of
    /// peer priority; otherwise its nominal priority is used unchanged.
    fn effective_priority(t: &GuestThread) -> i32 {
        if t.steps_starved >= STARVE_LIMIT {
            i32::MAX
        } else {
            t.priority
        }
    }
    /// How many non-Exited threads currently live on this slot (used by
    /// placement policies).
    pub fn live_depth(&self) -> usize {
@@ -341,6 +392,28 @@ pub struct Scheduler {
    /// Sorted by deadline ascending. Scheduler wakes the first entry via
    /// `advance_to_next_wake` when a round finds nothing runnable.
    timed_waits: Vec<(u64, ThreadRef)>,
    /// Coherent monotonic "now" clock — the single authoritative basis the
    /// kernel deadline-arithmetic (`KernelState::now_basis_at`) reads in
    /// BOTH execution modes. Per-thread `ctx(hw_id).timebase` is NOT a
    /// coherent "now":
    ///   * In `--parallel`, workers extract their `PpcContext` (leaving a
    ///     zeroed timebase in the slot) and step unlocked.
    ///   * In **lockstep**, a parked/poll thread has `running_idx == None`,
    ///     so `ctx()` returns `idle_ctx` (timebase 0); a `parse_timeout`
    ///     reading that basis registers `deadline = 0 + relative`, a value
    ///     permanently in the past, and `coord_idle_advance` re-arms that
    ///     same constant deadline forever (timebase-desync livelock — the
    ///     render-gate root: the submitter's 16ms re-wait never fires).
    /// So a coordinator/parked thread reading per-thread timebase can see a
    /// stale/zero basis decoupled from the deadline it just advanced to.
    /// This field is that coherent basis instead. It is DETERMINISTIC: a
    /// pure function of retired guest instructions (never wall-clock).
    /// Advanced by `advance_global_clock` (per-block retired count on each
    /// parallel writeback), `advance_global_clock_to` (floored up to the
    /// deterministic per-round `stats.instruction_count` in lockstep), and
    /// floored up by `advance_all_timebases_to`. Two cold lockstep runs
    /// read identical values, so the lockstep trace stays bit-reproducible.
    global_clock: u64,
    /// Global count of TLS slots allocated — `spawn` pre-sizes new threads'
    /// `tls_values` to this.
    tls_slot_count: usize,
@@ -379,6 +452,7 @@ impl Scheduler {
            order,
            rng_state,
            timed_waits: Vec::new(),
            global_clock: 0,
            tls_slot_count: 0,
            non_empty_runnable: 0,
            rotation_cursor: 0,
@@ -500,6 +574,17 @@ impl Scheduler {
        self.current.expect("no current thread")
    }
    /// `(start_entry, start_context)` of the currently-running thread.
    /// Returns None if there is no current thread or its ref is stale.
    /// Used by `KernelState::maybe_register_silph_autosignal` to filter
    /// `NtCreateEvent` calls by spawning chain.
    pub fn current_thread_entry_and_ctx(&self) -> Option<(u32, u32)> {
        let r = self.current?;
        let slot = self.slots.get(r.hw_id as usize)?;
        let t = slot.runqueue.get(r.idx as usize)?;
        Some((t.start_entry, t.start_context))
    }
    // ----- Guest-thread lookup -----
    /// Find the `ThreadRef` of the (non-Exited) thread with `tid`.
@@ -614,6 +699,8 @@ impl Scheduler {
        t.priority = params.priority;
        t.affinity_mask = mask;
        t.ideal_processor = params.ideal_processor;
        t.start_entry = params.entry;
        t.start_context = params.start_context;
        // M3.7 — populate the inter-thread reservation handle + slot id
        // so the interpreter can route lwarx/stwcx through the table.
        t.ctx.hw_id = slot_id;
@@ -744,10 +831,22 @@ impl Scheduler {
    /// stashes `self.current` so exports can reach it.
    pub fn begin_slot_visit(&mut self, hw_id: u8) {
        let slot = &mut self.slots[hw_id as usize];
-        slot.running_idx = slot.pick_runnable();
+        let picked = slot.pick_runnable();
-        self.current = slot
+        slot.running_idx = picked;
-            .running_idx
+        // Anti-starvation bookkeeping: reset the picked thread's counter,
-            .map(|idx| ThreadRef::new(hw_id, idx as u16));
+        // increment every other Ready peer that was passed over this visit.
        // Once a passed-over thread reaches STARVE_LIMIT it wins the next
        // pick_runnable (effective_priority -> i32::MAX), then lands here as
        // `picked` and resets — bounding any thread's starvation. Pure
        // function of pick history, so it stays deterministic.
        for (i, t) in slot.runqueue.iter_mut().enumerate() {
            if Some(i) == picked {
                t.steps_starved = 0;
            } else if matches!(t.state, HwState::Ready | HwState::ServicingIrq(_)) {
                t.steps_starved = t.steps_starved.saturating_add(1);
            }
        }
        self.current = picked.map(|idx| ThreadRef::new(hw_id, idx as u16));
    }
    /// Clear `current` at the end of each per-slot visit.
@@ -803,6 +902,41 @@ impl Scheduler {
        false
    }
    /// Cooperative yield: the currently-running thread executed a `db16cyc`
    /// spin-wait hint (see `StepResult::Yield`). It is busy-spinning on a
    /// guest spinlock/barrier whose release depends on a *co-located* peer
    /// that cannot make progress while this thread keeps winning the slot.
    ///
    /// Promote every Ready peer on this slot past `STARVE_LIMIT` so the next
    /// `begin_slot_visit` picks one of them (their `effective_priority` →
    /// `i32::MAX`), and reset the yielder's own counter. Each promoted peer
    /// runs once and resets to 0 in `begin_slot_visit`; once all peers have
    /// had their turn the spinner is picked again, spins, and re-yields —
    /// producing a fair round-robin between the spinner and the threads it is
    /// waiting on. This mirrors real hardware, where all six HW threads run
    /// concurrently and the spin resolves as soon as the peer releases.
    ///
    /// Pure function of the slot's current state (no RNG, no wall-clock), so
    /// it preserves lockstep determinism. No-op if there is no Ready peer
    /// (the spinner is alone on its slot — nothing to hand off to).
    ///
    /// Returns `true` if at least one peer was promoted.
    pub fn yield_current(&mut self) -> bool {
        let Some(r) = self.current else { return false; };
        let slot = &mut self.slots[r.hw_id as usize];
        let me = r.idx as usize;
        let mut promoted = false;
        for (i, t) in slot.runqueue.iter_mut().enumerate() {
            if i == me {
                t.steps_starved = 0;
            } else if matches!(t.state, HwState::Ready | HwState::ServicingIrq(_)) {
                t.steps_starved = STARVE_LIMIT;
                promoted = true;
            }
        }
        promoted
    }
    // ----- Park / wake / exit -----
    pub fn park_current(&mut self, reason: BlockReason) {
@@ -1091,6 +1225,42 @@ impl Scheduler {
                }
            }
        }
        // Keep the coherent clock at least as far forward as any deadline we
        // fast-forward to (idle/timer/wake advances). The floor-up is monotone
        // and `deadline` is itself derived deterministically, so the lockstep
        // trace stays reproducible.
        self.global_clock = self.global_clock.max(deadline);
    }
    /// Coherent "now" (see [`Self::global_clock`] field doc). Read by the
    /// kernel deadline-arithmetic (`KernelState::now_basis_at`) in both
    /// execution modes, in place of per-thread `ctx(hw_id).timebase`.
    #[inline]
    pub fn global_clock(&self) -> u64 {
        self.global_clock
    }
    /// Advance the parallel-mode coherent clock by `n` retired instructions.
    /// Called from the parallel worker writeback with the block's executed
    /// count so "now" tracks aggregate guest progress.
    #[inline]
    pub fn advance_global_clock(&mut self, n: u64) {
        self.global_clock = self.global_clock.saturating_add(n);
    }
    /// Floor the coherent clock up to `now` (monotonic; never goes
    /// backwards). Used by the **lockstep** outer loop once per round to
    /// track the deterministic retired-instruction count
    /// (`stats.instruction_count`) as the single coherent "now". A plain
    /// floor-up rather than `saturating_add` because the lockstep caller
    /// passes an absolute monotonic counter (not a per-block delta), and
    /// because `advance_all_timebases_to` may already have pushed
    /// `global_clock` past the instruction count when fast-forwarding to a
    /// future deadline — clamping with `max` keeps both sources monotone.
    #[inline]
    pub fn advance_global_clock_to(&mut self, now: u64) {
        self.global_clock = self.global_clock.max(now);
    }
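The two ways of advancing the clock compose as follows. This is a minimal sketch with a hypothetical `CoherentClock` type standing in for the scheduler field: a delta advance (as in the parallel writeback) and an absolute floor-up (as in the lockstep round or a deadline fast-forward) both keep the value monotone.

```rust
/// Hypothetical stand-in for the scheduler's `global_clock` field.
#[derive(Default)]
struct CoherentClock {
    ticks: u64,
}

impl CoherentClock {
    /// Delta advance: add a per-block retired-instruction count.
    fn advance_by(&mut self, retired: u64) {
        self.ticks = self.ticks.saturating_add(retired);
    }

    /// Absolute floor-up: never moves backwards, so an earlier
    /// fast-forward past `absolute` is preserved.
    fn advance_to(&mut self, absolute: u64) {
        self.ticks = self.ticks.max(absolute);
    }

    fn now(&self) -> u64 {
        self.ticks
    }
}
```

A floor-up to a value behind the current reading is a no-op, which is why the absolute form uses `max` rather than an addition.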
    /// Fast-forward the timebase to the earliest pending timed wait and
@@ -1161,6 +1331,28 @@ impl Scheduler {
        })
    }
    /// True if any thread is currently `Blocked` on a `WaitAny`/`WaitAll`
    /// whose handle set contains `handle`. Used by the handle-slab recycler
    /// (AUDIT-059 R34) to avoid an ABA hazard: if a closed handle's slot is
    /// returned to the free list while a thread is still parked on it, a
    /// later `alloc_handle` could hand the same slot to a NEW object, and a
    /// signal on that new object would wake the stale waiter that was
    /// waiting on the OLD (closed) object. Canary sidesteps this by keeping
    /// the object alive via an object_ref while waiters hold references; we
    /// instead simply decline to recycle a still-waited slot (leaking it,
    /// matching the pre-R34 bump-only behaviour for that rare case).
    pub fn any_thread_waiting_on(&self, handle: u32) -> bool {
        self.slots.iter().any(|slot| {
            slot.runqueue.iter().any(|t| match &t.state {
                HwState::Blocked(BlockReason::WaitAny { handles, .. })
                | HwState::Blocked(BlockReason::WaitAll { handles, .. }) => {
                    handles.contains(&handle)
                }
                _ => false,
            })
        })
    }
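The recycling guard described in that doc comment could look like the sketch below. Everything here is hypothetical: `HandleSlab`, its fields, the starting handle value and the step of 4 are invented for illustration, and the `waited` set stands in for a scan like `any_thread_waiting_on`.

```rust
use std::collections::HashSet;

/// Hypothetical handle slab with a free list.
struct HandleSlab {
    next_fresh: u32,
    free: Vec<u32>,
    /// Handles some parked thread is still waiting on.
    waited: HashSet<u32>,
}

impl HandleSlab {
    fn new() -> Self {
        HandleSlab { next_fresh: 0x1000, free: Vec::new(), waited: HashSet::new() }
    }

    /// Reuse a recycled slot if one exists, otherwise bump-allocate.
    fn alloc(&mut self) -> u32 {
        match self.free.pop() {
            Some(h) => h,
            None => {
                let h = self.next_fresh;
                self.next_fresh += 4;
                h
            }
        }
    }

    /// Returns true if the slot went back on the free list. A slot that is
    /// still waited on is deliberately leaked, so a later `alloc` cannot
    /// hand it to a new object and wake the stale waiter.
    fn close(&mut self, handle: u32) -> bool {
        if self.waited.contains(&handle) {
            false
        } else {
            self.free.push(handle);
            true
        }
    }
}
```

The cost of the guard is a leaked slot in the rare still-waited case, in exchange for never aliasing a stale waiter onto a new object.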
    /// Snapshot thread states for diagnostic logging. One entry per live
    /// guest thread (Exited are included so post-mortem can see exit codes).
    pub fn diagnostic_snapshot(&self) -> Vec<(ThreadRef, Option<u32>, HwState)> {
@@ -1858,6 +2050,118 @@ mod tests {
        assert_eq!(t.quantum_remaining, QUANTUM_DEFAULT, "quantum reloaded");
    }
    #[test]
    fn test_anti_starvation_bounded_progress() {
        // Reproduces the Sylpheed render-gate deadlock: a high-priority
        // CPU-bound spinner (the pri-15 poll loop) co-located on one slot
        // with a pri-0 worker (the submitter) the spinner is waiting on.
        // Strict priority would starve the worker forever; the anti-starve
        // floor must hand it a pick within STARVE_LIMIT+1 visits, then the
        // spinner reclaims the slot (priority is NOT inverted).
        let mut s = mk_empty_scheduler();
        let mut spinner = SpawnParams::default();
        spinner.guest_tid = 1;
        spinner.thread_handle = 0x1000;
        spinner.affinity_mask = 0b0001;
        spinner.pcr_base = 0x4000_0000;
        spinner.priority = 15;
        s.spawn(spinner, &mut NullPcr).unwrap();
        let mut worker = SpawnParams::default();
        worker.guest_tid = 2;
        worker.thread_handle = 0x1004;
        worker.affinity_mask = 0b0001;
        worker.pcr_base = 0x4000_1000;
        worker.priority = 0;
        s.spawn(worker, &mut NullPcr).unwrap();
        let mut worker_picks = 0u32;
        let mut spinner_picks = 0u32;
        // Both stay Ready (the spinner never blocks — that's the bug shape).
        for _ in 0..(STARVE_LIMIT + 2) {
            s.begin_slot_visit(0);
            match s.thread(s.current.unwrap()).tid {
                1 => spinner_picks += 1,
                2 => worker_picks += 1,
                other => panic!("unexpected tid {other}"),
            }
            s.end_slot_visit();
        }
        assert_eq!(
            worker_picks, 1,
            "starved worker gets exactly one bounded pick within STARVE_LIMIT+2 visits"
        );
        assert_eq!(
            spinner_picks,
            STARVE_LIMIT + 1,
            "high-priority spinner still dominates — priority is not inverted"
        );
    }
    #[test]
    fn test_db16cyc_yield_hands_slot_to_peer() {
        // Reproduces the Sylpheed title-screen gate: a guest spinlock/barrier
        // participant (tid=1) executes the `db16cyc` spin hint each round and
        // would otherwise win `pick_runnable` forever (equal priority, lower
        // index), starving the co-located peer (tid=2) it is waiting on.
        // `yield_current` must promote the Ready peer so the very next
        // `begin_slot_visit` picks it — without waiting STARVE_LIMIT rounds.
        let mut s = mk_empty_scheduler();
        for tid in [1u32, 2] {
            let mut p = SpawnParams::default();
            p.guest_tid = tid;
            p.thread_handle = 0x1000 + tid * 4;
            p.affinity_mask = 0b0001;
            p.pcr_base = 0x4000_0000 + tid * 0x1000;
            p.priority = 0; // equal priority — index would otherwise decide
            s.spawn(p, &mut NullPcr).unwrap();
        }
        // Round 1: the spinner (lower index) wins.
        s.begin_slot_visit(0);
        let spinner = s.thread(s.current.unwrap()).tid;
        assert_eq!(spinner, 1, "lower-index equal-priority thread wins first pick");
        // It spins (db16cyc) → cooperative yield.
        assert!(s.yield_current(), "yield promotes the Ready peer");
        s.end_slot_visit();
        // Round 2: the promoted peer must now be picked, not the spinner.
        s.begin_slot_visit(0);
        let after_yield = s.thread(s.current.unwrap()).tid;
        assert_eq!(
            after_yield, 2,
            "after db16cyc yield the co-located peer runs (no STARVE_LIMIT wait)"
        );
        s.end_slot_visit();
        // Round 3: peer's boost was consumed (reset to 0 when picked), so the
        // spinner reclaims the slot — fair alternation, no priority inversion.
        s.begin_slot_visit(0);
        assert_eq!(
            s.thread(s.current.unwrap()).tid,
            1,
            "spinner reclaims the slot after the peer has had its turn"
        );
    }
    #[test]
    fn test_yield_current_noop_when_alone() {
        // A spinner with no Ready peer on its slot has nothing to hand off to;
        // yield_current must be a no-op (returns false) and not panic.
        let mut s = mk_empty_scheduler();
        let mut p = SpawnParams::default();
        p.guest_tid = 1;
        p.thread_handle = 0x1004;
        p.affinity_mask = 0b0001;
        p.pcr_base = 0x4000_0000;
        s.spawn(p, &mut NullPcr).unwrap();
        s.begin_slot_visit(0);
        assert!(!s.yield_current(), "no peer to promote → no-op");
        // Still the same thread next round.
        s.end_slot_visit();
        s.begin_slot_visit(0);
        assert_eq!(s.thread(s.current.unwrap()).tid, 1);
    }
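The pick, bookkeeping and yield rules exercised by these tests fit in a few lines once the scheduler plumbing is stripped away. The sketch below is a simplified model, not the crate's code: `Thread`, `pick`, `visit` and `yield_from` are hypothetical, and a single `ready` flag replaces the `HwState` match.

```rust
const STARVE_LIMIT: u32 = 4096;

/// Hypothetical, trimmed-down thread record.
struct Thread {
    priority: i32,
    ready: bool,
    steps_starved: u32,
}

/// A thread starved for STARVE_LIMIT visits outranks every peer once.
fn effective_priority(t: &Thread) -> i32 {
    if t.steps_starved >= STARVE_LIMIT { i32::MAX } else { t.priority }
}

/// Highest effective priority wins; ties go to the lower index.
fn pick(queue: &[Thread]) -> Option<usize> {
    queue
        .iter()
        .enumerate()
        .filter(|(_, t)| t.ready)
        .max_by_key(|(i, t)| (effective_priority(t), -(*i as i64)))
        .map(|(i, _)| i)
}

/// One slot visit: pick, reset the winner, age every passed-over peer.
fn visit(queue: &mut [Thread]) -> Option<usize> {
    let picked = pick(queue);
    for (i, t) in queue.iter_mut().enumerate() {
        if Some(i) == picked {
            t.steps_starved = 0;
        } else if t.ready {
            t.steps_starved = t.steps_starved.saturating_add(1);
        }
    }
    picked
}

/// Cooperative yield: boost every Ready peer, reset the yielder.
fn yield_from(queue: &mut [Thread], me: usize) -> bool {
    let mut promoted = false;
    for (i, t) in queue.iter_mut().enumerate() {
        if i == me {
            t.steps_starved = 0;
        } else if t.ready {
            t.steps_starved = STARVE_LIMIT;
            promoted = true;
        }
    }
    promoted
}
```

Because every decision depends only on the counters, priorities and indices, replaying the same sequence of visits and yields produces the same picks.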
    #[test]
    fn test_cooperative_yield_does_not_need_quantum() {
        let mut s = mk_empty_scheduler();
--- a/crates/xenia-cpu/src/vmx.rs
+++ b/crates/xenia-cpu/src/vmx.rs
@@ -293,28 +293,23 @@ pub fn store_vector_right(mem: &dyn MemoryAccess, ea: u32, v: Vec128) {
    }
 }
-// ─── 5-6-5 pixel pack (vpkpx / vupkhpx / vupklpx) ─────────────────────────
+// ─── pixel pack (vpkpx / vupkhpx / vupklpx) ───────────────────────────────
-// PPC vpkpx takes a 32-bit RGB lane and packs it into a 16-bit 1-5-5-5 pixel.
+// PPC vpkpx packs each 32-bit lane into a 16-bit 1-5-5-5 pixel.
-// vupkhpx / vupklpx reverse the operation.
+// Mapping transcribed EXACTLY from xenia-canary
-//
+// `ppc_emit_altivec.cc::vkpkx_in_low` (lines 1795-1808):
-// Format: input 32-bit word holds
+//     tmp1 = (input >> 9) & 0xFC00   // out bits 15:10 = in bits 24:19
-//     bits 0-6: unused (0)
+//     tmp2 = (input >> 6) & 0x3E0    // out bits  9:5  = in bits 15:11
-//     bit 7:    alpha-select (→ bit 15 of output)
+//     tmp3 = (input >> 3) & 0x1F     // out bits  4:0  = in bits  7:3
-//     bits 8-15:  R (top 5 bits kept)
+//     result = tmp1 | tmp2 | tmp3
-//     bits 16-23: G (top 5 bits kept)
+// This is a pure shift/mask: there is NO standalone alpha select. Output
-//     bits 24-31: B (top 5 bits kept)
+// bit 15 is simply input bit 24 (the top of the 6-bit field masked by
-// Output 16-bit word:
+// 0xFC00) — NOT input bit 7. The red field is 6 bits wide here.
-//     bit 15:   A (from input bit 7)
-//     bits 10-14: R
-//     bits 5-9:   G
-//     bits 0-4:   B
 #[inline] pub fn pack_pixel_555(input: u32) -> u16 {
-    let a = (input >> 7) & 0x1;
+    let tmp1 = (input >> 9) & 0xFC00;
-    let r = (input >> 8) & 0xFF;
+    let tmp2 = (input >> 6) & 0x3E0;
-    let g = (input >> 16) & 0xFF;
+    let tmp3 = (input >> 3) & 0x1F;
-    let b = (input >> 24) & 0xFF;
+    (tmp1 | tmp2 | tmp3) as u16
-    ((a << 15) | ((r & 0xF8) << 7) | ((g & 0xF8) << 2) | ((b & 0xF8) >> 3)) as u16
 }
 #[inline] pub fn unpack_pixel_555(input: u16) -> u32 {
@@ -801,9 +796,38 @@ mod tests {
    }
    #[test]
-    fn pack_unpack_pixel_555() {
+    fn pack_pixel_555_matches_canary() {
-        let encoded = pack_pixel_555(0x80_F8_F8_F8);
+        // Mapping (canary ppc_emit_altivec.cc::vkpkx_in_low):
-        assert_eq!(encoded & 0x8000, 0x8000);
+        //   out[15:10] = in[24:19], out[9:5] = in[15:11], out[4:0] = in[7:3]
+        // Pure shift/mask, NO standalone alpha bit.
+        // All three colour fields exercised. Expected (hand-computed):
+        //   (0x018844C0 >> 9)&0xFC00 = 0xC400
+        //   (0x018844C0 >> 6)&0x3E0  = 0x100
+        //   (0x018844C0 >> 3)&0x1F   = 0x18
+        //   => 0xC518
+        assert_eq!(pack_pixel_555(0x01_88_44_C0), 0xC518);
+        // Boundary the audit flagged: low byte 0xF8 has bit 7 set. Canary does
+        // NOT turn that into output bit 15 (alpha). Output bit 15 = in bit 24,
+        // which is 0 here => high bit clear. (Old impl wrongly produced 0x8000.)
+        assert_eq!(pack_pixel_555(0x80_F8_F8_F8), 0x7FFF);
+        assert_eq!(pack_pixel_555(0x80_F8_F8_F8) & 0x8000, 0);
+        // Lone source bit 7 (0x80) lands in the blue field, not in bit 15.
+        assert_eq!(pack_pixel_555(0x00_00_00_80), 0x0010);
+        // Output bit 15 is sourced from input bit 24, not bit 7.
+        assert_eq!(pack_pixel_555(0x01_00_00_00), 0x8000);
+        // Saturated input -> all field bits set.
+        assert_eq!(pack_pixel_555(0xFF_FF_FF_FF), 0xFFFF);
+    }
+    #[test]
+    fn unpack_pixel_555_roundtrip() {
+        // vupkhpx/vupklpx are NOTIMPLEMENTED in canary, so unpack_pixel_555 is
+        // unchanged; just sanity-check the alpha-replicate path still holds.
        let w = unpack_pixel_555(0x8000 | (0x1F << 10) | (0x1F << 5) | 0x1F);
        assert_eq!(w & 0xFF000000, 0xFF000000);
    }
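A minimal standalone sketch of the mapping this hunk introduces, assuming only the three shift/mask terms shown on the `+` side. The names `pack_pixel_sketch` and `source_bit` are hypothetical and do not exist in `vmx.rs`; `source_bit` simply restates the shift amounts (3, 6, 9) as an output-bit to input-bit table.

```rust
/// Hypothetical stand-in for the `+` side of `pack_pixel_555`: the same three
/// shift/mask terms as the hunk, written out so it builds on its own.
pub fn pack_pixel_sketch(input: u32) -> u16 {
    let hi = (input >> 9) & 0xFC00; // feeds output bits 15:10
    let mid = (input >> 6) & 0x03E0; // feeds output bits 9:5
    let lo = (input >> 3) & 0x001F; // feeds output bits 4:0
    (hi | mid | lo) as u16
}

/// Input bit that feeds a given output bit (0..=15). Each field is a plain
/// right shift, so the source bit is the output bit plus that field's shift.
pub fn source_bit(out_bit: u32) -> u32 {
    match out_bit {
        0..=4 => out_bit + 3,
        5..=9 => out_bit + 6,
        _ => out_bit + 9,
    }
}
```

Feeding a single set input bit through the sketch, for example `pack_pixel_sketch(1 << source_bit(15))`, sets exactly output bit 15. That is the property the new unit test checks with the input `0x01_00_00_00`.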
--- a/crates/xenia-gpu/src/gpu_system.rs
+++ b/crates/xenia-gpu/src/gpu_system.rs
@@ -28,6 +28,56 @@ use crate::primitive::{self, ProcessedPrimitive};
 use crate::register_file::RegisterFile;
 use crate::ring_view::RingBufferView;
+/// The guest-virtual window that physical allocations are committed into.
+/// `xenia-kernel`'s `heap_alloc` bumps its cursor through `0x4000_0000..=
+/// 0x6FFF_FFFF` and commits the host backing for `MmAllocatePhysicalMemoryEx`
+/// there, so this write-combine mirror is the canonical home of physical DRAM.
+/// Keep in sync with `KernelState::heap_cursor`'s initial value.
+pub const PHYSICAL_BACKING_BASE: u32 = 0x4000_0000;
+/// Re-project a guest *physical* address — as handed to the Vd/GPU ABI and
+/// embedded in PM4 pointers (`INDIRECT_BUFFER`, `WAIT_REG_MEM`-memory,
+/// `MEM_WRITE`, `EVENT_WRITE*`, `IM_LOAD`, …) — onto the guest-virtual window
+/// where its host backing is actually committed.
+///
+/// The Xbox 360 maps its 512 MB of physical DRAM into several virtual mirror
+/// windows that differ only in cache policy: bare physical (`0x0xxxxxxx`),
+/// write-combine (`0x4xxxxxxx`), and the cached `0xA/0xC/0xExxxxxxx` mirrors —
+/// all aliasing `addr & 0x1FFF_FFFF`. On real hardware (and in xenia-canary
+/// via overlapping `mmap`s) these are literally the same bytes.
+///
+/// Ours has a single flat `membase` and `MmAllocatePhysicalMemoryEx` commits
+/// physical backing in the write-combine `0x4xxxxxxx` window. The guest then
+/// masks its allocation base to *bare physical* before passing it to
+/// `VdInitializeRingBuffer` / `VdEnableRingBufferRPtrWriteBack`, and PM4
+/// pointers are likewise bare-physical. A flat `membase + phys` access
+/// therefore hits a never-committed, zero-filled page instead of the committed
+/// `0x4xxxxxxx` backing — so the GPU decoded zero PM4 headers and never ran
+/// the real command stream.
+///
+/// Projecting any physical-mirror address back onto the `0x4xxxxxxx` window
+/// lands on the page `heap_alloc` actually backed, regardless of which mirror
+/// the guest used (idempotent for `0x4xxxxxxx` itself). The projection is
+/// derived from `heap_alloc`'s placement, not a guess — if that window ever
+/// moves, `PHYSICAL_BACKING_BASE` must move with it.
+///
+/// This is deliberately applied only at the GPU/Vd boundary (where addresses
+/// arrive in their bare-physical form), NOT on the CPU's flat load/store path:
+/// the guest CPU already accesses its allocations through the `0x4xxxxxxx`
+/// base, and non-physical guest-virtual addresses (image `0x82xxxxxx`, stacks
+/// `0x7xxxxxxx`) must stay flat.
+#[inline]
+pub fn physical_to_backing(addr: u32) -> u32 {
+    match addr {
+        0x0000_0000..=0x1FFF_FFFF
+        | 0x4000_0000..=0x4FFF_FFFF
+        | 0xA000_0000..=0xBFFF_FFFF
+        | 0xC000_0000..=0xDFFF_FFFF
+        | 0xE000_0000..=0xFFFF_FFFF => PHYSICAL_BACKING_BASE | (addr & 0x1FFF_FFFF),
+        _ => addr,
+    }
+}
 /// Cached Xenos microcode blob, produced by `PM4_IM_LOAD*` packets.
 #[derive(Debug, Clone)]
 pub struct ShaderBlob {
@@ -58,21 +108,37 @@ pub enum WaitCmp {
    GreaterEq,
    /// value > ref
    Greater,
-    /// Always — caller wants to sleep regardless.
+    /// Always — caller wants to sleep regardless (selector bit 7).
    Always,
+    /// Never matches — `wait_info & 7 == 0` selects bit 0 of canary's
+    /// selector word, which is always zero.
+    Never,
 }
 impl WaitCmp {
-    /// Interpret the lower 3 bits of `wait_info` per canary's `MatchValueAndRef`.
+    /// Interpret the lower 3 bits of `wait_info` per canary's `MatchValueAndRef`
+    /// (`pm4_command_processor_implement.h:685-696`). Canary forms a selector
+    /// `((value<ref)<<1) | ((value<=ref)<<2) | ((value==ref)<<3) |
+    /// ((value!=ref)<<4) | ((value>=ref)<<5) | ((value>ref)<<6) | (1<<7)` and
+    /// evaluates `(selector >> (wait_info & 7)) & 1`. So the index is the bit
+    /// position: 1=Less, 2=LessEq, 3=Equal, 4=NotEqual, 5=GreaterEq,
+    /// 6=Greater, 7=always-true, 0=never (bit 0 is always clear).
+    ///
+    /// GPUBUG: the prior mapping was off by one (it started at `0 => Less`),
+    /// so `wait_info & 7 == 3` decoded as `NotEqual` instead of `Equal`. That
+    /// inverted the standard CP coherency wait
+    /// (`WAIT_REG_MEM COHER_STATUS_HOST, Equal 0`): the GPU parked forever on
+    /// the first INDIRECT_BUFFER and never reached any draw.
    pub fn from_wait_info(wait_info: u32) -> Self {
        match wait_info & 0x7 {
-            0 => WaitCmp::Less,
+            1 => WaitCmp::Less,
-            1 => WaitCmp::LessEq,
+            2 => WaitCmp::LessEq,
-            2 => WaitCmp::Equal,
+            3 => WaitCmp::Equal,
-            3 => WaitCmp::NotEqual,
+            4 => WaitCmp::NotEqual,
-            4 => WaitCmp::GreaterEq,
+            5 => WaitCmp::GreaterEq,
-            5 => WaitCmp::Greater,
+            6 => WaitCmp::Greater,
-            _ => WaitCmp::Always,
+            7 => WaitCmp::Always,
+            _ => WaitCmp::Never,
        }
    }
@@ -85,6 +151,7 @@ impl WaitCmp {
            WaitCmp::GreaterEq => value >= reference,
            WaitCmp::Greater => value > reference,
            WaitCmp::Always => true,
+            WaitCmp::Never => false,
        }
    }
 }
@@ -561,6 +628,12 @@ impl GpuSystem {
    pub fn execute_one(&mut self, mem: &dyn MemoryAccess) -> ExecOutcome {
        // 0) If currently parked, probe the condition and either wake up or stay blocked.
        if let Some(block) = self.pending_block.clone() {
+            // Re-service the CP coherency handshake on each probe so a
+            // COHER_STATUS_HOST wait can clear (canary does this in its WAIT
+            // loop body, not just at entry).
+            if let GpuBlock::WaitRegMem { poll_addr, is_memory: false, .. } = &block {
+                self.make_coherent(*poll_addr);
+            }
            if block.is_satisfied(mem, &self.register_file) {
                tracing::debug!(?block, "gpu: wait satisfied — resuming");
                self.pending_block = None;
@@ -658,6 +731,10 @@ impl GpuSystem {
    /// Called by `VdInitializeRingBuffer` to give us the primary ring.
    pub fn initialize_ring_buffer(&mut self, base: u32, size_log2: u32) {
        let size_bytes = 1u32 << size_log2.min(31);
+        // The guest hands us a bare *physical* ring base; project it onto the
+        // committed backing window so ring reads hit real PM4 packets (see
+        // `physical_to_backing`).
+        let base = physical_to_backing(base);
        self.ring.base = base;
        self.ring.size_dwords = size_bytes / 4;
        self.ring.read_offset_dwords = 0;
@@ -675,6 +752,10 @@ impl GpuSystem {
    /// Called by `VdEnableRingBufferRPtrWriteBack` to record where the guest
    /// expects us to mirror `read_offset_dwords`.
    pub fn enable_rptr_writeback(&mut self, addr: u32, block_log2: u32) {
+        // The guest registers a bare *physical* writeback address and polls
+        // the same allocation through its `0x4xxxxxxx` base; project so our
+        // RPtr store lands on the page the guest actually reads.
+        let addr = physical_to_backing(addr);
        self.ring.rptr_writeback_addr = addr;
        self.ring.rptr_writeback_block_dwords = 1u32 << block_log2.min(31);
        tracing::info!(
@@ -724,6 +805,26 @@ impl GpuSystem {
    /// upstream packet effects (memory writes, register file updates
    /// the guest reads via subsequent MMIO) happen-before the
    /// CPU-visible RPTR bump.
+    /// Service a CP coherency request, mirroring canary's
+    /// `CommandProcessor::MakeCoherent` (`command_processor.cc:801-838`).
+    ///
+    /// The guest requests a vertex/texture-cache flush by writing
+    /// `COHER_STATUS_HOST` with its status bit (bit 31) set, then spins on a
+    /// `WAIT_REG_MEM COHER_STATUS_HOST, Equal 0`. We have no host cache to
+    /// flush (memory is shared, coherency is implicit), so completing the
+    /// request is simply clearing the register — which lets the wait satisfy.
+    /// No-op unless `poll_addr` is `COHER_STATUS_HOST` and its status bit is
+    /// set, so it is safe to call on every coherency-register WAIT probe.
+    fn make_coherent(&mut self, poll_addr: u32) {
+        if poll_addr != reg::COHER_STATUS_HOST {
+            return;
+        }
+        let status = self.register_file.read(reg::COHER_STATUS_HOST);
+        if status & 0x8000_0000 != 0 {
+            self.register_file.write(reg::COHER_STATUS_HOST, 0);
+        }
+    }
    fn writeback_read_ptr(&mut self, mem: &dyn MemoryAccess) {
        if self.ring.rptr_writeback_addr != 0 && self.ring.is_initialized() {
            mem.write_u32_fence(
@@ -816,7 +917,9 @@ impl GpuSystem {
            }
            pm4::PM4_INDIRECT_BUFFER | pm4::PM4_INDIRECT_BUFFER_PFD => {
                self.stats.indirect_buffer_jumps += 1;
-                let ib_ptr = self.read_payload(mem, 1);
+                // The IB pointer is a guest *physical* address — project it
+                // onto the committed backing window (see `physical_to_backing`).
+                let ib_ptr = physical_to_backing(self.read_payload(mem, 1));
                let ib_size = self.read_payload(mem, 2);
                // Advance past the IB header + payload before recursing so
                // the return location is correct.
@@ -854,7 +957,8 @@ impl GpuSystem {
                let is_memory = (wait_info & 0x10) != 0;
                let cmp = WaitCmp::from_wait_info(wait_info);
                let poll_addr = if is_memory {
-                    poll_addr_raw & !3
+                    // Physical memory poll address → committed backing.
+                    physical_to_backing(poll_addr_raw & !3)
                } else {
                    poll_addr_raw
                };
@@ -865,6 +969,12 @@ impl GpuSystem {
                    mask,
                    cmp,
                };
+                // A WAIT polling COHER_STATUS_HOST is the CP coherency
+                // handshake: service it now so the status bit clears (see
+                // `make_coherent`), exactly as canary does in its WAIT loop.
+                if !is_memory {
+                    self.make_coherent(poll_addr);
+                }
                if block.is_satisfied(mem, &self.register_file) {
                    // Condition already true; proceed past this packet.
                    tracing::trace!(?block, "gpu: WAIT_REG_MEM immediately satisfied");
@@ -908,7 +1018,7 @@ impl GpuSystem {
            pm4::PM4_REG_TO_MEM => {
                // payload[0] = reg_index, payload[1] = mem addr
                let reg_index = self.read_payload(mem, 1) & 0x1FFF;
-                let dst = self.read_payload(mem, 2) & !3;
+                let dst = physical_to_backing(self.read_payload(mem, 2) & !3);
                let value = self.register_file.read(reg_index);
                mem.write_u32(dst, value);
                tracing::trace!(
@@ -920,7 +1030,7 @@ impl GpuSystem {
            }
            pm4::PM4_MEM_WRITE => {
                // payload[0] = dst, payload[1..=count-1] = values
-                let mut dst = self.read_payload(mem, 1) & !3;
+                let mut dst = physical_to_backing(self.read_payload(mem, 1) & !3);
                for i in 2..=count {
                    let val = self.read_payload(mem, i);
                    mem.write_u32(dst, val);
@@ -936,7 +1046,7 @@ impl GpuSystem {
                let mask = self.read_payload(mem, 4);
                let is_memory = (wait_info & 0x10) != 0;
                let cmp = WaitCmp::from_wait_info(wait_info);
-                let poll_addr = if is_memory { poll_raw & !3 } else { poll_raw };
+                let poll_addr = if is_memory { physical_to_backing(poll_raw & !3) } else { poll_raw };
                let cur_raw = if is_memory {
                    mem.read_u32(poll_addr)
                } else {
@@ -946,7 +1056,7 @@ impl GpuSystem {
                    let write_addr = self.read_payload(mem, 5);
                    let write_data = self.read_payload(mem, 6);
                    if (wait_info & 0x100) != 0 {
-                        mem.write_u32(write_addr & !3, write_data);
+                        mem.write_u32(physical_to_backing(write_addr & !3), write_data);
                    } else {
                        self.register_file
                            .write(write_addr & 0x1FFF, write_data);
@@ -965,7 +1075,7 @@ impl GpuSystem {
                // payload[0] = initiator (bit 31: write counter, else write `value`)
                // payload[1] = address, payload[2] = value
                let initiator = self.read_payload(mem, 1);
-                let address = self.read_payload(mem, 2);
+                let address = physical_to_backing(self.read_payload(mem, 2));
                let value = self.read_payload(mem, 3);
                self.register_file
                    .write(reg::VGT_EVENT_INITIATOR, initiator & 0x3F);
@@ -993,7 +1103,7 @@ impl GpuSystem {
                // payload[0] = initiator, [1] = address. Writes 6 u16 extents
                // (min/max x/y/z) — we're not tracking scissors yet, so write zeros.
                let initiator = self.read_payload(mem, 1);
-                let address = self.read_payload(mem, 2) & !3;
+                let address = physical_to_backing(self.read_payload(mem, 2) & !3);
                self.register_file
                    .write(reg::VGT_EVENT_INITIATOR, initiator & 0x3F);
                self.handle_event_initiator(initiator & 0x3F, mem);
@@ -1123,7 +1233,7 @@ impl GpuSystem {
            }
            pm4::PM4_LOAD_ALU_CONSTANT => {
                // payload[0] = source mem addr, [1] = offset_type, [2] = size_dwords
-                let src = self.read_payload(mem, 1) & !3;
+                let src = physical_to_backing(self.read_payload(mem, 1) & !3);
                let offset_type = self.read_payload(mem, 2);
                let size_dwords = self.read_payload(mem, 3);
                let index = offset_type & 0x7FF;
@@ -1155,7 +1265,7 @@ impl GpuSystem {
                    }
                    v
                } else {
-                    let addr = self.read_payload(mem, 1) & !3;
+                    let addr = physical_to_backing(self.read_payload(mem, 1) & !3);
                    let mut v = Vec::with_capacity(size_dwords as usize);
                    for i in 0..size_dwords {
                        v.push(mem.read_u32(addr + i * 4));
@@ -1477,8 +1587,9 @@ mod tests {
        // header
        let hdr = (3u32 << 30) | ((5u32 - 1) << 16) | ((pm4::PM4_WAIT_REG_MEM as u32) << 8);
        mem.write_u32(0x4000_0000, hdr);
-        // wait_info: is_memory=1 (bit 4), cmp=equal (bits 2:0 = 2)
+        // wait_info: is_memory=1 (bit 4), cmp=equal (bits 2:0 = 3, per canary's
-        mem.write_u32(0x4000_0004, 0x12);
+        // MatchValueAndRef selector: 1=Less, 2=LessEq, 3=Equal, …).
+        mem.write_u32(0x4000_0004, 0x13);
        mem.write_u32(0x4000_0008, 0x4000_1000);
        mem.write_u32(0x4000_000C, 0x42);
        mem.write_u32(0x4000_0010, 0xFFFF_FFFF);
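The selector scheme quoted in the `from_wait_info` doc comment can be sketched on its own as follows. This is a rough illustration under the assumption that the comment describes canary's `MatchValueAndRef` accurately; `selector` and `wait_matches` are hypothetical helper names, not functions from either code base.

```rust
/// Builds the 8-bit selector word described in the doc comment: bit 0 is
/// always clear, bits 1..=6 hold the six comparisons, bit 7 is always set.
pub fn selector(value: u32, reference: u32) -> u32 {
    ((value < reference) as u32) << 1
        | ((value <= reference) as u32) << 2
        | ((value == reference) as u32) << 3
        | ((value != reference) as u32) << 4
        | ((value >= reference) as u32) << 5
        | ((value > reference) as u32) << 6
        | 1 << 7
}

/// The low three bits of `wait_info` pick one bit out of the selector word.
pub fn wait_matches(wait_info: u32, value: u32, reference: u32) -> bool {
    (selector(value, reference) >> (wait_info & 7)) & 1 != 0
}
```

For the coherency wait discussed in the hunk, the register already reads 0 and the reference is 0, so `wait_matches(3, 0, 0)` (index 3, Equal) is true. The removed mapping decoded index 3 as `NotEqual`, which corresponds to `wait_matches(4, 0, 0)` and is false, so the wait could not complete.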
--- a/crates/xenia-gpu/src/lib.rs
+++ b/crates/xenia-gpu/src/lib.rs
@@ -34,7 +34,7 @@ pub mod xenos_constants;
 pub use gpu_system::{
    ExecOutcome, GpuBlock, GpuMmio, GpuStats, GpuSystem, InterruptSource, PendingInterrupt,
-    ShaderBlob, SwapNotification, WaitCmp,
+    PHYSICAL_BACKING_BASE, ShaderBlob, SwapNotification, WaitCmp, physical_to_backing,
 };
 pub use handle::{
    DrainReply, GpuBackend, GpuCommand, GpuDigestSnapshot, GpuHandle, GpuWorker,
--- a/crates/xenia-gpu/src/resolve.rs
+++ b/crates/xenia-gpu/src/resolve.rs
@@ -364,7 +364,11 @@ pub fn copy_to_memory(
            // Destination coordinates are 0-based against `dest_base` — the
            // base already points at the top-left of the copy rectangle.
            let dst_off = tiled_2d_offset(dx, dy, pitch_aligned, bpp_log2);
-            let dst_addr = info.dest_base.wrapping_add(dst_off);
+            // `dest_base` is a bare guest *physical* address; project onto the
+            // committed backing window so resolved pixels land where the guest
+            // (and `vd_swap`'s frontbuffer read) actually see them.
+            let dst_addr =
+                crate::gpu_system::physical_to_backing(info.dest_base.wrapping_add(dst_off));
            if info.source_is_64bpp {
                let (lo, hi) = match single_sample_idx {
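To see why the projection matters for a flat address space, here is a small self-contained sketch. It assumes a toy memory in which unwritten addresses read as zero, standing in for a never-committed page. `FlatMem` and `project` are hypothetical names; `project` repeats the range match from `physical_to_backing` so the example compiles without the crate.

```rust
use std::collections::HashMap;

/// Mirrors `PHYSICAL_BACKING_BASE` from the gpu_system.rs hunk.
const BACKING_BASE: u32 = 0x4000_0000;

/// Same range match as `physical_to_backing` in the gpu_system.rs hunk.
pub fn project(addr: u32) -> u32 {
    match addr {
        0x0000_0000..=0x1FFF_FFFF
        | 0x4000_0000..=0x4FFF_FFFF
        | 0xA000_0000..=0xBFFF_FFFF
        | 0xC000_0000..=0xDFFF_FFFF
        | 0xE000_0000..=0xFFFF_FFFF => BACKING_BASE | (addr & 0x1FFF_FFFF),
        _ => addr,
    }
}

/// Toy flat address space: every address is independent, and an address that
/// was never written reads back as zero.
#[derive(Default)]
pub struct FlatMem {
    words: HashMap<u32, u32>,
}

impl FlatMem {
    pub fn write_u32(&mut self, addr: u32, value: u32) {
        self.words.insert(addr, value);
    }

    pub fn read_u32(&self, addr: u32) -> u32 {
        *self.words.get(&addr).unwrap_or(&0)
    }
}
```

If the guest writes a word at `0x4012_3000` and then hands the GPU the bare physical address `0x0012_3000`, a direct `read_u32(0x0012_3000)` on the toy memory returns 0, while `read_u32(project(0x0012_3000))` returns the word that was written. Addresses outside the mirror ranges, such as `0x8200_0000`, pass through `project` unchanged.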
--- a/crates/xenia-kernel/src/exports.rs
+++ b/crates/xenia-kernel/src/exports.rs
@@ -486,12 +486,20 @@ fn ke_query_performance_frequency(ctx: &mut PpcContext, _mem: &GuestMemory, _sta
    ctx.gpr[3] = 50_000_000; // 50 MHz
 }
-fn ke_query_system_time(ctx: &mut PpcContext, mem: &GuestMemory, _state: &mut KernelState) {
+fn ke_query_system_time(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) {
    let time_ptr = ctx.gpr[3] as u32;
    if time_ptr != 0 {
-        let fake_time: u64 = 132_500_000_000_000_000; // ~2021 FILETIME
+        // ITERATE-2J — advance with the same deterministic clock the
-        mem.write_u32(time_ptr, (fake_time >> 32) as u32);
+        // KeTimeStampBundle uses (1 global_clock unit ≈ 100 ns) so a guest
-        mem.write_u32(time_ptr + 4, fake_time as u32);
+        // that polls KeQuerySystemTime for elapsed time also sees forward
+        // progress instead of a frozen constant. FILETIME base (~2021) +
+        // 100-ns-unit clock.
+        const FILETIME_BASE: u64 = 132_500_000_000_000_000;
+        let hw_id = state.scheduler.current_hw_id().unwrap_or(0);
+        let now = state.now_basis_at(hw_id);
+        let system_time = FILETIME_BASE.wrapping_add(now);
+        mem.write_u32(time_ptr, (system_time >> 32) as u32);
+        mem.write_u32(time_ptr + 4, system_time as u32);
    }
 }
@@ -696,9 +704,36 @@ fn mm_create_kernel_stack(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut K
    }
 }
+/// Region-aware guest-virtual → physical translation, matching canary's
+/// `Memory::GetPhysicalAddress` + `PhysicalHeap::GetPhysicalAddress`
+/// (`xenia-canary/src/xenia/memory.cc:528-545` and `:2317-2326`).
+///
+/// Canary `PhysicalHeap::GetPhysicalAddress`:
+/// ```c
+///   address -= heap_base_;
+///   if (heap_base_ >= 0xE0000000) { address += 0x1000; }
+///   return address;
+/// ```
+/// The three physical heap bases (0xA0000000 / 0xC0000000 / 0xE0000000) all
+/// alias the same 512 MB physical window, so `address - heap_base ==
+/// address & 0x1FFFFFFF` for each. The only region-specific delta is the
+/// `+0x1000` host-address-offset for the 0xE0000000+ 4 KB mirror — see
+/// `memory.h:368-372` (`host_address_offset` for `heap_base >= 0xE0000000`).
+/// For non-physical / sub-0x1FFFFFFF virtual addresses canary returns the
+/// address unchanged, which equals `address & 0x1FFFFFFF` there too.
+pub(crate) fn translate_physical_address(virt: u32) -> u32 {
+    let phys = virt & 0x1FFF_FFFF;
+    if virt >= 0xE000_0000 {
+        phys + 0x1000
+    } else {
+        phys
+    }
+}
 fn mm_get_physical_address(ctx: &mut PpcContext, _mem: &GuestMemory, _state: &mut KernelState) {
-    // r3 = virtual address -> return physical address
+    // r3 = virtual address -> return physical address.
-    ctx.gpr[3] &= 0x1FFF_FFFF; // Mask to 512MB physical
+    // Region-aware, mirroring canary (see `translate_physical_address`).
+    ctx.gpr[3] = translate_physical_address(ctx.gpr[3] as u32) as u64;
 }
 fn mm_query_address_protect(ctx: &mut PpcContext, _mem: &GuestMemory, _state: &mut KernelState) {
@@ -1480,20 +1515,35 @@ fn nt_query_information_file(ctx: &mut PpcContext, mem: &GuestMemory, state: &mu
        *size
    };
-    // Root-of-device opens (`game:\`, `cache:\`, `partition0`) strip to
+    // Snapshot what we need from the handle, then drop the borrow so we can
-    // an empty string post-prefix — see `open_vfs_file`'s synth path.
+    // re-resolve the path against the VFS for its real attribute byte.
-    // Games query these as directories (DirectoryObject probe), and
+    let path = path.clone();
-    // reporting `Directory=0` makes Sylpheed treat the open as "found a
-    // non-directory where I expected a directory" and call
-    // `XamShowDirtyDiscErrorUI`. Canary's `NtQueryInformationFile` pulls
-    // the real file-system entry's kind; we key on path shape since we
-    // don't model directory entries.
-    let is_directory = path.is_empty()
-        || path.ends_with('/')
-        || path.ends_with(':');
    let size = live_size;
    let position = *position;
+    // Pull the REAL GDFX attribute byte (canary `disc_image_device.cc:154`)
+    // for disc-backed handles by re-resolving the stored path. Root-of-device
+    // opens (`game:\`, `cache:\`, `partition0`) strip to an empty string and
+    // synth-stub opens have no VFS entry — for those we fall back to the
+    // path-shape heuristic. Games query these as directories (DirectoryObject
+    // probe), and reporting `Directory=0` makes Sylpheed treat the open as
+    // "found a non-directory where I expected a directory" and call
+    // `XamShowDirtyDiscErrorUI`.
+    let vfs_attributes: Option<u32> = if path.is_empty() {
+        None
+    } else {
+        state
+            .vfs
+            .as_ref()
+            .and_then(|vfs| vfs.stat(&path).ok())
+            .map(|e| e.attributes)
+            .filter(|&a| a != 0)
+    };
+    let is_directory = match vfs_attributes {
+        Some(a) => (a & 0x10) != 0,
+        None => path.is_empty() || path.ends_with('/') || path.ends_with(':'),
+    };
    // `FILE_ATTRIBUTE_DIRECTORY` (NT / Xbox) — advertised in
    // `FileNetworkOpenInformation.FileAttributes`; Sylpheed's async-I/O
    // worker queries with class=34 and the calling code checks this bit
@@ -1532,10 +1582,13 @@ fn nt_query_information_file(ctx: &mut PpcContext, mem: &GuestMemory, state: &mu
            }
            mem.write_u64(file_info + 32, size);
            mem.write_u64(file_info + 40, size);
-            let attrs = if is_directory {
+            // Prefer the real GDFX attribute byte; fall back to the
-                FILE_ATTRIBUTE_DIRECTORY
+            // DIRECTORY/NORMAL split for root-of-device and synth-stub
-            } else {
+            // handles that have no VFS entry.
-                FILE_ATTRIBUTE_NORMAL
+            let attrs = match vfs_attributes {
+                Some(a) => a,
+                None if is_directory => FILE_ATTRIBUTE_DIRECTORY,
+                None => FILE_ATTRIBUTE_NORMAL,
            };
            mem.write_u32(file_info + 48, attrs);
            mem.write_u32(file_info + 52, 0); // pad
@@ -1738,7 +1791,18 @@ fn nt_query_full_attributes_file(ctx: &mut PpcContext, mem: &GuestMemory, state:
                mem.write_u32(out + 28, filetime as u32);
                mem.write_u64(out + 32, entry.size);
                mem.write_u64(out + 40, entry.size);
-                let attrs: u32 = if entry.is_directory { 0x10 } else { 0x80 };
+                // Use the REAL GDFX attribute byte forwarded by the VFS
+                // (canary `disc_image_device.cc:154`) instead of a
+                // path-shape guess. Disc rips never carry a 0-attribute
+                // entry, but guard anyway so a synthesised/legacy entry
+                // still advertises a sane DIRECTORY/NORMAL split.
+                let attrs: u32 = if entry.attributes != 0 {
+                    entry.attributes
+                } else if entry.is_directory {
+                    0x10
+                } else {
+                    0x80
+                };
                mem.write_u32(out + 48, attrs);
                mem.write_u32(out + 52, 0);
            }
@@ -1859,6 +1923,7 @@ fn nt_query_directory_file(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut
                    is_directory: e.is_directory,
                    size: e.size,
                    offset: e.offset,
+                    attributes: e.attributes,
                })
            })
            .collect(),
@@ -1909,7 +1974,12 @@ fn nt_query_directory_file(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut
        mem.write_u64(base + 0x20, 0);
        mem.write_u64(base + 0x28, entry.size);
        mem.write_u64(base + 0x30, entry.size);
-        let attrs = if entry.is_directory {
+        // Real GDFX attribute byte (canary `disc_image_device.cc:154`);
+        // fall back to the directory/normal split only for legacy entries
+        // that carry no attribute bits.
+        let attrs = if entry.attributes != 0 {
+            entry.attributes
+        } else if entry.is_directory {
            FILE_ATTRIBUTE_DIRECTORY
        } else {
            FILE_ATTRIBUTE_NORMAL
@@ -1977,14 +2047,29 @@ fn nt_close(ctx: &mut PpcContext, _mem: &GuestMemory, state: &mut KernelState) {
        // so a later scheduler round doesn't try to signal a dead handle.
        // `disarm_timer` is a no-op for non-timer handles.
        state.disarm_timer(handle);
+        // AUDIT-059 R34: return the slot to the recycle FIFO so a later
+        // `alloc_handle` mints the same ID (matching canary's slab).
+        state.release_handle_slot(handle);
    }
    ctx.gpr[3] = 0;
 }
 fn nt_create_event(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) {
-    // r3 = handle_ptr, r4 = obj_attrs, r5 = event_type, r6 = initial_state
+    // r3 = handle_ptr, r4 = obj_attrs, r5 = event_type, r6 = initial_state.
+    //
+    // Xenon DISPATCHER_HEADER `Type` (NT convention):
+    //   0 = NotificationEvent   (manual-reset)
+    //   1 = SynchronizationEvent (auto-reset)
+    // Canary: `xboxkrnl_threading.cc:668` `ev->Initialize(!event_type, !!initial_state)`
+    // with `XEvent::Initialize(bool manual_reset, ...)` (xevent.cc:25) and
+    // `InitializeNative` (xevent.cc:41 `case 0x00: manual_reset_ = true`).
+    // So `manual_reset = (event_type == 0)`. The Ke-path
+    // (`ensure_dispatcher_object`) was already correct; the Nt-path here was
+    // inverted, mis-classifying Sylpheed's per-frame VSync gate (type=1 auto +
+    // initial=1) as manual-reset+signaled → it stayed signaled forever and
+    // tid=1's main loop spun ~2800x canary's 60Hz.
    let handle_ptr = ctx.gpr[3] as u32;
-    let manual_reset = ctx.gpr[5] != 0;
+    let manual_reset = ctx.gpr[5] == 0;
    let signaled = ctx.gpr[6] != 0;
    let handle = state.alloc_handle_for(KernelObject::Event {
        manual_reset,
@@ -1998,6 +2083,9 @@ fn nt_create_event(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelSt
        mem,
        "NtCreateEvent",
    );
+    // ITERATE-2C Phase D — audit-049 auto-signal POC. Env-gated; no-op
+    // when `XENIA_SILPH_UI_AUTOSIGNAL_DELAY` is unset.
+    state.maybe_register_silph_autosignal(handle, ctx, mem);
    if handle_ptr != 0 {
        mem.write_u32(handle_ptr, handle);
    }
@@ -2085,7 +2173,7 @@ fn nt_set_timer_ex(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelSt
    // timebase separately (immutable borrow) before any mutation of the
    // object to keep the borrow-checker happy.
    let hw_id = state.scheduler.current_hw_id().unwrap_or(0);
-    let now = state.scheduler.ctx(hw_id).timebase;
+    let now = state.now_basis_at(hw_id);
    // Read signed i64 due_time (big-endian hi/lo — same pattern as
    // parse_timeout). Negative = relative-from-now, positive = absolute
@@ -3081,13 +3169,18 @@ fn vd_swap(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) {
            // safer to cap the read at the known total size to avoid OOB.
            let mut tiled = Vec::with_capacity(total_tiled_bytes);
            let mut ok = true;
+            // The frontbuffer is a guest *physical* address; project onto the
+            // committed backing window (see `xenia_gpu::physical_to_backing`)
+            // so the present reads the pixels the GPU resolved, not a stale /
+            // zero mirror page.
+            let fb_backing = xenia_gpu::physical_to_backing(swap.frontbuffer_phys);
            for i in 0..total_tiled_bytes {
                // read_u8 is cheap — the VirtualMemory handler returns 0
                // for unmapped pages so we get a recognisable dark frame
                // rather than a crash if the address turned out bogus.
-                let addr = swap.frontbuffer_phys.wrapping_add(i as u32);
+                let addr = fb_backing.wrapping_add(i as u32);
                tiled.push(mem.read_u8(addr));
-                if addr < swap.frontbuffer_phys {
+                if addr < fb_backing {
                    ok = false;
                    break;
                }
@@ -3509,7 +3602,7 @@ pub(crate) fn parse_timeout(state: &KernelState, timeout_ptr: u32, mem: &GuestMe
        return Some(Some(0)); // poll
    }
    let hw_id = state.scheduler.current_hw_id().unwrap_or(0);
-    let now = state.scheduler.ctx(hw_id).timebase;
+    let now = state.now_basis_at(hw_id);
    // Negative = relative, positive = absolute wall-clock. Our timebase is a
    // plain instruction counter, so we treat all timeouts as "time-units
    // after now" regardless of sign, using the magnitude.
@@ -4817,12 +4910,14 @@ mod tests {
                    is_directory: false,
                    size: 0x1000,
                    offset: 0,
+                    attributes: 0x81, // NORMAL | READONLY
                },
                xenia_vfs::VfsEntry {
                    name: "dat".into(),
                    is_directory: true,
                    size: 0,
                    offset: 0,
+                    attributes: 0x11, // DIRECTORY | READONLY
                },
                // A grandchild — must NOT appear in root enumeration.
                xenia_vfs::VfsEntry {
@@ -4830,6 +4925,7 @@ mod tests {
                    is_directory: false,
                    size: 0x2000,
                    offset: 0,
+                    attributes: 0x81,
                },
            ],
        }));
@@ -4856,9 +4952,11 @@ mod tests {
        // NextEntryOffset.
        let mut cursor: u32 = 0;
        let mut names: Vec<String> = Vec::new();
+        let mut attrs: Vec<u32> = Vec::new();
        loop {
            let entry_base = buf + cursor;
            let name_len = mem.read_u32(entry_base + 0x3C) as usize;
+            attrs.push(mem.read_u32(entry_base + 0x38));
            let mut bytes = Vec::with_capacity(name_len);
            for i in 0..name_len as u32 {
                bytes.push(mem.read_u8(entry_base + 0x40 + i));
@@ -4871,6 +4969,12 @@ mod tests {
            cursor += next;
        }
        assert_eq!(names, vec!["default.xex", "dat"]);
        // The real GDFX attribute byte must be forwarded verbatim: the file
        // reports NORMAL|READONLY (no DIRECTORY bit), the directory reports
        // DIRECTORY|READONLY.
        assert_eq!(attrs, vec![0x81, 0x11]);
        assert_eq!(attrs[0] & 0x10, 0, "file must not advertise DIRECTORY");
        assert_ne!(attrs[1] & 0x10, 0, "dir must advertise DIRECTORY");
        // A second call on the same handle must return NO_MORE_FILES —
        // the cursor has advanced past the end.
        ctx.gpr[3] = handle as u64;
@@ -6390,4 +6494,23 @@ mod tests {
        assert!(resolved.ends_with("etc/foo"));
        std::fs::remove_dir_all(&dir).ok();
    }
    /// `MmGetPhysicalAddress` must be region-aware, matching canary's
    /// `PhysicalHeap::GetPhysicalAddress`: the 0xE0000000+ 4 KB mirror gets a
    /// `+0x1000` host-address-offset; every other region is a flat
    /// `& 0x1FFFFFFF` mask.
    #[test]
    fn mm_get_physical_address_region_aware() {
        // 0xE0000000 mirror: canary `address - heap_base (==addr & 0x1FFFFFFF)`
        // then `+ 0x1000`.
        assert_eq!(translate_physical_address(0xE000_0000), 0x0000_1000);
        assert_eq!(translate_physical_address(0xE000_5000), 0x0000_6000);
        assert_eq!(translate_physical_address(0xFFFF_F000), 0x1FFF_F000 + 0x1000);
        // 0xA0000000 / 0xC0000000 physical heaps: flat mask, no offset.
        assert_eq!(translate_physical_address(0xA000_0000), 0x0000_0000);
        assert_eq!(translate_physical_address(0xC012_3000), 0x0012_3000);
        // Already-physical (< 0x20000000): unchanged. Other virtual addresses
        // (e.g. 0x40000000+) take the same flat `& 0x1FFFFFFF` mask.
        assert_eq!(translate_physical_address(0x0012_3000), 0x0012_3000);
        assert_eq!(translate_physical_address(0x4012_3000), 0x0012_3000);
    }
 }
--- a/crates/xenia-kernel/src/state.rs
+++ b/crates/xenia-kernel/src/state.rs
@@ -17,6 +17,16 @@ impl PcrWriter for GuestMemoryPcr<'_> {
        // `GuestMemory::write_u32` takes `&self` post-M2 trait flip; the
        // wrapping `&'a GuestMemory` is sufficient.
        self.0.write_u32(pcr_base + 0x2C, hw_id as u32);
        // PRCB.current_cpu byte at PCR+0x10C (prcb_data@0x100 + current_cpu@0xC).
        // Canary writes `GetFakeCpuNumber(affinity)` here (xthread.cc:847
        // `pcr->prcb_data.current_cpu = cpu_index`), which equals the HW thread
        // id we already compute. Guest spin-barriers (e.g. sub_824D1328, used by
        // the audio/update pump threads at entries 0x824D2878/0x824D2940) index a
        // per-HW-thread occupancy array by `lbz r11, 268(r13)` = this byte. Left
        // unwritten it stayed 0 for every thread, so all threads collided on
        // slot 0 and the multi-thread rendezvous signature never assembled —
        // the pump threads spun forever and never fired their KeSetEvent loops.
        self.0.write_u8(pcr_base + 0x10C, hw_id);
    }
 }
@@ -56,6 +66,18 @@ pub struct KernelState {
    /// publish; observers (the kernel object table) are guarded by
    /// their own synchronization.
    next_handle: std::sync::atomic::AtomicU32,
    /// AUDIT-059 R34: FIFO free list of closed handle slots, mirroring
    /// canary's slab/free-list `ObjectTable`. Without this, ours' bump
    /// allocator monotonically grows so a recycled slot in canary
    /// (e.g. `F8000098` reused 130× per 30s) corresponds to a fresh,
    /// never-reused slot in ours — the kernel-object identity drifts.
    /// Recycling closes that gap and (per AUDIT-042 / R30) may
    /// side-effect-unwedge γ-cluster #2 by letting silph signals land
    /// on the same handle slot the wait registered for. Population is
    /// gated on `KernelState::release_handle_slot` (only IDs in
    /// `[HANDLE_BASE, 0xF000_0000)` are recycled — synthetic XAudio
    /// handles at `0xF000_0000+` are reserved and must never be reused).
    free_handles: std::collections::VecDeque<u32>,
    /// Scheduler managing all emulated HW threads + their per-slot
    /// runqueues. Starts empty — the app installs the initial guest thread
    /// on slot 0 via `KernelState::install_initial_thread` once it has the
@@ -279,6 +301,17 @@ pub struct KernelState {
    /// Settable via `--audit-r3-dump-bytes` /
    /// `XENIA_AUDIT_R3_DUMP_BYTES`.
    pub audit_r3_dump_bytes: Option<u32>,
    /// iterate-2E — diagnostic pointer-chase. `(reg, off)`: on every
    /// `AUDIT-PC-PROBE` fire, treat `gpr[reg]` as a base object pointer,
    /// dump its first 64 bytes, then follow `[base+off]` to a sub-object
    /// (e.g. a stream/file object held in a work item), dump ITS first 64
    /// bytes, then follow `[[base+off]+0]` to the sub-object's vtable and
    /// dump the first 48 u32 slots. Designed to capture the live work-item
    /// + stream object + vtable at `sub_824510E0` entry (r4 = work item,
    /// stream at +36, vtable[28] = the "is-read-done?" predicate) BEFORE
    /// the pool recycles the slot. Read-only; lockstep digest unaffected.
    /// Settable via `XENIA_AUDIT_DEREF=<reg>:<off>` (e.g. `4:36`).
    pub audit_deref: Option<(u8, u32)>,
    /// M12 — diagnostic. PCs at which to emit a structured JSONL record
    /// per fire, designed for diffing against xenia-canary's
    /// `--log_lr_on_pc` patch output. Each line carries
@@ -313,6 +346,42 @@ pub struct KernelState {
    pub silph_synth_handles: [Option<u32>; 4],
    /// AUDIT-2.BF — `ThreadRef` cache for the 4 synthetic workers.
    pub silph_synth_refs: [Option<xenia_cpu::ThreadRef>; 4],
    /// ITERATE-2C Phase D — auto-signal delay for silph::UImpl
    /// `NtCreateEvent` calls (see [`Self::maybe_register_silph_autosignal`]).
    /// `None` = feature disabled; populated once from
    /// `XENIA_SILPH_UI_AUTOSIGNAL_DELAY=<u64>` at construction.
    pub silph_autosignal_delay: Option<u64>,
    /// ITERATE-2C Phase D — pending auto-signal queue. Drained each
    /// outer round by [`Self::fire_due_silph_autosignals`].
    pub silph_autosignal_pending: Vec<AutoSignalPending>,
    /// ITERATE-2C Phase D — most recent `stats.instruction_count`
    /// deposited by the scheduler loop (see
    /// [`Self::set_now_cycle_hint`]). Used by
    /// [`Self::maybe_register_silph_autosignal`] to compute absolute
    /// deadlines, since `nt_create_event` doesn't see `ExecStats`.
    pub last_cycle_hint: u64,
    /// ITERATE-2C Phase D — one-shot diagnostic latch. Flipped by
    /// [`Self::fire_due_silph_autosignals`] on the first visit where
    /// the pending queue is non-empty but no entry is due yet.
    pub silph_autosignal_diag_logged: bool,
    /// ITERATE-2J — guest VA of the `KeTimeStampBundle` block (xboxkrnl
    /// data export ordinal 0x00AD). Set during the import-patch pass in
    /// `xenia-app`. Zero until then. The guest's worker-hub channel
    /// dispatch loop polls `[block+0x10]` (`tick_count`, milliseconds) and
    /// gates dispatch on a `tick_count + 66` deadline; if the block is
    /// never re-written that deadline never elapses and the hub spins
    /// forever (the tid14 0x109c starvation gate). The run loop ticks this
    /// block every round from the deterministic `global_clock` via
    /// [`Self::update_timestamp_bundle`].
    pub timestamp_bundle_addr: u32,
 }
 /// ITERATE-2C Phase D — one queued auto-signal. `deadline_cycle` is
 /// absolute (cycle hint at register time + configured delay).
 #[derive(Debug, Clone, Copy)]
 pub struct AutoSignalPending {
    pub handle: u32,
    pub deadline_cycle: u64,
 }
 impl KernelState {
@@ -338,6 +407,7 @@ impl KernelState {
        let mut state = Self {
            exports: HashMap::new(),
            next_handle: AtomicU32::new(0x1000),
            free_handles: std::collections::VecDeque::new(),
            scheduler,
            next_tls_index: AtomicU32::new(0),
            cs_waiters: HashMap::new(),
@@ -379,6 +449,7 @@ impl KernelState {
            audit_pc_probe_pcs: std::collections::HashSet::new(),
            audit_mem_read_addr: None,
            audit_r3_dump_bytes: None,
            audit_deref: None,
            lr_trace_pcs: std::collections::HashSet::new(),
            lr_trace_writer: None,
            dump_addrs: Vec::new(),
@@ -387,6 +458,13 @@ impl KernelState {
            silph_synth_ctx: 0,
            silph_synth_handles: [None; 4],
            silph_synth_refs: [None; 4],
            silph_autosignal_delay: std::env::var("XENIA_SILPH_UI_AUTOSIGNAL_DELAY")
                .ok()
                .and_then(|v| v.parse::<u64>().ok()),
            silph_autosignal_pending: Vec::new(),
            last_cycle_hint: 0,
            silph_autosignal_diag_logged: false,
            timestamp_bundle_addr: 0,
        };
        crate::exports::register_exports(&mut state);
        crate::xam::register_exports(&mut state);
@@ -660,12 +738,39 @@ impl KernelState {
    }
    pub fn alloc_handle(&mut self) -> u32 {
        // AUDIT-059 R34: prefer recycling a closed slot (FIFO, matching
        // canary's `ObjectTable` slab) before bumping. The Arc<Mutex<
        // KernelState>> already serializes us; no extra synchronization.
        if let Some(slot) = self.free_handles.pop_front() {
            return slot;
        }
        // M2.4: lock-free fetch_add. Relaxed is sufficient — IDs are
        // opaque tokens; no payload is sequenced against the counter.
        self.next_handle
            .fetch_add(4, std::sync::atomic::Ordering::Relaxed)
    }
    /// AUDIT-059 R34. Return a freshly-closed handle slot to the FIFO
    /// recycle queue. No-op for the synthetic XAudio range (`>= 0xF000_0000`,
    /// AUDIT-048) and the reserved `< 0x1000` band. Call site: `nt_close`'s
    /// `objects.remove` branch when refcount reaches zero.
    ///
    /// ABA guard (subsystem-audit 2026-06-12): never recycle a slot that a
    /// thread is still parked on. Without this, a closed slot could be
    /// re-minted for a new object and a signal on that new object would wake
    /// the stale waiter that was blocked on the OLD object at the same slot.
    /// Such a slot is simply leaked (it stays out of `free_handles`),
    /// reproducing the pre-R34 bump-only behaviour for that rare case.
    pub fn release_handle_slot(&mut self, handle: u32) {
        if handle < 0x1000 || handle >= 0xF000_0000 {
            return;
        }
        if self.scheduler.any_thread_waiting_on(handle) {
            return;
        }
        self.free_handles.push_back(handle);
    }
    pub fn alloc_handle_for(&mut self, obj: KernelObject) -> u32 {
        let h = self.alloc_handle();
        self.objects.insert(h, obj);
@@ -770,6 +875,173 @@ impl KernelState {
        self.audit.record_wake(handle, entry);
    }
    /// ITERATE-2C Phase D — deposit the latest scheduler instruction
    /// count so `nt_create_event` can compute absolute auto-signal
    /// deadlines. Called once per outer round from the app's
    /// `coord_pre_round` site. The store is unconditional; when the feature
    /// env is unset the auto-signal path simply never consumes the value.
    pub fn set_now_cycle_hint(&mut self, now_cycle: u64) {
        self.last_cycle_hint = now_cycle;
    }
    /// ITERATE-2J — tick the `KeTimeStampBundle` block (xboxkrnl ordinal
    /// 0x00AD) from the deterministic monotonic clock so the guest sees a
    /// clock that *advances*.
    ///
    /// `clock` is the scheduler's `global_clock` — a pure function of
    /// retired guest instructions (see [`Self::now_basis_at`] /
    /// `Scheduler::global_clock`). Lockstep floors it up to
    /// `stats.instruction_count` each round; parallel sums per-block
    /// retired counts. Using it (rather than wall-clock) keeps every
    /// guest-visible time value a deterministic function of guest progress,
    /// so lockstep stays byte-reproducible.
    ///
    /// ## Cadence
    /// The existing kernel time math (`parse_timeout` in `exports.rs`)
    /// already treats **1 `global_clock` unit ≈ 100 ns**: it converts a
    /// signed 100-ns `LARGE_INTEGER` timeout to a deadline by dividing the
    /// magnitude by 100 and adding it to `now` (= `global_clock`). To stay
    /// coherent with that, this method uses the same scale:
    ///
    /// * `interrupt_time` / `system_time` (100-ns units): `clock` (with a
    ///   FILETIME epoch base added to `system_time`).
    /// * `tick_count` (milliseconds): `clock / INSTRUCTIONS_PER_MS` where
    ///   `INSTRUCTIONS_PER_MS = 10_000` (10_000 × 100 ns = 1 ms).
    ///
    /// At 10_000 clock-units/ms, the guest's `tick_count + 66` ms hub
    /// deadline elapses by ~660_000 retired instructions — very early in a
    /// ~1 B-instruction boot — while a 16 ms `KeWait` timeout
    /// (`parse_timeout`: 160_000 units) still resolves to 16 ms of
    /// tick_count, so no timeout collapses to "instant". The two readers
    /// share one scale.
    pub fn update_timestamp_bundle(&self, mem: &GuestMemory, clock: u64) {
        let block = self.timestamp_bundle_addr;
        if block == 0 {
            return;
        }
        const INSTRUCTIONS_PER_MS: u64 = 10_000;
        // FILETIME epoch base (~2021) so `system_time` is a plausible
        // absolute wall-clock; matches the constant used by
        // `ke_query_system_time`. interrupt_time is "since boot" so it
        // starts at the clock origin (no epoch offset).
        const FILETIME_BASE: u64 = 132_500_000_000_000_000;
        let interrupt_time: u64 = clock;
        let system_time: u64 = FILETIME_BASE.wrapping_add(clock);
        let tick_count: u32 = (clock / INSTRUCTIONS_PER_MS) as u32;
        // BE writes (write_u64/write_u32 use to_be_bytes) — guest is BE.
        mem.write_u64(block, interrupt_time); // +0x00 interrupt_time
        mem.write_u64(block + 0x08, system_time); // +0x08 system_time
        mem.write_u32(block + 0x10, tick_count); // +0x10 tick_count (ms)
        mem.write_u32(block + 0x14, 0); // +0x14 padding
    }
    /// ITERATE-2C Phase D — register a freshly-allocated event for
    /// auto-signal after the configured delay, **iff** the creating
    /// thread matches the silph::UImpl tid=13 chain that wedges in
    /// audit-049. Filter:
    ///
    /// * Env `XENIA_SILPH_UI_AUTOSIGNAL_DELAY` set (= delay non-None)
    /// * Frame-1 LR (the guest caller's post-bl PC, walked one step up
    ///   from the live thunk-wrapper frame) is in
    ///   `[0x821CB15C, 0x821CB160]` — this is the `NtCreateEvent` call
    ///   site inside `sub_821CB030+0x128`. The live `ctx.lr` is the
    ///   thunk wrapper's return slot (e.g. `0x824a9f6c`), so we walk
    ///   one back-chain step to reach the actual guest caller.
    /// * Creating thread's `start_entry == 0x821748F0` (silph trampoline)
    /// * Creating thread's `start_context == 0x4024a840`
    ///
    /// On match, the handle is queued with `deadline = last_cycle_hint +
    /// delay`. Drained by [`Self::fire_due_silph_autosignals`] from the
    /// outer scheduler loop.
    pub fn maybe_register_silph_autosignal(
        &mut self,
        handle: u32,
        ctx: &PpcContext,
        mem: &GuestMemory,
    ) {
        let Some(delay) = self.silph_autosignal_delay else {
            return;
        };
        let Some((entry, start_ctx)) = self.scheduler.current_thread_entry_and_ctx() else {
            return;
        };
        if entry != 0x821748F0 || start_ctx != 0x4024_a840 {
            return;
        }
        let frames = walk_guest_back_chain(ctx.gpr[1] as u32, ctx.lr as u32, mem, 2);
        let caller_lr = match frames.get(1) {
            Some((_, lr)) => *lr,
            None => return,
        };
        if !(0x821CB15C..=0x821CB160).contains(&caller_lr) {
            return;
        }
        let deadline = self.last_cycle_hint.saturating_add(delay);
        self.silph_autosignal_pending
            .push(AutoSignalPending { handle, deadline_cycle: deadline });
        tracing::info!(
            "silph autosignal: scheduled handle={:#x} caller_lr={:#x} for cycle {} (now={}, delay={})",
            handle,
            caller_lr,
            deadline,
            self.last_cycle_hint,
            delay,
        );
    }
    /// ITERATE-2C Phase D — drain pending entries whose deadline has
    /// passed. Each fires by setting `Event { signaled = true }` and
    /// invoking the existing `wake_eligible_waiters` to release blocked
    /// waiters. No-op when the queue is empty (the common case).
    pub fn fire_due_silph_autosignals(&mut self, now_cycle: u64) {
        if self.silph_autosignal_pending.is_empty() {
            return;
        }
        let any_due = self
            .silph_autosignal_pending
            .iter()
            .any(|p| p.deadline_cycle <= now_cycle);
        if !any_due {
            // Diagnostic for the Phase D POC: log the first visit that finds
            // a non-empty queue but nothing due yet.
            if !self.silph_autosignal_diag_logged {
                self.silph_autosignal_diag_logged = true;
                if let Some(first) = self.silph_autosignal_pending.first() {
                    tracing::info!(
                        "silph autosignal: tick (first visit, none due) now={} pending={} first_deadline={}",
                        now_cycle,
                        self.silph_autosignal_pending.len(),
                        first.deadline_cycle,
                    );
                }
            }
        }
        let mut i = 0;
        while i < self.silph_autosignal_pending.len() {
            if self.silph_autosignal_pending[i].deadline_cycle <= now_cycle {
                let p = self.silph_autosignal_pending.swap_remove(i);
                let prev = match self.objects.get_mut(&p.handle) {
                    Some(KernelObject::Event { signaled, .. }) => {
                        let was = *signaled;
                        *signaled = true;
                        Some(was)
                    }
                    _ => None,
                };
                tracing::info!(
                    "silph autosignal: firing handle={:#x} prev_signaled={:?} at cycle {}",
                    p.handle,
                    prev,
                    now_cycle,
                );
                self.audit_signal(p.handle, 0, "silph_autosignal", prev.unwrap_or(false) as u64);
                crate::exports::wake_eligible_waiters(self, p.handle);
                // do not advance i — swap_remove pulled a new entry into i
            } else {
                i += 1;
            }
        }
    }
    /// Diagnostic. If the live PC for HW slot `hw_id` is in
    /// `self.ctor_probe_pcs`, emit a single `CTOR-PROBE` line with
    /// the current cycle, tid, hw_id, sp, r3, lr, plus an 8-frame
@@ -936,6 +1208,38 @@ impl KernelState {
            }
            println!("{}", out);
        }
        // iterate-2E — pointer-chase: dump base object (gpr[reg]), the
        // sub-object it holds at [base+off], and that sub-object's vtable
        // slots. Captures the live work-item + stream + vtable[28] at
        // sub_824510E0 before the pool recycles the slot. Read-only.
        if let Some((reg, deref_off)) = self.audit_deref {
            use std::fmt::Write as _;
            let base = ctx.gpr[reg as usize] as u32;
            let dump64 = |label: &str, p: u32| {
                let mut s = String::with_capacity(256);
                let _ = write!(&mut s, "AUDIT-DEREF {} ptr={:#010x}", label, p);
                let mut o: u32 = 0;
                while o < 64 {
                    let _ = write!(&mut s, " +0x{:02x}={:#010x}", o, mem.read_u32(p.wrapping_add(o)));
                    o += 4;
                }
                println!("{}", s);
            };
            println!("AUDIT-DEREF-HEAD pc={:#010x} tid={} cycle={} reg=r{} off=0x{:x}", pc, tid, cycle, reg, deref_off);
            dump64("item", base);
            let sub = mem.read_u32(base.wrapping_add(deref_off));
            dump64("sub", sub);
            let vt = mem.read_u32(sub); // [sub+0] = vtable
            // Dump 48 vtable slots so slot 28 (+0x70) and slot 36 (+0x90) show.
            let mut s = String::with_capacity(512);
            let _ = write!(&mut s, "AUDIT-DEREF vtable={:#010x}", vt);
            let mut slot: u32 = 0;
            while slot < 48 {
                let _ = write!(&mut s, " [{}]={:#010x}", slot, mem.read_u32(vt.wrapping_add(slot * 4)));
                slot += 1;
            }
            println!("{}", s);
        }
    }
    /// M12 — diagnostic. If the live PC for HW slot `hw_id` is in
@@ -1063,6 +1367,30 @@ impl KernelState {
        self.pending_timer_fires.first().map(|&(d, _)| d)
    }
    /// Coherent "now" basis for deadline arithmetic — the scheduler's
    /// single monotonic `global_clock`, in BOTH execution modes.
    ///
    /// Per-thread `ctx(hw_id).timebase` is NOT a sound "now" for deadline
    /// arithmetic: in `--parallel` workers extract/zero their slots while
    /// stepping unlocked, and in **lockstep** a parked/poll thread has
    /// `running_idx == None` so `ctx()` returns `idle_ctx` (timebase 0).
    /// Either way a `parse_timeout` reading the per-thread basis can see 0
    /// (or a stale value) and register `deadline = 0 + relative`, a value
    /// permanently in the past, which `coord_idle_advance` then re-arms
    /// forever (the timebase-desync livelock; the render-gate root). The
    /// `global_clock` is a deterministic function of retired guest
    /// instructions (per-round `stats.instruction_count` floor-ups in
    /// lockstep, per-block retired counts in parallel), so it is coherent,
    /// monotonic, never zero after boot, and bit-reproducible across two
    /// cold lockstep runs.
    ///
    /// The `hw_id` argument is retained for call-site clarity (which slot a
    /// caller would conceptually be "asking about") but is no longer read —
    /// the basis is global.
    pub fn now_basis_at(&self, _hw_id: u8) -> u64 {
        self.scheduler.global_clock()
    }
    /// Fire every timer whose deadline is `<= now` (derived from the global
    /// `now_basis_at` clock, the same basis `parse_timeout` uses).
    /// For each fire: mark the timer `signaled=true`, clear its
@@ -1071,7 +1399,7 @@ impl KernelState {
    /// fired — the caller uses this to decide whether the scheduler round
    /// needs a follow-up `advance_to_next_wake_if_due` step.
    pub fn fire_due_timers(&mut self) -> bool {
-        let now = self.scheduler.ctx(0).timebase;
+        let now = self.now_basis_at(0);
        let mut fired = false;
        loop {
            let Some(&(deadline, handle)) = self.pending_timer_fires.first() else {
--- a/crates/xenia-kernel/src/thread.rs
+++ b/crates/xenia-kernel/src/thread.rs
@@ -57,6 +57,11 @@ pub fn allocate_thread_image(
    mem.write_u32(pcr_base, tls_base);
    mem.write_u32(pcr_base + 0x2C, hw_thread_id as u32);
    mem.write_u32(pcr_base + 0x100, 0x1000);
    // +0x10C  prcb_data.current_cpu — canary `pcr->prcb_data.current_cpu`
    //         (PRCB@0x100 + current_cpu@0xC). Guest spin-barriers index a
    //         per-HW-thread slot array by `lbz r11, 268(r13)` = this byte; it
    //         must equal the HW thread id (== PCR+0x2C). See state.rs PcrWriter.
    mem.write_u8(pcr_base + 0x10C, hw_thread_id);
    mem.write_u32(pcr_base + 0x150, 0);
    Some(ThreadImage {
--- a/crates/xenia-vfs/src/device.rs
+++ b/crates/xenia-vfs/src/device.rs
@@ -31,6 +31,9 @@ impl VfsDevice for HostPathDevice {
                is_directory: metadata.is_dir(),
                size: metadata.len(),
                offset: 0,
                // Host FS carries no Xbox attribute byte; synthesise the
                // DIRECTORY/NORMAL split like canary's HostPathDevice.
                attributes: if metadata.is_dir() { 0x10 } else { 0x80 },
            });
        }
        Ok(entries)
@@ -49,6 +52,7 @@ impl VfsDevice for HostPathDevice {
            is_directory: metadata.is_dir(),
            size: metadata.len(),
            offset: 0,
            attributes: if metadata.is_dir() { 0x10 } else { 0x80 },
        })
    }
 }
--- a/crates/xenia-vfs/src/disc_image.rs
+++ b/crates/xenia-vfs/src/disc_image.rs
@@ -29,6 +29,11 @@ const GDFX_MAGIC: &[u8; 20] = b"MICROSOFT*XBOX*MEDIA";
 /// File attribute: directory
 const FILE_ATTRIBUTE_DIRECTORY: u8 = 0x10;
 /// File attribute: read-only. Canary OR's this into every GDFX entry's
 /// attribute byte because a pressed disc is inherently read-only
 /// (`disc_image_device.cc:154`: `attributes | kFileAttributeReadOnly`).
 const FILE_ATTRIBUTE_READONLY: u8 = 0x01;
 /// Known game partition offsets to try
 const LIKELY_OFFSETS: &[u64] = &[
    0x0000_0000,
@@ -131,6 +136,11 @@ impl DiscImageDevice {
        let name = String::from_utf8_lossy(&buffer[p + 14..p + 14 + name_length]).to_string();
        let is_directory = (attributes & FILE_ATTRIBUTE_DIRECTORY) != 0;
        // Match canary: the on-disc attribute byte (DIRECTORY/HIDDEN/SYSTEM/
        // ARCHIVE/NORMAL bits as authored) OR the implicit READONLY bit for
        // pressed media. We forward the FULL byte, not a path-shape guess, so
        // attribute queries report exactly what the disc records.
        let attributes = (attributes | FILE_ATTRIBUTE_READONLY) as u32;
        let file_offset = self.game_offset + sector * SECTOR_SIZE;
        let full_path = if prefix.is_empty() {
            name.clone()
@@ -143,6 +153,7 @@ impl DiscImageDevice {
            is_directory,
            size: length,
            offset: file_offset,
            attributes,
        });
        // Descend into subdirectories. Zero-length directory entries exist
@@ -260,4 +271,73 @@ mod tests {
            .expect("read_file on nested path");
        assert!(!bytes.is_empty(), "nested read returned empty buffer");
    }
    /// Build a one-node GDFX directory buffer in memory and parse it with
    /// `collect_entries`, asserting the real on-disc attribute byte is
    /// forwarded into `VfsEntry.attributes` (with READONLY OR'd in, matching
    /// canary `disc_image_device.cc:154`) rather than synthesised from the
    /// path shape.
    fn parse_single_entry(name: &str, on_disc_attr: u8) -> VfsEntry {
        // GDFX dirent: node_l(u16) node_r(u16) sector(u32) length(u32)
        // attributes(u8) name_length(u8) name(bytes). The directory bit
        // gates subdirectory descent; use length=0 so a "directory" entry
        // is treated as an empty leaf and we don't recurse off the buffer.
        let mut buf = Vec::new();
        buf.extend_from_slice(&0u16.to_le_bytes()); // node_l
        buf.extend_from_slice(&0u16.to_le_bytes()); // node_r
        buf.extend_from_slice(&0u32.to_le_bytes()); // sector
        buf.extend_from_slice(&0u32.to_le_bytes()); // length (0 => leaf)
        buf.push(on_disc_attr); // attributes
        buf.push(name.len() as u8); // name_length
        buf.extend_from_slice(name.as_bytes());
        let mut dev = DiscImageDevice {
            name: "test".into(),
            path: std::path::PathBuf::new(),
            game_offset: 0,
            entries: Vec::new(),
        };
        // `file` is only touched when descending into a non-empty directory;
        // our length=0 entries never recurse, so a dummy handle is fine.
        let mut file = std::fs::File::open("/dev/null").expect("open /dev/null");
        dev.collect_entries(&mut file, &buf, 0, "").expect("parse");
        assert_eq!(dev.entries.len(), 1);
        dev.entries.into_iter().next().unwrap()
    }
    #[test]
    fn directory_entry_reports_directory_attribute() {
        // On-disc 0x10 (DIRECTORY) -> attributes carries 0x10 and READONLY.
        let e = parse_single_entry("dat", FILE_ATTRIBUTE_DIRECTORY);
        assert!(e.is_directory, "directory bit not decoded");
        assert_ne!(
            e.attributes & 0x10,
            0,
            "FILE_ATTRIBUTE_DIRECTORY must be set for a directory entry"
        );
        assert_ne!(e.attributes & 0x01, 0, "READONLY must be OR'd in (canary)");
    }
    #[test]
    fn file_entry_has_no_directory_attribute() {
        // On-disc 0x80 (NORMAL) -> not a directory; READONLY still OR'd in.
        let e = parse_single_entry("default.xex", 0x80);
        assert!(!e.is_directory, "non-directory misdecoded as directory");
        assert_eq!(
            e.attributes & 0x10,
            0,
            "FILE_ATTRIBUTE_DIRECTORY must be clear for a file entry"
        );
        assert_ne!(e.attributes & 0x80, 0, "NORMAL bit must be preserved");
        assert_ne!(e.attributes & 0x01, 0, "READONLY must be OR'd in (canary)");
    }
    #[test]
    fn archive_and_hidden_bits_are_preserved() {
        // ARCHIVE(0x20) | HIDDEN(0x02) authored on disc must survive intact.
        let e = parse_single_entry("save.dat", 0x20 | 0x02);
        assert_eq!(e.attributes & 0x20, 0x20, "ARCHIVE bit dropped");
        assert_eq!(e.attributes & 0x02, 0x02, "HIDDEN bit dropped");
        assert_eq!(e.attributes & 0x10, 0, "spurious DIRECTORY bit");
    }
 }
--- a/crates/xenia-vfs/src/lib.rs
+++ b/crates/xenia-vfs/src/lib.rs
@@ -22,6 +22,16 @@ pub struct VfsEntry {
    pub is_directory: bool,
    pub size: u64,
    pub offset: u64,
    /// Xbox `FILE_ATTRIBUTE_*` bitmask for this entry, sourced from the
    /// backing device's real on-disc metadata rather than inferred from
    /// the path shape. For GDFX disc images this is the on-disc attribute
    /// byte at dirent offset +12 OR'd with `FILE_ATTRIBUTE_READONLY`
    /// (matches xenia-canary `disc_image_device.cc:154`:
    /// `entry->attributes_ = attributes | kFileAttributeReadOnly`).
    ///
    /// Bit layout (canary `vfs/entry.h:66-76`): READONLY=0x01, HIDDEN=0x02,
    /// SYSTEM=0x04, DIRECTORY=0x10, ARCHIVE=0x20, NORMAL=0x80.
    pub attributes: u32,
 }
 /// Trait for VFS device implementations (XISO, STFS, host path, etc.)