185 Commits

Author SHA1 Message Date
MechaCat02
f75bc96d17 [iterate-2H] PPC spin/yield/sync hint-class audit: lock no-over-yield + barrier-decode invariants
Audited the full PowerPC spin/yield/sync/SMT-priority-hint instruction class
against the canary oracle (ppc_emit_alu.cc InstrEmit_orx / ppc_emit_memory.cc
sync/eieio/isync) and against what Project Sylpheed actually executes (static
scan of the extracted image + disasm of the spin sites 0x824D1328 /
0x824C17AC / 0x824D3CF8).

Findings (no behavior change required — the class is already faithful):
  - or rX,rX,rX SMT priority hints: canary special-cases EXACTLY 0x7FFFFB78
    (db16cyc) -> DelayExecution; every OTHER or-self form -> Nop. Ours already
    matches (only 0x7FFFFB78 yields). Image scan: the documented priority
    hints or 1/2/3/6/26..30 do NOT appear in Sylpheed at all; the only SMT
    spin hint used is or 31,31,31 (db16cyc), already handled in de21c7a. The
    854 `or 8,8,8` etc. are compiler register self-moves (plain no-ops), not
    spin hints.
  - sync / lwsync / ptesync share XO=598 -> all decode to PpcOpcode::sync
    (canary keys on XO only, identical); eieio (XO=854), isync (XO=150) decode
    correctly. All are value-neutral no-ops under the single-host model,
    matching canary MemoryBarrier/Nop. unimpl=0 in a 200M run confirms none
    trap. tlbsync is not implemented by canary either and is unused by Sylpheed.
  - mftb-based timed back-off (loop at 0x824D3CF8: mftb delta vs timeout, with
    db16cyc between polls and a timeout escape) relies on the already-landed
    db16cyc yield + coherent global-clock timebase; no deadlock, no new gap.
  - ori 0,0,0 canonical nop (140 sites) is value-neutral; matches canary Nop.
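
The decode rules audited above can be sketched as a small classifier. This is an illustration only, not the interpreter's code: `HintClass` and `classify_hint` are hypothetical names, and the only facts taken from the findings are the db16cyc encoding 0x7FFFFB78, the XO values 598/854/150, and the rule that every other or-self form is a plain no-op.

```rust
/// What an interpreter does with a hint/barrier-class instruction word.
/// (Hypothetical type, for illustration.)
#[derive(Debug, PartialEq, Eq)]
pub enum HintClass {
    /// db16cyc (`or r31,r31,r31`): the only form that yields.
    Yield,
    /// Value-neutral: other or-self forms, `ori 0,0,0`, barriers.
    Nop,
    /// Not part of the audited hint class.
    Other,
}

const DB16CYC: u32 = 0x7FFF_FB78; // or r31,r31,r31
const ORI_NOP: u32 = 0x6000_0000; // ori 0,0,0

/// Primary opcode: the top 6 bits of the instruction word.
fn primary(instr: u32) -> u32 {
    instr >> 26
}

/// Extended opcode of an X/XL-form instruction (10 bits above the Rc bit).
fn xo(instr: u32) -> u32 {
    (instr >> 1) & 0x3FF
}

pub fn classify_hint(instr: u32) -> HintClass {
    if instr == DB16CYC {
        return HintClass::Yield;
    }
    if instr == ORI_NOP {
        return HintClass::Nop;
    }
    match (primary(instr), xo(instr)) {
        // or rA,rS,rB: a no-op only when all three registers match and Rc=0.
        (31, 444) => {
            let rs = (instr >> 21) & 31;
            let ra = (instr >> 16) & 31;
            let rb = (instr >> 11) & 31;
            if rs == ra && ra == rb && instr & 1 == 0 {
                HintClass::Nop
            } else {
                HintClass::Other
            }
        }
        // sync/lwsync/ptesync differ only in the L field, which is ignored.
        (31, 598) => HintClass::Nop,
        (31, 854) => HintClass::Nop, // eieio
        (19, 150) => HintClass::Nop, // isync
        _ => HintClass::Other,
    }
}
```

For example, `classify_hint(0x7D084378)` (`or 8,8,8`) falls in the no-op arm, while only the exact db16cyc word yields.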

Lands two regression tests that lock the audited invariants so a future change
cannot over-yield on a benign priority hint (which would perturb the
deterministic schedule) or break the sync L-field decode:
  - test_smt_priority_hints_are_nops_not_yields
  - test_lwsync_ptesync_eieio_isync_decode_as_benign_noops

Determinism preserved (tests-only): two cold lockstep `check -n 5M` runs (no
persist) byte-identical; golden digest unchanged (no re-baseline). Full
workspace suite green. 200M cascade unchanged (packets~172M, draws=0,
shaders=0, swaps=1) — confirms the hint class is exhausted; the render gate is
now downstream (tid14 0x109c per-job completion event), not CPU semantics.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 10:53:54 +02:00
MechaCat02
de21c7a544 [iterate-2G] db16cyc spin-hint cooperative yield: unblock title-screen 0x10a0 gate
The silph title state machine (tid13) blocked on event 0x10a0, never signaled.
Root: the event's producer chain runs on the silph worker (entry 0x821C4AD0,
our tid14), which was starved. tid14 shares a HW slot with a guest spinlock/
barrier participant (sub_824D1328, entry 0x824D2940) that busy-spins on the
db16cyc hint `or r31,r31,r31` (encoding 0x7FFFFB78) at 0x824D140C. Under our
round-robin lockstep the spinner consumed its whole block every round and
starved the co-located tid14 (only 9 progress hits over 200M instr) — so the
producer never reached the event-create/duplicate/signal dance the canary
oracle performs (handle F80000E8 set by the submitter F8000044 via a duplicated
handle).

Fix (canary-faithful): recognize the db16cyc spin hint exactly as canary's
InstrEmit_orx does (code 0x7FFFFB78 -> DelayExecution) and surface it as a new
StepResult::Yield. The scheduler's yield_current() promotes every Ready peer on
the slot past STARVE_LIMIT so begin_slot_visit picks one next round, then they
reset and the spinner reclaims the slot — fair alternation, no priority
inversion, pure function of slot state (deterministic).

Result (lockstep, cache-persist, -n 200M): tid14 progresses past its old stall
into a real wait; tid13 advances off 0x10a0 to a new event; hub/submitter
re-enter their wait loops. imports 280k->592k, packets 124M->164M, swaps 1->2.
draws still 0 (the splash's first draw is a further-upstream gate).

Determinism preserved (two cold n50m runs byte-identical). n50m golden
re-baselined (imports 90296->339766, swaps 1->2; draws unchanged 0). n2m
golden unchanged (db16cyc not reached in first 2M). Tests 670/670.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 10:38:17 +02:00
MechaCat02
f3b7e8b760 [iterate-2F] Scheduler anti-starvation floor: fix job-4 handoff render gate
The lockstep scheduler's pick_runnable is strict priority
(max_by_key (priority, -idx)). On a cooperative single-host HW slot,
a CPU-bound spinner that never blocks (the silph poll loop pinned by
affinity to hw=5) wins pick_runnable every round forever, permanently
starving a co-located peer (the submitter, tid6) that the spinner is
actually waiting on. On real hardware those threads run on separate SMT
contexts concurrently, so the spinner never starves the submitter; ours
collapses them onto one slot with no anti-starvation, turning priority
(or equal-priority index order) into permanent starvation.

The starved submitter never dequeued job-4 -> the worker-hub (tid5)
blocked INFINITE on completion event 0x1080 -> silph (tid13) wedged on
0x1078 -> no vsync -> draws_seen=0, the publisher splash never renders.
(decrement_quantum's within-slot rotation is dead: begin_slot_visit
unconditionally re-pick_runnable()s each round, discarding the rotated
running_idx. The fix is therefore evaluated at pick time, not via that
discarded rotation.)

Fix (Option A, bounded anti-starvation, deterministic):
- Add per-thread steps_starved counter to GuestThread.
- begin_slot_visit increments it for every Ready peer passed over this
  visit, resets it to 0 for the picked thread.
- pick_runnable selects by effective_priority: once steps_starved
  reaches STARVE_LIMIT (4096) the thread is lifted to i32::MAX and wins
  exactly one pick, then resets. The genuinely higher-priority thread
  still wins ~4095/4096 visits -- the boost grants periodic forward
  progress only, it does NOT invert priority. Pure function of
  counter/priority/index -> deterministic (no wall-clock, no RNG).
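
The three bullets above can be sketched as follows. This is a simplified model, not the scheduler's code: `GuestThread` is reduced to the fields the fix mentions, and `ready` is a hypothetical stand-in for the real thread state.

```rust
use std::cmp::Reverse;

const STARVE_LIMIT: u32 = 4096;

#[derive(Debug, Clone)]
pub struct GuestThread {
    pub priority: i32,
    pub ready: bool,
    pub steps_starved: u32,
}

/// A thread that has been passed over STARVE_LIMIT times outranks everyone.
fn effective_priority(t: &GuestThread) -> i32 {
    if t.steps_starved >= STARVE_LIMIT {
        i32::MAX
    } else {
        t.priority
    }
}

/// Highest effective priority wins; the lowest index breaks ties.
pub fn pick_runnable(threads: &[GuestThread]) -> Option<usize> {
    threads
        .iter()
        .enumerate()
        .filter(|&(_, t)| t.ready)
        .max_by_key(|&(idx, t)| (effective_priority(t), Reverse(idx)))
        .map(|(idx, _)| idx)
}

/// Pick a thread, reset its counter, and charge every Ready peer that was
/// passed over on this visit.
pub fn begin_slot_visit(threads: &mut [GuestThread]) -> Option<usize> {
    let picked = pick_runnable(threads)?;
    for (idx, t) in threads.iter_mut().enumerate() {
        if idx == picked {
            t.steps_starved = 0;
        } else if t.ready {
            t.steps_starved = t.steps_starved.saturating_add(1);
        }
    }
    Some(picked)
}
```

With two Ready threads of priority 10 and 5, the higher-priority thread wins every visit until the other's counter reaches the limit, loses exactly one pick, then wins again.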

Cascade (lockstep exec, XENIA_CACHE_PERSIST=1, -n 200M):
- submitter dequeue sub_82458508 now fires 4x (was 3x); the 4th job
  (buf 0x40baa2c0) is dequeued at cycle 6.15M.
- hub tid5 leaves Blocked(0x1080) -> now Ready (no more INFINITE wait).
- GPU packets 0 -> 116,101,363 (command stream now flowing).
- tid13 (silph::UImpl) advances past the old 0x1078 wedge to a NEW
  downstream wait (handle 0x10a0); 3 new threads spawn (tid14/15/16).
- draws_seen still 0 -> the splash's first draw is a NEW downstream gate,
  not this starvation.

Determinism: two cold lockstep `check -n 5M` runs byte-identical (full
and stable digests). New n50m stable digest deterministic across two
cold runs. Golden re-baselined: instructions 50000007->50000003,
imports 92317->90296 (trajectory shift from the changed pick order).

Tests: 666/666 (+1 test_anti_starvation_bounded_progress).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 10:02:02 +02:00
MechaCat02
7e2603a9e5 [iterate-2E] Extend coherent monotonic clock to lockstep (timebase-desync livelock fix)
Lockstep livelocked the scheduler the same way --parallel did before
0332d19: the kernel deadline-arithmetic (`now_basis_at`) read per-thread
`ctx(hw_id).timebase`, but a parked/poll thread has `running_idx == None`
so `Scheduler::ctx()` returns `idle_ctx` (timebase 0). A poll thread (tid=7,
a `KeWaitForSingleObject` loop with a 30ms relative timeout) computing its
deadline via `parse_timeout` therefore read `now = 0` and registered
`deadline = 0 + 3000 = 3000` — a constant ~7.78M units in the past.
`coord_idle_advance` then re-armed that same constant 3000 deadline forever,
pinning virtual time and starving every other thread's real future deadline.

Render-gate impact: the submitter (tid=6) re-enters a 16ms-timeout
WaitForMultiple after its first jobs; that timeout never fired because vtime
was pinned at 3000, so virtual time never reached real future deadlines.

Fix (Option A — mirror the parallel fix): drive the existing deterministic
`Scheduler::global_clock` in lockstep too (floored up once per outer round
to `stats.instruction_count`, a pure function of retired guest instructions —
no wall-clock), and route `KernelState::now_basis_at` through `global_clock()`
in BOTH modes. New `Scheduler::advance_global_clock_to(now)` floor-up keeps it
monotone alongside `advance_all_timebases_to`. Parallel behavior unchanged
(it already read `global_clock()`).
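
A minimal sketch of the floor-up clock, assuming the clock is a single u64 counter. The method names follow the commit text, but the bodies and the `deadline_for` helper are illustrative assumptions, not the actual implementation.

```rust
#[derive(Default)]
pub struct Scheduler {
    global_clock: u64,
}

impl Scheduler {
    pub fn global_clock(&self) -> u64 {
        self.global_clock
    }

    /// Floor-up: the clock only ever moves forward, so a stale `now`
    /// cannot rewind virtual time.
    pub fn advance_global_clock_to(&mut self, now: u64) {
        self.global_clock = self.global_clock.max(now);
    }
}

/// Hypothetical helper: a relative timeout becomes an absolute deadline
/// based on the global clock, never on a parked thread's zeroed timebase.
pub fn deadline_for(s: &Scheduler, relative_ticks: u64) -> u64 {
    s.global_clock() + relative_ticks
}
```

Once the clock has been advanced to 7,780,000, a 3000-tick relative wait lands at 7,783,000 rather than at the constant 3000 that pinned virtual time.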

Verified (lockstep, 50M):
- DETERMINISM: two cold `check -n 5M` and two cold `-n 50M` runs byte-identical.
- LIVELOCK GONE: "advanced to deadline" went from 592,679 fires / 2 unique
  values / 562,084 pinned at 3000  ->  18,586 fires / 18,567 unique /
  0 pinned, strictly increasing 5.4M -> 50M. Poll thread tid=7 now ends
  Blocked with a real future deadline Some(60002824) instead of spin-Ready
  on the past 3000.
- imports 1,790,936 -> 92,317 at 50M (the spin no longer burns import calls).

Cascade (lockstep, XENIA_CACHE_PERSIST=1, -n 200M): engine now runs to budget
instead of hard-deadlocking. Hub enqueue (sub_82458068) 4x; submitter dequeue
(sub_82458508) still 3x — the lost 4th-job HANDOFF (count/notify between
sub_82458068's tail and the submitter queue) is a SEPARATE downstream gate,
not the timebase. New gate: tid=5 (hub) Blocked INFINITE on event 0x1080
(job-4 completion); tid=6 (submitter) Ready, parked in WaitForMultiple
(sub_824AB214), loop-top stops at cycle 6.23M. draws still 0, VdSwap 1.

Golden re-baseline (same commit): sylpheed_n50m
  instructions 50000004 -> 50000007, imports 1790936 -> 92317
  (swaps/draws/RTs/shaders/textures unchanged). sylpheed_n2m unchanged
  (the livelock sets in after 2M). Suite 665/665 + oracle green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 21:42:28 +02:00
MechaCat02
5aaadfec36 [iterate-2E] Add XENIA_AUDIT_DEREF pointer-chase probe
On each AUDIT-PC-PROBE fire, treat gpr[reg] as a base object, dump its
first 64 bytes, follow [base+off] to a sub-object, dump that, then follow
[[base+off]+0] to its vtable and dump 48 slots. Env-gated
(XENIA_AUDIT_DEREF=<reg>:<off>), read-only, lockstep digest unaffected.
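
For illustration, the `<reg>:<off>` value could be parsed along these lines. The function name, the decimal register index, and the acceptance of either decimal or 0x-prefixed offsets are all assumptions; the commit only specifies the `<reg>:<off>` shape.

```rust
/// Parse a "<reg>:<off>" spec into (GPR index, byte offset).
/// Returns None for malformed input or a register index above 31.
pub fn parse_audit_deref(spec: &str) -> Option<(usize, u32)> {
    let (reg, off) = spec.split_once(':')?;
    let reg: usize = reg.trim().parse().ok()?;
    if reg > 31 {
        return None;
    }
    let off = off.trim();
    // Assumption: offsets may be written in hex (0x2C) or decimal (44).
    let off = match off.strip_prefix("0x").or_else(|| off.strip_prefix("0X")) {
        Some(hex) => u32::from_str_radix(hex, 16).ok()?,
        None => off.parse().ok()?,
    };
    Some((reg, off))
}
```

Under these assumptions a caller would read the variable with `std::env::var("XENIA_AUDIT_DEREF")` and pass the string through the parser, leaving the probe disabled when it returns `None`.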

Captures the live work-item + stream object + vtable at sub_824510E0
before the pool recycles the slot — which overturned the prior session's
"infinite spin" diagnosis: the streaming read PROGRESSES 68/68 128KB
chunks of a 9MB file, then the hub (tid=5) blocks INFINITE on a
self-created Event/Manual (0x1060) that is never signaled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-12 20:29:01 +02:00
MechaCat02
0332d1990d [Track 2] Parallel-scoped global clock fixes timebase-desync livelock
In --parallel mode a long run livelocked: the scheduler spun
"advanced to deadline 3000 waking hw=2 idx=0" ~14k times in
microseconds. Root cause: each guest thread owns ctx.timebase
(+1/instr in step_block), and all kernel deadline arithmetic read
Scheduler::ctx(hw_id).timebase as "now". But the parallel worker
extracts its PpcContext via mem::replace(ctx_mut_ref, PpcContext::new())
— leaving a ZEROED timebase in the slot while it steps unlocked — and
advance_all_timebases_to only walks runqueue (never idle_ctx). So the
coordinator's coord_pre_round drain and a woken thread's parse_timeout
could read a zeroed/stale basis decoupled from the deadline the
scheduler just advanced to. The thread re-armed the same constant
deadline forever; the global clock never moved.

Fix: add a single monotonic Scheduler::global_clock, advanced by the
per-block retired-instruction count on each parallel writeback and
floored up by advance_all_timebases_to. Kernel deadline reads route
through KernelState::now_basis_at(hw_id), which returns global_clock
ONLY when parallel_active; lockstep keeps reading the exact pre-existing
ctx(hw_id).timebase expression, so the deterministic lockstep trace is
byte-identical (sylpheed_n50m golden unchanged, zero re-baseline).

Verified:
- 50M --parallel run completes (was: hung). Deadlines now strictly
  increasing 5.4M -> 49.1M (18097 unique of 18116; max repeat 2) vs
  pre-fix constant 3000 x ~14000.
- sylpheed_n50m golden byte-identical via plain `check` (no persist).
- Full suite 665/665 green.

Note: an intermittent parallel hang/crash (~1-2/20 at -n 5M) is
pre-existing (master 1/20, this build 2/20 — within noise) and distinct
from the timebase livelock: it is a parallel-race class (e.g. the
unsafe block_ptr deref in run_execution_parallel). Tracked separately;
lockstep remains the recommendation for long runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 19:32:14 +02:00
MechaCat02
6271ba1f55 chore: gitignore vkd3d-proton/DXVK runtime shader caches
The Wine canary build drops vkd3d-proton.cache into the working dir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-12 18:06:25 +02:00
MechaCat02
48b19e490f [Prong A] Three 32-bit ABI PPCBUG siblings corrected to canary semantics
Second differential audit, lead prong: hunt siblings of PPCBUG-020 (the
word-form ALU truncation fixed in 341196a, whose "32-bit ABI / MSR.SF=0"
premise was false — Xenon is a 64-bit core). Found three more band-aids of the
same class, each verified against the canary oracle. All three are genuine
oracle/ISA divergences but INERT on Sylpheed's lockstep trace (sylpheed_n50m
golden digest unchanged; no re-baseline). Fixed anyway, with directed tests, to
close the band-aid class (per audit decision).

1. slw/srw shift-count mask (PPCBUG-044 site). Ours tested the full u32 count
   `< 32`; canary InstrEmit_slwx/srwx mask `rb & 0x3F` then test bit 5. A count
   like 0x40 (low-6-bits 0) must pass the value through, not zero it. Fixed both
   to `& 0x3F`. The 32-bit CR0 i32-view is unchanged (genuinely 32-bit).

2. sraw/srawi result extension (PPCBUG-041/042/043 "writeback truncation").
   Ours zero-extended the 32-bit arithmetic-shift result (`result as u32 as u64`);
   PowerISA + canary InstrEmit_srawx/srawix SIGN-extend it (`f.SignExtend`, the
   `(i64.s)&¬m` fill). 0x80000000>>1 is now 0xFFFFFFFF_C0000000, not
   0x00000000_C0000000. CA math and CR0 view byte-identical.

3. mtspr CTR width (PPCBUG-054). Ours stored `val as u32 as u64`, dropping the
   upper 32 bits; CTR is a 64-bit SPR and canary InstrEmit_mtspr stores the full
   GPR (`f.StoreCTR(rt)`). A later `mfspr rX, CTR` now round-trips correctly.
   bdnz/bcctr still consume only CTR's low 32 bits (the bcx zero-TEST truncation
   at line ~922 MATCHES canary's `f.Truncate(ctr, INT32_TYPE)` — left untouched).
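
The first two fixes can be illustrated with standalone helpers. These are sketches under stated assumptions, not the interpreter's functions: the names are hypothetical, and CA and CR0 handling is omitted because the commit says it is unchanged.

```rust
/// slw: mask the count to 6 bits, then bit 5 selects "shift everything out".
pub fn slw(rs: u64, rb: u64) -> u64 {
    let n = (rb & 0x3F) as u32;
    if n & 0x20 != 0 {
        0
    } else {
        ((rs as u32) << n) as u64 // 32-bit shift, zero-extended
    }
}

/// sraw result only: arithmetic shift of the low word, SIGN-extended to 64 bits.
pub fn sraw_result(rs: u64, rb: u64) -> u64 {
    let n = (rb & 0x3F) as u32;
    let v = rs as u32 as i32;
    // Counts of 32..=63 replicate the sign bit across the whole word.
    let shifted = if n & 0x20 != 0 { v >> 31 } else { v >> n };
    shifted as i64 as u64
}
```

A count of 0x40 has zero in its low six bits, so `slw(0x1234, 0x40)` passes the value through, and `sraw_result(0x8000_0000, 1)` gives 0xFFFFFFFF_C0000000, matching the figures in items 1 and 2.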

Tests: updated srawx_negative_value_sign_extends_upper,
srawix_high_count_negative_input_sign_extends_all_ones, and
mtspr_ctr_keeps_full_64_bits (formerly premise-defending the bugs —
reading-error #24). Added slwx/srwx 6-bit-mask tests, mfspr_ctr round-trip, and
the rlwinm MB>ME wraparound-mask test (plan-requested gap closure). 665/665.

Left correct (re-confirmed vs canary, do NOT touch): bcx/bclr CTR 32-bit test,
divw/divwu zero-extend quotient (canary f.ZeroExtend, ISA upper undefined),
extsb/extsh, logical-NOT chain, mulhw/mulhwu, srawx 0x3F mask, pixel pack/unpack.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 17:25:41 +02:00
MechaCat02
341196a111 [Issue-1 PPCBUG-020] Word-form ALU ops produce full 64-bit results
Xenon is a 64-bit PPC core (32-bit *pointer* ABI, but 64-bit registers and
integer arithmetic). The interpreter was truncating every word-form integer
ALU writeback to 32 bits and zero-extending, on a false "MSR.SF=0 / 32-bit
ABI" premise. This silently corrupted any genuine 64-bit value flowing through
word-form arithmetic.

Confirmed load-bearing via runtime ours-vs-canary capture: Sylpheed's
millisecond->LARGE_INTEGER timeout converter sub_824ACA88 does
`clrldi; mulli r11,r11,-10000; std`. For a 16 ms wait the correct result is
-160000 = 0xFFFFFFFF_FFFD8F00 (relative). canary stores exactly that; ours'
truncating `mulli` stored 0x00000000_FFFD8F00 (positive) -> the i64 timeout
read as a huge *absolute* deadline -> a ~26000x over-wait that froze the main
frame loop. After the fix the timeout matches canary and the previously-frozen
frame/worker loops run (parallel boot NtWaitForMultipleObjectsEx 94 -> 30428;
KeWaitForSingleObject/critical-section loops resume).
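
The arithmetic behind the over-wait can be reproduced in a few lines. The two helpers are hypothetical models of the fixed and the old writeback, not the interpreter's code.

```rust
/// mulli with a full 64-bit writeback (the corrected behavior).
pub fn mulli_full(ra: u64, simm: i16) -> u64 {
    (ra as i64).wrapping_mul(simm as i64) as u64
}

/// The old behavior: truncate the result to 32 bits and zero-extend.
pub fn mulli_truncated(ra: u64, simm: i16) -> u64 {
    mulli_full(ra, simm) as u32 as u64
}
```

For a 16 ms wait, `mulli_full(16, -10000)` is 0xFFFFFFFF_FFFD8F00, which reads as -160000 (a relative timeout). `mulli_truncated(16, -10000)` is 0x00000000_FFFD8F00, a positive i64 of 4,294,807,296, about 26,842 times the intended magnitude.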

Fix mirrors canary's INT64 emitters (ppc_emit_alu.cc) op-by-op for the 17
data-losing word-form ops: addis, addic(.), subfic(.), mulli, add(c/e/ze/me)x,
subf(c/e/ze/me)x, negx, mullwx. Only the result *writeback* widens to full
64 bit; the 32-bit carry (XER[CA]) and overflow (XER[OV]) computations and the
CR0 i32 view are preserved byte-identical (the low 32 bits of the new result
equal the old truncated result), so this is a strict no-op for clean 32-bit
values and only restores the previously-zeroed upper bits for genuine 64-bit
values. Genuinely-32-bit ops (rlwinm/slw/srw/cmpw, mulhw/divw whose upper bits
are ISA-undefined) are left untouched.

Updated 7 unit tests that asserted the truncation (they encoded the bug) to the
canary-correct full-64-bit values. Re-baselined the sylpheed_n50m golden
(imports 40454 -> 1790936: the unwedged frame/worker loops now cycle under the
instruction-count timebase); sylpheed_n2m unchanged (pre-frame-loop). Lockstep
determinism preserved (two 50M runs identical). Full suite 660/660.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 16:21:11 +02:00
MechaCat02
b20c99f141 [Subsystem-fixes] 6 verified ours-vs-canary divergence fixes
From the 2026-06-12 5-subsystem differential audit. All verified against
canary as oracle; 660/660 workspace tests green (655 + 5 new).

1. nt_create_event polarity (exports.rs) — `manual_reset = gpr[5] != 0`
   was INVERTED. Canary xboxkrnl_threading.cc:668 `Initialize(!event_type,..)`
   + xevent.cc:41 (type 0 = NotificationEvent = manual, type 1 = Sync = auto).
   Now `== 0`. Was the dormant 2.AI fix on chore/portable-snapshot, never
   merged. The Ke-path was already correct; only the Nt-path was wrong.

2. 2.AF deadline drain (main.rs coord_pre_round) — expired KeWait/KeDelay
   deadlines never fired under load because advance_to_next_wake_if_due was
   only called in coord_idle_advance (no-Ready-threads path). Added a
   per-round drain loop; covers BOTH lockstep and parallel outer loops since
   both call coord_pre_round. Was the dormant 2.AF fix, never merged.

3. handle slab-recycle ABA guard (state.rs + scheduler.rs) — release_handle_slot
   (my round-34 regression) recycled a closed slot even with a thread still
   parked on it, risking a stale-waiter wake when the slot is re-minted. Added
   Scheduler::any_thread_waiting_on; decline to recycle a still-waited slot.

4. vpkpx pixel-pack (vmx.rs) — wrong field mapping (~100% mismatch). Now
   exact canary ppc_emit_altivec.cc:1795 shift/mask (red 6b out[15:10] from
   w[24:19], green out[9:5] from w[14:10], blue out[4:0] from w[7:3]; no
   fabricated alpha bit). +unit test.

5. VFS GDFX attribute plumbing (vfs/*, exports.rs query fns) — VfsEntry now
   carries the real on-disc attribute byte (GDFX dirent +12, canary
   disc_image_device.cc:136/154) instead of inferring directory-ness from
   path shape. Query exports report the real FILE_ATTRIBUTE_* bits. Candidate
   driver of the XamShowDirtyDiscErrorUI gate. +tests.

6. MmGetPhysicalAddress region-aware mirror (exports.rs) — flat 0x1FFFFFFF
   mask missed canary's +0x1000 host_address_offset for 0xE0000000+ mirror
   (memory.cc:2317). Read-only query; proven byte-identical 50M digest. +test.

Investigated and intentionally NOT changed:
- zero-on-recommit: no-op; ours has no region-reuse path (bump allocators,
  free is a stub).
- 32-bit ALU writeback truncation (PPCBUG-020): documented-deliberate; premise
  (MSR.SF=0) is questionable but flipping it is out of scope here.
- KeSetEvent/NtSetEvent return value: ours returns true previous state
  (hardware-faithful); canary returns constant 1 — NOT an ours bug.

sylpheed_n50m golden will need re-baselining (legit behavior change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-12 14:57:38 +02:00
MechaCat02
db90ad0f7d [AUDIT-059 R-D2] Phase D auto-signal POC confirms audit-049 wedge diagnosis
Hook NtCreateEvent for the silph::UImpl tid=13 chain (entry=0x821748F0,
start_context=0x4024a840, frame-1 LR=0x821CB15C inside sub_821CB030+0x128)
and auto-signal the resulting handle after XENIA_SILPH_UI_AUTOSIGNAL_DELAY
instructions. Env-gated; default off.

SR4 verdict B (partial unwedge):
- handle 0x1078 signal_attempts 0->1
- tid=13 Blocked(WaitAny[0x1078]) -> Ready pc=0x824a9108
- ExCreateThread 10 -> 12 (new silph::UImpl tid=14, worker tid=15)
- New downstream wedges 0x1084 + 0x1088
- cxx_throw runtime_error on tid=5 inside R26 dispatcher
  (BST not-registered instance lhs=0x715a7af0)
- VdSwap stays 1; no draws (POC is diagnostic, not final fix)

Confirms Phase C diagnosis end-to-end. The real signaler must (a) drive
NtSetEvent on the silph KEVENT AND (b) register the dispatcher's BST
instance upstream; this POC only does (a).

Reading-error class #20: ctx.lr at kernel export entry is the thunk
wrapper's return slot, NOT the guest caller's post-bl PC. Walk back-chain
1 step to get frames[1].lr.

Reading-error class #21: --parallel and lockstep have SEPARATE outer
loops in main.rs (run_execution_parallel line 2928 vs run_execution
line 2706). Per-round hooks must be wired in BOTH paths.

Files:
- crates/xenia-cpu/src/scheduler.rs: GuestThread.start_entry/start_context
  fields + spawn() population + current_thread_entry_and_ctx() helper
- crates/xenia-kernel/src/state.rs: AutoSignalPending struct, env-parsed
  silph_autosignal_delay, pending Vec, last_cycle_hint, set_now_cycle_hint,
  maybe_register_silph_autosignal (walks back-chain), fire_due_silph_autosignals
- crates/xenia-kernel/src/exports.rs: hook in nt_create_event
- crates/xenia-app/src/main.rs: fire-site + cycle hint in both outer loops
- audit-runs/audit-059-handle-disambiguation/round-D2-autosignal-poc/FINDINGS.md

Tests 655/655 green. Default behavior byte-identical when env unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-11 18:38:38 +02:00
MechaCat02
481591fdb2 [AUDIT-059 R-C1] Phase C: bit-28 setter hypothesis REFUTED via dump-addr
Phase A's diagnosis (bit 28 of [0x40d09a40] gets set to exit
sub_822F1AA8's loop) is falsified by direct probe + --dump-addr in 4
sub-rounds.

Key evidence:
- sub_821B55D8 candidate fn fires 0× in ours; sub_824AA858
  (XamInputSetState wrapper) fires 0× in canary too — chain is dead code
  in both engines.
- end-of-run dump shows [0x40d09a40+0] = 0x00000021, same as at entry —
  bit 28 is NEVER set.
- bcctrl at PC 0x822F1B4C (sub_822F1AA8+0xA4) fires (LR=0x822F1B50) but
  the post-bcctrl BB head 0x822F1B50 fires 0× — bcctrl never returns.
- sub_82173990 (vtable[0] of singleton at [0x828E1F08]) is the call
  target; tid=1 wedges inside this 768-byte function on a thread-join
  to handle 0x1070 (= tid=13's thread handle).
- tid=13 (entry=sub_821748F0, ctx=0x4024a840, handle=0x1070) reaches
  sub_821C4EB0 (silph::UImpl@GamePart_Title) at cycle 1882 → audit-049
  cluster IS reached, wedges on handle 0x1078 there.

C.2 force-clear POC NOT EXECUTED: it would be a no-op since bit 28 is never
set. Per the plan's stopping criterion, hand back instead of proceeding blind.

Adds reading-error class #19: disasm-pattern-match without runtime
verification (Phase A scanned 49 oris-0x1000 sites and declared one the
setter without ever observing the bit get set).

No xenia-rs source changes. Canary repo also unchanged (config edit
reverted clean).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-11 17:57:27 +02:00
MechaCat02
52c30d82a7 [AUDIT-059 R-A] Phase A backward-trace: divergence is sub_822F1AA8 loop exit, not factory/registry
Round-37 anchor reframe: both engines install the SAME static .rdata vtable
0x820A183C at [0x828E1F08]. Instance VAs differ only because of ε-class
allocator divergence (audit-043). vtable bytes byte-identical; the user
prompt's "factory/registry" framing was falsified.

Phase A walkthrough (rounds A1..A8):
- A.1 canary --audit_jit_prolog_pc=0x821741C8: tid=6, r3=0xBCCC4A80 (= inner
  sub-object of [0x828E1F08]'s singleton), LR=0x822F1D5C (return-from-bctrl
  inside sub_822F1AA8)
- A.2 found tid=6 spawn site sub_821746B0 at PC 0x82174824 spawning
  entry=sub_821748F0 ctx=BC365700/BC366DA0. sub_822F1AA8 ALSO spawns a
  second thread (entry=sub_822F1EE0 ctx=BCE24A40) at PC 0x822F1B08
- A.3 sub_822F1AA8 has 2 callers, both in sub_8216EA68 (its sole caller is
  sub_824AB748 = entry_point)
- A.4 ours mirror probe: sub_821746B0 enters, [0x828E2B14] gate passes,
  ExCreateThread fires returning handle 0x1070 (= tid=13). Ours' tid=13
  IS the same logical thread as canary's spawned silph initializer
- A.5 canary --audit_jit_prolog_pc=0x821749C0: fires only 2× on short-lived
  tid=17, tid=26 (the spawned initializers — NOT tid=6)
- A.6 canary --audit_jit_prolog_pc=0x822F1AA8: fires 1× on tid=6 with
  r3=0xBCE24A40 LR=0x8216EE14 (the second sub_822F1AA8 call site)
- A.7 canary --audit_jit_prolog_pc=0x824AB748 (entry_point): fires on
  tid=00000006. CONFIRMS canary's tid=6 = canary's main thread.

Verdict: identical call chain entry_point → sub_8216EA68 → sub_822F1AA8 in
both engines; same controller (ε-divergent VA, byte-identical fields).
Canary's main thread stays in sub_822F1AA8's dispatcher loop firing
sub_821741C8 ~1678×/30s. Ours' main thread exits the loop and thread-joins
on the spawned initializer (tid=13), which is itself wedged on handle 0x1078
forever.

Loop exit is gated by bit 28 of [r30+0] (the controller's flag word). Same
value 0x21 at function entry in both engines. Some code between entry and
loop check sets bit 28 in ours but not in canary. Mem-watch on 0x40d09a40
shows zero guest stores in ours' 50M parallel run — setter is either a
kernel-side store, computed alias, or probe-quantum-elided JIT store.

Phase B classification: Class 3a (state-divergence on controller object).
The vtable is the same; the controller's bit 28 evolves differently during
sub_822F1AA8 setup. Class 4 (synthesis) is now less attractive since we
correctly reach the dispatcher with the right inputs — we just exit too
soon.

Phase C will need either JIT instrumentation to identify the bit-28 setter,
or a kernel-side hook to clear bit 28 on entry to the loop check site.

Findings notes:
- round-A4b-ours-spawn-gate/FINDINGS.md (spawn topology + tid mapping)
- round-A8-ours-822F1AA8-trace/FINDINGS.md (full loop structure + bit-28 gate)

New reading-error class #18: probe-output anchor misframing (singleton[VA]=X
vtable=Y was misread as "Y is canary-only vtable" when Y is the same
.rdata vtable in both engines).

Branch: iterate-2C/silph-ui-spawn-trace off master @ 229b46c.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-11 17:02:20 +02:00
MechaCat02
229b46c765 [Kernel] Slab-recycle handle allocator (AUDIT-059 R34)
Adds a FIFO free list of closed handle slots so alloc_handle returns
recycled IDs before bumping next_handle. Mirrors canary's slab-style
ObjectTable: F8000098 reused 130x per 30s window in canary, but ours'
monotonic bump allocator never reused slots — so a recycled slot in
canary maps to a fresh, never-reused slot in ours, drifting kernel
object identity per AUDIT-042's analysis.

release_handle_slot is wired into nt_close's refcount==0 branch and
gated to the canonical [0x1000, 0xF000_0000) range so synthetic
XAudio park handles (AUDIT-048) are never recycled.
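
A minimal sketch of a FIFO slab-recycle allocator with the range gate described above. The `HandleTable` type, the starting handle, and the stride of 4 between fresh handles are assumptions for illustration; only the FIFO free list, the `alloc_handle`/`release_handle_slot` names, and the [0x1000, 0xF000_0000) gate come from the commit text.

```rust
use std::collections::VecDeque;

const RECYCLE_MIN: u32 = 0x1000;
const RECYCLE_MAX: u32 = 0xF000_0000; // exclusive upper bound

pub struct HandleTable {
    next_handle: u32,
    free: VecDeque<u32>, // closed slots, oldest first
}

impl HandleTable {
    pub fn new() -> Self {
        Self { next_handle: RECYCLE_MIN, free: VecDeque::new() }
    }

    /// Reuse the oldest closed slot if there is one; otherwise bump.
    pub fn alloc_handle(&mut self) -> u32 {
        if let Some(h) = self.free.pop_front() {
            return h;
        }
        let h = self.next_handle;
        self.next_handle += 4; // assumed handle stride
        h
    }

    /// Called when a handle's refcount drops to zero. Handles outside the
    /// canonical range (e.g. synthetic park handles) are never recycled.
    pub fn release_handle_slot(&mut self, h: u32) {
        if (RECYCLE_MIN..RECYCLE_MAX).contains(&h) {
            self.free.push_back(h);
        }
    }
}
```

Releasing a handle and allocating again returns the same ID, which is the renumbering effect the smoke test observes; a handle such as 0xF8000044 falls outside the gate and is simply dropped.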

Verified: all 655 workspace tests green, smoke tests at -n 50M show
NtClose 115/run with handle table renumbering active (round-34 max
handle 0x12ac vs round-16 baseline 0x12b8 over same workload). γ-
cluster #2 wedge unchanged — silph wait still parks tid=13 on the
renumbered handle (4216=0x1078 here vs 0x12a4 baseline), confirming
the wedge is independent of allocator policy. Lands as a parity
fix to bring our kernel-object identity in line with canary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-10 18:04:34 +02:00
MechaCat02
40f208ea4e [2.BF] Silph WorkerCtx: install canary's real sub-vtable at [+0x2C][0]
Round-21 pivot of the audit-059 synth-spawn module. Round 20 made the
silph::WorkerCtx workers run by attaching a 32-slot stub sub-vtable
where every entry was a `li r3, 0; blr` stub — workers spawned but
spun forever because slots 15/17 short-circuited to NULL ("no work").

Round 21 reads canary's real sub-vtable VA out of the XEX `.rdata` —
`0x8200A168` — and points `[sub_object + 0]` at it directly. The
vtable bytes live in the static image both engines map, so no guest
memory is consumed and slot 15 (= `sub_824FCCC8`) and slot 17
(= `sub_824FCE38`) — the only slots `sub_82506B08` ever calls —
become working game methods.

Discovery method (canary probes in
`audit-runs/audit-059-handle-disambiguation/round21-subvtable-canary/`):
  1. `--audit_jit_prolog_pc=0x82506B08` to catch the first WorkerCtx
     virtual-dispatch entry; `[r3+0x2C]` revealed the sub-object VA.
  2. Re-run with `--audit_jit_prolog_mem_dump=<sub-obj VA>` to deref
     `[sub-object + 0]` = sub-vtable VA = 0x8200A168.
  3. PE inspection (`xex-text/xex-rdata` is the static image) reads
     all 31 slots; slot 15 -> sub_824FCCC8, slot 17 -> sub_824FCE38.

Smoke metrics (50M instructions, `XENIA_CACHE_PERSIST=1
XENIA_SILPH_SYNTH=1`, audit-runs/audit-059-handle-disambiguation/
round21-real-vtable/):
  * 4/4 workers spawned, no crash, no new fault
  * KeSetEvent 633885 -> 431860 (-32%)
  * KeWaitForSingleObject 258441 -> 185762 (-28%)
  * Per-handle state unchanged on the focused stalled set
    (0x1020/0x1090 still `<NO_SIGNALS_DESPITE_WAITS>`,
    0x12a4/0x12ac/0x1218/0x1224 still `<UNCREATED>`).
  * No VdSwap/draws progression observed in this window.

Verdict: B (partial). The workers no longer spin in a stub-loop —
internal call density shifted — but the focused wedge handles still
don't get signalled. Likely root cause: workers may now be waiting
on the WorkerCtx's own KEVENTs (which we synthesised at
+0x54/+0x94) for upstream work that no producer is enqueuing.

Net LOC: 29 ins / 31 del. Tests: workspace passes (lockstep app
tests, kernel 127/127, hir 288/288, scheduler 38/38).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 21:19:52 +02:00
MechaCat02
8683fb59ed [2.BF] Silph WorkerCtx: synthesize sub-object + vtable at [+0x2C]
Audit-059 round 19 isolated the round-18 worker fault: the four silph::
WorkerCtx worker bodies all execute the sequence

    lwz r3, 44(rN)     ; r3 = [ctx+0x2C] — sub-object pointer
    lwz r11, 0(r3)     ; r11 = sub-object vtable
    lwz r11, 60(r11)   ; r11 = sub-object vtable[15]
    mtctr r11
    bctrl

Ours left [ctx+0x2C] NULL → PC=0 fault on first virtual dispatch. Round 19
recommended materialising a sub-object whose vtable points entirely at an
existing trivial-return stub so workers idle live, returning NULL work,
without crashing.

Changes (silph_synth.rs only, +63/-6):

- Grow SILPH_CTX_SIZE 0x500 → 0x800 to embed sub-object at +0x300 and a
  32-slot sub-vtable at +0x500 in the same heap_alloc.
- After ctx header init, write sub-object pointer at [ctx+0x2C], the XEX-
  resident wrapper constant 0xBE568F00 (round-7 finding) at [ctx+0x30],
  and leave [ctx+0x28] NULL (matches canary first-fire snapshot).
- Populate every slot of the 32-entry sub-vtable with VA 0x8216CAA4, the
  first 4-byte-aligned standalone `li r3, 0; blr` stub located by a fresh
  PE-text scan (preceded by a `blr` terminating the previous function).
- Sub-object body itself is zero-filled apart from the [+0]=vtable_ptr
  write; round-19 disassembly confirms workers only touch slots 15/17.

Smoke (XENIA_SILPH_SYNTH=1, persistent cache, 5e7 instr):

- Lockstep: no crash, all 4 workers (tid=6/7/8/9) reach Ready in deep
  worker-body PCs (0x825067xx/0x825089xx/0x825091xx). Verdict (D) —
  workers run their idle loop returning NULL; existing silph waiters
  (0x1020, 0x1090) remain <NO_SIGNALS_DESPITE_WAITS> because we
  deliberately neutered productive work.
- Parallel: identical picture, no PC=0/PC=garbage fault anywhere.

No regression in 765-test suite.

Next round: feed real work-items into the intrusive ring at ctx+0x210
so workers' returned-NULL idle becomes returned-work productive; or
discover which sub-vtable slots actually need real callees (slot 15
worker drain, slot 17 producer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 21:04:04 +02:00
MechaCat02
b5885b8560 [2.BF] Synthetic silph::WorkerCtx spawn (round 18 — opt-in landing)
Adds infrastructure to synthesise the silph::WorkerCtx that AUDIT-058/059
identified as never reached by ours' static-init chain (real chain entry
sits in audit-059 round 9's wrong-vtable wedge at sub_82172BA0+0x1E8).
Ctx layout follows round 5's live hexdump from canary:

  +0x00   vtable = 0x8200A1E8
  +0x04   self
  +0x08   intrusive list head -> self
  +0x0C   init flag = 1
  +0x10   packed byte field
  +0x18   2x float ~1.0 (UI rates)
  +0x24   flag = 1
  +0x28..+0x30  3x foreign-arena pointers (left NULL — see below)
  +0x54..+0x84  4x X_KEVENT auto-reset, state=0
  +0x94..+0xC4  4x X_KEVENT manual-reset, state=1 (pre-signaled)
  +0x210..+0x250  4-entry intrusive work-ring, empty

Worker spawn mirrors AUDIT-048's audio-worker pattern in
xaudio_register_render_driver: per-worker allocate_thread_image +
state.scheduler.spawn with r3 = ctx_ptr. Trigger fires at the first
dat/* VFS open (ours' earliest is dat/files.tbl), which is when canary
runs the equivalent chain.

ROUND 18 OUTCOME — opt-in only:

With workers spawned Ready (XENIA_SILPH_SYNTH=1), boot CRASHES at
cycle ~5.5M with PC=0 on hw=1, just after worker_3 (entry 0x825065B8)
spawns. Per task constraints this is STOP-and-report: the ctx fields
+0x28/+0x2C/+0x30 (foreign heap pointers — canary's 0x30057018,
0xBCE25640, 0xBE568F00, distinct arenas per audit-059 round 7) are
left NULL, and the worker bodies plausibly dereference one of them.
Synthesising those is a fresh investigation (round 19+).

With workers spawned Suspended (XENIA_SILPH_SYNTH=suspend), boot
completes normally (11 spawns, VdSwap=1, KeSetEvent=2,
KeReleaseSemaphore=1 — matches default baseline). The ctx remains
materialised in guest memory at the logged VA for downstream probing.

Default (env var unset): no synth, no regression.

Files:
  crates/xenia-kernel/src/silph_synth.rs   (new, 225 LOC)
  crates/xenia-kernel/src/lib.rs           (+1 LOC, register module)
  crates/xenia-kernel/src/exports.rs       (+37 LOC, hook in open_vfs_file)
  crates/xenia-kernel/src/state.rs         (+18 LOC, 4 silph_synth_* fields)

Tests: cargo test --release --workspace = 765 pass / 0 fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 20:44:29 +02:00
MechaCat02
9340ff4592 [Audit] --audit-r3-dump-bytes: dump N bytes at r3 when probe fires
AUDIT-059 round 15 — diagnostic. When `--audit-r3-dump-bytes=N` is set,
every `--audit-pc-probe-hex` fire emits a paired `AUDIT-R3-DUMP` line
with N bytes of guest memory from r3 as u32 lanes (4-byte aligned, cap
256B). Sized for the 80-byte stack-local struct at sub_82452DC0's
`r31+96` (probe sub_8245B000 entry where r3 IS the struct ptr).

Settable via `XENIA_AUDIT_R3_DUMP_BYTES` env. Read-only; lockstep digest
unaffected (empty-set fast path in fire_audit_pc_probe_if_match).
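
Rough sketch of the lane extraction, assuming guest memory is visible as a
flat byte slice. `dump_u32_lanes` is a hypothetical name, and rounding the
length down to a multiple of 4 is one reading of "4-byte aligned"; the
AUDIT-R3-DUMP line format itself is not reproduced here.

```rust
/// Cap quoted in the commit text.
const MAX_DUMP_BYTES: usize = 256;

/// Read up to `n` bytes starting at `addr` as big-endian u32 lanes.
/// The length is capped at 256 bytes and rounded down to a multiple of 4;
/// reads that run past the end of `mem` are truncated rather than faulting.
fn dump_u32_lanes(mem: &[u8], addr: usize, n: usize) -> Vec<u32> {
    let len = n.min(MAX_DUMP_BYTES) & !3;
    let end = addr.saturating_add(len).min(mem.len());
    mem.get(addr..end)
        .unwrap_or(&[])
        .chunks_exact(4)
        .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

An 80-byte request yields 20 lanes, which matches the stack-local struct
size mentioned above.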

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 19:39:22 +02:00
MechaCat02
bcd018659b [Audit] --audit-mem-dump-chain: deref a guest address N levels for diagnosis
Round-14 of AUDIT-2BF (singleton-dump). The bctrl at sub_822F1AA8+0x90
(PC 0x822F1B4C) loads [0x828E1F08] (a global singleton), dereferences
its vtable, and indirect-calls vtable[0]. Canary returns; ours hangs.
To name the resolved target we need to dump the (singleton, vtable,
vtable[0]) chain on probe firing.

Adds `--audit-mem-read-hex` / `XENIA_AUDIT_MEM_READ` taking a single
guest VA. When set and any `--audit-pc-probe-hex` PC fires, the kernel
emits a paired `AUDIT-MEM-READ` line with three guest reads:

  AUDIT-MEM-READ addr=0x828E1F08 val=<*addr> vtable=<**addr> \
                 vtable[0]=<***addr+0> vtable[24]=<***addr+24> ...

`vtable[24]` is included as the slot-6 method (audit-059 round 9
documented the canary silph chain dispatching slot 6 of a vtable here).
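
The chain walk can be sketched as below. `GuestMem`, `Chain` and
`deref_chain` are hypothetical stand-ins for the real memory interface; only
the (address, object, vtable, slot 0, byte offset 24) shape comes from this
message.

```rust
/// Hypothetical flat guest-memory window starting at `base`.
struct GuestMem {
    base: u32,
    bytes: Vec<u8>,
}

impl GuestMem {
    /// Big-endian u32 read; None when `va` falls outside the window.
    fn read_u32(&self, va: u32) -> Option<u32> {
        let off = va.checked_sub(self.base)? as usize;
        let b = self.bytes.get(off..off.checked_add(4)?)?;
        Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
    }

    /// Big-endian u32 write, used here only to set up examples.
    fn write_u32(&mut self, va: u32, val: u32) {
        let off = (va - self.base) as usize;
        self.bytes[off..off + 4].copy_from_slice(&val.to_be_bytes());
    }
}

#[derive(Debug, PartialEq)]
struct Chain {
    object: u32, // *addr
    vtable: u32, // **addr
    slot0: u32,  // vtable[0]
    slot6: u32,  // vtable at byte offset 24
}

/// Follow addr -> object -> vtable and read two slots; any unmapped
/// pointer along the way yields None instead of a fault.
fn deref_chain(mem: &GuestMem, addr: u32) -> Option<Chain> {
    let object = mem.read_u32(addr)?;
    let vtable = mem.read_u32(object)?;
    Some(Chain {
        object,
        vtable,
        slot0: mem.read_u32(vtable)?,
        slot6: mem.read_u32(vtable.checked_add(24)?)?,
    })
}
```

Returning None on a bad pointer keeps the probe read-only and crash-free
even when the singleton has not been constructed yet.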

Read-only; lockstep digest unaffected. ~30 LOC across state.rs and
main.rs. `cmd_check` opts out of the flag (same policy as the existing
audit_pc_probe_hex).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 12:13:42 +02:00
MechaCat02
09e59e09b7 Audit-2BF.delta: add --audit-pc-probe-hex for silph-init bctrl probe
Adds a per-PC probe analogous to --lr-trace / --branch-probe but tuned
for the silph init chain's virtual-dispatch site at sub_82172BA0+0x1E8
(PC 0x82172D88, the bctrl after a 3-deep `lwz` chain that loads vtable
slot 6). Each fire emits one AUDIT-PC-PROBE line with (pc, tid, hw,
cycle, lr, r3, r11) plus four guest-memory dereferences off r3 — the
vtable, slot-6 method pointer, auxiliary handle field, and embedded
sub-object vtable — so the line can be compared head-to-head with
canary's round-9 capture (r3=0xBCCC52C0, [r3+0]=0x820A3644,
slot6=sub_821B55D8, [r3+0xC]=0xF80000D8, [r3+0x30]=0x820A1870) to
identify whether ours dispatches to the wrong vtable on a correct
object (case A) or to a wrong object entirely (case B).

Why this addition rather than reuse of an existing probe: --lr-trace
emits JSONL designed for canary-side diffing and only captures
r3/r4/r5/r6/lr (no memory dereferences); --branch-probe captures CR
flags and lr but again no memory; --ctor-probe is single-shot per PC
and walks the stack back-chain. None of them load the four indirect
fields needed to identify a vtable-shape divergence.

Implementation:
  - state.rs: new HashSet<u32> field `audit_pc_probe_pcs` and helper
    `fire_audit_pc_probe_if_match(hw_id, mem)`. Empty-set fast-path
    keeps the cost to one is_empty() check per worker_prologue call
    when the flag is unused. Read-only — no guest state mutation,
    lockstep digest unchanged.
  - main.rs: new CLI flag --audit-pc-probe-hex with bare-hex comma
    parsing (tolerates `0x` prefix), settable also via
    XENIA_AUDIT_PC_PROBE env var. Threaded through cmd_exec_inner;
    cmd_check passes None so check digests are unaffected.
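
A minimal sketch of the flag parsing and the empty-set fast path described
above. `parse_pc_list` and `probe_matches` are hypothetical names; the real
helper is `fire_audit_pc_probe_if_match`, whose body is not shown here.

```rust
use std::collections::HashSet;

/// Parse a comma-separated list of hex PCs, tolerating an optional `0x`
/// prefix and surrounding whitespace. Empty tokens are ignored.
fn parse_pc_list(arg: &str) -> Result<HashSet<u32>, String> {
    let mut pcs = HashSet::new();
    for tok in arg.split(',').map(str::trim).filter(|t| !t.is_empty()) {
        let hex = tok
            .strip_prefix("0x")
            .or_else(|| tok.strip_prefix("0X"))
            .unwrap_or(tok);
        let pc = u32::from_str_radix(hex, 16)
            .map_err(|e| format!("bad PC {}: {}", tok, e))?;
        pcs.insert(pc);
    }
    Ok(pcs)
}

/// Fast path: with no probes configured the check is a single is_empty().
fn probe_matches(pcs: &HashSet<u32>, pc: u32) -> bool {
    !pcs.is_empty() && pcs.contains(&pc)
}
```

For example, `parse_pc_list("0x82172D88, 822F1B4C")` yields a two-element
set, and an empty string yields an empty set so the probe stays dormant.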

Probe wired into worker_prologue alongside fire_ctor_probe /
fire_branch_probe / fire_lr_trace. Like its siblings, it fires once per
basic-block entry — known limitation (audit-045 reading-error class
13); use a block-entry PC if probing a mid-block instruction.

Verification: kernel 127/127, app 5/5 non-ignored, no behaviour
change with empty flag.

Cross-references audit-059 round 9's canary capture and lays the
groundwork for the round-10 ours-side comparison.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 10:59:03 +02:00
MechaCat02
5a8fe21ad5 Iterate-2.BF.γ: refine is_in_callback gate to per-thread exclusion
Lockstep vsync delivery was capped at 54/run despite the ticker firing
333 periods and the dispatcher being called 1.2M times. Root cause: the
blanket `is_in_callback()` gate skipped dispatch entirely whenever the
async audio path held `interrupts.saved`, which is essentially the
entire boot (audio worker rarely hits its LR_HALT_SENTINEL between
back-to-back callbacks). 5.85M dispatch_skip_in_callback events drowned
out the 55 with-pending windows.

Graphics dispatch (iterate-2.BE) runs the ISR synchronously and
restores the borrowed context before returning — it doesn't touch
`interrupts.saved`. The only real conflict is if graphics picks the
*same* thread audio borrowed (which would stomp audio's
SavedCallbackCtx). Replace the blanket gate with per-thread exclusion:
when audio is mid-flight, exclude only its `injected_ref` from
victim selection. Falls through to the existing no-victim drop if
that's the only candidate.
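
The per-thread exclusion can be sketched as follows. The types are
simplified stand-ins (the real ThreadRef and scheduler state are richer),
and `pick_victim` is a hypothetical name; the Ready-before-Blocked
preference follows iterate-2.BE.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum ThreadState {
    Ready,
    Blocked,
    Idle,
    Exited,
    ServicingIrq,
}

#[derive(Clone, Copy, PartialEq, Debug)]
struct ThreadRef(u32);

struct Thread {
    id: ThreadRef,
    state: ThreadState,
}

/// Pick a thread to borrow for the graphics ISR. Ready threads are
/// preferred over Blocked ones; the thread audio currently borrows
/// (`audio_borrowed`) is never chosen. None means "drop this interrupt".
fn pick_victim(threads: &[Thread], audio_borrowed: Option<ThreadRef>) -> Option<ThreadRef> {
    let first_in = |wanted: ThreadState| {
        threads
            .iter()
            .find(|t| t.state == wanted && Some(t.id) != audio_borrowed)
            .map(|t| t.id)
    };
    first_in(ThreadState::Ready).or_else(|| first_in(ThreadState::Blocked))
}
```

When audio is idle, `audio_borrowed` is None and selection behaves exactly
as before; only the one borrowed thread is ever filtered out.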

Lockstep (50M instr): gpu.interrupt.delivered{source=0} 54 → 295
(5.5×), all 333 ticker periods either delivered or unarmed (no more
queue_full_drops). Wallclock unchanged ~3 s.

Parallel (30M instr): 1193 → 3458 baseline lift (2.9×), no regression.

Tests: xenia-kernel 127/127, xenia-app 5/5 non-ignored. Lockstep
goldens will drift (interrupts.delivered is in the digest); deferred
to next iterate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 19:52:16 +02:00
MechaCat02
51489e34db Iterate-2.BE Path β: tick vsync from coord_idle_advance
The iterate-2.BE host-driven synchronous ISR dispatcher relies on
something queueing v-syncs. In lockstep that's `tick_vsync_instr`,
called from `coord_pre_round` per round. If the scheduler stalls into
`coord_idle_advance` (no Ready threads), the instruction counter
freezes — the accumulator stops incrementing, the ticker stops
queueing, and the dispatcher is left starved for the duration of the
idle wait.

Tick `tick_vsync_wallclock` at the top of `coord_idle_advance` so
v-syncs keep firing on host time even when the guest scheduler is
parked. The dispatcher in the outer loop drains whatever we queue on
the next iteration. Same MMIO `D1MODE_VBLANK_VLINE_STATUS` bit-set as
the production path.

Note: empirically in Sylpheed at 50M/500M instruction horizons,
`coord_idle_advance` is never reached (tids 9/10/12 stay Ready through
the early-boot deadlock), so this commit doesn't move
`gpu.interrupt.delivered{source=0}` off 54 for this title at these
horizons. It is the correct fix for the documented starvation pattern
and will activate as soon as the kernel reaches a state where Ready
threads drop to zero with timers/waits pending.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 19:22:03 +02:00
MechaCat02
9a93152981 Iterate-2.BE: host-driven synchronous graphics ISR delivery
Replaces the victim-thread-mutate-then-wait scheme for vsync / CP
interrupts with synchronous in-line dispatch on the coordinator host
thread. Mirrors canary's EmulateCPInterruptDPC -> Processor::Execute
path (kernel_state.cc:1370, processor.cc:413): pick a guest thread,
borrow its PpcContext, jam ISR PC + args in, run the interpreter
inline until LR_HALT_SENTINEL, restore the borrowed context.
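
Shape of the borrow/run/restore cycle, as a sketch only. `PpcContext` is
reduced to three registers, the sentinel value is a placeholder, and `step`
stands in for step_cached plus import-thunk dispatch; none of this is the
main.rs code.

```rust
/// Placeholder value; the real LR_HALT_SENTINEL is defined elsewhere.
const LR_HALT_SENTINEL: u32 = 0xFFFF_FFF0;
/// Safety budget quoted in the commit text.
const MAX_INSTRS_PER_ISR: u64 = 1_000_000;

#[derive(Clone, Debug, PartialEq)]
struct PpcContext {
    pc: u32,
    lr: u32,
    r3: u32,
}

/// Run an ISR synchronously on a borrowed context.
/// `step` executes exactly one guest instruction. Returns the number of
/// instructions executed, or None if the budget ran out first. The
/// borrowed context is restored in both cases.
fn run_isr_inline(
    ctx: &mut PpcContext,
    isr_pc: u32,
    arg: u32,
    mut step: impl FnMut(&mut PpcContext),
) -> Option<u64> {
    let saved = ctx.clone();
    ctx.pc = isr_pc;
    ctx.lr = LR_HALT_SENTINEL; // the ISR's final `blr` lands here
    ctx.r3 = arg;

    let mut executed = 0u64;
    while ctx.pc != LR_HALT_SENTINEL && executed < MAX_INSTRS_PER_ISR {
        step(ctx);
        executed += 1;
    }
    let completed = ctx.pc == LR_HALT_SENTINEL;
    *ctx = saved; // the victim thread never observes the detour
    if completed {
        Some(executed)
    } else {
        None
    }
}
```

Because the context is restored before returning, the caller needs no
SavedCallbackCtx bookkeeping for this path.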

Why: audit-059 measured gpu.interrupt.delivered{source=0} = 54 over
3.9 s vs canary's 4712 over 30 s. Per-second shortfall ~11×. Old
asynchronous LR-sentinel injection (try_inject_graphics_interrupt)
needed a Ready or Blocked guest thread to land on; once the Sylpheed
main thread and worker threads all idled post-boot, no victim was
available and every queued vsync got dropped. Host-driven dispatch
decouples delivery from guest-thread readiness.

Smoke test (lockstep): unchanged 54 — under current Sylpheed boot
trajectory the ticker is gated by guest-instruction progress, not
victim availability; lockstep stalls into idle-advance after ~5M
instructions of real work and the synthetic tick_vsync_instr stops
firing. Under --parallel (wallclock ticker) gpu.interrupt.delivered
climbs to ~1131 over a 128 s run, confirming the synchronous
dispatcher itself works as intended. Architectural piece is now in
place; raising the lockstep delivery rate requires ticking the
synthetic vsync inside coord_idle_advance, which is a separate
change.

Changes:
- crates/xenia-kernel/src/interrupts.rs: doc-comment update only.
  SavedCallbackCtx + CALLBACK_STACK_PAD retained — the audio
  callback path (audit-048) still uses the asynchronous LR-sentinel
  inject on a dedicated per-client worker.
- crates/xenia-app/src/main.rs:
  * dispatch_graphics_interrupts(kernel, mem, &mut stats,
    &mut decode_cache, thunk_map): new fn. Drains the full FIFO per
    call. Victim selection same shape (Ready preferred, else
    Blocked, skip Idle/Exited/ServicingIrq), but the call is
    synchronous - we run step_cached + import-thunk dispatch inline
    on the borrowed ctx until pc == LR_HALT_SENTINEL.
    MAX_INSTRS_PER_ISR = 1M safety budget.
  * coord_pre_round: graphics-IRQ injection call removed. Audio
    path unchanged (still calls try_inject_audio_callback).
  * run_execution + run_execution_parallel: each now owns a
    persistent isr_decode_cache and calls
    dispatch_graphics_interrupts after coord_pre_round.
  * try_inject_graphics_interrupt: deleted (118 LOC).

No new public APIs, no new dependencies, no changes to xenia-cpu.

Tests: workspace 765 passed / 0 failed / 4 ignored (parallel_stress
+ sylpheed_n50m, all gated). Kernel 127/127, app 5/5, cpu 288/288.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 18:58:40 +02:00
MechaCat02
ac2f89a7bb Re-baseline sylpheed_n50m golden post-AUDIT-054
instructions: 50000002 → 50000001 (1-instr shift from FILE_DIRECTORY_FILE
plumbing on NtCreateFile path; all other digest fields unchanged —
imports/swaps/draws/render-targets/shaders/textures all match
prior golden).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 18:11:11 +02:00
MechaCat02
2a8ff9515d AUDIT-054: thread CreateOptions through NtCreateFile + opt-in cache persistence
Track A — FILE_DIRECTORY_FILE handling. NtCreateFile's 9th parameter
`create_options` (sp+0x54 per shim_utils.h:49-50) is now read and
forwarded to open_vfs_file/open_cache_file. When the
FILE_DIRECTORY_FILE bit (0x1) is set on a `cache:\<hash>` path,
the host-side handler `mkdir -p`s instead of `File::create`'ing a
0-byte sentinel that blocked subsequent hierarchical creates of
`cache:\<hash>\<sub>\<leaf>` with NAME_COLLISION. Confirmed by
`opts=0x4021` (incl. FILE_DIRECTORY_FILE) on `cache:\d4ea4615`
and `opts=0x4020` (no DIR bit) on the leaf `.tmp` files. NtOpenFile
forwards `open_options` (r8) into the same slot per
xboxkrnl_io.cc:118-122. Closes the AUDIT-053 ζ-class VFS layout
aliasing wedge.
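
Host-side effect of the DIR bit, as a sketch. `create_cache_entry` is a
hypothetical helper; the real handler also applies create_disposition and
handle bookkeeping, which are omitted here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// NT CreateOptions bit requesting a directory.
const FILE_DIRECTORY_FILE: u32 = 0x1;

/// Create the host-side backing for a `cache:` path: a directory tree when
/// the DIR bit is set, otherwise a plain (empty) file.
fn create_cache_entry(host_path: &Path, create_options: u32) -> io::Result<()> {
    if create_options & FILE_DIRECTORY_FILE != 0 {
        fs::create_dir_all(host_path) // the `mkdir -p` case
    } else {
        fs::File::create(host_path).map(|_| ())
    }
}
```

With `opts=0x4021` the hash path becomes a real directory, so a later
`opts=0x4020` create of a leaf underneath it succeeds instead of colliding
with a 0-byte sentinel file.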

Track B — opt-in persistent cache root. AUDIT-038's per-process
tmpdir + wipe stays the default (preserves lockstep/oracle
determinism + dodges Sylpheed's `<hash>.tmp` journal-append-on-
reboot self-inconsistency). Persistence is now opt-in via
  * `XENIA_CACHE_ROOT=<path>` — explicit path (caller manages
    wiping); hands a stable place to drop a canary-built cache
    for cascade A/B oracle work.
  * `XENIA_CACHE_PERSIST=1` — `$XDG_DATA_HOME/xenia-rs/cache`
    (or `$HOME/.local/share/xenia-rs/cache`).
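
One way the selection could look. The precedence (explicit root first, then
the persist flag) and the `CacheRoot` / `resolve_cache_root` names are
assumptions; the environment is injected as a closure so the logic can be
exercised without touching the process environment.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
enum CacheRoot {
    /// Default: per-process tmpdir, wiped on init.
    Ephemeral,
    /// Opt-in: stable directory, never wiped by the emulator.
    Persistent(PathBuf),
}

/// Resolve the cache root from environment-style lookups.
/// `get` returns the value of a variable, or None when it is unset.
fn resolve_cache_root(get: impl Fn(&str) -> Option<String>) -> CacheRoot {
    if let Some(root) = get("XENIA_CACHE_ROOT").filter(|s| !s.is_empty()) {
        return CacheRoot::Persistent(PathBuf::from(root));
    }
    if get("XENIA_CACHE_PERSIST").as_deref() == Some("1") {
        if let Some(xdg) = get("XDG_DATA_HOME").filter(|s| !s.is_empty()) {
            return CacheRoot::Persistent(PathBuf::from(xdg).join("xenia-rs/cache"));
        }
        if let Some(home) = get("HOME") {
            return CacheRoot::Persistent(
                PathBuf::from(home).join(".local/share/xenia-rs/cache"),
            );
        }
    }
    CacheRoot::Ephemeral
}
```

A caller would pass `|k| std::env::var(k).ok()`; with neither variable set
the result is `Ephemeral`, which preserves the AUDIT-038 default.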

Cold-start (-n 500M, default tmpfs) with FILE_DIRECTORY_FILE fix:
swaps=1 draws=0 imports=40454 cxx_throw=0 — matches master baseline,
no regression. Cache hierarchy now mkdir-p'd correctly: `cache:/`
contains 9 hash dirs (e.g. `d4ea4615/e/`, `aab216c3/5/`) instead
of the 0-byte sentinel files AUDIT-053 found masquerading as
directories.

LOC: +88 / -14 = +74 net (≤80 budget). All 127 xenia-kernel unit
tests pass.

Trace: audit-runs/audit-054-vfs-layout-fix/
  cold-start-digest.json + warm-start-digest.json (defaults)
  persist-cold-digest.json + persist-warm-digest.json (opt-in)
  baseline-master-digest.json (master 25704c5 reference)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 18:11:04 +02:00
MechaCat02
25704c5811 Re-baseline sylpheed_n50m golden post-AUDIT-032
Companion to 49f3eaf (AUDIT-032 dedicated audio worker). With the
audio callback ticker now on by default, the boot trajectory at
50M instr changes:

  instructions  50000009 -> 50000002  (interpreter stop boundary shift)
  imports         407215 -> 40454     (-90% — left audio-wait busy loop)
  swaps                2 -> 1         (degenerate splash repeat lost;
                                       main thread advances past splash)
  draws                0 -> 0         (audio gate != renderer gate per
                                       AUDIT-032 methodology correction)

The 10x imports drop reflects exiting the NtWaitForSingleObjectEx
busy-wait pattern (1.49M -> 30 calls per audit-runs/audit-048-*).
Boot now reaches Stfs/Xam content/crypto init phase. The single
remaining swap is the first splash; main thread is then blocked on
a different handle (0x1280) for follow-up.

sylpheed_n2m unchanged — at 2M instr the audio worker hasn't fired
yet, so the digest is byte-identical pre/post AUDIT-032.

Verified deterministic via two consecutive --expect runs at the new
digest (cargo test -p xenia-app --test sylpheed_oracles -- --ignored
passes in 2.82s).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 15:07:40 +02:00
MechaCat02
49f3eafa15 AUDIT-032: dedicated audio worker thread per client (Plan B)
Replaces APUBUG-PRODUCER-001's random-victim-hijack audio injection
with a dedicated per-client guest worker thread, mirroring xenia-canary's
apu/audio_system.cc:84-159 WorkerThreadMain pattern in xenia-rs's
threading model. Audio callback ticker is now safe to enable by default.

## What changed

- xenia-kernel/src/xaudio.rs: new XAudioState fields worker_handles +
  worker_refs (one slot per client, XAUDIO_MAX_CLIENTS=8). Synthetic
  park-handle helper (0xF000_0000 | client_idx) — outside the normal
  alloc range so wake_eligible_waiters never finds it; the only
  legitimate state-flip is via try_inject_audio_callback.
- xenia-kernel/src/exports.rs: xaudio_register_render_driver spawns a
  64KB-stack guest thread (create_suspended=true) via
  state.scheduler.spawn after registration succeeds. Immediately flips
  the spawned thread's state from Blocked(Suspended) to
  Blocked(WaitAny[synthetic]) so it's parked but not woken. Stores the
  kernel handle so find_by_handle resolves a fresh ThreadRef after slot
  compaction. Failure paths log + leave xaudio.worker_refs[i] = None,
  in which case the ticker drops fires (no random-victim fallback).
- xenia-app/src/main.rs: try_inject_audio_callback resolves the worker
  via worker_handles[index] instead of scanning runqueues for a Ready
  or Blocked victim. The PC+r3 injection and SavedCallbackCtx capture
  are unchanged; the existing LR_HALT restore path re-blocks the
  worker on its synthetic handle for the next tick. Flag handling
  reworked: --xaudio-tick / XENIA_XAUDIO_TICK now act as explicit
  override (truthy = force on, falsey = force off, absent = use the
  KernelState default).
- xenia-kernel/src/state.rs: xaudio_tick_enabled default flipped from
  false to true. Pre-fix it was off because the random-victim hijack
  regressed swaps=2->1; with the dedicated worker that whole class of
  regression is gone.
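
The synthetic park handle can be sketched like this. The constant names and
both helpers are hypothetical; only the `0xF000_0000 | client_idx` encoding
and the 8-client limit come from this message.

```rust
const XAUDIO_MAX_CLIENTS: usize = 8;
const PARK_HANDLE_BASE: u32 = 0xF000_0000;

/// Synthetic handle a client's worker parks on, or None for a bad index.
fn park_handle(client_idx: usize) -> Option<u32> {
    (client_idx < XAUDIO_MAX_CLIENTS).then(|| PARK_HANDLE_BASE | client_idx as u32)
}

/// Inverse mapping: recover the client index from a park handle.
/// Ordinary kernel handles do not carry the 0xF000_0000 tag and map to None.
fn park_handle_client(handle: u32) -> Option<usize> {
    let idx = (handle ^ PARK_HANDLE_BASE) as usize;
    ((handle & PARK_HANDLE_BASE) == PARK_HANDLE_BASE && idx < XAUDIO_MAX_CLIENTS)
        .then(|| idx)
}
```

Keeping the tag outside the normal allocation range is what lets the
generic wake path ignore these handles entirely.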

## Cascade verification at -n 500M (audit-runs/audit-048-audio-host-pump/)

Pre-fix baseline: audit-runs/audit-047-gamma-wedges/ours-end-state.log.

| Dim | Predicted (AUDIT-032)               | Observed                            |
|-----|-------------------------------------|-------------------------------------|
| A   | tid=9 leaves Blocked[0x828A3254]    | Ready @ pc=0x824d1404               |
| B   | tid=10 leaves Blocked[0x828A3230]   | Ready @ same pc/lr                  |
| C   | XAudioSubmitRenderDriverFrame > 0   | Mixer setup path executed           |
| D   | KeReleaseSemaphore 0 -> non-zero    | 0 -> 1; xaudio.callback.delivered=1 |
Bonus: audit-042's tid=6 worker pair on 0x10A0+0x10A4 also went
Blocked->Ready as a downstream effect.

Boot trajectory shifted significantly: NtWaitForSingleObjectEx
1,489,791 -> 30; NtSetEvent 3,334 -> 68; new exports firing
(StfsCreateDevice, ObCreateSymbolicLink, XamContentCreateEnumerator,
XamEnumerate, XamTaskSchedule, ExCreateThread x10, KeSetAffinityThread x7,
NtCreateSemaphore x4, NtWaitForMultipleObjectsEx x94, NtDuplicateObject x14,
XeCryptSha, XeKeysConsolePrivateKeySign). The system left the
audio-wait busy loop and entered the savegame/content/crypto init phase.

swaps regressed 2 -> 1 (degenerate splash repeat lost; main thread now
advances past splash entirely, blocked on a different handle). draws
unchanged at 0 — expected per AUDIT-032 (audio gate != renderer gate).

## Tests + scope

- cargo build --release succeeds, no new warnings.
- cargo test -p xenia-kernel --lib: 127/127 pass (incl. xaudio).
- cargo test -p xenia-app --lib: 5/5 non-ignored pass.
- Lockstep goldens (sylpheed_n2m / sylpheed_n50m) WILL drift on this
  fix and need re-baselining as a follow-up commit.

75 net non-comment LOC across 4 files, within AUDIT-032's
60-120 LOC budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 15:06:25 +02:00
MechaCat02
7bc9e3acac Merge analysis-overhaul/vmx-stores (M9.5 + M11.5 + VMX + SJIS/UTF-8) 2026-05-10 00:36:53 +02:00
MechaCat02
e428ce33aa M9.5 + M11.5 + VMX + SJIS/UTF-8: close the post-M5.5 deferred set
Closes the four remaining deferred follow-up items in one bundle.
All four are smaller-scope and additive; lockstep determinism
unaffected (analyzer-only changes).

## M9.5 — __CxxFrameHandler scope-table parsing

- New `xenia_analysis::eh_scope` module. Magic-scans .rdata for the
  three documented MSVC FuncInfo signatures (0x19930520/21/22) on
  4-byte alignment. Each match is parsed as the documented struct
  (BE u32 fields), with sanity caps on max_state / n_try_blocks /
  pointer validity.
- Walks pUnwindMap (UnwindMapEntry, 8 bytes) and pTryBlockMap
  (TryBlockMapEntry, 20 bytes) into one row each.
- New tables eh_funcinfo, eh_unwind_map, eh_try_blocks.
- Sylpheed yield: 2,588 FuncInfo (all version 0x19930522) /
  10,019 unwind entries / 315 try-blocks.
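
Magic-scan sketch, under stated assumptions: the caps are placeholder
values, pointer-validity checks are left out, and only the first five
FuncInfo fields are parsed. Function and type names here are hypothetical,
not the `eh_scope` API.

```rust
/// Leading fields of an MSVC FuncInfo record (all big-endian u32 here).
#[derive(Debug, PartialEq)]
struct FuncInfo {
    offset: usize,
    magic: u32,
    max_state: u32,
    p_unwind_map: u32,
    n_try_blocks: u32,
    p_try_block_map: u32,
}

// Placeholder sanity caps; the real limits are not stated in this message.
const MAX_STATE_CAP: u32 = 4096;
const MAX_TRY_BLOCKS_CAP: u32 = 1024;

/// Bounds-checked big-endian u32 read.
fn be_u32(data: &[u8], off: usize) -> Option<u32> {
    let b = data.get(off..off.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Try to parse a FuncInfo at `offset`; None if the magic or caps fail.
fn parse_funcinfo(rdata: &[u8], offset: usize) -> Option<FuncInfo> {
    let magic = be_u32(rdata, offset)?;
    if !(0x1993_0520..=0x1993_0522).contains(&magic) {
        return None;
    }
    let info = FuncInfo {
        offset,
        magic,
        max_state: be_u32(rdata, offset + 4)?,
        p_unwind_map: be_u32(rdata, offset + 8)?,
        n_try_blocks: be_u32(rdata, offset + 12)?,
        p_try_block_map: be_u32(rdata, offset + 16)?,
    };
    (info.max_state <= MAX_STATE_CAP && info.n_try_blocks <= MAX_TRY_BLOCKS_CAP)
        .then(|| info)
}

/// Scan every 4-byte-aligned offset of a section for FuncInfo records.
fn scan_funcinfo(rdata: &[u8]) -> Vec<FuncInfo> {
    (0..rdata.len())
        .step_by(4)
        .filter_map(|off| parse_funcinfo(rdata, off))
        .collect()
}
```

The caps are what keep a stray 0x1993052x word inside unrelated data from
being reported as a record.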

## M11.5 — Static-init driver chain detection

- New `xenia_analysis::static_init` module. Walks every function
  looking for the canonical _initterm loop: lwz cursor; mtctr;
  bcctrl; addi cursor, cursor, 4 bounded by a compare against another
  constant register. Extracts (array_start, array_end) and reads
  the array.
- Reuses `function_pointer_arrays` table — drivers' arrays land with
  kind='static_init' (replacing M11's prologue-heuristic output where
  the structurally-grounded pattern fires).
- Sylpheed yield: 0 drivers detected — the binary's static-init
  structure does not match the canonical CRT loop. Infrastructure
  ready; a future M11.6 can relax the pattern match.

## VMX vector-store xrefs (M6 follow-up)

- Adds AltiVec/VMX X-form load/store XOs to the M6 opcode-31
  dispatch: lvx/lvxl/lvebx/lvehx/lvewx (reads) and
  stvx/stvxl/stvebx/stvehx/stvewx (writes), all addr_mode=
  'x_form_indexed'. Static resolution still requires both rA and rB
  constant.
- Sylpheed yield: 110 newly-detected stvx writes.

## Shift_JIS + UTF-8 localised-string detection (M7 follow-up)

- Extends `xenia_analysis::strings::analyze` with scan_shift_jis (JIS
  X 0208 lead/trail byte ranges + half-width katakana pass-through)
  and scan_utf8 (2- and 3-byte sequences). At least one multi-byte
  unit required so pure-ASCII strings aren't double-counted.
- SJIS bytes rendered as \xHH escapes for diagnostic readability;
  full SJIS→UTF-8 decoding deferred.
- Sylpheed yield: 790 Shift_JIS strings (Japanese debug + UI text)
  + 39 UTF-8.
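
A simplified take on the Shift_JIS pass. The byte ranges are the usual
Shift_JIS lead/trail and half-width katakana ranges; requiring a NUL
terminator and a minimum length are assumptions of this sketch, and the
function here is not the real `scan_shift_jis` in `strings::analyze`.

```rust
fn is_sjis_lead(b: u8) -> bool {
    matches!(b, 0x81..=0x9F | 0xE0..=0xEF)
}

fn is_sjis_trail(b: u8) -> bool {
    matches!(b, 0x40..=0x7E | 0x80..=0xFC)
}

fn is_halfwidth_katakana(b: u8) -> bool {
    matches!(b, 0xA1..=0xDF)
}

fn is_printable_ascii(b: u8) -> bool {
    matches!(b, 0x20..=0x7E)
}

/// Return (start, byte_len) of NUL-terminated runs that contain at least
/// one double-byte character, so pure-ASCII strings are not reported.
fn scan_shift_jis(data: &[u8], min_len: usize) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let start = i;
        let mut double_byte = 0;
        while i < data.len() {
            let b = data[i];
            if is_sjis_lead(b) && i + 1 < data.len() && is_sjis_trail(data[i + 1]) {
                double_byte += 1;
                i += 2;
            } else if is_printable_ascii(b) || is_halfwidth_katakana(b) {
                i += 1;
            } else {
                break;
            }
        }
        let len = i - start;
        let terminated = i < data.len() && data[i] == 0;
        if terminated && double_byte > 0 && len >= min_len {
            out.push((start, len));
        }
        i += 1; // skip the byte that ended the run
    }
    out
}
```

The double-byte counter is the "at least one multi-byte unit" rule: a run
of plain ASCII never increments it and is left to the ASCII scanner.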

## Tests

- +2 EH (parses_minimal_funcinfo_v0, rejects_bogus_max_state)
- +2 static_init (detects_canonical_initterm_loop, rejects_function_without_pattern)
- +2 strings (detects_shift_jis_string, detects_utf8_multibyte_string)

Tests 649→655 (+6 unit tests). DB schema golden + write_analysis_results
signature updated for new EH parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 00:36:53 +02:00
MechaCat02
b03192c772 Merge analysis-overhaul/m5.5-this-flow 2026-05-09 23:35:05 +02:00
MechaCat02
56ffa40a6a M5.5: this-flow indirect-dispatch resolution via vptr-write inference
Closes the dominant case M5 could not resolve — `lwz vt, off(this);
lwz fn, slot(vt); mtctr; bcctrl` (real C++ dispatch). Implements
class-membership inference using constructor-side vptr writes as an
oracle for which vtables can land at each offset.

## Algorithm

Phase 1 — vptr-write scan: walk every function with the existing
lis+addi register tracker. When `stw rA, off(rB)` writes a known M3
vtable address into off(rB), record `(vtable_addr, vptr_offset,
writer_pc, writer_function)` as a constructor-side vptr write.

Phase 2 — invert by offset: `vtables_by_offset[off] = {V : V written
at off in any ctor}`.

Phase 3 — dispatch detection: from each `bcctrl LK=1`, walk back
≤16 instructions looking for the canonical chain. Bail on register
clobber, branch, or label (basic-block) boundary.

Phase 4 — edge emission: for `(dispatch_pc, vptr_off, slot)`, emit one
`xrefs.kind='ind_call'` row per vtable V where:
  - `vtables_by_offset[vptr_off]` contains V, AND
  - `V.length > slot` (V actually has a method at that slot)

Multi-candidate sites (the common case at offset 0) are an
over-approximation; downstream queries filter to single-candidate sites
for high confidence:
  `WHERE candidate_count=1` in `indirect_dispatch_sites`.
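
Phases 2 and 4 in miniature. The struct and function names are illustrative
rather than the analyzer's real types; `BTreeMap`/`BTreeSet` are used only
to keep the output order deterministic.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One constructor-side vptr write found in phase 1.
struct VptrWrite {
    vtable_addr: u32,
    vptr_offset: u32,
}

/// A known vtable: method addresses indexed by slot.
struct Vtable {
    methods: Vec<u32>,
}

/// Phase 2: invert the writes into offset -> set of vtables seen there.
fn vtables_by_offset(writes: &[VptrWrite]) -> BTreeMap<u32, BTreeSet<u32>> {
    let mut map: BTreeMap<u32, BTreeSet<u32>> = BTreeMap::new();
    for w in writes {
        map.entry(w.vptr_offset).or_default().insert(w.vtable_addr);
    }
    map
}

/// Phase 4: for one dispatch site, list (vtable, method) candidates.
/// A vtable qualifies only if it was written at `vptr_off` and actually
/// has a method at `slot` (the `V.length > slot` check).
fn candidates(
    by_offset: &BTreeMap<u32, BTreeSet<u32>>,
    vtables: &BTreeMap<u32, Vtable>,
    vptr_off: u32,
    slot: usize,
) -> Vec<(u32, u32)> {
    by_offset
        .get(&vptr_off)
        .into_iter()
        .flatten()
        .filter_map(|addr| {
            let method = *vtables.get(addr)?.methods.get(slot)?;
            Some((*addr, method))
        })
        .collect()
}
```

A site whose candidate list has length 1 corresponds to the
`candidate_count=1` rows; longer lists are the over-approximated case.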

## Schema

NEW TABLES:
- `vptr_writes(writer_pc, vtable_address, vptr_offset, writer_function)`
- `indirect_dispatch_sites(dispatch_pc PK, vptr_offset, slot, candidate_count)`
- `indirect_dispatch_candidates(dispatch_pc, vtable_address, method_address)`

NEW INDICES on vtable_address / vptr_offset / method_address /
(vptr_offset, slot) for fast joins.

## Sylpheed yield

- 567 vptr writes / 214 vtables / 29 offsets (offset 0 = 88%).
- 6,842 dispatch sites resolved: 97 single-candidate (high-confidence) +
  6,745 multi-candidate.
- 687,963 ind_call xref rows.
- 2,746 newly-reachable functions via v_indirect_reachability_from_entry
  (compared to 0 with M5 alone).
- Audit-009 cluster: functions including 0x823BC9E0, 0x823BC290,
  0x823BC5A0, 0x823BB158 newly reachable — actionable for the
  renderer-plateau hunt.

Tests 640→649 (+4 ind_dispatch_typed unit tests + 5 from tighter golden
expansion). Schema golden + write_analysis_results signature updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 23:35:05 +02:00
MechaCat02
d8766c6242 Merge vfs-cache-persistent/p0-real-disk-backing — audit-038 cache fix
Replaces the "Synthesized empty file" cache:/* stub with persistent
host-FS HostPathDevice backing. sub_82459D18 / sub_8245D230 (cache-miss
reconstruct + resize-and-zero-fill) drop from constant fires to 0;
multiple MB of cache files persist to disk per boot. swaps=2 plateau
unmoved at -n 100M; cluster activation gate (audit-009) remains.
Tests 640 -> 645. Lockstep deterministic across 3+ reruns at
instructions=100000004 / imports=987485.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 14:34:34 +02:00
MechaCat02
77034b6cbf audit-038: persistent cache:/* VFS via host-FS backing
Replaces the "Synthesized empty file" stub for cache:/* paths with a
real host-FS HostPathDevice-style mount. Each KernelState gets a fresh
per-process tmpdir under /tmp/xenia-rs-cache-<pid>-<id>/ which is
cleared on init for lockstep determinism (mirrors canary's
xenia_main.cc:649 RegisterSymbolicLink("cache:", "\\CACHE") +
HostPathDevice in xenia-canary/src/xenia/vfs/devices/host_path_device.cc).

NtCreateFile now honours create_disposition for cache: paths:
  FILE_OPEN          -> NOT_FOUND if missing
  FILE_CREATE        -> NAME_COLLISION if present
  FILE_OPEN_IF       -> open or create
  FILE_OVERWRITE_IF  -> create or truncate
  FILE_OVERWRITE     -> NOT_FOUND if missing, else truncate
  FILE_SUPERSEDE     -> create or truncate
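
The table above as a pure decision function. The enum and function names
are illustrative, and status codes are reduced to the two that appear in
the table.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Disposition {
    Supersede,
    Open,
    Create,
    OpenIf,
    Overwrite,
    OverwriteIf,
}

#[derive(Debug, PartialEq)]
enum Action {
    OpenExisting,
    CreateNew,
    Truncate,
}

#[derive(Debug, PartialEq)]
enum Status {
    NotFound,
    NameCollision,
}

/// Decide what to do with a cache: path given the requested disposition
/// and whether the host file already exists.
fn resolve_disposition(d: Disposition, exists: bool) -> Result<Action, Status> {
    use Disposition::*;
    match (d, exists) {
        (Open, true) => Ok(Action::OpenExisting),
        (Open, false) => Err(Status::NotFound),
        (Create, true) => Err(Status::NameCollision),
        (Create, false) => Ok(Action::CreateNew),
        (OpenIf, true) => Ok(Action::OpenExisting),
        (OpenIf, false) => Ok(Action::CreateNew),
        (Overwrite, true) => Ok(Action::Truncate),
        (Overwrite, false) => Err(Status::NotFound),
        (OverwriteIf | Supersede, true) => Ok(Action::Truncate),
        (OverwriteIf | Supersede, false) => Ok(Action::CreateNew),
    }
}
```

Separating the decision from the std::fs calls keeps every row of the
table testable without touching the disk.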

NtReadFile / NtWriteFile / NtSetInformationFile (XFileEndOfFileInformation)
/ NtQueryInformationFile / NtQueryFullAttributesFile route through
std::fs against the per-handle host_path; non-cache paths keep their
legacy semantics (read-only disc image, synth-empty stubs).

Verified by audit-037 cascade:
- sub_82459D18 (cache-miss restore): 0 fires (was firing constantly)
- sub_8245D230 (resize/zero-fill):  0 fires (was firing constantly)
- 105+ real cache-file writes per 500M run; 4+ MB of game data persisting
  to disk per boot; cache:/recent, cache:/access, cache:/d4ea*.tmp, etc.
- Lockstep deterministic at instructions=100000004 / imports=987485
  across 3+ reruns (digest shifted as expected; goldens re-baselined).
- swaps=2 plateau still in place; cluster L1 unactivated. Cascade
  dimension D (cluster activation) — UNKNOWN, no L1 fires.

Tests 640 -> 645 (+5 cache-specific unit tests; full workspace green).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 14:34:27 +02:00
MechaCat02
9028021936 Merge analysis-overhaul/m9-eh-flag (M8+M9+M10+M11+M12) 2026-05-08 22:29:39 +02:00
MechaCat02
5af792c9fc M8+M9+M10+M11+M12: LOW-tier milestones — funcptr-arrays, EH flag, TLS, lr-trace
Five LOW-priority milestones bundled. Total ~700 LOC across 11 files.

## M9 — has_eh derived from pdata.flags exception bit
- New `functions.has_eh BOOLEAN NOT NULL` column. Derived from M1's
  already-parsed `pdata.flags` (bit 31 of the packed word — the
  exception-handler-present flag, distinct from bit 30 which is the
  always-1 32-bit-code flag). Index idx_functions_has_eh.
- Sylpheed: 2,975 of 23,073 pdata-validated functions have EH (12.9%).

## M10 — .tls section / IMAGE_TLS_DIRECTORY32 parser
- New `xenia_xex::tls::parse_tls` parses the directory + zero-terminated
  callback array. Returns None when the binary has no .tls section.
- New `tls_info` (singleton row) + `tls_callbacks(slot, address)` tables.
- New `DbWriter::write_tls()` no-ops on None.
- Sylpheed has no .tls section → 0 rows; infra ready for binaries with
  __declspec(thread).

## M8 + M11 — function_pointer_arrays (dispatch tables + static initialisers)
- New `xenia_analysis::funcptr_arrays::analyze` widens M3's vtable scan:
  detects runs of ≥2 function pointers in .rdata and classifies each as
  `vtable` (M3 re-emit), `dispatch_table` (M8), or `static_init` (M11)
  via a constructor-prologue heuristic (mfspr + small stwu).
- New tables `function_pointer_arrays(address PK, length, kind)` and
  `function_pointer_array_entries(array_address, slot, function_address)`.
- Sylpheed: 722 vtables + 388 dispatch_tables = 1,110 arrays / 6,347 slots.
  0 static_init detected (Sylpheed's ctors don't all match the
  conservative heuristic; M11.5 future work can chain via the entry-
  point's static-init driver).

## M12 — --lr-trace runtime canary-diff harness
- New CLI `exec --lr-trace=PC[,PC,...]` and `--lr-trace-out=PATH` flags.
  Symbolic resolution (Class::method, Class::*) via M4 lookup. Env vars
  XENIA_LR_TRACE / XENIA_LR_TRACE_OUT also work.
- New `KernelState::lr_trace_pcs` + `lr_trace_writer` + helper
  `fire_lr_trace_if_match(hw_id)` invoked from the per-instr probe slot.
- JSONL output: pc/tid/hw/cycle/r3/r4/r5/r6/lr — superset of what
  xenia-canary's --log_lr_on_pc patch emits, with a cycle counter for
  cross-run reproducibility. Diff-friendly via `jq`.
- Lockstep digest unaffected: smoke test on entry-point PC fires once
  with cycle=0/lr=BCBCBCBC/all-GPR-zero (correct initial state).

Tests 636→640 (+2 TLS tests, +2 funcptr_arrays tests). Schema golden
updated for new tables + has_eh column. Lockstep determinism preserved
(instructions=2000005 ×2 reruns identical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 22:29:35 +02:00
MechaCat02
85d1603124 Merge analysis-overhaul/m6-extended-stores 2026-05-08 21:38:51 +02:00
MechaCat02
38d8871e8d M6: addr_mode column on xrefs + extended store/load classes
Adds finer-grained addressing-mode classification to every data xref row
plus new dispatch for instruction families not previously emitted:
- New `xrefs.addr_mode VARCHAR NULL` column. NULL for control-flow edges
  (call / ind_call / j / br); one of d_form / lis_addi / lis_ori /
  multiword / x_form_indexed / x_form_byterev / atomic / dcbz for data
  edges. Index idx_xrefs_addr_mode.
- New `xenia_analysis::xref::AddrMode` enum + Xref::addr_mode field.
- Opcode 46/47 (lmw/stmw) expand to one xref per slot — D-form multi-word
  load/store now resolves all (32-rS) consecutive addresses.
- Opcode 31 X-form dispatch — stwx/stbx/sthx/stwux/stbux/sthux/stdx/stdux,
  lwzx/lbzx/lhzx/lhax/lwzux/lbzux/lhzux/lhaux/ldx/ldux,
  stwcx./stdcx. (atomic),
  stwbrx/sthbrx/lwbrx/lhbrx (byte-reverse),
  dcbz (cache-line clear).
- X-form rows are emitted ONLY when both rA and rB resolve to known
  constants (rare but present); the dominant runtime-indexed pattern
  remains correctly skipped.
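
The lmw/stmw expansion can be sketched roughly as below. This is an
illustration only, not the analyzer's code: `expand_multiword` and its
signature are hypothetical, and it assumes the effective address has
already been resolved to a constant.

```rust
/// Hypothetical helper: expand an `lmw`/`stmw` at a resolved effective
/// address into the guest address touched by each register slot.
/// `first_reg` is the rT/rS field (0..=31); registers first_reg..=31 are
/// transferred, one 32-bit word each, so there are (32 - first_reg) slots.
fn expand_multiword(ea: u32, first_reg: u8) -> Vec<u32> {
    assert!(first_reg < 32, "register field is 5 bits");
    (0..(32 - u32::from(first_reg)))
        .map(|slot| ea.wrapping_add(slot * 4))
        .collect()
}
```

Each returned address would become one xref row tagged `multiword`.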

Sylpheed yield (regen on master + merge):
- 442 newly-detected x_form_indexed reads (lwzx/lhzx into static tables).
- 40 newly-detected atomic writes (stwcx./stdcx. with resolvable address).
- 28,834 lis_addi refs, 18,485 d_form reads, 3,288 d_form writes — every
  pre-existing data row now tagged.
- 0 multiword / dcbz / byterev (these instructions exist but aren't on
  lis+addi-tracked code paths).

Tests 633→636 (+3 xref unit tests covering AddrMode tag uniqueness,
data-edge addr_mode round-trip, control-edge None invariant). Schema
golden updated (xrefs gains addr_mode column).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:38:47 +02:00
MechaCat02
81c90f9a53 Merge analysis-overhaul/m5-indirect-reach (M5 + M7) 2026-05-08 21:22:55 +02:00
MechaCat02
ab4fe211e5 M5+M7: indirect-dispatch reachability + .rdata string detection
Two MEDIUM milestones bundled (both opportunistic per plan; both small).

## M5 — indirect-dispatch reachability

- `xenia_analysis::indirect`: per-basic-block register tracker over each
  detected function. Recognises the canonical static-vtable pattern
  `lis+addi → lwz off(rA) → mtctr → bcctrl` where rA holds a known M3
  vtable address. Emits one `Xref { kind: IndirectCall }` per resolvable
  bcctrl site.
- PowerPC ABI awareness: `bl`-style calls clobber volatile r0..r12 + ctr
  but preserve non-volatile r13..r31, so a vtable pointer parked in r30/r31
  before a call survives.
- Label-based basic-block boundaries kill register state — bounds
  false-positive risk for jump-IN paths.
- New `XrefKind::IndirectCall` variant (DB tag `'ind_call'`).
- New SQL view `v_indirect_reachability_from_entry` — strict superset of
  `v_reachability_from_entry`, taking `ind_call` edges in the BFS.
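
The ABI rule and the label rule above amount to two kill operations on
the tracked register state. A minimal sketch, assuming a tracker that
stores one optional known constant per register (`RegState` and its
methods are hypothetical names, not the crate's API):

```rust
/// Hypothetical per-basic-block register state: `Some(v)` means the
/// register is known to hold the constant `v`.
#[derive(Clone, Debug, PartialEq)]
struct RegState {
    gpr: [Option<u32>; 32],
    ctr: Option<u32>,
}

impl RegState {
    fn new() -> Self {
        RegState { gpr: [None; 32], ctr: None }
    }

    /// A `bl`-style call: forget volatile r0..r12 and ctr, keep r13..r31.
    fn apply_call(&mut self) {
        for reg in self.gpr.iter_mut().take(13) {
            *reg = None;
        }
        self.ctr = None;
    }

    /// A label starts a new basic block: forget everything.
    fn apply_label(&mut self) {
        *self = RegState::new();
    }
}
```

With this shape, a vtable address parked in r30 survives `apply_call`
but not `apply_label`.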

Sylpheed yield: 0 edges detected. The binary's 1,001 static lis+addi
references into vtables are nearly all constructor-side vptr writes, not
dispatches; real method dispatch goes through `this->vptr` which requires
alias analysis we explicitly don't do. Documented in SCHEMA.md as the
expected limitation. Three unit tests cover the synthetic-correctness path.

## M7 — string / constant-pool detection

- `xenia_analysis::strings`: scans `.rdata` for runs of ≥ 6 printable
  ASCII bytes (NUL-terminated) and ≥ 6 UTF-16LE code units (basic-plane
  printable ASCII, NUL u16 terminator).
- New `strings(address PK, encoding, length, content)` table + encoding index.
- Implicit cross-ref via existing `xrefs.kind='ref'` rows whose target
  matches a strings.address.
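
The ASCII half of the scan could look roughly like this. It is a sketch
under stated assumptions: `scan_ascii` is a hypothetical name, bytes
0x20..=0x7E are taken as "printable", and the UTF-16LE pass is omitted.

```rust
/// Hypothetical sketch: find NUL-terminated runs of at least `min_len`
/// printable ASCII bytes. Returns (offset, string) pairs.
fn scan_ascii(data: &[u8], min_len: usize) -> Vec<(usize, String)> {
    let mut found = Vec::new();
    let mut start = 0;
    for (i, &b) in data.iter().enumerate() {
        if (0x20..=0x7E).contains(&b) {
            continue; // still inside a printable run
        }
        // The run [start, i) ended here; keep it only if it is long
        // enough and the terminating byte is NUL.
        if b == 0 && i - start >= min_len {
            let text = String::from_utf8_lossy(&data[start..i]).into_owned();
            found.push((start, text));
        }
        start = i + 1;
    }
    found
}
```

A run that reaches the end of the section without a NUL is dropped,
which matches the NUL-terminated requirement above.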

Sylpheed yield: 6,311 ASCII strings (including embedded HLSL shader source
and AS_CB_SURFACE_SWIZZLE_* assertion strings). 9,132 lis+addi sites
cross-reference detected strings, so one query lists the source PCs that
reference each string. Four unit tests cover encoding detection, NUL termination, and
short-run rejection.

Tests 626→633 (+3 indirect, +4 strings).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:22:50 +02:00
MechaCat02
0209e88f0a Merge analysis-overhaul/m4-classaware-probes 2026-05-08 20:22:25 +02:00
MechaCat02
4ff08f6116 M4: class-aware probe tokens via M3 vtable+method tables
CLI extension only — no schema change. Adds symbolic resolution for
--pc-probe / --branch-probe / --ctor-probe tokens:
- `0xADDR` / `2186674160` — numeric (current behavior, no DB load).
- `Class::method` — joins classes × methods × demangled_names.
- `Class::*` — joins classes × methods (all slots).
- `function_name` — falls back to functions.name for free functions /
  saverestore stubs / labels.
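
The four token shapes can be told apart before any DB access. A rough
sketch (the `ProbeToken` enum and `classify_token` are hypothetical and
only classify; the real `resolve_probe_token` also performs the joins):

```rust
/// Hypothetical token classes for probe tokens.
#[derive(Debug, PartialEq)]
enum ProbeToken {
    Numeric(u32),
    ClassMethod { class: String, method: String },
    ClassAll(String),
    FunctionName(String),
}

fn classify_token(token: &str) -> ProbeToken {
    // Numeric forms never need the DB: hex with 0x prefix, or decimal.
    let hex = token.strip_prefix("0x").or_else(|| token.strip_prefix("0X"));
    if let Some(digits) = hex {
        if let Ok(pc) = u32::from_str_radix(digits, 16) {
            return ProbeToken::Numeric(pc);
        }
    } else if let Ok(pc) = token.parse::<u32>() {
        return ProbeToken::Numeric(pc);
    }
    // Split at the last `::` so namespaced class names stay intact.
    match token.rsplit_once("::") {
        Some((class, "*")) => ProbeToken::ClassAll(class.to_string()),
        Some((class, method)) => ProbeToken::ClassMethod {
            class: class.to_string(),
            method: method.to_string(),
        },
        None => ProbeToken::FunctionName(token.to_string()),
    }
}
```

Only the three non-numeric variants would need the DB opened.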

New `xenia_analysis::lookup::resolve_probe_token(db_path, token)` opens the
DB read-only ONLY when a token is non-numeric, so legacy numeric flows pay
no IO. New `--probe-db PATH` flag (or `XENIA_PROBE_DB` env / default
`sylpheed.db` next to the .iso) selects the DB.

Symbolic resolution happens BEFORE any guest exec, so it cannot affect the
lockstep digest. Verified deterministic across two reruns at -n 2M
(instructions=2000005 identical).

End-to-end smoke test on Sylpheed: `--pc-probe='ANON_Class_6B674251::*'`
resolves to all 45 method PCs of that anonymous class (matching the
methods-table row count for that vtable).

Tests 621→626 (+5 lookup unit tests covering numeric passthrough,
symbolic-without-DB error, Class::method resolution, Class::* expansion,
and functions.name fallback).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:22:21 +02:00
MechaCat02
3bd77ab506 Merge analysis-overhaul/m3-vtables-rtti 2026-05-08 20:17:50 +02:00
MechaCat02
1d6c51fbf8 M3: vtable scan + MSVC RTTI walk + 3 new tables
Adds detection of statically-allocated MSVC vtables in .rdata/.data:
- New `xenia_analysis::vtables` walks read-only sections looking for runs of
  ≥3 contiguous big-endian u32 values where each value lands on a known
  function start (from M1's corrected functions table). 2-slot runs are
  rejected to keep false-positive rate down.
- For each candidate the MSVC RTTI walk vtable[-1] → CompleteObjectLocator
  → TypeDescriptor → mangled name is attempted; on success the demangled
  class name is recorded along with a best-effort RTTIClassHierarchyDescriptor
  walk to fill base_classes_json. On failure (RTTI stripped — common for
  shipped game binaries) the class is named ANON_Class_<fnv1a-hash> keyed
  by sorted method-PC list, so identical vtables collapse to one entry.
- DB: new tables `vtables`, `methods`, `classes` with indices on
  function_address and rtti_present. `write_analysis_results` takes a
  `&[Vtable]` slice; `write_disasm` (back-compat) passes empty.
- cmd_dis wires the scan after xref analysis using
  `func_analysis.functions.keys()` as the function-start oracle.
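
The core of the run detection can be sketched as follows. This is a
simplified illustration: `scan_vtables` is a hypothetical name, the
input is assumed 4-byte aligned, and the RTTI walk and class naming are
left out.

```rust
use std::collections::BTreeSet;

/// Hypothetical sketch: return (byte_offset, slot_count) for every
/// maximal run of at least `min_slots` consecutive big-endian u32
/// values that are all known function starts.
fn scan_vtables(
    data: &[u8],
    funcs: &BTreeSet<u32>,
    min_slots: usize,
) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let (mut run_start, mut run_len) = (0usize, 0usize);
    let words = data.len() / 4;
    // Iterate one past the end so a run touching the end is flushed.
    for i in 0..=words {
        let hit = i < words && {
            let o = i * 4;
            let w = u32::from_be_bytes([data[o], data[o + 1], data[o + 2], data[o + 3]]);
            funcs.contains(&w)
        };
        if hit {
            if run_len == 0 {
                run_start = i * 4;
            }
            run_len += 1;
        } else {
            if run_len >= min_slots {
                runs.push((run_start, run_len));
            }
            run_len = 0;
        }
    }
    runs
}
```

Called with `min_slots = 3`, a 2-slot run is rejected as described
above.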

Validation on Sylpheed (RTTI stripped, as expected): 722 vtables / 499
unique classes / 5571 methods. Sanity invariant: every methods.function_address
joins to functions.address (0 broken refs). Largest vtable: 131 slots.

Tests 617→621 (+4 vtable unit tests covering 3-slot detect, 2-slot reject,
synth name stability, and synth name divergence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:17:45 +02:00
MechaCat02
bd5753311e Merge analysis-overhaul/m2-demangler 2026-05-08 20:02:25 +02:00
MechaCat02
89f5f7e4a9 M2: MSVC C++ demangler + demangled_names DB table
Adds an MSVC name-demangling layer in front of M3's vtable / RTTI work:
- New `xenia_analysis::demangle` wraps the `msvc-demangler` crate (a Rust
  port of LLVM's `MicrosoftDemangle.cpp`). `demangle()` short-circuits on
  non-mangled inputs (`?` prefix check); `demangle_or_raw()` always returns
  a record (raw passthrough on parse failure).
- Heuristic split of the formatted demangled string into structured fields
  `(namespace_path, class_name, method_name, params_signature)`. Top-level
  paren / template-bracket aware, so `a::b<c::d>::e` and signatures with
  templated arg types parse correctly.
- DB: new `demangled_names(address, mangled, raw_demangled, namespace_path,
  class_name, method_name, params_signature)` with indices on address /
  class_name / method_name. Populated from any label whose name starts with
  `?` plus any import name that happens to be mangled.
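
The bracket-aware split can be illustrated with a small depth counter.
A sketch only (`split_top_level` is a hypothetical name; names such as
`operator<` or `operator->` would need extra handling that is not shown
here):

```rust
/// Hypothetical sketch: split a demangled qualified name on `::` only at
/// nesting depth zero, so separators inside `<...>` or `(...)` are kept.
fn split_top_level(name: &str) -> Vec<String> {
    let bytes = name.as_bytes();
    let mut parts = Vec::new();
    let mut depth = 0usize;
    let mut start = 0;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'<' | b'(' => depth += 1,
            b'>' | b')' => depth = depth.saturating_sub(1),
            b':' if depth == 0 && bytes.get(i + 1) == Some(&b':') => {
                parts.push(name[start..i].to_string());
                i += 2; // skip both colons
                start = i;
                continue;
            }
            _ => {}
        }
        i += 1;
    }
    parts.push(name[start..].to_string());
    parts
}
```

For `a::b<c::d>::e` this yields `a`, `b<c::d>`, `e`; the last component
would be the method name and the rest the namespace/class path.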

For Sylpheed (a fully stripped binary) this table is empty out-of-the-box;
the layer's value lands in M3, which will append rows for every RTTI
TypeDescriptor name found in `.rdata`.

Tests 610→617 (+7 demangler unit tests covering early-out, raw fallback,
member function form, RTTI form, qname split, paren-template safety, and
top-level `::` splitting).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:02:21 +02:00
MechaCat02
fd68285210 Merge analysis-overhaul/m1-pdata-boundaries 2026-05-08 19:44:07 +02:00
MechaCat02
70120465a3 M1: parse .pdata RUNTIME_FUNCTION; cross-validate function boundaries
Adds an authoritative function-boundary source from the linker:
- New `xenia_xex::pdata` parses .pdata 8-byte entries (BeginAddress + packed
  prolog/length/flags). Bit layout per Microsoft PE32 PowerPC spec: prolog in
  bits 0..7, function_length in bits 8..29, flags in 30..31.
- `func::analyze_with_pdata` unions pdata BeginAddresses into the candidate
  set, attaches `pdata_validated`/`pdata_length` to each `FuncInfo`, and trims
  any function whose `end` overlaps the next start (catches mis-merge where
  one row spanned two prologues — the audit-031 sub_824D23B0/sub_824D29F0
  case).
- DB: extends `functions` with `pdata_validated BOOLEAN`, `pdata_length BIGINT`;
  new table `pdata_entries`; index on pdata_validated.
- New `crates/xenia-analysis/SCHEMA.md` documents M1 layer + forward work.
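
The bit layout above can be decoded as sketched below. Assumptions of
this sketch: `PdataEntry` and `decode_pdata` are hypothetical names, both
words are big-endian, and bit 0 is the least-significant bit of the
second word. The unit of `function_length` is left as stored.

```rust
/// Hypothetical decoded view of one 8-byte .pdata entry.
#[derive(Debug, PartialEq)]
struct PdataEntry {
    begin_address: u32,
    prolog_length: u32,   // bits 0..7
    function_length: u32, // bits 8..29
    flags: u32,           // bits 30..31
}

/// Decode one entry from 8 big-endian bytes.
fn decode_pdata(raw: [u8; 8]) -> PdataEntry {
    let begin_address = u32::from_be_bytes([raw[0], raw[1], raw[2], raw[3]]);
    let packed = u32::from_be_bytes([raw[4], raw[5], raw[6], raw[7]]);
    PdataEntry {
        begin_address,
        prolog_length: packed & 0xFF,
        function_length: (packed >> 8) & 0x3F_FFFF, // 22-bit field
        flags: packed >> 30,
    }
}
```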

Validation on Sylpheed: 25481 functions (was 12156) / 23073 pdata_validated /
0 orphans / 0 mis-merges. Audit-031 mis-merge resolved: sub_824D29F0 now has
its own row with `pdata_length=280` (70 dwords); sub_824D23B0 now correctly
ends at 0x824D2878 (`pdata_length=1224` matches prologue walk).

Tests 605→610. New 5-test pdata unit suite covers bit layout + sentinel +
out-of-range filtering + real-world layout round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 19:44:02 +02:00
MechaCat02
e061e21851 Merge audit-helper/p0-dump-section: --dump-section=BASE:LEN:PATH end-of-run snapshot 2026-05-08 15:05:07 +02:00
MechaCat02
690943ceef gate dump-section reads on is_mapped; trim doc comments
Without the page-state guard, read_bulk faulted on PROT_NONE pages of
the 4 GiB host reservation. Per-page is_mapped check skips uncommitted
pages, leaving the buffer's leading zero bytes in place. Total LOC
budget after trim: 70.
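
A rough sketch of the per-page guard, for illustration only. The
function name, the closure parameters, and the 4 KiB page size are
assumptions of this sketch, not the actual helper:

```rust
/// Assumed page granularity for this sketch; the real value may differ.
const PAGE_SIZE: u32 = 0x1000;

/// Hypothetical sketch: copy `len` bytes starting at guest address `base`
/// into a zero-filled buffer, skipping pages where `is_mapped` is false.
fn dump_section(
    base: u32,
    len: u32,
    is_mapped: impl Fn(u32) -> bool,
    read: impl Fn(u32, &mut [u8]),
) -> Vec<u8> {
    let mut out = vec![0u8; len as usize];
    let end = u64::from(base) + u64::from(len);
    let mut addr = u64::from(base);
    while addr < end {
        // Each chunk stops at the next page boundary or the range end.
        let page_end = (addr / u64::from(PAGE_SIZE) + 1) * u64::from(PAGE_SIZE);
        let chunk_end = page_end.min(end);
        if is_mapped(addr as u32) {
            let lo = (addr - u64::from(base)) as usize;
            let hi = (chunk_end - u64::from(base)) as usize;
            read(addr as u32, &mut out[lo..hi]);
        }
        addr = chunk_end;
    }
    out
}
```

Unmapped pages never reach `read`, so their slice of the buffer keeps
its initial zeros.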
2026-05-07 21:45:54 +02:00
MechaCat02
412ba858b4 move dump-section flush above quiet gate so it fires under --quiet runs
The headless cmd_exec path passes quiet=false in normal use but the
diagnostic --dump-section is independent of the chatty thread/dump
prints, so it should not be gated by --quiet. Lockstep digest preserved.
2026-05-07 21:42:33 +02:00
MechaCat02
08d41cf2fc add --dump-section=BASE:LEN:PATH for end-of-run guest memory snapshot
Drives byte-level memory diffs against canary's Memory::Save dump.
Hot-path zero-cost when absent; lockstep digest unaffected
(instructions=100000003 deterministic across reruns).
2026-05-07 21:40:45 +02:00
MechaCat02
de5a15ecfb Merge xobj-stashhandle/p0-canary-mirror 2026-05-07 21:06:28 +02:00
MechaCat02
c03f2bc9e2 fix(kernel): ensure_dispatcher_object writes XObj signature + handle (canary mirror)
Mirrors canary's `XObject::StashHandle` (xobject.h:253-256): on first
adoption of a guest dispatcher header, stamp +0x08 with the
kXObjSignature fourcc 'X','E','N','\0' and +0x0C with the stash handle
(here the guest pointer itself, since our shadow table is keyed by ptr).

Audit-023/024A documented divergence at addresses such as 0x828F4838
where canary stores "XEN\0" + handle but we left zeros. Lands as
canary-correctness restoration; cascade impact at -n 500M is nil per
the discipline gate (no sharp prediction tied to the writeback).

Lockstep determinism preserved: instructions=100000003,
imports=987516, swaps=2, draws=0 across 2 reruns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 21:06:25 +02:00
MechaCat02
d9e40d3564 Merge audit-helper/p0-mem-watch (--mem-watch infra) 2026-05-06 21:00:23 +02:00
MechaCat02
978a6950d1 feat(memory): --mem-watch=ADDR per-store writer trace
Adds an opt-in diagnostic that emits one tracing line per guest store
overlapping any armed byte address, naming the writer (tid, pc, lr)
plus old/new u32 lanes. Mirrors the --pc-probe / --branch-probe shape;
pc/lr are stamped from worker_prologue via a thread-local Cell, so
default runs (empty watch set) take a single is_empty() check on each
write. Lockstep digest preserved (instructions=100000003 across reruns,
sylpheed_n50m.json golden byte-identical).

Diagnostic infra only; no functional change. Used to identify producers
of dispatch-state writes for the audit-017 / audit-019 hunt.
2026-05-06 21:00:20 +02:00
MechaCat02
cc54ca8e64 Merge ke-resume-thread/p0-canary-mirror (KRNBUG-KE-001)
Real KeResumeThread per canary xboxkrnl_threading.cc:216-227.
Tids 9/10 leave Suspended; downstream gamma-cluster blocker
unchanged. Lockstep deterministic. Goldens re-baselined.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 20:46:55 +02:00
MechaCat02
76dfe7fd7a fix(kernel): KRNBUG-KE-001 — real KeResumeThread per canary mirror
Replace the no-op cookie-returner with a real impl per canary
xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc:216-227
(XObject::GetNativeObject<XThread>()->Resume()). Mirrors
nt_resume_thread plumbing two functions below:
resolve_pseudo_handle -> scheduler.find_by_handle -> resume_ref.

Returns STATUS_SUCCESS if the KTHREAD-pointer-as-handle resolves,
STATUS_INVALID_HANDLE otherwise — matches canary's Resume()/!thread
return semantics.

Cascade-prediction scorecard (audit-018 -> post-fix):
- A PASS: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940)
  leave Suspended -> run prologue -> park on audio buffer-completion
  semaphores 0x828A3254 / 0x828A3230.
- B PARTIAL FAIL: NtSetEvent 667->3334; KeReleaseSemaphore=0;
  XAudioSubmitRenderDriverFrame=0.
- C FAIL (predicted 2->1, actual 2->2): both ExTerminateThread +
  KeReleaseSemaphore still canary-only.
- D FAIL: gamma-cluster blocker unchanged — pc-probe at
  0x82184318/0x82184374 no fires; dump-addr 0x828F4070 no DUMP;
  signal_attempts on 0x1004/0x100c/0x1020/0x15e4 still 0.

Necessary-but-not-sufficient: workers unsuspend but park on a
downstream gate that's part of the audit-009/-016/-017 gamma cluster.

Tests 600 -> 601 (+ke_resume_thread_unblocks_suspended_worker).
Lockstep instructions=100000003 imports=987516 deterministic x2.
Goldens re-baselined: sylpheed_n50m.json instructions
50000003->50000011, imports 407255->407247.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 20:46:46 +02:00
MechaCat02
7ed6192b7b Merge xam-user-signin-state/p0-canary-mirror 2026-05-06 20:08:17 +02:00
MechaCat02
5d2401f9c5 fix(xam): XamUserGetSigninState returns SignedInLocally=1 for user 0
Mirrors canary xam_user.cc:90-101. User 0 returns 1 (SignedInLocally),
all other indices return 0. Replaces the stub_return_zero registration,
whose constant 0 was feeding guest-side branches that look up signin state.

Tests: 599 -> 600.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 20:08:13 +02:00
MechaCat02
d736a1dc12 Merge xnotify-listener/p0-startup-enqueue (KRNBUG-IO-004) 2026-05-06 16:56:01 +02:00
MechaCat02
91a7df5f6a docs(audit): KRNBUG-IO-004 entry + canary export queue post-fix delta
audit-findings.md: full IO-004 entry with cascade-prediction scorecard.
audit-runs/audit-006/canary_export_queue.md: post-IO-004 status note
(7 -> 3 canary-only; 4 reclassified RE-FIRES).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 16:55:57 +02:00
MechaCat02
b78e6fd205 fix(kernel): KRNBUG-IO-004 — real XamNotifyCreateListener + XNotifyGetNext per canary
Canary's RegisterNotifyListener (kernel_state.cc:1013-1033) auto-enqueues four
startup notifications on the first listener whose mask covers kXNotifySystem
(SystemUI=0x09 + SystemSignInChanged=0x0A) and kXNotifyLive
(LiveConnectionChanged=0x02000001 + LiveLinkStateChanged=0x02000003). XNotifyGetNext
(xam_notify.cc:22-96) pops the queue; mask + version filtering is applied at enqueue
time per xnotifylistener.cc:38-51. Our prior stubs returned 0 forever; the dispatch loop
at 0x822f1be8 in sub_822F1AA8 was thus bypassed indefinitely.

Implementation:
- KernelObject::NotifyListener { mask, max_version, queue, waiters } variant.
- KernelState::has_notified_startup + has_notified_live_startup gates.
- xam_notify_create_listener: mask=r3 (qword), max_version=r4 (clamped <=10),
  alloc handle, conditional 4-tuple startup enqueue.
- xnotify_get_next: handle/match_id/id_ptr/param_ptr in r3..r6; pop_front
  (or scan-by-id), with mask + version filter applied at enqueue time.
- 5 unit tests covering: full-mask 4 startup notifications, second-listener
  no re-fire, system-only mask filtering, max_version=0 too-new drop,
  unknown handle returning 0.

Tests: 594 -> 599. Lockstep `-n 100M` instructions=100000012 deterministic
across 2 reruns; the two runs are bit-identical.

Cascade (verified at -n 500M):
- dispatch arm 0x822f1be8 fires; sub_82173DC8 entered.
- 3/21 renderer-cluster L1 PCs newly reached: 0x822c6870 (2 workers),
  0x824563e0, 0x823ddb50.
- canary-only export delta 7 -> 3 (reclassified to fired:
  KeResetEvent, ObCreateSymbolicLink, XamTaskCloseHandle, XamTaskSchedule).
- worker thread count 18 -> 20.
- signal_attempts on handle 0x15e0 = 1 (primary=1), was 0.
- draws=0 still expected at this step.

LOC: 119 (97 impl + 22 scaffolding pattern matches across main.rs / objects.rs
/ state.rs) <= 120.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 16:55:51 +02:00
MechaCat02
50a488776f docs(audit): KRNBUG-AUDIT-008 + KRNBUG-AUDIT-009 diagnostics — renderer cluster fully unreached
Captures two consecutive read-only diagnostic sessions:

AUDIT-008 (2026-05-05): IO-003 model reset. The 0x100c / 0x1004 / 0x15e0
workers ARE spawned post-IO-003; the IO-003 prediction-scorecard's
"UNCREATED" markers were misclassifications (handle audit already showed
the workers parked on lifecycle events, just unlinked from dispatcher
addresses). Hypothesized the gate among the 5 non-create-chain callers
of sub_821800D8 whose parents live in 0x82287000-0x82292FFF.

AUDIT-009 (2026-05-05): falsifies AUDIT-008's β-hypothesis. A 21-PC
--branch-probe (6 parents + 5 shims + dispatcher + 9 audit-005
producer-callsites) shows 0/21 firings at -n 500M — the entire
0x82287000-0x82294000 cluster is unreached. Static analysis confirms
the cluster's level-1 roots have zero non-call xrefs in sylpheed.db.
The gate is structurally above the cluster (vtable / function-pointer
that's never written). Stop condition 1 triggered; discipline gate
fails on box 1 + box 3; no fix this session.

Also updates audit-runs/audit-006/canary_export_queue.md to reflect
the AUDIT-009 evidence: 3 canary-only exports remain REAL_BUT_UNREACHED
(ExTerminateThread, KeReleaseSemaphore, XamUserReadProfileSettings) —
none is the immediate gate.

No code changes; --branch-probe machinery from AUDIT-007 sufficed.
Trace artifacts left untracked under audit-runs/audit-008/ +
audit-runs/audit-009/ (consistent with prior audit-runs/* convention).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 18:53:32 +02:00
MechaCat02
2cce044516 Merge xboxkrnl-ioctl/p0-fsctl-mountinfo (KRNBUG-IO-003) 2026-05-04 22:00:19 +02:00
MechaCat02
a1a7265f29 fix(kernel): KRNBUG-IO-003 — NtDeviceIoControlFile real impl mirroring NullDevice::IoControl
Replace the stub_success registration of NtDeviceIoControlFile at
exports.rs:90 with a real handler for FsCtlCodes 0x70000 (drive
geometry) and 0x74004 (partition info), mirroring xenia-canary
xboxkrnl_io.cc:645-678 + null_device.{h,cc}. The 16-byte 0x74004
response with cache_size=0xFF000 at OUT+8 is the gate that lets
sub_824ABD88 return SUCCESS and sub_824A9710 reach the priv-11
XexCheckExecutablePrivilege site identified by KRNBUG-AUDIT-007.

Stack args 9-10 (OutputBuffer, OutputBufferLength) read from the
caller's parameter save area at [sp+0x54] / [sp+0x5C] per the Xbox
360 PowerPC EABI (linkage area sp+0..sp+8, 8-slot (64-byte) spill area
sp+0x14..sp+0x54, then stack args every 8 bytes). First HLE export
in the codebase to need 9+ args.
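
The slot arithmetic described above reduces to one formula. A minimal
sketch, using only the offsets stated in this message (`stack_arg_addr`
is a hypothetical helper name; how wide a value is loaded from the slot
is not covered here):

```rust
/// Hypothetical helper: guest address of the stack slot holding HLE
/// argument `index` (zero-based). Arguments 0..=7 travel in r3..r10, so
/// only index >= 8 has a stack slot. The first stack argument sits at
/// sp+0x54 and later ones follow every 8 bytes.
fn stack_arg_addr(sp: u32, index: u32) -> Option<u32> {
    if index < 8 {
        return None; // passed in a register, not on the stack
    }
    Some(sp.wrapping_add(0x54 + (index - 8) * 8))
}
```

Index 8 (OutputBuffer) maps to sp+0x54 and index 9 (OutputBufferLength)
to sp+0x5C.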

Cascade vs. KRNBUG-AUDIT-007 prediction (5/8 held):
- XexCheckExecutablePrivilege count 1 → 2 (priv=0xA + priv=0xB) ✓
- XamTaskSchedule count 0 → 1 ✓
- canary-only exports 7 → 3 (audit predicted ≤3) ✓
- 0x15e0 semaphore signal_attempts 0 → 1 (bonus)
- 0x100c worker spawn DID NOT fire (still UNCREATED) ✗
- 0x1004 signal_attempts unchanged ✗
- Worker spawn count unchanged at 19 ✗

Tests: 592 → 594. Lockstep deterministic at -n 100M (run1 ≡ run2 ≡
run3, byte-identical). instructions=100000010 → 100000019, imports
407417 → 987524 (≈2.4×). swaps=2 draws=0 plateau persists.

sylpheed_n50m golden re-baselined instructions=50000004→50000003,
imports=407362→407255. sylpheed_n2m unchanged.

Still canary-only after this fix: ExTerminateThread,
KeReleaseSemaphore, XamUserReadProfileSettings. The next downstream
gate is somewhere past XamTaskSchedule's completion path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:00:12 +02:00
MechaCat02
58f416c284 Merge investigate-sub-824a9710/p0-branch-probe (KRNBUG-AUDIT-007) 2026-05-04 21:35:15 +02:00
MechaCat02
c51f51f9cb feat(kernel): KRNBUG-AUDIT-007 — --branch-probe instrumentation; sub_824A9710 exit gate identified
Sister to --pc-probe / --ctor-probe but emits a single compact one-line
BRANCH-PROBE record per fire (pc, tid, hw, cycle, r3, lr, cr0/cr6 flags)
with no back-chain. Designed for tracing every conditional-branch fire
inside a candidate-gate function so the last PC reached before the
function epilogue identifies the exit branch.

Runtime trace at audit-runs/audit-007/sub_824A9710-trace.log decisively
identifies the priv-11 gate:

- Exit branch: 0x824a9944 (post bl sub_824ABD88 first call)
- Responsible kernel call: NtDeviceIoControlFile, FsCtlCode=0x74004
  (registered as stub_success at exports.rs:90)
- Mechanical chain: stub returns 0/SUCCESS without writing OUT, game
  reads [out_buf+8], finds zero, assigns hardcoded 0xC0000034
  (STATUS_OBJECT_NAME_NOT_FOUND) at sub_824ABD88:0x824abea8-ac, exits
  via 0x824a9944's lt branch before priv-11 site at 0x824a99a0.

592→592 tests; lockstep instructions=100000010, swaps=2, draws=0
deterministic across reruns. Read-only diagnostic — no fix this session.
Next session: KRNBUG-IO-003 (real NtDeviceIoControlFile per canary
NullDevice::IoControl for FsCtlCodes 0x70000 + 0x74004).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 21:35:10 +02:00
MechaCat02
79697ddf4e Merge xboxkrnl-vol-allocunit/p0-65536-cluster (KRNBUG-IO-002)
Volume-info class-3 alloc unit 2048 → 0x10000 (canary NullDevice
byte-identical). Tests 591 → 592, lockstep deterministic.

Audit-006-predicted 7→0 cascade FALSIFIED (7→7, no movement).
Vol-info is not the priv-11 gate. See KRNBUG-IO-002 entry in
audit-findings.md for the full diagnostic and next-session leads.
2026-05-04 21:01:30 +02:00
MechaCat02
7675035082 fix(kernel): KRNBUG-IO-002 — vol-info class-3 returns 0x10000 alloc unit (canary NullDevice)
`nt_query_volume_information_file` class-3 (`FileFsSizeInformation`)
was returning sectors_per_unit=1, bytes_per_sector=2048 (alloc unit
2048). Replaced with canary's NullDevice byte-identical values
sectors=0x80, bps=0x200 (alloc unit 0x10000), with total /
available allocation units lowered to 0x10 / 0x10 to match.

Reference: xenia-canary/src/xenia/vfs/devices/null_device.h:38-46
(`NullDevice::sectors_per_allocation_unit()` and
`bytes_per_sector()`); consumed by canary's
`NtQueryVolumeInformationFile_entry` at
xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io_info.cc:355-365.

Tests 591 → 592 (added
`nt_query_volume_information_file_class3_returns_64k_alloc_unit`).
Lockstep `instructions=100000010, swaps=2, draws=0` deterministic
across two `--stable-digest -n 100M` reruns. sylpheed_n50m oracle
still matches its existing golden — observably a no-op at -n 50M.

The audit-006-predicted 7→0 cascade did NOT fire (canary-only
exports still 7, identical set; XexCheckExecutablePrivilege still
priv=0xA only; XamTaskSchedule still 0). All 16
NtQueryVolumeInformationFile calls in our 500M trace originate
from a single LR 0x82611f38 and complete successfully — vol-info
is therefore not the priv-11 gate. The fix value is correct
(canary-byte-identical) but is not load-bearing for the gate;
landing it anyway because it's the right value and introduces no
regression. Stop condition triggered per the IO-002 task brief —
no second fix this session.

Next-session: --pc-probe on sub_824A9710 entry to find the actual
upstream gate. See `audit-findings.md` (KRNBUG-IO-002 entry) and
`audit-runs/post-IO-002/` for the full diagnostic trail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 21:01:25 +02:00
MechaCat02
556a8c387a Merge investigate-sub-824aba98/diagnostic (KRNBUG-IO-001) 2026-05-04 20:20:14 +02:00
MechaCat02
bef9793aec feat(kernel): KRNBUG-IO-001 — NtReadFile on synth-empty file returns SUCCESS+0, not EOF
AUDIT-005's static attribution to sub_824ABA98 was wrong. The 0xC0000011
(STATUS_END_OF_FILE) at lr=0x824a97e4 traces to the NtReadFile call at
0x824a9810 inside sub_824A9710 — the cache-loader reads 1024 B from
offset 2048 of `\Device\Harddisk0\partition0`. Our synth-empty fallback
returned EOF (start_pos 2048 > size 0), so the function bailed via
RtlNtStatusToDosError before sub_824ABA98 was ever called.

Canary mounts partition0 to a NullDevice; `NullFile::ReadSync`
([null_file.cc:24-31](xenia-canary/src/xenia/vfs/devices/null_file.cc))
returns X_STATUS_SUCCESS with bytes_read=0 and never touches the
buffer. Sylpheed's caller pre-zeroes the 1024-byte stack buffer
(`memset(sp+208, 0, 1024)` at sub_824A9710 prologue), validates a
"Josh" magic on the first read, and falls back to the cache-recreate
path when the magic doesn't match.

The fix mirrors NullFile semantics: when the open synthesized a
zero-length file (`data.is_empty() && size == 0`), NtReadFile returns
SUCCESS with information=0 and the buffer untouched.
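
The read-path decision can be sketched like this. `SynthFile` and
`read_file` are hypothetical stand-ins; only the synth-empty branch
comes from this message, and the non-empty branch is an assumed
ordinary bounded read.

```rust
const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_END_OF_FILE: u32 = 0xC000_0011;

/// Hypothetical in-memory file: `data` is the backing bytes, `size` the
/// reported length.
struct SynthFile {
    data: Vec<u8>,
    size: u64,
}

/// Returns (status, bytes_read).
fn read_file(file: &SynthFile, offset: u64, buf: &mut [u8]) -> (u32, usize) {
    // Synth-empty file: behave like canary's NullFile and leave the
    // caller's buffer untouched.
    if file.data.is_empty() && file.size == 0 {
        return (STATUS_SUCCESS, 0);
    }
    if offset >= file.data.len() as u64 {
        return (STATUS_END_OF_FILE, 0);
    }
    let start = offset as usize;
    let n = buf.len().min(file.data.len() - start);
    buf[..n].copy_from_slice(&file.data[start..start + n]);
    (STATUS_SUCCESS, n)
}
```

The ordering matters: the synth-empty check has to come before the
offset-past-end check, otherwise the offset-2048 read hits EOF again.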

Effects (chain-of-effects verification at -n 500M):
  - tests: 590 → 591 (added regression covering NullDevice semantics)
  - lockstep: deterministic across 3 reruns (same instructions=100000010,
    swaps=2)
  - sylpheed_n50m golden re-baselined: instructions 50000004→50000000,
    imports 407416→407362
  - canary kernel-call diff: 10 → 7 missing exports
    (XeCryptSha + XeKeysConsolePrivateKeySign + NtDeviceIoControlFile
    now run; the cache-recreate path executes through to NtWriteFile)
  - boot reaches silph::Silph::Impl::OnInit: 19 worker threads spawn
    (was 6 before the fix)
  - parked-handle 0x1004 still signal_attempts=0; the original 0x100c
    and 0x15e0 are now <UNCREATED> because cascade walked past them and
    the handle assignments shifted; new parked sites: 0x12fc/0x1600/
    0x1040/0x10b8/0x15e8/0x1014/0x101c/0x10bc/0x1044
  - draws=0 plateau persists; renderer is multi-causal blocked

Next blocker: per the canary-only diff, XamTaskSchedule + the cluster
of XAM exports (XamTaskCloseHandle, XamUserReadProfileSettings,
ObCreateSymbolicLink) and the post-thread-exit chain (ExTerminateThread,
KeReleaseSemaphore, KeResetEvent) are the next-up frontier.
2026-05-04 20:20:10 +02:00
MechaCat02
a6208a1249 Merge xam-avpack-hdmi/p0-return-8 (KRNBUG-XAM-001) 2026-05-04 18:54:31 +02:00
MechaCat02
19659d7f76 feat(kernel): KRNBUG-XAM-001 — XGetAVPack returns 8 (HDMI), not 0x16
Mirrors canary's cvars::avpack default (xam_info.cc:35) and Sylpheed's
accepted set {3,4,6,8} (xam_info.cc:250-251). With KRNBUG-XEX-001 having
flipped the priv-10 gate, XGetAVPack now reaches its caller in
sub_824AB578; returning 0x16 caused Sylpheed to abort the AV/crypto
block before XeCryptSha. Cascade walks one step (canary-only export
list 11 → 10); sub_824ABA98 is the next candidate.

Tests: 589 → 590. Goldens re-baselined (n50m: 50000005→50000004,
imports 407417→407416). Lockstep deterministic across 3 reruns at
-n 100M (instructions=100000010, import_calls=987686 +2.4×, swaps=2).
9-PC producer probe still 0×; parked handles 0x1004/0x100c/0x15e0
still signal_attempts=0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:54:24 +02:00
MechaCat02
33e49e70c8 Merge xex-check-privilege/p0-real-impl (KRNBUG-XEX-001) 2026-05-04 18:33:02 +02:00
MechaCat02
1a892d4641 feat(kernel): KRNBUG-XEX-001 — real XexCheckExecutablePrivilege from XEX header bitmap
Replace stub_return_zero with a canary-faithful implementation that
returns bit `priv` of the loaded XEX's XEX_HEADER_SYSTEM_FLAGS
(key 0x00030000) bitmap. Mirrors xenia-canary
xboxkrnl_modules.cc:22-39: `(flags >> priv) & 1` for priv < 32, else 0.
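
As a minimal sketch of that formula (the function name here is
illustrative, not the actual export body):

```rust
/// Hypothetical helper mirroring the formula quoted above.
fn check_executable_privilege(system_flags: u32, privilege: u32) -> u32 {
    if privilege < 32 {
        (system_flags >> privilege) & 1
    } else {
        0 // out-of-range index; also avoids an overflowing shift
    }
}
```

The range check is not optional in Rust: shifting a u32 by 32 or more
panics in debug builds.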

Plumbing:
- xenia-xex: header_keys::SYSTEM_FLAGS const + get_system_flags() accessor.
- xenia-kernel/state.rs: pub xex_system_flags: u32 + xex_priv_logged
  HashSet for one-shot per-priv tracing.
- xenia-app: kernel.xex_system_flags wired in cmd_exec_inner.
- xenia-kernel/exports.rs: real export body + unit test covering
  bits 10/11/0/64 + zero-flags case.

Sylpheed's bitmap is 0x00000400 (only XEX_SYSTEM_PAL50_INCOMPATIBLE,
bit 10). At -n 500M with the fix:
- XGetAVPack: 0 -> 1 (priv-10 gate at lr=0x824ab598 flipped).
- 10 other canary-only exports + 9 producer PCs + 3 parked handles
  unchanged. Priv-11 site at sub_824A9710 is downstream and still
  not reached — AV/crypto block aborts after XGetAVPack returns
  our placeholder 0x16 (canary returns 8/HDMI; Sylpheed accepts
  only 3/4/6/8 per xenia-canary xam_info.cc:250-251).

Tests 588 -> 589. Lockstep deterministic (3 reruns identical):
n50m goes 50000008 -> 50000005 instr / 407415 -> 407417 imp / swaps=2 /
draws=0. Goldens re-baselined (sylpheed_n50m, sylpheed_n2m); oracle
test green.

Full chain-of-effects + next-frontier hand-off in audit-findings.md
under KRNBUG-XEX-001.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:32:51 +02:00
MechaCat02
451b3b28fe Merge canary-diff-and-pc-consumer-probe/p0-priv-stub-cascade (KRNBUG-AUDIT-005) 2026-05-04 18:06:26 +02:00
MechaCat02
3e2fc1ec88 feat(kernel): KRNBUG-AUDIT-005 — --pc-probe extension + canary diff identifies XexCheckExecutablePrivilege stub cascade
Extends `--ctor-probe` machinery into `--pc-probe` (clap alias) with
the optional `PC@DISPATCHER:OFFSET` token form: on a hit, the helper
additionally logs `[disp+off]` — what the producer's
`lwz r3, OFFSET(r3)` is about to read. Reuses `parse_hex_u32`; both
flags share parser + storage.
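
A sketch of how such a token might be parsed. Everything here is
illustrative: `PcProbe`, `parse_hex`, and `parse_pc_probe` are
hypothetical, `parse_hex` merely stands in for `parse_hex_u32` (whose
real signature is not shown), and all three fields are assumed to be
hex.

```rust
/// Hypothetical parsed form of one `--pc-probe` token.
#[derive(Debug, PartialEq)]
struct PcProbe {
    pc: u32,
    /// Optional (dispatcher address, byte offset) to read on a hit.
    field: Option<(u32, u32)>,
}

/// Parse a hex u32 with an optional `0x` prefix.
fn parse_hex(s: &str) -> Option<u32> {
    let digits = s.strip_prefix("0x").unwrap_or(s);
    u32::from_str_radix(digits, 16).ok()
}

/// Accepts `PC` or `PC@DISPATCHER:OFFSET`; anything else is rejected.
fn parse_pc_probe(token: &str) -> Option<PcProbe> {
    match token.split_once('@') {
        None => Some(PcProbe { pc: parse_hex(token)?, field: None }),
        Some((pc, rest)) => {
            let (disp, off) = rest.split_once(':')?;
            Some(PcProbe {
                pc: parse_hex(pc)?,
                field: Some((parse_hex(disp)?, parse_hex(off)?)),
            })
        }
    }
}
```

On a hit with `field = Some((disp, off))`, the probe would log the
guest word at `disp + off`.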

Read-only diagnostic. Lockstep digest preserved (`run digest matches
golden` at -n 50M `--stable-digest`). 588 tests green.

Decisive findings (full deliverable in `audit-findings.md` /
`audit-runs/audit-005/`):

- Failure mode α confirmed for KRNBUG-AUDIT-004: all 9 producer call
  sites for handles 0x100c (5 sites) and 0x15e0 (4 sites) fire 0× at
  -n 500M. The producer code path is not reached.

- Set-diff of kernel-call sequences (canary.log oracle vs ours.log
  at -n 500M) identifies 11 exports canary calls and we don't:
  XGetAVPack, XeCryptSha, XeKeysConsolePrivateKeySign,
  ObCreateSymbolicLink, NtDeviceIoControlFile (×2),
  XamUserReadProfileSettings (×2), XamTaskSchedule, XamTaskCloseHandle,
  KeReleaseSemaphore (×268), KeResetEvent, ExTerminateThread (×2).

- XGetAVPack has exactly one caller (sub_824AB578 at 0x824AB5A0).
  The 4 instructions immediately preceding it are:
      addi r3, r0, 10            ; privilege bit 10
      bl   XexCheckExecutablePrivilege
      cmpli 0, r3, 0
      bc 12, eq, 0x824AB724      ; if r3==0, skip whole block

- exports.rs:193 registers XexCheckExecutablePrivilege as
  stub_return_zero. Always returning 0 -> guest takes the branch
  and skips the entire AV/crypto/save-data init block.

- The other call site (sub_824A9710 at 0x824A99A0) queries privilege
  11 with opposite polarity (bne) -> gates XamTaskSchedule on the
  privilege-NOT-set arm. With the stub returning 0 at both sites, the guest
  walks the wrong arm of every privilege-gated branch.

- This explains why the dispatcher fields read zero
  ([0x828F3D08+0x50]=0, [0x828F4070+0x24]=0 from AUDIT-004 dumps):
  the ctors run, but the producers that would populate those fields
  with a non-zero handle never execute.

Next session: replace XexCheckExecutablePrivilege stub with real
priv-bit lookup from XEX header. See audit-findings.md
KRNBUG-AUDIT-005 for the validation matrix.
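
For orientation, the intended replacement could look roughly like the sketch below. It assumes the privilege bits are available as one 32-bit mask with bit N = privilege N; the real XEX header layout is not described in this commit, and both function names are hypothetical:

```rust
/// Returns 1 when `privilege` is set in `privilege_mask`, else 0.
/// ASSUMPTION: privileges 0..=31 are packed into one 32-bit mask, bit N = privilege N.
pub fn check_executable_privilege(privilege_mask: u32, privilege: u32) -> u32 {
    if privilege >= 32 {
        return 0;
    }
    (privilege_mask >> privilege) & 1
}

/// Models the guard in front of XGetAVPack: `bc 12, eq` skips the block when r3 == 0,
/// so the AV/crypto/save-data init block only runs when privilege 10 is set.
pub fn av_init_block_runs(privilege_mask: u32) -> bool {
    check_executable_privilege(privilege_mask, 10) != 0
}
```

With a mask of 0 this behaves like the current `stub_return_zero`, which is why the guest skips the block today.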

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:06:22 +02:00
MechaCat02
6a070bedc6 Merge dispatcher-probe-audit/p0-ctor-probe-and-struct-dump (KRNBUG-AUDIT-004) 2026-05-04 17:09:52 +02:00
MechaCat02
7108d6d131 feat(kernel): KRNBUG-AUDIT-004 — --ctor-probe PC hook + --dump-addr struct dump
Diagnostic-only, read-only. Lockstep `instructions=100000002`
preserved bit-exact at -n 100M --stable-digest. 586 → 588 tests.

Adds two read-only diagnostics for the parked-waiter producer hunt:

  * `--ctor-probe=0x8217C850,0x...` — at every interpreter step,
    if `ctx.pc` is in the configured set, print one `CTOR-PROBE`
    line capturing live r3 (= `this` in MSVC PPC ctors), lr
    (= return site), sp, plus an 8-frame back-chain with
    saved-r31/r30 per frame. Fires once per hit, exactly what the
    8-instance-pool probe needed.

  * `--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0,...` — at end of
    run (after the FOCUS report in `dump_thread_diagnostic`), each
    address gets a 128-byte hex + be32 + ASCII dump. Used to
    inspect the static dispatcher / job-queue struct layouts
    AUDIT-003 identified.

Both gated default-off; empty set is a single `is_empty()` test on
the hot path. No guest state is mutated, so the
`sylpheed_n*m.json` lockstep digest is preserved.
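
The default-off gate can be pictured as follows. This is a sketch under assumptions: `ProbeConfig`, `CpuView`, `fire_if_match`, and the log line format are illustrative and not the actual `state.rs` code.

```rust
use std::collections::HashSet;

/// Hypothetical container for the configured probe PCs.
#[derive(Default)]
pub struct ProbeConfig {
    pub ctor_probe_pcs: HashSet<u32>,
}

/// Hypothetical read-only view of the registers the probe reports.
pub struct CpuView {
    pub pc: u32,
    pub r3: u32,
    pub lr: u32,
    pub sp: u32,
}

impl ProbeConfig {
    /// Returns a log line on a hit. Takes only shared references, so no guest
    /// state can be mutated from here.
    pub fn fire_if_match(&self, cpu: &CpuView) -> Option<String> {
        // Hot path when the flag is unset: one `is_empty()` test, no hashing.
        if self.ctor_probe_pcs.is_empty() {
            return None;
        }
        if !self.ctor_probe_pcs.contains(&cpu.pc) {
            return None;
        }
        Some(format!(
            "CTOR-PROBE pc={:#010x} r3={:#010x} lr={:#010x} sp={:#010x}",
            cpu.pc, cpu.r3, cpu.lr, cpu.sp
        ))
    }
}
```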

KRNBUG-AUDIT-004 findings (corrects KRNBUG-AUDIT-002/003):

1. **The "8-instance pool" hypothesis for handle 0x1004 is FALSE.**
   Probing the inner per-instance ctors `[0x821783D8, 0x82181750,
   0x821701C8]` at -n 50M shows each fires EXACTLY ONCE with
   r3 = `[0x828F3EC0, 0x828F3D08, 0x828F4070]` respectively. All
   three handles are Meyers-style singletons with one dispatcher
   each. The "called 8 times" claim came from miscounting raw
   entries to the OUTER getter sub_8217C850 — but that getter is
   itself a Meyers-singleton-getter; only the FIRST entry cascades
   through to bl 0x821783D8 (gated on `[0x828F48D8] bit 0`).

2. **The producer indirection layer is the singleton-getter
   itself.** Static byte-scan of .rdata / .data shows 0 hits for
   the dispatcher addresses — no static registry table holds them.
   But the xrefs table for the OUTER getters reveals 5–6 callers
   each, MOSTLY non-create-chain, sharing the canonical producer
   pattern: `bl outer_singleton_getter; lwz r3, OFFSET(r3); bl
   0x824AA1D8` (with OFFSET=80 for 0x100c, =36 for 0x15e0). So the
   AUDIT-003 xref audit was necessary but not sufficient — it
   correctly saw "no direct producer references" but missed the
   singleton-getter indirection layer.

3. **Dispatcher struct layouts** (128-byte dumps captured at -n
   50M --halt-on-deadlock):
     - 0x828F3D08 (handle 0x100c): event_handle at +0x4C (0x100c),
       thread_handle at +0x48 (0x1010), self-pointer at +0x74,
       capacity 7 at +0x28, queue empty (+0/+3C = -1).
     - 0x828F4070 (handle 0x15e0): event_handle at +0x20 (0x15e0),
       sibling-handle 0x15E4 at +0x1C, queue empty (+0x10 = -1).
     - 0x828F3EC0 (handle 0x1004): event_handle at +0x78 (0x1004),
       4 guest-heap sub-buffers at +0x20/+0x3C/+0x44/+0x50 in
       0x4xxxxxxx range — noticeably different layout from the
       other two pure POD job queues.

Files:
  crates/xenia-kernel/src/state.rs   ctor_probe_pcs / dump_addrs +
                                     fire_ctor_probe_if_match + 2 tests
  crates/xenia-app/src/main.rs       Exec --ctor-probe / --dump-addr
                                     CLI parsing, prologue hook,
                                     end-of-run struct dumper
  audit-findings.md                  KRNBUG-AUDIT-004 entry
  audit-runs/audit-004/              50M probe runs (v1 outer-getter
                                     hits, v2 inner-ctor hits proving
                                     the singleton hypothesis)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:09:47 +02:00
MechaCat02
48eed258f0 Merge xam-handle-stack-trace/p0-class-probe (KRNBUG-AUDIT-003)
vtable/RTTI class probe at handle creation + wait. Read-only
diagnostic; lockstep determinism preserved.

Tests 581 → 586 green. --stable-digest -n 100M instructions=100000002.

Identifies handle 0x100c dispatcher at 0x828F3D08 and handle 0x15e0
dispatcher at 0x828F4070 — both POD job queues, not C++ classes
(`[this+0]=-1` sentinel, no vtable). Decisive xref audit shows every
reference to either base is in a ctor or the CRT — NO producer code
exists in static analysis. Producer hunt deliverable: confirms
unreachable-producer, not broken-producer.

Master HEAD prior: 6440261.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 21:15:03 +02:00
MechaCat02
f84e947547 feat(kernel): KRNBUG-AUDIT-003 — vtable/RTTI class probe at handle creation + wait
Adds a read-only MSVC RTTI traversal helper (`read_class_at_this`)
and a `probe_create_stack_classes` integration that walks each
captured back-chain frame for handle creates in `--trace-handles-focus`
and probes each frame's most-likely `this` candidate (live r31/r30/r3
for frame 0; saved-r31/r30 from the prologue spill area at [fp-12]/
[fp-16] for deeper frames). False-positive guard rejects the CRT
static-init iterator pattern (vtable's first two slots must be image-
range function pointers — PPC instruction words like `mflr r12` are
not in 0x82xxxxxx).
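
The false-positive guard reduces to a range check on the first two candidate vtable slots. A sketch, with hypothetical function names and the image range taken from the `0x82xxxxxx` note above:

```rust
/// True when `addr` lies in the 0x82xxxxxx image range.
pub fn in_image_range(addr: u32) -> bool {
    (addr >> 24) == 0x82
}

/// A candidate vtable is plausible only if its first two slots both look like
/// image-range function pointers. Raw PPC instruction words fail this test.
pub fn plausible_vtable(slot0: u32, slot1: u32) -> bool {
    in_image_range(slot0) && in_image_range(slot1)
}
```

For example, `mflr r12` encodes as 0x7D8802A6, which is outside the image range, so a pointer into code rather than a vtable is rejected.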

`dump_thread_diagnostic` now takes `&GuestMemory` so the FOCUS report
prints, for each parked waiter, a WAIT-THREAD block with full back-
chain frames and per-slot saved-register dump for offline lookup.

End-to-end finding (-n 500M producer-trace):
  * Handle 0x100c dispatcher = 0x828F3D08 (image rdata; verified by
    sub_82181750 disasm + xref table). [this+0] = -1 sentinel — POD
    job queue, NOT a C++ polymorphic class.
  * Handle 0x15e0 dispatcher = 0x828F4070 (same shape).
  * Handle 0x1004's 8-instance pool members still TBD (MSVC ctors
    didn't preserve `this` in r31).
  * 0x42450b5c is a separate audit class (heap-allocated, parks via
    non-`do_wait_single` path).

Decisive xref audit: every reference to 0x828F3D08 / 0x828F4070 in
the static analysis is in a ctor or the CRT init driver. NO producer
code references either dispatcher base. Confirms `signal_attempts=0`
is unreachable-producer, not broken-producer.

Tests: 581 → 586 green (+5: RTTI-intact / RTTI-stripped / non-object
/ cstring / probe_create_stack integration). `--stable-digest -n
100M` instructions=100000002 unchanged. Master HEAD prior: 6440261.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 21:14:56 +02:00
MechaCat02
6440261e2e Merge xam-handle-stack-trace/p0-multiframe-walker (KRNBUG-AUDIT-002)
Multi-frame back-chain capture at NtCreateEvent / NtCreateSemaphore /
NtCreateTimer / XamTaskSchedule, gated on --trace-handles-focus. Read-
only diagnostic; lockstep determinism unaffected.

Tests 576 → 581 green. --stable-digest -n 100M instructions=100000002.

Identifies: 0x1004 = 8-instance pool via static ctor at 0x8280F810;
0x100c = singleton inside main(); 0x15e0 = singleton in distinct
cluster. All three are silph-framework dispatchers; producer hunt
continues with vtable/RTTI readout next session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 20:41:12 +02:00
MechaCat02
2a9fd1fc86 feat(kernel): KRNBUG-AUDIT-002 — multi-frame guest stack capture at handle creation
Adds `walk_guest_back_chain` (PPC EABI back-chain walker) and a
`record_create_with_stack` audit hook gated on `--trace-handles-focus`.
NtCreateEvent / NtCreateSemaphore / NtCreateTimer / XamTaskSchedule now
route through the new helper so focused handles capture up to 6 stack
frames at allocation time. Diagnostic-only, read-only memory access:
unfocused handles pay one HashSet lookup, focused ones pay six
back-chain dereferences. Lockstep determinism preserved.
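
A back-chain walk of this kind can be sketched as below. The signature is hypothetical (the real `walk_guest_back_chain` is not shown in this log), guest memory is abstracted as a read-only closure, and the stop conditions are assumptions; reading the saved LR of each frame is left out.

```rust
/// Follows the back-chain word stored at `[sp]` for up to `max_frames` frames and
/// returns the caller stack pointers found. `read_be32` returns `None` for
/// unmapped addresses.
pub fn walk_back_chain(
    read_be32: impl Fn(u32) -> Option<u32>,
    sp: u32,
    max_frames: usize,
) -> Vec<u32> {
    let mut frames = Vec::new();
    let mut cur = sp;
    while frames.len() < max_frames {
        let Some(next) = read_be32(cur) else { break };
        // The stack grows down, so each caller frame must sit at a higher
        // address. A null or non-increasing link ends the walk.
        if next == 0 || next <= cur {
            break;
        }
        frames.push(next);
        cur = next;
    }
    frames
}
```

Because the walker only reads through the closure, it cannot perturb guest state, which matches the read-only requirement stated above.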

End-to-end finding: handles 0x1004 (8-instance pool via static ctor at
0x8280F810), 0x100c (singleton built inside main()), 0x15e0 (singleton
in distinct cluster) are silph-framework dispatcher objects whose
producer code is unreached at -n 500M. The producer hunt now has class
ownership; vtable/RTTI readout is the next step.

Tests: 576 → 581 green. `--stable-digest -n 100M` instructions=100000002
unchanged. Master HEAD prior: 9d45efe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 20:41:06 +02:00
MechaCat02
9d45efe5d5 Merge xaudio-register-driver/p0-real-callback-loop (APUBUG-PRODUCER-001)
Adds canary-faithful XAudioRegisterRenderDriverClient + Unregister + Submit
implementations and a default-off audio buffer-complete callback ticker
(`--xaudio-tick` / `XENIA_XAUDIO_TICK=1`).

Producer hypothesis FALSIFIED for handles 0x1004/0x100c/0x15e4 — all three
still show signal_attempts=0 at -n 500M with the ticker enabled.

Tests: 562 → 576 green. Lockstep goldens preserved at default settings
(instructions=100000002, swaps=2 unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:50:29 +02:00
MechaCat02
07068e7616 feat(audio): APUBUG-PRODUCER-001 — XAudio register driver client + opt-in callback ticker
Replace the three XAudio kernel-export stubs (Register/Unregister/SubmitFrame)
with canary-faithful implementations and add a periodic buffer-complete
callback ticker reusing the existing SavedCallbackCtx injection machinery.

Canary parity:
- xboxkrnl_audio.cc:56-93 — read callback_ptr[0..1], wrap callback_arg in a
  4-byte big-endian guest heap buffer (`wrapped_callback_arg`), write
  `0x4155_xxxx` to *driver_ptr.
- audio_system.cc:139-141 — guest callback receives r3 = wrapped pointer,
  not raw callback_arg.
- audio_driver.h:21-24 — frame rate 256 samples / 48 kHz ≈ 5.33 ms.

Implementation:
- New `crates/xenia-kernel/src/xaudio.rs` — `XAudioClient`, `XAudioState`
  (8-slot table, pending FIFO, dual-mode ticker), `XAUDIO_INSTR_PERIOD =
  48_000` (lockstep) and `XAUDIO_PERIOD = 5.333 ms` (--parallel), same
  pattern as KRNBUG-D08 v-sync.
- `try_inject_audio_callback` in xenia-app mirrors `try_inject_graphics_interrupt`,
  shares `interrupts.saved` slot for mutex with graphics callbacks.

Gating: ticker + injector run only when `--xaudio-tick` /
`XENIA_XAUDIO_TICK=1`. Default off because Sylpheed's audio callback
enters an infinite `KeWaitForSingleObject` loop on first invocation
(canary's host worker thread provides the buffer-completion fence we
don't model), which hijacks a guest HW thread and regresses
`swaps=2 → 1`. Default-off preserves the lockstep `sylpheed_n*m.json`
goldens exactly.

Producer hunt outcome (FALSIFIED for parked handles 0x1004/0x100c/0x15e4):
at `-n 500M --xaudio-tick` all 3 handles still show
`signal_attempts=0 (primary=0, ghost=0)`. Audio callback is not the
missing producer. Next candidate per audit-findings.md is Timer DPC
delivery (KeSetTimer / KeInsertQueueDpc).

Tests: 562 → 576 green (10 in `xaudio.rs`, 4 in `exports.rs`).
Lockstep `--stable-digest -n 100M` default-off: instructions=100000002,
swaps=2 (matches pre-change baseline byte-for-byte).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:50:22 +02:00
MechaCat02
38f78c88a8 Merge xam-task-schedule-producer/p0-spawn-real-thread (XAMBUG-PRODUCER-001) 2026-05-03 18:32:44 +02:00
MechaCat02
691404e36e fix(xam): XAMBUG-PRODUCER-001 — XamTaskSchedule spawns a real guest thread
Replaces the no-op stub at xam.rs:204 with a canary-faithful
implementation mirroring xenia-canary/src/xenia/kernel/xam/xam_task.cc:43-80.
Allocates a ThreadImage, allocates a KernelObject::Thread handle, and
routes through Scheduler::spawn with entry=callback and
start_context=message_ptr (canary's third positional XThread ctor arg).
Stack size = max(0x4000, page-aligned 0x10_0000).

Producer-hypothesis outcome (500M --trace-handles-focus run): the call
site at 0x824a9a10 is never reached during this boot horizon, so
XamTaskSchedule cannot be the missing producer for the 3 parked
Event/Manual handles (0x1004, 0x100c, 0x15e4). The fix still lands —
the stub was a real correctness bug that would manifest the moment
the boot advances past the current deadlock. Next candidate per
audit-findings.md: XAudioRegisterRenderDriverClient.

- Workspace tests: 561 → 562 green (new test
  xam::tests::xam_task_schedule_spawns_real_thread).
- --stable-digest -n 100M: instructions=100000002 unchanged from
  baseline; lockstep determinism preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:32:40 +02:00
MechaCat02
b54aa48d10 Merge audit-2026-05-fix/p2-session-closeout 2026-05-03 17:35:37 +02:00
MechaCat02
eb71fe8daf docs(audit): close out follow-up session 2026-05-03
3 IDs landed: GPUBUG-DRAIN-001, KRNBUG-AUDIT-001, KRNBUG-D08.
Tests 556 → 561. Lockstep digest BIT-IDENTICAL on stable fields.
draws=0 persists; parked-waiter producer-trace confirms hypothesis
(A) for 3 of 4 handles — guest code never calls Nt/KeSetEvent on
0x1004 / 0x100c / 0x15e4 — so the renderer plateau is a missing
kernel signal source, NOT a wake-eligibility bug or BST-paradox.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:35:37 +02:00
MechaCat02
866855000c Merge audit-2026-05-fix/p2-vsync-wallclock (KRNBUG-D08) 2026-05-03 17:34:30 +02:00
MechaCat02
27d3608174 fix(kernel): KRNBUG-D08 — wall-clock v-sync under --parallel
The synthetic v-sync ticker used a per-instruction proxy
(VSYNC_INSTR_PERIOD = 150 k) tuned for ~10 MIPS lockstep
throughput → 60 Hz. Audit M11 observed this drifts under
`--parallel`: with 6 worker threads sharing the kernel mutex,
the dispatcher executes more PPC instructions per tick
callback, so the accumulator never crosses 150 k. Result:
~629 v-syncs/100M lockstep → ~2 v-syncs/100M --parallel.

Hybrid solution preserves lockstep determinism (which the
goldens depend on) while fixing --parallel:

* `tick_vsync_instr(instr_count)` — legacy instruction-count
  ticker, used by lockstep. Bit-stable across runs.

* `tick_vsync_wallclock()` — new Instant-based ticker. Fires
  `floor(elapsed / VSYNC_PERIOD)` v-syncs since the anchor
  and advances the anchor by that many full periods (no
  lazy backlog). Capped at INTERRUPT_QUEUE_CAP per call so a
  forward-jumping clock can't overflow the FIFO.

* `KernelState.parallel_active` flag set at startup from
  `--parallel` / `XENIA_PARALLEL=1`. Read by `coord_pre_round`
  in main.rs to choose between the two tickers.
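
The wall-clock ticker described above might look like this sketch. Assumptions: `WallclockVsync` is a hypothetical type, `VSYNC_PERIOD` is approximated as 16.667 ms, the cap value 4 is taken from the verification notes below, and the anchor advances by the uncapped count so that excess periods are dropped rather than queued.

```rust
use std::time::{Duration, Instant};

/// ASSUMPTION: roughly 60 Hz; the real constant is not shown in this commit.
pub const VSYNC_PERIOD: Duration = Duration::from_micros(16_667);
pub const INTERRUPT_QUEUE_CAP: u32 = 4;

pub struct WallclockVsync {
    anchor: Instant,
}

impl WallclockVsync {
    pub fn new(now: Instant) -> Self {
        Self { anchor: now }
    }

    /// Returns how many v-syncs to deliver for this call.
    pub fn tick(&mut self, now: Instant) -> u32 {
        let elapsed = now.saturating_duration_since(self.anchor);
        // floor(elapsed / VSYNC_PERIOD), clamped so the cast cannot truncate.
        let due = (elapsed.as_nanos() / VSYNC_PERIOD.as_nanos()).min(u32::MAX as u128) as u32;
        // Advance by whole periods only, so the sub-period remainder carries over.
        self.anchor += VSYNC_PERIOD * due;
        // Cap burst delivery so a forward-jumping clock cannot overflow the FIFO.
        due.min(INTERRUPT_QUEUE_CAP)
    }
}
```

Passing `now` in as a parameter keeps the ticker testable without sleeping.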

Verification:

* cargo test --workspace --release: 561 passing (+3 new
  wall-clock tests vs prior 558 baseline).
* lockstep -n 100M --stable-digest: BIT-IDENTICAL to
  pre-Phase-3 baseline. interrupts_delivered preserved at
  ~630 (was ~629 pre-fix).
* --parallel --reservations-table -n 30M: interrupts_delivered
  rose from ~2 to 17. (FIFO INTERRUPT_QUEUE_CAP=4 still caps
  burst delivery; that's a separate bottleneck — addressed
  by raising cap when --parallel queue depth becomes the
  next blocker.)

Trade-off: --parallel runs are non-deterministic at the
v-sync rate by design (per audit M05 PPCBUG-703 already).
Lockstep stays bit-identical, so the `sylpheed_n*m.json`
goldens are untouched.

Audit IDs: KRNBUG-D08 (closed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:34:30 +02:00
MechaCat02
b82919bdd0 Merge audit-2026-05-fix/p2-parked-waiter-trace (KRNBUG-AUDIT-001) 2026-05-03 17:22:14 +02:00
MechaCat02
d1105aafae diag(audit): KRNBUG-AUDIT-001 — focused parked-waiter ghost-trail diagnostic
Adds a one-run diagnostic that distinguishes "guest never called
Nt/KeSetEvent on this handle" from "signal landed but waiter wasn't
woken", for any handle named via `--trace-handles-focus`.

Parked-waiter context (project_xenia_rs_sylpheed_stage3_2026_04_29):
four worker threads block Sylpheed past `draws=0` on handles
0x1004 / 0x100c / 0x15e4 / 0x42450b5c (mr=true, sig=false). The
pre-existing audit dropped signal-attempts that targeted handles
without a primary trail, so we couldn't tell whether the producer
was unreachable in the guest or whether the signal landed but missed
its waiter.

Three changes:

* audit.rs: `HandleAudit` gains `focus: HashSet<u32>` and
  `ghost_trails: HashMap<u32, GhostTrail>`. `record_signal`
  auto-falls-through to a new `record_signal_attempt_ghost` when no
  primary trail exists AND the handle is in `focus`. Bounded by
  AUDIT_RING_CAPACITY per handle. Two new tests cover the focus
  ghost-trail and no-double-record invariants.

* main.rs: new `--trace-handles-focus=<LIST>` flag (hex 0x or decimal,
  comma-separated) populates `kernel.audit.focus`. Implies
  `--trace-handles`. New "=== Handle audit (focus) ===" section in
  `dump_thread_diagnostic` emits per-handle:
    - signal_attempts (primary + ghost), waits, wakes
    - merged cycle-sorted timeline (last 16)
    - GuestExport / KernelInternal classification
    - <AUDIT_BLIND> marker when waiter_count > 0 but the audit
      saw no waits (i.e. waiter parked via a non-audit path —
      CS / spinlock / DPC).
    - DIAGNOSIS conclusion that selects between five branches.

* `cmd_check` passes None for focus → goldens unaffected.
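
The fall-through in `record_signal` can be sketched as follows. This is a simplified model, not the real `audit.rs`: trails are reduced to rings of cycle stamps, `record_create` and `signal_attempts` are hypothetical helpers, and the capacity value is an assumption.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// ASSUMPTION: the real capacity is not stated in the commit.
pub const AUDIT_RING_CAPACITY: usize = 16;

#[derive(Default)]
pub struct HandleAudit {
    pub focus: HashSet<u32>,
    /// Signal cycles for handles whose creation was observed.
    primary: HashMap<u32, VecDeque<u64>>,
    /// Signal cycles for focused handles that have no primary trail.
    ghost_trails: HashMap<u32, VecDeque<u64>>,
}

fn push_bounded(ring: &mut VecDeque<u64>, cycle: u64) {
    if ring.len() == AUDIT_RING_CAPACITY {
        ring.pop_front();
    }
    ring.push_back(cycle);
}

impl HandleAudit {
    pub fn record_create(&mut self, handle: u32) {
        self.primary.entry(handle).or_default();
    }

    /// Records into the primary trail if one exists; otherwise into a ghost
    /// trail, but only for focused handles. Never records in both.
    pub fn record_signal(&mut self, handle: u32, cycle: u64) {
        if let Some(trail) = self.primary.get_mut(&handle) {
            push_bounded(trail, cycle);
        } else if self.focus.contains(&handle) {
            push_bounded(self.ghost_trails.entry(handle).or_default(), cycle);
        }
    }

    /// Returns `(primary, ghost)` attempt counts still held in the rings.
    pub fn signal_attempts(&self, handle: u32) -> (usize, usize) {
        (
            self.primary.get(&handle).map_or(0, |t| t.len()),
            self.ghost_trails.get(&handle).map_or(0, |t| t.len()),
        )
    }
}
```

Under this model, `(0, 0)` for a focused handle means no signal attempt reached the audit at all, which is the case the diagnostic is meant to isolate.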

Empirical run output at -n 500M lockstep with
`--trace-handles-focus=0x1004,0x100c,0x15e4,0x42450b5c`:

  handle=0x00001004 kind=Event/Manual waiters=1 signaled=false
                    signal_attempts=0 (primary=0, ghost=0)
                    waits=1 wakes=0
     created cycle=0 tid=1 lr=0x824a9f6c src=NtCreateEvent
     => producer is a missing kernel signal source
        (or BST-paradox upstream)
  ... (same shape for 0x100c, 0x15e4)
  handle=0x42450b5c kind=<UNCREATED> waiters=1 signal_attempts=0
                    waits=0 wakes=0 <AUDIT_BLIND>
     => waiter parked via non-audited path

Conclusion: hypothesis (A) confirmed for the 3 Event/Manual handles;
0x42450b5c parks via a non-audited path (<AUDIT_BLIND>). Producer is
NOT a wake/eligibility bug — it is a genuinely missing kernel signal
source. The 3 Event/Manual handles share a creator
(lr=0x824a9f6c, tid=1) and the same wait-call wrapper at
lr=0x824ac578 — these are 3 worker threads all parked on
"work-available" notifications that never come.

Verification:
* cargo test --workspace --release: 558 passing (+2 new ghost-trail
  tests vs prior 556 baseline)
* lockstep -n 100M --stable-digest: bit-identical to master HEAD

Audit IDs: KRNBUG-AUDIT-001 (closed — diagnostic instrumentation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:22:14 +02:00
MechaCat02
0e95e38813 Merge audit-2026-05-fix/p2-vdswap-parallel-fallback (GPUBUG-DRAIN-001) 2026-05-03 17:12:19 +02:00
MechaCat02
7a1b6b3306 fix(gpu): GPUBUG-DRAIN-001 — silence VdSwap PM4 fallback under --parallel
The Phase-C VdSwap PM4 ring path (commit 82f3d61) emits two
"PM4_XE_SWAP not consumed by drain" warnings when running:

  exec sylpheed.iso --ui --quiet --halt-on-deadlock \
    --parallel --reservations-table

Lockstep -n 100M never trips it. Two distinct race windows:

(a) Inline backend (--ui forces it): drain(mem, 4096) hit its
    fixed packet cap before reaching the PM4_XE_SWAP we'd just
    injected at the WPTR tail. With 6 CPU threads, the ring
    accumulates >4096 packets between vd_swap callbacks.

(b) Threaded backend (--parallel without --ui): the worker's
    DrainFence handler has a 900 ms deadline and game-batched
    IBs (8-10 M packets observed) keep it from reaching the
    tail in any reasonable budget. If the worker eventually
    drained past the injected packet later, the safety-net
    direct notify would double-count.

Three changes:

* gpu_system.rs: new `drain_until_wptr(target, time_budget)`
  draining by the canary `WorkerThreadMain` predicate
  (read_offset != target) instead of a fixed packet count.
  900 ms deadline mirrors the threaded DrainFence handler.

* handle.rs: inline `drain_to_current_wptr` switches to
  `drain_until_wptr`. DrainFence handler publishes the digest
  mirror BEFORE replying so the CPU's post-drain
  `digest_snapshot` sees fresh stats.

* exports.rs (vd_swap): skip the PM4 ring injection
  unconditionally and route swap notification through
  `notify_xe_swap` directly. Tail-injection is unreliable
  under --parallel for both backends. The slot-0
  fetch-constant patch is deferred (GPUBUG-FETCH-PATCH-001);
  draws=0 today so a stale slot 0 has no observable effect.
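
The predicate-based drain could be sketched as below. `Ring` and `consume_one` are hypothetical stand-ins (real PM4 packets vary in length), and the boolean return value is an assumption.

```rust
use std::time::{Duration, Instant};

/// Hypothetical ring model: offsets wrap at `size`.
pub struct Ring {
    pub read_offset: u32,
    pub size: u32,
    pub packets_drained: u64,
}

impl Ring {
    /// Stand-in for executing one packet; here every packet is one unit long.
    fn consume_one(&mut self) {
        self.read_offset = (self.read_offset + 1) % self.size;
        self.packets_drained += 1;
    }

    /// Drains until `read_offset == target` or the time budget runs out.
    /// Returns true if the target was reached. `target` must be below `size`.
    pub fn drain_until_wptr(&mut self, target: u32, time_budget: Duration) -> bool {
        let deadline = Instant::now() + time_budget;
        while self.read_offset != target {
            if Instant::now() >= deadline {
                return false;
            }
            self.consume_one();
        }
        true
    }
}
```

Unlike a fixed packet cap, the loop condition is the same `read_offset != target` predicate the commit attributes to canary's `WorkerThreadMain`, so a large backlog no longer ends the drain early.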

Verification:

* cargo test --workspace --release: 556 passing (unchanged).

* Lockstep -n 100M --stable-digest: bit-identical to
  pre-fix master HEAD aa3f1d3.
  {instructions:100000002, imports:987685, unimpl:0, draws:0,
   swaps:2, ...}

* check --parallel --reservations-table -n 30M: 0 warnings
  (was 2). swaps=2.

* exec --gpu-inline --parallel --reservations-table -n 30M:
  0 warnings (was 2 with drained=8M-10M observed). swaps=2.

Audit IDs: GPUBUG-DRAIN-001 (closed),
GPUBUG-FETCH-PATCH-001 (filed, deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:12:15 +02:00
MechaCat02
aa3f1d344f Merge audit-2026-05-fix/tracker-close-out: fix-session 2026-05-03 close-out 2026-05-03 14:35:08 +02:00
MechaCat02
c7fccccbc6 docs(audit): close out fix session 2026-05-03 — 12 IDs applied
Records the outcome of the audit-2026-05 fix sprint into the master
tracker. Documents:
- 12 closed IDs (10 P0 + 2 P1) with their commit SHAs and verification deltas
- 4 deferred IDs (XAMBUG-001, XAMBUG-002, KRNBUG-D08/XMODBUG-011,
  PPCBUG-720/721/722) with explicit reasons
- Sprint acceptance criteria status: A-E lands cleanly with swaps=2,
  but draws=0 persists (renderer plateau is multi-causal as the audit
  predicted; parked-waiter handles unresolved)
- Recommended next session: trace producers for the 4 parked-waiter
  handles directly

Closed IDs:
  SWAPBUG-001 / PPCBUG-001 (P0)              → 9ab986e
  ORACBUG-004 (P0; partial ORACBUG-006)       → 1f416aa
  KRNBUG-Vd-04, GPUBUG-001, XMODBUG-013 (3× P0) → 82f3d61
  GPUBUG-101 (P0)                              → 78ea81c
  GPUBUG-100 (P0; abs deferred)                → c5c6713
  GPUBUG-102 (P0)                              → ec2d955
  GPUBUG-103/104/105 (3× P0)                   → 8723d68
  KRNBUG-017 (P0-under-parallel)               → e7d0fcf
  GPUBUG-006 (P1)                              → 8fc1b1d
  XMODBUG-002 (P1)                             → 780e854

Test count at sprint close: 556 (+5 from 551 baseline).
Workspace clean; no dangling branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:35:08 +02:00
MechaCat02
6f851a2083 Merge audit-2026-05-fix/p1-xmodbug-002-write-bulk 2026-05-03 14:30:22 +02:00
MechaCat02
780e854c2f fix(memory): XMODBUG-002 — write_bulk bumps page_versions for touched pages
`GuestMemory::write_bulk` did the bulk copy via raw `copy_nonoverlapping`
without bumping page_versions for any of the pages it touched. The
per-byte `write_u8/u16/u32` methods all bump page_versions after their
store; downstream caches (texture cache, shader cache) Acquire-load the
slot to invalidate stale entries on guest writes. Without the bulk
bump, a caller like `NtReadFile` writing a texture/shader resource into
guest memory would leave any cache that had already keyed on the prior
version handing back stale decoded bytes.

After the copy, walk every page the write touched and bump it. Cheap:
the typical bulk write spans a few pages (NtReadFile uses 64-128 KB
chunks → 16-32 pages).
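
A sketch of the per-page bump, assuming 4 KiB pages (consistent with 64 KB mapping to 16 pages) and a simplified `GuestMemory` backed by a `Vec`. The real type uses a raw copy and may differ in field names and memory orderings.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// ASSUMPTION: 4 KiB pages.
pub const PAGE_SHIFT: u32 = 12;

pub struct GuestMemory {
    bytes: Vec<u8>,
    page_versions: Vec<AtomicU32>,
}

impl GuestMemory {
    pub fn new(size: usize) -> Self {
        let pages = (size + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT;
        Self {
            bytes: vec![0; size],
            page_versions: (0..pages).map(|_| AtomicU32::new(0)).collect(),
        }
    }

    /// Copies `data` to `addr`, then bumps the version of every touched page.
    pub fn write_bulk(&mut self, addr: usize, data: &[u8]) {
        if data.is_empty() {
            return;
        }
        self.bytes[addr..addr + data.len()].copy_from_slice(data);
        let first = addr >> PAGE_SHIFT;
        let last = (addr + data.len() - 1) >> PAGE_SHIFT;
        for page in first..=last {
            // Release so a cache's Acquire-load of the slot observes the new bytes.
            self.page_versions[page].fetch_add(1, Ordering::Release);
        }
    }

    pub fn page_version(&self, page: usize) -> u32 {
        self.page_versions[page].load(Ordering::Acquire)
    }
}
```

Computing `last` from the final written byte (not one past it) avoids bumping an untouched page when the write ends exactly on a page boundary.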

Reservation-table invalidation for `lwarx`/`stwcx.` (XMODBUG-001's
sibling) is NOT addressed here — the reservation table lives on
KernelState, not GuestMemory, and plumbing it through requires a wider
change. Callers that bulk-write code-bearing or atomic-bearing memory
should call `kernel.reservations.invalidate_for_write(addr)`
themselves; XEX-loader and NtReadFile are doing data-bearing writes
that don't intersect lwarx targets, so this is acceptable for now.

Verification at -n 100M lockstep:
  swaps:                2 → 2     (unchanged)
  draws:                0 → 0
  texture_cache_entries: 0 → 0    (Sylpheed hasn't issued IM_LOAD yet
                                   — the bump is silent until a cache
                                   keys on a touched page, which won't
                                   happen until Phase F2/F3 unblocks
                                   the resource-loader workers)
  packets:              ~59M (within noise)
Tests: 16 memory pass.

Closes XMODBUG-002 (P1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:30:22 +02:00
MechaCat02
104078dc29 Merge audit-2026-05-fix/p1-gpubug-006-mmio-ordering 2026-05-03 14:26:09 +02:00
MechaCat02
8fc1b1dfed fix(gpu): GPUBUG-006 — sync_with_mmio Acquire/Release pair the producer
The producer side (`mmio_region.rs:78`, the guest's CP_RB_WPTR MMIO
write callback) uses `Ordering::Release` so any ring-memory writes
the guest performed before bumping WPTR are visible to a paired
`Acquire`-load on the consumer. The consumer here at `sync_with_mmio`
was using `Ordering::Relaxed` for both the WPTR load and the RPTR
mirror store — leaving the Release/Acquire pairing broken.

Under `--parallel`, this broken pairing means the GPU worker can
observe a fresh WPTR value while still reading stale ring-memory
contents at the corresponding offsets — garbage PM4 packets. The
audit's M11 grid run confirmed --parallel is non-deterministic
beyond the documented `packets` ±5% noise; this fix is one strand
of that.

Symmetric fix on the RPTR mirror store: Release pairs with any
guest-side Acquire-load of CP_RB_RPTR for ring-writeback
bookkeeping.
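
The pairing can be illustrated with a reduced model. `RingMmio` and `publish_wptr` are hypothetical names; only the orderings mirror what the commit describes.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

#[derive(Default)]
pub struct RingMmio {
    pub wptr: AtomicU32,
    pub rptr_mirror: AtomicU32,
}

impl RingMmio {
    /// Producer side (guest CP_RB_WPTR write callback): Release publishes all
    /// ring-memory writes made before the pointer bump.
    pub fn publish_wptr(&self, value: u32) {
        self.wptr.store(value, Ordering::Release);
    }

    /// Consumer side: the Acquire load pairs with the Release store above, so
    /// ring contents up to the returned WPTR are visible to this thread.
    pub fn sync_with_mmio(&self, read_offset: u32) -> u32 {
        let wptr = self.wptr.load(Ordering::Acquire);
        // Release on the mirror pairs with any Acquire-load of the RPTR value.
        self.rptr_mirror.store(read_offset, Ordering::Release);
        wptr
    }
}
```

With `Relaxed` on the consumer load, the value of WPTR would still arrive, but nothing would order the ring reads after it.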

Verification at -n 100M lockstep:
  swaps:                2 → 2     (unchanged)
  draws:                0 → 0     (unchanged)
  packets:              ~60M (within noise)
Tests: 149 (no count change; this is a memory-ordering correctness
fix, not a behavioral change visible at the digest level in
lockstep).

Closes GPUBUG-006 (P1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:26:09 +02:00
MechaCat02
fceaa81f46 Merge audit-2026-05-fix/kernel-p0-spinlock-xam: KRNBUG-017 Kf-spinlock 2026-05-03 14:25:00 +02:00
MechaCat02
e7d0fcf2c9 fix(kernel): KRNBUG-017 — real Kf*SpinLock + KeReleaseSpinLockFromRaisedIrql
The Kf-family spinlock exports were registered as stubs:
  KfAcquireSpinLock              → stub_return_zero (didn't write lock)
  KfReleaseSpinLock              → stub_success     (didn't clear lock)
  KeReleaseSpinLockFromRaisedIrql → stub_success     (same)
  KeTryToAcquireSpinLockAtRaisedIrql → returned 1 but didn't set lock value

Guest code that read the lock value back (e.g. nested
acquire/release sanity checks, debug assertions) saw 0 even after
"acquiring", and could enter critical regions without contention
serialization. Under `--parallel` the coarse Arc<Mutex<KernelState>>
already serializes us, so the audit's P0-under-parallel ranking is
about correctness of the lock value visible to guest code, not
mutual-exclusion (which is provided by the host mutex).

Implementation mirrors canary's
`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc`:
  - KfAcquireSpinLock:           write 1 to *SpinLock, return 0 (old IRQL)
  - KfReleaseSpinLock:           write 0 to *SpinLock
  - KeReleaseSpinLockFromRaisedIrql: write 0 to *SpinLock
  - KeTryToAcquireSpinLockAtRaisedIrql: write 1 to *SpinLock, return 1

Single-threaded HLE: contention can never be observed (we never run
two guest threads simultaneously without holding the kernel mutex),
so the spin-loop can degenerate to an unconditional acquire.
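
The write semantics listed above, sketched against a plain byte slice standing in for guest memory. The function signatures are hypothetical; the real exports take their arguments from guest registers.

```rust
/// Writes a big-endian u32 lock word at `lock_addr`.
fn write_lock(mem: &mut [u8], lock_addr: usize, value: u32) {
    mem[lock_addr..lock_addr + 4].copy_from_slice(&value.to_be_bytes());
}

/// KfAcquireSpinLock: mark the lock held and return the old IRQL (0).
pub fn kf_acquire_spin_lock(mem: &mut [u8], lock_addr: usize) -> u32 {
    write_lock(mem, lock_addr, 1);
    0
}

/// KfReleaseSpinLock and KeReleaseSpinLockFromRaisedIrql: clear the lock word.
pub fn kf_release_spin_lock(mem: &mut [u8], lock_addr: usize) {
    write_lock(mem, lock_addr, 0);
}

/// KeTryToAcquireSpinLockAtRaisedIrql: unconditional acquire, reports success (1).
pub fn ke_try_to_acquire_spin_lock(mem: &mut [u8], lock_addr: usize) -> u32 {
    write_lock(mem, lock_addr, 1);
    1
}
```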

Verification at -n 100M lockstep:
  swaps:                2 → 2     (unchanged)
  draws:                0 → 0     (gated by F2/F3/G)
  packets:              ~59M (within noise)
Tests: 76 kernel pass (no count change; existing harness covers the
new write semantics implicitly via guest-memory smoke tests).

Closes KRNBUG-017 (P0 under --parallel).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:24:47 +02:00
MechaCat02
537d789deb Merge audit-2026-05-fix/drawstate-p0-register-addresses: GPUBUG-103/104/105 2026-05-03 14:22:09 +02:00
MechaCat02
8723d6826b fix(gpu): GPUBUG-103/104/105 — fix 8 draw-state register addresses + index_size bit
Eight of the register-index constants in draw_state.rs::reg pointed at
completely unrelated registers because the canonical canary table
(register_table.inc) was misread when the module was first authored.
Re-validated each value against canary's lines 1232-1336.

| Register                | Pre-fix | Canary | Was-actually        |
| ----------------------- | ------- | ------ | ------------------- |
| VGT_DRAW_INITIATOR      | 0x2281  | 0x21FC | (junk)              |
| VGT_DMA_BASE            | 0x2282  | 0x21FA | (junk)              |
| VGT_DMA_SIZE            | 0x2283  | 0x21FB | (junk)              |
| PA_SC_WINDOW_SCISSOR_TL | 0x200E  | 0x2081 | SCREEN_SCIS_TL      |
| PA_SC_WINDOW_SCISSOR_BR | 0x200F  | 0x2082 | SCREEN_SCIS_BR      |
| RB_COLOR_INFO_1         | 0x2010  | 0x2003 | COHER_DEST_BASE_10  |
| RB_COLOR_INFO_2         | 0x2011  | 0x2004 | COHER_DEST_BASE_11  |
| RB_COLOR_INFO_3         | 0x2012  | 0x2005 | COHER_DEST_BASE_12  |
| PA_SU_VTX_CNTL          | 0x2083  | 0x2302 | PA_SC_CLIPRECT_RULE |

Also corrected the `index_size` bit position in VGT_DRAW_INITIATOR
extraction: was bit 8 (which is `major_mode[0]`), should be bit 11 per
canary `registers.h:324` (`xenos::IndexFormat index_size : 1; // +11`).
The block comment in `extract()` was also wrong about the
intermediate field layout and has been refreshed.
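
The corrected extraction is a single-bit field read. A sketch with hypothetical names; only the bit positions come from the text above.

```rust
/// Bit position of `index_size` in VGT_DRAW_INITIATOR (the `// +11` in the quote above).
pub const INDEX_SIZE_BIT: u32 = 11;

/// Extracts the 1-bit `index_size` field.
pub fn index_size(vgt_draw_initiator: u32) -> u32 {
    (vgt_draw_initiator >> INDEX_SIZE_BIT) & 1
}

/// The pre-fix read, kept only to show the bug: bit 8 is `major_mode[0]`.
pub fn index_size_prefix_bug(vgt_draw_initiator: u32) -> u32 {
    (vgt_draw_initiator >> 8) & 1
}
```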

Verification at -n 100M lockstep:
  swaps:                2 → 2     (unchanged)
  draws:                0 → 0     (still gated — see below)
  packets:              ~61M (within noise)
Tests: 149 (no count change; existing draw_state tests cover the
new constants implicitly via behavioral round-trip).

The audit predicted Phases C+D+E together would unlock `draws > 0`,
but the runtime plateau is multi-causal per the audit's own analysis
(`project_xenia_rs_audit_2026_05_02.md`). The likely remaining
blockers in -n 100M:
  * 4 parked-waiter worker threads (handles 0x1004, 0x100c, 0x15e4,
    0x42450b5c) — Phase F's XAM/spinlock fixes target this.
  * shader_blobs_live=0 after 100M — the game hasn't issued IM_LOAD
    yet because workers haven't loaded shader resources.
The register fixes here are still load-bearing for any draw that
DOES happen (every register read at 0x2281 was junk before this
commit) — landing them now is correct even if draws=0 persists until
Phase F unparks the resource-loader threads.

Closes GPUBUG-103, GPUBUG-104, GPUBUG-105 (P0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:22:04 +02:00
MechaCat02
a07784349d Merge audit-2026-05-fix/shader-p0-operand-modifiers: GPUBUG-100/101/102 2026-05-03 14:18:51 +02:00
MechaCat02
ec2d955dbd fix(gpu): GPUBUG-102 — apply per-format endian byte-swap to vertex fetch
The vertex fetch constant (canary `xe_gpu_vertex_fetch_t`,
xenos.h:1158-1172) holds an `endian` field (low 2 bits of dword_1)
selecting kNone/k8in16/k8in32/k16in32 swap patterns per
`GpuSwapInline` (xenos.h:1090-1109). Xbox 360 vertex data is stored
big-endian; the host is little-endian. Pre-fix every dword was
bitcast as-is — vertex positions decoded as byte-reversed garbage,
producing clipped or NaN positions in any draw that survived to the
host.

Mechanical changes:
- crates/xenia-gpu/src/translator.rs: AOT `emit_vfetch` reads
  fetch_const dword 1 (endian) and wraps each lane's load in
  `gpu_swap(value, endian)`. New `gpu_swap` helper added to the
  emitted module header.
- crates/xenia-gpu/src/shaders/xenos_interp.wgsl: matching
  `gpu_swap` helper added to the runtime interpreter shader.
  `interpret_vertex_fetch` reads fc1, computes the endian, and wraps
  every format's per-lane load (including 8_8_8_8 and 16_16_FLOAT
  paths). Mirrors the AOT translator's emission.

Verification at -n 100M lockstep:
  swaps:                2 → 2     (gated by Phase E for draws)
  draws:                0 → 0
  packets:              ~60M (within noise)
Tests: +1 (vfetch_emit_includes_gpu_swap_helper_call).

Closes GPUBUG-102 (P0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:18:46 +02:00
MechaCat02
c5c6713419 fix(gpu): GPUBUG-100 — apply per-operand swizzle + negate to ALU sources
Word-1 of every ALU triple holds three 8-bit component-relative
swizzles (`src1_swiz`/`src2_swiz`/`src3_swiz` at bits 16-23/8-15/0-7
per canary ucode.h:2064-2066) and three per-operand negate flags
(bits 24/25/26). Pre-fix, both the WGSL interpreter and the AOT
translator discarded word-1 entirely with `_ = w1;` — every ALU
result was missing its swizzle (broadcast/permute patterns like
`.zyxw`, `.xxxx`) and any negated operand was used positive instead.

Component-relative semantics (canary's
`AluInstruction::GetSwizzledComponentIndex`, ucode.h:1996): for output
component i, the source component is `((swizzle >> (2*i)) + i) & 3`.
Identity swizzle is 0x00, NOT 0xE4 — the original `apply_swizzle` in
the interpreter shader treated it as absolute, also incorrect.
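
The component-relative formula above can be sketched in host-side Rust as follows. The function names are hypothetical; the real `apply_swizzle` lives in the WGSL interpreter shader.

```rust
/// Source component index for output component `i` (0 = x .. 3 = w),
/// using the component-relative rule quoted in this commit message.
pub fn swizzled_component(swizzle: u8, i: u32) -> u32 {
    debug_assert!(i < 4);
    (((swizzle as u32) >> (2 * i)) + i) & 3
}

/// Apply an 8-bit component-relative swizzle to a 4-lane source.
pub fn apply_swizzle(src: [f32; 4], swizzle: u8) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = src[swizzled_component(swizzle, i as u32) as usize];
    }
    out
}
```

Under this rule 0x00 is the identity, 0x6C encodes `.xxxx`, and 0x22 encodes `.zyxw`. The absolute-style identity 0xE4 selects `.xzxz` instead, which is why treating the field as absolute was incorrect.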

Mechanical changes:
- crates/xenia-gpu/src/ucode/alu.rs: extend AluInstruction with
  src_X_swiz (u8) and src_X_negate (bool) fields. decode_alu unpacks
  them from word 1.
- crates/xenia-gpu/src/shaders/xenos_interp.wgsl: apply_swizzle uses
  component-relative semantics. interpret_alu decodes the modifiers
  and applies via apply_swizzle + apply_modifiers (with abs=false).
- crates/xenia-gpu/src/translator.rs: src_operand emits the
  precomputed swizzle inline as `vec4<f32>(base.x, base.y, ...)`,
  then wraps in `(-…)` when negated. Identity swizzle (0x00) emits a
  bare base expression so it round-trips with the trivial-shader
  fixture.

Abs is omitted in this commit — the abs flag is dual-meaning (for
temps it lives at bit 7 of the src byte; for constants at word-2 bit
7 `abs_constants`). Wiring it up correctly requires more careful
case-split logic; deferred to Phase G.

Verification at -n 100M lockstep:
  swaps:                2 → 2     (gated by Phase E for draws)
  draws:                0 → 0
  packets:              ~58M (within noise)
Tests: 554 → 555 (+1 swizzle/negate test, no count change otherwise
because identity swizzle test merged into D1's parameterised test).
WGSL still validates via naga (combined_module_parses_as_wgsl).

Closes GPUBUG-100 (P0). Abs deferred to Phase G.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:15:07 +02:00
MechaCat02
78ea81c12a fix(gpu): GPUBUG-101 — decode src1/2/3_sel temp-vs-constant selector
Per canary AluInstruction layout (xenia-canary/src/xenia/gpu/ucode.h:
2078-2086), word-0 bits 29-31 are the per-operand `srcN_sel` flags
selecting temp register (1) vs ALU constant (0); the corresponding
8-bit src byte indexes either:
  - a temp register (bits 5:0 = index, bits 6/7 reserved for
    relative-addressing / abs flags consumed by Phase D2), or
  - an ALU constant (full 8-bit index).

Pre-fix, the WGSL interpreter and AOT translator both masked `& 0x7F`
on the src byte and emitted `r[low7]` regardless of the operand class.
Every shader's WVP matrix / light constant / per-frame uniform read
came back as r[low7] — typically zero — yielding invisible rendering.
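
A rough sketch of the decode described above. `SrcOperand`, `decode_src`, and `src_selectors` are illustrative names, not the actual items in alu.rs.

```rust
/// Resolved ALU source operand class.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SrcOperand {
    /// Temp register; only bits 5:0 of the src byte are the index.
    Temp(u8),
    /// ALU constant; the full 8-bit src byte is the index.
    Constant(u8),
}

/// Decode one src byte using its selector flag (1 = temp, 0 = constant).
pub fn decode_src(src_byte: u8, is_temp: bool) -> SrcOperand {
    if is_temp {
        SrcOperand::Temp(src_byte & 0x3F)
    } else {
        SrcOperand::Constant(src_byte)
    }
}

/// Extract the three selector flags from word-0 bits 29, 30, and 31,
/// returned in that order.
pub fn src_selectors(w0: u32) -> (bool, bool, bool) {
    (
        ((w0 >> 29) & 1) != 0,
        ((w0 >> 30) & 1) != 0,
        ((w0 >> 31) & 1) != 0,
    )
}
```

The pre-fix behavior corresponds to always taking the `Temp` arm with a 7-bit mask, so a constant index such as 0x85 was read as a temp register.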

Mechanical changes:
- crates/xenia-gpu/src/ucode/alu.rs: decode src_a_is_temp /
  src_b_is_temp / src_c_is_temp from w0 bits 29/30/31. Note that our
  src_a (low byte of w0) is canary's third operand, hence its selector
  is bit 29 (canary src3_sel), not bit 31.
- crates/xenia-gpu/src/shaders/xenos_interp.wgsl: `read_src` now takes
  the is_temp flag; constants index xenos_consts.alu directly.
- crates/xenia-gpu/src/translator.rs: `src_operand` mirrors the
  interpreter — `r[idx]` when temp, `xenos_consts.alu[idx]` when
  constant.

The trivial-shader synthetic test was updated to set the temp flags so
its `r[0u] = (r[0u] + r[0u])` assertion remains valid; without the
flags set, all sources would now resolve as constants.

Bank selection (cf-level relative addressing for higher banks of the
512 ALU constants) remains a Phase G+ extension; this commit covers
c0..c127 in bank 0, which most Sylpheed shaders use directly.

Verification at -n 100M lockstep:
  swaps:                2 → 2     (unchanged — gated by D2/D3/E for draws)
  draws:                0 → 0
  packets:              ~61M (within noise)
Tests: 552 → 554 (+2 translator tests for the temp/constant decode).

Closes GPUBUG-101 (P0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:10:11 +02:00
MechaCat02
1b74db6fa7 Merge audit-2026-05-fix/renderer-p0-vdswap-pm4: VdSwap PM4 ring path 2026-05-03 14:00:27 +02:00
MechaCat02
82f3d611e2 fix(gpu,kernel): KRNBUG-Vd-04 / GPUBUG-001 / XMODBUG-013 — VdSwap PM4 ring path
The pre-fix VdSwap zero-filled the guest's reserved buffer with NOPs and
called `state.gpu.notify_xe_swap` directly — bypassing the ring, leaving
the PM4_XE_SWAP handler at gpu_system.rs:1232 dead code, and skipping
the PM4_TYPE0(SHADER_CONSTANT_FETCH_00_0, 6) patch. Sylpheed's bloom/
blur "sample frame N for frame N+1" path samples fetch-constant slot 0
expecting the frontbuffer descriptor; without the patch, slot 0 stayed
stale and any shader sampling it read garbage.

This commit writes the canary VdSwap PM4 sequence directly into the
primary ring at the current write pointer (read via the shared MMIO
atomic), then advances WPTR over the injection. The natural CP drain
consumes PM4_XE_SWAP — bumping `swaps_seen` and patching fetch-constant
slot 0 — without going through any direct kernel→GPU bypass.

Sequence per xenia-canary VdSwap_entry (xboxkrnl_video.cc:438-521):
  1) PM4_TYPE0(0x4800, count=6) + 6 fetch-header dwords (with
     base_address re-patched from virtual to physical >> 12).
  2) PM4_TYPE3(PM4_XE_SWAP, count=4) + signature + frontbuffer_phys
     + width + height.
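
For reference, PM4 type-0 and type-3 headers of the shape used in this sequence can be built roughly as below. This is a sketch of the usual header layout (type in bits 31:30, count-1 in bits 29:16), not a copy of the helpers added in pm4.rs; the signatures are assumed.

```rust
/// Type-0 header: write `count` consecutive registers starting at `index`.
pub fn make_packet_type0(index: u16, count: u16) -> u32 {
    assert!(index <= 0x7FFF);
    assert!((1..=0x4000).contains(&count));
    (((count as u32) - 1) << 16) | (index as u32)
}

/// Type-3 header: `opcode` (7 bits) followed by `count` payload dwords.
pub fn make_packet_type3(opcode: u8, count: u16) -> u32 {
    assert!(opcode <= 0x7F);
    assert!((1..=0x4000).contains(&count));
    (3u32 << 30) | (((count as u32) - 1) << 16) | ((opcode as u32) << 8)
}

/// Packet type field (0..=3) of a header dword.
pub fn packet_type(header: u32) -> u32 {
    header >> 30
}

/// Payload dword count encoded in a type-0 or type-3 header.
pub fn packet_count(header: u32) -> u32 {
    ((header >> 16) & 0x3FFF) + 1
}
```

With this layout, `make_packet_type0(0x4800, 6)` yields 0x00054800, a header followed by six fetch-constant dwords.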

Mechanism notes:
- buffer_ptr in xenia-rs is in the system command buffer, NOT the
  primary ring (verified empirically: buffer_ptr=0x4acd4df8 vs
  ring_base=0x0accb000, size 4 KB). Canary's VdSwap writes to
  buffer_ptr because its ring layout maps the reserved slot inside
  the ring; xenia-rs's doesn't, so we have to write at the actual
  ring WPTR address (cached on KernelState.ring_base from
  VdInitializeRingBuffer).
- The original "buffer_ptr zero-fill + bump WPTR by 64" path is
  preserved before the injection — it exposes any game-batched PM4
  packets and keeps the buffer_ptr region skippable per existing
  game compat behavior.
- A safety-net fallback at the end calls `notify_xe_swap` directly if
  swaps_seen didn't advance during the drain (e.g. a ring-arithmetic
  edge case). Idempotent — only fires when the PM4 path didn't.
- KRNBUG-Mm-04 deferred: virt→phys uses the masked stub
  `virt & 0x1FFF_FFFF`, sufficient for the standard heap.

Mechanical changes:
- crates/xenia-gpu/src/pm4.rs: add make_packet_type0 / type2 / type3
  helpers + round-trip unit test (mirrors canary xenos.h:1682-1709).
- crates/xenia-gpu/src/handle.rs: add mmio_cp_rb_wptr_load accessor
  (Acquire-load) so the kernel can compute ring offsets.
- crates/xenia-kernel/src/state.rs: cache ring_base / ring_size_dwords
  on KernelState (set by VdInitializeRingBuffer).
- crates/xenia-kernel/src/exports.rs: rewrite the vd_swap PM4-emit
  block; patch fetch_dwords[1] base_address virt→phys before injection.

Verification at -n 100M lockstep:
  swaps:                2 → 2     (game fires VdSwap exactly twice)
  draws:                0 → 0     (gated by Phases D+E)
  fallback warning:     0 occurrences (PM4 path consumed both swaps)
  instructions:         ~100M
Tests: 552 passing (553 with new pm4 round-trip test). Lockstep
stable-fields determinism: byte-identical across two 100M runs.

The "swaps > 2" prediction in the audit's plan assumed the game would
fire VdSwap more often once the path worked; empirically Sylpheed only
calls VdSwap twice within 100M instructions (this is the renderer
plateau the audit identified). The success criterion for Phase C is
that the PM4 path is now operational, which Phases D+E require for
visible draws.

Closes KRNBUG-Vd-04, GPUBUG-001, XMODBUG-013.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:00:23 +02:00
MechaCat02
0590bffdd9 Merge audit-2026-05-fix/oracle-sylpheed-n50m-n4b: ORACBUG-004 sylpheed_n50m oracle 2026-05-03 13:46:06 +02:00
MechaCat02
1f416aaa2e test(check): ORACBUG-004 — sylpheed_n50m stable-digest oracle
Adds a regression-catcher golden for Sylpheed boot at -n 50M lockstep,
covering the first VdSwap pair (the n2m oracle is swap-blind because
the first VdSwap fires at ~18M instructions). The new --stable-digest
flag emits/compares only fields that are deterministic in lockstep:

  instructions, imports, unimpl, draws, swaps,
  unique_render_targets, shader_blobs_live, texture_cache_entries

Excluded:

  packets — empirically ±2-8% lockstep variance (GPU thread race per
    audit M11)
  resolves, interrupts_delivered, interrupts_dropped, texture_decodes —
    scheduling-sensitive under --parallel
  path — cwd-dependent

Empirical determinism: 3 consecutive lockstep -n 50M runs produce
byte-identical stable-digest output.

The n4b canonical-invocation golden, which the audit's recommended next
sprint also called for, is deferred. Per audit memory, `--parallel
--reservations-table` is pathologically slow (>32 min for -n 100M), so
-n 4B in that mode would be many hours per run, not the 5-15 min the
plan estimated. n4b will be captured one-shot post-renderer-unblock as
a manual artifact under audit-runs/post-fix/, not as a test golden. See
crates/xenia-app/tests/golden/README.md.

Test infrastructure:
- crates/xenia-app/tests/sylpheed_oracles.rs — invokes
  CARGO_BIN_EXE_xenia-rs against the ISO. Path resolved via SYLPHEED_ISO
  env var (skips gracefully if missing).
- #[ignore]-gated; run via:
    cargo test --release -p xenia-app --test sylpheed_oracles \
      -- --ignored --nocapture

Closes ORACBUG-004 (P0). Partial: ORACBUG-006 (P1 deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 13:46:02 +02:00
MechaCat02
62f673d094 Merge audit-2026-05-fix/swapbug-001-revert-addi-truncation: SWAPBUG-001 revert 2026-05-03 13:38:05 +02:00
MechaCat02
9ab986ec09 fix(cpu): SWAPBUG-001 — revert addi 32-bit truncation
The addi opcode was truncating its result to 32 bits per the post-P4-batch3
"32-bit ABI" rationale (commit bf8208e). Hunk-level bisection during the
2026-05 audit (M11) isolated this single cast as the cause of the
post-P8 swap regression: swaps dropped 2 → 1 and the renderer lost a
frame. PowerISA mandates sign-extension to 64 bits; canary does not
truncate addi. The truncation was a canary-divergent over-extension
of the addis fix (which IS canary-divergent by design, see
addis at interpreter.rs:121-134).

The addi_li_neg_one_zero_extends_upper test encoded the wrong invariant.
Replaced with a sign-extension test asserting canonical PowerISA
behavior (gpr[3] == 0xFFFF_FFFF_FFFF_FFFF for `li r3, -1`).
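
The difference between the two behaviors can be shown with a small stand-alone sketch. These functions are illustrative and take the already-resolved RA value; they are not the interpreter's actual `addi` arm.

```rust
/// addi with the immediate sign-extended to 64 bits (PowerISA behavior).
pub fn addi(ra_value: u64, simm: i16) -> u64 {
    ra_value.wrapping_add(simm as i64 as u64)
}

/// The reverted variant: same sum, then truncated to the low 32 bits.
pub fn addi_truncated(ra_value: u64, simm: i16) -> u64 {
    addi(ra_value, simm) as u32 as u64
}
```

For `li r3, -1` (an `addi` with a zero base), the first form gives 0xFFFF_FFFF_FFFF_FFFF and the truncating form gives 0x0000_0000_FFFF_FFFF.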

Verification at -n 100M lockstep:
  swaps:                1 → 2     (gate met)
  draws:                0 → 0     (unchanged — gated by Phase C+D+E)
  instructions:         ~100M (unchanged)
  imports:              11.4M → 987k    (game escapes retry loop)
  packets:              281M → 57M      (same)
  interrupts_delivered: 629 → 630
Tests: 551 passing (unchanged). Lockstep determinism: byte-identical
across two 100M runs except packets (±5%, GPU-thread-race noise floor).

Closes SWAPBUG-001 / PPCBUG-001.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 13:37:51 +02:00
MechaCat02
caa37fc595 docs(audit): post-P8 end-to-end review findings + acid test result
Document the post-P8 cross-cutting review and acid test outcome:

End-to-end reviewer caught:
- BLOCKING-LIKELY: lwa/lwax/lwaux ISA deviation (fixed in f1166d0)
- Cosmetic: fpscr round_single_toward_zero duplicate-branch (fixed in 09c6c92)
- Minor performance: reservation table active_reservers as slot-occupancy
- Asymmetry note: extswx remains 64-bit ABI per audit PPCBUG-038 (wontfix)

Acid test (-n 4B --parallel --reservations-table, pre-lwa-hotfix build):
- swaps=1, draws=0
- exit 0, no panics, no errors, no RtlRaiseException
- 14 thread spawns, 2 LR-sentinel exits
- Renderer plateau NOT unblocked by cumulative P1-P8 correctness fixes

Implication: the Sylpheed `draws=0` plateau has a non-PPC-correctness
root cause. PPC fixes were correctness-justified independent of the
renderer (well-grounded against canary). Next investigation tracks:
graphics pipeline (EDRAM resolve, RT readback), kernel HLE (event
signaling, timers), or the unresolved BST-validation paradox per
`project_xenia_rs_sylpheed_event_chain_2026_04_29.md`. Out of scope
for the PPC instruction audit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 14:49:43 +02:00
MechaCat02
09c6c927bd refactor(cpu): fpscr round_single_toward_zero — collapse duplicate-branch ULP step
Post-P8 review nit: the if/else branches were identical
(`adj_bits - 1` either way). Both positive and negative finite f32
values use the IEEE-754 sign bit as the MSB, and subtracting 1 from
`to_bits()` always reduces magnitude by one ULP. Replace the
mock-conditional with the unconditional form + a comment explaining
why one operation works for both signs.
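
A small sketch of the unconditional form, assuming the input is finite and nonzero (the function name is hypothetical):

```rust
/// Step a finite, nonzero f32 one ULP toward zero.
/// IEEE-754 binary32 is sign-magnitude, so decrementing the raw bits
/// shrinks the magnitude for both positive and negative values.
pub fn step_toward_zero(x: f32) -> f32 {
    debug_assert!(x.is_finite() && x != 0.0);
    f32::from_bits(x.to_bits() - 1)
}
```

The zero case is excluded here because decrementing the bit pattern of +0.0 would underflow; the caller is assumed to handle it separately.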

No behavior change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 14:45:55 +02:00
MechaCat02
f1166d0f75 fix(cpu): revert PPCBUG-105 — lwa/lwax/lwaux sign-extend per PowerISA
Post-P8 end-to-end review caught an ISA deviation introduced by P4
batch 5. The original code used `as i32 as i64 as u64` (correct
PowerISA sign-extension; canary's `SignExtend(INT64_TYPE)`). My P4
batch 5 commit (20a730d) changed all three to `as u64` (zero-extend),
citing the audit's "32-bit-ABI hazard" note for PPCBUG-105.

This deviation is wrong per PowerISA and any 64-bit-mode kernel code
that uses `lwa rT, off(rA)` will silently produce the wrong rT for
negative words (e.g. memory 0x80000000 should yield 0xFFFFFFFF_80000000
but was yielding 0x00000000_80000000).
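
The two cast chains discussed above behave as follows in isolation (function names are illustrative only):

```rust
/// Sign-extend a loaded 32-bit word to 64 bits, as `lwa` requires.
pub fn lwa_extend(word: u32) -> u64 {
    word as i32 as i64 as u64
}

/// Zero-extend a loaded 32-bit word, which is the `lwz` behavior.
pub fn lwz_extend(word: u32) -> u64 {
    word as u64
}
```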

Restore ISA-spec sign-extension for all three forms (lwa, lwax, lwaux).
The audit's 32-bit-ABI hazard concern was speculative — there's no
evidence that Xbox 360 user code emits `lwa` (compilers use `lwz`).
If a real bug surfaces from a 32-bit-ABI consumer that feeds an
`lwa`-loaded value into a u64 unsigned compare, that's a separate
issue to debug at the consumer site.

Test renamed: lwa_high_bit_set_zero_extends_upper → lwa_sign_extends_to_i64
with assertion flipped to expect 0xFFFFFFFF_80000000.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 14:43:47 +02:00
MechaCat02
9de18a9eec chore(audit): mark P8 PPCBUGs applied; append P8 progress section; AUDIT-FIX-COMPLETE
P8 phase merged at 4029041. Update audit-findings.md status fields
(38 PPCBUGs marked applied) and append the P8 progress section to
audit-report-2026-04-29.md.

This closes the eight-phase audit-application sweep. Total ~161
PPCBUGs applied across P1-P8. ~12 LOW test-gap IDs remain Status:
open and can be closed incrementally without blocking any
functionality.

Next session: deferred acid test (`xenia-rs check sylpheed.iso
-n 4B --parallel --reservations-table`) to see if cumulative
correctness fixes unblock the Sylpheed renderer plateau (draws=0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 14:24:24 +02:00
MechaCat02
4029041618 Merge branch 'ppc-audit-fix/p8-tests' — Phase 8 test gap closure
Phase 8 of the PPC instruction audit fix application: pure test gap
closure for opcode groups that previously had near-zero unit test
coverage. 53 new tests across 5 commits (4 batches + review-nit
rename).

- 9827b03: Batch 1 — branch/CR-logical/SPR/MSR/FPSCR/sync (12 tests)
- 2d223ee: Batch 2 — load/store base + lswx/stswx with XER TBC (15 tests)
- ebfd18a: Batch 3 — FPU + VMX float (14 tests)
- 2614806: Batch 4 — VMX integer/permute/load-store (12 tests)
- 1f9696a: review-fix nit — vmsum3fp_… → vmaddfp_lane_fma rename

Independent reviewer verdict: LGTM, no blocking issues, no rubber-
stamp tests, no encoding bugs (every hand-encoded raw cross-checked
against canary's INSTRUCTION table). Two minor follow-ups: the test
rename was applied immediately; the audit cross-reference in batch-4
body is loose (one representative test per group, not 1:1) — accepted.

The XER-TBC tests (`lswx_uses_xer_tbc_for_byte_count`,
`stswx_uses_xer_tbc_for_byte_count`) are load-bearing: they directly
exercise the P6 XER TBC infrastructure, and both opcodes were permanent
no-ops pre-P6.

Closed IDs (38): 055, 067, 070, 081, 082, 083, 084, 085, 089, 091,
100, 109, 110, 111, 118, 127, 129, 132, 146, 147, 153, 163, 171, 187,
208, 228, 240, 277, 316/320, 321/323, 370, 438, 439, 440, 490, 517.

Remaining `Status: Open` test-gap LOW IDs are tracked in
audit-findings.md; they don't block any functionality and can be
closed in incremental future work.

Verification at merge: cargo test --workspace --release reports 551
passed, 0 failed (up from 498 at P7 merge; 53 net new tests).
Acid test deferred to end of all phases per user direction.
2026-05-02 14:23:04 +02:00
MechaCat02
1f9696ad47 test(cpu): rename vmsum3fp_… to vmaddfp_lane_fma per reviewer nit
P8 review feedback (non-blocking): the test fn name said vmsum3fp but
the encoding/body actually tests vmaddfp. Rename + clarify comment;
no behavior change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 14:22:39 +02:00
MechaCat02
261480616c test(cpu): PPCBUG-240/277/278/316/321/370/490/517 P8 batch 4 — VMX integer/permute/load-store
Phase 8 batch 4 — VMX integer + permute/pack + multiply-sum + load/store.

12 new tests:
- VMX add/sub (240): vaddubm byte add, vsubuwm word sub.
- VMX compare (277): vcmpequb lane mask.
- VMX min/max (278): vmaxsw signed lane max.
- VMX shift/rotate (316): vsl 128-bit left shift, vsraw arithmetic per-lane.
- VMX logical (321): vand lane-wise AND.
- VMX permute (370): vsldoi byte concatenation + shift.
- VMX multiply-sum (490): vmaddfp lane FMA.
- VMX load/store (517): lvx aligned quadword load, stvx aligned store,
  lvebx byte-lane load.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 14:16:51 +02:00
MechaCat02
ebfd18a64e test(cpu): PPCBUG-187/208/228/438/439/440 P8 batch 3 — FPU + VMX float
Phase 8 batch 3 — FPU and VMX float test gap closure.

14 new tests:
- Single FPU (187): fadds, fmuls
- Double FPU (208): fmul, fdiv (zero-numerator), fneg, fabs, fmr
- FPU convert/compare (228): fcmpu, fcfid
- VMX float compare (438): vcmpeqfp lane mask
- VMX rounding (439): vrfip, vrfim, vrfiz
- VMX convert (440): vctsxs saturation to INT_MAX/INT_MIN

The VMX VX-form encoding nit (XO is 11 bits at PPC 21-31, host bits 10-0,
with bit 0 the LSB — not bit 1) was caught by initial test failures and
fixed before commit. VC-form (vcmpeqfp) has the same "XO at bit 0" layout.
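
As an illustration of the layout described above, a VX-form word can be hand-encoded and decoded like this. The helper names are hypothetical and are not the test suite's own encoders.

```rust
/// Encode a VX-form instruction: primary opcode 4, three 5-bit register
/// fields, and an 11-bit XO in host bits 10..=0.
pub fn encode_vx(vd: u32, va: u32, vb: u32, xo: u32) -> u32 {
    (4 << 26) | ((vd & 31) << 21) | ((va & 31) << 16) | ((vb & 31) << 11) | (xo & 0x7FF)
}

/// Primary opcode (top 6 bits).
pub fn primary_opcode(raw: u32) -> u32 {
    raw >> 26
}

/// Extended opcode of a VX-form word; the LSB is host bit 0.
pub fn vx_xo(raw: u32) -> u32 {
    raw & 0x7FF
}
```

For example, `vaddfp v1, v2, v3` (XO 10) encodes as 0x1022180A with this layout.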

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 14:14:10 +02:00
MechaCat02
2d223eee69 test(cpu): PPCBUG-091/100/109-111/118/127/129/132/146-147/153/163/171 P8 batch 2 — load/store
Phase 8 batch 2 — load/store test gap closure.

15 new tests across the load/store opcodes:
- lbz zero-extend (091), lwbrx byte-swap (109/110), lwarx smoke (111),
  ld doubleword (118), lmw + lswi (127), lswx with XER TBC (127),
  lfs single-to-double widening (129).
- stb (132), sth, stw (146), std (153), stmw + stswx (163), stfs (171).

`lswx_uses_xer_tbc_for_byte_count` and `stswx_uses_xer_tbc_for_byte_count`
specifically lock in the new XER TBC infrastructure landed in P6 (68c0ee5);
both opcodes were permanent no-ops before that.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 14:10:26 +02:00
MechaCat02
9827b03f1a test(cpu): PPCBUG-055/067/070/081-085/089 P8 batch 1 — branch/CR/SPR/sync
Phase 8 batch 1 — test gap closure for the branch/CR-logical/SPR/MSR/
FPSCR/cache+sync groups.

12 new tests across the affected groups:
- PPCBUG-055 branch: blr, bctr, bcl-LK-on-not-taken
- PPCBUG-070 CR logical: cror, crand, crxor (crclr idiom)
- PPCBUG-067 trap+sc: sc smoke, tw TO=0 never-traps
- PPCBUG-081-085 SPR/MSR/FPSCR moves: mfcr 8-field assembly, mtfsb1/mtfsb0
- PPCBUG-089 cache+sync: sync state-non-mutation smoke

These groups previously had near-zero unit test coverage. New tests lock
in the current ISA-correct behavior; would catch a regression in any of
the dispatch/encoding/result paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 14:08:54 +02:00
MechaCat02
a7155f4571 chore(audit): mark P7 frozen-snapshot drift cleared (manual regen, no code change)
P7 of the PPC instruction audit fix application: re-ran the
ppc-manual generator (`python3 ppc-manual/generator/generate_manual.py`)
to regenerate all 350 family pages from current xenia-rs and
xenia-canary source. The 3 audit-cited stale snapshots
(PPCBUG-066/117/145) are now refreshed.

Note: the `ppc-manual/` directory is not versioned in xenia-rs/.git,
so this commit is purely the audit-findings status update + report
section. The regen itself happened in-place outside this repo.

Verification: post-regen grep confirms the old "For now, just trace
and continue" stub is gone from every page, and modern constructs
(trap::evaluate, current reservation_line model) appear correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 14:03:23 +02:00
MechaCat02
8b9fddc488 chore(audit): mark P6 PPCBUGs applied; append P6 progress section
P6 phase merged at 112202c. Update audit-findings.md status fields
(13 PPCBUGs marked applied) and append the P6 progress section to
audit-report-2026-04-29.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 13:58:26 +02:00
MechaCat02
112202c2b9 Merge branch 'ppc-audit-fix/p6-medium' — Phase 6 Other MEDIUM correctness
Phase 6 of the PPC instruction audit fix application: misc MEDIUM
correctness items across trap/sc, XER TBC, MSR/VSCR/FPSCR semantics.
~13 PPCBUGs landed across 4 batches.

- d96986a: Batch 1 — trap PC fix, sc LEV logging, twi typed-trap logging
  (PPCBUG-063/064/065)
- 68c0ee5: Batch 2 — XER TBC infrastructure (enabling lswx/stswx) +
  lswi/stswi nb fix + lmw RA-skip (PPCBUG-123/124/125/126/161/162/566)
- 0f2a26c: Batch 3 — mcrfs VX recompute, mtmsrd L=1 partial, mfvscr zero
  (PPCBUG-068/078/080)
- 99e7814: Batch 4 — mulld_ov INT_MIN*-1 verification + auto-resolved
  markers for PPCBUG-021/022/027/039
- 5ece5e3: review-fix nit — mcrfs uses existing fpscr::VX_ALL constant

Independent reviewer verdict: all 4 commits LGTM, one cosmetic nit
(applied immediately in 5ece5e3). Audit fix-shapes match canary
prescriptions; trap-PC change verified against all StepResult::Trap
consumers; XER TBC field initialization verified through the single
PpcContext::new() construction path.

Two structural enum extensions deferred (not yet needed by any
consumer):
- StepResult::HypervisorCall variant (would enable PPCBUG-064 routing
  for sc 2)
- StepResult::Trap { type_code: u16 } payload (would enable PPCBUG-065
  routing for typed C++ traps; relevant if SEH dispatch is added)

Cosmetic / test-coverage items left for future cleanup batch:
PPCBUG-642 (cosmetic disasm), PPCBUG-643/644 (SIMM/D-form hex display),
PPCBUG-367/368 (vupkhpx/vpkpx channel ordering), PPCBUG-487/495 (vsum
naming), PPCBUG-515/516 (lvebx/lvsr docs), PPCBUG-601 (decode_op6 doc).

Verification at merge: cargo test --workspace --release reports 498
passed, 0 failed. Acid test deferred to end of all phases.
2026-05-02 13:57:00 +02:00
MechaCat02
5ece5e315f refactor(cpu): mcrfs uses fpscr::VX_ALL constant per reviewer nit
P6 review nit: replace the inline `const VX_ALL_MASK` in the mcrfs arm
with the existing `fpscr::VX_ALL` constant (single source of truth).
Behaviorally identical.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 13:56:34 +02:00
MechaCat02
99e7814836 test(cpu): PPCBUG-022 verify mulld_ov INT_MIN*-1 + auto-resolved markers
Phase 6 batch 4 — overflow/cleanup verification.

- PPCBUG-022 mulld_ov INT_MIN * -1: the audit-claimed missing edge case
  is actually handled by `i64::checked_mul()` (returns None when the
  result would be -i64::MIN = i64::MAX+1, which doesn't fit). New
  regression tests in overflow.rs confirm: INT_MIN * -1 overflows;
  INT_MIN * 1 doesn't; (INT_MIN+1) * -1 = INT_MAX, no overflow.
  Audit's claim was incorrect; documented by the new tests.
- PPCBUG-021 (overflow.rs OE checks at bit 63): largely auto-resolved
  by P4 batch 6 (16993bb), which switched all 32-bit ABI ops to inline
  `true_sum != (result32 as i32) as i128`. Helpers like add_ov_64 are
  now only called from 64-bit ABI ops where bit-63 is correct.
- PPCBUG-027 (rlwimix upper-32 zeros): auto-resolved by P4 (rlwimix
  now writes via `as u32 as u64` truncation).
- PPCBUG-039 (cntlzdx 32-bit-ABI): wontfix per audit — only matters
  if a 32-bit-ABI binary emits cntlzd, which compilers don't.
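
The `checked_mul` behavior that the PPCBUG-022 tests rely on can be demonstrated in isolation. `mulld_overflows` is a hypothetical wrapper, not the helper in overflow.rs.

```rust
/// True when the signed 64-bit product does not fit in an i64.
pub fn mulld_overflows(a: i64, b: i64) -> bool {
    a.checked_mul(b).is_none()
}
```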

Remaining low-impact items (PPCBUG-642 ISA-undefined fmt_bcctr decr,
PPCBUG-643/644 SIMM/D-form hex display, PPCBUG-367/368 vupkhpx/vpkpx
channel ordering, PPCBUG-487/495 vsum operand naming, PPCBUG-515/516
lvebx/lvsr documentation, PPCBUG-601 decode_op6 invariant doc) are
left for a P9 or follow-up batch — they're cosmetic/test-coverage
items rather than correctness bugs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 13:51:43 +02:00
MechaCat02
0f2a26c460 fix(cpu): PPCBUG-068/078/080 mcrfs VX recompute + mtmsrd L=1 + mfvscr zero
Phase 6 batch 3 — SPR/MSR/VSCR semantics.

- PPCBUG-078 mtmsrd L=1: PowerISA requires partial-MSR-write — only
  MSR[EE] (u64 bit 15) and MSR[RI] (u64 bit 0) modified, all other
  MSR bits preserved. Used by kernel code to toggle external interrupts.
  Previously merged with mtmsr (full overwrite), silently corrupting
  MSR for any L=1 caller.
- PPCBUG-080 mfvscr: ISA places VSCR in the rightmost word of VD with
  bytes 0-11 zeroed. Previously copied the full 128-bit ctx.vscr,
  leaking stale upper data to guest. Now zero-extends per canary.
- PPCBUG-068 mcrfs VX summary: when mcrfs clears VX* exception bits,
  the VX summary bit at FPSCR[2] must be recomputed (clears if all
  contributors are 0; remains 1 otherwise). Previously left stale,
  causing subsequent CR-test sequences to misread the FPU state.
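
A sketch of the VX summary recompute from PPCBUG-068. It assumes a 32-bit FPSCR in which Power ISA bit n maps to host bit 31-n. The constants here are written out for the sketch and are not the crate's `fpscr` module.

```rust
/// VX summary bit: FPSCR[2] in Power ISA numbering, host bit 29.
pub const FPSCR_VX: u32 = 1 << 29;

/// All VX* contributor bits: FPSCR[7..=12] (VXSNAN..VXVC) and
/// FPSCR[21..=23] (VXSOFT, VXSQRT, VXCVI).
pub const FPSCR_VX_ALL: u32 = 0x01F8_0700;

/// Recompute the VX summary bit from its contributors.
pub fn recompute_vx(fpscr: u32) -> u32 {
    if (fpscr & FPSCR_VX_ALL) != 0 {
        fpscr | FPSCR_VX
    } else {
        fpscr & !FPSCR_VX
    }
}
```

After `mcrfs` clears some VX* bits, calling a function like this clears the summary only when no contributor remains set.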

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 13:50:10 +02:00
MechaCat02
68c0ee55ce fix(cpu): PPCBUG-123/124/125/126/161/162/566 XER TBC + lswi/stswi/lmw
Phase 6 batch 2 — XER TBC enabling + load/store-multiple cleanups.

- PPCBUG-123/124/161/566 (coupled): XER TBC field was unmodelled —
  `ctx.xer()` always returned 0 in bits 0-6, and `ctx.set_xer()`
  silently discarded any TBC writes. Result: `lswx` and `stswx` were
  permanent no-ops (the `while bytes_left > 0` loop never executed).
  Fix: add `pub xer_tbc: u8` to `PpcContext`; wire into `xer()` and
  `set_xer()`. Initialize to 0 in `PpcContext::new()`. lswx/stswx
  bodies are correct as-is once the infrastructure is wired.

- PPCBUG-125 lmw: PowerISA marks `lmw rT, D(rA)` invalid when rA is
  in [rT..31]; canary skips the write to rA to preserve the EA base.
  Now matches canary.

- PPCBUG-126/162 lswi/stswi: replaced `instr.rb()` with `instr.nb()`
  for the NB field. Both accessors return identical values today
  (bits 16-20), but the maintenance hazard from the misnomer is now
  removed. A future `rb()` type-system refactor would have broken
  lswi/stswi silently.
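
The XER TBC wiring from the first item can be sketched as below. `XerState` and its flag fields are stand-ins for the relevant part of `PpcContext`; only `xer_tbc`, `xer()`, and `set_xer()` are named in the commit, and the 32-bit XER view is an assumption of this sketch.

```rust
/// Minimal stand-in for the XER-related state.
#[derive(Default)]
pub struct XerState {
    pub so: bool,
    pub ov: bool,
    pub ca: bool,
    /// Transfer byte count, XER bits 0..=6 (LSB numbering).
    pub xer_tbc: u8,
}

impl XerState {
    /// Assemble the 32-bit XER view: SO/OV/CA in bits 31/30/29, TBC in 6:0.
    pub fn xer(&self) -> u32 {
        ((self.so as u32) << 31)
            | ((self.ov as u32) << 30)
            | ((self.ca as u32) << 29)
            | ((self.xer_tbc as u32) & 0x7F)
    }

    /// Store every modelled field, including TBC.
    pub fn set_xer(&mut self, value: u32) {
        self.so = ((value >> 31) & 1) != 0;
        self.ov = ((value >> 30) & 1) != 0;
        self.ca = ((value >> 29) & 1) != 0;
        self.xer_tbc = (value & 0x7F) as u8;
    }
}

/// Byte count that lswx/stswx would transfer.
pub fn string_byte_count(state: &XerState) -> u32 {
    state.xer() & 0x7F
}
```

With TBC dropped on write, `string_byte_count` would always be zero, which matches the pre-fix no-op behavior.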

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 13:48:03 +02:00
MechaCat02
d96986a10e fix(cpu): PPCBUG-063/064/065 trap PC + sc LEV + twi typed-trap logging
Phase 6 batch 1 — trap/sc semantics.

- PPCBUG-063 trap PC: previously ctx.pc was incremented to CIA+4 BEFORE
  StepResult::Trap returned, forcing handlers to .wrapping_sub(4) to
  recover the faulting instruction address. Now ctx.pc stays at CIA on
  trap, matching SRR0 semantics on real hardware. Critical for any
  future SEH/exception-delivery path (e.g. the Sylpheed C++ throw work).
- PPCBUG-065 typed-trap logging: `twi 31, r0, IMM` is the Xbox 360
  CRT/kernel typed-trap convention encoding C++ exception class via
  SIMM. The trace now logs the SIMM type code when this pattern fires.
  Routing the type code via a StepResult payload requires an enum
  extension (multiple consumer sites) that's deferred.
- PPCBUG-064 sc LEV logging: `sc 2` is the Xbox 360 hypervisor-call
  convention; canary dispatches it to a different handler than `sc 0`.
  Now logs a warning when LEV != 0. Routing LEV=2 to a HypervisorCall
  variant also requires a StepResult enum extension; deferred.

The two enum-extension follow-ups can land as a structural sub-batch
once a clear consumer (SEH dispatch, hypervisor-call HLE) is in place.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 13:42:50 +02:00
MechaCat02
9f88e275b8 chore(audit): mark P5 PPCBUGs applied; append P5 progress section
P5 phase merged at d39d0ba. Update audit-findings.md status fields
(21 PPCBUGs marked applied) and append the P5 progress section to
audit-report-2026-04-29.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 12:39:46 +02:00
MechaCat02
d39d0bab4d Merge branch 'ppc-audit-fix/p5-fpu' — Phase 5 FPU correctness
Phase 5 of the PPC instruction audit fix application: FPU correctness
across the scalar FPU and VMX float arithmetic. ~22 PPCBUGs across
6 sub-sections (5a-5f).

- f6a444b: 5a — round_to_i64 + vrfin round-to-even (PPCBUG-221+227, 432)
- 26b9897: 5b — FMA VXISI + NaN sign preservation (PPCBUG-181/182/183/202/203/205)
- 49bf74f: 5c — FPU XX-on-inexact for conversions (PPCBUG-223/224/225/229/230)
- 538fa5a: 5d — VSCR.NJ subnormal flush for VMX float (PPCBUG-435/436/437)
- 6ba8f83: 5e — fresx canary parity (PPCBUG-184)
- 6fe2cbf: 5f — single-FMA vnmsubfp + vctsxs NaN saturation (PPCBUG-426/427/433)
- 05f2f72: review-fix nit — vrfin uses stdlib round_ties_even

Independent reviewer found no blocking issues; two minor follow-up
items remain open for tracking. The vrfin nit was applied immediately
in 05f2f72.

Three substantive PPCBUGs were explicitly deferred — each requires
substantial helper rework that's better landed as focused sub-batches:
- PPCBUG-201: FPSCR.RN for double arithmetic (MXCSR set/restore wrappers)
- PPCBUG-185: FPSCR.NI flush for scalar FPU (NI bit constant + post-op flush)
- PPCBUG-180/200: XX/FR/FI in update_after_op (pre-vs-post-round comparison)

These remain Status: open in audit-findings.md and will be picked up in
a P5b sub-batch or P9 (test gaps) per planning.

Verification at merge: cargo test --workspace --release reports 498
passed, 0 failed. Acid test deferred to end of all phases per user
direction.
2026-05-02 12:38:18 +02:00
MechaCat02
05f2f72c71 refactor(cpu): vrfin uses stdlib f32::round_ties_even() per reviewer nit
P5 review feedback (non-blocking): replace the inline round-to-even
implementation with the stable stdlib intrinsic (Rust 1.77+).
Functionally equivalent; cleaner.
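
For illustration, the per-lane operation reduces to a single stdlib call (the wrapper name is hypothetical):

```rust
/// Round one lane to the nearest integer, ties to even.
pub fn vrfin_lane(x: f32) -> f32 {
    x.round_ties_even()
}
```

This differs from `f32::round`, which rounds ties away from zero (2.5 becomes 3.0 there, but 2.0 here).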

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 12:37:54 +02:00
MechaCat02
6fe2cbf251 fix(cpu): PPCBUG-426/427/433 single-FMA vnmsubfp + vctsxs NaN saturation
Phase 5 batch 6 (5f): saturation and FMA-rounding fixes.

- PPCBUG-426 vnmsubfp: was `bi - ai * ci` (two rounding steps); now
  `-ai.mul_add(ci, -bi)` which is mathematically equivalent (= bi - ai*ci)
  but uses a single FMA round per ISA.
- PPCBUG-427 vnmsubfp128: same single-FMA fix.
- PPCBUG-433 vctsxs / vcfpsxws128 NaN saturation: AltiVec ISA saturates
  NaN to INT_MIN (0x80000000); xenia returned 0. The vctuxs (unsigned)
  NaN→0 is correct per ISA.
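
A minimal sketch of the two fix shapes above, per lane. The function names
are hypothetical, and the vctsxs scale factor (UIMM) is left out as a
simplifying assumption; this is an illustration, not the interpreter code.

```rust
/// One lane of vnmsubfp: -(a*c - b), rounded once via a fused multiply-add.
fn vnmsubfp_lane(a: f32, c: f32, b: f32) -> f32 {
    -(a.mul_add(c, -b))
}

/// The earlier two-step shape: the product is rounded before the subtract.
fn vnmsubfp_lane_two_step(a: f32, c: f32, b: f32) -> f32 {
    b - a * c
}

/// One lane of a signed saturating float-to-int convert (scale omitted).
fn vctsxs_lane(x: f32) -> i32 {
    if x.is_nan() {
        i32::MIN // Rust's `as` cast alone would produce 0 for NaN
    } else {
        x as i32 // `as` truncates toward zero and saturates at the i32 bounds
    }
}
```

With a = c = 1 + 2^-12 and b = 1 + 2^-11 the two-step form returns 0 while
the fused form returns -2^-24, which is where the extra rounding step shows.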

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 12:31:10 +02:00
MechaCat02
6ba8f83c30 fix(cpu): PPCBUG-184 fresx pre-quantize input to f32 (canary parity)
Phase 5 batch 5 (5e): minimal-viable fix for the estimate-precision
family. Hardware Xenon `fres` produces a ~12-bit LUT estimate; xenia
and canary both produce a fully IEEE single reciprocal, but canary
pre-quantizes the f64 input to f32 to at least match the input
precision. Now matches canary.

PPCBUG-428..431 (vrefp/vrsqrtefp/vexptefp/vlogefp) already operate on
f32 inputs naturally (no f64 → f32 quantization step needed); the
estimate-precision deviation is purely the output side. Newton-Raphson
convergence is unaffected. Documented in audit-findings.md as
LOW-impact full-fix-requires-LUT.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 12:29:07 +02:00
MechaCat02
538fa5ab74 fix(cpu): PPCBUG-435/436/437 VSCR.NJ subnormal flush for VMX float
Phase 5 batch 4 (5d) — partial: VSCR.NJ subnormal flush for VMX float
arithmetic. Xbox 360 always boots with NJ=1, so games expect inputs
and outputs flushed to ±0.

- PPCBUG-435 vaddfp/vaddfp128/vsubfp/vsubfp128/vmulfp128: previously
  no flush at all on these opcodes (only vmaddfp family flushed).
  Now flushes both inputs and output per Canary's unconditional model.
- PPCBUG-436 vmsum3fp128/vmsum4fp128: per-product intermediates now
  flushed individually (was only the final sum).
- PPCBUG-437 vmaddfp/vmaddfp128/vmaddcfp128/vnmsubfp/vnmsubfp128:
  outputs now flushed (inputs were already flushed).
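
A sketch of the unconditional flush model described above, assuming NJ=1
at all times. `flush_denormal` and `vaddfp_lane` are hypothetical names,
not the project's helpers.

```rust
/// Flush a subnormal to zero of the same sign; pass everything else through.
fn flush_denormal(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0f32.copysign(x)
    } else {
        x
    }
}

/// One lane of vaddfp with both inputs and the output flushed.
fn vaddfp_lane(a: f32, b: f32) -> f32 {
    flush_denormal(flush_denormal(a) + flush_denormal(b))
}
```

The output flush matters even when both inputs are normal: 1.5 * MIN_POSITIVE
minus MIN_POSITIVE lands in the subnormal range and must read back as +0.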

PPCBUG-185 (FPSCR.NI flush for scalar FPU) deferred — requires adding
a NI bit constant and post-op flush wrapper across all *sx arms; will
land in a focused sub-batch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 12:26:36 +02:00
MechaCat02
49bf74fae6 fix(cpu): PPCBUG-223/224/225/229/230 FPU XX bit on inexact conversions
Phase 5 batch 3 (5c) — partial: targeted XX-on-inexact fixes for the
float-to-int and double-to-single conversion family. (PPCBUG-180/200,
the broader update_after_op XX/FR/FI rework, deferred to a focused
sub-batch.)

- PPCBUG-225 frspx: set XX when the f64→f32 round produces a different
  value (i.e. precision loss). Almost every frsp call is inexact —
  previously games polling FPSCR.XX never saw the set bit after a frsp.
- PPCBUG-224 fcfidx: set XX when the i64 input has > 53 significant
  bits (precision lost in conversion to f64).
- PPCBUG-229 fctidx/fctidzx: set XX when input is non-integer (fractional
  part discarded by the conversion).
- PPCBUG-230 fctiwx/fctiwzx: same shape for word-width conversions.
- PPCBUG-223 verified already correct in current code (fcmpo sets
  VXSNAN/VXVC on NaN operands; the audit-cited drift was already fixed).
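
The inexact predicates above can be sketched as follows. The function names
are hypothetical, and out-of-range inputs (which raise VXCVI rather than XX)
are not modelled here.

```rust
/// frsp: inexact when the f64 -> f32 round changes the value.
fn frsp_is_inexact(x: f64) -> bool {
    !x.is_nan() && ((x as f32) as f64) != x
}

/// fcfid: inexact when the i64 does not survive a round trip through f64.
/// The comparison is done in i128 so that 2^63 is representable.
fn fcfid_is_inexact(v: i64) -> bool {
    ((v as f64) as i128) != (v as i128)
}

/// fctid/fctiw family: inexact when a fractional part is discarded.
fn fcti_is_inexact(x: f64) -> bool {
    x.is_finite() && x.fract() != 0.0
}
```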

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 12:22:47 +02:00
MechaCat02
26b98975c3 fix(cpu): PPCBUG-181/182/183/202/203/205 FMA VXISI + NaN sign preservation
Phase 5 batch 2 (5b): VXISI / NaN handling for the FMA family.

The 8 FMA opcodes (fmaddx/fmaddsx/fmsubx/fmsubsx/fnmaddx/fnmaddsx/fnmsubx/
fnmsubsx) all share two fix shapes:

1. VXISI on the add/sub step. The previous code passed `a*c` to
   check_invalid_add, which has separate rounding from the FMA. In
   extreme cases this gives the wrong sign (PPCBUG-202) or wrong infinity
   status. Worse, fmsub/fnmadd/fnmsub had NO add-step VXISI check at all
   (PPCBUG-181/182/203). The fnmsub pattern is the canonical Newton-
   Raphson step — the most common FPU path in Xbox 360 graphics code.

2. NaN sign preservation in fnmadd/fnmsub. ISA Book I §4.3.4 forbids
   negation of a NaN FMA result; Rust's unary `-` flips the IEEE-754
   sign bit (PPCBUG-183/205).

Fixes:
- fpscr.rs: new helper `check_invalid_fma_add(ctx, a, c, b, sub)` that
  derives VXISI from input properties (mathematical-product sign +
  b sign) instead of from the lossy `a*c` value. Also covers SNaN.
- interpreter.rs: all 8 FMA arms now use the new helper; fnmadd[s]/
  fnmsub[s] gate the negation on `!fma.is_nan()`.
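
A sketch of the idea behind the helper: decide inf - inf from the operands
rather than from a rounded `a*c`. Names and signatures here are hypothetical
and the SNaN handling mentioned above is omitted.

```rust
/// True when the add/sub step of `a*c (+/-) b` is inf - inf (VXISI).
/// `sub` selects the fmsub-style form `a*c - b`.
fn fma_add_is_vxisi(a: f64, c: f64, b: f64, sub: bool) -> bool {
    if a.is_nan() || c.is_nan() || b.is_nan() {
        return false;
    }
    // The mathematical product is infinite only if an operand is infinite
    // and the other is non-zero (inf * 0 is VXIMZ, a different exception).
    let product_is_inf =
        (a.is_infinite() && c != 0.0) || (c.is_infinite() && a != 0.0);
    if !product_is_inf || !b.is_infinite() {
        return false;
    }
    let product_negative = a.is_sign_negative() != c.is_sign_negative();
    let addend_negative = b.is_sign_negative() != sub;
    product_negative != addend_negative
}

/// fnmadd/fnmsub final step: negate the FMA result unless it is a NaN.
fn negate_unless_nan(fma: f64) -> f64 {
    if fma.is_nan() { fma } else { -fma }
}
```

Note that `1e200 * 1e200` overflows to infinity in f64, so a check based on
the computed product would report VXISI against an infinite `b`; the
input-based check does not.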

Tests:
- fmsub_inf_minus_inf_sets_vxisi: regression for PPCBUG-203.
- fnmadd_nan_input_preserves_nan_sign: regression for PPCBUG-205.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 12:20:02 +02:00
MechaCat02
f6a444b9d1 fix(cpu): PPCBUG-221+227 round_to_i64 + PPCBUG-432 vrfin round-to-even
Phase 5 batch 1 (5a): round-to-int correctness.

PPCBUG-221+227 (coupled): round_to_i64 NearestEven tie-breaking used
`(diff - 0.5).abs() < f64::EPSILON` to detect half-integers, but for
|v| > 2^52 every f64 value is an exact integer (v.trunc() == v), giving
diff == 0. The buggy check fell through to v.round() (round-half-away-
from-zero), giving wrong results for large odd half-integers. Replaced
with a fractional-part-only check that's exact for |v| <= 2^52 and
degenerates to truncation above.

PPCBUG-432: vrfin/vrfin128 used Rust's `f32::round()` which is round-
half-away-from-zero. ISA requires round-to-nearest-even (banker's
rounding). Implemented inline.
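
A sketch of a fractional-part-only tie check, written by hand so it does not
depend on the stdlib version. Function names are hypothetical; this is one
way to get the behavior described, not necessarily the code that landed.

```rust
/// Round to nearest, ties to even, using only the fractional part.
fn round_nearest_even(v: f64) -> f64 {
    let t = v.trunc();
    let frac = (v - t).abs(); // exact; 0 whenever |v| >= 2^52
    if frac < 0.5 {
        t
    } else if frac > 0.5 {
        t + v.signum()
    } else if t % 2.0 == 0.0 {
        t // tie and the truncated value is already even
    } else {
        t + v.signum() // tie: step away from zero to the even neighbour
    }
}

/// Integer wrapper; the `as` cast saturates at the i64 bounds.
fn round_to_i64_nearest_even(v: f64) -> i64 {
    round_nearest_even(v) as i64
}
```

For comparison, `2.5f64.round()` is 3.0 (half away from zero), while the
function above gives 2.0.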

PPCBUG-201 (FPSCR.RN for double arithmetic) deferred — requires
MXCSR-set/restore wrappers around 10+ FPU arms; will land in a focused
sub-batch after the remaining 5a-5f fixes.

Tests:
- round_to_i64_nearest_even_on_tie: extended with 0.5, 1.5, -0.5, -1.5.
- round_to_i64_non_tie_cases: 0.4/0.6 (non-tie sanity).
- round_to_i32_nearest_even_on_tie: PPCBUG-227 coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 12:13:08 +02:00
MechaCat02
5c45108249 chore(audit): mark P4 PPCBUGs applied; append P4 progress section
P4 phase merged at d945aea. Update audit-findings.md status fields
(43 PPCBUGs marked applied) and append the P4 progress section to
audit-report-2026-04-29.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 12:09:26 +02:00
MechaCat02
d945aeae83 Merge branch 'ppc-audit-fix/p4-abi-truncation' — Phase 4 ABI truncation
Phase 4 of the PPC instruction audit fix application: 32-bit ABI writeback
truncation across the integer ALU. Six commits + one review-fix land
~43 PPCBUG IDs.

- e18a0a4: 4a active poisoning, NOT/SUB family (006/008/018/019/028/029/030/031/033)
- 145a7a4: 4a/4d coupled extsbx+extshx + CR0 (034+035+036+037)
- bf8208e: 4b immediate ALU (001/002/003/004/005/007)
- 82a9bff: 4b mul/div + srawx coupled (009/010+011/041+042+043)
- 20a730d: 4b halfword + lwa loads (095/096/097/098/105)
- 16993bb: 4c latent + 4d CR0 catch-all (012-017/020/023-026/032/044)
- 49103bb: review-fix — subfx/subfcx OE predicate + mulli test rigor

Independent reviewer caught a blocking issue: subfx/subfcx OE handlers
in batch 6 hadn't been migrated to the inline 32-bit overflow predicate
(`true_diff != (result32 as i32) as i128`), still using the legacy
`sum_overflow_64` which gave spurious OV=1 for any legitimate i32::MIN
result. Fixed in 49103bb with two new discriminating regression tests.

Verification at merge: cargo test --workspace --release reports 494
passed, 0 failed. Acid test deferred to end of all phases per user
direction.

The 32-bit ABI invariant — every GPR write zero-extends from a u32
result, every CR0 update views the result as i32 — is now systematically
restored across the integer ALU. Downstream 64-bit unsigned compares
(the addis-incident shape) can no longer be fed polluted upper bits.
2026-05-02 12:07:53 +02:00
MechaCat02
49103bb898 fix(cpu): P4 review-fix — subfx/subfcx OE predicate + mulli test rigor
Independent reviewer of the P4 branch found two issues:

(1) BLOCKING — subfx and subfcx OE handlers still called the legacy
`overflow::sum_overflow_64(true_diff, result32 as u64)` while batch 6
had migrated all add* sites to the inline `true_sum != (result32 as i32)
as i128` form. The legacy helper compares `true_diff` against
`(result32 as u64) as i64 as i128`, which views any bit-31-set result
as a positive i64 (e.g. result=0x80000000 → +2147483648 in i64). For a
legitimate i32::MIN result with no actual 32-bit overflow, this caused
spurious OV=1.

Concrete repro now caught by `subfo_no_spurious_ov_when_result_has_bit31_set`:
r3=1, r4=0x80000001 → result=0x80000000, true_diff=-2147483648, no OV.
Pre-fix: spurious OV=1.
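
The two predicates can be compared side by side. This is a sketch with
hypothetical function names; operands are taken as already-truncated u32
values and `subf` computes rb - ra.

```rust
/// 32-bit subf with the inline overflow predicate: (result, OV).
fn subf_32(ra: u32, rb: u32) -> (u32, bool) {
    let true_diff = (rb as i32 as i128) - (ra as i32 as i128);
    let result32 = rb.wrapping_sub(ra);
    // Compare against the result viewed as a signed 32-bit value.
    let ov = true_diff != (result32 as i32) as i128;
    (result32, ov)
}

/// The legacy comparison: the result is widened as unsigned first, so any
/// bit-31-set result looks like a large positive number.
fn subf_legacy_ov(ra: u32, rb: u32) -> bool {
    let true_diff = (rb as i32 as i128) - (ra as i32 as i128);
    let result32 = rb.wrapping_sub(ra);
    true_diff != (result32 as u64) as i64 as i128
}
```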

(2) Minor — `mulli_overflow_wraps_to_32` rubber-stamped: with ra=0x80000000
and imm=2, both pre-fix (`as i64 as u64`) and post-fix (`as u32 as u64`)
write the same value. Replaced with ra=u64::MAX (polluted upper bits) where
pre-fix writes 0xFFFFFFFF_FFFFFFFE and post-fix writes 0x00000000_FFFFFFFE.

Fixes:
- interpreter.rs subfx/subfcx OE: switch to inline 32-bit predicate
  matching the rest of batch 6.
- subfo_sets_xer_ov_on_min_minus_one: renamed and updated to test 32-bit
  overflow (r4=0x80000000 - 1 = 0x7FFFFFFF, OV=1).
- New: subfo_no_spurious_ov_when_result_has_bit31_set (PPCBUG-017
  review-fix regression).
- New: subfco_no_spurious_ov_when_result_has_bit31_set (same for PPCBUG-007).
- mulli_overflow_wraps_to_32: redesigned with polluted upper bits to
  actually discriminate pre/post fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 12:07:32 +02:00
MechaCat02
16993bb8af fix(cpu): PPCBUG-012-017/020/023-026/032/044 4c+4d latent + CR0 catch-all
Phase 4 batch 6: latent writeback truncation (4c) and CR0 catch-all (4d).
~13 PPCBUGs across all remaining 32-bit ABI ALU sites.

Latent writeback (4c) — the 4a/4b fixes already eliminate the upstream
poisoning, but a defensive truncation here catches any future regression:
- PPCBUG-012 addx, PPCBUG-013 addcx, PPCBUG-014 addex, PPCBUG-015 addzex,
  PPCBUG-016 addmex, PPCBUG-017 subfx — all rewritten to compute on u32
  operands and write `as u64`. CA computed via 32-bit unsigned compare.
  Overflow now uses `true_sum != (result32 as i32) as i128` (32-bit
  predicate, since sum_overflow_64 is i64-bounded).
- PPCBUG-032 andx/orx/xorx — CR0 catch-all only (results inherit upper
  bits from operands; once those are clean, no truncation needed).

CR0 catch-all (4d) — fix the `update_cr_signed(0, X as i64)` pattern at
every 32-bit-ABI Rc=1 path:
- PPCBUG-020 catch-all: applied to mulhwx, mulhwux, divwux,
  addx/addcx/addex/addzex/addmex/subfx (now in 4c above), andx/orx/xorx,
  andix, andisx, slwx, srwx, cntlzwx, rlwinmx, rlwimix, rlwnmx.
  Already done in batch 4: mullwx, divwx, srawx/srawix.
- PPCBUG-023 andisx: now correctly classifies bit-31 results as CR0.LT.
- PPCBUG-024 rlwinmx, PPCBUG-025 rlwimix, PPCBUG-026 rlwnmx.
- PPCBUG-044 slwx/srwx: bit-31 result like 0x80000000 now CR0.LT.

64-bit ABI ops (rldicl/rldicr/rldic/rldimi/rldcl/rldcr, sldx/srdx/sradx/
sradix, mulhdx/mulhdux/mulldx, divdx/divdux, cntlzdx) intentionally retain
the 64-bit `as i64` form per ISA — these are 64-bit-mode instructions.

Updated old tests:
- addo_sets_xer_ov_on_signed_overflow_and_stickies_so: i32::MAX + 1 → INT_MIN.
- addx_rc_uses_64bit_compare_not_32bit: renamed to ..._uses_32bit_compare_in_xbox_abi
  with assertions flipped to the correct 32-bit ABI behavior.

New tests:
- andisx_sign_bit_set_classifies_lt (PPCBUG-023).
- slwx_high_bit_result_classifies_lt (PPCBUG-044).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 11:55:50 +02:00
MechaCat02
20a730d69e fix(cpu): PPCBUG-095/096/097/098/105 halfword + lwa load truncation
Phase 4 batch 5: 5 PPCBUGs in the load family. lha/lhax/lhau/lhaux
sign-extended halfword results to u64 (active poisoning for negative
halfwords); lwa/lwax/lwaux sign-extended u32 results.

- PPCBUG-095/096/097/098 lha[ux]: `as i16 as i64 as u64` →
  `as i16 as i32 as u32 as u64`. Sign-extend to i32 then zero-extend.
  Common trigger: int16_t struct fields, PCM samples, packed vertex
  deltas. Memory 0x8000 was producing 0xFFFFFFFF_FFFF8000.
- PPCBUG-105 lwa/lwax/lwaux: `as i32 as i64 as u64` → `as u64`.
  Per canary, the 64-bit-mode form sign-extends, but in the 32-bit ABI
  we must zero-extend (canary's behavior is rescued by x86 register
  zeroing in the JIT; a pure interpreter has no such escape). Memory
  0x80000000 was producing 0xFFFFFFFF_80000000.

Tests:
- lha_negative_halfword_zero_extends_upper (PPCBUG-095).
- lhaux_negative_halfword_clean_writeback (PPCBUG-098 + EA update).
- lwa_high_bit_set_zero_extends_upper (PPCBUG-105).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 11:47:24 +02:00
MechaCat02
82a9bff934 fix(cpu): PPCBUG-009/010+011/041+042+043 mul/div + srawx truncation
Phase 4 batch 4: mullwx, divwx (coupled +CR0), srawx/srawix (coupled +CR0).

- PPCBUG-009 mullwx: 32-bit ABI. Product truncated to u32 before write.
  OE handler still uses full i64 product to detect overflow.
- PPCBUG-010+011 divwx (coupled): quotient zero-extended (canary uses
  ZeroExtend(v, INT64_TYPE)). CR0 view via i32 — without this, a negative
  i32 quotient (e.g. -3 from -10/3) would be classified as positive in
  i64 view of the now-zero-extended writeback.
- PPCBUG-041+042+043 srawx/srawix (coupled): writeback uses `as u32 as u64`
  (was `as i64 as u64`). All-ones case (sh>=32 with negative input) writes
  0x00000000_FFFFFFFF instead of u64::MAX. CR0 view via i32. CA logic
  preserved unchanged (audit-verified independently correct).
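
A sketch of the writeback shapes above, with hypothetical function names.
Divide-by-zero returns 0 here purely as a placeholder (the ISA leaves the
result undefined), and the srawx CA computation is left out.

```rust
/// divw writeback: signed 32-bit quotient, zero-extended into the GPR.
fn divw(ra: u32, rb: u32) -> u64 {
    if rb == 0 {
        return 0; // placeholder for the undefined case
    }
    let q = (ra as i32).wrapping_div(rb as i32);
    q as u32 as u64
}

/// sraw writeback: arithmetic shift on the low word, zero-extended.
/// Shift amounts of 32..=63 produce 32 copies of the sign bit.
fn sraw(rs: u32, sh: u32) -> u64 {
    let v = rs as i32;
    let sh = sh & 0x3F;
    let r = if sh >= 32 { v >> 31 } else { v >> sh };
    r as u32 as u64
}

/// CR0.LT under the 32-bit ABI: view the low word as i32.
fn cr0_is_lt(gpr: u64) -> bool {
    (gpr as u32 as i32) < 0
}
```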

Tests:
- mullwx_overflow_truncates_to_32 (PPCBUG-009).
- divwx_negative_quotient_zero_extends (PPCBUG-010+011).
- srawx_negative_value_zero_extends_upper (PPCBUG-041+043).
- srawix_high_count_negative_input_yields_low32_all_ones (PPCBUG-042+043).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 11:44:34 +02:00
MechaCat02
bf8208e88c fix(cpu): PPCBUG-001/002/003/004/005/007 4b immediate ALU truncation
Phase 4 batch 3: 6 PPCBUGs in the same-shape-as-addis (4b) sub-section.
All share the pattern of computing on 64-bit values when the 32-bit ABI
requires u32 arithmetic.

- PPCBUG-001 addi: `li rT, -1` produced 0xFFFFFFFF_FFFFFFFF; now 0x00000000_FFFFFFFF.
- PPCBUG-002 addic: writeback truncated + CA from u32 unsigned compare
  matching canary's `AddDidCarry`.
- PPCBUG-003 addicx: same plus CR0 i32 view (regression vs. the frozen
  ppc-manual snapshot which had the correct form).
- PPCBUG-004 mulli: 64-bit signed product now truncated to 32 bits.
- PPCBUG-005 subficx: writeback + CA in u32 space; removes the bits-32-63
  pollution from sign-extended negative SIMM.
- PPCBUG-007 subfcx: defensive 32-bit truncation of CA compare. Same shape
  as the compare that broke addis (0x828F3F98 / 0x828F3F68 case).
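
A sketch of the u32-space arithmetic these fixes share. Function names are
hypothetical; the `rA = 0` literal-zero case of addi is left to the caller.

```rust
/// addi under the 32-bit ABI: add in u32, zero-extend the result.
fn addi(ra: u64, simm: i16) -> u64 {
    (ra as u32).wrapping_add(simm as i32 as u32) as u64
}

/// addic: same writeback, with CA taken from the unsigned 32-bit carry.
fn addic(ra: u64, simm: i16) -> (u64, bool) {
    let a = ra as u32;
    let r = a.wrapping_add(simm as i32 as u32);
    (r as u64, r < a) // the sum wrapped iff it is smaller than an operand
}
```

The last test case below feeds a GPR with polluted upper bits; the carry is
still derived from the low word only.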

Tests:
- addi_li_neg_one_zero_extends_upper (PPCBUG-001).
- addic_carry_uses_32bit_compare (PPCBUG-002).
- mulli_overflow_wraps_to_32 (PPCBUG-004).
- subficx_neg_simm_zero_extends (PPCBUG-005).
- subfcx_addis_incident_case (PPCBUG-007 — exact addis-incident case).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 11:41:49 +02:00
MechaCat02
145a7a4019 fix(cpu): PPCBUG-034+035+036+037 extsbx/extshx writeback + CR0 (coupled)
Phase 4 batch 2: extsbx and extshx writeback truncation + CR0 view fix.
Coupled per audit — must land together because the writeback fix would
silently break CR0 sign classification if the CR0 fix didn't ship in
the same commit.

Before:
- extsbx: `as i8 as i64 as u64` — every negative byte poisoned upper
  32 bits (active poisoning, not latent). 0x80 → 0xFFFFFFFF_FFFFFF80.
- extshx: same shape for halfwords.
- CR0: `as i64` view — accidentally correct on the buggy 64-bit form
  because the high bits matched the byte's sign bit.

After:
- extsbx: `as i8 as i32 as u32 as u64` — sign-extend to i32 then
  zero-extend to u64. 0x80 → 0x00000000_FFFFFF80.
- extshx: same for halfwords.
- CR0: `as u32 as i32 as i64` — i32 view, so a result with bit 31 set
  is correctly classified as negative under the 32-bit ABI.
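
The before/after cast chains can be checked directly. A small sketch with
hypothetical function names:

```rust
/// Old writeback: sign-extend the byte all the way to 64 bits.
fn extsb_old(rs: u64) -> u64 {
    rs as i8 as i64 as u64
}

/// New writeback: sign-extend to i32, then zero-extend to u64.
fn extsb_new(rs: u64) -> u64 {
    rs as i8 as i32 as u32 as u64
}

/// CR0.LT via the i32 view of the result.
fn cr0_is_lt(result: u64) -> bool {
    (result as u32 as i32 as i64) < 0
}
```

The coupling is visible here: once the writeback is zero-extended, a plain
`as i64` view of 0x00000000_FFFFFF80 is positive, so the CR0 cast has to
change in the same commit.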

Tests:
- extsbx_negative_byte_zero_extends_upper: 0x80 input → 0x00000000_FFFFFF80
  with CR0.LT set.
- extshx_negative_halfword_zero_extends_upper: same shape for 0x8000.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 11:38:22 +02:00
MechaCat02
e18a0a40b8 fix(cpu): PPCBUG-006/008/018/019/028/029/030/031/033 4a active poisoning
Phase 4 batch 1: 9 PPCBUGs in the active-poisoning sub-section. All
follow the pattern `!val` on u64, which unconditionally flips the upper
32 bits and poisons the GPR even with clean inputs — every execution
corrupts the high 32 bits regardless of upstream state.

Sub/neg family:
- PPCBUG-006 negx: `(!ra).wrapping_add(1)` on u64 + neg_ov_64 checks
  64-bit INT_MIN. Fix: do arithmetic in u32; OE checks `ra32 == 0x80000000`.
- PPCBUG-008 subfex: same shape as above plus 64-bit unsigned CA compare.
  Fix: cast all operands to u32, compute, write `as u64`.
- PPCBUG-018 subfzex: `!ra` on u64. Fix: u32 arithmetic.
- PPCBUG-019 subfmex: `!ra` on u64 + always-true CA edge (`!ra != 0`
  was always true for any clean ra <= 0xFFFFFFFF because the high bits
  of !u64 are non-zero). Fix: u32 arithmetic; CA predicate now correct.

Logical NOT family:
- PPCBUG-028 orcx: rs | !rb on u64 → high-bit poison.
- PPCBUG-029 norx: !(rs|rb) — the `not` simplified mnemonic. Hot path,
  every `not` corrupted GPR upper 32 bits.
- PPCBUG-030 nandx: !(rs&rb).
- PPCBUG-031 eqvx: !(rs^rb). The common `eqv rA,rA,rA` set-to-all-ones
  idiom now produces 0x00000000_FFFFFFFF instead of 0xFFFFFFFF_FFFFFFFF.
- PPCBUG-033 andcx: rs & !rb.
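
A sketch of the poisoning pattern and its u32 fix, using nor and neg as
representatives. Function names are hypothetical.

```rust
/// Old shape: NOT on the full u64 sets the upper 32 bits even for clean inputs.
fn nor_old(rs: u64, rb: u64) -> u64 {
    !(rs | rb)
}

/// Fixed shape: NOT in u32 space, then zero-extend.
fn nor_new(rs: u64, rb: u64) -> u64 {
    !((rs | rb) as u32) as u64
}

/// neg in u32 space: (result, OV), with OV only for the 32-bit INT_MIN.
fn neg_new(ra: u64) -> (u64, bool) {
    let a = ra as u32;
    ((!a).wrapping_add(1) as u64, a == 0x8000_0000)
}
```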

CR0 update at every Rc=1 path now uses `as u32 as i32 as i64` so a result
with bit 31 set gets classified as negative under the 32-bit ABI (the old
i64 view keyed on the poisoned upper bits rather than on bit 31; the new
truncated form would always read as non-negative unless we cast through
i32). This pre-emptively addresses PPCBUG-020 for these specific opcodes;
the catch-all sweep in batch 6 covers the remaining sites.

Tests:
- nego_sets_ov_only_on_int_min: updated from i64::MIN → 0x80000000 (32-bit).
- test_subfze_carry_only_when_ra_zero_and_ca_one: result expectations
  updated from u64::MAX → 0xFFFFFFFF (low 32 bits, upper 32 zero).
- New: neg_clean_input_no_upper_bits (PPCBUG-006 regression).
- New: norx_not_simplified_keeps_upper_bits_clean (PPCBUG-029 regression).
- New: eqvx_self_self_self_sets_low32_to_all_ones (PPCBUG-031 regression).
- New: andcx_bit_clear_keeps_upper_clean (PPCBUG-033 regression).
- New: subfex_clean_inputs_no_upper_bits (PPCBUG-008 regression).
- New: subfmex_ra_max_ca_zero_clears_ca (PPCBUG-019 always-true CA fix).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 11:35:05 +02:00
MechaCat02
f424132a5b chore(audit): mark P3 PPCBUGs applied; append P3 progress section
P3 phase merged at f3ebaba. Update audit-findings.md status fields and
append the P3 progress section to audit-report-2026-04-29.md, including
the new PPCBUG-700 discovery (VMX128 register accessor canary-compliance).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 11:28:38 +02:00
MechaCat02
f3ebaba5c9 Merge branch 'ppc-audit-fix/p3-isolated-high' — Phase 3 isolated HIGH bugs
Phase 3 of the PPC instruction audit fix application. Six commits land
six independent (or coupled) PPCBUG fixes:

- cedee3c: PPCBUG-510 stvewx128 16-byte corruption → 4-byte word write
- 52ece4b: PPCBUG-424+425 vmaddfp128/vmaddcfp128 operand swap (VA*VD+VB)
- 3d8e2ce: PPCBUG-053+054 32-bit CTR semantics in bcx/bclrx + mtspr CTR
- d4f6ea7: PPCBUG-640+650 fmt_bc spurious bdnzge/bdzge condition suffix
- 2be25bd: PPCBUG-641+649 sync vs lwsync L-field disambiguation
- 7609dcd: PPCBUG-700 VMX128 register accessors → canary bitfield layout

PPCBUG-700 was a discovery during phase end-to-end review: an independent
reviewer cross-checked our va128/vb128/vd128/vx128r_rc_bit accessors
against canary's `FormatVX128*` bitfield struct (xenia-canary
`ppc_decode_data.h:484-663`) and found the bit positions were wrong on
all four. The audit's line-2958 "confirmed-clean" assessment was based
on a miscount of LSB-first packed C++ bitfields. Real Xbox 360 game code
follows canary's convention, so any production VMX128 instruction with
register VR >= 32 was silently mis-decoded — though no unit test
exercised that path until 52ece4b's operand-swap fix exposed the
inconsistency. Subsumes PPCBUG-422's prescribed Rc-bit position.

Verification at merge: `cargo test --workspace --release` clean across
all crates; targeted vmx128/decoder/disasm-golden tests green. Acid test
(`-n 4B --parallel`) deferred to end-of-all-phases per user direction.
2026-05-02 11:22:54 +02:00
MechaCat02
7609dcd406 fix(cpu): PPCBUG-700 VMX128 register accessors match canary bitfield layout
Independent review of P3 batch 2 (52ece4b) found that all three VMX128
register accessors disagreed with canary's FormatVX128/VX128_R bitfield
struct (`xenia-canary/src/xenia/cpu/ppc/ppc_decode_data.h:484-663`). The
audit at line 2958 had marked these "confirmed-clean" but had miscounted
LSB-first bitfield offsets.

Canary's actual layout (LSB-first, GCC/Clang/MSVC on x86):
  VA128 = VA128l(5) | VA128h(1)<<5 | VA128H(1)<<6
        = PPC[11:15] | PPC[26]<<5 | PPC[21]<<6  (7-bit selector, 3 fields)
  VB128 = VB128l(5) | VB128h(2)<<5
        = PPC[16:20] | PPC[30:31]<<5            (7-bit selector, 2 fields)
  VD128 = VD128l(5) | VD128h(2)<<5
        = PPC[6:10]  | PPC[28:29]<<5            (7-bit selector, 2 fields)
  VX128_R Rc = PPC[25]  (host bit 6)             not PPC[27] as prior fix had
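
The layout above translates to host-bit shifts as follows (PPC bit n is host
bit 31 - n). This is a sketch derived only from the table in this message;
the free-function signatures are an assumption, not the decoder.rs API.

```rust
/// VA128 = PPC[11:15] | PPC[26]<<5 | PPC[21]<<6
fn va128(raw: u32) -> u32 {
    ((raw >> 16) & 0x1F) | (((raw >> 5) & 1) << 5) | (((raw >> 10) & 1) << 6)
}

/// VB128 = PPC[16:20] | PPC[30:31]<<5
fn vb128(raw: u32) -> u32 {
    ((raw >> 11) & 0x1F) | ((raw & 0x3) << 5)
}

/// VD128 = PPC[6:10] | PPC[28:29]<<5
fn vd128(raw: u32) -> u32 {
    ((raw >> 21) & 0x1F) | (((raw >> 2) & 0x3) << 5)
}

/// VX128_R Rc = PPC[25] (host bit 6)
fn vx128r_rc_bit(raw: u32) -> bool {
    (raw >> 6) & 1 != 0
}
```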

The buggy convention was internally consistent with hand-crafted test
fixtures (which set bits 29/21/22 to encode the high registers, matching
the buggy accessor). Real Xbox 360 game code follows canary's convention,
so any production VMX128 instruction with VR >= 32 was silently mis-decoded
— but no unit test exercised that path until the va128 fix in 52ece4b
exposed the inconsistency.

Changes:
- decoder.rs: rewrite va128/vb128/vd128/vx128r_rc_bit to canary positions.
  Drop the speculative `key4_dt` dot-form dispatch in decode_op6 — canary
  has no separate dot-form opcodes for VX128_R compute ops; Rc is a
  runtime modifier read by the interpreter via vx128r_rc_bit().
- decoder.rs tests: rewrite vmx128_test_word helper for canary layout;
  rename/re-encode vmx128_vd128_*, vmx128_va128_*, vmx128_vb128_* tests.
- interpreter.rs: update encode_vpkd3d128 test helper to encode VD via
  canary's VD128h field; tests now pass vd=96 explicitly.
- tests/disasm_goldens.rs: replace the vrlimi128/vsrw128/vpermwi128/
  vperm128 hand-encoded raws with canary-compliant encodings; introduce
  a shared `encode_vx128` helper.
- tests/golden/vmx128_registers.json: re-encode 9 entries (vperm128,
  vsrw128 ×2, vpermwi128, vrlimi128 ×2, vmaddfp128, vmaddcfp128,
  vnmsubfp128) to canary-compliant raws preserving the same expected
  operand strings.
- audit-findings.md: new PPCBUG-700 entry documenting the discovery and
  invalidating the audit's "confirmed-clean" assessment.

Affects all VMX128 binary ops (vaddfp128, vsubfp128, vmulfp128, vand128,
vor128, vxor128, vnor128, vandc128, vsel128, vslo128, vsro128, vperm128,
vsrw128, vmaddfp128, vmaddcfp128, vnmsubfp128, vpkd3d128, vpkshss128,
vpkshus128, vpkswss128, vpkswus128, vpkuhum128, vpkuhus128, vpkuwum128,
vpkuwus128, vmsum3fp128, vmsum4fp128, vrlimi128, vpermwi128 — 30+
opcodes), plus VX128_R compare dot-forms.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 11:22:20 +02:00
MechaCat02
2be25bdd41 fix(disasm): PPCBUG-641+649 sync/lwsync L-field discrimination
PPCBUG-641: PpcOpcode::sync emitted "sync" regardless of the L-field at
PPC bit 10. The Xbox 360 acquire barrier (encoding 0x7C2004AC, L=1) is
lwsync, used in every spinlock. The disassembly DB stored every lwsync
as `mnemonic='sync'`, so `SELECT WHERE mnemonic='lwsync'` returned zero
rows regardless of binary content.

PPCBUG-649 (companion): the golden fixture for lwsync had no ext_mnemonic
field, pinning the wrong output and defeating regression detection.

Fix: in disasm.rs, gate on `(instr.raw >> 21) & 1` (PPC bit 10) — when
set, emit the lwsync extended form. Update extended_mnemonics.json
fixture to expect `ext_mnemonic: "lwsync"`.

Note: this is the disassembler-side fix only. The interpreter-side
PPCBUG-088 (lwsync vs sync semantics) is separate.
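
The gate is small enough to show in isolation. A sketch with a hypothetical
function name, using the lwsync encoding quoted above and the plain sync
encoding 0x7C0004AC:

```rust
/// Pick the mnemonic from the L-field bit at PPC bit 10 (host bit 21).
fn sync_mnemonic(raw: u32) -> &'static str {
    if (raw >> 21) & 1 != 0 {
        "lwsync"
    } else {
        "sync"
    }
}
```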

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 10:43:24 +02:00
MechaCat02
d4f6ea787b fix(disasm): PPCBUG-640+650 fmt_bc spurious condition suffix on bdnz/bdz
PPCBUG-640: For BO=16 (bdnz: decrement CTR, branch if non-zero, ignore CR)
and BO=18 (bdz: same with branch-if-zero), `fmt_bc` fell through to the
`if decr` block and computed `cond_name_opt` from the don't-care BI=0 /
cond_true=false pair, yielding `Some("ge")`. The output was therefore
`bdnzge` / `bdzge` — a CTR-only branch with a spurious CR-derived suffix.

PPCBUG-650 (companion): the golden fixture pinned the wrong output, so
the regression had no detection signal until now.

`fmt_bclr` already had the correct `if decr && uncond` guard at line 872
producing `bdnzlr` / `bdzlr`. `fmt_bc` lacked the equivalent.

Fix: gate the condition string on `!uncond` inside the `if decr` block.
For BO=16/18 (uncond bit set), the condition suffix is now empty.
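
A sketch of the guarded suffix logic. The function name and the exact
string format are assumptions that mirror the `bdnzge`-style output
described above; only the BO bit meanings come from the ISA.

```rust
/// Build a bc mnemonic from the BO and BI fields.
fn bc_mnemonic(bo: u32, bi: u32) -> String {
    let decr = (bo & 0b00100) == 0; // CTR is decremented and tested
    let uncond = (bo & 0b10000) != 0; // CR condition is ignored
    let cond_true = (bo & 0b01000) != 0;
    let ctr_zero = (bo & 0b00010) != 0;
    let cond = match (bi & 3, cond_true) {
        (0, true) => "lt",
        (0, false) => "ge",
        (1, true) => "gt",
        (1, false) => "le",
        (2, true) => "eq",
        (2, false) => "ne",
        (_, true) => "so",
        (_, false) => "ns",
    };
    if decr {
        let base = if ctr_zero { "bdz" } else { "bdnz" };
        // The fix: no CR-derived suffix when the condition is a don't-care.
        if uncond { base.to_string() } else { format!("{base}{cond}") }
    } else if uncond {
        "b".to_string()
    } else {
        format!("b{cond}")
    }
}
```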

Tests: extended_mnemonics.json fixture rows for bdnz/bdz now expect the
correct `ext_mnemonic: "bdnz"` / `"bdz"`.

Impact: every analysis-DB query for `bdnz` loops (common in pixel-shader
and vertex processing) was returning zero rows; matches stored as `bdnzge`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 10:40:45 +02:00
MechaCat02
3d8e2ced2e fix(cpu): PPCBUG-053+054 32-bit CTR semantics in bcx/bclrx + mtspr CTR
PPCBUG-053: bcx and bclrx tested `ctx.ctr != 0` against the full 64-bit
register, but the Xbox 360 ABI runs CTR as a 32-bit counter (canary
explicitly truncates: `f.Truncate(ctr, INT32_TYPE)`). When upstream 64-bit
GPR pollution flowed through `mtspr CTR, rN`, the upper 32 bits stayed
non-zero forever; bdnz then looped past the intended 32-bit zero point
because the 64-bit comparison still saw the high bits.

PPCBUG-054: `mtspr CTR` writeback wrote the full 64-bit GPR value,
acting as a firewall gap that fed PPCBUG-053. Defensive truncation
prevents CTR from ever acquiring non-zero upper 32 bits independently
of the GPR-pollution source.

Fixes:
- interpreter.rs:849, 879: ctr_ok now uses `(ctx.ctr as u32) != 0`
- interpreter.rs:1523: mtspr CTR writes `val as u32 as u64`

Tests:
- bcx_bdnz_uses_32bit_ctr_compare: bdnz with CTR=0x0000_0001_0000_0001
  decrements to 0x0000_0001_0000_0000 and exits (low 32 bits = 0).
- bclrx_uses_32bit_ctr_compare: same coverage for bdnzlr.
- mtspr_ctr_truncates_to_32_bits: gpr=0xFFFF_FFFF_8000_0001 → ctr=0x8000_0001.

Coupled fix per the audit: PPCBUG-053 and PPCBUG-054 land together because
either alone is necessary-but-not-sufficient — the truncation prevents new
pollution, the 32-bit compare protects against any pollution that slipped
in via routes other than mtspr (e.g. mfctr-mtctr roundtrips).
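
Both halves of the coupled fix, as a sketch with hypothetical function
names:

```rust
/// mtspr CTR: keep only the low 32 bits of the source GPR.
fn mtctr(gpr: u64) -> u64 {
    gpr as u32 as u64
}

/// One bdnz step: decrement, then test only the low 32 bits.
/// Returns (new CTR, branch taken).
fn bdnz_step(ctr: u64) -> (u64, bool) {
    let next = ctr.wrapping_sub(1);
    (next, (next as u32) != 0)
}
```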

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 10:38:18 +02:00
MechaCat02
52ece4bd86 fix(cpu): PPCBUG-424+425 vmaddfp128/vmaddcfp128 operand swap + va128 field fix
PPCBUG-424: vmaddfp128 computed VA×VB+VD instead of ISA-mandated VA×VD+VB.
PPCBUG-425: vmaddcfp128 computed VD×VB+VA instead of ISA-mandated VA×VD+VB.

Root-cause discovered while writing the operand-order regression tests:
va128() was extracting PPC bits 6-10 (the same field as vd128's low 5 bits),
not PPC bits 11-15 where VA lives in VX128 form. This meant va128() silently
aliased vd128 for any instruction where VA != VD, making the operand swap
invisible in the existing denorm-flush test (which used VA == VD == v2).

Fixes in this commit:
- decoder.rs: va128() now extracts PPC bits 11-15 (host bits 20-16) + bit29.
  The vmx128_va128_uses_bit29 test encoding updated to match the correct field.
- interpreter.rs: vmaddfp128 changed from ai.mul_add(bi,di) to ai.mul_add(di,bi)
  (VA×VD+VB). vmaddcfp128 changed from di.mul_add(bi,ai) to ai.mul_add(di,bi).
  vmaddfp128_flushes_denormal_inputs redesigned with distinct VA/VD/VB registers
  (v1/v2/v3) so the flush test is independent of the accessor fix.
  New vmaddfp128_operand_order_va_times_vd_plus_vb and
  vmaddcfp128_operand_order_va_times_vd_plus_vb tests verify 2×3+10=16.
- disasm_goldens.rs + vmx128_registers.json: vmaddfp128/vmaddcfp128/vnmsubfp128
  golden raws updated to properly encode VA at PPC bits 11-15 (new raws:
  0x146328D4 / 0x14632914 / 0x14632954). vperm128 / vsrw128 golden operands
  updated to reflect correct VA extraction (v4 instead of v3/v0).

Affects all VMX128 binary ops that call va128(): vaddfp128, vsubfp128,
vmulfp128, vmaddfp128, vmaddcfp128, vnmsubfp128, vperm128, vsrw128 etc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 10:33:24 +02:00
MechaCat02
cedee3c385 fix(cpu): PPCBUG-510 stvewx128 writes 16 bytes instead of 4
stvewx128 was aligning EA to 16 bytes and writing all 16 bytes of the
vector, corrupting 12 adjacent bytes on every call. ISA semantics:
word-align EA, extract word lane (EA & 0xF) >> 2, write 4 bytes only.

The non-128 stvewx was already correct; stvewx128 was never updated.
Mirror the stvewx body with instr.vs128() substituted for instr.rs().
The invalidate_for_write call from P1 now covers the correct word-aligned
EA rather than the over-wide 16-byte range.
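
A sketch of the word-store semantics described above, against a flat byte
slice. The function name and memory model are hypothetical stand-ins for
the interpreter's own.

```rust
/// Store one 4-byte lane of `vec` at the word-aligned effective address.
fn stvewx(mem: &mut [u8], ea: usize, vec: &[u8; 16]) {
    let word_ea = ea & !3; // word-align the EA
    let lane = (ea & 0xF) >> 2; // which of the four words to store
    mem[word_ea..word_ea + 4].copy_from_slice(&vec[lane * 4..lane * 4 + 4]);
}
```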

interpreter.rs: stvewx128 arm (~line 2984)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 10:05:37 +02:00
MechaCat02
a8c918cf9e chore(audit): mark P2 PPCBUGs applied; append P2 progress section
21 IDs (040, 046, 275, 276, 315, 360, 361, 362, 363, 369, 420, 421, 422,
423, 560, 561, 562, 563, 564, 565, 600) marked applied (52b05b1, 2026-05-01)
in audit-findings.md. P2 progress section appended to audit-report-2026-04-29.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 22:11:28 +02:00
MechaCat02
52b05b127f merge(cpu): Phase 2 decoder sweep — PPCBUG-040,046,275,276,315,360,361,362,363,369,420,421,422,423,560,561,562,563,564,565,600
All 8 batches of the P2 decoder/field-extraction sweep applied and reviewed.

Batch 1: PPCBUG-040+560 — sh64() bit order fix and rldicl test helper encoding
Batch 2: PPCBUG-046+561 — mb_md() accessor; all 6 rld* mb fields corrected
Batch 3: PPCBUG-275+276+420+421+422+423+562+600 — vc_rc_bit()/vx128r_rc_bit() Rc accessors; 13 vcmp dot-form sites; 5 decode_op6 dot-form entries
Batch 4: PPCBUG-315+563 — vrlimi128 vx128_4_z and vx128_4_imm field extraction
Batch 5: PPCBUG-361+565 — vsldoi128 vx128_5_sh field extraction
Batch 6: PPCBUG-362+564 — vpermwi128 vx128_p_perm field extraction
Batch 7: PPCBUG-360 — vperm128 vc128_2() accessor (was wrongly using vd128())
Batch 8: PPCBUG-363+369 — vpkd3d128 post-pack permutation (MakePermuteMask tables from canary)

All 201 interpreter + 6 disasm golden tests pass. Independent code review: all 9 check items OK.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 22:09:38 +02:00
MechaCat02
6b9de17925 fix(cpu): PPCBUG-363 PPCBUG-369 vpkd3d128 post-pack permutation
vpkd3d128 was storing the pack codec output directly into vd128 without
applying the MakePermuteMask permutation that merges the packed scalar(s)
into the previous register value according to pack (slot layout) and shift
(destination lane offset).

PPCBUG-363: vpkd3d128 was missing the post-pack lane-placement step.
PPCBUG-369: vpkd3d128 pack field not extracted; pack=0 still worked
  (identity), but pack=1/2/3 always wrote raw out instead of blending.

Fix: extract `pack = uimm & 3` and `shift = instr.vx128_4_z()` from the
VX128_4 IMM and z fields. For pack==0 (identity) store out directly as
before. For pack 1-3, read the existing vd128 value and select 4 u32
words from {prev, out} using the 3×4 static permutation tables from
canary ppc_emit_altivec.cc:2126-2188.

Tables derived from canary MakePermuteMask(r0,l0,…r3,l3):
  pack=1 (VPACK_32): out[3] placed at lane (3-shift), prev elsewhere
  pack=2 (64-bit):   out[2..3] placed at lanes (2-shift)..(3-shift)
  pack=3 (64-bit):   same as pack=2 except shift=3 → out[2] at lane 3
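
The pack=1 row can be sketched without the full tables. This covers only
that one case as summarized above; the function name and the lane indexing
(lane 0 first in the array) are assumptions.

```rust
/// pack=1: place out[3] at lane (3 - shift), keep prev in the other lanes.
fn blend_pack1(prev: [u32; 4], out: [u32; 4], shift: usize) -> [u32; 4] {
    let mut result = prev;
    result[3 - shift] = out[3];
    result
}
```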

Tests: vpkd3d128_pack0_legacy_unchanged, vpkd3d128_pack1_shift0_d3d_vertex_pack,
       vpkd3d128_pack1_shift3_puts_out3_at_lane0

interpreter.rs: vpkd3d128 arm (~line 3999)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 22:06:00 +02:00
MechaCat02
64e8ecbfd0 fix(cpu): PPCBUG-361 PPCBUG-565 fix vsldoi128 SH field extraction
PPCBUG-565: Add vx128_5_sh() to decoder.rs — 4-bit shift at PPC bits
22-25 (host bits 6-9). The correct MSB is at PPC bit 22 (host bit 9).

PPCBUG-361: vsldoi128 was reading the SH MSB from host bit 4 (PPC bit
27, reserved) instead of host bit 9 (PPC bit 22). All shift amounts >= 8
decoded incorrectly (e.g. shift=8 executed as shift=0). Replace the
inline bit-shuffle with instr.vx128_5_sh().

Also fix vx128_p_perm_assembles_correctly test: replace nonexistent
DecodedInstr::from_raw() calls with struct literal construction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 21:29:12 +02:00
MechaCat02
197d76c44e fix(cpu): PPCBUG-315 PPCBUG-563 fix vrlimi128 z and IMM field extraction
PPCBUG-563: Add vx128_4_imm() (PPC bits 11-15) and vx128_4_z() (PPC bits
24-25) accessors to decoder.rs for VX128_4-form instructions.

PPCBUG-315: vrlimi128 was reading z from host bits 16-17 (a subset of IMM)
and mask from host bits 2-5 (a reserved/XO region). Replace with the
correct accessors: z selects which word-lane to start the rotation from
(0-3); IMM is the 5-bit per-lane blend mask.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 21:26:26 +02:00
MechaCat02
d51b9346df fix(cpu): PPCBUG-275 276 420 421 422 423 562 600 fix vcmp Rc bit + decode dot forms
PPCBUG-562: Add vc_rc_bit() (PPC bit 21) and vx128r_rc_bit() (PPC bit 27)
to decoder.rs. The generic rc_bit() reads bit 0 (PPC bit 31); all vcmp XO
values are even so bit 0 is always 0, making CR6 permanently dead.

PPCBUG-275/276/420/421: Replace rc_bit() with vc_rc_bit() at all 8 pure
VC-form vcmp arms (vcmpequb, vcmpequh, vcmpgtub, vcmpgtsb, vcmpgtuh,
vcmpgtsh, vcmpgtuw, vcmpgtsw) and with the correct per-form accessor at
the 4 combined arms (vcmpeqfp|128, vcmpgefp|128, vcmpgtfp|128,
vcmpequw|128) and vcmpbfp|128.

PPCBUG-422: VX128_R-form 128-variants in combined arms now use
vx128r_rc_bit() instead of vc_rc_bit().

PPCBUG-423/600: Add 5 dot-form key entries to decode_op6 so
vcmp*fp128./vcmpequw128. decode as the correct opcode instead of Invalid.
Uses a 5-bit key (bits22-24 + bit25 + bit27) for dot-forms to avoid
aliasing against the shift/merge group (which sets bit25=1 when bit27=1).
Interpreter uses vx128r_rc_bit() to conditionally update CR6.
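
A minimal sketch of why the generic accessor never saw the record bit on VC-form compares. The accessor names follow the commit; `ppc_bit` and `encode_vc` are hypothetical helpers added for illustration, and only the VC-form case (Rc at PPC bit 21) is modeled.

```rust
/// Reads one bit using PowerPC numbering, where bit 0 is the most
/// significant bit of the 32-bit instruction word.
fn ppc_bit(raw: u32, n: u32) -> u32 {
    (raw >> (31 - n)) & 1
}

/// Generic record bit: PPC bit 31 (host bit 0).
fn rc_bit(raw: u32) -> bool {
    ppc_bit(raw, 31) == 1
}

/// VC-form record bit: PPC bit 21 (host bit 10).
fn vc_rc_bit(raw: u32) -> bool {
    ppc_bit(raw, 21) == 1
}

/// Hypothetical encoder for a VC-form compare (primary opcode 4, 10-bit XO).
fn encode_vc(vd: u32, va: u32, vb: u32, rc: bool, xo: u32) -> u32 {
    (4 << 26) | (vd << 21) | (va << 16) | (vb << 11) | ((rc as u32) << 10) | (xo & 0x3FF)
}
```

For `vcmpequb.` (XO=6, an even value) the generic accessor reports false even when the dot form is encoded, while `vc_rc_bit` reports true.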

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 21:15:06 +02:00
MechaCat02
75544fa9db fix(cpu): PPCBUG-046 PPCBUG-561 add mb_md() accessor; fix all 6 rld* mb fields
PPCBUG-561: Add DecodedInstr::mb_md() to decoder.rs — the correct MD-form
6-bit mask-begin reconstruction (MB[4:0] at PPC bits 21-25, MB[5] at PPC
bit 26). The disassembler already had the correct local formula; this
promotes it to a single source of truth on DecodedInstr.

PPCBUG-046: All 6 doubleword-rotate arms (rldicl, rldicr, rldic, rldimi,
rldcl, rldcr) inlined "(instr.mb() << 1) | ((instr.raw >> 1) & 1)" which
reads SH5 (host bit 1) instead of MB5 (host bit 5). For the canonical
"clrldi r3, r4, 32" zero-extend idiom (mb=32 → MB5=1, MB[4:0]=0), the
wrong formula produced mb=0, making the instruction a no-op and leaving
upper 32 bits of the GPR polluted. Replace all 6 sites with instr.mb_md().
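
The difference between the two formulas can be sketched on the raw instruction word. This assumes `instr.mb()` returns the 5-bit field at host bits 6-10; `mb_inline_buggy` and `clear_left` are hypothetical names used only for illustration.

```rust
/// MD-form mask-begin as described above: MB[4:0] at host bits 6-10
/// (PPC bits 21-25), MB[5] at host bit 5 (PPC bit 26).
fn mb_md(raw: u32) -> u32 {
    let mb_low = (raw >> 6) & 0x1F;
    let mb_high = (raw >> 5) & 1;
    (mb_high << 5) | mb_low
}

/// The old inline formula: it shifts the low field and ORs in host bit 1,
/// which is SH5, not MB5. Assumes mb() is the 5-bit field at host bits 6-10.
fn mb_inline_buggy(raw: u32) -> u32 {
    let mb = (raw >> 6) & 0x1F;
    (mb << 1) | ((raw >> 1) & 1)
}

/// rldicl with sh=0 (the clrldi idiom): clear the `mb` high-order bits.
/// Hypothetical helper; `mb` must be in 0..=63.
fn clear_left(value: u64, mb: u32) -> u64 {
    value & (u64::MAX >> mb)
}
```

For an instruction word carrying mb=32 and sh=0 (only host bit 5 set among these fields), `mb_md` yields 32 and the upper half is cleared, while the old formula yields 0 and the value passes through unchanged.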

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 21:01:03 +02:00
MechaCat02
147daa0721 fix(cpu): PPCBUG-040 PPCBUG-560 fix sh64() bit order and rldicl test helper
PPCBUG-040: decoder.rs sh64() assembled the XS-form shift amount as
(SH[4:0] << 1) | SH[5] instead of (SH[5] << 5) | SH[4:0]. Every
`sradi` with shift N ∈ 1..=62 executed with a completely wrong shift
count (e.g. shift=32 executed as shift=1).

PPCBUG-560: disasm_goldens.rs rldicl() test helper was encoding sh[5:1]
at PPC bits 16-20 and sh[0] at PPC bit 30 — exactly backwards. The wrong
encoder and wrong decoder cancelled out, hiding PPCBUG-040 from tests.
Fix both together so tests validate ISA-correct encodings.
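
A small sketch of the two assemblies, useful for checking the "shift=32 executed as shift=1" claim by hand. The field positions follow the description above (SH[4:0] at PPC bits 16-20, SH[5] at PPC bit 30); `encode_sh` is a hypothetical helper that fills in only those fields.

```rust
/// Splits out the two shift fields: SH[4:0] at host bits 11-15
/// (PPC bits 16-20) and SH[5] at host bit 1 (PPC bit 30).
fn sh_fields(raw: u32) -> (u32, u32) {
    ((raw >> 11) & 0x1F, (raw >> 1) & 1)
}

/// The old assembly from PPCBUG-040.
fn sh64_buggy(raw: u32) -> u32 {
    let (low, high) = sh_fields(raw);
    (low << 1) | high
}

/// The corrected assembly.
fn sh64_fixed(raw: u32) -> u32 {
    let (low, high) = sh_fields(raw);
    (high << 5) | low
}

/// Hypothetical encoder: places a 6-bit shift amount into the two fields.
fn encode_sh(shift: u32) -> u32 {
    ((shift & 0x1F) << 11) | (((shift >> 5) & 1) << 1)
}
```

The two formulas agree only at shift 0 and shift 63, which is why every value in 1..=62 was affected.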

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 20:54:40 +02:00
MechaCat02
ca5b90b700 fix(cpu): P1 atomicity sweep — invalidate_for_write at all store sites
Implements PPCBUG-107 cascade fix: every store opcode in the interpreter
now calls ReservationTable::invalidate_for_write(ea) when a reservation
table is active and at least one thread holds a reservation. This restores
correct lwarx/stwcx. LL/SC semantics under --parallel --reservations-table.

Batches merged:
  PPCBUG-107,140-144: stw/stwu/stwx/stwux/stwbrx
  PPCBUG-130,150:     stb/stbu/stbx/stbux/sth/sthu/sthx/sthux/sthbrx/std/stdu/stdx/stdux/stdbrx
  PPCBUG-160,167:     stmw/stswi/stswx + all FP stores (stfs/stfd families)
  PPCBUG-511-514:     16 VMX stores (stvx/stvxl/stvebx/stvehx/stvewx/stvlx/stvrx families)
  PPCBUG-151:         stwcx./stdcx. width discriminator (reservation_width: u8 in PpcContext)
  PPCBUG-108:         debug_assert + doc on legacy single-context reservation path
  Review fixes:       stswi/stswx two-line guard; dcbz/dcbz128 guards added (missed in audit)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 20:47:49 +02:00
MechaCat02
c9f194dda1 fix(cpu): review fixes — stswi/stswx two-line guard, dcbz/dcbz128 invalidate
PPCBUG-160 partial: stswi's single invalidate_for_write(ea) only covered
the first cache line; with nb up to 32, the write span can cross a 128-byte
line boundary. Replace with two-call guard:
  first_line = ea & !RESERVATION_MASK
  last_line  = ea.wrapping_add(nb - 1) & !RESERVATION_MASK
  invalidate first; if last != first, invalidate last.

PPCBUG-160 partial: stswx had the same single-call gap; nb from XER[0:6]
can be up to 127 bytes. Same two-call guard applied; wrapped in `if nb > 0`
to guard against nb==0 underflow (XER TBC field is 0 when no bytes to store).

dcbz: zeroes 32 bytes at a 32-byte-aligned EA — touches exactly one 128-byte
cache line; add canonical single-call invalidate guard (was entirely missing).

dcbz128: zeroes 128 bytes at a 128-byte-aligned EA — one full reservation
line; add canonical single-call invalidate guard (was entirely missing).
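
The two-call guard for the string stores can be sketched as a pure line computation. The value of `RESERVATION_MASK` (127, the offset bits of a 128-byte line) and the `touched_lines` helper are assumptions for illustration; the real code calls invalidate_for_write directly instead of returning a list.

```rust
/// Assumption: the mask covers the byte offset within a 128-byte line.
const RESERVATION_MASK: u32 = 127;

/// Returns the reservation lines a store of `nb` bytes at `ea` touches.
/// A span of at most 128 bytes covers one or two lines, so checking the
/// first and last byte is enough.
fn touched_lines(ea: u32, nb: u32) -> Vec<u32> {
    if nb == 0 {
        // Nothing is written; this also avoids the nb - 1 underflow.
        return Vec::new();
    }
    let first_line = ea & !RESERVATION_MASK;
    let last_line = ea.wrapping_add(nb - 1) & !RESERVATION_MASK;
    if last_line != first_line {
        vec![first_line, last_line]
    } else {
        vec![first_line]
    }
}
```

A 32-byte store at 0x1070 ends at 0x108F and therefore touches both 0x1000 and 0x1080, which the single-call version missed.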

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 20:47:32 +02:00
MechaCat02
d75c4edf67 docs(cpu): PPCBUG-108 document legacy reservation path's strict-lockstep requirement
Adds doc comments above lwarx/ldarx/stwcx./stdcx. clarifying that the
legacy per-ctx reservation path is only correct in strict lockstep
(single host thread); under --parallel the M3 scheduler must enable
the cross-thread ReservationTable before spawning a second host thread.

A debug_assert fires in the legacy stwcx./stdcx. branch if a
non-primary HW slot (hw_id != 0) takes that path — surfacing
ReservationTable-disabled misconfiguration early in debug builds.
Note: the primary slot (hw_id==0) racing other parallel slots is
not caught by the assert; that case requires the table to be enabled.

Affected:
  PPCBUG-108  legacy per-ctx reservation path cannot invalidate
              cross-thread; informational — no behavioral change

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:55:13 +02:00
MechaCat02
a107ac9ae7 fix(cpu): PPCBUG-151 add reservation_width discriminator to stwcx./stdcx.
Track lwarx vs ldarx reservation width in PpcContext as a u8 (4 = word,
8 = doubleword, 0 = none). stwcx. requires width==4; stdcx. requires
width==8. Cross-width pairs (lwarx + stdcx., ldarx + stwcx.) now fail
deterministically with CR0.EQ=0 instead of spuriously succeeding.

The width is held per-thread; the cross-thread reservation table keeps
its existing slot encoding because each host thread consults its own
ctx.reservation_width before committing.
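
A minimal single-thread sketch of the width check. Only the `reservation_width` field and its 4/8/0 encoding come from this commit; the `Ctx` struct, the `reservation_addr` field, and the function shapes are hypothetical, and the sketch assumes a store-conditional consumes the reservation whether or not it succeeds.

```rust
/// Hypothetical slice of the per-thread context.
#[derive(Default)]
struct Ctx {
    reservation_addr: u32,
    /// 4 = word (lwarx), 8 = doubleword (ldarx), 0 = none.
    reservation_width: u8,
}

fn lwarx(ctx: &mut Ctx, ea: u32) {
    ctx.reservation_addr = ea;
    ctx.reservation_width = 4;
}

fn ldarx(ctx: &mut Ctx, ea: u32) {
    ctx.reservation_addr = ea;
    ctx.reservation_width = 8;
}

/// Returns the CR0.EQ result; the reservation is cleared either way.
fn store_conditional(ctx: &mut Ctx, ea: u32, width: u8) -> bool {
    let ok = ctx.reservation_width == width && ctx.reservation_addr == ea;
    ctx.reservation_width = 0;
    ok
}

fn stwcx(ctx: &mut Ctx, ea: u32) -> bool {
    store_conditional(ctx, ea, 4)
}

fn stdcx(ctx: &mut Ctx, ea: u32) -> bool {
    store_conditional(ctx, ea, 8)
}
```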

Affected:
  PPCBUG-151  stwcx./stdcx. shared the same reservation slot without
              width discriminator; cross-width commits silently succeeded

Tests: lwarx_then_stdcx_cross_width_fails,
       ldarx_then_stwcx_cross_width_fails

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:44:48 +02:00
MechaCat02
d4e227eeab fix(cpu): PPCBUG-511 PPCBUG-512 PPCBUG-513 PPCBUG-514 add invalidate_for_write to VMX stores
Continuation of the PPCBUG-107 cascade sweep. All 16 VMX store opcodes
(stvx/stvxl, stvebx/stvehx/stvewx, stvlx/stvrx and 128 variants of each)
now invalidate the reservation table before writing.

stvlx/stvrx partial-vector stores can write at non-16-byte-aligned EAs;
they invalidate both potentially-touched cache lines.

stvewx128 currently writes 16 bytes at the wrong EA scope (PPCBUG-510);
the invalidate guard fires at that over-wide EA today and will narrow
automatically when PPCBUG-510 is fixed in P3.

Affected:
  PPCBUG-511  stvx, stvx128, stvxl, stvxl128
  PPCBUG-512  stvebx, stvehx, stvewx, stvewx128
  PPCBUG-513  stvlx, stvlx128, stvlxl, stvlxl128
  PPCBUG-514  stvrx, stvrx128, stvrxl, stvrxl128

Tests: lwarx_then_plain_stvx_invalidates_reservation,
       lwarx_then_plain_stvlx_invalidates_reservation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:36:17 +02:00
MechaCat02
af54eb28bd fix(cpu): PPCBUG-160 PPCBUG-167 add invalidate_for_write to multiple/string + FP stores
Continuation of the PPCBUG-107 cascade sweep. stmw/stswi/stswx (multiple
and string stores) and the 9 floating-point stores now invalidate the
reservation table before writing.

stmw can span two cache lines when the writeback range crosses a line
boundary; the guard iterates over all touched lines so multi-line atomic
holds the same guarantee as single-line stores.

Affected:
  PPCBUG-160  3 multiple/string stores: stmw, stswi, stswx
  PPCBUG-167  9 FP stores: stfs, stfsu, stfsx, stfsux,
                            stfd, stfdu, stfdx, stfdux, stfiwx

Tests: lwarx_then_plain_stmw_spans_two_lines_and_invalidates,
       lwarx_then_plain_stfd_invalidates_reservation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:24:46 +02:00
MechaCat02
24d347436a fix(cpu): PPCBUG-130 PPCBUG-150 add invalidate_for_write to byte/halfword/doubleword stores
Continuation of the PPCBUG-107 cascade sweep (batch 1: word stores landed
in 4538fa9). Plain stb/stbu/stbx/stbux, sth/sthu/sthx/sthux/sthbrx, and
std/stdu/stdx/stdux/stdbrx now invalidate the reservation table before
writing, so cross-thread lwarx/stwcx. atomicity holds when these widths
are written by another host thread.

Affected:
  PPCBUG-130  9 byte/halfword stores missing invalidate_for_write
                stb, stbu, stbx, stbux, sth, sthu, sthx, sthux, sthbrx
  PPCBUG-150  5 doubleword stores missing invalidate_for_write
                std, stdu, stdx, stdux, stdbrx

Tests: lwarx_then_plain_stb_invalidates_reservation,
       lwarx_then_plain_std_invalidates_reservation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:13:31 +02:00
MechaCat02
4538fa9e70 fix(cpu): PPCBUG-107 PPCBUG-140-144 add invalidate_for_write to word stores
Word stores (stw, stwu, stwx, stwux, stwbrx) now invalidate the
reservation table for the target line before writing. Without this,
plain stores by other host threads silently fail to clear reservations
held by lwarx, causing stwcx. to spuriously succeed under --parallel.

Affected:
  PPCBUG-107  ReservationTable::invalidate_for_write never called from any store
  PPCBUG-140  stw missing invalidate_for_write   (interpreter.rs:1183)
  PPCBUG-141  stwu missing invalidate_for_write  (interpreter.rs:1189)
  PPCBUG-142  stwx missing invalidate_for_write  (interpreter.rs:1195)
  PPCBUG-143  stwux missing invalidate_for_write (interpreter.rs:1201)
  PPCBUG-144  stwbrx missing invalidate_for_write (interpreter.rs:1568)

Tests: lwarx_then_plain_stw_invalidates_reservation,
       lwarx_then_stwcx_succeeds_without_intervening_store

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:57:05 +02:00
MechaCat02
bae9305982 xenia-app: observability subsystem, --parallel runtime, stress harness
observability.rs installs the tracing subscriber stack (env-filter +
JSON file appender + chrome trace + error layer) and the metrics
recorder shared by the workspace. main.rs grows the new CLI surface:
--parallel, --reservations-table, --trace-handles, --analyze=
{rust,sql,both}, xenia dis --json, --ui, plus the wiring that runs
the CPU through the new scheduler, drives the GPU's threaded backend,
and surfaces the framebuffer + HUD via xenia-ui.

Add tests/parallel_stress.rs (#[ignore]-gated long form, short form
runs 20×@5M) and tests/golden/sylpheed_n2m.json — the digest the
lockstep/parallel combos compare against.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:30:26 +02:00
MechaCat02
b1285ba560 xenia-hid + xenia-debugger: gamepad serializer; debugger fast-skip hook
xenia-hid grows a guest-facing X_INPUT_GAMEPAD writer (big-endian on
the wire, host-neutral GamepadState in memory) so XamInputGetState in
the kernel and the UI input thread share one POD snapshot type. Adds
the GUIDE button flag.

xenia-debugger gains Debugger::wants_hooks(), a single-branch fast
path the hot interpreter loop checks to skip the pre_step/post_step
HashMap+match work when the debugger is in cold-run mode (no bps, no
trace, StepMode::Run, not paused). Part of the Tier-3 perf landing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:30:03 +02:00
MechaCat02
79eb52c378 xenia-gpu: end-to-end Xenos pipeline (PM4, ucode, EDRAM, resolve)
First real GPU implementation. Ring/PM4 frontend (ring_view,
ring_drain, pm4) drains the command processor; gpu_system owns the
threaded backend (DrainFence RPC + parker/fence helpers from M1) and
the MMIO-mapped register block (mmio_region).

Xenos shader frontend: ucode/{alu,control_flow,fetch,mod}.rs decode
the Xbox 360 microcode, translator.rs lowers it onto the WGSL
xenos_interp interpreter shader (shaders/xenos_interp.wgsl).
shader_metrics.rs counts decode/translate work.

Render state: draw_state, primitive, render_target_cache,
texture_cache, tiled_address (Xenos's swizzled tiled-memory layout),
xenos_constants (register field constants), edram (the 10 MiB EDRAM
model with MSAA), and resolve.rs (TILE_FLUSH copy-out — clear-resolve
plus bitwise-equivalent 32 bpp + 64 bpp paths landed). handle.rs
owns the typed GPU-resource handles the kernel hands out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:29:38 +02:00
MechaCat02
5f0d6487ea xenia-kernel: HLE expansion, scheduler integration, audit + UI bridge
Major HLE buildout in exports.rs: KeInitializeSemaphore now seeds
count/limit, XexGet{Module,Procedure}Address use distinct
HMODULE_XBOXKRNL/HMODULE_XAM pseudo-handles with a reverse
(ModuleId,ordinal)→thunk_addr map, plus sweeping additions across
sync primitives, file I/O, semaphores, events, threads, and
allocator paths needed to advance Sylpheed past VdSwap=2.

New modules:
  - thread.rs   — ThreadRef + per-thread suspension/wake plumbing
  - interrupts.rs — IRQ delivery, pending-IRQ slots, IPI helpers
  - path.rs     — guest path normalization (D:\\, game:\\, etc.)
  - audit.rs    — --trace-handles harness backing the handle audit
  - ui_bridge.rs — kernel-side endpoint of the xenia-ui bridge
                   (input snapshots, framebuffer publish handles)

state.rs grows to own the HW-slot scheduler state, the new audit /
UI bridge handles, and the per-handle reverse maps. xam.rs and
objects.rs follow suit for the HLE additions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:29:00 +02:00
MechaCat02
f1fadb5398 xenia-vfs/xex: cache full disc tree; instrument XEX load
DiscImageDevice now walks the GDFX tree at open() and caches every
file/dir entry by full relative path; the previous root-only scan
returned ENOENT for any path under a subdirectory (dat/tables.pak,
media/x.wav). Lookups become O(n) over the cached vec.

xex::load_image gains a tracing span plus per-load metrics
(xex.load_image_ms histogram, xex.bytes_{in,out} counters) so the
observability subscriber the app installs can see decompression cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:28:32 +02:00
MechaCat02
45e15d7885 xenia-analysis: unify disasm via xenia-cpu, split ingest/analyze, add sinks
The old src/ppc.rs that re-implemented PPC formatting collapses into
a 30-line shim that delegates to xenia-cpu's single-source-of-truth
disasm. A new disasm.rs wraps the shared iterator and feeds enriched
items (analysis context: function membership, xrefs, mnemonics) into
pluggable sinks.

Sinks split: text.rs (objdump-like output), json.rs (JSONL stream
matching the new xenia dis --json mode), duckdb.rs (the analysis DB
ingest). db.rs is restructured into ingest_instructions +
write_analysis_results so a run can stop after raw ingest, and a new
target_hex column lands on the instructions table. sql_views.rs adds
five additive views layered on top of the raw tables.

Tests: assert-based JSON-fixture goldens (disasm_goldens) and a
PRAGMA-table_info schema golden (db_schema_golden) covering all
ingested tables and the SQL views.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:28:06 +02:00
MechaCat02
c36cca14f9 xenia-cpu: VMX128, FPSCR, decoder split, scheduler, decode/block caches
Split the monolithic interpreter into cohesive modules: dedicated
decoder (decoder.rs) producing 8-byte DecodedInstr; opcode tables
(opcode.rs); explicit traps (trap.rs); FPSCR helpers (fpscr.rs);
overflow/carry helpers (overflow.rs); a 4 KiB-page-versioned decode
cache and basic-block cache (block_cache.rs); and a full VMX/VMX128
implementation (vmx.rs) covering AltiVec + Xenon's 128-bit extensions.

Add the parallel-execution substrate behind --parallel: a 7-party
phaser (phaser.rs) for round-based barrier sync, ReservationTable
(reservation.rs) for guest LL/SC, and the per-HW-thread scheduler
core (scheduler.rs) that owns ThreadRefs, runqueues, and pending IRQs.

Disassembler is now the single source of truth: disasm.rs gains the
full base + extended + VMX128 mnemonic set, with golden JSON fixtures
and a disasm_goldens test suite. Add a criterion-style interpreter
bench. context.rs grows the per-thread state the new modules need
(reservation slot, FPSCR, vector regs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:27:43 +02:00
MechaCat02
e9b2b57a44 xenia-memory: interior-mutable writes, page versioning, fenced ops
Re-shape MemoryAccess so write methods take &self and rely on interior
mutability (atomics in GuestMemory, Cell in test mocks). This unblocks
the &Arc<KernelState>-only execution model the CPU/HLE crates moved to.

GuestMemory grows: per-4 KiB-page write-version counter (page_version)
that the CPU's decode cache and the texture cache observe via Acquire,
fenced 32-bit/64-bit read/write helpers (Release on writer / Acquire on
reader) that PM4_EVENT_WRITE_SHD and the matching CPU consumers use to
synchronize fence publication, and broader page-table / heap accounting
needed by the new HLE allocators.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:27:13 +02:00
MechaCat02
e2b8860e10 Add xenia-ui crate; switch analysis store to DuckDB
Workspace gains a new xenia-ui member that owns the winit/wgpu
window, the Xenos display pipeline (xenos_pipeline + render +
texture_cache_host), HUD font/blit shaders, and the input-bridge
plumbing the app uses to surface guest framebuffers and overlays.

Workspace dependencies grow accordingly: rusqlite is replaced with
duckdb (analysis pipeline now writes DuckDB stores), and tracing /
metrics / pprof / winit / wgpu / gilrs / pollster / crossbeam /
bytemuck are added at workspace level so xenia-ui and xenia-app
share versions. Cargo.lock regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:26:48 +02:00
MechaCat02
f166d061be Ignore audit reports, run logs, and per-crate target/ dirs
audit-out/ and audit-*.md are local report artifacts produced by the
PPCBUG audit pipeline; *.stdout/*.stderr/*.log are stress-harness run
captures. Switch /target/ to target/ so per-crate target dirs (e.g.
crates/xenia-app/target/ used as a stress-output sink) are also ignored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:26:10 +02:00
163 changed files with 760420 additions and 2995 deletions

18
.gitignore vendored

@@ -1,4 +1,18 @@
/target/
target/
*.iso
*.xiso
*.db
*.db
# Audit reports / pre-pass findings (local artifacts, not source)
audit-out/
audit-*.md
# Run logs from stress harnesses and ad-hoc captures
*.stdout
*.stderr
*.log
# Runtime cache artifacts (vkd3d-proton / DXVK shader caches dropped into the
# working dir by the Wine canary build)
vkd3d-proton.cache*
*.dxvk-cache

4487
Cargo.lock generated

File diff suppressed because it is too large


@@ -12,6 +12,7 @@ members = [
"crates/xenia-hid",
"crates/xenia-debugger",
"crates/xenia-analysis",
"crates/xenia-ui",
"crates/xenia-app",
]
@@ -33,10 +34,17 @@ xenia-apu = { path = "crates/xenia-apu" }
xenia-hid = { path = "crates/xenia-hid" }
xenia-debugger = { path = "crates/xenia-debugger" }
xenia-analysis = { path = "crates/xenia-analysis" }
xenia-ui = { path = "crates/xenia-ui" }
# External dependencies
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
tracing-subscriber = { version = "0.3", features = ["env-filter", "json", "registry"] }
tracing-appender = "0.2"
tracing-chrome = "0.7"
tracing-error = "0.2"
metrics = "0.24"
metrics-util = "0.19"
pprof = { version = "0.14", features = ["flamegraph", "protobuf-codec"] }
bitflags = "2"
byteorder = "1"
thiserror = "2"
@@ -44,4 +52,13 @@ anyhow = "1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
aes = "0.8"
rusqlite = { version = "0.31", features = ["bundled"] }
duckdb = { version = "1", features = ["bundled"] }
# UI / rendering / input (used by xenia-ui and xenia-app with --ui)
winit = "0.30"
wgpu = "22"
gilrs = "0.11"
pollster = "0.3"
crossbeam-utils = "0.8"
crossbeam-channel = "0.5"
bytemuck = { version = "1", features = ["derive"] }

6071
audit-findings.md Normal file

File diff suppressed because it is too large

629
audit-report-2026-04-29.md Normal file

@@ -0,0 +1,629 @@
# PPC Instruction Audit — Triaged Report (2026-04-29)
**Status**: audit complete. **No code modified.** This file is the fix-order plan for the follow-up session.
**Source of truth**: detailed bug entries (one heading per PPCBUG ID) live in `audit-findings.md`. This file references every entry by ID so nothing is lost — it does not duplicate the per-bug detail.
## Counts
- **Total findings**: 253 PPCBUG IDs, of which 5 are explicitly retracted/withdrawn (PPCBUG-220, 222, 226, 482, 483 — see Notes section).
- **Net findings**: ~248 actionable.
- **Severity breakdown** (rough):
- HIGH: ~55 (~22%)
- MEDIUM: ~75 (~30%)
- LOW (test gaps + cosmetic + informational): ~118 (~48%)
## Headline findings (most likely Sylpheed-renderer-blockers)
1. **PPCBUG-107 cascade**: `ReservationTable::invalidate_for_write` defined and unit-tested but never called from any of the **50+ store opcodes** in the interpreter. Under `--parallel`, every cross-thread atomic via `lwarx`/`stwcx.` is silently broken: spinlocks succeed without exclusion, atomic counters race, condition-variable handshakes never sync. Plausible direct cause of the 4-worker-thread renderer plateau (`project_xenia_rs_sylpheed_stage3_2026_04_29.md`). **Fix is mechanical**: one-line `if t.has_active_reservers() { t.invalidate_for_write(ea) }` before every `mem.write_*` in interpreter.rs.
2. **PPCBUG-053+054 cascade**: `bcx`/`bclrx` CTR zero-test compares all 64 bits; `mtspr CTR` writes full 64-bit GPR. Combined with PPCBUG-006 (`negx` poisons GPR upper 32) → **`neg; mtctr; bdnz` loops run forever**.
3. **8 decoder/field-extraction bugs collapse into 6 missing accessors** + 1 wrong sh64 formula + 1 missing decode_op6 dot-form entry. The disassembler already has correct local versions. Single mechanical sweep.
4. **PPCBUG-046 (`clrldi r3, r4, 32`)** — the canonical zero-extend-low-32 idiom is currently a no-op. Emitted constantly by 32-bit-ABI compilers.
5. **PPCBUG-510**: `stvewx128` corrupts 12 adjacent bytes per call.
6. **PPCBUG-424/425**: `vmaddfp128`/`vmaddcfp128` operand swap. Every D3D vertex/pixel shader using FMA with non-aliased operands gets wrong arithmetic.
7. **PPCBUG-360/363**: `vperm128` uses wrong control vector (every D3D shader swizzle); `vpkd3d128` missing post-pack permutation (canonical D3D vertex-pack `pack=1` always wrong).
8. **PPCBUG-275/420-422** — VC-form and VMX128_R-form `rc_bit()` reads bit 0 instead of bit 21/27 → **CR6 never updated for ANY VMX vector compare dot form**. Breaks every `vcmpequb. + bc CR6_all_true` early-exit loop in audio mixing, font rendering, string ops.
## Recommended fix order
The phases below are the recommended fix order for the follow-up session. Each phase is **independently mergeable**; later phases may reveal that earlier phases unblocked their symptoms (e.g. P1 by itself could be sufficient to break open the Sylpheed renderer plateau).
After each phase: `cargo test --workspace --release` (must stay at 506+ pass) AND `xenia-rs check sylpheed.iso -n 100M` (must not regress against the 2026-04-29 addis-fix baseline of `swaps=2`). The acid test is whether `draws > 0` opens after P1 or P2.
---
### Phase 1 — Cross-thread atomicity (PPCBUG-107 cascade)
**Why first**: highest confidence smoking-gun for the renderer plateau. Single, mechanical, low-risk fix. Largest leverage relative to size.
**Coupled — must land together**:
- PPCBUG-107 (root: missing call from stores)
- PPCBUG-130 (9 byte/halfword stores)
- PPCBUG-140, 141, 142, 143, 144 (5 word stores: stw/stwu/stwx/stwux/stwbrx)
- PPCBUG-150 (5 doubleword stores: std/stdu/stdx/stdux/stdbrx)
- PPCBUG-160 (3 multiple/string stores: stmw/stswi/stswx)
- PPCBUG-167 (9 FP stores)
- PPCBUG-511, 512, 513, 514 (16 VMX stores)
**Independent but related**:
- PPCBUG-151 (stwcx/stdcx reservation width discriminator) — separate fix; add `reservation_width: u8` to PpcContext.
- PPCBUG-108 (legacy per-context path: cross-thread invalidation impossible) — informational; --reservations-table mode bypasses.
**Approach** — one PR adds `if t.has_active_reservers() { t.invalidate_for_write(ea) }` before every `mem.write_*` call site. Scope:
```
mem.write_u8 / write_u16 / write_u32 / write_u64 / write_f32 / write_f64
mem.write_vec128 / write_vec128_aligned (for VMX)
```
~38 sites total. Add 1+ targeted concurrency tests (lwarx + cross-thread plain store + stwcx., expect EQ=0).
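
The guard and the suggested test shape (lwarx, then a plain store from another thread, then stwcx. expecting EQ=0) can be sketched in a single-threaded model. Only the method names `has_active_reservers` and `invalidate_for_write` come from this report; the table internals, `reserve`, `try_commit`, `store_u32`, and the 128-byte line size are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Assumption: reservations are tracked per 128-byte line.
const LINE_MASK: u32 = !127;

/// Hypothetical stand-in for the real ReservationTable.
#[derive(Default)]
struct ReservationTable {
    /// thread id -> reserved line address
    held: HashMap<u32, u32>,
}

impl ReservationTable {
    /// Models lwarx: thread `tid` reserves the line containing `ea`.
    fn reserve(&mut self, tid: u32, ea: u32) {
        self.held.insert(tid, ea & LINE_MASK);
    }

    fn has_active_reservers(&self) -> bool {
        !self.held.is_empty()
    }

    /// Drops every reservation on the line containing `ea`.
    fn invalidate_for_write(&mut self, ea: u32) {
        let line = ea & LINE_MASK;
        self.held.retain(|_, reserved| *reserved != line);
    }

    /// Models stwcx.: true (CR0.EQ=1) only if the reservation survived.
    fn try_commit(&mut self, tid: u32, ea: u32) -> bool {
        self.held.remove(&tid) == Some(ea & LINE_MASK)
    }
}

/// Plain big-endian word store with the guard placed before the write.
fn store_u32(t: &mut ReservationTable, mem: &mut [u8], ea: u32, value: u32) {
    if t.has_active_reservers() {
        t.invalidate_for_write(ea);
    }
    let i = ea as usize;
    mem[i..i + 4].copy_from_slice(&value.to_be_bytes());
}
```

Checking `has_active_reservers()` first keeps the common case (no outstanding reservations) down to a single branch per store.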
---
### Phase 2 — Decoder/field-extraction structural sweep
**Why second**: single mechanical sweep, fixes 12 distinct HIGH-severity findings, unblocks correct execution of compiler-emitted code. Disassembler already has correct local extraction logic — promote/port.
**Coupled — same commit**:
- PPCBUG-040 + PPCBUG-560 — fix `sh64()` bit order AND fix the test helper that was masking it
- PPCBUG-046 + PPCBUG-561 — promote `mb_md()` from `disasm.rs:1256` to `decoder.rs`; replace 6 inline-formula sites in interpreter.rs (rldicl/rldicr/rldic/rldimi/rldcl/rldcr)
- PPCBUG-275 + PPCBUG-276 + PPCBUG-420 + PPCBUG-421 + PPCBUG-422 + PPCBUG-562 — add `vc_rc_bit()` (PPC bit 21) and `vx128r_rc_bit()` (PPC bit 27); replace `instr.rc_bit()` at all VMX compare dot-form sites
- PPCBUG-315 + PPCBUG-563 — add `vx128_4_z()`, `vx128_4_imm()`; fix `vrlimi128`
- PPCBUG-361 + PPCBUG-565 — add `vx128_5_sh()`; fix `vsldoi128`
- PPCBUG-362 + PPCBUG-564 — add `vx128_p_perm()`; fix `vpermwi128`
- PPCBUG-423 + PPCBUG-600 — add 5 odd-key entries to `decode_op6` key4 for `vcmp*fp128.` dot forms
**Independent in this phase**:
- PPCBUG-360 — `vperm128` reads VC from `vd128()` instead of VX128_2 VC field at integer bits 6-8. Fix at the call site (or add `vx128_2_vc()` accessor).
- PPCBUG-363 + PPCBUG-369 — `vpkd3d128` missing post-pack permutation; add the `pack`/`shift` field handling per Canary.
**Test fixture updates required** (PPCBUG-560 lesson) — once `sh64()` is fixed, verify all `disasm_goldens.rs` test helpers encode shifts ISA-correctly. Don't trust the existing fixtures blindly.
---
### Phase 3 — Other HIGH bugs (single targeted fixes)
**Independent**:
- PPCBUG-510 — `stvewx128` corrupting 12 bytes per call. Direct fix: align EA to word, write only 4 bytes.
- PPCBUG-424 — `vmaddfp128` operand order: change `ai.mul_add(bi, di)` → `ai.mul_add(di, bi)`.
- PPCBUG-425 — `vmaddcfp128` operand order similarly.
- PPCBUG-053 + PPCBUG-054 — `bcx`/`bclrx` CTR zero-test (32-bit) + `mtspr CTR` truncation (defensive firewall). Coupled.
- PPCBUG-640 — `fmt_bc` spurious condition suffix on pure `bdnz`/`bdz`. Port the `fmt_bclr` pattern.
- PPCBUG-641 — `lwsync` shows as `sync` in disassembler (re-assessment of PPCBUG-088). Same fix.
---
### Phase 4 — 32-bit ABI writeback truncation sweep
**Why this phase**: cross-cutting, mechanical. Once ALL writebacks truncate via `as u32 as u64`, the systemic 32-bit-ABI invariant is restored and most CR0/CA helper-correctness concerns become moot.
#### 4a — Active poisoning (every execution corrupts GPR upper bits)
These bugs corrupt GPR upper bits **regardless** of whether upstream sources are clean — typically because the implementation applies Rust's `!u64` (full 64-bit NOT) somewhere:
- PPCBUG-006 (negx — `(!ra).wrapping_add(1)`)
- PPCBUG-008 (subfex — `(!ra).wrapping_add(rb).wrapping_add(ca)`)
- PPCBUG-018 (subfzex)
- PPCBUG-019 (subfmex)
- PPCBUG-028 (orcx — `rs | !rb`)
- PPCBUG-029 (norx — `!(rs | rb)` — the canonical `not` mnemonic, hot path)
- PPCBUG-030 (nandx)
- PPCBUG-031 (eqvx — `!(rs ^ rb)` — common `eqv rA, rA, rA` set-to-all-ones)
- PPCBUG-033 (andcx via `!rb`)
- PPCBUG-034 (extsbx — `as i8 as i64 as u64`)
- PPCBUG-035 (extshx)
#### 4b — Same-shape-as-addis (latent under clean inputs, active when upstream is poisoned)
- PPCBUG-001 (addi), PPCBUG-002 (addic), PPCBUG-003 (addicx), PPCBUG-005 (subficx), PPCBUG-007 (subfcx CA), PPCBUG-008 (subfex CA — also in 4a)
- PPCBUG-004 (mulli), PPCBUG-009 (mullwx)
- PPCBUG-010 + PPCBUG-011 (divwx writeback + CR0 — **must land together**, not independently)
- PPCBUG-041 + PPCBUG-042 + PPCBUG-043 (srawx/srawix writeback + CR0 coupling — **must land together**)
- PPCBUG-095, 096, 097, 098 (lha/lhax/lhau/lhaux halfword sign-extension)
- PPCBUG-105 (lwa/lwax/lwaux — note: 64-bit-mode-only; less common in 32-bit-ABI binaries)
#### 4c — Latent writeback (only triggers if 4a/4b are unfixed)
These can be fixed in the same sweep but won't fire under clean inputs:
- PPCBUG-012, 013, 014, 015, 016, 017 (addx/addcx/addex/addzex/addmex/subfx)
- PPCBUG-032 (andx/orx/xorx)
#### 4d — CR0 32-bit-ABI compare (cross-cutting catch-all)
PPCBUG-020 documents the catch-all; the per-opcode locations are referenced from there:
- PPCBUG-020 (catch-all in groups 2-5)
- PPCBUG-023 (andisx)
- PPCBUG-024 (rlwinmx), PPCBUG-025 (rlwimix), PPCBUG-026 (rlwnmx)
- PPCBUG-036 (extsbx), PPCBUG-037 (extshx) — **must land with PPCBUG-034/035**
- PPCBUG-044 (slwx/srwx)
**Fix shape** — at every Rc=1 path, change `update_cr_signed(0, result as i64)` to `update_cr_signed(0, result as u32 as i32 as i64)`. Once 4a/4b/4c land, both forms become equivalent and 4d becomes belt-and-suspenders (still recommended for resilience).
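
A short sketch of the two halves of this phase, using `negx` as the 4a example and the sign test as the 4d example. The function names are hypothetical; the expressions are the ones quoted in this report.

```rust
/// Current negx shape from PPCBUG-006: a full 64-bit NOT plus one.
fn neg_poisoning(ra: u64) -> u64 {
    (!ra).wrapping_add(1)
}

/// Same operation with the 32-bit-ABI writeback truncation applied.
fn neg_truncated(ra: u64) -> u64 {
    neg_poisoning(ra) as u32 as u64
}

/// CR0.LT as computed from the full 64-bit value.
fn cr0_lt_64(result: u64) -> bool {
    (result as i64) < 0
}

/// CR0.LT as computed from the low 32 bits only.
fn cr0_lt_32(result: u64) -> bool {
    ((result as u32 as i32) as i64) < 0
}
```

Negating a clean 32-bit -5 (`0x0000_0000_FFFF_FFFB`) yields `0xFFFF_FFFF_0000_0005` without truncation, so a 64-bit CTR zero-test sees a huge count where the guest expects 5. Conversely, a clean `0x8000_0000` is negative to a 32-bit guest but positive under the 64-bit compare.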
---
### Phase 5 — FPU correctness (graphics middleware impact)
#### 5a — Round-to-int and FPSCR.RN
- PPCBUG-221 + PPCBUG-227 (`round_to_i64` NearestEven broken near 2^52 — must land together; `round_to_i32` delegates)
- PPCBUG-201 (FPSCR.RN not honored for double arithmetic)
- PPCBUG-432 (vrfin/vrfin128 round-half-away-from-zero vs round-to-nearest-even)
#### 5b — VXISI / NaN / SNaN handling for FMA family
- PPCBUG-181, 182 (single fmaddsx/fmsubsx/fnmaddsx/fnmsubsx VXISI)
- PPCBUG-202, 203, 204 (double fmaddx/fmsubx/fnmaddx/fnmsubx VXISI — esp. 203 hot for Newton-Raphson)
- PPCBUG-183, 205 (fnmadd/fnmsub Rust unary `-` flips NaN sign — fix: skip negation on NaN)
- PPCBUG-186 (SNaN priority for FMA)
- PPCBUG-128 (lfs SNaN quietening — bit-manipulation widening helper needed)
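The "skip negation on NaN" fix for PPCBUG-183/205 can be sketched in a few lines. `negate_unless_nan` is a hypothetical name; the real fnmadd/fnmsub arms may structure this differently.

```rust
/// Negate a fused multiply-add result, but leave NaNs untouched so the
/// sign bit of the propagated NaN is not flipped (hypothetical helper).
pub fn negate_unless_nan(x: f64) -> f64 {
    if x.is_nan() {
        x
    } else {
        -x
    }
}
```

Comparing `to_bits()` before and after is the simplest way to check that a NaN's sign survives, since NaN never compares equal to itself.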
#### 5c — Inexact / FPSCR exception bits
- PPCBUG-180 (single XX/FR/FI never set), PPCBUG-200 (double XX/FR/FI never set)
- PPCBUG-223 (fcmpo VXSNAN/VXVC), PPCBUG-224 (fcfidx XX), PPCBUG-225 (frspx XX/FR/FI), PPCBUG-229 (fctidx/fctidzx XX/FX), PPCBUG-230 (fctiwx/fctiwzx XX/FX), PPCBUG-231 (frspx SNaN host dependency)
- PPCBUG-165 + PPCBUG-166 + PPCBUG-168 (stfs* FPSCR + RN + SNaN)
#### 5d — Subnormal flush (FPSCR.NI / VSCR.NJ)
- PPCBUG-185 (FPU NI subnormal flush not modeled)
- PPCBUG-435, 436, 437 (VMX NJ subnormal flush — vaddfp/vsubfp/vmulfp128, vmsum3fp128/vmsum4fp128 product intermediates, vmaddfp/vmaddfp128/vmaddcfp128/vnmsubfp128 outputs)
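A minimal sketch of a subnormal flush for one `f32` lane follows. It assumes the flush produces a zero that keeps the sign of the subnormal value; that sign choice and the `flush_subnormal` name are assumptions for illustration, not confirmed xenia-rs or Canary behavior.

```rust
/// Replace a subnormal f32 with a zero of the same sign; every other
/// value (normals, zeros, infinities, NaNs) passes through unchanged.
pub fn flush_subnormal(x: f32) -> f32 {
    if x.is_subnormal() {
        // Keep only the sign bit, which yields +0.0 or -0.0.
        f32::from_bits(x.to_bits() & 0x8000_0000)
    } else {
        x
    }
}
```

Applying the helper per lane to inputs, product intermediates, or outputs corresponds to the three flush points listed for PPCBUG-435, 436, and 437 respectively.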
#### 5e — Estimate precision (vs hardware ~12-bit)
- PPCBUG-184 (fres)
- PPCBUG-428..431 (vrefp, vrsqrtefp, vexptefp, vlogefp — same shape as fres)
#### 5f — VMX float compares + saturation
- PPCBUG-426, 427 (vnmsubfp/vnmsubfp128 double-rounding)
- PPCBUG-433 (vctsxs/vcfpsxws128 NaN saturate to INT_MIN)
---
### Phase 6 — Other MEDIUM correctness
- PPCBUG-021 (overflow.rs OE checks at bit 63 — sub-register ops; partly covered by P4)
- PPCBUG-022 (`mulld_ov` missing INT_MIN × -1)
- PPCBUG-027 (rlwimix upper-32 ISA-deviation — auto-resolves once P4 lands)
- PPCBUG-039 (cntlzdx 32-bit-ABI counts upper-zero — only matters if emitted)
- PPCBUG-063 (trap pc-after-advance)
- PPCBUG-064 (sc LEV field)
- PPCBUG-065 (twi 31, r0, IMM typed-trap — relevant to Sylpheed C++ throw work, see `project_xenia_rs_sylpheed_throw_2026_04_28.md`)
- PPCBUG-068 (mcrfs VX summary recomputation)
- PPCBUG-078 (mtmsrd L=1 partial MSR-write)
- PPCBUG-080 (mfvscr zero upper 96 bits)
- PPCBUG-123 + PPCBUG-124 + PPCBUG-161 + PPCBUG-566 (XER TBC for lswx/stswx — coupled; add `xer_tbc: u8` to PpcContext, wire into xer()/set_xer(); enables lswx and stswx)
- PPCBUG-125 (lmw RA-in-destination skip)
- PPCBUG-126 + PPCBUG-162 (lswi/stswi `instr.rb()` → `instr.nb()`)
- PPCBUG-487 + PPCBUG-495 (vsum* operand naming)
- PPCBUG-515 (lvebx/lvehx/lvewx vs Canary divergence — document; xenia-rs is more ISA-faithful)
- PPCBUG-516 (lvsr sh=0 case — add comment + debug_assert)
- PPCBUG-601 (decode_op6 overlapping windows — document the invariant)
- PPCBUG-642 (fmt_bcctr extended forms)
- PPCBUG-643 + PPCBUG-644 (SIMM/D-form decimal vs hex — alignment with Canary disassembly)
- PPCBUG-367 (vupkhpx/vupklpx channel replication vs zero-extend)
- PPCBUG-368 (vpkpx pack_pixel_555 channel assignment unverified)
- PPCBUG-366 (vspltisb/vspltish sign-extension idiom — fragile, not wrong)
---
### Phase 7 — Frozen-snapshot drift (separate sweep)
8 opcodes' frozen snapshots in `ppc-manual/<cat>/<op>.md` differ from live code:
- PPCBUG-066 (td/tdi/tw/twi)
- PPCBUG-117 (ldarx)
- PPCBUG-145 (stwcx)
- PPCBUG-560 (already-listed: rldicl test helper bit-order)
- Plus the implicit drift in addicx (PPCBUG-003), andisx (PPCBUG-023), cmp/cmpi (PPCBUG-050), extsbx/extshx (PPCBUG-036/037, PPCBUG-032 in batch 1)
**Recommendation**: regenerate frozen snapshots from current code for the entire ppc-manual after Phases 1-4 land. Add a CI check that compares snapshots vs live code on every PR.
---
### Phase 8 — Test gap closure (broad)
A separate PR per group is overkill; bundle test additions with each Phase 1-6 PR (test the bug being fixed). The remaining LOW IDs are pure test-gap entries:
- PPCBUG-045 (shift), 047 (rld), 055 (branch), 067 (trap+sc), 070 (CR logical)
- PPCBUG-081, 082, 083, 084, 085 (SPR/MSR/TB/FPSCR/VSCR moves), 089 (cache+sync)
- PPCBUG-091 (lbz), 100 (lha), 109, 110, 111 (lwa/lwbrx/lwarx), 118 (ld), 127 (lmw/lswi/lswx), 129 (lfs/lfd)
- PPCBUG-132 (stb/sth), 146, 147 (stw/stwcx), 153 (std/stdcx), 163 (stmw/stswi/stswx), 171 (stfs/stfd)
- PPCBUG-187 (FPU single), 208 (FPU double), 228 (FPU misc convert)
- PPCBUG-240 (VMX add/sub), 243 (VMX sat helpers)
- PPCBUG-277, 278, 279 (VMX compare/min/max/avg)
- PPCBUG-316, 317, 320, 321, 322, 323, 324, 325 (VMX shift/rotate/logical)
- PPCBUG-370, 371, 372, 373, 374, 375, 376, 377, 378 (VMX permute/pack)
- PPCBUG-438, 439, 440 (VMX float compare/round/convert)
- PPCBUG-490, 491, 492, 493, 494 (VMX multiply-sum)
- PPCBUG-517, 518, 519 (VMX load/store)
- PPCBUG-567 (decoder accessors)
- PPCBUG-604 (decoder dispatch tables)
- PPCBUG-649, 650, 652 (golden fixtures for branches/VMX128)
---
## Notes & administrative
### Withdrawn / retracted
- **PPCBUG-220** — `fctiwx` strict-`>` threshold actually correct (`i32::MAX` exactly representable in f64). Retracted by group-31 subagent.
- **PPCBUG-222** — `fctidx` positive-overflow sentinel `0x7FFF_FFFF_FFFF_FFFF` is the correct ISA value. Retracted.
- **PPCBUG-226** — FPRF 5-bit codes for fcmpu/fcmpo are correct per PowerISA. Retracted.
- **PPCBUG-482** — `vmhaddshs` shift `>>15` is correct per spec snapshots. Retracted.
- **PPCBUG-483** — `vmhraddshs` shift `>>15` is correct per spec snapshots. Retracted.
### Wontfix / informational (not retracted but no fix needed)
- **PPCBUG-038** — extswx ISA-correct, intentional 64-bit sign-extension. Document the asymmetry with extsb/extsh after PPCBUG-034/035 land.
- **PPCBUG-090, 099, 152** — invalid-form (rD==rA) silently destroys load/store result. Per ISA: undefined behavior. No compiler emits these; matches Canary. Optional `debug_assert!`.
- **PPCBUG-106, 115, 131, 169, 170, 206, 207, 318, 319, 364, 365, 434, 651, 653, 645, 646, 648** — informational confirmations that the implementation is correct, no change needed.
- **PPCBUG-069** — test comment OX(so)=0 is wrong but the assert is correct.
- **PPCBUG-602, 603, 605** — undocumented decoder dispatch quirks; correct but should add comments.
- **PPCBUG-647, 654** — disassembler edge-case behavior on invalid encodings; not-a-bug for valid input.
### Coupling matrix (must-land-together)
| Group | IDs | Reason |
|---|---|---|
| divwx | 010, 011 | Quotient zero-extension changes the CR0 sign view |
| srawx/srawix | 041, 042, 043 | Writeback truncation invalidates the CR0 view |
| extsbx/extshx | 034+036, 035+037 | Same coupling shape as srawx |
| sh64 | 040, 560 | Test helper is wrong in the inverse direction |
| mb_md sweep | 046, 561 | Promote disasm.rs accessor first |
| VC-form Rc | 275, 276, 420, 421, 562 | All consume the same new accessor |
| VMX128_R Rc | 422, 562 | Same accessor sweep |
| vrlimi128 | 315, 563 | Field accessor + caller fix |
| vsldoi128 | 361, 565 | Field accessor + caller fix |
| vpermwi128 | 362, 564 | Field accessor + caller fix |
| vcmp*fp128. | 423, 600 | decode_op6 odd keys + opcode mapping |
| XER TBC | 123, 124, 161, 566 | Add field, wire xer()/set_xer(), enables lswx/stswx |
| round_to_i64 | 221, 227 | round_to_i32 delegates |
| stfs FPSCR | 165, 166, 168 | Single fix shape covers all three |
### Dependency on the addis fix
The addis fix (`project_xenia_rs_addis_signext_root_cause_2026_04_29.md`) is already in place. Phase 4 generalizes that fix systematically; without it, the writeback-truncation invariant would still be incomplete.
### Anticipated impact on the Sylpheed renderer plateau
Strong candidates for direct cause of the plateau:
- **PPCBUG-107** — broken atomics. Workers wait forever on never-signaled events; classical broken-spinlock symptom.
- **PPCBUG-053+054** — broken `bdnz` loops; could explain workers parked indefinitely.
- **PPCBUG-046 (`clrldi r3, r4, 32`)** — pollution propagation in 32-bit ABI; could break any pointer-clean-up sequence.
After applying Phase 1 alone, run `xenia-rs check sylpheed.iso -n 4B --parallel` and check whether `draws > 0`. If yes, the plateau was atomics; if no, proceed to P2/P3.
---
## Progress log
### P1 — Cross-thread atomicity sweep (merged 2026-05-01, HEAD ca5b90b)
**PPCBUGs fixed**: 107, 130, 140, 141, 142, 143, 144, 150, 160, 167, 511, 512, 513, 514, 151, 108. Plus review-fix additions: dcbz, dcbz128, stswi two-line, stswx two-line (merged in review-fix commit c9f194d).
**Gate results**:
- `cargo test --workspace --release`: 449 passed, 0 failed
- `-n 100M` lockstep: swaps=2, clean
- `-n 100M --parallel --reservations-table`: swaps=2, clean
- **Acid test** `-n 4B --parallel --reservations-table`: swaps=2, draws=**0**, no RtlRaiseException, no panics
**Conclusion**: P1 did NOT unblock the Sylpheed renderer. `draws` remains 0. The renderer plateau is not caused by broken cross-thread atomics alone. Proceeding to P2 (decoder/field-extraction sweep). The strongest remaining candidate per the plan is PPCBUG-046 (`clrldi r3, r4, 32` no-op).
---
### P2 — Decoder/field-extraction structural sweep (merged 2026-05-01, HEAD see `git log master --oneline -1`)
**PPCBUGs fixed**: 040, 046, 275, 276, 315, 360, 361, 362, 363, 369, 420, 421, 422, 423, 560, 561, 562, 563, 564, 565, 600.
**Batches**:
- Batch 1: PPCBUG-040+560 — sh64() bit-order fix (XS-form SH split) + rldicl test helper encoding
- Batch 2: PPCBUG-046+561 — mb_md() accessor; all 6 rld* MB fields corrected (clrldi was a no-op)
- Batch 3: PPCBUG-275+276+420+421+422+423+562+600 — vc_rc_bit()/vx128r_rc_bit() Rc accessors; 13 vcmp interpreter sites; 5 decode_op6 dot-form entries
- Batch 4: PPCBUG-315+563 — vrlimi128 vx128_4_z/imm field extraction
- Batch 5: PPCBUG-361+565 — vsldoi128 vx128_5_sh field extraction
- Batch 6: PPCBUG-362+564 — vpermwi128 vx128_p_perm field extraction
- Batch 7: PPCBUG-360 — vperm128 vc128_2() accessor (was erroneously vd128())
- Batch 8: PPCBUG-363+369 — vpkd3d128 post-pack permutation (MakePermuteMask tables from canary)
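To make Batches 1 and 2 concrete, here is a sketch of the split 6-bit fields in MD/XS-form instructions, using the accessor names from the batch list. The free-function signatures over a raw `u32` are an assumption; the real accessors presumably live on the instruction type.

```rust
/// MD-form mask-begin: low five bits come from PPC bits 21-25,
/// the high bit from PPC bit 26 (host bit 5).
pub fn mb_md(word: u32) -> u32 {
    ((word >> 6) & 0x1F) | (((word >> 5) & 1) << 5)
}

/// MD/XS-form shift amount: low five bits come from PPC bits 16-20,
/// the high bit from PPC bit 30 (host bit 1).
pub fn sh64(word: u32) -> u32 {
    ((word >> 11) & 0x1F) | (((word >> 1) & 1) << 5)
}
```

For `clrldi r3, r4, 32` (that is, `rldicl r3, r4, 0, 32`, encoded as `0x78830020`), a decoder that reads only the five-bit portion sees MB=0, which makes the instruction a no-op, as PPCBUG-046 describes; including the split high bit yields MB=32.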
**Gate results**:
- `cargo test --workspace --release`: 201 (cpu) + 6 (disasm goldens) + 144 + 76 + 16 + 8 + … passed, 0 failed
- Independent code reviewer: all 9 check items OK
- `-n 100M` lockstep smoke: ISO not available in CI environment; last known good at P1 HEAD was swaps=2
- **Acid test** `-n 4B --parallel --reservations-table`: pending (ISO not in CI environment)
**Conclusion**: All P2 fixes applied and reviewed. Decoder field extraction is now correct for all audited VMX128 and MD/XS-form instructions. Whether P2 unblocks the renderer (`draws > 0`) requires the sylpheed.iso acid test on the user's machine. PPCBUG-046 (clrldi no-op fix) was the highest-probability P2 renderer-unblock candidate. Next: P3 — isolated HIGH bugs (PPCBUG-510, 424/425, 053+054, 640, 641).
---
### P3 — Isolated HIGH bugs (merged 2026-05-02, HEAD f3ebaba)
**PPCBUGs fixed**: 053+054 (coupled CTR 32-bit), 424+425 (vmaddfp128/vmaddcfp128 operand swap), 510 (stvewx128 corruption), 640+650 (bdnz/bdz suffix), 641+649 (sync/lwsync), **700 (NEW)**.
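The coupled CTR fix (053+054) can be pictured with the sketch below. It assumes CTR is held in a `u64` and that both the decrement and the zero test operate on the low 32 bits, as the 32-bit-ABI framing of these bugs suggests; `bdnz_taken` is a hypothetical helper, not the actual `bcx` handler.

```rust
/// Decrement CTR as a 32-bit counter and report whether a `bdnz`
/// branch would be taken. Upper-half garbage is discarded on writeback.
pub fn bdnz_taken(ctr: &mut u64) -> bool {
    let next = (*ctr as u32).wrapping_sub(1);
    *ctr = next as u64; // truncated writeback, mirroring the mtspr CTR fix
    next != 0
}
```

With polluted upper bits such as `0xFFFF_FFFF_0000_0001`, a 64-bit compare would see a nonzero counter after the decrement and keep looping; the 32-bit view terminates the loop.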
**Batches**:
- Batch 1: PPCBUG-510 — stvewx128 16-byte corruption fixed (word-align EA, extract lane, write 4 bytes)
- Batch 2: PPCBUG-424+425 + PPCBUG-700 partial (va128 PPC[11-15] partial fix) — vmaddfp128/vmaddcfp128 operand swap to VA*VD+VB
- Batch 3: PPCBUG-053+054 — bcx/bclrx 32-bit CTR compare + mtspr CTR truncation
- Batch 4: PPCBUG-640+650 — fmt_bc spurious bdnzge/bdzge suffix gated on `!uncond`
- Batch 5: PPCBUG-641+649 — sync/lwsync L-field disambiguation
- Phase review fix: **PPCBUG-700 (NEW)** — VMX128 register accessors (va128/vb128/vd128/vx128r_rc_bit) rewritten to canary's bitfield positions. Audit's "confirmed-clean" line-2958 assessment was based on miscounting LSB-first packed C++ bitfields. Per canary (`xenia-canary/src/xenia/cpu/ppc/ppc_decode_data.h:484-663`):
- VA128 = PPC[11-15] | PPC[26]<<5 | PPC[21]<<6 (3 fields, 7 bits)
- VB128 = PPC[16-20] | PPC[30-31]<<5
- VD128 = PPC[6-10] | PPC[28-29]<<5
- VX128_R Rc = PPC[25] (host bit 6) — NOT PPC[27] as PPCBUG-422 prescribed
Affects 30+ VMX128 opcodes; production game code with VR>=32 was silently mis-decoded. Speculative `key4_dt` dot-form dispatch in `decode_op6` removed (canary has no separate dot-form opcodes for VX128_R). New PPCBUG-700 entry added to `audit-findings.md` Phase C4 invalidating audit line 2958.
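The bit positions listed above translate into the following sketch. The accessor names match the ones in the text, but the free-function signatures and the `ppc_bits` helper are assumptions made for illustration; PPC bit 0 is the most significant bit of the instruction word.

```rust
/// Extract PPC bits `first..=last` (big-endian numbering, bit 0 = MSB).
fn ppc_bits(word: u32, first: u32, last: u32) -> u32 {
    let width = last - first + 1;
    (word >> (31 - last)) & ((1u32 << width) - 1)
}

pub fn va128(word: u32) -> u32 {
    ppc_bits(word, 11, 15) | (ppc_bits(word, 26, 26) << 5) | (ppc_bits(word, 21, 21) << 6)
}

pub fn vb128(word: u32) -> u32 {
    ppc_bits(word, 16, 20) | (ppc_bits(word, 30, 31) << 5)
}

pub fn vd128(word: u32) -> u32 {
    ppc_bits(word, 6, 10) | (ppc_bits(word, 28, 29) << 5)
}

/// VX128_R record bit: PPC[25], which is host bit 6.
pub fn vx128r_rc_bit(word: u32) -> bool {
    ppc_bits(word, 25, 25) != 0
}
```

Any register number of 32 or above depends on the split high bits, which is why a decoder reading only the five-bit low field mis-decodes code that uses the extended VMX128 registers.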
**Gate results**:
- `cargo test --workspace --release`: **470 passed, 0 failed** (up from 467 baseline at P3 start; 3 new CTR regression tests added)
- Independent code reviewer: 1 BLOCKING issue (PPCBUG-700 above) — addressed before merge
- `-n 100M` lockstep smoke: ISO not in CI; checked locally during development
- **Acid test** `-n 4B --parallel --reservations-table`: **deferred to end of all phases** per user direction
**Conclusion**: All P3 fixes applied + reviewed + reviewer's blocking concern resolved. Phase 3 also produced one HIGH discovery (PPCBUG-700) that the audit had missed. Total fixes: 6 commits, 7 distinct PPCBUG groups. Next: P4 — 32-bit ABI writeback truncation sweep, ~30 IDs across 4a-4d sub-sections.
---
### P4 — 32-bit ABI writeback truncation sweep (merged 2026-05-02, HEAD d945aea)
**PPCBUGs fixed**: ~43 IDs across the 4a/4b/4c/4d sub-sections.
- 4a active poisoning: 006 (negx), 008 (subfex), 018 (subfzex), 019 (subfmex), 028 (orcx), 029 (norx), 030 (nandx), 031 (eqvx), 033 (andcx)
- 4a/4d coupled: 034+035+036+037 (extsbx/extshx writeback + CR0)
- 4b immediate ALU: 001 (addi), 002 (addic), 003 (addicx), 004 (mulli), 005 (subficx), 007 (subfcx CA)
- 4b mul/div + srawx coupled: 009 (mullwx), 010+011 (divwx + CR0), 041+042+043 (srawx/srawix + CR0)
- 4b loads: 095-098 (lha/lhax/lhau/lhaux), 105 (lwa/lwax/lwaux)
- 4c latent: 012-017 (addx/addcx/addex/addzex/addmex/subfx), 032 (andx/orx/xorx CR0)
- 4d CR0 catch-all: 020 (in mulhwx/mulhwux/divwux/andx/orx/xorx/cntlzwx etc.), 023 (andisx), 024 (rlwinmx), 025 (rlwimix), 026 (rlwnmx), 044 (slwx/srwx)
**Batches**:
- Batch 1 (e18a0a4): 4a active poisoning NOT/SUB family — 9 PPCBUGs
- Batch 2 (145a7a4): 4a/4d coupled extsbx+extshx+CR0 — 4 PPCBUGs (must land together)
- Batch 3 (bf8208e): 4b immediate ALU — 6 PPCBUGs
- Batch 4 (82a9bff): 4b mul/div + srawx coupled — 6 PPCBUGs (two coupling groups)
- Batch 5 (20a730d): 4b halfword + lwa loads — 5 PPCBUGs
- Batch 6 (16993bb): 4c latent + 4d CR0 catch-all — ~13 PPCBUGs
- Review-fix (49103bb): subfx/subfcx OE predicate + mulli test rigor
**Phase invariants restored**: every 32-bit ABI GPR write zero-extends from a u32 result, every CR0 update views the result as i32, every CA bit comes from a 32-bit unsigned compare. Downstream 64-bit unsigned compares (the addis-incident shape) can no longer be fed polluted upper bits from any of the 40+ touched ALU sites. The frozen-snapshot drift detected in PPCBUG-003 (addicx CR0) and PPCBUG-023 (andisx CR0) is also resolved.
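Two of these invariants, zero-extended writeback and a carry taken from the 32-bit unsigned operation, can be sketched as below. Both helper names are hypothetical and the functions only model the arithmetic, not the interpreter's register plumbing.

```rust
/// 32-bit add with carry-out: operands are truncated to u32, the result
/// is zero-extended, and CA is the carry out of bit 31.
pub fn addc32(ra: u64, rb: u64) -> (u64, bool) {
    let (sum, carry) = (ra as u32).overflowing_add(rb as u32);
    (sum as u64, carry)
}

/// 32-bit `eqv`: the NOT is applied to a u32, so the upper half of the
/// written register stays zero instead of becoming all ones.
pub fn eqv32(rs: u64, rb: u64) -> u64 {
    !((rs as u32) ^ (rb as u32)) as u64
}
```

`eqv rA, rA, rA` is the case called out under PPCBUG-031: a 64-bit NOT would write `0xFFFF_FFFF_FFFF_FFFF`, whereas the truncated form writes `0x0000_0000_FFFF_FFFF`.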
**Review findings**:
- BLOCKING issue caught: subfx and subfcx OE handlers in batch 6 still used the legacy `sum_overflow_64` helper. The helper compares the 32-bit `true_diff` against a u64 view of the result; any legitimate i32::MIN result (bit 31 set) spuriously triggered OV=1. Fixed in 49103bb with two new discriminating regression tests.
- Minor caught: `mulli_overflow_wraps_to_32` rubber-stamped — both pre/post fix wrote 0 for the chosen inputs. Redesigned to use polluted-upper-bits inputs that genuinely discriminate.
**Gate results**:
- `cargo test --workspace --release`: **494 passed, 0 failed** (up from 470 at P3 merge; 24 new regression tests across the batches)
- 64-bit ABI ops verified untouched: rldicl/rldicr/rldic/rldimi/rldcl/rldcr, sldx/srdx/sradx/sradix, mulhdx/mulhdux/mulldx, divdx/divdux, cntlzdx, extswx
- **Acid test** `-n 4B --parallel --reservations-table`: deferred per user direction
**Conclusion**: P4 is the largest ABI-correctness sweep of the audit. The systemic invariant is restored. Next: P5 — FPU correctness (~30 IDs).
---
### P5 — FPU correctness (merged 2026-05-02, HEAD d39d0ba)
**PPCBUGs fixed**: 21 IDs across the 5a-5f sub-sections.
- 5a (round-to-int): 221+227 (round_to_i64 NearestEven near 2^52, coupled), 432 (vrfin round-to-even)
- 5b (FMA VXISI + NaN sign): 181, 182 (single fmaddsx/fmsubsx VXISI), 202, 203 (double fmaddx/fmsubx/fnmaddx/fnmsubx VXISI), 183, 205 (NaN sign preservation in fnmaddx/fnmsubx and *sx siblings)
- 5c (XX-on-inexact): 223 (verified already correct), 224 (fcfidx XX), 225 (frspx XX), 229 (fctidx/fctidzx XX), 230 (fctiwx/fctiwzx XX)
- 5d (subnormal flush): 435 (vaddfp/vsubfp/vmulfp128 missing flush), 436 (vmsum3fp128/vmsum4fp128 per-product flush), 437 (vmaddfp family output flush)
- 5e (estimate precision): 184 (fresx canary parity via f32 input quantization)
- 5f (saturation + single-FMA): 426 (vnmsubfp single FMA), 427 (vnmsubfp128 single FMA), 433 (vctsxs NaN→INT_MIN)
**Batches**:
- Batch 1 (f6a444b): 5a round-to-int + vrfin
- Batch 2 (26b9897): 5b FMA — new `check_invalid_fma_add` helper in fpscr.rs derives VXISI from input properties
- Batch 3 (49bf74f): 5c XX bit on conversions
- Batch 4 (538fa5a): 5d VSCR.NJ unconditional flush (matches Canary; Xbox 360 always boots NJ=1)
- Batch 5 (6ba8f83): 5e fresx pre-quantize input
- Batch 6 (6fe2cbf): 5f single-FMA + vctsxs NaN
- Review-fix nit (05f2f72): vrfin → stdlib `f32::round_ties_even()`
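Deriving VXISI from input properties, as Batch 2 describes, might look like the sketch below for the `fmadd` shape `a*b + c`. This is an illustrative guess at the idea, not the real `check_invalid_fma_add`, and like that helper it does not cover the case where a finite product overflows to infinity.

```rust
/// Returns true when `a*b + c` is an infinity-minus-infinity case:
/// the product is infinite and `c` is an infinity of the opposite sign.
pub fn is_vxisi_fmadd(a: f64, b: f64, c: f64) -> bool {
    if a.is_nan() || b.is_nan() || c.is_nan() {
        return false; // NaN operands are handled by other checks
    }
    // inf * 0 is a different invalid-operation case, so require a
    // nonzero partner for the infinite operand.
    let product_infinite =
        (a.is_infinite() && b != 0.0) || (b.is_infinite() && a != 0.0);
    if !product_infinite || !c.is_infinite() {
        return false;
    }
    let product_negative = a.is_sign_negative() ^ b.is_sign_negative();
    product_negative != c.is_sign_negative()
}
```

For the subtract variants the same test would apply with the sign of `c` inverted before the comparison.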
**Deferred for focused sub-batches** (Status: open in audit-findings.md):
- PPCBUG-201 (FPSCR.RN for double arithmetic) — requires MXCSR set/restore wrappers around 10+ FPU arms
- PPCBUG-185 (FPSCR.NI flush for scalar FPU) — requires NI bit constant + post-op flush wrapper
- PPCBUG-180 + PPCBUG-200 (XX/FR/FI in update_after_op) — requires pre-vs-post-round comparison
**Review findings**:
- Independent reviewer verdict: **MERGE-READY**. No blocking issues.
- Two non-blocking minor follow-ups noted: (a) `check_invalid_fma_add` doesn't catch the finite-product-overflow + infinite-b cancellation half of PPCBUG-202 (audit-acknowledged as rare); (b) vrfin used inline tie-breaker — replaced with stdlib `round_ties_even()` in 05f2f72.
**Gate results**:
- `cargo test --workspace --release`: **498 passed, 0 failed** (up from 494 at P4 merge; 5 new regression tests across the batches)
- **Acid test** `-n 4B --parallel --reservations-table`: deferred per user direction
**Conclusion**: P5 covers the FPU correctness foundation (round-to-int, VXISI, NaN preservation, XX bit, subnormal flush). Three substantive items deferred. Next: P6 — Other MEDIUM correctness (overflow.rs sweep, trap PC-after-advance, sc LEV, twi typed-trap, etc.).
---
### P6 — Other MEDIUM correctness (merged 2026-05-02, HEAD 112202c)
**PPCBUGs fixed**: 13 IDs across the misc-MEDIUM scope.
- Trap/sc/typed-trap (063/064/065): trap PC stays at CIA on Trap; sc LEV logged; twi 31, r0, IMM SIMM type code logged.
- XER TBC infrastructure (123/124/161/566): new `xer_tbc: u8` field in `PpcContext`, wired into `xer()`/`set_xer()`; enables `lswx`/`stswx` (which were permanent no-ops without the TBC infrastructure).
- Load-multiple cleanups (125/126/162): `lmw` skips writes to RA when in [RT..32) per ISA; `lswi`/`stswi` use `instr.nb()` instead of misnamed `instr.rb()`.
- SPR/MSR/VSCR (068/078/080): `mcrfs` now recomputes the VX summary bit; `mtmsrd L=1` does the partial MSR write per ISA; `mfvscr` zero-extends the VSCR word into the upper 96 bits of VD.
- Verification/auto-resolved (022/021/027/039): `mulld_ov` test confirms `checked_mul` handles INT_MIN*-1 correctly (audit's "missing" claim was incorrect); 021/027 auto-resolved by P4; 039 wontfix per audit.
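The XER TBC wiring can be sketched as a pack/unpack pair. The struct below is a hypothetical miniature, not the real `PpcContext`; it assumes the usual 32-bit XER layout with SO, OV, and CA in the top three bits and the 7-bit transfer byte count in the low bits.

```rust
/// Hypothetical subset of the context fields that make up XER.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct XerFields {
    pub so: bool,
    pub ov: bool,
    pub ca: bool,
    pub tbc: u8, // transfer byte count consumed by lswx/stswx
}

impl XerFields {
    /// Assemble the architected 32-bit XER value from the split fields.
    pub fn xer(&self) -> u32 {
        ((self.so as u32) << 31)
            | ((self.ov as u32) << 30)
            | ((self.ca as u32) << 29)
            | ((self.tbc as u32) & 0x7F)
    }

    /// Scatter a 32-bit XER value back into the split fields.
    pub fn set_xer(&mut self, value: u32) {
        self.so = value & 0x8000_0000 != 0;
        self.ov = value & 0x4000_0000 != 0;
        self.ca = value & 0x2000_0000 != 0;
        self.tbc = (value & 0x7F) as u8;
    }
}
```

Without the `tbc` field, a value written through `set_xer` loses its byte count, so `lswx`/`stswx` always see a count of zero and transfer nothing.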
**Batches**:
- Batch 1 (d96986a): trap/sc semantics
- Batch 2 (68c0ee5): XER TBC + load-multiple cleanups
- Batch 3 (0f2a26c): SPR/MSR/VSCR
- Batch 4 (99e7814): mulld_ov verification
- Review-fix nit (5ece5e3): mcrfs uses existing `fpscr::VX_ALL` constant
**Deferred (Status: open in audit-findings.md)**:
- Structural enum extensions (no consumer yet): `StepResult::HypervisorCall` for PPCBUG-064 sc 2 routing; `StepResult::Trap { type_code: u16 }` for PPCBUG-065 typed-trap C++ exception class routing — relevant if/when SEH dispatch lands.
- Cosmetic/test-coverage: PPCBUG-642 (fmt_bcctr ISA-undefined edge), 643/644 (SIMM/D-form decimal vs hex — would re-baseline all goldens), 367/368 (vupkhpx/vpkpx channels), 487/495 (vsum naming), 515/516 (lvebx/lvsr docs), 601 (decode_op6 invariant doc).
**Review findings**: independent reviewer verdict was LGTM on all 4 commits, one cosmetic nit (use existing `fpscr::VX_ALL` instead of duplicate inline mask) applied immediately in 5ece5e3. No blocking issues. Reviewer specifically verified: trap-PC change against all `StepResult::Trap` consumers (none rely on `ctx.pc` for the faulting address); XER TBC field initialization through the single `PpcContext::new()` path that `Default` delegates to; `Vec128` lane ordering for `mfvscr` zero-extend.
**Gate results**:
- `cargo test --workspace --release`: **498 passed, 0 failed**
- **Acid test** `-n 4B --parallel --reservations-table`: deferred per user direction
**Conclusion**: P6 closes the misc-MEDIUM scope. All correctness fixes in scope have landed; structural enum extensions and cosmetic items are explicitly deferred and tracked. Remaining phases: P7 (frozen-snapshot drift, 8 opcodes), P8 (test gap closure, ~50 IDs).
---
### P7 — Frozen-snapshot drift sweep (2026-05-02, manual regen — no xenia-rs code change)
**PPCBUGs fixed**: 3 IDs.
- PPCBUG-066: ppc-manual/branch/{td,tdi,tw,twi}.md — old unconditional-trap stub replaced with current TO-field-evaluating implementation snippet.
- PPCBUG-117: ppc-manual/memory/ldarx.md — refreshed to current reservation_line/reservation_table model.
- PPCBUG-145: ppc-manual/memory/stwcx.md — same reservation refresh.
**Methodology**: ran `python3 ppc-manual/generator/generate_manual.py` (the existing idempotent generator that scrapes xenia-rs and xenia-canary source for each opcode and emits a Markdown page). Output: 350 family pages updated, 598-key index.json refreshed.
**Verification**: post-regen `grep` confirms (a) the old "For now, just trace and continue" stub is gone from every page; (b) modern constructs (`trap::evaluate`, the current reservation pattern) appear in the trap and reservation pages.
**Note on scope**: the `ppc-manual/` directory is not versioned in `xenia-rs/.git`. The regen is therefore "done by running the script" with no commit landing in this repo. Documented for posterity here.
**Implicit drift cleared by earlier phases**: addicx (PPCBUG-003 fixed in P4), andisx (PPCBUG-023 fixed in P4), cmp/cmpi (PPCBUG-050; no code change required, manual snapshot now reflects current behavior), extsbx/extshx (PPCBUG-036/037 fixed in P4 batch 2; PPCBUG-032 in batch 1). All were auto-resolved by re-running the generator after P1-P6.
**Conclusion**: P7 is functionally complete. No xenia-rs code change. Next: P8 — test gap closure.
---
### P8 — Test gap closure (merged 2026-05-02, HEAD 4029041)
**PPCBUGs closed**: 38 IDs across the test-gap LOW scope (audit listed ~50; 38 closed, ~12 remain Status: open as test-gap-only items that don't block functionality).
**Closed**:
- Branch/CR/SPR/sync: 055, 067, 070, 081, 082, 083, 084, 085, 089
- Loads: 091, 100, 109, 110, 111, 118, 127, 129
- Stores: 132, 146, 147, 153, 163, 171
- FPU: 187, 208, 228
- VMX integer: 240, 277
- VMX shift/rotate/logical: 316, 320, 321, 323
- VMX permute: 370
- VMX float compare/round/convert: 438, 439, 440
- VMX multiply-add: 490
- VMX load/store: 517
**Remaining open** (LOW test-gap, non-blocking): 045, 047, 066, 088 (PPCBUG-088 disasm-only test gap), 117, 145, 279, 317, 322, 324, 325, 371-378, 491-494, 518, 519, 567. These can stay open until a focused test-coverage sprint, or be closed incidentally during ongoing work.
**Batches**:
- Batch 1 (9827b03): branch/CR-logical/SPR/MSR/FPSCR/sync — 12 tests
- Batch 2 (2d223ee): load/store base + XER-TBC-driven lswx/stswx — 15 tests
- Batch 3 (ebfd18a): FPU + VMX float — 14 tests; reviewer caught a VX-form encoding nit (XO at bit 0 not bit 1) during this batch and the author re-encoded all VX/VC tests before commit
- Batch 4 (2614806): VMX integer/permute/load-store — 12 tests
- Review-fix nit (1f9696a): test rename `vmsum3fp_horizontal_3lane_sum` → `vmaddfp_lane_fma` (test body actually exercised vmaddfp)
**Review findings**: independent reviewer verdict was LGTM on all 4 batches with no blocking issues. Every hand-encoded raw was mechanically cross-checked against canary's `INSTRUCTION(0x..., ..., kVX|kVC|kX|kA, ...)` base raw — no encoding mismatches. The XER-TBC-driven `lswx`/`stswx` tests are particularly load-bearing: they exercise the new infrastructure landed in P6 (68c0ee5); both opcodes were permanent no-ops pre-P6.
**Gate results**:
- `cargo test --workspace --release`: **551 passed, 0 failed** (up from 498 at P7 merge — 53 net new tests; one `vmsum3fp_…` rename = -1+1 = net 0)
- **Acid test** `-n 4B --parallel --reservations-table`: deferred per user direction
**Conclusion**: P8 closes the meaningful test-coverage gaps for opcode groups that previously had near-zero unit tests. Combined with the regression tests embedded in P1-P6 commits, the test suite now exercises every primary opcode form (branch, CR, SPR, FPU, VMX integer, VMX float, VMX load/store, scalar load/store) at least once. Remaining LOW test-gap items can be closed incrementally without blocking the audit's functional fixes.
---
### Post-P8 — End-to-end review + acid test (2026-05-02)
**End-to-end reviewer findings** (cross-cutting after all 8 phases):
1. **BLOCKING-LIKELY**: `lwa`/`lwax`/`lwaux` were converted to zero-extend in P4 batch 5 (PPCBUG-105 "minimal-fix"); reviewer flagged this as ISA-deviating. Per PowerISA, "Load Word and Algebraic" must sign-extend. Hotfix landed at HEAD f1166d0 — restored `as i32 as i64 as u64` form, updated test from `lwa_high_bit_set_zero_extends_upper` to `lwa_sign_extends_to_i64`.
2. **Cosmetic** `fpscr.rs:289` duplicate-branch typo in `round_single_toward_zero` — both branches were `adj_bits - 1`. Replaced with the unconditional form + comment. HEAD 09c6c92.
3. **Minor** reservation table's `active_reservers` counter is slot-occupancy, not reserver-count — once dirtied via cross-line-collision displacement, stores eternally pay the `invalidate_for_write` Acquire-load cost. Correctness-preserving (counter is upper bound), but performance can degrade. Documented; deferred to a focused performance sub-batch.
4. **Asymmetric** `extswx` is the only sign-extend opcode left at 64-bit ABI (P4 converted every other extsXx to 32-bit). Per PPCBUG-038 (audit `wontfix`), this matches ISA's documented "argument-register canonicalization in 64-bit mode" intent. No code change. Reviewer flagged the asymmetry — accepted.
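The distinction behind finding 1 is small enough to show directly. The two helpers below are hypothetical names that model only the extension step applied to a loaded 32-bit word, using the cast chain quoted in the hotfix.

```rust
/// `lwz`-style load: the word is zero-extended into the 64-bit register.
pub fn lwz_extend(word: u32) -> u64 {
    word as u64
}

/// `lwa`-style load ("Load Word and Algebraic"): the word is
/// sign-extended, so bit 31 is replicated through the upper half.
pub fn lwa_extend(word: u32) -> u64 {
    word as i32 as i64 as u64
}
```

The two agree whenever bit 31 of the loaded word is clear and differ only when it is set, which is the case the renamed test `lwa_sign_extends_to_i64` targets.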
**Acid test result** (`xenia-rs check sylpheed.iso -n 4000000000 --parallel --reservations-table`, 2026-05-02 12:28→12:46):
- Exit code: 0 (clean termination, no panics, no RtlRaiseException, no halts)
- swaps=1 (frame=1 XE_SWAP, fb=0x4b0d7000, 1280×720)
- draws=0
- 14 ExCreateThread spawns, 2 worker exits via LR sentinel
- The renderer plateau is **NOT unblocked** by the cumulative P1-P8 correctness fixes
- Note: the binary tested was pre-lwa-hotfix (built before commit f1166d0). The lwa change is unlikely to affect Sylpheed (compilers don't emit `lwa` in 32-bit-ABI code), but a re-run after the hotfix would be the conservative confirmation.
**Implication**: the renderer plateau (`draws=0`) has a root cause outside PPC instruction correctness. The audit's fixes were justified on correctness grounds independent of the renderer (the PPCBUGs are real bugs, well-grounded against canary), but the cumulative ~161 PPCBUG fixes do not unblock the specific Sylpheed rendering issue. Next investigation tracks should focus on:
- Graphics-pipeline-side issues (EDRAM resolve gaps per `project_xenia_rs_edram_resolve_gap.md`, RT readback)
- Kernel HLE divergences (event signaling, timer queues, file system)
- The unresolved BST-validation paradox documented in `project_xenia_rs_sylpheed_event_chain_2026_04_29.md` (sub_82175E68 registers 0x828F3F68 in the BST but the validator doesn't find it eight instructions later)
These are out of scope for the PPC instruction audit.
---
## Index — every PPCBUG referenced (in numerical order)
This list intentionally includes every ID found in `audit-findings.md` so nothing is dropped. For each entry's full description / file:line / fix snippet / test recommendation, see the corresponding `### PPCBUG-NNN` heading in `audit-findings.md`.
001-022 (batch 1: integer ALU): 001, 002, 003, 004, 005, 006, 007, 008, 009, 010, 011, 012, 013, 014, 015, 016, 017, 018, 019, 020, 021, 022.
023 (batch 2 group 6 logic immediate): 023.
024-027 (batch 2 group 9 word rotate): 024, 025, 026, 027.
028-033 (batch 2 group 7 logic register): 028, 029, 030, 031, 032, 033.
034-039 (batch 2 group 8 sign-extend / count-leading-zeros): 034, 035, 036, 037, 038, 039.
040-045 (batch 2 group 11 shift): 040, 041, 042, 043, 044, 045.
046-047 (batch 2 group 10 doubleword rotate): 046, 047.
048-052 reserved (group 12 compare): 048, 049, 050.
053-055 (batch 3 group 13 branch): 053, 054, 055.
063-067 (batch 3 group 14 trap+sc): 063, 064, 065, 066, 067.
068-070 (batch 3 group 15 CR logical): 068, 069, 070.
078-085 (batch 3 group 16 SPR/MSR/TB/FPSCR/VSCR): 078, 079, 080, 081, 082, 083, 084, 085.
088-089 (batch 3 group 17 cache+sync): 088, 089.
090-091 (batch 4 group 18 load byte): 090, 091.
095-100 (batch 4 group 19 load halfword): 095, 096, 097, 098, 099, 100.
105-111 (batch 4 group 20 load word + reservation): 105, 106, 107, 108, 109, 110, 111.
115-118 (batch 4 group 21 load doubleword): 115, 116, 117, 118.
123-127 (batch 4 group 22 load multiple/string): 123, 124, 125, 126, 127.
128-129 (batch 4 group 23 load float): 128, 129.
130-132 (batch 5 group 24 store byte/halfword): 130, 131, 132.
140-147 (batch 5 group 25 store word + stwcx): 140, 141, 142, 143, 144, 145, 146, 147.
150-153 (batch 5 group 26 store doubleword): 150, 151, 152, 153.
160-163 (batch 5 group 27 store multiple/string): 160, 161, 162, 163.
165-171 (batch 5 group 28 store float): 165, 166, 167, 168, 169, 170, 171.
180-187 (batch 6 group 29 FPU single arithmetic): 180, 181, 182, 183, 184, 185, 186, 187.
200-208 (batch 6 group 30 FPU double arithmetic): 200, 201, 202, 203, 204, 205, 206, 207, 208.
220-231 (batch 6 group 31 FPU sign/move/compare/convert): 220 [retracted], 221, 222 [retracted], 223, 224, 225, 226 [retracted], 227, 228, 229, 230, 231.
240-243 (batch 7 group 32 VMX integer add/sub): 240, 241, 242, 243.
275-279 (batch 7 group 33 VMX integer compare/min/max/avg): 275, 276, 277, 278, 279.
315-325 (batch 7 group 34 VMX integer logical/shift/rotate): 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325.
360-378 (batch 8 group 35 VMX permute/pack): 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378.
420-440 (batch 8 group 36 VMX float arith+compare): 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440.
482-495 (batch 8 group 37 VMX multiply-sum + special): 482 [retracted], 483 [retracted], 487, 490, 491, 492, 493, 494, 495.
510-519 (batch 8 group 38 VMX load/store): 510, 511, 512, 513, 514, 515, 516, 517, 518, 519.
560-567 (Phase C1 decoder field extractors): 560, 561, 562, 563, 564, 565, 566, 567.
600-605 (Phase C2 decoder opcode-lookup): 600, 601, 602, 603, 604, 605.
640-654 (Phase C3 disassembler formatter): 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654.
**Counted IDs**: 253. **Retracted**: 220, 222, 226, 482, 483 (5). **Net actionable**: 248.
**Counted by phase here**: P1 (~17 IDs), P2 (~17 IDs), P3 (~7 IDs), P4 (~30 IDs), P5 (~30 IDs), P6 (~25 IDs), P7 (~5 IDs), P8 (~50 IDs), Notes (~30 wontfix/informational/retracted). Total accounts for all 253 IDs — every ID is either in a fix phase, the wontfix/informational list, or retracted. **Nothing has been dropped.**

@@ -0,0 +1,281 @@
# Canary-Only Export Fix Queue (audit-006)
- Status: **POST-KE-001 (2026-05-06): 2 canary-only (XamUserReadProfileSettings DROPPED post-XamUserGetSigninState landing earlier; KE-001 unsuspended audio workers but KeReleaseSemaphore producer is downstream-gated and did NOT fire).** `KeResumeThread` is now a real impl per canary `xboxkrnl_threading.cc:216-227` (KRNBUG-KE-001, branch `ke-resume-thread/p0-canary-mirror`). Cascade A passed: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940) leave Suspended → run prologue → park on `WaitAny` for audio buffer-completion semaphores `0x828A3254` / `0x828A3230`. Cascade B partial: `NtSetEvent 667→3334` (5×) but `KeReleaseSemaphore=0` and `XAudioSubmitRenderDriverFrame=0` — workers stuck before the producer. Cascade C predicted 2→1, actual 2→2 (`ExTerminateThread`, `KeReleaseSemaphore` both still canary-only). Cascade D: `--pc-probe=0x82184318,0x82184374` armed — neither fires; `--dump-addr=0x828F4070` no DUMP lines; γ-cluster blocker unchanged; signal_attempts on 0x1004/0x100c/0x1020/0x15e4 still 0. swaps=2 draws=0 plateau intact. Lockstep `instructions=100000003 imports=987516` deterministic ×2. Goldens re-baselined `sylpheed_n50m.json instructions 50000003→50000011, imports 407255→407247`. See KRNBUG-KE-001 in `audit-findings.md`.
- Prior status (superseded by KE-001): **POST-IO-004 (2026-05-06): 7 → 3 canary-only.** Real `XamNotifyCreateListener` + `XNotifyGetNext` landed (KRNBUG-IO-004). Dispatch arm at `0x822f1be8` now fires; `sub_82173DC8` runs in a tight loop on tid=1; renderer-cluster L1 entries `0x822c6870`, `0x824563e0`, `0x823ddb50` are reached for the first time. 4 reclassified RE-FIRES (now reached): `KeResetEvent`, `ObCreateSymbolicLink`, `XamTaskCloseHandle`, `XamTaskSchedule`. Still canary-only: `ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings` — all REAL_BUT_UNREACHED at the new boot horizon. Worker count 18 → 20. signal_attempts on 0x15e0 = 1 (was 0). draws=0 still expected at this step. See KRNBUG-IO-004 in `audit-findings.md` and `project_xenia_rs_io_004_xnotify_listener_2026_05_06.md`.
- Prior status (superseded by IO-004): **AUDIT-009 (2026-05-05): GATE IS HIGHER THAN THE CLUSTER ITSELF.** AUDIT-008's β-hypothesis (gate sits among the 5 callers of `sub_821800D8` in 0x82287000-0x82292FFF) is **falsified**: a 21-PC `--branch-probe` (the 6 parents + 5 shims + dispatcher + 9 audit-005 producer-callsites) shows **0/21 firings** at -n 500M (`audit-runs/audit-009/probe-500m.err`). The whole 0x82287000-0x82294000 cluster is unreached. Static analysis: the cluster's level-1 root functions (`sub_82293448`, `sub_822919C8`) have **zero non-call xrefs in sylpheed.db** — they are reached only via vtable / function-pointer that's never written. Main parks at `sub_822F1AA8` frame-poll loop forever (1.49M XNotifyGetNext iterations). Three canary-only exports (`ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings`) remain REAL_BUT_UNREACHED — same as audit-008. **DO NOT pull from this queue.** Next-session probe set: cluster L1 roots + new thread entry trampolines (0x822c6870 / 0x824563e0 / 0x823dde30 / 0x823ddb50) + main's frame-poll callees + main's post-poll continuation list. See KRNBUG-AUDIT-009 in `audit-findings.md` and `project_xenia_rs_audit_009_renderer_unreached_2026_05_05.md`.
- Prior status (superseded by AUDIT-009): **AUDIT-008 MODEL RESET (2026-05-05).** 0x100c worker IS spawned post-IO-003 as tid=3 (ctx=0x828F3D08), 0x1004 as tid=11, 0x15e0 as tid=17. AUDIT-008 hypothesized the gate among the 5 non-create-chain callers of `sub_821800D8` whose parents live in 0x82287000-0x82292FFF. AUDIT-009 falsified that — those parents are themselves never entered, so the gate is one level above.
- Prior status (superseded by AUDIT-008): **PARTIAL CASCADE (2026-05-04, post-KRNBUG-IO-003). 7 → 3 canary-only exports.** `NtDeviceIoControlFile` real impl landed; the priv-11 query (`XexCheckExecutablePrivilege(0xB)`) and `XamTaskSchedule` now fire. **Reclassified (now firing on our side):** `KeResetEvent`, `ObCreateSymbolicLink`, `XamTaskCloseHandle`, `XamTaskSchedule`. **Bonus pickups:** `XeCryptSha`, `XeKeysConsolePrivateKeySign` (both 0→1 — were not on the canary-only list because they were already in `ours_exports` but unreached). **Still canary-only:** `ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings`. ~~Worker thread spawn count unchanged at 19; handle 0x100c remains UNCREATED.~~ (audit-008: 0x100c worker IS spawned, claim was wrong.) See KRNBUG-IO-003 in `audit-findings.md` and `project_xenia_rs_io_003_ioctl_2026_05_04.md`.
- Prior status (now superseded): **SUPERSEDED by AUDIT-007 (2026-05-04). Real gate identified: `NtDeviceIoControlFile` (FsCtlCode=0x74004) is `stub_success` at `crates/xenia-kernel/src/exports.rs:90`. Game-side `sub_824ABD88:0x824abea8-ac` reads `[out_buf+8]` of the IOCTL response, finds zero (stub doesn't write OUT), assigns hardcoded `0xC0000034` (STATUS_OBJECT_NAME_NOT_FOUND); caller `sub_824A9710` exits at `0x824a9944` before priv-11. Tier 4 entries remain parked, classification unchanged (still REAL_BUT_UNREACHED), awaiting KRNBUG-IO-003. See `project_xenia_rs_audit_007_branch_probe_2026_05_04.md` for the runtime trace + decisive proof.**
- Prior status: **PARTIAL — KRNBUG-IO-002 landed, but predicted cascade did NOT fire (7 → 7). Tier 0 marked superseded; Tier 4 entries STILL parked. Re-audit needed to find the real upstream gate.**
- Pre-state: master HEAD `556a8c3`, exports diff captured 2026-05-04
- Post-IO-002 state: branch `xboxkrnl-vol-allocunit/p0-65536-cluster`, fresh 500 M trace at `audit-runs/post-IO-002/`. Canary-only kernel exports remain identical: `{ExTerminateThread, KeReleaseSemaphore, KeResetEvent, ObCreateSymbolicLink, XamTaskCloseHandle, XamTaskSchedule, XamUserReadProfileSettings}`.
- Inputs:
- `canary.log` (348720 B, identical to audit-005 oracle, canary build `9467c77f0`)
- `ours.log` (692 MB, 5.6 M trace lines, run at 17:20-17:21 today, post-IO-001)
- Tooling: `diff.py` + plain `comm -23` set-difference on extracted call names
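
The set-difference step can be sketched in Rust as follows. This is a minimal illustration, not the project's `diff.py`: it assumes every trace line carries a `call=NAME` token (the form the reachability oracle greps for in `ours.log`), and the helper names `extract_calls` and `canary_only` are hypothetical.

```rust
use std::collections::BTreeSet;

/// Collect the distinct export names that appear as `call=NAME` tokens.
/// The `call=` token format is an assumption about the trace layout.
fn extract_calls(log: &str) -> BTreeSet<String> {
    let mut names = BTreeSet::new();
    for line in log.lines() {
        for token in line.split_whitespace() {
            if let Some(rest) = token.strip_prefix("call=") {
                // Keep only the identifier; drop trailing "(", "," and so on.
                let name: String = rest
                    .chars()
                    .take_while(|c| c.is_ascii_alphanumeric() || *c == '_')
                    .collect();
                if !name.is_empty() {
                    names.insert(name);
                }
            }
        }
    }
    names
}

/// Names called by canary but never by ours (the `comm -23` equivalent).
/// The result is sorted because both inputs are ordered sets.
fn canary_only(canary_log: &str, ours_log: &str) -> Vec<String> {
    let canary = extract_calls(canary_log);
    let ours = extract_calls(ours_log);
    canary.difference(&ours).cloned().collect()
}
```

Using ordered sets keeps the output stable between runs, which makes it easy to compare one audit's canary-only list against the next.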
## Headline finding
**7/7 canary-only entries classify as REAL_BUT_UNREACHED or STUB_BUT_UNREACHED.**
Per the audit-006 spec stop condition ("if two-thirds of entries are
REAL_BUT_UNREACHED, the problem isn't stubs — it's an upstream gate"),
the next session should **NOT** pull a Tier-1 entry from this queue.
Instead, it should fix the gate.
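
The stop condition quoted above can be written down as a small check. This is a sketch under stated assumptions: the `Class` enum and `upstream_gate_suspected` are hypothetical names, and the threshold is read literally as "at least two-thirds of entries are REAL_BUT_UNREACHED".

```rust
/// Classification labels used in this queue (illustrative subset).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Class {
    RealButUnreached,
    StubButUnreached,
    RealButBuggy,
}

/// True when at least two-thirds of the entries are REAL_BUT_UNREACHED,
/// i.e. the problem is more likely an upstream gate than missing stubs.
fn upstream_gate_suspected(entries: &[Class]) -> bool {
    if entries.is_empty() {
        return false;
    }
    let unreached = entries
        .iter()
        .filter(|c| **c == Class::RealButUnreached)
        .count();
    // Integer form of unreached / total >= 2 / 3, avoiding floating point.
    3 * unreached >= 2 * entries.len()
}
```

With five REAL_BUT_UNREACHED entries out of seven, the check trips even before the two STUB_BUT_UNREACHED entries are counted.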
The gate is **KRNBUG-IO-002**: our `nt_query_volume_information_file`
class-3 (FileFsSizeInformation) returns alloc_unit = 1 × 2048 = 2048,
but Sylpheed's `main(1, 0x10000, 0xFF000)` expects alloc_unit = 65536
(see `project_xenia_rs_io_nullfile_2026_05_04.md`).
Sylpheed's verifier `sub_824ABA98` rejects 2048, propagates failure to
`sub_824A9710`, which exits early before its `XexCheckExecutablePrivilege(0xB)`
call site. Canary fires the priv-11 query *and* the entire downstream
cluster (`XamTaskSchedule` → Cache0 callback thread → 0x100c worker spawn
→ display-init pump → profile-settings cascade); we fire none of it.
Direct evidence (telemetry):
- Our `XexCheckExecutablePrivilege` count = **1** (priv=0xA only).
Canary count = **2** (priv=0xA + priv=0xB).
- All 7 canary-only entries have ours-side count = **0** at -n 500M.
- Our trace ends with main thread (hw=0) parked on `XNotifyGetNext +
NtWaitForSingleObjectEx(0x10f4, lr=0x824ac578)` and hw=1 parked on
`NtWaitForMultipleObjectsEx(lr=0x824ab214) + cs=0x828f3e70` —
classic post-cache-recreate spin.
- The 44 `NtWriteFile` calls in ours.log (cache zero-fill) are followed by
more NtClose / NtCreateFile cycles, but `XexCheckExecutablePrivilege(0xB)`
never fires → priv-11 site in `sub_824A9710` is unreached.
- Memory's predicted `0xC000014F` does not yet appear in ours.log; first
cache-related error is `0xC0000034` (OBJECT_NAME_NOT_FOUND) from
`lr=0x824a97e4`. This still fits the gate hypothesis: the recreate path
is reached, completes its writes, re-opens, queries volume info, and
the *game-side* verifier rejects our reply silently (no kernel error).
---
## Tier 0 — upstream gate (SUPERSEDED 2026-05-04 — fix landed but cascade did NOT fire)
### KRNBUG-IO-002 — `nt_query_volume_information_file` block size — **LANDED, gate hypothesis FALSIFIED**
**Outcome:** the block-size literals at `exports.rs:1255-1256` were corrected
to canary's NullDevice values (`sectors_per_unit=0x80, bytes_per_sector=0x200`,
product `0x10000`). 591 → 592 tests, lockstep `instructions=100000010, swaps=2,
draws=0` deterministic across two reruns (`audit-runs/post-IO-002/lock_n100m_run{1,2}.json`).
sylpheed_n50m oracle still matches its existing golden (no observable change at -n 50M).
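
For reference, the block-size arithmetic behind the fix is just the product of the two fields. The helper name below is hypothetical; the values are the ones quoted in this document.

```rust
/// Bytes per allocation unit (cluster) = sectors per unit * bytes per sector.
fn alloc_unit_bytes(sectors_per_unit: u32, bytes_per_sector: u32) -> u64 {
    // Widen before multiplying so the product cannot overflow u32.
    u64::from(sectors_per_unit) * u64::from(bytes_per_sector)
}
```

The pre-fix reply `(1, 2048)` yields 2048 bytes, while canary's NullDevice reply `(0x80, 0x200)` yields `0x10000` (65536) bytes.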
**However, the predicted cascade DID NOT fire.** Set-difference on a fresh
500 M trace (`audit-runs/post-IO-002/ours.log`) produces the **identical**
seven-entry canary-only set audit-006 captured pre-fix:
```
ExTerminateThread, KeReleaseSemaphore, KeResetEvent,
ObCreateSymbolicLink, XamTaskCloseHandle, XamTaskSchedule,
XamUserReadProfileSettings
```
`XexCheckExecutablePrivilege` count remains **1** (priv=0xA only, priv=0xB
unreached). `XamTaskSchedule` count remains **0**. Worker thread spawns
fell from 19 → 18 (within noise — single thread variance per call-site
breakdown: `lr=0x824ac5f0×15 + 0x824cd984×1 + 0x824d2e68×2`). The 16
NtQueryVolumeInformationFile call sites in `ours.log` all originate from
a single LR `0x82611f38` — meaning the `audit-006` premise that
`sub_824ABA98`/`sub_824A9710` consume the volume-info reply at the
priv-11 gate may be **incorrect**, or the gate consumes a *different*
information class entirely.
**Stop-condition triggered.** Per the IO-002 task brief, this session does
not pivot to a second fix. The fix is correct (it makes our reply
byte-identical to canary's NullDevice and survives every test we have);
it is just not load-bearing for the priv-11 gate. The branch landed as a
strict no-op at our current boot horizon — kept because it's correct and
unblocks no regression.
**Next-session next gate hypothesis (untested):**
- The audit-005 disasm of `sub_824ABA98` may have mis-attributed the consumer
of bytes_per_sector. The IO-001 trace decisively located the failure at
the `NtReadFile` inside `sub_824A9710`, not at any volume-info site.
Re-read the `sub_824A9710` disasm with that in mind.
- Volume-info LR `0x82611f38` is far downstream of the priv-10/priv-11
cluster (the calls *complete* successfully — they don't gate anything
visible). The actual gate may be `nt_query_information_file`,
`nt_query_full_attributes_file`, an FsCtl IOCTL, or a different
alloc-unit query path.
- Per AUDIT-005 instrumentation, the priv-11 site at `sub_824A9710` PC
cluster has **never fired** in any session. Probe `sub_824A9710` entry
with `--pc-probe` and trace which conditional exits the function before
the priv-11 query — that's the real gate.
---
### KRNBUG-IO-002 — `nt_query_volume_information_file` block size (original spec, kept for archaeology)
- **Where in our code:** `crates/xenia-kernel/src/exports.rs:1241-1269` (function
`nt_query_volume_information_file`).
- **Classification:** `REAL_BUT_BUGGY`. Registered at exports.rs:100, called
16× in ours.log (16× in canary.log too — call counts match), returns
`STATUS_SUCCESS`, but the FileFsSizeInformation payload is wrong.
- **Bug:** class=3 branch writes `(total=0x100000, free=0,
sectors_per_unit=1, bytes_per_sector=2048)`. Product = 2048 bytes per
cluster.
- **Canary reference:**
- Entry function `NtQueryVolumeInformationFile_entry` at
`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io_info.cc:323` (case
`XFileFsSizeInformation` at lines 355-365). Canary delegates to per-device
methods on `file->device()`.
- `NullDevice` (the device backing `\Device\Harddisk0\Cache0`) returns
`sectors_per_allocation_unit() = 0x80` and `bytes_per_sector() = 0x200`
at `xenia-canary/src/xenia/vfs/devices/null_device.h:38-46`. Product =
65536, matching Sylpheed's expectation.
- Other device backings (HostPath, DiscImage, DiscZArchive) all return
`(1, 0x200)` = 512. Sylpheed's first volume query at this site is
against Cache0 (NullDevice), so the relevant value is 65536.
- **Fix sketch (minimum):** in the class=3 branch, change the two writes to
`mem.write_u32(info+16, 128); mem.write_u32(info+20, 512);` (and reduce
TotalAllocationUnits accordingly so the disk size remains plausible —
e.g. 0x10 units × 128 sectors × 512 bytes ≈ 1 MB, matching NullDevice).
Total diff ≤ 4 lines.
- **Fix sketch (proper, deferred until the cluster fires reliably):**
introduce a per-handle device-info lookup so HostPath / DiscImage paths
return their canary-correct values too. Skipped for now because Sylpheed
only queries Cache0 at this gate.
- **Expected observable post-fix:**
- `XexCheckExecutablePrivilege` count: 1 → 2.
- `XamTaskSchedule` count: 0 → 1 (callback=0x824A93C8, message=0x828A28F0).
- `kernel.calls{XamTaskSchedule}` finally non-zero — closes the
APUBUG-PRODUCER-001 / XAMBUG-PRODUCER-001 producer hunt that
falsified XAudio + XamTaskSchedule producer hypotheses.
- Spawn of the Cache0 callback thread (XThread::Execute thid 7 in
canary, our equivalent to come).
- Inside that thread: `StfsCreateDevice` (still undefined extern in
canary too — does not block) + `ObCreateSymbolicLink` +
`ExRegisterTitleTerminateNotification`.
- Back on main: `KeResetEvent(0x8287094C)`, `NtCreateEvent`,
`ExCreateThread(entry=0x82181830, ctx=0x828F3D08)` — and `0x82181830`
is the worker entry for **dispatcher 0x100c**, one of the four
parked-handle producers (per
`project_xenia_rs_producer_stack_trace_2026_05_03.md`). Spawning
that worker should advance handle 0x100c's `signal_attempts`
counter off zero.
- Eventually (further into the boot): `XamUserGetXUID`,
`XamUserReadProfileSettings`, `XamContentCreateEnumerator`,
`KeReleaseSemaphore` display-pump (268+ calls in canary at this
horizon).
- **Risk:** low. Two-line value change. NullDevice is the only device
Sylpheed asks about at this gate; other devices are not yet hit.
- **Effort:** trivial.
- **Dependencies:** none. Land directly.
- **Verification chain:** `cargo test -p xenia-kernel`,
then `cargo run --release -p xenia-app -- exec sylpheed.iso -n 500_000_000`
with kernel-call tracing on, then re-run audit-006's set-difference;
expect canary-only count to drop from 7 toward 0 as the cluster fires.
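
The payload described in the fix sketch can be illustrated as a 24-byte big-endian record. This is a hedged sketch, not the code in `exports.rs`: only the `+16` and `+20` offsets come from this document; placing the two unit counts as 64-bit fields at `+0` and `+8` is an assumption, and `FsSizeInfo` and `null_device_like` are hypothetical names.

```rust
/// Guest-visible FileFsSizeInformation payload (24 bytes).
/// Offsets +16 / +20 are from the fix sketch; +0 / +8 are assumed.
struct FsSizeInfo {
    total_units: u64,
    free_units: u64,
    sectors_per_unit: u32,
    bytes_per_sector: u32,
}

impl FsSizeInfo {
    /// Serialize in guest byte order (the Xbox 360 guest is big-endian).
    fn encode(&self) -> [u8; 24] {
        let mut out = [0u8; 24];
        out[0..8].copy_from_slice(&self.total_units.to_be_bytes());
        out[8..16].copy_from_slice(&self.free_units.to_be_bytes());
        out[16..20].copy_from_slice(&self.sectors_per_unit.to_be_bytes());
        out[20..24].copy_from_slice(&self.bytes_per_sector.to_be_bytes());
        out
    }

    /// Total volume size implied by the three size fields.
    fn volume_bytes(&self) -> u64 {
        self.total_units
            * u64::from(self.sectors_per_unit)
            * u64::from(self.bytes_per_sector)
    }
}

/// Values from the fix sketch: 0x10 units of 128 sectors of 512 bytes.
fn null_device_like() -> FsSizeInfo {
    FsSizeInfo {
        total_units: 0x10,
        free_units: 0,
        sectors_per_unit: 0x80,
        bytes_per_sector: 0x200,
    }
}
```

With these values `volume_bytes()` comes to exactly 1 MiB, which matches the "≈ 1 MB" plausibility target in the fix sketch.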
---
## Tier 4 — REAL_BUT_UNREACHED / STUB_BUT_UNREACHED — do not fix yet
These are downstream of Tier 0. Reachability is blocked on KRNBUG-IO-002
landing. After IO-002 lands, re-derive this list — most entries should
have moved off, and any survivors will be classifiable on real evidence.
| # | Export | Ordinal | Library | Our state | Canary impl | Canary calls (at horizon) | Cascade rank |
|---|--------|---------|---------|-----------|-------------|---------------------------|--------------|
| 1 | `XamTaskSchedule` | 0x01AF | xam | REAL_BUT_UNREACHED (`xam_task_schedule`, xam.rs:213) | `xam_task.cc:43-80` | 1 (gate-pivot call) | upstream-of-cluster — fires the entire post-IO-002 cascade |
| 2 | `XamTaskCloseHandle` | 0x01B1 | xam | STUB_BUT_UNREACHED (`stub_success`, xam.rs:33) | `xam_task.cc:83-93` (one-liner: `NtClose` + last-error) | 1 | low (cleanup after #1) |
| 3 | `KeResetEvent` | 0x8F | xboxkrnl | REAL_BUT_UNREACHED (`ke_reset_event`, exports.rs:3172) | `xboxkrnl_threading.cc:566` | 1 | medium — clears 0x8287094C right before ExCreateThread(0x82181830) on main |
| 4 | `ObCreateSymbolicLink` | 0x0103 | xboxkrnl | STUB_BUT_UNREACHED (`stub_success`, exports.rs:121) | `xboxkrnl_ob.cc:351` | 1 | low — Cache0-symlink registration; cosmetic for Sylpheed boot |
| 5 | `KeReleaseSemaphore` | 0x88 | xboxkrnl | REAL_BUT_UNREACHED (`ke_release_semaphore`, exports.rs:3280) | `xboxkrnl_threading.cc:724` | 268 | high (in volume) — display-init pump on the post-cluster main loop |
| 6 | `ExTerminateThread` | 0x19 | xboxkrnl | REAL_BUT_UNREACHED (`ex_terminate_thread`, exports.rs:312) | `xboxkrnl_threading.cc:173` | 2 | low — thread cleanup on Cache0 / profile threads |
| 7 | `XamUserReadProfileSettings` | 0x0219 | xam | REAL_BUT_UNREACHED (`xam_user_read_profile_settings`, xam.rs:327) | `xam_user.cc:329` | 2 | medium — gates the `XamUserGetXUID → profile load` flow far downstream |
**Why every entry above is Tier 4 (not Tier 1):**
- Each entry's first call in `canary.log` falls **after** line 1210
(`XamTaskSchedule(824A93C8, ...)`), which is the gate-pivot call.
- Our trace contains zero of any of the seven, despite running 500 M
instructions and reaching the post-cache-recreate horizon.
- Five of the seven are already real implementations. The two stubs
(`XamTaskCloseHandle`, `ObCreateSymbolicLink`) are minor cleanups; even
upgrading them would not move boot progress until #1 (`XamTaskSchedule`)
fires.
- Therefore: fixing any of these in isolation is wasted effort. They
should be re-classified after KRNBUG-IO-002 lands and the priv-11 /
Cache0 callback chain runs.
---
## Tier 1 / 2 / 3 — empty for this audit
No entry qualifies as Tier 1 or Tier 2 in the current state. The single
high-cascade fix worth pulling next is the Tier-0 gate (KRNBUG-IO-002),
which is **not itself a canary-only export** — it's a wrong-value bug in
an export both sides call, so the diff.py based set-difference doesn't
surface it. That is exactly why audit-006 was scoped this way: to confirm
the gate hypothesis from `project_xenia_rs_io_nullfile_2026_05_04.md`
before another implementation session is started.
---
## Cross-check vs IO-001 snapshot
IO-001 memory recorded these 7 still-canary-only exports:
> ExTerminateThread, KeReleaseSemaphore, KeResetEvent, ObCreateSymbolicLink,
> XamTaskCloseHandle, XamTaskSchedule, XamUserReadProfileSettings.
Audit-006 set-difference produces the **identical** 7, in 1:1
correspondence. No new canary-only export has appeared since IO-001
landed; no entry has moved off. Cascade is still parked at the same gate.
The `XeCryptSha`, `XeKeysConsolePrivateKeySign`, and `NtDeviceIoControlFile`
entries that IO-001 was credited with unblocking are confirmed: ours
calls them 1, 1, 2 times respectively (canary calls them 1, 1, 2 — exact
match). They are correctly off the canary-only list.
---
## Methodology notes
1. **"Cascade rank" definition:** estimated by where the export's first
canary call falls in the boot sequence and how many downstream code
paths depend on it. "high" = upstream-of-cluster (XamTaskSchedule).
"medium" = intermediate (KeResetEvent, profile cascade).
"low" = leaf cleanup or cosmetic (XamTaskCloseHandle, ObCreateSymbolicLink).
Rank only matters once Tier 0 is landed; until then everything is parked.
2. **Reachability oracle:** binary `grep -c "call=NAME"` against ours.log
at -n 500M. Zero counts are conclusive for "unreached" because tracing
is unconditional.
3. **Canary log freshness:** the log is from 17:34 (3 h before this
audit) but is byte-identical to audit-005's input — canary's behavior
is deterministic given the same ROM and the canary build header
(`canary_experimental@9467c77f0 on May 2 2026`) hasn't changed.
Re-running through Lutris is unnecessary.
4. **Gate confirmation:** memory predicted block-size mismatch as the
IO-002 blocker; this audit confirmed it by eliminating the alternative
(no Tier-1-eligible canary-only export exists in the current 7-entry
list). The 0xC000014F status memory predicted is not yet visible in
ours.log because the recreate path completes the writes — the
verifier inside `sub_824ABA98` rejects the volume-info reply at the
game level (no kernel error logged).
5. **What this queue is *not*:** a list of fixes to land. The audit-006
discipline was scoping; the discipline of subsequent sessions is to
re-run audit-006's diff after IO-002, then either close audit-006 (if
the cluster fires through and all 7 entries drop) or open audit-007
on whatever new canary-only set surfaces.
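
The reachability oracle from note 2 can be approximated with a per-name line count. This is a minimal sketch, assuming the `call=NAME` token form; unlike a plain `grep -c`, it rejects longer names that merely share a prefix. The function names are hypothetical, and `NtCloseEx` in the usage below is a made-up name used only to show the prefix check.

```rust
/// True for characters that can continue an export identifier.
fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Count trace lines that contain a call to exactly `name`.
fn count_calls(log: &str, name: &str) -> usize {
    let needle = format!("call={}", name);
    log.lines()
        .filter(|line| {
            line.match_indices(needle.as_str()).any(|(i, _)| {
                // The character after the match must not extend the name.
                let rest = &line[i + needle.len()..];
                !rest.chars().next().map_or(false, is_ident_char)
            })
        })
        .count()
}
```

For a log containing `call=NtClose`, `call=NtCloseEx`, and `call=NtClose(1)`, the count for `NtClose` is 2. A count of zero is what this audit treats as "unreached".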
---
## Recommended next session
**KRNBUG-IO-002 (block-size fix), one-shot.** Two-line edit at
`crates/xenia-kernel/src/exports.rs:1255-1256`. Verify the cluster fires
by re-running audit-006's set-difference; expect 7 → 0 (or close to 0)
canary-only entries. If new entries surface in either direction, that's
audit-007's input.
**Do not** open this queue's Tier 4 entries before IO-002 closes. Their
classification is pending; their fix sketches will look very different
once they're observably called and their actual return values can be
compared to canary.


@@ -0,0 +1,131 @@
# AUDIT-059 — handle disambiguation (iterate 2.BD)
**Date:** 2026-06-06. **Engines:** ours `target/release/xenia-rs -n 50M` (3.9 s wall, 50M instr, 40k import calls), canary Wine `xenia_canary.exe --mute=true --audit_handle_lifecycle=true` (~35 s wall, 34k log lines, 0 fatals).
## Verdict — HANDOFF's wedge handles are stale
HANDOFF said: *"opt_callback signals 0x108c, tid=1 wedges on 0x10e8."* Both IDs are now `<UNCREATED>` in ours, along with `0x1090 / 0x10dc / 0x10fc / 0x1104` (also in HANDOFF's adjacent list). The allocation order shifted since that snapshot.
## Real wedges, current code state
| Handle | Kind | Engine state | Waiter | Notes |
|---|---|---|---|---|
| **0x12a4** | `<UNCREATED>` | `<AUDIT_BLIND>`, waiters=1 | **tid=1 main**, pc=0x824ac578 | Wait went via `do_wait_single` but creation never hit the `NtCreateEvent` / `KeInitializeEvent` path. **This is the iterate-2.BC wedge** (recorded as "0x10e8" in HANDOFF; same site, different ID). |
| **0x12ac** | Event/Auto | `<NO_SIGNALS_DESPITE_WAITS>`, waiters=1 | **tid=13** silph UI cluster, pc=0x824ac578 lr=0x821cb1e0 | Frame trail: `0x821cb1e0 → 0x821cbae0 → 0x821cc454 → 0x821c4f18 → 0x82174a80`. Frames 3-5 carry `silph::UImpl@GamePart_Title` / `silph::VGamePart_Title` vtables — **audit-049's cluster, unchanged**. |
| 0x12b8 | Event/Auto | NO_SIGNALS, waiters=1 | (tid TBD) | Sibling, 0xC bytes from 0x12ac. |
| 0x1020 | Event/Manual | NO_SIGNALS, waiters=1 | — | γ-class. |
| 0x1040 | Event/Auto | NO_SIGNALS, waits=32 (hot poll) | — | Heavy wait, no signal. |
| 0x10a8 | Event/Auto | NO_SIGNALS, waits=7 | — | γ-class. |
| 0x10e4 | Event/Manual | NO_SIGNALS, waiters=1, waits=2 | — | γ-class. |
**Working handles** (sanity baseline): 0x1028 (Sema, 8 waits / 7 signals / 7 wakes), 0x10d0 (Sema, 2 waits / 1 signal / 1 wake), 0x10f0 (Event/Auto, 1/1/1 ✓ marked `<SUSPECT>` but actually fine), 0x10e0 (Event/Manual, 32 primary signals from somewhere).
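
The distinction between the wedged and working rows above can be sketched as a simple rule over the per-handle counters. This is an illustration only: the actual ledger's rules are not shown in this document, so the thresholds, the `HandleStats` struct, and the `Verdict` enum are assumptions inferred from the tag names.

```rust
/// Per-handle counters as they appear in the tables above.
struct HandleStats {
    waits: u32,
    signals: u32,
    wakes: u32,
}

#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    /// Somebody waits, nobody ever signals: a wedge candidate.
    NoSignalsDespiteWaits,
    /// Waits are answered by signals that wake a waiter.
    Working,
    /// Anything else (for example signals without observed waits).
    Other,
}

fn classify(h: &HandleStats) -> Verdict {
    if h.waits > 0 && h.signals == 0 {
        Verdict::NoSignalsDespiteWaits
    } else if h.waits > 0 && h.signals > 0 && h.wakes > 0 {
        Verdict::Working
    } else {
        Verdict::Other
    }
}
```

Under this rule `0x1040` (32 waits, 0 signals) is a wedge candidate, while `0x1028` (8 waits, 7 signals, 7 wakes) lands in the working baseline.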
## GPU interrupt delivery — the iterate-2.BC delta confirmed
| Engine | gpu.interrupt.delivered (vsync) | EmulateCPInterruptDPC / vblank pump |
|---|---:|---:|
| **ours** | 54 (source=0) + 1 (source=1) | — |
| **canary** | — | **4712** in 30 s ≈ 157 Hz |
**~87× ratio.** Confirms HANDOFF's diagnosis: ours' victim-thread injector dies once guest threads all park; canary's host frame-limiter thread keeps firing regardless.
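
The two derived figures can be checked with a couple of lines of arithmetic. The helper names are hypothetical; the inputs are the counts from the table above.

```rust
/// Events per second over a wall-clock window.
fn rate_hz(events: u64, window_secs: f64) -> f64 {
    events as f64 / window_secs
}

/// How many times more often canary delivers than ours.
fn delivery_ratio(canary: u64, ours: u64) -> f64 {
    canary as f64 / ours as f64
}
```

`rate_hz(4712, 30.0)` is about 157.07 Hz, and `delivery_ratio(4712, 54)` is about 87.3. The ~87× figure therefore corresponds to the 54 source=0 deliveries; counting the single source=1 delivery as well gives roughly 85.7×.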
## Canary signaler attribution
Top KeSetEvent guest_ptrs in canary (30 s window):
| guest_ptr | KeSetEvent fires | Inferred role |
|---|---:|---|
| `0x828A3254` | 5729 | Audio host-pump worker (per AUDIT-032: `r3=0x828A3230` region) |
| `0x828A3244` | 5728 | Audio host-pump sibling |
| `0x828A3244` + 16-byte stride | — | Static XEX-image audio event struct |
| `0xBCE25234` | 1301 | **silph UI cluster PKEVENT** (heap-allocated, 0x10 stride). Likely ours' 0x12ac analog. |
| `0xBCE25214 / 0xBCE25244 / 0xBCE25224` | 648 / 603 / 603 | Sibling silph UI PKEVENTs (0x10 stride struct). Likely ours' 0x12a4 / 0x12b8 / 0x1040 analogs. |
Ours signals every one of those equivalents **0 times**.
## Round 2 — LR-extended probes name the producer
Extended the canary probes with guest-LR capture (5 sites in `xboxkrnl_threading.cc`, 10 LOC). Re-ran the harness. Now each `KeSetEvent` line carries the guest function that signaled the event. Result for the silph UI cluster:
| PKEVENT | KeSetEvent count | Producer LR(s) |
|---|---:|---|
| `0xBCE25214` | 574 | `0x82508510` (single producer) |
| `0xBCE25224` | 565 | `0x82508358` (single producer) |
| `0xBCE25234` | 1153 | `0x82506C90` (579) + `0x82508524` (574) |
| `0xBCE25244` | 570 | `0x82506F9C` (single producer) |
| `0xBCE25284` | 1 | `0x82507ABC` (one-shot 5th-worker init?) |
All 6 producer LRs sit in `0x82506000-0x82509000`. **This is exactly the `sub_825070F0` worker thread cluster** that audit-057/058 already named:
> *audit-057: "sub_825070F0 (4 missing, initializes 4 workers w/ shared ctx 0xBCE25340, entries 0x82506528/58/88/B8)"*
The 4 worker entries (`0x82506528/58/88/B8`) are inside `sub_82506xxx` — exactly where the producer LRs `0x82506C90`/`0x82506F9C` live. The other producer LRs `0x825083xx` / `0x825085xx` are in downstream callees (workers call deeper code which itself calls KeSetEvent).
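
The "all producer LRs fall inside one region" claim is easy to re-check mechanically. This sketch hard-codes the six LRs from the table above; treating the region as half-open (upper bound excluded) is an assumption, and the names are hypothetical.

```rust
use std::ops::Range;

/// Address range attributed to the `sub_825070F0` worker cluster.
const CLUSTER: Range<u32> = 0x8250_6000..0x8250_9000;

/// Producer LRs captured in Round 2 for the silph UI PKEVENTs.
const PRODUCER_LRS: [u32; 6] = [
    0x8250_8510, 0x8250_8358, 0x8250_6C90,
    0x8250_8524, 0x8250_6F9C, 0x8250_7ABC,
];

/// Return every LR that falls outside the cluster range.
fn outside_cluster(lrs: &[u32]) -> Vec<u32> {
    lrs.iter()
        .copied()
        .filter(|lr| !CLUSTER.contains(lr))
        .collect()
}
```

`outside_cluster(&PRODUCER_LRS)` comes back empty, whereas the audio host-pump LR `0x824D2A44` is reported as outside the range, consistent with it being a separate producer.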
For comparison the audio host-pump pair gets a single sharp producer too:
- `0x828A3254` × 5271 ← `lr=0x824D2A44`
- `0x828A3244` × 5271 ← `lr=0x824D292C`
(These match AUDIT-032's PC `0x824D229C / r3=0x828A3230` region — already-understood audio host-pump.)
## Verdict — 2.BE is INSUFFICIENT for the silph UI wedge
The silph UI PKEVENTs are signaled exclusively by threads spawned by `sub_825070F0`. Per audit-057/058, **`sub_825070F0` fires 0× in ours** — those 4 worker threads never spawn. Therefore the PKEVENTs are never signaled. Therefore tid=13 (`0x12ac` in ours) wedges forever.
**`sub_825070F0`'s call chain is gated by the audit-009 "unreachability island"** — a CRT-driven fnptr-array bootstrap that ours fails to enumerate. VSync delivery is irrelevant to that bootstrap; the host frame-limiter thread does not drive CRT initializers.
Therefore:
- **2.BE alone CANNOT unwedge tid=13.** It will close the 54-vs-4712 VSync delivery gap and may unblock things downstream of vsync, but the silph UI wedge has an independent missing-signaler root cause.
- **2.BE may still unwedge tid=1 main on `0x12a4`** — that wait went via `KeInitializeEvent` (handle never hit `NtCreateEvent` in ours, hence `<AUDIT_BLIND>`). Whether `0x12a4`'s signaler depends on VSync is unknown without further probing.
## Implications for next moves
A single fix won't take us to draws > 0. We need at least two:
1. **2.BE (VSync delivery)**: still worth landing for the architectural correctness it brings, and because it's the only fix that can unwedge tid=1 main's `0x12a4` if that's vsync-derived. ~60-80 LOC per Agent C's plan.
2. **2.BF (sub_825070F0 activation)** — this is the audit-058 unfinished business. Options:
- (a) **Static work:** trace canary's CRT-driven fnptr-array path that activates the silph UI bootstrap; backport the missing init into ours. High info, slow. Requires more probing.
- (b) **Direct synthetic spawn:** ours injects host-side `ExCreateThread` calls for the 4 worker entries at boot completion, mirroring AUDIT-048's audio-host-pump precedent. Pragmatic; ~40 LOC; risks getting context (`0xBCE25340`) wrong.
A possible third move:
3. **Re-probe with LR on Wait paths** (we already added it but didn't grep for it) — to tell us whether tid=1's wait on `0x12a4` is the same LR as `sub_825070F0`-chain or a totally different signaler. If different, it's a 3rd missing producer.
## Round 4 — wait-side guest LR via one-frame back-chain walk
After fixing the PPC stack-walk offset (Xbox 360 stores saved LR at `[prev_sp - 8]`, not the `+4` AIX convention), wait-side LR comes through cleanly.
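
A one-frame back-chain walk of this kind might look like the sketch below. It is an illustration, not the probe added to canary: `GuestView` is a hypothetical flat view of guest memory, and the walk assumes a 32-bit big-endian back-chain pointer at `[sp]` with the saved LR stored as a 32-bit word at `[prev_sp - 8]`, as described above.

```rust
/// Flat, read-only view of guest memory starting at guest address `base`.
struct GuestView<'a> {
    base: u32,
    bytes: &'a [u8],
}

impl GuestView<'_> {
    /// Read a big-endian u32, or None if the address is out of range.
    fn read_u32_be(&self, addr: u32) -> Option<u32> {
        let off = addr.checked_sub(self.base)? as usize;
        let w = self.bytes.get(off..off.checked_add(4)?)?;
        Some(u32::from_be_bytes([w[0], w[1], w[2], w[3]]))
    }
}

/// Follow the back chain at [sp] once, then read the LR saved at
/// [prev_sp - 8] (not the +4 slot of the AIX convention).
fn saved_lr_one_frame_up(mem: &GuestView<'_>, sp: u32) -> Option<u32> {
    let prev_sp = mem.read_u32_be(sp)?;
    // The stack grows downward, so a valid back chain points higher.
    if prev_sp <= sp {
        return None;
    }
    mem.read_u32_be(prev_sp.checked_sub(8)?)
}
```

Returning `Option` rather than panicking lets a probe skip corrupt or truncated chains without disturbing the run.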
**Canary's top wait sites:**
| canary handle | wait count | guest_lr | LR region | mapping |
|---|---:|---|---|---|
| `F800005C` | 1635 | `0x8216EE14` | kernel early-boot infra | unrelated |
| `F800000C` | 1597 | `0x824AFFC4` | xboxkrnl wrapper (scheduler / work-queue?) | unrelated |
| **`F80000DC`** | **476** | **`0x821C7D3C`** | **silph::UImpl/GamePart** | **= ours' 0x12ac silph UI wedge** |
| `F80000B0` | 6 across | `0x821CBAE0` + `0x821CC19C` + `0x822DFE2x/D0` | **exact match with audit-049's frame trail** | sibling silph UI wait |
Identity proof: ours' audit-049 frame trail for the silph UI wedge was `0x821cb1e0 / 0x821cbae0 / 0x821cc454 / 0x821c4f18 / 0x82174a80`. Round 4 captures `0x821CBAE0` and `0x821CC19C` (adjacent PCs) as wait LRs in canary — same cluster, same code.
**Refined verdict.** ours' `0x12a4` (tid=1 main, AUDIT_BLIND) and `0x12ac` (tid=13 silph UI) are 8 bytes apart — likely sibling KEVENT fields in the same silph UI struct. canary's analogs are in the `F80000xx` namespace, similarly clustered. The single fix that addresses both:
> **2.BF (b)** — synthetic host-side spawn of `sub_825070F0`'s 4 workers at the audit-058-identified context (`0xBCE25340`), entries `0x82506528/58/88/B8`. Once those workers run, they signal the silph UI PKEVENT cluster, unwedging BOTH tid=1 main and tid=13 silph UI in one shot.
2.BE (host-driven VSync ISR delivery) becomes follow-on work after the UI bootstrap completes and frame pacing actually matters.
## Open questions for iterate 2.BD / 2.BE planning
1. **Does 2.BE alone unwedge tid=13?** Cheapest verification path: land 2.BE and re-run audit-059, see whether `0x12ac` signal count goes 0 → non-zero.
2. **What is the LR-pattern of canary's `KeSetEvent guest_ptr=0xBCE25234` callers?** The current probe doesn't capture LR — extending the cvar to do so on a filtered subset would let us name the producer function in canary's namespace.
3. **Does the GPU frame-limiter's CP interrupt actually walk into the silph UI cluster?** I.e., does `EmulateCPInterruptDPC` → `interrupt_callback` → guest code ever hit `sub_821CB030` or its callees? An LR probe inside `EmulateCPInterruptDPC` would answer this.
## Artifacts
- `canary.log` 2.2 MB / 34,095 lines / 32,977 AUDIT-HLC lines
- `canary.stdout` 2.2 MB (duplicate of canary.log due to log_file fallback)
- `canary.stderr` 8.4 KB (Wine diagnostics)
- `ours.log` 479 lines (focus ledger + thread diagnostics + final state)
- `ours.stderr` 317 lines (kernel-call counters)
- `vkd3d-proton.cache.write` 15 KB (build artifact, ignored)
Commits in play (xenia-canary, fork-local only):
- `03362b59f` cross-build-wine (cross-compile toolchain)
- `d031d7c51` audit-handle-lifecycle-probes (this audit's probes)


@@ -0,0 +1,116 @@
# Round 34 — silph_ui_synth.rs (cluster B sibling) — DEFERRED PLAN
## Background
Rounds 23-33 drove γ-cluster #2 down to the actual gate: **`sub_821741C8`** (silph worker-dispatch loop) fires 0× in ours / 471× in canary (tid=6). It's invoked via dynamic vtable slot 9 from `sub_821752C0` thunk. The vtable writer is in the audit-050 unreachability island — there's no static caller chain to hook into.
The fix shape is a synth module analogous to `silph_synth.rs` (rounds 18-21):
- Synthesize a singleton-like object with the right vtable
- Spawn a guest thread at the right entry with this object as r3
- Let the dispatch chain do the rest
Rounds 18-21 took 4 rounds to land cluster A's analog and ended at "workers run live but idle" because of missing foreign-pointer fields. Cluster B will face similar challenges.
## Sub-round breakdown (estimated 5-8 rounds)
### 34.α — Probe canary's dispatcher singleton (1 round)
Capture canary's runtime state at `sub_821741C8` entry:
- `r3 = 0xBCA44C00` (canary tid=6's dispatcher singleton)
- Dump `r3..r3+0x80` to identify all fields
- Note vtable address at `[r3+0]`
```bash
WINEDEBUG=-all wine xenia_canary.exe --mute=true --audit_handle_lifecycle=true \
--audit_jit_prolog_pc=0x821741C8 --audit_jit_prolog_r3_bytes=128 \
--audit_jit_prolog_mem_dump=<vtable_va_from_r3+0> \
...
```
### 34.β — Probe full vtable layout (1 round)
Read the vtable bytes statically from the PE (canary's `[r3+0]` IS a static XEX VA — same trick as round 21):
- Read 32-64 slots from PE at file offset = vtable VA - 0x82000000
- Confirm slot 9 = `sub_821C7CB8` and `vtable+0x24` thunk to `sub_821741C8`
- Look at all other slots — do any reference deep guest code that needs more init?
Cross-reference each slot's DB reach. If a slot is the dispatcher's own method body, it'll be called from within the chain — needs to exist.
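
The static vtable read in 34.β can be sketched as follows. It assumes a flat image buffer where file offset equals VA minus `0x82000000` (as stated above) and 4-byte big-endian slots, so slot 9 sits at `vtable+0x24`. The function name and the vtable VA used in any test harness are hypothetical.

```rust
/// Image base used for the VA-to-offset conversion in this plan.
const IMAGE_BASE: u32 = 0x8200_0000;

/// Read one 4-byte big-endian vtable slot from a flat image buffer.
/// Returns None if the slot lies outside the buffer.
fn vtable_slot(image: &[u8], vtable_va: u32, slot: u32) -> Option<u32> {
    let va = vtable_va.checked_add(slot.checked_mul(4)?)?;
    let off = va.checked_sub(IMAGE_BASE)? as usize;
    let w = image.get(off..off.checked_add(4)?)?;
    Some(u32::from_be_bytes([w[0], w[1], w[2], w[3]]))
}
```

Looping `slot` over `0..64` and printing each target would give the slot table to cross-reference against the DB.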
### 34.γ — Skeleton synth + thread spawn (1 round)
Create `crates/xenia-kernel/src/silph_ui_synth.rs` mirroring `silph_synth.rs` structure:
```rust
pub fn spawn_silph_ui_dispatcher(
    state: &mut KernelState,
    mem: &GuestMemory,
    scheduler: &mut Scheduler,
) -> Result<u32, &'static str> {
    // Idempotent: a second call returns the already-synthesized context.
    if state.silph_ui_synth_done {
        return Ok(state.silph_ui_synth_ctx);
    }
    // Allocate ~0x100-0x200 bytes for the dispatcher singleton
    let ctx = state.heap_alloc(0x200, 16)?;
    mem.write_zeros(ctx, 0x200);
    // Install static-XEX vtable at [+0]
    mem.write_u32(ctx + 0x00, VTABLE_VA); // discovered in 34.β
    // Other init fields from 34.α dump
    // ...
    // Spawn dispatcher thread at sub_821748F0 with r3=ctx
    scheduler.spawn(SpawnParams {
        entry: 0x821748F0,
        start_context: ctx,
        create_suspended: false,
        // ... (remaining SpawnParams fields elided in this plan)
    })?;
    // Record completion only after the spawn succeeded.
    state.silph_ui_synth_done = true;
    state.silph_ui_synth_ctx = ctx;
    Ok(ctx)
}
```
Hook point: first reach of `sub_821CB030` in the existing silph factory chain (the call site that should normally trigger this dispatcher's creation in canary).
Add 3-mode env gate: `XENIA_SILPH_UI_SYNTH={unset|=suspend|=1}`.
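
One way to parse the three-mode gate is sketched below. The variable name and its three values come from the line above; the enum, the function names, and the reading of the modes (unset = off, `suspend` = synthesize but spawn suspended, `1` = fully on) are assumptions for illustration.

```rust
/// Parsed value of the XENIA_SILPH_UI_SYNTH gate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SynthMode {
    Off,
    Suspend,
    On,
}

/// Map the raw variable value to a mode; unknown values fall back to Off
/// so the default behavior stays unchanged.
fn parse_synth_mode(raw: Option<&str>) -> SynthMode {
    match raw.map(str::trim) {
        Some("suspend") => SynthMode::Suspend,
        Some("1") => SynthMode::On,
        _ => SynthMode::Off,
    }
}

/// Read the gate from the process environment.
fn synth_mode_from_env() -> SynthMode {
    parse_synth_mode(std::env::var("XENIA_SILPH_UI_SYNTH").ok().as_deref())
}
```

Splitting the pure parser from the environment read keeps the gate testable without mutating process-wide state.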
### 34.δ — Run + diagnose first crash (1 round)
Almost certainly crashes on a NULL deref of one of the singleton's fields. Use round 19's pattern:
- Probe at thread entry + early BB heads
- Identify the offset that's accessed
- Compare to canary's value at that offset
### 34.ε..η — Iterate on field fills (2-4 rounds)
Each crash identifies one more required field. Fill it. Re-run. Continue until workers idle (verdict D analog).
### 34.θ — Producer-side seeding (1 round)
Even with the dispatcher running, work-items may not flow. Per round 32 it's pool 3 that's starved (271 fires in canary). The producers are `sub_821CBEA8 / sub_821D24A0 / sub_821CD458` — they may need their own bootstrap. Probe what triggers them in canary.
## Verification at each stage
After every commit:
- `cargo test --release --workspace` — 765/765 must pass
- `XENIA_CACHE_PERSIST=1 XENIA_SILPH_UI_SYNTH=1 ./target/release/xenia-rs exec <ISO> -n 50000000 --trace-handles-focus=0x1218,0x1224,0x12a4,0x12ac`
- Check:
- No crash
- `sub_821741C8` fires
- `sub_82450b68` r4=3 fires increase
- Handle 0x1224 / 0x1218 transition out of NO_SIGNALS_DESPITE_WAITS
- Eventually: `VdSwap > 1, draws > 0`
## Risk register
- **High**: dispatcher singleton may require many more fields than the analog WorkerCtx (rounds 18-21 needed 8 KEVENTs + ring + descriptors + index table; UI dispatcher likely has similar scope)
- **High**: foreign-arena pointers in canary's heap (similar to round 19's `[+0x28/+0x2C/+0x30]`) may need their own synthesis
- **Medium**: cluster B's worker may itself spawn threads which need contexts which need... cascading scope
- **Low**: workspace tests breaking (probe infrastructure is solid)
- **Low**: existing iterate-2BE work regressing (it's on a separate branch)
## Off-ramps
If we hit a wall at any sub-round, the off-ramps are:
1. Land the infrastructure as opt-in (rounds 18-21 pattern) and ship cluster A + cluster B both as opt-in env vars
2. Drop cluster B entirely and PR the iterate-2BE work to master (production-ready architectural fix)
3. Pivot to lockstep diff of inflate function (round 30 hypothesis (i)) if cluster B keeps producing crash-fix layers
## Branch plan
New branch: `iterate-2BF/silph-ui-synth` off `iterate-2BF/synthetic-silph-spawn` HEAD `40f208e`. Each sub-round = 1 commit. All commits opt-in via env var; default behavior unchanged.
## When ready to execute
Dispatch with the prompt recommended by the round-33 agent, starting at sub-round 34.α.


@@ -0,0 +1,66 @@
AUDIT-PC-PROBE pc=0x8216ea68 tid=1 hw=0 cycle=5362918 lr=0x824ab8e0 r3=0x00000000 r11=0x00000000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
AUDIT-PC-PROBE pc=0x822f1aa8 tid=1 hw=0 cycle=6181256 lr=0x8216ee14 r3=0x40d09a40 r11=0x40111910 [r3+0]=0x00000021 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x40541a40 [r3+0x30]=0x00000000
AUDIT-PC-PROBE pc=0x822f1b38 tid=1 hw=0 cycle=6181641 lr=0x822f1b38 r3=0x00000001 r11=0x824b0000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
AUDIT-PC-PROBE pc=0x821746b0 tid=1 hw=0 cycle=9229300 lr=0x82173c38 r3=0x40ba9a80 r11=0x00000000 [r3+0]=0x40111910 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
AUDIT-PC-PROBE pc=0x821748f0 tid=13 hw=1 cycle=0 lr=0xbcbcbcbc r3=0x4024a840 r11=0x00000000 [r3+0]=0x40ba9a80 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x4250dec0
=== Final State ===
PC: 0x00000000
LR: 0xbcbcbcbc
CTR: 0x00000000
CR: 0x00000000
XER: CA=0 OV=0 SO=0
=== Thread diagnostics ===
hw=0 idx=0 tid=1 state=Blocked(WaitAny { handles: [4208], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x700ff6e0
r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a72328
r8=0x43b77284 r9=0x43b77328 r10=0x00000001 r11=0x00000103 r12=0x82173c64 r13=0x7fff0000
hw=0 idx=1 tid=11 state=Blocked(WaitAny { handles: [2190094916, 2190094880], deadline: None }) pc=0x824d2a94 lr=0x824d2a94 sp=0x71497d90
r0=0x00000000 r3=0x00000000 r4=0x71497de0 r5=0x00000001 r6=0x00000003 r7=0x00000001
r8=0x00000000 r9=0x00000000 r10=0x71497df0 r11=0x828a3244 r12=0xbcbcbcbc r13=0x4b9f1000
hw=1 idx=0 tid=2 state=Blocked(WaitAny { handles: [2189887804], deadline: None }) pc=0x824a95f8 lr=0x824a95f8 sp=0x710ffd20
r0=0x0000030c r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x00000001 r9=0x6f000000 r10=0x824a9178 r11=0x82870000 r12=0x824a94f0 r13=0x4acc3000
hw=1 idx=1 tid=13 state=Blocked(WaitAny { handles: [4216], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x715a7a20
r0=0x821511d0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b77334 r9=0x43b77334 r10=0x40541f80 r11=0x00000001 r12=0x821cb1e0 r13=0x4d1d4000
hw=2 idx=0 tid=7 state=Blocked(WaitAny { handles: [1111821148], deadline: Some(42946672) }) pc=0x824cd4f4 lr=0x824cd4f4 sp=0x71187e60
r0=0x00000000 r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x71187eb0
r8=0x00000000 r9=0x00000000 r10=0x00000002 r11=0x00000002 r12=0xbcbcbcbc r13=0x4b1d6000
hw=2 idx=1 tid=8 state=Blocked(WaitAny { handles: [4176, 4128], deadline: None }) pc=0x824ab214 lr=0x824ab214 sp=0x71287c90
r0=0x00000000 r3=0x00000000 r4=0x71287cf0 r5=0x00000001 r6=0x00000001 r7=0x00000000
r8=0x00000000 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x822f1ff0 r13=0x4b90a000
hw=3 idx=0 tid=4 state=Blocked(WaitAny { handles: [4120], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7112fb80
r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b7732c r9=0x828f0000 r10=0x00000008 r11=0x00000000 r12=0x8245a660 r13=0x4adc6000
hw=3 idx=1 tid=5 state=Blocked(WaitAny { handles: [4224], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7116fbe0
r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b7732c r9=0x828f0000 r10=0x00000001 r11=0x00000000 r12=0x82458b34 r13=0x4adc8000
hw=4 idx=0 tid=9 state=Ready pc=0x824d140c lr=0x824d22b4 sp=0x71387df0
r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x4b9ec000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ec000
hw=5 idx=0 tid=3 state=Blocked(WaitAny { handles: [4112], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7111fdf0
r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x00000a10
r8=0x00000010 r9=0x00000000 r10=0x00009030 r11=0x00000000 r12=0x82181988 r13=0x4adc4000
hw=5 idx=1 tid=6 state=Ready pc=0x824ab214 lr=0x824ab214 sp=0x7117fc60
r0=0x821511a0 r3=0x00000001 r4=0x7117fcc0 r5=0x00000001 r6=0x00000001 r7=0x00000000
r8=0x7117fcb0 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x82458d68 r13=0x4adca000
hw=5 idx=2 tid=10 state=Ready pc=0x824d1404 lr=0x824d22b4 sp=0x71487e00
r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x4b9ee000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ee000
hw=5 idx=3 tid=12 state=Ready pc=0x824aa6a4 lr=0x824aa6a4 sp=0x714a7da0
r0=0x00000000 r3=0x000000ff r4=0x00000020 r5=0x714a7df4 r6=0x00000000 r7=0x00000000
r8=0x00000000 r9=0x00000000 r10=0x00000000 r11=0x00000001 r12=0x8217898c r13=0x4d1d2000
-- Handle waiter lists --
handle=0x00001020 Semaphore(0/2147483647) waiters(tid)=[8]
handle=0x42450b5c Event(sig=false, mr=true) waiters(tid)=[7]
handle=0x828a3244 Event(sig=false, mr=false) waiters(tid)=[11]
handle=0x00001018 Semaphore(0/2147483647) waiters(tid)=[4]
handle=0x8287093c Event(sig=false, mr=false) waiters(tid)=[2]
handle=0x00001070 Thread(id=13, exit=None) waiters(tid)=[1]
handle=0x00001080 Event(sig=false, mr=false) waiters(tid)=[5]
handle=0x00001078 Event(sig=false, mr=false) waiters(tid)=[13]
handle=0x828a3220 Event(sig=false, mr=true) waiters(tid)=[11]
handle=0x00001050 Event(sig=false, mr=true) waiters(tid)=[8]
handle=0x00001010 Event(sig=false, mr=true) waiters(tid)=[3]


@@ -0,0 +1,167 @@
# Round-A1..A4 findings — canary tid=6 spawn chain & divergence frontier
## Anchor reframe (round-37 misread corrected)
The "factory/registry layer divergence at [0x828E1F08]" framing is falsified.
Both engines install the SAME static-XEX `.rdata` vtable `0x820A183C` at the
singleton's `[+0]`. The instance VAs differ only because of ε-class allocator
divergence (audit-043).
| Probe | Canary | Ours |
|----------------------------|----------------------|----------------------|
| `[0x828E1F08]` | 0xBC22C910 (heap) | 0x40111910 (heap) |
| `[[0x828E1F08]+0]` vtable | 0x820A183C | 0x820A183C (SAME) |
| `vtable[+0]` thunk | 0x82175330 | 0x82175330 (SAME) |
| `vtable[+8]` thunk | 0x82175340 → b sub_821741C8 | SAME (vtable bytes from XEX `.rdata`) |
The thunks at 0x82175330+ are 8-byte `lwz r3, 8(r3); b <real_method>`
trampolines. Slot 2 (`+0x08`) is the worker dispatch entry that round 33
identified as firing 471× in canary tid=6 / 0× in ours.
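A sketch of how such a trampoline can be resolved statically. It relies only on the standard PowerPC I-form branch encoding (primary opcode 18, 24-bit `LI` field, `AA` bit) and on the 8-byte thunk stride stated above; the function names are hypothetical, and the caller supplies instruction words read from the image.

```rust
/// Decode the target of a PowerPC I-form branch (`b`/`bl`/`ba`/`bla`).
/// `pc` is the address of the branch instruction itself.
/// Returns None if `word` is not primary opcode 18.
fn decode_branch_target(pc: u32, word: u32) -> Option<u32> {
    if word >> 26 != 18 {
        return None;
    }
    // LI || 0b00 is a 26-bit signed byte displacement.
    let raw = word & 0x03FF_FFFC;
    let disp = ((raw << 6) as i32) >> 6; // sign-extend from bit 25
    let absolute = word & 0x2 != 0; // AA bit
    Some(if absolute { disp as u32 } else { pc.wrapping_add(disp as u32) })
}

/// Address of the thunk for vtable slot `slot`, given 8-byte thunks laid out
/// consecutively from `first_thunk` (the layout the table above implies).
fn thunk_va(first_thunk: u32, slot: u32) -> u32 {
    first_thunk + slot * 8
}

/// Resolve a thunk of the form `lwz r3, 8(r3); b <method>` to its method.
/// `words` holds the thunk's two instruction words, already byte-swapped
/// from big-endian.
fn resolve_thunk(thunk: u32, words: [u32; 2]) -> Option<u32> {
    const LWZ_R3_8_R3: u32 = 0x8063_0008; // encoding of lwz r3, 8(r3)
    if words[0] != LWZ_R3_8_R3 {
        return None;
    }
    // The branch is the second instruction, 4 bytes into the thunk.
    decode_branch_target(thunk + 4, words[1])
}
```

With `first_thunk = 0x82175330`, `thunk_va(_, 2)` gives `0x82175340`, the slot-2 thunk in the table.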
## A.1 — Canary dispatcher loop is in sub_822F1AA8 on tid=6
Probe `--audit_jit_prolog_pc=0x821741C8 --audit_jit_prolog_r3_bytes=256` on
canary (35 s):
- ~1678 fires of sub_821741C8 on **tid=6**
- r3 at entry = `0xBCCC4A80` (the inner sub-object of the silph::UImpl
singleton — extracted via the thunk's `lwz r3, 8(r3)`)
- LR at entry = `0x822F1D5C` (return PC after the `bctrl` at 0x822F1D58 inside
sub_822F1AA8)
- Singleton's `[+C0..+D0]` UTF-16 spells "HF Frequency" (a UI label)
The dispatch site in canary (the `bctrl`) is at PC 0x822F1D58 inside
sub_822F1AA8:
```
0x822F1D40: lwz r3, 7944(r25) ; r3 = [r25+0x1F08] = [0x828E1F08]
0x822F1D4C: lwz r11, 0(r3) ; vtable
0x822F1D50: lwz r11, 8(r11) ; vtable[+8] = thunk 0x82175340
0x822F1D54: mtctr r11
0x822F1D58: bctrl ; → 0x82175340 → b 0x821741C8
```
## A.2 — Canary tid=6 spawn site is sub_821746B0 at PC 0x82174824
Enumeration of `ExCreateThread` calls in canary (35 s, 21 unique tuples):
```
entry=821748F0 start_ctx=BC365700 lr=824AC5F0 guest_lr=82174828 ← silph dispatcher #1
entry=821748F0 start_ctx=BC366DA0 lr=824AC5F0 guest_lr=82174828 ← silph dispatcher #2
```
PC `0x82174824` is the `bl 0x82172370` (the `ExCreateThread` thunk) inside
`sub_821746B0`. The setup is:
```
0x8217480C: lis r11, 0x8217
0x82174810: li r7, 0
0x82174814: li r6, 4 ; priority
0x82174818: mr r5, r29 ; start_ctx
0x8217481C: addi r4, r11, 18672 ; r4 = 0x821748F0 (entry)
0x82174820: li r3, 0
0x82174824: bl 0x82172370 ; ExCreateThread
```
The entry `0x821748F0` is a thread main that calls `bl 0x821749C0` (the
inner dispatch).
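The `lis`/`addi` arithmetic behind the `r4 = 0x821748F0` annotation can be checked with a small helper. This is a sketch with hypothetical function names; it relies only on the architectural rule that `addi` and D-form displacements are sign-extended 16-bit immediates.

```rust
/// Effective 32-bit value built by `lis rX, hi` followed by `addi rY, rX, simm`.
/// `addi` sign-extends its 16-bit immediate, so a negative low half
/// borrows from the high half.
fn lis_addi(hi: u16, simm: i16) -> u32 {
    ((hi as u32) << 16).wrapping_add(simm as i32 as u32)
}

/// Effective address of a D-form load such as `lwz rT, d(rA)`.
fn d_form_ea(base: u32, d: i16) -> u32 {
    base.wrapping_add(d as i32 as u32)
}
```

`lis_addi(0x8217, 18672)` evaluates to `0x821748F0`, matching the listing. The sign extension matters when the low half is `0x8000` or above, where a disassembler shows a negative immediate and a `lis` value one higher than the address's upper half.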
## A.3 — sub_822F1AA8 spawns a SECOND thread at 0x822F1B08
The dispatch-loop function `sub_822F1AA8` itself ALSO spawns a thread at
PC 0x822F1B08 with entry=`sub_822F1EE0` and `start_ctx=BCE24A40`:
```
0x822F1AEC: lis r11, 0x822F
0x822F1AFC: addi r4, r11, 7904 ; r4 = 0x822F1EE0
0x822F1B08: bl 0x82172370 ; ExCreateThread
```
sub_822F1EE0 → sub_822F1F20 contains its own atomic state-machine + wait loop.
## A.3' — sub_822F1AA8 has exactly 2 callers, both in sub_8216EA68
```
source=0x8216ECCC source_func=0x8216EA68 kind=call
source=0x8216EE10 source_func=0x8216EA68 kind=call
```
So sub_8216EA68 is the only function that drives sub_822F1AA8.
## A.4 — Ours' divergence is INSIDE the spawned thread, NOT at the spawn
Mirror-probed ours at `sub_821746B0` body BB heads (parallel mode, 50M
instructions, XENIA_CACHE_PERSIST=1):
| PC | Fires | Notes |
|-------------|-------|------------------------------------------------|
| 0x821746B0 | 1 | Entry. r3=0x40ba9a80 |
| 0x821746E0 | 1 | After `bl 0x8284DCFC` (critical-section) |
| 0x82174798 | 1 | After the early `beq` (r28==0 branch) |
| 0x821747B8 | 1 | **Past the gate**: `[0x828E2B14]=0x40105000` non-NULL; `bl 0x82150EF8` returned r3=0x4024a840 (NON-NULL) |
| 0x821747D8 | 1 | After the inner `bl 0x821723F0` |
| 0x8217480C | 1 | Enters the spawn block |
| 0x82174828 | 1 | **Post-`bl ExCreateThread`**, r3=0x1070 = thread handle |
**OURS DOES SPAWN THE THREAD VIA THIS SITE.** The returned handle 0x1070 is
**tid=13's thread handle** (per round 37 final state). So **ours' tid=13 IS
the same logical thread as canary's tid=6** — spawned by the identical call
site with the same entry (0x821748F0).
## A.4' — Divergence is INSIDE the spawned thread's body
Round 37's frame trail for ours' tid=13 wedge:
`0x821CB1E0 → 0x821CBAE0 → 0x821CC454 → 0x821C4F18 → 0x82174A80`
The LAST frame `0x82174A80` is **inside sub_821749C0** (= the inner dispatch
called from sub_821748F0). It is the return address of the vtable dispatch
(`mtctr` at 0x82174A78, `bctrl` at 0x82174A7C, target `[[r30+0]+16]`):
```
0x82174a64: mr r3, r30 ; r3 = some object
0x82174a68: lwz r11, 0(r30)
0x82174a6c: lwz r4, 4(r29)
0x82174a70: lwz r5, 8(r31)
0x82174a74: lwz r11, 16(r11) ; r11 = vtable[+0x10]
0x82174a78: mtctr r11
0x82174a7c: bctrl ; dispatch
0x82174a80: lwz r3, 0(r29) ; ← wedge frame top (LR after bctrl)
```
So `sub_821749C0`'s vtable[+0x10] dispatch on tid=13/tid=6's `r30` object
lands at audit-049 territory in ours (chain through sub_821CB030+0x128 that
ends waiting forever on handle 0x1078). In canary, the same dispatch on the
same object SHOULD land somewhere that ultimately reaches sub_822F1AA8's
dispatch loop and runs sub_821741C8 1678× via vtable[+8].
**The object `r30` is the result of `bl 0x821CF3F0`** at PC 0x821749DC. So
sub_821CF3F0 returns a registry-lookup object, and the body of the method in
slot +0x10 of that object's vtable determines whether the thread wedges or runs.
## Phase B classification
Class 3 — **Missing init-time precondition**. Ours reaches the spawn site,
ours' tid=13 enters the chain, ours' tid=13 enters sub_821749C0, but the
vtable[+0x10] dispatch (`bctrl` at PC 0x82174A7C) in ours lands in audit-049 territory
(wait forever on 0x1078) rather than continuing through the canonical chain
toward sub_822F1AA8's outer dispatch loop.
Possible classes to refine in next round:
- **3a**: same vtable but state-dependent — `r30`'s field at a specific offset
differs in ours vs canary, causing the method body to take a different
branch.
- **3b**: the vtable in `r30` is DIFFERENT in ours vs canary (e.g., ours has
a base-class vtable but canary has a derived-class vtable).
- **4**: synthesis fallback — spawn a SECOND thread that runs sub_822F1AA8's
dispatch loop directly, bypassing the wedged sub_821749C0 chain.
## Next probe (A.4.5)
Probe both engines at sub_821749C0 entry filtering tid=13 (ours) / tid=6
(canary), capturing:
- `r3` and `r4` at entry (the factory-output object and the ctx)
- After the `bl 0x821CF3F0` at 0x821749DC: r30 (= sub_821CF3F0
return, the object whose vtable is dispatched by the `bctrl` at 0x82174A7C)
- At PC 0x82174A78 (the `mtctr` feeding the divergent `bctrl`): r30, `[r30+0]`
(vtable), and `vtable[+0x10]` (the dispatch target)
If ours and canary have IDENTICAL `vtable[+0x10]` targets but the method
body's behavior differs → class 3a (state divergence). If targets differ →
class 3b (vtable identity divergence).
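The 3a/3b decision rule can be written down as a sketch. `DispatchProbe`, `Divergence`, and `classify` are hypothetical names, and the probe values would come from the captures listed above. Only the dispatch targets are compared: instance addresses are expected to differ between engines because of the allocator divergence noted earlier.

```rust
/// What one engine's probe captured at the dispatch site.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DispatchProbe {
    vtable: u32, // [r30+0], kept for the report; not used in the decision
    target: u32, // [[r30+0]+0x10], the address the bctrl jumps to
}

#[derive(Debug, PartialEq, Eq)]
enum Divergence {
    /// Same dispatch target: the difference must come from object state.
    Class3aState,
    /// Different dispatch target: the objects carry different methods.
    Class3bVtable,
}

/// Apply the rule from the text: identical targets => 3a, otherwise 3b.
fn classify(ours: DispatchProbe, canary: DispatchProbe) -> Divergence {
    if ours.target == canary.target {
        Divergence::Class3aState
    } else {
        Divergence::Class3bVtable
    }
}
```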


@@ -0,0 +1,91 @@
AUDIT-PC-PROBE pc=0x821746b0 tid=1 hw=0 cycle=9228833 lr=0x82173c38 r3=0x40ba9a80 r11=0x00000000 [r3+0]=0x40111910 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x821746b0 tid=1 cycle=9228833
AUDIT-PC-PROBE pc=0x821746e0 tid=1 hw=0 cycle=9228856 lr=0x821746e0 r3=0x00000000 r11=0x00000000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x821746e0 tid=1 cycle=9228856
AUDIT-PC-PROBE pc=0x82174798 tid=1 hw=0 cycle=9228859 lr=0x821746e0 r3=0x00000000 r11=0x00000000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x82174798 tid=1 cycle=9228859
AUDIT-PC-PROBE pc=0x821747b8 tid=1 hw=0 cycle=9229012 lr=0x821747ac r3=0x4024a840 r11=0x4024a840 [r3+0]=0x4024ace0 [[r3+0]+24]=0x43777290 [r3+0x0C]=0x4024a820 [r3+0x30]=0x4250dec0
AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x821747b8 tid=1 cycle=9229012
AUDIT-PC-PROBE pc=0x821747d8 tid=1 hw=0 cycle=9229440 lr=0x821747cc r3=0x4024a840 r11=0xffffffff [r3+0]=0x40ba9a80 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x4250dec0
AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x821747d8 tid=1 cycle=9229440
AUDIT-PC-PROBE pc=0x8217480c tid=1 hw=0 cycle=9229443 lr=0x821747cc r3=0x4024a840 r11=0xffffffff [r3+0]=0x40ba9a80 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x4250dec0
AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x8217480c tid=1 cycle=9229443
AUDIT-PC-PROBE pc=0x82174828 tid=1 hw=0 cycle=9229509 lr=0x82174828 r3=0x00001070 r11=0x824b0000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x82174828 tid=1 cycle=9229509
=== Final State ===
PC: 0x824ac578
LR: 0x824ac578
CTR: 0x82153bf0
CR: 0x24000028
XER: CA=0 OV=0 SO=0
r0 : 0x0000000082153bf0
r1 : 0x00000000700ff6e0
r2 : 0x0000000020000000
r4 : 0x0000000000000001
r7 : 0x0000000003a72328
r8 : 0x0000000043b77284
r9 : 0x0000000043b77328
r10: 0x0000000000000001
r11: 0x0000000000000103
r12: 0x0000000082173c64
r13: 0x000000007fff0000
r18: 0x0000000040d09a7c
r23: 0x00000000828f3844
r26: 0x000000004024a620
r27: 0x00000000820a17a8
r31: 0x0000000000001070
=== Thread diagnostics ===
hw=0 idx=0 tid=1 state=Blocked(WaitAny { handles: [4208], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x700ff6e0
r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a72328
r8=0x43b77284 r9=0x43b77328 r10=0x00000001 r11=0x00000103 r12=0x82173c64 r13=0x7fff0000
hw=0 idx=1 tid=11 state=Blocked(WaitAny { handles: [2190094916, 2190094880], deadline: None }) pc=0x824d2a94 lr=0x824d2a94 sp=0x71497d90
r0=0x00000000 r3=0x00000000 r4=0x71497de0 r5=0x00000001 r6=0x00000003 r7=0x00000001
r8=0x00000000 r9=0x00000000 r10=0x71497df0 r11=0x828a3244 r12=0xbcbcbcbc r13=0x4b9f1000
hw=1 idx=0 tid=2 state=Blocked(WaitAny { handles: [2189887804], deadline: None }) pc=0x824a95f8 lr=0x824a95f8 sp=0x710ffd20
r0=0x0000030c r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x00000001 r9=0x6f000000 r10=0x824a9178 r11=0x82870000 r12=0x824a94f0 r13=0x4acc3000
hw=1 idx=1 tid=13 state=Blocked(WaitAny { handles: [4216], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x715a7a20
r0=0x821511d0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b77334 r9=0x43b77334 r10=0x40541f80 r11=0x00000001 r12=0x821cb1e0 r13=0x4d1d4000
hw=2 idx=0 tid=7 state=Blocked(WaitAny { handles: [1111821148], deadline: Some(42946672) }) pc=0x824cd4f4 lr=0x824cd4f4 sp=0x71187e60
r0=0x00000000 r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x71187eb0
r8=0x00000000 r9=0x00000000 r10=0x00000002 r11=0x00000002 r12=0xbcbcbcbc r13=0x4b1d6000
hw=2 idx=1 tid=8 state=Blocked(WaitAny { handles: [4176, 4132], deadline: None }) pc=0x824ab214 lr=0x824ab214 sp=0x71287c90
r0=0x00000000 r3=0x00000000 r4=0x71287cf0 r5=0x00000001 r6=0x00000001 r7=0x00000000
r8=0x00000000 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x822f1ff0 r13=0x4b90a000
hw=3 idx=0 tid=4 state=Blocked(WaitAny { handles: [4120], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7112fb80
r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b7732c r9=0x828f0000 r10=0x00000008 r11=0x00000000 r12=0x8245a660 r13=0x4adc6000
hw=3 idx=1 tid=5 state=Blocked(WaitAny { handles: [4224], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7116fbe0
r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b7732c r9=0x828f0000 r10=0x00000001 r11=0x00000000 r12=0x82458b34 r13=0x4adc8000
hw=4 idx=0 tid=9 state=Ready pc=0x824d140c lr=0x824d22b4 sp=0x71387df0
r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x4b9ec000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ec000
hw=5 idx=0 tid=3 state=Blocked(WaitAny { handles: [4112], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7111fdf0
r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x00000a10
r8=0x00000010 r9=0x00000000 r10=0x00009030 r11=0x00000000 r12=0x82181988 r13=0x4adc4000
hw=5 idx=1 tid=6 state=Ready pc=0x824ab214 lr=0x824ab214 sp=0x7117fc60
r0=0x821511a0 r3=0x00000001 r4=0x7117fcc0 r5=0x00000001 r6=0x00000001 r7=0x00000000
r8=0x7117fcb0 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x82458d68 r13=0x4adca000
hw=5 idx=2 tid=10 state=Ready pc=0x824d140c lr=0x824d22b4 sp=0x71487e00
r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x4b9ee000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ee000
hw=5 idx=3 tid=12 state=Ready pc=0x824aa6a4 lr=0x824aa6a4 sp=0x714a7da0
r0=0x00000000 r3=0x000000ff r4=0x00000020 r5=0x714a7df4 r6=0x00000000 r7=0x00000000
r8=0x00000000 r9=0x00000000 r10=0x00000000 r11=0x00000001 r12=0x8217898c r13=0x4d1d2000
-- Handle waiter lists --
handle=0x00001024 Semaphore(0/2147483647) waiters(tid)=[8]
handle=0x00001010 Event(sig=false, mr=true) waiters(tid)=[3]
handle=0x00001070 Thread(id=13, exit=None) waiters(tid)=[1]
handle=0x00001080 Event(sig=false, mr=false) waiters(tid)=[5]
handle=0x828a3244 Event(sig=false, mr=false) waiters(tid)=[11]
handle=0x00001018 Semaphore(0/2147483647) waiters(tid)=[4]
handle=0x00001050 Event(sig=false, mr=true) waiters(tid)=[8]
handle=0x00001078 Event(sig=false, mr=false) waiters(tid)=[13]
handle=0x8287093c Event(sig=false, mr=false) waiters(tid)=[2]
handle=0x828a3220 Event(sig=false, mr=true) waiters(tid)=[11]
handle=0x42450b5c Event(sig=false, mr=true) waiters(tid)=[7]


@@ -0,0 +1,136 @@
# Phase A synthesis — canary tid=6 IS the main thread; the wedge is sub_822F1AA8's loop exit
## Top-line finding
**Canary's `tid=6` is canary's main thread.** Confirmed by probing `entry_point`
(`sub_824AB748`) with `--audit_jit_prolog_pc=0x824AB748`: fires 1× on
`tid=00000006` with `lr=BCBCBCBC` (= OS-initial / no caller). Ours numbers
its main thread `tid=1`. Same logical thread; different label.
Therefore "tid=6 fires sub_821741C8 471×" (round 33) means **the main thread**
loops inside `sub_822F1AA8` firing `sub_821741C8` ~1678×/30s in canary. In
ours, the main thread (tid=1) runs `sub_822F1AA8` ONCE, exits the loop, and
proceeds to thread-join on the spawned init thread (handle 0x1070 = tid=13),
which is itself blocked forever on handle 0x1078.
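The join chain described here (tid=1 waits on thread handle 0x1070, whose thread tid=13 waits on event 0x1078) can be followed mechanically from a final-state dump. A minimal sketch with hypothetical types; note the thread diagnostics print handles in decimal (4208, 4216) while the waiter lists use hex (0x1070, 0x1078).

```rust
use std::collections::HashMap;

/// What a blocked thread is waiting on, reduced to the two cases that matter here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Waitable {
    Thread(u32), // join on another thread, by tid
    Event,       // any non-thread object
}

/// Follow "tid waits on handle; handle is Thread(tid')" links from `start`
/// and return the tids visited. The walk stops at a thread that is not
/// waiting on a thread handle, or at a repeated tid (a join cycle).
fn wait_chain(
    start: u32,
    waits_on: &HashMap<u32, u32>,     // tid -> handle it blocks on
    handles: &HashMap<u32, Waitable>, // handle -> object kind
) -> Vec<u32> {
    let mut chain = vec![start];
    let mut tid = start;
    while let Some(Waitable::Thread(next)) =
        waits_on.get(&tid).and_then(|h| handles.get(h))
    {
        if chain.contains(next) {
            break;
        }
        chain.push(*next);
        tid = *next;
    }
    chain
}
```

The last tid in the returned chain is the thread whose wait actually needs explaining; for the dump above that would be tid=13.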
## Call chain (identical in both engines, different runtime behavior)
```
entry_point (sub_824AB748)
├─ sub_824ACB38 CRT-driven fnptr-array iterator (audit-050 region)
├─ ...
└─ sub_8216EA68 Many local calls including:
├─ ExCreateThread(entry=sub_8217F0F8 ...) ; sibling thread
├─ sub_822F1AA8(controller=...) ; FIRST call (PC 0x8216ECCC)
└─ sub_822F1AA8(controller=0xBCE24A40 canary / ; SECOND call (PC 0x8216EE10)
0x40d09a40 ours) ↑ this is the loop
```
The SECOND call is what runs the dispatcher loop. Its LR = 0x8216EE14.
Confirmed in both engines.
## sub_822F1AA8 loop structure
```
0x822F1AA8: entry, r30 = r3 (controller)
0x822F1AEC-0x822F1B08: ExCreateThread(entry=sub_822F1EE0, ctx=r30) → r29 = handle
0x822F1B30-0x822F1B34: bl 0x824AA8B0(r3=r29) ; ?
0x822F1B38-0x822F1B4C: first bctrl → vtable[+0] of [0x828E1F08]
0x822F1B50-0x822F1B74: setup, bl 0x824AA330 INFINITE wait on [r22+32]
0x822F1B80-0x822F1BA8: post-wait setup; [r30+0] |= 0x2
0x822F1BB0-0x822F1BBC: TOP-OF-LOOP CHECK: if [r30+0] & 0x10000000 → goto 0x822F1E10 (exit)
0x822F1BCC..0x822F1DEC: loop body (includes the vtable[+8] bctrl → sub_821741C8 at PC 0x822F1D58)
0x822F1DEC-0x822F1DFC: bl 0x824AA330 INFINITE wait on [r23+0]
0x822F1E00-0x822F1E0C: END-OF-ITERATION CHECK: if [r30+0] & 0x10000000 == 0 → goto 0x822F1BCC (re-loop)
0x822F1E10-0x822F1E18: EXIT: [r30+0] |= 0x02000000 (set MSB-6 = LSB-25)
0x822F1E1C-0x822F1E24: release something via bl 0x824AA2F0
0x822F1E28-0x822F1E30: bl 0x824AA330 INFINITE on [r30+28] = SPAWNED THREAD HANDLE (thread join!)
0x822F1E40: bl 0x824AA3E0
0x822F1E44-0x822F1E5C: final cleanup: vtable[+24] bctrl on [0x828E1F08]
0x822F1E60-0x822F1E78: [r30+0] = 0, then [r30+0] |= 1; bl 0x824567E0
0x822F1E7C-0x822F1E88: epilogue
```
**Loop exit gate**: `[r30+0] & 0x10000000` (bit 28 LSB / bit 3 MSB). Set →
exit. Both top-of-loop check (0x822F1BBC) and end-of-iteration check
(0x822F1E0C) gate on the same bit.
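The gate and the two bit-numbering conventions can be pinned down in a few lines. This is a sketch; the constant and function names are hypothetical, while the mask values come from the loop structure above.

```rust
/// Loop-exit flag tested at 0x822F1BBC and 0x822F1E0C.
const EXIT_BIT: u32 = 0x1000_0000;
/// Flag OR-ed into [r30+0] during post-wait setup.
const SETUP_BIT: u32 = 0x2;

/// True when the controller flag word tells the dispatcher loop to exit.
fn loop_should_exit(flags: u32) -> bool {
    (flags & EXIT_BIT) != 0
}

/// Convert an LSB-0 bit index of a 32-bit word to PowerPC MSB-0 numbering.
fn msb0_index(lsb0: u32) -> u32 {
    31 - lsb0
}
```

`msb0_index(28)` is 3 and `msb0_index(25)` is 6, matching the "bit 28 LSB / bit 3 MSB" and "MSB-6 = LSB-25" annotations. A flag word of `0x21`, with or without `SETUP_BIT`, does not satisfy `loop_should_exit`.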
## What's different between engines
| Engine | [r30+0] at entry | Loop iterations | Exits sub_822F1AA8? |
|--------|------------------|------------------|----------------------|
| canary | 0x21 (per probe) | ~1678+ in 30s | NO (stays in loop) |
| ours | 0x21 (per probe) | 0 (probes show none of the loop-body PCs fire after entry) | YES (exits quickly) |
Both engines have `[r30+0]=0x21` at entry — bit 28 NOT set. After the `ori
r11, r11, 0x2` at 0x822F1B90, both should have `[r30+0]=0x23`. Bit 28 still
not set.
So **some code sets bit 28 on [r30+0] between sub_822F1AA8 entry and the
loop check** in ours but not in canary.
Mem-watch on 0x40d09a40 (ours' controller VA) shows **zero guest writes** in
my 50M-instruction parallel run. Possible reasons:
- The setter writes from kernel/runtime code that mem-watch doesn't capture
(kernel-host store, not guest JIT store)
- The setter writes via a computed alias (different VA but same backing)
- The bit IS set via a probe-quantum-elided JIT store
## Phase B classification
**Class 3a — state-divergence on the controller object**. The vtable
identity is the same (round-37 confirmed `0x820A183C` in both). The
controller object's bit 28 of `[+0]` evolves differently during the setup
between sub_822F1AA8 entry and the loop check.
Class 4 (synthesis) is now LESS attractive: ours' main thread DOES reach
sub_822F1AA8 with the right controller. We don't need to spawn the
dispatcher — we need to PREVENT the main thread from exiting the loop.
## Pragmatic next step — JIT instrumentation to find bit-28 setter
Most direct diagnostic: add a JIT hook in xenia-cpu that, for guest stores
executed at PCs in the range [0x822F1AA8, 0x822F1E10), captures the guest PC
and the written value whenever the stored word has bit 28 set, whatever the
target address. This identifies the exact PC that sets the loop-exit bit.
Alternative: extend `--mem-watch` to also capture kernel-side stores by
hooking the GuestMemory write path at the kernel-state level.
Even simpler: add a one-shot `--bit-watch=ADDR:MASK` cvar that fires when
the value at ADDR has any bit in MASK transition from 0→1, regardless of
who wrote it. This is the cleanest diagnostic for this exact pattern.
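The core of such a watch is a rising-edge test on sampled values, which is what makes it independent of who performed the write. A minimal sketch, assuming the host samples the word at some regular point (for example once per scheduler quantum); `BitWatch` is a hypothetical type, not an existing cvar implementation.

```rust
/// One-shot watch for 0 -> 1 transitions of masked bits in a 32-bit word.
struct BitWatch {
    mask: u32,
    last: u32,
    fired: bool,
}

impl BitWatch {
    fn new(mask: u32, initial: u32) -> Self {
        BitWatch { mask, last: initial, fired: false }
    }

    /// Feed the word's current value (sampled however the host likes).
    /// Returns the newly set masked bits the first time any appear,
    /// and None on every other call.
    fn observe(&mut self, current: u32) -> Option<u32> {
        // Bits that were 0 in the previous sample and are 1 now, within the mask.
        let rising = !self.last & current & self.mask;
        self.last = current;
        if rising != 0 && !self.fired {
            self.fired = true;
            Some(rising)
        } else {
            None
        }
    }
}
```

The trade-off of sampling is granularity: it reports that the bit changed between two samples, not the exact writing PC, so it narrows the window rather than replacing a store-level hook.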
## Fix shape (when bit-28 setter is identified)
If the bit-28 setter is inside the vtable[+0] dispatch chain at 0x822F1B4C
(target sub_82173990), then the fix might be a state-init issue in the
kernel/runtime.
If the bit-28 setter is inside the inner wait or one of the kernel calls
(`bl 0x824AA8B0`, `bl 0x824AA330`), the fix might be a missing event signal
or a wrong handle-state evolution.
If we can't identify the setter cleanly, the synthesis fallback is to
**inject a kernel-side hook that clears bit 28 of [r30+0] on every entry to
sub_822F1AA8's bit-check site (0x822F1BB0)**. Crude but should keep the
main thread in the loop.
## Why this is a clearer wedge picture than rounds 22-33
Rounds 22-33 chased the audit-049 wedge from various angles. The diagnoses
landed on different layers:
- R22: "wrong cluster targeted" (cluster A vs B)
- R26-30: "state-machine progression bug"
- R32-33: "pool 3 starvation; bootstrap walk-back"
This round establishes the simplest possible framing:
> **Canary's main thread loops forever in a dispatcher; ours' main thread
> exits the loop after one setup phase. The exit is gated by a single bit
> on the controller's flag word.**
If bit 28 of `[controller+0]` could be permanently cleared, ours' main
thread would stay in the loop, sub_821741C8 would dispatch, signals would
flow, tid=13 would complete, draws would happen.


@@ -0,0 +1,79 @@
AUDIT-PC-PROBE pc=0x822f1aa8 tid=1 hw=0 cycle=6180796 lr=0x8216ee14 r3=0x40d09a40 r11=0x40111910 [r3+0]=0x00000021 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x40541a40 [r3+0x30]=0x00000000
AUDIT-PC-PROBE pc=0x822f1b38 tid=1 hw=0 cycle=6181181 lr=0x822f1b38 r3=0x00000001 r11=0x824b0000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
=== Final State ===
PC: 0x824ac578
LR: 0x824ac578
CTR: 0x82153bf0
CR: 0x24000028
XER: CA=0 OV=0 SO=0
r0 : 0x0000000082153bf0
r1 : 0x00000000700ff6e0
r2 : 0x0000000020000000
r4 : 0x0000000000000001
r7 : 0x0000000003a72328
r8 : 0x0000000043b77284
r9 : 0x0000000043b77328
r10: 0x0000000000000001
r11: 0x0000000000000103
r12: 0x0000000082173c64
r13: 0x000000007fff0000
r18: 0x0000000040d09a7c
r23: 0x00000000828f3844
r26: 0x000000004024a4e0
r27: 0x00000000820a17a8
r31: 0x0000000000001070
=== Thread diagnostics ===
hw=0 idx=0 tid=1 state=Blocked(WaitAny { handles: [4208], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x700ff6e0
r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a72328
r8=0x43b77284 r9=0x43b77328 r10=0x00000001 r11=0x00000103 r12=0x82173c64 r13=0x7fff0000
hw=0 idx=1 tid=11 state=Blocked(WaitAny { handles: [2190094916, 2190094880], deadline: None }) pc=0x824d2a94 lr=0x824d2a94 sp=0x71497d90
r0=0x00000000 r3=0x00000000 r4=0x71497de0 r5=0x00000001 r6=0x00000003 r7=0x00000001
r8=0x00000000 r9=0x00000000 r10=0x71497df0 r11=0x828a3244 r12=0xbcbcbcbc r13=0x4b9f1000
hw=1 idx=0 tid=2 state=Blocked(WaitAny { handles: [2189887804], deadline: None }) pc=0x824a95f8 lr=0x824a95f8 sp=0x710ffd20
r0=0x0000030c r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x00000001 r9=0x6f000000 r10=0x824a9178 r11=0x82870000 r12=0x824a94f0 r13=0x4acc3000
hw=1 idx=1 tid=13 state=Blocked(WaitAny { handles: [4216], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x715a7a20
r0=0x821511d0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b77334 r9=0x43b77334 r10=0x40541f80 r11=0x00000001 r12=0x821cb1e0 r13=0x4d1d4000
hw=2 idx=0 tid=7 state=Blocked(WaitAny { handles: [1111821148], deadline: Some(42946672) }) pc=0x824cd4f4 lr=0x824cd4f4 sp=0x71187e60
r0=0x00000000 r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x71187eb0
r8=0x00000000 r9=0x00000000 r10=0x00000002 r11=0x00000002 r12=0xbcbcbcbc r13=0x4b1d6000
hw=2 idx=1 tid=8 state=Blocked(WaitAny { handles: [4176, 4132], deadline: None }) pc=0x824ab214 lr=0x824ab214 sp=0x71287c90
r0=0x00000000 r3=0x00000000 r4=0x71287cf0 r5=0x00000001 r6=0x00000001 r7=0x00000000
r8=0x00000000 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x822f1ff0 r13=0x4b90a000
hw=3 idx=0 tid=4 state=Blocked(WaitAny { handles: [4120], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7112fb80
r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b7732c r9=0x828f0000 r10=0x00000008 r11=0x00000000 r12=0x8245a660 r13=0x4adc6000
hw=3 idx=1 tid=5 state=Blocked(WaitAny { handles: [4224], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7116fbe0
r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b7732c r9=0x828f0000 r10=0x00000001 r11=0x00000000 r12=0x82458b34 r13=0x4adc8000
hw=4 idx=0 tid=9 state=Ready pc=0x824d1404 lr=0x824d22b4 sp=0x71387df0
r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x4b9ec000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ec000
hw=5 idx=0 tid=3 state=Blocked(WaitAny { handles: [4112], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7111fdf0
r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x00000a10
r8=0x00000010 r9=0x00000000 r10=0x00009030 r11=0x00000000 r12=0x82181988 r13=0x4adc4000
hw=5 idx=1 tid=6 state=Ready pc=0x824ab214 lr=0x824ab214 sp=0x7117fc60
r0=0x821511a0 r3=0x00000001 r4=0x7117fcc0 r5=0x00000001 r6=0x00000001 r7=0x00000000
r8=0x7117fcb0 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x82458d68 r13=0x4adca000
hw=5 idx=2 tid=10 state=Ready pc=0x824d1404 lr=0x824d22b4 sp=0x71487e00
r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x4b9ee000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ee000
hw=5 idx=3 tid=12 state=Ready pc=0x824aa6a4 lr=0x824aa6a4 sp=0x714a7da0
r0=0x00000000 r3=0x000000ff r4=0x00000020 r5=0x714a7df4 r6=0x00000000 r7=0x00000000
r8=0x00000000 r9=0x00000000 r10=0x00000000 r11=0x00000001 r12=0x8217898c r13=0x4d1d2000
-- Handle waiter lists --
handle=0x00001018 Semaphore(0/2147483647) waiters(tid)=[4]
handle=0x8287093c Event(sig=false, mr=false) waiters(tid)=[2]
handle=0x00001070 Thread(id=13, exit=None) waiters(tid)=[1]
handle=0x42450b5c Event(sig=false, mr=true) waiters(tid)=[7]
handle=0x00001078 Event(sig=false, mr=false) waiters(tid)=[13]
handle=0x00001080 Event(sig=false, mr=false) waiters(tid)=[5]
handle=0x828a3244 Event(sig=false, mr=false) waiters(tid)=[11]
handle=0x00001024 Semaphore(0/2147483647) waiters(tid)=[8]
handle=0x828a3220 Event(sig=false, mr=true) waiters(tid)=[11]
handle=0x00001010 Event(sig=false, mr=true) waiters(tid)=[3]
handle=0x00001050 Event(sig=false, mr=true) waiters(tid)=[8]


@@ -0,0 +1,127 @@
# Phase C.1 — Validation refutes Phase A's bit-28 setter hypothesis
## TL;DR
Phase A claimed: "bit 28 of `[0x40d09a40]` (controller word) gets set in ours, causing sub_822F1AA8's dispatcher loop to exit early; candidate setter is `sub_821B55D8` at PC `0x821B5DA4`."
**Phase C.1 falsifies this in 4 sub-rounds:**
1. **`sub_821B55D8` is dead code** in both engines — its `XamInputSetState` wrapper `sub_824AA858` fires 0× in both.
2. **`[0x40d09a40]` is never set to anything with bit 28** — `--dump-addr` at end of run shows `+0x00 = 0x00000021`, the entry value. Bit 28 is NEVER set.
3. **The actual wedge is at the `bcctrl` at PC `0x822F1B4C`** (inside sub_822F1AA8 setup, BEFORE the dispatcher loop). tid=1 never reaches the loop top-check.
4. **The bcctrl calls `sub_82173990`** (vtable[0] of the dispatcher singleton at `[0x828E1F08]`), which eventually waits for tid=13 to terminate. tid=13 wedges in the audit-049 silph::UImpl@GamePart_Title chain on handle `0x1078`.
The C.2 force-clear POC (the planned next step) would have **zero effect** because bit 28 is never set. Skipped per plan stopping criterion.
## Probe-fire counts (ours, 50M-instr parallel)
| PC | sub-round | fires | meaning |
|---|---|---|---|
| `0x821B55D8` (Phase A candidate fn entry) | 1 | **0** | function never reached → β/γ |
| `0x821B5D98,DA0,DAC,D48` (loop BB heads) | 1 | **0** | function never reached |
| `0x822F1AA8` (sub_822F1AA8 entry) | 2,3,4 | 2-3 | reached |
| `0x822F1B38` (post-`bl 0x824AA8B0`) | 4 | 2 | reached |
| `0x822F1B50` (post-`bcctrl`) | 4 | **0** | **bcctrl never returns** |
| `0x822F1B60,B78,B80,BBC` (loop setup/top) | 3 | 0 | unreachable past bcctrl |
| `0x822F1E10` (loop exit cleanup) | 2 | 0 | loop never entered, never exited |
| `0x822F1E34` (post-thread-join) | 2 | 0 | never reached |
| `0x82173990` (vtable[0] target) | 4 | 2 | called via bcctrl, r3=singleton (LR=0x822F1B50) |
| `0x821748F0` (tid=13 entry) | 4 | 2 | tid=13 runs |
| `0x821C4EB0` (silph::UImpl@GamePart_Title) | 4 | 2 | audit-009/049 reached on tid=13 |
| `0x82457388,0x824574C0,0x82457408,0x82457490` (other oris candidates) | 2 | 0 | unreachable |
## Canary probe results
| PC | fires | meaning |
|---|---|---|
| `0x824AA858` (XamInputSetState wrapper) | **0** | sub_821B55D8 chain is dead code in CANARY too |
| `0x822F1B50` (post-bcctrl, attempted) | **0** | canary's JitProlog only fires at function entries, so not directly testable; but per audit round-33 sub_821741C8 fires 471× in canary → bcctrl DOES return in canary |
## Critical evidence: `--dump-addr=0x40d09a40` at end of run
```
addr=0x40d09a40
+0x00: 00 00 00 21 00 00 00 01 42 44 df 00 40 54 1a 40
^^^^^^^^^^^ ^^^^^^^^^^^
+0x10: 40 54 1b 40 40 54 1b 80 40 54 1b c0 00 00 10 54
+0x20: 00 00 00 00 40 24 a8 20 00 00 00 08 00 00 00 00
```
- `[+0x00] = 0x00000021` ← bit 28 (mask 0x10000000) is NOT SET. Same value as at sub_822F1AA8 entry.
- `[+0x1c] = 0x00001054` ← spawned init thread handle (= tid=8's thread handle, NOT 0x1070)
- Thread state: tid=1 waits on handle `0x1070`, tid=13 waits on handle `0x1078`.
Handle `0x1070` is **tid=13's thread handle** (per stderr: `ExCreateThread: tid=13 handle=0x1070 entry=0x821748f0 ctx=0x4024a840 suspended=true`). So tid=1's wait at the wedge point is a **thread-join on tid=13**, NOT a thread-join on the dispatcher init thread (tid=8, handle 0x1054).
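The two dump readings above can be re-derived by hand. A minimal sketch, assuming the dumped bytes are big-endian guest memory exactly as printed; the helper name `be_u32` is illustrative and not part of the probe tooling:

```python
# Bytes copied from the --dump-addr=0x40d09a40 output above (+0x00 .. +0x1f).
DUMP = bytes.fromhex(
    "00000021" "00000001" "4244df00" "40541a40"
    "40541b40" "40541b80" "40541bc0" "00001054"
)

BIT28_MASK = 0x10000000  # mask quoted in the findings


def be_u32(buf: bytes, off: int) -> int:
    """Read a big-endian u32 at byte offset `off` (hypothetical helper)."""
    return int.from_bytes(buf[off:off + 4], "big")


controller_word = be_u32(DUMP, 0x00)
init_thread_handle = be_u32(DUMP, 0x1C)

print(f"[+0x00] = 0x{controller_word:08x}, bit 28 set: {bool(controller_word & BIT28_MASK)}")
print(f"[+0x1c] = 0x{init_thread_handle:08x}")
```

With the bytes shown, the controller word is `0x00000021`, so the `0x10000000` mask test fails, and the word at `+0x1c` is `0x00001054`.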
## Wedge path (corrected)
```
entry_point (sub_824AB748)  [tid=1 main]
└─ sub_8216EA68
   └─ sub_822F1AA8(controller=0x40d09a40)  [LR=0x8216EE14]
      ├─ ExCreateThread(entry=sub_822F1EE0, ctx=controller)  [PC 0x822F1B08]
      │    ⇒ tid=8 spawn, handle=0x1054 (suspended)
      ├─ bl 0x824AA8B0 (no-op probe)  [PC 0x822F1B34]
      └─ bcctrl on vtable[+0] of [0x828E1F08] singleton  [PC 0x822F1B4C]
         └─ sub_82173990(r3=singleton)  [r3=0x40ba9a80, vtable=0x40111910]
            └─ ... (768-byte function with ≥18 calls; calls sub_82448AA0, sub_824AA7A0,
                    sub_82448BC8, sub_82448C50, sub_8216F218, sub_8217C850, sub_82178E50,
                    sub_821835E0, ...)
               └─ ... → KeWaitForSingleObject INFINITE on handle 0x1070
                        (= tid=13's thread handle, thread-join)
                        ⇒ WEDGE — tid=13 never exits

(Concurrently — spawned somewhere else, not from sub_822F1AA8:)
[tid=13, spawn-handle=0x1070, ctx=0x4024a840]
└─ sub_821748F0 (worker boilerplate, entry from ExCreateThread)
   ├─ sub_82172798, sub_82172818
   └─ sub_821749C0
      └─ sub_821CF3F0
         └─ ... → sub_821C4EB0 (UImpl@GamePart_Title@silph)  [audit-009/049!]
            └─ ... → sub_821CB030 (creates KEVENT at +0x128)
                 ⇒ KeWaitForSingleObject INFINITE on handle 0x1078
                 ⇒ WEDGE — handle 0x1078 is never signaled in ours
```
## Why Phase A's hypothesis is wrong
Phase A:
1. Disassembled sub_822F1AA8's body, observed the bit-28 loop-exit check at `0x822F1BB8` and end-of-iter check at `0x822F1E0C`.
2. Mem-watch on `0x40d09a40` showed zero stores → inferred "the setter writes via some path mem-watch doesn't capture."
3. DB-scanned `oris ?, ?, 0x1000` (49 sites), found `sub_821B55D8 + 0x821B5DA4` with pattern `bl sub_824AA858 ; if r3 == 0xAA: oris r11, 0x1000 ; stw`.
4. Concluded `sub_821B55D8` was the setter.
What Phase A missed:
- Mem-watch's 0-stores result was correct: **NO setter exists**. Bit 28 is never set in either engine. The mem-watch null-result was a hint that the bit-28 hypothesis itself was wrong, but Phase A interpreted it as "mem-watch misses something."
- The disasm-based hypothesis was visually compelling (a loop iterating arrays and setting bit 28 when a kernel call returns 0xAA) but was never verified at runtime.
- `sub_821B55D8` is itself dead code in both engines.
## Reading-error class #19: disasm-pattern-match without runtime verification
When scanning for a hypothesized signal source via DB pattern-match (`oris ?, ?, 0x1000`), the analyst must run a probe to verify that the suspected site is *both reached* and *takes the suspected path* before declaring it the cause. Phase A skipped both checks. A single `--dump-addr=0x40d09a40` flag, added to the existing probe command in sub-round 2, revealed that the central assumption was wrong.
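The "reached" half of that check can be automated from a probe log. A minimal sketch, assuming probe lines follow the `AUDIT-PC-PROBE pc=0x...` shape that the round-35 diff script parses; the function name and the sample lines are hypothetical and not copied from a real run:

```python
import re

# Line shape assumed from the AUDIT-PC-PROBE regex in the round-35 diff script.
PROBE_RE = re.compile(r"AUDIT-PC-PROBE pc=0x([0-9a-f]+)")


def count_fires(log_lines, probed_pcs):
    """Return {pc: fire_count} for every probed PC, including zero-fire PCs."""
    counts = {pc: 0 for pc in probed_pcs}
    for line in log_lines:
        m = PROBE_RE.search(line)
        if m:
            pc = int(m.group(1), 16)
            if pc in counts:
                counts[pc] += 1
    return counts


# Synthetic log lines for illustration only.
sample_log = [
    "AUDIT-PC-PROBE pc=0x822f1aa8 tid=1 hw=0 cycle=10 lr=0x8216ee14 r3=0x40d09a40 r11=0x0",
    "AUDIT-PC-PROBE pc=0x822f1aa8 tid=1 hw=0 cycle=90 lr=0x8216ee14 r3=0x40d09a40 r11=0x0",
]
counts = count_fires(sample_log, [0x821B55D8, 0x822F1AA8])
unreached = sorted(pc for pc, n in counts.items() if n == 0)
print([f"0x{pc:08X}" for pc in unreached])
```

Any candidate PC that lands in `unreached` cannot be the cause, whatever the disassembly suggests. Confirming that a reached site also takes the suspected path still needs a second probe on the branch target.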
## Real divergence (handed to next session)
This is the **same wedge as audit-049/058/059**: tid=13 wedges in the silph::UImpl@GamePart_Title cluster on handle `0x1078`. tid=1 wedges on tid=13's thread-handle (`0x1070`) inside `sub_82173990`'s call chain.
`sub_82173990` is vtable[0] of the dispatcher singleton at `[0x828E1F08]`. It's a 768-byte function with ≥18 calls; the actual wait site is somewhere down its tree. To localize where in `sub_82173990` the wait happens, probe its BB heads + the `KeWaitForSingleObject` thunks (`sub_824AA330`, `sub_824AA708`).
The fix-shape is **NOT** "force-clear bit 28." The fix-shape is **"signal handle 0x1078 in the audit-049 cluster, or short-circuit tid=13's wait."** Round 22 (silph_synth.rs) attempted the cluster-A version of this. Cluster B (silph::UImpl) needs its own synthesis or a kernel-side signal of handle 0x1078.
## Phase C verdict
- C.1: 4 sub-rounds executed (within budget).
- C.2: **NOT EXECUTED** — POC would be no-op since bit 28 is never set. Per plan stopping criterion, do not proceed to C.2 blind when C.1 refutes the diagnosis.
- C.3: not applicable.
- Branch state: no source changes. Audit artifacts only.
## Files in this directory
- `ours-c1-probe.log/stderr` — sub-round 1, probe at sub_821B55D8 BB heads (0 fires)
- `ours-sr2-confirm-bit28.log/stderr` — sub-round 2, probe loop top/exit + dump-addr (bit 28 NEVER SET)
- `ours-sr3-wait-trace.log/stderr` — sub-round 3, probe wait site + handle 0x1070 trace
- `ours-sr4-bcctrl-trace.log/stderr` — sub-round 4, probe pre/post bcctrl + sub_82173990 entry + tid=13 entry (decisive)
- canary side in `../round-C1-setter-validation-canary/`:
- `canary-824AA858.log` — XamInputSetState wrapper fires 0× in canary too
- `canary-822F1B50.log` — JitProlog can't probe at BB-internal PCs (function-entry-only)


@@ -0,0 +1,144 @@
# Phase D — Audit-049 Auto-Signal POC — FINDINGS
**Branch**: `iterate-2C/silph-ui-spawn-trace` (extends Phase C `481591f`)
**Date**: 2026-06-11
**Sub-rounds**: D2.SR1 → D2.SR4 (4/4 used)
**Verdict**: **B — partial unwedge**
## Mission
Phase C diagnosed the audit-049 wedge as tid=13 (silph::UImpl@GamePart_Title) waiting INFINITE on a KEVENT created at `sub_821CB030+0x128` (`lr=0x821cb15c`, post-bl PC). The Phase D POC tests this diagnosis by hooking `NtCreateEvent` from that exact call site and auto-signaling the resulting handle after a configurable delay (`XENIA_SILPH_UI_AUTOSIGNAL_DELAY` instructions).
If tid=13 unblocks, the diagnosis is confirmed. If new wedges or new threads appear downstream, even better — that's actual game progression past the wedge.
## Result summary
| Symptom | SR2/SR3 baseline | SR4 (POC firing) |
|---|---|---|
| `silph autosignal: scheduled handle=0x1078 caller_lr=0x821cb15c` | yes (SR2/SR3) | yes |
| `silph autosignal: firing handle=0x1078` | NO | **yes (cycle 16326209)** |
| handle 0x1078 final | `signaled=false waiters=1 <NO_SIGNALS_DESPITE_WAITS>` | `signal_attempts=1 waiters=0` |
| tid=13 final state | `Blocked(WaitAny[0x1078])` | **`Ready` pc=0x824a9108** |
| tid=1 final state | `Blocked(WaitAny[0x1070])` thread-join | `Blocked(WaitAny[0x1070])` (tid=13 not yet exited) |
| ExCreateThread total | 10 | **12 (+tid=14, +tid=15)** |
| New downstream wedges | none past 0x1078 | **0x1084 (Event/Auto), 0x1088 (Event/Manual)** |
| `cxx_throw` runtime_error decoded | none | **yes, stack depth 6, top L0=0x82612b50 → L4=sub_82450B60+0x1A8 → L6=sub_82450a50** |
| VdSwap | 1 | 1 |
| gpu.interrupt.delivered{source=0} | 6393 | 4539 (different trajectory, no draws) |
**Conclusion**: tid=13 unwedged cleanly from the audit-049 wait, spawned two follow-on threads (tid=14 entry=`silph` ctx=`0x40929c00`, tid=15 a worker), and progressed deep enough into the silph::UImpl state machine to throw a `runtime_error` from sub_82450a50 → sub_82450B60+0x1A8 (the dispatcher cluster from round 26). The auto-signal **is not** the proper signaler — it lets tid=13 proceed but downstream state-machine invariants the missing real signaler would have established are not in place, so the dispatcher trips on a "not-registered instance" lookup.
This is a **clean confirmation** of the Phase C diagnosis: the wedge handle, the wait site, and the LR filter are all correct. The fix shape is:
- Either: synthesize the missing signaler properly (cluster-B silph_ui_synth.rs analogue from R33's deferred plan)
- Or: track what the auto-signal needed to write into the work-item state (`[+8]` field per R26) BEFORE signaling, so the dispatcher's BST lookup succeeds
## Sub-round detail
### D2.SR1 — initial run, hook never fires (wrong LR filter)
Filter checked `creator_lr ∈ [0x821CB15C, 0x821CB160]` against `ctx.lr` at `nt_create_event` entry. But `ctx.lr` is the **thunk wrapper return slot** (`0x824a9f6c`), not the guest caller's post-bl PC. Confirmed via handle-audit `created stack` dump: frame 0 lr=`0x824a9f6c`, frame 1 lr=`0x821cb15c`. The guest caller's LR lives one frame up the PPC EABI back-chain.
Diagnosis classification: **D (filter mismatch)**. Reading-error class #20 (new).
### D2.SR2 — frame-1-LR fix; hook schedules, never fires
Refactored `maybe_register_silph_autosignal` to take `(ctx, mem)`, walk back-chain via existing `walk_guest_back_chain` (1 step), match the saved LR. Hook now fires:
```
silph autosignal: scheduled handle=0x1078 caller_lr=0x821cb15c for cycle 10000 (now=0, delay=10000)
```
But no "firing" log appears, and tid=13 stays Blocked. Classification: **D (drain site never reached)**.
### D2.SR3 — diagnostic added; confirms drain site never visited
Added a one-shot info-level "tick (first visit, none due)" log inside `fire_due_silph_autosignals` when pending is non-empty but nothing due. Re-ran. **The tick-diagnostic never fired either** — proving the function isn't being called at all in `--parallel` mode.
Root cause: `--parallel` dispatches to `run_execution_parallel` (line 2928 of main.rs), which has its own outer loop at line 3186. My Phase D wiring only touched the lockstep path at line 2763. Classification: **D (wrong code path wired)**.
### D2.SR4 — parallel-path wiring added; hook fires; tid=13 unblocks
Added the same `set_now_cycle_hint` + `fire_due_silph_autosignals` calls inside the parallel outer loop, right after `coord_pre_round` (and under the same `kernel_arc` guard, so no extra locking). Re-built, re-ran.
Now all three log lines appear:
```
silph autosignal: scheduled handle=0x1078 caller_lr=0x821cb15c for cycle 16326202 (now=16316202, delay=10000)
silph autosignal: tick (first visit, none due) now=16316213 pending=1 first_deadline=16326202
silph autosignal: firing handle=0x1078 prev_signaled=Some(false) at cycle 16326209
```
`now=16316202` at schedule time confirms `set_now_cycle_hint` is wired through correctly (the parallel path was simply never visited in SR2/SR3). Fire at cycle 16326209 = deadline 16326202 + 7-cycle scheduler granularity. Diagnostic classification: **B (partial unwedge — new waits and cxx_throw downstream)**.
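The schedule/tick/fire arithmetic in those three log lines can be replayed in isolation. A minimal sketch, assuming a pending entry fires on the first tick whose cycle count is at or past its deadline; the `Pending`, `schedule` and `fire_due` names are illustrative and are not the Rust `AutoSignalPending` / `fire_due_silph_autosignals` code:

```python
from dataclasses import dataclass


@dataclass
class Pending:
    handle: int
    deadline: int  # absolute cycle at which the signal becomes due


def schedule(pending, handle, now, delay):
    """Queue an auto-signal `delay` cycles after `now`."""
    pending.append(Pending(handle, now + delay))


def fire_due(pending, now):
    """Remove and return the handles whose deadline is <= now."""
    due = [p.handle for p in pending if p.deadline <= now]
    pending[:] = [p for p in pending if p.deadline > now]
    return due


pending = []
schedule(pending, 0x1078, now=16316202, delay=10000)
print(pending[0].deadline)                               # deadline from the "scheduled" line
print(fire_due(pending, 16316213))                       # first tick: nothing due yet
print([hex(h) for h in fire_due(pending, 16326209)])     # first tick past the deadline
```

Under this model the deadline is 16326202, the tick at 16316213 drains nothing, and the tick at 16326209 fires handle `0x1078`, matching the logged 7-cycle overshoot.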
## Code shape
POC is ~130 LOC across four files (+128 lines per the table below), all env-gated. Default off.
| File | Change | Lines |
|---|---|---|
| `crates/xenia-cpu/src/scheduler.rs` | `GuestThread.start_entry/start_context` fields; `spawn()` populates; `current_thread_entry_and_ctx()` helper | +18 |
| `crates/xenia-kernel/src/state.rs` | `AutoSignalPending` struct; `silph_autosignal_*` fields; `set_now_cycle_hint`, `maybe_register_silph_autosignal`, `fire_due_silph_autosignals` methods | +95 |
| `crates/xenia-kernel/src/exports.rs` | Hook in `nt_create_event` | +3 |
| `crates/xenia-app/src/main.rs` | Fire-site wiring in lockstep loop (line 2788) **and** parallel loop (line 3215) | +12 |
Tests stay green at **655/655**.
## Reading-error class #20 (new)
**`ctx.lr` at kernel export entry ≠ guest caller's post-bl PC.** When a guest `bl` calls an export thunk, the thunk-wrapper has its own frame between the guest caller and the export body. At export-body entry, `ctx.lr` holds the *wrapper's* return slot, not the guest caller's post-bl PC.
To match a specific guest call site by LR, the export must walk one step up the back-chain (`walk_guest_back_chain(ctx.gpr[1], ctx.lr, mem, 2)`) and use `frames[1].lr`.
SR1 burned one full sub-round on this. Detect early in future POCs by comparing `ctx.lr` against the handle-audit's `created stack` frame dump for a known-good event (e.g. one created from a labelled site).
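The difference between the two LRs is easy to see against the recorded stack. A minimal sketch, assuming the `(fp, lr)` pairs from the handle-audit `created stack` dump in this document and treating the SR1 filter range as inclusive; `lr_matches` is a hypothetical stand-in for the Rust filter:

```python
# (fp, lr) pairs copied from the handle-audit "created stack" dump.
CREATED_STACK = [
    (0x715A7A10, 0x824A9F6C),  # frame 0: ctx.lr at nt_create_event (thunk return)
    (0x715A7AA0, 0x821CB15C),  # frame 1: guest caller's post-bl PC
    (0x715A7BD0, 0x821CBAE0),
    (0x715A7CD0, 0x821CC454),
    (0x715A7D60, 0x821C4F18),
    (0x715A7E00, 0x82174A80),
]

FILTER_LO, FILTER_HI = 0x821CB15C, 0x821CB160  # range quoted in D2.SR1


def lr_matches(lr: int) -> bool:
    """True when `lr` falls inside the call-site filter range."""
    return FILTER_LO <= lr <= FILTER_HI


ctx_lr = CREATED_STACK[0][1]     # what SR1 compared against
caller_lr = CREATED_STACK[1][1]  # what SR2 compares against

print(f"frame 0 lr=0x{ctx_lr:08x} matches: {lr_matches(ctx_lr)}")
print(f"frame 1 lr=0x{caller_lr:08x} matches: {lr_matches(caller_lr)}")
```

Frame 0 fails the filter and frame 1 passes it, which is the whole of the SR1 bug.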
## Reading-error class #21 (new)
**`--parallel` and lockstep have separate outer loops in main.rs.** They share `coord_pre_round` (carved out exactly for this reason), but anything wired adjacent to that call site only takes effect on the path it's wired on. Lockstep is `run_execution` (line 2706, outer loop at 2763). Parallel is `run_execution_parallel` (line 2928, outer loop at 3186).
Per-round hooks that must run in both execution modes have to be wired in **both** paths. SR2/SR3 burned two sub-rounds on this.
## Files modified + LR mapping (for follow-up sessions)
**Wedge handle creation** (confirmed by handle-audit dump):
```
created cycle=0 tid=13 lr=0x824a9f6c [src=NtCreateEvent thunk return]
created stack (6 frames):
[ 0] fp=0x715a7a10 lr=0x824a9f6c ← ctx.lr at nt_create_event
[ 1] fp=0x715a7aa0 lr=0x821cb15c ← guest caller's post-bl PC (filter on this)
[ 2] fp=0x715a7bd0 lr=0x821cbae0 ← sub_821CBA08 frame
[ 3] fp=0x715a7cd0 lr=0x821cc454 ← sub_821CC3F8 frame
[ 4] fp=0x715a7d60 lr=0x821c4f18 ← sub_821C4EB0 frame (silph::UImpl@GamePart_Title)
[ 5] fp=0x715a7e00 lr=0x82174a80 ← sub_821748F0 trampoline frame
```
**Downstream cxx_throw stack** (after auto-signal fires, tid=5 throws runtime_error):
```
L0 lr=0x82612b50 std::exception throw path
L1 lr=0x825f2444
L2 lr=0x824547e8
L3 lr=0x82451418
L4 lr=0x82450d08 ← sub_82450B60+0x1A8 (dispatcher, audit-059 R26)
L5 lr=0x82450b34
L6 lr=0x82450a50 ← sub_82450a50 (worker dispatch)
cxx_throw runtime_error decoded magic=0x19930520
cxx_throw BST ceil search candidate_key=0x828e2b2c match_found=false
cxx_throw lhs (not-registered instance) lhs=0x715a7af0
```
This confirms the dispatcher reached audit-049 territory (R26's `sub_82450B60+0x1A8` PC `0x82450D08`), looked up a runtime instance in its BST keyed by VA, and the instance was never registered. **The auto-signal bypassed an upstream registration step** the real signaler would have driven.
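One reading of the `BST ceil search` log line is a ceiling lookup in an ordered map keyed by VA: find the smallest key at or above the queried address, then check for an exact hit. A minimal sketch of that reading using `bisect` over a sorted key list; the guest's actual container is not shown in this document, and the one-entry registry below is an assumption built from the single key the log reports:

```python
import bisect


def ceil_lookup(sorted_keys, key):
    """Return (candidate, match_found) for the smallest stored key >= `key`."""
    i = bisect.bisect_left(sorted_keys, key)
    if i == len(sorted_keys):
        return None, False  # no key at or above the query
    candidate = sorted_keys[i]
    return candidate, candidate == key


# Assumed registry contents: only the candidate key reported by the log.
registry = [0x828E2B2C]
lhs = 0x715A7AF0  # the "not-registered instance" address from the log

candidate, match_found = ceil_lookup(registry, lhs)
print(f"candidate_key=0x{candidate:08x} match_found={match_found}")
```

Under this reading the lookup lands on `0x828e2b2c` without an exact match, so the instance at `0x715a7af0` was never inserted, which is consistent with a skipped registration step.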
## Recommendation
Ship the POC env-gated (default off; no behavior change unless opted in). The verdict-B result makes it a useful diagnostic flag for audit-049 work: later investigations can set `XENIA_SILPH_UI_AUTOSIGNAL_DELAY=10000` to skip the wedge and probe downstream behavior without first writing the proper signaler.
Long-term fix path remains the R33 silph_ui_synth.rs analogue: synthesize the missing signaler + its precondition state (BST instance registration at `0x715a7af0`-equivalent, work-item state `[+8]` per R26). The auto-signal POC is **not** the final fix — it confirms diagnosis but doesn't honor the dispatcher's BST registry invariant.
## Artifacts
- `poc-sr1.log`, `poc-sr1.stderr` — initial run, filter mismatch (D)
- `poc-sr2.log`, `poc-sr2.stderr` — frame-1-LR fix, no fire (D)
- `poc-sr3.log`, `poc-sr3.stderr` — diagnostic added, no fire (D, parallel path unwired)
- `poc-sr4.log`, `poc-sr4.stderr` — parallel-path wired, **fires + partial unwedge (B)**
All `.log`/`.stderr` files are `.gitignore`d; this `FINDINGS.md` is the only artifact-side commit.


@@ -0,0 +1,200 @@
0x82450b60: lwz r18, 9792(r31)
0x82450b64: lwz r16, 13880(r14)
0x82450b68: mflr r12
0x82450b6c: bl 0x825F0F74
0x82450b70: subi r31, r1, 176
0x82450b74: stwu r1, -176(r1)
0x82450b78: mr r29, r4
0x82450b7c: mr r27, r3
0x82450b80: cmpwi cr6, r29, 5
0x82450b84: bne cr6, 0x82450B94
0x82450b88: addi r28, r27, 196
0x82450b8c: addi r26, r27, 28
0x82450b90: b 0x82450BAC
0x82450b94: slwi r11, r29, 2
0x82450b98: mr r26, r27
0x82450b9c: add r11, r29, r11
0x82450ba0: slwi r11, r11, 2
0x82450ba4: add r11, r11, r27
0x82450ba8: addi r28, r11, 96
0x82450bac: addi r23, r27, 56
0x82450bb0: mr r3, r23
0x82450bb4: stw r23, 84(r31)
0x82450bb8: bl 0x8284DCFC
0x82450bbc: mr r3, r26
0x82450bc0: bl 0x8284DCFC
0x82450bc4: lwz r7, 16(r28)
0x82450bc8: cntlzw r11, r7
0x82450bcc: extrwi r11, r11, 1, 26
0x82450bd0: cmplwi cr6, r11, 0x0
0x82450bd4: beq cr6, 0x82450BEC
0x82450bd8: mr r3, r26
0x82450bdc: bl 0x8284DD0C
0x82450be0: mr r3, r23
0x82450be4: bl 0x8284DD0C
0x82450be8: b 0x82450EE8
0x82450bec: lwz r11, 12(r28)
0x82450bf0: lwz r9, 8(r28)
0x82450bf4: srwi r10, r11, 2
0x82450bf8: clrlwi r8, r11, 30
0x82450bfc: cmplw cr6, r9, r10
0x82450c00: bgt cr6, 0x82450C08
0x82450c04: sub r10, r10, r9
0x82450c08: lwz r9, 4(r28)
0x82450c0c: slwi r10, r10, 2
0x82450c10: slwi r8, r8, 2
0x82450c14: lwz r6, 8(r28)
0x82450c18: addi r11, r11, 1
0x82450c1c: slwi r6, r6, 2
0x82450c20: li r24, 0
0x82450c24: lwzx r10, r10, r9
0x82450c28: cmplw cr6, r6, r11
0x82450c2c: lwzx r30, r10, r8
0x82450c30: stw r11, 12(r28)
0x82450c34: stw r30, 80(r31)
0x82450c38: bgt cr6, 0x82450C40
0x82450c3c: stw r24, 12(r28)
0x82450c40: subic. r11, r7, 1
0x82450c44: stw r11, 16(r28)
0x82450c48: bne 0x82450C50
0x82450c4c: stw r24, 12(r28)
0x82450c50: addi r25, r27, 28
0x82450c54: mr r3, r25
0x82450c58: bl 0x8284DCFC
0x82450c5c: mr r3, r25
0x82450c60: stw r30, 216(r27)
0x82450c64: bl 0x8284DD0C
0x82450c68: mr r3, r26
0x82450c6c: bl 0x8284DD0C
0x82450c70: lwz r11, 28(r30)
0x82450c74: clrlwi r11, r11, 31
0x82450c78: cmplwi cr6, r11, 0x0
0x82450c7c: bne cr6, 0x82450D30
0x82450c80: lwz r11, 8(r30)
0x82450c84: cmplwi cr6, r11, 0x1
0x82450c88: blt cr6, 0x82450CE4
0x82450c8c: bne cr6, 0x82450D3C
0x82450c90: lwz r11, 28(r30)
0x82450c94: rlwinm r11, r11, 0, 29, 29
0x82450c98: cmplwi cr6, r11, 0x0
0x82450c9c: beq cr6, 0x82450CB0
0x82450ca0: mr r4, r30
0x82450ca4: mr r3, r27
0x82450ca8: bl 0x824510E0
0x82450cac: b 0x82450CBC
0x82450cb0: mr r4, r30
0x82450cb4: mr r3, r27
0x82450cb8: bl 0x824517B0
0x82450cbc: stw r29, 220(r27)
0x82450cc0: bl 0x824AA830
0x82450cc4: mr r11, r3
0x82450cc8: lwz r3, 92(r27)
0x82450ccc: li r5, 0
0x82450cd0: addi r11, r11, 66
0x82450cd4: li r4, 1
0x82450cd8: stw r11, 224(r27)
0x82450cdc: bl 0x824AB158
0x82450ce0: b 0x82450D3C
0x82450ce4: lwz r11, 28(r30)
0x82450ce8: mr r4, r30
0x82450cec: mr r3, r27
0x82450cf0: rlwinm r11, r11, 0, 29, 29
0x82450cf4: cmplwi cr6, r11, 0x0
0x82450cf8: beq cr6, 0x82450D04
0x82450cfc: bl 0x82450F68
0x82450d00: b 0x82450D08
0x82450d04: bl 0x82451238
0x82450d08: stw r29, 220(r27)
0x82450d0c: bl 0x824AA830
0x82450d10: mr r11, r3
0x82450d14: lwz r3, 92(r27)
0x82450d18: li r5, 0
0x82450d1c: addi r11, r11, 66
0x82450d20: li r4, 1
0x82450d24: stw r11, 224(r27)
0x82450d28: bl 0x824AB158
0x82450d2c: b 0x82450D3C
0x82450d30: lwz r11, 28(r30)
0x82450d34: ori r11, r11, 0x2
0x82450d38: stw r11, 28(r30)
0x82450d3c: lwz r11, 8(r30)
0x82450d40: mr r29, r24
0x82450d44: cmpwi cr6, r11, 2
0x82450d48: blt cr6, 0x82450E08
0x82450d4c: cmpwi cr6, r11, 3
0x82450d50: ble cr6, 0x82450DA0
0x82450d54: cmpwi cr6, r11, 4
0x82450d58: bne cr6, 0x82450E08
0x82450d5c: lwz r11, 28(r30)
0x82450d60: rlwinm r11, r11, 0, 29, 29
0x82450d64: cmplwi cr6, r11, 0x0
0x82450d68: bne cr6, 0x82450D98
0x82450d6c: lwz r29, 36(r30)
0x82450d70: mr r3, r29
0x82450d74: lwz r11, 0(r29)
0x82450d78: lwz r11, 4(r11)
0x82450d7c: mtctr r11
0x82450d80: bctrl
0x82450d84: clrlwi r11, r3, 24
0x82450d88: cmplwi cr6, r11, 0x0
0x82450d8c: beq cr6, 0x82450D98
0x82450d90: mr r3, r29
0x82450d94: bl 0x8244FB38
0x82450d98: li r29, 1
0x82450d9c: b 0x82450E28
0x82450da0: addi r3, r30, 40
0x82450da4: bl 0x82451DB8
0x82450da8: lwz r11, 32(r30)
0x82450dac: cmplwi cr6, r11, 0x0
0x82450db0: beq cr6, 0x82450DCC
0x82450db4: rlwinm r11, r11, 0, 0, 31
0x82450db8: lwz r10, 4(r30)
0x82450dbc: lwz r11, 4(r11)
0x82450dc0: cmplw cr6, r10, r11
0x82450dc4: li r11, 1
0x82450dc8: beq cr6, 0x82450DD0
0x82450dcc: mr r11, r24
0x82450dd0: clrlwi r11, r11, 24
0x82450dd4: cmplwi cr6, r11, 0x0
0x82450dd8: beq cr6, 0x82450E00
0x82450ddc: lwz r4, 8(r30)
0x82450de0: lwz r5, 0(r30)
0x82450de4: lwz r3, 32(r30)
0x82450de8: cmpwi cr6, r4, 1
0x82450dec: ble cr6, 0x82450DFC
0x82450df0: bl 0x8245D9D8
0x82450df4: li r29, 1
0x82450df8: b 0x82450E28
0x82450dfc: stw r4, 8(r3)
0x82450e00: li r29, 1
0x82450e04: b 0x82450E28
0x82450e08: mr r3, r26
0x82450e0c: stw r26, 88(r31)
0x82450e10: bl 0x8284DCFC
0x82450e14: addi r4, r31, 80
0x82450e18: mr r3, r28
0x82450e1c: bl 0x823232C0
0x82450e20: mr r3, r26
0x82450e24: bl 0x8284DD0C
0x82450e28: clrlwi r11, r29, 24
0x82450e2c: cmplwi cr6, r11, 0x0
0x82450e30: beq cr6, 0x82450ECC
0x82450e34: lwz r11, 28(r30)
0x82450e38: rlwinm r11, r11, 0, 30, 30
0x82450e3c: cmplwi cr6, r11, 0x0
0x82450e40: beq cr6, 0x82450E68
0x82450e44: mr r3, r26
0x82450e48: stw r26, 88(r31)
0x82450e4c: bl 0x8284DCFC
0x82450e50: addi r4, r31, 80
0x82450e54: mr r3, r28
0x82450e58: bl 0x823232C0
0x82450e5c: mr r3, r26
0x82450e60: bl 0x8284DD0C
0x82450e64: b 0x82450ECC
0x82450e68: lwz r11, 40(r30)
0x82450e6c: cmplwi cr6, r11, 0x0
0x82450e70: beq cr6, 0x82450EA4
0x82450e74: rlwinm r3, r11, 0, 0, 31
0x82450e78: bl 0x82458A70
0x82450e7c: lwz r29, 40(r30)


@@ -0,0 +1,80 @@
0x82451238: mflr r12
0x8245123c: li r0, 0
0x82451240: stw r0, 4(r1)
0x82451244: bl 0x825F0F80
0x82451248: subi r31, r1, 160
0x8245124c: stwu r1, -160(r1)
0x82451250: mr r30, r4
0x82451254: li r9, 1
0x82451258: lwz r10, 32(r30)
0x8245125c: stw r30, 188(r31)
0x82451260: stw r9, 8(r30)
0x82451264: cmplwi cr6, r10, 0x0
0x82451268: beq cr6, 0x82451288
0x8245126c: lwz r11, 4(r30)
0x82451270: lwz r8, 4(r10)
0x82451274: cmplw cr6, r11, r8
0x82451278: bne cr6, 0x82451288
0x8245127c: mr r11, r9
0x82451280: li r26, 0
0x82451284: b 0x82451290
0x82451288: li r26, 0
0x8245128c: mr r11, r26
0x82451290: clrlwi r11, r11, 24
0x82451294: cmplwi cr6, r11, 0x0
0x82451298: beq cr6, 0x824512A0
0x8245129c: stw r9, 8(r10)
0x824512a0: lwz r3, 36(r30)
0x824512a4: lwz r11, 0(r3)
0x824512a8: lwz r11, 32(r11)
0x824512ac: mtctr r11
0x824512b0: bctrl
0x824512b4: mr r27, r3
0x824512b8: stw r26, 84(r31)
0x824512bc: stw r27, 96(r31)
0x824512c0: bl 0x82454498
0x824512c4: addi r4, r31, 84
0x824512c8: bl 0x82454580
0x824512cc: stw r26, 92(r31)
0x824512d0: addi r11, r27, 2047
0x824512d4: lis r10, 0x2
0x824512d8: clrrwi r11, r11, 11
0x824512dc: cmplw cr6, r11, r10
0x824512e0: stw r11, 100(r31)
0x824512e4: ble cr6, 0x824512F4
0x824512e8: lis r11, 0x8207
0x824512ec: addi r11, r11, 6724
0x824512f0: b 0x824512F8
0x824512f4: addi r11, r31, 100
0x824512f8: addi r3, r31, 84
0x824512fc: lwz r4, 0(r11)
0x82451300: bl 0x82454B08
0x82451304: mr r8, r8
0x82451308: mr r28, r3
0x8245130c: stw r28, 92(r31)
0x82451310: b 0x82451324
0x82451314: lwz r30, 188(r31)
0x82451318: lwz r27, 96(r31)
0x8245131c: li r26, 0
0x82451320: lwz r28, 92(r31)
0x82451324: addi r3, r31, 84
0x82451328: bl 0x82454AA0
0x8245132c: mr r29, r3
0x82451330: cmplwi cr6, r28, 0x0
0x82451334: beq cr6, 0x82451684
0x82451338: lwz r3, 36(r30)
0x8245133c: li r8, 0
0x82451340: addi r7, r31, 88
0x82451344: mr r6, r29
0x82451348: mr r5, r29
0x8245134c: mr r4, r28
0x82451350: lwz r11, 0(r3)
0x82451354: lwz r11, 28(r11)
0x82451358: mtctr r11
0x8245135c: bctrl
0x82451360: clrlwi r11, r3, 24
0x82451364: cmplwi cr6, r11, 0x0
0x82451368: beq cr6, 0x82451684
0x8245136c: lwz r11, 28(r30)
0x82451370: rlwinm r11, r11, 0, 28, 28
0x82451374: cmplwi cr6, r11, 0x0


@@ -0,0 +1,52 @@
=== Fire counts ===
ours: 3
canary: 7
=== Per-LR breakdown ===
ours:
lr=0x82458674: 3
canary:
lr=0x82457bd4: 2
lr=0x82458674: 5
=== Side-by-side first 5 fires (entry registers) ===
--- fire #0 ---
ours: tid=6 cycle=363 lr=0x82458674 r3=0x40ba9ac0
dump: 419fecda 000007f6 00000000 41d7dd10 00001688 00000000 00000000 41f5dd80 82457958 823f53f0 00000000 00000000 00000001 00000000 00000000 4024a5c0
canary: tid=11 cycle=<unk> lr=0x82458674 r3=0xbccc4ac0 r4=0x00000000 r5=0x00000001 r6=0x00000001 r7=0x00000000
dump: bdb19cda 000007f6 00000000 bde98d10 00001688 00000000 00000000 be078d80 82457958 823f53f0 00000000 00000000 00000001 00000000 00000000 bc365760
--- fire #1 ---
ours: tid=6 cycle=140548 lr=0x82458674 r3=0x40ba9b80
dump: 42c0f09a 00018ff6 00000000 43777210 0004d055 00000000 00000000 41f60d80 82457958 823f53f0 00000000 00000000 00000001 00000000 00000000 4024a960
canary: tid=11 cycle=<unk> lr=0x82458674 r3=0xbccc4b80 r4=0x00000000 r5=0x00000001 r6=0x00000001 r7=0x00000000
dump: bed2a09a 00018ff6 00000000 bf892210 0004d055 00000000 00000000 be07bd80 82457958 823f53f0 00000000 00000000 00000001 00000000 00000000 bc365840
--- fire #2 ---
ours: tid=6 cycle=5957876 lr=0x82458674 r3=0x40ba9b80
dump: 419fecda 000007f6 00000000 414f5f70 000003b9 00000000 00000000 41f60d80 82457958 823f53f0 00000000 00000040 00000001 00000000 00000000 4024a980
canary: tid=11 cycle=<unk> lr=0x82458674 r3=0xbccc4b80 r4=0x00000000 r5=0x00000001 r6=0x00000001 r7=0x00000000
dump: bdb19cda 000007f6 00000000 bd610b90 000003b9 00000000 00000000 be07bd80 82457958 823f53f0 00000000 00000040 00000001 00000000 00000000 bc365860
--- fire #3 ---
ours: <no fire>
canary: tid=11 cycle=<unk> lr=0x82458674 r3=0xbccc5300 r4=0x00000000 r5=0x00000001 r6=0x00000001 r7=0x00000000
dump: bdb1acda 000007f6 00000000 bce24ed0 00000167 00000000 00000000 be07bd80 82457958 823f53f0 00000000 00000000 00000001 00000000 00000000 bc365f40
--- fire #4 ---
ours: <no fire>
canary: tid=6 cycle=<unk> lr=0x82457bd4 r3=0x701cf3c0 r4=0x00000004 r5=0x00002530 r6=0x00008000 r7=0x00000001
dump: be95af9a 0000c170 00000000 b2050010 000681e9 00000000 00000000 be07bd80 82457958 823f53f0 00000000 0000c17a 00000001 701cf4e0 00000000 be95af90
=== Equivalence check: u32 lanes at +0x04 and +0x10 (work-item magic + counter) ===
Both fields are stable identifiers across engines (host VAs differ but data should match).
Index of fields:
[+0x04] = work-item 'size?' (looks like a length field)
[+0x10] = state counter (per round 30, this is [+128/4 ?]) — but in dump it's u32[4]
ours [+04,+10]: [(2038, 5768), (102390, 315477), (2038, 953)]
canary [+04,+10]: [(2038, 5768), (102390, 315477), (2038, 953), (2038, 359), (49520, 426473), (232195, 999643), (6134, 13763)]
ours fires whose [+04,+10] match a canary fire: 3/3


@@ -0,0 +1,175 @@
#!/usr/bin/env python3
"""Round 35 lockstep diff: align sub_8280AD40 entry fires between
ours (--audit-pc-probe-hex AUDIT-PC-PROBE / AUDIT-R3-DUMP) and
canary (AUDIT-HLC JitProlog).

Outputs side-by-side rendering of:
  - per-fire entry register snapshot (r3..r10, lr)
  - 64-byte r3 dump (u32 lanes, big-endian)

Alignment is by tid + invocation order (no input-equivalence required).
"""
import re
import sys
import os

THIS_DIR = os.path.dirname(os.path.abspath(__file__))
OURS_LOG = os.path.join(THIS_DIR, "ours.log")
CANARY_LOG = os.path.join(
    os.path.dirname(THIS_DIR), "round35-lockstep-inflate-canary", "canary.log"
)
PC_TARGET = 0x8280AD40


def parse_ours(path):
    """Pair AUDIT-PC-PROBE lines with their following AUDIT-R3-DUMP lines."""
    fires = []
    cur = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("AUDIT-PC-PROBE"):
                m = re.search(
                    r"pc=0x([0-9a-f]+) tid=(\d+) hw=\d+ cycle=(\d+) lr=0x([0-9a-f]+) r3=0x([0-9a-f]+) r11=0x([0-9a-f]+)",
                    line,
                )
                if not m:
                    continue
                pc = int(m.group(1), 16)
                if pc != PC_TARGET:
                    cur = None
                    continue
                cur = {
                    "tid": int(m.group(2)),
                    "cycle": int(m.group(3)),
                    "lr": int(m.group(4), 16),
                    "r3": int(m.group(5), 16),
                    "dump": [],
                }
                fires.append(cur)
            elif line.startswith("AUDIT-R3-DUMP") and cur is not None:
                lanes = re.findall(r"\+0x[0-9a-f]+=0x([0-9a-f]+)", line)
                cur["dump"] = [int(x, 16) for x in lanes]
                cur = None
    return fires


def parse_canary(path):
    """Pair AUDIT-HLC JitProlog header lines with following r3+NN dump lines."""
    fires = []
    cur = None
    hdr_re = re.compile(
        r"AUDIT-HLC JitProlog pc=8280AD40 tid=([0-9A-F]+) r3=([0-9A-F]+) r4=([0-9A-F]+) "
        r"r5=([0-9A-F]+) r6=([0-9A-F]+) r7=([0-9A-F]+) r8=([0-9A-F]+) r9=([0-9A-F]+) r10=([0-9A-F]+) lr=([0-9A-F]+)"
    )
    dump_re = re.compile(
        r"AUDIT-HLC JitProlog pc=8280AD40 r3\+([0-9A-F]+): ([0-9A-F]+) ([0-9A-F]+) ([0-9A-F]+) ([0-9A-F]+)"
    )
    with open(path) as f:
        for line in f:
            line = line.strip()
            m = hdr_re.search(line)
            if m:
                cur = {
                    "tid": int(m.group(1), 16),
                    "r3": int(m.group(2), 16),
                    "r4": int(m.group(3), 16),
                    "r5": int(m.group(4), 16),
                    "r6": int(m.group(5), 16),
                    "r7": int(m.group(6), 16),
                    "r8": int(m.group(7), 16),
                    "r9": int(m.group(8), 16),
                    "r10": int(m.group(9), 16),
                    "lr": int(m.group(10), 16),
                    "dump": [],
                }
                fires.append(cur)
                continue
            m = dump_re.search(line)
            if m and cur is not None:
                off = int(m.group(1), 16)
                for i in range(4):
                    word = int(m.group(2 + i), 16)
                    # extend dump to fit
                    idx = off // 4 + i
                    while len(cur["dump"]) <= idx:
                        cur["dump"].append(0)
                    cur["dump"][idx] = word
    return fires


def fmt_dump(d):
    return " ".join(f"{w:08x}" for w in d[:16])


def main():
    ours = parse_ours(OURS_LOG)
    canary = parse_canary(CANARY_LOG)
    print(f"=== Fire counts ===")
    print(f" ours: {len(ours)}")
    print(f" canary: {len(canary)}")
    print()
    print(f"=== Per-LR breakdown ===")
    for label, fires in (("ours", ours), ("canary", canary)):
        lr_counts = {}
        for f in fires:
            lr_counts[f["lr"]] = lr_counts.get(f["lr"], 0) + 1
        print(f" {label}:")
        for lr, n in sorted(lr_counts.items()):
            print(f" lr=0x{lr:08x}: {n}")
    print()
    print(f"=== Side-by-side first 5 fires (entry registers) ===")
    n = max(len(ours), len(canary))
    n = min(n, 5)
    for i in range(n):
        print(f"\n--- fire #{i} ---")
        if i < len(ours):
            f = ours[i]
            print(
                f" ours: tid={f['tid']:<3} cycle={f['cycle']:<10} lr=0x{f['lr']:08x} r3=0x{f['r3']:08x}"
            )
            print(f" dump: {fmt_dump(f['dump'])}")
        else:
            print(f" ours: <no fire>")
        if i < len(canary):
            f = canary[i]
            print(
                f" canary: tid={f['tid']:<3} cycle=<unk> lr=0x{f['lr']:08x} r3=0x{f['r3']:08x} "
                f"r4=0x{f['r4']:08x} r5=0x{f['r5']:08x} r6=0x{f['r6']:08x} r7=0x{f['r7']:08x}"
            )
            print(f" dump: {fmt_dump(f['dump'])}")
        else:
            print(f" canary: <no fire>")
    print()
    print("=== Equivalence check: u32 lanes at +0x04 and +0x10 (work-item magic + counter) ===")
    print(" Both fields are stable identifiers across engines (host VAs differ but data should match).")
    print()
    print(" Index of fields:")
    print(" [+0x04] = work-item 'size?' (looks like a length field)")
    print(" [+0x10] = state counter (per round 30, this is [+128/4 ?]) — but in dump it's u32[4]")
    print()
    # +0x04 is dump[1], +0x10 is dump[4]
    ours_keys = [(f["dump"][1], f["dump"][4]) if len(f["dump"]) > 4 else None for f in ours]
    canary_keys = [(f["dump"][1], f["dump"][4]) if len(f["dump"]) > 4 else None for f in canary]
    print(f" ours [+04,+10]: {ours_keys}")
    print(f" canary [+04,+10]: {canary_keys}")
    print()
    # Cross-match: every ours key should appear in canary (canary is a superset)
    matched = []
    unmatched_ours = []
    for k in ours_keys:
        if k in canary_keys:
            matched.append(k)
        else:
            unmatched_ours.append(k)
    print(f" ours fires whose [+04,+10] match a canary fire: {len(matched)}/{len(ours)}")
    if unmatched_ours:
        print(f" ours fires with NO canary match: {unmatched_ours}")


if __name__ == "__main__":
    main()


@@ -0,0 +1,17 @@
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 tid=00000006 r3=BCCC4A80 r4=00000018 r5=828F3888 r6=701CF924 r7=82456F00 r8=00000000 r9=00000000 r10=00000018 lr=822F1D5C
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+00: BC22C910 00010004 00000000 000003E8
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+10: 0101FFFF 00000000 00000000 01010000
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+20: FFFFFFFF 00000000 00000000 00000000
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+30: 00000000 BC365BC0 00000000 00000000
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+40: 00000000 00000000 00000000 BDE9A398
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+50: BC365560 00000000 00000000 00000000
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+60: 00000000 00000000 00000000 01010040
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+70: 00000000 00000000 00000000 FFFFFFFF
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+80: 00000000 00000000 00000000 BC22C930
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+90: 00000000 00000001 00000800 00000000
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+A0: F800004C 00000000 00000000 BC365220
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+B0: BC3655C0 00000000 00000000 00000000
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+C0: 00CC0048 00460020 00460072 00650071
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+D0: 00750065 006E0063 00790000 01010000
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+E0: 00000000 00000000 00000000 FFFFFFFF
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+F0: 00000000 00000000 00000000 BD610B80
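
The `r3+NN` dump lines above each carry four big-endian u32 words, so the word at byte offset `+0x10` is the first word of the `r3+10` line. The following is a minimal sketch of how such lines could be folded into a flat word list, in the same spirit as `parse_canary`. It assumes a generalized regex that captures the PC instead of hardcoding it; `DUMP_RE` and `parse_dump_lines` are hypothetical names for illustration, not part of the committed script.

```python
import re

# Hypothetical generalization of the script's dump regex: the PC is captured
# rather than hardcoded, so probes at any address are accepted.
DUMP_RE = re.compile(
    r"AUDIT-HLC JitProlog pc=([0-9A-F]+) r3\+([0-9A-F]+): "
    r"([0-9A-F]+) ([0-9A-F]+) ([0-9A-F]+) ([0-9A-F]+)"
)


def parse_dump_lines(lines):
    """Return {pc: [u32 words]} built from r3+NN dump lines."""
    dumps = {}
    for line in lines:
        m = DUMP_RE.search(line)
        if not m:
            continue  # header lines and unrelated log output are skipped
        pc = int(m.group(1), 16)
        off = int(m.group(2), 16)
        words = dumps.setdefault(pc, [])
        for i in range(4):
            idx = off // 4 + i
            # Zero-fill any gap so each word lands at its byte offset / 4.
            if len(words) <= idx:
                words.extend([0] * (idx + 1 - len(words)))
            words[idx] = int(m.group(3 + i), 16)
    return dumps


# Two lines copied from the captured log.
SAMPLE = [
    "K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+00: BC22C910 00010004 00000000 000003E8",
    "K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+10: 0101FFFF 00000000 00000000 01010000",
]
dumps = parse_dump_lines(SAMPLE)
words = dumps[0x821741C8]
print(f"+0x04={words[1]:08x} +0x10={words[4]:08x}")
```

Keying the result by PC is a design choice of this sketch only; the committed script instead filters on a single fixed PC and pairs each dump with its preceding header line.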


@@ -0,0 +1,89 @@
warn: CreateDXGIFactory2: Ignoring flags
info: Game: xenia_canary.exe
info: DXVK: v2.7.1
info: Build: x86_64 gcc 15.1.0
info: Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbd84000
info: Extension providers:
info: Platform WSI
info: OpenVR
info: OpenVR: could not open registry key, status 2
info: OpenVR: Failed to locate module
info: OpenXR
info: Enabled instance extensions:
info: VK_EXT_surface_maintenance1
info: VK_KHR_get_surface_capabilities2
info: VK_KHR_surface
info: VK_KHR_win32_surface
info: Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
info: Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
info: Skipping: Software driver
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
warn: DxgiAdapter::QueryInterface: Unknown interface query
warn: f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
564.236:00dc:013c:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
564.236:00dc:013c:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
564.236:00dc:013c:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
564.240:00dc:013c:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
564.240:00dc:013c:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
564.399:00dc:013c:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
564.825:00dc:013c:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
564.825:00dc:013c:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
564.839:00dc:013c:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
564.839:00dc:013c:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
564.839:00dc:013c:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
564.840:00dc:013c:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
564.840:00dc:013c:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
564.843:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
564.844:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: Promoting write cache to read cache. No need to merge any disk caches.
564.844:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 1.012 ms.
564.845:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.607 ms.
564.845:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.370 ms.
564.845:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
564.903:00dc:013c:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
564.903:00dc:013c:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
564.946:00dc:013c:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
565.065:00dc:013c:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
565.065:00dc:013c:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
565.067:00dc:013c:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
565.067:00dc:013c:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
565.067:00dc:013c:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
565.067:00dc:013c:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
565.067:00dc:013c:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
565.068:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
565.068:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
565.068:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.136 ms.
565.068:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.221 ms.
565.069:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.031 ms.
565.069:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
565.075:00dc:013c:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
warn: DXGIGetDebugInterface1: Stub
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
565.173:00dc:00e0:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
565.194:00dc:00e0:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
565.195:00dc:00e0:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
warn: DXGI: MakeWindowAssociation: Ignoring flags
warn: DxgiOutput::WaitForVBlank: Inaccurate
info: Setting timer interval to 1000 us
565.773:00dc:0164:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
566.349:00dc:016c:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
566.387:00dc:0164:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.


@@ -0,0 +1,89 @@
warn: CreateDXGIFactory2: Ignoring flags
info: Game: xenia_canary.exe
info: DXVK: v2.7.1
info: Build: x86_64 gcc 15.1.0
info: Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbfb4000
info: Extension providers:
info: Platform WSI
info: OpenVR
info: OpenVR: could not open registry key, status 2
info: OpenVR: Failed to locate module
info: OpenXR
info: Enabled instance extensions:
info: VK_EXT_surface_maintenance1
info: VK_KHR_get_surface_capabilities2
info: VK_KHR_surface
info: VK_KHR_win32_surface
info: Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
info: Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
info: Skipping: Software driver
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
warn: DxgiAdapter::QueryInterface: Unknown interface query
warn: f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
805.907:00d0:0124:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
805.907:00d0:0124:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
805.907:00d0:0124:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
805.910:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
805.910:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
805.955:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
806.100:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
806.100:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.105:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
806.105:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
806.105:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
806.105:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
806.105:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
806.106:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
806.106:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
806.106:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.161 ms.
806.107:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.185 ms.
806.107:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.028 ms.
806.107:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
806.154:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
806.154:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
806.197:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
806.310:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
806.310:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.312:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
806.312:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
806.312:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
806.312:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
806.312:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
806.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
806.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
806.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.156 ms.
806.314:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.659 ms.
806.314:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.035 ms.
806.314:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
806.319:00d0:0124:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
warn: DXGIGetDebugInterface1: Stub
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
806.408:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
806.422:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
806.423:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
warn: DXGI: MakeWindowAssociation: Ignoring flags
warn: DxgiOutput::WaitForVBlank: Inaccurate
info: Setting timer interval to 1000 us
806.948:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
807.499:00d0:0154:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
807.521:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.


@@ -0,0 +1,89 @@
warn: CreateDXGIFactory2: Ignoring flags
info: Game: xenia_canary.exe
info: DXVK: v2.7.1
info: Build: x86_64 gcc 15.1.0
info: Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbfb4000
info: Extension providers:
info: Platform WSI
info: OpenVR
info: OpenVR: could not open registry key, status 2
info: OpenVR: Failed to locate module
info: OpenXR
info: Enabled instance extensions:
info: VK_EXT_surface_maintenance1
info: VK_KHR_get_surface_capabilities2
info: VK_KHR_surface
info: VK_KHR_win32_surface
info: Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
info: Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
info: Skipping: Software driver
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
warn: DxgiAdapter::QueryInterface: Unknown interface query
warn: f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
893.096:00d4:0128:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
893.096:00d4:0128:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
893.096:00d4:0128:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
893.099:00d4:0128:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
893.099:00d4:0128:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
893.145:00d4:0128:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
893.308:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
893.308:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.310:00d4:0128:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
893.310:00d4:0128:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
893.310:00d4:0128:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
893.310:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
893.310:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
893.311:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
893.311:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
893.311:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.187 ms.
893.312:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.161 ms.
893.312:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.040 ms.
893.312:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
893.360:00d4:0128:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
893.360:00d4:0128:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
893.405:00d4:0128:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
893.520:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
893.520:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.522:00d4:0128:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
893.522:00d4:0128:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
893.522:00d4:0128:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
893.522:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
893.522:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.153 ms.
893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.199 ms.
893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.034 ms.
893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
893.529:00d4:0128:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
warn: DXGIGetDebugInterface1: Stub
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
893.622:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
893.631:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
893.632:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
warn: DXGI: MakeWindowAssociation: Ignoring flags
warn: DxgiOutput::WaitForVBlank: Inaccurate
info: Setting timer interval to 1000 us
894.203:00d4:0150:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
894.705:00d4:0158:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
894.727:00d4:0150:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.


@@ -0,0 +1,89 @@
warn: CreateDXGIFactory2: Ignoring flags
info: Game: xenia_canary.exe
info: DXVK: v2.7.1
info: Build: x86_64 gcc 15.1.0
info: Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbfb4000
info: Extension providers:
info: Platform WSI
info: OpenVR
info: OpenVR: could not open registry key, status 2
info: OpenVR: Failed to locate module
info: OpenXR
info: Enabled instance extensions:
info: VK_EXT_surface_maintenance1
info: VK_KHR_get_surface_capabilities2
info: VK_KHR_surface
info: VK_KHR_win32_surface
info: Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
info: Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
info: Skipping: Software driver
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
warn: DxgiAdapter::QueryInterface: Unknown interface query
warn: f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
956.778:00d0:0124:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
956.778:00d0:0124:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
956.778:00d0:0124:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
956.781:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
956.781:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
956.826:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
956.983:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
956.983:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
956.985:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
956.985:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
956.985:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
956.985:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
956.985:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
956.985:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
956.986:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
956.986:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.171 ms.
956.986:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.269 ms.
956.986:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.028 ms.
956.986:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
957.031:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
957.031:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
957.075:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
957.186:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
957.186:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
957.188:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
957.188:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
957.188:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
957.188:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
957.188:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
957.188:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
957.188:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
957.189:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.172 ms.
957.189:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.231 ms.
957.189:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.029 ms.
957.189:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
957.195:00d0:0124:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
warn: DXGIGetDebugInterface1: Stub
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
957.285:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
957.295:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
957.295:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
warn: DXGI: MakeWindowAssociation: Ignoring flags
warn: DxgiOutput::WaitForVBlank: Inaccurate
info: Setting timer interval to 1000 us
957.806:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
958.343:00d0:0154:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
958.382:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.

@@ -0,0 +1,89 @@
warn: CreateDXGIFactory2: Ignoring flags
info: Game: xenia_canary.exe
info: DXVK: v2.7.1
info: Build: x86_64 gcc 15.1.0
info: Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbfb4000
info: Extension providers:
info: Platform WSI
info: OpenVR
info: OpenVR: could not open registry key, status 2
info: OpenVR: Failed to locate module
info: OpenXR
info: Enabled instance extensions:
info: VK_EXT_surface_maintenance1
info: VK_KHR_get_surface_capabilities2
info: VK_KHR_surface
info: VK_KHR_win32_surface
info: Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
info: Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
info: Skipping: Software driver
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
warn: DxgiAdapter::QueryInterface: Unknown interface query
warn: f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
1217.108:00d4:0128:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
1217.108:00d4:0128:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
1217.108:00d4:0128:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
1217.111:00d4:0128:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
1217.111:00d4:0128:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
1217.160:00d4:0128:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.309:00d4:0128:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
1217.309:00d4:0128:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
1217.309:00d4:0128:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
1217.309:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
1217.309:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.166 ms.
1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.173 ms.
1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.031 ms.
1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
1217.360:00d4:0128:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
1217.360:00d4:0128:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
1217.403:00d4:0128:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.516:00d4:0128:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
1217.516:00d4:0128:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
1217.516:00d4:0128:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
1217.516:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
1217.516:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
1217.517:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
1217.517:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
1217.517:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.157 ms.
1217.517:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.208 ms.
1217.518:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.032 ms.
1217.518:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
1217.524:00d4:0128:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
warn: DXGIGetDebugInterface1: Stub
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
1217.612:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
1217.622:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
1217.622:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
warn: DXGI: MakeWindowAssociation: Ignoring flags
warn: DxgiOutput::WaitForVBlank: Inaccurate
info: Setting timer interval to 1000 us
1218.136:00d4:0150:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
1218.678:00d4:0158:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
1218.699:00d4:0150:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.

@@ -0,0 +1,89 @@
warn: CreateDXGIFactory2: Ignoring flags
info: Game: xenia_canary.exe
info: DXVK: v2.7.1
info: Build: x86_64 gcc 15.1.0
info: Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbfb4000
info: Extension providers:
info: Platform WSI
info: OpenVR
info: OpenVR: could not open registry key, status 2
info: OpenVR: Failed to locate module
info: OpenXR
info: Enabled instance extensions:
info: VK_EXT_surface_maintenance1
info: VK_KHR_get_surface_capabilities2
info: VK_KHR_surface
info: VK_KHR_win32_surface
info: Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
info: Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
info: Skipping: Software driver
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
warn: DxgiAdapter::QueryInterface: Unknown interface query
warn: f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
1413.916:00d0:0124:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
1413.916:00d0:0124:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
1413.916:00d0:0124:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
1413.919:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
1413.919:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
1413.963:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.111:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
1414.111:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
1414.111:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
1414.111:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
1414.111:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
1414.112:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
1414.112:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
1414.112:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.173 ms.
1414.113:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.276 ms.
1414.113:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.029 ms.
1414.113:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
1414.157:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
1414.157:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
1414.199:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
1414.310:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
1414.310:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.312:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
1414.312:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
1414.312:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
1414.312:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
1414.312:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.158 ms.
1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.256 ms.
1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.031 ms.
1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
1414.319:00d0:0124:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
warn: DXGIGetDebugInterface1: Stub
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
1414.406:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
1414.416:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
1414.416:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
warn: DXGI: MakeWindowAssociation: Ignoring flags
warn: DxgiOutput::WaitForVBlank: Inaccurate
info: Setting timer interval to 1000 us
1414.927:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
1415.477:00d0:0154:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
1415.500:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.

@@ -0,0 +1,47 @@
# iterate-2D Deferred Structural Fixes — Outcome
Branch `iterate-2D/subsystem-fixes`. After verification + the user's go-ahead:
## Issue 1 — 32-bit word-form ALU truncation (PPCBUG-020) — ✅ FIXED & LANDED
Commit **341196a**. Confirmed load-bearing via runtime ours-vs-canary capture:
Sylpheed's ms→LARGE_INTEGER converter `sub_824ACA88` (`clrldi; mulli r11,r11,-10000; std`)
produced `0x00000000_FFFD8F00` in ours vs canary's correct `0xFFFFFFFF_FFFD8F00` for a 16 ms
wait — a positive (absolute) timeout → ~26000× over-wait that froze the main frame loop.
Fixed the 17 data-losing word-form ops (full 64-bit result, CA/OV/CR0 preserved byte-identical),
updated 7 bug-asserting tests, re-baselined `sylpheed_n50m` (imports 40454→1790936), `sylpheed_n2m`
unchanged. 660/660 + ignored oracle green; lockstep determinism preserved. Boot unwedged
(parallel NtWaitForMultipleObjectsEx 94→30428; frozen worker/critical-section loops now run).
VdSwap still 1 — rendering progression needs the out-of-scope acd1656 fixes (nt_create_event
polarity + 2.AF), not in this branch.
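
The truncation can be reproduced in isolation. This is a minimal sketch, assuming the converter computes `ms * -10000` in 100 ns units (negative meaning relative); the function names are hypothetical and do not appear in the codebase:

```rust
/// Correct behaviour: the multiply keeps the full 64-bit product, so a 16 ms
/// wait becomes a negative (relative) timeout in 100 ns units.
pub fn timeout_full_width(ms: u32) -> i64 {
    (ms as i64).wrapping_mul(-10_000)
}

/// Buggy behaviour (PPCBUG-020): the product is truncated to 32 bits and
/// zero-extended, which turns it into a large positive (absolute) timeout.
pub fn timeout_truncated(ms: u32) -> i64 {
    ((timeout_full_width(ms) as u64) & 0xFFFF_FFFF) as i64
}
```

For `ms = 16` the truncated value is about 26,842 times the magnitude of the intended 160,000-unit timeout, which is consistent with the over-wait described above.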
## Issue 2 — Memory page-size per-region collapse — DEFERRED (verified NOT load-bearing)
Sylpheed requests `MmAllocatePhysicalMemoryEx` with flags=0, alignment(r8)=0 (default); ours returns
self-consistent 4K-aligned addresses and boots. Ours has no 0xA0/0xC0/0xE0 physical-region model at
all, so a faithful fix is a region-model rewrite that shifts every physical guest VA (golden-breaking,
invalidates the audit-059 VA map) with no demonstrated boot benefit. A partial page-size-only change
would shift VAs for zero correctness gain — do NOT do it piecemeal. Pursue only if a render-path
struct is proven to depend on physical region/alignment.
## Issue 3 — Timing — LEFT (not load-bearing / determinism-coupled)
- 3d DPC/APC: INERT — the only timer (NtSetTimerEx) passes a NULL APC routine; no
NtQueueApcThread/KeInsertQueueDpc imported.
- 3b timeout sign: was a SYMPTOM of Issue 1 (the "positive absolute" timeouts were mulli-corruption
artifacts) — resolved by the Issue 1 fix.
- 3a/3c timebase/skew: timebase = instruction-count IS the deterministic lockstep clock; must not
become wallclock. 2.AF deadline-drain already present. Not load-bearing for Sylpheed.
## Issue 4 — VFS synthesized-success-on-miss — LEFT (risky / coupled to Issue 1 trajectory)
The synthesis fallback handles a MIX (writable-partition probes partition0/Cache0 + a genuine disc
miss dat/files.tbl, verified absent from the ISO). Canary doesn't fire XamShowDirtyDiscErrorUI during
boot (the one "DirtyDisc" log hit is the import-table declaration). Not cleanly separable without
heuristic disc-vs-partition routing. Re-verify on the corrected post-Issue-1 (and post-acd1656)
trajectory before changing.
## Issue 5 — Mutant object — SKIPPED (verified unused)
Sylpheed's XEX import table contains NO mutant symbols (NtCreateMutant/NtReleaseMutant/KeReleaseMutant/
KeInitializeMutant/NtQueryMutant) — the game cannot call them; unimplemented=0 across boot. A correct
implementation needs mutant hand-off semantics + an owner-type redesign (the existing
`Mutex { owner: Option<u8> }` tracks a HW slot, not a thread) in the determinism-critical wait path,
for code that never executes. Per the mandate's skip-if-unused criterion, left unimplemented. Can be
added on request as a pure canary-parity / future-title feature (determinism-safe since no Sylpheed
mutant ever exists at runtime).

@@ -7,7 +7,11 @@ build = "build.rs"
[dependencies]
xenia-xex = { workspace = true }
xenia-cpu = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
anyhow = { workspace = true }
tracing = { workspace = true }
rusqlite = { workspace = true }
metrics = { workspace = true }
duckdb = { workspace = true }
msvc-demangler = "0.11"

@@ -0,0 +1,570 @@
# `xenia-analysis` schema reference
Authoritative documentation for the DuckDB tables and SQL views produced by
`xenia-rs dis --db sylpheed.db`. Track schema changes here alongside any
update to the `db_schema_golden` test fixture.
The base + disasm tables (`metadata`, `sections`, `imports`, `functions`,
`labels`, `instructions`, `xrefs`, opt-in `exec_trace` / `import_calls` /
`branch_trace`) are documented inline in the `src/db.rs` doc comment. This file
collects layered analysis additions and forward-work notes.
---
## Layer M1 — `.pdata` boundary correction (landed)
### Schema additions
- `functions.pdata_validated BOOLEAN NOT NULL`: `true` when the row's
`address` matches a `RUNTIME_FUNCTION.BeginAddress` from `.pdata`. Linker
ground truth.
- `functions.pdata_length BIGINT NULL`: `function_length` (bytes) from the
matching pdata entry; `NULL` when the row is prologue-only.
- New table `pdata_entries(begin_address BIGINT PRIMARY KEY, end_address
BIGINT, function_length BIGINT, prolog_length BIGINT, flags BIGINT)` — every
parsed `.pdata` `RUNTIME_FUNCTION` entry (raw, before any merge with
prologue analysis).
- Index `idx_functions_pdata_validated` on `functions(pdata_validated)`.
### What this layer does
- Parses `.pdata` 8-byte `RUNTIME_FUNCTION` entries (PowerPC PE32 layout):
word 0 `BeginAddress` (absolute VA), word 1 packed
`{prolog_length:8, function_length:22, flags:2}`, both big-endian.
- Unions pdata `BeginAddress` values into the function-candidate set fed to
the prologue walker, so functions our prologue heuristic missed still get
rows.
- When pdata supplies a longer `function_length` than the prologue walk
found, extends `end_address` to the pdata-implied end (catches mis-split
where the walker stopped at an early `blr`).
- After the walker, performs a forward pass that trims `function.end` to the
next start when they overlap (catches mis-merge where one row spanned two
prologues — the audit-031 `sub_824D23B0` / `sub_824D29F0` case).
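
The entry decode described above can be sketched as follows. The text gives field widths but not bit positions, so placing `prolog_length` in the low 8 bits of word 1 is an assumption that should be checked against the real parser; the `PdataEntry` type and `decode_runtime_function` name are hypothetical:

```rust
/// One parsed 8-byte `.pdata` entry (raw field values, no unit conversion).
#[derive(Debug, PartialEq, Eq)]
pub struct PdataEntry {
    pub begin_address: u32,
    pub prolog_length: u32,
    pub function_length: u32,
    pub flags: u32,
}

/// Decode two big-endian words into a `PdataEntry`.
/// ASSUMPTION: word 1 is packed low-to-high as
/// prolog_length (8 bits), function_length (22 bits), flags (2 bits).
pub fn decode_runtime_function(raw: [u8; 8]) -> PdataEntry {
    let word0 = u32::from_be_bytes([raw[0], raw[1], raw[2], raw[3]]);
    let word1 = u32::from_be_bytes([raw[4], raw[5], raw[6], raw[7]]);
    PdataEntry {
        begin_address: word0,
        prolog_length: word1 & 0xFF,
        function_length: (word1 >> 8) & 0x3F_FFFF,
        flags: word1 >> 30,
    }
}
```

If the real layout places the fields from the high bits down instead, only the shifts and masks change; the two-word big-endian read stays the same.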
### What this layer does NOT do
- Does not adjust prolog-derived `frame_size` / `saved_gprs` from `.pdata`'s
`prolog_length` field — those remain prologue-only inferences.
- Does not classify functions further than the existing `is_leaf` /
`is_saverestore` columns. Class membership is M3.
- Does not detect functions whose entries are missing from BOTH `.pdata`
and the bl-target scan (extremely rare; would require executable-byte
linear sweep).
### Reference docs
- Microsoft PE32+ exception data spec for PowerPC RUNTIME_FUNCTION.
- xenia-canary `src/xenia/cpu/xex_module.cc:1570-1587` — canary's reference
parser (extracts `BeginAddress` only; we additionally decode word 1).
### Validation queries
```sql
-- All pdata entries found
SELECT COUNT(*) FROM pdata_entries; -- ~23073 for Sylpheed
-- Functions cross-validated against pdata
SELECT COUNT(*) FROM functions WHERE pdata_validated;
-- Functions detected ONLY by prologue (orphans of pdata)
SELECT COUNT(*) FROM functions WHERE NOT pdata_validated;
-- Pdata orphans NOT yet in functions (should be 0 after this layer)
SELECT COUNT(*) FROM pdata_entries p
LEFT JOIN functions f ON f.address = p.begin_address
WHERE f.address IS NULL;
-- Audit-031 mis-merge resolved: 0x824D29F0 should have its own row
SELECT name FROM functions WHERE address = 2186095088; -- 0x824D29F0
```
---
## Layer M2 — MSVC C++ name demangler (landed)
### Schema additions
- New table `demangled_names(address BIGINT NULL, mangled VARCHAR NOT NULL,
raw_demangled VARCHAR NOT NULL, namespace_path VARCHAR NULL,
class_name VARCHAR NULL, method_name VARCHAR NULL,
params_signature VARCHAR NULL)`.
- Indices on `address`, `class_name`, `method_name`.
### What this layer does
- Wraps `msvc_demangler::demangle` (a Rust port of LLVM's
`MicrosoftDemangle.cpp`) and splits the formatted output into structured
fields via a heuristic top-level parser (handles templates and nested parens
correctly).
- Populates `demangled_names` from any label whose name starts with `?` plus
any import name that happens to be mangled (defensive — typical kernel
imports use C names).
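
A top-level split of this kind can be sketched with the standard library alone. This is an illustration, not the crate's or this layer's actual parser: it assumes the input is a qualified name without a return type, and `SplitName` and `split_qualified` are hypothetical names. Operator names such as `operator<` would need extra handling.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct SplitName {
    pub namespace_path: Option<String>,
    pub class_name: Option<String>,
    pub method_name: String,
    pub params_signature: Option<String>,
}

/// Split `ns::Class<T>::method(args)` on `::` separators that sit outside
/// any `<...>` nesting, and cut the parameter list at the first top-level `(`.
pub fn split_qualified(qualified: &str) -> SplitName {
    let bytes = qualified.as_bytes();
    let mut depth = 0i32; // current `<...>` nesting depth
    let mut parts: Vec<&str> = Vec::new();
    let mut part_start = 0usize;
    let mut params: Option<&str> = None;
    let mut i = 0usize;
    while i < bytes.len() {
        match bytes[i] {
            b'<' => depth += 1,
            b'>' => depth -= 1,
            b'(' if depth == 0 => {
                params = Some(&qualified[i..]);
                break;
            }
            b':' if depth == 0 && bytes.get(i + 1) == Some(&b':') => {
                parts.push(&qualified[part_start..i]);
                i += 2;
                part_start = i;
                continue;
            }
            _ => {}
        }
        i += 1;
    }
    let name_end = qualified.len() - params.map_or(0, str::len);
    parts.push(&qualified[part_start..name_end]);
    // `parts` always holds at least the final component pushed above.
    let method_name = parts.pop().unwrap_or("").to_string();
    let class_name = parts.pop().map(str::to_string);
    let namespace_path = if parts.is_empty() { None } else { Some(parts.join("::")) };
    SplitName {
        namespace_path,
        class_name,
        method_name,
        params_signature: params.map(str::to_string),
    }
}
```

Inputs that do not fit this shape simply leave the optional fields as `None`, which mirrors the NULL structured columns described in this section.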
### What this layer does NOT do
- Does not parse the AST returned by `msvc_demangler::parse` — uses the formatted
string and a heuristic split. Adequate for typical class member functions
and RTTI strings; exotic template / lambda forms still get `raw_demangled`
populated but may have NULL structured fields.
- Does not yet ingest RTTI strings discovered in `.rdata` — that's M3's job;
M3 will append rows to this table at the addresses where it finds RTTI
TypeDescriptors.
### Reference docs
- `msvc-demangler` crate (`https://docs.rs/msvc-demangler/0.11`).
- LLVM `MicrosoftDemangle.cpp` (the parser this crate ports).
## Layer M3 — Vtable + RTTI detection (landed)
### Schema additions
- `vtables(address PK, length, col_address NULL, class_name, rtti_present,
base_classes_json NULL)` — every detected static vtable.
- `methods(vtable_address, slot, function_address, mangled_name NULL,
demangled_name NULL, PRIMARY KEY (vtable_address, slot))` — one row per
method slot.
- `classes(name PK, vtable_address, rtti_present, base_classes_json NULL)` —
deduped by class name (first-detected vtable wins).
- Indices: `methods.function_address`, `classes.rtti_present`.
### What this layer does
- Walks `.rdata` and `.data` looking for runs of ≥3 consecutive 4-byte BE
values where each value is a known function start (from M1's corrected
`functions` table). Runs of only one or two methods are intentionally rejected to
control the false-positive rate.
- Attempts the MSVC RTTI walk `vtable[-1] → CompleteObjectLocator → TypeDescriptor`
for each candidate. When successful, the demangled `class ClassName`
string fills `class_name` and a best-effort
`RTTIClassHierarchyDescriptor` walk fills `base_classes_json` (JSON array
of base class names).
- Falls back to `ANON_Class_<8-hex>` keyed by FNV-1a hash of the sorted
method-PC tuple when RTTI is absent (typical for shipped game binaries).
Identical vtables across the binary (multiple instances) collapse to the
same anonymous name.
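
The run scan and the anonymous-name fallback can be sketched as below. Several details are assumptions rather than facts from the implementation: the section is treated as 4-byte aligned, the hash is 32-bit FNV-1a fed with each sorted PC as big-endian bytes, and the hex suffix is upper-case. Both function names are hypothetical.

```rust
use std::collections::HashSet;

/// Find runs of at least 3 consecutive big-endian words that are all known
/// function starts. Returns (run start VA, method PCs) per candidate vtable.
pub fn find_vtable_runs(
    section: &[u8],
    base_va: u32,
    function_starts: &HashSet<u32>,
) -> Vec<(u32, Vec<u32>)> {
    let mut out = Vec::new();
    let mut run: Vec<u32> = Vec::new();
    let mut run_start = 0u32;
    for (idx, chunk) in section.chunks_exact(4).enumerate() {
        let word = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        if function_starts.contains(&word) {
            if run.is_empty() {
                run_start = base_va + (idx as u32) * 4;
            }
            run.push(word);
        } else {
            // A non-function word ends the run; keep it only if long enough.
            if run.len() >= 3 {
                out.push((run_start, std::mem::take(&mut run)));
            }
            run.clear();
        }
    }
    if run.len() >= 3 {
        out.push((run_start, run));
    }
    out
}

/// Name an RTTI-less vtable from a 32-bit FNV-1a hash of its sorted method PCs.
/// ASSUMPTION: byte order and hex casing are illustrative choices.
pub fn anon_class_name(method_pcs: &[u32]) -> String {
    let mut sorted = method_pcs.to_vec();
    sorted.sort_unstable();
    let mut hash: u32 = 0x811C_9DC5; // FNV-1a 32-bit offset basis
    for pc in sorted {
        for byte in pc.to_be_bytes() {
            hash ^= byte as u32;
            hash = hash.wrapping_mul(0x0100_0193); // FNV-1a 32-bit prime
        }
    }
    format!("ANON_Class_{:08X}", hash)
}
```

Because the PCs are sorted before hashing, two vtables with the same method set get the same anonymous name regardless of slot order.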
### What this layer does NOT do
- Vtables built at runtime in heap-allocated memory (e.g. by ctors copying
static templates) are out of scope — only static `.rdata`/`.data` content.
- Multiple-inheritance "extra" vftables (one per base subobject) are detected
as independent vtables with no link between them.
- Inheritance-tree walking beyond `RTTIClassHierarchyDescriptor`'s direct
base list is not attempted.
### Reference docs
- openrce.org "Reversing Microsoft Visual C++" — RTTI layout articles
(CompleteObjectLocator at vtable[-1]; TypeDescriptor at COL+0xC; mangled
name at TD+0x8).
## Layer M4 — Class-aware probe targeting (landed)
CLI extension only — no schema changes. The probe-token grammar adds three
symbolic forms on top of the existing `0xADDR` literal:
- `Class::method` — joins `classes` × `methods` × `demangled_names` to find
every PC whose vtable belongs to that class and whose demangled
`method_name` matches.
- `Class::*` — joins `classes` × `methods` to find every method PC of that
class.
- `function_name` — falls back to `functions.name` lookup for free functions
/ saverestore stubs / labels.
Numeric tokens never touch the DB (preserves zero-IO fast path; lockstep
digest unaffected). Symbolic tokens require the DuckDB at `--probe-db PATH`
or `XENIA_PROBE_DB`; default is `sylpheed.db` next to the .iso when present.
Resolution happens BEFORE guest exec begins, so it cannot affect the
lockstep digest.
See `crates/xenia-analysis/src/lookup.rs`.
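
The token grammar can be illustrated with a small classifier. This is a sketch only and is not taken from `lookup.rs`: the `ProbeToken` enum and `parse_probe_token` function are hypothetical, and splitting at the last `::` is an assumption about how qualified class names are handled. Only the `Address` variant is resolvable without a database.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeToken {
    /// `0xADDR` literal: resolved without touching the DB.
    Address(u32),
    /// `Class::*`: every method PC of the class.
    ClassAll(String),
    /// `Class::method`: PCs whose demangled method name matches.
    ClassMethod { class: String, method: String },
    /// Bare name: falls back to a `functions.name` lookup.
    Function(String),
}

/// Classify one probe token; returns `None` for malformed input.
pub fn parse_probe_token(tok: &str) -> Option<ProbeToken> {
    if let Some(hex) = tok.strip_prefix("0x").or_else(|| tok.strip_prefix("0X")) {
        return u32::from_str_radix(hex, 16).ok().map(ProbeToken::Address);
    }
    if tok.is_empty() {
        return None;
    }
    match tok.rsplit_once("::") {
        Some((class, "*")) if !class.is_empty() => {
            Some(ProbeToken::ClassAll(class.to_string()))
        }
        Some((class, method)) if !class.is_empty() && !method.is_empty() => {
            Some(ProbeToken::ClassMethod {
                class: class.to_string(),
                method: method.to_string(),
            })
        }
        Some(_) => None,
        None => Some(ProbeToken::Function(tok.to_string())),
    }
}
```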
---
## Layer M5 — Indirect-dispatch reachability (landed)
### Schema additions
- New value `'ind_call'` in the `xrefs.kind` set.
- New SQL view `v_indirect_reachability_from_entry` — strict superset of
`v_reachability_from_entry`, taking `ind_call` edges in the BFS.
### What this layer does
- Walks each `FuncAnalysis.functions` entry with a per-basic-block register
tracker. Recognises the canonical static-vtable pattern:
`lis+addi → lwz off(rA) → mtctr → bcctrl`, where `rA` ends up holding a
known vtable's start address from M3.
- Honours the PowerPC ABI: `bl`-style calls (op 18 / 16 with LK=1) clobber
volatile r0..r12 + ctr but preserve non-volatile r13..r31, so a vtable
pointer parked in r30/r31 before a call survives.
- Treats every M3 `loc_*` label as a basic-block boundary (kills register
state) so jump-IN paths cannot induce false positives.
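
The register-tracking rules above can be sketched as follows. This is a minimal illustration, not the layer's actual tracker: `RegTracker` and its methods are hypothetical names, and only `lis`, `addi`, call clobbering, and label kills are modelled.

```rust
/// Minimal per-basic-block constant tracker (illustrative names).
#[derive(Debug, Clone)]
pub struct RegTracker {
    regs: [Option<u32>; 32],
}

impl RegTracker {
    pub fn new() -> Self {
        Self { regs: [None; 32] }
    }

    /// `lis rD, imm`: rD = imm << 16.
    pub fn lis(&mut self, rd: usize, imm: u16) {
        self.regs[rd] = Some((imm as u32) << 16);
    }

    /// `addi rD, rA, simm`: rA == 0 means the literal value 0, not r0.
    pub fn addi(&mut self, rd: usize, ra: usize, simm: i16) {
        let base = if ra == 0 { Some(0) } else { self.regs[ra] };
        self.regs[rd] = base.map(|b| b.wrapping_add(simm as i32 as u32));
    }

    /// A `bl`-style call clobbers volatile r0..r12; r13..r31 survive.
    pub fn clobber_call(&mut self) {
        self.regs[..=12].fill(None);
    }

    /// A label is a basic-block boundary: forget everything.
    pub fn kill_all(&mut self) {
        self.regs = [None; 32];
    }

    pub fn get(&self, r: usize) -> Option<u32> {
        self.regs[r]
    }
}
```

Because `addi` sign-extends its immediate, an address such as `0x8200F000` is materialised as `lis 0x8201` followed by `addi -0x1000`; the wrapping add handles that case.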
### What this layer does NOT do (and observed impact)
- Vtable pointer loaded from a `this`-pointer field
(`lwz r_vt, off(rA)` where `rA = this`) — by far the dominant pattern in
real C++ — is unresolvable without alias / points-to analysis.
- On Sylpheed: the layer detects 0 edges. The binary's 1,001 lis+addi
references into vtables are mostly constructor-side **vptr writes**
(`stw rVtable, vptr_offset(this)`), not direct dispatches. Surfacing the
renderer hunt's audit-009 cluster therefore requires the `this`-flow
tracking of Layer M5.5; this layer alone cannot reach it.
### Reference docs
- IBM PowerPC ABI: register-save convention (volatile r0..r12 + ctr,
non-volatile r13..r31).
## Layer M7 — String / constant-pool detection (landed)
### Schema additions
- New table `strings(address PK, encoding, length, content)`.
- Index `idx_strings_encoding`.
### What this layer does
- Scans `.rdata` for runs of length ≥ 6 of printable ASCII bytes followed by
a NUL terminator.
- Scans `.rdata` for UTF-16LE runs of length ≥ 6 code units (printable-ASCII
basic plane only) followed by a u16 NUL terminator.
- Cross-reference is implicit: existing `xrefs.kind='ref'` rows whose
`target` exactly equals a `strings.address` identify the referencing
PCs. SQL: `SELECT s.content, x.source FROM xrefs x JOIN strings s
ON s.address = x.target WHERE x.kind='ref'`.
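
The ASCII scan can be sketched as below. This is a sketch under assumptions rather than the scanner's real code: `scan_ascii_strings` is a hypothetical name, and "printable" is taken to mean bytes `0x20..=0x7E` only (the real scanner may also accept tabs or newlines).

```rust
/// Find NUL-terminated runs of printable ASCII of at least `min_len` bytes.
/// Returns `(virtual_address, text)` pairs; `base_va` is the VA of `data[0]`.
pub fn scan_ascii_strings(data: &[u8], base_va: u32, min_len: usize) -> Vec<(u32, String)> {
    let mut out = Vec::new();
    let mut start = 0usize; // index where the current printable run began
    for (i, &b) in data.iter().enumerate() {
        if (0x20..=0x7E).contains(&b) {
            continue; // still inside a printable run
        }
        // Any other byte ends the run; only a NUL terminator makes it a string.
        if b == 0 && i - start >= min_len {
            let text = String::from_utf8_lossy(&data[start..i]).into_owned();
            out.push((base_va + start as u32, text));
        }
        start = i + 1;
    }
    out
}
```

A run that reaches the end of the buffer without a NUL is dropped, which matches the "followed by a NUL terminator" requirement.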
### What this layer does NOT do
- No UTF-8 multibyte / non-ASCII basic plane in either encoding.
- No `.data` scan (read-only-section bias).
- No multi-byte CJK encodings — Japanese text in localised builds may be
represented in shift_jis / utf-8 with non-printable bytes that this
scanner skips.
### Sylpheed yield
- 6,311 ASCII strings (including full embedded HLSL shader source).
- 0 UTF-16LE strings (binary uses ASCII / native CJK encoding).
- 9,132 lis+addi sites cross-reference into the detected strings, naming
the source PCs that reference each string.
## Layer M6 — Extended store-class xrefs + `addr_mode` column (landed)
### Schema additions
- `xrefs.addr_mode VARCHAR NULL` — sub-classifies how the source instruction
computes its target. NULL for control-flow edges (call / ind_call / j /
br); one of the following tags for data edges:
- `d_form` — standard signed-16 displacement (lwz/stw/lfs/stfs/etc.)
- `lis_addi` — address materialised via `lis + addi` register tracking
- `lis_ori` — address materialised via `lis + ori`
- `multiword` — `lmw / stmw` (one xref per slot; up to 32-rS slots)
- `x_form_indexed` — `stwx / stbx / sthx / stwux / stbux / sthux / stdx /
stdux / lwzx / lbzx / lhzx / lhax / lwzux / lbzux / lhzux / lhaux / ldx /
ldux` — emitted only when both rA and rB are tracked constants
- `x_form_byterev` — `stwbrx / sthbrx / lwbrx / lhbrx`
- `atomic` — `stwcx. / stdcx.` reservation-conditional stores
- `dcbz` — cache-line clear (32-byte zero at rA+rB)
- Index `idx_xrefs_addr_mode`.
### What this layer does
- Tags every existing data xref with its addressing mode (`d_form` for the
bulk; `lis_addi` / `lis_ori` for the lift-and-add cases that produce
DataRef rows).
- Adds new dispatch for opcode 47 (`stmw`) and 46 (`lmw`), expanding to
per-slot DataWrite / DataRead rows.
- Adds new dispatch for opcode 31 X-form: stores, atomic, byte-reverse,
dcbz. X-form rows are emitted ONLY when both rA and rB resolve to known
constants (otherwise the address is runtime-dependent and we skip).
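
The per-slot expansion for `lmw` / `stmw` can be illustrated with a short sketch. `multiword_slots` is a hypothetical helper name; it assumes the effective address `ea` has already been resolved to a constant.

```rust
/// Expand `lmw`/`stmw rS, d(rA)` into one `(register, address)` pair per slot.
/// Registers rS..r31 are transferred to/from consecutive words starting at `ea`,
/// so there are `32 - rS` slots.
pub fn multiword_slots(rs: u8, ea: u32) -> Vec<(u8, u32)> {
    (rs..32)
        .enumerate()
        .map(|(k, reg)| (reg, ea.wrapping_add(4 * k as u32)))
        .collect()
}
```

For example, `stmw r29, d(rA)` with a resolved address of `0x82001000` yields three write rows, one each for r29, r30 and r31.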
### What this layer does NOT do
- VMX / VMX128 vector stores (opcode 31 with vector XO codes) are not
emitted — they always have register-indexed addresses that the
lis+addi tracker can't usually resolve, and detecting them adds noise
without improving target resolution.
- The dominant runtime `stwx` pattern (rA = base, rB = runtime index) is
not resolved, by design; mem-watch covers the runtime side per VERIFY-B.
### Sylpheed yield
- 28,834 `lis_addi` refs, 18,485 `d_form` reads, 3,288 `d_form` writes —
the existing baseline now properly tagged.
- **442 newly-detected `x_form_indexed` reads** — primarily lwzx/lhzx
reads from in-table dispatch (each pair (rA,rB) resolved statically).
- **40 newly-detected `atomic` writes** — every `stwcx.` site with a
resolvable address; useful for reservation-table audits.
- 9 `lis_ori` refs.
- 0 multiword / dcbz / byterev — these instructions exist in the binary
but are not in lis+addi-tracked code paths.
## Layer M8 + M11 — Function-pointer arrays beyond vtables (landed)
### Schema additions
- New table `function_pointer_arrays(address PK, length, kind)` where
`kind` is `'vtable'` (M3 re-emit), `'dispatch_table'` (M8), or
`'static_init'` (M11).
- New table `function_pointer_array_entries(array_address, slot,
function_address, PRIMARY KEY (array_address, slot))` — one row per
slot of every detected array (vtable + non-vtable).
- Indices on `function_pointer_arrays.kind` and
`function_pointer_array_entries.function_address`.
### What this layer does
- Walks `.rdata` (only — `.data` produces too many false positives) for
runs of ≥ 2 consecutive 4-byte BE values where each value is a known
function entry from M1's `functions` table.
- Skips runs whose start matches an M3 vtable head — those are re-emitted
in this table with `kind='vtable'` for unified queries but not
re-classified.
- Heuristically classifies non-vtable runs:
- `static_init` (M11): every entry's first instruction is `mfspr r12, LR`
AND the next is `stwu r1, -N(r1)` with `N ≤ 0x80` (or a save-stub `bl`).
Mirrors the typical C++ static-initialiser prologue.
- `dispatch_table` (M8): everything else.
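
The run detection can be sketched as below. This is an illustration only: `find_pointer_runs` is a hypothetical name, the `functions` set stands in for M1's `functions` table, and the vtable-head skip and the `static_init` / `dispatch_table` classification are left out.

```rust
use std::collections::HashSet;

/// Find runs of at least `min_len` consecutive big-endian words that are all
/// known function entries. Returns `(run_start_va, entries)` per run.
pub fn find_pointer_runs(
    data: &[u8],
    base_va: u32,
    functions: &HashSet<u32>,
    min_len: usize,
) -> Vec<(u32, Vec<u32>)> {
    let mut out = Vec::new();
    let mut run: Vec<u32> = Vec::new();
    let mut run_start = base_va;
    for (slot, chunk) in data.chunks_exact(4).enumerate() {
        let value = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        if functions.contains(&value) {
            if run.is_empty() {
                run_start = base_va + 4 * slot as u32;
            }
            run.push(value);
            continue;
        }
        // A non-function word ends the current run; keep it only if long enough.
        let finished = std::mem::take(&mut run);
        if finished.len() >= min_len {
            out.push((run_start, finished));
        }
    }
    if run.len() >= min_len {
        out.push((run_start, run));
    }
    out
}
```

Raising `min_len` from 2 to 3 is the in-scanner equivalent of the `WHERE length >= 3` precision filter.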
### What this layer does NOT do
- Does not parse symbol-table-bracketed regions like `__xc_a` / `__xc_z`
/ `__xi_a` / `__xi_z` directly — Sylpheed's symbol table is stripped.
- Does not chain multi-segment static-init drivers; future M11.5 could
walk the entry-point's static-init driver call chain to surface
ground-truth ctor PCs.
- 2-slot runs in `.rdata` may be false positives where two struct fields
happen to alias function VAs; downstream queries should use a length
filter (`WHERE length >= 3`) when high precision matters.
### Sylpheed yield
- 722 vtables (M3 re-emit) + 388 dispatch_tables = 1,110 arrays in
`function_pointer_arrays`.
- 0 static_init detected — Sylpheed's ctors don't all match the
conservative prologue heuristic. Lengths concentrate at 2 slots
(typical of switch-case jump tables).
## Layer M9 — `has_eh` from `.pdata` exception flag (landed)
### Schema additions
- `functions.has_eh BOOLEAN NOT NULL` — true when `.pdata`'s exception-
handler-present bit (bit 31 of word 1, the high bit) is set.
- Index `idx_functions_has_eh`.
### What this layer does
- Derived directly from M1's already-parsed `pdata.flags` bit field (no
new parsing). The bit was always available in `pdata_entries.flags`;
this layer surfaces it as a first-class column on `functions`.
### What this layer does NOT do
- Does not parse the actual `__CxxFrameHandler` / `__C_specific_handler`
scope-table records that the exception bit gates. Walking those tables
would let us name try/catch ranges and per-state cleanup actions, but
is out of scope for a derive-only milestone.
### Sylpheed yield
- 2,975 of 23,073 pdata-validated functions have `has_eh=true` (12.9%) —
plausible MSVC C++ EH coverage rate. Largest EH function: 26,328 bytes
(`sub_823518F0`).
## Layer M10 — `.tls` section / TLS directory (landed)
### Schema additions
- New table `tls_info(raw_data_start, raw_data_end, index_address,
callback_array, zero_fill_size, characteristics)` — at most one row
(the IMAGE_TLS_DIRECTORY32).
- New table `tls_callbacks(slot PK, address)` — one row per resolved TLS
callback function.
### What this layer does
- Reads the first 24 bytes of the `.tls` section as an
`IMAGE_TLS_DIRECTORY32` and walks the zero-terminated callback array.
- All addresses stored as absolute VAs.
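
A sketch of the 24-byte directory read follows. `TlsInfo`, `parse_tls_directory` and `read_callbacks` are hypothetical names; the field order is that of `IMAGE_TLS_DIRECTORY32`, and big-endian storage is assumed because the Xbox 360 image is BE throughout.

```rust
/// The six 32-bit fields of an IMAGE_TLS_DIRECTORY32, in declaration order.
#[derive(Debug, PartialEq, Eq)]
pub struct TlsInfo {
    pub raw_data_start: u32,
    pub raw_data_end: u32,
    pub index_address: u32,
    pub callback_array: u32,
    pub zero_fill_size: u32,
    pub characteristics: u32,
}

/// Parse the first 24 bytes as a big-endian TLS directory.
pub fn parse_tls_directory(bytes: &[u8]) -> Option<TlsInfo> {
    let dir = bytes.get(..24)?;
    let word = |i: usize| {
        u32::from_be_bytes([dir[4 * i], dir[4 * i + 1], dir[4 * i + 2], dir[4 * i + 3]])
    };
    Some(TlsInfo {
        raw_data_start: word(0),
        raw_data_end: word(1),
        index_address: word(2),
        callback_array: word(3),
        zero_fill_size: word(4),
        characteristics: word(5),
    })
}

/// Walk a zero-terminated array of big-endian callback VAs.
pub fn read_callbacks(array: &[u8]) -> Vec<u32> {
    array
        .chunks_exact(4)
        .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .take_while(|&va| va != 0)
        .collect()
}
```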
### What this layer does NOT do
- Does not parse the raw TLS template content (the variable initialiser
block); just records its start/end VAs.
### Sylpheed yield
- 0 rows — Sylpheed has no `.tls` section. Infrastructure ready for any
binary that uses `__declspec(thread)` storage.
## Layer M12 — `--lr-trace` runtime canary-diff harness (landed)
### Runtime additions (no DB)
- New CLI flag `--lr-trace=PC[,PC,...]` on `exec` — comma-separated PCs
to capture as JSONL records on every fire. Symbolic tokens (`Class::method`)
resolve via M4's lookup against `--probe-db`. Settable via
`XENIA_LR_TRACE`.
- New CLI flag `--lr-trace-out=PATH` — writes JSONL to a file (one
record per line). Stdout when omitted. Settable via `XENIA_LR_TRACE_OUT`.
- New kernel state fields `lr_trace_pcs: HashSet<u32>` +
`lr_trace_writer: Option<Mutex<File>>` and helper
`KernelState::fire_lr_trace_if_match(hw_id)` invoked from the
per-instruction probe slot.
### JSONL record fields
`pc, tid, hw, cycle, r3, r4, r5, r6, lr` — superset of what
xenia-canary's `--log_lr_on_pc` patch emits, with a cycle counter added
for cross-run reproducibility.
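
One way to picture a record is the sketch below. The field names come from the list above, but everything else is an assumption: `LrTraceRecord` and `to_jsonl` are hypothetical names, the integer widths are guesses, and the real harness may format values differently (for example as hex strings).

```rust
/// Hypothetical in-memory form of one `--lr-trace` record.
pub struct LrTraceRecord {
    pub pc: u32,
    pub tid: u32,
    pub hw: u8,
    pub cycle: u64,
    pub r3: u64,
    pub r4: u64,
    pub r5: u64,
    pub r6: u64,
    pub lr: u64,
}

impl LrTraceRecord {
    /// Render as one JSONL line (decimal integers, trailing newline).
    pub fn to_jsonl(&self) -> String {
        format!(
            "{{\"pc\":{},\"tid\":{},\"hw\":{},\"cycle\":{},\"r3\":{},\"r4\":{},\"r5\":{},\"r6\":{},\"lr\":{}}}\n",
            self.pc, self.tid, self.hw, self.cycle, self.r3, self.r4, self.r5, self.r6, self.lr
        )
    }
}
```

Emitting the fields in a fixed order keeps two runs diffable line by line.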
### What this layer does NOT do
- Does not capture VMX / FP register state (only GPRs r3..r6).
- Does not buffer / batch records — one `write_all` per fire. For
high-frequency probes (e.g. tight loops at >1M fires/sec), redirect
to a file and use an SSD.
### Determinism
Lockstep digest unaffected: probe firing happens after the per-instr
hooks for ctor/branch probes and only emits side-channel output. Verified
end-of-session: `check sylpheed.iso --stable-digest -n 2M` ×2 produced
byte-identical digests (`instructions=2000005`).
---
## Layer M5.5 — `this`-flow indirect-dispatch resolution (landed)
### Schema additions
- New table `vptr_writes(writer_pc, vtable_address, vptr_offset, writer_function)` —
every detected `stw rVtable, vptr_off(rThis)` site.
- New table `indirect_dispatch_sites(dispatch_pc PK, vptr_offset, slot, candidate_count)` —
one row per resolved dispatch.
- New table `indirect_dispatch_candidates(dispatch_pc, vtable_address, method_address)` —
one row per (dispatch × candidate vtable). Joined to existing
`xrefs.kind='ind_call'` edges (one ind_call row per candidate).
- New indices on `vptr_writes.vtable_address`, `vptr_writes.vptr_offset`,
`indirect_dispatch_candidates.method_address`,
`indirect_dispatch_candidates.vtable_address`,
`indirect_dispatch_sites(vptr_offset, slot)`.
### What this layer does (class-membership inference)
1. **Phase 1 — vptr-write scan**: walk every function with the lis+addi
tracker; whenever `stw rA, off(rB)` writes a known M3 vtable address,
record `(vtable_addr, vptr_offset, writer_pc)`.
2. **Phase 2 — invert**: build `vtables_by_offset[vptr_off] = {V}` for the
set of vtables ever written at that offset.
3. **Phase 3 — dispatch detection**: walk back ≤16 instructions from each
`bcctrl` / `bctrl` (`bcctr` with LK=1), find the canonical
`lwz vt, off(this); lwz fn, slot*4(vt); mtctr fn` chain. Extract
`(vptr_off, slot)`. Bail on register clobber, branch, or label
boundary.
4. **Phase 4 — emit**: for each `(dispatch_pc, vptr_off, slot)`, emit one
`xrefs.kind='ind_call'` row per candidate vtable that has a
matching slot. Multi-candidate rows are an over-approximation.
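
Phases 2 and 4 can be sketched as follows. This is a simplified illustration, not the layer's implementation: `VptrWrite`, `invert_by_offset` and `candidates` are hypothetical names, and vtables are modelled as a map from vtable address to its slot list.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One Phase 1 result: a vtable address stored at `vptr_offset(this)`.
pub struct VptrWrite {
    pub vtable_addr: u32,
    pub vptr_offset: u32,
    pub writer_pc: u32,
}

/// Phase 2: offset -> set of vtables ever written at that offset.
pub fn invert_by_offset(writes: &[VptrWrite]) -> BTreeMap<u32, BTreeSet<u32>> {
    let mut by_offset = BTreeMap::new();
    for w in writes {
        by_offset
            .entry(w.vptr_offset)
            .or_insert_with(BTreeSet::new)
            .insert(w.vtable_addr);
    }
    by_offset
}

/// Phase 4: for a dispatch through `(vptr_offset, slot)`, list every
/// `(vtable, method)` pair whose vtable actually has that slot.
pub fn candidates(
    by_offset: &BTreeMap<u32, BTreeSet<u32>>,
    vtables: &BTreeMap<u32, Vec<u32>>,
    vptr_offset: u32,
    slot: usize,
) -> Vec<(u32, u32)> {
    by_offset
        .get(&vptr_offset)
        .into_iter()
        .flatten()
        .filter_map(|vt| vtables.get(vt)?.get(slot).map(|&method| (*vt, method)))
        .collect()
}
```

A result of length 1 corresponds to a single-candidate (high-confidence) site; longer results are the over-approximation described in Phase 4.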
### What this layer does NOT do
- No alias resolution at multi-candidate sites — emits one edge per
matching vtable. Downstream queries should filter
`indirect_dispatch_sites WHERE candidate_count=1` for high-confidence
edges.
- No flow-sensitive analysis: register state is killed at every label
(basic-block boundary) and at `bl`/`bcl` calls (volatile r0..r12 +
ctr). We do NOT propagate values across calls in the chain-walker.
- No tracking of vptr writes via X-form indexed (`stwx`), VMX, or
multiword stores. Only D-form `stw rA, off(rB)`.
- Does not synthesise vptr writes for inlined / elided constructors.
If a class never has a writer at offset `vptr_off`, dispatches
through that offset find no candidates.
### Sylpheed yield
- 567 vptr writes covering 214 distinct vtables (~30% of M3's 722).
- 29 distinct vptr offsets used; offset 0 dominates (501/567 = 88%,
single-inheritance).
- **6,842 dispatch sites resolved**: 97 single-candidate
(high-confidence) + 6,745 multi-candidate (over-approximation).
- 687,963 `ind_call` xref rows total.
- **2,746 newly-reachable functions** via the M5 BFS view
(`v_indirect_reachability_from_entry`) compared to call/j/br alone.
- Audit-009 cluster (renderer plateau): functions newly visible
include `0x823BC9E0`, `0x823BC290`, `0x823BC5A0`, `0x823BB158`,
`0x823BB1E0`, `0x823BCAF0`, `0x823BC4C8` — actionable starting
points for the cluster's reachability hunt.
### Reference docs
- IBM PowerPC ABI (volatile/non-volatile register partition).
- Itanium C++ ABI on vtable layout (offset-from-`this` model adapted
by MSVC for Win32 PPC).
## Layer M9.5 — `__CxxFrameHandler` scope-table parsing (landed)
### Schema additions
- New table `eh_funcinfo(address PK, magic, max_state, p_unwind_map,
n_try_blocks, p_try_block_map, n_ip_map_entries, p_ip_to_state_map,
p_es_type_list, eh_flags)`.
- New table `eh_unwind_map(funcinfo_address, state_index, to_state, action_pc,
PRIMARY KEY (funcinfo_address, state_index))`.
- New table `eh_try_blocks(funcinfo_address, try_index, try_low, try_high,
catch_high, n_catches, p_handler_array,
PRIMARY KEY (funcinfo_address, try_index))`.
### What this layer does
- Magic-scans `.rdata` for the documented MSVC FuncInfo signatures
(0x19930520 / 0x19930521 / 0x19930522), reading 4-byte BE values
on 4-byte alignment.
- Sanity-checks `max_state` ≤ 10,000, `n_try_blocks` ≤ 1,000, all
internal pointers landing in valid sections.
- Walks `pUnwindMap` (8-byte UnwindMapEntry) and `pTryBlockMap`
(20-byte TryBlockMapEntry) into one row each.
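
The unwind-map walk can be sketched as below. `read_unwind_map` is a hypothetical helper; it assumes the caller has already sliced the image at `pUnwindMap` and that entries are big-endian, as the Xbox 360 image is BE throughout.

```rust
/// Read `max_state` 8-byte UnwindMapEntry records: `(toState i32, action u32)`.
/// Returns `None` if the slice is too short to hold the whole table.
pub fn read_unwind_map(bytes: &[u8], max_state: usize) -> Option<Vec<(i32, u32)>> {
    let table = bytes.get(..max_state.checked_mul(8)?)?;
    Some(
        table
            .chunks_exact(8)
            .map(|e| {
                let to_state = i32::from_be_bytes([e[0], e[1], e[2], e[3]]);
                let action_pc = u32::from_be_bytes([e[4], e[5], e[6], e[7]]);
                (to_state, action_pc)
            })
            .collect(),
    )
}
```

Bounds-checking the whole table up front means a bogus `max_state` from a false-positive magic match is rejected rather than read past the image.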
### What this layer does NOT do
- Does not associate FuncInfo records with their owning function via
the `bl __CxxFrameHandler` registration site — joins to `functions`
by best-effort PC-range queries. A future M9.6 can chase the
registration to make the link explicit.
- Does not parse `pHandlerArray` (per-try-block catch type info).
### Sylpheed yield
- 2,588 FuncInfo records (all version 0x19930522).
- 10,019 unwind-map entries.
- 315 try-blocks across the binary.
## Layer M11.5 — Static-init driver chain detection (landed)
### Schema additions
- Reuses existing `function_pointer_arrays` table — drivers' arrays are
emitted with `kind='static_init'`, replacing M11's prologue-heuristic
output where the structurally-grounded pattern fires.
### What this layer does
- Walks every detected function looking for the canonical `_initterm`-
style loop: `lwz cursor; mtctr; bcctrl; addi cursor, cursor, 4`
bounded by a comparison against another constant register.
- Extracts `(array_start, array_end)` from the cursor's initial
constant value and the end-comparand register.
- Reads the array, validates each entry against
`func_analysis.functions`, and emits the array as `static_init`.
### What this layer does NOT do
- Doesn't handle drivers with multiple back-to-back trampoline loops.
- Doesn't follow `_initterm_e` return-status semantics — both
`_initterm` and `_initterm_e` match if the loop body matches.
### Sylpheed yield
- 0 drivers detected. Sylpheed's static-init structure does not match
the canonical CRT loop pattern; the binary likely calls ctors via
another mechanism (inline at the entry point, or via a different
driver shape). Infrastructure ready for any binary with the
documented MSVC pattern.
## Layer VMX — Vector-store xrefs (M6 follow-up, landed)
Extends the M6 X-form opcode-31 dispatch in `xref.rs` with AltiVec/VMX
vector loads and stores. New entries (XO codes):
- `lvx` (103), `lvxl` (359), `lvebx` (7), `lvehx` (39), `lvewx` (71)
— `addr_mode='x_form_indexed'`, `kind='read'`.
- `stvx` (231), `stvxl` (487), `stvebx` (135), `stvehx` (167),
`stvewx` (199) — `addr_mode='x_form_indexed'`, `kind='write'`.
Same constraint as M6: rows emitted only when both `rA` and `rB`
resolve to known constants (rare but useful).
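
The XO dispatch can be sketched as a decoder over the raw instruction word. `VmxAccess` and `classify_vmx_xform` are hypothetical names; the XO values are the ones listed above, and the field positions are the standard X-form layout (primary opcode in the top 6 bits, 10-bit XO above the low bit).

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum VmxAccess {
    Read,
    Write,
}

/// Classify an opcode-31 X-form VMX load/store.
/// Returns `(access, rA, rB)` or `None` for any other instruction.
pub fn classify_vmx_xform(insn: u32) -> Option<(VmxAccess, u8, u8)> {
    if insn >> 26 != 31 {
        return None;
    }
    let ra = ((insn >> 16) & 0x1F) as u8;
    let rb = ((insn >> 11) & 0x1F) as u8;
    let xo = (insn >> 1) & 0x3FF;
    let access = match xo {
        // lvx, lvxl, lvebx, lvehx, lvewx
        103 | 359 | 7 | 39 | 71 => VmxAccess::Read,
        // stvx, stvxl, stvebx, stvehx, stvewx
        231 | 487 | 135 | 167 | 199 => VmxAccess::Write,
        _ => return None,
    };
    Some((access, ra, rb))
}
```

The caller would still need both `rA` and `rB` to be tracked constants before emitting a row.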
### Sylpheed yield
- 110 `stvx` writes newly resolved.
## Layer SJIS+UTF-8 — Localised-string detection (M7 follow-up, landed)
Extends `xenia_analysis::strings::analyze` with two additional scanners.
### Shift_JIS detection
Per JIS X 0208: lead byte ∈ [0x81, 0x9F] ∪ [0xE0, 0xEF];
trail byte ∈ [0x40, 0x7E] ∪ [0x80, 0xFC]. Single-byte ASCII and JIS
half-width katakana (0xA1..=0xDF) are passed through. At least one
multi-byte pair must be present (so we don't double-count pure ASCII).
SJIS bytes are rendered as `\\xHH` escapes in the `content` column for
diagnostic readability — full SJIS→UTF-8 decoding is a future
enhancement.
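
The byte-range rules above can be sketched as a validator over one candidate run. `is_sjis_run` is a hypothetical name, and treating printable ASCII as `0x20..=0x7E` is an assumption; the sketch neither decodes nor escapes the bytes.

```rust
fn is_lead(b: u8) -> bool {
    matches!(b, 0x81..=0x9F | 0xE0..=0xEF)
}

fn is_trail(b: u8) -> bool {
    matches!(b, 0x40..=0x7E | 0x80..=0xFC)
}

/// Printable ASCII or JIS half-width katakana.
fn is_single(b: u8) -> bool {
    matches!(b, 0x20..=0x7E | 0xA1..=0xDF)
}

/// True when `bytes` is a well-formed Shift_JIS run containing at least one
/// two-byte pair (pure-ASCII runs are left to the ASCII scanner).
pub fn is_sjis_run(bytes: &[u8]) -> bool {
    let mut pairs = 0usize;
    let mut i = 0usize;
    while i < bytes.len() {
        let b = bytes[i];
        if is_lead(b) {
            match bytes.get(i + 1) {
                Some(&t) if is_trail(t) => {
                    pairs += 1;
                    i += 2;
                }
                _ => return false, // truncated or invalid trail byte
            }
        } else if is_single(b) {
            i += 1;
        } else {
            return false;
        }
    }
    pairs > 0
}
```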
### UTF-8 detection
Validates 2-byte (`110xxxxx 10xxxxxx`) and 3-byte
(`1110xxxx 10xxxxxx 10xxxxxx`) sequences plus printable ASCII. Skips
4-byte (supplementary plane) which is rare in game text.
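
A sketch of the pattern check follows. `utf8_multibyte_count` is a hypothetical name; it validates only the bit patterns named above (so overlong encodings are not rejected) and returns the number of multi-byte sequences so a caller can decide whether a run counts as UTF-8 rather than plain ASCII.

```rust
/// Validate a run of printable ASCII plus 2- and 3-byte UTF-8 sequences.
/// Returns the number of multi-byte sequences, or `None` if any byte does not
/// fit (including 4-byte supplementary-plane sequences, which are skipped).
pub fn utf8_multibyte_count(bytes: &[u8]) -> Option<usize> {
    // Continuation bytes have the form 10xxxxxx.
    let is_cont = |b: Option<&u8>| matches!(b, Some(&c) if (c & 0xC0) == 0x80);
    let mut i = 0usize;
    let mut multibyte = 0usize;
    while i < bytes.len() {
        let b = bytes[i];
        if (0x20..=0x7E).contains(&b) {
            i += 1;
        } else if (b & 0xE0) == 0xC0 && is_cont(bytes.get(i + 1)) {
            // 110xxxxx 10xxxxxx
            multibyte += 1;
            i += 2;
        } else if (b & 0xF0) == 0xE0 && is_cont(bytes.get(i + 1)) && is_cont(bytes.get(i + 2)) {
            // 1110xxxx 10xxxxxx 10xxxxxx
            multibyte += 1;
            i += 3;
        } else {
            return None;
        }
    }
    Some(multibyte)
}
```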
### Sylpheed yield
- 790 Shift_JIS strings (Japanese debug + UI text, including
`[WARNING] ードに割り当てるエフェクトIDの指定がない ノードデータが見つからない` style mission strings).
- 39 UTF-8 strings.
- 6,311 ASCII strings (unchanged from M7).
## Forward work (not yet landed)
- **M9.6** — link `eh_funcinfo` records back to their owning functions
via `bl __CxxFrameHandler` registration sites + per-try-block
`pHandlerArray` parsing.
- **M11.6** — relax M11.5 to detect non-canonical static-init driver
shapes (`_initterm_e` with status return, custom drivers).
- Full SJIS → UTF-8 decoding in the `strings.content` column.
- VMX128 (opcode 4) vector-store xrefs — separate encoding space, low
ROI; document if Sylpheed's renderer cluster uses it.


@@ -0,0 +1,277 @@
//! MSVC C++ name demangling for Xbox 360 binaries.
//!
//! Wraps [`msvc_demangler::demangle`] (a Rust port of LLVM's
//! `MicrosoftDemangle.cpp`) and splits the resulting human-readable string
//! into structured fields (namespace path, class name, method name, params
//! signature) for storage in the `demangled_names` DB table.
//!
//! The structured split is heuristic — it operates on the formatted output,
//! not the parsed AST. This is good enough for typical RTTI strings of the
//! form `?AVClassName@Namespace@@` and standard member functions; exotic
//! template / lambda forms degrade gracefully (the structured fields end up
//! `None` while `raw_demangled` retains the full LLVM-style output).
//!
//! Reference: <https://docs.rs/msvc-demangler> (LLVM `MicrosoftDemangle.cpp` port).
use msvc_demangler::DemangleFlags;
/// Structured view of one demangled MSVC symbol.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Demangled {
/// Original mangled string.
pub mangled: String,
/// Full LLVM-style demangled output (e.g. `xe::apu::AudioSystem::Setup(void)`).
pub raw_demangled: String,
/// `::`-joined namespace path leading up to the class, e.g. `xe::apu`. None
/// when the symbol is at global scope.
pub namespace_path: Option<String>,
/// Class name for member functions, e.g. `AudioSystem`. None when the
/// symbol is a free function.
pub class_name: Option<String>,
/// Method or free-function name, e.g. `Setup`. None when the heuristic
/// could not separate the name from the rest of the demangled string.
pub method_name: Option<String>,
/// Parameter signature without the surrounding parens, e.g. `void` or
/// `int, char *`. None when not a function or no `(...)` was found.
pub params_signature: Option<String>,
}
/// Demangle one mangled MSVC C++ symbol. Returns `None` if the input does not
/// start with `?` (early-out for non-mangled names) OR if the underlying
/// demangler fails to parse it. Callers that want a "best effort" record
/// (NULL fields + raw=mangled) should use [`demangle_or_raw`] instead.
pub fn demangle(mangled: &str) -> Option<Demangled> {
if !mangled.starts_with('?') {
return None;
}
let raw = msvc_demangler::demangle(mangled, DemangleFlags::llvm()).ok()?;
Some(split_structured(mangled.to_string(), raw))
}
/// Demangle, or fall back to a record that just carries the original mangled
/// string in `raw_demangled` and leaves all structured fields `None`. Useful
/// for DB insert paths that want one row per mangled input regardless of
/// parser success.
pub fn demangle_or_raw(mangled: &str) -> Demangled {
if let Some(d) = demangle(mangled) {
return d;
}
Demangled {
mangled: mangled.to_string(),
raw_demangled: mangled.to_string(),
namespace_path: None,
class_name: None,
method_name: None,
params_signature: None,
}
}
/// Split a fully-formatted demangled string into structured fields.
///
/// Strategy:
/// 1. Find the first un-nested `(` — everything before it is the qualified
/// name; everything inside the matching parens is `params_signature`.
/// 2. Strip leading return-type tokens before the qualified name (everything
/// up to the LAST whitespace not inside `<...>` or `(...)` brackets).
/// 3. Split the qualified name on `::` (top-level only) — last segment is
/// `method_name`, second-to-last is `class_name`, the rest joined back
/// with `::` is `namespace_path`.
fn split_structured(mangled: String, raw: String) -> Demangled {
let raw_view = raw.as_str();
let (qualified_name, params) = match find_paren_split(raw_view) {
Some((before, inside)) => (before.trim_end().to_string(), Some(inside.to_string())),
None => (raw_view.to_string(), None),
};
// Drop any return-type prefix: keep everything after the last top-level
// whitespace boundary (where "top-level" means depth-0 in <...>/(...)).
let qname_clean = strip_return_type_prefix(&qualified_name);
let (namespace_path, class_name, method_name) = split_qname(&qname_clean);
Demangled {
mangled,
raw_demangled: raw,
namespace_path,
class_name,
method_name,
params_signature: params,
}
}
/// Returns `(text_before_paren, text_inside_outer_parens)` for the first
/// top-level `(` in `s`. Returns `None` when no top-level paren is present.
fn find_paren_split(s: &str) -> Option<(&str, &str)> {
let bytes = s.as_bytes();
let mut depth_angle: i32 = 0;
for (i, &b) in bytes.iter().enumerate() {
match b {
b'<' => depth_angle += 1,
b'>' if depth_angle > 0 => depth_angle -= 1,
b'(' if depth_angle == 0 => {
// Find matching close at depth 0 on parens.
let mut depth_paren = 1i32;
let mut depth_angle2 = 0i32;
for (j, &b2) in bytes.iter().enumerate().skip(i + 1) {
match b2 {
b'<' => depth_angle2 += 1,
b'>' if depth_angle2 > 0 => depth_angle2 -= 1,
b'(' => depth_paren += 1,
b')' => {
depth_paren -= 1;
if depth_paren == 0 {
return Some((&s[..i], &s[i + 1..j]));
}
}
_ => {}
}
}
return None;
}
_ => {}
}
}
None
}
/// Strip a leading return-type token (everything up to and including the
/// last top-level whitespace). E.g. `void __cdecl Foo::Bar` → `Foo::Bar`.
fn strip_return_type_prefix(s: &str) -> String {
let bytes = s.as_bytes();
let mut depth_angle: i32 = 0;
let mut depth_paren: i32 = 0;
let mut last_ws_at: Option<usize> = None;
for (i, &b) in bytes.iter().enumerate() {
match b {
b'<' => depth_angle += 1,
b'>' if depth_angle > 0 => depth_angle -= 1,
b'(' => depth_paren += 1,
b')' if depth_paren > 0 => depth_paren -= 1,
b' ' if depth_angle == 0 && depth_paren == 0 => last_ws_at = Some(i),
_ => {}
}
}
match last_ws_at {
Some(i) => s[i + 1..].to_string(),
None => s.to_string(),
}
}
/// Split a fully-qualified name on top-level `::` and tag the parts.
fn split_qname(qname: &str) -> (Option<String>, Option<String>, Option<String>) {
if qname.is_empty() {
return (None, None, None);
}
let parts = top_level_split_colon_colon(qname);
match parts.len() {
0 => (None, None, None),
1 => (None, None, Some(parts[0].clone())),
2 => (None, Some(parts[0].clone()), Some(parts[1].clone())),
_ => {
let n = parts.len();
let method = parts[n - 1].clone();
let class = parts[n - 2].clone();
let ns = parts[..n - 2].join("::");
(Some(ns), Some(class), Some(method))
}
}
}
/// Split on top-level `::` — `::` inside `<...>` or `(...)` is preserved.
fn top_level_split_colon_colon(s: &str) -> Vec<String> {
let bytes = s.as_bytes();
let mut depth_angle: i32 = 0;
let mut depth_paren: i32 = 0;
let mut out: Vec<String> = Vec::new();
let mut start = 0usize;
let mut i = 0usize;
while i < bytes.len() {
let b = bytes[i];
match b {
b'<' => depth_angle += 1,
b'>' if depth_angle > 0 => depth_angle -= 1,
b'(' => depth_paren += 1,
b')' if depth_paren > 0 => depth_paren -= 1,
b':' if depth_angle == 0
&& depth_paren == 0
&& i + 1 < bytes.len()
&& bytes[i + 1] == b':' =>
{
out.push(s[start..i].to_string());
start = i + 2;
i += 2;
continue;
}
_ => {}
}
i += 1;
}
out.push(s[start..].to_string());
out.into_iter().filter(|p| !p.is_empty()).collect()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn early_out_on_non_mangled() {
assert!(demangle("plain_c_name").is_none());
assert!(demangle("Foo::Bar").is_none());
}
#[test]
fn demangle_or_raw_records_failures() {
let d = demangle_or_raw("not_mangled");
assert_eq!(d.mangled, "not_mangled");
assert_eq!(d.raw_demangled, "not_mangled");
assert!(d.method_name.is_none());
}
#[test]
fn simple_member_function() {
// ?Setup@AudioSystem@apu@xe@@QEAAXXZ → public: void __cdecl xe::apu::AudioSystem::Setup(void)
let d = demangle("?Setup@AudioSystem@apu@xe@@QEAAXXZ").expect("should parse");
assert_eq!(d.method_name.as_deref(), Some("Setup"));
assert_eq!(d.class_name.as_deref(), Some("AudioSystem"));
assert_eq!(d.namespace_path.as_deref(), Some("xe::apu"));
assert_eq!(d.params_signature.as_deref(), Some("void"));
}
#[test]
fn rtti_type_descriptor_string() {
// RTTI TypeDescriptor mangled name format: ".?AVClassName@@" → "class ClassName".
// We strip the leading "." and call demangle on the "?AV…" part below in M3.
// For now confirm the demangler handles the minimal class form.
let d = demangle("?AVAudioSystem@apu@xe@@").expect("should parse");
assert!(
d.raw_demangled.contains("AudioSystem"),
"raw='{}'",
d.raw_demangled
);
}
#[test]
fn split_qname_handles_namespace_chain() {
let (ns, cls, m) = split_qname("a::b::c::Klass::method");
assert_eq!(ns.as_deref(), Some("a::b::c"));
assert_eq!(cls.as_deref(), Some("Klass"));
assert_eq!(m.as_deref(), Some("method"));
}
#[test]
fn paren_split_handles_template_in_args() {
// Templates inside the param list must not confuse paren matching.
let s = "void __cdecl Foo::Bar(std::vector<int>, std::map<a, b>)";
let (before, inside) = find_paren_split(s).expect("paren found");
assert_eq!(before, "void __cdecl Foo::Bar");
assert_eq!(inside, "std::vector<int>, std::map<a, b>");
}
#[test]
fn double_colon_inside_template_not_split() {
let parts = top_level_split_colon_colon("a::b<c::d>::e");
assert_eq!(parts, vec!["a", "b<c::d>", "e"]);
}
}


@@ -0,0 +1,51 @@
//! Analysis-side enrichment over [`xenia_cpu::disasm::iter_disasm`].
//!
//! Turns a stream of decoder-only [`xenia_cpu::disasm::DisasmItem`]s into a
//! stream of [`RichDisasmItem`]s carrying section name + enclosing function +
//! label name. The three sinks in [`crate::sinks`] (text, JSON, DuckDB) all
//! consume `RichDisasmItem`.
use std::collections::HashMap;
use xenia_cpu::disasm::DisasmItem;
use crate::func::FuncAnalysis;
/// `DisasmItem` plus the analysis context (section/function/label).
#[derive(Debug, Clone)]
pub struct RichDisasmItem<'a> {
pub item: DisasmItem,
pub section: &'a str,
pub function: Option<u32>,
pub label: Option<&'a str>,
}
/// Walk one code section, yielding rich items annotated with section name,
/// rolling-window enclosing function, and label-at-address.
///
/// The `function` field tracks the most recent function-start the iterator
/// has crossed — matching the legacy `current_func` behaviour in
/// `db.rs::insert_instructions_streaming`.
pub fn enrich_section<'a>(
image: &'a [u8],
image_base: u32,
section_name: &'a str,
va_start: u32,
va_end: u32,
func_analysis: &'a FuncAnalysis,
labels: &'a HashMap<u32, String>,
) -> impl Iterator<Item = RichDisasmItem<'a>> + 'a {
let mut current_func: Option<u32> = None;
xenia_cpu::disasm::iter_disasm(image, image_base, va_start, va_end).map(move |item| {
if func_analysis.is_function_start(item.addr) {
current_func = Some(item.addr);
}
let label = labels.get(&item.addr).map(|s| s.as_str());
RichDisasmItem {
item,
section: section_name,
function: current_func,
label,
}
})
}


@@ -0,0 +1,296 @@
//! M9.5 — MSVC `__CxxFrameHandler` scope-table parsing.
//!
//! When MSVC compiles C++ try/catch on Win32 PowerPC, the compiler emits
//! per-function `FuncInfo` records in `.rdata` containing the scope-state
//! tables that `__CxxFrameHandler` walks during unwinding. Each record
//! starts with one of the documented magic numbers:
//!
//! - `0x19930520` — original FuncInfo (no aligned-state-array)
//! - `0x19930521` — adds `pESTypeList` field
//! - `0x19930522` — adds `EHFlags` field
//!
//! Layout (4-byte little-endian on x86; **on Xbox 360 PowerPC PE the
//! struct is big-endian** because the binary is BE throughout):
//!
//! ```text
//! +0x00 uint32 magicNumber (one of 0x199305{20,21,22})
//! +0x04 int32 maxState (number of UnwindMapEntry rows)
//! +0x08 uint32 pUnwindMap (VA → UnwindMapEntry[])
//! +0x0C uint32 nTryBlocks
//! +0x10 uint32 pTryBlockMap (VA → TryBlockMapEntry[])
//! +0x14 uint32 nIPMapEntries (ignored on x86; present on PPC)
//! +0x18 uint32 pIPtoStateMap (VA → IPtoStateMapEntry[])
//! +0x1C uint32 pESTypeList (only when magic ≥ 0x19930521)
//! +0x20 uint32 EHFlags (only when magic = 0x19930522)
//! ```
//!
//! Each `UnwindMapEntry` is 8 bytes: `(toState i32, action u32)`.
//! Each `TryBlockMapEntry` is 20 bytes:
//! `(tryLow i32, tryHigh i32, catchHigh i32, nCatches u32, pHandlerArray u32)`.
//!
//! ### What this module does
//!
//! - Magic-scan `.rdata` for the three FuncInfo signatures (read as BE u32).
//! - Parse the FuncInfo record + walk the unwind map and try-block map.
//! - Skip records whose internal pointers don't land in valid sections,
//! or whose lengths exceed sane caps.
//!
//! ### What this module does NOT do
//!
//! - Does not associate a FuncInfo back to its owning function. The
//! `bl __CxxFrameHandler` registration would name that linkage, but
//! it requires walking all `has_eh=true` functions' prologues; a
//! future M9.6 can do that. For now the FuncInfo record stands on its
//! own — joins to `functions` by best-effort PC range queries.
//! - Does not parse the `pHandlerArray` per try-block (catch type info).
//!
//! Reference: LLVM `llvm/lib/CodeGen/AsmPrinter/WinException.cpp`,
//! openrce.org "Reversing Microsoft Visual C++" documentation on FuncInfo.
use xenia_xex::pe::PeSection;
const MAGIC_OLD: u32 = 0x1993_0520;
const MAGIC_V21: u32 = 0x1993_0521;
const MAGIC_V22: u32 = 0x1993_0522;
#[derive(Debug, Clone, Copy)]
pub struct UnwindMapEntry {
pub to_state: i32,
pub action_pc: u32, // VA of the cleanup action; 0 if none
}
#[derive(Debug, Clone, Copy)]
pub struct TryBlockMapEntry {
pub try_low: i32,
pub try_high: i32,
pub catch_high: i32,
pub n_catches: u32,
pub p_handler_array: u32,
}
#[derive(Debug, Clone)]
pub struct EhFuncInfo {
pub address: u32, // VA of the FuncInfo record itself
pub magic: u32,
pub max_state: i32,
pub p_unwind_map: u32,
pub n_try_blocks: u32,
pub p_try_block_map: u32,
pub n_ip_map_entries: u32,
pub p_ip_to_state_map: u32,
pub p_es_type_list: Option<u32>,
pub eh_flags: Option<u32>,
pub unwind_map: Vec<UnwindMapEntry>,
pub try_blocks: Vec<TryBlockMapEntry>,
}
#[tracing::instrument(skip_all, fields(image_base = format_args!("{:#010x}", image_base)))]
pub fn analyze(
pe: &[u8],
image_base: u32,
sections: &[PeSection],
) -> Vec<EhFuncInfo> {
let started = std::time::Instant::now();
let mut out: Vec<EhFuncInfo> = Vec::new();
// Compute the union of valid VA ranges across all sections — used to
// sanity-check internal pointers in the FuncInfo records.
let valid_ranges: Vec<(u32, u32)> = sections.iter()
.map(|s| (image_base + s.virtual_address,
image_base + s.virtual_address + s.virtual_size))
.collect();
let in_valid = |va: u32| valid_ranges.iter().any(|(lo, hi)| va >= *lo && va < *hi);
let read_u32 = |abs: u32| -> Option<u32> {
let off = abs.wrapping_sub(image_base) as usize;
if off + 4 > pe.len() { return None; }
Some(u32::from_be_bytes([pe[off], pe[off + 1], pe[off + 2], pe[off + 3]]))
};
let read_i32 = |abs: u32| -> Option<i32> { read_u32(abs).map(|u| u as i32) };
for section in sections {
if section.name != ".rdata" { continue; }
let raw_start = section.virtual_address as usize;
let raw_end = (section.virtual_address + section.virtual_size) as usize;
if raw_end > pe.len() { continue; }
let bytes = &pe[raw_start..raw_end.min(pe.len())];
let va_base = image_base + section.virtual_address;
// Walk on 4-byte alignment looking for the magic.
let mut i = 0;
while i + 4 <= bytes.len() {
if !i.is_multiple_of(4) { i += 1; continue; }
let m = u32::from_be_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]]);
if m == MAGIC_OLD || m == MAGIC_V21 || m == MAGIC_V22 {
let addr = va_base + i as u32;
if let Some(rec) = parse_funcinfo(addr, m, &read_u32, &read_i32, &in_valid) {
out.push(rec);
}
}
i += 4;
}
}
let elapsed_ms = started.elapsed().as_millis() as f64;
let n_unwind: usize = out.iter().map(|r| r.unwind_map.len()).sum();
let n_try: usize = out.iter().map(|r| r.try_blocks.len()).sum();
metrics::histogram!("analysis.phase_ms", "phase" => "eh_scope").record(elapsed_ms);
tracing::info!(
records = out.len(),
unwind_entries = n_unwind,
try_blocks = n_try,
elapsed_ms,
"M9.5 EH scope-table scan complete",
);
out
}
fn parse_funcinfo(
addr: u32,
magic: u32,
read_u32: &impl Fn(u32) -> Option<u32>,
read_i32: &impl Fn(u32) -> Option<i32>,
in_valid: &impl Fn(u32) -> bool,
) -> Option<EhFuncInfo> {
let max_state = read_i32(addr + 0x04)?;
let p_unwind_map = read_u32(addr + 0x08)?;
let n_try_blocks = read_u32(addr + 0x0C)?;
let p_try_block_map = read_u32(addr + 0x10)?;
let n_ip_map_entries = read_u32(addr + 0x14)?;
let p_ip_to_state_map = read_u32(addr + 0x18)?;
// Sanity caps: real FuncInfo records have max_state ≤ a few thousand,
// n_try_blocks ≤ a few hundred. Reject candidates where arbitrary data
// merely happens to equal the magic.
if !(0..=10_000).contains(&max_state) { return None; }
if n_try_blocks > 1_000 { return None; }
if n_ip_map_entries > 100_000 { return None; }
// Pointers must either be NULL or land in a valid section.
if p_unwind_map != 0 && !in_valid(p_unwind_map) { return None; }
if p_try_block_map != 0 && !in_valid(p_try_block_map) { return None; }
if p_ip_to_state_map != 0 && !in_valid(p_ip_to_state_map) { return None; }
let (p_es_type_list, eh_flags) = if magic == MAGIC_V21 {
(read_u32(addr + 0x1C), None)
} else if magic == MAGIC_V22 {
(read_u32(addr + 0x1C), read_u32(addr + 0x20))
} else {
(None, None)
};
// Walk unwind map (8-byte entries).
let mut unwind_map: Vec<UnwindMapEntry> = Vec::with_capacity(max_state as usize);
if p_unwind_map != 0 && max_state > 0 {
for i in 0..max_state {
let p = p_unwind_map.wrapping_add((i * 8) as u32);
let to_state = read_i32(p)?;
let action_pc = read_u32(p + 4)?;
unwind_map.push(UnwindMapEntry { to_state, action_pc });
}
}
// Walk try-block map (20-byte entries).
let mut try_blocks: Vec<TryBlockMapEntry> = Vec::with_capacity(n_try_blocks as usize);
if p_try_block_map != 0 && n_try_blocks > 0 {
for i in 0..n_try_blocks {
let p = p_try_block_map.wrapping_add(i * 20);
let try_low = read_i32(p)?;
let try_high = read_i32(p + 4)?;
let catch_high = read_i32(p + 8)?;
let n_catches = read_u32(p + 12)?;
let p_handler_a = read_u32(p + 16)?;
try_blocks.push(TryBlockMapEntry {
try_low, try_high, catch_high, n_catches, p_handler_array: p_handler_a,
});
}
}
Some(EhFuncInfo {
address: addr,
magic,
max_state,
p_unwind_map,
n_try_blocks,
p_try_block_map,
n_ip_map_entries,
p_ip_to_state_map,
p_es_type_list,
eh_flags,
unwind_map,
try_blocks,
})
}
#[cfg(test)]
mod tests {
use super::*;
use xenia_xex::pe::PeSection;
fn mk_section(name: &str, va: u32, size: u32) -> PeSection {
PeSection {
name: name.into(),
virtual_address: va, virtual_size: size,
raw_offset: va, raw_size: size,
flags: 0x4000_0040,
}
}
fn write_be(pe: &mut [u8], at: usize, v: u32) {
pe[at..at + 4].copy_from_slice(&v.to_be_bytes());
}
fn write_be_i32(pe: &mut [u8], at: usize, v: i32) {
pe[at..at + 4].copy_from_slice(&v.to_be_bytes());
}
#[test]
fn parses_minimal_funcinfo_v0() {
let image_base = 0x82000000u32;
let rdata_va = 0x1000u32;
let mut pe = vec![0u8; 0x4000];
// FuncInfo at .rdata + 0x10.
let fi_off = (rdata_va + 0x10) as usize;
let fi_va = image_base + rdata_va + 0x10;
let unwind_off = (rdata_va + 0x80) as usize;
let unwind_va = image_base + rdata_va + 0x80;
write_be(&mut pe, fi_off, MAGIC_OLD); // magic
write_be_i32(&mut pe, fi_off + 4, 2); // maxState
write_be(&mut pe, fi_off + 8, unwind_va); // pUnwindMap
write_be(&mut pe, fi_off + 12, 0); // nTryBlocks
write_be(&mut pe, fi_off + 16, 0); // pTryBlockMap
write_be(&mut pe, fi_off + 20, 0); // nIPMapEntries
write_be(&mut pe, fi_off + 24, 0); // pIPtoStateMap
// Two unwind entries.
write_be_i32(&mut pe, unwind_off, -1); // to_state
write_be(&mut pe, unwind_off + 4, image_base + 0x500); // action_pc
write_be_i32(&mut pe, unwind_off + 8, 0);
write_be(&mut pe, unwind_off + 12, image_base + 0x600);
let sections = vec![mk_section(".rdata", rdata_va, 0x100)];
let recs = analyze(&pe, image_base, &sections);
assert_eq!(recs.len(), 1);
let r = &recs[0];
assert_eq!(r.address, fi_va);
assert_eq!(r.magic, MAGIC_OLD);
assert_eq!(r.max_state, 2);
assert_eq!(r.unwind_map.len(), 2);
assert_eq!(r.unwind_map[0].to_state, -1);
assert_eq!(r.unwind_map[0].action_pc, image_base + 0x500);
assert_eq!(r.try_blocks.len(), 0);
}
#[test]
fn rejects_bogus_max_state() {
let image_base = 0x82000000u32;
let rdata_va = 0x1000u32;
let mut pe = vec![0u8; 0x4000];
let fi_off = (rdata_va + 0x10) as usize;
write_be(&mut pe, fi_off, MAGIC_OLD);
write_be_i32(&mut pe, fi_off + 4, 0xFFFF); // bogus maxState
let sections = vec![mk_section(".rdata", rdata_va, 0x100)];
let recs = analyze(&pe, image_base, &sections);
assert_eq!(recs.len(), 0);
}
}
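
The header checks in `parse_funcinfo` can be sketched in isolation. This is a minimal sketch, not the module's code: `Header`, `be_u32`, and `decode_header` are hypothetical names, only the first four header words are decoded, and the pointer-range check against section bounds is left out.

```rust
// Hypothetical, simplified mirror of the header checks in `parse_funcinfo`.
const MAGIC_OLD: u32 = 0x1993_0520;
const MAGIC_V21: u32 = 0x1993_0521;
const MAGIC_V22: u32 = 0x1993_0522;

/// Header fields only; the unwind and try-block tables are not followed.
#[derive(Debug, PartialEq)]
struct Header {
    magic: u32,
    max_state: i32,
    p_unwind_map: u32,
    n_try_blocks: u32,
}

/// Read a big-endian u32 at `off`, or None when the read would run past `buf`.
fn be_u32(buf: &[u8], off: usize) -> Option<u32> {
    let bytes = buf.get(off..off.checked_add(4)?)?;
    Some(u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Decode a candidate header at `off`, applying the magic check and the
/// same caps as the scan (max_state in 0..=10_000, n_try_blocks <= 1_000).
fn decode_header(buf: &[u8], off: usize) -> Option<Header> {
    let magic = be_u32(buf, off)?;
    if ![MAGIC_OLD, MAGIC_V21, MAGIC_V22].contains(&magic) {
        return None;
    }
    let max_state = be_u32(buf, off + 4)? as i32;
    let p_unwind_map = be_u32(buf, off + 8)?;
    let n_try_blocks = be_u32(buf, off + 12)?;
    if !(0..=10_000).contains(&max_state) || n_try_blocks > 1_000 {
        return None;
    }
    Some(Header { magic, max_state, p_unwind_map, n_try_blocks })
}
```

Returning `Option` at every read keeps a truncated or aliased record from producing a partial result, which is the same choice the scan makes with `?`.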


@@ -6,8 +6,10 @@ use std::io::Write;
use xenia_xex::header::ImportLibrary;
use xenia_xex::pe::PeSection;
use crate::disasm::enrich_section;
use crate::func::FuncAnalysis;
use crate::xref::{XrefKind, Xref, XrefMap, section_for_addr, resolve_source_label};
use crate::sinks::text::write_instr_line;
use crate::xref::{XrefKind, Xref, XrefMap, resolve_source_label};
/// Metadata passed to the formatter (avoids exposing full Xex2Header internals).
pub struct DisasmInfo<'a> {
@@ -88,11 +90,14 @@ pub fn write_asm(
writeln!(out)?;
let mut in_function = false;
let mut addr = va_start;
while addr < va_end {
let abs_addr = info.image_base + addr;
let off = (addr - va_start) as usize + file_start;
if off + 4 > pe.len() { break; }
let abs_start = info.image_base + va_start;
let abs_end = info.image_base + va_end;
let items = enrich_section(
pe, info.image_base, &section.name, abs_start, abs_end, func_analysis, labels,
);
for ri in items {
let abs_addr = ri.item.addr;
// Function start? Emit separator + header
if let Some(fi) = func_analysis.get(abs_addr) {
@@ -126,7 +131,6 @@ pub fn write_asm(
writeln!(out, "; FUNCTION: {lbl}{detail_str}")?;
}
// Xrefs for function entry
if let Some(xref_lines) = format_xrefs(abs_addr, xrefs, func_analysis, labels) {
for line in &xref_lines {
writeln!(out, "{line}")?;
@@ -141,7 +145,6 @@ pub fn write_asm(
if let Some(lbl) = labels.get(&abs_addr) {
if !func_analysis.is_function_start(abs_addr) {
writeln!(out)?;
// Xrefs for local labels
if let Some(xref_lines) = format_xrefs(abs_addr, xrefs, func_analysis, labels) {
for line in &xref_lines {
writeln!(out, "{line}")?;
@@ -159,37 +162,8 @@ pub fn write_asm(
writeln!(out, " ; IMPORT: {imp_name}")?;
}
let instr = u32::from_be_bytes([
pe[off], pe[off+1], pe[off+2], pe[off+3]
]);
let decoded = crate::ppc::disasm(instr, abs_addr);
let disasm_text = decoded.display().to_string();
// Annotate branch targets with label names
let mut annotated = annotate_branch(&disasm_text, labels);
// Annotate data references
if let Some(&(data_addr, kind)) = data_annotations.get(&abs_addr) {
let tag = match kind {
XrefKind::DataRead => "[R]",
XrefKind::DataWrite => "[W]",
_ => "[&]",
};
let sec = section_for_addr(data_addr, info.sections, info.image_base)
.unwrap_or("?");
let data_lbl = labels.get(&data_addr)
.map(|s| format!(" = {s}"))
.unwrap_or_default();
if !annotated.contains("; ->") {
annotated = format!("{annotated:<40} ; {tag} 0x{data_addr:08X} ({sec}){data_lbl}");
} else {
annotated = format!("{annotated} {tag} 0x{data_addr:08X} ({sec}){data_lbl}");
}
}
writeln!(out, " {:08X}: {:08X} {}", abs_addr, instr, annotated)?;
addr += 4;
let data_annot = data_annotations.get(&abs_addr).copied();
write_instr_line(out, &ri, labels, info.sections, info.image_base, data_annot)?;
}
if in_function {
writeln!(out, "; end function")?;
@@ -298,21 +272,3 @@ fn format_xrefs(
Some(lines)
}
fn annotate_branch(disasm: &str, labels: &HashMap<u32, String>) -> String {
if let Some(pos) = disasm.find("0x") {
let hex_start = pos + 2;
let hex_end = disasm[hex_start..].find(|c: char| !c.is_ascii_hexdigit())
.map(|i| hex_start + i)
.unwrap_or(disasm.len());
let hex_str = &disasm[hex_start..hex_end];
if hex_str.len() == 8 {
if let Ok(addr) = u32::from_str_radix(hex_str, 16) {
if let Some(lbl) = labels.get(&addr) {
return format!("{disasm:<40} ; -> {lbl}");
}
}
}
}
disasm.to_string()
}


@@ -32,6 +32,17 @@ pub struct FuncInfo {
pub is_leaf: bool,
/// True if this is a save/restore GPR helper stub.
pub is_saverestore: bool,
/// True if `.pdata` has a RUNTIME_FUNCTION whose `BeginAddress` matches `start`.
/// Authoritative ground truth from the linker; rows without this flag are
/// prologue-detected only and may carry boundary errors.
pub pdata_validated: bool,
/// Function size in bytes per `.pdata`'s `function_length` field, if known.
/// Absent (None) when this row is prologue-only.
pub pdata_length: Option<u32>,
/// True when `.pdata`'s exception-flag bit is set on this entry — the
/// function has a registered C++ EH (or SEH) frame handler. Always false
/// for entries without `.pdata` coverage. (M9)
pub has_eh: bool,
}
/// Result of the function analysis pass.
@@ -42,6 +53,9 @@ pub struct FuncAnalysis {
pub save_gpr_base: Option<u32>,
/// Addresses in the restore-GPR region (start of __restgprlr block).
pub restore_gpr_base: Option<u32>,
/// Raw `.pdata` entries from the binary, in original order. Empty when no
/// `.pdata` was supplied. Mirrored into the DB as `pdata_entries`.
pub pdata_entries: Vec<xenia_xex::pdata::PdataEntry>,
}
// ── Instruction field helpers ──────────────────────────────────────────────
@@ -184,12 +198,37 @@ fn find_saverestore_stubs(
// ── Main analysis ──────────────────────────────────────────────────────────
#[tracing::instrument(skip_all, fields(image_base = format_args!("{:#010x}", image_base), entry_point = format_args!("{:#010x}", entry_point)))]
pub fn analyze(
pe: &[u8],
image_base: u32,
entry_point: u32,
code_sections: &[(u32, u32, u32)], // (va_start, va_size, flags)
) -> FuncAnalysis {
analyze_with_pdata(pe, image_base, entry_point, code_sections, &[])
}
/// Same as [`analyze`] but also unions `.pdata` `RUNTIME_FUNCTION` entries
/// into the candidate set. Each surviving function whose start matches a
/// pdata `BeginAddress` carries `pdata_validated` and the linker-supplied
/// `pdata_length`; if that length reaches past the end found by the
/// prologue walk, `end` is widened to match.
///
/// Pdata entries that have no prologue match (orphans) are still emitted,
/// using the linker-supplied length to bound the function.
///
/// What this layer does NOT do:
/// - Does not edit the `prolog_length` we'd derive from prologue analysis;
/// `frame_size` and `saved_gprs` remain best-effort prologue inferences.
/// - Does not infer base/derived call edges — that's M3+M5.
#[tracing::instrument(skip_all, fields(image_base = format_args!("{:#010x}", image_base), entry_point = format_args!("{:#010x}", entry_point), pdata_entries = pdata.len()))]
pub fn analyze_with_pdata(
pe: &[u8],
image_base: u32,
entry_point: u32,
code_sections: &[(u32, u32, u32)],
pdata: &[xenia_xex::pdata::PdataEntry],
) -> FuncAnalysis {
let started = std::time::Instant::now();
let code_ranges: Vec<(u32, u32)> = code_sections.iter()
.map(|(va, sz, _)| (image_base + va, image_base + va + sz))
.collect();
@@ -197,10 +236,10 @@ pub fn analyze(
// 1. Find save/restore stubs
let (save_base, restore_base) = find_saverestore_stubs(pe, image_base, &code_ranges);
if let Some(sb) = save_base {
eprintln!("[func] __savegprlr stub at 0x{sb:08X}");
tracing::debug!(addr = format_args!("{:#010x}", sb), "__savegprlr stub");
}
if let Some(rb) = restore_base {
eprintln!("[func] __restgprlr stub at 0x{rb:08X}");
tracing::debug!(addr = format_args!("{:#010x}", rb), "__restgprlr stub");
}
// Set of addresses in the save/restore region (to exclude from function detection)
@@ -214,32 +253,79 @@ pub fn analyze(
for i in 0..21 { saverestore_addrs.insert(rb + i * 4); }
}
// 2. Collect all bl targets as candidate function entries
// 2. Collect all bl targets as candidate function entries.
// Union: bl targets ∪ pdata BeginAddresses ∪ entry_point.
let mut call_targets: HashSet<u32> = HashSet::new();
call_targets.insert(entry_point);
for &(start, end) in &code_ranges {
let mut addr = start;
while addr < end {
if let Some(instr) = read_instr(pe, addr, image_base) {
if let Some(target) = bl_target(instr, addr) {
if let Some(instr) = read_instr(pe, addr, image_base)
&& let Some(target) = bl_target(instr, addr) {
// Don't count calls into save/restore stubs as function entries
if !saverestore_addrs.contains(&target) {
call_targets.insert(target);
}
}
}
addr += 4;
}
}
eprintln!("[func] {} bl targets (candidate functions)", call_targets.len());
// 3. For each candidate, detect prologue and walk to epilogue
// Index pdata by begin_address for O(1) prologue → length lookup.
let pdata_by_begin: HashMap<u32, &xenia_xex::pdata::PdataEntry> =
pdata.iter().map(|e| (e.begin_address, e)).collect();
for e in pdata {
if !saverestore_addrs.contains(&e.begin_address) {
call_targets.insert(e.begin_address);
}
}
tracing::debug!(
candidates = call_targets.len(),
pdata_entries = pdata.len(),
"function candidates (bl ∪ pdata)"
);
// 3. For each candidate, detect prologue and walk to epilogue. Pdata
// metadata is layered on after the prologue walk so a missing prologue
// still yields an entry when pdata covers it.
let mut functions: BTreeMap<u32, FuncInfo> = BTreeMap::new();
for &func_addr in &call_targets {
if let Some(fi) = analyze_function(pe, image_base, func_addr, &code_ranges, save_base, restore_base) {
let pdata_entry = pdata_by_begin.get(&func_addr).copied();
if let Some(mut fi) = analyze_function(
pe, image_base, func_addr, &code_ranges, save_base, restore_base,
) {
if let Some(p) = pdata_entry {
fi.pdata_validated = true;
fi.pdata_length = Some(p.function_length);
// mask 0x2 (bit 1) of the packed flags = exception-handler-present
fi.has_eh = (p.flags & 0x2) != 0;
// If the prologue walk ended too early, trust pdata's length.
let pdata_end = p.begin_address.wrapping_add(p.function_length);
if pdata_end > fi.end {
fi.end = pdata_end;
}
}
functions.insert(func_addr, fi);
} else if let Some(p) = pdata_entry {
// Orphan: pdata claims a function here but no prologue matched.
// Emit a synthetic entry so the row exists for downstream queries.
functions.insert(
func_addr,
FuncInfo {
start: func_addr,
end: p.begin_address.wrapping_add(p.function_length),
frame_size: 0,
saved_gprs: 0,
is_leaf: false,
is_saverestore: false,
pdata_validated: true,
pdata_length: Some(p.function_length),
has_eh: (p.flags & 0x2) != 0,
},
);
}
}
@@ -247,6 +333,7 @@ pub fn analyze(
if let Some(sb) = save_base {
// The save block is one cascade: entry at each rN, falls through to blr
// Treat as a single function with the first entry point
let pe_sb = pdata_by_begin.get(&sb).copied();
functions.insert(sb, FuncInfo {
start: sb,
end: sb + 20 * 4, // 18 std + stw r12 + blr
@@ -254,9 +341,13 @@ pub fn analyze(
saved_gprs: 18,
is_leaf: true,
is_saverestore: true,
pdata_validated: pe_sb.is_some(),
pdata_length: pe_sb.map(|p| p.function_length),
has_eh: pe_sb.map(|p| (p.flags & 0x2) != 0).unwrap_or(false),
});
}
if let Some(rb) = restore_base {
let pe_rb = pdata_by_begin.get(&rb).copied();
functions.insert(rb, FuncInfo {
start: rb,
end: rb + 21 * 4, // 18 ld + lwz r12 + mtspr LR + blr
@@ -264,15 +355,43 @@ pub fn analyze(
saved_gprs: 18,
is_leaf: true,
is_saverestore: true,
pdata_validated: pe_rb.is_some(),
pdata_length: pe_rb.map(|p| p.function_length),
has_eh: pe_rb.map(|p| (p.flags & 0x2) != 0).unwrap_or(false),
});
}
eprintln!("[func] {} functions detected", functions.len());
// 5. Fix up `end` collisions: if function A's `end` runs past
// function B's `start` (B > A), trim A to B's start. This catches mis-merged
// prologue walks where pdata revealed an interleaved second prologue.
// We do this in a single forward pass.
let starts: Vec<u32> = functions.keys().copied().collect();
for i in 0..starts.len().saturating_sub(1) {
let cur = starts[i];
let next = starts[i + 1];
if let Some(fi) = functions.get_mut(&cur)
&& fi.end > next
{
fi.end = next;
}
}
let elapsed_ms = started.elapsed().as_millis() as f64;
metrics::histogram!("analysis.phase_ms", "phase" => "functions").record(elapsed_ms);
let pdata_validated_count = functions.values().filter(|f| f.pdata_validated).count();
tracing::info!(
functions = functions.len(),
pdata_entries = pdata.len(),
pdata_validated = pdata_validated_count,
elapsed_ms,
"function detection complete"
);
FuncAnalysis {
functions,
save_gpr_base: save_base,
restore_gpr_base: restore_base,
pdata_entries: pdata.to_vec(),
}
}
@@ -302,15 +421,13 @@ fn analyze_function(
let instr1 = read_instr(pe, func_addr + 4, image_base).unwrap_or(0);
// Check if next is bl to save stub
if let Some(target) = bl_target(instr1, func_addr + 4) {
if let Some(sb) = save_base {
if target >= sb && target < sb + 18 * 4 {
if let Some(target) = bl_target(instr1, func_addr + 4)
&& let Some(sb) = save_base
&& target >= sb && target < sb + 18 * 4 {
let idx = (target - sb) / 4;
saved_gprs = 18 - idx;
prologue_len = 8;
}
}
}
// Next should be stwu r1, -N(r1)
let stwu_instr = read_instr(pe, func_addr + prologue_len, image_base).unwrap_or(0);
@@ -356,14 +473,12 @@ fn analyze_function(
}
// Epilogue: b __restgprlr_NN (tail branch into restore stub)
if let Some(target) = b_target(instr, addr) {
if let Some(rb) = restore_base {
if target >= rb && target < rb + 18 * 4 {
if let Some(target) = b_target(instr, addr)
&& let Some(rb) = restore_base
&& target >= rb && target < rb + 18 * 4 {
end_addr = addr + 4;
break;
}
}
}
// Epilogue: bctr (indirect tail call — end of function)
if is_bctr(instr) {
@@ -392,6 +507,9 @@ fn analyze_function(
saved_gprs,
is_leaf,
is_saverestore: false,
pdata_validated: false,
pdata_length: None,
has_eh: false,
})
}
@@ -407,24 +525,22 @@ impl FuncAnalysis {
for (&addr, fi) in &self.functions {
if fi.is_saverestore {
// Label the block start, plus individual register entry points
if let Some(sb) = self.save_gpr_base {
if addr == sb {
if let Some(sb) = self.save_gpr_base
&& addr == sb {
for i in 0u32..18 {
let reg = 14 + i;
labels.insert(sb + i * 4, format!("__savegprlr_{reg}"));
}
continue;
}
}
if let Some(rb) = self.restore_gpr_base {
if addr == rb {
if let Some(rb) = self.restore_gpr_base
&& addr == rb {
for i in 0u32..18 {
let reg = 14 + i;
labels.insert(rb + i * 4, format!("__restgprlr_{reg}"));
}
continue;
}
}
}
labels.insert(addr, format!("sub_{addr:08X}"));
}
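
The pdata layering and the forward trimming pass in `analyze_with_pdata` can be sketched on a reduced model. This is a sketch under stated assumptions: `Func` and `layer_and_trim` are hypothetical, pdata entries are reduced to `(begin_address, function_length)` pairs, and prologue analysis is assumed to have already filled the map.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced stand-in for `FuncInfo`: only the fields the sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct Func {
    end: u32,
    pdata_validated: bool,
}

/// Widen `end` to the pdata-supplied end where pdata covers a function,
/// then trim any function whose `end` runs past the next function's start.
fn layer_and_trim(funcs: &mut BTreeMap<u32, Func>, pdata: &[(u32, u32)]) {
    // Each pdata entry here is a (begin_address, function_length) pair.
    for &(begin, len) in pdata {
        let pdata_end = begin.wrapping_add(len);
        funcs
            .entry(begin)
            .and_modify(|f| {
                f.pdata_validated = true;
                f.end = f.end.max(pdata_end);
            })
            // Orphan: pdata names a function that the prologue walk missed.
            .or_insert(Func { end: pdata_end, pdata_validated: true });
    }
    // Single forward pass over the sorted starts.
    let starts: Vec<u32> = funcs.keys().copied().collect();
    for pair in starts.windows(2) {
        if let Some(f) = funcs.get_mut(&pair[0]) {
            f.end = f.end.min(pair[1]);
        }
    }
}
```

Keying the map by start address is what makes the trim a single pass: a `BTreeMap` iterates in address order, so each function only needs to be compared with its immediate successor.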


@@ -0,0 +1,257 @@
//! Generic function-pointer array detection (M8 + M11).
//!
//! M3 already detects "vtable" candidates — runs of ≥3 contiguous function
//! pointers in `.rdata` / `.data` (with COL/RTTI walk on top). This module
//! widens the net:
//!
//! - **Dispatch tables** (M8): runs of ≥2 function pointers in `.rdata`
//! that are NOT already classified as vtables (`.data` is skipped; see below). Captures switch
//! jump tables, callback registries, command tables, gameplay state
//! machines, etc.
//! - **Static initialiser tables** (M11): function-pointer arrays in
//! `.rdata` whose entries all have classic constructor-like prologues
//! (small frame; either leaf or calling well-known runtime helpers).
//! The MSVC convention names the bracketing symbols `__xc_a` /
//! `__xc_z` (C++ ctors) and `__xi_a` / `__xi_z` (C runtime), but the
//! names are stripped from Sylpheed; we classify by structure.
//!
//! All findings are written to a single `function_pointer_arrays` table
//! with a `kind` column — `"vtable"`, `"dispatch_table"`, or `"static_init"`.
//! Vtable rows are duplicated from M3's `vtables` table for join
//! convenience (so a single query covers all classification kinds).
//!
//! ### What this module does NOT do
//!
//! - No alias-based classification: `static_init` is heuristic and may
//! include any function-pointer array whose entries all have
//! constructor-like prologues, whether or not it sits in the `__xc_*` region.
//! - Does not parse the bracket symbols' actual addresses — we'd need
//! debug symbols, which Sylpheed doesn't ship.
//! - Two-element runs in `.data` are common false positives (struct fields
//! that happen to alias function entries); we only emit `dispatch_table`
//! rows for `.rdata`.
use std::collections::BTreeSet;
use xenia_xex::pe::PeSection;
use crate::vtables::Vtable;
/// One detected function-pointer array.
#[derive(Debug, Clone)]
pub struct FuncPtrArray {
pub address: u32,
pub length: u32,
pub kind: &'static str, // "vtable" | "dispatch_table" | "static_init"
/// Array entries (function VAs).
pub entries: Vec<u32>,
}
/// Run the pass. `vtables` is the M3 result — those addresses are skipped
/// in the dispatch-table scan to avoid duplication. `function_starts` is
/// the M1 corrected function-start set (used to validate that each array
/// entry actually points at a known function).
#[tracing::instrument(skip_all, fields(image_base = format_args!("{:#010x}", image_base)))]
pub fn analyze(
pe: &[u8],
image_base: u32,
sections: &[PeSection],
function_starts: &BTreeSet<u32>,
vtables: &[Vtable],
) -> Vec<FuncPtrArray> {
let started = std::time::Instant::now();
let vtable_addrs: BTreeSet<u32> = vtables.iter().map(|v| v.address).collect();
let mut out: Vec<FuncPtrArray> = Vec::new();
// Re-emit vtables in this table for unified-query convenience.
for v in vtables {
out.push(FuncPtrArray {
address: v.address,
length: v.length,
kind: "vtable",
entries: v.methods.clone(),
});
}
// Scan only .rdata for dispatch tables — .data has too many false
// positives from struct fields aliasing function VAs.
for section in sections {
if section.name != ".rdata" { continue; }
let raw_start = section.virtual_address as usize;
let raw_end = (section.virtual_address + section.virtual_size) as usize;
if raw_end > pe.len() { continue; }
let bytes = &pe[raw_start..raw_end.min(pe.len())];
let va_base = image_base + section.virtual_address;
let mut i = 0usize;
while i + 8 <= bytes.len() {
if !i.is_multiple_of(4) { i += 1; continue; }
let mut entries: Vec<u32> = Vec::new();
let mut j = i;
while j + 4 <= bytes.len() {
let val = u32::from_be_bytes([bytes[j], bytes[j + 1], bytes[j + 2], bytes[j + 3]]);
if function_starts.contains(&val) {
entries.push(val);
j += 4;
} else {
break;
}
}
if entries.len() >= 2 {
let address = va_base + (i as u32);
if !vtable_addrs.contains(&address) {
let kind = classify_run(image_base, &entries, pe);
out.push(FuncPtrArray {
address,
length: entries.len() as u32,
kind,
entries,
});
}
i = j;
} else {
i += 4;
}
}
}
let elapsed_ms = started.elapsed().as_millis() as f64;
let n_vt = out.iter().filter(|a| a.kind == "vtable").count();
let n_dt = out.iter().filter(|a| a.kind == "dispatch_table").count();
let n_si = out.iter().filter(|a| a.kind == "static_init").count();
metrics::histogram!("analysis.phase_ms", "phase" => "funcptr_arrays").record(elapsed_ms);
tracing::info!(
total = out.len(), vtable = n_vt, dispatch_table = n_dt, static_init = n_si,
elapsed_ms,
"function-pointer array scan complete",
);
out
}
/// Classify a non-vtable function-pointer array. Currently distinguishes
/// only "static_init" (at least three entries, all with constructor-like
/// prologues as judged by `is_ctor_like`) from "dispatch_table" (anything
/// else).
fn classify_run(image_base: u32, entries: &[u32], pe: &[u8]) -> &'static str {
// Heuristic: a static initialiser's prologue is small (frame ≤ 0x80,
// typically ≤ 0x40). If every entry starts with mfspr rD, LR
// (opcode 31, xo 339, spr 8) followed by either a small stwu or a
// branch into a save stub, classify the run as static_init.
let all_ctor = entries.iter().all(|&fn_va| is_ctor_like(pe, image_base, fn_va));
if all_ctor && entries.len() >= 3 {
"static_init"
} else {
"dispatch_table"
}
}
/// True if the function at `fn_va` looks like a tiny C++ static initialiser:
/// `mfspr rD, LR` immediately followed by either `stwu r1, -N(r1)` with
/// `N ≤ 0x80` or an opcode-18 branch (expected to be `bl __savegprlr_*`).
fn is_ctor_like(pe: &[u8], image_base: u32, fn_va: u32) -> bool {
let off = fn_va.wrapping_sub(image_base) as usize;
if off + 8 > pe.len() { return false; }
let i0 = u32::from_be_bytes([pe[off], pe[off + 1], pe[off + 2], pe[off + 3]]);
let i1 = u32::from_be_bytes([pe[off + 4], pe[off + 5], pe[off + 6], pe[off + 7]]);
// i0: mfspr rD, LR — opcode 31, xo 339, spr 8.
let op0 = i0 >> 26;
let xo0 = (i0 >> 1) & 0x3FF;
let spr0 = (((i0 >> 11) & 0x1F) << 5) | ((i0 >> 16) & 0x1F);
if !(op0 == 31 && xo0 == 339 && spr0 == 8) { return false; }
// i1 must be stwu r1, -N(r1) with N ≤ 0x80, OR a `bl __savegprlr_*`
// followed eventually by stwu (full prologue). Allow either.
let op1 = i1 >> 26;
if op1 == 37 {
// stwu D-form: rS=1, rA=1
let rs = (i1 >> 21) & 0x1F;
let ra = (i1 >> 16) & 0x1F;
let d = ((i1 & 0xFFFF) as i16) as i32;
rs == 1 && ra == 1 && d <= 0 && (-d) <= 0x80
} else if op1 == 18 {
// Expected to be bl __savegprlr_NN, but any opcode-18 branch is
// accepted: neither the LK bit nor the target is checked, and the
// frame size can't be verified without walking further.
true
} else {
false
}
}
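
The bit-field extraction behind `is_ctor_like` can be checked against two well-known encodings, `0x7D8802A6` (`mfspr r12, LR`) and `0x9421FFA0` (`stwu r1, -0x60(r1)`). The helpers below are a hypothetical sketch that splits the same decoding into three small functions; they are not part of the module.

```rust
/// Split-field SPR number of an `mfspr`: the two 5-bit halves are stored swapped.
fn mfspr_spr(instr: u32) -> u32 {
    (((instr >> 11) & 0x1F) << 5) | ((instr >> 16) & 0x1F)
}

/// True for `mfspr rD, LR` (primary opcode 31, extended opcode 339, SPR 8).
fn is_mflr(instr: u32) -> bool {
    (instr >> 26) == 31 && ((instr >> 1) & 0x3FF) == 339 && mfspr_spr(instr) == 8
}

/// Frame size of `stwu r1, -N(r1)`, or None when `instr` is something else.
fn stwu_frame_size(instr: u32) -> Option<u32> {
    let (op, rs, ra) = (instr >> 26, (instr >> 21) & 0x1F, (instr >> 16) & 0x1F);
    let d = (instr & 0xFFFF) as u16 as i16 as i32; // sign-extended displacement
    (op == 37 && rs == 1 && ra == 1 && d <= 0).then(|| (-d) as u32)
}
```

With these, the small-frame test reads as `stwu_frame_size(i1).map_or(false, |n| n <= 0x80)`.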
#[cfg(test)]
mod tests {
use super::*;
use xenia_xex::pe::PeSection;
fn mk_section(name: &str, va: u32, size: u32) -> PeSection {
PeSection {
name: name.into(),
virtual_address: va,
virtual_size: size,
raw_offset: va,
raw_size: size,
flags: 0x4000_0040,
}
}
fn write_be_u32(buf: &mut [u8], at: usize, val: u32) {
buf[at..at + 4].copy_from_slice(&val.to_be_bytes());
}
#[test]
fn detects_dispatch_table_in_rdata() {
let image_base = 0x82000000u32;
let rdata_va = 0x1000u32;
let mut pe = vec![0u8; 0x4000];
// Two consecutive function pointers, no vtable shadowing them.
let pcs = [image_base + 0x2000, image_base + 0x2010];
for (i, p) in pcs.iter().enumerate() {
write_be_u32(&mut pe, rdata_va as usize + i * 4, *p);
}
let sections = vec![mk_section(".rdata", rdata_va, 0x100)];
let mut starts = BTreeSet::new();
for &p in &pcs { starts.insert(p); }
let arrs = analyze(&pe, image_base, &sections, &starts, &[]);
assert_eq!(arrs.len(), 1);
assert_eq!(arrs[0].kind, "dispatch_table");
assert_eq!(arrs[0].length, 2);
}
#[test]
fn vtable_overrides_dispatch_classification() {
let image_base = 0x82000000u32;
let rdata_va = 0x1000u32;
let mut pe = vec![0u8; 0x4000];
let pcs = [image_base + 0x2000, image_base + 0x2010, image_base + 0x2020];
for (i, p) in pcs.iter().enumerate() {
write_be_u32(&mut pe, rdata_va as usize + i * 4, *p);
}
let sections = vec![mk_section(".rdata", rdata_va, 0x100)];
let mut starts = BTreeSet::new();
for &p in &pcs { starts.insert(p); }
let vt = Vtable {
address: image_base + rdata_va,
length: 3,
col_address: None,
class_name: "ANON_test".into(),
rtti_present: false,
base_classes_json: None,
methods: pcs.to_vec(),
};
let arrs = analyze(&pe, image_base, &sections, &starts, &[vt]);
// The M3 vtable is re-emitted as a `vtable` row, and the scan skips the
// run at the same address, so no duplicate `dispatch_table` row appears.
assert_eq!(arrs.len(), 1);
assert_eq!(arrs[0].kind, "vtable");
}
}
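
The run detection at the heart of the dispatch-table scan can be sketched on a plain byte slice. This is a hypothetical sketch: `find_runs` is not in the module, it returns byte offsets rather than VAs, and it omits both the vtable exclusion and the classification step.

```rust
use std::collections::BTreeSet;

/// Find runs of at least `min_len` consecutive big-endian words that are all
/// members of `starts`. Returns (byte offset of the run, entries) pairs.
fn find_runs(bytes: &[u8], starts: &BTreeSet<u32>, min_len: usize) -> Vec<(usize, Vec<u32>)> {
    let words: Vec<u32> = bytes
        .chunks_exact(4)
        .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    let mut runs = Vec::new();
    let mut i = 0;
    while i < words.len() {
        // Extend the run for as long as the words are known function starts.
        let len = words[i..].iter().take_while(|w| starts.contains(*w)).count();
        if len >= min_len {
            runs.push((i * 4, words[i..i + len].to_vec()));
        }
        // Skip past the run, or one word when nothing matched here.
        i += len.max(1);
    }
    runs
}
```

Skipping past a whole run after it is found means a long table is reported once, not once per suffix.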


@@ -0,0 +1,636 @@
//! M5.5 — `this`-flow indirect-dispatch resolution.
//!
//! M5 only resolved the canonical `lis+addi → lwz off(vt) → mtctr → bcctrl`
//! pattern (vtable address materialised statically; rare in real C++).
//! This layer closes the dominant case, where the dispatch reads through
//! the object's `vptr` field:
//!
//! ```text
//! lwz rVt, vptr_off(this) ; rVt = this->vptr
//! ... ; (rVt not clobbered)
//! lwz rFn, slot*4(rVt) ; rFn = vtable[slot]
//! ... ; (rFn / ctr not clobbered)
//! mtctr rFn
//! ...
//! bcctrl
//! ```
//!
//! Resolution strategy (class-membership inference):
//!
//! 1. **Phase 1 — vptr-write scan.** Walk every function with a tiny
//! register tracker (mirrors the lis+addi propagation in
//! `xenia_analysis::xref`). Whenever a `stw rA, off(rB)` writes a
//! known M3 vtable address into `off(rB)`, record
//! `(vtable_addr, vptr_offset, writer_pc)`. These are constructor-
//! side vptr stores.
//!
//! 2. **Phase 2 — invert by offset.** Build
//! `vtables_by_offset[vptr_off] = set of vtables ever written at
//! that offset`. Most classes use offset 0 (single inheritance);
//! multiple-inheritance secondary vptrs land at non-zero offsets.
//!
//! 3. **Phase 3 — dispatch-site scan.** For each `bcctrl`, walk back
//! up to 16 instructions looking for the canonical sequence,
//! extracting `(vptr_off, slot)`. Bail on any clobber of the
//! tracked register, on any branch instruction, or on a label
//! boundary.
//!
//! 4. **Phase 4 — emit edges.** For each detected
//! `(dispatch_pc, vptr_off, slot)`:
//! - Look up all candidate vtables `V` where:
//! - `vtables_by_offset[vptr_off]` contains `V`, AND
//! - `V.length > slot`
//! - Emit one `ind_call` edge from `dispatch_pc` to
//! `V.methods[slot]` per candidate.
//!
//! Multi-candidate sites are an over-approximation: the analysis can't
//! distinguish without alias info which of the matching classes the
//! `this` register actually holds. Downstream queries can filter by
//! the exposed `candidate_count` column — single-candidate edges are
//! high-confidence, multi-candidate edges are reachability-only.
//!
//! ### What this layer does NOT do
//!
//! - No cross-block dataflow: register state is killed at every
//! label (basic-block boundary), and we do not propagate values
//! across calls (since the ABI's volatile/non-volatile partition is
//! unreliable for `this`-pointer chains).
//! - No alias resolution: a multi-candidate site emits one edge per
//! matching vtable, not the exact one used at runtime.
//! - Does not handle vptr writes via X-form indexed stores (`stwx`)
//! or VMX/VMX128 stores — only D-form `stw rA, off(rB)`. The MSVC
//! compiler uses D-form for all canonical vptr writes we've seen.
//! - Does not synthesise vptr writes for inlined / elided constructors.
//! If a class never has a writer at offset `vptr_off`, dispatches
//! through that offset will not find candidates.
//!
//! Reference: IBM PowerPC ABI, Itanium C++ ABI on vtable layout (the
//! same offset-from-`this` model applies on Win32 PPC).
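
Phases 2 and 4 described above reduce to a map inversion followed by a filtered lookup. The sketch below is a hypothetical reduction: `invert_by_offset` and `resolve` are not the module's functions, vptr writes are reduced to `(vtable_addr, vptr_offset)` pairs, vtables to a map from address to method list, and a `BTreeSet` is used in place of a hash set so candidates come out in address order.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Phase 2: group vtable addresses by the `this` offset they were stored at.
/// Each write is a (vtable_addr, vptr_offset) pair.
fn invert_by_offset(writes: &[(u32, u32)]) -> HashMap<u32, BTreeSet<u32>> {
    let mut by_offset: HashMap<u32, BTreeSet<u32>> = HashMap::new();
    for &(vtable_addr, vptr_offset) in writes {
        by_offset.entry(vptr_offset).or_default().insert(vtable_addr);
    }
    by_offset
}

/// Phase 4: every vtable seen at `vptr_off` that is long enough to have
/// `slot` contributes one candidate target, `methods[slot]`.
fn resolve(
    by_offset: &HashMap<u32, BTreeSet<u32>>,
    methods_by_vtable: &BTreeMap<u32, Vec<u32>>,
    vptr_off: u32,
    slot: usize,
) -> Vec<u32> {
    by_offset
        .get(&vptr_off)
        .into_iter()
        .flatten()
        .filter_map(|vt| methods_by_vtable.get(vt)?.get(slot).copied())
        .collect()
}
```

A result with one element corresponds to a single-candidate (high-confidence) site; more than one element is the over-approximation the module doc describes.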
use std::collections::{BTreeMap, BTreeSet, HashMap, HashSet};
use crate::func::FuncAnalysis;
use crate::vtables::Vtable;
/// One detected dispatch site after typed resolution.
#[derive(Debug, Clone)]
pub struct TypedDispatch {
pub dispatch_pc: u32,
pub vptr_offset: u32,
pub slot: u32,
/// Candidate vtable addresses whose `(vptr_offset, slot)` matched.
pub candidate_vtables: Vec<u32>,
/// Resolved method PCs (one per candidate vtable).
pub method_pcs: Vec<u32>,
}
/// Result of the M5.5 pass.
#[derive(Debug, Default)]
pub struct TypedIndirectResult {
pub dispatches: Vec<TypedDispatch>,
/// Phase-1 raw output, exposed for diagnostics.
pub vptr_writes: Vec<VptrWrite>,
}
/// One detected constructor-side vptr write.
#[derive(Debug, Clone, Copy)]
pub struct VptrWrite {
pub vtable_addr: u32,
pub vptr_offset: u32,
pub writer_pc: u32,
pub writer_function: u32,
}
const OP_ADDI: u32 = 14;
const OP_ADDIS: u32 = 15;
const OP_BCCTR: u32 = 19;
const OP_LWZ: u32 = 32;
const OP_ORI: u32 = 24;
const OP_STW: u32 = 36;
const OP_X_FORM: u32 = 31;
/// Run the full M5.5 analysis.
#[tracing::instrument(skip_all, fields(image_base = format_args!("{:#010x}", image_base)))]
pub fn analyze(
pe: &[u8],
image_base: u32,
func_analysis: &FuncAnalysis,
vtables: &[Vtable],
labels: &HashMap<u32, String>,
) -> TypedIndirectResult {
let started = std::time::Instant::now();
let vtable_addrs: BTreeSet<u32> = vtables.iter().map(|v| v.address).collect();
let vtable_by_addr: BTreeMap<u32, &Vtable> =
vtables.iter().map(|v| (v.address, v)).collect();
let block_boundaries: HashSet<u32> = labels.keys().copied().collect();
// Phase 1: scan for vptr writes.
let vptr_writes = scan_vptr_writes(
pe, image_base, func_analysis, &vtable_addrs, &block_boundaries,
);
// Phase 2: invert by offset.
let mut vtables_by_offset: HashMap<u32, HashSet<u32>> = HashMap::new();
for w in &vptr_writes {
vtables_by_offset.entry(w.vptr_offset).or_default().insert(w.vtable_addr);
}
// Phase 3 + 4: scan dispatches and emit edges.
let dispatches = scan_dispatches_and_resolve(
pe, image_base, func_analysis, &block_boundaries,
&vtables_by_offset, &vtable_by_addr,
);
let elapsed_ms = started.elapsed().as_millis() as f64;
let single_candidate = dispatches.iter().filter(|d| d.candidate_vtables.len() == 1).count();
let multi_candidate = dispatches.len() - single_candidate;
let total_edges: usize = dispatches.iter().map(|d| d.method_pcs.len()).sum();
metrics::histogram!("analysis.phase_ms", "phase" => "ind_dispatch_typed").record(elapsed_ms);
tracing::info!(
vptr_writes = vptr_writes.len(),
offsets = vtables_by_offset.len(),
dispatches = dispatches.len(),
single = single_candidate,
multi = multi_candidate,
edges = total_edges,
elapsed_ms,
"M5.5 typed indirect-dispatch scan complete",
);
TypedIndirectResult { dispatches, vptr_writes }
}
fn read_instr(pe: &[u8], image_base: u32, addr: u32) -> Option<u32> {
let off = addr.wrapping_sub(image_base) as usize;
if off + 4 > pe.len() { return None; }
Some(u32::from_be_bytes([pe[off], pe[off + 1], pe[off + 2], pe[off + 3]]))
}
/// Phase 1 — find every `stw rA, off(rB)` where the lis+addi-tracked
/// value of `rA` equals a known vtable address.
fn scan_vptr_writes(
pe: &[u8],
image_base: u32,
func_analysis: &FuncAnalysis,
vtable_addrs: &BTreeSet<u32>,
block_boundaries: &HashSet<u32>,
) -> Vec<VptrWrite> {
let mut writes: Vec<VptrWrite> = Vec::new();
for (&fn_start, fi) in &func_analysis.functions {
if fi.is_saverestore { continue; }
let mut reg: [Option<u32>; 32] = [None; 32];
let mut pc = fn_start;
while pc < fi.end {
if pc != fn_start && block_boundaries.contains(&pc) {
reg = [None; 32];
}
let Some(instr) = read_instr(pe, image_base, pc) else { break };
let op = instr >> 26;
let rd = ((instr >> 21) & 0x1F) as usize;
let ra = ((instr >> 16) & 0x1F) as usize;
let simm = ((instr & 0xFFFF) as i16) as i32;
let uimm = instr & 0xFFFF;
match op {
OP_ADDIS if ra == 0 => reg[rd] = Some(uimm << 16),
OP_ADDIS => {
reg[rd] = reg[ra].map(|b| b.wrapping_add(uimm << 16));
}
OP_ADDI if ra != 0 => {
reg[rd] = reg[ra].map(|b| b.wrapping_add(simm as u32));
}
OP_ADDI => reg[rd] = Some(simm as u32),
OP_ORI => {
let rs = rd;
reg[ra] = reg[rs].map(|b| b | uimm);
}
OP_STW => {
// `stw rS, off(rA)` — rS in bits 21..25, rA in 16..20.
if ra != 0
&& let Some(vtable_addr) = reg[rd]
&& vtable_addrs.contains(&vtable_addr)
{
// The vptr offset is the displacement; the value of the
// base register (rA) is irrelevant for class-membership inference.
writes.push(VptrWrite {
vtable_addr,
vptr_offset: simm as u32,
writer_pc: pc,
writer_function: fn_start,
});
}
// stw doesn't write to rD.
}
OP_LWZ => reg[rd] = None,
32..=35 | 40..=43 | 48..=51 => reg[rd] = None,
OP_X_FORM => {
let xo = (instr >> 1) & 0x3FF;
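// `or rA, rS, rB` (XO=444, incl. `mr`) writes rA, not rS;
// mtspr (XO=467) writes no GPR.
if xo == 444 { reg[ra] = None; } else if xo != 467 { reg[rd] = None; }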
}
18 => {
// `bl` (LK=1) clobbers volatile r0..r12 + ctr. Plain
// `b` makes the next instruction unreachable; the
// label-based reset handles join points.
if (instr & 1) != 0 {
for r in 0..=12 { reg[r] = None; }
}
}
16 => {
if (instr & 1) != 0 {
for r in 0..=12 { reg[r] = None; }
}
}
_ => {}
}
pc = pc.wrapping_add(4);
}
}
writes
}
/// Phase 3 + 4: scan every `bcctrl`/`bctrl` instruction (XO=528, LK=1);
/// for each, walk backward (up to 16 instructions per step) to find the canonical
/// `lwz vt, vptr_off(this); lwz fn, slot(vt); mtctr fn; bcctrl` sequence.
/// Emit one `TypedDispatch` per dispatch site that resolves to ≥ 1
/// candidate vtable.
fn scan_dispatches_and_resolve(
pe: &[u8],
image_base: u32,
func_analysis: &FuncAnalysis,
block_boundaries: &HashSet<u32>,
vtables_by_offset: &HashMap<u32, HashSet<u32>>,
vtable_by_addr: &BTreeMap<u32, &Vtable>,
) -> Vec<TypedDispatch> {
let mut out: Vec<TypedDispatch> = Vec::new();
for (&fn_start, fi) in &func_analysis.functions {
if fi.is_saverestore { continue; }
let mut pc = fn_start;
while pc < fi.end {
let Some(instr) = read_instr(pe, image_base, pc) else { break };
let op = instr >> 26;
if op == OP_BCCTR {
let xo = (instr >> 1) & 0x3FF;
let lk = (instr & 1) != 0;
if xo == 528 && lk
&& let Some(d) = try_resolve_dispatch_site(
pe, image_base, fn_start, fi.end, pc,
block_boundaries, vtables_by_offset, vtable_by_addr,
)
{
out.push(d);
}
}
pc = pc.wrapping_add(4);
}
}
out
}
/// Backwards scan from `bcctrl` at `pc`. Each of the three steps (find
/// `mtctr`, then the slot `lwz`, then the vptr `lwz`) looks back at most 16
/// instrs and stops at a basic-block boundary. Returns `Some(_)` only when the
/// full `lwz vt, off(rA); lwz fn, slot(vt); mtctr fn` chain is present and the
/// `(vptr_off, slot)` pair has at least one candidate vtable.
fn try_resolve_dispatch_site(
pe: &[u8],
image_base: u32,
fn_start: u32,
_fn_end: u32,
bcctrl_pc: u32,
block_boundaries: &HashSet<u32>,
vtables_by_offset: &HashMap<u32, HashSet<u32>>,
vtable_by_addr: &BTreeMap<u32, &Vtable>,
) -> Option<TypedDispatch> {
const LOOKBACK: u32 = 16;
// Walk back 1..LOOKBACK instrs to find `mtctr rFn`.
let mut mtctr_rs: Option<usize> = None;
let mut mtctr_pc: Option<u32> = None;
for i in 1..=LOOKBACK {
let p = bcctrl_pc.wrapping_sub(i * 4);
if p < fn_start { break; }
if block_boundaries.contains(&p) { break; }
let Some(instr) = read_instr(pe, image_base, p) else { break };
let op = instr >> 26;
if op == OP_X_FORM {
let xo = (instr >> 1) & 0x3FF;
if xo == 467 {
let spr = (((instr >> 11) & 0x1F) << 5) | ((instr >> 16) & 0x1F);
if spr == 9 {
mtctr_rs = Some(((instr >> 21) & 0x1F) as usize);
mtctr_pc = Some(p);
break;
}
}
}
}
let mtctr_rs = mtctr_rs?;
let mtctr_pc = mtctr_pc?;
// Walk back from mtctr to find `lwz rFn, slot(rVt)` defining mtctr_rs.
let mut slot: Option<u32> = None;
let mut vt_reg: Option<usize> = None;
let mut fn_lwz_pc: Option<u32> = None;
for i in 1..=LOOKBACK {
let p = mtctr_pc.wrapping_sub(i * 4);
if p < fn_start { break; }
if block_boundaries.contains(&p) { break; }
let Some(instr) = read_instr(pe, image_base, p) else { break };
let op = instr >> 26;
let rd = ((instr >> 21) & 0x1F) as usize;
if op == OP_LWZ {
if rd == mtctr_rs {
let ra = ((instr >> 16) & 0x1F) as usize;
if ra == 0 { return None; }
let off = ((instr & 0xFFFF) as i16) as i32;
if off < 0 || (off % 4) != 0 { return None; }
slot = Some((off as u32) / 4);
vt_reg = Some(ra);
fn_lwz_pc = Some(p);
break;
}
// Any other lwz targets a different register, so it cannot
// clobber mtctr_rs; keep scanning.
} else if writes_reg(instr, mtctr_rs as u32) {
return None;
}
}
let slot = slot?;
let vt_reg = vt_reg?;
let fn_lwz_pc = fn_lwz_pc?;
// Walk back from the fn-lwz to find `lwz rVt, vptr_off(rThis)` defining vt_reg.
let mut vptr_off: Option<u32> = None;
for i in 1..=LOOKBACK {
let p = fn_lwz_pc.wrapping_sub(i * 4);
if p < fn_start { break; }
if block_boundaries.contains(&p) { break; }
let Some(instr) = read_instr(pe, image_base, p) else { break };
let op = instr >> 26;
let rd = ((instr >> 21) & 0x1F) as usize;
if op == OP_LWZ && rd == vt_reg {
let ra = ((instr >> 16) & 0x1F) as usize;
if ra == 0 { return None; }
let off = ((instr & 0xFFFF) as i16) as i32;
// Negative offsets are valid in C++ (multiple inheritance casts
// can produce them in some ABIs); reinterpret as u32 wrap.
vptr_off = Some(off as u32);
break;
}
if writes_reg(instr, vt_reg as u32) {
return None;
}
}
let vptr_off = vptr_off?;
// Phase 4 — resolve to candidate vtables.
let candidates = vtables_by_offset.get(&vptr_off)?;
let mut candidate_vtables: Vec<u32> = Vec::new();
let mut method_pcs: Vec<u32> = Vec::new();
for &vt_addr in candidates {
if let Some(vt) = vtable_by_addr.get(&vt_addr)
&& vt.length > slot
&& let Some(&method_pc) = vt.methods.get(slot as usize)
{
candidate_vtables.push(vt_addr);
method_pcs.push(method_pc);
}
}
if method_pcs.is_empty() { return None; }
Some(TypedDispatch {
dispatch_pc: bcctrl_pc,
vptr_offset: vptr_off,
slot,
candidate_vtables,
method_pcs,
})
}
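/// Best-effort "does this instruction write to register `r`" predicate.
/// Used to detect register clobbers between the value-producing lwz and
/// its consumer. Opcodes not listed below are assumed not to write `r`,
/// so the check is not exhaustive (e.g. `mulli` and `rlwinm` are missed).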
fn writes_reg(instr: u32, r: u32) -> bool {
let op = instr >> 26;
let rd = (instr >> 21) & 0x1F;
let _ra = (instr >> 16) & 0x1F;
match op {
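// addi/addis and D-form loads write rD (bits 21..25). The range also
// covers stores 36..=39, whose bits 21..25 hold the source rS; counting
// them as writes is over-conservative but safe.
14 | 15 | 32..=43 | 48..=51 => rd == r,
// lmw rD, d(rA) loads rD through r31.
46 => rd <= r,
// ori/oris/xori/xoris/andi./andis. (opcodes 24..=29): rA in bits 16..20 is the dest.
24..=29 => ((instr >> 16) & 0x1F) == r,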
// X-form: most write rD; some write rA. Check both, conservatively.
OP_X_FORM => {
let xo = (instr >> 1) & 0x3FF;
// Logical, shift and extend X-form ops (and/or/xor/srw/extsb/etc.) write rA, not rD.
if matches!(xo, 26 | 28 | 60 | 124 | 284 | 316 | 444 | 476 | 536 | 539 | 922 | 954) {
((instr >> 16) & 0x1F) == r
} else {
rd == r
}
}
_ => false,
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::func::FuncInfo;
use std::collections::BTreeMap;
fn mk_vtable(addr: u32, methods: Vec<u32>) -> Vtable {
Vtable {
address: addr,
length: methods.len() as u32,
col_address: None,
class_name: format!("ANON_{addr:08X}"),
rtti_present: false,
base_classes_json: None,
methods,
}
}
fn mk_func_analysis(start: u32, len: u32) -> FuncAnalysis {
let mut functions: BTreeMap<u32, FuncInfo> = BTreeMap::new();
functions.insert(start, FuncInfo {
start,
end: start + len,
frame_size: 0,
saved_gprs: 0,
is_leaf: false,
is_saverestore: false,
pdata_validated: false,
pdata_length: None,
has_eh: false,
});
FuncAnalysis { functions, save_gpr_base: None, restore_gpr_base: None, pdata_entries: Vec::new() }
}
fn write_be(pe: &mut [u8], at: usize, v: u32) {
pe[at..at + 4].copy_from_slice(&v.to_be_bytes());
}
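/// Encode a vptr-write site: `lis r3, hi(vt); addi r3, r3, lo(vt); stw r3, off(rBase)`.
/// `dest_reg` is the base register of the store. `lis` gets the plain high
/// half, so this is only correct when the low half of `vt` is below 0x8000
/// (`addi` sign-extends its immediate).
fn enc_vptr_write(pe: &mut [u8], at: usize, vt: u32, write_off: i16, dest_reg: u32) {
let hi = (vt >> 16) as u16;
let lo = (vt & 0xFFFF) as i16;
let lis = (15u32 << 26) | (3 << 21) | (0 << 16) | (hi as u32);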
let addi = (14u32 << 26) | (3 << 21) | (3 << 16) | ((lo as u16) as u32);
let stw = (36u32 << 26) | (3 << 21) | (dest_reg << 16) | ((write_off as u16) as u32);
write_be(pe, at, lis);
write_be(pe, at + 4, addi);
write_be(pe, at + 8, stw);
}
/// Encode a dispatch site:
/// lwz r4, vptr_off(r3) ; r4 = this->vptr
/// lwz r5, slot*4(r4) ; r5 = vptr[slot]
/// mtctr r5
/// bcctrl
fn enc_dispatch(pe: &mut [u8], at: usize, vptr_off: i16, slot: u32) {
let lwz_vt = (32u32 << 26) | (4 << 21) | (3 << 16) | ((vptr_off as u16) as u32);
let lwz_fn = (32u32 << 26) | (5 << 21) | (4 << 16) | ((slot * 4) & 0xFFFF);
// mtctr r5 = mtspr CTR(=9), r5: SPR_low (=9) → bits 16..20.
let mtctr = (31u32 << 26) | (5 << 21) | (9 << 16) | (467 << 1);
let bcctrl = (19u32 << 26) | (20 << 21) | (528 << 1) | 1;
write_be(pe, at, lwz_vt);
write_be(pe, at + 4, lwz_fn);
write_be(pe, at + 8, mtctr);
write_be(pe, at + 12, bcctrl);
}
#[test]
fn single_candidate_vtable_resolves_to_one_method() {
let image_base = 0x82000000u32;
let mut pe = vec![0u8; 0x4000];
// Function A — constructor — at 0x82001000. Writes vt=0x82010000 at off=0.
let ctor_pc = 0x82001000u32;
enc_vptr_write(&mut pe, (ctor_pc - image_base) as usize, 0x82010000, 0, 31);
// Function B — dispatcher — at 0x82002000. Calls slot 2 of vptr at off 0.
let disp_pc = 0x82002000u32;
enc_dispatch(&mut pe, (disp_pc - image_base) as usize, 0, 2);
let bcctrl_pc = disp_pc + 12;
// Both functions in func_analysis (synthesise).
let mut fa = mk_func_analysis(ctor_pc, 0x40);
fa.functions.insert(disp_pc, FuncInfo {
start: disp_pc, end: disp_pc + 0x40, frame_size: 0, saved_gprs: 0,
is_leaf: false, is_saverestore: false,
pdata_validated: false, pdata_length: None, has_eh: false,
});
let vt = mk_vtable(0x82010000, vec![0xAA, 0xBB, 0xCC, 0xDD]);
let labels: HashMap<u32, String> = HashMap::new();
let r = analyze(&pe, image_base, &fa, &[vt], &labels);
assert_eq!(r.vptr_writes.len(), 1);
assert_eq!(r.vptr_writes[0].vtable_addr, 0x82010000);
assert_eq!(r.vptr_writes[0].vptr_offset, 0);
assert_eq!(r.dispatches.len(), 1);
let d = &r.dispatches[0];
assert_eq!(d.dispatch_pc, bcctrl_pc);
assert_eq!(d.vptr_offset, 0);
assert_eq!(d.slot, 2);
assert_eq!(d.method_pcs, vec![0xCC]);
assert_eq!(d.candidate_vtables, vec![0x82010000]);
}
#[test]
fn multi_candidate_emits_one_edge_per_match() {
let image_base = 0x82000000u32;
let mut pe = vec![0u8; 0x4000];
// Two ctors, each writing a different vtable at offset 0.
let ctor_a = 0x82001000u32;
enc_vptr_write(&mut pe, (ctor_a - image_base) as usize, 0x82010000, 0, 31);
let ctor_b = 0x82001100u32;
enc_vptr_write(&mut pe, (ctor_b - image_base) as usize, 0x82010040, 0, 31);
// One dispatch at slot 1.
let disp = 0x82002000u32;
enc_dispatch(&mut pe, (disp - image_base) as usize, 0, 1);
let mut fa = mk_func_analysis(ctor_a, 0x40);
fa.functions.insert(ctor_b, FuncInfo {
start: ctor_b, end: ctor_b + 0x40, frame_size: 0, saved_gprs: 0,
is_leaf: false, is_saverestore: false,
pdata_validated: false, pdata_length: None, has_eh: false,
});
fa.functions.insert(disp, FuncInfo {
start: disp, end: disp + 0x40, frame_size: 0, saved_gprs: 0,
is_leaf: false, is_saverestore: false,
pdata_validated: false, pdata_length: None, has_eh: false,
});
let vts = vec![
mk_vtable(0x82010000, vec![0x11, 0x22, 0x33, 0x44]),
mk_vtable(0x82010040, vec![0x55, 0x66, 0x77, 0x88]),
];
let labels: HashMap<u32, String> = HashMap::new();
let r = analyze(&pe, image_base, &fa, &vts, &labels);
assert_eq!(r.vptr_writes.len(), 2);
assert_eq!(r.dispatches.len(), 1);
let d = &r.dispatches[0];
assert_eq!(d.candidate_vtables.len(), 2);
assert!(d.method_pcs.contains(&0x22));
assert!(d.method_pcs.contains(&0x66));
}
#[test]
fn out_of_bounds_slot_yields_no_dispatch() {
let image_base = 0x82000000u32;
let mut pe = vec![0u8; 0x4000];
let ctor = 0x82001000u32;
enc_vptr_write(&mut pe, (ctor - image_base) as usize, 0x82010000, 0, 31);
let disp = 0x82002000u32;
// slot 10 — vtable only has 4 methods.
enc_dispatch(&mut pe, (disp - image_base) as usize, 0, 10);
let mut fa = mk_func_analysis(ctor, 0x40);
fa.functions.insert(disp, FuncInfo {
start: disp, end: disp + 0x40, frame_size: 0, saved_gprs: 0,
is_leaf: false, is_saverestore: false,
pdata_validated: false, pdata_length: None, has_eh: false,
});
let vt = mk_vtable(0x82010000, vec![0x11, 0x22, 0x33, 0x44]);
let labels: HashMap<u32, String> = HashMap::new();
let r = analyze(&pe, image_base, &fa, &[vt], &labels);
assert_eq!(r.dispatches.len(), 0);
}
#[test]
fn no_writer_at_offset_yields_no_dispatch() {
let image_base = 0x82000000u32;
let mut pe = vec![0u8; 0x4000];
// ctor writes at offset 0
let ctor = 0x82001000u32;
enc_vptr_write(&mut pe, (ctor - image_base) as usize, 0x82010000, 0, 31);
// dispatch reads from offset 8 — no class writes vptr there.
let disp = 0x82002000u32;
enc_dispatch(&mut pe, (disp - image_base) as usize, 8, 1);
let mut fa = mk_func_analysis(ctor, 0x40);
fa.functions.insert(disp, FuncInfo {
start: disp, end: disp + 0x40, frame_size: 0, saved_gprs: 0,
is_leaf: false, is_saverestore: false,
pdata_validated: false, pdata_length: None, has_eh: false,
});
let vt = mk_vtable(0x82010000, vec![0x11, 0x22, 0x33, 0x44]);
let labels: HashMap<u32, String> = HashMap::new();
let r = analyze(&pe, image_base, &fa, &[vt], &labels);
assert_eq!(r.dispatches.len(), 0);
}
}
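
The offset inversion (Phase 2) and candidate resolution (Phase 4) above can be illustrated in isolation. The following is a minimal sketch, not the module's code: `Write`, `invert_by_offset`, and `resolve` are hypothetical names, and it assumes ordered `BTreeMap`/`BTreeSet` containers (rather than the hashed ones used above) purely so that the resulting edge order is stable.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified stand-in for a constructor-side vptr write.
pub struct Write {
    pub vtable_addr: u32,
    pub vptr_offset: u32,
}

/// Phase 2 sketch: group vtable addresses by the `this`-relative offset
/// at which some constructor stored them.
pub fn invert_by_offset(writes: &[Write]) -> BTreeMap<u32, BTreeSet<u32>> {
    let mut by_off: BTreeMap<u32, BTreeSet<u32>> = BTreeMap::new();
    for w in writes {
        by_off.entry(w.vptr_offset).or_default().insert(w.vtable_addr);
    }
    by_off
}

/// Phase 4 sketch: every vtable written at `vptr_off` that has a method in
/// `slot` contributes one `(vtable_addr, method_pc)` edge. Vtables that are
/// too short for `slot` are skipped.
pub fn resolve(
    by_off: &BTreeMap<u32, BTreeSet<u32>>,
    methods: &BTreeMap<u32, Vec<u32>>, // vtable address -> method PCs
    vptr_off: u32,
    slot: usize,
) -> Vec<(u32, u32)> {
    let Some(candidates) = by_off.get(&vptr_off) else {
        return Vec::new();
    };
    candidates
        .iter()
        .filter_map(|vt| methods.get(vt)?.get(slot).map(|&pc| (*vt, pc)))
        .collect()
}
```

With two vtables written at offset 0, a dispatch through `(0, 1)` yields one edge per vtable, while an offset that no constructor writes yields none. This is the over-approximation the module doc describes: the edges give reachability, not the exact runtime target.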


@@ -0,0 +1,471 @@
//! Indirect-dispatch reachability for vtable-bound `bcctrl`/`bctrl` sites.
//!
//! Walks each detected function with a tiny per-basic-block register tracker,
//! recognising the canonical MSVC PowerPC pattern that loads a slot from a
//! statically-addressed vtable into CTR and indirectly calls it:
//!
//! ```text
//! lis rA, hi
//! addi rA, rA, lo ; rA = vtable_address
//! lwz rB, slot*4(rA) ; rB = vtable[slot]
//! mtctr rB ; CTR = vtable[slot]
//! bcctrl ; indirect call → vtable[slot]
//! ```
//!
//! Pattern hits are emitted as `(source_pc, target_pc)` pairs that callers
//! insert into the `xrefs` table with `kind='ind_call'`.
//!
//! ### What this does NOT cover
//!
//! - Vtable pointer loaded from a `this`-pointer field (`lwz rA, off(this)`)
//! is the dominant pattern in real C++ code; resolving it requires
//! alias / points-to analysis that's far beyond this layer's scope.
//! - Indirect calls via function-pointer fields (callbacks) are similarly
//! unresolvable without object-flow analysis.
//! - Register state is intentionally killed at every label (basic-block
//! boundary) — we don't try to do flow-sensitive merging across joins.
//!
//! Reference: IBM PowerPC ABI on register-save convention, plus the
//! `xenia_analysis::xref` `lis+addi`/`lis+ori` tracker which we mirror
//! conceptually.
use std::collections::{BTreeMap, HashMap, HashSet};
use crate::func::FuncAnalysis;
use crate::vtables::Vtable;
/// One detected indirect-call edge: `bcctrl` at `source` jumps to `target`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IndirectEdge {
pub source: u32,
pub target: u32,
/// Vtable the source resolved through.
pub via_vtable: u32,
/// Method slot index within the vtable.
pub slot: u32,
}
#[derive(Debug, Clone, Copy)]
enum RegVal {
/// Register holds a known constant (e.g. after `lis+addi`).
Const(u32),
/// Register holds a method pointer loaded from a known vtable slot.
MethodPtr {
vtable_addr: u32,
slot: u32,
method_pc: u32,
},
}
const OP_ADDI: u32 = 14;
const OP_ADDIS: u32 = 15;
const OP_BCCTR: u32 = 19; // also covers blr — distinguish via XO
const OP_LWZ: u32 = 32;
const OP_ORI: u32 = 24;
const OP_X_FORM: u32 = 31; // mtspr / mr / etc.
/// Run the static indirect-dispatch scan. Returns one edge per resolvable
/// `bcctrl` site.
#[tracing::instrument(skip_all, fields(image_base = format_args!("{:#010x}", image_base)))]
pub fn analyze(
pe: &[u8],
image_base: u32,
func_analysis: &FuncAnalysis,
vtables: &[Vtable],
labels: &HashMap<u32, String>,
) -> Vec<IndirectEdge> {
let started = std::time::Instant::now();
// Index vtables by their start VA so the lwz handler can decide
// whether a given Const(addr) is "really" a vtable.
let vtable_by_addr: BTreeMap<u32, &Vtable> =
vtables.iter().map(|v| (v.address, v)).collect();
// Set of all "label"-bearing PCs in the analyzed binary. We treat each
// label as a basic-block boundary (anything `loc_*` is a jump target,
// so register state arriving at it is unreliable).
let mut block_boundaries: HashSet<u32> = HashSet::with_capacity(labels.len());
for &addr in labels.keys() {
block_boundaries.insert(addr);
}
let mut edges: Vec<IndirectEdge> = Vec::new();
for (&fn_start, fi) in &func_analysis.functions {
if fi.is_saverestore { continue; }
let mut reg: [Option<RegVal>; 32] = [None; 32];
let mut ctr: Option<RegVal> = None;
let mut pc = fn_start;
while pc < fi.end {
// Reset register state on basic-block entry. We don't reset on
// the function entry itself (PC == fn_start) because labels and
// function-starts coincide; the initial state is already None.
if pc != fn_start && block_boundaries.contains(&pc) {
reg = [None; 32];
ctr = None;
}
let instr = match read_instr(pe, image_base, pc) {
Some(i) => i,
None => break,
};
let op = instr >> 26;
let rd = ((instr >> 21) & 0x1F) as usize;
let ra = ((instr >> 16) & 0x1F) as usize;
let simm = ((instr & 0xFFFF) as i16) as i32;
let uimm = instr & 0xFFFF;
match op {
// lis rD, IMM (== addis rD, r0, IMM)
OP_ADDIS if ra == 0 => {
reg[rd] = Some(RegVal::Const(uimm << 16));
}
// addis rD, rA, IMM
OP_ADDIS => {
if let Some(RegVal::Const(b)) = reg[ra] {
reg[rd] = Some(RegVal::Const(b.wrapping_add(uimm << 16)));
} else {
reg[rd] = None;
}
}
// addi rD, rA, IMM
OP_ADDI if ra != 0 => {
if let Some(RegVal::Const(b)) = reg[ra] {
reg[rd] = Some(RegVal::Const(b.wrapping_add(simm as u32)));
} else {
reg[rd] = None;
}
}
// li rD, IMM (== addi rD, 0, IMM)
OP_ADDI => {
reg[rd] = Some(RegVal::Const(simm as u32));
}
// ori rA, rS, IMM — note operand order: bits 21..25 = rS, 16..20 = rA
OP_ORI => {
let rs = rd; // bits 21..25 = source
if let Some(RegVal::Const(b)) = reg[rs] {
reg[ra] = Some(RegVal::Const(b | uimm));
} else {
reg[ra] = None;
}
}
// lwz rD, off(rA) — try to resolve as vtable slot load.
OP_LWZ => {
if ra != 0
&& let Some(RegVal::Const(base)) = reg[ra]
{
let target = base.wrapping_add(simm as u32);
// Resolve `base + simm` against the known vtables. The
// range lookup handles any in-bounds slot, including
// slot 0 at the exact vtable head; the exact-head
// fallback only fires if the range lookup rejects it.
let resolved = resolve_vtable_slot(target, &vtable_by_addr)
.or_else(|| resolve_vtable_slot_via_off(base, simm, &vtable_by_addr));
reg[rd] = resolved.map(|(vt, slot, pc)| RegVal::MethodPtr {
vtable_addr: vt, slot, method_pc: pc,
});
} else {
reg[rd] = None;
}
}
// X-form (opcode 31): mtspr/mtctr, mr, etc. `bcctrl` is opcode 19 and is handled below.
OP_X_FORM => {
let xo = (instr >> 1) & 0x3FF;
match xo {
467 => {
// mtspr SPR, rS — PPC SPR field is split: high 5 bits
// in PPC bits 16:20 (= Rust bits 11..15), low 5 bits
// in PPC bits 11:15 (= Rust bits 16..20). Mirrors
// the convention in `func.rs::is_mfspr_lr`.
let spr = (((instr >> 11) & 0x1F) << 5) | ((instr >> 16) & 0x1F);
if spr == 9 {
ctr = reg[rd];
}
// Otherwise no observable effect on tracked state.
}
// Any other X-form op may write a GPR: arithmetic and indexed
// loads write rD (bits 21..25), while logical, shift and extend
// ops (and/or/`mr`/srw/extsb/...) write rA (bits 16..20).
// Invalidate both. This is over-eager but safe: it can only
// drop edges, never create false ones.
_ => {
reg[rd] = None;
reg[ra] = None;
}
}
}
// bcctr/bcctrl — opcode 19, XO=528. LK in low bit.
OP_BCCTR => {
let xo = (instr >> 1) & 0x3FF;
if xo == 528 {
let lk = (instr & 1) != 0;
if lk
&& let Some(RegVal::MethodPtr { vtable_addr, slot, method_pc }) = ctr
{
edges.push(IndirectEdge {
source: pc,
target: method_pc,
via_vtable: vtable_addr,
slot,
});
}
// An indirect call clobbers the same volatile set as `bl`
// (r0..r12 + ctr); r13..r31 are callee-saved and survive.
if lk {
for r in 0..=12 { reg[r] = None; }
ctr = None;
}
}
}
// op 18: b / bl / ba / bla. LK=1 is a call; LK=0 is an
// unconditional branch with no fall-through (next PC is
// reached only via a different basic block, which the
// label-based reset already handles). On a call, the
// PowerPC ABI marks r0..r12 + ctr as volatile and
// r13..r31 as non-volatile (callee-saved); preserve the
// non-volatile half so vtable pointers loaded into r30/r31
// before a `bl` survive the call.
18 => {
let lk = (instr & 1) != 0;
if lk {
for r in 0..=12 { reg[r] = None; }
ctr = None;
}
// LK=0 (`b`) makes fall-through unreachable; nothing to do —
// any next reachable PC will hit a label boundary.
}
// Conditional branches (op 16) fall through; preserve all reg
// state for the fall-through path. The label-based join-point
// invalidation bounds false-positive risk for jump-IN paths.
16 => {
let lk = (instr & 1) != 0;
if lk {
for r in 0..=12 { reg[r] = None; }
ctr = None;
}
}
// Stores and loads we don't track explicitly clobber rD only
// when rD is on the destination side; the conservative rule
// is "any non-recognised opcode that may write rD invalidates it".
33..=55 => {
// Loads write rD; stores don't. lwz (32) is handled above, so
// the arm starts at 33 to cover lwzu/lbz/lbzu as well. Invalidate
// rD for the remaining load families (33..=35, 40..=43, 48..=51)
// and leave it alone for stores.
if matches!(op, 32..=35 | 40..=43 | 48..=51) {
reg[rd] = None;
}
}
_ => {}
}
pc = pc.wrapping_add(4);
}
}
let elapsed_ms = started.elapsed().as_millis() as f64;
metrics::histogram!("analysis.phase_ms", "phase" => "indirect").record(elapsed_ms);
tracing::info!(
edges = edges.len(),
elapsed_ms,
"indirect-dispatch scan complete"
);
edges
}
fn read_instr(pe: &[u8], image_base: u32, addr: u32) -> Option<u32> {
let off = addr.wrapping_sub(image_base) as usize;
if off + 4 > pe.len() { return None; }
Some(u32::from_be_bytes([pe[off], pe[off + 1], pe[off + 2], pe[off + 3]]))
}
/// `target = base + simm` where `target` is an exact vtable head (rare,
/// compiler hoists the slot offset into the addi).
fn resolve_vtable_slot_via_off(
base: u32,
simm: i32,
vtable_by_addr: &BTreeMap<u32, &Vtable>,
) -> Option<(u32, u32, u32)> {
let target = base.wrapping_add(simm as u32);
if let Some(v) = vtable_by_addr.get(&target)
&& !v.methods.is_empty()
{
return Some((v.address, 0, v.methods[0]));
}
None
}
/// `target` is an absolute address. If it falls inside a known vtable's
/// `[address, address + length*4)` range AND is 4-aligned to a slot,
/// return `(vtable_addr, slot, method_pc)`.
fn resolve_vtable_slot(
target: u32,
vtable_by_addr: &BTreeMap<u32, &Vtable>,
) -> Option<(u32, u32, u32)> {
// BTreeMap range search for the largest key ≤ target.
let (&vt_addr, vt) = vtable_by_addr.range(..=target).next_back()?;
if target < vt_addr { return None; }
let off = target - vt_addr;
if !off.is_multiple_of(4) { return None; }
let slot = off / 4;
if slot >= vt.length { return None; }
let method_pc = *vt.methods.get(slot as usize)?;
Some((vt_addr, slot, method_pc))
}
#[cfg(test)]
mod tests {
use super::*;
use crate::func::FuncInfo;
use std::collections::BTreeMap;
fn mk_vtable(addr: u32, methods: Vec<u32>) -> Vtable {
Vtable {
address: addr,
length: methods.len() as u32,
col_address: None,
class_name: "ANON_test".into(),
rtti_present: false,
base_classes_json: None,
methods,
}
}
/// Encode the canonical pattern at PC `start`:
/// lis r3, hi
/// addi r3, r3, lo ; r3 = vtable_addr
/// lwz r4, slot*4(r3) ; r4 = vtable[slot]
/// mtctr r4
/// bcctrl
fn encode_pattern(buf: &mut [u8], offset: usize, vtable_addr: u32, slot_off: i32) {
let hi = (vtable_addr >> 16) as u16;
let lo = (vtable_addr & 0xFFFF) as i16;
let lis = (15u32 << 26) | (3 << 21) | (0 << 16) | (hi as u32);
// addi r3, r3, lo (signed) — note: addi is treated as signed
let addi = (14u32 << 26) | (3 << 21) | (3 << 16) | ((lo as u16) as u32);
let lwz = (32u32 << 26) | (4 << 21) | (3 << 16) | ((slot_off as u16) as u32);
// mtctr r4 = mtspr CTR(=9), r4. SPR_low (=9) → Rust bits 16-20;
// SPR_high (=0) → Rust bits 11-15. Rc bit 0.
let mtctr = (31u32 << 26) | (4 << 21) | (9 << 16) | (0 << 11) | (467 << 1);
let bcctrl = (19u32 << 26) | (20 << 21) | (528 << 1) | 1; // bcctrl 20, 0
let words = [lis, addi, lwz, mtctr, bcctrl];
for (i, w) in words.iter().enumerate() {
buf[offset + i * 4..offset + i * 4 + 4].copy_from_slice(&w.to_be_bytes());
}
}
#[test]
fn detects_canonical_lis_addi_lwz_mtctr_bcctrl() {
let image_base = 0x82000000u32;
let text_va = 0x1000u32;
let pc_start = image_base + text_va;
let vtable_addr = 0x82010000u32;
// PE: just the .text we'll write the pattern into.
let mut pe = vec![0u8; 0x1100];
encode_pattern(&mut pe, text_va as usize, vtable_addr, 8); // slot 2
let mut functions: BTreeMap<u32, FuncInfo> = BTreeMap::new();
functions.insert(pc_start, FuncInfo {
start: pc_start,
end: pc_start + 5 * 4,
frame_size: 0,
saved_gprs: 0,
is_leaf: false,
is_saverestore: false,
pdata_validated: false,
pdata_length: None,
has_eh: false,
});
let func_analysis = FuncAnalysis {
functions,
save_gpr_base: None,
restore_gpr_base: None,
pdata_entries: Vec::new(),
};
let vtables = vec![mk_vtable(vtable_addr, vec![0xAA, 0xBB, 0xCC, 0xDD])];
let labels: HashMap<u32, String> = HashMap::new();
let edges = analyze(&pe, image_base, &func_analysis, &vtables, &labels);
assert_eq!(edges.len(), 1);
assert_eq!(edges[0].source, pc_start + 4 * 4); // bcctrl at 5th instruction
assert_eq!(edges[0].target, 0xCC); // slot 2
assert_eq!(edges[0].via_vtable, vtable_addr);
assert_eq!(edges[0].slot, 2);
}
#[test]
fn out_of_range_slot_yields_no_edge() {
let image_base = 0x82000000u32;
let text_va = 0x1000u32;
let pc_start = image_base + text_va;
let vtable_addr = 0x82010000u32;
let mut pe = vec![0u8; 0x1100];
// Encode slot 12, but vtable only has 4 methods.
encode_pattern(&mut pe, text_va as usize, vtable_addr, 48);
let mut functions: BTreeMap<u32, FuncInfo> = BTreeMap::new();
functions.insert(pc_start, FuncInfo {
start: pc_start,
end: pc_start + 5 * 4,
frame_size: 0,
saved_gprs: 0,
is_leaf: false,
is_saverestore: false,
pdata_validated: false,
pdata_length: None,
has_eh: false,
});
let func_analysis = FuncAnalysis {
functions,
save_gpr_base: None,
restore_gpr_base: None,
pdata_entries: Vec::new(),
};
let vtables = vec![mk_vtable(vtable_addr, vec![0xAA, 0xBB, 0xCC, 0xDD])];
let labels: HashMap<u32, String> = HashMap::new();
let edges = analyze(&pe, image_base, &func_analysis, &vtables, &labels);
assert_eq!(edges.len(), 0);
}
#[test]
fn label_in_middle_kills_state() {
let image_base = 0x82000000u32;
let text_va = 0x1000u32;
let pc_start = image_base + text_va;
let vtable_addr = 0x82010000u32;
let mut pe = vec![0u8; 0x1100];
encode_pattern(&mut pe, text_va as usize, vtable_addr, 0);
let mut functions: BTreeMap<u32, FuncInfo> = BTreeMap::new();
functions.insert(pc_start, FuncInfo {
start: pc_start,
end: pc_start + 5 * 4,
frame_size: 0,
saved_gprs: 0,
is_leaf: false,
is_saverestore: false,
pdata_validated: false,
pdata_length: None,
has_eh: false,
});
let func_analysis = FuncAnalysis {
functions,
save_gpr_base: None,
restore_gpr_base: None,
pdata_entries: Vec::new(),
};
let vtables = vec![mk_vtable(vtable_addr, vec![0xAA, 0xBB])];
// Label between addi and lwz — must kill the Const tracking.
let mut labels: HashMap<u32, String> = HashMap::new();
labels.insert(pc_start + 8, "loc_mid".to_string());
let edges = analyze(&pe, image_base, &func_analysis, &vtables, &labels);
assert_eq!(edges.len(), 0, "label in middle of pattern must kill register state");
}
}
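
The test helper `encode_pattern` above passes the plain high half of the address to `lis`. That works for its test addresses, but `addi` sign-extends its 16-bit immediate, so an address whose low half is 0x8000 or above needs a high-adjusted upper half. A minimal sketch of the arithmetic follows; the helper names `split_ha_lo` and `reconstruct` are hypothetical and not part of the crate.

```rust
/// Split a 32-bit address into the immediates for
/// `lis rD, ha` followed by `addi rD, rD, lo`.
/// `ha` is the high half adjusted for the sign extension of `lo`.
pub fn split_ha_lo(addr: u32) -> (u16, i16) {
    let lo = (addr & 0xFFFF) as u16 as i16;
    // Adding 0x8000 before the shift carries into the high half exactly
    // when `lo` is negative as an i16.
    let ha = (addr.wrapping_add(0x8000) >> 16) as u16;
    (ha, lo)
}

/// Rebuild the register value the way a `lis`+`addi` tracker would:
/// shift the `lis` immediate up, then add the sign-extended `addi` immediate.
pub fn reconstruct(ha: u16, lo: i16) -> u32 {
    ((ha as u32) << 16).wrapping_add(lo as i32 as u32)
}
```

For `0x8201_8004` the split gives `ha = 0x8202` and a negative `lo`, and `reconstruct` returns the original address; using the unadjusted `0x8201` instead lands 0x10000 too low. For addresses such as `0x8201_0040`, the adjusted and plain high halves coincide, which is why the tests above are unaffected.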


@@ -2,9 +2,22 @@ pub mod ppc;
pub mod func;
pub mod xref;
pub mod db;
pub mod disasm;
pub mod formatter;
pub mod sinks;
pub mod sql_views;
pub mod demangle;
pub mod vtables;
pub mod lookup;
pub mod indirect;
pub mod ind_dispatch_typed;
pub mod strings;
pub mod funcptr_arrays;
pub mod eh_scope;
pub mod static_init;
mod ordinals;
pub use ordinals::resolve_ordinal;
pub use xref::{XrefKind, Xref, XrefMap, resolve_source_label};
pub use db::{DbWriter, ExecTraceEntry, ImportCallEntry, BranchTraceEntry};
pub use disasm::{RichDisasmItem, enrich_section};


@@ -0,0 +1,222 @@
//! Symbolic-name resolution for runtime probes (M4).
//!
//! Lets `--pc-probe` / `--branch-probe` / `--ctor-probe` accept names like
//! `xe::apu::AudioSystem::Setup` or `MyClass::*` instead of bare PC literals.
//! Resolution joins the M3-produced `classes` × `methods` × `functions` tables
//! and the M2 `demangled_names` table.
//!
//! Numeric tokens (`0x824D6640`, `2186674160`) are returned unchanged; symbolic
//! tokens require a path to an existing `sylpheed.db` (passed by the caller).
//!
//! All DB access is read-only and happens before guest execution, so the
//! lockstep digest is unaffected.
use std::path::Path;
use anyhow::{anyhow, Result};
use duckdb::params;
/// Parse one probe token into one or more PCs.
///
/// Recognized forms:
/// - `0xADDR` / `ADDR` (decimal) → returns one PC unchanged.
/// - `Class::method` → all `methods.function_address` matching that
/// `class_name` + `method_name` pair.
/// - `Class::*` → all `methods.function_address` for that class.
/// - `func::Name` (free function) → falls back to `functions.name` lookup.
///
/// `db_path` is consulted ONLY if the token is non-numeric. When `db_path` is
/// `None` and the token is symbolic, returns an error suggesting the user
/// either pass `--probe-db` or use a numeric address.
pub fn resolve_probe_token(db_path: Option<&Path>, token: &str) -> Result<Vec<u32>> {
let token = token.trim();
if token.is_empty() {
return Ok(vec![]);
}
if let Some(pc) = parse_numeric(token) {
return Ok(vec![pc]);
}
let db = db_path.ok_or_else(|| {
anyhow!(
"symbolic probe token {token:?} requires a sylpheed.db; \
pass --probe-db=PATH or use a numeric 0x… address",
)
})?;
if !db.exists() {
return Err(anyhow!("--probe-db not found: {}", db.display()));
}
let conn = duckdb::Connection::open_with_flags(
db,
duckdb::Config::default().access_mode(duckdb::AccessMode::ReadOnly)?,
)?;
// Class::method or Class::*
if let Some((class, method)) = token.split_once("::") {
if method == "*" {
return resolve_class_star(&conn, class);
}
// Try Class::method first, then fall back to functions.name lookup.
let pcs = resolve_class_method(&conn, class, method)?;
if !pcs.is_empty() {
return Ok(pcs);
}
}
// Last-resort: functions.name match (e.g. for `entry_point` or
// `__savegprlr_22`). Substring-free; user gets a clear error if missing.
resolve_function_name(&conn, token)
}
fn parse_numeric(token: &str) -> Option<u32> {
if let Some(hex) = token.strip_prefix("0x").or_else(|| token.strip_prefix("0X")) {
return u32::from_str_radix(hex, 16).ok();
}
token.parse::<u32>().ok()
}
fn resolve_class_method(conn: &duckdb::Connection, class: &str, method: &str) -> Result<Vec<u32>> {
// Single query: join `methods` to `classes` (by vtable address) and to
// `demangled_names` (by function address), then surface function_address.
// An empty result is not an error here; the caller falls back to a
// `functions.name` lookup.
let mut stmt = conn.prepare(
"SELECT DISTINCT m.function_address FROM methods m
JOIN classes c ON c.vtable_address = m.vtable_address
JOIN demangled_names dn ON dn.address = m.function_address
WHERE c.name = ? AND dn.method_name = ?",
)?;
let pcs: Vec<u32> = stmt
.query_map(params![class, method], |r| r.get::<_, i64>(0).map(|x| x as u32))?
.filter_map(|r| r.ok())
.collect();
Ok(pcs)
}
fn resolve_class_star(conn: &duckdb::Connection, class: &str) -> Result<Vec<u32>> {
let mut stmt = conn.prepare(
"SELECT DISTINCT m.function_address FROM methods m
JOIN classes c ON c.vtable_address = m.vtable_address
WHERE c.name = ?",
)?;
let pcs: Vec<u32> = stmt
.query_map(params![class], |r| r.get::<_, i64>(0).map(|x| x as u32))?
.filter_map(|r| r.ok())
.collect();
if pcs.is_empty() {
return Err(anyhow!(
"no class named {class:?} found in classes table — has --dis populated this DB?",
));
}
Ok(pcs)
}
fn resolve_function_name(conn: &duckdb::Connection, name: &str) -> Result<Vec<u32>> {
let mut stmt = conn.prepare("SELECT address FROM functions WHERE name = ?")?;
let pcs: Vec<u32> = stmt
.query_map(params![name], |r| r.get::<_, i64>(0).map(|x| x as u32))?
.filter_map(|r| r.ok())
.collect();
if pcs.is_empty() {
return Err(anyhow!(
"probe token {name:?} did not match any classes::methods or functions row",
));
}
Ok(pcs)
}
#[cfg(test)]
mod tests {
use super::*;
use duckdb::Connection;
fn build_synthetic_db(path: &Path) {
let conn = Connection::open(path).expect("open");
conn.execute_batch(
"
CREATE TABLE functions (
address BIGINT PRIMARY KEY,
name VARCHAR
);
CREATE TABLE classes (
name VARCHAR PRIMARY KEY,
vtable_address BIGINT,
rtti_present BOOLEAN,
base_classes_json VARCHAR
);
CREATE TABLE methods (
vtable_address BIGINT,
slot BIGINT,
function_address BIGINT,
mangled_name VARCHAR,
demangled_name VARCHAR,
PRIMARY KEY (vtable_address, slot)
);
CREATE TABLE demangled_names (
address BIGINT,
mangled VARCHAR,
raw_demangled VARCHAR,
namespace_path VARCHAR,
class_name VARCHAR,
method_name VARCHAR,
params_signature VARCHAR
);
INSERT INTO classes VALUES ('Foo', 11000, true, NULL);
INSERT INTO functions VALUES (12000, 'sub_2EE0'), (12100, 'sub_2F44');
INSERT INTO methods VALUES (11000, 0, 12000, NULL, NULL),
(11000, 1, 12100, NULL, NULL);
INSERT INTO demangled_names (address, mangled, raw_demangled, class_name, method_name)
VALUES (12000, '?bar@Foo@@QEAAXXZ', 'void Foo::bar(void)', 'Foo', 'bar'),
(12100, '?baz@Foo@@QEAAXXZ', 'void Foo::baz(void)', 'Foo', 'baz');
",
)
.expect("seed");
}
#[test]
fn numeric_passthrough_no_db_needed() {
let pcs = resolve_probe_token(None, "0x824D6640").unwrap();
assert_eq!(pcs, vec![0x824D6640]);
let pcs = resolve_probe_token(None, "2186095088").unwrap();
assert_eq!(pcs, vec![0x824D29F0]);
}
#[test]
fn symbolic_token_without_db_errors() {
let err = resolve_probe_token(None, "Foo::bar").unwrap_err();
assert!(format!("{err}").contains("requires a sylpheed.db"));
}
#[test]
fn class_method_resolves() {
let tmp = std::env::temp_dir().join("xenia_lookup_test.duckdb");
let _ = std::fs::remove_file(&tmp);
build_synthetic_db(&tmp);
let pcs = resolve_probe_token(Some(&tmp), "Foo::bar").unwrap();
assert_eq!(pcs, vec![12000]);
let _ = std::fs::remove_file(&tmp);
}
#[test]
fn class_star_returns_all_methods() {
let tmp = std::env::temp_dir().join("xenia_lookup_star.duckdb");
let _ = std::fs::remove_file(&tmp);
build_synthetic_db(&tmp);
let mut pcs = resolve_probe_token(Some(&tmp), "Foo::*").unwrap();
pcs.sort();
assert_eq!(pcs, vec![12000, 12100]);
let _ = std::fs::remove_file(&tmp);
}
#[test]
fn function_name_fallback() {
let tmp = std::env::temp_dir().join("xenia_lookup_fn.duckdb");
let _ = std::fs::remove_file(&tmp);
build_synthetic_db(&tmp);
let pcs = resolve_probe_token(Some(&tmp), "sub_2EE0").unwrap();
assert_eq!(pcs, vec![12000]);
let _ = std::fs::remove_file(&tmp);
}
}
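
The token grammar described in the `resolve_probe_token` doc comment can be exercised without a database. The sketch below is a hypothetical, std-only mirror of the classification step: `ProbeToken`, `parse_numeric_token`, and `classify` are illustrative names that do not exist in the crate, and no DuckDB lookup is performed.

```rust
/// Hypothetical classification of a probe token; mirrors the order of checks
/// in `resolve_probe_token` (numeric first, then `::` split, then plain name).
#[derive(Debug, PartialEq)]
enum ProbeToken<'a> {
    Pc(u32),
    ClassStar(&'a str),
    ClassMethod(&'a str, &'a str),
    Name(&'a str),
}

/// Accepts `0x`/`0X`-prefixed hex or plain decimal, like `parse_numeric`.
fn parse_numeric_token(token: &str) -> Option<u32> {
    let token = token.trim();
    if let Some(hex) = token.strip_prefix("0x").or_else(|| token.strip_prefix("0X")) {
        return u32::from_str_radix(hex, 16).ok();
    }
    token.parse::<u32>().ok()
}

/// Returns `None` for an empty token (the real resolver returns an empty Vec).
fn classify(token: &str) -> Option<ProbeToken<'_>> {
    let token = token.trim();
    if token.is_empty() {
        return None;
    }
    if let Some(pc) = parse_numeric_token(token) {
        return Some(ProbeToken::Pc(pc));
    }
    // `split_once` splits at the FIRST `::`, exactly as the resolver does.
    match token.split_once("::") {
        Some((class, "*")) => Some(ProbeToken::ClassStar(class)),
        Some((class, method)) => Some(ProbeToken::ClassMethod(class, method)),
        None => Some(ProbeToken::Name(token)),
    }
}
```

One consequence worth noting: because the split happens at the first `::`, a namespaced token such as `xe::apu::AudioSystem::Setup` classifies as class `xe` with method `apu::AudioSystem::Setup`. In the resolver that pair most likely matches no method row, so such tokens appear to depend on the `functions.name` fallback.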

File diff suppressed because it is too large

@@ -0,0 +1,37 @@
//! DuckDB sink — appends rich disasm items to the `instructions` table.
//!
//! Column layout matches [`crate::db`]: address, raw, mnemonic, operands, disasm,
//! ext_mnemonic, ext_operands, ext_disasm, branch target, section, function, label.
use duckdb::{Appender, params};
use crate::disasm::RichDisasmItem;
/// Append every item to the appender. Returns the number of rows written.
/// Does NOT flush — the caller decides when to flush, since multiple
/// section iterators typically share one appender.
pub fn append_instructions<'a>(
appender: &mut Appender<'_>,
items: impl IntoIterator<Item = RichDisasmItem<'a>>,
) -> duckdb::Result<u64> {
let mut count: u64 = 0;
for ri in items {
let t = &ri.item.text;
appender.append_row(params![
ri.item.addr as i64,
ri.item.raw as i64,
t.mnemonic.as_str(),
t.operands.as_str(),
t.disasm.as_str(),
t.ext_mnemonic.as_deref(),
t.ext_operands.as_deref(),
t.ext_disasm.as_deref(),
t.branch_target.map(|t| t as i64),
ri.section,
ri.function.map(|f| f as i64),
ri.label,
])?;
count += 1;
}
Ok(count)
}

@@ -0,0 +1,63 @@
//! JSON Lines sink — one structured row per line, constant memory.
//!
//! Suited for piping into `jq`, importing into pandas / DuckDB's
//! `read_json_auto`, or feeding downstream tooling that expects a
//! line-delimited stream rather than a single megaobject.
use std::io::{self, Write};
use serde::Serialize;
use crate::disasm::RichDisasmItem;
#[derive(Serialize)]
struct JsonRow<'a> {
addr: u32,
raw: u32,
mnemonic: &'a str,
operands: &'a str,
disasm: &'a str,
#[serde(skip_serializing_if = "Option::is_none")]
ext_mnemonic: Option<&'a str>,
#[serde(skip_serializing_if = "Option::is_none")]
ext_operands: Option<&'a str>,
#[serde(skip_serializing_if = "Option::is_none")]
ext_disasm: Option<&'a str>,
#[serde(skip_serializing_if = "Option::is_none")]
branch_target: Option<u32>,
section: &'a str,
#[serde(skip_serializing_if = "Option::is_none")]
function: Option<u32>,
#[serde(skip_serializing_if = "Option::is_none")]
label: Option<&'a str>,
}
/// Write each item as a single JSON object on its own line. Returns the
/// number of rows written.
pub fn write_jsonl<'a, W: Write>(
out: &mut W,
items: impl IntoIterator<Item = RichDisasmItem<'a>>,
) -> io::Result<u64> {
let mut count: u64 = 0;
for ri in items {
let t = &ri.item.text;
let row = JsonRow {
addr: ri.item.addr,
raw: ri.item.raw,
mnemonic: &t.mnemonic,
operands: &t.operands,
disasm: &t.disasm,
ext_mnemonic: t.ext_mnemonic.as_deref(),
ext_operands: t.ext_operands.as_deref(),
ext_disasm: t.ext_disasm.as_deref(),
branch_target: t.branch_target,
section: ri.section,
function: ri.function,
label: ri.label,
};
serde_json::to_writer(&mut *out, &row)?;
out.write_all(b"\n")?;
count += 1;
}
Ok(count)
}

@@ -0,0 +1,8 @@
//! Output sinks for [`crate::disasm::RichDisasmItem`] streams.
//!
//! Each sink consumes the same iterator shape and writes to a different
//! medium: human-readable .asm text, JSON Lines, or DuckDB rows.
pub mod duckdb;
pub mod json;
pub mod text;

@@ -0,0 +1,58 @@
//! Text sink — renders one .asm instruction line with optional
//! branch-target / data-ref annotations.
//!
//! The full `write_asm` orchestration (section headers, function prologue
//! info, xref comment blocks, hex-dump of data sections) stays in
//! [`crate::formatter`]; this sink only owns the per-instruction line.
use std::collections::HashMap;
use std::io::{self, Write};
use xenia_xex::pe::PeSection;
use crate::disasm::RichDisasmItem;
use crate::xref::{XrefKind, section_for_addr};
/// Render one instruction line:
/// ` 82000000: 60000000 nop`
/// ` 82000004: 4BFFFFFD bl 0x82000000 ; -> entry_point`
/// ` 82000010: 812A0000 lwz r9, 0(r10) ; [R] 0x828A0000 (.rdata) = dat_…`
pub fn write_instr_line<W: Write + ?Sized>(
out: &mut W,
item: &RichDisasmItem<'_>,
labels: &HashMap<u32, String>,
sections: &[PeSection],
image_base: u32,
data_annotation: Option<(u32, XrefKind)>,
) -> io::Result<()> {
let disasm_text = item.item.text.display();
// Branch-target → label annotation. Uses the structured `branch_target`
// field (cleaner than the legacy "find 0x in disasm string" regex).
let mut annotated = match item.item.text.branch_target {
Some(target) => match labels.get(&target) {
Some(lbl) => format!("{disasm_text:<40} ; -> {lbl}"),
None => disasm_text.to_string(),
},
None => disasm_text.to_string(),
};
if let Some((data_addr, kind)) = data_annotation {
let tag = match kind {
XrefKind::DataRead => "[R]",
XrefKind::DataWrite => "[W]",
_ => "[&]",
};
let sec = section_for_addr(data_addr, sections, image_base).unwrap_or("?");
let data_lbl = labels.get(&data_addr)
.map(|s| format!(" = {s}"))
.unwrap_or_default();
if !annotated.contains("; ->") {
annotated = format!("{annotated:<40} ; {tag} 0x{data_addr:08X} ({sec}){data_lbl}");
} else {
annotated = format!("{annotated} {tag} 0x{data_addr:08X} ({sec}){data_lbl}");
}
}
writeln!(out, " {:08X}: {:08X} {}", item.item.addr, item.item.raw, annotated)
}
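
The branch-target annotation above can be tried in isolation. This is a minimal sketch, assuming a hypothetical `render_line` helper that takes plain values instead of a `RichDisasmItem`; it reproduces only the label lookup and the 40-column padding, not the data-ref tags.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the branch-annotation half of `write_instr_line`.
/// `disasm` is the already-rendered instruction text.
fn render_line(
    addr: u32,
    raw: u32,
    disasm: &str,
    branch_target: Option<u32>,
    labels: &HashMap<u32, String>,
) -> String {
    // Annotate only when the structured target resolves to a known label.
    let annotated = match branch_target.and_then(|t| labels.get(&t)) {
        Some(lbl) => format!("{disasm:<40} ; -> {lbl}"),
        None => disasm.to_string(),
    };
    format!(" {addr:08X}: {raw:08X} {annotated}")
}
```

Padding the disassembly to a fixed width keeps the `; ->` comments aligned in a column, which is presumably why the sink formats before appending the label rather than after.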

@@ -0,0 +1,165 @@
//! Additive SQL views over the Phase-3 ingest tables.
//!
//! These views are created when `--analyze=sql` or `--analyze=both` is set.
//! They are *not* a replacement for the Rust passes ([`crate::xref`],
//! [`crate::func`]) — those still own data-ref resolution and prologue
//! pattern matching. The views cover the cleanly-relational parts:
//!
//! - branch xrefs (self-join on `instructions.target_hex`)
//! - call graph + reachability (recursive CTE over `xrefs`)
//! - convenience joins (function-first-instruction, imports-called)
//!
//! All views are read-only and stable across re-creation: dropping and
//! recreating the database via [`crate::db::DbWriter::open_fresh`] re-runs
//! these definitions.
//!
//! ## Cross-check semantics
//!
//! `v_branch_xrefs` is intended to produce *exactly* the same `(source,
//! target, kind)` tuples as the Rust `xref.rs` first pass — given the same
//! input image. [`crate::db::DbWriter::cross_check_branch_xrefs`] queries
//! the symmetric difference and returns the row counts; both should be
//! zero. A non-zero count means the formatter's `mnemonic` column or the
//! kind-classification CASE drifted out of agreement with `xref.rs`, and
//! is worth a one-line warning at log time.
/// `(view_name, CREATE VIEW … SQL)` pairs in the order they must run.
/// Later views may depend on earlier ones (e.g. `v_call_graph` reads
/// `xrefs`, which is the Rust-pass table; `v_branch_xrefs` is independent).
pub const ALL_VIEWS: &[(&str, &str)] = &[
("v_branch_xrefs", V_BRANCH_XREFS),
("v_call_graph", V_CALL_GRAPH),
("v_reachability_from_entry", V_REACHABILITY_FROM_ENTRY),
("v_indirect_reachability_from_entry", V_INDIRECT_REACHABILITY_FROM_ENTRY),
("v_function_first_instruction", V_FUNCTION_FIRST_INSTRUCTION),
("v_imports_called", V_IMPORTS_CALLED),
];
/// Branch cross-references derived purely from `instructions.target_hex`.
///
/// Mirrors the kind classification in [`crate::xref::collect_branch_target`]
/// and the short tags returned by [`crate::xref::XrefKind::tag`] (which are
/// what `xrefs.kind` actually stores):
/// - I-form (`b`/`bl`/`ba`/`bla`): `bl`/`bla` → `"call"`, `b`/`ba` → `"j"`
/// - B-form (`bc`/`bcl`/`bca`/`bcla`): always → `"br"`
///
/// Indirect branches (`bclr`/`bcctr`) leave `target_hex` NULL and are
/// excluded from this view by design.
const V_BRANCH_XREFS: &str = "
CREATE OR REPLACE VIEW v_branch_xrefs AS
SELECT
address AS source,
target_hex AS target,
CASE
WHEN mnemonic IN ('bl', 'bla') THEN 'call'
WHEN mnemonic IN ('b', 'ba') THEN 'j'
WHEN mnemonic IN ('bc', 'bcl', 'bca', 'bcla') THEN 'br'
ELSE 'br'
END AS kind,
mnemonic AS instruction,
function AS source_func
FROM instructions
WHERE target_hex IS NOT NULL;
";
/// Call-graph edges resolved against function names.
///
/// Reads from `xrefs` (the Rust-pass table) — this is the canonical source
/// for *all* edge kinds, including indirect/data; SQL can't reconstruct the
/// data-ref edges cleanly because they require register tracking. For pure
/// branch edges, `v_branch_xrefs` produces equivalent rows directly from
/// `instructions`.
const V_CALL_GRAPH: &str = "
CREATE OR REPLACE VIEW v_call_graph AS
SELECT
x.source AS caller_addr,
cf.name AS caller_name,
x.target AS callee_addr,
tf.name AS callee_name,
x.kind AS edge_kind
FROM xrefs x
LEFT JOIN functions cf ON cf.address = x.source_func
LEFT JOIN functions tf ON tf.address = x.target
WHERE x.kind = 'call';
";
/// Transitive function-level reachability from the entry point over
/// call/jump/branch edges. Useful for finding dead code
/// (`SELECT address FROM functions
/// WHERE address NOT IN (SELECT addr FROM v_reachability_from_entry)`)
/// and for scoping analysis to the live subset.
///
/// Seeds from the function containing the `entry_point` label and walks
/// the recursive closure: a reachable function's instructions branch into
/// the functions enclosing the branch targets, which are then reachable
/// in turn. `UNION` (not `UNION ALL`) deduplicates to handle call-graph
/// cycles (recursive functions, mutually-recursive pairs).
const V_REACHABILITY_FROM_ENTRY: &str = "
CREATE OR REPLACE VIEW v_reachability_from_entry AS
WITH RECURSIVE reach(fn) AS (
SELECT i.function FROM instructions i
JOIN labels l ON l.address = i.address
WHERE l.name = 'entry_point' AND i.function IS NOT NULL
UNION
SELECT tgt.function FROM xrefs x
JOIN instructions src ON src.address = x.source
JOIN instructions tgt ON tgt.address = x.target
JOIN reach r ON src.function = r.fn
WHERE x.kind IN ('call', 'j', 'br')
AND tgt.function IS NOT NULL
)
SELECT fn AS addr FROM reach;
";
/// Reachability extended over `kind='ind_call'` edges from M5. Superset
/// of `v_reachability_from_entry`: every fn there is also here, plus any
/// function reached only via a vtable bcctrl whose vtable+slot the M5
/// dataflow could resolve. Spot-check a few newly reachable PCs (e.g.
/// five) against canary before relying on this view widely; the analysis
/// intentionally leaves out alias-dependent indirect calls (vtable loaded
/// from a `this` field).
const V_INDIRECT_REACHABILITY_FROM_ENTRY: &str = "
CREATE OR REPLACE VIEW v_indirect_reachability_from_entry AS
WITH RECURSIVE reach(fn) AS (
SELECT i.function FROM instructions i
JOIN labels l ON l.address = i.address
WHERE l.name = 'entry_point' AND i.function IS NOT NULL
UNION
SELECT tgt.function FROM xrefs x
JOIN instructions src ON src.address = x.source
JOIN instructions tgt ON tgt.address = x.target
JOIN reach r ON src.function = r.fn
WHERE x.kind IN ('call', 'ind_call', 'j', 'br')
AND tgt.function IS NOT NULL
)
SELECT fn AS addr FROM reach;
";
/// Convenience join: each function's first decoded instruction. Useful for
/// quickly inspecting prologue patterns without computing offsets manually.
const V_FUNCTION_FIRST_INSTRUCTION: &str = "
CREATE OR REPLACE VIEW v_function_first_instruction AS
SELECT
f.address AS function_addr,
f.name AS function_name,
i.raw AS first_raw,
i.disasm AS first_disasm,
i.ext_disasm AS first_ext_disasm
FROM functions f
JOIN instructions i ON i.address = f.address;
";
/// Per-function summary of which kernel/library imports it calls. Joins
/// xrefs (call edges) against the labels table to surface import names.
const V_IMPORTS_CALLED: &str = "
CREATE OR REPLACE VIEW v_imports_called AS
SELECT
x.source_func AS function_addr,
f.name AS function_name,
x.target AS import_addr,
l.name AS import_name
FROM xrefs x
JOIN labels l ON l.address = x.target
LEFT JOIN functions f ON f.address = x.source_func
WHERE x.kind = 'call'
AND l.kind = 'import';
";
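
For cross-checking the recursive reachability views outside SQL, the same closure can be computed in memory. The sketch below is an assumption-laden illustration: `reachable_from` and `dead_functions` are hypothetical helpers, and `edges` is assumed to already hold function-level `(caller_fn, callee_fn)` pairs (the views derive these by joining `xrefs` against `instructions`).

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Breadth-first closure over function-level edges. The `seen` set plays the
/// role of `UNION` in the recursive CTE: each function is visited once, so
/// cycles in the call graph terminate.
fn reachable_from(entry: u32, edges: &[(u32, u32)]) -> BTreeSet<u32> {
    let mut successors: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(from, to) in edges {
        successors.entry(from).or_default().push(to);
    }
    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::new();
    seen.insert(entry);
    queue.push_back(entry);
    while let Some(f) = queue.pop_front() {
        if let Some(next) = successors.get(&f) {
            for &t in next {
                // `insert` returns false for already-seen functions.
                if seen.insert(t) {
                    queue.push_back(t);
                }
            }
        }
    }
    seen
}

/// Functions not in the live set, analogous to the `NOT IN` dead-code query.
fn dead_functions(all: &[u32], live: &BTreeSet<u32>) -> Vec<u32> {
    all.iter().copied().filter(|f| !live.contains(f)).collect()
}
```

Comparing this set against `SELECT addr FROM v_reachability_from_entry` for the same edge list would be one way to sanity-check the view, in the same spirit as the branch-xref cross-check.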

@@ -0,0 +1,399 @@
//! M11.5 — static-initialiser driver detection.
//!
//! MSVC's CRT static-init driver (`_initterm` / `_initterm_e` style)
//! is a tight loop that walks a function-pointer array between two
//! addresses, calling each non-null entry:
//!
//! ```text
//! loop_top:
//! cmpw[l] rA, rB ; compare cursor vs end
//! beq done
//! lwz rN, 0(rA) ; load fn ptr
//! cmpwi rN, 0 ; null-skip (optional)
//! beq skip
//! mtctr rN
//! bcctrl
//! skip:
//! addi rA, rA, 4
//! b loop_top
//! done:
//! ```
//!
//! Two static addresses (`rA` and `rB` at loop start) bracket the
//! function-pointer array. Detection strategy: scan every function for
//! the canonical pattern; when found, extract the array bounds and
//! emit one row in `function_pointer_arrays` with `kind='static_init'`.
//!
//! ### What this layer does
//!
//! - Walks each function looking for an `lwz; mtctr; bcctrl` sequence
//! inside a loop bounded by a comparison against another constant.
//! - When the loop's cursor register is observed to be incremented by
//! exactly 4 per iteration, classifies it as a static-init driver
//! and records the (start, end) array bounds.
//!
//! ### What this layer does NOT do
//!
//! - No support for back-to-back drivers sharing a common loop trampoline.
//! - Does not reuse the M11 prologue-style heuristic; M11.5 is
//! structure-grounded and replaces the prior heuristic where it fires.
//! - No special handling for CRT-style `_initterm_e` (the `_e` variant
//! returns a status, which is ignored here); detection works for both
//! variants as long as the loop shape matches.
//!
//! Reference: Microsoft CRT `crt0.c::_initterm` source pattern.
use std::collections::{BTreeSet, HashMap, HashSet};
use crate::func::FuncAnalysis;
use crate::funcptr_arrays::FuncPtrArray;
use xenia_xex::pe::PeSection;
#[derive(Debug, Clone, Copy)]
pub struct StaticInitDriver {
/// VA of the driver function (the one containing the loop).
pub driver_function: u32,
/// VA of the array start.
pub array_start: u32,
/// VA one-past-end of the array.
pub array_end: u32,
/// Detected length in slots.
pub length: u32,
}
#[derive(Debug, Default)]
pub struct StaticInitResult {
pub drivers: Vec<StaticInitDriver>,
/// Newly-detected static-init arrays, ready to be merged into the
/// `function_pointer_arrays` table with `kind='static_init'`.
pub arrays: Vec<FuncPtrArray>,
}
const OP_ADDI: u32 = 14;
const OP_ADDIS: u32 = 15;
const OP_BCCTR: u32 = 19;
const OP_LWZ: u32 = 32;
const OP_X_FORM: u32 = 31;
#[derive(Debug, Clone, Copy)]
enum RegVal {
Const(u32),
}
#[tracing::instrument(skip_all, fields(image_base = format_args!("{:#010x}", image_base)))]
pub fn analyze(
pe: &[u8],
image_base: u32,
sections: &[PeSection],
func_analysis: &FuncAnalysis,
function_starts: &BTreeSet<u32>,
labels: &HashMap<u32, String>,
) -> StaticInitResult {
let started = std::time::Instant::now();
let block_boundaries: HashSet<u32> = labels.keys().copied().collect();
let mut drivers: Vec<StaticInitDriver> = Vec::new();
for (&fn_start, fi) in &func_analysis.functions {
if fi.is_saverestore { continue; }
if let Some(d) = scan_function_for_driver(
pe, image_base, fn_start, fi.end, &block_boundaries,
) {
drivers.push(d);
}
}
// Build arrays from the discovered drivers + section data.
let mut arrays: Vec<FuncPtrArray> = Vec::new();
for d in &drivers {
if let Some(entries) = read_array(pe, image_base, sections, d.array_start, d.array_end, function_starts) {
arrays.push(FuncPtrArray {
address: d.array_start,
length: entries.len() as u32,
kind: "static_init",
entries,
});
}
}
let elapsed_ms = started.elapsed().as_millis() as f64;
metrics::histogram!("analysis.phase_ms", "phase" => "static_init").record(elapsed_ms);
tracing::info!(
drivers = drivers.len(),
arrays = arrays.len(),
elapsed_ms,
"M11.5 static-init driver scan complete",
);
StaticInitResult { drivers, arrays }
}
/// Read the function-pointer array between [start, end) from .rdata/.data.
/// NULL entries are skipped (CRT _initterm explicitly tolerates them).
/// Non-function-start entries cause us to bail (the driver bounds were
/// likely misidentified).
fn read_array(
pe: &[u8],
image_base: u32,
sections: &[PeSection],
start: u32,
end: u32,
function_starts: &BTreeSet<u32>,
) -> Option<Vec<u32>> {
if end <= start || (end - start) > 4096 { return None; }
let _section = sections.iter().find(|s| {
let lo = image_base + s.virtual_address;
let hi = lo + s.virtual_size;
start >= lo && end <= hi && (s.name == ".rdata" || s.name == ".data")
})?;
let mut entries = Vec::new();
let mut p = start;
while p < end {
let off = p.wrapping_sub(image_base) as usize;
if off + 4 > pe.len() { return None; }
let v = u32::from_be_bytes([pe[off], pe[off + 1], pe[off + 2], pe[off + 3]]);
if v != 0 {
if !function_starts.contains(&v) { return None; }
entries.push(v);
}
p = p.wrapping_add(4);
}
if entries.is_empty() { return None; }
Some(entries)
}
/// Walk one function looking for the canonical static-init driver shape.
/// Returns Some when a cursor register starts at a known constant `rA`, a
/// second register holds a larger known constant `rB` (taken as the array
/// end), and the cursor increments by 4 with an `lwz; mtctr; bcctrl` body.
/// The compare and the loop back-edge themselves are not verified here.
fn scan_function_for_driver(
pe: &[u8],
image_base: u32,
fn_start: u32,
fn_end: u32,
block_boundaries: &HashSet<u32>,
) -> Option<StaticInitDriver> {
let mut reg: [Option<RegVal>; 32] = [None; 32];
// Pattern features observed during the walk.
let mut cursor_reg: Option<usize> = None;
let mut cursor_init: Option<u32> = None;
let mut end_reg: Option<usize> = None;
let mut end_init: Option<u32> = None;
let mut saw_lwz_through_cursor = false;
let mut saw_mtctr = false;
let mut saw_bcctrl = false;
let mut saw_addi_4 = false;
let mut pc = fn_start;
while pc < fn_end {
if pc != fn_start && block_boundaries.contains(&pc) {
// Heuristic: when we cross a basic-block boundary that
// is not the loop-top, accumulated state remains valid for
// pattern-matching purposes — but we drop register Const
// tracking to be safe.
reg = [None; 32];
}
let off = pc.wrapping_sub(image_base) as usize;
if off + 4 > pe.len() { break; }
let instr = u32::from_be_bytes([pe[off], pe[off + 1], pe[off + 2], pe[off + 3]]);
let op = instr >> 26;
let rd = ((instr >> 21) & 0x1F) as usize;
let ra = ((instr >> 16) & 0x1F) as usize;
let simm = ((instr & 0xFFFF) as i16) as i32;
let uimm = instr & 0xFFFF;
match op {
OP_ADDIS if ra == 0 => reg[rd] = Some(RegVal::Const(uimm << 16)),
OP_ADDIS => {
if let Some(RegVal::Const(b)) = reg[ra] {
reg[rd] = Some(RegVal::Const(b.wrapping_add(uimm << 16)));
} else { reg[rd] = None; }
}
OP_ADDI if ra != 0 => {
let prev = reg[ra];
if let Some(RegVal::Const(b)) = prev {
let v = b.wrapping_add(simm as u32);
reg[rd] = Some(RegVal::Const(v));
// Was this an `addi r, r, 4`? Mark cursor-increment.
if rd == ra && simm == 4 {
if Some(rd) == cursor_reg {
saw_addi_4 = true;
}
} else if cursor_reg.is_none() {
// First time we see a known-constant register that
// *could* be the cursor — defer the choice until we
// see a load through it.
cursor_init = Some(v);
cursor_reg = Some(rd);
} else if end_reg.is_none() && Some(rd) != cursor_reg {
end_init = Some(v);
end_reg = Some(rd);
}
} else { reg[rd] = None; }
}
// `li rD, imm` (addi with rA = 0) overwrites rD; forget any constant
// previously tracked for it so stale state cannot match the pattern.
OP_ADDI => reg[rd] = None,
OP_LWZ => {
if ra != 0 && Some(ra) == cursor_reg {
saw_lwz_through_cursor = true;
}
reg[rd] = None;
}
OP_X_FORM => {
let xo = (instr >> 1) & 0x3FF;
if xo == 467 {
let spr = (((instr >> 11) & 0x1F) << 5) | ((instr >> 16) & 0x1F);
if spr == 9 && saw_lwz_through_cursor { saw_mtctr = true; }
}
if xo != 444 && xo != 467 { reg[rd] = None; }
}
OP_BCCTR => {
let xo = (instr >> 1) & 0x3FF;
let lk = (instr & 1) != 0;
if xo == 528 && lk && saw_mtctr {
saw_bcctrl = true;
}
}
// `bl` (opcode 18) or `bcl` (opcode 16) with LK=1 is a call: drop the
// volatile registers r0..r12 that the callee may clobber.
16 | 18 => {
if (instr & 1) != 0 {
for r in 0..=12 { reg[r] = None; }
}
}
_ => {}
}
pc = pc.wrapping_add(4);
}
// Validate that all four pattern features fired.
if !(saw_lwz_through_cursor && saw_mtctr && saw_bcctrl && saw_addi_4) {
return None;
}
let cursor_init = cursor_init?;
let end_init = end_init?;
if end_init <= cursor_init { return None; }
if end_init - cursor_init > 4096 { return None; }
Some(StaticInitDriver {
driver_function: fn_start,
array_start: cursor_init,
array_end: end_init,
length: (end_init - cursor_init) / 4,
})
}
#[cfg(test)]
mod tests {
use super::*;
use crate::func::FuncInfo;
use std::collections::BTreeMap;
use xenia_xex::pe::PeSection;
fn mk_section(name: &str, va: u32, size: u32) -> PeSection {
PeSection {
name: name.into(),
virtual_address: va, virtual_size: size,
raw_offset: va, raw_size: size,
flags: 0x4000_0040,
}
}
fn write_be(pe: &mut [u8], at: usize, v: u32) {
pe[at..at + 4].copy_from_slice(&v.to_be_bytes());
}
#[test]
fn detects_canonical_initterm_loop() {
// Build a tiny driver that loops over a 3-entry array.
let image_base = 0x82000000u32;
let mut pe = vec![0u8; 0x4000];
// Array at .rdata + 0x800: 3 function pointers.
let arr_va_lo = 0x800u32;
let fns = [image_base + 0x2000, image_base + 0x2010, image_base + 0x2020];
for (i, p) in fns.iter().enumerate() {
write_be(&mut pe, arr_va_lo as usize + i * 4, *p);
}
let array_start = image_base + arr_va_lo;
let array_end = array_start + 12;
// Driver function at 0x82001000:
// lis r3, hi(array_start)
// addi r3, r3, lo(array_start)
// lis r4, hi(array_end)
// addi r4, r4, lo(array_end)
// lwz r5, 0(r3)
// mtctr r5
// bcctrl
// addi r3, r3, 4
// blr
let driver = 0x82001000u32;
let off = (driver - image_base) as usize;
let lis_r3 = (15u32 << 26) | (3 << 21) | ((array_start >> 16) as u32);
let addi_r3 = (14u32 << 26) | (3 << 21) | (3 << 16) | ((array_start as u16) as u32);
let lis_r4 = (15u32 << 26) | (4 << 21) | ((array_end >> 16) as u32);
let addi_r4 = (14u32 << 26) | (4 << 21) | (4 << 16) | ((array_end as u16) as u32);
let lwz = (32u32 << 26) | (5 << 21) | (3 << 16);
let mtctr = (31u32 << 26) | (5 << 21) | (9 << 16) | (467 << 1);
let bcctrl = (19u32 << 26) | (20 << 21) | (528 << 1) | 1;
let addi_inc = (14u32 << 26) | (3 << 21) | (3 << 16) | 4;
let blr = (19u32 << 26) | (20 << 21) | (16 << 1);
for (i, w) in [lis_r3, addi_r3, lis_r4, addi_r4, lwz, mtctr, bcctrl, addi_inc, blr].iter().enumerate() {
write_be(&mut pe, off + i * 4, *w);
}
let mut functions: BTreeMap<u32, FuncInfo> = BTreeMap::new();
functions.insert(driver, FuncInfo {
start: driver, end: driver + 0x40, frame_size: 0, saved_gprs: 0,
is_leaf: false, is_saverestore: false,
pdata_validated: false, pdata_length: None, has_eh: false,
});
let fa = FuncAnalysis {
functions, save_gpr_base: None, restore_gpr_base: None, pdata_entries: Vec::new(),
};
let sections = vec![mk_section(".rdata", 0x800, 0x100)];
let mut starts = BTreeSet::new();
for &p in &fns { starts.insert(p); }
let labels: HashMap<u32, String> = HashMap::new();
let r = analyze(&pe, image_base, &sections, &fa, &starts, &labels);
assert_eq!(r.drivers.len(), 1, "should detect one driver");
let d = &r.drivers[0];
assert_eq!(d.driver_function, driver);
assert_eq!(d.array_start, array_start);
assert_eq!(d.array_end, array_end);
assert_eq!(d.length, 3);
assert_eq!(r.arrays.len(), 1);
assert_eq!(r.arrays[0].kind, "static_init");
assert_eq!(r.arrays[0].entries.len(), 3);
}
#[test]
fn rejects_function_without_pattern() {
let image_base = 0x82000000u32;
let mut pe = vec![0u8; 0x4000];
let driver = 0x82001000u32;
// Just a blr — no driver pattern.
let blr = (19u32 << 26) | (20 << 21) | (16 << 1);
write_be(&mut pe, (driver - image_base) as usize, blr);
let mut functions: BTreeMap<u32, FuncInfo> = BTreeMap::new();
functions.insert(driver, FuncInfo {
start: driver, end: driver + 0x40, frame_size: 0, saved_gprs: 0,
is_leaf: true, is_saverestore: false,
pdata_validated: false, pdata_length: None, has_eh: false,
});
let fa = FuncAnalysis {
functions, save_gpr_base: None, restore_gpr_base: None, pdata_entries: Vec::new(),
};
let sections = vec![mk_section(".rdata", 0x800, 0x100)];
let starts: BTreeSet<u32> = BTreeSet::new();
let labels: HashMap<u32, String> = HashMap::new();
let r = analyze(&pe, image_base, &sections, &fa, &starts, &labels);
assert_eq!(r.drivers.len(), 0);
}
}
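
The bit-field extraction that `scan_function_for_driver` relies on can be checked against known PowerPC encodings. The following is a minimal sketch under stated assumptions: `DForm`, `decode_d_form`, `lis_addi_value`, `is_mtctr`, and `is_bcctrl` are hypothetical helper names, not part of the crate, and only the handful of forms used by the driver scan are covered.

```rust
/// Fields of a D-form instruction (addi, addis, lwz, ...).
#[derive(Debug, PartialEq)]
struct DForm {
    op: u32,
    rd: usize,
    ra: usize,
    simm: i32,
    uimm: u32,
}

fn decode_d_form(instr: u32) -> DForm {
    DForm {
        op: instr >> 26,
        rd: ((instr >> 21) & 0x1F) as usize,
        ra: ((instr >> 16) & 0x1F) as usize,
        simm: ((instr & 0xFFFF) as i16) as i32, // sign-extended immediate
        uimm: instr & 0xFFFF,
    }
}

/// Constant materialized by `lis rD, hi` followed by `addi rD, rD, lo`.
/// `lis` is `addis` (opcode 15) with rA = 0; `addi` is opcode 14.
fn lis_addi_value(lis: u32, addi: u32) -> Option<u32> {
    let hi = decode_d_form(lis);
    let lo = decode_d_form(addi);
    if hi.op != 15 || hi.ra != 0 {
        return None;
    }
    if lo.op != 14 || lo.ra != hi.rd || lo.rd != hi.rd {
        return None;
    }
    // The low half is sign-extended, so it can subtract from the high half.
    Some((hi.uimm << 16).wrapping_add(lo.simm as u32))
}

/// `mtctr rS` is `mtspr 9, rS`: opcode 31, XO 467, with the two 5-bit SPR
/// halves stored swapped in the instruction word.
fn is_mtctr(instr: u32) -> bool {
    let xo = (instr >> 1) & 0x3FF;
    let spr = (((instr >> 11) & 0x1F) << 5) | ((instr >> 16) & 0x1F);
    (instr >> 26) == 31 && xo == 467 && spr == 9
}

/// `bcctrl`: opcode 19, XO 528, LK bit set.
fn is_bcctrl(instr: u32) -> bool {
    (instr >> 26) == 19 && ((instr >> 1) & 0x3FF) == 528 && (instr & 1) == 1
}
```

The sign extension in `lis_addi_value` is the detail most easily missed: when the low 16 bits of an address are `0x8000` or above, the compiler bumps the `lis` immediate by one and the `addi` subtracts, which is why the scan adds `simm as u32` with wrapping arithmetic.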

@@ -0,0 +1,382 @@
//! String / constant-pool detection in `.rdata`.
//!
//! Scans the `.rdata` section for NUL-terminated runs of printable ASCII or
//! UTF-16LE characters of length ≥ 6, emitting one row per discovered string.
//! Cross-references against `xrefs.target` are computed by the caller —
//! this module only finds the strings; downstream queries can join.
//!
//! ### What this layer does NOT do
//!
//! - Strings in `.data` (mutable globals) are not scanned by default.
//! - Wide strings on Xbox 360 are little-endian (compiler convention even
//! on this big-endian platform); we do NOT try big-endian UTF-16.
//! - No language detection / classification beyond encoding.
//!
//! Beyond the original ASCII / UTF-16LE pass, this module also detects
//! Shift_JIS (Sylpheed is originally Japanese, so this likely surfaces
//! mission/UI text invisible to an ASCII-only scan) and multi-byte UTF-8.
//!
//! Reference: `objdump -s` `.rdata` walks rely on the same heuristic;
//! Shift_JIS lead/trail byte ranges per JIS X 0208.
use xenia_xex::pe::PeSection;
/// One detected string.
#[derive(Debug, Clone)]
pub struct DetectedString {
/// Absolute VA of the first byte.
pub address: u32,
/// `"ascii"` | `"utf16le"` | `"shift_jis"` | `"utf8"`.
pub encoding: &'static str,
/// Length in bytes (excluding the NUL terminator).
pub length: u32,
/// UTF-8 representation of the string content.
pub content: String,
}
/// Scan every `.rdata` section for ASCII, UTF-16LE, Shift_JIS, and UTF-8
/// strings. Sections with any other name are skipped.
#[tracing::instrument(skip_all, fields(image_base = format_args!("{:#010x}", image_base)))]
pub fn analyze(pe: &[u8], image_base: u32, sections: &[PeSection]) -> Vec<DetectedString> {
let started = std::time::Instant::now();
let mut out: Vec<DetectedString> = Vec::new();
for section in sections {
if section.name != ".rdata" { continue; }
let raw_start = section.virtual_address as usize;
let raw_end = (section.virtual_address + section.virtual_size) as usize;
if raw_end > pe.len() { continue; }
// `raw_end <= pe.len()` is guaranteed by the check above.
let bytes = &pe[raw_start..raw_end];
let va_base = image_base + section.virtual_address;
scan_ascii(bytes, va_base, &mut out);
scan_utf16le(bytes, va_base, &mut out);
scan_shift_jis(bytes, va_base, &mut out);
scan_utf8(bytes, va_base, &mut out);
}
let elapsed_ms = started.elapsed().as_millis() as f64;
let n_ascii = out.iter().filter(|s| s.encoding == "ascii").count();
let n_utf16 = out.iter().filter(|s| s.encoding == "utf16le").count();
let n_sjis = out.iter().filter(|s| s.encoding == "shift_jis").count();
let n_utf8 = out.iter().filter(|s| s.encoding == "utf8").count();
metrics::histogram!("analysis.phase_ms", "phase" => "strings").record(elapsed_ms);
tracing::info!(
ascii = n_ascii,
utf16le = n_utf16,
shift_jis = n_sjis,
utf8 = n_utf8,
total = out.len(),
elapsed_ms,
"string scan complete"
);
out
}
const MIN_LEN: usize = 6;
fn is_printable_ascii(b: u8) -> bool {
// Printable + the common whitespace characters used in real strings.
matches!(b, 0x20..=0x7E | b'\t' | b'\n' | b'\r')
}
fn scan_ascii(bytes: &[u8], va_base: u32, out: &mut Vec<DetectedString>) {
let mut i = 0;
while i < bytes.len() {
if !is_printable_ascii(bytes[i]) {
i += 1;
continue;
}
let start = i;
while i < bytes.len() && is_printable_ascii(bytes[i]) { i += 1; }
let run_len = i - start;
// Require NUL termination and minimum length.
if run_len >= MIN_LEN && i < bytes.len() && bytes[i] == 0 {
let s = std::str::from_utf8(&bytes[start..i]).unwrap_or("");
out.push(DetectedString {
address: va_base + start as u32,
encoding: "ascii",
length: run_len as u32,
content: s.to_string(),
});
}
// Skip the NUL (if any) before continuing.
if i < bytes.len() && bytes[i] == 0 { i += 1; }
}
}
fn scan_utf16le(bytes: &[u8], va_base: u32, out: &mut Vec<DetectedString>) {
// UTF-16LE strings are 2-byte aligned in MSVC output. Walk on even
// offsets to avoid misaligned hits.
let mut i = 0;
while i + 2 <= bytes.len() {
if !i.is_multiple_of(2) { i += 1; continue; }
let lo = bytes[i];
let hi = bytes[i + 1];
// Restrict scan-start to printable ASCII range with a zero high byte —
// this is what real Xbox 360 wide strings look like.
if hi != 0 || !is_printable_ascii(lo) {
i += 2;
continue;
}
let start = i;
let mut codeunits: Vec<u16> = Vec::new();
while i + 2 <= bytes.len() {
let l = bytes[i];
let h = bytes[i + 1];
if h != 0 || !is_printable_ascii(l) { break; }
codeunits.push((h as u16) << 8 | l as u16);
i += 2;
}
// Require NUL u16 terminator.
let nul_terminated = i + 2 <= bytes.len() && bytes[i] == 0 && bytes[i + 1] == 0;
if codeunits.len() >= MIN_LEN && nul_terminated {
let s: String = String::from_utf16_lossy(&codeunits);
out.push(DetectedString {
address: va_base + start as u32,
encoding: "utf16le",
length: ((i - start) as u32),
content: s,
});
}
// Skip past the terminator.
if nul_terminated { i += 2; }
}
}
/// Per JIS X 0208: Shift_JIS lead byte ∈ [0x81, 0x9F] ∪ [0xE0, 0xEF];
/// trail byte ∈ [0x40, 0x7E] ∪ [0x80, 0xFC]. Single-byte ASCII and JIS
/// half-width katakana (0xA1..=0xDF) are passed through.
fn is_sjis_lead(b: u8) -> bool {
(0x81..=0x9F).contains(&b) || (0xE0..=0xEF).contains(&b)
}
fn is_sjis_trail(b: u8) -> bool {
(0x40..=0x7E).contains(&b) || (0x80..=0xFC).contains(&b)
}
fn is_sjis_singlebyte(b: u8) -> bool {
is_printable_ascii(b) || (0xA1..=0xDF).contains(&b)
}
/// Scan for Shift_JIS strings — runs of ≥ 6 bytes consisting of valid
/// SJIS code units (single-byte ASCII / half-width katakana, OR a
/// lead+trail pair). At least one multi-byte pair must be present so we
/// don't double-count strings that are purely ASCII.
fn scan_shift_jis(bytes: &[u8], va_base: u32, out: &mut Vec<DetectedString>) {
let mut i = 0;
while i < bytes.len() {
let start = i;
let mut has_multibyte = false;
let mut nbytes = 0;
while i < bytes.len() {
let b = bytes[i];
if is_sjis_lead(b) && i + 1 < bytes.len() && is_sjis_trail(bytes[i + 1]) {
has_multibyte = true;
nbytes += 2;
i += 2;
} else if is_sjis_singlebyte(b) {
nbytes += 1;
i += 1;
} else {
break;
}
}
// Require NUL terminator + min length + at least one multi-byte char.
if has_multibyte
&& nbytes >= MIN_LEN
&& i < bytes.len() && bytes[i] == 0
{
// Render SJIS best-effort. We don't ship a full SJIS decoder;
// each double-byte pair is kept as a literal `\xHH\xLL` escape
// for diagnostic readability, and downstream tooling can
// re-decode from the raw bytes if needed.
let raw = &bytes[start..i];
let mut s = String::with_capacity(raw.len() * 4);
let mut p = 0;
while p < raw.len() {
let b = raw[p];
if is_sjis_lead(b) && p + 1 < raw.len() && is_sjis_trail(raw[p + 1]) {
// Render as SJIS hex pair so the string is identifiable
// even without a decoder. Real Japanese decoding is a
// future enhancement.
s.push_str(&format!("\\x{:02X}\\x{:02X}", b, raw[p + 1]));
p += 2;
} else {
s.push(b as char);
p += 1;
}
}
out.push(DetectedString {
address: va_base + start as u32,
encoding: "shift_jis",
length: nbytes as u32,
content: s,
});
i += 1; // skip NUL
} else {
// Advance past whatever didn't match.
i = start + 1;
if i < bytes.len() && bytes[i] == 0 { i += 1; }
}
}
}
/// Scan for UTF-8 strings carrying multi-byte sequences (we already
/// catch pure-ASCII via `scan_ascii`). Validates 2/3-byte sequences;
/// 4-byte (supplementary plane) is uncommon in game text and skipped.
fn scan_utf8(bytes: &[u8], va_base: u32, out: &mut Vec<DetectedString>) {
let mut i = 0;
while i < bytes.len() {
let start = i;
let mut has_multibyte = false;
let mut nbytes = 0;
while i < bytes.len() {
let b = bytes[i];
if b < 0x80 {
if !is_printable_ascii(b) { break; }
nbytes += 1;
i += 1;
} else if (b & 0xE0) == 0xC0 {
// 2-byte: 110xxxxx 10xxxxxx
if i + 1 >= bytes.len() || (bytes[i + 1] & 0xC0) != 0x80 { break; }
has_multibyte = true;
nbytes += 2;
i += 2;
} else if (b & 0xF0) == 0xE0 {
// 3-byte: 1110xxxx 10xxxxxx 10xxxxxx
if i + 2 >= bytes.len()
|| (bytes[i + 1] & 0xC0) != 0x80
|| (bytes[i + 2] & 0xC0) != 0x80 { break; }
has_multibyte = true;
nbytes += 3;
i += 3;
} else {
break;
}
}
if has_multibyte
&& nbytes >= MIN_LEN
&& i < bytes.len() && bytes[i] == 0
&& let Ok(s) = std::str::from_utf8(&bytes[start..i])
{
out.push(DetectedString {
address: va_base + start as u32,
encoding: "utf8",
length: nbytes as u32,
content: s.to_string(),
});
i += 1; // skip NUL
} else {
i = start + 1;
if i < bytes.len() && bytes[i] == 0 { i += 1; }
}
}
}
#[cfg(test)]
mod tests {
use super::*;
fn mk_section(name: &str, va: u32, size: u32) -> PeSection {
PeSection {
name: name.into(),
virtual_address: va,
virtual_size: size,
raw_offset: va,
raw_size: size,
flags: 0x4000_0040,
}
}
#[test]
fn detects_ascii_string() {
let image_base = 0x82000000u32;
let mut pe = vec![0u8; 0x1100];
let off = 0x1000usize;
let s = b"Hello, world!\0";
pe[off..off + s.len()].copy_from_slice(s);
let sections = vec![mk_section(".rdata", 0x1000, 0x100)];
let strings = analyze(&pe, image_base, &sections);
assert_eq!(strings.len(), 1);
assert_eq!(strings[0].encoding, "ascii");
assert_eq!(strings[0].content, "Hello, world!");
assert_eq!(strings[0].address, image_base + 0x1000);
}
#[test]
fn rejects_short_runs() {
let image_base = 0x82000000u32;
let mut pe = vec![0u8; 0x1100];
let off = 0x1000usize;
let s = b"Hi\0longer string here\0";
pe[off..off + s.len()].copy_from_slice(s);
let sections = vec![mk_section(".rdata", 0x1000, 0x100)];
let strings = analyze(&pe, image_base, &sections);
assert_eq!(strings.len(), 1);
assert_eq!(strings[0].content, "longer string here");
}
#[test]
fn detects_utf16le_string() {
let image_base = 0x82000000u32;
let mut pe = vec![0u8; 0x1100];
let off = 0x1000usize;
// "Hello!" in UTF-16LE + NUL u16
let s: &[u8] = b"H\0e\0l\0l\0o\0!\0\0\0";
pe[off..off + s.len()].copy_from_slice(s);
let sections = vec![mk_section(".rdata", 0x1000, 0x100)];
let strings = analyze(&pe, image_base, &sections);
// Only the UTF-16 pass should report this string. The ASCII pass
// works byte by byte, and the interleaved 0x00 high bytes break
// every printable run after a single character.
let utf16: Vec<_> = strings.iter().filter(|s| s.encoding == "utf16le").collect();
assert!(utf16.iter().any(|s| s.content == "Hello!"));
}
#[test]
fn detects_shift_jis_string() {
let image_base = 0x82000000u32;
let mut pe = vec![0u8; 0x1100];
let off = 0x1000usize;
// "ABC" + (SJIS hiragana 'a' = 0x82 0xA0) + (SJIS 'i' = 0x82 0xA2) + NUL
let s: &[u8] = b"ABC\x82\xA0\x82\xA2\0";
pe[off..off + s.len()].copy_from_slice(s);
let sections = vec![mk_section(".rdata", 0x1000, 0x100)];
let strings = analyze(&pe, image_base, &sections);
let sjis: Vec<_> = strings.iter().filter(|s| s.encoding == "shift_jis").collect();
assert_eq!(sjis.len(), 1);
assert!(sjis[0].content.contains("ABC"));
assert!(sjis[0].content.contains("\\x82\\xA0"));
}
#[test]
fn detects_utf8_multibyte_string() {
let image_base = 0x82000000u32;
let mut pe = vec![0u8; 0x1100];
let off = 0x1000usize;
// "Café" = 'C', 'a', 'f', 0xC3 0xA9 (é), then more ASCII to reach min length
let s: &[u8] = b"Caf\xC3\xA9eteria\0";
pe[off..off + s.len()].copy_from_slice(s);
let sections = vec![mk_section(".rdata", 0x1000, 0x100)];
let strings = analyze(&pe, image_base, &sections);
let u8s: Vec<_> = strings.iter().filter(|s| s.encoding == "utf8").collect();
assert_eq!(u8s.len(), 1);
assert_eq!(u8s[0].content, "Café".to_string() + "eteria");
}
#[test]
fn requires_nul_terminator() {
let image_base = 0x82000000u32;
let mut pe = vec![0u8; 0x1100];
// No trailing NUL — should NOT be detected.
let off = 0x1000usize;
let s = b"abcdefghij";
pe[off..off + s.len()].copy_from_slice(s);
// Fill rest of section with 0xFF so the run terminates cleanly without NUL.
for j in off + s.len()..off + 0x100 { pe[j] = 0xFF; }
let sections = vec![mk_section(".rdata", 0x1000, 0x100)];
let strings = analyze(&pe, image_base, &sections);
assert_eq!(strings.len(), 0);
}
}
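
The Shift_JIS pass above renders double-byte pairs as `\xHH\xLL` escapes and pushes every other byte with `b as char`, so a half-width katakana byte (0xA1..=0xDF) ends up as a Latin-1 code point. A minimal sketch of one way to render those bytes as real katakana, assuming the JIS X 0201 mapping of 0xA1..=0xDF onto U+FF61..=U+FF9F. The helper name `render_sjis_diagnostic` is hypothetical and not part of this commit, and the input is assumed to be a run the scanner has already validated.

```rust
/// Hypothetical helper (not part of the commit): render an already-validated
/// Shift_JIS byte run as a diagnostic string.
fn render_sjis_diagnostic(raw: &[u8]) -> String {
    // Same lead/trail ranges as `is_sjis_lead` / `is_sjis_trail` above.
    let is_lead = |b: u8| (0x81..=0x9F).contains(&b) || (0xE0..=0xEF).contains(&b);
    let is_trail = |b: u8| (0x40..=0x7E).contains(&b) || (0x80..=0xFC).contains(&b);

    let mut s = String::with_capacity(raw.len() * 4);
    let mut p = 0;
    while p < raw.len() {
        let b = raw[p];
        if is_lead(b) && p + 1 < raw.len() && is_trail(raw[p + 1]) {
            // Double-byte character: keep it as a literal hex escape pair.
            s.push_str(&format!("\\x{:02X}\\x{:02X}", b, raw[p + 1]));
            p += 2;
        } else if (0xA1..=0xDF).contains(&b) {
            // Assumption: JIS X 0201 half-width katakana, 0xA1 -> U+FF61.
            let cp = 0xFF61 + u32::from(b - 0xA1);
            s.push(char::from_u32(cp).unwrap_or('\u{FFFD}'));
            p += 1;
        } else {
            // Printable ASCII passes through unchanged.
            s.push(b as char);
            p += 1;
        }
    }
    s
}

fn main() {
    // "ABC" followed by one double-byte pair.
    println!("{}", render_sjis_diagnostic(b"ABC\x82\xA0"));
    // Two half-width katakana bytes.
    println!("{}", render_sjis_diagnostic(&[0xB1, 0xB2]));
}
```

The escape form for double-byte pairs is kept as-is, so the existing `detects_shift_jis_string` expectations would not change under this sketch.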


@@ -0,0 +1,424 @@
//! MSVC vtable + RTTI detection.
//!
//! Heuristic two-pass scan over the binary's read-only data sections. Pass 1
//! finds candidate vtables — runs of ≥3 contiguous big-endian u32 values that
//! all land on known function entries. Pass 2 attempts the MSVC RTTI walk
//! `vtable[-1] → CompleteObjectLocator → TypeDescriptor → mangled name`. When
//! RTTI is stripped (typical for shipped game binaries), each anonymous vtable
//! gets a deterministic name `ANON_Class_<hex>` keyed by a hash of its
//! sorted method PCs (so identical vtables across multiple class instances
//! collapse to one entry).
//!
//! What this module does NOT do:
//! - Vtables in heap-allocated memory (built at runtime by ctors) are out of
//! scope — only vtables present statically in `.rdata` / `.data`.
//! - RTTI inheritance (`BaseClassDescriptor` walk) is best-effort; we record
//! the first-level base list when present and leave it `None` otherwise.
//! - Multiple-inheritance "extra" vftables (one per base subobject) are
//! detected as independent vtables; we don't link them.
//!
//! Reference: openrce.org "Reversing Microsoft Visual C++" RTTI articles
//! (CompleteObjectLocator / TypeDescriptor / BaseClassDescriptor layout).
use std::collections::BTreeMap;
use xenia_xex::pe::PeSection;
use crate::demangle;
/// One detected vtable.
#[derive(Debug, Clone)]
pub struct Vtable {
/// Absolute VA of `vtable[0]` (first method slot).
pub address: u32,
/// Number of methods in the vtable.
pub length: u32,
/// Absolute VA of the `CompleteObjectLocator` read from `vtable[-1]`. Set
/// only when the COL → TypeDescriptor walk succeeded; `None` when RTTI is
/// absent or stripped.
pub col_address: Option<u32>,
/// Class name. Demangled from RTTI when available, otherwise the synthetic
/// `ANON_Class_<hex>` form.
pub class_name: String,
/// True when the COL → TypeDescriptor walk succeeded.
pub rtti_present: bool,
/// First-level base class names from `RTTIClassHierarchyDescriptor`, JSON-encoded.
/// `None` when not parseable.
pub base_classes_json: Option<String>,
/// One entry per slot: function VA in `.text`.
pub methods: Vec<u32>,
}
/// Run the vtable scan + RTTI walk. `function_starts` is the set of valid
/// `.text` function entry VAs from M1's corrected `functions` table.
#[tracing::instrument(skip_all, fields(image_base = format_args!("{:#010x}", image_base)))]
pub fn analyze(
pe: &[u8],
image_base: u32,
sections: &[PeSection],
function_starts: &std::collections::BTreeSet<u32>,
) -> Vec<Vtable> {
let started = std::time::Instant::now();
// Sections we'll scan for vtable bodies.
let scan_targets: Vec<&PeSection> = sections
.iter()
.filter(|s| matches!(s.name.as_str(), ".rdata" | ".data"))
.collect();
// Range table for "is this VA in .rdata?" (RTTI structures live there).
let rdata_ranges: Vec<(u32, u32)> = sections
.iter()
.filter(|s| s.name == ".rdata")
.map(|s| (image_base + s.virtual_address, image_base + s.virtual_address + s.virtual_size))
.collect();
let mut candidates: Vec<Vtable> = Vec::new();
for section in scan_targets {
let va_start = image_base + section.virtual_address;
let va_end = va_start + section.virtual_size;
let raw_start = section.virtual_address as usize;
let raw_end = (section.virtual_address + section.virtual_size) as usize;
if raw_end > pe.len() { continue; }
let bytes = &pe[raw_start..raw_end.min(pe.len())];
let mut i = 0usize;
while i + 12 <= bytes.len() {
// Try to start a run at this 4-aligned offset.
if !i.is_multiple_of(4) { i += 1; continue; }
let mut run_len = 0usize;
let mut methods: Vec<u32> = Vec::new();
let mut j = i;
while j + 4 <= bytes.len() {
let val = u32::from_be_bytes([bytes[j], bytes[j + 1], bytes[j + 2], bytes[j + 3]]);
if function_starts.contains(&val) {
methods.push(val);
run_len += 1;
j += 4;
} else {
break;
}
}
if run_len >= 3 {
let address = va_start + (i as u32);
candidates.push(Vtable {
address,
length: run_len as u32,
col_address: None,
class_name: synth_anon_name(&methods),
rtti_present: false,
base_classes_json: None,
methods,
});
i += run_len * 4;
} else {
i += 4;
}
}
let _ = (va_start, va_end);
}
// RTTI walk: for each candidate, look at vtable[-1].
let pe_image_base = image_base;
for v in &mut candidates {
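// vtable[-1] lives 4 bytes below the first slot; skip candidates where
// that would fall before the start of the image.
let Some(col_off) = v.address.checked_sub(pe_image_base).and_then(|o| o.checked_sub(4)) else { continue; };
let col_off = col_off as usize;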
if col_off + 4 > pe.len() { continue; }
let col_ptr = u32::from_be_bytes([pe[col_off], pe[col_off + 1], pe[col_off + 2], pe[col_off + 3]]);
if col_ptr == 0 { continue; }
if !is_in_ranges(col_ptr, &rdata_ranges) { continue; }
// Try to extract the TypeDescriptor mangled-name string.
if let Some((td_ptr, hierarchy_ptr)) = read_col(pe, image_base, col_ptr)
&& let Some(mangled) = read_typedescriptor_name(pe, image_base, td_ptr, &rdata_ranges)
&& let Some(class) = demangle_rtti_typename(&mangled)
{
v.col_address = Some(col_ptr);
v.class_name = class;
v.rtti_present = true;
v.base_classes_json = read_class_hierarchy(pe, image_base, hierarchy_ptr, &rdata_ranges);
}
}
let elapsed_ms = started.elapsed().as_millis() as f64;
let rtti_count = candidates.iter().filter(|v| v.rtti_present).count();
metrics::histogram!("analysis.phase_ms", "phase" => "vtables").record(elapsed_ms);
tracing::info!(
vtables = candidates.len(),
rtti = rtti_count,
anon = candidates.len() - rtti_count,
elapsed_ms,
"vtable scan complete"
);
candidates
}
fn is_in_ranges(addr: u32, ranges: &[(u32, u32)]) -> bool {
ranges.iter().any(|&(s, e)| addr >= s && addr < e)
}
/// Read 4 big-endian bytes at absolute VA `addr` from the PE image.
fn read_be_u32(pe: &[u8], image_base: u32, addr: u32) -> Option<u32> {
let off = addr.wrapping_sub(image_base) as usize;
if off + 4 > pe.len() { return None; }
Some(u32::from_be_bytes([pe[off], pe[off + 1], pe[off + 2], pe[off + 3]]))
}
/// Parse a `CompleteObjectLocator` at VA `col`. Returns
/// `(type_descriptor_ptr, class_hierarchy_descriptor_ptr)` on success.
///
/// Layout (32-bit MSVC):
/// ```text
/// +0x00 signature (0 for 32-bit images, 1 for 64-bit)
/// +0x04 offset within complete object
/// +0x08 cdOffset (this-pointer adjuster)
/// +0x0C TypeDescriptor *
/// +0x10 RTTIClassHierarchyDescriptor *
/// ```
fn read_col(pe: &[u8], image_base: u32, col: u32) -> Option<(u32, u32)> {
let td = read_be_u32(pe, image_base, col + 0x0C)?;
let chd = read_be_u32(pe, image_base, col + 0x10)?;
if td == 0 { return None; }
Some((td, chd))
}
/// Read a TypeDescriptor's mangled-name string at VA `td`.
///
/// Layout: `+0x00` vftable ptr, `+0x04` "spare", `+0x08` zero-terminated
/// mangled name (e.g. `.?AVClassName@@`).
fn read_typedescriptor_name(
pe: &[u8],
image_base: u32,
td: u32,
rdata_ranges: &[(u32, u32)],
) -> Option<String> {
if !is_in_ranges(td, rdata_ranges) { return None; }
let name_va = td + 0x08;
let off = name_va.wrapping_sub(image_base) as usize;
if off + 1 > pe.len() { return None; }
// Read up to 256 bytes or until NUL.
let mut end = off;
while end < pe.len().min(off + 256) && pe[end] != 0 { end += 1; }
if end == off { return None; }
let s = std::str::from_utf8(&pe[off..end]).ok()?;
// Sanity: MSVC RTTI names always start with `.?A`.
if !s.starts_with(".?A") { return None; }
Some(s.to_string())
}
/// Demangle an RTTI type-name string of the form `.?AVClassName@ns@@`.
/// MSVC convention: leading `.` is the marker for an RTTI string; strip it
/// before passing to the demangler.
fn demangle_rtti_typename(rtti_name: &str) -> Option<String> {
let stripped = rtti_name.strip_prefix('.')?;
let raw = msvc_demangler::demangle(stripped, msvc_demangler::DemangleFlags::llvm()).ok()?;
// Output looks like `class xe::apu::AudioSystem` or `struct foo::Bar`.
let cls = raw
.strip_prefix("class ")
.or_else(|| raw.strip_prefix("struct "))
.or_else(|| raw.strip_prefix("union "))
.unwrap_or(&raw);
Some(cls.to_string())
}
/// Best-effort `RTTIClassHierarchyDescriptor` walk: read the
/// `BaseClassArray` entries and demangle each base's TypeDescriptor name.
/// Returns a JSON array string on success.
///
/// Layout:
/// ```text
/// RTTIClassHierarchyDescriptor:
/// +0x00 signature
/// +0x04 attributes
/// +0x08 numBaseClasses
/// +0x0C BaseClassArray * (-> array of BaseClassDescriptor *)
/// BaseClassDescriptor:
/// +0x00 TypeDescriptor *
/// +0x04 numContainedBases
/// ...
/// ```
fn read_class_hierarchy(
pe: &[u8],
image_base: u32,
chd: u32,
rdata_ranges: &[(u32, u32)],
) -> Option<String> {
if !is_in_ranges(chd, rdata_ranges) { return None; }
let num_bases = read_be_u32(pe, image_base, chd + 0x08)?;
if num_bases == 0 || num_bases > 256 { return None; } // sanity cap
let bca_ptr = read_be_u32(pe, image_base, chd + 0x0C)?;
if !is_in_ranges(bca_ptr, rdata_ranges) { return None; }
let mut names: Vec<String> = Vec::new();
for i in 0..num_bases {
let bcd_ptr = match read_be_u32(pe, image_base, bca_ptr + i * 4) {
Some(p) if is_in_ranges(p, rdata_ranges) => p,
_ => return None,
};
let td_ptr = match read_be_u32(pe, image_base, bcd_ptr) {
Some(p) if is_in_ranges(p, rdata_ranges) => p,
_ => return None,
};
let mangled = match read_typedescriptor_name(pe, image_base, td_ptr, rdata_ranges) {
Some(s) => s,
None => return None,
};
let cls = demangle_rtti_typename(&mangled).unwrap_or(mangled);
names.push(cls);
}
serde_json::to_string(&names).ok()
}
/// Synthetic name for an RTTI-stripped vtable, derived from a stable hash of
/// the sorted method-PC list. Because the list is sorted first, two vtables
/// with the same set of methods collapse to the same anonymous name
/// regardless of slot order.
fn synth_anon_name(methods: &[u32]) -> String {
// FNV-1a 64-bit on the sorted PC list; we only use 32 bits for brevity.
let mut sorted = methods.to_vec();
sorted.sort_unstable();
let mut h: u64 = 0xcbf29ce484222325;
for pc in &sorted {
for b in pc.to_le_bytes() {
h ^= b as u64;
h = h.wrapping_mul(0x100000001b3);
}
}
format!("ANON_Class_{:08X}", (h as u32))
}
/// Build the per-method `(vtable_address, slot, function_address, label,
/// demangled)` list for DB insertion. Every slot is emitted; when a function
/// has no entry in the supplied label map, or its label does not demangle,
/// the corresponding `Option` field is `None`.
pub fn methods_table(
vtables: &[Vtable],
labels: &std::collections::HashMap<u32, String>,
) -> Vec<(u32, u32, u32, Option<String>, Option<String>)> {
let mut out = Vec::new();
for v in vtables {
for (slot, &fn_va) in v.methods.iter().enumerate() {
let label = labels.get(&fn_va).cloned();
let demangled = label.as_ref()
.and_then(|l| demangle::demangle(l).map(|d| d.raw_demangled));
out.push((v.address, slot as u32, fn_va, label, demangled));
}
}
out
}
/// Build a `class_name → Vtable` summary for the `classes` table. Multiple
/// vtables sharing the same class name (multiple instances at link time)
/// collapse via `BTreeMap` — the first detected vtable wins.
pub fn classes_table(vtables: &[Vtable]) -> Vec<(String, u32, bool, Option<String>)> {
let mut by_name: BTreeMap<String, &Vtable> = BTreeMap::new();
for v in vtables {
by_name.entry(v.class_name.clone()).or_insert(v);
}
by_name
.into_iter()
.map(|(name, v)| (name, v.address, v.rtti_present, v.base_classes_json.clone()))
.collect()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn synth_anon_name_is_stable() {
let a = synth_anon_name(&[0x82001000, 0x82001100, 0x82001200]);
let b = synth_anon_name(&[0x82001200, 0x82001000, 0x82001100]);
assert_eq!(a, b, "anon name must be order-independent");
}
#[test]
fn synth_anon_name_differs_for_different_methods() {
let a = synth_anon_name(&[0x82001000, 0x82001100]);
let b = synth_anon_name(&[0x82002000, 0x82002100]);
assert_ne!(a, b);
}
#[test]
fn detects_3_method_vtable_in_rdata() {
let image_base = 0x82000000u32;
let rdata_va = 0x1000u32;
let text_va = 0x2000u32;
let rdata_size = 16u32;
let text_size = 0x100u32;
// PE buffer big enough for both sections.
let total = (text_va + text_size) as usize;
let mut pe = vec![0u8; total];
// Vtable: 3 method PCs at .rdata start, all valid function entries.
let m: [u32; 3] = [image_base + text_va, image_base + text_va + 0x10, image_base + text_va + 0x20];
for (i, val) in m.iter().enumerate() {
pe[rdata_va as usize + i * 4..rdata_va as usize + (i + 1) * 4]
.copy_from_slice(&val.to_be_bytes());
}
let sections = vec![
PeSection {
name: ".rdata".into(),
virtual_address: rdata_va,
virtual_size: rdata_size,
raw_offset: rdata_va,
raw_size: rdata_size,
flags: 0x4000_0040,
},
PeSection {
name: ".text".into(),
virtual_address: text_va,
virtual_size: text_size,
raw_offset: text_va,
raw_size: text_size,
flags: 0x6000_0020,
},
];
let mut function_starts = std::collections::BTreeSet::new();
for &pc in &m { function_starts.insert(pc); }
let vtables = analyze(&pe, image_base, &sections, &function_starts);
assert_eq!(vtables.len(), 1);
assert_eq!(vtables[0].length, 3);
assert_eq!(vtables[0].address, image_base + rdata_va);
assert!(vtables[0].class_name.starts_with("ANON_Class_"));
assert!(!vtables[0].rtti_present);
}
#[test]
fn rejects_2_method_run() {
let image_base = 0x82000000u32;
let rdata_va = 0x1000u32;
let text_va = 0x2000u32;
let total = (text_va + 0x100) as usize;
let mut pe = vec![0u8; total];
let m: [u32; 2] = [image_base + text_va, image_base + text_va + 0x10];
for (i, val) in m.iter().enumerate() {
pe[rdata_va as usize + i * 4..rdata_va as usize + (i + 1) * 4]
.copy_from_slice(&val.to_be_bytes());
}
let sections = vec![
PeSection {
name: ".rdata".into(),
virtual_address: rdata_va,
virtual_size: 8,
raw_offset: rdata_va,
raw_size: 8,
flags: 0x4000_0040,
},
PeSection {
name: ".text".into(),
virtual_address: text_va,
virtual_size: 0x100,
raw_offset: text_va,
raw_size: 0x100,
flags: 0x6000_0020,
},
];
let mut function_starts = std::collections::BTreeSet::new();
for &pc in &m { function_starts.insert(pc); }
let vtables = analyze(&pe, image_base, &sections, &function_starts);
assert_eq!(vtables.len(), 0, "runs of 2 must be rejected to keep false-positive rate down");
}
}
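
The tests above only exercise the RTTI-stripped path. As an illustration of the `vtable[-1] → CompleteObjectLocator → TypeDescriptor → mangled name` walk described in the module docs, here is a minimal, self-contained sketch over a synthetic big-endian image. All names (`be_u32`, `rtti_name`, `build_demo_image`) and all offsets inside the demo image are hypothetical. Unlike `analyze`, the sketch does not check that pointers land in `.rdata` and does not demangle.

```rust
/// Read a big-endian u32 at image-relative offset `off`, if in bounds.
fn be_u32(img: &[u8], off: usize) -> Option<u32> {
    let b = img.get(off..off.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Follow vtable[-1] -> COL (+0x0C) -> TypeDescriptor (+0x08) and return the
/// mangled RTTI name, or `None` if any step leaves the image.
fn rtti_name(img: &[u8], base: u32, vtable_va: u32) -> Option<String> {
    // vtable[-1] sits 4 bytes below the first method slot.
    let slot = vtable_va.checked_sub(base)?.checked_sub(4)? as usize;
    let col = be_u32(img, slot)?;
    let col_off = col.checked_sub(base)? as usize;
    let td = be_u32(img, col_off + 0x0C)?;
    let name_off = td.checked_sub(base)? as usize + 0x08;
    let tail = img.get(name_off..)?;
    let end = tail.iter().position(|&b| b == 0)?;
    let s = std::str::from_utf8(&tail[..end]).ok()?;
    // MSVC RTTI type names start with `.?A`.
    s.starts_with(".?A").then(|| s.to_string())
}

/// Build a tiny synthetic image: COL at +0x40, TypeDescriptor at +0x80,
/// vtable[-1] at +0x100 (so vtable[0] is at +0x104). Offsets are arbitrary.
fn build_demo_image(base: u32) -> Vec<u8> {
    let mut img = vec![0u8; 0x200];
    img[0x100..0x104].copy_from_slice(&(base + 0x40).to_be_bytes()); // vtable[-1] -> COL
    img[0x4C..0x50].copy_from_slice(&(base + 0x80).to_be_bytes()); // COL+0x0C -> TD
    let name = b".?AVFoo@@\0";
    img[0x88..0x88 + name.len()].copy_from_slice(name); // TD+0x08: mangled name
    img
}

fn main() {
    let base = 0x8200_0000u32;
    let img = build_demo_image(base);
    println!("{:?}", rtti_name(&img, base, base + 0x104));
}
```

A test in the style of `detects_3_method_vtable_in_rdata` could plant the same three structures in `.rdata` to cover the `rtti_present == true` branch.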


@@ -8,23 +8,25 @@ use crate::func::FuncAnalysis;
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum XrefKind {
Call, // bl
Jump, // b (unconditional)
Branch, // bc / bXX (conditional)
DataRead, // lwz, lbz, lhz, lha, lfs, lfd, etc. from resolved address
DataWrite, // stw, stb, sth, stfs, stfd, etc. to resolved address
DataRef, // address computed via lis+addi/ori but not directly loaded/stored
Call, // bl
IndirectCall, // bcctrl through a statically-resolvable vtable slot (M5)
Jump, // b (unconditional)
Branch, // bc / bXX (conditional)
DataRead, // lwz, lbz, lhz, lha, lfs, lfd, etc. from resolved address
DataWrite, // stw, stb, sth, stfs, stfd, etc. to resolved address
DataRef, // address computed via lis+addi/ori but not directly loaded/stored
}
impl XrefKind {
pub fn tag(self) -> &'static str {
match self {
XrefKind::Call => "call",
XrefKind::Jump => "j",
XrefKind::Branch => "br",
XrefKind::DataRead => "read",
XrefKind::DataWrite => "write",
XrefKind::DataRef => "ref",
XrefKind::Call => "call",
XrefKind::IndirectCall => "ind_call",
XrefKind::Jump => "j",
XrefKind::Branch => "br",
XrefKind::DataRead => "read",
XrefKind::DataWrite => "write",
XrefKind::DataRef => "ref",
}
}
@@ -37,10 +39,56 @@ impl XrefKind {
}
}
/// Sub-classification of how `source`'s instruction computes its target
/// address. Only meaningful for data xrefs (`read` / `write` / `ref`); call
/// / jump / branch / ind_call rows store `None`.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum AddrMode {
/// Standard signed-16 displacement: `lwz rD, simm(rA)`, `stw rS, simm(rA)`,
/// FP D-forms (`lfs/lfd/stfs/stfd`), update variants. The dominant case.
DForm,
/// Address materialised via `lis + addi` register tracking — no
/// load/store yet at this site.
LisAddi,
/// Address materialised via `lis + ori` register tracking.
LisOri,
/// Multi-word D-form: `lmw / stmw rS, simm(rA)` — emits one xref per
/// register slot (32-rS slots starting at the resolved base).
Multiword,
/// X-form indexed: stores `stwx / stbx / sthx / stwux / stbux / sthux /
/// stdx / stdux`, the matching loads (`lwzx / lbzx / lhzx / lhax / ldx`
/// and their update forms), plus AltiVec/VMX vector stores `stvx /
/// stvxl / stvebx / stvehx / stvewx`. Static resolution requires both
/// rA and rB constant. (M6 + VMX follow-up.)
XFormIndexed,
/// X-form byte-reverse: `stwbrx / sthbrx / lwbrx / lhbrx`.
XFormByteRev,
/// Reservation/atomic store-conditional: `stwcx. / stdcx.`.
Atomic,
/// Cache-line clear: `dcbz rA, rB` — clears 32 bytes at rA+rB.
DCBZ,
}
impl AddrMode {
pub fn tag(self) -> &'static str {
match self {
AddrMode::DForm => "d_form",
AddrMode::LisAddi => "lis_addi",
AddrMode::LisOri => "lis_ori",
AddrMode::Multiword => "multiword",
AddrMode::XFormIndexed => "x_form_indexed",
AddrMode::XFormByteRev => "x_form_byterev",
AddrMode::Atomic => "atomic",
AddrMode::DCBZ => "dcbz",
}
}
}
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Xref {
pub source: u32,
pub kind: XrefKind,
/// `None` for control-flow edges; `Some(...)` for data edges.
pub addr_mode: Option<AddrMode>,
}
pub type XrefMap = HashMap<u32, Vec<Xref>>;
@@ -53,6 +101,7 @@ pub struct XrefResult {
}
/// Perform full cross-reference analysis on a PE image.
#[tracing::instrument(skip_all, fields(image_base = format_args!("{:#010x}", image_base), entry_point = format_args!("{:#010x}", entry_point)))]
pub fn analyze_xrefs(
pe: &[u8],
image_base: u32,
@@ -61,6 +110,7 @@ pub fn analyze_xrefs(
func_analysis: &FuncAnalysis,
import_map: &HashMap<u32, String>,
) -> XrefResult {
let started = std::time::Instant::now();
let func_labels = func_analysis.generate_labels();
let mut labels: HashMap<u32, String> = func_labels;
labels.insert(entry_point, "entry_point".to_string());
@@ -124,7 +174,7 @@ pub fn analyze_xrefs(
let rd = ((instr >> 21) & 0x1F) as usize;
let ra = ((instr >> 16) & 0x1F) as usize;
let simm = ((instr & 0xFFFF) as i16) as i32;
let uimm = (instr & 0xFFFF) as u32;
let uimm = instr & 0xFFFF;
// Reset tracking on function boundaries (prologue = mfspr rN, LR)
if opcode == 31 {
@@ -156,7 +206,10 @@ pub fn analyze_xrefs(
let data_addr = base.wrapping_add(simm as u32);
if is_in_ranges(data_addr, &data_ranges) {
data_annotations.insert(abs_addr, (data_addr, XrefKind::DataRef));
xrefs.entry(data_addr).or_default().push(Xref { source: abs_addr, kind: XrefKind::DataRef });
xrefs.entry(data_addr).or_default().push(Xref {
source: abs_addr, kind: XrefKind::DataRef,
addr_mode: Some(AddrMode::LisAddi),
});
labels.entry(data_addr).or_insert_with(|| format!("dat_{data_addr:08X}"));
}
reg_hi[rd] = Some(data_addr); // propagate for chained access
@@ -171,7 +224,10 @@ pub fn analyze_xrefs(
let data_addr = base | uimm;
if is_in_ranges(data_addr, &data_ranges) {
data_annotations.insert(abs_addr, (data_addr, XrefKind::DataRef));
xrefs.entry(data_addr).or_default().push(Xref { source: abs_addr, kind: XrefKind::DataRef });
xrefs.entry(data_addr).or_default().push(Xref {
source: abs_addr, kind: XrefKind::DataRef,
addr_mode: Some(AddrMode::LisOri),
});
labels.entry(data_addr).or_insert_with(|| format!("dat_{data_addr:08X}"));
}
reg_hi[ra] = Some(data_addr);
@@ -180,33 +236,163 @@ pub fn analyze_xrefs(
}
}
// Load instructions: lwz, lbz, lhz, lha, lfs, lfd, lwzu, etc.
32 | 33 | 34 | 35 | 40 | 41 | 42 | 43 | 46 | 48 | 49 | 50 | 51 => {
if ra != 0 {
if let Some(base) = reg_hi[ra] {
32 | 33 | 34 | 35 | 40 | 41 | 42 | 43 | 48 | 49 | 50 | 51 => {
if ra != 0
&& let Some(base) = reg_hi[ra] {
let data_addr = base.wrapping_add(simm as u32);
if is_in_ranges(data_addr, &data_ranges) {
data_annotations.insert(abs_addr, (data_addr, XrefKind::DataRead));
xrefs.entry(data_addr).or_default().push(Xref { source: abs_addr, kind: XrefKind::DataRead });
xrefs.entry(data_addr).or_default().push(Xref {
source: abs_addr, kind: XrefKind::DataRead,
addr_mode: Some(AddrMode::DForm),
});
labels.entry(data_addr).or_insert_with(|| format!("dat_{data_addr:08X}"));
}
}
}
// Load into rD may clobber the tracked value
reg_hi[rd] = None;
}
// lmw rD, simm(rA) — D-form multi-word load. Reads (32-rD)
// consecutive 4-byte words starting at base+simm into
// rD..r31. Emits one DataRead per slot.
46 => {
if ra != 0
&& let Some(base) = reg_hi[ra]
{
let mut addr_w = base.wrapping_add(simm as u32);
for _slot in (rd as u32)..32 {
if is_in_ranges(addr_w, &data_ranges) {
data_annotations.insert(abs_addr, (addr_w, XrefKind::DataRead));
xrefs.entry(addr_w).or_default().push(Xref {
source: abs_addr, kind: XrefKind::DataRead,
addr_mode: Some(AddrMode::Multiword),
});
labels.entry(addr_w).or_insert_with(|| format!("dat_{addr_w:08X}"));
}
addr_w = addr_w.wrapping_add(4);
}
}
// lmw overwrites every register from rD through r31.
for r in rd..32 { reg_hi[r] = None; }
}
// Store instructions: stw, stb, sth, stfs, stfd, stwu, etc.
36 | 37 | 38 | 39 | 44 | 45 | 47 | 52 | 53 | 54 | 55 => {
if ra != 0 {
if let Some(base) = reg_hi[ra] {
36 | 37 | 38 | 39 | 44 | 45 | 52 | 53 | 54 | 55 => {
if ra != 0
&& let Some(base) = reg_hi[ra] {
let data_addr = base.wrapping_add(simm as u32);
if is_in_ranges(data_addr, &data_ranges) {
data_annotations.insert(abs_addr, (data_addr, XrefKind::DataWrite));
xrefs.entry(data_addr).or_default().push(Xref { source: abs_addr, kind: XrefKind::DataWrite });
xrefs.entry(data_addr).or_default().push(Xref {
source: abs_addr, kind: XrefKind::DataWrite,
addr_mode: Some(AddrMode::DForm),
});
labels.entry(data_addr).or_insert_with(|| format!("dat_{data_addr:08X}"));
}
}
}
// stmw rS, simm(rA) — D-form multi-word store. Writes
// (32-rS) consecutive 4-byte words from rS..r31 to
// base+simm onward. Emits one DataWrite per slot.
47 => {
if ra != 0
&& let Some(base) = reg_hi[ra]
{
let mut addr_w = base.wrapping_add(simm as u32);
for _slot in (rd as u32)..32 {
if is_in_ranges(addr_w, &data_ranges) {
data_annotations.insert(abs_addr, (addr_w, XrefKind::DataWrite));
xrefs.entry(addr_w).or_default().push(Xref {
source: abs_addr, kind: XrefKind::DataWrite,
addr_mode: Some(AddrMode::Multiword),
});
labels.entry(addr_w).or_insert_with(|| format!("dat_{addr_w:08X}"));
}
addr_w = addr_w.wrapping_add(4);
}
}
}
// X-form: opcode 31 — indexed loads/stores, atomic ops, dcbz.
// The effective address is `(rA|0) + (rB)`: an rA field of 0
// encodes a literal zero, but rB always names a register (r0
// included). We record an xref ONLY when every register
// involved holds a known constant (rare).
// Falls through to the generic-clobber arm afterwards via
// the explicit reg_hi update.
31 => {
let xo = (instr >> 1) & 0x3FF;
let rb = ((instr >> 11) & 0x1F) as usize;
let resolve_rab = |reg_hi: &[Option<u32>; 32]| -> Option<u32> {
let a = if ra == 0 { Some(0u32) } else { reg_hi[ra] };
let b = reg_hi[rb];
match (a, b) {
(Some(av), Some(bv)) => Some(av.wrapping_add(bv)),
_ => None,
}
};
let mode_for_xo = |xo: u32| -> Option<(AddrMode, XrefKind)> {
match xo {
// Atomic store-conditional
150 => Some((AddrMode::Atomic, XrefKind::DataWrite)), // stwcx.
214 => Some((AddrMode::Atomic, XrefKind::DataWrite)), // stdcx.
// Byte-reverse stores
662 => Some((AddrMode::XFormByteRev, XrefKind::DataWrite)), // stwbrx
918 => Some((AddrMode::XFormByteRev, XrefKind::DataWrite)), // sthbrx
// Byte-reverse loads
534 => Some((AddrMode::XFormByteRev, XrefKind::DataRead)), // lwbrx
790 => Some((AddrMode::XFormByteRev, XrefKind::DataRead)), // lhbrx
// dcbz — cache-line zero (32-byte clear). Treat as a write.
1014 => Some((AddrMode::DCBZ, XrefKind::DataWrite)),
// Plain X-form indexed stores (the common ones)
151 => Some((AddrMode::XFormIndexed, XrefKind::DataWrite)), // stwx
215 => Some((AddrMode::XFormIndexed, XrefKind::DataWrite)), // stbx
407 => Some((AddrMode::XFormIndexed, XrefKind::DataWrite)), // sthx
183 => Some((AddrMode::XFormIndexed, XrefKind::DataWrite)), // stwux
247 => Some((AddrMode::XFormIndexed, XrefKind::DataWrite)), // stbux
439 => Some((AddrMode::XFormIndexed, XrefKind::DataWrite)), // sthux
149 => Some((AddrMode::XFormIndexed, XrefKind::DataWrite)), // stdx
181 => Some((AddrMode::XFormIndexed, XrefKind::DataWrite)), // stdux
// Plain X-form indexed loads
23 => Some((AddrMode::XFormIndexed, XrefKind::DataRead)), // lwzx
87 => Some((AddrMode::XFormIndexed, XrefKind::DataRead)), // lbzx
279 => Some((AddrMode::XFormIndexed, XrefKind::DataRead)), // lhzx
343 => Some((AddrMode::XFormIndexed, XrefKind::DataRead)), // lhax
55 => Some((AddrMode::XFormIndexed, XrefKind::DataRead)), // lwzux
119 => Some((AddrMode::XFormIndexed, XrefKind::DataRead)), // lbzux
311 => Some((AddrMode::XFormIndexed, XrefKind::DataRead)), // lhzux
375 => Some((AddrMode::XFormIndexed, XrefKind::DataRead)), // lhaux
21 => Some((AddrMode::XFormIndexed, XrefKind::DataRead)), // ldx
53 => Some((AddrMode::XFormIndexed, XrefKind::DataRead)), // ldux
// AltiVec/VMX (opcode 31) loads & stores. Element
// variants store one byte/halfword/word; full
// `stvx` stores 16 bytes. Address resolution still
// requires both rA and rB constant — common only
// in static-table setup loops.
231 => Some((AddrMode::XFormIndexed, XrefKind::DataWrite)), // stvx
487 => Some((AddrMode::XFormIndexed, XrefKind::DataWrite)), // stvxl
135 => Some((AddrMode::XFormIndexed, XrefKind::DataWrite)), // stvebx
167 => Some((AddrMode::XFormIndexed, XrefKind::DataWrite)), // stvehx
199 => Some((AddrMode::XFormIndexed, XrefKind::DataWrite)), // stvewx
// AltiVec/VMX loads — same XO range, kind=read.
103 => Some((AddrMode::XFormIndexed, XrefKind::DataRead)), // lvx
359 => Some((AddrMode::XFormIndexed, XrefKind::DataRead)), // lvxl
7 => Some((AddrMode::XFormIndexed, XrefKind::DataRead)), // lvebx
39 => Some((AddrMode::XFormIndexed, XrefKind::DataRead)), // lvehx
71 => Some((AddrMode::XFormIndexed, XrefKind::DataRead)), // lvewx
_ => None,
}
};
if let Some((addr_mode, kind)) = mode_for_xo(xo)
&& let Some(data_addr) = resolve_rab(&reg_hi)
&& is_in_ranges(data_addr, &data_ranges)
{
data_annotations.insert(abs_addr, (data_addr, kind));
xrefs.entry(data_addr).or_default().push(Xref {
source: abs_addr, kind,
addr_mode: Some(addr_mode),
});
labels.entry(data_addr).or_insert_with(|| format!("dat_{data_addr:08X}"));
}
// No fall-through in a Rust match: invalidate rD here, since an X-form op may write it.
reg_hi[rd] = None;
}
// Any other instruction writing to rD: invalidate
_ => {
// Conservatively invalidate for instructions that modify rD
@@ -221,6 +407,17 @@ pub fn analyze_xrefs(
}
}
let elapsed_ms = started.elapsed().as_millis() as f64;
metrics::histogram!("analysis.phase_ms", "phase" => "xrefs").record(elapsed_ms);
let total_xrefs: usize = xrefs.values().map(|v| v.len()).sum();
tracing::info!(
labels = labels.len(),
xrefs = total_xrefs,
data_annotations = data_annotations.len(),
elapsed_ms,
"xref analysis complete"
);
XrefResult { labels, xrefs, data_annotations }
}
@@ -235,7 +432,7 @@ fn collect_branch_target(instr: u32, addr: u32, labels: &mut HashMap<u32, String
let target = if aa { li as u32 } else { addr.wrapping_add(li as u32) };
labels.entry(target).or_insert_with(|| format!("loc_{target:08X}"));
let kind = if lk { XrefKind::Call } else { XrefKind::Jump };
xrefs.entry(target).or_default().push(Xref { source: addr, kind });
xrefs.entry(target).or_default().push(Xref { source: addr, kind, addr_mode: None });
}
16 => {
// B-form: bc/bcl
@@ -243,7 +440,7 @@ fn collect_branch_target(instr: u32, addr: u32, labels: &mut HashMap<u32, String
let aa = instr & 2 != 0;
let target = if aa { bd as u32 } else { addr.wrapping_add(bd as u32) };
labels.entry(target).or_insert_with(|| format!("loc_{target:08X}"));
xrefs.entry(target).or_default().push(Xref { source: addr, kind: XrefKind::Branch });
xrefs.entry(target).or_default().push(Xref { source: addr, kind: XrefKind::Branch, addr_mode: None });
}
_ => {}
}
@@ -262,7 +459,7 @@ fn is_in_ranges(addr: u32, ranges: &[(u32, u32)]) -> bool {
}
/// Find which section a data address falls in.
pub fn section_for_addr<'a>(addr: u32, sections: &'a [PeSection], image_base: u32) -> Option<&'a str> {
pub fn section_for_addr(addr: u32, sections: &[PeSection], image_base: u32) -> Option<&str> {
for s in sections {
let start = image_base + s.virtual_address;
let end = start + s.virtual_size;
@@ -285,12 +482,44 @@ pub fn resolve_source_label(
}
// Find the containing function (largest start <= addr)
if let Some((&func_start, _fi)) = func_analysis.functions.range(..=addr).next_back() {
if let Some(func_label) = labels.get(&func_start) {
if let Some((&func_start, _fi)) = func_analysis.functions.range(..=addr).next_back()
&& let Some(func_label) = labels.get(&func_start) {
let offset = addr - func_start;
return format!("{func_label}+0x{offset:X}");
}
}
format!("0x{addr:08X}")
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn addr_mode_tags_are_distinct() {
let modes = [
AddrMode::DForm,
AddrMode::LisAddi,
AddrMode::LisOri,
AddrMode::Multiword,
AddrMode::XFormIndexed,
AddrMode::XFormByteRev,
AddrMode::Atomic,
AddrMode::DCBZ,
];
let tags: std::collections::HashSet<&str> = modes.iter().map(|m| m.tag()).collect();
assert_eq!(tags.len(), modes.len(), "every AddrMode variant must have a unique tag");
}
#[test]
fn xref_struct_carries_addr_mode_for_data_edges() {
let x = Xref { source: 0x1234, kind: XrefKind::DataWrite, addr_mode: Some(AddrMode::DForm) };
assert_eq!(x.addr_mode.unwrap().tag(), "d_form");
}
#[test]
fn xref_struct_addr_mode_is_none_for_call_edges() {
let x = Xref { source: 0x1234, kind: XrefKind::Call, addr_mode: None };
assert!(x.addr_mode.is_none());
}
}
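
For reference, the X-form field extraction and address resolution performed by the opcode-31 arm can be sketched in isolation. This is a minimal sketch rather than the crate's code: `XForm`, `decode_xform`, and `resolve_ea` are hypothetical names, and the `regs` array only mirrors the idea behind `reg_hi`. It assumes the PowerPC rule that the effective address is `(rA|0) + (rB)`, where only an rA field of 0 is read as literal zero.

```rust
/// Fields of an opcode-31 X-form instruction (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct XForm {
    rd: usize,
    ra: usize,
    rb: usize,
    xo: u32,
}

/// Extract the X-form fields, or `None` if the primary opcode is not 31.
fn decode_xform(instr: u32) -> Option<XForm> {
    if instr >> 26 != 31 {
        return None;
    }
    Some(XForm {
        rd: ((instr >> 21) & 0x1F) as usize,
        ra: ((instr >> 16) & 0x1F) as usize,
        rb: ((instr >> 11) & 0x1F) as usize,
        xo: (instr >> 1) & 0x3FF,
    })
}

/// Resolve EA = (rA|0) + (rB) when both inputs are statically known.
/// `regs[n]` is `Some(value)` only if register n holds a known constant.
fn resolve_ea(x: &XForm, regs: &[Option<u32>; 32]) -> Option<u32> {
    // Only the rA field treats 0 as "no base register".
    let a = if x.ra == 0 { Some(0) } else { regs[x.ra] };
    let b = regs[x.rb];
    Some(a?.wrapping_add(b?))
}
```

With `lwzx r3, r4, r5` and only r4 known, `resolve_ea` returns `None`; once r5 is also known, it returns the wrapped sum, which is the only case in which an xref would be recorded.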

@@ -0,0 +1,362 @@
//! DB schema golden — locks the column layout (names + types) of every
//! table written by `DbWriter`. A schema change here without a fixture
//! update fails the test, forcing a conscious decision before downstream
//! query consumers break.
//!
//! The fixture is constructed in-process (no XEX/ISO needed): a small
//! synthetic PE-shaped byte slice with one `.text` section of 4
//! instructions, plus an empty import-library list and one detected
//! function.
use std::collections::{BTreeMap, HashMap};
use std::io::Write;
use duckdb::Connection;
use xenia_analysis::DbWriter;
use xenia_analysis::formatter::DisasmInfo;
use xenia_analysis::func::{FuncAnalysis, FuncInfo};
use xenia_analysis::xref::XrefMap;
use xenia_xex::pe::PeSection;
/// Build a 16-byte `.text` section: 4 instructions (mflr / nop / blr / nop).
fn synthetic_pe() -> (Vec<u8>, Vec<PeSection>, Vec<xenia_xex::header::ImportLibrary>) {
// VA layout: image_base + 0x1000 = .text start (so RVA = 0x1000).
// The DB writer expects pe[rva] to hold the byte at that RVA, so the
// buffer must be at least 0x1000 + section_size bytes long.
const RVA: usize = 0x1000;
const TEXT: [u32; 4] = [
// mfspr r12, LR (a.k.a. mflr r12) — opcode 31, xo 339, spr 8 (LR).
// Encoded with spr halves swapped per the ISA: spr_field = (8<<5).
(31u32 << 26) | (12 << 21) | ((8 << 5) << 11) | (339 << 1),
0x60000000, // nop (ori r0, r0, 0)
(19u32 << 26) | (20 << 21) | (16 << 1), // blr (bclr 20, 0)
0x60000000, // nop
];
let mut pe = vec![0u8; RVA + 16];
for (i, &word) in TEXT.iter().enumerate() {
pe[RVA + i * 4..RVA + i * 4 + 4].copy_from_slice(&word.to_be_bytes());
}
let sections = vec![PeSection {
name: ".text".to_string(),
virtual_address: 0x1000,
virtual_size: 16,
raw_offset: 0x1000,
raw_size: 16,
flags: 0x60000020, // CODE | EXECUTE | READ
}];
let import_libraries = vec![]; // No imports in the fixture.
(pe, sections, import_libraries)
}
fn synthetic_func_analysis(image_base: u32) -> FuncAnalysis {
// Single function covering all four .text instructions.
let entry = image_base + 0x1000;
let mut functions = BTreeMap::new();
functions.insert(
entry,
FuncInfo {
start: entry,
end: entry + 16,
frame_size: 0,
saved_gprs: 0,
is_leaf: true,
is_saverestore: false,
pdata_validated: false,
pdata_length: None,
has_eh: false,
},
);
FuncAnalysis {
functions,
save_gpr_base: None,
restore_gpr_base: None,
pdata_entries: Vec::new(),
}
}
#[test]
fn db_schema_matches_expected_columns() {
let (pe, sections, libs) = synthetic_pe();
let image_base = 0x82000000u32;
let entry = image_base + 0x1000;
let info = DisasmInfo {
image_base,
entry_point: entry,
original_pe_name: Some("synthetic.exe"),
title_id: Some(0xDEADBEEF),
media_id: Some(0xCAFEF00D),
sections: &sections,
import_libraries: &libs,
};
let func_analysis = synthetic_func_analysis(image_base);
let mut labels: HashMap<u32, String> = HashMap::new();
labels.insert(entry, "entry_point".to_string());
let xrefs: XrefMap = XrefMap::new();
let tmp = std::env::temp_dir().join("xenia_rs_schema_golden.duckdb");
let _ = std::fs::remove_file(&tmp);
{
let mut w = DbWriter::open_fresh(&tmp).expect("open fresh DB");
w.write_base(&info).expect("write_base");
w.ingest_instructions(&pe, &info, &func_analysis, &labels)
.expect("ingest_instructions");
w.write_analysis_results(&pe, &info, &func_analysis, &labels, &xrefs, &[], &[], &[], None, &[])
.expect("write_analysis_results");
w.create_sql_views().expect("create_sql_views");
}
let conn = Connection::open(&tmp).expect("reopen DB");
// Lock the column layout per table. Pairs are (name, type).
let expected: &[(&str, &[(&str, &str)])] = &[
("metadata", &[
("key", "VARCHAR"),
("value", "VARCHAR"),
]),
("sections", &[
("name", "VARCHAR"),
("virtual_address", "BIGINT"),
("virtual_size", "BIGINT"),
("raw_offset", "BIGINT"),
("raw_size", "BIGINT"),
("flags", "BIGINT"),
("is_code", "BOOLEAN"),
]),
("imports", &[
("library", "VARCHAR"),
("ordinal", "BIGINT"),
("name", "VARCHAR"),
("record_type", "BIGINT"),
("address", "BIGINT"),
]),
("instructions", &[
("address", "BIGINT"),
("raw", "BIGINT"),
("mnemonic", "VARCHAR"),
("operands", "VARCHAR"),
("disasm", "VARCHAR"),
("ext_mnemonic", "VARCHAR"),
("ext_operands", "VARCHAR"),
("ext_disasm", "VARCHAR"),
("target_hex", "BIGINT"),
("section", "VARCHAR"),
("function", "BIGINT"),
("label", "VARCHAR"),
]),
("functions", &[
("address", "BIGINT"),
("name", "VARCHAR"),
("end_address", "BIGINT"),
("frame_size", "BIGINT"),
("saved_gprs", "BIGINT"),
("is_leaf", "BOOLEAN"),
("is_saverestore", "BOOLEAN"),
("pdata_validated", "BOOLEAN"),
("pdata_length", "BIGINT"),
("has_eh", "BOOLEAN"),
]),
("pdata_entries", &[
("begin_address", "BIGINT"),
("end_address", "BIGINT"),
("function_length", "BIGINT"),
("prolog_length", "BIGINT"),
("flags", "BIGINT"),
]),
("labels", &[
("address", "BIGINT"),
("name", "VARCHAR"),
("kind", "VARCHAR"),
]),
("demangled_names", &[
("address", "BIGINT"),
("mangled", "VARCHAR"),
("raw_demangled", "VARCHAR"),
("namespace_path", "VARCHAR"),
("class_name", "VARCHAR"),
("method_name", "VARCHAR"),
("params_signature", "VARCHAR"),
]),
("vtables", &[
("address", "BIGINT"),
("length", "BIGINT"),
("col_address", "BIGINT"),
("class_name", "VARCHAR"),
("rtti_present", "BOOLEAN"),
("base_classes_json", "VARCHAR"),
]),
("methods", &[
("vtable_address", "BIGINT"),
("slot", "BIGINT"),
("function_address", "BIGINT"),
("mangled_name", "VARCHAR"),
("demangled_name", "VARCHAR"),
]),
("classes", &[
("name", "VARCHAR"),
("vtable_address", "BIGINT"),
("rtti_present", "BOOLEAN"),
("base_classes_json", "VARCHAR"),
]),
("strings", &[
("address", "BIGINT"),
("encoding", "VARCHAR"),
("length", "BIGINT"),
("content", "VARCHAR"),
]),
("tls_info", &[
("raw_data_start", "BIGINT"),
("raw_data_end", "BIGINT"),
("index_address", "BIGINT"),
("callback_array", "BIGINT"),
("zero_fill_size", "BIGINT"),
("characteristics", "BIGINT"),
]),
("tls_callbacks", &[
("slot", "BIGINT"),
("address", "BIGINT"),
]),
("function_pointer_arrays", &[
("address", "BIGINT"),
("length", "BIGINT"),
("kind", "VARCHAR"),
]),
("function_pointer_array_entries", &[
("array_address", "BIGINT"),
("slot", "BIGINT"),
("function_address", "BIGINT"),
]),
("indirect_dispatch_sites", &[
("dispatch_pc", "BIGINT"),
("vptr_offset", "BIGINT"),
("slot", "BIGINT"),
("candidate_count", "BIGINT"),
]),
("indirect_dispatch_candidates", &[
("dispatch_pc", "BIGINT"),
("vtable_address", "BIGINT"),
("method_address", "BIGINT"),
]),
("vptr_writes", &[
("writer_pc", "BIGINT"),
("vtable_address", "BIGINT"),
("vptr_offset", "BIGINT"),
("writer_function", "BIGINT"),
]),
("eh_funcinfo", &[
("address", "BIGINT"),
("magic", "BIGINT"),
("max_state", "BIGINT"),
("p_unwind_map", "BIGINT"),
("n_try_blocks", "BIGINT"),
("p_try_block_map", "BIGINT"),
("n_ip_map_entries", "BIGINT"),
("p_ip_to_state_map", "BIGINT"),
("p_es_type_list", "BIGINT"),
("eh_flags", "BIGINT"),
]),
("eh_unwind_map", &[
("funcinfo_address", "BIGINT"),
("state_index", "BIGINT"),
("to_state", "BIGINT"),
("action_pc", "BIGINT"),
]),
("eh_try_blocks", &[
("funcinfo_address", "BIGINT"),
("try_index", "BIGINT"),
("try_low", "BIGINT"),
("try_high", "BIGINT"),
("catch_high", "BIGINT"),
("n_catches", "BIGINT"),
("p_handler_array", "BIGINT"),
]),
("xrefs", &[
("source", "BIGINT"),
("target", "BIGINT"),
("kind", "VARCHAR"),
("addr_mode", "VARCHAR"),
("instruction", "VARCHAR"),
("source_func", "BIGINT"),
("source_label", "VARCHAR"),
("target_label", "VARCHAR"),
]),
];
let mut errs: Vec<String> = Vec::new();
for (table, cols) in expected {
let mut stmt = conn
.prepare(&format!("PRAGMA table_info('{}')", table))
.unwrap_or_else(|e| panic!("prepare PRAGMA for {table}: {e}"));
let rows: Vec<(String, String)> = stmt
.query_map([], |row| {
let name: String = row.get(1)?;
let ty: String = row.get(2)?;
Ok((name, ty))
})
.expect("query")
.map(|r| r.unwrap())
.collect();
if rows.len() != cols.len() {
writeln!(
std::io::stderr(),
"{table}: column count mismatch (got {}, expected {})",
rows.len(),
cols.len()
).ok();
errs.push(format!("{table}: count {} vs {}", rows.len(), cols.len()));
}
for (i, (got, expected_col)) in rows.iter().zip(cols.iter()).enumerate() {
if got.0 != expected_col.0 || got.1 != expected_col.1 {
errs.push(format!(
"{table} col {i}: got ({}, {}) expected ({}, {})",
got.0, got.1, expected_col.0, expected_col.1
));
}
}
}
assert!(errs.is_empty(), "schema drift detected:\n {}", errs.join("\n "));
// Verify row counts in the populated tables.
let n_instr: i64 = conn
.query_row("SELECT COUNT(*) FROM instructions", [], |r| r.get(0))
.unwrap();
assert_eq!(n_instr, 4, "expected 4 instruction rows from the synthetic PE");
// Neither mflr nor blr carries a direct branch target (blr is indirect), so target_hex is NULL on every row.
let n_with_target: i64 = conn
.query_row("SELECT COUNT(target_hex) FROM instructions", [], |r| r.get(0))
.unwrap();
assert_eq!(n_with_target, 0, "indirect-only fixture should have no direct branch targets");
// SQL views must be queryable. The `_` in SQL LIKE is a single-char
// wildcard, so we list the names explicitly rather than `LIKE 'v_%'`
// (which also matches DuckDB's built-in `views` system view).
let expected_views = [
"v_branch_xrefs",
"v_call_graph",
"v_function_first_instruction",
"v_imports_called",
"v_indirect_reachability_from_entry",
"v_reachability_from_entry",
];
for v in expected_views {
let exists: i64 = conn
.query_row(
"SELECT COUNT(*) FROM duckdb_views() WHERE view_name = ?",
[v],
|r| r.get(0),
)
.unwrap();
assert_eq!(exists, 1, "missing SQL view: {v}");
}
let _ = std::fs::remove_file(&tmp);
}
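
The hand-assembled words in `synthetic_pe` can be cross-checked with a small encoder. The sketch below is illustrative only: `encode_mfspr`, `encode_bclr`, and `to_be_image` are hypothetical helpers, not part of `xenia_analysis`. It shows the SPR half-swap that the fixture comment describes, where the low five bits of the SPR number land in the upper half of the 10-bit field.

```rust
/// Encode `mfspr rD, SPR` (opcode 31, XO 339). The ISA stores the 10-bit
/// SPR number with its two 5-bit halves swapped.
fn encode_mfspr(rd: u32, spr: u32) -> u32 {
    let swapped = ((spr & 0x1F) << 5) | ((spr >> 5) & 0x1F);
    (31 << 26) | ((rd & 0x1F) << 21) | (swapped << 11) | (339 << 1)
}

/// Encode `bclr BO, BI` (opcode 19, XO 16) with LK = 0.
fn encode_bclr(bo: u32, bi: u32) -> u32 {
    (19 << 26) | ((bo & 0x1F) << 21) | ((bi & 0x1F) << 16) | (16 << 1)
}

/// Serialise instruction words big-endian, the byte order PowerPC code
/// uses in the image.
fn to_be_image(words: &[u32]) -> Vec<u8> {
    words.iter().flat_map(|w| w.to_be_bytes()).collect()
}
```

`encode_mfspr(12, 8)` reproduces the fixture's `mflr r12` word and `encode_bclr(20, 0)` reproduces `blr`, so the literal shift expressions in the fixture and the encoder agree.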

@@ -0,0 +1,123 @@
//! Analysis-side goldens: every row in the xenia-cpu fixtures must
//! round-trip cleanly through the [`xenia_analysis::ppc`] shim. This
//! pins the shim's behaviour to the canonical `xenia_cpu::disasm::format`
//! output so that any future refactor of the shim layer surfaces here.
//!
//! Loads the same JSON fixtures committed under
//! `crates/xenia-cpu/tests/golden/`. No separate analysis-side fixture
//! files — the cpu canon is the source of truth.
use std::path::PathBuf;
use serde::Deserialize;
#[derive(Debug, Deserialize)]
struct GoldenRow {
label: String,
raw: String,
addr: String,
mnemonic: String,
operands: String,
#[serde(default)]
ext_mnemonic: Option<String>,
#[serde(default)]
ext_operands: Option<String>,
#[serde(default)]
branch_target: Option<String>,
}
#[derive(Debug, Deserialize)]
struct GoldenFile {
rows: Vec<GoldenRow>,
}
fn cpu_fixture(name: &str) -> PathBuf {
PathBuf::from(env!("CARGO_MANIFEST_DIR"))
.join("..")
.join("xenia-cpu")
.join("tests")
.join("golden")
.join(name)
}
fn parse_hex(s: &str) -> u32 {
let trimmed = s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")).unwrap_or(s);
u32::from_str_radix(trimmed, 16).expect("hex u32")
}
/// Verify the shim's `Decoded { base, ext }` mirrors the canonical fields
/// from `xenia_cpu::disasm::format` for every fixture row.
fn check_fixture(fixture_name: &str) {
let path = cpu_fixture(fixture_name);
assert!(
path.exists(),
"missing fixture {} — run `cargo test -p xenia-cpu --test disasm_goldens` to (re)generate it",
path.display()
);
let src = std::fs::read_to_string(&path).unwrap();
let golden: GoldenFile = serde_json::from_str(&src).unwrap();
for row in &golden.rows {
let raw = parse_hex(&row.raw);
let addr = parse_hex(&row.addr);
let canonical =
xenia_cpu::disasm::format(&xenia_cpu::decode(raw, addr));
let shim = xenia_analysis::ppc::disasm(raw, addr);
assert_eq!(
shim.base, canonical.disasm,
"shim.base drifted for {} (raw={})",
row.label, row.raw,
);
assert_eq!(
shim.ext, canonical.ext_disasm,
"shim.ext drifted for {} (raw={})",
row.label, row.raw,
);
// Also pin against the fixture's structured fields — guards against
// someone changing the cpu canon without regenerating the fixture.
assert_eq!(canonical.mnemonic, row.mnemonic, "mnemonic drift: {}", row.label);
assert_eq!(canonical.operands, row.operands, "operands drift: {}", row.label);
assert_eq!(canonical.ext_mnemonic, row.ext_mnemonic, "ext_mnemonic drift: {}", row.label);
assert_eq!(canonical.ext_operands, row.ext_operands, "ext_operands drift: {}", row.label);
let target_str = canonical.branch_target.map(|t| format!("0x{t:08X}"));
assert_eq!(target_str, row.branch_target, "branch_target drift: {}", row.label);
}
}
#[test]
fn analysis_shim_matches_base_mnemonics() {
check_fixture("base_mnemonics.json");
}
#[test]
fn analysis_shim_matches_extended_mnemonics() {
check_fixture("extended_mnemonics.json");
}
#[test]
fn analysis_shim_matches_vmx128_registers() {
check_fixture("vmx128_registers.json");
}
/// Spot-check that the shim's `display()` returns the extended form when
/// present and falls back to the base otherwise. This is the contract
/// `formatter.rs` and the .asm output rely on.
#[test]
fn shim_display_prefers_extended() {
// ori r0, r0, 0 → base "ori r0, r0, 0x0", ext "nop"
let d = xenia_analysis::ppc::disasm(0x60000000, 0);
assert_eq!(d.display(), "nop");
// addi r3, r1, 16 → no extended form, display falls back to base
let raw = (14u32 << 26) | (3 << 21) | (1 << 16) | 16;
let d = xenia_analysis::ppc::disasm(raw, 0);
assert!(
d.ext.is_none(),
"addi r3, r1, 16 has no extended form (only addi r3, r0, … → li)"
);
assert_eq!(d.display(), d.base);
}
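
The `display()` contract checked by `shim_display_prefers_extended` can be sketched without the crate. The `Decoded` struct here is a hypothetical stand-in with the same two fields the doc comment names (`base`, `ext`); the real type in `xenia_analysis::ppc` may differ. `parse_hex` is a non-panicking variant of the helper above, included as an assumption about how one might surface bad fixture input.

```rust
/// Hypothetical stand-in for the shim's decoded result.
struct Decoded {
    base: String,
    ext: Option<String>,
}

impl Decoded {
    /// Prefer the extended mnemonic form; fall back to the base form.
    fn display(&self) -> &str {
        self.ext.as_deref().unwrap_or(&self.base)
    }
}

/// Parse a hex string with an optional `0x`/`0X` prefix, returning `None`
/// on malformed input instead of panicking.
fn parse_hex(s: &str) -> Option<u32> {
    let digits = s
        .strip_prefix("0x")
        .or_else(|| s.strip_prefix("0X"))
        .unwrap_or(s);
    u32::from_str_radix(digits, 16).ok()
}
```

Returning `&str` from `display()` avoids an allocation on the hot formatting path; whether the real shim returns a borrowed or owned string is not shown in this diff.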

@@ -20,9 +20,21 @@ xenia-apu = { workspace = true }
xenia-hid = { workspace = true }
xenia-debugger = { workspace = true }
xenia-analysis = { workspace = true }
xenia-ui = { workspace = true }
winit = { workspace = true }
tracing = { workspace = true }
tracing-subscriber = { workspace = true }
tracing-appender = { workspace = true }
tracing-chrome = { workspace = true }
tracing-error = { workspace = true }
metrics = { workspace = true }
metrics-util = { workspace = true }
pprof = { workspace = true, optional = true }
anyhow = { workspace = true }
clap = { version = "4", features = ["derive"] }
serde = { workspace = true }
serde_json = { workspace = true }
[features]
default = ["profiling"]
profiling = ["dep:pprof"]

File diff suppressed because it is too large

@@ -0,0 +1,384 @@
//! Logging, tracing, and profiling wiring for the `xenia-rs` CLI.
//!
//! Owns the `tracing-subscriber` registry, optional file / Chrome-trace sinks,
//! the `metrics` debugging recorder, and (behind the `profiling` feature) the
//! `pprof-rs` sampling profiler. All drop-time cleanup (flushing appenders,
//! finalising Chrome output, writing flamegraphs, printing the metrics
//! summary) is carried by [`ObservabilityGuards`] so `main` just has to hold
//! the value until return.
use std::path::{Path, PathBuf};
use anyhow::{bail, Context, Result};
use tracing::Level;
use tracing_error::{ErrorLayer, SpanTrace};
use tracing_subscriber::fmt::format::FmtSpan;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt;
use tracing_subscriber::{fmt, EnvFilter, Layer, Registry};
/// User-selectable observability settings parsed from CLI + environment.
#[derive(Debug, Clone, Default)]
pub struct ObservabilityConfig {
/// If `true`, render console logs as JSON instead of compact text.
pub log_json: bool,
/// Additional log sink file. `.json` → JSON formatter; anything else → text.
pub log_file: Option<PathBuf>,
/// Overrides `RUST_LOG` when set. Passed through `EnvFilter::try_new`.
pub log_filter: Option<String>,
/// Default filter directive used when neither `RUST_LOG` nor
/// [`log_filter`](Self::log_filter) are set.
pub default_level: &'static str,
/// If set, emit a Chrome `about:tracing` JSON trace to this path.
pub trace_chrome: Option<PathBuf>,
/// If set, run the pprof sampling profiler and write output here on drop.
/// Extension `.svg` (or none) → flamegraph; `.pb`, `.proto`, or `.pprof` →
/// protobuf. Any other extension is rejected when the report is written.
pub profile: Option<PathBuf>,
}
impl ObservabilityConfig {
#[allow(dead_code)]
pub fn new(default_level: &'static str) -> Self {
Self {
default_level,
..Self::default()
}
}
}
/// RAII handle returned by [`init`]. Drop flushes the appender, finalises
/// Chrome output, writes the pprof report, and prints the metrics summary.
#[must_use = "drop of ObservabilityGuards is what flushes logs, profiles, and metrics"]
pub struct ObservabilityGuards {
_appender: Option<tracing_appender::non_blocking::WorkerGuard>,
_chrome: Option<tracing_chrome::FlushGuard>,
#[cfg(feature = "profiling")]
pprof: Option<(pprof::ProfilerGuard<'static>, PathBuf)>,
metrics_snapshotter: Option<metrics_util::debugging::Snapshotter>,
}
impl Drop for ObservabilityGuards {
fn drop(&mut self) {
#[cfg(feature = "profiling")]
if let Some((guard, path)) = self.pprof.take() {
if let Err(e) = write_pprof_report(&guard, &path) {
eprintln!("profile write failed: {e:#}");
} else {
tracing::info!(path = %path.display(), "pprof report written");
}
}
if let Some(snap) = self.metrics_snapshotter.take() {
print_metrics_summary(&snap);
}
}
}
/// Build and install the global tracing subscriber + metrics recorder.
pub fn init(config: &ObservabilityConfig) -> Result<ObservabilityGuards> {
let span_events = parse_span_events();
// Resolve the filter directive once, then attach a freshly built
// `EnvFilter` to each sink layer via `.with_filter()`. Previously the
// filter was pushed into the layer `Vec` as its own element, where it
// only gates what the filter layer itself sees; the sibling fmt / chrome /
// file layers kept emitting events it should have filtered out. Per-layer
// filtering is the idiomatic tracing-subscriber pattern and works cleanly
// with boxed layer dispatch.
let directive = resolve_filter_directive(config);
let mut layers: Vec<Box<dyn Layer<Registry> + Send + Sync + 'static>> = Vec::new();
// Console fmt layer — compact text or JSON, always stderr.
let console_layer: Box<dyn Layer<Registry> + Send + Sync + 'static> = if config.log_json {
fmt::layer()
.json()
.with_span_events(span_events.clone())
.with_writer(std::io::stderr)
.with_filter(EnvFilter::try_new(&directive).context("invalid filter")?)
.boxed()
} else {
fmt::layer()
.compact()
.with_span_events(span_events.clone())
.with_writer(std::io::stderr)
.with_filter(EnvFilter::try_new(&directive).context("invalid filter")?)
.boxed()
};
layers.push(console_layer);
// Optional file sink — also filtered.
let appender_guard = match &config.log_file {
Some(path) => {
let (layer, guard) = build_file_layer(path, span_events)?;
layers.push(
layer
.with_filter(EnvFilter::try_new(&directive).context("invalid filter")?)
.boxed(),
);
Some(guard)
}
None => None,
};
// Optional Chrome `about:tracing` sink — intentionally UNFILTERED so
// traces capture the full picture even when the console is quiet.
let chrome_guard = match &config.trace_chrome {
Some(path) => {
let (layer, guard) = tracing_chrome::ChromeLayerBuilder::new()
.file(path.clone())
.include_args(true)
.build();
layers.push(layer.boxed());
Some(guard)
}
None => None,
};
// `tracing-error` layer enables SpanTrace capture in `with_span_trace`.
layers.push(ErrorLayer::default().boxed());
tracing_subscriber::registry()
.with(layers)
.try_init()
.context("tracing subscriber already initialized")?;
// `build_env_filter` is retained for compatibility with older callers;
// `resolve_filter_directive` above is what actually drives the layer
// filters.
let _ = build_env_filter(config);
// Install the metrics debugging recorder. `install` sets the global
// recorder; its snapshotter is held in the guards struct.
let recorder = metrics_util::debugging::DebuggingRecorder::new();
let snapshotter = recorder.snapshotter();
if recorder.install().is_err() {
tracing::warn!("a metrics recorder was already installed; skipping xenia-rs recorder");
}
#[cfg(feature = "profiling")]
let pprof = match &config.profile {
Some(path) => {
let guard = pprof::ProfilerGuardBuilder::default()
.frequency(100)
.blocklist(&["libc", "libgcc", "pthread", "vdso"])
.build()
.context("failed to start pprof sampling profiler")?;
Some((guard, path.clone()))
}
None => None,
};
#[cfg(not(feature = "profiling"))]
if config.profile.is_some() {
bail!("--profile requires building with --features profiling");
}
Ok(ObservabilityGuards {
_appender: appender_guard,
_chrome: chrome_guard,
#[cfg(feature = "profiling")]
pprof,
metrics_snapshotter: Some(snapshotter),
})
}
fn resolve_filter_directive(config: &ObservabilityConfig) -> String {
if let Some(ref f) = config.log_filter {
return f.clone();
}
if let Ok(f) = std::env::var("RUST_LOG")
&& !f.is_empty() {
return f;
}
config.default_level.to_string()
}
fn build_env_filter(config: &ObservabilityConfig) -> Result<EnvFilter> {
// Precedence: explicit --log-filter > RUST_LOG > default_level.
if let Some(ref f) = config.log_filter {
return EnvFilter::try_new(f).context("invalid --log-filter directive");
}
if let Ok(f) = EnvFilter::try_from_default_env() {
return Ok(f);
}
EnvFilter::try_new(config.default_level)
.with_context(|| format!("invalid default filter `{}`", config.default_level))
}
fn parse_span_events() -> FmtSpan {
match std::env::var("RUST_LOG_SPAN_EVENTS").as_deref() {
Ok("full") => FmtSpan::FULL,
Ok("close") => FmtSpan::CLOSE,
Ok("active") => FmtSpan::ACTIVE,
Ok("enter") => FmtSpan::ENTER,
Ok("exit") => FmtSpan::EXIT,
Ok("new") => FmtSpan::NEW,
_ => FmtSpan::NONE,
}
}
type FileLayerBox = Box<dyn Layer<Registry> + Send + Sync + 'static>;
fn build_file_layer(
path: &Path,
span_events: FmtSpan,
) -> Result<(FileLayerBox, tracing_appender::non_blocking::WorkerGuard)> {
let parent = path.parent().unwrap_or_else(|| Path::new("."));
let file_name = path
.file_name()
.ok_or_else(|| anyhow::anyhow!("log file path has no file name: {}", path.display()))?;
std::fs::create_dir_all(parent)
.with_context(|| format!("failed to create {}", parent.display()))?;
let appender = tracing_appender::rolling::never(parent, file_name);
let (non_blocking, guard) = tracing_appender::non_blocking(appender);
let ext = path
.extension()
.and_then(|e| e.to_str())
.unwrap_or_default();
let layer: FileLayerBox = if ext.eq_ignore_ascii_case("json") {
fmt::layer()
.json()
.with_span_events(span_events)
.with_writer(non_blocking)
.with_ansi(false)
.boxed()
} else {
fmt::layer()
.with_span_events(span_events)
.with_writer(non_blocking)
.with_ansi(false)
.boxed()
};
Ok((layer, guard))
}
/// Wrap an error with a captured span-trace so the top-level `main` can
/// render "failed in `cmd_exec > load_image > …`" alongside the regular
/// anyhow context chain.
#[allow(dead_code)]
pub fn with_span_trace<E>(err: E) -> anyhow::Error
where
E: std::error::Error + Send + Sync + 'static,
{
anyhow::Error::new(err).context(SpanTraceDisplay(SpanTrace::capture()))
}
/// Attach a captured span-trace to an existing `anyhow::Error` as extra
/// context. Used at command boundaries where errors already bubble as
/// `anyhow::Error`.
pub fn attach_span_trace(err: anyhow::Error) -> anyhow::Error {
err.context(SpanTraceDisplay(SpanTrace::capture()))
}
struct SpanTraceDisplay(SpanTrace);
impl std::fmt::Display for SpanTraceDisplay {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "span trace:\n{}", self.0)
}
}
impl std::fmt::Debug for SpanTraceDisplay {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
<Self as std::fmt::Display>::fmt(self, f)
}
}
#[cfg(feature = "profiling")]
fn write_pprof_report(guard: &pprof::ProfilerGuard<'static>, path: &Path) -> Result<()> {
let report = guard.report().build().context("pprof report build failed")?;
let ext = path
.extension()
.and_then(|e| e.to_str())
.unwrap_or("")
.to_ascii_lowercase();
let parent = path.parent().unwrap_or_else(|| Path::new("."));
std::fs::create_dir_all(parent).ok();
match ext.as_str() {
"svg" | "" => {
let file = std::fs::File::create(path)
.with_context(|| format!("create {}", path.display()))?;
report
.flamegraph(file)
.context("flamegraph render failed")?;
}
"pb" | "proto" | "pprof" => {
use pprof::protos::Message;
let profile = report.pprof().context("pprof protobuf build failed")?;
let buf = profile
.write_to_bytes()
.context("pprof protobuf encode failed")?;
std::fs::write(path, &buf).with_context(|| format!("write {}", path.display()))?;
}
other => bail!("unknown --profile extension `.{other}` (use .svg or .pb)"),
}
Ok(())
}
fn print_metrics_summary(snap: &metrics_util::debugging::Snapshotter) {
use metrics_util::debugging::DebugValue;
use metrics_util::MetricKind;
let snapshot = snap.snapshot();
let rows = snapshot.into_vec();
if rows.is_empty() {
return;
}
// Group counters, gauges, histograms into simple lines. Use tracing so
// the summary honours the installed subscriber (can land in file + JSON
// sinks and not just stderr).
let mut lines: Vec<String> = Vec::with_capacity(rows.len());
for (key, _unit, _desc, value) in rows {
let kind = match key.kind() {
MetricKind::Counter => "counter",
MetricKind::Gauge => "gauge",
MetricKind::Histogram => "histogram",
};
let name = key.key().name();
let labels: Vec<String> = key
.key()
.labels()
.map(|l| format!("{}={}", l.key(), l.value()))
.collect();
let labels_str = if labels.is_empty() {
String::new()
} else {
format!("{{{}}}", labels.join(","))
};
let value_str = match value {
DebugValue::Counter(n) => n.to_string(),
DebugValue::Gauge(g) => format!("{}", g.into_inner()),
DebugValue::Histogram(samples) => {
if samples.is_empty() {
"empty".to_string()
} else {
let floats: Vec<f64> = samples.iter().map(|s| s.into_inner()).collect();
let count = floats.len();
let sum: f64 = floats.iter().copied().sum();
let min = floats.iter().copied().fold(f64::INFINITY, f64::min);
let max = floats.iter().copied().fold(f64::NEG_INFINITY, f64::max);
format!(
"count={} sum={:.3} min={:.3} max={:.3} mean={:.3}",
count,
sum,
min,
max,
sum / count as f64
)
}
}
};
lines.push(format!(" {kind:<9} {name}{labels_str} = {value_str}"));
}
if tracing::enabled!(Level::INFO) {
tracing::info!("metrics summary:\n{}", lines.join("\n"));
} else {
eprintln!("metrics summary:\n{}", lines.join("\n"));
}
}
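
The precedence implemented by `resolve_filter_directive` (explicit `--log-filter`, then a non-empty `RUST_LOG`, then the default level) can be expressed as a pure function, which makes it testable without touching the process environment. This is a sketch under that assumption; `resolve_directive` and its parameters are hypothetical names, and the caller is expected to pass in the `RUST_LOG` value itself.

```rust
/// Pick the filter directive: explicit CLI filter first, then a non-empty
/// RUST_LOG value, then the configured default.
fn resolve_directive(cli: Option<&str>, rust_log: Option<&str>, default: &str) -> String {
    if let Some(f) = cli {
        return f.to_string();
    }
    match rust_log {
        // An empty RUST_LOG is treated the same as an unset one.
        Some(f) if !f.is_empty() => f.to_string(),
        _ => default.to_string(),
    }
}
```

A caller could feed it `std::env::var("RUST_LOG").ok().as_deref()` for the second argument, keeping the environment read at the edge of the program.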

@@ -0,0 +1,72 @@
# Sylpheed regression goldens
These JSON files anchor `xenia-rs check` digest output for Project Sylpheed.
## Files
| File | -n | Mode | Captures |
|------|----|------|----------|
| `sylpheed_n2m.json` | 2_000_000 | full digest | early boot (swaps=0, no rendering) |
| `sylpheed_n50m.json` | 50_000_000 | stable-digest | first VdSwap pair (swaps=2 post-Phase-A) |
## Stable-digest mode
`sylpheed_n50m.json` is captured with `--stable-digest`, which omits
timing-sensitive counters: `packets` (±28% lockstep noise from a GPU thread
race), `resolves`, `interrupts_delivered`, `interrupts_dropped`,
`texture_decodes`. The remaining fields are byte-identical across repeated
lockstep runs at a fixed -n.
`sylpheed_n2m.json` predates the stable-digest flag and uses full-digest
compare. It still works because at -n 2M the GPU pipeline has not produced any
packets yet — `packets=0` is trivially deterministic.
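
The stable-digest comparison described above can be sketched as follows. This is a minimal illustration, not the `xenia-rs` implementation: the helpers `stable_view`, `stable_eq`, and `digest` are hypothetical names, the digest is modeled as a plain map of counters, and the `packets` values are invented to stand in for lockstep noise.

```rust
use std::collections::BTreeMap;

/// Counters the text above lists as timing-sensitive.
const UNSTABLE_FIELDS: [&str; 5] = [
    "packets",
    "resolves",
    "interrupts_delivered",
    "interrupts_dropped",
    "texture_decodes",
];

/// Hypothetical helper: drop the timing-sensitive counters from a digest.
fn stable_view(digest: &BTreeMap<String, u64>) -> BTreeMap<String, u64> {
    digest
        .iter()
        .filter(|(name, _)| !UNSTABLE_FIELDS.contains(&name.as_str()))
        .map(|(name, value)| (name.clone(), *value))
        .collect()
}

/// Hypothetical helper: two digests match when their stable views are equal.
fn stable_eq(a: &BTreeMap<String, u64>, b: &BTreeMap<String, u64>) -> bool {
    stable_view(a) == stable_view(b)
}

/// Build a digest map from (name, value) pairs.
fn digest(pairs: &[(&str, u64)]) -> BTreeMap<String, u64> {
    pairs.iter().map(|(k, v)| (k.to_string(), *v)).collect()
}

fn main() {
    // Same deterministic fields, different (illustrative) packet counts.
    let run_a = digest(&[("instructions", 50_000_000), ("swaps", 2), ("packets", 1_000)]);
    let run_b = digest(&[("instructions", 50_000_000), ("swaps", 2), ("packets", 1_280)]);
    println!("stable match: {}", stable_eq(&run_a, &run_b));
}
```

Filtering both sides before comparing keeps the golden file free of the noisy fields, so a drift in `swaps` or `draws` still fails the check while a drift in `packets` does not.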
## Circularity hazard
Per ORACBUG-001/002/003, these goldens were captured by running the same code
they validate. They detect **regression** from a known-good snapshot, not
**correctness**. When a planned fix intentionally moves the digest (e.g. a
shader fix landing `draws > 0` for the first time), re-baseline the golden as
a separate commit and reference the audit ID in the message.
## Re-baselining
```sh
cargo build --release -p xenia-app
target/release/xenia-rs check \
"$SYLPHEED_ISO" \
-n 50000000 \
--stable-digest \
--out crates/xenia-app/tests/golden/sylpheed_n50m.json
```
## Running the goldens
```sh
cargo test --release -p xenia-app --test sylpheed_oracles -- --ignored --nocapture
```
The tests are `#[ignore]`-gated because each run takes about 4 seconds, which is
too slow for the default `cargo test` cycle. The ISO path defaults to the
contributor's local `~/RE Project Sylpheed/Project Sylpheed*.iso` and can be
overridden via `SYLPHEED_ISO=/path/to/sylpheed.iso`.
## n4b canonical-invocation regression anchor (deferred)
The audit's recommended next sprint also called for a `sylpheed_n4b.json`
golden capturing the canonical reference invocation
`xenia-rs check sylpheed.iso -n 4_000_000_000 --parallel --reservations-table`.
This is **deferred** because:
1. The `--parallel --reservations-table` combination has proven pathologically
slow at -n 100M (>32 min per run per the audit memory). At -n 4B the run cost
is many hours, not the single-session-friendly 5-15 min the original plan
estimated.
2. Each phase that intentionally moves rendering counters (C, D, E, F) would
need a re-baseline of n4b — a significant time cost compounding over the
sprint.
Once the renderer-unblock phases (C+D+E) land and `draws > 0` is confirmed at
-n 100M lockstep, an n4b artifact may be captured one-shot and stored under
`audit-runs/post-fix/` (not as a test golden) as a manual regression anchor for
the canonical invocation.
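
As a rough check on the "many hours" estimate above, the measured figure can be extrapolated to the n4b budget. The sketch assumes wall time scales linearly with the instruction budget, which is an assumption rather than a measurement, and it treats the >32 min figure as exactly 32 min, so the result is a lower bound. `extrapolate_minutes` is a hypothetical helper.

```rust
/// Lower-bound wall time in minutes for `target_instrs`, assuming cost scales
/// linearly with the instruction budget (an assumption, not a measurement).
fn extrapolate_minutes(measured_minutes: f64, measured_instrs: u64, target_instrs: u64) -> f64 {
    measured_minutes * (target_instrs as f64 / measured_instrs as f64)
}

fn main() {
    // 32 min at -n 100M, scaled to -n 4B (a factor of 40).
    let minutes = extrapolate_minutes(32.0, 100_000_000, 4_000_000_000);
    println!("n4b lower bound: {:.0} min (~{:.1} h)", minutes, minutes / 60.0);
}
```

Under that assumption a single n4b run costs at least 1280 minutes, a little over 21 hours, and every re-baseline pays that cost again.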


@@ -0,0 +1,10 @@
{
"instructions": 2000005,
"imports": 5635,
"unimpl": 0,
"draws": 0,
"swaps": 0,
"unique_render_targets": 0,
"shader_blobs_live": 0,
"texture_cache_entries": 0
}


@@ -0,0 +1,10 @@
{
"instructions": 50000000,
"imports": 339766,
"unimpl": 0,
"draws": 0,
"swaps": 2,
"unique_render_targets": 0,
"shader_blobs_live": 0,
"texture_cache_entries": 0
}


@@ -0,0 +1,111 @@
//! M3 real-parallelism stress harness.
//!
//! Runs `xenia-rs exec sylpheed.iso --parallel --halt-on-deadlock`
//! many times back-to-back to surface lost-wakeups, lock-order
//! inversions, and ABA hazards that a single run wouldn't reliably
//! reproduce. Failures dump per-run stdout/stderr to
//! `target/parallel-stress-NNN.{stdout,stderr}` for post-mortem.
//!
//! Two configurations, both `#[ignore]`-gated (opt-in):
//! - `parallel_stress_short`: 20 runs at -n 5_000_000. Quick smoke
//! check; takes roughly 10 minutes on the current substrate.
//! - `parallel_stress_long`: 100 runs at -n 50_000_000. The full gate
//! from the master plan; expected runtime is hours until the perf gap
//! (Step 05's deferred parking fix) closes.
//!
//! Because both tests are ignored by default, pass `--ignored` plus a
//! name filter to pick one:
//! `cargo test --release -p xenia-app --test parallel_stress
//! -- --ignored --nocapture parallel_stress_short` runs the smoke check;
//! the same command with `parallel_stress_long` runs the full 100x.
use std::process::Command;
use std::time::Instant;
const ISO_DEFAULT: &str = "/home/fabi/RE Project Sylpheed/Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso";
fn iso_path() -> String {
std::env::var("SYLPHEED_ISO").unwrap_or_else(|_| ISO_DEFAULT.to_string())
}
fn run_stress(label: &str, runs: u32, max_instr: u64) {
let bin = env!("CARGO_BIN_EXE_xenia-rs");
let iso = iso_path();
if !std::path::Path::new(&iso).exists() {
eprintln!("{label}: iso not found at {iso}; set SYLPHEED_ISO to override. SKIPPING.");
return;
}
std::fs::create_dir_all("target").ok();
let mut failures: u32 = 0;
let mut wall_ms: Vec<u128> = Vec::with_capacity(runs as usize);
let max_instr_str = max_instr.to_string();
for run in 1..=runs {
let t0 = Instant::now();
let out = Command::new(bin)
.args([
"exec",
&iso,
"-n",
&max_instr_str,
"--parallel",
"--halt-on-deadlock",
"--quiet",
])
.output()
.expect("failed to spawn xenia-rs");
let dt = t0.elapsed().as_millis();
wall_ms.push(dt);
let exit_ok = out.status.success();
let vdswap2 = String::from_utf8_lossy(&out.stderr).contains("VdSwap")
|| String::from_utf8_lossy(&out.stdout).contains("VdSwap");
let _ = vdswap2; // VdSwap=2 not required at -n 5M; tracked for diagnostic only.
if !exit_ok {
failures += 1;
std::fs::write(
format!("target/parallel-stress-{label}-{run:03}.stdout"),
&out.stdout,
)
.ok();
std::fs::write(
format!("target/parallel-stress-{label}-{run:03}.stderr"),
&out.stderr,
)
.ok();
eprintln!(
"{label}: run {run}/{runs} FAILED (wall={}ms, exit={:?})",
dt,
out.status.code()
);
} else {
eprintln!("{label}: run {run}/{runs} ok (wall={dt}ms)");
}
}
wall_ms.sort();
let p50 = wall_ms[wall_ms.len() / 2];
let p95_idx = ((wall_ms.len() - 1) * 95) / 100;
let p95 = wall_ms[p95_idx];
let max = *wall_ms.last().unwrap();
eprintln!(
"{label} summary: runs={runs} ok={} failed={failures} p50={p50}ms p95={p95}ms max={max}ms",
runs - failures,
);
assert_eq!(failures, 0, "{label}: {failures} of {runs} stress runs failed");
}
/// 20 runs at -n 5M. Session-feasible (~10 minutes at the current
/// perf level). Surfaces lost-wakeup / lock-order / phaser-timeout
/// bugs that a single run wouldn't reproduce.
#[test]
#[ignore = "stress test; run via `cargo test ... -- --ignored parallel_stress_short`"]
fn parallel_stress_short() {
run_stress("short", 20, 5_000_000);
}
/// 100 runs at -n 50M. The full M3 follow-up gate per the master
/// plan. Expected runtime is hours until the perf gap closes.
#[test]
#[ignore = "full stress test; run via `cargo test ... -- --ignored parallel_stress_long`"]
fn parallel_stress_long() {
run_stress("long", 100, 50_000_000);
}
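
The summary arithmetic in `run_stress` (sorted samples, `len / 2` for p50, `(len - 1) * 95 / 100` for p95) can be isolated into a small function. The sketch below is a hypothetical refactoring, not part of the harness: `WallSummary` and `summarize` are invented names, and unlike the original it returns `None` for an empty sample set instead of indexing out of bounds.

```rust
/// Summary of wall-clock samples in milliseconds.
#[derive(Debug, PartialEq)]
struct WallSummary {
    p50: u128,
    p95: u128,
    max: u128,
}

/// Index-based percentiles using the same arithmetic as `run_stress`.
/// Returns `None` when there are no samples.
fn summarize(mut wall_ms: Vec<u128>) -> Option<WallSummary> {
    if wall_ms.is_empty() {
        return None;
    }
    wall_ms.sort_unstable();
    let last = wall_ms.len() - 1;
    Some(WallSummary {
        p50: wall_ms[wall_ms.len() / 2],
        p95: wall_ms[(last * 95) / 100],
        max: wall_ms[last],
    })
}

fn main() {
    // Illustrative samples: 20 runs at 100 ms, 200 ms, ..., 2000 ms.
    let samples: Vec<u128> = (1..=20u128).map(|i| i * 100).collect();
    println!("{:?}", summarize(samples));
}
```

With 20 samples the p95 index is `19 * 95 / 100 = 18`, so p95 is the second-slowest run; with fewer samples p95 moves closer to the median, which is worth keeping in mind when reading the short-run summary.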


@@ -0,0 +1,85 @@
//! Sylpheed boot-sequence regression oracles.
//!
//! These tests run `xenia-rs check` against the Project Sylpheed ISO and
//! compare the resulting digest to a checked-in JSON file via `--stable-digest`,
//! which excludes timing-sensitive counters (`packets`, `interrupts_*`,
//! `resolves`, `texture_decodes`). The remaining fields are deterministic in
//! lockstep at a fixed instruction budget — verified empirically across 3
//! consecutive runs.
//!
//! Goldens are CIRCULAR per ORACBUG-001/002/003: they were captured by running
//! the same code they validate. Treat them as **regression anchors** (catch
//! drift from a known-good snapshot) not **correctness anchors** (no claim
//! about absolute behavior). When a planned fix intentionally moves the
//! digest (e.g. swap fix → `swaps` increments; renderer fix → `draws` becomes
//! non-zero), re-baseline the golden as a separate commit.
//!
//! Tests are `#[ignore]`-gated because the runs take ~4 seconds each, which
//! is unacceptable for the default `cargo test` cycle. Run explicitly:
//! cargo test --release -p xenia-app --test sylpheed_oracles -- --ignored --nocapture
//!
//! ISO path is read from the `SYLPHEED_ISO` env var, falling back to the
//! hard-coded absolute path in `ISO_DEFAULT`. CI/contributors without the
//! ISO will see the test print a SKIPPING notice and return early.
use std::process::Command;
const ISO_DEFAULT: &str = "/home/fabi/RE Project Sylpheed/Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso";
fn iso_path() -> String {
std::env::var("SYLPHEED_ISO").unwrap_or_else(|_| ISO_DEFAULT.to_string())
}
fn run_oracle(label: &str, max_instr: u64, golden_rel: &str) {
let bin = env!("CARGO_BIN_EXE_xenia-rs");
let iso = iso_path();
if !std::path::Path::new(&iso).exists() {
eprintln!("{label}: iso not found at {iso}; set SYLPHEED_ISO to override. SKIPPING.");
return;
}
// Resolve the golden path relative to the test's CARGO_MANIFEST_DIR so the
// test runs correctly from any cwd.
let manifest_dir = env!("CARGO_MANIFEST_DIR");
let golden = std::path::Path::new(manifest_dir).join(golden_rel);
assert!(
golden.exists(),
"{label}: golden file missing at {}",
golden.display()
);
let max_instr_str = max_instr.to_string();
let golden_str = golden.to_string_lossy().to_string();
let out = Command::new(bin)
.args([
"check",
&iso,
"-n",
&max_instr_str,
"--stable-digest",
"--expect",
&golden_str,
])
.output()
.expect("failed to spawn xenia-rs");
if !out.status.success() {
eprintln!(
"{label}: STDOUT:\n{}\nSTDERR:\n{}",
String::from_utf8_lossy(&out.stdout),
String::from_utf8_lossy(&out.stderr),
);
panic!("{label}: digest mismatch (exit {:?})", out.status.code());
}
}
/// Sylpheed boot to first VdSwap pair, captured at -n 50M lockstep.
/// Catches regressions in: addi/addic semantics, kernel HLE for VdSwap path,
/// thread spawning, file I/O for sound/config. With Phase A's swap fix landed,
/// `swaps` should be 2 and `draws` 0 (Phase E gates draws>0).
#[test]
#[ignore = "long-running; run via `cargo test ... -- --ignored sylpheed_n50m`"]
fn sylpheed_n50m() {
run_oracle("sylpheed_n50m", 50_000_000, "tests/golden/sylpheed_n50m.json");
}


@@ -10,3 +10,11 @@ xenia-memory = { workspace = true }
tracing = { workspace = true }
bitflags = { workspace = true }
thiserror = { workspace = true }
[dev-dependencies]
serde = { workspace = true }
serde_json = { workspace = true }
[[bench]]
name = "interpreter"
harness = false


@@ -0,0 +1,194 @@
//! Interpreter throughput micro-benchmarks.
//!
//! Custom `harness = false` main; no extra dev-deps. Run via
//! `cargo bench -p xenia-cpu` (or `cargo bench -p xenia-cpu --bench interpreter`
//! to select only this target).
//!
//! Three workloads, each measuring `step_cached` throughput in MIPS:
//!
//! - `tight_alu_loop` — pure dispatch + ALU + decode-cache hit.
//! - `loadstore_loop` — alternating `lwz`/`stw` against main RAM. Stresses
//! every load/store path and `find_mmio` dispatch.
//! - `mmio_storm` — same shape as `loadstore_loop` but the address is
//! in a registered MMIO aperture. Sanity-checks that
//! MMIO writes still dispatch correctly.
//!
//! These are not statistically rigorous — no warmup, no variance — they're
//! just enough to detect 2x-class wins or regressions on the perf-track
//! changes (MMIO fast-reject, threaded dispatch, block cache). Numbers go
//! into commit messages; there is no automated baseline file.
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Instant;
use xenia_cpu::context::PpcContext;
use xenia_cpu::decoder::DecodeCache;
use xenia_cpu::interpreter::{step_cached, StepResult};
use xenia_memory::{GuestMemory, MemoryAccess, MmioRegion};
use xenia_memory::page_table::MemoryProtect;
// PPC instruction encoders — minimal subset needed by the benches.
#[inline]
fn enc_addi(rd: u32, ra: u32, simm: i16) -> u32 {
(14 << 26) | (rd << 21) | (ra << 16) | (simm as u16 as u32)
}
#[inline]
fn enc_lwz(rd: u32, ra: u32, d: i16) -> u32 {
(32 << 26) | (rd << 21) | (ra << 16) | (d as u16 as u32)
}
#[inline]
fn enc_stw(rs: u32, ra: u32, d: i16) -> u32 {
(36 << 26) | (rs << 21) | (ra << 16) | (d as u16 as u32)
}
/// Set up a `GuestMemory` with one writable region for code+data.
fn make_mem(code_base: u32, code_size: u32) -> GuestMemory {
let mut mem = GuestMemory::new().expect("reserve 4GB");
mem.alloc(code_base, code_size, MemoryProtect::READ | MemoryProtect::WRITE)
.expect("alloc bench region");
mem
}
/// Write a sequence of raw PPC instructions starting at `base`.
fn write_program(mem: &GuestMemory, base: u32, instrs: &[u32]) {
for (i, &raw) in instrs.iter().enumerate() {
mem.write_u32(base + (i as u32 * 4), raw);
}
}
/// Run `total_instrs` interpreter steps over a program of length `n`,
/// wrapping PC back to `base` whenever it falls off the end. Returns the
/// elapsed wall time.
fn run_loop(
ctx: &mut PpcContext,
mem: &GuestMemory,
cache: &mut DecodeCache,
base: u32,
n: u32,
total_instrs: u64,
) -> std::time::Duration {
let end = base + n * 4;
ctx.pc = base;
let t0 = Instant::now();
for _ in 0..total_instrs {
let pv = mem.page_version(ctx.pc);
let r = step_cached(ctx, mem, cache, pv);
debug_assert!(matches!(r, StepResult::Continue));
if ctx.pc >= end {
ctx.pc = base;
}
}
t0.elapsed()
}
fn report(label: &str, total_instrs: u64, elapsed: std::time::Duration) {
let secs = elapsed.as_secs_f64();
let mips = (total_instrs as f64) / secs / 1.0e6;
println!(
"{:<24} {:>12} instrs in {:>7.3}s = {:>7.2} MIPS",
label, total_instrs, secs, mips
);
}
fn bench_tight_alu_loop() {
const BASE: u32 = 0x1000;
const N: u32 = 256;
const TOTAL: u64 = 50_000_000;
let mut mem = make_mem(BASE, 0x1000);
// 256 × `addi r3, r3, 1`: a register-immediate ALU op with no memory touch
// beyond instruction fetch.
let prog: Vec<u32> = (0..N).map(|_| enc_addi(3, 3, 1)).collect();
write_program(&mut mem, BASE, &prog);
let mut ctx = PpcContext::new();
let mut cache = DecodeCache::new();
let elapsed = run_loop(&mut ctx, &mut mem, &mut cache, BASE, N, TOTAL);
report("tight_alu_loop", TOTAL, elapsed);
}
fn bench_loadstore_loop() {
const CODE_BASE: u32 = 0x1000;
const DATA_BASE: u32 = 0x2000;
const N: u32 = 256;
const TOTAL: u64 = 30_000_000;
let mut mem = make_mem(CODE_BASE, 0x2000);
// 128 pairs of `stw r3, 0(r4); lwz r5, 0(r4)` — exercises every
// load/store path through `read_u32`/`write_u32` (incl. `find_mmio`).
let mut prog = Vec::with_capacity(N as usize);
for _ in 0..(N / 2) {
prog.push(enc_stw(3, 4, 0));
prog.push(enc_lwz(5, 4, 0));
}
write_program(&mut mem, CODE_BASE, &prog);
let mut ctx = PpcContext::new();
ctx.gpr[3] = 0xDEAD_BEEF;
ctx.gpr[4] = DATA_BASE as u64;
let mut cache = DecodeCache::new();
let elapsed = run_loop(&mut ctx, &mut mem, &mut cache, CODE_BASE, N, TOTAL);
report("loadstore_loop", TOTAL, elapsed);
}
fn bench_mmio_storm() {
const CODE_BASE: u32 = 0x1000;
const MMIO_BASE: u32 = 0xEA00_0000;
const N: u32 = 64;
// MMIO is slower per access — keep total smaller so the bench stays
// under a few seconds.
const TOTAL: u64 = 2_000_000;
let mut mem = make_mem(CODE_BASE, 0x1000);
let writes = Arc::new(AtomicU64::new(0));
let reads = Arc::new(AtomicU32::new(0));
let writes_clone = writes.clone();
let reads_clone = reads.clone();
mem.add_mmio_region(MmioRegion {
base_address: MMIO_BASE,
mask: 0xFFFF_0000,
size: 0x0001_0000,
read_callback: Box::new(move |_a| {
reads_clone.fetch_add(1, Ordering::Relaxed);
0
}),
write_callback: Box::new(move |_a, _v| {
writes_clone.fetch_add(1, Ordering::Relaxed);
}),
});
let mut prog = Vec::with_capacity(N as usize);
for _ in 0..(N / 2) {
prog.push(enc_stw(3, 4, 0));
prog.push(enc_lwz(5, 4, 0));
}
write_program(&mut mem, CODE_BASE, &prog);
let mut ctx = PpcContext::new();
ctx.gpr[3] = 0x1234_5678;
ctx.gpr[4] = MMIO_BASE as u64;
let mut cache = DecodeCache::new();
let elapsed = run_loop(&mut ctx, &mut mem, &mut cache, CODE_BASE, N, TOTAL);
report("mmio_storm", TOTAL, elapsed);
// Sanity assertions: catch a refactor that silently breaks MMIO dispatch.
let w = writes.load(Ordering::Relaxed);
let r = reads.load(Ordering::Relaxed);
assert_eq!(w, TOTAL / 2, "expected MMIO writes to be dispatched");
assert_eq!(r as u64, TOTAL / 2, "expected MMIO reads to be dispatched");
}
fn main() {
println!("xenia-cpu interpreter bench");
println!(" build: {}", if cfg!(debug_assertions) { "debug" } else { "release" });
bench_tight_alu_loop();
bench_loadstore_loop();
bench_mmio_storm();
}
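
The encoders in this bench build PowerPC D-form words: primary opcode in the top 6 bits, then two 5-bit register fields, then a 16-bit immediate. The sketch below repeats `enc_addi` from the bench and adds a hypothetical inverse, `dform_fields`, purely to show the layout; `mips` mirrors the arithmetic in `report`. The 2.0 s elapsed time is an illustrative value, not a measurement.

```rust
/// Encode `addi rD, rA, SIMM` (D-form, primary opcode 14), as in the bench.
fn enc_addi(rd: u32, ra: u32, simm: i16) -> u32 {
    (14 << 26) | (rd << 21) | (ra << 16) | (simm as u16 as u32)
}

/// Hypothetical inverse, for illustration only: split a D-form word into
/// (primary opcode, rD/rS, rA, signed 16-bit immediate).
fn dform_fields(raw: u32) -> (u32, u32, u32, i16) {
    (raw >> 26, (raw >> 21) & 0x1F, (raw >> 16) & 0x1F, raw as u16 as i16)
}

/// Throughput in millions of instructions per second, as `report` computes it.
fn mips(total_instrs: u64, elapsed_secs: f64) -> f64 {
    total_instrs as f64 / elapsed_secs / 1.0e6
}

fn main() {
    let raw = enc_addi(3, 3, 1);
    println!("addi r3, r3, 1 = {raw:#010x}");
    println!("fields = {:?}", dform_fields(raw));
    // Illustrative timing: 50M instructions in 2.0 s.
    println!("{:.2} MIPS", mips(50_000_000, 2.0));
}
```

The `simm as u16 as u32` cast matters for negative displacements: it keeps the two's-complement bit pattern in the low 16 bits without sign-extending into the register fields.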


@@ -0,0 +1,423 @@
//! Tier-4 perf — basic-block cache for the PPC interpreter.
//!
//! `DecodeCache` (in [`crate::decoder`]) caches one decoded instruction
//! per slot, indexed by PC. The hot loop still pays the per-instruction
//! cost of fetching the raw word, hashing the PC into a slot, and
//! comparing tags. For straight-line code — common in the asset/inflate
//! loops where Sylpheed boot is currently CPU-bound — the savings of
//! batching N decoded instructions per slot lookup are linear in block
//! length.
//!
//! ## Shape
//!
//! A `DecodedBlock` is a contiguous run of decoded instructions starting
//! at `start_pc`, ending at the first *block terminator* (any branch,
//! `sc`, trap, or `Invalid`) or at one of two safety limits:
//!
//! - [`MAX_BLOCK_INSTRS`] caps memory growth and re-build cost.
//! - A stop at the 4 KiB page boundary. A block is fully contained inside a
//! single 4 KiB guest page; that means `mem.page_version(start_pc)`
//! is sufficient to detect any code-page rewrite that should
//! invalidate the block. Without this rule the cache would have to
//! walk every spanned page on every hit, which would erase the win.
//!
//! ## Invalidation
//!
//! Each block stamps the page version at build time. On lookup, if
//! `mem.page_version(start_pc)` differs from `block.page_version`, the
//! slot is rebuilt. Same mechanism `DecodeCache` uses, just at
//! block granularity.
//!
//! ## Debugger semantics
//!
//! Block dispatch is **opt-in** by the caller. The hot loop in
//! `xenia-app/src/main.rs` selects the per-instruction path whenever
//! `Debugger::wants_hooks()` is true or any `--trace-*` flag is set.
//! That's how single-step, breakpoints, in-memory trace, instruction
//! trace, and branch trace continue to observe every PC: the block
//! cache simply never runs in those modes.
use crate::decoder::{decode, DecodedInstr};
use xenia_memory::MemoryAccess;
/// Direct-mapped block-cache slot count. Same shape as
/// [`crate::decoder::DECODE_CACHE_SIZE`] — 64 K slots indexed by the
/// low 16 bits of `start_pc >> 2`. With Sylpheed-class workloads the
/// slot collision rate is negligible.
const BLOCK_CACHE_SIZE: usize = 1 << 16;
const BLOCK_CACHE_MASK: u32 = (BLOCK_CACHE_SIZE - 1) as u32;
/// Hard cap on instructions per block. Keeps the worst-case memory
/// footprint bounded and limits the rebuild cost when a code page
/// gets bumped. 32 instructions is generous for most basic blocks
/// (real-world average across Sylpheed boot is ~6 between branches).
pub const MAX_BLOCK_INSTRS: usize = 32;
/// Guest page size — duplicated here to avoid pulling
/// `xenia-memory::heap` internals into `xenia-cpu`. Must stay in sync
/// with the memory crate. Both refer to the architectural PowerPC 4 KiB
/// page granule, so this constant is locked.
const GUEST_PAGE_SIZE: u32 = 4096;
const GUEST_PAGE_MASK: u32 = !(GUEST_PAGE_SIZE - 1);
/// One cached basic block. Owned by [`BlockCache`]; a `&DecodedBlock`
/// is handed to the interpreter via [`BlockCache::lookup_or_build`] and
/// stays valid until the next `lookup_or_build` on the same slot.
#[derive(Debug)]
pub struct DecodedBlock {
/// Guest PC at which this block starts. Used as the slot tag.
pub start_pc: u32,
/// Guest PC immediately after the last instruction in `instrs`.
/// Equal to `instrs.last().addr + 4` whether or not the block
/// ended on a terminator. Useful for tracing / disassembly.
pub end_pc: u32,
/// `mem.page_version(start_pc)` at build time. Mismatch on lookup
/// invalidates the block. Single value because every block is
/// page-bounded by construction.
pub page_version: u64,
/// Decoded instructions in execution order. Always non-empty after
/// a successful build (`MAX_BLOCK_INSTRS >= 1` and the build walk
/// pushes the first decoded word unconditionally).
pub instrs: Vec<DecodedInstr>,
}
/// Per-slot status from a `lookup_or_build` probe. Internal only.
enum CacheStatus {
/// Block at this slot matches `pc` and the page version at build
/// time matches `mem.page_version(pc)` — return as-is.
Hit,
/// Block at this slot matched `pc` but the page version has
/// advanced — rebuild and bump `invalidations`.
Stale,
/// Slot is empty or holds a block keyed at a different `start_pc`.
/// Build a fresh block and bump `misses`.
Miss,
}
/// Direct-mapped block cache. One instance shared across all HW slots
/// (block contents are PC-only and read-only after fill). Not
/// thread-safe — owner is the single scheduler thread, same as
/// `DecodeCache`.
pub struct BlockCache {
slots: Box<[Option<Box<DecodedBlock>>]>,
hits: u64,
misses: u64,
invalidations: u64,
}
impl Default for BlockCache {
fn default() -> Self {
Self::new()
}
}
impl BlockCache {
pub fn new() -> Self {
// `Option<Box<T>>` is a niche-optimized 8-byte slot; 64 K of
// them cost ~512 KiB of cold storage. Live blocks beyond that
// sit on the heap.
let mut v: Vec<Option<Box<DecodedBlock>>> = Vec::with_capacity(BLOCK_CACHE_SIZE);
v.resize_with(BLOCK_CACHE_SIZE, || None);
Self {
slots: v.into_boxed_slice(),
hits: 0,
misses: 0,
invalidations: 0,
}
}
pub fn hits(&self) -> u64 {
self.hits
}
pub fn misses(&self) -> u64 {
self.misses
}
pub fn invalidations(&self) -> u64 {
self.invalidations
}
/// Return the cached block starting at `pc`, building it if absent
/// or stale. The returned reference is borrowed from the cache and
/// stays valid until the next `lookup_or_build` call.
pub fn lookup_or_build(&mut self, pc: u32, mem: &dyn MemoryAccess) -> &DecodedBlock {
let idx = ((pc >> 2) & BLOCK_CACHE_MASK) as usize;
let cur_pv = mem.page_version(pc);
// Phase 1: classify the slot. Borrow ends before fill so the
// mutable update below doesn't conflict.
let status = match &self.slots[idx] {
Some(b) if b.start_pc == pc && b.page_version == cur_pv => CacheStatus::Hit,
Some(b) if b.start_pc == pc => CacheStatus::Stale,
_ => CacheStatus::Miss,
};
// Phase 2: fill on miss/stale, account.
match status {
CacheStatus::Hit => {
self.hits += 1;
}
CacheStatus::Stale => {
self.invalidations += 1;
self.misses += 1;
let block = build_block(pc, mem, cur_pv);
self.slots[idx] = Some(Box::new(block));
}
CacheStatus::Miss => {
self.misses += 1;
let block = build_block(pc, mem, cur_pv);
self.slots[idx] = Some(Box::new(block));
}
}
// Slot is guaranteed populated at this point — Hit returned a
// pre-existing block, Miss/Stale just wrote a new one.
self.slots[idx]
.as_deref()
.expect("block freshly built or hit")
}
}
/// Walk forward from `pc`, decoding instructions and collecting them
/// into a `DecodedBlock`. The walk stops on the first of:
/// - an instruction for which [`PpcOpcode::terminates_block`] returns true
/// (the terminator IS included as the last instruction),
/// - reaching [`MAX_BLOCK_INSTRS`],
/// - the next PC would cross a 4 KiB guest page boundary.
fn build_block(start_pc: u32, mem: &dyn MemoryAccess, page_version: u64) -> DecodedBlock {
let mut instrs: Vec<DecodedInstr> = Vec::with_capacity(8);
let page_base = start_pc & GUEST_PAGE_MASK;
let mut cur = start_pc;
loop {
let raw = mem.read_u32(cur);
let decoded = decode(raw, cur);
let terminates = decoded.opcode.terminates_block();
instrs.push(decoded);
if terminates {
break;
}
if instrs.len() >= MAX_BLOCK_INSTRS {
break;
}
let next = cur.wrapping_add(4);
if (next & GUEST_PAGE_MASK) != page_base {
break;
}
cur = next;
}
let last = instrs.last().expect("build pushes at least one instruction");
let end_pc = last.addr.wrapping_add(4);
DecodedBlock {
start_pc,
end_pc,
page_version,
instrs,
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::opcode::PpcOpcode;
use std::cell::Cell;
/// 64 KiB byte-array memory, big-endian word reads.
/// Mirrors `interpreter::tests::TestMem` but lives here so block_cache
/// tests don't depend on interpreter internals.
struct BlockTestMem {
data: Box<[Cell<u8>]>,
version_a: u64,
version_b: u64,
// Address of the page whose version is `version_b` instead of
// `version_a`. Used to model an out-of-band page-version bump in
// the invalidation test without going through write_*.
bumped_page: Cell<Option<u32>>,
}
impl BlockTestMem {
fn new() -> Self {
Self {
data: (0..0x10000u32).map(|_| Cell::new(0)).collect(),
version_a: 1,
version_b: 2,
bumped_page: Cell::new(None),
}
}
fn put(&self, addr: u32, raw: u32) {
let a = addr as usize;
for (i, byte) in raw.to_be_bytes().iter().enumerate() {
self.data[a + i].set(*byte);
}
}
}
impl MemoryAccess for BlockTestMem {
fn read_u8(&self, a: u32) -> u8 { self.data[a as usize].get() }
fn read_u16(&self, a: u32) -> u16 {
let i = a as usize;
u16::from_be_bytes([self.data[i].get(), self.data[i + 1].get()])
}
fn read_u32(&self, a: u32) -> u32 {
let i = a as usize;
u32::from_be_bytes([
self.data[i].get(), self.data[i + 1].get(),
self.data[i + 2].get(), self.data[i + 3].get(),
])
}
fn read_u64(&self, a: u32) -> u64 {
let i = a as usize;
u64::from_be_bytes([
self.data[i].get(), self.data[i + 1].get(),
self.data[i + 2].get(), self.data[i + 3].get(),
self.data[i + 4].get(), self.data[i + 5].get(),
self.data[i + 6].get(), self.data[i + 7].get(),
])
}
fn write_u8(&self, a: u32, v: u8) { self.data[a as usize].set(v); }
fn write_u16(&self, a: u32, v: u16) {
let i = a as usize;
let b = v.to_be_bytes();
self.data[i].set(b[0]);
self.data[i + 1].set(b[1]);
}
fn write_u32(&self, a: u32, v: u32) {
let i = a as usize;
for (k, byte) in v.to_be_bytes().iter().enumerate() {
self.data[i + k].set(*byte);
}
}
fn write_u64(&self, a: u32, v: u64) {
let i = a as usize;
for (k, byte) in v.to_be_bytes().iter().enumerate() {
self.data[i + k].set(*byte);
}
}
fn translate(&self, _: u32) -> Option<*const u8> { None }
fn translate_mut(&self, _: u32) -> Option<*mut u8> { None }
fn page_version(&self, addr: u32) -> u64 {
if Some(addr & GUEST_PAGE_MASK) == self.bumped_page.get() {
self.version_b
} else {
self.version_a
}
}
}
// PPC encodings — minimal subset for these tests.
fn enc_addi(rd: u32, ra: u32, simm: i16) -> u32 {
(14 << 26) | (rd << 21) | (ra << 16) | (simm as u16 as u32)
}
fn enc_b_self() -> u32 {
// b 0 — branch to self (LI=0). Opcode=18, AA=0, LK=0.
18 << 26
}
fn enc_unimplemented() -> u32 {
// Use opcode 0 raw = 0; decoder maps to Invalid.
0
}
#[test]
fn block_built_to_terminator() {
let mem = BlockTestMem::new();
mem.put(0x100, enc_addi(3, 3, 1));
mem.put(0x104, enc_addi(3, 3, 1));
mem.put(0x108, enc_addi(3, 3, 1));
mem.put(0x10C, enc_b_self()); // terminator
let mut bc = BlockCache::new();
let b = bc.lookup_or_build(0x100, &mem);
assert_eq!(b.start_pc, 0x100);
assert_eq!(b.instrs.len(), 4);
assert_eq!(b.instrs.last().unwrap().opcode, PpcOpcode::bx);
assert_eq!(b.end_pc, 0x110);
}
#[test]
fn block_stops_at_page_boundary() {
// Build from 0x1FFC. The next PC (0x2000) is in a different
// 4 KiB page — block must contain only the one instruction.
let mem = BlockTestMem::new();
mem.put(0x1FFC, enc_addi(3, 3, 1));
mem.put(0x2000, enc_addi(3, 3, 1));
let mut bc = BlockCache::new();
let b = bc.lookup_or_build(0x1FFC, &mem);
assert_eq!(b.instrs.len(), 1);
assert_eq!(b.end_pc, 0x2000);
}
#[test]
fn block_stops_at_max_len() {
// 64 consecutive non-terminator instructions on one page —
// block must clamp at MAX_BLOCK_INSTRS.
let mem = BlockTestMem::new();
for i in 0..64u32 {
mem.put(0x100 + i * 4, enc_addi(3, 3, 1));
}
let mut bc = BlockCache::new();
let b = bc.lookup_or_build(0x100, &mem);
assert_eq!(b.instrs.len(), MAX_BLOCK_INSTRS);
assert_eq!(b.end_pc, 0x100 + (MAX_BLOCK_INSTRS as u32) * 4);
}
#[test]
fn block_stops_at_invalid_opcode() {
// An instruction that the decoder maps to `Invalid` is treated as a
// block terminator so the per-instruction Unimplemented path is preserved.
let mem = BlockTestMem::new();
mem.put(0x100, enc_addi(3, 3, 1));
mem.put(0x104, enc_unimplemented());
mem.put(0x108, enc_addi(3, 3, 1));
let mut bc = BlockCache::new();
let b = bc.lookup_or_build(0x100, &mem);
assert_eq!(b.instrs.len(), 2);
assert_eq!(b.instrs.last().unwrap().opcode, PpcOpcode::Invalid);
}
#[test]
fn block_invalidates_on_page_version_bump() {
let mem = BlockTestMem::new();
mem.put(0x100, enc_addi(3, 3, 1));
mem.put(0x104, enc_b_self());
let mut bc = BlockCache::new();
let _ = bc.lookup_or_build(0x100, &mem);
assert_eq!(bc.misses(), 1);
assert_eq!(bc.hits(), 0);
// Same call → hit.
let _ = bc.lookup_or_build(0x100, &mem);
assert_eq!(bc.hits(), 1);
assert_eq!(bc.invalidations(), 0);
// Bump the page version on the page containing 0x100. Next
// lookup must invalidate and rebuild.
mem.bumped_page.set(Some(0x100 & GUEST_PAGE_MASK));
let _ = bc.lookup_or_build(0x100, &mem);
assert_eq!(bc.invalidations(), 1);
assert_eq!(bc.misses(), 2);
}
#[test]
fn block_hit_returns_same_contents() {
// Sanity: cache hit returns a block whose contents reflect the
// ORIGINAL instruction stream, even after a non-version-bumping
// poke to the underlying bytes. (No real workload would do
// this, but it confirms we're returning cached data, not
// re-reading.)
let mem = BlockTestMem::new();
mem.put(0x100, enc_addi(3, 3, 7));
mem.put(0x104, enc_b_self());
let mut bc = BlockCache::new();
let first_simm = bc.lookup_or_build(0x100, &mem).instrs[0].simm16();
// Rewrite without bumping version (test-only path).
let bytes = enc_addi(3, 3, 99).to_be_bytes();
for (i, b) in bytes.iter().enumerate() {
mem.data[0x100 + i].set(*b);
}
let cached_simm = bc.lookup_or_build(0x100, &mem).instrs[0].simm16();
assert_eq!(first_simm, 7);
assert_eq!(cached_simm, 7, "cache must serve original decoded form");
}
}
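
The two address computations this module relies on, the direct-mapped slot index and the page-bounded length limit, can be checked in isolation. The sketch below copies the constants from this file; `slot_index` and `max_block_len` are hypothetical helpers written for illustration (the real code inlines the index and enforces the limits inside the `build_block` loop), and `max_block_len` ignores terminators, so it is only an upper bound.

```rust
const BLOCK_CACHE_SIZE: usize = 1 << 16;
const BLOCK_CACHE_MASK: u32 = (BLOCK_CACHE_SIZE - 1) as u32;
const GUEST_PAGE_SIZE: u32 = 4096;
const GUEST_PAGE_MASK: u32 = !(GUEST_PAGE_SIZE - 1);
const MAX_BLOCK_INSTRS: u32 = 32;

/// Slot index: low 16 bits of `pc >> 2`.
fn slot_index(pc: u32) -> usize {
    ((pc >> 2) & BLOCK_CACHE_MASK) as usize
}

/// Upper bound on the number of instructions in a block starting at
/// `start_pc`, before any terminator is considered. Rounds up so that an
/// unaligned start is treated the same way as the build loop treats it.
fn max_block_len(start_pc: u32) -> u32 {
    let page_end = (start_pc & GUEST_PAGE_MASK).wrapping_add(GUEST_PAGE_SIZE);
    let bytes_left = page_end.wrapping_sub(start_pc);
    ((bytes_left + 3) / 4).min(MAX_BLOCK_INSTRS)
}

fn main() {
    // PCs 256 KiB apart collide on the same slot; the `start_pc` tag
    // comparison is what tells them apart.
    println!("{} {}", slot_index(0x100), slot_index(0x100 + 0x4_0000));
    println!("{} {}", max_block_len(0x1FFC), max_block_len(0x100));
}
```

Because the index keeps only 16 bits of `pc >> 2`, any two block starts that differ by a multiple of 256 KiB evict each other, which is why the lookup compares `start_pc` before trusting a slot.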


@@ -29,16 +29,37 @@ pub mod spr {
pub const XER: u32 = 1;
pub const LR: u32 = 8;
pub const CTR: u32 = 9;
pub const TBL: u32 = 268;
pub const TBU: u32 = 269;
pub const DSISR: u32 = 18;
pub const DAR: u32 = 19;
/// Decrementer (hypervisor-visible, 32-bit down-counter).
pub const DEC: u32 = 22;
pub const TBL: u32 = 268; // Read (user)
pub const TBU: u32 = 269; // Read (user)
/// Time-base write (supervisor). Separate SPR number from TBL (268) for
/// access-control reasons.
pub const TBL_WRITE: u32 = 284;
pub const TBU_WRITE: u32 = 285;
pub const SPRG0: u32 = 272;
pub const SPRG1: u32 = 273;
pub const SPRG2: u32 = 274;
pub const SPRG3: u32 = 275;
pub const VRSAVE: u32 = 256;
pub const PVR: u32 = 287;
pub const HID0: u32 = 1008;
pub const HID1: u32 = 1009;
pub const PIR: u32 = 1023;
}
/// LR halt sentinel. When `bclr` returns to this address, the interpreter
/// loop halts cleanly (matches the "entry returned" convention).
pub const LR_HALT_SENTINEL: u64 = 0xBCBC_BCBC;
/// VSCR NJ (Non-Java mode) bit. Stored in word 3 at bit 16 counting from the
/// least-significant bit (mask 0x0001_0000). Set at startup; when set, denormals
/// are flushed to zero, and when clear they are handled per IEEE-754 (Java mode).
pub const VSCR_NJ_MASK: u32 = 0x0001_0000;
/// VSCR SAT (saturation sticky) bit. Stored in the least-significant bit of
/// word 3 (bit 31 in IBM MSB-0 numbering, mask 0x0000_0001).
pub const VSCR_SAT_MASK: u32 = 0x0000_0001;
/// PowerPC processor context. Holds all register state for one guest thread.
/// Mirrors PPCContext from ppc_context.h, minus JIT-specific fields.
#[repr(C, align(64))]
@@ -64,15 +85,49 @@ pub struct PpcContext {
pub xer_ca: u8,
pub xer_ov: u8,
pub xer_so: u8,
// Altivec VSCR saturation bit
pub vscr_sat: u8,
/// XER[25:31] string-byte count (`TBC`). Read/written by `mtspr XER`,
/// consumed by `lswx`/`stswx`. Per PPCBUG-123/124/161: was previously
/// unmodelled, making `lswx`/`stswx` a permanent no-op.
pub xer_tbc: u8,
// Altivec VSCR. Only the NJ (`VSCR_NJ_MASK`) and SAT (`VSCR_SAT_MASK`) bits of word 3 are meaningful.
pub vscr: Vec128,
// VRSAVE (SPR 256). Bitmask of which VRs need saving across context switches.
pub vrsave: u32,
// Program counter
pub pc: u32,
// Reservation address/value for lwarx/stwcx
pub reserved_addr: u32,
// Reservation for lwarx/ldarx/stwcx/stdcx. Xenon's reservation granule is
// one L2 cache line (128 bytes) — `reserved_line` is stored as the base
// address of that line (`ea & !0x7F`). `has_reservation` gates the
// validity; stwcx./stdcx. check that both match before committing.
// `reserved_val` is retained for possible future use by a coherency
// observer; the store-conditional logic itself does not compare it.
pub reserved_line: u32,
pub reserved_val: u64,
pub has_reservation: bool,
/// PPCBUG-151 — width of the active reservation: 4 = `lwarx` (word),
/// 8 = `ldarx` (doubleword), 0 = no reservation. `stwcx.` requires
/// width==4; `stdcx.` requires width==8. Cross-width pairs fail
/// deterministically with CR0.EQ=0. Cleared alongside `has_reservation`
/// on every `stwcx.`/`stdcx.` exit (success or failure).
pub reservation_width: u8,
/// M3.7 — generation stamp returned by [`crate::ReservationTable::reserve`]
/// at the most recent `lwarx`/`ldarx`. Paired with `reserved_line`;
/// `stwcx.`/`stdcx.` pass this back to `try_commit`. Meaningful only
/// when `reservation_table` is `Some` and the table is enabled.
pub reserved_generation: u32,
/// M3.7 — optional handle to the inter-thread reservation table.
/// When `Some(table)` *and* `table.is_enabled()`, the interpreter's
/// `lwarx`/`stwcx.`/`ldarx`/`stdcx.` arms route through the table;
/// otherwise they use the legacy per-`PpcContext` fields above. The
/// scheduler populates this when it spawns a thread under a kernel
/// that has `reservations` set.
pub reservation_table: Option<std::sync::Arc<crate::ReservationTable>>,
/// M3.7 — emulated HW slot ID this thread is bound to. Used as the
/// reservation table's `hw_id` discriminator so two threads on
/// different slots can't accidentally commit each other's
/// reservations. Populated by the scheduler at spawn / migration.
pub hw_id: u8,
// Thread ID (for kernel use)
pub thread_id: u32,
@@ -82,6 +137,12 @@ pub struct PpcContext {
// Time base (incremented each instruction for debugging)
pub timebase: u64,
// Decrementer (SPR 22): 32-bit down-counter that fires a decrementer
// interrupt at underflow on real hw. Xenia-rs doesn't dispatch DEC
// interrupts to the guest; this value is maintained so that mfspr DEC
// returns something coherent.
pub dec: u32,
}
impl PpcContext {
@@ -89,7 +150,9 @@ impl PpcContext {
Self {
gpr: [0; 32],
ctr: 0,
lr: 0,
// Canary sets LR to the halt sentinel at thread start so `blr`
// from the top-level entry falls out of the interpreter loop.
lr: LR_HALT_SENTINEL,
msr: 0,
fpr: [0.0; 32],
vr: [Vec128::ZERO; 128],
@@ -98,14 +161,23 @@ impl PpcContext {
xer_ca: 0,
xer_ov: 0,
xer_so: 0,
vscr_sat: 0,
xer_tbc: 0,
// VSCR starts with NJ bit set (denormals flushed) — matches canary
// thread_state.cc initialization.
vscr: Vec128::from_u32x4(0, 0, 0, VSCR_NJ_MASK),
vrsave: 0xFFFF_FFFF,
pc: 0,
reserved_addr: 0,
reserved_line: 0,
reserved_val: 0,
has_reservation: false,
reservation_width: 0,
reserved_generation: 0,
reservation_table: None,
hw_id: 0,
thread_id: 0,
cycle_count: 0,
timebase: 0,
dec: 0,
}
}
@@ -173,7 +245,10 @@ impl PpcContext {
/// Get the full XER register value.
pub fn xer(&self) -> u32 {
((self.xer_so as u32) << 31) | ((self.xer_ov as u32) << 30) | ((self.xer_ca as u32) << 29)
((self.xer_so as u32) << 31)
| ((self.xer_ov as u32) << 30)
| ((self.xer_ca as u32) << 29)
| (self.xer_tbc as u32) // PPCBUG-123/566: bits 0-6 (TBC).
}
/// Set XER from a full 32-bit value.
@@ -181,6 +256,28 @@ impl PpcContext {
self.xer_so = ((val >> 31) & 1) as u8;
self.xer_ov = ((val >> 30) & 1) as u8;
self.xer_ca = ((val >> 29) & 1) as u8;
self.xer_tbc = (val & 0x7F) as u8; // PPCBUG-124.
}
/// Read the VSCR SAT (sticky saturation) bit.
pub fn vscr_sat(&self) -> bool {
(self.vscr.u32x4(3) & VSCR_SAT_MASK) != 0
}
/// Set or clear VSCR SAT. Preserves the NJ bit (and any other word-3 bits).
pub fn set_vscr_sat(&mut self, v: bool) {
let mut w = self.vscr.u32x4(3);
if v {
w |= VSCR_SAT_MASK;
} else {
w &= !VSCR_SAT_MASK;
}
self.vscr.set_u32x4(3, w);
}
/// Read the VSCR NJ (non-Java mode / flush-denormals) bit.
pub fn vscr_nj(&self) -> bool {
(self.vscr.u32x4(3) & VSCR_NJ_MASK) != 0
}
}
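The XER hunk above packs SO/OV/CA into the top three bits and the string-byte count (TBC) into the low seven. A minimal round-trip sketch of that layout follows. `XerFields`, `pack` and `unpack` are hypothetical stand-ins introduced for illustration only; they are not the `PpcContext` API.

```rust
/// Hypothetical stand-in for the XER-related fields of `PpcContext`.
#[derive(Debug, Default, PartialEq, Eq)]
struct XerFields {
    so: u8,
    ov: u8,
    ca: u8,
    tbc: u8,
}

impl XerFields {
    /// Pack SO/OV/CA into host bits 31/30/29 and TBC into the low 7 bits.
    fn pack(&self) -> u32 {
        ((self.so as u32) << 31)
            | ((self.ov as u32) << 30)
            | ((self.ca as u32) << 29)
            | (self.tbc as u32 & 0x7F)
    }

    /// Split a full 32-bit XER value back into the individual fields.
    fn unpack(val: u32) -> Self {
        Self {
            so: ((val >> 31) & 1) as u8,
            ov: ((val >> 30) & 1) as u8,
            ca: ((val >> 29) & 1) as u8,
            tbc: (val & 0x7F) as u8,
        }
    }
}
```

With the TBC term missing from `pack`, a value written by `mtspr XER` would lose its byte count on the next read, which is the failure mode the `lswx`/`stswx` note in the struct describes.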

@@ -74,9 +74,24 @@ impl DecodedInstr {
/// Rc bit (bit 31) - record CR0
#[inline] pub fn rc_bit(&self) -> bool { self.raw & 1 != 0 }
/// Rc for VC-form vector compare instructions — PPC bit 21 = host bit 10.
#[inline] pub fn vc_rc_bit(&self) -> bool { (self.raw >> 10) & 1 != 0 }
/// Rc for VX128_R-form vector compare instructions — PPC bit 27 = host bit 4.
/// VX128_R Rc bit — PPC bit 25 (host bit 6) per canary's FormatVX128_R
/// bitfield layout. PPCBUG-700.
#[inline] pub fn vx128r_rc_bit(&self) -> bool { (self.raw >> 6) & 1 != 0 }
/// IMM field for VX128_4-form instructions (vrlimi128) — 5-bit blend mask at PPC bits 11-15.
#[inline] pub fn vx128_4_imm(&self) -> u32 { extract_bits(self.raw, 11, 15) }
/// z field for VX128_4-form instructions (vrlimi128) — 2-bit rotation index at PPC bits 24-25.
#[inline] pub fn vx128_4_z(&self) -> u32 { extract_bits(self.raw, 24, 25) }
/// OE bit (bit 21) - overflow enable
#[inline] pub fn oe(&self) -> bool { extract_bits(self.raw, 21, 21) != 0 }
/// TO field (bits 6-10) for tw/twi/td/tdi trap instructions.
#[inline] pub fn to(&self) -> u32 { extract_bits(self.raw, 6, 10) }
/// MB, ME fields for rotate instructions
#[inline] pub fn mb(&self) -> u32 { extract_bits(self.raw, 21, 25) }
#[inline] pub fn me(&self) -> u32 { extract_bits(self.raw, 26, 30) }
@@ -86,7 +101,13 @@ impl DecodedInstr {
/// SH field for 64-bit shifts (bits 16-20 + bit 30)
#[inline] pub fn sh64(&self) -> u32 {
(extract_bits(self.raw, 16, 20) << 1) | extract_bits(self.raw, 30, 30)
(extract_bits(self.raw, 30, 30) << 5) | extract_bits(self.raw, 16, 20)
}
/// MB/ME field for MD-form and MDS-form instructions (6-bit field, split encoding).
/// MB[4:0] at PPC bits 21-25; MB[5] at PPC bit 26.
#[inline] pub fn mb_md(&self) -> u32 {
extract_bits(self.raw, 21, 25) | (extract_bits(self.raw, 26, 26) << 5)
}
/// SPR field (bits 11-20, swapped halves)
@@ -114,32 +135,67 @@ impl DecodedInstr {
/// crbB (bits 16-20)
#[inline] pub fn crbb(&self) -> u32 { extract_bits(self.raw, 16, 20) }
// VMX128 field extractors
// VMX128 field extractors — bit positions match canary's
// FormatVX128/VX128_2/VX128_4/VX128_5/VX128_R bitfield layout
// (xenia-canary `ppc_decode_data.h:484-663`, LSB-first packed). PPCBUG-700.
/// VA128 (bits 6-10, plus bit from 29)
/// VA128 = VA128l(5) | VA128h(1) << 5 | VA128H(1) << 6.
/// Canonical 7-bit register selector: PPC 11-15 (low), PPC 26 (mid), PPC 21 (high).
#[inline] pub fn va128(&self) -> usize {
(extract_bits(self.raw, 6, 10) | (extract_bits(self.raw, 29, 29) << 5)) as usize
(extract_bits(self.raw, 11, 15)
| (extract_bits(self.raw, 26, 26) << 5)
| (extract_bits(self.raw, 21, 21) << 6)) as usize
}
/// VB128 (bits 16-20, plus bits from 28, 30)
/// VB128 = VB128l(5) | VB128h(2) << 5. Canary's VB128h is a 2-bit
/// contiguous field at PPC 30-31 (host bits 0-1).
#[inline] pub fn vb128(&self) -> usize {
(extract_bits(self.raw, 16, 20)
| (extract_bits(self.raw, 28, 28) << 5)
| (extract_bits(self.raw, 30, 30) << 6)) as usize
| (extract_bits(self.raw, 30, 31) << 5)) as usize
}
/// VD128 (bits 6-10, plus bits from 21, 22)
/// VD128 = VD128l(5) | VD128h(2) << 5. Canary's VD128h is a 2-bit
/// contiguous field at PPC 28-29 (host bits 2-3).
#[inline] pub fn vd128(&self) -> usize {
(extract_bits(self.raw, 6, 10)
| (extract_bits(self.raw, 21, 21) << 5)
| (extract_bits(self.raw, 22, 22) << 6)) as usize
| (extract_bits(self.raw, 28, 29) << 5)) as usize
}
/// VS128 - same encoding as VD128
#[inline] pub fn vs128(&self) -> usize { self.vd128() }
/// VC register for VX128_2-form instructions (vperm128) — 3-bit at PPC bits 23-25.
#[inline] pub fn vc128_2(&self) -> usize { extract_bits(self.raw, 23, 25) as usize }
/// NB field (bits 16-20) for lswi/stswi
#[inline] pub fn nb(&self) -> u32 { extract_bits(self.raw, 16, 20) }
/// PERM field for VX128_P-form instructions (vpermwi128) — 8-bit split encoding.
/// PERMl (5 bits) at PPC bits 11-15; PERMh (3 bits) at PPC bits 23-25.
#[inline] pub fn vx128_p_perm(&self) -> u32 {
extract_bits(self.raw, 11, 15) | (extract_bits(self.raw, 23, 25) << 5)
}
/// SH field for VX128_5-form instructions (vsldoi128) — 4-bit shift at PPC bits 22-25.
#[inline] pub fn vx128_5_sh(&self) -> u32 { extract_bits(self.raw, 22, 25) }
}
/// Extract the 5-bit `UIMM` (`VX128_3`) / `IMM` (`VX128_4`) field. Canary
/// packs both formats with LSB-bits 16-20 holding the field, which is
/// MSB bits 11-15 in our `extract_bits` convention. For `vpkd3d128` /
/// `vupkd3d128` the decoded selector is `type = UIMM >> 2` (3 bits; valid
/// values 0-6 per [`crate::vmx::D3dPackType`], 7 is undocumented /
/// undefined in canary) and `pack = UIMM & 0x3` (output-slot layout for
/// `vpkd3d128` only, `vupkd3d128` ignores it).
///
/// First-Pixels M3: the interpreter previously used a hand-rolled
/// `(instr.raw >> 6) & 0x7` that was **LSB-numbered** and extracted
/// bits from a completely different part of the word (the
/// secondary-opcode region). Centralizing the extractor here matches
/// canary's `FormatVX128_{3,4}::{UIMM,IMM}` field semantics exactly.
#[inline]
pub fn extract_vx128_uimm5(raw: u32) -> u32 {
extract_bits(raw, 11, 15)
}
/// Decode a 32-bit PPC instruction into its opcode.
@@ -149,6 +205,123 @@ pub fn decode(raw: u32, addr: u32) -> DecodedInstr {
DecodedInstr { opcode, raw, addr }
}
// Perf tier-2 — direct-mapped PC-keyed decode cache.
//
// The interpreter hot path spends ~15-25% of its time in `decode()`
// parsing the raw u32 and walking the primary+secondary opcode tables.
// For non-self-modifying guest code — the common case past the XEX
// loader — `decode(raw, pc)` is purely a function of `(raw, pc)` and
// the output is `Copy + 16B`. A direct-mapped cache indexed by
// `(pc >> 2) & MASK` gives the interpreter a 1-comparison fast path,
// at the cost of one branch and a 1.5 MiB region of memory.
//
// Invalidation piggybacks on `xenia_memory::GuestMemory::page_version`
// (P5 texture-cache invalidation): every cache entry carries the page
// version that was active at decode time; on lookup we compare against
// the current version of the containing 4 KiB page. Any write to the
// page bumps the counter, so the next decode on that PC is a miss that
// refills.
/// Number of direct-mapped entries. 2^16 = 65,536 slots, one PPC
/// instruction address per slot — enough for every hot code path in a
/// typical Xbox 360 title to stay resident without collision.
const DECODE_CACHE_SIZE: usize = 1 << 16;
const DECODE_CACHE_MASK: u32 = (DECODE_CACHE_SIZE - 1) as u32;
#[derive(Clone, Copy)]
struct DecodeCacheEntry {
/// Guest PC this entry was decoded at. Used as the tag on lookup; a
/// mismatch means the slot was last populated by a different PC that
/// shares the same low-16 index.
pc: u32,
/// Page version at decode time (from `GuestMemory::page_version(pc)`).
/// Zero means "unused slot" since real page versions start at 1.
page_version: u64,
decoded: DecodedInstr,
}
impl DecodeCacheEntry {
const fn empty() -> Self {
// `Invalid` is the decoder's "unrecognized opcode" sentinel; we
// use it here as the empty-slot marker. Real misses compare `pc`,
// not the opcode, so the sentinel choice is cosmetic.
Self {
pc: 0,
page_version: 0,
decoded: DecodedInstr {
opcode: PpcOpcode::Invalid,
raw: 0,
addr: 0,
},
}
}
}
/// Direct-mapped PC-keyed decode cache. One instance shared across all
/// HW threads (PC is thread-independent; entries are read-only once
/// filled). Not thread-safe — the single scheduler thread owns it.
pub struct DecodeCache {
slots: Box<[DecodeCacheEntry]>,
hits: u64,
misses: u64,
invalidations: u64,
}
impl Default for DecodeCache {
fn default() -> Self {
Self::new()
}
}
impl DecodeCache {
pub fn new() -> Self {
Self {
slots: vec![DecodeCacheEntry::empty(); DECODE_CACHE_SIZE].into_boxed_slice(),
hits: 0,
misses: 0,
invalidations: 0,
}
}
/// Look up (or fill) the decoded form of the instruction at `pc`.
/// `raw` is the fetched instruction word; `current_page_version` is
/// `mem.page_version(pc)` — the caller has it cheaper than we do,
/// since they're already touching `mem` to fetch `raw`.
#[inline]
pub fn lookup(&mut self, pc: u32, raw: u32, current_page_version: u64) -> DecodedInstr {
let idx = ((pc >> 2) & DECODE_CACHE_MASK) as usize;
// Safety: `idx` is masked into `[0, DECODE_CACHE_SIZE)` so the
// slice access is always in-bounds. Opt-out of the bounds check
// for the hot path.
let entry = unsafe { self.slots.get_unchecked_mut(idx) };
if entry.pc == pc && entry.page_version == current_page_version {
self.hits += 1;
return entry.decoded;
}
if entry.pc == pc && entry.page_version != current_page_version {
self.invalidations += 1;
}
self.misses += 1;
let decoded = decode(raw, pc);
*entry = DecodeCacheEntry {
pc,
page_version: current_page_version,
decoded,
};
decoded
}
pub fn hits(&self) -> u64 {
self.hits
}
pub fn misses(&self) -> u64 {
self.misses
}
pub fn invalidations(&self) -> u64 {
self.invalidations
}
}
fn lookup_opcode(code: u32) -> PpcOpcode {
match extract_bits(code, 0, 5) {
2 => PpcOpcode::tdi,
@@ -498,9 +671,13 @@ fn decode_op6(code: u32) -> PpcOpcode {
_ => {}
}
// VMX128 compare
let key4 = (extract_bits(code, 22, 24) << 3) | extract_bits(code, 27, 27);
match key4 {
// VMX128 compare (VX128_R form). Single dispatch path: bit 27 = 0 always
// for these opcodes per canary's table (`ppc_opcode_table_gen.cc:295-305`).
// The Rc bit is at PPC 25 (host bit 6) per the FormatVX128_R bitfield —
// it's a runtime modifier read by the interpreter, NOT part of the
// secondary-opcode discrimination. PPCBUG-700.
let key4_nd = (extract_bits(code, 22, 24) << 3) | extract_bits(code, 27, 27);
match key4_nd {
0b000000 => return PpcOpcode::vcmpeqfp128,
0b001000 => return PpcOpcode::vcmpgefp128,
0b010000 => return PpcOpcode::vcmpgtfp128,
@@ -781,6 +958,57 @@ mod tests {
assert_eq!(instr.d(), 0x20);
}
#[test]
fn decode_cache_miss_fills_then_hit() {
let mut cache = DecodeCache::new();
let raw: u32 = (14 << 26) | (3 << 21) | (1 << 16) | 0x10;
let pc = 0x8200_0000u32;
let first = cache.lookup(pc, raw, 1);
assert_eq!(first.opcode, PpcOpcode::addi);
assert_eq!(cache.hits(), 0);
assert_eq!(cache.misses(), 1);
// Same pc, same version → cache hit, no new decode.
let second = cache.lookup(pc, raw, 1);
assert_eq!(second.opcode, PpcOpcode::addi);
assert_eq!(cache.hits(), 1);
assert_eq!(cache.misses(), 1);
}
#[test]
fn decode_cache_stale_version_refills() {
let mut cache = DecodeCache::new();
// First fill with an `addi`.
let raw_addi: u32 = (14 << 26) | (3 << 21) | (1 << 16) | 0x10;
let pc = 0x8200_0000u32;
cache.lookup(pc, raw_addi, 1);
// Guest rewrote the page: same pc, different raw + bumped version.
// Cache must refill — not return the stale `addi`.
let raw_lwz: u32 = (32 << 26) | (5 << 21) | (1 << 16) | 0x20;
let refreshed = cache.lookup(pc, raw_lwz, 2);
assert_eq!(refreshed.opcode, PpcOpcode::lwz);
assert_eq!(cache.invalidations(), 1);
assert_eq!(cache.misses(), 2);
}
#[test]
fn decode_cache_pc_collision_refills() {
// Two PCs that hash to the same slot (pc >> 2 low 16 bits equal)
// must not alias. Slot index = ((pc >> 2) & 0xFFFF) — pick two
// PCs 4 * 2^16 bytes apart.
let mut cache = DecodeCache::new();
let pc_a = 0x8200_0000u32;
let pc_b = pc_a.wrapping_add(0x0004_0000u32); // (>> 2) differs by 2^16
let raw_addi: u32 = (14 << 26) | (3 << 21) | (1 << 16) | 0x10;
let raw_lwz: u32 = (32 << 26) | (5 << 21) | (1 << 16) | 0x20;
cache.lookup(pc_a, raw_addi, 1);
// Different pc but same slot → miss + refill.
cache.lookup(pc_b, raw_lwz, 1);
// First pc comes back → miss + refill (slot was taken by pc_b).
let back = cache.lookup(pc_a, raw_addi, 1);
assert_eq!(back.opcode, PpcOpcode::addi);
assert_eq!(cache.misses(), 3);
}
#[test]
fn test_decode_branch() {
// b +0x100 => opcode 18, LI=0x40 (shifted left 2 = 0x100), AA=0, LK=0
@@ -816,4 +1044,202 @@ mod tests {
assert_eq!(extract_bits(0x8000_0000, 0, 0), 1);
assert_eq!(extract_bits(0x0000_0001, 31, 31), 1);
}
// VMX128 register-name extraction. Locks the canonical bit positions
// (decoder.rs is the single source of truth — the analysis crate's
// old `ppc.rs` had different positions, which produced wrong printed
// register names; the bug was silent because the interpreter never
// used those extractors). Each test sets exactly the bits the
// accessor reads and asserts the assembled register number.
/// Build a VMX128 test word for the canary-compliant register layout.
/// `vd128 = vd_lo | (vd_hi << 5)` where vd_lo is 5 bits (PPC 6-10) and
/// vd_hi is 2 bits (PPC 28-29). Same shape for vb128 (vb_lo at PPC 16-20,
/// vb_hi 2 bits at PPC 30-31). va128 = va_lo | (va_h26<<5) | (va_h21<<6)
/// per canary's 7-bit VA selector.
fn vmx128_test_word(vd_lo: u32, vd_hi: u32, va_lo: u32, va_h26: u32, va_h21: u32,
vb_lo: u32, vb_hi: u32) -> u32 {
// PPC bit i -> host bit (31-i).
(vd_lo << (31 - 10)) // VD128l: PPC 6-10 = host 21-25
| (vd_hi << (31 - 29)) // VD128h: PPC 28-29 = host 2-3 (LSB at host 2)
| (va_lo << (31 - 15)) // VA128l: PPC 11-15 = host 16-20
| (va_h26 << (31 - 26)) // VA128h: PPC 26 = host 5
| (va_h21 << (31 - 21)) // VA128H: PPC 21 = host 10
| (vb_lo << (31 - 20)) // VB128l: PPC 16-20 = host 11-15
| (vb_hi << (31 - 31)) // VB128h: PPC 30-31 = host 0-1 (LSB at host 0)
}
#[test]
fn vmx128_vd128_low_5_bits_only() {
// vd_lo = 0..31, vd_hi = 0 → vd128 = vd_lo
for r in 0..32u32 {
let raw = (r as u32) << (31 - 10);
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vd128(), r as usize, "vd_lo={r}");
}
}
#[test]
fn vmx128_vd128_high_low_bit_adds_32() {
// vd_lo = 0, VD128h = 0b01 (LSB only at host bit 2 = PPC 29) → vd128 = 32
let raw = (1u32 << (31 - 29));
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vd128(), 32);
}
#[test]
fn vmx128_vd128_high_high_bit_adds_64() {
// vd_lo = 0, VD128h = 0b10 (MSB only at host bit 3 = PPC 28) → vd128 = 64
let raw = (1u32 << (31 - 28));
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vd128(), 64);
}
#[test]
fn vmx128_vd128_full_127() {
// vd_lo = 31, VD128h = 0b11 → vd128 = 127
let raw = (31u32 << (31 - 10))
| (1u32 << (31 - 28))
| (1u32 << (31 - 29));
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vd128(), 127);
}
#[test]
fn vmx128_va128_canary_layout() {
// va_lo = 7 at PPC 11-15, VA128h = 1 at PPC 26 → va128 = 7 | 32 = 39
let raw = (7u32 << (31 - 15)) | (1u32 << (31 - 26));
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.va128(), 39);
// VA128H = 1 at PPC 21 → va128 += 64 = 103
let raw = raw | (1u32 << (31 - 21));
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.va128(), 7 | 32 | 64);
}
#[test]
fn vmx128_vb128_uses_bits30_31() {
// vb_lo = 5 at PPC 16-20. VB128h = 0b01 (LSB at PPC 31 = host 0) → +32.
// VB128h = 0b11 → +96.
let raw = (5u32 << (31 - 20)) | (1u32 << (31 - 31));
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vb128(), 5 | 32);
let raw = raw | (1u32 << (31 - 30));
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vb128(), 5 | 32 | 64);
}
#[test]
fn vmx128_vs128_aliases_vd128() {
// vs128 must always equal vd128.
for r in [0u32, 31, 32, 64, 96, 127] {
let lo = r & 0x1F;
let hi = (r >> 5) & 0x3;
let raw = (lo << (31 - 10))
| (hi << (31 - 29));
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vd128(), r as usize, "vd128 mismatch for r={r}");
assert_eq!(d.vs128(), r as usize, "vs128 mismatch for r={r}");
assert_eq!(d.vd128(), d.vs128());
}
}
#[test]
#[allow(dead_code)]
fn _vmx128_test_word_helper_compiles() {
// Keep the helper validated against the real accessor.
// vd_lo=5, vd_hi=0b11 → vd128 = 5 | 96 = 101
let raw = vmx128_test_word(5, 3, 0, 0, 0, 0, 0);
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vd128(), 5 | 32 | 64);
}
#[test]
fn vx128_5_sh_bit_positions() {
// SH=8 (binary 1000): bit 3 = 1, bits 0-2 = 0.
// Host bit 9 = 1 (PPC bit 22), host bits 6-8 = 0.
// So raw bit 9 set = raw |= 1 << 9 = 0x200
let raw = 0x200u32; // host bit 9 set only
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vx128_5_sh(), 8, "SH=8: MSB at PPC bit 22");
// SH=1 (binary 0001): host bit 6 set = raw |= 1 << 6 = 0x40
let raw = 0x40u32;
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vx128_5_sh(), 1, "SH=1: LSB at PPC bit 25");
// SH=15 (binary 1111): host bits 6-9 all set = raw |= 0xF << 6 = 0x3C0
let raw = 0x3C0u32;
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vx128_5_sh(), 15, "SH=15: all 4 bits set");
// SH=0: raw=0
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw: 0, addr: 0 };
assert_eq!(d.vx128_5_sh(), 0, "SH=0");
}
#[test]
fn vx128_4_accessors_correct_bit_positions() {
// z=3 (binary 11) at PPC bits 24-25 = host bits 6-7
let raw = 0b11u32 << 6;
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vx128_4_z(), 3, "z=3 from host bits 6-7");
// IMM=0x15 (binary 10101) at PPC bits 11-15 = host bits 16-20
let raw2 = 0x15u32 << 16;
let d2 = DecodedInstr { opcode: PpcOpcode::Invalid, raw: raw2, addr: 0 };
assert_eq!(d2.vx128_4_imm(), 0x15, "IMM=0x15 from host bits 16-20");
// Combined: z=1, IMM=0xA — fields must not bleed into each other
let raw3 = (0x1u32 << 6) | (0xAu32 << 16);
let d3 = DecodedInstr { opcode: PpcOpcode::Invalid, raw: raw3, addr: 0 };
assert_eq!(d3.vx128_4_z(), 1, "z=1 combined");
assert_eq!(d3.vx128_4_imm(), 0xA, "IMM=0xA combined");
// z=2, IMM=0xF — max 4-bit blend mask, exercises the full lower nibble
let raw4 = (0b10u32 << 6) | (0xFu32 << 16);
let d4 = DecodedInstr { opcode: PpcOpcode::Invalid, raw: raw4, addr: 0 };
assert_eq!(d4.vx128_4_z(), 2, "z=2 from binary 10");
assert_eq!(d4.vx128_4_imm(), 0xF, "IMM=0xF all-ones nibble");
}
#[test]
fn vc128_2_extracts_ppc_bits_23_25() {
// VC=5 (binary 101) at PPC bits 23-25 = host bits 6-8
// extract_bits(raw, 23, 25) = (raw >> (31-25)) & 0x7 = (raw >> 6) & 0x7
let raw = 5u32 << 6; // host bits 6-8 = 5
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vc128_2(), 5);
let d0 = DecodedInstr { opcode: PpcOpcode::Invalid, raw: 0, addr: 0 };
assert_eq!(d0.vc128_2(), 0);
let d7 = DecodedInstr { opcode: PpcOpcode::Invalid, raw: 7u32 << 6, addr: 0 };
assert_eq!(d7.vc128_2(), 7);
let d1 = DecodedInstr { opcode: PpcOpcode::Invalid, raw: 1u32 << 6, addr: 0 };
assert_eq!(d1.vc128_2(), 1);
}
#[test]
fn vx128_p_perm_assembles_correctly() {
// PERMl=0x1F (all 5 bits set) at host bits 16-20: raw = 0x1F << 16
let raw = 0x1Fu32 << 16;
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vx128_p_perm(), 0x1F, "PERMl only");
// PERMh=0x7 (all 3 bits set) at host bits 6-8: raw = 0x7 << 6 = 0x1C0
let raw = 0x7u32 << 6;
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vx128_p_perm(), 0x7 << 5, "PERMh only: bits 5-7");
// PERMl=0xA, PERMh=0x5: raw = (0xA << 16) | (0x5 << 6)
let raw = (0xAu32 << 16) | (0x5u32 << 6);
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw, addr: 0 };
assert_eq!(d.vx128_p_perm(), 0xA | (0x5 << 5));
// All-zero word: both PERMl and PERMh decode to zero.
let d = DecodedInstr { opcode: PpcOpcode::Invalid, raw: 0, addr: 0 };
assert_eq!(d.vx128_p_perm(), 0);
}
}
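The accessors and tests in this file all rely on the MSB-0 convention (PPC bit `i` is host bit `31 - i`). The sketch below shows that mapping in isolation. `extract_bits_msb0` is a hypothetical re-implementation written for illustration; the real `extract_bits` is not shown in this diff and may differ in detail.

```rust
/// Hypothetical MSB-0 field extractor: returns PPC bits `start..=end`
/// (bit 0 = most significant) right-aligned in the result.
fn extract_bits_msb0(raw: u32, start: u32, end: u32) -> u32 {
    debug_assert!(start <= end && end <= 31);
    let width = end - start + 1;
    // PPC bit `end` lives at host bit `31 - end`, so shift it down to bit 0.
    let shifted = raw >> (31 - end);
    if width == 32 {
        shifted
    } else {
        shifted & ((1u32 << width) - 1)
    }
}

/// VD128 assembled as in the diff: low 5 bits at PPC 6-10, high 2 bits at
/// PPC 28-29 (host bits 2-3).
fn vd128(raw: u32) -> usize {
    (extract_bits_msb0(raw, 6, 10) | (extract_bits_msb0(raw, 28, 29) << 5)) as usize
}
```

Under this convention a field's least significant bit sits at host bit `31 - end`, which is why the tests build words with shifts such as `<< (31 - 10)` and `<< (31 - 29)`.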

File diff suppressed because it is too large

@@ -0,0 +1,447 @@
//! FPSCR (Floating-Point Status and Control Register) maintenance.
//!
//! Scope per project plan: rounding modes honoured, plus the exception bits
//! games actually read (FX, FEX, VX, OX, UX, ZX, XX, FI, FPRF). Enabled-
//! exception dispatch (FE[0,1], VE/OE/UE/ZE/XE) is *not* modelled — games
//! running on Xenon almost never take FP traps.
//!
//! Bit layout (PowerISA, MSB-0 numbering; stored in a u32 with bit 31 = MSB):
//!
//! | PPC bit | u32 mask | Name |
//! |---------|-------------------------|-------------|
//! | 0 | `1<<31` | FX |
//! | 1 | `1<<30` | FEX |
//! | 2 | `1<<29` | VX (summary)|
//! | 3 | `1<<28` | OX |
//! | 4 | `1<<27` | UX |
//! | 5 | `1<<26` | ZX |
//! | 6 | `1<<25` | XX |
//! | 7 | `1<<24` | VXSNAN |
//! | 8 | `1<<23` | VXISI |
//! | 9 | `1<<22` | VXIDI |
//! | 10 | `1<<21` | VXZDZ |
//! | 11 | `1<<20` | VXIMZ |
//! | 12 | `1<<19` | VXVC |
//! | 13 | `1<<18` | FR |
//! | 14 | `1<<17` | FI |
//! | 15..19 | `0x1F<<12` | FPRF (5 bits)|
//! | 21 | `1<<10` | VXSOFT |
//! | 22 | `1<<9` | VXSQRT |
//! | 23 | `1<<8` | VXCVI |
//! | 30..31 | `0x3` | RN (2 bits) |
use crate::context::PpcContext;
pub const FX: u32 = 1 << 31;
pub const FEX: u32 = 1 << 30;
pub const VX: u32 = 1 << 29;
pub const OX: u32 = 1 << 28;
pub const UX: u32 = 1 << 27;
pub const ZX: u32 = 1 << 26;
pub const XX: u32 = 1 << 25;
pub const VXSNAN: u32 = 1 << 24;
pub const VXISI: u32 = 1 << 23;
pub const VXIDI: u32 = 1 << 22;
pub const VXZDZ: u32 = 1 << 21;
pub const VXIMZ: u32 = 1 << 20;
pub const VXVC: u32 = 1 << 19;
pub const FR: u32 = 1 << 18;
pub const FI: u32 = 1 << 17;
pub const FPRF_MASK: u32 = 0x1F << 12; // bits 15..19
pub const VXSOFT: u32 = 1 << 10;
pub const VXSQRT: u32 = 1 << 9;
pub const VXCVI: u32 = 1 << 8;
pub const RN_MASK: u32 = 0x3;
/// Union of all VX* bits (used for the VX summary recomputation).
pub const VX_ALL: u32 = VXSNAN | VXISI | VXIDI | VXZDZ | VXIMZ | VXVC | VXSOFT | VXSQRT | VXCVI;
/// FPRF classification codes (5-bit, placed in FPSCR bits 15..19).
/// The high bit ("C" in PowerISA) is set for QNaN, ±denormal and -zero, and
/// clear for ±normal, ±inf and +zero.
/// The next 4 bits are (FL, FG, FE, FU) = (less, greater, equal, unordered).
pub mod fprf {
pub const QNAN: u8 = 0b1_0001;
pub const NEG_INF: u8 = 0b0_1001;
pub const NEG_NORMAL: u8 = 0b0_1000;
pub const NEG_DENORMAL: u8 = 0b1_1000;
pub const NEG_ZERO: u8 = 0b1_0010;
pub const POS_ZERO: u8 = 0b0_0010;
pub const POS_DENORMAL: u8 = 0b1_0100;
pub const POS_NORMAL: u8 = 0b0_0100;
pub const POS_INF: u8 = 0b0_0101;
}
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RoundingMode {
NearestEven, // RN=00
TowardZero, // RN=01
TowardPosInf, // RN=10
TowardNegInf, // RN=11
}
pub fn rounding_mode(ctx: &PpcContext) -> RoundingMode {
match ctx.fpscr & RN_MASK {
0 => RoundingMode::NearestEven,
1 => RoundingMode::TowardZero,
2 => RoundingMode::TowardPosInf,
_ => RoundingMode::TowardNegInf,
}
}
/// Classify an f64 (including NaN and ±infinity) into its FPRF 5-bit code.
pub fn classify_fprf(v: f64) -> u8 {
if v.is_nan() {
fprf::QNAN
} else if v.is_infinite() {
if v.is_sign_negative() { fprf::NEG_INF } else { fprf::POS_INF }
} else if v == 0.0 {
if v.is_sign_negative() { fprf::NEG_ZERO } else { fprf::POS_ZERO }
} else if v.is_subnormal() {
if v.is_sign_negative() { fprf::NEG_DENORMAL } else { fprf::POS_DENORMAL }
} else if v.is_sign_negative() { fprf::NEG_NORMAL } else { fprf::POS_NORMAL }
}
/// Write FPRF into FPSCR, preserving other bits.
pub fn set_fprf(ctx: &mut PpcContext, code: u8) {
ctx.fpscr = (ctx.fpscr & !FPRF_MASK) | ((code as u32 & 0x1F) << 12);
}
/// Set one or more exception bits on FPSCR, maintaining FX (sticky set on any
/// new exception) and VX (summary of VX* bits).
pub fn set_exception(ctx: &mut PpcContext, bits: u32) {
let prev = ctx.fpscr;
let new = prev | bits;
// FX is sticky-set if any exception bit transitions from 0 to 1. PPC defines
// FX as "any of OX, UX, ZX, XX, VX* newly set". Compute the transition set.
let transition = (new & !prev) & (OX | UX | ZX | XX | VX_ALL);
let mut updated = new;
if transition != 0 {
updated |= FX;
}
// Recompute VX summary from any VX* bits currently set.
if (updated & VX_ALL) != 0 { updated |= VX; }
ctx.fpscr = updated;
}
/// Classify the inputs of a floating-point arithmetic op and set appropriate
/// VX* bits. Returns true if any invalid-operation was detected (caller may
/// want to write a default QNaN result).
///
/// Detected cases:
/// * any SNaN input → VXSNAN
/// * infinity - infinity (same sign) → VXISI
/// * 0 / 0 → VXZDZ
/// * infinity / infinity → VXIDI
/// * 0 * infinity → VXIMZ
pub fn check_invalid_add(ctx: &mut PpcContext, a: f64, b: f64, sub: bool) -> bool {
let mut bits = 0u32;
if is_snan(a) || is_snan(b) { bits |= VXSNAN; }
if a.is_infinite() && b.is_infinite() {
// For add: VXISI iff a and b have opposite signs, e.g. (+inf) + (-inf)
// For sub: VXISI iff a and b have the same sign, e.g. (+inf) - (+inf) or (-inf) - (-inf)
let both_pos = a.is_sign_positive() && b.is_sign_positive();
let both_neg = a.is_sign_negative() && b.is_sign_negative();
if sub {
if both_pos || both_neg { bits |= VXISI; }
} else {
// add: opposite signs cancel to inf-inf
if a.is_sign_positive() != b.is_sign_positive() { bits |= VXISI; }
}
}
if bits != 0 { set_exception(ctx, bits); return true; }
false
}
/// FMA-aware add/sub VXISI check. Per PPCBUG-202+203: the previous code
/// passed `a*c` as `lhs` to `check_invalid_add`, which suffers from two
/// rounding errors and can spuriously raise/miss VXISI in extreme cases.
/// This helper derives the mathematical product's sign and infinity status
/// from the inputs directly.
///
/// `sub` follows the same semantics as `check_invalid_add`:
/// - false (add): VXISI when product and b have opposite signs at infinity
/// - true (sub): VXISI when product and b have same sign at infinity
pub fn check_invalid_fma_add(ctx: &mut PpcContext, a: f64, c: f64, b: f64, sub: bool) -> bool {
let mut bits = 0u32;
if is_snan(a) || is_snan(c) || is_snan(b) { bits |= VXSNAN; }
let product_is_inf = (a.is_infinite() || c.is_infinite())
&& a != 0.0 && c != 0.0
&& !a.is_nan() && !c.is_nan();
if product_is_inf && b.is_infinite() {
let p_neg = a.is_sign_negative() != c.is_sign_negative();
let b_neg = b.is_sign_negative();
let same_sign = p_neg == b_neg;
if (sub && same_sign) || (!sub && !same_sign) {
bits |= VXISI;
}
}
if bits != 0 { set_exception(ctx, bits); return true; }
false
}
pub fn check_invalid_mul(ctx: &mut PpcContext, a: f64, b: f64) -> bool {
let mut bits = 0u32;
if is_snan(a) || is_snan(b) { bits |= VXSNAN; }
let zero_times_inf =
(a == 0.0 && b.is_infinite()) || (b == 0.0 && a.is_infinite());
if zero_times_inf { bits |= VXIMZ; }
if bits != 0 { set_exception(ctx, bits); return true; }
false
}
pub fn check_invalid_div(ctx: &mut PpcContext, a: f64, b: f64) -> bool {
let mut bits = 0u32;
if is_snan(a) || is_snan(b) { bits |= VXSNAN; }
if a == 0.0 && b == 0.0 { bits |= VXZDZ; }
if a.is_infinite() && b.is_infinite() { bits |= VXIDI; }
if bits != 0 { set_exception(ctx, bits); return true; }
false
}
/// Divide-by-zero (finite nonzero / 0) — sets ZX but not VX.
pub fn check_zero_divide(ctx: &mut PpcContext, a: f64, b: f64) {
if b == 0.0 && a != 0.0 && !a.is_nan() && !a.is_infinite() {
set_exception(ctx, ZX);
}
}
/// Post-op: classify the result and update FPRF + detect overflow/underflow.
/// `inputs_were_finite` lets us suppress OX for ops whose output is infinite
/// because an input already was.
pub fn update_after_op(ctx: &mut PpcContext, result: f64, inputs_were_finite: bool) {
let mut bits = 0u32;
if result.is_infinite() && inputs_were_finite {
bits |= OX;
}
if result.is_subnormal() {
bits |= UX;
}
if bits != 0 { set_exception(ctx, bits); }
set_fprf(ctx, classify_fprf(result));
}
/// Test whether an f64 is a signalling NaN.
/// In IEEE 754-2008 (binary64), the quiet bit is the high bit of the
/// mantissa: QNaN has it set, SNaN has it clear (with the rest of the
/// mantissa nonzero).
pub fn is_snan(x: f64) -> bool {
if !x.is_nan() { return false; }
let bits = x.to_bits();
// Highest mantissa bit (bit 51) clear ⇒ SNaN. Mantissa nonzero always true for NaN.
(bits & (1u64 << 51)) == 0
}
/// Round an f64 to f32 honouring FPSCR[RN]. Uses Rust's `as f32` cast
/// (always round-to-nearest-even) when RN=0, the PPC default; otherwise
/// emulates the directed rounding via bit-manipulation.
pub fn round_to_single(ctx: &PpcContext, v: f64) -> f64 {
match rounding_mode(ctx) {
RoundingMode::NearestEven => (v as f32) as f64,
RoundingMode::TowardZero => round_single_toward_zero(v) as f64,
RoundingMode::TowardPosInf => round_single_toward_pos_inf(v) as f64,
RoundingMode::TowardNegInf => round_single_toward_neg_inf(v) as f64,
}
}
/// Round an f64 to an i64 integer honouring FPSCR[RN]. Used by fctidx.
pub fn round_to_i64(ctx: &PpcContext, v: f64) -> i64 {
match rounding_mode(ctx) {
RoundingMode::NearestEven => {
// PPCBUG-221: round-half-to-even (banker's rounding). The previous
// tie-detection used `(diff - 0.5).abs() < f64::EPSILON` which
// breaks for |v| > 2^52 (where v.trunc() == v exactly, giving diff
// == 0). Use a fractional-part-only check that's exact for
// |v| <= 2^52 and degenerates correctly above.
let t = v.trunc();
let frac = v - t;
let fa = frac.abs();
if fa > 0.5 {
t as i64 + if v >= 0.0 { 1 } else { -1 }
} else if fa < 0.5 {
t as i64
} else {
// Exact 0.5 tie — round to even.
let fi = t as i64;
if fi & 1 == 0 { fi } else { fi + if v >= 0.0 { 1 } else { -1 } }
}
}
RoundingMode::TowardZero => v.trunc() as i64,
RoundingMode::TowardPosInf => v.ceil() as i64,
RoundingMode::TowardNegInf => v.floor() as i64,
}
}
/// Round an f64 to an i32 integer honouring FPSCR[RN]. Used by fctiwx.
pub fn round_to_i32(ctx: &PpcContext, v: f64) -> i32 {
round_to_i64(ctx, v).clamp(i32::MIN as i64, i32::MAX as i64) as i32
}
// ------ directed rounding helpers (f64 → f32) ------
fn round_single_toward_zero(v: f64) -> f32 {
// Default f64→f32 is round-to-nearest-even. Emulate truncation:
// take the default rounded value; if the absolute rounded magnitude
// exceeds |v|, bump down by one ULP toward zero.
let rn = v as f32;
if rn.is_nan() || rn == 0.0 { return rn; }
if rn.is_infinite() {
// A finite input that overflows f32 truncates to the largest finite
// magnitude; only a genuinely infinite input stays infinite.
return if v.is_infinite() { rn } else { f32::MAX.copysign(rn) };
}
if rn.abs() as f64 <= v.abs() { return rn; }
// The sign bit is the MSB, so decrementing `to_bits()` steps the magnitude
// down by one ULP for either sign. The subtraction cannot borrow into the
// sign bit because zero, infinity, and NaN were handled above.
f32::from_bits(rn.to_bits() - 1)
}
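fn round_single_toward_pos_inf(v: f64) -> f32 {
let rn = v as f32;
if rn.is_nan() { return rn; }
if rn.is_infinite() {
// A finite negative input that overflows f32 rounds up to the most
// negative finite value; +inf and true infinities are kept.
return if v.is_finite() && rn < 0.0 { f32::MIN } else { rn };
}
if (rn as f64) >= v { return rn; }
// rn < v: bump up by one ULP in the + direction.
let b = rn.to_bits();
let nb = if rn.is_sign_negative() { b - 1 } else { b + 1 };
f32::from_bits(nb)
}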
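fn round_single_toward_neg_inf(v: f64) -> f32 {
let rn = v as f32;
if rn.is_nan() { return rn; }
if rn.is_infinite() {
// A finite positive input that overflows f32 rounds down to the
// largest finite value; -inf and true infinities are kept.
return if v.is_finite() && rn > 0.0 { f32::MAX } else { rn };
}
if (rn as f64) <= v { return rn; }
// rn > v: bump down by one ULP.
let b = rn.to_bits();
let nb = if rn.is_sign_negative() { b + 1 } else { b - 1 };
f32::from_bits(nb)
}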
/// Drop-in replacement for the old `update_cr1_from_fpscr`. Reads the
/// currently-maintained FPSCR bits (FX, FEX, VX, OX) into CR1.
pub fn update_cr1(ctx: &mut PpcContext) {
ctx.cr[1].lt = (ctx.fpscr & FX) != 0;
ctx.cr[1].gt = (ctx.fpscr & FEX) != 0;
ctx.cr[1].eq = (ctx.fpscr & VX) != 0;
ctx.cr[1].so = (ctx.fpscr & OX) != 0;
}
#[cfg(test)]
mod tests {
use super::*;
fn ctx() -> PpcContext { PpcContext::new() }
#[test]
fn rn_default_is_nearest() {
assert_eq!(rounding_mode(&ctx()), RoundingMode::NearestEven);
}
#[test]
fn rn_bits_decode() {
let mut c = ctx();
c.fpscr = 0x1;
assert_eq!(rounding_mode(&c), RoundingMode::TowardZero);
c.fpscr = 0x2;
assert_eq!(rounding_mode(&c), RoundingMode::TowardPosInf);
c.fpscr = 0x3;
assert_eq!(rounding_mode(&c), RoundingMode::TowardNegInf);
}
#[test]
fn fprf_classifies_correctly() {
assert_eq!(classify_fprf(1.0), fprf::POS_NORMAL);
assert_eq!(classify_fprf(-1.0), fprf::NEG_NORMAL);
assert_eq!(classify_fprf(0.0), fprf::POS_ZERO);
assert_eq!(classify_fprf(-0.0), fprf::NEG_ZERO);
assert_eq!(classify_fprf(f64::INFINITY), fprf::POS_INF);
assert_eq!(classify_fprf(f64::NEG_INFINITY), fprf::NEG_INF);
assert_eq!(classify_fprf(f64::NAN), fprf::QNAN);
assert_eq!(classify_fprf(f64::from_bits(1)), fprf::POS_DENORMAL);
}
#[test]
fn fx_is_sticky_on_new_exception() {
let mut c = ctx();
set_exception(&mut c, OX);
assert_ne!(c.fpscr & FX, 0);
// Clear FX/OX manually.
c.fpscr &= !(FX | OX);
// Re-set OX; FX should re-latch.
set_exception(&mut c, OX);
assert_ne!(c.fpscr & FX, 0);
}
#[test]
fn vx_summary_set_on_any_vx_bit() {
let mut c = ctx();
set_exception(&mut c, VXSNAN);
assert_ne!(c.fpscr & VX, 0);
assert_ne!(c.fpscr & VXSNAN, 0);
}
#[test]
fn round_to_single_nearest_is_identity_on_representable() {
let c = ctx();
assert_eq!(round_to_single(&c, 1.0_f64), 1.0_f64);
}
#[test]
fn round_to_i32_clamps_out_of_range() {
let c = ctx();
assert_eq!(round_to_i32(&c, 1e20_f64), i32::MAX);
assert_eq!(round_to_i32(&c, -1e20_f64), i32::MIN);
}
#[test]
fn round_to_i64_nearest_even_on_tie() {
let c = ctx();
assert_eq!(round_to_i64(&c, 0.5_f64), 0);
assert_eq!(round_to_i64(&c, 1.5_f64), 2);
assert_eq!(round_to_i64(&c, 2.5_f64), 2);
assert_eq!(round_to_i64(&c, 3.5_f64), 4);
assert_eq!(round_to_i64(&c, -0.5_f64), 0);
assert_eq!(round_to_i64(&c, -1.5_f64), -2);
assert_eq!(round_to_i64(&c, -2.5_f64), -2);
}
#[test]
fn round_to_i64_non_tie_cases() {
// PPCBUG-221 regression: non-tie fractions must round to nearest.
let c = ctx();
assert_eq!(round_to_i64(&c, 0.4_f64), 0);
assert_eq!(round_to_i64(&c, 0.6_f64), 1);
assert_eq!(round_to_i64(&c, -0.4_f64), 0);
assert_eq!(round_to_i64(&c, -0.6_f64), -1);
}
#[test]
fn round_to_i32_nearest_even_on_tie() {
// PPCBUG-227: round_to_i32 inherits round_to_i64's tie semantics.
let c = ctx();
assert_eq!(round_to_i32(&c, 0.5_f64), 0);
assert_eq!(round_to_i32(&c, 1.5_f64), 2);
assert_eq!(round_to_i32(&c, 2.5_f64), 2);
assert_eq!(round_to_i32(&c, -1.5_f64), -2);
}
#[test]
fn check_invalid_add_detects_inf_minus_inf() {
let mut c = ctx();
assert!(check_invalid_add(&mut c, f64::INFINITY, f64::INFINITY, true));
assert_ne!(c.fpscr & VXISI, 0);
}
#[test]
fn check_invalid_div_detects_zero_over_zero() {
let mut c = ctx();
assert!(check_invalid_div(&mut c, 0.0, 0.0));
assert_ne!(c.fpscr & VXZDZ, 0);
}
#[test]
fn snan_detection() {
// SNaN in binary64: sign=0, exp=all-ones, mantissa nonzero with bit 51 clear.
let snan = f64::from_bits(0x7FF0_0000_0000_0001);
assert!(is_snan(snan));
assert!(!is_snan(f64::NAN));
}
}
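
The SNaN rule that `is_snan` and the `snan_detection` test rely on can be shown on raw bit patterns. The following is a minimal standalone sketch, not code from this crate: the helper names `is_nan_bits`, `is_snan_bits`, and `quiet` are hypothetical, and working on `u64` bits rather than `f64` values is an assumption made so the example does not depend on how the host passes NaN payloads around.

```rust
// Hypothetical helpers for illustration only; not part of the fpscr module.
const EXP_MASK: u64 = 0x7FF0_0000_0000_0000; // 11 exponent bits
const FRAC_MASK: u64 = 0x000F_FFFF_FFFF_FFFF; // 52 mantissa bits
const QUIET_BIT: u64 = 1 << 51; // high mantissa bit

/// NaN: exponent all ones and a nonzero mantissa.
fn is_nan_bits(bits: u64) -> bool {
    (bits & EXP_MASK) == EXP_MASK && (bits & FRAC_MASK) != 0
}

/// Signalling NaN: a NaN whose quiet bit (bit 51) is clear.
fn is_snan_bits(bits: u64) -> bool {
    is_nan_bits(bits) && (bits & QUIET_BIT) == 0
}

/// Quiet a NaN by setting bit 51; non-NaN inputs pass through unchanged.
fn quiet(bits: u64) -> u64 {
    if is_nan_bits(bits) { bits | QUIET_BIT } else { bits }
}
```

Setting bit 51 keeps the rest of the payload intact, which is the conventional way to turn an SNaN into the corresponding QNaN.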

File diff suppressed because it is too large.

@@ -1,9 +1,25 @@
pub mod block_cache;
pub mod context;
pub mod decoder;
pub mod disasm;
pub mod fpscr;
pub mod interpreter;
pub mod opcode;
pub mod overflow;
pub mod phaser;
pub mod reservation;
pub mod scheduler;
pub mod trap;
pub mod vmx;
pub use context::PpcContext;
pub use decoder::decode;
pub use disasm::{DisasmItem, DisasmText, disassemble, format as disasm_format, iter_disasm};
pub use opcode::PpcOpcode;
pub use phaser::{Phaser, PhaserOutcome};
pub use reservation::ReservationTable;
pub use scheduler::{
BlockReason, GuestThread, HwSlot, HwState, MigrationFixup, OrderMode, PcrWriter, RoundOutcome,
Scheduler, SpawnError, SpawnParams, ThreadRef, HW_THREAD_COUNT, INITIAL_GUEST_TID,
QUANTUM_DEFAULT,
};

@@ -145,6 +145,33 @@ impl PpcOpcode {
matches!(self, Self::sc)
}
/// Returns true if this opcode unconditionally ends a basic block:
/// any branch, system call, trap, or `Invalid` (decoder couldn't
/// recognize the instruction — execution will hit the
/// `Unimplemented` arm and we don't want to swallow the boundary
/// inside a cached block).
///
/// Notably *not* terminating: `mtmsr`/`mtmsrd`/`mfmsr`/`isync`/`sync`.
/// On real hardware these have synchronization semantics (a context
/// synchronizing event for `isync`, a memory barrier for `sync`, an MSR
/// rewrite for the `mt*`s), but our interpreter has no
/// asynchronous-exception model and no out-of-order execution, so they
/// execute as plain ALU/move ops and never redirect control flow.
/// Block-cache replay is therefore still bit-for-bit identical to
/// per-instruction dispatch for them.
///
/// Used by the basic-block cache (`block_cache.rs`) to know when to
/// stop accumulating instructions during a forward decode walk.
pub fn terminates_block(&self) -> bool {
matches!(
self,
Self::bx | Self::bcx | Self::bclrx | Self::bcctrx
| Self::sc
| Self::td | Self::tdi | Self::tw | Self::twi
| Self::Invalid
)
}
/// Returns true if this is a load instruction.
pub fn is_load(&self) -> bool {
matches!(self,
@@ -194,3 +221,60 @@ impl std::fmt::Display for PpcOpcode {
std::fmt::Debug::fmt(self, f)
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn terminates_block_includes_all_branches() {
assert!(PpcOpcode::bx.terminates_block());
assert!(PpcOpcode::bcx.terminates_block());
assert!(PpcOpcode::bclrx.terminates_block());
assert!(PpcOpcode::bcctrx.terminates_block());
}
#[test]
fn terminates_block_includes_sc_and_traps() {
assert!(PpcOpcode::sc.terminates_block());
assert!(PpcOpcode::td.terminates_block());
assert!(PpcOpcode::tdi.terminates_block());
assert!(PpcOpcode::tw.terminates_block());
assert!(PpcOpcode::twi.terminates_block());
}
#[test]
fn terminates_block_includes_invalid() {
// Decoder failure must end the block — otherwise an unknown
// opcode would be replayed inside a cached block without going
// through the per-instruction Unimplemented path.
assert!(PpcOpcode::Invalid.terminates_block());
}
#[test]
fn terminates_block_excludes_straight_line_ops() {
// Common ALU and load/store ops must NOT terminate a block.
assert!(!PpcOpcode::addi.terminates_block());
assert!(!PpcOpcode::addis.terminates_block());
assert!(!PpcOpcode::addx.terminates_block());
assert!(!PpcOpcode::cmpi.terminates_block());
assert!(!PpcOpcode::cmp.terminates_block());
assert!(!PpcOpcode::lwz.terminates_block());
assert!(!PpcOpcode::stw.terminates_block());
assert!(!PpcOpcode::lbzx.terminates_block());
assert!(!PpcOpcode::ori.terminates_block());
assert!(!PpcOpcode::oris.terminates_block());
assert!(!PpcOpcode::rlwinmx.terminates_block());
}
#[test]
fn terminates_block_excludes_msr_and_sync_ops() {
// Documented decision: synchronizing ops execute as ALU within
// a block since the interpreter has no async-exception model.
assert!(!PpcOpcode::mtmsr.terminates_block());
assert!(!PpcOpcode::mtmsrd.terminates_block());
assert!(!PpcOpcode::isync.terminates_block());
assert!(!PpcOpcode::sync.terminates_block());
assert!(!PpcOpcode::mfmsr.terminates_block());
}
}
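
How a forward decode walk might use `terminates_block` can be sketched in isolation. This is a hedged illustration, not the contents of `block_cache.rs` (which is not shown here): the `Op` enum, `block_len`, and the `max_len` cap are hypothetical, and including the terminator as the last instruction of the block is an assumption.

```rust
// Hypothetical stand-in for PpcOpcode, reduced to a few representative cases.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Addi,
    Lwz,
    Isync,
    Branch,
    Sc,
    Invalid,
}

impl Op {
    /// Mirrors the shape of the predicate above: branches, system calls,
    /// and undecodable words end a block; `isync` does not.
    fn terminates_block(self) -> bool {
        matches!(self, Op::Branch | Op::Sc | Op::Invalid)
    }
}

/// Length of the block starting at `start`, terminator included
/// (assumption), capped at `max_len` instructions.
fn block_len(code: &[Op], start: usize, max_len: usize) -> usize {
    let mut n = 0;
    for op in code.iter().skip(start).take(max_len) {
        n += 1;
        if op.terminates_block() {
            break;
        }
    }
    n
}
```

With `[Addi, Lwz, Isync, Branch, Addi]` the first block spans four instructions: `Isync` is swallowed as a straight-line op and the walk stops at `Branch`.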

@@ -0,0 +1,178 @@
//! OE / XER[OV] / XER[SO] handling for integer arithmetic.
//!
//! PPC integer ops with the OE bit set update XER[OV] (overflow) and sticky-set
//! XER[SO]. When OE is clear the instruction leaves XER untouched. Signed
//! overflow is predicated on the operation width and operand signs per the
//! PowerISA pseudocode. The word-sized helpers here (`mullw_ov`,
//! `divw_ov_signed`, `divw_ov_unsigned`) test the low 32 bits; the
//! doubleword helpers (`add_ov_64`, `sub_ov_64`, `neg_ov_64`, `mulld_ov`,
//! `divd_ov_signed`, `divd_ov_unsigned`) test the full 64 bits.
use crate::context::PpcContext;
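/// Record the outcome of an OE=1 operation: OV mirrors `overflowed`, while SO
/// is sticky and is only ever set here, never cleared.
#[inline]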
pub fn apply(ctx: &mut PpcContext, overflowed: bool) {
if overflowed {
ctx.xer_ov = 1;
ctx.xer_so = 1;
} else {
ctx.xer_ov = 0;
}
}
/// Signed addition overflow at width-64 (plain `add`, `addc`, `subf`, `subfc`).
///
/// Predicate: same-sign inputs with opposite-sign result.
/// Subtraction callers can use [`sub_ov_64`] instead of rewriting the
/// operation as an addition first.
#[inline]
pub fn add_ov_64(a: u64, b: u64, result: u64) -> bool {
((!(a ^ b)) & (a ^ result)) >> 63 != 0
}
/// Universal signed-overflow predicate for 64-bit arithmetic.
///
/// Caller computes the mathematical (infinite-precision) signed sum as i128,
/// plus the stored 64-bit result. Overflow iff the two disagree — i.e. the
/// true value doesn't fit in i64.
///
/// Use this for multi-term chains (`adde`, `addme`, `addze`, `subfe`, `subfme`,
/// `subfze`) where the carry-in makes the bit-predicate above awkward.
#[inline]
pub fn sum_overflow_64(true_sum: i128, result: u64) -> bool {
true_sum != (result as i64) as i128
}
/// Signed subtraction: RT = b - a. Overflow iff opposite-sign inputs with
/// result sign != b's sign. Equivalently, reduce to addition with `!a + 1`.
#[inline]
pub fn sub_ov_64(a: u64, b: u64, result: u64) -> bool {
((a ^ b) & (b ^ result)) >> 63 != 0
}
/// Signed `addc`/`adde` chain overflow. Same rule as `add_ov_64` — the carry
/// in doesn't alter the sign predicate directly because it's already folded
/// into the stored result.
#[inline]
pub fn adde_ov_64(a: u64, b: u64, result: u64) -> bool {
add_ov_64(a, b, result)
}
/// Signed 32-bit multiply overflow (`mullwo`): the 64-bit product fits in a
/// signed 32-bit word iff sign-extending its low 32 bits reproduces it, i.e.
/// the upper 33 bits are all copies of the word's sign bit.
#[inline]
pub fn mullw_ov(product: i64) -> bool {
let lo = product as i32 as i64;
lo != product
}
/// Signed 64-bit multiply overflow (`mulldo`). Detected via checked_mul.
#[inline]
pub fn mulld_ov(a: i64, b: i64) -> bool {
a.checked_mul(b).is_none()
}
/// `divwo` / `divwuo` / `divdo` / `divduo` raise OV in two cases:
/// * divisor is zero (signed and unsigned forms), or
/// * signed division of `INT_MIN / -1` (quotient doesn't fit).
#[inline]
pub fn divw_ov_signed(ra: i32, rb: i32) -> bool {
rb == 0 || (ra == i32::MIN && rb == -1)
}
#[inline]
pub fn divw_ov_unsigned(rb: u32) -> bool {
rb == 0
}
#[inline]
pub fn divd_ov_signed(ra: i64, rb: i64) -> bool {
rb == 0 || (ra == i64::MIN && rb == -1)
}
#[inline]
pub fn divd_ov_unsigned(rb: u64) -> bool {
rb == 0
}
/// `negx`: RT = -(RA). Overflow only when RA = INT_MIN (the negation doesn't fit).
#[inline]
pub fn neg_ov_64(ra: u64) -> bool {
ra == 0x8000_0000_0000_0000
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn add_no_overflow() {
assert!(!add_ov_64(1, 2, 3));
assert!(!add_ov_64(u64::MAX, 0, u64::MAX));
}
#[test]
fn add_positive_overflow() {
// INT64_MAX + 1 = INT64_MIN — signed overflow
let a = i64::MAX as u64;
let b = 1u64;
let r = a.wrapping_add(b);
assert!(add_ov_64(a, b, r));
}
#[test]
fn add_negative_overflow() {
// INT64_MIN + -1 = INT64_MAX — signed overflow
let a = i64::MIN as u64;
let b = (-1i64) as u64;
let r = a.wrapping_add(b);
assert!(add_ov_64(a, b, r));
}
#[test]
fn sub_overflow_min_minus_pos() {
// INT64_MIN - 1 overflows
let b = i64::MIN as u64;
let a = 1u64;
let r = b.wrapping_sub(a);
assert!(sub_ov_64(a, b, r));
}
#[test]
fn sub_no_overflow() {
let b = 5u64;
let a = 2u64;
let r = b.wrapping_sub(a);
assert!(!sub_ov_64(a, b, r));
}
#[test]
fn mullw_fits_32_bits() {
assert!(!mullw_ov((i32::MAX as i64) * 1));
assert!(!mullw_ov(-1i64));
}
#[test]
fn mullw_overflows_32_bits() {
let p = (i32::MAX as i64) * 2;
assert!(mullw_ov(p));
}
#[test]
fn mulld_overflows() {
assert!(mulld_ov(i64::MAX, 2));
assert!(!mulld_ov(i64::MAX, 1));
// PPCBUG-022: INT_MIN * -1 overflows (=-INT_MIN > INT_MAX).
// checked_mul correctly returns None for this case.
assert!(mulld_ov(i64::MIN, -1), "INT_MIN * -1 overflows i64");
assert!(!mulld_ov(i64::MIN, 1));
assert!(!mulld_ov(i64::MIN + 1, -1), "INT_MIN+1 * -1 = INT_MAX, no overflow");
}
#[test]
fn neg_ov_only_at_min() {
assert!(neg_ov_64(i64::MIN as u64));
assert!(!neg_ov_64(0));
assert!(!neg_ov_64(1));
}
}
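
The interaction between the sign-bit predicate and the sticky SO bit can be sketched end to end. This is an illustrative standalone example under stated assumptions: the `Xer` struct, `addo`, and `reference_add_overflows` are hypothetical names (the crate stores `xer_ov`/`xer_so` on `PpcContext` instead), and `add_ov_64` is copied from the module above so the block compiles on its own.

```rust
// Hypothetical two-flag stand-in for the XER fields on PpcContext.
#[derive(Default)]
struct Xer {
    ov: bool,
    so: bool,
}

/// OV mirrors the latest result; SO is sticky once set.
fn apply(xer: &mut Xer, overflowed: bool) {
    xer.ov = overflowed;
    if overflowed {
        xer.so = true;
    }
}

/// Same-sign inputs with an opposite-sign result (copied from the module).
fn add_ov_64(a: u64, b: u64, result: u64) -> bool {
    ((!(a ^ b)) & (a ^ result)) >> 63 != 0
}

/// Independent reference: signed overflow as reported by `checked_add`.
fn reference_add_overflows(a: u64, b: u64) -> bool {
    (a as i64).checked_add(b as i64).is_none()
}

/// Sketch of an `addo`-style op: wrapping add plus XER update.
fn addo(xer: &mut Xer, a: u64, b: u64) -> u64 {
    let r = a.wrapping_add(b);
    apply(xer, add_ov_64(a, b, r));
    r
}
```

After an overflowing `addo`, a later non-overflowing one clears OV but leaves SO set, which is the behaviour `apply` encodes.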

@@ -0,0 +1,345 @@
//! Quantum-boundary phaser for the M3 per-HW-thread parallel scheduler.
//!
//! Six host threads (one per slot, [`super::HW_THREAD_COUNT`]) run their
//! slots' interpreters in parallel, then meet at a phaser to advance to the
//! next quantum. This is **not** [`std::sync::Barrier`]: a `Barrier` blocks
//! every participant until all have arrived, but our slots can become idle
//! (no runnable thread) and must be able to check in without blocking.
//!
//! ## Semantics
//!
//! - Each slot at the end of its quantum either calls
//! [`Phaser::arrive_and_wait`] (it has a runnable thread to run next
//! quantum) or [`Phaser::skip`] (it's idle this round and will wake on
//! `slot_wake[i]`).
//! - The phase advances when **all 6 slots have either arrived or
//! skipped**. Arrived slots block until the advance; skipped slots
//! return immediately and re-poll their wake state.
//! - The phaser uses a generation counter so a slot that arrives "early"
//! in the next phase doesn't see the prior phase's "all arrived"
//! condition.
//! - Defensive timeout: [`Phaser::arrive_and_wait_timeout`] returns
//! [`PhaserOutcome::Timeout`] if a peer crashes / hangs. Callers
//! typically convert this into a graceful shutdown rather than
//! panicking, so the rest of the topology can tear down cleanly.
//!
//! ## Memory ordering
//!
//! - The participant counter (`arrived_or_skipped`) is a plain `u32`
//! guarded by the `inner` mutex, so the last contributor sees a consistent
//! "everyone is here" snapshot without any atomic ordering of its own.
//! - The generation `phase` is read with `Acquire` in arrivers' wait
//! loops; the advancing thread stores with `Release` after bumping.
//! - The condvar's broadcast publishes the phase; the wait loop
//! re-checks `phase` against its captured value to defend against
//! spurious wakeups.
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Condvar, Mutex};
use std::time::{Duration, Instant};
/// Outcome of a phaser arrival.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhaserOutcome {
/// All participants arrived/skipped — phase advanced. Caller proceeds
/// into the next quantum.
Advanced,
/// Defensive timeout fired before all peers arrived. Caller should
/// log + initiate shutdown rather than retry.
Timeout,
/// Phaser was shut down via [`Phaser::shutdown`]; all waiters are
/// woken and return this. Caller exits cleanly.
Shutdown,
}
/// Custom barrier-with-skip primitive. Construct once with the number of
/// participating slots; share via `Arc` across host threads.
pub struct Phaser {
/// Total participant count (constant after construction). For our
/// scheduler this is `HW_THREAD_COUNT = 6`.
party_count: u32,
/// Monotonic phase counter, incremented every time the phase
/// advances. Used as a generation marker so a slot that wakes "into"
/// the next phase doesn't observe the old "everyone arrived" state.
phase: AtomicU32,
/// Inner state guarded by the condvar's mutex.
inner: Mutex<Inner>,
/// Notified when a phase advances or shutdown fires.
cv: Condvar,
}
#[derive(Debug)]
struct Inner {
arrived_or_skipped: u32,
shutdown: bool,
}
impl Phaser {
/// Create a phaser with `party_count` participants. Panics if
/// `party_count == 0`.
pub fn new(party_count: u32) -> Self {
assert!(party_count > 0, "phaser party_count must be > 0");
Self {
party_count,
phase: AtomicU32::new(0),
inner: Mutex::new(Inner {
arrived_or_skipped: 0,
shutdown: false,
}),
cv: Condvar::new(),
}
}
/// Get the current phase number. Useful for tests and observability.
pub fn current_phase(&self) -> u32 {
self.phase.load(Ordering::Acquire)
}
/// Mark this slot as not participating in the current phase. Counts
/// toward the advance threshold but does not block. Used when a slot
/// has no runnable thread and is parked waiting on
/// `slot_wake[i].unpark()`.
///
/// `_slot_id` is informational (not stored); the parameter exists so
/// call sites stay greppable.
pub fn skip(&self, _slot_id: u8) {
self.contribute_advance();
}
/// Block until the phase advances or the defensive 5-second timeout
/// fires. Returns [`PhaserOutcome::Advanced`] on a clean phase
/// transition, [`PhaserOutcome::Timeout`] if a peer hung, and
/// [`PhaserOutcome::Shutdown`] on tear-down.
///
/// `_slot_id` is informational (see [`Self::skip`]).
pub fn arrive_and_wait(&self, _slot_id: u8) -> PhaserOutcome {
self.arrive_and_wait_timeout(Duration::from_secs(5))
}
/// Same as [`Self::arrive_and_wait`] with a caller-supplied timeout.
pub fn arrive_and_wait_timeout(&self, timeout: Duration) -> PhaserOutcome {
let pre_phase = self.phase.load(Ordering::Acquire);
self.contribute_advance();
let deadline = Instant::now() + timeout;
let mut guard = self.inner.lock().unwrap();
loop {
if guard.shutdown {
return PhaserOutcome::Shutdown;
}
if self.phase.load(Ordering::Acquire) != pre_phase {
return PhaserOutcome::Advanced;
}
let now = Instant::now();
if now >= deadline {
return PhaserOutcome::Timeout;
}
let remaining = deadline - now;
let result = self.cv.wait_timeout(guard, remaining).unwrap();
guard = result.0;
if result.1.timed_out() {
// Re-check once before reporting a timeout: the phase may have
// advanced, or shutdown may have fired, right at the deadline.
if self.phase.load(Ordering::Acquire) != pre_phase {
return PhaserOutcome::Advanced;
}
if guard.shutdown {
return PhaserOutcome::Shutdown;
}
return PhaserOutcome::Timeout;
}
}
}
/// Wake every parked arriver and signal shutdown. After this, all
/// future and outstanding `arrive_and_wait_*` calls return
/// [`PhaserOutcome::Shutdown`].
pub fn shutdown(&self) {
let mut guard = self.inner.lock().unwrap();
guard.shutdown = true;
self.cv.notify_all();
}
/// Common path for both arrive-and-wait and skip: bump the
/// participant counter, and if we were the last one in, advance the
/// phase + broadcast.
fn contribute_advance(&self) {
let mut guard = self.inner.lock().unwrap();
guard.arrived_or_skipped += 1;
if guard.arrived_or_skipped >= self.party_count {
// Last one in. Reset the counter, bump the phase, broadcast.
guard.arrived_or_skipped = 0;
// `Release` on the phase store pairs with `Acquire` reads in
// arriving slots' wait-loop predicates.
self.phase.fetch_add(1, Ordering::Release);
self.cv.notify_all();
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use std::sync::Arc;
use std::sync::atomic::AtomicU32;
use std::thread;
/// All N participants arrive — phase advances, every arriver returns
/// `Advanced`.
#[test]
fn n_arrivers_all_advance() {
const N: u32 = 6;
let p = Arc::new(Phaser::new(N));
let mut handles = Vec::new();
for i in 0..N {
let p = p.clone();
handles.push(
thread::Builder::new()
.name(format!("phaser-test-{i}"))
.spawn(move || p.arrive_and_wait(i as u8))
.unwrap(),
);
}
for h in handles {
assert_eq!(h.join().unwrap(), PhaserOutcome::Advanced);
}
assert_eq!(p.current_phase(), 1);
}
/// 5 arrive + 1 skip → phase advances; arrivers see `Advanced`.
#[test]
fn skip_counts_toward_advance() {
const N: u32 = 6;
let p = Arc::new(Phaser::new(N));
let mut handles = Vec::new();
for i in 0..(N - 1) {
let p = p.clone();
handles.push(
thread::Builder::new()
.name(format!("phaser-arrive-{i}"))
.spawn(move || p.arrive_and_wait(i as u8))
.unwrap(),
);
}
// Brief pause to let arrivers park first (exercising the
// skip-unblocks-arrivers path).
thread::sleep(Duration::from_millis(20));
p.skip((N - 1) as u8);
for h in handles {
assert_eq!(h.join().unwrap(), PhaserOutcome::Advanced);
}
assert_eq!(p.current_phase(), 1);
}
/// Shutdown wakes parked arrivers; they return `Shutdown`.
#[test]
fn shutdown_wakes_arrivers() {
const N: u32 = 6;
let p = Arc::new(Phaser::new(N));
let mut handles = Vec::new();
// Only N-1 arrive — phase will not advance.
for i in 0..(N - 1) {
let p = p.clone();
handles.push(
thread::Builder::new()
.name(format!("phaser-arrive-shutdown-{i}"))
.spawn(move || p.arrive_and_wait(i as u8))
.unwrap(),
);
}
thread::sleep(Duration::from_millis(20));
p.shutdown();
for h in handles {
assert_eq!(h.join().unwrap(), PhaserOutcome::Shutdown);
}
}
/// Defensive timeout: if some peers never arrive, others surface
/// `Timeout` rather than blocking forever.
#[test]
fn timeout_fires_when_peer_hangs() {
const N: u32 = 4;
let p = Arc::new(Phaser::new(N));
// Only 2 of 4 arrive — others "hang".
let p1 = p.clone();
let h1 = thread::spawn(move || {
p1.arrive_and_wait_timeout(Duration::from_millis(50))
});
let p2 = p.clone();
let h2 = thread::spawn(move || {
p2.arrive_and_wait_timeout(Duration::from_millis(50))
});
assert_eq!(h1.join().unwrap(), PhaserOutcome::Timeout);
assert_eq!(h2.join().unwrap(), PhaserOutcome::Timeout);
}
/// Multi-phase stress: all participants run a tight loop of
/// arrive_and_wait calls; after K phases they all observe the same
/// `current_phase()` value. Catches generation/counter resync bugs.
#[test]
fn multi_phase_progress() {
const N: u32 = 6;
const K: u32 = 1000;
let p = Arc::new(Phaser::new(N));
let counter = Arc::new(AtomicU32::new(0));
let mut handles = Vec::new();
for i in 0..N {
let p = p.clone();
let c = counter.clone();
handles.push(
thread::Builder::new()
.name(format!("phaser-multi-{i}"))
.spawn(move || {
for _ in 0..K {
assert_eq!(
p.arrive_and_wait(i as u8),
PhaserOutcome::Advanced
);
}
c.fetch_add(1, Ordering::Relaxed);
})
.unwrap(),
);
}
for h in handles {
h.join().unwrap();
}
assert_eq!(counter.load(Ordering::Relaxed), N);
assert_eq!(p.current_phase(), K);
}
/// Mixed skip/arrive across phases — emulates the realistic scheduler
/// pattern where slots become idle for some quanta.
#[test]
fn mixed_skip_and_arrive_random() {
const N: u32 = 6;
const K: u32 = 200;
let p = Arc::new(Phaser::new(N));
let mut handles = Vec::new();
for i in 0..N {
let p = p.clone();
handles.push(
thread::Builder::new()
.name(format!("phaser-mixed-{i}"))
.spawn(move || {
// Pseudo-random skip pattern based on slot+phase
let mut state: u32 = 0x9E37_79B9u32.wrapping_add(i);
for phase in 0..K {
state = state.wrapping_mul(0x6C8E_9CF7).wrapping_add(phase);
if state & 0xF == 0 {
p.skip(i as u8);
} else {
let _ = p.arrive_and_wait(i as u8);
}
}
})
.unwrap(),
);
}
for h in handles {
h.join().unwrap();
}
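// Every thread makes exactly K contributions (arrive or skip), so the
// N*K total produces exactly K phase advances. A skipping slot returns
// immediately and may contribute to the same phase twice, so a phase is
// not guaranteed to see exactly one contribution per slot.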
assert_eq!(p.current_phase(), K);
}
}
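
The generation-counter idea behind the phaser can be reduced to a few lines. The sketch below is a simplified illustration, not the `Phaser` from this file: `MiniPhaser`, `contribute`, and `run_rounds` are hypothetical names, the phase number lives under the mutex instead of in an `AtomicU32`, and the timeout and shutdown paths are omitted.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical reduced phaser. State is (contributions this phase, phase).
struct MiniPhaser {
    parties: u32,
    state: Mutex<(u32, u32)>,
    cv: Condvar,
}

impl MiniPhaser {
    fn new(parties: u32) -> Self {
        assert!(parties > 0);
        Self { parties, state: Mutex::new((0, 0)), cv: Condvar::new() }
    }

    /// Count one contribution; the last contributor advances the phase.
    /// Returns the phase number observed on exit.
    fn contribute(&self, wait: bool) -> u32 {
        let mut g = self.state.lock().unwrap();
        let my_phase = g.1;
        g.0 += 1;
        if g.0 == self.parties {
            g.0 = 0;
            g.1 += 1;
            self.cv.notify_all();
        } else if wait {
            // Comparing against the captured phase guards against
            // spurious wakeups.
            while g.1 == my_phase {
                g = self.cv.wait(g).unwrap();
            }
        }
        g.1
    }

    fn arrive_and_wait(&self) -> u32 {
        self.contribute(true)
    }

    fn skip(&self) {
        self.contribute(false);
    }
}

/// Run `rounds` full phases on `parties` threads; returns the final phase.
fn run_rounds(parties: u32, rounds: u32) -> u32 {
    let p = Arc::new(MiniPhaser::new(parties));
    let handles: Vec<_> = (0..parties)
        .map(|_| {
            let p = Arc::clone(&p);
            thread::spawn(move || {
                for _ in 0..rounds {
                    p.arrive_and_wait();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let phase = p.state.lock().unwrap().1;
    phase
}
```

Because each waiter compares against the phase it captured on arrival, a thread that races into the next round never mistakes the previous advance for its own.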

@@ -0,0 +1,424 @@
//! Inter-thread reservation table for `lwarx`/`stwcx.` and
//! `ldarx`/`stdcx.`.
//!
//! On real Xenon, each core's `lwarx` places a reservation on a 128-byte
//! cache line; any other CPU's store to the line invalidates the
//! reservation. `stwcx.`'s success depends on the reservation still being
//! valid. Under M3's per-HW-thread parallelism, we need an inter-thread
//! mechanism for the same guarantee.
//!
//! M2 introduces the table behind a runtime `reservations_enabled` flag
//! (default `false`). When the flag is `false`, the interpreter's
//! existing per-`PpcContext` `reserved_line`/`has_reservation` fields are
//! used as-is — no inter-thread tracking. M3 flips the flag on once the
//! per-HW-thread host threads are spawning.
//!
//! ## Design
//!
//! - **Banked AtomicU64 array** of [`NUM_LINES`] entries (4096 × 8 B =
//! 32 KiB total). Each entry packs `(line_address, generation,
//! hw_id)`. A zero value means "no reservation on this bank".
//! - **Hash function**: `(line >> 7) & (NUM_LINES - 1)`. Different lines
//! that map to the same bank conservatively invalidate each other's
//! reservations — sound (real Xenon's L2 has finite associativity and
//! has the same property), at the cost of slightly more `stwcx.`
//! failures than a perfect-mapping table would produce.
//! - **`active_reservers: AtomicU16`**: a fast-path counter incremented
//! when an `lwarx` claims an empty bank and decremented when that slot is
//! either committed or invalidated. `write_u32` checks this with a
//! single `Relaxed` load; when zero (the common case in code that
//! doesn't use atomics), the invalidation hook is a one-instruction
//! skip.
//! - **Generation counter**: monotonic across all reservations,
//! incremented atomically. Only the low 24 bits are packed into the slot,
//! so the stamp wraps after 16 M reservations table-wide; at
//! multi-million reservations/sec sustained that's still many seconds, and a
//! stale-gen `stwcx.` simply fails (sound, not livelocking).
//!
//! ## Invariants
//!
//! 1. A `stwcx.(addr)` succeeds only if the line slot still holds the
//! same `(line, gen, hw_id)` triple the reserver stamped at `lwarx`.
//! 2. Any plain store to a reserved line invalidates it (slot CASed to
//! zero). Hash-collision side-effect: a store to a different line
//! that maps to the same bank also invalidates — guests that observe
//! a `stwcx.` failure simply retry, so this is correctness-preserving.
//! 3. `stwcx.` from a different `hw_id` than the reserver fails even if
//! the line and gen would otherwise match — only the originating HW
//! thread can commit its own reservation.
//!
//! Memory ordering: all CAS / store operations on the line slot use
//! `AcqRel`; readers use `Acquire`. The store inside `stwcx.`'s payload
//! itself (the actual data write) is the caller's responsibility — see
//! [`crate::interpreter`]'s `stwcx.` arm.
use std::sync::atomic::{AtomicU16, AtomicU64, Ordering};
/// Real Xenon L2 cache-line size — the granule a reservation covers.
pub const LINE_BYTES: u32 = 0x80;
/// Mask to align an address to a cache-line boundary.
pub const LINE_MASK: u32 = !(LINE_BYTES - 1);
/// Number of bank entries in the reservation table. Power of two so the
/// hash is a single AND. 32 KiB total at 8 B per entry.
pub const NUM_LINES: usize = 4096;
const HASH_MASK: u32 = (NUM_LINES as u32) - 1;
/// Pack `(line_addr, generation, hw_id)` into a single u64. The packed
/// layout is:
/// bits 63..32: line address (we only need the high bits since the
/// low 7 are always zero — reserved range is line-aligned)
/// bits 31..8: 24-bit generation
/// bits 7..0: 8-bit `hw_id`
///
/// A packed value of `0` means "no reservation". Since we never reserve
/// on guest virtual address `0` (the page is unmapped) and the
/// generation increments from `1`, zero is a safe sentinel.
#[inline]
pub fn pack(line_addr: u32, generation: u32, hw_id: u8) -> u64 {
debug_assert!(line_addr & !LINE_MASK == 0, "line_addr must be line-aligned");
debug_assert!(generation < (1 << 24), "generation must fit in 24 bits");
((line_addr as u64) << 32)
| ((generation as u64 & 0xFF_FFFF) << 8)
| (hw_id as u64)
}
/// Inverse of [`pack`]. Returns `None` if the value is the zero sentinel
/// (no reservation).
#[inline]
pub fn unpack(raw: u64) -> Option<(u32, u32, u8)> {
if raw == 0 {
return None;
}
let line = (raw >> 32) as u32;
let generation = ((raw >> 8) & 0xFF_FFFF) as u32;
let hw_id = (raw & 0xFF) as u8;
Some((line, generation, hw_id))
}
#[inline]
fn hash(line_addr: u32) -> usize {
((line_addr >> 7) & HASH_MASK) as usize
}
#[inline]
fn align_to_line(addr: u32) -> u32 {
addr & LINE_MASK
}
/// Banked reservation table shared across all emulated HW threads. Built
/// once per emulation instance; lives behind an `Arc` so worker host
/// threads (M3) can hold their own clones without lifetime gymnastics.
pub struct ReservationTable {
lines: Vec<AtomicU64>,
active_reservers: AtomicU16,
next_gen: AtomicU64,
/// Runtime activation flag. Default `false`. M2.8's
/// `--reservations-table` flag (or M3 spawn) flips this to `true`,
/// at which point the interpreter's `lwarx`/`stwcx.` arms route
/// through the table; otherwise they use the legacy per-`PpcContext`
/// reservation fields.
enabled: std::sync::atomic::AtomicBool,
}
impl Default for ReservationTable {
fn default() -> Self {
Self::new()
}
}
impl ReservationTable {
/// Construct a fresh table with all banks empty.
pub fn new() -> Self {
let mut lines = Vec::with_capacity(NUM_LINES);
for _ in 0..NUM_LINES {
lines.push(AtomicU64::new(0));
}
Self {
lines,
active_reservers: AtomicU16::new(0),
// Start at 1 so the very first reservation gets a non-zero
// gen and the packed slot value is non-zero (zero is the
// "no reservation" sentinel).
next_gen: AtomicU64::new(1),
enabled: std::sync::atomic::AtomicBool::new(false),
}
}
/// Activate the table. The interpreter's `lwarx`/`stwcx.` arms will
/// route through this table on subsequent dispatches. Idempotent.
pub fn enable(&self) {
self.enabled.store(true, Ordering::Release);
}
/// Deactivate the table. The interpreter falls back to per-`PpcContext`
/// reservation fields. Idempotent.
pub fn disable(&self) {
self.enabled.store(false, Ordering::Release);
}
/// Whether the table is currently active. The interpreter consults
/// this on every `lwarx`/`stwcx.` to decide which path runs.
pub fn is_enabled(&self) -> bool {
self.enabled.load(Ordering::Acquire)
}
/// True when at least one reservation is currently outstanding.
/// Plain `write_u32` consults this to skip the invalidation hook
/// when no thread holds a reservation — the common case for
/// non-atomic code.
#[inline]
pub fn has_active_reservers(&self) -> bool {
self.active_reservers.load(Ordering::Relaxed) > 0
}
/// `lwarx(addr)` — claim a reservation on the line containing `addr`.
/// Returns the generation stamped into the slot; the interpreter
/// stores this alongside the per-`PpcContext` `has_reservation` bit
/// so a subsequent `stwcx.` can verify the same gen still holds.
///
/// If a different reservation already occupied the bank, it's
/// silently overwritten — that thread's `stwcx.` will fail because
/// the slot no longer matches its stamped gen. Matches Xenon
/// behavior (a different core's lwarx on the same line displaces
/// any prior reservation).
pub fn reserve(&self, addr: u32, hw_id: u8) -> u32 {
let line = align_to_line(addr);
let generation = (self
.next_gen
.fetch_add(1, Ordering::Relaxed)
& 0xFF_FFFF) as u32;
let new_raw = pack(line, generation, hw_id);
// AcqRel swap: the release half orders our prior reads of the
// reservation target before any thread that observes the new slot
// value; the acquire half synchronizes with whoever wrote `prev`.
let prev = self.lines[hash(line)].swap(new_raw, Ordering::AcqRel);
// The counter tracks occupied bank slots, not individual reservers.
// Claiming an empty slot adds one. Displacing an existing
// reservation implicitly invalidates the old reserver and installs
// ours, so the count stays unchanged. Net effect: the counter
// equals the number of *bank slots* with a non-zero value, which
// is an upper bound on actual reservers.
if prev == 0 {
self.active_reservers.fetch_add(1, Ordering::Relaxed);
}
generation
}
/// `stwcx.(addr)` — try to commit a reservation. Returns `true` if
/// the slot still holds `(line, my_gen, my_hw_id)` (in which case
/// it's CAS'd back to zero, releasing the bank), `false` otherwise.
/// The data store itself is the caller's responsibility — see
/// [`crate::interpreter`]'s `stwcx.` arm.
pub fn try_commit(&self, addr: u32, my_gen: u32, my_hw_id: u8) -> bool {
let line = align_to_line(addr);
let expected = pack(line, my_gen, my_hw_id);
match self.lines[hash(line)].compare_exchange(
expected,
0,
Ordering::AcqRel,
Ordering::Relaxed,
) {
Ok(_) => {
// Successfully released the slot; decrement the active
// count.
self.active_reservers.fetch_sub(1, Ordering::Relaxed);
true
}
Err(_) => false,
}
}
/// Hook for plain (non-reserving) stores: invalidate any
/// reservation on the containing line. Cheap when the bank is
/// already empty (single Acquire load + branch).
pub fn invalidate_for_write(&self, addr: u32) {
let line = align_to_line(addr);
let bank = &self.lines[hash(line)];
let prev = bank.load(Ordering::Acquire);
if prev == 0 {
return;
}
// Verify the slot still holds a reservation on *this* line
// before clearing. Hash collisions mean the bank may hold a
// reservation on an unrelated line that maps to the same slot.
// In that case the reservation is left intact: the write did not
// touch the reserved line, so clearing it would only force a
// spurious `stwcx.` retry.
if let Some((bank_line, _generation, _hw)) = unpack(prev) {
if bank_line != line {
// Different line in the same bank — leave it alone (we
// chose not to invalidate cross-line collisions to
// reduce false-fail noise; real-HW behavior is similar
// since L2 associativity sets cross-line constraints).
return;
}
}
// CAS-clear the bank if it still holds the value we observed.
// If a concurrent `stwcx.` or `reserve` raced with us, the CAS
// fails. That's fine: the racing operation already cleared or
// replaced the reservation we saw, and `active_reservers` was
// kept consistent by that operation.
if bank
.compare_exchange(prev, 0, Ordering::AcqRel, Ordering::Relaxed)
.is_ok()
{
self.active_reservers.fetch_sub(1, Ordering::Relaxed);
}
}
/// Drop a per-`PpcContext` reservation without committing. Called
/// when the interpreter clears `has_reservation` due to a
/// non-`stwcx.` event (context switch, exception, etc.). Safe to
/// call when the table doesn't hold our reservation anymore (the
/// CAS simply fails).
pub fn release(&self, addr: u32, my_gen: u32, my_hw_id: u8) {
let _ = self.try_commit(addr, my_gen, my_hw_id);
}
}
#[cfg(test)]
mod tests {
use super::*;
use std::sync::Arc;
use std::thread;
#[test]
fn pack_unpack_roundtrip() {
let raw = pack(0x1000_0000, 42, 5);
let (line, generation, hw) = unpack(raw).unwrap();
assert_eq!(line, 0x1000_0000);
assert_eq!(generation, 42);
assert_eq!(hw, 5);
}
#[test]
fn unpack_zero_is_none() {
assert!(unpack(0).is_none());
}
#[test]
fn reserve_then_commit_succeeds() {
let t = ReservationTable::new();
let gn = t.reserve(0x1234, 0);
assert!(t.try_commit(0x1234, gn, 0));
// Already released — second commit fails.
assert!(!t.try_commit(0x1234, gn, 0));
}
#[test]
fn other_hw_id_cannot_commit() {
let t = ReservationTable::new();
let gn = t.reserve(0x1234, 0);
assert!(
!t.try_commit(0x1234, gn, 1),
"stwcx. from a different hw_id must fail"
);
// Original owner can still commit.
assert!(t.try_commit(0x1234, gn, 0));
}
#[test]
fn lwarx_displaces_prior_reservation() {
let t = ReservationTable::new();
let g0 = t.reserve(0x1234, 0);
// Different HW thread's lwarx on the same line.
let g1 = t.reserve(0x1234, 1);
// Original reserver's stwcx. fails because the gen changed.
assert!(!t.try_commit(0x1234, g0, 0));
// New reserver's stwcx. succeeds.
assert!(t.try_commit(0x1234, g1, 1));
}
#[test]
fn invalidate_clears_matching_reservation() {
let t = ReservationTable::new();
let gn = t.reserve(0x1234, 0);
t.invalidate_for_write(0x1238); // same line as 0x1234
assert!(!t.try_commit(0x1234, gn, 0));
assert_eq!(t.active_reservers.load(Ordering::Relaxed), 0);
}
#[test]
fn invalidate_different_line_in_same_bank_is_noop() {
let t = ReservationTable::new();
// Force a hash collision: addr A and addr B with same hash but
// different line addresses.
let line_a = 0x0000_1000;
let line_b = line_a + ((NUM_LINES as u32) << 7); // +0x80000 → same hash
assert_eq!(hash(line_a), hash(line_b));
let gn = t.reserve(line_a, 0);
// Invalidating line_b must NOT clear line_a's reservation.
t.invalidate_for_write(line_b);
assert!(t.try_commit(line_a, gn, 0));
}
#[test]
fn has_active_reservers_tracks_count() {
let t = ReservationTable::new();
assert!(!t.has_active_reservers());
let g0 = t.reserve(0x1000, 0);
assert!(t.has_active_reservers());
let g1 = t.reserve(0x2000, 1);
assert!(t.has_active_reservers());
t.try_commit(0x1000, g0, 0);
assert!(t.has_active_reservers());
t.try_commit(0x2000, g1, 1);
assert!(!t.has_active_reservers());
}
/// Stress test: 8 host threads each loop reserve+stwcx on the same
/// line. A `stwcx.` succeeds only if no other thread's `lwarx`
/// displaced the reservation in between, so some attempts win and
/// the rest fail. The threads are not synchronized into rounds, so
/// the test asserts progress (more than 10% of attempts commit)
/// rather than an exact success count.
///
/// This exercises the table's mutual-exclusion property: at most one
/// thread's stwcx. on a given line can succeed between two events
/// that would invalidate the line.
#[test]
fn concurrent_lwarx_stwcx_serializes() {
let t = Arc::new(ReservationTable::new());
const ROUNDS: u32 = 1000;
const THREADS: u8 = 8;
let total_successes = Arc::new(AtomicU64::new(0));
let mut handles = Vec::new();
for hw_id in 0..THREADS {
let t_clone = t.clone();
let s_clone = total_successes.clone();
handles.push(
thread::Builder::new()
.name(format!("res-stress-{hw_id}"))
.spawn(move || {
let mut wins = 0u64;
for _ in 0..ROUNDS {
let gn = t_clone.reserve(0x1234_5678, hw_id);
if t_clone.try_commit(0x1234_5678, gn, hw_id) {
wins += 1;
}
}
s_clone.fetch_add(wins, Ordering::Relaxed);
})
.expect("spawn"),
);
}
for h in handles {
h.join().expect("join");
}
let total = total_successes.load(Ordering::Relaxed);
// No exact count is guaranteed: a reservation can be displaced by
// another thread's `reserve` before its owner commits, so any
// given attempt may fail. Assert progress instead: more than 10%
// of attempts succeed, and active_reservers is back to zero.
let attempts = ROUNDS as u64 * THREADS as u64;
assert!(
total > attempts / 10,
"expected at least 10% successful commits, got {total}/{attempts}"
);
assert_eq!(
t.active_reservers.load(Ordering::Relaxed),
0,
"all reservations should have been resolved"
);
}
}
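Review note: the reserve/commit protocol above can be illustrated with a single-bank sketch. Everything named here (`MiniBank`, the `pack` bit layout, `guest_increment`, `displaced_commit_fails`) is an assumption made for this example; the crate's real `pack`, `hash` and `align_to_line` may be laid out differently. The sketch is single-threaded and only shows the control flow of an `lwarx`/`stwcx.` retry loop.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Assumed slot layout (illustration only): line address in bits 63..32,
/// 24-bit generation in bits 31..8, hw_id in bits 7..0. Zero means empty.
fn pack(line: u32, generation: u32, hw_id: u8) -> u64 {
    ((line as u64) << 32) | (((generation & 0xFF_FFFF) as u64) << 8) | hw_id as u64
}

/// One reservation bank, standing in for a single entry of the table.
pub struct MiniBank {
    slot: AtomicU64,
    next_gen: AtomicU64,
}

impl MiniBank {
    pub fn new() -> Self {
        // Generations start at 1 so the first packed value is non-zero.
        Self { slot: AtomicU64::new(0), next_gen: AtomicU64::new(1) }
    }

    /// `lwarx`: stamp a fresh generation, displacing any earlier reservation.
    pub fn reserve(&self, line: u32, hw_id: u8) -> u32 {
        let generation = (self.next_gen.fetch_add(1, Ordering::Relaxed) & 0xFF_FFFF) as u32;
        self.slot.swap(pack(line, generation, hw_id), Ordering::AcqRel);
        generation
    }

    /// `stwcx.`: succeeds only if the slot still holds our exact stamp.
    pub fn try_commit(&self, line: u32, generation: u32, hw_id: u8) -> bool {
        self.slot
            .compare_exchange(pack(line, generation, hw_id), 0, Ordering::AcqRel, Ordering::Relaxed)
            .is_ok()
    }
}

/// Guest-style increment: retry the reserve/commit pair until the commit
/// wins. The data store is the caller's job, as in the real `try_commit`.
pub fn guest_increment(bank: &MiniBank, word: &mut u32, line: u32, hw_id: u8) -> u32 {
    loop {
        let generation = bank.reserve(line, hw_id); // lwarx
        let loaded = *word;
        if bank.try_commit(line, generation, hw_id) {
            *word = loaded.wrapping_add(1); // stwcx. succeeded: store
            return *word;
        }
    }
}

/// A second `reserve` on the same line displaces the first reserver.
/// Returns (first reserver's commit result, second reserver's commit result).
pub fn displaced_commit_fails() -> (bool, bool) {
    let bank = MiniBank::new();
    let g0 = bank.reserve(0x1200, 0);
    let g1 = bank.reserve(0x1200, 1);
    (bank.try_commit(0x1200, g0, 0), bank.try_commit(0x1200, g1, 1))
}
```

The generation stamp is what lets a displaced reserver detect the loss: its expected packed value no longer matches the slot, so its compare-exchange fails without any extra bookkeeping.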

File diff suppressed because it is too large


@@ -0,0 +1,95 @@
//! TO-field evaluation for `tw`, `twi`, `td`, `tdi`.
//!
//! The TO field (5 bits) encodes which comparison outcomes trigger a trap:
//!
//! | bit | condition |
//! |-----|-----------|
//! | 0 | a < b (signed) |
//! | 1 | a > b (signed) |
//! | 2 | a == b |
//! | 3 | a < b (unsigned) |
//! | 4 | a > b (unsigned) |
//!
//! The bit numbering matches PowerISA ("MSB is bit 0"): TO[0] corresponds to
//! the high bit of the 5-bit field, i.e. (to >> 4) & 1.
//!
//! `tw` / `twi` compare the low 32 bits of the operands (sign-extended back to
//! 64 for the signed comparison); `td` / `tdi` compare the full 64 bits.
#[derive(Clone, Copy, Debug)]
pub enum TrapWidth {
Word, // tw, twi: 32-bit
Doubleword, // td, tdi: 64-bit
}
const TO_SLT: u32 = 1 << 4; // a < b signed
const TO_SGT: u32 = 1 << 3; // a > b signed
const TO_EQ: u32 = 1 << 2; // a == b
const TO_ULT: u32 = 1 << 1; // a < b unsigned
const TO_UGT: u32 = 1 << 0; // a > b unsigned
/// Returns true when the trap should fire.
pub fn evaluate(to: u32, a: u64, b: u64, width: TrapWidth) -> bool {
let (sa, sb, ua, ub): (i64, i64, u64, u64) = match width {
TrapWidth::Word => (
a as i32 as i64,
b as i32 as i64,
a as u32 as u64,
b as u32 as u64,
),
TrapWidth::Doubleword => (a as i64, b as i64, a, b),
};
if (to & TO_SLT) != 0 && sa < sb { return true; }
if (to & TO_SGT) != 0 && sa > sb { return true; }
if (to & TO_EQ) != 0 && ua == ub { return true; }
if (to & TO_ULT) != 0 && ua < ub { return true; }
if (to & TO_UGT) != 0 && ua > ub { return true; }
false
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn to_zero_never_traps() {
assert!(!evaluate(0, 0, 0, TrapWidth::Doubleword));
assert!(!evaluate(0, 5, 3, TrapWidth::Doubleword));
assert!(!evaluate(0, !0, 0, TrapWidth::Doubleword));
}
#[test]
fn to_31_always_traps_when_any_condition_holds() {
// 31 = 0b11111 = all conditions enabled
assert!(evaluate(31, 1, 2, TrapWidth::Doubleword)); // slt+ult
assert!(evaluate(31, 2, 1, TrapWidth::Doubleword)); // sgt+ugt
assert!(evaluate(31, 7, 7, TrapWidth::Doubleword)); // eq
}
#[test]
fn to_eq_only() {
// TO[2] = 0b00100 = 4
assert!(evaluate(4, 5, 5, TrapWidth::Doubleword));
assert!(!evaluate(4, 5, 6, TrapWidth::Doubleword));
}
#[test]
fn to_signed_vs_unsigned_on_negative() {
// a=-1 (as u64 = all-ones). TO[0]=slt enabled = 0b10000 = 16
// Signed: -1 < 0 → true
let neg1 = (-1i64) as u64;
assert!(evaluate(16, neg1, 0, TrapWidth::Doubleword));
// TO[3]=ult enabled = 0b00010 = 2 → unsigned: all-ones < 0 is false
assert!(!evaluate(2, neg1, 0, TrapWidth::Doubleword));
}
#[test]
fn word_width_ignores_high_32_bits() {
// a's low 32 bits are 1 but its high 32 bits are non-zero; b = 1.
// With TO=eq, the word-width compare should trap.
let a = 0xDEAD_BEEF_0000_0001u64;
assert!(evaluate(4, a, 1, TrapWidth::Word));
// In doubleword mode the full 64-bit values differ, so no trap.
assert!(!evaluate(4, a, 1, TrapWidth::Doubleword));
}
}
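Review note: the MSB-first numbering of the TO field is easy to get backwards, so here is a small worked sketch. The helper names (`to_bit`, `to_field`, `traps_word`) are hypothetical and exist only for this example. It relies on two facts from the ISA encoding: the TO field occupies instruction bits 6..10 (MSB-numbered), i.e. `(insn >> 21) & 0x1F`, and `twi 31,r0,0` encodes as `0x0FE00000` while `trap` (`tw 31,r0,r0`) encodes as `0x7FE00008`.

```rust
/// Mask for PowerISA bit `n` of the 5-bit TO field, where bit 0 is the
/// most significant bit. `n` must be in 0..=4.
pub const fn to_bit(n: u32) -> u32 {
    1 << (4 - n)
}

/// Extract the TO field from a raw instruction word
/// (MSB-numbered bits 6..=10).
pub const fn to_field(insn: u32) -> u32 {
    (insn >> 21) & 0x1F
}

/// Word-width trap test, written directly from the bit table:
/// TO[0] signed <, TO[1] signed >, TO[2] ==, TO[3] unsigned <, TO[4] unsigned >.
pub fn traps_word(to: u32, a: u64, b: u64) -> bool {
    let (sa, sb) = (a as i32, b as i32); // low 32 bits, signed view
    let (ua, ub) = (a as u32, b as u32); // low 32 bits, unsigned view
    ((to & to_bit(0)) != 0 && sa < sb)
        || ((to & to_bit(1)) != 0 && sa > sb)
        || ((to & to_bit(2)) != 0 && ua == ub)
        || ((to & to_bit(3)) != 0 && ua < ub)
        || ((to & to_bit(4)) != 0 && ua > ub)
}
```

With this numbering, "less than or equal, signed" is `to_bit(0) | to_bit(2)` = 20, and the all-ones pattern 31 traps for every operand pair because at least one of the five conditions always holds.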

944
crates/xenia-cpu/src/vmx.rs Normal file

@@ -0,0 +1,944 @@
//! VMX / AltiVec helper routines shared by the interpreter's 150+ vector
//! opcode handlers.
//!
//! Big-endian lane indexing throughout: `Vec128::bytes[0]` is the most
//! significant byte, which corresponds to PowerPC lane 0. Operations that
//! care about "even" vs "odd" lanes follow the PPC convention (lane 0 = most
//! significant = "even" for multiply-even/odd purposes).
use xenia_memory::MemoryAccess;
use xenia_types::Vec128;
// ─── Lane accessors ────────────────────────────────────────────────────────
#[inline] pub fn as_i8x16(v: Vec128) -> [i8; 16] {
let b = v.as_bytes();
let mut r = [0i8; 16];
for i in 0..16 { r[i] = b[i] as i8; }
r
}
#[inline] pub fn as_i16x8(v: Vec128) -> [i16; 8] {
let u = v.as_u16x8();
[u[0] as i16, u[1] as i16, u[2] as i16, u[3] as i16,
u[4] as i16, u[5] as i16, u[6] as i16, u[7] as i16]
}
#[inline] pub fn as_i32x4(v: Vec128) -> [i32; 4] {
let u = v.as_u32x4();
[u[0] as i32, u[1] as i32, u[2] as i32, u[3] as i32]
}
#[inline] pub fn from_i8x16(r: [i8; 16]) -> Vec128 {
let mut b = [0u8; 16];
for i in 0..16 { b[i] = r[i] as u8; }
Vec128::from_bytes(b)
}
#[inline] pub fn from_i16x8(r: [i16; 8]) -> Vec128 {
Vec128::from_u16x8_array([
r[0] as u16, r[1] as u16, r[2] as u16, r[3] as u16,
r[4] as u16, r[5] as u16, r[6] as u16, r[7] as u16,
])
}
#[inline] pub fn from_i32x4(r: [i32; 4]) -> Vec128 {
Vec128::from_u32x4_array([r[0] as u32, r[1] as u32, r[2] as u32, r[3] as u32])
}
// ─── Saturation helpers ────────────────────────────────────────────────────
// Each returns (clamped_value, saturated_flag). Handlers OR the flags together
// and call `ctx.set_vscr_sat(true)` once per instruction.
#[inline] pub fn sat_add_u8(a: u8, b: u8) -> (u8, bool) {
let s = a as u16 + b as u16;
if s > u8::MAX as u16 { (u8::MAX, true) } else { (s as u8, false) }
}
#[inline] pub fn sat_sub_u8(a: u8, b: u8) -> (u8, bool) {
if a >= b { (a - b, false) } else { (0, true) }
}
#[inline] pub fn sat_add_i8(a: i8, b: i8) -> (i8, bool) {
let s = a as i16 + b as i16;
if s > i8::MAX as i16 { (i8::MAX, true) }
else if s < i8::MIN as i16 { (i8::MIN, true) }
else { (s as i8, false) }
}
#[inline] pub fn sat_sub_i8(a: i8, b: i8) -> (i8, bool) {
let s = a as i16 - b as i16;
if s > i8::MAX as i16 { (i8::MAX, true) }
else if s < i8::MIN as i16 { (i8::MIN, true) }
else { (s as i8, false) }
}
#[inline] pub fn sat_add_u16(a: u16, b: u16) -> (u16, bool) {
let s = a as u32 + b as u32;
if s > u16::MAX as u32 { (u16::MAX, true) } else { (s as u16, false) }
}
#[inline] pub fn sat_sub_u16(a: u16, b: u16) -> (u16, bool) {
if a >= b { (a - b, false) } else { (0, true) }
}
#[inline] pub fn sat_add_i16(a: i16, b: i16) -> (i16, bool) {
let s = a as i32 + b as i32;
if s > i16::MAX as i32 { (i16::MAX, true) }
else if s < i16::MIN as i32 { (i16::MIN, true) }
else { (s as i16, false) }
}
#[inline] pub fn sat_sub_i16(a: i16, b: i16) -> (i16, bool) {
let s = a as i32 - b as i32;
if s > i16::MAX as i32 { (i16::MAX, true) }
else if s < i16::MIN as i32 { (i16::MIN, true) }
else { (s as i16, false) }
}
#[inline] pub fn sat_add_u32(a: u32, b: u32) -> (u32, bool) {
let s = a as u64 + b as u64;
if s > u32::MAX as u64 { (u32::MAX, true) } else { (s as u32, false) }
}
#[inline] pub fn sat_sub_u32(a: u32, b: u32) -> (u32, bool) {
if a >= b { (a - b, false) } else { (0, true) }
}
#[inline] pub fn sat_add_i32(a: i32, b: i32) -> (i32, bool) {
let s = a as i64 + b as i64;
if s > i32::MAX as i64 { (i32::MAX, true) }
else if s < i32::MIN as i64 { (i32::MIN, true) }
else { (s as i32, false) }
}
#[inline] pub fn sat_sub_i32(a: i32, b: i32) -> (i32, bool) {
let s = a as i64 - b as i64;
if s > i32::MAX as i64 { (i32::MAX, true) }
else if s < i32::MIN as i64 { (i32::MIN, true) }
else { (s as i32, false) }
}
// Pack-with-saturation helpers — clamp a wider integer to the narrower type.
#[inline] pub fn sat_i16_to_i8(v: i16) -> (i8, bool) {
if v > i8::MAX as i16 { (i8::MAX, true) }
else if v < i8::MIN as i16 { (i8::MIN, true) }
else { (v as i8, false) }
}
#[inline] pub fn sat_i16_to_u8(v: i16) -> (u8, bool) {
if v < 0 { (0, true) }
else if v > u8::MAX as i16 { (u8::MAX, true) }
else { (v as u8, false) }
}
#[inline] pub fn sat_u16_to_u8(v: u16) -> (u8, bool) {
if v > u8::MAX as u16 { (u8::MAX, true) } else { (v as u8, false) }
}
#[inline] pub fn sat_i32_to_i16(v: i32) -> (i16, bool) {
if v > i16::MAX as i32 { (i16::MAX, true) }
else if v < i16::MIN as i32 { (i16::MIN, true) }
else { (v as i16, false) }
}
#[inline] pub fn sat_i32_to_u16(v: i32) -> (u16, bool) {
if v < 0 { (0, true) }
else if v > u16::MAX as i32 { (u16::MAX, true) }
else { (v as u16, false) }
}
#[inline] pub fn sat_u32_to_u16(v: u32) -> (u16, bool) {
if v > u16::MAX as u32 { (u16::MAX, true) } else { (v as u16, false) }
}
#[inline] pub fn sat_i64_to_i32(v: i64) -> (i32, bool) {
if v > i32::MAX as i64 { (i32::MAX, true) }
else if v < i32::MIN as i64 { (i32::MIN, true) }
else { (v as i32, false) }
}
#[inline] pub fn sat_i64_to_u32(v: i64) -> (u32, bool) {
if v < 0 { (0, true) }
else if v > u32::MAX as i64 { (u32::MAX, true) }
else { (v as u32, false) }
}
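Review note: the "(clamped_value, saturated_flag)" contract is meant to be folded across lanes. The sketch below shows one way a handler might do that for a `vaddubs`-style byte add. `add_u8x16_sat` is a hypothetical name, plain `[u8; 16]` arrays stand in for `Vec128`, and the local `sat_add_u8` is rewritten with `checked_add` only to keep the example self-contained.

```rust
/// Saturating unsigned byte add with the same contract as the helpers
/// above: (clamped value, did-saturate flag).
fn sat_add_u8(a: u8, b: u8) -> (u8, bool) {
    match a.checked_add(b) {
        Some(sum) => (sum, false),
        None => (u8::MAX, true),
    }
}

/// Hypothetical handler body: add 16 byte lanes and OR the per-lane
/// saturation flags into one sticky bit. A real handler would pass that
/// bit to the VSCR[SAT] setter once, after the loop.
pub fn add_u8x16_sat(a: [u8; 16], b: [u8; 16]) -> ([u8; 16], bool) {
    let mut out = [0u8; 16];
    let mut any_sat = false;
    for i in 0..16 {
        let (value, sat) = sat_add_u8(a[i], b[i]);
        out[i] = value;
        any_sat |= sat;
    }
    (out, any_sat)
}
```

Accumulating the flag locally and writing it once keeps the per-lane loop free of context accesses, which is the pattern the section comment describes.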
// ─── Averages ──────────────────────────────────────────────────────────────
// PPC avg is rounded up: (a + b + 1) / 2.
#[inline] pub fn avg_u8(a: u8, b: u8) -> u8 {
((a as u16 + b as u16 + 1) >> 1) as u8
}
#[inline] pub fn avg_u16(a: u16, b: u16) -> u16 {
((a as u32 + b as u32 + 1) >> 1) as u16
}
#[inline] pub fn avg_u32(a: u32, b: u32) -> u32 {
((a as u64 + b as u64 + 1) >> 1) as u32
}
#[inline] pub fn avg_i8(a: i8, b: i8) -> i8 {
((a as i32 + b as i32 + 1) >> 1) as i8
}
#[inline] pub fn avg_i16(a: i16, b: i16) -> i16 {
((a as i32 + b as i32 + 1) >> 1) as i16
}
#[inline] pub fn avg_i32(a: i32, b: i32) -> i32 {
((a as i64 + b as i64 + 1) >> 1) as i32
}
// ─── NaN-aware f32 min/max for vmaxfp / vminfp ────────────────────────────
//
// Altivec PEM: "If either element of vA or vB is a NaN, the corresponding
// element of vD is set to the quiet NaN form of that NaN". Rust's `>` / `<`
// comparison with NaN always returns false, so `if a > b { a } else { b }`
// would silently pick `b` whenever `a` is NaN — losing NaN propagation.
#[inline]
pub fn max_nan(a: f32, b: f32) -> f32 {
if a.is_nan() { quiet_nan(a) }
else if b.is_nan() { quiet_nan(b) }
else if a > b { a } else { b }
}
#[inline]
pub fn min_nan(a: f32, b: f32) -> f32 {
if a.is_nan() { quiet_nan(a) }
else if b.is_nan() { quiet_nan(b) }
else if a < b { a } else { b }
}
/// Convert an SNaN to QNaN by setting the high mantissa bit. A QNaN is
/// returned unchanged.
#[inline]
pub fn quiet_nan(x: f32) -> f32 {
if !x.is_nan() { return x; }
f32::from_bits(x.to_bits() | 0x0040_0000)
}
/// Flush a subnormal f32 to ±0 (preserving the sign). Used by vmaddfp family,
/// vctsxs / vctuxs, and any instruction whose AltiVec definition specifies
/// input-side denormal flushing regardless of VSCR[NJ].
#[inline]
pub fn flush_denorm(x: f32) -> f32 {
if x.is_subnormal() {
if x.is_sign_negative() { -0.0 } else { 0.0 }
} else {
x
}
}
// ─── Float ⇄ fixed-point conversions (scaled by 2^scale_bits) ─────────────
//
// vctsxs / vctuxs flush denormal inputs to 0 before scaling, per Altivec.
#[inline] pub fn cvt_f32_to_i32_sat(x: f32, scale_bits: u32) -> (i32, bool) {
// PPCBUG-433: AltiVec ISA saturates NaN to INT_MIN (0x80000000), not 0.
// (vctuxs's NaN→0 is correct per AltiVec ISA — see PPCBUG-434.)
if x.is_nan() { return (i32::MIN, true); }
let x = flush_denorm(x);
let scaled = (x as f64) * ((1u64 << scale_bits) as f64);
if scaled >= i32::MAX as f64 { return (i32::MAX, true); }
// -2^31 itself is representable, so only values strictly below it saturate.
if scaled < i32::MIN as f64 { return (i32::MIN, true); }
(scaled.trunc() as i32, false)
}
#[inline] pub fn cvt_f32_to_u32_sat(x: f32, scale_bits: u32) -> (u32, bool) {
if x.is_nan() { return (0, true); }
let x = flush_denorm(x);
let scaled = (x as f64) * ((1u64 << scale_bits) as f64);
if scaled < 0.0 { return (0, true); }
if scaled > u32::MAX as f64 { return (u32::MAX, true); }
(scaled.trunc() as u32, false)
}
#[inline] pub fn cvt_i32_to_f32(v: i32, scale_bits: u32) -> f32 {
(v as f64 / (1u64 << scale_bits) as f64) as f32
}
#[inline] pub fn cvt_u32_to_f32(v: u32, scale_bits: u32) -> f32 {
(v as f64 / (1u64 << scale_bits) as f64) as f32
}
// ─── Unaligned vector load/store ──────────────────────────────────────────
//
// lvlx/lvrx and stvlx/stvrx combine to perform any unaligned 16-byte access:
// lvlx(EA) | lvrx(EA + 16) loads 16 bytes starting at unaligned EA.
// stvlx(EA); stvrx(EA + 16) stores 16 bytes starting at unaligned EA.
//
// Semantics per the AltiVec manual (and xenia-canary ppc_emit_memory.cc):
// lvlx: shift = EA & 0xF, n = 16 - shift. Loads mem[EA..EA+n] into
// lanes VR[0..n], zeros VR[n..16].
// lvrx: shift = EA & 0xF. If shift == 0, VR = 0. Otherwise loads
// mem[EA-shift..EA] into lanes VR[16-shift..16], zeros VR[0..16-shift].
// stvlx / stvrx are the symmetric stores.
//
// `Vec128::bytes[0]` is the most significant byte (PPC lane 0 in BE view).
pub fn load_vector_left(mem: &dyn MemoryAccess, ea: u32) -> Vec128 {
let shift = (ea & 0xF) as usize;
let n = 16 - shift;
let mut bytes = [0u8; 16];
for i in 0..n {
bytes[i] = mem.read_u8(ea.wrapping_add(i as u32));
}
Vec128::from_bytes(bytes)
}
pub fn load_vector_right(mem: &dyn MemoryAccess, ea: u32) -> Vec128 {
let shift = (ea & 0xF) as usize;
if shift == 0 { return Vec128::ZERO; }
let base = ea & !0xFu32;
let mut bytes = [0u8; 16];
for i in 0..shift {
bytes[16 - shift + i] = mem.read_u8(base.wrapping_add(i as u32));
}
Vec128::from_bytes(bytes)
}
pub fn store_vector_left(mem: &dyn MemoryAccess, ea: u32, v: Vec128) {
let shift = (ea & 0xF) as usize;
let n = 16 - shift;
let b = v.as_bytes();
for i in 0..n {
mem.write_u8(ea.wrapping_add(i as u32), b[i]);
}
}
pub fn store_vector_right(mem: &dyn MemoryAccess, ea: u32, v: Vec128) {
let shift = (ea & 0xF) as usize;
if shift == 0 { return; }
let base = ea & !0xFu32;
let b = v.as_bytes();
for i in 0..shift {
mem.write_u8(base.wrapping_add(i as u32), b[16 - shift + i]);
}
}
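Review note: the `lvlx(EA) | lvrx(EA + 16)` identity from the section comment can be checked with a few lines. In this sketch a flat byte slice stands in for guest memory, and `lvlx`, `lvrx` and `load_unaligned` are local illustrative functions with `usize` addresses, not the crate's `MemoryAccess`-based API.

```rust
/// Left part: bytes from `ea` up to the end of its 16-byte block,
/// placed in lanes 0..n; remaining lanes are zero.
fn lvlx(mem: &[u8], ea: usize) -> [u8; 16] {
    let shift = ea & 0xF;
    let mut v = [0u8; 16];
    for i in 0..16 - shift {
        v[i] = mem[ea + i];
    }
    v
}

/// Right part: bytes from the start of `ea`'s block up to `ea`,
/// placed in the last `shift` lanes; all zero when `ea` is aligned.
fn lvrx(mem: &[u8], ea: usize) -> [u8; 16] {
    let shift = ea & 0xF;
    let base = ea & !0xF;
    let mut v = [0u8; 16];
    for i in 0..shift {
        v[16 - shift + i] = mem[base + i];
    }
    v
}

/// Unaligned 16-byte load built from the two halves.
pub fn load_unaligned(mem: &[u8], ea: usize) -> [u8; 16] {
    let left = lvlx(mem, ea);
    let right = lvrx(mem, ea + 16);
    let mut v = [0u8; 16];
    for i in 0..16 {
        v[i] = left[i] | right[i]; // the halves never overlap
    }
    v
}
```

For `ea = 5`, the left half supplies memory bytes 5..16 in lanes 0..11 and the right half supplies bytes 16..21 in lanes 11..16, so the OR yields bytes 5..21 in order.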
// ─── pixel pack (vpkpx / vupkhpx / vupklpx) ───────────────────────────────
// PPC vpkpx packs each 32-bit lane into a 16-bit 1-5-5-5 pixel.
// Mapping transcribed EXACTLY from xenia-canary
// `ppc_emit_altivec.cc::vkpkx_in_low` (lines 1795-1808):
// tmp1 = (input >> 9) & 0xFC00 // out bits 15:10 = in bits 24:19
// tmp2 = (input >> 6) & 0x3E0 // out bits 9:5 = in bits 15:11
// tmp3 = (input >> 3) & 0x1F // out bits 4:0 = in bits 7:3
// result = tmp1 | tmp2 | tmp3
// This is a pure shift/mask: there is NO standalone alpha select. Output
// bit 15 is simply input bit 24 (the top of the 6-bit field masked by
// 0xFC00) — NOT input bit 7. The red field is 6 bits wide here.
#[inline] pub fn pack_pixel_555(input: u32) -> u16 {
let tmp1 = (input >> 9) & 0xFC00;
let tmp2 = (input >> 6) & 0x3E0;
let tmp3 = (input >> 3) & 0x1F;
(tmp1 | tmp2 | tmp3) as u16
}
#[inline] pub fn unpack_pixel_555(input: u16) -> u32 {
let input = input as u32;
let a = (input >> 15) & 0x1;
let r = (input >> 10) & 0x1F;
let g = (input >> 5) & 0x1F;
let b = input & 0x1F;
// Expand the 1-bit A to 0x00/0xFF, and widen each 5-bit channel to 8 bits
// by replicating its top 3 bits into the low 3 bits.
let a8 = if a != 0 { 0xFFu32 } else { 0 };
let r8 = (r << 3) | (r >> 2);
let g8 = (g << 3) | (g >> 2);
let b8 = (b << 3) | (b >> 2);
(a8 << 24) | (r8 << 16) | (g8 << 8) | b8
}
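Review note: a worked example makes the shift/mask mapping of the pack direction concrete. The function below repeats the three-mask formula under a local name (`pack_pixel`, chosen for this sketch) so the values can be checked in isolation.

```rust
/// Same shift/mask formula as the pack helper above, repeated here so
/// the example is self-contained.
pub fn pack_pixel(input: u32) -> u16 {
    let hi = (input >> 9) & 0xFC00; // out bits 15:10 <- in bits 24:19
    let mid = (input >> 6) & 0x3E0; // out bits 9:5   <- in bits 15:11
    let lo = (input >> 3) & 0x1F; //   out bits 4:0   <- in bits 7:3
    (hi | mid | lo) as u16
}
```

For the input `0x00FF8040` (bytes FF, 80, 40) the three fields are 31, 16 and 8, giving `0x7E08`. Setting input bit 24 (`0x01FF8040`) turns on output bit 15 and gives `0xFE08`, whereas input bit 7 alone (`0x00000080`) only lands in the low field as `0x0010`.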
// ─── VMX128 D3D pack/unpack dispatch ──────────────────────────────────────
// `vpkd3d128` / `vupkd3d128` encode a small enum in the instruction word
// (VX128_4 immediate field). The exact enum lives in canary's
// ppc_emit_altivec.cc under PACK_TYPE_*; titles usually touch D3DCOLOR
// (type 0) and a handful of texture-coordinate variants.
//
// Rather than risk getting a rarely-used sub-case wrong, we implement the
// common types and fall back to a warning + pass-through for unknown types.
// Returning the VB register value unchanged is always preferable to emitting
// StepResult::Unimplemented because it keeps the interpreter running.
/// Pack-type encoding of `vpkd3d128` / `vupkd3d128`.
///
/// The immediate field lives at PPC bits 16-22 (VX128_3/4 IMM, 7 bits).
/// Canary decodes `type = IMM >> 2` (top 5 bits) and `pack = IMM & 0x3`
/// (low 2 bits, used only by `vpkd3d128` to select output-slot layout).
/// Valid `type` values are 0..=6 per `ppc_emit_altivec.cc:2095-2118`:
///
/// | id | canary name        | format                                              |
/// |----|--------------------|-----------------------------------------------------|
/// | 0  | VPACK_D3DCOLOR     | 4 f32 [0,1] ↔ ARGB8                                 |
/// | 1  | VPACK_NORMSHORT2   | 2 f32 [-1,1] ↔ 2× signed-normalized i16             |
/// | 2  | VPACK_NORMPACKED32 | 4 f32 [-1,1] ↔ UINT_2101010 (w:2,z:10,y:10,x:10)    |
/// | 3  | VPACK_FLOAT16_2    | 2 f32 ↔ 2× fp16                                     |
/// | 4  | VPACK_NORMSHORT4   | 4 f32 [-1,1] ↔ 4× signed-normalized i16             |
/// | 5  | VPACK_FLOAT16_4    | 4 f32 ↔ 4× fp16                                     |
/// | 6  | VPACK_NORMPACKED64 | 4 f32 [-1,1] ↔ ULONG_4202020 (w:4,z:20,y:20,x:20)   |
///
/// Before M3, this enum listed made-up "Normal16"/"Normal8"/"UByteN4"
/// variants that didn't match canary; the immediate extraction was also
/// wrong (LSB-numbered `>>6 & 0x7` instead of MSB-numbered `>>11 & 0x1F`
/// against a 7-bit IMM field). M3 fixes both.
#[derive(Debug, Clone, Copy)]
pub enum D3dPackType {
D3dColor,
NormShort2,
NormPacked32,
Float16_2,
NormShort4,
Float16_4,
NormPacked64,
Other(u32),
}
impl D3dPackType {
/// Decode the `type` bits extracted from the VX128_3/4 IMM field via
/// canary's `IMM >> 2` convention (i.e. the caller has already divided
/// out the 2-bit `pack` subfield).
pub fn from_immediate(type_bits: u32) -> Self {
match type_bits {
0 => Self::D3dColor,
1 => Self::NormShort2,
2 => Self::NormPacked32,
3 => Self::Float16_2,
4 => Self::NormShort4,
5 => Self::Float16_4,
6 => Self::NormPacked64,
other => Self::Other(other),
}
}
}
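Review note: the `type = IMM >> 2`, `pack = IMM & 0x3` split described in the doc comment can be sketched as follows. `split_imm` and `type_name` are hypothetical helpers for this example; the name strings are copied from the table above, and how the 7-bit IMM is extracted from the instruction word is left to the caller.

```rust
/// Split a 7-bit VX128 IMM value into (type, pack): the top 5 bits
/// select the pack type, the low 2 bits select the output-slot layout.
pub fn split_imm(imm: u32) -> (u32, u32) {
    let imm = imm & 0x7F; // keep only the 7-bit field
    (imm >> 2, imm & 0x3)
}

/// Canary's name for a pack type, or `None` for values outside 0..=6.
pub fn type_name(type_bits: u32) -> Option<&'static str> {
    const NAMES: [&str; 7] = [
        "VPACK_D3DCOLOR",
        "VPACK_NORMSHORT2",
        "VPACK_NORMPACKED32",
        "VPACK_FLOAT16_2",
        "VPACK_NORMSHORT4",
        "VPACK_FLOAT16_4",
        "VPACK_NORMPACKED64",
    ];
    NAMES.get(type_bits as usize).copied()
}
```

For example, IMM `0b0010110` splits into type 5 (`VPACK_FLOAT16_4`) and pack 2; a type of 7 or above has no name, which corresponds to the `Other` fallback in the enum.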
/// Pack an f32x4 vector of [R, G, B, A] in [0.0, 1.0] into a single D3DCOLOR
/// value in lane 3 of the output.
pub fn pack_d3dcolor(v: Vec128) -> Vec128 {
let f = v.as_f32x4();
let to_byte = |x: f32| -> u32 {
let c = x.clamp(0.0, 1.0) * 255.0;
(c + 0.5) as u32 & 0xFF
};
// D3DCOLOR is A,R,G,B in that byte order inside a u32.
let word = (to_byte(f[3]) << 24) | (to_byte(f[0]) << 16) | (to_byte(f[1]) << 8) | to_byte(f[2]);
Vec128::from_u32x4(0, 0, 0, word)
}
/// Unpack a D3DCOLOR value (in lane 3 of the input) into an f32x4 [R, G, B, A].
pub fn unpack_d3dcolor(v: Vec128) -> Vec128 {
let word = v.u32x4(3);
let a = ((word >> 24) & 0xFF) as f32 / 255.0;
let r = ((word >> 16) & 0xFF) as f32 / 255.0;
let g = ((word >> 8) & 0xFF) as f32 / 255.0;
let b = (word & 0xFF) as f32 / 255.0;
Vec128::from_f32x4(r, g, b, a)
}
// ───────────────────────────────────────────────────────────────────────
// First-Pixels M3 — pack/unpack for the remaining canary pack types.
//
// Conventions shared across all helpers:
// * Input to `unpack_*` (packed data) lives in the *source* lane position
// canary's HIR assumes: canonically the 32-bit word is in lane 3 and
// the 64-bit value straddles lanes 2-3. We match that so the existing
// D3DCOLOR helpers' lane-3 convention is preserved across the whole
// pack-type family.
// * Output from `pack_*` sits in the same lane(s). The caller usually
// follows with a permute to move it elsewhere (the VX128_4 `pack`
// subfield controls that in `vpkd3d128`).
// * Range semantics match canary: signed-normalized N-bit types use
// `max` = (1 << (N-1)) - 1 and clamp before rounding.
// ───────────────────────────────────────────────────────────────────────
#[inline]
fn norm_to_i16(x: f32) -> i16 {
let c = x.clamp(-1.0, 1.0) * 32767.0;
// Round half away from zero, matching canary's `vcfsx` semantics.
let r = if c >= 0.0 { (c + 0.5) as i32 } else { (c - 0.5) as i32 };
r.clamp(-32768, 32767) as i16
}
#[inline]
fn i16_to_norm(s: i16) -> f32 {
(s as f32) / 32767.0
}
/// **NORMSHORT2** — 2 f32s in [-1, 1] → two 16-bit signed-normalized
/// shorts packed as `(x << 16) | y` in lane 3 (high 16 bits of the word
/// hold X; low 16 hold Y). Output lanes 0..=2 are zero-filled.
pub fn pack_normshort2(v: Vec128) -> Vec128 {
let f = v.as_f32x4();
let x = norm_to_i16(f[0]) as u16 as u32;
let y = norm_to_i16(f[1]) as u16 as u32;
Vec128::from_u32x4(0, 0, 0, (x << 16) | y)
}
pub fn unpack_normshort2(v: Vec128) -> Vec128 {
let word = v.u32x4(3);
let x = i16_to_norm((word >> 16) as i16);
let y = i16_to_norm(word as i16);
Vec128::from_f32x4(x, y, 0.0, 1.0)
}
/// **NORMSHORT4** — 4 f32s in [-1, 1] → four 16-bit signed-normalized
/// shorts packed across lanes 2-3 (big-endian dword order: X in the
/// high word of lane 2, Y low of lane 2, Z high of lane 3, W low of lane
/// 3).
pub fn pack_normshort4(v: Vec128) -> Vec128 {
let f = v.as_f32x4();
let x = norm_to_i16(f[0]) as u16 as u32;
let y = norm_to_i16(f[1]) as u16 as u32;
let z = norm_to_i16(f[2]) as u16 as u32;
let w = norm_to_i16(f[3]) as u16 as u32;
Vec128::from_u32x4(0, 0, (x << 16) | y, (z << 16) | w)
}
pub fn unpack_normshort4(v: Vec128) -> Vec128 {
let hi = v.u32x4(2);
let lo = v.u32x4(3);
let x = i16_to_norm((hi >> 16) as i16);
let y = i16_to_norm(hi as i16);
let z = i16_to_norm((lo >> 16) as i16);
let w = i16_to_norm(lo as i16);
Vec128::from_f32x4(x, y, z, w)
}
/// **NORMPACKED32** — UINT_2101010 layout, 4 f32s in [-1, 1] packed into
/// 32 bits in lane 3. Per canary's comment `2_10_10_10 w_z_y_x`: the
/// high 2 bits hold W (signed 2-bit, -2..=1), then Z/Y/X each use 10
/// signed-normalized bits.
pub fn pack_normpacked32(v: Vec128) -> Vec128 {
let f = v.as_f32x4();
#[inline]
fn n10(x: f32) -> u32 {
let c = x.clamp(-1.0, 1.0) * 511.0;
let r = if c >= 0.0 { (c + 0.5) as i32 } else { (c - 0.5) as i32 };
(r.clamp(-512, 511) as i32 as u32) & 0x3FF
}
#[inline]
fn n2(x: f32) -> u32 {
let c = x.clamp(-1.0, 1.0) * 1.0;
let r = if c >= 0.0 { (c + 0.5) as i32 } else { (c - 0.5) as i32 };
(r.clamp(-2, 1) as i32 as u32) & 0x3
}
let x = n10(f[0]);
let y = n10(f[1]);
let z = n10(f[2]);
let w = n2(f[3]);
let word = (w << 30) | (z << 20) | (y << 10) | x;
Vec128::from_u32x4(0, 0, 0, word)
}
pub fn unpack_normpacked32(v: Vec128) -> Vec128 {
let word = v.u32x4(3);
#[inline]
fn u10_to_norm(bits: u32) -> f32 {
// Sign-extend the 10-bit field then normalize.
let s = ((bits & 0x3FF) as i32) << 22 >> 22;
(s as f32) / 511.0
}
#[inline]
fn u2_to_norm(bits: u32) -> f32 {
let s = ((bits & 0x3) as i32) << 30 >> 30;
(s as f32).clamp(-1.0, 1.0)
}
let x = u10_to_norm(word);
let y = u10_to_norm(word >> 10);
let z = u10_to_norm(word >> 20);
let w = u2_to_norm(word >> 30);
Vec128::from_f32x4(x, y, z, w)
}
/// **NORMPACKED64** — ULONG_4202020, 4 f32s in [-1, 1] packed into 64
/// bits across lanes 2-3. Per canary's comment `4_20_20_20 w_z_y_x`:
/// the high 4 bits of the dword hold W (signed 4-bit); the remaining 60
/// bits hold 3× 20-bit signed-normalized Z/Y/X. Used by very few
/// titles (canary notes 54540829).
pub fn pack_normpacked64(v: Vec128) -> Vec128 {
let f = v.as_f32x4();
#[inline]
fn n20(x: f32) -> u64 {
let c = x.clamp(-1.0, 1.0) * 524287.0; // 2^19 - 1
let r = if c >= 0.0 { (c + 0.5) as i64 } else { (c - 0.5) as i64 };
(r.clamp(-524288, 524287) as i64 as u64) & 0xF_FFFF
}
#[inline]
fn n4(x: f32) -> u64 {
let c = x.clamp(-1.0, 1.0) * 7.0;
let r = if c >= 0.0 { (c + 0.5) as i64 } else { (c - 0.5) as i64 };
(r.clamp(-8, 7) as i64 as u64) & 0xF
}
let x = n20(f[0]);
let y = n20(f[1]);
let z = n20(f[2]);
let w = n4(f[3]);
let dw: u64 = (w << 60) | (z << 40) | (y << 20) | x;
Vec128::from_u32x4(0, 0, (dw >> 32) as u32, dw as u32)
}
pub fn unpack_normpacked64(v: Vec128) -> Vec128 {
let hi = v.u32x4(2) as u64;
let lo = v.u32x4(3) as u64;
let dw = (hi << 32) | lo;
#[inline]
fn u20_to_norm(bits: u64) -> f32 {
let s = ((bits & 0xF_FFFF) as i64) << 44 >> 44;
(s as f32) / 524287.0
}
#[inline]
fn u4_to_norm(bits: u64) -> f32 {
let s = ((bits & 0xF) as i64) << 60 >> 60;
(s as f32) / 7.0
}
let x = u20_to_norm(dw);
let y = u20_to_norm(dw >> 20);
let z = u20_to_norm(dw >> 40);
let w = u4_to_norm(dw >> 60);
Vec128::from_f32x4(x, y, z, w)
}
/// IEEE 754 half-precision float pack/unpack — used by both FLOAT16_2
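/// and FLOAT16_4. No FMA quirks involved; we go via `f32::to_bits` and
/// manual bit-twiddling (the stable-Rust `f16` type isn't available
/// yet). The f32 → f16 direction truncates the mantissa (rounds toward
/// zero) instead of rounding to nearest.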
#[inline]
fn f32_to_f16_bits(f: f32) -> u16 {
let bits = f.to_bits();
let sign = ((bits >> 31) & 0x1) as u16;
let exp = ((bits >> 23) & 0xFF) as i32;
let mant = bits & 0x7FFFFF;
// Handle the easy cases first.
if exp == 0xFF {
// NaN or infinity.
let half_exp = 0x1F;
let half_mant = if mant != 0 { 0x200 } else { 0 }; // quiet NaN / zero mantissa for Inf
return (sign << 15) | (half_exp << 10) | half_mant;
}
let unbiased_exp = exp - 127;
if unbiased_exp >= 16 {
// Overflow → infinity.
return (sign << 15) | (0x1F << 10);
}
if unbiased_exp <= -15 {
// Denormal or zero. Compute the shift and subnormal mantissa;
// anything too small flushes to signed zero.
if unbiased_exp < -24 {
return sign << 15;
}
let shift = -14 - unbiased_exp as i32; // amount to shift the implicit-1'd mantissa
let full_mant = 0x800000 | mant; // 24 bits with implicit leading 1
let half_mant = (full_mant >> (shift + 13)) as u16;
return (sign << 15) | half_mant;
}
let half_exp = ((unbiased_exp + 15) as u16) & 0x1F;
let half_mant = (mant >> 13) as u16;
(sign << 15) | (half_exp << 10) | half_mant
}
#[inline]
fn f16_bits_to_f32(h: u16) -> f32 {
let sign = ((h >> 15) & 0x1) as u32;
let exp = ((h >> 10) & 0x1F) as i32;
let mant = (h & 0x3FF) as u32;
let bits = if exp == 0x1F {
// NaN or infinity.
let f32_exp = 0xFFu32;
let f32_mant = if mant != 0 { 0x400000 } else { 0 };
(sign << 31) | (f32_exp << 23) | f32_mant
} else if exp == 0 && mant == 0 {
// Signed zero.
sign << 31
} else if exp == 0 {
// Subnormal — renormalize.
let mut e = -14i32;
let mut m = mant;
while (m & 0x400) == 0 {
m <<= 1;
e -= 1;
}
let f32_exp = ((e + 127) as u32) & 0xFF;
let f32_mant = (m & 0x3FF) << 13;
(sign << 31) | (f32_exp << 23) | f32_mant
} else {
let f32_exp = ((exp - 15 + 127) as u32) & 0xFF;
let f32_mant = mant << 13;
(sign << 31) | (f32_exp << 23) | f32_mant
};
f32::from_bits(bits)
}
/// **FLOAT16_2** — two 32-bit floats → two half-floats packed into one
/// 32-bit word (X in high 16 bits of lane 3, Y in low 16).
pub fn pack_float16_2(v: Vec128) -> Vec128 {
let f = v.as_f32x4();
let x = f32_to_f16_bits(f[0]) as u32;
let y = f32_to_f16_bits(f[1]) as u32;
Vec128::from_u32x4(0, 0, 0, (x << 16) | y)
}
pub fn unpack_float16_2(v: Vec128) -> Vec128 {
let word = v.u32x4(3);
let x = f16_bits_to_f32((word >> 16) as u16);
let y = f16_bits_to_f32(word as u16);
Vec128::from_f32x4(x, y, 0.0, 1.0)
}
/// **FLOAT16_4** — four 32-bit floats → four half-floats packed across
/// 64 bits (lanes 2-3).
pub fn pack_float16_4(v: Vec128) -> Vec128 {
let f = v.as_f32x4();
let x = f32_to_f16_bits(f[0]) as u32;
let y = f32_to_f16_bits(f[1]) as u32;
let z = f32_to_f16_bits(f[2]) as u32;
let w = f32_to_f16_bits(f[3]) as u32;
Vec128::from_u32x4(0, 0, (x << 16) | y, (z << 16) | w)
}
pub fn unpack_float16_4(v: Vec128) -> Vec128 {
let hi = v.u32x4(2);
let lo = v.u32x4(3);
let x = f16_bits_to_f32((hi >> 16) as u16);
let y = f16_bits_to_f32(hi as u16);
let z = f16_bits_to_f32((lo >> 16) as u16);
let w = f16_bits_to_f32(lo as u16);
Vec128::from_f32x4(x, y, z, w)
}
// ─── CR6 helpers used by integer compares ─────────────────────────────────
// vcmp*. (record-form) updates CR6 in a compressed form:
// CR6 = {all-true, 0, all-false, 0}
// where each bit reflects the per-lane mask across the whole register.
#[inline] pub fn cr6_flags_from_mask(mask: Vec128) -> (bool, bool) {
let b = mask.as_bytes();
let mut any_set = false;
let mut any_clear = false;
for &byte in b.iter() {
if byte != 0 { any_set = true; }
if byte != 0xFF { any_clear = true; }
}
let all_true = !any_clear;
let all_false = !any_set;
(all_true, all_false)
}
#[cfg(test)]
mod tests {
use super::*;
use std::cell::Cell;
struct TestMem { data: Box<[Cell<u8>]> }
impl TestMem {
fn new(size: usize) -> Self {
Self { data: (0..size).map(|_| Cell::new(0)).collect() }
}
}
impl MemoryAccess for TestMem {
fn read_u8(&self, a: u32) -> u8 { self.data[a as usize].get() }
fn read_u16(&self, a: u32) -> u16 {
u16::from_be_bytes([self.data[a as usize].get(), self.data[a as usize + 1].get()])
}
fn read_u32(&self, a: u32) -> u32 {
let a = a as usize;
u32::from_be_bytes([
self.data[a].get(), self.data[a+1].get(),
self.data[a+2].get(), self.data[a+3].get(),
])
}
fn read_u64(&self, a: u32) -> u64 {
let a = a as usize;
u64::from_be_bytes([
self.data[a].get(), self.data[a+1].get(),
self.data[a+2].get(), self.data[a+3].get(),
self.data[a+4].get(), self.data[a+5].get(),
self.data[a+6].get(), self.data[a+7].get(),
])
}
fn write_u8(&self, a: u32, v: u8) { self.data[a as usize].set(v); }
fn write_u16(&self, a: u32, v: u16) {
let b = v.to_be_bytes();
self.data[a as usize].set(b[0]);
self.data[a as usize + 1].set(b[1]);
}
fn write_u32(&self, a: u32, v: u32) {
let b = v.to_be_bytes(); let a = a as usize;
for (i, byte) in b.iter().enumerate() { self.data[a+i].set(*byte); }
}
fn write_u64(&self, a: u32, v: u64) {
let b = v.to_be_bytes(); let a = a as usize;
for (i, byte) in b.iter().enumerate() { self.data[a+i].set(*byte); }
}
fn translate(&self, _a: u32) -> Option<*const u8> { None }
fn translate_mut(&self, _a: u32) -> Option<*mut u8> { None }
}
#[test]
fn lvlx_lvrx_round_trip() {
let m = TestMem::new(0x40);
for i in 0..0x30 { m.data[i].set((i as u8).wrapping_add(0x10)); }
// Unaligned load from 0x13 should combine lvlx(0x13) | lvrx(0x23).
let lo = load_vector_left(&m, 0x13);
let hi = load_vector_right(&m, 0x23);
let mut combined = [0u8; 16];
let lob = lo.as_bytes();
let hib = hi.as_bytes();
for i in 0..16 { combined[i] = lob[i] | hib[i]; }
for i in 0..16 {
assert_eq!(combined[i], m.data[0x13 + i].get(), "lane {}", i);
}
}
#[test]
fn lvlx_aligned_is_full_load() {
let m = TestMem::new(0x20);
for i in 0..0x20 { m.data[i].set(i as u8); }
let v = load_vector_left(&m, 0x10);
let b = v.as_bytes();
for i in 0..16 { assert_eq!(b[i], 0x10 + i as u8); }
}
#[test]
fn lvrx_aligned_is_zero() {
let m = TestMem::new(0x20);
let v = load_vector_right(&m, 0x10);
assert_eq!(v.as_bytes(), [0u8; 16]);
}
#[test]
fn sat_add_signed_overflow() {
assert_eq!(sat_add_i8(120, 10), (127, true));
assert_eq!(sat_add_i8(-120, -10), (-128, true));
assert_eq!(sat_add_i8(1, 2), (3, false));
}
#[test]
fn sat_sub_unsigned_underflow() {
assert_eq!(sat_sub_u8(5, 10), (0, true));
assert_eq!(sat_sub_u8(10, 5), (5, false));
}
#[test]
fn pack_pixel_555_matches_canary() {
// Mapping (canary ppc_emit_altivec.cc::vkpkx_in_low):
// out[15:10] = in[24:19], out[9:5] = in[15:11], out[4:0] = in[7:3]
// Pure shift/mask, NO standalone alpha bit.
// All three colour fields exercised. Expected (hand-computed):
// (0x018844C0 >> 9)&0xFC00 = 0xC400
// (0x018844C0 >> 6)&0x3E0 = 0x100
// (0x018844C0 >> 3)&0x1F = 0x18
// => 0xC518
assert_eq!(pack_pixel_555(0x01_88_44_C0), 0xC518);
// Boundary the audit flagged: low byte 0xF8 has bit 7 set. Canary does
// NOT turn that into output bit 15 (alpha). Output bit 15 = in bit 24,
// which is 0 here => high bit clear. (Old impl wrongly produced 0x8000.)
assert_eq!(pack_pixel_555(0x80_F8_F8_F8), 0x7FFF);
assert_eq!(pack_pixel_555(0x80_F8_F8_F8) & 0x8000, 0);
// Lone source bit 7 (0x80) lands in the blue field, not in bit 15.
assert_eq!(pack_pixel_555(0x00_00_00_80), 0x0010);
// Output bit 15 is sourced from input bit 24, not bit 7.
assert_eq!(pack_pixel_555(0x01_00_00_00), 0x8000);
// Saturated input -> all field bits set.
assert_eq!(pack_pixel_555(0xFF_FF_FF_FF), 0xFFFF);
}
#[test]
fn unpack_pixel_555_roundtrip() {
// vupkhpx/vupklpx are NOTIMPLEMENTED in canary, so unpack_pixel_555 is
// unchanged; just sanity-check the alpha-replicate path still holds.
let w = unpack_pixel_555(0x8000 | (0x1F << 10) | (0x1F << 5) | 0x1F);
assert_eq!(w & 0xFF000000, 0xFF000000);
}
// ─── First-Pixels M3 pack/unpack roundtrip tests ───
/// Quantization error tolerance for N-bit signed normalized values.
/// `1.0 / ((1 << (bits - 1)) - 1)` is the step size.
fn tol_normalized(bits: u32) -> f32 {
1.0 / ((1u32 << (bits - 1)) - 1) as f32
}
#[test]
fn normshort2_roundtrip() {
let v = Vec128::from_f32x4(0.5, -0.75, 0.0, 0.0);
let packed = pack_normshort2(v);
let back = unpack_normshort2(packed).as_f32x4();
let tol = tol_normalized(16);
assert!((back[0] - 0.5).abs() < tol, "x got {}", back[0]);
assert!((back[1] - -0.75).abs() < tol, "y got {}", back[1]);
assert_eq!(back[2], 0.0);
assert_eq!(back[3], 1.0);
}
#[test]
fn normshort4_roundtrip_extremes() {
let v = Vec128::from_f32x4(1.0, -1.0, 0.0, 0.25);
let packed = pack_normshort4(v);
let back = unpack_normshort4(packed).as_f32x4();
let tol = tol_normalized(16);
assert!((back[0] - 1.0).abs() < tol);
assert!((back[1] - -1.0).abs() < tol);
assert!((back[2] - 0.0).abs() < tol);
assert!((back[3] - 0.25).abs() < tol);
}
#[test]
fn normpacked32_roundtrip() {
let v = Vec128::from_f32x4(0.5, -0.5, 0.9, -1.0);
let packed = pack_normpacked32(v);
let back = unpack_normpacked32(packed).as_f32x4();
let tol10 = tol_normalized(10);
let tol2 = tol_normalized(2);
assert!((back[0] - 0.5).abs() < tol10, "x got {}", back[0]);
assert!((back[1] - -0.5).abs() < tol10, "y got {}", back[1]);
assert!((back[2] - 0.9).abs() < tol10, "z got {}", back[2]);
// 2-bit signed has only four codes (-2, -1, 0, 1), so quantization is
// very coarse; allow twice the step size.
assert!((back[3] - -1.0).abs() < 2.0 * tol2, "w got {}", back[3]);
}
#[test]
fn normpacked64_roundtrip() {
let v = Vec128::from_f32x4(0.5, -0.25, 0.75, 0.5);
let packed = pack_normpacked64(v);
let back = unpack_normpacked64(packed).as_f32x4();
let tol20 = tol_normalized(20);
let tol4 = tol_normalized(4);
assert!((back[0] - 0.5).abs() < tol20, "x got {}", back[0]);
assert!((back[1] - -0.25).abs() < tol20, "y got {}", back[1]);
assert!((back[2] - 0.75).abs() < tol20, "z got {}", back[2]);
assert!((back[3] - 0.5).abs() < tol4, "w got {}", back[3]);
}
#[test]
fn float16_2_roundtrip_normals() {
// Half has ~3 decimal digits of precision. Pick values that
// survive conversion cleanly: powers of 2 + simple fractions.
let v = Vec128::from_f32x4(1.0, -2.5, 0.0, 0.0);
let packed = pack_float16_2(v);
let back = unpack_float16_2(packed).as_f32x4();
assert_eq!(back[0], 1.0);
assert_eq!(back[1], -2.5);
assert_eq!(back[2], 0.0);
assert_eq!(back[3], 1.0);
}
#[test]
fn float16_4_roundtrip_normals() {
let v = Vec128::from_f32x4(0.5, -3.0, 16.0, -0.125);
let packed = pack_float16_4(v);
let back = unpack_float16_4(packed).as_f32x4();
assert_eq!(back[0], 0.5);
assert_eq!(back[1], -3.0);
assert_eq!(back[2], 16.0);
assert_eq!(back[3], -0.125);
}
#[test]
fn float16_handles_zero_and_infinity() {
// Zero should survive.
assert_eq!(f16_bits_to_f32(f32_to_f16_bits(0.0)), 0.0);
assert_eq!(f16_bits_to_f32(f32_to_f16_bits(-0.0)).to_bits(), (-0.0f32).to_bits());
// +inf.
let inf_back = f16_bits_to_f32(f32_to_f16_bits(f32::INFINITY));
assert!(inf_back.is_infinite() && inf_back > 0.0);
// Overflow → +inf.
let overflow_back = f16_bits_to_f32(f32_to_f16_bits(65536.0));
assert!(overflow_back.is_infinite());
}
#[test]
fn pack_type_enum_maps_canary_values() {
use D3dPackType::*;
assert!(matches!(D3dPackType::from_immediate(0), D3dColor));
assert!(matches!(D3dPackType::from_immediate(1), NormShort2));
assert!(matches!(D3dPackType::from_immediate(2), NormPacked32));
assert!(matches!(D3dPackType::from_immediate(3), Float16_2));
assert!(matches!(D3dPackType::from_immediate(4), NormShort4));
assert!(matches!(D3dPackType::from_immediate(5), Float16_4));
assert!(matches!(D3dPackType::from_immediate(6), NormPacked64));
assert!(matches!(D3dPackType::from_immediate(7), Other(7)));
}
}
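
The bit mapping pinned by `pack_pixel_555_matches_canary` is small enough to sketch standalone. The function below is a reconstruction from the hand-computed shifts in that test's comments, not the project's actual `pack_pixel_555`; the name `pack_pixel_555_sketch` is hypothetical.

```rust
/// Hypothetical standalone sketch of the 32-bit -> 16-bit pixel pack
/// described by the test comments: three shift/mask steps and no
/// special handling of an alpha bit.
fn pack_pixel_555_sketch(x: u32) -> u16 {
    let hi = (x >> 9) & 0xFC00; // in[24:19] -> out[15:10]
    let mid = (x >> 6) & 0x03E0; // in[15:11] -> out[9:5]
    let lo = (x >> 3) & 0x001F; // in[7:3]   -> out[4:0]
    (hi | mid | lo) as u16
}
```

With this sketch, `pack_pixel_555_sketch(0x80_F8_F8_F8)` evaluates to `0x7FFF`: input bit 31 is discarded and output bit 15 comes only from input bit 24.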

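The `<< 44 >> 44` and `<< 60 >> 60` pairs in `unpack_normpacked64` are a sign-extension idiom: move the field to the top of the word, then arithmetic-shift it back down. A minimal sketch of that idiom follows, assuming the 20/20/20/4 field order shown in that function; `sign_extend` and `split_20_20_20_4` are hypothetical helper names, not part of the crate.

```rust
/// Sign-extend the low `bits` bits of `raw` to a full i64 by shifting the
/// field up to bit 63 and arithmetic-shifting it back down.
fn sign_extend(raw: u64, bits: u32) -> i64 {
    debug_assert!((1..=64).contains(&bits));
    let shift = 64 - bits;
    ((raw << shift) as i64) >> shift
}

/// Split a 64-bit word into signed 20/20/20/4-bit fields (x, y, z, w),
/// lowest field first.
fn split_20_20_20_4(dw: u64) -> (i64, i64, i64, i64) {
    (
        sign_extend(dw, 20),
        sign_extend(dw >> 20, 20),
        sign_extend(dw >> 40, 20),
        sign_extend(dw >> 60, 4),
    )
}
```

For example, `sign_extend(0x8_0000, 20)` yields `-524288`, the most negative 20-bit code, while `sign_extend(0x7_FFFF, 20)` yields `524287`, the divisor used for normalization.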
@@ -0,0 +1,550 @@
//! Assert-based goldens for the PPC disassembler.
//!
//! Each test owns an inline list of `(raw, addr, label)` cases. On a
//! normal run, the test reads the corresponding fixture JSON and asserts
//! that `format(decode(raw, addr))` reproduces every field exactly. On
//! first creation (fixture file missing) or with `REGEN_GOLDENS=1` set,
//! the test (re)writes the fixture from `format()` output.
//!
//! Workflow:
//! ```sh
//! cargo test -p xenia-cpu --test disasm_goldens # assert
//! REGEN_GOLDENS=1 cargo test -p xenia-cpu --test disasm_goldens # regen
//! ```
//!
//! The hand-encoded test cases below cover the silent-bug regression
//! cases that lived in the old println-based `disasm_audit.rs` harness
//! (now deleted).
use std::path::PathBuf;
use serde::{Deserialize, Serialize};
use xenia_cpu::decoder::{DecodedInstr, decode};
use xenia_cpu::disasm::format;
#[derive(Debug, Clone, PartialEq, Eq, Deserialize, Serialize)]
struct GoldenRow {
label: String,
raw: String,
addr: String,
mnemonic: String,
operands: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
ext_mnemonic: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
ext_operands: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
branch_target: Option<String>,
}
#[derive(Debug, Deserialize, Serialize)]
struct GoldenFile {
rows: Vec<GoldenRow>,
}
fn fixture_path(name: &str) -> PathBuf {
PathBuf::from(env!("CARGO_MANIFEST_DIR"))
.join("tests")
.join("golden")
.join(name)
}
/// Encode a VMX128 VX128-form (or VX128_R/_2) instruction with canary's
/// 7-bit register layout: VD low at PPC 6-10, high 2 bits at PPC 28-29;
/// VA low at PPC 11-15, mid bit at PPC 26, high bit at PPC 21; VB low at
/// PPC 16-20, high 2 bits at PPC 30-31. `secondary_bits` carries any
/// secondary opcode + VC + Rc + key bits the caller needs.
fn encode_vx128(op6: u32, vd: u32, va: u32, vb: u32, secondary_bits: u32) -> u32 {
((op6 & 0x3F) << 26)
| ((vd & 0x1F) << 21)
| (((vd >> 5) & 0x3) << 2)
| ((va & 0x1F) << 16)
| (((va >> 5) & 0x1) << 5)
| (((va >> 6) & 0x1) << 10)
| ((vb & 0x1F) << 11)
| (((vb >> 5) & 0x3) << 0)
| secondary_bits
}
fn build_rows(cases: &[(u32, u32, &str)]) -> Vec<GoldenRow> {
cases
.iter()
.map(|&(raw, addr, label)| {
let d = decode(raw, addr);
let t = format(&d);
GoldenRow {
label: label.to_string(),
raw: format!("0x{raw:08X}"),
addr: format!("0x{addr:08X}"),
mnemonic: t.mnemonic,
operands: t.operands,
ext_mnemonic: t.ext_mnemonic,
ext_operands: t.ext_operands,
branch_target: t.branch_target.map(|t| format!("0x{t:08X}")),
}
})
.collect()
}
/// Compare what `format()` produces against the committed JSON snapshot.
/// Set `REGEN_GOLDENS` to any value (e.g. `REGEN_GOLDENS=1`) to overwrite
/// the snapshot from current output.
/// Missing snapshot is treated as "first creation": writes and panics so
/// CI can't accidentally accept blank goldens.
fn assert_or_regen(fixture_name: &str, cases: &[(u32, u32, &str)]) {
let rows = build_rows(cases);
let path = fixture_path(fixture_name);
let regen = std::env::var("REGEN_GOLDENS").is_ok();
if regen || !path.exists() {
if let Some(parent) = path.parent() {
std::fs::create_dir_all(parent).unwrap();
}
let serialized = serde_json::to_string_pretty(&GoldenFile { rows }).unwrap();
std::fs::write(&path, serialized + "\n").unwrap();
if !regen {
panic!(
"Generated fixture {} (was missing). Inspect, commit, then re-run.",
path.display()
);
}
return;
}
let src = std::fs::read_to_string(&path).unwrap();
let golden: GoldenFile = serde_json::from_str(&src).unwrap();
assert_eq!(
rows.len(),
golden.rows.len(),
"row count differs from {} (live={}, fixture={}). Run with REGEN_GOLDENS=1 if the test cases changed intentionally.",
path.display(),
rows.len(),
golden.rows.len()
);
for (i, (got, expected)) in rows.iter().zip(golden.rows.iter()).enumerate() {
assert_eq!(
got, expected,
"row {} ({}) differs in {}\n live: {got:#?}\n fixture: {expected:#?}",
i,
expected.label,
path.display()
);
}
}
// ── Encoding helpers ────────────────────────────────────────────────────────
// PPC bit numbering: bit 0 is MSB, bit 31 is LSB. Most helpers below emit
// instructions in canonical hand-readable form: opcode << 26 | <fields>.
#[allow(clippy::too_many_arguments)]
fn xform_xo3(rd: u32, ra: u32, rb: u32, oe: u32, xo: u32, rc: u32) -> u32 {
(31 << 26) | (rd << 21) | (ra << 16) | (rb << 11) | (oe << 10) | (xo << 1) | rc
}
fn xform_logic(rs: u32, ra: u32, rb: u32, xo: u32, rc: u32) -> u32 {
(31 << 26) | (rs << 21) | (ra << 16) | (rb << 11) | (xo << 1) | rc
}
fn dform(op: u32, rt: u32, ra: u32, imm: i16) -> u32 {
(op << 26) | (rt << 21) | (ra << 16) | ((imm as u16) as u32)
}
fn iform_b(target_disp: i32, aa: u32, lk: u32) -> u32 {
// I-form: opcode 18 | LI<<2 | AA<<1 | LK
let li = (target_disp as u32) & 0x03FF_FFFC;
(18 << 26) | li | (aa << 1) | lk
}
fn bform_bc(bo: u32, bi: u32, target_disp: i32, aa: u32, lk: u32) -> u32 {
// B-form: opcode 16 | BO<<21 | BI<<16 | BD<<2 | AA<<1 | LK
let bd = (target_disp as u32) & 0x0000_FFFC;
(16 << 26) | (bo << 21) | (bi << 16) | bd | (aa << 1) | lk
}
fn xlform_bclr(bo: u32, bi: u32, lk: u32) -> u32 {
// XL-form: opcode 19 | BO<<21 | BI<<16 | XO=16<<1 | LK
(19 << 26) | (bo << 21) | (bi << 16) | (16 << 1) | lk
}
fn xlform_bcctr(bo: u32, bi: u32, lk: u32) -> u32 {
(19 << 26) | (bo << 21) | (bi << 16) | (528 << 1) | lk
}
fn rlwinm(rs: u32, ra: u32, sh: u32, mb: u32, me: u32, rc: u32) -> u32 {
(21 << 26) | (rs << 21) | (ra << 16) | (sh << 11) | (mb << 6) | (me << 1) | rc
}
fn rldicl(rs: u32, ra: u32, sh: u32, mb: u32, rc: u32) -> u32 {
// MD-form: sh[4:0] at PPC bits 16-20 (host bits 11-15); sh[5] at PPC bit 30 (host bit 1).
// mb[4:0] at PPC bits 21-25 (host bits 6-10); mb[5] at PPC bit 26 (host bit 5).
let sh_lo = sh & 0x1F;
let sh_hi = (sh >> 5) & 1;
let mb_lo = mb & 0x1F;
let mb_hi = (mb >> 5) & 1;
(30 << 26)
| (rs << 21)
| (ra << 16)
| (sh_lo << 11)
| (mb_lo << 6)
| (mb_hi << 5)
| (0 << 2)
| (sh_hi << 1)
| rc
}
fn mfspr(rd: u32, spr: u32) -> u32 {
let spr_swapped = ((spr & 0x1F) << 5) | ((spr >> 5) & 0x1F);
(31 << 26) | (rd << 21) | (spr_swapped << 11) | (339 << 1)
}
fn mtspr(rs: u32, spr: u32) -> u32 {
let spr_swapped = ((spr & 0x1F) << 5) | ((spr >> 5) & 0x1F);
(31 << 26) | (rs << 21) | (spr_swapped << 11) | (467 << 1)
}
// ── Tests ───────────────────────────────────────────────────────────────────
#[test]
fn base_mnemonics() {
let cases: &[(u32, u32, &str)] = &[
// XO-form ALU (Rc and OE bits)
(xform_xo3(3, 4, 5, 0, 266, 0), 0x82000000, "add r3,r4,r5"),
(xform_xo3(3, 4, 5, 0, 266, 1), 0x82000000, "add. r3,r4,r5"),
(xform_xo3(3, 4, 5, 1, 266, 0), 0x82000000, "addo r3,r4,r5"),
(xform_xo3(3, 4, 5, 1, 266, 1), 0x82000000, "addo. r3,r4,r5"),
(xform_xo3(3, 4, 0, 0, 104, 0), 0x82000000, "neg r3,r4"),
(xform_xo3(3, 4, 5, 0, 235, 0), 0x82000000, "mullw r3,r4,r5"),
(xform_xo3(3, 4, 5, 0, 491, 0), 0x82000000, "divw r3,r4,r5"),
(xform_xo3(3, 4, 5, 0, 75, 1), 0x82000000, "mulhw. r3,r4,r5"),
(xform_xo3(3, 4, 5, 0, 11, 1), 0x82000000, "mulhwu. r3,r4,r5"),
(xform_xo3(3, 4, 5, 0, 233, 0), 0x82000000, "mulld r3,r4,r5"),
// X-form logical
(xform_logic(4, 3, 5, 28, 0), 0x82000000, "and r3,r4,r5"),
(xform_logic(4, 3, 5, 444, 0), 0x82000000, "or r3,r4,r5 (non-mr: rs!=rb)"),
(xform_logic(4, 3, 5, 316, 0), 0x82000000, "xor r3,r4,r5"),
(xform_logic(4, 3, 5, 124, 0), 0x82000000, "nor r3,r4,r5"),
(xform_logic(4, 3, 5, 476, 0), 0x82000000, "nand r3,r4,r5"),
(xform_logic(4, 3, 5, 284, 0), 0x82000000, "eqv r3,r4,r5"),
(xform_logic(4, 3, 5, 60, 0), 0x82000000, "andc r3,r4,r5"),
(xform_logic(4, 3, 5, 412, 0), 0x82000000, "orc r3,r4,r5"),
// X-form shift
(xform_logic(4, 3, 5, 24, 0), 0x82000000, "slw r3,r4,r5"),
(xform_logic(4, 3, 5, 536, 0), 0x82000000, "srw r3,r4,r5"),
(xform_logic(4, 3, 5, 792, 0), 0x82000000, "sraw r3,r4,r5"),
(xform_logic(4, 3, 5, 27, 0), 0x82000000, "sld r3,r4,r5"),
(xform_logic(4, 3, 5, 539, 0), 0x82000000, "srd r3,r4,r5"),
// srawi (immediate shift)
((31 << 26) | (4 << 21) | (3 << 16) | (16 << 11) | (824 << 1), 0x82000000, "srawi r3,r4,16"),
// Atomics
((31 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | (150 << 1) | 1, 0x82000000, "stwcx. r3,r4,r5"),
((31 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | (214 << 1) | 1, 0x82000000, "stdcx. r3,r4,r5"),
((31 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | (20 << 1), 0x82000000, "lwarx r3,r4,r5"),
((31 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | (84 << 1), 0x82000000, "ldarx r3,r4,r5"),
// Compares
(dform(11, 0, 3, 16), 0x82000000, "cmpwi cr0, r3, 16"),
(dform(11, 2 << 2, 3, 16), 0x82000000, "cmpwi cr2, r3, 16"),
(dform(10, 0, 3, 16), 0x82000000, "cmplwi cr0, r3, 16"),
((31 << 26) | (3 << 16) | (4 << 11), 0x82000000, "cmpw r3,r4 in cr0"),
((31 << 26) | (1 << 21) | (3 << 16) | (4 << 11), 0x82000000, "cmpd r3,r4"),
((31 << 26) | (3 << 16) | (4 << 11) | (32 << 1), 0x82000000, "cmplw r3,r4"),
// D-form ALU/load/store
(dform(14, 3, 1, 16), 0x82000000, "addi r3, r1, 16"),
(dform(15, 3, 1, 0x100), 0x82000000, "addis r3, r1, 0x100 (ra!=0)"),
(dform(7, 3, 4, 5), 0x82000000, "mulli r3, r4, 5"),
(dform(8, 3, 4, 5), 0x82000000, "subfic r3, r4, 5"),
(dform(12, 3, 4, 16), 0x82000000, "addic r3, r4, 16"),
(dform(13, 3, 4, 16), 0x82000000, "addic. r3, r4, 16"),
(dform(24, 3, 4, 0x10), 0x82000000, "ori r4, r3, 0x10 (non-nop)"),
(dform(25, 3, 4, 0x10), 0x82000000, "oris r4, r3, 0x10"),
(dform(26, 3, 4, 0x10), 0x82000000, "xori r4, r3, 0x10"),
(dform(28, 3, 4, 0x10), 0x82000000, "andi. r4, r3, 0x10"),
// Loads/stores D-form
(dform(32, 5, 1, 0x20), 0x82000000, "lwz r5, 0x20(r1)"),
(dform(36, 5, 1, 0x20), 0x82000000, "stw r5, 0x20(r1)"),
(dform(34, 5, 1, 0x20), 0x82000000, "lbz r5, 0x20(r1)"),
(dform(40, 5, 1, 0x20), 0x82000000, "lhz r5, 0x20(r1)"),
(dform(48, 5, 1, 0x20), 0x82000000, "lfs f5, 0x20(r1)"),
(dform(50, 5, 1, 0x20), 0x82000000, "lfd f5, 0x20(r1)"),
(dform(54, 5, 1, 0x20), 0x82000000, "stfd f5, 0x20(r1)"),
// DS-form 64-bit load/store
((58u32 << 26) | (5 << 21) | (1 << 16) | 0x20, 0x82000000, "ld r5, 0x20(r1)"),
((62u32 << 26) | (5 << 21) | (1 << 16) | 0x20, 0x82000000, "std r5, 0x20(r1)"),
// Sync / barrier (parameterless)
((31 << 26) | (598 << 1), 0x82000000, "sync 0 (extends to sync)"),
((19 << 26) | (150 << 1), 0x82000000, "isync"),
((31 << 26) | (854 << 1), 0x82000000, "eieio"),
// Cache hints
((31 << 26) | (1 << 16) | (2 << 11) | (54 << 1), 0x82000000, "dcbst r1, r2"),
((31 << 26) | (1 << 16) | (2 << 11) | (86 << 1), 0x82000000, "dcbf r1, r2"),
((31 << 26) | (1 << 16) | (2 << 11) | (278 << 1), 0x82000000, "dcbt r1, r2"),
((31 << 26) | (1 << 16) | (2 << 11) | (1014 << 1), 0x82000000, "dcbz r1, r2"),
((31 << 26) | (1 << 21) | (1 << 16) | (2 << 11) | (1014 << 1), 0x82000000, "dcbz128 r1, r2"),
// CR logical (without simplification triggers)
((19 << 26) | (4 << 21) | (5 << 16) | (6 << 11) | (33 << 1), 0x82000000, "crnor 4,5,6 (no simplify)"),
((19 << 26) | (4 << 21) | (5 << 16) | (6 << 11) | (257 << 1), 0x82000000, "crand 4,5,6"),
((19 << 26) | (4 << 21) | (5 << 16) | (6 << 11) | (449 << 1), 0x82000000, "cror 4,5,6 (no simplify)"),
// Trap (no simplification: TO=11 doesn't match the table)
((31 << 26) | (11 << 21) | (3 << 16) | (4 << 11) | (4 << 1), 0x82000000, "tw 11, r3, r4 (uncommon TO)"),
((2u32 << 26) | (11 << 21) | (3 << 16) | (123u32 & 0xFFFF), 0x82000000, "tdi 11, r3, 123"),
// mtcr (extended): mtcrf 0xFF, r5
((31 << 26) | (5 << 21) | (0xFF << 12) | (144 << 1), 0x82000000, "mtcrf 0xFF, r5 → mtcr"),
// mfcr / mfmsr / mtmsr / mtmsrd
((31 << 26) | (5 << 21) | (19 << 1), 0x82000000, "mfcr r5"),
((31 << 26) | (5 << 21) | (83 << 1), 0x82000000, "mfmsr r5"),
((31 << 26) | (5 << 21) | (146 << 1), 0x82000000, "mtmsr r5"),
((31 << 26) | (5 << 21) | (178 << 1), 0x82000000, "mtmsrd r5"),
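// FPU base. Three rows below do not encode what their labels say; they are
// kept byte-for-byte because the fixture snapshots both `raw` and `label`:
//   - fmul: `(3 << 21) | (5 << 21)` ORs to frD=7, and fmul reads frC from
//     host bits 6-10 (not frB at 11-15), so frA and frC are both 0.
//   - fneg / fmr: the source register is frB (host bits 11-15); `4 << 16`
//     lands in the unused frA field, so frB is 0.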
((63u32 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | (21 << 1), 0x82000000, "fadd f3, f4, f5"),
((63u32 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | (20 << 1), 0x82000000, "fsub f3, f4, f5"),
((63u32 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | (18 << 1), 0x82000000, "fdiv f3, f4, f5"),
((63u32 << 26) | (3 << 21) | (5 << 21) | (5 << 11) | (25 << 1), 0x82000000, "fmul f3, f0, f5 (encoded)"),
((63u32 << 26) | (3 << 21) | (4 << 16) | (40 << 1), 0x82000000, "fneg f3, f4"),
((63u32 << 26) | (3 << 21) | (4 << 16) | (72 << 1), 0x82000000, "fmr f3, f4"),
// mtfsf — XFL form (Fix 1). FM at LSB bits 17-24 (PPC bits 7-14).
// Encoding: opcode 63 | FM<<17 | frB<<11 | XO=711<<1 | Rc.
((63u32 << 26) | (0xFF << 17) | (5 << 11) | (711 << 1), 0x82000000, "mtfsf 0xFF, f5 (Rc=0)"),
((63u32 << 26) | (0xFF << 17) | (5 << 11) | (711 << 1) | 1, 0x82000000, "mtfsf. 0xFF, f5 (Rc=1)"),
];
assert_or_regen("base_mnemonics.json", cases);
}
#[test]
fn extended_mnemonics() {
let cases: &[(u32, u32, &str)] = &[
// ori r0, r0, 0 → nop
(dform(24, 0, 0, 0), 0x82000000, "nop"),
// addi r3, r0, imm → li
(dform(14, 3, 0, 16), 0x82000000, "li r3, 16"),
(dform(14, 3, 0, -1), 0x82000000, "li r3, -1"),
// addi r3, r4, neg → subi
(dform(14, 3, 4, -16), 0x82000000, "subi r3, r4, 16"),
// addis r3, r0, imm → lis
(dform(15, 3, 0, 0x1234), 0x82000000, "lis r3, 0x1234"),
// addis r3, r4, neg → subis
(dform(15, 3, 4, -1), 0x82000000, "subis r3, r4, 0xFFFF"),
// or rA, rS, rS → mr
(xform_logic(4, 3, 4, 444, 0), 0x82000000, "mr r3, r4"),
(xform_logic(4, 3, 4, 444, 1), 0x82000000, "mr. r3, r4"),
// and rA, rS, rS → mr (also)
(xform_logic(4, 3, 4, 28, 0), 0x82000000, "mr (via and)"),
// nor rA, rS, rS → not
(xform_logic(4, 3, 4, 124, 0), 0x82000000, "not r3, r4"),
// subf → sub (operand swap)
(xform_xo3(3, 4, 5, 0, 40, 0), 0x82000000, "subf → sub r3, r5, r4"),
// rlwinm simplifications
(rlwinm(4, 3, 4, 0, 31 - 4, 0), 0x82000000, "slwi r3, r4, 4"),
(rlwinm(4, 3, 32 - 4, 4, 31, 0), 0x82000000, "srwi r3, r4, 4"),
(rlwinm(4, 3, 8, 0, 31, 0), 0x82000000, "rotlwi r3, r4, 8"),
(rlwinm(4, 3, 0, 4, 31, 0), 0x82000000, "clrlwi r3, r4, 4"),
(rlwinm(4, 3, 0, 0, 27, 0), 0x82000000, "clrrwi r3, r4, 4"),
(rlwinm(4, 3, 8, 0, 7, 0), 0x82000000, "extlwi r3, r4, 8, 8"),
// rlwinm with Rc
(rlwinm(4, 3, 4, 0, 31 - 4, 1), 0x82000000, "slwi. r3, r4, 4"),
// rlwinm Sylpheed regression
(rlwinm(11, 11, 0, 31, 31, 1), 0x82000000, "rlwinm. r11,r11,0,31,31 (no simplify)"),
// rldicl simplifications
(rldicl(4, 3, 0, 32, 0), 0x82000000, "clrldi r3, r4, 32"),
(rldicl(4, 3, 64u32 - 8, 8, 0), 0x82000000, "srdi r3, r4, 8"),
(rldicl(4, 3, 8, 0, 0), 0x82000000, "rotldi r3, r4, 8"),
// cmpi / cmpli → cmpwi/cmpdi/cmplwi/cmpldi
(dform(11, 0, 3, 16), 0x82000000, "cmpwi cr0, r3, 16"),
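// The rt slot carries crfD<<2 | L, so rt=1 selects L=1 (64-bit compare) on cr0.
// Same raw word as before: the old pre-shifted rt argument was shifted out
// of the u32 inside dform(), leaving only the explicit `1 << 21`.
(dform(11, 1, 3, 16), 0x82000000, "cmpdi (L=1) variant"),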
// bclr 20, 0 → blr
(xlform_bclr(20, 0, 0), 0x82000000, "blr"),
(xlform_bclr(20, 0, 1), 0x82000000, "blrl"),
// bcctr 20, 0 → bctr
(xlform_bcctr(20, 0, 0), 0x82000000, "bctr"),
(xlform_bcctr(20, 0, 1), 0x82000000, "bctrl"),
// bclr conditional
(xlform_bclr(12, 2, 0), 0x82000000, "beqlr (BO=12, BI=2 → cr0.eq true)"),
(xlform_bclr(4, 2, 0), 0x82000000, "bnelr"),
// bc with full BO/BI: branch always (BO=20)
(bform_bc(20, 0, 0x40, 0, 0), 0x82000000, "bc → b 0x82000040"),
(bform_bc(20, 0, 0x40, 0, 1), 0x82000000, "bc l → bl 0x82000040"),
// Conditional bc → beq/bne/etc
(bform_bc(12, 2, 0x40, 0, 0), 0x82000000, "bc 12,cr0.eq → beq 0x82000040"),
(bform_bc(4, 2, 0x40, 0, 0), 0x82000000, "bc 4,cr0.eq → bne 0x82000040"),
(bform_bc(12, 0, 0x40, 0, 0), 0x82000000, "bc 12,cr0.lt → blt 0x82000040"),
(bform_bc(4, 0, 0x40, 0, 0), 0x82000000, "bc 4,cr0.lt → bge 0x82000040"),
(bform_bc(12, 1, 0x40, 0, 0), 0x82000000, "bc 12,cr0.gt → bgt 0x82000040"),
(bform_bc(4, 1, 0x40, 0, 0), 0x82000000, "bc 4,cr0.gt → ble 0x82000040"),
// Conditional with non-zero CR field
(bform_bc(12, 2 + 8, 0x40, 0, 0), 0x82000000, "bc 12, cr2.eq → beq cr2, 0x...040"),
// bdnz / bdz (decrement-CTR branches)
(bform_bc(16, 0, 0x40, 0, 0), 0x82000000, "bdnz 0x82000040"),
(bform_bc(18, 0, 0x40, 0, 0), 0x82000000, "bdz 0x82000040"),
// I-form branches
(iform_b(0x40, 0, 0), 0x82000000, "b +0x40 → 0x82000040"),
(iform_b(0x40, 0, 1), 0x82000000, "bl +0x40 → 0x82000040"),
(iform_b(0x40, 1, 0), 0x82000000, "ba 0x40 absolute"),
(iform_b(0x40, 1, 1), 0x82000000, "bla 0x40 absolute"),
// Trap immediate simplifications
((2u32 << 26) | (4 << 21) | (3 << 16) | (123u32 & 0xFFFF), 0x82000000, "tdeqi r3, 123"),
((3u32 << 26) | (16 << 21) | (3 << 16) | (123u32 & 0xFFFF), 0x82000000, "twlti r3, 123"),
// mfspr → mflr / mfctr / mfxer
(mfspr(3, 8), 0x82000000, "mflr r3"),
(mfspr(3, 9), 0x82000000, "mfctr r3"),
(mfspr(3, 1), 0x82000000, "mfxer r3"),
// mtspr → mtlr / mtctr / mtxer
(mtspr(3, 8), 0x82000000, "mtlr r3"),
(mtspr(3, 9), 0x82000000, "mtctr r3"),
(mtspr(3, 1), 0x82000000, "mtxer r3"),
// crnor with same source bits → crnot
((19 << 26) | (4 << 21) | (5 << 16) | (5 << 11) | (33 << 1), 0x82000000, "crnot 4, 5"),
// crxor with all same → crclr
((19 << 26) | (4 << 21) | (4 << 16) | (4 << 11) | (193 << 1), 0x82000000, "crclr 4"),
// creqv with all same → crset
((19 << 26) | (4 << 21) | (4 << 16) | (4 << 11) | (289 << 1), 0x82000000, "crset 4"),
// cror with same source bits → crmove
((19 << 26) | (4 << 21) | (5 << 16) | (5 << 11) | (449 << 1), 0x82000000, "crmove 4, 5"),
// sync L=1 → lwsync
((31 << 26) | (1 << 21) | (598 << 1), 0x82000000, "lwsync"),
// tw 31, 0, 0 → trap
((31 << 26) | (31 << 21) | (4 << 1), 0x82000000, "trap"),
// Fix 2: bclr/bcctr with BO=20 and BI≠0 still emits blr/bctr ext.
// BO=20 ignores both CTR test and CR test, so BI is don't-care.
(xlform_bclr(20, 4, 0), 0x82000000, "blr (BO=20, BI=4 — BI is don't-care)"),
(xlform_bclr(20, 7, 1), 0x82000000, "blrl (BO=20, BI=7)"),
(xlform_bcctr(20, 4, 0), 0x82000000, "bctr (BO=20, BI=4)"),
// Fix 3: trap unsigned simplified mnemonics (TO=1, 2, 5, 6 — logical
// compare conditions). Register form (tw/td) and immediate (twi/tdi).
((31u32 << 26) | (2 << 21) | (3 << 16) | (4 << 11) | (4 << 1), 0x82000000, "twllt r3, r4 (TO=2)"),
((31u32 << 26) | (1 << 21) | (3 << 16) | (4 << 11) | (4 << 1), 0x82000000, "twlgt r3, r4 (TO=1)"),
((31u32 << 26) | (5 << 21) | (3 << 16) | (4 << 11) | (68 << 1), 0x82000000, "tdlge r3, r4 (TO=5)"),
((31u32 << 26) | (6 << 21) | (3 << 16) | (4 << 11) | (4 << 1), 0x82000000, "twlle r3, r4 (TO=6)"),
((3u32 << 26) | (2 << 21) | (3 << 16) | (16u32 & 0xFFFF), 0x82000000, "twllti r3, 16"),
((2u32 << 26) | (5 << 21) | (3 << 16) | (16u32 & 0xFFFF), 0x82000000, "tdlgei r3, 16"),
];
assert_or_regen("extended_mnemonics.json", cases);
}
#[test]
fn vmx128_registers() {
// Standard VMX (op=4) — 5-bit registers v0..v31. Verifies that the
// low-register path renders correctly through the new formatter.
let std_vmx = [
// vaddubm v3, v4, v5 : op=4, 3-op key=0
((4u32 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | 0, 0x82000000, "vaddubm v3, v4, v5"),
// vaddfp v3, v4, v5 : op=4, vx=10
((4u32 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | 10, 0x82000000, "vaddfp v3, v4, v5"),
// vand v3, v4, v5 : vx=1028
((4u32 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | 1028, 0x82000000, "vand v3, v4, v5"),
// vor v3, v4, v5 : vx=1156
((4u32 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | 1156, 0x82000000, "vor v3, v4, v5"),
// vxor v3, v4, v5 : vx=1220
((4u32 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | 1220, 0x82000000, "vxor v3, v4, v5"),
// vsel v3, v4, v5, v6 : op=4, va_key=42 (4-op)
((4u32 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | (6 << 6) | 42, 0x82000000, "vsel v3,v4,v5,v6"),
// vperm v3, v4, v5, v6 : va_key=43
((4u32 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | (6 << 6) | 43, 0x82000000, "vperm v3,v4,v5,v6"),
// vmaddfp v3, v4, v5, v6 : va_key=46 (operand swap: vd, va, vc, vb)
((4u32 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | (6 << 6) | 46, 0x82000000, "vmaddfp v3, v4, v6, v5 (swap)"),
// mfvscr v3 : vx=1540
((4u32 << 26) | (3 << 21) | 1540, 0x82000000, "mfvscr v3"),
// mtvscr v5 : vx=1604, vb=v5
((4u32 << 26) | (5 << 11) | 1604, 0x82000000, "mtvscr v5"),
];
// VMX128 op=5: vperm128 v3, v4, v5, vc=0. Canary FormatVX128: VD low
// at PPC 6-10, VA low at PPC 11-15, VB low at PPC 16-20, VC at PPC 23-25.
// key1 = (bit22<<5)|bit27 = 0 selects vperm128.
let vmx128_op5 = [
(encode_vx128(5, 3, 4, 5, 0), 0x82000000, "vperm128 v3, v4, v5, 0 (canary)"),
];
// VMX128 op=6 — exercise full 0-127 vd128 range under canary's layout.
// VD128h is at PPC 28-29 (host 2-3): no overlap with secondary opcode key,
// so vd can be freely 0-127 for any op6 instruction.
let vsrw128 = |vd: u32, vb: u32| -> u32 {
// vsrw128 secondary: 0x000001D0 (decode_op6 key5 = 0b011101).
encode_vx128(6, vd, 0, vb, 0x000001D0)
};
let vpermwi128 = |vd: u32, vb: u32, perm: u32| -> u32 {
// vpermwi128: PERMl at PPC 11-15, PERMh at PPC 23-25, key1 sets bit 22 + bit 27.
let perml = perm & 0x1F;
let permh = (perm >> 5) & 0x7;
let mut raw = (6u32 << 26)
| ((vd & 0x1F) << 21)
| (((vd >> 5) & 0x3) << 2) // VD128h
| (perml << 16)
| ((vb & 0x1F) << 11)
| (((vb >> 5) & 0x3) << 0) // VB128h
| (permh << 6) // PERMh at PPC 23-25
| (1 << 9) // bit 22 (key1 high)
| (1 << 4); // bit 27 (key1 low)
raw &= !(1 << 10); // PPC 21 = 0 for vpermwi128
raw
};
let vrlimi128 = |vd: u32, vb: u32, imm: u32, z: u32| -> u32 {
// vrlimi128: IMM at PPC 11-15, z at PPC 24-25, key2 = 0b1110001 over
// bits 21-23 + 26-27 → bits 21,22,23 = 1, bit 26 = 0, bit 27 = 1.
(6u32 << 26)
| ((vd & 0x1F) << 21)
| (((vd >> 5) & 0x3) << 2) // VD128h
| ((imm & 0x1F) << 16)
| ((vb & 0x1F) << 11)
| (((vb >> 5) & 0x3) << 0) // VB128h
| ((z & 0x3) << 6) // z at PPC 24-25 = host 6-7
| (1 << 8) // bit 23 (key2)
| (1 << 9) // bit 22 (key2)
| (1 << 10) // bit 21 (key2)
| (1 << 4) // bit 27 (key2)
};
let vmx128_high = [
(vsrw128(0, 12), 0x82000000, "vsrw128 v0, v0, v12 (canary, vd_hi=00)"),
(vsrw128(32, 12), 0x82000000, "vsrw128 v32, v0, v12 (canary, VD128h=01)"),
(vpermwi128(64, 12, 0xE4), 0x82000000, "vpermwi128 v64, v12, 0xE4 (canary, VD128h=10)"),
(vrlimi128(96, 12, 4, 3), 0x82000000, "vrlimi128 v96, v12, 4, 3 (canary, VD128h=11)"),
(vrlimi128(127, 95, 4, 3), 0x82000000, "vrlimi128 v127, v95, 4, 3 (canary)"),
];
// Fix 4: VMX128 multiply-add 4-operand layouts. Per canary, the addend
// is the VD register re-used; operand order differs between the three
// mnemonics. Encodings hand-built to satisfy decode_op5's key2 secondary
// opcode (vmaddfp128=0b001101, vmaddcfp128=0b010001, vnmsubfp128=0b010101)
// with bit 22=0 (forced by key2's high nibble) so vd128 high bit 1 = 0.
// vd128 low = 3 (PPC bits 6-10); va128 = 3 | (VA128h<<5) = 35, with VA128h at PPC bit 26; vb128 = 5.
// Distinct VD vs VA verifies the layout isn't trivially aliasing VD.
//
// layout (canary):
// vmaddfp128 VD, VA, VB, VD → "v3, v35, v5, v3"
// vmaddcfp128 VD, VA, VD, VB → "v3, v35, v3, v5"
// vnmsubfp128 VD, VA, VD, VB → "v3, v35, v3, v5"
let vmx128_4op = [
// Canary FormatVX128 layout: vd=3 (PPC 6-10), va=35 (low five bits = 3 at PPC 11-15, plus VA128h=1 at PPC 26),
// vb=5 (PPC 16-20), key2 at PPC 22-25 plus PPC bit 27.
(0x146328F0u32, 0x82000000, "vmaddfp128 v3, v35, v5, v3"),
(0x14632930u32, 0x82000000, "vmaddcfp128 v3, v35, v3, v5"),
(0x14632970u32, 0x82000000, "vnmsubfp128 v3, v35, v3, v5"),
];
let mut all = Vec::new();
all.extend_from_slice(&std_vmx);
all.extend_from_slice(&vmx128_op5);
all.extend_from_slice(&vmx128_high);
all.extend_from_slice(&vmx128_4op);
assert_or_regen("vmx128_registers.json", &all);
}
#[test]
fn sradi_shift_32_decodes_to_32() {
// sradi rA, rS, 32 encodes sh=32 as sh[4:0]=0, sh[5]=1.
// After the PPCBUG-040 fix, sh64() must return 32, not 1.
// rldicl with mb=63 is not sradi, but it exercises the sh64() extraction.
let instr: DecodedInstr = decode(rldicl(3, 4, 32, 63, 0), 0);
assert_eq!(instr.sh64(), 32, "sh64 must return 32 for sh=32 (sh5=1, sh_lo=0)");
}
#[test]
fn sh64_shift_1_decodes_correctly() {
// sh=1: sh[4:0]=1, sh[5]=0 → sh64() must return 1
let instr: DecodedInstr = decode(rldicl(3, 4, 1, 0, 0), 0);
assert_eq!(instr.sh64(), 1, "sh64 must return 1 for sh=1");
}
#[test]
fn sh64_shift_63_decodes_correctly() {
// sh=63: sh[4:0]=31=0x1F, sh[5]=1 → sh64() must return 63
let instr: DecodedInstr = decode(rldicl(3, 4, 63, 0, 0), 0);
assert_eq!(instr.sh64(), 63, "sh64 must return 63 for sh=63");
}
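
The three `sh64` tests above depend on how a 6-bit shift amount is split across the instruction word. The following is a minimal sketch of that split, assuming the standard MD-form layout (sh[4:0] at PPC bits 16-20, sh[5] at PPC bit 30). `encode_rldicl` and the free function `sh64` are hypothetical stand-ins written for illustration; they are not the project's own `rldicl` helper or `DecodedInstr::sh64`.

```rust
/// Hypothetical MD-form encoder for `rldicl rA, rS, sh, mb` (primary opcode 30, XO = 0).
fn encode_rldicl(ra: u32, rs: u32, sh: u32, mb: u32, rc: bool) -> u32 {
    // The 6-bit mb field is stored as its low five bits followed by its high bit.
    let mb_field = ((mb & 0x1F) << 1) | ((mb >> 5) & 1);
    (30u32 << 26)
        | ((rs & 0x1F) << 21)
        | ((ra & 0x1F) << 16)
        | ((sh & 0x1F) << 11) // sh[4:0] at PPC bits 16-20
        | (mb_field << 5) // mb at PPC bits 21-26
        | (((sh >> 5) & 1) << 1) // sh[5] at PPC bit 30
        | (rc as u32)
}

/// Reassemble the 6-bit shift amount from its two fields.
fn sh64(raw: u32) -> u32 {
    ((raw >> 11) & 0x1F) | (((raw >> 1) & 1) << 5)
}

#[test]
fn sh64_round_trips_all_shift_amounts() {
    for sh in 0..64 {
        assert_eq!(sh64(encode_rldicl(3, 4, sh, 0, false)), sh);
    }
}
```

Under these assumptions the encoder reproduces the golden `raw` values for the `clrldi`, `srdi` and `rotldi` rows, which is a quick cross-check that the field positions are right.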


@@ -0,0 +1,571 @@
{
"rows": [
{
"label": "add r3,r4,r5",
"raw": "0x7C642A14",
"addr": "0x82000000",
"mnemonic": "add",
"operands": "r3, r4, r5"
},
{
"label": "add. r3,r4,r5",
"raw": "0x7C642A15",
"addr": "0x82000000",
"mnemonic": "add.",
"operands": "r3, r4, r5"
},
{
"label": "addo r3,r4,r5",
"raw": "0x7C642E14",
"addr": "0x82000000",
"mnemonic": "addo",
"operands": "r3, r4, r5"
},
{
"label": "addo. r3,r4,r5",
"raw": "0x7C642E15",
"addr": "0x82000000",
"mnemonic": "addo.",
"operands": "r3, r4, r5"
},
{
"label": "neg r3,r4",
"raw": "0x7C6400D0",
"addr": "0x82000000",
"mnemonic": "neg",
"operands": "r3, r4"
},
{
"label": "mullw r3,r4,r5",
"raw": "0x7C6429D6",
"addr": "0x82000000",
"mnemonic": "mullw",
"operands": "r3, r4, r5"
},
{
"label": "divw r3,r4,r5",
"raw": "0x7C642BD6",
"addr": "0x82000000",
"mnemonic": "divw",
"operands": "r3, r4, r5"
},
{
"label": "mulhw. r3,r4,r5",
"raw": "0x7C642897",
"addr": "0x82000000",
"mnemonic": "mulhw.",
"operands": "r3, r4, r5"
},
{
"label": "mulhwu. r3,r4,r5",
"raw": "0x7C642817",
"addr": "0x82000000",
"mnemonic": "mulhwu.",
"operands": "r3, r4, r5"
},
{
"label": "mulld r3,r4,r5",
"raw": "0x7C6429D2",
"addr": "0x82000000",
"mnemonic": "mulld",
"operands": "r3, r4, r5"
},
{
"label": "and r3,r4,r5",
"raw": "0x7C832838",
"addr": "0x82000000",
"mnemonic": "and",
"operands": "r3, r4, r5"
},
{
"label": "or r3,r4,r5 (non-mr: rs!=rb)",
"raw": "0x7C832B78",
"addr": "0x82000000",
"mnemonic": "or",
"operands": "r3, r4, r5"
},
{
"label": "xor r3,r4,r5",
"raw": "0x7C832A78",
"addr": "0x82000000",
"mnemonic": "xor",
"operands": "r3, r4, r5"
},
{
"label": "nor r3,r4,r5",
"raw": "0x7C8328F8",
"addr": "0x82000000",
"mnemonic": "nor",
"operands": "r3, r4, r5"
},
{
"label": "nand r3,r4,r5",
"raw": "0x7C832BB8",
"addr": "0x82000000",
"mnemonic": "nand",
"operands": "r3, r4, r5"
},
{
"label": "eqv r3,r4,r5",
"raw": "0x7C832A38",
"addr": "0x82000000",
"mnemonic": "eqv",
"operands": "r3, r4, r5"
},
{
"label": "andc r3,r4,r5",
"raw": "0x7C832878",
"addr": "0x82000000",
"mnemonic": "andc",
"operands": "r3, r4, r5"
},
{
"label": "orc r3,r4,r5",
"raw": "0x7C832B38",
"addr": "0x82000000",
"mnemonic": "orc",
"operands": "r3, r4, r5"
},
{
"label": "slw r3,r4,r5",
"raw": "0x7C832830",
"addr": "0x82000000",
"mnemonic": "slw",
"operands": "r3, r4, r5"
},
{
"label": "srw r3,r4,r5",
"raw": "0x7C832C30",
"addr": "0x82000000",
"mnemonic": "srw",
"operands": "r3, r4, r5"
},
{
"label": "sraw r3,r4,r5",
"raw": "0x7C832E30",
"addr": "0x82000000",
"mnemonic": "sraw",
"operands": "r3, r4, r5"
},
{
"label": "sld r3,r4,r5",
"raw": "0x7C832836",
"addr": "0x82000000",
"mnemonic": "sld",
"operands": "r3, r4, r5"
},
{
"label": "srd r3,r4,r5",
"raw": "0x7C832C36",
"addr": "0x82000000",
"mnemonic": "srd",
"operands": "r3, r4, r5"
},
{
"label": "srawi r3,r4,16",
"raw": "0x7C838670",
"addr": "0x82000000",
"mnemonic": "srawi",
"operands": "r3, r4, 16"
},
{
"label": "stwcx. r3,r4,r5",
"raw": "0x7C64292D",
"addr": "0x82000000",
"mnemonic": "stwcx.",
"operands": "r3, r4, r5"
},
{
"label": "stdcx. r3,r4,r5",
"raw": "0x7C6429AD",
"addr": "0x82000000",
"mnemonic": "stdcx.",
"operands": "r3, r4, r5"
},
{
"label": "lwarx r3,r4,r5",
"raw": "0x7C642828",
"addr": "0x82000000",
"mnemonic": "lwarx",
"operands": "r3, r4, r5"
},
{
"label": "ldarx r3,r4,r5",
"raw": "0x7C6428A8",
"addr": "0x82000000",
"mnemonic": "ldarx",
"operands": "r3, r4, r5"
},
{
"label": "cmpwi cr0, r3, 16",
"raw": "0x2C030010",
"addr": "0x82000000",
"mnemonic": "cmpi",
"operands": "0, r3, 16",
"ext_mnemonic": "cmpwi",
"ext_operands": "r3, 16"
},
{
"label": "cmpwi cr2, r3, 16",
"raw": "0x2D030010",
"addr": "0x82000000",
"mnemonic": "cmpi",
"operands": "cr2, 0, r3, 16",
"ext_mnemonic": "cmpwi",
"ext_operands": "cr2, r3, 16"
},
{
"label": "cmplwi cr0, r3, 16",
"raw": "0x28030010",
"addr": "0x82000000",
"mnemonic": "cmpli",
"operands": "0, r3, 0x10",
"ext_mnemonic": "cmplwi",
"ext_operands": "r3, 0x10"
},
{
"label": "cmpw r3,r4 in cr0",
"raw": "0x7C032000",
"addr": "0x82000000",
"mnemonic": "cmp",
"operands": "0, r3, r4",
"ext_mnemonic": "cmpw",
"ext_operands": "r3, r4"
},
{
"label": "cmpd r3,r4",
"raw": "0x7C232000",
"addr": "0x82000000",
"mnemonic": "cmp",
"operands": "1, r3, r4",
"ext_mnemonic": "cmpd",
"ext_operands": "r3, r4"
},
{
"label": "cmplw r3,r4",
"raw": "0x7C032040",
"addr": "0x82000000",
"mnemonic": "cmpl",
"operands": "0, r3, r4",
"ext_mnemonic": "cmplw",
"ext_operands": "r3, r4"
},
{
"label": "addi r3, r1, 16",
"raw": "0x38610010",
"addr": "0x82000000",
"mnemonic": "addi",
"operands": "r3, r1, 16"
},
{
"label": "addis r3, r1, 0x100 (ra!=0)",
"raw": "0x3C610100",
"addr": "0x82000000",
"mnemonic": "addis",
"operands": "r3, r1, 0x100"
},
{
"label": "mulli r3, r4, 5",
"raw": "0x1C640005",
"addr": "0x82000000",
"mnemonic": "mulli",
"operands": "r3, r4, 5"
},
{
"label": "subfic r3, r4, 5",
"raw": "0x20640005",
"addr": "0x82000000",
"mnemonic": "subfic",
"operands": "r3, r4, 5"
},
{
"label": "addic r3, r4, 16",
"raw": "0x30640010",
"addr": "0x82000000",
"mnemonic": "addic",
"operands": "r3, r4, 16"
},
{
"label": "addic. r3, r4, 16",
"raw": "0x34640010",
"addr": "0x82000000",
"mnemonic": "addic.",
"operands": "r3, r4, 16"
},
{
"label": "ori r4, r3, 0x10 (non-nop)",
"raw": "0x60640010",
"addr": "0x82000000",
"mnemonic": "ori",
"operands": "r4, r3, 0x10"
},
{
"label": "oris r4, r3, 0x10",
"raw": "0x64640010",
"addr": "0x82000000",
"mnemonic": "oris",
"operands": "r4, r3, 0x10"
},
{
"label": "xori r4, r3, 0x10",
"raw": "0x68640010",
"addr": "0x82000000",
"mnemonic": "xori",
"operands": "r4, r3, 0x10"
},
{
"label": "andi. r4, r3, 0x10",
"raw": "0x70640010",
"addr": "0x82000000",
"mnemonic": "andi.",
"operands": "r4, r3, 0x10"
},
{
"label": "lwz r5, 0x20(r1)",
"raw": "0x80A10020",
"addr": "0x82000000",
"mnemonic": "lwz",
"operands": "r5, 32(r1)"
},
{
"label": "stw r5, 0x20(r1)",
"raw": "0x90A10020",
"addr": "0x82000000",
"mnemonic": "stw",
"operands": "r5, 32(r1)"
},
{
"label": "lbz r5, 0x20(r1)",
"raw": "0x88A10020",
"addr": "0x82000000",
"mnemonic": "lbz",
"operands": "r5, 32(r1)"
},
{
"label": "lhz r5, 0x20(r1)",
"raw": "0xA0A10020",
"addr": "0x82000000",
"mnemonic": "lhz",
"operands": "r5, 32(r1)"
},
{
"label": "lfs f5, 0x20(r1)",
"raw": "0xC0A10020",
"addr": "0x82000000",
"mnemonic": "lfs",
"operands": "f5, 32(r1)"
},
{
"label": "lfd f5, 0x20(r1)",
"raw": "0xC8A10020",
"addr": "0x82000000",
"mnemonic": "lfd",
"operands": "f5, 32(r1)"
},
{
"label": "stfd f5, 0x20(r1)",
"raw": "0xD8A10020",
"addr": "0x82000000",
"mnemonic": "stfd",
"operands": "f5, 32(r1)"
},
{
"label": "ld r5, 0x20(r1)",
"raw": "0xE8A10020",
"addr": "0x82000000",
"mnemonic": "ld",
"operands": "r5, 32(r1)"
},
{
"label": "std r5, 0x20(r1)",
"raw": "0xF8A10020",
"addr": "0x82000000",
"mnemonic": "std",
"operands": "r5, 32(r1)"
},
{
"label": "sync 0 (extends to sync)",
"raw": "0x7C0004AC",
"addr": "0x82000000",
"mnemonic": "sync",
"operands": ""
},
{
"label": "isync",
"raw": "0x4C00012C",
"addr": "0x82000000",
"mnemonic": "isync",
"operands": ""
},
{
"label": "eieio",
"raw": "0x7C0006AC",
"addr": "0x82000000",
"mnemonic": "eieio",
"operands": ""
},
{
"label": "dcbst r1, r2",
"raw": "0x7C01106C",
"addr": "0x82000000",
"mnemonic": "dcbst",
"operands": "r1, r2"
},
{
"label": "dcbf r1, r2",
"raw": "0x7C0110AC",
"addr": "0x82000000",
"mnemonic": "dcbf",
"operands": "r1, r2"
},
{
"label": "dcbt r1, r2",
"raw": "0x7C01122C",
"addr": "0x82000000",
"mnemonic": "dcbt",
"operands": "r1, r2"
},
{
"label": "dcbz r1, r2",
"raw": "0x7C0117EC",
"addr": "0x82000000",
"mnemonic": "dcbz",
"operands": "r1, r2"
},
{
"label": "dcbz128 r1, r2",
"raw": "0x7C2117EC",
"addr": "0x82000000",
"mnemonic": "dcbz128",
"operands": "r1, r2"
},
{
"label": "crnor 4,5,6 (no simplify)",
"raw": "0x4C853042",
"addr": "0x82000000",
"mnemonic": "crnor",
"operands": "4*cr1+lt, 4*cr1+gt, 4*cr1+eq"
},
{
"label": "crand 4,5,6",
"raw": "0x4C853202",
"addr": "0x82000000",
"mnemonic": "crand",
"operands": "4*cr1+lt, 4*cr1+gt, 4*cr1+eq"
},
{
"label": "cror 4,5,6 (no simplify)",
"raw": "0x4C853382",
"addr": "0x82000000",
"mnemonic": "cror",
"operands": "4*cr1+lt, 4*cr1+gt, 4*cr1+eq"
},
{
"label": "tw 11, r3, r4 (uncommon TO)",
"raw": "0x7D632008",
"addr": "0x82000000",
"mnemonic": "tw",
"operands": "11, r3, r4"
},
{
"label": "tdi 11, r3, 123",
"raw": "0x0963007B",
"addr": "0x82000000",
"mnemonic": "tdi",
"operands": "11, r3, 123"
},
{
"label": "mtcrf 0xFF, r5 → mtcr",
"raw": "0x7CAFF120",
"addr": "0x82000000",
"mnemonic": "mtcrf",
"operands": "0xFF, r5",
"ext_mnemonic": "mtcr",
"ext_operands": "r5"
},
{
"label": "mfcr r5",
"raw": "0x7CA00026",
"addr": "0x82000000",
"mnemonic": "mfcr",
"operands": "r5"
},
{
"label": "mfmsr r5",
"raw": "0x7CA000A6",
"addr": "0x82000000",
"mnemonic": "mfmsr",
"operands": "r5"
},
{
"label": "mtmsr r5",
"raw": "0x7CA00124",
"addr": "0x82000000",
"mnemonic": "mtmsr",
"operands": "r5"
},
{
"label": "mtmsrd r5",
"raw": "0x7CA00164",
"addr": "0x82000000",
"mnemonic": "mtmsrd",
"operands": "r5"
},
{
"label": "fadd f3, f4, f5",
"raw": "0xFC64282A",
"addr": "0x82000000",
"mnemonic": "fadd",
"operands": "f3, f4, f5"
},
{
"label": "fsub f3, f4, f5",
"raw": "0xFC642828",
"addr": "0x82000000",
"mnemonic": "fsub",
"operands": "f3, f4, f5"
},
{
"label": "fdiv f3, f4, f5",
"raw": "0xFC642824",
"addr": "0x82000000",
"mnemonic": "fdiv",
"operands": "f3, f4, f5"
},
{
"label": "fmul f3, f0, f5 (encoded)",
"raw": "0xFCE02832",
"addr": "0x82000000",
"mnemonic": "fmul",
"operands": "f7, f0, f0"
},
{
"label": "fneg f3, f4",
"raw": "0xFC640050",
"addr": "0x82000000",
"mnemonic": "fneg",
"operands": "f3, f0"
},
{
"label": "fmr f3, f4",
"raw": "0xFC640090",
"addr": "0x82000000",
"mnemonic": "fmr",
"operands": "f3, f0"
},
{
"label": "mtfsf 0xFF, f5 (Rc=0)",
"raw": "0xFDFE2D8E",
"addr": "0x82000000",
"mnemonic": "mtfsf",
"operands": "0xFF, f5"
},
{
"label": "mtfsf. 0xFF, f5 (Rc=1)",
"raw": "0xFDFE2D8F",
"addr": "0x82000000",
"mnemonic": "mtfsf.",
"operands": "0xFF, f5"
}
]
}
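
The `sync` and `eieio` rows above, together with the `lwsync` row in the extended-mnemonic file, all hinge on the X-form extended opcode and the L field. The following is a minimal sketch of that classification, assuming the L field sits at PPC bits 9-10 (host bits 21-22) and that XO=598 covers all three `sync` flavours, as the commit message states. `sync_flavor` is a hypothetical helper, not a function from this codebase.

```rust
/// Classify a word as one of the `sync` flavours, or `None` if it is not a
/// primary-opcode-31, XO=598 instruction. Hypothetical helper for illustration.
fn sync_flavor(raw: u32) -> Option<&'static str> {
    let primary = raw >> 26;
    let xo = (raw >> 1) & 0x3FF;
    if primary != 31 || xo != 598 {
        return None;
    }
    match (raw >> 21) & 0x3 {
        0 => Some("sync"),
        1 => Some("lwsync"),
        2 => Some("ptesync"),
        _ => None, // L=3 is reserved
    }
}

#[test]
fn sync_family_shares_one_extended_opcode() {
    assert_eq!(sync_flavor(0x7C0004AC), Some("sync"));
    assert_eq!(sync_flavor(0x7C2004AC), Some("lwsync"));
    assert_eq!(sync_flavor(0x7C0006AC), None); // eieio has XO=854
}
```

A decoder that keys on XO alone maps all three words to one opcode; the L field only matters when choosing the printed mnemonic.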


@@ -0,0 +1,623 @@
{
"rows": [
{
"label": "nop",
"raw": "0x60000000",
"addr": "0x82000000",
"mnemonic": "ori",
"operands": "r0, r0, 0x0",
"ext_mnemonic": "nop",
"ext_operands": ""
},
{
"label": "li r3, 16",
"raw": "0x38600010",
"addr": "0x82000000",
"mnemonic": "addi",
"operands": "r3, r0, 16",
"ext_mnemonic": "li",
"ext_operands": "r3, 16"
},
{
"label": "li r3, -1",
"raw": "0x3860FFFF",
"addr": "0x82000000",
"mnemonic": "addi",
"operands": "r3, r0, -1",
"ext_mnemonic": "li",
"ext_operands": "r3, -1"
},
{
"label": "subi r3, r4, 16",
"raw": "0x3864FFF0",
"addr": "0x82000000",
"mnemonic": "addi",
"operands": "r3, r4, -16",
"ext_mnemonic": "subi",
"ext_operands": "r3, r4, 16"
},
{
"label": "lis r3, 0x1234",
"raw": "0x3C601234",
"addr": "0x82000000",
"mnemonic": "addis",
"operands": "r3, r0, 0x1234",
"ext_mnemonic": "lis",
"ext_operands": "r3, 0x1234"
},
{
"label": "subis r3, r4, 0xFFFF",
"raw": "0x3C64FFFF",
"addr": "0x82000000",
"mnemonic": "addis",
"operands": "r3, r4, 0xFFFF",
"ext_mnemonic": "subis",
"ext_operands": "r3, r4, 0x1"
},
{
"label": "mr r3, r4",
"raw": "0x7C832378",
"addr": "0x82000000",
"mnemonic": "or",
"operands": "r3, r4, r4",
"ext_mnemonic": "mr",
"ext_operands": "r3, r4"
},
{
"label": "mr. r3, r4",
"raw": "0x7C832379",
"addr": "0x82000000",
"mnemonic": "or.",
"operands": "r3, r4, r4",
"ext_mnemonic": "mr.",
"ext_operands": "r3, r4"
},
{
"label": "mr (via and)",
"raw": "0x7C832038",
"addr": "0x82000000",
"mnemonic": "and",
"operands": "r3, r4, r4",
"ext_mnemonic": "mr",
"ext_operands": "r3, r4"
},
{
"label": "not r3, r4",
"raw": "0x7C8320F8",
"addr": "0x82000000",
"mnemonic": "nor",
"operands": "r3, r4, r4",
"ext_mnemonic": "not",
"ext_operands": "r3, r4"
},
{
"label": "subf → sub r3, r5, r4",
"raw": "0x7C642850",
"addr": "0x82000000",
"mnemonic": "subf",
"operands": "r3, r4, r5",
"ext_mnemonic": "sub",
"ext_operands": "r3, r5, r4"
},
{
"label": "slwi r3, r4, 4",
"raw": "0x54832036",
"addr": "0x82000000",
"mnemonic": "rlwinm",
"operands": "r3, r4, 4, 0, 27",
"ext_mnemonic": "slwi",
"ext_operands": "r3, r4, 4"
},
{
"label": "srwi r3, r4, 4",
"raw": "0x5483E13E",
"addr": "0x82000000",
"mnemonic": "rlwinm",
"operands": "r3, r4, 28, 4, 31",
"ext_mnemonic": "srwi",
"ext_operands": "r3, r4, 4"
},
{
"label": "rotlwi r3, r4, 8",
"raw": "0x5483403E",
"addr": "0x82000000",
"mnemonic": "rlwinm",
"operands": "r3, r4, 8, 0, 31",
"ext_mnemonic": "rotlwi",
"ext_operands": "r3, r4, 8"
},
{
"label": "clrlwi r3, r4, 4",
"raw": "0x5483013E",
"addr": "0x82000000",
"mnemonic": "rlwinm",
"operands": "r3, r4, 0, 4, 31",
"ext_mnemonic": "clrlwi",
"ext_operands": "r3, r4, 4"
},
{
"label": "clrrwi r3, r4, 4",
"raw": "0x54830036",
"addr": "0x82000000",
"mnemonic": "rlwinm",
"operands": "r3, r4, 0, 0, 27",
"ext_mnemonic": "clrrwi",
"ext_operands": "r3, r4, 4"
},
{
"label": "extlwi r3, r4, 8, 8",
"raw": "0x5483400E",
"addr": "0x82000000",
"mnemonic": "rlwinm",
"operands": "r3, r4, 8, 0, 7",
"ext_mnemonic": "extlwi",
"ext_operands": "r3, r4, 8, 8"
},
{
"label": "slwi. r3, r4, 4",
"raw": "0x54832037",
"addr": "0x82000000",
"mnemonic": "rlwinm.",
"operands": "r3, r4, 4, 0, 27",
"ext_mnemonic": "slwi.",
"ext_operands": "r3, r4, 4"
},
{
"label": "rlwinm. r11,r11,0,31,31 (no simplify)",
"raw": "0x556B07FF",
"addr": "0x82000000",
"mnemonic": "rlwinm.",
"operands": "r11, r11, 0, 31, 31",
"ext_mnemonic": "clrlwi.",
"ext_operands": "r11, r11, 31"
},
{
"label": "clrldi r3, r4, 32",
"raw": "0x78830020",
"addr": "0x82000000",
"mnemonic": "rldicl",
"operands": "r3, r4, 0, 32",
"ext_mnemonic": "clrldi",
"ext_operands": "r3, r4, 32"
},
{
"label": "srdi r3, r4, 8",
"raw": "0x7883C202",
"addr": "0x82000000",
"mnemonic": "rldicl",
"operands": "r3, r4, 56, 8",
"ext_mnemonic": "srdi",
"ext_operands": "r3, r4, 8"
},
{
"label": "rotldi r3, r4, 8",
"raw": "0x78834000",
"addr": "0x82000000",
"mnemonic": "rldicl",
"operands": "r3, r4, 8, 0",
"ext_mnemonic": "rotldi",
"ext_operands": "r3, r4, 8"
},
{
"label": "cmpwi cr0, r3, 16",
"raw": "0x2C030010",
"addr": "0x82000000",
"mnemonic": "cmpi",
"operands": "0, r3, 16",
"ext_mnemonic": "cmpwi",
"ext_operands": "r3, 16"
},
{
"label": "cmpdi (L=1) variant",
"raw": "0x2C230010",
"addr": "0x82000000",
"mnemonic": "cmpi",
"operands": "1, r3, 16",
"ext_mnemonic": "cmpdi",
"ext_operands": "r3, 16"
},
{
"label": "blr",
"raw": "0x4E800020",
"addr": "0x82000000",
"mnemonic": "bclr",
"operands": "20, lt",
"ext_mnemonic": "blr",
"ext_operands": ""
},
{
"label": "blrl",
"raw": "0x4E800021",
"addr": "0x82000000",
"mnemonic": "bclrl",
"operands": "20, lt",
"ext_mnemonic": "blrl",
"ext_operands": ""
},
{
"label": "bctr",
"raw": "0x4E800420",
"addr": "0x82000000",
"mnemonic": "bcctr",
"operands": "20, lt",
"ext_mnemonic": "bctr",
"ext_operands": ""
},
{
"label": "bctrl",
"raw": "0x4E800421",
"addr": "0x82000000",
"mnemonic": "bcctrl",
"operands": "20, lt",
"ext_mnemonic": "bctrl",
"ext_operands": ""
},
{
"label": "beqlr (BO=12, BI=2 → cr0.eq true)",
"raw": "0x4D820020",
"addr": "0x82000000",
"mnemonic": "bclr",
"operands": "12, eq",
"ext_mnemonic": "beqlr",
"ext_operands": ""
},
{
"label": "bnelr",
"raw": "0x4C820020",
"addr": "0x82000000",
"mnemonic": "bclr",
"operands": "4, eq",
"ext_mnemonic": "bnelr",
"ext_operands": ""
},
{
"label": "bc → b 0x82000040",
"raw": "0x42800040",
"addr": "0x82000000",
"mnemonic": "bc",
"operands": "20, lt, 0x82000040",
"ext_mnemonic": "b",
"ext_operands": "0x82000040",
"branch_target": "0x82000040"
},
{
"label": "bc l → bl 0x82000040",
"raw": "0x42800041",
"addr": "0x82000000",
"mnemonic": "bcl",
"operands": "20, lt, 0x82000040",
"ext_mnemonic": "bl",
"ext_operands": "0x82000040",
"branch_target": "0x82000040"
},
{
"label": "bc 12,cr0.eq → beq 0x82000040",
"raw": "0x41820040",
"addr": "0x82000000",
"mnemonic": "bc",
"operands": "12, eq, 0x82000040",
"ext_mnemonic": "beq",
"ext_operands": "0x82000040",
"branch_target": "0x82000040"
},
{
"label": "bc 4,cr0.eq → bne 0x82000040",
"raw": "0x40820040",
"addr": "0x82000000",
"mnemonic": "bc",
"operands": "4, eq, 0x82000040",
"ext_mnemonic": "bne",
"ext_operands": "0x82000040",
"branch_target": "0x82000040"
},
{
"label": "bc 12,cr0.lt → blt 0x82000040",
"raw": "0x41800040",
"addr": "0x82000000",
"mnemonic": "bc",
"operands": "12, lt, 0x82000040",
"ext_mnemonic": "blt",
"ext_operands": "0x82000040",
"branch_target": "0x82000040"
},
{
"label": "bc 4,cr0.lt → bge 0x82000040",
"raw": "0x40800040",
"addr": "0x82000000",
"mnemonic": "bc",
"operands": "4, lt, 0x82000040",
"ext_mnemonic": "bge",
"ext_operands": "0x82000040",
"branch_target": "0x82000040"
},
{
"label": "bc 12,cr0.gt → bgt 0x82000040",
"raw": "0x41810040",
"addr": "0x82000000",
"mnemonic": "bc",
"operands": "12, gt, 0x82000040",
"ext_mnemonic": "bgt",
"ext_operands": "0x82000040",
"branch_target": "0x82000040"
},
{
"label": "bc 4,cr0.gt → ble 0x82000040",
"raw": "0x40810040",
"addr": "0x82000000",
"mnemonic": "bc",
"operands": "4, gt, 0x82000040",
"ext_mnemonic": "ble",
"ext_operands": "0x82000040",
"branch_target": "0x82000040"
},
{
"label": "bc 12, cr2.eq → beq cr2, 0x...040",
"raw": "0x418A0040",
"addr": "0x82000000",
"mnemonic": "bc",
"operands": "12, 4*cr2+eq, 0x82000040",
"ext_mnemonic": "beq",
"ext_operands": "cr2, 0x82000040",
"branch_target": "0x82000040"
},
{
"label": "bdnz 0x82000040",
"raw": "0x42000040",
"addr": "0x82000000",
"mnemonic": "bc",
"operands": "16, lt, 0x82000040",
"ext_mnemonic": "bdnz",
"ext_operands": "0x82000040",
"branch_target": "0x82000040"
},
{
"label": "bdz 0x82000040",
"raw": "0x42400040",
"addr": "0x82000000",
"mnemonic": "bc",
"operands": "18, lt, 0x82000040",
"ext_mnemonic": "bdz",
"ext_operands": "0x82000040",
"branch_target": "0x82000040"
},
{
"label": "b +0x40 → 0x82000040",
"raw": "0x48000040",
"addr": "0x82000000",
"mnemonic": "b",
"operands": "0x82000040",
"branch_target": "0x82000040"
},
{
"label": "bl +0x40 → 0x82000040",
"raw": "0x48000041",
"addr": "0x82000000",
"mnemonic": "bl",
"operands": "0x82000040",
"branch_target": "0x82000040"
},
{
"label": "ba 0x40 absolute",
"raw": "0x48000042",
"addr": "0x82000000",
"mnemonic": "ba",
"operands": "0x00000040",
"branch_target": "0x00000040"
},
{
"label": "bla 0x40 absolute",
"raw": "0x48000043",
"addr": "0x82000000",
"mnemonic": "bla",
"operands": "0x00000040",
"branch_target": "0x00000040"
},
{
"label": "tdeqi r3, 123",
"raw": "0x0883007B",
"addr": "0x82000000",
"mnemonic": "tdi",
"operands": "4, r3, 123",
"ext_mnemonic": "tdeqi",
"ext_operands": "r3, 123"
},
{
"label": "twlti r3, 123",
"raw": "0x0E03007B",
"addr": "0x82000000",
"mnemonic": "twi",
"operands": "16, r3, 123",
"ext_mnemonic": "twlti",
"ext_operands": "r3, 123"
},
{
"label": "mflr r3",
"raw": "0x7C6802A6",
"addr": "0x82000000",
"mnemonic": "mfspr",
"operands": "r3, LR",
"ext_mnemonic": "mflr",
"ext_operands": "r3"
},
{
"label": "mfctr r3",
"raw": "0x7C6902A6",
"addr": "0x82000000",
"mnemonic": "mfspr",
"operands": "r3, CTR",
"ext_mnemonic": "mfctr",
"ext_operands": "r3"
},
{
"label": "mfxer r3",
"raw": "0x7C6102A6",
"addr": "0x82000000",
"mnemonic": "mfspr",
"operands": "r3, XER",
"ext_mnemonic": "mfxer",
"ext_operands": "r3"
},
{
"label": "mtlr r3",
"raw": "0x7C6803A6",
"addr": "0x82000000",
"mnemonic": "mtspr",
"operands": "LR, r3",
"ext_mnemonic": "mtlr",
"ext_operands": "r3"
},
{
"label": "mtctr r3",
"raw": "0x7C6903A6",
"addr": "0x82000000",
"mnemonic": "mtspr",
"operands": "CTR, r3",
"ext_mnemonic": "mtctr",
"ext_operands": "r3"
},
{
"label": "mtxer r3",
"raw": "0x7C6103A6",
"addr": "0x82000000",
"mnemonic": "mtspr",
"operands": "XER, r3",
"ext_mnemonic": "mtxer",
"ext_operands": "r3"
},
{
"label": "crnot 4, 5",
"raw": "0x4C852842",
"addr": "0x82000000",
"mnemonic": "crnor",
"operands": "4*cr1+lt, 4*cr1+gt, 4*cr1+gt",
"ext_mnemonic": "crnot",
"ext_operands": "4*cr1+lt, 4*cr1+gt"
},
{
"label": "crclr 4",
"raw": "0x4C842182",
"addr": "0x82000000",
"mnemonic": "crxor",
"operands": "4*cr1+lt, 4*cr1+lt, 4*cr1+lt",
"ext_mnemonic": "crclr",
"ext_operands": "4*cr1+lt"
},
{
"label": "crset 4",
"raw": "0x4C842242",
"addr": "0x82000000",
"mnemonic": "creqv",
"operands": "4*cr1+lt, 4*cr1+lt, 4*cr1+lt",
"ext_mnemonic": "crset",
"ext_operands": "4*cr1+lt"
},
{
"label": "crmove 4, 5",
"raw": "0x4C852B82",
"addr": "0x82000000",
"mnemonic": "cror",
"operands": "4*cr1+lt, 4*cr1+gt, 4*cr1+gt",
"ext_mnemonic": "crmove",
"ext_operands": "4*cr1+lt, 4*cr1+gt"
},
{
"label": "lwsync",
"raw": "0x7C2004AC",
"addr": "0x82000000",
"mnemonic": "sync",
"operands": "",
"ext_mnemonic": "lwsync",
"ext_operands": ""
},
{
"label": "trap",
"raw": "0x7FE00008",
"addr": "0x82000000",
"mnemonic": "tw",
"operands": "31, r0, r0",
"ext_mnemonic": "trap",
"ext_operands": ""
},
{
"label": "blr (BO=20, BI=4 — BI is don't-care)",
"raw": "0x4E840020",
"addr": "0x82000000",
"mnemonic": "bclr",
"operands": "20, 4*cr1+lt",
"ext_mnemonic": "blr",
"ext_operands": ""
},
{
"label": "blrl (BO=20, BI=7)",
"raw": "0x4E870021",
"addr": "0x82000000",
"mnemonic": "bclrl",
"operands": "20, 4*cr1+so",
"ext_mnemonic": "blrl",
"ext_operands": ""
},
{
"label": "bctr (BO=20, BI=4)",
"raw": "0x4E840420",
"addr": "0x82000000",
"mnemonic": "bcctr",
"operands": "20, 4*cr1+lt",
"ext_mnemonic": "bctr",
"ext_operands": ""
},
{
"label": "twllt r3, r4 (TO=2)",
"raw": "0x7C432008",
"addr": "0x82000000",
"mnemonic": "tw",
"operands": "2, r3, r4",
"ext_mnemonic": "twllt",
"ext_operands": "r3, r4"
},
{
"label": "twlgt r3, r4 (TO=1)",
"raw": "0x7C232008",
"addr": "0x82000000",
"mnemonic": "tw",
"operands": "1, r3, r4",
"ext_mnemonic": "twlgt",
"ext_operands": "r3, r4"
},
{
"label": "tdlge r3, r4 (TO=5)",
"raw": "0x7CA32088",
"addr": "0x82000000",
"mnemonic": "td",
"operands": "5, r3, r4",
"ext_mnemonic": "tdlge",
"ext_operands": "r3, r4"
},
{
"label": "twlle r3, r4 (TO=6)",
"raw": "0x7CC32008",
"addr": "0x82000000",
"mnemonic": "tw",
"operands": "6, r3, r4",
"ext_mnemonic": "twlle",
"ext_operands": "r3, r4"
},
{
"label": "twllti r3, 16",
"raw": "0x0C430010",
"addr": "0x82000000",
"mnemonic": "twi",
"operands": "2, r3, 16",
"ext_mnemonic": "twllti",
"ext_operands": "r3, 16"
},
{
"label": "tdlgei r3, 16",
"raw": "0x08A30010",
"addr": "0x82000000",
"mnemonic": "tdi",
"operands": "5, r3, 16",
"ext_mnemonic": "tdlgei",
"ext_operands": "r3, 16"
}
]
}
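
The `branch_target` values in the rows above follow from the I-form and B-form displacement fields and the AA bit. The following is a minimal sketch of that computation, assuming the standard layouts (a 24-bit LI field for primary opcode 18, a 14-bit BD field for primary opcode 16, both followed by two implicit zero bits). `branch_target` and `sign_extend` are hypothetical helpers, not the disassembler's own code.

```rust
/// Sign-extend the low `bits` bits of `value`.
fn sign_extend(value: u32, bits: u32) -> i32 {
    let shift = 32 - bits;
    ((value << shift) as i32) >> shift
}

/// Target of a direct branch (`b`/`bc` families), or `None` for any other word.
/// Hypothetical helper for illustration.
fn branch_target(raw: u32, addr: u32) -> Option<u32> {
    let disp = match raw >> 26 {
        18 => sign_extend(raw & 0x03FF_FFFC, 26), // I-form: LI || 0b00
        16 => sign_extend(raw & 0x0000_FFFC, 16), // B-form: BD || 0b00
        _ => return None,
    };
    // AA=1 makes the displacement an absolute address.
    let absolute = (raw >> 1) & 1 == 1;
    let base = if absolute { 0 } else { addr };
    Some(base.wrapping_add(disp as u32))
}

#[test]
fn relative_and_absolute_targets() {
    assert_eq!(branch_target(0x48000040, 0x8200_0000), Some(0x8200_0040)); // b
    assert_eq!(branch_target(0x48000042, 0x8200_0000), Some(0x0000_0040)); // ba
    assert_eq!(branch_target(0x41820040, 0x8200_0000), Some(0x8200_0040)); // beq
}
```

`bclr` and `bcctr` (primary opcode 19) take their target from LR or CTR, which is why those rows carry no `branch_target` and the sketch returns `None` for them.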


@@ -0,0 +1,137 @@
{
"rows": [
{
"label": "vaddubm v3, v4, v5",
"raw": "0x10642800",
"addr": "0x82000000",
"mnemonic": "vaddubm",
"operands": "v3, v4, v5"
},
{
"label": "vaddfp v3, v4, v5",
"raw": "0x1064280A",
"addr": "0x82000000",
"mnemonic": "vaddfp",
"operands": "v3, v4, v5"
},
{
"label": "vand v3, v4, v5",
"raw": "0x10642C04",
"addr": "0x82000000",
"mnemonic": "vand",
"operands": "v3, v4, v5"
},
{
"label": "vor v3, v4, v5",
"raw": "0x10642C84",
"addr": "0x82000000",
"mnemonic": "vor",
"operands": "v3, v4, v5"
},
{
"label": "vxor v3, v4, v5",
"raw": "0x10642CC4",
"addr": "0x82000000",
"mnemonic": "vxor",
"operands": "v3, v4, v5"
},
{
"label": "vsel v3,v4,v5,v6",
"raw": "0x106429AA",
"addr": "0x82000000",
"mnemonic": "vsel",
"operands": "v3, v4, v5, v6"
},
{
"label": "vperm v3,v4,v5,v6",
"raw": "0x106429AB",
"addr": "0x82000000",
"mnemonic": "vperm",
"operands": "v3, v4, v5, v6"
},
{
"label": "vmaddfp v3, v4, v6, v5 (swap)",
"raw": "0x106429AE",
"addr": "0x82000000",
"mnemonic": "vmaddfp",
"operands": "v3, v4, v6, v5"
},
{
"label": "mfvscr v3",
"raw": "0x10600604",
"addr": "0x82000000",
"mnemonic": "mfvscr",
"operands": "v3"
},
{
"label": "mtvscr v5",
"raw": "0x10002E44",
"addr": "0x82000000",
"mnemonic": "mtvscr",
"operands": "v5"
},
{
"label": "vperm128 v3, v4, v5, 0 (canary)",
"raw": "0x14642800",
"addr": "0x82000000",
"mnemonic": "vperm128",
"operands": "v3, v4, v5, 0"
},
{
"label": "vsrw128 v0, v0, v12 (canary, vd_hi=00)",
"raw": "0x180061D0",
"addr": "0x82000000",
"mnemonic": "vsrw128",
"operands": "v0, v0, v12"
},
{
"label": "vsrw128 v32, v0, v12 (canary, VD128h=01)",
"raw": "0x180061D4",
"addr": "0x82000000",
"mnemonic": "vsrw128",
"operands": "v32, v0, v12"
},
{
"label": "vpermwi128 v64, v12, 0xE4 (canary, VD128h=10)",
"raw": "0x180463D8",
"addr": "0x82000000",
"mnemonic": "vpermwi128",
"operands": "v64, v12, 0xE4"
},
{
"label": "vrlimi128 v96, v12, 4, 3 (canary, VD128h=11)",
"raw": "0x180467DC",
"addr": "0x82000000",
"mnemonic": "vrlimi128",
"operands": "v96, v12, 4, 3"
},
{
"label": "vrlimi128 v127, v95, 4, 3 (canary)",
"raw": "0x1BE4FFDE",
"addr": "0x82000000",
"mnemonic": "vrlimi128",
"operands": "v127, v95, 4, 3"
},
{
"label": "vmaddfp128 v3, v35, v5, v3",
"raw": "0x146328F0",
"addr": "0x82000000",
"mnemonic": "vmaddfp128",
"operands": "v3, v35, v5, v3"
},
{
"label": "vmaddcfp128 v3, v35, v3, v5",
"raw": "0x14632930",
"addr": "0x82000000",
"mnemonic": "vmaddcfp128",
"operands": "v3, v35, v3, v5"
},
{
"label": "vnmsubfp128 v3, v35, v3, v5",
"raw": "0x14632970",
"addr": "0x82000000",
"mnemonic": "vnmsubfp128",
"operands": "v3, v35, v3, v5"
}
]
}
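
The op=6 rows above exercise 7-bit VMX128 register numbers that are split between a 5-bit low field and a 2-bit high field. The following is a minimal sketch of the reassembly, assuming the layout described in the test comments (VD128h at host bits 2-3, VB128h at host bits 0-1, PERMl at host bits 16-20, PERMh at host bits 6-8). The function names are hypothetical and chosen for illustration.

```rust
/// 7-bit destination register: VD128l (host bits 21-25) | VD128h (host bits 2-3) << 5.
fn vd128(raw: u32) -> u32 {
    ((raw >> 21) & 0x1F) | (((raw >> 2) & 0x3) << 5)
}

/// 7-bit source register: VB128l (host bits 11-15) | VB128h (host bits 0-1) << 5.
fn vb128(raw: u32) -> u32 {
    ((raw >> 11) & 0x1F) | ((raw & 0x3) << 5)
}

/// 8-bit permute immediate of vpermwi128: PERMl (host bits 16-20) | PERMh (host bits 6-8) << 5.
fn vpermwi128_perm(raw: u32) -> u32 {
    ((raw >> 16) & 0x1F) | (((raw >> 6) & 0x7) << 5)
}

#[test]
fn golden_rows_decode_to_labelled_registers() {
    assert_eq!((vd128(0x180061D4), vb128(0x180061D4)), (32, 12)); // vsrw128 v32, v0, v12
    assert_eq!((vd128(0x1BE4FFDE), vb128(0x1BE4FFDE)), (127, 95)); // vrlimi128 v127, v95, 4, 3
    assert_eq!(vpermwi128_perm(0x180463D8), 0xE4); // vpermwi128 v64, v12, 0xE4
}
```

Because the high fields sit in the lowest four bits of the word, they do not collide with the secondary-opcode key bits, which matches the note in the test that `vd` can range freely over 0-127 for op=6.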


@@ -42,15 +42,30 @@ impl Debugger {
}
}
/// Tier-3 perf: a single check that the hot interpreter loop makes
/// before dispatching to [`pre_step`]/[`post_step`]. When the
/// debugger is in "cold run" mode (not paused, no break pending, no
/// breakpoints, `StepMode::Run`, in-memory trace off), neither hook
/// has any work to do, so the loop can skip the HashMap lookup,
/// step-mode match, and Vec maintenance entirely. The condition is
/// stable across millions of instructions, so the CPU's branch
/// predictor handles it well.
#[inline]
pub fn wants_hooks(&self) -> bool {
self.trace_enabled
|| self.paused
|| self.break_pending
|| !matches!(self.step_mode, StepMode::Run)
|| !self.breakpoints.is_empty()
}
/// Called before each instruction executes.
pub fn pre_step(&mut self, ctx: &PpcContext, _mem: &dyn MemoryAccess) {
// Check breakpoints
-if let Some(bp) = self.breakpoints.get(&ctx.pc) {
-if bp.enabled {
+if let Some(bp) = self.breakpoints.get(&ctx.pc)
+&& bp.enabled {
 self.break_pending = true;
 tracing::info!("Breakpoint hit at {:#010x}", ctx.pc);
 }
-}
 }
/// Called after each instruction executes.
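
The doc comment on `wants_hooks` describes a gate that the interpreter loop checks before calling the debugger hooks. The following is a minimal sketch of how such a loop might use it. Every type here is a trimmed-down, hypothetical stand-in: the real `Debugger`, `StepMode` and `PpcContext` carry more state, and `hook_calls` exists only to make the effect observable.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real debugger types.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum StepMode {
    Run,
    StepInto,
}

struct Breakpoint {
    enabled: bool,
}

struct Debugger {
    trace_enabled: bool,
    paused: bool,
    break_pending: bool,
    step_mode: StepMode,
    breakpoints: HashMap<u32, Breakpoint>,
    hook_calls: u64, // instrumentation for this sketch only
}

impl Debugger {
    fn cold() -> Self {
        Debugger {
            trace_enabled: false,
            paused: false,
            break_pending: false,
            step_mode: StepMode::Run,
            breakpoints: HashMap::new(),
            hook_calls: 0,
        }
    }

    #[inline]
    fn wants_hooks(&self) -> bool {
        self.trace_enabled
            || self.paused
            || self.break_pending
            || !matches!(self.step_mode, StepMode::Run)
            || !self.breakpoints.is_empty()
    }

    fn pre_step(&mut self, pc: u32) {
        self.hook_calls += 1;
        if matches!(self.breakpoints.get(&pc), Some(bp) if bp.enabled) {
            self.break_pending = true;
        }
    }
}

/// Toy interpreter loop: each "instruction" just advances the PC by 4.
fn run(dbg: &mut Debugger, start_pc: u32, max_steps: u32) -> u32 {
    let mut pc = start_pc;
    for _ in 0..max_steps {
        if dbg.wants_hooks() {
            dbg.pre_step(pc);
            if dbg.break_pending {
                break;
            }
        }
        pc = pc.wrapping_add(4);
    }
    pc
}

#[test]
fn cold_run_never_enters_the_hook() {
    let mut dbg = Debugger::cold();
    run(&mut dbg, 0x8200_0000, 1000);
    assert_eq!(dbg.hook_calls, 0);
}
```

The sketch uses a `matches!` guard rather than an `if let ... &&` chain so that it also builds on toolchains older than the one the diff targets.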

Some files were not shown because too many files have changed in this diff.