17 Commits

Author SHA1 Message Date
MechaCat02
93f60a3ba0 [iterate-2M] PCR+0x10C (PRCB.current_cpu): init per-HW-thread to unwedge spin-barrier
Ours never initialized the PRCB `current_cpu` byte at PCR+0x10C
(prcb_data@0x100 + current_cpu@0xC). Canary sets it from
`GetFakeCpuNumber(affinity)` (xthread.cc:847 `pcr->prcb_data.current_cpu =
cpu_index`), which equals the HW thread id ours already writes at PCR+0x2C.
Left unwritten it read 0 for every thread.

Guest spin-barrier `sub_824D1328` (used by the audio/update pump threads at
entries 0x824D2878 / 0x824D2940, ours tid 9 / tid 10) indexes a per-HW-thread
occupancy byte array via `lbz r11, 268(r13)` then `stbx ..., [array+index]`.
With index 0 for all threads, every thread marked slot 0; the multi-byte
rendezvous signature it then spins on (`ld [obj+0x164]` compared against the
packed per-slot expectation) could never assemble. Both pump threads busied at
pc 0x824d140c/0x824d1410 forever (Ready, 5M+ barrier iterations) and never ran
their `KeSetEvent` loops — so the events they signal (the 21k-per-thread
heartbeat in canary) never fired, starving the downstream worker handshake.

Fix: write `hw_id` to PCR+0x10C alongside PCR+0x2C in both the static thread
image init (thread.rs) and the dynamic PcrWriter (state.rs, used by scheduler
spawn + affinity migration) so the two stay in sync.
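
Illustrative sketch of the paired write (not the actual thread.rs / state.rs code). The offsets come from the text above; the helper name `write_pcr_cpu_ids`, the flat byte-slice view of the PCR, and treating PCR+0x2C as a single byte are assumptions made for the example:

```rust
/// Offsets taken from the commit text: HW thread id at PCR+0x2C,
/// PRCB at PCR+0x100, `current_cpu` at PRCB+0xC (i.e. PCR+0x10C).
const PCR_HW_ID: usize = 0x2C;
const PCR_PRCB_DATA: usize = 0x100;
const PRCB_CURRENT_CPU: usize = 0xC;

/// Hypothetical helper: write the HW thread id to both PCR slots in one
/// place so they cannot drift apart. Assumes `pcr` is a byte view of the
/// guest PCR and that both fields can be written as single bytes.
fn write_pcr_cpu_ids(pcr: &mut [u8], hw_id: u8) {
    pcr[PCR_HW_ID] = hw_id;
    // This is the byte the guest reads with `lbz r11, 268(r13)` (268 = 0x10C).
    pcr[PCR_PRCB_DATA + PRCB_CURRENT_CPU] = hw_id;
}
```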

Runtime-verified on BOTH engines. Post-fix, the pump threads escape the barrier
(barrier iterations 5M+ -> 3) and advance into their loop bodies, now correctly
Blocked(WaitAny) at pc 0x824d28d0 / 0x824d29c0 (was spinning at 0x824d140c).
imports at n50M 339,766 -> 451,508; deterministic (two cold runs byte-identical).
draws still 0 (a later, separate render gate). golden re-baselined.
cargo test --workspace: 672 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 18:08:46 +02:00
MechaCat02
2bdb93e51e [iterate-2K] GPU physical-mirror aliasing: ring/IB/RPtr/resolve read wrong host region
Root cause (physical-mirror aliasing gap → GPU read wrong region → ring never
truly drained → render worker ring-space wait → no frame → no draw):

The Xbox 360 maps its 512 MB of physical DRAM into several virtual mirror
windows differing only in cache policy — bare physical (0x0xxxxxxx),
write-combine (0x4xxxxxxx), and cached 0xA/0xC/0xExxxxxxx — all aliasing
addr & 0x1FFF_FFFF. Ours has one flat membase and `heap_alloc`
(MmAllocatePhysicalMemoryEx) commits physical backing in the 0x4xxxxxxx
window. The guest masks its CP-ring allocation base to bare physical
(0x4adcc000 & 0x1FFFFFFF = 0x0adcc000) before handing it to
VdInitializeRingBuffer, and PM4 INDIRECT_BUFFER / writeback / resolve
pointers are likewise bare-physical. Ours stored those verbatim and read
`membase + 0x0adcc000`, a never-committed zero-filled page — so the GPU
drained ~718k zero PM4 headers, never executed the real Type3/DRAW stream,
and the RPtr writeback landed on a zero page the render worker (tid=8) polls,
freezing it forever.

Fix (GPU/Vd-boundary translation, not memory-layer): add
`physical_to_backing(addr)` deriving the committed backing exactly from
`heap_alloc`'s placement (0x4000_0000 | (addr & 0x1FFF_FFFF), idempotent for
the WC window, flat for non-physical code/stack). Apply it at every point the
GPU/kernel consumes a guest physical address: ring base
(initialize_ring_buffer), RPtr writeback (enable_rptr_writeback), PM4
INDIRECT_BUFFER pointer, WAIT_REG_MEM / COND_WRITE memory poll+write,
REG_TO_MEM / MEM_WRITE / EVENT_WRITE* / LOAD_ALU_CONSTANT / IM_LOAD addresses,
the resolve dest write, and the vd_swap frontbuffer present read. This was
chosen over memory-layer aliasing because the latter re-projects every CPU
load/store and corrupts the guest's flat 0xA/0xC/0xE accesses (it caused an
early PC=0xfffffffc fault).
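
A minimal sketch of the translation described above, under stated assumptions. Only the mask and the 0x4000_0000 placement come from the text; which top nibbles count as "physical" here (bare physical and the WC window only, everything else flat) is an assumption for the example and may not match the real function:

```rust
const PHYS_MASK: u32 = 0x1FFF_FFFF;
const WC_WINDOW_BASE: u32 = 0x4000_0000;

/// Sketch: map a guest physical address to the committed backing address.
/// Assumption: bare physical (0x0000_0000..=0x1FFF_FFFF) and the
/// write-combine window (0x4000_0000..=0x5FFF_FFFF) are translated; every
/// other address (code, stack, ...) is returned unchanged.
fn physical_to_backing(addr: u32) -> u32 {
    match addr >> 28 {
        0x0 | 0x1 | 0x4 | 0x5 => WC_WINDOW_BASE | (addr & PHYS_MASK),
        _ => addr,
    }
}
```

With the ring base from this commit, `physical_to_backing(0x0adcc000)` yields `0x4adcc000`, and applying the function a second time leaves that value unchanged.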

Two adjacent GPU-backend gates this exposed and also fixed (canary-faithful):
- WaitCmp::from_wait_info was off by one vs canary's MatchValueAndRef
  selector (it decoded wait_info&7==3 as NotEqual instead of Equal),
  inverting the standard CP coherency wait so the GPU parked forever on the
  first INDIRECT_BUFFER. Remapped to 1=Less..7=Always, 0=Never.
- Added MakeCoherent: a WAIT polling COHER_STATUS_HOST clears the status bit
  (mirrors command_processor.cc:801-838) so the coherency handshake resolves.
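
Sketch of the remapped selector. The text above fixes 0=Never, 1=Less, 3=Equal and 7=Always; the remaining slots (2, 4, 5, 6) follow the usual ordering and should be read as an assumption, as should the `matches` helper:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WaitCmp {
    Never,
    Less,
    LessEqual,
    Equal,
    NotEqual,
    GreaterEqual,
    Greater,
    Always,
}

impl WaitCmp {
    /// Decode the low three bits of a WAIT_REG_MEM-style wait_info word.
    fn from_wait_info(wait_info: u32) -> Self {
        match wait_info & 7 {
            0 => Self::Never,
            1 => Self::Less,
            2 => Self::LessEqual,
            3 => Self::Equal,
            4 => Self::NotEqual,
            5 => Self::GreaterEqual,
            6 => Self::Greater,
            _ => Self::Always, // only 7 remains after the mask
        }
    }

    /// Hypothetical helper: evaluate the comparison of a polled value
    /// against the reference.
    fn matches(self, value: u32, reference: u32) -> bool {
        match self {
            Self::Never => false,
            Self::Less => value < reference,
            Self::LessEqual => value <= reference,
            Self::Equal => value == reference,
            Self::NotEqual => value != reference,
            Self::GreaterEqual => value >= reference,
            Self::Greater => value > reference,
            Self::Always => true,
        }
    }
}
```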

Result: the GPU now decodes the real Type3 packets at 0x4adcc000 (ME_INIT,
INDIRECT_BUFFER → real Type0/WAIT_REG_MEM at 0x4adf5080) instead of
zero-headers; RPtr at 0x408619fc advances (0x13, 0x16, … written by the GPU
worker); the frame loop sub_822F1AA8 actively writes the controller at
0x40d09a40 (0x20→0x21→0x23); no fault, full 200M/1B budget runs clean.

draws_seen is still 0: the remaining gate is upstream and separate — the main
frame loop never sets controller bit-28 (frame-ready) at [0x40d09a40] (stalls
at 0x23, the known iterate-2C state-divergence gate), so the guest never
enqueues a render IB; the GPU only ever replays the init IB. This fix
correctly unblocks the GPU ring/IB/RPtr data path (gate-2 GPU backend); the
bit-28 frame-ready gate is the next target.

Stable golden (sylpheed_n50m) unchanged (draws/swaps/RTs/shaders identical at
50M); regenerated twice byte-identical. cargo test --workspace: 672 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 13:39:57 +02:00
MechaCat02
ed2e0e72fd [iterate-2J] KeTimeStampBundle deterministic tick: fix frozen+mislaid guest clock
The xboxkrnl data export KeTimeStampBundle (ordinal 0x00AD, import slot
0x820007d0 — confirmed via sylpheed.db imports table) was set up with TWO
defects in the import-patch pass:

  1. FROZEN: the block was written once at boot and never updated, so every
     field stayed a constant for the whole run (observed: the guest's clock
     reader sub_824AA830 = [[0x820007d0]+0x10] returned a constant
     0x01d6bc0c from 5M..150M instructions).
  2. WRONG LAYOUT: it stuffed the FILETIME high-dword at +0x10. The canonical
     X_TIME_STAMP_BUNDLE (xenia-canary kernel_state.h) is:
       +0x00 interrupt_time u64 (100ns since boot)
       +0x08 system_time    u64 (FILETIME 100ns since 1601)
       +0x10 tick_count      u32 (milliseconds since boot)
       +0x14 padding
     so [block+0x10] is tick_count in ms, not a FILETIME dword.

Fix (deterministic, no wall-clock):
  * Initialize the block with the correct field layout (tick_count = 0 at
    boot, system_time = FILETIME base, interrupt_time = 0).
  * Store the block VA on KernelState::timestamp_bundle_addr during the
    import patch.
  * Add KernelState::update_timestamp_bundle(mem, clock) and call it every
    round in BOTH the lockstep (run_execution) and parallel
    (run_execution_parallel) outer loops, right where the deterministic
    Scheduler::global_clock is advanced. The clock is the retired-instruction
    monotonic global_clock, so every guest-visible time value stays a pure
    function of guest progress (lockstep byte-reproducible).
  * Cadence: 1 global_clock unit = 100ns (coherent with parse_timeout, which
    divides 100ns timeouts by 100 onto the same basis), so
    INSTRUCTIONS_PER_MS = 10_000. tick_count now advances 0 -> ~4999ms over
    a 50M-instruction window. Also make KeQuerySystemTime read the same
    100ns clock instead of a frozen FILETIME constant.
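
Sketch of the per-round update against the layout listed above. The function name `write_timestamp_bundle`, the byte-slice view of the block, and the choices `interrupt_time = clock` and `system_time = base + clock` are assumptions for the example; the field offsets and INSTRUCTIONS_PER_MS come from this message. Guest memory is big-endian:

```rust
const INSTRUCTIONS_PER_MS: u64 = 10_000;

/// Hypothetical helper: refresh a 0x18-byte X_TIME_STAMP_BUNDLE image from
/// the deterministic clock (1 unit = 100ns per the text above).
fn write_timestamp_bundle(block: &mut [u8], clock: u64, filetime_base: u64) {
    let interrupt_time = clock; // assumed: 100ns units since boot
    let system_time = filetime_base.wrapping_add(clock); // assumed: base + elapsed
    let tick_count = (clock / INSTRUCTIONS_PER_MS) as u32; // ms since boot

    block[0x00..0x08].copy_from_slice(&interrupt_time.to_be_bytes());
    block[0x08..0x10].copy_from_slice(&system_time.to_be_bytes());
    block[0x10..0x14].copy_from_slice(&tick_count.to_be_bytes());
    block[0x14..0x18].fill(0); // padding
}
```

Because every field is derived from `clock` alone, two runs that retire the same instructions write the same bytes.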

Verification: tick_count at 0x40002010 now advances (deadline arm at
0x82450d0c stores clock+66 = 0x260,0x269,...,0x51d,... advancing, vs the
frozen 0x01d6bc4e before the fix). Determinism: two cold --stable-digest
runs are byte-identical; the n50m golden is UNCHANGED (the clock-affected
counter is not in the stable digest). 672/672 tests pass.

HONEST CAVEAT — the predicted render cascade did NOT materialize on this
branch. The diagnosed consuming gate at 0x82450b10 (the clock-vs-deadline
compare in the worker-hub channel loop sub_82450A68) is unreachable here:
the loop always branches away at 0x82450b0c ([this+220] >= channel-index),
so the hub already dispatches sub_82450B68 342x in BOTH the frozen and
fixed builds. Guest trajectory (imports 339766@50M / 1738001@200M /
9212446@1B), draws (0), swaps (2) and thread topology (tid14 Ready, not
blocked on 0x109c) are identical frozen-vs-fixed. This commit is therefore
a correct latent-clock-bug fix and determinism-safe prerequisite, NOT the
render unblock. The 0x109c/tid14 starvation premise was not reproduced at
f75bc96; the next gate must be re-localized.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 11:54:44 +02:00
MechaCat02
f75bc96d17 [iterate-2H] PPC spin/yield/sync hint-class audit: lock no-over-yield + barrier-decode invariants
Audited the full PowerPC spin/yield/sync/SMT-priority-hint instruction class
against the canary oracle (ppc_emit_alu.cc InstrEmit_orx / ppc_emit_memory.cc
sync/eieio/isync) and against what Project Sylpheed actually executes (static
scan of the extracted image + disasm of the spin sites 0x824D1328 /
0x824C17AC / 0x824D3CF8).

Findings (no behavior change required — the class is already faithful):
  - or rX,rX,rX SMT priority hints: canary special-cases EXACTLY 0x7FFFFB78
    (db16cyc) -> DelayExecution; every OTHER or-self form -> Nop. Ours already
    matches (only 0x7FFFFB78 yields). Image scan: the documented priority
    hints or 1/2/3/6/26..30 do NOT appear in Sylpheed at all; the only SMT
    spin hint used is or 31,31,31 (db16cyc), already handled in de21c7a. The
    854 `or 8,8,8` etc. are compiler register self-moves (plain no-ops), not
    spin hints.
  - sync / lwsync / ptesync share XO=598 -> all decode to PpcOpcode::sync
    (canary keys on XO only, identical); eieio (XO=854), isync (XO=150) decode
    correctly. All are value-neutral no-ops under the single-host model,
    matching canary MemoryBarrier/Nop. unimpl=0 in a 200M run confirms none
    trap. tlbsync is not implemented by canary either and is unused by Sylpheed.
  - mftb-based timed back-off (loop at 0x824D3CF8: mftb delta vs timeout, with
    db16cyc between polls and a timeout escape) relies on the already-landed
    db16cyc yield + coherent global-clock timebase; no deadlock, no new gap.
  - ori 0,0,0 canonical nop (140 sites) is value-neutral; matches canary Nop.

Lands two regression tests that lock the audited invariants so a future change
cannot over-yield on a benign priority hint (which would perturb the
deterministic schedule) or break the sync L-field decode:
  - test_smt_priority_hints_are_nops_not_yields
  - test_lwsync_ptesync_eieio_isync_decode_as_benign_noops

Determinism preserved (tests-only): two cold lockstep `check -n 5M` (no
persist) byte-identical; golden digest unchanged (no re-baseline). Full
workspace suite green. 200M cascade unchanged (packets~172M, draws=0,
shaders=0, swaps=1) — confirms the hint class is exhausted; the render gate is
now downstream (tid14 0x109c per-job completion event), not CPU semantics.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 10:53:54 +02:00
MechaCat02
de21c7a544 [iterate-2G] db16cyc spin-hint cooperative yield: unblock title-screen 0x10a0 gate
The silph title state machine (tid13) blocked on event 0x10a0, never signaled.
Root: the event's producer chain runs on the silph worker (entry 0x821C4AD0,
our tid14), which was starved. tid14 shares a HW slot with a guest spinlock/
barrier participant (sub_824D1328, entry 0x824D2940) that busy-spins on the
db16cyc hint `or r31,r31,r31` (encoding 0x7FFFFB78) at 0x824D140C. Under our
round-robin lockstep the spinner consumed its whole block every round and
starved the co-located tid14 (only 9 progress hits over 200M instr) — so the
producer never reached the event-create/duplicate/signal dance the canary
oracle performs (handle F80000E8 set by the submitter F8000044 via a duplicated
handle).

Fix (canary-faithful): recognize the db16cyc spin hint exactly as canary's
InstrEmit_orx does (code 0x7FFFFB78 -> DelayExecution) and surface it as a new
StepResult::Yield. The scheduler's yield_current() promotes every Ready peer on
the slot past STARVE_LIMIT so begin_slot_visit picks one next round, then they
reset and the spinner reclaims the slot — fair alternation, no priority
inversion, pure function of slot state (deterministic).
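
Sketch of the hint classification (illustrative; `OrHint`, `encode_or` and `classify_or` are hypothetical names, not the interpreter's real types). It uses the standard X-form layout of `or` (primary opcode 31, XO 444) and singles out the exact db16cyc encoding:

```rust
/// `or r31,r31,r31`, the db16cyc spin hint.
const DB16CYC: u32 = 0x7FFF_FB78;

#[derive(Debug, PartialEq, Eq)]
enum OrHint {
    Yield,    // db16cyc: hand the slot to a starved peer
    Nop,      // any other `or rX,rX,rX`: value-neutral
    NotAHint, // ordinary instruction, execute normally
}

/// Encode `or rA,rS,rB` (Rc = 0).
fn encode_or(ra: u32, rs: u32, rb: u32) -> u32 {
    (31 << 26) | (rs << 21) | (ra << 16) | (rb << 11) | (444 << 1)
}

fn classify_or(code: u32) -> OrHint {
    if code == DB16CYC {
        return OrHint::Yield;
    }
    let is_or = (code >> 26) == 31 && ((code >> 1) & 0x3FF) == 444 && (code & 1) == 0;
    let (rs, ra, rb) = ((code >> 21) & 31, (code >> 16) & 31, (code >> 11) & 31);
    if is_or && rs == ra && ra == rb {
        OrHint::Nop
    } else {
        OrHint::NotAHint
    }
}
```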

Result (lockstep, cache-persist, -n 200M): tid14 progresses past its old stall
into a real wait; tid13 advances off 0x10a0 to a new event; hub/submitter
re-enter their wait loops. imports 280k->592k, packets 124M->164M, swaps 1->2.
draws still 0 (the splash's first draw is a further-upstream gate).

Determinism preserved (two cold n50m runs byte-identical). n50m golden
re-baselined (imports 90296->339766, swaps 1->2; draws unchanged 0). n2m
golden unchanged (db16cyc not reached in first 2M). Tests 670/670.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 10:38:17 +02:00
MechaCat02
f3b7e8b760 [iterate-2F] Scheduler anti-starvation floor: fix job-4 handoff render gate
The lockstep scheduler's pick_runnable is strict priority
(max_by_key (priority, -idx)). On a cooperative single-host HW slot,
a CPU-bound spinner that never blocks (the silph poll loop pinned by
affinity to hw=5) wins pick_runnable every round forever, permanently
starving a co-located peer (the submitter, tid6) that the spinner is
actually waiting on. On real hardware those threads run on separate SMT
contexts concurrently, so the spinner never starves the submitter; ours
collapses them onto one slot with no anti-starvation, turning priority
(or equal-priority index order) into permanent starvation.

The starved submitter never dequeued job-4 -> the worker-hub (tid5)
blocked INFINITE on completion event 0x1080 -> silph (tid13) wedged on
0x1078 -> no vsync -> draws_seen=0, the publisher splash never renders.
(decrement_quantum's within-slot rotation is dead: begin_slot_visit
unconditionally re-pick_runnable()s each round, discarding the rotated
running_idx. The fix is therefore evaluated at pick time, not via that
discarded rotation.)

Fix (Option A, bounded anti-starvation, deterministic):
- Add per-thread steps_starved counter to GuestThread.
- begin_slot_visit increments it for every Ready peer passed over this
  visit, resets it to 0 for the picked thread.
- pick_runnable selects by effective_priority: once steps_starved
  reaches STARVE_LIMIT (4096) the thread is lifted to i32::MAX and wins
  exactly one pick, then resets. The genuinely higher-priority thread
  still wins ~4095/4096 visits -- the boost grants periodic forward
  progress only, it does NOT invert priority. Pure function of
  counter/priority/index -> deterministic (no wall-clock, no RNG).
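
A self-contained sketch of the pick rule described above. It reuses the names from this message, but the `GuestThread` fields and function signatures are simplified assumptions, not the scheduler's real ones:

```rust
use std::cmp::Reverse;

const STARVE_LIMIT: u32 = 4096;

struct GuestThread {
    priority: i32,
    ready: bool,
    steps_starved: u32,
}

/// A thread passed over STARVE_LIMIT times is lifted above everything else.
fn effective_priority(t: &GuestThread) -> i32 {
    if t.steps_starved >= STARVE_LIMIT {
        i32::MAX
    } else {
        t.priority
    }
}

/// Highest effective priority wins; ties go to the lowest index.
fn pick_runnable(threads: &[GuestThread]) -> Option<usize> {
    threads
        .iter()
        .enumerate()
        .filter(|(_, t)| t.ready)
        .max_by_key(|(idx, t)| (effective_priority(t), Reverse(*idx)))
        .map(|(idx, _)| idx)
}

/// Pick a thread, reset its counter, and bump every Ready peer passed over.
fn begin_slot_visit(threads: &mut [GuestThread]) -> Option<usize> {
    let picked = pick_runnable(threads)?;
    for (idx, t) in threads.iter_mut().enumerate() {
        if idx == picked {
            t.steps_starved = 0;
        } else if t.ready {
            t.steps_starved += 1;
        }
    }
    Some(picked)
}
```

In this sketch a priority-10 spinner sharing a slot with a priority-1 peer wins the first 4096 visits; the peer wins visit 4097 and its counter resets, after which the spinner wins again.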

Cascade (lockstep exec, XENIA_CACHE_PERSIST=1, -n 200M):
- submitter dequeue sub_82458508 now fires 4x (was 3x); the 4th job
  (buf 0x40baa2c0) is dequeued at cycle 6.15M.
- hub tid5 leaves Blocked(0x1080) -> now Ready (no more INFINITE wait).
- GPU packets 0 -> 116,101,363 (command stream now flowing).
- tid13 (silph::UImpl) advances past the old 0x1078 wedge to a NEW
  downstream wait (handle 0x10a0); 3 new threads spawn (tid14/15/16).
- draws_seen still 0 -> the splash's first draw is a NEW downstream gate,
  not this starvation.

Determinism: two cold lockstep `check -n 5M` runs byte-identical (full
and stable digests). New n50m stable digest deterministic across two
cold runs. Golden re-baselined: instructions 50000007->50000003,
imports 92317->90296 (trajectory shift from the changed pick order).

Tests: 666/666 (+1 test_anti_starvation_bounded_progress).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 10:02:02 +02:00
MechaCat02
7e2603a9e5 [iterate-2E] Extend coherent monotonic clock to lockstep (timebase-desync livelock fix)
Lockstep livelocked the scheduler the same way --parallel did before
0332d19: the kernel deadline-arithmetic (`now_basis_at`) read per-thread
`ctx(hw_id).timebase`, but a parked/poll thread has `running_idx == None`
so `Scheduler::ctx()` returns `idle_ctx` (timebase 0). A poll thread (tid=7,
a `KeWaitForSingleObject` loop with a 30ms relative timeout) computing its
deadline via `parse_timeout` therefore read `now = 0` and registered
`deadline = 0 + 3000 = 3000` — a constant ~7.78M units in the past.
`coord_idle_advance` then re-armed that same constant 3000 deadline forever,
pinning virtual time and starving every other thread's real future deadline.

Render-gate impact: the submitter (tid=6) re-enters a 16ms-timeout
WaitForMultiple after its first jobs; that timeout never fired because vtime
was pinned at 3000, so virtual time never reached real future deadlines.

Fix (Option A — mirror the parallel fix): drive the existing deterministic
`Scheduler::global_clock` in lockstep too (floored up once per outer round
to `stats.instruction_count`, a pure function of retired guest instructions —
no wall-clock), and route `KernelState::now_basis_at` through `global_clock()`
in BOTH modes. New `Scheduler::advance_global_clock_to(now)` floor-up keeps it
monotone alongside `advance_all_timebases_to`. Parallel behavior unchanged
(it already read `global_clock()`).
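
Sketch of the floor-up clock and the deadline arithmetic it feeds. The struct layout and the helper `relative_deadline` are assumptions for the example; the divide-by-100 basis is inferred from the 30ms -> 3000 figure above:

```rust
struct Scheduler {
    global_clock: u64,
}

impl Scheduler {
    /// Floor-up: the clock only ever moves forward.
    fn advance_global_clock_to(&mut self, now: u64) {
        self.global_clock = self.global_clock.max(now);
    }

    fn global_clock(&self) -> u64 {
        self.global_clock
    }
}

/// Hypothetical helper: convert a relative timeout given in 100ns units to
/// an absolute deadline on the scheduler's basis.
fn relative_deadline(now: u64, timeout_100ns: u64) -> u64 {
    now + timeout_100ns / 100
}
```

Reading `now` from an idle context (timebase 0) reproduces the bug: `relative_deadline(0, 300_000)` is the constant 3000. Reading it from the global clock gives a deadline that moves with guest progress.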

Verified (lockstep, 50M):
- DETERMINISM: two cold `check -n 5M` and two cold `-n 50M` runs byte-identical.
- LIVELOCK GONE: "advanced to deadline" went from 592,679 fires / 2 unique
  values / 562,084 pinned at 3000  ->  18,586 fires / 18,567 unique /
  0 pinned, strictly increasing 5.4M -> 50M. Poll thread tid=7 now ends
  Blocked with a real future deadline Some(60002824) instead of spin-Ready
  on the past 3000.
- imports 1,790,936 -> 92,317 at 50M (the spin no longer burns import calls).

Cascade (lockstep, XENIA_CACHE_PERSIST=1, -n 200M): engine now runs to budget
instead of hard-deadlocking. Hub enqueue (sub_82458068) 4x; submitter dequeue
(sub_82458508) still 3x — the lost 4th-job HANDOFF (count/notify between
sub_82458068's tail and the submitter queue) is a SEPARATE downstream gate,
not the timebase. New gate: tid=5 (hub) Blocked INFINITE on event 0x1080
(job-4 completion); tid=6 (submitter) Ready, parked in WaitForMultiple
(sub_824AB214), loop-top stops at cycle 6.23M. draws still 0, VdSwap 1.

Golden re-baseline (same commit): sylpheed_n50m
  instructions 50000004 -> 50000007, imports 1790936 -> 92317
  (swaps/draws/RTs/shaders/textures unchanged). sylpheed_n2m unchanged
  (the livelock sets in after 2M). Suite 665/665 + oracle green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 21:42:28 +02:00
MechaCat02
5aaadfec36 [iterate-2E] Add XENIA_AUDIT_DEREF pointer-chase probe
On each AUDIT-PC-PROBE fire, treat gpr[reg] as a base object, dump its
first 64 bytes, follow [base+off] to a sub-object, dump that, then follow
[[base+off]+0] to its vtable and dump 48 slots. Env-gated
(XENIA_AUDIT_DEREF=<reg>:<off>), read-only, lockstep digest unaffected.
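
One way the `<reg>:<off>` value could be parsed. This is a sketch only: the accepted syntax (decimal GPR index, offset in decimal or `0x` hex) and the function name `parse_audit_deref` are assumptions, not the probe's actual parser:

```rust
/// Hypothetical parser for a "<reg>:<off>" spec such as "3:0x10".
/// Returns (GPR index, byte offset), or None if the spec is malformed.
fn parse_audit_deref(spec: &str) -> Option<(usize, u32)> {
    let (reg, off) = spec.split_once(':')?;
    let reg: usize = reg.trim().parse().ok()?;
    if reg >= 32 {
        return None; // PowerPC has 32 GPRs
    }
    let off = off.trim();
    let off = match off.strip_prefix("0x") {
        Some(hex) => u32::from_str_radix(hex, 16).ok()?,
        None => off.parse().ok()?,
    };
    Some((reg, off))
}
```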

Captures the live work-item + stream object + vtable at sub_824510E0
before the pool recycles the slot — which overturned the prior session's
"infinite spin" diagnosis: the streaming read PROGRESSES 68/68 128KB
chunks of a 9MB file, then the hub (tid=5) blocks INFINITE on a
self-created Event/Manual (0x1060) that is never signaled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-12 20:29:01 +02:00
MechaCat02
0332d1990d [Track 2] Parallel-scoped global clock fixes timebase-desync livelock
In --parallel mode a long run livelocked: the scheduler spun
"advanced to deadline 3000 waking hw=2 idx=0" ~14k times in
microseconds. Root cause: each guest thread owns ctx.timebase
(+1/instr in step_block), and all kernel deadline arithmetic read
Scheduler::ctx(hw_id).timebase as "now". But the parallel worker
extracts its PpcContext via mem::replace(ctx_mut_ref, PpcContext::new())
— leaving a ZEROED timebase in the slot while it steps unlocked — and
advance_all_timebases_to only walks runqueue (never idle_ctx). So the
coordinator's coord_pre_round drain and a woken thread's parse_timeout
could read a zeroed/stale basis decoupled from the deadline the
scheduler just advanced to. The thread re-armed the same constant
deadline forever; the global clock never moved.

Fix: add a single monotonic Scheduler::global_clock, advanced by the
per-block retired-instruction count on each parallel writeback and
floored up by advance_all_timebases_to. Kernel deadline reads route
through KernelState::now_basis_at(hw_id), which returns global_clock
ONLY when parallel_active; lockstep keeps reading the exact pre-existing
ctx(hw_id).timebase expression, so the deterministic lockstep trace is
byte-identical (sylpheed_n50m golden unchanged, zero re-baseline).

Verified:
- 50M --parallel run completes (was: hung). Deadlines now strictly
  increasing 5.4M -> 49.1M (18097 unique of 18116; max repeat 2) vs
  pre-fix constant 3000 x ~14000.
- sylpheed_n50m golden byte-identical via plain `check` (no persist).
- Full suite 665/665 green.

Note: an intermittent parallel hang/crash (~1-2/20 at -n 5M) is
pre-existing (master 1/20, this build 2/20 — within noise) and distinct
from the timebase livelock: it is a parallel-race class (e.g. the
unsafe block_ptr deref in run_execution_parallel). Tracked separately;
lockstep remains the recommendation for long runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 19:32:14 +02:00
MechaCat02
6271ba1f55 chore: gitignore vkd3d-proton/DXVK runtime shader caches
The Wine canary build drops vkd3d-proton.cache into the working dir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-12 18:06:25 +02:00
MechaCat02
48b19e490f [Prong A] Three 32-bit ABI PPCBUG siblings corrected to canary semantics
Second differential audit, lead prong: hunt siblings of PPCBUG-020 (the
word-form ALU truncation fixed in 341196a, whose "32-bit ABI / MSR.SF=0"
premise was false — Xenon is a 64-bit core). Found three more band-aids of the
same class, each verified against the canary oracle. All three are genuine
oracle/ISA divergences but INERT on Sylpheed's lockstep trace (sylpheed_n50m
golden digest unchanged; no re-baseline). Fixed them and added directed tests
anyway to close the band-aid class (per audit decision).

1. slw/srw shift-count mask (PPCBUG-044 site). Ours tested the full u32 count
   `< 32`; canary InstrEmit_slwx/srwx mask `rb & 0x3F` then test bit 5. A count
   like 0x40 (low-6-bits 0) must pass the value through, not zero it. Fixed both
   to `& 0x3F`. The 32-bit CR0 i32-view is unchanged (genuinely 32-bit).

2. sraw/srawi result extension (PPCBUG-041/042/043 "writeback truncation").
   Ours zero-extended the 32-bit arithmetic-shift result (`result as u32 as u64`);
   PowerISA + canary InstrEmit_srawx/srawix SIGN-extend it (`f.SignExtend`, the
   `(i64.s)&¬m` fill). 0x80000000>>1 is now 0xFFFFFFFF_C0000000, not
   0x00000000_C0000000. CA math and CR0 view byte-identical.

3. mtspr CTR width (PPCBUG-054). Ours stored `val as u32 as u64`, dropping the
   upper 32 bits; CTR is a 64-bit SPR and canary InstrEmit_mtspr stores the full
   GPR (`f.StoreCTR(rt)`). A later `mfspr rX, CTR` now round-trips correctly.
   bdnz/bcctr still consume only CTR's low 32 bits (the bcx zero-TEST truncation
   at line ~922 MATCHES canary's `f.Truncate(ctr, INT32_TYPE)` — left untouched).
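
Sketch of the corrected semantics for items 1 and 2 as standalone functions (illustrative; the real interpreter operates on its own context type, and `srawi` here covers only immediate counts below 32):

```rust
/// slw: mask the count to 6 bits; bit 5 set means the result is zero.
/// The 32-bit result is zero-extended.
fn slw(rs: u64, rb: u64) -> u64 {
    let n = (rb & 0x3F) as u32;
    if n & 0x20 != 0 {
        0
    } else {
        ((rs as u32) << n) as u64
    }
}

/// srawi: arithmetic shift of the low word, then SIGN-extend to 64 bits.
fn srawi(rs: u64, sh: u32) -> u64 {
    debug_assert!(sh < 32);
    (((rs as u32) as i32) >> sh) as i64 as u64
}
```

A count of 0x40 has its low six bits clear, so `slw(v, 0x40)` passes the low word of `v` through, while `slw(v, 0x20)` is zero. `srawi(0x8000_0000, 1)` gives `0xFFFF_FFFF_C000_0000`.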

Tests: updated srawx_negative_value_sign_extends_upper,
srawix_high_count_negative_input_sign_extends_all_ones, and
mtspr_ctr_keeps_full_64_bits (formerly premise-defending the bugs —
reading-error #24). Added slwx/srwx 6-bit-mask tests, mfspr_ctr round-trip, and
the rlwinm MB>ME wraparound-mask test (plan-requested gap closure). 665/665.

Left correct (re-confirmed vs canary, do NOT touch): bcx/bclr CTR 32-bit test,
divw/divwu zero-extend quotient (canary f.ZeroExtend, ISA upper undefined),
extsb/extsh, logical-NOT chain, mulhw/mulhwu, srawx 0x3F mask, pixel pack/unpack.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 17:25:41 +02:00
MechaCat02
341196a111 [Issue-1 PPCBUG-020] Word-form ALU ops produce full 64-bit results
Xenon is a 64-bit PPC core (32-bit *pointer* ABI, but 64-bit registers and
integer arithmetic). The interpreter was truncating every word-form integer
ALU writeback to 32 bits and zero-extending, on a false "MSR.SF=0 / 32-bit
ABI" premise. This silently corrupted any genuine 64-bit value flowing through
word-form arithmetic.

Confirmed load-bearing via runtime ours-vs-canary capture: Sylpheed's
millisecond->LARGE_INTEGER timeout converter sub_824ACA88 does
`clrldi; mulli r11,r11,-10000; std`. For a 16 ms wait the correct result is
-160000 = 0xFFFFFFFF_FFFD8F00 (relative). canary stores exactly that; ours'
truncating `mulli` stored 0x00000000_FFFD8F00 (positive) -> the i64 timeout
read as a huge *absolute* deadline -> a ~26000x over-wait that froze the main
frame loop. After the fix the timeout matches canary and the previously-frozen
frame/worker loops run (parallel boot NtWaitForMultipleObjectsEx 94 -> 30428;
KeWaitForSingleObject/critical-section loops resume).

Fix mirrors canary's INT64 emitters (ppc_emit_alu.cc) op-by-op for the 17
data-losing word-form ops: addis, addic(.), subfic(.), mulli, add(c/e/ze/me)x,
subf(c/e/ze/me)x, negx, mullwx. Only the result *writeback* widens to full
64 bits; the 32-bit carry (XER[CA]) and overflow (XER[OV]) computations and the
CR0 i32 view are preserved byte-identical (the low 32 bits of the new result
equal the old truncated result), so this is a strict no-op for clean 32-bit
values and only restores the previously-zeroed upper bits for genuine 64-bit
values. Genuinely-32-bit ops (rlwinm/slw/srw/cmpw, mulhw/divw whose upper bits
are ISA-undefined) are left untouched.
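
The timeout case can be reproduced in isolation. A sketch with hypothetical helper names, contrasting the full-width writeback with the old truncating one:

```rust
/// mulli with a full 64-bit writeback (the fixed behavior).
fn mulli_full(ra: u64, simm: i16) -> u64 {
    (ra as i64).wrapping_mul(simm as i64) as u64
}

/// mulli truncated to 32 bits and zero-extended (the old behavior).
fn mulli_truncated(ra: u64, simm: i16) -> u64 {
    mulli_full(ra, simm) as u32 as u64
}
```

For a 16 ms wait, `mulli_full(16, -10000)` is `0xFFFF_FFFF_FFFD_8F00`, which reads as -160000 (relative) when interpreted as i64. `mulli_truncated(16, -10000)` is `0x0000_0000_FFFD_8F00`, which reads as a large positive (absolute) value. The low 32 bits of both results are identical.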

Updated 7 unit tests that asserted the truncation (they encoded the bug) to the
canary-correct full-64-bit values. Re-baselined the sylpheed_n50m golden
(imports 40454 -> 1790936: the unwedged frame/worker loops now cycle under the
instruction-count timebase); sylpheed_n2m unchanged (pre-frame-loop). Lockstep
determinism preserved (two 50M runs identical). Full suite 660/660.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 16:21:11 +02:00
MechaCat02
b20c99f141 [Subsystem-fixes] 6 verified ours-vs-canary divergence fixes
From the 2026-06-12 5-subsystem differential audit. All verified against
canary as oracle; 660/660 workspace tests green (655 + 5 new).

1. nt_create_event polarity (exports.rs) — `manual_reset = gpr[5] != 0`
   was INVERTED. Canary xboxkrnl_threading.cc:668 `Initialize(!event_type,..)`
   + xevent.cc:41 (type 0 = NotificationEvent = manual, type 1 = Sync = auto).
   Now `== 0`. Was the dormant 2.AI fix on chore/portable-snapshot, never
   merged. The Ke-path was already correct; only the Nt-path was wrong.

2. 2.AF deadline drain (main.rs coord_pre_round) — expired KeWait/KeDelay
   deadlines never fired under load because advance_to_next_wake_if_due was
   only called in coord_idle_advance (no-Ready-threads path). Added a
   per-round drain loop; covers BOTH lockstep and parallel outer loops since
   both call coord_pre_round. Was the dormant 2.AF fix, never merged.

3. handle slab-recycle ABA guard (state.rs + scheduler.rs) — release_handle_slot
   (my round-34 regression) recycled a closed slot even with a thread still
   parked on it, risking a stale-waiter wake when the slot is re-minted. Added
   Scheduler::any_thread_waiting_on; decline to recycle a still-waited slot.

4. vpkpx pixel-pack (vmx.rs) — wrong field mapping (~100% mismatch). Now
   exact canary ppc_emit_altivec.cc:1795 shift/mask (red 6b out[15:10] from
   w[24:19], green out[9:5] from w[14:10], blue out[4:0] from w[7:3]; no
   fabricated alpha bit). +unit test.

5. VFS GDFX attribute plumbing (vfs/*, exports.rs query fns) — VfsEntry now
   carries the real on-disc attribute byte (GDFX dirent +12, canary
   disc_image_device.cc:136/154) instead of inferring directory-ness from
   path shape. Query exports report the real FILE_ATTRIBUTE_* bits. Candidate
   driver of the XamShowDirtyDiscErrorUI gate. +tests.

6. MmGetPhysicalAddress region-aware mirror (exports.rs) — flat 0x1FFFFFFF
   mask missed canary's +0x1000 host_address_offset for 0xE0000000+ mirror
   (memory.cc:2317). Read-only query; proven byte-identical 50M digest. +test.
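
Sketch of the corrected polarity from item 1 (the function name is hypothetical; the type-to-reset mapping is the one cited above: type 0 = NotificationEvent = manual reset, type 1 = SynchronizationEvent = auto reset):

```rust
/// Hypothetical helper: derive manual_reset from the guest's event type
/// argument (gpr[5] in the Nt path per the text above).
fn manual_reset_from_event_type(event_type: u64) -> bool {
    // Type 0 is a notification event, which stays signaled until reset.
    event_type == 0
}
```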

Investigated and intentionally NOT changed:
- zero-on-recommit: no-op; ours has no region-reuse path (bump allocators,
  free is a stub).
- 32-bit ALU writeback truncation (PPCBUG-020): documented-deliberate; premise
  (MSR.SF=0) is questionable but flipping it is out of scope here.
- KeSetEvent/NtSetEvent return value: ours returns true previous state
  (hardware-faithful); canary returns constant 1 — NOT an ours bug.

sylpheed_n50m golden will need re-baselining (legit behavior change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-12 14:57:38 +02:00
MechaCat02
db90ad0f7d [AUDIT-059 R-D2] Phase D auto-signal POC confirms audit-049 wedge diagnosis
Hook NtCreateEvent for the silph::UImpl tid=13 chain (entry=0x821748F0,
start_context=0x4024a840, frame-1 LR=0x821CB15C inside sub_821CB030+0x128)
and auto-signal the resulting handle after XENIA_SILPH_UI_AUTOSIGNAL_DELAY
instructions. Env-gated; default off.

SR4 verdict B (partial unwedge):
- handle 0x1078 signal_attempts 0->1
- tid=13 Blocked(WaitAny[0x1078]) -> Ready pc=0x824a9108
- ExCreateThread 10 -> 12 (new silph::UImpl tid=14, worker tid=15)
- New downstream wedges 0x1084 + 0x1088
- cxx_throw runtime_error on tid=5 inside R26 dispatcher
  (BST not-registered instance lhs=0x715a7af0)
- VdSwap stays 1; no draws (POC is diagnostic, not final fix)

Confirms Phase C diagnosis end-to-end. The real signaler must (a) drive
NtSetEvent on the silph KEVENT AND (b) register the dispatcher's BST
instance upstream; this POC only does (a).

Reading-error class #20: ctx.lr at kernel export entry is the thunk
wrapper's return slot, NOT the guest caller's post-bl PC. Walk back-chain
1 step to get frames[1].lr.

Reading-error class #21: --parallel and lockstep have SEPARATE outer
loops in main.rs (run_execution_parallel line 2928 vs run_execution
line 2706). Per-round hooks must be wired in BOTH paths.

Files:
- crates/xenia-cpu/src/scheduler.rs: GuestThread.start_entry/start_context
  fields + spawn() population + current_thread_entry_and_ctx() helper
- crates/xenia-kernel/src/state.rs: AutoSignalPending struct, env-parsed
  silph_autosignal_delay, pending Vec, last_cycle_hint, set_now_cycle_hint,
  maybe_register_silph_autosignal (walks back-chain), fire_due_silph_autosignals
- crates/xenia-kernel/src/exports.rs: hook in nt_create_event
- crates/xenia-app/src/main.rs: fire-site + cycle hint in both outer loops
- audit-runs/audit-059-handle-disambiguation/round-D2-autosignal-poc/FINDINGS.md

Tests 655/655 green. Default behavior byte-identical when env unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-11 18:38:38 +02:00
MechaCat02
481591fdb2 [AUDIT-059 R-C1] Phase C: bit-28 setter hypothesis REFUTED via dump-addr
Phase A's diagnosis (bit 28 of [0x40d09a40] gets set to exit
sub_822F1AA8's loop) is falsified by direct probe + --dump-addr in 4
sub-rounds.

Key evidence:
- sub_821B55D8 candidate fn fires 0× in ours; sub_824AA858
  (XamInputSetState wrapper) fires 0× in canary too — chain is dead code
  in both engines.
- end-of-run dump shows [0x40d09a40+0] = 0x00000021, same as at entry —
  bit 28 is NEVER set.
- bcctrl at PC 0x822F1B4C (sub_822F1AA8+0xA4) fires (LR=0x822F1B50) but
  the post-bcctrl BB head 0x822F1B50 fires 0× — bcctrl never returns.
- sub_82173990 (vtable[0] of singleton at [0x828E1F08]) is the call
  target; tid=1 wedges inside this 768-byte function on a thread-join
  to handle 0x1070 (= tid=13's thread handle).
- tid=13 (entry=sub_821748F0, ctx=0x4024a840, handle=0x1070) reaches
  sub_821C4EB0 (silph::UImpl@GamePart_Title) at cycle 1882 → audit-049
  cluster IS reached, wedges on handle 0x1078 there.

C.2 force-clear POC NOT EXECUTED — would be no-op since bit 28 is never
set. Per plan stopping criterion, hand back instead of proceeding blind.

Adds reading-error class #19: disasm-pattern-match without runtime
verification (Phase A scanned 49 oris-0x1000 sites and declared one the
setter without ever observing the bit get set).

No xenia-rs source changes. Canary repo also unchanged (config edit
reverted clean).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-11 17:57:27 +02:00
MechaCat02
52c30d82a7 [AUDIT-059 R-A] Phase A backward-trace: divergence is sub_822F1AA8 loop exit, not factory/registry
Round-37 anchor reframe: both engines install the SAME static .rdata vtable
0x820A183C at [0x828E1F08]. Instance VAs differ only because of ε-class
allocator divergence (audit-043). vtable bytes byte-identical; the user
prompt's "factory/registry" framing was falsified.

Phase A walkthrough (rounds A1..A8):
- A.1 canary --audit_jit_prolog_pc=0x821741C8: tid=6, r3=0xBCCC4A80 (= inner
  sub-object of [0x828E1F08]'s singleton), LR=0x822F1D5C (return-from-bctrl
  inside sub_822F1AA8)
- A.2 found tid=6 spawn site sub_821746B0 at PC 0x82174824 spawning
  entry=sub_821748F0 ctx=BC365700/BC366DA0. sub_822F1AA8 ALSO spawns a
  second thread (entry=sub_822F1EE0 ctx=BCE24A40) at PC 0x822F1B08
- A.3 sub_822F1AA8 has 2 callers, both in sub_8216EA68 (its sole caller is
  sub_824AB748 = entry_point)
- A.4 ours mirror probe: sub_821746B0 enters, [0x828E2B14] gate passes,
  ExCreateThread fires returning handle 0x1070 (= tid=13). Ours' tid=13
  IS the same logical thread as canary's spawned silph initializer
- A.5 canary --audit_jit_prolog_pc=0x821749C0: fires only 2× on short-lived
  tid=17, tid=26 (the spawned initializers — NOT tid=6)
- A.6 canary --audit_jit_prolog_pc=0x822F1AA8: fires 1× on tid=6 with
  r3=0xBCE24A40 LR=0x8216EE14 (the second sub_822F1AA8 call site)
- A.7 canary --audit_jit_prolog_pc=0x824AB748 (entry_point): fires on
  tid=00000006. CONFIRMS canary's tid=6 = canary's main thread.

Verdict: identical call chain entry_point → sub_8216EA68 → sub_822F1AA8 in
both engines; same controller (ε-divergent VA, byte-identical fields).
Canary's main thread stays in sub_822F1AA8's dispatcher loop firing
sub_821741C8 ~1678×/30s. Ours' main thread exits the loop and thread-joins
on the spawned initializer (tid=13), which is itself wedged on handle 0x1078
forever.

Loop exit is gated by bit 28 of [r30+0] (the controller's flag word). Same
value 0x21 at function entry in both engines. Some code between entry and
loop check sets bit 28 in ours but not in canary. Mem-watch on 0x40d09a40
shows zero guest stores in ours' 50M parallel run — setter is either a
kernel-side store, computed alias, or probe-quantum-elided JIT store.

Phase B classification: Class 3a (state-divergence on controller object).
The vtable is the same; the controller's bit 28 evolves differently during
sub_822F1AA8 setup. Class 4 (synthesis) is now less attractive since we
correctly reach the dispatcher with the right inputs — we just exit too
soon.

Phase C will need either JIT instrumentation to identify the bit-28 setter,
or a kernel-side hook to clear bit 28 on entry to the loop check site.

Findings notes:
- round-A4b-ours-spawn-gate/FINDINGS.md (spawn topology + tid mapping)
- round-A8-ours-822F1AA8-trace/FINDINGS.md (full loop structure + bit-28 gate)

New reading-error class #18: probe-output anchor misframing (singleton[VA]=X
vtable=Y was misread as "Y is canary-only vtable" when Y is the same
.rdata vtable in both engines).

Branch: iterate-2C/silph-ui-spawn-trace off master @ 229b46c.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-11 17:02:20 +02:00
MechaCat02
229b46c765 [Kernel] Slab-recycle handle allocator (AUDIT-059 R34)
Adds a FIFO free list of closed handle slots so alloc_handle returns
recycled IDs before bumping next_handle. Mirrors canary's slab-style
ObjectTable: F8000098 reused 130x per 30s window in canary, but ours'
monotonic bump allocator never reused slots — so a recycled slot in
canary maps to a fresh, never-reused slot in ours, drifting kernel
object identity per AUDIT-042's analysis.

release_handle_slot is wired into nt_close's refcount==0 branch and
gated to the canonical [0x1000, 0xF000_0000) range so synthetic
XAudio park handles (AUDIT-048) are never recycled.
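
A minimal sketch of the recycle policy described above, not the actual xenia-rs code. The struct name, the `HANDLE_STRIDE` spacing, and the `bool` return value are assumptions made for illustration; only `alloc_handle`, `release_handle_slot`, `next_handle`, and the [0x1000, 0xF000_0000) gate come from this message.

```rust
use std::collections::VecDeque;

/// Canonical handle range from the commit text: [0x1000, 0xF000_0000).
const HANDLE_MIN: u32 = 0x1000;
const HANDLE_MAX: u32 = 0xF000_0000;
/// Assumed spacing between freshly bumped handle IDs (hypothetical).
const HANDLE_STRIDE: u32 = 4;

/// Hypothetical stand-in for the kernel's handle table.
struct HandleAllocator {
    next_handle: u32,
    free_slots: VecDeque<u32>, // FIFO: the oldest closed slot is reused first
}

impl HandleAllocator {
    fn new() -> Self {
        Self { next_handle: HANDLE_MIN, free_slots: VecDeque::new() }
    }

    /// Return a recycled ID if one is queued, otherwise bump `next_handle`.
    fn alloc_handle(&mut self) -> u32 {
        if let Some(recycled) = self.free_slots.pop_front() {
            return recycled;
        }
        let handle = self.next_handle;
        self.next_handle += HANDLE_STRIDE;
        handle
    }

    /// Queue a closed slot for reuse. IDs outside the canonical range
    /// (for example synthetic park handles) are ignored.
    /// Returns whether the slot was queued.
    fn release_handle_slot(&mut self, handle: u32) -> bool {
        if (HANDLE_MIN..HANDLE_MAX).contains(&handle) {
            self.free_slots.push_back(handle);
            true
        } else {
            false
        }
    }
}
```

The sketch does not guard against releasing the same slot twice; a real table would presumably rely on the refcount==0 branch for that.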

Verified: all 655 workspace tests green, smoke tests at -n 50M show
NtClose 115/run with handle table renumbering active (round-34 max
handle 0x12ac vs round-16 baseline 0x12b8 over same workload). γ-
cluster #2 wedge unchanged — silph wait still parks tid=13 on the
renumbered handle (4216=0x1078 here vs 0x12a4 baseline), confirming
the wedge is independent of allocator policy. Lands as a parity
fix to bring our kernel-object identity in line with canary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-10 18:04:34 +02:00
53 changed files with 694918 additions and 206 deletions

.gitignore

@@ -11,3 +11,8 @@ audit-*.md
*.stdout
*.stderr
*.log
# Runtime cache artifacts (vkd3d-proton / DXVK shader caches dropped into the
# working dir by the Wine canary build)
vkd3d-proton.cache*
*.dxvk-cache

@@ -0,0 +1,131 @@
# AUDIT-059 — handle disambiguation (iterate 2.BD)
**Date:** 2026-06-06. **Engines:** ours `target/release/xenia-rs -n 50M` (3.9 s wall, 50M instr, 40k import calls), canary Wine `xenia_canary.exe --mute=true --audit_handle_lifecycle=true` (~35 s wall, 34k log lines, 0 fatals).
## Verdict — HANDOFF's wedge handles are stale
HANDOFF said: *"opt_callback signals 0x108c, tid=1 wedges on 0x10e8."* Both IDs are now `<UNCREATED>` in ours, along with `0x1090 / 0x10dc / 0x10fc / 0x1104` (also in HANDOFF's adjacent list). The allocation order shifted since that snapshot.
## Real wedges, current code state
| Handle | Kind | Engine state | Waiter | Notes |
|---|---|---|---|---|
| **0x12a4** | `<UNCREATED>` | `<AUDIT_BLIND>`, waiters=1 | **tid=1 main**, pc=0x824ac578 | Wait went via `do_wait_single` but creation never hit the `NtCreateEvent` → `KeInitializeEvent` path. **This is the iterate-2.BC wedge** (recorded as "0x10e8" in HANDOFF; same site, different ID). |
| **0x12ac** | Event/Auto | `<NO_SIGNALS_DESPITE_WAITS>`, waiters=1 | **tid=13** silph UI cluster, pc=0x824ac578 lr=0x821cb1e0 | Frame trail: `0x821cb1e0 → 0x821cbae0 → 0x821cc454 → 0x821c4f18 → 0x82174a80`. Frames 3-5 carry `silph::UImpl@GamePart_Title` / `silph::VGamePart_Title` vtables — **audit-049's cluster, unchanged**. |
| 0x12b8 | Event/Auto | NO_SIGNALS, waiters=1 | (tid TBD) | Sibling, 0xC bytes from 0x12ac. |
| 0x1020 | Event/Manual | NO_SIGNALS, waiters=1 | — | γ-class. |
| 0x1040 | Event/Auto | NO_SIGNALS, waits=32 (hot poll) | — | Heavy wait, no signal. |
| 0x10a8 | Event/Auto | NO_SIGNALS, waits=7 | — | γ-class. |
| 0x10e4 | Event/Manual | NO_SIGNALS, waiters=1, waits=2 | — | γ-class. |
**Working handles** (sanity baseline): 0x1028 (Sema, 8 waits / 7 signals / 7 wakes), 0x10d0 (Sema, 2 waits / 1 signal / 1 wake), 0x10f0 (Event/Auto, 1/1/1 ✓ marked `<SUSPECT>` but actually fine), 0x10e0 (Event/Manual, 32 primary signals from somewhere).
## GPU interrupt delivery — the iterate-2.BC delta confirmed
| Engine | gpu.interrupt.delivered (vsync) | EmulateCPInterruptDPC / vblank pump |
|---|---:|---:|
| **ours** | 54 (source=0) + 1 (source=1) | — |
| **canary** | — | **4712** in 30 s ≈ 157 Hz |
**~87× ratio.** Confirms HANDOFF's diagnosis: ours' victim-thread injector dies once guest threads all park; canary's host frame-limiter thread keeps firing regardless.
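The two headline numbers can be re-derived from the table. This is a small sketch with hypothetical helper names; it only restates the arithmetic (4712 fires over a 30 s window, compared against the 54 source=0 deliveries in ours).

```rust
/// Events per second over a sampling window.
fn rate_hz(count: u32, window_secs: f64) -> f64 {
    f64::from(count) / window_secs
}

/// How many times larger `a` is than `b`.
fn ratio(a: u32, b: u32) -> f64 {
    f64::from(a) / f64::from(b)
}
```

`rate_hz(4712, 30.0)` rounds to 157 and `ratio(4712, 54)` rounds to 87. Counting the single source=1 delivery as well (55 total) would give roughly 86× instead.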
## Canary signaler attribution
Top KeSetEvent guest_ptrs in canary (30 s window):
| guest_ptr | KeSetEvent fires | Inferred role |
|---|---:|---|
| `0x828A3254` | 5729 | Audio host-pump worker (per AUDIT-032: `r3=0x828A3230` region) |
| `0x828A3244` | 5728 | Audio host-pump sibling |
| `0x828A3244` + 16-byte stride | — | Static XEX-image audio event struct |
| `0xBCE25234` | 1301 | **silph UI cluster PKEVENT** (heap-allocated, 0x10 stride). Likely ours' 0x12ac analog. |
| `0xBCE25214 / 0xBCE25244 / 0xBCE25224` | 648 / 603 / 603 | Sibling silph UI PKEVENTs (0x10 stride struct). Likely ours' 0x12a4 / 0x12b8 / 0x1040 analogs. |
Ours signals every one of those equivalents **0 times**.
## Round 2 — LR-extended probes name the producer
Extended the canary probes with guest-LR capture (5 sites in `xboxkrnl_threading.cc`, 10 LOC). Re-ran the harness. Now each `KeSetEvent` line carries the guest function that signaled the event. Result for the silph UI cluster:
| PKEVENT | KeSetEvent count | Producer LR(s) |
|---|---:|---|
| `0xBCE25214` | 574 | `0x82508510` (single producer) |
| `0xBCE25224` | 565 | `0x82508358` (single producer) |
| `0xBCE25234` | 1153 | `0x82506C90` (579) + `0x82508524` (574) |
| `0xBCE25244` | 570 | `0x82506F9C` (single producer) |
| `0xBCE25284` | 1 | `0x82507ABC` (one-shot 5th-worker init?) |
All 6 producer LRs sit in `0x82506000..0x82509000`. **This is exactly the `sub_825070F0` worker thread cluster** that audit-057/058 already named:
> *audit-057: "sub_825070F0 (4 missing, initializes 4 workers w/ shared ctx 0xBCE25340, entries 0x82506528/58/88/B8)"*
The 4 worker entries (`0x82506528/58/88/B8`) are inside `sub_82506xxx` — exactly where the producer LRs `0x82506C90`/`0x82506F9C` live. The other producer LRs `0x825083xx` / `0x825085xx` are in downstream callees (workers call deeper code which itself calls KeSetEvent).
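The range claim is easy to check mechanically. The sketch below hard-codes the six producer LRs from the Round 2 table and tests them against the 0x82506000 to 0x82509000 window; treating the upper bound as exclusive is an assumption, and the names are hypothetical.

```rust
/// Address window quoted for the worker cluster (end treated as exclusive by assumption).
const CLUSTER: std::ops::Range<u32> = 0x8250_6000..0x8250_9000;

/// Producer LRs from the Round 2 table.
const PRODUCER_LRS: [u32; 6] = [
    0x8250_8510, 0x8250_8358, 0x8250_6C90, 0x8250_8524, 0x8250_6F9C, 0x8250_7ABC,
];

/// True when every LR falls inside the cluster window.
fn all_in_cluster(lrs: &[u32]) -> bool {
    lrs.iter().all(|lr| CLUSTER.contains(lr))
}
```

The audio host-pump LRs (`0x824D2A44`, `0x824D292C`) fall outside this window, which is consistent with them being a separate producer.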
For comparison the audio host-pump pair gets a single sharp producer too:
- `0x828A3254` × 5271 ← `lr=0x824D2A44`
- `0x828A3244` × 5271 ← `lr=0x824D292C`
(These match AUDIT-032's PC `0x824D229C / r3=0x828A3230` region — already-understood audio host-pump.)
## Verdict — 2.BE is INSUFFICIENT for the silph UI wedge
The silph UI PKEVENTs are signaled exclusively by threads spawned by `sub_825070F0`. Per audit-057/058, **`sub_825070F0` fires 0× in ours** — those 4 worker threads never spawn. Therefore the PKEVENTs are never signaled. Therefore tid=13 (`0x12ac` in ours) wedges forever.
**`sub_825070F0`'s call chain is gated by the audit-009 "unreachability island"** — a CRT-driven fnptr-array bootstrap that ours fails to enumerate. VSync delivery is irrelevant to that bootstrap; the host frame-limiter thread does not drive CRT initializers.
Therefore:
- **2.BE alone CANNOT unwedge tid=13.** It will close the 54-vs-4712 VSync delivery gap and may unblock things downstream of vsync, but the silph UI wedge has an independent missing-signaler root cause.
- **2.BE may still unwedge tid=1 main on `0x12a4`** — that wait went via `KeInitializeEvent` (handle never hit `NtCreateEvent` in ours, hence `<AUDIT_BLIND>`). Whether `0x12a4`'s signaler depends on VSync is unknown without further probing.
## Implications for next moves
A single fix won't take us to draws > 0. We need at least two:
1. **2.BE (VSync delivery)** — still worth landing for the architectural correctness it brings, AND because it's the only fix that can unwedge tid=1 main's `0x12a4` if that's vsync-derived. ~60-80 LOC per Agent C's plan.
2. **2.BF (sub_825070F0 activation)** — this is the audit-058 unfinished business. Options:
- (a) **Static work:** trace canary's CRT-driven fnptr-array path that activates the silph UI bootstrap; backport the missing init into ours. High info, slow. Requires more probing.
- (b) **Direct synthetic spawn:** ours injects host-side `ExCreateThread` calls for the 4 worker entries at boot completion, mirroring AUDIT-048's audio-host-pump precedent. Pragmatic; ~40 LOC; risks getting context (`0xBCE25340`) wrong.
A possible third move:
3. **Re-probe with LR on Wait paths** (we already added it but didn't grep for it) — to tell us whether tid=1's wait on `0x12a4` is the same LR as `sub_825070F0`-chain or a totally different signaler. If different, it's a 3rd missing producer.
## Round 4 — wait-side guest LR via one-frame back-chain walk
After fixing the PPC stack-walk offset (Xbox 360 stores saved LR at `[prev_sp - 8]`, not at the `+4` slot used by the 32-bit SysV PowerPC convention), wait-side LR comes through cleanly.
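A one-frame back-chain walk under that layout might look like the sketch below. It is written in Rust for illustration only (the real probes live in canary's `xboxkrnl_threading.cc`); the `GuestWords` memory model and the function name are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical guest memory view: address -> already-decoded 32-bit word.
type GuestWords = HashMap<u32, u32>;

/// Walk one frame up the back chain and fetch the caller's saved LR.
/// Assumes `[sp]` holds the previous stack pointer and the saved LR
/// sits at `[prev_sp - 8]`, as stated in the text.
fn caller_lr(mem: &GuestWords, sp: u32) -> Option<u32> {
    let prev_sp = *mem.get(&sp)?;
    if prev_sp == 0 {
        return None; // end of the back chain
    }
    mem.get(&prev_sp.checked_sub(8)?).copied()
}
```

Returning `Option` keeps an unmapped or terminated chain from being reported as a bogus LR.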
**Canary's top wait sites:**
| canary handle | wait count | guest_lr | LR region | mapping |
|---|---:|---|---|---|
| `F800005C` | 1635 | `0x8216EE14` | kernel early-boot infra | unrelated |
| `F800000C` | 1597 | `0x824AFFC4` | xboxkrnl wrapper (scheduler / work-queue?) | unrelated |
| **`F80000DC`** | **476** | **`0x821C7D3C`** | **silph::UImpl/GamePart** | **= ours' 0x12ac silph UI wedge** |
| `F80000B0` | 6 across | `0x821CBAE0` + `0x821CC19C` + `0x822DFE2x/D0` | **exact match with audit-049's frame trail** | sibling silph UI wait |
Identity proof: ours' audit-049 frame trail for the silph UI wedge was `0x821cb1e0 / 0x821cbae0 / 0x821cc454 / 0x821c4f18 / 0x82174a80`. Round 4 captures `0x821CBAE0` and `0x821CC19C` (adjacent PCs) as wait LRs in canary — same cluster, same code.
**Refined verdict.** ours' `0x12a4` (tid=1 main, AUDIT_BLIND) and `0x12ac` (tid=13 silph UI) are 8 bytes apart — likely sibling KEVENT fields in the same silph UI struct. canary's analogs are in the `F80000xx` namespace, similarly clustered. The single fix that addresses both:
> **2.BF (b)** — synthetic host-side spawn of `sub_825070F0`'s 4 workers at the audit-058-identified context (`0xBCE25340`), entries `0x82506528/58/88/B8`. Once those workers run, they signal the silph UI PKEVENT cluster, unwedging BOTH tid=1 main and tid=13 silph UI in one shot.
2.BE (host-driven VSync ISR delivery) becomes follow-on work after the UI bootstrap completes and frame pacing actually matters.
## Open questions for iterate 2.BD / 2.BE planning
1. **Does 2.BE alone unwedge tid=13?** Cheapest verification path: land 2.BE and re-run audit-059, see whether `0x12ac` signal count goes 0 → non-zero.
2. **What is the LR-pattern of canary's `KeSetEvent guest_ptr=0xBCE25234` callers?** The current probe doesn't capture LR — extending the cvar to do so on a filtered subset would let us name the producer function in canary's namespace.
3. **Does the GPU frame-limiter's CP interrupt actually walk into the silph UI cluster?** I.e., does `EmulateCPInterruptDPC` → `interrupt_callback` → guest code ever hit `sub_821CB030` or its callees? An LR probe inside `EmulateCPInterruptDPC` would answer this.
## Artifacts
- `canary.log` 2.2 MB / 34,095 lines / 32,977 AUDIT-HLC lines
- `canary.stdout` 2.2 MB (duplicate of canary.log due to log_file fallback)
- `canary.stderr` 8.4 KB (Wine diagnostics)
- `ours.log` 479 lines (focus ledger + thread diagnostics + final state)
- `ours.stderr` 317 lines (kernel-call counters)
- `vkd3d-proton.cache.write` 15 KB (build artifact, ignored)
Commits in play (xenia-canary, fork-local only):
- `03362b59f` cross-build-wine (cross-compile toolchain)
- `d031d7c51` audit-handle-lifecycle-probes (this audit's probes)

@@ -0,0 +1,116 @@
# Round 34 — silph_ui_synth.rs (cluster B sibling) — DEFERRED PLAN
## Background
Rounds 23-33 drove γ-cluster #2 down to the actual gate: **`sub_821741C8`** (silph worker-dispatch loop) fires 0× in ours / 471× in canary (tid=6). It's invoked via dynamic vtable slot 9 from `sub_821752C0` thunk. The vtable writer is in the audit-050 unreachability island — there's no static caller chain to hook into.
The fix shape is a synth module analogous to `silph_synth.rs` (rounds 18-21):
- Synthesize a singleton-like object with the right vtable
- Spawn a guest thread at the right entry with this object as r3
- Let the dispatch chain do the rest
Rounds 18-21 took 4 rounds to land cluster A's analog and ended at "workers run live but idle" because of missing foreign-pointer fields. Cluster B will face similar challenges.
## Sub-round breakdown (estimated 5-8 rounds)
### 34.α — Probe canary's dispatcher singleton (1 round)
Capture canary's runtime state at `sub_821741C8` entry:
- `r3 = 0xBCA44C00` (canary tid=6's dispatcher singleton)
- Dump `r3..r3+0x80` to identify all fields
- Note vtable address at `[r3+0]`
```bash
WINEDEBUG=-all wine xenia_canary.exe --mute=true --audit_handle_lifecycle=true \
--audit_jit_prolog_pc=0x821741C8 --audit_jit_prolog_r3_bytes=128 \
--audit_jit_prolog_mem_dump=<vtable_va_from_r3+0> \
...
```
### 34.β — Probe full vtable layout (1 round)
Read the vtable bytes statically from the PE (canary's `[r3+0]` IS a static XEX VA — same trick as round 21):
- Read 32-64 slots from PE at file offset = vtable VA - 0x82000000
- Confirm slot 9 = `sub_821C7CB8` and `vtable+0x24` thunk to `sub_821741C8`
- Look at all other slots — do any reference deep guest code that needs more init?
Cross-reference each slot's DB reach. If a slot is the dispatcher's own method body, it'll be called from within the chain — needs to exist.
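A sketch of the static slot read, assuming the image is available as a flat byte buffer in which `offset = VA - 0x82000000` holds (as the step above states) and that slots are 4-byte big-endian guest pointers. The function name is hypothetical.

```rust
/// Image base used by the "file offset = VA - 0x82000000" rule.
const IMAGE_BASE: u32 = 0x8200_0000;

/// Read `count` big-endian 32-bit vtable slots starting at `vtable_va`.
/// Returns None if the requested span falls outside `image`.
fn read_vtable_slots(image: &[u8], vtable_va: u32, count: usize) -> Option<Vec<u32>> {
    let start = vtable_va.checked_sub(IMAGE_BASE)? as usize;
    let end = start.checked_add(count.checked_mul(4)?)?;
    let bytes = image.get(start..end)?;
    Some(
        bytes
            .chunks_exact(4)
            .map(|w| u32::from_be_bytes([w[0], w[1], w[2], w[3]]))
            .collect(),
    )
}
```

With this layout slot 9 is the element at index 9 of the returned vector, i.e. byte offset `+0x24` from the vtable base.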
### 34.γ — Skeleton synth + thread spawn (1 round)
Create `crates/xenia-kernel/src/silph_ui_synth.rs` mirroring `silph_synth.rs` structure:
```rust
pub fn spawn_silph_ui_dispatcher(
    state: &mut KernelState,
    mem: &GuestMemory,
    scheduler: &mut Scheduler,
) -> Result<u32, &'static str> {
    if state.silph_ui_synth_done {
        return Ok(state.silph_ui_synth_ctx);
    }
    // Allocate ~0x100-0x200 bytes for the dispatcher singleton
    let ctx = state.heap_alloc(0x200, 16)?;
    mem.write_zeros(ctx, 0x200);
    // Install static-XEX vtable at [+0]
    mem.write_u32(ctx + 0x00, VTABLE_VA); // discovered in 34.β
    // Other init fields from 34.α dump
    // ...
    // Spawn dispatcher thread at sub_821748F0 with r3=ctx
    scheduler.spawn(SpawnParams {
        entry: 0x821748F0,
        start_context: ctx,
        create_suspended: false,
        ...
    })?;
    state.silph_ui_synth_done = true;
    state.silph_ui_synth_ctx = ctx;
    Ok(ctx)
}
```
Hook point: first reach of `sub_821CB030` in the existing silph factory chain (the call site that should normally trigger this dispatcher's creation in canary).
Add 3-mode env gate: `XENIA_SILPH_UI_SYNTH={unset|=suspend|=1}`.
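One way the 3-mode gate could be parsed is sketched below. The enum, the function names, and the meaning attached to `suspend` (spawn the thread suspended) are assumptions for illustration; only the variable name and its three values come from the plan.

```rust
/// The three modes named in the plan.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum SilphUiSynthMode {
    Off,     // variable unset: default behavior
    Suspend, // "suspend": assumed to mean "spawn the thread suspended"
    On,      // "1": spawn and run
}

/// Map the raw variable value to a mode; unrecognized values fall back to Off.
fn parse_mode(raw: Option<&str>) -> SilphUiSynthMode {
    match raw.map(str::trim) {
        Some("suspend") => SilphUiSynthMode::Suspend,
        Some("1") => SilphUiSynthMode::On,
        _ => SilphUiSynthMode::Off,
    }
}

/// Read the gate from the process environment.
fn mode_from_env() -> SilphUiSynthMode {
    parse_mode(std::env::var("XENIA_SILPH_UI_SYNTH").ok().as_deref())
}
```

Falling back to `Off` on unknown values keeps the "default behavior unchanged" guarantee even when the variable is mistyped.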
### 34.δ — Run + diagnose first crash (1 round)
Almost certainly crashes on a NULL deref of one of the singleton's fields. Use round 19's pattern:
- Probe at thread entry + early BB heads
- Identify the offset that's accessed
- Compare to canary's value at that offset
### 34.ε..η — Iterate on field fills (2-4 rounds)
Each crash identifies one more required field. Fill it. Re-run. Continue until workers idle (verdict D analog).
### 34.θ — Producer-side seeding (1 round)
Even with the dispatcher running, work-items may not flow. Per round 32 it's pool 3 that's starved (271 fires in canary). The producers are `sub_821CBEA8 / sub_821D24A0 / sub_821CD458` — they may need their own bootstrap. Probe what triggers them in canary.
## Verification at each stage
After every commit:
- `cargo test --release --workspace` — 765/765 must pass
- `XENIA_CACHE_PERSIST=1 XENIA_SILPH_UI_SYNTH=1 ./target/release/xenia-rs exec <ISO> -n 50000000 --trace-handles-focus=0x1218,0x1224,0x12a4,0x12ac`
- Check:
- No crash
- `sub_821741C8` fires
- `sub_82450b68` r4=3 fires increase
- Handle 0x1224 / 0x1218 transition out of NO_SIGNALS_DESPITE_WAITS
- Eventually: `VdSwap > 1, draws > 0`
## Risk register
- **High**: dispatcher singleton may require many more fields than the analog WorkerCtx (rounds 18-21 needed 8 KEVENTs + ring + descriptors + index table; UI dispatcher likely has similar scope)
- **High**: foreign-arena pointers in canary's heap (similar to round 19's `[+0x28/+0x2C/+0x30]`) may need their own synthesis
- **Medium**: cluster B's worker may itself spawn threads which need contexts which need... cascading scope
- **Low**: workspace tests breaking (probe infrastructure is solid)
- **Low**: existing iterate-2BE work regressing (it's on a separate branch)
## Off-ramps
If we hit a wall at any sub-round, the off-ramps are:
1. Land the infrastructure as opt-in (rounds 18-21 pattern) and ship cluster A + cluster B both as opt-in env vars
2. Drop cluster B entirely and PR the iterate-2BE work to master (production-ready architectural fix)
3. Pivot to lockstep diff of inflate function (round 30 hypothesis (i)) if cluster B keeps producing crash-fix layers
## Branch plan
New branch: `iterate-2BF/silph-ui-synth` off `iterate-2BF/synthetic-silph-spawn` HEAD `40f208e`. Each sub-round = 1 commit. All commits opt-in via env var; default behavior unchanged.
## When ready to execute
Dispatch with the prompt at the round-33 agent's recommendation, starting at sub-round 34.α.

@@ -0,0 +1,66 @@
AUDIT-PC-PROBE pc=0x8216ea68 tid=1 hw=0 cycle=5362918 lr=0x824ab8e0 r3=0x00000000 r11=0x00000000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
AUDIT-PC-PROBE pc=0x822f1aa8 tid=1 hw=0 cycle=6181256 lr=0x8216ee14 r3=0x40d09a40 r11=0x40111910 [r3+0]=0x00000021 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x40541a40 [r3+0x30]=0x00000000
AUDIT-PC-PROBE pc=0x822f1b38 tid=1 hw=0 cycle=6181641 lr=0x822f1b38 r3=0x00000001 r11=0x824b0000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
AUDIT-PC-PROBE pc=0x821746b0 tid=1 hw=0 cycle=9229300 lr=0x82173c38 r3=0x40ba9a80 r11=0x00000000 [r3+0]=0x40111910 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
AUDIT-PC-PROBE pc=0x821748f0 tid=13 hw=1 cycle=0 lr=0xbcbcbcbc r3=0x4024a840 r11=0x00000000 [r3+0]=0x40ba9a80 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x4250dec0
=== Final State ===
PC: 0x00000000
LR: 0xbcbcbcbc
CTR: 0x00000000
CR: 0x00000000
XER: CA=0 OV=0 SO=0
=== Thread diagnostics ===
hw=0 idx=0 tid=1 state=Blocked(WaitAny { handles: [4208], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x700ff6e0
r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a72328
r8=0x43b77284 r9=0x43b77328 r10=0x00000001 r11=0x00000103 r12=0x82173c64 r13=0x7fff0000
hw=0 idx=1 tid=11 state=Blocked(WaitAny { handles: [2190094916, 2190094880], deadline: None }) pc=0x824d2a94 lr=0x824d2a94 sp=0x71497d90
r0=0x00000000 r3=0x00000000 r4=0x71497de0 r5=0x00000001 r6=0x00000003 r7=0x00000001
r8=0x00000000 r9=0x00000000 r10=0x71497df0 r11=0x828a3244 r12=0xbcbcbcbc r13=0x4b9f1000
hw=1 idx=0 tid=2 state=Blocked(WaitAny { handles: [2189887804], deadline: None }) pc=0x824a95f8 lr=0x824a95f8 sp=0x710ffd20
r0=0x0000030c r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x00000001 r9=0x6f000000 r10=0x824a9178 r11=0x82870000 r12=0x824a94f0 r13=0x4acc3000
hw=1 idx=1 tid=13 state=Blocked(WaitAny { handles: [4216], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x715a7a20
r0=0x821511d0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b77334 r9=0x43b77334 r10=0x40541f80 r11=0x00000001 r12=0x821cb1e0 r13=0x4d1d4000
hw=2 idx=0 tid=7 state=Blocked(WaitAny { handles: [1111821148], deadline: Some(42946672) }) pc=0x824cd4f4 lr=0x824cd4f4 sp=0x71187e60
r0=0x00000000 r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x71187eb0
r8=0x00000000 r9=0x00000000 r10=0x00000002 r11=0x00000002 r12=0xbcbcbcbc r13=0x4b1d6000
hw=2 idx=1 tid=8 state=Blocked(WaitAny { handles: [4176, 4128], deadline: None }) pc=0x824ab214 lr=0x824ab214 sp=0x71287c90
r0=0x00000000 r3=0x00000000 r4=0x71287cf0 r5=0x00000001 r6=0x00000001 r7=0x00000000
r8=0x00000000 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x822f1ff0 r13=0x4b90a000
hw=3 idx=0 tid=4 state=Blocked(WaitAny { handles: [4120], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7112fb80
r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b7732c r9=0x828f0000 r10=0x00000008 r11=0x00000000 r12=0x8245a660 r13=0x4adc6000
hw=3 idx=1 tid=5 state=Blocked(WaitAny { handles: [4224], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7116fbe0
r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b7732c r9=0x828f0000 r10=0x00000001 r11=0x00000000 r12=0x82458b34 r13=0x4adc8000
hw=4 idx=0 tid=9 state=Ready pc=0x824d140c lr=0x824d22b4 sp=0x71387df0
r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x4b9ec000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ec000
hw=5 idx=0 tid=3 state=Blocked(WaitAny { handles: [4112], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7111fdf0
r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x00000a10
r8=0x00000010 r9=0x00000000 r10=0x00009030 r11=0x00000000 r12=0x82181988 r13=0x4adc4000
hw=5 idx=1 tid=6 state=Ready pc=0x824ab214 lr=0x824ab214 sp=0x7117fc60
r0=0x821511a0 r3=0x00000001 r4=0x7117fcc0 r5=0x00000001 r6=0x00000001 r7=0x00000000
r8=0x7117fcb0 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x82458d68 r13=0x4adca000
hw=5 idx=2 tid=10 state=Ready pc=0x824d1404 lr=0x824d22b4 sp=0x71487e00
r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x4b9ee000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ee000
hw=5 idx=3 tid=12 state=Ready pc=0x824aa6a4 lr=0x824aa6a4 sp=0x714a7da0
r0=0x00000000 r3=0x000000ff r4=0x00000020 r5=0x714a7df4 r6=0x00000000 r7=0x00000000
r8=0x00000000 r9=0x00000000 r10=0x00000000 r11=0x00000001 r12=0x8217898c r13=0x4d1d2000
-- Handle waiter lists --
handle=0x00001020 Semaphore(0/2147483647) waiters(tid)=[8]
handle=0x42450b5c Event(sig=false, mr=true) waiters(tid)=[7]
handle=0x828a3244 Event(sig=false, mr=false) waiters(tid)=[11]
handle=0x00001018 Semaphore(0/2147483647) waiters(tid)=[4]
handle=0x8287093c Event(sig=false, mr=false) waiters(tid)=[2]
handle=0x00001070 Thread(id=13, exit=None) waiters(tid)=[1]
handle=0x00001080 Event(sig=false, mr=false) waiters(tid)=[5]
handle=0x00001078 Event(sig=false, mr=false) waiters(tid)=[13]
handle=0x828a3220 Event(sig=false, mr=true) waiters(tid)=[11]
handle=0x00001050 Event(sig=false, mr=true) waiters(tid)=[8]
handle=0x00001010 Event(sig=false, mr=true) waiters(tid)=[3]

@@ -0,0 +1,167 @@
# Round-A1..A4 findings — canary tid=6 spawn chain & divergence frontier
## Anchor reframe (round-37 misread corrected)
The "factory/registry layer divergence at [0x828E1F08]" framing is falsified.
Both engines install the SAME static-XEX `.rdata` vtable `0x820A183C` at the
singleton's `[+0]`. The instance VAs differ only because of ε-class allocator
divergence (audit-043).
| Probe | Canary | Ours |
|----------------------------|----------------------|----------------------|
| `[0x828E1F08]` | 0xBC22C910 (heap) | 0x40111910 (heap) |
| `[[0x828E1F08]+0]` vtable | 0x820A183C | 0x820A183C (SAME) |
| `vtable[+0]` thunk | 0x82175330 | 0x82175330 (SAME) |
| `vtable[+8]` thunk | 0x82175340 → b sub_821741C8 | SAME (vtable bytes from XEX `.rdata`) |
The thunks at 0x82175330+ are 8-byte `lwz r3, 8(r3); b <real_method>`
trampolines. Slot 2 (`+0x08`) is the worker dispatch entry that round 33
identified as 471× in canary tid=6 / 0× in ours.
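To illustrate how the `b <real_method>` half of such a thunk resolves, here is a sketch of decoding a PowerPC I-form branch (primary opcode 18). It assumes the `b` sits 4 bytes into the 8-byte thunk; the function names are hypothetical, and any instruction word fed to it here is constructed for the example rather than read from the XEX.

```rust
/// Decode the target of a PowerPC I-form branch (`b`/`bl`/`ba`/`bla`).
/// Returns None if `word` is not a primary-opcode-18 instruction.
fn branch_target(pc: u32, word: u32) -> Option<u32> {
    if (word >> 26) != 18 {
        return None;
    }
    // LI field occupies 0x03FF_FFFC (already a byte displacement);
    // its sign bit is 0x0200_0000.
    let mut disp = word & 0x03FF_FFFC;
    if (disp & 0x0200_0000) != 0 {
        disp |= 0xFC00_0000; // sign-extend to 32 bits
    }
    let absolute = (word & 0x2) != 0; // AA bit
    Some(if absolute { disp } else { pc.wrapping_add(disp) })
}

/// Byte offset of vtable slot `n` with 4-byte guest pointers.
fn slot_offset(n: u32) -> u32 {
    n * 4
}
```

For example, a relative `b` encoded as `0x4BFFEE84` at PC `0x82175344` decodes to `0x821741C8`, and `slot_offset(2)` is `8`, matching the `+0x08` slot above.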
## A.1 — Canary dispatcher loop is in sub_822F1AA8 on tid=6
Probe `--audit_jit_prolog_pc=0x821741C8 --audit_jit_prolog_r3_bytes=256` on
canary (35 s):
- ~1678 fires of sub_821741C8 on **tid=6**
- r3 at entry = `0xBCCC4A80` (the inner sub-object of the silph::UImpl
singleton — extracted via the thunk's `lwz r3, 8(r3)`)
- LR at entry = `0x822F1D5C` (return PC after the `bctrl` at 0x822F1D58 inside
sub_822F1AA8)
- Singleton's `[+C0..+D0]` UTF-16 spells "HF Frequency" (a UI label)
The dispatch site in canary (the `bctrl`) is at PC 0x822F1D58 inside
sub_822F1AA8:
```
0x822F1D40: lwz r3, 7944(r25) ; r3 = [r25+0x1F08] = [0x828E1F08]
0x822F1D4C: lwz r11, 0(r3) ; vtable
0x822F1D50: lwz r11, 8(r11) ; vtable[+8] = thunk 0x82175340
0x822F1D54: mtctr r11
0x822F1D58: bctrl ; → 0x82175340 → b 0x821741C8
```
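The load sequence in the listing can be replayed against captured memory to confirm where the `bctrl` lands. This is a sketch over a hypothetical word-addressed memory model, seeded by the caller with values from the probe table above; it is not code from either engine.

```rust
use std::collections::HashMap;

/// Hypothetical guest memory model: address -> already-decoded 32-bit word.
type GuestWords = HashMap<u32, u32>;

/// Follow global -> object -> vtable -> slot, mirroring the listing:
/// lwz r3, [global]; lwz r11, 0(r3); lwz r11, slot_off(r11).
fn resolve_virtual_call(mem: &GuestWords, global: u32, slot_off: u32) -> Option<u32> {
    let object = *mem.get(&global)?;
    let vtable = *mem.get(&object)?;
    mem.get(&vtable.wrapping_add(slot_off)).copied()
}
```

Seeded with canary's values (`[0x828E1F08]=0xBC22C910`, vtable `0x820A183C`, `vtable[+8]=0x82175340`) the call resolves to the thunk `0x82175340`; swapping in ours' instance VA `0x40111910` gives the same target because the vtable is shared.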
## A.2 — Canary tid=6 spawn site is sub_821746B0 at PC 0x82174824
Enumeration of `ExCreateThread` calls in canary (35 s, 21 unique tuples):
```
entry=821748F0 start_ctx=BC365700 lr=824AC5F0 guest_lr=82174828 ← silph dispatcher #1
entry=821748F0 start_ctx=BC366DA0 lr=824AC5F0 guest_lr=82174828 ← silph dispatcher #2
```
PC `0x82174824` is the `bl 0x82172370` (the `ExCreateThread` thunk) inside
`sub_821746B0`. The setup is:
```
0x8217480C: lis r11, 0x8217
0x82174810: li r7, 0
0x82174814: li r6, 4 ; priority
0x82174818: mr r5, r29 ; start_ctx
0x8217481C: addi r4, r11, 18672 ; r4 = 0x821748F0 (entry)
0x82174820: li r3, 0
0x82174824: bl 0x82172370 ; ExCreateThread
```
The entry `0x821748F0` is a thread main that calls `bl 0x821749C0` (the
inner dispatch).
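The entry address in the setup above comes from a `lis`/`addi` pair. A short sketch of that arithmetic, with a hypothetical helper name:

```rust
/// Effective value of `lis rX, hi` followed by `addi rY, rX, lo`.
/// `lis` places `hi` in the upper 16 bits; `addi` adds a
/// sign-extended 16-bit immediate.
fn lis_addi(hi: u16, lo: i16) -> u32 {
    (u32::from(hi) << 16).wrapping_add(lo as i32 as u32)
}
```

`lis_addi(0x8217, 18672)` yields `0x821748F0`, the thread entry. The sign extension matters when the low half is 0x8000 or above: the assembler then bumps the `lis` operand by one and uses a negative `addi` immediate.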
## A.3 — sub_822F1AA8 spawns a SECOND thread at 0x822F1B08
The dispatch-loop function `sub_822F1AA8` itself ALSO spawns a thread at
PC 0x822F1B08 with entry=`sub_822F1EE0` and `start_ctx=BCE24A40`:
```
0x822F1AEC: lis r11, 0x822F
0x822F1AFC: addi r4, r11, 7904 ; r4 = 0x822F1EE0
0x822F1B08: bl 0x82172370 ; ExCreateThread
```
sub_822F1EE0 → sub_822F1F20 contains its own atomic state-machine + wait loop.
## A.3' — sub_822F1AA8 has exactly 2 callers, both in sub_8216EA68
```
source=0x8216ECCC source_func=0x8216EA68 kind=call
source=0x8216EE10 source_func=0x8216EA68 kind=call
```
So sub_8216EA68 is the only function that drives sub_822F1AA8.
## A.4 — Ours' divergence is INSIDE the spawned thread, NOT at the spawn
Mirror-probed ours at `sub_821746B0` body BB heads (parallel mode, 50M
instructions, XENIA_CACHE_PERSIST=1):
| PC | Fires | Notes |
|-------------|-------|------------------------------------------------|
| 0x821746B0 | 1 | Entry. r3=0x40ba9a80 |
| 0x821746E0 | 1 | After `bl 0x8284DCFC` (critical-section) |
| 0x82174798 | 1 | After the early `beq` (r28==0 branch) |
| 0x821747B8 | 1 | **Past the gate**: `[0x828E2B14]=0x40105000` non-NULL; `bl 0x82150EF8` returned r3=0x4024a840 (NON-NULL) |
| 0x821747D8 | 1 | After the inner `bl 0x821723F0` |
| 0x8217480C | 1 | Enters the spawn block |
| 0x82174828 | 1 | **Post-`bl ExCreateThread`**, r3=0x1070 = thread handle |
**OURS DOES SPAWN THE THREAD VIA THIS SITE.** The returned handle 0x1070 is
**tid=13's thread handle** (per round 37 final state). So **ours' tid=13 IS
the same logical thread as canary's tid=6** — spawned by the identical call
site with the same entry (0x821748F0).
## A.4' — Divergence is INSIDE the spawned thread's body
Round 37's frame trail for ours' tid=13 wedge:
`0x821CB1E0 → 0x821CBAE0 → 0x821CC454 → 0x821C4F18 → 0x82174A80`
The LAST frame `0x82174A80` is **inside sub_821749C0** (= the inner dispatch
called from sub_821748F0). It is the return address of the vtable dispatch
(`mtctr` at 0x82174A78, then `bctrl` at 0x82174A7C on `[r30+vtable][+16]`):
```
0x82174a64: mr r3, r30 ; r3 = some object
0x82174a68: lwz r11, 0(r30)
0x82174a6c: lwz r4, 4(r29)
0x82174a70: lwz r5, 8(r31)
0x82174a74: lwz r11, 16(r11) ; r11 = vtable[+0x10]
0x82174a78: mtctr r11
0x82174a7c: bctrl ; dispatch
0x82174a80: lwz r3, 0(r29) ; ← wedge frame top (LR after bctrl)
```
So `sub_821749C0`'s vtable[+0x10] dispatch on tid=13/tid=6's `r30` object
lands at audit-049 territory in ours (chain through sub_821CB030+0x128 that
ends waiting forever on handle 0x1078). In canary, the same dispatch on the
same object SHOULD land somewhere that ultimately reaches sub_822F1AA8's
dispatch loop and runs sub_821741C8 1678× via vtable[+8].
**The object `r30` is the result of `bl 0x821CF3F0`** at PC 0x821749DC. So
sub_821CF3F0 returns a registry-lookup object; the vtable on this object's
slot +0x10 method's body determines whether the thread wedges or runs.
## Phase B classification
Class 3 — **Missing init-time precondition**. Ours reaches the spawn site,
ours' tid=13 enters the chain, ours' tid=13 enters sub_821749C0, but the
vtable[+0x10] dispatch at PC 0x82174A78 in ours lands in audit-049 territory
(wait forever on 0x1078) rather than continuing through the canonical chain
toward sub_822F1AA8's outer dispatch loop.
Possible classes to refine in next round:
- **3a**: same vtable but state-dependent — `r30`'s field at a specific offset
differs in ours vs canary, causing the method body to take a different
branch.
- **3b**: the vtable in `r30` is DIFFERENT in ours vs canary (e.g., ours has
a base-class vtable but canary has a derived-class vtable).
- **4**: synthesis fallback — spawn a SECOND thread that runs sub_822F1AA8's
dispatch loop directly, bypassing the wedged sub_821749C0 chain.
## Next probe (A.4.5)
Probe both engines at sub_821749C0 entry filtering tid=13 (ours) / tid=6
(canary), capturing:
- `r3` and `r4` at entry (the factory-output object and the ctx)
- After the `bl 0x821CF3F0` at 0x821749DC: capture r30 (= sub_821CF3F0
return — the object whose vtable is dispatched at 0x82174A78)
- At PC 0x82174A78 (the `mtctr` feeding the divergent `bctrl` at 0x82174A7C):
r30, [r30+0] (vtable), and vtable[+0x10] (the dispatch target)
If ours and canary have IDENTICAL `vtable[+0x10]` targets but the method
body's behavior differs → class 3a (state divergence). If targets differ →
class 3b (vtable identity divergence).
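The decision rule above can be written as a small comparison over the values captured at the dispatch site. This is a minimal sketch: `DispatchSample`, `Divergence`, and `classify` are hypothetical names that exist in neither engine. It compares only image-relative values (vtable pointer, dispatch target), because heap object addresses differ between engines and are not directly comparable.

```rust
/// Values captured at the dispatch site for one engine (hypothetical struct).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DispatchSample {
    /// [r30+0]: the vtable pointer of the dispatched object.
    pub vtable: u32,
    /// vtable[+0x10]: the guest address the bctrl jumps to.
    pub target: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Divergence {
    /// Same target in both engines: the method body must branch on object state.
    StateDependent3a,
    /// Different targets: the object carries a different vtable or slot value.
    VtableIdentity3b,
}

/// Apply the 3a/3b rule: identical dispatch targets mean state divergence,
/// differing targets mean vtable identity divergence.
pub fn classify(ours: DispatchSample, canary: DispatchSample) -> Divergence {
    if ours.target == canary.target {
        Divergence::StateDependent3a
    } else {
        Divergence::VtableIdentity3b
    }
}
```

Feeding it the two probe captures gives the class directly; the vtable field is kept so a 3b result can be reported with both vtable pointers for the follow-up round.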

@@ -0,0 +1,91 @@
AUDIT-PC-PROBE pc=0x821746b0 tid=1 hw=0 cycle=9228833 lr=0x82173c38 r3=0x40ba9a80 r11=0x00000000 [r3+0]=0x40111910 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x821746b0 tid=1 cycle=9228833
AUDIT-PC-PROBE pc=0x821746e0 tid=1 hw=0 cycle=9228856 lr=0x821746e0 r3=0x00000000 r11=0x00000000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x821746e0 tid=1 cycle=9228856
AUDIT-PC-PROBE pc=0x82174798 tid=1 hw=0 cycle=9228859 lr=0x821746e0 r3=0x00000000 r11=0x00000000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x82174798 tid=1 cycle=9228859
AUDIT-PC-PROBE pc=0x821747b8 tid=1 hw=0 cycle=9229012 lr=0x821747ac r3=0x4024a840 r11=0x4024a840 [r3+0]=0x4024ace0 [[r3+0]+24]=0x43777290 [r3+0x0C]=0x4024a820 [r3+0x30]=0x4250dec0
AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x821747b8 tid=1 cycle=9229012
AUDIT-PC-PROBE pc=0x821747d8 tid=1 hw=0 cycle=9229440 lr=0x821747cc r3=0x4024a840 r11=0xffffffff [r3+0]=0x40ba9a80 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x4250dec0
AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x821747d8 tid=1 cycle=9229440
AUDIT-PC-PROBE pc=0x8217480c tid=1 hw=0 cycle=9229443 lr=0x821747cc r3=0x4024a840 r11=0xffffffff [r3+0]=0x40ba9a80 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x4250dec0
AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x8217480c tid=1 cycle=9229443
AUDIT-PC-PROBE pc=0x82174828 tid=1 hw=0 cycle=9229509 lr=0x82174828 r3=0x00001070 r11=0x824b0000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
AUDIT-MEM-READ addr=0x828e2b14 val=0x40105000 vtable=0x40105004 vtable[0]=0x40105008 vtable[24]=0x40105020 pc=0x82174828 tid=1 cycle=9229509
=== Final State ===
PC: 0x824ac578
LR: 0x824ac578
CTR: 0x82153bf0
CR: 0x24000028
XER: CA=0 OV=0 SO=0
r0 : 0x0000000082153bf0
r1 : 0x00000000700ff6e0
r2 : 0x0000000020000000
r4 : 0x0000000000000001
r7 : 0x0000000003a72328
r8 : 0x0000000043b77284
r9 : 0x0000000043b77328
r10: 0x0000000000000001
r11: 0x0000000000000103
r12: 0x0000000082173c64
r13: 0x000000007fff0000
r18: 0x0000000040d09a7c
r23: 0x00000000828f3844
r26: 0x000000004024a620
r27: 0x00000000820a17a8
r31: 0x0000000000001070
=== Thread diagnostics ===
hw=0 idx=0 tid=1 state=Blocked(WaitAny { handles: [4208], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x700ff6e0
r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a72328
r8=0x43b77284 r9=0x43b77328 r10=0x00000001 r11=0x00000103 r12=0x82173c64 r13=0x7fff0000
hw=0 idx=1 tid=11 state=Blocked(WaitAny { handles: [2190094916, 2190094880], deadline: None }) pc=0x824d2a94 lr=0x824d2a94 sp=0x71497d90
r0=0x00000000 r3=0x00000000 r4=0x71497de0 r5=0x00000001 r6=0x00000003 r7=0x00000001
r8=0x00000000 r9=0x00000000 r10=0x71497df0 r11=0x828a3244 r12=0xbcbcbcbc r13=0x4b9f1000
hw=1 idx=0 tid=2 state=Blocked(WaitAny { handles: [2189887804], deadline: None }) pc=0x824a95f8 lr=0x824a95f8 sp=0x710ffd20
r0=0x0000030c r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x00000001 r9=0x6f000000 r10=0x824a9178 r11=0x82870000 r12=0x824a94f0 r13=0x4acc3000
hw=1 idx=1 tid=13 state=Blocked(WaitAny { handles: [4216], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x715a7a20
r0=0x821511d0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b77334 r9=0x43b77334 r10=0x40541f80 r11=0x00000001 r12=0x821cb1e0 r13=0x4d1d4000
hw=2 idx=0 tid=7 state=Blocked(WaitAny { handles: [1111821148], deadline: Some(42946672) }) pc=0x824cd4f4 lr=0x824cd4f4 sp=0x71187e60
r0=0x00000000 r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x71187eb0
r8=0x00000000 r9=0x00000000 r10=0x00000002 r11=0x00000002 r12=0xbcbcbcbc r13=0x4b1d6000
hw=2 idx=1 tid=8 state=Blocked(WaitAny { handles: [4176, 4132], deadline: None }) pc=0x824ab214 lr=0x824ab214 sp=0x71287c90
r0=0x00000000 r3=0x00000000 r4=0x71287cf0 r5=0x00000001 r6=0x00000001 r7=0x00000000
r8=0x00000000 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x822f1ff0 r13=0x4b90a000
hw=3 idx=0 tid=4 state=Blocked(WaitAny { handles: [4120], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7112fb80
r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b7732c r9=0x828f0000 r10=0x00000008 r11=0x00000000 r12=0x8245a660 r13=0x4adc6000
hw=3 idx=1 tid=5 state=Blocked(WaitAny { handles: [4224], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7116fbe0
r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b7732c r9=0x828f0000 r10=0x00000001 r11=0x00000000 r12=0x82458b34 r13=0x4adc8000
hw=4 idx=0 tid=9 state=Ready pc=0x824d140c lr=0x824d22b4 sp=0x71387df0
r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x4b9ec000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ec000
hw=5 idx=0 tid=3 state=Blocked(WaitAny { handles: [4112], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7111fdf0
r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x00000a10
r8=0x00000010 r9=0x00000000 r10=0x00009030 r11=0x00000000 r12=0x82181988 r13=0x4adc4000
hw=5 idx=1 tid=6 state=Ready pc=0x824ab214 lr=0x824ab214 sp=0x7117fc60
r0=0x821511a0 r3=0x00000001 r4=0x7117fcc0 r5=0x00000001 r6=0x00000001 r7=0x00000000
r8=0x7117fcb0 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x82458d68 r13=0x4adca000
hw=5 idx=2 tid=10 state=Ready pc=0x824d140c lr=0x824d22b4 sp=0x71487e00
r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x4b9ee000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ee000
hw=5 idx=3 tid=12 state=Ready pc=0x824aa6a4 lr=0x824aa6a4 sp=0x714a7da0
r0=0x00000000 r3=0x000000ff r4=0x00000020 r5=0x714a7df4 r6=0x00000000 r7=0x00000000
r8=0x00000000 r9=0x00000000 r10=0x00000000 r11=0x00000001 r12=0x8217898c r13=0x4d1d2000
-- Handle waiter lists --
handle=0x00001024 Semaphore(0/2147483647) waiters(tid)=[8]
handle=0x00001010 Event(sig=false, mr=true) waiters(tid)=[3]
handle=0x00001070 Thread(id=13, exit=None) waiters(tid)=[1]
handle=0x00001080 Event(sig=false, mr=false) waiters(tid)=[5]
handle=0x828a3244 Event(sig=false, mr=false) waiters(tid)=[11]
handle=0x00001018 Semaphore(0/2147483647) waiters(tid)=[4]
handle=0x00001050 Event(sig=false, mr=true) waiters(tid)=[8]
handle=0x00001078 Event(sig=false, mr=false) waiters(tid)=[13]
handle=0x8287093c Event(sig=false, mr=false) waiters(tid)=[2]
handle=0x828a3220 Event(sig=false, mr=true) waiters(tid)=[11]
handle=0x42450b5c Event(sig=false, mr=true) waiters(tid)=[7]

File diff suppressed because it is too large

@@ -0,0 +1,136 @@
# Phase A synthesis — canary tid=6 IS the main thread; the wedge is sub_822F1AA8's loop exit
## Top-line finding
**Canary's `tid=6` is canary's main thread.** Confirmed by probing `entry_point`
(`sub_824AB748`) with `--audit_jit_prolog_pc=0x824AB748`: fires 1× on
`tid=00000006` with `lr=BCBCBCBC` (= OS-initial / no caller). Ours numbers
its main thread `tid=1`. Same logical thread; different label.
Therefore "tid=6 fires sub_821741C8 471×" (round 33) means **the main thread**
loops inside `sub_822F1AA8` firing `sub_821741C8` ~1678×/30s in canary. In
ours, the main thread (tid=1) runs `sub_822F1AA8` ONCE, exits the loop, and
proceeds to thread-join on the spawned init thread (handle 0x1070 = tid=13),
which is itself blocked forever on handle 0x1078.
## Call chain (identical in both engines, different runtime behavior)
```
entry_point (sub_824AB748)
├─ sub_824ACB38 CRT-driven fnptr-array iterator (audit-050 region)
├─ ...
└─ sub_8216EA68 Many local calls including:
├─ ExCreateThread(entry=sub_8217F0F8 ...) ; sibling thread
├─ sub_822F1AA8(controller=...) ; FIRST call (PC 0x8216ECCC)
└─ sub_822F1AA8(controller=0xBCE24A40 canary / ; SECOND call (PC 0x8216EE10)
0x40d09a40 ours) ↑ this is the loop
```
The SECOND call is what runs the dispatcher loop. Its LR = 0x8216EE14.
Confirmed in both engines.
## sub_822F1AA8 loop structure
```
0x822F1AA8: entry, r30 = r3 (controller)
0x822F1AEC-0x822F1B08: ExCreateThread(entry=sub_822F1EE0, ctx=r30) → r29 = handle
0x822F1B30-0x822F1B34: bl 0x824AA8B0(r3=r29) ; ?
0x822F1B38-0x822F1B4C: first bctrl → vtable[+0] of [0x828E1F08]
0x822F1B50-0x822F1B74: setup, bl 0x824AA330 INFINITE wait on [r22+32]
0x822F1B80-0x822F1BA8: post-wait setup; [r30+0] |= 0x2
0x822F1BB0-0x822F1BBC: TOP-OF-LOOP CHECK: if [r30+0] & 0x10000000 → goto 0x822F1E10 (exit)
0x822F1BCC..0x822F1DEC: loop body (includes the vtable[+8] bctrl → sub_821741C8 at PC 0x822F1D58)
0x822F1DEC-0x822F1DFC: bl 0x824AA330 INFINITE wait on [r23+0]
0x822F1E00-0x822F1E0C: END-OF-ITERATION CHECK: if [r30+0] & 0x10000000 == 0 → goto 0x822F1BCC (re-loop)
0x822F1E10-0x822F1E18: EXIT: [r30+0] |= 0x02000000 (set MSB-6 = LSB-25)
0x822F1E1C-0x822F1E24: release something via bl 0x824AA2F0
0x822F1E28-0x822F1E30: bl 0x824AA330 INFINITE on [r30+28] = SPAWNED THREAD HANDLE (thread join!)
0x822F1E40: bl 0x824AA3E0
0x822F1E44-0x822F1E5C: final cleanup: vtable[+24] bctrl on [0x828E1F08]
0x822F1E60-0x822F1E78: [r30+0] = 0, then [r30+0] |= 1; bl 0x824567E0
0x822F1E7C-0x822F1E88: epilogue
```
**Loop exit gate**: `[r30+0] & 0x10000000` (bit 28 LSB / bit 3 MSB). Set →
exit. Both top-of-loop check (0x822F1BBC) and end-of-iteration check
(0x822F1E0C) gate on the same bit.
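The two bit-numbering conventions quoted above are easy to mix up, so here is a minimal sketch that derives both from the mask and evaluates the gate. The helper names are hypothetical; the masks and the `0x21` entry value are the ones reported in this note.

```rust
/// Loop-exit flag tested by sub_822F1AA8 on the controller word.
pub const EXIT_MASK: u32 = 0x1000_0000;
/// Flag OR-ed in on the exit path at 0x822F1E10.
pub const EXITED_MASK: u32 = 0x0200_0000;

/// LSB-0 bit index of a single-bit mask (bit 0 = least significant).
pub fn lsb_index(mask: u32) -> u32 {
    mask.trailing_zeros()
}

/// MSB-0 (PowerPC-style) bit index of a single-bit mask in a 32-bit word.
pub fn msb_index(mask: u32) -> u32 {
    mask.leading_zeros()
}

/// The gate used by both the top-of-loop and end-of-iteration checks.
pub fn should_exit(flags: u32) -> bool {
    (flags & EXIT_MASK) != 0
}
```

With the probed entry value `0x21`, and `0x23` after the `|= 0x2` in the post-wait setup, `should_exit` is false in both cases, so the loop can only terminate if something else sets the flag.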
## What's different between engines
| Engine | [r30+0] at entry | Loop iterations | Exits sub_822F1AA8? |
|--------|------------------|------------------|----------------------|
| canary | 0x21 (per probe) | ~1678+ in 30s | NO (stays in loop) |
| ours | 0x21 (per probe) | 0 (probes show none of the loop-body PCs fire after entry) | YES (exits quickly) |
Both engines have `[r30+0]=0x21` at entry — bit 28 NOT set. After the `ori
r11, r11, 0x2` at 0x822F1B90, both should have `[r30+0]=0x23`. Bit 28 still
not set.
So **some code sets bit 28 on [r30+0] between sub_822F1AA8 entry and the
loop check** in ours but not in canary.
Mem-watch on 0x40d09a40 (ours' controller VA) shows **zero guest writes** in
my 50M-instruction parallel run. Possible reasons:
- The setter writes from kernel/runtime code that mem-watch doesn't capture
(kernel-host store, not guest JIT store)
- The setter writes via a computed alias (different VA but same backing)
- The bit IS set via a probe-quantum-elided JIT store
## Phase B classification
**Class 3a — state-divergence on the controller object**. The vtable
identity is the same (round-37 confirmed `0x820A183C` in both). The
controller object's bit 28 of `[+0]` evolves differently during the setup
between sub_822F1AA8 entry and the loop check.
Class 4 (synthesis) is now LESS attractive: ours' main thread DOES reach
sub_822F1AA8 with the right controller. We don't need to spawn the
dispatcher — we need to PREVENT the main thread from exiting the loop.
## Pragmatic next step — JIT instrumentation to find bit-28 setter
Most direct diagnostic: add a JIT hook in xenia-cpu that, for guest stores
in the range [0x822F1AA8, 0x822F1E10), captures the guest PC + the written
value when the store would set bit 28 of any address. This identifies the
exact PC that sets the loop-exit bit.
Alternative: extend `--mem-watch` to also capture kernel-side stores by
hooking the GuestMemory write path at the kernel-state level.
Even simpler: add a one-shot `--bit-watch=ADDR:MASK` cvar that fires when
the value at ADDR has any bit in MASK transition from 0→1, regardless of
who wrote it. This is the cleanest diagnostic for this exact pattern.
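A minimal sketch of the core of such a `--bit-watch=ADDR:MASK` option follows. The cvar does not exist yet, so the spec format (two hex fields separated by `:`) and the names `BitWatch`, `parse_bit_watch`, and `rising_bits` are assumptions for illustration only.

```rust
/// One watched location: a guest address and the bits of interest.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BitWatch {
    pub addr: u32,
    pub mask: u32,
}

/// Parse "ADDR:MASK" where both fields are hex with an optional 0x prefix.
pub fn parse_bit_watch(spec: &str) -> Option<BitWatch> {
    let (addr, mask) = spec.split_once(':')?;
    let hex = |s: &str| {
        let s = s.trim();
        let s = s
            .strip_prefix("0x")
            .or_else(|| s.strip_prefix("0X"))
            .unwrap_or(s);
        u32::from_str_radix(s, 16).ok()
    };
    Some(BitWatch {
        addr: hex(addr)?,
        mask: hex(mask)?,
    })
}

/// Bits inside `mask` that were 0 in `old` and are 1 in `new`.
/// A non-zero result is the 0 -> 1 transition the watch should report.
pub fn rising_bits(old: u32, new: u32, mask: u32) -> u32 {
    !old & new & mask
}
```

Because the check compares sampled values rather than intercepting stores, it is indifferent to who performed the write. The cost is that it only observes the word at whatever points the engine chooses to sample it.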
## Fix shape (when bit-28 setter is identified)
If the bit-28 setter is inside the vtable[+0] dispatch chain at 0x822F1B4C
(target sub_82173990), then the fix might be a state-init issue in the
kernel/runtime.
If the bit-28 setter is inside the inner wait or one of the kernel calls
(`bl 0x824AA8B0`, `bl 0x824AA330`), the fix might be a missing event signal
or a wrong handle-state evolution.
If we can't identify the setter cleanly, the synthesis fallback is to
**inject a kernel-side hook that clears bit 28 of [r30+0] on every entry to
sub_822F1AA8's bit-check site (0x822F1BB0)**. Crude but should keep the
main thread in the loop.
## Why this is a clearer wedge picture than rounds 22-33
Rounds 22-33 chased the audit-049 wedge from various angles. The diagnoses
landed on different layers:
- R22: "wrong cluster targeted" (cluster A vs B)
- R26-30: "state-machine progression bug"
- R32-33: "pool 3 starvation; bootstrap walk-back"
This round establishes the simplest possible framing:
> **Canary's main thread loops forever in a dispatcher; ours' main thread
> exits the loop after one setup phase. The exit is gated by a single bit
> on the controller's flag word.**
If bit 28 of `[controller+0]` could be permanently cleared, ours' main
thread would stay in the loop, sub_821741C8 would dispatch, signals would
flow, tid=13 would complete, draws would happen.

@@ -0,0 +1,79 @@
AUDIT-PC-PROBE pc=0x822f1aa8 tid=1 hw=0 cycle=6180796 lr=0x8216ee14 r3=0x40d09a40 r11=0x40111910 [r3+0]=0x00000021 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x40541a40 [r3+0x30]=0x00000000
AUDIT-PC-PROBE pc=0x822f1b38 tid=1 hw=0 cycle=6181181 lr=0x822f1b38 r3=0x00000001 r11=0x824b0000 [r3+0]=0x00000000 [[r3+0]+24]=0x00000000 [r3+0x0C]=0x00000000 [r3+0x30]=0x00000000
=== Final State ===
PC: 0x824ac578
LR: 0x824ac578
CTR: 0x82153bf0
CR: 0x24000028
XER: CA=0 OV=0 SO=0
r0 : 0x0000000082153bf0
r1 : 0x00000000700ff6e0
r2 : 0x0000000020000000
r4 : 0x0000000000000001
r7 : 0x0000000003a72328
r8 : 0x0000000043b77284
r9 : 0x0000000043b77328
r10: 0x0000000000000001
r11: 0x0000000000000103
r12: 0x0000000082173c64
r13: 0x000000007fff0000
r18: 0x0000000040d09a7c
r23: 0x00000000828f3844
r26: 0x000000004024a4e0
r27: 0x00000000820a17a8
r31: 0x0000000000001070
=== Thread diagnostics ===
hw=0 idx=0 tid=1 state=Blocked(WaitAny { handles: [4208], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x700ff6e0
r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a72328
r8=0x43b77284 r9=0x43b77328 r10=0x00000001 r11=0x00000103 r12=0x82173c64 r13=0x7fff0000
hw=0 idx=1 tid=11 state=Blocked(WaitAny { handles: [2190094916, 2190094880], deadline: None }) pc=0x824d2a94 lr=0x824d2a94 sp=0x71497d90
r0=0x00000000 r3=0x00000000 r4=0x71497de0 r5=0x00000001 r6=0x00000003 r7=0x00000001
r8=0x00000000 r9=0x00000000 r10=0x71497df0 r11=0x828a3244 r12=0xbcbcbcbc r13=0x4b9f1000
hw=1 idx=0 tid=2 state=Blocked(WaitAny { handles: [2189887804], deadline: None }) pc=0x824a95f8 lr=0x824a95f8 sp=0x710ffd20
r0=0x0000030c r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x00000001 r9=0x6f000000 r10=0x824a9178 r11=0x82870000 r12=0x824a94f0 r13=0x4acc3000
hw=1 idx=1 tid=13 state=Blocked(WaitAny { handles: [4216], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x715a7a20
r0=0x821511d0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b77334 r9=0x43b77334 r10=0x40541f80 r11=0x00000001 r12=0x821cb1e0 r13=0x4d1d4000
hw=2 idx=0 tid=7 state=Blocked(WaitAny { handles: [1111821148], deadline: Some(42946672) }) pc=0x824cd4f4 lr=0x824cd4f4 sp=0x71187e60
r0=0x00000000 r3=0x00000000 r4=0x00000003 r5=0x00000001 r6=0x00000000 r7=0x71187eb0
r8=0x00000000 r9=0x00000000 r10=0x00000002 r11=0x00000002 r12=0xbcbcbcbc r13=0x4b1d6000
hw=2 idx=1 tid=8 state=Blocked(WaitAny { handles: [4176, 4132], deadline: None }) pc=0x824ab214 lr=0x824ab214 sp=0x71287c90
r0=0x00000000 r3=0x00000000 r4=0x71287cf0 r5=0x00000001 r6=0x00000001 r7=0x00000000
r8=0x00000000 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x822f1ff0 r13=0x4b90a000
hw=3 idx=0 tid=4 state=Blocked(WaitAny { handles: [4120], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7112fb80
r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b7732c r9=0x828f0000 r10=0x00000008 r11=0x00000000 r12=0x8245a660 r13=0x4adc6000
hw=3 idx=1 tid=5 state=Blocked(WaitAny { handles: [4224], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7116fbe0
r0=0x821511a0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x03a723d0
r8=0x43b7732c r9=0x828f0000 r10=0x00000001 r11=0x00000000 r12=0x82458b34 r13=0x4adc8000
hw=4 idx=0 tid=9 state=Ready pc=0x824d1404 lr=0x824d22b4 sp=0x71387df0
r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x4b9ec000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ec000
hw=5 idx=0 tid=3 state=Blocked(WaitAny { handles: [4112], deadline: None }) pc=0x824ac578 lr=0x824ac578 sp=0x7111fdf0
r0=0x82153bf0 r3=0x00000000 r4=0x00000001 r5=0x00000000 r6=0x00000000 r7=0x00000a10
r8=0x00000010 r9=0x00000000 r10=0x00009030 r11=0x00000000 r12=0x82181988 r13=0x4adc4000
hw=5 idx=1 tid=6 state=Ready pc=0x824ab214 lr=0x824ab214 sp=0x7117fc60
r0=0x821511a0 r3=0x00000001 r4=0x7117fcc0 r5=0x00000001 r6=0x00000001 r7=0x00000000
r8=0x7117fcb0 r9=0x00009030 r10=0x00000002 r11=0x00000020 r12=0x82458d68 r13=0x4adca000
hw=5 idx=2 tid=10 state=Ready pc=0x824d1404 lr=0x824d22b4 sp=0x71487e00
r0=0x00000000 r3=0x4250dedc r4=0x4250e040 r5=0x00000001 r6=0x00000000 r7=0x00000000
r8=0x4b9ee000 r9=0x01010000 r10=0x01010000 r11=0x00000000 r12=0x824d22a8 r13=0x4b9ee000
hw=5 idx=3 tid=12 state=Ready pc=0x824aa6a4 lr=0x824aa6a4 sp=0x714a7da0
r0=0x00000000 r3=0x000000ff r4=0x00000020 r5=0x714a7df4 r6=0x00000000 r7=0x00000000
r8=0x00000000 r9=0x00000000 r10=0x00000000 r11=0x00000001 r12=0x8217898c r13=0x4d1d2000
-- Handle waiter lists --
handle=0x00001018 Semaphore(0/2147483647) waiters(tid)=[4]
handle=0x8287093c Event(sig=false, mr=false) waiters(tid)=[2]
handle=0x00001070 Thread(id=13, exit=None) waiters(tid)=[1]
handle=0x42450b5c Event(sig=false, mr=true) waiters(tid)=[7]
handle=0x00001078 Event(sig=false, mr=false) waiters(tid)=[13]
handle=0x00001080 Event(sig=false, mr=false) waiters(tid)=[5]
handle=0x828a3244 Event(sig=false, mr=false) waiters(tid)=[11]
handle=0x00001024 Semaphore(0/2147483647) waiters(tid)=[8]
handle=0x828a3220 Event(sig=false, mr=true) waiters(tid)=[11]
handle=0x00001010 Event(sig=false, mr=true) waiters(tid)=[3]
handle=0x00001050 Event(sig=false, mr=true) waiters(tid)=[8]

@@ -0,0 +1,127 @@
# Phase C.1 — Validation refutes Phase A's bit-28 setter hypothesis
## TL;DR
Phase A claimed: "bit 28 of `[0x40d09a40]` (controller word) gets set in ours, causing sub_822F1AA8's dispatcher loop to exit early; candidate setter is `sub_821B55D8` at PC `0x821B5DA4`."
**Phase C.1 falsifies this in 4 sub-rounds:**
1. **`sub_821B55D8` is dead code** in both engines — its `XamInputSetState` wrapper `sub_824AA858` fires 0× in both.
2. **`[0x40d09a40]` is never set to anything with bit 28** — `--dump-addr` at end of run shows `+0x00 = 0x00000021`, the entry value. Bit 28 is NEVER set.
3. **The actual wedge is at the `bcctrl` at PC `0x822F1B4C`** (inside sub_822F1AA8 setup, BEFORE the dispatcher loop). tid=1 never reaches the loop top-check.
4. **The bcctrl calls `sub_82173990`** (vtable[0] of the dispatcher singleton at `[0x828E1F08]`), which eventually waits for tid=13 to terminate. tid=13 wedges in the audit-049 silph::UImpl@GamePart_Title chain on handle `0x1078`.
The C.2 force-clear POC (the planned next step) would have **zero effect** because bit 28 is never set. Skipped per plan stopping criterion.
## Probe-fire counts (ours, 50M-instr parallel)
| PC | sub-round | fires | meaning |
|---|---|---|---|
| `0x821B55D8` (Phase A candidate fn entry) | 1 | **0** | function never reached → β/γ |
| `0x821B5D98,DA0,DAC,D48` (loop BB heads) | 1 | **0** | function never reached |
| `0x822F1AA8` (sub_822F1AA8 entry) | 2,3,4 | 2-3 | reached |
| `0x822F1B38` (post-`bl 0x824AA8B0`) | 4 | 2 | reached |
| `0x822F1B50` (post-`bcctrl`) | 4 | **0** | **bcctrl never returns** |
| `0x822F1B60,B78,B80,BBC` (loop setup/top) | 3 | 0 | unreachable past bcctrl |
| `0x822F1E10` (loop exit cleanup) | 2 | 0 | loop never entered, never exited |
| `0x822F1E34` (post-thread-join) | 2 | 0 | never reached |
| `0x82173990` (vtable[0] target) | 4 | 2 | called via bcctrl, r3=singleton (LR=0x822F1B50) |
| `0x821748F0` (tid=13 entry) | 4 | 2 | tid=13 runs |
| `0x821C4EB0` (silph::UImpl@GamePart_Title) | 4 | 2 | audit-009/049 reached on tid=13 |
| `0x82457388,0x824574C0,0x82457408,0x82457490` (other oris candidates) | 2 | 0 | unreachable |
## Canary probe results
| PC | fires | meaning |
|---|---|---|
| `0x824AA858` (XamInputSetState wrapper) | **0** | sub_821B55D8 chain is dead code in CANARY too |
| `0x822F1B50` (post-bcctrl, attempted) | **0** | canary's JitProlog only fires at function entries, so not directly testable; but per audit round-33 sub_821741C8 fires 471× in canary → bcctrl DOES return in canary |
## Critical evidence: `--dump-addr=0x40d09a40` at end of run
```
addr=0x40d09a40
+0x00: 00 00 00 21 00 00 00 01 42 44 df 00 40 54 1a 40
^^^^^^^^^^^ ^^^^^^^^^^^
+0x10: 40 54 1b 40 40 54 1b 80 40 54 1b c0 00 00 10 54
+0x20: 00 00 00 00 40 24 a8 20 00 00 00 08 00 00 00 00
```
- `[+0x00] = 0x00000021` ← bit 28 (mask 0x10000000) is NOT SET. Same value as at sub_822F1AA8 entry.
- `[+0x1c] = 0x00001054` ← spawned init thread handle (= tid=8's thread handle, NOT 0x1070)
- Thread state: tid=1 waits on handle `0x1070`, tid=13 waits on handle `0x1078`.
Handle `0x1070` is **tid=13's thread handle** (per stderr: `ExCreateThread: tid=13 handle=0x1070 entry=0x821748f0 ctx=0x4024a840 suspended=true`). So tid=1's wait at the wedge point is a **thread-join on tid=13**, NOT a thread-join on the dispatcher init thread (tid=8, handle 0x1054).
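The dump is raw guest memory, so each field has to be read as a big-endian 32-bit word. A minimal sketch of that decoding follows; the byte array is copied from the dump above, while `be_word` and `CONTROLLER_DUMP` are hypothetical names.

```rust
/// The 48 bytes printed by --dump-addr=0x40d09a40, copied from the dump above.
pub const CONTROLLER_DUMP: [u8; 48] = [
    0x00, 0x00, 0x00, 0x21, 0x00, 0x00, 0x00, 0x01, 0x42, 0x44, 0xdf, 0x00, 0x40, 0x54, 0x1a, 0x40,
    0x40, 0x54, 0x1b, 0x40, 0x40, 0x54, 0x1b, 0x80, 0x40, 0x54, 0x1b, 0xc0, 0x00, 0x00, 0x10, 0x54,
    0x00, 0x00, 0x00, 0x00, 0x40, 0x24, 0xa8, 0x20, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00,
];

/// Read a big-endian u32 at `offset`, or None if the read would run off the end.
pub fn be_word(dump: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let s = dump.get(offset..end)?;
    Some(u32::from_be_bytes([s[0], s[1], s[2], s[3]]))
}
```

Decoding offset `0x00` gives the flag word and offset `0x1c` gives the spawned-thread handle. The thread diagnostics print handles in decimal, so `4208` and `4216` there are `0x1070` and `0x1078`.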
## Wedge path (corrected)
```
entry_point (sub_824AB748) [tid=1 main]
└─ sub_8216EA68
└─ sub_822F1AA8(controller=0x40d09a40) [LR=0x8216EE14]
├─ ExCreateThread(entry=sub_822F1EE0, ctx=controller) [PC 0x822F1B08]
│ ⇒ tid=8 spawn, handle=0x1054 (suspended)
├─ bl 0x824AA8B0 (no-op probe) [PC 0x822F1B34]
└─ bcctrl on vtable[+0] of [0x828E1F08] singleton [PC 0x822F1B4C]
└─ sub_82173990(r3=singleton) [r3=0x40ba9a80, vtable=0x40111910]
└─ ... (768-byte function with ≥18 calls; calls sub_82448AA0, sub_824AA7A0,
sub_82448BC8, sub_82448C50, sub_8216F218, sub_8217C850, sub_82178E50,
sub_821835E0, ...)
└─ ... → KeWaitForSingleObject INFINITE on handle 0x1070
(= tid=13's thread handle, thread-join)
⇒ WEDGE — tid=13 never exits
(Concurrently — spawned somewhere else, not from sub_822F1AA8:)
[tid=13, spawn-handle=0x1070, ctx=0x4024a840]
└─ sub_821748F0 (worker boilerplate, entry from ExCreateThread)
├─ sub_82172798, sub_82172818
└─ sub_821749C0
└─ sub_821CF3F0
└─ ... → sub_821C4EB0 (UImpl@GamePart_Title@silph) [audit-009/049!]
└─ ... → sub_821CB030 (creates KEVENT at +0x128)
⇒ KeWaitForSingleObject INFINITE on handle 0x1078
⇒ WEDGE — handle 0x1078 is never signaled in ours
```
## Why Phase A's hypothesis is wrong
Phase A:
1. Disassembled sub_822F1AA8's body, observed the bit-28 loop-exit check at `0x822F1BB8` and end-of-iter check at `0x822F1E0C`.
2. Mem-watch on `0x40d09a40` showed zero stores → inferred "the setter writes via some path mem-watch doesn't capture."
3. DB-scanned `oris ?, ?, 0x1000` (49 sites), found `sub_821B55D8 + 0x821B5DA4` with pattern `bl sub_824AA858 ; if r3 == 0xAA: oris r11, 0x1000 ; stw`.
4. Concluded `sub_821B55D8` was the setter.
What Phase A missed:
- Mem-watch's 0-stores result was correct: **NO setter exists**. Bit 28 is never set in either engine. The mem-watch null-result was a hint that the bit-28 hypothesis itself was wrong, but Phase A interpreted it as "mem-watch misses something."
- The disasm-based hypothesis was visually compelling (a loop iterating arrays and setting bit 28 when a kernel call returns 0xAA) but was never verified at runtime.
- `sub_821B55D8` is itself dead code in both engines.
## Reading-error class #19: disasm-pattern-match without runtime verification
When scanning for a hypothesized signal source via DB pattern-match (`oris ?, ?, 0x1000`), the analyst must run a probe to verify the suspected site is *both reached* and *takes the suspected path* before declaring it the cause. Phase A bypassed both checks. The single `--dump-addr=0x40d09a40` flag in sub-round 2 (literally 4 keystrokes added to the existing probe command) revealed the central assumption was wrong.
## Real divergence (handed to next session)
This is the **same wedge as audit-049/058/059**: tid=13 wedges in the silph::UImpl@GamePart_Title cluster on handle `0x1078`. tid=1 wedges on tid=13's thread-handle (`0x1070`) inside `sub_82173990`'s call chain.
`sub_82173990` is vtable[0] of the dispatcher singleton at `[0x828E1F08]`. It's a 768-byte function with ≥18 calls; the actual wait site is somewhere down its tree. To localize where in `sub_82173990` the wait happens, probe its BB heads + the `KeWaitForSingleObject` thunks (`sub_824AA330`, `sub_824AA708`).
The fix-shape is **NOT** "force-clear bit 28." The fix-shape is **"signal handle 0x1078 in the audit-049 cluster, or short-circuit tid=13's wait."** Round 22 (silph_synth.rs) attempted the cluster-A version of this. Cluster B (silph::UImpl) needs its own synthesis or a kernel-side signal of handle 0x1078.
## Phase C verdict
- C.1: 4 sub-rounds executed (within budget).
- C.2: **NOT EXECUTED** — POC would be no-op since bit 28 is never set. Per plan stopping criterion, do not proceed to C.2 blind when C.1 refutes the diagnosis.
- C.3: not applicable.
- Branch state: no source changes. Audit artifacts only.
## Files in this directory
- `ours-c1-probe.log/stderr` — sub-round 1, probe at sub_821B55D8 BB heads (0 fires)
- `ours-sr2-confirm-bit28.log/stderr` — sub-round 2, probe loop top/exit + dump-addr (bit 28 NEVER SET)
- `ours-sr3-wait-trace.log/stderr` — sub-round 3, probe wait site + handle 0x1070 trace
- `ours-sr4-bcctrl-trace.log/stderr` — sub-round 4, probe pre/post bcctrl + sub_82173990 entry + tid=13 entry (decisive)
- canary side in `../round-C1-setter-validation-canary/`:
- `canary-824AA858.log` — XamInputSetState wrapper fires 0× in canary too
- `canary-822F1B50.log` — JitProlog can't probe at BB-internal PCs (function-entry-only)

@@ -0,0 +1,144 @@
# Phase D — Audit-049 Auto-Signal POC — FINDINGS
**Branch**: `iterate-2C/silph-ui-spawn-trace` (extends Phase C `481591f`)
**Date**: 2026-06-11
**Sub-rounds**: D2.SR1 → D2.SR4 (4/4 used)
**Verdict**: **B — partial unwedge**
## Mission
Phase C diagnosed the audit-049 wedge as tid=13 (silph::UImpl@GamePart_Title) waiting INFINITE on a KEVENT created at `sub_821CB030+0x128` (`lr=0x821cb15c`, post-bl PC). The Phase D POC tests this diagnosis by hooking `NtCreateEvent` from that exact call site and auto-signaling the resulting handle after a configurable delay (`XENIA_SILPH_UI_AUTOSIGNAL_DELAY` instructions).
If tid=13 unblocks, the diagnosis is confirmed. If new wedges or new threads appear downstream, even better — that's actual game progression past the wedge.
## Result summary
| Symptom | SR2/SR3 baseline | SR4 (POC firing) |
|---|---|---|
| `silph autosignal: scheduled handle=0x1078 caller_lr=0x821cb15c` | yes (SR2/SR3) | yes |
| `silph autosignal: firing handle=0x1078` | NO | **yes (cycle 16326209)** |
| handle 0x1078 final | `signaled=false waiters=1 <NO_SIGNALS_DESPITE_WAITS>` | `signal_attempts=1 waiters=0` |
| tid=13 final state | `Blocked(WaitAny[0x1078])` | **`Ready` pc=0x824a9108** |
| tid=1 final state | `Blocked(WaitAny[0x1070])` thread-join | `Blocked(WaitAny[0x1070])` (tid=13 not yet exited) |
| ExCreateThread total | 10 | **12 (+tid=14, +tid=15)** |
| New downstream wedges | none past 0x1078 | **0x1084 (Event/Auto), 0x1088 (Event/Manual)** |
| `cxx_throw` runtime_error decoded | none | **yes, stack depth 6, top L0=0x82612b50 → L4=sub_82450B60+0x1A8 → L6=sub_82450a50** |
| VdSwap | 1 | 1 |
| gpu.interrupt.delivered{source=0} | 6393 | 4539 (different trajectory, no draws) |
**Conclusion**: tid=13 unwedged cleanly from the audit-049 wait, spawned two follow-on threads (tid=14 entry=`silph` ctx=`0x40929c00`, tid=15 a worker), and progressed deep enough into the silph::UImpl state machine to throw a `runtime_error` from sub_82450a50 → sub_82450B60+0x1A8 (the dispatcher cluster from round 26). The auto-signal **is not** the proper signaler — it lets tid=13 proceed but downstream state-machine invariants the missing real signaler would have established are not in place, so the dispatcher trips on a "not-registered instance" lookup.
This is a **clean confirmation** of the Phase C diagnosis: the wedge handle, the wait site, and the LR filter are all correct. The fix shape is:
- Either: synthesize the missing signaler properly (cluster-B silph_ui_synth.rs analogue from R33's deferred plan)
- Or: track what the auto-signal needed to write into the work-item state (`[+8]` field per R26) BEFORE signaling, so the dispatcher's BST lookup succeeds
## Sub-round detail
### D2.SR1 — initial run, hook never fires (wrong LR filter)
Filter checked `creator_lr ∈ [0x821CB15C, 0x821CB160]` against `ctx.lr` at `nt_create_event` entry. But `ctx.lr` is the **thunk wrapper return slot** (`0x824a9f6c`), not the guest caller's post-bl PC. Confirmed via handle-audit `created stack` dump: frame 0 lr=`0x824a9f6c`, frame 1 lr=`0x821cb15c`. The guest caller's LR lives one frame up the PPC EABI back-chain.
Diagnosis classification: **D (filter mismatch)**. Reading-error class #20 (new).
### D2.SR2 — frame-1-LR fix; hook schedules, never fires
Refactored `maybe_register_silph_autosignal` to take `(ctx, mem)`, walk back-chain via existing `walk_guest_back_chain` (1 step), match the saved LR. Hook now fires:
```
silph autosignal: scheduled handle=0x1078 caller_lr=0x821cb15c for cycle 10000 (now=0, delay=10000)
```
But no "firing" log appears, and tid=13 stays Blocked. Classification: **D (drain site never reached)**.
### D2.SR3 — diagnostic added; confirms drain site never visited
Added a one-shot info-level "tick (first visit, none due)" log inside `fire_due_silph_autosignals` when pending is non-empty but nothing due. Re-ran. **The tick-diagnostic never fired either** — proving the function isn't being called at all in `--parallel` mode.
Root cause: `--parallel` dispatches to `run_execution_parallel` (line 2928 of main.rs), which has its own outer loop at line 3186. My Phase D wiring only touched the lockstep path at line 2763. Classification: **D (wrong code path wired)**.
### D2.SR4 — parallel-path wiring added; hook fires; tid=13 unblocks
Added the same `set_now_cycle_hint` + `fire_due_silph_autosignals` calls inside the parallel outer loop, right after `coord_pre_round` (and under the same `kernel_arc` guard, so no extra locking). Re-built, re-ran.
Now all three log lines appear:
```
silph autosignal: scheduled handle=0x1078 caller_lr=0x821cb15c for cycle 16326202 (now=16316202, delay=10000)
silph autosignal: tick (first visit, none due) now=16316213 pending=1 first_deadline=16326202
silph autosignal: firing handle=0x1078 prev_signaled=Some(false) at cycle 16326209
```
`now=16316202` at schedule time confirms `set_now_cycle_hint` is wired through correctly (the parallel path was simply never visited in SR2/SR3). Fire at cycle 16326209 = deadline 16326202 + 7-cycle scheduler granularity. Diagnostic classification: **B (partial unwedge — new waits and cxx_throw downstream)**.
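The schedule-then-drain behaviour seen in these log lines can be sketched as a small deadline queue. This is an illustration under assumptions, not the POC code: `Pending`, `AutoSignalQueue`, `schedule`, and `drain_due` are hypothetical names, and the real `fire_due_silph_autosignals` may differ in structure.

```rust
/// One scheduled auto-signal: which handle to signal and when.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Pending {
    pub handle: u32,
    pub deadline: u64,
}

#[derive(Default)]
pub struct AutoSignalQueue {
    pending: Vec<Pending>,
}

impl AutoSignalQueue {
    /// Register a handle to be signaled `delay` cycles after `now`.
    /// Returns the computed deadline.
    pub fn schedule(&mut self, handle: u32, now: u64, delay: u64) -> u64 {
        let deadline = now.saturating_add(delay);
        self.pending.push(Pending { handle, deadline });
        deadline
    }

    /// Remove and return every handle whose deadline is at or before `now`.
    /// Must be called from every outer loop that advances the cycle count.
    pub fn drain_due(&mut self, now: u64) -> Vec<u32> {
        let mut due = Vec::new();
        self.pending.retain(|p| {
            if p.deadline <= now {
                due.push(p.handle);
                false // drop from the pending list
            } else {
                true // keep waiting
            }
        });
        due
    }
}
```

Replaying the SR4 numbers: scheduling at `now=16316202` with `delay=10000` yields deadline `16326202`, a drain at `16316213` returns nothing, and the first drain at or after the deadline (here `16326209`) returns handle `0x1078` exactly once.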
## Code shape
POC is ~70 LOC across four files, all env-gated. Default off.
| File | Change | Lines |
|---|---|---|
| `crates/xenia-cpu/src/scheduler.rs` | `GuestThread.start_entry/start_context` fields; `spawn()` populates; `current_thread_entry_and_ctx()` helper | +18 |
| `crates/xenia-kernel/src/state.rs` | `AutoSignalPending` struct; `silph_autosignal_*` fields; `set_now_cycle_hint`, `maybe_register_silph_autosignal`, `fire_due_silph_autosignals` methods | +95 |
| `crates/xenia-kernel/src/exports.rs` | Hook in `nt_create_event` | +3 |
| `crates/xenia-app/src/main.rs` | Fire-site wiring in lockstep loop (line 2788) **and** parallel loop (line 3215) | +12 |
Tests stay green at **655/655**.
## Reading-error class #20 (new)
**`ctx.lr` at kernel export entry ≠ guest caller's post-bl PC.** When a guest `bl` calls an export thunk, the thunk-wrapper has its own frame between the guest caller and the export body. At export-body entry, `ctx.lr` holds the *wrapper's* return slot, not the guest caller's post-bl PC.
To match a specific guest call site by LR, the export must walk one step up the back-chain (`walk_guest_back_chain(ctx.gpr[1], ctx.lr, mem, 2)`) and use `frames[1].lr`.
SR1 burned one full sub-round on this. Detect early in future POCs by comparing `ctx.lr` against the handle-audit's `created stack` frame dump for a known-good event (e.g. one created from a labelled site).
## Reading-error class #21 (new)
**`--parallel` and lockstep have separate outer loops in main.rs.** They share `coord_pre_round` (carved out exactly for this reason), but anything wired adjacent to that call site only takes effect on the path it's wired on. Lockstep is `run_execution` (line 2706, outer loop at 2763). Parallel is `run_execution_parallel` (line 2928, outer loop at 3186).
Per-round hooks must be wired in **both** paths, even when they are developed against only one run mode. SR2/SR3 burned two sub-rounds on this.
## Files modified + LR mapping (for follow-up sessions)
**Wedge handle creation** (confirmed by handle-audit dump):
```
created cycle=0 tid=13 lr=0x824a9f6c [src=NtCreateEvent thunk return]
created stack (6 frames):
[ 0] fp=0x715a7a10 lr=0x824a9f6c ← ctx.lr at nt_create_event
[ 1] fp=0x715a7aa0 lr=0x821cb15c ← guest caller's post-bl PC (filter on this)
[ 2] fp=0x715a7bd0 lr=0x821cbae0 ← sub_821CBA08 frame
[ 3] fp=0x715a7cd0 lr=0x821cc454 ← sub_821CC3F8 frame
[ 4] fp=0x715a7d60 lr=0x821c4f18 ← sub_821C4EB0 frame (silph::UImpl@GamePart_Title)
[ 5] fp=0x715a7e00 lr=0x82174a80 ← sub_821748F0 trampoline frame
```
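The early-detection check from reading-error class #20 can be done mechanically against a dump in this format. The following is a minimal Python sketch that parses the first three frames of the dump above and compares the filter LR against frame 0 (the naive `ctx.lr` match) and frame 1 (the guest caller). The regex, the helper names `parse_frames` and `caller_lr`, and the dict layout are assumptions for illustration; only the addresses are taken from the dump.

```python
import re

# Matches lines such as "[ 1] fp=0x715a7aa0 lr=0x821cb15c" (format as shown in the dump).
FRAME_RE = re.compile(r"\[\s*(\d+)\]\s+fp=0x([0-9a-f]+)\s+lr=0x([0-9a-f]+)")

DUMP = """\
[ 0] fp=0x715a7a10 lr=0x824a9f6c
[ 1] fp=0x715a7aa0 lr=0x821cb15c
[ 2] fp=0x715a7bd0 lr=0x821cbae0
"""

FILTER_LR = 0x821CB15C  # guest call site the hook is meant to match


def parse_frames(text):
    """Return the frames of a 'created stack' dump as a list of dicts."""
    return [
        {"index": int(m.group(1)), "fp": int(m.group(2), 16), "lr": int(m.group(3), 16)}
        for m in FRAME_RE.finditer(text)
    ]


def caller_lr(frames):
    """frames[0] holds ctx.lr at export entry; frames[1] is the guest caller."""
    return frames[1]["lr"] if len(frames) > 1 else None


frames = parse_frames(DUMP)
naive_match = frames[0]["lr"] == FILTER_LR   # what SR1 effectively compared
fixed_match = caller_lr(frames) == FILTER_LR  # what SR2 compares
```

Running this against a known-good event before wiring a new LR filter shows immediately whether the filter value sits in frame 0 or frame 1.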
**Downstream cxx_throw stack** (after auto-signal fires, tid=5 throws runtime_error):
```
L0 lr=0x82612b50 std::exception throw path
L1 lr=0x825f2444
L2 lr=0x824547e8
L3 lr=0x82451418
L4 lr=0x82450d08 ← sub_82450B60+0x1A8 (dispatcher, audit-049 R26)
L5 lr=0x82450b34
L6 lr=0x82450a50 ← sub_82450a50 (worker dispatch)
cxx_throw runtime_error decoded magic=0x19930520
cxx_throw BST ceil search candidate_key=0x828e2b2c match_found=false
cxx_throw lhs (not-registered instance) lhs=0x715a7af0
```
This confirms the dispatcher reached audit-049 territory (R26's `sub_82450B60+0x1A8` PC `0x82450D08`), looked up a runtime instance in its BST keyed by VA, and the instance was never registered. **The auto-signal bypassed an upstream registration step** the real signaler would have driven.
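To make the `match_found=false` outcome concrete, here is a minimal Python sketch of a ceiling lookup over VA keys. It is only an illustration: a sorted list stands in for the guest's BST, "ceil search" is read as "smallest key >= candidate" (an assumption about what the log line means), and the two pre-registered keys are hypothetical. Only `0x828e2b2c` comes from the log.

```python
import bisect


def ceil_lookup(sorted_keys, candidate):
    """Return (ceil_key, exact_match) for the smallest key >= candidate."""
    i = bisect.bisect_left(sorted_keys, candidate)
    if i == len(sorted_keys):
        return None, False
    ceil_key = sorted_keys[i]
    return ceil_key, ceil_key == candidate


CANDIDATE = 0x828E2B2C                # candidate_key from the cxx_throw log
registry = [0x828E2000, 0x828E3000]   # hypothetical registered keys

# Without the upstream registration step the search lands on a neighbour.
ceil_before, found_before = ceil_lookup(registry, CANDIDATE)

# Once the instance is registered, the same search is an exact hit.
bisect.insort(registry, CANDIDATE)
ceil_after, found_after = ceil_lookup(registry, CANDIDATE)
```

The lookup itself behaves correctly in both cases; the difference is purely whether the key was inserted beforehand, which is the step the auto-signal skips.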
## Recommendation
Ship the POC env-gated (default off; no behavior change unless opted in). The verdict-B result makes it a useful diagnostic flag for audit-049 work: later investigations can set `XENIA_SILPH_UI_AUTOSIGNAL_DELAY=10000` to skip the wedge and probe downstream behavior without first writing the proper signaler.
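The "default off" gating can be pictured as follows. This is a hedged Python sketch, not the Rust implementation: the variable name is the one given above, but the helper `autosignal_delay` and the rule that an unset or unparsable value disables the hook are assumptions.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "XENIA_SILPH_UI_AUTOSIGNAL_DELAY"


def autosignal_delay(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the delay in cycles, or None when the POC stays disabled."""
    raw = env.get(ENV_VAR)
    if raw is None:
        return None  # unset: default off, no behavior change
    try:
        return int(raw, 10)
    except ValueError:
        return None  # assumed: unparsable values also leave the hook off


disabled = autosignal_delay({})
enabled = autosignal_delay({ENV_VAR: "10000"})
garbage = autosignal_delay({ENV_VAR: "soon"})
```

Returning `None` rather than a zero delay keeps "disabled" distinct from "fire immediately", so callers can skip registration entirely when the flag is absent.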
Long-term fix path remains the R33 silph_ui_synth.rs analogue: synthesize the missing signaler + its precondition state (BST instance registration at `0x715a7af0`-equivalent, work-item state `[+8]` per R26). The auto-signal POC is **not** the final fix — it confirms diagnosis but doesn't honor the dispatcher's BST registry invariant.
## Artifacts
- `poc-sr1.log`, `poc-sr1.stderr` — initial run, filter mismatch (D)
- `poc-sr2.log`, `poc-sr2.stderr` — frame-1-LR fix, no fire (D)
- `poc-sr3.log`, `poc-sr3.stderr` — diagnostic added, no fire (D, parallel path unwired)
- `poc-sr4.log`, `poc-sr4.stderr` — parallel-path wired, **fires + partial unwedge (B)**
All `.log`/`.stderr` files are `.gitignore`d; this `FINDINGS.md` is the only artifact-side commit.

@@ -0,0 +1,200 @@
0x82450b60: lwz r18, 9792(r31)
0x82450b64: lwz r16, 13880(r14)
0x82450b68: mflr r12
0x82450b6c: bl 0x825F0F74
0x82450b70: subi r31, r1, 176
0x82450b74: stwu r1, -176(r1)
0x82450b78: mr r29, r4
0x82450b7c: mr r27, r3
0x82450b80: cmpwi cr6, r29, 5
0x82450b84: bne cr6, 0x82450B94
0x82450b88: addi r28, r27, 196
0x82450b8c: addi r26, r27, 28
0x82450b90: b 0x82450BAC
0x82450b94: slwi r11, r29, 2
0x82450b98: mr r26, r27
0x82450b9c: add r11, r29, r11
0x82450ba0: slwi r11, r11, 2
0x82450ba4: add r11, r11, r27
0x82450ba8: addi r28, r11, 96
0x82450bac: addi r23, r27, 56
0x82450bb0: mr r3, r23
0x82450bb4: stw r23, 84(r31)
0x82450bb8: bl 0x8284DCFC
0x82450bbc: mr r3, r26
0x82450bc0: bl 0x8284DCFC
0x82450bc4: lwz r7, 16(r28)
0x82450bc8: cntlzw r11, r7
0x82450bcc: extrwi r11, r11, 1, 26
0x82450bd0: cmplwi cr6, r11, 0x0
0x82450bd4: beq cr6, 0x82450BEC
0x82450bd8: mr r3, r26
0x82450bdc: bl 0x8284DD0C
0x82450be0: mr r3, r23
0x82450be4: bl 0x8284DD0C
0x82450be8: b 0x82450EE8
0x82450bec: lwz r11, 12(r28)
0x82450bf0: lwz r9, 8(r28)
0x82450bf4: srwi r10, r11, 2
0x82450bf8: clrlwi r8, r11, 30
0x82450bfc: cmplw cr6, r9, r10
0x82450c00: bgt cr6, 0x82450C08
0x82450c04: sub r10, r10, r9
0x82450c08: lwz r9, 4(r28)
0x82450c0c: slwi r10, r10, 2
0x82450c10: slwi r8, r8, 2
0x82450c14: lwz r6, 8(r28)
0x82450c18: addi r11, r11, 1
0x82450c1c: slwi r6, r6, 2
0x82450c20: li r24, 0
0x82450c24: lwzx r10, r10, r9
0x82450c28: cmplw cr6, r6, r11
0x82450c2c: lwzx r30, r10, r8
0x82450c30: stw r11, 12(r28)
0x82450c34: stw r30, 80(r31)
0x82450c38: bgt cr6, 0x82450C40
0x82450c3c: stw r24, 12(r28)
0x82450c40: subic. r11, r7, 1
0x82450c44: stw r11, 16(r28)
0x82450c48: bne 0x82450C50
0x82450c4c: stw r24, 12(r28)
0x82450c50: addi r25, r27, 28
0x82450c54: mr r3, r25
0x82450c58: bl 0x8284DCFC
0x82450c5c: mr r3, r25
0x82450c60: stw r30, 216(r27)
0x82450c64: bl 0x8284DD0C
0x82450c68: mr r3, r26
0x82450c6c: bl 0x8284DD0C
0x82450c70: lwz r11, 28(r30)
0x82450c74: clrlwi r11, r11, 31
0x82450c78: cmplwi cr6, r11, 0x0
0x82450c7c: bne cr6, 0x82450D30
0x82450c80: lwz r11, 8(r30)
0x82450c84: cmplwi cr6, r11, 0x1
0x82450c88: blt cr6, 0x82450CE4
0x82450c8c: bne cr6, 0x82450D3C
0x82450c90: lwz r11, 28(r30)
0x82450c94: rlwinm r11, r11, 0, 29, 29
0x82450c98: cmplwi cr6, r11, 0x0
0x82450c9c: beq cr6, 0x82450CB0
0x82450ca0: mr r4, r30
0x82450ca4: mr r3, r27
0x82450ca8: bl 0x824510E0
0x82450cac: b 0x82450CBC
0x82450cb0: mr r4, r30
0x82450cb4: mr r3, r27
0x82450cb8: bl 0x824517B0
0x82450cbc: stw r29, 220(r27)
0x82450cc0: bl 0x824AA830
0x82450cc4: mr r11, r3
0x82450cc8: lwz r3, 92(r27)
0x82450ccc: li r5, 0
0x82450cd0: addi r11, r11, 66
0x82450cd4: li r4, 1
0x82450cd8: stw r11, 224(r27)
0x82450cdc: bl 0x824AB158
0x82450ce0: b 0x82450D3C
0x82450ce4: lwz r11, 28(r30)
0x82450ce8: mr r4, r30
0x82450cec: mr r3, r27
0x82450cf0: rlwinm r11, r11, 0, 29, 29
0x82450cf4: cmplwi cr6, r11, 0x0
0x82450cf8: beq cr6, 0x82450D04
0x82450cfc: bl 0x82450F68
0x82450d00: b 0x82450D08
0x82450d04: bl 0x82451238
0x82450d08: stw r29, 220(r27)
0x82450d0c: bl 0x824AA830
0x82450d10: mr r11, r3
0x82450d14: lwz r3, 92(r27)
0x82450d18: li r5, 0
0x82450d1c: addi r11, r11, 66
0x82450d20: li r4, 1
0x82450d24: stw r11, 224(r27)
0x82450d28: bl 0x824AB158
0x82450d2c: b 0x82450D3C
0x82450d30: lwz r11, 28(r30)
0x82450d34: ori r11, r11, 0x2
0x82450d38: stw r11, 28(r30)
0x82450d3c: lwz r11, 8(r30)
0x82450d40: mr r29, r24
0x82450d44: cmpwi cr6, r11, 2
0x82450d48: blt cr6, 0x82450E08
0x82450d4c: cmpwi cr6, r11, 3
0x82450d50: ble cr6, 0x82450DA0
0x82450d54: cmpwi cr6, r11, 4
0x82450d58: bne cr6, 0x82450E08
0x82450d5c: lwz r11, 28(r30)
0x82450d60: rlwinm r11, r11, 0, 29, 29
0x82450d64: cmplwi cr6, r11, 0x0
0x82450d68: bne cr6, 0x82450D98
0x82450d6c: lwz r29, 36(r30)
0x82450d70: mr r3, r29
0x82450d74: lwz r11, 0(r29)
0x82450d78: lwz r11, 4(r11)
0x82450d7c: mtctr r11
0x82450d80: bctrl
0x82450d84: clrlwi r11, r3, 24
0x82450d88: cmplwi cr6, r11, 0x0
0x82450d8c: beq cr6, 0x82450D98
0x82450d90: mr r3, r29
0x82450d94: bl 0x8244FB38
0x82450d98: li r29, 1
0x82450d9c: b 0x82450E28
0x82450da0: addi r3, r30, 40
0x82450da4: bl 0x82451DB8
0x82450da8: lwz r11, 32(r30)
0x82450dac: cmplwi cr6, r11, 0x0
0x82450db0: beq cr6, 0x82450DCC
0x82450db4: rlwinm r11, r11, 0, 0, 31
0x82450db8: lwz r10, 4(r30)
0x82450dbc: lwz r11, 4(r11)
0x82450dc0: cmplw cr6, r10, r11
0x82450dc4: li r11, 1
0x82450dc8: beq cr6, 0x82450DD0
0x82450dcc: mr r11, r24
0x82450dd0: clrlwi r11, r11, 24
0x82450dd4: cmplwi cr6, r11, 0x0
0x82450dd8: beq cr6, 0x82450E00
0x82450ddc: lwz r4, 8(r30)
0x82450de0: lwz r5, 0(r30)
0x82450de4: lwz r3, 32(r30)
0x82450de8: cmpwi cr6, r4, 1
0x82450dec: ble cr6, 0x82450DFC
0x82450df0: bl 0x8245D9D8
0x82450df4: li r29, 1
0x82450df8: b 0x82450E28
0x82450dfc: stw r4, 8(r3)
0x82450e00: li r29, 1
0x82450e04: b 0x82450E28
0x82450e08: mr r3, r26
0x82450e0c: stw r26, 88(r31)
0x82450e10: bl 0x8284DCFC
0x82450e14: addi r4, r31, 80
0x82450e18: mr r3, r28
0x82450e1c: bl 0x823232C0
0x82450e20: mr r3, r26
0x82450e24: bl 0x8284DD0C
0x82450e28: clrlwi r11, r29, 24
0x82450e2c: cmplwi cr6, r11, 0x0
0x82450e30: beq cr6, 0x82450ECC
0x82450e34: lwz r11, 28(r30)
0x82450e38: rlwinm r11, r11, 0, 30, 30
0x82450e3c: cmplwi cr6, r11, 0x0
0x82450e40: beq cr6, 0x82450E68
0x82450e44: mr r3, r26
0x82450e48: stw r26, 88(r31)
0x82450e4c: bl 0x8284DCFC
0x82450e50: addi r4, r31, 80
0x82450e54: mr r3, r28
0x82450e58: bl 0x823232C0
0x82450e5c: mr r3, r26
0x82450e60: bl 0x8284DD0C
0x82450e64: b 0x82450ECC
0x82450e68: lwz r11, 40(r30)
0x82450e6c: cmplwi cr6, r11, 0x0
0x82450e70: beq cr6, 0x82450EA4
0x82450e74: rlwinm r3, r11, 0, 0, 31
0x82450e78: bl 0x82458A70
0x82450e7c: lwz r29, 40(r30)

@@ -0,0 +1,80 @@
0x82451238: mflr r12
0x8245123c: li r0, 0
0x82451240: stw r0, 4(r1)
0x82451244: bl 0x825F0F80
0x82451248: subi r31, r1, 160
0x8245124c: stwu r1, -160(r1)
0x82451250: mr r30, r4
0x82451254: li r9, 1
0x82451258: lwz r10, 32(r30)
0x8245125c: stw r30, 188(r31)
0x82451260: stw r9, 8(r30)
0x82451264: cmplwi cr6, r10, 0x0
0x82451268: beq cr6, 0x82451288
0x8245126c: lwz r11, 4(r30)
0x82451270: lwz r8, 4(r10)
0x82451274: cmplw cr6, r11, r8
0x82451278: bne cr6, 0x82451288
0x8245127c: mr r11, r9
0x82451280: li r26, 0
0x82451284: b 0x82451290
0x82451288: li r26, 0
0x8245128c: mr r11, r26
0x82451290: clrlwi r11, r11, 24
0x82451294: cmplwi cr6, r11, 0x0
0x82451298: beq cr6, 0x824512A0
0x8245129c: stw r9, 8(r10)
0x824512a0: lwz r3, 36(r30)
0x824512a4: lwz r11, 0(r3)
0x824512a8: lwz r11, 32(r11)
0x824512ac: mtctr r11
0x824512b0: bctrl
0x824512b4: mr r27, r3
0x824512b8: stw r26, 84(r31)
0x824512bc: stw r27, 96(r31)
0x824512c0: bl 0x82454498
0x824512c4: addi r4, r31, 84
0x824512c8: bl 0x82454580
0x824512cc: stw r26, 92(r31)
0x824512d0: addi r11, r27, 2047
0x824512d4: lis r10, 0x2
0x824512d8: clrrwi r11, r11, 11
0x824512dc: cmplw cr6, r11, r10
0x824512e0: stw r11, 100(r31)
0x824512e4: ble cr6, 0x824512F4
0x824512e8: lis r11, 0x8207
0x824512ec: addi r11, r11, 6724
0x824512f0: b 0x824512F8
0x824512f4: addi r11, r31, 100
0x824512f8: addi r3, r31, 84
0x824512fc: lwz r4, 0(r11)
0x82451300: bl 0x82454B08
0x82451304: mr r8, r8
0x82451308: mr r28, r3
0x8245130c: stw r28, 92(r31)
0x82451310: b 0x82451324
0x82451314: lwz r30, 188(r31)
0x82451318: lwz r27, 96(r31)
0x8245131c: li r26, 0
0x82451320: lwz r28, 92(r31)
0x82451324: addi r3, r31, 84
0x82451328: bl 0x82454AA0
0x8245132c: mr r29, r3
0x82451330: cmplwi cr6, r28, 0x0
0x82451334: beq cr6, 0x82451684
0x82451338: lwz r3, 36(r30)
0x8245133c: li r8, 0
0x82451340: addi r7, r31, 88
0x82451344: mr r6, r29
0x82451348: mr r5, r29
0x8245134c: mr r4, r28
0x82451350: lwz r11, 0(r3)
0x82451354: lwz r11, 28(r11)
0x82451358: mtctr r11
0x8245135c: bctrl
0x82451360: clrlwi r11, r3, 24
0x82451364: cmplwi cr6, r11, 0x0
0x82451368: beq cr6, 0x82451684
0x8245136c: lwz r11, 28(r30)
0x82451370: rlwinm r11, r11, 0, 28, 28
0x82451374: cmplwi cr6, r11, 0x0

@@ -0,0 +1,52 @@
=== Fire counts ===
ours: 3
canary: 7
=== Per-LR breakdown ===
ours:
lr=0x82458674: 3
canary:
lr=0x82457bd4: 2
lr=0x82458674: 5
=== Side-by-side first 5 fires (entry registers) ===
--- fire #0 ---
ours: tid=6 cycle=363 lr=0x82458674 r3=0x40ba9ac0
dump: 419fecda 000007f6 00000000 41d7dd10 00001688 00000000 00000000 41f5dd80 82457958 823f53f0 00000000 00000000 00000001 00000000 00000000 4024a5c0
canary: tid=11 cycle=<unk> lr=0x82458674 r3=0xbccc4ac0 r4=0x00000000 r5=0x00000001 r6=0x00000001 r7=0x00000000
dump: bdb19cda 000007f6 00000000 bde98d10 00001688 00000000 00000000 be078d80 82457958 823f53f0 00000000 00000000 00000001 00000000 00000000 bc365760
--- fire #1 ---
ours: tid=6 cycle=140548 lr=0x82458674 r3=0x40ba9b80
dump: 42c0f09a 00018ff6 00000000 43777210 0004d055 00000000 00000000 41f60d80 82457958 823f53f0 00000000 00000000 00000001 00000000 00000000 4024a960
canary: tid=11 cycle=<unk> lr=0x82458674 r3=0xbccc4b80 r4=0x00000000 r5=0x00000001 r6=0x00000001 r7=0x00000000
dump: bed2a09a 00018ff6 00000000 bf892210 0004d055 00000000 00000000 be07bd80 82457958 823f53f0 00000000 00000000 00000001 00000000 00000000 bc365840
--- fire #2 ---
ours: tid=6 cycle=5957876 lr=0x82458674 r3=0x40ba9b80
dump: 419fecda 000007f6 00000000 414f5f70 000003b9 00000000 00000000 41f60d80 82457958 823f53f0 00000000 00000040 00000001 00000000 00000000 4024a980
canary: tid=11 cycle=<unk> lr=0x82458674 r3=0xbccc4b80 r4=0x00000000 r5=0x00000001 r6=0x00000001 r7=0x00000000
dump: bdb19cda 000007f6 00000000 bd610b90 000003b9 00000000 00000000 be07bd80 82457958 823f53f0 00000000 00000040 00000001 00000000 00000000 bc365860
--- fire #3 ---
ours: <no fire>
canary: tid=11 cycle=<unk> lr=0x82458674 r3=0xbccc5300 r4=0x00000000 r5=0x00000001 r6=0x00000001 r7=0x00000000
dump: bdb1acda 000007f6 00000000 bce24ed0 00000167 00000000 00000000 be07bd80 82457958 823f53f0 00000000 00000000 00000001 00000000 00000000 bc365f40
--- fire #4 ---
ours: <no fire>
canary: tid=6 cycle=<unk> lr=0x82457bd4 r3=0x701cf3c0 r4=0x00000004 r5=0x00002530 r6=0x00008000 r7=0x00000001
dump: be95af9a 0000c170 00000000 b2050010 000681e9 00000000 00000000 be07bd80 82457958 823f53f0 00000000 0000c17a 00000001 701cf4e0 00000000 be95af90
=== Equivalence check: u32 lanes at +0x04 and +0x10 (work-item magic + counter) ===
Both fields are stable identifiers across engines (host VAs differ but data should match).
Index of fields:
[+0x04] = work-item 'size?' (looks like a length field)
[+0x10] = state counter (per round 30, this is [+128/4 ?]) — but in dump it's u32[4]
ours [+04,+10]: [(2038, 5768), (102390, 315477), (2038, 953)]
canary [+04,+10]: [(2038, 5768), (102390, 315477), (2038, 953), (2038, 359), (49520, 426473), (232195, 999643), (6134, 13763)]
ours fires whose [+04,+10] match a canary fire: 3/3

@@ -0,0 +1,175 @@
#!/usr/bin/env python3
"""Round 35 lockstep diff: align sub_8280AD40 entry fires between
ours (--audit-pc-probe-hex AUDIT-PC-PROBE / AUDIT-R3-DUMP) and
canary (AUDIT-HLC JitProlog).

Outputs side-by-side rendering of:
  - per-fire entry register snapshot (r3..r10, lr)
  - 64-byte r3 dump (u32 lanes, big-endian)

Alignment is by tid + invocation order (no input-equivalence required).
"""
import re
import sys
import os

THIS_DIR = os.path.dirname(os.path.abspath(__file__))
OURS_LOG = os.path.join(THIS_DIR, "ours.log")
CANARY_LOG = os.path.join(
    os.path.dirname(THIS_DIR), "round35-lockstep-inflate-canary", "canary.log"
)

PC_TARGET = 0x8280AD40


def parse_ours(path):
    """Pair AUDIT-PC-PROBE lines with their following AUDIT-R3-DUMP lines."""
    fires = []
    cur = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("AUDIT-PC-PROBE"):
                m = re.search(
                    r"pc=0x([0-9a-f]+) tid=(\d+) hw=\d+ cycle=(\d+) lr=0x([0-9a-f]+) r3=0x([0-9a-f]+) r11=0x([0-9a-f]+)",
                    line,
                )
                if not m:
                    continue
                pc = int(m.group(1), 16)
                if pc != PC_TARGET:
                    cur = None
                    continue
                cur = {
                    "tid": int(m.group(2)),
                    "cycle": int(m.group(3)),
                    "lr": int(m.group(4), 16),
                    "r3": int(m.group(5), 16),
                    "dump": [],
                }
                fires.append(cur)
            elif line.startswith("AUDIT-R3-DUMP") and cur is not None:
                lanes = re.findall(r"\+0x[0-9a-f]+=0x([0-9a-f]+)", line)
                cur["dump"] = [int(x, 16) for x in lanes]
                cur = None
    return fires


def parse_canary(path):
    """Pair AUDIT-HLC JitProlog header lines with following r3+NN dump lines."""
    fires = []
    cur = None
    hdr_re = re.compile(
        r"AUDIT-HLC JitProlog pc=8280AD40 tid=([0-9A-F]+) r3=([0-9A-F]+) r4=([0-9A-F]+) "
        r"r5=([0-9A-F]+) r6=([0-9A-F]+) r7=([0-9A-F]+) r8=([0-9A-F]+) r9=([0-9A-F]+) r10=([0-9A-F]+) lr=([0-9A-F]+)"
    )
    dump_re = re.compile(
        r"AUDIT-HLC JitProlog pc=8280AD40 r3\+([0-9A-F]+): ([0-9A-F]+) ([0-9A-F]+) ([0-9A-F]+) ([0-9A-F]+)"
    )
    with open(path) as f:
        for line in f:
            line = line.strip()
            m = hdr_re.search(line)
            if m:
                cur = {
                    "tid": int(m.group(1), 16),
                    "r3": int(m.group(2), 16),
                    "r4": int(m.group(3), 16),
                    "r5": int(m.group(4), 16),
                    "r6": int(m.group(5), 16),
                    "r7": int(m.group(6), 16),
                    "r8": int(m.group(7), 16),
                    "r9": int(m.group(8), 16),
                    "r10": int(m.group(9), 16),
                    "lr": int(m.group(10), 16),
                    "dump": [],
                }
                fires.append(cur)
                continue
            m = dump_re.search(line)
            if m and cur is not None:
                off = int(m.group(1), 16)
                for i in range(4):
                    word = int(m.group(2 + i), 16)
                    # extend dump to fit
                    idx = off // 4 + i
                    while len(cur["dump"]) <= idx:
                        cur["dump"].append(0)
                    cur["dump"][idx] = word
    return fires


def fmt_dump(d):
    return " ".join(f"{w:08x}" for w in d[:16])


def main():
    ours = parse_ours(OURS_LOG)
    canary = parse_canary(CANARY_LOG)
    print(f"=== Fire counts ===")
    print(f" ours: {len(ours)}")
    print(f" canary: {len(canary)}")
    print()
    print(f"=== Per-LR breakdown ===")
    for label, fires in (("ours", ours), ("canary", canary)):
        lr_counts = {}
        for f in fires:
            lr_counts[f["lr"]] = lr_counts.get(f["lr"], 0) + 1
        print(f" {label}:")
        for lr, n in sorted(lr_counts.items()):
            print(f" lr=0x{lr:08x}: {n}")
    print()
    print(f"=== Side-by-side first 5 fires (entry registers) ===")
    n = max(len(ours), len(canary))
    n = min(n, 5)
    for i in range(n):
        print(f"\n--- fire #{i} ---")
        if i < len(ours):
            f = ours[i]
            print(
                f" ours: tid={f['tid']:<3} cycle={f['cycle']:<10} lr=0x{f['lr']:08x} r3=0x{f['r3']:08x}"
            )
            print(f" dump: {fmt_dump(f['dump'])}")
        else:
            print(f" ours: <no fire>")
        if i < len(canary):
            f = canary[i]
            print(
                f" canary: tid={f['tid']:<3} cycle=<unk> lr=0x{f['lr']:08x} r3=0x{f['r3']:08x} "
                f"r4=0x{f['r4']:08x} r5=0x{f['r5']:08x} r6=0x{f['r6']:08x} r7=0x{f['r7']:08x}"
            )
            print(f" dump: {fmt_dump(f['dump'])}")
        else:
            print(f" canary: <no fire>")
    print()
    print("=== Equivalence check: u32 lanes at +0x04 and +0x10 (work-item magic + counter) ===")
    print(" Both fields are stable identifiers across engines (host VAs differ but data should match).")
    print()
    print(" Index of fields:")
    print(" [+0x04] = work-item 'size?' (looks like a length field)")
    print(" [+0x10] = state counter (per round 30, this is [+128/4 ?]) — but in dump it's u32[4]")
    print()
    # +0x04 is dump[1], +0x10 is dump[4]
    ours_keys = [(f["dump"][1], f["dump"][4]) if len(f["dump"]) > 4 else None for f in ours]
    canary_keys = [(f["dump"][1], f["dump"][4]) if len(f["dump"]) > 4 else None for f in canary]
    print(f" ours [+04,+10]: {ours_keys}")
    print(f" canary [+04,+10]: {canary_keys}")
    print()
    # Cross-match: every ours key should appear in canary (canary is a superset)
    matched = []
    unmatched_ours = []
    for k in ours_keys:
        if k in canary_keys:
            matched.append(k)
        else:
            unmatched_ours.append(k)
    print(f" ours fires whose [+04,+10] match a canary fire: {len(matched)}/{len(ours)}")
    if unmatched_ours:
        print(f" ours fires with NO canary match: {unmatched_ours}")


if __name__ == "__main__":
    main()

@@ -0,0 +1,17 @@
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 tid=00000006 r3=BCCC4A80 r4=00000018 r5=828F3888 r6=701CF924 r7=82456F00 r8=00000000 r9=00000000 r10=00000018 lr=822F1D5C
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+00: BC22C910 00010004 00000000 000003E8
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+10: 0101FFFF 00000000 00000000 01010000
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+20: FFFFFFFF 00000000 00000000 00000000
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+30: 00000000 BC365BC0 00000000 00000000
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+40: 00000000 00000000 00000000 BDE9A398
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+50: BC365560 00000000 00000000 00000000
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+60: 00000000 00000000 00000000 01010040
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+70: 00000000 00000000 00000000 FFFFFFFF
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+80: 00000000 00000000 00000000 BC22C930
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+90: 00000000 00000001 00000800 00000000
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+A0: F800004C 00000000 00000000 BC365220
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+B0: BC3655C0 00000000 00000000 00000000
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+C0: 00CC0048 00460020 00460072 00650071
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+D0: 00750065 006E0063 00790000 01010000
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+E0: 00000000 00000000 00000000 FFFFFFFF
K> F8000008 AUDIT-HLC JitProlog pc=821741C8 r3+F0: 00000000 00000000 00000000 BD610B80

@@ -0,0 +1,89 @@
warn: CreateDXGIFactory2: Ignoring flags
info: Game: xenia_canary.exe
info: DXVK: v2.7.1
info: Build: x86_64 gcc 15.1.0
info: Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbd84000
info: Extension providers:
info: Platform WSI
info: OpenVR
info: OpenVR: could not open registry key, status 2
info: OpenVR: Failed to locate module
info: OpenXR
info: Enabled instance extensions:
info: VK_EXT_surface_maintenance1
info: VK_KHR_get_surface_capabilities2
info: VK_KHR_surface
info: VK_KHR_win32_surface
info: Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
info: Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
info: Skipping: Software driver
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
warn: DxgiAdapter::QueryInterface: Unknown interface query
warn: f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
564.236:00dc:013c:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
564.236:00dc:013c:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
564.236:00dc:013c:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
564.240:00dc:013c:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
564.240:00dc:013c:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
564.399:00dc:013c:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
564.825:00dc:013c:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
564.825:00dc:013c:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
564.827:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
564.839:00dc:013c:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
564.839:00dc:013c:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
564.839:00dc:013c:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
564.840:00dc:013c:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
564.840:00dc:013c:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
564.843:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
564.844:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: Promoting write cache to read cache. No need to merge any disk caches.
564.844:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 1.012 ms.
564.845:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.607 ms.
564.845:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.370 ms.
564.845:00dc:0154:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
564.903:00dc:013c:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
564.903:00dc:013c:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
564.946:00dc:013c:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
565.065:00dc:013c:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
565.065:00dc:013c:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
565.066:00dc:013c:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
565.067:00dc:013c:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
565.067:00dc:013c:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
565.067:00dc:013c:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
565.067:00dc:013c:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
565.067:00dc:013c:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
565.068:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
565.068:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
565.068:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.136 ms.
565.068:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.221 ms.
565.069:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.031 ms.
565.069:00dc:015c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
565.075:00dc:013c:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
warn: DXGIGetDebugInterface1: Stub
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
565.173:00dc:00e0:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
565.194:00dc:00e0:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
565.195:00dc:00e0:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
warn: DXGI: MakeWindowAssociation: Ignoring flags
warn: DxgiOutput::WaitForVBlank: Inaccurate
info: Setting timer interval to 1000 us
565.773:00dc:0164:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
566.349:00dc:016c:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
566.387:00dc:0164:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.

@@ -0,0 +1,89 @@
warn: CreateDXGIFactory2: Ignoring flags
info: Game: xenia_canary.exe
info: DXVK: v2.7.1
info: Build: x86_64 gcc 15.1.0
info: Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbfb4000
info: Extension providers:
info: Platform WSI
info: OpenVR
info: OpenVR: could not open registry key, status 2
info: OpenVR: Failed to locate module
info: OpenXR
info: Enabled instance extensions:
info: VK_EXT_surface_maintenance1
info: VK_KHR_get_surface_capabilities2
info: VK_KHR_surface
info: VK_KHR_win32_surface
info: Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
info: Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
info: Skipping: Software driver
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
warn: DxgiAdapter::QueryInterface: Unknown interface query
warn: f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
805.907:00d0:0124:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
805.907:00d0:0124:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
805.907:00d0:0124:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
805.910:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
805.910:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
805.955:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
806.100:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
806.100:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.101:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.105:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
806.105:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
806.105:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
806.105:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
806.105:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
806.106:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
806.106:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
806.106:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.161 ms.
806.107:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.185 ms.
806.107:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.028 ms.
806.107:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
806.154:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
806.154:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
806.197:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
806.310:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
806.310:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.310:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
806.312:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
806.312:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
806.312:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
806.312:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
806.312:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
806.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
806.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
806.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.156 ms.
806.314:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.659 ms.
806.314:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.035 ms.
806.314:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
806.319:00d0:0124:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
warn: DXGIGetDebugInterface1: Stub
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
806.408:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
806.422:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
806.423:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
warn: DXGI: MakeWindowAssociation: Ignoring flags
warn: DxgiOutput::WaitForVBlank: Inaccurate
info: Setting timer interval to 1000 us
806.948:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
807.499:00d0:0154:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
807.521:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.

@@ -0,0 +1,89 @@
warn: CreateDXGIFactory2: Ignoring flags
info: Game: xenia_canary.exe
info: DXVK: v2.7.1
info: Build: x86_64 gcc 15.1.0
info: Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbfb4000
info: Extension providers:
info: Platform WSI
info: OpenVR
info: OpenVR: could not open registry key, status 2
info: OpenVR: Failed to locate module
info: OpenXR
info: Enabled instance extensions:
info: VK_EXT_surface_maintenance1
info: VK_KHR_get_surface_capabilities2
info: VK_KHR_surface
info: VK_KHR_win32_surface
info: Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
info: Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
info: Skipping: Software driver
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
warn: DxgiAdapter::QueryInterface: Unknown interface query
warn: f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
893.096:00d4:0128:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
893.096:00d4:0128:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
893.096:00d4:0128:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
893.099:00d4:0128:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
893.099:00d4:0128:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
893.145:00d4:0128:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
893.308:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
893.308:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.308:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.310:00d4:0128:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
893.310:00d4:0128:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
893.310:00d4:0128:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
893.310:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
893.310:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
893.311:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
893.311:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
893.311:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.187 ms.
893.312:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.161 ms.
893.312:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.040 ms.
893.312:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
893.360:00d4:0128:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
893.360:00d4:0128:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
893.405:00d4:0128:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
893.520:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
893.520:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.520:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
893.522:00d4:0128:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
893.522:00d4:0128:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
893.522:00d4:0128:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
893.522:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
893.522:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.153 ms.
893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.199 ms.
893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.034 ms.
893.523:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
893.529:00d4:0128:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
warn: DXGIGetDebugInterface1: Stub
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
893.622:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
893.631:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
893.632:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
warn: DXGI: MakeWindowAssociation: Ignoring flags
warn: DxgiOutput::WaitForVBlank: Inaccurate
info: Setting timer interval to 1000 us
894.203:00d4:0150:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
894.705:00d4:0158:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
894.727:00d4:0150:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.

@@ -0,0 +1,89 @@
warn: CreateDXGIFactory2: Ignoring flags
info: Game: xenia_canary.exe
info: DXVK: v2.7.1
info: Build: x86_64 gcc 15.1.0
info: Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbfb4000
info: Extension providers:
info: Platform WSI
info: OpenVR
info: OpenVR: could not open registry key, status 2
info: OpenVR: Failed to locate module
info: OpenXR
info: Enabled instance extensions:
info: VK_EXT_surface_maintenance1
info: VK_KHR_get_surface_capabilities2
info: VK_KHR_surface
info: VK_KHR_win32_surface
info: Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
info: Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
info: Skipping: Software driver
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
warn: DxgiAdapter::QueryInterface: Unknown interface query
warn: f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
956.778:00d0:0124:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
956.778:00d0:0124:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
956.778:00d0:0124:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
956.781:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
956.781:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
956.826:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
956.983:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
956.983:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
956.983:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
956.985:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
956.985:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
956.985:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
956.985:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
956.985:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
956.985:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
956.986:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
956.986:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.171 ms.
956.986:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.269 ms.
956.986:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.028 ms.
956.986:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
957.031:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
957.031:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
957.075:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
957.186:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
957.186:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
957.186:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
957.188:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
957.188:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
957.188:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
957.188:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
957.188:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
957.188:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
957.188:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
957.189:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.172 ms.
957.189:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.231 ms.
957.189:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.029 ms.
957.189:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
957.195:00d0:0124:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
warn: DXGIGetDebugInterface1: Stub
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
957.285:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
957.295:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
957.295:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
warn: DXGI: MakeWindowAssociation: Ignoring flags
warn: DxgiOutput::WaitForVBlank: Inaccurate
info: Setting timer interval to 1000 us
957.806:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
958.343:00d0:0154:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
958.382:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.

@@ -0,0 +1,89 @@
warn: CreateDXGIFactory2: Ignoring flags
info: Game: xenia_canary.exe
info: DXVK: v2.7.1
info: Build: x86_64 gcc 15.1.0
info: Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbfb4000
info: Extension providers:
info: Platform WSI
info: OpenVR
info: OpenVR: could not open registry key, status 2
info: OpenVR: Failed to locate module
info: OpenXR
info: Enabled instance extensions:
info: VK_EXT_surface_maintenance1
info: VK_KHR_get_surface_capabilities2
info: VK_KHR_surface
info: VK_KHR_win32_surface
info: Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
info: Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
info: Skipping: Software driver
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
warn: DxgiAdapter::QueryInterface: Unknown interface query
warn: f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
1217.108:00d4:0128:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
1217.108:00d4:0128:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
1217.108:00d4:0128:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
1217.111:00d4:0128:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
1217.111:00d4:0128:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
1217.160:00d4:0128:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.307:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.309:00d4:0128:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
1217.309:00d4:0128:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
1217.309:00d4:0128:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
1217.309:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
1217.309:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.166 ms.
1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.173 ms.
1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.031 ms.
1217.310:00d4:0140:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
1217.360:00d4:0128:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
1217.360:00d4:0128:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
1217.403:00d4:0128:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.515:00d4:0128:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1217.516:00d4:0128:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
1217.516:00d4:0128:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
1217.516:00d4:0128:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
1217.516:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
1217.516:00d4:0128:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
1217.517:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
1217.517:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
1217.517:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.157 ms.
1217.517:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.208 ms.
1217.518:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.032 ms.
1217.518:00d4:0148:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
1217.524:00d4:0128:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
warn: DXGIGetDebugInterface1: Stub
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
1217.612:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
1217.622:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
1217.622:00d4:00d8:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
warn: DXGI: MakeWindowAssociation: Ignoring flags
warn: DxgiOutput::WaitForVBlank: Inaccurate
info: Setting timer interval to 1000 us
1218.136:00d4:0150:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
1218.678:00d4:0158:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
1218.699:00d4:0150:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.

@@ -0,0 +1,89 @@
warn: CreateDXGIFactory2: Ignoring flags
info: Game: xenia_canary.exe
info: DXVK: v2.7.1
info: Build: x86_64 gcc 15.1.0
info: Vulkan: Found vkGetInstanceProcAddr in winevulkan.dll @ 0x6ffffbfb4000
info: Extension providers:
info: Platform WSI
info: OpenVR
info: OpenVR: could not open registry key, status 2
info: OpenVR: Failed to locate module
info: OpenXR
info: Enabled instance extensions:
info: VK_EXT_surface_maintenance1
info: VK_KHR_get_surface_capabilities2
info: VK_KHR_surface
info: VK_KHR_win32_surface
info: Found device: NVIDIA GeForce GTX 1070 Ti (NVIDIA 580.159.3)
info: Found device: llvmpipe (LLVM 20.1.2, 256 bits) (llvmpipe 25.2.8)
info: Skipping: Software driver
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
warn: DxgiAdapter::QueryInterface: Unknown interface query
warn: f0db4c7f-fe5a-42a2-bd62-f2a6cf6fc83e
1413.916:00d0:0124:info:vkd3d-proton:vkd3d_instance_apply_application_workarounds: Program name: "xenia_canary.exe" (hash: c099ade372da5277)
1413.916:00d0:0124:info:vkd3d-proton:vkd3d_instance_deduce_config_flags_from_environment: shader_cache is used, global_pipeline_cache is enforced.
1413.916:00d0:0124:info:vkd3d-proton:vkd3d_config_flags_init_once: VKD3D_CONFIG=''.
1413.919:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
1413.919:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
1413.963:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.109:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.111:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
1414.111:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
1414.111:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
1414.111:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
1414.111:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
1414.112:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
1414.112:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
1414.112:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.173 ms.
1414.113:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.276 ms.
1414.113:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.029 ms.
1414.113:00d0:013c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
1414.157:00d0:0124:info:vkd3d-proton:vkd3d_get_vk_version: vkd3d-proton - applicationVersion: 3.0.1.
1414.157:00d0:0124:info:vkd3d-proton:vkd3d_instance_init: vkd3d-proton - build: 3b10bd7a7ec6a73.
1414.199:00d0:0124:info:vkd3d-proton:vkd3d_init_device_caps: Not all relevant pipeline stages are supported by EXT_dgc. Skipping.
1414.310:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_decide_hvv_usage: Topology: Device heaps are split. Assuming small BAR situation.
1414.310:00d0:0124:info:vkd3d-proton:vkd3d_memory_info_upload_hvv_memory_properties: Topology: HVV usage is not allowed, using HOST_COHERENT for UPLOAD.
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_get_bindless_flags: Device does not support VK_EXT_mutable_descriptor_type (or VALVE).
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.311:00d0:0124:info:vkd3d-proton:vkd3d_bindless_state_add_binding: Device supports VK_EXT_descriptor_buffer!
1414.312:00d0:0124:info:vkd3d-proton:d3d12_device_caps_init_shader_model: Enabling support for SM 6.6.
1414.312:00d0:0124:fixme:vkd3d-proton:d3d12_device_caps_init_feature_options1: TotalLaneCount = 2432, may be inaccurate.
1414.312:00d0:0124:info:vkd3d-proton:d3d12_device_determine_ray_tracing_tier: DXR support enabled.
1414.312:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Remapping VKD3D_SHADER_CACHE to: vkd3d-proton.cache.
1414.312:00d0:0124:info:vkd3d-proton:vkd3d_pipeline_library_init_disk_cache: Attempting to load disk cache from: vkd3d-proton.cache.
1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Performing async setup of stream archive ...
1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_merge: No write cache exists. No need to merge any disk caches.
1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Merging pipeline libraries took 0.158 ms.
1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Mapping read-only cache took 0.256 ms.
1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_cache_initial_setup: Parsing stream archive took 0.031 ms.
1414.313:00d0:0144:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Done performing async setup of stream archive.
1414.319:00d0:0124:fixme:vkd3d-proton:d3d12_command_queue_init: Ignoring priority 0x64.
warn: DXGIGetDebugInterface1: Stub
info: DXGI: Hiding actual GPU, reporting:
info: vendor ID: 0x1002
info: device ID: 0x73df
1414.406:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init: Creating swapchain (1280 x 720), BufferCount = 3.
1414.416:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sync_objects: Ensure maximum latency of 3 frames with KHR_present_wait.
1414.416:00d0:00d4:info:vkd3d-proton:dxgi_vk_swap_chain_init_sleep_state: Timer interval is 1.0 ms.
warn: DXGI: MakeWindowAssociation: Ignoring flags
warn: DxgiOutput::WaitForVBlank: Inaccurate
info: Setting timer interval to 1000 us
1414.927:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.
1415.477:00d0:0154:fixme:vkd3d-proton:vkd3d_texture_view_desc_fixup: Remapping 2D to 2D_ARRAY. Needs Vulkan spec tightening to match D3D12 properly.
1415.500:00d0:014c:info:vkd3d-proton:dxgi_vk_swap_chain_recreate_swapchain_in_present_task: Got 3 swapchain images.


@@ -0,0 +1,47 @@
# iterate-2D Deferred Structural Fixes — Outcome
Branch `iterate-2D/subsystem-fixes`. After verification + the user's go-ahead:
## Issue 1 — 32-bit word-form ALU truncation (PPCBUG-020) — ✅ FIXED & LANDED
Commit **341196a**. Confirmed load-bearing via runtime ours-vs-canary capture:
Sylpheed's ms→LARGE_INTEGER converter `sub_824ACA88` (`clrldi; mulli r11,r11,-10000; std`)
produced `0x00000000_FFFD8F00` in ours vs canary's correct `0xFFFFFFFF_FFFD8F00` for a 16 ms
wait. The truncated value reads as a positive (absolute) timeout, a ~26000× over-wait that froze the main frame loop.
Fixed the 17 data-losing word-form ops (full 64-bit result, CA/OV/CR0 preserved byte-identical),
updated 7 bug-asserting tests, re-baselined `sylpheed_n50m` (imports 40454→1790936), `sylpheed_n2m`
unchanged. 660/660 + ignored oracle green; lockstep determinism preserved. Boot unwedged
(parallel NtWaitForMultipleObjectsEx 94→30428; frozen worker/critical-section loops now run).
VdSwap still 1 — rendering progression needs the out-of-scope acd1656 fixes (nt_create_event
polarity + 2.AF), not in this branch.
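
The arithmetic behind that capture can be reproduced in isolation. The sketch below is illustrative only: `mulli_truncated` and `mulli_full` are hypothetical stand-ins that mirror the pre-fix and post-fix `mulli` arms, not the interpreter's actual functions. It assumes the NT convention the note relies on, where a negative 100 ns count is a relative timeout and a positive one is absolute.

```rust
/// Pre-fix behaviour (hypothetical stand-in): RA is read as i32 and the
/// product is truncated to 32 bits, then zero-extended into the 64-bit GPR.
fn mulli_truncated(ra: u64, simm16: i16) -> u64 {
    let ra = ra as i32 as i64;
    (ra.wrapping_mul(simm16 as i64) as u32) as u64
}

/// Post-fix behaviour (hypothetical stand-in): full 64-bit low product of
/// RA and the sign-extended immediate.
fn mulli_full(ra: u64, simm16: i16) -> u64 {
    (ra as i64).wrapping_mul(simm16 as i64) as u64
}

fn main() {
    let ms: u64 = 16; // the 16 ms wait from the capture
    let bad = mulli_truncated(ms, -10000);
    let good = mulli_full(ms, -10000);
    println!("truncated: {:#018x} (as i64: {})", bad, bad as i64);
    println!("full:      {:#018x} (as i64: {})", good, good as i64);
    // Ratio between the bogus absolute value and the intended relative
    // magnitude, both in 100 ns units.
    println!("over-wait factor: ~{}", bad / (good as i64).unsigned_abs());
}
```

The truncated value is positive when viewed as `i64`, which is why the wait was treated as an absolute deadline rather than a 16 ms relative delay.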
## Issue 2 — Memory page-size per-region collapse — DEFERRED (verified NOT load-bearing)
Sylpheed requests `MmAllocatePhysicalMemoryEx` with flags=0, alignment(r8)=0 (default); ours returns
self-consistent 4K-aligned addresses and boots. Ours has no 0xA0/0xC0/0xE0 physical-region model at
all, so a faithful fix is a region-model rewrite that shifts every physical guest VA (golden-breaking,
invalidates the audit-059 VA map) with no demonstrated boot benefit. A partial page-size-only change
would shift VAs for zero correctness gain — do NOT do it piecemeal. Pursue only if a render-path
struct is proven to depend on physical region/alignment.
## Issue 3 — Timing — LEFT (not load-bearing / determinism-coupled)
- 3d DPC/APC: INERT — the only timer (NtSetTimerEx) passes a NULL APC routine; no
NtQueueApcThread/KeInsertQueueDpc imported.
- 3b timeout sign: was a SYMPTOM of Issue 1 (the "positive absolute" timeouts were mulli-corruption
artifacts) — resolved by the Issue 1 fix.
- 3a/3c timebase/skew: timebase = instruction-count IS the deterministic lockstep clock; must not
become wallclock. 2.AF deadline-drain already present. Not load-bearing for Sylpheed.
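
A minimal sketch of the deadline-drain idea referenced in 3a/3c, assuming a min-heap of `(deadline, tid)` pairs keyed on the instruction-count clock. `WakeQueue`, `pop_if_due`, and `drain_due` are hypothetical names for illustration; the real scheduler's data structures are not shown in this note. The one property taken from the source is that each pop returns at most one due entry and yields `None` once the earliest remaining deadline is in the future, so the drain loop terminates.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical wake queue: a min-heap of (deadline, thread id). Deadlines
/// are expressed in retired guest instructions, never host wallclock time.
struct WakeQueue {
    heap: BinaryHeap<Reverse<(u64, u32)>>,
}

impl WakeQueue {
    fn new() -> Self {
        Self { heap: BinaryHeap::new() }
    }

    fn arm(&mut self, deadline: u64, tid: u32) {
        self.heap.push(Reverse((deadline, tid)));
    }

    /// Pops at most one entry whose deadline is `<= now`.
    fn pop_if_due(&mut self, now: u64) -> Option<(u64, u32)> {
        let due = matches!(self.heap.peek(), Some(Reverse((deadline, _))) if *deadline <= now);
        if due {
            self.heap.pop().map(|Reverse(entry)| entry)
        } else {
            None
        }
    }
}

/// Drains every due entry; returns the woken thread ids in deadline order
/// (ties broken by tid, so the order is a pure function of the inputs).
fn drain_due(queue: &mut WakeQueue, now: u64) -> Vec<u32> {
    let mut woken = Vec::new();
    while let Some((_, tid)) = queue.pop_if_due(now) {
        woken.push(tid);
    }
    woken
}

fn main() {
    let mut q = WakeQueue::new();
    q.arm(100, 5);
    q.arm(50, 7);
    q.arm(200, 9);
    println!("woken at now=100: {:?}", drain_due(&mut q, 100));
    println!("woken at now=200: {:?}", drain_due(&mut q, 200));
}
```

Because `now` is derived from guest instructions, two runs with the same inputs wake the same threads in the same order, which is the determinism property the note asks to preserve.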
## Issue 4 — VFS synthesized-success-on-miss — LEFT (risky / coupled to Issue 1 trajectory)
The synthesis fallback handles a MIX (writable-partition probes partition0/Cache0 + a genuine disc
miss dat/files.tbl, verified absent from the ISO). Canary doesn't fire XamShowDirtyDiscErrorUI during
boot (the one "DirtyDisc" log hit is the import-table declaration). Not cleanly separable without
heuristic disc-vs-partition routing. Re-verify on the corrected post-Issue-1 (and post-acd1656)
trajectory before changing.
## Issue 5 — Mutant object — SKIPPED (verified unused)
Sylpheed's XEX import table contains NO mutant symbols (NtCreateMutant/NtReleaseMutant/KeReleaseMutant/
KeInitializeMutant/NtQueryMutant) — the game cannot call them; unimplemented=0 across boot. A correct
implementation needs mutant hand-off semantics + an owner-type redesign (the existing
`Mutex { owner: Option<u8> }` tracks a HW slot, not a thread) in the determinism-critical wait path,
for code that never executes. Per the mandate's skip-if-unused criterion, left unimplemented. Can be
added on request as a pure canary-parity / future-title feature (determinism-safe since no Sylpheed
mutant ever exists at runtime).


@@ -1301,6 +1301,29 @@ fn cmd_exec_inner(
} }
} }
// iterate-2E — pointer-chase probe. `XENIA_AUDIT_DEREF=<reg>:<off>`
// (e.g. `4:36`). On each AUDIT-PC-PROBE fire, dumps gpr[reg] as a base
// object, the sub-object at [base+off], and that sub-object's vtable.
// Read-only; lockstep digest unaffected.
if let Ok(spec) = std::env::var("XENIA_AUDIT_DEREF") {
if !spec.is_empty() {
let (rs, os) = spec
.split_once(':')
.ok_or_else(|| anyhow::anyhow!("XENIA_AUDIT_DEREF {spec:?}: expected <reg>:<off>"))?;
let reg: u8 = rs.trim_start_matches('r').parse()
.map_err(|e| anyhow::anyhow!("XENIA_AUDIT_DEREF reg {rs:?}: {e}"))?;
let off: u32 = if let Some(h) = os.strip_prefix("0x") {
u32::from_str_radix(h, 16)
} else {
os.parse::<u32>()
}.map_err(|e| anyhow::anyhow!("XENIA_AUDIT_DEREF off {os:?}: {e}"))?;
kernel.audit_deref = Some((reg, off));
if !quiet {
tracing::info!("audit-deref armed: r{} +0x{:x}", reg, off);
}
}
}
// Diagnostic. Parse `--dump-addr=0x828F3D08,...` (or
// `XENIA_DUMP_ADDR=...`) into `kernel.dump_addrs`. The contents
// are dumped at end-of-run by `dump_thread_diagnostic`. Pure
@@ -1474,16 +1497,28 @@ fn cmd_exec_inner(
         mem.write_u32(addr, block);
     }
     ("xboxkrnl.exe", 0x00AD) => {
-        // KeTimeStampBundle — 0x18 block with FILETIME at +0 and
-        // interrupt-time u64 at +0x10. Mirrors the clock used by
-        // KeQuerySystemTime so fast-path readers see consistent values.
+        // KeTimeStampBundle — X_TIME_STAMP_BUNDLE (canary layout,
+        // kernel_state.h): +0x00 interrupt_time u64, +0x08
+        // system_time u64 (FILETIME 100ns), +0x10 tick_count u32
+        // (milliseconds since boot), +0x14 padding. The guest's
+        // worker-hub channel-dispatch loop (sub_82450A68 @
+        // 0x82450b10) polls [block+0x10] (tick_count) and gates
+        // dispatch on a `tick_count + 66` (ms) deadline. The block
+        // MUST be ticked over the run or that deadline never
+        // elapses (tid14 0x109c starvation gate). Initialize to a
+        // zero-uptime base; KernelState::update_timestamp_bundle
+        // ticks it every round from the deterministic global_clock.
         let block = alloc_zero(0x18, &mut mem, &mut kernel);
         if block != 0 {
-            let fake_time: u64 = 132_500_000_000_000_000; // ~2021 FILETIME
-            mem.write_u32(block, (fake_time >> 32) as u32);
-            mem.write_u32(block + 4, fake_time as u32);
-            mem.write_u32(block + 0x10, (fake_time >> 32) as u32);
-            mem.write_u32(block + 0x14, fake_time as u32);
+            // FILETIME base (~2021) so system_time is plausible.
+            let fake_time: u64 = 132_500_000_000_000_000;
+            mem.write_u32(block, 0); // interrupt_time hi
+            mem.write_u32(block + 4, 0); // interrupt_time lo
+            mem.write_u32(block + 0x08, (fake_time >> 32) as u32); // system_time hi
+            mem.write_u32(block + 0x0C, fake_time as u32); // system_time lo
+            mem.write_u32(block + 0x10, 0); // tick_count (ms) = 0 at boot
+            mem.write_u32(block + 0x14, 0); // padding
+            kernel.timestamp_bundle_addr = block;
         }
         mem.write_u32(addr, block);
     }
@@ -2124,6 +2159,27 @@ fn coord_pre_round(
} }
kernel.fire_due_timers(); kernel.fire_due_timers();
// 2.AF — fire expired wait-deadlines under load. Without this drain,
// `advance_to_next_wake_if_due` only runs in `coord_idle_advance` (the
// no-Ready-threads path), so a thread whose `KeWait*`/`KeDelay` deadline
// expires while other threads keep the scheduler busy sits Blocked
// forever (observed: tid=5's 42.95ms deadline unfired 29s+). Drain every
// entry whose deadline `<=` the current guest timebase — the same `now`
// basis `fire_due_timers` uses, so the two stay in lock-step — and let
// `handle_timeout_wake` stamp `STATUS_TIMEOUT` and scrub the waiter from
// each handle. `advance_to_next_wake_if_due` pops at most one due wake
// per call and returns `None` once the earliest remaining deadline is in
// the future, so this loop terminates. Deterministic: `ctx(0).timebase`
// is the guest-cycle timebase, not host_ns. This runs in `coord_pre_round`
// which both the lockstep and parallel outer loops call every round.
loop {
let now = kernel.now_basis_at(0);
let Some((r, reason)) = kernel.scheduler.advance_to_next_wake_if_due(now)
else {
break;
};
kernel.handle_timeout_wake(r, reason);
}
// Graphics-interrupt delivery is no longer done here — see // Graphics-interrupt delivery is no longer done here — see
// `dispatch_graphics_interrupts`, called from the outer loop with // `dispatch_graphics_interrupts`, called from the outer loop with
// `mem` and `&mut stats` in scope. The audio path still uses the // `mem` and `&mut stats` in scope. The audio path still uses the
@@ -2575,6 +2631,10 @@ fn worker_prologue(
match result { match result {
StepResult::Continue => {} StepResult::Continue => {}
StepResult::Yield => {
// db16cyc spin-wait hint (per-instruction path): yield the slot.
kernel.scheduler.yield_current();
}
StepResult::SystemCall => { StepResult::SystemCall => {
tracing::warn!("SYSCALL at {:#010x} (hw={})", pc, hw_id); tracing::warn!("SYSCALL at {:#010x} (hw={})", pc, hw_id);
} }
@@ -2654,6 +2714,11 @@ fn worker_epilogue(
match result { match result {
StepResult::Continue => {} StepResult::Continue => {}
StepResult::Yield => {
// db16cyc spin-wait hint: hand the slot to a Ready peer so the
// spinner doesn't starve the co-located thread it is waiting on.
kernel.scheduler.yield_current();
}
StepResult::SystemCall => { StepResult::SystemCall => {
let last_pc = block.instrs.last().map(|i| i.addr).unwrap_or(pc_before); let last_pc = block.instrs.last().map(|i| i.addr).unwrap_or(pc_before);
tracing::warn!("SYSCALL at {:#010x} (hw={})", last_pc, hw_id); tracing::warn!("SYSCALL at {:#010x} (hw={})", last_pc, hw_id);
@@ -2780,6 +2845,32 @@ fn run_execution(
RoundCtl::BreakOuter => break, RoundCtl::BreakOuter => break,
RoundCtl::Continue => {} RoundCtl::Continue => {}
} }
// ITERATE-2C Phase D — deposit the current instruction count so
// `nt_create_event` can compute absolute auto-signal deadlines,
// then drain any pending auto-signals whose deadline has passed.
// Both calls are no-ops when `XENIA_SILPH_UI_AUTOSIGNAL_DELAY`
// is unset (the pending queue stays empty).
kernel.set_now_cycle_hint(stats.instruction_count);
// Drive the coherent monotonic "now" the kernel deadline-arithmetic
// reads (`KernelState::now_basis_at` -> `Scheduler::global_clock`)
// from the deterministic retired-instruction count. Floored up (never
// backwards). This is the LOCKSTEP analogue of the parallel writeback's
// `advance_global_clock`: a parked/poll thread computing a relative
// timeout via `parse_timeout` now reads a real, non-zero, monotone
// basis instead of `idle_ctx`'s timebase-0, so its deadline lands in
// the future and `coord_idle_advance` stops re-arming the constant
// past deadline forever (the timebase-desync livelock / render-gate
// root). Pure function of guest instructions -> bit-reproducible.
kernel
.scheduler
.advance_global_clock_to(stats.instruction_count);
// ITERATE-2J — tick the KeTimeStampBundle (ordinal 0x00AD) from the
// same deterministic clock so the guest's worker-hub tick_count
// deadline gate (`[block+0x10] + 66` ms) actually elapses. Without
// this the block is frozen at boot and the hub spins forever,
// starving tid14 on event 0x109c.
kernel.update_timestamp_bundle(mem, kernel.scheduler.global_clock());
kernel.fire_due_silph_autosignals(stats.instruction_count);
dispatch_graphics_interrupts( dispatch_graphics_interrupts(
kernel, kernel,
mem, mem,
@@ -3118,6 +3209,16 @@ fn run_execution_parallel(
.and_then(|t| guard.scheduler.find_by_tid(t)) .and_then(|t| guard.scheduler.find_by_tid(t))
.unwrap_or(thread_ref); .unwrap_or(thread_ref);
*guard.scheduler.ctx_mut_ref(target_ref) = ctx_taken; *guard.scheduler.ctx_mut_ref(target_ref) = ctx_taken;
// Advance the parallel-mode coherent clock by
// the instructions this block retired. This is
// the single authoritative "now" the kernel
// deadline-arithmetic reads in parallel mode
// (per-thread `ctx.timebase` is incoherent here
// because peers extract/zero their slots) —
// keeping it monotonic breaks the timebase-
// desync livelock where a woken thread re-armed
// the same constant deadline forever.
guard.scheduler.advance_global_clock(executed);
// worker_epilogue's exit_current path // worker_epilogue's exit_current path
// expects scheduler.current to be set // expects scheduler.current to be set
// to the running thread. // to the running thread.
@@ -3204,6 +3305,25 @@ fn run_execution_parallel(
} }
let mut guard = pre_outcome.1; let mut guard = pre_outcome.1;
// ITERATE-2C Phase D — same auto-signal hook as the lockstep
// path. Held under the same `kernel_arc` guard the rest of
// this prologue runs under, so no extra locking.
{
let s = stats_mtx.lock().expect("stats mutex poisoned");
guard.set_now_cycle_hint(s.instruction_count);
guard.fire_due_silph_autosignals(s.instruction_count);
}
// ITERATE-2J — tick the KeTimeStampBundle (ordinal 0x00AD) from
// the parallel-mode coherent global_clock (summed per-block
// retired instructions). Same fix as the lockstep loop: keeps the
// guest's worker-hub tick_count deadline gate advancing so it
// dispatches channel-3 and unblocks tid14 on event 0x109c.
{
let clock = guard.scheduler.global_clock();
guard.update_timestamp_bundle(mem, clock);
}
// Iterate-2.BE — host-driven synchronous ISR dispatch. // Iterate-2.BE — host-driven synchronous ISR dispatch.
// Runs under the kernel lock while workers are still parked // Runs under the kernel lock while workers are still parked
// at the phaser B2 barrier (the coordinator hasn't published // at the phaser B2 barrier (the coordinator hasn't published
@@ -3555,6 +3675,9 @@ fn dispatch_graphics_interrupts(
isr_instrs += 1; isr_instrs += 1;
match r { match r {
StepResult::Continue => {} StepResult::Continue => {}
// db16cyc inside the synchronous ISR has no slot to yield —
// the ISR runs to completion on the borrowed context.
StepResult::Yield => {}
StepResult::SystemCall => { StepResult::SystemCall => {
tracing::warn!("graphics ISR hit `sc` instruction; aborting"); tracing::warn!("graphics ISR hit `sc` instruction; aborting");
break; break;
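
The `X_TIME_STAMP_BUNDLE` layout described in the ordinal `0x00AD` comment can be exercised on a plain byte buffer. This is a sketch under stated assumptions, not the project's `update_timestamp_bundle`: `write_bundle`, `read_tick_count`, and `deadline_elapsed` are hypothetical helpers, guest memory is assumed big-endian, and advancing `interrupt_time` and `system_time` in 100 ns units derived from the millisecond uptime is an assumption of the example. The mapping from `global_clock` to milliseconds is not shown in the diff, so the uptime is simply passed in.

```rust
// Offsets taken from the comment in the diff (canary X_TIME_STAMP_BUNDLE).
const OFF_INTERRUPT_TIME: usize = 0x00; // u64
const OFF_SYSTEM_TIME: usize = 0x08; // u64, FILETIME in 100 ns units
const OFF_TICK_COUNT: usize = 0x10; // u32, milliseconds since boot
const BUNDLE_SIZE: usize = 0x18; // includes 4 bytes of padding at +0x14

/// Hypothetical helper: fills the bundle for a given uptime in milliseconds.
/// Assumption: both 64-bit fields advance by uptime_ms * 10_000 (100 ns units).
fn write_bundle(buf: &mut [u8; BUNDLE_SIZE], base_filetime: u64, uptime_ms: u32) {
    let elapsed_100ns = uptime_ms as u64 * 10_000;
    buf[OFF_INTERRUPT_TIME..OFF_INTERRUPT_TIME + 8]
        .copy_from_slice(&elapsed_100ns.to_be_bytes());
    buf[OFF_SYSTEM_TIME..OFF_SYSTEM_TIME + 8]
        .copy_from_slice(&(base_filetime + elapsed_100ns).to_be_bytes());
    buf[OFF_TICK_COUNT..OFF_TICK_COUNT + 4].copy_from_slice(&uptime_ms.to_be_bytes());
}

/// Reads tick_count the way a guest `lwz` at [block+0x10] would see it.
fn read_tick_count(buf: &[u8; BUNDLE_SIZE]) -> u32 {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&buf[OFF_TICK_COUNT..OFF_TICK_COUNT + 4]);
    u32::from_be_bytes(raw)
}

/// Hypothetical model of the guest gate: dispatch once 66 ms have passed
/// since the tick value captured when the deadline was armed.
fn deadline_elapsed(buf: &[u8; BUNDLE_SIZE], armed_tick: u32) -> bool {
    read_tick_count(buf).wrapping_sub(armed_tick) >= 66
}

fn main() {
    let base: u64 = 132_500_000_000_000_000; // FILETIME base used in the diff
    let mut buf = [0u8; BUNDLE_SIZE];

    write_bundle(&mut buf, base, 0);
    println!("boot: elapsed = {}", deadline_elapsed(&buf, 0));

    write_bundle(&mut buf, base, 66);
    println!("66 ms: elapsed = {}", deadline_elapsed(&buf, 0));
}
```

With a block that is never rewritten after boot, `read_tick_count` stays at 0 and the gate never opens, which matches the frozen-hub symptom the ITERATE-2J comments describe.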


@@ -1,9 +1,9 @@
 {
-  "instructions": 50000001,
-  "imports": 40454,
+  "instructions": 50000003,
+  "imports": 451508,
   "unimpl": 0,
   "draws": 0,
-  "swaps": 1,
+  "swaps": 2,
   "unique_render_targets": 0,
   "shader_blobs_live": 0,
   "texture_cache_entries": 0


@@ -28,6 +28,15 @@ pub enum StepResult {
Trap, Trap,
/// Execution halted (by debugger or error). /// Execution halted (by debugger or error).
Halted, Halted,
/// Executed the `db16cyc` spin-wait hint (`or r31,r31,r31`, encoding
/// `0x7FFFFB78`). The PC has already advanced past the hint; this is a
/// cooperative-yield signal so the scheduler hands the slot to a Ready
/// peer. On real hardware all six HW threads run concurrently and the
/// spin resolves naturally; under our round-robin lockstep a spinning
/// barrier/spinlock participant would otherwise monopolize its slot and
/// starve the co-located thread it is waiting on. Matches canary's
/// `InstrEmit_orx` db16cyc → `DelayExecution()` handling.
Yield,
} }
/// Execute a single PPC instruction. /// Execute a single PPC instruction.
@@ -95,6 +104,9 @@ pub fn step_block(
ctx.cycle_count += 1; ctx.cycle_count += 1;
ctx.timebase += 1; ctx.timebase += 1;
if !matches!(result, StepResult::Continue) { if !matches!(result, StepResult::Continue) {
// `Yield` (db16cyc spin hint) terminates the block here so the
// scheduler regains control and can rotate the slot; the PC has
// already advanced past the hint inside `execute`.
return result; return result;
} }
// PC discontinuity within a block. By construction only the // PC discontinuity within a block. By construction only the
@@ -117,65 +129,65 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::addis => { PpcOpcode::addis => {
// Xbox 360 user mode is 32-bit ABI (MSR.SF=0), so addis must // PPCBUG-020 fix: Xenon is a 64-bit core; `addis` produces the full
// produce a value whose upper 32 bits don't pollute downstream // 64-bit `RA + (EXTS(SI) << 16)`. Matches canary
// 64-bit arithmetic. The PPC ISA in 64-bit mode sign-extends // (`Add(RA, Int64(EXTS(imm) << 16))`, stores full 64-bit).
// simm16 before the shift, producing 0xFFFFFFFF_xxxx0000 for
// negative simm16 (high bit set). When this value flows into
// a 64-bit subfc against a zero-extended lwz value, the unsigned
// 64-bit comparison yields wrong CA. Truncate to 32 bits to
// simulate 32-bit ABI behavior.
let ra_val = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; let ra_val = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] };
let result = ra_val.wrapping_add((instr.simm16() as i64 as u64) << 16); let result = ra_val.wrapping_add((instr.simm16() as i64 as u64) << 16);
ctx.gpr[instr.rd()] = result as u32 as u64; ctx.gpr[instr.rd()] = result;
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::addic => { PpcOpcode::addic => {
// PPCBUG-002: 32-bit ABI. CA must be from a 32-bit unsigned compare; // PPCBUG-020 fix: full 64-bit `RA + EXTS(SI)` (canary `Add(RA,
// canary's `AddDidCarry` truncates both operands to int32 first. // Int64(EXTS(imm)))`). CA stays a 32-bit unsigned compare to match
// canary's `AddDidCarry` (truncates operands to int32 first).
let ra32 = ctx.gpr[instr.ra()] as u32; let ra32 = ctx.gpr[instr.ra()] as u32;
let imm32 = instr.simm16() as i32 as u32; let imm32 = instr.simm16() as i32 as u32;
let result32 = ra32.wrapping_add(imm32); let result32 = ra32.wrapping_add(imm32);
ctx.xer_ca = if result32 < ra32 { 1 } else { 0 }; ctx.xer_ca = if result32 < ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64; ctx.gpr[instr.rd()] = ctx.gpr[instr.ra()].wrapping_add(instr.simm16() as i64 as u64);
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::addicx => { PpcOpcode::addicx => {
// PPCBUG-003: same fix as addic plus CR0 i32 view. // PPCBUG-020 fix: full 64-bit result; CA 32-bit; CR0 32-bit i32 view
// (= low 32 of the result; unchanged from the pre-fix behaviour).
let ra32 = ctx.gpr[instr.ra()] as u32; let ra32 = ctx.gpr[instr.ra()] as u32;
let imm32 = instr.simm16() as i32 as u32; let imm32 = instr.simm16() as i32 as u32;
let result32 = ra32.wrapping_add(imm32); let result32 = ra32.wrapping_add(imm32);
ctx.xer_ca = if result32 < ra32 { 1 } else { 0 }; ctx.xer_ca = if result32 < ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64; ctx.gpr[instr.rd()] = ctx.gpr[instr.ra()].wrapping_add(instr.simm16() as i64 as u64);
ctx.update_cr_signed(0, result32 as i32 as i64); ctx.update_cr_signed(0, result32 as i32 as i64);
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::subficx => { PpcOpcode::subficx => {
// PPCBUG-005: 32-bit ABI. Sign-extended imm has bits 32-63 set for // PPCBUG-020 fix: full 64-bit `EXTS(SI) - RA` (canary `Sub(Int64(
// negative SIMM, poisoning the writeback. Canary uses 32-bit form. // EXTS(imm)), RA)`). CA stays a 32-bit compare.
let ra32 = ctx.gpr[instr.ra()] as u32; let ra32 = ctx.gpr[instr.ra()] as u32;
let imm32 = instr.simm16() as i32 as u32; let imm32 = instr.simm16() as i32 as u32;
let result32 = imm32.wrapping_sub(ra32); let result32 = imm32.wrapping_sub(ra32);
ctx.xer_ca = if imm32 >= ra32 { 1 } else { 0 }; ctx.xer_ca = if imm32 >= ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64; ctx.gpr[instr.rd()] = (instr.simm16() as i64 as u64).wrapping_sub(ctx.gpr[instr.ra()]);
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::mulli => { PpcOpcode::mulli => {
// PPCBUG-004: 32-bit ABI. Read RA as i32 (low 32, sign-extended for // PPCBUG-020 fix: full 64-bit low product of (full 64-bit RA) ×
// multiply), product fits in 32 bits per ISA (overflow wraps). // EXTS(SI). Matches canary InstrEmit_mulli
let ra = ctx.gpr[instr.ra()] as i32 as i64; // (`StoreGPR(Mul(LoadGPR(RA), Int64(EXTS(imm))))`).
let ra = ctx.gpr[instr.ra()] as i64;
let imm = instr.simm16() as i64; let imm = instr.simm16() as i64;
ctx.gpr[instr.rd()] = (ra.wrapping_mul(imm) as u32) as u64; ctx.gpr[instr.rd()] = ra.wrapping_mul(imm) as u64;
ctx.pc += 4; ctx.pc += 4;
} }
// ===== ALU: Register ===== // ===== ALU: Register =====
PpcOpcode::addx => { PpcOpcode::addx => {
// PPCBUG-012+020: 32-bit ABI writeback truncation + CR0 i32 view. // PPCBUG-020 fix: full 64-bit `RA + RB` (canary `Add(RA, RB)`).
// OV/CR0 keep their 32-bit computation (low 32 of the result is
// unchanged), so only the previously-zeroed upper 32 bits change.
let ra32 = ctx.gpr[instr.ra()] as u32; let ra32 = ctx.gpr[instr.ra()] as u32;
let rb32 = ctx.gpr[instr.rb()] as u32; let rb32 = ctx.gpr[instr.rb()] as u32;
let result32 = ra32.wrapping_add(rb32); let result32 = ra32.wrapping_add(rb32);
ctx.gpr[instr.rd()] = result32 as u64; ctx.gpr[instr.rd()] = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]);
if instr.oe() { if instr.oe() {
let true_sum = (ra32 as i32 as i128) + (rb32 as i32 as i128); let true_sum = (ra32 as i32 as i128) + (rb32 as i32 as i128);
overflow::apply(ctx, true_sum != (result32 as i32) as i128); overflow::apply(ctx, true_sum != (result32 as i32) as i128);
@@ -186,12 +198,13 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::addcx => { PpcOpcode::addcx => {
// PPCBUG-013+020: 32-bit truncation; CA from u32 unsigned compare. // PPCBUG-020 fix: full 64-bit `RA + RB`; CA stays 32-bit (canary
// `AddDidCarry` truncates to int32). Low 32 of result unchanged.
let ra32 = ctx.gpr[instr.ra()] as u32; let ra32 = ctx.gpr[instr.ra()] as u32;
let rb32 = ctx.gpr[instr.rb()] as u32; let rb32 = ctx.gpr[instr.rb()] as u32;
let result32 = ra32.wrapping_add(rb32); let result32 = ra32.wrapping_add(rb32);
ctx.xer_ca = if result32 < ra32 { 1 } else { 0 }; ctx.xer_ca = if result32 < ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64; ctx.gpr[instr.rd()] = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]);
if instr.oe() { if instr.oe() {
let true_sum = (ra32 as i32 as i128) + (rb32 as i32 as i128); let true_sum = (ra32 as i32 as i128) + (rb32 as i32 as i128);
overflow::apply(ctx, true_sum != (result32 as i32) as i128); overflow::apply(ctx, true_sum != (result32 as i32) as i128);
@@ -202,13 +215,15 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::addex => { PpcOpcode::addex => {
// PPCBUG-014+020: 32-bit truncation; CA from u32 unsigned compare. // PPCBUG-020 fix: full 64-bit `RA + RB + CA`; CA stays 32-bit.
let ra32 = ctx.gpr[instr.ra()] as u32; let ra32 = ctx.gpr[instr.ra()] as u32;
let rb32 = ctx.gpr[instr.rb()] as u32; let rb32 = ctx.gpr[instr.rb()] as u32;
let ca = ctx.xer_ca as u32; let ca = ctx.xer_ca as u32;
let result32 = ra32.wrapping_add(rb32).wrapping_add(ca); let result32 = ra32.wrapping_add(rb32).wrapping_add(ca);
ctx.xer_ca = if result32 < ra32 || (ca != 0 && result32 == ra32) { 1 } else { 0 }; ctx.xer_ca = if result32 < ra32 || (ca != 0 && result32 == ra32) { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64; ctx.gpr[instr.rd()] = ctx.gpr[instr.ra()]
.wrapping_add(ctx.gpr[instr.rb()])
.wrapping_add(ca as u64);
if instr.oe() { if instr.oe() {
let true_sum = (ra32 as i32 as i128) + (rb32 as i32 as i128) + (ca as i128); let true_sum = (ra32 as i32 as i128) + (rb32 as i32 as i128) + (ca as i128);
overflow::apply(ctx, true_sum != (result32 as i32) as i128); overflow::apply(ctx, true_sum != (result32 as i32) as i128);
@@ -219,12 +234,12 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::addzex => { PpcOpcode::addzex => {
// PPCBUG-015+020: 32-bit truncation. // PPCBUG-020 fix: full 64-bit `RA + CA`; CA stays 32-bit.
let ra32 = ctx.gpr[instr.ra()] as u32; let ra32 = ctx.gpr[instr.ra()] as u32;
let ca = ctx.xer_ca as u32; let ca = ctx.xer_ca as u32;
let result32 = ra32.wrapping_add(ca); let result32 = ra32.wrapping_add(ca);
ctx.xer_ca = if result32 < ra32 { 1 } else { 0 }; ctx.xer_ca = if result32 < ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64; ctx.gpr[instr.rd()] = ctx.gpr[instr.ra()].wrapping_add(ca as u64);
if instr.oe() { if instr.oe() {
let true_sum = (ra32 as i32 as i128) + (ca as i128); let true_sum = (ra32 as i32 as i128) + (ca as i128);
overflow::apply(ctx, true_sum != (result32 as i32) as i128); overflow::apply(ctx, true_sum != (result32 as i32) as i128);
@@ -235,12 +250,12 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::addmex => { PpcOpcode::addmex => {
// PPCBUG-016+020: 32-bit truncation. RT = RA + CA - 1. // PPCBUG-020 fix: full 64-bit `RA + CA - 1`; CA stays 32-bit.
let ra32 = ctx.gpr[instr.ra()] as u32; let ra32 = ctx.gpr[instr.ra()] as u32;
let ca = ctx.xer_ca as u32; let ca = ctx.xer_ca as u32;
let result32 = ra32.wrapping_add(ca).wrapping_sub(1); let result32 = ra32.wrapping_add(ca).wrapping_sub(1);
ctx.xer_ca = if ra32 != 0 || ca != 0 { 1 } else { 0 }; ctx.xer_ca = if ra32 != 0 || ca != 0 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64; ctx.gpr[instr.rd()] = ctx.gpr[instr.ra()].wrapping_add(ca as u64).wrapping_sub(1);
if instr.oe() { if instr.oe() {
let true_sum = (ra32 as i32 as i128) + (ca as i128) - 1; let true_sum = (ra32 as i32 as i128) + (ca as i128) - 1;
overflow::apply(ctx, true_sum != (result32 as i32) as i128); overflow::apply(ctx, true_sum != (result32 as i32) as i128);
@@ -251,11 +266,12 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::subfx => { PpcOpcode::subfx => {
// PPCBUG-017+020: 32-bit truncation. // PPCBUG-020 fix: full 64-bit `RB - RA` (canary `Sub(RB, RA)`).
// OV/CR0 keep their 32-bit view (low 32 of result unchanged).
let ra32 = ctx.gpr[instr.ra()] as u32; let ra32 = ctx.gpr[instr.ra()] as u32;
let rb32 = ctx.gpr[instr.rb()] as u32; let rb32 = ctx.gpr[instr.rb()] as u32;
let result32 = rb32.wrapping_sub(ra32); let result32 = rb32.wrapping_sub(ra32);
ctx.gpr[instr.rd()] = result32 as u64; ctx.gpr[instr.rd()] = ctx.gpr[instr.rb()].wrapping_sub(ctx.gpr[instr.ra()]);
if instr.oe() { if instr.oe() {
let true_diff = (rb32 as i32 as i128) - (ra32 as i32 as i128); let true_diff = (rb32 as i32 as i128) - (ra32 as i32 as i128);
overflow::apply(ctx, true_diff != (result32 as i32) as i128); overflow::apply(ctx, true_diff != (result32 as i32) as i128);
@@ -266,14 +282,13 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::subfcx => { PpcOpcode::subfcx => {
// PPCBUG-007: 32-bit ABI. The `rb >= ra` u64 unsigned compare is // PPCBUG-020 fix: full 64-bit `RB - RA`; CA stays a 32-bit `rb >= ra`
// exactly the shape that broke addis. Defensive 32-bit truncation // compare (canary `SubDidCarry` truncates to int32).
// is required for correct CA even after upstream cleanup.
let ra32 = ctx.gpr[instr.ra()] as u32; let ra32 = ctx.gpr[instr.ra()] as u32;
let rb32 = ctx.gpr[instr.rb()] as u32; let rb32 = ctx.gpr[instr.rb()] as u32;
let result32 = rb32.wrapping_sub(ra32); let result32 = rb32.wrapping_sub(ra32);
ctx.xer_ca = if rb32 >= ra32 { 1 } else { 0 }; ctx.xer_ca = if rb32 >= ra32 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64; ctx.gpr[instr.rd()] = ctx.gpr[instr.rb()].wrapping_sub(ctx.gpr[instr.ra()]);
if instr.oe() { if instr.oe() {
let true_diff = (rb32 as i32 as i128) - (ra32 as i32 as i128); let true_diff = (rb32 as i32 as i128) - (ra32 as i32 as i128);
overflow::apply(ctx, true_diff != (result32 as i32) as i128); overflow::apply(ctx, true_diff != (result32 as i32) as i128);
@@ -284,14 +299,16 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::subfex => { PpcOpcode::subfex => {
// PPCBUG-008: 32-bit ABI. Compute in u32 space — `!ra` on u64 always // PPCBUG-020 fix: full 64-bit `~RA + RB + CA` (canary semantics).
// pollutes the upper 32 bits, making this an active poisoner. // CA keeps its 32-bit compare. Low 32 of the result is unchanged.
let ra32 = ctx.gpr[instr.ra()] as u32; let ra32 = ctx.gpr[instr.ra()] as u32;
let rb32 = ctx.gpr[instr.rb()] as u32; let rb32 = ctx.gpr[instr.rb()] as u32;
let ca = ctx.xer_ca as u32; let ca = ctx.xer_ca as u32;
let result32 = (!ra32).wrapping_add(rb32).wrapping_add(ca); let result32 = (!ra32).wrapping_add(rb32).wrapping_add(ca);
ctx.xer_ca = if rb32 > ra32 || (rb32 == ra32 && ca != 0) { 1 } else { 0 }; ctx.xer_ca = if rb32 > ra32 || (rb32 == ra32 && ca != 0) { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64; ctx.gpr[instr.rd()] = (!ctx.gpr[instr.ra()])
.wrapping_add(ctx.gpr[instr.rb()])
.wrapping_add(ca as u64);
if instr.oe() { if instr.oe() {
// RT <- !RA + RB + CA == RB - RA - 1 + CA (32-bit semantics). // RT <- !RA + RB + CA == RB - RA - 1 + CA (32-bit semantics).
let true_sum = (rb32 as i32 as i128) - (ra32 as i32 as i128) - 1 + (ca as i128); let true_sum = (rb32 as i32 as i128) - (ra32 as i32 as i128) - 1 + (ca as i128);
@@ -303,14 +320,13 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::subfzex => { PpcOpcode::subfzex => {
// PPCBUG-018: same active-poisoning shape as subfex; operate in u32. // PPCBUG-020 fix: full 64-bit `~RA + CA` (canary semantics).
let ra32 = ctx.gpr[instr.ra()] as u32; let ra32 = ctx.gpr[instr.ra()] as u32;
let ca = ctx.xer_ca as u32; let ca = ctx.xer_ca as u32;
let result32 = (!ra32).wrapping_add(ca); let result32 = (!ra32).wrapping_add(ca);
// RT <- !RA + CA (no -1 term). 32-bit carry-out only when // CA: 32-bit carry-out only when !ra32 = u32::MAX (ra32 = 0) AND ca = 1.
// !ra32 = u32::MAX (i.e. ra32 = 0) AND ca = 1.
ctx.xer_ca = if ra32 == 0 && ca != 0 { 1 } else { 0 }; ctx.xer_ca = if ra32 == 0 && ca != 0 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64; ctx.gpr[instr.rd()] = (!ctx.gpr[instr.ra()]).wrapping_add(ca as u64);
if instr.oe() { if instr.oe() {
let true_sum = -(ra32 as i32 as i128) - 1 + (ca as i128); let true_sum = -(ra32 as i32 as i128) - 1 + (ca as i128);
overflow::apply(ctx, true_sum != (result32 as i32) as i128); overflow::apply(ctx, true_sum != (result32 as i32) as i128);
@@ -321,13 +337,13 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::subfmex => { PpcOpcode::subfmex => {
// PPCBUG-019: also fixes the always-true CA edge — `!ra` on u64 // PPCBUG-020 fix: full 64-bit `~RA + CA - 1` (canary semantics). CA
// is non-zero when ra32==0xFFFFFFFF and ca==0, so CA was stuck at 1. // uses the 32-bit `!ra32` so it isn't stuck at 1 from u64 inversion.
let ra32 = ctx.gpr[instr.ra()] as u32; let ra32 = ctx.gpr[instr.ra()] as u32;
let ca = ctx.xer_ca as u32; let ca = ctx.xer_ca as u32;
let result32 = (!ra32).wrapping_add(ca).wrapping_sub(1); let result32 = (!ra32).wrapping_add(ca).wrapping_sub(1);
ctx.xer_ca = if (!ra32) != 0 || ca != 0 { 1 } else { 0 }; ctx.xer_ca = if (!ra32) != 0 || ca != 0 { 1 } else { 0 };
ctx.gpr[instr.rd()] = result32 as u64; ctx.gpr[instr.rd()] = (!ctx.gpr[instr.ra()]).wrapping_add(ca as u64).wrapping_sub(1);
if instr.oe() { if instr.oe() {
let true_sum = -(ra32 as i32 as i128) - 2 + (ca as i128); let true_sum = -(ra32 as i32 as i128) - 2 + (ca as i128);
overflow::apply(ctx, true_sum != (result32 as i32) as i128); overflow::apply(ctx, true_sum != (result32 as i32) as i128);
@@ -338,12 +354,11 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::negx => { PpcOpcode::negx => {
// PPCBUG-006: 32-bit ABI. `(!ra).wrapping_add(1)` on u64 always // PPCBUG-020 fix: full 64-bit `-RA` (canary `Sub(0, RA)`). OV keeps
// sets upper 32 bits — every neg poisoned the GPR. neg_ov also // the 32-bit INT_MIN check (low 32 of the result is unchanged).
// checks at 64-bit INT_MIN; should be 32-bit INT_MIN.
let ra32 = ctx.gpr[instr.ra()] as u32; let ra32 = ctx.gpr[instr.ra()] as u32;
let result32 = (!ra32).wrapping_add(1); let result32 = (!ra32).wrapping_add(1);
ctx.gpr[instr.rd()] = result32 as u64; ctx.gpr[instr.rd()] = 0u64.wrapping_sub(ctx.gpr[instr.ra()]);
if instr.oe() { if instr.oe() {
overflow::apply(ctx, ra32 == 0x8000_0000); overflow::apply(ctx, ra32 == 0x8000_0000);
} }
@@ -353,12 +368,15 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::mullwx => { PpcOpcode::mullwx => {
// PPCBUG-009: 32-bit ABI. Truncate product to u32 — overflow detection // PPCBUG-020 fix: full 64-bit low product of EXTS(RA[32:63]) ×
// (mullw_ov) still uses the full i64 product to catch the overflow. // EXTS(RB[32:63]) (canary InstrEmit_mullwx stores the full i64
// product). A 32×32 product can occupy the upper 32 bits (e.g.
// 0x10000 × 0x10000 = 0x1_0000_0000); the old `as u32` dropped them.
// OV uses the full product; CR0 keeps its 32-bit (low-word) view.
let ra = ctx.gpr[instr.ra()] as i32 as i64; let ra = ctx.gpr[instr.ra()] as i32 as i64;
let rb = ctx.gpr[instr.rb()] as i32 as i64; let rb = ctx.gpr[instr.rb()] as i32 as i64;
let product = ra.wrapping_mul(rb); let product = ra.wrapping_mul(rb);
ctx.gpr[instr.rd()] = product as u32 as u64; ctx.gpr[instr.rd()] = product as u64;
if instr.oe() { if instr.oe() {
overflow::apply(ctx, overflow::mullw_ov(product)); overflow::apply(ctx, overflow::mullw_ov(product));
} }
@@ -542,6 +560,18 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] | ctx.gpr[instr.rb()]; ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] | ctx.gpr[instr.rb()];
if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); }
ctx.pc += 4; ctx.pc += 4;
// `or r31,r31,r31` with encoding 0x7FFFFB78 is the Xenon `db16cyc`
// spin-wait hint (a no-op write of r31 onto itself). Canary's
// `InstrEmit_orx` special-cases exactly this code → `DelayExecution()`.
// Under our round-robin lockstep, a guest spinlock/barrier loop that
// executes db16cyc would otherwise consume its whole block every round
// and starve the co-located thread it is waiting on (the lock holder /
// barrier peer). Surface it as a cooperative yield so the scheduler can
// hand the slot to a Ready peer. The semantic result of the op is
// already applied (r31 |= r31 is a no-op), so yielding is value-neutral.
if instr.raw == 0x7FFF_FB78 {
return StepResult::Yield;
}
} }
PpcOpcode::orcx => { PpcOpcode::orcx => {
// PPCBUG-028: same shape as andcx — operate in u32. // PPCBUG-028: same shape as andcx — operate in u32.
@@ -620,7 +650,12 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
PpcOpcode::slwx => { PpcOpcode::slwx => {
// PPCBUG-044: 32-bit ABI CR0 view. A result with bit 31 set // PPCBUG-044: 32-bit ABI CR0 view. A result with bit 31 set
// (e.g. 0x80000000) is negative in i32 view but positive in i64. // (e.g. 0x80000000) is negative in i32 view but positive in i64.
let sh = ctx.gpr[instr.rb()] as u32; // Shift amount is RB[58:63] (6 bits): if >=32 the result is zeroed,
// else shift by the low bits. Matches canary InstrEmit_slwx, which
// masks `rb & 0x3F` then tests bit 5 — NOT a full-u32 `< 32` test
// (a count like 0x40 has low-6-bits 0 and must pass the value
// through, not zero it).
let sh = ctx.gpr[instr.rb()] as u32 & 0x3F;
ctx.gpr[instr.ra()] = if sh < 32 { ctx.gpr[instr.ra()] = if sh < 32 {
((ctx.gpr[instr.rs()] as u32) << sh) as u64 ((ctx.gpr[instr.rs()] as u32) << sh) as u64
} else { 0 }; } else { 0 };
@@ -630,7 +665,9 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
PpcOpcode::srwx => { PpcOpcode::srwx => {
// PPCBUG-044: 32-bit ABI CR0 view (zero-extended right shift can never // PPCBUG-044: 32-bit ABI CR0 view (zero-extended right shift can never
// have bit 31 set, but use the canonical form for consistency). // have bit 31 set, but use the canonical form for consistency).
let sh = ctx.gpr[instr.rb()] as u32; // Shift amount masked to RB[58:63] (6 bits) to match canary
// InstrEmit_srwx (`rb & 0x3F`, test bit 5).
let sh = ctx.gpr[instr.rb()] as u32 & 0x3F;
ctx.gpr[instr.ra()] = if sh < 32 { ctx.gpr[instr.ra()] = if sh < 32 {
((ctx.gpr[instr.rs()] as u32) >> sh) as u64 ((ctx.gpr[instr.rs()] as u32) >> sh) as u64
} else { 0 }; } else { 0 };
@@ -638,37 +675,46 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::srawx => { PpcOpcode::srawx => {
// PPCBUG-041+043 coupled: 32-bit ABI writeback truncation + CR0 i32. // sraw: 32-bit arithmetic shift right. Per PowerISA the 32-bit result
// CA logic is independently correct (uses u32 shifted-out test). // is SIGN-extended into the full 64-bit RA (`RA <- r&m | (i64.s)&¬m`),
// matching canary InstrEmit_srawx (`v = f.SignExtend(v, INT64_TYPE)`).
// Earlier ours zero-extended (`result as u32 as u64`) — the PPCBUG-041
// "writeback truncation" band-aid — which corrupts any negative shift
// result consumed as a 64-bit value. CA logic is independently correct
// (uses the u32 shifted-out test) and the CR0 view is unchanged (the
// sign-extended i64 has the same i32 view).
let rs = ctx.gpr[instr.rs()] as i32; let rs = ctx.gpr[instr.rs()] as i32;
let sh = ctx.gpr[instr.rb()] as u32 & 0x3F; let sh = ctx.gpr[instr.rb()] as u32 & 0x3F;
if sh == 0 { let result: i32 = if sh == 0 {
ctx.gpr[instr.ra()] = rs as u32 as u64;
ctx.xer_ca = 0; ctx.xer_ca = 0;
rs
} else if sh < 32 { } else if sh < 32 {
let result = rs >> sh;
ctx.xer_ca = if rs < 0 && (rs as u32) << (32 - sh) != 0 { 1 } else { 0 }; ctx.xer_ca = if rs < 0 && (rs as u32) << (32 - sh) != 0 { 1 } else { 0 };
ctx.gpr[instr.ra()] = result as u32 as u64; rs >> sh
} else { } else {
ctx.gpr[instr.ra()] = if rs < 0 { 0xFFFF_FFFFu64 } else { 0 }; // sh >= 32: result is all sign bits of rs.
ctx.xer_ca = if rs < 0 { 1 } else { 0 }; ctx.xer_ca = if rs < 0 { 1 } else { 0 };
} rs >> 31
if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } };
ctx.gpr[instr.ra()] = result as i64 as u64;
if instr.rc_bit() { ctx.update_cr_signed(0, result as i64); }
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::srawix => { PpcOpcode::srawix => {
// PPCBUG-042+043 coupled: same shape as srawx for the sh-immediate form. // srawi: same as srawx for the sh-immediate form (sh in 0..31).
// Sign-extend the 32-bit result into the full 64-bit RA per PowerISA /
// canary InstrEmit_srawix.
let rs = ctx.gpr[instr.rs()] as i32; let rs = ctx.gpr[instr.rs()] as i32;
let sh = instr.sh(); let sh = instr.sh();
if sh == 0 { let result: i32 = if sh == 0 {
ctx.gpr[instr.ra()] = rs as u32 as u64;
ctx.xer_ca = 0; ctx.xer_ca = 0;
rs
} else { } else {
let result = rs >> sh;
ctx.xer_ca = if rs < 0 && (rs as u32) << (32 - sh) != 0 { 1 } else { 0 }; ctx.xer_ca = if rs < 0 && (rs as u32) << (32 - sh) != 0 { 1 } else { 0 };
ctx.gpr[instr.ra()] = result as u32 as u64; rs >> sh
} };
if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } ctx.gpr[instr.ra()] = result as i64 as u64;
if instr.rc_bit() { ctx.update_cr_signed(0, result as i64); }
ctx.pc += 4; ctx.pc += 4;
} }
PpcOpcode::sldx => { PpcOpcode::sldx => {
@@ -1605,7 +1651,12 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
match spr { match spr {
crate::context::spr::XER => ctx.set_xer(val as u32), crate::context::spr::XER => ctx.set_xer(val as u32),
crate::context::spr::LR => ctx.lr = val, crate::context::spr::LR => ctx.lr = val,
crate::context::spr::CTR => ctx.ctr = val as u32 as u64, // CTR is a 64-bit SPR — store the full GPR, matching canary
// InstrEmit_mtspr (`f.StoreCTR(rt)`, no truncation). The PPCBUG-054
// `val as u32 as u64` band-aid dropped the upper 32 bits, which a
// later `mfspr rX, CTR` would read back wrong. (bdnz/bcctr only
// ever consume CTR's low 32 bits, so branching is unaffected.)
crate::context::spr::CTR => ctx.ctr = val,
crate::context::spr::DEC => ctx.dec = val as u32, crate::context::spr::DEC => ctx.dec = val as u32,
crate::context::spr::TBL_WRITE => { crate::context::spr::TBL_WRITE => {
ctx.timebase = (ctx.timebase & 0xFFFF_FFFF_0000_0000) | (val & 0xFFFF_FFFF); ctx.timebase = (ctx.timebase & 0xFFFF_FFFF_0000_0000) | (val & 0xFFFF_FFFF);
@@ -5015,6 +5066,106 @@ mod tests {
assert_eq!(ctx.pc, 4); assert_eq!(ctx.pc, 4);
} }
#[test]
fn test_db16cyc_yields() {
// `or r31,r31,r31` encoding 0x7FFFFB78 is the Xenon db16cyc spin hint.
// It must (a) be value-neutral (r31 unchanged), (b) advance PC, and
// (c) report StepResult::Yield so the scheduler can hand off the slot.
let mut ctx = PpcContext::new();
let mut mem = TestMem::new();
write_instr(&mut mem, 0, 0x7FFF_FB78);
ctx.pc = 0;
ctx.gpr[31] = 0x1234_5678_9ABC_DEF0;
let r = step(&mut ctx, &mut mem);
assert_eq!(ctx.gpr[31], 0x1234_5678_9ABC_DEF0, "db16cyc is value-neutral");
assert_eq!(ctx.pc, 4, "PC advances past the hint");
assert_eq!(r, StepResult::Yield, "db16cyc surfaces as a cooperative yield");
}
#[test]
fn test_plain_or_self_is_not_yield() {
// A regular `or rN,rN,rN` that is NOT the db16cyc encoding (e.g. r3)
// is an ordinary no-op move and must keep executing (Continue), so we
// only yield on the exact spin-hint code canary special-cases.
let mut ctx = PpcContext::new();
let mut mem = TestMem::new();
// or r3, r3, r3 (RT=RA=RB=3, Rc=0): 31<<26 | 3<<21 | 3<<16 | 3<<11 | 444<<1
let raw = (31u32 << 26) | (3 << 21) | (3 << 16) | (3 << 11) | (444 << 1);
write_instr(&mut mem, 0, raw);
ctx.pc = 0;
ctx.gpr[3] = 0xCAFE;
let r = step(&mut ctx, &mut mem);
assert_eq!(ctx.gpr[3], 0xCAFE);
assert_eq!(ctx.pc, 4);
assert_eq!(r, StepResult::Continue, "non-db16cyc or-self stays Continue");
}
#[test]
fn test_smt_priority_hints_are_nops_not_yields() {
// iterate-2H spin/yield/sync hint-class audit. The PowerPC SMT
// thread-priority hints `or 1,1,1` / `or 2,2,2` / `or 3,3,3` / `or 6,6,6`
// (and the db8cyc family `or 26..30`) are reserved no-op encodings.
// Canary's `InstrEmit_orx` emits `f.Nop()` for EVERY `or rX,rX,rX`
// (RT==RB==RA && !Rc) form EXCEPT the exact db16cyc code 0x7FFFFB78,
// which alone gets `f.DelayExecution()`. So ours must NOT yield on any
// of these — over-yielding would diverge from canary and perturb the
// deterministic schedule. (Audit evidence: none of 1/2/3/6/26..30 even
// appear in Sylpheed's image; only `or 31,31,31` (db16cyc) is used as a
// spin hint. This test locks the no-over-yield invariant regardless.)
for r in [1u32, 2, 3, 6, 26, 27, 28, 29, 30] {
let mut ctx = PpcContext::new();
let mut mem = TestMem::new();
// or rN,rN,rN, Rc=0: 31<<26 | r<<21 | r<<16 | r<<11 | 444<<1
let raw = (31u32 << 26) | (r << 21) | (r << 16) | (r << 11) | (444 << 1);
write_instr(&mut mem, 0, raw);
ctx.pc = 0;
ctx.gpr[r as usize] = 0xDEAD_BEEF_F00D_BA11;
let res = step(&mut ctx, &mut mem);
assert_eq!(
ctx.gpr[r as usize], 0xDEAD_BEEF_F00D_BA11,
"or {r},{r},{r} is value-neutral"
);
assert_eq!(ctx.pc, 4, "or {r},{r},{r} advances PC");
assert_eq!(
res,
StepResult::Continue,
"priority hint or {r},{r},{r} is a plain no-op (canary Nop), NOT a yield"
);
}
}
#[test]
fn test_lwsync_ptesync_eieio_isync_decode_as_benign_noops() {
// Memory/sync barrier class. Canary keys `sync` on XO=598 only, so
// sync (L=0), lwsync (L=1), ptesync (L=2) all map to the same
// `InstrEmit_sync` -> `MemoryBarrier`; `eieio` -> `MemoryBarrier`;
// `isync` -> `Nop`. Under our single-host interpreter every one is a
// value-neutral no-op that advances PC and must DECODE (never trap as
// unknown). This guards the L-field disambiguation and the decode path.
let cases: &[(u32, &str)] = &[
(0x7C00_04AC, "sync"), // L=0
(0x7C20_04AC, "lwsync"), // L=1
(0x7C40_04AC, "ptesync"), // L=2
(0x7C00_06AC, "eieio"),
(0x4C00_012C, "isync"),
];
for &(raw, name) in cases {
let mut ctx = PpcContext::new();
let mut mem = TestMem::new();
let pre_xer = ctx.xer();
let pre_fpscr = ctx.fpscr;
let pre_gpr = ctx.gpr;
write_instr(&mut mem, 0x200, raw);
ctx.pc = 0x200;
let res = step(&mut ctx, &mut mem);
assert_eq!(res, StepResult::Continue, "{name} continues");
assert_eq!(ctx.pc, 0x204, "{name} advances PC (decoded, did not trap)");
assert_eq!(ctx.xer(), pre_xer, "{name} leaves XER");
assert_eq!(ctx.fpscr, pre_fpscr, "{name} leaves FPSCR");
assert_eq!(ctx.gpr, pre_gpr, "{name} leaves GPRs");
}
}
#[test] #[test]
fn test_fadd() { fn test_fadd() {
let mut ctx = PpcContext::new(); let mut ctx = PpcContext::new();
@@ -5332,15 +5483,17 @@ mod tests {
write_instr(&mut mem, 0, raw); write_instr(&mut mem, 0, raw);
ctx.pc = 0; ctx.pc = 0;
step(&mut ctx, &mut mem); step(&mut ctx, &mut mem);
assert_eq!(ctx.xer_ov, 1); assert_eq!(ctx.xer_ov, 1, "32-bit INT_MIN check (preserved) sets OV");
// -INT_MIN wraps to INT_MIN (low 32 bits) with upper 32 bits zero. // PPCBUG-020 fix: neg is full 64-bit `0 - RA` (canary `Sub(0, RA)`).
assert_eq!(ctx.gpr[5], 0x0000_0000_8000_0000); // RA = 0x0000_0000_8000_0000 → 0xFFFF_FFFF_8000_0000. (OV remains the
// preserved 32-bit INT_MIN flag.)
assert_eq!(ctx.gpr[5], 0xFFFF_FFFF_8000_0000);
} }
#[test] #[test]
fn neg_clean_input_no_upper_bits() { fn neg_clean_input_no_upper_bits() {
// PPCBUG-006 regression: neg r3=5 must produce 0x00000000_FFFFFFFB, // PPCBUG-020 fix: neg r3=5 = `0 - 5` = -5 = 0xFFFFFFFF_FFFFFFFB on a
// not 0xFFFFFFFF_FFFFFFFB (the 64-bit !ra-then-add-1 result). // 64-bit core (canary `Sub(0, RA)`), not the truncated 0x00000000_FFFFFFFB.
let mut ctx = PpcContext::new(); let mut ctx = PpcContext::new();
let mut mem = TestMem::new(); let mut mem = TestMem::new();
ctx.gpr[3] = 5; ctx.gpr[3] = 5;
@@ -5348,7 +5501,7 @@ mod tests {
write_instr(&mut mem, 0, raw); write_instr(&mut mem, 0, raw);
ctx.pc = 0; ctx.pc = 0;
step(&mut ctx, &mut mem); step(&mut ctx, &mut mem);
assert_eq!(ctx.gpr[5], 0x0000_0000_FFFF_FFFB); assert_eq!(ctx.gpr[5], 0xFFFF_FFFF_FFFF_FFFB);
} }
#[test] #[test]
@@ -5502,9 +5655,10 @@ mod tests {
} }
#[test] #[test]
fn mullwx_overflow_truncates_to_32() { fn mullwx_overflow_keeps_full_64bit_product() {
// PPCBUG-009: mullwo r5, r3, r4 with ra=0x10000, rb=0x10000 → product // PPCBUG-020 fix: mullwo r5, r3, r4 with ra=0x10000, rb=0x10000 → full
// 0x100000000 (overflow). Low 32 = 0; OE must fire. // 64-bit product 0x1_0000_0000 (canary stores the full i64 product, not
// the truncated low 32). OE still fires (the product overflows int32).
let mut ctx = PpcContext::new(); let mut ctx = PpcContext::new();
let mut mem = TestMem::new(); let mut mem = TestMem::new();
ctx.gpr[3] = 0x10000; ctx.gpr[3] = 0x10000;
@@ -5514,7 +5668,7 @@ mod tests {
write_instr(&mut mem, 0, raw); write_instr(&mut mem, 0, raw);
ctx.pc = 0; ctx.pc = 0;
step(&mut ctx, &mut mem); step(&mut ctx, &mut mem);
assert_eq!(ctx.gpr[5], 0, "low 32 bits = 0"); assert_eq!(ctx.gpr[5], 0x0000_0001_0000_0000, "full 64-bit product");
assert_eq!(ctx.xer_ov, 1, "overflow detected"); assert_eq!(ctx.xer_ov, 1, "overflow detected");
} }
@@ -5536,9 +5690,74 @@ mod tests {
} }
#[test] #[test]
fn srawx_negative_value_zero_extends_upper() { fn slwx_shift_count_masks_to_6_bits() {
// PPCBUG-041+043: srawx of negative i32 by 1 produces a negative i32; // slw masks the shift count to RB[58:63] (6 bits): a count of 0x40 has
// writeback must zero-extend to u64 (not sign-extend). // low-6-bits 0, so the value passes through unchanged — it must NOT be
// zeroed by a naive full-u32 `>= 32` test. Matches canary InstrEmit_slwx.
let mut ctx = PpcContext::new();
let mut mem = TestMem::new();
ctx.gpr[3] = 0x0000_1234u64;
ctx.gpr[4] = 0x40; // count & 0x3F == 0 → shift by 0
// slwx r5, r3, r4 (XO=24)
let raw = (31u32 << 26) | (3 << 21) | (5 << 16) | (4 << 11) | (24 << 1);
write_instr(&mut mem, 0, raw);
ctx.pc = 0;
step(&mut ctx, &mut mem);
assert_eq!(ctx.gpr[5], 0x0000_1234u64, "0x40 masks to 0 → passthrough");
}
#[test]
fn slwx_count_32_to_63_zeroes() {
// A masked count in [32,63] (bit 5 set) zeroes the result.
let mut ctx = PpcContext::new();
let mut mem = TestMem::new();
ctx.gpr[3] = 0xFFFF_FFFFu64;
ctx.gpr[4] = 0x60; // & 0x3F = 0x20 (32) → zero
let raw = (31u32 << 26) | (3 << 21) | (5 << 16) | (4 << 11) | (24 << 1);
write_instr(&mut mem, 0, raw);
ctx.pc = 0;
step(&mut ctx, &mut mem);
assert_eq!(ctx.gpr[5], 0);
}
#[test]
fn srwx_shift_count_masks_to_6_bits() {
// srw, same 6-bit mask. Count 0x48 → low-6-bits = 8 → logical >> 8.
let mut ctx = PpcContext::new();
let mut mem = TestMem::new();
ctx.gpr[3] = 0x0000_FF00u64;
ctx.gpr[4] = 0x48; // & 0x3F = 8
// srwx r5, r3, r4 (XO=536)
let raw = (31u32 << 26) | (3 << 21) | (5 << 16) | (4 << 11) | (536 << 1);
write_instr(&mut mem, 0, raw);
ctx.pc = 0;
step(&mut ctx, &mut mem);
assert_eq!(ctx.gpr[5], 0x0000_00FFu64, "0x48 masks to 8 → >>8");
}
#[test]
fn rlwinm_mb_greater_than_me_wraparound_mask() {
// rlwinm with MB > ME produces a wraparound mask covering bits
// [0..ME] [MB..31] (a "split" mask). PowerISA MASK(mb,me) wraps when
// mb > me. Here rotate by 0, MB=28, ME=3 → mask = 0xF000000F.
let mut ctx = PpcContext::new();
let mut mem = TestMem::new();
ctx.gpr[3] = 0xFFFF_FFFFu64;
// rlwinm r5, r3, SH=0, MB=28, ME=3 (opcode 21)
let raw = (21u32 << 26) | (3 << 21) | (5 << 16) | (0 << 11) | (28 << 6) | (3 << 1);
write_instr(&mut mem, 0, raw);
ctx.pc = 0;
step(&mut ctx, &mut mem);
assert_eq!(ctx.gpr[5], 0x0000_0000_F000_000Fu64,
"MB>ME wraparound mask = bits [0..3] | [28..31]");
}
#[test]
fn srawx_negative_value_sign_extends_upper() {
// sraw of negative i32 by 1 produces a negative i32 result that PowerISA
// SIGN-extends into the full 64-bit RA (canary InstrEmit_srawx uses
// `f.SignExtend`). 0x80000000 >> 1 = 0xC0000000 (i32) → 0xFFFFFFFF_C0000000.
// (Was 0x00000000_C0000000 under the PPCBUG-041 zero-extend band-aid.)
let mut ctx = PpcContext::new(); let mut ctx = PpcContext::new();
let mut mem = TestMem::new(); let mut mem = TestMem::new();
ctx.gpr[3] = 0x8000_0000u64; // i32::MIN ctx.gpr[3] = 0x8000_0000u64; // i32::MIN
@@ -5548,14 +5767,15 @@ mod tests {
write_instr(&mut mem, 0, raw); write_instr(&mut mem, 0, raw);
ctx.pc = 0; ctx.pc = 0;
step(&mut ctx, &mut mem); step(&mut ctx, &mut mem);
assert_eq!(ctx.gpr[5], 0x0000_0000_C000_0000u64); assert_eq!(ctx.gpr[5], 0xFFFF_FFFF_C000_0000u64);
assert!(ctx.cr[0].lt); assert!(ctx.cr[0].lt);
} }
#[test] #[test]
fn srawix_high_count_negative_input_yields_low32_all_ones() { fn srawix_high_count_negative_input_sign_extends_all_ones() {
// PPCBUG-042+043: srawi with count=31 on negative input → low 32 bits // srawi count=31 on negative input → result is -1 (0xFFFFFFFF as i32),
// all ones (0xFFFFFFFF), upper 32 zero (was u64::MAX before fix). // sign-extended to the full 64-bit RA: 0xFFFFFFFF_FFFFFFFF (canary
// InstrEmit_srawix). Was 0x00000000_FFFFFFFF under the zero-extend band-aid.
let mut ctx = PpcContext::new(); let mut ctx = PpcContext::new();
let mut mem = TestMem::new(); let mut mem = TestMem::new();
ctx.gpr[3] = 0x8000_0000u64; ctx.gpr[3] = 0x8000_0000u64;
@@ -5564,7 +5784,7 @@ mod tests {
write_instr(&mut mem, 0, raw); write_instr(&mut mem, 0, raw);
ctx.pc = 0; ctx.pc = 0;
step(&mut ctx, &mut mem); step(&mut ctx, &mut mem);
assert_eq!(ctx.gpr[5], 0x0000_0000_FFFF_FFFFu64); assert_eq!(ctx.gpr[5], 0xFFFF_FFFF_FFFF_FFFFu64);
} }
#[test] #[test]
@@ -5598,17 +5818,18 @@ mod tests {
write_instr(&mut mem, 0, raw); write_instr(&mut mem, 0, raw);
ctx.pc = 0; ctx.pc = 0;
step(&mut ctx, &mut mem); step(&mut ctx, &mut mem);
// Result low 32: 0x00000001 + 0xFFFFFFFF = 0x00000000 with carry. // PPCBUG-020 fix: full 64-bit `RA + EXTS(-1)` = 0xFFFFFFFF_00000001 +
assert_eq!(ctx.gpr[4], 0); // 0xFFFFFFFF_FFFFFFFF = 0xFFFFFFFF_00000000 (canary). CA still comes
// from the 32-bit compare (low 32: 0x00000001 + 0xFFFFFFFF = 0, carry).
assert_eq!(ctx.gpr[4], 0xFFFFFFFF_00000000u64);
assert_eq!(ctx.xer_ca, 1, "32-bit compare must see CA=1"); assert_eq!(ctx.xer_ca, 1, "32-bit compare must see CA=1");
} }
#[test] #[test]
fn mulli_overflow_wraps_to_32() { fn mulli_full_64bit_product() {
// PPCBUG-004: mulli must truncate to 32 bits even when the upper 32 bits // PPCBUG-020 fix: mulli uses the full 64-bit RA (canary
// of RA are polluted (e.g. by upstream bugs). Pre-fix: ra = u64::MAX as // `Mul(LoadGPR(RA), Int64(EXTS(imm)))`). RA = u64::MAX = -1, × 2 = -2
// i64 = -1, * 2 = -2, written to GPR as `0xFFFFFFFF_FFFFFFFE`. Post-fix: // = 0xFFFFFFFF_FFFFFFFE (full 64-bit), not the truncated 0xFFFFFFFE.
// truncated to `0xFFFFFFFE`. Discriminating regression test.
let mut ctx = PpcContext::new(); let mut ctx = PpcContext::new();
let mut mem = TestMem::new(); let mut mem = TestMem::new();
ctx.gpr[3] = u64::MAX; ctx.gpr[3] = u64::MAX;
@@ -5617,13 +5838,14 @@ mod tests {
write_instr(&mut mem, 0, raw); write_instr(&mut mem, 0, raw);
ctx.pc = 0; ctx.pc = 0;
step(&mut ctx, &mut mem); step(&mut ctx, &mut mem);
assert_eq!(ctx.gpr[4], 0xFFFF_FFFEu64, "low 32 bits = -2 in i32; upper 32 zero"); assert_eq!(ctx.gpr[4], 0xFFFF_FFFF_FFFF_FFFEu64, "full 64-bit -2");
} }
#[test] #[test]
fn subficx_neg_simm_zero_extends() { fn subficx_full_64bit_result() {
// PPCBUG-005: subfic r4, r3, -1 with r3=5: imm-ra = 0xFFFFFFFF - 5 = 0xFFFFFFFA. // PPCBUG-020 fix: subfic r4, r3, -1 with r3=5 = `EXTS(-1) - RA` =
// Buggy form: imm sign-extended to u64 0xFFFFFFFFFFFFFFFF - 5 = poisoned. // 0xFFFFFFFF_FFFFFFFF - 5 = 0xFFFFFFFF_FFFFFFFA (canary `Sub(Int64(
// EXTS(imm)), RA)`). CA stays a 32-bit compare (0xFFFFFFFF >= 5 → 1).
let mut ctx = PpcContext::new(); let mut ctx = PpcContext::new();
let mut mem = TestMem::new(); let mut mem = TestMem::new();
ctx.gpr[3] = 5; ctx.gpr[3] = 5;
@@ -5632,7 +5854,7 @@ mod tests {
write_instr(&mut mem, 0, raw); write_instr(&mut mem, 0, raw);
ctx.pc = 0; ctx.pc = 0;
step(&mut ctx, &mut mem); step(&mut ctx, &mut mem);
assert_eq!(ctx.gpr[4], 0x0000_0000_FFFF_FFFAu64); assert_eq!(ctx.gpr[4], 0xFFFF_FFFF_FFFF_FFFAu64);
assert_eq!(ctx.xer_ca, 1, "0xFFFFFFFF >= 5 → CA=1"); assert_eq!(ctx.xer_ca, 1, "0xFFFFFFFF >= 5 → CA=1");
} }
@@ -6538,12 +6760,13 @@ mod tests {
assert_eq!(ctx.pc, 4); assert_eq!(ctx.pc, 4);
} }
// PPCBUG-054: mtspr CTR must truncate the source GPR to 32 bits, matching // CTR is a 64-bit SPR. mtspr CTR stores the full GPR (canary
// canary's `f.Truncate(ctr, INT32_TYPE)`. Prevents upstream 64-bit GPR // InstrEmit_mtspr: `f.StoreCTR(rt)`, no truncation). The bdnz/bclr zero-TEST
// pollution from poisoning the 32-bit CTR counter independently of the // still truncates to 32 bits (separate, canary-faithful — see the bcx tests
// bcx zero-test fix. // above); the earlier PPCBUG-054 store-side truncation was a band-aid that a
// later `mfspr rX, CTR` would read back wrong.
#[test] #[test]
fn mtspr_ctr_truncates_to_32_bits() { fn mtspr_ctr_keeps_full_64_bits() {
let mut ctx = PpcContext::new(); let mut ctx = PpcContext::new();
let mut mem = TestMem::new(); let mut mem = TestMem::new();
ctx.gpr[3] = 0xFFFF_FFFF_8000_0001; ctx.gpr[3] = 0xFFFF_FFFF_8000_0001;
@@ -6553,7 +6776,26 @@ mod tests {
write_instr(&mut mem, 0, raw); write_instr(&mut mem, 0, raw);
ctx.pc = 0; ctx.pc = 0;
step(&mut ctx, &mut mem); step(&mut ctx, &mut mem);
assert_eq!(ctx.ctr, 0x8000_0001); assert_eq!(ctx.ctr, 0xFFFF_FFFF_8000_0001);
}
// mfspr rX, CTR must read back the full 64-bit CTR (round-trips the value
// mtspr stored). This is the observable consequence of the mtspr fix.
#[test]
fn mfspr_ctr_reads_full_64_bits() {
let mut ctx = PpcContext::new();
let mut mem = TestMem::new();
ctx.gpr[3] = 0xFFFF_FFFF_8000_0001;
// mtspr CTR, r3 then mfspr r5, CTR
let spr_swapped = ((9u32 & 0x1F) << 5) | ((9u32 >> 5) & 0x1F);
let mt = (31u32 << 26) | (3 << 21) | (spr_swapped << 11) | (467 << 1);
let mf = (31u32 << 26) | (5 << 21) | (spr_swapped << 11) | (339 << 1);
write_instr(&mut mem, 0, mt);
write_instr(&mut mem, 4, mf);
ctx.pc = 0;
step(&mut ctx, &mut mem);
step(&mut ctx, &mut mem);
assert_eq!(ctx.gpr[5], 0xFFFF_FFFF_8000_0001);
} }
// ─────────────────────────────────────────────────────────────────────── // ───────────────────────────────────────────────────────────────────────
@@ -7640,8 +7882,8 @@ mod tests {
ctx.xer_ca = 0; ctx.xer_ca = 0;
step(&mut ctx, &mem); step(&mut ctx, &mem);
assert_eq!(ctx.xer_ca, 0, "ra=0, ca=0 should produce CA=0"); assert_eq!(ctx.xer_ca, 0, "ra=0, ca=0 should produce CA=0");
// PPCBUG-018: 32-bit ABI. !0u32 + 0 = u32::MAX, with upper 32 bits zero. // PPCBUG-020 fix: full 64-bit `!RA + CA` = !0u64 + 0 = u64::MAX.
assert_eq!(ctx.gpr[3], 0xFFFF_FFFFu64, "result = !0u32 + 0 = u32::MAX"); assert_eq!(ctx.gpr[3], 0xFFFF_FFFF_FFFF_FFFFu64, "result = !0u64 + 0");
} }
// Case 3: ra=1, ca=0 → CA=0 (old buggy code reported CA=1) // Case 3: ra=1, ca=0 → CA=0 (old buggy code reported CA=1)
{ {
@@ -7653,8 +7895,8 @@ mod tests {
ctx.xer_ca = 0; ctx.xer_ca = 0;
step(&mut ctx, &mem); step(&mut ctx, &mem);
assert_eq!(ctx.xer_ca, 0, "ra=1, ca=0 should produce CA=0"); assert_eq!(ctx.xer_ca, 0, "ra=1, ca=0 should produce CA=0");
// PPCBUG-018: 32-bit ABI. !1u32 + 0 = u32::MAX - 1, with upper 32 bits zero. // PPCBUG-020 fix: full 64-bit `!1u64 + 0` = u64::MAX - 1.
assert_eq!(ctx.gpr[3], 0xFFFF_FFFEu64, "result = !1u32 + 0 = u32::MAX - 1"); assert_eq!(ctx.gpr[3], 0xFFFF_FFFF_FFFF_FFFEu64, "result = !1u64 + 0");
} }
// Case 4: ra=u32::MAX, ca=1 → CA=0; result = !u32::MAX + 1 = 1. // Case 4: ra=u32::MAX, ca=1 → CA=0; result = !u32::MAX + 1 = 1.
{ {
@@ -7666,7 +7908,9 @@ mod tests {
ctx.xer_ca = 1; ctx.xer_ca = 1;
step(&mut ctx, &mem); step(&mut ctx, &mem);
assert_eq!(ctx.xer_ca, 0, "ra=u32::MAX, ca=1 should produce CA=0"); assert_eq!(ctx.xer_ca, 0, "ra=u32::MAX, ca=1 should produce CA=0");
assert_eq!(ctx.gpr[3], 1, "result = !u32::MAX + 1 = 1"); // PPCBUG-020 fix: full 64-bit `!RA + CA`. RA = 0x0000_0000_FFFF_FFFF
// → !RA = 0xFFFF_FFFF_0000_0000, + 1 = 0xFFFF_FFFF_0000_0001.
assert_eq!(ctx.gpr[3], 0xFFFF_FFFF_0000_0001u64, "result = !RA + 1");
} }
} }
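The PPCBUG-020 hunks above share one pattern: the destination GPR now receives the full 64-bit result, while CA and OV are still derived from the low 32-bit words. A minimal standalone sketch of that split follows. The function names (`subfc_sketch`, `subfze_sketch`, `neg_sketch`, `mullw_sketch`) are hypothetical helpers written for illustration, not functions from the interpreter; they only mirror the arithmetic visible in the diff.

```rust
/// subfc-style subtract: RT = RB - RA over all 64 bits.
/// CA is the 32-bit unsigned compare `rb32 >= ra32`, as in the diff.
fn subfc_sketch(ra: u64, rb: u64) -> (u64, u32) {
    let ca = ((rb as u32) >= (ra as u32)) as u32;
    (rb.wrapping_sub(ra), ca)
}

/// subfze-style: RT = !RA + CA over all 64 bits.
/// CA-out is the 32-bit carry: only when ra32 == 0 and CA-in == 1.
fn subfze_sketch(ra: u64, ca_in: u32) -> (u64, u32) {
    let ca_out = ((ra as u32) == 0 && ca_in != 0) as u32;
    ((!ra).wrapping_add(ca_in as u64), ca_out)
}

/// neg-style: RT = 0 - RA over all 64 bits.
/// The overflow flag keeps the 32-bit INT_MIN check.
fn neg_sketch(ra: u64) -> (u64, bool) {
    (0u64.wrapping_sub(ra), (ra as u32) == 0x8000_0000)
}

/// mullw-style: sign-extend the low words, keep the full i64 product.
fn mullw_sketch(ra: u64, rb: u64) -> u64 {
    let a = ra as i32 as i64;
    let b = rb as i32 as i64;
    a.wrapping_mul(b) as u64
}
```

The inputs used by the updated unit tests in this diff (neg of 5, neg of 0x8000_0000, 0x10000 × 0x10000, and the subfze cases) can be fed to these helpers to reproduce the new expected register values.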

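The word-shift hunks above make two changes: slw/srw mask the count to the low 6 bits of RB before the `< 32` test, and sraw sign-extends its 32-bit result into the 64-bit register. The sketch below restates both as free functions so the edge cases can be checked in isolation. The names `slw_sketch`, `srw_sketch` and `sraw_sketch` are hypothetical; the logic is copied from the diff, not from canary.

```rust
/// slw: count is RB & 0x3F; a masked count of 32..=63 zeroes the result.
fn slw_sketch(rs: u64, rb: u64) -> u64 {
    let sh = (rb as u32) & 0x3F;
    if sh < 32 { ((rs as u32) << sh) as u64 } else { 0 }
}

/// srw: same 6-bit mask, logical right shift of the low word.
fn srw_sketch(rs: u64, rb: u64) -> u64 {
    let sh = (rb as u32) & 0x3F;
    if sh < 32 { ((rs as u32) >> sh) as u64 } else { 0 }
}

/// sraw: arithmetic right shift of the low word.
/// Returns (RA, CA). The i32 result is sign-extended to 64 bits.
fn sraw_sketch(rs: u64, rb: u64) -> (u64, u32) {
    let v = rs as i32;
    let sh = (rb as u32) & 0x3F;
    let (result, ca) = if sh == 0 {
        (v, 0)
    } else if sh < 32 {
        // CA is set when the value is negative and any 1-bit was shifted out.
        let lost = ((v as u32) << (32 - sh)) != 0;
        (v >> sh, (v < 0 && lost) as u32)
    } else {
        // Masked count of 32..=63: the result is all sign bits.
        (v >> 31, (v < 0) as u32)
    };
    (result as i64 as u64, ca)
}
```

A count of 0x40 is the discriminating input: a full-width `< 32` test would zero the result, while the masked count is 0 and passes the value through.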

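The scheduler hunks that follow add `STARVE_LIMIT` and a per-thread `steps_starved` counter. The body of `pick_runnable` is not part of this diff, so the following is only a sketch of how such a floor could behave, assuming strict-priority selection with one boosted pick for any Ready thread that has been passed over `STARVE_LIMIT` times. `SketchThread` and `pick_runnable_sketch` are hypothetical names, and the tie-breaking rule (earliest queue entry wins) is an assumption.

```rust
const STARVE_LIMIT: u32 = 4096;

#[derive(Debug)]
struct SketchThread {
    priority: u8,
    ready: bool,
    steps_starved: u32,
}

/// One slot visit: returns the index of the thread chosen to run, if any.
fn pick_runnable_sketch(queue: &mut [SketchThread]) -> Option<usize> {
    // A Ready thread that reached the limit gets one boosted pick.
    let starved = queue
        .iter()
        .position(|t| t.ready && t.steps_starved >= STARVE_LIMIT);

    // Otherwise fall back to strict priority among Ready threads.
    let picked = starved.or_else(|| {
        let mut best: Option<usize> = None;
        for (i, t) in queue.iter().enumerate() {
            if t.ready && best.map_or(true, |b| t.priority > queue[b].priority) {
                best = Some(i);
            }
        }
        best
    })?;

    // The picked thread's counter resets; every other Ready thread ages.
    for (i, t) in queue.iter_mut().enumerate() {
        if i == picked {
            t.steps_starved = 0;
        } else if t.ready {
            t.steps_starved += 1;
        }
    }
    Some(picked)
}
```

Under these assumptions, with a priority-15 spinner and a priority-8 peer both Ready, the peer is passed over on visits 1 through 4096 and picked on visit 4097, after which its counter resets. The result is a pure function of pick history, which is the determinism property the doc comment on `steps_starved` calls out.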
@@ -35,6 +35,20 @@ pub const INITIAL_GUEST_TID: u32 = 1;
/// Axis 1 carries the field on every thread but doesn't decrement yet. /// Axis 1 carries the field on every thread but doesn't decrement yet.
pub const QUANTUM_DEFAULT: u32 = 50_000; pub const QUANTUM_DEFAULT: u32 = 50_000;
/// Anti-starvation floor. On a cooperative single-host slot, strict-priority
/// `pick_runnable` lets a high-priority CPU-bound spinner (e.g. a pri-15
/// time-critical poll loop pinned by affinity) win every round forever,
/// permanently starving a co-located lower-priority peer that the spinner is
/// actually *waiting on* — a deadlock that never occurs on real hardware,
/// where SMT contexts run those threads concurrently.
///
/// Once a Ready thread has been passed over this many consecutive slot
/// visits, `pick_runnable` grants it ONE pick (then its counter resets). The
/// limit is large enough that the genuinely-higher-priority thread still wins
/// the overwhelming majority of visits (here: 4096 of every 4097); the boost only
/// guarantees *bounded* forward progress, it does not invert priority.
pub const STARVE_LIMIT: u32 = 4096;
/// Above this depth, `spawn` prunes `Exited` entries from a slot's runqueue /// Above this depth, `spawn` prunes `Exited` entries from a slot's runqueue
/// before pushing the new thread. Keeps peer `ThreadRef`s stable on the /// before pushing the new thread. Keeps peer `ThreadRef`s stable on the
/// common (low-depth) path — a game that spawns a handful of long-lived /// common (low-depth) path — a game that spawns a handful of long-lived
@@ -117,6 +131,20 @@ pub struct GuestThread {
/// Axis 3 instruction budget. Decremented per retired step on this /// Axis 3 instruction budget. Decremented per retired step on this
/// thread; on zero, slot rotates within same-priority tier. /// thread; on zero, slot rotates within same-priority tier.
pub quantum_remaining: u32, pub quantum_remaining: u32,
/// Anti-starvation counter. Incremented each slot visit this thread is
/// Ready but NOT picked; reset to 0 when picked. When it reaches
/// `STARVE_LIMIT`, `pick_runnable` grants this thread one boosted pick so
/// a monopolizing higher-priority peer on the same slot cannot starve it
/// indefinitely. Deterministic: a pure function of pick history.
pub steps_starved: u32,
/// SpawnParams.entry — the BL target the trampoline jumped to.
/// Persisted so kernel exports can filter syscalls by spawning
/// chain (e.g. the silph UI auto-signal POC). 0 for the initial
/// thread (uses `install_initial_thread`, not `spawn`).
pub start_entry: u32,
/// SpawnParams.start_context — initial r3 at spawn. Persisted for
/// the same filtering reason as `start_entry`.
pub start_context: u32,
} }
impl GuestThread { impl GuestThread {
@@ -136,6 +164,9 @@ impl GuestThread {
affinity_mask: 0xFF, affinity_mask: 0xFF,
ideal_processor: None, ideal_processor: None,
quantum_remaining: QUANTUM_DEFAULT, quantum_remaining: QUANTUM_DEFAULT,
steps_starved: 0,
start_entry: 0,
start_context: 0,
} }
} }
} }
@@ -208,15 +239,35 @@ impl Default for HwSlot {
impl HwSlot { impl HwSlot {
/// Index of the highest-priority Ready/ServicingIrq thread in this /// Index of the highest-priority Ready/ServicingIrq thread in this
/// slot's runqueue. Tiebreak: prefer lower index (deterministic). /// slot's runqueue. Tiebreak: prefer lower index (deterministic).
///
/// Selection is by *effective* priority: a Ready thread that has been
/// passed over for `STARVE_LIMIT` consecutive visits is boosted so it
/// wins exactly one pick, then [`Scheduler::begin_slot_visit`] resets its
/// counter. This restores the guest-visible invariant that every Ready
/// thread makes forward progress, without inverting the intended priority
/// order (a starved thread only beats its monopolizer once per
/// `STARVE_LIMIT + 1` visits). The boost is a pure function of the per-thread
/// counters/priority/index, so picks stay deterministic.
pub fn pick_runnable(&self) -> Option<usize> { pub fn pick_runnable(&self) -> Option<usize> {
self.runqueue self.runqueue
.iter() .iter()
.enumerate() .enumerate()
.filter(|(_, t)| matches!(t.state, HwState::Ready | HwState::ServicingIrq(_))) .filter(|(_, t)| matches!(t.state, HwState::Ready | HwState::ServicingIrq(_)))
.max_by_key(|(i, t)| (t.priority, -(*i as i64))) .max_by_key(|(i, t)| (Self::effective_priority(t), -(*i as i64)))
.map(|(i, _)| i) .map(|(i, _)| i)
} }
/// Priority used for selection. A thread starved for `STARVE_LIMIT`
/// visits is lifted to `i32::MAX` so it wins the next pick regardless of
/// peer priority; otherwise its nominal priority is used unchanged.
fn effective_priority(t: &GuestThread) -> i32 {
if t.steps_starved >= STARVE_LIMIT {
i32::MAX
} else {
t.priority
}
}
/// How many non-Exited threads currently live on this slot (used by /// How many non-Exited threads currently live on this slot (used by
/// placement policies). /// placement policies).
pub fn live_depth(&self) -> usize { pub fn live_depth(&self) -> usize {
@@ -341,6 +392,28 @@ pub struct Scheduler {
/// Sorted by deadline ascending. Scheduler wakes the first entry via /// Sorted by deadline ascending. Scheduler wakes the first entry via
/// `advance_to_next_wake` when a round finds nothing runnable. /// `advance_to_next_wake` when a round finds nothing runnable.
timed_waits: Vec<(u64, ThreadRef)>, timed_waits: Vec<(u64, ThreadRef)>,
/// Coherent monotonic "now" clock — the single authoritative basis the
/// kernel deadline-arithmetic (`KernelState::now_basis_at`) reads in
/// BOTH execution modes. Per-thread `ctx(hw_id).timebase` is NOT a
/// coherent "now":
/// * In `--parallel`, workers extract their `PpcContext` (leaving a
/// zeroed timebase in the slot) and step unlocked.
/// * In **lockstep**, a parked/poll thread has `running_idx == None`,
/// so `ctx()` returns `idle_ctx` (timebase 0); a `parse_timeout`
/// reading that basis registers `deadline = 0 + relative`, a value
/// permanently in the past, and `coord_idle_advance` re-arms that
/// same constant deadline forever (timebase-desync livelock — the
/// render-gate root: the submitter's 16ms re-wait never fires).
/// So a coordinator/parked thread reading per-thread timebase can see a
/// stale/zero basis decoupled from the deadline it just advanced to.
/// This field is that coherent basis instead. It is DETERMINISTIC: a
/// pure function of retired guest instructions (never wall-clock).
/// Advanced by `advance_global_clock` (per-block retired count on each
/// parallel writeback), `advance_global_clock_to` (floored up to the
/// deterministic per-round `stats.instruction_count` in lockstep), and
/// floored up by `advance_all_timebases_to`. Two cold lockstep runs
/// read identical values, so the lockstep trace stays bit-reproducible.
global_clock: u64,
/// Global count of TLS slots allocated — `spawn` pre-sizes new threads' /// Global count of TLS slots allocated — `spawn` pre-sizes new threads'
/// `tls_values` to this. /// `tls_values` to this.
tls_slot_count: usize, tls_slot_count: usize,
@@ -379,6 +452,7 @@ impl Scheduler {
order, order,
rng_state, rng_state,
timed_waits: Vec::new(), timed_waits: Vec::new(),
global_clock: 0,
tls_slot_count: 0, tls_slot_count: 0,
non_empty_runnable: 0, non_empty_runnable: 0,
rotation_cursor: 0, rotation_cursor: 0,
@@ -500,6 +574,17 @@ impl Scheduler {
self.current.expect("no current thread") self.current.expect("no current thread")
} }
/// `(start_entry, start_context)` of the currently-running thread.
/// Returns None if there is no current thread or its ref is stale.
/// Used by `KernelState::maybe_register_silph_autosignal` to filter
/// `NtCreateEvent` calls by spawning chain.
pub fn current_thread_entry_and_ctx(&self) -> Option<(u32, u32)> {
let r = self.current?;
let slot = self.slots.get(r.hw_id as usize)?;
let t = slot.runqueue.get(r.idx as usize)?;
Some((t.start_entry, t.start_context))
}
// ----- Guest-thread lookup ----- // ----- Guest-thread lookup -----
/// Find the `ThreadRef` of the (non-Exited) thread with `tid`. /// Find the `ThreadRef` of the (non-Exited) thread with `tid`.
@@ -614,6 +699,8 @@ impl Scheduler {
t.priority = params.priority; t.priority = params.priority;
t.affinity_mask = mask; t.affinity_mask = mask;
t.ideal_processor = params.ideal_processor; t.ideal_processor = params.ideal_processor;
t.start_entry = params.entry;
t.start_context = params.start_context;
// M3.7 — populate the inter-thread reservation handle + slot id // M3.7 — populate the inter-thread reservation handle + slot id
// so the interpreter can route lwarx/stwcx through the table. // so the interpreter can route lwarx/stwcx through the table.
t.ctx.hw_id = slot_id; t.ctx.hw_id = slot_id;
@@ -744,10 +831,22 @@ impl Scheduler {
/// stashes `self.current` so exports can reach it. /// stashes `self.current` so exports can reach it.
pub fn begin_slot_visit(&mut self, hw_id: u8) { pub fn begin_slot_visit(&mut self, hw_id: u8) {
let slot = &mut self.slots[hw_id as usize]; let slot = &mut self.slots[hw_id as usize];
slot.running_idx = slot.pick_runnable(); let picked = slot.pick_runnable();
self.current = slot slot.running_idx = picked;
.running_idx // Anti-starvation bookkeeping: reset the picked thread's counter,
.map(|idx| ThreadRef::new(hw_id, idx as u16)); // increment every other Ready peer that was passed over this visit.
// Once a passed-over thread reaches STARVE_LIMIT it wins the next
// pick_runnable (effective_priority -> i32::MAX), then lands here as
// `picked` and resets — bounding any thread's starvation. Pure
// function of pick history, so it stays deterministic.
for (i, t) in slot.runqueue.iter_mut().enumerate() {
if Some(i) == picked {
t.steps_starved = 0;
} else if matches!(t.state, HwState::Ready | HwState::ServicingIrq(_)) {
t.steps_starved = t.steps_starved.saturating_add(1);
}
}
self.current = picked.map(|idx| ThreadRef::new(hw_id, idx as u16));
} }
/// Clear `current` at the end of each per-slot visit. /// Clear `current` at the end of each per-slot visit.
@@ -803,6 +902,41 @@ impl Scheduler {
false false
} }
/// Cooperative yield: the currently-running thread executed a `db16cyc`
/// spin-wait hint (see `StepResult::Yield`). It is busy-spinning on a
/// guest spinlock/barrier whose release depends on a *co-located* peer
/// that cannot make progress while this thread keeps winning the slot.
///
/// Promote every Ready peer on this slot to `STARVE_LIMIT` so the next
/// `begin_slot_visit` picks one of them (their `effective_priority` →
/// `i32::MAX`), and reset the yielder's own counter. Each promoted peer
/// runs once and resets to 0 in `begin_slot_visit`; once all peers have
/// had their turn the spinner is picked again, spins, and re-yields —
/// producing a fair round-robin between the spinner and the threads it is
/// waiting on. This mirrors real hardware, where all six HW threads run
/// concurrently and the spin resolves as soon as the peer releases.
///
/// Pure function of the slot's current state (no RNG, no wall-clock), so
/// it preserves lockstep determinism. No-op if there is no Ready peer
/// (the spinner is alone on its slot — nothing to hand off to).
///
/// Returns `true` if at least one peer was promoted.
pub fn yield_current(&mut self) -> bool {
let Some(r) = self.current else { return false; };
let slot = &mut self.slots[r.hw_id as usize];
let me = r.idx as usize;
let mut promoted = false;
for (i, t) in slot.runqueue.iter_mut().enumerate() {
if i == me {
t.steps_starved = 0;
} else if matches!(t.state, HwState::Ready | HwState::ServicingIrq(_)) {
t.steps_starved = STARVE_LIMIT;
promoted = true;
}
}
promoted
}
// ----- Park / wake / exit ----- // ----- Park / wake / exit -----
pub fn park_current(&mut self, reason: BlockReason) { pub fn park_current(&mut self, reason: BlockReason) {
@@ -1091,6 +1225,42 @@ impl Scheduler {
} }
} }
} }
// Keep the coherent clock at least as far forward as any deadline we
// fast-forward to (idle/timer/wake advances). `deadline` is itself
// deterministic and the floor-up is monotone, so this cannot perturb
// the deterministic lockstep trace.
self.global_clock = self.global_clock.max(deadline);
}
/// Coherent "now" (see [`Self::global_clock`] field doc). Read by the
/// kernel deadline-arithmetic (`KernelState::now_basis_at`) in both
/// execution modes; per-thread `ctx(hw_id).timebase` is not a coherent
/// basis in either.
#[inline]
pub fn global_clock(&self) -> u64 {
self.global_clock
}
/// Advance the parallel-mode coherent clock by `n` retired instructions.
/// Called from the parallel worker writeback with the block's executed
/// count so "now" tracks aggregate guest progress.
#[inline]
pub fn advance_global_clock(&mut self, n: u64) {
self.global_clock = self.global_clock.saturating_add(n);
}
/// Floor the coherent clock up to `now` (monotonic; never goes
/// backwards). Used by the **lockstep** outer loop once per round to
/// track the deterministic retired-instruction count
/// (`stats.instruction_count`) as the single coherent "now". A plain
/// floor-up rather than `saturating_add` because the lockstep caller
/// passes an absolute monotonic counter (not a per-block delta), and
/// because `advance_all_timebases_to` may already have pushed
/// `global_clock` past the instruction count when fast-forwarding to a
/// future deadline — clamping with `max` keeps both sources monotone.
#[inline]
pub fn advance_global_clock_to(&mut self, now: u64) {
self.global_clock = self.global_clock.max(now);
} }
/// Fast-forward the timebase to the earliest pending timed wait and /// Fast-forward the timebase to the earliest pending timed wait and
@@ -1161,6 +1331,28 @@ impl Scheduler {
}) })
} }
/// True if any thread is currently `Blocked` on a `WaitAny`/`WaitAll`
/// whose handle set contains `handle`. Used by the handle-slab recycler
/// (AUDIT-059 R34) to avoid an ABA hazard: if a closed handle's slot is
/// returned to the free list while a thread is still parked on it, a
/// later `alloc_handle` could hand the same slot to a NEW object, and a
/// signal on that new object would wake the stale waiter that was
/// waiting on the OLD (closed) object. Canary sidesteps this by keeping
/// the object alive via an object_ref while waiters hold references; we
/// instead simply decline to recycle a still-waited slot (leaking it,
/// matching the pre-R34 bump-only behaviour for that rare case).
pub fn any_thread_waiting_on(&self, handle: u32) -> bool {
self.slots.iter().any(|slot| {
slot.runqueue.iter().any(|t| match &t.state {
HwState::Blocked(BlockReason::WaitAny { handles, .. })
| HwState::Blocked(BlockReason::WaitAll { handles, .. }) => {
handles.contains(&handle)
}
_ => false,
})
})
}
/// Snapshot thread states for diagnostic logging. One entry per live /// Snapshot thread states for diagnostic logging. One entry per live
/// guest thread (Exited are included so post-mortem can see exit codes). /// guest thread (Exited are included so post-mortem can see exit codes).
pub fn diagnostic_snapshot(&self) -> Vec<(ThreadRef, Option<u32>, HwState)> { pub fn diagnostic_snapshot(&self) -> Vec<(ThreadRef, Option<u32>, HwState)> {
@@ -1858,6 +2050,118 @@ mod tests {
assert_eq!(t.quantum_remaining, QUANTUM_DEFAULT, "quantum reloaded"); assert_eq!(t.quantum_remaining, QUANTUM_DEFAULT, "quantum reloaded");
} }
#[test]
fn test_anti_starvation_bounded_progress() {
// Reproduces the Sylpheed render-gate deadlock: a high-priority
// CPU-bound spinner (the pri-15 poll loop) co-located on one slot
// with a pri-0 worker (the submitter) the spinner is waiting on.
// Strict priority would starve the worker forever; the anti-starve
// floor must hand it a pick within STARVE_LIMIT+1 visits, then the
// spinner reclaims the slot (priority is NOT inverted).
let mut s = mk_empty_scheduler();
let mut spinner = SpawnParams::default();
spinner.guest_tid = 1;
spinner.thread_handle = 0x1000;
spinner.affinity_mask = 0b0001;
spinner.pcr_base = 0x4000_0000;
spinner.priority = 15;
s.spawn(spinner, &mut NullPcr).unwrap();
let mut worker = SpawnParams::default();
worker.guest_tid = 2;
worker.thread_handle = 0x1004;
worker.affinity_mask = 0b0001;
worker.pcr_base = 0x4000_1000;
worker.priority = 0;
s.spawn(worker, &mut NullPcr).unwrap();
let mut worker_picks = 0u32;
let mut spinner_picks = 0u32;
// Both stay Ready (the spinner never blocks — that's the bug shape).
for _ in 0..(STARVE_LIMIT + 2) {
s.begin_slot_visit(0);
match s.thread(s.current.unwrap()).tid {
1 => spinner_picks += 1,
2 => worker_picks += 1,
other => panic!("unexpected tid {other}"),
}
s.end_slot_visit();
}
assert_eq!(
worker_picks, 1,
"starved worker gets exactly one bounded pick within STARVE_LIMIT+2 visits"
);
assert_eq!(
spinner_picks,
STARVE_LIMIT + 1,
"high-priority spinner still dominates — priority is not inverted"
);
}
#[test]
fn test_db16cyc_yield_hands_slot_to_peer() {
// Reproduces the Sylpheed title-screen gate: a guest spinlock/barrier
// participant (tid=1) executes the `db16cyc` spin hint each round and
// would otherwise win `pick_runnable` forever (equal priority, lower
// index), starving the co-located peer (tid=2) it is waiting on.
// `yield_current` must promote the Ready peer so the very next
// `begin_slot_visit` picks it — without waiting STARVE_LIMIT rounds.
let mut s = mk_empty_scheduler();
for tid in [1u32, 2] {
let mut p = SpawnParams::default();
p.guest_tid = tid;
p.thread_handle = 0x1000 + tid * 4;
p.affinity_mask = 0b0001;
p.pcr_base = 0x4000_0000 + tid * 0x1000;
p.priority = 0; // equal priority — index would otherwise decide
s.spawn(p, &mut NullPcr).unwrap();
}
// Round 1: the spinner (lower index) wins.
s.begin_slot_visit(0);
let spinner = s.thread(s.current.unwrap()).tid;
assert_eq!(spinner, 1, "lower-index equal-priority thread wins first pick");
// It spins (db16cyc) → cooperative yield.
assert!(s.yield_current(), "yield promotes the Ready peer");
s.end_slot_visit();
// Round 2: the promoted peer must now be picked, not the spinner.
s.begin_slot_visit(0);
let after_yield = s.thread(s.current.unwrap()).tid;
assert_eq!(
after_yield, 2,
"after db16cyc yield the co-located peer runs (no STARVE_LIMIT wait)"
);
s.end_slot_visit();
// Round 3: peer's boost was consumed (reset to 0 when picked), so the
// spinner reclaims the slot — fair alternation, no priority inversion.
s.begin_slot_visit(0);
assert_eq!(
s.thread(s.current.unwrap()).tid,
1,
"spinner reclaims the slot after the peer has had its turn"
);
}
#[test]
fn test_yield_current_noop_when_alone() {
// A spinner with no Ready peer on its slot has nothing to hand off to;
// yield_current must be a no-op (returns false) and not panic.
let mut s = mk_empty_scheduler();
let mut p = SpawnParams::default();
p.guest_tid = 1;
p.thread_handle = 0x1004;
p.affinity_mask = 0b0001;
p.pcr_base = 0x4000_0000;
s.spawn(p, &mut NullPcr).unwrap();
s.begin_slot_visit(0);
assert!(!s.yield_current(), "no peer to promote → no-op");
// Still the same thread next round.
s.end_slot_visit();
s.begin_slot_visit(0);
assert_eq!(s.thread(s.current.unwrap()).tid, 1);
}
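The starvation floor exercised by `test_anti_starvation_bounded_progress` can be modelled outside the scheduler. The following is a reduced sketch, not the real `HwSlot`/`Scheduler` code: `ToyThread`, `pick`, `visit` and `run_two_threads` are hypothetical names, and thread state is collapsed to a single `ready` flag.

```rust
use std::cmp::Reverse;

const STARVE_LIMIT: u32 = 4096;

/// Hypothetical stand-in for `GuestThread`, reduced to what the pick needs.
pub struct ToyThread {
    pub priority: i32,
    pub ready: bool,
    pub steps_starved: u32,
}

/// A thread starved for `STARVE_LIMIT` visits is lifted above every peer.
fn effective_priority(t: &ToyThread) -> i32 {
    if t.steps_starved >= STARVE_LIMIT { i32::MAX } else { t.priority }
}

/// Highest effective priority wins; ties go to the lower index.
pub fn pick(queue: &[ToyThread]) -> Option<usize> {
    queue
        .iter()
        .enumerate()
        .filter(|(_, t)| t.ready)
        .max_by_key(|(i, t)| (effective_priority(t), Reverse(*i)))
        .map(|(i, _)| i)
}

/// One slot visit: pick, reset the winner's counter, and bump the counter of
/// every other ready thread that was passed over.
pub fn visit(queue: &mut [ToyThread]) -> Option<usize> {
    let picked = pick(queue);
    for (i, t) in queue.iter_mut().enumerate() {
        if Some(i) == picked {
            t.steps_starved = 0;
        } else if t.ready {
            t.steps_starved = t.steps_starved.saturating_add(1);
        }
    }
    picked
}

/// Runs a pri-15 spinner (index 0) against a pri-0 worker (index 1) on one
/// slot and returns `(spinner_picks, worker_picks)`.
pub fn run_two_threads(visits: u32) -> (u32, u32) {
    let mut queue = vec![
        ToyThread { priority: 15, ready: true, steps_starved: 0 },
        ToyThread { priority: 0, ready: true, steps_starved: 0 },
    ];
    let (mut spinner, mut worker) = (0, 0);
    for _ in 0..visits {
        match visit(&mut queue) {
            Some(0) => spinner += 1,
            Some(1) => worker += 1,
            _ => unreachable!("both threads stay ready"),
        }
    }
    (spinner, worker)
}
```

In this model the worker is picked on visit `STARVE_LIMIT + 1` and then once every `STARVE_LIMIT + 1` visits after that, so the spinner keeps 4096 of every 4097 picks.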
#[test] #[test]
fn test_cooperative_yield_does_not_need_quantum() { fn test_cooperative_yield_does_not_need_quantum() {
let mut s = mk_empty_scheduler(); let mut s = mk_empty_scheduler();


@@ -293,28 +293,23 @@ pub fn store_vector_right(mem: &dyn MemoryAccess, ea: u32, v: Vec128) {
} }
} }
// ─── 5-6-5 pixel pack (vpkpx / vupkhpx / vupklpx) ───────────────────────── // ─── pixel pack (vpkpx / vupkhpx / vupklpx) ───────────────────────────────
// PPC vpkpx takes a 32-bit RGB lane and packs it into a 16-bit 1-5-5-5 pixel. // PPC vpkpx packs each 32-bit lane into a 16-bit 1-5-5-5 pixel.
// vupkhpx / vupklpx reverse the operation. // Mapping transcribed EXACTLY from xenia-canary
// // `ppc_emit_altivec.cc::vkpkx_in_low` (lines 1795-1808):
// Format: input 32-bit word holds // tmp1 = (input >> 9) & 0xFC00 // out bits 15:10 = in bits 24:19
// bits 0-6: unused (0) // tmp2 = (input >> 6) & 0x3E0 // out bits 9:5 = in bits 15:11
// bit 7: alpha-select (→ bit 15 of output) // tmp3 = (input >> 3) & 0x1F // out bits 4:0 = in bits 7:3
// bits 8-15: R (top 5 bits kept) // result = tmp1 | tmp2 | tmp3
// bits 16-23: G (top 5 bits kept) // This is a pure shift/mask: there is NO standalone alpha select. Output
// bits 24-31: B (top 5 bits kept) // bit 15 is simply input bit 24 (the top of the 6-bit field masked by
// Output 16-bit word: // 0xFC00) — NOT input bit 7. The red field is 6 bits wide here.
// bit 15: A (from input bit 7)
// bits 10-14: R
// bits 5-9: G
// bits 0-4: B
#[inline] pub fn pack_pixel_555(input: u32) -> u16 { #[inline] pub fn pack_pixel_555(input: u32) -> u16 {
let a = (input >> 7) & 0x1; let tmp1 = (input >> 9) & 0xFC00;
let r = (input >> 8) & 0xFF; let tmp2 = (input >> 6) & 0x3E0;
let g = (input >> 16) & 0xFF; let tmp3 = (input >> 3) & 0x1F;
let b = (input >> 24) & 0xFF; (tmp1 | tmp2 | tmp3) as u16
((a << 15) | ((r & 0xF8) << 7) | ((g & 0xF8) << 2) | ((b & 0xF8) >> 3)) as u16
} }
#[inline] pub fn unpack_pixel_555(input: u16) -> u32 { #[inline] pub fn unpack_pixel_555(input: u16) -> u32 {
@@ -801,9 +796,38 @@ mod tests {
} }
#[test] #[test]
fn pack_unpack_pixel_555() { fn pack_pixel_555_matches_canary() {
let encoded = pack_pixel_555(0x80_F8_F8_F8); // Mapping (canary ppc_emit_altivec.cc::vkpkx_in_low):
assert_eq!(encoded & 0x8000, 0x8000); // out[15:10] = in[24:19], out[9:5] = in[15:11], out[4:0] = in[7:3]
// Pure shift/mask, NO standalone alpha bit.
// All three colour fields exercised. Expected (hand-computed):
// (0x018844C0 >> 9)&0xFC00 = 0xC400
// (0x018844C0 >> 6)&0x3E0 = 0x100
// (0x018844C0 >> 3)&0x1F = 0x18
// => 0xC518
assert_eq!(pack_pixel_555(0x01_88_44_C0), 0xC518);
// Boundary the audit flagged: low byte 0xF8 has bit 7 set. Canary does
// NOT turn that into output bit 15 (alpha). Output bit 15 = in bit 24,
// which is 0 here => high bit clear. (Old impl wrongly produced 0x8000.)
assert_eq!(pack_pixel_555(0x80_F8_F8_F8), 0x7FFF);
assert_eq!(pack_pixel_555(0x80_F8_F8_F8) & 0x8000, 0);
// Lone source bit 7 (0x80) lands in the blue field, not in bit 15.
assert_eq!(pack_pixel_555(0x00_00_00_80), 0x0010);
// Output bit 15 is sourced from input bit 24, not bit 7.
assert_eq!(pack_pixel_555(0x01_00_00_00), 0x8000);
// Saturated input -> all field bits set.
assert_eq!(pack_pixel_555(0xFF_FF_FF_FF), 0xFFFF);
}
#[test]
fn unpack_pixel_555_roundtrip() {
// vupkhpx/vupklpx are NOTIMPLEMENTED in canary, so unpack_pixel_555 is
// unchanged; just sanity-check the alpha-replicate path still holds.
let w = unpack_pixel_555(0x8000 | (0x1F << 10) | (0x1F << 5) | 0x1F); let w = unpack_pixel_555(0x8000 | (0x1F << 10) | (0x1F << 5) | 0x1F);
assert_eq!(w & 0xFF000000, 0xFF000000); assert_eq!(w & 0xFF000000, 0xFF000000);
} }
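The bit mapping implied by the three shift/mask terms can be checked mechanically by feeding one input bit at a time through the packer. This is a sketch: `pack_pixel` restates the new `pack_pixel_555` body so the block compiles on its own, and `source_bits` is a hypothetical helper that is not part of the crate.

```rust
/// Restatement of the new `pack_pixel_555` body (pure shift/mask).
pub fn pack_pixel(input: u32) -> u16 {
    let hi = (input >> 9) & 0xFC00; // 6-bit field
    let mid = (input >> 6) & 0x03E0; // 5-bit field
    let lo = (input >> 3) & 0x001F; // 5-bit field
    (hi | mid | lo) as u16
}

/// Hypothetical probe: for each output bit, the single input bit that feeds
/// it (`None` if no input bit reaches that output bit).
pub fn source_bits() -> [Option<u32>; 16] {
    let mut map = [None; 16];
    for in_bit in 0..32u32 {
        let out = pack_pixel(1u32 << in_bit);
        if out != 0 {
            // Each input bit reaches at most one output bit, because the
            // three source ranges do not overlap.
            map[out.trailing_zeros() as usize] = Some(in_bit);
        }
    }
    map
}
```

The probe reports output bits 15:10 fed by input bits 24:19, output bits 9:5 by input bits 15:11, and output bits 4:0 by input bits 7:3. Input bit 7 therefore lands in output bit 4, not bit 15.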


@@ -28,6 +28,56 @@ use crate::primitive::{self, ProcessedPrimitive};
use crate::register_file::RegisterFile; use crate::register_file::RegisterFile;
use crate::ring_view::RingBufferView; use crate::ring_view::RingBufferView;
/// The guest-virtual window that physical allocations are committed into.
/// `xenia-kernel`'s `heap_alloc` bumps its cursor through `0x4000_0000..=
/// 0x6FFF_FFFF` and commits the host backing for `MmAllocatePhysicalMemoryEx`
/// there, so this write-combine mirror is the canonical home of physical DRAM.
/// Keep in sync with `KernelState::heap_cursor`'s initial value.
pub const PHYSICAL_BACKING_BASE: u32 = 0x4000_0000;
/// Re-project a guest *physical* address — as handed to the Vd/GPU ABI and
/// embedded in PM4 pointers (`INDIRECT_BUFFER`, `WAIT_REG_MEM`-memory,
/// `MEM_WRITE`, `EVENT_WRITE*`, `IM_LOAD`, …) — onto the guest-virtual window
/// where its host backing is actually committed.
///
/// The Xbox 360 maps its 512 MB of physical DRAM into several virtual mirror
/// windows that differ only in cache policy: bare physical (`0x0xxxxxxx`),
/// write-combine (`0x4xxxxxxx`), and the cached `0xA/0xC/0xExxxxxxx` mirrors —
/// all aliasing `addr & 0x1FFF_FFFF`. On real hardware (and in xenia-canary
/// via overlapping `mmap`s) these are literally the same bytes.
///
/// Ours has a single flat `membase` and `MmAllocatePhysicalMemoryEx` commits
/// physical backing in the write-combine `0x4xxxxxxx` window. The guest then
/// masks its allocation base to *bare physical* before passing it to
/// `VdInitializeRingBuffer` / `VdEnableRingBufferRPtrWriteBack`, and PM4
/// pointers are likewise bare-physical. A flat `membase + phys` access
/// therefore hits a never-committed, zero-filled page instead of the committed
/// `0x4xxxxxxx` backing — so the GPU decoded zero PM4 headers and never ran
/// the real command stream.
///
/// Projecting any physical-mirror address back onto the `0x4xxxxxxx` window
/// lands on the page `heap_alloc` actually backed, regardless of which mirror
/// the guest used (idempotent for `0x4xxxxxxx` itself). The projection is
/// derived from `heap_alloc`'s placement, not a guess — if that window ever
/// moves, `PHYSICAL_BACKING_BASE` must move with it.
///
/// This is deliberately applied only at the GPU/Vd boundary (where addresses
/// arrive in their bare-physical form), NOT on the CPU's flat load/store path:
/// the guest CPU already accesses its allocations through the `0x4xxxxxxx`
/// base, and non-physical guest-virtual addresses (image `0x82xxxxxx`, stacks
/// `0x7xxxxxxx`) must stay flat.
#[inline]
pub fn physical_to_backing(addr: u32) -> u32 {
match addr {
0x0000_0000..=0x1FFF_FFFF
| 0x4000_0000..=0x4FFF_FFFF
| 0xA000_0000..=0xBFFF_FFFF
| 0xC000_0000..=0xDFFF_FFFF
| 0xE000_0000..=0xFFFF_FFFF => PHYSICAL_BACKING_BASE | (addr & 0x1FFF_FFFF),
_ => addr,
}
}
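To make the aliasing claim concrete, the sketch below restates `physical_to_backing` from this hunk so it compiles standalone and adds a hypothetical checker, `mirrors_agree`, that confirms every mirror alias of one physical offset projects onto the same backing address. The list of mirror bases is taken from the doc comment above; the helper itself is illustrative only.

```rust
pub const PHYSICAL_BACKING_BASE: u32 = 0x4000_0000;

/// Restated from the hunk above so this sketch is self-contained.
pub fn physical_to_backing(addr: u32) -> u32 {
    match addr {
        0x0000_0000..=0x1FFF_FFFF
        | 0x4000_0000..=0x4FFF_FFFF
        | 0xA000_0000..=0xBFFF_FFFF
        | 0xC000_0000..=0xDFFF_FFFF
        | 0xE000_0000..=0xFFFF_FFFF => PHYSICAL_BACKING_BASE | (addr & 0x1FFF_FFFF),
        _ => addr,
    }
}

/// Hypothetical helper: true if every mirror alias of the physical offset
/// `phys` projects to the same backing address.
pub fn mirrors_agree(phys: u32) -> bool {
    let phys = phys & 0x1FFF_FFFF; // keep only the 512 MB physical offset
    let expected = PHYSICAL_BACKING_BASE | phys;
    [0x0000_0000u32, 0x4000_0000, 0xA000_0000, 0xC000_0000, 0xE000_0000]
        .iter()
        .all(|&base| physical_to_backing(base | phys) == expected)
}
```

Note that physical offsets at or above `0x1000_0000` project to `0x5xxxxxxx`, which the match leaves untouched on a second application, so the projection stays idempotent across the whole 512 MB range.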
/// Cached Xenos microcode blob, produced by `PM4_IM_LOAD*` packets. /// Cached Xenos microcode blob, produced by `PM4_IM_LOAD*` packets.
#[derive(Debug, Clone)] #[derive(Debug, Clone)]
pub struct ShaderBlob { pub struct ShaderBlob {
@@ -58,21 +108,37 @@ pub enum WaitCmp {
GreaterEq, GreaterEq,
/// value > ref /// value > ref
Greater, Greater,
/// Always — caller wants to sleep regardless. /// Always — caller wants to sleep regardless (selector bit 7).
Always, Always,
/// Never matches — `wait_info & 7 == 0` selects bit 0 of canary's
/// selector word, which is always zero.
Never,
} }
impl WaitCmp { impl WaitCmp {
/// Interpret the lower 3 bits of `wait_info` per canary's `MatchValueAndRef`. /// Interpret the lower 3 bits of `wait_info` per canary's `MatchValueAndRef`
/// (`pm4_command_processor_implement.h:685-696`). Canary forms a selector
/// `((value<ref)<<1) | ((value<=ref)<<2) | ((value==ref)<<3) |
/// ((value!=ref)<<4) | ((value>=ref)<<5) | ((value>ref)<<6) | (1<<7)` and
/// evaluates `(selector >> (wait_info & 7)) & 1`. So the index is the bit
/// position: 1=Less, 2=LessEq, 3=Equal, 4=NotEqual, 5=GreaterEq,
/// 6=Greater, 7=always-true, 0=never (bit 0 is always clear).
///
/// GPUBUG: the prior mapping was off by one (it started at `0 => Less`),
/// so `wait_info & 7 == 3` decoded as `NotEqual` instead of `Equal`. That
/// inverted the standard CP coherency wait
/// (`WAIT_REG_MEM COHER_STATUS_HOST, Equal 0`): the GPU parked forever on
/// the first INDIRECT_BUFFER and never reached any draw.
pub fn from_wait_info(wait_info: u32) -> Self { pub fn from_wait_info(wait_info: u32) -> Self {
match wait_info & 0x7 { match wait_info & 0x7 {
0 => WaitCmp::Less, 1 => WaitCmp::Less,
1 => WaitCmp::LessEq, 2 => WaitCmp::LessEq,
2 => WaitCmp::Equal, 3 => WaitCmp::Equal,
3 => WaitCmp::NotEqual, 4 => WaitCmp::NotEqual,
4 => WaitCmp::GreaterEq, 5 => WaitCmp::GreaterEq,
5 => WaitCmp::Greater, 6 => WaitCmp::Greater,
_ => WaitCmp::Always, 7 => WaitCmp::Always,
_ => WaitCmp::Never,
} }
} }
@@ -85,6 +151,7 @@ impl WaitCmp {
WaitCmp::GreaterEq => value >= reference, WaitCmp::GreaterEq => value >= reference,
WaitCmp::Greater => value > reference, WaitCmp::Greater => value > reference,
WaitCmp::Always => true, WaitCmp::Always => true,
WaitCmp::Never => false,
} }
} }
} }
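The corrected decode can be cross-checked against the selector-word form quoted in the doc comment. A sketch, assuming the selector layout exactly as that comment describes it; `Cmp`, `eval`, `selector_match` and `decoders_agree` are hypothetical stand-ins rather than the crate's `WaitCmp` API:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Cmp { Never, Less, LessEq, Equal, NotEqual, GreaterEq, Greater, Always }

/// Enum decode with the corrected (1-based) mapping.
pub fn from_wait_info(wait_info: u32) -> Cmp {
    match wait_info & 0x7 {
        1 => Cmp::Less,
        2 => Cmp::LessEq,
        3 => Cmp::Equal,
        4 => Cmp::NotEqual,
        5 => Cmp::GreaterEq,
        6 => Cmp::Greater,
        7 => Cmp::Always,
        _ => Cmp::Never,
    }
}

pub fn eval(cmp: Cmp, value: u32, reference: u32) -> bool {
    match cmp {
        Cmp::Never => false,
        Cmp::Less => value < reference,
        Cmp::LessEq => value <= reference,
        Cmp::Equal => value == reference,
        Cmp::NotEqual => value != reference,
        Cmp::GreaterEq => value >= reference,
        Cmp::Greater => value > reference,
        Cmp::Always => true,
    }
}

/// Selector-word form: bit N of the selector holds the outcome of compare N,
/// bit 0 is always clear and bit 7 is always set.
pub fn selector_match(wait_info: u32, value: u32, reference: u32) -> bool {
    let selector = (((value < reference) as u32) << 1)
        | (((value <= reference) as u32) << 2)
        | (((value == reference) as u32) << 3)
        | (((value != reference) as u32) << 4)
        | (((value >= reference) as u32) << 5)
        | (((value > reference) as u32) << 6)
        | (1 << 7);
    ((selector >> (wait_info & 7)) & 1) != 0
}

/// True if both forms agree for all eight selectors on a few sample values.
pub fn decoders_agree() -> bool {
    let samples = [0u32, 1, 2, u32::MAX];
    (0..8u32).all(|info| {
        samples.iter().all(|&v| {
            samples
                .iter()
                .all(|&r| eval(from_wait_info(info), v, r) == selector_match(info, v, r))
        })
    })
}
```

Under the old 0-based table, selector 3 would have decoded as `NotEqual`, so the two forms would disagree on the `Equal 0` coherency wait.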
@@ -561,6 +628,12 @@ impl GpuSystem {
pub fn execute_one(&mut self, mem: &dyn MemoryAccess) -> ExecOutcome { pub fn execute_one(&mut self, mem: &dyn MemoryAccess) -> ExecOutcome {
// 0) If currently parked, probe the condition and either wake up or stay blocked. // 0) If currently parked, probe the condition and either wake up or stay blocked.
if let Some(block) = self.pending_block.clone() { if let Some(block) = self.pending_block.clone() {
// Re-service the CP coherency handshake on each probe so a
// COHER_STATUS_HOST wait can clear (canary does this in its WAIT
// loop body, not just at entry).
if let GpuBlock::WaitRegMem { poll_addr, is_memory: false, .. } = &block {
self.make_coherent(*poll_addr);
}
if block.is_satisfied(mem, &self.register_file) { if block.is_satisfied(mem, &self.register_file) {
tracing::debug!(?block, "gpu: wait satisfied — resuming"); tracing::debug!(?block, "gpu: wait satisfied — resuming");
self.pending_block = None; self.pending_block = None;
@@ -658,6 +731,10 @@ impl GpuSystem {
/// Called by `VdInitializeRingBuffer` to give us the primary ring. /// Called by `VdInitializeRingBuffer` to give us the primary ring.
pub fn initialize_ring_buffer(&mut self, base: u32, size_log2: u32) { pub fn initialize_ring_buffer(&mut self, base: u32, size_log2: u32) {
let size_bytes = 1u32 << size_log2.min(31); let size_bytes = 1u32 << size_log2.min(31);
// The guest hands us a bare *physical* ring base; project it onto the
// committed backing window so ring reads hit real PM4 packets (see
// `physical_to_backing`).
let base = physical_to_backing(base);
self.ring.base = base; self.ring.base = base;
self.ring.size_dwords = size_bytes / 4; self.ring.size_dwords = size_bytes / 4;
self.ring.read_offset_dwords = 0; self.ring.read_offset_dwords = 0;
@@ -675,6 +752,10 @@ impl GpuSystem {
/// Called by `VdEnableRingBufferRPtrWriteBack` to record where the guest /// Called by `VdEnableRingBufferRPtrWriteBack` to record where the guest
/// expects us to mirror `read_offset_dwords`. /// expects us to mirror `read_offset_dwords`.
pub fn enable_rptr_writeback(&mut self, addr: u32, block_log2: u32) { pub fn enable_rptr_writeback(&mut self, addr: u32, block_log2: u32) {
// The guest registers a bare *physical* writeback address and polls
// the same allocation through its `0x4xxxxxxx` base; project so our
// RPtr store lands on the page the guest actually reads.
let addr = physical_to_backing(addr);
self.ring.rptr_writeback_addr = addr; self.ring.rptr_writeback_addr = addr;
self.ring.rptr_writeback_block_dwords = 1u32 << block_log2.min(31); self.ring.rptr_writeback_block_dwords = 1u32 << block_log2.min(31);
tracing::info!( tracing::info!(
@@ -724,6 +805,26 @@ impl GpuSystem {
/// upstream packet effects (memory writes, register file updates /// upstream packet effects (memory writes, register file updates
/// the guest reads via subsequent MMIO) happen-before the /// the guest reads via subsequent MMIO) happen-before the
/// CPU-visible RPTR bump. /// CPU-visible RPTR bump.
/// Service a CP coherency request, mirroring canary's
/// `CommandProcessor::MakeCoherent` (`command_processor.cc:801-838`).
///
/// The guest requests a vertex/texture-cache flush by writing
/// `COHER_STATUS_HOST` with its status bit (bit 31) set, then spins on a
/// `WAIT_REG_MEM COHER_STATUS_HOST, Equal 0`. We have no host cache to
/// flush (memory is shared, coherency is implicit), so completing the
/// request is simply clearing the register — which lets the wait satisfy.
/// No-op unless `poll_addr` is `COHER_STATUS_HOST` and its status bit is
/// set, so it is safe to call on every coherency-register WAIT probe.
fn make_coherent(&mut self, poll_addr: u32) {
if poll_addr != reg::COHER_STATUS_HOST {
return;
}
let status = self.register_file.read(reg::COHER_STATUS_HOST);
if status & 0x8000_0000 != 0 {
self.register_file.write(reg::COHER_STATUS_HOST, 0);
}
}
fn writeback_read_ptr(&mut self, mem: &dyn MemoryAccess) { fn writeback_read_ptr(&mut self, mem: &dyn MemoryAccess) {
if self.ring.rptr_writeback_addr != 0 && self.ring.is_initialized() { if self.ring.rptr_writeback_addr != 0 && self.ring.is_initialized() {
mem.write_u32_fence( mem.write_u32_fence(
@@ -816,7 +917,9 @@ impl GpuSystem {
} }
pm4::PM4_INDIRECT_BUFFER | pm4::PM4_INDIRECT_BUFFER_PFD => { pm4::PM4_INDIRECT_BUFFER | pm4::PM4_INDIRECT_BUFFER_PFD => {
self.stats.indirect_buffer_jumps += 1; self.stats.indirect_buffer_jumps += 1;
let ib_ptr = self.read_payload(mem, 1); // The IB pointer is a guest *physical* address — project it
// onto the committed backing window (see `physical_to_backing`).
let ib_ptr = physical_to_backing(self.read_payload(mem, 1));
let ib_size = self.read_payload(mem, 2); let ib_size = self.read_payload(mem, 2);
// Advance past the IB header + payload before recursing so // Advance past the IB header + payload before recursing so
// the return location is correct. // the return location is correct.
@@ -854,7 +957,8 @@ impl GpuSystem {
let is_memory = (wait_info & 0x10) != 0; let is_memory = (wait_info & 0x10) != 0;
let cmp = WaitCmp::from_wait_info(wait_info); let cmp = WaitCmp::from_wait_info(wait_info);
let poll_addr = if is_memory { let poll_addr = if is_memory {
poll_addr_raw & !3 // Physical memory poll address → committed backing.
physical_to_backing(poll_addr_raw & !3)
} else { } else {
poll_addr_raw poll_addr_raw
}; };
@@ -865,6 +969,12 @@ impl GpuSystem {
mask, mask,
cmp, cmp,
}; };
// A WAIT polling COHER_STATUS_HOST is the CP coherency
// handshake: service it now so the status bit clears (see
// `make_coherent`), exactly as canary does in its WAIT loop.
if !is_memory {
self.make_coherent(poll_addr);
}
if block.is_satisfied(mem, &self.register_file) { if block.is_satisfied(mem, &self.register_file) {
// Condition already true; proceed past this packet. // Condition already true; proceed past this packet.
tracing::trace!(?block, "gpu: WAIT_REG_MEM immediately satisfied"); tracing::trace!(?block, "gpu: WAIT_REG_MEM immediately satisfied");
@@ -908,7 +1018,7 @@ impl GpuSystem {
pm4::PM4_REG_TO_MEM => { pm4::PM4_REG_TO_MEM => {
// payload[0] = reg_index, payload[1] = mem addr // payload[0] = reg_index, payload[1] = mem addr
let reg_index = self.read_payload(mem, 1) & 0x1FFF; let reg_index = self.read_payload(mem, 1) & 0x1FFF;
let dst = self.read_payload(mem, 2) & !3; let dst = physical_to_backing(self.read_payload(mem, 2) & !3);
let value = self.register_file.read(reg_index); let value = self.register_file.read(reg_index);
mem.write_u32(dst, value); mem.write_u32(dst, value);
tracing::trace!( tracing::trace!(
@@ -920,7 +1030,7 @@ impl GpuSystem {
} }
pm4::PM4_MEM_WRITE => { pm4::PM4_MEM_WRITE => {
// payload[0] = dst, payload[1..=count-1] = values // payload[0] = dst, payload[1..=count-1] = values
let mut dst = self.read_payload(mem, 1) & !3; let mut dst = physical_to_backing(self.read_payload(mem, 1) & !3);
for i in 2..=count { for i in 2..=count {
let val = self.read_payload(mem, i); let val = self.read_payload(mem, i);
mem.write_u32(dst, val); mem.write_u32(dst, val);
@@ -936,7 +1046,7 @@ impl GpuSystem {
let mask = self.read_payload(mem, 4); let mask = self.read_payload(mem, 4);
let is_memory = (wait_info & 0x10) != 0; let is_memory = (wait_info & 0x10) != 0;
let cmp = WaitCmp::from_wait_info(wait_info); let cmp = WaitCmp::from_wait_info(wait_info);
let poll_addr = if is_memory { poll_raw & !3 } else { poll_raw }; let poll_addr = if is_memory { physical_to_backing(poll_raw & !3) } else { poll_raw };
let cur_raw = if is_memory { let cur_raw = if is_memory {
mem.read_u32(poll_addr) mem.read_u32(poll_addr)
} else { } else {
@@ -946,7 +1056,7 @@ impl GpuSystem {
let write_addr = self.read_payload(mem, 5); let write_addr = self.read_payload(mem, 5);
let write_data = self.read_payload(mem, 6); let write_data = self.read_payload(mem, 6);
if (wait_info & 0x100) != 0 { if (wait_info & 0x100) != 0 {
mem.write_u32(write_addr & !3, write_data); mem.write_u32(physical_to_backing(write_addr & !3), write_data);
} else { } else {
self.register_file self.register_file
.write(write_addr & 0x1FFF, write_data); .write(write_addr & 0x1FFF, write_data);
@@ -965,7 +1075,7 @@ impl GpuSystem {
// payload[0] = initiator (bit 31: write counter, else write `value`) // payload[0] = initiator (bit 31: write counter, else write `value`)
// payload[1] = address, payload[2] = value // payload[1] = address, payload[2] = value
let initiator = self.read_payload(mem, 1); let initiator = self.read_payload(mem, 1);
let address = self.read_payload(mem, 2); let address = physical_to_backing(self.read_payload(mem, 2));
let value = self.read_payload(mem, 3); let value = self.read_payload(mem, 3);
self.register_file self.register_file
.write(reg::VGT_EVENT_INITIATOR, initiator & 0x3F); .write(reg::VGT_EVENT_INITIATOR, initiator & 0x3F);
@@ -993,7 +1103,7 @@ impl GpuSystem {
// payload[0] = initiator, [1] = address. Writes 6 u16 extents // payload[0] = initiator, [1] = address. Writes 6 u16 extents
// (min/max x/y/z) — we're not tracking scissors yet, so write zeros. // (min/max x/y/z) — we're not tracking scissors yet, so write zeros.
let initiator = self.read_payload(mem, 1); let initiator = self.read_payload(mem, 1);
let address = self.read_payload(mem, 2) & !3; let address = physical_to_backing(self.read_payload(mem, 2) & !3);
self.register_file self.register_file
.write(reg::VGT_EVENT_INITIATOR, initiator & 0x3F); .write(reg::VGT_EVENT_INITIATOR, initiator & 0x3F);
self.handle_event_initiator(initiator & 0x3F, mem); self.handle_event_initiator(initiator & 0x3F, mem);
@@ -1123,7 +1233,7 @@ impl GpuSystem {
} }
pm4::PM4_LOAD_ALU_CONSTANT => { pm4::PM4_LOAD_ALU_CONSTANT => {
// payload[0] = source mem addr, [1] = offset_type, [2] = size_dwords // payload[0] = source mem addr, [1] = offset_type, [2] = size_dwords
let src = self.read_payload(mem, 1) & !3; let src = physical_to_backing(self.read_payload(mem, 1) & !3);
let offset_type = self.read_payload(mem, 2); let offset_type = self.read_payload(mem, 2);
let size_dwords = self.read_payload(mem, 3); let size_dwords = self.read_payload(mem, 3);
let index = offset_type & 0x7FF; let index = offset_type & 0x7FF;
@@ -1155,7 +1265,7 @@ impl GpuSystem {
} }
v v
} else { } else {
let addr = self.read_payload(mem, 1) & !3; let addr = physical_to_backing(self.read_payload(mem, 1) & !3);
let mut v = Vec::with_capacity(size_dwords as usize); let mut v = Vec::with_capacity(size_dwords as usize);
for i in 0..size_dwords { for i in 0..size_dwords {
v.push(mem.read_u32(addr + i * 4)); v.push(mem.read_u32(addr + i * 4));
@@ -1477,8 +1587,9 @@ mod tests {
// header // header
let hdr = (3u32 << 30) | ((5u32 - 1) << 16) | ((pm4::PM4_WAIT_REG_MEM as u32) << 8); let hdr = (3u32 << 30) | ((5u32 - 1) << 16) | ((pm4::PM4_WAIT_REG_MEM as u32) << 8);
mem.write_u32(0x4000_0000, hdr); mem.write_u32(0x4000_0000, hdr);
-// wait_info: is_memory=1 (bit 4), cmp=equal (bits 2:0 = 2)
-mem.write_u32(0x4000_0004, 0x12);
+// wait_info: is_memory=1 (bit 4), cmp=equal (bits 2:0 = 3, per canary's
+// MatchValueAndRef selector: 1=Less, 2=LessEq, 3=Equal, …).
+mem.write_u32(0x4000_0004, 0x13);
mem.write_u32(0x4000_0008, 0x4000_1000); mem.write_u32(0x4000_0008, 0x4000_1000);
mem.write_u32(0x4000_000C, 0x42); mem.write_u32(0x4000_000C, 0x42);
mem.write_u32(0x4000_0010, 0xFFFF_FFFF); mem.write_u32(0x4000_0010, 0xFFFF_FFFF);
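
Note on the coherency handshake in the hunks above: the following is a minimal, self-contained sketch of the `make_coherent` behaviour, not the emulator's actual types. `ToyRegisterFile`, `wait_equal`, and the numeric value chosen for `COHER_STATUS_HOST` are assumptions made for illustration; only the bit-31 status check and the clear-to-zero step come from the diff.

```rust
use std::collections::HashMap;

/// Illustrative register index (assumption); the real constant lives in the
/// GPU crate's register table, which this diff does not show.
const COHER_STATUS_HOST: u32 = 0x0A31;
/// Status bit (bit 31) the guest sets to request a cache flush.
const COHER_STATUS_BIT: u32 = 0x8000_0000;

/// Hypothetical stand-in for the register file: unwritten registers read as 0.
#[derive(Default)]
struct ToyRegisterFile {
    regs: HashMap<u32, u32>,
}

impl ToyRegisterFile {
    fn read(&self, index: u32) -> u32 {
        self.regs.get(&index).copied().unwrap_or(0)
    }

    fn write(&mut self, index: u32, value: u32) {
        self.regs.insert(index, value);
    }
}

/// Clear COHER_STATUS_HOST only when it is the polled register and its
/// status bit is set; every other call is a no-op.
fn make_coherent(rf: &mut ToyRegisterFile, poll_addr: u32) {
    if poll_addr != COHER_STATUS_HOST {
        return;
    }
    if (rf.read(COHER_STATUS_HOST) & COHER_STATUS_BIT) != 0 {
        rf.write(COHER_STATUS_HOST, 0);
    }
}

/// Register-space "Equal" wait, assumed here to be `(value & mask) == reference`.
fn wait_equal(rf: &ToyRegisterFile, poll_addr: u32, reference: u32, mask: u32) -> bool {
    (rf.read(poll_addr) & mask) == reference
}
```

In this model, a wait on `COHER_STATUS_HOST == 0` fails while the status bit is set and succeeds once `make_coherent` has run for that register, which is the ordering the WAIT_REG_MEM hunk relies on.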

@@ -34,7 +34,7 @@ pub mod xenos_constants;
pub use gpu_system::{ pub use gpu_system::{
ExecOutcome, GpuBlock, GpuMmio, GpuStats, GpuSystem, InterruptSource, PendingInterrupt, ExecOutcome, GpuBlock, GpuMmio, GpuStats, GpuSystem, InterruptSource, PendingInterrupt,
ShaderBlob, SwapNotification, WaitCmp, PHYSICAL_BACKING_BASE, ShaderBlob, SwapNotification, WaitCmp, physical_to_backing,
}; };
pub use handle::{ pub use handle::{
DrainReply, GpuBackend, GpuCommand, GpuDigestSnapshot, GpuHandle, GpuWorker, DrainReply, GpuBackend, GpuCommand, GpuDigestSnapshot, GpuHandle, GpuWorker,

@@ -364,7 +364,11 @@ pub fn copy_to_memory(
// Destination coordinates are 0-based against `dest_base` — the // Destination coordinates are 0-based against `dest_base` — the
// base already points at the top-left of the copy rectangle. // base already points at the top-left of the copy rectangle.
let dst_off = tiled_2d_offset(dx, dy, pitch_aligned, bpp_log2); let dst_off = tiled_2d_offset(dx, dy, pitch_aligned, bpp_log2);
let dst_addr = info.dest_base.wrapping_add(dst_off); // `dest_base` is a bare guest *physical* address; project onto the
// committed backing window so resolved pixels land where the guest
// (and `vd_swap`'s frontbuffer read) actually see them.
let dst_addr =
crate::gpu_system::physical_to_backing(info.dest_base.wrapping_add(dst_off));
if info.source_is_64bpp { if info.source_is_64bpp {
let (lo, hi) = match single_sample_idx { let (lo, hi) = match single_sample_idx {

@@ -486,12 +486,20 @@ fn ke_query_performance_frequency(ctx: &mut PpcContext, _mem: &GuestMemory, _sta
ctx.gpr[3] = 50_000_000; // 50 MHz ctx.gpr[3] = 50_000_000; // 50 MHz
} }
fn ke_query_system_time(ctx: &mut PpcContext, mem: &GuestMemory, _state: &mut KernelState) { fn ke_query_system_time(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) {
let time_ptr = ctx.gpr[3] as u32; let time_ptr = ctx.gpr[3] as u32;
if time_ptr != 0 { if time_ptr != 0 {
let fake_time: u64 = 132_500_000_000_000_000; // ~2021 FILETIME // ITERATE-2J — advance with the same deterministic clock the
mem.write_u32(time_ptr, (fake_time >> 32) as u32); // KeTimeStampBundle uses (1 global_clock unit ≈ 100 ns) so a guest
mem.write_u32(time_ptr + 4, fake_time as u32); // that polls KeQuerySystemTime for elapsed time also sees forward
// progress instead of a frozen constant. FILETIME base (~2021) +
// 100-ns-unit clock.
const FILETIME_BASE: u64 = 132_500_000_000_000_000;
let hw_id = state.scheduler.current_hw_id().unwrap_or(0);
let now = state.now_basis_at(hw_id);
let system_time = FILETIME_BASE.wrapping_add(now);
mem.write_u32(time_ptr, (system_time >> 32) as u32);
mem.write_u32(time_ptr + 4, system_time as u32);
} }
} }
@@ -696,9 +704,36 @@ fn mm_create_kernel_stack(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut K
} }
} }
/// Region-aware guest-virtual → physical translation, matching canary's
/// `Memory::GetPhysicalAddress` + `PhysicalHeap::GetPhysicalAddress`
/// (`xenia-canary/src/xenia/memory.cc:528-545` and `:2317-2326`).
///
/// Canary `PhysicalHeap::GetPhysicalAddress`:
/// ```c
/// address -= heap_base_;
/// if (heap_base_ >= 0xE0000000) { address += 0x1000; }
/// return address;
/// ```
/// The three physical heap bases (0xA0000000 / 0xC0000000 / 0xE0000000) all
/// alias the same 512 MB physical window, so `address - heap_base ==
/// address & 0x1FFFFFFF` for each. The only region-specific delta is the
/// `+0x1000` host-address-offset for the 0xE0000000+ 4 KB mirror — see
/// `memory.h:368-372` (`host_address_offset` for `heap_base >= 0xE0000000`).
/// For non-physical / sub-0x1FFFFFFF virtual addresses canary returns the
/// address unchanged, which equals `address & 0x1FFFFFFF` there too.
pub(crate) fn translate_physical_address(virt: u32) -> u32 {
let phys = virt & 0x1FFF_FFFF;
if virt >= 0xE000_0000 {
phys + 0x1000
} else {
phys
}
}
fn mm_get_physical_address(ctx: &mut PpcContext, _mem: &GuestMemory, _state: &mut KernelState) { fn mm_get_physical_address(ctx: &mut PpcContext, _mem: &GuestMemory, _state: &mut KernelState) {
-// r3 = virtual address -> return physical address
-ctx.gpr[3] &= 0x1FFF_FFFF; // Mask to 512MB physical
+// r3 = virtual address -> return physical address.
+// Region-aware, mirroring canary (see `translate_physical_address`).
+ctx.gpr[3] = translate_physical_address(ctx.gpr[3] as u32) as u64;
} }
fn mm_query_address_protect(ctx: &mut PpcContext, _mem: &GuestMemory, _state: &mut KernelState) { fn mm_query_address_protect(ctx: &mut PpcContext, _mem: &GuestMemory, _state: &mut KernelState) {
@@ -1480,20 +1515,35 @@ fn nt_query_information_file(ctx: &mut PpcContext, mem: &GuestMemory, state: &mu
*size *size
}; };
// Root-of-device opens (`game:\`, `cache:\`, `partition0`) strip to // Snapshot what we need from the handle, then drop the borrow so we can
// an empty string post-prefix — see `open_vfs_file`'s synth path. // re-resolve the path against the VFS for its real attribute byte.
// Games query these as directories (DirectoryObject probe), and let path = path.clone();
// reporting `Directory=0` makes Sylpheed treat the open as "found a
// non-directory where I expected a directory" and call
// `XamShowDirtyDiscErrorUI`. Canary's `NtQueryInformationFile` pulls
// the real file-system entry's kind; we key on path shape since we
// don't model directory entries.
let is_directory = path.is_empty()
|| path.ends_with('/')
|| path.ends_with(':');
let size = live_size; let size = live_size;
let position = *position; let position = *position;
// Pull the REAL GDFX attribute byte (canary `disc_image_device.cc:154`)
// for disc-backed handles by re-resolving the stored path. Root-of-device
// opens (`game:\`, `cache:\`, `partition0`) strip to an empty string and
// synth-stub opens have no VFS entry — for those we fall back to the
// path-shape heuristic. Games query these as directories (DirectoryObject
// probe), and reporting `Directory=0` makes Sylpheed treat the open as
// "found a non-directory where I expected a directory" and call
// `XamShowDirtyDiscErrorUI`.
let vfs_attributes: Option<u32> = if path.is_empty() {
None
} else {
state
.vfs
.as_ref()
.and_then(|vfs| vfs.stat(&path).ok())
.map(|e| e.attributes)
.filter(|&a| a != 0)
};
let is_directory = match vfs_attributes {
Some(a) => (a & 0x10) != 0,
None => path.is_empty() || path.ends_with('/') || path.ends_with(':'),
};
// `FILE_ATTRIBUTE_DIRECTORY` (NT / Xbox) — advertised in // `FILE_ATTRIBUTE_DIRECTORY` (NT / Xbox) — advertised in
// `FileNetworkOpenInformation.FileAttributes`; Sylpheed's async-I/O // `FileNetworkOpenInformation.FileAttributes`; Sylpheed's async-I/O
// worker queries with class=34 and the calling code checks this bit // worker queries with class=34 and the calling code checks this bit
@@ -1532,10 +1582,13 @@ fn nt_query_information_file(ctx: &mut PpcContext, mem: &GuestMemory, state: &mu
} }
mem.write_u64(file_info + 32, size); mem.write_u64(file_info + 32, size);
mem.write_u64(file_info + 40, size); mem.write_u64(file_info + 40, size);
-let attrs = if is_directory {
-FILE_ATTRIBUTE_DIRECTORY
-} else {
-FILE_ATTRIBUTE_NORMAL
+// Prefer the real GDFX attribute byte; fall back to the
+// DIRECTORY/NORMAL split for root-of-device and synth-stub
+// handles that have no VFS entry.
+let attrs = match vfs_attributes {
+Some(a) => a,
+None if is_directory => FILE_ATTRIBUTE_DIRECTORY,
+None => FILE_ATTRIBUTE_NORMAL,
}; };
mem.write_u32(file_info + 48, attrs); mem.write_u32(file_info + 48, attrs);
mem.write_u32(file_info + 52, 0); // pad mem.write_u32(file_info + 52, 0); // pad
@@ -1738,7 +1791,18 @@ fn nt_query_full_attributes_file(ctx: &mut PpcContext, mem: &GuestMemory, state:
mem.write_u32(out + 28, filetime as u32); mem.write_u32(out + 28, filetime as u32);
mem.write_u64(out + 32, entry.size); mem.write_u64(out + 32, entry.size);
mem.write_u64(out + 40, entry.size); mem.write_u64(out + 40, entry.size);
-let attrs: u32 = if entry.is_directory { 0x10 } else { 0x80 };
+// Use the REAL GDFX attribute byte forwarded by the VFS
+// (canary `disc_image_device.cc:154`) instead of a
+// path-shape guess. Disc rips never carry a 0-attribute
+// entry, but guard anyway so a synthesised/legacy entry
+// still advertises a sane DIRECTORY/NORMAL split.
+let attrs: u32 = if entry.attributes != 0 {
+entry.attributes
+} else if entry.is_directory {
+0x10
+} else {
+0x80
+};
mem.write_u32(out + 48, attrs); mem.write_u32(out + 48, attrs);
mem.write_u32(out + 52, 0); mem.write_u32(out + 52, 0);
} }
@@ -1859,6 +1923,7 @@ fn nt_query_directory_file(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut
is_directory: e.is_directory, is_directory: e.is_directory,
size: e.size, size: e.size,
offset: e.offset, offset: e.offset,
attributes: e.attributes,
}) })
}) })
.collect(), .collect(),
@@ -1909,7 +1974,12 @@ fn nt_query_directory_file(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut
mem.write_u64(base + 0x20, 0); mem.write_u64(base + 0x20, 0);
mem.write_u64(base + 0x28, entry.size); mem.write_u64(base + 0x28, entry.size);
mem.write_u64(base + 0x30, entry.size); mem.write_u64(base + 0x30, entry.size);
-let attrs = if entry.is_directory {
+// Real GDFX attribute byte (canary `disc_image_device.cc:154`);
+// fall back to the directory/normal split only for legacy entries
+// that carry no attribute bits.
+let attrs = if entry.attributes != 0 {
+entry.attributes
+} else if entry.is_directory {
FILE_ATTRIBUTE_DIRECTORY FILE_ATTRIBUTE_DIRECTORY
} else { } else {
FILE_ATTRIBUTE_NORMAL FILE_ATTRIBUTE_NORMAL
@@ -1977,14 +2047,29 @@ fn nt_close(ctx: &mut PpcContext, _mem: &GuestMemory, state: &mut KernelState) {
// so a later scheduler round doesn't try to signal a dead handle. // so a later scheduler round doesn't try to signal a dead handle.
// `disarm_timer` is a no-op for non-timer handles. // `disarm_timer` is a no-op for non-timer handles.
state.disarm_timer(handle); state.disarm_timer(handle);
// AUDIT-059 R34: return the slot to the recycle FIFO so a later
// `alloc_handle` mints the same ID (matching canary's slab).
state.release_handle_slot(handle);
} }
ctx.gpr[3] = 0; ctx.gpr[3] = 0;
} }
fn nt_create_event(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) { fn nt_create_event(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) {
// r3 = handle_ptr, r4 = obj_attrs, r5 = event_type, r6 = initial_state // r3 = handle_ptr, r4 = obj_attrs, r5 = event_type, r6 = initial_state.
//
// Xenon DISPATCHER_HEADER `Type` (NT convention):
// 0 = NotificationEvent (manual-reset)
// 1 = SynchronizationEvent (auto-reset)
// Canary: `xboxkrnl_threading.cc:668` `ev->Initialize(!event_type, !!initial_state)`
// with `XEvent::Initialize(bool manual_reset, ...)` (xevent.cc:25) and
// `InitializeNative` (xevent.cc:41 `case 0x00: manual_reset_ = true`).
// So `manual_reset = (event_type == 0)`. The Ke-path
// (`ensure_dispatcher_object`) was already correct; the Nt-path here was
// inverted, mis-classifying Sylpheed's per-frame VSync gate (type=1 auto +
// initial=1) as manual-reset+signaled → it stayed signaled forever and
// tid=1's main loop spun ~2800x canary's 60Hz.
let handle_ptr = ctx.gpr[3] as u32; let handle_ptr = ctx.gpr[3] as u32;
let manual_reset = ctx.gpr[5] != 0; let manual_reset = ctx.gpr[5] == 0;
let signaled = ctx.gpr[6] != 0; let signaled = ctx.gpr[6] != 0;
let handle = state.alloc_handle_for(KernelObject::Event { let handle = state.alloc_handle_for(KernelObject::Event {
manual_reset, manual_reset,
@@ -1998,6 +2083,9 @@ fn nt_create_event(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelSt
mem, mem,
"NtCreateEvent", "NtCreateEvent",
); );
// ITERATE-2C Phase D — audit-049 auto-signal POC. Env-gated; no-op
// when `XENIA_SILPH_UI_AUTOSIGNAL_DELAY` is unset.
state.maybe_register_silph_autosignal(handle, ctx, mem);
if handle_ptr != 0 { if handle_ptr != 0 {
mem.write_u32(handle_ptr, handle); mem.write_u32(handle_ptr, handle);
} }
@@ -2085,7 +2173,7 @@ fn nt_set_timer_ex(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelSt
// timebase separately (immutable borrow) before any mutation of the // timebase separately (immutable borrow) before any mutation of the
// object to keep the borrow-checker happy. // object to keep the borrow-checker happy.
let hw_id = state.scheduler.current_hw_id().unwrap_or(0); let hw_id = state.scheduler.current_hw_id().unwrap_or(0);
let now = state.scheduler.ctx(hw_id).timebase; let now = state.now_basis_at(hw_id);
// Read signed i64 due_time (big-endian hi/lo — same pattern as // Read signed i64 due_time (big-endian hi/lo — same pattern as
// parse_timeout). Negative = relative-from-now, positive = absolute // parse_timeout). Negative = relative-from-now, positive = absolute
@@ -3081,13 +3169,18 @@ fn vd_swap(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) {
// safer to cap the read at the known total size to avoid OOB. // safer to cap the read at the known total size to avoid OOB.
let mut tiled = Vec::with_capacity(total_tiled_bytes); let mut tiled = Vec::with_capacity(total_tiled_bytes);
let mut ok = true; let mut ok = true;
// The frontbuffer is a guest *physical* address; project onto the
// committed backing window (see `xenia_gpu::physical_to_backing`)
// so the present reads the pixels the GPU resolved, not a stale /
// zero mirror page.
let fb_backing = xenia_gpu::physical_to_backing(swap.frontbuffer_phys);
for i in 0..total_tiled_bytes { for i in 0..total_tiled_bytes {
// read_u8 is cheap — the VirtualMemory handler returns 0 // read_u8 is cheap — the VirtualMemory handler returns 0
// for unmapped pages so we get a recognisable dark frame // for unmapped pages so we get a recognisable dark frame
// rather than a crash if the address turned out bogus. // rather than a crash if the address turned out bogus.
let addr = swap.frontbuffer_phys.wrapping_add(i as u32); let addr = fb_backing.wrapping_add(i as u32);
tiled.push(mem.read_u8(addr)); tiled.push(mem.read_u8(addr));
if addr < swap.frontbuffer_phys { if addr < fb_backing {
ok = false; ok = false;
break; break;
} }
@@ -3509,7 +3602,7 @@ pub(crate) fn parse_timeout(state: &KernelState, timeout_ptr: u32, mem: &GuestMe
return Some(Some(0)); // poll return Some(Some(0)); // poll
} }
let hw_id = state.scheduler.current_hw_id().unwrap_or(0); let hw_id = state.scheduler.current_hw_id().unwrap_or(0);
let now = state.scheduler.ctx(hw_id).timebase; let now = state.now_basis_at(hw_id);
// Negative = relative, positive = absolute wall-clock. Our timebase is a // Negative = relative, positive = absolute wall-clock. Our timebase is a
// plain instruction counter, so we treat all timeouts as "time-units // plain instruction counter, so we treat all timeouts as "time-units
// after now" regardless of sign, using the magnitude. // after now" regardless of sign, using the magnitude.
@@ -4817,12 +4910,14 @@ mod tests {
is_directory: false, is_directory: false,
size: 0x1000, size: 0x1000,
offset: 0, offset: 0,
attributes: 0x81, // NORMAL | READONLY
}, },
xenia_vfs::VfsEntry { xenia_vfs::VfsEntry {
name: "dat".into(), name: "dat".into(),
is_directory: true, is_directory: true,
size: 0, size: 0,
offset: 0, offset: 0,
attributes: 0x11, // DIRECTORY | READONLY
}, },
// A grandchild — must NOT appear in root enumeration. // A grandchild — must NOT appear in root enumeration.
xenia_vfs::VfsEntry { xenia_vfs::VfsEntry {
@@ -4830,6 +4925,7 @@ mod tests {
is_directory: false, is_directory: false,
size: 0x2000, size: 0x2000,
offset: 0, offset: 0,
attributes: 0x81,
}, },
], ],
})); }));
@@ -4856,9 +4952,11 @@ mod tests {
// NextEntryOffset. // NextEntryOffset.
let mut cursor: u32 = 0; let mut cursor: u32 = 0;
let mut names: Vec<String> = Vec::new(); let mut names: Vec<String> = Vec::new();
let mut attrs: Vec<u32> = Vec::new();
loop { loop {
let entry_base = buf + cursor; let entry_base = buf + cursor;
let name_len = mem.read_u32(entry_base + 0x3C) as usize; let name_len = mem.read_u32(entry_base + 0x3C) as usize;
attrs.push(mem.read_u32(entry_base + 0x38));
let mut bytes = Vec::with_capacity(name_len); let mut bytes = Vec::with_capacity(name_len);
for i in 0..name_len as u32 { for i in 0..name_len as u32 {
bytes.push(mem.read_u8(entry_base + 0x40 + i)); bytes.push(mem.read_u8(entry_base + 0x40 + i));
@@ -4871,6 +4969,12 @@ mod tests {
cursor += next; cursor += next;
} }
assert_eq!(names, vec!["default.xex", "dat"]); assert_eq!(names, vec!["default.xex", "dat"]);
// The real GDFX attribute byte must be forwarded verbatim: the file
// reports NORMAL|READONLY (no DIRECTORY bit), the directory reports
// DIRECTORY|READONLY.
assert_eq!(attrs, vec![0x81, 0x11]);
assert_eq!(attrs[0] & 0x10, 0, "file must not advertise DIRECTORY");
assert_ne!(attrs[1] & 0x10, 0, "dir must advertise DIRECTORY");
// A second call on the same handle must return NO_MORE_FILES — // A second call on the same handle must return NO_MORE_FILES —
// the cursor has advanced past the end. // the cursor has advanced past the end.
ctx.gpr[3] = handle as u64; ctx.gpr[3] = handle as u64;
@@ -6390,4 +6494,23 @@ mod tests {
assert!(resolved.ends_with("etc/foo")); assert!(resolved.ends_with("etc/foo"));
std::fs::remove_dir_all(&dir).ok(); std::fs::remove_dir_all(&dir).ok();
} }
/// `MmGetPhysicalAddress` must be region-aware, matching canary's
/// `PhysicalHeap::GetPhysicalAddress`: the 0xE0000000+ 4 KB mirror gets a
/// `+0x1000` host-address-offset; every other region is a flat
/// `& 0x1FFFFFFF` mask.
#[test]
fn mm_get_physical_address_region_aware() {
// 0xE0000000 mirror: canary `address - heap_base (==addr & 0x1FFFFFFF)`
// then `+ 0x1000`.
assert_eq!(translate_physical_address(0xE000_0000), 0x0000_1000);
assert_eq!(translate_physical_address(0xE000_5000), 0x0000_6000);
assert_eq!(translate_physical_address(0xFFFF_F000), 0x1FFF_F000 + 0x1000);
// 0xA0000000 / 0xC0000000 physical heaps: flat mask, no offset.
assert_eq!(translate_physical_address(0xA000_0000), 0x0000_0000);
assert_eq!(translate_physical_address(0xC012_3000), 0x0012_3000);
// Already-physical (< 0x20000000): unchanged. Any other address below
// 0xE0000000 (e.g. 0x4012_3000) is masked to the 512 MB window.
assert_eq!(translate_physical_address(0x0012_3000), 0x0012_3000);
assert_eq!(translate_physical_address(0x4012_3000), 0x0012_3000);
}
} }
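
Note on `translate_physical_address`: the doc comment claims that subtracting the heap base is equivalent to masking with `0x1FFFFFFF` for all three physical heaps. The sketch below checks that claim in isolation. `heap_base_for`, `via_heap_base`, and `via_mask` are hypothetical helpers, and the 512 MB extent assumed for each heap is inferred from the comment rather than taken from canary's source.

```rust
/// Assumed layout: three 512 MB physical heaps at the bases named in the
/// doc comment. Returns `None` for addresses outside those ranges.
fn heap_base_for(addr: u32) -> Option<u32> {
    match addr {
        0xA000_0000..=0xBFFF_FFFF => Some(0xA000_0000),
        0xC000_0000..=0xDFFF_FFFF => Some(0xC000_0000),
        0xE000_0000..=0xFFFF_FFFF => Some(0xE000_0000),
        _ => None,
    }
}

/// Heap-relative form, following the C snippet quoted in the doc comment:
/// subtract the heap base, then add 0x1000 for the 0xE0000000 heap.
fn via_heap_base(addr: u32) -> Option<u32> {
    let base = heap_base_for(addr)?;
    let mut rel = addr - base;
    if base >= 0xE000_0000 {
        rel += 0x1000;
    }
    Some(rel)
}

/// Mask form, same logic as `translate_physical_address` in the diff.
fn via_mask(addr: u32) -> u32 {
    let phys = addr & 0x1FFF_FFFF;
    if addr >= 0xE000_0000 {
        phys + 0x1000
    } else {
        phys
    }
}
```

Under these assumptions the two forms agree wherever `via_heap_base` is defined. They differ only in that the mask form also returns a value for addresses outside the three heaps.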

@@ -17,6 +17,16 @@ impl PcrWriter for GuestMemoryPcr<'_> {
// `GuestMemory::write_u32` takes `&self` post-M2 trait flip; the // `GuestMemory::write_u32` takes `&self` post-M2 trait flip; the
// wrapping `&'a GuestMemory` is sufficient. // wrapping `&'a GuestMemory` is sufficient.
self.0.write_u32(pcr_base + 0x2C, hw_id as u32); self.0.write_u32(pcr_base + 0x2C, hw_id as u32);
// PRCB.current_cpu byte at PCR+0x10C (prcb_data@0x100 + current_cpu@0xC).
// Canary writes `GetFakeCpuNumber(affinity)` here (xthread.cc:847
// `pcr->prcb_data.current_cpu = cpu_index`), which equals the HW thread
// id we already compute. Guest spin-barriers (e.g. sub_824D1328, used by
// the audio/update pump threads at entries 0x824D2878/0x824D2940) index a
// per-HW-thread occupancy array by `lbz r11, 268(r13)` = this byte. Left
// unwritten it stayed 0 for every thread, so all threads collided on
// slot 0 and the multi-thread rendezvous signature never assembled —
// the pump threads spun forever and never fired their KeSetEvent loops.
self.0.write_u8(pcr_base + 0x10C, hw_id);
} }
} }
@@ -56,6 +66,18 @@ pub struct KernelState {
/// publish; observers (the kernel object table) are guarded by /// publish; observers (the kernel object table) are guarded by
/// their own synchronization. /// their own synchronization.
next_handle: std::sync::atomic::AtomicU32, next_handle: std::sync::atomic::AtomicU32,
/// AUDIT-059 R34: FIFO free list of closed handle slots, mirroring
/// canary's slab/free-list `ObjectTable`. Without this, ours' bump
/// allocator monotonically grows so a recycled slot in canary
/// (e.g. `F8000098` reused 130× per 30s) corresponds to a fresh,
/// never-reused slot in ours — the kernel-object identity drifts.
/// Recycling closes that gap and (per AUDIT-042 / R30) may
/// side-effect-unwedge γ-cluster #2 by letting silph signals land
/// on the same handle slot the wait registered for. Population is
/// gated on `KernelState::release_handle_slot` (only IDs in
/// `[HANDLE_BASE, 0xF000_0000)` are recycled — synthetic XAudio
/// handles at `0xF000_0000+` are reserved and must never be reused).
free_handles: std::collections::VecDeque<u32>,
/// Scheduler managing all emulated HW threads + their per-slot /// Scheduler managing all emulated HW threads + their per-slot
/// runqueues. Starts empty — the app installs the initial guest thread /// runqueues. Starts empty — the app installs the initial guest thread
/// on slot 0 via `KernelState::install_initial_thread` once it has the /// on slot 0 via `KernelState::install_initial_thread` once it has the
@@ -279,6 +301,17 @@ pub struct KernelState {
/// Settable via `--audit-r3-dump-bytes` / /// Settable via `--audit-r3-dump-bytes` /
/// `XENIA_AUDIT_R3_DUMP_BYTES`. /// `XENIA_AUDIT_R3_DUMP_BYTES`.
pub audit_r3_dump_bytes: Option<u32>, pub audit_r3_dump_bytes: Option<u32>,
/// iterate-2E — diagnostic pointer-chase. `(reg, off)`: on every
/// `AUDIT-PC-PROBE` fire, treat `gpr[reg]` as a base object pointer,
/// dump its first 64 bytes, then follow `[base+off]` to a sub-object
/// (e.g. a stream/file object held in a work item), dump ITS first 64
/// bytes, then follow `[[base+off]+0]` to the sub-object's vtable and
/// dump the first 48 u32 slots. Designed to capture the live work-item
/// + stream object + vtable at `sub_824510E0` entry (r4 = work item,
/// stream at +36, vtable[28] = the "is-read-done?" predicate) BEFORE
/// the pool recycles the slot. Read-only; lockstep digest unaffected.
/// Settable via `XENIA_AUDIT_DEREF=<reg>:<off>` (e.g. `4:36`).
pub audit_deref: Option<(u8, u32)>,
/// M12 — diagnostic. PCs at which to emit a structured JSONL record /// M12 — diagnostic. PCs at which to emit a structured JSONL record
/// per fire, designed for diffing against xenia-canary's /// per fire, designed for diffing against xenia-canary's
/// `--log_lr_on_pc` patch output. Each line carries /// `--log_lr_on_pc` patch output. Each line carries
@@ -313,6 +346,42 @@ pub struct KernelState {
pub silph_synth_handles: [Option<u32>; 4], pub silph_synth_handles: [Option<u32>; 4],
/// AUDIT-2.BF — `ThreadRef` cache for the 4 synthetic workers. /// AUDIT-2.BF — `ThreadRef` cache for the 4 synthetic workers.
pub silph_synth_refs: [Option<xenia_cpu::ThreadRef>; 4], pub silph_synth_refs: [Option<xenia_cpu::ThreadRef>; 4],
/// ITERATE-2C Phase D — auto-signal delay for silph::UImpl
/// `NtCreateEvent` calls (see [`Self::maybe_register_silph_autosignal`]).
/// `None` = feature disabled; populated once from
/// `XENIA_SILPH_UI_AUTOSIGNAL_DELAY=<u64>` at construction.
pub silph_autosignal_delay: Option<u64>,
/// ITERATE-2C Phase D — pending auto-signal queue. Drained each
/// outer round by [`Self::fire_due_silph_autosignals`].
pub silph_autosignal_pending: Vec<AutoSignalPending>,
/// ITERATE-2C Phase D — most recent `stats.instruction_count`
/// deposited by the scheduler loop (see
/// [`Self::set_now_cycle_hint`]). Used by
/// [`Self::maybe_register_silph_autosignal`] to compute absolute
/// deadlines, since `nt_create_event` doesn't see `ExecStats`.
pub last_cycle_hint: u64,
/// ITERATE-2C Phase D — one-shot diagnostic latch. Flipped by
/// [`Self::fire_due_silph_autosignals`] on the first visit where
/// the pending queue is non-empty but no entry is due yet.
pub silph_autosignal_diag_logged: bool,
/// ITERATE-2J — guest VA of the `KeTimeStampBundle` block (xboxkrnl
/// data export ordinal 0x00AD). Set during the import-patch pass in
/// `xenia-app`. Zero until then. The guest's worker-hub channel
/// dispatch loop polls `[block+0x10]` (`tick_count`, milliseconds) and
/// gates dispatch on a `tick_count + 66` deadline; if the block is
/// never re-written that deadline never elapses and the hub spins
/// forever (the tid14 0x109c starvation gate). The run loop ticks this
/// block every round from the deterministic `global_clock` via
/// [`Self::update_timestamp_bundle`].
pub timestamp_bundle_addr: u32,
}
/// ITERATE-2C Phase D — one queued auto-signal. `deadline_cycle` is
/// absolute (cycle hint at register time + configured delay).
#[derive(Debug, Clone, Copy)]
pub struct AutoSignalPending {
pub handle: u32,
pub deadline_cycle: u64,
} }
impl KernelState { impl KernelState {
@@ -338,6 +407,7 @@ impl KernelState {
let mut state = Self { let mut state = Self {
exports: HashMap::new(), exports: HashMap::new(),
next_handle: AtomicU32::new(0x1000), next_handle: AtomicU32::new(0x1000),
free_handles: std::collections::VecDeque::new(),
scheduler, scheduler,
next_tls_index: AtomicU32::new(0), next_tls_index: AtomicU32::new(0),
cs_waiters: HashMap::new(), cs_waiters: HashMap::new(),
@@ -379,6 +449,7 @@ impl KernelState {
audit_pc_probe_pcs: std::collections::HashSet::new(),
audit_mem_read_addr: None,
audit_r3_dump_bytes: None,
audit_deref: None,
lr_trace_pcs: std::collections::HashSet::new(),
lr_trace_writer: None,
dump_addrs: Vec::new(),
@@ -387,6 +458,13 @@ impl KernelState {
silph_synth_ctx: 0,
silph_synth_handles: [None; 4],
silph_synth_refs: [None; 4],
silph_autosignal_delay: std::env::var("XENIA_SILPH_UI_AUTOSIGNAL_DELAY")
.ok()
.and_then(|v| v.parse::<u64>().ok()),
silph_autosignal_pending: Vec::new(),
last_cycle_hint: 0,
silph_autosignal_diag_logged: false,
timestamp_bundle_addr: 0,
};
crate::exports::register_exports(&mut state);
crate::xam::register_exports(&mut state);
@@ -660,12 +738,39 @@ impl KernelState {
}
pub fn alloc_handle(&mut self) -> u32 {
// AUDIT-059 R34: prefer recycling a closed slot (FIFO, matching
// canary's `ObjectTable` slab) before bumping. The Arc<Mutex<
// KernelState>> already serializes us; no extra synchronization.
if let Some(slot) = self.free_handles.pop_front() {
return slot;
}
// M2.4: lock-free fetch_add. Relaxed is sufficient — IDs are
// opaque tokens; no payload is sequenced against the counter.
self.next_handle
.fetch_add(4, std::sync::atomic::Ordering::Relaxed)
}
/// AUDIT-059 R34. Return a freshly-closed handle slot to the FIFO
/// recycle queue. No-op for the synthetic XAudio range (`>= 0xF000_0000`,
/// AUDIT-048) and the reserved `< 0x1000` band. Call site: `nt_close`'s
/// `objects.remove` branch when refcount reaches zero.
///
/// ABA guard (subsystem-audit 2026-06-12): never recycle a slot that a
/// thread is still parked on. Without this, a closed slot could be
/// re-minted for a new object and a signal on that new object would wake
/// the stale waiter that was blocked on the OLD object at the same slot.
/// Such a slot is simply leaked (it stays out of `free_handles`),
/// reproducing the pre-R34 bump-only behaviour for that rare case.
pub fn release_handle_slot(&mut self, handle: u32) {
if handle < 0x1000 || handle >= 0xF000_0000 {
return;
}
if self.scheduler.any_thread_waiting_on(handle) {
return;
}
self.free_handles.push_back(handle);
}
pub fn alloc_handle_for(&mut self, obj: KernelObject) -> u32 {
let h = self.alloc_handle();
self.objects.insert(h, obj);
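The FIFO recycling and ABA guard added to `alloc_handle` / `release_handle_slot` above can be sketched in isolation. This is a minimal model, not the real `KernelState`: `HandleTable` and its `waited_on` set are hypothetical stand-ins (the set plays the role of `scheduler.any_thread_waiting_on`), and a plain counter replaces the `AtomicU32`.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical stand-in for the handle allocator; not the real `KernelState`.
struct HandleTable {
    next_handle: u32,
    free_handles: VecDeque<u32>,
    /// Handles a thread is still parked on (assumed stand-in for
    /// `scheduler.any_thread_waiting_on`).
    waited_on: HashSet<u32>,
}

impl HandleTable {
    fn new() -> Self {
        Self {
            next_handle: 0x1000,
            free_handles: VecDeque::new(),
            waited_on: HashSet::new(),
        }
    }

    /// Reuse the oldest closed slot first; otherwise bump the counter by 4.
    fn alloc(&mut self) -> u32 {
        if let Some(slot) = self.free_handles.pop_front() {
            return slot;
        }
        let h = self.next_handle;
        self.next_handle += 4;
        h
    }

    /// Queue a closed slot for reuse unless it is reserved, synthetic,
    /// or still has a parked waiter (the ABA guard).
    fn release(&mut self, handle: u32) {
        if handle < 0x1000 || handle >= 0xF000_0000 {
            return;
        }
        if self.waited_on.contains(&handle) {
            return; // deliberately leaked: the slot never re-enters the queue
        }
        self.free_handles.push_back(handle);
    }
}
```

In this model a slot with a waiter is skipped on release, so the next `alloc` returns a different closed slot or a freshly bumped value, and a signal on the new object cannot wake the stale waiter.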
@@ -770,6 +875,173 @@ impl KernelState {
self.audit.record_wake(handle, entry);
}
/// ITERATE-2C Phase D — deposit the latest scheduler instruction
/// count so `nt_create_event` can compute absolute auto-signal
/// deadlines. Called once per outer round from the app's
/// `coord_pre_round` site. The store itself is unconditional;
/// `maybe_register_silph_autosignal` only reads the hint when the
/// feature env is set.
pub fn set_now_cycle_hint(&mut self, now_cycle: u64) {
self.last_cycle_hint = now_cycle;
}
/// ITERATE-2J — tick the `KeTimeStampBundle` block (xboxkrnl ordinal
/// 0x00AD) from the deterministic monotonic clock so the guest sees a
/// clock that *advances*.
///
/// `clock` is the scheduler's `global_clock` — a pure function of
/// retired guest instructions (see [`Self::now_basis_at`] /
/// `Scheduler::global_clock`). Lockstep floors it up to
/// `stats.instruction_count` each round; parallel sums per-block
/// retired counts. Using it (rather than wall-clock) keeps every
/// guest-visible time value a deterministic function of guest progress,
/// so lockstep stays byte-reproducible.
///
/// ## Cadence
/// The existing kernel time math (`parse_timeout` in `exports.rs`)
/// already treats **1 `global_clock` unit ≈ 100 ns**: it converts a
/// signed 100-ns `LARGE_INTEGER` timeout to a deadline by dividing the
/// magnitude by 100 and adding it to `now` (= `global_clock`). To stay
/// coherent with that, this method uses the same scale:
///
/// * `interrupt_time` / `system_time` (100-ns units): `clock` (with a
/// FILETIME epoch base added to `system_time`).
/// * `tick_count` (milliseconds): `clock / INSTRUCTIONS_PER_MS` where
/// `INSTRUCTIONS_PER_MS = 10_000` (10_000 × 100 ns = 1 ms).
///
/// At 10_000 clock-units/ms, the guest's `tick_count + 66` ms hub
/// deadline elapses by ~660_000 retired instructions — very early in a
/// ~1 B-instruction boot — while a 16 ms `KeWait` timeout
/// (`parse_timeout`: 160_000 units) still resolves to 16 ms of
/// tick_count, so no timeout collapses to "instant". The two readers
/// share one scale.
pub fn update_timestamp_bundle(&self, mem: &GuestMemory, clock: u64) {
let block = self.timestamp_bundle_addr;
if block == 0 {
return;
}
const INSTRUCTIONS_PER_MS: u64 = 10_000;
// FILETIME epoch base (~2021) so `system_time` is a plausible
// absolute wall-clock; matches the constant used by
// `ke_query_system_time`. interrupt_time is "since boot" so it
// starts at the clock origin (no epoch offset).
const FILETIME_BASE: u64 = 132_500_000_000_000_000;
let interrupt_time: u64 = clock;
let system_time: u64 = FILETIME_BASE.wrapping_add(clock);
let tick_count: u32 = (clock / INSTRUCTIONS_PER_MS) as u32;
// BE writes (write_u64/write_u32 use to_be_bytes) — guest is BE.
mem.write_u64(block, interrupt_time); // +0x00 interrupt_time
mem.write_u64(block + 0x08, system_time); // +0x08 system_time
mem.write_u32(block + 0x10, tick_count); // +0x10 tick_count (ms)
mem.write_u32(block + 0x14, 0); // +0x14 padding
}
/// ITERATE-2C Phase D — register a freshly-allocated event for
/// auto-signal after the configured delay, **iff** the creating
/// thread matches the silph::UImpl tid=13 chain that wedges in
/// audit-049. Filter:
///
/// * Env `XENIA_SILPH_UI_AUTOSIGNAL_DELAY` set (= delay non-None)
/// * Frame-1 LR (the guest caller's post-bl PC, walked one step up
/// from the live thunk-wrapper frame) is in
/// `[0x821CB15C, 0x821CB160]` — this is the `NtCreateEvent` call
/// site inside `sub_821CB030+0x128`. The live `ctx.lr` is the
/// thunk wrapper's return slot (e.g. `0x824a9f6c`), so we walk
/// one back-chain step to reach the actual guest caller.
/// * Creating thread's `start_entry == 0x821748F0` (silph trampoline)
/// * Creating thread's `start_context == 0x4024a840`
///
/// On match, the handle is queued with `deadline = last_cycle_hint +
/// delay`. Drained by [`Self::fire_due_silph_autosignals`] from the
/// outer scheduler loop.
pub fn maybe_register_silph_autosignal(
&mut self,
handle: u32,
ctx: &PpcContext,
mem: &GuestMemory,
) {
let Some(delay) = self.silph_autosignal_delay else {
return;
};
let Some((entry, start_ctx)) = self.scheduler.current_thread_entry_and_ctx() else {
return;
};
if entry != 0x821748F0 || start_ctx != 0x4024_a840 {
return;
}
let frames = walk_guest_back_chain(ctx.gpr[1] as u32, ctx.lr as u32, mem, 2);
let caller_lr = match frames.get(1) {
Some((_, lr)) => *lr,
None => return,
};
if !(0x821CB15C..=0x821CB160).contains(&caller_lr) {
return;
}
let deadline = self.last_cycle_hint.saturating_add(delay);
self.silph_autosignal_pending
.push(AutoSignalPending { handle, deadline_cycle: deadline });
tracing::info!(
"silph autosignal: scheduled handle={:#x} caller_lr={:#x} for cycle {} (now={}, delay={})",
handle,
caller_lr,
deadline,
self.last_cycle_hint,
delay,
);
}
/// ITERATE-2C Phase D — drain pending entries whose deadline has
/// passed. Each fires by setting `Event { signaled = true }` and
/// invoking the existing `wake_eligible_waiters` to release blocked
/// waiters. No-op when the queue is empty (the common case).
pub fn fire_due_silph_autosignals(&mut self, now_cycle: u64) {
if self.silph_autosignal_pending.is_empty() {
return;
}
let any_due = self
.silph_autosignal_pending
.iter()
.any(|p| p.deadline_cycle <= now_cycle);
if !any_due {
// Diagnostic for the Phase D POC: log first time we visit
// with a non-empty queue but nothing due yet.
if !self.silph_autosignal_diag_logged {
self.silph_autosignal_diag_logged = true;
if let Some(first) = self.silph_autosignal_pending.first() {
tracing::info!(
"silph autosignal: tick (first visit, none due) now={} pending={} first_deadline={}",
now_cycle,
self.silph_autosignal_pending.len(),
first.deadline_cycle,
);
}
}
}
let mut i = 0;
while i < self.silph_autosignal_pending.len() {
if self.silph_autosignal_pending[i].deadline_cycle <= now_cycle {
let p = self.silph_autosignal_pending.swap_remove(i);
let prev = match self.objects.get_mut(&p.handle) {
Some(KernelObject::Event { signaled, .. }) => {
let was = *signaled;
*signaled = true;
Some(was)
}
_ => None,
};
tracing::info!(
"silph autosignal: firing handle={:#x} prev_signaled={:?} at cycle {}",
p.handle,
prev,
now_cycle,
);
self.audit_signal(p.handle, 0, "silph_autosignal", prev.unwrap_or(false) as u64);
crate::exports::wake_eligible_waiters(self, p.handle);
// do not advance i — swap_remove pulled a new entry into i
} else {
i += 1;
}
}
}
/// Diagnostic. If the live PC for HW slot `hw_id` is in
/// `self.ctor_probe_pcs`, emit a single `CTOR-PROBE` line with
/// the current cycle, tid, hw_id, sp, r3, lr, plus an 8-frame
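The cadence described for `update_timestamp_bundle` in this hunk can be checked with a small standalone sketch. It assumes only the layout and constants shown in the diff (big-endian fields at +0x00, +0x08, +0x10, padding at +0x14); `encode_timestamp_bundle` and `tick_count_of` are hypothetical helpers that write into a byte array instead of `GuestMemory`.

```rust
const INSTRUCTIONS_PER_MS: u64 = 10_000;
const FILETIME_BASE: u64 = 132_500_000_000_000_000;

/// Encode the 0x18-byte timestamp block as big-endian bytes.
/// Hypothetical helper: the real code writes through `GuestMemory`.
fn encode_timestamp_bundle(clock: u64) -> [u8; 0x18] {
    let mut block = [0u8; 0x18];
    let system_time = FILETIME_BASE.wrapping_add(clock);
    let tick_count = (clock / INSTRUCTIONS_PER_MS) as u32;
    block[0x00..0x08].copy_from_slice(&clock.to_be_bytes()); // interrupt_time
    block[0x08..0x10].copy_from_slice(&system_time.to_be_bytes()); // system_time
    block[0x10..0x14].copy_from_slice(&tick_count.to_be_bytes()); // tick_count (ms)
    // +0x14..+0x18 is padding and stays zero.
    block
}

/// Read `tick_count` back the way a big-endian guest load at +0x10 would.
fn tick_count_of(block: &[u8; 0x18]) -> u32 {
    u32::from_be_bytes([block[0x10], block[0x11], block[0x12], block[0x13]])
}
```

With this scale, a clock value of 660_000 is the first at which `tick_count` reads 66, which matches the "~660_000 retired instructions" figure quoted for the hub's `tick_count + 66` deadline.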
@@ -936,6 +1208,38 @@ impl KernelState {
}
println!("{}", out);
}
// iterate-2E — pointer-chase: dump base object (gpr[reg]), the
// sub-object it holds at [base+off], and that sub-object's vtable
// slots. Captures the live work-item + stream + vtable[28] at
// sub_824510E0 before the pool recycles the slot. Read-only.
if let Some((reg, deref_off)) = self.audit_deref {
use std::fmt::Write as _;
let base = ctx.gpr[reg as usize] as u32;
let dump64 = |label: &str, p: u32| {
let mut s = String::with_capacity(256);
let _ = write!(&mut s, "AUDIT-DEREF {} ptr={:#010x}", label, p);
let mut o: u32 = 0;
while o < 64 {
let _ = write!(&mut s, " +0x{:02x}={:#010x}", o, mem.read_u32(p.wrapping_add(o)));
o += 4;
}
println!("{}", s);
};
println!("AUDIT-DEREF-HEAD pc={:#010x} tid={} cycle={} reg=r{} off=0x{:x}", pc, tid, cycle, reg, deref_off);
dump64("item", base);
let sub = mem.read_u32(base.wrapping_add(deref_off));
dump64("sub", sub);
let vt = mem.read_u32(sub); // [sub+0] = vtable
// Dump 48 vtable slots so slot 28 (+0x70) and slot 36 (+0x90) show.
let mut s = String::with_capacity(512);
let _ = write!(&mut s, "AUDIT-DEREF vtable={:#010x}", vt);
let mut slot: u32 = 0;
while slot < 48 {
let _ = write!(&mut s, " [{}]={:#010x}", slot, mem.read_u32(vt.wrapping_add(slot * 4)));
slot += 1;
}
println!("{}", s);
}
}
/// M12 — diagnostic. If the live PC for HW slot `hw_id` is in
@@ -1063,6 +1367,30 @@ impl KernelState {
self.pending_timer_fires.first().map(|&(d, _)| d)
}
/// Coherent "now" basis for deadline arithmetic — the scheduler's
/// single monotonic `global_clock`, in BOTH execution modes.
///
/// Per-thread `ctx(hw_id).timebase` is NOT a sound "now" for deadline
/// arithmetic: in `--parallel` workers extract/zero their slots while
/// stepping unlocked, and in **lockstep** a parked/poll thread has
/// `running_idx == None` so `ctx()` returns `idle_ctx` (timebase 0).
/// Either way a `parse_timeout` reading the per-thread basis can see 0
/// (or a stale value) and register `deadline = 0 + relative`, a value
/// permanently in the past, which `coord_idle_advance` then re-arms
/// forever (the timebase-desync livelock; the render-gate root). The
/// `global_clock` is a deterministic function of retired guest
/// instructions (per-round `stats.instruction_count` floor-ups in
/// lockstep, per-block retired counts in parallel), so it is coherent,
/// monotonic, never zero after boot, and bit-reproducible across two
/// cold lockstep runs.
///
/// The `hw_id` argument is retained for call-site clarity (which slot a
/// caller would conceptually be "asking about") but is no longer read —
/// the basis is global.
pub fn now_basis_at(&self, _hw_id: u8) -> u64 {
self.scheduler.global_clock()
}
/// Fire every timer whose deadline is `<= now` (derived from slot 0's
/// timebase, matching `parse_timeout`'s "current thread" fallback).
/// For each fire: mark the timer `signaled=true`, clear its
@@ -1071,7 +1399,7 @@ impl KernelState {
/// fired — the caller uses this to decide whether the scheduler round
/// needs a follow-up `advance_to_next_wake_if_due` step.
pub fn fire_due_timers(&mut self) -> bool {
let now = self.scheduler.ctx(0).timebase; let now = self.now_basis_at(0);
let mut fired = false;
loop {
let Some(&(deadline, handle)) = self.pending_timer_fires.first() else {
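The reason `fire_due_timers` switches to `now_basis_at` can be illustrated with two tiny helpers. Both `deadline_from` and `is_due` are hypothetical names, and the numbers in the usage note are made up for illustration; the sketch only models the arithmetic described in the `now_basis_at` doc comment.

```rust
/// Hypothetical helper: absolute deadline from a "now" basis plus a relative delay.
fn deadline_from(now_basis: u64, relative: u64) -> u64 {
    now_basis.saturating_add(relative)
}

/// A deadline has expired once the comparison clock reaches it.
fn is_due(deadline: u64, clock: u64) -> bool {
    deadline <= clock
}
```

If the global clock is at 1_000_000 and a waiter asks for a 160_000-unit timeout, a stale per-thread basis of 0 yields a deadline of 160_000, which is already due when it is registered. Using the global clock as the basis yields 1_160_000, which only becomes due after the requested delay has elapsed.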


@@ -57,6 +57,11 @@ pub fn allocate_thread_image(
mem.write_u32(pcr_base, tls_base);
mem.write_u32(pcr_base + 0x2C, hw_thread_id as u32);
mem.write_u32(pcr_base + 0x100, 0x1000);
// +0x10C prcb_data.current_cpu — canary `pcr->prcb_data.current_cpu`
// (PRCB@0x100 + current_cpu@0xC). Guest spin-barriers index a
// per-HW-thread slot array by `lbz r11, 268(r13)` = this byte; it
// must equal the HW thread id (== PCR+0x2C). See state.rs PcrWriter.
mem.write_u8(pcr_base + 0x10C, hw_thread_id);
mem.write_u32(pcr_base + 0x150, 0);
Some(ThreadImage {
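Why an unwritten `current_cpu` byte wedges the guest barrier can be shown with a simplified model. This is a sketch under stated assumptions, not a reconstruction of `sub_824D1328`: the real barrier compares a packed multi-byte signature, while `rendezvous_complete` (a hypothetical name) just checks that every expected slot was marked. The slot count of 6 reflects the six hardware threads of the Xbox 360 CPU.

```rust
/// Number of hardware threads (3 cores x 2 threads).
const HW_THREADS: usize = 6;

/// Each participant marks `occupancy[current_cpu]`, mirroring the guest's
/// `lbz r11, 268(r13)` followed by `stbx`. Returns true only if every
/// expected slot ended up marked. Simplified model of the rendezvous.
fn rendezvous_complete(cpu_indices: &[u8], expected_slots: &[u8]) -> bool {
    let mut occupancy = [0u8; HW_THREADS];
    for &idx in cpu_indices {
        if let Some(slot) = occupancy.get_mut(idx as usize) {
            *slot = 1;
        }
    }
    expected_slots
        .iter()
        .all(|&s| occupancy.get(s as usize) == Some(&1))
}
```

When every thread reads index 0, only slot 0 is ever marked and the check for the participants' real slots never succeeds, which is the spin described in the commit message. Once each thread reads its own HW thread id, each expected slot is marked and the rendezvous completes.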


@@ -31,6 +31,9 @@ impl VfsDevice for HostPathDevice {
is_directory: metadata.is_dir(),
size: metadata.len(),
offset: 0,
// Host FS carries no Xbox attribute byte; synthesise the
// DIRECTORY/NORMAL split like canary's HostPathDevice.
attributes: if metadata.is_dir() { 0x10 } else { 0x80 },
});
}
Ok(entries)
@@ -49,6 +52,7 @@ impl VfsDevice for HostPathDevice {
is_directory: metadata.is_dir(),
size: metadata.len(),
offset: 0,
attributes: if metadata.is_dir() { 0x10 } else { 0x80 },
})
}
}


@@ -29,6 +29,11 @@ const GDFX_MAGIC: &[u8; 20] = b"MICROSOFT*XBOX*MEDIA";
/// File attribute: directory
const FILE_ATTRIBUTE_DIRECTORY: u8 = 0x10;
/// File attribute: read-only. Canary OR's this into every GDFX entry's
/// attribute byte because a pressed disc is inherently read-only
/// (`disc_image_device.cc:154`: `attributes | kFileAttributeReadOnly`).
const FILE_ATTRIBUTE_READONLY: u8 = 0x01;
/// Known game partition offsets to try
const LIKELY_OFFSETS: &[u64] = &[
0x0000_0000,
@@ -131,6 +136,11 @@ impl DiscImageDevice {
let name = String::from_utf8_lossy(&buffer[p + 14..p + 14 + name_length]).to_string();
let is_directory = (attributes & FILE_ATTRIBUTE_DIRECTORY) != 0;
// Match canary: the on-disc attribute byte (DIRECTORY/HIDDEN/SYSTEM/
// ARCHIVE/NORMAL bits as authored) OR the implicit READONLY bit for
// pressed media. We forward the FULL byte, not a path-shape guess, so
// attribute queries report exactly what the disc records.
let attributes = (attributes | FILE_ATTRIBUTE_READONLY) as u32;
let file_offset = self.game_offset + sector * SECTOR_SIZE;
let full_path = if prefix.is_empty() {
name.clone()
@@ -143,6 +153,7 @@ impl DiscImageDevice {
is_directory,
size: length,
offset: file_offset,
attributes,
});
// Descend into subdirectories. Zero-length directory entries exist
@@ -260,4 +271,73 @@ mod tests {
.expect("read_file on nested path");
assert!(!bytes.is_empty(), "nested read returned empty buffer");
}
/// Build a one-node GDFX directory buffer in memory and parse it with
/// `collect_entries`, asserting the real on-disc attribute byte is
/// forwarded into `VfsEntry.attributes` (with READONLY OR'd in, matching
/// canary `disc_image_device.cc:154`) rather than synthesised from the
/// path shape.
fn parse_single_entry(name: &str, on_disc_attr: u8) -> VfsEntry {
// GDFX dirent: node_l(u16) node_r(u16) sector(u32) length(u32)
// attributes(u8) name_length(u8) name(bytes). The directory bit
// gates subdirectory descent; use length=0 so a "directory" entry
// is treated as an empty leaf and we don't recurse off the buffer.
let mut buf = Vec::new();
buf.extend_from_slice(&0u16.to_le_bytes()); // node_l
buf.extend_from_slice(&0u16.to_le_bytes()); // node_r
buf.extend_from_slice(&0u32.to_le_bytes()); // sector
buf.extend_from_slice(&0u32.to_le_bytes()); // length (0 => leaf)
buf.push(on_disc_attr); // attributes
buf.push(name.len() as u8); // name_length
buf.extend_from_slice(name.as_bytes());
let mut dev = DiscImageDevice {
name: "test".into(),
path: std::path::PathBuf::new(),
game_offset: 0,
entries: Vec::new(),
};
// `file` is only touched when descending into a non-empty directory;
// our length=0 entries never recurse, so a dummy handle is fine.
let mut file = std::fs::File::open("/dev/null").expect("open /dev/null");
dev.collect_entries(&mut file, &buf, 0, "").expect("parse");
assert_eq!(dev.entries.len(), 1);
dev.entries.into_iter().next().unwrap()
}
#[test]
fn directory_entry_reports_directory_attribute() {
// On-disc 0x10 (DIRECTORY) -> attributes carries 0x10 and READONLY.
let e = parse_single_entry("dat", FILE_ATTRIBUTE_DIRECTORY);
assert!(e.is_directory, "directory bit not decoded");
assert_ne!(
e.attributes & 0x10,
0,
"FILE_ATTRIBUTE_DIRECTORY must be set for a directory entry"
);
assert_ne!(e.attributes & 0x01, 0, "READONLY must be OR'd in (canary)");
}
#[test]
fn file_entry_has_no_directory_attribute() {
// On-disc 0x80 (NORMAL) -> not a directory; READONLY still OR'd in.
let e = parse_single_entry("default.xex", 0x80);
assert!(!e.is_directory, "non-directory misdecoded as directory");
assert_eq!(
e.attributes & 0x10,
0,
"FILE_ATTRIBUTE_DIRECTORY must be clear for a file entry"
);
assert_ne!(e.attributes & 0x80, 0, "NORMAL bit must be preserved");
assert_ne!(e.attributes & 0x01, 0, "READONLY must be OR'd in (canary)");
}
#[test]
fn archive_and_hidden_bits_are_preserved() {
// ARCHIVE(0x20) | HIDDEN(0x02) authored on disc must survive intact.
let e = parse_single_entry("save.dat", 0x20 | 0x02);
assert_eq!(e.attributes & 0x20, 0x20, "ARCHIVE bit dropped");
assert_eq!(e.attributes & 0x02, 0x02, "HIDDEN bit dropped");
assert_eq!(e.attributes & 0x10, 0, "spurious DIRECTORY bit");
}
}
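The dirent layout that the tests above rely on (little-endian fields, attribute byte at +12, name length at +13, name from +14) can be exercised without a `DiscImageDevice`. The following is a minimal sketch: `Dirent` and `parse_dirent` are hypothetical names, the parser handles a single node only, and it ignores the `node_l` / `node_r` tree links and subdirectory descent that `collect_entries` performs.

```rust
const FILE_ATTRIBUTE_READONLY: u8 = 0x01;
const FILE_ATTRIBUTE_DIRECTORY: u8 = 0x10;

/// Hypothetical decoded view of one GDFX directory node.
struct Dirent {
    sector: u32,
    length: u32,
    attributes: u32,
    is_directory: bool,
    name: String,
}

/// Parse a single dirent from `buf`. Returns `None` if the buffer is too
/// short for the fixed header or for the declared name length.
fn parse_dirent(buf: &[u8]) -> Option<Dirent> {
    if buf.len() < 14 {
        return None;
    }
    // +0 node_l (u16) and +2 node_r (u16) are skipped in this sketch.
    let sector = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]);
    let length = u32::from_le_bytes([buf[8], buf[9], buf[10], buf[11]]);
    let on_disc = buf[12];
    let name_len = buf[13] as usize;
    let name_bytes = buf.get(14..14 + name_len)?;
    Some(Dirent {
        sector,
        length,
        // Forward the full on-disc byte and OR in READONLY for pressed media.
        attributes: (on_disc | FILE_ATTRIBUTE_READONLY) as u32,
        is_directory: on_disc & FILE_ATTRIBUTE_DIRECTORY != 0,
        name: String::from_utf8_lossy(name_bytes).to_string(),
    })
}
```

Feeding it an entry authored with ARCHIVE and HIDDEN (`0x22`) yields `attributes == 0x23`: both authored bits survive and READONLY is added, which is the behaviour `archive_and_hidden_bits_are_preserved` asserts.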


@@ -22,6 +22,16 @@ pub struct VfsEntry {
pub is_directory: bool,
pub size: u64,
pub offset: u64,
/// Xbox `FILE_ATTRIBUTE_*` bitmask for this entry, sourced from the
/// backing device's real on-disc metadata rather than inferred from
/// the path shape. For GDFX disc images this is the on-disc attribute
/// byte at dirent offset +12 OR'd with `FILE_ATTRIBUTE_READONLY`
/// (matches xenia-canary `disc_image_device.cc:154`:
/// `entry->attributes_ = attributes | kFileAttributeReadOnly`).
///
/// Bit layout (canary `vfs/entry.h:66-76`): READONLY=0x01, HIDDEN=0x02,
/// SYSTEM=0x04, DIRECTORY=0x10, ARCHIVE=0x20, NORMAL=0x80.
pub attributes: u32,
}
/// Trait for VFS device implementations (XISO, STFS, host path, etc.)
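The bit layout documented on `VfsEntry.attributes` can be turned into a small decoder for reading logged values. This is an illustrative sketch: `attribute_names` and `host_attributes` are hypothetical helpers, the latter mirroring the DIRECTORY/NORMAL split that the host-path device synthesises in this commit.

```rust
const READONLY: u32 = 0x01;
const HIDDEN: u32 = 0x02;
const SYSTEM: u32 = 0x04;
const DIRECTORY: u32 = 0x10;
const ARCHIVE: u32 = 0x20;
const NORMAL: u32 = 0x80;

/// List the flag names set in `mask`, in ascending bit order.
fn attribute_names(mask: u32) -> Vec<&'static str> {
    const FLAGS: [(u32, &str); 6] = [
        (READONLY, "READONLY"),
        (HIDDEN, "HIDDEN"),
        (SYSTEM, "SYSTEM"),
        (DIRECTORY, "DIRECTORY"),
        (ARCHIVE, "ARCHIVE"),
        (NORMAL, "NORMAL"),
    ];
    FLAGS
        .iter()
        .filter(|(bit, _)| mask & bit != 0)
        .map(|(_, name)| *name)
        .collect()
}

/// Host files carry no Xbox attribute byte, so only the
/// DIRECTORY/NORMAL split is synthesised.
fn host_attributes(is_dir: bool) -> u32 {
    if is_dir { DIRECTORY } else { NORMAL }
}
```

A GDFX directory entry therefore decodes as READONLY plus DIRECTORY (`0x11`), while a host-path directory decodes as DIRECTORY alone.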