xenia-rs

Author	SHA1	Message	Date
MechaCat02	ad9c8e4cb8	[iterate-2U] VdGlobalDevice: allocate a real device cell so the swap counter (clock B) can advance Sylpheed's title loop re-runs its per-frame manager update sub_821741C8 only when "clock B" ([controller+88], the swap count) changes. Clock B's sole source is the CP swap-complete callback sub_824CE2B8, which bumps [gfx+15160] via the TWO-LEVEL deref [[VdGlobalDevice]+0]+15160, where VdGlobalDevice is the kernel variable export 0x01BE at guest .data 0x82000750. Ours patched that import slot with literal 0 (the old "passed through to Vd* shims, write 0" behaviour). Consequences, both confirmed at runtime: * the guest's graphics init stores its D3D device object via `stw r31, 0([0x82000750])` (sub_824C6DC0 @0x824C6F18) — with the slot 0, that store lands at address 0; * the swap callback reads [[0x82000750]] = [0] = 0 and increments [0+15160] (the null page) instead of the real device's swap counter. So [gfx+15160] never moved, clock B stayed frozen at 0, sub_821741C8 fired exactly once, and the game submitted one render batch (the 78-draw splash) then stalled. Fix mirrors xenia-canary RegisterVideoExports (xboxkrnl_video.cc:557-564) exactly: allocate a 4-byte cell, point the import slot at it, zero the cell. The guest then stores its device into the cell, and the callback's two-level deref resolves correctly. Verified: [0x82000750] now holds a real cell whose [+0] is the device (gfx state), the swap callback bumps [gfx+15160] 0->1, clock B advances, and the per-frame chain steps forward (sub_821741C8 fires 1->2x, GamePart update sub_821C7CB8 0->1x). Determinism: --gpu-inline digest re-baselined and byte-identical across runs. The fix shifts the early execution trajectory (clock B unfreezing), so the n50m golden moves imports 451500->178937 and instructions 50000001->50000014; draws/swaps/RTs/shaders unchanged (78/4/2/3). n2m golden unchanged (early boot, pre-fix-effect). 675 workspace tests green; sylpheed_n50m oracle green. 
Note: this clears the FIRST hard blocker (clock B could never advance at all). Full per-frame sustain (draws past 78) needs a further step: each GamePart update must submit a per-frame command buffer (with PM4_INTERRUPT) during the asset-streaming phase to keep generating CP interrupts; ours currently produces only the single seed interrupt from the initial batch, so the chain advances once and re-stalls. Tracked for the next iterate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 16:20:08 +02:00
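A minimal sketch of the two-level deref described in the iterate-2U commit, using a toy word-addressed guest memory. `GuestMem`, the helper function names, and the cell/device addresses used with them are hypothetical; only the import-slot address 0x82000750 and the +15160 swap-counter offset come from the commit text.

```rust
use std::collections::HashMap;

/// Toy guest memory: aligned 32-bit words, unmapped words read as 0 (hypothetical).
#[derive(Default)]
struct GuestMem {
    words: HashMap<u32, u32>,
}

impl GuestMem {
    fn read_u32(&self, addr: u32) -> u32 {
        self.words.get(&addr).copied().unwrap_or(0)
    }
    fn write_u32(&mut self, addr: u32, value: u32) {
        self.words.insert(addr, value);
    }
}

/// Import slot for the VdGlobalDevice kernel variable export (0x01BE).
const VD_GLOBAL_DEVICE_SLOT: u32 = 0x8200_0750;
/// [device + 15160] holds the swap count ("clock B").
const SWAP_COUNTER_OFFSET: u32 = 15160;

/// Kernel side: point the import slot at a fresh, zeroed 4-byte cell.
fn register_vd_global_device(mem: &mut GuestMem, cell_addr: u32) {
    mem.write_u32(VD_GLOBAL_DEVICE_SLOT, cell_addr);
    mem.write_u32(cell_addr, 0);
}

/// Guest graphics init: `stw r31, 0([0x82000750])` stores the device into the cell.
fn guest_store_device(mem: &mut GuestMem, device: u32) {
    let cell = mem.read_u32(VD_GLOBAL_DEVICE_SLOT);
    mem.write_u32(cell, device);
}

/// Swap-complete callback: increments [[VdGlobalDevice]+0]+15160.
fn swap_callback(mem: &mut GuestMem) {
    let cell = mem.read_u32(VD_GLOBAL_DEVICE_SLOT); // first level: slot -> cell
    let device = mem.read_u32(cell); // second level: cell -> device
    let counter_addr = device.wrapping_add(SWAP_COUNTER_OFFSET);
    let count = mem.read_u32(counter_addr);
    mem.write_u32(counter_addr, count.wrapping_add(1));
}
```

With a real cell in the slot, the guest's store and the callback's read agree on where the device pointer lives, so the counter at `device + 15160` is the one that moves.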
MechaCat02	873c197ff1	[iterate-2T] VdSwap: route present through ring PM4_XE_SWAP, drop out-of-band swap interrupt Make ours' VdSwap present path faithful to xenia-canary `VdSwap_entry` (xboxkrnl_video.cc:518-548): write the reserved 64-dword ring slot with a PM4_TYPE0 fetch-constant patch + PM4_TYPE3(PM4_XE_SWAP) + NOP padding, then let the natural drain consume the swap packet in command-stream order. Remove the synthetic CP swap-complete interrupt that `notify_xe_swap` raised out-of-band. Root found this session (the actual present-path bug): ours' `notify_xe_swap` pushed an `InterruptSource::Swap` (→ INTERRUPT_SOURCE_CP) interrupt directly from the VdSwap HLE, decoupled from the GPU command stream. When that interrupt reached the graphics ISR `sub_824BE9A0` before D3D had armed its swap-callback slot (`[gfx+10772]+16` still the `0xBADF00D` placeholder), the ISR took its error path and hit the assert "ERR[D3D]: Unanticipated CPU_INTERRUPT. Sign of a corrupt command buffer?" (`bl sub_824C5DF0; twi` at 0x824BE9DC) — 2x per run on master. Canary's VdSwap raises NO interrupt; swap-complete CP interrupts come only from in-stream PM4_INTERRUPT packets, which are naturally ordered after the callback-arming Type-0 writes. Routing the swap through the ring packet matches that ordering and eliminates the trap (2 -> 0). Canary oracle confirmation (muted, audit_mem_watch + audit_jit_prolog_pc): canary's early/loading loop is present-driven — swap counter [gfx+15160] (0xBE56CA38) advances ~per-vblank from vblank 65 onward, reaching 0xD02 (3330) in ~60s via 6184 CP source=1 interrupts, with VdSwap called only ONCE. So the present interrupts are entirely in-stream, not from the VdSwap export. This is a correctness/faithfulness fix; it does NOT cascade. draws stay 78 at 200M and 1B because the upstream gate persists: the game submits one render batch then stalls (renderer sub_82506xxx 0x; 2nd title thread 0x821748F0 never spawns). 
The per-frame loop sub_822F1AA8 runs ~1207 iterations on vsync but clock B (swap count) only advances ~once, so the manager update sub_821741C8 fires once. That is the iterate-2Q/2F title-pipeline gate, not a present/interrupt bug. swaps 3 -> 4 (the in-stream PM4_XE_SWAP now drains). Deterministic in inline mode (n50m --gpu-inline --stable-digest regenerated byte-identical twice; golden re-baselined: swaps 3 -> 4). cargo test --workspace 675 passing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 15:20:02 +02:00
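The ring-slot layout the iterate-2T commit describes (Type-0 fetch-constant patch, Type-3 swap packet, NOP padding to 64 dwords) can be sketched with the usual PM4 header encodings. The header bit layouts follow the Xenos/xenia convention as I understand it; the `PM4_XE_SWAP` opcode value, the fetch register index, and the payload contents are caller-supplied assumptions here, not values taken from canary. The `walk` helper shows that a drain consumes the slot strictly in stream order.

```rust
const SLOT_DWORDS: usize = 64;

/// Type-0: write `count` consecutive registers starting at `base_index`.
fn make_type0(base_index: u16, count: u16) -> u32 {
    (((u32::from(count) - 1) & 0x3FFF) << 16) | (u32::from(base_index) & 0x7FFF)
}

/// Type-2: single-dword NOP filler.
fn make_type2() -> u32 {
    0x8000_0000
}

/// Type-3: opcode packet followed by `count` payload dwords.
fn make_type3(opcode: u8, count: u16) -> u32 {
    0xC000_0000 | (((u32::from(count) - 1) & 0x3FFF) << 16) | ((u32::from(opcode) & 0x7F) << 8)
}

/// Fill a reserved 64-dword slot: fetch patch, swap packet, then NOPs.
fn build_swap_slot(fetch_reg: u16, fetch: &[u32], swap_opcode: u8, payload: &[u32]) -> [u32; SLOT_DWORDS] {
    assert!(!fetch.is_empty() && !payload.is_empty(), "PM4 packets carry at least one dword");
    assert!(2 + fetch.len() + payload.len() <= SLOT_DWORDS, "packets must fit the reserved slot");
    let mut slot = [make_type2(); SLOT_DWORDS]; // pre-fill with NOP padding
    let mut i = 0;
    slot[i] = make_type0(fetch_reg, fetch.len() as u16);
    i += 1;
    slot[i..i + fetch.len()].copy_from_slice(fetch);
    i += fetch.len();
    slot[i] = make_type3(swap_opcode, payload.len() as u16);
    i += 1;
    slot[i..i + payload.len()].copy_from_slice(payload);
    slot
}

/// Dwords occupied by the packet starting with `header`.
fn packet_len(header: u32) -> usize {
    let count = (((header >> 16) & 0x3FFF) + 1) as usize;
    match header >> 30 {
        0 => 1 + count, // type 0: header + register values
        1 => 3,         // type 1: header + two register values
        2 => 1,         // type 2: NOP
        _ => 1 + count, // type 3: header + payload
    }
}

/// Walk the slot in stream order, returning each packet header.
fn walk(slot: &[u32]) -> Vec<u32> {
    let mut headers = Vec::new();
    let mut i = 0;
    while i < slot.len() {
        headers.push(slot[i]);
        i += packet_len(slot[i]);
    }
    headers
}
```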
MechaCat02	1ae472bd2b	[iterate-2S] GPU: implement CP SCRATCH_REG memory writeback — arms Sylpheed's swap-callback slot Sylpheed renders the splash (draws=78, iterate-2O) then plateaus: the title's per-frame manager (sub_821741C8) only re-fires when "clock B" ([gfx+15160], swap count) changes, which only the CP swap-complete callback sub_824CE2B8 increments. The graphics ISR sub_824BE9A0 indirect-calls that callback via [[gfx+10772]+16] on CP (source=1) interrupts, but the slot stayed NULL so the callback never ran. Root (runtime-verified, ours-side GPU): the guest arms the slot through the Xenos CP scratch-register writeback path, which ours never implemented. The arming IB (drained by ours at 0x4adf5180) contains a Type-0 register write of the callback PC 0x824ce2b8 into SCRATCH_REG4 (0x057C). On hardware/canary, writing a SCRATCH_REG{n} mirrors the value to SCRATCH_ADDR + n*4 in memory when the matching SCRATCH_UMSK bit is set. Runtime values: SCRATCH_ADDR=0x0b1d5000 (the [gfx+10772] descriptor), SCRATCH_UMSK=0x20033 (bit 4 set), so SCRATCH_REG4 -> 0x0b1d5010 = descriptor+16 = the callback slot (0x4b1d5010). Ours decoded the Type-0 write into the register file but performed no writeback (case a: drained-but-mishandled), so the slot stayed NULL. Fix mirrors canary's CommandProcessor::HandleSpecialRegisterWrite (command_processor.cc:545-552): a scratch_register_writeback() helper called from handle_type0/handle_type1 after every register write; for SCRATCH_REG0..7 with the UMSK bit set, it writes the value (big-endian, as mem.write_u32 already stores) to SCRATCH_ADDR + n*4 (projected via physical_to_backing). Deterministic given identical register state; proven by unit test. Cascade (verified by runtime probe): slot 0x4b1d5010 now armed with 0x824ce2b8; on the 2-3 CP interrupts that fire, the ISR reads the slot and bcctrl's into sub_824CE2B8 (runs 2x; 0x cascade on master); sub_824CE2B8 increments clock B ([gfx+15160]).
The cascade does NOT yet reach draws>78: there are only ~3 CP interrupts (from the initial 9,825-packet batch), and the title render loop stalls upstream (the iterate-2Q title-respawn gate) before it submits more PM4_INTERRUPT work, so the callback can't bootstrap a self-sustaining loop. This closes the remaining update-17/18 arming gap; the upstream stall is the next gate. The default threaded GPU backend drains the ring on a separate host thread, so with the callback now doing work the exact CP-interrupt delivery instruction varies run to run (pre-existing GPU-thread race). Pin the n50m oracle test to --gpu-inline (instruction-count deterministic) and re-baseline its golden; bit-exact across repeated runs. New unit test scratch_reg_write_mirrors_to_memory_when_umsk_enabled. Tests: 675 pass (was 674). Golden re-baselined + determinism verified. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 14:21:30 +02:00
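A sketch of the scratch-register mirror rule from the iterate-2S commit, written as a pure function so it can be checked against the runtime values quoted there. `SCRATCH_REG0 = 0x0578` is inferred from `SCRATCH_REG4 = 0x057C`; the function and helper names are hypothetical, and the caller is assumed to project the returned address (e.g. through `physical_to_backing`) before storing.

```rust
/// SCRATCH_REG4 is 0x057C, so the eight scratch registers span 0x0578..=0x057F.
const SCRATCH_REG0: u32 = 0x0578;
const SCRATCH_REG7: u32 = SCRATCH_REG0 + 7;

/// If a register write must be mirrored to memory, return (guest address, value).
fn scratch_writeback(index: u32, value: u32, scratch_addr: u32, scratch_umsk: u32) -> Option<(u32, u32)> {
    if !(SCRATCH_REG0..=SCRATCH_REG7).contains(&index) {
        return None; // not a scratch register
    }
    let n = index - SCRATCH_REG0;
    if (scratch_umsk & (1 << n)) == 0 {
        return None; // mirroring disabled for this register
    }
    Some((scratch_addr.wrapping_add(n * 4), value))
}

/// Store a u32 in guest (big-endian) byte order.
fn store_be_u32(mem: &mut [u8], offset: usize, value: u32) {
    mem[offset..offset + 4].copy_from_slice(&value.to_be_bytes());
}
```

Using the commit's values (SCRATCH_ADDR=0x0b1d5000, SCRATCH_UMSK=0x20033), a write of 0x824ce2b8 to SCRATCH_REG4 lands at descriptor+16, while SCRATCH_REG2 (bit 2 clear in the mask) is not mirrored.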
MechaCat02	034ec8b47f	[iterate-2O] GPU: drain indirect buffers correctly — Sylpheed renders splash (draws 0→78) Ours' GPU never drained the D3D driver's system command buffer past the first 11-dword indirect buffer, so DRAW_INDX / reg-0x57C-arm packets never executed and draws stayed 0 (the long-hunted render gate; see UPDATE-18). Runtime tracing (temporary, removed) showed the guest submits 6 INDIRECT_BUFFER packets at boot (CP_RB_WPTR 22→37) but ours executed exactly ONE IB and then spun 15.7M packets inside it. Three coupled command-processor bugs, all corrected to match canary: 1. `sync_with_mmio` applied the primary CP_RB_WPTR to whichever ring was active, including an executing indirect buffer — `37 % 11 = 4` clobbered the IB's write pointer so its read pointer looped 0→2→5→0 forever and never popped back to the primary ring. CP_RB_WPTR governs ONLY the primary ring; while an IB executes, the primary is the bottom of the IB stack. Canary executes each IB through a separate `RingBuffer reader_` (command_processor.cc), so the primary write pointer is structurally inapplicable to an IB. 2. Indirect buffers were treated as circular rings: read wrapped at `size_dwords` (`11 % 11 = 0`) and never reached the fixed write pointer, so even without the clobber the IB could not terminate. An IB is a fixed linear sub-stream; add `RingBufferView.indirect` and drain `[0, ib_size)` monotonically, then pop. 3. `is_ready` only checked the active ring, so an IB that now correctly exhausts would never get `execute_one` called again to pop back to the primary ring (whose WPTR may have advanced). Check the whole IB stack. Also: the ring was sized `1 << size_log2` bytes (1024 dwords) vs canary's `1 << (size_log2 + 3)` (8192 dwords) — an 8× undersize that desynced WPTR-wrap math from the guest. Fixed in `GpuSystem::initialize_ring_buffer` (and the dead bookkeeping copy in `vd_initialize_ring_buffer`).
Cascade (deterministic; threaded-default backend, byte-identical across runs): reg 0x57C now written, IB jumps 1→12, packets 15.7M→9,825, and the splash renders — draws 0→78, shaders 0→3, render_targets 0→2, swaps 2→3 — stable at 50M / 200M / 1B. Boot then reaches a new downstream gate (draws plateau at 78, interrupts keep climbing → engine alive, not deadlocked). golden `sylpheed_n50m.json` re-baselined (draws 78). `cargo test --workspace` green (674; +2 ring_view regression tests). vd_swap's synthetic-swap short-circuit is now redundant but left untouched (cascade works without changing it); cleaning it up is a separate follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 22:06:16 +02:00
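The three command-processor rules from iterate-2O (CP_RB_WPTR touches only the primary ring, IBs drain linearly without wrapping, readiness considers the whole IB stack) can be modelled in a few lines. This is an illustrative sketch: `RingView`, `CommandProcessor`, and their methods are hypothetical stand-ins rather than ours' actual `RingBufferView` API, and packet decoding is omitted.

```rust
#[derive(Debug, Clone)]
struct RingView {
    size_dwords: u32,
    rptr: u32,
    wptr: u32,
    indirect: bool,
}

impl RingView {
    /// Primary ring: circular; its write pointer comes from CP_RB_WPTR.
    fn primary(size_dwords: u32) -> Self {
        RingView { size_dwords, rptr: 0, wptr: 0, indirect: false }
    }
    /// Indirect buffer: fixed linear sub-stream [0, size).
    fn indirect(size_dwords: u32) -> Self {
        RingView { size_dwords, rptr: 0, wptr: size_dwords, indirect: true }
    }
    fn has_work(&self) -> bool {
        if self.indirect { self.rptr < self.size_dwords } else { self.rptr != self.wptr }
    }
    /// Consume `dwords`: IBs advance monotonically, the primary ring wraps.
    fn advance(&mut self, dwords: u32) {
        if self.indirect {
            self.rptr = (self.rptr + dwords).min(self.size_dwords);
        } else {
            self.rptr = (self.rptr + dwords) % self.size_dwords;
        }
    }
}

struct CommandProcessor {
    primary: RingView,
    ib_stack: Vec<RingView>,
}

impl CommandProcessor {
    /// CP_RB_WPTR governs ONLY the primary ring, even while an IB executes.
    fn sync_wptr(&mut self, wptr: u32) {
        self.primary.wptr = wptr;
    }
    /// Ready if the primary has work or any IB (even an exhausted one) awaits a pop.
    fn is_ready(&self) -> bool {
        self.primary.has_work() || !self.ib_stack.is_empty()
    }
    /// Pop exhausted IBs so execution falls back to the enclosing stream.
    fn pop_exhausted(&mut self) {
        while matches!(self.ib_stack.last(), Some(ib) if !ib.has_work()) {
            self.ib_stack.pop();
        }
    }
}

/// Ring size as canary computes it: 1 << (size_log2 + 3) bytes, in dwords.
fn ring_size_dwords(size_log2: u32) -> u32 {
    (1u32 << (size_log2 + 3)) / 4
}
```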
MechaCat02	93f60a3ba0	[iterate-2M] PCR+0x10C (PRCB.current_cpu): init per-HW-thread to unwedge spin-barrier Ours never initialized the PRCB `current_cpu` byte at PCR+0x10C (prcb_data@0x100 + current_cpu@0xC). Canary sets it from `GetFakeCpuNumber(affinity)` (xthread.cc:847 `pcr->prcb_data.current_cpu = cpu_index`), which equals the HW thread id ours already writes at PCR+0x2C. Left unwritten it read 0 for every thread. Guest spin-barrier `sub_824D1328` (used by the audio/update pump threads at entries 0x824D2878 / 0x824D2940, ours tid 9 / tid 10) indexes a per-HW-thread occupancy byte array via `lbz r11, 268(r13)` then `stbx ..., [array+index]`. With index 0 for all threads, every thread marked slot 0; the multi-byte rendezvous signature it then spins on (`ld [obj+0x164]` compared against the packed per-slot expectation) could never assemble. Both pump threads busied at pc 0x824d140c/0x824d1410 forever (Ready, 5M+ barrier iterations) and never ran their `KeSetEvent` loops — so the events they signal (the 21k-per-thread heartbeat in canary) never fired, starving the downstream worker handshake. Fix: write `hw_id` to PCR+0x10C alongside PCR+0x2C in both the static thread image init (thread.rs) and the dynamic PcrWriter (state.rs, used by scheduler spawn + affinity migration) so the two stay in sync. Runtime-verified BOTH engines. Post-fix the pump threads escape the barrier (barrier iterations 5M+ -> 3) and advance into their loop bodies, now correctly Blocked(WaitAny) at pc 0x824d28d0 / 0x824d29c0 (was spinning at 0x824d140c). imports at n50M 339,766 -> 451,508; deterministic (two cold runs byte-identical). draws still 0 (a later, separate render gate). golden re-baselined. cargo test --workspace: 672 passed, 0 failed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 18:08:46 +02:00
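Why a shared index of 0 wedges the barrier can be seen in a toy model of the occupancy array from iterate-2M. The layout here (one byte per HW thread, eight bytes read back as one big-endian u64 signature, matching the width of the `ld [obj+0x164]` compare) is an assumption for illustration, as are the function names and the HW ids used with them.

```rust
/// PRCB.current_cpu inside the PCR: prcb_data (0x100) + current_cpu (0xC) = 268.
const PCR_CURRENT_CPU: usize = 0x10C;

/// Hypothetical PCR writer: keep PCR+0x10C equal to the HW thread id.
fn write_current_cpu(pcr: &mut [u8], hw_id: u8) {
    pcr[PCR_CURRENT_CPU] = hw_id;
}

/// Guest arrival: `lbz r11, 268(r13)` then `stbx` into the occupancy array.
fn arrive(occupancy: &mut [u8; 8], pcr: &[u8]) {
    let index = pcr[PCR_CURRENT_CPU] as usize;
    occupancy[index] = 1;
}

/// Packed signature the spinner compares (assumed big-endian u64 view).
fn signature(occupancy: &[u8; 8]) -> u64 {
    u64::from_be_bytes(*occupancy)
}

/// Signature expected once every participating HW thread has arrived.
fn expected_signature(hw_ids: &[u8]) -> u64 {
    let mut occ = [0u8; 8];
    for &id in hw_ids {
        occ[id as usize] = 1;
    }
    signature(&occ)
}
```

When both participants read 0 at PCR+0x10C they both mark slot 0, so the packed value can never equal the expected signature and the spin never terminates.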
MechaCat02	2bdb93e51e	[iterate-2K] GPU physical-mirror aliasing: ring/IB/RPtr/resolve read wrong host region Root cause (physical-mirror aliasing gap → GPU read wrong region → ring never truly drained → render worker ring-space wait → no frame → no draw): The Xbox 360 maps its 512 MB of physical DRAM into several virtual mirror windows differing only in cache policy — bare physical (0x0xxxxxxx), write-combine (0x4xxxxxxx), and cached 0xA/0xC/0xExxxxxxx — all aliasing addr & 0x1FFF_FFFF. Ours has one flat membase and `heap_alloc` (MmAllocatePhysicalMemoryEx) commits physical backing in the 0x4xxxxxxx window. The guest masks its CP-ring allocation base to bare physical (0x4adcc000 & 0x1FFFFFFF = 0x0adcc000) before handing it to VdInitializeRingBuffer, and PM4 INDIRECT_BUFFER / writeback / resolve pointers are likewise bare-physical. Ours stored those verbatim and read `membase + 0x0adcc000`, a never-committed zero-filled page — so the GPU drained ~718k zero PM4 headers, never executed the real Type3/DRAW stream, and the RPtr writeback landed on a zero page the render worker (tid=8) polls, freezing it forever. Fix (GPU/Vd-boundary translation, not memory-layer): add `physical_to_backing(addr)` deriving the committed backing exactly from `heap_alloc`'s placement (0x4000_0000 | (addr & 0x1FFF_FFFF), idempotent for the WC window, flat for non-physical code/stack). Apply it at every point the GPU/kernel consumes a guest physical address: ring base (initialize_ring_buffer), RPtr writeback (enable_rptr_writeback), PM4 INDIRECT_BUFFER pointer, WAIT_REG_MEM / COND_WRITE memory poll+write, REG_TO_MEM / MEM_WRITE / EVENT_WRITE* / LOAD_ALU_CONSTANT / IM_LOAD addresses, the resolve dest write, and the vd_swap frontbuffer present read. This was chosen over memory-layer aliasing because the latter re-projects every CPU load/store and corrupts the guest's flat 0xA/0xC/0xE accesses (it caused an early PC=0xfffffffc fault).
Two adjacent GPU-backend gates this exposed and also fixed (canary-faithful): - WaitCmp::from_wait_info was off by one vs canary's MatchValueAndRef selector (it decoded wait_info&7==3 as NotEqual instead of Equal), inverting the standard CP coherency wait so the GPU parked forever on the first INDIRECT_BUFFER. Remapped to 1=Less..7=Always, 0=Never. - Added MakeCoherent: a WAIT polling COHER_STATUS_HOST clears the status bit (mirrors command_processor.cc:801-838) so the coherency handshake resolves. Result: the GPU now decodes the real Type3 packets at 0x4adcc000 (ME_INIT, INDIRECT_BUFFER → real Type0/WAIT_REG_MEM at 0x4adf5080) instead of zero-headers; RPtr at 0x408619fc advances (0x13, 0x16, … written by the GPU worker); the frame loop sub_822F1AA8 actively writes the controller at 0x40d09a40 (0x20→0x21→0x23); no fault, full 200M/1B budget runs clean. draws_seen is still 0: the remaining gate is upstream and separate — the main frame loop never sets controller bit-28 (frame-ready) at [0x40d09a40] (stalls at 0x23, the known iterate-2C state-divergence gate), so the guest never enqueues a render IB; the GPU only ever replays the init IB. This fix correctly unblocks the GPU ring/IB/RPtr data path (gate-2 GPU backend); the bit-28 frame-ready gate is the next target. Stable golden (sylpheed_n50m) unchanged (draws/swaps/RTs/shaders identical at 50M); regenerated twice byte-identical. cargo test --workspace: 672 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 13:39:57 +02:00
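A sketch of the two rules from iterate-2K. The window boundaries are assumptions: this version projects only the bare-physical (0x0000_0000..0x2000_0000) and write-combine (0x4000_0000..0x6000_0000) windows and leaves everything else flat, since the commit does not spell out how the real `physical_to_backing` treats 0xA/0xC/0xE addresses. The comparison table follows canary's MatchValueAndRef ordering as I understand it; the commit itself only states 0=Never, 1=Less, 3=Equal, and 7=Always.

```rust
/// Project a GPU-visible physical address onto the 0x4xxxxxxx committed backing.
fn physical_to_backing(addr: u32) -> u32 {
    let is_bare = addr < 0x2000_0000; // bare physical mirror (assumed range)
    let is_wc = (0x4000_0000..0x6000_0000).contains(&addr); // write-combine window
    if is_bare || is_wc {
        0x4000_0000 | (addr & 0x1FFF_FFFF) // idempotent for the WC window
    } else {
        addr // code/stack and other non-physical addresses stay flat
    }
}

#[derive(Debug, PartialEq)]
enum WaitCmp {
    Never,
    Less,
    LessEqual,
    Equal,
    NotEqual,
    GreaterEqual,
    Greater,
    Always,
}

/// Decode the WAIT_REG_MEM comparison from the low 3 bits of wait_info.
fn wait_cmp_from_info(wait_info: u32) -> WaitCmp {
    match wait_info & 7 {
        0 => WaitCmp::Never,
        1 => WaitCmp::Less,
        2 => WaitCmp::LessEqual,
        3 => WaitCmp::Equal,
        4 => WaitCmp::NotEqual,
        5 => WaitCmp::GreaterEqual,
        6 => WaitCmp::Greater,
        _ => WaitCmp::Always,
    }
}

impl WaitCmp {
    /// True when the polled value satisfies the comparison against the reference.
    fn matches(&self, value: u32, reference: u32) -> bool {
        match self {
            Self::Never => false,
            Self::Less => value < reference,
            Self::LessEqual => value <= reference,
            Self::Equal => value == reference,
            Self::NotEqual => value != reference,
            Self::GreaterEqual => value >= reference,
            Self::Greater => value > reference,
            Self::Always => true,
        }
    }
}
```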
MechaCat02	ed2e0e72fd	[iterate-2J] KeTimeStampBundle deterministic tick: fix frozen+mislaid guest clock The xboxkrnl data export KeTimeStampBundle (ordinal 0x00AD, import slot 0x820007d0 — confirmed via sylpheed.db imports table) was set up with TWO defects in the import-patch pass: 1. FROZEN: the block was written once at boot and never updated, so every field stayed a constant for the whole run (observed: the guest's clock reader sub_824AA830 = [[0x820007d0]+0x10] returned a constant 0x01d6bc0c from 5M..150M instructions). 2. WRONG LAYOUT: it stuffed the FILETIME high-dword at +0x10. The canonical X_TIME_STAMP_BUNDLE (xenia-canary kernel_state.h) is: +0x00 interrupt_time u64 (100ns since boot) +0x08 system_time u64 (FILETIME 100ns since 1601) +0x10 tick_count u32 (milliseconds since boot) +0x14 padding so [block+0x10] is tick_count in ms, not a FILETIME dword. Fix (deterministic, no wall-clock): * Initialize the block with the correct field layout (tick_count = 0 at boot, system_time = FILETIME base, interrupt_time = 0). * Store the block VA on KernelState::timestamp_bundle_addr during the import patch. * Add KernelState::update_timestamp_bundle(mem, clock) and call it every round in BOTH the lockstep (run_execution) and parallel (run_execution_parallel) outer loops, right where the deterministic Scheduler::global_clock is advanced. The clock is the retired-instruction monotonic global_clock, so every guest-visible time value stays a pure function of guest progress (lockstep byte-reproducible). * Cadence: 1 global_clock unit = 100ns (coherent with parse_timeout, which divides 100ns timeouts by 100 onto the same basis), so INSTRUCTIONS_PER_MS = 10_000. tick_count now advances 0 -> ~4999ms over a 50M-instruction window. Also make KeQuerySystemTime read the same 100ns clock instead of a frozen FILETIME constant. Verification: tick_count at 0x40002010 now advances (deadline arm at 0x82450d0c stores clock+66 = 0x260,0x269,...,0x51d,... 
advancing, vs the frozen 0x01d6bc4e before the fix). Determinism: two cold --stable-digest runs are byte-identical; the n50m golden is UNCHANGED (the clock-affected counter is not in the stable digest). 672/672 tests pass. HONEST CAVEAT — the predicted render cascade did NOT materialize on this branch. The diagnosed consuming gate at 0x82450b10 (the clock-vs-deadline compare in the worker-hub channel loop sub_82450A68) is unreachable here: the loop always branches away at 0x82450b0c ([this+220] >= channel-index), so the hub already dispatches sub_82450B68 342x in BOTH the frozen and fixed builds. Guest trajectory (imports 339766@50M / 1738001@200M / 9212446@1B), draws (0), swaps (2) and thread topology (tid14 Ready, not blocked on 0x109c) are identical frozen-vs-fixed. This commit is therefore a correct latent-clock-bug fix and determinism-safe prerequisite, NOT the render unblock. The 0x109c/tid14 starvation premise was not reproduced at f75bc96; the next gate must be re-localized. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 11:54:44 +02:00
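The corrected X_TIME_STAMP_BUNDLE layout and the 10,000-units-per-millisecond cadence from iterate-2J can be expressed as a serializer over the deterministic clock. That all three fields are derived from `global_clock` on every update, the `filetime_base` parameter, and the function names are assumptions of this sketch; the field offsets come from the commit, and the big-endian storage matches the guest's byte order.

```rust
/// 1 global_clock unit = 100 ns, so 10,000 units per millisecond.
const INSTRUCTIONS_PER_MS: u64 = 10_000;
const BUNDLE_SIZE: usize = 0x18;

/// Serialize the bundle for a given deterministic clock value.
fn timestamp_bundle(global_clock: u64, filetime_base: u64) -> [u8; BUNDLE_SIZE] {
    let mut b = [0u8; BUNDLE_SIZE];
    // +0x00 interrupt_time: 100 ns units since boot
    b[0x00..0x08].copy_from_slice(&global_clock.to_be_bytes());
    // +0x08 system_time: FILETIME (100 ns since 1601) = base + elapsed
    b[0x08..0x10].copy_from_slice(&filetime_base.wrapping_add(global_clock).to_be_bytes());
    // +0x10 tick_count: milliseconds since boot, truncated to u32
    let tick_ms = (global_clock / INSTRUCTIONS_PER_MS) as u32;
    b[0x10..0x14].copy_from_slice(&tick_ms.to_be_bytes());
    // +0x14 padding stays zero
    b
}

fn read_be_u32(b: &[u8], off: usize) -> u32 {
    u32::from_be_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

fn read_be_u64(b: &[u8], off: usize) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[off..off + 8]);
    u64::from_be_bytes(a)
}
```

At 50M retired instructions, `tick_count` reads 5000 ms, consistent with the "0 -> ~4999ms over a 50M-instruction window" figure above.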
MechaCat02	f75bc96d17	[iterate-2H] PPC spin/yield/sync hint-class audit: lock no-over-yield + barrier-decode invariants Audited the full PowerPC spin/yield/sync/SMT-priority-hint instruction class against the canary oracle (ppc_emit_alu.cc InstrEmit_orx / ppc_emit_memory.cc sync/eieio/isync) and against what Project Sylpheed actually executes (static scan of the extracted image + disasm of the spin sites 0x824D1328 / 0x824C17AC / 0x824D3CF8). Findings (no behavior change required — the class is already faithful): - or rX,rX,rX SMT priority hints: canary special-cases EXACTLY 0x7FFFFB78 (db16cyc) -> DelayExecution; every OTHER or-self form -> Nop. Ours already matches (only 0x7FFFFB78 yields). Image scan: the documented priority hints or 1/2/3/6/26..30 do NOT appear in Sylpheed at all; the only SMT spin hint used is or 31,31,31 (db16cyc), already handled in `de21c7a`. The 854 `or 8,8,8` etc. are compiler register self-moves (plain no-ops), not spin hints. - sync / lwsync / ptesync share XO=598 -> all decode to PpcOpcode::sync (canary keys on XO only, identical); eieio (XO=854), isync (XO=150) decode correctly. All are value-neutral no-ops under the single-host model, matching canary MemoryBarrier/Nop. unimpl=0 in a 200M run confirms none trap. tlbsync is not implemented by canary either and is unused by Sylpheed. - mftb-based timed back-off (loop at 0x824D3CF8: mftb delta vs timeout, with db16cyc between polls and a timeout escape) relies on the already-landed db16cyc yield + coherent global-clock timebase; no deadlock, no new gap. - ori 0,0,0 canonical nop (140 sites) is value-neutral; matches canary Nop. 
Lands two regression tests that lock the audited invariants so a future change cannot over-yield on a benign priority hint (which would perturb the deterministic schedule) or break the sync L-field decode: - test_smt_priority_hints_are_nops_not_yields - test_lwsync_ptesync_eieio_isync_decode_as_benign_noops Determinism preserved (tests-only): two cold lockstep `check -n 5M` (no persist) byte-identical; golden digest unchanged (no re-baseline). Full workspace suite green. 200M cascade unchanged (packets~172M, draws=0, shaders=0, swaps=1) — confirms the hint class is exhausted; the render gate is now downstream (tid14 0x109c per-job completion event), not CPU semantics. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 10:53:54 +02:00
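The audited invariants from iterate-2H (only the exact db16cyc encoding yields; other `or rX,rX,rX` self-forms are plain no-ops; `sync`/`lwsync`/`ptesync`, `eieio`, and `isync` decode as benign barriers) can be captured in a small classifier. `HintClass` and `classify` are hypothetical names; the field extraction uses the standard PowerPC X-form layout (primary opcode in the top 6 bits, 10-bit XO above the Rc bit). Treating `or.` (Rc=1) as not-a-no-op is a choice of this sketch, since it writes CR0.

```rust
#[derive(Debug, PartialEq)]
enum HintClass {
    Yield,
    Nop,
    Barrier,
    Other,
}

/// or r31,r31,r31 (db16cyc): the only form that yields.
const DB16CYC: u32 = 0x7FFF_FB78;

fn classify(insn: u32) -> HintClass {
    let primary = insn >> 26;
    let xo = (insn >> 1) & 0x3FF;
    match (primary, xo) {
        _ if insn == DB16CYC => HintClass::Yield,
        (31, 444) => {
            // or rA,rS,rB: a self-form with Rc=0 is value-neutral.
            let rs = (insn >> 21) & 0x1F;
            let ra = (insn >> 16) & 0x1F;
            let rb = (insn >> 11) & 0x1F;
            if (insn & 1) == 0 && rs == ra && ra == rb {
                HintClass::Nop
            } else {
                HintClass::Other
            }
        }
        // sync/lwsync/ptesync share XO=598; eieio is XO=854; isync is 19/150.
        (31, 598) | (31, 854) | (19, 150) => HintClass::Barrier,
        _ => HintClass::Other,
    }
}
```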
MechaCat02	de21c7a544	[iterate-2G] db16cyc spin-hint cooperative yield: unblock title-screen 0x10a0 gate The silph title state machine (tid13) blocked on event 0x10a0, never signaled. Root: the event's producer chain runs on the silph worker (entry 0x821C4AD0, our tid14), which was starved. tid14 shares a HW slot with a guest spinlock/ barrier participant (sub_824D1328, entry 0x824D2940) that busy-spins on the db16cyc hint `or r31,r31,r31` (encoding 0x7FFFFB78) at 0x824D140C. Under our round-robin lockstep the spinner consumed its whole block every round and starved the co-located tid14 (only 9 progress hits over 200M instr) — so the producer never reached the event-create/duplicate/signal dance the canary oracle performs (handle F80000E8 set by the submitter F8000044 via a duplicated handle). Fix (canary-faithful): recognize the db16cyc spin hint exactly as canary's InstrEmit_orx does (code 0x7FFFFB78 -> DelayExecution) and surface it as a new StepResult::Yield. The scheduler's yield_current() promotes every Ready peer on the slot past STARVE_LIMIT so begin_slot_visit picks one next round, then they reset and the spinner reclaims the slot — fair alternation, no priority inversion, pure function of slot state (deterministic). Result (lockstep, cache-persist, -n 200M): tid14 progresses past its old stall into a real wait; tid13 advances off 0x10a0 to a new event; hub/submitter re-enter their wait loops. imports 280k->592k, packets 124M->164M, swaps 1->2. draws still 0 (the splash's first draw is a further-upstream gate). Determinism preserved (two cold n50m runs byte-identical). n50m golden re-baselined (imports 90296->339766, swaps 1->2; draws unchanged 0). n2m golden unchanged (db16cyc not reached in first 2M). Tests 670/670. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 10:38:17 +02:00
MechaCat02	f3b7e8b760	[iterate-2F] Scheduler anti-starvation floor: fix job-4 handoff render gate The lockstep scheduler's pick_runnable is strict priority (max_by_key (priority, -idx)). On a cooperative single-host HW slot, a CPU-bound spinner that never blocks (the silph poll loop pinned by affinity to hw=5) wins pick_runnable every round forever, permanently starving a co-located peer (the submitter, tid6) that the spinner is actually waiting on. On real hardware those threads run on separate SMT contexts concurrently, so the spinner never starves the submitter; ours collapses them onto one slot with no anti-starvation, turning priority (or equal-priority index order) into permanent starvation. The starved submitter never dequeued job-4 -> the worker-hub (tid5) blocked INFINITE on completion event 0x1080 -> silph (tid13) wedged on 0x1078 -> no vsync -> draws_seen=0, the publisher splash never renders. (decrement_quantum's within-slot rotation is dead: begin_slot_visit unconditionally re-pick_runnable()s each round, discarding the rotated running_idx. The fix is therefore evaluated at pick time, not via that discarded rotation.) Fix (Option A, bounded anti-starvation, deterministic): - Add per-thread steps_starved counter to GuestThread. - begin_slot_visit increments it for every Ready peer passed over this visit, resets it to 0 for the picked thread. - pick_runnable selects by effective_priority: once steps_starved reaches STARVE_LIMIT (4096) the thread is lifted to i32::MAX and wins exactly one pick, then resets. The genuinely higher-priority thread still wins ~4095/4096 visits -- the boost grants periodic forward progress only, it does NOT invert priority. Pure function of counter/priority/index -> deterministic (no wall-clock, no RNG). Cascade (lockstep exec, XENIA_CACHE_PERSIST=1, -n 200M): - submitter dequeue sub_82458508 now fires 4x (was 3x); the 4th job (buf 0x40baa2c0) is dequeued at cycle 6.15M. 
- hub tid5 leaves Blocked(0x1080) -> now Ready (no more INFINITE wait). - GPU packets 0 -> 116,101,363 (command stream now flowing). - tid13 (silph::UImpl) advances past the old 0x1078 wedge to a NEW downstream wait (handle 0x10a0); 3 new threads spawn (tid14/15/16). - draws_seen still 0 -> the splash's first draw is a NEW downstream gate, not this starvation. Determinism: two cold lockstep `check -n 5M` runs byte-identical (full and stable digests). New n50m stable digest deterministic across two cold runs. Golden re-baselined: instructions 50000007->50000003, imports 92317->90296 (trajectory shift from the changed pick order). Tests: 666/666 (+1 test_anti_starvation_bounded_progress). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 10:02:02 +02:00
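The anti-starvation floor from iterate-2F and the db16cyc `yield_current` from iterate-2G fit together as below. This is a simplified single-slot sketch with hypothetical types; only `STARVE_LIMIT`, `steps_starved`, the `(priority, -idx)` ordering, and the lift to `i32::MAX` come from the commits.

```rust
const STARVE_LIMIT: u32 = 4096;

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Ready,
    Blocked,
}

#[derive(Debug, Clone)]
struct GuestThread {
    priority: i32,
    state: State,
    steps_starved: u32,
}

/// A thread passed over STARVE_LIMIT times wins exactly one pick.
fn effective_priority(t: &GuestThread) -> i32 {
    if t.steps_starved >= STARVE_LIMIT { i32::MAX } else { t.priority }
}

/// Strict priority over Ready threads; ties go to the lowest index.
fn pick_runnable(threads: &[GuestThread]) -> Option<usize> {
    threads
        .iter()
        .enumerate()
        .filter(|(_, t)| t.state == State::Ready)
        .max_by_key(|&(i, t)| (effective_priority(t), -(i as i64)))
        .map(|(i, _)| i)
}

/// One slot visit: pick, reset the winner, age every passed-over Ready peer.
fn begin_slot_visit(threads: &mut [GuestThread]) -> Option<usize> {
    let picked = pick_runnable(threads)?;
    for (i, t) in threads.iter_mut().enumerate() {
        if i == picked {
            t.steps_starved = 0;
        } else if t.state == State::Ready {
            t.steps_starved += 1;
        }
    }
    Some(picked)
}

/// db16cyc yield: promote every Ready peer past the limit so one runs next.
fn yield_current(threads: &mut [GuestThread], current: usize) {
    for (i, t) in threads.iter_mut().enumerate() {
        if i != current && t.state == State::Ready {
            t.steps_starved = t.steps_starved.max(STARVE_LIMIT);
        }
    }
}
```

Because the lift depends only on counters, priorities, and indices, the pick sequence is a pure function of scheduler state, which is what keeps the lockstep trace deterministic.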
MechaCat02	7e2603a9e5	[iterate-2E] Extend coherent monotonic clock to lockstep (timebase-desync livelock fix) Lockstep livelocked the scheduler the same way --parallel did before `0332d19`: the kernel deadline-arithmetic (`now_basis_at`) read per-thread `ctx(hw_id).timebase`, but a parked/poll thread has `running_idx == None` so `Scheduler::ctx()` returns `idle_ctx` (timebase 0). A poll thread (tid=7, a `KeWaitForSingleObject` loop with a 30ms relative timeout) computing its deadline via `parse_timeout` therefore read `now = 0` and registered `deadline = 0 + 3000 = 3000` — a constant ~7.78M units in the past. `coord_idle_advance` then re-armed that same constant 3000 deadline forever, pinning virtual time and starving every other thread's real future deadline. Render-gate impact: the submitter (tid=6) re-enters a 16ms-timeout WaitForMultiple after its first jobs; that timeout never fired because vtime was pinned at 3000, so virtual time never reached real future deadlines. Fix (Option A — mirror the parallel fix): drive the existing deterministic `Scheduler::global_clock` in lockstep too (floored up once per outer round to `stats.instruction_count`, a pure function of retired guest instructions — no wall-clock), and route `KernelState::now_basis_at` through `global_clock()` in BOTH modes. New `Scheduler::advance_global_clock_to(now)` floor-up keeps it monotone alongside `advance_all_timebases_to`. Parallel behavior unchanged (it already read `global_clock()`). Verified (lockstep, 50M): - DETERMINISM: two cold `check -n 5M` and two cold `-n 50M` runs byte-identical. - LIVELOCK GONE: "advanced to deadline" went from 592,679 fires / 2 unique values / 562,084 pinned at 3000 -> 18,586 fires / 18,567 unique / 0 pinned, strictly increasing 5.4M -> 50M. Poll thread tid=7 now ends Blocked with a real future deadline Some(60002824) instead of spin-Ready on the past 3000. - imports 1,790,936 -> 92,317 at 50M (the spin no longer burns import calls). 
Cascade (lockstep, XENIA_CACHE_PERSIST=1, -n 200M): engine now runs to budget instead of hard-deadlocking. Hub enqueue (sub_82458068) 4x; submitter dequeue (sub_82458508) still 3x — the lost 4th-job HANDOFF (count/notify between sub_82458068's tail and the submitter queue) is a SEPARATE downstream gate, not the timebase. New gate: tid=5 (hub) Blocked INFINITE on event 0x1080 (job-4 completion); tid=6 (submitter) Ready, parked in WaitForMultiple (sub_824AB214), loop-top stops at cycle 6.23M. draws still 0, VdSwap 1. Golden re-baseline (same commit): sylpheed_n50m instructions 50000004 -> 50000007, imports 1790936 -> 92317 (swaps/draws/RTs/shaders/textures unchanged). sylpheed_n2m unchanged (livelock onsets after 2M). Suite 665/665 + oracle green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-12 21:42:28 +02:00
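The lockstep fix in iterate-2E reduces to a monotone floor-up clock that every deadline computation reads. The sketch below contrasts the pre-fix basis (a parked thread falls back to an idle context whose timebase is 0) with the shared clock; the struct and helper names are hypothetical, and timeouts are taken as already converted to clock units.

```rust
#[derive(Default)]
struct Scheduler {
    global_clock: u64,
}

impl Scheduler {
    /// Floor-up: the clock only ever moves forward.
    fn advance_global_clock_to(&mut self, now: u64) {
        if now > self.global_clock {
            self.global_clock = now;
        }
    }
    fn global_clock(&self) -> u64 {
        self.global_clock
    }
}

/// Pre-fix basis: a parked thread has no running ctx and sees idle_ctx.timebase == 0.
fn now_basis_pre_fix(running_timebase: Option<u64>) -> u64 {
    running_timebase.unwrap_or(0)
}

/// Post-fix basis: every mode reads the shared deterministic clock.
fn now_basis(sched: &Scheduler) -> u64 {
    sched.global_clock()
}

/// Absolute deadline for a relative timeout expressed in clock units.
fn deadline(now: u64, relative_units: u64) -> u64 {
    now + relative_units
}
```

With the pre-fix basis a 3000-unit timeout always resolves to the constant deadline 3000, which is exactly the pinned value the livelock kept re-arming.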
MechaCat02	5aaadfec36	[iterate-2E] Add XENIA_AUDIT_DEREF pointer-chase probe On each AUDIT-PC-PROBE fire, treat gpr[reg] as a base object, dump its first 64 bytes, follow [base+off] to a sub-object, dump that, then follow [[base+off]+0] to its vtable and dump 48 slots. Env-gated (XENIA_AUDIT_DEREF=<reg>:<off>), read-only, lockstep digest unaffected. Captures the live work-item + stream object + vtable at sub_824510E0 before the pool recycles the slot — which overturned the prior session's "infinite spin" diagnosis: the streaming read PROGRESSES 68/68 128KB chunks of a 9MB file, then the hub (tid=5) blocks INFINITE on a self-created Event/Manual (0x1060) that is never signaled. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-12 20:29:01 +02:00
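A sketch of how a `<reg>:<off>` spec like `XENIA_AUDIT_DEREF` might be parsed and chased. Accepting decimal or `0x`-prefixed hex offsets, the 0..=31 register bound, and the `read_u32` callback are assumptions; the commit only gives the `<reg>:<off>` shape and the base → [base+off] → [[base+off]+0] chain. The 64-byte and 48-slot dumps are omitted.

```rust
/// Parse "<reg>:<off>"; the offset may be decimal or 0x-prefixed hex (assumed).
fn parse_audit_deref(spec: &str) -> Option<(usize, u32)> {
    let (reg, off) = spec.split_once(':')?;
    let reg: usize = reg.trim().parse().ok()?;
    if reg > 31 {
        return None; // only r0..r31 exist
    }
    let off = off.trim();
    let off = match off.strip_prefix("0x").or_else(|| off.strip_prefix("0X")) {
        Some(hex) => u32::from_str_radix(hex, 16).ok()?,
        None => off.parse().ok()?,
    };
    Some((reg, off))
}

#[derive(Debug)]
struct DerefChain {
    base: u32,
    sub_object: u32,
    vtable: u32,
}

/// Follow gpr[reg] -> [base+off] -> [[base+off]+0] using a guest read callback.
fn chase(gpr: &[u64; 32], reg: usize, off: u32, read_u32: impl Fn(u32) -> u32) -> DerefChain {
    let base = gpr[reg] as u32; // 32-bit guest pointer in the low word
    let sub_object = read_u32(base.wrapping_add(off));
    let vtable = read_u32(sub_object);
    DerefChain { base, sub_object, vtable }
}
```

Reading the variable could then look like `std::env::var("XENIA_AUDIT_DEREF").ok().as_deref().and_then(parse_audit_deref)`, returning `None` (probe disabled) when unset or malformed.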
MechaCat02	0332d1990d	[Track 2] Parallel-scoped global clock fixes timebase-desync livelock In --parallel mode a long run livelocked: the scheduler spun "advanced to deadline 3000 waking hw=2 idx=0" ~14k times in microseconds. Root cause: each guest thread owns ctx.timebase (+1/instr in step_block), and all kernel deadline arithmetic read Scheduler::ctx(hw_id).timebase as "now". But the parallel worker extracts its PpcContext via mem::replace(ctx_mut_ref, PpcContext::new()) — leaving a ZEROED timebase in the slot while it steps unlocked — and advance_all_timebases_to only walks runqueue (never idle_ctx). So the coordinator's coord_pre_round drain and a woken thread's parse_timeout could read a zeroed/stale basis decoupled from the deadline the scheduler just advanced to. The thread re-armed the same constant deadline forever; the global clock never moved. Fix: add a single monotonic Scheduler::global_clock, advanced by the per-block retired-instruction count on each parallel writeback and floored up by advance_all_timebases_to. Kernel deadline reads route through KernelState::now_basis_at(hw_id), which returns global_clock ONLY when parallel_active; lockstep keeps reading the exact pre-existing ctx(hw_id).timebase expression, so the deterministic lockstep trace is byte-identical (sylpheed_n50m golden unchanged, zero re-baseline). Verified: - 50M --parallel run completes (was: hung). Deadlines now strictly increasing 5.4M -> 49.1M (18097 unique of 18116; max repeat 2) vs pre-fix constant 3000 x ~14000. - sylpheed_n50m golden byte-identical via plain `check` (no persist). - Full suite 665/665 green. Note: an intermittent parallel hang/crash (~1-2/20 at -n 5M) is pre-existing (master 1/20, this build 2/20 — within noise) and distinct from the timebase livelock: it is a parallel-race class (e.g. the unsafe block_ptr deref in run_execution_parallel). Tracked separately; lockstep remains the recommendation for long runs. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-12 19:32:14 +02:00
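The hazard described in the [Track 2] commit comes from taking the context out of its slot with `std::mem::replace`, which leaves a default (zero-timebase) placeholder behind for anyone who reads the slot meanwhile. A minimal sketch with hypothetical `Slot`/`Coordinator` types, modelling `now_basis_at` as the commit describes it (global clock only when `parallel_active`):

```rust
use std::mem;

#[derive(Debug, Default)]
struct PpcContext {
    timebase: u64,
}

struct Slot {
    ctx: PpcContext,
}

struct Coordinator {
    global_clock: u64,
    parallel_active: bool,
}

impl Coordinator {
    /// Global clock in parallel mode; lockstep keeps reading the slot's timebase.
    fn now_basis_at(&self, slot: &Slot) -> u64 {
        if self.parallel_active { self.global_clock } else { slot.ctx.timebase }
    }
    /// Parallel writeback: advance by the block's retired count, restore the ctx.
    fn writeback(&mut self, slot: &mut Slot, ctx: PpcContext, retired: u64) {
        self.global_clock += retired;
        slot.ctx = ctx;
    }
    /// Floor-up, as advance_all_timebases_to does for the global clock.
    fn floor_up(&mut self, now: u64) {
        self.global_clock = self.global_clock.max(now);
    }
}

/// Worker side: take the ctx out to step it unlocked (leaves a zeroed placeholder).
fn take_for_step(slot: &mut Slot) -> PpcContext {
    mem::replace(&mut slot.ctx, PpcContext::default())
}
```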
MechaCat02	48b19e490f	[Prong A] Three 32-bit ABI PPCBUG siblings corrected to canary semantics Second differential audit, lead prong: hunt siblings of PPCBUG-020 (the word-form ALU truncation fixed in `341196a`, whose "32-bit ABI / MSR.SF=0" premise was false — Xenon is a 64-bit core). Found three more band-aids of the same class, each verified against the canary oracle. All three are genuine oracle/ISA divergences but INERT on Sylpheed's lockstep trace (sylpheed_n50m golden digest unchanged; no re-baseline). Fixed + directed tests anyway to close the band-aid class (per audit decision). 1. slw/srw shift-count mask (PPCBUG-044 site). Ours tested the full u32 count `< 32`; canary InstrEmit_slwx/srwx mask `rb & 0x3F` then test bit 5. A count like 0x40 (low-6-bits 0) must pass the value through, not zero it. Fixed both to `& 0x3F`. The 32-bit CR0 i32-view is unchanged (genuinely 32-bit). 2. sraw/srawi result extension (PPCBUG-041/042/043 "writeback truncation"). Ours zero-extended the 32-bit arithmetic-shift result (`result as u32 as u64`); PowerISA + canary InstrEmit_srawx/srawix SIGN-extend it (`f.SignExtend`, the `(i64.s)&¬m` fill). 0x80000000>>1 is now 0xFFFFFFFF_C0000000, not 0x00000000_C0000000. CA math and CR0 view byte-identical. 3. mtspr CTR width (PPCBUG-054). Ours stored `val as u32 as u64`, dropping the upper 32 bits; CTR is a 64-bit SPR and canary InstrEmit_mtspr stores the full GPR (`f.StoreCTR(rt)`). A later `mfspr rX, CTR` now round-trips correctly. bdnz/bcctr still consume only CTR's low 32 bits (the bcx zero-TEST truncation at line ~922 MATCHES canary's `f.Truncate(ctr, INT32_TYPE)` — left untouched). Tests: updated srawx_negative_value_sign_extends_upper, srawix_high_count_negative_input_sign_extends_all_ones, and mtspr_ctr_keeps_full_64_bits (formerly premise-defending the bugs — reading-error #24). Added slwx/srwx 6-bit-mask tests, mfspr_ctr round-trip, and the rlwinm MB>ME wraparound-mask test (plan-requested gap closure). 665/665. 
Left correct (re-confirmed vs canary, do NOT touch): bcx/bclr CTR 32-bit test, divw/divwu zero-extend quotient (canary f.ZeroExtend, ISA upper undefined), extsb/extsh, logical-NOT chain, mulhw/mulhwu, srawx 0x3F mask, pixel pack/unpack. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-12 17:25:41 +02:00
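A minimal Rust sketch of the three corrected rules, written from the PowerISA/canary behaviour quoted in this commit. It is not the interpreter's actual code, and the function names (`slw`, `srw`, `sraw`, `bdnz_step`) are hypothetical. The shift count is `rb & 0x3F`, and bit 5 of that masked count selects the "everything shifted out" case. `sraw` sign-extends its 32-bit result. For `bdnz`, the sketch assumes canary's bcx shape: decrement the full 64-bit CTR, then test only the low 32 bits.

```rust
/// slw: shift the low 32 bits left; count = rb & 0x3F, bit 5 set => result 0.
fn slw(rs: u64, rb: u64) -> u64 {
    let n = (rb & 0x3F) as u32;
    if n & 0x20 != 0 {
        0
    } else {
        ((rs as u32) << n) as u64 // upper 32 bits cleared
    }
}

/// srw: logical right shift of the low 32 bits, same count masking as slw.
fn srw(rs: u64, rb: u64) -> u64 {
    let n = (rb & 0x3F) as u32;
    if n & 0x20 != 0 {
        0
    } else {
        ((rs as u32) >> n) as u64
    }
}

/// sraw: arithmetic right shift of the low 32 bits, SIGN-extended to 64 bits.
/// Returns (result, CA); CA is set when the source is negative and any
/// 1 bits were shifted out.
fn sraw(rs: u64, rb: u64) -> (u64, bool) {
    let n = (rb & 0x3F) as u32;
    let v = rs as u32 as i32;
    if n & 0x20 != 0 {
        // Count >= 32: every bit is shifted out, result is all sign bits.
        let r: i32 = if v < 0 { -1 } else { 0 };
        (r as i64 as u64, v < 0)
    } else {
        let lost = (v as u32) & ((1u32 << n) - 1);
        ((v >> n) as i64 as u64, v < 0 && lost != 0)
    }
}

/// bdnz-style step: mtspr keeps all 64 bits of CTR; the decrement is on the
/// full register, but the "branch if non-zero" test sees only the low 32 bits.
fn bdnz_step(ctr: &mut u64) -> bool {
    *ctr = ctr.wrapping_sub(1);
    (*ctr as u32) != 0
}
```

With a count of `0x40` the masked count is 0, so `slw` passes the value through instead of zeroing it, which is exactly the case the old `< 32` test on the full u32 count got wrong.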
MechaCat02	341196a111	[Issue-1 PPCBUG-020] Word-form ALU ops produce full 64-bit results Xenon is a 64-bit PPC core (32-bit pointer ABI, but 64-bit registers and integer arithmetic). The interpreter was truncating every word-form integer ALU writeback to 32 bits and zero-extending, on a false "MSR.SF=0 / 32-bit ABI" premise. This silently corrupted any genuine 64-bit value flowing through word-form arithmetic. Confirmed load-bearing via runtime ours-vs-canary capture: Sylpheed's millisecond->LARGE_INTEGER timeout converter sub_824ACA88 does `clrldi; mulli r11,r11,-10000; std`. For a 16 ms wait the correct result is -160000 = 0xFFFFFFFF_FFFD8F00 (relative). canary stores exactly that; ours' truncating `mulli` stored 0x00000000_FFFD8F00 (positive) -> the i64 timeout read as a huge absolute deadline -> a ~26000x over-wait that froze the main frame loop. After the fix the timeout matches canary and the previously-frozen frame/worker loops run (parallel boot NtWaitForMultipleObjectsEx 94 -> 30428; KeWaitForSingleObject/critical-section loops resume). Fix mirrors canary's INT64 emitters (ppc_emit_alu.cc) op-by-op for the 17 data-losing word-form ops: addis, addic(.), subfic(.), mulli, add(c/e/ze/me)x, subf(c/e/ze/me)x, negx, mullwx. Only the result writeback widens to full 64 bit; the 32-bit carry (XER[CA]) and overflow (XER[OV]) computations and the CR0 i32 view are preserved byte-identical (the low 32 bits of the new result equal the old truncated result), so this is a strict no-op for clean 32-bit values and only restores the previously-zeroed upper bits for genuine 64-bit values. Genuinely-32-bit ops (rlwinm/slw/srw/cmpw, mulhw/divw whose upper bits are ISA-undefined) are left untouched. Updated 7 unit tests that asserted the truncation (they encoded the bug) to the canary-correct full-64-bit values. 
Re-baselined the sylpheed_n50m golden (imports 40454 -> 1790936: the unwedged frame/worker loops now cycle under the instruction-count timebase); sylpheed_n2m unchanged (pre-frame-loop). Lockstep determinism preserved (two 50M runs identical). Full suite 660/660. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-12 16:21:11 +02:00
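A minimal sketch of the `mulli` difference, reproducing the Sylpheed 16 ms timeout arithmetic quoted above. The function names are hypothetical, and only `mulli` is modelled, not the other 16 ops. It shows why the truncated value turned a relative wait into an absolute deadline, and that the low 32 bits (and so the CR0 i32 view) are unchanged by the fix.

```rust
/// Old behaviour removed by this commit: keep 32 bits of the product, zero-extend.
fn mulli_truncating(ra: u64, simm: i16) -> u64 {
    (ra as i64).wrapping_mul(simm as i64) as u32 as u64
}

/// Canary-style behaviour: full 64-bit product of RA and the sign-extended SIMM.
fn mulli_full(ra: u64, simm: i16) -> u64 {
    (ra as i64).wrapping_mul(simm as i64) as u64
}

/// NT convention: a negative LARGE_INTEGER timeout is a relative interval
/// (100 ns units); a non-negative one is an absolute deadline.
fn is_relative_timeout(timeout: u64) -> bool {
    (timeout as i64) < 0
}
```

For 16 ms, `mulli_full(16, -10000)` yields 0xFFFFFFFF_FFFD8F00 (-160000, relative), while the truncating form yields 0x00000000_FFFD8F00, a positive value that reads as a far-future absolute deadline.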
MechaCat02	b20c99f141	[Subsystem-fixes] 6 verified ours-vs-canary divergence fixes From the 2026-06-12 5-subsystem differential audit. All verified against canary as oracle; 660/660 workspace tests green (655 + 5 new). 1. nt_create_event polarity (exports.rs) — `manual_reset = gpr[5] != 0` was INVERTED. Canary xboxkrnl_threading.cc:668 `Initialize(!event_type,..)` + xevent.cc:41 (type 0 = NotificationEvent = manual, type 1 = Sync = auto). Now `== 0`. Was the dormant 2.AI fix on chore/portable-snapshot, never merged. The Ke-path was already correct; only the Nt-path was wrong. 2. 2.AF deadline drain (main.rs coord_pre_round) — expired KeWait/KeDelay deadlines never fired under load because advance_to_next_wake_if_due was only called in coord_idle_advance (no-Ready-threads path). Added a per-round drain loop; covers BOTH lockstep and parallel outer loops since both call coord_pre_round. Was the dormant 2.AF fix, never merged. 3. handle slab-recycle ABA guard (state.rs + scheduler.rs) — release_handle_slot (my round-34 regression) recycled a closed slot even with a thread still parked on it, risking a stale-waiter wake when the slot is re-minted. Added Scheduler::any_thread_waiting_on; decline to recycle a still-waited slot. 4. vpkpx pixel-pack (vmx.rs) — wrong field mapping (~100% mismatch). Now exact canary ppc_emit_altivec.cc:1795 shift/mask (red 6b out[15:10] from w[24:19], green out[9:5] from w[14:10], blue out[4:0] from w[7:3]; no fabricated alpha bit). +unit test. 5. VFS GDFX attribute plumbing (vfs/, exports.rs query fns) — VfsEntry now carries the real on-disc attribute byte (GDFX dirent +12, canary disc_image_device.cc:136/154) instead of inferring directory-ness from path shape. Query exports report the real FILE_ATTRIBUTE_ bits. Candidate driver of the XamShowDirtyDiscErrorUI gate. +tests. 6. MmGetPhysicalAddress region-aware mirror (exports.rs) — flat 0x1FFFFFFF mask missed canary's +0x1000 host_address_offset for 0xE0000000+ mirror (memory.cc:2317). 
Read-only query; proven byte-identical 50M digest. +test. Investigated and intentionally NOT changed: - zero-on-recommit: no-op; ours has no region-reuse path (bump allocators, free is a stub). - 32-bit ALU writeback truncation (PPCBUG-020): documented-deliberate; premise (MSR.SF=0) is questionable but flipping it is out of scope here. - KeSetEvent/NtSetEvent return value: ours returns true previous state (hardware-faithful); canary returns constant 1 — NOT an ours bug. sylpheed_n50m golden will need re-baselining (legit behavior change). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-12 14:57:38 +02:00
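A minimal sketch of fixes 1 and 6 as pure functions. The names are hypothetical and these are not the exports.rs code. The event-type mapping follows the canary references cited in item 1. The physical-address rule follows item 6: the offset within the mirror, plus 0x1000 for the 0xE0000000 mirror. Returning `None` below 0xA0000000 is an assumption, because the commit does not describe the non-mirror fallback.

```rust
/// NtCreateEvent `event_type`: 0 = NotificationEvent (manual reset),
/// 1 = SynchronizationEvent (auto reset). The old `!= 0` test had this inverted.
fn nt_event_is_manual_reset(event_type: u32) -> bool {
    event_type == 0
}

/// Region-aware physical address for the physical-mirror ranges.
/// Assumption: addresses below 0xA000_0000 are not physical mirrors => None.
fn mm_physical_address(va: u32) -> Option<u32> {
    if va < 0xA000_0000 {
        return None;
    }
    let offset = va & 0x1FFF_FFFF; // what the old flat mask returned everywhere
    if va >= 0xE000_0000 {
        Some(offset + 0x1000) // canary's host_address_offset for the 0xE0000000 mirror
    } else {
        Some(offset)
    }
}
```

Under the old flat mask, 0xE0000000 mapped to physical 0 instead of 0x1000, which is the divergence item 6 closes.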
MechaCat02	db90ad0f7d	[AUDIT-059 R-D2] Phase D auto-signal POC confirms audit-049 wedge diagnosis Hook NtCreateEvent for the silph::UImpl tid=13 chain (entry=0x821748F0, start_context=0x4024a840, frame-1 LR=0x821CB15C inside sub_821CB030+0x128) and auto-signal the resulting handle after XENIA_SILPH_UI_AUTOSIGNAL_DELAY instructions. Env-gated; default off. SR4 verdict B (partial unwedge): - handle 0x1078 signal_attempts 0->1 - tid=13 Blocked(WaitAny[0x1078]) -> Ready pc=0x824a9108 - ExCreateThread 10 -> 12 (new silph::UImpl tid=14, worker tid=15) - New downstream wedges 0x1084 + 0x1088 - cxx_throw runtime_error on tid=5 inside R26 dispatcher (BST not-registered instance lhs=0x715a7af0) - VdSwap stays 1; no draws (POC is diagnostic, not final fix) Confirms Phase C diagnosis end-to-end. The real signaler must (a) drive NtSetEvent on the silph KEVENT AND (b) register the dispatcher's BST instance upstream; this POC only does (a). Reading-error class #20: ctx.lr at kernel export entry is the thunk wrapper's return slot, NOT the guest caller's post-bl PC. Walk back-chain 1 step to get frames[1].lr. Reading-error class #21: --parallel and lockstep have SEPARATE outer loops in main.rs (run_execution_parallel line 2928 vs run_execution line 2706). Per-round hooks must be wired in BOTH paths. Files: - crates/xenia-cpu/src/scheduler.rs: GuestThread.start_entry/start_context fields + spawn() population + current_thread_entry_and_ctx() helper - crates/xenia-kernel/src/state.rs: AutoSignalPending struct, env-parsed silph_autosignal_delay, pending Vec, last_cycle_hint, set_now_cycle_hint, maybe_register_silph_autosignal (walks back-chain), fire_due_silph_autosignals - crates/xenia-kernel/src/exports.rs: hook in nt_create_event - crates/xenia-app/src/main.rs: fire-site + cycle hint in both outer loops - audit-runs/audit-059-handle-disambiguation/round-D2-autosignal-poc/FINDINGS.md Tests 655/655 green. Default behavior byte-identical when env unset. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-11 18:38:38 +02:00
MechaCat02	229b46c765	[Kernel] Slab-recycle handle allocator (AUDIT-059 R34) Adds a FIFO free list of closed handle slots so alloc_handle returns recycled IDs before bumping next_handle. Mirrors canary's slab-style ObjectTable: F8000098 reused 130x per 30s window in canary, but ours' monotonic bump allocator never reused slots — so a recycled slot in canary maps to a fresh, never-reused slot in ours, drifting kernel object identity per AUDIT-042's analysis. release_handle_slot is wired into nt_close's refcount==0 branch and gated to the canonical [0x1000, 0xF000_0000) range so synthetic XAudio park handles (AUDIT-048) are never recycled. Verified: all 655 workspace tests green, smoke tests at -n 50M show NtClose 115/run with handle table renumbering active (round-34 max handle 0x12ac vs round-16 baseline 0x12b8 over same workload). γ- cluster #2 wedge unchanged — silph wait still parks tid=13 on the renumbered handle (4216=0x1078 here vs 0x12a4 baseline), confirming the wedge is independent of allocator policy. Lands as a parity fix to bring our kernel-object identity in line with canary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-10 18:04:34 +02:00
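A minimal sketch of the FIFO slab-recycle policy described above. The type is hypothetical, and the range gate is the commit's canonical [0x1000, 0xF000_0000). Two details are assumptions: the 4-byte handle stride (suggested by the handle values in this log) and the 0x1000 starting value. The sketch also omits the later "still waited on" ABA guard.

```rust
use std::collections::VecDeque;
use std::ops::Range;

/// Only handles in this range are recycled; synthetic XAudio park handles
/// (0xF000_0000 | idx) fall outside it and are never reused.
const RECYCLABLE: Range<u32> = 0x1000..0xF000_0000;
/// Assumption: handles are minted 4 apart.
const HANDLE_STRIDE: u32 = 4;

struct HandleAllocator {
    next_handle: u32,
    free: VecDeque<u32>, // closed slots, handed out oldest-first
}

impl HandleAllocator {
    fn new() -> Self {
        Self { next_handle: 0x1000, free: VecDeque::new() }
    }

    /// Recycled slots are returned before bumping next_handle.
    fn alloc_handle(&mut self) -> u32 {
        if let Some(h) = self.free.pop_front() {
            return h;
        }
        let h = self.next_handle;
        self.next_handle += HANDLE_STRIDE;
        h
    }

    /// Called when a handle's refcount reaches 0 (nt_close).
    fn release_handle_slot(&mut self, h: u32) {
        if RECYCLABLE.contains(&h) {
            self.free.push_back(h);
        }
    }
}
```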
MechaCat02	40f208ea4e	[2.BF] Silph WorkerCtx: install canary's real sub-vtable at [+0x2C][0] Round-21 pivot of the audit-059 synth-spawn module. Round 20 made the silph::WorkerCtx workers run by attaching a 32-slot stub sub-vtable where every entry was a `li r3, 0; blr` stub — workers spawned but spun forever because slots 15/17 short-circuited to NULL ("no work"). Round 21 reads canary's real sub-vtable VA out of the XEX `.rdata` — `0x8200A168` — and points `[sub_object + 0]` at it directly. The vtable bytes live in the static image both engines map, so no guest memory is consumed and slot 15 (= `sub_824FCCC8`) and slot 17 (= `sub_824FCE38`) — the only slots `sub_82506B08` ever calls — become working game methods. Discovery method (canary probes in `audit-runs/audit-059-handle-disambiguation/round21-subvtable-canary/`): 1. `--audit_jit_prolog_pc=0x82506B08` to catch the first WorkerCtx virtual-dispatch entry; `[r3+0x2C]` revealed the sub-object VA. 2. Re-run with `--audit_jit_prolog_mem_dump=<sub-obj VA>` to deref `[sub-object + 0]` = sub-vtable VA = 0x8200A168. 3. PE inspection (`xex-text/xex-rdata` is the static image) reads all 31 slots; slot 15 -> sub_824FCCC8, slot 17 -> sub_824FCE38. Smoke metrics (50M instructions, `XENIA_CACHE_PERSIST=1 XENIA_SILPH_SYNTH=1`, audit-runs/audit-059-handle-disambiguation/ round21-real-vtable/): * 4/4 workers spawned, no crash, no new fault * KeSetEvent 633885 -> 431860 (-32%) * KeWaitForSingleObject 258441 -> 185762 (-28%) * Per-handle state unchanged on the focused stalled set (0x1020/0x1090 still `<NO_SIGNALS_DESPITE_WAITS>`, 0x12a4/0x12ac/0x1218/0x1224 still `<UNCREATED>`). * No VdSwap/draws progression observed in this window. Verdict: B (partial). The workers no longer spin in a stub-loop — internal call density shifted — but the focused wedge handles still don't get signalled. 
Likely root cause: workers may now be waiting on the WorkerCtx's own KEVENTs (which we synthesised at +0x54/+0x94) for upstream work that no producer is enqueuing. Net LOC: 29 ins / 31 del. Tests: workspace passes (lockstep app tests, kernel 127/127, hir 288/288, scheduler 38/38). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 21:19:52 +02:00
MechaCat02	8683fb59ed	[2.BF] Silph WorkerCtx: synthesize sub-object + vtable at [+0x2C] Audit-059 round 19 isolated the round-18 worker fault: the four silph::WorkerCtx worker bodies all execute the sequence `lwz r3, 44(rN)` (r3 = [ctx+0x2C], sub-object pointer); `lwz r11, 0(r3)` (r11 = sub-object vtable); `lwz r11, 60(r11)` (r11 = sub-object vtable[15]); `mtctr r11`; `bctrl`. Ours left [ctx+0x2C] NULL → PC=0 fault on first virtual dispatch. Round 19 recommended materialising a sub-object whose vtable points entirely at an existing trivial-return stub so workers idle live, returning NULL work, without crashing. Changes (silph_synth.rs only, +63/-6): - Grow SILPH_CTX_SIZE 0x500 → 0x800 to embed sub-object at +0x300 and a 32-slot sub-vtable at +0x500 in the same heap_alloc. - After ctx header init, write sub-object pointer at [ctx+0x2C], the XEX-resident wrapper constant 0xBE568F00 (round-7 finding) at [ctx+0x30], and leave [ctx+0x28] NULL (matches canary first-fire snapshot). - Populate every slot of the 32-entry sub-vtable with VA 0x8216CAA4, the first 4-byte-aligned standalone `li r3, 0; blr` stub located by a fresh PE-text scan (preceded by a `blr` terminating the previous function). - Sub-object body itself is zero-filled apart from the [+0]=vtable_ptr write; round-19 disassembly confirms workers only touch slots 15/17. Smoke (XENIA_SILPH_SYNTH=1, persistent cache, 5e7 instr): - Lockstep: no crash, all 4 workers (tid=6/7/8/9) reach Ready in deep worker-body PCs (0x825067xx/0x825089xx/0x825091xx). Verdict (D): workers run their idle loop returning NULL; existing silph waiters (0x1020, 0x1090) remain <NO_SIGNALS_DESPITE_WAITS> because we deliberately neutered productive work. - Parallel: identical picture, no PC=0/PC=garbage fault anywhere. No regression in 765-test suite. 
Next round: feed real work-items into the intrusive ring at ctx+0x210 so workers' returned-NULL idle becomes returned-work productive; or discover which sub-vtable slots actually need real callees (slot 15 worker drain, slot 17 producer). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 21:04:04 +02:00
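A minimal sketch of the worker dispatch chain and the round-19 synthesis, modelled on a toy word-granular guest memory in which unmapped reads return 0. That toy behaviour is an assumption that reproduces the PC=0 target seen with [ctx+0x2C] left NULL. `GuestMem`, the helper names and the ctx VA used in the usage check are hypothetical. The offsets (+0x2C, +0x300, +0x500, slot 15 at byte 60) and the stub VA come from the commit.

```rust
use std::collections::HashMap;

/// Toy guest memory: one u32 per word address, unmapped reads return 0.
struct GuestMem(HashMap<u32, u32>);

impl GuestMem {
    fn read_u32(&self, va: u32) -> u32 {
        *self.0.get(&va).unwrap_or(&0)
    }
    fn write_u32(&mut self, va: u32, v: u32) {
        self.0.insert(va, v);
    }
}

/// Mirrors: lwz r3,44(rN); lwz r11,0(r3); lwz r11,60(r11); mtctr r11; bctrl.
/// Returns the bctrl target (0 = the PC=0 fault).
fn worker_dispatch_target(mem: &GuestMem, ctx: u32) -> u32 {
    let sub_object = mem.read_u32(ctx.wrapping_add(0x2C)); // lwz r3, 44(rN)
    let vtable = mem.read_u32(sub_object); // lwz r11, 0(r3)
    mem.read_u32(vtable.wrapping_add(60)) // lwz r11, 60(r11): slot 15
}

/// VA of the `li r3, 0; blr` stub found by the PE-text scan.
const LI_R3_0_BLR_STUB: u32 = 0x8216_CAA4;

/// Round-19 synthesis: sub-object at ctx+0x300, 32-slot sub-vtable at ctx+0x500.
fn synthesize_sub_object(mem: &mut GuestMem, ctx: u32) {
    let sub_object = ctx + 0x300;
    let sub_vtable = ctx + 0x500;
    mem.write_u32(ctx + 0x2C, sub_object);
    mem.write_u32(sub_object, sub_vtable); // [+0] = vtable_ptr
    for slot in 0..32 {
        mem.write_u32(sub_vtable + slot * 4, LI_R3_0_BLR_STUB);
    }
}
```

Round 21 later replaces the stub table with canary's real sub-vtable VA, so in this model only the value written at `[sub_object + 0]` would change.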
MechaCat02	b5885b8560	[2.BF] Synthetic silph::WorkerCtx spawn (round 18 — opt-in landing) Adds infrastructure to synthesise the silph::WorkerCtx that AUDIT-058/059 identified as never reached by ours' static-init chain (real chain entry sits in audit-059 round 9's wrong-vtable wedge at sub_82172BA0+0x1E8). Ctx layout follows round 5's live hexdump from canary: +0x00 vtable = 0x8200A1E8 +0x04 self +0x08 intrusive list head -> self +0x0C init flag = 1 +0x10 packed byte field +0x18 2x float ~1.0 (UI rates) +0x24 flag = 1 +0x28..+0x30 3x foreign-arena pointers (left NULL — see below) +0x54..+0x84 4x X_KEVENT auto-reset, state=0 +0x94..+0xC4 4x X_KEVENT manual-reset, state=1 (pre-signaled) +0x210..+0x250 4-entry intrusive work-ring, empty Worker spawn mirrors AUDIT-048's audio-worker pattern in xaudio_register_render_driver: per-worker allocate_thread_image + state.scheduler.spawn with r3 = ctx_ptr. Trigger fires at the first dat/* VFS open (ours' earliest is dat/files.tbl), which is when canary runs the equivalent chain. ROUND 18 OUTCOME — opt-in only: With workers spawned Ready (XENIA_SILPH_SYNTH=1), boot CRASHES at cycle ~5.5M with PC=0 on hw=1, just after worker_3 (entry 0x825065B8) spawns. Per task constraints this is STOP-and-report: the ctx fields +0x28/+0x2C/+0x30 (foreign heap pointers — canary's 0x30057018, 0xBCE25640, 0xBE568F00, distinct arenas per audit-059 round 7) are left NULL, and the worker bodies plausibly dereference one of them. Synthesising those is a fresh investigation (round 19+). With workers spawned Suspended (XENIA_SILPH_SYNTH=suspend), boot completes normally (11 spawns, VdSwap=1, KeSetEvent=2, KeReleaseSemaphore=1 — matches default baseline). The ctx remains materialised in guest memory at the logged VA for downstream probing. Default (env var unset): no synth, no regression. 
Files: crates/xenia-kernel/src/silph_synth.rs (new, 225 LOC) crates/xenia-kernel/src/lib.rs (+1 LOC, register module) crates/xenia-kernel/src/exports.rs (+37 LOC, hook in open_vfs_file) crates/xenia-kernel/src/state.rs (+18 LOC, 4 silph_synth_* fields) Tests: cargo test --release --workspace = 765 pass / 0 fail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 20:44:29 +02:00
MechaCat02	9340ff4592	[Audit] --audit-r3-dump-bytes: dump N bytes at r3 when probe fires AUDIT-059 round 15 — diagnostic. When `--audit-r3-dump-bytes=N` is set, every `--audit-pc-probe-hex` fire emits a paired `AUDIT-R3-DUMP` line with N bytes of guest memory from r3 as u32 lanes (4-byte aligned, cap 256B). Sized for the 80-byte stack-local struct at sub_82452DC0's `r31+96` (probe sub_8245B000 entry where r3 IS the struct ptr). Settable via `XENIA_AUDIT_R3_DUMP_BYTES` env. Read-only; lockstep digest unaffected (empty-set fast path in fire_audit_pc_probe_if_match). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 19:39:22 +02:00
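A minimal sketch of turning "N bytes at r3" into u32 lanes with the 256-byte cap. The commit says only "4-byte aligned, cap 256B", so the exact policy here is an assumption: the start is rounded down to a 4-byte boundary, and the byte count is capped and then rounded up to whole lanes. The output line format is also hypothetical.

```rust
const R3_DUMP_CAP: u32 = 256;

/// u32 lanes covering `n` bytes from `r3` (start rounded down to 4 bytes,
/// count capped at 256 bytes, rounded up to whole lanes).
fn r3_dump_lanes(read_u32: impl Fn(u32) -> u32, r3: u32, n: u32) -> Vec<u32> {
    let start = r3 & !3;
    let bytes = n.min(R3_DUMP_CAP);
    let lanes = (bytes + 3) / 4;
    (0..lanes).map(|i| read_u32(start.wrapping_add(i * 4))).collect()
}

/// Hypothetical rendering of one AUDIT-R3-DUMP line.
fn format_dump(r3: u32, lanes: &[u32]) -> String {
    let words: Vec<String> = lanes.iter().map(|w| format!("{:08X}", w)).collect();
    format!("AUDIT-R3-DUMP r3=0x{:08X} {}", r3, words.join(" "))
}
```

For the 80-byte struct at `r31+96`, `--audit-r3-dump-bytes=80` corresponds to 20 lanes under this policy.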
MechaCat02	bcd018659b	[Audit] --audit-mem-dump-chain: deref a guest address N levels for diagnosis Round-14 of AUDIT-2BF (singleton-dump). The bctrl at sub_822F1AA8+0x90 (PC 0x822F1B4C) loads [0x828E1F08] (a global singleton), dereferences its vtable, and indirect-calls vtable[0]. Canary returns; ours hangs. To name the resolved target we need to dump the (singleton, vtable, vtable[0]) chain on probe firing. Adds `--audit-mem-read-hex` / `XENIA_AUDIT_MEM_READ` taking a single guest VA. When set and any `--audit-pc-probe-hex` PC fires, the kernel emits a paired `AUDIT-MEM-READ` line with three guest reads: AUDIT-MEM-READ addr=0x828E1F08 val=<addr> vtable=<addr> \ vtable[0]=<addr+0> vtable[24]=<*addr+24> ... `vtable[24]` is included as the slot-6 method (audit-059 round 9 documented the canary silph chain dispatching slot 6 of a vtable here). Read-only; lockstep digest unaffected. ~30 LOC across state.rs and main.rs. `cmd_check` opts out of the flag (same policy as the existing audit_pc_probe_hex). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 12:13:42 +02:00
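A minimal sketch of the (singleton, vtable, vtable[0], vtable[24]) chain. Following the commit's note, `vtable[24]` is read here as a byte offset, so it is slot 6 (6 × 4). The struct, helper and line format are hypothetical, and the reader closure stands in for guest memory.

```rust
/// Reads taken when the probe fires.
struct MemReadChain {
    addr: u32,
    val: u32,    // [addr]: the singleton pointer
    vtable: u32, // [val]: its vtable
    slot0: u32,  // [vtable + 0]: the method the bctrl calls
    slot6: u32,  // [vtable + 24]: slot 6
}

fn mem_read_chain(read_u32: impl Fn(u32) -> u32, addr: u32) -> MemReadChain {
    let val = read_u32(addr);
    let vtable = read_u32(val);
    MemReadChain {
        addr,
        val,
        vtable,
        slot0: read_u32(vtable),
        slot6: read_u32(vtable.wrapping_add(24)),
    }
}

impl MemReadChain {
    /// Hypothetical rendering of the AUDIT-MEM-READ line.
    fn to_line(&self) -> String {
        format!(
            "AUDIT-MEM-READ addr=0x{:08X} val=0x{:08X} vtable=0x{:08X} vtable[0]=0x{:08X} vtable[24]=0x{:08X}",
            self.addr, self.val, self.vtable, self.slot0, self.slot6
        )
    }
}
```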
MechaCat02	09e59e09b7	Audit-2BF.delta: add --audit-pc-probe-hex for silph-init bctrl probe Adds a per-PC probe analogous to --lr-trace / --branch-probe but tuned for the silph init chain's virtual-dispatch site at sub_82172BA0+0x1E8 (PC 0x82172D88, the bctrl after a 3-deep `lwz` chain that loads vtable slot 6). Each fire emits one AUDIT-PC-PROBE line with (pc, tid, hw, cycle, lr, r3, r11) plus four guest-memory dereferences off r3 (the vtable, slot-6 method pointer, auxiliary handle field, and embedded sub-object vtable), so the line can be compared head-to-head with canary's round-9 capture (r3=0xBCCC52C0, [r3+0]=0x820A3644, slot6=sub_821B55D8, [r3+0xC]=0xF80000D8, [r3+0x30]=0x820A1870) to identify whether ours dispatches to the wrong vtable on a correct object (case A) or to a wrong object entirely (case B). Why this addition rather than reuse of an existing probe: --lr-trace emits JSONL designed for canary-side diffing and only captures r3/r4/r5/r6/lr (no memory dereferences); --branch-probe captures CR flags and lr but again no memory; --ctor-probe is single-shot per PC and walks the stack back-chain. None of them load the four indirect fields needed to identify a vtable-shape divergence. Implementation: - state.rs: new HashSet<u32> field `audit_pc_probe_pcs` and helper `fire_audit_pc_probe_if_match(hw_id, mem)`. Empty-set fast-path keeps the cost to one is_empty() check per worker_prologue call when the flag is unused. Read-only: no guest state mutation, lockstep digest unchanged. - main.rs: new CLI flag --audit-pc-probe-hex with bare-hex comma parsing (tolerates `0x` prefix), settable also via XENIA_AUDIT_PC_PROBE env var. Threaded through cmd_exec_inner; cmd_check passes None so check digests are unaffected. Probe wired into worker_prologue alongside fire_ctor_probe / fire_branch_probe / fire_lr_trace. 
Like its siblings, it fires once per basic-block entry — known limitation (audit-045 reading-error class 13); use a block-entry PC if probing a mid-block instruction. Verification: kernel 127/127, app 5/5 non-ignored, no behaviour change with empty flag. Cross-references audit-059 round 9's canary capture and lays the groundwork for the round-10 ours-side comparison. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 10:59:03 +02:00
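A minimal sketch of the flag parsing and the empty-set fast path described above. The function names are hypothetical. The parsing accepts bare hex with an optional `0x` prefix per comma-separated entry, and the match is checked against a basic-block entry PC, in keeping with the once-per-block-entry limitation.

```rust
use std::collections::HashSet;
use std::num::ParseIntError;

/// Parses "82172D88,0x822F1B4C"-style input: comma-separated bare hex,
/// each entry optionally prefixed with `0x`.
fn parse_probe_pcs(arg: &str) -> Result<HashSet<u32>, ParseIntError> {
    arg.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| u32::from_str_radix(s.strip_prefix("0x").unwrap_or(s), 16))
        .collect()
}

/// Fast path: when the flag is unused the set is empty, so the per-call cost
/// is one is_empty() check. Only block-entry PCs ever reach this check.
fn should_fire(pcs: &HashSet<u32>, block_entry_pc: u32) -> bool {
    !pcs.is_empty() && pcs.contains(&block_entry_pc)
}
```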
MechaCat02	5a8fe21ad5	Iterate-2.BF.γ: refine is_in_callback gate to per-thread exclusion Lockstep vsync delivery was capped at 54/run despite the ticker firing 333 periods and dispatcher being called 1.2M times. Root cause: the blanket `is_in_callback()` gate skipped dispatch entirely whenever the async audio path held `interrupts.saved`, which is essentially the entire boot (audio worker rarely hits its LR_HALT_SENTINEL between back-to-back callbacks). 5.85M dispatch_skip_in_callback events drowned out the 55 with-pending windows. Graphics dispatch (iterate-2.BE) runs the ISR synchronously and restores the borrowed context before returning — it doesn't touch `interrupts.saved`. The only real conflict is if graphics picks the same thread audio borrowed (which would stomp audio's SavedCallbackCtx). Replace the blanket gate with per-thread exclusion: when audio is mid-flight, exclude only its `injected_ref` from victim selection. Falls through to the existing no-victim drop if that's the only candidate. Lockstep (50M instr): gpu.interrupt.delivered{source=0} 54 → 295 (5.5×), all 333 ticker periods either delivered or unarmed (no more queue_full_drops). Wallclock unchanged ~3 s. Parallel (30M instr): 1193 → 3458 baseline lift (2.9×), no regression. Tests: xenia-kernel 127/127, xenia-app 5/5 non-ignored. Lockstep goldens will drift (interrupts.delivered is in the digest); deferred to next iterate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-06 19:52:16 +02:00
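A minimal sketch of graphics victim selection with the per-thread exclusion replacing the blanket gate. The types are hypothetical. The selection order follows the shape described for iterate-2.BE: Ready preferred, else Blocked, never Idle/Exited/ServicingIrq. The only change is skipping the thread audio currently has borrowed, and `None` means the existing no-victim drop.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum ThreadState {
    Ready,
    Blocked,
    Idle,
    Exited,
    ServicingIrq,
}

#[derive(Clone, Copy, PartialEq, Debug)]
struct ThreadRef(u32);

/// Ready first, then Blocked; excludes only the audio path's injected thread
/// instead of skipping dispatch whenever audio is mid-callback.
fn pick_victim(
    threads: &[(ThreadRef, ThreadState)],
    audio_borrowed: Option<ThreadRef>,
) -> Option<ThreadRef> {
    let first_in = |want: ThreadState| {
        threads
            .iter()
            .find(|(t, s)| *s == want && Some(*t) != audio_borrowed)
            .map(|(t, _)| *t)
    };
    first_in(ThreadState::Ready).or_else(|| first_in(ThreadState::Blocked))
}
```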
MechaCat02	51489e34db	Iterate-2.BE Path β: tick vsync from coord_idle_advance The iterate-2.BE host-driven synchronous ISR dispatcher relies on something queueing v-syncs. In lockstep that's `tick_vsync_instr`, called from `coord_pre_round` per round. If the scheduler stalls into `coord_idle_advance` (no Ready threads), the instruction counter freezes — the accumulator stops incrementing, the ticker stops queueing, and the dispatcher is left starved for the duration of the idle wait. Tick `tick_vsync_wallclock` at the top of `coord_idle_advance` so v-syncs keep firing on host time even when the guest scheduler is parked. The dispatcher in the outer loop drains whatever we queue on the next iteration. Same MMIO `D1MODE_VBLANK_VLINE_STATUS` bit-set as the production path. Note: empirically in Sylpheed at 50M/500M instruction horizons, `coord_idle_advance` is never reached (tids 9/10/12 stay Ready through the early-boot deadlock), so this commit doesn't move `gpu.interrupt.delivered{source=0}` off 54 for this title at these horizons. It is the correct fix for the documented starvation pattern and will activate as soon as the kernel reaches a state where Ready threads drop to zero with timers/waits pending. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-06 19:22:03 +02:00
MechaCat02	9a93152981	Iterate-2.BE: host-driven synchronous graphics ISR delivery Replaces the victim-thread-mutate-then-wait scheme for vsync / CP interrupts with synchronous in-line dispatch on the coordinator host thread. Mirrors canary's EmulateCPInterruptDPC -> Processor::Execute path (kernel_state.cc:1370, processor.cc:413): pick a guest thread, borrow its PpcContext, jam ISR PC + args in, run the interpreter inline until LR_HALT_SENTINEL, restore the borrowed context. Why: audit-059 measured gpu.interrupt.delivered{source=0} = 54 over 3.9 s vs canary's 4712 over 30 s. Per-second shortfall ~11×. Old asynchronous LR-sentinel injection (try_inject_graphics_interrupt) needed a Ready or Blocked guest thread to land on; once the Sylpheed main thread and worker threads all idled post-boot, no victim was available and every queued vsync got dropped. Host-driven dispatch decouples delivery from guest-thread readiness. Smoke test (lockstep): unchanged 54 — under current Sylpheed boot trajectory the ticker is gated by guest-instruction progress, not victim availability; lockstep stalls into idle-advance after ~5M instructions of real work and the synthetic tick_vsync_instr stops firing. Under --parallel (wallclock ticker) gpu.interrupt.delivered climbs to ~1131 over a 128 s run, confirming the synchronous dispatcher itself works as intended. Architectural piece is now in place; raising the lockstep delivery rate requires ticking the synthetic vsync inside coord_idle_advance, which is a separate change. Changes: - crates/xenia-kernel/src/interrupts.rs: doc-comment update only. SavedCallbackCtx + CALLBACK_STACK_PAD retained — the audio callback path (audit-048) still uses the asynchronous LR-sentinel inject on a dedicated per-client worker. - crates/xenia-app/src/main.rs: * dispatch_graphics_interrupts(kernel, mem, &mut stats, &mut decode_cache, thunk_map): new fn. Drains the full FIFO per call. 
Victim selection same shape (Ready preferred, else Blocked, skip Idle/Exited/ServicingIrq), but the call is synchronous - we run step_cached + import-thunk dispatch inline on the borrowed ctx until pc == LR_HALT_SENTINEL. MAX_INSTRS_PER_ISR = 1M safety budget. * coord_pre_round: graphics-IRQ injection call removed. Audio path unchanged (still calls try_inject_audio_callback). * run_execution + run_execution_parallel: each now owns a persistent isr_decode_cache and calls dispatch_graphics_interrupts after coord_pre_round. * try_inject_graphics_interrupt: deleted (118 LOC). No new public APIs, no new dependencies, no changes to xenia-cpu. Tests: workspace 765 passed / 0 failed / 4 ignored (parallel_stress + sylpheed_n50m, all gated). Kernel 127/127, app 5/5, cpu 288/288. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-06 18:58:40 +02:00
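A minimal sketch of the borrow/run/restore pattern behind the synchronous dispatcher. `PpcContext`, the step closure and the sentinel value are hypothetical stand-ins: the real `LR_HALT_SENTINEL` value is not given in this commit, and passing the ISR arguments in r3/r4 is an assumption. The 1M budget matches `MAX_INSTRS_PER_ISR`. The borrowed context is restored whether the ISR returns or exhausts its budget.

```rust
/// Hypothetical sentinel value; the ISR's final blr lands here.
const LR_HALT_SENTINEL: u32 = 0xBCBC_BCBC;
const MAX_INSTRS_PER_ISR: u64 = 1_000_000;

#[derive(Clone, Debug, PartialEq)]
struct PpcContext {
    pc: u32,
    lr: u32,
    gpr: [u64; 32],
}

#[derive(Debug, PartialEq)]
enum IsrOutcome {
    Completed { instrs: u64 },
    BudgetExceeded,
}

/// Borrow the victim's context, jam ISR PC + args in, step inline until
/// pc == LR_HALT_SENTINEL (or the budget runs out), then restore.
fn run_isr_on_borrowed(
    ctx: &mut PpcContext,
    isr_pc: u32,
    args: [u64; 2],
    mut step: impl FnMut(&mut PpcContext),
) -> IsrOutcome {
    let saved = ctx.clone();
    ctx.pc = isr_pc;
    ctx.lr = LR_HALT_SENTINEL;
    ctx.gpr[3] = args[0]; // assumption: ISR args in r3/r4
    ctx.gpr[4] = args[1];
    let mut instrs = 0;
    let outcome = loop {
        if ctx.pc == LR_HALT_SENTINEL {
            break IsrOutcome::Completed { instrs };
        }
        if instrs >= MAX_INSTRS_PER_ISR {
            break IsrOutcome::BudgetExceeded;
        }
        step(&mut *ctx);
        instrs += 1;
    };
    *ctx = saved; // the victim never observes the ISR
    outcome
}
```

Because the dispatcher runs on the coordinator thread and does not wait for the victim to be scheduled, delivery no longer depends on a guest thread becoming Ready. That decoupling is the point of the change.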
MechaCat02	ac2f89a7bb	Re-baseline sylpheed_n50m golden post-AUDIT-054 instructions: 50000002 → 50000001 (1-instr shift from FILE_DIRECTORY_FILE plumbing on NtCreateFile path; all other digest fields unchanged — imports/swaps/draws/render-targets/shaders/textures all match prior golden). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 18:11:11 +02:00
MechaCat02	2a8ff9515d	AUDIT-054: thread CreateOptions through NtCreateFile + opt-in cache persistence Track A — FILE_DIRECTORY_FILE handling. NtCreateFile's 9th parameter `create_options` (sp+0x54 per shim_utils.h:49-50) is now read and forwarded to open_vfs_file/open_cache_file. When the FILE_DIRECTORY_FILE bit (0x1) is set on a `cache:\<hash>` path, the host-side handler `mkdir -p`s instead of `File::create`'ing a 0-byte sentinel that blocked subsequent hierarchical creates of `cache:\<hash>\<sub>\<leaf>` with NAME_COLLISION. Confirmed by `opts=0x4021` (incl. FILE_DIRECTORY_FILE) on `cache:\d4ea4615` and `opts=0x4020` (no DIR bit) on the leaf `.tmp` files. NtOpenFile forwards `open_options` (r8) into the same slot per xboxkrnl_io.cc:118-122. Closes the AUDIT-053 ζ-class VFS layout aliasing wedge. Track B — opt-in persistent cache root. AUDIT-038's per-process tmpdir + wipe stays the default (preserves lockstep/oracle determinism + dodges Sylpheed's `<hash>.tmp` journal-append-on- reboot self-inconsistency). Persistence is now opt-in via * `XENIA_CACHE_ROOT=<path>` — explicit path (caller manages wiping); hands a stable place to drop a canary-built cache for cascade A/B oracle work. * `XENIA_CACHE_PERSIST=1` — `$XDG_DATA_HOME/xenia-rs/cache` (or `$HOME/.local/share/xenia-rs/cache`). Cold-start (-n 500M, default tmpfs) with FILE_DIRECTORY_FILE fix: swaps=1 draws=0 imports=40454 cxx_throw=0 — matches master baseline, no regression. Cache hierarchy now mkdir-p'd correctly: `cache:/` contains 9 hash dirs (e.g. `d4ea4615/e/`, `aab216c3/5/`) instead of the 0-byte sentinel files AUDIT-053 found masquerading as directories. LOC: +88 / -14 = +74 net (≤80 budget). All 127 xenia-kernel unit tests pass. 
Trace: audit-runs/audit-054-vfs-layout-fix/ cold-start-digest.json + warm-start-digest.json (defaults) persist-cold-digest.json + persist-warm-digest.json (opt-in) baseline-master-digest.json (master `25704c5` reference) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 18:11:04 +02:00
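A minimal host-side sketch of Track A and Track B. The function names are hypothetical and these are not the actual open_cache_file code. For a directory create, it does `mkdir -p` (`std::fs::create_dir_all`) when the FILE_DIRECTORY_FILE bit (0x1) is set, and `File::create` otherwise. The cache-root helper follows the two env options. Giving `XENIA_CACHE_ROOT` precedence over `XENIA_CACHE_PERSIST` is an assumption, and `None` stands for the default per-process tmpdir, which is not modelled.

```rust
use std::fs::{self, File};
use std::io;
use std::path::{Path, PathBuf};

const FILE_DIRECTORY_FILE: u32 = 0x1;

/// Directory creates become `mkdir -p` instead of a 0-byte sentinel file.
fn create_cache_entry(host_path: &Path, create_options: u32) -> io::Result<()> {
    if create_options & FILE_DIRECTORY_FILE != 0 {
        fs::create_dir_all(host_path)
    } else {
        File::create(host_path).map(|_| ())
    }
}

/// XENIA_CACHE_ROOT (explicit) or XENIA_CACHE_PERSIST=1 (XDG/HOME fallback).
fn cache_root(
    explicit: Option<&str>,
    persist: bool,
    xdg_data_home: Option<&str>,
    home: Option<&str>,
) -> Option<PathBuf> {
    if let Some(p) = explicit {
        return Some(PathBuf::from(p));
    }
    if !persist {
        return None; // default: per-process tmpdir + wipe
    }
    xdg_data_home
        .map(|x| PathBuf::from(x).join("xenia-rs/cache"))
        .or_else(|| home.map(|h| PathBuf::from(h).join(".local/share/xenia-rs/cache")))
}
```

With `opts=0x4021` on `cache:\d4ea4615`, the hash directory is created as a real directory, so the later `opts=0x4020` leaf create beneath it succeeds instead of colliding with a sentinel file.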
MechaCat02	25704c5811	Re-baseline sylpheed_n50m golden post-AUDIT-032 Companion to `49f3eaf` (AUDIT-032 dedicated audio worker). With the audio callback ticker now on by default, the boot trajectory at 50M instr changes: instructions 50000009 -> 50000002 (interpreter stop boundary shift) imports 407215 -> 40454 (-90% — left audio-wait busy loop) swaps 2 -> 1 (degenerate splash repeat lost; main thread advances past splash) draws 0 -> 0 (audio gate != renderer gate per AUDIT-032 methodology correction) The 10x imports drop reflects exiting the NtWaitForSingleObjectEx busy-wait pattern (1.49M -> 30 calls per audit-runs/audit-048-*). Boot now reaches Stfs/Xam content/crypto init phase. The single remaining swap is the first splash; main thread is then blocked on a different handle (0x1280) for follow-up. sylpheed_n2m unchanged — at 2M instr the audio worker hasn't fired yet, so the digest is byte-identical pre/post AUDIT-032. Verified deterministic via two consecutive --expect runs at the new digest (cargo test -p xenia-app --test sylpheed_oracles -- --ignored passes in 2.82s). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 15:07:40 +02:00
MechaCat02	49f3eafa15	AUDIT-032: dedicated audio worker thread per client (Plan B) Replaces APUBUG-PRODUCER-001's random-victim-hijack audio injection with a dedicated per-client guest worker thread, mirroring xenia-canary's apu/audio_system.cc:84-159 WorkerThreadMain pattern in xenia-rs's threading model. Audio callback ticker is now safe to enable by default. ## What changed - xenia-kernel/src/xaudio.rs: new XAudioState fields worker_handles + worker_refs (one slot for each of the XAUDIO_MAX_CLIENTS=8 clients). Synthetic park-handle helper (0xF000_0000 | client_idx), outside the normal alloc range so wake_eligible_waiters never finds it; the only legitimate state-flip is via try_inject_audio_callback. - xenia-kernel/src/exports.rs: xaudio_register_render_driver spawns a 64KB-stack guest thread (create_suspended=true) via state.scheduler.spawn after registration succeeds. Immediately flips the spawned thread's state from Blocked(Suspended) to Blocked(WaitAny[synthetic]) so it's parked but not woken. Stores the kernel handle so find_by_handle resolves a fresh ThreadRef after slot compaction. Failure paths log + leave xaudio.worker_refs[i] = None, in which case the ticker drops fires (no random-victim fallback). - xenia-app/src/main.rs: try_inject_audio_callback resolves the worker via worker_handles[index] instead of scanning runqueues for a Ready or Blocked victim. The PC+r3 injection and SavedCallbackCtx capture are unchanged; the existing LR_HALT restore path re-blocks the worker on its synthetic handle for the next tick. Flag handling reworked: --xaudio-tick / XENIA_XAUDIO_TICK now act as an explicit override (truthy = force on, falsey = force off, absent = use the KernelState default). - xenia-kernel/src/state.rs: xaudio_tick_enabled default flipped from false to true. Pre-fix it was off because the random-victim hijack regressed swaps=2->1; with the dedicated worker that whole class of regression is gone. 
## Cascade verification at -n 500M (audit-runs/audit-048-audio-host-pump/) Pre-fix baseline: audit-runs/audit-047-gamma-wedges/ours-end-state.log.
| Dim | Predicted (AUDIT-032) | Observed |
|-----|-----------------------|----------|
| A | tid=9 leaves Blocked[0x828A3254] | Ready @ pc=0x824d1404 |
| B | tid=10 leaves Blocked[0x828A3230] | Ready @ same pc/lr |
| C | XAudioSubmitRenderDriverFrame > 0 | Mixer setup path executed |
| D | KeReleaseSemaphore 0 -> non-zero | 0 -> 1; xaudio.callback.delivered=1 |
Bonus: audit-042's tid=6 worker pair on 0x10A0+0x10A4 also went Blocked->Ready as a downstream effect. Boot trajectory shifted significantly: NtWaitForSingleObjectEx 1,489,791 -> 30; NtSetEvent 3,334 -> 68; new exports firing (StfsCreateDevice, ObCreateSymbolicLink, XamContentCreateEnumerator, XamEnumerate, XamTaskSchedule, ExCreateThread x10, KeSetAffinityThread x7, NtCreateSemaphore x4, NtWaitForMultipleObjectsEx x94, NtDuplicateObject x14, XeCryptSha, XeKeysConsolePrivateKeySign). The system left the audio-wait busy loop and entered the savegame/content/crypto init phase. swaps regressed 2 -> 1 (degenerate splash repeat lost; main thread now advances past splash entirely, blocked on a different handle). draws unchanged at 0, as expected per AUDIT-032 (audio gate != renderer gate). ## Tests + scope - cargo build --release succeeds, no new warnings. - cargo test -p xenia-kernel --lib: 127/127 pass (incl. xaudio). - cargo test -p xenia-app --lib: 5/5 non-ignored pass. - Lockstep goldens (sylpheed_n2m / sylpheed_n50m) WILL drift on this fix and need re-baselining as a follow-up commit. 75 net non-comment LOC across 4 files, within AUDIT-032's 60-120 LOC budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 15:06:25 +02:00
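A minimal sketch of the two small policies in this commit: the synthetic park handle per client and the tick-flag override. The function names are hypothetical. The exact truthy spellings accepted are an assumption ("1", "true", "on" and "yes" here). In this sketch anything else that is present counts as falsey, and an absent flag falls back to the KernelState default, which is now `true`.

```rust
const XAUDIO_MAX_CLIENTS: u32 = 8;
const PARK_HANDLE_BASE: u32 = 0xF000_0000;

/// Synthetic park handle for client `idx`, outside the normal allocation
/// range so no regular wake path ever matches it.
fn audio_park_handle(client_idx: u32) -> Option<u32> {
    (client_idx < XAUDIO_MAX_CLIENTS).then(|| PARK_HANDLE_BASE | client_idx)
}

/// --xaudio-tick / XENIA_XAUDIO_TICK: truthy forces on, other values force
/// off, absent uses the KernelState default. Truthy spellings are assumed.
fn xaudio_tick_enabled(flag: Option<&str>, default: bool) -> bool {
    match flag.map(|s| s.trim().to_ascii_lowercase()) {
        None => default,
        Some(v) if matches!(v.as_str(), "1" | "true" | "on" | "yes") => true,
        Some(_) => false,
    }
}
```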
MechaCat02	e428ce33aa	M9.5 + M11.5 + VMX + SJIS/UTF-8: close the post-M5.5 deferred set Closes the four remaining deferred follow-up items in one bundle. All four are smaller-scope and additive; lockstep determinism unaffected (analyzer-only changes). ## M9.5 — __CxxFrameHandler scope-table parsing - New `xenia_analysis::eh_scope` module. Magic-scans .rdata for the three documented MSVC FuncInfo signatures (0x19930520/21/22) on 4-byte alignment. Each match is parsed as the documented struct (BE u32 fields), with sanity caps on max_state / n_try_blocks / pointer validity. - Walks pUnwindMap (UnwindMapEntry, 8 bytes) and pTryBlockMap (TryBlockMapEntry, 20 bytes) into one row each. - New tables eh_funcinfo, eh_unwind_map, eh_try_blocks. - Sylpheed yield: 2,588 FuncInfo (all version 0x19930522) / 10,019 unwind entries / 315 try-blocks. ## M11.5 — Static-init driver chain detection - New `xenia_analysis::static_init` module. Walks every function looking for the canonical _initterm loop: lwz cursor; mtctr; bcctrl; addi cursor, cursor, 4 bounded by a compare against another constant register. Extracts (array_start, array_end) and reads the array. - Reuses `function_pointer_arrays` table — drivers' arrays land with kind='static_init' (replacing M11's prologue-heuristic output where the structurally-grounded pattern fires). - Sylpheed yield: 0 drivers detected — the binary's static-init structure does not match the canonical CRT loop. Infrastructure ready; future M11.6 can relax. ## VMX vector-store xrefs (M6 follow-up) - Adds AltiVec/VMX X-form load/store XOs to the M6 opcode-31 dispatch: lvx/lvxl/lvebx/lvehx/lvewx (reads) and stvx/stvxl/stvebx/stvehx/stvewx (writes), all addr_mode= 'x_form_indexed'. Static resolution still requires both rA and rB constant. - Sylpheed yield: 110 newly-detected stvx writes. 
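The M9.5 magic scan can be sketched as follows: a 4-byte-aligned walk over .rdata that matches the three FuncInfo magics and reads the leading fields as big-endian u32s. The field order (magic, maxState, pUnwindMap, nTryBlocks, pTryBlockMap) follows the documented MSVC FuncInfo struct. The cap values and the `in_image` pointer-validity predicate are illustrative assumptions, because the commit does not state the real limits.

```rust
const FUNCINFO_MAGICS: [u32; 3] = [0x1993_0520, 0x1993_0521, 0x1993_0522];
// Illustrative sanity caps (assumed values, not taken from xenia-rs).
const MAX_STATE_CAP: i32 = 4096;
const MAX_TRY_BLOCKS_CAP: u32 = 1024;

#[derive(Debug, PartialEq, Eq)]
struct FuncInfoHeader {
    address: u32,
    magic: u32,
    max_state: i32,
    p_unwind_map: u32,
    n_try_blocks: u32,
    p_try_block_map: u32,
}

fn be_u32(b: &[u8], off: usize) -> Option<u32> {
    let s = b.get(off..off + 4)?;
    Some(u32::from_be_bytes([s[0], s[1], s[2], s[3]]))
}

/// Scan `rdata` (mapped at `base`) for plausible FuncInfo headers.
fn scan_funcinfo(rdata: &[u8], base: u32, in_image: impl Fn(u32) -> bool) -> Vec<FuncInfoHeader> {
    let mut out = Vec::new();
    let mut off = 0usize;
    // The five header fields read below span 20 bytes.
    while off + 20 <= rdata.len() {
        let field = |i: usize| be_u32(rdata, off + 4 * i).expect("bounded by loop condition");
        let magic = field(0);
        if FUNCINFO_MAGICS.contains(&magic) {
            let max_state = field(1) as i32;
            let p_unwind_map = field(2);
            let n_try_blocks = field(3);
            let p_try_block_map = field(4);
            // Reject implausible counts and dangling table pointers.
            let sane = (0..=MAX_STATE_CAP).contains(&max_state)
                && n_try_blocks <= MAX_TRY_BLOCKS_CAP
                && (max_state == 0 || in_image(p_unwind_map))
                && (n_try_blocks == 0 || in_image(p_try_block_map));
            if sane {
                out.push(FuncInfoHeader {
                    address: base + off as u32,
                    magic,
                    max_state,
                    p_unwind_map,
                    n_try_blocks,
                    p_try_block_map,
                });
            }
        }
        off += 4; // 4-byte alignment only
    }
    out
}
```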
## Shift_JIS + UTF-8 localised-string detection (M7 follow-up) - Extends `xenia_analysis::strings::analyze` with scan_shift_jis (JIS X 0208 lead/trail byte ranges + half-width katakana pass-through) and scan_utf8 (2- and 3-byte sequences). At least one multi-byte unit required so pure-ASCII strings aren't double-counted. - SJIS bytes rendered as \xHH escapes for diagnostic readability; full SJIS→UTF-8 decoding deferred. - Sylpheed yield: 790 Shift_JIS strings (Japanese debug + UI text) + 39 UTF-8. ## Tests - +2 EH (parses_minimal_funcinfo_v0, rejects_bogus_max_state) - +2 static_init (detects_canonical_initterm_loop, rejects_function_without_pattern) - +2 strings (detects_shift_jis_string, detects_utf8_multibyte_string) Tests 649→655 (+6 unit tests). DB schema golden + write_analysis_results signature updated for new EH parameter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 00:36:53 +02:00
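A minimal sketch of the Shift_JIS acceptance rule and the `\xHH` diagnostic rendering described above. It uses the standard JIS X 0208 lead ranges (0x81-0x9F, 0xE0-0xEF), the trail ranges (0x40-0x7E, 0x80-0xFC) and single-byte half-width katakana (0xA1-0xDF). The function names are hypothetical, and any minimum-length threshold the real `scan_shift_jis` applies is not modelled.

```rust
fn is_sjis_lead(b: u8) -> bool { matches!(b, 0x81..=0x9F | 0xE0..=0xEF) }
fn is_sjis_trail(b: u8) -> bool { matches!(b, 0x40..=0x7E | 0x80..=0xFC) }
fn is_halfwidth_katakana(b: u8) -> bool { matches!(b, 0xA1..=0xDF) }
fn is_printable_ascii(b: u8) -> bool { matches!(b, 0x20..=0x7E) }

/// Byte length of a NUL-terminated Shift_JIS run at the start of `bytes`.
/// Returns None if the run is invalid, unterminated, or has no multi-byte
/// unit (so pure-ASCII strings are not double-counted).
fn sjis_run_len(bytes: &[u8]) -> Option<usize> {
    let mut i = 0;
    let mut multibyte_units = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if b == 0 {
            return if multibyte_units > 0 { Some(i) } else { None };
        }
        if is_printable_ascii(b) || is_halfwidth_katakana(b) {
            i += 1;
        } else if is_sjis_lead(b) && bytes.get(i + 1).copied().map_or(false, is_sjis_trail) {
            multibyte_units += 1;
            i += 2;
        } else {
            return None;
        }
    }
    None // no terminator
}

/// Diagnostic rendering: ASCII verbatim, every other byte as a \xHH escape.
fn render_sjis(bytes: &[u8]) -> String {
    let mut out = String::new();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if is_sjis_lead(b) && bytes.get(i + 1).copied().map_or(false, is_sjis_trail) {
            out.push_str(&format!("\\x{:02X}\\x{:02X}", b, bytes[i + 1]));
            i += 2;
        } else if is_printable_ascii(b) {
            out.push(b as char);
            i += 1;
        } else {
            out.push_str(&format!("\\x{:02X}", b));
            i += 1;
        }
    }
    out
}
```

Escaping whole two-byte units keeps a trail byte such as 0x65 from being misprinted as the ASCII letter `e`.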
MechaCat02	56ffa40a6a	M5.5: this-flow indirect-dispatch resolution via vptr-write inference Closes the dominant case M5 could not resolve — `lwz vt, off(this); lwz fn, slot(vt); mtctr; bcctrl` (real C++ dispatch). Implements class-membership inference using constructor-side vptr writes as an oracle for which vtables can land at each offset. ## Algorithm Phase 1 — vptr-write scan: walk every function with the existing lis+addi register tracker. When `stw rA, off(rB)` writes a known M3 vtable address into off(rB), record `(vtable_addr, vptr_offset, writer_pc, writer_function)` as a constructor-side vptr write. Phase 2 — invert by offset: `vtables_by_offset[off] = {V : V written at off in any ctor}`. Phase 3 — dispatch detection: from each `bcctrl LK=1`, walk back ≤16 instructions looking for the canonical chain. Bail on register clobber, branch, or label (basic-block) boundary. Phase 4 — edge emission: for `(dispatch_pc, vptr_off, slot)`, emit one `xrefs.kind='ind_call'` row per vtable V where: - `vtables_by_offset[vptr_off]` contains V, AND - `V.length > slot` (V actually has a method at that slot) Multi-candidate sites (the common case at offset 0) are an over-approximation; downstream queries filter to single-candidate sites for high confidence: `WHERE candidate_count=1` in `indirect_dispatch_sites`. ## Schema NEW TABLES: - `vptr_writes(writer_pc, vtable_address, vptr_offset, writer_function)` - `indirect_dispatch_sites(dispatch_pc PK, vptr_offset, slot, candidate_count)` - `indirect_dispatch_candidates(dispatch_pc, vtable_address, method_address)` NEW INDICES on vtable_address / vptr_offset / method_address / (vptr_offset, slot) for fast joins. ## Sylpheed yield - 567 vptr writes / 214 vtables / 29 offsets (offset 0 = 88%). - 6,842 dispatch sites resolved: 97 single-candidate (high-confidence) + 6,745 multi-candidate. - 687,963 ind_call xref rows. - 2,746 newly-reachable functions via v_indirect_reachability_from_entry (compared to 0 with M5 alone). 
- Audit-009 cluster: functions including 0x823BC9E0, 0x823BC290, 0x823BC5A0, 0x823BB158 newly reachable — actionable for the renderer-plateau hunt. Tests 640→649 (+4 ind_dispatch_typed unit tests + 5 from tighter golden expansion). Schema golden + write_analysis_results signature updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 23:35:05 +02:00
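Phases 2 and 4 of the M5.5 algorithm reduce to a set inversion followed by a length-filtered lookup. The sketch below assumes that `slot` is the method index (the `lwz` displacement divided by 4) and that vtables are given as address to method-PC lists. The struct and function names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One constructor-side vptr write: `stw rA, off(rB)` storing a known vtable.
struct VptrWrite {
    vtable_address: u32,
    vptr_offset: i16, // D-form displacement, hence signed 16-bit
}

/// Phase 2: vtables_by_offset[off] = {V : V written at off in any ctor}.
fn invert_by_offset(writes: &[VptrWrite]) -> BTreeMap<i16, BTreeSet<u32>> {
    let mut by_off: BTreeMap<i16, BTreeSet<u32>> = BTreeMap::new();
    for w in writes {
        by_off.entry(w.vptr_offset).or_default().insert(w.vtable_address);
    }
    by_off
}

/// Phase 4: (vtable, method) candidates for one dispatch site. A vtable
/// qualifies only if it was written at `vptr_offset` AND has more than
/// `slot` entries.
fn dispatch_candidates(
    vptr_offset: i16,
    slot: usize,
    by_off: &BTreeMap<i16, BTreeSet<u32>>,
    vtables: &BTreeMap<u32, Vec<u32>>, // vtable address -> method PCs
) -> Vec<(u32, u32)> {
    let mut out = Vec::new();
    if let Some(cands) = by_off.get(&vptr_offset) {
        for &v in cands {
            if let Some(&method) = vtables.get(&v).and_then(|m| m.get(slot)) {
                out.push((v, method));
            }
        }
    }
    out
}
```

A site whose result has length 1 corresponds to the high-confidence `candidate_count=1` filter.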
MechaCat02	77034b6cbf	audit-038: persistent cache:/* VFS via host-FS backing
Replaces the "Synthesized empty file" stub for cache:/* paths with a real host-FS HostPathDevice-style mount. Each KernelState gets a fresh per-process tmpdir under /tmp/xenia-rs-cache-<pid>-<id>/, which is cleared on init for lockstep determinism (mirrors canary's xenia_main.cc:649 RegisterSymbolicLink("cache:", "\\CACHE") + HostPathDevice in xenia-canary/src/xenia/vfs/devices/host_path_device.cc).

NtCreateFile now honours create_disposition for cache: paths:
- FILE_OPEN -> NOT_FOUND if missing
- FILE_CREATE -> NAME_COLLISION if present
- FILE_OPEN_IF -> open or create
- FILE_OVERWRITE_IF -> create or truncate
- FILE_OVERWRITE -> NOT_FOUND if missing, else truncate
- FILE_SUPERSEDE -> create or truncate

NtReadFile / NtWriteFile / NtSetInformationFile (XFileEndOfFileInformation) / NtQueryInformationFile / NtQueryFullAttributesFile route through std::fs against the per-handle host_path; non-cache paths keep their legacy semantics (read-only disc image, synth-empty stubs).

Verified by audit-037 cascade:
- sub_82459D18 (cache-miss restore): 0 fires (was firing constantly)
- sub_8245D230 (resize/zero-fill): 0 fires (was firing constantly)
- 105+ real cache-file writes per 500M run; 4+ MB of game data persisting to disk per boot; cache:/recent, cache:/access, cache:/d4ea*.tmp, etc.
- Lockstep deterministic at instructions=100000004 / imports=987485 across 3+ reruns (digest shifted as expected; goldens re-baselined).
- swaps=2 plateau still in place; cluster L1 unactivated.

Cascade dimension D (cluster activation): UNKNOWN, no L1 fires.

Tests 640 -> 645 (+5 cache-specific unit tests; full workspace green).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 14:34:27 +02:00
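The disposition table above maps naturally onto a pure `(disposition, exists)` match, followed by a translation to `std::fs::OpenOptions`. The numeric values are the standard NT `FILE_*` create-disposition constants. The enum and function names are hypothetical. For brevity the sketch ignores the access mask and always opens read+write, and it models FILE_SUPERSEDE as truncate, matching the commit's "create or truncate".

```rust
use std::fs::OpenOptions;

// Standard NT create-disposition values.
const FILE_SUPERSEDE: u32 = 0;
const FILE_OPEN: u32 = 1;
const FILE_CREATE: u32 = 2;
const FILE_OPEN_IF: u32 = 3;
const FILE_OVERWRITE: u32 = 4;
const FILE_OVERWRITE_IF: u32 = 5;

#[derive(Debug, PartialEq, Eq)]
enum CreateOutcome { Open, Create, Truncate, NotFound, NameCollision, InvalidDisposition }

/// Decide what NtCreateFile should do for a cache: path.
fn resolve_disposition(disposition: u32, exists: bool) -> CreateOutcome {
    use CreateOutcome::*;
    match (disposition, exists) {
        (FILE_OPEN, true) | (FILE_OPEN_IF, true) => Open,
        (FILE_OPEN, false) | (FILE_OVERWRITE, false) => NotFound,
        (FILE_CREATE, true) => NameCollision,
        (FILE_CREATE, false) | (FILE_OPEN_IF, false) => Create,
        (FILE_OVERWRITE, true) => Truncate,
        (FILE_SUPERSEDE | FILE_OVERWRITE_IF, true) => Truncate,
        (FILE_SUPERSEDE | FILE_OVERWRITE_IF, false) => Create,
        _ => InvalidDisposition,
    }
}

/// Translate a successful outcome into host open options (None = error status).
fn open_options(outcome: &CreateOutcome) -> Option<OpenOptions> {
    let mut o = OpenOptions::new();
    o.read(true).write(true);
    match outcome {
        CreateOutcome::Open => {}
        CreateOutcome::Create => { o.create_new(true); }
        CreateOutcome::Truncate => { o.truncate(true); }
        _ => return None,
    }
    Some(o)
}
```

Keeping the decision pure makes it unit-testable without touching the per-process tmpdir.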
MechaCat02	5af792c9fc	M8+M9+M10+M11+M12: LOW-tier milestones — funcptr-arrays, EH flag, TLS, lr-trace Five LOW-priority milestones bundled. Total ~700 LOC across 11 files. ## M9 — has_eh derived from pdata.flags exception bit - New `functions.has_eh BOOLEAN NOT NULL` column. Derived from M1's already-parsed `pdata.flags` (bit 31 of the packed word — the exception-handler-present flag, distinct from bit 30 which is the always-1 32-bit-code flag). Index idx_functions_has_eh. - Sylpheed: 2,975 of 23,073 pdata-validated functions have EH (12.9%). ## M10 — .tls section / IMAGE_TLS_DIRECTORY32 parser - New `xenia_xex::tls::parse_tls` parses the directory + zero-terminated callback array. Returns None when the binary has no .tls section. - New `tls_info` (singleton row) + `tls_callbacks(slot, address)` tables. - New `DbWriter::write_tls()` no-ops on None. - Sylpheed has no .tls section → 0 rows; infra ready for binaries with __declspec(thread). ## M8 + M11 — function_pointer_arrays (dispatch tables + static initialisers) - New `xenia_analysis::funcptr_arrays::analyze` widens M3's vtable scan: detects runs of ≥2 function pointers in .rdata and classifies each as `vtable` (M3 re-emit), `dispatch_table` (M8), or `static_init` (M11) via a constructor-prologue heuristic (mfspr + small stwu). - New tables `function_pointer_arrays(address PK, length, kind)` and `function_pointer_array_entries(array_address, slot, function_address)`. - Sylpheed: 722 vtables + 388 dispatch_tables = 1,110 arrays / 6,347 slots. 0 static_init detected (Sylpheed's ctors don't all match the conservative heuristic; future M11.5 work can chain via the entry-point's static-init driver). ## M12 — --lr-trace runtime canary-diff harness - New CLI `exec --lr-trace=PC[,PC,...]` and `--lr-trace-out=PATH` flags. Symbolic resolution (Class::method, Class::*) via M4 lookup. Env vars XENIA_LR_TRACE / XENIA_LR_TRACE_OUT also work. 
- New `KernelState::lr_trace_pcs` + `lr_trace_writer` + helper `fire_lr_trace_if_match(hw_id)` invoked from the per-instr probe slot. - JSONL output: pc/tid/hw/cycle/r3/r4/r5/r6/lr — superset of what xenia-canary's --log_lr_on_pc patch emits, with a cycle counter for cross-run reproducibility. Diff-friendly via `jq`. - Lockstep digest unaffected: smoke test on entry-point PC fires once with cycle=0/lr=BCBCBCBC/all-GPR-zero (correct initial state). Tests 636→640 (+2 TLS tests, +2 funcptr_arrays tests). Schema golden updated for new tables + has_eh column. Lockstep determinism preserved (instructions=2000005 ×2 reruns identical). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 22:29:35 +02:00
MechaCat02	38d8871e8d	M6: addr_mode column on xrefs + extended store/load classes Adds finer-grained addressing-mode classification to every data xref row plus new dispatch for instruction families not previously emitted: - New `xrefs.addr_mode VARCHAR NULL` column. NULL for control-flow edges (call / ind_call / j / br); one of d_form / lis_addi / lis_ori / multiword / x_form_indexed / x_form_byterev / atomic / dcbz for data edges. Index idx_xrefs_addr_mode. - New `xenia_analysis::xref::AddrMode` enum + Xref::addr_mode field. - Opcode 46/47 (lmw/stmw) expand to one xref per slot — D-form multi-word load/store now resolves all (32-rS) consecutive addresses. - Opcode 31 X-form dispatch — stwx/stbx/sthx/stwux/stbux/sthux/stdx/stdux, lwzx/lbzx/lhzx/lhax/lwzux/lbzux/lhzux/lhaux/ldx/ldux, stwcx./stdcx. (atomic), stwbrx/sthbrx/lwbrx/lhbrx (byte-reverse), dcbz (cache-line clear). - X-form rows are emitted ONLY when both rA and rB resolve to known constants (rare but present); the dominant runtime-indexed pattern remains correctly skipped. Sylpheed yield (regen on master + merge): - 442 newly-detected x_form_indexed reads (lwzx/lhzx into static tables). - 40 newly-detected atomic writes (stwcx./stdcx. with resolvable address). - 28,834 lis_addi refs, 18,485 d_form reads, 3,288 d_form writes — every pre-existing data row now tagged. - 0 multiword / dcbz / byterev (these instructions exist but aren't on lis+addi-tracked code paths). Tests 633→636 (+3 xref unit tests covering AddrMode tag uniqueness, data-edge addr_mode round-trip, control-edge None invariant). Schema golden updated (xrefs gains addr_mode column). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 21:38:47 +02:00
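A minimal sketch of the M6 tagging and of the lmw/stmw expansion. `lmw`/`stmw rS, d(rA)` cover registers rS..=r31, that is (32 - rS) consecutive words starting at the effective address. The enum mirrors the DB tags listed in the commit, but its Rust shape and the helper names are assumptions.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AddrMode { DForm, LisAddi, LisOri, Multiword, XFormIndexed, XFormByterev, Atomic, Dcbz }

impl AddrMode {
    const ALL: [AddrMode; 8] = [
        AddrMode::DForm, AddrMode::LisAddi, AddrMode::LisOri, AddrMode::Multiword,
        AddrMode::XFormIndexed, AddrMode::XFormByterev, AddrMode::Atomic, AddrMode::Dcbz,
    ];

    /// Value stored in xrefs.addr_mode.
    fn db_tag(self) -> &'static str {
        match self {
            AddrMode::DForm => "d_form",
            AddrMode::LisAddi => "lis_addi",
            AddrMode::LisOri => "lis_ori",
            AddrMode::Multiword => "multiword",
            AddrMode::XFormIndexed => "x_form_indexed",
            AddrMode::XFormByterev => "x_form_byterev",
            AddrMode::Atomic => "atomic",
            AddrMode::Dcbz => "dcbz",
        }
    }
}

/// Control-flow edges (call / ind_call / j / br) carry no mode: NULL in the DB.
fn addr_mode_column(mode: Option<AddrMode>) -> Option<&'static str> {
    mode.map(AddrMode::db_tag)
}

/// One address per register slot touched by lmw/stmw rS, d(rA).
fn multiword_addresses(ea: u32, rs: u8) -> Vec<u32> {
    (0..32u32.saturating_sub(rs as u32)).map(|i| ea.wrapping_add(4 * i)).collect()
}
```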
MechaCat02	ab4fe211e5	M5+M7: indirect-dispatch reachability + .rdata string detection Two MEDIUM milestones bundled (both opportunistic per plan; both small). ## M5 — indirect-dispatch reachability - `xenia_analysis::indirect`: per-basic-block register tracker over each detected function. Recognises the canonical static-vtable pattern `lis+addi → lwz off(rA) → mtctr → bcctrl` where rA holds a known M3 vtable address. Emits one `Xref { kind: IndirectCall }` per resolvable bcctrl site. - PowerPC ABI awareness: `bl`-style calls clobber volatile r0..r12 + ctr but preserve non-volatile r13..r31, so a vtable pointer parked in r30/r31 before a call survives. - Label-based basic-block boundaries kill register state — bounds false-positive risk for jump-IN paths. - New `XrefKind::IndirectCall` variant (DB tag `'ind_call'`). - New SQL view `v_indirect_reachability_from_entry` — strict superset of `v_reachability_from_entry`, taking `ind_call` edges in the BFS. Sylpheed yield: 0 edges detected. The binary's 1,001 static lis+addi references into vtables are nearly all constructor-side vptr writes, not dispatches; real method dispatch goes through `this->vptr` which requires alias analysis we explicitly don't do. Documented in SCHEMA.md as the expected limitation. Three unit tests cover the synthetic-correctness path. ## M7 — string / constant-pool detection - `xenia_analysis::strings`: scans `.rdata` for runs of ≥ 6 printable ASCII bytes (NUL-terminated) and ≥ 6 UTF-16LE code units (basic-plane printable ASCII, NUL u16 terminator). - New `strings(address PK, encoding, length, content)` table + encoding index. - Implicit cross-ref via existing `xrefs.kind='ref'` rows whose target matches a strings.address. Sylpheed yield: 6,311 ASCII strings (including embedded HLSL shader source and AS_CB_SURFACE_SWIZZLE_* assertion strings). 9,132 lis+addi sites cross-reference detected strings — names source PCs near each string in one query. 
Four unit tests cover encoding detection, NUL termination, and short-run rejection. Tests 626→633 (+3 indirect, +4 strings). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 21:22:50 +02:00
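The M7 ASCII pass amounts to a run-length scan that emits a string only when a run of printable bytes is long enough and ends in a NUL. A minimal sketch follows, assuming "printable" means 0x20..=0x7E. The UTF-16LE pass would follow the same shape over u16 units and is omitted here. The function name is hypothetical.

```rust
/// NUL-terminated runs of at least `min_len` printable ASCII bytes,
/// returned as (address, content) pairs.
fn scan_ascii_strings(data: &[u8], base: u32, min_len: usize) -> Vec<(u32, String)> {
    let mut out = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let run = data[start..]
            .iter()
            .take_while(|&&b| (0x20..=0x7E).contains(&b))
            .count();
        let end = start + run;
        // Keep long-enough runs that end in a NUL terminator.
        if run >= min_len && data.get(end) == Some(&0) {
            let s = String::from_utf8(data[start..end].to_vec()).expect("ASCII is valid UTF-8");
            out.push((base + start as u32, s));
        }
        start = end + 1; // skip past the terminator or the non-printable byte
    }
    out
}
```

Short runs ("hi") and unterminated tails are both rejected, which matches the short-run and NUL-termination cases the unit tests cover.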
MechaCat02	4ff08f6116	M4: class-aware probe tokens via M3 vtable+method tables CLI extension only — no schema change. Adds symbolic resolution for --pc-probe / --branch-probe / --ctor-probe tokens: - `0xADDR` / `2186674160` — numeric (current behavior, no DB load). - `Class::method` — joins classes × methods × demangled_names. - `Class::` — joins classes × methods (all slots). - `function_name` — falls back to functions.name for free functions / saverestore stubs / labels. New `xenia_analysis::lookup::resolve_probe_token(db_path, token)` opens the DB read-only ONLY when a token is non-numeric, so legacy numeric flows pay no IO. New `--probe-db PATH` flag (or `XENIA_PROBE_DB` env / default `sylpheed.db` next to the .iso) selects the DB. Symbolic resolution happens BEFORE any guest exec, so it cannot affect the lockstep digest. Verified deterministic across two reruns at -n 2M (instructions=2000005 identical). End-to-end smoke test on Sylpheed: `--pc-probe='ANON_Class_6B674251::'` resolves to all 45 method PCs of that anonymous class (matching the methods-table row count for that vtable). Tests 621→626 (+5 lookup unit tests covering numeric passthrough, symbolic-without-DB error, Class::method resolution, Class::* expansion, and functions.name fallback). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 20:22:21 +02:00
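The M4 token grammar can be sketched as a classifier that decides up front whether a DB lookup is needed at all. That is why purely numeric flows pay no IO. The enum and function names are hypothetical. The sketch accepts only the `Class::` form for class-wide expansion, as listed in the commit.

```rust
#[derive(Debug, PartialEq, Eq)]
enum ProbeToken<'a> {
    Numeric(u32),
    ClassMethod { class: &'a str, method: &'a str },
    ClassAll(&'a str),
    FunctionName(&'a str),
}

fn classify_probe_token(token: &str) -> ProbeToken<'_> {
    let t = token.trim();
    // `0xADDR` (hex) or plain decimal.
    let numeric = match t.strip_prefix("0x").or_else(|| t.strip_prefix("0X")) {
        Some(hex) => u32::from_str_radix(hex, 16).ok(),
        None => t.parse::<u32>().ok(),
    };
    if let Some(pc) = numeric {
        return ProbeToken::Numeric(pc);
    }
    if let Some(class) = t.strip_suffix("::") {
        return ProbeToken::ClassAll(class); // every slot of the class
    }
    if let Some((class, method)) = t.rsplit_once("::") {
        return ProbeToken::ClassMethod { class, method };
    }
    ProbeToken::FunctionName(t) // fallback: functions.name
}

/// Open the DB only when at least one token is non-numeric.
fn needs_db(tokens: &[&str]) -> bool {
    tokens.iter().any(|t| !matches!(classify_probe_token(t), ProbeToken::Numeric(_)))
}
```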
MechaCat02	1d6c51fbf8	M3: vtable scan + MSVC RTTI walk + 3 new tables Adds detection of statically-allocated MSVC vtables in .rdata/.data: - New `xenia_analysis::vtables` walks read-only sections looking for runs of ≥3 contiguous big-endian u32 values where each value lands on a known function start (from M1's corrected functions table). 2-slot runs are rejected to keep false-positive rate down. - For each candidate the MSVC RTTI walk vtable[-1] → CompleteObjectLocator → TypeDescriptor → mangled name is attempted; on success the demangled class name is recorded along with a best-effort RTTIClassHierarchyDescriptor walk to fill base_classes_json. On failure (RTTI stripped — common for shipped game binaries) the class is named ANON_Class_<fnv1a-hash> keyed by sorted method-PC list, so identical vtables collapse to one entry. - DB: new tables `vtables`, `methods`, `classes` with indices on function_address and rtti_present. `write_analysis_results` takes a `&[Vtable]` slice; `write_disasm` (back-compat) passes empty. - cmd_dis wires the scan after xref analysis using `func_analysis.functions.keys()` as the function-start oracle. Validation on Sylpheed (RTTI stripped, as expected): 722 vtables / 499 unique classes / 5571 methods. Sanity invariant: every methods.function_address joins to functions.address (0 broken refs). Largest vtable: 131 slots. Tests 617→621 (+4 vtable unit tests covering 3-slot detect, 2-slot reject, synth name stability, and synth name divergence). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 20:17:45 +02:00
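A minimal sketch of the two M3 ideas above: detecting runs of ≥3 big-endian words that all hit known function starts, and naming RTTI-less classes from a hash of the sorted method PCs. The 32-bit FNV-1a parameters (offset basis 0x811C9DC5, prime 0x01000193) are standard. Hashing each PC as 4 big-endian bytes, and the helper names, are assumptions about how xenia-rs feeds the hash.

```rust
use std::collections::BTreeSet;

/// (run address, slot count) for every run of >= 3 consecutive words that
/// are all known function starts. 2-slot runs are rejected.
fn find_vtable_runs(data: &[u8], base: u32, function_starts: &BTreeSet<u32>) -> Vec<(u32, usize)> {
    let words: Vec<u32> = data
        .chunks_exact(4)
        .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    let mut runs = Vec::new();
    let mut i = 0;
    while i < words.len() {
        let len = words[i..].iter().take_while(|&&w| function_starts.contains(&w)).count();
        if len >= 3 {
            runs.push((base + 4 * i as u32, len));
        }
        i += len.max(1);
    }
    runs
}

/// ANON_Class_<fnv1a-32> keyed by the sorted method-PC list, so identical
/// vtables collapse to one name regardless of discovery order.
fn synth_class_name(method_pcs: &[u32]) -> String {
    let mut pcs = method_pcs.to_vec();
    pcs.sort_unstable();
    let mut h: u32 = 0x811C_9DC5; // FNV-1a offset basis
    for pc in pcs {
        for b in pc.to_be_bytes() {
            h ^= b as u32;
            h = h.wrapping_mul(0x0100_0193); // FNV-1a prime
        }
    }
    format!("ANON_Class_{:08X}", h)
}
```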
MechaCat02	89f5f7e4a9	M2: MSVC C++ demangler + demangled_names DB table Adds an MSVC name-demangling layer in front of M3's vtable / RTTI work: - New `xenia_analysis::demangle` wraps the `msvc-demangler` crate (a Rust port of LLVM's `MicrosoftDemangle.cpp`). `demangle()` short-circuits on non-mangled inputs (`?` prefix check); `demangle_or_raw()` always returns a record (raw passthrough on parse failure). - Heuristic split of the formatted demangled string into structured fields `(namespace_path, class_name, method_name, params_signature)`. Top-level paren / template-bracket aware, so `a::b<c::d>::e` and signatures with templated arg types parse correctly. - DB: new `demangled_names(address, mangled, raw_demangled, namespace_path, class_name, method_name, params_signature)` with indices on address / class_name / method_name. Populated from any label whose name starts with `?` plus any import name that happens to be mangled. For Sylpheed (a fully stripped binary) this table is empty out-of-the-box; the layer's value lands in M3, which will append rows for every RTTI TypeDescriptor name found in `.rdata`. Tests 610→617 (+7 demangler unit tests covering early-out, raw fallback, member function form, RTTI form, qname split, paren-template safety, and top-level `::` splitting). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 20:02:21 +02:00
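The "top-level paren / template-bracket aware" split can be sketched with a depth counter: `::` only separates qualifiers at depth 0. The split into (namespace_path, class_name, method_name) then takes the last part as the method and the one before it as the class. The names are hypothetical, and the sketch does not special-case operator names such as `operator<`, which would unbalance the counter.

```rust
/// Cheap early-out: MSVC-mangled names start with '?'.
fn is_mangled(name: &str) -> bool {
    name.starts_with('?')
}

/// Split on `::` only outside <...> and (...).
fn split_top_level(s: &str) -> Vec<&str> {
    let bytes = s.as_bytes();
    let mut parts = Vec::new();
    let mut depth: i32 = 0;
    let mut start = 0;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'<' | b'(' => depth += 1,
            b'>' | b')' => depth -= 1,
            b':' if depth == 0 && bytes.get(i + 1) == Some(&b':') => {
                parts.push(&s[start..i]);
                i += 2;
                start = i;
                continue;
            }
            _ => {}
        }
        i += 1;
    }
    parts.push(&s[start..]);
    parts
}

/// (namespace_path, class_name, method_name) from a qualified name.
fn qname_split(q: &str) -> (String, Option<&str>, &str) {
    let parts = split_top_level(q);
    let (method, rest) = parts.split_last().expect("split_top_level never returns an empty Vec");
    match rest.split_last() {
        Some((class, ns)) => (ns.join("::"), Some(*class), *method),
        None => (String::new(), None, *method),
    }
}
```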
MechaCat02	70120465a3	M1: parse .pdata RUNTIME_FUNCTION; cross-validate function boundaries Adds an authoritative function-boundary source from the linker: - New `xenia_xex::pdata` parses .pdata 8-byte entries (BeginAddress + packed prolog/length/flags). Bit layout per Microsoft PE32 PowerPC spec: prolog in bits 0..7, function_length in bits 8..29, flags in 30..31. - `func::analyze_with_pdata` unions pdata BeginAddresses into the candidate set, attaches `pdata_validated`/`pdata_length` to each `FuncInfo`, and trims any function whose `end` overlaps the next start (catches mis-merge where one row spanned two prologues — the audit-031 sub_824D23B0/sub_824D29F0 case). - DB: extends `functions` with `pdata_validated BOOLEAN`, `pdata_length BIGINT`; new table `pdata_entries`; index on pdata_validated. - New `crates/xenia-analysis/SCHEMA.md` documents M1 layer + forward work. Validation on Sylpheed: 25481 functions (was 12156) / 23073 pdata_validated / 0 orphans / 0 mis-merges. Audit-031 mis-merge resolved: sub_824D29F0 now has its own row with `pdata_length=280` (70 dwords); sub_824D23B0 now correctly ends at 0x824D2878 (`pdata_length=1224` matches prologue walk). Tests 605→610. New 5-test pdata unit suite covers bit layout + sentinel + out-of-range filtering + real-world layout round-trip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 19:44:02 +02:00
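The bit layout above can be checked against the two audit-031 numbers: function_length counts dwords, so 70 dwords is 280 bytes, and 0x824D23B0 + 1224 = 0x824D2878. The sketch below decodes the packed word (bits 30 and 31 being the 32-bit-code and exception-handler flags, as M9 states) and applies the end-trimming rule. Treating a zero BeginAddress as the end sentinel is an assumption, and the names are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
struct PdataEntry {
    begin_address: u32,
    prolog_len: u8,           // bits 0..7
    function_len_dwords: u32, // bits 8..29 (22 bits)
    thirty_two_bit: bool,     // bit 30
    has_eh: bool,             // bit 31
}

impl PdataEntry {
    fn decode(begin_address: u32, packed: u32) -> Self {
        PdataEntry {
            begin_address,
            prolog_len: (packed & 0xFF) as u8,
            function_len_dwords: (packed >> 8) & 0x003F_FFFF,
            thirty_two_bit: (packed >> 30) & 1 == 1,
            has_eh: (packed >> 31) & 1 == 1,
        }
    }
    fn length_bytes(&self) -> u32 { self.function_len_dwords * 4 }
    fn end(&self) -> u32 { self.begin_address + self.length_bytes() }
}

/// Parse 8-byte big-endian entries, stopping at a zero BeginAddress (assumed sentinel).
fn parse_pdata(section: &[u8]) -> Vec<PdataEntry> {
    section
        .chunks_exact(8)
        .map(|c| {
            (u32::from_be_bytes([c[0], c[1], c[2], c[3]]),
             u32::from_be_bytes([c[4], c[5], c[6], c[7]]))
        })
        .take_while(|&(begin, _)| begin != 0)
        .map(|(begin, packed)| PdataEntry::decode(begin, packed))
        .collect()
}

/// Trim any (start, end) whose end runs past the next function's start.
fn trim_overlaps(funcs: &mut [(u32, u32)]) {
    funcs.sort_unstable_by_key(|&(start, _)| start);
    for i in 0..funcs.len().saturating_sub(1) {
        let next_start = funcs[i + 1].0;
        if funcs[i].1 > next_start {
            funcs[i].1 = next_start;
        }
    }
}
```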
MechaCat02	690943ceef	gate dump-section reads on is_mapped; trim doc comments Without the page-state guard, read_bulk faulted on PROT_NONE pages of the 4 GiB host reservation. A per-page is_mapped check now skips uncommitted pages, leaving the buffer's zero-initialised bytes in place for those ranges. Total LOC budget after trim: 70.	2026-05-07 21:45:54 +02:00
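A minimal sketch of the page-gated dump described above. The output buffer starts zero-filled, the range is walked page by page, and only pages that report as mapped are read, so uncommitted pages stay zero instead of faulting. The 4 KiB page size and the closure-based `is_mapped` / `read_bulk` stand-ins are assumptions, not xenia-rs's memory API.

```rust
const PAGE_SIZE: u32 = 0x1000; // assumed 4 KiB guest pages

/// Copy [base, base+len) into a fresh buffer, skipping unmapped pages.
fn dump_section(
    base: u32,
    len: usize,
    is_mapped: impl Fn(u32) -> bool,
    mut read_bulk: impl FnMut(u32, &mut [u8]),
) -> Vec<u8> {
    let mut buf = vec![0u8; len];
    let mut off = 0usize;
    while off < len {
        let addr = base.wrapping_add(off as u32);
        let page = addr & !(PAGE_SIZE - 1);
        // Bytes left in this page, clipped to the remaining request.
        let chunk = ((PAGE_SIZE - (addr - page)) as usize).min(len - off);
        if is_mapped(page) {
            read_bulk(addr, &mut buf[off..off + chunk]);
        }
        off += chunk;
    }
    buf
}
```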
MechaCat02	412ba858b4	move dump-section flush above quiet gate so it fires under --quiet runs The headless cmd_exec path passes quiet=false in normal use but the diagnostic --dump-section is independent of the chatty thread/dump prints, so it should not be gated by --quiet. Lockstep digest preserved.	2026-05-07 21:42:33 +02:00
MechaCat02	08d41cf2fc	add --dump-section=BASE:LEN:PATH for end-of-run guest memory snapshot Drives byte-level memory diffs against canary's Memory::Save dump. Hot-path zero-cost when absent; lockstep digest unaffected (instructions=100000003 deterministic across reruns).	2026-05-07 21:40:45 +02:00
MechaCat02	c03f2bc9e2	fix(kernel): ensure_dispatcher_object writes XObj signature + handle (canary mirror) Mirrors canary's `XObject::StashHandle` (xobject.h:253-256): on first adoption of a guest dispatcher header, stamp +0x08 with the kXObjSignature fourcc 'X','E','N','\0' and +0x0C with the stash handle (here the guest pointer itself, since our shadow table is keyed by ptr). Audit-023/024A documented divergence at addresses such as 0x828F4838 where canary stores "XEN\0" + handle but we left zeros. Lands as canary-correctness restoration; cascade impact at -n 500M is nil per the discipline gate (no sharp prediction tied to the writeback). Lockstep determinism preserved: instructions=100000003, imports=987516, swaps=2, draws=0 across 2 reruns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 21:06:25 +02:00
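The header stamp described above amounts to two 4-byte writes into the dispatcher header: the "XEN\0" signature at +0x08 and the big-endian stash handle (here the guest pointer itself) at +0x0C. The sketch operates on a byte slice standing in for guest memory. It assumes that an existing signature means the object was already adopted, so the write happens only once. The names are hypothetical.

```rust
/// 'X','E','N','\0' signature bytes, as described in the commit.
const XOBJ_SIGNATURE: [u8; 4] = *b"XEN\0";

fn is_stamped(header: &[u8]) -> bool {
    header.len() >= 0x10 && header[0x08..0x0C] == XOBJ_SIGNATURE
}

/// Stamp signature + handle on first adoption. Returns true if it wrote.
fn adopt_dispatcher_header(header: &mut [u8], guest_ptr: u32) -> bool {
    if header.len() < 0x10 || is_stamped(header) {
        return false;
    }
    header[0x08..0x0C].copy_from_slice(&XOBJ_SIGNATURE);
    header[0x0C..0x10].copy_from_slice(&guest_ptr.to_be_bytes()); // guest memory is big-endian
    true
}
```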
MechaCat02	978a6950d1	feat(memory): --mem-watch=ADDR per-store writer trace Adds an opt-in diagnostic that emits one tracing line per guest store overlapping any armed byte address, naming the writer (tid, pc, lr) plus old/new u32 lanes. Mirrors the --pc-probe / --branch-probe shape; pc/lr are stamped from worker_prologue via a thread-local Cell, so default runs (empty watch set) take a single is_empty() check on each write. Lockstep digest preserved (instructions=100000003 across reruns, sylpheed_n50m.json golden byte-identical). Diagnostic infra only; no functional change. Used to identify producers of dispatch-state writes for the audit-017 / audit-019 hunt.	2026-05-06 21:00:20 +02:00
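The store-overlap test behind --mem-watch can be sketched with an ordered set of armed byte addresses. An empty set costs one `is_empty()` check. Otherwise a range query returns the armed bytes inside [addr, addr+size). The struct is hypothetical, and the tid/pc/lr and old/new lane reporting is left out.

```rust
use std::collections::BTreeSet;

struct MemWatch {
    armed: BTreeSet<u32>,
}

impl MemWatch {
    /// Armed byte addresses touched by a store of `size` bytes at `addr`.
    fn hits(&self, addr: u32, size: u32) -> Vec<u32> {
        if self.armed.is_empty() || size == 0 {
            return Vec::new(); // default-run fast path
        }
        let end = addr as u64 + size as u64; // exclusive; u64 avoids wrap at 4 GiB
        self.armed
            .range(addr..)
            .take_while(|&&a| (a as u64) < end)
            .copied()
            .collect()
    }
}
```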
MechaCat02	76dfe7fd7a	fix(kernel): KRNBUG-KE-001 — real KeResumeThread per canary mirror Replace the no-op cookie-returner with a real impl per canary xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc:216-227 (XObject::GetNativeObject<XThread>()->Resume()). Mirrors nt_resume_thread plumbing two functions below: resolve_pseudo_handle -> scheduler.find_by_handle -> resume_ref. Returns STATUS_SUCCESS if the KTHREAD-pointer-as-handle resolves, STATUS_INVALID_HANDLE otherwise — matches canary's Resume()/!thread return semantics. Cascade-prediction scorecard (audit-018 -> post-fix): - A PASS: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940) leave Suspended -> run prologue -> park on audio buffer-completion semaphores 0x828A3254 / 0x828A3230. - B PARTIAL FAIL: NtSetEvent 667->3334; KeReleaseSemaphore=0; XAudioSubmitRenderDriverFrame=0. - C FAIL (predicted 2->1, actual 2->2): both ExTerminateThread + KeReleaseSemaphore still canary-only. - D FAIL: gamma-cluster blocker unchanged — pc-probe at 0x82184318/0x82184374 no fires; dump-addr 0x828F4070 no DUMP; signal_attempts on 0x1004/0x100c/0x1020/0x15e4 still 0. Necessary-but-not-sufficient: workers unsuspend but park on a downstream gate that's part of the audit-009/-016/-017 gamma cluster. Tests 600 -> 601 (+ke_resume_thread_unblocks_suspended_worker). Lockstep instructions=100000003 imports=987516 deterministic x2. Goldens re-baselined: sylpheed_n50m.json instructions 50000003->50000011, imports 407255->407247. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 20:46:46 +02:00
MechaCat02	5d2401f9c5	fix(xam): XamUserGetSigninState returns SignedInLocally=1 for user 0 Mirrors canary xam_user.cc:90-101. User 0 returns 1 (SignedInLocally); all other indices return 0. Replaces the stub_return_zero registration, which reported every user as not signed in to the guest-side branches that look up signin state. Tests: 599 -> 600. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 20:08:13 +02:00
MechaCat02	b78e6fd205	fix(kernel): KRNBUG-IO-004 — real XamNotifyCreateListener + XNotifyGetNext per canary Canary's RegisterNotifyListener (kernel_state.cc:1013-1033) auto-enqueues four startup notifications on the first listener whose mask covers kXNotifySystem (SystemUI=0x09 + SystemSignInChanged=0x0A) and kXNotifyLive (LiveConnectionChanged=0x02000001 + LiveLinkStateChanged=0x02000003). XNotifyGetNext (xam_notify.cc:22-96) pops the queue with mask + version filtering on enqueue per xnotifylistener.cc:38-51. Our prior stubs returned 0 forever; the dispatch loop at 0x822f1be8 in sub_822F1AA8 was thus bypassed indefinitely. Implementation: - KernelObject::NotifyListener { mask, max_version, queue, waiters } variant. - KernelState::has_notified_startup + has_notified_live_startup gates. - xam_notify_create_listener: mask=r3 (qword), max_version=r4 (clamped <=10), alloc handle, conditional 4-tuple startup enqueue. - xnotify_get_next: handle/match_id/id_ptr/param_ptr in r3..r6; pop_front (or scan-by-id), with mask + version filter applied at enqueue time. - 5 unit tests covering: full-mask 4 startup notifications, second-listener no re-fire, system-only mask filtering, max_version=0 too-new drop, unknown handle returning 0. Tests: 594 -> 599. Lockstep `-n 100M` instructions=100000012 deterministic across 2 reruns; bit-identical run-to-run diff. Cascade (verified at -n 500M): - dispatch arm 0x822f1be8 fires; sub_82173DC8 entered. - 3/21 renderer-cluster L1 PCs newly reached: 0x822c6870 (2 workers), 0x824563e0, 0x823ddb50. - canary-only export delta 7 -> 3 (reclassified to fired: KeResetEvent, ObCreateSymbolicLink, XamTaskCloseHandle, XamTaskSchedule). - worker thread count 18 -> 20. - signal_attempts on handle 0x15e0 = 1 (primary=1), was 0. - draws=0 still expected at this step. LOC: 119 (97 impl + 22 scaffolding pattern matches across main.rs / objects.rs / state.rs) <= 120. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 16:55:51 +02:00
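A minimal sketch of the listener queue described above, with filtering applied at enqueue time and `XNotifyGetNext` popping either the front or a specific ID. It assumes the XDK XNID layout (index bits 0..15, version bits 16..24, area bits 25..30) and a mask bit of `1 << area`. Under that layout the system IDs 0x09/0x0A fall in area 0 and the Live IDs 0x02000001/0x02000003 in area 1. The struct and method names are hypothetical.

```rust
use std::collections::VecDeque;

/// Startup notifications named in the commit.
const STARTUP_IDS: [u32; 4] = [0x09, 0x0A, 0x0200_0001, 0x0200_0003];

fn xnid_version(id: u32) -> u32 { (id >> 16) & 0x1FF }
fn xnid_area(id: u32) -> u32 { (id >> 25) & 0x3F }

struct NotifyListener {
    mask: u64,
    max_version: u32,
    queue: VecDeque<(u32, u32)>, // (notification id, param)
}

impl NotifyListener {
    fn new(mask: u64, max_version: u32) -> Self {
        NotifyListener { mask, max_version: max_version.min(10), queue: VecDeque::new() }
    }

    /// Drop notifications outside the mask or newer than max_version.
    fn enqueue(&mut self, id: u32, param: u32) -> bool {
        let area_bit = 1u64 << xnid_area(id);
        if self.mask & area_bit == 0 || xnid_version(id) > self.max_version {
            return false;
        }
        self.queue.push_back((id, param));
        true
    }

    /// match_id == 0 pops the front; otherwise scan for that id.
    fn get_next(&mut self, match_id: u32) -> Option<(u32, u32)> {
        if match_id == 0 {
            return self.queue.pop_front();
        }
        let pos = self.queue.iter().position(|&(id, _)| id == match_id)?;
        self.queue.remove(pos)
    }
}
```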
MechaCat02	a1a7265f29	fix(kernel): KRNBUG-IO-003 — NtDeviceIoControlFile real impl mirroring NullDevice::IoControl Replace the stub_success registration of NtDeviceIoControlFile at exports.rs:90 with a real handler for FsCtlCodes 0x70000 (drive geometry) and 0x74004 (partition info), mirroring xenia-canary xboxkrnl_io.cc:645-678 + null_device.{h,cc}. The 16-byte 0x74004 response with cache_size=0xFF000 at OUT+8 is the gate that lets sub_824ABD88 return SUCCESS and sub_824A9710 reach the priv-11 XexCheckExecutablePrivilege site identified by KRNBUG-AUDIT-007. Stack args 9-10 (OutputBuffer, OutputBufferLength) read from the caller's parameter save area at [sp+0x54] / [sp+0x5C] per the Xbox 360 PowerPC EABI (linkage area sp+0..sp+8, 8-quadword spill area sp+0x14..sp+0x54, then stack args every 8 bytes). First HLE export in the codebase to need 9+ args. Cascade vs. KRNBUG-AUDIT-007 prediction (5/8 held): - XexCheckExecutablePrivilege count 1 → 2 (priv=0xA + priv=0xB) ✓ - XamTaskSchedule count 0 → 1 ✓ - canary-only exports 7 → 3 (audit predicted ≤3) ✓ - 0x15e0 semaphore signal_attempts 0 → 1 (bonus) - 0x100c worker spawn DID NOT fire (still UNCREATED) ✗ - 0x1004 signal_attempts unchanged ✗ - Worker spawn count unchanged at 19 ✗ Tests: 592 → 594. Lockstep deterministic at -n 100M (run1 ≡ run2 ≡ run3, byte-identical). instructions=100000010 → 100000019, imports 407417 → 987524 (+2.4×). swaps=2 draws=0 plateau persists. sylpheed_n50m golden re-baselined instructions=50000004→50000003, imports=407362→407255. sylpheed_n2m unchanged. Still canary-only after this fix: ExTerminateThread, KeReleaseSemaphore, XamUserReadProfileSettings. The next downstream gate is somewhere past XamTaskSchedule's completion path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 22:00:12 +02:00
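The stack-argument offsets follow from the layout the commit states. The 8-quadword spill area ends at 0x14 + 8 × 8 = 0x54, so argument 9 is at sp+0x54 and argument 10 at sp+0x5C. The sketch encodes that formula and reads a 32-bit big-endian value from a byte slice standing in for the guest stack (byte 0 = sp). The helper names are hypothetical.

```rust
/// Arguments 1..=8 arrive in r3..=r10.
fn gpr_for_arg(arg_index: u32) -> Option<u32> {
    if (1..=8).contains(&arg_index) { Some(arg_index + 2) } else { None }
}

/// sp-relative offset of stack-passed argument N (N >= 9), 8 bytes apart.
fn stack_arg_offset(arg_index: u32) -> Option<u32> {
    if arg_index < 9 {
        return None;
    }
    Some(0x54 + (arg_index - 9) * 8)
}

/// Read a 32-bit stack argument from a big-endian stack image starting at sp.
fn read_stack_arg_u32(stack: &[u8], arg_index: u32) -> Option<u32> {
    let off = stack_arg_offset(arg_index)? as usize;
    let b = stack.get(off..off + 4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}
```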


129 Commits