xenia-rs

Author	SHA1	Message	Date
MechaCat02	ad9c8e4cb8	[iterate-2U] VdGlobalDevice: allocate a real device cell so the swap counter (clock B) can advance Sylpheed's title loop re-runs its per-frame manager update sub_821741C8 only when "clock B" ([controller+88], the swap count) changes. Clock B's sole source is the CP swap-complete callback sub_824CE2B8, which bumps [gfx+15160] via the TWO-LEVEL deref [[VdGlobalDevice]+0]+15160, where VdGlobalDevice is the kernel variable export 0x01BE at guest .data 0x82000750. Ours patched that import slot with literal 0 (the old "passed through to Vd* shims, write 0" behaviour). Consequences, both confirmed at runtime: * the guest's graphics init stores its D3D device object via `stw r31, 0([0x82000750])` (sub_824C6DC0 @0x824C6F18) — with the slot 0, that store lands at address 0; * the swap callback reads [[0x82000750]] = [0] = 0 and increments [0+15160] (the null page) instead of the real device's swap counter. So [gfx+15160] never moved, clock B stayed frozen at 0, sub_821741C8 fired exactly once, and the game submitted one render batch (the 78-draw splash) then stalled. Fix mirrors xenia-canary RegisterVideoExports (xboxkrnl_video.cc:557-564) exactly: allocate a 4-byte cell, point the import slot at it, zero the cell. The guest then stores its device into the cell, and the callback's two-level deref resolves correctly. Verified: [0x82000750] now holds a real cell whose [+0] is the device (gfx state), the swap callback bumps [gfx+15160] 0->1, clock B advances, and the per-frame chain steps forward (sub_821741C8 fires 1->2x, GamePart update sub_821C7CB8 0->1x). Determinism: --gpu-inline digest re-baselined and byte-identical across runs. The fix shifts the early execution trajectory (clock B unfreezing), so the n50m golden moves imports 451500->178937 and instructions 50000001->50000014; draws/swaps/RTs/shaders unchanged (78/4/2/3). n2m golden unchanged (early boot, pre-fix-effect). 675 workspace tests green; sylpheed_n50m oracle green. 
Note: this breaks the FIRST hard blocker (clock B could never advance at all). Full per-frame sustain (draws past 78) needs a further step: each GamePart update must submit a per-frame command buffer (with PM4_INTERRUPT) during the asset-streaming phase to keep generating CP interrupts; ours currently produces only the single seed interrupt from the initial batch, so the chain advances once and re-stalls. Tracked for the next iterate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 16:20:08 +02:00
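The two-level dereference described in iterate-2U can be modelled in a few lines. This is a minimal sketch, not the xenia-rs code: `GuestMem`, its bump allocator, and the helper names are hypothetical; only the slot address 0x82000750 and the offset 15160 are taken from the commit message.

```rust
use std::collections::HashMap;

/// Toy guest memory keyed by 32-bit address (hypothetical stand-in).
struct GuestMem {
    words: HashMap<u32, u32>,
    next_free: u32,
}

impl GuestMem {
    fn new() -> Self {
        Self { words: HashMap::new(), next_free: 0x4000_0000 }
    }
    fn read_u32(&self, addr: u32) -> u32 {
        *self.words.get(&addr).unwrap_or(&0)
    }
    fn write_u32(&mut self, addr: u32, value: u32) {
        self.words.insert(addr, value);
    }
    /// Allocate a zeroed 4-byte cell and return its address.
    fn alloc_cell(&mut self) -> u32 {
        let addr = self.next_free;
        self.next_free += 4;
        self.write_u32(addr, 0);
        addr
    }
}

const VD_GLOBAL_DEVICE_SLOT: u32 = 0x8200_0750; // import slot from the commit
const SWAP_COUNTER_OFFSET: u32 = 15160; // [gfx+15160]

/// The fix: point the import slot at a real, zeroed cell instead of literal 0.
fn patch_vd_global_device(mem: &mut GuestMem) -> u32 {
    let cell = mem.alloc_cell();
    mem.write_u32(VD_GLOBAL_DEVICE_SLOT, cell);
    cell
}

/// Guest graphics init: store the device pointer through the slot.
fn guest_store_device(mem: &mut GuestMem, device: u32) {
    let cell = mem.read_u32(VD_GLOBAL_DEVICE_SLOT);
    mem.write_u32(cell, device);
}

/// Swap callback: bump [[slot]+0]+15160 and return the address it touched.
fn swap_callback(mem: &mut GuestMem) -> u32 {
    let cell = mem.read_u32(VD_GLOBAL_DEVICE_SLOT);
    let device = mem.read_u32(cell);
    let counter = device.wrapping_add(SWAP_COUNTER_OFFSET);
    let old = mem.read_u32(counter);
    mem.write_u32(counter, old.wrapping_add(1));
    counter
}
```

With the slot pointing at a real cell, the store and the callback agree on where the device lives, so the counter that moves is the device's own.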
MechaCat02	873c197ff1	[iterate-2T] VdSwap: route present through ring PM4_XE_SWAP, drop out-of-band swap interrupt Make ours' VdSwap present path faithful to xenia-canary `VdSwap_entry` (xboxkrnl_video.cc:518-548): write the reserved 64-dword ring slot with a PM4_TYPE0 fetch-constant patch + PM4_TYPE3(PM4_XE_SWAP) + NOP padding, then let the natural drain consume the swap packet in command-stream order. Remove the synthetic CP swap-complete interrupt that `notify_xe_swap` raised out-of-band. Root found this session (the actual present-path bug): ours' `notify_xe_swap` pushed an `InterruptSource::Swap` (→ INTERRUPT_SOURCE_CP) interrupt directly from the VdSwap HLE, decoupled from the GPU command stream. When that interrupt reached the graphics ISR `sub_824BE9A0` before D3D had armed its swap-callback slot (`[gfx+10772]+16` still the `0xBADF00D` placeholder), the ISR took its error path and hit the assert "ERR[D3D]: Unanticipated CPU_INTERRUPT. Sign of a corrupt command buffer?" (`bl sub_824C5DF0; twi` at 0x824BE9DC) — 2x per run on master. Canary's VdSwap raises NO interrupt; swap-complete CP interrupts come only from in-stream PM4_INTERRUPT packets, which are naturally ordered after the callback-arming Type-0 writes. Routing the swap through the ring packet matches that ordering and eliminates the trap (2 -> 0). Canary oracle confirmation (muted, audit_mem_watch + audit_jit_prolog_pc): canary's early/loading loop is present-driven — swap counter [gfx+15160] (0xBE56CA38) advances ~per-vblank from vblank 65 onward, reaching 0xD02 (3330) in ~60s via 6184 CP source=1 interrupts, with VdSwap called only ONCE. So the present interrupts are entirely in-stream, not from the VdSwap export. This is a correctness/faithfulness fix; it does NOT cascade. draws stay 78 at 200M and 1B because the upstream gate persists: the game submits one render batch then stalls (renderer sub_82506xxx 0x; 2nd title thread 0x821748F0 never spawns). 
The per-frame loop sub_822F1AA8 runs ~1207 iterations on vsync but clock B (swap count) only advances ~once, so the manager update sub_821741C8 fires once. That is the iterate-2Q/2F title-pipeline gate, not a present/interrupt bug. swaps 3 -> 4 (the in-stream PM4_XE_SWAP now drains). Deterministic in inline mode (n50m --gpu-inline --stable-digest regenerated byte-identical twice; golden re-baselined: swaps 3 -> 4). cargo test --workspace 675 passing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 15:20:02 +02:00
MechaCat02	1ae472bd2b	[iterate-2S] GPU: implement CP SCRATCH_REG memory writeback - arms Sylpheed's swap-callback slot Sylpheed renders the splash (draws=78, iterate-2O) then plateaus: the title's per-frame manager (sub_821741C8) only re-fires when "clock B" ([gfx+15160], swap count) changes, which only the CP swap-complete callback sub_824CE2B8 increments. The graphics ISR sub_824BE9A0 indirect-calls that callback via [[gfx+10772]+16] on CP (source=1) interrupts, but the slot stayed NULL so the callback never ran. Root (runtime-verified, ours-side GPU): the guest arms the slot through the Xenos CP scratch-register writeback path, which ours never implemented. The arming IB (drained by ours at 0x4adf5180) contains a Type-0 register write of the callback PC 0x824ce2b8 into SCRATCH_REG4 (0x057C). On hardware/canary, writing a SCRATCH_REG{n} mirrors the value to SCRATCH_ADDR + n*4 in memory when the matching SCRATCH_UMSK bit is set. Runtime values: SCRATCH_ADDR=0x0b1d5000 (the [gfx+10772] descriptor), SCRATCH_UMSK=0x20033 (bit 4 set), so SCRATCH_REG4 -> 0x0b1d5010 = descriptor+16 = the callback slot (0x4b1d5010). Ours decoded the Type-0 write into the register file but performed no writeback (case a: drained-but-mishandled), so the slot stayed NULL. Fix mirrors canary's CommandProcessor::HandleSpecialRegisterWrite (command_processor.cc:545-552): a scratch_register_writeback() helper called from handle_type0/handle_type1 after every register write; for SCRATCH_REG0..7 with the UMSK bit set, it writes the value (big-endian, as mem.write_u32 already stores) to SCRATCH_ADDR + n*4 (projected via physical_to_backing). Deterministic given identical register state; proven by unit test. Cascade (verified by runtime probe): slot 0x4b1d5010 now armed with 0x824ce2b8; on the 2-3 CP interrupts that fire, the ISR reads the slot and bcctrl's into sub_824CE2B8 (runs 2x; 0x cascade on master); sub_824CE2B8 increments clock B ([gfx+15160]). 
The cascade does NOT yet reach draws>78: there are only ~3 CP interrupts (from the initial 9825-packet batch), and the title render loop stalls upstream (the iterate-2Q title-respawn gate) before it submits more PM4_INTERRUPT work, so the callback can't bootstrap a self-sustaining loop. This is the remaining update-17/18 arming gap closed; the upstream stall is the next gate. The default threaded GPU backend drains the ring on a separate host thread, so with the callback now doing work the exact CP-interrupt delivery instruction varies run to run (pre-existing GPU-thread race). Pin the n50m oracle test to --gpu-inline (instruction-count deterministic) and re-baseline its golden; bit-exact across repeated runs. New unit test scratch_reg_write_mirrors_to_memory_when_umsk_enabled. Tests: 675 pass (was 674). Golden re-baselined + determinism verified. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 14:21:30 +02:00
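The writeback rule from iterate-2S (mirror SCRATCH_REG{n} to SCRATCH_ADDR plus 4 bytes per register index when bit n of SCRATCH_UMSK is set) can be sketched as a pure function. This is an illustration under assumptions, not the project's `scratch_register_writeback()`: the function name and signature are hypothetical, and `SCRATCH_REG0 = 0x0578` is inferred from SCRATCH_REG4 = 0x057C on the assumption that the eight registers are consecutive.

```rust
/// Assumed base index: SCRATCH_REG4 is 0x057C in the commit, so REG0 is taken as 0x0578.
const SCRATCH_REG0: u32 = 0x0578;
const SCRATCH_REG_COUNT: u32 = 8;

/// Returns Some((address, value)) when a register write must also be mirrored
/// to memory, or None when no writeback applies.
fn scratch_writeback_target(
    reg: u32,
    value: u32,
    scratch_umsk: u32,
    scratch_addr: u32,
) -> Option<(u32, u32)> {
    // Only SCRATCH_REG0..7 participate.
    if !(SCRATCH_REG0..SCRATCH_REG0 + SCRATCH_REG_COUNT).contains(&reg) {
        return None;
    }
    let n = reg - SCRATCH_REG0;
    // The matching unmask bit gates the writeback.
    if scratch_umsk & (1 << n) == 0 {
        return None;
    }
    Some((scratch_addr + n * 4, value))
}
```

Plugging in the runtime values quoted in the commit (SCRATCH_ADDR=0x0b1d5000, SCRATCH_UMSK=0x20033) maps a write of 0x824ce2b8 to SCRATCH_REG4 onto address 0x0b1d5010, the callback slot.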
MechaCat02	034ec8b47f	[iterate-2O] GPU: drain indirect buffers correctly — Sylpheed renders splash (draws 0→78) Ours' GPU never drained the D3D driver's system command buffer past the first 11-dword indirect buffer, so DRAW_INDX / reg-0x57C-arm packets never executed and draws stayed 0 (the long-hunted render gate; see UPDATE-18). Runtime tracing (temporary, removed) showed the guest submits 6 INDIRECT_BUFFER packets at boot (CP_RB_WPTR 22→37) but ours executed exactly ONE IB and then spun 15.7M packets inside it. Three coupled command-processor bugs, all corrected to match canary: 1. `sync_with_mmio` applied the primary CP_RB_WPTR to whichever ring was active, including an executing indirect buffer — `37 % 11 = 3` clobbered the IB's write pointer so its read pointer looped 0→2→5→0 forever and never popped back to the primary ring. CP_RB_WPTR governs ONLY the primary ring; while an IB executes, the primary is the bottom of the IB stack. Canary executes each IB through a separate `RingBuffer reader_` (command_processor.cc), so the primary write pointer is structurally inapplicable to an IB. 2. Indirect buffers were treated as circular rings: read wrapped at `size_dwords` (`11 % 11 = 0`) and never reached the fixed write pointer, so even without the clobber the IB could not terminate. An IB is a fixed linear sub-stream; add `RingBufferView.indirect` and drain `[0, ib_size)` monotonically, then pop. 3. `is_ready` only checked the active ring, so an IB that now correctly exhausts would never get `execute_one` called again to pop back to the primary ring (whose WPTR may have advanced). Check the whole IB stack. Also: the ring was sized `1 << size_log2` bytes (1024 dwords) vs canary's `1 << (size_log2 + 3)` (8192 dwords) — an 8× undersize that desynced WPTR-wrap math from the guest. Fixed in `GpuSystem::initialize_ring_buffer` (and the dead bookkeeping copy in `vd_initialize_ring_buffer`). 
Cascade (deterministic; threaded-default backend, byte-identical across runs): reg 0x57C now written, IB jumps 1→12, packets 15.7M→9,825, and the splash renders — draws 0→78, shaders 0→3, render_targets 0→2, swaps 2→3 — stable at 50M / 200M / 1B. Boot then reaches a new downstream gate (draws plateau at 78, interrupts keep climbing → engine alive, not deadlocked). golden `sylpheed_n50m.json` re-baselined (draws 78). `cargo test --workspace` green (674; +2 ring_view regression tests). vd_swap's synthetic-swap short-circuit is now redundant but left untouched (cascade works without changing it); cleaning it up is a separate follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 22:06:16 +02:00
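The difference between a circular primary ring and a linear indirect buffer, as described in iterate-2O, can be sketched like this. The struct layout and method names are hypothetical (the commit only names `RingBufferView.indirect` and `is_ready`); the sketch shows the intended semantics, not the actual implementation.

```rust
/// Hypothetical read cursor over a command buffer measured in dwords.
struct RingView {
    read: u32,
    write: u32,
    size_dwords: u32,
    indirect: bool,
}

impl RingView {
    /// Primary ring: circular, write pointer driven by CP_RB_WPTR.
    fn primary(size_dwords: u32) -> Self {
        Self { read: 0, write: 0, size_dwords, indirect: false }
    }

    /// Indirect buffer: fixed linear sub-stream covering [0, size_dwords).
    fn indirect_buffer(size_dwords: u32) -> Self {
        Self { read: 0, write: size_dwords, size_dwords, indirect: true }
    }

    /// CP_RB_WPTR governs only the primary ring; an IB ignores it.
    fn set_write_ptr(&mut self, wptr: u32) {
        if !self.indirect {
            self.write = wptr % self.size_dwords;
        }
    }

    /// Consume `dwords`: wrap on the primary ring, advance monotonically in an IB.
    fn advance(&mut self, dwords: u32) {
        if self.indirect {
            self.read = (self.read + dwords).min(self.size_dwords);
        } else {
            self.read = (self.read + dwords) % self.size_dwords;
        }
    }

    fn exhausted(&self) -> bool {
        self.read == self.write
    }
}

/// Ready if any level of the stack (primary at the bottom) still has work.
fn is_ready(stack: &[RingView]) -> bool {
    stack.iter().any(|ring| !ring.exhausted())
}
```

Because the IB never wraps and never takes the primary write pointer, it always terminates after `size_dwords` dwords, and checking the whole stack lets the processor pop back to a primary ring whose write pointer has moved on.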
MechaCat02	93f60a3ba0	[iterate-2M] PCR+0x10C (PRCB.current_cpu): init per-HW-thread to unwedge spin-barrier Ours never initialized the PRCB `current_cpu` byte at PCR+0x10C (prcb_data@0x100 + current_cpu@0xC). Canary sets it from `GetFakeCpuNumber(affinity)` (xthread.cc:847 `pcr->prcb_data.current_cpu = cpu_index`), which equals the HW thread id ours already writes at PCR+0x2C. Left unwritten it read 0 for every thread. Guest spin-barrier `sub_824D1328` (used by the audio/update pump threads at entries 0x824D2878 / 0x824D2940, ours tid 9 / tid 10) indexes a per-HW-thread occupancy byte array via `lbz r11, 268(r13)` then `stbx ..., [array+index]`. With index 0 for all threads, every thread marked slot 0; the multi-byte rendezvous signature it then spins on (`ld [obj+0x164]` compared against the packed per-slot expectation) could never assemble. Both pump threads busied at pc 0x824d140c/0x824d1410 forever (Ready, 5M+ barrier iterations) and never ran their `KeSetEvent` loops — so the events they signal (the 21k-per-thread heartbeat in canary) never fired, starving the downstream worker handshake. Fix: write `hw_id` to PCR+0x10C alongside PCR+0x2C in both the static thread image init (thread.rs) and the dynamic PcrWriter (state.rs, used by scheduler spawn + affinity migration) so the two stay in sync. Runtime-verified BOTH engines. Post-fix the pump threads escape the barrier (barrier iterations 5M+ -> 3) and advance into their loop bodies, now correctly Blocked(WaitAny) at pc 0x824d28d0 / 0x824d29c0 (was spinning at 0x824d140c). imports at n50M 339,766 -> 451,508; deterministic (two cold runs byte-identical). draws still 0 (a later, separate render gate). golden re-baselined. cargo test --workspace: 672 passed, 0 failed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 18:08:46 +02:00
MechaCat02	de21c7a544	[iterate-2G] db16cyc spin-hint cooperative yield: unblock title-screen 0x10a0 gate The silph title state machine (tid13) blocked on event 0x10a0, never signaled. Root: the event's producer chain runs on the silph worker (entry 0x821C4AD0, our tid14), which was starved. tid14 shares a HW slot with a guest spinlock/barrier participant (sub_824D1328, entry 0x824D2940) that busy-spins on the db16cyc hint `or r31,r31,r31` (encoding 0x7FFFFB78) at 0x824D140C. Under our round-robin lockstep the spinner consumed its whole block every round and starved the co-located tid14 (only 9 progress hits over 200M instr), so the producer never reached the event-create/duplicate/signal dance the canary oracle performs (handle F80000E8 set by the submitter F8000044 via a duplicated handle). Fix (canary-faithful): recognize the db16cyc spin hint exactly as canary's InstrEmit_orx does (code 0x7FFFFB78 -> DelayExecution) and surface it as a new StepResult::Yield. The scheduler's yield_current() promotes every Ready peer on the slot past STARVE_LIMIT so begin_slot_visit picks one next round, then they reset and the spinner reclaims the slot: fair alternation, no priority inversion, pure function of slot state (deterministic). Result (lockstep, cache-persist, -n 200M): tid14 progresses past its old stall into a real wait; tid13 advances off 0x10a0 to a new event; hub/submitter re-enter their wait loops. imports 280k->592k, packets 124M->164M, swaps 1->2. draws still 0 (the splash's first draw is a further-upstream gate). Determinism preserved (two cold n50m runs byte-identical). n50m golden re-baselined (imports 90296->339766, swaps 1->2; draws unchanged 0). n2m golden unchanged (db16cyc not reached in first 2M). Tests 670/670. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 10:38:17 +02:00
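The encoding 0x7FFFFB78 quoted in iterate-2G follows directly from the PowerPC X-form layout of `or` (primary opcode 31, extended opcode 444). A small sketch, assuming a hypothetical `classify` helper and a hypothetical `Continue` variant alongside the `StepResult::Yield` named in the commit:

```rust
/// `or r31,r31,r31`, the db16cyc spin hint from the commit.
const DB16CYC: u32 = 0x7FFF_FB78;

#[derive(Debug, PartialEq)]
enum StepResult {
    Continue, // hypothetical: ordinary instruction, keep running
    Yield,    // named in the commit: give co-located Ready peers a turn
}

/// Encode X-form `or rA,rS,rB` with Rc=0:
/// opcode 31 | rS | rA | rB | extended opcode 444 | Rc.
fn encode_or(ra: u32, rs: u32, rb: u32) -> u32 {
    (31 << 26) | (rs << 21) | (ra << 16) | (rb << 11) | (444 << 1)
}

/// Treat the exact spin-hint encoding as a cooperative yield point.
fn classify(instr: u32) -> StepResult {
    if instr == DB16CYC {
        StepResult::Yield
    } else {
        StepResult::Continue
    }
}
```

Matching the full 32-bit word rather than the opcode keeps ordinary `or` instructions (including `mr`, which is `or rA,rS,rS`) on the normal path.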
MechaCat02	f3b7e8b760	[iterate-2F] Scheduler anti-starvation floor: fix job-4 handoff render gate The lockstep scheduler's pick_runnable is strict priority (max_by_key (priority, -idx)). On a cooperative single-host HW slot, a CPU-bound spinner that never blocks (the silph poll loop pinned by affinity to hw=5) wins pick_runnable every round forever, permanently starving a co-located peer (the submitter, tid6) that the spinner is actually waiting on. On real hardware those threads run on separate SMT contexts concurrently, so the spinner never starves the submitter; ours collapses them onto one slot with no anti-starvation, turning priority (or equal-priority index order) into permanent starvation. The starved submitter never dequeued job-4 -> the worker-hub (tid5) blocked INFINITE on completion event 0x1080 -> silph (tid13) wedged on 0x1078 -> no vsync -> draws_seen=0, the publisher splash never renders. (decrement_quantum's within-slot rotation is dead: begin_slot_visit unconditionally re-pick_runnable()s each round, discarding the rotated running_idx. The fix is therefore evaluated at pick time, not via that discarded rotation.) Fix (Option A, bounded anti-starvation, deterministic): - Add per-thread steps_starved counter to GuestThread. - begin_slot_visit increments it for every Ready peer passed over this visit, resets it to 0 for the picked thread. - pick_runnable selects by effective_priority: once steps_starved reaches STARVE_LIMIT (4096) the thread is lifted to i32::MAX and wins exactly one pick, then resets. The genuinely higher-priority thread still wins ~4095/4096 visits -- the boost grants periodic forward progress only, it does NOT invert priority. Pure function of counter/priority/index -> deterministic (no wall-clock, no RNG). Cascade (lockstep exec, XENIA_CACHE_PERSIST=1, -n 200M): - submitter dequeue sub_82458508 now fires 4x (was 3x); the 4th job (buf 0x40baa2c0) is dequeued at cycle 6.15M. 
- hub tid5 leaves Blocked(0x1080) -> now Ready (no more INFINITE wait). - GPU packets 0 -> 116,101,363 (command stream now flowing). - tid13 (silph::UImpl) advances past the old 0x1078 wedge to a NEW downstream wait (handle 0x10a0); 3 new threads spawn (tid14/15/16). - draws_seen still 0 -> the splash's first draw is a NEW downstream gate, not this starvation. Determinism: two cold lockstep `check -n 5M` runs byte-identical (full and stable digests). New n50m stable digest deterministic across two cold runs. Golden re-baselined: instructions 50000007->50000003, imports 92317->90296 (trajectory shift from the changed pick order). Tests: 666/666 (+1 test_anti_starvation_bounded_progress). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 10:02:02 +02:00
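The bounded anti-starvation rule from iterate-2F can be sketched as below. The names `steps_starved`, `STARVE_LIMIT`, `effective_priority`, and `begin_slot_visit` come from the commit message; the struct layout, the `State` enum, and the merging of pick and counter update into one function are simplifications made for this sketch.

```rust
use std::cmp::Reverse;

const STARVE_LIMIT: u32 = 4096;

#[derive(Clone, Copy, PartialEq, Debug)]
enum State {
    Ready,
    Blocked,
}

struct GuestThread {
    priority: i32,
    steps_starved: u32,
    state: State,
}

/// A thread passed over STARVE_LIMIT times is lifted above every real priority.
fn effective_priority(t: &GuestThread) -> i32 {
    if t.steps_starved >= STARVE_LIMIT {
        i32::MAX
    } else {
        t.priority
    }
}

/// Pick the Ready thread with the highest effective priority (lowest index
/// wins ties), reset its counter, and charge one starved step to every
/// other Ready thread on the slot.
fn begin_slot_visit(threads: &mut [GuestThread]) -> Option<usize> {
    let picked = threads
        .iter()
        .enumerate()
        .filter(|(_, t)| t.state == State::Ready)
        .max_by_key(|(i, t)| (effective_priority(t), Reverse(*i)))
        .map(|(i, _)| i)?;
    for (i, t) in threads.iter_mut().enumerate() {
        if i == picked {
            t.steps_starved = 0;
        } else if t.state == State::Ready {
            t.steps_starved += 1;
        }
    }
    Some(picked)
}
```

The pick depends only on counters, priorities, and indices, so two runs with the same thread states make the same choice; the boosted thread wins a single visit and then falls back to its real priority.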
MechaCat02	7e2603a9e5	[iterate-2E] Extend coherent monotonic clock to lockstep (timebase-desync livelock fix) Lockstep livelocked the scheduler the same way --parallel did before `0332d19`: the kernel deadline-arithmetic (`now_basis_at`) read per-thread `ctx(hw_id).timebase`, but a parked/poll thread has `running_idx == None` so `Scheduler::ctx()` returns `idle_ctx` (timebase 0). A poll thread (tid=7, a `KeWaitForSingleObject` loop with a 30ms relative timeout) computing its deadline via `parse_timeout` therefore read `now = 0` and registered `deadline = 0 + 3000 = 3000` — a constant ~7.78M units in the past. `coord_idle_advance` then re-armed that same constant 3000 deadline forever, pinning virtual time and starving every other thread's real future deadline. Render-gate impact: the submitter (tid=6) re-enters a 16ms-timeout WaitForMultiple after its first jobs; that timeout never fired because vtime was pinned at 3000, so virtual time never reached real future deadlines. Fix (Option A — mirror the parallel fix): drive the existing deterministic `Scheduler::global_clock` in lockstep too (floored up once per outer round to `stats.instruction_count`, a pure function of retired guest instructions — no wall-clock), and route `KernelState::now_basis_at` through `global_clock()` in BOTH modes. New `Scheduler::advance_global_clock_to(now)` floor-up keeps it monotone alongside `advance_all_timebases_to`. Parallel behavior unchanged (it already read `global_clock()`). Verified (lockstep, 50M): - DETERMINISM: two cold `check -n 5M` and two cold `-n 50M` runs byte-identical. - LIVELOCK GONE: "advanced to deadline" went from 592,679 fires / 2 unique values / 562,084 pinned at 3000 -> 18,586 fires / 18,567 unique / 0 pinned, strictly increasing 5.4M -> 50M. Poll thread tid=7 now ends Blocked with a real future deadline Some(60002824) instead of spin-Ready on the past 3000. - imports 1,790,936 -> 92,317 at 50M (the spin no longer burns import calls). 
Cascade (lockstep, XENIA_CACHE_PERSIST=1, -n 200M): engine now runs to budget instead of hard-deadlocking. Hub enqueue (sub_82458068) 4x; submitter dequeue (sub_82458508) still 3x — the lost 4th-job HANDOFF (count/notify between sub_82458068's tail and the submitter queue) is a SEPARATE downstream gate, not the timebase. New gate: tid=5 (hub) Blocked INFINITE on event 0x1080 (job-4 completion); tid=6 (submitter) Ready, parked in WaitForMultiple (sub_824AB214), loop-top stops at cycle 6.23M. draws still 0, VdSwap 1. Golden re-baseline (same commit): sylpheed_n50m instructions 50000004 -> 50000007, imports 1790936 -> 92317 (swaps/draws/RTs/shaders/textures unchanged). sylpheed_n2m unchanged (livelock onsets after 2M). Suite 665/665 + oracle green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-12 21:42:28 +02:00
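The floor-up clock from iterate-2E reduces to a `max` against the current value, with relative deadlines computed from that shared clock instead of a per-thread timebase. A minimal sketch; `advance_global_clock_to` and `global_clock` are named in the commit, while the struct layout and `relative_deadline` are hypothetical.

```rust
#[derive(Default)]
struct Scheduler {
    global_clock: u64,
}

impl Scheduler {
    /// Floor-up: the shared clock only ever moves forward.
    fn advance_global_clock_to(&mut self, now: u64) {
        self.global_clock = self.global_clock.max(now);
    }

    fn global_clock(&self) -> u64 {
        self.global_clock
    }
}

/// Deadline for a relative timeout, measured from the shared clock so that a
/// parked thread (whose own timebase would read 0) still gets a future deadline.
fn relative_deadline(sched: &Scheduler, timeout_units: u64) -> u64 {
    sched.global_clock().saturating_add(timeout_units)
}
```

With the clock driven from the retired-instruction count, a timeout of 3000 units registered at clock 5,400,000 lands at 5,403,000 rather than at the constant 3000 that pinned virtual time.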
MechaCat02	341196a111	[Issue-1 PPCBUG-020] Word-form ALU ops produce full 64-bit results Xenon is a 64-bit PPC core (32-bit pointer ABI, but 64-bit registers and integer arithmetic). The interpreter was truncating every word-form integer ALU writeback to 32 bits and zero-extending, on a false "MSR.SF=0 / 32-bit ABI" premise. This silently corrupted any genuine 64-bit value flowing through word-form arithmetic. Confirmed load-bearing via runtime ours-vs-canary capture: Sylpheed's millisecond->LARGE_INTEGER timeout converter sub_824ACA88 does `clrldi; mulli r11,r11,-10000; std`. For a 16 ms wait the correct result is -160000 = 0xFFFFFFFF_FFFD8F00 (relative). canary stores exactly that; ours' truncating `mulli` stored 0x00000000_FFFD8F00 (positive) -> the i64 timeout read as a huge absolute deadline -> a ~26000x over-wait that froze the main frame loop. After the fix the timeout matches canary and the previously-frozen frame/worker loops run (parallel boot NtWaitForMultipleObjectsEx 94 -> 30428; KeWaitForSingleObject/critical-section loops resume). Fix mirrors canary's INT64 emitters (ppc_emit_alu.cc) op-by-op for the 17 data-losing word-form ops: addis, addic(.), subfic(.), mulli, add(c/e/ze/me)x, subf(c/e/ze/me)x, negx, mullwx. Only the result writeback widens to full 64 bit; the 32-bit carry (XER[CA]) and overflow (XER[OV]) computations and the CR0 i32 view are preserved byte-identical (the low 32 bits of the new result equal the old truncated result), so this is a strict no-op for clean 32-bit values and only restores the previously-zeroed upper bits for genuine 64-bit values. Genuinely-32-bit ops (rlwinm/slw/srw/cmpw, mulhw/divw whose upper bits are ISA-undefined) are left untouched. Updated 7 unit tests that asserted the truncation (they encoded the bug) to the canary-correct full-64-bit values. 
Re-baselined the sylpheed_n50m golden (imports 40454 -> 1790936: the unwedged frame/worker loops now cycle under the instruction-count timebase); sylpheed_n2m unchanged (pre-frame-loop). Lockstep determinism preserved (two 50M runs identical). Full suite 660/660. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-12 16:21:11 +02:00
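The `mulli` case from PPCBUG-020 can be reproduced with plain integer arithmetic. The two helpers below are hypothetical stand-ins for the old and new writeback behaviour, not the interpreter's code; the operand values (16 ms times -10000) are the ones quoted in the commit.

```rust
/// Old behaviour: multiply in 32 bits, then zero-extend the result.
fn mulli_truncating(ra: u64, simm: i16) -> u64 {
    (ra as u32).wrapping_mul(simm as i32 as u32) as u64
}

/// Fixed behaviour: sign-extend the immediate and keep the full 64-bit product.
fn mulli_full(ra: u64, simm: i16) -> u64 {
    (ra as i64).wrapping_mul(simm as i64) as u64
}
```

For a 16 ms wait, `mulli_full(16, -10000)` yields 0xFFFFFFFF_FFFD8F00 (that is, -160000 as an i64, a relative timeout), while `mulli_truncating(16, -10000)` yields 0x00000000_FFFD8F00, which reads as a large positive value. The low 32 bits agree, which is why clean 32-bit values were unaffected.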
MechaCat02	ac2f89a7bb	Re-baseline sylpheed_n50m golden post-AUDIT-054 instructions: 50000002 → 50000001 (1-instr shift from FILE_DIRECTORY_FILE plumbing on NtCreateFile path; all other digest fields unchanged — imports/swaps/draws/render-targets/shaders/textures all match prior golden). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 18:11:11 +02:00
MechaCat02	25704c5811	Re-baseline sylpheed_n50m golden post-AUDIT-032 Companion to `49f3eaf` (AUDIT-032 dedicated audio worker). With the audio callback ticker now on by default, the boot trajectory at 50M instr changes: instructions 50000009 -> 50000002 (interpreter stop boundary shift) imports 407215 -> 40454 (-90% — left audio-wait busy loop) swaps 2 -> 1 (degenerate splash repeat lost; main thread advances past splash) draws 0 -> 0 (audio gate != renderer gate per AUDIT-032 methodology correction) The 10x imports drop reflects exiting the NtWaitForSingleObjectEx busy-wait pattern (1.49M -> 30 calls per audit-runs/audit-048-*). Boot now reaches Stfs/Xam content/crypto init phase. The single remaining swap is the first splash; main thread is then blocked on a different handle (0x1280) for follow-up. sylpheed_n2m unchanged — at 2M instr the audio worker hasn't fired yet, so the digest is byte-identical pre/post AUDIT-032. Verified deterministic via two consecutive --expect runs at the new digest (cargo test -p xenia-app --test sylpheed_oracles -- --ignored passes in 2.82s). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 15:07:40 +02:00
MechaCat02	77034b6cbf	audit-038: persistent cache:/* VFS via host-FS backing Replaces the "Synthesized empty file" stub for cache:/* paths with a real host-FS HostPathDevice-style mount. Each KernelState gets a fresh per-process tmpdir under /tmp/xenia-rs-cache-<pid>-<id>/ which is cleared on init for lockstep determinism (mirrors canary's xenia_main.cc:649 RegisterSymbolicLink("cache:", "\\CACHE") + HostPathDevice in xenia-canary/src/xenia/vfs/devices/host_path_device.cc). NtCreateFile now honours create_disposition for cache: paths: FILE_OPEN -> NOT_FOUND if missing FILE_CREATE -> NAME_COLLISION if present FILE_OPEN_IF -> open or create FILE_OVERWRITE_IF -> create or truncate FILE_OVERWRITE -> NOT_FOUND if missing, else truncate FILE_SUPERSEDE -> create or truncate NtReadFile / NtWriteFile / NtSetInformationFile (XFileEndOfFileInformation) / NtQueryInformationFile / NtQueryFullAttributesFile route through std::fs against the per-handle host_path; non-cache paths keep their legacy semantics (read-only disc image, synth-empty stubs). Verified by audit-037 cascade: - sub_82459D18 (cache-miss restore): 0 fires (was firing constantly) - sub_8245D230 (resize/zero-fill): 0 fires (was firing constantly) - 105+ real cache-file writes per 500M run; 4+ MB of game data persisting to disk per boot; cache:/recent, cache:/access, cache:/d4ea*.tmp, etc. - Lockstep deterministic at instructions=100000004 / imports=987485 across 3+ reruns (digest shifted as expected; goldens re-baselined). - swaps=2 plateau still in place; cluster L1 unactivated. Cascade dimension D (cluster activation) — UNKNOWN, no L1 fires. Tests 640 -> 645 (+5 cache-specific unit tests; full workspace green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 14:34:27 +02:00
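The create_disposition table from audit-038 maps naturally onto a single `match`. This sketch uses hypothetical enum and function names and abstracts the host filesystem to a single `exists` flag; the outcomes follow the table in the commit message.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Disposition {
    Open,        // FILE_OPEN
    Create,      // FILE_CREATE
    OpenIf,      // FILE_OPEN_IF
    OverwriteIf, // FILE_OVERWRITE_IF
    Overwrite,   // FILE_OVERWRITE
    Supersede,   // FILE_SUPERSEDE
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Opened,
    Created,
    Truncated,
    NotFound,      // maps to a NOT_FOUND status
    NameCollision, // maps to a NAME_COLLISION status
}

/// Decide what NtCreateFile should do for a cache: path, given whether the
/// backing host file already exists.
fn resolve(disposition: Disposition, exists: bool) -> Outcome {
    use Disposition::*;
    use Outcome::*;
    match (disposition, exists) {
        (Open, true) | (OpenIf, true) => Opened,
        (Open, false) | (Overwrite, false) => NotFound,
        (Create, true) => NameCollision,
        (Create, false) | (OpenIf, false) | (OverwriteIf, false) | (Supersede, false) => Created,
        (OverwriteIf, true) | (Overwrite, true) | (Supersede, true) => Truncated,
    }
}
```

Listing every (disposition, exists) pair explicitly lets the compiler check that no case is missed if a new disposition is added later.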
MechaCat02	76dfe7fd7a	fix(kernel): KRNBUG-KE-001 — real KeResumeThread per canary mirror Replace the no-op cookie-returner with a real impl per canary xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc:216-227 (XObject::GetNativeObject<XThread>()->Resume()). Mirrors nt_resume_thread plumbing two functions below: resolve_pseudo_handle -> scheduler.find_by_handle -> resume_ref. Returns STATUS_SUCCESS if the KTHREAD-pointer-as-handle resolves, STATUS_INVALID_HANDLE otherwise — matches canary's Resume()/!thread return semantics. Cascade-prediction scorecard (audit-018 -> post-fix): - A PASS: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940) leave Suspended -> run prologue -> park on audio buffer-completion semaphores 0x828A3254 / 0x828A3230. - B PARTIAL FAIL: NtSetEvent 667->3334; KeReleaseSemaphore=0; XAudioSubmitRenderDriverFrame=0. - C FAIL (predicted 2->1, actual 2->2): both ExTerminateThread + KeReleaseSemaphore still canary-only. - D FAIL: gamma-cluster blocker unchanged — pc-probe at 0x82184318/0x82184374 no fires; dump-addr 0x828F4070 no DUMP; signal_attempts on 0x1004/0x100c/0x1020/0x15e4 still 0. Necessary-but-not-sufficient: workers unsuspend but park on a downstream gate that's part of the audit-009/-016/-017 gamma cluster. Tests 600 -> 601 (+ke_resume_thread_unblocks_suspended_worker). Lockstep instructions=100000003 imports=987516 deterministic x2. Goldens re-baselined: sylpheed_n50m.json instructions 50000003->50000011, imports 407255->407247. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 20:46:46 +02:00
MechaCat02	a1a7265f29	fix(kernel): KRNBUG-IO-003 — NtDeviceIoControlFile real impl mirroring NullDevice::IoControl Replace the stub_success registration of NtDeviceIoControlFile at exports.rs:90 with a real handler for FsCtlCodes 0x70000 (drive geometry) and 0x74004 (partition info), mirroring xenia-canary xboxkrnl_io.cc:645-678 + null_device.{h,cc}. The 16-byte 0x74004 response with cache_size=0xFF000 at OUT+8 is the gate that lets sub_824ABD88 return SUCCESS and sub_824A9710 reach the priv-11 XexCheckExecutablePrivilege site identified by KRNBUG-AUDIT-007. Stack args 9-10 (OutputBuffer, OutputBufferLength) read from the caller's parameter save area at [sp+0x54] / [sp+0x5C] per the Xbox 360 PowerPC EABI (linkage area sp+0..sp+8, 8-quadword spill area sp+0x14..sp+0x54, then stack args every 8 bytes). First HLE export in the codebase to need 9+ args. Cascade vs. KRNBUG-AUDIT-007 prediction (5/8 held): - XexCheckExecutablePrivilege count 1 → 2 (priv=0xA + priv=0xB) ✓ - XamTaskSchedule count 0 → 1 ✓ - canary-only exports 7 → 3 (audit predicted ≤3) ✓ - 0x15e0 semaphore signal_attempts 0 → 1 (bonus) - 0x100c worker spawn DID NOT fire (still UNCREATED) ✗ - 0x1004 signal_attempts unchanged ✗ - Worker spawn count unchanged at 19 ✗ Tests: 592 → 594. Lockstep deterministic at -n 100M (run1 ≡ run2 ≡ run3, byte-identical). instructions=100000010 → 100000019, imports 407417 → 987524 (+2.4×). swaps=2 draws=0 plateau persists. sylpheed_n50m golden re-baselined instructions=50000004→50000003, imports=407362→407255. sylpheed_n2m unchanged. Still canary-only after this fix: ExTerminateThread, KeReleaseSemaphore, XamUserReadProfileSettings. The next downstream gate is somewhere past XamTaskSchedule's completion path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 22:00:12 +02:00
MechaCat02	bef9793aec	feat(kernel): KRNBUG-IO-001 — NtReadFile on synth-empty file returns SUCCESS+0, not EOF AUDIT-005's static attribution to sub_824ABA98 was wrong. The 0xC0000011 (STATUS_END_OF_FILE) at lr=0x824a97e4 traces to the NtReadFile call at 0x824a9810 inside sub_824A9710 — the cache-loader reads 1024 B from offset 2048 of `\Device\Harddisk0\partition0`. Our synth-empty fallback returned EOF (start_pos 2048 > size 0), so the function bailed via RtlNtStatusToDosError before sub_824ABA98 was ever called. Canary mounts partition0 to a NullDevice; `NullFile::ReadSync` ([null_file.cc:24-31](xenia-canary/src/xenia/vfs/devices/null_file.cc)) returns X_STATUS_SUCCESS with bytes_read=0 and never touches the buffer. Sylpheed's caller pre-zeroes the 1024-byte stack buffer (`memset(sp+208, 0, 1024)` at sub_824A9710 prologue), validates a "Josh" magic on the first read, and falls back to the cache-recreate path when the magic doesn't match. The fix mirrors NullFile semantics: when the open synthesized a zero-length file (`data.is_empty() && size == 0`), NtReadFile returns SUCCESS with information=0 and the buffer untouched. 
Effects (chain-of-effects verification at -n 500M): - tests: 590 → 591 (added regression covering NullDevice semantics) - lockstep: deterministic across 3 reruns (same instructions=100000010, swaps=2) - sylpheed_n50m golden re-baselined: instructions 50000004→50000000, imports 407416→407362 - canary kernel-call diff: 10 → 7 missing exports (XeCryptSha + XeKeysConsolePrivateKeySign + NtDeviceIoControlFile now run; the cache-recreate path executes through to NtWriteFile) - boot reaches silph::Silph::Impl::OnInit: 19 worker threads spawn (was 6 before the fix) - parked-handle 0x1004 still signal_attempts=0; the original 0x100c and 0x15e0 are now <UNCREATED> because cascade walked past them and the handle assignments shifted; new parked sites: 0x12fc/0x1600/0x1040/0x10b8/0x15e8/0x1014/0x101c/0x10bc/0x1044 - draws=0 plateau persists; renderer is multi-causal blocked Next blocker: per the canary-only diff, XamTaskSchedule + the cluster of XAM exports (XamTaskCloseHandle, XamUserReadProfileSettings, ObCreateSymbolicLink) and the post-thread-exit chain (ExTerminateThread, KeReleaseSemaphore, KeResetEvent) are the next-up frontier.	2026-05-04 20:20:10 +02:00
MechaCat02	19659d7f76	feat(kernel): KRNBUG-XAM-001 — XGetAVPack returns 8 (HDMI), not 0x16 Mirrors canary's cvars::avpack default (xam_info.cc:35) and Sylpheed's accepted set {3,4,6,8} (xam_info.cc:250-251). With KRNBUG-XEX-001 having flipped the priv-10 gate, XGetAVPack now reaches its caller in sub_824AB578; returning 0x16 caused Sylpheed to abort the AV/crypto block before XeCryptSha. Cascade walks one step (canary-only export list 11 → 10); sub_824ABA98 is the next candidate. Tests: 589 → 590. Goldens re-baselined (n50m: 50000005→50000004, imports 407417→407416). Lockstep deterministic across 3 reruns at -n 100M (instructions=100000010, import_calls=987686 +2.4×, swaps=2). 9-PC producer probe still 0×; parked handles 0x1004/0x100c/0x15e0 still signal_attempts=0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 18:54:24 +02:00
MechaCat02	1a892d4641	feat(kernel): KRNBUG-XEX-001 — real XexCheckExecutablePrivilege from XEX header bitmap Replace stub_return_zero with a canary-faithful implementation that returns bit `priv` of the loaded XEX's XEX_HEADER_SYSTEM_FLAGS (key 0x00030000) bitmap. Mirrors xenia-canary xboxkrnl_modules.cc:22-39: `(flags >> priv) & 1` for priv < 32, else 0. Plumbing: - xenia-xex: header_keys::SYSTEM_FLAGS const + get_system_flags() accessor. - xenia-kernel/state.rs: pub xex_system_flags: u32 + xex_priv_logged HashSet for one-shot per-priv tracing. - xenia-app: kernel.xex_system_flags wired in cmd_exec_inner. - xenia-kernel/exports.rs: real export body + unit test covering bits 10/11/0/64 + zero-flags case. Sylpheed's bitmap is 0x00000400 (only XEX_SYSTEM_PAL50_INCOMPATIBLE, bit 10). At -n 500M with the fix: - XGetAVPack: 0 -> 1 (priv-10 gate at lr=0x824ab598 flipped). - 10 other canary-only exports + 9 producer PCs + 3 parked handles unchanged. Priv-11 site at sub_824A9710 is downstream and still not reached — AV/crypto block aborts after XGetAVPack returns our placeholder 0x16 (canary returns 8/HDMI; Sylpheed accepts only 3/4/6/8 per xenia-canary xam_info.cc:250-251). Tests 588 -> 589. Lockstep deterministic (3 reruns identical): n50m goes 50000008 -> 50000005 instr / 407415 -> 407417 imp / swaps=2 / draws=0. Goldens re-baselined (sylpheed_n50m, sylpheed_n2m); oracle test green. Full chain-of-effects + next-frontier hand-off in audit-findings.md under KRNBUG-XEX-001. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 18:32:51 +02:00
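The privilege check from KRNBUG-XEX-001 is a single bit test on the system-flags bitmap. A minimal sketch of the rule `(flags >> priv) & 1` for priv < 32, else 0; the function name and signature are illustrative and not the actual export body.

```rust
/// Return bit `privilege` of the XEX system-flags bitmap, or 0 when the
/// privilege index falls outside the 32-bit bitmap.
fn check_executable_privilege(system_flags: u32, privilege: u32) -> u32 {
    if privilege < 32 {
        (system_flags >> privilege) & 1
    } else {
        0
    }
}
```

With Sylpheed's bitmap 0x00000400, privilege 10 returns 1 and privilege 11 returns 0, which matches the priv-10 gate flipping while the priv-11 site sees 0. The explicit range check also avoids shifting a `u32` by 32 or more, which would panic in debug builds.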
MechaCat02	1f416aaa2e	test(check): ORACBUG-004 - sylpheed_n50m stable-digest oracle Adds a regression-catcher golden for Sylpheed boot at -n 50M lockstep, covering the first VdSwap pair (the n2m oracle is swap-blind because the first VdSwap fires at ~18M instructions). The new --stable-digest flag emits/compares only fields that are deterministic in lockstep: instructions, imports, unimpl, draws, swaps, unique_render_targets, shader_blobs_live, texture_cache_entries. Excluded: packets (empirically ±2-8% lockstep variance; GPU thread race per audit M11); resolves, interrupts_delivered, interrupts_dropped, texture_decodes (scheduling-sensitive under --parallel); path (cwd-dependent). Empirical determinism: 3 consecutive lockstep -n 50M runs produce byte-identical stable-digest output. The n4b canonical-invocation golden that the audit's recommended next sprint also called for is deferred. Per audit memory `--parallel --reservations-table` is pathologically slow (>32 min for -n 100M), so -n 4B in that mode would be many hours per run, not the 5-15 min the plan estimated. n4b will be captured one-shot post-renderer-unblock as a manual artifact under audit-runs/post-fix/, not as a test golden. See crates/xenia-app/tests/golden/README.md. Test infrastructure: - crates/xenia-app/tests/sylpheed_oracles.rs: invokes CARGO_BIN_EXE_xenia-rs against the ISO. Path resolved via SYLPHEED_ISO env var (skips gracefully if missing). - #[ignore]-gated; run via: cargo test --release -p xenia-app --test sylpheed_oracles -- --ignored --nocapture Closes ORACBUG-004 (P0). Partial: ORACBUG-006 (P1 deferred). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 13:46:02 +02:00

18 Commits