ad9c8e4cb8a1c607f7578e19b52bdd1d4084899a
25 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
873c197ff1 |
[iterate-2T] VdSwap: route present through ring PM4_XE_SWAP, drop out-of-band swap interrupt
Make ours' VdSwap present path faithful to xenia-canary `VdSwap_entry` (xboxkrnl_video.cc:518-548): write the reserved 64-dword ring slot with a PM4_TYPE0 fetch-constant patch + PM4_TYPE3(PM4_XE_SWAP) + NOP padding, then let the natural drain consume the swap packet in command-stream order. Remove the synthetic CP swap-complete interrupt that `notify_xe_swap` raised out-of-band. Root found this session (the actual present-path bug): ours' `notify_xe_swap` pushed an `InterruptSource::Swap` (→ INTERRUPT_SOURCE_CP) interrupt directly from the VdSwap HLE, decoupled from the GPU command stream. When that interrupt reached the graphics ISR `sub_824BE9A0` before D3D had armed its swap-callback slot (`[gfx+10772]+16` still the `0xBADF00D` placeholder), the ISR took its error path and hit the assert "ERR[D3D]: Unanticipated CPU_INTERRUPT. Sign of a corrupt command buffer?" (`bl sub_824C5DF0; twi` at 0x824BE9DC) — 2x per run on master. Canary's VdSwap raises NO interrupt; swap-complete CP interrupts come only from in-stream PM4_INTERRUPT packets, which are naturally ordered after the callback-arming Type-0 writes. Routing the swap through the ring packet matches that ordering and eliminates the trap (2 -> 0). Canary oracle confirmation (muted, audit_mem_watch + audit_jit_prolog_pc): canary's early/loading loop is present-driven — swap counter [gfx+15160] (0xBE56CA38) advances ~per-vblank from vblank 65 onward, reaching 0xD02 (3330) in ~60s via 6184 CP source=1 interrupts, with VdSwap called only ONCE. So the present interrupts are entirely in-stream, not from the VdSwap export. This is a correctness/faithfulness fix; it does NOT cascade. draws stay 78 at 200M and 1B because the upstream gate persists: the game submits one render batch then stalls (renderer sub_82506xxx 0x; 2nd title thread 0x821748F0 never spawns). The per-frame loop sub_822F1AA8 runs ~1207 iterations on vsync but clock B (swap count) only advances ~once, so the manager update sub_821741C8 fires once. That is the iterate-2Q/2F title-pipeline gate, not a present/ interrupt bug. swaps 3 -> 4 (the in-stream PM4_XE_SWAP now drains). Deterministic in inline mode (n50m --gpu-inline --stable-digest regenerated byte-identical twice; golden re-baselined: swaps 3 -> 4). cargo test --workspace 675 passing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
034ec8b47f |
[iterate-2O] GPU: drain indirect buffers correctly — Sylpheed renders splash (draws 0→78)
Ours' GPU never drained the D3D driver's system command buffer past the first 11-dword indirect buffer, so DRAW_INDX / reg-0x57C-arm packets never executed and draws stayed 0 (the long-hunted render gate; see UPDATE-18). Runtime tracing (temporary, removed) showed the guest submits 6 INDIRECT_BUFFER packets at boot (CP_RB_WPTR 22→37) but ours executed exactly ONE IB and then spun 15.7M packets inside it. Three coupled command-processor bugs, all corrected to match canary: 1. `sync_with_mmio` applied the primary CP_RB_WPTR to whichever ring was active, including an executing indirect buffer — `37 % 11 = 3` clobbered the IB's write pointer so its read pointer looped 0→2→5→0 forever and never popped back to the primary ring. CP_RB_WPTR governs ONLY the primary ring; while an IB executes, the primary is the bottom of the IB stack. Canary executes each IB through a separate `RingBuffer reader_` (command_processor.cc), so the primary write pointer is structurally inapplicable to an IB. 2. Indirect buffers were treated as circular rings: read wrapped at `size_dwords` (`11 % 11 = 0`) and never reached the fixed write pointer, so even without the clobber the IB could not terminate. An IB is a fixed *linear* sub-stream; add `RingBufferView.indirect` and drain `[0, ib_size)` monotonically, then pop. 3. `is_ready` only checked the active ring, so an IB that now correctly exhausts would never get `execute_one` called again to pop back to the primary ring (whose WPTR may have advanced). Check the whole IB stack. Also: the ring was sized `1 << size_log2` bytes (1024 dwords) vs canary's `1 << (size_log2 + 3)` (8192 dwords) — an 8× undersize that desynced WPTR-wrap math from the guest. Fixed in `GpuSystem::initialize_ring_buffer` (and the dead bookkeeping copy in `vd_initialize_ring_buffer`). Cascade (deterministic; threaded-default backend, byte-identical across runs): reg 0x57C now written, IB jumps 1→12, packets 15.7M→9,825, and the splash renders — draws 0→78, shaders 0→3, render_targets 0→2, swaps 2→3 — stable at 50M / 200M / 1B. Boot then reaches a new downstream gate (draws plateau at 78, interrupts keep climbing → engine alive, not deadlocked). golden `sylpheed_n50m.json` re-baselined (draws 78). `cargo test --workspace` green (674; +2 ring_view regression tests). vd_swap's synthetic-swap short-circuit is now redundant but left untouched (cascade works without changing it); cleaning it up is a separate follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
2bdb93e51e |
[iterate-2K] GPU physical-mirror aliasing: ring/IB/RPtr/resolve read wrong host region
Root cause (physical-mirror aliasing gap → GPU read wrong region → ring never truly drained → render worker ring-space wait → no frame → no draw): The Xbox 360 maps its 512 MB of physical DRAM into several virtual mirror windows differing only in cache policy — bare physical (0x0xxxxxxx), write-combine (0x4xxxxxxx), and cached 0xA/0xC/0xExxxxxxx — all aliasing addr & 0x1FFF_FFFF. Ours has one flat membase and `heap_alloc` (MmAllocatePhysicalMemoryEx) commits physical backing in the 0x4xxxxxxx window. The guest masks its CP-ring allocation base to bare physical (0x4adcc000 & 0x1FFFFFFF = 0x0adcc000) before handing it to VdInitializeRingBuffer, and PM4 INDIRECT_BUFFER / writeback / resolve pointers are likewise bare-physical. Ours stored those verbatim and read `membase + 0x0adcc000`, a never-committed zero-filled page — so the GPU drained ~718k zero PM4 headers, never executed the real Type3/DRAW stream, and the RPtr writeback landed on a zero page the render worker (tid=8) polls, freezing it forever. Fix (GPU/Vd-boundary translation, not memory-layer): add `physical_to_backing(addr)` deriving the committed backing exactly from `heap_alloc`'s placement (0x4000_0000 | (addr & 0x1FFF_FFFF), idempotent for the WC window, flat for non-physical code/stack). Apply it at every point the GPU/kernel consumes a guest physical address: ring base (initialize_ring_buffer), RPtr writeback (enable_rptr_writeback), PM4 INDIRECT_BUFFER pointer, WAIT_REG_MEM / COND_WRITE memory poll+write, REG_TO_MEM / MEM_WRITE / EVENT_WRITE* / LOAD_ALU_CONSTANT / IM_LOAD addresses, the resolve dest write, and the vd_swap frontbuffer present read. This was chosen over memory-layer aliasing because the latter re-projects every CPU load/store and corrupts the guest's flat 0xA/0xC/0xE accesses (it caused an early PC=0xfffffffc fault). Two adjacent GPU-backend gates this exposed and also fixed (canary-faithful): - WaitCmp::from_wait_info was off by one vs canary's MatchValueAndRef selector (it decoded wait_info&7==3 as NotEqual instead of Equal), inverting the standard CP coherency wait so the GPU parked forever on the first INDIRECT_BUFFER. Remapped to 1=Less..7=Always, 0=Never. - Added MakeCoherent: a WAIT polling COHER_STATUS_HOST clears the status bit (mirrors command_processor.cc:801-838) so the coherency handshake resolves. Result: the GPU now decodes the real Type3 packets at 0x4adcc000 (ME_INIT, INDIRECT_BUFFER → real Type0/WAIT_REG_MEM at 0x4adf5080) instead of zero-headers; RPtr at 0x408619fc advances (0x13, 0x16, … written by the GPU worker); the frame loop sub_822F1AA8 actively writes the controller at 0x40d09a40 (0x20→0x21→0x23); no fault, full 200M/1B budget runs clean. draws_seen is still 0: the remaining gate is upstream and separate — the main frame loop never sets controller bit-28 (frame-ready) at [0x40d09a40] (stalls at 0x23, the known iterate-2C state-divergence gate), so the guest never enqueues a render IB; the GPU only ever replays the init IB. This fix correctly unblocks the GPU ring/IB/RPtr data path (gate-2 GPU backend); the bit-28 frame-ready gate is the next target. Stable golden (sylpheed_n50m) unchanged (draws/swaps/RTs/shaders identical at 50M); regenerated twice byte-identical. cargo test --workspace: 672 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
ed2e0e72fd |
[iterate-2J] KeTimeStampBundle deterministic tick: fix frozen+mislaid guest clock
The xboxkrnl data export KeTimeStampBundle (ordinal 0x00AD, import slot
0x820007d0 — confirmed via sylpheed.db imports table) was set up with TWO
defects in the import-patch pass:
1. FROZEN: the block was written once at boot and never updated, so every
field stayed a constant for the whole run (observed: the guest's clock
reader sub_824AA830 = [[0x820007d0]+0x10] returned a constant
0x01d6bc0c from 5M..150M instructions).
2. WRONG LAYOUT: it stuffed the FILETIME high-dword at +0x10. The canonical
X_TIME_STAMP_BUNDLE (xenia-canary kernel_state.h) is:
+0x00 interrupt_time u64 (100ns since boot)
+0x08 system_time u64 (FILETIME 100ns since 1601)
+0x10 tick_count u32 (milliseconds since boot)
+0x14 padding
so [block+0x10] is tick_count in ms, not a FILETIME dword.
Fix (deterministic, no wall-clock):
* Initialize the block with the correct field layout (tick_count = 0 at
boot, system_time = FILETIME base, interrupt_time = 0).
* Store the block VA on KernelState::timestamp_bundle_addr during the
import patch.
* Add KernelState::update_timestamp_bundle(mem, clock) and call it every
round in BOTH the lockstep (run_execution) and parallel
(run_execution_parallel) outer loops, right where the deterministic
Scheduler::global_clock is advanced. The clock is the retired-instruction
monotonic global_clock, so every guest-visible time value stays a pure
function of guest progress (lockstep byte-reproducible).
* Cadence: 1 global_clock unit = 100ns (coherent with parse_timeout, which
divides 100ns timeouts by 100 onto the same basis), so
INSTRUCTIONS_PER_MS = 10_000. tick_count now advances 0 -> ~4999ms over
a 50M-instruction window. Also make KeQuerySystemTime read the same
100ns clock instead of a frozen FILETIME constant.
Verification: tick_count at 0x40002010 now advances (deadline arm at
0x82450d0c stores clock+66 = 0x260,0x269,...,0x51d,... advancing, vs the
frozen 0x01d6bc4e before the fix). Determinism: two cold --stable-digest
runs are byte-identical; the n50m golden is UNCHANGED (the clock-affected
counter is not in the stable digest). 672/672 tests pass.
HONEST CAVEAT — the predicted render cascade did NOT materialize on this
branch. The diagnosed consuming gate at 0x82450b10 (the clock-vs-deadline
compare in the worker-hub channel loop sub_82450A68) is unreachable here:
the loop always branches away at 0x82450b0c ([this+220] >= channel-index),
so the hub already dispatches sub_82450B68 342x in BOTH the frozen and
fixed builds. Guest trajectory (imports 339766@50M / 1738001@200M /
9212446@1B), draws (0), swaps (2) and thread topology (tid14 Ready, not
blocked on 0x109c) are identical frozen-vs-fixed. This commit is therefore
a correct latent-clock-bug fix and determinism-safe prerequisite, NOT the
render unblock. The 0x109c/tid14 starvation premise was not reproduced at
f75bc96; the next gate must be re-localized.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
0332d1990d |
[Track 2] Parallel-scoped global clock fixes timebase-desync livelock
In --parallel mode a long run livelocked: the scheduler spun "advanced to deadline 3000 waking hw=2 idx=0" ~14k times in microseconds. Root cause: each guest thread owns ctx.timebase (+1/instr in step_block), and all kernel deadline arithmetic read Scheduler::ctx(hw_id).timebase as "now". But the parallel worker extracts its PpcContext via mem::replace(ctx_mut_ref, PpcContext::new()) — leaving a ZEROED timebase in the slot while it steps unlocked — and advance_all_timebases_to only walks runqueue (never idle_ctx). So the coordinator's coord_pre_round drain and a woken thread's parse_timeout could read a zeroed/stale basis decoupled from the deadline the scheduler just advanced to. The thread re-armed the same constant deadline forever; the global clock never moved. Fix: add a single monotonic Scheduler::global_clock, advanced by the per-block retired-instruction count on each parallel writeback and floored up by advance_all_timebases_to. Kernel deadline reads route through KernelState::now_basis_at(hw_id), which returns global_clock ONLY when parallel_active; lockstep keeps reading the exact pre-existing ctx(hw_id).timebase expression, so the deterministic lockstep trace is byte-identical (sylpheed_n50m golden unchanged, zero re-baseline). Verified: - 50M --parallel run completes (was: hung). Deadlines now strictly increasing 5.4M -> 49.1M (18097 unique of 18116; max repeat 2) vs pre-fix constant 3000 x ~14000. - sylpheed_n50m golden byte-identical via plain `check` (no persist). - Full suite 665/665 green. Note: an intermittent parallel hang/crash (~1-2/20 at -n 5M) is pre-existing (master 1/20, this build 2/20 — within noise) and distinct from the timebase livelock: it is a parallel-race class (e.g. the unsafe block_ptr deref in run_execution_parallel). Tracked separately; lockstep remains the recommendation for long runs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
b20c99f141 |
[Subsystem-fixes] 6 verified ours-vs-canary divergence fixes
From the 2026-06-12 5-subsystem differential audit. All verified against canary as oracle; 660/660 workspace tests green (655 + 5 new). 1. nt_create_event polarity (exports.rs) — `manual_reset = gpr[5] != 0` was INVERTED. Canary xboxkrnl_threading.cc:668 `Initialize(!event_type,..)` + xevent.cc:41 (type 0 = NotificationEvent = manual, type 1 = Sync = auto). Now `== 0`. Was the dormant 2.AI fix on chore/portable-snapshot, never merged. The Ke-path was already correct; only the Nt-path was wrong. 2. 2.AF deadline drain (main.rs coord_pre_round) — expired KeWait/KeDelay deadlines never fired under load because advance_to_next_wake_if_due was only called in coord_idle_advance (no-Ready-threads path). Added a per-round drain loop; covers BOTH lockstep and parallel outer loops since both call coord_pre_round. Was the dormant 2.AF fix, never merged. 3. handle slab-recycle ABA guard (state.rs + scheduler.rs) — release_handle_slot (my round-34 regression) recycled a closed slot even with a thread still parked on it, risking a stale-waiter wake when the slot is re-minted. Added Scheduler::any_thread_waiting_on; decline to recycle a still-waited slot. 4. vpkpx pixel-pack (vmx.rs) — wrong field mapping (~100% mismatch). Now exact canary ppc_emit_altivec.cc:1795 shift/mask (red 6b out[15:10] from w[24:19], green out[9:5] from w[14:10], blue out[4:0] from w[7:3]; no fabricated alpha bit). +unit test. 5. VFS GDFX attribute plumbing (vfs/*, exports.rs query fns) — VfsEntry now carries the real on-disc attribute byte (GDFX dirent +12, canary disc_image_device.cc:136/154) instead of inferring directory-ness from path shape. Query exports report the real FILE_ATTRIBUTE_* bits. Candidate driver of the XamShowDirtyDiscErrorUI gate. +tests. 6. MmGetPhysicalAddress region-aware mirror (exports.rs) — flat 0x1FFFFFFF mask missed canary's +0x1000 host_address_offset for 0xE0000000+ mirror (memory.cc:2317). Read-only query; proven byte-identical 50M digest. +test. Investigated and intentionally NOT changed: - zero-on-recommit: no-op; ours has no region-reuse path (bump allocators, free is a stub). - 32-bit ALU writeback truncation (PPCBUG-020): documented-deliberate; premise (MSR.SF=0) is questionable but flipping it is out of scope here. - KeSetEvent/NtSetEvent return value: ours returns true previous state (hardware-faithful); canary returns constant 1 — NOT an ours bug. sylpheed_n50m golden will need re-baselining (legit behavior change). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
db90ad0f7d |
[AUDIT-059 R-D2] Phase D auto-signal POC confirms audit-049 wedge diagnosis
Hook NtCreateEvent for the silph::UImpl tid=13 chain (entry=0x821748F0, start_context=0x4024a840, frame-1 LR=0x821CB15C inside sub_821CB030+0x128) and auto-signal the resulting handle after XENIA_SILPH_UI_AUTOSIGNAL_DELAY instructions. Env-gated; default off. SR4 verdict B (partial unwedge): - handle 0x1078 signal_attempts 0->1 - tid=13 Blocked(WaitAny[0x1078]) -> Ready pc=0x824a9108 - ExCreateThread 10 -> 12 (new silph::UImpl tid=14, worker tid=15) - New downstream wedges 0x1084 + 0x1088 - cxx_throw runtime_error on tid=5 inside R26 dispatcher (BST not-registered instance lhs=0x715a7af0) - VdSwap stays 1; no draws (POC is diagnostic, not final fix) Confirms Phase C diagnosis end-to-end. The real signaler must (a) drive NtSetEvent on the silph KEVENT AND (b) register the dispatcher's BST instance upstream; this POC only does (a). Reading-error class #20: ctx.lr at kernel export entry is the thunk wrapper's return slot, NOT the guest caller's post-bl PC. Walk back-chain 1 step to get frames[1].lr. Reading-error class #21: --parallel and lockstep have SEPARATE outer loops in main.rs (run_execution_parallel line 2928 vs run_execution line 2706). Per-round hooks must be wired in BOTH paths. Files: - crates/xenia-cpu/src/scheduler.rs: GuestThread.start_entry/start_context fields + spawn() population + current_thread_entry_and_ctx() helper - crates/xenia-kernel/src/state.rs: AutoSignalPending struct, env-parsed silph_autosignal_delay, pending Vec, last_cycle_hint, set_now_cycle_hint, maybe_register_silph_autosignal (walks back-chain), fire_due_silph_autosignals - crates/xenia-kernel/src/exports.rs: hook in nt_create_event - crates/xenia-app/src/main.rs: fire-site + cycle hint in both outer loops - audit-runs/audit-059-handle-disambiguation/round-D2-autosignal-poc/FINDINGS.md Tests 655/655 green. Default behavior byte-identical when env unset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
229b46c765 |
[Kernel] Slab-recycle handle allocator (AUDIT-059 R34)
Adds a FIFO free list of closed handle slots so alloc_handle returns recycled IDs before bumping next_handle. Mirrors canary's slab-style ObjectTable: F8000098 reused 130x per 30s window in canary, but ours' monotonic bump allocator never reused slots — so a recycled slot in canary maps to a fresh, never-reused slot in ours, drifting kernel object identity per AUDIT-042's analysis. release_handle_slot is wired into nt_close's refcount==0 branch and gated to the canonical [0x1000, 0xF000_0000) range so synthetic XAudio park handles (AUDIT-048) are never recycled. Verified: all 655 workspace tests green, smoke tests at -n 50M show NtClose 115/run with handle table renumbering active (round-34 max handle 0x12ac vs round-16 baseline 0x12b8 over same workload). γ- cluster #2 wedge unchanged — silph wait still parks tid=13 on the renumbered handle (4216=0x1078 here vs 0x12a4 baseline), confirming the wedge is independent of allocator policy. Lands as a parity fix to bring our kernel-object identity in line with canary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
b5885b8560 |
[2.BF] Synthetic silph::WorkerCtx spawn (round 18 — opt-in landing)
Adds infrastructure to synthesise the silph::WorkerCtx that AUDIT-058/059 identified as never reached by ours' static-init chain (real chain entry sits in audit-059 round 9's wrong-vtable wedge at sub_82172BA0+0x1E8). Ctx layout follows round 5's live hexdump from canary: +0x00 vtable = 0x8200A1E8 +0x04 self +0x08 intrusive list head -> self +0x0C init flag = 1 +0x10 packed byte field +0x18 2x float ~1.0 (UI rates) +0x24 flag = 1 +0x28..+0x30 3x foreign-arena pointers (left NULL — see below) +0x54..+0x84 4x X_KEVENT auto-reset, state=0 +0x94..+0xC4 4x X_KEVENT manual-reset, state=1 (pre-signaled) +0x210..+0x250 4-entry intrusive work-ring, empty Worker spawn mirrors AUDIT-048's audio-worker pattern in xaudio_register_render_driver: per-worker allocate_thread_image + state.scheduler.spawn with r3 = ctx_ptr. Trigger fires at the first dat/* VFS open (ours' earliest is dat/files.tbl), which is when canary runs the equivalent chain. ROUND 18 OUTCOME — opt-in only: With workers spawned Ready (XENIA_SILPH_SYNTH=1), boot CRASHES at cycle ~5.5M with PC=0 on hw=1, just after worker_3 (entry 0x825065B8) spawns. Per task constraints this is STOP-and-report: the ctx fields +0x28/+0x2C/+0x30 (foreign heap pointers — canary's 0x30057018, 0xBCE25640, 0xBE568F00, distinct arenas per audit-059 round 7) are left NULL, and the worker bodies plausibly dereference one of them. Synthesising those is a fresh investigation (round 19+). With workers spawned Suspended (XENIA_SILPH_SYNTH=suspend), boot completes normally (11 spawns, VdSwap=1, KeSetEvent=2, KeReleaseSemaphore=1 — matches default baseline). The ctx remains materialised in guest memory at the logged VA for downstream probing. Default (env var unset): no synth, no regression. Files: crates/xenia-kernel/src/silph_synth.rs (new, 225 LOC) crates/xenia-kernel/src/lib.rs (+1 LOC, register module) crates/xenia-kernel/src/exports.rs (+37 LOC, hook in open_vfs_file) crates/xenia-kernel/src/state.rs (+18 LOC, 4 silph_synth_* fields) Tests: cargo test --release --workspace = 765 pass / 0 fail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
2a8ff9515d |
AUDIT-054: thread CreateOptions through NtCreateFile + opt-in cache persistence
Track A — FILE_DIRECTORY_FILE handling. NtCreateFile's 9th parameter
`create_options` (sp+0x54 per shim_utils.h:49-50) is now read and
forwarded to open_vfs_file/open_cache_file. When the
FILE_DIRECTORY_FILE bit (0x1) is set on a `cache:\<hash>` path,
the host-side handler `mkdir -p`s instead of `File::create`'ing a
0-byte sentinel that blocked subsequent hierarchical creates of
`cache:\<hash>\<sub>\<leaf>` with NAME_COLLISION. Confirmed by
`opts=0x4021` (incl. FILE_DIRECTORY_FILE) on `cache:\d4ea4615`
and `opts=0x4020` (no DIR bit) on the leaf `.tmp` files. NtOpenFile
forwards `open_options` (r8) into the same slot per
xboxkrnl_io.cc:118-122. Closes the AUDIT-053 ζ-class VFS layout
aliasing wedge.
Track B — opt-in persistent cache root. AUDIT-038's per-process
tmpdir + wipe stays the default (preserves lockstep/oracle
determinism + dodges Sylpheed's `<hash>.tmp` journal-append-on-
reboot self-inconsistency). Persistence is now opt-in via
* `XENIA_CACHE_ROOT=<path>` — explicit path (caller manages
wiping); hands a stable place to drop a canary-built cache
for cascade A/B oracle work.
* `XENIA_CACHE_PERSIST=1` — `$XDG_DATA_HOME/xenia-rs/cache`
(or `$HOME/.local/share/xenia-rs/cache`).
Cold-start (-n 500M, default tmpfs) with FILE_DIRECTORY_FILE fix:
swaps=1 draws=0 imports=40454 cxx_throw=0 — matches master baseline,
no regression. Cache hierarchy now mkdir-p'd correctly: `cache:/`
contains 9 hash dirs (e.g. `d4ea4615/e/`, `aab216c3/5/`) instead
of the 0-byte sentinel files AUDIT-053 found masquerading as
directories.
LOC: +88 / -14 = +74 net (≤80 budget). All 127 xenia-kernel unit
tests pass.
Trace: audit-runs/audit-054-vfs-layout-fix/
cold-start-digest.json + warm-start-digest.json (defaults)
persist-cold-digest.json + persist-warm-digest.json (opt-in)
baseline-master-digest.json (master
|
||
|
|
49f3eafa15 |
AUDIT-032: dedicated audio worker thread per client (Plan B)
Replaces APUBUG-PRODUCER-001's random-victim-hijack audio injection with a dedicated per-client guest worker thread, mirroring xenia-canary's apu/audio_system.cc:84-159 WorkerThreadMain pattern in xenia-rs's threading model. Audio callback ticker is now safe to enable by default. ## What changed - xenia-kernel/src/xaudio.rs: new XAudioState fields worker_handles + worker_refs (one slot per of XAUDIO_MAX_CLIENTS=8). Synthetic park-handle helper (0xF000_0000 | client_idx) — outside the normal alloc range so wake_eligible_waiters never finds it; the only legitimate state-flip is via try_inject_audio_callback. - xenia-kernel/src/exports.rs: xaudio_register_render_driver spawns a 64KB-stack guest thread (create_suspended=true) via state.scheduler.spawn after registration succeeds. Immediately flips the spawned thread's state from Blocked(Suspended) to Blocked(WaitAny[synthetic]) so it's parked but not woken. Stores the kernel handle so find_by_handle resolves a fresh ThreadRef after slot compaction. Failure paths log + leave xaudio.worker_refs[i] = None, in which case the ticker drops fires (no random-victim fallback). - xenia-app/src/main.rs: try_inject_audio_callback resolves the worker via worker_handles[index] instead of scanning runqueues for a Ready or Blocked victim. The PC+r3 injection and SavedCallbackCtx capture are unchanged; the existing LR_HALT restore path re-blocks the worker on its synthetic handle for the next tick. Flag handling reworked: --xaudio-tick / XENIA_XAUDIO_TICK now act as explicit override (truthy = force on, falsey = force off, absent = use the KernelState default). - xenia-kernel/src/state.rs: xaudio_tick_enabled default flipped from false to true. Pre-fix it was off because the random-victim hijack regressed swaps=2->1; with the dedicated worker that whole class of regression is gone. ## Cascade verification at -n 500M (audit-runs/audit-048-audio-host-pump/) Pre-fix baseline: audit-runs/audit-047-gamma-wedges/ours-end-state.log. | Dim | Predicted (AUDIT-032) | Observed | |-----|-------------------------------------|---------------------------------| | A | tid=9 leaves Blocked[0x828A3254] | Ready @ pc=0x824d1404 | | B | tid=10 leaves Blocked[0x828A3230] | Ready @ same pc/lr | | C | XAudioSubmitRenderDriverFrame > 0 | Mixer setup path executed | | D | KeReleaseSemaphore 0 -> non-zero | 0 -> 1; xaudio.callback.delivered=1 | Bonus: audit-042's tid=6 worker pair on 0x10A0+0x10A4 also went Blocked->Ready as a downstream effect. Boot trajectory shifted significantly: NtWaitForSingleObjectEx 1,489,791 -> 30; NtSetEvent 3,334 -> 68; new exports firing (StfsCreateDevice, ObCreateSymbolicLink, XamContentCreateEnumerator, XamEnumerate, XamTaskSchedule, ExCreateThread x10, KeSetAffinityThread x7, NtCreateSemaphore x4, NtWaitForMultipleObjectsEx x94, NtDuplicateObject x14, XeCryptSha, XeKeysConsolePrivateKeySign). The system left the audio-wait busy loop and entered the savegame/content/crypto init phase. swaps regressed 2 -> 1 (degenerate splash repeat lost; main thread now advances past splash entirely, blocked on a different handle). draws unchanged at 0 — expected per AUDIT-032 (audio gate != renderer gate). ## Tests + scope - cargo build --release succeeds, no new warnings. - cargo test -p xenia-kernel --lib: 127/127 pass (incl. xaudio). - cargo test -p xenia-app --lib: 5/5 non-ignored pass. - Lockstep goldens (sylpheed_n2m / sylpheed_n50m) WILL drift on this fix and need re-baselining as a follow-up commit. 75 net non-comment LOC across 4 files, well under AUDIT-032's 60-120 LOC budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
77034b6cbf |
audit-038: persistent cache:/* VFS via host-FS backing
Replaces the "Synthesized empty file" stub for cache:/* paths with a
real host-FS HostPathDevice-style mount. Each KernelState gets a fresh
per-process tmpdir under /tmp/xenia-rs-cache-<pid>-<id>/ which is
cleared on init for lockstep determinism (mirrors canary's
xenia_main.cc:649 RegisterSymbolicLink("cache:", "\\CACHE") +
HostPathDevice in xenia-canary/src/xenia/vfs/devices/host_path_device.cc).
NtCreateFile now honours create_disposition for cache: paths:
FILE_OPEN -> NOT_FOUND if missing
FILE_CREATE -> NAME_COLLISION if present
FILE_OPEN_IF -> open or create
FILE_OVERWRITE_IF -> create or truncate
FILE_OVERWRITE -> NOT_FOUND if missing, else truncate
FILE_SUPERSEDE -> create or truncate
NtReadFile / NtWriteFile / NtSetInformationFile (XFileEndOfFileInformation)
/ NtQueryInformationFile / NtQueryFullAttributesFile route through
std::fs against the per-handle host_path; non-cache paths keep their
legacy semantics (read-only disc image, synth-empty stubs).
Verified by audit-037 cascade:
- sub_82459D18 (cache-miss restore): 0 fires (was firing constantly)
- sub_8245D230 (resize/zero-fill): 0 fires (was firing constantly)
- 105+ real cache-file writes per 500M run; 4+ MB of game data persisting
to disk per boot; cache:/recent, cache:/access, cache:/d4ea*.tmp, etc.
- Lockstep deterministic at instructions=100000004 / imports=987485
across 3+ reruns (digest shifted as expected; goldens re-baselined).
- swaps=2 plateau still in place; cluster L1 unactivated. Cascade
dimension D (cluster activation) — UNKNOWN, no L1 fires.
Tests 640 -> 645 (+5 cache-specific unit tests; full workspace green).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c03f2bc9e2 |
fix(kernel): ensure_dispatcher_object writes XObj signature + handle (canary mirror)
Mirrors canary's `XObject::StashHandle` (xobject.h:253-256): on first adoption of a guest dispatcher header, stamp +0x08 with the kXObjSignature fourcc 'X','E','N','\0' and +0x0C with the stash handle (here the guest pointer itself, since our shadow table is keyed by ptr). Audit-023/024A documented divergence at addresses such as 0x828F4838 where canary stores "XEN\0" + handle but we left zeros. Lands as canary-correctness restoration; cascade impact at -n 500M is nil per the discipline gate (no sharp prediction tied to the writeback). Lockstep determinism preserved: instructions=100000003, imports=987516, swaps=2, draws=0 across 2 reruns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
76dfe7fd7a |
fix(kernel): KRNBUG-KE-001 — real KeResumeThread per canary mirror
Replace the no-op cookie-returner with a real impl per canary xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc:216-227 (XObject::GetNativeObject<XThread>()->Resume()). Mirrors nt_resume_thread plumbing two functions below: resolve_pseudo_handle -> scheduler.find_by_handle -> resume_ref. Returns STATUS_SUCCESS if the KTHREAD-pointer-as-handle resolves, STATUS_INVALID_HANDLE otherwise — matches canary's Resume()/!thread return semantics. Cascade-prediction scorecard (audit-018 -> post-fix): - A PASS: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940) leave Suspended -> run prologue -> park on audio buffer-completion semaphores 0x828A3254 / 0x828A3230. - B PARTIAL FAIL: NtSetEvent 667->3334; KeReleaseSemaphore=0; XAudioSubmitRenderDriverFrame=0. - C FAIL (predicted 2->1, actual 2->2): both ExTerminateThread + KeReleaseSemaphore still canary-only. - D FAIL: gamma-cluster blocker unchanged — pc-probe at 0x82184318/0x82184374 no fires; dump-addr 0x828F4070 no DUMP; signal_attempts on 0x1004/0x100c/0x1020/0x15e4 still 0. Necessary-but-not-sufficient: workers unsuspend but park on a downstream gate that's part of the audit-009/-016/-017 gamma cluster. Tests 600 -> 601 (+ke_resume_thread_unblocks_suspended_worker). Lockstep instructions=100000003 imports=987516 deterministic x2. Goldens re-baselined: sylpheed_n50m.json instructions 50000003->50000011, imports 407255->407247. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
a1a7265f29 |
fix(kernel): KRNBUG-IO-003 — NtDeviceIoControlFile real impl mirroring NullDevice::IoControl
Replace the stub_success registration of NtDeviceIoControlFile at
exports.rs:90 with a real handler for FsCtlCodes 0x70000 (drive
geometry) and 0x74004 (partition info), mirroring xenia-canary
xboxkrnl_io.cc:645-678 + null_device.{h,cc}. The 16-byte 0x74004
response with cache_size=0xFF000 at OUT+8 is the gate that lets
sub_824ABD88 return SUCCESS and sub_824A9710 reach the priv-11
XexCheckExecutablePrivilege site identified by KRNBUG-AUDIT-007.
Stack args 9-10 (OutputBuffer, OutputBufferLength) read from the
caller's parameter save area at [sp+0x54] / [sp+0x5C] per the Xbox
360 PowerPC EABI (linkage area sp+0..sp+8, 8-quadword spill area
sp+0x14..sp+0x54, then stack args every 8 bytes). First HLE export
in the codebase to need 9+ args.
Cascade vs. KRNBUG-AUDIT-007 prediction (5/8 held):
- XexCheckExecutablePrivilege count 1 → 2 (priv=0xA + priv=0xB) ✓
- XamTaskSchedule count 0 → 1 ✓
- canary-only exports 7 → 3 (audit predicted ≤3) ✓
- 0x15e0 semaphore signal_attempts 0 → 1 (bonus)
- 0x100c worker spawn DID NOT fire (still UNCREATED) ✗
- 0x1004 signal_attempts unchanged ✗
- Worker spawn count unchanged at 19 ✗
Tests: 592 → 594. Lockstep deterministic at -n 100M (run1 ≡ run2 ≡
run3, byte-identical). instructions=100000010 → 100000019, imports
407417 → 987524 (+2.4×). swaps=2 draws=0 plateau persists.
sylpheed_n50m golden re-baselined instructions=50000004→50000003,
imports=407362→407255. sylpheed_n2m unchanged.
Still canary-only after this fix: ExTerminateThread,
KeReleaseSemaphore, XamUserReadProfileSettings. The next downstream
gate is somewhere past XamTaskSchedule's completion path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7675035082 |
fix(kernel): KRNBUG-IO-002 — vol-info class-3 returns 0x10000 alloc unit (canary NullDevice)
`nt_query_volume_information_file` class-3 (`FileFsSizeInformation`) was returning sectors_per_unit=1, bytes_per_sector=2048 (alloc unit 2048). Replaced with canary's NullDevice byte-identical values sectors=0x80, bps=0x200 (alloc unit 0x10000), with total / available allocation units lowered to 0x10 / 0x10 to match. Reference: xenia-canary/src/xenia/vfs/devices/null_device.h:38-46 (`NullDevice::sectors_per_allocation_unit()` and `bytes_per_sector()`); consumed by canary's `NtQueryVolumeInformationFile_entry` at xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io_info.cc:355-365. Tests 591 → 592 (added `nt_query_volume_information_file_class3_returns_64k_alloc_unit`). Lockstep `instructions=100000010, swaps=2, draws=0` deterministic across two `--stable-digest -n 100M` reruns. sylpheed_n50m oracle still matches its existing golden — observably a no-op at -n 50M. The audit-006-predicted 7→0 cascade did NOT fire (canary-only exports still 7, identical set; XexCheckExecutablePrivilege still priv=0xA only; XamTaskSchedule still 0). All 16 NtQueryVolumeInformationFile calls in our 500M trace originate from a single LR 0x82611f38 and complete successfully — vol-info is therefore not the priv-11 gate. The fix value is correct (canary-byte-identical) but is not load-bearing for the gate; landing it anyway because it's the right value and unblocks no regression. Stop condition triggered per the IO-002 task brief — no second fix this session. Next-session: --pc-probe on sub_824A9710 entry to find the actual upstream gate. See `audit-findings.md` (KRNBUG-IO-002 entry) and `audit-runs/post-IO-002/` for the full diagnostic trail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
bef9793aec |
feat(kernel): KRNBUG-IO-001 — NtReadFile on synth-empty file returns SUCCESS+0, not EOF
AUDIT-005's static attribution to sub_824ABA98 was wrong. The 0xC0000011
(STATUS_END_OF_FILE) at lr=0x824a97e4 traces to the NtReadFile call at
0x824a9810 inside sub_824A9710 — the cache-loader reads 1024 B from
offset 2048 of `\Device\Harddisk0\partition0`. Our synth-empty fallback
returned EOF (start_pos 2048 > size 0), so the function bailed via
RtlNtStatusToDosError before sub_824ABA98 was ever called.
Canary mounts partition0 to a NullDevice; `NullFile::ReadSync`
([null_file.cc:24-31](xenia-canary/src/xenia/vfs/devices/null_file.cc))
returns X_STATUS_SUCCESS with bytes_read=0 and never touches the
buffer. Sylpheed's caller pre-zeroes the 1024-byte stack buffer
(`memset(sp+208, 0, 1024)` at sub_824A9710 prologue), validates a
"Josh" magic on the first read, and falls back to the cache-recreate
path when the magic doesn't match.
The fix mirrors NullFile semantics: when the open synthesized a
zero-length file (`data.is_empty() && size == 0`), NtReadFile returns
SUCCESS with information=0 and the buffer untouched.
Effects (chain-of-effects verification at -n 500M):
- tests: 590 → 591 (added regression covering NullDevice semantics)
- lockstep: deterministic across 3 reruns (same instructions=100000010,
swaps=2)
- sylpheed_n50m golden re-baselined: instructions 50000004→50000000,
imports 407416→407362
- canary kernel-call diff: 10 → 7 missing exports
(XeCryptSha + XeKeysConsolePrivateKeySign + NtDeviceIoControlFile
now run; the cache-recreate path executes through to NtWriteFile)
- boot reaches silph::Silph::Impl::OnInit: 19 worker threads spawn
(was 6 before the fix)
- parked-handle 0x1004 still signal_attempts=0; the original 0x100c
and 0x15e0 are now <UNCREATED> because cascade walked past them and
the handle assignments shifted; new parked sites: 0x12fc/0x1600/
0x1040/0x10b8/0x15e8/0x1014/0x101c/0x10bc/0x1044
- draws=0 plateau persists; renderer is multi-causal blocked
Next blocker: per the canary-only diff, XamTaskSchedule + the cluster
of XAM exports (XamTaskCloseHandle, XamUserReadProfileSettings,
ObCreateSymbolicLink) and the post-thread-exit chain (ExTerminateThread,
KeReleaseSemaphore, KeResetEvent) are the next-up frontier.
|
||
|
|
1a892d4641 |
feat(kernel): KRNBUG-XEX-001 — real XexCheckExecutablePrivilege from XEX header bitmap
Replace stub_return_zero with a canary-faithful implementation that returns bit `priv` of the loaded XEX's XEX_HEADER_SYSTEM_FLAGS (key 0x00030000) bitmap. Mirrors xenia-canary xboxkrnl_modules.cc:22-39: `(flags >> priv) & 1` for priv < 32, else 0. Plumbing: - xenia-xex: header_keys::SYSTEM_FLAGS const + get_system_flags() accessor. - xenia-kernel/state.rs: pub xex_system_flags: u32 + xex_priv_logged HashSet for one-shot per-priv tracing. - xenia-app: kernel.xex_system_flags wired in cmd_exec_inner. - xenia-kernel/exports.rs: real export body + unit test covering bits 10/11/0/64 + zero-flags case. Sylpheed's bitmap is 0x00000400 (only XEX_SYSTEM_PAL50_INCOMPATIBLE, bit 10). At -n 500M with the fix: - XGetAVPack: 0 -> 1 (priv-10 gate at lr=0x824ab598 flipped). - 10 other canary-only exports + 9 producer PCs + 3 parked handles unchanged. Priv-11 site at sub_824A9710 is downstream and still not reached — AV/crypto block aborts after XGetAVPack returns our placeholder 0x16 (canary returns 8/HDMI; Sylpheed accepts only 3/4/6/8 per xenia-canary xam_info.cc:250-251). Tests 588 -> 589. Lockstep deterministic (3 reruns identical): n50m goes 50000008 -> 50000005 instr / 407415 -> 407417 imp / swaps=2 / draws=0. Goldens re-baselined (sylpheed_n50m, sylpheed_n2m); oracle test green. Full chain-of-effects + next-frontier hand-off in audit-findings.md under KRNBUG-XEX-001. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
2a9fd1fc86 |
feat(kernel): KRNBUG-AUDIT-002 — multi-frame guest stack capture at handle creation
Adds `walk_guest_back_chain` (PPC EABI back-chain walker) and a
`record_create_with_stack` audit hook gated on `--trace-handles-focus`.
NtCreateEvent / NtCreateSemaphore / NtCreateTimer / XamTaskSchedule now
route through the new helper so focused handles capture up to 6 stack
frames at allocation time. Diagnostic-only, read-only memory access:
unfocused handles pay one HashSet lookup, focused ones pay six
back-chain dereferences. Lockstep determinism preserved.
End-to-end finding: handles 0x1004 (8-instance pool via static ctor at
0x8280F810), 0x100c (singleton built inside main()), 0x15e0 (singleton
in distinct cluster) are silph-framework dispatcher objects whose
producer code is unreached at -n 500M. The producer hunt now has class
ownership; vtable/RTTI readout is the next step.
Tests: 576 → 581 green. `--stable-digest -n 100M` instructions=100000002
unchanged. Master HEAD prior:
|
||
|
|
07068e7616 |
feat(audio): APUBUG-PRODUCER-001 — XAudio register driver client + opt-in callback ticker
Replace the three XAudio kernel-export stubs (Register/Unregister/SubmitFrame) with canary-faithful implementations and add a periodic buffer-complete callback ticker reusing the existing SavedCallbackCtx injection machinery. Canary parity: - xboxkrnl_audio.cc:56-93 — read callback_ptr[0..1], wrap callback_arg in a 4-byte big-endian guest heap buffer (`wrapped_callback_arg`), write `0x4155_xxxx` to *driver_ptr. - audio_system.cc:139-141 — guest callback receives r3 = wrapped pointer, not raw callback_arg. - audio_driver.h:21-24 — frame rate 256 samples / 48 kHz ≈ 5.33 ms. Implementation: - New `crates/xenia-kernel/src/xaudio.rs` — `XAudioClient`, `XAudioState` (8-slot table, pending FIFO, dual-mode ticker), `XAUDIO_INSTR_PERIOD = 48_000` (lockstep) and `XAUDIO_PERIOD = 5.333 ms` (--parallel), same pattern as KRNBUG-D08 v-sync. - `try_inject_audio_callback` in xenia-app mirrors `try_inject_graphics_interrupt`, shares `interrupts.saved` slot for mutex with graphics callbacks. Gating: ticker + injector run only when `--xaudio-tick` / `XENIA_XAUDIO_TICK=1`. Default off because Sylpheed's audio callback enters an infinite `KeWaitForSingleObject` loop on first invocation (canary's host worker thread provides the buffer-completion fence we don't model), which hijacks a guest HW thread and regresses `swaps=2 → 1`. Default-off preserves the lockstep `sylpheed_n*m.json` goldens exactly. Producer hunt outcome (FALSIFIED for parked handles 0x1004/0x100c/0x15e4): at `-n 500M --xaudio-tick` all 3 handles still show `signal_attempts=0 (primary=0, ghost=0)`. Audio callback is not the missing producer. Next candidate per audit-findings.md is Timer DPC delivery (KeSetTimer / KeInsertQueueDpc). Tests: 562 → 576 green (10 in `xaudio.rs`, 4 in `exports.rs`). Lockstep `--stable-digest -n 100M` default-off: instructions=100000002, swaps=2 (matches pre-change baseline byte-for-byte). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
7a1b6b3306 |
fix(gpu): GPUBUG-DRAIN-001 — silence VdSwap PM4 fallback under --parallel
The Phase-C VdSwap PM4 ring path (commit |
||
|
|
e7d0fcf2c9 |
fix(kernel): KRNBUG-017 — real Kf*SpinLock + KeReleaseSpinLockFromRaisedIrql
The Kf-family spinlock exports were registered as stubs: KfAcquireSpinLock → stub_return_zero (didn't write lock) KfReleaseSpinLock → stub_success (didn't clear lock) KeReleaseSpinLockFromRaisedIrql → stub_success (same) KeTryToAcquireSpinLockAtRaisedIrql → returned 1 but didn't set lock value Guest code that read the lock value back (e.g. nested acquire/release sanity checks, debug assertions) saw 0 even after "acquiring", and could enter critical regions without contention serialization. Under `--parallel` the coarse Arc<Mutex<KernelState>> already serializes us, so the audit's P0-under-parallel ranking is about correctness of the lock value visible to guest code, not mutual-exclusion (which is provided by the host mutex). Implementation mirrors canary's `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc`: - KfAcquireSpinLock: write 1 to *SpinLock, return 0 (old IRQL) - KfReleaseSpinLock: write 0 to *SpinLock - KeReleaseSpinLockFromRaisedIrql: write 0 to *SpinLock - KeTryToAcquireSpinLockAtRaisedIrql: write 1 to *SpinLock, return 1 Single-threaded HLE: contention can never be observed (we never run two guest threads simultaneously without holding the kernel mutex), so the spin-loop can degenerate to an unconditional acquire. Verification at -n 100M lockstep: swaps: 2 → 2 (unchanged) draws: 0 → 0 (gated by F2/F3/G) packets: ~59M (within noise) Tests: 76 kernel pass (no count change; existing harness covers the new write semantics implicitly via guest-memory smoke tests). Closes KRNBUG-017 (P0 under --parallel). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
82f3d611e2 |
fix(gpu,kernel): KRNBUG-Vd-04 / GPUBUG-001 / XMODBUG-013 — VdSwap PM4 ring path
The pre-fix VdSwap zero-filled the guest's reserved buffer with NOPs and
called `state.gpu.notify_xe_swap` directly — bypassing the ring, leaving
the PM4_XE_SWAP handler at gpu_system.rs:1232 dead code, and skipping
the PM4_TYPE0(SHADER_CONSTANT_FETCH_00_0, 6) patch. Sylpheed's bloom/
blur "sample frame N for frame N+1" path samples fetch-constant slot 0
expecting the frontbuffer descriptor; without the patch, slot 0 stayed
stale and any shader sampling it read garbage.
This commit writes the canary VdSwap PM4 sequence directly into the
primary ring at the current write pointer (read via the shared MMIO
atomic), then advances WPTR over the injection. The natural CP drain
consumes PM4_XE_SWAP — bumping `swaps_seen` and patching fetch-constant
slot 0 — without going through any direct kernel→GPU bypass.
Sequence per xenia-canary VdSwap_entry (xboxkrnl_video.cc:438-521):
1) PM4_TYPE0(0x4800, count=6) + 6 fetch-header dwords (with
base_address re-patched from virtual to physical >> 12).
2) PM4_TYPE3(PM4_XE_SWAP, count=4) + signature + frontbuffer_phys
+ width + height.
Mechanism notes:
- buffer_ptr in xenia-rs is in the system command buffer, NOT the
primary ring (verified empirically: buffer_ptr=0x4acd4df8 vs
ring_base=0x0accb000, size 4 KB). Canary's VdSwap writes to
buffer_ptr because its ring layout maps the reserved slot inside
the ring; xenia-rs's doesn't, so we have to write at the actual
ring WPTR address (cached on KernelState.ring_base from
VdInitializeRingBuffer).
- The original "buffer_ptr zero-fill + bump WPTR by 64" path is
preserved before the injection — it exposes any game-batched PM4
packets and keeps the buffer_ptr region skippable per existing
game compat behavior.
- A safety-net fallback at the end calls `notify_xe_swap` directly if
swaps_seen didn't advance during the drain (e.g. a ring-arithmetic
edge case). Idempotent — only fires when the PM4 path didn't.
- KRNBUG-Mm-04 deferred: virt→phys uses the masked stub
`virt & 0x1FFF_FFFF`, sufficient for the standard heap.
Mechanical changes:
- crates/xenia-gpu/src/pm4.rs: add make_packet_type0 / type2 / type3
helpers + round-trip unit test (mirrors canary xenos.h:1682-1709).
- crates/xenia-gpu/src/handle.rs: add mmio_cp_rb_wptr_load accessor
(Acquire-load) so the kernel can compute ring offsets.
- crates/xenia-kernel/src/state.rs: cache ring_base / ring_size_dwords
on KernelState (set by VdInitializeRingBuffer).
- crates/xenia-kernel/src/exports.rs: rewrite the vd_swap PM4-emit
block; patch fetch_dwords[1] base_address virt→phys before injection.
Verification at -n 100M lockstep:
swaps: 2 → 2 (game fires VdSwap exactly twice)
draws: 0 → 0 (gated by Phases D+E)
fallback warning: 0 occurrences (PM4 path consumed both swaps)
instructions: ~100M
Tests: 552 passing (553 with new pm4 round-trip test). Lockstep
stable-fields determinism: byte-identical across two 100M runs.
The "swaps > 2" prediction in the audit's plan assumed the game would
fire VdSwap more often once the path worked; empirically Sylpheed only
calls VdSwap twice within 100M instructions (this is the renderer
plateau the audit identified). The success criterion for Phase C is
that the PM4 path is now operational, which Phases D+E require for
visible draws.
Closes KRNBUG-Vd-04, GPUBUG-001, XMODBUG-013.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
5f0d6487ea |
xenia-kernel: HLE expansion, scheduler integration, audit + UI bridge
Major HLE buildout in exports.rs: KeInitializeSemaphore now seeds
count/limit, XexGet{Module,Procedure}Address use distinct
HMODULE_XBOXKRNL/HMODULE_XAM pseudo-handles with a reverse
(ModuleId,ordinal)→thunk_addr map, plus sweeping additions across
sync primitives, file I/O, semaphores, events, threads, and
allocator paths needed to advance Sylpheed past VdSwap=2.
New modules:
- thread.rs — ThreadRef + per-thread suspension/wake plumbing
- interrupts.rs — IRQ delivery, pending-IRQ slots, IPI helpers
- path.rs — guest path normalization (D:\\, game:\\, etc.)
- audit.rs — --trace-handles harness backing the handle audit
- ui_bridge.rs — kernel-side endpoint of the xenia-ui bridge
(input snapshots, framebuffer publish handles)
state.rs grows to own the HW-slot scheduler state, the new audit /
UI bridge handles, and the per-handle reverse maps. xam.rs and
objects.rs follow suit for the HLE additions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c694bb3f43 |
Initial commit: xenia-rs workspace for Xbox 360 RE
Rust reimplementation of the xenia Xbox 360 emulator targeting reverse- engineering and preservation, initially scoped to Project Sylpheed. Includes: - XEX2 loader (LZX decompression, AES decryption, PE parsing) - XISO / XGD2 disc image VFS - PPC interpreter with 200+ opcodes and VMX128 decoding - Static analyzer: functions, cross-references, labels, asm + SQLite output - HLE kernel covering the xboxkrnl/xam subset used by Sylpheed init - Debugger with in-memory and SQLite-backed execution tracing - `xenia-rs` CLI with extract/dis/exec commands that produce cumulative, superset SQLite databases and opt-in instruction/import/branch traces Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |