MechaCat02 2bdb93e51e [iterate-2K] GPU physical-mirror aliasing: ring/IB/RPtr/resolve read wrong host region
Root cause (physical-mirror aliasing gap → GPU read wrong region → ring never
truly drained → render worker ring-space wait → no frame → no draw):

The Xbox 360 maps its 512 MB of physical DRAM into several virtual mirror
windows differing only in cache policy — bare physical (0x0xxxxxxx),
write-combine (0x4xxxxxxx), and cached 0xA/0xC/0xExxxxxxx — all aliasing
addr & 0x1FFF_FFFF. Our emulator has one flat membase, and `heap_alloc`
(MmAllocatePhysicalMemoryEx) commits physical backing in the 0x4xxxxxxx
window. The guest masks its CP-ring allocation base to bare physical
(0x4adcc000 & 0x1FFFFFFF = 0x0adcc000) before handing it to
VdInitializeRingBuffer, and PM4 INDIRECT_BUFFER / writeback / resolve
pointers are likewise bare-physical. Our code stored those verbatim and read
`membase + 0x0adcc000`, a never-committed zero-filled page — so the GPU
drained ~718k zero PM4 headers, never executed the real Type3/DRAW stream,
and the RPtr writeback landed on a zero page the render worker (tid=8) polls,
freezing it forever.
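
The aliasing relation described above can be sketched in a few lines. This is an illustration only: `physical_offset` and `same_physical_byte` are hypothetical helper names, not functions from the codebase; only the mask `0x1FFF_FFFF` and the example addresses come from the text.

```rust
/// Mask that reduces a mirror-window address to its offset in the
/// 512 MB of physical DRAM (from the text: `addr & 0x1FFF_FFFF`).
const PHYS_MASK: u32 = 0x1FFF_FFFF;

/// Hypothetical helper: physical DRAM offset behind a mirror address.
fn physical_offset(addr: u32) -> u32 {
    addr & PHYS_MASK
}

/// Hypothetical helper: true when two guest addresses name the same
/// DRAM byte. Only meaningful for addresses that actually lie in one
/// of the physical mirror windows.
fn same_physical_byte(a: u32, b: u32) -> bool {
    physical_offset(a) == physical_offset(b)
}
```

With these definitions the write-combine ring base 0x4adcc000 and the bare-physical value 0x0adcc000 that the guest hands to VdInitializeRingBuffer compare as the same byte, even though a flat `membase + addr` lookup treats them as two unrelated host locations.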

Fix (GPU/Vd-boundary translation, not memory-layer): add
`physical_to_backing(addr)` deriving the committed backing exactly from
`heap_alloc`'s placement (0x4000_0000 | (addr & 0x1FFF_FFFF), idempotent for
the WC window, flat for non-physical code/stack). Apply it at every point the
GPU/kernel consumes a guest physical address: ring base
(initialize_ring_buffer), RPtr writeback (enable_rptr_writeback), PM4
INDIRECT_BUFFER pointer, WAIT_REG_MEM / COND_WRITE memory poll+write,
REG_TO_MEM / MEM_WRITE / EVENT_WRITE* / LOAD_ALU_CONSTANT / IM_LOAD addresses,
the resolve dest write, and the vd_swap frontbuffer present read. This was
chosen over memory-layer aliasing because the latter re-projects every CPU
load/store and corrupts the guest's flat 0xA/0xC/0xE accesses (it caused an
early PC=0xfffffffc fault).
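
A minimal sketch of the translation, assuming the window test is done on the top address nibble. The formula `0x4000_0000 | (addr & 0x1FFF_FFFF)` is from the text; which windows count as "physical" for this function is an assumption. Here only the bare-physical and write-combine windows are remapped, and everything else (including the cached 0xA/0xC/0xE mirrors, whose handling the text does not spell out) is returned unchanged.

```rust
const PHYS_MASK: u32 = 0x1FFF_FFFF;
const WC_BASE: u32 = 0x4000_0000;

/// Sketch of `physical_to_backing`: map a guest physical address to the
/// committed backing address in the write-combine window.
/// ASSUMPTION: the match arms below are illustrative, not the real
/// window classification used by the project.
fn physical_to_backing(addr: u32) -> u32 {
    match addr >> 28 {
        // Bare physical (0x0/0x1) is lifted into the WC window;
        // WC addresses (0x4/0x5) map to themselves (idempotent).
        0x0 | 0x1 | 0x4 | 0x5 => WC_BASE | (addr & PHYS_MASK),
        // Code, stack, and other non-physical ranges stay flat.
        _ => addr,
    }
}
```

Applied to the ring base from the root-cause description, the bare-physical 0x0adcc000 becomes 0x4adcc000, and translating 0x4adcc000 a second time leaves it unchanged.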

Two adjacent GPU-backend gates this exposed and also fixed (canary-faithful):
- WaitCmp::from_wait_info was off by one vs canary's MatchValueAndRef
  selector (it decoded wait_info&7==3 as NotEqual instead of Equal),
  inverting the standard CP coherency wait so the GPU parked forever on the
  first INDIRECT_BUFFER. Remapped to 1=Less..7=Always, 0=Never.
- Added MakeCoherent: a WAIT polling COHER_STATUS_HOST clears the status bit
  (mirrors command_processor.cc:801-838) so the coherency handshake resolves.
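
The remapped selector can be sketched as follows. The text fixes 0=Never, 1=Less, 3=Equal, and 7=Always; the placement of the remaining comparisons (2, 4, 5, 6) is an assumption made to complete the sequence, and `matches` is a hypothetical helper that expects the polled value to be masked already.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WaitCmp {
    Never,
    Less,
    LessEqual,
    Equal,
    NotEqual,
    GreaterEqual,
    Greater,
    Always,
}

impl WaitCmp {
    /// Decode the low three bits of `wait_info`.
    /// ASSUMPTION: entries 2, 4, 5 and 6 are inferred, not quoted.
    fn from_wait_info(wait_info: u32) -> Self {
        match wait_info & 7 {
            0 => WaitCmp::Never,
            1 => WaitCmp::Less,
            2 => WaitCmp::LessEqual,
            3 => WaitCmp::Equal,
            4 => WaitCmp::NotEqual,
            5 => WaitCmp::GreaterEqual,
            6 => WaitCmp::Greater,
            _ => WaitCmp::Always,
        }
    }

    /// Hypothetical helper: does the (already masked) value satisfy
    /// the wait condition against the reference?
    fn matches(self, value: u32, reference: u32) -> bool {
        match self {
            WaitCmp::Never => false,
            WaitCmp::Less => value < reference,
            WaitCmp::LessEqual => value <= reference,
            WaitCmp::Equal => value == reference,
            WaitCmp::NotEqual => value != reference,
            WaitCmp::GreaterEqual => value >= reference,
            WaitCmp::Greater => value > reference,
            WaitCmp::Always => true,
        }
    }
}
```

With the off-by-one decoding, a coherency wait encoded as 3 (Equal) against a status that already equals the reference would never be satisfied, which matches the "parked forever" symptom described above.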

Result: the GPU now decodes the real Type3 packets at 0x4adcc000 (ME_INIT,
INDIRECT_BUFFER → real Type0/WAIT_REG_MEM at 0x4adf5080) instead of
zero-headers; RPtr at 0x408619fc advances (0x13, 0x16, … written by the GPU
worker); the frame loop sub_822F1AA8 actively writes the controller at
0x40d09a40 (0x20→0x21→0x23); no fault, full 200M/1B budget runs clean.

draws_seen is still 0: the remaining gate is upstream and separate — the main
frame loop never sets controller bit-28 (frame-ready) at [0x40d09a40] (stalls
at 0x23, the known iterate-2C state-divergence gate), so the guest never
enqueues a render IB; the GPU only ever replays the init IB. This fix
correctly unblocks the GPU ring/IB/RPtr data path (gate-2 GPU backend); the
bit-28 frame-ready gate is the next target.

Stable golden (sylpheed_n50m) unchanged (draws/swaps/RTs/shaders identical at
50M); regenerated twice byte-identical. cargo test --workspace: 672 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 13:39:57 +02:00

xenia-rs

Rust reimplementation of the Xbox 360 emulator xenia, focused on reverse-engineering and preservation rather than full-speed play. The initial target is Project Sylpheed: Arc of Deception. The goal is to get the title disassembled, traced, and far enough into its init path to understand its engine.

The project cross-references xenia-canary heavily for CPU context setup, kernel export behavior, and XEX loading semantics.

Status

  • XEX loader — XEX2 header parsing, LZX decompression, AES decryption, PE section parsing.
  • VFS / XISO — XGD2 dual-layer disc images (with the 0x0FD90000 partition offset).
  • PPC interpreter — 200+ opcodes, PowerPC 32/64-bit GPR/FPR, VMX128 decoding.
  • Static analyzer — function discovery (prolog/epilog heuristics), cross-references, labels, save/restore helper detection, assembly text + SQLite database output.
  • Kernel HLE — minimal subset driving Project Sylpheed: ~170 xboxkrnl + xam exports (critical sections, events, TLS, virtual memory, Vd stubs, XAM input/user/content).
  • Debugger — in-memory step/break, SQLite execution + import-call + branch tracing.
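
As an illustration of the XGD2 partition offset mentioned in the VFS item, the sketch below converts a filesystem sector number into a byte offset in the disc image. The constant 0x0FD90000 is from the list above; the 2048-byte sector size and the function name are assumptions for the sketch, not taken from the xenia-vfs crate.

```rust
/// Partition offset for XGD2 images (from the status list).
const XGD2_PARTITION_OFFSET: u64 = 0x0FD9_0000;

/// ASSUMPTION: 2048-byte sectors, the usual optical-disc sector size.
const SECTOR_SIZE: u64 = 2048;

/// Hypothetical helper: byte offset of a filesystem sector inside an
/// XGD2 disc image.
fn xgd2_sector_to_byte_offset(sector: u64) -> u64 {
    XGD2_PARTITION_OFFSET + sector * SECTOR_SIZE
}
```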

Not yet: GPU (xenos/xe-shader), APU audio, HID, kernel scheduler, full threading, exception delivery.

Workspace

crates/
  xenia-types       # shared primitive types, bitflags
  xenia-memory      # guest memory, paged allocator, page table
  xenia-cpu         # PPC decoder, interpreter, context
  xenia-xex         # XEX2 loader, PE parser, LZX, AES
  xenia-vfs         # XISO / disc-image reader
  xenia-kernel      # HLE kernel state, exports, XAM
  xenia-gpu         # (stub) Xenos command processor
  xenia-apu         # (stub) XAudio
  xenia-hid         # (stub) XInput
  xenia-debugger    # in-memory trace, breakpoints, step modes
  xenia-analysis    # function/xref analysis, assembly formatter, SQLite DbWriter
  xenia-app         # `xenia-rs` CLI binary

CLI

Build:

cargo build --release

The binary xenia-rs accepts XEX2 files or ISO / XISO disc images as input (the loader auto-detects discs and extracts default.xex).

info / browse / disasm

Quick header / disc / first-N-instructions inspection. See --help.

extract — unpack PE + metadata

xenia-rs extract <xex-or-iso> [-o <out-dir>] [--db <sqlite-path>]

Writes <name>.pe (decompressed/decrypted PE image) and <name>.xex.json (header metadata). With --db, also emits a SQLite database containing the base tables: metadata, sections, imports.

dis — full disassembly

xenia-rs dis <xex-or-iso> [-o <asm-file>] [--db <sqlite-path>] [--quiet]

Runs function + cross-reference analysis and produces:

  • assembly text to stdout or -o <file> (unless --quiet)
  • optional SQLite DB with the base tables + disasm tables: functions, labels, instructions, xrefs

exec — interpret with tracing

xenia-rs exec <xex-or-iso> [-n <max-instrs>] [--db <sqlite-path>]
             [--trace-instructions] [--trace-imports] [--trace-branches]

Loads the title, initializes CPU state per xenia-canary, intercepts import thunks with HLE kernel calls, and interprets from the entry point. Without -n, runs until halt/fault. With --db, produces a DB that is a superset of dis --db plus opt-in trace tables:

flag                  table         rows
--trace-instructions  exec_trace    one row per interpreted instruction (PC, r3/r4, LR, SP)
--trace-imports       import_calls  one row per kernel/XAM call (module, ordinal, args)
--trace-branches      branch_trace  taken branches classified as call/return/jump/branch
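
One plausible way to derive the call/return/jump/branch classes from a raw PowerPC instruction word is sketched below. The opcode and field positions follow the PowerPC encoding (primary opcode in the top six bits, LK in the lowest bit); the mapping from instruction form to class, and the names `BranchKind` and `classify_branch`, are assumptions and may differ from what xenia-rs actually records.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BranchKind {
    Call,
    Return,
    Jump,
    Branch,
}

/// Hypothetical classifier for a 32-bit PowerPC instruction word.
/// Returns `None` for non-branch instructions.
fn classify_branch(instr: u32) -> Option<BranchKind> {
    let opcode = instr >> 26; // primary opcode
    let lk = instr & 1 != 0; // LK: link register is updated
    match opcode {
        // b / bl
        18 => Some(if lk { BranchKind::Call } else { BranchKind::Jump }),
        // bc / bcl
        16 => Some(if lk { BranchKind::Call } else { BranchKind::Branch }),
        // Extended opcode selects bclr (16) or bcctr (528).
        19 => match (instr >> 1) & 0x3FF {
            16 => Some(if lk { BranchKind::Call } else { BranchKind::Return }),
            528 => Some(if lk { BranchKind::Call } else { BranchKind::Jump }),
            _ => None,
        },
        _ => None,
    }
}
```

Under this scheme `bl` (0x48000001) is a call, `blr` (0x4E800020) a return, `bctr` (0x4E800420) a jump, and a conditional `beq` (0x41820008) a plain branch.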

Cumulative DB layering

Each command's DB is a superset of the previous. A single xenia-rs exec <iso> --db full.db --trace-instructions --trace-imports --trace-branches produces the full picture in one pass — base tables, complete static disassembly, and runtime traces correlatable by address/cycle.

Performance knobs

  • XENIA_DB_BATCH_SIZE — rows per streaming commit / trace-buffer flush (default 100_000). Lower values reduce memory use; higher values reduce fsync overhead on slow disks.

The DB writer uses journal_mode=OFF, synchronous=OFF, locking_mode=EXCLUSIVE and commits in batches; no ANALYZE is run at finalize. Indices are created after bulk insertion with progress messages.
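
The batching behavior can be pictured with the sketch below. Only the variable name XENIA_DB_BATCH_SIZE and the default of 100_000 come from the text; `parse_batch_size`, `BatchBuffer`, and the fallback to the default on invalid or zero input are assumptions, and the real writer would commit a SQLite transaction where this sketch only counts commits.

```rust
const DEFAULT_BATCH_SIZE: usize = 100_000;

/// Hypothetical parser: fall back to the default when the value is
/// missing, not a number, or zero (the fallback policy is assumed).
fn parse_batch_size(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_BATCH_SIZE)
}

/// Read the knob from the environment.
fn batch_size_from_env() -> usize {
    parse_batch_size(std::env::var("XENIA_DB_BATCH_SIZE").ok().as_deref())
}

/// Hypothetical row buffer that "commits" once `batch_size` rows
/// have accumulated.
struct BatchBuffer<T> {
    rows: Vec<T>,
    batch_size: usize,
    commits: usize,
}

impl<T> BatchBuffer<T> {
    fn new(batch_size: usize) -> Self {
        BatchBuffer { rows: Vec::new(), batch_size, commits: 0 }
    }

    /// Queue one row and flush when the batch is full.
    fn push(&mut self, row: T) {
        self.rows.push(row);
        if self.rows.len() >= self.batch_size {
            self.flush();
        }
    }

    /// Stand-in for a transaction commit: drop the buffered rows and
    /// count the commit. Does nothing when the buffer is empty.
    fn flush(&mut self) {
        if !self.rows.is_empty() {
            self.rows.clear();
            self.commits += 1;
        }
    }
}
```

A smaller batch size keeps fewer rows buffered at once at the cost of more commits, which is the trade-off the knob exposes.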

Example queries

-- Top 20 kernel functions called during early init
SELECT name, COUNT(*) FROM import_calls GROUP BY name ORDER BY 2 DESC LIMIT 20;

-- All basic-block leaders (targets of taken branches) not already labelled
SELECT DISTINCT bt.target
FROM branch_trace bt LEFT JOIN labels l ON l.address = bt.target
WHERE l.address IS NULL;

-- Correlate a traced call site with its static disassembly
SELECT et.cycle, i.disasm, i.ext_disasm
FROM exec_trace et JOIN instructions i ON i.address = et.address
WHERE et.address = 0x824AB748 ORDER BY et.cycle;

License

BSD-3-Clause, matching upstream xenia.
