Re-shape MemoryAccess so write methods take &self and rely on interior
mutability (atomics in GuestMemory, Cell in test mocks). This unblocks
the &Arc<KernelState>-only execution model the CPU/HLE crates moved to.
GuestMemory grows: per-4 KiB-page write-version counter (page_version)
that the CPU's decode cache and the texture cache observe via Acquire,
fenced 32-bit/64-bit read/write helpers (Release on writer / Acquire on
reader) that PM4_EVENT_WRITE_SHD and the matching CPU consumers use to
synchronize fence publication, and broader page-table / heap accounting
needed by the new HLE allocators.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>