xenia-rs/migration/claude-memory/project_xenia_rs_current_state.md at e6d43a23ac393004d2e5adf2f0395fd0b5e6448b


MechaCat02 e6d43a23ac chore: add migration/ bundle for cross-machine setup

Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on
another machine can be brought up to identical configuration via
migration/setup.sh:

  - claude-memory/             ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/
                               (103 files, 1.1 MB - MEMORY.md + every
                                project_xenia_rs_*.md from audits
                                addis_signext through audit-058)
  - project-root/dot-claude/   <project-root>/.claude/settings.json
                               (Stop hook + permissions)
  - project-root/ppc-manual/   <project-root>/ppc-manual/
                               (PowerPC reference docs, 397 files, 3.7 MB)
  - project-root/run-canary.sh <project-root>/run-canary.sh
  - README.md                  Human-readable setup checklist
  - setup.sh                   Idempotent installer (also reclones
                               xenia-canary at pinned HEAD 6de80dffe)
  - MANIFEST.md                Per-file mapping + per-file-not-bundled
                               restoration recipe

Excluded from bundle (not shippable via git):
  - Sylpheed ISO (7.8 GB; copyright; manual copy required)
  - sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
  - target/ build artifacts (rebuild on target)
  - audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
  - audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
  - xenia-canary checkout (setup.sh reclones from
    git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-10 21:38:38 +02:00

7.5 KiB



name	description	type	originSessionId
xenia-rs current state — boot/render progress + active blockers	Where Sylpheed boot sits now (2026-04-24 — IRQ-injection stack-pad + full-volatile register save fix lands; second VdSwap fires)	project	5465978c-b9ad-47fb-ab6d-e8e3053646af

What works end-to-end (2026-04-24)

Sylpheed now reaches its second VdSwap (first real frame). Previous sessions stopped at the splash frame (VdSwap=1) because our graphics-interrupt injection was stomping the interrupted thread's stack-saved LR — see "Root cause" below.

Observed after the fix, at 3 B instructions:

VdSwap frame=1 splash at ~18 M cycles, VdSwap frame=2 at ~28 M cycles
scheduler.deadlock_halts = 0, deadlock_recoveries = 0 (clean)
All 351 workspace tests green
tid=5 stays alive past cycle 7.5 M (was exiting there pre-fix)
RtlEnterCriticalSection / RtlLeaveCriticalSection call counts dropped ~1300× versus pre-fix (the excess calls were a symptom of the corruption, not the cause)

Root cause — IRQ injection was stomping `[r1 - 8]` on the interrupted thread's stack

The graphics-interrupt injector (try_inject_graphics_interrupt in main.rs) overwrote pc/lr/r3/r4 on whichever thread it picked, but left r1 (SP) untouched. The ISR callback's prologue immediately does mflr r12; bl __savegprlr_N where __savegprlr_N stores r12 (= LR_HALT_SENTINEL, just set by injection) at [r1 − 8]. That slot is exactly where the interrupted function's own prologue had saved its caller's return address (standard PPC savegprlr layout). When the interrupted function eventually ran its __restgprlr_N tail → bclr, it loaded SENTINEL into LR and jumped there, silently terminating the thread through the halt-sentinel path rather than the intended return.

Observed concretely: tid=5 hit LR_HALT_SENTINEL via from_pc=0x825f0ff0 (the shared restgprlr bclr) with r12=0xBCBCBCBC, i.e. the value it had just read from the stack at [r1 - 8]. Six normal vsync-ISR returns had ctr=0x821753c8 (ISR path resolved correctly); the 7th exit had ctr=0x00000000, which was sub_82458B90 returning with its stack-saved LR clobbered. This matches canary's workaround in Processor::Execute (lines 381–394), which decrements r[1] by 64 + 112 = 176 before calling the ISR callback and restores it afterwards; the comment there says "games seem to overwrite the caller by about 16 to 32b," so the pad is sized generously.
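The stomp can be reproduced in a toy model. The sketch below is illustrative only: ToyStack, the r1 value, and the caller return address are hypothetical, and the sentinel value is assumed from the r12=0xBCBCBCBC observation above. It models only the single store that the ISR prologue performs at [r1 - 8].

```rust
use std::collections::HashMap;

/// Assumed sentinel value, inferred from the r12 observation in this note.
const LR_HALT_SENTINEL: u32 = 0xBCBC_BCBC;
/// 64 + 112, the pad size quoted from canary.
const CALLBACK_STACK_PAD: u32 = 176;
/// Hypothetical return address saved by the interrupted function.
const CALLER_LR: u32 = 0x8245_8C00;

/// Toy guest stack: address -> 32-bit word (hypothetical helper).
struct ToyStack {
    words: HashMap<u32, u32>,
}

impl ToyStack {
    fn store(&mut self, addr: u32, value: u32) {
        self.words.insert(addr, value);
    }
    fn load(&self, addr: u32) -> u32 {
        *self.words.get(&addr).unwrap_or(&0)
    }
}

/// Models `mflr r12; bl __savegprlr_N`: the prologue stores LR at [r1 - 8].
fn isr_prologue(stack: &mut ToyStack, r1: u32, lr: u32) {
    stack.store(r1 - 8, lr);
}

/// Returns the LR value the interrupted function reloads from [r1 - 8]
/// after an ISR was injected with the given stack pad.
fn lr_after_injection(pad: u32) -> u32 {
    let r1 = 0x7000_1000; // hypothetical SP of the interrupted thread
    let mut stack = ToyStack { words: HashMap::new() };

    // The interrupted function's own prologue saved its caller's LR here.
    stack.store(r1 - 8, CALLER_LR);

    // Injection: the ISR callback runs with LR = sentinel on SP = r1 - pad.
    isr_prologue(&mut stack, r1 - pad, LR_HALT_SENTINEL);

    // The interrupted function's epilogue later reloads LR from [r1 - 8].
    stack.load(r1 - 8)
}
```

With pad = 0 the reload yields the sentinel (the pre-fix behaviour); with pad = CALLBACK_STACK_PAD the ISR's store lands 176 bytes lower and the caller's return address survives.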

The fix (2026-04-24)

The fix has two parts, built around a new constant CALLBACK_STACK_PAD = 176:

SavedCallbackCtx now captures/restores all PPC volatile GPRs (r0, r2–r12) plus r1 (SP), pc, lr, ctr, and cr. The non-volatile set (r13–r31) is preserved by the callback's own __savegprlr_N prologue/epilogue per the PPC ELF ABI, so it doesn't need stashing.
try_inject_graphics_interrupt decrements ctx.gpr[1] by CALLBACK_STACK_PAD after SavedCallbackCtx::capture (so the saved r1 is the pre-inject value) and before setting pc = callback_pc. The callback's prologue now stores to [r1 − 176 − 8], where r1 is the interrupted thread's SP, instead of stomping [r1 − 8]. On return, SavedCallbackCtx::restore puts r1 back.
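A minimal sketch of the capture / pad / restore ordering described above. ToyCtx and inject_callback are hypothetical stand-ins for the real thread context and try_inject_graphics_interrupt; only the gprs: [u64; 13] layout, the saved pc/lr/ctr/cr fields, and the pad constant come from this note.

```rust
const CALLBACK_STACK_PAD: u64 = 176;

/// Hypothetical, simplified thread context (the real one lives in the emulator).
#[derive(Clone, Debug, PartialEq)]
struct ToyCtx {
    gpr: [u64; 32],
    pc: u64,
    lr: u64,
    ctr: u64,
    cr: u32,
}

/// Save set: r0..r12 (volatile GPRs plus r1/SP) and pc/lr/ctr/cr.
/// r13..r31 are left to the callback's own prologue/epilogue.
struct SavedCallbackCtx {
    gprs: [u64; 13],
    pc: u64,
    lr: u64,
    ctr: u64,
    cr: u32,
}

impl SavedCallbackCtx {
    fn capture(ctx: &ToyCtx) -> Self {
        let mut gprs = [0u64; 13];
        gprs.copy_from_slice(&ctx.gpr[..13]);
        Self { gprs, pc: ctx.pc, lr: ctx.lr, ctr: ctx.ctr, cr: ctx.cr }
    }

    fn restore(&self, ctx: &mut ToyCtx) {
        ctx.gpr[..13].copy_from_slice(&self.gprs);
        ctx.pc = self.pc;
        ctx.lr = self.lr;
        ctx.ctr = self.ctr;
        ctx.cr = self.cr;
    }
}

/// Hypothetical injector: capture first, then pad SP, then redirect pc/lr/r3/r4.
fn inject_callback(
    ctx: &mut ToyCtx,
    callback_pc: u64,
    sentinel_lr: u64,
    args: [u64; 2],
) -> SavedCallbackCtx {
    let saved = SavedCallbackCtx::capture(ctx); // saved r1 is the pre-inject value
    ctx.gpr[1] -= CALLBACK_STACK_PAD; // callback runs below the live frame
    ctx.gpr[3] = args[0];
    ctx.gpr[4] = args[1];
    ctx.lr = sentinel_lr;
    ctx.pc = callback_pc;
    saved
}
```

The ordering is the point: capturing after the decrement would save the padded r1 and leak 176 bytes of stack per interrupt.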

Thin unit-test coverage: the existing inject_restore_roundtrip_smoke test in interrupts.rs still passes (just a smoke test for pc/lr/r3-r4 roundtrip); extending it to cover the new SP + r0/r2/r7-r12 paths would be a cheap follow-up.

Concrete next-session blockers (post-fix)

tid=5 is now alive, progresses through multiple work items, and drives the data-stream decoders (sub_8280AD40 = inflate, sub_828085E0 = Adler-32, sub_82807AB8 = CRC-32 — all around 0x82807-0x8280C). Observed behaviour at 3 B instructions:

Sylpheed boot is CPU-bound on stream decode. At 10 MIPS interpreter throughput, each per-asset inflate + Adler/CRC pass takes several seconds of wall time. Second VdSwap fires at ~28 M cycles (~3 s wall). For first pixels to be visually obvious (dozens of frames), we likely need the Tier-4 JIT or at least threaded-code dispatch. Order of magnitude: real HW boots Sylpheed to menu in ~2–3 s at ~200 MIPS; we're ~20× slower.
wgpu→ShadowEdram RT readback (P1 from prior memory) — frame-2+ blocker once draws fire. See edram-resolve-gap memory.
Keep verifying with exec --halt-on-deadlock -n 500_000_000 — still clean post-fix. Any regression here means a new sync bug.
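The throughput figures above can be re-derived with a few lines of arithmetic. This sketch assumes one cycle per retired instruction, which the note implies but does not state; the function names are hypothetical.

```rust
/// Wall-clock seconds to retire `instructions` at `mips` million instructions per second.
fn wall_seconds(instructions: u64, mips: f64) -> f64 {
    instructions as f64 / (mips * 1.0e6)
}

/// How many times slower we are than a reference throughput.
fn slowdown(reference_mips: f64, our_mips: f64) -> f64 {
    reference_mips / our_mips
}
```

At 10 MIPS, 28 M cycles come out at 2.8 s (the "~3 s wall" above), a 3 B instruction run takes 300 s, and 200 MIPS versus 10 MIPS gives the quoted ~20× gap.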

Investigation tools available

dump_thread_diagnostic (from 2026-04-23b) — prints per-thread state + handle/CS waiter maps at normal -n N exit. Now also dumps r0–r13 for every thread (expanded 2026-04-24).
disasm --at <addr> -n N — unchanged.
DuckDB xrefs — see project_xenia_rs_duckdb.md.
PC → LR_HALT_SENTINEL tracer pattern — reference impl in 2026-04-24 diff on main.rs; was instrumental for this fix. Reverted after use.
Adler/CRC entry probes — one-shot tracing::warn!(target: "adler_probe", ...) at the pc == 0x828085E0 && tid == 5 site. Logs lr/r3/r4/r5 at entry. Reverted after use.
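Since the probes were reverted, here is a std-only sketch of the one-shot pattern for re-creating them. The original used tracing::warn!(target: "adler_probe", ...); this version returns the message as a String instead so it has no external dependencies. The function name and message format are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Entry of the Adler-32 routine (sub_828085E0) and the thread of interest.
const ADLER_ENTRY_PC: u32 = 0x8280_85E0;
const PROBE_TID: u32 = 5;

/// Flips to true the first time the probe fires.
static ADLER_PROBE_FIRED: AtomicBool = AtomicBool::new(false);

/// Returns Some(message) the first time (pc, tid) matches the probe site,
/// and None on every other call.
fn adler_probe(pc: u32, tid: u32, lr: u64, r3: u64, r4: u64, r5: u64) -> Option<String> {
    if pc != ADLER_ENTRY_PC || tid != PROBE_TID {
        return None;
    }
    // swap returns the previous value, so only the first match sees `false`.
    if ADLER_PROBE_FIRED.swap(true, Ordering::Relaxed) {
        return None;
    }
    Some(format!(
        "adler_probe: lr={lr:#010x} r3={r3:#x} r4={r4:#x} r5={r5:#x}"
    ))
}
```

In the interpreter loop the Some branch would feed the logger; the atomic swap keeps the probe at one line of output even if the site is hit millions of times.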

Confirmed NOT the issue (verified this session)

VdCallGraphicsNotificationRoutines stub — canary matches, Sylpheed doesn't register notifications.
NtSetEvent / KeSetEvent return-value semantics — match canary.
Graphics-interrupt injection per-vsync — fires correctly, delivered counter scales with VSYNC_INSTR_PERIOD = 150_000.
Ring-buffer write-back — correct.
PKEVENT shadow refresh — correct.
Event/semaphore handle table — correct; the pre-fix "main stuck on 0x10fc" was a symptom of tid=5 dying before producing the signal, not a handle-table bug.
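For the per-vsync injection check, a rough sanity bound on the delivered counter can be computed from the instruction budget. This is a sketch under the assumption that one interrupt is attempted per VSYNC_INSTR_PERIOD instructions; the helper names and the tolerance parameter are hypothetical.

```rust
const VSYNC_INSTR_PERIOD: u64 = 150_000;

/// Number of vsync periods that fit in an instruction budget.
fn expected_vsync_interrupts(instructions: u64) -> u64 {
    instructions / VSYNC_INSTR_PERIOD
}

/// True if the delivered counter is within `tolerance` of the expected count.
fn delivered_looks_sane(delivered: u64, instructions: u64, tolerance: u64) -> bool {
    expected_vsync_interrupts(instructions).abs_diff(delivered) <= tolerance
}
```

Under that assumption a 3 B instruction run should deliver about 20,000 interrupts, and a -n 500_000_000 run about 3,333.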

Architectural patterns (stable, don't re-derive)

Scheduler + HW slots + ThreadRef — see project_xenia_rs_scheduler.md.
UI bridge + GPU pipeline + MMIO + HUD — see project_xenia_rs_ui.md.
PKEVENT shim — ensure_dispatcher_object reads DISPATCHER_HEADER type on first touch.
IRQ injection stack discipline (new 2026-04-24): the injected callback runs on a 176-byte-padded extension of the interrupted thread's stack. SavedCallbackCtx captures/restores r0, r1, r2–r12 + pc/lr/ctr/cr. Non-volatile regs (r13–r31) are not in the save set because the callback prologue handles them. Canary's Processor::Execute uses the same 64+112 pad.
Main thread return ≠ emulator halt — unchanged.

Memory-model caveats

pending_timer_fires is keyed by handle (u32). NtClose / NtCancelTimer / NtSetTimerEx manage lifecycle. (Sylpheed doesn't use timers on the boot path.)
waiters_mut() on KernelObject returns None for File and Some for the 5 sync variants.
Handle allocator starts at 0x1000, bumps by 4.
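The handle numbering can be sketched as a bump allocator. This is a minimal model assuming handles are never reused; the type and helper names are hypothetical, and only the 0x1000 base and step of 4 come from this note.

```rust
/// Bump allocator: first handle is 0x1000, each later one is 4 higher.
struct HandleAllocator {
    next: u32,
}

impl HandleAllocator {
    const FIRST: u32 = 0x1000;
    const STEP: u32 = 4;

    fn new() -> Self {
        Self { next: Self::FIRST }
    }

    fn alloc(&mut self) -> u32 {
        let handle = self.next;
        self.next += Self::STEP;
        handle
    }
}

/// Zero-based allocation index of a handle, or None if it is not a value
/// this allocator could have produced.
fn handle_ordinal(handle: u32) -> Option<u32> {
    let offset = handle.checked_sub(HandleAllocator::FIRST)?;
    if offset % HandleAllocator::STEP == 0 {
        Some(offset / HandleAllocator::STEP)
    } else {
        None
    }
}
```

Under this model the 0x10fc handle from the pre-fix hang has ordinal 63, i.e. it was the 64th handle allocated, which helps when matching handles against allocation order in a trace.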

Files touched in the 2026-04-24 session

xenia-kernel/src/interrupts.rs — SavedCallbackCtx expanded to gprs: [u64; 13] (r0–r12), added CALLBACK_STACK_PAD = 176 constant with docs citing canary as ground-truth.
xenia-app/src/main.rs — try_inject_graphics_interrupt now ctx.gpr[1] -= CALLBACK_STACK_PAD after capture, before setting callback PC. dump_thread_diagnostic expanded to print r0/r3–r13.
