chore: add migration/ bundle for cross-machine setup
Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on
another machine can be brought up to identical configuration via
migration/setup.sh:
- claude-memory/ -> ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/
  (103 files, 1.1 MB: MEMORY.md + every project_xenia_rs_*.md from audits
  addis_signext through audit-058)
- project-root/dot-claude/ -> <project-root>/.claude/settings.json
  (Stop hook + permissions)
- project-root/ppc-manual/ -> <project-root>/ppc-manual/
  (PowerPC reference docs, 397 files, 3.7 MB)
- project-root/run-canary.sh -> <project-root>/run-canary.sh
- README.md: human-readable setup checklist
- setup.sh: idempotent installer (also reclones xenia-canary at pinned
  HEAD 6de80dffe)
- MANIFEST.md: per-file mapping + per-file-not-bundled restoration recipe
Excluded from bundle (not shippable via git):
- Sylpheed ISO (7.8 GB; copyright; manual copy required)
- sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
- target/ build artifacts (rebuild on target)
- audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
- audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
- xenia-canary checkout (setup.sh reclones from
git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
76
migration/claude-memory/project_xenia_rs_current_state.md
Normal file
@@ -0,0 +1,76 @@
---
name: xenia-rs current state - boot/render progress + active blockers
description: Where Sylpheed boot sits now (2026-04-24 - IRQ-injection stack-pad + full-volatile register save fix lands; second VdSwap fires)
type: project
originSessionId: 5465978c-b9ad-47fb-ab6d-e8e3053646af
---

## What works end-to-end (2026-04-24)

Sylpheed **now reaches its second `VdSwap` (first real frame)**. Previous sessions stopped at the splash frame (`VdSwap=1`) because our graphics-interrupt injection was stomping the interrupted thread's stack-saved LR (see "Root cause" below).

Observed after the fix, at 3 B instructions:

- `VdSwap frame=1` splash at ~18 M cycles, `VdSwap frame=2` at ~28 M cycles
- `scheduler.deadlock_halts = 0`, `deadlock_recoveries = 0` (clean)
- All 351 workspace tests green
- tid=5 stays alive past cycle 7.5 M (it was exiting there pre-fix)
- `RtlEnterCriticalSection` / `LeaveCriticalSection` call counts dropped ~1300× versus pre-fix (the excess was a symptom of the corruption, not the cause)

## Root cause — IRQ injection was stomping `[r1 - 8]` on the interrupted thread's stack

The graphics-interrupt injector (`try_inject_graphics_interrupt` in [main.rs](../../../RE%20Project%20Sylpheed/xenia-rs/crates/xenia-app/src/main.rs)) overwrote `pc`/`lr`/`r3`/`r4` on whichever thread it picked, but left `r1` (SP) untouched. The ISR callback's prologue immediately does `mflr r12; bl __savegprlr_N`, where `__savegprlr_N` stores `r12` (= `LR_HALT_SENTINEL`, just set by injection) at `[r1 - 8]`. That slot is **exactly where the interrupted function's own prologue had saved its caller's return address** (standard PPC savegprlr layout). When the interrupted function eventually ran its `__restgprlr_N` tail → `bclr`, it loaded `SENTINEL` into LR and jumped there, silently terminating the thread through the halt-sentinel path rather than taking the intended return.

Observed concretely: tid=5 hit `LR_HALT_SENTINEL` via `from_pc=0x825f0ff0` (the shared `restgprlr` bclr) with `r12=0xBCBCBCBC`, i.e. the value it had just read from the stack at `[r1 - 8]`. Six normal vsync-ISR returns had `ctr=0x821753c8` (ISR path resolved correctly); the 7th exit had `ctr=0x00000000`, and that one was `sub_82458B90` returning with the stack-saved LR clobbered. This matches canary's workaround at [`Processor::Execute`](../../../RE%20Project%20Sylpheed/xenia-canary/src/xenia/cpu/processor.cc#L383) (lines 381–394), which decrements `r[1]` by 64 + 112 = 176 before calling the ISR callback and restores it afterwards. The comment there says "games seem to overwrite the caller by about 16 to 32b," so the pad is sized generously.

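The clobber described above can be reproduced in a toy model. This is only a sketch under stated assumptions: the stack is a plain map, `CALLER_LR` and the stack address are made up, `LR_HALT_SENTINEL` is taken to be `0xBCBCBCBC` (inferred from the observed `r12`), and the interrupt is assumed to land while `r1` holds the interrupted function's entry SP (for example inside the shared save/restore helpers). None of these names are the real xenia-rs API.

```rust
use std::collections::HashMap;

// Hypothetical values for illustration; only the 176-byte pad and the
// 0xBCBCBCBC pattern come from the notes above.
const LR_HALT_SENTINEL: u32 = 0xBCBC_BCBC;
const CALLBACK_STACK_PAD: u32 = 176;
const CALLER_LR: u32 = 0x8245_8C00;

/// Toy guest stack: address -> 32-bit word.
type Stack = HashMap<u32, u32>;

/// Model of what a `__savegprlr_N`-style prologue does with the link
/// register: store r12 (a copy of LR) at [r1 - 8].
fn prologue_save_lr(stack: &mut Stack, r1: u32, r12: u32) {
    stack.insert(r1 - 8, r12);
}

/// Model of what the matching `__restgprlr_N` tail reloads before `bclr`.
fn epilogue_load_lr(stack: &Stack, r1: u32) -> u32 {
    stack[&(r1 - 8)]
}

/// Runs the scenario: the interrupted function saves its caller's LR, an
/// ISR callback is injected with `pad` bytes of stack padding and runs its
/// own prologue, then the interrupted function reloads LR in its epilogue.
fn lr_after_injection(pad: u32) -> u32 {
    let mut stack = Stack::new();
    let r1 = 0x7000_1000; // made-up stack pointer

    // Interrupted function's prologue.
    prologue_save_lr(&mut stack, r1, CALLER_LR);

    // Injection sets lr = sentinel; the callback prologue saves it
    // relative to the (possibly padded) stack pointer.
    prologue_save_lr(&mut stack, r1 - pad, LR_HALT_SENTINEL);

    // Callback returns, r1 is back to its pre-inject value, and the
    // interrupted function runs its epilogue.
    epilogue_load_lr(&stack, r1)
}
```

With `pad = 0` the function returns the sentinel, which is the pre-fix behaviour; with `pad = CALLBACK_STACK_PAD` the callback's save lands at `[r1 - 176 - 8]` and the caller's LR survives.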
## The fix (2026-04-24)

The fix has two parts, built around [`CALLBACK_STACK_PAD = 176`](../../../RE%20Project%20Sylpheed/xenia-rs/crates/xenia-kernel/src/interrupts.rs):

1. **`SavedCallbackCtx`** now captures/restores **all PPC volatile GPRs** (`r0`, `r2`–`r12`) plus `r1` (SP), `pc`, `lr`, `ctr`, and `cr`. The non-volatile set (`r13`–`r31`) is preserved by the callback's own `__savegprlr_N` prologue/epilogue per the PPC ELF ABI, so it doesn't need stashing.
2. `try_inject_graphics_interrupt` decrements `ctx.gpr[1]` by `CALLBACK_STACK_PAD` **after** `SavedCallbackCtx::capture` (so the saved `r1` is the pre-inject value) and **before** setting `pc = callback_pc`. The callback's prologue now writes to `[injected_r1 - 176 - 8]` instead of stomping `[injected_r1 - 8]`. On return, `SavedCallbackCtx::restore` puts `r1` back.

Unit-test coverage is thin: the existing `inject_restore_roundtrip_smoke` test in [interrupts.rs](../../../RE%20Project%20Sylpheed/xenia-rs/crates/xenia-kernel/src/interrupts.rs) still passes, but it is only a smoke test for the pc/lr/r3-r4 roundtrip. Extending it to cover the new SP + r0/r2/r7-r12 paths would be a cheap follow-up.

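The ordering in the two steps above (capture, then pad, then redirect) can be sketched as follows. This is a minimal illustration, not the actual xenia-rs code: `GuestCtx`, its field types, the `inject` helper, and the sentinel value are assumptions made for the example; only the `gprs: [u64; 13]` layout, the saved register set, and the 176-byte pad come from the notes.

```rust
/// Hypothetical, minimal guest context; field names are illustrative.
#[derive(Clone, Debug, PartialEq)]
struct GuestCtx {
    gpr: [u64; 32],
    pc: u32,
    lr: u64,
    ctr: u64,
    cr: u32,
}

const CALLBACK_STACK_PAD: u64 = 176;
const LR_HALT_SENTINEL: u64 = 0xBCBC_BCBC; // assumed value

/// Snapshot of r0..=r12 (which includes r1, the SP) plus pc/lr/ctr/cr.
struct SavedCallbackCtx {
    gprs: [u64; 13],
    pc: u32,
    lr: u64,
    ctr: u64,
    cr: u32,
}

impl SavedCallbackCtx {
    fn capture(ctx: &GuestCtx) -> Self {
        let mut gprs = [0u64; 13];
        gprs.copy_from_slice(&ctx.gpr[..13]);
        Self { gprs, pc: ctx.pc, lr: ctx.lr, ctr: ctx.ctr, cr: ctx.cr }
    }

    fn restore(&self, ctx: &mut GuestCtx) {
        ctx.gpr[..13].copy_from_slice(&self.gprs);
        ctx.pc = self.pc;
        ctx.lr = self.lr;
        ctx.ctr = self.ctr;
        ctx.cr = self.cr;
    }
}

/// Redirects `ctx` into a callback and returns the snapshot to restore later.
fn inject(ctx: &mut GuestCtx, callback_pc: u32, arg0: u64, arg1: u64) -> SavedCallbackCtx {
    // 1. Capture first, so the saved r1 is the pre-inject stack pointer.
    let saved = SavedCallbackCtx::capture(ctx);
    // 2. Move SP below the slots the interrupted code may still rely on.
    ctx.gpr[1] -= CALLBACK_STACK_PAD;
    // 3. Only then set up the callback's arguments, return address, and pc.
    ctx.gpr[3] = arg0;
    ctx.gpr[4] = arg1;
    ctx.lr = LR_HALT_SENTINEL;
    ctx.pc = callback_pc;
    saved
}
```

A roundtrip test over this shape would check both that `r1` is lowered by 176 while the callback runs and that every saved register, not just pc/lr/r3/r4, is back to its original value after `restore`.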
## Concrete next-session blockers (post-fix)

tid=5 is now alive, progresses through multiple work items, and drives the data-stream decoders (`sub_8280AD40` = inflate, `sub_828085E0` = Adler-32, `sub_82807AB8` = CRC-32; all around 0x82807-0x8280C). Observed behaviour at 3 B instructions:

1. **Sylpheed boot is CPU-bound on stream decode.** At 10 MIPS interpreter throughput, the per-asset inflate + Adler/CRC passes each eat several seconds of wall time. The second `VdSwap` fires at ~28 M cycles (~3 s wall). For first pixels to be visually obvious (dozens of frames), we likely need the Tier-4 JIT or at least threaded-code dispatch. Order of magnitude: real HW boots Sylpheed to menu in ~2–3 s at ~200 MIPS; we're ~20× slower.
2. **wgpu→ShadowEdram RT readback** (P1 from prior memory): frame-2+ blocker once draws fire. See [edram-resolve-gap memory](project_xenia_rs_edram_resolve_gap.md).
3. **Keep verifying with `exec --halt-on-deadlock -n 500_000_000`**: still clean post-fix. Any regression here means a new sync bug.

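The wall-time figures in item 1 follow from simple arithmetic, sketched below. The helper names are made up for the example, and it assumes one "cycle" in these notes corresponds to one interpreted instruction.

```rust
/// Wall-clock seconds needed to retire `instructions` at `mips`
/// million instructions per second.
fn wall_seconds(instructions: u64, mips: f64) -> f64 {
    instructions as f64 / (mips * 1.0e6)
}

/// How many times slower the interpreter is than a reference throughput.
fn slowdown(reference_mips: f64, interpreter_mips: f64) -> f64 {
    reference_mips / interpreter_mips
}
```

Under that assumption, `wall_seconds(28_000_000, 10.0)` gives 2.8 s for the second `VdSwap`, `slowdown(200.0, 10.0)` gives the 20× figure, and a full 3 B-instruction run costs about 300 s of wall time at 10 MIPS.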
## Investigation tools available

- **`dump_thread_diagnostic`** (from 2026-04-23b): prints per-thread state + handle/CS waiter maps at normal `-n N` exit. Now also dumps r0–r13 for every thread (expanded 2026-04-24).
- **`disasm --at <addr> -n N`**: unchanged.
- **DuckDB xrefs**: see [project_xenia_rs_duckdb.md](project_xenia_rs_duckdb.md).
- **PC → LR_HALT_SENTINEL tracer pattern**: reference impl in `2026-04-24` diff on [main.rs](../../../RE%20Project%20Sylpheed/xenia-rs/crates/xenia-app/src/main.rs); was instrumental for this fix. Reverted after use.
- **Adler/CRC entry probes**: one-shot `tracing::warn!(target: "adler_probe", ...)` at the `pc == 0x828085E0 && tid == 5` site. Logs lr/r3/r4/r5 at entry. Reverted after use.

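Since the entry probes were reverted, a rough sketch of the one-shot pattern may help when re-adding them. This is not the reverted code: the function names and the static flag are hypothetical, and `eprintln!` stands in for the `tracing::warn!` call so the example has no external dependencies.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

const ADLER_ENTRY_PC: u32 = 0x8280_85E0;
const PROBE_TID: u32 = 5;

/// Process-wide latch so the probe logs at most once per run.
static ADLER_PROBE_FIRED: AtomicBool = AtomicBool::new(false);

/// Returns true exactly once: the first time `pc` and `tid` both match.
/// The latch is only touched when the cheap comparisons already passed.
fn adler_probe_hit(pc: u32, tid: u32, fired: &AtomicBool) -> bool {
    pc == ADLER_ENTRY_PC && tid == PROBE_TID && !fired.swap(true, Ordering::Relaxed)
}

/// Call from the interpreter loop before executing the instruction at `pc`.
fn probe_adler_entry(pc: u32, tid: u32, lr: u64, r3: u64, r4: u64, r5: u64) {
    if adler_probe_hit(pc, tid, &ADLER_PROBE_FIRED) {
        // Stand-in for `tracing::warn!(target: "adler_probe", ...)`.
        eprintln!("adler_probe: lr={lr:#x} r3={r3:#x} r4={r4:#x} r5={r5:#x}");
    }
}
```

Passing the latch in as a parameter keeps the matching logic testable without depending on the global flag's state.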
## Confirmed NOT the issue (verified this session)

- `VdCallGraphicsNotificationRoutines` stub: canary matches, Sylpheed doesn't register notifications.
- `NtSetEvent` / `KeSetEvent` return-value semantics: match canary.
- Graphics-interrupt injection per-vsync: fires correctly, delivered counter scales with `VSYNC_INSTR_PERIOD = 150_000`.
- Ring-buffer write-back: correct.
- PKEVENT shadow refresh: correct.
- Event/semaphore handle table: correct; the pre-fix "main stuck on 0x10fc" was a *symptom* of tid=5 dying before producing the signal, not a handle-table bug.

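As a sanity check on the vsync item, the expected upper bound on delivered interrupts for a run can be estimated from the period. The helper below is hypothetical and assumes one vsync opportunity per `VSYNC_INSTR_PERIOD` retired instructions from the very start of the run; the real delivered counter may be lower if injection is skipped or only starts once a callback is registered.

```rust
const VSYNC_INSTR_PERIOD: u64 = 150_000;

/// Upper bound on vsync interrupts after `instructions` retired
/// instructions, assuming one opportunity per full period.
fn expected_vsyncs(instructions: u64) -> u64 {
    instructions / VSYNC_INSTR_PERIOD
}
```

For a 3 B-instruction run this gives 20,000 opportunities, and about 186 by the time the second `VdSwap` fires at ~28 M.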
## Architectural patterns (stable, don't re-derive)

- **Scheduler + HW slots + ThreadRef**: see [project_xenia_rs_scheduler.md](project_xenia_rs_scheduler.md).
- **UI bridge + GPU pipeline + MMIO + HUD**: see [project_xenia_rs_ui.md](project_xenia_rs_ui.md).
- **PKEVENT shim**: `ensure_dispatcher_object` reads DISPATCHER_HEADER type on first touch.
- **IRQ injection stack discipline** (new 2026-04-24): the injected callback runs on a **176-byte-padded extension** of the interrupted thread's stack. `SavedCallbackCtx` captures/restores r0, r1, r2–r12 + pc/lr/ctr/cr. Non-volatile regs (r13–r31) are not in the save set because the callback prologue handles them. Canary's `Processor::Execute` uses the same 64+112 pad.
- **Main thread return ≠ emulator halt**: unchanged.

## Memory-model caveats

- `pending_timer_fires` is keyed by handle (u32). `NtClose` / `NtCancelTimer` / `NtSetTimerEx` manage lifecycle. (Sylpheed doesn't use timers on the boot path.)
- `waiters_mut()` on `KernelObject` returns `None` for `File` and `Some` for the 5 sync variants.
- Handle allocator starts at `0x1000`, bumps by 4.

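The handle-allocator rule in the last bullet is easy to model, which helps when decoding handles seen in logs. The sketch below is an assumption-laden illustration: `HandleAllocator` and `handle_index` are hypothetical names, and it assumes a pure bump allocator with no reuse of closed handles.

```rust
/// Toy bump allocator: first handle is 0x1000, each next one is +4.
struct HandleAllocator {
    next: u32,
}

impl HandleAllocator {
    const FIRST: u32 = 0x1000;
    const STEP: u32 = 4;

    fn new() -> Self {
        Self { next: Self::FIRST }
    }

    /// Hands out the current handle and advances by one step.
    fn alloc(&mut self) -> u32 {
        let handle = self.next;
        self.next += Self::STEP;
        handle
    }
}

/// Zero-based allocation index for `handle`, or `None` if the value could
/// not have come from this allocator (below the base or misaligned).
fn handle_index(handle: u32) -> Option<u32> {
    let offset = handle.checked_sub(HandleAllocator::FIRST)?;
    if offset % HandleAllocator::STEP == 0 {
        Some(offset / HandleAllocator::STEP)
    } else {
        None
    }
}
```

Under these assumptions, the handle `0x10fc` from the pre-fix "main stuck" symptom decodes to index 63, i.e. the 64th handle allocated.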
## Files touched in the 2026-04-24 session

- `xenia-kernel/src/interrupts.rs`: `SavedCallbackCtx` expanded to `gprs: [u64; 13]` (r0–r12), added `CALLBACK_STACK_PAD = 176` constant with docs citing canary as ground-truth.
- `xenia-app/src/main.rs`: `try_inject_graphics_interrupt` now does `ctx.gpr[1] -= CALLBACK_STACK_PAD` after capture, before setting callback PC. `dump_thread_diagnostic` expanded to print r0/r3–r13.