xenia-rs/migration/claude-memory/project_xenia_rs_scheduler.md
MechaCat02 e6d43a23ac chore: add migration/ bundle for cross-machine setup
Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on
another machine can be brought up to identical configuration via
migration/setup.sh:

  - claude-memory/             ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/
                               (103 files, 1.1 MB - MEMORY.md + every
                                project_xenia_rs_*.md from audits
                                addis_signext through audit-058)
  - project-root/dot-claude/   <project-root>/.claude/settings.json
                               (Stop hook + permissions)
  - project-root/ppc-manual/   <project-root>/ppc-manual/
                               (PowerPC reference docs, 397 files, 3.7 MB)
  - project-root/run-canary.sh <project-root>/run-canary.sh
  - README.md                  Human-readable setup checklist
  - setup.sh                   Idempotent installer (also reclones
                               xenia-canary at pinned HEAD 6de80dffe)
  - MANIFEST.md                Per-file mapping + per-file-not-bundled
                               restoration recipe

Excluded from bundle (not shippable via git):
  - Sylpheed ISO (7.8 GB; copyright; manual copy required)
  - sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
  - target/ build artifacts (rebuild on target)
  - audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
  - audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
  - xenia-canary checkout (setup.sh reclones from
    git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:38:38 +02:00


name: xenia-rs scheduler architecture (post-Axis-1-to-5 refactor, 2026-04-23)
description: Canonical scheduler model: 6 HW slots × per-slot priority runqueues, single host thread, GuestThread as first-class, ThreadRef identity, bind-and-migrate affinity. Supersedes the old HwThread[32] one-thread-per-slot model.
type: project
originSessionId: a178fdd6-2965-4652-903a-f684cf80835d

Model in one paragraph

A single host thread runs the interpreter (GuestMemory pinned). The scheduler has 6 HwSlots, matching Xenon hardware. Each slot holds runqueue: Vec<GuestThread> and running_idx: Option<usize>. A GuestThread owns its PpcContext inline: the live register file is always the one belonging to whichever thread the slot marks as running, so a context switch is just a running_idx flip (no memcpy). A slot can hold any number of guest threads.
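The data model above can be sketched roughly as follows. This is a minimal illustration, not the actual xenia-rs definitions: the PpcContext fields, the live_ctx and switch_to helpers, and the HW_SLOT_COUNT constant are assumptions introduced for the example.

```rust
/// Stand-in for the real PpcContext (fields are assumptions for this sketch).
#[derive(Debug, Default, Clone)]
pub struct PpcContext {
    pub pc: u32,
    pub gpr: [u64; 32],
}

#[derive(Debug)]
pub struct GuestThread {
    pub tid: u32,
    /// Owned inline, so it is never copied on a context switch.
    pub ctx: PpcContext,
}

#[derive(Debug, Default)]
pub struct HwSlot {
    pub runqueue: Vec<GuestThread>,
    pub running_idx: Option<usize>,
}

impl HwSlot {
    /// The live register file is the context of whichever thread is running.
    /// (Hypothetical helper name.)
    pub fn live_ctx(&mut self) -> Option<&mut PpcContext> {
        let idx = self.running_idx?;
        self.runqueue.get_mut(idx).map(|t| &mut t.ctx)
    }

    /// A context switch only repoints running_idx; no register state moves.
    /// Returns false if idx is out of range. (Hypothetical helper name.)
    pub fn switch_to(&mut self, idx: usize) -> bool {
        if idx < self.runqueue.len() {
            self.running_idx = Some(idx);
            true
        } else {
            false
        }
    }
}

/// Six hardware slots, as stated in the note.
pub const HW_SLOT_COUNT: usize = 6;

pub struct Scheduler {
    pub slots: [HwSlot; HW_SLOT_COUNT],
}

impl Scheduler {
    pub fn new() -> Self {
        Scheduler {
            slots: std::array::from_fn(|_| HwSlot::default()),
        }
    }
}
```

Because each thread keeps its own context in place, writing through live_ctx and then calling switch_to leaves the previous thread's registers untouched in its runqueue entry.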

Identity

ThreadRef { hw_id: u8, idx: u16 } — 4-byte positional identity used across the boundary. Waiter lists in KernelObject::{Event,Semaphore,Mutex,Thread}, state.cs_waiters, interrupts.injected_ref, and scheduler.timed_waits all store ThreadRef (not raw hw_id). After swap_remove (Axis 4 migration), refs are fixed up via MigrationFixup::apply.
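To illustrate why a fixup is needed after swap_remove, here is a hedged sketch. The fields of MigrationFixup and the free function migrate are assumptions made for the example (the real types may carry different data); runqueues are modelled as plain Vec<u32> of thread ids to keep it short.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ThreadRef {
    pub hw_id: u8,
    pub idx: u16,
}

/// Assumed shape: where the migrated thread went, plus the tail element
/// that swap_remove moved into the vacated index (if any).
#[derive(Debug, Clone, Copy)]
pub struct MigrationFixup {
    pub migrated_from: ThreadRef,
    pub migrated_to: ThreadRef,
    pub displaced: Option<(ThreadRef, ThreadRef)>,
}

impl MigrationFixup {
    /// Rewrite one stored ref so it keeps pointing at the same thread.
    pub fn apply(&self, r: &mut ThreadRef) {
        if *r == self.migrated_from {
            *r = self.migrated_to;
        } else if let Some((old, new)) = self.displaced {
            if *r == old {
                *r = new;
            }
        }
    }
}

/// Hypothetical helper: move the thread at `from` to slot `dst_hw`.
/// Panics if `from` does not name an existing entry.
pub fn migrate(slots: &mut [Vec<u32>], from: ThreadRef, dst_hw: u8) -> MigrationFixup {
    let src = &mut slots[from.hw_id as usize];
    let last = (src.len() - 1) as u16;
    // swap_remove fills the hole with the last element, changing its index.
    let tid = src.swap_remove(from.idx as usize);
    let displaced = if from.idx != last {
        Some((ThreadRef { hw_id: from.hw_id, idx: last }, from))
    } else {
        None
    };
    let dst = &mut slots[dst_hw as usize];
    dst.push(tid);
    let migrated_to = ThreadRef { hw_id: dst_hw, idx: (dst.len() - 1) as u16 };
    MigrationFixup { migrated_from: from, migrated_to, displaced }
}
```

Every container that stores ThreadRef values (waiter lists, timed waits, and so on) would then run apply over its entries; a ref that matches neither the migrated thread nor the displaced tail is left unchanged.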

Compat accessors (how ~30 call-sites survived the data-model refactor)

scheduler.ctx(hw_id) / ctx_mut(hw_id) / ctx_mut_ref(r) / state(hw_id) / tid(hw_id) / thread_handle(hw_id) / suspend_count_mut(hw_id) / current_hw_id() — each resolves through slots[hw_id].running_idx. Safe sentinel (idle_ctx) returned when running_idx is None. This let the refactor avoid rewriting every hw_threads[i].ctx site in main.rs and exports.rs.
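A compat accessor of this kind might look like the sketch below. Only ctx and ctx_mut are shown; the struct layouts, the constructor, and the exact signatures are assumptions for illustration, not the real xenia-rs API.

```rust
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PpcContext {
    pub pc: u32,
}

pub struct GuestThread {
    pub tid: u32,
    pub ctx: PpcContext,
}

#[derive(Default)]
pub struct HwSlot {
    pub runqueue: Vec<GuestThread>,
    pub running_idx: Option<usize>,
}

pub struct Scheduler {
    pub slots: Vec<HwSlot>,
    /// Sentinel handed out when a slot has no running thread.
    idle_ctx: PpcContext,
}

impl Scheduler {
    /// Hypothetical constructor for the sketch.
    pub fn new(slot_count: usize) -> Self {
        Scheduler {
            slots: (0..slot_count).map(|_| HwSlot::default()).collect(),
            idle_ctx: PpcContext::default(),
        }
    }

    /// Resolve through slots[hw_id].running_idx; fall back to idle_ctx.
    pub fn ctx(&self, hw_id: u8) -> &PpcContext {
        let slot = &self.slots[hw_id as usize];
        match slot.running_idx {
            Some(i) => &slot.runqueue[i].ctx,
            None => &self.idle_ctx,
        }
    }

    /// Mutable variant. Writes made while the slot is idle land in the
    /// sentinel and never reach a real thread.
    pub fn ctx_mut(&mut self, hw_id: u8) -> &mut PpcContext {
        // Split the borrow so both arms can return a field of self.
        let Scheduler { slots, idle_ctx } = self;
        let slot = &mut slots[hw_id as usize];
        match slot.running_idx {
            Some(i) => &mut slot.runqueue[i].ctx,
            None => idle_ctx,
        }
    }
}
```

A call site that used to read hw_threads[i].ctx can then call scheduler.ctx(i) without knowing about runqueues at all.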

Scheduling

  • HwSlot::pick_runnable — highest-priority Ready/ServicingIrq thread; tiebreak lowest idx.
  • Scheduler::round_schedule — emits slot ids in rotating order starting from rotation_cursor, filtered by non_empty_runnable: u8 bitset. Empty-slot fast path. OrderMode::Seeded layers Fisher-Yates on top of the filtered list.
  • Scheduler::begin_slot_visit(hw_id) — called by main.rs at top of each slot iteration; picks runnable, sets running_idx, writes self.current: Option<ThreadRef>.
  • Scheduler::decrement_quantum() — Axis 3 per-instruction tick; on hit-zero, reloads to QUANTUM_DEFAULT = 50_000 and rotates within same-priority tier (observed next round, not mid-instruction).
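The selection and rotation rules in the list above can be sketched as free functions. This is a simplified illustration under stated assumptions: the ThreadState variants beyond Ready and ServicingIrq, the Quantum struct, and the rotate_pending flag are invented for the example, and the OrderMode::Seeded shuffle is left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadState {
    Ready,
    ServicingIrq,
    Blocked,
    Exited(u32),
}

pub struct GuestThread {
    pub priority: i32,
    pub state: ThreadState,
}

/// Highest-priority Ready/ServicingIrq thread; ties go to the lowest index.
pub fn pick_runnable(runqueue: &[GuestThread]) -> Option<usize> {
    let mut best: Option<(usize, i32)> = None;
    for (idx, t) in runqueue.iter().enumerate() {
        if !matches!(t.state, ThreadState::Ready | ThreadState::ServicingIrq) {
            continue;
        }
        // Strict '>' keeps the earliest index when priorities are equal.
        if best.map_or(true, |(_, p)| t.priority > p) {
            best = Some((idx, t.priority));
        }
    }
    best.map(|(idx, _)| idx)
}

pub const HW_SLOT_COUNT: u8 = 6;

/// Slot ids in rotating order starting at rotation_cursor, keeping only
/// slots whose bit is set in non_empty_runnable (bit n = slot n, assumed).
pub fn round_schedule(rotation_cursor: u8, non_empty_runnable: u8) -> Vec<u8> {
    (0..HW_SLOT_COUNT)
        .map(|k| (rotation_cursor % HW_SLOT_COUNT + k) % HW_SLOT_COUNT)
        .filter(|&id| non_empty_runnable & (1u8 << id) != 0)
        .collect()
}

pub const QUANTUM_DEFAULT: u32 = 50_000;

/// Hypothetical per-thread quantum state.
pub struct Quantum {
    pub remaining: u32,
    /// Set on expiry; the scheduler would act on it in the next round.
    pub rotate_pending: bool,
}

impl Quantum {
    /// Per-instruction tick. Returns true when the quantum expired and
    /// was reloaded.
    pub fn decrement(&mut self) -> bool {
        self.remaining = self.remaining.saturating_sub(1);
        if self.remaining == 0 {
            self.remaining = QUANTUM_DEFAULT;
            self.rotate_pending = true;
            true
        } else {
            false
        }
    }
}
```

Deferring the rotation to a flag mirrors the note's point that expiry is observed in the next round rather than mid-instruction.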

Affinity + priority (Axis 4/5 wire-up)

  • KeSetAffinityThread(handle, mask) -> old_mask does real migration: set_affinity_ref finds the thread, updates mask, if current slot no longer allowed → swap_remove from source slot, push onto least-depth allowed slot, rewrite PCR+0x2C, return MigrationFixup. KernelState::set_affinity walks every waiter list and applies the fixup.
  • Self-migration handling: if the migrating thread is scheduler.current, the ref is updated in place. call_export's post-call ctx restore re-reads current (not the stashed entry ref) so ctx lands on the new slot. main.rs's post-export pc = lr advance uses post_ref = scheduler.current for the same reason.
  • KeSetBasePriorityThread / KeQueryBasePriorityThread store/read GuestThread.priority: i32. NT-style [-15..+15], default 0. Drives pick_runnable.
  • KeSetIdealProcessor / KeQueryIdealProcessor / NtSetInformationThread (classes 2/3/13) wired; ideal is a spawn-placement hint (not migrate-on-change).
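The placement decision inside the migration path can be sketched in isolation. The function names, the bit layout of the affinity mask (bit n = slot n), and the lowest-id tiebreak are assumptions for this example; the PCR rewrite and the waiter-list fixup are not shown.

```rust
pub const HW_SLOT_COUNT: usize = 6;

/// True when the thread's current slot is no longer permitted by the mask.
/// Assumes current_hw < HW_SLOT_COUNT.
pub fn needs_migration(current_hw: u8, new_mask: u8) -> bool {
    new_mask & (1u8 << current_hw) == 0
}

/// Among the slots allowed by `mask`, pick the one with the shortest
/// runqueue. min_by_key returns the first minimum, so ties resolve to
/// the lowest slot id. Returns None if the mask allows no slot.
pub fn least_depth_allowed(depths: &[usize; HW_SLOT_COUNT], mask: u8) -> Option<u8> {
    (0..HW_SLOT_COUNT as u8)
        .filter(|&id| mask & (1u8 << id) != 0)
        .min_by_key(|&id| depths[id as usize])
}
```

For example, a thread on slot 0 given mask 0b000110 needs to move, and with runqueue depths [3, 2, 2, 0, 0, 0] the sketch selects slot 1.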

Lifecycle details

  • exit_current flips state to Exited(code) but does NOT Vec::remove (would invalidate peer ThreadRefs). Pruning happens at spawn time via prune_exited_if_needed when a slot reaches PRUNE_DEPTH_THRESHOLD = 4.
  • install_initial_thread on Scheduler lives next to spawn; both write PCR+0x2C = hw_id via the PcrWriter trait (impl GuestMemoryPcr in state.rs).
  • KernelObject::Thread.waiters: Vec<ThreadRef> (not Vec<u8>) — necessary for correctness under per-slot runqueues.
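The exit-then-prune lifecycle can be sketched as below. The note does not say how pruning keeps peer refs valid, so the old-to-new index remap returned here is purely an assumption of this example, as are the reduced ThreadState enum and the method signatures.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadState {
    Ready,
    Exited(u32),
}

#[derive(Debug)]
pub struct GuestThread {
    pub tid: u32,
    pub state: ThreadState,
}

pub const PRUNE_DEPTH_THRESHOLD: usize = 4;

#[derive(Debug, Default)]
pub struct HwSlot {
    pub runqueue: Vec<GuestThread>,
    pub running_idx: Option<usize>,
}

impl HwSlot {
    /// Mark the running thread as exited but leave it in place, so the
    /// indices held by peer ThreadRefs stay valid.
    pub fn exit_current(&mut self, code: u32) {
        if let Some(i) = self.running_idx {
            self.runqueue[i].state = ThreadState::Exited(code);
        }
    }

    /// Intended to run at spawn time. Below the threshold nothing happens.
    /// Otherwise exited entries are dropped and an (old_idx, new_idx) list
    /// for the survivors is returned so callers could rewrite stored refs
    /// (assumed mechanism).
    pub fn prune_exited_if_needed(&mut self) -> Option<Vec<(usize, usize)>> {
        if self.runqueue.len() < PRUNE_DEPTH_THRESHOLD {
            return None;
        }
        let old_running = self.running_idx.take();
        let mut remap = Vec::new();
        for (old, t) in std::mem::take(&mut self.runqueue).into_iter().enumerate() {
            if matches!(t.state, ThreadState::Exited(_)) {
                continue;
            }
            let new = self.runqueue.len();
            if old_running == Some(old) {
                self.running_idx = Some(new);
            }
            remap.push((old, new));
            self.runqueue.push(t);
        }
        Some(remap)
    }
}
```

Deferring removal to a single, well-defined point keeps index invalidation out of the exit path, which is the property the first bullet above relies on.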

Known caveat (2026-04-23)

Axis 4's real migration distributes Sylpheed's workers across slots differently than the old 32-slot one-per-slot model. The resulting wait/signal chain trips a single scheduler.deadlock_recoveries event during boot; default force-wake recovery resolves it and the game progresses to VdSwap=2 (up from pre-Axis-4's 1). Under --halt-on-deadlock this trips scheduler.deadlock_halts = 1 at ~7.5M cycles. The issue is a latent HLE sync-primitive gap exposed by correct migration, not an Axis 4 defect. Root cause: one of tid=1/3/4/7's blocking events isn't being signaled by its expected source after thread layout changes. Track down by instrumenting the specific handle values (0x10FC, 0x1014, 0x1104, 0x10DC/0x10F0) in a future session.

Files

Metrics added

  • scheduler.spawn.ok — successful spawns
  • scheduler.spawn.rejected — spawn failures (should stay 0)
  • scheduler.deadlock_recoveries — force-wake events (non-zero post-Axis-4; see caveat)
  • scheduler.deadlock_halts — halts under --halt-on-deadlock