xenia-rs/migration/claude-memory/project_xenia_rs_scheduler.md
MechaCat02 e6d43a23ac chore: add migration/ bundle for cross-machine setup
Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on
another machine can be brought up to identical configuration via
migration/setup.sh:

  - claude-memory/             ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/
                               (103 files, 1.1 MB - MEMORY.md + every
                                project_xenia_rs_*.md from audits
                                addis_signext through audit-058)
  - project-root/dot-claude/   <project-root>/.claude/settings.json
                               (Stop hook + permissions)
  - project-root/ppc-manual/   <project-root>/ppc-manual/
                               (PowerPC reference docs, 397 files, 3.7 MB)
  - project-root/run-canary.sh <project-root>/run-canary.sh
  - README.md                  Human-readable setup checklist
  - setup.sh                   Idempotent installer (also reclones
                               xenia-canary at pinned HEAD 6de80dffe)
  - MANIFEST.md                Per-file mapping + per-file-not-bundled
                               restoration recipe

Excluded from bundle (not shippable via git):
  - Sylpheed ISO (7.8 GB; copyright; manual copy required)
  - sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
  - target/ build artifacts (rebuild on target)
  - audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
  - audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
  - xenia-canary checkout (setup.sh reclones from
    git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:38:38 +02:00

---
name: xenia-rs scheduler architecture (post-Axis-1-to-5 refactor, 2026-04-23)
description: Canonical scheduler model — 6 HW slots × per-slot priority runqueues, single host thread, GuestThread as first-class, ThreadRef identity, bind-and-migrate affinity. Supersedes the old HwThread[32] one-thread-per-slot model.
type: project
originSessionId: a178fdd6-2965-4652-903a-f684cf80835d
---
## Model in one paragraph
A single host thread runs the interpreter (`GuestMemory` pinned). The scheduler has **6 `HwSlot`s**, matching Xenon hardware. Each slot holds `runqueue: Vec<GuestThread>` + `running_idx: Option<usize>`. A `GuestThread` owns its own `PpcContext` inline: the live register file is always the one belonging to whichever thread the slot has marked as running, so a context switch is just a `running_idx` flip (no memcpy). There is no fixed cap on guest threads per slot.
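A minimal sketch of the data model described above, to make the "context switch is an index flip" point concrete. The field layout of `PpcContext`, the `tid` field, and the helper names `switch_to` / `live_ctx_mut` are assumptions for illustration, not the actual definitions in `scheduler.rs`.

```rust
/// Stand-in register file; the real `PpcContext` is much larger (assumed shape).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PpcContext {
    pub pc: u32,
    pub lr: u32,
    pub gpr: [u64; 32],
}

/// A guest thread owns its context inline, so nothing is copied on a switch.
#[derive(Debug)]
pub struct GuestThread {
    pub tid: u32,
    pub ctx: PpcContext,
}

/// One hardware slot: a runqueue plus the index of the thread currently running.
#[derive(Debug, Default)]
pub struct HwSlot {
    pub runqueue: Vec<GuestThread>,
    pub running_idx: Option<usize>,
}

impl HwSlot {
    /// Hypothetical helper: a context switch only changes `running_idx`.
    pub fn switch_to(&mut self, idx: usize) -> bool {
        if idx < self.runqueue.len() {
            self.running_idx = Some(idx);
            true
        } else {
            false
        }
    }

    /// Hypothetical helper: the live register file is the running thread's own ctx.
    pub fn live_ctx_mut(&mut self) -> Option<&mut PpcContext> {
        let idx = self.running_idx?;
        self.runqueue.get_mut(idx).map(|t| &mut t.ctx)
    }
}

pub const HW_SLOT_COUNT: usize = 6;

pub struct Scheduler {
    pub slots: [HwSlot; HW_SLOT_COUNT],
}

impl Scheduler {
    pub fn new() -> Self {
        Self { slots: std::array::from_fn(|_| HwSlot::default()) }
    }
}
```

Because each context stays inside its `GuestThread`, writes made while one thread is running remain with that thread after the slot switches to another.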
## Identity
`ThreadRef { hw_id: u8, idx: u16 }` — 4-byte positional identity used across the boundary. Waiter lists in `KernelObject::{Event,Semaphore,Mutex,Thread}`, `state.cs_waiters`, `interrupts.injected_ref`, and `scheduler.timed_waits` all store `ThreadRef` (not raw hw_id). After `swap_remove` (Axis 4 migration), refs are fixed up via `MigrationFixup::apply`.
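Why a fixup is needed at all: `Vec::swap_remove` moves the last element into the vacated index, so one migration can invalidate two positional refs (the migrated thread and the displaced tail thread). The sketch below illustrates that under assumed names; the fields of `MigrationFixup`, the free function `migrate`, and the use of bare `u32` tids as runqueue entries are hypothetical, and only `ThreadRef`'s shape and the `apply` name come from these notes.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ThreadRef {
    pub hw_id: u8,
    pub idx: u16,
}

/// Hypothetical layout: (old, new) ref pairs produced by one migration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MigrationFixup {
    /// The thread that changed slots.
    pub migrated: (ThreadRef, ThreadRef),
    /// The former tail of the source slot, if swap_remove moved it.
    pub displaced: Option<(ThreadRef, ThreadRef)>,
}

impl MigrationFixup {
    /// Rewrite one stored ref; refs to unaffected threads are left alone.
    pub fn apply(&self, r: &mut ThreadRef) {
        if *r == self.migrated.0 {
            *r = self.migrated.1;
        } else if let Some((old, new)) = self.displaced {
            if *r == old {
                *r = new;
            }
        }
    }
}

/// Move the thread at `from` to slot `dst_hw` (assumed to differ from the source slot).
/// Runqueues are modelled as vectors of tids to keep the sketch short.
pub fn migrate(slots: &mut [Vec<u32>], from: ThreadRef, dst_hw: u8) -> MigrationFixup {
    debug_assert_ne!(from.hw_id, dst_hw);
    let src = &mut slots[from.hw_id as usize];
    let last = (src.len() - 1) as u16;
    let tid = src.swap_remove(from.idx as usize);
    // If the removed thread was not the tail, the tail now lives at `from.idx`.
    let displaced = (from.idx != last).then(|| {
        (
            ThreadRef { hw_id: from.hw_id, idx: last },
            ThreadRef { hw_id: from.hw_id, idx: from.idx },
        )
    });
    let dst = &mut slots[dst_hw as usize];
    dst.push(tid);
    let to = ThreadRef { hw_id: dst_hw, idx: (dst.len() - 1) as u16 };
    MigrationFixup { migrated: (from, to), displaced }
}
```

Every container that stores a `ThreadRef` has to run `apply` over its entries after a migration; missing one leaves a ref pointing at the wrong thread.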
## Compat accessors (how ~30 call-sites survived the data-model refactor)
`scheduler.ctx(hw_id) / ctx_mut(hw_id) / ctx_mut_ref(r) / state(hw_id) / tid(hw_id) / thread_handle(hw_id) / suspend_count_mut(hw_id) / current_hw_id()`: each resolves through `slots[hw_id].running_idx`. A safe sentinel (`idle_ctx`) is returned when `running_idx` is `None`. This let the refactor avoid rewriting every `hw_threads[i].ctx` site in [main.rs](xenia-rs/crates/xenia-app/src/main.rs) and [exports.rs](xenia-rs/crates/xenia-kernel/src/exports.rs).
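A rough sketch of how such an accessor can resolve through `running_idx` and fall back to the sentinel. Only `ctx` and `ctx_mut` are shown; the struct shapes, the constructor, and the exact signatures are assumptions rather than the real API.

```rust
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PpcContext {
    pub pc: u32,
}

pub struct GuestThread {
    pub tid: u32,
    pub ctx: PpcContext,
}

#[derive(Default)]
pub struct HwSlot {
    pub runqueue: Vec<GuestThread>,
    pub running_idx: Option<usize>,
}

pub struct Scheduler {
    pub slots: Vec<HwSlot>,
    /// Sentinel handed out when a slot has no running thread.
    idle_ctx: PpcContext,
}

impl Scheduler {
    pub fn new(slot_count: usize) -> Self {
        Self {
            slots: (0..slot_count).map(|_| HwSlot::default()).collect(),
            idle_ctx: PpcContext::default(),
        }
    }

    /// Old call sites keep passing a bare hw_id; the lookup goes via running_idx.
    pub fn ctx(&self, hw_id: u8) -> &PpcContext {
        let slot = &self.slots[hw_id as usize];
        match slot.running_idx {
            Some(i) => &slot.runqueue[i].ctx,
            None => &self.idle_ctx,
        }
    }

    /// Mutable variant; with no running thread, writes land in the sentinel.
    pub fn ctx_mut(&mut self, hw_id: u8) -> &mut PpcContext {
        let slot = &mut self.slots[hw_id as usize];
        match slot.running_idx {
            Some(i) => &mut slot.runqueue[i].ctx,
            None => &mut self.idle_ctx,
        }
    }
}
```

Returning a reference in both arms keeps the call-site shape identical to the old `hw_threads[i].ctx` field access, which is presumably what made the migration cheap.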
## Scheduling
- **`HwSlot::pick_runnable`** — highest-priority Ready/ServicingIrq thread; tiebreak lowest idx.
- **`Scheduler::round_schedule`** — emits slot ids in rotating order starting from `rotation_cursor`, filtered by `non_empty_runnable: u8` bitset. Empty-slot fast path. `OrderMode::Seeded` layers Fisher-Yates on top of the filtered list.
- **`Scheduler::begin_slot_visit(hw_id)`** — called by main.rs at top of each slot iteration; picks runnable, sets `running_idx`, writes `self.current: Option<ThreadRef>`.
- **`Scheduler::decrement_quantum()`** — Axis 3 per-instruction tick; on hit-zero, reloads to `QUANTUM_DEFAULT = 50_000` and rotates within same-priority tier (observed next round, not mid-instruction).
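The selection rules above can be sketched as three small free functions. This is an illustration under assumptions: the `ThreadState` variant set beyond `Ready`/`ServicingIrq`/`Exited`, the free-function signatures, and the boolean return of `decrement_quantum` are hypothetical, and the `OrderMode::Seeded` shuffle and the same-priority rotation itself are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadState {
    Ready,
    ServicingIrq,
    Blocked, // assumed variant, stands for any non-runnable state
    Exited(u32),
}

pub struct GuestThread {
    pub priority: i32,
    pub state: ThreadState,
}

/// Highest-priority Ready/ServicingIrq thread; ties go to the lowest index.
pub fn pick_runnable(runqueue: &[GuestThread]) -> Option<usize> {
    let mut best: Option<(usize, i32)> = None;
    for (idx, t) in runqueue.iter().enumerate() {
        if !matches!(t.state, ThreadState::Ready | ThreadState::ServicingIrq) {
            continue;
        }
        // Strict `>` means an equal-priority later thread never replaces an earlier one.
        if best.map_or(true, |(_, p)| t.priority > p) {
            best = Some((idx, t.priority));
        }
    }
    best.map(|(idx, _)| idx)
}

/// Slot ids in rotating order from the cursor, keeping only slots whose bit is set.
pub fn round_schedule(rotation_cursor: u8, non_empty_runnable: u8) -> Vec<u8> {
    const SLOTS: u8 = 6;
    if non_empty_runnable == 0 {
        return Vec::new(); // empty fast path: nothing runnable anywhere
    }
    let start = rotation_cursor % SLOTS;
    (0..SLOTS)
        .map(|off| (start + off) % SLOTS)
        .filter(|&hw| (non_empty_runnable & (1u8 << hw)) != 0)
        .collect()
}

pub const QUANTUM_DEFAULT: u32 = 50_000;

/// Per-instruction tick. Returns true when the quantum expired and was reloaded,
/// which is the caller's cue to rotate on the next round.
pub fn decrement_quantum(quantum: &mut u32) -> bool {
    *quantum = quantum.saturating_sub(1);
    if *quantum == 0 {
        *quantum = QUANTUM_DEFAULT;
        true
    } else {
        false
    }
}
```

For example, with cursor 4 and bitset `0b100011` (slots 0, 1, 5 runnable), the visit order is 5, 0, 1.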
## Affinity + priority (Axis 4/5 wire-up)
- **`KeSetAffinityThread(handle, mask) -> old_mask`** does real migration: `set_affinity_ref` finds the thread, updates mask, if current slot no longer allowed → `swap_remove` from source slot, push onto least-depth allowed slot, rewrite `PCR+0x2C`, return `MigrationFixup`. `KernelState::set_affinity` walks every waiter list and applies the fixup.
- **Self-migration handling**: if the migrating thread is `scheduler.current`, the ref is updated in place. `call_export`'s post-call ctx restore re-reads `current` (not the stashed entry ref) so ctx lands on the new slot. `main.rs`'s post-export `pc = lr` advance uses `post_ref = scheduler.current` for the same reason.
- **`KeSetBasePriorityThread` / `KeQueryBasePriorityThread`** store/read `GuestThread.priority: i32`. NT-style [-15..+15], default 0. Drives `pick_runnable`.
- **`KeSetIdealProcessor` / `KeQueryIdealProcessor` / `NtSetInformationThread`** (classes 2/3/13) wired; ideal is a spawn-placement hint (not migrate-on-change).
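The migration decision in `KeSetAffinityThread` boils down to two mask checks, sketched here. Both function names are hypothetical, and the tie-break (lowest slot id among equally shallow slots) is an assumption; the notes only say "least-depth allowed slot".

```rust
/// True when the new affinity mask no longer permits the thread's current slot.
pub fn needs_migration(current_hw: u8, mask: u8) -> bool {
    (mask & (1u8 << current_hw)) == 0
}

/// Pick the allowed slot with the shortest runqueue.
/// `depths[hw]` is the current runqueue length of slot `hw`.
/// Returns None for a mask that allows none of the 6 slots.
pub fn least_depth_allowed_slot(depths: &[usize; 6], mask: u8) -> Option<u8> {
    (0u8..6)
        .filter(|&hw| (mask & (1u8 << hw)) != 0)
        // min_by_key returns the first minimum, so ties resolve to the lowest slot id.
        .min_by_key(|&hw| depths[hw as usize])
}
```

With depths `[3, 1, 4, 1, 0, 2]` and mask `0b001011`, a thread on slot 4 must move, and the chosen target is slot 1 (depth 1, first among the tied slots 1 and 3).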
## Lifecycle details
- `exit_current` flips state to `Exited(code)` but does NOT `Vec::remove` (would invalidate peer ThreadRefs). Pruning happens at `spawn` time via `prune_exited_if_needed` when a slot reaches `PRUNE_DEPTH_THRESHOLD = 4`.
- `install_initial_thread` on `Scheduler` lives next to `spawn`; both write `PCR+0x2C = hw_id` via the `PcrWriter` trait (impl `GuestMemoryPcr` in [state.rs](xenia-rs/crates/xenia-kernel/src/state.rs)).
- `KernelObject::Thread.waiters: Vec<ThreadRef>` (not `Vec<u8>`) — necessary for correctness under per-slot runqueues.
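A small sketch of the exit-then-prune lifecycle. The notes do not say how pruning keeps peer `ThreadRef`s valid, so this version makes a deliberately conservative assumption: it trims only exited threads at the tail of the runqueue, which never changes the index of a surviving thread. The function signatures are hypothetical; only the names `prune_exited_if_needed`, `spawn`, and `PRUNE_DEPTH_THRESHOLD = 4` come from the notes.

```rust
pub const PRUNE_DEPTH_THRESHOLD: usize = 4;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadState {
    Ready,
    Exited(u32),
}

#[derive(Debug)]
pub struct GuestThread {
    pub tid: u32,
    pub state: ThreadState,
}

/// Exit only flips the state; the entry stays put so peer indices remain valid.
pub fn exit_thread(runqueue: &mut [GuestThread], idx: usize, code: u32) {
    runqueue[idx].state = ThreadState::Exited(code);
}

/// Assumed strategy: once the slot is deep enough, pop exited threads off the tail.
/// Returns how many entries were dropped.
pub fn prune_exited_if_needed(runqueue: &mut Vec<GuestThread>) -> usize {
    if runqueue.len() < PRUNE_DEPTH_THRESHOLD {
        return 0;
    }
    let before = runqueue.len();
    while let Some(ThreadState::Exited(_)) = runqueue.last().map(|t| t.state) {
        runqueue.pop();
    }
    before - runqueue.len()
}

/// Pruning is piggybacked on spawn, then the new thread is appended.
pub fn spawn(runqueue: &mut Vec<GuestThread>, tid: u32) -> usize {
    prune_exited_if_needed(runqueue);
    runqueue.push(GuestThread { tid, state: ThreadState::Ready });
    runqueue.len() - 1
}
```

An exited thread in the middle of the runqueue stays in place under this strategy; a pruner that compacts the vector would need the same kind of ref fixup as migration does.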
## Known caveat (2026-04-23)
Axis 4's real migration distributes Sylpheed's workers across slots differently than the old 32-slot one-per-slot model. The resulting wait/signal chain trips a single `scheduler.deadlock_recoveries` event during boot; default force-wake recovery resolves it and the game progresses to **VdSwap=2** (up from pre-Axis-4's 1). Under `--halt-on-deadlock` this trips `scheduler.deadlock_halts = 1` at ~7.5M cycles. The issue is a latent HLE sync-primitive gap exposed by correct migration, not an Axis 4 defect. Root cause: one of tid=1/3/4/7's blocking events isn't being signaled by its expected source after thread layout changes. Track down by instrumenting the specific handle values (0x10FC, 0x1014, 0x1104, 0x10DC/0x10F0) in a future session.
## Files
- [xenia-cpu/src/scheduler.rs](xenia-rs/crates/xenia-cpu/src/scheduler.rs) — workhorse (~35 tests covering all 5 axes)
- [xenia-kernel/src/state.rs](xenia-rs/crates/xenia-kernel/src/state.rs) — `KernelState::set_affinity` orchestrator, `call_export` ctx swap via `ThreadRef`
- [xenia-kernel/src/exports.rs](xenia-rs/crates/xenia-kernel/src/exports.rs) — `ke_set_affinity_thread` (0x97), `ke_set_base_priority_thread` (0x99), `ke_query_base_priority_thread` (0x81), `ke_set_ideal_processor` (0x98), `ke_query_ideal_processor` (0x82), `nt_set_information_thread` (0xFB)
- [xenia-kernel/src/objects.rs](xenia-rs/crates/xenia-kernel/src/objects.rs) — waiter lists as `Vec<ThreadRef>`
- [xenia-kernel/src/interrupts.rs](xenia-rs/crates/xenia-kernel/src/interrupts.rs) — `injected_ref: Option<ThreadRef>` (not `injected_hw: u8`)
## Metrics added
- `scheduler.spawn.ok` — successful spawns
- `scheduler.spawn.rejected` — spawn failures (should stay 0)
- `scheduler.deadlock_recoveries` — force-wake events (non-zero post-Axis-4; see caveat)
- `scheduler.deadlock_halts` — halts under `--halt-on-deadlock`