xenia-rs/crates/xenia-kernel/src/thread.rs
MechaCat02 93f60a3ba0 [iterate-2M] PCR+0x10C (PRCB.current_cpu): init per-HW-thread to unwedge spin-barrier
Ours never initialized the PRCB `current_cpu` byte at PCR+0x10C
(prcb_data@0x100 + current_cpu@0xC). Canary sets it from
`GetFakeCpuNumber(affinity)` (xthread.cc:847 `pcr->prcb_data.current_cpu =
cpu_index`), which equals the HW thread id ours already writes at PCR+0x2C.
Left unwritten it read 0 for every thread.

Guest spin-barrier `sub_824D1328` (used by the audio/update pump threads at
entries 0x824D2878 / 0x824D2940, ours tid 9 / tid 10) indexes a per-HW-thread
occupancy byte array via `lbz r11, 268(r13)` then `stbx ..., [array+index]`.
With index 0 for all threads, every thread marked slot 0; the multi-byte
rendezvous signature it then spins on (`ld [obj+0x164]` compared against the
packed per-slot expectation) could never assemble. Both pump threads busied at
pc 0x824d140c/0x824d1410 forever (Ready, 5M+ barrier iterations) and never ran
their `KeSetEvent` loops — so the events they signal (the 21k-per-thread
heartbeat in canary) never fired, starving the downstream worker handshake.
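
The failure mode above can be modelled in a few lines of host-side Rust. This is only a sketch, not the guest routine: the 8-slot array width, the marker value 1, the exact-equality check, and the HW thread ids 2 and 3 are all assumptions chosen for illustration, and `arrive`, `packed` and `barrier_completes` are hypothetical names.

```rust
/// Assumed number of per-HW-thread occupancy slots (one 64-bit `ld` wide).
const SLOTS: usize = 8;

/// Analogue of the guest `stbx`: mark the slot selected by the byte that
/// the thread read from PCR+0x10C.
fn arrive(slots: &mut [u8; SLOTS], current_cpu: u8) {
    slots[current_cpu as usize % SLOTS] = 1;
}

/// Analogue of the guest `ld`: view the slot bytes as one big-endian word.
fn packed(slots: &[u8; SLOTS]) -> u64 {
    u64::from_be_bytes(*slots)
}

/// The barrier opens only when the slots actually marked match the slots
/// the rendezvous expects to see marked.
fn barrier_completes(reported: &[u8], expected: &[u8]) -> bool {
    let mut seen = [0u8; SLOTS];
    for &cpu in reported {
        arrive(&mut seen, cpu);
    }
    let mut want = [0u8; SLOTS];
    for &cpu in expected {
        arrive(&mut want, cpu);
    }
    packed(&seen) == packed(&want)
}
```

Under these assumptions, `barrier_completes(&[0, 0], &[2, 3])` is false (both threads mark slot 0, so the expected word never forms), while `barrier_completes(&[2, 3], &[2, 3])` is true.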

Fix: write `hw_id` to PCR+0x10C alongside PCR+0x2C in both the static thread
image init (thread.rs) and the dynamic PcrWriter (state.rs, used by scheduler
spawn + affinity migration) so the two stay in sync.
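
One way to keep the two fields from drifting apart is to route every write through a single helper. The sketch below is illustrative and is not the code in thread.rs or state.rs: it treats the PCR page as a plain byte slice, assumes the 32-bit field at +0x2C is stored big-endian as on the guest, and `write_hw_thread_id` and both constants are hypothetical names.

```rust
/// PCR offsets taken from the commit text above.
const PCR_CURRENT_PROCESSOR_ID: usize = 0x2C; // 32-bit field
const PCR_PRCB_CURRENT_CPU: usize = 0x10C; // single byte (PRCB@0x100 + 0xC)

/// Write the HW thread id to both PCR locations in one place so a caller
/// cannot update one and forget the other.
/// Assumption: `pcr` covers at least the first 0x10D bytes of the PCR page.
fn write_hw_thread_id(pcr: &mut [u8], hw_id: u8) {
    let be = (hw_id as u32).to_be_bytes();
    pcr[PCR_CURRENT_PROCESSOR_ID..PCR_CURRENT_PROCESSOR_ID + 4].copy_from_slice(&be);
    pcr[PCR_PRCB_CURRENT_CPU] = hw_id;
}
```

Calling `write_hw_thread_id(&mut page, 5)` on a zeroed 4 KiB buffer leaves 5 in the byte at 0x10C and in the low byte of the word at 0x2C.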

Runtime-verified BOTH engines. Post-fix the pump threads escape the barrier
(barrier iterations 5M+ -> 3) and advance into their loop bodies, now correctly
Blocked(WaitAny) at pc 0x824d28d0 / 0x824d29c0 (was spinning at 0x824d140c).
imports at n50M 339,766 -> 451,508; deterministic (two cold runs byte-identical).
draws still 0 (a later, separate render gate). golden re-baselined.
cargo test --workspace: 672 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 18:08:46 +02:00


//! Guest-thread image allocation — shared by the initial thread setup in
//! `xenia-app/src/main.rs` and `ExCreateThread`. Stack, PCR, and TLS blocks
//! all come from the existing kernel bump allocators so layout is consistent.

use xenia_memory::{GuestMemory, MemoryAccess};

use crate::state::KernelState;

/// Addresses the caller passes to `Scheduler::spawn` / the initial-thread
/// setup. Matches xenia-canary's per-thread allocations: a stack, a PCR, and
/// a TLS block.
#[derive(Debug, Clone, Copy)]
pub struct ThreadImage {
    pub stack_base: u32,
    pub stack_size: u32,
    pub pcr_base: u32,
    pub tls_base: u32,
}

/// Allocate stack + PCR + TLS for one guest thread and initialize the PCR
/// fields that games read in their thread prolog.
///
/// - Stack comes from `KernelState::stack_alloc` (bump allocator at
///   0x7100_0000 upward). The returned base is the *bottom*; callers
///   compute SP as `base + size`.
/// - PCR and TLS are fixed 4 KiB pages allocated via `heap_alloc` so they
///   land in the user heap region together with other kernel metadata.
/// - `hw_thread_id` is written at PCR+0x2C so `KeGetCurrentProcessorNumber`-
///   style reads from r13 resolve correctly even though we never register
///   that export. It is mirrored to the PRCB `current_cpu` byte at
///   PCR+0x10C, which guest spin-barriers read via `lbz r11, 268(r13)`.
pub fn allocate_thread_image(
    kernel: &mut KernelState,
    mem: &GuestMemory,
    stack_size: u32,
    hw_thread_id: u8,
) -> Option<ThreadImage> {
    // Round the stack size up to a 4 KiB page. When callers request 0
    // (common for ExCreateThread when the caller lets the kernel pick),
    // fall back to a 1 MiB default (0x10_0000).
    let stack_size = if stack_size == 0 {
        0x10_0000
    } else {
        // checked_add: a request within 0xFFF of u32::MAX returns None
        // instead of overflowing during the round-up.
        stack_size.checked_add(0xFFF)? & !0xFFF
    };

    // stack_alloc returns top-of-stack; we need the base.
    let stack_top = kernel.stack_alloc(stack_size, mem)?;
    let stack_base = stack_top - stack_size;

    let pcr_base = kernel.heap_alloc(0x1000, mem)?;
    let tls_base = kernel.heap_alloc(0x1000, mem)?;

    // PCR layout (canary xboxkrnl/xboxkrnl_module.cc, simplified):
    //   +0x000 tls_ptr                → TLS block base
    //   +0x02C current_processor_id   → HW thread id (0..5)
    //   +0x100 current_thread         → placeholder non-zero tag
    //   +0x10C prcb_data.current_cpu  → HW thread id (mirrors +0x02C)
    //   +0x150 dpc_active             → 0 (no DPC queued)
    mem.write_u32(pcr_base, tls_base);
    mem.write_u32(pcr_base + 0x2C, hw_thread_id as u32);
    mem.write_u32(pcr_base + 0x100, 0x1000);
    // +0x10C prcb_data.current_cpu: canary `pcr->prcb_data.current_cpu`
    // (PRCB@0x100 + current_cpu@0xC). Guest spin-barriers index a
    // per-HW-thread slot array by `lbz r11, 268(r13)` = this byte; it
    // must equal the HW thread id (== PCR+0x2C). See state.rs PcrWriter.
    mem.write_u8(pcr_base + 0x10C, hw_thread_id);
    mem.write_u32(pcr_base + 0x150, 0);

    Some(ThreadImage {
        stack_base,
        stack_size,
        pcr_base,
        tls_base,
    })
}