Ours never initialized the PRCB `current_cpu` byte at PCR+0x10C (prcb_data@0x100 + current_cpu@0xC). Canary sets it from `GetFakeCpuNumber(affinity)` (xthread.cc:847 `pcr->prcb_data.current_cpu = cpu_index`), which equals the HW thread id ours already writes at PCR+0x2C. Left unwritten it read 0 for every thread.

Guest spin-barrier `sub_824D1328` (used by the audio/update pump threads at entries 0x824D2878 / 0x824D2940, ours tid 9 / tid 10) indexes a per-HW-thread occupancy byte array via `lbz r11, 268(r13)` then `stbx ..., [array+index]`. With index 0 for all threads, every thread marked slot 0; the multi-byte rendezvous signature it then spins on (`ld [obj+0x164]` compared against the packed per-slot expectation) could never assemble. Both pump threads busied at pc 0x824d140c/0x824d1410 forever (Ready, 5M+ barrier iterations) and never ran their `KeSetEvent` loops, so the events they signal (the 21k-per-thread heartbeat in canary) never fired, starving the downstream worker handshake.

Fix: write `hw_id` to PCR+0x10C alongside PCR+0x2C in both the static thread image init (thread.rs) and the dynamic PcrWriter (state.rs, used by scheduler spawn + affinity migration) so the two stay in sync.

Runtime-verified BOTH engines. Post-fix the pump threads escape the barrier (barrier iterations 5M+ -> 3) and advance into their loop bodies, now correctly Blocked(WaitAny) at pc 0x824d28d0 / 0x824d29c0 (was spinning at 0x824d140c). imports at n50M 339,766 -> 451,508; deterministic (two cold runs byte-identical). draws still 0 (a later, separate render gate). golden re-baselined. cargo test --workspace: 672 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
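The failure mode described above can be modeled in a few lines. This is only a sketch under stated assumptions, not the guest's actual barrier code: the slot count (8, one 64-bit load), the marker value (1), and the HW thread ids used in the usage note are all hypothetical.

```rust
/// Assumed number of per-HW-thread slots covered by one 64-bit `ld`.
const SLOTS: usize = 8;

/// Each participant stores a marker byte (assumed to be 1) at `slots[index]`,
/// where `index` is the byte it read from PCR+0x10C. Returns the packed
/// big-endian 64-bit view of the slot array. Indices are wrapped with
/// `% SLOTS` only so this model cannot go out of bounds.
fn packed_signature(indices: &[u8]) -> u64 {
    let mut slots = [0u8; SLOTS];
    for &index in indices {
        slots[index as usize % SLOTS] = 1;
    }
    u64::from_be_bytes(slots)
}

/// In this model the barrier opens only when the slots marked via the
/// `current_cpu` bytes equal the expectation built from the real HW thread ids.
fn barrier_opens(current_cpu_bytes: &[u8], hw_thread_ids: &[u8]) -> bool {
    packed_signature(current_cpu_bytes) == packed_signature(hw_thread_ids)
}
```

With two pump threads on hypothetical HW threads 2 and 4, `barrier_opens(&[0, 0], &[2, 4])` is false because both threads mark slot 0, while `barrier_opens(&[2, 4], &[2, 4])` is true once PCR+0x10C carries the HW thread id.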
74 lines
2.9 KiB
Rust
//! Guest-thread image allocation, shared by the initial thread setup in
//! `xenia-app/src/main.rs` and `ExCreateThread`. Stack, PCR, and TLS blocks
//! all come from the existing kernel bump allocators so layout is consistent.

use xenia_memory::{GuestMemory, MemoryAccess};

use crate::state::KernelState;

/// Addresses the caller passes to `Scheduler::spawn` / the initial-thread
/// setup. Matches xenia-canary's per-thread allocations: a stack, a PCR, and
/// a TLS block.
#[derive(Debug, Clone, Copy)]
pub struct ThreadImage {
    pub stack_base: u32,
    pub stack_size: u32,
    pub pcr_base: u32,
    pub tls_base: u32,
}

/// Allocate stack + PCR + TLS for one guest thread and initialize the PCR
/// fields that games read in their thread prolog.
///
/// - Stack comes from `KernelState::stack_alloc` (bump allocator at
///   0x7100_0000 upward). The returned base is the *bottom*; callers
///   compute SP as `base + size`.
/// - PCR and TLS are fixed 4 KiB pages allocated via `heap_alloc` so they
///   land in the user heap region together with other kernel metadata.
/// - `hw_thread_id` is written at PCR+0x2C so `KeGetCurrentProcessorNumber`-
///   style reads from r13 resolve correctly even though we never register
///   that export.
pub fn allocate_thread_image(
    kernel: &mut KernelState,
    mem: &GuestMemory,
    stack_size: u32,
    hw_thread_id: u8,
) -> Option<ThreadImage> {
    // Round the stack size up to a 4 KiB page. When callers request 0
    // (common for ExCreateThread when the caller lets the kernel pick),
    // use a 0x10_0000 default. Note that 0x10_0000 is 1 MiB, not the
    // 16 MiB (0x100_0000) cited as xenia-canary's default.
    let stack_size = if stack_size == 0 {
        0x10_0000
    } else {
        (stack_size + 0xFFF) & !0xFFF
    };
    // stack_alloc returns top-of-stack; we need the base.
    let stack_top = kernel.stack_alloc(stack_size, mem)?;
    let stack_base = stack_top - stack_size;

    let pcr_base = kernel.heap_alloc(0x1000, mem)?;
    let tls_base = kernel.heap_alloc(0x1000, mem)?;

    // PCR layout (canary xboxkrnl/xboxkrnl_module.cc, simplified):
    //   +0x000  tls_ptr               → TLS block base
    //   +0x02C  current_processor_id  → HW thread id (0..5)
    //   +0x100  current_thread        → placeholder non-zero tag
    //   +0x150  dpc_active            → 0 (no DPC queued)
    mem.write_u32(pcr_base, tls_base);
    mem.write_u32(pcr_base + 0x2C, hw_thread_id as u32);
    mem.write_u32(pcr_base + 0x100, 0x1000);
    // +0x10C prcb_data.current_cpu — canary `pcr->prcb_data.current_cpu`
    // (PRCB@0x100 + current_cpu@0xC). Guest spin-barriers index a
    // per-HW-thread slot array by `lbz r11, 268(r13)` = this byte; it
    // must equal the HW thread id (== PCR+0x2C). See state.rs PcrWriter.
    mem.write_u8(pcr_base + 0x10C, hw_thread_id);
    mem.write_u32(pcr_base + 0x150, 0);

    Some(ThreadImage {
        stack_base,
        stack_size,
        pcr_base,
        tls_base,
    })
}
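The invariant this function establishes (PCR+0x2C and PCR+0x10C carry the same HW thread id) can be checked in isolation. The sketch below is illustrative only: `MockPcr`, `init_pcr`, `cpu_fields_agree`, and `normalize_stack_size` are hypothetical names, the mock is not the real `GuestMemory`, and big-endian stores are an assumption based on the guest being PowerPC.

```rust
/// Hypothetical stand-in for one 4 KiB guest PCR page (not `GuestMemory`).
struct MockPcr {
    bytes: [u8; 0x1000],
}

impl MockPcr {
    fn new() -> Self {
        Self { bytes: [0; 0x1000] }
    }

    /// Assumed big-endian store, matching the guest's PowerPC byte order.
    fn write_u32(&mut self, offset: usize, value: u32) {
        self.bytes[offset..offset + 4].copy_from_slice(&value.to_be_bytes());
    }

    fn write_u8(&mut self, offset: usize, value: u8) {
        self.bytes[offset] = value;
    }

    fn read_u32(&self, offset: usize) -> u32 {
        let mut raw = [0u8; 4];
        raw.copy_from_slice(&self.bytes[offset..offset + 4]);
        u32::from_be_bytes(raw)
    }

    fn read_u8(&self, offset: usize) -> u8 {
        self.bytes[offset]
    }
}

/// Same stack-size rule as `allocate_thread_image`: 0 selects the 0x10_0000
/// default, anything else is rounded up to a 4 KiB page.
fn normalize_stack_size(requested: u32) -> u32 {
    if requested == 0 {
        0x10_0000
    } else {
        (requested + 0xFFF) & !0xFFF
    }
}

/// Replays the PCR writes from `allocate_thread_image` against the mock page.
fn init_pcr(pcr: &mut MockPcr, tls_base: u32, hw_thread_id: u8) {
    pcr.write_u32(0x000, tls_base);
    pcr.write_u32(0x02C, hw_thread_id as u32);
    pcr.write_u32(0x100, 0x1000);
    pcr.write_u8(0x10C, hw_thread_id);
    pcr.write_u32(0x150, 0);
}

/// True when the 32-bit id at +0x2C matches the `current_cpu` byte at +0x10C.
fn cpu_fields_agree(pcr: &MockPcr) -> bool {
    pcr.read_u32(0x02C) == pcr.read_u8(0x10C) as u32
}
```

A check of this shape could also be applied to the PcrWriter path in state.rs, since the commit's goal is that both writers keep the two fields in sync.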