[iterate-2G] db16cyc spin-hint cooperative yield: unblock title-screen 0x10a0 gate
The silph title state machine (tid13) blocked on event 0x10a0, which was never signaled.

Root cause: the event's producer chain runs on the silph worker (entry 0x821C4AD0, our tid14), which was starved. tid14 shares a HW slot with a guest spinlock/barrier participant (sub_824D1328, entry 0x824D2940) that busy-spins on the db16cyc hint `or r31,r31,r31` (encoding 0x7FFFFB78) at 0x824D140C. Under our round-robin lockstep the spinner consumed its whole block every round and starved the co-located tid14 (only 9 progress hits over 200M instr), so the producer never reached the event-create/duplicate/signal dance the canary oracle performs (handle F80000E8 set by the submitter F8000044 via a duplicated handle).

Fix (canary-faithful): recognize the db16cyc spin hint exactly as canary's InstrEmit_orx does (code 0x7FFFFB78 -> DelayExecution) and surface it as a new StepResult::Yield. The scheduler's yield_current() promotes every Ready peer on the slot past STARVE_LIMIT so begin_slot_visit picks one next round; the promoted peers then reset and the spinner reclaims the slot. This gives fair alternation with no priority inversion, and it is a pure function of slot state (deterministic).

Result (lockstep, cache-persist, -n 200M): tid14 progresses past its old stall into a real wait; tid13 advances off 0x10a0 to a new event; hub/submitter re-enter their wait loops. imports 280k->592k, packets 124M->164M, swaps 1->2. draws still 0 (the splash's first draw is a further-upstream gate).

Determinism preserved (two cold n50m runs byte-identical). n50m golden re-baselined (imports 90296->339766, swaps 1->2; draws unchanged 0). n2m golden unchanged (db16cyc not reached in first 2M). Tests 670/670.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
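For readers checking the encoding quoted above, here is a minimal standalone sketch that splits `0x7FFFFB78` into its X-form fields and compares it with a plain `or r3,r3,r3`. The `XForm` struct and the helpers `decode_x_form` and `is_db16cyc` are hypothetical names used only for illustration; they are not part of this commit or of canary.

```rust
/// Encoding of `or r31,r31,r31`, as quoted in the commit message.
const DB16CYC: u32 = 0x7FFF_FB78;

/// Fields of an X-form integer instruction (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
struct XForm {
    primary: u32, // bits 0..5   (primary opcode)
    rs: u32,      // bits 6..10  (source register)
    ra: u32,      // bits 11..15 (destination register for `or`)
    rb: u32,      // bits 16..20 (second source register)
    xo: u32,      // bits 21..30 (extended opcode)
    rc: bool,     // bit 31      (record bit)
}

/// Split a raw 32-bit instruction word into X-form fields.
fn decode_x_form(raw: u32) -> XForm {
    XForm {
        primary: raw >> 26,
        rs: (raw >> 21) & 0x1F,
        ra: (raw >> 16) & 0x1F,
        rb: (raw >> 11) & 0x1F,
        xo: (raw >> 1) & 0x3FF,
        rc: raw & 1 != 0,
    }
}

/// Exact-match check: only the one raw word counts as the spin hint,
/// not every `or rN,rN,rN`.
fn is_db16cyc(raw: u32) -> bool {
    raw == DB16CYC
}

fn main() {
    // Same construction the commit's unit test uses for `or r3,r3,r3`.
    let or_r3 = (31u32 << 26) | (3 << 21) | (3 << 16) | (3 << 11) | (444 << 1);

    println!("{:#010X} -> {:?}", DB16CYC, decode_x_form(DB16CYC));
    println!("{:#010X} -> {:?}", or_r3, decode_x_form(or_r3));
    println!("db16cyc? {} / {}", is_db16cyc(DB16CYC), is_db16cyc(or_r3));
}
```

Both words decode to primary opcode 31 and extended opcode 444 (`or`); they differ only in the register fields, which is why the check compares the whole raw word rather than the opcode alone.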
@@ -28,6 +28,15 @@ pub enum StepResult {
    Trap,
    /// Execution halted (by debugger or error).
    Halted,
    /// Executed the `db16cyc` spin-wait hint (`or r31,r31,r31`, encoding
    /// `0x7FFFFB78`). The PC has already advanced past the hint; this is a
    /// cooperative-yield signal so the scheduler hands the slot to a Ready
    /// peer. On real hardware all six HW threads run concurrently and the
    /// spin resolves naturally; under our round-robin lockstep a spinning
    /// barrier/spinlock participant would otherwise monopolize its slot and
    /// starve the co-located thread it is waiting on. Matches canary's
    /// `InstrEmit_orx` db16cyc → `DelayExecution()` handling.
    Yield,
}

/// Execute a single PPC instruction.
@@ -95,6 +104,9 @@ pub fn step_block(
        ctx.cycle_count += 1;
        ctx.timebase += 1;
        if !matches!(result, StepResult::Continue) {
            // `Yield` (db16cyc spin hint) terminates the block here so the
            // scheduler regains control and can rotate the slot; the PC has
            // already advanced past the hint inside `execute`.
            return result;
        }
        // PC discontinuity within a block. By construction only the
@@ -548,6 +560,18 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] | ctx.gpr[instr.rb()];
            if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); }
            ctx.pc += 4;
            // `or r31,r31,r31` with encoding 0x7FFFFB78 is the Xenon `db16cyc`
            // spin-wait hint (a no-op write of r31 onto itself). Canary's
            // `InstrEmit_orx` special-cases exactly this code → `DelayExecution()`.
            // Under our round-robin lockstep, a guest spinlock/barrier loop that
            // executes db16cyc would otherwise consume its whole block every round
            // and starve the co-located thread it is waiting on (the lock holder /
            // barrier peer). Surface it as a cooperative yield so the scheduler can
            // hand the slot to a Ready peer. The semantic result of the op is
            // already applied (r31 |= r31 is a no-op), so yielding is value-neutral.
            if instr.raw == 0x7FFF_FB78 {
                return StepResult::Yield;
            }
        }
        PpcOpcode::orcx => {
            // PPCBUG-028: same shape as andcx — operate in u32.
@@ -5042,6 +5066,40 @@ mod tests {
        assert_eq!(ctx.pc, 4);
    }

    #[test]
    fn test_db16cyc_yields() {
        // `or r31,r31,r31` encoding 0x7FFFFB78 is the Xenon db16cyc spin hint.
        // It must (a) be value-neutral (r31 unchanged), (b) advance PC, and
        // (c) report StepResult::Yield so the scheduler can hand off the slot.
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        write_instr(&mut mem, 0, 0x7FFF_FB78);
        ctx.pc = 0;
        ctx.gpr[31] = 0x1234_5678_9ABC_DEF0;
        let r = step(&mut ctx, &mut mem);
        assert_eq!(ctx.gpr[31], 0x1234_5678_9ABC_DEF0, "db16cyc is value-neutral");
        assert_eq!(ctx.pc, 4, "PC advances past the hint");
        assert_eq!(r, StepResult::Yield, "db16cyc surfaces as a cooperative yield");
    }

    #[test]
    fn test_plain_or_self_is_not_yield() {
        // A regular `or rN,rN,rN` that is NOT the db16cyc encoding (e.g. r3)
        // is an ordinary no-op move and must keep executing (Continue), so we
        // only yield on the exact spin-hint code canary special-cases.
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        // or r3, r3, r3 (RT=RA=RB=3, Rc=0): 31<<26 | 3<<21 | 3<<16 | 3<<11 | 444<<1
        let raw = (31u32 << 26) | (3 << 21) | (3 << 16) | (3 << 11) | (444 << 1);
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        ctx.gpr[3] = 0xCAFE;
        let r = step(&mut ctx, &mut mem);
        assert_eq!(ctx.gpr[3], 0xCAFE);
        assert_eq!(ctx.pc, 4);
        assert_eq!(r, StepResult::Continue, "non-db16cyc or-self stays Continue");
    }

    #[test]
    fn test_fadd() {
        let mut ctx = PpcContext::new();

@@ -902,6 +902,41 @@ impl Scheduler {
        false
    }

    /// Cooperative yield: the currently-running thread executed a `db16cyc`
    /// spin-wait hint (see `StepResult::Yield`). It is busy-spinning on a
    /// guest spinlock/barrier whose release depends on a *co-located* peer
    /// that cannot make progress while this thread keeps winning the slot.
    ///
    /// Promote every Ready peer on this slot past `STARVE_LIMIT` so the next
    /// `begin_slot_visit` picks one of them (their `effective_priority` →
    /// `i32::MAX`), and reset the yielder's own counter. Each promoted peer
    /// runs once and resets to 0 in `begin_slot_visit`; once all peers have
    /// had their turn the spinner is picked again, spins, and re-yields,
    /// producing a fair round-robin between the spinner and the threads it is
    /// waiting on. This mirrors real hardware, where all six HW threads run
    /// concurrently and the spin resolves as soon as the peer releases.
    ///
    /// Pure function of the slot's current state (no RNG, no wall-clock), so
    /// it preserves lockstep determinism. No-op if there is no Ready peer
    /// (the spinner is alone on its slot, so there is nothing to hand off to).
    ///
    /// Returns `true` if at least one peer was promoted.
    pub fn yield_current(&mut self) -> bool {
        let Some(r) = self.current else { return false; };
        let slot = &mut self.slots[r.hw_id as usize];
        let me = r.idx as usize;
        let mut promoted = false;
        for (i, t) in slot.runqueue.iter_mut().enumerate() {
            if i == me {
                t.steps_starved = 0;
            } else if matches!(t.state, HwState::Ready | HwState::ServicingIrq(_)) {
                t.steps_starved = STARVE_LIMIT;
                promoted = true;
            }
        }
        promoted
    }

    // ----- Park / wake / exit -----

    pub fn park_current(&mut self, reason: BlockReason) {
@@ -2062,6 +2097,71 @@ mod tests {
        );
    }

    #[test]
    fn test_db16cyc_yield_hands_slot_to_peer() {
        // Reproduces the Sylpheed title-screen gate: a guest spinlock/barrier
        // participant (tid=1) executes the `db16cyc` spin hint each round and
        // would otherwise win `pick_runnable` forever (equal priority, lower
        // index), starving the co-located peer (tid=2) it is waiting on.
        // `yield_current` must promote the Ready peer so the very next
        // `begin_slot_visit` picks it, without waiting STARVE_LIMIT rounds.
        let mut s = mk_empty_scheduler();
        for tid in [1u32, 2] {
            let mut p = SpawnParams::default();
            p.guest_tid = tid;
            p.thread_handle = 0x1000 + tid * 4;
            p.affinity_mask = 0b0001;
            p.pcr_base = 0x4000_0000 + tid * 0x1000;
            p.priority = 0; // equal priority, so index would otherwise decide
            s.spawn(p, &mut NullPcr).unwrap();
        }

        // Round 1: the spinner (lower index) wins.
        s.begin_slot_visit(0);
        let spinner = s.thread(s.current.unwrap()).tid;
        assert_eq!(spinner, 1, "lower-index equal-priority thread wins first pick");
        // It spins (db16cyc) → cooperative yield.
        assert!(s.yield_current(), "yield promotes the Ready peer");
        s.end_slot_visit();

        // Round 2: the promoted peer must now be picked, not the spinner.
        s.begin_slot_visit(0);
        let after_yield = s.thread(s.current.unwrap()).tid;
        assert_eq!(
            after_yield, 2,
            "after db16cyc yield the co-located peer runs (no STARVE_LIMIT wait)"
        );
        s.end_slot_visit();

        // Round 3: peer's boost was consumed (reset to 0 when picked), so the
        // spinner reclaims the slot: fair alternation, no priority inversion.
        s.begin_slot_visit(0);
        assert_eq!(
            s.thread(s.current.unwrap()).tid,
            1,
            "spinner reclaims the slot after the peer has had its turn"
        );
    }

    #[test]
    fn test_yield_current_noop_when_alone() {
        // A spinner with no Ready peer on its slot has nothing to hand off to;
        // yield_current must be a no-op (returns false) and not panic.
        let mut s = mk_empty_scheduler();
        let mut p = SpawnParams::default();
        p.guest_tid = 1;
        p.thread_handle = 0x1004;
        p.affinity_mask = 0b0001;
        p.pcr_base = 0x4000_0000;
        s.spawn(p, &mut NullPcr).unwrap();
        s.begin_slot_visit(0);
        assert!(!s.yield_current(), "no peer to promote → no-op");
        // Still the same thread next round.
        s.end_slot_visit();
        s.begin_slot_visit(0);
        assert_eq!(s.thread(s.current.unwrap()).tid, 1);
    }

    #[test]
    fn test_cooperative_yield_does_not_need_quantum() {
        let mut s = mk_empty_scheduler();

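Separately from the diff, the alternation that `yield_current` is described as producing can be reproduced with a toy single-slot model. This is only a sketch under stated assumptions: the pick rule (highest effective priority, ties to the lowest index, picked thread's counter reset, other Ready threads' counters incremented) is inferred from the doc comments in this commit, `STARVE_LIMIT = 8` is a made-up value, and `Thread`, `Slot`, `begin_visit` and `run` are hypothetical stand-ins rather than the real scheduler types.

```rust
// Hypothetical value; the real constant is not shown in this commit.
const STARVE_LIMIT: u32 = 8;

struct Thread {
    tid: u32,
    priority: i32,
    ready: bool,
    steps_starved: u32,
}

struct Slot {
    runqueue: Vec<Thread>,
    current: Option<usize>,
}

impl Slot {
    /// A thread starved for STARVE_LIMIT rounds outranks everything (assumed rule).
    fn effective_priority(t: &Thread) -> i32 {
        if t.steps_starved >= STARVE_LIMIT { i32::MAX } else { t.priority }
    }

    /// Pick the Ready thread with the highest effective priority; ties go to
    /// the lowest index. The winner's counter resets, the losers' increment.
    fn begin_visit(&mut self) -> Option<u32> {
        let mut best: Option<usize> = None;
        for (i, t) in self.runqueue.iter().enumerate() {
            if !t.ready {
                continue;
            }
            let better = match best {
                None => true,
                Some(b) => {
                    Self::effective_priority(t) > Self::effective_priority(&self.runqueue[b])
                }
            };
            if better {
                best = Some(i);
            }
        }
        for (i, t) in self.runqueue.iter_mut().enumerate() {
            if Some(i) == best {
                t.steps_starved = 0;
            } else if t.ready {
                t.steps_starved = t.steps_starved.saturating_add(1);
            }
        }
        self.current = best;
        best.map(|i| self.runqueue[i].tid)
    }

    /// Same shape as the commit's `yield_current`: boost Ready peers, reset self.
    fn yield_current(&mut self) -> bool {
        let Some(me) = self.current else { return false };
        let mut promoted = false;
        for (i, t) in self.runqueue.iter_mut().enumerate() {
            if i == me {
                t.steps_starved = 0;
            } else if t.ready {
                t.steps_starved = STARVE_LIMIT;
                promoted = true;
            }
        }
        promoted
    }
}

/// Run `rounds` slot visits with a spinner (tid 1) and a peer (tid 2) at equal
/// priority; the spinner yields after each of its turns if `spinner_yields`.
fn run(rounds: usize, spinner_yields: bool) -> Vec<u32> {
    let mk = |tid| Thread { tid, priority: 0, ready: true, steps_starved: 0 };
    let mut slot = Slot { runqueue: vec![mk(1), mk(2)], current: None };
    let mut picked = Vec::new();
    for _ in 0..rounds {
        let tid = slot.begin_visit().expect("a Ready thread exists");
        if tid == 1 && spinner_yields {
            slot.yield_current();
        }
        picked.push(tid);
    }
    picked
}

fn main() {
    println!("with yield:    {:?}", run(4, true));
    println!("without yield: {:?}", run(4, false));
}
```

In this model the spinner and its peer alternate when the spinner yields, while without the yield the lower-index spinner keeps the slot until the peer's starvation counter reaches the limit. The model uses no randomness or wall-clock input, which is the property the commit relies on for lockstep determinism.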