[iterate-2G] db16cyc spin-hint cooperative yield: unblock title-screen 0x10a0 gate
The silph title state machine (tid13) blocked on event 0x10a0, which was never signaled.

Root cause: the event's producer chain runs on the silph worker (entry 0x821C4AD0, our tid14), and tid14 was starved. It shares a HW slot with a guest spinlock/barrier participant (sub_824D1328, entry 0x824D2940) that busy-spins on the db16cyc hint `or r31,r31,r31` (encoding 0x7FFFFB78) at 0x824D140C. Under our round-robin lockstep the spinner consumed its whole block every round and starved the co-located tid14 (only 9 progress hits over 200M instructions), so the producer never reached the event create/duplicate/signal sequence that the canary oracle performs (handle F80000E8 set by the submitter F8000044 via a duplicated handle).

Fix (canary-faithful): recognize the db16cyc spin hint exactly as canary's InstrEmit_orx does (code 0x7FFFFB78 -> DelayExecution) and surface it as a new StepResult::Yield. The scheduler's yield_current() promotes every Ready peer on the slot past STARVE_LIMIT so that begin_slot_visit picks one of them next round; then they reset and the spinner reclaims the slot. The result is fair alternation with no priority inversion, and because the choice is a pure function of slot state it stays deterministic.

Result (lockstep, cache-persist, -n 200M): tid14 progresses past its old stall into a real wait; tid13 advances off 0x10a0 to a new event; hub/submitter re-enter their wait loops. imports 280k->592k, packets 124M->164M, swaps 1->2. Draws are still 0 (the splash's first draw is gated further upstream).

Determinism preserved (two cold n50m runs byte-identical). n50m golden re-baselined (imports 90296->339766, swaps 1->2; draws unchanged at 0). n2m golden unchanged (db16cyc is not reached in the first 2M instructions). Tests 670/670.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
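The slot hand-off described in this commit message can be illustrated with a small toy model. This is a sketch only: the names `yield_current`, `begin_slot_visit` and `STARVE_LIMIT` come from the commit, but the `Slot`/`GuestThread` types, the default pick (first Ready thread in slot order), `STARVE_LIMIT = 8` and the spinner's tid 7 are assumptions for illustration, not the emulator's actual scheduler. The model reproduces the qualitative effect: without the yield the worker (tid 14) only gets the slot once every `STARVE_LIMIT + 2` visits; with the yield, the worker and the spinner alternate.

```rust
// Sketch only: `yield_current`, `begin_slot_visit` and `STARVE_LIMIT` are
// names from the commit; the types and policy below are an assumed toy model.

/// Assumed value: a Ready thread passed over more than this many visits
/// is forced onto the slot.
const STARVE_LIMIT: u32 = 8;
/// Hypothetical tid for the spinning barrier participant.
const SPINNER_TID: u32 = 7;
/// The starved silph worker from the commit message.
const WORKER_TID: u32 = 14;

struct GuestThread {
    tid: u32,
    ready: bool,
    /// Visits this thread has been Ready but not picked.
    starve: u32,
}

struct Slot {
    threads: Vec<GuestThread>,
    /// Index of the thread picked by the most recent visit.
    current: usize,
}

fn is_ready(t: &GuestThread) -> bool {
    t.ready
}

impl Slot {
    /// Pick the thread for this visit. A starved Ready thread (lowest index
    /// first) wins; otherwise the first Ready thread in slot order (assumed
    /// default). Ready threads that were passed over age by one.
    fn begin_slot_visit(&mut self) -> Option<u32> {
        let pick = self
            .threads
            .iter()
            .position(|t| is_ready(t) && t.starve > STARVE_LIMIT)
            .or_else(|| self.threads.iter().position(is_ready))?;
        for (i, t) in self.threads.iter_mut().enumerate() {
            if i == pick {
                t.starve = 0;
            } else if is_ready(t) {
                t.starve += 1;
            }
        }
        self.current = pick;
        Some(self.threads[pick].tid)
    }

    /// Called after the current thread's block ended in StepResult::Yield:
    /// push every other Ready thread past STARVE_LIMIT so one of them wins
    /// the next begin_slot_visit.
    fn yield_current(&mut self) {
        let cur = self.current;
        for (i, t) in self.threads.iter_mut().enumerate() {
            if i != cur && is_ready(t) {
                t.starve = STARVE_LIMIT + 1;
            }
        }
    }
}

/// Run `visits` slot visits and return the tid picked at each one. With
/// `spinner_yields`, every spinner block is treated as ending on db16cyc.
fn simulate(visits: usize, spinner_yields: bool) -> Vec<u32> {
    let mut slot = Slot {
        threads: vec![
            GuestThread { tid: SPINNER_TID, ready: true, starve: 0 },
            GuestThread { tid: WORKER_TID, ready: true, starve: 0 },
        ],
        current: 0,
    };
    let mut picks = Vec::with_capacity(visits);
    for _ in 0..visits {
        let tid = slot.begin_slot_visit().expect("at least one thread is Ready");
        if spinner_yields && tid == SPINNER_TID {
            slot.yield_current();
        }
        picks.push(tid);
    }
    picks
}

fn worker_visits(picks: &[u32]) -> usize {
    picks.iter().filter(|&&t| t == WORKER_TID).count()
}

fn main() {
    println!("worker visits, no yield:   {}", worker_visits(&simulate(20, false)));
    println!("worker visits, with yield: {}", worker_visits(&simulate(20, true)));
}
```

With these assumed parameters, the worker gets the slot only on visits 10 and 20 of 20 without the yield, and on every second visit with it. Because `yield_current` only rewrites counters and ties are broken by slot index, the schedule remains a pure function of slot state, which is consistent with the commit's determinism claim.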
@@ -28,6 +28,15 @@ pub enum StepResult {
    Trap,
    /// Execution halted (by debugger or error).
    Halted,
    /// Executed the `db16cyc` spin-wait hint (`or r31,r31,r31`, encoding
    /// `0x7FFFFB78`). The PC has already advanced past the hint; this is a
    /// cooperative-yield signal so the scheduler hands the slot to a Ready
    /// peer. On real hardware all six HW threads run concurrently and the
    /// spin resolves naturally; under our round-robin lockstep a spinning
    /// barrier/spinlock participant would otherwise monopolize its slot and
    /// starve the co-located thread it is waiting on. Matches canary's
    /// `InstrEmit_orx` db16cyc → `DelayExecution()` handling.
    Yield,
}

/// Execute a single PPC instruction.
@@ -95,6 +104,9 @@ pub fn step_block(
ctx.cycle_count += 1;
ctx.timebase += 1;
if !matches!(result, StepResult::Continue) {
    // `Yield` (db16cyc spin hint) terminates the block here so the
    // scheduler regains control and can rotate the slot; the PC has
    // already advanced past the hint inside `execute`.
    return result;
}
// PC discontinuity within a block. By construction only the
@@ -548,6 +560,18 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
    ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] | ctx.gpr[instr.rb()];
    if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); }
    ctx.pc += 4;
    // `or r31,r31,r31` with encoding 0x7FFFFB78 is the Xenon `db16cyc`
    // spin-wait hint (a no-op write of r31 onto itself). Canary's
    // `InstrEmit_orx` special-cases exactly this code → `DelayExecution()`.
    // Under our round-robin lockstep, a guest spinlock/barrier loop that
    // executes db16cyc would otherwise consume its whole block every round
    // and starve the co-located thread it is waiting on (the lock holder /
    // barrier peer). Surface it as a cooperative yield so the scheduler can
    // hand the slot to a Ready peer. The semantic result of the op is
    // already applied (r31 |= r31 is a no-op), so yielding is value-neutral.
    if instr.raw == 0x7FFF_FB78 {
        return StepResult::Yield;
    }
}
PpcOpcode::orcx => {
    // PPCBUG-028: same shape as andcx — operate in u32.
@@ -5042,6 +5066,40 @@ mod tests {
        assert_eq!(ctx.pc, 4);
    }

    #[test]
    fn test_db16cyc_yields() {
        // `or r31,r31,r31` encoding 0x7FFFFB78 is the Xenon db16cyc spin hint.
        // It must (a) be value-neutral (r31 unchanged), (b) advance PC, and
        // (c) report StepResult::Yield so the scheduler can hand off the slot.
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        write_instr(&mut mem, 0, 0x7FFF_FB78);
        ctx.pc = 0;
        ctx.gpr[31] = 0x1234_5678_9ABC_DEF0;
        let r = step(&mut ctx, &mut mem);
        assert_eq!(ctx.gpr[31], 0x1234_5678_9ABC_DEF0, "db16cyc is value-neutral");
        assert_eq!(ctx.pc, 4, "PC advances past the hint");
        assert_eq!(r, StepResult::Yield, "db16cyc surfaces as a cooperative yield");
    }

    #[test]
    fn test_plain_or_self_is_not_yield() {
        // A regular `or rN,rN,rN` that is NOT the db16cyc encoding (e.g. r3)
        // is an ordinary no-op move and must keep executing (Continue), so we
        // only yield on the exact spin-hint code canary special-cases.
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        // or r3, r3, r3 (RS=RA=RB=3, Rc=0): 31<<26 | 3<<21 | 3<<16 | 3<<11 | 444<<1
        let raw = (31u32 << 26) | (3 << 21) | (3 << 16) | (3 << 11) | (444 << 1);
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        ctx.gpr[3] = 0xCAFE;
        let r = step(&mut ctx, &mut mem);
        assert_eq!(ctx.gpr[3], 0xCAFE);
        assert_eq!(ctx.pc, 4);
        assert_eq!(r, StepResult::Continue, "non-db16cyc or-self stays Continue");
    }

    #[test]
    fn test_fadd() {
        let mut ctx = PpcContext::new();
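The hint is recognized by comparing the whole instruction word, so it helps to see which fields that word fixes. The sketch below decodes PowerPC X-form fields by hand. `XForm`, `decode_x`, `encode_or` and `is_db16cyc` are hypothetical helpers written for this note. They are not the emulator's `DecodedInstr` API, whose accessors such as `instr.rs()` and `instr.rc_bit()` appear in the diff.

```rust
/// Fields of a PowerPC X-form word, using big-endian bit numbering:
/// OPCD = bits 0-5, RS = 6-10, RA = 11-15, RB = 16-20, XO = 21-30, Rc = 31.
#[derive(Debug, PartialEq, Eq)]
struct XForm {
    opcd: u32,
    rs: u32,
    ra: u32,
    rb: u32,
    xo: u32,
    rc: bool,
}

/// Split a raw instruction word into its X-form fields.
fn decode_x(raw: u32) -> XForm {
    XForm {
        opcd: raw >> 26,
        rs: (raw >> 21) & 0x1F,
        ra: (raw >> 16) & 0x1F,
        rb: (raw >> 11) & 0x1F,
        xo: (raw >> 1) & 0x3FF,
        rc: raw & 1 != 0,
    }
}

/// Encode `or rA,rS,rB` (primary opcode 31, extended opcode 444);
/// `rc` selects the record form `or.`.
fn encode_or(ra: u32, rs: u32, rb: u32, rc: bool) -> u32 {
    (31 << 26) | (rs << 21) | (ra << 16) | (rb << 11) | (444 << 1) | rc as u32
}

/// Exact-word match, mirroring the check added to `execute`.
fn is_db16cyc(raw: u32) -> bool {
    raw == 0x7FFF_FB78
}

fn main() {
    // The spin hint next to two look-alike self-ORs.
    for (label, raw) in [
        ("or r31,r31,r31", encode_or(31, 31, 31, false)),
        ("or. r31,r31,r31", encode_or(31, 31, 31, true)),
        ("or r3,r3,r3", encode_or(3, 3, 3, false)),
    ] {
        println!("{label:<16} {raw:#010X} db16cyc={} {:?}", is_db16cyc(raw), decode_x(raw));
    }
}
```

An exact-word match implies RS = RA = RB = 31, XO = 444 and Rc = 0. Therefore `or rN,rN,rN` for any other N, and the record form `or. r31,r31,r31`, stay on the `Continue` path. This is the behavior that `test_plain_or_self_is_not_yield` checks for r3.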