[iterate-2G] db16cyc spin-hint cooperative yield: unblock title-screen 0x10a0 gate

The silph title state machine (tid13) blocked on event 0x10a0, which was never
signaled. Root cause: the event's producer chain runs on the silph worker
(entry 0x821C4AD0, our tid14), which was starved. tid14 shares a HW slot with a
guest spinlock/barrier participant (sub_824D1328, entry 0x824D2940) that
busy-spins on the db16cyc hint `or r31,r31,r31` (encoding 0x7FFFFB78) at
0x824D140C. Under our round-robin lockstep the spinner consumed its whole block
every round and starved the co-located tid14 (only 9 progress hits over 200M
instructions), so the producer never reached the event create/duplicate/signal
sequence that the canary oracle performs (handle F80000E8 set by the submitter
F8000044 via a duplicated handle).
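The hint's encoding can be checked by hand, because `or rS,rA,rB` is an X-form instruction (primary opcode 31, extended opcode 444). The sketch below assumes the standard PowerPC X-form layout (OPCD, RS, RA, RB, 10-bit XO, Rc from most to least significant bit); `encode_or` is a hypothetical helper for illustration, not part of the emulator.

```rust
/// Hypothetical helper: encode `or rS,rA,rB` (X-form, primary opcode 31,
/// extended opcode 444) with the given Rc bit, laid out as
/// OPCD | RS | RA | RB | XO | Rc.
fn encode_or(rs: u32, ra: u32, rb: u32, rc: bool) -> u32 {
    assert!(rs < 32 && ra < 32 && rb < 32, "GPR index out of range");
    (31 << 26) | (rs << 21) | (ra << 16) | (rb << 11) | (444 << 1) | rc as u32
}

fn main() {
    // The db16cyc hint is `or r31,r31,r31` with Rc = 0.
    let db16cyc = encode_or(31, 31, 31, false);
    println!("or r31,r31,r31 = {:#010X}", db16cyc);
    assert_eq!(db16cyc, 0x7FFF_FB78);

    // Other self-ORs, and the record form `or.` (Rc = 1), encode differently,
    // so an exact match on the raw word only catches the hint itself.
    assert_ne!(encode_or(3, 3, 3, false), db16cyc);
    assert_ne!(encode_or(31, 31, 31, true), db16cyc);
}
```

Matching on the whole raw word, rather than on "RS == RA == RB", is what keeps ordinary register moves and `or.` on the Continue path.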

Fix (canary-faithful): recognize the db16cyc spin hint exactly as canary's
InstrEmit_orx does (code 0x7FFFFB78 -> DelayExecution) and surface it as a new
StepResult::Yield. On Yield, the scheduler's yield_current() promotes every
Ready peer on the slot past STARVE_LIMIT so that begin_slot_visit picks one of
them next round; the counters then reset and the spinner reclaims the slot.
The result is fair alternation with no priority inversion, and the choice is a
pure function of slot state, so scheduling stays deterministic.
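The slot rotation can be modeled in a few lines. This is a simplified sketch, not the emulator's scheduler: `Slot`, `Peer`, the `STARVE_LIMIT` value of 4, the priority-by-index rule and the `simulate` driver are all assumptions; only the names `yield_current`, `begin_slot_visit` and `STARVE_LIMIT` come from the description above. Each visit picks the first Ready peer starved past the limit, otherwise the first Ready peer in index order, so the outcome depends only on slot state.

```rust
/// Assumed threshold for illustration; the real STARVE_LIMIT may differ.
const STARVE_LIMIT: u32 = 4;

struct Peer {
    name: &'static str,
    ready: bool,
    starve: u32, // visits this peer was Ready but not chosen
}

struct Slot {
    peers: Vec<Peer>, // index order doubles as priority order (assumption)
    current: usize,
}

impl Slot {
    /// Pick the first Ready peer starved past STARVE_LIMIT, else the first
    /// Ready peer. The chosen peer's counter resets; other Ready peers age.
    fn begin_slot_visit(&mut self) -> Option<usize> {
        let chosen = self
            .peers
            .iter()
            .position(|p| p.ready && p.starve > STARVE_LIMIT)
            .or_else(|| self.peers.iter().position(|p| p.ready))?;
        for (i, p) in self.peers.iter_mut().enumerate() {
            if i == chosen {
                p.starve = 0;
            } else if p.ready {
                p.starve += 1;
            }
        }
        self.current = chosen;
        Some(chosen)
    }

    /// Cooperative yield: push every other Ready peer past STARVE_LIMIT so
    /// the next begin_slot_visit hands the slot to one of them.
    fn yield_current(&mut self) {
        let cur = self.current;
        for (i, p) in self.peers.iter_mut().enumerate() {
            if i != cur && p.ready {
                p.starve = p.starve.max(STARVE_LIMIT + 1);
            }
        }
    }
}

/// Hypothetical driver: peer 0 spins on db16cyc every block, peer 1 is the
/// worker it waits on. Returns the worker's visit count and the trace.
fn simulate(visits: u32, honor_yield: bool) -> (u32, Vec<&'static str>) {
    let mut slot = Slot {
        peers: vec![
            Peer { name: "spinner", ready: true, starve: 0 },
            Peer { name: "worker", ready: true, starve: 0 },
        ],
        current: 0,
    };
    let mut worker_visits = 0;
    let mut trace = Vec::new();
    for _ in 0..visits {
        let Some(i) = slot.begin_slot_visit() else { break };
        trace.push(slot.peers[i].name);
        if i == 0 {
            // The spinner's block ends on the hint (StepResult::Yield).
            if honor_yield {
                slot.yield_current();
            }
        } else {
            worker_visits += 1;
        }
    }
    (worker_visits, trace)
}

fn main() {
    let (with_yield, trace) = simulate(6, true);
    println!("with yield:    worker ran {with_yield}/6 visits: {trace:?}");
    let (without, trace) = simulate(12, false);
    println!("without yield: worker ran {without}/12 visits: {trace:?}");
}
```

With these assumed numbers the spinner and worker alternate visit by visit when the yield is honored; without it the worker only gets one visit every STARVE_LIMIT + 2 visits.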

Result (lockstep, cache-persist, -n 200M): tid14 progresses past its old stall
into a real wait; tid13 advances off 0x10a0 to a new event; hub/submitter
re-enter their wait loops. Imports 280k->592k, packets 124M->164M, swaps 1->2.
Draws are still 0 (the splash's first draw is gated further upstream).

Determinism preserved (two cold n50m runs byte-identical). n50m golden
re-baselined (imports 90296->339766, swaps 1->2; draws unchanged at 0). n2m
golden unchanged (db16cyc is not reached in the first 2M instructions).
Tests 670/670.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-13 10:38:17 +02:00
parent f3b7e8b760
commit de21c7a544
31 changed files with 433587 additions and 3 deletions


@@ -28,6 +28,15 @@ pub enum StepResult {
    Trap,
    /// Execution halted (by debugger or error).
    Halted,
    /// Executed the `db16cyc` spin-wait hint (`or r31,r31,r31`, encoding
    /// `0x7FFFFB78`). The PC has already advanced past the hint; this is a
    /// cooperative-yield signal so the scheduler hands the slot to a Ready
    /// peer. On real hardware all six HW threads run concurrently and the
    /// spin resolves naturally; under our round-robin lockstep a spinning
    /// barrier/spinlock participant would otherwise monopolize its slot and
    /// starve the co-located thread it is waiting on. Matches canary's
    /// `InstrEmit_orx` db16cyc → `DelayExecution()` handling.
    Yield,
}
/// Execute a single PPC instruction.
@@ -95,6 +104,9 @@ pub fn step_block(
        ctx.cycle_count += 1;
        ctx.timebase += 1;
        if !matches!(result, StepResult::Continue) {
            // `Yield` (db16cyc spin hint) terminates the block here so the
            // scheduler regains control and can rotate the slot; the PC has
            // already advanced past the hint inside `execute`.
            return result;
        }
        // PC discontinuity within a block. By construction only the
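The early return above can be modeled in isolation. This is a toy sketch, not the emulator's code: `Cpu`, its flat `program` vector, the instruction budget and the reduced two-variant `StepResult` are assumptions, and every word other than the hint is treated as a no-op. It shows a block ending on the hint with the PC already past it, so the next visit resumes after the spin hint.

```rust
/// Reduced stand-in for the emulator's StepResult (hypothetical).
#[derive(Debug, PartialEq)]
enum StepResult {
    Continue,
    Yield,
}

const DB16CYC: u32 = 0x7FFF_FB78; // or r31,r31,r31
const NOP: u32 = 0x6000_0000; // ori r0,r0,0

struct Cpu {
    pc: u32,
    program: Vec<u32>, // toy memory: word i lives at address 4 * i
    executed: u64,
}

impl Cpu {
    /// Execute one word. Only the db16cyc hint is special; everything else
    /// is a no-op here. The PC advances before Yield is reported.
    /// (Toy: the program must not run off the end of `program`.)
    fn step(&mut self) -> StepResult {
        let raw = self.program[(self.pc / 4) as usize];
        self.pc += 4;
        self.executed += 1;
        if raw == DB16CYC {
            StepResult::Yield
        } else {
            StepResult::Continue
        }
    }

    /// Run up to `budget` instructions, stopping early on any non-Continue
    /// result so the scheduler regains control.
    fn step_block(&mut self, budget: u32) -> StepResult {
        for _ in 0..budget {
            let result = self.step();
            if !matches!(result, StepResult::Continue) {
                return result;
            }
        }
        StepResult::Continue
    }
}

fn main() {
    let mut cpu = Cpu { pc: 0, program: vec![NOP, NOP, DB16CYC, NOP], executed: 0 };
    let r = cpu.step_block(1000);
    println!("{r:?} after {} instructions, pc = {:#x}", cpu.executed, cpu.pc);
}
```

For the block `[nop, nop, db16cyc, nop]` the call returns `Yield` after three instructions with the PC at 12, even though the budget would have allowed far more.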
@@ -548,6 +560,18 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
            ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] | ctx.gpr[instr.rb()];
            if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); }
            ctx.pc += 4;
            // `or r31,r31,r31` with encoding 0x7FFFFB78 is the Xenon `db16cyc`
            // spin-wait hint (a no-op write of r31 onto itself). Canary's
            // `InstrEmit_orx` special-cases exactly this code → `DelayExecution()`.
            // Under our round-robin lockstep, a guest spinlock/barrier loop that
            // executes db16cyc would otherwise consume its whole block every round
            // and starve the co-located thread it is waiting on (the lock holder /
            // barrier peer). Surface it as a cooperative yield so the scheduler can
            // hand the slot to a Ready peer. The semantic result of the op is
            // already applied (r31 |= r31 is a no-op), so yielding is value-neutral.
            if instr.raw == 0x7FFF_FB78 {
                return StepResult::Yield;
            }
        }
        PpcOpcode::orcx => {
            // PPCBUG-028: same shape as andcx — operate in u32.
@@ -5042,6 +5066,40 @@ mod tests {
        assert_eq!(ctx.pc, 4);
    }
    #[test]
    fn test_db16cyc_yields() {
        // `or r31,r31,r31` encoding 0x7FFFFB78 is the Xenon db16cyc spin hint.
        // It must (a) be value-neutral (r31 unchanged), (b) advance PC, and
        // (c) report StepResult::Yield so the scheduler can hand off the slot.
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        write_instr(&mut mem, 0, 0x7FFF_FB78);
        ctx.pc = 0;
        ctx.gpr[31] = 0x1234_5678_9ABC_DEF0;
        let r = step(&mut ctx, &mut mem);
        assert_eq!(ctx.gpr[31], 0x1234_5678_9ABC_DEF0, "db16cyc is value-neutral");
        assert_eq!(ctx.pc, 4, "PC advances past the hint");
        assert_eq!(r, StepResult::Yield, "db16cyc surfaces as a cooperative yield");
    }
    #[test]
    fn test_plain_or_self_is_not_yield() {
        // A regular `or rN,rN,rN` that is NOT the db16cyc encoding (e.g. r3)
        // is an ordinary no-op move and must keep executing (Continue), so we
        // only yield on the exact spin-hint code canary special-cases.
        let mut ctx = PpcContext::new();
        let mut mem = TestMem::new();
        // or r3, r3, r3 (RS=RA=RB=3, Rc=0): 31<<26 | 3<<21 | 3<<16 | 3<<11 | 444<<1
        let raw = (31u32 << 26) | (3 << 21) | (3 << 16) | (3 << 11) | (444 << 1);
        write_instr(&mut mem, 0, raw);
        ctx.pc = 0;
        ctx.gpr[3] = 0xCAFE;
        let r = step(&mut ctx, &mut mem);
        assert_eq!(ctx.gpr[3], 0xCAFE);
        assert_eq!(ctx.pc, 4);
        assert_eq!(r, StepResult::Continue, "non-db16cyc or-self stays Continue");
    }
    #[test]
    fn test_fadd() {
        let mut ctx = PpcContext::new();