feat(v1.1.1-dispatcher): dispatcher loop + retry + depth limit + outbox emitter

`OutboxEventEmitter` replaces `NoopEventEmitter` in the picloud binary's `Services` bundle. KV mutations now fan out to the outbox via `TriggerRepo::list_matching_kv`: one row per matching trigger, carrying the serialized `TriggerEvent` payload + the matching trigger's retry policy.

`Dispatcher` is the single tokio task that polls the outbox every 100ms, claims due rows via FOR UPDATE SKIP LOCKED (with a batch cap), and routes each to the executor. Shares the `ExecutionGate` with sync HTTP per design notes §2; gate saturation reschedules the row instead of dropping it.

Outcome handling matches design notes §3 and §4:

- reply_to.is_some() (sync HTTP): never retry. Deliver via `InboxResolver`; if the receiver was dropped, write an `abandoned_executions` row.
- is_dead_letter_handler == true: never retry, never DL. On failure, annotate the original DL row with `resolution = 'handler_failed'`. Stops the recursion that would otherwise re-fire a broken handler script.
- Otherwise async: bump attempt_count, reschedule with exponential backoff + ±jitter; once max_attempts is reached, write a `dead_letters` row and drop from outbox.
- Trigger-depth limit: `cx.trigger_depth > max_trigger_depth` skips execution entirely (log + future metric), NEVER dead-letters. Loops are not retried via the DL chain; they're terminated.

`InboxResolver` trait lands in `picloud-shared` with a `NoopInboxResolver` bootstrap that flags every delivery as `Abandoned`. Commit 6 replaces the noop with the real in-process registry in `orchestrator-core`.

`AdminPrincipalResolver` builds a `Principal` from a trigger's `registered_by_principal` user id so the dispatched script executes as the trigger registrant (design notes §4).

Unit tests cover backoff math (exponential/linear/constant) + jitter range + ExecError → InboxFailureKind classification + the status-code table mapping.
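The backoff math named in the unit-test summary could look roughly like the sketch below. It is not the code from this commit: the `Backoff` enum, the `base_delay` and `with_jitter` functions, the cap parameter, and the convention that the caller supplies the random sample are all assumptions made so the example stays std-only.

```rust
use std::time::Duration;

/// Hypothetical policy shape; the real retry-policy type is not shown in this commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backoff {
    Constant,
    Linear,
    Exponential,
}

/// Delay before the next try, where `attempt` is the 1-based number of
/// attempts already made. The result never exceeds `cap`.
pub fn base_delay(kind: Backoff, base: Duration, attempt: u32, cap: Duration) -> Duration {
    let n = attempt.max(1);
    let raw = match kind {
        Backoff::Constant => base,
        Backoff::Linear => base.saturating_mul(n),
        Backoff::Exponential => {
            // 2^(n-1); shifts of 32 or more saturate instead of overflowing.
            let factor = 1u32.checked_shl(n - 1).unwrap_or(u32::MAX);
            base.saturating_mul(factor)
        }
    };
    raw.min(cap)
}

/// Symmetric jitter: scales `delay` by a factor in [1 - ratio, 1 + ratio].
/// `unit` is a caller-supplied random sample in [0.0, 1.0] (assumption: the
/// real dispatcher presumably draws it from an RNG).
pub fn with_jitter(delay: Duration, ratio: f64, unit: f64) -> Duration {
    let ratio = ratio.clamp(0.0, 1.0);
    let unit = unit.clamp(0.0, 1.0);
    let factor = 1.0 + ratio * (2.0 * unit - 1.0);
    delay.mul_f64(factor)
}
```

Passing the random sample in as a parameter keeps the jitter range testable without seeding an RNG, which is one plausible way to unit-test the "jitter range" property mentioned above.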
Integration tests for the full dispatcher loop need a real Postgres + executor; reviewer runs them via the manual smoke flow in the plan / HANDBACK.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
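The outcome rules listed in the commit message can be read as a small decision function. The sketch below is an interpretation, not the dispatcher's code: `RowView`, `Action`, and `decide` are hypothetical names, the depth check (which the message says happens before execution) is folded into the same function for brevity, and deleting the row on async success is an assumption.

```rust
/// Hypothetical, simplified view of an outbox row; field names follow
/// the commit message, the struct itself is an assumption.
#[derive(Debug, Clone, Copy)]
pub struct RowView {
    pub has_reply_to: bool,
    pub is_dead_letter_handler: bool,
    /// Attempts made before the current one.
    pub attempt_count: u32,
    pub max_attempts: u32,
    pub trigger_depth: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    /// Depth limit exceeded: log and skip, never dead-letter.
    SkipDepthExceeded,
    /// Sync HTTP: hand the result to the inbox, never retry.
    DeliverToInbox,
    /// Async success (assumed behaviour): remove the outbox row.
    DeleteRow,
    /// Dead-letter handler failed: annotate the original DL row.
    AnnotateHandlerFailed,
    /// Async failure with attempts left: reschedule with backoff.
    Reschedule { next_attempt: u32 },
    /// Async failure with attempts exhausted: write a dead letter.
    DeadLetter,
}

pub fn decide(row: RowView, max_trigger_depth: u32, succeeded: bool) -> Action {
    if row.trigger_depth > max_trigger_depth {
        return Action::SkipDepthExceeded;
    }
    if row.has_reply_to {
        return Action::DeliverToInbox;
    }
    if succeeded {
        return Action::DeleteRow;
    }
    if row.is_dead_letter_handler {
        return Action::AnnotateHandlerFailed;
    }
    let attempts = row.attempt_count.saturating_add(1);
    if attempts >= row.max_attempts {
        Action::DeadLetter
    } else {
        Action::Reschedule { next_attempt: attempts }
    }
}
```

Ordering the checks this way means a dead-letter handler can never produce a new dead letter, which matches the stated goal of stopping recursion on a broken handler script.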
16
crates/shared/src/exec_summary.rs
Normal file
@@ -0,0 +1,16 @@
//! `ExecResponseSummary`: a flattened, crate-portable view of an
//! `ExecResponse` for use by `InboxResult`. Lives in
//! `picloud-shared` because the dispatcher (manager-core) and the
//! orchestrator-core inbox registry both need to read it, and
//! `executor-core::ExecResponse` is owned by a leaf crate.

use std::collections::BTreeMap;

use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExecResponseSummary {
    pub status_code: u16,
    pub headers: BTreeMap<String, String>,
    pub body: serde_json::Value,
}
86
crates/shared/src/inbox.rs
Normal file
@@ -0,0 +1,86 @@
//! `InboxResolver`: abstraction the dispatcher uses to deliver sync
//! HTTP results back to the orchestrator that's awaiting them on a
//! oneshot channel. Lives in `picloud-shared` because the dispatcher
//! (manager-core) and the registry impl (orchestrator-core) live in
//! different crates and need a shared trait surface.
//!
//! v1.1.1 ships an in-process implementation in `orchestrator-core`
//! that keeps a `HashMap<inbox_id, oneshot::Sender<...>>`. Cluster
//! mode (v1.3+) swaps this for a Postgres `LISTEN/NOTIFY`-based
//! resolver without touching the dispatcher code (design notes §3
//! implementation table).
//!
//! Until commit 6 wires up the real registry, `NoopInboxResolver`
//! (`Abandoned` for every attempt) keeps the dispatcher able to run.

use async_trait::async_trait;
use uuid::Uuid;

use crate::ExecResponseSummary;

/// Result of trying to hand back a sync-HTTP outcome.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InboxDeliveryOutcome {
    /// Receiver still attached; result was delivered. Dispatcher
    /// deletes the outbox row.
    Delivered,
    /// Receiver was dropped (orchestrator timed out). Dispatcher
    /// writes an `abandoned_executions` row.
    Abandoned,
}

/// Outcome shape the dispatcher delivers to the inbox. Carries enough
/// to reconstruct an HTTP response: full body via JSON, optional
/// error string when the executor reported a failure.
#[derive(Debug, Clone)]
pub enum InboxResult {
    /// Successful execution. Carries the `ExecResponseSummary`
    /// (status code + headers + body).
    Success(ExecResponseSummary),
    /// Failure modes: script threw, op-budget, timeout, etc. The
    /// orchestrator maps these to the design-notes §3 status codes
    /// (422/502/503/504/507/500) when responding to the HTTP caller.
    Failure {
        kind: InboxFailureKind,
        message: String,
    },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InboxFailureKind {
    /// Script's Rhai code threw or hit a runtime error → 502.
    Runtime,
    /// Wall-clock exceeded → 504.
    Timeout,
    /// Operation budget exceeded → 507.
    OperationBudget,
    /// Gate refused admission → 503.
    Overloaded,
    /// Script parse failure / bad-request → 422.
    Validation,
    /// Platform problem (executor crashed, dispatcher crashed, etc.) → 500.
    Platform,
}

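The doc comments on `InboxFailureKind` pin each variant to an HTTP status code. A minimal sketch of that table as a function follows; the enum is redeclared locally so the example stands alone, and the function name `status_code` is an assumption (the commit only says the orchestrator performs this mapping).

```rust
/// Local copy of the enum from the diff, so this sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InboxFailureKind {
    Runtime,
    Timeout,
    OperationBudget,
    Overloaded,
    Validation,
    Platform,
}

/// Hypothetical helper: maps a failure kind to the status code given in
/// the variant's doc comment.
pub fn status_code(kind: InboxFailureKind) -> u16 {
    match kind {
        InboxFailureKind::Validation => 422,
        InboxFailureKind::Platform => 500,
        InboxFailureKind::Runtime => 502,
        InboxFailureKind::Overloaded => 503,
        InboxFailureKind::Timeout => 504,
        InboxFailureKind::OperationBudget => 507,
    }
}
```

Using an exhaustive `match` without a wildcard arm means the compiler flags any variant added later that has no status code assigned.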
#[async_trait]
pub trait InboxResolver: Send + Sync {
    /// Attempt to deliver `result` to the receiver registered under
    /// `inbox_id`. Returns `Delivered` if the channel was alive,
    /// `Abandoned` if the receiver was already dropped (the
    /// orchestrator's timeout fired before the dispatcher got here).
    async fn deliver(&self, inbox_id: Uuid, result: InboxResult) -> InboxDeliveryOutcome;
}

/// Bootstrap impl used before the real registry is wired in. Every
/// delivery is treated as abandoned; the dispatcher records an
/// abandoned-execution row and moves on. Replaced in `build_app` with
/// the in-process `InboxRegistry` from orchestrator-core.
#[derive(Debug, Default, Clone, Copy)]
pub struct NoopInboxResolver;

#[async_trait]
impl InboxResolver for NoopInboxResolver {
    async fn deliver(&self, _inbox_id: Uuid, _result: InboxResult) -> InboxDeliveryOutcome {
        InboxDeliveryOutcome::Abandoned
    }
}
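The module docs describe the real resolver as a map from inbox id to a oneshot sender. The sketch below illustrates that delivered/abandoned contract with the standard library only. It is a synchronous stand-in, not the `InboxRegistry` from orchestrator-core: `std::sync::mpsc` replaces the oneshot channel, `u64` replaces `Uuid`, `String` replaces `InboxResult`, and treating an unknown id as `Abandoned` is an assumption.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Delivered,
    Abandoned,
}

/// Stand-in registry (hypothetical): one pending sender per inbox id.
#[derive(Default)]
pub struct Registry {
    pending: Mutex<HashMap<u64, Sender<String>>>,
}

impl Registry {
    /// Register a waiter and hand back the receiving end.
    pub fn register(&self, inbox_id: u64) -> Receiver<String> {
        let (tx, rx) = channel();
        self.pending.lock().unwrap().insert(inbox_id, tx);
        rx
    }

    /// Deliver at most once: the sender is removed from the map before
    /// sending, so a second delivery for the same id finds nothing.
    pub fn deliver(&self, inbox_id: u64, result: String) -> Outcome {
        let sender = self.pending.lock().unwrap().remove(&inbox_id);
        match sender {
            // `send` fails only when the receiver has been dropped.
            Some(tx) => match tx.send(result) {
                Ok(()) => Outcome::Delivered,
                Err(_) => Outcome::Abandoned,
            },
            // Unknown or already-used id (assumed to count as abandoned).
            None => Outcome::Abandoned,
        }
    }
}
```

Removing the sender before sending also releases the lock before any channel work happens, so a slow receiver cannot hold up other deliveries.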
@@ -9,8 +9,10 @@ pub mod auth;
pub mod dead_letters;
pub mod error;
pub mod events;
pub mod exec_summary;
pub mod execution_log;
pub mod ids;
pub mod inbox;
pub mod kv;
pub mod log_sink;
pub mod route;
@@ -27,8 +29,12 @@ pub use auth::{AppRole, InstanceRole, Principal, Scope, UserId};
pub use dead_letters::{DeadLetterError, DeadLetterId, DeadLetterService, NoopDeadLetterService};
pub use error::Error;
pub use events::{EmitError, NoopEventEmitter, ServiceEvent, ServiceEventEmitter};
pub use exec_summary::ExecResponseSummary;
pub use execution_log::{ExecutionLog, ExecutionStatus};
pub use ids::{AdminUserId, ApiKeyId, AppId, ExecutionId, RequestId, ScriptId, TriggerId};
pub use inbox::{
    InboxDeliveryOutcome, InboxFailureKind, InboxResolver, InboxResult, NoopInboxResolver,
};
pub use kv::{KvError, KvListPage, KvService, NoopKvService};
pub use log_sink::{ExecutionLogSink, LogSinkError};
pub use route::{HostKind, PathKind, Route};