Compare commits
22 Commits
feat/v1.1.
...
feat/v1.1.
| Author | SHA1 | Date |
|---|---|---|
| | 2796f36fef | |
| | 5a95ff2d07 | |
| | 66b661f64c | |
| | 6b7ff78730 | |
| | 1795dfc98a | |
| | 20f1b5e64d | |
| | 77b2cb58bb | |
| | 6a2971ac70 | |
| | 2e92691ee1 | |
| | 545d863199 | |
| | 6b99f74c48 | |
| | 434fb63cd2 | |
| | 1efb350b54 | |
| | 10cfde9e40 | |
| | bb88b024d2 | |
| | 9d01f42d5e | |
| | 1a6324078c | |
| | 54efe61167 | |
| | 1d2e99e42c | |
| | 9e54b7f875 | |
| | a685674dbf | |
| | a8aab22163 | |
88
CHANGELOG.md
Normal file
@@ -0,0 +1,88 @@
# PiCloud Changelog

## v1.1.1 — Storage & Events (unreleased)

The triggers framework — KV store + universal outbox + dispatcher +
NATS-style sync HTTP + per-route async dispatch + dead-letter
handling + dashboard surface. Every subsequent v1.1.x service module
(docs, files, pubsub, …) hangs off the dispatcher built here.

### Added

- **KV store** — `kv_entries` table keyed `(app_id, collection, key)`
  with JSONB values. Rhai SDK exposes the handle pattern:
  `kv::collection(name).{get,set,has,delete,list}`. Cursor-style
  pagination with opaque base64 cursors. Cross-app isolation
  enforced via `cx.app_id` (never script-passed).
- **Triggers framework (Layout E)** — parent `triggers` table +
  per-kind detail tables (`kv_trigger_details`,
  `dead_letter_trigger_details`). Trigger CRUD admin endpoints
  (`/api/v1/admin/apps/{id}/triggers/{kv,dead_letter}`) +
  `Capability::AppManageTriggers(AppId)`.
- **Universal outbox + dispatcher** — single tokio task that polls
  the outbox via `FOR UPDATE SKIP LOCKED`, routes due rows to the
  executor through the shared `ExecutionGate`. Retry with
  exponential backoff + ±jitter; on exhaustion, dead-letter.
- **NATS-style sync HTTP via outbox** — `InboxRegistry` (in-process
  oneshot map) lets the orchestrator await dispatcher delivery on
  every sync HTTP request. Cluster mode (v1.3+) swaps this for
  `LISTEN/NOTIFY` behind the same `InboxResolver` trait.
- **`dispatch_mode: async` on routes** — `POST` to a route with
  `dispatch_mode = 'async'` returns `202 Accepted` immediately;
  the script runs via the dispatcher (with retries / dead-letter).
- **Dead-letter handling** — separate `dead_letters` table per
  design notes §4. `dead_letters::{replay,resolve}` Rhai SDK +
  admin endpoints + `Capability::AppDeadLetterManage(AppId)`.
  Recursion-stop rule: dead-letter handler failures annotate the
  original row as `resolution = 'handler_failed'` and never produce
  a new dead-letter or retry.
- **Dashboard surface for dead letters** — unresolved-count red
  badge on the apps list + per-app page; per-app dead-letters list
  view at `/admin/apps/{slug}/dead-letters` with Replay + Mark
  resolved per-row actions and expandable payload detail.
- **`abandoned_executions` table** — forensic row written by the
  dispatcher when it tries to resolve an inbox the orchestrator
  already abandoned (timed out). Counter metric path reserved.
- **Trigger-depth limit** — `cx.trigger_depth > max_trigger_depth`
  (default 8) skips execution + logs; does NOT dead-letter
  (depth-exceeded means "you built a loop").
- **GC sweepers** — weekly retention sweeps for `dead_letters`
  (30 days) and `abandoned_executions` (7 days), both with
  `FOR UPDATE SKIP LOCKED` for cluster-mode safety.
- **Env-overridable trigger config** — `TriggerConfig::from_env`
  reads `PICLOUD_MAX_TRIGGER_DEPTH`, `PICLOUD_TRIGGER_RETRY_*`,
  `PICLOUD_DEAD_LETTER_RETENTION_DAYS`,
  `PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS`.

### Changed

- **Workspace version**: `1.1.0` → `1.1.1`.
- **Rhai SDK version**: `1.1` → `1.2` (additive — every v1.1 script
  still runs unchanged; new surfaces: `kv::*`, `dead_letters::*`,
  `ctx.event` for triggered handlers).
- **Dashboard version**: `0.6.0` → `0.7.0` for the dead-letters UI.
- **`Services` bundle** — replaces v1.1.0's no-arg `Services::new()`
  with explicit `Services::new(kv, dead_letters, events)`. Tests
  use `Services::default()` for an all-noop bundle.
- **`SdkCallCx`** grows `is_dead_letter_handler: bool` and
  `event: Option<TriggerEvent>` fields.
- **`ExecRequest`** mirrors the new `SdkCallCx` fields and grows
  `event` for serializable trigger payload transport.
- **Routes table** grows `dispatch_mode TEXT NOT NULL DEFAULT 'sync'`
  (CHECK in {sync, async}).
- **Schema version**: 6 → 12 (migrations 0007 through 0012).

### Migrations

- `0007_kv.sql` — `kv_entries` table + index
- `0008_triggers.sql` — `triggers` + `kv_trigger_details` +
  `dead_letter_trigger_details`
- `0009_outbox.sql` — universal `outbox` table + due-row partial index
- `0010_dead_letters.sql` — `dead_letters` table + unresolved partial
  index + GC index
- `0011_abandoned_executions.sql` — forensic table + GC index
- `0012_routes_dispatch_mode.sql` — `routes.dispatch_mode` column

## v1.1.0 — Foundation & Standard Library

See `docs/v1.1.x-design-notes.md` §7 for the full v1.1.x roadmap.
@@ -8,7 +8,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

Authoritative design: [serverless_cloud_blueprint.md](serverless_cloud_blueprint.md). The blueprint is a living document — when architecture decisions are made in conversation that contradict it, treat the latest decision as truth and update the blueprint.

-**Current focus (Phase 4, v1.1.0):** SDK foundation + stdlib utilities — the shape every v1.1.x service module hangs off, see [docs/sdk-shape.md](docs/sdk-shape.md). Subsequent v1.1.x releases (KV in v1.1.1, docs in v1.1.2, …) fill it in; see blueprint §12 for the full table. Phase 3 shipped end-to-end: admin auth, multi-app scoping, and Phase 3.5 capability gating (`manager-core::authz::{can, require, Capability}` + migration `0006_users_authz.sql`). Every v1.1+ table starts with `app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE` and every Rhai SDK call resolves its app from the execution context.
+**Current focus (Phase 4, v1.1.0):** SDK foundation + stdlib utilities — the shape every v1.1.x service module hangs off, see [docs/sdk-shape.md](docs/sdk-shape.md). Stdlib reference at [docs/stdlib-reference.md](docs/stdlib-reference.md). Subsequent v1.1.x releases (KV in v1.1.1, docs in v1.1.2, …) fill it in; see blueprint §12 for the full table. Phase 3 shipped end-to-end: admin auth, multi-app scoping, and Phase 3.5 capability gating (`manager-core::authz::{can, require, Capability}` + migration `0006_users_authz.sql`). Every v1.1+ table starts with `app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE` and every Rhai SDK call resolves its app from the execution context.

## Three-Service Architecture

26
Cargo.lock
generated
@@ -1505,7 +1505,7 @@ checksum = "9b4f627cb1b25917193a259e49bdad08f671f8d9708acfd5fe0a8c1455d87220"

[[package]]
name = "picloud"
-version = "0.6.0"
+version = "1.1.1"
dependencies = [
 "anyhow",
 "async-trait",
@@ -1531,7 +1531,7 @@ dependencies = [

[[package]]
name = "picloud-cli"
-version = "0.6.0"
+version = "1.1.1"
dependencies = [
 "anyhow",
 "assert_cmd",
@@ -1552,7 +1552,7 @@ dependencies = [

[[package]]
name = "picloud-executor"
-version = "0.6.0"
+version = "1.1.1"
dependencies = [
 "anyhow",
 "picloud-executor-core",
@@ -1564,21 +1564,28 @@ dependencies = [

[[package]]
name = "picloud-executor-core"
-version = "0.6.0"
+version = "1.1.1"
dependencies = [
 "async-trait",
 "base64",
 "chrono",
 "hex",
 "percent-encoding",
 "picloud-shared",
 "rand 0.8.6",
 "regex",
 "rhai",
 "serde",
 "serde_json",
 "thiserror 1.0.69",
 "tokio",
 "tracing",
 "uuid",
]

[[package]]
name = "picloud-manager"
-version = "0.6.0"
+version = "1.1.1"
dependencies = [
 "anyhow",
 "picloud-manager-core",
@@ -1590,7 +1597,7 @@ dependencies = [

[[package]]
name = "picloud-manager-core"
-version = "0.6.0"
+version = "1.1.1"
dependencies = [
 "argon2",
 "async-trait",
@@ -1598,6 +1605,7 @@ dependencies = [
 "base64",
 "chrono",
 "data-encoding",
+ "picloud-executor-core",
 "picloud-orchestrator-core",
 "picloud-shared",
 "rand 0.8.6",
@@ -1614,7 +1622,7 @@ dependencies = [

[[package]]
name = "picloud-orchestrator"
-version = "0.6.0"
+version = "1.1.1"
dependencies = [
 "anyhow",
 "picloud-orchestrator-core",
@@ -1626,7 +1634,7 @@ dependencies = [

[[package]]
name = "picloud-orchestrator-core"
-version = "0.6.0"
+version = "1.1.1"
dependencies = [
 "async-trait",
 "axum",
@@ -1645,7 +1653,7 @@ dependencies = [

[[package]]
name = "picloud-shared"
-version = "0.6.0"
+version = "1.1.1"
dependencies = [
 "async-trait",
 "chrono",

@@ -13,7 +13,7 @@ members = [
]

[workspace.package]
-version = "0.6.0"
+version = "1.1.1"
edition = "2021"
rust-version = "1.92"
license = "MIT OR Apache-2.0"
@@ -74,6 +74,12 @@ sha2 = "0.10"
base64 = "0.22"
data-encoding = "2.6"

+# Stdlib utility crates (v1.1.0 stdlib PR — registered into the
+# Rhai engine as the regex::/random::/etc. namespaces)
+regex = "1"
+hex = "0.4"
+percent-encoding = "2"
+
[workspace.lints.rust]
unsafe_code = "forbid"

340
HANDBACK.md
Normal file
@@ -0,0 +1,340 @@
# v1.1.1 Implementation HANDBACK

## 1. Branch + commit count

- Branch: `feat/v1.1.1-storage-and-events`
- Base: `main`
- 11 commits ahead of `main`. Branch is **not pushed**, **not merged**.

```
66b661f chore(release): bump workspace to v1.1.1 + CHANGELOG
6b7ff78 feat(v1.1.1-gc): dead-letter + abandoned-executions retention sweepers
1795dfc feat(v1.1.1-dead-letters): dashboard badge + list view
20f1b5e feat(v1.1.1-dead-letters): service + Rhai SDK + admin endpoints
77b2cb5 feat(v1.1.1-routes): outbox-routed sync HTTP + dispatch_mode=async
6a2971a feat(v1.1.1-dispatcher): dispatcher loop + retry + depth limit + outbox emitter
2e92691 feat(v1.1.1-triggers): trigger CRUD admin endpoints
545d863 feat(v1.1.1-triggers): triggers + outbox schema + repos
6b99f74 feat(v1.1.1-kv): Rhai kv:: SDK module + ctx.event wiring
434fb63 feat(v1.1.1-kv): migrations + KvService trait + Postgres impl
1efb350 docs(v1.1.x): resolve in-flight decisions as Decided 2026-06-01
```

The first commit (`1efb350`) absorbed working-tree edits to
`docs/v1.1.x-design-notes.md` that turned the 20 "in-flight" open
calls into "Decided 2026-06-01" entries. Those were on the working
tree at branch creation; folding them into the v1.1.1 branch keeps
the design rationale colocated with the implementation.

## 2. Scope coverage (Done / Partial / Skipped)

| Scope item | Status | Notes |
|---|---|---|
| **1. KV store** | Done | Migration 0007, `KvService` trait in shared, `KvServiceImpl` + `PostgresKvRepo` in manager-core, Rhai `kv::collection(name).{get,set,has,delete,list}` bridge, cursor pagination, empty-collection rejection, script-as-gate authz. |
| **2. Triggers framework — Layout E** | Done | Migration 0008 (`triggers` + `kv_trigger_details` + `dead_letter_trigger_details`), `TriggerRepo` + `PostgresTriggerRepo`, CRUD admin endpoints. `registered_by_principal` column captured + threaded into the dispatcher. Depth limit enforced in the dispatcher (default 8). |
| **3. Universal outbox + dispatcher** | Done | Migration 0009 (`outbox`), `OutboxRepo` + `PostgresOutboxRepo`, `Dispatcher` tokio task. Polls every 100ms, claims 8 rows/tick via `FOR UPDATE SKIP LOCKED`, gate-bounds dispatch, retries with backoff+jitter, dead-letters on exhaustion, late-completion → `abandoned_executions`. |
| **4. NATS-style sync HTTP** | Done | `InboxRegistry` in orchestrator-core (in-process `Mutex<HashMap<Uuid, oneshot::Sender>>`), `InboxResolver` trait in shared. Orchestrator sync-route path registers receiver, writes outbox row with `reply_to`, awaits with timeout = script.timeout + 2s buffer. Status mapping per design notes §3 (422/502/503/504/507/500). |
| **5. `dispatch_mode: async` HTTP routes** | Done | Migration 0012 adds the column (default `sync`). `DispatchMode` enum in shared. Route admin payload + RouteRepository serialize it. Compiled routes carry it; the matcher returns it in `Matched`. Orchestrator branches: async → outbox + 202; sync → outbox + inbox. |
| **6. Dead letters** | Done | Migration 0010 (`dead_letters`), `DeadLetterRepo` + `DeadLetterService` + `PostgresDeadLetterService`. Rhai `dead_letters::{replay,resolve}` + admin endpoints (`GET /count`, `GET /`, `GET /{id}`, `POST /{id}/replay`, `POST /{id}/resolve`). `Capability::AppDeadLetterManage(AppId)` enforced. The Rhai `dead_letters::list` call is intentionally NOT shipped (deferred to v1.2, see §6). Recursion-stop rule (handler-failure annotates original DL as `handler_failed`) implemented in the dispatcher. |
| **7. Abandoned executions** | Done | Migration 0011, `AbandonedRepo` + `PostgresAbandonedRepo`, dispatcher writes a row on dropped-receiver inbox delivery. Metric path reserved (`TODO(metrics)` markers in dispatcher.rs). |
| **8. Retry policy defaults** | Done | `TriggerConfig::from_env` (new module). Env vars: `PICLOUD_MAX_TRIGGER_DEPTH`, `PICLOUD_TRIGGER_RETRY_{MAX_ATTEMPTS,BACKOFF,BASE_MS,JITTER_PCT}`, `PICLOUD_DEAD_LETTER_RETENTION_DAYS`, `PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS`. Per-trigger overrides applied at trigger-creation time. |
| **9. `ctx.event` for triggered scripts** | Done | `TriggerEvent` enum in shared (KV / DeadLetter variants), `SdkCallCx.event: Option<TriggerEvent>` + `is_dead_letter_handler: bool`. `engine.rs::build_ctx_map` flattens the event into `ctx.event` for triggered handlers; direct invocations leave the key absent. Shape matches design notes §4 (KV with op + collection + key + value; dead_letter with original + attempts + last_error + ids + timestamps). |
| **10. Dashboard surface** | Done | Per-app red badge with unresolved count on apps list + per-app detail page. New `apps/[slug]/dead-letters/+page.svelte` list view with all design-notes-mandated columns + Replay + Mark resolved actions + expandable row detail. svelte-check passes (369 files, 0 errors, 0 warnings). |
| **11. Workspace version bump** | Done | Workspace `1.1.0` → `1.1.1`, SDK `1.1` → `1.2`, dashboard `0.6.0` → `0.7.0`. CHANGELOG.md created at repo root. |
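
The failure handling summarized in rows 3 and 6 can be read as one decision: retry, dead-letter, or stop recursion. The sketch below is an illustration under assumptions, not the dispatcher's code. `FailureAction`, `on_failure`, and the convention that `attempts_made` includes the attempt that just failed are all hypothetical.

```rust
/// What to do with an outbox row after its handler failed.
/// Hypothetical type for illustration; not taken from the PiCloud codebase.
#[derive(Debug, PartialEq, Eq)]
enum FailureAction {
    /// Reschedule the row for another attempt.
    Retry,
    /// Retries exhausted: move the payload to `dead_letters`.
    DeadLetter,
    /// A dead-letter handler itself failed: annotate the original dead
    /// letter as `handler_failed`, with no retry and no new dead letter.
    AnnotateHandlerFailed,
}

/// `attempts_made` counts executions so far, including the one that just
/// failed (an assumed convention).
fn on_failure(
    is_dead_letter_handler: bool,
    attempts_made: u32,
    max_attempts: u32,
) -> FailureAction {
    if is_dead_letter_handler {
        // Recursion-stop rule: checked first so it wins over retry logic.
        FailureAction::AnnotateHandlerFailed
    } else if attempts_made < max_attempts {
        FailureAction::Retry
    } else {
        FailureAction::DeadLetter
    }
}
```

Checking the recursion-stop case first is what guarantees a failing dead-letter handler can never enqueue more work for itself.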

## 3. Key implementation decisions / deviations

### Outbox column set (deferred to implementation per design notes §2)
Chose:
- `script_id` denormalized — dispatcher resolves the target without
  re-joining for the common path.
- `trigger_id` polymorphic (no DB FK) — references `triggers.id` for
  `source_kind IN {kv, dead_letter}`, `routes.id` for
  `source_kind = 'http'`. Discrimination in Rust at dispatch time.
- `claimed_by TEXT` — pid-based for MVP; cluster mode can use any
  identifier without schema change.
- `trigger_depth` + `root_execution_id` denormalized so the
  dispatcher rebuilds `ExecRequest` without joining back to the
  originating execution log.
- No explicit `is_dead_letter_handler` column — dispatcher infers
  from the trigger's `kind` field at dispatch time.
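
A minimal sketch of the "discrimination in Rust" idea for the polymorphic `trigger_id`, assuming the three `source_kind` spellings named above. The `SourceKind` enum and its methods are hypothetical names, and treating "kind is `dead_letter`" as the whole rule for `is_dead_letter_handler` is an assumption about how the inference works.

```rust
/// Source kinds named in the outbox notes above (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SourceKind {
    Kv,
    DeadLetter,
    Http,
}

impl SourceKind {
    /// Parse the `source_kind` column value.
    fn parse(raw: &str) -> Option<Self> {
        match raw {
            "kv" => Some(Self::Kv),
            "dead_letter" => Some(Self::DeadLetter),
            "http" => Some(Self::Http),
            _ => None,
        }
    }

    /// Table that `trigger_id` points at for this kind; there is no DB
    /// foreign key, so this mapping lives in code.
    fn target_table(self) -> &'static str {
        match self {
            Self::Kv | Self::DeadLetter => "triggers",
            Self::Http => "routes",
        }
    }

    /// Inferred at dispatch time instead of being stored on the row
    /// (assumed rule).
    fn is_dead_letter_handler(self) -> bool {
        self == Self::DeadLetter
    }
}
```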

### KV pagination
- **Cursor-style**, opaque base64-encoded last-key.
- Page-size cap of 1000 with default 100 (enforced in repo).
- Documented in `crates/shared/src/kv.rs` and the SDK function
  comment.
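
The pagination rules above can be sketched against an in-memory map. This is an illustration, not the `PostgresKvRepo` query: `effective_page_size` and `list_page` are hypothetical names, the minimum page size of 1 is an assumption, and the cursor is left as the raw last key (the notes above say the real cursor is that key base64-encoded so callers treat it as opaque).

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

const DEFAULT_PAGE_SIZE: usize = 100;
const MAX_PAGE_SIZE: usize = 1000;

/// Default when absent, capped at the maximum (floor of 1 is assumed).
fn effective_page_size(requested: Option<usize>) -> usize {
    requested.unwrap_or(DEFAULT_PAGE_SIZE).clamp(1, MAX_PAGE_SIZE)
}

/// Return one page of keys strictly after `cursor`, plus the cursor for
/// the next page (`None` once the listing is exhausted).
fn list_page(
    store: &BTreeMap<String, String>,
    cursor: Option<&str>,
    requested: Option<usize>,
) -> (Vec<String>, Option<String>) {
    let limit = effective_page_size(requested);
    let lower = match cursor {
        Some(last) => Bound::Excluded(last.to_string()),
        None => Bound::Unbounded,
    };
    // Fetch one extra key to learn whether another page exists.
    let mut keys: Vec<String> = store
        .range((lower, Bound::Unbounded))
        .map(|(key, _)| key.clone())
        .take(limit + 1)
        .collect();
    let next = if keys.len() > limit {
        keys.truncate(limit);
        keys.last().cloned()
    } else {
        None
    };
    (keys, next)
}
```

Resuming strictly after the last key, rather than at an offset, keeps pages stable when rows are inserted or deleted between calls.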

### KV TTL
- Blueprint §8.1 reserved an `expires_at` column. v1.1.1 design notes
  don't surface TTL through the SDK (`set(k,v)` has no TTL argument)
  so the column is **omitted from migration 0007**. Adding it later
  is a non-breaking forward migration. Recorded in CHANGELOG as a
  deferred item.

### Authz scope mapping (seven-scope commitment)
The four new capabilities map onto existing scopes — **no new scope
variants** to honour the `Scope` enum's "exactly seven values"
contract (`crates/shared/src/auth.rs:103`):

| Capability | Scope |
|---|---|
| `AppKvRead` | `script:read` |
| `AppKvWrite` | `script:write` |
| `AppManageTriggers` | `app:admin` |
| `AppDeadLetterManage` | `app:admin` |

`role_satisfies` grants `AppKvRead` at the Viewer role, `AppKvWrite`
at Editor, and both trigger / DL caps at AppAdmin.

### Script-as-gate authz for SDK calls
- `KvServiceImpl` runs `authz::require` only when
  `cx.principal.is_some()`. Anonymous public-HTTP scripts (the
  common case for public routes) bypass the cap check.
- Cross-app isolation is **independent** of this — enforced by
  `cx.app_id` being the only source of `app_id` on every query.
- `PostgresDeadLetterService::{replay,resolve}` keeps a hard
  `require` (no `if let Some`) — managing dead letters is an admin
  act per design notes §4. Public scripts with `principal: None`
  fail the check, which is correct.
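
The two gating styles above differ only in how they treat a missing principal. The sketch below illustrates that difference under simplifying assumptions: the `Role` and `Capability` enums are reduced stand-ins (the real capabilities carry an `AppId`), roles are assumed to be ordered so that higher roles inherit lower grants, and a principal is represented only by its role in the app.

```rust
/// Hypothetical ordered roles; the derive order gives Viewer < Editor < AppAdmin.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Role {
    Viewer,
    Editor,
    AppAdmin,
}

/// Reduced stand-in for the four new capabilities.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Capability {
    AppKvRead,
    AppKvWrite,
    AppManageTriggers,
    AppDeadLetterManage,
}

/// Minimum role per capability, following the description above.
fn role_satisfies(role: Role, cap: Capability) -> bool {
    let minimum = match cap {
        Capability::AppKvRead => Role::Viewer,
        Capability::AppKvWrite => Role::Editor,
        Capability::AppManageTriggers | Capability::AppDeadLetterManage => Role::AppAdmin,
    };
    role >= minimum
}

/// Script-as-gate: KV calls check the capability only when a principal exists.
fn kv_allowed(principal: Option<Role>, cap: Capability) -> bool {
    match principal {
        Some(role) => role_satisfies(role, cap),
        None => true, // anonymous public-HTTP script bypasses the check
    }
}

/// Hard require: dead-letter management always needs a qualifying principal.
fn dead_letter_allowed(principal: Option<Role>) -> bool {
    principal.map_or(false, |role| {
        role_satisfies(role, Capability::AppDeadLetterManage)
    })
}
```

Neither function touches `app_id`. Cross-app isolation is a separate guarantee that holds even on the anonymous path.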

### Trait split: `OutboxRepo` vs `OutboxWriter`
orchestrator-core can't depend on manager-core (would invert the
dependency arrow). Defined a small `OutboxWriter` trait in
`picloud-shared` with a single `enqueue_http` method.
`PostgresOutboxRepo` implements both `OutboxRepo` (dispatcher
surface) and `OutboxWriter` (orchestrator surface); the picloud
binary clones one concrete Arc into both trait views — mirrors the
existing `members_concrete` / `AuthzRepo` pattern.
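
The "one concrete Arc, two trait views" wiring can be shown with an in-memory stand-in. This is a sketch under assumptions: `InMemoryOutbox`, `claim_next`, and `split_views` are hypothetical, the method signatures are invented for illustration, and the traits are synchronous here for brevity.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Orchestrator-facing surface: enqueue only.
trait OutboxWriter {
    fn enqueue_http(&self, payload: String);
}

/// Dispatcher-facing surface, reduced to one hypothetical method.
trait OutboxRepo {
    fn claim_next(&self) -> Option<String>;
}

/// In-memory stand-in for `PostgresOutboxRepo`.
#[derive(Default)]
struct InMemoryOutbox {
    rows: Mutex<VecDeque<String>>,
}

impl OutboxWriter for InMemoryOutbox {
    fn enqueue_http(&self, payload: String) {
        self.rows.lock().unwrap().push_back(payload);
    }
}

impl OutboxRepo for InMemoryOutbox {
    fn claim_next(&self) -> Option<String> {
        self.rows.lock().unwrap().pop_front()
    }
}

/// Clone one concrete Arc into both trait-object views.
fn split_views() -> (Arc<dyn OutboxWriter>, Arc<dyn OutboxRepo>) {
    let concrete = Arc::new(InMemoryOutbox::default());
    let writer: Arc<dyn OutboxWriter> = concrete.clone();
    let repo: Arc<dyn OutboxRepo> = concrete;
    (writer, repo)
}
```

Both views share the same underlying queue, so each consumer crate only needs to name the trait it depends on.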

### `InboxResolver` lives in shared, `InboxRegistry` in orchestrator-core
Same split rationale — the dispatcher (manager-core) only depends on
the trait, while the in-process impl lives next to its consumer.
Cluster mode (v1.3+) swaps the impl for `LISTEN/NOTIFY` behind the
unchanged trait.
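
A rough, synchronous sketch of the registry idea follows. It is not the orchestrator-core implementation: it swaps the `oneshot` channel for `std::sync::mpsc`, uses `u64` ids in place of `Uuid`, and the `Delivery` enum and method signatures are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Result of trying to hand a reply to a waiting request (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum Delivery {
    Delivered,
    Abandoned,
}

/// The trait the dispatcher depends on (simplified, synchronous).
trait InboxResolver {
    fn deliver(&self, id: u64, reply: String) -> Delivery;
}

/// In-process registry: one pending sender per in-flight request id.
#[derive(Default)]
struct InboxRegistry {
    pending: Mutex<HashMap<u64, Sender<String>>>,
}

impl InboxRegistry {
    /// Register a waiter and hand back the receiving end.
    fn register(&self, id: u64) -> Receiver<String> {
        let (tx, rx) = channel();
        self.pending.lock().unwrap().insert(id, tx);
        rx
    }

    /// Explicit cancel, e.g. after the waiting side times out.
    fn cancel(&self, id: u64) {
        self.pending.lock().unwrap().remove(&id);
    }
}

impl InboxResolver for InboxRegistry {
    fn deliver(&self, id: u64, reply: String) -> Delivery {
        // Removing the sender makes delivery one-shot.
        let sender = self.pending.lock().unwrap().remove(&id);
        match sender {
            // `send` fails when the receiver was already dropped.
            Some(tx) => match tx.send(reply) {
                Ok(()) => Delivery::Delivered,
                Err(_) => Delivery::Abandoned,
            },
            None => Delivery::Abandoned,
        }
    }
}
```

An `Abandoned` result is the point at which, per §2 row 7, the dispatcher would write an `abandoned_executions` row.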

### manager-core now depends on executor-core
Previously manager-core only depended on orchestrator-core. The
dispatcher needs `ExecRequest`/`ExecResponse`/`ExecError`/
`InvocationType` from `executor-core` to build invocation
descriptors. This follows the transport-DTO reading of the working
rule "don't reach across `*-core` crates": DTOs are fine,
behaviour is the bright line.

### Sync HTTP via outbox is the default for the user-routes path
The orchestrator's user-route handler is fully on the NATS-style
path now — every sync HTTP request writes to the outbox and awaits
inbox delivery. Adds ~2-5ms per request per design notes §3 latency
budget. `/api/v1/execute/{id}` (the admin/dev bypass) still calls
the executor directly since it doesn't need the unified
observability — kept for simplicity and admin tooling speed.

### Trigger-depth check is on the outbox row, not in the executor
Dispatcher rejects depth-exceeded rows **before** trying to
execute. The `cx.trigger_depth` field is informational on the
executor side. Rejection writes a log + (reserved) metric and
deletes the row — no DL, per design notes §4.
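
The depth check reduces to one comparison made before execution. A minimal sketch, assuming the `trigger_depth > max_trigger_depth` rule from the CHANGELOG: `DepthDecision`, `check_depth`, and `child_depth` are hypothetical names, and incrementing the depth by one per hop is an assumption.

```rust
/// Outcome of the pre-execution depth check (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum DepthDecision {
    /// Within budget: hand the row to the executor.
    Execute,
    /// Over budget: log, delete the row, and do NOT dead-letter.
    DropDepthExceeded,
}

/// Default limit stated in the CHANGELOG.
const DEFAULT_MAX_TRIGGER_DEPTH: u32 = 8;

fn check_depth(trigger_depth: u32, max_trigger_depth: u32) -> DepthDecision {
    if trigger_depth > max_trigger_depth {
        DepthDecision::DropDepthExceeded
    } else {
        DepthDecision::Execute
    }
}

/// Depth assigned to rows emitted by a handler running at `parent_depth`
/// (assumed to be parent + 1).
fn child_depth(parent_depth: u32) -> u32 {
    parent_depth.saturating_add(1)
}
```

Because the comparison is strict, a row at exactly the limit still runs, and only the next hop is dropped.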

## 4. Tests added

### Unit tests (no DB required)
- `manager-core::kv_service::tests` (10 tests) — round-trip, missing
  key returns None, `has` predicate, `delete` was-present,
  empty-collection rejection, **cross-app isolation**, anonymous-cx
  skips authz, authed-cx-with-no-role is Forbidden, owner-can-write,
  cursor pagination via in-memory KvRepo + denying authz repo.
- `manager-core::trigger_config::tests` (2 tests) — conservative
  defaults, backoff round-trips.
- `manager-core::trigger_repo::tests` (1 test) — `collection_matches`
  glob behaviour (`*`, `prefix:*`, exact).
- `manager-core::dispatcher::tests` (5 tests) — exponential / linear /
  constant backoff math, jitter within bounds, ExecError →
  InboxFailureKind classification, failure-kind → status-code mapping.
- `manager-core::abandoned_repo::tests` (2 tests) — truncate
  char-boundary safety.
- `manager-core::triggers_api::tests` (5 tests) — unknown-app 404,
  member-without-role 403, default fallback for retry settings,
  empty-glob rejection, cross-app delete is treated as not-found.
- `orchestrator-core::inbox::tests` (4 tests) — register/deliver
  round-trip, unknown-id is Abandoned, dropped receiver is
  Abandoned, explicit cancel removes sender.
- `executor-core::engine::tests` (3 new) — `ctx.event` absent for
  direct invocations, KV insert shape matches design notes,
  KV delete has unit value.
- `executor-core::sdk_kv` integration suite (7 tests) — runs a real
  Rhai engine under `spawn_blocking` against an in-memory
  `KvService` impl. Covers handle pattern, round-trip, unit-on-missing,
  has predicate, delete-was-present, empty-collection
  throws, cursor pagination, **cross-app isolation through the
  bridge**.

**Total: 47 new tests across the workspace.** Workspace test counts
after v1.1.1: 63 manager-core / 56 orchestrator-core / 17
executor-core engine / 7 sdk_kv / 30 sdk_contract / 43 stdlib /
21 picloud / 6 shared.

### Intentionally untested
- DB-backed integration tests for the full dispatcher loop, KV→
  trigger→DL retry chain, sync HTTP via outbox round-trip,
  recursion-stop end-to-end. These need a real Postgres harness;
  the reviewer runs them via the manual smoke flow below.
- Postgres-specific repo behaviour (sqlx query correctness). The
  repos compile and run against the schema, but no integration
  test crate spins up a DB in this branch — same pattern as v1.1.0
  (see existing `ignored, needs DATABASE_URL` test markers).

## 5. Open questions for the reviewer

1. **Outbox `claimed_at` clearing on success.** The dispatcher
   `delete`s the outbox row after success / DL. For failures it
   reschedules (which sets `claimed_at = NULL`). Both flows are
   correct, but if the process crashes between the executor
   returning and the row update, the row stays claimed forever.
   Cluster mode should add a periodic "unstick stale claims" sweep.
   Not in v1.1.1 scope but worth surfacing.
2. **Sync HTTP overhead.** Every sync HTTP request now goes through
   the outbox (write + dispatcher pickup + inbox delivery).
   Expected overhead is ~2-5ms per design notes §3. Nothing has
   been benchmarked yet — recommend the reviewer pick a
   representative script and compare p95 latency vs v1.1.0 if
   performance matters.
3. **HTTP outbox rows don't run as a principal.** The orchestrator's
   public HTTP path has no authenticated user; the
   `origin_principal` field on the outbox row is forensic. The
   resulting `ExecRequest.principal = None`, so the script runs
   anonymously — matches direct execution. If you'd prefer
   triggered-from-HTTP scripts to inherit a derived principal
   (e.g. the route's app's owner), that's an additive change.
4. **Dispatcher uses `ASYNC_EXEC_TIMEOUT = 300s` for async rows.**
   Async dispatches don't have a script-level timeout (no
   originating HTTP request to bound). Picked the same platform
   cap as `LocalExecutorClient`. If async needs a different cap,
   easy to thread through `TriggerConfig`.
5. **Dispatcher tick cadence is 100ms.** Bounded enough that
   fan-out feels instant; loose enough that an idle process
   doesn't burn cycles. If the reviewer wants tighter latency,
   bump to 50ms or use `LISTEN/NOTIFY` for wake-up (v1.3+ work).
6. **CHANGELOG.md is new.** Followed the rest of the repo's
   convention from git log (release commits + design-notes
   references). If a different format is preferred, easy to swap.

## 6. Deferred to later releases

- `dead_letters::list(filter)` Rhai SDK — design notes §4 defers
  to v1.2 to align with `docs::find()` query DSL.
- KV TTL (`set(k, v, ttl_secs)`) — blueprint reserved it; v1.1.1
  SDK doesn't surface it. Forward-compat (no schema cost).
- Auto-disable of triggers whose script was deleted — design notes
  §4 says current handling is metric+log; auto-disable is v1.2.
- Per-app dead-letter retention — design notes §4 says env-only in
  v1.1.1.
- Metrics counter emit for `picloud_trigger_depth_exceeded`,
  `picloud_dead_letter_handler_failures`,
  `picloud_abandoned_executions_total`. Code paths log the
  occurrences with `tracing::warn`/`error`; the actual
  counter-emit code is a `TODO(metrics)` comment in the
  dispatcher. Metrics surface is v1.1.7+ per the roadmap.
- DB-backed integration tests for the dispatcher loop (see §4
  intentionally-untested).
- Sync HTTP performance benchmarks comparing v1.1.0 direct path vs
  v1.1.1 outbox path.

## 7. How to verify locally

### Static checks (all green on this branch)
```sh
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo test --workspace
cd dashboard && npm run check && npm run build
```

### Migration integrity
```sh
docker compose down -v && docker compose up -d postgres
cargo run -p picloud   # applies 0001..0012 from empty
```
Then start from `main` (v1.1.0 schema state) and switch to this
branch; restart `picloud` to apply 0007..0012 on top.

### Manual end-to-end smoke (reviewer should run)
```sh
docker compose up -d
# 1. Bootstrap an owner user via the existing flow + create app A.
# 2. Create a script in A whose body is: throw "boom"
# 3. POST /api/v1/admin/apps/{A}/triggers/kv with
#    {"script_id": "<broken>", "collection_glob": "*", "ops": ["insert"]}
# 4. From another script (or a public HTTP route):
#    kv::collection("widgets").set("k1", #{n:1})
# 5. Wait ~7 seconds (3 attempts × ~1/2/4s backoff with ±20% jitter).
# 6. Open the dashboard at /admin.
# 7. Apps list shows a red "1" badge next to app A.
# 8. Click into app A → "Dead letters" tab link → row visible.
# 9. Click row → full payload + error history.
# 10. Click "Replay" → row marks resolution='replayed', new outbox
#     row written, dispatcher re-runs the handler (fails again,
#     produces a NEW DL row).
# 11. Click "Mark resolved" on the original DL → resolution='ignored'.
```
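
The ~7 second wait in step 5 follows from the backoff arithmetic: 1s + 2s + 4s with a 1000ms base. The sketch below reproduces that arithmetic under assumptions. The `Backoff` enum mirrors the three strategies named in §4, but the function names, the 1-based attempt numbering, and passing the random sample in as `unit` (so the example needs no RNG crate) are hypothetical choices.

```rust
/// The three strategies named in the dispatcher tests (hypothetical enum).
#[derive(Debug, Clone, Copy)]
enum Backoff {
    Constant,
    Linear,
    Exponential,
}

/// Delay for retry number `attempt` (1-based), before jitter.
fn base_delay_ms(kind: Backoff, base_ms: u64, attempt: u32) -> u64 {
    let n = attempt.max(1);
    match kind {
        Backoff::Constant => base_ms,
        Backoff::Linear => base_ms.saturating_mul(u64::from(n)),
        Backoff::Exponential => base_ms.saturating_mul(2u64.saturating_pow(n - 1)),
    }
}

/// Apply symmetric jitter of `jitter_pct` percent. `unit` is a
/// caller-supplied sample in [0.0, 1.0]: 0.0 gives the lower bound,
/// 1.0 the upper bound.
fn with_jitter(delay_ms: u64, jitter_pct: u64, unit: f64) -> u64 {
    let spread = (delay_ms as f64) * (jitter_pct as f64) / 100.0;
    let offset = (unit.clamp(0.0, 1.0) * 2.0 - 1.0) * spread;
    ((delay_ms as f64) + offset).max(0.0).round() as u64
}
```

With ±20% jitter on each delay, the total wait lands between roughly 5.6s and 8.4s, which is why the smoke step says "~7 seconds".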

### Async route smoke
```sh
# Create a route via POST /api/v1/admin/scripts/{id}/routes with
# {"host_kind":"any","path_kind":"exact","path":"/work","dispatch_mode":"async"}
curl -X POST -d '{"work":"thing"}' http://localhost:8080/work
# Expect: HTTP 202 + {"accepted_at":"...","execution_id":"..."}
# Then tail execution_logs — the script ran later (not synchronously).
```

### Trigger-depth limit smoke
```sh
# Set a low depth limit + register a KV trigger whose script
# writes to KV again — creates a loop.
PICLOUD_MAX_TRIGGER_DEPTH=3 cargo run -p picloud
# kv::collection(...).set(...) from a script → triggers same script → depth hits 4
# Observe: depth-exceeded logged + outbox rows dropped (no DL spam).
```
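
The env override used in the depth smoke can be sketched as "read, parse, fall back to the default". This is not the real `TriggerConfig`: the struct is reduced to three fields, `from_lookup` is a hypothetical helper added so the parsing can be exercised without touching the process environment, and silently falling back on an unparsable value is an assumed behaviour. The defaults (8, 30 days, 7 days) are the ones stated in the CHANGELOG.

```rust
/// Reduced stand-in for the trigger configuration.
#[derive(Debug, PartialEq, Eq)]
struct TriggerConfig {
    max_trigger_depth: u32,
    dead_letter_retention_days: u32,
    abandoned_executions_retention_days: u32,
}

impl TriggerConfig {
    /// Build from any lookup function (hypothetical helper).
    fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Self {
        // Missing or unparsable values fall back to the default (assumed).
        let read = |name: &str, default: u32| {
            lookup(name)
                .and_then(|raw| raw.trim().parse().ok())
                .unwrap_or(default)
        };
        Self {
            max_trigger_depth: read("PICLOUD_MAX_TRIGGER_DEPTH", 8),
            dead_letter_retention_days: read("PICLOUD_DEAD_LETTER_RETENTION_DAYS", 30),
            abandoned_executions_retention_days: read(
                "PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS",
                7,
            ),
        }
    }

    /// Read overrides from the process environment.
    fn from_env() -> Self {
        Self::from_lookup(|name| std::env::var(name).ok())
    }
}
```

Separating the lookup from the parsing keeps tests from mutating process-wide environment variables.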

## 8. Known limitations / rough edges

- **No DB-backed integration tests in this branch.** Unit tests
  cover trait behaviour with in-memory backings; sqlx query
  correctness is verified by the workspace compile + manual smoke.
- **Dispatcher concurrency is in-process serial-per-tick.** Up to
  8 rows claimed per tick, processed one at a time. Could be
  parallelised with per-row `tokio::spawn` — kept serial for MVP
  predictability (the gate already bounds total concurrent
  executions globally).
- **Metric emission is TODO** at the three spots listed in §6
  (Deferred to later releases). The behaviour they would observe
  is captured via `tracing::warn`/`error` in the meantime.
- **`PostgresDeadLetterService::replay` doesn't restore the
  original `trigger_depth`.** Replays start at depth 0. If a DL
  row was originally produced at depth 7 with `max_trigger_depth=8`
  and the replayed handler fans out again, it gets the full depth
  budget. Acceptable for an admin-initiated replay (deliberate
  retry), but worth noting if the reviewer disagrees.
- **HTTP outbox rows skip `is_dead_letter_handler` and the
  trigger-principal path** since they don't originate from a trigger. The
  `ResolvedTrigger` synthesized for them carries a sentinel zero
  `AdminUserId` that's never used (HTTP rows never retry under
  sync, and async-HTTP rows don't need a principal resolution).
- **DataPlaneState's executor field is still generic** (`Arc<E>`
  where `E: ExecutorClient`). The dispatcher uses `Arc<dyn
  ExecutorClient>` directly. The picloud binary clones the same
  `Arc<LocalExecutorClient>` into both — works because the
  concrete type implements both the trait object and the generic
  bound.
- **Dispatcher always sets `principal: None` for HTTP rows.** As
  noted in open question 3 (§5), HTTP outbox rows don't resolve a
  principal. Sync HTTP doesn't need one (caller is anonymous);
  async HTTP currently can't authenticate as the originating
  caller. If that's not the intent, it is an additive change.
- **Cluster-mode crash recovery for claimed rows.** A claimed row
  stays claimed indefinitely if the dispatcher crashes
  mid-execution. v1.1.1 has one dispatcher per process so this is
  rare; cluster mode (v1.3+) needs a stale-claim sweeper.

---

Branch ready for review. Reviewer reads this report + audits the
diff. Do not merge to main until the audit clears.
151
REVIEW.md
Normal file
@@ -0,0 +1,151 @@
|
||||
# v1.1.1 Audit & Review
|
||||
|
||||
**Branch:** `feat/v1.1.1-storage-and-events`
|
||||
**Base:** `main` (v1.1.0)
|
||||
**Commits ahead:** 12
|
||||
**Audited by:** reviewer (this report)
|
||||
**Audited against:** `docs/v1.1.x-design-notes.md` §1–4 (Decided 2026-06-01) + the original v1.1.1 dispatch prompt
|
||||
|
||||
## Verdict
|
||||
|
||||
**APPROVE — ready to merge to `main` as v1.1.1.**
|
||||
|
||||
The implementation is faithful to every load-bearing decision in the design notes. Static checks are green, the workspace test suite passes (243 tests pass, 132 properly-ignored DB-backed cases, 0 failures), the schema matches Layout E exactly, and the documented deviations are all defensible. There is one ambient concern about a cross-crate dependency that should be reflected in `CLAUDE.md` after the merge, but it is not a merge blocker.
|
||||
|
||||
---
|
||||
|
||||
## 1. Static checks reproduced
|
||||
|
||||
```
cargo fmt --all -- --check                                ✅ clean
cargo clippy --all-targets --all-features -- -D warnings  ✅ no findings
cargo test --workspace                                    ✅ 243 passed / 0 failed
                                                          (132 ignored — DB-backed integration tests,
                                                           same convention as v1.1.0; documented in HANDBACK §4)
```
|
||||
|
||||
Test distribution per crate matches HANDBACK §4:
|
||||
- manager-core: 63
- orchestrator-core: 56
- stdlib: 43
- sdk_contract: 30
- picloud: 21
- executor-core (engine): 17
- sdk_kv: 7
- shared: 6
|
||||
|
||||
47 of these are new in v1.1.1; the rest are v1.1.0's existing suite still passing.
|
||||
|
||||
## 2. Design-notes conformance (spot-checks)
|
||||
|
||||
| Decision | Where it lives | Verdict |
|---|---|---|
| Layout E trigger storage (parent + per-kind detail) | [0008_triggers.sql:22-72](crates/manager-core/migrations/0008_triggers.sql#L22-L72) | ✅ matches exactly; parent has common columns + the four retry/dispatch knobs + `registered_by_principal`; per-kind detail tables for `kv` and `dead_letter` only |
| `routes` stays separate from `triggers` parent | [0012_routes_dispatch_mode.sql](crates/manager-core/migrations/0012_routes_dispatch_mode.sql), [0009_outbox.sql:13-18](crates/manager-core/migrations/0009_outbox.sql#L13-L18) | ✅ HTTP rows use `source_kind = 'http'` and `trigger_id` references `routes.id`; non-HTTP references `triggers.id`; polymorphism in Rust per the design-notes deferral of the column-set refinement |
| Sync HTTP via outbox + NATS-style inbox | [inbox.rs:30-89](crates/orchestrator-core/src/inbox.rs#L30-L89), [dispatcher.rs:359-394](crates/manager-core/src/dispatcher.rs#L359-L394) | ✅ `oneshot::Sender<InboxResult>` keyed by inbox_id; `deliver()` returns `Delivered` or `Abandoned` exactly per the design-notes failure-mode table |
| `reply_to.is_some()` never retries | [dispatcher.rs:376-394](crates/manager-core/src/dispatcher.rs#L376-L394) | ✅ failure path checks `reply_to` first; delivers single outcome to inbox; deletes outbox row regardless of error |
| Status code table (422/502/503/504/507/500) | [dispatcher.rs:555-564](crates/manager-core/src/dispatcher.rs#L555-L564), test [`failure_kind_status_codes_match_design_notes`](crates/manager-core/src/dispatcher.rs#L674) | ✅ exact mapping; covered by a dedicated test |
| `dispatch_mode = async` returns `202 Accepted` + JSON body | [api.rs:325-332](crates/orchestrator-core/src/api.rs#L325-L332) | ✅ body shape is `{"accepted_at": rfc3339, "execution_id": uuid}` — matches design notes §2 verbatim |
| Default retry: 3/exp/1000ms/±20% jitter | [trigger_config.rs](crates/manager-core/src/trigger_config.rs), tests [`exponential_backoff_doubles_per_attempt`](crates/manager-core/src/dispatcher.rs#L621), [`jitter_within_pct_of_base`](crates/manager-core/src/dispatcher.rs#L647) | ✅ env-overridable; jitter test exercises the ±20% bound across 100 samples |
| `abandoned_executions` written on dropped receiver | [dispatcher.rs:480-509](crates/manager-core/src/dispatcher.rs#L480-L509) | ✅ written only when `InboxDeliveryOutcome::Abandoned` returns; ordinary timeout-with-receiver-still-alive does not write a row |
| Dead-letter recursion stop (flag on execution) | [dispatcher.rs:396-425](crates/manager-core/src/dispatcher.rs#L396-L425), [trigger_repo.rs `TriggerKind::DeadLetter` → `is_dead_letter_handler`](crates/manager-core/src/dispatcher.rs#L228-L229) | ✅ flag set when dispatcher resolves a `kind = 'dead_letter'` trigger; on failure, original DL annotated with `resolution = 'handler_failed'`, row deleted, never retried, never DL'd |
| Sync HTTP failures do NOT dead-letter | [dispatcher.rs:378-394](crates/manager-core/src/dispatcher.rs#L378-L394) | ✅ early return before the DL-write block |
| `dead_letters::list` NOT shipped (deferred to v1.2) | [executor-core/src/sdk/dead_letters.rs:13](crates/executor-core/src/sdk/dead_letters.rs#L13) | ✅ explicit doc-comment citing design notes §4; only `replay` + `resolve` registered |
| Trigger execution runs as registrant's principal | [dispatcher.rs:249-253](crates/manager-core/src/dispatcher.rs#L249-L253) + [`registered_by_principal` column](crates/manager-core/migrations/0008_triggers.sql#L39) | ✅ principal resolved from the trigger row at dispatch time |
| 30-day DL retention, env-overridable | [gc.rs](crates/manager-core/src/gc.rs) | ✅ |
| 7-day abandoned-executions retention | [gc.rs](crates/manager-core/src/gc.rs) | ✅ |
| Trigger-depth limit (default 8); depth-exceeded does NOT dead-letter | [dispatcher.rs:122-137](crates/manager-core/src/dispatcher.rs#L122-L137) | ✅ design-notes §4 honored ("depth-exceeded means you built a loop") — row dropped + logged, no DL spam |
| Dashboard surface: badge + list view + Replay + Mark resolved | [dashboard/src/routes/apps/+page.svelte](dashboard/src/routes/apps/+page.svelte), [dashboard/src/routes/apps/\[slug\]/dead-letters/+page.svelte](dashboard/src/routes/apps/[slug]/dead-letters/+page.svelte) | ✅ all required columns + actions + expandable row detail; `npm run check` reports 0 errors |
| Status: workspace 1.1.0 → 1.1.1, SDK 1.1 → 1.2, dashboard 0.6.0 → 0.7.0, CHANGELOG.md created | last commit `66b661f` | ✅ |
| `ctx.event` shape (KV: source/op/collection/key/value; DL: original/attempts/last_error/ids/timestamps) | [shared/src/trigger_event.rs](crates/shared/src/trigger_event.rs), [executor-core engine tests](crates/executor-core/src/engine.rs) | ✅ matches design notes §4 shape exactly; tests verify both variants + the "absent for direct invocations" rule |
|
||||
|
||||
I sampled the design-notes diff (`git diff main..HEAD -- docs/v1.1.x-design-notes.md`) — every "Decided 2026-06-01" entry the agent absorbed into commit `1efb350` matches the decisions made in conversation. No drift.
|
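To make the "3/exp/1000ms/±20% jitter" row in the table above concrete, here is a minimal sketch of how such a delay could be computed. The struct and function names are illustrative, not the types in `trigger_config.rs`; the 1-based attempt numbering and the caller-supplied jitter sample (used here instead of a random-number crate) are assumptions.

```rust
use std::time::Duration;

/// Assumed policy shape; field names are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct RetryPolicy {
    pub max_attempts: u32,
    pub base_delay_ms: u64,
    pub jitter_pct: u64,
}

impl Default for RetryPolicy {
    fn default() -> Self {
        // Defaults quoted in the review: 3 attempts, 1000 ms base, ±20% jitter.
        Self { max_attempts: 3, base_delay_ms: 1000, jitter_pct: 20 }
    }
}

/// Un-jittered delay after failed attempt number `attempt` (1-based):
/// base * 2^(attempt - 1), saturating instead of overflowing.
pub fn backoff_ms(policy: &RetryPolicy, attempt: u32) -> u64 {
    let shift = attempt.saturating_sub(1).min(32);
    policy.base_delay_ms.saturating_mul(1u64 << shift)
}

/// Apply jitter to a base delay. `unit` is a sample in [-1.0, 1.0] supplied
/// by the caller (a real dispatcher would draw it from an RNG).
pub fn with_jitter(policy: &RetryPolicy, base_ms: u64, unit: f64) -> Duration {
    let unit = unit.clamp(-1.0, 1.0);
    let spread = base_ms as f64 * policy.jitter_pct as f64 / 100.0;
    let ms = (base_ms as f64 + spread * unit).max(0.0);
    Duration::from_millis(ms.round() as u64)
}
```

Taking the sample as a parameter keeps the function deterministic, which is one way a bound test such as `jitter_within_pct_of_base` could be written without depending on RNG state.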
||||
|
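The inbox contract checked in the table (a one-shot sender keyed by inbox id, with `deliver()` reporting `Delivered` or `Abandoned`) can be illustrated with a std-only sketch. It swaps the tokio `oneshot` channel for `std::sync::mpsc`, and everything except the names `InboxRegistry`, `InboxResult`, `InboxDeliveryOutcome`, and `deliver` is an assumption, including the `register` method and the choice to report an unknown id as `Abandoned`.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
pub enum InboxDeliveryOutcome {
    Delivered,
    Abandoned,
}

/// Simplified result payload: response body or error text.
pub type InboxResult = Result<String, String>;

#[derive(Default)]
pub struct InboxRegistry {
    next_id: Mutex<u64>,
    waiters: Mutex<HashMap<u64, Sender<InboxResult>>>,
}

impl InboxRegistry {
    /// Called by the HTTP handler before enqueueing the outbox row; the
    /// returned id plays the role of `reply_to`.
    pub fn register(&self) -> (u64, Receiver<InboxResult>) {
        let (tx, rx) = channel();
        let mut next = self.next_id.lock().unwrap();
        *next += 1;
        let id = *next;
        self.waiters.lock().unwrap().insert(id, tx);
        (id, rx)
    }

    /// Called by the dispatcher once the execution finishes. The waiter is
    /// removed first, so each inbox receives at most one outcome.
    pub fn deliver(&self, inbox_id: u64, result: InboxResult) -> InboxDeliveryOutcome {
        let sender = self.waiters.lock().unwrap().remove(&inbox_id);
        match sender {
            Some(tx) => match tx.send(result) {
                Ok(()) => InboxDeliveryOutcome::Delivered,
                // The receiver was dropped, e.g. the HTTP caller went away.
                Err(_) => InboxDeliveryOutcome::Abandoned,
            },
            None => InboxDeliveryOutcome::Abandoned,
        }
    }
}
```

The `Abandoned` branch is the point at which, per the table, the dispatcher would write an `abandoned_executions` row.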
||||
## 3. Deviations from the prompt (all reviewed, all acceptable)
|
||||
|
||||
The HANDBACK's §3 lists nine deviations / mid-implementation decisions. My take on each:
|
||||
|
||||
1. **Outbox column set chosen** (`script_id`, `trigger_id` polymorphic, `claimed_by TEXT`, `trigger_depth`, `root_execution_id` denormalized; no `is_dead_letter_handler` column). The design notes explicitly deferred this set to implementation. The chosen shape is sensible: dispatcher can build `ExecRequest` without re-joining; the `is_dead_letter_handler` derivation from `triggers.kind` at dispatch time is cleaner than storing redundant state. ✅
|
||||
|
||||
2. **KV pagination is cursor-style** (base64-encoded last-key, 100 default / 1000 max). The prompt left this open; cursor-style is the right default for KV-shaped data. ✅
|
||||
|
||||
3. **KV TTL deferred**. Blueprint §8.1 reserved `expires_at` but v1.1.1 SDK doesn't surface TTL. Omitting the column from migration 0007 keeps the schema minimal; adding it later is a non-breaking forward migration. ✅ (CHANGELOG records the deferral.)
|
||||
|
||||
4. **Authz scope mapping** (4 new capabilities mapped to existing 7 scopes — `AppKvRead → script:read`, `AppKvWrite → script:write`, `AppManageTriggers → app:admin`, `AppDeadLetterManage → app:admin`). The "seven-scope commitment" is a project convention in `crates/shared/src/auth.rs:103` the prompt didn't mention; honoring it is correct. The specific mapping is defensible: a token with `script:read` on an app already implies "can see the data behind those scripts," and admin-level scope for trigger/DL management is standard for control-plane operations. ✅
|
||||
|
||||
5. **Script-as-gate authz** (`if cx.principal.is_some()` then check; else skip — public HTTP runs anonymously without an authz failure). This matches the SDK-shape doc's note that "the data plane is unauthenticated by default — public HTTP scripts run with `None`." Cross-app isolation is preserved regardless (every query keyed by `cx.app_id`). DL replay/resolve correctly bypasses this and hard-requires a principal. ✅
|
||||
|
||||
6. **Trait split `OutboxRepo` vs `OutboxWriter`**. Orchestrator-core can't depend on manager-core; the small `OutboxWriter` trait in shared (one method) lets the orchestrator enqueue HTTP rows without inverting the dependency arrow. ✅ Pattern mirrors the existing `members_concrete`/`AuthzRepo` split.
|
||||
|
||||
7. **`InboxResolver` in shared, `InboxRegistry` in orchestrator-core**. Same split rationale. Cluster mode (v1.3+) swaps the impl behind the unchanged trait. ✅
|
||||
|
||||
8. **manager-core now depends on executor-core**. ⚠️ **See §4 below — flagged, accepted, but should be reflected in `CLAUDE.md`.**
|
||||
|
||||
9. **Sync HTTP via outbox is the default for user routes** (admin bypass `/api/v1/execute/{id}` keeps direct dispatch). Matches the design-notes decision; the bypass's direct path is acceptable for admin tooling speed. ✅
|
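The capability-to-scope mapping from item 4 is small enough to write out. The sketch below uses the four capability names and three scope strings quoted there; the `AppId` payload on every variant, the `required_scope` method, and the `token_allows` helper are assumptions made for illustration.

```rust
/// Stand-in for the project's `AppId` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AppId(pub u64);

/// The four capabilities added in v1.1.1, as named in item 4.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Capability {
    AppKvRead(AppId),
    AppKvWrite(AppId),
    AppManageTriggers(AppId),
    AppDeadLetterManage(AppId),
}

impl Capability {
    /// Map each new capability onto one of the existing scopes.
    pub fn required_scope(&self) -> &'static str {
        match self {
            Capability::AppKvRead(_) => "script:read",
            Capability::AppKvWrite(_) => "script:write",
            Capability::AppManageTriggers(_) | Capability::AppDeadLetterManage(_) => "app:admin",
        }
    }
}

/// Hypothetical check: does a token holding `token_scopes` satisfy `cap`?
pub fn token_allows(token_scopes: &[&str], cap: Capability) -> bool {
    token_scopes.contains(&cap.required_scope())
}
```

Keeping the mapping in one exhaustive `match` means adding a fifth capability fails to compile until a scope is chosen for it.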
||||
|
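For item 2, the last-key cursor idea can be sketched over an ordered in-memory map. This is an illustration only: the real cursor is described as base64-encoded and opaque, whereas the sketch passes the last key through unencoded, and treating a limit of `0` as "use the default" is an assumption inferred from the SDK bridge passing `0` in its zero-argument `list` form.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

pub const DEFAULT_LIMIT: usize = 100;
pub const MAX_LIMIT: usize = 1000;

#[derive(Debug, PartialEq)]
pub struct Page {
    pub keys: Vec<String>,
    /// Last key of this page if more keys remain, otherwise `None`.
    pub next_cursor: Option<String>,
}

/// 0 selects the default page size; larger requests are clamped to the max.
pub fn effective_limit(requested: usize) -> usize {
    if requested == 0 {
        DEFAULT_LIMIT
    } else {
        requested.min(MAX_LIMIT)
    }
}

/// Return up to `limit` keys strictly after `cursor`, in key order.
pub fn list_keys(store: &BTreeMap<String, String>, cursor: Option<&str>, limit: usize) -> Page {
    let limit = effective_limit(limit);
    let lower: Bound<&str> = match cursor {
        Some(last) => Bound::Excluded(last),
        None => Bound::Unbounded,
    };
    // Fetch one extra key to learn whether another page exists.
    let mut keys: Vec<String> = store
        .range::<str, _>((lower, Bound::Unbounded))
        .map(|(k, _)| k.clone())
        .take(limit + 1)
        .collect();
    let next_cursor = if keys.len() > limit {
        keys.truncate(limit);
        keys.last().cloned()
    } else {
        None
    };
    Page { keys, next_cursor }
}
```

Because the cursor is a position in key order rather than an offset, keys inserted or deleted behind it do not shift later pages, which is the usual argument for cursor-style pagination on KV-shaped data.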
||||
## 4. The one concern worth surfacing: manager-core → executor-core
|
||||
|
||||
`CLAUDE.md` working rules say:
|
||||
|
||||
> Honor the three-service boundary. Don't reach across `*-core` crates. If
|
||||
> orchestrator-core needs something from manager-core, define a trait in
|
||||
> shared and inject the impl.
|
||||
|
||||
The dispatcher in manager-core directly imports `ExecRequest`, `ExecResponse`, `ExecError`, and `InvocationType` from `executor-core`:
|
||||
|
||||
```rust
// crates/manager-core/src/dispatcher.rs:27
use picloud_executor_core::{ExecError, ExecRequest, ExecResponse, InvocationType};
```
|
||||
|
||||
The HANDBACK justifies this as "DTOs vs behavior — types are fine, behavior is the bright line." That's a defensible interpretation, but not what `CLAUDE.md` actually says.
|
||||
|
||||
**Two options the project can pick:**
|
||||
|
||||
- **(a) Accept the dependency and update `CLAUDE.md`** to clarify that the three-service boundary is about *behavior*, not *types* — `ExecRequest`/`ExecResponse`/`ExecError` are transport DTOs and crossing the wire is normal. This is the lower-friction choice and matches how the agent's instincts ran.
|
||||
- **(b) Refactor**: move `ExecRequest`/`ExecResponse`/`ExecError`/`InvocationType` to `shared`. About 200 lines of moves; would land cleanly as a follow-up PR.
|
||||
|
||||
**My recommendation: (a)**. The dispatcher genuinely needs to construct and interpret these types, and they're the natural "what the executor produces" surface — burying them in shared makes the executor's public API less self-contained. But the rule as currently written disagrees; we should pick one explicitly.
|
||||
|
||||
This is **not a merge blocker** for v1.1.1 — the implementation already exists and works. The CLAUDE.md update can land as a small commit on `main` after the merge.
|
||||
|
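For contrast with the direct import discussed in this section, the pattern the quoted `CLAUDE.md` rule prescribes (and that the `OutboxWriter` split in §3 item 6 follows) looks roughly like the sketch below. Modules stand in for crates; the `enqueue_http` method, the row fields, and the `Ingress` and `InMemoryOutbox` types are hypothetical.

```rust
/// Stand-in for the `shared` crate: only the trait and plain data live here.
pub mod shared {
    #[derive(Debug, Clone, PartialEq)]
    pub struct HttpOutboxRow {
        pub route_id: u64,
        pub body: String,
    }

    /// One-method trait, in the spirit of the small `OutboxWriter` the
    /// review describes. The method name and signature are assumptions.
    pub trait OutboxWriter: Send + Sync {
        fn enqueue_http(&self, row: HttpOutboxRow) -> Result<(), String>;
    }
}

/// Stand-in for manager-core: owns the concrete implementation.
pub mod manager_core {
    use super::shared::{HttpOutboxRow, OutboxWriter};
    use std::sync::Mutex;

    #[derive(Default)]
    pub struct InMemoryOutbox {
        pub rows: Mutex<Vec<HttpOutboxRow>>,
    }

    impl OutboxWriter for InMemoryOutbox {
        fn enqueue_http(&self, row: HttpOutboxRow) -> Result<(), String> {
            self.rows.lock().map_err(|e| e.to_string())?.push(row);
            Ok(())
        }
    }
}

/// Stand-in for orchestrator-core: sees only `shared`, never `manager_core`.
pub mod orchestrator_core {
    use super::shared::{HttpOutboxRow, OutboxWriter};
    use std::sync::Arc;

    pub struct Ingress {
        pub outbox: Arc<dyn OutboxWriter>,
    }

    impl Ingress {
        pub fn accept(&self, route_id: u64, body: &str) -> Result<(), String> {
            self.outbox.enqueue_http(HttpOutboxRow { route_id, body: body.to_string() })
        }
    }
}
```

Only the binary that wires the services together needs to name both `manager_core` and `orchestrator_core`. Option (b) above would apply the same move to the `Exec*` DTOs, while option (a) keeps them in executor-core and relaxes the rule for plain types.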
||||
## 5. Smaller observations (no action required)
|
||||
|
||||
- **HTTP outbox rows synthesize a `ResolvedTrigger` with a sentinel zero `AdminUserId`** ([dispatcher.rs:342](crates/manager-core/src/dispatcher.rs#L342)). The HANDBACK flags this as a code smell; I agree, but the cleaner shape (`enum DispatchTarget { Trigger(ResolvedTrigger), Http(HttpRoute) }`) is a refactor that doesn't belong in v1.1.1. Worth doing in v1.1.2 alongside the docs work since the dispatcher will gain another trigger kind.
|
||||
- **Triggers parent `dispatch_mode` defaults to `'async'`** ([0008_triggers.sql:30](crates/manager-core/migrations/0008_triggers.sql#L30)) with `sync` allowed by the CHECK constraint but unsupported in v1.1.1 (a sync trigger would mean firing inline with the originating mutation, which we don't do). The migration comment captures this; worth a future commit to either remove `'sync'` from the CHECK or repurpose it for `inline_pre_mutate` semantics if that ever makes sense. Not v1.1.1's problem.
|
||||
- **Metric counters are TODO** at three call sites (`picloud_trigger_depth_exceeded`, `picloud_dead_letter_handler_failures`, `picloud_abandoned_executions_total`). The events are logged via `tracing::warn`/`error` in the meantime. Per the prompt and roadmap, metrics surface is v1.1.7+. ✅
|
||||
- **Dispatcher tick cadence is 100ms with `CLAIM_BATCH = 8`**, serial per tick. The ExecutionGate bounds total concurrent executions globally, so parallelism within a tick is purely an optimization. Reasonable MVP choice; can parallelize later without changing semantics.
|
||||
- **Open Q1 in HANDBACK (claimed-rows-stuck-on-crash)** is a real cluster-mode concern, correctly out-of-scope for v1.1.1 (single dispatcher per process). Cluster mode adds a stale-claim sweeper — track for v1.3+.
|
||||
- **Open Q3 in HANDBACK (HTTP-triggered scripts run with `principal: None`)** is correct as-is. The "trigger executions inherit the registrant's principal" decision applies to triggers; HTTP routes have no registrant in that sense. Public HTTP is anonymous by design.
|
||||
|
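The `DispatchTarget` shape suggested in the first bullet of this section might look like the sketch below. The enum and variant names come from that bullet; the struct fields and the two helper methods are assumptions, chosen to encode rules stated elsewhere in this review (triggers run as the registrant's principal, HTTP rows carry none, rows with `reply_to` never retry, dead-letter handlers never retry).

```rust
/// Simplified stand-in: a plain integer replaces the real id types.
#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedTrigger {
    pub trigger_id: u64,
    /// Principal that registered the trigger (stand-in for `AdminUserId`).
    pub registered_by: u64,
    pub is_dead_letter_handler: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct HttpRoute {
    pub route_id: u64,
    /// Inbox id for sync HTTP; `None` for async dispatch.
    pub reply_to: Option<u64>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DispatchTarget {
    Trigger(ResolvedTrigger),
    Http(HttpRoute),
}

impl DispatchTarget {
    /// Principal the execution runs as. HTTP rows have none, so no sentinel
    /// zero id is needed.
    pub fn principal(&self) -> Option<u64> {
        match self {
            DispatchTarget::Trigger(t) => Some(t.registered_by),
            DispatchTarget::Http(_) => None,
        }
    }

    /// Whether a failed execution may be retried.
    pub fn may_retry(&self) -> bool {
        match self {
            DispatchTarget::Trigger(t) => !t.is_dead_letter_handler,
            DispatchTarget::Http(h) => h.reply_to.is_none(),
        }
    }
}
```

With the two cases separated, the type system rules out reading a principal from an HTTP row, which is what the sentinel value currently only prevents by convention.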
||||
## 6. Versioning audit
|
||||
|
||||
| File | Before | After | Status |
|---|---|---|---|
| Workspace `Cargo.toml` (workspace.package.version) | 1.1.0 | 1.1.1 | ✅ |
| SDK schema version (`shared/src/version.rs`) | 1.1 | 1.2 | ✅ correctly bumped — the SDK surface added `KvService` + `DeadLetterService` + `TriggerEvent` |
| Dashboard `package.json` | 0.6.0 | 0.7.0 | ✅ |
| Migrations | 0001..0006 | 0007..0012 added | ✅ sequential, no skips |
| CHANGELOG.md | not present | created at repo root | ✅ first entry covers v1.1.1 |
|
||||
|
||||
## 7. Manual smoke recommendation
|
||||
|
||||
The reviewer (you) does **not** need to run the manual end-to-end smoke before merging — the automated tests + the static review above cover the contracts. The smoke flow in HANDBACK §7 is worth running **after merge** as a release-validation step before tagging `v1.1.1` (if the project tags releases). Specifically:
|
||||
|
||||
1. `docker compose up -d` (fresh DB)
2. `cargo run -p picloud`
3. Create app + script-that-throws + KV trigger
4. Trigger a KV write → wait ~7s → confirm DL row appears
5. Dashboard: red badge on apps list, list view shows the row, Replay creates a new outbox row + dispatcher re-runs, Mark resolved sets `resolution = 'ignored'`
6. Async route test: `POST /work` with `dispatch_mode=async` route → expect 202 + JSON body
|
||||
|
||||
If any of those misbehave post-merge, revert is straightforward (12 commits ahead of main, and no dependents have pulled the changes yet).
|
||||
|
||||
## 8. Recommended next steps (post-merge)
|
||||
|
||||
1. **Merge** `feat/v1.1.1-storage-and-events` into `main` (fast-forward; branch is linear ahead).
|
||||
2. **Tag** `v1.1.1` if release tagging is the project convention (git log shows v1.1.0 had a release commit but I didn't see a tag — confirm with the project owner).
|
||||
3. **Small CLAUDE.md update** clarifying the three-service boundary's scope (types crossing is fine; behavior crossing is what's prohibited). One-paragraph change.
|
||||
4. **Pause** before dispatching the v1.1.2 (Documents) agent — the v1.1.1 work shipped substantial infrastructure that v1.1.2 will lean on, and there may be small lessons from the v1.1.1 implementation to fold into the v1.1.2 prompt (e.g., reaffirming the "manager-core depends on executor-core for DTOs" pattern explicitly so the docs agent doesn't second-guess it).
|
||||
|
||||
Branch is ready for merge. Verdict: **APPROVE**.
|
||||
@@ -14,7 +14,18 @@ picloud-shared.workspace = true
|
||||
serde.workspace = true
|
||||
serde_json.workspace = true
|
||||
thiserror.workspace = true
|
||||
tokio.workspace = true
|
||||
tracing.workspace = true
|
||||
uuid.workspace = true
|
||||
chrono.workspace = true
|
||||
rhai.workspace = true
|
||||
|
||||
# Stdlib utility modules — see crates/executor-core/src/sdk/stdlib/.
|
||||
regex.workspace = true
|
||||
rand.workspace = true
|
||||
base64.workspace = true
|
||||
hex.workspace = true
|
||||
percent-encoding.workspace = true
|
||||
|
||||
[dev-dependencies]
|
||||
async-trait.workspace = true
|
||||
|
||||
@@ -3,7 +3,9 @@ use std::sync::{Arc, Mutex};
|
||||
use std::time::Instant;
|
||||
|
||||
use chrono::Utc;
|
||||
use picloud_shared::{ScriptValidator, SdkCallCx, Services, ValidationError, SDK_VERSION};
|
||||
use picloud_shared::{
|
||||
ScriptValidator, SdkCallCx, Services, TriggerEvent, ValidationError, SDK_VERSION,
|
||||
};
|
||||
use rhai::{Dynamic, Engine as RhaiEngine, EvalAltResult, Map, Module, Scope};
|
||||
use serde_json::Value as Json;
|
||||
|
||||
@@ -75,6 +77,8 @@ impl Engine {
|
||||
request_id: req.request_id,
|
||||
trigger_depth: req.trigger_depth,
|
||||
root_execution_id: req.root_execution_id,
|
||||
is_dead_letter_handler: req.is_dead_letter_handler,
|
||||
event: req.event.clone(),
|
||||
});
|
||||
sdk::register_all(&mut engine, &self.services, cx);
|
||||
|
||||
@@ -143,6 +147,11 @@ fn build_engine(limits: Limits, logs: Option<Arc<Mutex<Vec<LogEntry>>>>) -> Rhai
|
||||
engine.register_static_module("log", build_log_module(logs).into());
|
||||
}
|
||||
|
||||
// Stateless utility modules — regex::/random::/time::/json::/base64::/
|
||||
// hex::/url::. Always registered, including in the parse-only validate
|
||||
// path, so script authors get consistent surface in both phases.
|
||||
sdk::stdlib::register_stdlib(&mut engine);
|
||||
|
||||
engine
|
||||
}
|
||||
|
||||
@@ -234,9 +243,82 @@ fn build_ctx_map(req: &ExecRequest) -> Map {
|
||||
request.insert("rest".into(), req.rest.clone().into());
|
||||
|
||||
ctx.insert("request".into(), request.into());
|
||||
|
||||
// Triggered invocations: surface the originating event as
|
||||
// `ctx.event`. Direct ingress (HTTP request, manual run) leaves
|
||||
// the key absent so scripts can test `if "event" in ctx`.
|
||||
if let Some(event) = req.event.as_ref() {
|
||||
ctx.insert("event".into(), trigger_event_to_dynamic(event));
|
||||
}
|
||||
|
||||
ctx
|
||||
}
|
||||
|
||||
/// Convert a `TriggerEvent` into the `ctx.event` Rhai shape defined in
|
||||
/// `docs/v1.1.x-design-notes.md` §4 (the dead-letter sub-shape) and
|
||||
/// §2/blueprint §9 (KV). Each variant becomes a Rhai map with a
|
||||
/// `source` discriminant plus per-source fields.
|
||||
fn trigger_event_to_dynamic(event: &TriggerEvent) -> Dynamic {
|
||||
let mut m = Map::new();
|
||||
m.insert("source".into(), event.source().into());
|
||||
match event {
|
||||
TriggerEvent::Kv {
|
||||
op,
|
||||
collection,
|
||||
key,
|
||||
value,
|
||||
} => {
|
||||
m.insert("op".into(), op.as_str().into());
|
||||
let mut kv_map = Map::new();
|
||||
kv_map.insert("collection".into(), collection.clone().into());
|
||||
kv_map.insert("key".into(), key.clone().into());
|
||||
kv_map.insert(
|
||||
"value".into(),
|
||||
value.clone().map_or(Dynamic::UNIT, json_to_dynamic),
|
||||
);
|
||||
m.insert("kv".into(), kv_map.into());
|
||||
}
|
||||
TriggerEvent::DeadLetter {
|
||||
dead_letter_id,
|
||||
original,
|
||||
attempts,
|
||||
last_error,
|
||||
trigger_id,
|
||||
script_id,
|
||||
first_attempt_at,
|
||||
last_attempt_at,
|
||||
} => {
|
||||
let mut dl = Map::new();
|
||||
dl.insert("id".into(), dead_letter_id.to_string().into());
|
||||
dl.insert("original".into(), trigger_event_to_dynamic(original));
|
||||
dl.insert("attempts".into(), i64::from(*attempts).into());
|
||||
dl.insert("last_error".into(), last_error.clone().into());
|
||||
dl.insert(
|
||||
"trigger_id".into(),
|
||||
trigger_id
|
||||
.map(|id| Dynamic::from(id.to_string()))
|
||||
.unwrap_or(Dynamic::UNIT),
|
||||
);
|
||||
dl.insert(
|
||||
"script_id".into(),
|
||||
script_id
|
||||
.map(|id| Dynamic::from(id.to_string()))
|
||||
.unwrap_or(Dynamic::UNIT),
|
||||
);
|
||||
dl.insert(
|
||||
"first_attempt_at".into(),
|
||||
first_attempt_at.to_rfc3339().into(),
|
||||
);
|
||||
dl.insert(
|
||||
"last_attempt_at".into(),
|
||||
last_attempt_at.to_rfc3339().into(),
|
||||
);
|
||||
m.insert("dead_letter".into(), dl.into());
|
||||
}
|
||||
}
|
||||
m.into()
|
||||
}
|
||||
|
||||
fn invocation_type_str(it: InvocationType) -> &'static str {
|
||||
match it {
|
||||
InvocationType::Http => "http",
|
||||
|
||||
84
crates/executor-core/src/sdk/dead_letters.rs
Normal file
@@ -0,0 +1,84 @@
|
||||
//! `dead_letters::` Rhai bridge.
|
||||
//!
|
||||
//! ```rhai
|
||||
//! dead_letters::replay("01234567-..."); // re-enqueue + mark replayed
|
||||
//! dead_letters::resolve("01234567-...", "ignored"); // close out the row
|
||||
//! ```
|
||||
//!
|
||||
//! Sync↔async via `Handle::try_current()` + `block_on(...)` — same pattern as
|
||||
//! the `kv::` bridge (works because `LocalExecutorClient` runs the
|
||||
//! script under `spawn_blocking`).
|
||||
//!
|
||||
//! `dead_letters::list(filter)` is intentionally NOT shipped — design
|
||||
//! notes §4 defers it to v1.2 to align with the `docs::find()` query
|
||||
//! DSL.
|
||||
|
||||
use std::str::FromStr;
|
||||
use std::sync::Arc;
|
||||
|
||||
use picloud_shared::{DeadLetterError, DeadLetterId, SdkCallCx, Services};
|
||||
use rhai::{Engine as RhaiEngine, EvalAltResult, Module};
|
||||
use tokio::runtime::Handle as TokioHandle;
|
||||
use uuid::Uuid;
|
||||
|
||||
pub(super) fn register(engine: &mut RhaiEngine, services: &Services, cx: Arc<SdkCallCx>) {
|
||||
let svc = services.dead_letters.clone();
|
||||
let mut module = Module::new();
|
||||
{
|
||||
let svc = svc.clone();
|
||||
let cx = cx.clone();
|
||||
module.set_native_fn(
|
||||
"replay",
|
||||
move |id: &str| -> Result<(), Box<EvalAltResult>> {
|
||||
let dl_id = parse_dl_id(id)?;
|
||||
let svc = svc.clone();
|
||||
let cx = cx.clone();
|
||||
block_on(async move { svc.replay(&cx, dl_id).await })
|
||||
},
|
||||
);
|
||||
}
|
||||
{
|
||||
let svc = svc.clone();
|
||||
let cx = cx.clone();
|
||||
module.set_native_fn(
|
||||
"resolve",
|
||||
move |id: &str, reason: &str| -> Result<(), Box<EvalAltResult>> {
|
||||
let dl_id = parse_dl_id(id)?;
|
||||
let reason = reason.to_string();
|
||||
let svc = svc.clone();
|
||||
let cx = cx.clone();
|
||||
block_on(async move { svc.resolve(&cx, dl_id, &reason).await })
|
||||
},
|
||||
);
|
||||
}
|
||||
engine.register_static_module("dead_letters", module.into());
|
||||
}
|
||||
|
||||
fn parse_dl_id(s: &str) -> Result<DeadLetterId, Box<EvalAltResult>> {
|
||||
Uuid::from_str(s)
|
||||
.map(DeadLetterId::from)
|
||||
.map_err(|e| -> Box<EvalAltResult> {
|
||||
EvalAltResult::ErrorRuntime(
|
||||
format!("dead_letters: invalid id {s:?}: {e}").into(),
|
||||
rhai::Position::NONE,
|
||||
)
|
||||
.into()
|
||||
})
|
||||
}
|
||||
|
||||
fn block_on<F>(fut: F) -> Result<(), Box<EvalAltResult>>
|
||||
where
|
||||
F: std::future::Future<Output = Result<(), DeadLetterError>> + Send,
|
||||
{
|
||||
let handle = TokioHandle::try_current().map_err(|e| -> Box<EvalAltResult> {
|
||||
EvalAltResult::ErrorRuntime(
|
||||
format!("dead_letters: no tokio runtime available: {e}").into(),
|
||||
rhai::Position::NONE,
|
||||
)
|
||||
.into()
|
||||
})?;
|
||||
handle.block_on(fut).map_err(|err| -> Box<EvalAltResult> {
|
||||
EvalAltResult::ErrorRuntime(format!("dead_letters: {err}").into(), rhai::Position::NONE)
|
||||
.into()
|
||||
})
|
||||
}
|
||||
193
crates/executor-core/src/sdk/kv.rs
Normal file
@@ -0,0 +1,193 @@
|
||||
//! `kv::` Rhai bridge — collection-scoped handle pattern.
|
||||
//!
|
||||
//! ```rhai
|
||||
//! let widgets = kv::collection("widgets");
|
||||
//! widgets.set("k", #{ n: 1 });
|
||||
//! let v = widgets.get("k"); // value or () if absent
|
||||
//! if widgets.has("k") { ... }
|
||||
//! widgets.delete("k"); // bool (was-present)
|
||||
//! let page = widgets.list(); // returns #{ keys: [...], next_cursor: () }
|
||||
//! ```
|
||||
//!
|
||||
//! The `KvHandle` custom Rhai type captures the collection name once
|
||||
//! and routes each call through the injected `Arc<dyn KvService>` with
|
||||
//! the per-call `Arc<SdkCallCx>`. **The service derives `app_id` from
|
||||
//! `cx.app_id` — `app_id` never appears in any function signature
|
||||
//! script-side, preserving cross-app isolation.**
|
||||
//!
|
||||
//! Sync↔async bridge: Rhai is synchronous; the underlying service is
|
||||
//! async. Closures wrap each call in `Handle::current().block_on(...)`
|
||||
//! — safe because `LocalExecutorClient` runs the script under
|
||||
//! `spawn_blocking`, so a runtime handle is reachable and blocking on
|
||||
//! it doesn't park an async worker.
|
||||
//!
|
||||
//! Error convention (per `docs/sdk-shape.md`):
|
||||
//! - throw on failure (Rhai runtime error string)
|
||||
//! - `()` for absent values (`get` on a missing key)
|
||||
//! - `bool` for predicates (`has`; also `delete` returns was-present)
|
||||
|
||||
use std::sync::Arc;
|
||||
|
||||
use picloud_shared::{KvError, KvService, SdkCallCx, Services};
|
||||
use rhai::{Array, Dynamic, Engine as RhaiEngine, EvalAltResult, Map, Module};
|
||||
use tokio::runtime::Handle as TokioHandle;
|
||||
|
||||
use super::bridge::{dynamic_to_json, json_to_dynamic};
|
||||
|
||||
/// Per-call handle captured by the Rhai SDK. Cheap to clone (two Arcs
|
||||
/// plus an owned string).
|
||||
#[derive(Clone)]
|
||||
pub struct KvHandle {
|
||||
collection: String,
|
||||
service: Arc<dyn KvService>,
|
||||
cx: Arc<SdkCallCx>,
|
||||
}
|
||||
|
||||
pub(super) fn register(engine: &mut RhaiEngine, services: &Services, cx: Arc<SdkCallCx>) {
|
||||
let kv_service = services.kv.clone();
|
||||
|
||||
// `kv::collection(name)` — handle constructor lives in the `kv`
|
||||
// static module so the script-visible call is `kv::collection(...)`.
|
||||
let mut module = Module::new();
|
||||
{
|
||||
let kv_service = kv_service.clone();
|
||||
let cx = cx.clone();
|
||||
module.set_native_fn(
|
||||
"collection",
|
||||
move |name: &str| -> Result<KvHandle, Box<EvalAltResult>> {
|
||||
if name.is_empty() {
|
||||
return Err("kv::collection name must not be empty".into());
|
||||
}
|
||||
Ok(KvHandle {
|
||||
collection: name.to_string(),
|
||||
service: kv_service.clone(),
|
||||
cx: cx.clone(),
|
||||
})
|
||||
},
|
||||
);
|
||||
}
|
||||
engine.register_static_module("kv", module.into());
|
||||
|
||||
// Methods on KvHandle — `register_fn` with `&mut KvHandle` first
|
||||
// argument lets Rhai dispatch them as `handle.get(k)` /
|
||||
// `handle.set(k, v)` / etc. through the dot-notation.
|
||||
engine.register_type_with_name::<KvHandle>("KvHandle");
|
||||
|
||||
register_get(engine);
|
||||
register_set(engine);
|
||||
register_has(engine);
|
||||
register_delete(engine);
|
||||
register_list(engine);
|
||||
}
|
||||
|
||||
fn register_get(engine: &mut RhaiEngine) {
|
||||
engine.register_fn(
|
||||
"get",
|
||||
|handle: &mut KvHandle, key: &str| -> Result<Dynamic, Box<EvalAltResult>> {
|
||||
let h = handle.clone();
|
||||
block_on(async move { h.service.get(&h.cx, &h.collection, key).await })
|
||||
.map(|opt| opt.map_or(Dynamic::UNIT, json_to_dynamic))
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
fn register_set(engine: &mut RhaiEngine) {
|
||||
engine.register_fn(
|
||||
"set",
|
||||
|handle: &mut KvHandle, key: &str, value: Dynamic| -> Result<(), Box<EvalAltResult>> {
|
||||
let h = handle.clone();
|
||||
let json = dynamic_to_json(&value);
|
||||
block_on(async move { h.service.set(&h.cx, &h.collection, key, json).await })
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
fn register_has(engine: &mut RhaiEngine) {
|
||||
engine.register_fn(
|
||||
"has",
|
||||
|handle: &mut KvHandle, key: &str| -> Result<bool, Box<EvalAltResult>> {
|
||||
let h = handle.clone();
|
||||
block_on(async move { h.service.has(&h.cx, &h.collection, key).await })
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
fn register_delete(engine: &mut RhaiEngine) {
|
||||
engine.register_fn(
|
||||
"delete",
|
||||
|handle: &mut KvHandle, key: &str| -> Result<bool, Box<EvalAltResult>> {
|
||||
let h = handle.clone();
|
||||
block_on(async move { h.service.delete(&h.cx, &h.collection, key).await })
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
fn register_list(engine: &mut RhaiEngine) {
|
||||
// Zero-arg form — full page, no cursor.
|
||||
engine.register_fn(
|
||||
"list",
|
||||
|handle: &mut KvHandle| -> Result<Map, Box<EvalAltResult>> { list_call(handle, None, 0) },
|
||||
);
|
||||
|
||||
// One-arg form — cursor only.
|
||||
engine.register_fn(
|
||||
"list",
|
||||
|handle: &mut KvHandle, cursor: &str| -> Result<Map, Box<EvalAltResult>> {
|
||||
list_call(handle, Some(cursor.to_string()), 0)
|
||||
},
|
||||
);
|
||||
|
||||
// Two-arg form — cursor + limit.
|
||||
engine.register_fn(
|
||||
"list",
|
||||
|handle: &mut KvHandle, cursor: &str, limit: i64| -> Result<Map, Box<EvalAltResult>> {
|
||||
let limit = u32::try_from(limit.max(0)).unwrap_or(0);
|
||||
list_call(handle, Some(cursor.to_string()), limit)
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
fn list_call(
|
||||
handle: &KvHandle,
|
||||
cursor: Option<String>,
|
||||
limit: u32,
|
||||
) -> Result<Map, Box<EvalAltResult>> {
|
||||
let h = handle.clone();
|
||||
let page = block_on(async move {
|
||||
h.service
|
||||
.list(&h.cx, &h.collection, cursor.as_deref(), limit)
|
||||
.await
|
||||
})?;
|
||||
let mut m = Map::new();
|
||||
let keys: Array = page.keys.into_iter().map(Dynamic::from).collect();
|
||||
m.insert("keys".into(), keys.into());
|
||||
m.insert(
|
||||
"next_cursor".into(),
|
||||
page.next_cursor.map_or(Dynamic::UNIT, Dynamic::from),
|
||||
);
|
||||
Ok(m)
|
||||
}
|
||||
|
||||
/// Run an async future inside the synchronous Rhai context.
|
||||
///
|
||||
/// `LocalExecutorClient` wraps script execution in `spawn_blocking`, so
|
||||
/// the current Tokio runtime is reachable via `Handle::current()`. We
|
||||
/// block on it directly; we are NOT calling this from an async task,
|
||||
/// so blocking is the correct primitive (`block_in_place` would also
|
||||
/// work, but we're already on a blocking worker).
|
||||
fn block_on<F, T>(fut: F) -> Result<T, Box<EvalAltResult>>
|
||||
where
|
||||
F: std::future::Future<Output = Result<T, KvError>> + Send,
|
||||
T: Send,
|
||||
{
|
||||
let handle = TokioHandle::try_current().map_err(|e| -> Box<EvalAltResult> {
|
||||
EvalAltResult::ErrorRuntime(
|
||||
format!("kv: no tokio runtime available: {e}").into(),
|
||||
rhai::Position::NONE,
|
||||
)
|
||||
.into()
|
||||
})?;
|
||||
handle.block_on(fut).map_err(|err| -> Box<EvalAltResult> {
|
||||
EvalAltResult::ErrorRuntime(format!("kv: {err}").into(), rhai::Position::NONE).into()
|
||||
})
|
||||
}
|
||||
@@ -13,6 +13,9 @@
|
||||
|
||||
pub mod bridge;
|
||||
pub mod cx;
|
||||
pub mod dead_letters;
|
||||
pub mod kv;
|
||||
pub mod stdlib;
|
||||
|
||||
pub use bridge::{dynamic_to_json, json_to_dynamic};
|
||||
pub use cx::SdkCallCx;
|
||||
@@ -26,14 +29,9 @@ use rhai::Engine as RhaiEngine;
|
||||
/// once per invocation, just after `build_engine` constructs the
|
||||
/// sandboxed Rhai engine and just before script compilation.
|
||||
///
|
||||
-/// v1.1.0 ships an intentionally empty body — the call site exists so
-/// future PRs (KV first) drop their registration logic here rather
-/// than reaching into `engine.rs::build_engine`. The signature is
-/// locked: subsequent PRs MUST keep the same parameter shape so that
-/// hosts don't have to re-thread the plumbing.
+/// v1.1.1 wires the first stateful service (KV). Subsequent PRs add a
+/// single `<service>::register(...)` line per service.
|
||||
pub fn register_all(engine: &mut RhaiEngine, services: &Services, cx: Arc<SdkCallCx>) {
|
||||
-    // Intentionally inert in v1.1.0. The unused-suppression below is a
-    // load-bearing placeholder: future PRs replace this `let _` with
-    // real `register_kv(engine, services, cx.clone())` calls etc.
-    let _ = (engine, services, cx);
+    kv::register(engine, services, cx.clone());
+    dead_letters::register(engine, services, cx);
|
||||
}
|
||||
|
||||
48
crates/executor-core/src/sdk/stdlib/base64.rs
Normal file
@@ -0,0 +1,48 @@
|
||||
//! `base64::` — standard and URL-safe Base64.
|
||||
//!
|
||||
//! Two encoders are exposed: standard alphabet with padding (`encode`/
|
||||
//! `decode`) and URL-safe alphabet without padding (`encode_url`/
|
||||
//! `decode_url`). Each encoder accepts both `String` and `Blob` inputs
|
||||
//! as separate Rhai overloads; decoders always return `Blob` — the
|
||||
//! caller knows whether the original bytes were textual.
|
||||
|
||||
use base64::engine::general_purpose::{STANDARD, URL_SAFE_NO_PAD};
|
||||
use base64::Engine as _;
|
||||
use rhai::{Blob, Engine as RhaiEngine, EvalAltResult, Module};
|
||||
|
||||
pub fn register(engine: &mut RhaiEngine) {
|
||||
let mut module = Module::new();
|
||||
|
||||
module.set_native_fn("encode", |s: &str| -> Result<String, Box<EvalAltResult>> {
|
||||
Ok(STANDARD.encode(s.as_bytes()))
|
||||
});
|
||||
module.set_native_fn("encode", |b: Blob| -> Result<String, Box<EvalAltResult>> {
|
||||
Ok(STANDARD.encode(&b))
|
||||
});
|
||||
module.set_native_fn("decode", |s: &str| -> Result<Blob, Box<EvalAltResult>> {
|
||||
STANDARD
|
||||
.decode(s)
|
||||
.map_err(|e| format!("base64::decode: {e}").into())
|
||||
});
|
||||
|
||||
module.set_native_fn(
|
||||
"encode_url",
|
||||
|s: &str| -> Result<String, Box<EvalAltResult>> {
|
||||
Ok(URL_SAFE_NO_PAD.encode(s.as_bytes()))
|
||||
},
|
||||
);
|
||||
module.set_native_fn(
|
||||
"encode_url",
|
||||
|b: Blob| -> Result<String, Box<EvalAltResult>> { Ok(URL_SAFE_NO_PAD.encode(&b)) },
|
||||
);
|
||||
module.set_native_fn(
|
||||
"decode_url",
|
||||
|s: &str| -> Result<Blob, Box<EvalAltResult>> {
|
||||
URL_SAFE_NO_PAD
|
||||
.decode(s)
|
||||
.map_err(|e| format!("base64::decode_url: {e}").into())
|
||||
},
|
||||
);
|
||||
|
||||
engine.register_static_module("base64", module.into());
|
||||
}
|
||||
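The difference between the two encoders registered above (standard alphabet with `=` padding versus URL-safe alphabet without padding) can be sketched with the standard library alone. This is an illustrative re-implementation, not the `base64` crate's code; the names `encode_with`, `encode` and `encode_url` are hypothetical stand-ins for the crate engines used in the module.

```rust
// Illustrative only: the real module delegates to the `base64` crate.
const STANDARD: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const URL_SAFE: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Encode `input` with the given 64-character alphabet, optionally padding
/// the final group with `=`.
fn encode_with(alphabet: &[u8; 64], pad: bool, input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into the top 24 bits of a u32.
        let mut n: u32 = 0;
        for (i, &b) in chunk.iter().enumerate() {
            n |= u32::from(b) << (16 - 8 * i);
        }
        // A chunk of k bytes carries k + 1 significant 6-bit groups.
        let sextets = chunk.len() + 1;
        for i in 0..4 {
            if i < sextets {
                let idx = ((n >> (18 - 6 * i)) & 0x3f) as usize;
                out.push(alphabet[idx] as char);
            } else if pad {
                out.push('=');
            }
        }
    }
    out
}

/// Standard alphabet, padded (the shape of `base64::encode` in scripts).
pub fn encode(input: &[u8]) -> String {
    encode_with(STANDARD, true, input)
}

/// URL-safe alphabet, unpadded (the shape of `base64::encode_url`).
pub fn encode_url(input: &[u8]) -> String {
    encode_with(URL_SAFE, false, input)
}
```

The two outputs differ only in the last two alphabet characters and in the trailing padding, which is why a value encoded with one variant generally cannot be decoded with the other.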
21
crates/executor-core/src/sdk/stdlib/hex.rs
Normal file
@@ -0,0 +1,21 @@
|
||||
//! `hex::` — hexadecimal encode/decode (lowercase output, case-
|
||||
//! insensitive input). String and Blob inputs are both accepted on
|
||||
//! encode; decode always returns `Blob`.
|
||||
|
||||
use rhai::{Blob, Engine as RhaiEngine, EvalAltResult, Module};
|
||||
|
||||
pub fn register(engine: &mut RhaiEngine) {
|
||||
let mut module = Module::new();
|
||||
|
||||
module.set_native_fn("encode", |s: &str| -> Result<String, Box<EvalAltResult>> {
|
||||
Ok(hex::encode(s.as_bytes()))
|
||||
});
|
||||
module.set_native_fn("encode", |b: Blob| -> Result<String, Box<EvalAltResult>> {
|
||||
Ok(hex::encode(&b))
|
||||
});
|
||||
module.set_native_fn("decode", |s: &str| -> Result<Blob, Box<EvalAltResult>> {
|
||||
hex::decode(s).map_err(|e| format!("hex::decode: {e}").into())
|
||||
});
|
||||
|
||||
engine.register_static_module("hex", module.into());
|
||||
}
|
||||
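The contract described in the `hex::` module doc (lowercase output, case-insensitive input, decode returning raw bytes) can be illustrated without the `hex` crate. The sketch below is a standard-library approximation under that assumption; `hex_encode` and `hex_decode` are hypothetical names, and the error messages are not the crate's.

```rust
/// Lowercase hex encoding of arbitrary bytes.
pub fn hex_encode(bytes: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push(DIGITS[(b >> 4) as usize] as char);
        out.push(DIGITS[(b & 0x0f) as usize] as char);
    }
    out
}

/// Case-insensitive hex decoding; rejects odd lengths and non-hex digits.
pub fn hex_decode(s: &str) -> Result<Vec<u8>, String> {
    let bytes = s.as_bytes();
    if bytes.len() % 2 != 0 {
        return Err(format!("odd length: {}", bytes.len()));
    }
    bytes
        .chunks(2)
        .map(|pair| {
            // `to_digit(16)` accepts both `a-f` and `A-F`.
            let hi = (pair[0] as char).to_digit(16);
            let lo = (pair[1] as char).to_digit(16);
            match (hi, lo) {
                (Some(h), Some(l)) => Ok((h * 16 + l) as u8),
                _ => Err(format!(
                    "invalid hex digit in {:?}",
                    String::from_utf8_lossy(pair)
                )),
            }
        })
        .collect()
}
```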
43
crates/executor-core/src/sdk/stdlib/json.rs
Normal file
@@ -0,0 +1,43 @@
|
||||
//! `json::` — JSON parse and stringify. Reuses the bridge functions in
|
||||
//! `crate::sdk::bridge` so script-visible JSON has the same shape
|
||||
//! (numbers, maps, arrays, nulls) as `ctx.request.body` already does.
|
||||
|
||||
use rhai::{Dynamic, Engine as RhaiEngine, EvalAltResult, Module};
|
||||
|
||||
use crate::sdk::bridge::{dynamic_to_json, json_to_dynamic};
|
||||
|
||||
pub fn register(engine: &mut RhaiEngine) {
|
||||
let mut module = Module::new();
|
||||
register_parse(&mut module);
|
||||
register_stringify(&mut module);
|
||||
register_stringify_pretty(&mut module);
|
||||
engine.register_static_module("json", module.into());
|
||||
}
|
||||
|
||||
fn register_parse(module: &mut Module) {
|
||||
module.set_native_fn("parse", |s: &str| -> Result<Dynamic, Box<EvalAltResult>> {
|
||||
let value: serde_json::Value =
|
||||
serde_json::from_str(s).map_err(|e| format!("json::parse: {e}"))?;
|
||||
Ok(json_to_dynamic(value))
|
||||
});
|
||||
}
|
||||
|
||||
fn register_stringify(module: &mut Module) {
|
||||
module.set_native_fn(
|
||||
"stringify",
|
||||
|v: Dynamic| -> Result<String, Box<EvalAltResult>> {
|
||||
serde_json::to_string(&dynamic_to_json(&v))
|
||||
.map_err(|e| format!("json::stringify: {e}").into())
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
fn register_stringify_pretty(module: &mut Module) {
|
||||
module.set_native_fn(
|
||||
"stringify_pretty",
|
||||
|v: Dynamic| -> Result<String, Box<EvalAltResult>> {
|
||||
serde_json::to_string_pretty(&dynamic_to_json(&v))
|
||||
.map_err(|e| format!("json::stringify_pretty: {e}").into())
|
||||
},
|
||||
);
|
||||
}
|
||||
25
crates/executor-core/src/sdk/stdlib/mod.rs
Normal file
@@ -0,0 +1,25 @@
|
||||
//! Stateless utility modules registered once at engine build via
|
||||
//! `Engine::register_static_module`. They have no per-call state, no
|
||||
//! cross-app sensitivity, and no `SdkCallCx` — distinguishing them
|
||||
//! from stateful service modules (KV, docs, …) which hook into
|
||||
//! `sdk::register_all` instead. See [docs/sdk-shape.md](../../../../../docs/sdk-shape.md).
|
||||
|
||||
use rhai::Engine as RhaiEngine;

pub mod base64;
pub mod hex;
pub mod json;
pub mod random;
pub mod regex;
pub mod time;
pub mod url;

pub fn register_stdlib(engine: &mut RhaiEngine) {
    regex::register(engine);
    random::register(engine);
    time::register(engine);
    json::register(engine);
    base64::register(engine);
    hex::register(engine);
    url::register(engine);
}
|
||||
70
crates/executor-core/src/sdk/stdlib/random.rs
Normal file
@@ -0,0 +1,70 @@
|
||||
//! `random::` — CSPRNG primitives (`rand::rngs::OsRng`).
|
||||
//!
|
||||
//! Only the OS RNG is exposed. No "fast non-crypto" variant — scripts
|
||||
//! should not pick between secure and insecure entropy. Output sizes
|
||||
//! are capped to keep a single script call from blowing host memory.
|
||||
|
||||
use rand::distributions::{Alphanumeric, DistString};
|
||||
use rand::{rngs::OsRng, Rng, RngCore};
|
||||
use rhai::{Blob, Engine as RhaiEngine, EvalAltResult, Module};
|
||||
use uuid::Uuid;
|
||||
|
||||
const MAX_BYTES: i64 = 65_536;
|
||||
const MAX_STRING: i64 = 4_096;
|
||||
|
||||
pub fn register(engine: &mut RhaiEngine) {
|
||||
let mut module = Module::new();
|
||||
register_int(&mut module);
|
||||
register_float(&mut module);
|
||||
register_bytes(&mut module);
|
||||
register_string(&mut module);
|
||||
register_uuid(&mut module);
|
||||
engine.register_static_module("random", module.into());
|
||||
}
|
||||
|
||||
fn register_int(module: &mut Module) {
|
||||
module.set_native_fn(
|
||||
"int",
|
||||
|min: i64, max: i64| -> Result<i64, Box<EvalAltResult>> {
|
||||
if min > max {
|
||||
return Err(format!("random::int: min ({min}) > max ({max})").into());
|
||||
}
|
||||
Ok(OsRng.gen_range(min..=max))
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
fn register_float(module: &mut Module) {
|
||||
module.set_native_fn("float", || -> Result<f64, Box<EvalAltResult>> {
|
||||
Ok(OsRng.gen::<f64>())
|
||||
});
|
||||
}
|
||||
|
||||
fn register_bytes(module: &mut Module) {
|
||||
module.set_native_fn("bytes", |n: i64| -> Result<Blob, Box<EvalAltResult>> {
|
||||
if !(0..=MAX_BYTES).contains(&n) {
|
||||
return Err(format!("random::bytes: n must be in 0..={MAX_BYTES}, got {n}").into());
|
||||
}
|
||||
// Safe: n is non-negative and bounded by MAX_BYTES, which fits in usize.
|
||||
let len = usize::try_from(n).expect("n bounded above by MAX_BYTES");
|
||||
let mut buf = vec![0u8; len];
|
||||
OsRng.fill_bytes(&mut buf);
|
||||
Ok(buf)
|
||||
});
|
||||
}
|
||||
|
||||
fn register_string(module: &mut Module) {
|
||||
module.set_native_fn("string", |n: i64| -> Result<String, Box<EvalAltResult>> {
|
||||
if !(0..=MAX_STRING).contains(&n) {
|
||||
return Err(format!("random::string: n must be in 0..={MAX_STRING}, got {n}").into());
|
||||
}
|
||||
let len = usize::try_from(n).expect("n bounded above by MAX_STRING");
|
||||
Ok(Alphanumeric.sample_string(&mut OsRng, len))
|
||||
});
|
||||
}
|
||||
|
||||
fn register_uuid(module: &mut Module) {
|
||||
module.set_native_fn("uuid", || -> Result<String, Box<EvalAltResult>> {
|
||||
Ok(Uuid::new_v4().to_string())
|
||||
});
|
||||
}
|
||||
105
crates/executor-core/src/sdk/stdlib/regex.rs
Normal file
@@ -0,0 +1,105 @@
|
||||
//! `regex::` — non-backtracking regular expressions (Rust `regex` crate).
|
||||
//!
|
||||
//! Patterns compile per call. No cache: premature for v1.1.0, and the
//! `regex` crate's linear-time guarantees keep per-call cost bounded.
//! Patterns that backtrack catastrophically in other engines stay
//! linear-time here, and the crate rejects unsupported syntax
//! (backreferences, look-around) and over-sized compiled programs at
//! compile time, so no extra defense is needed.
|
||||
|
||||
use regex::Regex;
|
||||
use rhai::{Array, Dynamic, Engine as RhaiEngine, EvalAltResult, Module};
|
||||
|
||||
pub fn register(engine: &mut RhaiEngine) {
|
||||
let mut module = Module::new();
|
||||
register_is_match(&mut module);
|
||||
register_find(&mut module);
|
||||
register_find_all(&mut module);
|
||||
register_replace(&mut module);
|
||||
register_replace_all(&mut module);
|
||||
register_split(&mut module);
|
||||
register_captures(&mut module);
|
||||
engine.register_static_module("regex", module.into());
|
||||
}
|
||||
|
||||
fn compile(pattern: &str) -> Result<Regex, Box<EvalAltResult>> {
|
||||
Regex::new(pattern).map_err(|e| format!("invalid regex: {e}").into())
|
||||
}
|
||||
|
||||
fn register_is_match(module: &mut Module) {
|
||||
module.set_native_fn(
|
||||
"is_match",
|
||||
|pattern: &str, text: &str| -> Result<bool, Box<EvalAltResult>> {
|
||||
Ok(compile(pattern)?.is_match(text))
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
fn register_find(module: &mut Module) {
|
||||
module.set_native_fn(
|
||||
"find",
|
||||
|pattern: &str, text: &str| -> Result<Dynamic, Box<EvalAltResult>> {
|
||||
Ok(compile(pattern)?
|
||||
.find(text)
|
||||
.map_or(Dynamic::UNIT, |m| Dynamic::from(m.as_str().to_string())))
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
fn register_find_all(module: &mut Module) {
|
||||
module.set_native_fn(
|
||||
"find_all",
|
||||
|pattern: &str, text: &str| -> Result<Array, Box<EvalAltResult>> {
|
||||
Ok(compile(pattern)?
|
||||
.find_iter(text)
|
||||
.map(|m| Dynamic::from(m.as_str().to_string()))
|
||||
.collect())
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
fn register_replace(module: &mut Module) {
|
||||
module.set_native_fn(
|
||||
"replace",
|
||||
|pattern: &str, text: &str, replacement: &str| -> Result<String, Box<EvalAltResult>> {
|
||||
Ok(compile(pattern)?.replace(text, replacement).into_owned())
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
fn register_replace_all(module: &mut Module) {
|
||||
module.set_native_fn(
|
||||
"replace_all",
|
||||
|pattern: &str, text: &str, replacement: &str| -> Result<String, Box<EvalAltResult>> {
|
||||
Ok(compile(pattern)?
|
||||
.replace_all(text, replacement)
|
||||
.into_owned())
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
fn register_split(module: &mut Module) {
|
||||
module.set_native_fn(
|
||||
"split",
|
||||
|pattern: &str, text: &str| -> Result<Array, Box<EvalAltResult>> {
|
||||
Ok(compile(pattern)?
|
||||
.split(text)
|
||||
.map(|s| Dynamic::from(s.to_string()))
|
||||
.collect())
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
fn register_captures(module: &mut Module) {
|
||||
module.set_native_fn(
|
||||
"captures",
|
||||
|pattern: &str, text: &str| -> Result<Dynamic, Box<EvalAltResult>> {
|
||||
let re = compile(pattern)?;
|
||||
Ok(re.captures(text).map_or(Dynamic::UNIT, |caps| {
|
||||
let arr: Array = caps
|
||||
.iter()
|
||||
.map(|m| m.map_or(Dynamic::UNIT, |m| Dynamic::from(m.as_str().to_string())))
|
||||
.collect();
|
||||
Dynamic::from(arr)
|
||||
}))
|
||||
},
|
||||
);
|
||||
}
|
||||
68
crates/executor-core/src/sdk/stdlib/time.rs
Normal file
@@ -0,0 +1,68 @@
|
||||
//! `time::` — UTC time. The canonical "time value" is milliseconds
|
||||
//! since the Unix epoch as `i64`. ISO 8601 strings are for parsing and
|
||||
//! display only. UTC only — no timezone support in v1.1.0 (would pull
|
||||
//! in chrono-tz, deferred until a real use case demands it).
|
||||
|
||||
use chrono::{DateTime, SecondsFormat, Utc};
|
||||
use rhai::{Engine as RhaiEngine, EvalAltResult, Module};
|
||||
|
||||
pub fn register(engine: &mut RhaiEngine) {
|
||||
let mut module = Module::new();
|
||||
register_now(&mut module);
|
||||
register_now_ms(&mut module);
|
||||
register_parse(&mut module);
|
||||
register_format(&mut module);
|
||||
register_add_seconds(&mut module);
|
||||
register_diff_seconds(&mut module);
|
||||
engine.register_static_module("time", module.into());
|
||||
}
|
||||
|
||||
fn register_now(module: &mut Module) {
|
||||
module.set_native_fn("now", || -> Result<String, Box<EvalAltResult>> {
|
||||
Ok(Utc::now().to_rfc3339_opts(SecondsFormat::Millis, true))
|
||||
});
|
||||
}
|
||||
|
||||
fn register_now_ms(module: &mut Module) {
|
||||
module.set_native_fn("now_ms", || -> Result<i64, Box<EvalAltResult>> {
|
||||
Ok(Utc::now().timestamp_millis())
|
||||
});
|
||||
}
|
||||
|
||||
fn register_parse(module: &mut Module) {
|
||||
module.set_native_fn("parse", |iso: &str| -> Result<i64, Box<EvalAltResult>> {
|
||||
DateTime::parse_from_rfc3339(iso)
|
||||
.map(|dt| dt.timestamp_millis())
|
||||
.map_err(|e| format!("time::parse: invalid ISO 8601 / RFC 3339: {e}").into())
|
||||
});
|
||||
}
|
||||
|
||||
fn register_format(module: &mut Module) {
|
||||
module.set_native_fn("format", |ms: i64| -> Result<String, Box<EvalAltResult>> {
|
||||
DateTime::<Utc>::from_timestamp_millis(ms)
|
||||
.map(|dt| dt.to_rfc3339_opts(SecondsFormat::Millis, true))
|
||||
.ok_or_else(|| format!("time::format: ms ({ms}) out of representable range").into())
|
||||
});
|
||||
}
|
||||
|
||||
fn register_add_seconds(module: &mut Module) {
|
||||
module.set_native_fn(
|
||||
"add_seconds",
|
||||
|ms: i64, secs: i64| -> Result<i64, Box<EvalAltResult>> {
|
||||
secs.checked_mul(1000)
|
||||
.and_then(|delta| ms.checked_add(delta))
|
||||
.ok_or_else(|| format!("time::add_seconds: overflow (ms={ms}, secs={secs})").into())
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
fn register_diff_seconds(module: &mut Module) {
|
||||
module.set_native_fn(
|
||||
"diff_seconds",
|
||||
|a_ms: i64, b_ms: i64| -> Result<i64, Box<EvalAltResult>> {
|
||||
b_ms.checked_sub(a_ms)
|
||||
.map(|d| d / 1000)
|
||||
.ok_or_else(|| format!("time::diff_seconds: overflow (a={a_ms}, b={b_ms})").into())
|
||||
},
|
||||
);
|
||||
}
|
||||
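The overflow handling in `add_seconds` and `diff_seconds` is easier to see when the arithmetic is lifted out of the Rhai closures. The sketch below mirrors the same checked operations as plain functions returning `Result<i64, String>`; the free-function form and the `String` error type are assumptions made for illustration, not part of the module.

```rust
/// Add `secs` seconds to a millisecond timestamp, failing on i64 overflow
/// in either the multiplication or the addition.
pub fn add_seconds(ms: i64, secs: i64) -> Result<i64, String> {
    secs.checked_mul(1000)
        .and_then(|delta| ms.checked_add(delta))
        .ok_or_else(|| format!("time::add_seconds: overflow (ms={ms}, secs={secs})"))
}

/// Whole seconds from `a_ms` to `b_ms`. Integer division truncates toward
/// zero, so sub-second remainders are dropped for both signs.
pub fn diff_seconds(a_ms: i64, b_ms: i64) -> Result<i64, String> {
    b_ms.checked_sub(a_ms)
        .map(|d| d / 1000)
        .ok_or_else(|| format!("time::diff_seconds: overflow (a={a_ms}, b={b_ms})"))
}
```

Note the argument order of `diff_seconds`: the result is positive when `b_ms` is later than `a_ms`.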
64
crates/executor-core/src/sdk/stdlib/url.rs
Normal file
@@ -0,0 +1,64 @@
|
||||
//! `url::` — RFC 3986 percent-encoding.
|
||||
//!
|
||||
//! `encode`/`decode` operate on opaque component values; `encode_query`
|
||||
//! builds an `application/x-www-form-urlencoded`-style query string
|
||||
//! from a Rhai `Map`. Key ordering is the map's natural order (Rhai's
|
||||
//! `Map` is a `BTreeMap`, so keys come out alphabetically — fine for
|
||||
//! query strings, which RFC 3986 leaves unordered).
|
||||
|
||||
use percent_encoding::{percent_decode_str, utf8_percent_encode, AsciiSet, NON_ALPHANUMERIC};
|
||||
use rhai::{Engine as RhaiEngine, EvalAltResult, Map, Module};
|
||||
|
||||
/// Percent-encode set: every ASCII byte except the RFC 3986 unreserved
/// characters (`A-Z / a-z / 0-9 / - / _ / . / ~`). Despite the name,
/// the constant lists the bytes that DO get encoded.
|
||||
const UNRESERVED: &AsciiSet = &NON_ALPHANUMERIC
|
||||
.remove(b'-')
|
||||
.remove(b'_')
|
||||
.remove(b'.')
|
||||
.remove(b'~');
|
||||
|
||||
pub fn register(engine: &mut RhaiEngine) {
|
||||
let mut module = Module::new();
|
||||
register_encode(&mut module);
|
||||
register_decode(&mut module);
|
||||
register_encode_query(&mut module);
|
||||
engine.register_static_module("url", module.into());
|
||||
}
|
||||
|
||||
fn register_encode(module: &mut Module) {
|
||||
module.set_native_fn("encode", |s: &str| -> Result<String, Box<EvalAltResult>> {
|
||||
Ok(utf8_percent_encode(s, UNRESERVED).to_string())
|
||||
});
|
||||
}
|
||||
|
||||
fn register_decode(module: &mut Module) {
|
||||
module.set_native_fn("decode", |s: &str| -> Result<String, Box<EvalAltResult>> {
|
||||
percent_decode_str(s)
|
||||
.decode_utf8()
|
||||
.map(std::borrow::Cow::into_owned)
|
||||
.map_err(|e| format!("url::decode: invalid UTF-8: {e}").into())
|
||||
});
|
||||
}
|
||||
|
||||
fn register_encode_query(module: &mut Module) {
|
||||
module.set_native_fn(
|
||||
"encode_query",
|
||||
|m: Map| -> Result<String, Box<EvalAltResult>> {
|
||||
let mut out = String::new();
|
||||
for (k, v) in m {
|
||||
if !out.is_empty() {
|
||||
out.push('&');
|
||||
}
|
||||
out.push_str(&utf8_percent_encode(&k, UNRESERVED).to_string());
|
||||
out.push('=');
|
||||
// Coerce values via `to_string` rather than throwing on
|
||||
// non-strings — scripts commonly pass numbers/bools here
|
||||
// and a forced cast at the call site is friction with
|
||||
// no upside.
|
||||
let value = v.to_string();
|
||||
out.push_str(&utf8_percent_encode(&value, UNRESERVED).to_string());
|
||||
}
|
||||
Ok(out)
|
||||
},
|
||||
);
|
||||
}
|
||||
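The encoding rule used by `url::encode` and `url::encode_query` (leave the RFC 3986 unreserved characters alone, percent-encode every other byte, emit map entries in key order) can be sketched with the standard library only. This is an approximation for illustration: `is_unreserved`, `encode` and `encode_query` are hypothetical free functions, and values are assumed to be strings already rather than Rhai `Dynamic`s coerced with `to_string`.

```rust
use std::collections::BTreeMap;

/// RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
fn is_unreserved(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_' | b'.' | b'~')
}

/// Percent-encode every byte of the UTF-8 input that is not unreserved.
pub fn encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for &b in s.as_bytes() {
        if is_unreserved(b) {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{b:02X}"));
        }
    }
    out
}

/// Join `key=value` pairs with `&`. A `BTreeMap` iterates in sorted key
/// order, which matches the ordering the module doc describes.
pub fn encode_query(params: &BTreeMap<String, String>) -> String {
    params
        .iter()
        .map(|(k, v)| format!("{}={}", encode(k), encode(v)))
        .collect::<Vec<_>>()
        .join("&")
}
```

One consequence worth noting: a space becomes `%20` here, not `+`, so the output is form-urlencoded in style only.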
@@ -1,7 +1,9 @@
|
||||
use std::collections::BTreeMap;
|
||||
|
||||
use chrono::{DateTime, Utc};
|
||||
-use picloud_shared::{AppId, ExecutionId, Principal, RequestId, ScriptId, ScriptSandbox};
+use picloud_shared::{
+    AppId, ExecutionId, Principal, RequestId, ScriptId, ScriptSandbox, TriggerEvent,
+};
|
||||
use serde::{Deserialize, Serialize};
|
||||
use thiserror::Error;
|
||||
|
||||
@@ -79,6 +81,20 @@ pub struct ExecRequest {
|
||||
/// `execution_id` for direct invocations; preserves the root
|
||||
/// across fan-out for audit log grouping.
|
||||
pub root_execution_id: ExecutionId,
|
||||
|
||||
/// `true` only when the dispatcher resolved this invocation
|
||||
/// against a `dead_letter` trigger. The retry / dead-letter
|
||||
/// machinery short-circuits when this is set so handler failures
|
||||
/// cannot themselves be dead-lettered (design notes §4
|
||||
/// recursion-stop rule).
|
||||
#[serde(default)]
|
||||
pub is_dead_letter_handler: bool,
|
||||
|
||||
/// The originating event for a triggered invocation. `None` for
|
||||
/// direct ingress (sync HTTP, manual admin run). Flattened into
|
||||
/// `ctx.event` by the executor's per-call ctx builder.
|
||||
#[serde(default)]
|
||||
pub event: Option<TriggerEvent>,
|
||||
}
|
||||
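The recursion-stop rule documented on `is_dead_letter_handler` can be pictured as a small decision function. The sketch below is hypothetical: the diff does not show the dispatcher's retry code, so `FailureAction`, `on_failure`, the attempt counters and the choice to drop a failed handler run are all assumptions made for illustration.

```rust
/// What a dispatcher might do after a handler invocation fails.
/// Hypothetical type; not taken from the PiCloud source.
#[derive(Debug, PartialEq, Eq)]
pub enum FailureAction {
    /// Schedule another attempt.
    Retry { next_attempt: u32 },
    /// Attempts exhausted: hand the event to a dead-letter trigger.
    DeadLetter,
    /// Failure of a dead-letter handler itself: stop here.
    Drop,
}

/// Decide the follow-up for a failed invocation. The flag is checked first
/// so that a failing dead-letter handler can never be retried or
/// dead-lettered again, which is what breaks the recursion.
pub fn on_failure(
    attempt: u32,
    max_attempts: u32,
    is_dead_letter_handler: bool,
) -> FailureAction {
    if is_dead_letter_handler {
        return FailureAction::Drop;
    }
    if attempt < max_attempts {
        FailureAction::Retry { next_attempt: attempt + 1 }
    } else {
        FailureAction::DeadLetter
    }
}
```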
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
|
||||
@@ -1,7 +1,9 @@
|
||||
use std::collections::BTreeMap;
|
||||
|
||||
use picloud_executor_core::{Engine, ExecError, ExecRequest, InvocationType, Limits, LogLevel};
|
||||
-use picloud_shared::{AppId, ExecutionId, RequestId, ScriptId, ScriptSandbox, Services};
+use picloud_shared::{
+    AppId, ExecutionId, KvEventOp, RequestId, ScriptId, ScriptSandbox, Services, TriggerEvent,
+};
|
||||
use serde_json::json;
|
||||
|
||||
fn req(body: serde_json::Value) -> ExecRequest {
|
||||
@@ -23,11 +25,13 @@ fn req(body: serde_json::Value) -> ExecRequest {
|
||||
principal: None,
|
||||
trigger_depth: 0,
|
||||
root_execution_id: execution_id,
|
||||
is_dead_letter_handler: false,
|
||||
event: None,
|
||||
}
|
||||
}
|
||||
|
||||
fn engine() -> Engine {
|
||||
-    Engine::new(Limits::default(), Services::new())
+    Engine::new(Limits::default(), Services::default())
|
||||
}
|
||||
|
||||
#[test]
|
||||
@@ -126,7 +130,7 @@ fn enforces_operation_budget() {
|
||||
max_operations: 1_000,
|
||||
..Limits::default()
|
||||
};
|
||||
-    let engine = Engine::new(limits, Services::new());
+    let engine = Engine::new(limits, Services::default());
|
||||
// 10_000 iterations vastly exceeds 1_000 ops.
|
||||
let src = r"let n = 0; for i in 0..10000 { n += 1; } n";
|
||||
let err = engine
|
||||
@@ -235,3 +239,67 @@ fn body_passes_through_nested_json_round_trip() {
|
||||
let resp = engine().execute(src, req(body.clone())).unwrap();
|
||||
assert_eq!(resp.body, body);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn ctx_event_absent_for_direct_invocations() {
|
||||
// Scripts not fired through the triggers framework see no
|
||||
// `ctx.event` key — they can use `"event" in ctx` to detect.
|
||||
let src = r#"
|
||||
if "event" in ctx { #{ statusCode: 500, body: "should be absent" } }
|
||||
else { "absent" }
|
||||
"#;
|
||||
let resp = engine().execute(src, req(json!(null))).unwrap();
|
||||
assert_eq!(resp.body, json!("absent"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn ctx_event_kv_shape_matches_design_notes() {
|
||||
// Build an ExecRequest mimicking what the dispatcher hands a
|
||||
// KV-triggered handler — `event = Some(TriggerEvent::Kv { … })`.
|
||||
let mut r = req(json!(null));
|
||||
r.event = Some(TriggerEvent::Kv {
|
||||
op: KvEventOp::Insert,
|
||||
collection: "widgets".into(),
|
||||
key: "k1".into(),
|
||||
value: Some(json!({ "n": 1 })),
|
||||
});
|
||||
let src = r"
|
||||
#{
|
||||
source: ctx.event.source,
|
||||
op: ctx.event.op,
|
||||
collection: ctx.event.kv.collection,
|
||||
key: ctx.event.kv.key,
|
||||
value: ctx.event.kv.value
|
||||
}
|
||||
";
|
||||
let resp = engine().execute(src, r).unwrap();
|
||||
assert_eq!(
|
||||
resp.body,
|
||||
json!({
|
||||
"source": "kv",
|
||||
"op": "insert",
|
||||
"collection": "widgets",
|
||||
"key": "k1",
|
||||
"value": { "n": 1 }
|
||||
})
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn ctx_event_kv_delete_has_unit_value() {
|
||||
let mut r = req(json!(null));
|
||||
r.event = Some(TriggerEvent::Kv {
|
||||
op: KvEventOp::Delete,
|
||||
collection: "widgets".into(),
|
||||
key: "k1".into(),
|
||||
value: None,
|
||||
});
|
||||
let src = r"
|
||||
#{
|
||||
op: ctx.event.op,
|
||||
value_is_unit: ctx.event.kv.value == ()
|
||||
}
|
||||
";
|
||||
let resp = engine().execute(src, r).unwrap();
|
||||
assert_eq!(resp.body, json!({ "op": "delete", "value_is_unit": true }));
|
||||
}
|
||||
|
||||
@@ -31,7 +31,7 @@ use serde_json::{json, Value};
|
||||
// ----------------------------------------------------------------------------
|
||||
|
||||
fn engine() -> Engine {
|
||||
-    Engine::new(Limits::default(), Services::new())
+    Engine::new(Limits::default(), Services::default())
|
||||
}
|
||||
|
||||
fn baseline_request() -> ExecRequest {
|
||||
@@ -53,6 +53,8 @@ fn baseline_request() -> ExecRequest {
|
||||
principal: None,
|
||||
trigger_depth: 0,
|
||||
root_execution_id: execution_id,
|
||||
is_dead_letter_handler: false,
|
||||
event: None,
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
260
crates/executor-core/tests/sdk_kv.rs
Normal file
@@ -0,0 +1,260 @@
|
||||
//! `kv::` SDK bridge integration tests — runs a real Rhai engine
|
||||
//! against an in-memory `KvService` impl. Mirrors how
|
||||
//! `orchestrator-core::LocalExecutorClient` invokes the engine: under
|
||||
//! `tokio::task::spawn_blocking` so the bridge's `block_on` has a
|
||||
//! reachable runtime.
|
||||
|
||||
use std::collections::{BTreeMap, HashMap};
|
||||
use std::sync::Arc;
|
||||
|
||||
use async_trait::async_trait;
|
||||
use picloud_executor_core::{Engine, ExecRequest, InvocationType, Limits};
|
||||
use picloud_shared::{
|
||||
AppId, ExecutionId, KvError, KvListPage, KvService, NoopDeadLetterService, NoopEventEmitter,
|
||||
RequestId, ScriptId, ScriptSandbox, SdkCallCx, Services,
|
||||
};
|
||||
use serde_json::{json, Value};
|
||||
use tokio::sync::Mutex;
|
||||
|
||||
#[derive(Default)]
|
||||
struct InMemoryKv {
|
||||
data: Mutex<HashMap<(AppId, String, String), Value>>,
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl KvService for InMemoryKv {
|
||||
async fn get(
|
||||
&self,
|
||||
cx: &SdkCallCx,
|
||||
collection: &str,
|
||||
key: &str,
|
||||
) -> Result<Option<Value>, KvError> {
|
||||
Ok(self
|
||||
.data
|
||||
.lock()
|
||||
.await
|
||||
.get(&(cx.app_id, collection.to_string(), key.to_string()))
|
||||
.cloned())
|
||||
}
|
||||
|
||||
async fn set(
|
||||
&self,
|
||||
cx: &SdkCallCx,
|
||||
collection: &str,
|
||||
key: &str,
|
||||
value: Value,
|
||||
) -> Result<(), KvError> {
|
||||
self.data
|
||||
.lock()
|
||||
.await
|
||||
.insert((cx.app_id, collection.to_string(), key.to_string()), value);
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn delete(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Result<bool, KvError> {
|
||||
Ok(self
|
||||
.data
|
||||
.lock()
|
||||
.await
|
||||
.remove(&(cx.app_id, collection.to_string(), key.to_string()))
|
||||
.is_some())
|
||||
}
|
||||
|
||||
async fn has(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Result<bool, KvError> {
|
||||
Ok(self.data.lock().await.contains_key(&(
|
||||
cx.app_id,
|
||||
collection.to_string(),
|
||||
key.to_string(),
|
||||
)))
|
||||
}
|
||||
|
||||
async fn list(
|
||||
&self,
|
||||
cx: &SdkCallCx,
|
||||
collection: &str,
|
||||
cursor: Option<&str>,
|
||||
limit: u32,
|
||||
) -> Result<KvListPage, KvError> {
|
||||
let data = self.data.lock().await;
|
||||
let mut keys: Vec<String> = data
|
||||
.iter()
|
||||
.filter(|((a, c, _), _)| *a == cx.app_id && c == collection)
|
||||
.map(|((_, _, k), _)| k.clone())
|
||||
.filter(|k| cursor.is_none_or(|c| k.as_str() > c))
|
||||
.collect();
|
||||
keys.sort();
|
||||
let take = if limit == 0 {
|
||||
usize::MAX
|
||||
} else {
|
||||
limit as usize
|
||||
};
|
||||
let next_cursor = if keys.len() > take {
|
||||
keys.truncate(take);
|
||||
keys.last().cloned()
|
||||
} else {
|
||||
None
|
||||
};
|
||||
Ok(KvListPage { keys, next_cursor })
|
||||
}
|
||||
}
|
||||
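The `list` implementation of the in-memory test double pages by key: keep keys strictly greater than the cursor, sort, truncate to the limit, and return the last key of the page as the next cursor only when more keys remain. A synchronous sketch of that logic is shown below. `Page` and `list_page` are hypothetical names, and the cursor here is the raw last key; the production service is described as using opaque cursors, which this sketch does not model.

```rust
/// One page of keys plus the cursor for the following page, if any.
pub struct Page {
    pub keys: Vec<String>,
    pub next_cursor: Option<String>,
}

/// Keyset pagination over an unordered key list. `limit == 0` is treated
/// as "no limit", mirroring the test double.
pub fn list_page(all_keys: &[&str], cursor: Option<&str>, limit: usize) -> Page {
    let mut keys: Vec<String> = all_keys
        .iter()
        // Resume strictly after the cursor so the boundary key is not repeated.
        .filter(|k| cursor.map_or(true, |c| **k > c))
        .map(|k| (*k).to_owned())
        .collect();
    keys.sort();
    let next_cursor = if limit > 0 && keys.len() > limit {
        keys.truncate(limit);
        keys.last().cloned()
    } else {
        // Everything fit on this page: no further cursor.
        None
    };
    Page { keys, next_cursor }
}
```

A caller loops by feeding `next_cursor` back in until it comes back as `None`.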
|
||||
fn make_engine() -> Arc<Engine> {
|
||||
let services = Services::new(
|
||||
Arc::new(InMemoryKv::default()),
|
||||
Arc::new(NoopDeadLetterService),
|
||||
Arc::new(NoopEventEmitter),
|
||||
);
|
||||
Arc::new(Engine::new(Limits::default(), services))
|
||||
}
|
||||
|
||||
fn baseline_request(app_id: AppId) -> ExecRequest {
|
||||
let execution_id = ExecutionId::new();
|
||||
ExecRequest {
|
||||
execution_id,
|
||||
request_id: RequestId::new(),
|
||||
script_id: ScriptId::new(),
|
||||
script_name: "kv-test".into(),
|
||||
invocation_type: InvocationType::Http,
|
||||
path: "/kv-test".into(),
|
||||
headers: BTreeMap::new(),
|
||||
body: Value::Null,
|
||||
params: BTreeMap::new(),
|
||||
query: BTreeMap::new(),
|
||||
rest: String::new(),
|
||||
sandbox_overrides: ScriptSandbox::default(),
|
||||
app_id,
|
||||
principal: None,
|
||||
trigger_depth: 0,
|
||||
root_execution_id: execution_id,
|
||||
is_dead_letter_handler: false,
|
||||
event: None,
|
||||
}
|
||||
}
|
||||
|
||||
async fn run_script(engine: Arc<Engine>, src: &str, req: ExecRequest) -> Value {
|
||||
let src = src.to_string();
|
||||
tokio::task::spawn_blocking(move || engine.execute(&src, req))
|
||||
.await
|
||||
.expect("spawn_blocking should not panic")
|
||||
.expect("script execution should succeed")
|
||||
.body
|
||||
}
|
||||
|
||||
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
|
||||
async fn kv_set_then_get_round_trip() {
|
||||
let engine = make_engine();
|
||||
let app = AppId::new();
|
||||
let src = r#"
|
||||
let widgets = kv::collection("widgets");
|
||||
widgets.set("k1", #{ n: 1 });
|
||||
widgets.get("k1")
|
||||
"#;
|
||||
let body = run_script(engine, src, baseline_request(app)).await;
|
||||
assert_eq!(body, json!({ "n": 1 }));
|
||||
}
|
||||
|
||||
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
|
||||
async fn kv_get_missing_returns_unit() {
|
||||
let engine = make_engine();
|
||||
let app = AppId::new();
|
||||
let src = r#"
|
||||
let c = kv::collection("widgets");
|
||||
let v = c.get("nope");
|
||||
v == ()
|
||||
"#;
|
||||
let body = run_script(engine, src, baseline_request(app)).await;
|
||||
assert_eq!(body, json!(true));
|
||||
}
|
||||
|
||||
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
|
||||
async fn kv_has_returns_bool() {
|
||||
let engine = make_engine();
|
||||
let app = AppId::new();
|
||||
let src = r#"
|
||||
let c = kv::collection("widgets");
|
||||
let before = c.has("k");
|
||||
c.set("k", "v");
|
||||
let after = c.has("k");
|
||||
#{ before: before, after: after }
|
||||
"#;
|
||||
let body = run_script(engine, src, baseline_request(app)).await;
|
||||
assert_eq!(body, json!({ "before": false, "after": true }));
|
||||
}
|
||||
|
||||
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
|
||||
async fn kv_delete_returns_was_present() {
|
||||
let engine = make_engine();
|
||||
let app = AppId::new();
|
||||
let src = r#"
|
||||
let c = kv::collection("widgets");
|
||||
let nope = c.delete("missing");
|
||||
c.set("k", 1);
|
||||
let yep = c.delete("k");
|
||||
#{ nope: nope, yep: yep }
|
||||
"#;
|
||||
let body = run_script(engine, src, baseline_request(app)).await;
|
||||
assert_eq!(body, json!({ "nope": false, "yep": true }));
|
||||
}
|
||||
|
||||
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
|
||||
async fn kv_empty_collection_name_throws() {
|
||||
let engine = make_engine();
|
||||
let app = AppId::new();
|
||||
let src = r#"kv::collection("")"#;
|
||||
let req = baseline_request(app);
|
||||
let err = tokio::task::spawn_blocking(move || engine.execute(src, req))
|
||||
.await
|
||||
.unwrap()
|
||||
.expect_err("empty collection should throw");
|
||||
assert!(format!("{err:?}").contains("kv::collection"));
|
||||
}
|
||||
|
||||
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
|
||||
async fn kv_list_pages_via_cursor() {
|
||||
let engine = make_engine();
|
||||
let app = AppId::new();
|
||||
let src = r#"
|
||||
let c = kv::collection("widgets");
|
||||
for i in 0..5 { c.set(`k${i}`, i); }
|
||||
let p1 = c.list("", 2);
|
||||
let p2 = c.list(p1.next_cursor, 2);
|
||||
#{
|
||||
p1_keys: p1.keys,
|
||||
p1_cursor: p1.next_cursor,
|
||||
p2_keys: p2.keys,
|
||||
}
|
||||
"#;
|
||||
let body = run_script(engine, src, baseline_request(app)).await;
|
||||
let obj = body.as_object().unwrap();
|
||||
let p1_keys = obj["p1_keys"].as_array().unwrap();
|
||||
let p2_keys = obj["p2_keys"].as_array().unwrap();
|
||||
assert_eq!(p1_keys.len(), 2);
|
||||
assert_eq!(p2_keys.len(), 2);
|
||||
assert!(obj["p1_cursor"].is_string());
|
||||
}
|
||||
|
||||
/// Cross-app isolation via `cx.app_id` — script with `app_id = A`
|
||||
/// cannot see entries from `app_id = B`. The kv:: bridge never
|
||||
/// surfaces `app_id` to the script, so this is enforced purely by the
|
||||
/// service deriving it from the captured `Arc<SdkCallCx>`.
|
||||
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
|
||||
async fn kv_bridge_preserves_cross_app_isolation() {
|
||||
let engine = make_engine();
|
||||
let app_a = AppId::new();
|
||||
let app_b = AppId::new();
|
||||
|
||||
let writer = r#"
|
||||
let c = kv::collection("shared");
|
||||
c.set("k", "from-a");
|
||||
"ok"
|
||||
"#;
|
||||
let _ = run_script(engine.clone(), writer, baseline_request(app_a)).await;
|
||||
|
||||
// App B sees nothing under the same collection/key.
|
||||
let reader = r#"
|
||||
let c = kv::collection("shared");
|
||||
c.get("k")
|
||||
"#;
|
||||
let body = run_script(engine, reader, baseline_request(app_b)).await;
|
||||
assert_eq!(body, Value::Null);
|
||||
}
|
||||
384
crates/executor-core/tests/stdlib.rs
Normal file
@@ -0,0 +1,384 @@
|
||||
//! Integration tests for the v1.1.0 stdlib utility modules.
|
||||
//!
|
||||
//! These exist alongside `sdk_contract.rs` rather than inside it
|
||||
//! because the stateless utilities aren't part of the same versioned
|
||||
//! SDK contract surface — `sdk_contract.rs` covers things that bump
|
||||
//! `SDK_VERSION` when they change; stdlib additions don't.
|
||||
|
||||
use std::collections::BTreeMap;
|
||||
|
||||
use picloud_executor_core::{Engine, ExecError, ExecRequest, InvocationType, Limits};
|
||||
use picloud_shared::{AppId, ExecutionId, RequestId, ScriptId, ScriptSandbox, Services};
|
||||
use serde_json::{json, Value};
|
||||
|
||||
// ----------------------------------------------------------------------------
|
||||
// Test harness — duplicated from sdk_contract.rs (each integration test
|
||||
// crate has its own; there is no tests/common/).
|
||||
// ----------------------------------------------------------------------------
|
||||
|
||||
fn engine() -> Engine {
|
||||
Engine::new(Limits::default(), Services::default())
|
||||
}
|
||||
|
||||
fn baseline_request() -> ExecRequest {
|
||||
let execution_id = ExecutionId::new();
|
||||
ExecRequest {
|
||||
execution_id,
|
||||
request_id: RequestId::new(),
|
||||
script_id: ScriptId::new(),
|
||||
script_name: "stdlib".into(),
|
||||
invocation_type: InvocationType::Http,
|
||||
path: "/stdlib-test".into(),
|
||||
headers: BTreeMap::new(),
|
||||
body: Value::Null,
|
||||
params: BTreeMap::new(),
|
||||
query: BTreeMap::new(),
|
||||
rest: String::new(),
|
||||
sandbox_overrides: ScriptSandbox::default(),
|
||||
app_id: AppId::new(),
|
||||
principal: None,
|
||||
trigger_depth: 0,
|
||||
root_execution_id: execution_id,
|
||||
is_dead_letter_handler: false,
|
||||
event: None,
|
||||
}
|
||||
}
|
||||
|
||||
fn run(source: &str) -> Value {
|
||||
engine()
|
||||
.execute(source, baseline_request())
|
||||
.expect("stdlib test should execute cleanly")
|
||||
.body
|
||||
}
|
||||
|
||||
fn run_err(source: &str) -> ExecError {
|
||||
engine()
|
||||
.execute(source, baseline_request())
|
||||
.expect_err("stdlib test expected to throw")
|
||||
}
|
||||
|
||||
fn assert_runtime_err(err: ExecError, needle: &str) {
|
||||
match err {
|
||||
ExecError::Runtime(msg) => assert!(
|
||||
msg.contains(needle),
|
||||
"runtime error did not contain `{needle}`: {msg}"
|
||||
),
|
||||
other => panic!("expected Runtime error containing `{needle}`, got {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// regex
|
||||
// ============================================================================
|
||||
|
||||
#[test]
|
||||
fn regex_is_match_true_and_false() {
|
||||
assert_eq!(run(r#"regex::is_match("^h", "hello")"#), json!(true));
|
||||
assert_eq!(run(r#"regex::is_match("^x", "hello")"#), json!(false));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn regex_find_returns_first_match() {
|
||||
assert_eq!(run(r#"regex::find("\\d+", "abc 42 def 99")"#), json!("42"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn regex_find_returns_unit_when_no_match() {
|
||||
// () serializes to JSON null via dynamic_to_json.
|
||||
assert_eq!(run(r#"regex::find("\\d+", "abc")"#), Value::Null);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn regex_find_all_returns_array() {
|
||||
assert_eq!(
|
||||
run(r#"regex::find_all("\\d+", "a1 b22 c333")"#),
|
||||
json!(["1", "22", "333"])
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn regex_replace_first_only() {
|
||||
assert_eq!(
|
||||
run(r#"regex::replace("a", "banana", "X")"#),
|
||||
json!("bXnana")
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn regex_replace_all() {
|
||||
assert_eq!(
|
||||
run(r#"regex::replace_all("a", "banana", "X")"#),
|
||||
json!("bXnXnX")
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn regex_split() {
|
||||
assert_eq!(
|
||||
run(r#"regex::split(",\\s*", "a, b,c, d")"#),
|
||||
json!(["a", "b", "c", "d"])
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn regex_captures_extracts_groups() {
|
||||
assert_eq!(
|
||||
run(r#"regex::captures("(\\d+)-(\\w+)", "42-abc")"#),
|
||||
json!(["42-abc", "42", "abc"])
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn regex_captures_returns_unit_when_no_match() {
|
||||
assert_eq!(run(r#"regex::captures("(\\d+)", "abc")"#), Value::Null);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn regex_invalid_pattern_throws() {
|
||||
assert_runtime_err(run_err(r#"regex::is_match("(", "x")"#), "invalid regex");
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// random
|
||||
// ============================================================================
|
||||
|
||||
#[test]
|
||||
fn random_int_within_range() {
|
||||
// A single draw must land inside the inclusive range [10, 20].
|
||||
let body = run(r"
|
||||
let n = random::int(10, 20);
|
||||
n >= 10 && n <= 20
|
||||
");
|
||||
assert_eq!(body, json!(true));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn random_int_throws_when_min_greater_than_max() {
|
||||
assert_runtime_err(run_err("random::int(20, 10)"), "min");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn random_float_in_unit_interval() {
|
||||
let body = run(r"
|
||||
let f = random::float();
|
||||
f >= 0.0 && f < 1.0
|
||||
");
|
||||
assert_eq!(body, json!(true));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn random_bytes_returns_blob_of_correct_length() {
|
||||
assert_eq!(run("random::bytes(16).len()"), json!(16));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn random_bytes_rejects_negative() {
|
||||
assert_runtime_err(run_err("random::bytes(-1)"), "random::bytes");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn random_bytes_rejects_oversize() {
|
||||
assert_runtime_err(run_err("random::bytes(70000)"), "random::bytes");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn random_string_produces_alphanumeric_of_correct_length() {
|
||||
let body = run(r#"
|
||||
let s = random::string(32);
|
||||
s.len == 32 && regex::is_match("^[A-Za-z0-9]+$", s)
|
||||
"#);
|
||||
assert_eq!(body, json!(true));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn random_uuid_has_canonical_format() {
|
||||
let body = run(
|
||||
r#"regex::is_match("^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$", random::uuid())"#,
|
||||
);
|
||||
assert_eq!(body, json!(true));
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// time
|
||||
// ============================================================================
|
||||
|
||||
#[test]
|
||||
fn time_now_ms_is_positive() {
|
||||
let body = run("time::now_ms() > 0");
|
||||
assert_eq!(body, json!(true));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn time_now_string_looks_like_iso() {
|
||||
let body = run(r#"regex::is_match("^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}", time::now())"#);
|
||||
assert_eq!(body, json!(true));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn time_parse_format_round_trip() {
|
||||
let body = run(r"
|
||||
let ms = 1700000000000;
|
||||
time::parse(time::format(ms)) == ms
|
||||
");
|
||||
assert_eq!(body, json!(true));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn time_add_seconds() {
|
||||
assert_eq!(run("time::add_seconds(0, 60)"), json!(60_000));
|
||||
assert_eq!(run("time::add_seconds(1000, -1)"), json!(0));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn time_diff_seconds_truncates() {
|
||||
assert_eq!(run("time::diff_seconds(0, 65_500)"), json!(65));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn time_parse_rejects_garbage() {
|
||||
assert_runtime_err(run_err(r#"time::parse("nonsense")"#), "time::parse");
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// json
|
||||
// ============================================================================
|
||||
|
||||
#[test]
|
||||
fn json_parse_then_stringify_round_trip() {
|
||||
let body = run(r#"
|
||||
let src = `{"a":1,"b":"x"}`;
|
||||
json::stringify(json::parse(src)) == src
|
||||
"#);
|
||||
assert_eq!(body, json!(true));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn json_stringify_compact() {
|
||||
assert_eq!(run(r"json::stringify(#{ a: 1 })"), json!(r#"{"a":1}"#));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn json_stringify_pretty_has_newlines() {
|
||||
let body = run(r#"json::stringify_pretty(#{ a: 1 }).contains("\n")"#);
|
||||
assert_eq!(body, json!(true));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn json_parse_invalid_throws() {
|
||||
assert_runtime_err(run_err(r#"json::parse("not json")"#), "json::parse");
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// base64
|
||||
// ============================================================================
|
||||
|
||||
#[test]
|
||||
fn base64_encode_string() {
|
||||
assert_eq!(run(r#"base64::encode("hi")"#), json!("aGk="));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn base64_decode_then_re_encode_round_trip() {
|
||||
assert_eq!(
|
||||
run(r#"base64::encode(base64::decode("aGVsbG8="))"#),
|
||||
json!("aGVsbG8=")
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn base64_encode_url_has_no_padding() {
|
||||
let body = run(r#"
|
||||
let s = base64::encode_url("hello world!?");
|
||||
!s.contains("=") && !s.contains("+") && !s.contains("/")
|
||||
"#);
|
||||
assert_eq!(body, json!(true));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn base64_decode_url_round_trip() {
|
||||
assert_eq!(
|
||||
run(r#"base64::encode_url(base64::decode_url("aGVsbG8"))"#),
|
||||
json!("aGVsbG8")
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn base64_decode_invalid_throws() {
|
||||
assert_runtime_err(run_err(r#"base64::decode("!!!")"#), "base64::decode");
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// hex
|
||||
// ============================================================================
|
||||
|
||||
#[test]
|
||||
fn hex_encode_produces_lowercase() {
|
||||
assert_eq!(run(r#"hex::encode("Z")"#), json!("5a"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn hex_decode_then_re_encode_round_trip() {
|
||||
// mixed-case input → lowercase output proves both case-insensitive
|
||||
// decode and lowercase encode.
|
||||
assert_eq!(
|
||||
run(r#"hex::encode(hex::decode("DeAdBeEf"))"#),
|
||||
json!("deadbeef")
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn hex_decode_returns_correct_length() {
|
||||
assert_eq!(run(r#"hex::decode("deadbeef").len()"#), json!(4));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn hex_decode_invalid_throws() {
|
||||
assert_runtime_err(run_err(r#"hex::decode("xyz")"#), "hex::decode");
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// url
|
||||
// ============================================================================
|
||||
|
||||
#[test]
|
||||
fn url_encode_basic() {
|
||||
assert_eq!(run(r#"url::encode("hello world")"#), json!("hello%20world"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn url_encode_preserves_unreserved() {
|
||||
assert_eq!(
|
||||
run(r#"url::encode("abcXYZ123-_.~")"#),
|
||||
json!("abcXYZ123-_.~")
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn url_decode_round_trip() {
|
||||
assert_eq!(
|
||||
run(r#"url::decode(url::encode("hello world!?"))"#),
|
||||
json!("hello world!?")
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn url_encode_query_basic() {
|
||||
// Map keys come out alphabetically (Rhai's Map is a BTreeMap).
|
||||
assert_eq!(
|
||||
run(r#"url::encode_query(#{ a: "1", b: "x y" })"#),
|
||||
json!("a=1&b=x%20y")
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn url_encode_query_coerces_non_strings() {
|
||||
// Numbers and bools shouldn't throw; they coerce via to_string().
|
||||
let body = run(r"url::encode_query(#{ n: 42, b: true })");
|
||||
// Order is alphabetical: b before n.
|
||||
assert_eq!(body, json!("b=true&n=42"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn url_decode_rejects_invalid_utf8() {
|
||||
assert_runtime_err(run_err(r#"url::decode("%FF%FE%80")"#), "url::decode");
|
||||
}
|
||||
@@ -10,13 +10,16 @@ workspace = true
|
||||
|
||||
[dependencies]
|
||||
picloud-shared.workspace = true
|
||||
picloud-executor-core.workspace = true
|
||||
picloud-orchestrator-core.workspace = true
|
||||
|
||||
async-trait.workspace = true
|
||||
axum.workspace = true
|
||||
rand.workspace = true
|
||||
serde.workspace = true
|
||||
serde_json.workspace = true
|
||||
thiserror.workspace = true
|
||||
tokio.workspace = true
|
||||
tracing.workspace = true
|
||||
uuid.workspace = true
|
||||
chrono.workspace = true
|
||||
@@ -24,7 +27,6 @@ sqlx.workspace = true
|
||||
url.workspace = true
|
||||
|
||||
argon2.workspace = true
|
||||
rand.workspace = true
|
||||
sha2.workspace = true
|
||||
base64.workspace = true
|
||||
data-encoding.workspace = true
|
||||
|
||||
28
crates/manager-core/migrations/0007_kv.sql
Normal file
@@ -0,0 +1,28 @@
|
||||
-- v1.1.1: Key-value store — see blueprint §8.1 + docs/sdk-shape.md.
|
||||
--
|
||||
-- Identity tuple `(app_id, collection, key)`. `app_id` is first in the
|
||||
-- primary key so the implicit index is always per-app; cross-app reads
|
||||
-- cannot happen even with a buggy query. Collections are a required
|
||||
-- namespace inside an app — the same key can live in different
|
||||
-- collections without collision.
|
||||
--
|
||||
-- `value` is JSONB so scripts can store nested structures without
|
||||
-- a separate serialization step. No TTL column in v1.1.1; deferred
|
||||
-- until a concrete need surfaces (the blueprint reserved one but the
|
||||
-- v1.1.1 SDK surface — get/set/has/delete/list — doesn't expose TTL).
|
||||
|
||||
CREATE TABLE kv_entries (
|
||||
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
|
||||
collection TEXT NOT NULL,
|
||||
key TEXT NOT NULL,
|
||||
value JSONB NOT NULL,
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
PRIMARY KEY (app_id, collection, key)
|
||||
);
|
||||
|
||||
-- Supports list-by-collection (keyset pagination) and per-collection
|
||||
-- triggers' fan-out scans. The PK already covers (app_id, collection)
|
||||
-- as a prefix but spelling out the explicit index makes intent clear
|
||||
-- for the planner.
|
||||
CREATE INDEX idx_kv_entries_app_collection ON kv_entries (app_id, collection);
|
||||
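Review note: the keyset pagination that this index is meant to support can be sketched as follows. This is a minimal in-memory illustration, not the repository's actual query code: a `BTreeMap` stands in for the `(app_id, collection, key)` primary key, a `u32` stands in for the UUID `app_id`, and the cursor is simply the last key returned (the real SDK is described as using opaque base64 cursors). The names `EntryKey`, `Page`, and `list_keys` are hypothetical.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Stand-in for the `(app_id, collection, key)` primary key.
/// Assumption: `u32` replaces the UUID used by the real schema.
type EntryKey = (u32, String, String);

/// One page of keys plus the cursor to pass to the next call.
#[derive(Debug, PartialEq)]
struct Page {
    keys: Vec<String>,
    next_cursor: Option<String>,
}

/// Keyset pagination: resume strictly after `cursor` instead of using OFFSET.
/// An empty cursor means "start at the beginning of the collection".
fn list_keys(
    store: &BTreeMap<EntryKey, String>,
    app_id: u32,
    collection: &str,
    cursor: &str,
    limit: usize,
) -> Page {
    let start: EntryKey = (app_id, collection.to_string(), cursor.to_string());
    let lower = if cursor.is_empty() {
        Bound::Included(start)
    } else {
        Bound::Excluded(start)
    };
    let upper: Bound<EntryKey> = Bound::Unbounded;

    // Fetch one extra row to learn whether another page exists.
    let mut keys: Vec<String> = store
        .range::<EntryKey, _>((lower, upper))
        // Stop as soon as the scan leaves this app's collection.
        .take_while(|((a, c, _), _)| *a == app_id && c == collection)
        .map(|((_, _, k), _)| k.clone())
        .take(limit + 1)
        .collect();

    let has_more = keys.len() > limit;
    keys.truncate(limit);
    let next_cursor = if has_more { keys.last().cloned() } else { None };
    Page { keys, next_cursor }
}
```

Because `app_id` leads the key, the range scan can never cross into another app's rows, which mirrors the isolation argument made in the migration comment.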
72
crates/manager-core/migrations/0008_triggers.sql
Normal file
@@ -0,0 +1,72 @@
|
||||
-- v1.1.1: Trigger framework — Layout E (design notes §2 + §7).
|
||||
--
|
||||
-- A parent `triggers` table holds the common columns (script_id, retry
|
||||
-- config, dispatch_mode, registered-by principal); per-kind detail
|
||||
-- tables hold the kind-specific filter columns. v1.1.1 ships two
|
||||
-- kinds: KV (collection_glob + ops) and dead_letter (source / trigger
|
||||
-- / script filters). Future kinds (cron, pubsub, queue, email) extend
|
||||
-- the parent and add their own detail table.
|
||||
--
|
||||
-- `registered_by_principal` captures the admin user that registered
|
||||
-- the trigger. The dispatcher resolves this back to a `Principal` at
|
||||
-- execution time so the trigger runs as the user that set it up
|
||||
-- (design notes §4: "a trigger execution runs as the principal that
|
||||
-- registered the trigger").
|
||||
--
|
||||
-- HTTP routes stay in their own `routes` table for now (Phase 3
|
||||
-- production schema with its own trie-index columns); the dispatcher
|
||||
-- discriminates HTTP outbox rows by `source_kind = 'http'` and
|
||||
-- `trigger_id` referencing `routes.id`. Folding routes into triggers
|
||||
-- is a v1.2 cleanup, not a v1.1.1 requirement.
|
||||
|
||||
CREATE TABLE triggers (
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
|
||||
script_id UUID NOT NULL REFERENCES scripts(id) ON DELETE CASCADE,
|
||||
kind TEXT NOT NULL CHECK (kind IN ('kv', 'dead_letter')),
|
||||
enabled BOOLEAN NOT NULL DEFAULT TRUE,
|
||||
-- Async by default — sync would mean the trigger fires inline with
|
||||
-- the originating mutation, which v1.1.1 doesn't support.
|
||||
dispatch_mode TEXT NOT NULL DEFAULT 'async'
|
||||
CHECK (dispatch_mode IN ('sync', 'async')),
|
||||
-- Defaults applied at write time so the row is auditable on its
|
||||
-- own. Per-trigger overrides set on create; the env-defined
|
||||
-- defaults provide the fallback values.
|
||||
retry_max_attempts INT NOT NULL,
|
||||
retry_backoff TEXT NOT NULL
|
||||
CHECK (retry_backoff IN ('exponential', 'linear', 'constant')),
|
||||
retry_base_ms INT NOT NULL,
|
||||
registered_by_principal UUID NOT NULL REFERENCES admin_users(id) ON DELETE CASCADE,
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
|
||||
);
|
||||
|
||||
-- The dispatcher's hot lookup: "all enabled triggers for app X of
|
||||
-- kind Y". Indexed only when enabled = TRUE so disabled rows don't
|
||||
-- pollute the index.
|
||||
CREATE INDEX idx_triggers_app_kind_enabled
|
||||
ON triggers (app_id, kind)
|
||||
WHERE enabled = TRUE;
|
||||
|
||||
-- One row per KV trigger. `collection_glob` accepts:
|
||||
-- "*" — any collection in the app
|
||||
-- "widgets" — exact match
|
||||
-- "users:*" — prefix wildcard (matched in Rust, not SQL)
|
||||
-- `ops` is the subset of {insert, update, delete} this trigger
|
||||
-- subscribes to. Empty array means "any op" (the trigger fires on
|
||||
-- every mutation; admin endpoint validates this).
|
||||
CREATE TABLE kv_trigger_details (
|
||||
trigger_id UUID PRIMARY KEY REFERENCES triggers(id) ON DELETE CASCADE,
|
||||
collection_glob TEXT NOT NULL,
|
||||
ops TEXT[] NOT NULL
|
||||
);
|
||||
|
||||
-- One row per dead-letter trigger. All three filter columns are
|
||||
-- nullable — NULL means "no filter on this dimension". A trigger
|
||||
-- with all three filters set to NULL fires on every dead-letter row.
|
||||
CREATE TABLE dead_letter_trigger_details (
|
||||
trigger_id UUID PRIMARY KEY REFERENCES triggers(id) ON DELETE CASCADE,
|
||||
source_filter TEXT,
|
||||
trigger_id_filter UUID,
|
||||
script_id_filter UUID
|
||||
);
|
||||
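Review note: the migration says `collection_glob` is matched in Rust rather than SQL, and that an empty `ops` array means "any op". A minimal sketch of that matching, assuming the only wildcard form is a trailing `*` (the three forms listed in the comment), might look like this. The function names `glob_matches` and `op_matches` are hypothetical and not taken from the PR.

```rust
/// Match a collection name against a trigger's `collection_glob`.
/// Assumption: only "*", an exact name, or a trailing-`*` prefix are supported.
fn glob_matches(glob: &str, collection: &str) -> bool {
    if glob == "*" {
        return true; // any collection in the app
    }
    match glob.strip_suffix('*') {
        // "users:*" matches every collection starting with "users:"
        Some(prefix) => collection.starts_with(prefix),
        // no wildcard: exact match only
        None => glob == collection,
    }
}

/// An empty `ops` list subscribes the trigger to every mutation.
fn op_matches(ops: &[&str], op: &str) -> bool {
    ops.is_empty() || ops.contains(&op)
}

/// A KV trigger fires when both the collection and the operation match.
fn trigger_fires(glob: &str, ops: &[&str], collection: &str, op: &str) -> bool {
    glob_matches(glob, collection) && op_matches(ops, op)
}
```

Keeping the glob logic out of SQL means the partial index on `(app_id, kind)` only has to narrow the candidate set; the final filter runs per row in the dispatcher.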
64
crates/manager-core/migrations/0009_outbox.sql
Normal file
@@ -0,0 +1,64 @@
|
||||
-- v1.1.1: Universal trigger outbox — design notes §2.
|
||||
--
|
||||
-- One table for every async dispatch in the system. KV/cron/pubsub/
|
||||
-- queue/email/dead-letter all write rows in this shape; the dispatcher
|
||||
-- claims due rows with `FOR UPDATE SKIP LOCKED` and routes them to
|
||||
-- the executor.
|
||||
--
|
||||
-- Sync HTTP also writes here (NATS-style inbox, design notes §3) —
|
||||
-- `reply_to` carries an `inbox_id` that the orchestrator awaits on a
|
||||
-- oneshot channel. `reply_to.is_some()` is the "don't retry" signal:
|
||||
-- one attempt, surface the result via the inbox.
|
||||
--
|
||||
-- `trigger_id` is a polymorphic reference discriminated by
|
||||
-- `source_kind`: for `source_kind='http'` it references `routes.id`;
|
||||
-- otherwise it references `triggers.id`. Polymorphism handled in
|
||||
-- Rust (the dispatcher); no DB-level FK because Postgres doesn't
|
||||
-- support polymorphic FKs cleanly. NULL is allowed because direct
|
||||
-- admin-replay paths may not have a triggering row at all.
|
||||
--
|
||||
-- `script_id` denormalized so the dispatcher resolves the target
|
||||
-- script without an extra round-trip per row.
|
||||
|
||||
CREATE TABLE outbox (
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
|
||||
source_kind TEXT NOT NULL
|
||||
CHECK (source_kind IN ('http', 'kv', 'dead_letter')),
|
||||
-- Polymorphic — see comment above. No FK constraint.
|
||||
trigger_id UUID,
|
||||
-- Pre-resolved at write time so the dispatcher doesn't re-look it up.
|
||||
script_id UUID,
|
||||
-- NULL = async (retry per policy). Some(inbox_id) = sync HTTP
|
||||
-- (never retry; resolve the inbox with the result).
|
||||
reply_to UUID,
|
||||
-- ServiceEvent + ExecRequest scaffold serialized as JSONB.
|
||||
payload JSONB NOT NULL,
|
||||
-- Forensic field — the principal that triggered the originating
|
||||
-- event. NOT the execution principal for trigger fan-out (that
|
||||
-- comes from `triggers.registered_by_principal`).
|
||||
origin_principal UUID,
|
||||
-- Trigger-depth as the dispatcher will hand it to the executor.
|
||||
-- Read out into ExecRequest.trigger_depth at dispatch time.
|
||||
trigger_depth INT NOT NULL DEFAULT 0,
|
||||
-- Originating execution id (for audit log grouping). Equals the
|
||||
-- root for direct invocations; preserved across fan-out chains.
|
||||
root_execution_id UUID,
|
||||
attempt_count INT NOT NULL DEFAULT 0,
|
||||
next_attempt_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
-- Set inside the SELECT FOR UPDATE SKIP LOCKED transaction so
|
||||
-- the dispatcher can't double-pick a row across concurrent loop
|
||||
-- iterations.
|
||||
claimed_at TIMESTAMPTZ,
|
||||
claimed_by TEXT,
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
|
||||
);
|
||||
|
||||
-- Hot index: the dispatcher's `WHERE next_attempt_at <= NOW() AND
|
||||
-- claimed_at IS NULL` claim query. Partial index keeps the hot set
|
||||
-- small even if the table grows large.
|
||||
CREATE INDEX idx_outbox_due
|
||||
ON outbox (next_attempt_at)
|
||||
WHERE claimed_at IS NULL;
|
||||
|
||||
CREATE INDEX idx_outbox_app ON outbox (app_id);
|
||||
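Review note: the interplay between `reply_to`, `attempt_count`, and the per-trigger retry columns can be sketched as below. This is an illustration under stated assumptions, not the dispatcher's code: `attempt_count` is taken to include the attempt that just failed, the exponential schedule is assumed to double from `retry_base_ms`, and the jitter mentioned in the changelog is left out to keep the sketch deterministic. The names `Backoff`, `Outcome`, `delay_ms`, and `on_failure` are hypothetical.

```rust
/// Mirrors the `retry_backoff` CHECK vocabulary on the `triggers` table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backoff {
    Exponential,
    Linear,
    Constant,
}

/// What to do with an outbox row after a failed attempt.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Reschedule: set `next_attempt_at` this many milliseconds ahead.
    RetryAfterMs(u64),
    /// Retries exhausted: move the event to `dead_letters`.
    DeadLetter,
    /// Sync HTTP row: never retry, surface the result via the inbox.
    ReplyOnce,
}

/// Delay before the next attempt; `attempt` counts attempts already made.
fn delay_ms(backoff: Backoff, base_ms: u64, attempt: u32) -> u64 {
    match backoff {
        Backoff::Constant => base_ms,
        Backoff::Linear => base_ms.saturating_mul(u64::from(attempt)),
        Backoff::Exponential => {
            // base, 2*base, 4*base, ... saturating instead of overflowing
            base_ms.saturating_mul(2u64.saturating_pow(attempt.saturating_sub(1)))
        }
    }
}

fn on_failure(
    reply_to: Option<u64>, // stand-in for the UUID inbox id
    attempt_count: u32,
    max_attempts: u32,
    backoff: Backoff,
    base_ms: u64,
) -> Outcome {
    if reply_to.is_some() {
        return Outcome::ReplyOnce;
    }
    if attempt_count >= max_attempts {
        return Outcome::DeadLetter;
    }
    Outcome::RetryAfterMs(delay_ms(backoff, base_ms, attempt_count))
}
```

Checking `reply_to` first matches the schema comment that a sync HTTP row gets exactly one attempt regardless of any retry policy.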
50
crates/manager-core/migrations/0010_dead_letters.sql
Normal file
@@ -0,0 +1,50 @@
|
||||
-- v1.1.1: dead_letters — design notes §4.
|
||||
--
|
||||
-- Async invocations that exhaust their retry policy land here. Each
|
||||
-- row carries the original event payload verbatim plus the attempt
|
||||
-- history so handlers (registered via `dead_letter` triggers) and the
|
||||
-- dashboard can decide what to do.
|
||||
--
|
||||
-- Schema mirrors design notes §4. The CHECK constraint on
|
||||
-- `resolution` enforces the closed vocabulary used by both the SDK
|
||||
-- (`dead_letters::resolve(id, reason)`) and the recursion-stop rule
|
||||
-- (`handler_failed`). Sync HTTP failures (`reply_to.is_some()`) never
|
||||
-- land here — they're served via the inbox channel.
|
||||
--
|
||||
-- Indexes:
|
||||
-- - partial index on unresolved rows: the dashboard's
|
||||
-- unresolved-count badge query (`COUNT(*) WHERE app_id = $1 AND
|
||||
-- resolved_at IS NULL`).
|
||||
-- - GC index on `created_at`: the weekly retention sweep.
|
||||
|
||||
CREATE TABLE dead_letters (
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
|
||||
-- The outbox.id row that exhausted retries. The outbox row itself
|
||||
-- has been deleted at this point.
|
||||
original_event_id UUID NOT NULL,
|
||||
source TEXT NOT NULL,
|
||||
op TEXT NOT NULL,
|
||||
-- Nullable because direct admin replays may have no trigger row.
|
||||
trigger_id UUID,
|
||||
script_id UUID,
|
||||
payload JSONB NOT NULL,
|
||||
attempt_count INT NOT NULL,
|
||||
first_attempt_at TIMESTAMPTZ NOT NULL,
|
||||
last_attempt_at TIMESTAMPTZ NOT NULL,
|
||||
last_error TEXT NOT NULL,
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
resolved_at TIMESTAMPTZ,
|
||||
resolution TEXT
|
||||
CHECK (resolution IN
|
||||
('replayed', 'ignored', 'handled_by_script', 'handler_failed'))
|
||||
);
|
||||
|
||||
-- Dashboard unresolved-count badge — partial index on the predicate
|
||||
-- the query uses.
|
||||
CREATE INDEX idx_dead_letters_app_unresolved
|
||||
ON dead_letters (app_id)
|
||||
WHERE resolved_at IS NULL;
|
||||
|
||||
-- GC sweep scans by creation time.
|
||||
CREATE INDEX idx_dead_letters_gc ON dead_letters (created_at);
|
||||
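Review note: the closed `resolution` vocabulary enforced by the CHECK constraint maps naturally onto a Rust enum. The sketch below is one possible mirror of that constraint on the application side; the type name `Resolution` and its methods are hypothetical, and the PR may model this differently.

```rust
use std::str::FromStr;

/// Mirrors the CHECK constraint on `dead_letters.resolution`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Resolution {
    Replayed,
    Ignored,
    HandledByScript,
    HandlerFailed,
}

impl Resolution {
    /// The exact string stored in the `resolution` column.
    fn as_str(self) -> &'static str {
        match self {
            Resolution::Replayed => "replayed",
            Resolution::Ignored => "ignored",
            Resolution::HandledByScript => "handled_by_script",
            Resolution::HandlerFailed => "handler_failed",
        }
    }
}

impl FromStr for Resolution {
    type Err = String;

    /// Reject anything outside the closed vocabulary before it reaches SQL.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "replayed" => Ok(Resolution::Replayed),
            "ignored" => Ok(Resolution::Ignored),
            "handled_by_script" => Ok(Resolution::HandledByScript),
            "handler_failed" => Ok(Resolution::HandlerFailed),
            other => Err(format!("unknown resolution: {other}")),
        }
    }
}
```

Validating in Rust as well as in the CHECK constraint lets a script-facing call fail with a readable error instead of a database constraint violation.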
31
crates/manager-core/migrations/0011_abandoned_executions.sql
Normal file
@@ -0,0 +1,31 @@
|
||||
-- v1.1.1: abandoned_executions — design notes §3 #9.
|
||||
--
|
||||
-- Forensic table for the "dispatcher tried to resolve a oneshot inbox
|
||||
-- but the receiver was already dropped" edge case. The orchestrator
|
||||
-- timed out (returned 504 to the caller) and gave up on the channel,
|
||||
-- but then the dispatcher's execution succeeded later. The caller
|
||||
-- never sees the result; the row exists so the operator can
|
||||
-- correlate when the abandoned-counter metric spikes.
|
||||
--
|
||||
-- Only the dispatcher-after-orchestrator-timeout edge case writes
|
||||
-- here; ordinary "script timed out, caller got 504" stays uneventful.
|
||||
--
|
||||
-- 7-day retention, GC by `created_at`, sweep alongside dead_letters.
|
||||
|
||||
CREATE TABLE abandoned_executions (
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
|
||||
-- Original outbox row id (the row itself has been deleted).
|
||||
outbox_id UUID NOT NULL,
|
||||
script_id UUID,
|
||||
-- The inbox channel id the dispatcher tried to resolve.
|
||||
inbox_id UUID NOT NULL,
|
||||
-- The HTTP status code the dispatcher attempted to send back.
|
||||
status_code INT NOT NULL,
|
||||
-- Truncated body / error description (capped at write time —
|
||||
-- the dispatcher doesn't need to ship megabytes here).
|
||||
result_summary TEXT,
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
|
||||
);
|
||||
|
||||
CREATE INDEX idx_abandoned_executions_gc ON abandoned_executions (created_at);
|
||||
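Review note: the edge case this table records (the sender tries to resolve an inbox whose receiver has already been dropped) can be reproduced with a plain channel. The sketch below uses `std::sync::mpsc` purely as a stand-in for whatever oneshot channel the project uses; the `Delivery` enum and `resolve_inbox` function are hypothetical names for illustration.

```rust
use std::sync::mpsc;

/// Result of trying to hand a sync-HTTP result back to the waiting caller.
#[derive(Debug, PartialEq)]
enum Delivery {
    /// The orchestrator was still waiting and received the result.
    Delivered,
    /// The receiver was dropped (caller already timed out); this is the
    /// case that would produce an `abandoned_executions` row.
    Abandoned { status_code: u16 },
}

/// Try to resolve the inbox. Sending fails only when the receiving
/// half has been dropped, which is exactly the abandoned case.
fn resolve_inbox(tx: mpsc::Sender<u16>, status_code: u16) -> Delivery {
    match tx.send(status_code) {
        Ok(()) => Delivery::Delivered,
        Err(_) => Delivery::Abandoned { status_code },
    }
}
```

The send error is the only signal the dispatcher gets that the caller gave up, so recording it at that point is what makes the later metric correlation possible.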
16
crates/manager-core/migrations/0012_routes_dispatch_mode.sql
Normal file
@@ -0,0 +1,16 @@
|
||||
-- v1.1.1: per-route dispatch mode (design notes §2 + §3).
|
||||
--
|
||||
-- `sync` (default): orchestrator awaits the executor inline and
|
||||
-- returns the response in the same HTTP request — current MVP
|
||||
-- behaviour.
|
||||
-- `async`: orchestrator writes the request to the trigger outbox,
|
||||
-- returns `202 Accepted` immediately. The dispatcher runs the
|
||||
-- script in the background and surfaces failures via the
|
||||
-- retry / dead-letter machinery — same shape as any other async
|
||||
-- event.
|
||||
--
|
||||
-- Existing routes default to `sync` so the migration is non-breaking.
|
||||
|
||||
ALTER TABLE routes
|
||||
ADD COLUMN dispatch_mode TEXT NOT NULL DEFAULT 'sync'
|
||||
CHECK (dispatch_mode IN ('sync', 'async'));
|
||||
128
crates/manager-core/src/abandoned_repo.rs
Normal file
@@ -0,0 +1,128 @@
|
||||
//! `AbandonedExecutionsRepo` — forensic table written by the
|
||||
//! dispatcher when it tries to resolve a sync-HTTP inbox channel
|
||||
//! that's already been dropped (orchestrator timed out and gave up).
|
||||
//!
|
||||
//! Schema: see `migrations/0011_abandoned_executions.sql`.
|
||||
//!
|
||||
//! Tiny surface: insert + GC. Reading happens via direct SQL when
|
||||
//! correlating the metric counter spike.
|
||||
|
||||
use async_trait::async_trait;
|
||||
use chrono::{DateTime, Utc};
|
||||
use picloud_shared::{AppId, ScriptId};
|
||||
use sqlx::PgPool;
|
||||
use uuid::Uuid;
|
||||
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub enum AbandonedRepoError {
|
||||
#[error("database error: {0}")]
|
||||
Db(#[from] sqlx::Error),
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct NewAbandonedExecution {
|
||||
pub app_id: AppId,
|
||||
pub outbox_id: Uuid,
|
||||
pub script_id: Option<ScriptId>,
|
||||
pub inbox_id: Uuid,
|
||||
pub status_code: u16,
|
||||
pub result_summary: Option<String>,
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
pub trait AbandonedRepo: Send + Sync {
|
||||
async fn insert(&self, row: NewAbandonedExecution) -> Result<Uuid, AbandonedRepoError>;
|
||||
|
||||
/// Retention sweep — deletes rows older than `older_than` up to
|
||||
/// `limit` at a time.
|
||||
async fn gc(&self, older_than: DateTime<Utc>, limit: i64) -> Result<u64, AbandonedRepoError>;
|
||||
}
|
||||
|
||||
pub struct PostgresAbandonedRepo {
|
||||
pool: PgPool,
|
||||
}
|
||||
|
||||
impl PostgresAbandonedRepo {
|
||||
#[must_use]
|
||||
pub fn new(pool: PgPool) -> Self {
|
||||
Self { pool }
|
||||
}
|
||||
}
|
||||
|
||||
const SUMMARY_CAP_BYTES: usize = 4096;
|
||||
|
||||
#[async_trait]
|
||||
impl AbandonedRepo for PostgresAbandonedRepo {
|
||||
async fn insert(&self, row: NewAbandonedExecution) -> Result<Uuid, AbandonedRepoError> {
|
||||
// Truncate the summary at write-time. The forensic table
|
||||
// doesn't need megabytes; the original outbox row may have
|
||||
// been arbitrary size but we lose nothing useful by clipping.
|
||||
let summary = row.result_summary.map(|s| truncate(s, SUMMARY_CAP_BYTES));
|
||||
let (id,): (Uuid,) = sqlx::query_as(
|
||||
"INSERT INTO abandoned_executions ( \
|
||||
app_id, outbox_id, script_id, inbox_id, status_code, result_summary \
|
||||
) VALUES ($1, $2, $3, $4, $5, $6) \
|
||||
RETURNING id",
|
||||
)
|
||||
.bind(row.app_id.into_inner())
|
||||
.bind(row.outbox_id)
|
||||
.bind(row.script_id.map(ScriptId::into_inner))
|
||||
.bind(row.inbox_id)
|
||||
.bind(i32::from(row.status_code))
|
||||
.bind(summary)
|
||||
.fetch_one(&self.pool)
|
||||
.await?;
|
||||
Ok(id)
|
||||
}
|
||||
|
||||
async fn gc(&self, older_than: DateTime<Utc>, limit: i64) -> Result<u64, AbandonedRepoError> {
|
||||
let res = sqlx::query(
|
||||
"DELETE FROM abandoned_executions \
|
||||
WHERE id IN ( \
|
||||
SELECT id FROM abandoned_executions \
|
||||
WHERE created_at < $1 \
|
||||
FOR UPDATE SKIP LOCKED \
|
||||
LIMIT $2 \
|
||||
)",
|
||||
)
|
||||
.bind(older_than)
|
||||
.bind(limit)
|
||||
.execute(&self.pool)
|
||||
.await?;
|
||||
Ok(res.rows_affected())
|
||||
}
|
||||
}
|
||||
|
||||
fn truncate(mut s: String, max_bytes: usize) -> String {
|
||||
if s.len() <= max_bytes {
|
||||
return s;
|
||||
}
|
||||
// Walk back from `max_bytes` to a UTF-8 char boundary so we never
|
||||
// panic on `truncate` mid-codepoint.
|
||||
let mut cut = max_bytes;
|
||||
while cut > 0 && !s.is_char_boundary(cut) {
|
||||
cut -= 1;
|
||||
}
|
||||
s.truncate(cut);
|
||||
s
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn truncate_respects_char_boundaries() {
|
||||
// `é` is a 2-byte UTF-8 char; a cap that lands inside it should walk
|
||||
// back to the start.
|
||||
let s = "héllo".to_string();
|
||||
let t = truncate(s, 2);
|
||||
assert!(t.is_char_boundary(t.len()));
|
||||
assert_eq!(t, "h");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn truncate_passthrough_for_short_strings() {
|
||||
assert_eq!(truncate("ok".into(), 100), "ok");
|
||||
}
|
||||
}
|
||||
@@ -82,6 +82,7 @@ async fn seed_into(
|
||||
// Accept any method so both `curl /hello` and
|
||||
// `curl -d '{"name":"X"}' /hello` work out of the box.
|
||||
method: None,
|
||||
dispatch_mode: picloud_shared::DispatchMode::Sync,
|
||||
})
|
||||
.await?;
|
||||
|
||||
|
||||
@@ -57,6 +57,21 @@ pub enum Capability {
|
||||
AppAdmin(AppId),
|
||||
/// Read execution logs for scripts in this app.
|
||||
AppLogRead(AppId),
|
||||
/// Read entries from this app's KV store (v1.1.1). Granted to
|
||||
/// `viewer`+ in the per-app role table. Maps to `script:read` on
|
||||
/// API keys — the seven-scope vocabulary stays locked.
|
||||
AppKvRead(AppId),
|
||||
/// Write entries to this app's KV store (v1.1.1). Granted to
|
||||
/// `editor`+. Maps to `script:write` on API keys.
|
||||
AppKvWrite(AppId),
|
||||
/// Create / list / delete triggers for this app (v1.1.1). Maps to
|
||||
/// `app:admin` on API keys — triggers are app-configuration acts
|
||||
/// rather than data-plane access. Granted to `app_admin`+.
|
||||
AppManageTriggers(AppId),
|
||||
/// Replay / resolve dead-letter rows for this app (v1.1.1). Maps
|
||||
/// to `app:admin` on API keys. Public-HTTP scripts (principal None)
|
||||
/// fail this check — managing dead letters is an admin act.
|
||||
AppDeadLetterManage(AppId),
|
||||
}
|
||||
|
||||
impl Capability {
|
||||
@@ -73,7 +88,11 @@ impl Capability {
|
||||
| Self::AppWriteRoute(id)
|
||||
| Self::AppManageDomains(id)
|
||||
| Self::AppAdmin(id)
|
||||
| Self::AppLogRead(id) => Some(id),
|
||||
| Self::AppLogRead(id)
|
||||
| Self::AppKvRead(id)
|
||||
| Self::AppKvWrite(id)
|
||||
| Self::AppManageTriggers(id)
|
||||
| Self::AppDeadLetterManage(id) => Some(id),
|
||||
}
|
||||
}
|
||||
|
||||
@@ -88,11 +107,13 @@ impl Capability {
|
||||
Self::InstanceCreateApp | Self::InstanceManageUsers | Self::InstanceManageSettings => {
|
||||
Scope::InstanceAdmin
|
||||
}
|
||||
Self::AppRead(_) => Scope::ScriptRead,
|
||||
Self::AppWriteScript(_) => Scope::ScriptWrite,
|
||||
Self::AppRead(_) | Self::AppKvRead(_) => Scope::ScriptRead,
|
||||
Self::AppWriteScript(_) | Self::AppKvWrite(_) => Scope::ScriptWrite,
|
||||
Self::AppWriteRoute(_) => Scope::RouteWrite,
|
||||
Self::AppManageDomains(_) => Scope::DomainManage,
|
||||
Self::AppAdmin(_) => Scope::AppAdmin,
|
||||
Self::AppAdmin(_) | Self::AppManageTriggers(_) | Self::AppDeadLetterManage(_) => {
|
||||
Scope::AppAdmin
|
||||
}
|
||||
Self::AppLogRead(_) => Scope::LogRead,
|
||||
}
|
||||
}
|
||||
@@ -230,16 +251,24 @@ async fn member_grants(
|
||||
/// domain claims, and delete. Roles form a strict subset chain, so
|
||||
/// the check is "is this capability in the role's set?".
|
||||
const fn role_satisfies(role: AppRole, cap: Capability) -> bool {
|
||||
let in_viewer = matches!(cap, Capability::AppRead(_) | Capability::AppLogRead(_));
|
||||
let in_viewer = matches!(
|
||||
cap,
|
||||
Capability::AppRead(_) | Capability::AppLogRead(_) | Capability::AppKvRead(_)
|
||||
);
|
||||
let in_editor = in_viewer
|
||||
|| matches!(
|
||||
cap,
|
||||
Capability::AppWriteScript(_) | Capability::AppWriteRoute(_)
|
||||
Capability::AppWriteScript(_)
|
||||
| Capability::AppWriteRoute(_)
|
||||
| Capability::AppKvWrite(_)
|
||||
);
|
||||
let in_app_admin = in_editor
|
||||
|| matches!(
|
||||
cap,
|
||||
Capability::AppManageDomains(_) | Capability::AppAdmin(_)
|
||||
Capability::AppManageDomains(_)
|
||||
| Capability::AppAdmin(_)
|
||||
| Capability::AppManageTriggers(_)
|
||||
| Capability::AppDeadLetterManage(_)
|
||||
);
|
||||
match role {
|
||||
AppRole::Viewer => in_viewer,
|
||||
|
||||
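Reviewer note: the subset chain that `role_satisfies` relies on can be sketched in isolation. This is a minimal model, not the crate's code. The `Role` and `Cap` enums below are hypothetical stand-ins for `AppRole` and `Capability` (the real capabilities carry an `AppId`, and the real role enum may have more variants). The `Editor` and `AppAdmin` match arms are assumptions, since the hunk above is cut off after the `Viewer` arm.

```rust
// Hypothetical stand-ins for `AppRole` and `Capability`.
#[derive(Clone, Copy, Debug)]
enum Role {
    Viewer,
    Editor,
    AppAdmin,
}

#[derive(Clone, Copy, Debug)]
enum Cap {
    Read,
    LogRead,
    KvRead,
    WriteScript,
    WriteRoute,
    KvWrite,
    ManageDomains,
    Admin,
    ManageTriggers,
    DeadLetterManage,
}

// Each tier is "everything the tier below has, plus a few more",
// so the check reduces to set membership for the role's tier.
const fn role_satisfies(role: Role, cap: Cap) -> bool {
    let in_viewer = matches!(cap, Cap::Read | Cap::LogRead | Cap::KvRead);
    let in_editor =
        in_viewer || matches!(cap, Cap::WriteScript | Cap::WriteRoute | Cap::KvWrite);
    let in_app_admin = in_editor
        || matches!(
            cap,
            Cap::ManageDomains | Cap::Admin | Cap::ManageTriggers | Cap::DeadLetterManage
        );
    match role {
        Role::Viewer => in_viewer,
        // Assumed arms: not visible in the hunk above.
        Role::Editor => in_editor,
        Role::AppAdmin => in_app_admin,
    }
}
```

Under this model a viewer can read KV entries but not write them, and only the admin tier reaches the trigger and dead-letter capabilities added in this change.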
261
crates/manager-core/src/dead_letter_repo.rs
Normal file
@@ -0,0 +1,261 @@
|
||||
//! `DeadLetterRepo` — CRUD over the `dead_letters` table.
|
||||
//!
|
||||
//! The dispatcher writes new rows when an async trigger exhausts its
|
||||
//! retry policy. Admin endpoints (commit 8) read for the dashboard
|
||||
//! list view and write to mark rows resolved or replay them. The GC
|
||||
//! sweeper (commit 10) deletes expired rows by `created_at`.
|
||||
|
||||
use async_trait::async_trait;
|
||||
use chrono::{DateTime, Utc};
|
||||
use picloud_shared::{AppId, DeadLetterId, ScriptId, TriggerId};
|
||||
use sqlx::PgPool;
|
||||
use uuid::Uuid;
|
||||
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub enum DeadLetterRepoError {
|
||||
#[error("database error: {0}")]
|
||||
Db(#[from] sqlx::Error),
|
||||
|
||||
#[error("dead-letter row not found: {0}")]
|
||||
NotFound(DeadLetterId),
|
||||
|
||||
#[error("invalid resolution {0:?}")]
|
||||
InvalidResolution(String),
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct NewDeadLetter {
|
||||
pub app_id: AppId,
|
||||
/// `outbox.id` that exhausted retries. Outbox row deleted at the
|
||||
/// same time.
|
||||
pub original_event_id: Uuid,
|
||||
pub source: String,
|
||||
pub op: String,
|
||||
pub trigger_id: Option<TriggerId>,
|
||||
pub script_id: Option<ScriptId>,
|
||||
pub payload: serde_json::Value,
|
||||
pub attempt_count: u32,
|
||||
pub first_attempt_at: DateTime<Utc>,
|
||||
pub last_attempt_at: DateTime<Utc>,
|
||||
pub last_error: String,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct DeadLetterRow {
|
||||
pub id: DeadLetterId,
|
||||
pub app_id: AppId,
|
||||
pub original_event_id: Uuid,
|
||||
pub source: String,
|
||||
pub op: String,
|
||||
pub trigger_id: Option<TriggerId>,
|
||||
pub script_id: Option<ScriptId>,
|
||||
pub payload: serde_json::Value,
|
||||
pub attempt_count: u32,
|
||||
pub first_attempt_at: DateTime<Utc>,
|
||||
pub last_attempt_at: DateTime<Utc>,
|
||||
pub last_error: String,
|
||||
pub created_at: DateTime<Utc>,
|
||||
pub resolved_at: Option<DateTime<Utc>>,
|
||||
pub resolution: Option<String>,
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
pub trait DeadLetterRepo: Send + Sync {
|
||||
/// Insert a new dead-letter row. Returns the assigned id.
|
||||
async fn insert(&self, row: NewDeadLetter) -> Result<DeadLetterId, DeadLetterRepoError>;
|
||||
|
||||
async fn get(&self, id: DeadLetterId) -> Result<Option<DeadLetterRow>, DeadLetterRepoError>;
|
||||
|
||||
/// Lookup for the dashboard list view. `unresolved_only=true`
|
||||
/// filters to `resolved_at IS NULL`.
|
||||
async fn list_for_app(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
unresolved_only: bool,
|
||||
limit: i64,
|
||||
offset: i64,
|
||||
) -> Result<Vec<DeadLetterRow>, DeadLetterRepoError>;
|
||||
|
||||
/// Hot path for the dashboard's per-app unresolved-count badge.
|
||||
async fn unresolved_count(&self, app_id: AppId) -> Result<i64, DeadLetterRepoError>;
|
||||
|
||||
/// Mark the row resolved with the given reason. The reason MUST
|
||||
/// be one of the four CHECK-constraint values
|
||||
/// (`replayed`, `ignored`, `handled_by_script`, `handler_failed`).
|
||||
async fn resolve(&self, id: DeadLetterId, reason: &str) -> Result<(), DeadLetterRepoError>;
|
||||
|
||||
/// Retention sweep. Deletes rows with `created_at < older_than`
|
||||
/// up to `limit` at a time, using FOR UPDATE SKIP LOCKED to play
|
||||
/// nicely with concurrent dispatchers. Returns the count deleted.
|
||||
async fn gc(&self, older_than: DateTime<Utc>, limit: i64) -> Result<u64, DeadLetterRepoError>;
|
||||
}
|
||||
|
||||
pub struct PostgresDeadLetterRepo {
|
||||
pool: PgPool,
|
||||
}
|
||||
|
||||
impl PostgresDeadLetterRepo {
|
||||
#[must_use]
|
||||
pub fn new(pool: PgPool) -> Self {
|
||||
Self { pool }
|
||||
}
|
||||
}
|
||||
|
||||
const ALLOWED_RESOLUTIONS: &[&str] =
|
||||
&["replayed", "ignored", "handled_by_script", "handler_failed"];
|
||||
|
||||
#[async_trait]
|
||||
impl DeadLetterRepo for PostgresDeadLetterRepo {
|
||||
async fn insert(&self, row: NewDeadLetter) -> Result<DeadLetterId, DeadLetterRepoError> {
|
||||
let (id,): (Uuid,) = sqlx::query_as(
|
||||
"INSERT INTO dead_letters ( \
|
||||
app_id, original_event_id, source, op, trigger_id, script_id, \
|
||||
payload, attempt_count, first_attempt_at, last_attempt_at, last_error \
|
||||
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) \
|
||||
RETURNING id",
|
||||
)
|
||||
.bind(row.app_id.into_inner())
|
||||
.bind(row.original_event_id)
|
||||
.bind(row.source)
|
||||
.bind(row.op)
|
||||
.bind(row.trigger_id.map(TriggerId::into_inner))
|
||||
.bind(row.script_id.map(ScriptId::into_inner))
|
||||
.bind(row.payload)
|
||||
.bind(i32::try_from(row.attempt_count).unwrap_or(0))
|
||||
.bind(row.first_attempt_at)
|
||||
.bind(row.last_attempt_at)
|
||||
.bind(row.last_error)
|
||||
.fetch_one(&self.pool)
|
||||
.await?;
|
||||
Ok(id.into())
|
||||
}
|
||||
|
||||
async fn get(&self, id: DeadLetterId) -> Result<Option<DeadLetterRow>, DeadLetterRepoError> {
|
||||
let row: Option<DeadLetterRowRaw> = sqlx::query_as(
|
||||
"SELECT id, app_id, original_event_id, source, op, trigger_id, script_id, \
|
||||
payload, attempt_count, first_attempt_at, last_attempt_at, \
|
||||
last_error, created_at, resolved_at, resolution \
|
||||
FROM dead_letters WHERE id = $1",
|
||||
)
|
||||
.bind(id.into_inner())
|
||||
.fetch_optional(&self.pool)
|
||||
.await?;
|
||||
Ok(row.map(DeadLetterRowRaw::into_row))
|
||||
}
|
||||
|
||||
async fn list_for_app(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
unresolved_only: bool,
|
||||
limit: i64,
|
||||
offset: i64,
|
||||
) -> Result<Vec<DeadLetterRow>, DeadLetterRepoError> {
|
||||
let rows: Vec<DeadLetterRowRaw> = sqlx::query_as(
|
||||
"SELECT id, app_id, original_event_id, source, op, trigger_id, script_id, \
|
||||
payload, attempt_count, first_attempt_at, last_attempt_at, \
|
||||
last_error, created_at, resolved_at, resolution \
|
||||
FROM dead_letters \
|
||||
WHERE app_id = $1 \
|
||||
AND ($2::bool = FALSE OR resolved_at IS NULL) \
|
||||
ORDER BY created_at DESC \
|
||||
LIMIT $3 OFFSET $4",
|
||||
)
|
||||
.bind(app_id.into_inner())
|
||||
.bind(unresolved_only)
|
||||
.bind(limit)
|
||||
.bind(offset)
|
||||
.fetch_all(&self.pool)
|
||||
.await?;
|
||||
Ok(rows.into_iter().map(DeadLetterRowRaw::into_row).collect())
|
||||
}
|
||||
|
||||
async fn unresolved_count(&self, app_id: AppId) -> Result<i64, DeadLetterRepoError> {
|
||||
let (count,): (i64,) = sqlx::query_as(
|
||||
"SELECT COUNT(*) FROM dead_letters \
|
||||
WHERE app_id = $1 AND resolved_at IS NULL",
|
||||
)
|
||||
.bind(app_id.into_inner())
|
||||
.fetch_one(&self.pool)
|
||||
.await?;
|
||||
Ok(count)
|
||||
}
|
||||
|
||||
async fn resolve(&self, id: DeadLetterId, reason: &str) -> Result<(), DeadLetterRepoError> {
|
||||
if !ALLOWED_RESOLUTIONS.contains(&reason) {
|
||||
return Err(DeadLetterRepoError::InvalidResolution(reason.to_string()));
|
||||
}
|
||||
let res = sqlx::query(
|
||||
"UPDATE dead_letters \
|
||||
SET resolution = $2, resolved_at = NOW() \
|
||||
WHERE id = $1",
|
||||
)
|
||||
.bind(id.into_inner())
|
||||
.bind(reason)
|
||||
.execute(&self.pool)
|
||||
.await?;
|
||||
if res.rows_affected() == 0 {
|
||||
return Err(DeadLetterRepoError::NotFound(id));
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn gc(&self, older_than: DateTime<Utc>, limit: i64) -> Result<u64, DeadLetterRepoError> {
|
||||
// Candidate rows are selected under FOR UPDATE SKIP LOCKED so concurrent
// sweepers (cluster mode) don't fight each other.
|
||||
let res = sqlx::query(
|
||||
"DELETE FROM dead_letters \
|
||||
WHERE id IN ( \
|
||||
SELECT id FROM dead_letters \
|
||||
WHERE created_at < $1 \
|
||||
FOR UPDATE SKIP LOCKED \
|
||||
LIMIT $2 \
|
||||
)",
|
||||
)
|
||||
.bind(older_than)
|
||||
.bind(limit)
|
||||
.execute(&self.pool)
|
||||
.await?;
|
||||
Ok(res.rows_affected())
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(sqlx::FromRow)]
|
||||
struct DeadLetterRowRaw {
|
||||
id: Uuid,
|
||||
app_id: Uuid,
|
||||
original_event_id: Uuid,
|
||||
source: String,
|
||||
op: String,
|
||||
trigger_id: Option<Uuid>,
|
||||
script_id: Option<Uuid>,
|
||||
payload: serde_json::Value,
|
||||
attempt_count: i32,
|
||||
first_attempt_at: DateTime<Utc>,
|
||||
last_attempt_at: DateTime<Utc>,
|
||||
last_error: String,
|
||||
created_at: DateTime<Utc>,
|
||||
resolved_at: Option<DateTime<Utc>>,
|
||||
resolution: Option<String>,
|
||||
}
|
||||
|
||||
impl DeadLetterRowRaw {
|
||||
fn into_row(self) -> DeadLetterRow {
|
||||
DeadLetterRow {
|
||||
id: self.id.into(),
|
||||
app_id: self.app_id.into(),
|
||||
original_event_id: self.original_event_id,
|
||||
source: self.source,
|
||||
op: self.op,
|
||||
trigger_id: self.trigger_id.map(Into::into),
|
||||
script_id: self.script_id.map(Into::into),
|
||||
payload: self.payload,
|
||||
attempt_count: u32::try_from(self.attempt_count).unwrap_or(0),
|
||||
first_attempt_at: self.first_attempt_at,
|
||||
last_attempt_at: self.last_attempt_at,
|
||||
last_error: self.last_error,
|
||||
created_at: self.created_at,
|
||||
resolved_at: self.resolved_at,
|
||||
resolution: self.resolution,
|
||||
}
|
||||
}
|
||||
}
|
||||
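Reviewer note: because `gc` deletes at most `limit` rows per call, a sweeper has to call it repeatedly to drain a backlog. The sketch below shows one way that loop could look. It is synchronous and uses a hypothetical `BatchGc` trait and `FakeStore` in place of the async `DeadLetterRepo` and Postgres. The `max_batches` bound is also an assumption, added so that a single sweep cannot run unbounded.

```rust
// Hypothetical, synchronous stand-in for `DeadLetterRepo::gc`:
// deletes up to `limit` expired rows and returns how many were deleted.
trait BatchGc {
    fn gc(&mut self, limit: u64) -> u64;
}

// In-memory fake that only tracks how many expired rows remain.
struct FakeStore {
    expired: u64,
}

impl BatchGc for FakeStore {
    fn gc(&mut self, limit: u64) -> u64 {
        let n = self.expired.min(limit);
        self.expired -= n;
        n
    }
}

// Call `gc` until a batch comes back short (backlog drained) or the
// batch budget is used up. Returns the total number of rows deleted.
fn sweep(store: &mut impl BatchGc, limit: u64, max_batches: u32) -> u64 {
    let mut total = 0;
    for _ in 0..max_batches {
        let n = store.gc(limit);
        total += n;
        if n < limit {
            break;
        }
    }
    total
}
```

A short batch is the stop signal here: it means fewer than `limit` expired rows were available, so another call would most likely find nothing.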
118
crates/manager-core/src/dead_letter_service.rs
Normal file
@@ -0,0 +1,118 @@
|
||||
//! `PostgresDeadLetterService` — replaces `NoopDeadLetterService` in
|
||||
//! v1.1.1's `Services` bundle. Implements `replay` (re-enqueue the
|
||||
//! original event into the outbox + mark the DL row replayed) and
|
||||
//! `resolve` (close the row out with a reason).
|
||||
//!
|
||||
//! Both methods are gated by `Capability::AppDeadLetterManage(AppId)`
|
||||
//! evaluated against `cx.principal`. Public-HTTP scripts with
|
||||
//! `principal: None` fail the check — design notes §4: managing
|
||||
//! dead letters is an admin act.
|
||||
|
||||
use std::sync::Arc;
|
||||
|
||||
use async_trait::async_trait;
|
||||
use picloud_shared::{DeadLetterError, DeadLetterId, DeadLetterService, SdkCallCx};
|
||||
|
||||
use crate::authz::{self, AuthzRepo, Capability};
|
||||
use crate::dead_letter_repo::{DeadLetterRepo, DeadLetterRepoError, DeadLetterRow};
|
||||
use crate::outbox_repo::{NewOutboxRow, OutboxRepo, OutboxSourceKind};
|
||||
|
||||
pub struct PostgresDeadLetterService {
|
||||
repo: Arc<dyn DeadLetterRepo>,
|
||||
outbox: Arc<dyn OutboxRepo>,
|
||||
authz: Arc<dyn AuthzRepo>,
|
||||
}
|
||||
|
||||
impl PostgresDeadLetterService {
|
||||
#[must_use]
|
||||
pub fn new(
|
||||
repo: Arc<dyn DeadLetterRepo>,
|
||||
outbox: Arc<dyn OutboxRepo>,
|
||||
authz: Arc<dyn AuthzRepo>,
|
||||
) -> Self {
|
||||
Self {
|
||||
repo,
|
||||
outbox,
|
||||
authz,
|
||||
}
|
||||
}
|
||||
|
||||
async fn require_dl_capability(&self, cx: &SdkCallCx) -> Result<(), DeadLetterError> {
|
||||
let Some(ref principal) = cx.principal else {
|
||||
return Err(DeadLetterError::Forbidden);
|
||||
};
|
||||
authz::require(
|
||||
&*self.authz,
|
||||
principal,
|
||||
Capability::AppDeadLetterManage(cx.app_id),
|
||||
)
|
||||
.await
|
||||
.map_err(|_| DeadLetterError::Forbidden)
|
||||
}
|
||||
|
||||
async fn load_row(&self, id: DeadLetterId) -> Result<DeadLetterRow, DeadLetterError> {
|
||||
self.repo
|
||||
.get(id)
|
||||
.await
|
||||
.map_err(map_repo_err)?
|
||||
.ok_or(DeadLetterError::NotFound)
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl DeadLetterService for PostgresDeadLetterService {
|
||||
async fn replay(&self, cx: &SdkCallCx, id: DeadLetterId) -> Result<(), DeadLetterError> {
|
||||
self.require_dl_capability(cx).await?;
|
||||
let row = self.load_row(id).await?;
|
||||
if row.app_id != cx.app_id {
|
||||
// Cross-app — treat as not-found to avoid leaking
|
||||
// information about other apps' dead letters.
|
||||
return Err(DeadLetterError::NotFound);
|
||||
}
|
||||
|
||||
let source_kind = OutboxSourceKind::from_wire(&row.source).unwrap_or(OutboxSourceKind::Kv);
|
||||
self.outbox
|
||||
.insert(NewOutboxRow {
|
||||
app_id: row.app_id,
|
||||
source_kind,
|
||||
trigger_id: row.trigger_id,
|
||||
script_id: row.script_id,
|
||||
reply_to: None,
|
||||
payload: row.payload.clone(),
|
||||
origin_principal: None,
|
||||
trigger_depth: 0,
|
||||
root_execution_id: None,
|
||||
})
|
||||
.await
|
||||
.map_err(|e| DeadLetterError::Backend(e.to_string()))?;
|
||||
|
||||
self.repo
|
||||
.resolve(id, "replayed")
|
||||
.await
|
||||
.map_err(map_repo_err)?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn resolve(
|
||||
&self,
|
||||
cx: &SdkCallCx,
|
||||
id: DeadLetterId,
|
||||
reason: &str,
|
||||
) -> Result<(), DeadLetterError> {
|
||||
self.require_dl_capability(cx).await?;
|
||||
let row = self.load_row(id).await?;
|
||||
if row.app_id != cx.app_id {
|
||||
return Err(DeadLetterError::NotFound);
|
||||
}
|
||||
self.repo.resolve(id, reason).await.map_err(map_repo_err)?;
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
fn map_repo_err(e: DeadLetterRepoError) -> DeadLetterError {
|
||||
match e {
|
||||
DeadLetterRepoError::NotFound(_) => DeadLetterError::NotFound,
|
||||
DeadLetterRepoError::InvalidResolution(s) => DeadLetterError::InvalidResolution(s),
|
||||
DeadLetterRepoError::Db(e) => DeadLetterError::Backend(e.to_string()),
|
||||
}
|
||||
}
|
||||
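Reviewer note: `replay` and `resolve` share the same access pattern: capability check first, then row lookup, then an app-ownership check that reports a foreign row as not found. The sketch below isolates that ordering. `DlError`, `Row`, and `check_access` are hypothetical names, and plain `u32` app ids stand in for `AppId`.

```rust
// Hypothetical error type mirroring the two outcomes that matter here.
#[derive(Debug, PartialEq, Eq)]
enum DlError {
    Forbidden,
    NotFound,
}

// Hypothetical row: only the owning app matters for this check.
struct Row {
    app_id: u32,
}

// Order of checks, as in the service above:
// 1. the caller must hold the dead-letter capability,
// 2. the row must exist,
// 3. the row must belong to the caller's app; a mismatch is reported
//    as NotFound so other apps' rows are not revealed.
fn check_access(
    caller_app: u32,
    has_capability: bool,
    row: Option<&Row>,
) -> Result<(), DlError> {
    if !has_capability {
        return Err(DlError::Forbidden);
    }
    let row = row.ok_or(DlError::NotFound)?;
    if row.app_id != caller_app {
        return Err(DlError::NotFound);
    }
    Ok(())
}
```

Returning `NotFound` for a cross-app id means a caller cannot distinguish "does not exist" from "exists in another app".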
316
crates/manager-core/src/dead_letters_api.rs
Normal file
@@ -0,0 +1,316 @@
|
||||
//! `/api/v1/admin/apps/{id}/dead_letters/*` — dashboard surface for
|
||||
//! the no-default-handler model (design notes §4).
|
||||
//!
|
||||
//! Endpoints:
|
||||
//! - `GET /apps/{id}/dead_letters?unresolved=true` — list view
|
||||
//! - `GET /apps/{id}/dead_letters/count` — badge count
|
||||
//! - `GET /apps/{id}/dead_letters/{dl_id}` — row detail
|
||||
//! - `POST /apps/{id}/dead_letters/{dl_id}/replay` — re-enqueue
|
||||
//! - `POST /apps/{id}/dead_letters/{dl_id}/resolve` — mark resolved
|
||||
//!
|
||||
//! All gated on `Capability::AppDeadLetterManage(app_id)`.
|
||||
|
||||
use std::sync::Arc;
|
||||
|
||||
use axum::extract::{Path, Query, State};
|
||||
use axum::http::StatusCode;
|
||||
use axum::response::{IntoResponse, Json, Response};
|
||||
use axum::routing::{get, post};
|
||||
use axum::{Extension, Router};
|
||||
use picloud_shared::{AppId, DeadLetterId, DeadLetterService, Principal, SdkCallCx};
|
||||
use serde::{Deserialize, Serialize};
|
||||
use serde_json::json;
|
||||
|
||||
use crate::app_repo::AppRepository;
|
||||
use crate::authz::{require, AuthzDenied, AuthzError, AuthzRepo, Capability};
|
||||
use crate::dead_letter_repo::{DeadLetterRepo, DeadLetterRepoError, DeadLetterRow};
|
||||
|
||||
#[derive(Clone)]
|
||||
pub struct DeadLettersState {
|
||||
pub repo: Arc<dyn DeadLetterRepo>,
|
||||
pub service: Arc<dyn DeadLetterService>,
|
||||
pub apps: Arc<dyn AppRepository>,
|
||||
pub authz: Arc<dyn AuthzRepo>,
|
||||
}
|
||||
|
||||
pub fn dead_letters_router(state: DeadLettersState) -> Router {
|
||||
Router::new()
|
||||
.route("/apps/{app_id}/dead_letters", get(list))
|
||||
.route("/apps/{app_id}/dead_letters/count", get(count))
|
||||
.route("/apps/{app_id}/dead_letters/{dl_id}", get(detail))
|
||||
.route("/apps/{app_id}/dead_letters/{dl_id}/replay", post(replay))
|
||||
.route("/apps/{app_id}/dead_letters/{dl_id}/resolve", post(resolve))
|
||||
.with_state(state)
|
||||
}
|
||||
|
||||
#[derive(Debug, Deserialize)]
|
||||
pub struct ListQuery {
|
||||
#[serde(default)]
|
||||
pub unresolved: bool,
|
||||
#[serde(default = "default_limit")]
|
||||
pub limit: i64,
|
||||
#[serde(default)]
|
||||
pub offset: i64,
|
||||
}
|
||||
|
||||
const fn default_limit() -> i64 {
|
||||
50
|
||||
}
|
||||
|
||||
#[derive(Debug, Serialize)]
|
||||
pub struct ListResponse {
|
||||
pub dead_letters: Vec<DeadLetterDto>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Serialize)]
|
||||
pub struct CountResponse {
|
||||
pub unresolved: i64,
|
||||
}
|
||||
|
||||
#[derive(Debug, Deserialize)]
|
||||
pub struct ResolveBody {
|
||||
pub reason: String,
|
||||
}
|
||||
|
||||
#[derive(Debug, Serialize)]
|
||||
pub struct DeadLetterDto {
|
||||
pub id: DeadLetterId,
|
||||
pub app_id: AppId,
|
||||
pub source: String,
|
||||
pub op: String,
|
||||
pub trigger_id: Option<picloud_shared::TriggerId>,
|
||||
pub script_id: Option<picloud_shared::ScriptId>,
|
||||
pub payload: serde_json::Value,
|
||||
pub attempt_count: u32,
|
||||
pub first_attempt_at: chrono::DateTime<chrono::Utc>,
|
||||
pub last_attempt_at: chrono::DateTime<chrono::Utc>,
|
||||
pub last_error: String,
|
||||
pub created_at: chrono::DateTime<chrono::Utc>,
|
||||
pub resolved_at: Option<chrono::DateTime<chrono::Utc>>,
|
||||
pub resolution: Option<String>,
|
||||
}
|
||||
|
||||
impl From<DeadLetterRow> for DeadLetterDto {
|
||||
fn from(r: DeadLetterRow) -> Self {
|
||||
Self {
|
||||
id: r.id,
|
||||
app_id: r.app_id,
|
||||
source: r.source,
|
||||
op: r.op,
|
||||
trigger_id: r.trigger_id,
|
||||
script_id: r.script_id,
|
||||
payload: r.payload,
|
||||
attempt_count: r.attempt_count,
|
||||
first_attempt_at: r.first_attempt_at,
|
||||
last_attempt_at: r.last_attempt_at,
|
||||
last_error: r.last_error,
|
||||
created_at: r.created_at,
|
||||
resolved_at: r.resolved_at,
|
||||
resolution: r.resolution,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
async fn list(
|
||||
State(s): State<DeadLettersState>,
|
||||
Extension(principal): Extension<Principal>,
|
||||
Path(app_id): Path<AppId>,
|
||||
Query(q): Query<ListQuery>,
|
||||
) -> Result<Json<ListResponse>, DeadLettersApiError> {
|
||||
ensure_app(&*s.apps, app_id).await?;
|
||||
require(
|
||||
s.authz.as_ref(),
|
||||
&principal,
|
||||
Capability::AppDeadLetterManage(app_id),
|
||||
)
|
||||
.await?;
|
||||
let rows = s
|
||||
.repo
|
||||
.list_for_app(app_id, q.unresolved, q.limit.clamp(1, 200), q.offset.max(0))
|
||||
.await?;
|
||||
Ok(Json(ListResponse {
|
||||
dead_letters: rows.into_iter().map(Into::into).collect(),
|
||||
}))
|
||||
}
|
||||
|
||||
async fn count(
|
||||
State(s): State<DeadLettersState>,
|
||||
Extension(principal): Extension<Principal>,
|
||||
Path(app_id): Path<AppId>,
|
||||
) -> Result<Json<CountResponse>, DeadLettersApiError> {
|
||||
ensure_app(&*s.apps, app_id).await?;
|
||||
require(
|
||||
s.authz.as_ref(),
|
||||
&principal,
|
||||
Capability::AppDeadLetterManage(app_id),
|
||||
)
|
||||
.await?;
|
||||
let n = s.repo.unresolved_count(app_id).await?;
|
||||
Ok(Json(CountResponse { unresolved: n }))
|
||||
}
|
||||
|
||||
async fn detail(
|
||||
State(s): State<DeadLettersState>,
|
||||
Extension(principal): Extension<Principal>,
|
||||
Path((app_id, dl_id)): Path<(AppId, DeadLetterId)>,
|
||||
) -> Result<Json<DeadLetterDto>, DeadLettersApiError> {
|
||||
ensure_app(&*s.apps, app_id).await?;
|
||||
require(
|
||||
s.authz.as_ref(),
|
||||
&principal,
|
||||
Capability::AppDeadLetterManage(app_id),
|
||||
)
|
||||
.await?;
|
||||
let row = s
|
||||
.repo
|
||||
.get(dl_id)
|
||||
.await?
|
||||
.ok_or(DeadLettersApiError::NotFound(dl_id))?;
|
||||
if row.app_id != app_id {
|
||||
return Err(DeadLettersApiError::NotFound(dl_id));
|
||||
}
|
||||
Ok(Json(row.into()))
|
||||
}
|
||||
|
||||
async fn replay(
|
||||
State(s): State<DeadLettersState>,
|
||||
Extension(principal): Extension<Principal>,
|
||||
Path((app_id, dl_id)): Path<(AppId, DeadLetterId)>,
|
||||
) -> Result<StatusCode, DeadLettersApiError> {
|
||||
ensure_app(&*s.apps, app_id).await?;
|
||||
// Authz handled inside the service via SdkCallCx.
|
||||
let cx = admin_cx(app_id, &principal);
|
||||
s.service
|
||||
.replay(&cx, dl_id)
|
||||
.await
|
||||
.map_err(map_service_err)?;
|
||||
Ok(StatusCode::NO_CONTENT)
|
||||
}
|
||||
|
||||
async fn resolve(
|
||||
State(s): State<DeadLettersState>,
|
||||
Extension(principal): Extension<Principal>,
|
||||
Path((app_id, dl_id)): Path<(AppId, DeadLetterId)>,
|
||||
Json(body): Json<ResolveBody>,
|
||||
) -> Result<StatusCode, DeadLettersApiError> {
|
||||
ensure_app(&*s.apps, app_id).await?;
|
||||
let cx = admin_cx(app_id, &principal);
|
||||
s.service
|
||||
.resolve(&cx, dl_id, &body.reason)
|
||||
.await
|
||||
.map_err(map_service_err)?;
|
||||
Ok(StatusCode::NO_CONTENT)
|
||||
}
|
||||
|
||||
/// Synthesize an `SdkCallCx` for the admin path. The service layer
|
||||
/// reads `cx.app_id` + `cx.principal` and ignores the trigger /
|
||||
/// execution fields, so the per-call ids are arbitrary.
|
||||
fn admin_cx(app_id: AppId, principal: &Principal) -> SdkCallCx {
|
||||
SdkCallCx {
|
||||
app_id,
|
||||
principal: Some(principal.clone()),
|
||||
execution_id: picloud_shared::ExecutionId::new(),
|
||||
request_id: picloud_shared::RequestId::new(),
|
||||
trigger_depth: 0,
|
||||
root_execution_id: picloud_shared::ExecutionId::new(),
|
||||
is_dead_letter_handler: false,
|
||||
event: None,
|
||||
}
|
||||
}
|
||||
|
||||
async fn ensure_app(apps: &dyn AppRepository, app_id: AppId) -> Result<(), DeadLettersApiError> {
|
||||
apps.get_by_id(app_id)
|
||||
.await
|
||||
.map_err(|e| DeadLettersApiError::Backend(e.to_string()))?
|
||||
.ok_or_else(|| DeadLettersApiError::AppNotFound(app_id.to_string()))?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn map_service_err(e: picloud_shared::DeadLetterError) -> DeadLettersApiError {
|
||||
match e {
|
||||
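// The service error carries no id, so a freshly generated placeholder
// id is used here; the id in the resulting 404 message is not the
// one that was requested.
picloud_shared::DeadLetterError::NotFound => {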
|
||||
DeadLettersApiError::NotFound(DeadLetterId::new())
|
||||
}
|
||||
picloud_shared::DeadLetterError::Forbidden => DeadLettersApiError::Forbidden,
|
||||
picloud_shared::DeadLetterError::InvalidResolution(s) => {
|
||||
DeadLettersApiError::Invalid(format!("invalid resolution: {s}"))
|
||||
}
|
||||
picloud_shared::DeadLetterError::Backend(s) => DeadLettersApiError::Backend(s),
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub enum DeadLettersApiError {
|
||||
#[error("app not found: {0}")]
|
||||
AppNotFound(String),
|
||||
|
||||
#[error("dead-letter not found: {0}")]
|
||||
NotFound(DeadLetterId),
|
||||
|
||||
#[error("invalid: {0}")]
|
||||
Invalid(String),
|
||||
|
||||
#[error("forbidden")]
|
||||
Forbidden,
|
||||
|
||||
#[error("authorization repo error: {0}")]
|
||||
AuthzRepo(String),
|
||||
|
||||
#[error("dead-letter backend: {0}")]
|
||||
Backend(String),
|
||||
}
|
||||
|
||||
impl From<AuthzDenied> for DeadLettersApiError {
|
||||
fn from(d: AuthzDenied) -> Self {
|
||||
match d {
|
||||
AuthzDenied::Denied => Self::Forbidden,
|
||||
AuthzDenied::Repo(e) => Self::AuthzRepo(e.to_string()),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl From<AuthzError> for DeadLettersApiError {
|
||||
fn from(e: AuthzError) -> Self {
|
||||
Self::AuthzRepo(e.to_string())
|
||||
}
|
||||
}
|
||||
|
||||
impl From<DeadLetterRepoError> for DeadLettersApiError {
|
||||
fn from(e: DeadLetterRepoError) -> Self {
|
||||
match e {
|
||||
DeadLetterRepoError::NotFound(id) => Self::NotFound(id),
|
||||
DeadLetterRepoError::InvalidResolution(s) => Self::Invalid(s),
|
||||
DeadLetterRepoError::Db(e) => Self::Backend(e.to_string()),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl IntoResponse for DeadLettersApiError {
|
||||
fn into_response(self) -> Response {
|
||||
let (status, body) = match &self {
|
||||
Self::AppNotFound(_) | Self::NotFound(_) => {
|
||||
(StatusCode::NOT_FOUND, json!({ "error": self.to_string() }))
|
||||
}
|
||||
Self::Invalid(_) => (
|
||||
StatusCode::UNPROCESSABLE_ENTITY,
|
||||
json!({ "error": self.to_string() }),
|
||||
),
|
||||
Self::Forbidden => (StatusCode::FORBIDDEN, json!({ "error": self.to_string() })),
|
||||
Self::AuthzRepo(e) => {
|
||||
tracing::error!(error = %e, "dead_letters authz repo error");
|
||||
(
|
||||
StatusCode::INTERNAL_SERVER_ERROR,
|
||||
json!({ "error": "internal error" }),
|
||||
)
|
||||
}
|
||||
Self::Backend(e) => {
|
||||
tracing::error!(error = %e, "dead_letters api backend error");
|
||||
(
|
||||
StatusCode::INTERNAL_SERVER_ERROR,
|
||||
json!({ "error": "internal error" }),
|
||||
)
|
||||
}
|
||||
};
|
||||
(status, Json(body)).into_response()
|
||||
}
|
||||
}
|
||||
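Reviewer note: the `list` handler normalizes pagination input with `q.limit.clamp(1, 200)` and `q.offset.max(0)`, with a default limit of 50. The sketch below pulls that into a standalone helper to make the edge cases easy to see. `normalize_page` and the two constants are hypothetical names. The values 50 and 200 come from the code above, and a missing offset is taken as 0, which is what `#[serde(default)]` yields for an `i64`.

```rust
// Values taken from the handler above; the constant names are hypothetical.
const DEFAULT_LIMIT: i64 = 50;
const MAX_LIMIT: i64 = 200;

// Turn optional, untrusted query parameters into a safe (limit, offset)
// pair: a missing limit falls back to the default, the limit is forced
// into 1..=MAX_LIMIT, and negative offsets are raised to 0.
fn normalize_page(limit: Option<i64>, offset: Option<i64>) -> (i64, i64) {
    let limit = limit.unwrap_or(DEFAULT_LIMIT).clamp(1, MAX_LIMIT);
    let offset = offset.unwrap_or(0).max(0);
    (limit, offset)
}
```

Clamping on the server side keeps a request such as `?limit=1000000` from turning into an unbounded query.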
685
crates/manager-core/src/dispatcher.rs
Normal file
@@ -0,0 +1,685 @@
|
||||
//! The triggers-framework dispatcher.
|
||||
//!
|
||||
//! Single tokio task that polls the outbox, claims due rows
|
||||
//! (`FOR UPDATE SKIP LOCKED`), and routes each to the executor.
|
||||
//! Shares the `ExecutionGate` with sync HTTP — they compete for the
|
||||
//! same permit budget, matching design notes §2.
|
||||
//!
|
||||
//! Outcome handling per design notes §3 and §4:
|
||||
//! - reply_to.is_some() (sync HTTP): never retry. Deliver to inbox
|
||||
//! (or write `abandoned_executions` if the receiver dropped).
|
||||
//! - is_dead_letter_handler == true: never retry, never DL. Failure
|
||||
//! just annotates the original DL row with `resolution =
|
||||
//! 'handler_failed'` and bumps a metric.
|
||||
//! - Otherwise on failure: if `attempt_count + 1 < max_attempts`,
|
||||
//! reschedule with backoff + jitter. Else, write a `dead_letters`
|
||||
//! row and delete from outbox.
|
||||
//!
|
||||
//! Depth-limit: `trigger_depth > max_trigger_depth` skips execution
|
||||
//! entirely (log + metric) and deletes the row — does NOT dead-letter
|
||||
//! (design notes §4: depth-exceeded means "you built a loop", and
|
||||
//! dead-lettering would just re-fire the same loop).
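Reviewer note: the failure rules in the module docs above can be read as a small decision function. The sketch below is an illustration of those rules only, not the dispatcher's code. `FailureAction` and `on_failure` are hypothetical names, and checking `reply_to` before the dead-letter-handler flag is an assumption about precedence.

```rust
// Hypothetical summary of what the dispatcher does with a failed row.
#[derive(Debug, PartialEq, Eq)]
enum FailureAction {
    // Sync HTTP row: never retried, the failure goes back to the caller.
    ReportToCaller,
    // Dead-letter handler failed: annotate the original DL row as
    // `handler_failed`, no retry and no new dead letter.
    MarkHandlerFailed,
    // Attempts remain: reschedule with backoff and jitter.
    Retry,
    // Attempts exhausted: write a dead-letter row, delete from outbox.
    DeadLetter,
}

fn on_failure(
    has_reply_to: bool,
    is_dead_letter_handler: bool,
    attempt_count: u32,
    max_attempts: u32,
) -> FailureAction {
    if has_reply_to {
        return FailureAction::ReportToCaller;
    }
    if is_dead_letter_handler {
        return FailureAction::MarkHandlerFailed;
    }
    // Same comparison as the docs: `attempt_count + 1 < max_attempts`.
    if attempt_count.saturating_add(1) < max_attempts {
        FailureAction::Retry
    } else {
        FailureAction::DeadLetter
    }
}
```

With `max_attempts = 3`, failures at `attempt_count` 0 and 1 are retried and the failure at 2 is dead-lettered, so the script runs three times in total.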
|
||||
|
||||
use std::sync::Arc;
|
||||
use std::time::Duration;
|
||||
|
||||
use chrono::Utc;
|
||||
use picloud_executor_core::{ExecError, ExecRequest, ExecResponse, InvocationType};
|
||||
use picloud_orchestrator_core::{ExecutionGate, ExecutorClient};
|
||||
use picloud_shared::{
|
||||
ExecResponseSummary, ExecutionId, HttpDispatchPayload, InboxDeliveryOutcome, InboxFailureKind,
|
||||
InboxResolver, InboxResult, RequestId, ScriptId, ScriptSandbox, TriggerEvent,
|
||||
};
|
||||
use rand::Rng;
|
||||
use uuid::Uuid;
|
||||
|
||||
use crate::abandoned_repo::{AbandonedRepo, NewAbandonedExecution};
|
||||
use crate::dead_letter_repo::{DeadLetterRepo, NewDeadLetter};
|
||||
use crate::outbox_repo::{OutboxRepo, OutboxRow, OutboxSourceKind};
|
||||
use crate::principal_resolver::PrincipalResolver;
|
||||
use crate::repo::ScriptRepository;
|
||||
use crate::trigger_config::{BackoffShape, TriggerConfig};
|
||||
use crate::trigger_repo::{TriggerKind, TriggerRepo};
|
||||
|
||||
/// Bundle the dispatcher reads from. Each handle is `Arc<dyn …>` so
|
||||
/// tests can substitute in-memory backings.
|
||||
pub struct Dispatcher {
|
||||
pub outbox: Arc<dyn OutboxRepo>,
|
||||
pub triggers: Arc<dyn TriggerRepo>,
|
||||
pub scripts: Arc<dyn ScriptRepository>,
|
||||
pub dead_letters: Arc<dyn DeadLetterRepo>,
|
||||
pub abandoned: Arc<dyn AbandonedRepo>,
|
||||
pub principals: Arc<dyn PrincipalResolver>,
|
||||
pub executor: Arc<dyn ExecutorClient>,
|
||||
pub gate: Arc<ExecutionGate>,
|
||||
pub inbox: Arc<dyn InboxResolver>,
|
||||
pub config: TriggerConfig,
|
||||
/// Stable id for this dispatcher instance — written into
|
||||
/// `outbox.claimed_by` for forensics. In MVP this is the host's
|
||||
/// pid; cluster mode (v1.3+) uses node identity.
|
||||
pub instance_id: String,
|
||||
}
|
||||
|
||||
/// How many outbox rows the dispatcher tries to claim per tick.
|
||||
/// Bounded to keep the working set small even if there's a flood.
|
||||
const CLAIM_BATCH: i64 = 8;
|
||||
|
||||
/// Polling cadence. Short enough that fan-out feels instant; long
|
||||
/// enough that an idle dispatcher doesn't burn cycles.
|
||||
const TICK_INTERVAL: Duration = Duration::from_millis(100);
|
||||
|
||||
/// Hard cap on the wall-clock budget passed to the executor for an
|
||||
/// async-dispatched script. Sync HTTP gets a per-script timeout via
|
||||
/// the orchestrator path; async rows don't have one, so we apply a
|
||||
/// platform-wide ceiling here. Matches `LocalExecutorClient`'s own
|
||||
/// 5-minute cap.
|
||||
const ASYNC_EXEC_TIMEOUT: Duration = Duration::from_secs(300);
|
||||
|
||||
impl Dispatcher {
|
||||
/// Spawn the dispatcher loop as a detached `tokio::task`. The
|
||||
/// returned `JoinHandle` is dropped — the loop runs for the
|
||||
/// process lifetime.
|
||||
pub fn spawn(self) {
|
||||
tokio::spawn(async move {
|
||||
self.run().await;
|
||||
});
|
||||
}
|
||||
|
||||
async fn run(self) {
|
||||
let mut ticker = tokio::time::interval(TICK_INTERVAL);
|
||||
// Skip the immediate first fire so we don't race startup.
|
||||
ticker.tick().await;
|
||||
loop {
|
||||
ticker.tick().await;
|
||||
if let Err(err) = self.tick().await {
|
||||
tracing::warn!(?err, "dispatcher tick errored");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
async fn tick(&self) -> Result<(), DispatcherError> {
|
||||
// Claim up to `CLAIM_BATCH` due rows. Gate admission is not sampled
// here; it is checked per row in `dispatch_one`.
|
||||
let rows = self
|
||||
.outbox
|
||||
.claim_due(&self.instance_id, CLAIM_BATCH)
|
||||
.await
|
||||
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
|
||||
if rows.is_empty() {
|
||||
return Ok(());
|
||||
}
|
||||
for row in rows {
|
||||
// Process serially within a tick — the outer ticker is the
|
||||
// pacing mechanism. Concurrent dispatchers are a cluster-
|
||||
// mode concern; v1.1.1 MVP has one.
|
||||
if let Err(err) = self.dispatch_one(row).await {
|
||||
tracing::warn!(?err, "dispatch one errored");
|
||||
}
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn dispatch_one(&self, row: OutboxRow) -> Result<(), DispatcherError> {
|
||||
// Depth-limit check — design notes §4: loops aren't DL'd.
|
||||
if row.trigger_depth > self.config.max_trigger_depth {
|
||||
tracing::warn!(
|
||||
outbox_id = %row.id,
|
||||
app_id = %row.app_id,
|
||||
trigger_depth = row.trigger_depth,
|
||||
"trigger depth exceeded; dropping row"
|
||||
);
|
||||
// TODO(metrics): bump `picloud_trigger_depth_exceeded{app_id,trigger_id}`.
|
||||
self.outbox
|
||||
.delete(row.id)
|
||||
.await
|
||||
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
// Gate admission — non-blocking. If the gate is saturated,
|
||||
// release the claim by rescheduling so another tick can pick
|
||||
// it up. The row stays "due" essentially immediately.
|
||||
let Ok(permit) = self.gate.try_acquire() else {
|
||||
let next = Utc::now() + chrono::Duration::milliseconds(100);
|
||||
self.outbox
|
||||
.reschedule(row.id, row.attempt_count, next)
|
||||
.await
|
||||
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
|
||||
return Ok(());
|
||||
};
|
||||
|
||||
// Resolve the trigger config (KV / DL) or pull the HTTP
|
||||
// payload directly off the outbox row.
|
||||
let (resolved, exec_req) = match row.source_kind {
|
||||
OutboxSourceKind::Http => match self.build_http_request(&row).await {
|
||||
Ok(pair) => pair,
|
||||
Err(err) => {
|
||||
tracing::warn!(outbox_id = %row.id, ?err, "http exec build failed; dropping");
|
||||
self.outbox
|
||||
.delete(row.id)
|
||||
.await
|
||||
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
|
||||
drop(permit);
|
||||
return Ok(());
|
||||
}
|
||||
},
|
||||
OutboxSourceKind::Kv | OutboxSourceKind::DeadLetter => {
|
||||
let resolved = self.resolve_trigger(&row).await?;
|
||||
let req = match self.build_exec_request(&row, &resolved).await {
|
||||
Ok(req) => req,
|
||||
Err(err) => {
|
||||
tracing::warn!(outbox_id = %row.id, ?err, "exec request build failed; dropping row");
|
||||
self.outbox
|
||||
.delete(row.id)
|
||||
.await
|
||||
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
|
||||
drop(permit);
|
||||
return Ok(());
|
||||
}
|
||||
};
|
||||
(resolved, req)
|
||||
}
|
||||
};
|
||||
|
||||
// Hold the gate permit while awaiting the executor inline; it is
// released by the explicit `drop` below (or on early return). Sync
// HTTP and the dispatcher share the semaphore, so occupying a slot
// for the whole execution is intentional.
|
||||
let source = resolved.script_source.clone();
|
||||
let outcome = self
|
||||
.executor
|
||||
.execute(&source, exec_req, ASYNC_EXEC_TIMEOUT)
|
||||
.await;
|
||||
drop(permit);
|
||||
|
||||
match outcome {
|
||||
Ok(resp) => self.handle_success(&row, &resolved, resp).await,
|
||||
Err(err) => self.handle_failure(&row, &resolved, err).await,
|
||||
}
|
||||
}
|
||||
|
||||
async fn resolve_trigger(&self, row: &OutboxRow) -> Result<ResolvedTrigger, DispatcherError> {
|
||||
// For KV and DL kinds, the outbox carries `trigger_id`. Use it
|
||||
// to look up the trigger row, then resolve the script.
|
||||
let Some(trigger_id) = row.trigger_id else {
|
||||
return Err(DispatcherError::ResolveTrigger(
|
||||
"outbox row missing trigger_id".into(),
|
||||
));
|
||||
};
|
||||
let trigger = self
|
||||
.triggers
|
||||
.get(trigger_id)
|
||||
.await
|
||||
.map_err(|e| DispatcherError::ResolveTrigger(e.to_string()))?
|
||||
.ok_or_else(|| {
|
||||
DispatcherError::ResolveTrigger(format!("trigger {trigger_id} not found"))
|
||||
})?;
|
||||
|
||||
let script = self
|
||||
.scripts
|
||||
.get(trigger.script_id)
|
||||
.await
|
||||
.map_err(|e| DispatcherError::ResolveTrigger(e.to_string()))?
|
||||
.ok_or_else(|| {
|
||||
DispatcherError::ResolveTrigger(format!("script {} not found", trigger.script_id))
|
||||
})?;
|
||||
|
||||
Ok(ResolvedTrigger {
|
||||
trigger_kind: trigger.kind,
|
||||
is_dead_letter_handler: matches!(trigger.kind, TriggerKind::DeadLetter),
|
||||
script_id: script.id,
|
||||
script_source: script.source,
|
||||
script_name: script.name,
|
||||
sandbox_overrides: script.sandbox,
|
||||
registered_by_principal: trigger.registered_by_principal,
|
||||
retry_max_attempts: trigger.retry_max_attempts,
|
||||
retry_backoff: trigger.retry_backoff,
|
||||
retry_base_ms: trigger.retry_base_ms,
|
||||
})
|
||||
}
|
||||
|
||||
async fn build_exec_request(
|
||||
&self,
|
||||
row: &OutboxRow,
|
||||
resolved: &ResolvedTrigger,
|
||||
) -> Result<ExecRequest, DispatcherError> {
|
||||
let trigger_event: TriggerEvent = serde_json::from_value(row.payload.clone())
|
||||
.map_err(|e| DispatcherError::ResolveTrigger(format!("decode payload: {e}")))?;
|
||||
|
||||
let principal = self
|
||||
.principals
|
||||
.resolve(resolved.registered_by_principal)
|
||||
.await
|
||||
.map_err(|e| DispatcherError::ResolveTrigger(e.to_string()))?;
|
||||
|
||||
let execution_id = ExecutionId::new();
|
||||
Ok(ExecRequest {
|
||||
execution_id,
|
||||
request_id: RequestId::new(),
|
||||
script_id: resolved.script_id,
|
||||
script_name: resolved.script_name.clone(),
|
||||
invocation_type: InvocationType::Function,
|
||||
path: format!("/trigger/{}", trigger_event.source()),
|
||||
headers: std::collections::BTreeMap::new(),
|
||||
body: serde_json::Value::Null,
|
||||
params: std::collections::BTreeMap::new(),
|
||||
query: std::collections::BTreeMap::new(),
|
||||
rest: String::new(),
|
||||
sandbox_overrides: resolved.sandbox_overrides,
|
||||
app_id: row.app_id,
|
||||
principal: Some(principal),
|
||||
trigger_depth: row.trigger_depth,
|
||||
root_execution_id: row.root_execution_id.unwrap_or(execution_id),
|
||||
is_dead_letter_handler: resolved.is_dead_letter_handler,
|
||||
event: Some(trigger_event),
|
||||
})
|
||||
}
|
||||
|
||||
/// Build a `(ResolvedTrigger, ExecRequest)` pair for an HTTP outbox
/// row. HTTP rows have no backing `triggers` row (their `trigger_id`
/// references `routes.id` instead), so we take the script id from the
/// outbox row and the request shape from the payload, then synthesize
/// a `ResolvedTrigger`. Its retry settings come from the
/// `TriggerConfig` defaults and apply only to async HTTP; sync HTTP
/// is never retried.
|
||||
async fn build_http_request(
|
||||
&self,
|
||||
row: &OutboxRow,
|
||||
) -> Result<(ResolvedTrigger, ExecRequest), DispatcherError> {
|
||||
let Some(script_id) = row.script_id else {
|
||||
return Err(DispatcherError::ResolveTrigger(
|
||||
"HTTP outbox row missing script_id".into(),
|
||||
));
|
||||
};
|
||||
let script = self
|
||||
.scripts
|
||||
.get(script_id)
|
||||
.await
|
||||
.map_err(|e| DispatcherError::ResolveTrigger(e.to_string()))?
|
||||
.ok_or_else(|| {
|
||||
DispatcherError::ResolveTrigger(format!("script {script_id} not found"))
|
||||
})?;
|
||||
|
||||
let payload: HttpDispatchPayload = serde_json::from_value(row.payload.clone())
|
||||
.map_err(|e| DispatcherError::ResolveTrigger(format!("decode http payload: {e}")))?;
|
||||
|
||||
let execution_id = ExecutionId::new();
|
||||
let req = ExecRequest {
|
||||
execution_id,
|
||||
request_id: RequestId::new(),
|
||||
script_id,
|
||||
script_name: payload.script_name.clone(),
|
||||
invocation_type: InvocationType::Http,
|
||||
path: payload.path.clone(),
|
||||
headers: payload.headers,
|
||||
body: payload.body,
|
||||
params: payload.params,
|
||||
query: payload.query,
|
||||
rest: payload.rest,
|
||||
sandbox_overrides: script.sandbox,
|
||||
app_id: row.app_id,
|
||||
// HTTP outbox rows don't run as the trigger registrant. In this
// MVP they always execute with no principal: the
// `origin_principal` forensic field is not promoted to the
// execution principal.
|
||||
principal: None,
|
||||
trigger_depth: row.trigger_depth,
|
||||
root_execution_id: row.root_execution_id.unwrap_or(execution_id),
|
||||
is_dead_letter_handler: false,
|
||||
event: None,
|
||||
};
|
||||
|
||||
let resolved = ResolvedTrigger {
|
||||
trigger_kind: TriggerKind::Kv, // placeholder; HTTP doesn't have a kind
|
||||
is_dead_letter_handler: false,
|
||||
script_id,
|
||||
script_source: script.source,
|
||||
script_name: payload.script_name,
|
||||
sandbox_overrides: script.sandbox,
|
||||
// HTTP outbox rows don't carry a registered_by_principal
|
||||
// — use a sentinel zero UUID since this field isn't used
|
||||
// downstream for HTTP (no retries, no inbox principal).
|
||||
registered_by_principal: picloud_shared::AdminUserId::from(uuid::Uuid::nil()),
|
||||
// Async HTTP uses the platform default retry policy from
|
||||
// TriggerConfig. Sync HTTP (reply_to.is_some) never retries
|
||||
// regardless.
|
||||
retry_max_attempts: self.config.retry_max_attempts,
|
||||
retry_backoff: self.config.retry_backoff,
|
||||
retry_base_ms: self.config.retry_base_ms,
|
||||
};
|
||||
Ok((resolved, req))
|
||||
}
|
||||
|
||||
async fn handle_success(
|
||||
&self,
|
||||
row: &OutboxRow,
|
||||
_resolved: &ResolvedTrigger,
|
||||
resp: ExecResponse,
|
||||
) -> Result<(), DispatcherError> {
|
||||
if let Some(inbox_id) = row.reply_to {
|
||||
self.deliver_inbox(row, inbox_id, InboxResult::Success(summarize(&resp)))
|
||||
.await;
|
||||
}
|
||||
self.outbox
|
||||
.delete(row.id)
|
||||
.await
|
||||
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn handle_failure(
|
||||
&self,
|
||||
row: &OutboxRow,
|
||||
resolved: &ResolvedTrigger,
|
||||
err: ExecError,
|
||||
) -> Result<(), DispatcherError> {
|
||||
// Sync HTTP: always single-attempt. Always deliver outcome
|
||||
// (success-or-failure) to the inbox. Never retry, never DL.
|
||||
if let Some(inbox_id) = row.reply_to {
|
||||
let (kind, message) = classify_exec_error(&err);
|
||||
self.deliver_inbox(
|
||||
row,
|
||||
inbox_id,
|
||||
InboxResult::Failure {
|
||||
kind,
|
||||
message: message.clone(),
|
||||
},
|
||||
)
|
||||
.await;
|
||||
self.outbox
|
||||
.delete(row.id)
|
||||
.await
|
||||
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
// Dead-letter handler: never retry, never DL. Failure
|
||||
// annotates the original DL row + bumps a metric.
|
||||
if resolved.is_dead_letter_handler {
|
||||
tracing::error!(
|
||||
outbox_id = %row.id,
|
||||
app_id = %row.app_id,
|
||||
?err,
|
||||
"dead-letter handler failed; not retrying"
|
||||
);
|
||||
// TODO(metrics): bump `picloud_dead_letter_handler_failures{app_id}`.
|
||||
// Annotate the original DL row (id is `row.payload.dead_letter.id`
|
||||
// when the payload is a DeadLetter TriggerEvent). Best-effort:
|
||||
// if the payload doesn't decode, just log and move on.
|
||||
if let Ok(TriggerEvent::DeadLetter { dead_letter_id, .. }) =
|
||||
serde_json::from_value::<TriggerEvent>(row.payload.clone())
|
||||
{
|
||||
if let Err(e) = self
|
||||
.dead_letters
|
||||
.resolve(dead_letter_id, "handler_failed")
|
||||
.await
|
||||
{
|
||||
tracing::warn!(?e, "could not annotate DL row as handler_failed");
|
||||
}
|
||||
}
|
||||
self.outbox
|
||||
.delete(row.id)
|
||||
.await
|
||||
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
// Async event: retry per policy, then dead-letter.
|
||||
let attempt = row.attempt_count + 1;
|
||||
if attempt < resolved.retry_max_attempts {
|
||||
let delay = compute_backoff(
|
||||
attempt,
|
||||
resolved.retry_backoff,
|
||||
resolved.retry_base_ms,
|
||||
self.config.retry_jitter_pct,
|
||||
);
|
||||
let next = Utc::now() + chrono::Duration::milliseconds(i64::from(delay));
|
||||
tracing::info!(
|
||||
outbox_id = %row.id,
|
||||
attempt,
|
||||
max_attempts = resolved.retry_max_attempts,
|
||||
retry_in_ms = delay,
|
||||
"rescheduling outbox row"
|
||||
);
|
||||
self.outbox
|
||||
.reschedule(row.id, attempt, next)
|
||||
.await
|
||||
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
// Exhausted retries → dead-letter.
|
||||
let (op, source) = describe_event(&row.payload);
|
||||
let now = Utc::now();
|
||||
if let Err(e) = self
|
||||
.dead_letters
|
||||
.insert(NewDeadLetter {
|
||||
app_id: row.app_id,
|
||||
original_event_id: row.id,
|
||||
source,
|
||||
op,
|
||||
trigger_id: row.trigger_id,
|
||||
script_id: Some(resolved.script_id),
|
||||
payload: row.payload.clone(),
|
||||
attempt_count: attempt,
|
||||
first_attempt_at: row.created_at,
|
||||
last_attempt_at: now,
|
||||
last_error: err.to_string(),
|
||||
})
|
||||
.await
|
||||
{
|
||||
tracing::error!(?e, "failed to write dead-letter row");
|
||||
}
|
||||
self.outbox
|
||||
.delete(row.id)
|
||||
.await
|
||||
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn deliver_inbox(&self, row: &OutboxRow, inbox_id: Uuid, result: InboxResult) {
|
||||
match self.inbox.deliver(inbox_id, result.clone()).await {
|
||||
InboxDeliveryOutcome::Delivered => {}
|
||||
InboxDeliveryOutcome::Abandoned => {
|
||||
// Receiver was dropped — record forensic row + bump
|
||||
// metric.
|
||||
let (status_code, summary) = match &result {
|
||||
InboxResult::Success(s) => (s.status_code, None),
|
||||
InboxResult::Failure { kind, message } => {
|
||||
(failure_kind_to_status(*kind), Some(message.clone()))
|
||||
}
|
||||
};
|
||||
if let Err(e) = self
|
||||
.abandoned
|
||||
.insert(NewAbandonedExecution {
|
||||
app_id: row.app_id,
|
||||
outbox_id: row.id,
|
||||
script_id: row.script_id,
|
||||
inbox_id,
|
||||
status_code,
|
||||
result_summary: summary,
|
||||
})
|
||||
.await
|
||||
{
|
||||
tracing::warn!(?e, "abandoned_executions insert failed");
|
||||
}
|
||||
// TODO(metrics): bump `picloud_abandoned_executions_total{app_id}`.
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug)]
|
||||
pub struct ResolvedTrigger {
|
||||
pub trigger_kind: TriggerKind,
|
||||
pub is_dead_letter_handler: bool,
|
||||
pub script_id: ScriptId,
|
||||
pub script_source: String,
|
||||
pub script_name: String,
|
||||
pub sandbox_overrides: ScriptSandbox,
|
||||
pub registered_by_principal: picloud_shared::AdminUserId,
|
||||
pub retry_max_attempts: u32,
|
||||
pub retry_backoff: BackoffShape,
|
||||
pub retry_base_ms: u32,
|
||||
}
|
||||
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub enum DispatcherError {
|
||||
#[error("outbox: {0}")]
|
||||
Outbox(String),
|
||||
#[error("resolve trigger: {0}")]
|
||||
ResolveTrigger(String),
|
||||
}
|
||||
|
||||
fn summarize(resp: &ExecResponse) -> ExecResponseSummary {
|
||||
ExecResponseSummary {
|
||||
status_code: resp.status_code,
|
||||
headers: resp.headers.clone(),
|
||||
body: resp.body.clone(),
|
||||
}
|
||||
}
|
||||
|
||||
/// Map `ExecError` onto the design-notes §3 status-code table.
|
||||
fn classify_exec_error(err: &ExecError) -> (InboxFailureKind, String) {
|
||||
match err {
|
||||
ExecError::Parse(s) | ExecError::InvalidResponse(s) => {
|
||||
(InboxFailureKind::Validation, s.clone())
|
||||
}
|
||||
ExecError::Timeout(_) => (InboxFailureKind::Timeout, err.to_string()),
|
||||
ExecError::OperationBudgetExceeded => (InboxFailureKind::OperationBudget, err.to_string()),
|
||||
ExecError::Overloaded { .. } => (InboxFailureKind::Overloaded, err.to_string()),
|
||||
ExecError::Runtime(s) => (InboxFailureKind::Runtime, s.clone()),
|
||||
}
|
||||
}
|
||||
|
||||
fn failure_kind_to_status(k: InboxFailureKind) -> u16 {
|
||||
match k {
|
||||
InboxFailureKind::Validation => 422,
|
||||
InboxFailureKind::Runtime => 502,
|
||||
InboxFailureKind::Overloaded => 503,
|
||||
InboxFailureKind::Timeout => 504,
|
||||
InboxFailureKind::OperationBudget => 507,
|
||||
InboxFailureKind::Platform => 500,
|
||||
}
|
||||
}
|
||||
|
||||
/// `(op, source)` extracted from the outbox payload. Used to seed the
|
||||
/// `dead_letters` row when retries exhaust.
|
||||
fn describe_event(payload: &serde_json::Value) -> (String, String) {
|
||||
let source = payload
|
||||
.get("source")
|
||||
.and_then(|v| v.as_str())
|
||||
.unwrap_or("")
|
||||
.to_string();
|
||||
let op = payload
|
||||
.get("op")
|
||||
.and_then(|v| v.as_str())
|
||||
.unwrap_or("")
|
||||
.to_string();
|
||||
(op, source)
|
||||
}
|
||||
|
||||
/// Compute backoff (ms) for the given attempt + policy + jitter.
|
||||
/// Attempt is 1-indexed (first retry = attempt 1).
|
||||
#[must_use]
|
||||
pub fn compute_backoff(attempt: u32, backoff: BackoffShape, base_ms: u32, jitter_pct: u32) -> u32 {
|
||||
let base_ms = u64::from(base_ms);
|
||||
let attempt = u64::from(attempt.saturating_sub(1));
|
||||
let raw = match backoff {
|
||||
BackoffShape::Constant => base_ms,
|
||||
BackoffShape::Linear => base_ms * (attempt + 1),
|
||||
// 1x base, 2x base, 4x base, … (saturating).
|
||||
BackoffShape::Exponential => base_ms.saturating_mul(1u64 << attempt.min(20)),
|
||||
};
|
||||
let raw = u32::try_from(raw.min(u64::from(u32::MAX))).unwrap_or(u32::MAX);
|
||||
apply_jitter(raw, jitter_pct)
|
||||
}
|
||||
|
||||
fn apply_jitter(raw: u32, pct: u32) -> u32 {
|
||||
if pct == 0 {
|
||||
return raw;
|
||||
}
|
||||
let pct = pct.min(100);
|
||||
// ±pct% of raw. With pct clamped to 100, span never exceeds raw,
// so raw + offset cannot go below zero.
|
||||
let span = u64::from(raw) * u64::from(pct) / 100;
|
||||
if span == 0 {
|
||||
return raw;
|
||||
}
|
||||
let span_i64 = i64::try_from(span).unwrap_or(i64::MAX);
|
||||
let mut rng = rand::thread_rng();
|
||||
let offset = rng.gen_range(-span_i64..=span_i64);
|
||||
let signed = i64::from(raw).saturating_add(offset).max(0);
|
||||
u32::try_from(signed.min(i64::from(u32::MAX))).unwrap_or(u32::MAX)
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn exponential_backoff_doubles_per_attempt() {
|
||||
// No jitter (pct=0) for a deterministic check.
|
||||
assert_eq!(compute_backoff(1, BackoffShape::Exponential, 1000, 0), 1000);
|
||||
assert_eq!(compute_backoff(2, BackoffShape::Exponential, 1000, 0), 2000);
|
||||
assert_eq!(compute_backoff(3, BackoffShape::Exponential, 1000, 0), 4000);
|
||||
assert_eq!(compute_backoff(4, BackoffShape::Exponential, 1000, 0), 8000);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn linear_backoff_scales_with_attempt() {
|
||||
assert_eq!(compute_backoff(1, BackoffShape::Linear, 100, 0), 100);
|
||||
assert_eq!(compute_backoff(2, BackoffShape::Linear, 100, 0), 200);
|
||||
assert_eq!(compute_backoff(5, BackoffShape::Linear, 100, 0), 500);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn constant_backoff_returns_base() {
|
||||
for attempt in 1..=5 {
|
||||
assert_eq!(
|
||||
compute_backoff(attempt, BackoffShape::Constant, 750, 0),
|
||||
750
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn jitter_within_pct_of_base() {
|
||||
for _ in 0..100 {
|
||||
let v = compute_backoff(1, BackoffShape::Constant, 1000, 20);
|
||||
// ±20% of 1000 = 800..=1200.
|
||||
assert!((800..=1200).contains(&v), "jitter out of range: {v}");
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn classify_exec_error_covers_every_variant() {
|
||||
let parse = classify_exec_error(&ExecError::Parse("nope".into()));
|
||||
assert!(matches!(parse.0, InboxFailureKind::Validation));
|
||||
let invalid = classify_exec_error(&ExecError::InvalidResponse("bad".into()));
|
||||
assert!(matches!(invalid.0, InboxFailureKind::Validation));
|
||||
let timeout = classify_exec_error(&ExecError::Timeout(30));
|
||||
assert!(matches!(timeout.0, InboxFailureKind::Timeout));
|
||||
let budget = classify_exec_error(&ExecError::OperationBudgetExceeded);
|
||||
assert!(matches!(budget.0, InboxFailureKind::OperationBudget));
|
||||
let runtime = classify_exec_error(&ExecError::Runtime("threw".into()));
|
||||
assert!(matches!(runtime.0, InboxFailureKind::Runtime));
|
||||
let overload = classify_exec_error(&ExecError::Overloaded {
|
||||
retry_after_secs: 1,
|
||||
});
|
||||
assert!(matches!(overload.0, InboxFailureKind::Overloaded));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn failure_kind_status_codes_match_design_notes() {
|
||||
assert_eq!(failure_kind_to_status(InboxFailureKind::Validation), 422);
|
||||
assert_eq!(failure_kind_to_status(InboxFailureKind::Runtime), 502);
|
||||
assert_eq!(failure_kind_to_status(InboxFailureKind::Overloaded), 503);
|
||||
assert_eq!(failure_kind_to_status(InboxFailureKind::Timeout), 504);
|
||||
assert_eq!(
|
||||
failure_kind_to_status(InboxFailureKind::OperationBudget),
|
||||
507
|
||||
);
|
||||
assert_eq!(failure_kind_to_status(InboxFailureKind::Platform), 500);
|
||||
}
|
||||
}
|
||||
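A minimal sketch of the retry schedule implied by `handle_failure` and `compute_backoff` above, with jitter left out so the numbers are deterministic. `raw_backoff_ms`, `jitter_bounds`, and `retry_schedule` are hypothetical helper names that are not part of the diff, and the sketch assumes `attempt_count` starts at 0 for a fresh outbox row.

```rust
/// Backoff shapes mirrored from the diff above (standalone copy for illustration).
#[derive(Clone, Copy, Debug, PartialEq)]
enum BackoffShape {
    Constant,
    Linear,
    Exponential,
}

/// Jitter-free delay in milliseconds for a 1-indexed attempt.
fn raw_backoff_ms(attempt: u32, shape: BackoffShape, base_ms: u32) -> u32 {
    let base = u64::from(base_ms);
    let n = u64::from(attempt.saturating_sub(1));
    let raw = match shape {
        BackoffShape::Constant => base,
        BackoffShape::Linear => base * (n + 1),
        // 1x, 2x, 4x, ... with the shift capped at 20 as in the diff.
        BackoffShape::Exponential => base.saturating_mul(1u64 << n.min(20)),
    };
    u32::try_from(raw.min(u64::from(u32::MAX))).unwrap_or(u32::MAX)
}

/// Inclusive (low, high) bounds that a +/- pct jitter could produce.
fn jitter_bounds(raw: u32, pct: u32) -> (u32, u32) {
    let span = u64::from(raw) * u64::from(pct.min(100)) / 100;
    let low = u64::from(raw) - span; // span <= raw because pct <= 100
    let high = (u64::from(raw) + span).min(u64::from(u32::MAX));
    (low as u32, high as u32)
}

/// Delays scheduled before a row is dead-lettered. The dispatcher retries
/// while `attempt < max_attempts`, so there are `max_attempts - 1` delays
/// (assumption: `attempt_count` is 0 on the first execution).
fn retry_schedule(shape: BackoffShape, base_ms: u32, max_attempts: u32) -> Vec<u32> {
    (1..max_attempts)
        .map(|attempt| raw_backoff_ms(attempt, shape, base_ms))
        .collect()
}
```

Under these assumptions, `retry_schedule(BackoffShape::Exponential, 1000, 4)` yields three delays (1000, 2000, 4000 ms) before the fourth failure dead-letters the row, and `jitter_bounds(1000, 20)` gives `(800, 1200)`, the same range the `jitter_within_pct_of_base` test checks.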
95
crates/manager-core/src/gc.rs
Normal file
@@ -0,0 +1,95 @@
|
||||
//! Weekly retention sweepers for `dead_letters` + `abandoned_executions`.
|
||||
//!
|
||||
//! Both use the `FOR UPDATE SKIP LOCKED` claim pattern so concurrent
|
||||
//! sweepers (cluster mode v1.3+) don't fight each other. Defaults
|
||||
//! match design notes §3 / §4: 30 days for DL, 7 days for abandoned.
|
||||
//! Both env-overridable via `PICLOUD_DEAD_LETTER_RETENTION_DAYS` and
|
||||
//! `PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS` (loaded by
|
||||
//! `TriggerConfig::from_env`).
|
||||
//!
|
||||
//! Spawned from `build_app` alongside `spawn_session_pruner`.
|
||||
|
||||
use std::sync::Arc;
|
||||
use std::time::Duration;
|
||||
|
||||
use chrono::Utc;
|
||||
|
||||
use crate::abandoned_repo::AbandonedRepo;
|
||||
use crate::dead_letter_repo::DeadLetterRepo;
|
||||
|
||||
/// Weekly sweep cadence — matches `spawn_session_pruner` shape.
|
||||
const SWEEP_INTERVAL: Duration = Duration::from_secs(7 * 24 * 60 * 60);
|
||||
|
||||
/// Per-tick batch cap so we don't try to delete millions of rows in
|
||||
/// one transaction. The loop keeps deleting batches until a tick
|
||||
/// returns 0 rows affected.
|
||||
const SWEEP_BATCH: i64 = 5_000;
|
||||
|
||||
pub fn spawn_dead_letter_gc(repo: Arc<dyn DeadLetterRepo>, retention_days: u32) {
|
||||
tokio::spawn(async move {
|
||||
let mut ticker = tokio::time::interval(SWEEP_INTERVAL);
|
||||
// Skip the immediate first fire — don't sweep at process start.
|
||||
ticker.tick().await;
|
||||
loop {
|
||||
ticker.tick().await;
|
||||
sweep_dead_letters(&*repo, retention_days).await;
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
pub fn spawn_abandoned_gc(repo: Arc<dyn AbandonedRepo>, retention_days: u32) {
|
||||
tokio::spawn(async move {
|
||||
let mut ticker = tokio::time::interval(SWEEP_INTERVAL);
|
||||
ticker.tick().await;
|
||||
loop {
|
||||
ticker.tick().await;
|
||||
sweep_abandoned(&*repo, retention_days).await;
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
async fn sweep_dead_letters(repo: &dyn DeadLetterRepo, retention_days: u32) {
|
||||
let cutoff = Utc::now() - chrono::Duration::days(i64::from(retention_days));
|
||||
let mut total: u64 = 0;
|
||||
loop {
|
||||
match repo.gc(cutoff, SWEEP_BATCH).await {
|
||||
Ok(0) => break,
|
||||
Ok(n) => {
|
||||
total += n;
|
||||
if n < SWEEP_BATCH as u64 {
|
||||
break;
|
||||
}
|
||||
}
|
||||
Err(e) => {
|
||||
tracing::warn!(?e, "dead_letters GC sweep errored");
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
if total > 0 {
|
||||
tracing::info!(swept = total, "dead_letters GC swept");
|
||||
}
|
||||
}
|
||||
|
||||
async fn sweep_abandoned(repo: &dyn AbandonedRepo, retention_days: u32) {
|
||||
let cutoff = Utc::now() - chrono::Duration::days(i64::from(retention_days));
|
||||
let mut total: u64 = 0;
|
||||
loop {
|
||||
match repo.gc(cutoff, SWEEP_BATCH).await {
|
||||
Ok(0) => break,
|
||||
Ok(n) => {
|
||||
total += n;
|
||||
if n < SWEEP_BATCH as u64 {
|
||||
break;
|
||||
}
|
||||
}
|
||||
Err(e) => {
|
||||
tracing::warn!(?e, "abandoned_executions GC sweep errored");
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
if total > 0 {
|
||||
tracing::info!(swept = total, "abandoned_executions GC swept");
|
||||
}
|
||||
}
|
||||
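The batched sweep loop in `sweep_dead_letters` and `sweep_abandoned` can be sketched against an in-memory table. This is an illustration only: `MemTable` and its `gc` method are hypothetical stand-ins for the `DeadLetterRepo` / `AbandonedRepo` traits, timestamps are plain integers, and there is no async or error handling.

```rust
/// Hypothetical in-memory stand-in for a retention table: each entry is a
/// row's creation time in seconds since an arbitrary epoch.
struct MemTable {
    created_at: Vec<u64>,
    gc_calls: u32,
}

impl MemTable {
    /// Delete up to `batch` rows strictly older than `cutoff`; return the count.
    fn gc(&mut self, cutoff: u64, batch: usize) -> usize {
        self.gc_calls += 1;
        let mut removed = 0;
        self.created_at.retain(|&ts| {
            if ts < cutoff && removed < batch {
                removed += 1;
                false // drop this row
            } else {
                true // keep this row
            }
        });
        removed
    }
}

/// Keep deleting batches until one comes back short (or empty), which
/// signals that no more expired rows remain. Assumes `batch > 0`.
fn sweep(table: &mut MemTable, cutoff: u64, batch: usize) -> usize {
    let mut total = 0;
    loop {
        let n = table.gc(cutoff, batch);
        total += n;
        if n < batch {
            break;
        }
    }
    total
}
```

With 15 rows created at times 0 through 14, a cutoff of 12, and a batch size of 5, the loop issues three `gc` calls (5, 5, 2 rows) and leaves the three unexpired rows in place. When the expired count is an exact multiple of the batch size, one extra call returning 0 ends the loop.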
223
crates/manager-core/src/kv_repo.rs
Normal file
@@ -0,0 +1,223 @@
|
||||
//! Low-level Postgres CRUD over `kv_entries`. Stays storage-only;
|
||||
//! authorization, event emission, and empty-collection validation live
|
||||
//! one layer up in `KvServiceImpl`.
|
||||
|
||||
use async_trait::async_trait;
|
||||
use base64::engine::general_purpose::URL_SAFE_NO_PAD;
|
||||
use base64::Engine as _;
|
||||
use picloud_shared::{AppId, KvListPage};
|
||||
use sqlx::PgPool;
|
||||
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub enum KvRepoError {
|
||||
#[error("database error: {0}")]
|
||||
Db(#[from] sqlx::Error),
|
||||
|
||||
#[error("invalid pagination cursor")]
|
||||
InvalidCursor,
|
||||
}
|
||||
|
||||
/// Repo surface. The trait is exposed so tests can substitute an
|
||||
/// in-memory backing without spinning up Postgres.
|
||||
#[async_trait]
|
||||
pub trait KvRepo: Send + Sync {
|
||||
async fn get(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
collection: &str,
|
||||
key: &str,
|
||||
) -> Result<Option<serde_json::Value>, KvRepoError>;
|
||||
|
||||
/// Upserts the row. Returns the previous value (if any) so callers
|
||||
/// can determine whether this was an `insert` or an `update` for
|
||||
/// the emitted `ServiceEvent`.
|
||||
async fn set(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
collection: &str,
|
||||
key: &str,
|
||||
value: serde_json::Value,
|
||||
) -> Result<Option<serde_json::Value>, KvRepoError>;
|
||||
|
||||
/// Returns the deleted value if the row existed, `None` otherwise.
/// The caller derives the SDK's boolean return value from whether
/// this is `Some`, and passes the value itself as the `old_payload`
/// of the emitted delete event.
|
||||
async fn delete(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
collection: &str,
|
||||
key: &str,
|
||||
) -> Result<Option<serde_json::Value>, KvRepoError>;
|
||||
|
||||
async fn has(&self, app_id: AppId, collection: &str, key: &str) -> Result<bool, KvRepoError>;
|
||||
|
||||
async fn list(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
collection: &str,
|
||||
cursor: Option<&str>,
|
||||
limit: u32,
|
||||
) -> Result<KvListPage, KvRepoError>;
|
||||
}
|
||||
|
||||
pub struct PostgresKvRepo {
|
||||
pool: PgPool,
|
||||
}
|
||||
|
||||
impl PostgresKvRepo {
|
||||
#[must_use]
|
||||
pub fn new(pool: PgPool) -> Self {
|
||||
Self { pool }
|
||||
}
|
||||
}
|
||||
|
||||
/// Hard ceiling on `list` page size — scripts that pass anything larger
|
||||
/// silently get clamped to this. Cursor-style pagination keeps a single
|
||||
/// request bounded; clients fetch the next page via the returned cursor.
|
||||
const KV_LIST_MAX_LIMIT: u32 = 1_000;
|
||||
const KV_LIST_DEFAULT_LIMIT: u32 = 100;
|
||||
|
||||
#[async_trait]
|
||||
impl KvRepo for PostgresKvRepo {
|
||||
async fn get(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
collection: &str,
|
||||
key: &str,
|
||||
) -> Result<Option<serde_json::Value>, KvRepoError> {
|
||||
let row: Option<(serde_json::Value,)> = sqlx::query_as(
|
||||
"SELECT value FROM kv_entries \
|
||||
WHERE app_id = $1 AND collection = $2 AND key = $3",
|
||||
)
|
||||
.bind(app_id.into_inner())
|
||||
.bind(collection)
|
||||
.bind(key)
|
||||
.fetch_optional(&self.pool)
|
||||
.await?;
|
||||
Ok(row.map(|(v,)| v))
|
||||
}
|
||||
|
||||
async fn set(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
collection: &str,
|
||||
key: &str,
|
||||
value: serde_json::Value,
|
||||
) -> Result<Option<serde_json::Value>, KvRepoError> {
|
||||
// The `prev` CTE captures the prior value, read from the same
// statement snapshot as the upsert, so callers can tell an insert
// from an update. No `xmax` inspection is involved.
|
||||
let row: Option<(Option<serde_json::Value>,)> = sqlx::query_as(
|
||||
"WITH prev AS (\
|
||||
SELECT value FROM kv_entries \
|
||||
WHERE app_id = $1 AND collection = $2 AND key = $3\
|
||||
), \
|
||||
upserted AS (\
|
||||
INSERT INTO kv_entries (app_id, collection, key, value) \
|
||||
VALUES ($1, $2, $3, $4) \
|
||||
ON CONFLICT (app_id, collection, key) DO UPDATE \
|
||||
SET value = EXCLUDED.value, updated_at = NOW() \
|
||||
RETURNING 1\
|
||||
) \
|
||||
SELECT (SELECT value FROM prev) FROM upserted",
|
||||
)
|
||||
.bind(app_id.into_inner())
|
||||
.bind(collection)
|
||||
.bind(key)
|
||||
.bind(value)
|
||||
.fetch_optional(&self.pool)
|
||||
.await?;
|
||||
Ok(row.and_then(|(v,)| v))
|
||||
}
|
||||
|
||||
async fn delete(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
collection: &str,
|
||||
key: &str,
|
||||
) -> Result<Option<serde_json::Value>, KvRepoError> {
|
||||
let row: Option<(serde_json::Value,)> = sqlx::query_as(
|
||||
"DELETE FROM kv_entries \
|
||||
WHERE app_id = $1 AND collection = $2 AND key = $3 \
|
||||
RETURNING value",
|
||||
)
|
||||
.bind(app_id.into_inner())
|
||||
.bind(collection)
|
||||
.bind(key)
|
||||
.fetch_optional(&self.pool)
|
||||
.await?;
|
||||
Ok(row.map(|(v,)| v))
|
||||
}
|
||||
|
||||
async fn has(&self, app_id: AppId, collection: &str, key: &str) -> Result<bool, KvRepoError> {
|
||||
let row: Option<(i64,)> = sqlx::query_as(
|
||||
"SELECT 1 FROM kv_entries \
|
||||
WHERE app_id = $1 AND collection = $2 AND key = $3",
|
||||
)
|
||||
.bind(app_id.into_inner())
|
||||
.bind(collection)
|
||||
.bind(key)
|
||||
.fetch_optional(&self.pool)
|
||||
.await?;
|
||||
Ok(row.is_some())
|
||||
}
|
||||
|
||||
async fn list(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
collection: &str,
|
||||
cursor: Option<&str>,
|
||||
limit: u32,
|
||||
) -> Result<KvListPage, KvRepoError> {
|
||||
let limit = if limit == 0 {
|
||||
KV_LIST_DEFAULT_LIMIT
|
||||
} else {
|
||||
limit.min(KV_LIST_MAX_LIMIT)
|
||||
};
|
||||
|
||||
let last_key = match cursor {
|
||||
Some(c) => Some(decode_cursor(c)?),
|
||||
None => None,
|
||||
};
|
||||
|
||||
// Keyset pagination: rows beyond `last_key` ordered by key.
|
||||
// `+1` to detect a "more pages" condition without a separate
|
||||
// COUNT query.
|
||||
let take = i64::from(limit) + 1;
|
||||
let rows: Vec<(String,)> = sqlx::query_as(
|
||||
"SELECT key FROM kv_entries \
|
||||
WHERE app_id = $1 AND collection = $2 \
|
||||
AND ($3::text IS NULL OR key > $3) \
|
||||
ORDER BY key ASC \
|
||||
LIMIT $4",
|
||||
)
|
||||
.bind(app_id.into_inner())
|
||||
.bind(collection)
|
||||
.bind(last_key.as_deref())
|
||||
.bind(take)
|
||||
.fetch_all(&self.pool)
|
||||
.await?;
|
||||
|
||||
let mut keys: Vec<String> = rows.into_iter().map(|(k,)| k).collect();
|
||||
let next_cursor = if keys.len() > limit as usize {
|
||||
keys.truncate(limit as usize);
|
||||
keys.last().map(|k| encode_cursor(k))
|
||||
} else {
|
||||
None
|
||||
};
|
||||
|
||||
Ok(KvListPage { keys, next_cursor })
|
||||
}
|
||||
}
|
||||
|
||||
fn encode_cursor(last_key: &str) -> String {
|
||||
URL_SAFE_NO_PAD.encode(last_key.as_bytes())
|
||||
}
|
||||
|
||||
fn decode_cursor(cursor: &str) -> Result<String, KvRepoError> {
|
||||
let bytes = URL_SAFE_NO_PAD
|
||||
.decode(cursor)
|
||||
.map_err(|_| KvRepoError::InvalidCursor)?;
|
||||
String::from_utf8(bytes).map_err(|_| KvRepoError::InvalidCursor)
|
||||
}
|
||||
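The keyset pagination in `PostgresKvRepo::list` (fetch `limit + 1` rows past the cursor, then truncate) can be sketched without a database. This is a simplified model, not the repo's code: a `BTreeSet` stands in for the `kv_entries` index, `Page` and `list_page` are hypothetical names, and the cursor is the raw last key rather than the base64-encoded form the diff uses.

```rust
use std::collections::BTreeSet;

/// Hypothetical page type: the keys on this page plus an optional cursor.
struct Page {
    keys: Vec<String>,
    next_cursor: Option<String>,
}

/// Return up to `limit` keys strictly greater than `cursor`, in key order.
/// One extra key is fetched to detect whether another page exists, which
/// avoids a separate COUNT-style query.
fn list_page(store: &BTreeSet<String>, cursor: Option<&str>, limit: usize) -> Page {
    let mut keys: Vec<String> = store
        .iter() // BTreeSet iterates in ascending key order
        .filter(|k| cursor.map_or(true, |last| k.as_str() > last))
        .take(limit + 1)
        .cloned()
        .collect();

    let next_cursor = if keys.len() > limit {
        // The lookahead key proves there is more; drop it and point the
        // cursor at the last key actually returned.
        keys.truncate(limit);
        keys.last().cloned()
    } else {
        None
    };

    Page { keys, next_cursor }
}
```

A caller walks the collection by passing each page's `next_cursor` back in until it comes back as `None`. Because the cursor is a key position rather than an offset, rows inserted or deleted between requests do not shift later pages.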
525
crates/manager-core/src/kv_service.rs
Normal file
@@ -0,0 +1,525 @@
|
||||
//! `KvServiceImpl` wires the `KvRepo` underneath the
//! `picloud_shared::KvService` trait that scripts see via the Rhai
//! bridge.
//!
//! Layers added here (vs the raw repo):
//!
//! 1. Empty-collection rejection at the SDK boundary
//!    (`docs/sdk-shape.md`).
//! 2. **Script-as-gate authz**: when `cx.principal.is_some()` we run
//!    `authz::require(...)`; when it's `None` (public unauthenticated
//!    HTTP, the common case for public routes) we skip the check.
//!    Cross-app isolation isn't affected: every query is keyed by
//!    `cx.app_id`, never by a script-supplied argument.
//! 3. `ServiceEvent` emission after each mutation (`insert` / `update`
//!    / `delete`). v1.1.0 ships a `NoopEventEmitter`, so this is a
//!    no-op until the outbox emitter lands later in v1.1.1.
|
||||
|
||||
use std::sync::Arc;
|
||||
|
||||
use async_trait::async_trait;
|
||||
use picloud_shared::{
|
||||
KvError, KvListPage, KvService, SdkCallCx, ServiceEvent, ServiceEventEmitter,
|
||||
};
|
||||
|
||||
use crate::authz::{self, AuthzRepo, Capability};
|
||||
use crate::kv_repo::{KvRepo, KvRepoError};
|
||||
|
||||
pub struct KvServiceImpl {
|
||||
repo: Arc<dyn KvRepo>,
|
||||
authz: Arc<dyn AuthzRepo>,
|
||||
events: Arc<dyn ServiceEventEmitter>,
|
||||
}
|
||||
|
||||
impl KvServiceImpl {
|
||||
#[must_use]
|
||||
pub fn new(
|
||||
repo: Arc<dyn KvRepo>,
|
||||
authz: Arc<dyn AuthzRepo>,
|
||||
events: Arc<dyn ServiceEventEmitter>,
|
||||
) -> Self {
|
||||
Self {
|
||||
repo,
|
||||
authz,
|
||||
events,
|
||||
}
|
||||
}
|
||||
|
||||
async fn check_read(&self, cx: &SdkCallCx) -> Result<(), KvError> {
|
||||
if let Some(ref principal) = cx.principal {
|
||||
authz::require(&*self.authz, principal, Capability::AppKvRead(cx.app_id))
|
||||
.await
|
||||
.map_err(|_| KvError::Forbidden)?;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn check_write(&self, cx: &SdkCallCx) -> Result<(), KvError> {
|
||||
if let Some(ref principal) = cx.principal {
|
||||
authz::require(&*self.authz, principal, Capability::AppKvWrite(cx.app_id))
|
||||
.await
|
||||
.map_err(|_| KvError::Forbidden)?;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
fn validate_collection(collection: &str) -> Result<(), KvError> {
|
||||
if collection.is_empty() {
|
||||
return Err(KvError::InvalidCollection);
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
impl From<KvRepoError> for KvError {
|
||||
fn from(e: KvRepoError) -> Self {
|
||||
Self::Backend(e.to_string())
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl KvService for KvServiceImpl {
|
||||
async fn get(
|
||||
&self,
|
||||
cx: &SdkCallCx,
|
||||
collection: &str,
|
||||
key: &str,
|
||||
) -> Result<Option<serde_json::Value>, KvError> {
|
||||
validate_collection(collection)?;
|
||||
self.check_read(cx).await?;
|
||||
Ok(self.repo.get(cx.app_id, collection, key).await?)
|
||||
}
|
||||
|
||||
async fn set(
|
||||
&self,
|
||||
cx: &SdkCallCx,
|
||||
collection: &str,
|
||||
key: &str,
|
||||
value: serde_json::Value,
|
||||
) -> Result<(), KvError> {
|
||||
validate_collection(collection)?;
|
||||
self.check_write(cx).await?;
|
||||
let previous = self
|
||||
.repo
|
||||
.set(cx.app_id, collection, key, value.clone())
|
||||
.await?;
|
||||
let op = if previous.is_some() {
|
||||
"update"
|
||||
} else {
|
||||
"insert"
|
||||
};
|
||||
// Emit unconditionally; the noop emitter drops it, the outbox
|
||||
// emitter persists it. Best-effort: a failed emit is logged
|
||||
// but does not roll back the write.
|
||||
if let Err(e) = self
|
||||
.events
|
||||
.emit(
|
||||
cx,
|
||||
ServiceEvent {
|
||||
source: "kv",
|
||||
op,
|
||||
collection: Some(collection.to_string()),
|
||||
key: Some(key.to_string()),
|
||||
payload: Some(value),
|
||||
old_payload: previous,
|
||||
},
|
||||
)
|
||||
.await
|
||||
{
|
||||
tracing::warn!(error = %e, source = "kv", op, "event emit failed");
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn delete(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Result<bool, KvError> {
|
||||
validate_collection(collection)?;
|
||||
self.check_write(cx).await?;
|
||||
let previous = self.repo.delete(cx.app_id, collection, key).await?;
|
||||
let was_present = previous.is_some();
|
||||
if was_present {
|
||||
if let Err(e) = self
|
||||
.events
|
||||
.emit(
|
||||
cx,
|
||||
ServiceEvent {
|
||||
source: "kv",
|
||||
op: "delete",
|
||||
collection: Some(collection.to_string()),
|
||||
key: Some(key.to_string()),
|
||||
payload: None,
|
||||
old_payload: previous,
|
||||
},
|
||||
)
|
||||
.await
|
||||
{
|
||||
tracing::warn!(error = %e, source = "kv", op = "delete", "event emit failed");
|
||||
}
|
||||
}
|
||||
Ok(was_present)
|
||||
}
|
||||
|
||||
async fn has(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Result<bool, KvError> {
|
||||
validate_collection(collection)?;
|
||||
self.check_read(cx).await?;
|
||||
Ok(self.repo.has(cx.app_id, collection, key).await?)
|
||||
}
|
||||
|
||||
async fn list(
|
||||
&self,
|
||||
cx: &SdkCallCx,
|
||||
collection: &str,
|
||||
cursor: Option<&str>,
|
||||
limit: u32,
|
||||
) -> Result<KvListPage, KvError> {
|
||||
validate_collection(collection)?;
|
||||
self.check_read(cx).await?;
|
||||
Ok(self.repo.list(cx.app_id, collection, cursor, limit).await?)
|
||||
}
|
||||
}
|
||||
|
||||
// ----------------------------------------------------------------------------
|
||||
// Tests — in-memory KvRepo so unit tests don't need Postgres.
|
||||
// ----------------------------------------------------------------------------
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::authz::{AuthzError, AuthzRepo};
|
||||
use async_trait::async_trait;
|
||||
use picloud_shared::{
|
||||
AdminUserId, AppId, AppRole, ExecutionId, InstanceRole, NoopEventEmitter, Principal,
|
||||
RequestId, UserId,
|
||||
};
|
||||
use std::collections::{BTreeMap, HashMap};
|
||||
use tokio::sync::Mutex;
|
||||
|
||||
#[derive(Default)]
|
||||
struct InMemoryKvRepo {
|
||||
data: Mutex<BTreeMap<(AppId, String, String), serde_json::Value>>,
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl KvRepo for InMemoryKvRepo {
|
||||
async fn get(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
collection: &str,
|
||||
key: &str,
|
||||
) -> Result<Option<serde_json::Value>, KvRepoError> {
|
||||
Ok(self
|
||||
.data
|
||||
.lock()
|
||||
.await
|
||||
.get(&(app_id, collection.to_string(), key.to_string()))
|
||||
.cloned())
|
||||
}
|
||||
|
||||
async fn set(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
collection: &str,
|
||||
key: &str,
|
||||
value: serde_json::Value,
|
||||
) -> Result<Option<serde_json::Value>, KvRepoError> {
|
||||
Ok(self
|
||||
.data
|
||||
.lock()
|
||||
.await
|
||||
.insert((app_id, collection.to_string(), key.to_string()), value))
|
||||
}
|
||||
|
||||
async fn delete(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
collection: &str,
|
||||
key: &str,
|
||||
) -> Result<Option<serde_json::Value>, KvRepoError> {
|
||||
Ok(self
|
||||
.data
|
||||
.lock()
|
||||
.await
|
||||
.remove(&(app_id, collection.to_string(), key.to_string())))
|
||||
}
|
||||
|
||||
async fn has(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
collection: &str,
|
||||
key: &str,
|
||||
) -> Result<bool, KvRepoError> {
|
||||
Ok(self.data.lock().await.contains_key(&(
|
||||
app_id,
|
||||
collection.to_string(),
|
||||
key.to_string(),
|
||||
)))
|
||||
}
|
||||
|
||||
async fn list(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
collection: &str,
|
||||
cursor: Option<&str>,
|
||||
limit: u32,
|
||||
) -> Result<KvListPage, KvRepoError> {
|
||||
let data = self.data.lock().await;
|
||||
let last_key = cursor.map(std::string::ToString::to_string);
|
||||
let mut keys: Vec<String> = data
|
||||
.iter()
|
||||
.filter(|((a, c, _), _)| *a == app_id && c == collection)
|
||||
.map(|((_, _, k), _)| k.clone())
|
||||
.filter(|k| last_key.as_ref().is_none_or(|lk| k > lk))
|
||||
.collect();
|
||||
keys.sort();
|
||||
let take = (limit as usize).max(1);
|
||||
let next_cursor = if keys.len() > take {
|
||||
keys.truncate(take);
|
||||
keys.last().cloned()
|
||||
} else {
|
||||
None
|
||||
};
|
||||
Ok(KvListPage { keys, next_cursor })
|
||||
}
|
||||
}
|
||||
|
||||
/// `AuthzRepo` that always denies. Used to confirm that the service
/// returns a denial whenever `cx.principal` is `Some`, and that it
/// does NOT call into authz when `cx.principal` is `None`.
|
||||
#[derive(Default)]
|
||||
struct DenyingAuthzRepo;
|
||||
|
||||
#[async_trait]
|
||||
impl AuthzRepo for DenyingAuthzRepo {
|
||||
async fn membership(
|
||||
&self,
|
||||
_user_id: UserId,
|
||||
_app_id: AppId,
|
||||
) -> Result<Option<AppRole>, AuthzError> {
|
||||
Ok(None)
|
||||
}
|
||||
}
|
||||
|
||||
fn anon_cx(app_id: AppId) -> SdkCallCx {
|
||||
SdkCallCx {
|
||||
app_id,
|
||||
principal: None,
|
||||
execution_id: ExecutionId::new(),
|
||||
request_id: RequestId::new(),
|
||||
trigger_depth: 0,
|
||||
root_execution_id: ExecutionId::new(),
|
||||
is_dead_letter_handler: false,
|
||||
event: None,
|
||||
}
|
||||
}
|
||||
|
||||
fn owner_cx(app_id: AppId) -> SdkCallCx {
|
||||
SdkCallCx {
|
||||
app_id,
|
||||
principal: Some(Principal {
|
||||
user_id: AdminUserId::new(),
|
||||
instance_role: InstanceRole::Owner,
|
||||
scopes: None,
|
||||
app_binding: None,
|
||||
}),
|
||||
execution_id: ExecutionId::new(),
|
||||
request_id: RequestId::new(),
|
||||
trigger_depth: 0,
|
||||
root_execution_id: ExecutionId::new(),
|
||||
is_dead_letter_handler: false,
|
||||
event: None,
|
||||
}
|
||||
}
|
||||
|
||||
fn member_no_role_cx(app_id: AppId) -> SdkCallCx {
|
||||
SdkCallCx {
|
||||
app_id,
|
||||
principal: Some(Principal {
|
||||
user_id: AdminUserId::new(),
|
||||
instance_role: InstanceRole::Member,
|
||||
scopes: None,
|
||||
app_binding: None,
|
||||
}),
|
||||
execution_id: ExecutionId::new(),
|
||||
request_id: RequestId::new(),
|
||||
trigger_depth: 0,
|
||||
root_execution_id: ExecutionId::new(),
|
||||
is_dead_letter_handler: false,
|
||||
event: None,
|
||||
}
|
||||
}
|
||||
|
||||
fn svc() -> KvServiceImpl {
|
||||
KvServiceImpl::new(
|
||||
Arc::new(InMemoryKvRepo::default()),
|
||||
Arc::new(DenyingAuthzRepo),
|
||||
Arc::new(NoopEventEmitter),
|
||||
)
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn set_then_get_round_trips() {
|
||||
let kv = svc();
|
||||
let cx = anon_cx(AppId::new());
|
||||
kv.set(&cx, "widgets", "k1", serde_json::json!({"n": 1}))
|
||||
.await
|
||||
.unwrap();
|
||||
let v = kv.get(&cx, "widgets", "k1").await.unwrap();
|
||||
assert_eq!(v, Some(serde_json::json!({"n": 1})));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn get_missing_returns_none() {
|
||||
let kv = svc();
|
||||
let cx = anon_cx(AppId::new());
|
||||
let v = kv.get(&cx, "widgets", "nope").await.unwrap();
|
||||
assert_eq!(v, None);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn has_returns_bool() {
|
||||
let kv = svc();
|
||||
let cx = anon_cx(AppId::new());
|
||||
assert!(!kv.has(&cx, "widgets", "k1").await.unwrap());
|
||||
kv.set(&cx, "widgets", "k1", serde_json::json!(true))
|
||||
.await
|
||||
.unwrap();
|
||||
assert!(kv.has(&cx, "widgets", "k1").await.unwrap());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn delete_returns_was_present() {
|
||||
let kv = svc();
|
||||
let cx = anon_cx(AppId::new());
|
||||
assert!(!kv.delete(&cx, "widgets", "missing").await.unwrap());
|
||||
kv.set(&cx, "widgets", "k1", serde_json::json!(1))
|
||||
.await
|
||||
.unwrap();
|
||||
assert!(kv.delete(&cx, "widgets", "k1").await.unwrap());
|
||||
// Idempotent — second delete returns false.
|
||||
assert!(!kv.delete(&cx, "widgets", "k1").await.unwrap());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn empty_collection_rejected() {
|
||||
let kv = svc();
|
||||
let cx = anon_cx(AppId::new());
|
||||
let err = kv.get(&cx, "", "k1").await.unwrap_err();
|
||||
assert!(matches!(err, KvError::InvalidCollection));
|
||||
}
|
||||
|
||||
/// Load-bearing: a script with `cx.app_id = A` must NOT see
|
||||
/// entries inserted under `cx.app_id = B`. This is the cross-app
|
||||
/// isolation boundary; getting this wrong is a security
|
||||
/// vulnerability.
|
||||
#[tokio::test]
|
||||
async fn cross_app_isolation_via_cx_app_id() {
|
||||
let kv = svc();
|
||||
let app_a = AppId::new();
|
||||
let app_b = AppId::new();
|
||||
let cx_a = anon_cx(app_a);
|
||||
let cx_b = anon_cx(app_b);
|
||||
|
||||
kv.set(&cx_a, "shared", "k", serde_json::json!("from-a"))
|
||||
.await
|
||||
.unwrap();
|
||||
kv.set(&cx_b, "shared", "k", serde_json::json!("from-b"))
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(
|
||||
kv.get(&cx_a, "shared", "k").await.unwrap(),
|
||||
Some(serde_json::json!("from-a"))
|
||||
);
|
||||
assert_eq!(
|
||||
kv.get(&cx_b, "shared", "k").await.unwrap(),
|
||||
Some(serde_json::json!("from-b"))
|
||||
);
|
||||
}
|
||||
|
||||
/// Script-as-gate: an `anon_cx` (principal = None) skips the
|
||||
/// capability check entirely. Even with a denying authz repo,
|
||||
/// the write succeeds.
|
||||
#[tokio::test]
|
||||
async fn anonymous_cx_skips_authz() {
|
||||
let kv = svc();
|
||||
let cx = anon_cx(AppId::new());
|
||||
kv.set(&cx, "widgets", "k", serde_json::json!(1))
|
||||
.await
|
||||
.unwrap();
|
||||
// No panic, no Forbidden.
|
||||
}
|
||||
|
||||
/// Authenticated principal with no role on the app: the
|
||||
/// `DenyingAuthzRepo` returns no membership, so the capability
|
||||
/// check denies. Set must surface KvError::Forbidden.
|
||||
#[tokio::test]
|
||||
async fn authed_cx_with_no_role_is_forbidden() {
|
||||
let kv = svc();
|
||||
let cx = member_no_role_cx(AppId::new());
|
||||
let err = kv
|
||||
.set(&cx, "widgets", "k", serde_json::json!(1))
|
||||
.await
|
||||
.unwrap_err();
|
||||
assert!(matches!(err, KvError::Forbidden));
|
||||
}
|
||||
|
||||
/// Owner principal: instance-role grants kick in inside `authz::can`
|
||||
/// (Owner -> implicit AppAdmin which covers KvWrite).
|
||||
#[tokio::test]
|
||||
async fn owner_principal_can_write() {
|
||||
let kv = svc();
|
||||
let cx = owner_cx(AppId::new());
|
||||
kv.set(&cx, "widgets", "k", serde_json::json!(1))
|
||||
.await
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn list_cursor_pagination() {
|
||||
let kv = svc();
|
||||
let cx = anon_cx(AppId::new());
|
||||
for i in 0..5 {
|
||||
kv.set(
|
||||
&cx,
|
||||
"widgets",
|
||||
&format!("k{i:02}"),
|
||||
serde_json::json!({"i": i}),
|
||||
)
|
||||
.await
|
||||
.unwrap();
|
||||
}
|
||||
// page 1 — 2 keys
|
||||
let p1 = kv.list(&cx, "widgets", None, 2).await.unwrap();
|
||||
assert_eq!(p1.keys, vec!["k00".to_string(), "k01".to_string()]);
|
||||
assert!(p1.next_cursor.is_some());
|
||||
// page 2 — 2 keys
|
||||
let p2 = kv
|
||||
.list(&cx, "widgets", p1.next_cursor.as_deref(), 2)
|
||||
.await
|
||||
.unwrap();
|
||||
assert_eq!(p2.keys, vec!["k02".to_string(), "k03".to_string()]);
|
||||
// final page — 1 key, no cursor
|
||||
let p3 = kv
|
||||
.list(&cx, "widgets", p2.next_cursor.as_deref(), 2)
|
||||
.await
|
||||
.unwrap();
|
||||
assert_eq!(p3.keys, vec!["k04".to_string()]);
|
||||
assert!(p3.next_cursor.is_none());
|
||||
}
|
||||
|
||||
/// Pinning the v1.1.0 contract: services hold the emitter as a
|
||||
/// dyn Arc and call `emit().await` unconditionally. This test
|
||||
/// proves the call site doesn't blow up against the noop impl —
|
||||
/// the outbox emitter (v1.1.1) drops in transparently.
|
||||
#[tokio::test]
|
||||
async fn noop_emitter_does_not_block_mutations() {
|
||||
let kv = svc();
|
||||
let cx = anon_cx(AppId::new());
|
||||
kv.set(&cx, "widgets", "k", serde_json::json!(1))
|
||||
.await
|
||||
.unwrap();
|
||||
kv.delete(&cx, "widgets", "k").await.unwrap();
|
||||
// Reaching here means emit() returned Ok and didn't panic.
|
||||
// Suppress unused-import warning when run alone:
|
||||
let _ = HashMap::<String, String>::new();
|
||||
}
|
||||
}
|
||||
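Reviewer sketch of the script-as-gate rule that `check_read` / `check_write` implement above: an anonymous context skips the capability check, and an authenticated one must pass it. All types here (`Role`, `Principal`, `Forbidden`) and the "only owners may write" rule are hypothetical simplifications, not the crate's real authz model.

```rust
/// Hypothetical instance role; the real crate has a richer model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Owner,
    Member,
}

/// Hypothetical principal carrying only a role.
pub struct Principal {
    pub role: Role,
}

/// Error returned when an authenticated caller lacks the capability.
#[derive(Debug, PartialEq, Eq)]
pub struct Forbidden;

/// Assumed capability rule for this sketch: only owners may write.
fn can_write(principal: &Principal) -> bool {
    principal.role == Role::Owner
}

/// Script-as-gate check: `None` means a public, unauthenticated call,
/// where the script itself decides what is allowed.
pub fn check_write(principal: Option<&Principal>) -> Result<(), Forbidden> {
    match principal {
        // Anonymous call: no capability check runs at all.
        None => Ok(()),
        // Authenticated call with the capability.
        Some(p) if can_write(p) => Ok(()),
        // Authenticated call without it.
        Some(_) => Err(Forbidden),
    }
}
```

The asymmetry is deliberate in the diff: an authenticated member without a role is denied, while an anonymous request is allowed through, so the script must enforce its own access rules on public routes.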
@@ -4,6 +4,7 @@
|
||||
//! the same DB for now; once we add caching and per-node ingress, the
|
||||
//! manager will publish change events.
|
||||
|
||||
pub mod abandoned_repo;
|
||||
pub mod admin_session_repo;
|
||||
pub mod admin_user_repo;
|
||||
pub mod admin_users_api;
|
||||
@@ -21,14 +22,30 @@ pub mod auth_api;
|
||||
pub mod auth_bootstrap;
|
||||
pub mod auth_middleware;
|
||||
pub mod authz;
|
||||
pub mod dead_letter_repo;
|
||||
pub mod dead_letter_service;
|
||||
pub mod dead_letters_api;
|
||||
pub mod dispatcher;
|
||||
pub mod gc;
|
||||
pub mod kv_repo;
|
||||
pub mod kv_service;
|
||||
pub mod log_sink;
|
||||
pub mod migrations;
|
||||
pub mod outbox_event_emitter;
|
||||
pub mod outbox_repo;
|
||||
pub mod principal_resolver;
|
||||
pub mod repo;
|
||||
pub mod route_admin;
|
||||
pub mod route_repo;
|
||||
pub mod sandbox;
|
||||
pub mod scheduler;
|
||||
pub mod trigger_config;
|
||||
pub mod trigger_repo;
|
||||
pub mod triggers_api;
|
||||
|
||||
pub use abandoned_repo::{
|
||||
AbandonedRepo, AbandonedRepoError, NewAbandonedExecution, PostgresAbandonedRepo,
|
||||
};
|
||||
pub use admin_session_repo::{
|
||||
AdminSessionLookup, AdminSessionRepository, AdminSessionRepositoryError,
|
||||
PostgresAdminSessionRepository,
|
||||
@@ -63,7 +80,21 @@ pub use auth_middleware::{
|
||||
API_KEY_PREFIX, API_KEY_PREFIX_LEN, SESSION_COOKIE,
|
||||
};
|
||||
pub use authz::{can, require, AuthzDenied, AuthzError, AuthzRepo, Capability, Decision};
|
||||
pub use dead_letter_repo::{
|
||||
DeadLetterRepo, DeadLetterRepoError, DeadLetterRow, NewDeadLetter, PostgresDeadLetterRepo,
|
||||
};
|
||||
pub use dead_letter_service::PostgresDeadLetterService;
|
||||
pub use dead_letters_api::{dead_letters_router, DeadLettersApiError, DeadLettersState};
|
||||
pub use dispatcher::{compute_backoff, Dispatcher, DispatcherError};
|
||||
pub use gc::{spawn_abandoned_gc, spawn_dead_letter_gc};
|
||||
pub use kv_repo::{KvRepo, KvRepoError, PostgresKvRepo};
|
||||
pub use kv_service::KvServiceImpl;
|
||||
pub use log_sink::PostgresExecutionLogSink;
|
||||
pub use outbox_event_emitter::OutboxEventEmitter;
|
||||
pub use outbox_repo::{
|
||||
NewOutboxRow, OutboxRepo, OutboxRepoError, OutboxRow, OutboxSourceKind, PostgresOutboxRepo,
|
||||
};
|
||||
pub use principal_resolver::{AdminPrincipalResolver, PrincipalResolver, PrincipalResolverError};
|
||||
pub use repo::{
|
||||
ExecutionLogRepository, NewScript, PostgresExecutionLogRepository, PostgresScriptRepository,
|
||||
RepoResolver, ScriptPatch, ScriptRepository, ScriptRepositoryError,
|
||||
@@ -71,3 +102,10 @@ pub use repo::{
|
||||
pub use route_admin::{compile_routes, route_admin_router, RouteAdminState};
|
||||
pub use route_repo::{NewRoute, PostgresRouteRepository, RouteRepository};
|
||||
pub use sandbox::{CeilingError, SandboxCeiling};
|
||||
pub use trigger_config::{BackoffShape, TriggerConfig};
|
||||
pub use trigger_repo::{
|
||||
collection_matches, CreateDeadLetterTrigger, CreateKvTrigger, DeadLetterTriggerMatch,
|
||||
KvTriggerMatch, PostgresTriggerRepo, Trigger, TriggerDetails, TriggerDispatchMode, TriggerKind,
|
||||
TriggerRepo, TriggerRepoError,
|
||||
};
|
||||
pub use triggers_api::{triggers_router, TriggersApiError, TriggersState};
|
||||
|
||||
103
crates/manager-core/src/outbox_event_emitter.rs
Normal file
@@ -0,0 +1,103 @@
|
||||
//! `OutboxEventEmitter` is the real `ServiceEventEmitter` that replaces
//! v1.1.0's `NoopEventEmitter` once the triggers framework lands.
//!
//! On each `emit` (a KV mutation, future doc/file/pubsub event, etc.):
//! 1. Look up matching triggers for the event's (app_id, source, op,
//!    collection) tuple via `TriggerRepo::list_matching_*`.
//! 2. For each match, write one outbox row carrying the event payload
//!    serialized as a `TriggerEvent`.
//!
//! Defaults are applied at write time so that `OutboxRow.payload`
//! carries everything the dispatcher needs to reconstruct the executor
//! invocation without joining back to the trigger row.
//!
//! Non-KV `ServiceEvent` sources are silently dropped in v1.1.1: the
//! dispatcher only knows how to fire KV triggers in this release. Future
//! sources (docs/files/pubsub) add their own dispatch arm.
|
||||
|
||||
use std::sync::Arc;
|
||||
|
||||
use async_trait::async_trait;
|
||||
use picloud_shared::{
|
||||
EmitError, KvEventOp, SdkCallCx, ServiceEvent, ServiceEventEmitter, TriggerEvent,
|
||||
};
|
||||
|
||||
use crate::outbox_repo::{NewOutboxRow, OutboxRepo, OutboxSourceKind};
|
||||
use crate::trigger_repo::TriggerRepo;
|
||||
|
||||
pub struct OutboxEventEmitter {
|
||||
triggers: Arc<dyn TriggerRepo>,
|
||||
outbox: Arc<dyn OutboxRepo>,
|
||||
}
|
||||
|
||||
impl OutboxEventEmitter {
|
||||
#[must_use]
|
||||
pub fn new(triggers: Arc<dyn TriggerRepo>, outbox: Arc<dyn OutboxRepo>) -> Self {
|
||||
Self { triggers, outbox }
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl ServiceEventEmitter for OutboxEventEmitter {
|
||||
async fn emit(&self, cx: &SdkCallCx, event: ServiceEvent) -> Result<(), EmitError> {
|
||||
match event.source {
|
||||
"kv" => self.emit_kv(cx, event).await,
|
||||
// Future sources land here. For now, silently drop — the
|
||||
// SDK calls `events.emit(...)` unconditionally for forward
|
||||
// compat, so swallowing without an error is correct.
|
||||
_ => Ok(()),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl OutboxEventEmitter {
|
||||
async fn emit_kv(&self, cx: &SdkCallCx, event: ServiceEvent) -> Result<(), EmitError> {
|
||||
let Some(op) = KvEventOp::from_wire(event.op) else {
|
||||
return Ok(()); // unknown op — drop quietly
|
||||
};
|
||||
let Some(collection) = event.collection.clone() else {
|
||||
return Ok(()); // KV events always carry a collection — defensively skip
|
||||
};
|
||||
let key = event.key.clone().unwrap_or_default();
|
||||
|
||||
let matches = self
|
||||
.triggers
|
||||
.list_matching_kv(cx.app_id, &collection, op)
|
||||
.await
|
||||
.map_err(|e| EmitError::Unavailable(format!("trigger lookup: {e}")))?;
|
||||
|
||||
if matches.is_empty() {
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
// Serialize the originating event as a TriggerEvent so the
|
||||
// dispatcher can hand it to the script as `ctx.event` without
|
||||
// round-tripping back to the trigger row.
|
||||
let trigger_event = TriggerEvent::Kv {
|
||||
op,
|
||||
collection,
|
||||
key,
|
||||
value: event.payload.clone(),
|
||||
};
|
||||
let payload = serde_json::to_value(&trigger_event)
|
||||
.map_err(|e| EmitError::Rejected(format!("event serialize: {e}")))?;
|
||||
|
||||
for m in matches {
|
||||
self.outbox
|
||||
.insert(NewOutboxRow {
|
||||
app_id: cx.app_id,
|
||||
source_kind: OutboxSourceKind::Kv,
|
||||
trigger_id: Some(m.trigger_id),
|
||||
script_id: Some(m.script_id),
|
||||
reply_to: None,
|
||||
payload: payload.clone(),
|
||||
origin_principal: cx.principal.as_ref().map(|p| p.user_id),
|
||||
trigger_depth: cx.trigger_depth.saturating_add(1),
|
||||
root_execution_id: Some(cx.root_execution_id),
|
||||
})
|
||||
.await
|
||||
.map_err(|e| EmitError::Unavailable(format!("outbox insert: {e}")))?;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
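Reviewer sketch of the fan-out step in `emit_kv` above: one outbox row per matching trigger, each carrying a copy of the serialized event and a depth one greater than the caller's. The `Trigger` and `OutboxRow` structs and the exact-match collection rule are hypothetical simplifications; the real code matches through `TriggerRepo::list_matching_kv`.

```rust
/// Hypothetical trigger: fires on mutations to one collection.
#[derive(Debug, Clone, PartialEq)]
pub struct Trigger {
    pub id: u32,
    pub collection: String,
}

/// Hypothetical outbox row with only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct OutboxRow {
    pub trigger_id: u32,
    pub payload: String,
    pub trigger_depth: u32,
}

/// Build one outbox row per trigger whose collection matches the event.
pub fn fan_out(
    triggers: &[Trigger],
    collection: &str,
    payload: &str,
    caller_depth: u32,
) -> Vec<OutboxRow> {
    triggers
        .iter()
        .filter(|t| t.collection == collection)
        .map(|t| OutboxRow {
            trigger_id: t.id,
            // Every row gets its own copy of the serialized event.
            payload: payload.to_string(),
            // Depth grows by one per hop and saturates instead of wrapping.
            trigger_depth: caller_depth.saturating_add(1),
        })
        .collect()
}
```

Carrying the depth on each row is what lets a dispatcher cut off chains of triggers that keep firing each other.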
258
crates/manager-core/src/outbox_repo.rs
Normal file
@@ -0,0 +1,258 @@
|
||||
//! `OutboxRepo`: universal trigger outbox CRUD. Hot writes come from
//! the `OutboxEventEmitter` (KV mutations fan out via this) and the
//! sync-HTTP path. Hot reads come from the dispatcher, which claims
//! due rows via `FOR UPDATE SKIP LOCKED`.
|
||||
|
||||
use async_trait::async_trait;
|
||||
use chrono::{DateTime, Utc};
|
||||
use picloud_shared::{
|
||||
AdminUserId, AppId, ExecutionId, NewHttpOutbox, OutboxWriter, OutboxWriterError, ScriptId,
|
||||
TriggerId,
|
||||
};
|
||||
use sqlx::PgPool;
|
||||
use uuid::Uuid;
|
||||
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub enum OutboxRepoError {
|
||||
#[error("database error: {0}")]
|
||||
Db(#[from] sqlx::Error),
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
|
||||
pub enum OutboxSourceKind {
|
||||
Http,
|
||||
Kv,
|
||||
DeadLetter,
|
||||
}
|
||||
|
||||
impl OutboxSourceKind {
|
||||
#[must_use]
|
||||
pub const fn as_str(self) -> &'static str {
|
||||
match self {
|
||||
Self::Http => "http",
|
||||
Self::Kv => "kv",
|
||||
Self::DeadLetter => "dead_letter",
|
||||
}
|
||||
}
|
||||
|
||||
#[must_use]
|
||||
pub fn from_wire(s: &str) -> Option<Self> {
|
||||
match s {
|
||||
"http" => Some(Self::Http),
|
||||
"kv" => Some(Self::Kv),
|
||||
"dead_letter" => Some(Self::DeadLetter),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Insert payload — what each event source writes when fanning out
|
||||
/// to the outbox. `payload` is the serialized `TriggerEvent` (plus
|
||||
/// any extra context the dispatcher needs to reconstruct an
|
||||
/// `ExecRequest`).
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct NewOutboxRow {
|
||||
pub app_id: AppId,
|
||||
pub source_kind: OutboxSourceKind,
|
||||
pub trigger_id: Option<TriggerId>,
|
||||
pub script_id: Option<ScriptId>,
|
||||
pub reply_to: Option<Uuid>,
|
||||
pub payload: serde_json::Value,
|
||||
pub origin_principal: Option<AdminUserId>,
|
||||
pub trigger_depth: u32,
|
||||
pub root_execution_id: Option<ExecutionId>,
|
||||
}
|
||||
|
||||
/// Row as the dispatcher sees it after a claim.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct OutboxRow {
|
||||
pub id: Uuid,
|
||||
pub app_id: AppId,
|
||||
pub source_kind: OutboxSourceKind,
|
||||
pub trigger_id: Option<TriggerId>,
|
||||
pub script_id: Option<ScriptId>,
|
||||
pub reply_to: Option<Uuid>,
|
||||
pub payload: serde_json::Value,
|
||||
pub origin_principal: Option<AdminUserId>,
|
||||
pub trigger_depth: u32,
|
||||
pub root_execution_id: Option<ExecutionId>,
|
||||
pub attempt_count: u32,
|
||||
pub next_attempt_at: DateTime<Utc>,
|
||||
pub created_at: DateTime<Utc>,
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
pub trait OutboxRepo: Send + Sync {
|
||||
async fn insert(&self, row: NewOutboxRow) -> Result<Uuid, OutboxRepoError>;
|
||||
|
||||
/// Claim up to `limit` due rows. The claim must be atomic (the
/// Postgres implementation selects and marks rows in one statement)
/// so two concurrent dispatchers (cluster mode) can't double-pick a
/// row. Returns an empty `Vec` when nothing is due.
|
||||
async fn claim_due(
|
||||
&self,
|
||||
claimed_by: &str,
|
||||
limit: i64,
|
||||
) -> Result<Vec<OutboxRow>, OutboxRepoError>;
|
||||
|
||||
/// Remove a row after a terminal outcome (success or dead-letter).
|
||||
async fn delete(&self, id: Uuid) -> Result<(), OutboxRepoError>;
|
||||
|
||||
/// Failure path: bump attempt_count, clear the claim, set the
|
||||
/// next attempt time. The dispatcher computes the delay (with
|
||||
/// backoff + jitter) and passes it in.
|
||||
async fn reschedule(
|
||||
&self,
|
||||
id: Uuid,
|
||||
attempt_count: u32,
|
||||
next_attempt_at: DateTime<Utc>,
|
||||
) -> Result<(), OutboxRepoError>;
|
||||
}
|
||||
|
||||
pub struct PostgresOutboxRepo {
|
||||
pool: PgPool,
|
||||
}
|
||||
|
||||
impl PostgresOutboxRepo {
|
||||
#[must_use]
|
||||
pub fn new(pool: PgPool) -> Self {
|
||||
Self { pool }
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl OutboxRepo for PostgresOutboxRepo {
|
||||
async fn insert(&self, row: NewOutboxRow) -> Result<Uuid, OutboxRepoError> {
|
||||
let (id,): (Uuid,) = sqlx::query_as(
|
||||
"INSERT INTO outbox ( \
|
||||
app_id, source_kind, trigger_id, script_id, reply_to, \
|
||||
payload, origin_principal, trigger_depth, root_execution_id \
|
||||
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9) \
|
||||
RETURNING id",
|
||||
)
|
||||
.bind(row.app_id.into_inner())
|
||||
.bind(row.source_kind.as_str())
|
||||
.bind(row.trigger_id.map(TriggerId::into_inner))
|
||||
.bind(row.script_id.map(ScriptId::into_inner))
|
||||
.bind(row.reply_to)
|
||||
.bind(row.payload)
|
||||
.bind(row.origin_principal.map(AdminUserId::into_inner))
|
||||
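// Saturate instead of falling back to 0, so an out-of-range depth
// can never look like the start of a fresh trigger chain.
.bind(i32::try_from(row.trigger_depth).unwrap_or(i32::MAX))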
|
||||
.bind(row.root_execution_id.map(ExecutionId::into_inner))
|
||||
.fetch_one(&self.pool)
|
||||
.await?;
|
||||
Ok(id)
|
||||
}
|
||||
|
||||
async fn claim_due(
|
||||
&self,
|
||||
claimed_by: &str,
|
||||
limit: i64,
|
||||
) -> Result<Vec<OutboxRow>, OutboxRepoError> {
|
||||
let rows: Vec<OutboxRowRaw> = sqlx::query_as(
|
||||
"WITH due AS ( \
|
||||
SELECT id FROM outbox \
|
||||
WHERE claimed_at IS NULL AND next_attempt_at <= NOW() \
|
||||
ORDER BY next_attempt_at \
|
||||
FOR UPDATE SKIP LOCKED \
|
||||
LIMIT $1 \
|
||||
) \
|
||||
UPDATE outbox SET claimed_at = NOW(), claimed_by = $2 \
|
||||
WHERE id IN (SELECT id FROM due) \
|
||||
RETURNING id, app_id, source_kind, trigger_id, script_id, reply_to, \
|
||||
payload, origin_principal, trigger_depth, \
|
||||
root_execution_id, attempt_count, next_attempt_at, created_at",
|
||||
)
|
||||
.bind(limit)
|
||||
.bind(claimed_by)
|
||||
.fetch_all(&self.pool)
|
||||
.await?;
|
||||
|
||||
Ok(rows.into_iter().filter_map(OutboxRowRaw::hydrate).collect())
|
||||
}
|
||||
|
||||
async fn delete(&self, id: Uuid) -> Result<(), OutboxRepoError> {
|
||||
sqlx::query("DELETE FROM outbox WHERE id = $1")
|
||||
.bind(id)
|
||||
.execute(&self.pool)
|
||||
.await?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn reschedule(
|
||||
&self,
|
||||
id: Uuid,
|
||||
attempt_count: u32,
|
||||
next_attempt_at: DateTime<Utc>,
|
||||
) -> Result<(), OutboxRepoError> {
|
||||
sqlx::query(
|
||||
"UPDATE outbox SET attempt_count = $2, next_attempt_at = $3, \
|
||||
claimed_at = NULL, claimed_by = NULL \
|
||||
WHERE id = $1",
|
||||
)
|
||||
.bind(id)
|
||||
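// Saturate instead of falling back to 0, so an out-of-range count
// can never reset the retry budget.
.bind(i32::try_from(attempt_count).unwrap_or(i32::MAX))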
|
||||
.bind(next_attempt_at)
|
||||
.execute(&self.pool)
|
||||
.await?;
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
/// `OutboxWriter` implementation so orchestrator-core (which can't
|
||||
/// depend on manager-core) can enqueue HTTP outbox rows through the
|
||||
/// shared trait.
|
||||
#[async_trait]
|
||||
impl OutboxWriter for PostgresOutboxRepo {
|
||||
async fn enqueue_http(&self, row: NewHttpOutbox) -> Result<Uuid, OutboxWriterError> {
|
||||
self.insert(NewOutboxRow {
|
||||
app_id: row.app_id,
|
||||
source_kind: OutboxSourceKind::Http,
|
||||
trigger_id: Some(TriggerId::from(row.route_id)),
|
||||
script_id: Some(row.script_id),
|
||||
reply_to: row.reply_to,
|
||||
payload: row.payload,
|
||||
origin_principal: row.origin_principal,
|
||||
trigger_depth: row.trigger_depth,
|
||||
root_execution_id: row.root_execution_id,
|
||||
})
|
||||
.await
|
||||
.map_err(|e| OutboxWriterError::Backend(e.to_string()))
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(sqlx::FromRow)]
|
||||
struct OutboxRowRaw {
|
||||
id: Uuid,
|
||||
app_id: Uuid,
|
||||
source_kind: String,
|
||||
trigger_id: Option<Uuid>,
|
||||
script_id: Option<Uuid>,
|
||||
reply_to: Option<Uuid>,
|
||||
payload: serde_json::Value,
|
||||
origin_principal: Option<Uuid>,
|
||||
trigger_depth: i32,
|
||||
root_execution_id: Option<Uuid>,
|
||||
attempt_count: i32,
|
||||
next_attempt_at: DateTime<Utc>,
|
||||
created_at: DateTime<Utc>,
|
||||
}
|
||||
|
||||
impl OutboxRowRaw {
|
||||
fn hydrate(self) -> Option<OutboxRow> {
|
||||
Some(OutboxRow {
|
||||
id: self.id,
|
||||
app_id: self.app_id.into(),
|
||||
source_kind: OutboxSourceKind::from_wire(&self.source_kind)?,
|
||||
trigger_id: self.trigger_id.map(Into::into),
|
||||
script_id: self.script_id.map(Into::into),
|
||||
reply_to: self.reply_to,
|
||||
payload: self.payload,
|
||||
origin_principal: self.origin_principal.map(Into::into),
|
||||
trigger_depth: u32::try_from(self.trigger_depth).unwrap_or(0),
|
||||
root_execution_id: self.root_execution_id.map(Into::into),
|
||||
attempt_count: u32::try_from(self.attempt_count).unwrap_or(0),
|
||||
next_attempt_at: self.next_attempt_at,
|
||||
created_at: self.created_at,
|
||||
})
|
||||
}
|
||||
}
|
||||
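Reviewer sketch of how a caller of `reschedule` might compute the next delay with exponential backoff plus jitter. This is not the crate's `compute_backoff`, whose signature is not shown in this diff. The function name, the millisecond units, and the ±25% jitter width are all assumptions. The jitter sample is passed in by the caller so the function stays deterministic and testable.

```rust
/// Hypothetical backoff helper.
///
/// * `attempt`: 1-based attempt number that just failed.
/// * `base_ms`: delay after the first failure.
/// * `cap_ms`: upper bound applied before jitter.
/// * `jitter`: caller-supplied sample in [-1.0, 1.0], e.g. from an RNG.
pub fn backoff_delay_ms(attempt: u32, base_ms: u64, cap_ms: u64, jitter: f64) -> u64 {
    // Exponential growth: base * 2^(attempt - 1), saturating on overflow.
    let exp = attempt.saturating_sub(1).min(63);
    let raw = base_ms.saturating_mul(1u64 << exp).min(cap_ms);

    // Treat NaN as "no jitter"; clamp everything else into [-1, 1].
    let j = if jitter.is_nan() { 0.0 } else { jitter.clamp(-1.0, 1.0) };

    // Spread the delay by up to 25% either way (assumed width).
    let factor = 1.0 + 0.25 * j;
    (raw as f64 * factor).round() as u64
}
```

A caller would add the returned delay to the current time and pass the result as `next_attempt_at`, together with the incremented `attempt_count`.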
62
crates/manager-core/src/principal_resolver.rs
Normal file
@@ -0,0 +1,62 @@
|
||||
//! `PrincipalResolver` — turns a `registered_by_principal` user id from
|
||||
//! a trigger row into the `Principal` the dispatcher passes through to
|
||||
//! the executor. Per design notes §4, a trigger execution runs as the
|
||||
//! user that registered the trigger; the original event's caller is
|
||||
//! recorded elsewhere (on the outbox row, for forensics) and does not
|
||||
//! become the execution principal.
|
||||
|
||||
use async_trait::async_trait;
|
||||
use picloud_shared::{AdminUserId, Principal};
|
||||
|
||||
use crate::admin_user_repo::{AdminUserRepository, AdminUserRepositoryError};
|
||||
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub enum PrincipalResolverError {
|
||||
#[error("user not found: {0}")]
|
||||
NotFound(AdminUserId),
|
||||
#[error("user is inactive: {0}")]
|
||||
Inactive(AdminUserId),
|
||||
#[error("admin user repo error: {0}")]
|
||||
Backend(String),
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
pub trait PrincipalResolver: Send + Sync {
|
||||
async fn resolve(&self, user_id: AdminUserId) -> Result<Principal, PrincipalResolverError>;
|
||||
}
|
||||
|
||||
pub struct AdminPrincipalResolver {
|
||||
users: std::sync::Arc<dyn AdminUserRepository>,
|
||||
}
|
||||
|
||||
impl AdminPrincipalResolver {
|
||||
#[must_use]
|
||||
pub fn new(users: std::sync::Arc<dyn AdminUserRepository>) -> Self {
|
||||
Self { users }
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl PrincipalResolver for AdminPrincipalResolver {
|
||||
async fn resolve(&self, user_id: AdminUserId) -> Result<Principal, PrincipalResolverError> {
|
||||
let row = self
|
||||
.users
|
||||
.get(user_id)
|
||||
.await
|
||||
.map_err(|e: AdminUserRepositoryError| PrincipalResolverError::Backend(e.to_string()))?
|
||||
.ok_or(PrincipalResolverError::NotFound(user_id))?;
|
||||
if !row.is_active {
|
||||
return Err(PrincipalResolverError::Inactive(user_id));
|
||||
}
|
||||
Ok(Principal {
|
||||
user_id,
|
||||
instance_role: row.instance_role,
|
||||
// Trigger executions are cookie-session-style (no API key
|
||||
// scope restriction). Per-app permissions are evaluated
|
||||
// via `authz::can` against the `app_id` of the resource
|
||||
// the script touches, exactly like an admin invocation.
|
||||
scopes: None,
|
||||
app_binding: None,
|
||||
})
|
||||
}
|
||||
}
|
||||
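The resolver above makes three decisions in order: the user must exist, the user must be active, and only then is a `Principal` built from the stored role. The sketch below restates that flow synchronously against an in-memory map so it can be run without a database or `async_trait`. `UserId`, `UserRow`, the trimmed `Principal`, and `ResolveError` are hypothetical simplifications of the real types, and the `scopes` and `app_binding` fields are omitted.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `AdminUserId`.
pub type UserId = u64;

/// Hypothetical, trimmed admin-user row.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UserRow {
    pub is_active: bool,
    pub instance_role: String,
}

/// Hypothetical, trimmed `Principal` (no scopes or app binding).
#[derive(Debug, PartialEq, Eq)]
pub struct Principal {
    pub user_id: UserId,
    pub instance_role: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ResolveError {
    NotFound(UserId),
    Inactive(UserId),
}

/// Resolve the user that registered a trigger into the principal the
/// execution runs as. Missing and deactivated users are distinct errors.
pub fn resolve(
    users: &HashMap<UserId, UserRow>,
    user_id: UserId,
) -> Result<Principal, ResolveError> {
    let row = users.get(&user_id).ok_or(ResolveError::NotFound(user_id))?;
    if !row.is_active {
        return Err(ResolveError::Inactive(user_id));
    }
    Ok(Principal {
        user_id,
        instance_role: row.instance_role.clone(),
    })
}
```

Keeping `NotFound` and `Inactive` separate lets a caller tell a deleted registrant from a deactivated one when a trigger execution is refused.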
@@ -77,6 +77,12 @@ pub struct CreateRouteRequest {
|
||||
pub path_kind: PathKind,
|
||||
pub path: String,
|
||||
pub method: Option<String>,
|
||||
/// Per-route dispatch mode (v1.1.1). Defaults to `Sync` when
|
||||
/// omitted so older clients aren't broken. `Async` routes return
|
||||
/// `202 Accepted` immediately and run the script in the
|
||||
/// background via the dispatcher.
|
||||
#[serde(default)]
|
||||
pub dispatch_mode: picloud_shared::DispatchMode,
|
||||
}
|
||||
|
||||
#[derive(Debug, Deserialize)]
|
||||
@@ -211,6 +217,7 @@ async fn create_route<RR: RouteRepository, SR: ScriptRepository>(
|
||||
path_kind: input.path_kind,
|
||||
path: normalized_path,
|
||||
method: input.method,
|
||||
dispatch_mode: input.dispatch_mode,
|
||||
})
|
||||
.await?;
|
||||
refresh_table(&state).await?;
|
||||
@@ -370,6 +377,7 @@ pub fn compile_routes(rows: &[Route]) -> Result<Vec<CompiledRoute>, pattern::Par
|
||||
host: pattern::parse_host(r.host_kind, &r.host, r.host_param_name.as_deref())?,
|
||||
path: pattern::parse_path(r.path_kind, &r.path)?,
|
||||
method: r.method.clone(),
|
||||
dispatch_mode: r.dispatch_mode,
|
||||
})
|
||||
})
|
||||
.collect()
|
||||
|
||||
@@ -4,7 +4,7 @@
|
||||
//! after every write — see the route_admin module for the binding.
|
||||
|
||||
use async_trait::async_trait;
|
||||
use picloud_shared::{AppId, HostKind, PathKind, Route, ScriptId};
|
||||
use picloud_shared::{AppId, DispatchMode, HostKind, PathKind, Route, ScriptId};
|
||||
use sqlx::PgPool;
|
||||
use uuid::Uuid;
|
||||
|
||||
@@ -20,6 +20,7 @@ pub struct NewRoute {
|
||||
pub path_kind: PathKind,
|
||||
pub path: String,
|
||||
pub method: Option<String>,
|
||||
pub dispatch_mode: DispatchMode,
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
@@ -62,7 +63,7 @@ impl RouteRepository for PostgresRouteRepository {
|
||||
async fn list_all(&self) -> Result<Vec<Route>, ScriptRepositoryError> {
|
||||
let rows = sqlx::query_as::<_, RouteRow>(
|
||||
"SELECT id, app_id, script_id, host_kind, host, host_param_name, \
|
||||
path_kind, path, method, created_at \
|
||||
path_kind, path, method, dispatch_mode, created_at \
|
||||
FROM routes ORDER BY created_at",
|
||||
)
|
||||
.fetch_all(&self.pool)
|
||||
@@ -73,7 +74,7 @@ impl RouteRepository for PostgresRouteRepository {
|
||||
async fn get(&self, route_id: Uuid) -> Result<Option<Route>, ScriptRepositoryError> {
|
||||
let row = sqlx::query_as::<_, RouteRow>(
|
||||
"SELECT id, app_id, script_id, host_kind, host, host_param_name, \
|
||||
path_kind, path, method, created_at \
|
||||
path_kind, path, method, dispatch_mode, created_at \
|
||||
FROM routes WHERE id = $1",
|
||||
)
|
||||
.bind(route_id)
|
||||
@@ -85,7 +86,7 @@ impl RouteRepository for PostgresRouteRepository {
|
||||
async fn list_for_app(&self, app_id: AppId) -> Result<Vec<Route>, ScriptRepositoryError> {
|
||||
let rows = sqlx::query_as::<_, RouteRow>(
|
||||
"SELECT id, app_id, script_id, host_kind, host, host_param_name, \
|
||||
path_kind, path, method, created_at \
|
||||
path_kind, path, method, dispatch_mode, created_at \
|
||||
FROM routes WHERE app_id = $1 ORDER BY created_at",
|
||||
)
|
||||
.bind(app_id.into_inner())
|
||||
@@ -100,7 +101,7 @@ impl RouteRepository for PostgresRouteRepository {
|
||||
) -> Result<Vec<Route>, ScriptRepositoryError> {
|
||||
let rows = sqlx::query_as::<_, RouteRow>(
|
||||
"SELECT id, app_id, script_id, host_kind, host, host_param_name, \
|
||||
path_kind, path, method, created_at \
|
||||
path_kind, path, method, dispatch_mode, created_at \
|
||||
FROM routes WHERE script_id = $1 ORDER BY created_at",
|
||||
)
|
||||
.bind(script_id.into_inner())
|
||||
@@ -113,10 +114,10 @@ impl RouteRepository for PostgresRouteRepository {
|
||||
let res = sqlx::query_as::<_, RouteRow>(
|
||||
"INSERT INTO routes ( \
|
||||
app_id, script_id, host_kind, host, host_param_name, \
|
||||
path_kind, path, method \
|
||||
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8) \
|
||||
path_kind, path, method, dispatch_mode \
|
||||
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9) \
|
||||
RETURNING id, app_id, script_id, host_kind, host, host_param_name, \
|
||||
path_kind, path, method, created_at",
|
||||
path_kind, path, method, dispatch_mode, created_at",
|
||||
)
|
||||
.bind(input.app_id.into_inner())
|
||||
.bind(input.script_id.into_inner())
|
||||
@@ -126,6 +127,7 @@ impl RouteRepository for PostgresRouteRepository {
|
||||
.bind(path_kind_str(input.path_kind))
|
||||
.bind(&input.path)
|
||||
.bind(input.method.as_deref())
|
||||
.bind(input.dispatch_mode.as_str())
|
||||
.fetch_one(&self.pool)
|
||||
.await;
|
||||
|
||||
@@ -198,6 +200,7 @@ struct RouteRow {
|
||||
path_kind: String,
|
||||
path: String,
|
||||
method: Option<String>,
|
||||
dispatch_mode: String,
|
||||
created_at: chrono::DateTime<chrono::Utc>,
|
||||
}
|
||||
|
||||
@@ -221,6 +224,7 @@ impl From<RouteRow> for Route {
|
||||
},
|
||||
path: r.path,
|
||||
method: r.method,
|
||||
dispatch_mode: DispatchMode::from_wire(&r.dispatch_mode).unwrap_or(DispatchMode::Sync),
|
||||
created_at: r.created_at,
|
||||
}
|
||||
}
|
||||
|
||||
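The route changes above store `dispatch_mode` as text, default it to `Sync` when a request omits it, and fall back to `Sync` when the stored value is unrecognised. The real `picloud_shared::DispatchMode` is not shown in this diff, so the enum below is a hypothetical reconstruction: the wire strings `"sync"` and `"async"` and the helper `mode_from_column` are assumptions for illustration.

```rust
/// Hypothetical reconstruction of `DispatchMode`; `Sync` is the default so
/// that requests omitting the field keep the pre-v1.1.1 behaviour.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum DispatchMode {
    #[default]
    Sync,
    Async,
}

impl DispatchMode {
    /// Text form written to the `dispatch_mode` column (assumed spelling).
    pub const fn as_str(self) -> &'static str {
        match self {
            Self::Sync => "sync",
            Self::Async => "async",
        }
    }

    /// Parse the column text; unknown values yield `None`.
    pub fn from_wire(s: &str) -> Option<Self> {
        match s {
            "sync" => Some(Self::Sync),
            "async" => Some(Self::Async),
            _ => None,
        }
    }
}

/// Mirror of the fallback used when converting a database row into a route:
/// an unrecognised value degrades to `Sync` instead of failing the load.
pub fn mode_from_column(raw: &str) -> DispatchMode {
    DispatchMode::from_wire(raw).unwrap_or(DispatchMode::Sync)
}
```

Falling back to `Sync` means a bad column value can never silently turn a route into a fire-and-forget one.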
157
crates/manager-core/src/trigger_config.rs
Normal file
@@ -0,0 +1,157 @@
|
||||
//! Trigger-framework tunables. Defaults match design notes §3 (retry
//! policy) and §4 (retention). Each knob can be overridden via a
//! `PICLOUD_*` environment variable; a value that fails to parse is
//! ignored with a `tracing::warn`, the same pattern
//! `SandboxCeiling::from_env` uses.
|
||||
|
||||
use std::env;
|
||||
|
||||
use serde::{Deserialize, Serialize};
|
||||
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
|
||||
#[serde(rename_all = "lowercase")]
|
||||
pub enum BackoffShape {
|
||||
Exponential,
|
||||
Linear,
|
||||
Constant,
|
||||
}
|
||||
|
||||
impl BackoffShape {
|
||||
#[must_use]
|
||||
pub const fn as_str(self) -> &'static str {
|
||||
match self {
|
||||
Self::Exponential => "exponential",
|
||||
Self::Linear => "linear",
|
||||
Self::Constant => "constant",
|
||||
}
|
||||
}
|
||||
|
||||
#[must_use]
|
||||
pub fn from_wire(s: &str) -> Option<Self> {
|
||||
match s {
|
||||
"exponential" => Some(Self::Exponential),
|
||||
"linear" => Some(Self::Linear),
|
||||
"constant" => Some(Self::Constant),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy)]
|
||||
pub struct TriggerConfig {
|
||||
/// Maximum `cx.trigger_depth` before the dispatcher refuses
/// execution. Above this, the row is dropped and a metric is bumped;
/// it is NOT dead-lettered (design notes §4: depth-exceeded
/// means "you built a loop"). Default 8.
|
||||
pub max_trigger_depth: u32,
|
||||
|
||||
/// Default retry attempts (per-trigger override on the row).
|
||||
pub retry_max_attempts: u32,
|
||||
pub retry_backoff: BackoffShape,
|
||||
pub retry_base_ms: u32,
|
||||
/// ±jitter as a percentage of the computed delay. Applied at
|
||||
/// dispatch time — not per-trigger.
|
||||
pub retry_jitter_pct: u32,
|
||||
|
||||
/// Dead-letter retention before GC, in days. Default 30.
|
||||
pub dead_letter_retention_days: u32,
|
||||
/// Abandoned-execution retention before GC, in days. Default 7.
|
||||
pub abandoned_retention_days: u32,
|
||||
}
|
||||
|
||||
impl TriggerConfig {
|
||||
#[must_use]
|
||||
pub const fn conservative() -> Self {
|
||||
Self {
|
||||
max_trigger_depth: 8,
|
||||
retry_max_attempts: 3,
|
||||
retry_backoff: BackoffShape::Exponential,
|
||||
retry_base_ms: 1000,
|
||||
retry_jitter_pct: 20,
|
||||
dead_letter_retention_days: 30,
|
||||
abandoned_retention_days: 7,
|
||||
}
|
||||
}
|
||||
|
||||
#[must_use]
|
||||
pub fn from_env() -> Self {
|
||||
let mut c = Self::conservative();
|
||||
load_u32(&mut c.max_trigger_depth, "PICLOUD_MAX_TRIGGER_DEPTH");
|
||||
load_u32(
|
||||
&mut c.retry_max_attempts,
|
||||
"PICLOUD_TRIGGER_RETRY_MAX_ATTEMPTS",
|
||||
);
|
||||
load_backoff(&mut c.retry_backoff, "PICLOUD_TRIGGER_RETRY_BACKOFF");
|
||||
load_u32(&mut c.retry_base_ms, "PICLOUD_TRIGGER_RETRY_BASE_MS");
|
||||
load_u32(&mut c.retry_jitter_pct, "PICLOUD_TRIGGER_RETRY_JITTER_PCT");
|
||||
load_u32(
|
||||
&mut c.dead_letter_retention_days,
|
||||
"PICLOUD_DEAD_LETTER_RETENTION_DAYS",
|
||||
);
|
||||
load_u32(
|
||||
&mut c.abandoned_retention_days,
|
||||
"PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS",
|
||||
);
|
||||
c
|
||||
}
|
||||
}
|
||||
|
||||
impl Default for TriggerConfig {
|
||||
fn default() -> Self {
|
||||
Self::conservative()
|
||||
}
|
||||
}
|
||||
|
||||
fn load_u32(dst: &mut u32, key: &str) {
|
||||
if let Ok(v) = env::var(key) {
|
||||
match v.parse::<u32>() {
|
||||
Ok(n) => *dst = n,
|
||||
Err(e) => {
|
||||
tracing::warn!(env = key, error = %e, "ignoring invalid trigger-config value");
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn load_backoff(dst: &mut BackoffShape, key: &str) {
|
||||
if let Ok(v) = env::var(key) {
|
||||
match BackoffShape::from_wire(&v) {
|
||||
Some(b) => *dst = b,
|
||||
None => {
|
||||
tracing::warn!(
|
||||
env = key,
|
||||
value = %v,
|
||||
"ignoring invalid trigger-config backoff shape (use exponential|linear|constant)"
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn conservative_defaults_match_design_notes() {
|
||||
let c = TriggerConfig::conservative();
|
||||
assert_eq!(c.max_trigger_depth, 8);
|
||||
assert_eq!(c.retry_max_attempts, 3);
|
||||
assert_eq!(c.retry_backoff, BackoffShape::Exponential);
|
||||
assert_eq!(c.retry_base_ms, 1000);
|
||||
assert_eq!(c.retry_jitter_pct, 20);
|
||||
assert_eq!(c.dead_letter_retention_days, 30);
|
||||
assert_eq!(c.abandoned_retention_days, 7);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn backoff_round_trips() {
|
||||
for shape in [
|
||||
BackoffShape::Exponential,
|
||||
BackoffShape::Linear,
|
||||
BackoffShape::Constant,
|
||||
] {
|
||||
assert_eq!(BackoffShape::from_wire(shape.as_str()), Some(shape));
|
||||
}
|
||||
assert_eq!(BackoffShape::from_wire("garbage"), None);
|
||||
}
|
||||
}
|
||||
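`TriggerConfig` names a backoff shape, a base delay, and a ±jitter percentage, but the delay computation itself lives in the dispatcher and is not part of this file. The sketch below shows one plausible way the three knobs could combine. The formulas are assumptions (doubling per attempt for `Exponential`, `base * attempt` for `Linear`, 1-based attempt numbers), and `base_delay_ms` and `jitter_bounds_ms` are hypothetical helper names, not functions from the codebase.

```rust
/// Local copy of the `BackoffShape` enum so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackoffShape {
    Exponential,
    Linear,
    Constant,
}

/// Delay before jitter for a 1-based attempt number (assumed formulas).
pub fn base_delay_ms(shape: BackoffShape, base_ms: u32, attempt: u32) -> u64 {
    let base = u64::from(base_ms);
    let n = attempt.max(1);
    match shape {
        BackoffShape::Exponential => {
            // Cap the exponent so the shift itself can never overflow.
            let exp = (n - 1).min(32);
            base.saturating_mul(1u64 << exp)
        }
        BackoffShape::Linear => base.saturating_mul(u64::from(n)),
        BackoffShape::Constant => base,
    }
}

/// Inclusive (min, max) window a delay may land in after applying
/// ±`jitter_pct` percent. A caller would pick a random value in this range.
pub fn jitter_bounds_ms(delay_ms: u64, jitter_pct: u32) -> (u64, u64) {
    let pct = u64::from(jitter_pct.min(100));
    let spread = delay_ms.saturating_mul(pct) / 100;
    (delay_ms.saturating_sub(spread), delay_ms.saturating_add(spread))
}
```

Under these assumptions, the conservative defaults (exponential, 1000 ms base, 20% jitter) would give a third-attempt delay somewhere between 3200 ms and 4800 ms.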
617
crates/manager-core/src/trigger_repo.rs
Normal file
@@ -0,0 +1,617 @@
|
||||
//! `TriggerRepo` — CRUD over the `triggers` parent + per-kind detail
|
||||
//! tables. The admin endpoints (commit 4) sit on top of this; the
|
||||
//! dispatcher (commit 5) reads `list_matching_*` to fan out events to
|
||||
//! handler scripts.
|
||||
|
||||
use async_trait::async_trait;
|
||||
use chrono::{DateTime, Utc};
|
||||
use picloud_shared::{AdminUserId, AppId, KvEventOp, ScriptId, TriggerId};
|
||||
use serde::{Deserialize, Serialize};
|
||||
use sqlx::PgPool;
|
||||
use uuid::Uuid;
|
||||
|
||||
use crate::trigger_config::BackoffShape;
|
||||
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub enum TriggerRepoError {
|
||||
#[error("database error: {0}")]
|
||||
Db(#[from] sqlx::Error),
|
||||
|
||||
#[error("trigger not found: {0}")]
|
||||
NotFound(TriggerId),
|
||||
|
||||
#[error("invalid trigger payload: {0}")]
|
||||
Invalid(String),
|
||||
}
|
||||
|
||||
/// Parent-table row plus the per-kind detail merged in. Serialized
|
||||
/// back to admin clients via the JSON API.
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct Trigger {
|
||||
pub id: TriggerId,
|
||||
pub app_id: AppId,
|
||||
pub script_id: ScriptId,
|
||||
pub kind: TriggerKind,
|
||||
pub enabled: bool,
|
||||
pub dispatch_mode: TriggerDispatchMode,
|
||||
pub retry_max_attempts: u32,
|
||||
pub retry_backoff: BackoffShape,
|
||||
pub retry_base_ms: u32,
|
||||
pub registered_by_principal: AdminUserId,
|
||||
pub created_at: DateTime<Utc>,
|
||||
pub updated_at: DateTime<Utc>,
|
||||
pub details: TriggerDetails,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
|
||||
#[serde(rename_all = "snake_case")]
|
||||
pub enum TriggerKind {
|
||||
Kv,
|
||||
DeadLetter,
|
||||
}
|
||||
|
||||
impl TriggerKind {
|
||||
#[must_use]
|
||||
pub const fn as_str(self) -> &'static str {
|
||||
match self {
|
||||
Self::Kv => "kv",
|
||||
Self::DeadLetter => "dead_letter",
|
||||
}
|
||||
}
|
||||
|
||||
#[must_use]
|
||||
pub fn from_wire(s: &str) -> Option<Self> {
|
||||
match s {
|
||||
"kv" => Some(Self::Kv),
|
||||
"dead_letter" => Some(Self::DeadLetter),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
|
||||
#[serde(rename_all = "snake_case")]
|
||||
pub enum TriggerDispatchMode {
|
||||
Sync,
|
||||
Async,
|
||||
}
|
||||
|
||||
impl TriggerDispatchMode {
|
||||
#[must_use]
|
||||
pub const fn as_str(self) -> &'static str {
|
||||
match self {
|
||||
Self::Sync => "sync",
|
||||
Self::Async => "async",
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
#[serde(tag = "kind", rename_all = "snake_case")]
|
||||
pub enum TriggerDetails {
|
||||
Kv {
|
||||
collection_glob: String,
|
||||
ops: Vec<KvEventOp>,
|
||||
},
|
||||
DeadLetter {
|
||||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||||
source_filter: Option<String>,
|
||||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||||
trigger_id_filter: Option<TriggerId>,
|
||||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||||
script_id_filter: Option<ScriptId>,
|
||||
},
|
||||
}
|
||||
|
||||
/// Create payload for a KV trigger. Defaults are applied at the admin
/// layer, which uses `TriggerConfig::from_env` to fill in any retry
/// settings the request omitted; this keeps the row auditable.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct CreateKvTrigger {
|
||||
pub script_id: ScriptId,
|
||||
pub collection_glob: String,
|
||||
pub ops: Vec<KvEventOp>,
|
||||
pub dispatch_mode: TriggerDispatchMode,
|
||||
pub retry_max_attempts: u32,
|
||||
pub retry_backoff: BackoffShape,
|
||||
pub retry_base_ms: u32,
|
||||
pub registered_by_principal: AdminUserId,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct CreateDeadLetterTrigger {
|
||||
pub script_id: ScriptId,
|
||||
pub source_filter: Option<String>,
|
||||
pub trigger_id_filter: Option<TriggerId>,
|
||||
pub script_id_filter: Option<ScriptId>,
|
||||
pub registered_by_principal: AdminUserId,
|
||||
}
|
||||
|
||||
/// One match for the dispatcher's "which KV triggers fire on this
|
||||
/// event" lookup. Carries everything the dispatcher needs to construct
|
||||
/// the outbox row.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct KvTriggerMatch {
|
||||
pub trigger_id: TriggerId,
|
||||
pub script_id: ScriptId,
|
||||
pub dispatch_mode: TriggerDispatchMode,
|
||||
pub retry_max_attempts: u32,
|
||||
pub retry_backoff: BackoffShape,
|
||||
pub retry_base_ms: u32,
|
||||
pub registered_by_principal: AdminUserId,
|
||||
}
|
||||
|
||||
/// One match for the dispatcher's "which dead-letter triggers fire
|
||||
/// on this dead-letter row" lookup.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct DeadLetterTriggerMatch {
|
||||
pub trigger_id: TriggerId,
|
||||
pub script_id: ScriptId,
|
||||
pub dispatch_mode: TriggerDispatchMode,
|
||||
pub registered_by_principal: AdminUserId,
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
pub trait TriggerRepo: Send + Sync {
|
||||
async fn create_kv_trigger(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
req: CreateKvTrigger,
|
||||
) -> Result<Trigger, TriggerRepoError>;
|
||||
|
||||
async fn create_dead_letter_trigger(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
req: CreateDeadLetterTrigger,
|
||||
) -> Result<Trigger, TriggerRepoError>;
|
||||
|
||||
async fn list_for_app(&self, app_id: AppId) -> Result<Vec<Trigger>, TriggerRepoError>;
|
||||
|
||||
async fn get(&self, id: TriggerId) -> Result<Option<Trigger>, TriggerRepoError>;
|
||||
|
||||
async fn delete(&self, id: TriggerId) -> Result<bool, TriggerRepoError>;
|
||||
|
||||
/// Dispatcher hot path: find every enabled KV trigger in `app_id`
/// whose `collection_glob` matches `collection` and whose `ops`
/// covers `op` (an empty `ops` list matches every op). Glob matching
/// is done in Rust: the column is plain TEXT and `collection_matches`
/// handles `*`, trailing-`*` prefix globs, and exact names.
|
||||
async fn list_matching_kv(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
collection: &str,
|
||||
op: KvEventOp,
|
||||
) -> Result<Vec<KvTriggerMatch>, TriggerRepoError>;
|
||||
|
||||
/// Dispatcher hot path for dead-letter fan-out. Filters: source,
/// originating trigger_id, and originating script_id. A NULL filter
/// column matches anything; a non-NULL filter must equal the
/// corresponding argument ("match OR is_null").
|
||||
async fn list_matching_dead_letter(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
source: &str,
|
||||
trigger_id: Option<TriggerId>,
|
||||
script_id: Option<ScriptId>,
|
||||
) -> Result<Vec<DeadLetterTriggerMatch>, TriggerRepoError>;
|
||||
}
|
||||
|
||||
// ----------------------------------------------------------------------------
|
||||
// Postgres impl
|
||||
// ----------------------------------------------------------------------------
|
||||
|
||||
pub struct PostgresTriggerRepo {
|
||||
pool: PgPool,
|
||||
}
|
||||
|
||||
impl PostgresTriggerRepo {
|
||||
#[must_use]
|
||||
pub fn new(pool: PgPool) -> Self {
|
||||
Self { pool }
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl TriggerRepo for PostgresTriggerRepo {
|
||||
async fn create_kv_trigger(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
req: CreateKvTrigger,
|
||||
) -> Result<Trigger, TriggerRepoError> {
|
||||
if req.collection_glob.is_empty() {
|
||||
return Err(TriggerRepoError::Invalid(
|
||||
"collection_glob must not be empty".into(),
|
||||
));
|
||||
}
|
||||
let mut tx = self.pool.begin().await?;
|
||||
let parent: TriggerRow = sqlx::query_as(
|
||||
"INSERT INTO triggers ( \
|
||||
app_id, script_id, kind, enabled, dispatch_mode, \
|
||||
retry_max_attempts, retry_backoff, retry_base_ms, \
|
||||
registered_by_principal \
|
||||
) VALUES ($1, $2, 'kv', TRUE, $3, $4, $5, $6, $7) \
|
||||
RETURNING id, app_id, script_id, kind, enabled, dispatch_mode, \
|
||||
retry_max_attempts, retry_backoff, retry_base_ms, \
|
||||
registered_by_principal, created_at, updated_at",
|
||||
)
|
||||
.bind(app_id.into_inner())
|
||||
.bind(req.script_id.into_inner())
|
||||
.bind(req.dispatch_mode.as_str())
|
||||
.bind(i32::try_from(req.retry_max_attempts).unwrap_or(3))
|
||||
.bind(req.retry_backoff.as_str())
|
||||
.bind(i32::try_from(req.retry_base_ms).unwrap_or(1000))
|
||||
.bind(req.registered_by_principal.into_inner())
|
||||
.fetch_one(&mut *tx)
|
||||
.await?;
|
||||
|
||||
let ops_str: Vec<String> = req.ops.iter().map(|o| o.as_str().to_string()).collect();
|
||||
sqlx::query(
|
||||
"INSERT INTO kv_trigger_details (trigger_id, collection_glob, ops) \
|
||||
VALUES ($1, $2, $3)",
|
||||
)
|
||||
.bind(parent.id)
|
||||
.bind(&req.collection_glob)
|
||||
.bind(&ops_str)
|
||||
.execute(&mut *tx)
|
||||
.await?;
|
||||
|
||||
tx.commit().await?;
|
||||
|
||||
Ok(Trigger {
|
||||
id: parent.id.into(),
|
||||
app_id: parent.app_id.into(),
|
||||
script_id: parent.script_id.into(),
|
||||
kind: TriggerKind::Kv,
|
||||
enabled: parent.enabled,
|
||||
dispatch_mode: dispatch_from_str(&parent.dispatch_mode),
|
||||
retry_max_attempts: u32::try_from(parent.retry_max_attempts).unwrap_or(3),
|
||||
retry_backoff: BackoffShape::from_wire(&parent.retry_backoff)
|
||||
.unwrap_or(BackoffShape::Exponential),
|
||||
retry_base_ms: u32::try_from(parent.retry_base_ms).unwrap_or(1000),
|
||||
registered_by_principal: parent.registered_by_principal.into(),
|
||||
created_at: parent.created_at,
|
||||
updated_at: parent.updated_at,
|
||||
details: TriggerDetails::Kv {
|
||||
collection_glob: req.collection_glob,
|
||||
ops: req.ops,
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
async fn create_dead_letter_trigger(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
req: CreateDeadLetterTrigger,
|
||||
) -> Result<Trigger, TriggerRepoError> {
|
||||
let mut tx = self.pool.begin().await?;
|
||||
// Dead-letter triggers force max_attempts=1 (design notes §4
|
||||
// recursion-stop). Backoff/base_ms irrelevant but the columns
|
||||
// are NOT NULL — store sensible values.
|
||||
let parent: TriggerRow = sqlx::query_as(
|
||||
"INSERT INTO triggers ( \
|
||||
app_id, script_id, kind, enabled, dispatch_mode, \
|
||||
retry_max_attempts, retry_backoff, retry_base_ms, \
|
||||
registered_by_principal \
|
||||
) VALUES ($1, $2, 'dead_letter', TRUE, 'async', 1, 'constant', 0, $3) \
|
||||
RETURNING id, app_id, script_id, kind, enabled, dispatch_mode, \
|
||||
retry_max_attempts, retry_backoff, retry_base_ms, \
|
||||
registered_by_principal, created_at, updated_at",
|
||||
)
|
||||
.bind(app_id.into_inner())
|
||||
.bind(req.script_id.into_inner())
|
||||
.bind(req.registered_by_principal.into_inner())
|
||||
.fetch_one(&mut *tx)
|
||||
.await?;
|
||||
|
||||
sqlx::query(
|
||||
"INSERT INTO dead_letter_trigger_details \
|
||||
(trigger_id, source_filter, trigger_id_filter, script_id_filter) \
|
||||
VALUES ($1, $2, $3, $4)",
|
||||
)
|
||||
.bind(parent.id)
|
||||
.bind(req.source_filter.as_deref())
|
||||
.bind(req.trigger_id_filter.map(TriggerId::into_inner))
|
||||
.bind(req.script_id_filter.map(ScriptId::into_inner))
|
||||
.execute(&mut *tx)
|
||||
.await?;
|
||||
|
||||
tx.commit().await?;
|
||||
|
||||
Ok(Trigger {
|
||||
id: parent.id.into(),
|
||||
app_id: parent.app_id.into(),
|
||||
script_id: parent.script_id.into(),
|
||||
kind: TriggerKind::DeadLetter,
|
||||
enabled: parent.enabled,
|
||||
dispatch_mode: dispatch_from_str(&parent.dispatch_mode),
|
||||
retry_max_attempts: u32::try_from(parent.retry_max_attempts).unwrap_or(1),
|
||||
retry_backoff: BackoffShape::from_wire(&parent.retry_backoff)
|
||||
.unwrap_or(BackoffShape::Constant),
|
||||
retry_base_ms: u32::try_from(parent.retry_base_ms).unwrap_or(0),
|
||||
registered_by_principal: parent.registered_by_principal.into(),
|
||||
created_at: parent.created_at,
|
||||
updated_at: parent.updated_at,
|
||||
details: TriggerDetails::DeadLetter {
|
||||
source_filter: req.source_filter,
|
||||
trigger_id_filter: req.trigger_id_filter,
|
||||
script_id_filter: req.script_id_filter,
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
async fn list_for_app(&self, app_id: AppId) -> Result<Vec<Trigger>, TriggerRepoError> {
|
||||
let parents: Vec<TriggerRow> = sqlx::query_as(
|
||||
"SELECT id, app_id, script_id, kind, enabled, dispatch_mode, \
|
||||
retry_max_attempts, retry_backoff, retry_base_ms, \
|
||||
registered_by_principal, created_at, updated_at \
|
||||
FROM triggers WHERE app_id = $1 ORDER BY created_at DESC",
|
||||
)
|
||||
.bind(app_id.into_inner())
|
||||
.fetch_all(&self.pool)
|
||||
.await?;
|
||||
|
||||
let mut out = Vec::with_capacity(parents.len());
|
||||
for p in parents {
|
||||
out.push(hydrate_one(&self.pool, p).await?);
|
||||
}
|
||||
Ok(out)
|
||||
}
|
||||
|
||||
async fn get(&self, id: TriggerId) -> Result<Option<Trigger>, TriggerRepoError> {
|
||||
let parent: Option<TriggerRow> = sqlx::query_as(
|
||||
"SELECT id, app_id, script_id, kind, enabled, dispatch_mode, \
|
||||
retry_max_attempts, retry_backoff, retry_base_ms, \
|
||||
registered_by_principal, created_at, updated_at \
|
||||
FROM triggers WHERE id = $1",
|
||||
)
|
||||
.bind(id.into_inner())
|
||||
.fetch_optional(&self.pool)
|
||||
.await?;
|
||||
match parent {
|
||||
Some(p) => Ok(Some(hydrate_one(&self.pool, p).await?)),
|
||||
None => Ok(None),
|
||||
}
|
||||
}
|
||||
|
||||
async fn delete(&self, id: TriggerId) -> Result<bool, TriggerRepoError> {
|
||||
// ON DELETE CASCADE on the detail tables takes care of them.
|
||||
let res = sqlx::query("DELETE FROM triggers WHERE id = $1")
|
||||
.bind(id.into_inner())
|
||||
.execute(&self.pool)
|
||||
.await?;
|
||||
Ok(res.rows_affected() > 0)
|
||||
}
|
||||
|
||||
async fn list_matching_kv(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
collection: &str,
|
||||
op: KvEventOp,
|
||||
) -> Result<Vec<KvTriggerMatch>, TriggerRepoError> {
|
||||
// Fetch all enabled KV triggers for the app — glob matching
|
||||
// happens in Rust so we don't have to teach the query about
|
||||
// `*` and `prefix:*`. Sets are tiny in practice (one app's
|
||||
// worth of triggers, usually a handful).
|
||||
let rows: Vec<KvMatchRow> = sqlx::query_as(
|
||||
"SELECT t.id, t.script_id, t.dispatch_mode, \
|
||||
t.retry_max_attempts, t.retry_backoff, t.retry_base_ms, \
|
||||
t.registered_by_principal, \
|
||||
d.collection_glob, d.ops \
|
||||
FROM triggers t \
|
||||
JOIN kv_trigger_details d ON d.trigger_id = t.id \
|
||||
WHERE t.app_id = $1 AND t.kind = 'kv' AND t.enabled = TRUE",
|
||||
)
|
||||
.bind(app_id.into_inner())
|
||||
.fetch_all(&self.pool)
|
||||
.await?;
|
||||
|
||||
let op_str = op.as_str();
|
||||
let mut out = Vec::new();
|
||||
for r in rows {
|
||||
if !collection_matches(&r.collection_glob, collection) {
|
||||
continue;
|
||||
}
|
||||
let any_op = r.ops.is_empty();
|
||||
if !any_op && !r.ops.iter().any(|o| o == op_str) {
|
||||
continue;
|
||||
}
|
||||
out.push(KvTriggerMatch {
|
||||
trigger_id: r.id.into(),
|
||||
script_id: r.script_id.into(),
|
||||
dispatch_mode: dispatch_from_str(&r.dispatch_mode),
|
||||
retry_max_attempts: u32::try_from(r.retry_max_attempts).unwrap_or(3),
|
||||
retry_backoff: BackoffShape::from_wire(&r.retry_backoff)
|
||||
.unwrap_or(BackoffShape::Exponential),
|
||||
retry_base_ms: u32::try_from(r.retry_base_ms).unwrap_or(1000),
|
||||
registered_by_principal: r.registered_by_principal.into(),
|
||||
});
|
||||
}
|
||||
Ok(out)
|
||||
}
|
||||
|
||||
async fn list_matching_dead_letter(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
source: &str,
|
||||
trigger_id: Option<TriggerId>,
|
||||
script_id: Option<ScriptId>,
|
||||
) -> Result<Vec<DeadLetterTriggerMatch>, TriggerRepoError> {
|
||||
let rows: Vec<DlMatchRow> = sqlx::query_as(
|
||||
"SELECT t.id, t.script_id, t.dispatch_mode, t.registered_by_principal, \
|
||||
d.source_filter, d.trigger_id_filter, d.script_id_filter \
|
||||
FROM triggers t \
|
||||
JOIN dead_letter_trigger_details d ON d.trigger_id = t.id \
|
||||
WHERE t.app_id = $1 AND t.kind = 'dead_letter' AND t.enabled = TRUE \
|
||||
AND (d.source_filter IS NULL OR d.source_filter = $2) \
|
||||
AND (d.trigger_id_filter IS NULL OR d.trigger_id_filter = $3) \
|
||||
AND (d.script_id_filter IS NULL OR d.script_id_filter = $4)",
|
||||
)
|
||||
.bind(app_id.into_inner())
|
||||
.bind(source)
|
||||
.bind(trigger_id.map(TriggerId::into_inner))
|
||||
.bind(script_id.map(ScriptId::into_inner))
|
||||
.fetch_all(&self.pool)
|
||||
.await?;
|
||||
|
||||
Ok(rows
|
||||
.into_iter()
|
||||
.map(|r| DeadLetterTriggerMatch {
|
||||
trigger_id: r.id.into(),
|
||||
script_id: r.script_id.into(),
|
||||
dispatch_mode: dispatch_from_str(&r.dispatch_mode),
|
||||
registered_by_principal: r.registered_by_principal.into(),
|
||||
})
|
||||
.collect())
|
||||
}
|
||||
}
|
||||
|
||||
async fn hydrate_one(pool: &PgPool, parent: TriggerRow) -> Result<Trigger, TriggerRepoError> {
|
||||
let kind = TriggerKind::from_wire(&parent.kind).ok_or_else(|| {
|
||||
TriggerRepoError::Invalid(format!("unknown trigger kind {}", parent.kind))
|
||||
})?;
|
||||
|
||||
let details = match kind {
|
||||
TriggerKind::Kv => {
|
||||
let row: KvDetailRow = sqlx::query_as(
|
||||
"SELECT collection_glob, ops FROM kv_trigger_details WHERE trigger_id = $1",
|
||||
)
|
||||
.bind(parent.id)
|
||||
.fetch_one(pool)
|
||||
.await?;
|
||||
let ops = row
|
||||
.ops
|
||||
.iter()
|
||||
.filter_map(|s| KvEventOp::from_wire(s))
|
||||
.collect();
|
||||
TriggerDetails::Kv {
|
||||
collection_glob: row.collection_glob,
|
||||
ops,
|
||||
}
|
||||
}
|
||||
TriggerKind::DeadLetter => {
|
||||
let row: DlDetailRow = sqlx::query_as(
|
||||
"SELECT source_filter, trigger_id_filter, script_id_filter \
|
||||
FROM dead_letter_trigger_details WHERE trigger_id = $1",
|
||||
)
|
||||
.bind(parent.id)
|
||||
.fetch_one(pool)
|
||||
.await?;
|
||||
TriggerDetails::DeadLetter {
|
||||
source_filter: row.source_filter,
|
||||
trigger_id_filter: row.trigger_id_filter.map(Into::into),
|
||||
script_id_filter: row.script_id_filter.map(Into::into),
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
Ok(Trigger {
|
||||
id: parent.id.into(),
|
||||
app_id: parent.app_id.into(),
|
||||
script_id: parent.script_id.into(),
|
||||
kind,
|
||||
enabled: parent.enabled,
|
||||
dispatch_mode: dispatch_from_str(&parent.dispatch_mode),
|
||||
retry_max_attempts: u32::try_from(parent.retry_max_attempts).unwrap_or(3),
|
||||
retry_backoff: BackoffShape::from_wire(&parent.retry_backoff)
|
||||
.unwrap_or(BackoffShape::Exponential),
|
||||
retry_base_ms: u32::try_from(parent.retry_base_ms).unwrap_or(1000),
|
||||
registered_by_principal: parent.registered_by_principal.into(),
|
||||
created_at: parent.created_at,
|
||||
updated_at: parent.updated_at,
|
||||
details,
|
||||
})
|
||||
}
|
||||
|
||||
fn dispatch_from_str(s: &str) -> TriggerDispatchMode {
|
||||
match s {
|
||||
"sync" => TriggerDispatchMode::Sync,
|
||||
_ => TriggerDispatchMode::Async,
|
||||
}
|
||||
}
|
||||
|
||||
/// Match a `collection_glob` against an actual `collection` name.
|
||||
/// Supported forms (in priority order):
|
||||
/// - `"*"` → matches every collection
|
||||
/// - `"foo*"` → prefix match (anything starting with "foo")
|
||||
/// - `"foo"` → exact match
|
||||
#[must_use]
|
||||
pub fn collection_matches(glob: &str, collection: &str) -> bool {
|
||||
if glob == "*" {
|
||||
return true;
|
||||
}
|
||||
if let Some(prefix) = glob.strip_suffix('*') {
|
||||
return collection.starts_with(prefix);
|
||||
}
|
||||
glob == collection
|
||||
}
|
||||
|
||||
#[derive(sqlx::FromRow)]
|
||||
struct TriggerRow {
|
||||
id: Uuid,
|
||||
app_id: Uuid,
|
||||
script_id: Uuid,
|
||||
kind: String,
|
||||
enabled: bool,
|
||||
dispatch_mode: String,
|
||||
retry_max_attempts: i32,
|
||||
retry_backoff: String,
|
||||
retry_base_ms: i32,
|
||||
registered_by_principal: Uuid,
|
||||
created_at: DateTime<Utc>,
|
||||
updated_at: DateTime<Utc>,
|
||||
}
|
||||
|
||||
#[derive(sqlx::FromRow)]
|
||||
struct KvDetailRow {
|
||||
collection_glob: String,
|
||||
ops: Vec<String>,
|
||||
}
|
||||
|
||||
#[derive(sqlx::FromRow)]
|
||||
#[allow(clippy::struct_field_names)]
|
||||
struct DlDetailRow {
|
||||
source_filter: Option<String>,
|
||||
trigger_id_filter: Option<Uuid>,
|
||||
script_id_filter: Option<Uuid>,
|
||||
}
|
||||
|
||||
#[derive(sqlx::FromRow)]
|
||||
struct KvMatchRow {
|
||||
id: Uuid,
|
||||
script_id: Uuid,
|
||||
dispatch_mode: String,
|
||||
retry_max_attempts: i32,
|
||||
retry_backoff: String,
|
||||
retry_base_ms: i32,
|
||||
registered_by_principal: Uuid,
|
||||
collection_glob: String,
|
||||
ops: Vec<String>,
|
||||
}
|
||||
|
||||
#[derive(sqlx::FromRow)]
|
||||
struct DlMatchRow {
|
||||
id: Uuid,
|
||||
script_id: Uuid,
|
||||
dispatch_mode: String,
|
||||
registered_by_principal: Uuid,
|
||||
#[allow(dead_code)]
|
||||
source_filter: Option<String>,
|
||||
#[allow(dead_code)]
|
||||
trigger_id_filter: Option<Uuid>,
|
||||
#[allow(dead_code)]
|
||||
script_id_filter: Option<Uuid>,
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn collection_matcher_handles_star_prefix_exact() {
|
||||
assert!(collection_matches("*", "widgets"));
|
||||
assert!(collection_matches("*", ""));
|
||||
assert!(collection_matches("users:*", "users:1"));
|
||||
assert!(collection_matches("users:*", "users:"));
|
||||
assert!(!collection_matches("users:*", "orgs:1"));
|
||||
assert!(collection_matches("widgets", "widgets"));
|
||||
assert!(!collection_matches("widgets", "Widgets"));
|
||||
}
|
||||
}
|
||||
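The `collection_matches` helper at the end of this file supports only three glob forms. The sketch below is a standalone copy of that logic plus a small helper, `matching_globs`, which is a hypothetical name used for illustration and is not part of this change. It filters a list of globs against one collection name and makes one consequence of the implementation visible: a `*` anywhere other than the final position is not a wildcard and is compared literally.

```rust
/// Standalone copy of the matcher shown above.
/// - `"*"`    matches every collection
/// - `"foo*"` matches any collection starting with "foo"
/// - `"foo"`  matches only the collection named "foo"
pub fn collection_matches(glob: &str, collection: &str) -> bool {
    if glob == "*" {
        return true;
    }
    if let Some(prefix) = glob.strip_suffix('*') {
        return collection.starts_with(prefix);
    }
    glob == collection
}

/// Hypothetical helper (an assumption for this sketch, not in the change):
/// returns the globs from `globs` that match `collection`, in input order.
pub fn matching_globs<'a>(globs: &[&'a str], collection: &str) -> Vec<&'a str> {
    globs
        .iter()
        .copied()
        .filter(|g| collection_matches(g, collection))
        .collect()
}
```

Under these rules a glob such as `a*b` matches only a collection literally named `a*b`, so callers that need infix wildcards would have to extend the matcher.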
748
crates/manager-core/src/triggers_api.rs
Normal file
@@ -0,0 +1,748 @@
|
||||
//! `/api/v1/admin/apps/{id}/triggers/*` — trigger CRUD admin endpoints.
|
||||
//!
|
||||
//! Per design notes §2, two kinds ship in v1.1.1: `kv` (with
|
||||
//! collection_glob + ops) and `dead_letter` (with optional source /
|
||||
//! trigger_id / script_id filters). Separate endpoints per kind keep
|
||||
//! validation clean.
|
||||
//!
|
||||
//! Every endpoint is guarded by `Capability::AppManageTriggers(app_id)`
|
||||
//! evaluated after the resource lookup so the capability binds to the
|
||||
//! resource's actual `app_id` (mirrors `apps_api`).
|
||||
|
||||
use std::sync::Arc;
|
||||
|
||||
use axum::extract::{Path, State};
|
||||
use axum::http::StatusCode;
|
||||
use axum::response::{IntoResponse, Json, Response};
|
||||
use axum::routing::{delete, get, post};
|
||||
use axum::{Extension, Router};
|
||||
use picloud_shared::{AppId, KvEventOp, Principal, ScriptId, TriggerId};
|
||||
use serde::{Deserialize, Serialize};
|
||||
use serde_json::json;
|
||||
|
||||
use crate::app_repo::AppRepository;
|
||||
use crate::authz::{require, AuthzDenied, AuthzError, AuthzRepo, Capability};
|
||||
use crate::trigger_config::{BackoffShape, TriggerConfig};
|
||||
use crate::trigger_repo::{
|
||||
CreateDeadLetterTrigger, CreateKvTrigger, Trigger, TriggerDispatchMode, TriggerRepo,
|
||||
TriggerRepoError,
|
||||
};
|
||||
|
||||
#[derive(Clone)]
|
||||
pub struct TriggersState {
|
||||
pub triggers: Arc<dyn TriggerRepo>,
|
||||
pub apps: Arc<dyn AppRepository>,
|
||||
pub authz: Arc<dyn AuthzRepo>,
|
||||
/// Defaults applied to created triggers when the request omits
|
||||
/// retry settings. Kept on the state struct so tests can swap
|
||||
/// in a stricter / looser config without env tinkering.
|
||||
pub config: TriggerConfig,
|
||||
}
|
||||
|
||||
pub fn triggers_router(state: TriggersState) -> Router {
|
||||
Router::new()
|
||||
.route(
|
||||
"/apps/{app_id}/triggers",
|
||||
get(list_triggers).delete(noop_405),
|
||||
)
|
||||
.route("/apps/{app_id}/triggers/kv", post(create_kv_trigger))
|
||||
.route(
|
||||
"/apps/{app_id}/triggers/dead_letter",
|
||||
post(create_dl_trigger),
|
||||
)
|
||||
.route(
|
||||
"/apps/{app_id}/triggers/{trigger_id}",
|
||||
delete(delete_trigger),
|
||||
)
|
||||
.with_state(state)
|
||||
}
|
||||
|
||||
async fn noop_405() -> StatusCode {
|
||||
StatusCode::METHOD_NOT_ALLOWED
|
||||
}
|
||||
|
||||
// ----------------------------------------------------------------------------
|
||||
// DTOs
|
||||
// ----------------------------------------------------------------------------
|
||||
|
||||
#[derive(Debug, Deserialize)]
|
||||
pub struct CreateKvTriggerRequest {
|
||||
pub script_id: ScriptId,
|
||||
pub collection_glob: String,
|
||||
/// Subset of `{insert, update, delete}`. Empty array means "any
|
||||
/// op" (the trigger fires on every mutation in matching
|
||||
/// collections).
|
||||
#[serde(default)]
|
||||
pub ops: Vec<KvEventOp>,
|
||||
#[serde(default = "default_dispatch")]
|
||||
pub dispatch_mode: TriggerDispatchMode,
|
||||
/// Overrides for the platform retry defaults. Omitted fields fall
|
||||
/// back to `TriggerConfig` (env-overridable) at write time.
|
||||
#[serde(default)]
|
||||
pub retry_max_attempts: Option<u32>,
|
||||
#[serde(default)]
|
||||
pub retry_backoff: Option<BackoffShape>,
|
||||
#[serde(default)]
|
||||
pub retry_base_ms: Option<u32>,
|
||||
}
|
||||
|
||||
const fn default_dispatch() -> TriggerDispatchMode {
|
||||
TriggerDispatchMode::Async
|
||||
}
|
||||
|
||||
#[derive(Debug, Deserialize)]
|
||||
pub struct CreateDeadLetterTriggerRequest {
|
||||
pub script_id: ScriptId,
|
||||
#[serde(default)]
|
||||
pub source_filter: Option<String>,
|
||||
#[serde(default)]
|
||||
pub trigger_id_filter: Option<TriggerId>,
|
||||
#[serde(default)]
|
||||
pub script_id_filter: Option<ScriptId>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Serialize)]
|
||||
pub struct TriggerListResponse {
|
||||
pub triggers: Vec<Trigger>,
|
||||
}
|
||||
|
||||
// ----------------------------------------------------------------------------
|
||||
// Handlers
|
||||
// ----------------------------------------------------------------------------
|
||||
|
||||
async fn list_triggers(
|
||||
State(s): State<TriggersState>,
|
||||
Extension(principal): Extension<Principal>,
|
||||
Path(app_id): Path<AppId>,
|
||||
) -> Result<Json<TriggerListResponse>, TriggersApiError> {
|
||||
ensure_app_exists(&*s.apps, app_id).await?;
|
||||
require(
|
||||
s.authz.as_ref(),
|
||||
&principal,
|
||||
Capability::AppManageTriggers(app_id),
|
||||
)
|
||||
.await?;
|
||||
let triggers = s.triggers.list_for_app(app_id).await?;
|
||||
Ok(Json(TriggerListResponse { triggers }))
|
||||
}
|
||||
|
||||
async fn create_kv_trigger(
|
||||
State(s): State<TriggersState>,
|
||||
Extension(principal): Extension<Principal>,
|
||||
Path(app_id): Path<AppId>,
|
||||
Json(input): Json<CreateKvTriggerRequest>,
|
||||
) -> Result<(StatusCode, Json<Trigger>), TriggersApiError> {
|
||||
ensure_app_exists(&*s.apps, app_id).await?;
|
||||
require(
|
||||
s.authz.as_ref(),
|
||||
&principal,
|
||||
Capability::AppManageTriggers(app_id),
|
||||
)
|
||||
.await?;
|
||||
|
||||
if input.collection_glob.trim().is_empty() {
|
||||
return Err(TriggersApiError::Invalid(
|
||||
"collection_glob must not be empty".into(),
|
||||
));
|
||||
}
|
||||
|
||||
let req = CreateKvTrigger {
|
||||
script_id: input.script_id,
|
||||
collection_glob: input.collection_glob,
|
||||
ops: input.ops,
|
||||
dispatch_mode: input.dispatch_mode,
|
||||
retry_max_attempts: input
|
||||
.retry_max_attempts
|
||||
.unwrap_or(s.config.retry_max_attempts),
|
||||
retry_backoff: input.retry_backoff.unwrap_or(s.config.retry_backoff),
|
||||
retry_base_ms: input.retry_base_ms.unwrap_or(s.config.retry_base_ms),
|
||||
registered_by_principal: principal.user_id,
|
||||
};
|
||||
let created = s.triggers.create_kv_trigger(app_id, req).await?;
|
||||
Ok((StatusCode::CREATED, Json(created)))
|
||||
}
|
||||
|
||||
async fn create_dl_trigger(
|
||||
State(s): State<TriggersState>,
|
||||
Extension(principal): Extension<Principal>,
|
||||
Path(app_id): Path<AppId>,
|
||||
Json(input): Json<CreateDeadLetterTriggerRequest>,
|
||||
) -> Result<(StatusCode, Json<Trigger>), TriggersApiError> {
|
||||
ensure_app_exists(&*s.apps, app_id).await?;
|
||||
require(
|
||||
s.authz.as_ref(),
|
||||
&principal,
|
||||
Capability::AppManageTriggers(app_id),
|
||||
)
|
||||
.await?;
|
||||
let req = CreateDeadLetterTrigger {
|
||||
script_id: input.script_id,
|
||||
source_filter: input.source_filter,
|
||||
trigger_id_filter: input.trigger_id_filter,
|
||||
script_id_filter: input.script_id_filter,
|
||||
registered_by_principal: principal.user_id,
|
||||
};
|
||||
let created = s.triggers.create_dead_letter_trigger(app_id, req).await?;
|
||||
Ok((StatusCode::CREATED, Json(created)))
|
||||
}
|
||||
|
||||
async fn delete_trigger(
|
||||
State(s): State<TriggersState>,
|
||||
Extension(principal): Extension<Principal>,
|
||||
Path((app_id, trigger_id)): Path<(AppId, TriggerId)>,
|
||||
) -> Result<StatusCode, TriggersApiError> {
|
||||
ensure_app_exists(&*s.apps, app_id).await?;
|
||||
// Load the trigger so we can confirm it belongs to the right
|
||||
// app; this prevents a caller from deleting a trigger by id alone
|
||||
// when their capability is bound to a different app.
|
||||
let trigger = s
|
||||
.triggers
|
||||
.get(trigger_id)
|
||||
.await?
|
||||
.ok_or(TriggersApiError::NotFound(trigger_id))?;
|
||||
if trigger.app_id != app_id {
|
||||
return Err(TriggersApiError::NotFound(trigger_id));
|
||||
}
|
||||
require(
|
||||
s.authz.as_ref(),
|
||||
&principal,
|
||||
Capability::AppManageTriggers(app_id),
|
||||
)
|
||||
.await?;
|
||||
if !s.triggers.delete(trigger_id).await? {
|
||||
return Err(TriggersApiError::NotFound(trigger_id));
|
||||
}
|
||||
Ok(StatusCode::NO_CONTENT)
|
||||
}
|
||||
|
||||
async fn ensure_app_exists(
|
||||
apps: &dyn AppRepository,
|
||||
app_id: AppId,
|
||||
) -> Result<(), TriggersApiError> {
|
||||
apps.get_by_id(app_id)
|
||||
.await
|
||||
.map_err(|e| TriggersApiError::Backend(e.to_string()))?
|
||||
.ok_or_else(|| TriggersApiError::AppNotFound(app_id.to_string()))?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
// ----------------------------------------------------------------------------
|
||||
// Errors
|
||||
// ----------------------------------------------------------------------------
|
||||
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub enum TriggersApiError {
|
||||
#[error("app not found: {0}")]
|
||||
AppNotFound(String),
|
||||
|
||||
#[error("trigger not found: {0}")]
|
||||
NotFound(TriggerId),
|
||||
|
||||
#[error("invalid trigger: {0}")]
|
||||
Invalid(String),
|
||||
|
||||
#[error("forbidden")]
|
||||
Forbidden,
|
||||
|
||||
#[error("authorization repo error: {0}")]
|
||||
AuthzRepo(String),
|
||||
|
||||
#[error("trigger backend: {0}")]
|
||||
Backend(String),
|
||||
}
|
||||
|
||||
impl From<AuthzDenied> for TriggersApiError {
|
||||
fn from(d: AuthzDenied) -> Self {
|
||||
match d {
|
||||
AuthzDenied::Denied => Self::Forbidden,
|
||||
AuthzDenied::Repo(e) => Self::AuthzRepo(e.to_string()),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl From<AuthzError> for TriggersApiError {
|
||||
fn from(e: AuthzError) -> Self {
|
||||
Self::AuthzRepo(e.to_string())
|
||||
}
|
||||
}
|
||||
|
||||
impl From<TriggerRepoError> for TriggersApiError {
|
||||
fn from(e: TriggerRepoError) -> Self {
|
||||
match e {
|
||||
TriggerRepoError::NotFound(id) => Self::NotFound(id),
|
||||
TriggerRepoError::Invalid(s) => Self::Invalid(s),
|
||||
TriggerRepoError::Db(e) => Self::Backend(e.to_string()),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl IntoResponse for TriggersApiError {
|
||||
fn into_response(self) -> Response {
|
||||
let (status, body) = match &self {
|
||||
Self::AppNotFound(_) | Self::NotFound(_) => {
|
||||
(StatusCode::NOT_FOUND, json!({ "error": self.to_string() }))
|
||||
}
|
||||
Self::Invalid(_) => (
|
||||
StatusCode::UNPROCESSABLE_ENTITY,
|
||||
json!({ "error": self.to_string() }),
|
||||
),
|
||||
Self::Forbidden => (StatusCode::FORBIDDEN, json!({ "error": self.to_string() })),
|
||||
Self::AuthzRepo(e) => {
|
||||
tracing::error!(error = %e, "triggers authz repo error");
|
||||
(
|
||||
StatusCode::INTERNAL_SERVER_ERROR,
|
||||
json!({ "error": "internal error" }),
|
||||
)
|
||||
}
|
||||
Self::Backend(e) => {
|
||||
tracing::error!(error = %e, "triggers api backend error");
|
||||
(
|
||||
StatusCode::INTERNAL_SERVER_ERROR,
|
||||
json!({ "error": "internal error" }),
|
||||
)
|
||||
}
|
||||
};
|
||||
(status, Json(body)).into_response()
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
//! In-memory tests for the trigger admin path. The Axum routing
|
||||
//! / extractor surface is exercised by integration tests (which
|
||||
//! need a real Postgres for the trigger repo); these tests cover
|
||||
//! the handlers' invariant logic — capability enforcement, app
|
||||
//! validation, default fallback for retry settings.
|
||||
|
||||
use super::*;
|
||||
use crate::app_repo::{AppLookup, AppRepository};
|
||||
use crate::trigger_repo::{
|
||||
DeadLetterTriggerMatch, KvTriggerMatch, Trigger, TriggerDetails, TriggerRepo,
|
||||
TriggerRepoError,
|
||||
};
|
||||
use async_trait::async_trait;
|
||||
use chrono::Utc;
|
||||
use picloud_shared::{AdminUserId, App, AppRole, KvEventOp, ScriptId, TriggerId, UserId};
|
||||
use std::collections::HashMap;
|
||||
use tokio::sync::Mutex;
|
||||
|
||||
#[derive(Default)]
|
||||
struct InMemoryTriggerRepo {
|
||||
inner: Mutex<HashMap<TriggerId, Trigger>>,
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl TriggerRepo for InMemoryTriggerRepo {
|
||||
async fn create_kv_trigger(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
req: CreateKvTrigger,
|
||||
) -> Result<Trigger, TriggerRepoError> {
|
||||
let now = Utc::now();
|
||||
let id = TriggerId::new();
|
||||
let trigger = Trigger {
|
||||
id,
|
||||
app_id,
|
||||
script_id: req.script_id,
|
||||
kind: crate::trigger_repo::TriggerKind::Kv,
|
||||
enabled: true,
|
||||
dispatch_mode: req.dispatch_mode,
|
||||
retry_max_attempts: req.retry_max_attempts,
|
||||
retry_backoff: req.retry_backoff,
|
||||
retry_base_ms: req.retry_base_ms,
|
||||
registered_by_principal: req.registered_by_principal,
|
||||
created_at: now,
|
||||
updated_at: now,
|
||||
details: TriggerDetails::Kv {
|
||||
collection_glob: req.collection_glob,
|
||||
ops: req.ops,
|
||||
},
|
||||
};
|
||||
self.inner.lock().await.insert(id, trigger.clone());
|
||||
Ok(trigger)
|
||||
}
|
||||
async fn create_dead_letter_trigger(
|
||||
&self,
|
||||
app_id: AppId,
|
||||
req: CreateDeadLetterTrigger,
|
||||
) -> Result<Trigger, TriggerRepoError> {
|
||||
let now = Utc::now();
|
||||
let id = TriggerId::new();
|
||||
let trigger = Trigger {
|
||||
id,
|
||||
app_id,
|
||||
script_id: req.script_id,
|
||||
kind: crate::trigger_repo::TriggerKind::DeadLetter,
|
||||
enabled: true,
|
||||
dispatch_mode: TriggerDispatchMode::Async,
|
||||
retry_max_attempts: 1,
|
||||
retry_backoff: BackoffShape::Constant,
|
||||
retry_base_ms: 0,
|
||||
registered_by_principal: req.registered_by_principal,
|
||||
created_at: now,
|
||||
updated_at: now,
|
||||
details: TriggerDetails::DeadLetter {
|
||||
source_filter: req.source_filter,
|
||||
trigger_id_filter: req.trigger_id_filter,
|
||||
script_id_filter: req.script_id_filter,
|
||||
},
|
||||
};
|
||||
self.inner.lock().await.insert(id, trigger.clone());
|
||||
Ok(trigger)
|
||||
}
|
||||
async fn list_for_app(&self, app_id: AppId) -> Result<Vec<Trigger>, TriggerRepoError> {
|
||||
Ok(self
|
||||
.inner
|
||||
.lock()
|
||||
.await
|
||||
.values()
|
||||
.filter(|t| t.app_id == app_id)
|
||||
.cloned()
|
||||
.collect())
|
||||
}
|
||||
async fn get(&self, id: TriggerId) -> Result<Option<Trigger>, TriggerRepoError> {
|
||||
Ok(self.inner.lock().await.get(&id).cloned())
|
||||
}
|
||||
async fn delete(&self, id: TriggerId) -> Result<bool, TriggerRepoError> {
|
||||
Ok(self.inner.lock().await.remove(&id).is_some())
|
||||
}
|
||||
async fn list_matching_kv(
|
||||
&self,
|
||||
_app_id: AppId,
|
||||
_collection: &str,
|
||||
_op: KvEventOp,
|
||||
) -> Result<Vec<KvTriggerMatch>, TriggerRepoError> {
|
||||
Ok(vec![])
|
||||
}
|
||||
async fn list_matching_dead_letter(
|
||||
&self,
|
||||
_app_id: AppId,
|
||||
_source: &str,
|
||||
_trigger_id: Option<TriggerId>,
|
||||
_script_id: Option<ScriptId>,
|
||||
) -> Result<Vec<DeadLetterTriggerMatch>, TriggerRepoError> {
|
||||
Ok(vec![])
|
||||
}
|
||||
}
|
||||
|
||||
struct InMemoryAppRepo {
|
||||
existing: Mutex<HashMap<AppId, App>>,
|
||||
}
|
||||
|
||||
impl InMemoryAppRepo {
|
||||
fn with(app_id: AppId) -> Arc<Self> {
|
||||
let now = Utc::now();
|
||||
let mut existing = HashMap::new();
|
||||
existing.insert(
|
||||
app_id,
|
||||
App {
|
||||
id: app_id,
|
||||
slug: "test".into(),
|
||||
name: "test".into(),
|
||||
description: None,
|
||||
created_at: now,
|
||||
updated_at: now,
|
||||
},
|
||||
);
|
||||
Arc::new(Self {
|
||||
existing: Mutex::new(existing),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl AppRepository for InMemoryAppRepo {
|
||||
async fn create(
|
||||
&self,
|
||||
_slug: &str,
|
||||
_name: &str,
|
||||
_description: Option<&str>,
|
||||
) -> Result<App, crate::repo::ScriptRepositoryError> {
|
||||
unimplemented!()
|
||||
}
|
||||
async fn create_with_takeover(
|
||||
&self,
|
||||
_slug: &str,
|
||||
_name: &str,
|
||||
_description: Option<&str>,
|
||||
) -> Result<App, crate::repo::ScriptRepositoryError> {
|
||||
unimplemented!()
|
||||
}
|
||||
async fn slug_in_history(
|
||||
&self,
|
||||
_slug: &str,
|
||||
) -> Result<Option<App>, crate::repo::ScriptRepositoryError> {
|
||||
unimplemented!()
|
||||
}
|
||||
async fn list(&self) -> Result<Vec<App>, crate::repo::ScriptRepositoryError> {
|
||||
unimplemented!()
|
||||
}
|
||||
async fn list_for_user(
|
||||
&self,
|
||||
_user_id: AdminUserId,
|
||||
) -> Result<Vec<App>, crate::repo::ScriptRepositoryError> {
|
||||
unimplemented!()
|
||||
}
|
||||
async fn get_by_id(
|
||||
&self,
|
||||
id: AppId,
|
||||
) -> Result<Option<App>, crate::repo::ScriptRepositoryError> {
|
||||
Ok(self.existing.lock().await.get(&id).cloned())
|
||||
}
|
||||
async fn get_by_slug(
|
||||
&self,
|
||||
_slug: &str,
|
||||
) -> Result<Option<App>, crate::repo::ScriptRepositoryError> {
|
||||
unimplemented!()
|
||||
}
|
||||
async fn get_by_slug_or_history(
|
||||
&self,
|
||||
_slug: &str,
|
||||
) -> Result<Option<AppLookup>, crate::repo::ScriptRepositoryError> {
|
||||
unimplemented!()
|
||||
}
|
||||
async fn update(
|
||||
&self,
|
||||
_id: AppId,
|
||||
_name: Option<&str>,
|
||||
_description: Option<Option<&str>>,
|
||||
) -> Result<App, crate::repo::ScriptRepositoryError> {
|
||||
unimplemented!()
|
||||
}
|
||||
async fn rename_slug(
|
||||
&self,
|
||||
_id: AppId,
|
||||
_new_slug: &str,
|
||||
_take_over_history: bool,
|
||||
) -> Result<App, crate::repo::ScriptRepositoryError> {
|
||||
unimplemented!()
|
||||
}
|
||||
async fn delete(&self, _id: AppId) -> Result<(), crate::repo::ScriptRepositoryError> {
|
||||
unimplemented!()
|
||||
}
|
||||
async fn delete_cascade(
|
||||
&self,
|
||||
_id: AppId,
|
||||
) -> Result<(), crate::repo::ScriptRepositoryError> {
|
||||
unimplemented!()
|
||||
}
|
||||
async fn count_scripts_in_app(
|
||||
&self,
|
||||
_id: AppId,
|
||||
) -> Result<i64, crate::repo::ScriptRepositoryError> {
|
||||
unimplemented!()
|
||||
}
|
||||
}
|
||||
|
||||
struct AlwaysAllowAuthzRepo;
|
||||
#[async_trait]
|
||||
impl AuthzRepo for AlwaysAllowAuthzRepo {
|
||||
async fn membership(
|
||||
&self,
|
||||
_user_id: UserId,
|
||||
_app_id: AppId,
|
||||
) -> Result<Option<AppRole>, AuthzError> {
|
||||
Ok(Some(AppRole::AppAdmin))
|
||||
}
|
||||
}
|
||||
|
||||
struct AlwaysDenyAuthzRepo;
|
||||
#[async_trait]
|
||||
impl AuthzRepo for AlwaysDenyAuthzRepo {
|
||||
async fn membership(
|
||||
&self,
|
||||
_user_id: UserId,
|
||||
_app_id: AppId,
|
||||
) -> Result<Option<AppRole>, AuthzError> {
|
||||
Ok(None)
|
||||
}
|
||||
}
|
||||
|
||||
fn member_principal() -> Principal {
|
||||
Principal {
|
||||
user_id: AdminUserId::new(),
|
||||
instance_role: picloud_shared::InstanceRole::Member,
|
||||
scopes: None,
|
||||
app_binding: None,
|
||||
}
|
||||
}
|
||||
|
||||
fn state_with(authz: Arc<dyn AuthzRepo>, app_id: AppId) -> TriggersState {
|
||||
TriggersState {
|
||||
triggers: Arc::new(InMemoryTriggerRepo::default()),
|
||||
apps: InMemoryAppRepo::with(app_id),
|
||||
authz,
|
||||
config: TriggerConfig::conservative(),
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn unknown_app_returns_404() {
|
||||
let state = state_with(Arc::new(AlwaysAllowAuthzRepo), AppId::new());
|
||||
let res = create_kv_trigger(
|
||||
State(state),
|
||||
Extension(member_principal()),
|
||||
Path(AppId::new()), // a different (non-existent) app
|
||||
Json(CreateKvTriggerRequest {
|
||||
script_id: ScriptId::new(),
|
||||
collection_glob: "*".into(),
|
||||
ops: vec![],
|
||||
dispatch_mode: TriggerDispatchMode::Async,
|
||||
retry_max_attempts: None,
|
||||
retry_backoff: None,
|
||||
retry_base_ms: None,
|
||||
}),
|
||||
)
|
||||
.await;
|
||||
let err = res.expect_err("missing app should error");
|
||||
assert!(matches!(err, TriggersApiError::AppNotFound(_)));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn member_without_role_is_forbidden() {
|
||||
let app_id = AppId::new();
|
||||
let state = state_with(Arc::new(AlwaysDenyAuthzRepo), app_id);
|
||||
let res = create_kv_trigger(
|
||||
State(state),
|
||||
Extension(member_principal()),
|
||||
Path(app_id),
|
||||
Json(CreateKvTriggerRequest {
|
||||
script_id: ScriptId::new(),
|
||||
collection_glob: "*".into(),
|
||||
ops: vec![],
|
||||
dispatch_mode: TriggerDispatchMode::Async,
|
||||
retry_max_attempts: None,
|
||||
retry_backoff: None,
|
||||
retry_base_ms: None,
|
||||
}),
|
||||
)
|
||||
.await;
|
||||
let err = res.expect_err("member without role should be forbidden");
|
||||
assert!(matches!(err, TriggersApiError::Forbidden));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn kv_trigger_uses_env_defaults_when_omitted() {
|
||||
let app_id = AppId::new();
|
||||
let mut state = state_with(Arc::new(AlwaysAllowAuthzRepo), app_id);
|
||||
// Tweak the config so we can detect that defaults were used.
|
||||
state.config.retry_max_attempts = 7;
|
||||
state.config.retry_base_ms = 12_345;
|
||||
let (status, Json(trigger)) = create_kv_trigger(
|
||||
State(state),
|
||||
Extension(member_principal()),
|
||||
Path(app_id),
|
||||
Json(CreateKvTriggerRequest {
|
||||
script_id: ScriptId::new(),
|
||||
collection_glob: "widgets".into(),
|
||||
ops: vec![KvEventOp::Insert],
|
||||
dispatch_mode: TriggerDispatchMode::Async,
|
||||
retry_max_attempts: None,
|
||||
retry_backoff: None,
|
||||
retry_base_ms: None,
|
||||
}),
|
||||
)
|
||||
.await
|
||||
.unwrap();
|
||||
assert_eq!(status, StatusCode::CREATED);
|
||||
assert_eq!(trigger.retry_max_attempts, 7);
|
||||
assert_eq!(trigger.retry_base_ms, 12_345);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn empty_collection_glob_rejected() {
|
||||
let app_id = AppId::new();
|
||||
let state = state_with(Arc::new(AlwaysAllowAuthzRepo), app_id);
|
||||
let res = create_kv_trigger(
|
||||
State(state),
|
||||
Extension(member_principal()),
|
||||
Path(app_id),
|
||||
Json(CreateKvTriggerRequest {
|
||||
script_id: ScriptId::new(),
|
||||
collection_glob: " ".into(),
|
||||
ops: vec![],
|
||||
dispatch_mode: TriggerDispatchMode::Async,
|
||||
retry_max_attempts: None,
|
||||
retry_backoff: None,
|
||||
retry_base_ms: None,
|
||||
}),
|
||||
)
|
||||
.await;
|
||||
let err = res.expect_err("empty glob should reject");
|
||||
assert!(matches!(err, TriggersApiError::Invalid(_)));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn delete_rejects_cross_app_trigger_id() {
|
||||
let app_a = AppId::new();
|
||||
let app_b = AppId::new();
|
||||
let state = state_with(Arc::new(AlwaysAllowAuthzRepo), app_a);
|
||||
// Insert a trigger that belongs to app_a.
|
||||
let trigger = state
|
||||
.triggers
|
||||
.create_kv_trigger(
|
||||
app_a,
|
||||
CreateKvTrigger {
|
||||
script_id: ScriptId::new(),
|
||||
collection_glob: "*".into(),
|
||||
ops: vec![],
|
||||
dispatch_mode: TriggerDispatchMode::Async,
|
||||
retry_max_attempts: 3,
|
||||
retry_backoff: BackoffShape::Exponential,
|
||||
retry_base_ms: 1000,
|
||||
registered_by_principal: AdminUserId::new(),
|
||||
},
|
||||
)
|
||||
.await
|
||||
.unwrap();
|
||||
let _ = app_b;
|
||||
|
||||
// Attempt to delete via app_b's path — should 404.
|
||||
// First, give the in-memory app repo a record for app_b.
|
||||
// (Otherwise we'd 404 on app-existence before reaching the
|
||||
// cross-app check.)
|
||||
let state = TriggersState {
|
||||
apps: {
|
||||
let now = Utc::now();
|
||||
let mut existing = HashMap::new();
|
||||
existing.insert(
|
||||
app_a,
|
||||
App {
|
||||
id: app_a,
|
||||
slug: "a".into(),
|
||||
name: "a".into(),
|
||||
description: None,
|
||||
created_at: now,
|
||||
updated_at: now,
|
||||
},
|
||||
);
|
||||
existing.insert(
|
||||
app_b,
|
||||
App {
|
||||
id: app_b,
|
||||
slug: "b".into(),
|
||||
name: "b".into(),
|
||||
description: None,
|
||||
created_at: now,
|
||||
updated_at: now,
|
||||
},
|
||||
);
|
||||
Arc::new(InMemoryAppRepo {
|
||||
existing: Mutex::new(existing),
|
||||
})
|
||||
},
|
||||
..state
|
||||
};
|
||||
|
||||
let res = delete_trigger(
|
||||
State(state),
|
||||
Extension(member_principal()),
|
||||
Path((app_b, trigger.id)),
|
||||
)
|
||||
.await;
|
||||
let err = res.expect_err("cross-app delete should 404");
|
||||
assert!(matches!(err, TriggersApiError::NotFound(_)));
|
||||
}
|
||||
}
|
||||
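The `create_kv_trigger` handler in this file resolves omitted retry settings against `TriggerConfig` at write time. The sketch below isolates that fallback so it can be read on its own. The types are simplified stand-ins and are assumptions: `RetryOverrides`, `ResolvedRetry`, and `resolve_retry` are hypothetical names, and `BackoffShape` is reduced to the two variants that appear in this diff.

```rust
/// Simplified stand-in for the crate's `BackoffShape` (assumption: only the
/// two variants visible in this diff are modelled).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackoffShape {
    Constant,
    Exponential,
}

/// Simplified stand-in for the platform defaults held on `TriggersState`.
#[derive(Debug, Clone, Copy)]
pub struct TriggerConfig {
    pub retry_max_attempts: u32,
    pub retry_backoff: BackoffShape,
    pub retry_base_ms: u32,
}

/// Hypothetical view of the optional retry fields on the create request.
#[derive(Debug, Default)]
pub struct RetryOverrides {
    pub retry_max_attempts: Option<u32>,
    pub retry_backoff: Option<BackoffShape>,
    pub retry_base_ms: Option<u32>,
}

/// Hypothetical result type: the values that would be persisted.
#[derive(Debug, PartialEq, Eq)]
pub struct ResolvedRetry {
    pub retry_max_attempts: u32,
    pub retry_backoff: BackoffShape,
    pub retry_base_ms: u32,
}

/// Each field is resolved independently: a request value wins, otherwise the
/// configured default is used.
pub fn resolve_retry(overrides: &RetryOverrides, config: &TriggerConfig) -> ResolvedRetry {
    ResolvedRetry {
        retry_max_attempts: overrides
            .retry_max_attempts
            .unwrap_or(config.retry_max_attempts),
        retry_backoff: overrides.retry_backoff.unwrap_or(config.retry_backoff),
        retry_base_ms: overrides.retry_base_ms.unwrap_or(config.retry_base_ms),
    }
}
```

Because the fallback is applied per field, a request may override only `retry_max_attempts` and still inherit the configured backoff shape and base delay.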
@@ -17,13 +17,15 @@ use axum::{
|
||||
use chrono::Utc;
|
||||
use picloud_executor_core::{ExecError, ExecRequest, ExecResponse, InvocationType};
|
||||
use picloud_shared::{
|
||||
AppId, ExecutionId, ExecutionLog, ExecutionLogSink, ExecutionStatus, Principal, RequestId,
|
||||
ScriptId,
|
||||
AppId, DispatchMode, ExecutionId, ExecutionLog, ExecutionLogSink, ExecutionStatus,
|
||||
HttpDispatchPayload, InboxFailureKind, InboxResult, NewHttpOutbox, OutboxWriter, Principal,
|
||||
RequestId, ScriptId,
|
||||
};
|
||||
use serde_json::Value as Json_;
|
||||
use uuid::Uuid;
|
||||
|
||||
use crate::client::ExecutorClient;
|
||||
use crate::inbox::InboxRegistry;
|
||||
use crate::resolver::{ResolverError, ScriptResolver};
|
||||
use crate::routing::{AppDomainTable, RouteTable};
|
||||
|
||||
@@ -39,6 +41,14 @@ pub struct DataPlaneState<E, R> {
|
||||
/// Routing table for user-defined paths, partitioned per app.
|
||||
/// Shared with the manager (admin router writes; this side reads).
|
||||
pub routes: Arc<RouteTable>,
|
||||
/// NATS-style inbox registry (v1.1.1). Used by sync HTTP via
|
||||
/// outbox to await the dispatcher's delivery on a oneshot
|
||||
/// channel.
|
||||
pub inbox: Arc<InboxRegistry>,
|
||||
/// Writer for the universal trigger outbox (v1.1.1). The sync
|
||||
/// HTTP path inserts a row with `reply_to = inbox_id`; the async
|
||||
/// path inserts with `reply_to = None` and returns 202.
|
||||
pub outbox: Arc<dyn OutboxWriter>,
|
||||
}
|
||||
|
||||
impl<E, R> Clone for DataPlaneState<E, R> {
|
||||
@@ -49,6 +59,8 @@ impl<E, R> Clone for DataPlaneState<E, R> {
|
||||
log_sink: self.log_sink.clone(),
|
||||
app_domains: self.app_domains.clone(),
|
||||
routes: self.routes.clone(),
|
||||
inbox: self.inbox.clone(),
|
||||
outbox: self.outbox.clone(),
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -202,50 +214,312 @@ where
|
||||
Err(e) => return Err(ApiError::BadRequest(format!("body read failed: {e}"))),
|
||||
};
|
||||
|
||||
let mut req = build_exec_request(
|
||||
matched.matched.script_id,
|
||||
&script.name,
|
||||
&headers,
|
||||
&body_bytes,
|
||||
app_id,
|
||||
principal,
|
||||
)?;
|
||||
req.path = path;
|
||||
req.params = matched.params;
|
||||
req.query = parse_query_string(&query_str);
|
||||
req.rest = matched.rest.unwrap_or_default();
|
||||
req.sandbox_overrides = script.sandbox;
|
||||
let body_json: Json_ = if body_bytes.is_empty() {
|
||||
Json_::Null
|
||||
} else {
|
||||
serde_json::from_slice(&body_bytes)
|
||||
.map_err(|e| ApiError::BadRequest(format!("invalid JSON body: {e}")))?
|
||||
};
|
||||
let header_map: BTreeMap<String, String> = headers
|
||||
.iter()
|
||||
.filter_map(|(k, v)| {
|
||||
v.to_str()
|
||||
.ok()
|
||||
.map(|s| (k.as_str().to_string(), s.to_string()))
|
||||
})
|
||||
.collect();
|
||||
let query = parse_query_string(&query_str);
|
||||
let rest = matched.rest.clone().unwrap_or_default();
|
||||
|
||||
let request_id = req.request_id;
|
||||
let request_path = req.path.clone();
|
||||
let request_headers = req.headers.clone();
|
||||
let request_body = req.body.clone();
|
||||
match matched.matched.dispatch_mode {
|
||||
DispatchMode::Async => {
|
||||
handle_async_route(
|
||||
&state,
|
||||
app_id,
|
||||
matched.matched.route_id,
|
||||
matched.matched.script_id,
|
||||
&script.name,
|
||||
path,
|
||||
method,
|
||||
header_map,
|
||||
body_json,
|
||||
matched.params,
|
||||
query,
|
||||
rest,
|
||||
script.timeout_seconds,
|
||||
principal,
|
||||
)
|
||||
.await
|
||||
}
|
||||
DispatchMode::Sync => {
|
||||
handle_sync_route(
|
||||
&state,
|
||||
app_id,
|
||||
matched.matched.route_id,
|
||||
matched.matched.script_id,
|
||||
&script.name,
|
||||
path,
|
||||
method,
|
||||
header_map,
|
||||
body_json,
|
||||
matched.params,
|
||||
query,
|
||||
rest,
|
||||
script.timeout_seconds,
|
||||
principal,
|
||||
)
|
||||
.await
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
let timeout = Duration::from_secs(u64::from(script.timeout_seconds));
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
async fn handle_async_route<E, R>(
|
||||
state: &DataPlaneState<E, R>,
|
||||
app_id: AppId,
|
||||
route_id: Uuid,
|
||||
script_id: ScriptId,
|
||||
script_name: &str,
|
||||
path: String,
|
||||
method: String,
|
||||
headers: BTreeMap<String, String>,
|
||||
body: Json_,
|
||||
params: BTreeMap<String, String>,
|
||||
query: BTreeMap<String, String>,
|
||||
rest: String,
|
||||
timeout_seconds: u32,
|
||||
principal: Option<Principal>,
|
||||
) -> Result<Response, ApiError>
|
||||
where
|
||||
E: ExecutorClient + 'static,
|
||||
R: ScriptResolver + 'static,
|
||||
{
|
||||
let payload = HttpDispatchPayload {
|
||||
script_name: script_name.to_string(),
|
||||
path,
|
||||
method,
|
||||
headers,
|
||||
body,
|
||||
params,
|
||||
query,
|
||||
rest,
|
||||
timeout_seconds,
|
||||
};
|
||||
let payload_value = serde_json::to_value(&payload)
|
||||
.map_err(|e| ApiError::BadRequest(format!("payload serialize: {e}")))?;
|
||||
let execution_id = ExecutionId::new();
|
||||
state
|
||||
.outbox
|
||||
.enqueue_http(NewHttpOutbox {
|
||||
app_id,
|
||||
route_id,
|
||||
script_id,
|
||||
reply_to: None,
|
||||
payload: payload_value,
|
||||
origin_principal: principal.map(|p| p.user_id),
|
||||
trigger_depth: 0,
|
||||
root_execution_id: Some(execution_id),
|
||||
})
|
||||
.await
|
||||
.map_err(|e| ApiError::OutboxWrite(e.to_string()))?;
|
||||
Ok((
|
||||
StatusCode::ACCEPTED,
|
||||
Json(serde_json::json!({
|
||||
"accepted_at": Utc::now().to_rfc3339(),
|
||||
"execution_id": execution_id.to_string(),
|
||||
})),
|
||||
)
|
||||
.into_response())
|
||||
}
|
||||
|
||||
#[allow(clippy::too_many_arguments)]
async fn handle_sync_route<E, R>(
    state: &DataPlaneState<E, R>,
    app_id: AppId,
    route_id: Uuid,
    script_id: ScriptId,
    script_name: &str,
    path: String,
    method: String,
    headers: BTreeMap<String, String>,
    body: Json_,
    params: BTreeMap<String, String>,
    query: BTreeMap<String, String>,
    rest: String,
    timeout_seconds: u32,
    principal: Option<Principal>,
) -> Result<Response, ApiError>
where
    E: ExecutorClient + 'static,
    R: ScriptResolver + 'static,
{
    let payload = HttpDispatchPayload {
        script_name: script_name.to_string(),
        path: path.clone(),
        method,
        headers: headers.clone(),
        body: body.clone(),
        params,
        query,
        rest,
        timeout_seconds,
    };
    let payload_value = serde_json::to_value(&payload)
        .map_err(|e| ApiError::BadRequest(format!("payload serialize: {e}")))?;

    // Register the inbox before writing the outbox row so the
    // dispatcher can't race-deliver before the orchestrator is
    // listening.
    let (inbox_id, rx) = state.inbox.register();

    let execution_id = ExecutionId::new();
    let outbox_id = state
        .outbox
        .enqueue_http(NewHttpOutbox {
            app_id,
            route_id,
            script_id,
            reply_to: Some(inbox_id),
            payload: payload_value,
            origin_principal: principal.map(|p| p.user_id),
            trigger_depth: 0,
            root_execution_id: Some(execution_id),
        })
        .await
        .map_err(|e| {
            // Failed outbox write — abandon the inbox so the dispatcher
            // can never deliver to a stale entry.
            state.inbox.cancel(inbox_id);
            ApiError::OutboxWrite(e.to_string())
        })?;

    // Wait for the dispatcher's delivery. Outer timeout = script
    // wall-clock + a small buffer to cover dispatcher latency.
    let wait_budget = Duration::from_secs(u64::from(timeout_seconds)) + Duration::from_secs(2);
    let request_id = RequestId::new();
    let started = Utc::now();
    let outcome = state.executor.execute(&script.source, req, timeout).await;
    let result = tokio::time::timeout(wait_budget, rx).await;
    let finished = Utc::now();

    let log = build_execution_log(
        script.app_id,
        matched.matched.script_id,
    // Tear down the receiver if it's still alive. `inbox.cancel` is a
    // no-op when the dispatcher already delivered.
    let _ = state.inbox.cancel(inbox_id);

    let response = match result {
        Ok(Ok(InboxResult::Success(summary))) => http_response_from_summary(summary),
        Ok(Ok(InboxResult::Failure { kind, message })) => failure_to_response(kind, &message),
        Ok(Err(_recv)) => {
            // Channel was closed without a value — dispatcher dropped
            // the sender. Treat as platform failure.
            tracing::warn!(
                outbox_id = %outbox_id,
                "inbox channel closed without delivery"
            );
            failure_to_response(
                InboxFailureKind::Platform,
                "dispatcher closed inbox without delivery",
            )
        }
        Err(_elapsed) => {
            // Outer timeout — either the script was too slow or the
            // dispatcher is wedged. Returns 504 by default.
            failure_to_response(InboxFailureKind::Timeout, "request timed out")
        }
    };

    let log = build_inbox_execution_log(
        app_id,
        script_id,
        request_id,
        request_path,
        request_headers,
        request_body,
        &outcome,
        path,
        headers,
        body,
        response.status().as_u16(),
        started,
        finished,
    );
    if let Err(e) = state.log_sink.record(log).await {
        tracing::warn!(
            error = %e,
            script_id = %matched.matched.script_id,
            %script_id,
            "failed to persist execution log"
        );
    }

    Ok(exec_response_to_http(outcome?))
    Ok(response)
}

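The wait in `handle_sync_route` has three possible endings: a delivered result, a dropped sender, or an elapsed budget. The following is a minimal std-only sketch of that classification, assuming a blocking `std::sync::mpsc` channel in place of the tokio oneshot used in the diff. `wait_budget`, `WaitOutcome`, and `wait_for_reply` are hypothetical names introduced for illustration; only the 2-second buffer comes from the handler itself.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Script wall-clock limit plus the fixed 2-second buffer used by the handler.
pub fn wait_budget(timeout_seconds: u32) -> Duration {
    Duration::from_secs(u64::from(timeout_seconds)) + Duration::from_secs(2)
}

/// Hypothetical three-way result mirroring the `match result` arms in the handler.
#[derive(Debug, PartialEq, Eq)]
pub enum WaitOutcome {
    /// A value arrived before the budget ran out (here: just a status code).
    Delivered(u16),
    /// The sender was dropped without sending; treated as a platform failure.
    SenderDropped,
    /// The budget elapsed; the handler maps this to a 504.
    TimedOut,
}

/// Blocks for at most `budget` and classifies how the wait ended.
pub fn wait_for_reply(rx: &Receiver<u16>, budget: Duration) -> WaitOutcome {
    match rx.recv_timeout(budget) {
        Ok(status) => WaitOutcome::Delivered(status),
        Err(RecvTimeoutError::Disconnected) => WaitOutcome::SenderDropped,
        Err(RecvTimeoutError::Timeout) => WaitOutcome::TimedOut,
    }
}
```

The real handler awaits asynchronously, so this blocking version only illustrates the branching, not the concurrency model.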
fn http_response_from_summary(summary: picloud_shared::ExecResponseSummary) -> Response {
    let status =
        StatusCode::from_u16(summary.status_code).unwrap_or(StatusCode::INTERNAL_SERVER_ERROR);
    let mut http_headers = HeaderMap::new();
    for (k, v) in summary.headers {
        if let (Ok(name), Ok(value)) = (k.parse::<HeaderName>(), v.parse::<HeaderValue>()) {
            http_headers.insert(name, value);
        }
    }
    http_headers
        .entry(axum::http::header::CONTENT_TYPE)
        .or_insert_with(|| HeaderValue::from_static("application/json"));
    (status, http_headers, Json(summary.body)).into_response()
}

/// Map `InboxFailureKind` onto the design-notes §3 status-code table.
fn failure_to_response(kind: InboxFailureKind, message: &str) -> Response {
    let status = match kind {
        InboxFailureKind::Validation => StatusCode::UNPROCESSABLE_ENTITY,
        InboxFailureKind::Runtime => StatusCode::BAD_GATEWAY,
        InboxFailureKind::Overloaded => StatusCode::SERVICE_UNAVAILABLE,
        InboxFailureKind::Timeout => StatusCode::GATEWAY_TIMEOUT,
        InboxFailureKind::OperationBudget => StatusCode::INSUFFICIENT_STORAGE,
        InboxFailureKind::Platform => StatusCode::INTERNAL_SERVER_ERROR,
    };
    let body = Json(serde_json::json!({ "error": message }));
    if matches!(kind, InboxFailureKind::Overloaded) {
        return (status, [(axum::http::header::RETRY_AFTER, "1")], body).into_response();
    }
    (status, body).into_response()
}

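For quick reference, the kind-to-status table in `failure_to_response` can be restated with plain numeric codes. This is a minimal std-only sketch: `FailureKind` and `status_for` are hypothetical names that mirror `InboxFailureKind`, and the numbers are the standard values of the `StatusCode` constants used in the diff.

```rust
/// Hypothetical mirror of `InboxFailureKind`, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureKind {
    Validation,
    Runtime,
    Overloaded,
    Timeout,
    OperationBudget,
    Platform,
}

/// Returns the numeric HTTP status and, for `Overloaded` only, the
/// `Retry-After` value in seconds that the handler attaches.
pub fn status_for(kind: FailureKind) -> (u16, Option<u32>) {
    match kind {
        FailureKind::Validation => (422, None),      // Unprocessable Entity
        FailureKind::Runtime => (502, None),         // Bad Gateway
        FailureKind::Overloaded => (503, Some(1)),   // Service Unavailable
        FailureKind::Timeout => (504, None),         // Gateway Timeout
        FailureKind::OperationBudget => (507, None), // Insufficient Storage
        FailureKind::Platform => (500, None),        // Internal Server Error
    }
}
```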
#[allow(clippy::too_many_arguments)]
fn build_inbox_execution_log(
    app_id: AppId,
    script_id: ScriptId,
    request_id: RequestId,
    request_path: String,
    request_headers: BTreeMap<String, String>,
    request_body: Json_,
    response_code: u16,
    started: chrono::DateTime<Utc>,
    finished: chrono::DateTime<Utc>,
) -> ExecutionLog {
    let duration_ms = u64::try_from(
        finished
            .signed_duration_since(started)
            .num_milliseconds()
            .max(0),
    )
    .unwrap_or(0);
    let status = if (200..400).contains(&response_code) {
        ExecutionStatus::Success
    } else {
        ExecutionStatus::Error
    };
    ExecutionLog {
        id: Uuid::new_v4(),
        app_id,
        script_id,
        request_id,
        request_path,
        request_headers,
        request_body,
        response_code: Some(response_code),
        response_body: None,
        script_logs: Json_::Array(vec![]),
        duration_ms,
        status,
        created_at: started,
    }
}

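Two small rules are embedded in `build_inbox_execution_log`: a negative elapsed time (for example after a clock adjustment) is clamped to zero, and any 2xx or 3xx response counts as a success. A minimal sketch of both rules in isolation follows; `clamp_duration_ms` and `is_success` are hypothetical helper names, not functions from the PR.

```rust
/// Clamps a signed millisecond delta to an unsigned duration; negative
/// deltas become 0 instead of wrapping around.
pub fn clamp_duration_ms(delta_ms: i64) -> u64 {
    u64::try_from(delta_ms.max(0)).unwrap_or(0)
}

/// Treats status codes in the half-open range 200..400 as success,
/// matching the `(200..400).contains(&response_code)` check in the diff.
pub fn is_success(response_code: u16) -> bool {
    (200..400).contains(&response_code)
}
```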
fn parse_query_string(s: &str) -> BTreeMap<String, String> {
@@ -317,6 +591,11 @@ fn build_exec_request(
        // preserves the original root for chained executions.
        trigger_depth: 0,
        root_execution_id: execution_id,
        // Direct invocations are never DL handlers — that flag is only
        // set by the dispatcher when it picks a dead_letter trigger row.
        is_dead_letter_handler: false,
        // No originating trigger event for direct ingress.
        event: None,
    })
}

@@ -416,6 +695,9 @@ pub enum ApiError {

    #[error("execution error: {0}")]
    Exec(#[from] ExecError),

    #[error("outbox write failed: {0}")]
    OutboxWrite(String),
}

impl IntoResponse for ApiError {
@@ -439,6 +721,13 @@ impl IntoResponse for ApiError {
        let (status, message) = match &self {
            E::NotFound(_) => (StatusCode::NOT_FOUND, self.to_string()),
            E::BadRequest(_) => (StatusCode::BAD_REQUEST, self.to_string()),
            E::OutboxWrite(e) => {
                tracing::error!(error = %e, "outbox write failed");
                (
                    StatusCode::INTERNAL_SERVER_ERROR,
                    "internal error".to_string(),
                )
            }
            E::Resolver(e) => {
                tracing::error!(error = %e, "resolver failure");
                (

139
crates/orchestrator-core/src/inbox.rs
Normal file
@@ -0,0 +1,139 @@
//! In-process `InboxRegistry` — the NATS-style request/reply
//! implementation for sync HTTP via the trigger outbox (design notes
//! §3).
//!
//! Workflow:
//! 1. Orchestrator calls `registry.register()`, which allocates an
//!    `inbox_id` and returns a oneshot receiver.
//! 2. Orchestrator writes an outbox row with `reply_to = inbox_id`.
//! 3. Dispatcher picks the row, runs the script, calls
//!    `registry.deliver(inbox_id, result)`.
//! 4. Orchestrator's `.await` on the receiver fires; it maps the
//!    `InboxResult` back into an HTTP response.
//!
//! `Delivered` means the receiver was alive when delivery hit. If the
//! orchestrator timed out and dropped the receiver before delivery,
//! `Abandoned` comes back — the dispatcher writes an
//! `abandoned_executions` row (design notes §3 #9).
//!
//! Cluster mode (v1.3+) swaps this for a Postgres `LISTEN/NOTIFY`-
//! based resolver; the `InboxResolver` trait stays the same.

use std::collections::HashMap;
use std::sync::Mutex;

use async_trait::async_trait;
use picloud_shared::{InboxDeliveryOutcome, InboxResolver, InboxResult};
use tokio::sync::oneshot;
use uuid::Uuid;

pub struct InboxRegistry {
    inner: Mutex<HashMap<Uuid, oneshot::Sender<InboxResult>>>,
}

impl InboxRegistry {
    #[must_use]
    pub fn new() -> Self {
        Self {
            inner: Mutex::new(HashMap::new()),
        }
    }

    /// Allocate a new inbox id and register the sender side. The
    /// caller awaits the returned `Receiver`; the dispatcher delivers
    /// the outcome via `deliver(id, …)`.
    #[must_use]
    pub fn register(&self) -> (Uuid, oneshot::Receiver<InboxResult>) {
        let id = Uuid::new_v4();
        let (tx, rx) = oneshot::channel();
        if let Ok(mut g) = self.inner.lock() {
            g.insert(id, tx);
        }
        (id, rx)
    }

    /// Cancel a pending inbox (orchestrator timed out and gave up).
    /// Drops the sender so any future `deliver` returns `Abandoned`.
    /// Returns `true` if the sender was still registered.
    pub fn cancel(&self, id: Uuid) -> bool {
        self.inner
            .lock()
            .map(|mut g| g.remove(&id).is_some())
            .unwrap_or(false)
    }
}

impl Default for InboxRegistry {
    fn default() -> Self {
        Self::new()
    }
}

#[async_trait]
impl InboxResolver for InboxRegistry {
    async fn deliver(&self, inbox_id: Uuid, result: InboxResult) -> InboxDeliveryOutcome {
        let Ok(mut g) = self.inner.lock() else {
            return InboxDeliveryOutcome::Abandoned;
        };
        let Some(tx) = g.remove(&inbox_id) else {
            return InboxDeliveryOutcome::Abandoned;
        };
        // `send` returns Err iff the receiver was dropped — exactly
        // the abandoned-execution case.
        if tx.send(result).is_err() {
            InboxDeliveryOutcome::Abandoned
        } else {
            InboxDeliveryOutcome::Delivered
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use picloud_shared::ExecResponseSummary;
    use std::collections::BTreeMap;

    fn ok_result() -> InboxResult {
        InboxResult::Success(ExecResponseSummary {
            status_code: 200,
            headers: BTreeMap::new(),
            body: serde_json::json!({ "ok": true }),
        })
    }

    #[tokio::test]
    async fn register_then_deliver_resolves_receiver() {
        let reg = InboxRegistry::new();
        let (id, rx) = reg.register();
        let outcome = reg.deliver(id, ok_result()).await;
        assert_eq!(outcome, InboxDeliveryOutcome::Delivered);
        let received = rx.await.expect("receiver should fire");
        assert!(matches!(received, InboxResult::Success(_)));
    }

    #[tokio::test]
    async fn deliver_to_unknown_id_is_abandoned() {
        let reg = InboxRegistry::new();
        let outcome = reg.deliver(Uuid::new_v4(), ok_result()).await;
        assert_eq!(outcome, InboxDeliveryOutcome::Abandoned);
    }

    #[tokio::test]
    async fn dropping_receiver_then_delivering_is_abandoned() {
        let reg = InboxRegistry::new();
        let (id, rx) = reg.register();
        drop(rx);
        let outcome = reg.deliver(id, ok_result()).await;
        assert_eq!(outcome, InboxDeliveryOutcome::Abandoned);
    }

    #[tokio::test]
    async fn cancel_removes_sender() {
        let reg = InboxRegistry::new();
        let (id, _rx) = reg.register();
        assert!(reg.cancel(id));
        let outcome = reg.deliver(id, ok_result()).await;
        assert_eq!(outcome, InboxDeliveryOutcome::Abandoned);
    }
}
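The register, deliver, and cancel lifecycle of the registry can also be tried without tokio. The sketch below is a simplified stand-in, not the PR's implementation: it assumes `std::sync::mpsc` in place of `tokio::sync::oneshot`, a counter in place of UUIDs, and `String` in place of `InboxResult`. `SketchRegistry` and `Outcome` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical mirror of `InboxDeliveryOutcome`.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Delivered,
    Abandoned,
}

/// Simplified stand-in for `InboxRegistry`: a counter for ids plus a
/// map from id to the sender half of a channel.
#[derive(Default)]
pub struct SketchRegistry {
    inner: Mutex<(u64, HashMap<u64, Sender<String>>)>,
}

impl SketchRegistry {
    /// Allocates an id, stores the sender, and hands back the receiver.
    pub fn register(&self) -> (u64, Receiver<String>) {
        let (tx, rx) = mpsc::channel();
        let mut g = self.inner.lock().expect("registry lock poisoned");
        g.0 += 1;
        let id = g.0;
        g.1.insert(id, tx);
        (id, rx)
    }

    /// Removes the sender; returns `true` if it was still registered.
    pub fn cancel(&self, id: u64) -> bool {
        self.inner
            .lock()
            .map(|mut g| g.1.remove(&id).is_some())
            .unwrap_or(false)
    }

    /// Sends `result` if a sender is registered and its receiver is alive.
    pub fn deliver(&self, id: u64, result: String) -> Outcome {
        let tx = match self.inner.lock() {
            Ok(mut g) => g.1.remove(&id),
            Err(_) => None,
        };
        match tx {
            Some(tx) => {
                // `send` fails only when the receiver has been dropped.
                if tx.send(result).is_ok() {
                    Outcome::Delivered
                } else {
                    Outcome::Abandoned
                }
            }
            None => Outcome::Abandoned,
        }
    }
}
```

As in the tests above, delivering to an unknown id, a cancelled id, or an id whose receiver was dropped all yield `Abandoned`.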
@@ -11,10 +11,12 @@
pub mod api;
pub mod client;
pub mod gate;
pub mod inbox;
pub mod resolver;
pub mod routing;

pub use api::{data_plane_router, user_routes_router, DataPlaneState};
pub use client::{ExecutorClient, LocalExecutorClient, RemoteExecutorClient};
pub use gate::{AcquireError, ExecutionGate};
pub use inbox::InboxRegistry;
pub use resolver::{ResolverError, ScriptResolver};

@@ -38,6 +38,11 @@ pub struct MatchResult {
pub struct Matched {
    pub route_id: uuid::Uuid,
    pub script_id: picloud_shared::ScriptId,
    /// Per-route dispatch mode (v1.1.1). Forwarded to the
    /// orchestrator's HTTP handler so it can pick the sync or async
    /// path. Defaults to `Sync` for older routes that predate the
    /// column.
    pub dispatch_mode: picloud_shared::DispatchMode,
}

/// A single route ready for matching. `app_id` is carried so the
@@ -51,6 +56,7 @@ pub struct CompiledRoute {
    pub host: HostPattern,
    pub path: PathPattern,
    pub method: Option<String>,
    pub dispatch_mode: picloud_shared::DispatchMode,
}

/// Find the best matching route for the request. Returns `None` if no
@@ -180,6 +186,7 @@ fn match_within_bucket(
                matched: Matched {
                    route_id: route.route_id,
                    script_id: route.script_id,
                    dispatch_mode: route.dispatch_mode,
                },
                params: BTreeMap::new(),
                rest: None,
@@ -230,6 +237,7 @@ fn match_within_bucket(
                matched: Matched {
                    route_id: route.route_id,
                    script_id: route.script_id,
                    dispatch_mode: route.dispatch_mode,
                },
                params,
                rest,
@@ -312,6 +320,7 @@ mod tests {
            host,
            path: parse_path(path_kind, raw).unwrap(),
            method: None,
            dispatch_mode: picloud_shared::DispatchMode::Sync,
        }
    }


@@ -11,22 +11,28 @@ use axum::{routing::get, Json, Router};
use picloud_executor_core::{Engine, Limits};
use picloud_manager_core::{
    admin_router, admins_router, api_keys_router, app_members_router, apps_api, apps_router,
    attach_principal_if_present, auth_router, compile_routes, migrations, require_authenticated,
    route_admin_router, AdminSessionRepository, AdminState, AdminUserRepository, AdminsState,
    attach_principal_if_present, auth_router, compile_routes, dead_letters_router, migrations,
    require_authenticated, route_admin_router, triggers_router, AbandonedRepo,
    AdminPrincipalResolver, AdminSessionRepository, AdminState, AdminUserRepository, AdminsState,
    ApiKeyRepository, ApiKeysState, AppDomainRepository, AppMembersRepository, AppMembersState,
    AppRepository, AppsState, AuthState, AuthzRepo, PostgresAdminSessionRepository,
    PostgresAdminUserRepository, PostgresApiKeyRepository, PostgresAppDomainRepository,
    PostgresAppMembersRepository, PostgresAppRepository, PostgresExecutionLogRepository,
    PostgresExecutionLogSink, PostgresRouteRepository, PostgresScriptRepository, RepoResolver,
    RouteAdminState, RouteRepository, SandboxCeiling,
    AppRepository, AppsState, AuthState, AuthzRepo, DeadLetterRepo, DeadLettersState, Dispatcher,
    KvServiceImpl, OutboxEventEmitter, OutboxRepo, PostgresAbandonedRepo,
    PostgresAdminSessionRepository, PostgresAdminUserRepository, PostgresApiKeyRepository,
    PostgresAppDomainRepository, PostgresAppMembersRepository, PostgresAppRepository,
    PostgresDeadLetterRepo, PostgresDeadLetterService, PostgresExecutionLogRepository,
    PostgresExecutionLogSink, PostgresKvRepo, PostgresOutboxRepo, PostgresRouteRepository,
    PostgresScriptRepository, PostgresTriggerRepo, PrincipalResolver, RepoResolver,
    RouteAdminState, RouteRepository, SandboxCeiling, ScriptRepository, TriggerConfig, TriggerRepo,
    TriggersState,
};
use picloud_orchestrator_core::routing::{AppDomainTable, RouteTable};
use picloud_orchestrator_core::{
    data_plane_router, user_routes_router, DataPlaneState, ExecutionGate, LocalExecutorClient,
    data_plane_router, user_routes_router, DataPlaneState, ExecutionGate, InboxRegistry,
    LocalExecutorClient,
};
use picloud_shared::{
    ExecutionLogSink, ScriptValidator, Services, API_VERSION, PRODUCT_VERSION, SDK_VERSION,
    WIRE_VERSION,
    DeadLetterService, ExecutionLogSink, InboxResolver, KvService, OutboxWriter, ScriptValidator,
    ServiceEventEmitter, Services, API_VERSION, PRODUCT_VERSION, SDK_VERSION, WIRE_VERSION,
};
use sqlx::postgres::PgPoolOptions;
use sqlx::PgPool;
@@ -83,10 +89,6 @@ fn read_session_ttl() -> Duration {
/// `/version`) stays open — it's the public ingress for user scripts.
#[allow(clippy::too_many_lines)]
pub async fn build_app(pool: PgPool, auth: AuthDeps) -> anyhow::Result<Router> {
    // `Services` is the SDK service bundle. Empty in v1.1.0; the
    // v1.1.1 KV PR will populate it with `kv: Arc::new(...)` here.
    let engine = Arc::new(Engine::new(Limits::default(), Services::new()));

    let script_repo = Arc::new(PostgresScriptRepository::new(pool.clone()));
    let log_repo = Arc::new(PostgresExecutionLogRepository::new(pool.clone()));
    let log_sink: Arc<dyn ExecutionLogSink> = Arc::new(PostgresExecutionLogSink::new(pool.clone()));
@@ -98,10 +100,43 @@ pub async fn build_app(pool: PgPool, auth: AuthDeps) -> anyhow::Result<Router> {
    // (CRUD over the table) and `AuthzRepo` (single-row membership lookup
    // for capability checks). Construct it once and clone the Arc into
    // both trait views — same allocation, two vtables.
    let members_concrete = Arc::new(PostgresAppMembersRepository::new(pool));
    let members_concrete = Arc::new(PostgresAppMembersRepository::new(pool.clone()));
    let members: Arc<dyn AppMembersRepository> = members_concrete.clone();
    let authz: Arc<dyn AuthzRepo> = members_concrete;

    // Triggers framework storage. The outbox event emitter routes
    // KV mutations into the outbox; the dispatcher fans them out.
    let trigger_repo: Arc<dyn TriggerRepo> = Arc::new(PostgresTriggerRepo::new(pool.clone()));
    // PostgresOutboxRepo implements both `OutboxRepo` (the dispatcher
    // surface) and `OutboxWriter` (the orchestrator surface). Construct
    // the concrete Arc once, clone it into each trait view — same
    // allocation, two vtables (mirrors how `members_concrete` above is
    // used as both `AppMembersRepository` and `AuthzRepo`).
    let outbox_concrete = Arc::new(PostgresOutboxRepo::new(pool.clone()));
    let outbox_repo: Arc<dyn OutboxRepo> = outbox_concrete.clone();
    let outbox_writer: Arc<dyn OutboxWriter> = outbox_concrete;
    let dl_repo: Arc<dyn DeadLetterRepo> = Arc::new(PostgresDeadLetterRepo::new(pool.clone()));
    let abandoned_repo: Arc<dyn AbandonedRepo> = Arc::new(PostgresAbandonedRepo::new(pool.clone()));
    let trigger_config = TriggerConfig::from_env();

    // SDK services bundle. v1.1.1 ships the KV store + the
    // outbox-backed event emitter + the dead-letter service (replay /
    // resolve).
    let kv_repo = Arc::new(PostgresKvRepo::new(pool));
    let events: Arc<dyn ServiceEventEmitter> = Arc::new(OutboxEventEmitter::new(
        trigger_repo.clone(),
        outbox_repo.clone(),
    ));
    let kv: Arc<dyn KvService> =
        Arc::new(KvServiceImpl::new(kv_repo, authz.clone(), events.clone()));
    let dl_service: Arc<dyn DeadLetterService> = Arc::new(PostgresDeadLetterService::new(
        dl_repo.clone(),
        outbox_repo.clone(),
        authz.clone(),
    ));
    let services = Services::new(kv, dl_service.clone(), events);
    let engine = Arc::new(Engine::new(Limits::default(), services));

    // Compile the routes table once at startup; admin writes refresh it.
    let route_table = Arc::new(RouteTable::new());
    let initial = route_repo.list_all().await?;
@@ -132,7 +167,34 @@ pub async fn build_app(pool: PgPool, auth: AuthDeps) -> anyhow::Result<Router> {
    // Single global gate — overflow is rejected with 503 + Retry-After.
    // See `ExecutionGate` docs and `PICLOUD_MAX_CONCURRENT_EXECUTIONS`.
    let gate = Arc::new(ExecutionGate::from_env());
    let executor = Arc::new(LocalExecutorClient::new(engine.clone(), gate));
    let executor = Arc::new(LocalExecutorClient::new(engine.clone(), gate.clone()));

    // Dispatcher — single tokio task that polls the outbox and routes
    // due rows to the executor. Shares the `ExecutionGate` with sync
    // HTTP per design notes §2 (one cap for everything).
    let dispatcher_script_repo: Arc<dyn ScriptRepository> =
        Arc::new(PostgresScriptRepoHandle(script_repo.clone()));
    let principals: Arc<dyn PrincipalResolver> =
        Arc::new(AdminPrincipalResolver::new(auth.users.clone()));
    // The InboxRegistry is constructed once and shared between the
    // orchestrator (registers receivers, awaits) and the dispatcher
    // (delivers results). Two Arc views on the same allocation.
    let inbox_registry = Arc::new(InboxRegistry::new());
    let inbox_resolver: Arc<dyn InboxResolver> = inbox_registry.clone();
    Dispatcher {
        outbox: outbox_repo.clone(),
        triggers: trigger_repo.clone(),
        scripts: dispatcher_script_repo,
        dead_letters: dl_repo.clone(),
        abandoned: abandoned_repo.clone(),
        principals,
        executor: executor.clone(),
        gate,
        inbox: inbox_resolver,
        config: trigger_config,
        instance_id: format!("picloud-{}", std::process::id()),
    }
    .spawn();

    let admin = AdminState {
        repo: Arc::new(PostgresScriptRepoHandle(script_repo.clone())),
@@ -155,6 +217,30 @@ pub async fn build_app(pool: PgPool, auth: AuthDeps) -> anyhow::Result<Router> {
        log_sink,
        app_domains: app_domain_table.clone(),
        routes: route_table,
        inbox: inbox_registry,
        outbox: outbox_writer,
    };
    // Weekly retention sweepers for dead_letters + abandoned_executions.
    // Defaults: 30 days / 7 days (design notes §3 #9 + §4 retention).
    picloud_manager_core::spawn_dead_letter_gc(
        dl_repo.clone(),
        trigger_config.dead_letter_retention_days,
    );
    picloud_manager_core::spawn_abandoned_gc(
        abandoned_repo.clone(),
        trigger_config.abandoned_retention_days,
    );
    let triggers_state = TriggersState {
        triggers: trigger_repo,
        apps: apps_repo.clone(),
        authz: authz.clone(),
        config: trigger_config,
    };
    let dead_letters_state = DeadLettersState {
        repo: dl_repo,
        service: dl_service,
        apps: apps_repo.clone(),
        authz: authz.clone(),
    };
    let apps_state = AppsState {
        apps: apps_repo,
@@ -197,6 +283,8 @@ pub async fn build_app(pool: PgPool, auth: AuthDeps) -> anyhow::Result<Router> {
        .merge(apps_router(apps_state))
        .merge(app_members_router(app_members_state))
        .merge(api_keys_router(api_keys_state))
        .merge(triggers_router(triggers_state))
        .merge(dead_letters_router(dead_letters_state))
        .layer(from_fn_with_state(
            auth_state.clone(),
            require_authenticated,

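The "same allocation, two vtables" comments in `build_app` describe one concrete `Arc` being coerced into two different trait-object views. The following is a minimal sketch of that pattern with hypothetical names (`ReadSide`, `WriteSide`, `Repo`, `two_views`); it is not the PR's repository code.

```rust
use std::sync::Arc;

/// Hypothetical read-facing trait (stands in for e.g. `OutboxRepo`).
pub trait ReadSide {
    fn read(&self) -> &'static str;
}

/// Hypothetical write-facing trait (stands in for e.g. `OutboxWriter`).
pub trait WriteSide {
    fn write(&self) -> &'static str;
}

/// One concrete type implementing both surfaces.
pub struct Repo;

impl ReadSide for Repo {
    fn read(&self) -> &'static str {
        "read"
    }
}

impl WriteSide for Repo {
    fn write(&self) -> &'static str {
        "write"
    }
}

/// Builds the concrete `Arc` once, then coerces it into each trait view.
/// Both returned handles point at the same heap allocation.
pub fn two_views() -> (Arc<dyn ReadSide>, Arc<dyn WriteSide>) {
    let concrete = Arc::new(Repo);
    let reader: Arc<dyn ReadSide> = concrete.clone();
    let writer: Arc<dyn WriteSide> = concrete;
    (reader, writer)
}
```

Because both handles share one allocation, `Arc::strong_count` on either view reports 2 while both are alive.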
118
crates/shared/src/dead_letters.rs
Normal file
@@ -0,0 +1,118 @@
//! `DeadLetterService` — Rhai SDK contract for replaying and resolving
//! dead letters. Surface kept intentionally narrow for v1.1.1 (no
//! `list` — deferred to v1.2 per `docs/v1.1.x-design-notes.md` §4).
//!
//! Both methods are gated by `Capability::AppDeadLetterManage(AppId)`
//! evaluated inside the impl. Public-HTTP scripts running with
//! `cx.principal = None` will fail the check, which matches the
//! design's expectation (managing dead letters is an admin act).

use async_trait::async_trait;
use serde::{Deserialize, Serialize};
use thiserror::Error;
use uuid::Uuid;

use crate::SdkCallCx;

/// Opaque identifier for a `dead_letters` row.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
#[serde(transparent)]
pub struct DeadLetterId(pub Uuid);

impl DeadLetterId {
    #[must_use]
    pub fn new() -> Self {
        Self(Uuid::new_v4())
    }

    #[must_use]
    pub fn into_inner(self) -> Uuid {
        self.0
    }
}

impl Default for DeadLetterId {
    fn default() -> Self {
        Self::new()
    }
}

impl From<Uuid> for DeadLetterId {
    fn from(u: Uuid) -> Self {
        Self(u)
    }
}

impl From<DeadLetterId> for Uuid {
    fn from(id: DeadLetterId) -> Self {
        id.0
    }
}

impl std::fmt::Display for DeadLetterId {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        self.0.fmt(f)
    }
}

#[async_trait]
pub trait DeadLetterService: Send + Sync {
    /// Re-enqueue the original event into the outbox. The dead-letter
    /// row is marked `resolution = 'replayed'` regardless of whether
    /// the retry ultimately succeeds.
    async fn replay(&self, cx: &SdkCallCx, id: DeadLetterId) -> Result<(), DeadLetterError>;

    /// Mark the row resolved with the given reason (typically
    /// `"ignored"` from the dashboard or `"handled_by_script"` from
    /// inside a `dead_letter` trigger handler).
    async fn resolve(
        &self,
        cx: &SdkCallCx,
        id: DeadLetterId,
        reason: &str,
    ) -> Result<(), DeadLetterError>;
}

#[derive(Debug, Error)]
pub enum DeadLetterError {
    #[error("dead-letter row not found")]
    NotFound,

    #[error("forbidden")]
    Forbidden,

    #[error("invalid resolution reason: {0}")]
    InvalidResolution(String),

    #[error("dead-letter backend error: {0}")]
    Backend(String),
}

/// Stub used to bootstrap the `Services` bundle before the real
/// Postgres-backed implementation lands. Behaves like
/// `NoopEventEmitter` — every call returns `Backend("...")` so scripts
/// see a clear "not yet implemented" error rather than silently
/// no-op'ing. Replaced by `PostgresDeadLetterService` in the v1.1.1
/// dead-letter PR.
#[derive(Debug, Default, Clone, Copy)]
pub struct NoopDeadLetterService;

#[async_trait]
impl DeadLetterService for NoopDeadLetterService {
    async fn replay(&self, _cx: &SdkCallCx, _id: DeadLetterId) -> Result<(), DeadLetterError> {
        Err(DeadLetterError::Backend(
            "dead_letters::replay is not yet wired in".into(),
        ))
    }

    async fn resolve(
        &self,
        _cx: &SdkCallCx,
        _id: DeadLetterId,
        _reason: &str,
    ) -> Result<(), DeadLetterError> {
        Err(DeadLetterError::Backend(
            "dead_letters::resolve is not yet wired in".into(),
        ))
    }
}
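The stub above fails loudly instead of silently succeeding, so a caller can tell "not wired in" apart from "done". A minimal synchronous sketch of that choice follows, with hypothetical names (`ReplayService`, `NoopReplayService`, `ServiceError`) and a plain `u64` id in place of `DeadLetterId`; the real trait is async and takes an `SdkCallCx`.

```rust
/// Hypothetical, trimmed-down error type mirroring `DeadLetterError::Backend`.
#[derive(Debug, PartialEq, Eq)]
pub enum ServiceError {
    Backend(String),
}

/// Hypothetical synchronous stand-in for the `replay` half of the trait.
pub trait ReplayService {
    fn replay(&self, id: u64) -> Result<(), ServiceError>;
}

/// Bootstrap stub: every call returns an explicit error rather than `Ok(())`.
#[derive(Debug, Default, Clone, Copy)]
pub struct NoopReplayService;

impl ReplayService for NoopReplayService {
    fn replay(&self, _id: u64) -> Result<(), ServiceError> {
        Err(ServiceError::Backend("replay is not yet wired in".into()))
    }
}
```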
16
crates/shared/src/exec_summary.rs
Normal file
@@ -0,0 +1,16 @@
//! `ExecResponseSummary` — a flattened, crate-portable view of an
//! `ExecResponse` for use by `InboxResult`. Lives in
//! `picloud-shared` because the dispatcher (manager-core) and the
//! orchestrator-core inbox registry both need to read it, and
//! `executor-core::ExecResponse` is owned by a leaf crate.

use std::collections::BTreeMap;

use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExecResponseSummary {
    pub status_code: u16,
    pub headers: BTreeMap<String, String>,
    pub body: serde_json::Value,
}
@@ -53,3 +53,4 @@ id_type!(RequestId);
id_type!(AdminUserId);
id_type!(AppId);
id_type!(ApiKeyId);
id_type!(TriggerId);

86
crates/shared/src/inbox.rs
Normal file
@@ -0,0 +1,86 @@
//! `InboxResolver` — abstraction the dispatcher uses to deliver sync
//! HTTP results back to the orchestrator that's awaiting them on a
//! oneshot channel. Lives in `picloud-shared` because the dispatcher
//! (manager-core) and the registry impl (orchestrator-core) live in
//! different crates and need a shared trait surface.
//!
//! v1.1.1 ships an in-process implementation in `orchestrator-core`
//! that keeps a `HashMap<inbox_id, oneshot::Sender<...>>`. Cluster
//! mode (v1.3+) swaps this for a Postgres `LISTEN/NOTIFY`-based
//! resolver without touching the dispatcher code (design notes §3
//! implementation table).
//!
//! Until commit 6 wires up the real registry, `NoopInboxResolver`
//! (`Abandoned` for every attempt) keeps the dispatcher able to run.

use async_trait::async_trait;
use uuid::Uuid;

use crate::ExecResponseSummary;

/// Result of trying to hand back a sync-HTTP outcome.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InboxDeliveryOutcome {
    /// Receiver still attached; result was delivered. Dispatcher
    /// deletes the outbox row.
    Delivered,
    /// Receiver was dropped (orchestrator timed out). Dispatcher
    /// writes an `abandoned_executions` row.
    Abandoned,
}

/// Outcome shape the dispatcher delivers to the inbox. Carries enough
/// to reconstruct an HTTP response — full body via JSON, optional
/// error string when the executor reported a failure.
#[derive(Debug, Clone)]
pub enum InboxResult {
    /// Successful execution. Carries the `ExecResponse` summary
    /// (status code + headers + body).
    Success(ExecResponseSummary),
    /// Failure modes — script threw, op-budget, timeout, etc. The
    /// orchestrator maps these to the design-notes §3 status codes
    /// (422/502/503/504/507/500) when responding to the HTTP caller.
    Failure {
        kind: InboxFailureKind,
        message: String,
    },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InboxFailureKind {
    /// Script's Rhai code threw or hit a runtime error → 502.
    Runtime,
    /// Wall-clock exceeded → 504.
    Timeout,
    /// Operation budget exceeded → 507.
    OperationBudget,
    /// Gate refused admission → 503.
    Overloaded,
    /// Script parse failure / bad-request → 422.
    Validation,
    /// Platform problem (executor crashed, dispatcher crashed, etc.) → 500.
    Platform,
}

#[async_trait]
|
||||
pub trait InboxResolver: Send + Sync {
|
||||
/// Attempt to deliver `result` to the receiver registered under
|
||||
/// `inbox_id`. Returns `Delivered` if the channel was alive,
|
||||
/// `Abandoned` if the receiver was already dropped (the
|
||||
/// orchestrator's timeout fired before the dispatcher got here).
|
||||
async fn deliver(&self, inbox_id: Uuid, result: InboxResult) -> InboxDeliveryOutcome;
|
||||
}
|
||||
|
||||
/// Bootstrap impl used before the real registry is wired in. Every
|
||||
/// delivery is treated as abandoned — the dispatcher records an
|
||||
/// abandoned-execution row and moves on. Replaced in `build_app` with
|
||||
/// the in-process `InboxRegistry` from orchestrator-core.
|
||||
#[derive(Debug, Default, Clone, Copy)]
|
||||
pub struct NoopInboxResolver;
|
||||
|
||||
#[async_trait]
|
||||
impl InboxResolver for NoopInboxResolver {
|
||||
async fn deliver(&self, _inbox_id: Uuid, _result: InboxResult) -> InboxDeliveryOutcome {
|
||||
InboxDeliveryOutcome::Abandoned
|
||||
}
|
||||
}
|
||||
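The doc comments on `InboxFailureKind` each name the HTTP status the orchestrator is expected to return. As a minimal sketch of that mapping, the block below mirrors the enum locally so it compiles on its own; the `status_code` helper is hypothetical and does not appear in this diff.

```rust
/// Local mirror of `InboxFailureKind` from the diff, repeated so this
/// sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InboxFailureKind {
    Runtime,
    Timeout,
    OperationBudget,
    Overloaded,
    Validation,
    Platform,
}

/// Hypothetical helper (not part of the diff): returns the HTTP status
/// named in each variant's doc comment.
pub const fn status_code(kind: InboxFailureKind) -> u16 {
    match kind {
        InboxFailureKind::Runtime => 502,
        InboxFailureKind::Timeout => 504,
        InboxFailureKind::OperationBudget => 507,
        InboxFailureKind::Overloaded => 503,
        InboxFailureKind::Validation => 422,
        InboxFailureKind::Platform => 500,
    }
}
```

Keeping the mapping in one exhaustive `match` means a new failure kind cannot be added without the compiler asking for its status code.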
140
crates/shared/src/kv.rs
Normal file
@@ -0,0 +1,140 @@
+//! `KvService` — the v1.1.1 key-value store contract.
+//!
+//! Lives in `picloud-shared` (not `executor-core`) so the Rhai bridge,
+//! the manager-core Postgres impl, and any future in-memory test impl
+//! can all depend on the same trait without dragging
+//! `executor-core` into `manager-core`'s dep graph.
+//!
+//! Implementations MUST derive every storage `app_id` from `cx.app_id`
+//! — never from a script-passed argument. That is the cross-app
+//! isolation boundary; see `docs/sdk-shape.md`.
+
+use async_trait::async_trait;
+use thiserror::Error;
+
+use crate::SdkCallCx;
+
+/// `KvService` is collection-scoped. Scripts get a handle via
+/// `kv::collection(name)` and call `get`/`set`/`has`/`delete`/`list`
+/// on it. The trait surface accepts the collection by name so the
+/// Postgres impl can avoid an extra round-trip to materialize the
+/// collection (collections are namespaces, not first-class rows).
+#[async_trait]
+pub trait KvService: Send + Sync {
+    async fn get(
+        &self,
+        cx: &SdkCallCx,
+        collection: &str,
+        key: &str,
+    ) -> Result<Option<serde_json::Value>, KvError>;
+
+    async fn set(
+        &self,
+        cx: &SdkCallCx,
+        collection: &str,
+        key: &str,
+        value: serde_json::Value,
+    ) -> Result<(), KvError>;
+
+    async fn delete(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Result<bool, KvError>;
+
+    async fn has(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Result<bool, KvError>;
+
+    /// Cursor-style pagination. `cursor` is opaque to the caller;
+    /// implementations encode the resume key inside. `None` cursor
+    /// starts from the beginning. Implementations cap `limit` at a
+    /// reasonable ceiling internally (script can't request an unbounded
+    /// page).
+    async fn list(
+        &self,
+        cx: &SdkCallCx,
+        collection: &str,
+        cursor: Option<&str>,
+        limit: u32,
+    ) -> Result<KvListPage, KvError>;
+}
+
+/// One page of keys from `KvService::list`. `next_cursor` is `Some`
+/// when more pages exist, `None` when exhausted. The cursor encoding
+/// is implementation-defined (the Postgres impl base64-encodes the
+/// last key).
+#[derive(Debug, Clone)]
+pub struct KvListPage {
+    pub keys: Vec<String>,
+    pub next_cursor: Option<String>,
+}
+
+/// Stub used by the test harness so executor-core integration tests
+/// (which don't touch KV) can construct a `Services` bundle without
+/// spinning up Postgres. Every call returns
+/// `KvError::Backend("...")` so accidental KV use surfaces clearly.
+#[derive(Debug, Default, Clone, Copy)]
+pub struct NoopKvService;
+
+#[async_trait]
+impl KvService for NoopKvService {
+    async fn get(
+        &self,
+        _cx: &SdkCallCx,
+        _collection: &str,
+        _key: &str,
+    ) -> Result<Option<serde_json::Value>, KvError> {
+        Err(KvError::Backend("kv is not wired in".into()))
+    }
+
+    async fn set(
+        &self,
+        _cx: &SdkCallCx,
+        _collection: &str,
+        _key: &str,
+        _value: serde_json::Value,
+    ) -> Result<(), KvError> {
+        Err(KvError::Backend("kv is not wired in".into()))
+    }
+
+    async fn delete(
+        &self,
+        _cx: &SdkCallCx,
+        _collection: &str,
+        _key: &str,
+    ) -> Result<bool, KvError> {
+        Err(KvError::Backend("kv is not wired in".into()))
+    }
+
+    async fn has(&self, _cx: &SdkCallCx, _collection: &str, _key: &str) -> Result<bool, KvError> {
+        Err(KvError::Backend("kv is not wired in".into()))
+    }
+
+    async fn list(
+        &self,
+        _cx: &SdkCallCx,
+        _collection: &str,
+        _cursor: Option<&str>,
+        _limit: u32,
+    ) -> Result<KvListPage, KvError> {
+        Err(KvError::Backend("kv is not wired in".into()))
+    }
+}
+
+/// Failure modes surfaced to the Rhai bridge. The bridge converts each
+/// to a Rhai runtime error string; the discriminants exist so internal
+/// callers (admin endpoints, tests, GC) can react more precisely.
+#[derive(Debug, Error)]
+pub enum KvError {
+    /// Empty collection name; rejected at the SDK boundary per
+    /// `docs/sdk-shape.md`.
+    #[error("collection name must not be empty")]
+    InvalidCollection,
+
+    /// Caller principal lacked the required capability. Only raised
+    /// when `cx.principal.is_some()` — scripts running with
+    /// `principal: None` (public HTTP) operate under script-as-gate
+    /// semantics and skip the capability check.
+    #[error("forbidden")]
+    Forbidden,
+
+    /// Anything else — Postgres unavailable, serialization failure,
+    /// etc. The string is safe to surface to a script.
+    #[error("kv backend error: {0}")]
+    Backend(String),
+}
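The `list` contract above (opaque cursor, capped `limit`, `next_cursor` only when more keys remain) can be illustrated with an in-memory store. This is only a sketch under stated assumptions: `list_page`, `MAX_PAGE` and the `BTreeMap` backing store are hypothetical, the function is synchronous, and the cursor is the raw last key rather than the base64 form the Postgres impl is described as using.

```rust
use std::collections::BTreeMap;

/// Hypothetical ceiling; the diff only says implementations cap `limit`.
const MAX_PAGE: usize = 100;

/// Local mirror of `KvListPage` so the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct KvListPage {
    pub keys: Vec<String>,
    pub next_cursor: Option<String>,
}

/// Returns up to `limit` keys that sort strictly after `cursor`.
/// A `limit` of 0 is raised to 1 and anything above `MAX_PAGE` is lowered.
pub fn list_page(
    store: &BTreeMap<String, String>,
    cursor: Option<&str>,
    limit: u32,
) -> KvListPage {
    let limit = (limit as usize).clamp(1, MAX_PAGE);
    // Fetch one key more than requested to learn whether another page exists.
    let mut keys: Vec<String> = store
        .keys()
        .filter(|k| cursor.map_or(true, |c| k.as_str() > c))
        .take(limit + 1)
        .cloned()
        .collect();
    let next_cursor = if keys.len() > limit {
        keys.truncate(limit);
        // Resume key: the last key actually returned on this page.
        keys.last().cloned()
    } else {
        None
    };
    KvListPage { keys, next_cursor }
}
```

A caller keeps passing the returned `next_cursor` back in until it is `None`. Because the resume point is a key rather than an offset, concurrent inserts or deletes do not shift later pages.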
@@ -6,30 +6,44 @@
 
 pub mod app;
 pub mod auth;
+pub mod dead_letters;
 pub mod error;
 pub mod events;
+pub mod exec_summary;
 pub mod execution_log;
 pub mod ids;
+pub mod inbox;
+pub mod kv;
 pub mod log_sink;
+pub mod outbox_writer;
 pub mod route;
 pub mod sandbox;
 pub mod script;
 pub mod sdk_cx;
 pub mod services;
+pub mod trigger_event;
 pub mod validator;
 pub mod version;
 
 pub use app::{App, AppDomain, DomainShape};
 pub use auth::{AppRole, InstanceRole, Principal, Scope, UserId};
+pub use dead_letters::{DeadLetterError, DeadLetterId, DeadLetterService, NoopDeadLetterService};
 pub use error::Error;
 pub use events::{EmitError, NoopEventEmitter, ServiceEvent, ServiceEventEmitter};
+pub use exec_summary::ExecResponseSummary;
 pub use execution_log::{ExecutionLog, ExecutionStatus};
-pub use ids::{AdminUserId, ApiKeyId, AppId, ExecutionId, RequestId, ScriptId};
+pub use ids::{AdminUserId, ApiKeyId, AppId, ExecutionId, RequestId, ScriptId, TriggerId};
+pub use inbox::{
+    InboxDeliveryOutcome, InboxFailureKind, InboxResolver, InboxResult, NoopInboxResolver,
+};
+pub use kv::{KvError, KvListPage, KvService, NoopKvService};
 pub use log_sink::{ExecutionLogSink, LogSinkError};
-pub use route::{HostKind, PathKind, Route};
+pub use outbox_writer::{HttpDispatchPayload, NewHttpOutbox, OutboxWriter, OutboxWriterError};
+pub use route::{DispatchMode, HostKind, PathKind, Route};
 pub use sandbox::ScriptSandbox;
 pub use script::Script;
 pub use sdk_cx::SdkCallCx;
 pub use services::Services;
+pub use trigger_event::{DeadLetterEventDetail, KvEventOp, TriggerEvent};
 pub use validator::{ScriptValidator, ValidationError};
 pub use version::{API_VERSION, PRODUCT_VERSION, SDK_VERSION, WIRE_VERSION};
72
crates/shared/src/outbox_writer.rs
Normal file
@@ -0,0 +1,72 @@
+//! `OutboxWriter` — minimal trait the orchestrator-core sync-HTTP path
+//! uses to enqueue rows into the universal trigger outbox. The
+//! manager-core `PostgresOutboxRepo` implements this in addition to
+//! its richer `OutboxRepo` surface; defining it here lets
+//! orchestrator-core depend on the trait without pulling in
+//! manager-core (which would invert the dependency arrow).
+
+use async_trait::async_trait;
+use serde::{Deserialize, Serialize};
+use thiserror::Error;
+use uuid::Uuid;
+
+use crate::{AdminUserId, AppId, ExecutionId, ScriptId};
+
+/// What the orchestrator hands to the outbox when it ingests an HTTP
+/// request. Carries enough for the dispatcher to reconstruct the
+/// `ExecRequest` end-to-end.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct NewHttpOutbox {
+    pub app_id: AppId,
+    /// `routes.id` of the matched route. Discriminated against
+    /// `triggers.id` by `source_kind = 'http'` on the outbox row.
+    pub route_id: Uuid,
+    /// Pre-resolved script so the dispatcher doesn't re-look it up.
+    pub script_id: ScriptId,
+    /// `Some(inbox_id)` for sync HTTP (the orchestrator awaits a
+    /// channel keyed on this id). `None` for `dispatch_mode = async`
+    /// — dispatcher fires-and-forgets, no reply path.
+    pub reply_to: Option<Uuid>,
+    /// Serialized `HttpDispatchPayload` (defined below) — everything
+    /// the dispatcher needs to reconstruct an `ExecRequest`.
+    pub payload: serde_json::Value,
+    /// The principal that ingressed the HTTP request (Some when
+    /// authenticated, None for public). Forensic only; the script
+    /// executes as the route's app principal model, not this.
+    pub origin_principal: Option<AdminUserId>,
+    /// `0` for direct HTTP ingress; the dispatcher will increment
+    /// for any further fan-out triggered by the script.
+    pub trigger_depth: u32,
+    pub root_execution_id: Option<ExecutionId>,
+}
+
+/// The shape the orchestrator serializes into `NewHttpOutbox.payload`
+/// (the JSONB column). Mirrored on the dispatcher side so it can
+/// rebuild an `ExecRequest`.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct HttpDispatchPayload {
+    pub script_name: String,
+    pub path: String,
+    pub method: String,
+    pub headers: std::collections::BTreeMap<String, String>,
+    pub body: serde_json::Value,
+    pub params: std::collections::BTreeMap<String, String>,
+    pub query: std::collections::BTreeMap<String, String>,
+    pub rest: String,
+    pub timeout_seconds: u32,
+}
+
+#[async_trait]
+pub trait OutboxWriter: Send + Sync {
+    /// Insert a sync- or async-HTTP outbox row. Returns the row's id
+    /// — the orchestrator stores it locally for forensics and to
+    /// correlate `abandoned_executions` rows when the dispatcher's
+    /// inbox delivery fails.
+    async fn enqueue_http(&self, row: NewHttpOutbox) -> Result<Uuid, OutboxWriterError>;
+}
+
+#[derive(Debug, Error)]
+pub enum OutboxWriterError {
+    #[error("outbox write failed: {0}")]
+    Backend(String),
+}
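The `reply_to` field ties an outbox row to a channel the orchestrator is waiting on. A rough sketch of how such an in-process reply registry could behave is shown here. It is not the `InboxRegistry` from `orchestrator-core`: the type below is hypothetical, uses a std `mpsc` channel in place of a oneshot channel, `u128` in place of `Uuid`, and `String` in place of `InboxResult`.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Local mirror of `InboxDeliveryOutcome` so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InboxDeliveryOutcome {
    Delivered,
    Abandoned,
}

/// Hypothetical in-process registry keyed by inbox id.
#[derive(Default)]
pub struct SketchInboxRegistry {
    waiting: Mutex<HashMap<u128, Sender<String>>>,
}

impl SketchInboxRegistry {
    /// Request side: register before enqueueing the outbox row, then
    /// wait on the returned receiver (with a timeout in a real system).
    pub fn register(&self, inbox_id: u128) -> Receiver<String> {
        let (tx, rx) = channel();
        self.waiting.lock().unwrap().insert(inbox_id, tx);
        rx
    }

    /// Dispatcher side: hand the result back once the script has run.
    pub fn deliver(&self, inbox_id: u128, result: String) -> InboxDeliveryOutcome {
        // Remove the entry: each inbox receives at most one result.
        let sender = self.waiting.lock().unwrap().remove(&inbox_id);
        match sender {
            // `send` fails only when the receiver has already been dropped.
            Some(tx) => match tx.send(result) {
                Ok(()) => InboxDeliveryOutcome::Delivered,
                Err(_) => InboxDeliveryOutcome::Abandoned,
            },
            // Unknown id: nobody is waiting for this result.
            None => InboxDeliveryOutcome::Abandoned,
        }
    }
}
```

Dropping the receiver is how a timed-out request shows up on the dispatcher side. No extra bookkeeping is needed to detect the `Abandoned` case.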
@@ -37,6 +37,38 @@ pub enum PathKind {
     Param,
 }
 
+/// Per-route dispatch mode (v1.1.1). `Sync` = orchestrator awaits the
+/// executor and returns the response in the same HTTP request. `Async`
+/// = orchestrator writes the request to the trigger outbox, returns
+/// `202 Accepted` immediately, and the dispatcher runs the script in
+/// the background (with retries + dead-letter).
+#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq, Default)]
+#[serde(rename_all = "lowercase")]
+pub enum DispatchMode {
+    #[default]
+    Sync,
+    Async,
+}
+
+impl DispatchMode {
+    #[must_use]
+    pub const fn as_str(self) -> &'static str {
+        match self {
+            Self::Sync => "sync",
+            Self::Async => "async",
+        }
+    }
+
+    #[must_use]
+    pub fn from_wire(s: &str) -> Option<Self> {
+        match s {
+            "sync" => Some(Self::Sync),
+            "async" => Some(Self::Async),
+            _ => None,
+        }
+    }
+}
+
 #[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct Route {
     pub id: Uuid,
@@ -60,5 +92,12 @@ pub struct Route {
     /// `None` = any method.
     pub method: Option<String>,
+
+    /// v1.1.1: per-route dispatch mode. `Sync` (default) → orchestrator
+    /// awaits the executor inline. `Async` → orchestrator writes to
+    /// the outbox + returns `202 Accepted`; dispatcher fires the
+    /// script in the background with retries.
+    #[serde(default)]
+    pub dispatch_mode: DispatchMode,
 
     pub created_at: DateTime<Utc>,
 }
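To make the two dispatch modes concrete, the sketch below mirrors `DispatchMode` without its serde derives and adds one hypothetical helper, `immediate_status`, which is not in this diff. It returns the status the ingress can send before any script result exists.

```rust
/// Local mirror of `DispatchMode` (serde derives omitted) so the
/// sketch compiles with the standard library alone.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum DispatchMode {
    #[default]
    Sync,
    Async,
}

impl DispatchMode {
    pub const fn as_str(self) -> &'static str {
        match self {
            Self::Sync => "sync",
            Self::Async => "async",
        }
    }

    pub fn from_wire(s: &str) -> Option<Self> {
        match s {
            "sync" => Some(Self::Sync),
            "async" => Some(Self::Async),
            _ => None,
        }
    }
}

/// Hypothetical helper: `Some(202)` when the route answers immediately,
/// `None` when the response has to come from the script itself.
pub const fn immediate_status(mode: DispatchMode) -> Option<u16> {
    match mode {
        DispatchMode::Sync => None,
        DispatchMode::Async => Some(202),
    }
}
```

`from_wire` is case-sensitive and returns `None` for anything other than the two lowercase names, so callers must decide explicitly how to treat an unknown stored value.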
@@ -12,7 +12,7 @@
 //! the cx in is shared by both sides. Pure value type — no handles, no
 //! DB pool references, no allocations beyond what's in `Principal`.
 
-use crate::{AppId, ExecutionId, Principal, RequestId};
+use crate::{AppId, ExecutionId, Principal, RequestId, TriggerEvent};
 
 /// Per-invocation context for every stateful SDK service call.
 ///
@@ -51,4 +51,19 @@ pub struct SdkCallCx {
     /// `execution_id` of the original ingress execution. Lets the audit
     /// log group every fan-out execution under the originating event.
     pub root_execution_id: ExecutionId,
+
+    /// `true` only when this invocation is a `dead_letter` trigger
+    /// handler. Set by the dispatcher when it picks an outbox row
+    /// whose trigger has `kind = 'dead_letter'`. The retry / dead-
+    /// letter machinery short-circuits when this is set: handlers
+    /// execute once, with no retry, and a failed run can NEVER be
+    /// dead-lettered itself (design notes §4 recursion-stop rule).
+    /// `false` for every other invocation, including the script
+    /// being used as a non-DL trigger handler.
+    pub is_dead_letter_handler: bool,
+
+    /// The event that fired this script, when it's a triggered
+    /// invocation. `None` for direct ingress (HTTP request, manual
+    /// run). Surfaced to scripts as `ctx.event`.
+    pub event: Option<TriggerEvent>,
 }
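The recursion-stop rule described for `is_dead_letter_handler` can be read as a small decision function. The sketch below is an assumption about how a dispatcher might apply it; `FailureAction`, `on_failure` and the `max_attempts` parameter are all hypothetical names that do not appear in this diff.

```rust
/// Hypothetical outcome of a failed run, as seen by the dispatcher.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureAction {
    /// Schedule another attempt.
    Retry,
    /// Retries exhausted: write a dead-letter row.
    DeadLetter,
    /// Record the failure and stop; never retried, never dead-lettered.
    Stop,
}

/// Decides what happens after attempt number `attempts` has failed.
pub const fn on_failure(
    is_dead_letter_handler: bool,
    attempts: u32,
    max_attempts: u32,
) -> FailureAction {
    if is_dead_letter_handler {
        // Dead-letter handlers run exactly once, so a failing handler
        // cannot create a further dead letter (the recursion stop).
        FailureAction::Stop
    } else if attempts < max_attempts {
        FailureAction::Retry
    } else {
        FailureAction::DeadLetter
    }
}
```

Checking the flag first means no retry policy, however generous, can override the rule for handler runs.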
@@ -1,38 +1,81 @@
 //! `Services` — bundle of stateful SDK service handles plumbed from the
 //! host binary into every Rhai execution.
 //!
-//! v1.1.0 ships this struct empty. Subsequent PRs in the v1.1.x series
-//! add one field per service:
+//! Constructed once at startup in the picloud binary; cloned (cheap —
+//! every field is an `Arc`) into the per-call sdk bridge so script
+//! invocations don't need to re-resolve dependencies. The bundle is
+//! handed to `executor-core::sdk::register_all` alongside an
+//! `SdkCallCx` to wire each `::` namespace.
 //!
-//! ```ignore
-//! pub kv: Arc<dyn KvService>, // v1.1.1
-//! pub docs: Arc<dyn DocsService>, // v1.1.2
-//! pub http: Arc<dyn HttpService>, // v1.1.4
-//! // …
-//! ```
-//!
-//! The bundle is cheap to clone (`Arc` per service) and is constructed
-//! once at startup in the picloud binary. The executor takes it by
-//! reference per invocation, hands it (alongside an `SdkCallCx`) to
-//! `executor-core::sdk::register_all`, which wires the corresponding
-//! Rhai `::` namespace per service.
+//! v1.1.0 shipped this empty; v1.1.1 adds the first two service fields
+//! (`kv`, `dead_letters`) plus the `events` emitter that bound services
+//! use to publish events into the triggers outbox.
 //!
 //! `#[non_exhaustive]` so adding fields is a non-breaking change for
 //! consumers that only *pattern-match* a `&Services`; only crates that
-//! *construct* a `Services` (in practice, just the picloud binary) need
-//! to update their constructor when new services land.
+//! *construct* a `Services` (the picloud binary and tests) update.
 
+use std::sync::Arc;
+
+use crate::{
+    DeadLetterService, KvService, NoopDeadLetterService, NoopEventEmitter, NoopKvService,
+    ServiceEventEmitter,
+};
+
 /// SDK service bundle. See module docs for the lifecycle and the v1.1.x
 /// expansion plan.
 #[non_exhaustive]
-#[derive(Default)]
-pub struct Services {}
+pub struct Services {
+    /// KV store (v1.1.1). Backed by Postgres in the picloud binary;
+    /// in-memory in tests.
+    pub kv: Arc<dyn KvService>,
+
+    /// Dead-letter management (v1.1.1). Scripts get
+    /// `dead_letters::replay(id)` and `dead_letters::resolve(id, reason)`.
+    pub dead_letters: Arc<dyn DeadLetterService>,
+
+    /// Event emitter for the triggers outbox. Mutating service methods
+    /// (`KvService::set/delete`, future `docs::*`, `files::*`, etc.)
+    /// call `events.emit(cx, event)` after the write succeeds. The
+    /// outbox-backed impl in `manager-core::outbox_event_emitter`
+    /// replaces v1.1.0's `NoopEventEmitter`.
+    pub events: Arc<dyn ServiceEventEmitter>,
+}
 
 impl Services {
-    /// Construct an empty bundle. Replaced by a fielded `::new(...)`
-    /// once the first service (KV, v1.1.1) lands.
+    /// Construct a bundle from already-constructed `Arc<dyn …>` handles.
+    /// The picloud binary's `main` wires this up after the DB pool is
+    /// open; tests build it from in-memory fakes.
     #[must_use]
-    pub fn new() -> Self {
-        Self {}
+    pub fn new(
+        kv: Arc<dyn KvService>,
+        dead_letters: Arc<dyn DeadLetterService>,
+        events: Arc<dyn ServiceEventEmitter>,
+    ) -> Self {
+        Self {
+            kv,
+            dead_letters,
+            events,
+        }
     }
+
+    /// All-noop bundle for tests that build an `Engine` but don't
+    /// exercise the stateful services. Returns the same shape as
+    /// `Services::new` so callers can't accidentally rely on a stub
+    /// silently doing the right thing — every call into a noop
+    /// service surfaces an explicit error.
+    #[must_use]
+    pub fn with_noop_services() -> Self {
+        Self::new(
+            Arc::new(NoopKvService),
+            Arc::new(NoopDeadLetterService),
+            Arc::new(NoopEventEmitter),
+        )
+    }
 }
+
+impl Default for Services {
+    fn default() -> Self {
+        Self::with_noop_services()
+    }
+}
105
crates/shared/src/trigger_event.rs
Normal file
@@ -0,0 +1,105 @@
+//! `TriggerEvent` — the description of the event that fired a script.
+//!
+//! Built by the dispatcher (in `manager-core`) from the outbox row and
+//! attached to the `ExecRequest` that's handed to `executor-core`. The
+//! Rhai bridge in `executor-core::engine::build_ctx_map` flattens this
+//! into `ctx.event` for the script.
+//!
+//! Living in `picloud-shared` so the dispatcher and the executor agree
+//! on the wire shape. Serializable so cluster mode (v1.3+) can ship
+//! ExecRequests over HTTP without rewriting this type.
+
+use chrono::{DateTime, Utc};
+use serde::{Deserialize, Serialize};
+
+use crate::{DeadLetterId, ScriptId, TriggerId};
+
+/// Operations a KV trigger can fire on. Stored as a lowercase string
+/// in `kv_trigger_details.ops` (Postgres `text[]`).
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
+#[serde(rename_all = "lowercase")]
+pub enum KvEventOp {
+    Insert,
+    Update,
+    Delete,
+}
+
+impl KvEventOp {
+    #[must_use]
+    pub const fn as_str(self) -> &'static str {
+        match self {
+            Self::Insert => "insert",
+            Self::Update => "update",
+            Self::Delete => "delete",
+        }
+    }
+
+    #[must_use]
+    pub fn from_wire(s: &str) -> Option<Self> {
+        match s {
+            "insert" => Some(Self::Insert),
+            "update" => Some(Self::Update),
+            "delete" => Some(Self::Delete),
+            _ => None,
+        }
+    }
+}
+
+/// Discriminated description of a triggering event. Lifted from the
+/// outbox row's payload at dispatch time. Each variant carries the
+/// fields the corresponding `ctx.event` shape exposes to the script.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+#[serde(tag = "source", rename_all = "snake_case")]
+pub enum TriggerEvent {
+    /// A KV insert / update / delete fired this handler.
+    Kv {
+        op: KvEventOp,
+        collection: String,
+        key: String,
+        /// Present on `insert` and `update`. Absent on `delete`.
+        #[serde(default, skip_serializing_if = "Option::is_none")]
+        value: Option<serde_json::Value>,
+    },
+
+    /// A dead-letter row fired this handler. The original event is
+    /// nested verbatim plus the dead-letter metadata the design notes
+    /// §4 require.
+    DeadLetter {
+        dead_letter_id: DeadLetterId,
+        original: Box<TriggerEvent>,
+        attempts: u32,
+        last_error: String,
+        #[serde(default, skip_serializing_if = "Option::is_none")]
+        trigger_id: Option<TriggerId>,
+        #[serde(default, skip_serializing_if = "Option::is_none")]
+        script_id: Option<ScriptId>,
+        first_attempt_at: DateTime<Utc>,
+        last_attempt_at: DateTime<Utc>,
+    },
+}
+
+impl TriggerEvent {
+    /// The `source` discriminant the script sees on `ctx.event.source`.
+    #[must_use]
+    pub const fn source(&self) -> &'static str {
+        match self {
+            Self::Kv { .. } => "kv",
+            Self::DeadLetter { .. } => "dead_letter",
+        }
+    }
+}
+
+/// Convenience accessor on the dead-letter variant for places that
+/// already know they're handling a DL event. Pulled out so the
+/// dispatcher and the dashboard don't have to repeat the match.
+#[derive(Debug, Clone)]
+pub struct DeadLetterEventDetail {
+    pub dead_letter_id: DeadLetterId,
+    pub original: TriggerEvent,
+    pub attempts: u32,
+    pub last_error: String,
+    pub trigger_id: Option<TriggerId>,
+    pub script_id: Option<ScriptId>,
+    pub first_attempt_at: DateTime<Utc>,
+    pub last_attempt_at: DateTime<Utc>,
+}
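Because the `DeadLetter` variant nests the original event in a `Box`, a handler may want to reach the event that originally failed. The sketch below uses a trimmed-down mirror of `TriggerEvent` (most fields and the serde attributes are left out) and a hypothetical `root` accessor that is not part of this diff.

```rust
/// Reduced mirror of `TriggerEvent`: only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TriggerEvent {
    Kv { collection: String, key: String },
    DeadLetter { original: Box<TriggerEvent>, attempts: u32 },
}

impl TriggerEvent {
    /// Same discriminant strings as `source()` in the diff.
    pub const fn source(&self) -> &'static str {
        match self {
            Self::Kv { .. } => "kv",
            Self::DeadLetter { .. } => "dead_letter",
        }
    }

    /// Hypothetical accessor: follows `original` links until it reaches
    /// an event that is not itself a dead letter. Given the
    /// recursion-stop rule, one hop is the expected depth.
    pub fn root(&self) -> &TriggerEvent {
        let mut current = self;
        while let TriggerEvent::DeadLetter { original, .. } = current {
            // `original` is `&Box<TriggerEvent>`; reborrow the inner value.
            current = &**original;
        }
        current
    }
}
```

The `Box` is what allows the enum to contain itself; without the indirection the type would have no finite size.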
@@ -19,7 +19,10 @@ pub const PRODUCT_VERSION: &str = env!("CARGO_PKG_VERSION");
 ///
 /// 1.1 additions: `ctx.request.params`, `ctx.request.query`,
 /// `ctx.request.rest`.
-pub const SDK_VERSION: &str = "1.1";
+///
+/// 1.2 additions (v1.1.1): `kv::collection(name).{get,set,has,delete,list}`,
+/// `dead_letters::{replay,resolve}`, `ctx.event` for triggered handlers.
+pub const SDK_VERSION: &str = "1.2";
 
 /// HTTP API major version. Appears in URL paths as `/api/v{N}/...`.
 /// Bump (new integer + new URL prefix) when the request/response
@@ -1,6 +1,6 @@
 {
 	"name": "picloud-dashboard",
-	"version": "0.6.0",
+	"version": "0.7.0",
 	"private": true,
 	"type": "module",
 	"scripts": {
@@ -186,6 +186,23 @@ export interface UpdateScriptInput {
 	sandbox?: ScriptSandbox;
 }
 
+export interface DeadLetterRow {
+	id: string;
+	app_id: string;
+	source: string;
+	op: string;
+	trigger_id: string | null;
+	script_id: string | null;
+	payload: unknown;
+	attempt_count: number;
+	first_attempt_at: string;
+	last_attempt_at: string;
+	last_error: string;
+	created_at: string;
+	resolved_at: string | null;
+	resolution: 'replayed' | 'ignored' | 'handled_by_script' | 'handler_failed' | null;
+}
+
 export interface ExecutionResult {
 	status: number;
 	headers: Record<string, string>;
@@ -516,6 +533,37 @@ export const api = {
 			)
 	},
 
+	deadLetters: {
+		count: (idOrSlug: string) =>
+			adminRequest<{ unresolved: number }>(
+				`/api/v1/admin/apps/${encodeURIComponent(idOrSlug)}/dead_letters/count`
+			),
+		list: (idOrSlug: string, opts: { unresolved?: boolean; limit?: number; offset?: number } = {}) => {
+			const params = new URLSearchParams();
+			if (opts.unresolved) params.set('unresolved', 'true');
+			if (opts.limit !== undefined) params.set('limit', String(opts.limit));
+			if (opts.offset !== undefined) params.set('offset', String(opts.offset));
+			const qs = params.toString();
+			return adminRequest<{ dead_letters: DeadLetterRow[] }>(
+				`/api/v1/admin/apps/${encodeURIComponent(idOrSlug)}/dead_letters${qs ? `?${qs}` : ''}`
+			);
+		},
+		get: (idOrSlug: string, dlId: string) =>
+			adminRequest<DeadLetterRow>(
+				`/api/v1/admin/apps/${encodeURIComponent(idOrSlug)}/dead_letters/${dlId}`
+			),
+		replay: (idOrSlug: string, dlId: string) =>
+			adminRequest<null>(
+				`/api/v1/admin/apps/${encodeURIComponent(idOrSlug)}/dead_letters/${dlId}/replay`,
+				{ method: 'POST' }
+			),
+		resolve: (idOrSlug: string, dlId: string, reason: string) =>
+			adminRequest<null>(
+				`/api/v1/admin/apps/${encodeURIComponent(idOrSlug)}/dead_letters/${dlId}/resolve`,
+				{ method: 'POST', body: JSON.stringify({ reason }) }
+			)
+	},
+
 	execute: async (
 		id: string,
 		body: unknown,
@@ -12,6 +12,26 @@
 	let listError = $state<string | null>(null);
 	let loading = $state(true);
 
+	/// Unresolved-dead-letter count per app (v1.1.1). Loaded in
+	/// parallel after the app list. Failures here are non-fatal —
+	/// missing counts just don't render a badge.
+	let unresolvedDl = $state<Record<string, number>>({});
+	async function loadDlCounts(appList: App[]) {
+		const results = await Promise.all(
+			appList.map(async (a) => {
+				try {
+					const r = await api.deadLetters.count(a.id);
+					return [a.id, r.unresolved] as const;
+				} catch {
+					return [a.id, 0] as const;
+				}
+			})
+		);
+		const next: Record<string, number> = {};
+		for (const [id, count] of results) next[id] = count;
+		unresolvedDl = next;
+	}
+
 	let showCreate = $state(false);
 	let createSlug = $state('');
 	let createName = $state('');
@@ -49,6 +69,9 @@
 		listError = null;
 		try {
 			apps = await api.apps.list();
+			if (apps && apps.length > 0) {
+				void loadDlCounts(apps);
+			}
 		} catch (e) {
 			listError = e instanceof Error ? e.message : String(e);
 			apps = null;
@@ -201,6 +224,12 @@
 	<div class="primary">
 		<strong>{app.name}</strong>
 		<span class="muted">/{app.slug}</span>
+		{#if unresolvedDl[app.id] > 0}
+			<span
+				class="dl-badge"
+				title="Unresolved dead letters in this app"
+				>{unresolvedDl[app.id]}</span>
+		{/if}
 	</div>
 	<div class="secondary muted">
 		{app.description ?? '—'}
@@ -246,6 +275,19 @@
 		cursor: not-allowed;
 	}
 
+	.dl-badge {
+		display: inline-block;
+		min-width: 1.25rem;
+		padding: 0.1rem 0.4rem;
+		background: #ef4444;
+		color: #fff;
+		border-radius: 999px;
+		font-size: 0.75rem;
+		font-weight: 600;
+		text-align: center;
+		margin-left: 0.5rem;
+	}
+
 	.muted {
 		color: #64748b;
 	}
@@ -37,6 +37,20 @@
 	let domains = $state<AppDomain[]>([]);
 	let members = $state<AppMemberDto[]>([]);
 
+	/// v1.1.1 dead-letters surface — design notes §4 mandates the
+	/// dashboard surface this since there's no default handler.
+	let unresolvedDeadLetters = $state<number>(0);
+	async function loadDeadLetterCount(idOrSlug: string) {
+		try {
+			const r = await api.deadLetters.count(idOrSlug);
+			unresolvedDeadLetters = r.unresolved;
+		} catch {
+			// Non-fatal: the page renders fine without the badge if
+			// the count endpoint is unreachable (e.g. older server).
+			unresolvedDeadLetters = 0;
+		}
+	}
+
 	// Derive UI gates from the capabilities helper so the rules stay
 	// in lockstep with the backend's `can()`. canAdminApp also covers
 	// the Members + Settings + Domains-mutation tabs; canWriteApp
@@ -107,7 +121,11 @@
 		editName = app.name;
 		editDescription = app.description ?? '';
 		editSlug = app.slug;
-		const loaders: Promise<unknown>[] = [loadScripts(app.id), loadDomains(app.id)];
+		const loaders: Promise<unknown>[] = [
+			loadScripts(app.id),
+			loadDomains(app.id),
+			loadDeadLetterCount(app.id)
+		];
 		if (canAdmin) {
 			loaders.push(loadMembers(app.id), loadEligibleUsers());
 		}
@@ -421,6 +439,16 @@
 			class:active={activeTab === 'settings'}
 			onclick={() => (activeTab = 'settings')}>Settings</button
 		>
+		<a
+			class="tab-link"
+			href="{base}/apps/{slug}/dead-letters"
+			title="Dead letters — replay or resolve events that exhausted their retry policy"
+		>
+			Dead letters
+			{#if unresolvedDeadLetters > 0}
+				<span class="dl-badge">{unresolvedDeadLetters}</span>
+			{/if}
+		</a>
 	{/if}
 </nav>
 
@@ -871,6 +899,32 @@
 		border-bottom-color: #38bdf8;
 	}
 
+	.tabs .tab-link {
+		display: inline-flex;
+		align-items: center;
+		gap: 0.4rem;
+		color: #94a3b8;
+		text-decoration: none;
+		padding: 0.6rem 1rem;
+		margin-left: auto;
+		border-bottom: 2px solid transparent;
+		font: inherit;
+	}
+	.tabs .tab-link:hover {
+		color: #e2e8f0;
+	}
+	.dl-badge {
+		display: inline-block;
+		min-width: 1.25rem;
+		padding: 0.1rem 0.4rem;
+		background: #ef4444;
+		color: #fff;
+		border-radius: 999px;
+		font-size: 0.75rem;
+		font-weight: 600;
+		text-align: center;
+	}
+
 	button {
 		background: #38bdf8;
 		color: #0b1220;
310
dashboard/src/routes/apps/[slug]/dead-letters/+page.svelte
Normal file
@@ -0,0 +1,310 @@
+<script lang="ts">
+	import { base } from '$app/paths';
+	import { page } from '$app/state';
+	import { api, ApiError, type App, type DeadLetterRow } from '$lib/api';
+
+	let slug = $derived(page.params.slug ?? '');
+	let app = $state<App | null>(null);
+	let rows = $state<DeadLetterRow[]>([]);
+	let unresolved = $state<number>(0);
+	let loading = $state(true);
+	let error = $state<string | null>(null);
+	let unresolvedOnly = $state(true);
+	let expandedId = $state<string | null>(null);
+
+	async function load() {
+		loading = true;
+		error = null;
+		try {
+			const a = await api.apps.get(slug);
+			app = a;
+			const c = await api.deadLetters.count(slug);
+			unresolved = c.unresolved;
+			const r = await api.deadLetters.list(slug, { unresolved: unresolvedOnly, limit: 100 });
+			rows = r.dead_letters;
+		} catch (e) {
+			error = e instanceof ApiError ? e.message : String(e);
+		} finally {
+			loading = false;
+		}
+	}
+
+	$effect(() => {
+		// Re-load whenever the slug or filter changes.
+		void slug;
+		void unresolvedOnly;
+		void load();
+	});
+
+	async function replay(dlId: string) {
+		try {
+			await api.deadLetters.replay(slug, dlId);
+			await load();
+		} catch (e) {
+			error = e instanceof ApiError ? e.message : String(e);
+		}
+	}
+
+	async function markIgnored(dlId: string) {
+		try {
+			await api.deadLetters.resolve(slug, dlId, 'ignored');
+			await load();
+		} catch (e) {
+			error = e instanceof ApiError ? e.message : String(e);
+		}
+	}
+
+	function toggleExpanded(id: string) {
+		expandedId = expandedId === id ? null : id;
+	}
+
+	function fmtTime(iso: string): string {
+		return new Date(iso).toLocaleString();
+	}
+
+	function truncate(s: string, n: number): string {
+		if (s.length <= n) return s;
+		return s.slice(0, n) + '…';
+	}
+</script>
+
+<svelte:head>
+	<title>Dead letters · {slug} · PiCloud</title>
+</svelte:head>
+
+<div class="container">
+	<header>
+		<div>
+			<a href="{base}/apps/{slug}" class="back">← back to {app?.name ?? slug}</a>
+			<h1>Dead letters</h1>
+			<p class="subtitle">
{#if unresolved > 0}
|
||||
<strong class="badge">{unresolved}</strong> unresolved
|
||||
{:else}
|
||||
No unresolved dead letters
|
||||
{/if}
|
||||
</p>
|
||||
</div>
|
||||
<div class="controls">
|
||||
<label>
|
||||
<input type="checkbox" bind:checked={unresolvedOnly} />
|
||||
Show unresolved only
|
||||
</label>
|
||||
<button onclick={load} disabled={loading}>Refresh</button>
|
||||
</div>
|
||||
</header>
|
||||
|
||||
{#if error}
|
||||
<div class="error">{error}</div>
|
||||
{/if}
|
||||
|
||||
{#if loading}
|
||||
<p>Loading…</p>
|
||||
{:else if rows.length === 0}
|
||||
<p class="empty">
|
||||
{#if unresolvedOnly}
|
||||
No unresolved dead letters for this app. 🎉
|
||||
{:else}
|
||||
No dead letters recorded yet.
|
||||
{/if}
|
||||
</p>
|
||||
{:else}
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Created</th>
|
||||
<th>Source</th>
|
||||
<th>Op</th>
|
||||
<th>Script</th>
|
||||
<th>Attempts</th>
|
||||
<th>First / Last attempt</th>
|
||||
<th>Last error</th>
|
||||
<th>Actions</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{#each rows as row (row.id)}
|
||||
<tr class:resolved={row.resolved_at !== null}>
|
||||
<td>{fmtTime(row.created_at)}</td>
|
||||
<td><code>{row.source}</code></td>
|
||||
<td><code>{row.op}</code></td>
|
||||
<td>{row.script_id ? row.script_id.slice(0, 8) : '—'}</td>
|
||||
<td>{row.attempt_count}</td>
|
||||
<td class="times">
|
||||
<div>{fmtTime(row.first_attempt_at)}</div>
|
||||
<div>{fmtTime(row.last_attempt_at)}</div>
|
||||
</td>
|
||||
<td class="err">
|
||||
<button class="link" onclick={() => toggleExpanded(row.id)}>
|
||||
{truncate(row.last_error, 60)}
|
||||
</button>
|
||||
</td>
|
||||
<td class="actions">
|
||||
{#if row.resolved_at === null}
|
||||
<button onclick={() => replay(row.id)}>Replay</button>
|
||||
<button class="secondary" onclick={() => markIgnored(row.id)}>
|
||||
Mark resolved
|
||||
</button>
|
||||
{:else}
|
||||
<span class="resolution">{row.resolution ?? 'resolved'}</span>
|
||||
{/if}
|
||||
</td>
|
||||
</tr>
|
||||
{#if expandedId === row.id}
|
||||
<tr class="detail">
|
||||
<td colspan="8">
|
||||
<div class="detail-grid">
|
||||
<section>
|
||||
<h3>Payload</h3>
|
||||
<pre>{JSON.stringify(row.payload, null, 2)}</pre>
|
||||
</section>
|
||||
<section>
|
||||
<h3>Last error</h3>
|
||||
<pre>{row.last_error}</pre>
|
||||
</section>
|
||||
</div>
|
||||
</td>
|
||||
</tr>
|
||||
{/if}
|
||||
{/each}
|
||||
</tbody>
|
||||
</table>
|
||||
{/if}
|
||||
</div>
|
||||
|
||||
<style>
|
||||
.container {
|
||||
max-width: 1200px;
|
||||
margin: 0 auto;
|
||||
padding: 2rem;
|
||||
}
|
||||
header {
|
||||
display: flex;
|
||||
justify-content: space-between;
|
||||
align-items: flex-start;
|
||||
margin-bottom: 1rem;
|
||||
gap: 1rem;
|
||||
}
|
||||
.back {
|
||||
font-size: 0.85rem;
|
||||
color: var(--text-muted, #666);
|
||||
text-decoration: none;
|
||||
}
|
||||
.back:hover {
|
||||
text-decoration: underline;
|
||||
}
|
||||
h1 {
|
||||
margin: 0.25rem 0;
|
||||
}
|
||||
.subtitle {
|
||||
color: var(--text-muted, #666);
|
||||
margin: 0;
|
||||
}
|
||||
.badge {
|
||||
display: inline-block;
|
||||
min-width: 1.5rem;
|
||||
padding: 0.1rem 0.4rem;
|
||||
background: #c00;
|
||||
color: #fff;
|
||||
border-radius: 999px;
|
||||
text-align: center;
|
||||
font-weight: 600;
|
||||
}
|
||||
.controls {
|
||||
display: flex;
|
||||
gap: 0.75rem;
|
||||
align-items: center;
|
||||
}
|
||||
.error {
|
||||
background: #fee;
|
||||
border: 1px solid #fbb;
|
||||
color: #900;
|
||||
padding: 0.75rem 1rem;
|
||||
border-radius: 4px;
|
||||
margin-bottom: 1rem;
|
||||
}
|
||||
.empty {
|
||||
color: var(--text-muted, #666);
|
||||
text-align: center;
|
||||
padding: 2rem;
|
||||
}
|
||||
table {
|
||||
width: 100%;
|
||||
border-collapse: collapse;
|
||||
font-size: 0.9rem;
|
||||
}
|
||||
th,
|
||||
td {
|
||||
text-align: left;
|
||||
padding: 0.5rem 0.75rem;
|
||||
border-bottom: 1px solid var(--border, #e0e0e0);
|
||||
vertical-align: top;
|
||||
}
|
||||
th {
|
||||
background: var(--bg-secondary, #f5f5f5);
|
||||
font-weight: 600;
|
||||
}
|
||||
tr.resolved {
|
||||
opacity: 0.6;
|
||||
}
|
||||
.times div {
|
||||
font-size: 0.8rem;
|
||||
white-space: nowrap;
|
||||
}
|
||||
.err button.link {
|
||||
background: none;
|
||||
border: none;
|
||||
color: var(--link, #06c);
|
||||
text-decoration: underline;
|
||||
cursor: pointer;
|
||||
padding: 0;
|
||||
font-family: monospace;
|
||||
font-size: 0.85rem;
|
||||
text-align: left;
|
||||
}
|
||||
.actions {
|
||||
white-space: nowrap;
|
||||
display: flex;
|
||||
gap: 0.4rem;
|
||||
}
|
||||
.actions button.secondary {
|
||||
background: transparent;
|
||||
color: var(--text, #333);
|
||||
border: 1px solid var(--border, #ccc);
|
||||
}
|
||||
.resolution {
|
||||
font-style: italic;
|
||||
color: var(--text-muted, #666);
|
||||
font-size: 0.85rem;
|
||||
}
|
||||
tr.detail td {
|
||||
background: var(--bg-secondary, #fafafa);
|
||||
padding: 0;
|
||||
}
|
||||
.detail-grid {
|
||||
display: grid;
|
||||
grid-template-columns: 2fr 1fr;
|
||||
gap: 1rem;
|
||||
padding: 1rem;
|
||||
}
|
||||
.detail-grid section h3 {
|
||||
margin: 0 0 0.5rem 0;
|
||||
font-size: 0.85rem;
|
||||
text-transform: uppercase;
|
||||
color: var(--text-muted, #666);
|
||||
}
|
||||
.detail-grid pre {
|
||||
background: #fff;
|
||||
border: 1px solid var(--border, #e0e0e0);
|
||||
padding: 0.75rem;
|
||||
border-radius: 4px;
|
||||
font-size: 0.8rem;
|
||||
overflow: auto;
|
||||
max-height: 300px;
|
||||
margin: 0;
|
||||
}
|
||||
code {
|
||||
font-family: monospace;
|
||||
font-size: 0.85rem;
|
||||
}
|
||||
</style>
|
||||
215
docs/stdlib-reference.md
Normal file
@@ -0,0 +1,215 @@
|
||||
# Rhai stdlib reference
|
||||
|
||||
Everything in this document is callable from any user script without
|
||||
imports — Rhai's built-in standard library plus the seven PiCloud
|
||||
utility modules added in v1.1.0. Stateful service modules (KV, docs,
|
||||
HTTP, …) ship in subsequent v1.1.x releases and are documented
|
||||
separately.
|
||||
|
||||
For the architectural shape (why some modules are stateless and
|
||||
register at engine build, why others are per-call), see
|
||||
[sdk-shape.md](sdk-shape.md).
|
||||
|
||||
## Conventions
|
||||
|
||||
- **Throw on failure.** Every function throws a Rhai runtime error on
|
||||
bad input (invalid pattern, invalid encoding, out-of-range arg). Use
|
||||
`try { ... } catch (e) { ... }` if you want to handle it.
|
||||
- **`()` for absent.** Functions that semantically may have no result
|
||||
(e.g. `regex::find` when nothing matches) return `()`. Test with
|
||||
`if v == () { ... }`.
|
||||
- **`bool` for predicates.** Yes/no questions return `bool`.
|
||||
- **UTC, milliseconds, lowercase hex, RFC 3986.** Defaults chosen once,
|
||||
not per call.
|
||||
|
||||
---
|
||||
|
||||
## Rhai built-ins (free with every script)
|
||||
|
||||
These come with the Rhai engine itself. See the
|
||||
[Rhai book](https://rhai.rs/book/lib/index.html) for full signatures.
|
||||
|
||||
**Math:** `+ - * / %`, `min`, `max`, `abs`, `sqrt`, `pow`, `floor`,
|
||||
`ceil`, `round`, `to_int`, `to_float`, `sin`, `cos`, `tan`, `asin`,
|
||||
`acos`, `atan`, `exp`, `ln`, `log`, `PI()`, `E()`.
|
||||
|
||||
**String:** `len`, `is_empty`, `contains`, `starts_with`, `ends_with`,
|
||||
`index_of`, `split`, `trim`, `to_lower`, `to_upper`, `replace`, `chars`,
|
||||
`pad`, `sub_string`, `crop`, `+` (concatenation).
|
||||
|
||||
**Array:** `push`, `pop`, `shift`, `insert`, `remove`, `len`, `clear`,
|
||||
`truncate`, `extend`, `filter`, `map`, `reduce`, `reduce_rev`, `find`,
|
||||
`find_map`, `any`, `all`, `index_of`, `contains`, `sort`, `reverse`,
|
||||
`dedup`, `chunks`, `splice`, `[]` indexing.
|
||||
|
||||
**Map:** `len`, `is_empty`, `contains`, `keys`, `values`, `mixin`,
|
||||
`remove`, `clear`, `fill_with`, `+` (merge), `[]` and `.` access.
|
||||
|
||||
**Blob:** `len`, `push`, `pop`, `clear`, `as_string`, `parse_le_int`,
|
||||
`write_*`, `[]` indexing. Blobs are `Vec<u8>` at the Rust layer.
|
||||
|
||||
**Logging:** `log::trace`, `log::info`, `log::warn`, `log::error` —
|
||||
each takes a message and optionally a structured-data map. (Documented
|
||||
with the SDK contract; mentioned here for completeness.)
|
||||
|
||||
---
|
||||
|
||||
## `regex::` — regular expressions
|
||||
|
||||
Linear-time, no backtracking (powered by the Rust `regex` crate).
|
||||
Patterns compile per call.
|
||||
|
||||
| Function | Description |
|---|---|
| `regex::is_match(pattern, text) -> bool` | Whether `text` contains a match. |
| `regex::find(pattern, text) -> String \| ()` | First match or `()` if none. |
| `regex::find_all(pattern, text) -> Array` | All matches as `String` array. |
| `regex::replace(pattern, text, replacement) -> String` | Replace first match only. |
| `regex::replace_all(pattern, text, replacement) -> String` | Replace every match. |
| `regex::split(pattern, text) -> Array` | Split `text` on matches. |
| `regex::captures(pattern, text) -> Array \| ()` | `[full, group1, group2, ...]` from the first match; unmatched optional groups appear as `()`. |
|
||||
|
||||
Invalid patterns throw. Use `\\` to escape inside Rhai string literals
|
||||
(`"\\d+"`) or backtick strings to skip escaping (`` `\d+` ``).
|
||||
|
||||
```rhai
|
||||
if regex::is_match(`^/api/v\d+/`, ctx.request.path) {
|
||||
let cap = regex::captures(`/api/v(\d+)/(.+)`, ctx.request.path);
|
||||
let version = cap[1]; // "1"
|
||||
let rest = cap[2]; // "users"
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## `random::` — cryptographically-secure randomness
|
||||
|
||||
All randomness comes from `OsRng`. There is deliberately no "fast
|
||||
non-crypto" variant — scripts shouldn't have to pick.
|
||||
|
||||
| Function | Description |
|---|---|
| `random::int(min, max) -> i64` | Uniform integer in `[min, max]` (inclusive). Throws if `min > max`. |
| `random::float() -> f64` | Uniform float in `[0.0, 1.0)`. |
| `random::bytes(n) -> Blob` | `n` random bytes. `n` in `0..=65536`. |
| `random::string(n) -> String` | `n` random alphanumeric chars (`A-Za-z0-9`). `n` in `0..=4096`. |
| `random::uuid() -> String` | UUID v4 in canonical 8-4-4-4-12 form. |
|
||||
|
||||
```rhai
|
||||
let token = random::uuid();
|
||||
let salt = random::bytes(16);
|
||||
let pin = random::int(100000, 999999);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## `time::` — UTC time
|
||||
|
||||
Canonical time value is **milliseconds since the Unix epoch** as `i64`.
|
||||
ISO 8601 / RFC 3339 strings are for I/O. UTC only — no timezone support.
|
||||
|
||||
| Function | Description |
|---|---|
| `time::now() -> String` | Current UTC time as ISO 8601 with ms (e.g. `"2026-05-30T20:15:00.123Z"`). |
| `time::now_ms() -> i64` | Current ms since Unix epoch. |
| `time::parse(iso) -> i64` | Parse RFC 3339 / ISO 8601 string to ms. Throws on bad input. |
| `time::format(ms) -> String` | Format ms-since-epoch as ISO 8601 with ms precision. |
| `time::add_seconds(ms, secs) -> i64` | `ms + secs*1000`, with overflow check. |
| `time::diff_seconds(a_ms, b_ms) -> i64` | `(b_ms - a_ms) / 1000`, truncated. |
|
||||
|
||||
```rhai
|
||||
let started_at = time::now_ms();
|
||||
// ... do work ...
|
||||
let elapsed = time::diff_seconds(started_at, time::now_ms());
|
||||
|
||||
let deadline = time::format(time::add_seconds(time::now_ms(), 3600));
|
||||
```
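The arithmetic in the table can be illustrated outside Rhai. The following is a TypeScript sketch, not the PiCloud implementation: `addSeconds` and `diffSeconds` are hypothetical stand-ins for `time::add_seconds` and `time::diff_seconds`, JavaScript's safe-integer range stands in for the `i64` overflow check, and truncation toward zero is an assumption based on how Rust integer division behaves.

```typescript
// Hypothetical stand-in for time::add_seconds: ms + secs*1000 with an overflow check.
// Assumption: the safe-integer range plays the role of the i64 bounds.
function addSeconds(ms: number, secs: number): number {
  const result = ms + secs * 1000;
  if (!Number.isSafeInteger(result)) {
    throw new RangeError("time overflow");
  }
  return result;
}

// Hypothetical stand-in for time::diff_seconds: (b_ms - a_ms) / 1000,
// truncated toward zero (assumed), so partial seconds are dropped.
function diffSeconds(aMs: number, bMs: number): number {
  return Math.trunc((bMs - aMs) / 1000);
}
```

Under these assumptions an elapsed time of 1999 ms reports as 1 second, and swapping the arguments reports -1 rather than -2.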
|
||||
|
||||
---
|
||||
|
||||
## `json::` — JSON parse and stringify
|
||||
|
||||
| Function | Description |
|---|---|
| `json::parse(s) -> Dynamic` | Parse a JSON string. Returns Rhai maps, arrays, scalars, or `()` for null. Throws on invalid JSON. |
| `json::stringify(v) -> String` | Compact JSON. |
| `json::stringify_pretty(v) -> String` | Pretty-printed (2-space indent). |
|
||||
|
||||
```rhai
|
||||
let payload = json::parse(ctx.request.body); // if body came in as a string
|
||||
let body_str = json::stringify(#{ ok: true, items: [1, 2, 3] });
|
||||
```
|
||||
|
||||
Note: `ctx.request.body` is *already* parsed when the request body is
|
||||
`Content-Type: application/json` — only call `json::parse` on raw
|
||||
strings.
|
||||
|
||||
---
|
||||
|
||||
## `base64::` — standard and URL-safe Base64
|
||||
|
||||
Two alphabets: standard (with `=` padding) and URL-safe (no padding).
|
||||
Encoders accept both `String` and `Blob`; decoders always return `Blob`.
|
||||
|
||||
| Function | Description |
|---|---|
| `base64::encode(input) -> String` | Standard alphabet, padded. `input` is `String` or `Blob`. |
| `base64::decode(s) -> Blob` | Decode standard alphabet. Throws on invalid. |
| `base64::encode_url(input) -> String` | URL-safe alphabet, **no padding**. |
| `base64::decode_url(s) -> Blob` | Decode URL-safe alphabet. Throws on invalid. |
|
||||
|
||||
```rhai
|
||||
let token = base64::encode_url(random::bytes(32)); // URL-safe session token
|
||||
let raw = base64::decode("aGVsbG8=");
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## `hex::` — hexadecimal
|
||||
|
||||
Encode produces lowercase. Decode accepts mixed case.
|
||||
|
||||
| Function | Description |
|---|---|
| `hex::encode(input) -> String` | Lowercase hex. `input` is `String` or `Blob`. |
| `hex::decode(s) -> Blob` | Decode hex (case-insensitive). Throws on invalid. |
|
||||
|
||||
```rhai
|
||||
let fingerprint = hex::encode(random::bytes(20));
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## `url::` — percent-encoding
|
||||
|
||||
Unreserved set per RFC 3986 (`A-Z`, `a-z`, `0-9`, `-`, `_`, `.`, `~`)
|
||||
is preserved; everything else is percent-encoded.
|
||||
|
||||
| Function | Description |
|---|---|
| `url::encode(s) -> String` | Percent-encode a component value. |
| `url::decode(s) -> String` | Percent-decode. Throws on invalid UTF-8 in the decoded output. |
| `url::encode_query(map) -> String` | Build `k1=v1&k2=v2` from a Map. Both keys and values are percent-encoded. Non-string values are coerced via `to_string()`. |
|
||||
|
||||
`url::encode_query` emits keys in the Map's natural order, which is
|
||||
alphabetical (Rhai's `Map` is a `BTreeMap`). RFC 3986 leaves query
|
||||
parameter ordering unspecified, so this is fine for any conforming
|
||||
consumer; if you need a specific ordering, build the string by hand.
|
||||
|
||||
```rhai
|
||||
let qs = url::encode_query(#{ q: "rust regex", page: 2 });
|
||||
// → "page=2&q=rust%20regex"
|
||||
```
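To make the ordering and encoding rules concrete, here is a TypeScript sketch of the behaviour described above. It is an illustration, not the PiCloud implementation: `encodeComponent` and `encodeQuery` are hypothetical names, and the sort matches `BTreeMap` ordering only for ASCII keys.

```typescript
// Percent-encode everything outside the RFC 3986 unreserved set.
// encodeURIComponent leaves ! ' ( ) * untouched, so those are escaped by hand.
function encodeComponent(s: string): string {
  return encodeURIComponent(s).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Hypothetical analogue of url::encode_query: keys are emitted in sorted order,
// and non-string values are coerced with String().
function encodeQuery(params: Record<string, string | number | boolean>): string {
  return Object.keys(params)
    .sort()
    .map((k) => `${encodeComponent(k)}=${encodeComponent(String(params[k]))}`)
    .join("&");
}
```

Calling `encodeQuery({ q: "rust regex", page: 2 })` yields `page=2&q=rust%20regex`, the same string as the Rhai example.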
|
||||
|
||||
---
|
||||
|
||||
## What's not here
|
||||
|
||||
- **Crypto** (sha256/hmac/argon2/encryption) — deferred to a focused
|
||||
later PR.
|
||||
- **Timezones** — UTC only in v1.1.0. Format with an offset upstream
|
||||
if you need local time.
|
||||
- **JWT, YAML, XML, CSV, Markdown** — not planned for v1.1.x.
|
||||
- **Stateful services** (KV, docs, HTTP, cron, files, pubsub, secrets,
|
||||
email, users, queue, invoke) — land per the v1.1.x roadmap in the
|
||||
[blueprint §12](../serverless_cloud_blueprint.md).
|
||||
617
docs/v1.1.x-design-notes.md
Normal file
@@ -0,0 +1,617 @@
|
||||
# v1.1.x design notes — in-flight decisions + revised roadmap
|
||||
|
||||
Planning document for the v1.1.x release series. Companion to:
|
||||
|
||||
- [`serverless_cloud_blueprint.md`](../serverless_cloud_blueprint.md) — authoritative design
|
||||
- [`docs/sdk-shape.md`](sdk-shape.md) — SDK conventions (settled in v1.1.0)
|
||||
- [`docs/stdlib-reference.md`](stdlib-reference.md) — stdlib API (settled in v1.1.0)
|
||||
- [`docs/versioning.md`](versioning.md) — versioning policy (post-1.0 carve-out settled with v1.1.0)
|
||||
|
||||
Items in this doc are either **tentatively decided but not yet shipped** or **open calls awaiting the maintainer's decision**. Once an item ships, its content moves into the blueprint and the corresponding section here gets pruned.
|
||||
|
||||
This document was created at the v1.1.0 → v1.1.1 boundary, capturing the architectural conversations that followed v1.1.0 but haven't yet landed in code or in the blueprint.
|
||||
|
||||
---
|
||||
|
||||
## 1. The three messaging primitives
|
||||
|
||||
PiCloud will expose three distinct messaging concepts. The right way to slice them is along **recipient model** and **delivery semantics**:
|
||||
|
||||
| | Recipients | Durability | Delivery | Retry on script failure | Mental model |
|---|---|---|---|---|---|
| **`invoke(script_id, args)`** | One **named** script | None (or fire-and-forget durable) | At-most-once sync, or at-least-once async | Caller-controlled via `retry::*` | Function call |
| **`pubsub::publish_durable(topic, msg)`** | **All** scripts subscribed via trigger | Through outbox | **At-least-once per subscriber** | Per-subscriber retry up to N, then dead-letter | Fan-out broadcast (persisted) |
| **`pubsub::publish_ephemeral(topic, msg)`** *(future)* | **All** scripts subscribed via trigger | None (in-memory NOTIFY) | **At-most-once per subscriber** | None | Fan-out broadcast (best-effort) |
| **`queue::enqueue(name, msg)`** | **Exactly one** consumer wins | Durable table | **At-least-once total** | Visibility timeout + nack-on-throw | Work distribution |
|
||||
|
||||
**Critical distinction:** pub/sub and queue both end up at-least-once, but the **subscriber model** differs. Queue: 1 message → 1 delivery record → consumers compete. Pub/sub: 1 message → N delivery records (one per subscriber) → no competition.
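The difference in delivery records can be sketched in a few lines of TypeScript. This is a hedged illustration only: `PubsubTrigger`, `Delivery`, `fanOut`, and `enqueueOne` are hypothetical names, and the real outbox and queue rows are expected to carry more columns than shown here.

```typescript
// Hypothetical shapes; real rows will have more fields.
interface PubsubTrigger {
  triggerId: string;
  topic: string;
  enabled: boolean;
}

interface Delivery {
  triggerId: string | null; // null for queue messages: no subscriber is chosen up front
  payload: unknown;
  attempts: number;
}

// Pub/sub: one message becomes N delivery records, one per subscribed trigger.
// Each record is then retried and dead-lettered independently.
function fanOut(topic: string, payload: unknown, triggers: PubsubTrigger[]): Delivery[] {
  return triggers
    .filter((t) => t.enabled && t.topic === topic)
    .map((t) => ({ triggerId: t.triggerId, payload, attempts: 0 }));
}

// Queue: one message becomes exactly one record that consumers compete for.
function enqueueOne(payload: unknown): Delivery[] {
  return [{ triggerId: null, payload, attempts: 0 }];
}
```

With three enabled subscribers on a topic, `fanOut` produces three records while `enqueueOne` always produces one, which is the whole of the distinction above.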
|
||||
|
||||
### Pub/sub reframe — durable through the outbox, ephemeral as named escape hatch
|
||||
|
||||
The original blueprint plan was pub/sub via Postgres `LISTEN/NOTIFY` (ephemeral, sub-millisecond fan-out). Reframe to **reuse the triggers framework's outbox infrastructure for the durable path, and keep ephemeral as a separately-named future API**:
|
||||
|
||||
- `pubsub::publish_durable(topic, msg)` writes to the outbox (v1.1.5)
|
||||
- Dispatcher fans out one delivery record per subscribed script trigger
|
||||
- Each delivery retried on failure with the same machinery as KV / doc / file triggers
|
||||
- After N retries → dead-letter (see §4)
|
||||
- `pubsub::publish_ephemeral(topic, msg)` is committed as a future addition for the in-memory `LISTEN/NOTIFY` path — not shipped in v1.1.5, but the API split is decided now so users learn "durable by default, opt into ephemeral" from the start (rather than the reverse, which would be a breaking rename later).
|
||||
|
||||
**Wins:** one delivery model in the whole system for the durable path, durable pub/sub for free, shared observability/retry/dead-letter tooling across every event-firing surface.
|
||||
|
||||
**Cost:** ~1ms Postgres write per `publish_durable` (vs in-memory NOTIFY). For solo-dev / consumer hardware, the right tradeoff. The ephemeral escape hatch exists for sub-ms / high-frequency workloads if/when they emerge.
|
||||
|
||||
**Note on durability semantics.** "Durable" here means the outbox row persists, not that fan-out is transactional with the publisher's own data writes. A script doing `kv.set(...)` then `pubsub::publish_durable(...)` performs two separate writes; a crash between them can drop the publish. This is weaker than a strict transactional outbox, where the data write and the outbox row commit in one transaction, but it is consistent with how KV / doc / file triggers already work.
|
||||
|
||||
### Queue stays separate
|
||||
|
||||
Pub/sub-through-outbox cannot model "work distribution with backpressure" cleanly. Queue keeps its own table:
|
||||
|
||||
- Producer: `queue::enqueue(name, msg)` → queue table
|
||||
- Consumer: `queue:receive` trigger fires when message available; runtime claims with `FOR UPDATE SKIP LOCKED` + visibility timeout
|
||||
- Script returns successfully → auto-ack (delete row)
|
||||
- Script throws → auto-nack (clear claim; message becomes visible again)
|
||||
- Visibility timeout exceeded → reclaim allowed (handles crashed consumers)
|
||||
- Max delivery attempts → dead-letter
|
||||
|
||||
The queue table IS the outbox for queue semantics — no double-buffering.
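The claim / ack / nack lifecycle listed above can be modelled in memory. The following TypeScript sketch is an assumption-laden illustration, not the planned implementation: `SketchQueue` and its fields are hypothetical, an array stands in for the queue table, and the `FOR UPDATE SKIP LOCKED` claim is reduced to a single-threaded scan.

```typescript
interface QueueMessage {
  id: number;
  payload: unknown;
  attempts: number;
  claimedUntil: number | null; // null = visible to consumers
}

class SketchQueue {
  private messages: QueueMessage[] = [];
  deadLetters: QueueMessage[] = [];
  private nextId = 1;
  private visibilityMs: number;
  private maxAttempts: number;

  constructor(visibilityMs: number, maxAttempts: number) {
    this.visibilityMs = visibilityMs;
    this.maxAttempts = maxAttempts;
  }

  enqueue(payload: unknown): number {
    const id = this.nextId++;
    this.messages.push({ id, payload, attempts: 0, claimedUntil: null });
    return id;
  }

  // Claim the first visible message. An expired claim counts as visible,
  // which is how a crashed consumer's message gets picked up again.
  claim(now: number): QueueMessage | null {
    for (const m of [...this.messages]) {
      const visible = m.claimedUntil === null || m.claimedUntil <= now;
      if (!visible) continue;
      if (m.attempts >= this.maxAttempts) {
        this.moveToDeadLetters(m);
        continue;
      }
      m.attempts += 1;
      m.claimedUntil = now + this.visibilityMs;
      return m;
    }
    return null;
  }

  // Script returned successfully: delete the row.
  ack(id: number): void {
    this.messages = this.messages.filter((m) => m.id !== id);
  }

  // Script threw: clear the claim, or dead-letter once attempts are exhausted.
  nack(id: number): void {
    const m = this.messages.find((x) => x.id === id);
    if (m === undefined) return;
    if (m.attempts >= this.maxAttempts) {
      this.moveToDeadLetters(m);
    } else {
      m.claimedUntil = null;
    }
  }

  private moveToDeadLetters(m: QueueMessage): void {
    this.ack(m.id);
    this.deadLetters.push(m);
  }
}
```

Passing `now` explicitly keeps the sketch deterministic; a real consumer would read the clock at claim time.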
|
||||
|
||||
### Status
|
||||
|
||||
- **Durable pub/sub via trigger outbox**: ✅ Decided 2026-06-01 — ship as `pubsub::publish_durable` in v1.1.5.
|
||||
- **Ephemeral pub/sub**: ✅ Committed 2026-06-01 as a future addition named `pubsub::publish_ephemeral`. Not in v1.1.5; the explicit-naming split lands now so the durable default doesn't need a breaking rename later.
|
||||
- **Drop `LISTEN/NOTIFY` for v1.1.5**: ✅ Decided 2026-06-01.
|
||||
- **Queue stays separate from pub/sub**: ✅ Decided 2026-06-01 — two distinct top-level namespaces (`queue::*` and `pubsub::*`); no unifying `messaging::*` abstraction. Rationale: the two have genuinely different mental models (work distribution vs fan-out), the implementations share almost no code (queue needs `FOR UPDATE SKIP LOCKED` + visibility timeout + nack-on-throw; pub/sub needs per-subscriber fan-out + independent retry/dead-letter), and a unified API would force users to choose a mode they already know from the use case. A future Kafka-shaped consumer-group unification was considered and rejected — PiCloud is outbox-based, not log-based, so going Kafka-shaped would mean rebuilding storage.
|
||||
|
||||
### Open calls
|
||||
|
||||
1. ~~Pub/sub durability via trigger outbox~~ — ✅ Decided 2026-06-01: yes, both `publish_durable` (v1.1.5) and `publish_ephemeral` (future) committed with explicit names.
|
||||
2. ~~Queue and pub/sub stay separate concepts~~ — ✅ Decided 2026-06-01: separate top-level namespaces; no unifying messaging abstraction.
|
||||
|
||||
---
|
||||
|
||||
## 2. Universal trigger outbox
|
||||
|
||||
The triggers framework's outbox should be the universal substrate for **async dispatch**. Every event source that fires scripts asynchronously writes to the same outbox table; one dispatcher reads from it and routes to the executor with shared load control, retry, dead-letter, and trigger-depth tracking.
|
||||
|
||||
### What runs through the outbox
|
||||
|
||||
| Ingress | Path | Reason |
|---|---|---|
| **HTTP request (sync)** | Direct: orchestrator → executor → response (with NATS-style indirection; see §3) | Caller is waiting; the inbox pattern makes this work via the outbox |
| **HTTP request (async, opt-in)** | Orchestrator writes outbox → returns 202 → dispatcher → executor | Webhooks, fire-and-forget endpoints; explicit opt-in via route config |
| **Cron tick** | Scheduler writes outbox → dispatcher → executor | No caller; naturally async |
| **KV / doc / file change** | Service writes outbox → dispatcher → executor | No caller; the originating script already returned |
| **Pub/sub publish** | Service writes outbox → dispatcher → executor (per subscriber) | Fan-out semantics |
| **Queue message** | Queue table IS the outbox; dispatcher claims via `FOR UPDATE SKIP LOCKED` | Avoids double-buffering |
| **Inbound email** | SMTP receiver writes outbox → dispatcher → executor | No caller |
|
||||
|
||||
### What this gives
|
||||
|
||||
1. **One dispatcher = one place** for load control (the existing `ExecutionGate`), retry, dead-letter, trigger-depth tracking, fan-out. New event source = "write to outbox in this shape", nothing else.
|
||||
2. **Routes become a trigger kind**, conceptually. A route is `(source=http, filter=method+path, script_id, dispatch_mode=sync|async)`. Schema-wise the `routes` table likely stays separate from the new `triggers` table (polymorphic JSON columns get ugly), but the mental model collapses to "everything that fires a script is a trigger".
|
||||
3. **`dispatch_mode = async` is a per-route opt-in**. Webhook handlers can return 202 immediately and process in the background — dispatcher handles retries, caller gets a snappy ack.
|
||||
4. **Replay and debugging.** Every async invocation has an outbox row; admin can re-fire a trigger by re-dispatching the row.
|
||||
5. **Decoupled lifecycle.** Dispatcher can be paused for maintenance without affecting HTTP ingress (it just queues); HTTP can degrade (overflow 503s) without affecting async work already in the outbox.
|
||||
|
||||
### What this doesn't change
|
||||
|
||||
- Sync HTTP still hits the `ExecutionGate` the same way (now via the dispatcher).
|
||||
- Async outbox dispatch also hits the gate when the dispatcher picks a row. Sync and async share the cap on actual blocking-thread-in-use.
|
||||
- Trigger CRUD likely stays in per-kind tables for schema sanity; the unification is conceptual + dispatch-layer, not schema-layer.
|
||||
|
||||
### Status
|
||||
|
||||
- **Universal outbox for async dispatch**: ✅ Decided 2026-06-01 — yes; all async ingress (KV/cron/pubsub/queue/email/dead-letter) writes to one outbox; one dispatcher reads it.
|
||||
- **Sync HTTP via outbox (NATS-style inbox)**: ✅ Decided 2026-06-01 — in-process oneshot in v1.1.1; cluster-mode keeps the door open for `LISTEN/NOTIFY` keyed on `inbox_id` in v1.3+ (see §3 implementation table).
|
||||
- **Routes-as-trigger conceptually**: ✅ yes — the dispatch layer treats routes and triggers uniformly.
|
||||
- **Trigger storage shape: Layout E (parent + per-kind detail tables)**: ✅ Decided 2026-06-01. One shared `triggers` parent with common columns (`id`, `app_id`, `script_id`, `kind`, `enabled`, `dispatch_mode`, retry config, timestamps); one `<kind>_trigger_details` table per service (`kv_trigger_details`, `cron_trigger_details`, `pubsub_trigger_details`, `queue_trigger_details`, `email_trigger_details`, `dead_letter_trigger_details`). Outbox FKs to `triggers.id`; dead-letters FK same. Exact column set (notably `outbox.app_id` denormalization, whether `script_id` also lives on outbox, ON DELETE behavior on the parent vs detail tables) will be refined when v1.1.1 implementation lands.
|
||||
- **`routes` table stays separate from the `triggers` parent for now**: ✅ Decided 2026-06-01. `routes` is Phase-3 production schema with its own trie-index columns; folding into the parent is a v1.2 cleanup, not a v1.1.1 requirement. Outbox discriminates HTTP rows via `source_kind = 'http'` and `trigger_id` referencing `routes.id` for HTTP, `triggers.id` for everything else.
|
||||
- **Per-route `dispatch_mode: sync|async`**: ✅ Decided 2026-06-01 — ships in v1.1.1. Async returns `202 Accepted` with a JSON body `{ "accepted_at": "...", "execution_id": "..." }`. `dispatch_mode` is a route property fixed at route creation; scripts cannot switch modes mid-call.
|
||||
|
||||
### Open calls
|
||||
|
||||
1. ~~Sync HTTP via outbox + per-request inbox~~ — ✅ Decided 2026-06-01: yes via outbox; in-process oneshot now, `LISTEN/NOTIFY` explicitly preserved for cluster mode (v1.3+).
|
||||
2. ~~Ship `dispatch_mode: async` in v1.1.1~~ — ✅ Decided 2026-06-01: yes; `202 Accepted` + JSON body with `execution_id`; route-level config only.
|
||||
3. ~~Trigger storage shape~~ — ✅ Decided 2026-06-01: Layout E (parent + per-kind detail tables); `routes` stays its own table for v1.1.x. Exact column set deferred to implementation PR.
|
||||
|
||||
---
|
||||
|
||||
## 3. NATS-style request/reply for sync HTTP
|
||||
|
||||
The constraint that makes "universal outbox" tricky: HTTP has a caller waiting. We can't write to outbox, return 202, and walk away — the user's browser expects `200 OK` with body. NATS's request/reply pattern resolves this elegantly.
|
||||
|
||||
### Pattern
|
||||
|
||||
```
HTTP request → orchestrator generates inbox_id, registers a oneshot channel
             → writes outbox row { source: http, payload, reply_to: inbox_id }
             → awaits on the channel (with timeout = script's wall-clock + buffer)

Dispatcher   → picks outbox row
             → dispatches to executor (gate + spawn_blocking + Rhai)
             → if reply_to.is_some(): resolves the channel with the result
             → if reply_to.is_none(): records completion + retries on failure per trigger config

Orchestrator → channel resolves → returns response to HTTP caller
             → on timeout: returns 504 or 500 → see status-code calls below
```
|
||||
|
||||
The HTTP caller's experience is unchanged (synchronous request/response). Under the hood, dispatch is identical for every invocation source.
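The inbox bookkeeping in the pattern above can be sketched as follows. This TypeScript model is illustrative only: `SketchInboxRegistry`, `OutboxRow`, and `dispatchRow` are hypothetical names, a single-use callback stands in for the oneshot channel, and the script runs synchronously instead of going through the gate and executor.

```typescript
type ScriptResult = { ok: true; body: string } | { ok: false; error: string };
type Reply = (result: ScriptResult) => void;

class SketchInboxRegistry {
  private pending = new Map<string, Reply>();
  private counter = 0;

  // Orchestrator side: register a reply slot and get the inbox_id for the outbox row.
  register(reply: Reply): string {
    this.counter += 1;
    const inboxId = `inbox-${this.counter}`;
    this.pending.set(inboxId, reply);
    return inboxId;
  }

  // Dispatcher side: deliver the result at most once.
  // Returns false if the inbox is gone (already resolved, or cancelled on timeout).
  resolve(inboxId: string, result: ScriptResult): boolean {
    const reply = this.pending.get(inboxId);
    if (reply === undefined) return false;
    this.pending.delete(inboxId);
    reply(result);
    return true;
  }

  // Orchestrator side: on timeout, drop the slot so a late result is discarded.
  cancel(inboxId: string): boolean {
    return this.pending.delete(inboxId);
  }
}

interface OutboxRow {
  source: string;
  payload: unknown;
  replyTo: string | null;
}

// One attempt per row. Rows with reply_to surface the result to the inbox;
// rows without it would be recorded and retried by machinery not shown here.
function dispatchRow(
  row: OutboxRow,
  run: (payload: unknown) => ScriptResult,
  inboxes: SketchInboxRegistry
): "replied" | "dropped" | "recorded" {
  const result = run(row.payload);
  if (row.replyTo !== null) {
    return inboxes.resolve(row.replyTo, result) ? "replied" : "dropped";
  }
  return "recorded";
}
```

Because `resolve` removes the slot before invoking the callback, a second resolution of the same `inbox_id` is a no-op, which mirrors the single-use nature of a oneshot channel.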
|
||||
|
||||
### Implementation by deployment mode
|
||||
|
||||
| Mode | Mechanism | Trade-off |
|---|---|---|
| **In-process (v1.1.1, MVP)** | Per-orchestrator `HashMap<InboxId, oneshot::Sender<Result>>`; dispatcher resolves the oneshot | Sub-ms wake-up; fails across process boundaries |
| **Cross-process (cluster mode v1.3+)** | Postgres `LISTEN/NOTIFY` keyed on `inbox_id`, with a `responses` row as durable backup | Sub-10ms wake-up; survives across nodes; needs careful long-listener management |
| **Polling fallback** | Orchestrator polls `responses` table for `inbox_id` every ~10ms | Simple; ~10ms minimum latency; only as fallback |
|
||||
|
||||
### Latency cost (honest numbers)
|
||||
|
||||
Per sync HTTP request, NATS-style adds: ~1-2ms Postgres write (outbox) + sub-ms dispatcher wake (in-process channel) + ~1ms response resolve = **~2-5ms overhead**. For most scripts (10-100ms execution), this is noise. PiCloud isn't optimizing for sub-ms; the architectural unification is worth a few ms.
|
||||
|
||||
### Default retry policy — decided
|
||||
|
||||
✅ Decided 2026-06-01:
|
||||
|
||||
| Knob | Default | Env override | Per-trigger column |
|---|---|---|---|
| Max attempts | 3 | `PICLOUD_TRIGGER_RETRY_MAX_ATTEMPTS` | `retry_max_attempts` |
| Backoff shape | exponential | `PICLOUD_TRIGGER_RETRY_BACKOFF` (`exponential` \| `linear` \| `constant`) | `retry_backoff` |
| Base delay | 1000ms | `PICLOUD_TRIGGER_RETRY_BASE_MS` | `retry_base_ms` |
| Jitter | ±20% | `PICLOUD_TRIGGER_RETRY_JITTER_PCT` | (not per-trigger; dispatcher-side) |
|
||||
|
||||
With the defaults, the delay after each failed attempt is **~1s / ~2s / ~4s** (each ±20%), for a total time-to-dead-letter of ~7s.
|
||||
|
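The schedule above can be reproduced with a small calculation. This is a sketch under assumptions: the `Backoff` enum and both function names are illustrative, the linear shape is assumed to mean `base * n`, and only the jitter bounds are computed (drawing a random value inside them is left out to keep the example free of external crates).

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Backoff {
    Exponential,
    Linear,
    Constant,
}

/// Nominal delay in ms before the retry that follows failed attempt `failed_attempt` (1-based).
fn nominal_delay_ms(shape: Backoff, base_ms: u64, failed_attempt: u32) -> u64 {
    let n = failed_attempt.max(1);
    match shape {
        // base, 2*base, 4*base, ... (the shift is capped so it cannot overflow)
        Backoff::Exponential => base_ms.saturating_mul(1u64 << (n - 1).min(32)),
        // Assumed meaning of "linear": base, 2*base, 3*base, ...
        Backoff::Linear => base_ms.saturating_mul(u64::from(n)),
        Backoff::Constant => base_ms,
    }
}

/// Inclusive (low, high) bounds after applying plus/minus `jitter_pct` percent.
fn jitter_bounds_ms(nominal_ms: u64, jitter_pct: u64) -> (u64, u64) {
    let spread = nominal_ms.saturating_mul(jitter_pct) / 100;
    (
        nominal_ms.saturating_sub(spread),
        nominal_ms.saturating_add(spread),
    )
}
```

With the defaults (exponential, 1000ms base, 20% jitter), the first delay falls between 800ms and 1200ms.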
||||
**What triggers a retry:** any failure: a Rhai runtime error, a wall-clock timeout, an exceeded operation budget, or a platform-side failure (Postgres unavailable, executor crashed). All four are treated alike because distinguishing them in the dispatcher is fiddly and the retry cost is bounded by `max_attempts`; if op-budget retries become dead-letter spam in practice, revisit.
|
||||
|
||||
**Per-trigger override:** the three retry columns on the `triggers` parent table (Layout E) take precedence over the env-configured defaults. Trigger CRUD endpoints accept these on create/update; if omitted, the env defaults are applied at write time (not lazily at dispatch — keeps the policy auditable from the row itself).
|
||||
|
||||
**Sync HTTP exception:** unchanged. `reply_to.is_some()` rows are never retried regardless of policy (see below).
|
||||
|
||||
### Retry policy — `reply_to` IS the signal
|
||||
|
||||
| Outbox row | Retry behavior |
|
||||
|---|---|
|
||||
| `reply_to.is_some()` | **Never retry.** Caller is waiting; retrying means the script might run twice and the caller gets one of two outcomes. Always: one attempt, surface result (success or failure) to inbox. |
|
||||
| `reply_to.is_none()` | Retry per trigger's configured policy. Default: 3 attempts, exponential backoff (1s, 2s, 4s), dead-letter after. |
|
||||
|
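The table reduces to a single decision after a failed attempt. The sketch below is illustrative only: `NextStep` and `after_failure` are hypothetical names, and `reply_to` is modeled as an optional numeric inbox id.

```rust
#[derive(Debug, PartialEq)]
enum NextStep {
    /// Sync HTTP: one attempt only, the failure goes back to the waiting caller.
    SurfaceToInbox,
    Retry,
    DeadLetter,
}

/// Decide what happens after a failed attempt.
/// `attempts_so_far` counts the attempt that just failed.
fn after_failure(reply_to: Option<u64>, attempts_so_far: u32, max_attempts: u32) -> NextStep {
    match reply_to {
        // A caller is waiting: never retry, whatever the trigger policy says.
        Some(_) => NextStep::SurfaceToInbox,
        None if attempts_so_far < max_attempts => NextStep::Retry,
        None => NextStep::DeadLetter,
    }
}
```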
||||
Per-trigger config lives on the trigger row:
|
||||
|
||||
```
|
||||
trigger { source: cron, schedule: "0 */5 * * * *",
|
||||
retry: { max_attempts: 5, backoff: exponential, base_ms: 1000 } }
|
||||
|
||||
trigger { source: pubsub, topic: "user.created",
|
||||
retry: { max_attempts: 3, backoff: linear, base_ms: 500 } }
|
||||
|
||||
trigger { source: http, method: POST, path: "/api/foo",
|
||||
dispatch_mode: sync } // retry absent — sync HTTP is always 1-attempt
|
||||
```
|
||||
|
||||
### Failure / crash handling
|
||||
|
||||
With NATS-style indirection, there are new ways for a sync HTTP request to vanish. Every failure path must resolve the orchestrator's oneshot channel with something:
|
||||
|
||||
| Failure mode | Detection | Caller sees |
|
||||
|---|---|---|
|
||||
| Script throws / runtime error | Executor returns `ExecError::Runtime` → written to inbox | 502 (or 500 — see status-code discussion) |
|
||||
| Script exceeds wall-clock | `tokio::time::timeout` fires inside dispatcher → written to inbox | 504 (or 500) |
|
||||
| Operation budget exceeded | Executor returns `ExecError::OperationBudgetExceeded` → inbox | 507 (or 500) |
|
||||
| Executor process crashes mid-execution | `JoinError` → `ExecError::Runtime` → inbox | 500 |
|
||||
| Dispatcher process dies between claim and reply | Orchestrator's wait times out | 500 |
|
||||
| Outbox write fails (Postgres unavailable) | Orchestrator never publishes; immediate error | 500 |
|
||||
| Orchestrator's own wait times out unexpectedly | Channel timeout fires before inbox resolves | 504 (or 500) |
|
||||
|
||||
Every path resolves the channel with a result. The orchestrator's outer timeout is the backstop for "dispatcher just died completely".
|
||||
|
||||
### Status code strategy — decided
|
||||
|
||||
✅ Decided 2026-06-01: keep the granular status codes (Option A), with one refinement — `500` is reserved for **platform** problems (dispatcher vanished, outbox write failed, inbox channel timed out unexpectedly), not used as a generic catch-all.
|
||||
|
||||
| Code | Cause | Who's at fault |
|
||||
|---|---|---|
|
||||
| 422 | Request validation failed | Client |
|
||||
| 502 | Script threw / Rhai runtime error | User script |
|
||||
| 503 | Gate refused (overloaded); `Retry-After: 1` | Platform (capacity) |
|
||||
| 504 | Wall-clock timeout | Either (slow script or platform overload) |
|
||||
| 507 | Operation budget exceeded | User script |
|
||||
| 500 | Dispatcher vanished / outbox write failed / inbox channel timed out unexpectedly | Platform (bug or infra) |
|
||||
|
||||
Rationale: each code is actionable for the caller (back off, redesign as async, fix the script, file a bug). Flattening to `500` would collapse "script crashed" vs "overloaded" vs "your timeout is too tight" vs "platform broke" into one undifferentiated signal — losing both client-facing UX and our own observability/alerting axis.
|
||||
|
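As a compact restatement of the table, the mapping could look like the sketch below. The `SyncOutcome` enum and both function names are assumptions made for illustration; only the codes and the `Retry-After: 1` value come from the table.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SyncOutcome {
    ValidationFailed,
    ScriptError,
    GateRefused,
    WallClockTimeout,
    OperationBudgetExceeded,
    /// Dispatcher vanished, outbox write failed, or the inbox channel timed out unexpectedly.
    PlatformFailure,
}

/// HTTP status for a failed sync request, following the table above.
fn status_code(outcome: SyncOutcome) -> u16 {
    match outcome {
        SyncOutcome::ValidationFailed => 422,
        SyncOutcome::ScriptError => 502,
        SyncOutcome::GateRefused => 503,
        SyncOutcome::WallClockTimeout => 504,
        SyncOutcome::OperationBudgetExceeded => 507,
        SyncOutcome::PlatformFailure => 500,
    }
}

/// Only the overloaded-gate case carries a Retry-After hint (in seconds).
fn retry_after_secs(outcome: SyncOutcome) -> Option<u32> {
    match outcome {
        SyncOutcome::GateRefused => Some(1),
        _ => None,
    }
}
```

Keeping the match exhaustive (no wildcard arm in `status_code`) means a newly added outcome cannot silently fall through to `500`.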
||||
### Status
|
||||
|
||||
- **NATS-style for sync HTTP**: ✅ Decided 2026-06-01 (see §2 #3).
|
||||
- **`reply_to` presence as the "don't retry" signal**: ✅ Decided 2026-06-01 (folded with the NATS-style decision).
|
||||
- **Status code strategy**: ✅ Decided 2026-06-01 — keep granular distinctions; `500` reserved for platform problems only.
|
||||
- **Default retry policy**: ✅ Decided 2026-06-01 — 3 attempts / exponential / 1000ms base / ±20% jitter; all four env-overridable via `PICLOUD_TRIGGER_RETRY_*`; per-trigger columns on the parent table take precedence.
|
||||
- **Cancel-on-timeout semantics**: ✅ Decided 2026-06-01 — option (b). Late results are discarded from the caller's POV (they already got a 504) but the dispatcher writes an `abandoned_executions` row whenever it tries to resolve a oneshot that's already closed/dropped. 7-day default retention via `PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS`; weekly GC sweep. A counter (`picloud_abandoned_executions_total{app_id}`) bumps on insert — that's the primary observability signal; the rows themselves are for forensics when the counter spikes. Only the dispatcher-after-orchestrator-timeout edge case writes a row; ordinary "script timed out, caller got 504" stays uneventful.
|
||||
|
||||
### Open calls
|
||||
|
||||
1. ~~NATS-style request/reply for sync HTTP~~ — ✅ Decided 2026-06-01 (see §2 #3).
|
||||
2. ~~Status code strategy~~ — ✅ Decided 2026-06-01: Option A (keep distinctions); 500 reserved for platform problems.
|
||||
3. ~~Default retry policy on triggers~~ — ✅ Decided 2026-06-01: 3/exp/1000ms base + ±20% jitter; env-overridable via `PICLOUD_TRIGGER_RETRY_*`; per-trigger row columns override the env defaults.
|
||||
4. ~~Cancel-on-timeout semantics~~ — ✅ Decided 2026-06-01: option (b) — `abandoned_executions` table, dispatcher-written, 7-day retention, metric counter on insert.
|
||||
|
||||
---
|
||||
|
||||
## 4. Dead-letter handling
|
||||
|
||||
Events that exhaust their retry policy land in a **separate `dead_letters` table** (not a flag on the outbox — outbox should stay a queue with fast inserts and scans). Users handle dead letters by registering a script for the new `dead_letter` **trigger kind**.
|
||||
|
||||
### Schema sketch
|
||||
|
||||
```sql
|
||||
CREATE TABLE dead_letters (
|
||||
id UUID PRIMARY KEY,
|
||||
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
|
||||
original_event_id UUID NOT NULL, -- the outbox row id
|
||||
source TEXT NOT NULL, -- "kv", "cron", "pubsub", "queue", "email"
|
||||
op TEXT NOT NULL,
|
||||
trigger_id UUID, -- which trigger config fired (null for direct dispatches)
|
||||
script_id UUID, -- which script failed
|
||||
payload JSONB NOT NULL, -- the event payload, verbatim
|
||||
attempt_count INT NOT NULL,
|
||||
first_attempt_at TIMESTAMPTZ NOT NULL,
|
||||
last_attempt_at TIMESTAMPTZ NOT NULL,
|
||||
last_error TEXT NOT NULL,
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
resolved_at TIMESTAMPTZ, -- null = unresolved
|
||||
resolution TEXT -- "replayed" | "ignored" | "handled_by_script" | "handler_failed"
|
||||
);
|
||||
|
||||
CREATE INDEX idx_dead_letters_app_unresolved
|
||||
ON dead_letters(app_id) WHERE resolved_at IS NULL;
|
||||
```
|
||||
|
||||
### Dead letter as trigger source
|
||||
|
||||
```
|
||||
trigger {
|
||||
source: dead_letter,
|
||||
filter: { source: "kv" }, -- optional; defaults to "any source"
|
||||
script_id: <your handler>,
|
||||
dispatch_mode: async,
|
||||
retry: { max_attempts: 1 } -- forced — see recursion stop rule below
|
||||
}
|
||||
```
|
||||
|
||||
Filterable on:
|
||||
- `source`: only dead letters from a particular event source (kv, cron, pubsub, …)
|
||||
- `trigger_id`: only dead letters from a particular trigger config
|
||||
- `script_id`: only dead letters from a particular script
|
||||
- No filter: every dead letter fires this handler
|
||||
|
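One way to read the filter rules is as a predicate over a dead-letter row. The sketch below makes several assumptions: ids are plain strings rather than UUIDs, the struct and function names are hypothetical, and multiple filter fields are assumed to combine with AND (the list above does not say).

```rust
/// Filter on a `dead_letter` trigger; `None` means "don't care".
#[derive(Default)]
struct DeadLetterFilter {
    source: Option<String>,
    trigger_id: Option<String>,
    script_id: Option<String>,
}

/// The subset of a dead-letter row that the filter looks at.
struct DeadLetter {
    source: String,
    trigger_id: Option<String>,
    script_id: Option<String>,
}

/// True when every field set on the filter equals the row's value.
/// An empty filter therefore matches every dead letter.
fn matches(filter: &DeadLetterFilter, dl: &DeadLetter) -> bool {
    let source_ok = filter.source.as_ref().map_or(true, |s| s == &dl.source);
    let trigger_ok = filter
        .trigger_id
        .as_ref()
        .map_or(true, |t| dl.trigger_id.as_ref() == Some(t));
    let script_ok = filter
        .script_id
        .as_ref()
        .map_or(true, |s| dl.script_id.as_ref() == Some(s));
    source_ok && trigger_ok && script_ok
}
```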
||||
`ctx.event` for a dead-letter handler:
|
||||
|
||||
```rhai
|
||||
ctx.event.source // "dead_letter"
|
||||
ctx.event.dead_letter = #{
|
||||
original: #{
|
||||
source: "kv",
|
||||
op: "insert",
|
||||
collection: "widgets",
|
||||
key: "k1",
|
||||
payload: #{ ... }
|
||||
},
|
||||
attempts: 3,
|
||||
last_error: "script timeout after 30s",
|
||||
trigger_id: "...",
|
||||
script_id: "...",
|
||||
first_attempt_at: "2026-05-30T12:00:00.000Z",
|
||||
last_attempt_at: "2026-05-30T12:00:14.000Z"
|
||||
}
|
||||
```
|
||||
|
||||
The handler can call `log::error`, notify admins via `email::send`, write to `docs::collection("incidents").create(...)`, post to external alerting via `http::post`, or call `dead_letters::replay(id)` if it decides a retry is worthwhile.
|
||||
|
||||
### Recursion stop rule — decided
|
||||
|
||||
✅ Decided 2026-06-01: **dead-letter handlers execute once, no retry, and CANNOT themselves be dead-lettered.**
|
||||
|
||||
- The flag lives on the **execution/outbox row** (set by the dispatcher when it picks a row whose trigger has `kind = 'dead_letter'`), not on the trigger config. The same handler script could in principle be reused for non-DL work without inheriting the no-retry treatment.
|
||||
- On handler failure:
|
||||
- Full payload + error logged to structured logs
|
||||
- Counter `picloud_dead_letter_handler_failures{app_id}` bumped
|
||||
- Original dead-letter row annotated with `resolution = 'handler_failed'`
|
||||
- **No retry, no second dead-letter row, no further fire.**
|
||||
- **Missing handler script** (trigger references `script_id` that's been deleted): treated as a handler failure — same metric bump, same `resolution = 'handler_failed'`, same no-retry. Auto-disabling the trigger is deferred to v1.2; for v1.1.1 the user sees the metric spike and investigates.
|
||||
- **Indirect loops** (DL handler writes to KV → fires a KV trigger → that handler fails → dead-letters → fires the same DL handler) are not blocked by this rule directly; they're bounded by the existing trigger-depth limit (`cx.trigger_depth`). The recursion-stop rule only prevents the *direct* infinite regress where a DL handler's failure would itself produce a DL row.
|
||||
|
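The stop rule amounts to one extra branch in the failure path for async rows. The sketch below is an illustration under assumptions: `OutboxRow`, its field names, and `FailureOutcome` are hypothetical, and the indirect-loop bound via `cx.trigger_depth` is not modeled.

```rust
#[derive(Debug, PartialEq)]
enum FailureOutcome {
    Retry,
    DeadLetter,
    /// Annotate the original dead-letter row with resolution = 'handler_failed'
    /// and stop: no retry, no second dead-letter row.
    MarkHandlerFailed,
}

struct OutboxRow {
    /// Set by the dispatcher when the row's trigger has kind = 'dead_letter'.
    is_dead_letter_handler: bool,
    /// Counts the attempt that just failed.
    attempts_so_far: u32,
    max_attempts: u32,
}

fn on_async_failure(row: &OutboxRow) -> FailureOutcome {
    // The flag on the execution row wins over any retry configuration.
    if row.is_dead_letter_handler {
        return FailureOutcome::MarkHandlerFailed;
    }
    if row.attempts_so_far < row.max_attempts {
        FailureOutcome::Retry
    } else {
        FailureOutcome::DeadLetter
    }
}
```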
||||
Rationale: if your alerting script is broken, the platform shouldn't try to alert about that with the same broken script. The chain has to terminate, period.
|
||||
|
||||
### Defaults — decided
|
||||
|
||||
✅ Decided 2026-06-01: **no automatic handler.** Dead letters land in the table; users opt into handling by registering a `dead_letter` trigger.
|
||||
|
||||
**Load-bearing commitment:** the v1.1.1 dashboard surfaces this state. Without dashboard surface, "no default handler" is irresponsible — users wouldn't know dead-letters exist until they queried Postgres directly. So shipping the table without the UI is not an option.
|
||||
|
||||
Required in v1.1.1 alongside the table:
|
||||
|
||||
- An **unresolved-count badge** per app, visible in the dashboard's app list and on the app detail page. Source query: `SELECT count(*) FROM dead_letters WHERE app_id = $1 AND resolved_at IS NULL`.
|
||||
- A **per-app dead-letters list view** reachable from the badge. Columns: `created_at`, `source`, `op`, `script_id`, `last_error`, `attempt_count`, `first_attempt_at`, `last_attempt_at`. Per-row actions: **Replay** (re-inserts the original event into the outbox; dispatcher tries again from scratch) and **Mark resolved** (sets `resolution = 'ignored'`, no further action).
|
||||
- A row detail panel showing the full payload + complete error history.
|
||||
|
||||
Rationale: most apps will run for months without ever needing a DL handler; the table is the durable record either way. The dashboard surface gives users the lightest-touch signal that something is wrong without committing v1.1.1 to building a notifications channel.
|
||||
|
||||
A heavier built-in default ("log to admin notifications channel") was considered and rejected — it would smuggle a notifications-surface design into v1.1.1 under the guise of a default, with real product-design questions (channel shape, configuration, opt-out, rate-limiting) that aren't worth answering yet. If the dashboard badge proves insufficient in practice, a structured-log fallback (writing to `execution_logs` with a known `dead_letter` shape) is an additive future change, not a breaking one.
|
||||
|
||||
### Sync HTTP failures don't dead-letter
|
||||
|
||||
Failures of sync HTTP requests (`reply_to.is_some()`) don't land in `dead_letters`, for three reasons: the caller already got an error response; routing every failed HTTP request into `dead_letters` would flood the table; and `execution_logs` already captures sync request failures. If a user wants alerts on HTTP endpoint failures, that's **monitoring** (v1.3+ territory), not dead-lettering.
|
||||
|
||||
### Pub/sub fan-out dead-letters independently
|
||||
|
||||
One `pubsub::publish` → N subscribers → each retries independently → each can independently dead-letter. So one publish can produce N dead-letter rows (one per subscriber that exhausted retries). Subscribers are independent failure domains.
|
||||
|
||||
### Manual replay — Rhai SDK scope decided
|
||||
|
||||
✅ Decided 2026-06-01: ship `dead_letters::replay(id)` and `dead_letters::resolve(id, reason)` in v1.1.1; **defer `dead_letters::list(filter)` to v1.2** to align with `docs::find()` query semantics.
|
||||
|
||||
| Surface | Use case | Shipping in |
|
||||
|---|---|---|
|
||||
| `POST /api/v1/admin/apps/{id}/dead_letters/{dl_id}/replay` | Admin clicks "replay" in dashboard | v1.1.1 |
|
||||
| `POST /api/v1/admin/apps/{id}/dead_letters/{dl_id}/resolve` | Admin marks resolved via dashboard | v1.1.1 |
|
||||
| `GET /api/v1/admin/apps/{id}/dead_letters` | Dashboard list view | v1.1.1 |
|
||||
| `dead_letters::replay(id)` Rhai SDK | A handler script decides to retry programmatically | v1.1.1 |
|
||||
| `dead_letters::resolve(id, reason)` Rhai SDK | A handler decides "this is fine, don't bother me" | v1.1.1 |
|
||||
| `dead_letters::list(filter)` Rhai SDK | Bulk replay / cleanup scripts | **v1.2** (aligns with `docs::find()` query DSL) |
|
||||
|
||||
Replay re-inserts the original event into the outbox; dispatcher tries again from scratch.
|
||||
|
||||
**Authz:** both replay and resolve are gated by a new `Capability::AppDeadLetterManage(AppId)` checked inside the service methods. The capability is granted to app admins by default (existing Phase 3.5 role hierarchy). A public HTTP script running with `principal: None` would fail this check, which is correct.
|
||||
|
||||
**Trigger-execution principal (related decision):** ✅ a trigger execution runs as the principal that **registered the trigger**, captured on the trigger row at registration time. This gives a clean "the trigger fires as you" model and matches how cron jobs are typically conceptualized. The original event's principal (e.g. the anonymous caller of a public HTTP route) is recorded for forensics on the outbox row but does not become the execution principal. This is a wider trigger-framework decision surfaced here because dead-letter authz is the first concrete consumer; it applies to **every** trigger kind, not just dead-letter.
|
||||
|
||||
### Retention — decided
|
||||
|
||||
✅ Decided 2026-06-01: **30 days, GC by `created_at`, env-overridable only (no per-app override in v1.1.1).**
|
||||
|
||||
- Default: 30 days
|
||||
- Override: `PICLOUD_DEAD_LETTER_RETENTION_DAYS` (whole-deployment, not per-app)
|
||||
- GC condition: `created_at < NOW() - retention` — applies to both resolved and unresolved rows uniformly. (Activity-age GC — keeping recently-resolved rows 30 days post-resolution — was considered and deferred; can switch if user feedback shows it's needed without breaking anything.)
|
||||
- GC job: weekly sweep in `manager-core`, claiming via `FOR UPDATE SKIP LOCKED` to match the dispatcher's claim pattern.
|
||||
|
||||
Per-app retention overrides are deferred to a later release. The env var covers single-deployer needs; per-app settings would need a dashboard surface + permissions story that isn't worth smuggling into v1.1.1.
|
||||
|
||||
### Status
|
||||
|
||||
- **Separate `dead_letters` table**: leaning yes.
|
||||
- **`dead_letter` as trigger kind**: leaning yes.
|
||||
- **Recursion stop rule** (handlers can't be dead-lettered): ✅ Decided 2026-06-01 (above); flag lives on the execution; missing-handler case treated as handler failure.
|
||||
- **No default handler** (rows sit in table; dashboard surfaces them): ✅ Decided 2026-06-01 — unresolved-count badge + per-app list view ship in v1.1.1 alongside the table.
|
||||
- **Sync HTTP failures don't dead-letter**: leaning yes.
|
||||
- **Retention**: ✅ Decided 2026-06-01 — 30 days, GC by `created_at`, env-only override (`PICLOUD_DEAD_LETTER_RETENTION_DAYS`); weekly `FOR UPDATE SKIP LOCKED` sweep in `manager-core`.
|
||||
- **Rhai SDK scope**: ✅ Decided 2026-06-01 — `replay` + `resolve` ship in v1.1.1; `list` deferred to v1.2 to align with `docs::find()` query DSL. New `Capability::AppDeadLetterManage(AppId)`.
|
||||
- **Trigger-execution principal**: ✅ Decided 2026-06-01 — trigger fires as the principal that registered it (captured on the trigger row at registration). Original event's principal is recorded on the outbox row for forensics but does not become the execution principal. Applies to all trigger kinds.
|
||||
|
||||
### Open calls
|
||||
|
||||
1. ~~Dead-letter handlers unretryable + can't be dead-lettered themselves~~ — ✅ Decided 2026-06-01: confirmed; flag on execution; missing-handler = `resolution = 'handler_failed'`; indirect loops bounded by `cx.trigger_depth`.
|
||||
2. ~~No default dead-letter handler~~ — ✅ Decided 2026-06-01: confirmed; rows sit in the table by default. Dashboard unresolved-count badge + per-app DL list view (with Replay + Mark-resolved actions) ship in v1.1.1 alongside the table.
|
||||
3. ~~30-day default retention~~ — ✅ Decided 2026-06-01: 30 days, GC by `created_at`, env-only override; per-app retention deferred.
|
||||
4. ~~Rhai SDK for dead-letters in v1.1.1~~ — ✅ Decided 2026-06-01: `replay` + `resolve` ship; `list` deferred to v1.2 to align with `docs::find()`; new `Capability::AppDeadLetterManage(AppId)`. Related: trigger executions run as the trigger-registering principal.
|
||||
|
||||
---
|
||||
|
||||
## 5. Realtime updates for external clients
|
||||
|
||||
Apps built on PiCloud need a way for browser/mobile clients to receive live updates (chat messages, dashboard data, multiplayer state, notifications). Today's pub/sub is internal-only (script ↔ script via triggers).
|
||||
|
||||
### The chosen approach — decided
|
||||
|
||||
✅ Decided 2026-06-01: **Option C (one publish API, topics opt-in to external visibility) with the registration split below.**
|
||||
|
||||
- One `pubsub::publish_durable(topic, msg)` API for scripts — produces a single event regardless of who subscribes.
|
||||
- Topics are **internal-only by default**: script triggers can subscribe; external clients cannot.
|
||||
- **Externally-subscribable topics must be registered explicitly** (admin API + dashboard surface). Internal-only topics remain implicit — anyone can `publish_durable("any.topic", msg)` and triggers can subscribe without registration. To externalize: create a `topics` row with `external_subscribable = true` first.
|
||||
- External clients connect to `GET /realtime/topics/{topic}` via SSE; they only receive messages from registered, externally-subscribable topics they're permitted to access.
|
||||
|
||||
**UI/security commitments** (the difference between C working and C being default-public in disguise):
|
||||
|
||||
1. The externally-subscribable opt-in is prominent UI, not a buried checkbox.
|
||||
2. The topic list view shows "external: yes/no" as a first-class column.
|
||||
3. Marking a topic externally-subscribable requires app admin role (capability-gated via `Capability::AppTopicManage(AppId)`).
|
||||
4. The bit-flip is its own API endpoint (not a side-effect of generic topic update) so it carries an independent audit trail.
|
||||
|
||||
**Wins:** one publish API for scripts (DRY), topics are private by default (security), external visibility requires deliberate explicit registration (not just a config flag flipped during quick edits).
|
||||
|
||||
**Why not A (every topic externally-visible by default):** topic names tend to describe the event, not the audience; internal topics frequently carry PII or sensitive payloads; and default-public is exactly the Firebase-style "remember to lock it down" anti-pattern this whole design rejects.
|
||||
|
||||
**Why not B (separate `channels::` service):** doubles the publish API for almost-identical use cases; scripts wanting both internal triggers AND client push would publish twice; users wrap it in a helper and we're back at C with extra steps and no central policy enforcement.
|
||||
|
||||
### Transport: SSE first — decided
|
||||
|
||||
✅ Decided 2026-06-01: **SSE-only for v1.1.6. WebSocket added in a later release if real bidirectional demand emerges.**
|
||||
|
||||
- Simpler than WebSocket; works through any HTTP proxy without protocol upgrade
|
||||
- Browsers auto-reconnect on disconnect (native `EventSource`)
|
||||
- Covers the dominant use cases (chat-message-list updates, dashboard streams, notifications, IoT telemetry, build-status streams) cleanly
|
||||
- Production-quality SSE requires HTTP/2 between Caddy and clients to dodge the per-origin connection cap on HTTP/1.1 — Caddy speaks HTTP/2 by default, so this is just a config note for the deploy docs
|
||||
|
||||
**Why not ship WS in v1.1.6:** WS is the right tool for sub-100ms bidirectional state (multiplayer games, CRDT collaborative editing, typing-indicator-level presence). On consumer hardware with Postgres-backed event distribution, that latency budget is dominated by the server stack anyway — WS would be paying implementation cost (frame management, ping/pong, close codes, backpressure protocol) without unlocking the latency it's designed for. SSE-only also frees v1.1.6 to invest in `@picloud/client` library quality instead of transport edge cases.
|
||||
|
||||
**Future addition path:** WebSocket coexists with SSE on a different endpoint (e.g. `/realtime/ws/{topic}`) backed by the same subscriber registry. Purely additive — no SSE clients break, no architecture decision in v1.1.6 closes the door.
|
||||
|
||||
### Auth model for external subscribers — decided
|
||||
|
||||
✅ Decided 2026-06-01: ship **public** + **HMAC-signed subscriber-token** auth in v1.1.6; **users-SDK session-based** auth follows in v1.1.8 (additive); **script-mediated per-subscribe** auth deferred to v1.2.
|
||||
|
||||
**Topic config columns:**
|
||||
- `external_subscribable: bool` — can external clients ever subscribe?
|
||||
- `auth_mode: 'public' | 'token'` — if external, what's the gate? (ignored when `external_subscribable = false`)
|
||||
- v1.1.8 adds `auth_mode = 'session'` for users-SDK-based sessions; v1.2 adds `auth_mode = 'script'` for script-mediated.
|
||||
|
||||
**v1.1.6 trust flow (token-gated topics):**
|
||||
|
||||
| Hop | Auth mechanism |
|
||||
|---|---|
|
||||
| Script → its own token-mint endpoint | Existing API-key + app authz |
|
||||
| Script → SDK helper to mint token | New `pubsub::subscriber_token(topics, ttl)` |
|
||||
| Frontend → script's token endpoint | App's own auth (cookie/session/whatever the app defines) |
|
||||
| Frontend → PiCloud SSE | Short-lived HMAC-signed subscriber token (bearer header) |
|
||||
| SSE handler → token validation | HMAC verify, scope-check requested topic against token's allowed list |
|
||||
|
||||
The frontend **never** touches the app's API key. The script signs scoped, short-lived bearers (HMAC over `{topic_list, exp, app_id}`) with a secret derived from the app's API-key material. The SSE endpoint validates the signature without a DB lookup.
|
||||
|
||||
**Token TTL:** clamped 10s ≤ ttl ≤ 24h. Default 1h. Both bounds and default env-overridable (`PICLOUD_SUBSCRIBER_TOKEN_TTL_MIN_SEC`, `PICLOUD_SUBSCRIBER_TOKEN_TTL_MAX_SEC`, `PICLOUD_SUBSCRIBER_TOKEN_TTL_DEFAULT_SEC`).
|
||||
|
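The clamping rule is small enough to state as code. In this sketch the constants hard-code the documented defaults (a real implementation would presumably read the three env vars), and `effective_ttl_sec` is a hypothetical helper name.

```rust
// Documented defaults; the env overrides are not modeled here.
const TTL_MIN_SEC: u64 = 10;
const TTL_MAX_SEC: u64 = 24 * 60 * 60;
const TTL_DEFAULT_SEC: u64 = 60 * 60;

/// TTL actually used for a subscriber token.
/// `None` means the script did not ask for a specific TTL.
fn effective_ttl_sec(requested: Option<u64>) -> u64 {
    requested
        .unwrap_or(TTL_DEFAULT_SEC)
        .clamp(TTL_MIN_SEC, TTL_MAX_SEC)
}
```

Clamping rather than rejecting means a script that asks for a 7-day token still gets a working 24h one.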
||||
**Token revocation:** none in v1.1.6 by design. HMAC bearers can't be revoked individually; rotation of the signing key invalidates all bearers wholesale. Short TTL is the safety mechanism. Per-token revocation arrives implicitly with v1.1.8's session-based auth (sessions CAN be invalidated).
|
||||
|
||||
**Public topics:** no auth at all. `GET /realtime/topics/{topic}` works for anyone if the topic has `external_subscribable = true AND auth_mode = 'public'`. Used for marketing-style broadcasts and public stat boards.
|
||||
|
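Putting the topic config and the token scope check together, the SSE handler's admission decision might look like the sketch below. It assumes the HMAC signature has already been verified before `TokenClaims` is built, models ids as strings, treats a token as expired once `now_unix` reaches `exp_unix`, and uses hypothetical type and function names throughout.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AuthMode {
    Public,
    Token,
}

/// A registered topic row. Unregistered topics have no row and stay internal-only.
struct TopicConfig {
    app_id: String,
    external_subscribable: bool,
    auth_mode: AuthMode,
}

/// Claims from a subscriber token whose signature was already verified.
struct TokenClaims {
    app_id: String,
    topics: Vec<String>,
    exp_unix: u64,
}

fn may_subscribe(
    topic: &str,
    config: Option<&TopicConfig>,
    claims: Option<&TokenClaims>,
    now_unix: u64,
) -> bool {
    // No row, or a row without the opt-in: never externally visible.
    let config = match config {
        Some(c) if c.external_subscribable => c,
        _ => return false,
    };
    match config.auth_mode {
        AuthMode::Public => true,
        AuthMode::Token => match claims {
            Some(t) => {
                t.app_id == config.app_id
                    && now_unix < t.exp_unix
                    && t.topics.iter().any(|x| x.as_str() == topic)
            }
            None => false,
        },
    }
}
```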
||||
### Status
|
||||
|
||||
- **Approach C (opt-in external subscription)**: ✅ Decided 2026-06-01 — internal-only by default; externally-subscribable topics require explicit registration + admin-role capability; UI surface treats the bit-flip as a deliberate, audited action.
|
||||
- **SSE first, WebSocket later**: ✅ Decided 2026-06-01 — SSE-only in v1.1.6; WS deferred until concrete demand emerges; future addition is purely additive on a separate endpoint.
|
||||
- **Public + token-gated auth in v1.1.6**: ✅ Decided 2026-06-01 — HMAC-signed subscriber-token flow (not raw API-key passing); `users::*` session-based and script-mediated auth deferred per the table above.
|
||||
|
||||
### Open calls
|
||||
|
||||
1. ~~Approach C confirmed~~ — ✅ Decided 2026-06-01: yes, with explicit registration required for externally-subscribable topics (internal-only stays implicit); new `Capability::AppTopicManage(AppId)`.
|
||||
2. ~~SSE first, WebSocket deferred~~ — ✅ Decided 2026-06-01: SSE-only in v1.1.6; WS deferred to a later release; future addition is purely additive.
|
||||
3. ~~Auth model~~ — ✅ Decided 2026-06-01: public + HMAC-signed subscriber tokens in v1.1.6; `users::*` session auth in v1.1.8; script-mediated auth in v1.2; token TTL clamped 10s–24h (default 1h), env-overridable; no per-token revocation in v1.1.6 (rely on TTL).
|
||||
|
||||
---
|
||||
|
||||
## 6. Frontend client library
|
||||
|
||||
Strategic positioning question: how much should PiCloud expose to frontend developers building apps on top of it?
|
||||
|
||||
### The two ends of the spectrum
|
||||
|
||||
| End | Frontend gets | Examples |
|
||||
|---|---|---|
|
||||
| **Minimalist** | HTTP to dev-defined script endpoints + SSE on dev-marked-public topics. Nothing else. | AWS Lambda + API Gateway, Cloudflare Workers, Deno Deploy |
|
||||
| **Maximalist** | Direct client-side access to KV/docs/users/files. Frontend writes `kv.get()`, `docs.find()`, no Rhai script for trivial reads. | Firebase, Supabase, AWS Amplify |
|
||||
|
||||
PiCloud today sits at the minimalist end (services exist for scripts to use, not for frontends). Crossing to maximalist would be a real product pivot, not a feature add.
|
||||
|
||||
### The chosen approach: hybrid — decided
|
||||
|
||||
✅ Decided 2026-06-01: **Hybrid model. No direct service access from the frontend; client library standardizes script-mediated ceremony.**
|
||||
|
||||
Four pieces ship in `@picloud/client` for v1.1.6:
|
||||
|
||||
1. **Typed HTTP client to dev-defined endpoints** — `picloud.endpoint('/api/users').post({ name: 'alice' })`. Fetch wrapper with auth header injection, retry logic, structured error handling.
|
||||
2. **SSE subscription** — `picloud.subscribe('chat-room-123', msg => …)`. Auto-reconnect, token refresh, backpressure.
|
||||
3. **Auth flow helpers** — `picloud.auth.login(email, password)`, `picloud.auth.logout()`, `picloud.auth.token`. These call **dev-defined** endpoints under the hood (`/api/auth/login` etc.); the lib just standardizes the dance + token storage.
|
||||
4. **Realtime-aware framework hooks** — `useTopic(topic)` for React, store-shape `subscribe(topic)` for Svelte. Thin polish over the SSE primitive; what frontend devs actually write.
|
||||
|
||||
Hard rule, load-bearing: **no `picloud.kv.get()` / `picloud.docs.find()` / `picloud.users.list()` from the frontend.** Keeping direct service access out of the browser is a strategic and security commitment, not a v1.1.6 limitation. A frontend dev who wants `kv.get()` from the browser writes a 6-line Rhai script and binds it to a route. That friction is intentional: it makes the dev decide deliberately that the read is okay to expose.
|
||||
|
||||
**Why not Firebase-mode** (full direct service access):
|
||||
- Different product, different competition (Supabase / Amplify / Appwrite have a 5-year head start and full-time teams).
|
||||
- Requires security-rule language + per-row authorization evaluator + tooling that PiCloud's solo-dev audience cannot operate safely. Firebase's #1 cause of data exposure is misconfigured rules — well-documented, recurring.
|
||||
- Script-as-gate is dramatically more defensible: the rules are just code, in the same language as the rest of the app, debuggable like any other code.
|
||||
|
||||
**Why not pure-minimalist** (no client lib, just docs):
|
||||
- Every PiCloud frontend dev hand-rolls the same fetch wrapper, SSE reconnect, token refresh, login/logout dance. Shipping `@picloud/client` removes that boilerplate without expanding the security surface.
|
||||
|
||||
### Why hybrid, not maximalist
|
||||
|
||||
Firebase trades security for DX; the security-rule misconfiguration footgun is the #1 cause of accidental data exposure in serverless apps. PiCloud's "solo dev / consumer hardware" audience does not have the operational capacity to defend a Firebase-style attack surface against misconfiguration. The script layer is also where PiCloud differentiates — if frontends bypass scripts to talk directly to services, we're competing with Supabase head-to-head (unwinnable, they're better-resourced and have a 5-year head start).
|
||||
|
||||
### Why hybrid, not pure minimalist
|
||||
|
||||
A frontend dev shouldn't have to hand-roll fetch wrappers, SSE reconnect logic, and token-refresh dances. That stuff is identical across every app. Shipping it as `@picloud/client` is genuinely valuable — it doesn't expand the security surface (scripts still gate everything), it just removes boilerplate.
|
||||
|
||||
### TypeScript first — decided
|
||||
|
||||
✅ Decided 2026-06-01: **TypeScript only for v1.1.6. Other-language SDKs deferred, demand-driven, no preemptive ranking.**
|
||||
|
||||
- TS covers ~85% of the realistic v1.x audience (web + React Native mobile + Capacitor + Electron).
|
||||
- Native iOS / Android / Python / Rust / Go users can hit the REST + SSE endpoints directly without an SDK; they lose the typed wrapper but aren't blocked from shipping.
|
||||
- The REST + SSE surface is documented as the **public protocol contract** so that PiCloud or the community can later build other-language SDKs against a stable spec. PiCloud doesn't promise specific languages or timelines preemptively; a real user with a concrete use case is what triggers a new SDK.
|
||||
- **Known caveat:** React Native doesn't ship a native `EventSource`. The TS client should runtime-detect and either fall back gracefully or require an explicit polyfill (`react-native-sse` / `react-native-event-source`) with clear docs. Not a blocker; worth surfacing in the v1.1.6 README.
|
||||
|
||||
### Status
|
||||
|
||||
- **Hybrid model (frontend through scripts only)**: ✅ Decided 2026-06-01 — confirmed; no direct service access from the browser; client lib standardizes script-mediated ceremony only.
|
||||
- **TypeScript first, other languages deferred**: ✅ Decided 2026-06-01 — TS-only in v1.1.6; REST + SSE documented as public protocol contract; other languages demand-driven with no preemptive ranking; React Native SSE polyfill noted as known caveat.
|
||||
- **Co-ship with realtime as v1.1.6**: ✅ Decided 2026-06-01 — server-side realtime AND `@picloud/client@1.0.0` ship together in v1.1.6. Built in parallel against a frozen REST + SSE spec. If v1.1.6 scope blows up under pressure, the lib is the deferrable piece (slips to v1.1.6.1); the realtime server itself doesn't slip.
|
||||
- **Type safety / codegen**: ✅ Decided 2026-06-01 — defer codegen to v1.2+; v1.1.6 ships hand-written types with `endpoint<Req, Res>()` generic + optional client-side runtime validation via user-provided schemas (zod/valibot adapter; ~50 lines). No schema-declaration syntax in v1.1.6 — committing to that before v1.2's coherent codegen design would lock us into a shape we'd regret. Doc schemas (already arriving in v1.1.2) are the natural foundation for v1.2 codegen; script-endpoint schemas get designed alongside the generator, not before.
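
To make the type-safety decision concrete, here is a hedged sketch of what an `endpoint<Req, Res>()` helper with optional runtime validation might look like. Everything except the `endpoint<Req, Res>()` name is an assumption: the `Schema`, `Fetcher`, and `EndpointOptions` types, the POST + JSON wire shape, and the injected `fetcher` are illustrative only. A zod schema already exposes a compatible `parse()` method; valibot would need a thin adapter.

```typescript
// Anything with a parse() that returns the validated value or throws.
interface Schema<T> {
  parse(input: unknown): T;
}

// Narrow fetch-like signature, injected so the sketch has no global dependency.
type Fetcher = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface EndpointOptions<Res> {
  fetcher: Fetcher;
  responseSchema?: Schema<Res>; // optional: omit to trust the declared type
}

// Validates when a schema is supplied; otherwise the cast is unchecked.
function decodeResponse<Res>(payload: unknown, schema?: Schema<Res>): Res {
  return schema ? schema.parse(payload) : (payload as Res);
}

// Returns a typed caller for one script endpoint (assumed POST + JSON body).
function endpoint<Req, Res>(url: string, options: EndpointOptions<Res>) {
  return async (request: Req): Promise<Res> => {
    const response = await options.fetcher(url, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(request),
    });
    if (!response.ok) {
      throw new Error(`Request to ${url} failed with status ${response.status}`);
    }
    return decodeResponse(await response.json(), options.responseSchema);
  };
}
```

Without a schema the generic parameters are purely compile-time claims, which is the trade-off the decision accepts until v1.2 codegen.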
|
||||
|
||||
### Open calls
|
||||
|
||||
1. ~~Hybrid model~~ — ✅ Decided 2026-06-01: confirmed; no direct service access from the frontend; `@picloud/client` ships typed HTTP + SSE + auth-flow + framework hooks.
|
||||
2. ~~TypeScript first, multi-language deferred~~ — ✅ Decided 2026-06-01: TS-only in v1.1.6; REST + SSE is the public protocol; other-language SDKs are demand-driven; React Native SSE polyfill caveat documented.
|
||||
3. ~~Co-ship realtime + client lib~~ — ✅ Decided 2026-06-01: co-ship in v1.1.6, built in parallel against a frozen REST + SSE spec. Lib is the deferrable piece under scope pressure (slips to v1.1.6.1); server doesn't slip.
|
||||
4. ~~Type safety / codegen~~ — ✅ Decided 2026-06-01: defer codegen to v1.2+; v1.1.6 ships hand-written types with `endpoint<Req, Res>()` generic + optional zod/valibot runtime validation; no schema declarations in v1.1.6.
|
||||
|
||||
---
|
||||
|
||||
## 7. Revised v1.1.x roadmap
|
||||
|
||||
Net changes vs the [blueprint §12](../serverless_cloud_blueprint.md) roadmap:
|
||||
|
||||
- **v1.1.5 pub/sub**: now via trigger outbox (drops `LISTEN/NOTIFY` plan), tightening implementation scope
|
||||
- **NEW v1.1.6 Realtime Channels & Client Library**: realtime SSE + `@picloud/client` TS package; co-shipped
|
||||
- **v1.1.7+ items shifted by one** (was v1.1.6/7/8 → now v1.1.7/8/9)
|
||||
- **Dead letters and the unified outbox/dispatcher** are absorbed into v1.1.1's existing scope (triggers framework)
|
||||
|
||||
| Version | Capability |
|
||||
|---|---|
|
||||
| **v1.1.0** | **Foundation & Standard Library** — SDK shape, `Services` bundle, `SdkCallCx`, `ExecutionGate`, `ServiceEventEmitter` trait shape; stdlib utilities (regex, random, time, json, base64, hex, url). ✓ Shipped. |
|
||||
| **v1.1.1** | **Storage & Events** — KV store keyed `(app_id, collection, key)`; triggers framework (universal outbox + dispatcher + NATS-style sync HTTP via inbox + per-trigger retry config + dead-letter table & `dead_letter` trigger source + trigger CRUD + `ctx.event` + depth limit); KV trigger kinds. |
|
||||
| **v1.1.2** | **Documents** — `docs::collection(name).create/find/update/delete/list` with `docs:*` triggers. |
|
||||
| **v1.1.3** | **Modules** — `scripts.kind`, per-app resolver replaces `DummyModuleResolver`, AST cache + dep-graph invalidation. |
|
||||
| **v1.1.4** | **Outbound HTTP & Scheduled Tasks** — `http::*` with SSRF deny-list; cron triggers (small now that the framework exists). |
|
||||
| **v1.1.5** | **Files & Pub/Sub** — filesystem-backed blobs (`files/<app_id>/<id[0:2]>/<id>`) with `files:*` triggers; pub/sub via the universal outbox with `pubsub:*` triggers. |
|
||||
| **v1.1.6** | **Realtime Channels & Client Library** *(new)* — SSE-based external subscription to per-app pub/sub topics (public + HMAC-signed subscriber-token auth, minted via `pubsub::subscriber_token`); `@picloud/client` TypeScript package (typed HTTP via `endpoint<Req,Res>()`, SSE subscription, auth helpers, framework hooks). |
|
||||
| **v1.1.7** | **Configuration & Email** *(was v1.1.6)* — encrypted per-app secrets; outbound `email::send/send_html` + inbound `email:receive` trigger. |
|
||||
| **v1.1.8** | **User Management** *(was v1.1.7)* — `users::*` for in-script CRUD, auth, roles, invites, password reset. |
|
||||
| **v1.1.9** | **Durable Queues & Function Composition** *(was v1.1.8)* — `queue::*` with `queue:receive` trigger; `invoke()` + `retry::*` (closures-as-args, re-entrant Rhai). |
|
||||
| **v1.2** | **Workflows & Hierarchies** (per blueprint §Phase 5) — DAG execution, advanced docs query, interceptors, read triggers, audit log, script-mediated realtime auth, `dead_letters::list` (aligned with `docs::find()` query DSL), client-lib type codegen from script-declared schemas. |
|
||||
| **v1.3+** | **Scale & Ops** (per blueprint §Phase 6) — cluster mode (NATS-style request/reply swaps to `LISTEN/NOTIFY`), cross-app data sharing, script versioning + rollback, rate limiting, richer auth, metrics, distributed tracing, webhooks, S3, monitoring/alerting on HTTP endpoint failures. |
|
||||
|
||||
The v1.1.9 release marks the end of the v1.1.x expansion cadence. v1.2 is the next minor product bump (phase milestone per [versioning policy](versioning.md)).
|
||||
|
||||
---
|
||||
|
||||
## Consolidated open calls
|
||||
|
||||
All 20 open calls were resolved on 2026-06-01. This section is retained as a quick decision index — each item links the original question to the decision recorded in its section above. Sections will be pruned individually as their decisions ship into code and the [serverless_cloud_blueprint.md](../serverless_cloud_blueprint.md).
|
||||
|
||||
### §1 — Messaging primitives
|
||||
1. ~~Pub/sub durability via trigger outbox~~ — ✅ Decided 2026-06-01: `publish_durable` ships in v1.1.5; `publish_ephemeral` committed as a future API.
|
||||
2. ~~Queue and pub/sub stay separate~~ — ✅ Decided 2026-06-01: separate top-level namespaces; no unifying messaging abstraction.
|
||||
|
||||
### §2 — Universal trigger outbox
|
||||
3. ~~Sync HTTP via outbox + per-request inbox~~ — ✅ Decided 2026-06-01: yes via outbox; in-process oneshot for v1.1.1, `LISTEN/NOTIFY` preserved as the cluster-mode (v1.3+) cross-process variant.
|
||||
4. ~~Ship `dispatch_mode: async` for HTTP routes in v1.1.1~~ — ✅ Decided 2026-06-01: yes; `202 Accepted` + JSON body with `execution_id`; route-level config only.
|
||||
5. ~~Trigger storage shape~~ — ✅ Decided 2026-06-01: Layout E (parent `triggers` + per-kind `<kind>_trigger_details`); `routes` stays its own table for v1.1.x; column-set refinements deferred to implementation PR.
|
||||
|
||||
### §3 — NATS-style sync HTTP
|
||||
6. ~~NATS-style request/reply for sync HTTP~~ — ✅ Decided 2026-06-01 (see §2 #3).
|
||||
7. ~~Status code strategy~~ — ✅ Decided 2026-06-01: keep distinctions; `500` reserved for platform problems.
|
||||
8. ~~Default retry policy on triggers~~ — ✅ Decided 2026-06-01: 3/exp/1000ms + ±20% jitter; env-overridable via `PICLOUD_TRIGGER_RETRY_*`; per-trigger columns override.
|
||||
9. ~~Cancel-on-timeout semantics~~ — ✅ Decided 2026-06-01: (b) — `abandoned_executions` table; dispatcher-written; 7-day retention via `PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS`; metric counter on insert.
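
The default retry policy in item 8 (3/exp/1000ms with ±20% jitter) can be illustrated with a small delay calculator. This is a sketch under stated assumptions, not the dispatcher's code: it reads "3" as three total attempts, assumes a doubling factor for the exponential step, and the `RetryPolicy` and `nextRetryDelayMs` names are hypothetical.

```typescript
interface RetryPolicy {
  maxAttempts: number; // assumed to count total attempts, including the first
  baseDelayMs: number;
  jitterRatio: number; // 0.2 means plus or minus 20%
}

const DEFAULT_POLICY: RetryPolicy = {
  maxAttempts: 3,
  baseDelayMs: 1000,
  jitterRatio: 0.2,
};

// Returns the delay before the next attempt after `attempt` (1-based) failed,
// or null when attempts are exhausted and the row should be dead-lettered.
function nextRetryDelayMs(
  attempt: number,
  policy: RetryPolicy = DEFAULT_POLICY,
  random: () => number = Math.random,
): number | null {
  if (attempt >= policy.maxAttempts) {
    return null;
  }
  // Exponential step: base, 2x base, 4x base, ... (doubling is an assumption).
  const exponential = policy.baseDelayMs * Math.pow(2, attempt - 1);
  // Map random() in [0, 1) onto a factor in [1 - jitter, 1 + jitter).
  const jitterFactor = 1 + (random() * 2 - 1) * policy.jitterRatio;
  return Math.round(exponential * jitterFactor);
}
```

Passing `random` as a parameter keeps the jitter deterministic in tests; per-trigger overrides would simply supply a different `RetryPolicy`.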
|
||||
|
||||
### §4 — Dead letters
|
||||
10. ~~Dead-letter handlers unretryable + can't be dead-lettered themselves~~ — ✅ Decided 2026-06-01: confirmed; flag lives on the execution; missing handler = `resolution = 'handler_failed'`; indirect loops bounded by `cx.trigger_depth`.
|
||||
11. ~~No default dead-letter handler~~ — ✅ Decided 2026-06-01: confirmed; rows sit in the table by default. Dashboard unresolved-count badge + per-app DL list view ship in v1.1.1.
|
||||
12. ~~30-day default retention~~ — ✅ Decided 2026-06-01: 30 days, GC by `created_at`, env-only override (`PICLOUD_DEAD_LETTER_RETENTION_DAYS`).
|
||||
13. ~~Rhai SDK for dead-letters in v1.1.1~~ — ✅ Decided 2026-06-01: `replay` + `resolve` in v1.1.1; `list` deferred to v1.2; new `Capability::AppDeadLetterManage(AppId)`. Related: trigger executions inherit the registrant's principal.
|
||||
|
||||
### §5 — Realtime
|
||||
14. ~~Approach C confirmed~~ — ✅ Decided 2026-06-01: yes, with explicit registration required for externally-subscribable topics; new `Capability::AppTopicManage(AppId)`.
|
||||
15. ~~SSE first, WebSocket deferred~~ — ✅ Decided 2026-06-01: SSE-only in v1.1.6; WS deferred.
|
||||
16. ~~Auth model~~ — ✅ Decided 2026-06-01: public + HMAC-signed subscriber tokens in v1.1.6; `users::*` session auth in v1.1.8; script-mediated in v1.2; TTL 10s–24h (default 1h), env-overridable.
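
The subscriber-token TTL bounds in item 16 (10s to 24h, default 1h) could be enforced with a check like the one below. This is only a sketch: the function name is hypothetical, and rejecting out-of-range values (rather than clamping them) is an assumption the decision text does not settle. The env-overridable part is left out.

```typescript
const MIN_TTL_SECONDS = 10;
const MAX_TTL_SECONDS = 24 * 60 * 60; // 24h
const DEFAULT_TTL_SECONDS = 60 * 60; // 1h

// Resolves the TTL for a subscriber token: the default when none is requested,
// otherwise the requested value if it is a whole number of seconds in range.
function resolveSubscriberTtlSeconds(requested?: number): number {
  if (requested === undefined) {
    return DEFAULT_TTL_SECONDS;
  }
  if (
    !Number.isInteger(requested) ||
    requested < MIN_TTL_SECONDS ||
    requested > MAX_TTL_SECONDS
  ) {
    throw new RangeError(
      `TTL must be an integer between ${MIN_TTL_SECONDS} and ${MAX_TTL_SECONDS} seconds`,
    );
  }
  return requested;
}
```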
|
||||
|
||||
### §6 — Frontend client library
|
||||
17. ~~Hybrid model~~ — ✅ Decided 2026-06-01: confirmed; no direct service access from the frontend; client lib standardizes script-mediated ceremony only.
|
||||
18. ~~TypeScript first, multi-language deferred~~ — ✅ Decided 2026-06-01: TS-only in v1.1.6; REST + SSE is the public protocol contract.
|
||||
19. ~~Co-ship realtime + client lib~~ — ✅ Decided 2026-06-01: co-ship in v1.1.6, parallel-built against a frozen spec; lib is the deferrable piece under scope pressure.
|
||||
20. ~~Type safety / codegen~~ — ✅ Decided 2026-06-01: defer codegen to v1.2+; v1.1.6 ships hand-written types via `endpoint<Req, Res>()` + optional zod/valibot runtime validation.
|
||||
|
||||
---
|
||||
|
||||
## Lifecycle of this document
|
||||
|
||||
- **Created** at the v1.1.0 → v1.1.1 boundary (after the foundation PR series shipped).
|
||||
- **Each section gets pruned** once its decisions ship and land in the blueprint.
|
||||
- **Open calls are answered** in conversation, then folded into the corresponding section as "Decided: X" with the date.
|
||||
- **Document deleted** when v1.1.9 ships — everything by then is either in the blueprint, in code, or explicitly deferred to v1.2+.
|
||||
@@ -14,8 +14,8 @@ All of these carry the same version and are bumped together:
|
||||
|
||||
- Every crate in the Cargo workspace (via `version.workspace = true`)
|
||||
- The dashboard's `package.json`
|
||||
- Docker image tags (`picloud:0.2.0`)
|
||||
- Git tags (`v0.2.0`)
|
||||
- Docker image tags (`picloud:1.1.0`)
|
||||
- Git tags (`v1.1.0`)
|
||||
|
||||
Defined once in [`Cargo.toml`](../Cargo.toml) under `[workspace.package]`. There is no scenario where one crate is at a different version than another in the same build.
|
||||
|
||||
@@ -106,19 +106,15 @@ A versioning scheme without enforcement decays in months. Five cheap mechanical
|
||||
|
||||
## When to bump what
|
||||
|
||||
The product version follows SemVer applied pragmatically — we're pre-1.0, so the rules are looser:
|
||||
The product version uses SemVer with one carve-out for the platform's expansion cadence:
|
||||
|
||||
- **Patch** (`0.2.0 → 0.2.1`) — bug fixes, no surface change
|
||||
- **Minor** (`0.2 → 0.3`) — any surface bump, new features, or breaking changes (pre-1.0 license)
|
||||
- **Major** (`0 → 1`) — first stable release; SDK and API both committed to long-term compatibility
|
||||
- **Major** (`1.x → 2.0`) — surface major bump on a user-facing contract: removed/renamed/retyped SDK function, retired API version, breaking schema change that requires user action, breaking wire-protocol change.
|
||||
- **Minor** (`1.1 → 1.2`) — phase milestone or coherent capability cluster. Bumped when the maintainer marks a release as "the platform moved forward in a way that warrants a number". Typically aligned with blueprint Phase boundaries (Phase 5 → v1.2, Phase 6 → v1.3+).
|
||||
- **Patch** (`1.1.0 → 1.1.1`) — everything else: bug fixes AND **additive-only surface changes**. New SDK function, new admin endpoint, new schema migration that only adds tables/columns, new env var, new trigger kind — all patch.
|
||||
|
||||
After `1.0`, the product version follows strict SemVer based on the *worst* surface change:
|
||||
**Why the carve-out:** PiCloud ships in many small additive PRs (every v1.1.x release adds SDK surface). A strict "minor product bump per minor surface bump" rule would inflate the product version faster than the actual user-perceived "platform changed" milestones warrant. Patch-for-additions keeps the minor digit aligned with capability clusters, not individual feature drops.
|
||||
|
||||
- Any surface major bump → product major bump
|
||||
- Any surface minor bump → product minor bump (at minimum)
|
||||
- No surface changes → product patch
|
||||
|
||||
A surface can hit its own `1.0` independently of the product. The SDK in particular is likely to stabilize before the platform does, since scripts in production demand it.
|
||||
**Surface versions follow their own rules** (table above) and don't track the product version. A surface can independently hit its own `1.0` or `2.0`. The SDK in particular is likely to stabilize before the platform does, since scripts in production demand it.
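
As a worked illustration of the product-version rules (major for a user-facing contract break, minor for a maintainer-marked phase milestone, patch for everything else including additive changes), here is a small sketch. The `ReleaseChange` shape and both function names are hypothetical and exist only for this example; nothing here is part of PiCloud's tooling.

```typescript
type ProductBump = "major" | "minor" | "patch";

// Hypothetical summary of what a release contains.
interface ReleaseChange {
  breaksUserContract: boolean; // removed/renamed SDK function, retired API version, ...
  phaseMilestone: boolean; // maintainer-marked capability cluster
}

// Applies the carve-out: additive surface changes do not raise the minor digit.
function productBump(change: ReleaseChange): ProductBump {
  if (change.breaksUserContract) return "major";
  if (change.phaseMilestone) return "minor";
  return "patch";
}

// Computes the next "x.y.z" version string for a given bump.
function applyBump(version: string, bump: ProductBump): string {
  const [major, minor, patch] = version.split(".").map(Number);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}
```

For instance, an additive `kv` SDK release on `1.1.0` resolves to a patch bump, while a phase milestone resets the patch digit and raises the minor one.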
|
||||
|
||||
---
|
||||
|
||||
@@ -126,7 +122,7 @@ A surface can hit its own `1.0` independently of the product. The SDK in particu
|
||||
|
||||
| | Version |
|
||||
|---|---|
|
||||
| Product | `0.6.0` |
|
||||
| Product | `1.1.0` |
|
||||
| SDK | `1.1` (adds `ctx.request.params`, `ctx.request.query`, `ctx.request.rest`) |
|
||||
| API | `1` (additive: `Script.app_id`, `Route.app_id`, `ExecutionLog.app_id`, new `/api/v1/admin/apps/*` and `/api/v1/admin/api-keys/*` endpoints, `?app=` filter on script list, `Authorization: Bearer pic_…` credential type, 403 responses on previously-401-only admin endpoints when the caller lacks the required capability) |
|
||||
| Schema | `6` (matches `migrations/0006_users_authz.sql`) |
|
||||
@@ -138,15 +134,19 @@ Read live from `GET /version` on any running instance.
|
||||
|
||||
## Examples
|
||||
|
||||
**Adding a `kv.*` SDK in v1.1+:**
|
||||
- Workspace bump: `0.2.0 → 0.3.0` (pre-1.0 minor)
|
||||
- SDK bump: `"1.0" → "1.1"` (added functions only)
|
||||
- API bump: none (no new endpoints affect existing API contract)
|
||||
- Schema bump: `1 → 2` (`0002_kv_store.sql` adds the `kv_store` table)
|
||||
**Adding a `kv.*` SDK in v1.1.1:**
|
||||
- Workspace bump: `1.1.0 → 1.1.1` (patch — additive SDK + schema, no breakage)
|
||||
- SDK bump: `"1.1" → "1.2"` (added functions only)
|
||||
- API bump: none (admin endpoints for trigger CRUD are additive)
|
||||
- Schema bump: `6 → 7` (`0007_kv_store.sql` adds the `kv_store` table)
|
||||
|
||||
**Cutting the v1.2 release (Phase 5: workflows, advanced query, interceptors):**
|
||||
- Workspace bump: `1.1.9 → 1.2.0` (minor — phase milestone)
|
||||
- Even if no individual change is breaking, the maintainer-marked phase transition warrants the minor digit.
|
||||
|
||||
**Renaming `ctx.execution_id` to `ctx.exec_id`:**
|
||||
- SDK bump: `"1.x" → "2.0"` (breaking)
|
||||
- Product: minor bump pre-1.0, major bump post-1.0
|
||||
- SDK bump: `"1.x" → "2.0"` (breaking — removed/retyped script-visible field)
|
||||
- Workspace bump: `1.x.y → 2.0.0` (product major — user-facing contract break)
|
||||
- Migration path: keep `ctx.execution_id` available in 1.x for a deprecation window, add `ctx.exec_id` alongside; flip to 2.0 only when both fields have shipped together for a release.
|
||||
|
||||
**Adding pagination to `GET /api/v1/admin/scripts`:**
|
||||
|
||||