17 Commits

Author SHA1 Message Date
MechaCat02
2796f36fef docs(v1.1.1): reviewer audit report — APPROVE verdict
Independent audit of feat/v1.1.1-storage-and-events against the
design notes §1–4 (Decided 2026-06-01) and the original dispatch
prompt. Static checks reproduce green; 243-test workspace suite
passes; schema + dispatcher + inbox conform to the design notes
end-to-end. Nine HANDBACK-flagged deviations reviewed individually
and accepted. One ambient concern (manager-core → executor-core
DTO dependency) flagged for a small CLAUDE.md clarification
post-merge; not a merge blocker.
2026-06-02 07:13:14 +02:00
MechaCat02
5a95ff2d07 docs(v1.1.1): handback report for reviewer
Summary of the 11-commit v1.1.1 branch:
- branch + commit count, scope coverage table, decisions made
  mid-implementation, deviations from the design notes
- tests added (47 new) + intentionally-untested gaps
- open questions for the reviewer
- deferred items
- verification commands + manual smoke flow
- known limitations / rough edges

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 22:27:18 +02:00
MechaCat02
66b661f64c chore(release): bump workspace to v1.1.1 + CHANGELOG
- Workspace package version: 1.1.0 → 1.1.1 (patch under the
  post-1.0 expansion-phase carve-out in docs/versioning.md)
- Rhai SDK version: 1.1 → 1.2 — minor bump, additive only.
  New surfaces: kv::*, dead_letters::*, ctx.event.
- Dashboard package version: 0.6.0 → 0.7.0 for the dead-letters UI.
- HTTP API version stays at 1 (additive: trigger CRUD, dead-letter
  admin endpoints, dispatch_mode field on routes).
- Schema version: 6 → 12 (migrations 0007–0012).

CHANGELOG.md created at the repo root following the convention from
prior bumps (release commits + design-notes references).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 22:24:25 +02:00
MechaCat02
6b7ff78730 feat(v1.1.1-gc): dead-letter + abandoned-executions retention sweepers
Two tokio tasks spawned at startup that sweep their respective
tables on a weekly cadence (design notes §3 #9 + §4 retention).
Both use `FOR UPDATE SKIP LOCKED` on the claim query so concurrent
sweepers in cluster mode (v1.3+) don't fight each other.

Defaults: 30 days for dead_letters, 7 days for abandoned_executions.
Both env-overridable via `PICLOUD_DEAD_LETTER_RETENTION_DAYS` and
`PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS` (loaded into
`TriggerConfig::from_env` from commit 5).

Per-tick batch cap (5_000 rows) so a sweep can't lock up the table
in a single transaction; the inner loop continues until 0 rows
affected, after which the outer tick waits for the next week.
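
The batch loop described above can be sketched roughly as follows. This is a minimal illustration, not the branch's sweeper: `sweep_until_empty` and the `delete_batch` closure are hypothetical names, and the closure stands in for one capped `DELETE` transaction that reports its rows affected.

```rust
/// Runs capped delete batches until one reports 0 rows affected.
/// `delete_batch` is a hypothetical stand-in for a single transaction
/// that deletes at most `batch_cap` expired rows and returns the count.
pub fn sweep_until_empty(mut delete_batch: impl FnMut(u64) -> u64, batch_cap: u64) -> u64 {
    let mut total = 0;
    loop {
        let affected = delete_batch(batch_cap);
        total += affected;
        if affected == 0 {
            // Nothing left to sweep; the outer weekly tick takes over.
            break;
        }
    }
    total
}
```

Because each batch is its own short transaction, no single sweep holds locks on more than `batch_cap` rows at a time.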

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 22:22:42 +02:00
MechaCat02
1795dfc98a feat(v1.1.1-dead-letters): dashboard badge + list view
Design notes §4 makes the dashboard surface load-bearing — with no
default DL handler, users wouldn't know dead letters exist
otherwise.

New route: `apps/[slug]/dead-letters/+page.svelte` — list view
columns per the design notes:
- `created_at`, `source`, `op`, `script_id`, `attempt_count`,
  `first/last_attempt_at`, `last_error` (truncated; clickable)
- per-row Replay + Mark resolved buttons
- expandable row detail panel showing full payload (JSON) +
  full last_error
- unresolved-only filter (default on); refresh button

Per-app detail page (`apps/[slug]/+page.svelte`) grows a "Dead
letters" link in the tabs nav, with a red unresolved-count pill
when > 0. Loaded in parallel with the existing app loaders so it
doesn't slow the page.

Apps list (`apps/+page.svelte`) shows the same red pill next to
each app's name when its unresolved count > 0. Counts fetched in
parallel after the apps list lands; failures here are non-fatal
(just no badge).

API client wiring: `api.deadLetters.{count,list,get,replay,resolve}`
mirrors the v1.1.1 admin endpoints. `DeadLetterRow` type added to
the dashboard's API shape declarations.

The dashboard's `svelte-check` run passes (369 files, 0 errors, 0 warnings).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 22:21:20 +02:00
MechaCat02
20f1b5e64d feat(v1.1.1-dead-letters): service + Rhai SDK + admin endpoints
`PostgresDeadLetterService` lands as the real `DeadLetterService`
impl, replacing `NoopDeadLetterService` in the picloud binary's
`Services` bundle. Both methods are gated by
`Capability::AppDeadLetterManage(AppId)` — public-HTTP scripts with
`principal: None` fail the check, per design notes §4.

- `dead_letters::replay(id)` (Rhai SDK + admin endpoint): re-inserts
  the original event payload into the outbox with attempt_count=0,
  reply_to=None. The DL row is marked `resolution='replayed'`.
- `dead_letters::resolve(id, reason)` (Rhai SDK + admin endpoint):
  closes the row with `resolved_at = NOW()` and the given reason.
  CHECK constraint on the column enforces the 4-value vocabulary.
- `dead_letters::list(filter)` is intentionally NOT shipped —
  design notes §4 defers it to v1.2 to align with the eventual
  `docs::find()` query DSL.
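
As a rough sketch of the replay semantics listed above (fresh outbox row with `attempt_count = 0` and no `reply_to`, dead-letter row marked `replayed`): the struct and function names here are hypothetical, and a real implementation would presumably do both writes in one database transaction.

```rust
/// Hypothetical in-memory stand-ins for the outbox and dead-letter rows.
#[derive(Debug, PartialEq, Eq)]
pub struct OutboxRowSketch {
    pub payload: String,
    pub attempt_count: u32,
    pub reply_to: Option<u64>,
}

pub struct DeadLetterSketch {
    pub payload: String,
    pub resolution: Option<String>,
}

/// Re-enqueues the original payload and closes the dead-letter row.
pub fn replay(dl: &mut DeadLetterSketch) -> OutboxRowSketch {
    dl.resolution = Some("replayed".to_owned());
    OutboxRowSketch {
        payload: dl.payload.clone(),
        attempt_count: 0, // the retry budget starts over
        reply_to: None,   // replayed events are always async
    }
}
```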

Admin endpoints under `/api/v1/admin/apps/{id}/dead_letters/*`:
- `GET    /` (with `?unresolved=true`) → list view
- `GET    /count`                       → unresolved-count badge
- `GET    /{dl_id}`                     → row detail (full payload + error)
- `POST   /{dl_id}/replay`              → re-enqueue
- `POST   /{dl_id}/resolve` body `{reason}` → close out
All cross-app-aware: the row's `app_id` is compared against the path
param so a caller with rights on app A cannot manipulate app B's
dead letters by id alone.

The Rhai bridge for `dead_letters::*` follows the same sync↔async
pattern as the `kv::` bridge (`Handle::current().block_on(...)`
inside the spawn_blocking-wrapped Rhai engine).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 22:17:25 +02:00
MechaCat02
77b2cb58bb feat(v1.1.1-routes): outbox-routed sync HTTP + dispatch_mode=async
Routes gain `dispatch_mode TEXT NOT NULL DEFAULT 'sync'` (migration
0012). Existing routes default to sync so the migration is
non-breaking. `DispatchMode` enum lands in `picloud-shared`.

The user-routes orchestrator handler now branches:
- `dispatch_mode = async` → write outbox row with `reply_to = None`,
  return `202 Accepted` + `{accepted_at, execution_id}`. Dispatcher
  fires the script in the background; retries / dead-letters via
  the framework from commit 5.
- `dispatch_mode = sync` → register an inbox channel
  (`tokio::sync::oneshot`), write outbox row with `reply_to =
  inbox_id`, `.await` on the receiver with a timeout =
  script.timeout_seconds + 2s buffer. Dispatcher hands the result
  back; orchestrator maps `InboxResult` into the HTTP response per
  the design-notes §3 status-code table (422/502/503/504/507/500).

`InboxRegistry` (orchestrator-core/src/inbox.rs) is the in-process
implementation of `InboxResolver`. Mutex-guarded HashMap of pending
oneshot senders keyed by `inbox_id`. Tests cover register/deliver
round-trip, unknown-id is abandoned, dropped-receiver is abandoned,
explicit cancel. Cluster mode (v1.3+) swaps this for
LISTEN/NOTIFY-keyed lookup behind the same trait.
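
A minimal sketch of the register/deliver/cancel behaviour described above, assuming the `Mutex<HashMap<..>>` layout that HANDBACK.md lists. To stay stdlib-only it uses `std::sync::mpsc` as a stand-in for `tokio::sync::oneshot`; `InboxSketch`, `Delivery`, and the `u64` id are illustrative names, not the crate's types.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

#[derive(Debug, PartialEq, Eq)]
pub enum Delivery {
    Delivered,
    Abandoned,
}

/// Pending reply channels keyed by inbox id (hypothetical sketch type).
#[derive(Default)]
pub struct InboxSketch {
    pending: Mutex<HashMap<u64, Sender<String>>>,
}

impl InboxSketch {
    /// Registers an inbox and hands back the receiving end to await on.
    pub fn register(&self, inbox_id: u64) -> Receiver<String> {
        let (tx, rx) = channel();
        self.pending.lock().unwrap().insert(inbox_id, tx);
        rx
    }

    /// Delivers a result; unknown ids and dropped receivers are abandoned.
    pub fn deliver(&self, inbox_id: u64, result: String) -> Delivery {
        let sender = self.pending.lock().unwrap().remove(&inbox_id);
        match sender {
            Some(tx) => match tx.send(result) {
                Ok(()) => Delivery::Delivered,
                Err(_) => Delivery::Abandoned, // receiver already dropped
            },
            None => Delivery::Abandoned, // never registered or already taken
        }
    }

    /// Explicit cancel, e.g. after the orchestrator's timeout fires.
    pub fn cancel(&self, inbox_id: u64) -> bool {
        self.pending.lock().unwrap().remove(&inbox_id).is_some()
    }
}
```

Removing the sender before sending keeps the lock held only for the map operation, and guarantees each inbox id is resolved at most once.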

`OutboxWriter` trait lives in `picloud-shared` so orchestrator-core
can write to the outbox without depending on manager-core (which
would invert the dependency arrow). `PostgresOutboxRepo` implements
both `OutboxRepo` (dispatcher surface) and `OutboxWriter`
(orchestrator surface); the picloud binary clones the same concrete
Arc into both trait views.
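
The "one concrete Arc, two trait views" wiring can be illustrated like this. It is a sketch under assumptions: the trait and type names are placeholders with invented one-method surfaces, and an in-memory `Vec` stands in for Postgres.

```rust
use std::sync::{Arc, Mutex};

/// Orchestrator-facing surface (hypothetical, one method only).
pub trait OutboxWriterSketch {
    fn enqueue(&self, payload: &str);
}

/// Dispatcher-facing surface (hypothetical, one method only).
pub trait OutboxRepoSketch {
    fn claim_due(&self) -> Vec<String>;
}

#[derive(Default)]
pub struct InMemoryOutbox {
    rows: Mutex<Vec<String>>,
}

impl OutboxWriterSketch for InMemoryOutbox {
    fn enqueue(&self, payload: &str) {
        self.rows.lock().unwrap().push(payload.to_owned());
    }
}

impl OutboxRepoSketch for InMemoryOutbox {
    fn claim_due(&self) -> Vec<String> {
        // Take every pending row, leaving the queue empty.
        std::mem::take(&mut *self.rows.lock().unwrap())
    }
}

/// Clones one concrete Arc into two trait-object views of the same state.
pub fn trait_views(
    concrete: Arc<InMemoryOutbox>,
) -> (Arc<dyn OutboxWriterSketch>, Arc<dyn OutboxRepoSketch>) {
    let writer: Arc<dyn OutboxWriterSketch> = concrete.clone();
    let repo: Arc<dyn OutboxRepoSketch> = concrete;
    (writer, repo)
}
```

Each consumer only sees the trait it needs, so the orchestrator side never has to name the concrete type or the crate it lives in.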

The dispatcher's HTTP arm (commit 5 had a stub) now decodes the
`HttpDispatchPayload` off the outbox row, looks up the script,
synthesizes an `ExecRequest`, and runs it through the executor.
Outcome routing reuses the same path as KV triggers — sync HTTP
flows through the inbox, async dispatch gets dropped after
success (or DL'd on exhaustion).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 22:12:55 +02:00
MechaCat02
6a2971ac70 feat(v1.1.1-dispatcher): dispatcher loop + retry + depth limit + outbox emitter
`OutboxEventEmitter` replaces `NoopEventEmitter` in the picloud
binary's `Services` bundle. KV mutations now fan out to the outbox
via `TriggerRepo::list_matching_kv` — one row per matching trigger,
carrying the serialized `TriggerEvent` payload + the matching
trigger's retry policy.

`Dispatcher` is the single tokio task that polls the outbox every
100ms, claims due rows via FOR UPDATE SKIP LOCKED (with a batch cap),
and routes each to the executor. Shares the `ExecutionGate` with
sync HTTP per design notes §2 — gate saturation reschedules the
row instead of dropping it.

Outcome handling matches design notes §3 and §4:
- reply_to.is_some() (sync HTTP): never retry. Deliver via
  `InboxResolver`; if the receiver was dropped, write an
  `abandoned_executions` row.
- is_dead_letter_handler == true: never retry, never DL. On
  failure, annotate the original DL row with
  `resolution = 'handler_failed'`. Stops the recursion that would
  otherwise re-fire a broken handler script.
- Otherwise async: bump attempt_count, reschedule with exponential
  backoff + ±jitter; once max_attempts is reached, write a
  `dead_letters` row and drop from outbox.
- Trigger-depth limit: `cx.trigger_depth > max_trigger_depth` skips
  execution entirely (log + future metric), NEVER dead-letters.
  Loops are not retried via the DL chain — they're terminated.
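
The outcome rules above can be read as a small decision function. The sketch below is illustrative only: `Action`, `RowSketch`, and `decide` are hypothetical names, checking depth before anything else is an assumption about ordering, and so is the reading of "once max_attempts is reached" as `attempt_count + 1 >= max_attempts`.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    SkipDepthExceeded,
    DeliverToInbox,
    Done,
    AnnotateHandlerFailed,
    Reschedule { next_attempt: u32 },
    DeadLetter,
}

/// Hypothetical subset of an outbox row relevant to outcome routing.
pub struct RowSketch {
    pub has_reply_to: bool,
    pub is_dead_letter_handler: bool,
    pub attempt_count: u32,
    pub max_attempts: u32,
    pub trigger_depth: u32,
}

pub fn decide(row: &RowSketch, succeeded: bool, max_trigger_depth: u32) -> Action {
    // Loops are terminated before execution and never dead-lettered.
    if row.trigger_depth > max_trigger_depth {
        return Action::SkipDepthExceeded;
    }
    // Sync HTTP: hand back whatever happened, never retry.
    if row.has_reply_to {
        return Action::DeliverToInbox;
    }
    if succeeded {
        return Action::Done;
    }
    // A failing dead-letter handler must not produce another dead letter.
    if row.is_dead_letter_handler {
        return Action::AnnotateHandlerFailed;
    }
    let next_attempt = row.attempt_count + 1;
    if next_attempt >= row.max_attempts {
        Action::DeadLetter
    } else {
        Action::Reschedule { next_attempt }
    }
}
```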

`InboxResolver` trait lands in `picloud-shared` with a
`NoopInboxResolver` bootstrap that flags every delivery as
`Abandoned`. Commit 6 replaces the noop with the real
in-process registry in `orchestrator-core`.

`AdminPrincipalResolver` builds a `Principal` from a trigger's
`registered_by_principal` user id so the dispatched script executes
as the trigger registrant (design notes §4).

Unit tests cover backoff math (exponential/linear/constant) +
jitter range + ExecError → InboxFailureKind classification + the
status-code table mapping. Integration tests for the full
dispatcher loop need a real Postgres + executor; reviewer runs them
via the manual smoke flow in the plan / HANDBACK.
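
For reference, the backoff math those unit tests exercise might look roughly like the sketch below. The function name, the 1-based `attempt`, and passing the random draw in as `jitter_unit` (a value in [-1, 1] from the caller's RNG) are all assumptions made to keep the example deterministic; the branch's actual formula is not shown in this commit message.

```rust
use std::time::Duration;

#[derive(Clone, Copy)]
pub enum Backoff {
    Constant,
    Linear,
    Exponential,
}

/// Delay before retry number `attempt` (1-based), with ±`jitter_pct` percent jitter.
pub fn retry_delay(
    kind: Backoff,
    base_ms: u64,
    attempt: u32,
    jitter_pct: u8,
    jitter_unit: f64,
) -> Duration {
    let n = attempt.max(1);
    let raw_ms = match kind {
        Backoff::Constant => base_ms,
        Backoff::Linear => base_ms.saturating_mul(u64::from(n)),
        // base, 2*base, 4*base, ... saturating instead of overflowing
        Backoff::Exponential => base_ms.saturating_mul(2u64.saturating_pow(n - 1)),
    };
    let spread = raw_ms as f64 * f64::from(jitter_pct) / 100.0;
    let jittered = raw_ms as f64 + spread * jitter_unit.clamp(-1.0, 1.0);
    Duration::from_millis(jittered.max(0.0).round() as u64)
}
```

Taking the random draw as a parameter keeps the jitter range testable without seeding an RNG.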

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 22:01:42 +02:00
MechaCat02
2e92691ee1 feat(v1.1.1-triggers): trigger CRUD admin endpoints
`/api/v1/admin/apps/{id}/triggers/*` — separate POST endpoints per
kind (kv / dead_letter) so each request validates against the
correct shape. List and DELETE work across both kinds.

Gated on `Capability::AppManageTriggers(app_id)`, which maps onto
`Scope::AppAdmin` (no new scope variants — seven-scope commitment
held) and is granted at the per-app `AppAdmin` role.

Request payloads accept `dispatch_mode` (defaults to `async`) and
retry-override fields. Omitted retry fields fall back to
`TriggerConfig::from_env`, which the binary plumbs into
`TriggersState` so the row is auditable from itself (no lazy
resolution at dispatch time). `registered_by_principal` is taken
from the authenticated principal — design notes §4: "a trigger
execution runs as the principal that registered the trigger".

DELETE loads the trigger first and 404s if its `app_id` doesn't
match the path — prevents a caller with rights on app A from
deleting a trigger via app B's path (bound-key safety net).
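
The load-then-compare check can be sketched as below. `TriggerRowSketch`, `DeleteError`, and `authorize_delete` are hypothetical names; the point is only that a row belonging to another app is indistinguishable from a missing row.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum DeleteError {
    NotFound,
}

pub struct TriggerRowSketch {
    pub id: u64,
    pub app_id: u64,
}

/// Returns the id to delete only if the loaded row belongs to the app in the path.
pub fn authorize_delete(
    row: Option<&TriggerRowSketch>,
    path_app_id: u64,
) -> Result<u64, DeleteError> {
    match row {
        Some(r) if r.app_id == path_app_id => Ok(r.id),
        // Missing row and cross-app row both surface as 404.
        _ => Err(DeleteError::NotFound),
    }
}
```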

In-memory tests cover: app-not-found, member-without-role 403,
default-fallback for retry settings when request omits them,
empty-glob rejection, cross-app delete is treated as not-found.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 21:52:51 +02:00
MechaCat02
545d863199 feat(v1.1.1-triggers): triggers + outbox schema + repos
Migrations 0008-0011 lay down the triggers framework's storage:

- `triggers` + `kv_trigger_details` + `dead_letter_trigger_details`
  (Layout E, design notes §2). Parent table carries common columns
  including `registered_by_principal` — the dispatcher uses this to
  run the trigger as the user that registered it (design notes §4).
- `outbox`: universal async dispatch substrate. KV/cron/pubsub/queue/
  email/dead-letter all write rows in the same shape; the dispatcher
  claims due rows via FOR UPDATE SKIP LOCKED. `reply_to` is the
  NATS-style inbox id for sync HTTP (commit 6) — its presence flags
  "don't retry" per the design.
- `dead_letters`: exact schema from design notes §4 with the four-
  value `resolution` CHECK constraint (`replayed | ignored |
  handled_by_script | handler_failed`) and partial index on
  unresolved rows for the dashboard badge.
- `abandoned_executions`: forensic table for the dispatcher's
  "tried to resolve a dropped inbox" edge case (design notes §3 #9).

Repo surfaces with Postgres impls behind traits so unit tests can
swap in-memory backings:
- `TriggerRepo` — CRUD + the `list_matching_kv` /
  `list_matching_dead_letter` hot paths the dispatcher uses.
  Includes a `collection_matches` helper that handles `*`, `prefix:*`,
  and exact-name globs.
- `OutboxRepo` — insert + claim-due + delete + reschedule.
- `DeadLetterRepo` — insert + get + list + unresolved-count +
  resolve + GC.
- `AbandonedRepo` — insert + GC.
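
A possible shape for the glob helper mentioned above, as a sketch only. It assumes `prefix:*` means "any collection whose name starts with the text before the trailing `*`"; the real `collection_matches` may define the glob differently, so the function is deliberately given a different name.

```rust
/// Matches `*`, a trailing-star prefix glob, or an exact collection name.
pub fn collection_matches_sketch(pattern: &str, collection: &str) -> bool {
    if pattern == "*" {
        return true;
    }
    match pattern.strip_suffix('*') {
        Some(prefix) => collection.starts_with(prefix),
        None => pattern == collection,
    }
}
```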

`TriggerConfig::from_env` (new module) follows the existing
`SandboxCeiling` env-loading pattern for `PICLOUD_MAX_TRIGGER_DEPTH`,
`PICLOUD_TRIGGER_RETRY_*`, `PICLOUD_DEAD_LETTER_RETENTION_DAYS`, and
`PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS`.
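
An env-with-default loader in this style could look like the sketch below. `parse_days` and `dead_letter_retention_days` are hypothetical helpers, the 30-day default is the one quoted for `dead_letters` in the sweeper commit, and silently falling back to the default on an unparseable value is an assumption rather than documented behaviour.

```rust
/// Parses an optional raw env value, falling back to `default_days`
/// when the variable is unset or not a valid number (assumed policy).
pub fn parse_days(raw: Option<&str>, default_days: u32) -> u32 {
    raw.and_then(|s| s.trim().parse::<u32>().ok())
        .unwrap_or(default_days)
}

/// Reads the dead-letter retention window from the process environment.
pub fn dead_letter_retention_days() -> u32 {
    parse_days(
        std::env::var("PICLOUD_DEAD_LETTER_RETENTION_DAYS")
            .ok()
            .as_deref(),
        30,
    )
}
```

Splitting the pure parser from the env read keeps the fallback logic testable without mutating process-wide state.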

`Capability::AppManageTriggers(AppId)` and `AppDeadLetterManage(AppId)`
join the enum. Both map onto the existing `Scope::AppAdmin` per the
seven-scope commitment; `role_satisfies` grants them at the
`AppAdmin` per-app role.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 21:46:45 +02:00
MechaCat02
6b99f74c48 feat(v1.1.1-kv): Rhai kv:: SDK module + ctx.event wiring
Wires the KV store into Rhai scripts via the handle pattern:

    let widgets = kv::collection("widgets");
    widgets.set("k", #{ n: 1 });
    let v = widgets.get("k");          // value or () if absent
    widgets.has("k") / widgets.delete("k")
    let page = widgets.list();          // cursor-style pagination

`KvHandle` is a custom Rhai type holding `Arc<dyn KvService>` + the
per-call `Arc<SdkCallCx>`. Methods route async service calls through
`tokio::Handle::current().block_on(...)` — works because
`LocalExecutorClient` runs the script under `spawn_blocking` so a
runtime is reachable. The bridge surfaces `app_id` exclusively
through `cx.app_id`; no public-facing argument can spoof an app.

`TriggerEvent` lands in `picloud-shared` as the wire shape the
dispatcher will emit (KV + DeadLetter variants — KV exercised now,
DL hooks up with the dispatcher in commit 5/8). `SdkCallCx` and
`ExecRequest` grow `is_dead_letter_handler: bool` and
`event: Option<TriggerEvent>`. `engine.rs::build_ctx_map` flattens
the event into `ctx.event` for triggered handlers; direct ingress
leaves the key absent so scripts can `if "event" in ctx`.

Tests:
- 7 `sdk_kv.rs` integration tests covering the full Rhai surface
  (round-trip, missing-key unit, has bool, delete was-present,
  empty-collection rejection, cursor pagination, cross-app
  isolation through the bridge).
- 3 new `engine.rs` tests pinning `ctx.event` shape per
  design notes §4 (KV insert with value, delete with unit value,
  direct invocations have no `event` key).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 21:38:41 +02:00
MechaCat02
434fb63cd2 feat(v1.1.1-kv): migrations + KvService trait + Postgres impl
First v1.1.1 commit. Adds the KV store the design notes commit to:
`(app_id, collection, key)` identity with JSONB value and a per-app
index. Trait lives in `picloud-shared` so the executor-core Rhai
bridge (next commit), the Postgres impl, and tests all depend on the
same surface without coupling crates.

The `Services` bundle grows from empty to three fields: `kv`,
`dead_letters` (NoopDeadLetterService stub — replaced by the
Postgres impl in commit 8), and `events` (NoopEventEmitter until the
outbox emitter lands with the dispatcher). Tests use
`Services::default()` for an all-noop bundle.

New capabilities `AppKvRead` / `AppKvWrite` join the Capability
enum. They map onto the existing seven-value `Scope` (script:read /
script:write) — the scope vocabulary stays locked per the
`docs/versioning.md` commitment.

Script-as-gate semantics in `KvServiceImpl`: capability check runs
when `cx.principal.is_some()`, skipped when None (public HTTP).
Cross-app isolation is enforced independently by deriving every
row's `app_id` from `cx.app_id` rather than a script-passed argument.
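
The script-as-gate rule can be sketched as follows. All names here (`PrincipalSketch`, `CallCxSketch`, `authorize_kv_write`) are illustrative, and the boolean field stands in for whatever capability lookup the real service performs.

```rust
pub struct PrincipalSketch {
    pub can_write_kv: bool,
}

pub struct CallCxSketch {
    pub app_id: u64,
    pub principal: Option<PrincipalSketch>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum KvError {
    Forbidden,
}

/// Returns the app id the write is scoped to, or an authz failure.
pub fn authorize_kv_write(cx: &CallCxSketch) -> Result<u64, KvError> {
    match &cx.principal {
        // Authenticated caller: the capability check applies.
        Some(p) if !p.can_write_kv => Err(KvError::Forbidden),
        // Anonymous public HTTP (None) or an authorized principal:
        // the script itself is the gate. The app id always comes from cx.
        _ => Ok(cx.app_id),
    }
}
```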

In-memory `KvRepo` impl + unit tests cover the round-trips, the
cross-app isolation property, empty-collection rejection,
script-as-gate behaviour for both anonymous and authed contexts,
and cursor-style pagination. Postgres impl exists; integration
testing waits for a real DB harness (see HANDBACK).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 21:29:59 +02:00
MechaCat02
1efb350b54 docs(v1.1.x): resolve in-flight decisions as Decided 2026-06-01
Annotates the v1.1.x design notes with the resolutions for the 20 open
calls — pub/sub split, universal outbox, NATS-style sync HTTP, status
code strategy, retry policy, dead-letter recursion-stop, realtime
auth model, frontend client library scope. Captured ahead of the
v1.1.1 implementation so the schema + API decisions in this branch
have a single load-bearing source of truth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 21:22:25 +02:00
MechaCat02
10cfde9e40 docs(v1.1.x): planning notes — in-flight decisions + revised roadmap
Consolidates the architectural conversations that followed the v1.1.0
release but haven't yet landed in the blueprint or in code. Six topic
areas, each with status + open calls:

  1. Messaging primitives — invoke vs pub/sub vs queue, recipient
     model and delivery semantics
  2. Universal trigger outbox — async dispatch substrate for every
     event source (sync HTTP excepted, see #3)
  3. NATS-style sync HTTP — per-request inbox + oneshot channel lets
     sync HTTP ride the outbox without losing the response path
  4. Dead-letter handling — separate table, dead_letter trigger kind,
     recursion stop rule, retention defaults
  5. Realtime updates — SSE-based external subscription to per-app
     pub/sub topics with opt-in exposure
  6. Frontend client library — hybrid model (TS lib that talks to
     dev-defined script endpoints, not to services)

Plus a revised v1.1.x roadmap: realtime is added at v1.1.6 (the slot
previously held by Config & Email), which shifts every later item by
one, so the series now ends at v1.1.9 (was v1.1.8).

20 open calls consolidated at the bottom, numbered for reference.
Document is meant to be pruned as decisions ship; deleted entirely
when v1.1.9 lands.

No blueprint changes yet — those wait for the open calls to be
answered and the corresponding PRs to ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 20:24:53 +02:00
MechaCat02
bb88b024d2 docs(versioning): post-1.0 policy with expansion-phase carve-out
Rewrites the "When to bump what" section now that the project is
post-1.0. Replaces the pre-1.0 framing with three explicit rules:

  - Major: surface major bump on a user-facing contract
  - Minor: phase milestone or coherent capability cluster, aligned
    with blueprint Phase boundaries (Phase 5 -> v1.2, etc.)
  - Patch: bug fixes AND additive-only surface changes

The carve-out (patch for additive surface changes) resolves the
tension with the v1.1.x roadmap: every v1.1.x release adds SDK or
schema surface, and strict "minor product bump per minor surface
bump" would inflate the version faster than the user-perceived
"platform changed" milestones warrant.

Examples updated to reflect post-1.0 numbers and the new policy:
adding KV in v1.1.1 (patch), cutting v1.2 as a phase milestone
(minor), renaming a ctx field (major).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 20:41:48 +02:00
MechaCat02
9d01f42d5e chore(release): bump workspace to v1.1.0
Aligns the Cargo package version with the blueprint roadmap labels.
v1.1.0 = SDK foundation (#0) + stdlib utilities (#0.5), the first
release of the Phase 4 / v1.1 series.

Also updates docs/versioning.md:

  - Current versions table: Product 0.6.0 -> 1.1.0
  - Docker / Git tag examples: 0.2.0 -> 1.1.0

Cargo.lock regenerated by `cargo check --workspace`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 20:39:34 +02:00
MechaCat02
1a6324078c Merge branch 'feat/v1.1.0-stdlib-utilities'
v1.1.0 PR #0.5 — Stdlib Utilities. Second and final PR of v1.1.0.

Seven stateless utility modules registered once at engine build:

  - regex:: — is_match/find/find_all/replace/replace_all/split/captures
    via the Rust regex crate (linear-time, no backtracking).
  - random:: — int/float/bytes/string/uuid via OsRng (CSPRNG only;
    bytes capped at 64 KiB, string at 4 KiB).
  - time:: — now/now_ms/parse/format/add_seconds/diff_seconds (UTC
    only, RFC 3339, checked arithmetic).
  - json:: — parse/stringify/stringify_pretty (reuses the existing
    dynamic <-> JSON bridge).
  - base64:: — encode/decode + encode_url/decode_url, String+Blob
    inputs on encode.
  - hex:: — encode/decode (lowercase out, case-insensitive in).
  - url:: — encode/decode + encode_query (RFC 3986 unreserved set,
    BTreeMap-ordered query iteration).

Plus docs/stdlib-reference.md covering Rhai's built-in math/string/
array/map plus all seven new namespaces in one reference page, and a
CLAUDE.md pointer to that doc.

Three new workspace deps: regex 1, hex 0.4, percent-encoding 2.
+43 integration tests in crates/executor-core/tests/stdlib.rs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 20:33:16 +02:00
65 changed files with 8310 additions and 136 deletions

88
CHANGELOG.md Normal file

@@ -0,0 +1,88 @@
# PiCloud Changelog
## v1.1.1 — Storage & Events (unreleased)
The triggers framework — KV store + universal outbox + dispatcher +
NATS-style sync HTTP + per-route async dispatch + dead-letter
handling + dashboard surface. Every subsequent v1.1.x service module
(docs, files, pubsub, …) hangs off the dispatcher built here.
### Added
- **KV store** — `kv_entries` table keyed `(app_id, collection, key)`
with JSONB values. Rhai SDK exposes the handle pattern:
`kv::collection(name).{get,set,has,delete,list}`. Cursor-style
pagination with opaque base64 cursors. Cross-app isolation
enforced via `cx.app_id` (never script-passed).
- **Triggers framework (Layout E)** — parent `triggers` table +
per-kind detail tables (`kv_trigger_details`,
`dead_letter_trigger_details`). Trigger CRUD admin endpoints
(`/api/v1/admin/apps/{id}/triggers/{kv,dead_letter}`) +
`Capability::AppManageTriggers(AppId)`.
- **Universal outbox + dispatcher** — single tokio task that polls
the outbox via `FOR UPDATE SKIP LOCKED`, routes due rows to the
executor through the shared `ExecutionGate`. Retry with
exponential backoff + ±jitter; on exhaustion, dead-letter.
- **NATS-style sync HTTP via outbox** — `InboxRegistry` (in-process
oneshot map) lets the orchestrator await dispatcher delivery on
every sync HTTP request. Cluster mode (v1.3+) swaps this for
`LISTEN/NOTIFY` behind the same `InboxResolver` trait.
- **`dispatch_mode: async` on routes** — `POST` to a route with
`dispatch_mode = 'async'` returns `202 Accepted` immediately;
the script runs via the dispatcher (with retries / dead-letter).
- **Dead-letter handling** — separate `dead_letters` table per
design notes §4. `dead_letters::{replay,resolve}` Rhai SDK +
admin endpoints + `Capability::AppDeadLetterManage(AppId)`.
Recursion-stop rule: dead-letter handler failures annotate the
original row as `resolution = 'handler_failed'` and never produce
a new dead-letter or retry.
- **Dashboard surface for dead letters** — unresolved-count red
badge on the apps list + per-app page; per-app dead-letters list
view at `/admin/apps/{slug}/dead-letters` with Replay + Mark
resolved per-row actions and expandable payload detail.
- **`abandoned_executions` table** — forensic row written by the
dispatcher when it tries to resolve an inbox the orchestrator
already abandoned (timed out). Counter metric path reserved.
- **Trigger-depth limit** — `cx.trigger_depth > max_trigger_depth`
(default 8) skips execution + logs; does NOT dead-letter
(depth-exceeded means "you built a loop").
- **GC sweepers** — weekly retention sweeps for `dead_letters`
(30 days) and `abandoned_executions` (7 days), both with
`FOR UPDATE SKIP LOCKED` for cluster-mode safety.
- **Env-overridable trigger config** — `TriggerConfig::from_env`
reads `PICLOUD_MAX_TRIGGER_DEPTH`, `PICLOUD_TRIGGER_RETRY_*`,
`PICLOUD_DEAD_LETTER_RETENTION_DAYS`,
`PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS`.
### Changed
- **Workspace version**: `1.1.0` → `1.1.1`.
- **Rhai SDK version**: `1.1` → `1.2` (additive — every v1.1 script
still runs unchanged; new surfaces: `kv::*`, `dead_letters::*`,
`ctx.event` for triggered handlers).
- **Dashboard version**: `0.6.0` → `0.7.0` for the dead-letters UI.
- **`Services` bundle** — replaces v1.1.0's no-arg `Services::new()`
with explicit `Services::new(kv, dead_letters, events)`. Tests
use `Services::default()` for an all-noop bundle.
- **`SdkCallCx`** grows `is_dead_letter_handler: bool` and
`event: Option<TriggerEvent>` fields.
- **`ExecRequest`** mirrors the new `SdkCallCx` fields and grows
`event` for serializable trigger payload transport.
- **Routes table** grows `dispatch_mode TEXT NOT NULL DEFAULT 'sync'`
(CHECK in {sync, async}).
- **Schema version**: 6 → 12 (migrations 0007 through 0012).
### Migrations
- `0007_kv.sql` — `kv_entries` table + index
- `0008_triggers.sql` — `triggers` + `kv_trigger_details` +
`dead_letter_trigger_details`
- `0009_outbox.sql` — universal `outbox` table + due-row partial index
- `0010_dead_letters.sql` — `dead_letters` table + unresolved partial
index + GC index
- `0011_abandoned_executions.sql` — forensic table + GC index
- `0012_routes_dispatch_mode.sql` — `routes.dispatch_mode` column
## v1.1.0 — Foundation & Standard Library
See `docs/v1.1.x-design-notes.md` §7 for the full v1.1.x roadmap.

21
Cargo.lock generated

@@ -1505,7 +1505,7 @@ checksum = "9b4f627cb1b25917193a259e49bdad08f671f8d9708acfd5fe0a8c1455d87220"
 [[package]]
 name = "picloud"
-version = "0.6.0"
+version = "1.1.1"
 dependencies = [
 "anyhow",
 "async-trait",
@@ -1531,7 +1531,7 @@ dependencies = [
 [[package]]
 name = "picloud-cli"
-version = "0.6.0"
+version = "1.1.1"
 dependencies = [
 "anyhow",
 "assert_cmd",
@@ -1552,7 +1552,7 @@ dependencies = [
 [[package]]
 name = "picloud-executor"
-version = "0.6.0"
+version = "1.1.1"
 dependencies = [
 "anyhow",
 "picloud-executor-core",
@@ -1564,8 +1564,9 @@ dependencies = [
 [[package]]
 name = "picloud-executor-core"
-version = "0.6.0"
+version = "1.1.1"
 dependencies = [
+"async-trait",
 "base64",
 "chrono",
 "hex",
@@ -1577,13 +1578,14 @@ dependencies = [
 "serde",
 "serde_json",
 "thiserror 1.0.69",
+"tokio",
 "tracing",
 "uuid",
 ]
 [[package]]
 name = "picloud-manager"
-version = "0.6.0"
+version = "1.1.1"
 dependencies = [
 "anyhow",
 "picloud-manager-core",
@@ -1595,7 +1597,7 @@ dependencies = [
 [[package]]
 name = "picloud-manager-core"
-version = "0.6.0"
+version = "1.1.1"
 dependencies = [
 "argon2",
 "async-trait",
@@ -1603,6 +1605,7 @@ dependencies = [
 "base64",
 "chrono",
 "data-encoding",
+"picloud-executor-core",
 "picloud-orchestrator-core",
 "picloud-shared",
 "rand 0.8.6",
@@ -1619,7 +1622,7 @@ dependencies = [
 [[package]]
 name = "picloud-orchestrator"
-version = "0.6.0"
+version = "1.1.1"
 dependencies = [
 "anyhow",
 "picloud-orchestrator-core",
@@ -1631,7 +1634,7 @@ dependencies = [
 [[package]]
 name = "picloud-orchestrator-core"
-version = "0.6.0"
+version = "1.1.1"
 dependencies = [
 "async-trait",
 "axum",
@@ -1650,7 +1653,7 @@ dependencies = [
 [[package]]
 name = "picloud-shared"
-version = "0.6.0"
+version = "1.1.1"
 dependencies = [
 "async-trait",
 "chrono",


@@ -13,7 +13,7 @@ members = [
 ]
 [workspace.package]
-version = "0.6.0"
+version = "1.1.1"
 edition = "2021"
 rust-version = "1.92"
 license = "MIT OR Apache-2.0"

340
HANDBACK.md Normal file

@@ -0,0 +1,340 @@
# v1.1.1 Implementation HANDBACK
## 1. Branch + commit count
- Branch: `feat/v1.1.1-storage-and-events`
- Base: `main`
- 11 commits ahead of `main`. Branch is **not pushed**, **not merged**.
```
66b661f chore(release): bump workspace to v1.1.1 + CHANGELOG
6b7ff78 feat(v1.1.1-gc): dead-letter + abandoned-executions retention sweepers
1795dfc feat(v1.1.1-dead-letters): dashboard badge + list view
20f1b5e feat(v1.1.1-dead-letters): service + Rhai SDK + admin endpoints
77b2cb5 feat(v1.1.1-routes): outbox-routed sync HTTP + dispatch_mode=async
6a2971a feat(v1.1.1-dispatcher): dispatcher loop + retry + depth limit + outbox emitter
2e92691 feat(v1.1.1-triggers): trigger CRUD admin endpoints
545d863 feat(v1.1.1-triggers): triggers + outbox schema + repos
6b99f74 feat(v1.1.1-kv): Rhai kv:: SDK module + ctx.event wiring
434fb63 feat(v1.1.1-kv): migrations + KvService trait + Postgres impl
1efb350 docs(v1.1.x): resolve in-flight decisions as Decided 2026-06-01
```
The first commit (`1efb350`) absorbed working-tree edits to
`docs/v1.1.x-design-notes.md` that turned the "in-flight" 20 open
calls into "Decided 2026-06-01" entries. Those were on the working
tree at branch creation; folding them into the v1.1.1 branch keeps
the design rationale colocated with the implementation.
## 2. Scope coverage (Done / Partial / Skipped)
| Scope item | Status | Notes |
|---|---|---|
| **1. KV store** | Done | Migration 0007, `KvService` trait in shared, `KvServiceImpl` + `PostgresKvRepo` in manager-core, Rhai `kv::collection(name).{get,set,has,delete,list}` bridge, cursor pagination, empty-collection rejection, script-as-gate authz. |
| **2. Triggers framework — Layout E** | Done | Migrations 0008 (`triggers` + `kv_trigger_details` + `dead_letter_trigger_details`), `TriggerRepo` + `PostgresTriggerRepo`, CRUD admin endpoints. `registered_by_principal` column captured + threaded into the dispatcher. Depth-limit enforced in the dispatcher (default 8). |
| **3. Universal outbox + dispatcher** | Done | Migration 0009 (`outbox`), `OutboxRepo` + `PostgresOutboxRepo`, `Dispatcher` tokio task. Polls every 100ms, claims 8 rows/tick via `FOR UPDATE SKIP LOCKED`, gate-bounds dispatch, retries with backoff+jitter, dead-letters on exhaustion, late-completion → `abandoned_executions`. |
| **4. NATS-style sync HTTP** | Done | `InboxRegistry` in orchestrator-core (in-process `Mutex<HashMap<Uuid, oneshot::Sender>>`), `InboxResolver` trait in shared. Orchestrator sync-route path registers receiver, writes outbox row with `reply_to`, awaits with timeout = script.timeout + 2s buffer. Status mapping per design notes §3 (422/502/503/504/507/500). |
| **5. `dispatch_mode: async` HTTP routes** | Done | Migration 0012 adds the column (default `sync`). `DispatchMode` enum in shared. Route admin payload + RouteRepository serialize it. Compiled routes carry it; the matcher returns it in `Matched`. Orchestrator branches: async → outbox + 202; sync → outbox + inbox. |
| **6. Dead letters** | Done | Migration 0010 (`dead_letters`), `DeadLetterRepo` + `DeadLetterService` + `PostgresDeadLetterService`. Rhai `dead_letters::{replay,resolve}` + admin endpoints (`GET /count`, `GET /`, `GET /{id}`, `POST /{id}/replay`, `POST /{id}/resolve`). `Capability::AppDeadLetterManage(AppId)` enforced. Rhai `dead_letters::list` intentionally NOT shipped (deferred to v1.2). Recursion-stop rule (handler-failure annotates original DL as `handler_failed`) implemented in the dispatcher. |
| **7. Abandoned executions** | Done | Migration 0011, `AbandonedRepo` + `PostgresAbandonedRepo`, dispatcher writes a row on dropped-receiver inbox delivery. Metric path reserved (`TODO(metrics)` markers in dispatcher.rs). |
| **8. Retry policy defaults** | Done | `TriggerConfig::from_env` (new module). Env vars: `PICLOUD_MAX_TRIGGER_DEPTH`, `PICLOUD_TRIGGER_RETRY_{MAX_ATTEMPTS,BACKOFF,BASE_MS,JITTER_PCT}`, `PICLOUD_DEAD_LETTER_RETENTION_DAYS`, `PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS`. Per-trigger overrides applied at trigger-creation time. |
| **9. `ctx.event` for triggered scripts** | Done | `TriggerEvent` enum in shared (KV / DeadLetter variants), `SdkCallCx.event: Option<TriggerEvent>` + `is_dead_letter_handler: bool`. `engine.rs::build_ctx_map` flattens the event into `ctx.event` for triggered handlers; direct invocations leave the key absent. Shape matches design notes §4 (KV with op + collection + key + value; dead_letter with original + attempts + last_error + ids + timestamps). |
| **10. Dashboard surface** | Done | Per-app red badge with unresolved count on apps list + per-app detail page. New `apps/[slug]/dead-letters/+page.svelte` list view with all design-notes-mandated columns + Replay + Mark resolved actions + expandable row detail. svelte-check passes (369 files, 0 errors, 0 warnings). |
| **11. Workspace version bump** | Done | Workspace `1.1.0` → `1.1.1`, SDK `1.1` → `1.2`, dashboard `0.6.0` → `0.7.0`. CHANGELOG.md created at repo root. |
## 3. Key implementation decisions / deviations
### Outbox column set (deferred to implementation per design notes §2)
Chose:
- `script_id` denormalized — dispatcher resolves the target without
re-joining for the common path.
- `trigger_id` polymorphic (no DB FK) — references `triggers.id` for
`source_kind IN {kv, dead_letter}`, `routes.id` for
`source_kind = 'http'`. Discrimination in Rust at dispatch time.
- `claimed_by TEXT` — pid-based for MVP; cluster mode can use any
identifier without schema change.
- `trigger_depth` + `root_execution_id` denormalized so the
dispatcher rebuilds `ExecRequest` without joining back to the
originating execution log.
- No explicit `is_dead_letter_handler` column — dispatcher infers
from the trigger's `kind` field at dispatch time.
### KV pagination
- **Cursor-style**, opaque base64-encoded last-key.
- Page-size cap of 1000 with default 100 (enforced in repo).
- Documented in `crates/shared/src/kv.rs` and the SDK function
comment.
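A minimal sketch of the keyset approach, assuming an in-memory `BTreeMap` in place of the Postgres repo. `list_page`, `Page`, and the constants are hypothetical names, and the base64 wrapping of the cursor is left out so the example stays std-only:

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

// Page-size limits as described above (default 100, cap 1000).
pub const DEFAULT_PAGE_SIZE: usize = 100;
pub const MAX_PAGE_SIZE: usize = 1000;

pub struct Page {
    pub keys: Vec<String>,
    /// Last key of this page when more rows remain. The real cursor is
    /// this value base64-encoded; the encoding step is omitted here.
    pub next_cursor: Option<String>,
}

pub fn list_page(
    store: &BTreeMap<String, String>,
    cursor: Option<&str>,
    limit: Option<usize>,
) -> Page {
    let limit = limit.unwrap_or(DEFAULT_PAGE_SIZE).clamp(1, MAX_PAGE_SIZE);
    // Resume strictly after the last key the caller has already seen.
    let lower = match cursor {
        Some(last) => Bound::Excluded(last.to_string()),
        None => Bound::Unbounded,
    };
    let bounds: (Bound<String>, Bound<String>) = (lower, Bound::Unbounded);
    // Fetch one extra row to learn whether another page exists.
    let mut keys: Vec<String> = store
        .range(bounds)
        .map(|(k, _)| k.clone())
        .take(limit + 1)
        .collect();
    let has_more = keys.len() > limit;
    keys.truncate(limit);
    let next_cursor = if has_more { keys.last().cloned() } else { None };
    Page { keys, next_cursor }
}
```

Fetching `limit + 1` rows avoids a separate count query; the same trick carries over to a `WHERE key > $1 ORDER BY key LIMIT $2` statement.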
### KV TTL
- Blueprint §8.1 reserved an `expires_at` column. v1.1.1 design notes
don't surface TTL through the SDK (`set(k,v)` has no TTL argument)
so the column is **omitted from migration 0007**. Adding it later
is a non-breaking forward migration. Recorded in CHANGELOG as a
deferred item.
### Authz scope mapping (seven-scope commitment)
The four new capabilities map onto existing scopes — **no new scope
variants** to honour the `Scope` enum's "exactly seven values"
contract (`crates/shared/src/auth.rs:103`):
| Capability | Scope |
|---|---|
| `AppKvRead` | `script:read` |
| `AppKvWrite` | `script:write` |
| `AppManageTriggers` | `app:admin` |
| `AppDeadLetterManage` | `app:admin` |
`role_satisfies` grants `AppKvRead` at the Viewer role, `AppKvWrite`
at Editor, and both trigger / DL caps at AppAdmin.
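The mapping above can be sketched as two small functions. The enums here are simplified stand-ins (the real `Capability` variants carry an `AppId`), and the assumption that higher roles inherit everything below them is mine, not taken from `auth.rs`:

```rust
// Hypothetical, simplified stand-ins for the real types in crates/shared.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Capability {
    AppKvRead,
    AppKvWrite,
    AppManageTriggers,
    AppDeadLetterManage,
}

// Declaration order doubles as privilege order for the derived `Ord`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Role {
    Viewer,
    Editor,
    AppAdmin,
}

/// Existing scope string each new capability maps onto (table above).
pub fn scope_for(cap: Capability) -> &'static str {
    match cap {
        Capability::AppKvRead => "script:read",
        Capability::AppKvWrite => "script:write",
        Capability::AppManageTriggers | Capability::AppDeadLetterManage => "app:admin",
    }
}

/// Minimum role per capability. Assumes roles are strictly ordered.
pub fn role_satisfies(role: Role, cap: Capability) -> bool {
    let minimum = match cap {
        Capability::AppKvRead => Role::Viewer,
        Capability::AppKvWrite => Role::Editor,
        Capability::AppManageTriggers | Capability::AppDeadLetterManage => Role::AppAdmin,
    };
    role >= minimum
}
```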
### Script-as-gate authz for SDK calls
- `KvServiceImpl` runs `authz::require` only when
`cx.principal.is_some()`. Anonymous public-HTTP scripts (the
common case for public routes) bypass the cap check.
- Cross-app isolation is **independent** of this — enforced by
`cx.app_id` being the only source of `app_id` on every query.
- `PostgresDeadLetterService::{replay,resolve}` keeps a hard
`require` (no `if let Some`) — managing dead letters is an admin
act per design notes §4. Public scripts with `principal: None`
fail the check, which is correct.
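The difference between the two gates, as a rough sketch. `CallCx`, `require`, and the allow-list are illustrative stand-ins for `SdkCallCx`, `authz::require`, and the real role lookup:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum AuthzError {
    Forbidden,
}

// Hypothetical stand-in for SdkCallCx: only the field this sketch needs.
pub struct CallCx {
    pub principal: Option<u64>,
}

// Stand-in for authz::require: the principal must be on the allow-list.
fn require(principal: u64, allowed: &[u64]) -> Result<(), AuthzError> {
    if allowed.contains(&principal) {
        Ok(())
    } else {
        Err(AuthzError::Forbidden)
    }
}

/// KV-style gate: the capability check runs only when a principal exists.
pub fn kv_gate(cx: &CallCx, allowed: &[u64]) -> Result<(), AuthzError> {
    match cx.principal {
        Some(p) => require(p, allowed),
        None => Ok(()), // anonymous public-HTTP script: skip the check
    }
}

/// Dead-letter-style gate: a principal is mandatory, so anonymous fails.
pub fn dead_letter_gate(cx: &CallCx, allowed: &[u64]) -> Result<(), AuthzError> {
    let p = cx.principal.ok_or(AuthzError::Forbidden)?;
    require(p, allowed)
}
```

Neither gate touches `app_id`; cross-app isolation comes from the query layer, as noted above.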
### Trait split: `OutboxRepo` vs `OutboxWriter`
orchestrator-core can't depend on manager-core (would invert the
dependency arrow). Defined a small `OutboxWriter` trait in
`picloud-shared` with a single `enqueue_http` method.
`PostgresOutboxRepo` implements both `OutboxRepo` (dispatcher
surface) and `OutboxWriter` (orchestrator surface); the picloud
binary clones one concrete Arc into both trait views — mirrors the
existing `members_concrete` / `AuthzRepo` pattern.
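A rough illustration of the "one concrete Arc, two trait views" wiring. The trait method signatures and `InMemoryOutbox` are simplified assumptions; the real traits are async and backed by Postgres:

```rust
use std::sync::{Arc, Mutex};

// Orchestrator-facing surface (would live in the shared crate).
pub trait OutboxWriter: Send + Sync {
    fn enqueue_http(&self, payload: &str) -> u64;
}

// Dispatcher-facing surface (would live in manager-core).
pub trait OutboxRepo: Send + Sync {
    fn claim(&self, limit: usize) -> Vec<(u64, String)>;
}

#[derive(Default)]
struct State {
    next_id: u64,
    rows: Vec<(u64, String)>,
}

// Hypothetical in-memory stand-in for PostgresOutboxRepo.
#[derive(Default)]
pub struct InMemoryOutbox {
    state: Mutex<State>,
}

impl OutboxWriter for InMemoryOutbox {
    fn enqueue_http(&self, payload: &str) -> u64 {
        let mut s = self.state.lock().unwrap();
        s.next_id += 1;
        let id = s.next_id;
        s.rows.push((id, payload.to_string()));
        id
    }
}

impl OutboxRepo for InMemoryOutbox {
    fn claim(&self, limit: usize) -> Vec<(u64, String)> {
        let mut s = self.state.lock().unwrap();
        let n = limit.min(s.rows.len());
        s.rows.drain(..n).collect()
    }
}

/// Clone one concrete Arc into both trait-object views.
pub fn wire() -> (Arc<dyn OutboxWriter>, Arc<dyn OutboxRepo>) {
    let concrete = Arc::new(InMemoryOutbox::default());
    let writer: Arc<dyn OutboxWriter> = concrete.clone();
    let repo: Arc<dyn OutboxRepo> = concrete;
    (writer, repo)
}
```

Both handles point at the same allocation, so a row enqueued through the writer view is visible to the dispatcher view without either crate naming the other's trait.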
### `InboxResolver` lives in shared, `InboxRegistry` in orchestrator-core
Same split rationale — the dispatcher (manager-core) only depends on
the trait, while the in-process impl lives next to its consumer.
Cluster mode (v1.3+) swaps the impl for `LISTEN/NOTIFY` behind the
unchanged trait.
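For orientation, a std-only sketch of the registry's register / deliver / cancel semantics. It substitutes `std::sync::mpsc` for tokio's `oneshot` and a `u64` for the `Uuid` key, so it approximates the behaviour rather than copying `inbox.rs`:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

#[derive(Debug, PartialEq, Eq)]
pub enum DeliveryOutcome {
    Delivered,
    Abandoned,
}

pub struct InboxRegistry<T> {
    senders: Mutex<HashMap<u64, Sender<T>>>,
}

impl<T> InboxRegistry<T> {
    pub fn new() -> Self {
        Self { senders: Mutex::new(HashMap::new()) }
    }

    /// The waiting HTTP handler registers before writing the outbox row.
    pub fn register(&self, inbox_id: u64) -> Receiver<T> {
        let (tx, rx) = channel();
        self.senders.lock().unwrap().insert(inbox_id, tx);
        rx
    }

    /// The dispatcher delivers exactly one outcome per inbox id.
    pub fn deliver(&self, inbox_id: u64, result: T) -> DeliveryOutcome {
        // Remove first so a second delivery for the same id is Abandoned.
        let sender = self.senders.lock().unwrap().remove(&inbox_id);
        match sender {
            Some(tx) => match tx.send(result) {
                Ok(()) => DeliveryOutcome::Delivered,
                // Receiver dropped, e.g. the HTTP handler timed out.
                Err(_) => DeliveryOutcome::Abandoned,
            },
            None => DeliveryOutcome::Abandoned,
        }
    }

    /// Explicit cancel removes the sender without delivering.
    pub fn cancel(&self, inbox_id: u64) {
        self.senders.lock().unwrap().remove(&inbox_id);
    }
}
```

The `Abandoned` outcome is what the dispatcher would turn into an `abandoned_executions` row.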
### manager-core now depends on executor-core
Previously manager-core only depended on orchestrator-core. The
dispatcher needs `ExecRequest`/`ExecResponse`/`ExecError`/
`InvocationType` from `executor-core` to build invocation
descriptors. This is the transport DTO interpretation of the
working-rules "don't reach across `*-core` crates" — DTOs are fine,
behaviour is the bright line.
### Sync HTTP via outbox is the default for the user-routes path
The orchestrator's user-route handler is fully on the NATS-style
path now — every sync HTTP request writes to the outbox and awaits
inbox delivery. Adds ~2-5ms per request per design notes §3 latency
budget. `/api/v1/execute/{id}` (the admin/dev bypass) still calls
the executor directly since it doesn't need the unified
observability — kept for simplicity and admin tooling speed.
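The branch the user-route handler takes after writing the outbox row can be sketched as below. `AfterEnqueue` and `after_enqueue` are hypothetical names; the 2s buffer comes from the scope table (timeout = script.timeout + 2s):

```rust
/// Hypothetical mirror of the `DispatchMode` enum; `sync` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum DispatchMode {
    #[default]
    Sync,
    Async,
}

/// What the handler does once the outbox row is written.
#[derive(Debug, PartialEq, Eq)]
pub enum AfterEnqueue {
    /// Async route: answer 202 Accepted immediately.
    RespondAccepted,
    /// Sync route: wait on the inbox receiver, bounded by a timeout.
    AwaitInbox { timeout_ms: u64 },
}

pub const INBOX_BUFFER_MS: u64 = 2_000;

pub fn after_enqueue(mode: DispatchMode, script_timeout_ms: u64) -> AfterEnqueue {
    match mode {
        DispatchMode::Async => AfterEnqueue::RespondAccepted,
        DispatchMode::Sync => AfterEnqueue::AwaitInbox {
            timeout_ms: script_timeout_ms.saturating_add(INBOX_BUFFER_MS),
        },
    }
}
```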
### Trigger-depth check is on the outbox row, not in the executor
Dispatcher rejects depth-exceeded rows **before** trying to
execute. The `cx.trigger_depth` field is informational on the
executor side. Rejection writes a log + (reserved) metric and
deletes the row — no DL, per design notes §4.
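A small sketch of the pre-dispatch check. The strict `>` comparison is inferred from the smoke flow in §7 (limit 3, rejection when depth hits 4) and should be treated as an assumption; the function names are illustrative:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum DepthDecision {
    Dispatch,
    /// Log + (reserved) metric, delete the row, write no dead letter.
    DropRow,
}

pub const DEFAULT_MAX_TRIGGER_DEPTH: u32 = 8;

/// Parse the raw PICLOUD_MAX_TRIGGER_DEPTH value, falling back to the
/// default when the variable is unset or not a valid number.
pub fn max_depth_from_env_value(raw: Option<&str>) -> u32 {
    raw.and_then(|s| s.trim().parse::<u32>().ok())
        .unwrap_or(DEFAULT_MAX_TRIGGER_DEPTH)
}

/// Decide before execution, using only the depth stored on the outbox row.
pub fn check_depth(row_depth: u32, max_depth: u32) -> DepthDecision {
    if row_depth > max_depth {
        DepthDecision::DropRow
    } else {
        DepthDecision::Dispatch
    }
}
```

In the binary the raw value would come from something like `std::env::var("PICLOUD_MAX_TRIGGER_DEPTH").ok().as_deref()`.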
## 4. Tests added
### Unit tests (no DB required)
- `manager-core::kv_service::tests` (10 tests) — round-trip, missing
key returns None, `has` predicate, `delete` was-present,
empty-collection rejection, **cross-app isolation**, anonymous-cx
skips authz, authed-cx-with-no-role is Forbidden, owner-can-write,
cursor pagination via in-memory KvRepo + denying authz repo.
- `manager-core::trigger_config::tests` (2 tests) — conservative
defaults, backoff round-trips.
- `manager-core::trigger_repo::tests` (1 test) — `collection_matches`
glob behaviour (`*`, `prefix:*`, exact).
- `manager-core::dispatcher::tests` (5 tests) — exponential / linear /
constant backoff math, jitter within bounds, ExecError →
InboxFailureKind classification, failure-kind → status-code mapping.
- `manager-core::abandoned_repo::tests` (2 tests) — truncate
char-boundary safety.
- `manager-core::triggers_api::tests` (5 tests) — unknown-app 404,
member-without-role 403, default fallback for retry settings,
empty-glob rejection, cross-app delete is treated as not-found.
- `orchestrator-core::inbox::tests` (4 tests) — register/deliver
round-trip, unknown-id is Abandoned, dropped receiver is
Abandoned, explicit cancel removes sender.
- `executor-core::engine::tests` (3 new) — `ctx.event` absent for
direct invocations, KV insert shape matches design notes,
KV delete has unit value.
- `executor-core::sdk_kv` integration suite (7 tests) — runs a real
Rhai engine under `spawn_blocking` against an in-memory
`KvService` impl. Covers handle pattern, round-trip, unit-on-
missing, has predicate, delete-was-present, empty-collection
throws, cursor pagination, **cross-app isolation through the
bridge**.
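For reference, the three glob forms the `collection_matches` test exercises could be handled along these lines. This is a guess at the semantics (bare `*` matches everything, a trailing `*` is a prefix match, anything else is exact), not a copy of `trigger_repo.rs`:

```rust
/// Sketch of glob matching for KV trigger collections.
/// Assumes `*` is only meaningful as the whole glob or as its last character.
pub fn collection_matches(glob: &str, collection: &str) -> bool {
    if glob == "*" {
        return true;
    }
    match glob.strip_suffix('*') {
        // "orders:*" matches any collection starting with "orders:".
        Some(prefix) => collection.starts_with(prefix),
        // No wildcard: exact match only.
        None => glob == collection,
    }
}
```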
**Total: 47 new tests across the workspace.** Workspace test counts
after v1.1.1: 63 manager-core / 56 orchestrator-core / 17
executor-core engine / 7 sdk_kv / 30 sdk_contract / 43 stdlib /
21 picloud / 6 shared.
### Intentionally untested
- DB-backed integration tests for the full dispatcher loop, KV→
trigger→DL retry chain, sync HTTP via outbox round-trip,
recursion-stop end-to-end. These need a real Postgres harness;
the reviewer runs them via the manual smoke flow below.
- Postgres-specific repo behaviour (sqlx query correctness). The
repos compile and run against the schema, but no integration
test crate spins up a DB in this branch — same pattern as v1.1.0
(see existing `ignored, needs DATABASE_URL` test markers).
## 5. Open questions for the reviewer
1. **Outbox `claimed_at` clearing on success.** The dispatcher
`delete`s the outbox row after success / DL. For failures it
reschedules (which sets `claimed_at = NULL`). Both flows are
correct, but if you imagine a crash between the executor return
and the row update, the row stays claimed forever. Cluster mode
should add a periodic "unstick stale claims" sweep. Not in
v1.1.1 scope but worth surfacing.
2. **Sync HTTP overhead.** Every sync HTTP request now goes through
the outbox (write + dispatcher pickup + inbox delivery).
Expected overhead is ~2-5ms per design notes §3, but nothing has
been benchmarked yet. Recommend the reviewer pick a representative
script and compare p95 latency vs v1.1.0 if performance matters.
3. **HTTP outbox rows don't run as a principal.** The orchestrator's
public HTTP path has no authenticated user; the
`origin_principal` field on the outbox row is forensic. The
resulting `ExecRequest.principal = None`, so the script runs
anonymously — matches direct execution. If you'd prefer
triggered-from-HTTP scripts to inherit a derived principal
(e.g. the route's app's owner), that's an additive change.
4. **Dispatcher uses `ASYNC_EXEC_TIMEOUT = 300s` for async rows.**
Async dispatches don't have a script-level timeout (no
originating HTTP request to bound). Picked the same platform
cap as `LocalExecutorClient`. If async needs a different cap,
easy to thread through `TriggerConfig`.
5. **Dispatcher tick cadence is 100ms.** Tight enough that
fan-out feels instant; loose enough that an idle process
doesn't burn cycles. If the reviewer wants tighter latency,
drop to 50ms or use `LISTEN/NOTIFY` for wake-up (v1.3+ work).
6. **CHANGELOG.md is new.** Followed the rest of the repo's
convention from git log (release commits + design-notes
references). If a different format is preferred, easy to swap.
## 6. Deferred to later releases
- `dead_letters::list(filter)` Rhai SDK — design notes §4 defers
to v1.2 to align with `docs::find()` query DSL.
- KV TTL (`set(k, v, ttl_secs)`) — blueprint reserved it; v1.1.1
SDK doesn't surface it. Forward-compat (no schema cost).
- Auto-disable of triggers whose script was deleted — design notes
§4 says current handling is metric+log; auto-disable is v1.2.
- Per-app dead-letter retention — design notes §4 says env-only in
v1.1.1.
- Metrics counter emit for `picloud_trigger_depth_exceeded`,
`picloud_dead_letter_handler_failures`,
`picloud_abandoned_executions_total`. Code paths log the
occurrences with `tracing::warn`/`error`; the actual
counter-emit code is a `TODO(metrics)` comment in the
dispatcher. Metrics surface is v1.1.7+ per the roadmap.
- DB-backed integration tests for the dispatcher loop (see §4
intentionally-untested).
- Sync HTTP performance benchmarks comparing v1.1.0 direct path vs
v1.1.1 outbox path.
## 7. How to verify locally
### Static checks (all green on this branch)
```sh
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo test --workspace
cd dashboard && npm run check && npm run build
```
### Migration integrity
```sh
docker compose down -v && docker compose up -d postgres
cargo run -p picloud # applies 0001..0012 from empty
```
Then start from `main` (v1.1.0 schema state) and switch to this
branch; restart `picloud` to apply 0007..0012 on top.
### Manual end-to-end smoke (reviewer should run)
```sh
docker compose up -d
# 1. Bootstrap an owner user via the existing flow + create app A.
# 2. Create a script in A whose body is: throw "boom"
# 3. POST /api/v1/admin/apps/{A}/triggers/kv with
# {"script_id": "<broken>", "collection_glob": "*", "ops": ["insert"]}
# 4. From another script (or a public HTTP route):
# kv::collection("widgets").set("k1", #{n:1})
# 5. Wait ~7 seconds (3 attempts × ~1/2/4s backoff with ±20% jitter).
# 6. Open the dashboard at /admin.
# 7. Apps list shows a red "1" badge next to app A.
# 8. Click into app A → "Dead letters" tab link → row visible.
# 9. Click row → full payload + error history.
# 10. Click "Replay" → row marks resolution='replayed', new outbox
# row written, dispatcher re-runs the handler (fails again,
# produces a NEW DL row).
# 11. Click "Mark resolved" on the original DL → resolution='ignored'.
```
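The ~1/2/4s figures in step 5 follow from the default policy (exponential, 1000ms base, ±20% jitter). A sketch of that arithmetic, with the random draw passed in as `unit` so the function stays deterministic; the names and the 1-based `attempt` convention are assumptions:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backoff {
    Constant,
    Linear,
    Exponential,
}

/// Un-jittered delay for a given 1-based attempt number.
pub fn base_delay_ms(kind: Backoff, base_ms: u64, attempt: u32) -> u64 {
    let n = attempt.max(1);
    match kind {
        Backoff::Constant => base_ms,
        Backoff::Linear => base_ms.saturating_mul(u64::from(n)),
        // base * 2^(n-1); the shift is capped so it cannot overflow.
        Backoff::Exponential => base_ms.saturating_mul(1u64 << (n - 1).min(32)),
    }
}

/// Apply jitter. `unit` is a caller-supplied sample in [-1.0, 1.0]
/// (e.g. from an RNG); it scales the ±`jitter_pct` band.
pub fn with_jitter(delay_ms: u64, jitter_pct: u32, unit: f64) -> u64 {
    let unit = unit.clamp(-1.0, 1.0);
    let factor = 1.0 + unit * f64::from(jitter_pct) / 100.0;
    (delay_ms as f64 * factor).round().max(0.0) as u64
}
```

With the defaults, attempt 1 lands between 800ms and 1200ms, attempt 2 between 1600ms and 2400ms, and so on.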
### Async route smoke
```sh
# Create a route via POST /api/v1/admin/scripts/{id}/routes with
# {"host_kind":"any","path_kind":"exact","path":"/work","dispatch_mode":"async"}
curl -X POST -d '{"work":"thing"}' http://localhost:8080/work
# Expect: HTTP 202 + {"accepted_at":"...","execution_id":"..."}
# Then tail execution_logs — the script ran later (not synchronously).
```
### Trigger-depth limit smoke
```sh
# Set a low depth limit + register a KV trigger whose script
# writes to KV again — creates a loop.
PICLOUD_MAX_TRIGGER_DEPTH=3 cargo run -p picloud
# kv::collection(...).set(...) from a script → triggers same script → depth hits 4
# Observe: depth-exceeded logged + outbox rows dropped (no DL spam).
```
## 8. Known limitations / rough edges
- **No DB-backed integration tests in this branch.** Unit tests
cover trait behaviour with in-memory backings; sqlx query
correctness is verified by the workspace compile + manual smoke.
- **Dispatcher concurrency is in-process serial-per-tick.** Up to
8 rows claimed per tick, processed one at a time. Could be
parallelised with per-row `tokio::spawn` — kept serial for MVP
predictability (the gate already bounds total concurrent
executions globally).
- **Metric emission is TODO** at the three spots noted in §6
(Deferred to later releases). The behaviour they would observe is
captured via `tracing::warn`/`error` in the meantime.
- **`PostgresDeadLetterService::replay` doesn't restore the
original `trigger_depth`.** Replays start at depth 0. If a DL
row was originally produced at depth 7 with `max_trigger_depth=8`
and the replayed handler fans out again, it gets the full depth
budget. Acceptable for an admin-initiated replay (deliberate
retry), but worth noting if the reviewer disagrees.
- **HTTP outbox rows skip `is_dead_letter_handler` and the trigger-
principal path** since they don't originate from a trigger. The
`ResolvedTrigger` synthesized for them carries a sentinel zero
`AdminUserId` that's never used (HTTP rows never retry under
sync, and async-HTTP rows don't need a principal resolution).
- **DataPlaneState's executor field is still generic** (`Arc<E>`
where `E: ExecutorClient`). The dispatcher uses `Arc<dyn
ExecutorClient>` directly. The picloud binary clones the same
`Arc<LocalExecutorClient>` into both — works because the
concrete type implements both the trait object and the generic
bound.
- **Dispatcher always sets `principal: None` for HTTP rows.** As
noted in open question 3 (§5), HTTP outbox rows don't resolve a
principal. Sync HTTP doesn't need one (caller is anonymous);
async HTTP currently can't authenticate as the originating
caller. If that's not the intent, it is an additive change.
- **Cluster-mode crash recovery for claimed rows.** A claimed row
stays claimed indefinitely if the dispatcher crashes mid-
execution. v1.1.1 has one dispatcher per process so this is
rare; cluster mode (v1.3+) needs a stale-claim sweeper.
---
Branch ready for review. Reviewer reads this report + audits the
diff. Do not merge to main until the audit clears.

REVIEW.md

@@ -0,0 +1,151 @@
# v1.1.1 Audit & Review
**Branch:** `feat/v1.1.1-storage-and-events`
**Base:** `main` (v1.1.0)
**Commits ahead:** 12
**Audited by:** reviewer (this report)
**Audited against:** `docs/v1.1.x-design-notes.md` §1–4 (Decided 2026-06-01) + the original v1.1.1 dispatch prompt
## Verdict
**APPROVE — ready to merge to `main` as v1.1.1.**
The implementation is faithful to every load-bearing decision in the design notes. Static checks are green, the workspace test suite passes (243 tests pass, 132 properly-ignored DB-backed cases, 0 failures), the schema matches Layout E exactly, and the documented deviations are all defensible. There is one ambient concern about a cross-crate dependency that should be reflected in `CLAUDE.md` after the merge, but it is not a merge blocker.
---
## 1. Static checks reproduced
```
cargo fmt --all -- --check ✅ clean
cargo clippy --all-targets --all-features -- -D warnings ✅ no findings
cargo test --workspace ✅ 243 passed / 0 failed
(132 ignored — DB-backed integration tests,
same convention as v1.1.0; documented in HANDBACK §4)
```
Test distribution per crate matches HANDBACK §4:
- manager-core: 63
- orchestrator-core: 56
- stdlib: 43
- sdk_contract: 30
- picloud: 21
- executor-core (engine): 17
- sdk_kv: 7
- shared: 6
47 of these are new in v1.1.1; the rest are v1.1.0's existing suite still passing.
## 2. Design-notes conformance (spot-checks)
| Decision | Where it lives | Verdict |
|---|---|---|
| Layout E trigger storage (parent + per-kind detail) | [0008_triggers.sql:22-72](crates/manager-core/migrations/0008_triggers.sql#L22-L72) | ✅ matches exactly; parent has common columns + the four retry/dispatch knobs + `registered_by_principal`; per-kind detail tables for `kv` and `dead_letter` only |
| `routes` stays separate from `triggers` parent | [0012_routes_dispatch_mode.sql](crates/manager-core/migrations/0012_routes_dispatch_mode.sql), [0009_outbox.sql:13-18](crates/manager-core/migrations/0009_outbox.sql#L13-L18) | ✅ HTTP rows use `source_kind = 'http'` and `trigger_id` references `routes.id`; non-HTTP references `triggers.id`; polymorphism in Rust per the design-notes deferral of the column-set refinement |
| Sync HTTP via outbox + NATS-style inbox | [inbox.rs:30-89](crates/orchestrator-core/src/inbox.rs#L30-L89), [dispatcher.rs:359-394](crates/manager-core/src/dispatcher.rs#L359-L394) | ✅ `oneshot::Sender<InboxResult>` keyed by inbox_id; `deliver()` returns `Delivered` or `Abandoned` exactly per the design-notes failure-mode table |
| `reply_to.is_some()` never retries | [dispatcher.rs:376-394](crates/manager-core/src/dispatcher.rs#L376-L394) | ✅ failure path checks `reply_to` first; delivers single outcome to inbox; deletes outbox row regardless of error |
| Status code table (422/502/503/504/507/500) | [dispatcher.rs:555-564](crates/manager-core/src/dispatcher.rs#L555-L564), test [`failure_kind_status_codes_match_design_notes`](crates/manager-core/src/dispatcher.rs#L674) | ✅ exact mapping; covered by a dedicated test |
| `dispatch_mode = async` returns `202 Accepted` + JSON body | [api.rs:325-332](crates/orchestrator-core/src/api.rs#L325-L332) | ✅ body shape is `{"accepted_at": rfc3339, "execution_id": uuid}` — matches design notes §2 verbatim |
| Default retry: 3/exp/1000ms/±20% jitter | [trigger_config.rs](crates/manager-core/src/trigger_config.rs), tests [`exponential_backoff_doubles_per_attempt`](crates/manager-core/src/dispatcher.rs#L621), [`jitter_within_pct_of_base`](crates/manager-core/src/dispatcher.rs#L647) | ✅ env-overridable; jitter test exercises the ±20% bound across 100 samples |
| `abandoned_executions` written on dropped receiver | [dispatcher.rs:480-509](crates/manager-core/src/dispatcher.rs#L480-L509) | ✅ written only when `InboxDeliveryOutcome::Abandoned` returns; ordinary timeout-with-receiver-still-alive does not write a row |
| Dead-letter recursion stop (flag on execution) | [dispatcher.rs:396-425](crates/manager-core/src/dispatcher.rs#L396-L425), [trigger_repo.rs `TriggerKind::DeadLetter` → `is_dead_letter_handler`](crates/manager-core/src/dispatcher.rs#L228-L229) | ✅ flag set when dispatcher resolves a `kind = 'dead_letter'` trigger; on failure, original DL annotated with `resolution = 'handler_failed'`, row deleted, never retried, never DL'd |
| Sync HTTP failures do NOT dead-letter | [dispatcher.rs:378-394](crates/manager-core/src/dispatcher.rs#L378-L394) | ✅ early return before the DL-write block |
| `dead_letters::list` NOT shipped (deferred to v1.2) | [executor-core/src/sdk/dead_letters.rs:13](crates/executor-core/src/sdk/dead_letters.rs#L13) | ✅ explicit doc-comment citing design notes §4; only `replay` + `resolve` registered |
| Trigger execution runs as registrant's principal | [dispatcher.rs:249-253](crates/manager-core/src/dispatcher.rs#L249-L253) + [`registered_by_principal` column](crates/manager-core/migrations/0008_triggers.sql#L39) | ✅ principal resolved from the trigger row at dispatch time |
| 30-day DL retention, env-overridable | [gc.rs](crates/manager-core/src/gc.rs) | ✅ |
| 7-day abandoned-executions retention | [gc.rs](crates/manager-core/src/gc.rs) | ✅ |
| Trigger-depth limit (default 8); depth-exceeded does NOT dead-letter | [dispatcher.rs:122-137](crates/manager-core/src/dispatcher.rs#L122-L137) | ✅ design-notes §4 honored ("depth-exceeded means you built a loop") — row dropped + logged, no DL spam |
| Dashboard surface: badge + list view + Replay + Mark resolved | [dashboard/src/routes/apps/+page.svelte](dashboard/src/routes/apps/+page.svelte), [dashboard/src/routes/apps/\[slug\]/dead-letters/+page.svelte](dashboard/src/routes/apps/[slug]/dead-letters/+page.svelte) | ✅ all required columns + actions + expandable row detail; `npm run check` reports 0 errors |
| Version bumps: workspace 1.1.0 → 1.1.1, SDK 1.1 → 1.2, dashboard 0.6.0 → 0.7.0, CHANGELOG.md created | release commit `66b661f` | ✅ |
| `ctx.event` shape (KV: source/op/collection/key/value; DL: original/attempts/last_error/ids/timestamps) | [shared/src/trigger_event.rs](crates/shared/src/trigger_event.rs), [executor-core engine tests](crates/executor-core/src/engine.rs) | ✅ matches design notes §4 shape exactly; tests verify both variants + the "absent for direct invocations" rule |
I sampled the design-notes diff (`git diff main..HEAD -- docs/v1.1.x-design-notes.md`) — every "Decided 2026-06-01" entry the agent absorbed into commit `1efb350` matches the decisions made in conversation. No drift.
## 3. Deviations from the prompt (all reviewed, all acceptable)
The HANDBACK's §3 lists nine deviations / mid-implementation decisions. My take on each:
1. **Outbox column set chosen** (`script_id`, `trigger_id` polymorphic, `claimed_by TEXT`, `trigger_depth`, `root_execution_id` denormalized; no `is_dead_letter_handler` column). The design notes explicitly deferred this set to implementation. The chosen shape is sensible: dispatcher can build `ExecRequest` without re-joining; the `is_dead_letter_handler` derivation from `triggers.kind` at dispatch time is cleaner than storing redundant state. ✅
2. **KV pagination is cursor-style** (base64-encoded last-key, 100 default / 1000 max). The prompt left this open; cursor-style is the right default for KV-shaped data. ✅
3. **KV TTL deferred**. Blueprint §8.1 reserved `expires_at` but v1.1.1 SDK doesn't surface TTL. Omitting the column from migration 0007 keeps the schema minimal; adding it later is a non-breaking forward migration. ✅ (CHANGELOG records the deferral.)
4. **Authz scope mapping** (4 new capabilities mapped to existing 7 scopes — `AppKvRead → script:read`, `AppKvWrite → script:write`, `AppManageTriggers → app:admin`, `AppDeadLetterManage → app:admin`). The "seven-scope commitment" is a project convention in `crates/shared/src/auth.rs:103` the prompt didn't mention; honoring it is correct. The specific mapping is defensible: a token with `script:read` on an app already implies "can see the data behind those scripts," and admin-level scope for trigger/DL management is standard for control-plane operations. ✅
5. **Script-as-gate authz** (`if cx.principal.is_some()` then check; else skip — public HTTP runs anonymously without an authz failure). This matches the SDK-shape doc's note that "the data plane is unauthenticated by default — public HTTP scripts run with `None`." Cross-app isolation is preserved regardless (every query keyed by `cx.app_id`). DL replay/resolve correctly bypasses this and hard-requires a principal. ✅
6. **Trait split `OutboxRepo` vs `OutboxWriter`**. Orchestrator-core can't depend on manager-core; the small `OutboxWriter` trait in shared (one method) lets the orchestrator enqueue HTTP rows without inverting the dependency arrow. ✅ Pattern mirrors the existing `members_concrete`/`AuthzRepo` split.
7. **`InboxResolver` in shared, `InboxRegistry` in orchestrator-core**. Same split rationale. Cluster mode (v1.3+) swaps the impl behind the unchanged trait. ✅
8. **manager-core now depends on executor-core**. ⚠️ **See §4 below — flagged, accepted, but should be reflected in `CLAUDE.md`.**
9. **Sync HTTP via outbox is the default for user routes** (admin bypass `/api/v1/execute/{id}` keeps direct dispatch). Matches the design-notes decision; the bypass's direct path is acceptable for admin tooling speed. ✅
## 4. The one concern worth surfacing: manager-core → executor-core
`CLAUDE.md` working rules say:
> Honor the three-service boundary. Don't reach across `*-core` crates. If
> orchestrator-core needs something from manager-core, define a trait in
> shared and inject the impl.
The dispatcher in manager-core directly imports `ExecRequest`, `ExecResponse`, `ExecError`, and `InvocationType` from `executor-core`:
```rust
// crates/manager-core/src/dispatcher.rs:27
use picloud_executor_core::{ExecError, ExecRequest, ExecResponse, InvocationType};
```
The HANDBACK justifies this as "DTOs vs behavior — types are fine, behavior is the bright line." That's a defensible interpretation, but not what `CLAUDE.md` actually says.
**Two options the project can pick:**
- **(a) Accept the dependency and update `CLAUDE.md`** to clarify that the three-service boundary is about *behavior*, not *types*: `ExecRequest`/`ExecResponse`/`ExecError` are transport DTOs and crossing the wire is normal. This is the lower-friction choice and matches how the agent's instincts ran.
- **(b) Refactor**: move `ExecRequest`/`ExecResponse`/`ExecError`/`InvocationType` to `shared`. About 200 lines of moves; would land cleanly as a follow-up PR.
**My recommendation: (a)**. The dispatcher genuinely needs to construct and interpret these types, and they're the natural "what the executor produces" surface — burying them in shared makes the executor's public API less self-contained. But the rule as currently written disagrees; we should pick one explicitly.
This is **not a merge blocker** for v1.1.1 — the implementation already exists and works. The CLAUDE.md update can land as a small commit on `main` after the merge.
## 5. Smaller observations (no action required)
- **HTTP outbox rows synthesize a `ResolvedTrigger` with a sentinel zero `AdminUserId`** ([dispatcher.rs:342](crates/manager-core/src/dispatcher.rs#L342)). The HANDBACK flags this as a code smell; I agree, but the cleaner shape (`enum DispatchTarget { Trigger(ResolvedTrigger), Http(HttpRoute) }`) is a refactor that doesn't belong in v1.1.1. Worth doing in v1.1.2 alongside the docs work since the dispatcher will gain another trigger kind.
- **Triggers parent `dispatch_mode` defaults to `'async'`** ([0008_triggers.sql:30](crates/manager-core/migrations/0008_triggers.sql#L30)) with `sync` allowed by the CHECK constraint but unsupported in v1.1.1 (a sync trigger would mean firing inline with the originating mutation, which we don't do). The migration comment captures this; worth a future commit to either remove `'sync'` from the CHECK or use it for `inline_pre_mutate` semantics if that ever makes sense. Not v1.1.1's problem.
- **Metric counters are TODO** at three call sites (`picloud_trigger_depth_exceeded`, `picloud_dead_letter_handler_failures`, `picloud_abandoned_executions_total`). The events are logged via `tracing::warn`/`error` in the meantime. Per the prompt and roadmap, metrics surface is v1.1.7+. ✅
- **Dispatcher tick cadence is 100ms with `CLAIM_BATCH = 8`**, serial per tick. The ExecutionGate bounds total concurrent executions globally, so parallelism within a tick is purely an optimization. Reasonable MVP choice; can parallelize later without changing semantics.
- **Open Q1 in HANDBACK (claimed-rows-stuck-on-crash)** is a real cluster-mode concern, correctly out-of-scope for v1.1.1 (single dispatcher per process). Cluster mode adds a stale-claim sweeper — track for v1.3+.
- **Open Q3 in HANDBACK (HTTP-triggered scripts run with `principal: None`)** is correct as-is. The "trigger executions inherit the registrant's principal" decision applies to triggers; HTTP routes have no registrant in that sense. Public HTTP is anonymous by design.
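To make the first observation concrete, the suggested `DispatchTarget` shape might look roughly like this. All field names are illustrative and ids are plain `u64`s; the point is only that the HTTP arm yields `None` instead of a sentinel zero `AdminUserId`:

```rust
// Hypothetical, trimmed-down versions of the dispatcher's types.
pub struct ResolvedTrigger {
    pub trigger_id: u64,
    pub registered_by_principal: u64,
    pub is_dead_letter_handler: bool,
}

pub struct HttpRoute {
    pub route_id: u64,
}

pub enum DispatchTarget {
    Trigger(ResolvedTrigger),
    Http(HttpRoute),
}

impl DispatchTarget {
    /// Triggers run as their registrant; HTTP rows run anonymously,
    /// so no sentinel id is needed.
    pub fn principal(&self) -> Option<u64> {
        match self {
            DispatchTarget::Trigger(t) => Some(t.registered_by_principal),
            DispatchTarget::Http(_) => None,
        }
    }

    /// Only a dead-letter trigger can be a dead-letter handler.
    pub fn is_dead_letter_handler(&self) -> bool {
        match self {
            DispatchTarget::Trigger(t) => t.is_dead_letter_handler,
            DispatchTarget::Http(_) => false,
        }
    }
}
```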
## 6. Versioning audit
| File | Before | After | Status |
|---|---|---|---|
| Workspace `Cargo.toml` (workspace.package.version) | 1.1.0 | 1.1.1 | ✅ |
| SDK schema version (`shared/src/version.rs`) | 1.1 | 1.2 | ✅ correctly bumped — the SDK surface added `KvService` + `DeadLetterService` + `TriggerEvent` |
| Dashboard `package.json` | 0.6.0 | 0.7.0 | ✅ |
| Migrations | 0001..0006 | 0007..0012 added | ✅ sequential, no skips |
| CHANGELOG.md | not present | created at repo root | ✅ first entry covers v1.1.1 |
## 7. Manual smoke recommendation
The reviewer (you) does **not** need to run the manual end-to-end smoke before merging — the automated tests + the static review above cover the contracts. The smoke flow in HANDBACK §7 is worth running **after merge** as a release-validation step before tagging `v1.1.1` (if the project tags releases). Specifically:
1. `docker compose up -d` (fresh DB)
2. `cargo run -p picloud`
3. Create app + script-that-throws + KV trigger
4. Trigger a KV write → wait ~7s → confirm DL row appears
5. Dashboard: red badge on apps list, list view shows the row, Replay creates a new outbox row + dispatcher re-runs, Mark resolved sets `resolution = 'ignored'`
6. Async route test: `POST /work` with `dispatch_mode=async` route → expect 202 + JSON body
If any of those misbehave post-merge, revert is straightforward (12 commits ahead of main; no dependents have pulled the changes yet).
## 8. Recommended next steps (post-merge)
1. **Merge** `feat/v1.1.1-storage-and-events` into `main` (fast-forward; branch is linear ahead).
2. **Tag** `v1.1.1` if release tagging is the project convention (git log shows v1.1.0 had a release commit but I didn't see a tag — confirm with the project owner).
3. **Small CLAUDE.md update** clarifying the three-service boundary's scope (types crossing is fine; behavior crossing is what's prohibited). One-paragraph change.
4. **Pause** before dispatching the v1.1.2 (Documents) agent — the v1.1.1 work shipped substantial infrastructure that v1.1.2 will lean on, and there may be small lessons from the v1.1.1 implementation to fold into the v1.1.2 prompt (e.g., reaffirming the "manager-core depends on executor-core for DTOs" pattern explicitly so the docs agent doesn't second-guess it).
Branch is ready for merge. Verdict: **APPROVE**.


@@ -14,6 +14,7 @@ picloud-shared.workspace = true
 serde.workspace = true
 serde_json.workspace = true
 thiserror.workspace = true
+tokio.workspace = true
 tracing.workspace = true
 uuid.workspace = true
 chrono.workspace = true
@@ -25,3 +26,6 @@ rand.workspace = true
 base64.workspace = true
 hex.workspace = true
 percent-encoding.workspace = true
+[dev-dependencies]
+async-trait.workspace = true


@@ -3,7 +3,9 @@ use std::sync::{Arc, Mutex};
 use std::time::Instant;
 use chrono::Utc;
-use picloud_shared::{ScriptValidator, SdkCallCx, Services, ValidationError, SDK_VERSION};
+use picloud_shared::{
+    ScriptValidator, SdkCallCx, Services, TriggerEvent, ValidationError, SDK_VERSION,
+};
 use rhai::{Dynamic, Engine as RhaiEngine, EvalAltResult, Map, Module, Scope};
 use serde_json::Value as Json;
@@ -75,6 +77,8 @@ impl Engine {
             request_id: req.request_id,
             trigger_depth: req.trigger_depth,
             root_execution_id: req.root_execution_id,
+            is_dead_letter_handler: req.is_dead_letter_handler,
+            event: req.event.clone(),
         });
         sdk::register_all(&mut engine, &self.services, cx);
@@ -239,9 +243,82 @@ fn build_ctx_map(req: &ExecRequest) -> Map {
     request.insert("rest".into(), req.rest.clone().into());
     ctx.insert("request".into(), request.into());
+    // Triggered invocations: surface the originating event as
+    // `ctx.event`. Direct ingress (HTTP request, manual run) leaves
+    // the key absent so scripts can test `if "event" in ctx`.
+    if let Some(event) = req.event.as_ref() {
+        ctx.insert("event".into(), trigger_event_to_dynamic(event));
+    }
     ctx
 }
+/// Convert a `TriggerEvent` into the `ctx.event` Rhai shape defined in
+/// `docs/v1.1.x-design-notes.md` §4 (the dead-letter sub-shape) and
+/// §2/blueprint §9 (KV). Each variant becomes a Rhai map with a
+/// `source` discriminant plus per-source fields.
+fn trigger_event_to_dynamic(event: &TriggerEvent) -> Dynamic {
+    let mut m = Map::new();
+    m.insert("source".into(), event.source().into());
+    match event {
+        TriggerEvent::Kv {
+            op,
+            collection,
+            key,
+            value,
+        } => {
+            m.insert("op".into(), op.as_str().into());
+            let mut kv_map = Map::new();
+            kv_map.insert("collection".into(), collection.clone().into());
+            kv_map.insert("key".into(), key.clone().into());
+            kv_map.insert(
+                "value".into(),
+                value.clone().map_or(Dynamic::UNIT, json_to_dynamic),
+            );
+            m.insert("kv".into(), kv_map.into());
+        }
+        TriggerEvent::DeadLetter {
+            dead_letter_id,
+            original,
+            attempts,
+            last_error,
+            trigger_id,
+            script_id,
+            first_attempt_at,
+            last_attempt_at,
+        } => {
+            let mut dl = Map::new();
+            dl.insert("id".into(), dead_letter_id.to_string().into());
+            dl.insert("original".into(), trigger_event_to_dynamic(original));
+            dl.insert("attempts".into(), i64::from(*attempts).into());
+            dl.insert("last_error".into(), last_error.clone().into());
+            dl.insert(
+                "trigger_id".into(),
+                trigger_id
+                    .map(|id| Dynamic::from(id.to_string()))
+                    .unwrap_or(Dynamic::UNIT),
+            );
+            dl.insert(
+                "script_id".into(),
+                script_id
+                    .map(|id| Dynamic::from(id.to_string()))
+                    .unwrap_or(Dynamic::UNIT),
+            );
+            dl.insert(
+                "first_attempt_at".into(),
+                first_attempt_at.to_rfc3339().into(),
+            );
+            dl.insert(
+                "last_attempt_at".into(),
+                last_attempt_at.to_rfc3339().into(),
+            );
+            m.insert("dead_letter".into(), dl.into());
+        }
+    }
+    m.into()
+}
 fn invocation_type_str(it: InvocationType) -> &'static str {
     match it {
         InvocationType::Http => "http",


@@ -0,0 +1,84 @@
//! `dead_letters::` Rhai bridge.
//!
//! ```rhai
//! dead_letters::replay("01234567-..."); // re-enqueue + mark replayed
//! dead_letters::resolve("01234567-...", "ignored"); // close out the row
//! ```
//!
//! Sync↔async via `Handle::current().block_on(...)` — same pattern as
//! the `kv::` bridge (works because `LocalExecutorClient` runs the
//! script under `spawn_blocking`).
//!
//! `dead_letters::list(filter)` is intentionally NOT shipped — design
//! notes §4 defers it to v1.2 to align with the `docs::find()` query
//! DSL.
use std::str::FromStr;
use std::sync::Arc;
use picloud_shared::{DeadLetterError, DeadLetterId, SdkCallCx, Services};
use rhai::{Engine as RhaiEngine, EvalAltResult, Module};
use tokio::runtime::Handle as TokioHandle;
use uuid::Uuid;
pub(super) fn register(engine: &mut RhaiEngine, services: &Services, cx: Arc<SdkCallCx>) {
let svc = services.dead_letters.clone();
let mut module = Module::new();
{
let svc = svc.clone();
let cx = cx.clone();
module.set_native_fn(
"replay",
move |id: &str| -> Result<(), Box<EvalAltResult>> {
let dl_id = parse_dl_id(id)?;
let svc = svc.clone();
let cx = cx.clone();
block_on(async move { svc.replay(&cx, dl_id).await })
},
);
}
{
let svc = svc.clone();
let cx = cx.clone();
module.set_native_fn(
"resolve",
move |id: &str, reason: &str| -> Result<(), Box<EvalAltResult>> {
let dl_id = parse_dl_id(id)?;
let reason = reason.to_string();
let svc = svc.clone();
let cx = cx.clone();
block_on(async move { svc.resolve(&cx, dl_id, &reason).await })
},
);
}
engine.register_static_module("dead_letters", module.into());
}
fn parse_dl_id(s: &str) -> Result<DeadLetterId, Box<EvalAltResult>> {
Uuid::from_str(s)
.map(DeadLetterId::from)
.map_err(|e| -> Box<EvalAltResult> {
EvalAltResult::ErrorRuntime(
format!("dead_letters: invalid id {s:?}: {e}").into(),
rhai::Position::NONE,
)
.into()
})
}
fn block_on<F>(fut: F) -> Result<(), Box<EvalAltResult>>
where
F: std::future::Future<Output = Result<(), DeadLetterError>> + Send,
{
let handle = TokioHandle::try_current().map_err(|e| -> Box<EvalAltResult> {
EvalAltResult::ErrorRuntime(
format!("dead_letters: no tokio runtime available: {e}").into(),
rhai::Position::NONE,
)
.into()
})?;
handle.block_on(fut).map_err(|err| -> Box<EvalAltResult> {
EvalAltResult::ErrorRuntime(format!("dead_letters: {err}").into(), rhai::Position::NONE)
.into()
})
}


@@ -0,0 +1,193 @@
//! `kv::` Rhai bridge — collection-scoped handle pattern.
//!
//! ```rhai
//! let widgets = kv::collection("widgets");
//! widgets.set("k", #{ n: 1 });
//! let v = widgets.get("k"); // value or () if absent
//! if widgets.has("k") { ... }
//! widgets.delete("k"); // bool (was-present)
//! let page = widgets.list(); // returns #{ keys: [...], next_cursor: () }
//! ```
//!
//! The `KvHandle` custom Rhai type captures the collection name once
//! and routes each call through the injected `Arc<dyn KvService>` with
//! the per-call `Arc<SdkCallCx>`. **The service derives `app_id` from
//! `cx.app_id` — `app_id` never appears in any function signature
//! script-side, preserving cross-app isolation.**
//!
//! Sync↔async bridge: Rhai is synchronous; the underlying service is
//! async. Closures wrap each call in `Handle::current().block_on(...)`
//! — safe because `LocalExecutorClient` runs the script under
//! `spawn_blocking`, so a runtime handle is reachable and blocking on
//! it doesn't park an async worker.
//!
//! Error convention (per `docs/sdk-shape.md`):
//! - throw on failure (Rhai runtime error string)
//! - `()` for absent values (`get` on a missing key)
//! - `bool` for predicates (`has`; also `delete` returns was-present)
use std::sync::Arc;
use picloud_shared::{KvError, KvService, SdkCallCx, Services};
use rhai::{Array, Dynamic, Engine as RhaiEngine, EvalAltResult, Map, Module};
use tokio::runtime::Handle as TokioHandle;
use super::bridge::{dynamic_to_json, json_to_dynamic};
/// Per-call handle captured by the Rhai SDK. Cheap to clone (two Arcs
/// plus an owned string).
#[derive(Clone)]
pub struct KvHandle {
collection: String,
service: Arc<dyn KvService>,
cx: Arc<SdkCallCx>,
}
pub(super) fn register(engine: &mut RhaiEngine, services: &Services, cx: Arc<SdkCallCx>) {
let kv_service = services.kv.clone();
// `kv::collection(name)` — handle constructor lives in the `kv`
// static module so the script-visible call is `kv::collection(...)`.
let mut module = Module::new();
{
let kv_service = kv_service.clone();
let cx = cx.clone();
module.set_native_fn(
"collection",
move |name: &str| -> Result<KvHandle, Box<EvalAltResult>> {
if name.is_empty() {
return Err("kv::collection name must not be empty".into());
}
Ok(KvHandle {
collection: name.to_string(),
service: kv_service.clone(),
cx: cx.clone(),
})
},
);
}
engine.register_static_module("kv", module.into());
// Methods on KvHandle — `register_fn` with `&mut KvHandle` first
// argument lets Rhai dispatch them as `handle.get(k)` /
// `handle.set(k, v)` / etc. through the dot-notation.
engine.register_type_with_name::<KvHandle>("KvHandle");
register_get(engine);
register_set(engine);
register_has(engine);
register_delete(engine);
register_list(engine);
}
fn register_get(engine: &mut RhaiEngine) {
engine.register_fn(
"get",
|handle: &mut KvHandle, key: &str| -> Result<Dynamic, Box<EvalAltResult>> {
let h = handle.clone();
block_on(async move { h.service.get(&h.cx, &h.collection, key).await })
.map(|opt| opt.map_or(Dynamic::UNIT, json_to_dynamic))
},
);
}
fn register_set(engine: &mut RhaiEngine) {
engine.register_fn(
"set",
|handle: &mut KvHandle, key: &str, value: Dynamic| -> Result<(), Box<EvalAltResult>> {
let h = handle.clone();
let json = dynamic_to_json(&value);
block_on(async move { h.service.set(&h.cx, &h.collection, key, json).await })
},
);
}
fn register_has(engine: &mut RhaiEngine) {
engine.register_fn(
"has",
|handle: &mut KvHandle, key: &str| -> Result<bool, Box<EvalAltResult>> {
let h = handle.clone();
block_on(async move { h.service.has(&h.cx, &h.collection, key).await })
},
);
}
fn register_delete(engine: &mut RhaiEngine) {
engine.register_fn(
"delete",
|handle: &mut KvHandle, key: &str| -> Result<bool, Box<EvalAltResult>> {
let h = handle.clone();
block_on(async move { h.service.delete(&h.cx, &h.collection, key).await })
},
);
}
fn register_list(engine: &mut RhaiEngine) {
// Zero-arg form — full page, no cursor.
engine.register_fn(
"list",
|handle: &mut KvHandle| -> Result<Map, Box<EvalAltResult>> { list_call(handle, None, 0) },
);
// One-arg form — cursor only.
engine.register_fn(
"list",
|handle: &mut KvHandle, cursor: &str| -> Result<Map, Box<EvalAltResult>> {
list_call(handle, Some(cursor.to_string()), 0)
},
);
// Two-arg form — cursor + limit.
engine.register_fn(
"list",
|handle: &mut KvHandle, cursor: &str, limit: i64| -> Result<Map, Box<EvalAltResult>> {
let limit = u32::try_from(limit.max(0)).unwrap_or(0);
list_call(handle, Some(cursor.to_string()), limit)
},
);
}
fn list_call(
handle: &KvHandle,
cursor: Option<String>,
limit: u32,
) -> Result<Map, Box<EvalAltResult>> {
let h = handle.clone();
let page = block_on(async move {
h.service
.list(&h.cx, &h.collection, cursor.as_deref(), limit)
.await
})?;
let mut m = Map::new();
let keys: Array = page.keys.into_iter().map(Dynamic::from).collect();
m.insert("keys".into(), keys.into());
m.insert(
"next_cursor".into(),
page.next_cursor.map_or(Dynamic::UNIT, Dynamic::from),
);
Ok(m)
}
/// Run an async future inside the synchronous Rhai context.
///
/// `LocalExecutorClient` wraps script execution in `spawn_blocking`, so
/// the current Tokio runtime is reachable via `Handle::current()`. We
/// block on it directly; we are NOT calling this from an async task,
/// so blocking is the correct primitive (`block_in_place` would also
/// work, but we're already on a blocking worker).
fn block_on<F, T>(fut: F) -> Result<T, Box<EvalAltResult>>
where
F: std::future::Future<Output = Result<T, KvError>> + Send,
T: Send,
{
let handle = TokioHandle::try_current().map_err(|e| -> Box<EvalAltResult> {
EvalAltResult::ErrorRuntime(
format!("kv: no tokio runtime available: {e}").into(),
rhai::Position::NONE,
)
.into()
})?;
handle.block_on(fut).map_err(|err| -> Box<EvalAltResult> {
EvalAltResult::ErrorRuntime(format!("kv: {err}").into(), rhai::Position::NONE).into()
})
}
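
The sync-to-async bridge described in the module docs can be illustrated without Tokio. The sketch below is not the bridge shipped in this commit (which uses `tokio::runtime::Handle::block_on`); it swaps in a minimal std-only executor so the idea is visible in isolation: a synchronous caller drives a future to completion, then maps the service error into a prefixed string, mirroring the "throw on failure" convention. `ThreadWaker`, `ServiceError`, and `bridge` are hypothetical names introduced for illustration.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal std-only `block_on`: poll the future, parking the current
/// thread between polls. An assumption-level stand-in for Tokio's
/// `Handle::block_on`, not a replacement for it.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Spurious wake-ups are fine: the loop simply polls again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical service error, standing in for `KvError`.
#[derive(Debug)]
pub struct ServiceError(pub String);

/// Run an async service call from synchronous code and convert the
/// error into a message prefixed with the module name.
pub fn bridge<T>(
    fut: impl Future<Output = Result<T, ServiceError>>,
) -> Result<T, String> {
    block_on(fut).map_err(|e| format!("kv: {}", e.0))
}
```

Calling `bridge(async { Ok::<i64, ServiceError>(7) })` yields `Ok(7)`, while a failing future surfaces as `Err("kv: ...")`. The real bridge relies on the script already running on a blocking worker, which is why blocking there does not stall an async worker thread.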


@@ -13,6 +13,8 @@
 pub mod bridge;
 pub mod cx;
+pub mod dead_letters;
+pub mod kv;
 pub mod stdlib;
 pub use bridge::{dynamic_to_json, json_to_dynamic};
@@ -27,14 +29,9 @@ use rhai::Engine as RhaiEngine;
 /// once per invocation, just after `build_engine` constructs the
 /// sandboxed Rhai engine and just before script compilation.
 ///
-/// v1.1.0 ships an intentionally empty body — the call site exists so
-/// future PRs (KV first) drop their registration logic here rather
-/// than reaching into `engine.rs::build_engine`. The signature is
-/// locked: subsequent PRs MUST keep the same parameter shape so that
-/// hosts don't have to re-thread the plumbing.
+/// v1.1.1 wires the first stateful service (KV). Subsequent PRs add a
+/// single `<service>::register(...)` line per service.
 pub fn register_all(engine: &mut RhaiEngine, services: &Services, cx: Arc<SdkCallCx>) {
-    // Intentionally inert in v1.1.0. The unused-suppression below is a
-    // load-bearing placeholder: future PRs replace this `let _` with
-    // real `register_kv(engine, services, cx.clone())` calls etc.
-    let _ = (engine, services, cx);
+    kv::register(engine, services, cx.clone());
+    dead_letters::register(engine, services, cx);
 }


@@ -1,7 +1,9 @@
 use std::collections::BTreeMap;
 use chrono::{DateTime, Utc};
-use picloud_shared::{AppId, ExecutionId, Principal, RequestId, ScriptId, ScriptSandbox};
+use picloud_shared::{
+    AppId, ExecutionId, Principal, RequestId, ScriptId, ScriptSandbox, TriggerEvent,
+};
 use serde::{Deserialize, Serialize};
 use thiserror::Error;
@@ -79,6 +81,20 @@ pub struct ExecRequest {
     /// `execution_id` for direct invocations; preserves the root
     /// across fan-out for audit log grouping.
     pub root_execution_id: ExecutionId,
+    /// `true` only when the dispatcher resolved this invocation
+    /// against a `dead_letter` trigger. The retry / dead-letter
+    /// machinery short-circuits when this is set so handler failures
+    /// cannot themselves be dead-lettered (design notes §4
+    /// recursion-stop rule).
+    #[serde(default)]
+    pub is_dead_letter_handler: bool,
+    /// The originating event for a triggered invocation. `None` for
+    /// direct ingress (sync HTTP, manual admin run). Flattened into
+    /// `ctx.event` by the executor's per-call ctx builder.
+    #[serde(default)]
+    pub event: Option<TriggerEvent>,
 }
 #[derive(Debug, Clone, Serialize, Deserialize)]


@@ -1,7 +1,9 @@
 use std::collections::BTreeMap;
 use picloud_executor_core::{Engine, ExecError, ExecRequest, InvocationType, Limits, LogLevel};
-use picloud_shared::{AppId, ExecutionId, RequestId, ScriptId, ScriptSandbox, Services};
+use picloud_shared::{
+    AppId, ExecutionId, KvEventOp, RequestId, ScriptId, ScriptSandbox, Services, TriggerEvent,
+};
 use serde_json::json;
 fn req(body: serde_json::Value) -> ExecRequest {
@@ -23,11 +25,13 @@ fn req(body: serde_json::Value) -> ExecRequest {
         principal: None,
         trigger_depth: 0,
         root_execution_id: execution_id,
+        is_dead_letter_handler: false,
+        event: None,
     }
 }
 fn engine() -> Engine {
-    Engine::new(Limits::default(), Services::new())
+    Engine::new(Limits::default(), Services::default())
 }
 #[test]
@@ -126,7 +130,7 @@ fn enforces_operation_budget() {
         max_operations: 1_000,
         ..Limits::default()
     };
-    let engine = Engine::new(limits, Services::new());
+    let engine = Engine::new(limits, Services::default());
     // 10_000 iterations vastly exceeds 1_000 ops.
     let src = r"let n = 0; for i in 0..10000 { n += 1; } n";
     let err = engine
@@ -235,3 +239,67 @@ fn body_passes_through_nested_json_round_trip() {
     let resp = engine().execute(src, req(body.clone())).unwrap();
     assert_eq!(resp.body, body);
 }
+#[test]
+fn ctx_event_absent_for_direct_invocations() {
+    // Scripts not fired through the triggers framework see no
+    // `ctx.event` key — they can use `"event" in ctx` to detect.
+    let src = r#"
+        if "event" in ctx { #{ statusCode: 500, body: "should be absent" } }
+        else { "absent" }
+    "#;
+    let resp = engine().execute(src, req(json!(null))).unwrap();
+    assert_eq!(resp.body, json!("absent"));
+}
+#[test]
+fn ctx_event_kv_shape_matches_design_notes() {
+    // Build an ExecRequest mimicking what the dispatcher hands a
+    // KV-triggered handler — `event = Some(TriggerEvent::Kv { … })`.
+    let mut r = req(json!(null));
+    r.event = Some(TriggerEvent::Kv {
+        op: KvEventOp::Insert,
+        collection: "widgets".into(),
+        key: "k1".into(),
+        value: Some(json!({ "n": 1 })),
+    });
+    let src = r"
+        #{
+            source: ctx.event.source,
+            op: ctx.event.op,
+            collection: ctx.event.kv.collection,
+            key: ctx.event.kv.key,
+            value: ctx.event.kv.value
+        }
+    ";
+    let resp = engine().execute(src, r).unwrap();
+    assert_eq!(
+        resp.body,
+        json!({
+            "source": "kv",
+            "op": "insert",
+            "collection": "widgets",
+            "key": "k1",
+            "value": { "n": 1 }
+        })
+    );
+}
+#[test]
+fn ctx_event_kv_delete_has_unit_value() {
+    let mut r = req(json!(null));
+    r.event = Some(TriggerEvent::Kv {
+        op: KvEventOp::Delete,
+        collection: "widgets".into(),
+        key: "k1".into(),
+        value: None,
+    });
+    let src = r"
+        #{
+            op: ctx.event.op,
+            value_is_unit: ctx.event.kv.value == ()
+        }
+    ";
+    let resp = engine().execute(src, r).unwrap();
+    assert_eq!(resp.body, json!({ "op": "delete", "value_is_unit": true }));
+}


@@ -31,7 +31,7 @@ use serde_json::{json, Value};
 // ----------------------------------------------------------------------------
 fn engine() -> Engine {
-    Engine::new(Limits::default(), Services::new())
+    Engine::new(Limits::default(), Services::default())
 }
 fn baseline_request() -> ExecRequest {
@@ -53,6 +53,8 @@ fn baseline_request() -> ExecRequest {
         principal: None,
         trigger_depth: 0,
         root_execution_id: execution_id,
+        is_dead_letter_handler: false,
+        event: None,
     }
 }


@@ -0,0 +1,260 @@
//! `kv::` SDK bridge integration tests — runs a real Rhai engine
//! against an in-memory `KvService` impl. Mirrors how
//! `orchestrator-core::LocalExecutorClient` invokes the engine: under
//! `tokio::task::spawn_blocking` so the bridge's `block_on` has a
//! reachable runtime.
use std::collections::{BTreeMap, HashMap};
use std::sync::Arc;
use async_trait::async_trait;
use picloud_executor_core::{Engine, ExecRequest, InvocationType, Limits};
use picloud_shared::{
AppId, ExecutionId, KvError, KvListPage, KvService, NoopDeadLetterService, NoopEventEmitter,
RequestId, ScriptId, ScriptSandbox, SdkCallCx, Services,
};
use serde_json::{json, Value};
use tokio::sync::Mutex;
#[derive(Default)]
struct InMemoryKv {
data: Mutex<HashMap<(AppId, String, String), Value>>,
}
#[async_trait]
impl KvService for InMemoryKv {
async fn get(
&self,
cx: &SdkCallCx,
collection: &str,
key: &str,
) -> Result<Option<Value>, KvError> {
Ok(self
.data
.lock()
.await
.get(&(cx.app_id, collection.to_string(), key.to_string()))
.cloned())
}
async fn set(
&self,
cx: &SdkCallCx,
collection: &str,
key: &str,
value: Value,
) -> Result<(), KvError> {
self.data
.lock()
.await
.insert((cx.app_id, collection.to_string(), key.to_string()), value);
Ok(())
}
async fn delete(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Result<bool, KvError> {
Ok(self
.data
.lock()
.await
.remove(&(cx.app_id, collection.to_string(), key.to_string()))
.is_some())
}
async fn has(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Result<bool, KvError> {
Ok(self.data.lock().await.contains_key(&(
cx.app_id,
collection.to_string(),
key.to_string(),
)))
}
async fn list(
&self,
cx: &SdkCallCx,
collection: &str,
cursor: Option<&str>,
limit: u32,
) -> Result<KvListPage, KvError> {
let data = self.data.lock().await;
let mut keys: Vec<String> = data
.iter()
.filter(|((a, c, _), _)| *a == cx.app_id && c == collection)
.map(|((_, _, k), _)| k.clone())
.filter(|k| cursor.is_none_or(|c| k.as_str() > c))
.collect();
keys.sort();
let take = if limit == 0 {
usize::MAX
} else {
limit as usize
};
let next_cursor = if keys.len() > take {
keys.truncate(take);
keys.last().cloned()
} else {
None
};
Ok(KvListPage { keys, next_cursor })
}
}
fn make_engine() -> Arc<Engine> {
let services = Services::new(
Arc::new(InMemoryKv::default()),
Arc::new(NoopDeadLetterService),
Arc::new(NoopEventEmitter),
);
Arc::new(Engine::new(Limits::default(), services))
}
fn baseline_request(app_id: AppId) -> ExecRequest {
let execution_id = ExecutionId::new();
ExecRequest {
execution_id,
request_id: RequestId::new(),
script_id: ScriptId::new(),
script_name: "kv-test".into(),
invocation_type: InvocationType::Http,
path: "/kv-test".into(),
headers: BTreeMap::new(),
body: Value::Null,
params: BTreeMap::new(),
query: BTreeMap::new(),
rest: String::new(),
sandbox_overrides: ScriptSandbox::default(),
app_id,
principal: None,
trigger_depth: 0,
root_execution_id: execution_id,
is_dead_letter_handler: false,
event: None,
}
}
async fn run_script(engine: Arc<Engine>, src: &str, req: ExecRequest) -> Value {
let src = src.to_string();
tokio::task::spawn_blocking(move || engine.execute(&src, req))
.await
.expect("spawn_blocking should not panic")
.expect("script execution should succeed")
.body
}
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn kv_set_then_get_round_trip() {
let engine = make_engine();
let app = AppId::new();
let src = r#"
let widgets = kv::collection("widgets");
widgets.set("k1", #{ n: 1 });
widgets.get("k1")
"#;
let body = run_script(engine, src, baseline_request(app)).await;
assert_eq!(body, json!({ "n": 1 }));
}
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn kv_get_missing_returns_unit() {
let engine = make_engine();
let app = AppId::new();
let src = r#"
let c = kv::collection("widgets");
let v = c.get("nope");
v == ()
"#;
let body = run_script(engine, src, baseline_request(app)).await;
assert_eq!(body, json!(true));
}
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn kv_has_returns_bool() {
let engine = make_engine();
let app = AppId::new();
let src = r#"
let c = kv::collection("widgets");
let before = c.has("k");
c.set("k", "v");
let after = c.has("k");
#{ before: before, after: after }
"#;
let body = run_script(engine, src, baseline_request(app)).await;
assert_eq!(body, json!({ "before": false, "after": true }));
}
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn kv_delete_returns_was_present() {
let engine = make_engine();
let app = AppId::new();
let src = r#"
let c = kv::collection("widgets");
let nope = c.delete("missing");
c.set("k", 1);
let yep = c.delete("k");
#{ nope: nope, yep: yep }
"#;
let body = run_script(engine, src, baseline_request(app)).await;
assert_eq!(body, json!({ "nope": false, "yep": true }));
}
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn kv_empty_collection_name_throws() {
let engine = make_engine();
let app = AppId::new();
let src = r#"kv::collection("")"#;
let req = baseline_request(app);
let err = tokio::task::spawn_blocking(move || engine.execute(src, req))
.await
.unwrap()
.expect_err("empty collection should throw");
assert!(format!("{err:?}").contains("kv::collection"));
}
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn kv_list_pages_via_cursor() {
let engine = make_engine();
let app = AppId::new();
let src = r#"
let c = kv::collection("widgets");
for i in 0..5 { c.set(`k${i}`, i); }
let p1 = c.list("", 2);
let p2 = c.list(p1.next_cursor, 2);
#{
p1_keys: p1.keys,
p1_cursor: p1.next_cursor,
p2_keys: p2.keys,
}
"#;
let body = run_script(engine, src, baseline_request(app)).await;
let obj = body.as_object().unwrap();
let p1_keys = obj["p1_keys"].as_array().unwrap();
let p2_keys = obj["p2_keys"].as_array().unwrap();
assert_eq!(p1_keys.len(), 2);
assert_eq!(p2_keys.len(), 2);
assert!(obj["p1_cursor"].is_string());
}
/// Cross-app isolation via `cx.app_id` — script with `app_id = A`
/// cannot see entries from `app_id = B`. The kv:: bridge never
/// surfaces `app_id` to the script, so this is enforced purely by the
/// service deriving it from the captured `Arc<SdkCallCx>`.
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn kv_bridge_preserves_cross_app_isolation() {
let engine = make_engine();
let app_a = AppId::new();
let app_b = AppId::new();
let writer = r#"
let c = kv::collection("shared");
c.set("k", "from-a");
"ok"
"#;
let _ = run_script(engine.clone(), writer, baseline_request(app_a)).await;
// App B sees nothing under the same collection/key.
let reader = r#"
let c = kv::collection("shared");
c.get("k")
"#;
let body = run_script(engine, reader, baseline_request(app_b)).await;
assert_eq!(body, Value::Null);
}
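
The cursor contract exercised by `kv_list_pages_via_cursor` can be sketched outside the engine. This is a minimal std-only illustration, assuming the same semantics as the `InMemoryKv` test double above: the cursor is exclusive (only keys strictly greater than it are returned), a `limit` of 0 means "no limit", and `next_cursor` is the last key of a truncated page. `Page`, `list_page`, and `collect_all` are hypothetical names, not part of the SDK.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for `KvListPage`: a page of keys plus an
/// optional cursor pointing at the last key returned.
#[derive(Debug, PartialEq)]
pub struct Page {
    pub keys: Vec<String>,
    pub next_cursor: Option<String>,
}

/// Keyset pagination over a sorted key set: return keys strictly
/// greater than `cursor`, at most `limit` of them (0 = no limit).
pub fn list_page(all: &BTreeSet<String>, cursor: Option<&str>, limit: u32) -> Page {
    let take = if limit == 0 { usize::MAX } else { limit as usize };
    // `BTreeSet` iterates in ascending order, so no explicit sort is needed.
    let mut keys: Vec<String> = all
        .iter()
        .filter(|k| cursor.map_or(true, |c| k.as_str() > c))
        .cloned()
        .collect();
    let next_cursor = if keys.len() > take {
        keys.truncate(take);
        keys.last().cloned()
    } else {
        None
    };
    Page { keys, next_cursor }
}

/// Drain every page by feeding each `next_cursor` back into the next call.
pub fn collect_all(all: &BTreeSet<String>, limit: u32) -> Vec<String> {
    let mut out = Vec::new();
    let mut cursor: Option<String> = None;
    loop {
        let page = list_page(all, cursor.as_deref(), limit);
        out.extend(page.keys);
        match page.next_cursor {
            Some(c) => cursor = Some(c),
            None => return out,
        }
    }
}
```

With keys `k0` through `k4` and a limit of 2, the pages come back as `[k0, k1]`, `[k2, k3]`, `[k4]`, and the final page carries no cursor. A script-side loop would follow the same pattern, stopping when `next_cursor` is `()`.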


@@ -17,7 +17,7 @@ use serde_json::{json, Value};
 // ----------------------------------------------------------------------------
 fn engine() -> Engine {
-    Engine::new(Limits::default(), Services::new())
+    Engine::new(Limits::default(), Services::default())
 }
 fn baseline_request() -> ExecRequest {
@@ -39,6 +39,8 @@ fn baseline_request() -> ExecRequest {
         principal: None,
         trigger_depth: 0,
         root_execution_id: execution_id,
+        is_dead_letter_handler: false,
+        event: None,
     }
 }


@@ -10,13 +10,16 @@ workspace = true
 [dependencies]
 picloud-shared.workspace = true
+picloud-executor-core.workspace = true
 picloud-orchestrator-core.workspace = true
 async-trait.workspace = true
 axum.workspace = true
+rand.workspace = true
 serde.workspace = true
 serde_json.workspace = true
 thiserror.workspace = true
+tokio.workspace = true
 tracing.workspace = true
 uuid.workspace = true
 chrono.workspace = true
@@ -24,7 +27,6 @@ sqlx.workspace = true
 url.workspace = true
 argon2.workspace = true
-rand.workspace = true
 sha2.workspace = true
 base64.workspace = true
 data-encoding.workspace = true


@@ -0,0 +1,28 @@
-- v1.1.1: Key-value store — see blueprint §8.1 + docs/sdk-shape.md.
--
-- Identity tuple `(app_id, collection, key)`. `app_id` is first in the
-- primary key so the implicit index is always per-app; cross-app reads
-- cannot happen even with a buggy query. Collections are a required
-- namespace inside an app — the same key can live in different
-- collections without collision.
--
-- `value` is JSONB so scripts can store nested structures without
-- a separate serialization step. No TTL column in v1.1.1; deferred
-- until a concrete need surfaces (the blueprint reserved one but the
-- v1.1.1 SDK surface — get/set/has/delete/list — doesn't expose TTL).
CREATE TABLE kv_entries (
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
collection TEXT NOT NULL,
key TEXT NOT NULL,
value JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (app_id, collection, key)
);
-- Supports list-by-collection (keyset pagination) and per-collection
-- triggers' fan-out scans. The PK already covers (app_id, collection)
-- as a prefix but spelling out the explicit index makes intent clear
-- for the planner.
CREATE INDEX idx_kv_entries_app_collection ON kv_entries (app_id, collection);


@@ -0,0 +1,72 @@
-- v1.1.1: Trigger framework — Layout E (design notes §2 + §7).
--
-- A parent `triggers` table holds the common columns (script_id, retry
-- config, dispatch_mode, registered-by principal); per-kind detail
-- tables hold the kind-specific filter columns. v1.1.1 ships two
-- kinds: KV (collection_glob + ops) and dead_letter (source / trigger
-- / script filters). Future kinds (cron, pubsub, queue, email) extend
-- the parent and add their own detail table.
--
-- `registered_by_principal` captures the admin user that registered
-- the trigger. The dispatcher resolves this back to a `Principal` at
-- execution time so the trigger runs as the user that set it up
-- (design notes §4: "a trigger execution runs as the principal that
-- registered the trigger").
--
-- HTTP routes stay in their own `routes` table for now (Phase 3
-- production schema with its own trie-index columns); the dispatcher
-- discriminates HTTP outbox rows by `source_kind = 'http'` and
-- `trigger_id` referencing `routes.id`. Folding routes into triggers
-- is a v1.2 cleanup, not a v1.1.1 requirement.
CREATE TABLE triggers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
script_id UUID NOT NULL REFERENCES scripts(id) ON DELETE CASCADE,
kind TEXT NOT NULL CHECK (kind IN ('kv', 'dead_letter')),
enabled BOOLEAN NOT NULL DEFAULT TRUE,
-- Async by default — sync would mean the trigger fires inline with
-- the originating mutation, which v1.1.1 doesn't support.
dispatch_mode TEXT NOT NULL DEFAULT 'async'
CHECK (dispatch_mode IN ('sync', 'async')),
-- Defaults applied at write time so the row is auditable on its
-- own. Per-trigger overrides set on create; the env-defined
-- defaults provide the fallback values.
retry_max_attempts INT NOT NULL,
retry_backoff TEXT NOT NULL
CHECK (retry_backoff IN ('exponential', 'linear', 'constant')),
retry_base_ms INT NOT NULL,
registered_by_principal UUID NOT NULL REFERENCES admin_users(id) ON DELETE CASCADE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
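
To make the three `retry_backoff` values concrete, here is a hedged sketch of how a delay might be derived from `retry_backoff`, `retry_base_ms`, and the attempt number. The migration only names the strategies; the exact formulas (doubling for exponential, multiples of the base for linear, no jitter, no cap) are assumptions, as are the names `Backoff` and `retry_delay`.

```rust
use std::time::Duration;

/// The three strategies allowed by the `retry_backoff` CHECK constraint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backoff {
    Exponential,
    Linear,
    Constant,
}

impl Backoff {
    /// Parse the text stored in `triggers.retry_backoff`.
    pub fn parse(s: &str) -> Option<Self> {
        match s {
            "exponential" => Some(Self::Exponential),
            "linear" => Some(Self::Linear),
            "constant" => Some(Self::Constant),
            _ => None,
        }
    }
}

/// Delay before retry number `attempt` (1-based: 1 = first retry).
/// Assumed formulas: constant = base, linear = base * n,
/// exponential = base * 2^(n-1). All arithmetic saturates.
pub fn retry_delay(backoff: Backoff, base_ms: u32, attempt: u32) -> Duration {
    let base = u64::from(base_ms);
    let n = u64::from(attempt.max(1));
    let ms = match backoff {
        Backoff::Constant => base,
        Backoff::Linear => base.saturating_mul(n),
        Backoff::Exponential => {
            // Clamp the shift so `1 << shift` cannot overflow a u64.
            let shift = (n - 1).min(63) as u32;
            base.saturating_mul(1u64 << shift)
        }
    };
    Duration::from_millis(ms)
}
```

Under these assumptions a trigger with `retry_base_ms = 100` and exponential backoff would wait 100 ms, 200 ms, 400 ms, and 800 ms before its first four retries. A production dispatcher would likely add jitter and an upper bound; neither is shown here.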
-- The dispatcher's hot lookup: "all enabled triggers for app X of
-- kind Y". Indexed only when enabled = TRUE so disabled rows don't
-- pollute the index.
CREATE INDEX idx_triggers_app_kind_enabled
ON triggers (app_id, kind)
WHERE enabled = TRUE;
-- One row per KV trigger. `collection_glob` accepts:
-- "*" — any collection in the app
-- "widgets" — exact match
-- "users:*" — prefix wildcard (matched in Rust, not SQL)
-- `ops` is the subset of {insert, update, delete} this trigger
-- subscribes to. Empty array means "any op" (the trigger fires on
-- every mutation; admin endpoint validates this).
CREATE TABLE kv_trigger_details (
trigger_id UUID PRIMARY KEY REFERENCES triggers(id) ON DELETE CASCADE,
collection_glob TEXT NOT NULL,
ops TEXT[] NOT NULL
);
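
The comment above says `collection_glob` is matched in Rust rather than SQL. A minimal sketch of that matching, assuming the wildcard is only meaningful as a trailing `*` (the three documented forms) and that an empty `ops` array means "any op", might look like the following. `collection_glob_matches` and `op_matches` are hypothetical names, not the dispatcher's actual functions.

```rust
/// Returns true when `collection` matches `glob` under the three
/// documented forms: "*" (any), "name" (exact), "prefix*" (prefix).
pub fn collection_glob_matches(glob: &str, collection: &str) -> bool {
    if glob == "*" {
        return true;
    }
    match glob.strip_suffix('*') {
        // "users:*" matches anything starting with "users:".
        Some(prefix) => collection.starts_with(prefix),
        // No trailing wildcard: exact match only.
        None => glob == collection,
    }
}

/// Returns true when the trigger subscribes to `op`.
/// An empty `ops` list is treated as "any op".
pub fn op_matches(ops: &[&str], op: &str) -> bool {
    ops.is_empty() || ops.iter().any(|o| *o == op)
}
```

For example, `collection_glob_matches("users:*", "users:42")` is true while `collection_glob_matches("widgets", "widgets2")` is false. Wildcards in the middle of a pattern are not handled by this sketch.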
-- One row per dead-letter trigger. All three filter columns are
-- nullable: NULL means "no filter on this dimension". A trigger
-- with all three filters NULL fires on every dead-letter row.
CREATE TABLE dead_letter_trigger_details (
trigger_id UUID PRIMARY KEY REFERENCES triggers(id) ON DELETE CASCADE,
source_filter TEXT,
trigger_id_filter UUID,
script_id_filter UUID
);
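
The "NULL means no filter" rule for `dead_letter_trigger_details` maps naturally onto `Option` in Rust. The sketch below is an illustration under assumptions: UUIDs are represented as `u128` to stay std-only, and `DeadLetterFilter`, `DeadLetterRow`, and `matches` are hypothetical names rather than the dispatcher's real types.

```rust
/// Mirror of the three nullable filter columns. `None` = no filter.
#[derive(Debug, Default)]
pub struct DeadLetterFilter {
    pub source_filter: Option<String>,
    pub trigger_id_filter: Option<u128>,
    pub script_id_filter: Option<u128>,
}

/// The fields of a dead-letter row that the filters are compared against.
pub struct DeadLetterRow<'a> {
    pub source: &'a str,
    pub trigger_id: Option<u128>,
    pub script_id: Option<u128>,
}

impl DeadLetterFilter {
    /// A row matches when every filter that is set agrees with the row.
    /// With all three filters unset, every row matches.
    pub fn matches(&self, row: &DeadLetterRow<'_>) -> bool {
        let source_ok = self
            .source_filter
            .as_deref()
            .map_or(true, |s| s == row.source);
        let trigger_ok = self
            .trigger_id_filter
            .map_or(true, |id| row.trigger_id == Some(id));
        let script_ok = self
            .script_id_filter
            .map_or(true, |id| row.script_id == Some(id));
        source_ok && trigger_ok && script_ok
    }
}
```

A default-constructed filter therefore acts as a catch-all handler, while setting `source_filter` to `"kv"` restricts it to dead letters that originated from KV events.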


@@ -0,0 +1,64 @@
-- v1.1.1: Universal trigger outbox — design notes §2.
--
-- One table for every async dispatch in the system. KV/cron/pubsub/
-- queue/email/dead-letter all write rows in this shape; the dispatcher
-- claims due rows with `FOR UPDATE SKIP LOCKED` and routes them to
-- the executor.
--
-- Sync HTTP also writes here (NATS-style inbox, design notes §3) —
-- `reply_to` carries an `inbox_id` that the orchestrator awaits on a
-- oneshot channel. `reply_to.is_some()` is the "don't retry" signal:
-- one attempt, surface the result via the inbox.
--
-- `trigger_id` is a polymorphic reference discriminated by
-- `source_kind`: for `source_kind='http'` it references `routes.id`;
-- otherwise it references `triggers.id`. Polymorphism handled in
-- Rust (the dispatcher); no DB-level FK because Postgres doesn't
-- support polymorphic FKs cleanly. NULL is allowed because direct
-- admin-replay paths may not have a triggering row at all.
--
-- `script_id` denormalized so the dispatcher resolves the target
-- script without an extra round-trip per row.
CREATE TABLE outbox (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
source_kind TEXT NOT NULL
CHECK (source_kind IN ('http', 'kv', 'dead_letter')),
-- Polymorphic — see comment above. No FK constraint.
trigger_id UUID,
-- Pre-resolved at write time so the dispatcher doesn't re-look it up.
script_id UUID,
-- NULL = async (retry per policy). Non-NULL (an inbox_id) = sync
-- HTTP (never retry; resolve the inbox with the result).
reply_to UUID,
-- ServiceEvent + ExecRequest scaffold serialized as JSONB.
payload JSONB NOT NULL,
-- Forensic field — the principal that triggered the originating
-- event. NOT the execution principal for trigger fan-out (that
-- comes from `triggers.registered_by_principal`).
origin_principal UUID,
-- Trigger-depth as the dispatcher will hand it to the executor.
-- Read out into ExecRequest.trigger_depth at dispatch time.
trigger_depth INT NOT NULL DEFAULT 0,
-- Originating execution id (for audit log grouping). Equals the
-- root for direct invocations; preserved across fan-out chains.
root_execution_id UUID,
attempt_count INT NOT NULL DEFAULT 0,
next_attempt_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- Set inside the SELECT FOR UPDATE SKIP LOCKED transaction so
-- the dispatcher can't double-pick a row across concurrent loop
-- iterations.
claimed_at TIMESTAMPTZ,
claimed_by TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Hot index: the dispatcher's `WHERE next_attempt_at <= NOW() AND
-- claimed_at IS NULL` claim query. Partial index keeps the hot set
-- small even if the table grows large.
CREATE INDEX idx_outbox_due
ON outbox (next_attempt_at)
WHERE claimed_at IS NULL;
CREATE INDEX idx_outbox_app ON outbox (app_id);
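The `attempt_count` and `next_attempt_at` columns only make sense together with the trigger's `retry_backoff` and `retry_base_ms` settings. The sketch below shows one plausible way to turn those into a delay. The formulas (constant = base, linear = base × n, exponential = base × 2^(n−1)), the one-hour cap `MAX_DELAY_MS`, and the 25% jitter ceiling are all assumptions for illustration; the source only names the three backoff kinds and says the dispatcher adds jitter. The jitter fraction is passed in by the caller so the function stays deterministic.

```rust
use std::time::Duration;

/// Hypothetical cap on a single delay (assumption: one hour).
pub const MAX_DELAY_MS: u64 = 3_600_000;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backoff {
    Exponential,
    Linear,
    Constant,
}

impl Backoff {
    /// Parse the value stored in `triggers.retry_backoff`.
    pub fn from_wire(s: &str) -> Option<Self> {
        match s {
            "exponential" => Some(Self::Exponential),
            "linear" => Some(Self::Linear),
            "constant" => Some(Self::Constant),
            _ => None,
        }
    }
}

/// Delay before the next attempt. `attempt_count` is the number of
/// attempts already made; `jitter` is a fraction in [0.0, 1.0].
pub fn next_delay(kind: Backoff, base_ms: u64, attempt_count: u32, jitter: f64) -> Duration {
    let n = u64::from(attempt_count.max(1));
    let raw_ms = match kind {
        Backoff::Constant => base_ms,
        Backoff::Linear => base_ms.saturating_mul(n),
        Backoff::Exponential => {
            // base * 2^(n-1); the shift is bounded so it cannot overflow.
            let shift = (n - 1).min(32) as u32;
            base_ms.saturating_mul(1u64 << shift)
        }
    };
    let capped = raw_ms.min(MAX_DELAY_MS);
    // Add up to 25% on top of the capped delay (assumed jitter policy).
    let extra = (capped as f64 * 0.25 * jitter.clamp(0.0, 1.0)) as u64;
    Duration::from_millis(capped + extra)
}
```

A caller would add the returned duration to the current time and store the result in `next_attempt_at`, leaving `claimed_at` NULL so the partial index picks the row up again once it is due.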

@@ -0,0 +1,50 @@
-- v1.1.1: dead_letters — design notes §4.
--
-- Async invocations that exhaust their retry policy land here. Each
-- row carries the original event payload verbatim plus the attempt
-- history so handlers (registered via `dead_letter` triggers) and the
-- dashboard can decide what to do.
--
-- Schema mirrors design notes §4. The CHECK constraint on
-- `resolution` enforces the closed vocabulary used by both the SDK
-- (`dead_letters::resolve(id, reason)`) and the recursion-stop rule
-- (`handler_failed`). Sync HTTP failures (`reply_to.is_some()`) never
-- land here — they're served via the inbox channel.
--
-- Indexes:
-- - partial index on unresolved rows: the dashboard's
-- unresolved-count badge query (`COUNT(*) WHERE app_id = $1 AND
-- resolved_at IS NULL`).
-- - GC index on `created_at`: the weekly retention sweep.
CREATE TABLE dead_letters (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
-- The outbox.id row that exhausted retries. The outbox row itself
-- has been deleted at this point.
original_event_id UUID NOT NULL,
source TEXT NOT NULL,
op TEXT NOT NULL,
-- Nullable because direct admin replays may have no trigger row.
trigger_id UUID,
script_id UUID,
payload JSONB NOT NULL,
attempt_count INT NOT NULL,
first_attempt_at TIMESTAMPTZ NOT NULL,
last_attempt_at TIMESTAMPTZ NOT NULL,
last_error TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
resolved_at TIMESTAMPTZ,
resolution TEXT
CHECK (resolution IN
('replayed', 'ignored', 'handled_by_script', 'handler_failed'))
);
-- Dashboard unresolved-count badge — partial index on the predicate
-- the query uses.
CREATE INDEX idx_dead_letters_app_unresolved
ON dead_letters (app_id)
WHERE resolved_at IS NULL;
-- GC sweep scans by creation time.
CREATE INDEX idx_dead_letters_gc ON dead_letters (created_at);
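The rules for when a failed invocation is retried, dead-lettered, or neither are spread across the migration comments and the dispatcher's module docs. A compact way to read them is as a single decision function. The sketch below is an illustration only: `FailedRow`, `FailureAction`, and `on_failure` are hypothetical names, and the order of the first two checks is an assumption. The `attempt_count + 1 < max_attempts` comparison is taken from the dispatcher's documentation.

```rust
/// What to do with an outbox row whose execution just failed.
#[derive(Debug, PartialEq, Eq)]
pub enum FailureAction {
    /// Sync HTTP (`reply_to` set): one attempt only, surface the
    /// failure through the inbox. Never lands in `dead_letters`.
    DeliverToInbox,
    /// A dead-letter handler failed: mark the original row
    /// `handler_failed`; no retry and no new dead letter.
    MarkHandlerFailed,
    /// Attempts remain: bump `attempt_count`, push `next_attempt_at` out.
    Reschedule,
    /// Retry policy exhausted: write a `dead_letters` row and delete
    /// the outbox row.
    DeadLetter,
}

/// Hypothetical view of the fields the decision depends on.
pub struct FailedRow {
    pub has_reply_to: bool,
    pub is_dead_letter_handler: bool,
    pub attempt_count: u32,
    pub max_attempts: u32,
}

pub fn on_failure(row: &FailedRow) -> FailureAction {
    if row.has_reply_to {
        return FailureAction::DeliverToInbox;
    }
    if row.is_dead_letter_handler {
        return FailureAction::MarkHandlerFailed;
    }
    if row.attempt_count.saturating_add(1) < row.max_attempts {
        FailureAction::Reschedule
    } else {
        FailureAction::DeadLetter
    }
}
```

With `max_attempts = 3`, a row failing at `attempt_count = 0` or `1` is rescheduled, and the failure at `attempt_count = 2` produces the dead-letter row.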

@@ -0,0 +1,31 @@
-- v1.1.1: abandoned_executions — design notes §3 #9.
--
-- Forensic table for the "dispatcher tried to resolve a oneshot inbox
-- but the receiver was already dropped" edge case. The orchestrator
-- timed out (returned 504 to the caller) and gave up on the channel,
-- but then the dispatcher's execution succeeded later. The caller
-- never sees the result; the row exists so the operator can
-- correlate when the abandoned-counter metric spikes.
--
-- Only the dispatcher-after-orchestrator-timeout edge case writes
-- here; ordinary "script timed out, caller got 504" stays uneventful.
--
-- 7-day retention, GC by `created_at`, sweep alongside dead_letters.
CREATE TABLE abandoned_executions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
-- Original outbox row id (the row itself has been deleted).
outbox_id UUID NOT NULL,
script_id UUID,
-- The inbox channel id the dispatcher tried to resolve.
inbox_id UUID NOT NULL,
-- The HTTP status code the dispatcher attempted to send back.
status_code INT NOT NULL,
-- Truncated body / error description (capped at write time —
-- the dispatcher doesn't need to ship megabytes here).
result_summary TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_abandoned_executions_gc ON abandoned_executions (created_at);
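The edge case this table records (the receiver side of the inbox channel is gone by the time the dispatcher has a result) can be reproduced with any channel whose `send` fails once the receiver is dropped. The sketch below uses `std::sync::mpsc` purely as a stand-in for the oneshot channel mentioned in the design notes; `Delivery` and `deliver` are hypothetical names, and the payload is reduced to a bare status code.

```rust
use std::sync::mpsc;

/// Outcome of trying to hand a result back to the waiting caller.
#[derive(Debug, PartialEq, Eq)]
pub enum Delivery {
    /// The orchestrator was still waiting and received the result.
    Delivered,
    /// The receiver was already dropped (orchestrator timed out).
    /// The caller of `deliver` would write an `abandoned_executions`
    /// row carrying this status code.
    Abandoned { status_code: u16 },
}

/// Try to resolve the inbox. `send` returns the value back inside the
/// error when the receiving half no longer exists.
pub fn deliver(tx: mpsc::Sender<u16>, status_code: u16) -> Delivery {
    match tx.send(status_code) {
        Ok(()) => Delivery::Delivered,
        Err(mpsc::SendError(code)) => Delivery::Abandoned { status_code: code },
    }
}
```

Because the failed `send` hands the value back, nothing is lost: the dispatcher still holds everything it needs to fill `status_code` and `result_summary` in the forensic row.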

@@ -0,0 +1,16 @@
-- v1.1.1: per-route dispatch mode (design notes §2 + §3).
--
-- `sync` (default): orchestrator awaits the executor inline and
-- returns the response in the same HTTP request — current MVP
-- behaviour.
-- `async`: orchestrator writes the request to the trigger outbox,
-- returns `202 Accepted` immediately. The dispatcher runs the
-- script in the background and surfaces failures via the
-- retry / dead-letter machinery — same shape as any other async
-- event.
--
-- Existing routes default to `sync` so the migration is non-breaking.
ALTER TABLE routes
ADD COLUMN dispatch_mode TEXT NOT NULL DEFAULT 'sync'
CHECK (dispatch_mode IN ('sync', 'async'));

@@ -0,0 +1,128 @@
//! `AbandonedExecutionsRepo` — forensic table written by the
//! dispatcher when it tries to resolve a sync-HTTP inbox channel
//! that's already been dropped (orchestrator timed out and gave up).
//!
//! Schema: see `migrations/0011_abandoned_executions.sql`.
//!
//! Tiny surface: insert + GC. Reading happens via direct SQL when
//! correlating the metric counter spike.
use async_trait::async_trait;
use chrono::{DateTime, Utc};
use picloud_shared::{AppId, ScriptId};
use sqlx::PgPool;
use uuid::Uuid;
#[derive(Debug, thiserror::Error)]
pub enum AbandonedRepoError {
#[error("database error: {0}")]
Db(#[from] sqlx::Error),
}
#[derive(Debug, Clone)]
pub struct NewAbandonedExecution {
pub app_id: AppId,
pub outbox_id: Uuid,
pub script_id: Option<ScriptId>,
pub inbox_id: Uuid,
pub status_code: u16,
pub result_summary: Option<String>,
}
#[async_trait]
pub trait AbandonedRepo: Send + Sync {
async fn insert(&self, row: NewAbandonedExecution) -> Result<Uuid, AbandonedRepoError>;
/// Retention sweep — deletes rows older than `older_than` up to
/// `limit` at a time.
async fn gc(&self, older_than: DateTime<Utc>, limit: i64) -> Result<u64, AbandonedRepoError>;
}
pub struct PostgresAbandonedRepo {
pool: PgPool,
}
impl PostgresAbandonedRepo {
#[must_use]
pub fn new(pool: PgPool) -> Self {
Self { pool }
}
}
const SUMMARY_CAP_BYTES: usize = 4096;
#[async_trait]
impl AbandonedRepo for PostgresAbandonedRepo {
async fn insert(&self, row: NewAbandonedExecution) -> Result<Uuid, AbandonedRepoError> {
// Truncate the summary at write-time. The forensic table
// doesn't need megabytes; the original outbox row may have
// been arbitrary size but we lose nothing useful by clipping.
let summary = row.result_summary.map(|s| truncate(s, SUMMARY_CAP_BYTES));
let (id,): (Uuid,) = sqlx::query_as(
"INSERT INTO abandoned_executions ( \
app_id, outbox_id, script_id, inbox_id, status_code, result_summary \
) VALUES ($1, $2, $3, $4, $5, $6) \
RETURNING id",
)
.bind(row.app_id.into_inner())
.bind(row.outbox_id)
.bind(row.script_id.map(ScriptId::into_inner))
.bind(row.inbox_id)
.bind(i32::from(row.status_code))
.bind(summary)
.fetch_one(&self.pool)
.await?;
Ok(id)
}
async fn gc(&self, older_than: DateTime<Utc>, limit: i64) -> Result<u64, AbandonedRepoError> {
let res = sqlx::query(
"DELETE FROM abandoned_executions \
WHERE id IN ( \
SELECT id FROM abandoned_executions \
WHERE created_at < $1 \
FOR UPDATE SKIP LOCKED \
LIMIT $2 \
)",
)
.bind(older_than)
.bind(limit)
.execute(&self.pool)
.await?;
Ok(res.rows_affected())
}
}
fn truncate(mut s: String, max_bytes: usize) -> String {
if s.len() <= max_bytes {
return s;
}
// Walk back from `max_bytes` to a UTF-8 char boundary so we never
// panic on `truncate` mid-codepoint.
let mut cut = max_bytes;
while cut > 0 && !s.is_char_boundary(cut) {
cut -= 1;
}
s.truncate(cut);
s
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn truncate_respects_char_boundaries() {
// `é` is a 2-byte UTF-8 char occupying bytes 1..3; a cap of 2
// lands inside it, so `truncate` should walk back to byte 1.
let s = "héllo".to_string();
let t = truncate(s, 2);
assert!(t.is_char_boundary(t.len()));
assert_eq!(t, "h");
}
#[test]
fn truncate_passthrough_for_short_strings() {
assert_eq!(truncate("ok".into(), 100), "ok");
}
}
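Since `gc` deletes at most `limit` rows per call, a retention sweeper needs some loop around it. One common shape is to keep calling until a batch comes back short. The sketch below assumes that shape; it is not taken from the sweeper's source. `sweep_until_drained` is a hypothetical helper, the closure stands in for the async `gc` call so the example stays synchronous, and `max_batches` is an added safety bound.

```rust
/// Call `gc_batch(limit)` repeatedly until it deletes fewer rows than
/// `limit` (the table is drained) or `max_batches` is reached.
/// Returns the total number of rows deleted.
pub fn sweep_until_drained<F>(limit: i64, max_batches: usize, mut gc_batch: F) -> u64
where
    F: FnMut(i64) -> u64,
{
    // A non-positive limit can never make progress; bail out early.
    let batch = u64::try_from(limit).unwrap_or(0);
    if batch == 0 {
        return 0;
    }
    let mut total = 0u64;
    for _ in 0..max_batches {
        let deleted = gc_batch(limit);
        total += deleted;
        if deleted < batch {
            // Short batch: nothing older than the cutoff is left.
            break;
        }
    }
    total
}
```

Bounding each `DELETE` this way keeps individual transactions short, which matters when the sweep runs alongside the dispatcher on the same pool.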

@@ -82,6 +82,7 @@ async fn seed_into(
 // Accept any method so both `curl /hello` and
 // `curl -d '{"name":"X"}' /hello` work out of the box.
 method: None,
+dispatch_mode: picloud_shared::DispatchMode::Sync,
 })
 .await?;

@@ -57,6 +57,21 @@ pub enum Capability {
 AppAdmin(AppId),
 /// Read execution logs for scripts in this app.
 AppLogRead(AppId),
+/// Read entries from this app's KV store (v1.1.1). Granted to
+/// `viewer`+ in the per-app role table. Maps to `script:read` on
+/// API keys — the seven-scope vocabulary stays locked.
+AppKvRead(AppId),
+/// Write entries to this app's KV store (v1.1.1). Granted to
+/// `editor`+. Maps to `script:write` on API keys.
+AppKvWrite(AppId),
+/// Create / list / delete triggers for this app (v1.1.1). Maps to
+/// `app:admin` on API keys — triggers are app-configuration acts
+/// rather than data-plane access. Granted to `app_admin`+.
+AppManageTriggers(AppId),
+/// Replay / resolve dead-letter rows for this app (v1.1.1). Maps
+/// to `app:admin` on API keys. Public-HTTP scripts (principal None)
+/// fail this check — managing dead letters is an admin act.
+AppDeadLetterManage(AppId),
 }
 impl Capability {
@@ -73,7 +88,11 @@ impl Capability {
 | Self::AppWriteRoute(id)
 | Self::AppManageDomains(id)
 | Self::AppAdmin(id)
-| Self::AppLogRead(id) => Some(id),
+| Self::AppLogRead(id)
+| Self::AppKvRead(id)
+| Self::AppKvWrite(id)
+| Self::AppManageTriggers(id)
+| Self::AppDeadLetterManage(id) => Some(id),
 }
 }
@@ -88,11 +107,13 @@ impl Capability {
 Self::InstanceCreateApp | Self::InstanceManageUsers | Self::InstanceManageSettings => {
 Scope::InstanceAdmin
 }
-Self::AppRead(_) => Scope::ScriptRead,
-Self::AppWriteScript(_) => Scope::ScriptWrite,
+Self::AppRead(_) | Self::AppKvRead(_) => Scope::ScriptRead,
+Self::AppWriteScript(_) | Self::AppKvWrite(_) => Scope::ScriptWrite,
 Self::AppWriteRoute(_) => Scope::RouteWrite,
 Self::AppManageDomains(_) => Scope::DomainManage,
-Self::AppAdmin(_) => Scope::AppAdmin,
+Self::AppAdmin(_) | Self::AppManageTriggers(_) | Self::AppDeadLetterManage(_) => {
+Scope::AppAdmin
+}
 Self::AppLogRead(_) => Scope::LogRead,
 }
 }
@@ -230,16 +251,24 @@ async fn member_grants(
 /// domain claims, and delete. Roles form a strict subset chain, so
 /// the check is "is this capability in the role's set?".
 const fn role_satisfies(role: AppRole, cap: Capability) -> bool {
-let in_viewer = matches!(cap, Capability::AppRead(_) | Capability::AppLogRead(_));
+let in_viewer = matches!(
+cap,
+Capability::AppRead(_) | Capability::AppLogRead(_) | Capability::AppKvRead(_)
+);
 let in_editor = in_viewer
 || matches!(
 cap,
-Capability::AppWriteScript(_) | Capability::AppWriteRoute(_)
+Capability::AppWriteScript(_)
+| Capability::AppWriteRoute(_)
+| Capability::AppKvWrite(_)
 );
 let in_app_admin = in_editor
 || matches!(
 cap,
-Capability::AppManageDomains(_) | Capability::AppAdmin(_)
+Capability::AppManageDomains(_)
+| Capability::AppAdmin(_)
+| Capability::AppManageTriggers(_)
+| Capability::AppDeadLetterManage(_)
 );
 match role {
 AppRole::Viewer => in_viewer,

@@ -0,0 +1,261 @@
//! `DeadLetterRepo` — CRUD over the `dead_letters` table.
//!
//! The dispatcher writes new rows when an async trigger exhausts its
//! retry policy. Admin endpoints (commit 8) read for the dashboard
//! list view and write to mark rows resolved or replay them. The GC
//! sweeper (commit 10) deletes expired rows by `created_at`.
use async_trait::async_trait;
use chrono::{DateTime, Utc};
use picloud_shared::{AppId, DeadLetterId, ScriptId, TriggerId};
use sqlx::PgPool;
use uuid::Uuid;
#[derive(Debug, thiserror::Error)]
pub enum DeadLetterRepoError {
#[error("database error: {0}")]
Db(#[from] sqlx::Error),
#[error("dead-letter row not found: {0}")]
NotFound(DeadLetterId),
#[error("invalid resolution {0:?}")]
InvalidResolution(String),
}
#[derive(Debug, Clone)]
pub struct NewDeadLetter {
pub app_id: AppId,
/// `outbox.id` that exhausted retries. Outbox row deleted at the
/// same time.
pub original_event_id: Uuid,
pub source: String,
pub op: String,
pub trigger_id: Option<TriggerId>,
pub script_id: Option<ScriptId>,
pub payload: serde_json::Value,
pub attempt_count: u32,
pub first_attempt_at: DateTime<Utc>,
pub last_attempt_at: DateTime<Utc>,
pub last_error: String,
}
#[derive(Debug, Clone)]
pub struct DeadLetterRow {
pub id: DeadLetterId,
pub app_id: AppId,
pub original_event_id: Uuid,
pub source: String,
pub op: String,
pub trigger_id: Option<TriggerId>,
pub script_id: Option<ScriptId>,
pub payload: serde_json::Value,
pub attempt_count: u32,
pub first_attempt_at: DateTime<Utc>,
pub last_attempt_at: DateTime<Utc>,
pub last_error: String,
pub created_at: DateTime<Utc>,
pub resolved_at: Option<DateTime<Utc>>,
pub resolution: Option<String>,
}
#[async_trait]
pub trait DeadLetterRepo: Send + Sync {
/// Insert a new dead-letter row. Returns the assigned id.
async fn insert(&self, row: NewDeadLetter) -> Result<DeadLetterId, DeadLetterRepoError>;
async fn get(&self, id: DeadLetterId) -> Result<Option<DeadLetterRow>, DeadLetterRepoError>;
/// Lookup for the dashboard list view. `unresolved_only=true`
/// filters to `resolved_at IS NULL`.
async fn list_for_app(
&self,
app_id: AppId,
unresolved_only: bool,
limit: i64,
offset: i64,
) -> Result<Vec<DeadLetterRow>, DeadLetterRepoError>;
/// Hot path for the dashboard's per-app unresolved-count badge.
async fn unresolved_count(&self, app_id: AppId) -> Result<i64, DeadLetterRepoError>;
/// Mark the row resolved with the given reason. The reason MUST
/// be one of the four CHECK-constraint values
/// (`replayed`, `ignored`, `handled_by_script`, `handler_failed`).
async fn resolve(&self, id: DeadLetterId, reason: &str) -> Result<(), DeadLetterRepoError>;
/// Retention sweep. Deletes rows with `created_at < older_than`
/// up to `limit` at a time, using FOR UPDATE SKIP LOCKED to play
/// nicely with concurrent dispatchers. Returns the count deleted.
async fn gc(&self, older_than: DateTime<Utc>, limit: i64) -> Result<u64, DeadLetterRepoError>;
}
pub struct PostgresDeadLetterRepo {
pool: PgPool,
}
impl PostgresDeadLetterRepo {
#[must_use]
pub fn new(pool: PgPool) -> Self {
Self { pool }
}
}
const ALLOWED_RESOLUTIONS: &[&str] =
&["replayed", "ignored", "handled_by_script", "handler_failed"];
#[async_trait]
impl DeadLetterRepo for PostgresDeadLetterRepo {
async fn insert(&self, row: NewDeadLetter) -> Result<DeadLetterId, DeadLetterRepoError> {
let (id,): (Uuid,) = sqlx::query_as(
"INSERT INTO dead_letters ( \
app_id, original_event_id, source, op, trigger_id, script_id, \
payload, attempt_count, first_attempt_at, last_attempt_at, last_error \
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) \
RETURNING id",
)
.bind(row.app_id.into_inner())
.bind(row.original_event_id)
.bind(row.source)
.bind(row.op)
.bind(row.trigger_id.map(TriggerId::into_inner))
.bind(row.script_id.map(ScriptId::into_inner))
.bind(row.payload)
// Saturate instead of storing 0 if the count ever exceeds i32::MAX.
.bind(i32::try_from(row.attempt_count).unwrap_or(i32::MAX))
.bind(row.first_attempt_at)
.bind(row.last_attempt_at)
.bind(row.last_error)
.fetch_one(&self.pool)
.await?;
Ok(id.into())
}
async fn get(&self, id: DeadLetterId) -> Result<Option<DeadLetterRow>, DeadLetterRepoError> {
let row: Option<DeadLetterRowRaw> = sqlx::query_as(
"SELECT id, app_id, original_event_id, source, op, trigger_id, script_id, \
payload, attempt_count, first_attempt_at, last_attempt_at, \
last_error, created_at, resolved_at, resolution \
FROM dead_letters WHERE id = $1",
)
.bind(id.into_inner())
.fetch_optional(&self.pool)
.await?;
Ok(row.map(DeadLetterRowRaw::into_row))
}
async fn list_for_app(
&self,
app_id: AppId,
unresolved_only: bool,
limit: i64,
offset: i64,
) -> Result<Vec<DeadLetterRow>, DeadLetterRepoError> {
let rows: Vec<DeadLetterRowRaw> = sqlx::query_as(
"SELECT id, app_id, original_event_id, source, op, trigger_id, script_id, \
payload, attempt_count, first_attempt_at, last_attempt_at, \
last_error, created_at, resolved_at, resolution \
FROM dead_letters \
WHERE app_id = $1 \
AND ($2::bool = FALSE OR resolved_at IS NULL) \
ORDER BY created_at DESC \
LIMIT $3 OFFSET $4",
)
.bind(app_id.into_inner())
.bind(unresolved_only)
.bind(limit)
.bind(offset)
.fetch_all(&self.pool)
.await?;
Ok(rows.into_iter().map(DeadLetterRowRaw::into_row).collect())
}
async fn unresolved_count(&self, app_id: AppId) -> Result<i64, DeadLetterRepoError> {
let (count,): (i64,) = sqlx::query_as(
"SELECT COUNT(*) FROM dead_letters \
WHERE app_id = $1 AND resolved_at IS NULL",
)
.bind(app_id.into_inner())
.fetch_one(&self.pool)
.await?;
Ok(count)
}
async fn resolve(&self, id: DeadLetterId, reason: &str) -> Result<(), DeadLetterRepoError> {
if !ALLOWED_RESOLUTIONS.contains(&reason) {
return Err(DeadLetterRepoError::InvalidResolution(reason.to_string()));
}
let res = sqlx::query(
"UPDATE dead_letters \
SET resolution = $2, resolved_at = NOW() \
WHERE id = $1",
)
.bind(id.into_inner())
.bind(reason)
.execute(&self.pool)
.await?;
if res.rows_affected() == 0 {
return Err(DeadLetterRepoError::NotFound(id));
}
Ok(())
}
async fn gc(&self, older_than: DateTime<Utc>, limit: i64) -> Result<u64, DeadLetterRepoError> {
// Expired rows are picked under FOR UPDATE SKIP LOCKED so
// concurrent sweepers (cluster mode) don't fight each other.
let res = sqlx::query(
"DELETE FROM dead_letters \
WHERE id IN ( \
SELECT id FROM dead_letters \
WHERE created_at < $1 \
FOR UPDATE SKIP LOCKED \
LIMIT $2 \
)",
)
.bind(older_than)
.bind(limit)
.execute(&self.pool)
.await?;
Ok(res.rows_affected())
}
}
#[derive(sqlx::FromRow)]
struct DeadLetterRowRaw {
id: Uuid,
app_id: Uuid,
original_event_id: Uuid,
source: String,
op: String,
trigger_id: Option<Uuid>,
script_id: Option<Uuid>,
payload: serde_json::Value,
attempt_count: i32,
first_attempt_at: DateTime<Utc>,
last_attempt_at: DateTime<Utc>,
last_error: String,
created_at: DateTime<Utc>,
resolved_at: Option<DateTime<Utc>>,
resolution: Option<String>,
}
impl DeadLetterRowRaw {
fn into_row(self) -> DeadLetterRow {
DeadLetterRow {
id: self.id.into(),
app_id: self.app_id.into(),
original_event_id: self.original_event_id,
source: self.source,
op: self.op,
trigger_id: self.trigger_id.map(Into::into),
script_id: self.script_id.map(Into::into),
payload: self.payload,
attempt_count: u32::try_from(self.attempt_count).unwrap_or(0),
first_attempt_at: self.first_attempt_at,
last_attempt_at: self.last_attempt_at,
last_error: self.last_error,
created_at: self.created_at,
resolved_at: self.resolved_at,
resolution: self.resolution,
}
}
}

@@ -0,0 +1,118 @@
//! `PostgresDeadLetterService` — replaces `NoopDeadLetterService` in
//! v1.1.1's `Services` bundle. Implements `replay` (re-enqueue the
//! original event into the outbox + mark the DL row replayed) and
//! `resolve` (close the row out with a reason).
//!
//! Both methods are gated by `Capability::AppDeadLetterManage(AppId)`
//! evaluated against `cx.principal`. Public-HTTP scripts with
//! `principal: None` fail the check — design notes §4: managing
//! dead letters is an admin act.
use std::sync::Arc;
use async_trait::async_trait;
use picloud_shared::{DeadLetterError, DeadLetterId, DeadLetterService, SdkCallCx};
use crate::authz::{self, AuthzRepo, Capability};
use crate::dead_letter_repo::{DeadLetterRepo, DeadLetterRepoError, DeadLetterRow};
use crate::outbox_repo::{NewOutboxRow, OutboxRepo, OutboxSourceKind};
pub struct PostgresDeadLetterService {
repo: Arc<dyn DeadLetterRepo>,
outbox: Arc<dyn OutboxRepo>,
authz: Arc<dyn AuthzRepo>,
}
impl PostgresDeadLetterService {
#[must_use]
pub fn new(
repo: Arc<dyn DeadLetterRepo>,
outbox: Arc<dyn OutboxRepo>,
authz: Arc<dyn AuthzRepo>,
) -> Self {
Self {
repo,
outbox,
authz,
}
}
async fn require_dl_capability(&self, cx: &SdkCallCx) -> Result<(), DeadLetterError> {
let Some(ref principal) = cx.principal else {
return Err(DeadLetterError::Forbidden);
};
authz::require(
&*self.authz,
principal,
Capability::AppDeadLetterManage(cx.app_id),
)
.await
.map_err(|_| DeadLetterError::Forbidden)
}
async fn load_row(&self, id: DeadLetterId) -> Result<DeadLetterRow, DeadLetterError> {
self.repo
.get(id)
.await
.map_err(map_repo_err)?
.ok_or(DeadLetterError::NotFound)
}
}
#[async_trait]
impl DeadLetterService for PostgresDeadLetterService {
async fn replay(&self, cx: &SdkCallCx, id: DeadLetterId) -> Result<(), DeadLetterError> {
self.require_dl_capability(cx).await?;
let row = self.load_row(id).await?;
if row.app_id != cx.app_id {
// Cross-app — treat as not-found to avoid leaking
// information about other apps' dead letters.
return Err(DeadLetterError::NotFound);
}
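// NOTE: an unrecognized `source` silently falls back to `Kv`, so
// the replayed row may be dispatched under a different source kind
// than the original event.
let source_kind = OutboxSourceKind::from_wire(&row.source).unwrap_or(OutboxSourceKind::Kv);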
self.outbox
.insert(NewOutboxRow {
app_id: row.app_id,
source_kind,
trigger_id: row.trigger_id,
script_id: row.script_id,
reply_to: None,
payload: row.payload.clone(),
origin_principal: None,
trigger_depth: 0,
root_execution_id: None,
})
.await
.map_err(|e| DeadLetterError::Backend(e.to_string()))?;
self.repo
.resolve(id, "replayed")
.await
.map_err(map_repo_err)?;
Ok(())
}
async fn resolve(
&self,
cx: &SdkCallCx,
id: DeadLetterId,
reason: &str,
) -> Result<(), DeadLetterError> {
self.require_dl_capability(cx).await?;
let row = self.load_row(id).await?;
if row.app_id != cx.app_id {
return Err(DeadLetterError::NotFound);
}
self.repo.resolve(id, reason).await.map_err(map_repo_err)?;
Ok(())
}
}
fn map_repo_err(e: DeadLetterRepoError) -> DeadLetterError {
match e {
DeadLetterRepoError::NotFound(_) => DeadLetterError::NotFound,
DeadLetterRepoError::InvalidResolution(s) => DeadLetterError::InvalidResolution(s),
DeadLetterRepoError::Db(e) => DeadLetterError::Backend(e.to_string()),
}
}

@@ -0,0 +1,316 @@
//! `/api/v1/admin/apps/{id}/dead_letters/*` — dashboard surface for
//! the no-default-handler model (design notes §4).
//!
//! Endpoints:
//! - `GET /apps/{id}/dead_letters?unresolved=true` — list view
//! - `GET /apps/{id}/dead_letters/count` — badge count
//! - `GET /apps/{id}/dead_letters/{dl_id}` — row detail
//! - `POST /apps/{id}/dead_letters/{dl_id}/replay` — re-enqueue
//! - `POST /apps/{id}/dead_letters/{dl_id}/resolve` — mark resolved
//!
//! All gated on `Capability::AppDeadLetterManage(app_id)`.
use std::sync::Arc;
use axum::extract::{Path, Query, State};
use axum::http::StatusCode;
use axum::response::{IntoResponse, Json, Response};
use axum::routing::{get, post};
use axum::{Extension, Router};
use picloud_shared::{AppId, DeadLetterId, DeadLetterService, Principal, SdkCallCx};
use serde::{Deserialize, Serialize};
use serde_json::json;
use crate::app_repo::AppRepository;
use crate::authz::{require, AuthzDenied, AuthzError, AuthzRepo, Capability};
use crate::dead_letter_repo::{DeadLetterRepo, DeadLetterRepoError, DeadLetterRow};
#[derive(Clone)]
pub struct DeadLettersState {
pub repo: Arc<dyn DeadLetterRepo>,
pub service: Arc<dyn DeadLetterService>,
pub apps: Arc<dyn AppRepository>,
pub authz: Arc<dyn AuthzRepo>,
}
pub fn dead_letters_router(state: DeadLettersState) -> Router {
Router::new()
.route("/apps/{app_id}/dead_letters", get(list))
.route("/apps/{app_id}/dead_letters/count", get(count))
.route("/apps/{app_id}/dead_letters/{dl_id}", get(detail))
.route("/apps/{app_id}/dead_letters/{dl_id}/replay", post(replay))
.route("/apps/{app_id}/dead_letters/{dl_id}/resolve", post(resolve))
.with_state(state)
}
#[derive(Debug, Deserialize)]
pub struct ListQuery {
#[serde(default)]
pub unresolved: bool,
#[serde(default = "default_limit")]
pub limit: i64,
#[serde(default)]
pub offset: i64,
}
const fn default_limit() -> i64 {
50
}
#[derive(Debug, Serialize)]
pub struct ListResponse {
pub dead_letters: Vec<DeadLetterDto>,
}
#[derive(Debug, Serialize)]
pub struct CountResponse {
pub unresolved: i64,
}
#[derive(Debug, Deserialize)]
pub struct ResolveBody {
pub reason: String,
}
#[derive(Debug, Serialize)]
pub struct DeadLetterDto {
pub id: DeadLetterId,
pub app_id: AppId,
pub source: String,
pub op: String,
pub trigger_id: Option<picloud_shared::TriggerId>,
pub script_id: Option<picloud_shared::ScriptId>,
pub payload: serde_json::Value,
pub attempt_count: u32,
pub first_attempt_at: chrono::DateTime<chrono::Utc>,
pub last_attempt_at: chrono::DateTime<chrono::Utc>,
pub last_error: String,
pub created_at: chrono::DateTime<chrono::Utc>,
pub resolved_at: Option<chrono::DateTime<chrono::Utc>>,
pub resolution: Option<String>,
}
impl From<DeadLetterRow> for DeadLetterDto {
fn from(r: DeadLetterRow) -> Self {
Self {
id: r.id,
app_id: r.app_id,
source: r.source,
op: r.op,
trigger_id: r.trigger_id,
script_id: r.script_id,
payload: r.payload,
attempt_count: r.attempt_count,
first_attempt_at: r.first_attempt_at,
last_attempt_at: r.last_attempt_at,
last_error: r.last_error,
created_at: r.created_at,
resolved_at: r.resolved_at,
resolution: r.resolution,
}
}
}
async fn list(
State(s): State<DeadLettersState>,
Extension(principal): Extension<Principal>,
Path(app_id): Path<AppId>,
Query(q): Query<ListQuery>,
) -> Result<Json<ListResponse>, DeadLettersApiError> {
ensure_app(&*s.apps, app_id).await?;
require(
s.authz.as_ref(),
&principal,
Capability::AppDeadLetterManage(app_id),
)
.await?;
let rows = s
.repo
.list_for_app(app_id, q.unresolved, q.limit.clamp(1, 200), q.offset.max(0))
.await?;
Ok(Json(ListResponse {
dead_letters: rows.into_iter().map(Into::into).collect(),
}))
}
async fn count(
State(s): State<DeadLettersState>,
Extension(principal): Extension<Principal>,
Path(app_id): Path<AppId>,
) -> Result<Json<CountResponse>, DeadLettersApiError> {
ensure_app(&*s.apps, app_id).await?;
require(
s.authz.as_ref(),
&principal,
Capability::AppDeadLetterManage(app_id),
)
.await?;
let n = s.repo.unresolved_count(app_id).await?;
Ok(Json(CountResponse { unresolved: n }))
}
async fn detail(
State(s): State<DeadLettersState>,
Extension(principal): Extension<Principal>,
Path((app_id, dl_id)): Path<(AppId, DeadLetterId)>,
) -> Result<Json<DeadLetterDto>, DeadLettersApiError> {
ensure_app(&*s.apps, app_id).await?;
require(
s.authz.as_ref(),
&principal,
Capability::AppDeadLetterManage(app_id),
)
.await?;
let row = s
.repo
.get(dl_id)
.await?
.ok_or(DeadLettersApiError::NotFound(dl_id))?;
if row.app_id != app_id {
return Err(DeadLettersApiError::NotFound(dl_id));
}
Ok(Json(row.into()))
}
async fn replay(
State(s): State<DeadLettersState>,
Extension(principal): Extension<Principal>,
Path((app_id, dl_id)): Path<(AppId, DeadLetterId)>,
) -> Result<StatusCode, DeadLettersApiError> {
ensure_app(&*s.apps, app_id).await?;
// Authz handled inside the service via SdkCallCx.
let cx = admin_cx(app_id, &principal);
s.service
.replay(&cx, dl_id)
.await
.map_err(map_service_err)?;
Ok(StatusCode::NO_CONTENT)
}
async fn resolve(
State(s): State<DeadLettersState>,
Extension(principal): Extension<Principal>,
Path((app_id, dl_id)): Path<(AppId, DeadLetterId)>,
Json(body): Json<ResolveBody>,
) -> Result<StatusCode, DeadLettersApiError> {
ensure_app(&*s.apps, app_id).await?;
let cx = admin_cx(app_id, &principal);
s.service
.resolve(&cx, dl_id, &body.reason)
.await
.map_err(map_service_err)?;
Ok(StatusCode::NO_CONTENT)
}
/// Synthesize an `SdkCallCx` for the admin path. The service layer
/// reads `cx.app_id` + `cx.principal` and ignores the trigger /
/// execution fields, so the per-call ids are arbitrary.
fn admin_cx(app_id: AppId, principal: &Principal) -> SdkCallCx {
SdkCallCx {
app_id,
principal: Some(principal.clone()),
execution_id: picloud_shared::ExecutionId::new(),
request_id: picloud_shared::RequestId::new(),
trigger_depth: 0,
root_execution_id: picloud_shared::ExecutionId::new(),
is_dead_letter_handler: false,
event: None,
}
}
async fn ensure_app(apps: &dyn AppRepository, app_id: AppId) -> Result<(), DeadLettersApiError> {
apps.get_by_id(app_id)
.await
.map_err(|e| DeadLettersApiError::Backend(e.to_string()))?
.ok_or_else(|| DeadLettersApiError::AppNotFound(app_id.to_string()))?;
Ok(())
}
fn map_service_err(e: picloud_shared::DeadLetterError) -> DeadLettersApiError {
match e {
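picloud_shared::DeadLetterError::NotFound => {
// NOTE: the service error carries no id, so this mints a fresh
// placeholder; the id in the 404 message is not the requested one.
DeadLettersApiError::NotFound(DeadLetterId::new())
}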
picloud_shared::DeadLetterError::Forbidden => DeadLettersApiError::Forbidden,
picloud_shared::DeadLetterError::InvalidResolution(s) => {
DeadLettersApiError::Invalid(format!("invalid resolution: {s}"))
}
picloud_shared::DeadLetterError::Backend(s) => DeadLettersApiError::Backend(s),
}
}
#[derive(Debug, thiserror::Error)]
pub enum DeadLettersApiError {
#[error("app not found: {0}")]
AppNotFound(String),
#[error("dead-letter not found: {0}")]
NotFound(DeadLetterId),
#[error("invalid: {0}")]
Invalid(String),
#[error("forbidden")]
Forbidden,
#[error("authorization repo error: {0}")]
AuthzRepo(String),
#[error("dead-letter backend: {0}")]
Backend(String),
}
impl From<AuthzDenied> for DeadLettersApiError {
fn from(d: AuthzDenied) -> Self {
match d {
AuthzDenied::Denied => Self::Forbidden,
AuthzDenied::Repo(e) => Self::AuthzRepo(e.to_string()),
}
}
}
impl From<AuthzError> for DeadLettersApiError {
fn from(e: AuthzError) -> Self {
Self::AuthzRepo(e.to_string())
}
}
impl From<DeadLetterRepoError> for DeadLettersApiError {
fn from(e: DeadLetterRepoError) -> Self {
match e {
DeadLetterRepoError::NotFound(id) => Self::NotFound(id),
DeadLetterRepoError::InvalidResolution(s) => Self::Invalid(s),
DeadLetterRepoError::Db(e) => Self::Backend(e.to_string()),
}
}
}
impl IntoResponse for DeadLettersApiError {
fn into_response(self) -> Response {
let (status, body) = match &self {
Self::AppNotFound(_) | Self::NotFound(_) => {
(StatusCode::NOT_FOUND, json!({ "error": self.to_string() }))
}
Self::Invalid(_) => (
StatusCode::UNPROCESSABLE_ENTITY,
json!({ "error": self.to_string() }),
),
Self::Forbidden => (StatusCode::FORBIDDEN, json!({ "error": self.to_string() })),
Self::AuthzRepo(e) => {
tracing::error!(error = %e, "dead_letters authz repo error");
(
StatusCode::INTERNAL_SERVER_ERROR,
json!({ "error": "internal error" }),
)
}
Self::Backend(e) => {
tracing::error!(error = %e, "dead_letters api backend error");
(
StatusCode::INTERNAL_SERVER_ERROR,
json!({ "error": "internal error" }),
)
}
};
(status, Json(body)).into_response()
}
}
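
The `IntoResponse` impl above returns caller-facing messages for 4xx errors but logs internal failures and replaces their body with a generic string. A minimal std-only sketch of that split follows. `ApiError` and `to_status_and_body` are hypothetical names for illustration, and plain `u16` codes and `eprintln!` stand in for axum's `StatusCode` and `tracing`.

```rust
use std::fmt;

/// Hypothetical, trimmed-down error type for illustration only.
#[derive(Debug)]
enum ApiError {
    NotFound(String),
    Invalid(String),
    Forbidden,
    Backend(String),
}

impl fmt::Display for ApiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ApiError::NotFound(id) => write!(f, "dead-letter not found: {id}"),
            ApiError::Invalid(why) => write!(f, "invalid: {why}"),
            ApiError::Forbidden => write!(f, "forbidden"),
            ApiError::Backend(detail) => write!(f, "dead-letter backend: {detail}"),
        }
    }
}

/// Map an error to `(status, body)`. Client errors echo their message;
/// backend errors are logged and replaced with a generic body so that
/// internal details never reach the caller.
fn to_status_and_body(err: &ApiError) -> (u16, String) {
    match err {
        ApiError::NotFound(_) => (404, err.to_string()),
        ApiError::Invalid(_) => (422, err.to_string()),
        ApiError::Forbidden => (403, err.to_string()),
        ApiError::Backend(detail) => {
            eprintln!("backend error: {detail}");
            (500, "internal error".to_string())
        }
    }
}

fn main() {
    let errors = [
        ApiError::NotFound("abc".into()),
        ApiError::Invalid("bad resolution".into()),
        ApiError::Forbidden,
        ApiError::Backend("connection refused".into()),
    ];
    for err in &errors {
        let (status, body) = to_status_and_body(err);
        println!("{status} {body}");
    }
}
```

The point of the split is that the 500 body is the same whatever the backend reported, while the detail still reaches the logs.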


@@ -0,0 +1,685 @@
//! The triggers-framework dispatcher.
//!
//! Single tokio task that polls the outbox, claims due rows
//! (`FOR UPDATE SKIP LOCKED`), and routes each to the executor.
//! Shares the `ExecutionGate` with sync HTTP — they compete for the
//! same permit budget, matching design notes §2.
//!
//! Outcome handling per design notes §3 and §4:
//! - reply_to.is_some() (sync HTTP): never retry. Deliver to inbox
//! (or write `abandoned_executions` if the receiver dropped).
//! - is_dead_letter_handler == true: never retry, never DL. Failure
//! just annotates the original DL row with `resolution =
//! 'handler_failed'` and bumps a metric.
//! - Otherwise on failure: if `attempt_count + 1 < max_attempts`,
//! reschedule with backoff + jitter. Else, write a `dead_letters`
//! row and delete from outbox.
//!
//! Depth-limit: `trigger_depth > max_trigger_depth` skips execution
//! entirely (log + metric) and deletes the row — does NOT dead-letter
//! (design notes §4: depth-exceeded means "you built a loop", and
//! dead-lettering would just re-fire the same loop).
use std::sync::Arc;
use std::time::Duration;
use chrono::Utc;
use picloud_executor_core::{ExecError, ExecRequest, ExecResponse, InvocationType};
use picloud_orchestrator_core::{ExecutionGate, ExecutorClient};
use picloud_shared::{
ExecResponseSummary, ExecutionId, HttpDispatchPayload, InboxDeliveryOutcome, InboxFailureKind,
InboxResolver, InboxResult, RequestId, ScriptId, ScriptSandbox, TriggerEvent,
};
use rand::Rng;
use uuid::Uuid;
use crate::abandoned_repo::{AbandonedRepo, NewAbandonedExecution};
use crate::dead_letter_repo::{DeadLetterRepo, NewDeadLetter};
use crate::outbox_repo::{OutboxRepo, OutboxRow, OutboxSourceKind};
use crate::principal_resolver::PrincipalResolver;
use crate::repo::ScriptRepository;
use crate::trigger_config::{BackoffShape, TriggerConfig};
use crate::trigger_repo::{TriggerKind, TriggerRepo};
/// Bundle the dispatcher reads from. Each handle is `Arc<dyn …>` so
/// tests can substitute in-memory backings.
pub struct Dispatcher {
pub outbox: Arc<dyn OutboxRepo>,
pub triggers: Arc<dyn TriggerRepo>,
pub scripts: Arc<dyn ScriptRepository>,
pub dead_letters: Arc<dyn DeadLetterRepo>,
pub abandoned: Arc<dyn AbandonedRepo>,
pub principals: Arc<dyn PrincipalResolver>,
pub executor: Arc<dyn ExecutorClient>,
pub gate: Arc<ExecutionGate>,
pub inbox: Arc<dyn InboxResolver>,
pub config: TriggerConfig,
/// Stable id for this dispatcher instance — written into
/// `outbox.claimed_by` for forensics. In MVP this is the host's
/// pid; cluster mode (v1.3+) uses node identity.
pub instance_id: String,
}
/// How many outbox rows the dispatcher tries to claim per tick.
/// Bounded to keep the working set small even if there's a flood.
const CLAIM_BATCH: i64 = 8;
/// Polling cadence. Short enough that fan-out feels instant; long
/// enough that an idle dispatcher doesn't burn cycles.
const TICK_INTERVAL: Duration = Duration::from_millis(100);
/// Hard cap on the wall-clock budget passed to the executor for an
/// async-dispatched script. Sync HTTP gets a per-script timeout via
/// the orchestrator path; async rows don't have one, so we apply a
/// platform-wide ceiling here. Matches `LocalExecutorClient`'s own
/// 5-minute cap.
const ASYNC_EXEC_TIMEOUT: Duration = Duration::from_secs(300);
impl Dispatcher {
/// Spawn the dispatcher loop as a detached `tokio::task`. The
/// returned `JoinHandle` is dropped — the loop runs for the
/// process lifetime.
pub fn spawn(self) {
tokio::spawn(async move {
self.run().await;
});
}
async fn run(self) {
let mut ticker = tokio::time::interval(TICK_INTERVAL);
// Skip the immediate first fire so we don't race startup.
ticker.tick().await;
loop {
ticker.tick().await;
if let Err(err) = self.tick().await {
tracing::warn!(?err, "dispatcher tick errored");
}
}
}
async fn tick(&self) -> Result<(), DispatcherError> {
// Claim a bounded batch of due rows. The gate is not sampled here;
// permit admission is applied per-row in `dispatch_one`.
let rows = self
.outbox
.claim_due(&self.instance_id, CLAIM_BATCH)
.await
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
if rows.is_empty() {
return Ok(());
}
for row in rows {
// Process serially within a tick — the outer ticker is the
// pacing mechanism. Concurrent dispatchers are a cluster-
// mode concern; v1.1.1 MVP has one.
if let Err(err) = self.dispatch_one(row).await {
tracing::warn!(?err, "dispatch one errored");
}
}
Ok(())
}
async fn dispatch_one(&self, row: OutboxRow) -> Result<(), DispatcherError> {
// Depth-limit check — design notes §4: loops aren't DL'd.
if row.trigger_depth > self.config.max_trigger_depth {
tracing::warn!(
outbox_id = %row.id,
app_id = %row.app_id,
trigger_depth = row.trigger_depth,
"trigger depth exceeded; dropping row"
);
// TODO(metrics): bump `picloud_trigger_depth_exceeded{app_id,trigger_id}`.
self.outbox
.delete(row.id)
.await
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
return Ok(());
}
// Gate admission — non-blocking. If the gate is saturated,
// release the claim by rescheduling so another tick can pick
// it up. The row stays "due" essentially immediately.
let Ok(permit) = self.gate.try_acquire() else {
let next = Utc::now() + chrono::Duration::milliseconds(100);
self.outbox
.reschedule(row.id, row.attempt_count, next)
.await
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
return Ok(());
};
// Resolve the trigger config (KV / DL) or pull the HTTP
// payload directly off the outbox row.
let (resolved, exec_req) = match row.source_kind {
OutboxSourceKind::Http => match self.build_http_request(&row).await {
Ok(pair) => pair,
Err(err) => {
tracing::warn!(outbox_id = %row.id, ?err, "http exec build failed; dropping");
self.outbox
.delete(row.id)
.await
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
drop(permit);
return Ok(());
}
},
OutboxSourceKind::Kv | OutboxSourceKind::DeadLetter => {
let resolved = self.resolve_trigger(&row).await?;
let req = match self.build_exec_request(&row, &resolved).await {
Ok(req) => req,
Err(err) => {
tracing::warn!(outbox_id = %row.id, ?err, "exec request build failed; dropping row");
self.outbox
.delete(row.id)
.await
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
drop(permit);
return Ok(());
}
};
(resolved, req)
}
};
// The permit is held across the executor call and released by the
// explicit `drop` below (or automatically on an early return). The
// dispatcher awaits the executor inline; sync HTTP and the
// dispatcher share the semaphore, so this is intentional.
let source = resolved.script_source.clone();
let outcome = self
.executor
.execute(&source, exec_req, ASYNC_EXEC_TIMEOUT)
.await;
drop(permit);
match outcome {
Ok(resp) => self.handle_success(&row, &resolved, resp).await,
Err(err) => self.handle_failure(&row, &resolved, err).await,
}
}
async fn resolve_trigger(&self, row: &OutboxRow) -> Result<ResolvedTrigger, DispatcherError> {
// For KV and DL kinds, the outbox carries `trigger_id`. Use it
// to look up the trigger row, then resolve the script.
let Some(trigger_id) = row.trigger_id else {
return Err(DispatcherError::ResolveTrigger(
"outbox row missing trigger_id".into(),
));
};
let trigger = self
.triggers
.get(trigger_id)
.await
.map_err(|e| DispatcherError::ResolveTrigger(e.to_string()))?
.ok_or_else(|| {
DispatcherError::ResolveTrigger(format!("trigger {trigger_id} not found"))
})?;
let script = self
.scripts
.get(trigger.script_id)
.await
.map_err(|e| DispatcherError::ResolveTrigger(e.to_string()))?
.ok_or_else(|| {
DispatcherError::ResolveTrigger(format!("script {} not found", trigger.script_id))
})?;
Ok(ResolvedTrigger {
trigger_kind: trigger.kind,
is_dead_letter_handler: matches!(trigger.kind, TriggerKind::DeadLetter),
script_id: script.id,
script_source: script.source,
script_name: script.name,
sandbox_overrides: script.sandbox,
registered_by_principal: trigger.registered_by_principal,
retry_max_attempts: trigger.retry_max_attempts,
retry_backoff: trigger.retry_backoff,
retry_base_ms: trigger.retry_base_ms,
})
}
async fn build_exec_request(
&self,
row: &OutboxRow,
resolved: &ResolvedTrigger,
) -> Result<ExecRequest, DispatcherError> {
let trigger_event: TriggerEvent = serde_json::from_value(row.payload.clone())
.map_err(|e| DispatcherError::ResolveTrigger(format!("decode payload: {e}")))?;
let principal = self
.principals
.resolve(resolved.registered_by_principal)
.await
.map_err(|e| DispatcherError::ResolveTrigger(e.to_string()))?;
let execution_id = ExecutionId::new();
Ok(ExecRequest {
execution_id,
request_id: RequestId::new(),
script_id: resolved.script_id,
script_name: resolved.script_name.clone(),
invocation_type: InvocationType::Function,
path: format!("/trigger/{}", trigger_event.source()),
headers: std::collections::BTreeMap::new(),
body: serde_json::Value::Null,
params: std::collections::BTreeMap::new(),
query: std::collections::BTreeMap::new(),
rest: String::new(),
sandbox_overrides: resolved.sandbox_overrides,
app_id: row.app_id,
principal: Some(principal),
trigger_depth: row.trigger_depth,
root_execution_id: row.root_execution_id.unwrap_or(execution_id),
is_dead_letter_handler: resolved.is_dead_letter_handler,
event: Some(trigger_event),
})
}
/// Build a `(ResolvedTrigger, ExecRequest)` pair for an HTTP outbox
/// row. HTTP rows don't have a backing `triggers` row (the
/// `trigger_id` references `routes.id` instead). We pull the
/// script id off the outbox row and the request shape off the
/// payload, then synthesize a `ResolvedTrigger`. Its retry settings
/// come from the platform defaults in `TriggerConfig`; they only
/// matter for async HTTP, because sync HTTP is never retried.
async fn build_http_request(
&self,
row: &OutboxRow,
) -> Result<(ResolvedTrigger, ExecRequest), DispatcherError> {
let Some(script_id) = row.script_id else {
return Err(DispatcherError::ResolveTrigger(
"HTTP outbox row missing script_id".into(),
));
};
let script = self
.scripts
.get(script_id)
.await
.map_err(|e| DispatcherError::ResolveTrigger(e.to_string()))?
.ok_or_else(|| {
DispatcherError::ResolveTrigger(format!("script {script_id} not found"))
})?;
let payload: HttpDispatchPayload = serde_json::from_value(row.payload.clone())
.map_err(|e| DispatcherError::ResolveTrigger(format!("decode http payload: {e}")))?;
let execution_id = ExecutionId::new();
let req = ExecRequest {
execution_id,
request_id: RequestId::new(),
script_id,
script_name: payload.script_name.clone(),
invocation_type: InvocationType::Http,
path: payload.path.clone(),
headers: payload.headers,
body: payload.body,
params: payload.params,
query: payload.query,
rest: payload.rest,
sandbox_overrides: script.sandbox,
app_id: row.app_id,
// HTTP outbox rows don't run as the trigger registrant. In
// this MVP they always run with no principal: the
// `origin_principal` forensic field is not promoted to the
// execution principal.
principal: None,
trigger_depth: row.trigger_depth,
root_execution_id: row.root_execution_id.unwrap_or(execution_id),
is_dead_letter_handler: false,
event: None,
};
let resolved = ResolvedTrigger {
trigger_kind: TriggerKind::Kv, // placeholder; HTTP doesn't have a kind
is_dead_letter_handler: false,
script_id,
script_source: script.source,
script_name: payload.script_name,
sandbox_overrides: script.sandbox,
// HTTP outbox rows don't carry a registered_by_principal.
// Use a sentinel nil UUID: the dispatcher only reads this
// field when building KV / DL requests, never for HTTP rows.
registered_by_principal: picloud_shared::AdminUserId::from(uuid::Uuid::nil()),
// Async HTTP uses the platform default retry policy from
// TriggerConfig. Sync HTTP (reply_to.is_some) never retries
// regardless.
retry_max_attempts: self.config.retry_max_attempts,
retry_backoff: self.config.retry_backoff,
retry_base_ms: self.config.retry_base_ms,
};
Ok((resolved, req))
}
async fn handle_success(
&self,
row: &OutboxRow,
_resolved: &ResolvedTrigger,
resp: ExecResponse,
) -> Result<(), DispatcherError> {
if let Some(inbox_id) = row.reply_to {
self.deliver_inbox(row, inbox_id, InboxResult::Success(summarize(&resp)))
.await;
}
self.outbox
.delete(row.id)
.await
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
Ok(())
}
async fn handle_failure(
&self,
row: &OutboxRow,
resolved: &ResolvedTrigger,
err: ExecError,
) -> Result<(), DispatcherError> {
// Sync HTTP: always single-attempt. Always deliver outcome
// (success-or-failure) to the inbox. Never retry, never DL.
if let Some(inbox_id) = row.reply_to {
let (kind, message) = classify_exec_error(&err);
self.deliver_inbox(
row,
inbox_id,
InboxResult::Failure {
kind,
message,
},
)
.await;
self.outbox
.delete(row.id)
.await
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
return Ok(());
}
// Dead-letter handler: never retry, never DL. Failure
// annotates the original DL row + bumps a metric.
if resolved.is_dead_letter_handler {
tracing::error!(
outbox_id = %row.id,
app_id = %row.app_id,
?err,
"dead-letter handler failed; not retrying"
);
// TODO(metrics): bump `picloud_dead_letter_handler_failures{app_id}`.
// Annotate the original DL row. Its id is the `dead_letter_id`
// field of the `TriggerEvent::DeadLetter` payload. Best-effort:
// if the payload doesn't decode, skip the annotation and move on.
if let Ok(TriggerEvent::DeadLetter { dead_letter_id, .. }) =
serde_json::from_value::<TriggerEvent>(row.payload.clone())
{
if let Err(e) = self
.dead_letters
.resolve(dead_letter_id, "handler_failed")
.await
{
tracing::warn!(?e, "could not annotate DL row as handler_failed");
}
}
self.outbox
.delete(row.id)
.await
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
return Ok(());
}
// Async event: retry per policy, then dead-letter.
let attempt = row.attempt_count + 1;
if attempt < resolved.retry_max_attempts {
let delay = compute_backoff(
attempt,
resolved.retry_backoff,
resolved.retry_base_ms,
self.config.retry_jitter_pct,
);
let next = Utc::now() + chrono::Duration::milliseconds(i64::from(delay));
tracing::info!(
outbox_id = %row.id,
attempt,
max_attempts = resolved.retry_max_attempts,
retry_in_ms = delay,
"rescheduling outbox row"
);
self.outbox
.reschedule(row.id, attempt, next)
.await
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
return Ok(());
}
// Exhausted retries: dead-letter. The outbox row is deleted even if
// the dead-letter insert fails; that failure is only logged.
let (op, source) = describe_event(&row.payload);
let now = Utc::now();
if let Err(e) = self
.dead_letters
.insert(NewDeadLetter {
app_id: row.app_id,
original_event_id: row.id,
source,
op,
trigger_id: row.trigger_id,
script_id: Some(resolved.script_id),
payload: row.payload.clone(),
attempt_count: attempt,
first_attempt_at: row.created_at,
last_attempt_at: now,
last_error: err.to_string(),
})
.await
{
tracing::error!(?e, "failed to write dead-letter row");
}
self.outbox
.delete(row.id)
.await
.map_err(|e| DispatcherError::Outbox(e.to_string()))?;
Ok(())
}
async fn deliver_inbox(&self, row: &OutboxRow, inbox_id: Uuid, result: InboxResult) {
match self.inbox.deliver(inbox_id, result.clone()).await {
InboxDeliveryOutcome::Delivered => {}
InboxDeliveryOutcome::Abandoned => {
// Receiver was dropped — record forensic row + bump
// metric.
let (status_code, summary) = match &result {
InboxResult::Success(s) => (s.status_code, None),
InboxResult::Failure { kind, message } => {
(failure_kind_to_status(*kind), Some(message.clone()))
}
};
if let Err(e) = self
.abandoned
.insert(NewAbandonedExecution {
app_id: row.app_id,
outbox_id: row.id,
script_id: row.script_id,
inbox_id,
status_code,
result_summary: summary,
})
.await
{
tracing::warn!(?e, "abandoned_executions insert failed");
}
// TODO(metrics): bump `picloud_abandoned_executions_total{app_id}`.
}
}
}
}
#[derive(Debug)]
pub struct ResolvedTrigger {
pub trigger_kind: TriggerKind,
pub is_dead_letter_handler: bool,
pub script_id: ScriptId,
pub script_source: String,
pub script_name: String,
pub sandbox_overrides: ScriptSandbox,
pub registered_by_principal: picloud_shared::AdminUserId,
pub retry_max_attempts: u32,
pub retry_backoff: BackoffShape,
pub retry_base_ms: u32,
}
#[derive(Debug, thiserror::Error)]
pub enum DispatcherError {
#[error("outbox: {0}")]
Outbox(String),
#[error("resolve trigger: {0}")]
ResolveTrigger(String),
}
fn summarize(resp: &ExecResponse) -> ExecResponseSummary {
ExecResponseSummary {
status_code: resp.status_code,
headers: resp.headers.clone(),
body: resp.body.clone(),
}
}
/// Map `ExecError` onto the design-notes §3 status-code table.
fn classify_exec_error(err: &ExecError) -> (InboxFailureKind, String) {
match err {
ExecError::Parse(s) | ExecError::InvalidResponse(s) => {
(InboxFailureKind::Validation, s.clone())
}
ExecError::Timeout(_) => (InboxFailureKind::Timeout, err.to_string()),
ExecError::OperationBudgetExceeded => (InboxFailureKind::OperationBudget, err.to_string()),
ExecError::Overloaded { .. } => (InboxFailureKind::Overloaded, err.to_string()),
ExecError::Runtime(s) => (InboxFailureKind::Runtime, s.clone()),
}
}
fn failure_kind_to_status(k: InboxFailureKind) -> u16 {
match k {
InboxFailureKind::Validation => 422,
InboxFailureKind::Runtime => 502,
InboxFailureKind::Overloaded => 503,
InboxFailureKind::Timeout => 504,
InboxFailureKind::OperationBudget => 507,
InboxFailureKind::Platform => 500,
}
}
/// `(op, source)` extracted from the outbox payload. Used to seed the
/// `dead_letters` row when retries exhaust.
fn describe_event(payload: &serde_json::Value) -> (String, String) {
let source = payload
.get("source")
.and_then(|v| v.as_str())
.unwrap_or("")
.to_string();
let op = payload
.get("op")
.and_then(|v| v.as_str())
.unwrap_or("")
.to_string();
(op, source)
}
/// Compute backoff (ms) for the given attempt + policy + jitter.
/// Attempt is 1-indexed (first retry = attempt 1).
#[must_use]
pub fn compute_backoff(attempt: u32, backoff: BackoffShape, base_ms: u32, jitter_pct: u32) -> u32 {
let base_ms = u64::from(base_ms);
let attempt = u64::from(attempt.saturating_sub(1));
let raw = match backoff {
BackoffShape::Constant => base_ms,
BackoffShape::Linear => base_ms * (attempt + 1),
// 1x base, 2x base, 4x base, … The exponent is capped at 20 and
// the multiply saturates, so the shift cannot overflow.
BackoffShape::Exponential => base_ms.saturating_mul(1u64 << attempt.min(20)),
};
let raw = u32::try_from(raw.min(u64::from(u32::MAX))).unwrap_or(u32::MAX);
apply_jitter(raw, jitter_pct)
}
fn apply_jitter(raw: u32, pct: u32) -> u32 {
if pct == 0 {
return raw;
}
let pct = pct.min(100);
// Jitter of ±pct% of `raw`. Because `pct` is clamped to 100,
// `span <= raw`, so `raw + offset` can never drop below zero.
let span = u64::from(raw) * u64::from(pct) / 100;
if span == 0 {
return raw;
}
let span_i64 = i64::try_from(span).unwrap_or(i64::MAX);
let mut rng = rand::thread_rng();
let offset = rng.gen_range(-span_i64..=span_i64);
let signed = i64::from(raw).saturating_add(offset).max(0);
u32::try_from(signed.min(i64::from(u32::MAX))).unwrap_or(u32::MAX)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn exponential_backoff_doubles_per_attempt() {
// No jitter (pct=0) for a deterministic check.
assert_eq!(compute_backoff(1, BackoffShape::Exponential, 1000, 0), 1000);
assert_eq!(compute_backoff(2, BackoffShape::Exponential, 1000, 0), 2000);
assert_eq!(compute_backoff(3, BackoffShape::Exponential, 1000, 0), 4000);
assert_eq!(compute_backoff(4, BackoffShape::Exponential, 1000, 0), 8000);
}
#[test]
fn linear_backoff_scales_with_attempt() {
assert_eq!(compute_backoff(1, BackoffShape::Linear, 100, 0), 100);
assert_eq!(compute_backoff(2, BackoffShape::Linear, 100, 0), 200);
assert_eq!(compute_backoff(5, BackoffShape::Linear, 100, 0), 500);
}
#[test]
fn constant_backoff_returns_base() {
for attempt in 1..=5 {
assert_eq!(
compute_backoff(attempt, BackoffShape::Constant, 750, 0),
750
);
}
}
#[test]
fn jitter_within_pct_of_base() {
for _ in 0..100 {
let v = compute_backoff(1, BackoffShape::Constant, 1000, 20);
// ±20% of 1000 = 800..=1200.
assert!((800..=1200).contains(&v), "jitter out of range: {v}");
}
}
#[test]
fn classify_exec_error_covers_every_variant() {
let parse = classify_exec_error(&ExecError::Parse("nope".into()));
assert!(matches!(parse.0, InboxFailureKind::Validation));
let invalid = classify_exec_error(&ExecError::InvalidResponse("bad".into()));
assert!(matches!(invalid.0, InboxFailureKind::Validation));
let timeout = classify_exec_error(&ExecError::Timeout(30));
assert!(matches!(timeout.0, InboxFailureKind::Timeout));
let budget = classify_exec_error(&ExecError::OperationBudgetExceeded);
assert!(matches!(budget.0, InboxFailureKind::OperationBudget));
let runtime = classify_exec_error(&ExecError::Runtime("threw".into()));
assert!(matches!(runtime.0, InboxFailureKind::Runtime));
let overload = classify_exec_error(&ExecError::Overloaded {
retry_after_secs: 1,
});
assert!(matches!(overload.0, InboxFailureKind::Overloaded));
}
#[test]
fn failure_kind_status_codes_match_design_notes() {
assert_eq!(failure_kind_to_status(InboxFailureKind::Validation), 422);
assert_eq!(failure_kind_to_status(InboxFailureKind::Runtime), 502);
assert_eq!(failure_kind_to_status(InboxFailureKind::Overloaded), 503);
assert_eq!(failure_kind_to_status(InboxFailureKind::Timeout), 504);
assert_eq!(
failure_kind_to_status(InboxFailureKind::OperationBudget),
507
);
assert_eq!(failure_kind_to_status(InboxFailureKind::Platform), 500);
}
}
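
To make the retry policy in `handle_failure` concrete, here is a small std-only sketch of the delay schedule for a row that fails on every attempt. It mirrors `compute_backoff` with jitter left out so the numbers are deterministic. `backoff_ms` and `retry_schedule` are illustrative names, not part of the crate, and the local `BackoffShape` is a stand-in for the real enum.

```rust
/// Stand-in for the dispatcher's backoff enum (illustration only).
#[derive(Clone, Copy, Debug)]
enum BackoffShape {
    Constant,
    Linear,
    Exponential,
}

/// Jitter-free delay in milliseconds for a 1-indexed attempt.
fn backoff_ms(attempt: u32, shape: BackoffShape, base_ms: u32) -> u32 {
    let base = u64::from(base_ms);
    let step = u64::from(attempt.saturating_sub(1));
    let raw = match shape {
        BackoffShape::Constant => base,
        BackoffShape::Linear => base * (step + 1),
        // Exponent capped at 20 so the shift cannot overflow.
        BackoffShape::Exponential => base.saturating_mul(1u64 << step.min(20)),
    };
    // Clamp into u32, matching the return type used by the dispatcher.
    u32::try_from(raw).unwrap_or(u32::MAX)
}

/// Delays between attempts for a row that fails every time.
/// A retry is scheduled only while `attempt < max_attempts`; the
/// failure that reaches `max_attempts` dead-letters instead.
fn retry_schedule(max_attempts: u32, shape: BackoffShape, base_ms: u32) -> Vec<u32> {
    (1..max_attempts)
        .map(|attempt| backoff_ms(attempt, shape, base_ms))
        .collect()
}

fn main() {
    for shape in [
        BackoffShape::Constant,
        BackoffShape::Linear,
        BackoffShape::Exponential,
    ] {
        let delays = retry_schedule(4, shape, 1000);
        println!("{shape:?}: {delays:?}");
    }
}
```

With `max_attempts = 4` the script runs at most four times, separated by three delays. The fourth failure writes the `dead_letters` row rather than scheduling a fifth run.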


@@ -0,0 +1,95 @@
//! Weekly retention sweepers for `dead_letters` + `abandoned_executions`.
//!
//! Both use the `FOR UPDATE SKIP LOCKED` claim pattern so concurrent
//! sweepers (cluster mode v1.3+) don't fight each other. Defaults
//! match design notes §3 / §4: 30 days for DL, 7 days for abandoned.
//! Both env-overridable via `PICLOUD_DEAD_LETTER_RETENTION_DAYS` and
//! `PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS` (loaded by
//! `TriggerConfig::from_env`).
//!
//! Spawned from `build_app` alongside `spawn_session_pruner`.
use std::sync::Arc;
use std::time::Duration;
use chrono::Utc;
use crate::abandoned_repo::AbandonedRepo;
use crate::dead_letter_repo::DeadLetterRepo;
/// Weekly sweep cadence — matches `spawn_session_pruner` shape.
const SWEEP_INTERVAL: Duration = Duration::from_secs(7 * 24 * 60 * 60);
/// Per-call batch cap so we don't try to delete millions of rows in
/// one transaction. Within a sweep, the loop keeps deleting batches
/// until a call removes fewer than `SWEEP_BATCH` rows (or errors).
const SWEEP_BATCH: i64 = 5_000;
pub fn spawn_dead_letter_gc(repo: Arc<dyn DeadLetterRepo>, retention_days: u32) {
tokio::spawn(async move {
let mut ticker = tokio::time::interval(SWEEP_INTERVAL);
// Skip the immediate first fire — don't sweep at process start.
ticker.tick().await;
loop {
ticker.tick().await;
sweep_dead_letters(&*repo, retention_days).await;
}
});
}
pub fn spawn_abandoned_gc(repo: Arc<dyn AbandonedRepo>, retention_days: u32) {
tokio::spawn(async move {
let mut ticker = tokio::time::interval(SWEEP_INTERVAL);
ticker.tick().await;
loop {
ticker.tick().await;
sweep_abandoned(&*repo, retention_days).await;
}
});
}
async fn sweep_dead_letters(repo: &dyn DeadLetterRepo, retention_days: u32) {
let cutoff = Utc::now() - chrono::Duration::days(i64::from(retention_days));
let mut total: u64 = 0;
loop {
match repo.gc(cutoff, SWEEP_BATCH).await {
Ok(0) => break,
Ok(n) => {
total += n;
if n < SWEEP_BATCH as u64 {
break;
}
}
Err(e) => {
tracing::warn!(?e, "dead_letters GC sweep errored");
break;
}
}
}
if total > 0 {
tracing::info!(swept = total, "dead_letters GC swept");
}
}
async fn sweep_abandoned(repo: &dyn AbandonedRepo, retention_days: u32) {
let cutoff = Utc::now() - chrono::Duration::days(i64::from(retention_days));
let mut total: u64 = 0;
loop {
match repo.gc(cutoff, SWEEP_BATCH).await {
Ok(0) => break,
Ok(n) => {
total += n;
if n < SWEEP_BATCH as u64 {
break;
}
}
Err(e) => {
tracing::warn!(?e, "abandoned_executions GC sweep errored");
break;
}
}
}
if total > 0 {
tracing::info!(swept = total, "abandoned_executions GC swept");
}
}
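
The batched sweep loop above can be exercised without Postgres. The sketch below is a synchronous, std-only approximation: `InMemoryRows` is a hypothetical stand-in for a repo whose `gc` deletes up to `batch` expired rows and reports how many it removed, and row ages in days replace the timestamp cutoff.

```rust
/// Hypothetical in-memory stand-in for a retention table.
/// Each entry is the age of one row, in days.
struct InMemoryRows {
    ages_days: Vec<u32>,
}

impl InMemoryRows {
    /// Delete up to `batch` rows older than `retention_days`;
    /// return how many rows were removed.
    fn gc(&mut self, retention_days: u32, batch: usize) -> usize {
        let mut removed = 0usize;
        self.ages_days.retain(|&age| {
            let expired = age > retention_days;
            if expired && removed < batch {
                removed += 1;
                false // drop this row
            } else {
                true // keep this row
            }
        });
        removed
    }
}

/// Run one sweep. Returns `(rows deleted, number of gc calls)`.
fn sweep(rows: &mut InMemoryRows, retention_days: u32, batch: usize) -> (usize, usize) {
    let mut total = 0;
    let mut calls = 0;
    loop {
        calls += 1;
        let n = rows.gc(retention_days, batch);
        total += n;
        // An empty or short batch means no expired rows remain.
        if n == 0 || n < batch {
            break;
        }
    }
    (total, calls)
}

fn main() {
    // Twelve expired rows (45 days old) and three fresh ones (3 days old).
    let mut rows = InMemoryRows {
        ages_days: (0..15).map(|i| if i < 12 { 45 } else { 3 }).collect(),
    };
    let (deleted, calls) = sweep(&mut rows, 30, 5);
    println!(
        "deleted {deleted} rows in {calls} gc calls; {} remain",
        rows.ages_days.len()
    );
}
```

When the number of expired rows is an exact multiple of the batch size, the loop needs one extra call that returns zero before it stops.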


@@ -0,0 +1,223 @@
//! Low-level Postgres CRUD over `kv_entries`. Stays storage-only;
//! authorization, event emission, and empty-collection validation live
//! one layer up in `KvServiceImpl`.
use async_trait::async_trait;
use base64::engine::general_purpose::URL_SAFE_NO_PAD;
use base64::Engine as _;
use picloud_shared::{AppId, KvListPage};
use sqlx::PgPool;
#[derive(Debug, thiserror::Error)]
pub enum KvRepoError {
#[error("database error: {0}")]
Db(#[from] sqlx::Error),
#[error("invalid pagination cursor")]
InvalidCursor,
}
/// Repo surface. The trait is exposed so tests can substitute an
/// in-memory backing without spinning up Postgres.
#[async_trait]
pub trait KvRepo: Send + Sync {
async fn get(
&self,
app_id: AppId,
collection: &str,
key: &str,
) -> Result<Option<serde_json::Value>, KvRepoError>;
/// Upserts the row. Returns the previous value (if any) so callers
/// can determine whether this was an `insert` or an `update` for
/// the emitted `ServiceEvent`.
async fn set(
&self,
app_id: AppId,
collection: &str,
key: &str,
value: serde_json::Value,
) -> Result<Option<serde_json::Value>, KvRepoError>;
/// Returns the deleted value if present, `None` if the row didn't
/// exist. The caller derives the SDK's `bool` return value from
/// whether this is `Some`, and passes the value itself as the
/// `old_payload` field of the emitted delete event.
async fn delete(
&self,
app_id: AppId,
collection: &str,
key: &str,
) -> Result<Option<serde_json::Value>, KvRepoError>;
async fn has(&self, app_id: AppId, collection: &str, key: &str) -> Result<bool, KvRepoError>;
async fn list(
&self,
app_id: AppId,
collection: &str,
cursor: Option<&str>,
limit: u32,
) -> Result<KvListPage, KvRepoError>;
}
pub struct PostgresKvRepo {
pool: PgPool,
}
impl PostgresKvRepo {
#[must_use]
pub fn new(pool: PgPool) -> Self {
Self { pool }
}
}
/// Hard ceiling on `list` page size — scripts that pass anything larger
/// silently get clamped to this. Cursor-style pagination keeps a single
/// request bounded; clients fetch the next page via the returned cursor.
const KV_LIST_MAX_LIMIT: u32 = 1_000;
/// Page size used when the caller passes `limit == 0`.
const KV_LIST_DEFAULT_LIMIT: u32 = 100;
#[async_trait]
impl KvRepo for PostgresKvRepo {
async fn get(
&self,
app_id: AppId,
collection: &str,
key: &str,
) -> Result<Option<serde_json::Value>, KvRepoError> {
let row: Option<(serde_json::Value,)> = sqlx::query_as(
"SELECT value FROM kv_entries \
WHERE app_id = $1 AND collection = $2 AND key = $3",
)
.bind(app_id.into_inner())
.bind(collection)
.bind(key)
.fetch_optional(&self.pool)
.await?;
Ok(row.map(|(v,)| v))
}
async fn set(
&self,
app_id: AppId,
collection: &str,
key: &str,
value: serde_json::Value,
) -> Result<Option<serde_json::Value>, KvRepoError> {
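// `RETURNING` after `ON CONFLICT DO UPDATE` only sees the new row,
// so the prior value is captured by a separate `prev` CTE. `prev`
// reads the statement's snapshot, i.e. the row as it was before
// this upsert, which tells callers whether this was an insert or
// an update. A row committed by a concurrent writer after the
// snapshot may not be visible to `prev`.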
let row: Option<(Option<serde_json::Value>,)> = sqlx::query_as(
"WITH prev AS (\
SELECT value FROM kv_entries \
WHERE app_id = $1 AND collection = $2 AND key = $3\
), \
upserted AS (\
INSERT INTO kv_entries (app_id, collection, key, value) \
VALUES ($1, $2, $3, $4) \
ON CONFLICT (app_id, collection, key) DO UPDATE \
SET value = EXCLUDED.value, updated_at = NOW() \
RETURNING 1\
) \
SELECT (SELECT value FROM prev) FROM upserted",
)
.bind(app_id.into_inner())
.bind(collection)
.bind(key)
.bind(value)
.fetch_optional(&self.pool)
.await?;
Ok(row.and_then(|(v,)| v))
}
async fn delete(
&self,
app_id: AppId,
collection: &str,
key: &str,
) -> Result<Option<serde_json::Value>, KvRepoError> {
let row: Option<(serde_json::Value,)> = sqlx::query_as(
"DELETE FROM kv_entries \
WHERE app_id = $1 AND collection = $2 AND key = $3 \
RETURNING value",
)
.bind(app_id.into_inner())
.bind(collection)
.bind(key)
.fetch_optional(&self.pool)
.await?;
Ok(row.map(|(v,)| v))
}
async fn has(&self, app_id: AppId, collection: &str, key: &str) -> Result<bool, KvRepoError> {
// `SELECT 1` yields a Postgres `int4`, which decodes as `i32`.
let row: Option<(i32,)> = sqlx::query_as(
"SELECT 1 FROM kv_entries \
WHERE app_id = $1 AND collection = $2 AND key = $3",
)
.bind(app_id.into_inner())
.bind(collection)
.bind(key)
.fetch_optional(&self.pool)
.await?;
Ok(row.is_some())
}
async fn list(
&self,
app_id: AppId,
collection: &str,
cursor: Option<&str>,
limit: u32,
) -> Result<KvListPage, KvRepoError> {
let limit = if limit == 0 {
KV_LIST_DEFAULT_LIMIT
} else {
limit.min(KV_LIST_MAX_LIMIT)
};
let last_key = match cursor {
Some(c) => Some(decode_cursor(c)?),
None => None,
};
// Keyset pagination: rows beyond `last_key` ordered by key.
// `+1` to detect a "more pages" condition without a separate
// COUNT query.
let take = i64::from(limit) + 1;
let rows: Vec<(String,)> = sqlx::query_as(
"SELECT key FROM kv_entries \
WHERE app_id = $1 AND collection = $2 \
AND ($3::text IS NULL OR key > $3) \
ORDER BY key ASC \
LIMIT $4",
)
.bind(app_id.into_inner())
.bind(collection)
.bind(last_key.as_deref())
.bind(take)
.fetch_all(&self.pool)
.await?;
let mut keys: Vec<String> = rows.into_iter().map(|(k,)| k).collect();
let next_cursor = if keys.len() > limit as usize {
keys.truncate(limit as usize);
keys.last().map(|k| encode_cursor(k))
} else {
None
};
Ok(KvListPage { keys, next_cursor })
}
}
fn encode_cursor(last_key: &str) -> String {
URL_SAFE_NO_PAD.encode(last_key.as_bytes())
}
fn decode_cursor(cursor: &str) -> Result<String, KvRepoError> {
let bytes = URL_SAFE_NO_PAD
.decode(cursor)
.map_err(|_| KvRepoError::InvalidCursor)?;
String::from_utf8(bytes).map_err(|_| KvRepoError::InvalidCursor)
}
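
The keyset pagination in `list` (fetch `limit + 1` rows, then use the extra row only to decide whether a next cursor exists) can be sketched over an in-memory set. This is an illustrative approximation, not the repo's code: `list_page` and `Page` are hypothetical names, and the cursor here is the raw last key, whereas the real cursor is URL-safe base64 of that key.

```rust
use std::collections::BTreeSet;

/// One page of keys plus the cursor for the next page, if any.
#[derive(Debug, PartialEq)]
struct Page {
    keys: Vec<String>,
    next_cursor: Option<String>,
}

/// Return up to `limit` keys strictly greater than `cursor`, in order.
fn list_page(keys: &BTreeSet<String>, cursor: Option<&str>, limit: usize) -> Page {
    // Take one extra key as a lookahead: its presence signals
    // "more pages" without a separate count.
    let mut page: Vec<String> = keys
        .iter()
        .filter(|k| cursor.map_or(true, |c| k.as_str() > c))
        .take(limit + 1)
        .cloned()
        .collect();
    let next_cursor = if page.len() > limit {
        page.truncate(limit);
        // The cursor is the last key actually returned, not the lookahead.
        page.last().cloned()
    } else {
        None
    };
    Page { keys: page, next_cursor }
}

fn main() {
    let keys: BTreeSet<String> = ["a", "b", "c", "d", "e"]
        .iter()
        .map(|k| k.to_string())
        .collect();
    let mut cursor: Option<String> = None;
    loop {
        let page = list_page(&keys, cursor.as_deref(), 2);
        println!("{:?}", page.keys);
        match page.next_cursor {
            Some(next) => cursor = Some(next),
            None => break,
        }
    }
}
```

Because the cursor encodes a key rather than an offset, rows inserted or deleted between requests cannot shift the page boundaries; each page simply resumes after the last key seen.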


@@ -0,0 +1,525 @@
//! `KvServiceImpl` — wires the `KvRepo` underneath the
//! `picloud_shared::KvService` trait that scripts see via the Rhai
//! bridge.
//!
//! Layers added here (vs the raw repo):
//!
//! 1. Empty-collection rejection at the SDK boundary
//! (`docs/sdk-shape.md`).
//! 2. **Script-as-gate authz**: when `cx.principal.is_some()` we run
//! `authz::require(...)`; when it's `None` (public unauthenticated
//! HTTP — the common case for public routes) we skip the check.
//! Cross-app isolation isn't affected — every query is keyed by
//! `cx.app_id`, never an argument.
//! 3. `ServiceEvent` emission after each mutation (`insert` / `update`
//! / `delete`). v1.1.0 ships a `NoopEventEmitter` so this is a
//! no-op until the outbox emitter lands later in v1.1.1.
use std::sync::Arc;
use async_trait::async_trait;
use picloud_shared::{
KvError, KvListPage, KvService, SdkCallCx, ServiceEvent, ServiceEventEmitter,
};
use crate::authz::{self, AuthzRepo, Capability};
use crate::kv_repo::{KvRepo, KvRepoError};
pub struct KvServiceImpl {
repo: Arc<dyn KvRepo>,
authz: Arc<dyn AuthzRepo>,
events: Arc<dyn ServiceEventEmitter>,
}
impl KvServiceImpl {
#[must_use]
pub fn new(
repo: Arc<dyn KvRepo>,
authz: Arc<dyn AuthzRepo>,
events: Arc<dyn ServiceEventEmitter>,
) -> Self {
Self {
repo,
authz,
events,
}
}
async fn check_read(&self, cx: &SdkCallCx) -> Result<(), KvError> {
if let Some(ref principal) = cx.principal {
authz::require(&*self.authz, principal, Capability::AppKvRead(cx.app_id))
.await
.map_err(|_| KvError::Forbidden)?;
}
Ok(())
}
async fn check_write(&self, cx: &SdkCallCx) -> Result<(), KvError> {
if let Some(ref principal) = cx.principal {
authz::require(&*self.authz, principal, Capability::AppKvWrite(cx.app_id))
.await
.map_err(|_| KvError::Forbidden)?;
}
Ok(())
}
}
fn validate_collection(collection: &str) -> Result<(), KvError> {
if collection.is_empty() {
return Err(KvError::InvalidCollection);
}
Ok(())
}
impl From<KvRepoError> for KvError {
fn from(e: KvRepoError) -> Self {
Self::Backend(e.to_string())
}
}
#[async_trait]
impl KvService for KvServiceImpl {
async fn get(
&self,
cx: &SdkCallCx,
collection: &str,
key: &str,
) -> Result<Option<serde_json::Value>, KvError> {
validate_collection(collection)?;
self.check_read(cx).await?;
Ok(self.repo.get(cx.app_id, collection, key).await?)
}
async fn set(
&self,
cx: &SdkCallCx,
collection: &str,
key: &str,
value: serde_json::Value,
) -> Result<(), KvError> {
validate_collection(collection)?;
self.check_write(cx).await?;
let previous = self
.repo
.set(cx.app_id, collection, key, value.clone())
.await?;
let op = if previous.is_some() {
"update"
} else {
"insert"
};
// Emit unconditionally; the noop emitter drops it, the outbox
// emitter persists it. Best-effort: a failed emit is logged
// but does not roll back the write.
if let Err(e) = self
.events
.emit(
cx,
ServiceEvent {
source: "kv",
op,
collection: Some(collection.to_string()),
key: Some(key.to_string()),
payload: Some(value),
old_payload: previous,
},
)
.await
{
tracing::warn!(error = %e, source = "kv", op, "event emit failed");
}
Ok(())
}
async fn delete(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Result<bool, KvError> {
validate_collection(collection)?;
self.check_write(cx).await?;
let previous = self.repo.delete(cx.app_id, collection, key).await?;
let was_present = previous.is_some();
if was_present {
if let Err(e) = self
.events
.emit(
cx,
ServiceEvent {
source: "kv",
op: "delete",
collection: Some(collection.to_string()),
key: Some(key.to_string()),
payload: None,
old_payload: previous,
},
)
.await
{
tracing::warn!(error = %e, source = "kv", op = "delete", "event emit failed");
}
}
Ok(was_present)
}
async fn has(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Result<bool, KvError> {
validate_collection(collection)?;
self.check_read(cx).await?;
Ok(self.repo.has(cx.app_id, collection, key).await?)
}
async fn list(
&self,
cx: &SdkCallCx,
collection: &str,
cursor: Option<&str>,
limit: u32,
) -> Result<KvListPage, KvError> {
validate_collection(collection)?;
self.check_read(cx).await?;
Ok(self.repo.list(cx.app_id, collection, cursor, limit).await?)
}
}
// ----------------------------------------------------------------------------
// Tests — in-memory KvRepo so unit tests don't need Postgres.
// ----------------------------------------------------------------------------
#[cfg(test)]
mod tests {
use super::*;
use crate::authz::{AuthzError, AuthzRepo};
use async_trait::async_trait;
use picloud_shared::{
AdminUserId, AppId, AppRole, ExecutionId, InstanceRole, NoopEventEmitter, Principal,
RequestId, UserId,
};
use std::collections::{BTreeMap, HashMap};
use tokio::sync::Mutex;
#[derive(Default)]
struct InMemoryKvRepo {
data: Mutex<BTreeMap<(AppId, String, String), serde_json::Value>>,
}
#[async_trait]
impl KvRepo for InMemoryKvRepo {
async fn get(
&self,
app_id: AppId,
collection: &str,
key: &str,
) -> Result<Option<serde_json::Value>, KvRepoError> {
Ok(self
.data
.lock()
.await
.get(&(app_id, collection.to_string(), key.to_string()))
.cloned())
}
async fn set(
&self,
app_id: AppId,
collection: &str,
key: &str,
value: serde_json::Value,
) -> Result<Option<serde_json::Value>, KvRepoError> {
Ok(self
.data
.lock()
.await
.insert((app_id, collection.to_string(), key.to_string()), value))
}
async fn delete(
&self,
app_id: AppId,
collection: &str,
key: &str,
) -> Result<Option<serde_json::Value>, KvRepoError> {
Ok(self
.data
.lock()
.await
.remove(&(app_id, collection.to_string(), key.to_string())))
}
async fn has(
&self,
app_id: AppId,
collection: &str,
key: &str,
) -> Result<bool, KvRepoError> {
Ok(self.data.lock().await.contains_key(&(
app_id,
collection.to_string(),
key.to_string(),
)))
}
async fn list(
&self,
app_id: AppId,
collection: &str,
cursor: Option<&str>,
limit: u32,
) -> Result<KvListPage, KvRepoError> {
let data = self.data.lock().await;
let last_key = cursor.map(std::string::ToString::to_string);
let mut keys: Vec<String> = data
.iter()
.filter(|((a, c, _), _)| *a == app_id && c == collection)
.map(|((_, _, k), _)| k.clone())
.filter(|k| last_key.as_ref().is_none_or(|lk| k > lk))
.collect();
keys.sort();
let take = (limit as usize).max(1);
let next_cursor = if keys.len() > take {
keys.truncate(take);
keys.last().cloned()
} else {
None
};
Ok(KvListPage { keys, next_cursor })
}
}
/// AuthzRepo that reports no app membership for anyone. Used to
/// confirm that the service denies when `cx.principal` is `Some`
/// (and holds no instance-level grant), and that it does NOT call
/// into authz when `cx.principal` is `None`.
#[derive(Default)]
struct DenyingAuthzRepo;
#[async_trait]
impl AuthzRepo for DenyingAuthzRepo {
async fn membership(
&self,
_user_id: UserId,
_app_id: AppId,
) -> Result<Option<AppRole>, AuthzError> {
Ok(None)
}
}
fn anon_cx(app_id: AppId) -> SdkCallCx {
SdkCallCx {
app_id,
principal: None,
execution_id: ExecutionId::new(),
request_id: RequestId::new(),
trigger_depth: 0,
root_execution_id: ExecutionId::new(),
is_dead_letter_handler: false,
event: None,
}
}
fn owner_cx(app_id: AppId) -> SdkCallCx {
SdkCallCx {
app_id,
principal: Some(Principal {
user_id: AdminUserId::new(),
instance_role: InstanceRole::Owner,
scopes: None,
app_binding: None,
}),
execution_id: ExecutionId::new(),
request_id: RequestId::new(),
trigger_depth: 0,
root_execution_id: ExecutionId::new(),
is_dead_letter_handler: false,
event: None,
}
}
fn member_no_role_cx(app_id: AppId) -> SdkCallCx {
SdkCallCx {
app_id,
principal: Some(Principal {
user_id: AdminUserId::new(),
instance_role: InstanceRole::Member,
scopes: None,
app_binding: None,
}),
execution_id: ExecutionId::new(),
request_id: RequestId::new(),
trigger_depth: 0,
root_execution_id: ExecutionId::new(),
is_dead_letter_handler: false,
event: None,
}
}
fn svc() -> KvServiceImpl {
KvServiceImpl::new(
Arc::new(InMemoryKvRepo::default()),
Arc::new(DenyingAuthzRepo),
Arc::new(NoopEventEmitter),
)
}
#[tokio::test]
async fn set_then_get_round_trips() {
let kv = svc();
let cx = anon_cx(AppId::new());
kv.set(&cx, "widgets", "k1", serde_json::json!({"n": 1}))
.await
.unwrap();
let v = kv.get(&cx, "widgets", "k1").await.unwrap();
assert_eq!(v, Some(serde_json::json!({"n": 1})));
}
#[tokio::test]
async fn get_missing_returns_none() {
let kv = svc();
let cx = anon_cx(AppId::new());
let v = kv.get(&cx, "widgets", "nope").await.unwrap();
assert_eq!(v, None);
}
#[tokio::test]
async fn has_returns_bool() {
let kv = svc();
let cx = anon_cx(AppId::new());
assert!(!kv.has(&cx, "widgets", "k1").await.unwrap());
kv.set(&cx, "widgets", "k1", serde_json::json!(true))
.await
.unwrap();
assert!(kv.has(&cx, "widgets", "k1").await.unwrap());
}
#[tokio::test]
async fn delete_returns_was_present() {
let kv = svc();
let cx = anon_cx(AppId::new());
assert!(!kv.delete(&cx, "widgets", "missing").await.unwrap());
kv.set(&cx, "widgets", "k1", serde_json::json!(1))
.await
.unwrap();
assert!(kv.delete(&cx, "widgets", "k1").await.unwrap());
// Idempotent — second delete returns false.
assert!(!kv.delete(&cx, "widgets", "k1").await.unwrap());
}
#[tokio::test]
async fn empty_collection_rejected() {
let kv = svc();
let cx = anon_cx(AppId::new());
let err = kv.get(&cx, "", "k1").await.unwrap_err();
assert!(matches!(err, KvError::InvalidCollection));
}
/// Load-bearing: a script with `cx.app_id = A` must NOT see
/// entries inserted under `cx.app_id = B`. This is the cross-app
/// isolation boundary; getting this wrong is a security
/// vulnerability.
#[tokio::test]
async fn cross_app_isolation_via_cx_app_id() {
let kv = svc();
let app_a = AppId::new();
let app_b = AppId::new();
let cx_a = anon_cx(app_a);
let cx_b = anon_cx(app_b);
kv.set(&cx_a, "shared", "k", serde_json::json!("from-a"))
.await
.unwrap();
kv.set(&cx_b, "shared", "k", serde_json::json!("from-b"))
.await
.unwrap();
assert_eq!(
kv.get(&cx_a, "shared", "k").await.unwrap(),
Some(serde_json::json!("from-a"))
);
assert_eq!(
kv.get(&cx_b, "shared", "k").await.unwrap(),
Some(serde_json::json!("from-b"))
);
}
/// Script-as-gate: an `anon_cx` (principal = None) skips the
/// capability check entirely. Even with a denying authz repo,
/// the write succeeds.
#[tokio::test]
async fn anonymous_cx_skips_authz() {
let kv = svc();
let cx = anon_cx(AppId::new());
kv.set(&cx, "widgets", "k", serde_json::json!(1))
.await
.unwrap();
// No panic, no Forbidden.
}
/// Authenticated principal with no role on the app: the
/// `DenyingAuthzRepo` returns no membership, so the capability
/// check denies. Set must surface KvError::Forbidden.
#[tokio::test]
async fn authed_cx_with_no_role_is_forbidden() {
let kv = svc();
let cx = member_no_role_cx(AppId::new());
let err = kv
.set(&cx, "widgets", "k", serde_json::json!(1))
.await
.unwrap_err();
assert!(matches!(err, KvError::Forbidden));
}
/// Owner principal: instance-role grants kick in inside `authz::can`
/// (Owner -> implicit AppAdmin, which covers `AppKvWrite`).
#[tokio::test]
async fn owner_principal_can_write() {
let kv = svc();
let cx = owner_cx(AppId::new());
kv.set(&cx, "widgets", "k", serde_json::json!(1))
.await
.unwrap();
}
#[tokio::test]
async fn list_cursor_pagination() {
let kv = svc();
let cx = anon_cx(AppId::new());
for i in 0..5 {
kv.set(
&cx,
"widgets",
&format!("k{i:02}"),
serde_json::json!({"i": i}),
)
.await
.unwrap();
}
// page 1 — 2 keys
let p1 = kv.list(&cx, "widgets", None, 2).await.unwrap();
assert_eq!(p1.keys, vec!["k00".to_string(), "k01".to_string()]);
assert!(p1.next_cursor.is_some());
// page 2 — 2 keys
let p2 = kv
.list(&cx, "widgets", p1.next_cursor.as_deref(), 2)
.await
.unwrap();
assert_eq!(p2.keys, vec!["k02".to_string(), "k03".to_string()]);
// final page — 1 key, no cursor
let p3 = kv
.list(&cx, "widgets", p2.next_cursor.as_deref(), 2)
.await
.unwrap();
assert_eq!(p3.keys, vec!["k04".to_string()]);
assert!(p3.next_cursor.is_none());
}
/// Pinning the v1.1.0 contract: services hold the emitter as a
/// dyn Arc and call `emit().await` unconditionally. This test
/// proves the call site doesn't blow up against the noop impl —
/// the outbox emitter (v1.1.1) drops in transparently.
#[tokio::test]
async fn noop_emitter_does_not_block_mutations() {
let kv = svc();
let cx = anon_cx(AppId::new());
kv.set(&cx, "widgets", "k", serde_json::json!(1))
.await
.unwrap();
kv.delete(&cx, "widgets", "k").await.unwrap();
// Reaching here means emit() returned Ok and didn't panic.
// Suppress unused-import warning when run alone:
let _ = HashMap::<String, String>::new();
}
}
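
Reviewer sketch of the emission rule in `set` / `delete` above: the previous value returned by the store decides between `insert` and `update`, and `delete` emits only when the key existed. `RecordingKv` and `Event` are hypothetical std-only stand-ins, not types from this branch; the real service is async and goes through `KvRepo` and `ServiceEventEmitter`.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `ServiceEvent` (assumption: string payloads).
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub op: &'static str,
    pub key: String,
    pub payload: Option<String>,
    pub old_payload: Option<String>,
}

/// Hypothetical in-memory store that records every emitted event.
#[derive(Default)]
pub struct RecordingKv {
    data: HashMap<String, String>,
    pub events: Vec<Event>,
}

impl RecordingKv {
    /// `HashMap::insert` returns the previous value, which decides
    /// whether the mutation is reported as "insert" or "update".
    pub fn set(&mut self, key: &str, value: &str) {
        let previous = self.data.insert(key.to_string(), value.to_string());
        let op = if previous.is_some() { "update" } else { "insert" };
        self.events.push(Event {
            op,
            key: key.to_string(),
            payload: Some(value.to_string()),
            old_payload: previous,
        });
    }

    /// Returns whether the key existed; emits "delete" only when it did.
    pub fn delete(&mut self, key: &str) -> bool {
        let previous = self.data.remove(key);
        let was_present = previous.is_some();
        if was_present {
            self.events.push(Event {
                op: "delete",
                key: key.to_string(),
                payload: None,
                old_payload: previous,
            });
        }
        was_present
    }
}
```

Setting the same key twice and then deleting it twice should record exactly three events (`insert`, `update`, `delete`); the second delete returns `false` and records nothing.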


@@ -4,6 +4,7 @@
 //! the same DB for now; once we add caching and per-node ingress, the
 //! manager will publish change events.
+pub mod abandoned_repo;
 pub mod admin_session_repo;
 pub mod admin_user_repo;
 pub mod admin_users_api;
@@ -21,14 +22,30 @@ pub mod auth_api;
 pub mod auth_bootstrap;
 pub mod auth_middleware;
 pub mod authz;
+pub mod dead_letter_repo;
+pub mod dead_letter_service;
+pub mod dead_letters_api;
+pub mod dispatcher;
+pub mod gc;
+pub mod kv_repo;
+pub mod kv_service;
 pub mod log_sink;
 pub mod migrations;
+pub mod outbox_event_emitter;
+pub mod outbox_repo;
+pub mod principal_resolver;
 pub mod repo;
 pub mod route_admin;
 pub mod route_repo;
 pub mod sandbox;
 pub mod scheduler;
+pub mod trigger_config;
+pub mod trigger_repo;
+pub mod triggers_api;
+pub use abandoned_repo::{
+AbandonedRepo, AbandonedRepoError, NewAbandonedExecution, PostgresAbandonedRepo,
+};
 pub use admin_session_repo::{
 AdminSessionLookup, AdminSessionRepository, AdminSessionRepositoryError,
 PostgresAdminSessionRepository,
@@ -63,7 +80,21 @@ pub use auth_middleware::{
 API_KEY_PREFIX, API_KEY_PREFIX_LEN, SESSION_COOKIE,
 };
 pub use authz::{can, require, AuthzDenied, AuthzError, AuthzRepo, Capability, Decision};
+pub use dead_letter_repo::{
+DeadLetterRepo, DeadLetterRepoError, DeadLetterRow, NewDeadLetter, PostgresDeadLetterRepo,
+};
+pub use dead_letter_service::PostgresDeadLetterService;
+pub use dead_letters_api::{dead_letters_router, DeadLettersApiError, DeadLettersState};
+pub use dispatcher::{compute_backoff, Dispatcher, DispatcherError};
+pub use gc::{spawn_abandoned_gc, spawn_dead_letter_gc};
+pub use kv_repo::{KvRepo, KvRepoError, PostgresKvRepo};
+pub use kv_service::KvServiceImpl;
 pub use log_sink::PostgresExecutionLogSink;
+pub use outbox_event_emitter::OutboxEventEmitter;
+pub use outbox_repo::{
+NewOutboxRow, OutboxRepo, OutboxRepoError, OutboxRow, OutboxSourceKind, PostgresOutboxRepo,
+};
+pub use principal_resolver::{AdminPrincipalResolver, PrincipalResolver, PrincipalResolverError};
 pub use repo::{
 ExecutionLogRepository, NewScript, PostgresExecutionLogRepository, PostgresScriptRepository,
 RepoResolver, ScriptPatch, ScriptRepository, ScriptRepositoryError,
@@ -71,3 +102,10 @@ pub use repo::{
 pub use route_admin::{compile_routes, route_admin_router, RouteAdminState};
 pub use route_repo::{NewRoute, PostgresRouteRepository, RouteRepository};
 pub use sandbox::{CeilingError, SandboxCeiling};
+pub use trigger_config::{BackoffShape, TriggerConfig};
+pub use trigger_repo::{
+collection_matches, CreateDeadLetterTrigger, CreateKvTrigger, DeadLetterTriggerMatch,
+KvTriggerMatch, PostgresTriggerRepo, Trigger, TriggerDetails, TriggerDispatchMode, TriggerKind,
+TriggerRepo, TriggerRepoError,
+};
+pub use triggers_api::{triggers_router, TriggersApiError, TriggersState};


@@ -0,0 +1,103 @@
//! `OutboxEventEmitter` — the real `ServiceEventEmitter` that replaces
//! v1.1.0's `NoopEventEmitter` once the triggers framework lands.
//!
//! On each `emit` (a KV mutation, future doc/file/pubsub event, etc.):
//! 1. Look up matching triggers for the event's (app_id, source, op,
//! collection) tuple via `TriggerRepo::list_matching_*`.
//! 2. For each match, write one outbox row carrying the event payload
//! serialized as a `TriggerEvent`.
//!
//! Defaults are applied at write time so that `OutboxRow.payload`
//! carries everything the dispatcher needs to reconstruct the executor
//! invocation without joining back to the trigger row.
//!
//! Non-KV `ServiceEvent` sources are silently dropped in v1.1.1 — the
//! dispatcher only knows how to fire KV triggers this release. Future
//! sources (docs/files/pubsub) add their own dispatch arm.
use std::sync::Arc;
use async_trait::async_trait;
use picloud_shared::{
EmitError, KvEventOp, SdkCallCx, ServiceEvent, ServiceEventEmitter, TriggerEvent,
};
use crate::outbox_repo::{NewOutboxRow, OutboxRepo, OutboxSourceKind};
use crate::trigger_repo::TriggerRepo;
pub struct OutboxEventEmitter {
triggers: Arc<dyn TriggerRepo>,
outbox: Arc<dyn OutboxRepo>,
}
impl OutboxEventEmitter {
#[must_use]
pub fn new(triggers: Arc<dyn TriggerRepo>, outbox: Arc<dyn OutboxRepo>) -> Self {
Self { triggers, outbox }
}
}
#[async_trait]
impl ServiceEventEmitter for OutboxEventEmitter {
async fn emit(&self, cx: &SdkCallCx, event: ServiceEvent) -> Result<(), EmitError> {
match event.source {
"kv" => self.emit_kv(cx, event).await,
// Future sources land here. For now, silently drop — the
// SDK calls `events.emit(...)` unconditionally for forward
// compat, so swallowing without an error is correct.
_ => Ok(()),
}
}
}
impl OutboxEventEmitter {
async fn emit_kv(&self, cx: &SdkCallCx, event: ServiceEvent) -> Result<(), EmitError> {
let Some(op) = KvEventOp::from_wire(event.op) else {
return Ok(()); // unknown op — drop quietly
};
let Some(collection) = event.collection.clone() else {
return Ok(()); // KV events always carry a collection — defensively skip
};
let key = event.key.clone().unwrap_or_default();
let matches = self
.triggers
.list_matching_kv(cx.app_id, &collection, op)
.await
.map_err(|e| EmitError::Unavailable(format!("trigger lookup: {e}")))?;
if matches.is_empty() {
return Ok(());
}
// Serialize the originating event as a TriggerEvent so the
// dispatcher can hand it to the script as `ctx.event` without
// round-tripping back to the trigger row.
let trigger_event = TriggerEvent::Kv {
op,
collection,
key,
value: event.payload.clone(),
};
let payload = serde_json::to_value(&trigger_event)
.map_err(|e| EmitError::Rejected(format!("event serialize: {e}")))?;
for m in matches {
self.outbox
.insert(NewOutboxRow {
app_id: cx.app_id,
source_kind: OutboxSourceKind::Kv,
trigger_id: Some(m.trigger_id),
script_id: Some(m.script_id),
reply_to: None,
payload: payload.clone(),
origin_principal: cx.principal.as_ref().map(|p| p.user_id),
trigger_depth: cx.trigger_depth.saturating_add(1),
root_execution_id: Some(cx.root_execution_id),
})
.await
.map_err(|e| EmitError::Unavailable(format!("outbox insert: {e}")))?;
}
Ok(())
}
}
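
Reviewer sketch of the fan-out in `emit_kv` above: one outbox row per matching trigger, every row sharing the same serialized payload and carrying the caller's depth plus one. `TriggerMatch`, `OutboxRowSketch`, and `fan_out` are hypothetical std-only stand-ins with integer ids and a string payload; the real code writes `NewOutboxRow` values through `OutboxRepo::insert`.

```rust
/// Simplified stand-in for a trigger match (assumption: integer ids).
#[derive(Debug, Clone, PartialEq)]
pub struct TriggerMatch {
    pub trigger_id: u32,
    pub script_id: u32,
}

/// Simplified stand-in for the row written to the outbox.
#[derive(Debug, Clone, PartialEq)]
pub struct OutboxRowSketch {
    pub trigger_id: u32,
    pub script_id: u32,
    pub payload: String,
    pub trigger_depth: u32,
}

/// Build one row per match. An empty match list produces no rows,
/// which mirrors the early return in `emit_kv`.
pub fn fan_out(
    matches: &[TriggerMatch],
    payload: &str,
    caller_depth: u32,
) -> Vec<OutboxRowSketch> {
    matches
        .iter()
        .map(|m| OutboxRowSketch {
            trigger_id: m.trigger_id,
            script_id: m.script_id,
            payload: payload.to_string(),
            // saturating_add keeps an extreme depth from wrapping to 0.
            trigger_depth: caller_depth.saturating_add(1),
        })
        .collect()
}
```

Because each row carries its own copy of the payload and the incremented depth, the dispatcher can enforce the depth limit per row without consulting the execution that produced it.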


@@ -0,0 +1,258 @@
//! `OutboxRepo` — universal trigger outbox CRUD. Hot writes come from
//! the `OutboxEventEmitter` (KV mutations fan out via this) and the
//! sync-HTTP path. Hot reads come from the dispatcher, which claims
//! due rows via `FOR UPDATE SKIP LOCKED`.
use async_trait::async_trait;
use chrono::{DateTime, Utc};
use picloud_shared::{
AdminUserId, AppId, ExecutionId, NewHttpOutbox, OutboxWriter, OutboxWriterError, ScriptId,
TriggerId,
};
use sqlx::PgPool;
use uuid::Uuid;
#[derive(Debug, thiserror::Error)]
pub enum OutboxRepoError {
#[error("database error: {0}")]
Db(#[from] sqlx::Error),
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutboxSourceKind {
Http,
Kv,
DeadLetter,
}
impl OutboxSourceKind {
#[must_use]
pub const fn as_str(self) -> &'static str {
match self {
Self::Http => "http",
Self::Kv => "kv",
Self::DeadLetter => "dead_letter",
}
}
#[must_use]
pub fn from_wire(s: &str) -> Option<Self> {
match s {
"http" => Some(Self::Http),
"kv" => Some(Self::Kv),
"dead_letter" => Some(Self::DeadLetter),
_ => None,
}
}
}
/// Insert payload — what each event source writes when fanning out
/// to the outbox. `payload` is the serialized `TriggerEvent` (plus
/// any extra context the dispatcher needs to reconstruct an
/// `ExecRequest`).
#[derive(Debug, Clone)]
pub struct NewOutboxRow {
pub app_id: AppId,
pub source_kind: OutboxSourceKind,
pub trigger_id: Option<TriggerId>,
pub script_id: Option<ScriptId>,
pub reply_to: Option<Uuid>,
pub payload: serde_json::Value,
pub origin_principal: Option<AdminUserId>,
pub trigger_depth: u32,
pub root_execution_id: Option<ExecutionId>,
}
/// Row as the dispatcher sees it after a claim.
#[derive(Debug, Clone)]
pub struct OutboxRow {
pub id: Uuid,
pub app_id: AppId,
pub source_kind: OutboxSourceKind,
pub trigger_id: Option<TriggerId>,
pub script_id: Option<ScriptId>,
pub reply_to: Option<Uuid>,
pub payload: serde_json::Value,
pub origin_principal: Option<AdminUserId>,
pub trigger_depth: u32,
pub root_execution_id: Option<ExecutionId>,
pub attempt_count: u32,
pub next_attempt_at: DateTime<Utc>,
pub created_at: DateTime<Utc>,
}
#[async_trait]
pub trait OutboxRepo: Send + Sync {
async fn insert(&self, row: NewOutboxRow) -> Result<Uuid, OutboxRepoError>;
/// Claim up to `limit` due rows. The claim runs as a single
/// `UPDATE` over a `FOR UPDATE SKIP LOCKED` selection, so two
/// concurrent dispatchers (cluster mode) can't double-pick a row.
/// Returns an empty Vec when nothing is due.
async fn claim_due(
&self,
claimed_by: &str,
limit: i64,
) -> Result<Vec<OutboxRow>, OutboxRepoError>;
/// Remove a row after a terminal outcome (success or dead-letter).
async fn delete(&self, id: Uuid) -> Result<(), OutboxRepoError>;
/// Failure path: bump attempt_count, clear the claim, set the
/// next attempt time. The dispatcher computes the delay (with
/// backoff + jitter) and passes it in.
async fn reschedule(
&self,
id: Uuid,
attempt_count: u32,
next_attempt_at: DateTime<Utc>,
) -> Result<(), OutboxRepoError>;
}
pub struct PostgresOutboxRepo {
pool: PgPool,
}
impl PostgresOutboxRepo {
#[must_use]
pub fn new(pool: PgPool) -> Self {
Self { pool }
}
}
#[async_trait]
impl OutboxRepo for PostgresOutboxRepo {
async fn insert(&self, row: NewOutboxRow) -> Result<Uuid, OutboxRepoError> {
let (id,): (Uuid,) = sqlx::query_as(
"INSERT INTO outbox ( \
app_id, source_kind, trigger_id, script_id, reply_to, \
payload, origin_principal, trigger_depth, root_execution_id \
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9) \
RETURNING id",
)
.bind(row.app_id.into_inner())
.bind(row.source_kind.as_str())
.bind(row.trigger_id.map(TriggerId::into_inner))
.bind(row.script_id.map(ScriptId::into_inner))
.bind(row.reply_to)
.bind(row.payload)
.bind(row.origin_principal.map(AdminUserId::into_inner))
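// Saturate on overflow: falling back to 0 would reset the depth
// and defeat the trigger-loop guard.
.bind(i32::try_from(row.trigger_depth).unwrap_or(i32::MAX))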
.bind(row.root_execution_id.map(ExecutionId::into_inner))
.fetch_one(&self.pool)
.await?;
Ok(id)
}
async fn claim_due(
&self,
claimed_by: &str,
limit: i64,
) -> Result<Vec<OutboxRow>, OutboxRepoError> {
let rows: Vec<OutboxRowRaw> = sqlx::query_as(
"WITH due AS ( \
SELECT id FROM outbox \
WHERE claimed_at IS NULL AND next_attempt_at <= NOW() \
ORDER BY next_attempt_at \
FOR UPDATE SKIP LOCKED \
LIMIT $1 \
) \
UPDATE outbox SET claimed_at = NOW(), claimed_by = $2 \
WHERE id IN (SELECT id FROM due) \
RETURNING id, app_id, source_kind, trigger_id, script_id, reply_to, \
payload, origin_principal, trigger_depth, \
root_execution_id, attempt_count, next_attempt_at, created_at",
)
.bind(limit)
.bind(claimed_by)
.fetch_all(&self.pool)
.await?;
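// NOTE: rows with an unrecognized `source_kind` are skipped here.
// They were already marked claimed by the UPDATE above, so they
// stay claimed until something else clears `claimed_at`.
Ok(rows.into_iter().filter_map(OutboxRowRaw::hydrate).collect())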
}
async fn delete(&self, id: Uuid) -> Result<(), OutboxRepoError> {
sqlx::query("DELETE FROM outbox WHERE id = $1")
.bind(id)
.execute(&self.pool)
.await?;
Ok(())
}
async fn reschedule(
&self,
id: Uuid,
attempt_count: u32,
next_attempt_at: DateTime<Utc>,
) -> Result<(), OutboxRepoError> {
sqlx::query(
"UPDATE outbox SET attempt_count = $2, next_attempt_at = $3, \
claimed_at = NULL, claimed_by = NULL \
WHERE id = $1",
)
.bind(id)
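// Saturate on overflow so an out-of-range count is never stored
// as 0, which would look like a row that has not been tried yet.
.bind(i32::try_from(attempt_count).unwrap_or(i32::MAX))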
.bind(next_attempt_at)
.execute(&self.pool)
.await?;
Ok(())
}
}
/// `OutboxWriter` implementation so orchestrator-core (which can't
/// depend on manager-core) can enqueue HTTP outbox rows through the
/// shared trait.
#[async_trait]
impl OutboxWriter for PostgresOutboxRepo {
async fn enqueue_http(&self, row: NewHttpOutbox) -> Result<Uuid, OutboxWriterError> {
self.insert(NewOutboxRow {
app_id: row.app_id,
source_kind: OutboxSourceKind::Http,
trigger_id: Some(TriggerId::from(row.route_id)),
script_id: Some(row.script_id),
reply_to: row.reply_to,
payload: row.payload,
origin_principal: row.origin_principal,
trigger_depth: row.trigger_depth,
root_execution_id: row.root_execution_id,
})
.await
.map_err(|e| OutboxWriterError::Backend(e.to_string()))
}
}
#[derive(sqlx::FromRow)]
struct OutboxRowRaw {
id: Uuid,
app_id: Uuid,
source_kind: String,
trigger_id: Option<Uuid>,
script_id: Option<Uuid>,
reply_to: Option<Uuid>,
payload: serde_json::Value,
origin_principal: Option<Uuid>,
trigger_depth: i32,
root_execution_id: Option<Uuid>,
attempt_count: i32,
next_attempt_at: DateTime<Utc>,
created_at: DateTime<Utc>,
}
impl OutboxRowRaw {
fn hydrate(self) -> Option<OutboxRow> {
Some(OutboxRow {
id: self.id,
app_id: self.app_id.into(),
source_kind: OutboxSourceKind::from_wire(&self.source_kind)?,
trigger_id: self.trigger_id.map(Into::into),
script_id: self.script_id.map(Into::into),
reply_to: self.reply_to,
payload: self.payload,
origin_principal: self.origin_principal.map(Into::into),
trigger_depth: u32::try_from(self.trigger_depth).unwrap_or(0),
root_execution_id: self.root_execution_id.map(Into::into),
attempt_count: u32::try_from(self.attempt_count).unwrap_or(0),
next_attempt_at: self.next_attempt_at,
created_at: self.created_at,
})
}
}
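
Reviewer sketch of the claim / reschedule life cycle that `claim_due` and `reschedule` implement in SQL above, modeled in memory. `Row` and `InMemoryOutbox` are hypothetical std-only stand-ins; timestamps are plain integers and there is no real locking, so this only illustrates the state transitions, not the `SKIP LOCKED` concurrency behavior.

```rust
/// Simplified stand-in for an outbox row (assumption: integer clock).
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: u32,
    pub next_attempt_at: u64,
    pub attempt_count: u32,
    pub claimed_by: Option<String>,
}

#[derive(Default)]
pub struct InMemoryOutbox {
    pub rows: Vec<Row>,
}

impl InMemoryOutbox {
    /// Claim up to `limit` unclaimed rows that are due at `now`,
    /// earliest `next_attempt_at` first. Returns the claimed ids.
    pub fn claim_due(&mut self, worker: &str, now: u64, limit: usize) -> Vec<u32> {
        let mut due: Vec<&mut Row> = self
            .rows
            .iter_mut()
            .filter(|r| r.claimed_by.is_none() && r.next_attempt_at <= now)
            .collect();
        due.sort_by_key(|r| r.next_attempt_at);
        due.into_iter()
            .take(limit)
            .map(|r| {
                r.claimed_by = Some(worker.to_string());
                r.id
            })
            .collect()
    }

    /// Failure path: store the new attempt count, clear the claim,
    /// and set the next due time chosen by the caller.
    pub fn reschedule(&mut self, id: u32, attempt_count: u32, next_attempt_at: u64) {
        if let Some(r) = self.rows.iter_mut().find(|r| r.id == id) {
            r.attempt_count = attempt_count;
            r.next_attempt_at = next_attempt_at;
            r.claimed_by = None;
        }
    }
}
```

A claimed row is invisible to later claims until `reschedule` clears it, and even then only once its new `next_attempt_at` has passed.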


@@ -0,0 +1,62 @@
//! `PrincipalResolver` — turns a `registered_by_principal` user id from
//! a trigger row into the `Principal` the dispatcher passes through to
//! the executor. Per design notes §4, a trigger execution runs as the
//! user that registered the trigger; the original event's caller is
//! recorded elsewhere (on the outbox row, for forensics) and does not
//! become the execution principal.
use async_trait::async_trait;
use picloud_shared::{AdminUserId, Principal};
use crate::admin_user_repo::{AdminUserRepository, AdminUserRepositoryError};
#[derive(Debug, thiserror::Error)]
pub enum PrincipalResolverError {
#[error("user not found: {0}")]
NotFound(AdminUserId),
#[error("user is inactive: {0}")]
Inactive(AdminUserId),
#[error("admin user repo error: {0}")]
Backend(String),
}
#[async_trait]
pub trait PrincipalResolver: Send + Sync {
async fn resolve(&self, user_id: AdminUserId) -> Result<Principal, PrincipalResolverError>;
}
pub struct AdminPrincipalResolver {
users: std::sync::Arc<dyn AdminUserRepository>,
}
impl AdminPrincipalResolver {
#[must_use]
pub fn new(users: std::sync::Arc<dyn AdminUserRepository>) -> Self {
Self { users }
}
}
#[async_trait]
impl PrincipalResolver for AdminPrincipalResolver {
async fn resolve(&self, user_id: AdminUserId) -> Result<Principal, PrincipalResolverError> {
let row = self
.users
.get(user_id)
.await
.map_err(|e: AdminUserRepositoryError| PrincipalResolverError::Backend(e.to_string()))?
.ok_or(PrincipalResolverError::NotFound(user_id))?;
if !row.is_active {
return Err(PrincipalResolverError::Inactive(user_id));
}
Ok(Principal {
user_id,
instance_role: row.instance_role,
// Trigger executions are cookie-session-style (no API key
// scope restriction). Per-app permissions are evaluated
// via `authz::can` against the `app_id` of the resource
// the script touches, exactly like an admin invocation.
scopes: None,
app_binding: None,
})
}
}
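
Reviewer sketch of the three outcomes of `resolve` above (unknown user, inactive user, active user), using a `HashMap` in place of `AdminUserRepository`. `Role`, `UserRow`, `ResolveError`, and `PrincipalSketch` are hypothetical std-only stand-ins; the real resolver is async and also maps repository failures to a `Backend` variant, which this sketch omits.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    Owner,
    Member,
}

/// Simplified stand-in for an admin user row.
#[derive(Debug, Clone, PartialEq)]
pub struct UserRow {
    pub is_active: bool,
    pub role: Role,
}

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    NotFound(u32),
    Inactive(u32),
}

/// Simplified stand-in for `Principal` (no scopes, no app binding).
#[derive(Debug, PartialEq)]
pub struct PrincipalSketch {
    pub user_id: u32,
    pub role: Role,
}

/// Look the registering user up, refuse inactive accounts, and
/// otherwise build the principal the trigger execution runs as.
pub fn resolve(
    users: &HashMap<u32, UserRow>,
    user_id: u32,
) -> Result<PrincipalSketch, ResolveError> {
    let row = users
        .get(&user_id)
        .ok_or(ResolveError::NotFound(user_id))?;
    if !row.is_active {
        return Err(ResolveError::Inactive(user_id));
    }
    Ok(PrincipalSketch { user_id, role: row.role })
}
```

The inactive check comes after the lookup, so a deleted user and a deactivated user produce distinct errors, which matches the `NotFound` / `Inactive` split in `PrincipalResolverError`.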


@@ -77,6 +77,12 @@ pub struct CreateRouteRequest {
 pub path_kind: PathKind,
 pub path: String,
 pub method: Option<String>,
+/// Per-route dispatch mode (v1.1.1). Defaults to `Sync` when
+/// omitted so older clients aren't broken. `Async` routes return
+/// `202 Accepted` immediately and run the script in the
+/// background via the dispatcher.
+#[serde(default)]
+pub dispatch_mode: picloud_shared::DispatchMode,
 }
 #[derive(Debug, Deserialize)]
@@ -211,6 +217,7 @@ async fn create_route<RR: RouteRepository, SR: ScriptRepository>(
 path_kind: input.path_kind,
 path: normalized_path,
 method: input.method,
+dispatch_mode: input.dispatch_mode,
 })
 .await?;
 refresh_table(&state).await?;
@@ -370,6 +377,7 @@ pub fn compile_routes(rows: &[Route]) -> Result<Vec<CompiledRoute>, pattern::Par
 host: pattern::parse_host(r.host_kind, &r.host, r.host_param_name.as_deref())?,
 path: pattern::parse_path(r.path_kind, &r.path)?,
 method: r.method.clone(),
+dispatch_mode: r.dispatch_mode,
 })
 })
 .collect()


@@ -4,7 +4,7 @@
//! after every write — see the route_admin module for the binding. //! after every write — see the route_admin module for the binding.
use async_trait::async_trait; use async_trait::async_trait;
use picloud_shared::{AppId, HostKind, PathKind, Route, ScriptId}; use picloud_shared::{AppId, DispatchMode, HostKind, PathKind, Route, ScriptId};
use sqlx::PgPool; use sqlx::PgPool;
use uuid::Uuid; use uuid::Uuid;
@@ -20,6 +20,7 @@ pub struct NewRoute {
pub path_kind: PathKind, pub path_kind: PathKind,
pub path: String, pub path: String,
pub method: Option<String>, pub method: Option<String>,
pub dispatch_mode: DispatchMode,
} }
#[async_trait] #[async_trait]
@@ -62,7 +63,7 @@ impl RouteRepository for PostgresRouteRepository {
async fn list_all(&self) -> Result<Vec<Route>, ScriptRepositoryError> { async fn list_all(&self) -> Result<Vec<Route>, ScriptRepositoryError> {
let rows = sqlx::query_as::<_, RouteRow>( let rows = sqlx::query_as::<_, RouteRow>(
"SELECT id, app_id, script_id, host_kind, host, host_param_name, \ "SELECT id, app_id, script_id, host_kind, host, host_param_name, \
path_kind, path, method, created_at \ path_kind, path, method, dispatch_mode, created_at \
FROM routes ORDER BY created_at", FROM routes ORDER BY created_at",
) )
.fetch_all(&self.pool) .fetch_all(&self.pool)
@@ -73,7 +74,7 @@ impl RouteRepository for PostgresRouteRepository {
async fn get(&self, route_id: Uuid) -> Result<Option<Route>, ScriptRepositoryError> { async fn get(&self, route_id: Uuid) -> Result<Option<Route>, ScriptRepositoryError> {
let row = sqlx::query_as::<_, RouteRow>( let row = sqlx::query_as::<_, RouteRow>(
"SELECT id, app_id, script_id, host_kind, host, host_param_name, \ "SELECT id, app_id, script_id, host_kind, host, host_param_name, \
path_kind, path, method, created_at \ path_kind, path, method, dispatch_mode, created_at \
FROM routes WHERE id = $1", FROM routes WHERE id = $1",
) )
.bind(route_id) .bind(route_id)
@@ -85,7 +86,7 @@ impl RouteRepository for PostgresRouteRepository {
async fn list_for_app(&self, app_id: AppId) -> Result<Vec<Route>, ScriptRepositoryError> { async fn list_for_app(&self, app_id: AppId) -> Result<Vec<Route>, ScriptRepositoryError> {
let rows = sqlx::query_as::<_, RouteRow>( let rows = sqlx::query_as::<_, RouteRow>(
"SELECT id, app_id, script_id, host_kind, host, host_param_name, \ "SELECT id, app_id, script_id, host_kind, host, host_param_name, \
path_kind, path, method, created_at \ path_kind, path, method, dispatch_mode, created_at \
FROM routes WHERE app_id = $1 ORDER BY created_at", FROM routes WHERE app_id = $1 ORDER BY created_at",
) )
.bind(app_id.into_inner()) .bind(app_id.into_inner())
@@ -100,7 +101,7 @@ impl RouteRepository for PostgresRouteRepository {
) -> Result<Vec<Route>, ScriptRepositoryError> { ) -> Result<Vec<Route>, ScriptRepositoryError> {
let rows = sqlx::query_as::<_, RouteRow>( let rows = sqlx::query_as::<_, RouteRow>(
"SELECT id, app_id, script_id, host_kind, host, host_param_name, \ "SELECT id, app_id, script_id, host_kind, host, host_param_name, \
-path_kind, path, method, created_at \
+path_kind, path, method, dispatch_mode, created_at \
 FROM routes WHERE script_id = $1 ORDER BY created_at",
 )
 .bind(script_id.into_inner())
@@ -113,10 +114,10 @@ impl RouteRepository for PostgresRouteRepository {
 let res = sqlx::query_as::<_, RouteRow>(
 "INSERT INTO routes ( \
 app_id, script_id, host_kind, host, host_param_name, \
-path_kind, path, method \
-) VALUES ($1, $2, $3, $4, $5, $6, $7, $8) \
+path_kind, path, method, dispatch_mode \
+) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9) \
 RETURNING id, app_id, script_id, host_kind, host, host_param_name, \
-path_kind, path, method, created_at",
+path_kind, path, method, dispatch_mode, created_at",
 )
 .bind(input.app_id.into_inner())
 .bind(input.script_id.into_inner())
@@ -126,6 +127,7 @@ impl RouteRepository for PostgresRouteRepository {
 .bind(path_kind_str(input.path_kind))
 .bind(&input.path)
 .bind(input.method.as_deref())
+.bind(input.dispatch_mode.as_str())
 .fetch_one(&self.pool)
 .await;
@@ -198,6 +200,7 @@ struct RouteRow {
 path_kind: String,
 path: String,
 method: Option<String>,
+dispatch_mode: String,
 created_at: chrono::DateTime<chrono::Utc>,
 }
@@ -221,6 +224,7 @@ impl From<RouteRow> for Route {
 },
 path: r.path,
 method: r.method,
+dispatch_mode: DispatchMode::from_wire(&r.dispatch_mode).unwrap_or(DispatchMode::Sync),
 created_at: r.created_at,
 }
 }
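
The row conversion above decodes the `dispatch_mode` column leniently: an unrecognised string falls back to `Sync` rather than failing the read. The definition of `DispatchMode` is not part of this diff, so the enum below is a hypothetical stand-in (the `"sync"` / `"async"` wire strings are assumed from `TriggerDispatchMode::as_str` in the trigger repo), and `decode_or_sync` is an illustrative helper name, not something in the codebase. A minimal sketch of the round trip and the fallback:

```rust
/// Hypothetical stand-in for the route-level dispatch mode; the real
/// type lives outside this diff and may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DispatchMode {
    Sync,
    Async,
}

impl DispatchMode {
    /// String stored in the `dispatch_mode` TEXT column (assumed values).
    const fn as_str(self) -> &'static str {
        match self {
            Self::Sync => "sync",
            Self::Async => "async",
        }
    }

    /// Strict decode: unknown strings yield `None`.
    fn from_wire(s: &str) -> Option<Self> {
        match s {
            "sync" => Some(Self::Sync),
            "async" => Some(Self::Async),
            _ => None,
        }
    }
}

/// Lenient decode in the style of `From<RouteRow> for Route`:
/// anything unrecognised becomes `Sync`.
fn decode_or_sync(raw: &str) -> DispatchMode {
    DispatchMode::from_wire(raw).unwrap_or(DispatchMode::Sync)
}
```

Note that the two fallbacks in this branch point in opposite directions: routes default an unknown value to `Sync`, while `dispatch_from_str` in the trigger repo treats anything other than `"sync"` as `Async`.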


@@ -0,0 +1,157 @@
//! Trigger-framework tunables. Defaults match design notes §3 (retry
//! policy) and §4 (retention). Each knob can be overridden through a
//! `PICLOUD_*` environment variable; an unparsable value is ignored
//! with a `tracing::warn`, the same pattern `SandboxCeiling::from_env` uses.
use std::env;
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum BackoffShape {
Exponential,
Linear,
Constant,
}
impl BackoffShape {
#[must_use]
pub const fn as_str(self) -> &'static str {
match self {
Self::Exponential => "exponential",
Self::Linear => "linear",
Self::Constant => "constant",
}
}
#[must_use]
pub fn from_wire(s: &str) -> Option<Self> {
match s {
"exponential" => Some(Self::Exponential),
"linear" => Some(Self::Linear),
"constant" => Some(Self::Constant),
_ => None,
}
}
}
#[derive(Debug, Clone, Copy)]
pub struct TriggerConfig {
/// Maximum `cx.trigger_depth` before the dispatcher refuses
/// execution. Above this, the row is dropped and a metric is bumped;
/// it is NOT dead-lettered (design notes §4: depth-exceeded
/// means "you built a loop"). Default 8.
pub max_trigger_depth: u32,
/// Default retry attempts (per-trigger override on the row).
pub retry_max_attempts: u32,
pub retry_backoff: BackoffShape,
pub retry_base_ms: u32,
/// ±jitter as a percentage of the computed delay. Applied at
/// dispatch time — not per-trigger.
pub retry_jitter_pct: u32,
/// Dead-letter retention before GC, in days. Default 30.
pub dead_letter_retention_days: u32,
/// Abandoned-execution retention before GC, in days. Default 7.
pub abandoned_retention_days: u32,
}
impl TriggerConfig {
#[must_use]
pub const fn conservative() -> Self {
Self {
max_trigger_depth: 8,
retry_max_attempts: 3,
retry_backoff: BackoffShape::Exponential,
retry_base_ms: 1000,
retry_jitter_pct: 20,
dead_letter_retention_days: 30,
abandoned_retention_days: 7,
}
}
#[must_use]
pub fn from_env() -> Self {
let mut c = Self::conservative();
load_u32(&mut c.max_trigger_depth, "PICLOUD_MAX_TRIGGER_DEPTH");
load_u32(
&mut c.retry_max_attempts,
"PICLOUD_TRIGGER_RETRY_MAX_ATTEMPTS",
);
load_backoff(&mut c.retry_backoff, "PICLOUD_TRIGGER_RETRY_BACKOFF");
load_u32(&mut c.retry_base_ms, "PICLOUD_TRIGGER_RETRY_BASE_MS");
load_u32(&mut c.retry_jitter_pct, "PICLOUD_TRIGGER_RETRY_JITTER_PCT");
load_u32(
&mut c.dead_letter_retention_days,
"PICLOUD_DEAD_LETTER_RETENTION_DAYS",
);
load_u32(
&mut c.abandoned_retention_days,
"PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS",
);
c
}
}
impl Default for TriggerConfig {
fn default() -> Self {
Self::conservative()
}
}
fn load_u32(dst: &mut u32, key: &str) {
if let Ok(v) = env::var(key) {
match v.parse::<u32>() {
Ok(n) => *dst = n,
Err(e) => {
tracing::warn!(env = key, error = %e, "ignoring invalid trigger-config value");
}
}
}
}
fn load_backoff(dst: &mut BackoffShape, key: &str) {
if let Ok(v) = env::var(key) {
match BackoffShape::from_wire(&v) {
Some(b) => *dst = b,
None => {
tracing::warn!(
env = key,
value = %v,
"ignoring invalid trigger-config backoff shape (use exponential|linear|constant)"
);
}
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn conservative_defaults_match_design_notes() {
let c = TriggerConfig::conservative();
assert_eq!(c.max_trigger_depth, 8);
assert_eq!(c.retry_max_attempts, 3);
assert_eq!(c.retry_backoff, BackoffShape::Exponential);
assert_eq!(c.retry_base_ms, 1000);
assert_eq!(c.retry_jitter_pct, 20);
assert_eq!(c.dead_letter_retention_days, 30);
assert_eq!(c.abandoned_retention_days, 7);
}
#[test]
fn backoff_round_trips() {
for shape in [
BackoffShape::Exponential,
BackoffShape::Linear,
BackoffShape::Constant,
] {
assert_eq!(BackoffShape::from_wire(shape.as_str()), Some(shape));
}
assert_eq!(BackoffShape::from_wire("garbage"), None);
}
}
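
The config above names a backoff shape, a base delay, and a ±jitter percentage, but the delay formula itself lives in the dispatcher and is not shown in this file. The sketch below is one plausible reading, not the project's implementation: it assumes constant = base, linear = base × attempt, and exponential = base × 2^(attempt − 1), with attempts counted from 1. `base_delay_ms` and `jitter_bounds_ms` are hypothetical helper names, and the enum is redeclared locally so the snippet stands alone.

```rust
/// Local mirror of the three backoff shapes in the config module.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackoffShape {
    Exponential,
    Linear,
    Constant,
}

/// Hypothetical helper: un-jittered delay in milliseconds before retry
/// number `attempt` (1-based). The formulas are assumptions, see above.
fn base_delay_ms(shape: BackoffShape, base_ms: u32, attempt: u32) -> u64 {
    let base = u64::from(base_ms);
    let n = u64::from(attempt.max(1));
    match shape {
        BackoffShape::Constant => base,
        BackoffShape::Linear => base.saturating_mul(n),
        BackoffShape::Exponential => {
            // Cap the shift so very large attempt counts cannot overflow.
            let shift = (n - 1).min(32) as u32;
            base.saturating_mul(1u64 << shift)
        }
    }
}

/// Hypothetical helper: inclusive (min, max) window that a ±`jitter_pct`
/// jitter would draw from. Percentages above 100 are clamped to 100.
fn jitter_bounds_ms(delay_ms: u64, jitter_pct: u32) -> (u64, u64) {
    let pct = u64::from(jitter_pct.min(100));
    let spread = delay_ms.saturating_mul(pct) / 100;
    (delay_ms - spread, delay_ms.saturating_add(spread))
}
```

Under these assumptions the conservative defaults (exponential, 1000 ms base, 20 % jitter, 3 attempts) would give un-jittered delays of 1000, 2000, and 4000 ms, with the third falling somewhere in the 3200 to 4800 ms window once jitter is applied.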


@@ -0,0 +1,617 @@
//! `TriggerRepo` — CRUD over the `triggers` parent + per-kind detail
//! tables. The admin endpoints (commit 4) sit on top of this; the
//! dispatcher (commit 5) reads `list_matching_*` to fan out events to
//! handler scripts.
use async_trait::async_trait;
use chrono::{DateTime, Utc};
use picloud_shared::{AdminUserId, AppId, KvEventOp, ScriptId, TriggerId};
use serde::{Deserialize, Serialize};
use sqlx::PgPool;
use uuid::Uuid;
use crate::trigger_config::BackoffShape;
#[derive(Debug, thiserror::Error)]
pub enum TriggerRepoError {
#[error("database error: {0}")]
Db(#[from] sqlx::Error),
#[error("trigger not found: {0}")]
NotFound(TriggerId),
#[error("invalid trigger payload: {0}")]
Invalid(String),
}
/// Parent-table row plus the per-kind detail merged in. Serialized
/// back to admin clients via the JSON API.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Trigger {
pub id: TriggerId,
pub app_id: AppId,
pub script_id: ScriptId,
pub kind: TriggerKind,
pub enabled: bool,
pub dispatch_mode: TriggerDispatchMode,
pub retry_max_attempts: u32,
pub retry_backoff: BackoffShape,
pub retry_base_ms: u32,
pub registered_by_principal: AdminUserId,
pub created_at: DateTime<Utc>,
pub updated_at: DateTime<Utc>,
pub details: TriggerDetails,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum TriggerKind {
Kv,
DeadLetter,
}
impl TriggerKind {
#[must_use]
pub const fn as_str(self) -> &'static str {
match self {
Self::Kv => "kv",
Self::DeadLetter => "dead_letter",
}
}
#[must_use]
pub fn from_wire(s: &str) -> Option<Self> {
match s {
"kv" => Some(Self::Kv),
"dead_letter" => Some(Self::DeadLetter),
_ => None,
}
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum TriggerDispatchMode {
Sync,
Async,
}
impl TriggerDispatchMode {
#[must_use]
pub const fn as_str(self) -> &'static str {
match self {
Self::Sync => "sync",
Self::Async => "async",
}
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "kind", rename_all = "snake_case")]
pub enum TriggerDetails {
Kv {
collection_glob: String,
ops: Vec<KvEventOp>,
},
DeadLetter {
#[serde(default, skip_serializing_if = "Option::is_none")]
source_filter: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
trigger_id_filter: Option<TriggerId>,
#[serde(default, skip_serializing_if = "Option::is_none")]
script_id_filter: Option<ScriptId>,
},
}
/// Create payload for a KV trigger. Retry defaults are applied at the
/// admin layer, which fills any settings the request omitted from its
/// `TriggerConfig`, so the stored row always records the effective values.
#[derive(Debug, Clone)]
pub struct CreateKvTrigger {
pub script_id: ScriptId,
pub collection_glob: String,
pub ops: Vec<KvEventOp>,
pub dispatch_mode: TriggerDispatchMode,
pub retry_max_attempts: u32,
pub retry_backoff: BackoffShape,
pub retry_base_ms: u32,
pub registered_by_principal: AdminUserId,
}
#[derive(Debug, Clone)]
pub struct CreateDeadLetterTrigger {
pub script_id: ScriptId,
pub source_filter: Option<String>,
pub trigger_id_filter: Option<TriggerId>,
pub script_id_filter: Option<ScriptId>,
pub registered_by_principal: AdminUserId,
}
/// One match for the dispatcher's "which KV triggers fire on this
/// event" lookup. Carries everything the dispatcher needs to construct
/// the outbox row.
#[derive(Debug, Clone)]
pub struct KvTriggerMatch {
pub trigger_id: TriggerId,
pub script_id: ScriptId,
pub dispatch_mode: TriggerDispatchMode,
pub retry_max_attempts: u32,
pub retry_backoff: BackoffShape,
pub retry_base_ms: u32,
pub registered_by_principal: AdminUserId,
}
/// One match for the dispatcher's "which dead-letter triggers fire
/// on this dead-letter row" lookup.
#[derive(Debug, Clone)]
pub struct DeadLetterTriggerMatch {
pub trigger_id: TriggerId,
pub script_id: ScriptId,
pub dispatch_mode: TriggerDispatchMode,
pub registered_by_principal: AdminUserId,
}
#[async_trait]
pub trait TriggerRepo: Send + Sync {
async fn create_kv_trigger(
&self,
app_id: AppId,
req: CreateKvTrigger,
) -> Result<Trigger, TriggerRepoError>;
async fn create_dead_letter_trigger(
&self,
app_id: AppId,
req: CreateDeadLetterTrigger,
) -> Result<Trigger, TriggerRepoError>;
async fn list_for_app(&self, app_id: AppId) -> Result<Vec<Trigger>, TriggerRepoError>;
async fn get(&self, id: TriggerId) -> Result<Option<Trigger>, TriggerRepoError>;
async fn delete(&self, id: TriggerId) -> Result<bool, TriggerRepoError>;
/// Dispatcher hot path: find every enabled KV trigger in `app_id`
/// whose `collection_glob` matches `collection` and whose `ops`
/// covers `op` (an empty `ops` list covers every op). Glob matching
/// is done in Rust: the column is plain TEXT and `collection_matches`
/// applies the `*` / `prefix*` / exact semantics.
async fn list_matching_kv(
&self,
app_id: AppId,
collection: &str,
op: KvEventOp,
) -> Result<Vec<KvTriggerMatch>, TriggerRepoError>;
/// Dispatcher hot path for dead-letter fan-out. Filters: source
/// (or any-source), originating trigger_id (or any), originating
/// script_id (or any). Each filter is "match OR is_null".
async fn list_matching_dead_letter(
&self,
app_id: AppId,
source: &str,
trigger_id: Option<TriggerId>,
script_id: Option<ScriptId>,
) -> Result<Vec<DeadLetterTriggerMatch>, TriggerRepoError>;
}
// ----------------------------------------------------------------------------
// Postgres impl
// ----------------------------------------------------------------------------
pub struct PostgresTriggerRepo {
pool: PgPool,
}
impl PostgresTriggerRepo {
#[must_use]
pub fn new(pool: PgPool) -> Self {
Self { pool }
}
}
#[async_trait]
impl TriggerRepo for PostgresTriggerRepo {
async fn create_kv_trigger(
&self,
app_id: AppId,
req: CreateKvTrigger,
) -> Result<Trigger, TriggerRepoError> {
if req.collection_glob.is_empty() {
return Err(TriggerRepoError::Invalid(
"collection_glob must not be empty".into(),
));
}
let mut tx = self.pool.begin().await?;
let parent: TriggerRow = sqlx::query_as(
"INSERT INTO triggers ( \
app_id, script_id, kind, enabled, dispatch_mode, \
retry_max_attempts, retry_backoff, retry_base_ms, \
registered_by_principal \
) VALUES ($1, $2, 'kv', TRUE, $3, $4, $5, $6, $7) \
RETURNING id, app_id, script_id, kind, enabled, dispatch_mode, \
retry_max_attempts, retry_backoff, retry_base_ms, \
registered_by_principal, created_at, updated_at",
)
.bind(app_id.into_inner())
.bind(req.script_id.into_inner())
.bind(req.dispatch_mode.as_str())
.bind(i32::try_from(req.retry_max_attempts).unwrap_or(3))
.bind(req.retry_backoff.as_str())
.bind(i32::try_from(req.retry_base_ms).unwrap_or(1000))
.bind(req.registered_by_principal.into_inner())
.fetch_one(&mut *tx)
.await?;
let ops_str: Vec<String> = req.ops.iter().map(|o| o.as_str().to_string()).collect();
sqlx::query(
"INSERT INTO kv_trigger_details (trigger_id, collection_glob, ops) \
VALUES ($1, $2, $3)",
)
.bind(parent.id)
.bind(&req.collection_glob)
.bind(&ops_str)
.execute(&mut *tx)
.await?;
tx.commit().await?;
Ok(Trigger {
id: parent.id.into(),
app_id: parent.app_id.into(),
script_id: parent.script_id.into(),
kind: TriggerKind::Kv,
enabled: parent.enabled,
dispatch_mode: dispatch_from_str(&parent.dispatch_mode),
retry_max_attempts: u32::try_from(parent.retry_max_attempts).unwrap_or(3),
retry_backoff: BackoffShape::from_wire(&parent.retry_backoff)
.unwrap_or(BackoffShape::Exponential),
retry_base_ms: u32::try_from(parent.retry_base_ms).unwrap_or(1000),
registered_by_principal: parent.registered_by_principal.into(),
created_at: parent.created_at,
updated_at: parent.updated_at,
details: TriggerDetails::Kv {
collection_glob: req.collection_glob,
ops: req.ops,
},
})
}
async fn create_dead_letter_trigger(
&self,
app_id: AppId,
req: CreateDeadLetterTrigger,
) -> Result<Trigger, TriggerRepoError> {
let mut tx = self.pool.begin().await?;
// Dead-letter triggers force max_attempts=1 (design notes §4
// recursion-stop). Backoff/base_ms irrelevant but the columns
// are NOT NULL — store sensible values.
let parent: TriggerRow = sqlx::query_as(
"INSERT INTO triggers ( \
app_id, script_id, kind, enabled, dispatch_mode, \
retry_max_attempts, retry_backoff, retry_base_ms, \
registered_by_principal \
) VALUES ($1, $2, 'dead_letter', TRUE, 'async', 1, 'constant', 0, $3) \
RETURNING id, app_id, script_id, kind, enabled, dispatch_mode, \
retry_max_attempts, retry_backoff, retry_base_ms, \
registered_by_principal, created_at, updated_at",
)
.bind(app_id.into_inner())
.bind(req.script_id.into_inner())
.bind(req.registered_by_principal.into_inner())
.fetch_one(&mut *tx)
.await?;
sqlx::query(
"INSERT INTO dead_letter_trigger_details \
(trigger_id, source_filter, trigger_id_filter, script_id_filter) \
VALUES ($1, $2, $3, $4)",
)
.bind(parent.id)
.bind(req.source_filter.as_deref())
.bind(req.trigger_id_filter.map(TriggerId::into_inner))
.bind(req.script_id_filter.map(ScriptId::into_inner))
.execute(&mut *tx)
.await?;
tx.commit().await?;
Ok(Trigger {
id: parent.id.into(),
app_id: parent.app_id.into(),
script_id: parent.script_id.into(),
kind: TriggerKind::DeadLetter,
enabled: parent.enabled,
dispatch_mode: dispatch_from_str(&parent.dispatch_mode),
retry_max_attempts: u32::try_from(parent.retry_max_attempts).unwrap_or(1),
retry_backoff: BackoffShape::from_wire(&parent.retry_backoff)
.unwrap_or(BackoffShape::Constant),
retry_base_ms: u32::try_from(parent.retry_base_ms).unwrap_or(0),
registered_by_principal: parent.registered_by_principal.into(),
created_at: parent.created_at,
updated_at: parent.updated_at,
details: TriggerDetails::DeadLetter {
source_filter: req.source_filter,
trigger_id_filter: req.trigger_id_filter,
script_id_filter: req.script_id_filter,
},
})
}
async fn list_for_app(&self, app_id: AppId) -> Result<Vec<Trigger>, TriggerRepoError> {
let parents: Vec<TriggerRow> = sqlx::query_as(
"SELECT id, app_id, script_id, kind, enabled, dispatch_mode, \
retry_max_attempts, retry_backoff, retry_base_ms, \
registered_by_principal, created_at, updated_at \
FROM triggers WHERE app_id = $1 ORDER BY created_at DESC",
)
.bind(app_id.into_inner())
.fetch_all(&self.pool)
.await?;
let mut out = Vec::with_capacity(parents.len());
for p in parents {
out.push(hydrate_one(&self.pool, p).await?);
}
Ok(out)
}
async fn get(&self, id: TriggerId) -> Result<Option<Trigger>, TriggerRepoError> {
let parent: Option<TriggerRow> = sqlx::query_as(
"SELECT id, app_id, script_id, kind, enabled, dispatch_mode, \
retry_max_attempts, retry_backoff, retry_base_ms, \
registered_by_principal, created_at, updated_at \
FROM triggers WHERE id = $1",
)
.bind(id.into_inner())
.fetch_optional(&self.pool)
.await?;
match parent {
Some(p) => Ok(Some(hydrate_one(&self.pool, p).await?)),
None => Ok(None),
}
}
async fn delete(&self, id: TriggerId) -> Result<bool, TriggerRepoError> {
// ON DELETE CASCADE on the detail tables takes care of them.
let res = sqlx::query("DELETE FROM triggers WHERE id = $1")
.bind(id.into_inner())
.execute(&self.pool)
.await?;
Ok(res.rows_affected() > 0)
}
async fn list_matching_kv(
&self,
app_id: AppId,
collection: &str,
op: KvEventOp,
) -> Result<Vec<KvTriggerMatch>, TriggerRepoError> {
// Fetch all enabled KV triggers for the app — glob matching
// happens in Rust so we don't have to teach the query about
// `*` and `prefix:*`. Sets are tiny in practice (one app's
// worth of triggers, usually a handful).
let rows: Vec<KvMatchRow> = sqlx::query_as(
"SELECT t.id, t.script_id, t.dispatch_mode, \
t.retry_max_attempts, t.retry_backoff, t.retry_base_ms, \
t.registered_by_principal, \
d.collection_glob, d.ops \
FROM triggers t \
JOIN kv_trigger_details d ON d.trigger_id = t.id \
WHERE t.app_id = $1 AND t.kind = 'kv' AND t.enabled = TRUE",
)
.bind(app_id.into_inner())
.fetch_all(&self.pool)
.await?;
let op_str = op.as_str();
let mut out = Vec::new();
for r in rows {
if !collection_matches(&r.collection_glob, collection) {
continue;
}
let any_op = r.ops.is_empty();
if !any_op && !r.ops.iter().any(|o| o == op_str) {
continue;
}
out.push(KvTriggerMatch {
trigger_id: r.id.into(),
script_id: r.script_id.into(),
dispatch_mode: dispatch_from_str(&r.dispatch_mode),
retry_max_attempts: u32::try_from(r.retry_max_attempts).unwrap_or(3),
retry_backoff: BackoffShape::from_wire(&r.retry_backoff)
.unwrap_or(BackoffShape::Exponential),
retry_base_ms: u32::try_from(r.retry_base_ms).unwrap_or(1000),
registered_by_principal: r.registered_by_principal.into(),
});
}
Ok(out)
}
async fn list_matching_dead_letter(
&self,
app_id: AppId,
source: &str,
trigger_id: Option<TriggerId>,
script_id: Option<ScriptId>,
) -> Result<Vec<DeadLetterTriggerMatch>, TriggerRepoError> {
let rows: Vec<DlMatchRow> = sqlx::query_as(
"SELECT t.id, t.script_id, t.dispatch_mode, t.registered_by_principal, \
d.source_filter, d.trigger_id_filter, d.script_id_filter \
FROM triggers t \
JOIN dead_letter_trigger_details d ON d.trigger_id = t.id \
WHERE t.app_id = $1 AND t.kind = 'dead_letter' AND t.enabled = TRUE \
AND (d.source_filter IS NULL OR d.source_filter = $2) \
AND (d.trigger_id_filter IS NULL OR d.trigger_id_filter = $3) \
AND (d.script_id_filter IS NULL OR d.script_id_filter = $4)",
)
.bind(app_id.into_inner())
.bind(source)
.bind(trigger_id.map(TriggerId::into_inner))
.bind(script_id.map(ScriptId::into_inner))
.fetch_all(&self.pool)
.await?;
Ok(rows
.into_iter()
.map(|r| DeadLetterTriggerMatch {
trigger_id: r.id.into(),
script_id: r.script_id.into(),
dispatch_mode: dispatch_from_str(&r.dispatch_mode),
registered_by_principal: r.registered_by_principal.into(),
})
.collect())
}
}
async fn hydrate_one(pool: &PgPool, parent: TriggerRow) -> Result<Trigger, TriggerRepoError> {
let kind = TriggerKind::from_wire(&parent.kind).ok_or_else(|| {
TriggerRepoError::Invalid(format!("unknown trigger kind {}", parent.kind))
})?;
let details = match kind {
TriggerKind::Kv => {
let row: KvDetailRow = sqlx::query_as(
"SELECT collection_glob, ops FROM kv_trigger_details WHERE trigger_id = $1",
)
.bind(parent.id)
.fetch_one(pool)
.await?;
let ops = row
.ops
.iter()
.filter_map(|s| KvEventOp::from_wire(s))
.collect();
TriggerDetails::Kv {
collection_glob: row.collection_glob,
ops,
}
}
TriggerKind::DeadLetter => {
let row: DlDetailRow = sqlx::query_as(
"SELECT source_filter, trigger_id_filter, script_id_filter \
FROM dead_letter_trigger_details WHERE trigger_id = $1",
)
.bind(parent.id)
.fetch_one(pool)
.await?;
TriggerDetails::DeadLetter {
source_filter: row.source_filter,
trigger_id_filter: row.trigger_id_filter.map(Into::into),
script_id_filter: row.script_id_filter.map(Into::into),
}
}
};
Ok(Trigger {
id: parent.id.into(),
app_id: parent.app_id.into(),
script_id: parent.script_id.into(),
kind,
enabled: parent.enabled,
dispatch_mode: dispatch_from_str(&parent.dispatch_mode),
retry_max_attempts: u32::try_from(parent.retry_max_attempts).unwrap_or(3),
retry_backoff: BackoffShape::from_wire(&parent.retry_backoff)
.unwrap_or(BackoffShape::Exponential),
retry_base_ms: u32::try_from(parent.retry_base_ms).unwrap_or(1000),
registered_by_principal: parent.registered_by_principal.into(),
created_at: parent.created_at,
updated_at: parent.updated_at,
details,
})
}
fn dispatch_from_str(s: &str) -> TriggerDispatchMode {
match s {
"sync" => TriggerDispatchMode::Sync,
_ => TriggerDispatchMode::Async,
}
}
/// Match a `collection_glob` against an actual `collection` name.
/// Supported forms (in priority order):
/// - `"*"` → matches every collection
/// - `"foo*"` → prefix match (anything starting with "foo")
/// - `"foo"` → exact match
#[must_use]
pub fn collection_matches(glob: &str, collection: &str) -> bool {
if glob == "*" {
return true;
}
if let Some(prefix) = glob.strip_suffix('*') {
return collection.starts_with(prefix);
}
glob == collection
}
#[derive(sqlx::FromRow)]
struct TriggerRow {
id: Uuid,
app_id: Uuid,
script_id: Uuid,
kind: String,
enabled: bool,
dispatch_mode: String,
retry_max_attempts: i32,
retry_backoff: String,
retry_base_ms: i32,
registered_by_principal: Uuid,
created_at: DateTime<Utc>,
updated_at: DateTime<Utc>,
}
#[derive(sqlx::FromRow)]
struct KvDetailRow {
collection_glob: String,
ops: Vec<String>,
}
#[derive(sqlx::FromRow)]
#[allow(clippy::struct_field_names)]
struct DlDetailRow {
source_filter: Option<String>,
trigger_id_filter: Option<Uuid>,
script_id_filter: Option<Uuid>,
}
#[derive(sqlx::FromRow)]
struct KvMatchRow {
id: Uuid,
script_id: Uuid,
dispatch_mode: String,
retry_max_attempts: i32,
retry_backoff: String,
retry_base_ms: i32,
registered_by_principal: Uuid,
collection_glob: String,
ops: Vec<String>,
}
#[derive(sqlx::FromRow)]
struct DlMatchRow {
id: Uuid,
script_id: Uuid,
dispatch_mode: String,
registered_by_principal: Uuid,
#[allow(dead_code)]
source_filter: Option<String>,
#[allow(dead_code)]
trigger_id_filter: Option<Uuid>,
#[allow(dead_code)]
script_id_filter: Option<Uuid>,
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn collection_matcher_handles_star_prefix_exact() {
assert!(collection_matches("*", "widgets"));
assert!(collection_matches("*", ""));
assert!(collection_matches("users:*", "users:1"));
assert!(collection_matches("users:*", "users:"));
assert!(!collection_matches("users:*", "orgs:1"));
assert!(collection_matches("widgets", "widgets"));
assert!(!collection_matches("widgets", "Widgets"));
}
}
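
The KV fan-out in `list_matching_kv` applies two filters per row: the collection glob, then the op list, where an empty list means "any op". The sketch below restates that predicate in isolation. `collection_matches` is copied from the file above; `KvFilter` and `kv_trigger_fires` are hypothetical names introduced only for this illustration, and ops are modelled as plain strings using the `insert` / `update` / `delete` names from the admin API docs.

```rust
/// Same three glob forms as the repository's matcher:
/// "*" (everything), "prefix*" (prefix match), otherwise exact match.
fn collection_matches(glob: &str, collection: &str) -> bool {
    if glob == "*" {
        return true;
    }
    if let Some(prefix) = glob.strip_suffix('*') {
        return collection.starts_with(prefix);
    }
    glob == collection
}

/// Hypothetical stand-in for one `kv_trigger_details` row.
struct KvFilter<'a> {
    collection_glob: &'a str,
    /// Wire names of the ops the trigger listens for; empty = any op.
    ops: &'a [&'a str],
}

/// True when an event on `collection` with operation `op` should fire
/// the trigger described by `filter`.
fn kv_trigger_fires(filter: &KvFilter<'_>, collection: &str, op: &str) -> bool {
    if !collection_matches(filter.collection_glob, collection) {
        return false;
    }
    // An empty op list is a wildcard, mirroring `any_op` in the repo code.
    filter.ops.is_empty() || filter.ops.iter().any(|o| *o == op)
}
```

For instance, a filter with glob `users:*` and no ops fires on a `delete` in `users:1`, while the same glob restricted to `insert` does not. Matching is case-sensitive, as the unit test above also checks.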


@@ -0,0 +1,748 @@
//! `/api/v1/admin/apps/{id}/triggers/*` — trigger CRUD admin endpoints.
//!
//! Per design notes §2, two kinds ship in v1.1.1: `kv` (with
//! collection_glob + ops) and `dead_letter` (with optional source /
//! trigger_id / script_id filters). Separate endpoints per kind keep
//! validation clean.
//!
//! Every endpoint is guarded by `Capability::AppManageTriggers(app_id)`
//! evaluated after the resource lookup so the capability binds to the
//! resource's actual `app_id` (mirrors `apps_api`).
use std::sync::Arc;
use axum::extract::{Path, State};
use axum::http::StatusCode;
use axum::response::{IntoResponse, Json, Response};
use axum::routing::{delete, get, post};
use axum::{Extension, Router};
use picloud_shared::{AppId, KvEventOp, Principal, ScriptId, TriggerId};
use serde::{Deserialize, Serialize};
use serde_json::json;
use crate::app_repo::AppRepository;
use crate::authz::{require, AuthzDenied, AuthzError, AuthzRepo, Capability};
use crate::trigger_config::{BackoffShape, TriggerConfig};
use crate::trigger_repo::{
CreateDeadLetterTrigger, CreateKvTrigger, Trigger, TriggerDispatchMode, TriggerRepo,
TriggerRepoError,
};
#[derive(Clone)]
pub struct TriggersState {
pub triggers: Arc<dyn TriggerRepo>,
pub apps: Arc<dyn AppRepository>,
pub authz: Arc<dyn AuthzRepo>,
/// Defaults applied to created triggers when the request omits
/// retry settings. Kept on the state struct so tests can swap
/// in a stricter / looser config without env tinkering.
pub config: TriggerConfig,
}
pub fn triggers_router(state: TriggersState) -> Router {
Router::new()
.route(
"/apps/{app_id}/triggers",
get(list_triggers).delete(noop_405),
)
.route("/apps/{app_id}/triggers/kv", post(create_kv_trigger))
.route(
"/apps/{app_id}/triggers/dead_letter",
post(create_dl_trigger),
)
.route(
"/apps/{app_id}/triggers/{trigger_id}",
delete(delete_trigger),
)
.with_state(state)
}
async fn noop_405() -> StatusCode {
StatusCode::METHOD_NOT_ALLOWED
}
// ----------------------------------------------------------------------------
// DTOs
// ----------------------------------------------------------------------------
#[derive(Debug, Deserialize)]
pub struct CreateKvTriggerRequest {
pub script_id: ScriptId,
pub collection_glob: String,
/// Subset of `{insert, update, delete}`. Empty array means "any
/// op" (the trigger fires on every mutation in matching
/// collections).
#[serde(default)]
pub ops: Vec<KvEventOp>,
#[serde(default = "default_dispatch")]
pub dispatch_mode: TriggerDispatchMode,
/// Overrides for the platform retry defaults. Omitted fields fall
/// back to `TriggerConfig` (env-overridable) at write time.
#[serde(default)]
pub retry_max_attempts: Option<u32>,
#[serde(default)]
pub retry_backoff: Option<BackoffShape>,
#[serde(default)]
pub retry_base_ms: Option<u32>,
}
const fn default_dispatch() -> TriggerDispatchMode {
TriggerDispatchMode::Async
}
#[derive(Debug, Deserialize)]
pub struct CreateDeadLetterTriggerRequest {
pub script_id: ScriptId,
#[serde(default)]
pub source_filter: Option<String>,
#[serde(default)]
pub trigger_id_filter: Option<TriggerId>,
#[serde(default)]
pub script_id_filter: Option<ScriptId>,
}
#[derive(Debug, Serialize)]
pub struct TriggerListResponse {
pub triggers: Vec<Trigger>,
}
// ----------------------------------------------------------------------------
// Handlers
// ----------------------------------------------------------------------------
async fn list_triggers(
State(s): State<TriggersState>,
Extension(principal): Extension<Principal>,
Path(app_id): Path<AppId>,
) -> Result<Json<TriggerListResponse>, TriggersApiError> {
ensure_app_exists(&*s.apps, app_id).await?;
require(
s.authz.as_ref(),
&principal,
Capability::AppManageTriggers(app_id),
)
.await?;
let triggers = s.triggers.list_for_app(app_id).await?;
Ok(Json(TriggerListResponse { triggers }))
}
async fn create_kv_trigger(
State(s): State<TriggersState>,
Extension(principal): Extension<Principal>,
Path(app_id): Path<AppId>,
Json(input): Json<CreateKvTriggerRequest>,
) -> Result<(StatusCode, Json<Trigger>), TriggersApiError> {
ensure_app_exists(&*s.apps, app_id).await?;
require(
s.authz.as_ref(),
&principal,
Capability::AppManageTriggers(app_id),
)
.await?;
if input.collection_glob.trim().is_empty() {
return Err(TriggersApiError::Invalid(
"collection_glob must not be empty".into(),
));
}
let req = CreateKvTrigger {
script_id: input.script_id,
collection_glob: input.collection_glob,
ops: input.ops,
dispatch_mode: input.dispatch_mode,
retry_max_attempts: input
.retry_max_attempts
.unwrap_or(s.config.retry_max_attempts),
retry_backoff: input.retry_backoff.unwrap_or(s.config.retry_backoff),
retry_base_ms: input.retry_base_ms.unwrap_or(s.config.retry_base_ms),
registered_by_principal: principal.user_id,
};
let created = s.triggers.create_kv_trigger(app_id, req).await?;
Ok((StatusCode::CREATED, Json(created)))
}
async fn create_dl_trigger(
State(s): State<TriggersState>,
Extension(principal): Extension<Principal>,
Path(app_id): Path<AppId>,
Json(input): Json<CreateDeadLetterTriggerRequest>,
) -> Result<(StatusCode, Json<Trigger>), TriggersApiError> {
ensure_app_exists(&*s.apps, app_id).await?;
require(
s.authz.as_ref(),
&principal,
Capability::AppManageTriggers(app_id),
)
.await?;
let req = CreateDeadLetterTrigger {
script_id: input.script_id,
source_filter: input.source_filter,
trigger_id_filter: input.trigger_id_filter,
script_id_filter: input.script_id_filter,
registered_by_principal: principal.user_id,
};
let created = s.triggers.create_dead_letter_trigger(app_id, req).await?;
Ok((StatusCode::CREATED, Json(created)))
}
async fn delete_trigger(
State(s): State<TriggersState>,
Extension(principal): Extension<Principal>,
Path((app_id, trigger_id)): Path<(AppId, TriggerId)>,
) -> Result<StatusCode, TriggersApiError> {
ensure_app_exists(&*s.apps, app_id).await?;
// Load the trigger so we can confirm it belongs to the right
// app; this prevents a caller from deleting a trigger by id alone
// when their capability is bound to a different app.
let trigger = s
.triggers
.get(trigger_id)
.await?
.ok_or(TriggersApiError::NotFound(trigger_id))?;
if trigger.app_id != app_id {
return Err(TriggersApiError::NotFound(trigger_id));
}
require(
s.authz.as_ref(),
&principal,
Capability::AppManageTriggers(app_id),
)
.await?;
if !s.triggers.delete(trigger_id).await? {
return Err(TriggersApiError::NotFound(trigger_id));
}
Ok(StatusCode::NO_CONTENT)
}
async fn ensure_app_exists(
apps: &dyn AppRepository,
app_id: AppId,
) -> Result<(), TriggersApiError> {
apps.get_by_id(app_id)
.await
.map_err(|e| TriggersApiError::Backend(e.to_string()))?
.ok_or_else(|| TriggersApiError::AppNotFound(app_id.to_string()))?;
Ok(())
}
// ----------------------------------------------------------------------------
// Errors
// ----------------------------------------------------------------------------
#[derive(Debug, thiserror::Error)]
pub enum TriggersApiError {
#[error("app not found: {0}")]
AppNotFound(String),
#[error("trigger not found: {0}")]
NotFound(TriggerId),
#[error("invalid trigger: {0}")]
Invalid(String),
#[error("forbidden")]
Forbidden,
#[error("authorization repo error: {0}")]
AuthzRepo(String),
#[error("trigger backend: {0}")]
Backend(String),
}
impl From<AuthzDenied> for TriggersApiError {
fn from(d: AuthzDenied) -> Self {
match d {
AuthzDenied::Denied => Self::Forbidden,
AuthzDenied::Repo(e) => Self::AuthzRepo(e.to_string()),
}
}
}
impl From<AuthzError> for TriggersApiError {
fn from(e: AuthzError) -> Self {
Self::AuthzRepo(e.to_string())
}
}
impl From<TriggerRepoError> for TriggersApiError {
fn from(e: TriggerRepoError) -> Self {
match e {
TriggerRepoError::NotFound(id) => Self::NotFound(id),
TriggerRepoError::Invalid(s) => Self::Invalid(s),
TriggerRepoError::Db(e) => Self::Backend(e.to_string()),
}
}
}
impl IntoResponse for TriggersApiError {
fn into_response(self) -> Response {
let (status, body) = match &self {
Self::AppNotFound(_) | Self::NotFound(_) => {
(StatusCode::NOT_FOUND, json!({ "error": self.to_string() }))
}
Self::Invalid(_) => (
StatusCode::UNPROCESSABLE_ENTITY,
json!({ "error": self.to_string() }),
),
Self::Forbidden => (StatusCode::FORBIDDEN, json!({ "error": self.to_string() })),
Self::AuthzRepo(e) => {
tracing::error!(error = %e, "triggers authz repo error");
(
StatusCode::INTERNAL_SERVER_ERROR,
json!({ "error": "internal error" }),
)
}
Self::Backend(e) => {
tracing::error!(error = %e, "triggers api backend error");
(
StatusCode::INTERNAL_SERVER_ERROR,
json!({ "error": "internal error" }),
)
}
};
(status, Json(body)).into_response()
}
}
#[cfg(test)]
mod tests {
//! In-memory tests for the trigger admin path. The Axum routing
//! / extractor surface is exercised by integration tests (which
//! need a real Postgres for the trigger repo); these tests cover
//! the handlers' invariant logic — capability enforcement, app
//! validation, default fallback for retry settings.
use super::*;
use crate::app_repo::{AppLookup, AppRepository};
use crate::trigger_repo::{
DeadLetterTriggerMatch, KvTriggerMatch, Trigger, TriggerDetails, TriggerRepo,
TriggerRepoError,
};
use async_trait::async_trait;
use chrono::Utc;
use picloud_shared::{AdminUserId, App, AppRole, KvEventOp, ScriptId, TriggerId, UserId};
use std::collections::HashMap;
use tokio::sync::Mutex;
#[derive(Default)]
struct InMemoryTriggerRepo {
inner: Mutex<HashMap<TriggerId, Trigger>>,
}
#[async_trait]
impl TriggerRepo for InMemoryTriggerRepo {
async fn create_kv_trigger(
&self,
app_id: AppId,
req: CreateKvTrigger,
) -> Result<Trigger, TriggerRepoError> {
let now = Utc::now();
let id = TriggerId::new();
let trigger = Trigger {
id,
app_id,
script_id: req.script_id,
kind: crate::trigger_repo::TriggerKind::Kv,
enabled: true,
dispatch_mode: req.dispatch_mode,
retry_max_attempts: req.retry_max_attempts,
retry_backoff: req.retry_backoff,
retry_base_ms: req.retry_base_ms,
registered_by_principal: req.registered_by_principal,
created_at: now,
updated_at: now,
details: TriggerDetails::Kv {
collection_glob: req.collection_glob,
ops: req.ops,
},
};
self.inner.lock().await.insert(id, trigger.clone());
Ok(trigger)
}
async fn create_dead_letter_trigger(
&self,
app_id: AppId,
req: CreateDeadLetterTrigger,
) -> Result<Trigger, TriggerRepoError> {
let now = Utc::now();
let id = TriggerId::new();
let trigger = Trigger {
id,
app_id,
script_id: req.script_id,
kind: crate::trigger_repo::TriggerKind::DeadLetter,
enabled: true,
dispatch_mode: TriggerDispatchMode::Async,
retry_max_attempts: 1,
retry_backoff: BackoffShape::Constant,
retry_base_ms: 0,
registered_by_principal: req.registered_by_principal,
created_at: now,
updated_at: now,
details: TriggerDetails::DeadLetter {
source_filter: req.source_filter,
trigger_id_filter: req.trigger_id_filter,
script_id_filter: req.script_id_filter,
},
};
self.inner.lock().await.insert(id, trigger.clone());
Ok(trigger)
}
async fn list_for_app(&self, app_id: AppId) -> Result<Vec<Trigger>, TriggerRepoError> {
Ok(self
.inner
.lock()
.await
.values()
.filter(|t| t.app_id == app_id)
.cloned()
.collect())
}
async fn get(&self, id: TriggerId) -> Result<Option<Trigger>, TriggerRepoError> {
Ok(self.inner.lock().await.get(&id).cloned())
}
async fn delete(&self, id: TriggerId) -> Result<bool, TriggerRepoError> {
Ok(self.inner.lock().await.remove(&id).is_some())
}
async fn list_matching_kv(
&self,
_app_id: AppId,
_collection: &str,
_op: KvEventOp,
) -> Result<Vec<KvTriggerMatch>, TriggerRepoError> {
Ok(vec![])
}
async fn list_matching_dead_letter(
&self,
_app_id: AppId,
_source: &str,
_trigger_id: Option<TriggerId>,
_script_id: Option<ScriptId>,
) -> Result<Vec<DeadLetterTriggerMatch>, TriggerRepoError> {
Ok(vec![])
}
}
struct InMemoryAppRepo {
existing: Mutex<HashMap<AppId, App>>,
}
impl InMemoryAppRepo {
fn with(app_id: AppId) -> Arc<Self> {
let now = Utc::now();
let mut existing = HashMap::new();
existing.insert(
app_id,
App {
id: app_id,
slug: "test".into(),
name: "test".into(),
description: None,
created_at: now,
updated_at: now,
},
);
Arc::new(Self {
existing: Mutex::new(existing),
})
}
}
#[async_trait]
impl AppRepository for InMemoryAppRepo {
async fn create(
&self,
_slug: &str,
_name: &str,
_description: Option<&str>,
) -> Result<App, crate::repo::ScriptRepositoryError> {
unimplemented!()
}
async fn create_with_takeover(
&self,
_slug: &str,
_name: &str,
_description: Option<&str>,
) -> Result<App, crate::repo::ScriptRepositoryError> {
unimplemented!()
}
async fn slug_in_history(
&self,
_slug: &str,
) -> Result<Option<App>, crate::repo::ScriptRepositoryError> {
unimplemented!()
}
async fn list(&self) -> Result<Vec<App>, crate::repo::ScriptRepositoryError> {
unimplemented!()
}
async fn list_for_user(
&self,
_user_id: AdminUserId,
) -> Result<Vec<App>, crate::repo::ScriptRepositoryError> {
unimplemented!()
}
async fn get_by_id(
&self,
id: AppId,
) -> Result<Option<App>, crate::repo::ScriptRepositoryError> {
Ok(self.existing.lock().await.get(&id).cloned())
}
async fn get_by_slug(
&self,
_slug: &str,
) -> Result<Option<App>, crate::repo::ScriptRepositoryError> {
unimplemented!()
}
async fn get_by_slug_or_history(
&self,
_slug: &str,
) -> Result<Option<AppLookup>, crate::repo::ScriptRepositoryError> {
unimplemented!()
}
async fn update(
&self,
_id: AppId,
_name: Option<&str>,
_description: Option<Option<&str>>,
) -> Result<App, crate::repo::ScriptRepositoryError> {
unimplemented!()
}
async fn rename_slug(
&self,
_id: AppId,
_new_slug: &str,
_take_over_history: bool,
) -> Result<App, crate::repo::ScriptRepositoryError> {
unimplemented!()
}
async fn delete(&self, _id: AppId) -> Result<(), crate::repo::ScriptRepositoryError> {
unimplemented!()
}
async fn delete_cascade(
&self,
_id: AppId,
) -> Result<(), crate::repo::ScriptRepositoryError> {
unimplemented!()
}
async fn count_scripts_in_app(
&self,
_id: AppId,
) -> Result<i64, crate::repo::ScriptRepositoryError> {
unimplemented!()
}
}
struct AlwaysAllowAuthzRepo;
#[async_trait]
impl AuthzRepo for AlwaysAllowAuthzRepo {
async fn membership(
&self,
_user_id: UserId,
_app_id: AppId,
) -> Result<Option<AppRole>, AuthzError> {
Ok(Some(AppRole::AppAdmin))
}
}
struct AlwaysDenyAuthzRepo;
#[async_trait]
impl AuthzRepo for AlwaysDenyAuthzRepo {
async fn membership(
&self,
_user_id: UserId,
_app_id: AppId,
) -> Result<Option<AppRole>, AuthzError> {
Ok(None)
}
}
fn member_principal() -> Principal {
Principal {
user_id: AdminUserId::new(),
instance_role: picloud_shared::InstanceRole::Member,
scopes: None,
app_binding: None,
}
}
fn state_with(authz: Arc<dyn AuthzRepo>, app_id: AppId) -> TriggersState {
TriggersState {
triggers: Arc::new(InMemoryTriggerRepo::default()),
apps: InMemoryAppRepo::with(app_id),
authz,
config: TriggerConfig::conservative(),
}
}
#[tokio::test]
async fn unknown_app_returns_404() {
let state = state_with(Arc::new(AlwaysAllowAuthzRepo), AppId::new());
let res = create_kv_trigger(
State(state),
Extension(member_principal()),
Path(AppId::new()), // a different (non-existent) app
Json(CreateKvTriggerRequest {
script_id: ScriptId::new(),
collection_glob: "*".into(),
ops: vec![],
dispatch_mode: TriggerDispatchMode::Async,
retry_max_attempts: None,
retry_backoff: None,
retry_base_ms: None,
}),
)
.await;
let err = res.expect_err("missing app should error");
assert!(matches!(err, TriggersApiError::AppNotFound(_)));
}
#[tokio::test]
async fn member_without_role_is_forbidden() {
let app_id = AppId::new();
let state = state_with(Arc::new(AlwaysDenyAuthzRepo), app_id);
let res = create_kv_trigger(
State(state),
Extension(member_principal()),
Path(app_id),
Json(CreateKvTriggerRequest {
script_id: ScriptId::new(),
collection_glob: "*".into(),
ops: vec![],
dispatch_mode: TriggerDispatchMode::Async,
retry_max_attempts: None,
retry_backoff: None,
retry_base_ms: None,
}),
)
.await;
let err = res.expect_err("member without role should be forbidden");
assert!(matches!(err, TriggersApiError::Forbidden));
}
#[tokio::test]
async fn kv_trigger_uses_env_defaults_when_omitted() {
let app_id = AppId::new();
let mut state = state_with(Arc::new(AlwaysAllowAuthzRepo), app_id);
// Tweak the config so we can detect that defaults were used.
state.config.retry_max_attempts = 7;
state.config.retry_base_ms = 12_345;
let (status, Json(trigger)) = create_kv_trigger(
State(state),
Extension(member_principal()),
Path(app_id),
Json(CreateKvTriggerRequest {
script_id: ScriptId::new(),
collection_glob: "widgets".into(),
ops: vec![KvEventOp::Insert],
dispatch_mode: TriggerDispatchMode::Async,
retry_max_attempts: None,
retry_backoff: None,
retry_base_ms: None,
}),
)
.await
.unwrap();
assert_eq!(status, StatusCode::CREATED);
assert_eq!(trigger.retry_max_attempts, 7);
assert_eq!(trigger.retry_base_ms, 12_345);
}
#[tokio::test]
async fn empty_collection_glob_rejected() {
let app_id = AppId::new();
let state = state_with(Arc::new(AlwaysAllowAuthzRepo), app_id);
let res = create_kv_trigger(
State(state),
Extension(member_principal()),
Path(app_id),
Json(CreateKvTriggerRequest {
script_id: ScriptId::new(),
collection_glob: " ".into(),
ops: vec![],
dispatch_mode: TriggerDispatchMode::Async,
retry_max_attempts: None,
retry_backoff: None,
retry_base_ms: None,
}),
)
.await;
let err = res.expect_err("empty glob should reject");
assert!(matches!(err, TriggersApiError::Invalid(_)));
}
#[tokio::test]
async fn delete_rejects_cross_app_trigger_id() {
let app_a = AppId::new();
let app_b = AppId::new();
let state = state_with(Arc::new(AlwaysAllowAuthzRepo), app_a);
// Insert a trigger that belongs to app_a. The app_b record is
// added to the in-memory apps repo further down, before the delete.
let trigger = state
.triggers
.create_kv_trigger(
app_a,
CreateKvTrigger {
script_id: ScriptId::new(),
collection_glob: "*".into(),
ops: vec![],
dispatch_mode: TriggerDispatchMode::Async,
retry_max_attempts: 3,
retry_backoff: BackoffShape::Exponential,
retry_base_ms: 1000,
registered_by_principal: AdminUserId::new(),
},
)
.await
.unwrap();
// Rebuild the state so the in-memory app repo holds records for
// both apps. Without a record for app_b the handler would 404 on
// app existence before reaching the cross-app check. The delete
// below goes through app_b's path and should 404.
let state = TriggersState {
apps: {
let now = Utc::now();
let mut existing = HashMap::new();
existing.insert(
app_a,
App {
id: app_a,
slug: "a".into(),
name: "a".into(),
description: None,
created_at: now,
updated_at: now,
},
);
existing.insert(
app_b,
App {
id: app_b,
slug: "b".into(),
name: "b".into(),
description: None,
created_at: now,
updated_at: now,
},
);
Arc::new(InMemoryAppRepo {
existing: Mutex::new(existing),
})
},
..state
};
let res = delete_trigger(
State(state),
Extension(member_principal()),
Path((app_b, trigger.id)),
)
.await;
let err = res.expect_err("cross-app delete should 404");
assert!(matches!(err, TriggersApiError::NotFound(_)));
}
}
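
The tests above pin two observable behaviours: omitted retry fields fall back to the configured values, and a trigger cannot be deleted through another app's path. The handler code itself is not part of this excerpt, so the following is only a minimal std-only sketch of those two rules. `RetryDefaults`, `RetryRequest`, `resolve_retry`, and `authorize_delete` are hypothetical names, and plain integers stand in for `AppId` and the retry field types.

```rust
/// Hypothetical stand-in for the instance-level retry defaults
/// (the source keeps these on `TriggerConfig`).
#[derive(Debug, Clone, Copy)]
pub struct RetryDefaults {
    pub max_attempts: u32,
    pub base_ms: u64,
}

/// Hypothetical request shape: `None` means "use the configured default".
#[derive(Debug, Clone, Copy, Default)]
pub struct RetryRequest {
    pub max_attempts: Option<u32>,
    pub base_ms: Option<u64>,
}

/// Resolve the effective `(max_attempts, base_ms)` pair, preferring
/// explicit request values over the defaults.
pub fn resolve_retry(req: RetryRequest, defaults: RetryDefaults) -> (u32, u64) {
    (
        req.max_attempts.unwrap_or(defaults.max_attempts),
        req.base_ms.unwrap_or(defaults.base_ms),
    )
}

/// A trigger is only reachable through the app that owns it. A mismatch
/// is reported as "not found" rather than "forbidden", matching the
/// `NotFound` assertion in the cross-app delete test.
pub fn authorize_delete(trigger_app: u64, path_app: u64) -> Result<(), &'static str> {
    if trigger_app == path_app {
        Ok(())
    } else {
        Err("not found")
    }
}
```

Reporting a cross-app mismatch as "not found" avoids confirming to the caller that the trigger id exists under a different app.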


@@ -17,13 +17,15 @@ use axum::{
use chrono::Utc; use chrono::Utc;
use picloud_executor_core::{ExecError, ExecRequest, ExecResponse, InvocationType}; use picloud_executor_core::{ExecError, ExecRequest, ExecResponse, InvocationType};
use picloud_shared::{ use picloud_shared::{
AppId, ExecutionId, ExecutionLog, ExecutionLogSink, ExecutionStatus, Principal, RequestId, AppId, DispatchMode, ExecutionId, ExecutionLog, ExecutionLogSink, ExecutionStatus,
ScriptId, HttpDispatchPayload, InboxFailureKind, InboxResult, NewHttpOutbox, OutboxWriter, Principal,
RequestId, ScriptId,
}; };
use serde_json::Value as Json_; use serde_json::Value as Json_;
use uuid::Uuid; use uuid::Uuid;
use crate::client::ExecutorClient; use crate::client::ExecutorClient;
use crate::inbox::InboxRegistry;
use crate::resolver::{ResolverError, ScriptResolver}; use crate::resolver::{ResolverError, ScriptResolver};
use crate::routing::{AppDomainTable, RouteTable}; use crate::routing::{AppDomainTable, RouteTable};
@@ -39,6 +41,14 @@ pub struct DataPlaneState<E, R> {
/// Routing table for user-defined paths, partitioned per app. /// Routing table for user-defined paths, partitioned per app.
/// Shared with the manager (admin router writes; this side reads). /// Shared with the manager (admin router writes; this side reads).
pub routes: Arc<RouteTable>, pub routes: Arc<RouteTable>,
/// NATS-style inbox registry (v1.1.1). Used by sync HTTP via
/// outbox to await the dispatcher's delivery on a oneshot
/// channel.
pub inbox: Arc<InboxRegistry>,
/// Writer for the universal trigger outbox (v1.1.1). The sync
/// HTTP path inserts a row with `reply_to = inbox_id`; the async
/// path inserts with `reply_to = None` and returns 202.
pub outbox: Arc<dyn OutboxWriter>,
} }
impl<E, R> Clone for DataPlaneState<E, R> { impl<E, R> Clone for DataPlaneState<E, R> {
@@ -49,6 +59,8 @@ impl<E, R> Clone for DataPlaneState<E, R> {
log_sink: self.log_sink.clone(), log_sink: self.log_sink.clone(),
app_domains: self.app_domains.clone(), app_domains: self.app_domains.clone(),
routes: self.routes.clone(), routes: self.routes.clone(),
inbox: self.inbox.clone(),
outbox: self.outbox.clone(),
} }
} }
} }
@@ -202,50 +214,312 @@ where
Err(e) => return Err(ApiError::BadRequest(format!("body read failed: {e}"))), Err(e) => return Err(ApiError::BadRequest(format!("body read failed: {e}"))),
}; };
let mut req = build_exec_request( let body_json: Json_ = if body_bytes.is_empty() {
Json_::Null
} else {
serde_json::from_slice(&body_bytes)
.map_err(|e| ApiError::BadRequest(format!("invalid JSON body: {e}")))?
};
let header_map: BTreeMap<String, String> = headers
.iter()
.filter_map(|(k, v)| {
v.to_str()
.ok()
.map(|s| (k.as_str().to_string(), s.to_string()))
})
.collect();
let query = parse_query_string(&query_str);
let rest = matched.rest.clone().unwrap_or_default();
match matched.matched.dispatch_mode {
DispatchMode::Async => {
handle_async_route(
&state,
app_id,
matched.matched.route_id,
matched.matched.script_id, matched.matched.script_id,
&script.name, &script.name,
&headers, path,
&body_bytes, method,
app_id, header_map,
body_json,
matched.params,
query,
rest,
script.timeout_seconds,
principal, principal,
)?; )
req.path = path; .await
req.params = matched.params; }
req.query = parse_query_string(&query_str); DispatchMode::Sync => {
req.rest = matched.rest.unwrap_or_default(); handle_sync_route(
req.sandbox_overrides = script.sandbox; &state,
app_id,
matched.matched.route_id,
matched.matched.script_id,
&script.name,
path,
method,
header_map,
body_json,
matched.params,
query,
rest,
script.timeout_seconds,
principal,
)
.await
}
}
}
let request_id = req.request_id; #[allow(clippy::too_many_arguments)]
let request_path = req.path.clone(); async fn handle_async_route<E, R>(
let request_headers = req.headers.clone(); state: &DataPlaneState<E, R>,
let request_body = req.body.clone(); app_id: AppId,
route_id: Uuid,
script_id: ScriptId,
script_name: &str,
path: String,
method: String,
headers: BTreeMap<String, String>,
body: Json_,
params: BTreeMap<String, String>,
query: BTreeMap<String, String>,
rest: String,
timeout_seconds: u32,
principal: Option<Principal>,
) -> Result<Response, ApiError>
where
E: ExecutorClient + 'static,
R: ScriptResolver + 'static,
{
let payload = HttpDispatchPayload {
script_name: script_name.to_string(),
path,
method,
headers,
body,
params,
query,
rest,
timeout_seconds,
};
let payload_value = serde_json::to_value(&payload)
.map_err(|e| ApiError::BadRequest(format!("payload serialize: {e}")))?;
let execution_id = ExecutionId::new();
state
.outbox
.enqueue_http(NewHttpOutbox {
app_id,
route_id,
script_id,
reply_to: None,
payload: payload_value,
origin_principal: principal.map(|p| p.user_id),
trigger_depth: 0,
root_execution_id: Some(execution_id),
})
.await
.map_err(|e| ApiError::OutboxWrite(e.to_string()))?;
Ok((
StatusCode::ACCEPTED,
Json(serde_json::json!({
"accepted_at": Utc::now().to_rfc3339(),
"execution_id": execution_id.to_string(),
})),
)
.into_response())
}
let timeout = Duration::from_secs(u64::from(script.timeout_seconds)); #[allow(clippy::too_many_arguments)]
async fn handle_sync_route<E, R>(
state: &DataPlaneState<E, R>,
app_id: AppId,
route_id: Uuid,
script_id: ScriptId,
script_name: &str,
path: String,
method: String,
headers: BTreeMap<String, String>,
body: Json_,
params: BTreeMap<String, String>,
query: BTreeMap<String, String>,
rest: String,
timeout_seconds: u32,
principal: Option<Principal>,
) -> Result<Response, ApiError>
where
E: ExecutorClient + 'static,
R: ScriptResolver + 'static,
{
let payload = HttpDispatchPayload {
script_name: script_name.to_string(),
path: path.clone(),
method,
headers: headers.clone(),
body: body.clone(),
params,
query,
rest,
timeout_seconds,
};
let payload_value = serde_json::to_value(&payload)
.map_err(|e| ApiError::BadRequest(format!("payload serialize: {e}")))?;
// Register the inbox before writing the outbox row so the
// dispatcher can't race-deliver before the orchestrator is
// listening.
let (inbox_id, rx) = state.inbox.register();
let execution_id = ExecutionId::new();
let outbox_id = state
.outbox
.enqueue_http(NewHttpOutbox {
app_id,
route_id,
script_id,
reply_to: Some(inbox_id),
payload: payload_value,
origin_principal: principal.map(|p| p.user_id),
trigger_depth: 0,
root_execution_id: Some(execution_id),
})
.await
.map_err(|e| {
// Failed outbox write — abandon the inbox so the dispatcher
// can never deliver to a stale entry.
state.inbox.cancel(inbox_id);
ApiError::OutboxWrite(e.to_string())
})?;
// Wait for the dispatcher's delivery. Outer timeout = script
// wall-clock + a small buffer to cover dispatcher latency.
let wait_budget = Duration::from_secs(u64::from(timeout_seconds)) + Duration::from_secs(2);
let request_id = RequestId::new();
let started = Utc::now(); let started = Utc::now();
let outcome = state.executor.execute(&script.source, req, timeout).await; let result = tokio::time::timeout(wait_budget, rx).await;
let finished = Utc::now(); let finished = Utc::now();
let log = build_execution_log( // Tear down the receiver if it's still alive. `inbox.cancel` is a
script.app_id, // no-op when the dispatcher already delivered.
matched.matched.script_id, let _ = state.inbox.cancel(inbox_id);
let response = match result {
Ok(Ok(InboxResult::Success(summary))) => http_response_from_summary(summary),
Ok(Ok(InboxResult::Failure { kind, message })) => failure_to_response(kind, &message),
Ok(Err(_recv)) => {
// Channel was closed without a value — dispatcher dropped
// the sender. Treat as platform failure.
tracing::warn!(
outbox_id = %outbox_id,
"inbox channel closed without delivery"
);
failure_to_response(
InboxFailureKind::Platform,
"dispatcher closed inbox without delivery",
)
}
Err(_elapsed) => {
// Outer timeout — either the script was too slow or the
// dispatcher is wedged. Returns 504 by default.
failure_to_response(InboxFailureKind::Timeout, "request timed out")
}
};
let log = build_inbox_execution_log(
app_id,
script_id,
request_id, request_id,
request_path, path,
request_headers, headers,
request_body, body,
&outcome, response.status().as_u16(),
started, started,
finished, finished,
); );
if let Err(e) = state.log_sink.record(log).await { if let Err(e) = state.log_sink.record(log).await {
tracing::warn!( tracing::warn!(
error = %e, error = %e,
script_id = %matched.matched.script_id, %script_id,
"failed to persist execution log" "failed to persist execution log"
); );
} }
Ok(exec_response_to_http(outcome?)) Ok(response)
}
fn http_response_from_summary(summary: picloud_shared::ExecResponseSummary) -> Response {
let status =
StatusCode::from_u16(summary.status_code).unwrap_or(StatusCode::INTERNAL_SERVER_ERROR);
let mut http_headers = HeaderMap::new();
for (k, v) in summary.headers {
if let (Ok(name), Ok(value)) = (k.parse::<HeaderName>(), v.parse::<HeaderValue>()) {
http_headers.insert(name, value);
}
}
http_headers
.entry(axum::http::header::CONTENT_TYPE)
.or_insert_with(|| HeaderValue::from_static("application/json"));
(status, http_headers, Json(summary.body)).into_response()
}
/// Map `InboxFailureKind` onto the design-notes §3 status-code table.
fn failure_to_response(kind: InboxFailureKind, message: &str) -> Response {
let status = match kind {
InboxFailureKind::Validation => StatusCode::UNPROCESSABLE_ENTITY,
InboxFailureKind::Runtime => StatusCode::BAD_GATEWAY,
InboxFailureKind::Overloaded => StatusCode::SERVICE_UNAVAILABLE,
InboxFailureKind::Timeout => StatusCode::GATEWAY_TIMEOUT,
InboxFailureKind::OperationBudget => StatusCode::INSUFFICIENT_STORAGE,
InboxFailureKind::Platform => StatusCode::INTERNAL_SERVER_ERROR,
};
let body = Json(serde_json::json!({ "error": message }));
if matches!(kind, InboxFailureKind::Overloaded) {
return (status, [(axum::http::header::RETRY_AFTER, "1")], body).into_response();
}
(status, body).into_response()
}
#[allow(clippy::too_many_arguments)]
fn build_inbox_execution_log(
app_id: AppId,
script_id: ScriptId,
request_id: RequestId,
request_path: String,
request_headers: BTreeMap<String, String>,
request_body: Json_,
response_code: u16,
started: chrono::DateTime<Utc>,
finished: chrono::DateTime<Utc>,
) -> ExecutionLog {
let duration_ms = u64::try_from(
finished
.signed_duration_since(started)
.num_milliseconds()
.max(0),
)
.unwrap_or(0);
let status = if (200..400).contains(&response_code) {
ExecutionStatus::Success
} else {
ExecutionStatus::Error
};
ExecutionLog {
id: Uuid::new_v4(),
app_id,
script_id,
request_id,
request_path,
request_headers,
request_body,
response_code: Some(response_code),
response_body: None,
script_logs: Json_::Array(vec![]),
duration_ms,
status,
created_at: started,
}
} }
fn parse_query_string(s: &str) -> BTreeMap<String, String> { fn parse_query_string(s: &str) -> BTreeMap<String, String> {
@@ -317,6 +591,11 @@ fn build_exec_request(
// preserves the original root for chained executions. // preserves the original root for chained executions.
trigger_depth: 0, trigger_depth: 0,
root_execution_id: execution_id, root_execution_id: execution_id,
// Direct invocations are never DL handlers — that flag is only
// set by the dispatcher when it picks a dead_letter trigger row.
is_dead_letter_handler: false,
// No originating trigger event for direct ingress.
event: None,
}) })
} }
@@ -416,6 +695,9 @@ pub enum ApiError {
#[error("execution error: {0}")] #[error("execution error: {0}")]
Exec(#[from] ExecError), Exec(#[from] ExecError),
#[error("outbox write failed: {0}")]
OutboxWrite(String),
} }
impl IntoResponse for ApiError { impl IntoResponse for ApiError {
@@ -439,6 +721,13 @@ impl IntoResponse for ApiError {
let (status, message) = match &self { let (status, message) = match &self {
E::NotFound(_) => (StatusCode::NOT_FOUND, self.to_string()), E::NotFound(_) => (StatusCode::NOT_FOUND, self.to_string()),
E::BadRequest(_) => (StatusCode::BAD_REQUEST, self.to_string()), E::BadRequest(_) => (StatusCode::BAD_REQUEST, self.to_string()),
E::OutboxWrite(e) => {
tracing::error!(error = %e, "outbox write failed");
(
StatusCode::INTERNAL_SERVER_ERROR,
"internal error".to_string(),
)
}
E::Resolver(e) => { E::Resolver(e) => {
tracing::error!(error = %e, "resolver failure"); tracing::error!(error = %e, "resolver failure");
( (
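
The sync path in this diff applies three small rules: a failure kind maps to an HTTP status, the wait budget is the script timeout plus a two-second buffer, and a response code in `200..400` is logged as a success. The sketch below restates those rules with std types only so they can be checked in isolation. `FailureKind`, `status_for`, `retry_after_secs`, `wait_budget`, and `is_success` are hypothetical names, and raw `u16` codes stand in for `StatusCode`.

```rust
use std::time::Duration;

/// Hypothetical mirror of `InboxFailureKind`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureKind {
    Validation,
    Runtime,
    Overloaded,
    Timeout,
    OperationBudget,
    Platform,
}

/// Numeric status codes matching the `StatusCode` constants used in
/// `failure_to_response`.
pub fn status_for(kind: FailureKind) -> u16 {
    match kind {
        FailureKind::Validation => 422,      // UNPROCESSABLE_ENTITY
        FailureKind::Runtime => 502,         // BAD_GATEWAY
        FailureKind::Overloaded => 503,      // SERVICE_UNAVAILABLE
        FailureKind::Timeout => 504,         // GATEWAY_TIMEOUT
        FailureKind::OperationBudget => 507, // INSUFFICIENT_STORAGE
        FailureKind::Platform => 500,        // INTERNAL_SERVER_ERROR
    }
}

/// Only the overloaded case carries a `Retry-After` header (one second).
pub fn retry_after_secs(kind: FailureKind) -> Option<u32> {
    match kind {
        FailureKind::Overloaded => Some(1),
        _ => None,
    }
}

/// Outer wait budget: script wall-clock timeout plus a 2 s buffer for
/// dispatcher latency.
pub fn wait_budget(timeout_seconds: u32) -> Duration {
    Duration::from_secs(u64::from(timeout_seconds)) + Duration::from_secs(2)
}

/// Execution-log classification used by `build_inbox_execution_log`.
pub fn is_success(response_code: u16) -> bool {
    (200..400).contains(&response_code)
}
```

Keeping the mapping as a single exhaustive `match` means a new failure kind will not compile until it is given a status code.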


@@ -0,0 +1,139 @@
//! In-process `InboxRegistry` — the NATS-style request/reply
//! implementation for sync HTTP via the trigger outbox (design notes
//! §3).
//!
//! Workflow:
//! 1. Orchestrator calls `registry.register()`, which allocates an
//! `inbox_id` and returns a oneshot receiver.
//! 2. Orchestrator writes an outbox row with `reply_to = inbox_id`.
//! 3. Dispatcher picks the row, runs the script, calls
//! `registry.deliver(inbox_id, result)`.
//! 4. Orchestrator's `.await` on the receiver fires; it maps the
//! `InboxResult` back into an HTTP response.
//!
//! `Delivered` means the receiver was alive when delivery hit. If the
//! orchestrator timed out and dropped the receiver before delivery,
//! `Abandoned` comes back — the dispatcher writes an
//! `abandoned_executions` row (design notes §3 #9).
//!
//! Cluster mode (v1.3+) swaps this for a Postgres `LISTEN/NOTIFY`-
//! based resolver; the `InboxResolver` trait stays the same.
use std::collections::HashMap;
use std::sync::Mutex;
use async_trait::async_trait;
use picloud_shared::{InboxDeliveryOutcome, InboxResolver, InboxResult};
use tokio::sync::oneshot;
use uuid::Uuid;
pub struct InboxRegistry {
inner: Mutex<HashMap<Uuid, oneshot::Sender<InboxResult>>>,
}
impl InboxRegistry {
#[must_use]
pub fn new() -> Self {
Self {
inner: Mutex::new(HashMap::new()),
}
}
/// Allocate a new inbox id and register the sender side. The
/// caller awaits the returned `Receiver`; the dispatcher delivers
/// the outcome via `deliver(id, …)`.
#[must_use]
pub fn register(&self) -> (Uuid, oneshot::Receiver<InboxResult>) {
let id = Uuid::new_v4();
let (tx, rx) = oneshot::channel();
if let Ok(mut g) = self.inner.lock() {
g.insert(id, tx);
}
(id, rx)
}
/// Cancel a pending inbox (for example, the orchestrator timed out
/// and gave up). Drops the sender so any future `deliver` returns
/// `Abandoned`. Returns `true` if the inbox was still registered,
/// i.e. its sender had not already been consumed by `deliver`.
pub fn cancel(&self, id: Uuid) -> bool {
self.inner
.lock()
.map(|mut g| g.remove(&id).is_some())
.unwrap_or(false)
}
}
impl Default for InboxRegistry {
fn default() -> Self {
Self::new()
}
}
#[async_trait]
impl InboxResolver for InboxRegistry {
async fn deliver(&self, inbox_id: Uuid, result: InboxResult) -> InboxDeliveryOutcome {
let Ok(mut g) = self.inner.lock() else {
return InboxDeliveryOutcome::Abandoned;
};
let Some(tx) = g.remove(&inbox_id) else {
return InboxDeliveryOutcome::Abandoned;
};
// `send` returns Err iff the receiver was dropped — exactly
// the abandoned-execution case.
if tx.send(result).is_err() {
InboxDeliveryOutcome::Abandoned
} else {
InboxDeliveryOutcome::Delivered
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use picloud_shared::ExecResponseSummary;
use std::collections::BTreeMap;
fn ok_result() -> InboxResult {
InboxResult::Success(ExecResponseSummary {
status_code: 200,
headers: BTreeMap::new(),
body: serde_json::json!({ "ok": true }),
})
}
#[tokio::test]
async fn register_then_deliver_resolves_receiver() {
let reg = InboxRegistry::new();
let (id, rx) = reg.register();
let outcome = reg.deliver(id, ok_result()).await;
assert_eq!(outcome, InboxDeliveryOutcome::Delivered);
let received = rx.await.expect("receiver should fire");
assert!(matches!(received, InboxResult::Success(_)));
}
#[tokio::test]
async fn deliver_to_unknown_id_is_abandoned() {
let reg = InboxRegistry::new();
let outcome = reg.deliver(Uuid::new_v4(), ok_result()).await;
assert_eq!(outcome, InboxDeliveryOutcome::Abandoned);
}
#[tokio::test]
async fn dropping_receiver_then_delivering_is_abandoned() {
let reg = InboxRegistry::new();
let (id, rx) = reg.register();
drop(rx);
let outcome = reg.deliver(id, ok_result()).await;
assert_eq!(outcome, InboxDeliveryOutcome::Abandoned);
}
#[tokio::test]
async fn cancel_removes_sender() {
let reg = InboxRegistry::new();
let (id, _rx) = reg.register();
assert!(reg.cancel(id));
let outcome = reg.deliver(id, ok_result()).await;
assert_eq!(outcome, InboxDeliveryOutcome::Abandoned);
}
}
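
The register / deliver / cancel workflow described in the module docs can be exercised without tokio. The sketch below is a simplified std-only stand-in and not the crate's implementation: it assumes `std::sync::mpsc` channels in place of `tokio::sync::oneshot`, `u64` ids in place of `Uuid`, and a `String` payload in place of `InboxResult`. `SketchRegistry` and `Outcome` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical mirror of `InboxDeliveryOutcome`.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Delivered,
    Abandoned,
}

/// Simplified stand-in for `InboxRegistry` (assumption: blocking std
/// channels instead of async oneshot channels).
pub struct SketchRegistry {
    next_id: AtomicU64,
    inner: Mutex<HashMap<u64, Sender<String>>>,
}

impl SketchRegistry {
    pub fn new() -> Self {
        Self {
            next_id: AtomicU64::new(0),
            inner: Mutex::new(HashMap::new()),
        }
    }

    /// Allocate an id, store the sender, hand the receiver to the caller.
    pub fn register(&self) -> (u64, Receiver<String>) {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        let (tx, rx) = channel();
        self.inner.lock().unwrap().insert(id, tx);
        (id, rx)
    }

    /// Remove the sender; returns `true` if the inbox was still registered.
    pub fn cancel(&self, id: u64) -> bool {
        self.inner.lock().unwrap().remove(&id).is_some()
    }

    /// Take the sender out of the map and try to send. An unknown id or a
    /// dropped receiver both count as `Abandoned`.
    pub fn deliver(&self, id: u64, result: String) -> Outcome {
        let Some(tx) = self.inner.lock().unwrap().remove(&id) else {
            return Outcome::Abandoned;
        };
        if tx.send(result).is_err() {
            Outcome::Abandoned
        } else {
            Outcome::Delivered
        }
    }
}
```

Because `deliver` removes the sender before sending, each inbox resolves at most once, and a second delivery to the same id reports `Abandoned`, which is the same property the unit tests above check for the real registry.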


@@ -11,10 +11,12 @@
pub mod api; pub mod api;
pub mod client; pub mod client;
pub mod gate; pub mod gate;
pub mod inbox;
pub mod resolver; pub mod resolver;
pub mod routing; pub mod routing;
pub use api::{data_plane_router, user_routes_router, DataPlaneState}; pub use api::{data_plane_router, user_routes_router, DataPlaneState};
pub use client::{ExecutorClient, LocalExecutorClient, RemoteExecutorClient}; pub use client::{ExecutorClient, LocalExecutorClient, RemoteExecutorClient};
pub use gate::{AcquireError, ExecutionGate}; pub use gate::{AcquireError, ExecutionGate};
pub use inbox::InboxRegistry;
pub use resolver::{ResolverError, ScriptResolver}; pub use resolver::{ResolverError, ScriptResolver};


@@ -38,6 +38,11 @@ pub struct MatchResult {
pub struct Matched { pub struct Matched {
pub route_id: uuid::Uuid, pub route_id: uuid::Uuid,
pub script_id: picloud_shared::ScriptId, pub script_id: picloud_shared::ScriptId,
/// Per-route dispatch mode (v1.1.1). Forwarded to the
/// orchestrator's HTTP handler so it can pick the sync or async
/// path. Defaults to `Sync` for older routes that predate the
/// column.
pub dispatch_mode: picloud_shared::DispatchMode,
} }
/// A single route ready for matching. `app_id` is carried so the /// A single route ready for matching. `app_id` is carried so the
@@ -51,6 +56,7 @@ pub struct CompiledRoute {
pub host: HostPattern, pub host: HostPattern,
pub path: PathPattern, pub path: PathPattern,
pub method: Option<String>, pub method: Option<String>,
pub dispatch_mode: picloud_shared::DispatchMode,
} }
/// Find the best matching route for the request. Returns `None` if no /// Find the best matching route for the request. Returns `None` if no
@@ -180,6 +186,7 @@ fn match_within_bucket(
matched: Matched { matched: Matched {
route_id: route.route_id, route_id: route.route_id,
script_id: route.script_id, script_id: route.script_id,
dispatch_mode: route.dispatch_mode,
}, },
params: BTreeMap::new(), params: BTreeMap::new(),
rest: None, rest: None,
@@ -230,6 +237,7 @@ fn match_within_bucket(
matched: Matched { matched: Matched {
route_id: route.route_id, route_id: route.route_id,
script_id: route.script_id, script_id: route.script_id,
dispatch_mode: route.dispatch_mode,
}, },
params, params,
rest, rest,
@@ -312,6 +320,7 @@ mod tests {
host, host,
path: parse_path(path_kind, raw).unwrap(), path: parse_path(path_kind, raw).unwrap(),
method: None, method: None,
dispatch_mode: picloud_shared::DispatchMode::Sync,
} }
} }
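
The `dispatch_mode` field threaded through `CompiledRoute` and `Matched` is what lets the HTTP handler choose between its two paths. As a rough illustration only: the enum below is a hypothetical mirror of `picloud_shared::DispatchMode`, and `IngressAction`, `action_for`, and `mode_or_default` are invented names. How the real code defaults older routes to `Sync` is not shown in this diff, so the `Option` fallback here is an assumption.

```rust
/// Hypothetical mirror of `picloud_shared::DispatchMode`. `Sync` is marked
/// as the default to reflect the doc comment on `Matched::dispatch_mode`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum DispatchMode {
    #[default]
    Sync,
    Async,
}

/// What the ingress handler does once the outbox row is written.
#[derive(Debug, PartialEq, Eq)]
pub enum IngressAction {
    /// Reply 202 Accepted right away; the row carries no `reply_to`.
    RespondAccepted,
    /// Register a reply inbox first, then wait for the dispatcher's result.
    AwaitReply,
}

/// Pick the handler path for a matched route.
pub fn action_for(mode: DispatchMode) -> IngressAction {
    match mode {
        DispatchMode::Async => IngressAction::RespondAccepted,
        DispatchMode::Sync => IngressAction::AwaitReply,
    }
}

/// Assumed fallback for routes that predate the column: no stored mode
/// means `Sync`.
pub fn mode_or_default(stored: Option<DispatchMode>) -> DispatchMode {
    stored.unwrap_or_default()
}
```

Defaulting to `Sync` keeps pre-existing routes behaving as plain request/response endpoints after the upgrade.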


@@ -11,22 +11,28 @@ use axum::{routing::get, Json, Router};
use picloud_executor_core::{Engine, Limits}; use picloud_executor_core::{Engine, Limits};
use picloud_manager_core::{ use picloud_manager_core::{
admin_router, admins_router, api_keys_router, app_members_router, apps_api, apps_router, admin_router, admins_router, api_keys_router, app_members_router, apps_api, apps_router,
attach_principal_if_present, auth_router, compile_routes, migrations, require_authenticated, attach_principal_if_present, auth_router, compile_routes, dead_letters_router, migrations,
route_admin_router, AdminSessionRepository, AdminState, AdminUserRepository, AdminsState, require_authenticated, route_admin_router, triggers_router, AbandonedRepo,
AdminPrincipalResolver, AdminSessionRepository, AdminState, AdminUserRepository, AdminsState,
ApiKeyRepository, ApiKeysState, AppDomainRepository, AppMembersRepository, AppMembersState, ApiKeyRepository, ApiKeysState, AppDomainRepository, AppMembersRepository, AppMembersState,
AppRepository, AppsState, AuthState, AuthzRepo, PostgresAdminSessionRepository, AppRepository, AppsState, AuthState, AuthzRepo, DeadLetterRepo, DeadLettersState, Dispatcher,
PostgresAdminUserRepository, PostgresApiKeyRepository, PostgresAppDomainRepository, KvServiceImpl, OutboxEventEmitter, OutboxRepo, PostgresAbandonedRepo,
PostgresAppMembersRepository, PostgresAppRepository, PostgresExecutionLogRepository, PostgresAdminSessionRepository, PostgresAdminUserRepository, PostgresApiKeyRepository,
PostgresExecutionLogSink, PostgresRouteRepository, PostgresScriptRepository, RepoResolver, PostgresAppDomainRepository, PostgresAppMembersRepository, PostgresAppRepository,
RouteAdminState, RouteRepository, SandboxCeiling, PostgresDeadLetterRepo, PostgresDeadLetterService, PostgresExecutionLogRepository,
PostgresExecutionLogSink, PostgresKvRepo, PostgresOutboxRepo, PostgresRouteRepository,
PostgresScriptRepository, PostgresTriggerRepo, PrincipalResolver, RepoResolver,
RouteAdminState, RouteRepository, SandboxCeiling, ScriptRepository, TriggerConfig, TriggerRepo,
TriggersState,
}; };
use picloud_orchestrator_core::routing::{AppDomainTable, RouteTable}; use picloud_orchestrator_core::routing::{AppDomainTable, RouteTable};
use picloud_orchestrator_core::{ use picloud_orchestrator_core::{
data_plane_router, user_routes_router, DataPlaneState, ExecutionGate, LocalExecutorClient, data_plane_router, user_routes_router, DataPlaneState, ExecutionGate, InboxRegistry,
LocalExecutorClient,
}; };
use picloud_shared::{ use picloud_shared::{
ExecutionLogSink, ScriptValidator, Services, API_VERSION, PRODUCT_VERSION, SDK_VERSION, DeadLetterService, ExecutionLogSink, InboxResolver, KvService, OutboxWriter, ScriptValidator,
WIRE_VERSION, ServiceEventEmitter, Services, API_VERSION, PRODUCT_VERSION, SDK_VERSION, WIRE_VERSION,
}; };
use sqlx::postgres::PgPoolOptions; use sqlx::postgres::PgPoolOptions;
use sqlx::PgPool; use sqlx::PgPool;
@@ -83,10 +89,6 @@ fn read_session_ttl() -> Duration {
/// `/version`) stays open — it's the public ingress for user scripts. /// `/version`) stays open — it's the public ingress for user scripts.
#[allow(clippy::too_many_lines)] #[allow(clippy::too_many_lines)]
pub async fn build_app(pool: PgPool, auth: AuthDeps) -> anyhow::Result<Router> { pub async fn build_app(pool: PgPool, auth: AuthDeps) -> anyhow::Result<Router> {
// `Services` is the SDK service bundle. Empty in v1.1.0; the
// v1.1.1 KV PR will populate it with `kv: Arc::new(...)` here.
let engine = Arc::new(Engine::new(Limits::default(), Services::new()));
let script_repo = Arc::new(PostgresScriptRepository::new(pool.clone())); let script_repo = Arc::new(PostgresScriptRepository::new(pool.clone()));
let log_repo = Arc::new(PostgresExecutionLogRepository::new(pool.clone())); let log_repo = Arc::new(PostgresExecutionLogRepository::new(pool.clone()));
let log_sink: Arc<dyn ExecutionLogSink> = Arc::new(PostgresExecutionLogSink::new(pool.clone())); let log_sink: Arc<dyn ExecutionLogSink> = Arc::new(PostgresExecutionLogSink::new(pool.clone()));
@@ -98,10 +100,43 @@ pub async fn build_app(pool: PgPool, auth: AuthDeps) -> anyhow::Result<Router> {
// (CRUD over the table) and `AuthzRepo` (single-row membership lookup // (CRUD over the table) and `AuthzRepo` (single-row membership lookup
// for capability checks). Construct it once and clone the Arc into // for capability checks). Construct it once and clone the Arc into
// both trait views — same allocation, two vtables. // both trait views — same allocation, two vtables.
let members_concrete = Arc::new(PostgresAppMembersRepository::new(pool)); let members_concrete = Arc::new(PostgresAppMembersRepository::new(pool.clone()));
let members: Arc<dyn AppMembersRepository> = members_concrete.clone(); let members: Arc<dyn AppMembersRepository> = members_concrete.clone();
let authz: Arc<dyn AuthzRepo> = members_concrete; let authz: Arc<dyn AuthzRepo> = members_concrete;
// Triggers framework storage. The outbox event emitter routes
// KV mutations into the outbox; the dispatcher fans them out.
let trigger_repo: Arc<dyn TriggerRepo> = Arc::new(PostgresTriggerRepo::new(pool.clone()));
// PostgresOutboxRepo implements both `OutboxRepo` (the dispatcher
// surface) and `OutboxWriter` (the orchestrator surface). Construct
// the concrete Arc once, clone it into each trait view — same
// allocation, two vtables (mirrors how `members_concrete` above is
// used as both `AppMembersRepository` and `AuthzRepo`).
let outbox_concrete = Arc::new(PostgresOutboxRepo::new(pool.clone()));
let outbox_repo: Arc<dyn OutboxRepo> = outbox_concrete.clone();
let outbox_writer: Arc<dyn OutboxWriter> = outbox_concrete;
let dl_repo: Arc<dyn DeadLetterRepo> = Arc::new(PostgresDeadLetterRepo::new(pool.clone()));
let abandoned_repo: Arc<dyn AbandonedRepo> = Arc::new(PostgresAbandonedRepo::new(pool.clone()));
let trigger_config = TriggerConfig::from_env();
// SDK services bundle. v1.1.1 ships the KV store + the
// outbox-backed event emitter + the dead-letter service (replay /
// resolve).
let kv_repo = Arc::new(PostgresKvRepo::new(pool));
let events: Arc<dyn ServiceEventEmitter> = Arc::new(OutboxEventEmitter::new(
trigger_repo.clone(),
outbox_repo.clone(),
));
let kv: Arc<dyn KvService> =
Arc::new(KvServiceImpl::new(kv_repo, authz.clone(), events.clone()));
let dl_service: Arc<dyn DeadLetterService> = Arc::new(PostgresDeadLetterService::new(
dl_repo.clone(),
outbox_repo.clone(),
authz.clone(),
));
let services = Services::new(kv, dl_service.clone(), events);
let engine = Arc::new(Engine::new(Limits::default(), services));
// Compile the routes table once at startup; admin writes refresh it.
let route_table = Arc::new(RouteTable::new());
let initial = route_repo.list_all().await?;
@@ -132,7 +167,34 @@ pub async fn build_app(pool: PgPool, auth: AuthDeps) -> anyhow::Result<Router> {
// Single global gate — overflow is rejected with 503 + Retry-After.
// See `ExecutionGate` docs and `PICLOUD_MAX_CONCURRENT_EXECUTIONS`.
let gate = Arc::new(ExecutionGate::from_env());
let executor = Arc::new(LocalExecutorClient::new(engine.clone(), gate)); let executor = Arc::new(LocalExecutorClient::new(engine.clone(), gate.clone()));
// Dispatcher — single tokio task that polls the outbox and routes
// due rows to the executor. Shares the `ExecutionGate` with sync
// HTTP per design notes §2 (one cap for everything).
let dispatcher_script_repo: Arc<dyn ScriptRepository> =
Arc::new(PostgresScriptRepoHandle(script_repo.clone()));
let principals: Arc<dyn PrincipalResolver> =
Arc::new(AdminPrincipalResolver::new(auth.users.clone()));
// The InboxRegistry is constructed once and shared between the
// orchestrator (registers receivers, awaits) and the dispatcher
// (delivers results). Two Arc views on the same allocation.
let inbox_registry = Arc::new(InboxRegistry::new());
let inbox_resolver: Arc<dyn InboxResolver> = inbox_registry.clone();
Dispatcher {
outbox: outbox_repo.clone(),
triggers: trigger_repo.clone(),
scripts: dispatcher_script_repo,
dead_letters: dl_repo.clone(),
abandoned: abandoned_repo.clone(),
principals,
executor: executor.clone(),
gate,
inbox: inbox_resolver,
config: trigger_config,
instance_id: format!("picloud-{}", std::process::id()),
}
.spawn();
let admin = AdminState {
repo: Arc::new(PostgresScriptRepoHandle(script_repo.clone())),
@@ -155,6 +217,30 @@ pub async fn build_app(pool: PgPool, auth: AuthDeps) -> anyhow::Result<Router> {
log_sink,
app_domains: app_domain_table.clone(),
routes: route_table,
inbox: inbox_registry,
outbox: outbox_writer,
};
// Weekly retention sweepers for dead_letters + abandoned_executions.
// Defaults: 30 days / 7 days (design notes §3 #9 + §4 retention).
picloud_manager_core::spawn_dead_letter_gc(
dl_repo.clone(),
trigger_config.dead_letter_retention_days,
);
picloud_manager_core::spawn_abandoned_gc(
abandoned_repo.clone(),
trigger_config.abandoned_retention_days,
);
let triggers_state = TriggersState {
triggers: trigger_repo,
apps: apps_repo.clone(),
authz: authz.clone(),
config: trigger_config,
};
let dead_letters_state = DeadLettersState {
repo: dl_repo,
service: dl_service,
apps: apps_repo.clone(),
authz: authz.clone(),
};
let apps_state = AppsState {
apps: apps_repo,
@@ -197,6 +283,8 @@ pub async fn build_app(pool: PgPool, auth: AuthDeps) -> anyhow::Result<Router> {
.merge(apps_router(apps_state))
.merge(app_members_router(app_members_state))
.merge(api_keys_router(api_keys_state))
.merge(triggers_router(triggers_state))
.merge(dead_letters_router(dead_letters_state))
.layer(from_fn_with_state(
auth_state.clone(),
require_authenticated,
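
The "same allocation, two vtables" comments in `build_app` describe one concrete `Arc` being coerced into two different trait objects. A minimal, std-only sketch of that pattern follows; `Reader`, `Writer`, and `Repo` are hypothetical stand-ins, not types from this branch.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the two trait views of one repository.
trait Reader {
    fn get(&self) -> u32;
}

trait Writer {
    fn describe(&self) -> String;
}

struct Repo {
    value: u32,
}

impl Reader for Repo {
    fn get(&self) -> u32 {
        self.value
    }
}

impl Writer for Repo {
    fn describe(&self) -> String {
        format!("repo({})", self.value)
    }
}

/// Build both trait-object views from one concrete allocation.
fn two_views(value: u32) -> (Arc<dyn Reader>, Arc<dyn Writer>) {
    let concrete = Arc::new(Repo { value });
    // `.clone()` only bumps the refcount; the unsized coercion to
    // `Arc<dyn Reader>` attaches the `Reader` vtable to the same pointer.
    let reader: Arc<dyn Reader> = concrete.clone();
    // Moving the original handle attaches the `Writer` vtable instead.
    let writer: Arc<dyn Writer> = concrete;
    (reader, writer)
}

/// True when both handles point at the same heap allocation.
fn same_allocation(r: &Arc<dyn Reader>, w: &Arc<dyn Writer>) -> bool {
    // Casting a fat pointer to `*const ()` keeps only the data address.
    (Arc::as_ptr(r) as *const ()) == (Arc::as_ptr(w) as *const ())
}
```

Calling `two_views(7)` and passing both handles to `same_allocation` should confirm that no second `Repo` is created, which is the property the `members_concrete` and `outbox_concrete` wiring relies on.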


@@ -0,0 +1,118 @@
//! `DeadLetterService` — Rhai SDK contract for replaying and resolving
//! dead letters. Surface kept intentionally narrow for v1.1.1 (no
//! `list` — deferred to v1.2 per `docs/v1.1.x-design-notes.md` §4).
//!
//! Both methods are gated by `Capability::AppDeadLetterManage(AppId)`
//! evaluated inside the impl. Public-HTTP scripts running with
//! `cx.principal = None` will fail the check, which matches the
//! design's expectation (managing dead letters is an admin act).
use async_trait::async_trait;
use serde::{Deserialize, Serialize};
use thiserror::Error;
use uuid::Uuid;
use crate::SdkCallCx;
/// Opaque identifier for a `dead_letters` row.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
#[serde(transparent)]
pub struct DeadLetterId(pub Uuid);
impl DeadLetterId {
#[must_use]
pub fn new() -> Self {
Self(Uuid::new_v4())
}
#[must_use]
pub fn into_inner(self) -> Uuid {
self.0
}
}
impl Default for DeadLetterId {
fn default() -> Self {
Self::new()
}
}
impl From<Uuid> for DeadLetterId {
fn from(u: Uuid) -> Self {
Self(u)
}
}
impl From<DeadLetterId> for Uuid {
fn from(id: DeadLetterId) -> Self {
id.0
}
}
impl std::fmt::Display for DeadLetterId {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
self.0.fmt(f)
}
}
#[async_trait]
pub trait DeadLetterService: Send + Sync {
/// Re-enqueue the original event into the outbox. The dead-letter
/// row is marked `resolution = 'replayed'` regardless of whether
/// the retry ultimately succeeds.
async fn replay(&self, cx: &SdkCallCx, id: DeadLetterId) -> Result<(), DeadLetterError>;
/// Mark the row resolved with the given reason (typically
/// `"ignored"` from the dashboard or `"handled_by_script"` from
/// inside a `dead_letter` trigger handler).
async fn resolve(
&self,
cx: &SdkCallCx,
id: DeadLetterId,
reason: &str,
) -> Result<(), DeadLetterError>;
}
#[derive(Debug, Error)]
pub enum DeadLetterError {
#[error("dead-letter row not found")]
NotFound,
#[error("forbidden")]
Forbidden,
#[error("invalid resolution reason: {0}")]
InvalidResolution(String),
#[error("dead-letter backend error: {0}")]
Backend(String),
}
/// Stub used to bootstrap the `Services` bundle before the real
/// Postgres-backed implementation lands. Behaves like
/// `NoopEventEmitter` — every call returns `Backend("...")` so scripts
/// see a clear "not yet implemented" error rather than silently
/// no-op'ing. Replaced by `PostgresDeadLetterService` in the v1.1.1
/// dead-letter PR.
#[derive(Debug, Default, Clone, Copy)]
pub struct NoopDeadLetterService;
#[async_trait]
impl DeadLetterService for NoopDeadLetterService {
async fn replay(&self, _cx: &SdkCallCx, _id: DeadLetterId) -> Result<(), DeadLetterError> {
Err(DeadLetterError::Backend(
"dead_letters::replay is not yet wired in".into(),
))
}
async fn resolve(
&self,
_cx: &SdkCallCx,
_id: DeadLetterId,
_reason: &str,
) -> Result<(), DeadLetterError> {
Err(DeadLetterError::Backend(
"dead_letters::resolve is not yet wired in".into(),
))
}
}
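
To make the `replay` / `resolve` contract concrete, here is a rough in-memory sketch. It is synchronous, keys rows by `u64` instead of `DeadLetterId`, and omits the capability check; the set of accepted `resolve` reasons is an assumption taken from the doc comment on `resolve`, not from the real Postgres implementation.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
enum DeadLetterError {
    NotFound,
    InvalidResolution(String),
}

/// In-memory stand-in: row id -> resolution (`None` = unresolved).
#[derive(Default)]
struct InMemoryDeadLetters {
    rows: HashMap<u64, Option<String>>,
    /// Ids re-enqueued by `replay`; stands in for the outbox table.
    outbox: Vec<u64>,
}

impl InMemoryDeadLetters {
    fn insert_unresolved(&mut self, id: u64) {
        self.rows.insert(id, None);
    }

    /// Re-enqueue the event and mark the row `replayed` immediately,
    /// regardless of how the retry later turns out.
    fn replay(&mut self, id: u64) -> Result<(), DeadLetterError> {
        let row = self.rows.get_mut(&id).ok_or(DeadLetterError::NotFound)?;
        *row = Some("replayed".to_string());
        self.outbox.push(id);
        Ok(())
    }

    /// Mark the row resolved with a caller-supplied reason.
    fn resolve(&mut self, id: u64, reason: &str) -> Result<(), DeadLetterError> {
        // Assumption: only these two caller-supplied reasons are accepted.
        if !matches!(reason, "ignored" | "handled_by_script") {
            return Err(DeadLetterError::InvalidResolution(reason.to_string()));
        }
        let row = self.rows.get_mut(&id).ok_or(DeadLetterError::NotFound)?;
        *row = Some(reason.to_string());
        Ok(())
    }

    fn resolution(&self, id: u64) -> Option<&str> {
        self.rows.get(&id).and_then(|r| r.as_deref())
    }
}
```

In this sketch an invalid reason is rejected before the row lookup, so a bad `reason` never changes state; whether the real service checks in that order is not stated in the diff.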


@@ -0,0 +1,16 @@
//! `ExecResponseSummary` — a flattened, crate-portable view of an
//! `ExecResponse` for use by `InboxResult`. Lives in
//! `picloud-shared` because the dispatcher (manager-core) and the
//! orchestrator-core inbox registry both need to read it, and
//! `executor-core::ExecResponse` is owned by a leaf crate.
use std::collections::BTreeMap;
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExecResponseSummary {
pub status_code: u16,
pub headers: BTreeMap<String, String>,
pub body: serde_json::Value,
}


@@ -53,3 +53,4 @@ id_type!(RequestId);
id_type!(AdminUserId);
id_type!(AppId);
id_type!(ApiKeyId);
id_type!(TriggerId);


@@ -0,0 +1,86 @@
//! `InboxResolver` — abstraction the dispatcher uses to deliver sync
//! HTTP results back to the orchestrator that's awaiting them on a
//! oneshot channel. Lives in `picloud-shared` because the dispatcher
//! (manager-core) and the registry impl (orchestrator-core) live in
//! different crates and need a shared trait surface.
//!
//! v1.1.1 ships an in-process implementation in `orchestrator-core`
//! that keeps a `HashMap<inbox_id, oneshot::Sender<...>>`. Cluster
//! mode (v1.3+) swaps this for a Postgres `LISTEN/NOTIFY`-based
//! resolver without touching the dispatcher code (design notes §3
//! implementation table).
//!
//! Until commit 6 wires up the real registry, `NoopInboxResolver`
//! (`Abandoned` for every attempt) keeps the dispatcher able to run.
use async_trait::async_trait;
use uuid::Uuid;
use crate::ExecResponseSummary;
/// Result of trying to hand back a sync-HTTP outcome.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InboxDeliveryOutcome {
/// Receiver still attached; result was delivered. Dispatcher
/// deletes the outbox row.
Delivered,
/// Receiver was dropped (orchestrator timed out). Dispatcher
/// writes an `abandoned_executions` row.
Abandoned,
}
/// Outcome shape the dispatcher delivers to the inbox. Carries enough
/// to reconstruct an HTTP response — full body via JSON, optional
/// error string when the executor reported a failure.
#[derive(Debug, Clone)]
pub enum InboxResult {
/// Successful execution. `response` is the `ExecResponse` summary
/// (status code + body + headers + logs).
Success(ExecResponseSummary),
/// Failure modes — script threw, op-budget, timeout, etc. The
/// orchestrator maps these to the design-notes §3 status codes
/// (422/502/503/504/507/500) when responding to the HTTP caller.
Failure {
kind: InboxFailureKind,
message: String,
},
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InboxFailureKind {
/// Script's Rhai code threw or hit a runtime error → 502.
Runtime,
/// Wall-clock exceeded → 504.
Timeout,
/// Operation budget exceeded → 507.
OperationBudget,
/// Gate refused admission → 503.
Overloaded,
/// Script parse failure / bad-request → 422.
Validation,
/// Platform problem (executor crashed, dispatcher crashed, etc.) → 500.
Platform,
}
#[async_trait]
pub trait InboxResolver: Send + Sync {
/// Attempt to deliver `result` to the receiver registered under
/// `inbox_id`. Returns `Delivered` if the channel was alive,
/// `Abandoned` if the receiver was already dropped (the
/// orchestrator's timeout fired before the dispatcher got here).
async fn deliver(&self, inbox_id: Uuid, result: InboxResult) -> InboxDeliveryOutcome;
}
/// Bootstrap impl used before the real registry is wired in. Every
/// delivery is treated as abandoned — the dispatcher records an
/// abandoned-execution row and moves on. Replaced in `build_app` with
/// the in-process `InboxRegistry` from orchestrator-core.
#[derive(Debug, Default, Clone, Copy)]
pub struct NoopInboxResolver;
#[async_trait]
impl InboxResolver for NoopInboxResolver {
async fn deliver(&self, _inbox_id: Uuid, _result: InboxResult) -> InboxDeliveryOutcome {
InboxDeliveryOutcome::Abandoned
}
}
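
The delivery contract above (`Delivered` while the receiver is alive, `Abandoned` once it has been dropped) can be sketched with the standard library alone. This is not the `InboxRegistry` from orchestrator-core: it uses `std::sync::mpsc` in place of a oneshot channel, `u64` ids in place of `Uuid`, a `String` payload in place of `InboxResult`, and it treats an unknown inbox id as `Abandoned`, which is an assumption. The status mapping is copied from the variant comments on `InboxFailureKind`.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Delivered,
    Abandoned,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FailureKind {
    Runtime,
    Timeout,
    OperationBudget,
    Overloaded,
    Validation,
    Platform,
}

/// HTTP status per failure kind, as listed in the variant comments.
fn status_for(kind: FailureKind) -> u16 {
    match kind {
        FailureKind::Runtime => 502,
        FailureKind::Timeout => 504,
        FailureKind::OperationBudget => 507,
        FailureKind::Overloaded => 503,
        FailureKind::Validation => 422,
        FailureKind::Platform => 500,
    }
}

/// Hypothetical registry: inbox id -> sender half of a reply channel.
#[derive(Default)]
struct Registry {
    waiting: Mutex<HashMap<u64, Sender<String>>>,
}

impl Registry {
    /// Called by the awaiting side before it enqueues the outbox row.
    fn register(&self, inbox_id: u64) -> Receiver<String> {
        let (tx, rx) = channel();
        self.waiting.lock().unwrap().insert(inbox_id, tx);
        rx
    }

    /// Called by the dispatcher once the execution has finished.
    fn deliver(&self, inbox_id: u64, result: String) -> Outcome {
        // Remove first so each inbox id is delivered at most once.
        let sender = self.waiting.lock().unwrap().remove(&inbox_id);
        match sender {
            // `send` fails only when the receiver was already dropped.
            Some(tx) => match tx.send(result) {
                Ok(()) => Outcome::Delivered,
                Err(_) => Outcome::Abandoned,
            },
            None => Outcome::Abandoned,
        }
    }
}
```

Because the sender is removed from the map before sending, a second `deliver` for the same id also reports `Abandoned` in this sketch.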

140
crates/shared/src/kv.rs Normal file

@@ -0,0 +1,140 @@
//! `KvService` — the v1.1.1 key-value store contract.
//!
//! Lives in `picloud-shared` (not `executor-core`) so the Rhai bridge,
//! the manager-core Postgres impl, and any future in-memory test impl
//! can all depend on the same trait without dragging
//! `executor-core` into `manager-core`'s dep graph.
//!
//! Implementations MUST derive every storage `app_id` from `cx.app_id`
//! — never from a script-passed argument. That is the cross-app
//! isolation boundary; see `docs/sdk-shape.md`.
use async_trait::async_trait;
use thiserror::Error;
use crate::SdkCallCx;
/// `KvService` is collection-scoped. Scripts get a handle via
/// `kv::collection(name)` and call `get`/`set`/`has`/`delete`/`list`
/// on it. The trait surface accepts the collection by name so the
/// Postgres impl can avoid an extra round-trip to materialize the
/// collection (collections are namespaces, not first-class rows).
#[async_trait]
pub trait KvService: Send + Sync {
async fn get(
&self,
cx: &SdkCallCx,
collection: &str,
key: &str,
) -> Result<Option<serde_json::Value>, KvError>;
async fn set(
&self,
cx: &SdkCallCx,
collection: &str,
key: &str,
value: serde_json::Value,
) -> Result<(), KvError>;
async fn delete(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Result<bool, KvError>;
async fn has(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Result<bool, KvError>;
/// Cursor-style pagination. `cursor` is opaque to the caller;
/// implementations encode the resume key inside. `None` cursor
/// starts from the beginning. Implementations cap `limit` at a
/// reasonable ceiling internally (script can't request an unbounded
/// page).
async fn list(
&self,
cx: &SdkCallCx,
collection: &str,
cursor: Option<&str>,
limit: u32,
) -> Result<KvListPage, KvError>;
}
/// One page of keys from `KvService::list`. `next_cursor` is `Some`
/// when more pages exist, `None` when exhausted. The cursor encoding
/// is implementation-defined (the Postgres impl base64-encodes the
/// last key).
#[derive(Debug, Clone)]
pub struct KvListPage {
pub keys: Vec<String>,
pub next_cursor: Option<String>,
}
/// Stub used by the test harness so executor-core integration tests
/// (which don't touch KV) can construct a `Services` bundle without
/// spinning up Postgres. Every call returns
/// `KvError::Backend("...")` so accidental KV use surfaces clearly.
#[derive(Debug, Default, Clone, Copy)]
pub struct NoopKvService;
#[async_trait]
impl KvService for NoopKvService {
async fn get(
&self,
_cx: &SdkCallCx,
_collection: &str,
_key: &str,
) -> Result<Option<serde_json::Value>, KvError> {
Err(KvError::Backend("kv is not wired in".into()))
}
async fn set(
&self,
_cx: &SdkCallCx,
_collection: &str,
_key: &str,
_value: serde_json::Value,
) -> Result<(), KvError> {
Err(KvError::Backend("kv is not wired in".into()))
}
async fn delete(
&self,
_cx: &SdkCallCx,
_collection: &str,
_key: &str,
) -> Result<bool, KvError> {
Err(KvError::Backend("kv is not wired in".into()))
}
async fn has(&self, _cx: &SdkCallCx, _collection: &str, _key: &str) -> Result<bool, KvError> {
Err(KvError::Backend("kv is not wired in".into()))
}
async fn list(
&self,
_cx: &SdkCallCx,
_collection: &str,
_cursor: Option<&str>,
_limit: u32,
) -> Result<KvListPage, KvError> {
Err(KvError::Backend("kv is not wired in".into()))
}
}
/// Failure modes surfaced to the Rhai bridge. The bridge converts each
/// to a Rhai runtime error string; the discriminants exist so internal
/// callers (admin endpoints, tests, GC) can react more precisely.
#[derive(Debug, Error)]
pub enum KvError {
/// Empty collection name; rejected at the SDK boundary per
/// `docs/sdk-shape.md`.
#[error("collection name must not be empty")]
InvalidCollection,
/// Caller principal lacked the required capability. Only raised
/// when `cx.principal.is_some()` — scripts running with
/// `principal: None` (public HTTP) operate under script-as-gate
/// semantics and skip the capability check.
#[error("forbidden")]
Forbidden,
/// Anything else — Postgres unavailable, serialization failure,
/// etc. The string is safe to surface to a script.
#[error("kv backend error: {0}")]
Backend(String),
}
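
The cursor-style `list` contract can be illustrated with an ordered in-memory map. This is only a sketch: the cursor here is the raw last key rather than the base64 encoding the Postgres impl is described as using, and `MAX_PAGE` is an assumed ceiling, not a value from this branch.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Assumed upper bound on page size; the real ceiling is not in the diff.
const MAX_PAGE: u32 = 100;

#[derive(Debug)]
struct Page {
    keys: Vec<String>,
    next_cursor: Option<String>,
}

/// Helper for building a store with empty values.
fn store_from(keys: &[&str]) -> BTreeMap<String, String> {
    keys.iter().map(|k| (k.to_string(), String::new())).collect()
}

/// Return up to `limit` keys strictly after `cursor`, in key order.
fn list_page(store: &BTreeMap<String, String>, cursor: Option<&str>, limit: u32) -> Page {
    let limit = limit.clamp(1, MAX_PAGE) as usize;
    let lower: Bound<String> = match cursor {
        // Resume strictly after the last key of the previous page.
        Some(last) => Bound::Excluded(last.to_string()),
        None => Bound::Unbounded,
    };
    // Fetch one extra key to learn whether another page exists.
    let mut keys: Vec<String> = store
        .range::<String, _>((lower, Bound::Unbounded))
        .take(limit + 1)
        .map(|(k, _)| k.clone())
        .collect();
    let next_cursor = if keys.len() > limit {
        keys.truncate(limit);
        keys.last().cloned()
    } else {
        None
    };
    Page { keys, next_cursor }
}
```

Fetching `limit + 1` rows avoids returning a cursor that leads to an empty final page; a SQL-backed version could use the same trick with `LIMIT $n + 1`.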


@@ -6,30 +6,44 @@
pub mod app;
pub mod auth;
pub mod dead_letters;
pub mod error;
pub mod events;
pub mod exec_summary;
pub mod execution_log;
pub mod ids;
pub mod inbox;
pub mod kv;
pub mod log_sink;
pub mod outbox_writer;
pub mod route;
pub mod sandbox;
pub mod script;
pub mod sdk_cx;
pub mod services;
pub mod trigger_event;
pub mod validator;
pub mod version;
pub use app::{App, AppDomain, DomainShape};
pub use auth::{AppRole, InstanceRole, Principal, Scope, UserId};
pub use dead_letters::{DeadLetterError, DeadLetterId, DeadLetterService, NoopDeadLetterService};
pub use error::Error;
pub use events::{EmitError, NoopEventEmitter, ServiceEvent, ServiceEventEmitter};
pub use exec_summary::ExecResponseSummary;
pub use execution_log::{ExecutionLog, ExecutionStatus};
pub use ids::{AdminUserId, ApiKeyId, AppId, ExecutionId, RequestId, ScriptId}; pub use ids::{AdminUserId, ApiKeyId, AppId, ExecutionId, RequestId, ScriptId, TriggerId};
pub use inbox::{
InboxDeliveryOutcome, InboxFailureKind, InboxResolver, InboxResult, NoopInboxResolver,
};
pub use kv::{KvError, KvListPage, KvService, NoopKvService};
pub use log_sink::{ExecutionLogSink, LogSinkError};
pub use route::{HostKind, PathKind, Route}; pub use outbox_writer::{HttpDispatchPayload, NewHttpOutbox, OutboxWriter, OutboxWriterError};
pub use route::{DispatchMode, HostKind, PathKind, Route};
pub use sandbox::ScriptSandbox;
pub use script::Script;
pub use sdk_cx::SdkCallCx;
pub use services::Services;
pub use trigger_event::{DeadLetterEventDetail, KvEventOp, TriggerEvent};
pub use validator::{ScriptValidator, ValidationError};
pub use version::{API_VERSION, PRODUCT_VERSION, SDK_VERSION, WIRE_VERSION};


@@ -0,0 +1,72 @@
//! `OutboxWriter` — minimal trait the orchestrator-core sync-HTTP path
//! uses to enqueue rows into the universal trigger outbox. The
//! manager-core `PostgresOutboxRepo` implements this in addition to
//! its richer `OutboxRepo` surface; defining it here lets
//! orchestrator-core depend on the trait without pulling in
//! manager-core (which would invert the dependency arrow).
use async_trait::async_trait;
use serde::{Deserialize, Serialize};
use thiserror::Error;
use uuid::Uuid;
use crate::{AdminUserId, AppId, ExecutionId, ScriptId};
/// What the orchestrator hands to the outbox when it ingests an HTTP
/// request. Carries enough for the dispatcher to reconstruct the
/// `ExecRequest` end-to-end.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NewHttpOutbox {
pub app_id: AppId,
/// `routes.id` of the matched route. Discriminated against
/// `triggers.id` by `source_kind = 'http'` on the outbox row.
pub route_id: Uuid,
/// Pre-resolved script so the dispatcher doesn't re-look it up.
pub script_id: ScriptId,
/// `Some(inbox_id)` for sync HTTP (the orchestrator awaits a
/// channel keyed on this id). `None` for `dispatch_mode = async`
/// — dispatcher fires-and-forgets, no reply path.
pub reply_to: Option<Uuid>,
/// Serialized `HttpDispatchPayload` (defined below) — everything
/// the dispatcher needs to reconstruct an `ExecRequest`.
pub payload: serde_json::Value,
/// The principal that ingressed the HTTP request (Some when
/// authenticated, None for public). Forensic only; the script
/// executes as the route's app principal model, not this.
pub origin_principal: Option<AdminUserId>,
/// `0` for direct HTTP ingress; the dispatcher will increment
/// for any further fan-out triggered by the script.
pub trigger_depth: u32,
pub root_execution_id: Option<ExecutionId>,
}
/// The shape the orchestrator serializes into `NewHttpOutbox.payload`
/// (the JSONB column). Mirrored on the dispatcher side so it can
/// rebuild an `ExecRequest`.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct HttpDispatchPayload {
pub script_name: String,
pub path: String,
pub method: String,
pub headers: std::collections::BTreeMap<String, String>,
pub body: serde_json::Value,
pub params: std::collections::BTreeMap<String, String>,
pub query: std::collections::BTreeMap<String, String>,
pub rest: String,
pub timeout_seconds: u32,
}
#[async_trait]
pub trait OutboxWriter: Send + Sync {
/// Insert a sync- or async-HTTP outbox row. Returns the row's id
/// — the orchestrator stores it locally for forensics and to
/// correlate `abandoned_executions` rows when the dispatcher's
/// inbox delivery fails.
async fn enqueue_http(&self, row: NewHttpOutbox) -> Result<Uuid, OutboxWriterError>;
}
#[derive(Debug, Error)]
pub enum OutboxWriterError {
#[error("outbox write failed: {0}")]
Backend(String),
}


@@ -37,6 +37,38 @@ pub enum PathKind {
Param,
}
/// Per-route dispatch mode (v1.1.1). `Sync` = orchestrator awaits the
/// executor and returns the response in the same HTTP request. `Async`
/// = orchestrator writes the request to the trigger outbox, returns
/// `202 Accepted` immediately, and the dispatcher runs the script in
/// the background (with retries + dead-letter).
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq, Default)]
#[serde(rename_all = "lowercase")]
pub enum DispatchMode {
#[default]
Sync,
Async,
}
impl DispatchMode {
#[must_use]
pub const fn as_str(self) -> &'static str {
match self {
Self::Sync => "sync",
Self::Async => "async",
}
}
#[must_use]
pub fn from_wire(s: &str) -> Option<Self> {
match s {
"sync" => Some(Self::Sync),
"async" => Some(Self::Async),
_ => None,
}
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Route {
pub id: Uuid,
@@ -60,5 +92,12 @@ pub struct Route {
/// `None` = any method.
pub method: Option<String>,
/// v1.1.1: per-route dispatch mode. `Sync` (default) → orchestrator
/// awaits the executor inline. `Async` → orchestrator writes to
/// the outbox + returns `202 Accepted`; dispatcher fires the
/// script in the background with retries.
#[serde(default)]
pub dispatch_mode: DispatchMode,
pub created_at: DateTime<Utc>,
}
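
How `dispatch_mode` might shape the outbox row can be sketched as below, combining the `DispatchMode` docs with the `reply_to` comment on `NewHttpOutbox` (`Some(inbox_id)` for sync, `None` for async). `OutboxRow` and `enqueue_plan` are hypothetical names, and the real orchestrator code is not part of this excerpt.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum DispatchMode {
    #[default]
    Sync,
    Async,
}

/// Same wire strings as `DispatchMode::from_wire` in the diff.
fn from_wire(s: &str) -> Option<DispatchMode> {
    match s {
        "sync" => Some(DispatchMode::Sync),
        "async" => Some(DispatchMode::Async),
        _ => None,
    }
}

/// Hypothetical, trimmed-down outbox row.
#[derive(Debug, PartialEq, Eq)]
struct OutboxRow {
    reply_to: Option<u64>,
}

/// Decide the row's reply path and whether the HTTP caller gets an
/// immediate status (`Some(202)`) or has to await the inbox (`None`).
fn enqueue_plan(mode: DispatchMode, inbox_id: u64) -> (OutboxRow, Option<u16>) {
    match mode {
        // Sync: the caller awaits a channel keyed on `inbox_id`.
        DispatchMode::Sync => (OutboxRow { reply_to: Some(inbox_id) }, None),
        // Async: fire-and-forget, respond 202 Accepted right away.
        DispatchMode::Async => (OutboxRow { reply_to: None }, Some(202)),
    }
}
```

The `#[default]` on `Sync` mirrors the `#[serde(default)]` on `Route::dispatch_mode`, so rows written before v1.1.1 keep their inline behavior.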


@@ -12,7 +12,7 @@
//! the cx in is shared by both sides. Pure value type — no handles, no
//! DB pool references, no allocations beyond what's in `Principal`.
use crate::{AppId, ExecutionId, Principal, RequestId}; use crate::{AppId, ExecutionId, Principal, RequestId, TriggerEvent};
/// Per-invocation context for every stateful SDK service call.
///
@@ -51,4 +51,19 @@ pub struct SdkCallCx {
/// `execution_id` of the original ingress execution. Lets the audit
/// log group every fan-out execution under the originating event.
pub root_execution_id: ExecutionId,
/// `true` only when this invocation is a `dead_letter` trigger
/// handler. Set by the dispatcher when it picks an outbox row
/// whose trigger has `kind = 'dead_letter'`. The retry / dead-
/// letter machinery short-circuits when this is set: handlers
/// execute once, with no retry, and a failed run can NEVER be
/// dead-lettered itself (design notes §4 recursion-stop rule).
/// `false` for every other invocation, including the script
/// being used as a non-DL trigger handler.
pub is_dead_letter_handler: bool,
/// The event that fired this script, when it's a triggered
/// invocation. `None` for direct ingress (HTTP request, manual
/// run). Surfaced to scripts as `ctx.event`.
pub event: Option<TriggerEvent>,
}
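
The recursion-stop rule described on `is_dead_letter_handler` amounts to a small decision function. The sketch below is one way to express it; `OnFailure`, `on_failure`, and the `max_attempts` parameter are hypothetical, and what the dispatcher records for a failed handler run is not shown in this excerpt.

```rust
#[derive(Debug, PartialEq, Eq)]
enum OnFailure {
    /// Schedule another attempt.
    Retry,
    /// Attempts exhausted: write a dead-letter row.
    DeadLetter,
    /// Dead-letter handler failed: stop, no retry, no new dead letter.
    StopNoRetry,
}

/// Decide what happens after a failed run.
fn on_failure(is_dead_letter_handler: bool, attempts_so_far: u32, max_attempts: u32) -> OnFailure {
    if is_dead_letter_handler {
        // Handlers execute once and can never be dead-lettered themselves.
        return OnFailure::StopNoRetry;
    }
    if attempts_so_far < max_attempts {
        OnFailure::Retry
    } else {
        OnFailure::DeadLetter
    }
}
```

Checking the flag first is what prevents a failing `dead_letter` handler from generating a dead letter that would fire the same handler again.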


@@ -1,38 +1,81 @@
//! `Services` — bundle of stateful SDK service handles plumbed from the
//! host binary into every Rhai execution.
//!
//! v1.1.0 ships this struct empty. Subsequent PRs in the v1.1.x series //! Constructed once at startup in the picloud binary; cloned (cheap —
//! add one field per service: //! every field is an `Arc`) into the per-call sdk bridge so script
//! invocations don't need to re-resolve dependencies. The bundle is
//! handed to `executor-core::sdk::register_all` alongside an
//! `SdkCallCx` to wire each `::` namespace.
//! //!
//! ```ignore //! v1.1.0 shipped this empty; v1.1.1 adds the first two service fields
//! pub kv: Arc<dyn KvService>, // v1.1.1 //! (`kv`, `dead_letters`) plus the `events` emitter that bound services
//! pub docs: Arc<dyn DocsService>, // v1.1.2 //! use to publish events into the triggers outbox.
//! pub http: Arc<dyn HttpService>, // v1.1.4
//! // …
//! ```
//!
//! The bundle is cheap to clone (`Arc` per service) and is constructed
//! once at startup in the picloud binary. The executor takes it by
//! reference per invocation, hands it (alongside an `SdkCallCx`) to
//! `executor-core::sdk::register_all`, which wires the corresponding
//! Rhai `::` namespace per service.
//!
//! `#[non_exhaustive]` so adding fields is a non-breaking change for
//! consumers that only *pattern-match* a `&Services`; only crates that
//! *construct* a `Services` (in practice, just the picloud binary) need //! *construct* a `Services` (the picloud binary and tests) update.
//! to update their constructor when new services land.
use std::sync::Arc;
use crate::{
DeadLetterService, KvService, NoopDeadLetterService, NoopEventEmitter, NoopKvService,
ServiceEventEmitter,
};
/// SDK service bundle. See module docs for the lifecycle and the v1.1.x
/// expansion plan.
#[non_exhaustive]
#[derive(Default)] pub struct Services {
pub struct Services {} /// KV store (v1.1.1). Backed by Postgres in the picloud binary;
/// in-memory in tests.
pub kv: Arc<dyn KvService>,
/// Dead-letter management (v1.1.1). Scripts get
/// `dead_letters::replay(id)` and `dead_letters::resolve(id, reason)`.
pub dead_letters: Arc<dyn DeadLetterService>,
/// Event emitter for the triggers outbox. Mutating service methods
/// (`KvService::set/delete`, future `docs::*`, `files::*`, etc.)
/// call `events.emit(cx, event)` after the write succeeds. The
/// outbox-backed impl in `manager-core::outbox_event_emitter`
/// replaces v1.1.0's `NoopEventEmitter`.
pub events: Arc<dyn ServiceEventEmitter>,
}
impl Services {
/// Construct an empty bundle. Replaced by a fielded `::new(...)` /// Construct a bundle from already-constructed `Arc<dyn …>` handles.
/// once the first service (KV, v1.1.1) lands. /// The picloud binary's `main` wires this up after the DB pool is
/// open; tests build it from in-memory fakes.
#[must_use]
pub fn new() -> Self { pub fn new(
Self {} kv: Arc<dyn KvService>,
dead_letters: Arc<dyn DeadLetterService>,
events: Arc<dyn ServiceEventEmitter>,
) -> Self {
Self {
kv,
dead_letters,
events,
}
}
/// All-noop bundle for tests that build an `Engine` but don't
/// exercise the stateful services. Returns the same shape as
/// `Services::new` so callers can't accidentally rely on a stub
/// silently doing the right thing — every call into a noop
/// service surfaces an explicit error.
#[must_use]
pub fn with_noop_services() -> Self {
Self::new(
Arc::new(NoopKvService),
Arc::new(NoopDeadLetterService),
Arc::new(NoopEventEmitter),
)
}
}
impl Default for Services {
fn default() -> Self {
Self::with_noop_services()
}
}
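
The "noop by default, explicit error on use" idea behind `with_noop_services` can be shown with a single simplified service. This sketch uses a synchronous, hypothetical `Kv` trait and a `StaticKv` fake; it is not the real `KvService` signature.

```rust
use std::sync::Arc;

/// Simplified, synchronous stand-in for a stateful SDK service.
trait Kv: Send + Sync {
    fn get(&self, key: &str) -> Result<Option<String>, String>;
}

/// Stub that fails loudly instead of silently returning `None`.
struct NoopKv;

impl Kv for NoopKv {
    fn get(&self, _key: &str) -> Result<Option<String>, String> {
        Err("kv is not wired in".to_string())
    }
}

/// Hypothetical fake that answers every key with one fixed value.
struct StaticKv(String);

impl Kv for StaticKv {
    fn get(&self, _key: &str) -> Result<Option<String>, String> {
        Ok(Some(self.0.clone()))
    }
}

/// Bundle of `Arc` handles; cloning copies pointers, not services.
#[derive(Clone)]
struct Services {
    kv: Arc<dyn Kv>,
}

impl Services {
    fn new(kv: Arc<dyn Kv>) -> Self {
        Self { kv }
    }

    fn with_noop_services() -> Self {
        Self::new(Arc::new(NoopKv))
    }
}

impl Default for Services {
    fn default() -> Self {
        Self::with_noop_services()
    }
}

/// Convenience constructor so callers need not name `Arc` themselves.
fn services_with_static(value: &str) -> Services {
    Services::new(Arc::new(StaticKv(value.to_string())))
}
```

A test that forgets to wire a real service gets an `Err` from the first call rather than an empty result that looks like a cache miss.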


@@ -0,0 +1,105 @@
//! `TriggerEvent` — the description of the event that fired a script.
//!
//! Built by the dispatcher (in `manager-core`) from the outbox row and
//! attached to the `ExecRequest` that's handed to `executor-core`. The
//! Rhai bridge in `executor-core::engine::build_ctx_map` flattens this
//! into `ctx.event` for the script.
//!
//! Living in `picloud-shared` so the dispatcher and the executor agree
//! on the wire shape. Serializable so cluster mode (v1.3+) can ship
//! ExecRequests over HTTP without rewriting this type.
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use crate::{DeadLetterId, ScriptId, TriggerId};
/// Operations a KV trigger can fire on. Stored as a lowercase string
/// in `kv_trigger_details.ops` (Postgres `text[]`).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum KvEventOp {
Insert,
Update,
Delete,
}
impl KvEventOp {
#[must_use]
pub const fn as_str(self) -> &'static str {
match self {
Self::Insert => "insert",
Self::Update => "update",
Self::Delete => "delete",
}
}
#[must_use]
pub fn from_wire(s: &str) -> Option<Self> {
match s {
"insert" => Some(Self::Insert),
"update" => Some(Self::Update),
"delete" => Some(Self::Delete),
_ => None,
}
}
}
/// Discriminated description of a triggering event. Lifted from the
/// outbox row's payload at dispatch time. Each variant carries the
/// fields the corresponding `ctx.event` shape exposes to the script.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "source", rename_all = "snake_case")]
pub enum TriggerEvent {
/// A KV insert / update / delete fired this handler.
Kv {
op: KvEventOp,
collection: String,
key: String,
/// Present on `insert` and `update`. Absent on `delete`.
#[serde(default, skip_serializing_if = "Option::is_none")]
value: Option<serde_json::Value>,
},
/// A dead-letter row fired this handler. The original event is
/// nested verbatim plus the dead-letter metadata the design notes
/// §4 require.
DeadLetter {
dead_letter_id: DeadLetterId,
original: Box<TriggerEvent>,
attempts: u32,
last_error: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
trigger_id: Option<TriggerId>,
#[serde(default, skip_serializing_if = "Option::is_none")]
script_id: Option<ScriptId>,
first_attempt_at: DateTime<Utc>,
last_attempt_at: DateTime<Utc>,
},
}
impl TriggerEvent {
/// The `source` discriminant the script sees on `ctx.event.source`.
#[must_use]
pub const fn source(&self) -> &'static str {
match self {
Self::Kv { .. } => "kv",
Self::DeadLetter { .. } => "dead_letter",
}
}
}
/// Convenience accessor on the dead-letter variant for places that
/// already know they're handling a DL event. Pulled out so the
/// dispatcher and the dashboard don't have to repeat the match.
#[derive(Debug, Clone)]
pub struct DeadLetterEventDetail {
pub dead_letter_id: DeadLetterId,
pub original: TriggerEvent,
pub attempts: u32,
pub last_error: String,
pub trigger_id: Option<TriggerId>,
pub script_id: Option<ScriptId>,
pub first_attempt_at: DateTime<Utc>,
pub last_attempt_at: DateTime<Utc>,
}
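
The relationship between the boxed `original` in `TriggerEvent::DeadLetter` and the unboxed `original` in `DeadLetterEventDetail` can be sketched without serde. The types below are trimmed-down stand-ins (string op, `u64` id, fewer fields), and `into_detail` is a hypothetical helper that does not appear in the diff.

```rust
#[derive(Debug, Clone, PartialEq)]
enum TriggerEvent {
    Kv {
        op: &'static str,
        collection: String,
        key: String,
        value: Option<String>,
    },
    DeadLetter {
        dead_letter_id: u64,
        // Boxed because the enum would otherwise contain itself by value.
        original: Box<TriggerEvent>,
        attempts: u32,
        last_error: String,
    },
}

impl TriggerEvent {
    /// The discriminant a script would see on `ctx.event.source`.
    fn source(&self) -> &'static str {
        match self {
            Self::Kv { .. } => "kv",
            Self::DeadLetter { .. } => "dead_letter",
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct DeadLetterDetail {
    dead_letter_id: u64,
    original: TriggerEvent,
    attempts: u32,
    last_error: String,
}

/// Hypothetical helper: `Some` for the dead-letter variant, else `None`.
fn into_detail(event: TriggerEvent) -> Option<DeadLetterDetail> {
    match event {
        TriggerEvent::DeadLetter { dead_letter_id, original, attempts, last_error } => {
            Some(DeadLetterDetail {
                dead_letter_id,
                // `*original` moves the inner event out of its box.
                original: *original,
                attempts,
                last_error,
            })
        }
        TriggerEvent::Kv { .. } => None,
    }
}

/// Small constructor used to keep the usage short.
fn kv_insert(collection: &str, key: &str, value: &str) -> TriggerEvent {
    TriggerEvent::Kv {
        op: "insert",
        collection: collection.to_string(),
        key: key.to_string(),
        value: Some(value.to_string()),
    }
}
```

The detail struct can hold `TriggerEvent` directly because it is a separate type, so the indirection is only needed inside the enum itself.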


@@ -19,7 +19,10 @@ pub const PRODUCT_VERSION: &str = env!("CARGO_PKG_VERSION");
///
/// 1.1 additions: `ctx.request.params`, `ctx.request.query`,
/// `ctx.request.rest`.
pub const SDK_VERSION: &str = "1.1"; ///
/// 1.2 additions (v1.1.1): `kv::collection(name).{get,set,has,delete,list}`,
/// `dead_letters::{replay,resolve}`, `ctx.event` for triggered handlers.
pub const SDK_VERSION: &str = "1.2";
/// HTTP API major version. Appears in URL paths as `/api/v{N}/...`. /// HTTP API major version. Appears in URL paths as `/api/v{N}/...`.
/// Bump (new integer + new URL prefix) when the request/response /// Bump (new integer + new URL prefix) when the request/response


@@ -1,6 +1,6 @@
{ {
"name": "picloud-dashboard", "name": "picloud-dashboard",
"version": "0.6.0", "version": "0.7.0",
"private": true, "private": true,
"type": "module", "type": "module",
"scripts": { "scripts": {


@@ -186,6 +186,23 @@ export interface UpdateScriptInput {
sandbox?: ScriptSandbox; sandbox?: ScriptSandbox;
} }
export interface DeadLetterRow {
id: string;
app_id: string;
source: string;
op: string;
trigger_id: string | null;
script_id: string | null;
payload: unknown;
attempt_count: number;
first_attempt_at: string;
last_attempt_at: string;
last_error: string;
created_at: string;
resolved_at: string | null;
resolution: 'replayed' | 'ignored' | 'handled_by_script' | 'handler_failed' | null;
}
export interface ExecutionResult { export interface ExecutionResult {
status: number; status: number;
headers: Record<string, string>; headers: Record<string, string>;
@@ -516,6 +533,37 @@ export const api = {
) )
}, },
deadLetters: {
count: (idOrSlug: string) =>
adminRequest<{ unresolved: number }>(
`/api/v1/admin/apps/${encodeURIComponent(idOrSlug)}/dead_letters/count`
),
list: (idOrSlug: string, opts: { unresolved?: boolean; limit?: number; offset?: number } = {}) => {
const params = new URLSearchParams();
if (opts.unresolved) params.set('unresolved', 'true');
if (opts.limit !== undefined) params.set('limit', String(opts.limit));
if (opts.offset !== undefined) params.set('offset', String(opts.offset));
const qs = params.toString();
return adminRequest<{ dead_letters: DeadLetterRow[] }>(
`/api/v1/admin/apps/${encodeURIComponent(idOrSlug)}/dead_letters${qs ? `?${qs}` : ''}`
);
},
get: (idOrSlug: string, dlId: string) =>
adminRequest<DeadLetterRow>(
`/api/v1/admin/apps/${encodeURIComponent(idOrSlug)}/dead_letters/${dlId}`
),
replay: (idOrSlug: string, dlId: string) =>
adminRequest<null>(
`/api/v1/admin/apps/${encodeURIComponent(idOrSlug)}/dead_letters/${dlId}/replay`,
{ method: 'POST' }
),
resolve: (idOrSlug: string, dlId: string, reason: string) =>
adminRequest<null>(
`/api/v1/admin/apps/${encodeURIComponent(idOrSlug)}/dead_letters/${dlId}/resolve`,
{ method: 'POST', body: JSON.stringify({ reason }) }
)
},
execute: async ( execute: async (
id: string, id: string,
body: unknown, body: unknown,


@@ -12,6 +12,26 @@
let listError = $state<string | null>(null); let listError = $state<string | null>(null);
let loading = $state(true); let loading = $state(true);
/// Unresolved-dead-letter count per app (v1.1.1). Loaded in
/// parallel after the app list. Failures here are non-fatal —
/// missing counts just don't render a badge.
let unresolvedDl = $state<Record<string, number>>({});
async function loadDlCounts(appList: App[]) {
const results = await Promise.all(
appList.map(async (a) => {
try {
const r = await api.deadLetters.count(a.id);
return [a.id, r.unresolved] as const;
} catch {
return [a.id, 0] as const;
}
})
);
const next: Record<string, number> = {};
for (const [id, count] of results) next[id] = count;
unresolvedDl = next;
}
let showCreate = $state(false); let showCreate = $state(false);
let createSlug = $state(''); let createSlug = $state('');
let createName = $state(''); let createName = $state('');
@@ -49,6 +69,9 @@
listError = null; listError = null;
try { try {
apps = await api.apps.list(); apps = await api.apps.list();
if (apps && apps.length > 0) {
void loadDlCounts(apps);
}
} catch (e) { } catch (e) {
listError = e instanceof Error ? e.message : String(e); listError = e instanceof Error ? e.message : String(e);
apps = null; apps = null;
@@ -201,6 +224,12 @@
<div class="primary"> <div class="primary">
<strong>{app.name}</strong> <strong>{app.name}</strong>
<span class="muted">/{app.slug}</span> <span class="muted">/{app.slug}</span>
{#if unresolvedDl[app.id] > 0}
<span
class="dl-badge"
title="Unresolved dead letters in this app"
>{unresolvedDl[app.id]}</span>
{/if}
</div> </div>
<div class="secondary muted"> <div class="secondary muted">
{app.description ?? '—'} {app.description ?? '—'}
@@ -246,6 +275,19 @@
cursor: not-allowed; cursor: not-allowed;
} }
.dl-badge {
display: inline-block;
min-width: 1.25rem;
padding: 0.1rem 0.4rem;
background: #ef4444;
color: #fff;
border-radius: 999px;
font-size: 0.75rem;
font-weight: 600;
text-align: center;
margin-left: 0.5rem;
}
.muted { .muted {
color: #64748b; color: #64748b;
} }


@@ -37,6 +37,20 @@
let domains = $state<AppDomain[]>([]); let domains = $state<AppDomain[]>([]);
let members = $state<AppMemberDto[]>([]); let members = $state<AppMemberDto[]>([]);
/// v1.1.1 dead-letters surface — design notes §4 mandates the
/// dashboard surface this since there's no default handler.
let unresolvedDeadLetters = $state<number>(0);
async function loadDeadLetterCount(idOrSlug: string) {
try {
const r = await api.deadLetters.count(idOrSlug);
unresolvedDeadLetters = r.unresolved;
} catch {
// Non-fatal: the page renders fine without the badge if
// the count endpoint is unreachable (e.g. older server).
unresolvedDeadLetters = 0;
}
}
// Derive UI gates from the capabilities helper so the rules stay // Derive UI gates from the capabilities helper so the rules stay
// in lockstep with the backend's `can()`. canAdminApp also covers // in lockstep with the backend's `can()`. canAdminApp also covers
// the Members + Settings + Domains-mutation tabs; canWriteApp // the Members + Settings + Domains-mutation tabs; canWriteApp
@@ -107,7 +121,11 @@
editName = app.name; editName = app.name;
editDescription = app.description ?? ''; editDescription = app.description ?? '';
editSlug = app.slug; editSlug = app.slug;
const loaders: Promise<unknown>[] = [loadScripts(app.id), loadDomains(app.id)]; const loaders: Promise<unknown>[] = [
loadScripts(app.id),
loadDomains(app.id),
loadDeadLetterCount(app.id)
];
if (canAdmin) { if (canAdmin) {
loaders.push(loadMembers(app.id), loadEligibleUsers()); loaders.push(loadMembers(app.id), loadEligibleUsers());
} }
@@ -421,6 +439,16 @@
class:active={activeTab === 'settings'} class:active={activeTab === 'settings'}
onclick={() => (activeTab = 'settings')}>Settings</button onclick={() => (activeTab = 'settings')}>Settings</button
> >
<a
class="tab-link"
href="{base}/apps/{slug}/dead-letters"
title="Dead letters — replay or resolve events that exhausted their retry policy"
>
Dead letters
{#if unresolvedDeadLetters > 0}
<span class="dl-badge">{unresolvedDeadLetters}</span>
{/if}
</a>
{/if} {/if}
</nav> </nav>
@@ -871,6 +899,32 @@
border-bottom-color: #38bdf8; border-bottom-color: #38bdf8;
} }
.tabs .tab-link {
display: inline-flex;
align-items: center;
gap: 0.4rem;
color: #94a3b8;
text-decoration: none;
padding: 0.6rem 1rem;
margin-left: auto;
border-bottom: 2px solid transparent;
font: inherit;
}
.tabs .tab-link:hover {
color: #e2e8f0;
}
.dl-badge {
display: inline-block;
min-width: 1.25rem;
padding: 0.1rem 0.4rem;
background: #ef4444;
color: #fff;
border-radius: 999px;
font-size: 0.75rem;
font-weight: 600;
text-align: center;
}
button { button {
background: #38bdf8; background: #38bdf8;
color: #0b1220; color: #0b1220;


@@ -0,0 +1,310 @@
<script lang="ts">
import { base } from '$app/paths';
import { page } from '$app/state';
import { api, ApiError, type App, type DeadLetterRow } from '$lib/api';
let slug = $derived(page.params.slug ?? '');
let app = $state<App | null>(null);
let rows = $state<DeadLetterRow[]>([]);
let unresolved = $state<number>(0);
let loading = $state(true);
let error = $state<string | null>(null);
let unresolvedOnly = $state(true);
let expandedId = $state<string | null>(null);
async function load() {
loading = true;
error = null;
try {
const a = await api.apps.get(slug);
app = a;
const c = await api.deadLetters.count(slug);
unresolved = c.unresolved;
const r = await api.deadLetters.list(slug, { unresolved: unresolvedOnly, limit: 100 });
rows = r.dead_letters;
} catch (e) {
error = e instanceof ApiError ? e.message : String(e);
} finally {
loading = false;
}
}
$effect(() => {
// Re-load whenever the slug or filter changes.
void slug;
void unresolvedOnly;
void load();
});
async function replay(dlId: string) {
try {
await api.deadLetters.replay(slug, dlId);
await load();
} catch (e) {
error = e instanceof ApiError ? e.message : String(e);
}
}
async function markIgnored(dlId: string) {
try {
await api.deadLetters.resolve(slug, dlId, 'ignored');
await load();
} catch (e) {
error = e instanceof ApiError ? e.message : String(e);
}
}
function toggleExpanded(id: string) {
expandedId = expandedId === id ? null : id;
}
function fmtTime(iso: string): string {
return new Date(iso).toLocaleString();
}
function truncate(s: string, n: number): string {
if (s.length <= n) return s;
return s.slice(0, n) + '…';
}
</script>
<svelte:head>
<title>Dead letters · {slug} · PiCloud</title>
</svelte:head>
<div class="container">
<header>
<div>
<a href="{base}/apps/{slug}" class="back">&larr; back to {app?.name ?? slug}</a>
<h1>Dead letters</h1>
<p class="subtitle">
{#if unresolved > 0}
<strong class="badge">{unresolved}</strong> unresolved
{:else}
No unresolved dead letters
{/if}
</p>
</div>
<div class="controls">
<label>
<input type="checkbox" bind:checked={unresolvedOnly} />
Show unresolved only
</label>
<button onclick={load} disabled={loading}>Refresh</button>
</div>
</header>
{#if error}
<div class="error">{error}</div>
{/if}
{#if loading}
<p>Loading…</p>
{:else if rows.length === 0}
<p class="empty">
{#if unresolvedOnly}
No unresolved dead letters for this app. 🎉
{:else}
No dead letters recorded yet.
{/if}
</p>
{:else}
<table>
<thead>
<tr>
<th>Created</th>
<th>Source</th>
<th>Op</th>
<th>Script</th>
<th>Attempts</th>
<th>First / Last attempt</th>
<th>Last error</th>
<th>Actions</th>
</tr>
</thead>
<tbody>
{#each rows as row (row.id)}
<tr class:resolved={row.resolved_at !== null}>
<td>{fmtTime(row.created_at)}</td>
<td><code>{row.source}</code></td>
<td><code>{row.op}</code></td>
<td>{row.script_id ? row.script_id.slice(0, 8) : '—'}</td>
<td>{row.attempt_count}</td>
<td class="times">
<div>{fmtTime(row.first_attempt_at)}</div>
<div>{fmtTime(row.last_attempt_at)}</div>
</td>
<td class="err">
<button class="link" onclick={() => toggleExpanded(row.id)}>
{truncate(row.last_error, 60)}
</button>
</td>
<td class="actions">
{#if row.resolved_at === null}
<button onclick={() => replay(row.id)}>Replay</button>
<button class="secondary" onclick={() => markIgnored(row.id)}>
Mark resolved
</button>
{:else}
<span class="resolution">{row.resolution ?? 'resolved'}</span>
{/if}
</td>
</tr>
{#if expandedId === row.id}
<tr class="detail">
<td colspan="8">
<div class="detail-grid">
<section>
<h3>Payload</h3>
<pre>{JSON.stringify(row.payload, null, 2)}</pre>
</section>
<section>
<h3>Last error</h3>
<pre>{row.last_error}</pre>
</section>
</div>
</td>
</tr>
{/if}
{/each}
</tbody>
</table>
{/if}
</div>
<style>
.container {
max-width: 1200px;
margin: 0 auto;
padding: 2rem;
}
header {
display: flex;
justify-content: space-between;
align-items: flex-start;
margin-bottom: 1rem;
gap: 1rem;
}
.back {
font-size: 0.85rem;
color: var(--text-muted, #666);
text-decoration: none;
}
.back:hover {
text-decoration: underline;
}
h1 {
margin: 0.25rem 0;
}
.subtitle {
color: var(--text-muted, #666);
margin: 0;
}
.badge {
display: inline-block;
min-width: 1.5rem;
padding: 0.1rem 0.4rem;
background: #c00;
color: #fff;
border-radius: 999px;
text-align: center;
font-weight: 600;
}
.controls {
display: flex;
gap: 0.75rem;
align-items: center;
}
.error {
background: #fee;
border: 1px solid #fbb;
color: #900;
padding: 0.75rem 1rem;
border-radius: 4px;
margin-bottom: 1rem;
}
.empty {
color: var(--text-muted, #666);
text-align: center;
padding: 2rem;
}
table {
width: 100%;
border-collapse: collapse;
font-size: 0.9rem;
}
th,
td {
text-align: left;
padding: 0.5rem 0.75rem;
border-bottom: 1px solid var(--border, #e0e0e0);
vertical-align: top;
}
th {
background: var(--bg-secondary, #f5f5f5);
font-weight: 600;
}
tr.resolved {
opacity: 0.6;
}
.times div {
font-size: 0.8rem;
white-space: nowrap;
}
.err button.link {
background: none;
border: none;
color: var(--link, #06c);
text-decoration: underline;
cursor: pointer;
padding: 0;
font-family: monospace;
font-size: 0.85rem;
text-align: left;
}
.actions {
white-space: nowrap;
display: flex;
gap: 0.4rem;
}
.actions button.secondary {
background: transparent;
color: var(--text, #333);
border: 1px solid var(--border, #ccc);
}
.resolution {
font-style: italic;
color: var(--text-muted, #666);
font-size: 0.85rem;
}
tr.detail td {
background: var(--bg-secondary, #fafafa);
padding: 0;
}
.detail-grid {
display: grid;
grid-template-columns: 2fr 1fr;
gap: 1rem;
padding: 1rem;
}
.detail-grid section h3 {
margin: 0 0 0.5rem 0;
font-size: 0.85rem;
text-transform: uppercase;
color: var(--text-muted, #666);
}
.detail-grid pre {
background: #fff;
border: 1px solid var(--border, #e0e0e0);
padding: 0.75rem;
border-radius: 4px;
font-size: 0.8rem;
overflow: auto;
max-height: 300px;
margin: 0;
}
code {
font-family: monospace;
font-size: 0.85rem;
}
</style>

617
docs/v1.1.x-design-notes.md Normal file

@@ -0,0 +1,617 @@
# v1.1.x design notes — in-flight decisions + revised roadmap
Planning document for the v1.1.x release series. Companion to:
- [`serverless_cloud_blueprint.md`](../serverless_cloud_blueprint.md) — authoritative design
- [`docs/sdk-shape.md`](sdk-shape.md) — SDK conventions (settled in v1.1.0)
- [`docs/stdlib-reference.md`](stdlib-reference.md) — stdlib API (settled in v1.1.0)
- [`docs/versioning.md`](versioning.md) — versioning policy (post-1.0 carve-out settled with v1.1.0)
Items in this doc are either **tentatively decided but not yet shipped** or **open calls awaiting the maintainer's decision**. Once an item ships, its content moves into the blueprint and the corresponding section here gets pruned.
This document was created at the v1.1.0 → v1.1.1 boundary, capturing the architectural conversations that followed v1.1.0 but haven't yet landed in code or in the blueprint.
---
## 1. The three messaging primitives
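PiCloud will expose three distinct messaging concepts: direct invocation, pub/sub, and queues. Pub/sub takes two rows in the table below, one per durability variant. The right way to slice them is along **recipient model** and **delivery semantics**: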
| | Recipients | Durability | Delivery | Retry on script failure | Mental model |
|---|---|---|---|---|---|
| **`invoke(script_id, args)`** | One **named** script | None (or fire-and-forget durable) | At-most-once sync, or at-least-once async | Caller-controlled via `retry::*` | Function call |
| **`pubsub::publish_durable(topic, msg)`** | **All** scripts subscribed via trigger | Through outbox | **At-least-once per subscriber** | Per-subscriber retry up to N, then dead-letter | Fan-out broadcast (persisted) |
| **`pubsub::publish_ephemeral(topic, msg)`** *(future)* | **All** scripts subscribed via trigger | None (in-memory NOTIFY) | **At-most-once per subscriber** | None | Fan-out broadcast (best-effort) |
| **`queue::enqueue(name, msg)`** | **Exactly one** consumer wins | Durable table | **At-least-once total** | Visibility timeout + nack-on-throw | Work distribution |
**Critical distinction:** pub/sub and queue both end up at-least-once, but the **subscriber model** differs. Queue: 1 message → 1 delivery record → consumers compete. Pub/sub: 1 message → N delivery records (one per subscriber) → no competition.
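
The difference in delivery records can be made concrete with a small in-memory sketch. It is illustrative only: `Delivery`, `fan_out`, and `WorkQueue` are hypothetical names, not PiCloud types, and no persistence or retry is modelled.

```rust
use std::collections::VecDeque;

/// One delivery record: which consumer receives which message.
#[derive(Debug, Clone, PartialEq)]
struct Delivery {
    recipient: String,
    message: String,
}

/// Pub/sub: one message becomes one delivery record per subscriber.
fn fan_out(subscribers: &[&str], message: &str) -> Vec<Delivery> {
    subscribers
        .iter()
        .map(|s| Delivery {
            recipient: s.to_string(),
            message: message.to_string(),
        })
        .collect()
}

/// Queue: one message stays one record; consumers compete for it.
struct WorkQueue {
    pending: VecDeque<String>,
}

impl WorkQueue {
    /// The first consumer to claim wins; later claims find nothing.
    fn claim(&mut self, consumer: &str) -> Option<Delivery> {
        self.pending.pop_front().map(|message| Delivery {
            recipient: consumer.to_string(),
            message,
        })
    }
}
```

With three subscribers, `fan_out` yields three records for one message, while a `WorkQueue` holding one message satisfies exactly one `claim`.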
### Pub/sub reframe — durable through the outbox, ephemeral as named escape hatch
The original blueprint plan was pub/sub via Postgres `LISTEN/NOTIFY` (ephemeral, sub-millisecond fan-out). Reframe to **reuse the triggers framework's outbox infrastructure for the durable path, and keep ephemeral as a separately-named future API**:
- `pubsub::publish_durable(topic, msg)` writes to the outbox (v1.1.5)
- Dispatcher fans out one delivery record per subscribed script trigger
- Each delivery retried on failure with the same machinery as KV / doc / file triggers
- After N retries → dead-letter (see §4)
- `pubsub::publish_ephemeral(topic, msg)` is committed as a future addition for the in-memory `LISTEN/NOTIFY` path — not shipped in v1.1.5, but the API split is decided now so users learn "durable by default, opt into ephemeral" from the start (rather than the reverse, which would be a breaking rename later).
**Wins:** one delivery model in the whole system for the durable path, durable pub/sub for free, shared observability/retry/dead-letter tooling across every event-firing surface.
**Cost:** ~1ms Postgres write per `publish_durable` (vs in-memory NOTIFY). On solo-dev / consumer hardware, that is the right tradeoff. The ephemeral escape hatch exists for sub-ms / high-frequency workloads if/when they emerge.
**Note on durability semantics.** "Durable" here means the outbox row persists, not that fan-out is transactional with the publisher's own data writes. A script doing `kv.set(...)` then `pubsub::publish_durable(...)` performs two separate writes; a crash between them can drop the publish. This matches the standard transactional-outbox pattern and is consistent with how KV / doc / file triggers already work.
### Queue stays separate
Pub/sub-through-outbox cannot model "work distribution with backpressure" cleanly. Queue keeps its own table:
- Producer: `queue::enqueue(name, msg)` → queue table
- Consumer: `queue:receive` trigger fires when message available; runtime claims with `FOR UPDATE SKIP LOCKED` + visibility timeout
- Script returns successfully → auto-ack (delete row)
- Script throws → auto-nack (clear claim; message becomes visible again)
- Visibility timeout exceeded → reclaim allowed (handles crashed consumers)
- Max delivery attempts → dead-letter
The queue table IS the outbox for queue semantics — no double-buffering.
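
The claim / ack / nack / reclaim cycle above can be sketched in memory. This is a rough model under stated assumptions: a `Vec` stands in for the queue table, a caller-supplied logical clock (`now`, in ms) replaces wall time, and all names (`Queue`, `QueueRow`, `claimed_until`) are hypothetical. The real path would use `FOR UPDATE SKIP LOCKED` in Postgres, and this sketch only dead-letters on nack, not on an expired claim.

```rust
/// In-memory stand-in for a queue row.
#[derive(Debug)]
struct QueueRow {
    id: u64,
    payload: String,
    attempts: u32,
    /// Logical time (ms) until which the claim hides the row; None = visible.
    claimed_until: Option<u64>,
}

struct Queue {
    rows: Vec<QueueRow>,
    dead_letters: Vec<QueueRow>,
    visibility_timeout_ms: u64,
    max_attempts: u32,
    next_id: u64,
}

impl Queue {
    fn new(visibility_timeout_ms: u64, max_attempts: u32) -> Self {
        Queue { rows: Vec::new(), dead_letters: Vec::new(), visibility_timeout_ms, max_attempts, next_id: 1 }
    }

    fn enqueue(&mut self, payload: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.rows.push(QueueRow { id, payload: payload.to_string(), attempts: 0, claimed_until: None });
        id
    }

    /// Claim the first visible row: unclaimed, or claimed with an expired timeout.
    fn claim(&mut self, now: u64) -> Option<u64> {
        let timeout = self.visibility_timeout_ms;
        let row = self
            .rows
            .iter_mut()
            .find(|r| r.claimed_until.map_or(true, |until| until <= now))?;
        row.attempts += 1;
        row.claimed_until = Some(now + timeout);
        Some(row.id)
    }

    /// Script returned successfully: auto-ack deletes the row.
    fn ack(&mut self, id: u64) {
        self.rows.retain(|r| r.id != id);
    }

    /// Script threw: clear the claim, or dead-letter once attempts are used up.
    fn nack(&mut self, id: u64) {
        if let Some(pos) = self.rows.iter().position(|r| r.id == id) {
            if self.rows[pos].attempts >= self.max_attempts {
                let row = self.rows.remove(pos);
                self.dead_letters.push(row);
            } else {
                self.rows[pos].claimed_until = None;
            }
        }
    }
}
```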
### Status
- **Durable pub/sub via trigger outbox**: ✅ Decided 2026-06-01 — ship as `pubsub::publish_durable` in v1.1.5.
- **Ephemeral pub/sub**: ✅ Committed 2026-06-01 as a future addition named `pubsub::publish_ephemeral`. Not in v1.1.5; the explicit-naming split lands now so the durable default doesn't need a breaking rename later.
- **Drop `LISTEN/NOTIFY` for v1.1.5**: ✅ Decided 2026-06-01.
- **Queue stays separate from pub/sub**: ✅ Decided 2026-06-01 — two distinct top-level namespaces (`queue::*` and `pubsub::*`); no unifying `messaging::*` abstraction. Rationale: the two have genuinely different mental models (work distribution vs fan-out), the implementations share almost no code (queue needs `FOR UPDATE SKIP LOCKED` + visibility timeout + nack-on-throw; pub/sub needs per-subscriber fan-out + independent retry/dead-letter), and a unified API would force users to choose a mode they already know from the use case. A future Kafka-shaped consumer-group unification was considered and rejected — PiCloud is outbox-based, not log-based, so going Kafka-shaped would mean rebuilding storage.
### Open calls
1. ~~Pub/sub durability via trigger outbox~~ — ✅ Decided 2026-06-01: yes, both `publish_durable` (v1.1.5) and `publish_ephemeral` (future) committed with explicit names.
2. ~~Queue and pub/sub stay separate concepts~~ — ✅ Decided 2026-06-01: separate top-level namespaces; no unifying messaging abstraction.
---
## 2. Universal trigger outbox
The triggers framework's outbox should be the universal substrate for **async dispatch**. Every event source that fires scripts asynchronously writes to the same outbox table; one dispatcher reads from it and routes to the executor with shared load control, retry, dead-letter, and trigger-depth tracking.
### What runs through the outbox
| Ingress | Path | Reason |
|---|---|---|
| **HTTP request (sync)** | Direct: orchestrator → executor → response (with NATS-style indirection — see §3) | Caller is waiting; the inbox pattern makes this work via the outbox |
| **HTTP request (async, opt-in)** | Orchestrator writes outbox → returns 202 → dispatcher → executor | Webhooks, fire-and-forget endpoints; explicit opt-in via route config |
| **Cron tick** | Scheduler writes outbox → dispatcher → executor | No caller; naturally async |
| **KV / doc / file change** | Service writes outbox → dispatcher → executor | No caller; the originating script already returned |
| **Pub/sub publish** | Service writes outbox → dispatcher → executor (per subscriber) | Fan-out semantics |
| **Queue message** | Queue table IS the outbox; dispatcher claims via `FOR UPDATE SKIP LOCKED` | Avoids double-buffering |
| **Inbound email** | SMTP receiver writes outbox → dispatcher → executor | No caller |
### What this gives
1. **One dispatcher = one place** for load control (the existing `ExecutionGate`), retry, dead-letter, trigger-depth tracking, fan-out. New event source = "write to outbox in this shape", nothing else.
2. **Routes become a trigger kind**, conceptually. A route is `(source=http, filter=method+path, script_id, dispatch_mode=sync|async)`. Schema-wise the `routes` table likely stays separate from the new `triggers` table (polymorphic JSON columns get ugly), but the mental model collapses to "everything that fires a script is a trigger".
3. **`dispatch_mode = async` is a per-route opt-in**. Webhook handlers can return 202 immediately and process in the background — dispatcher handles retries, caller gets a snappy ack.
4. **Replay and debugging.** Every async invocation has an outbox row; admin can re-fire a trigger by re-dispatching the row.
5. **Decoupled lifecycle.** Dispatcher can be paused for maintenance without affecting HTTP ingress (it just queues); HTTP can degrade (overflow 503s) without affecting async work already in the outbox.
### What this doesn't change
- Sync HTTP still hits the `ExecutionGate` the same way (now via the dispatcher).
- Async outbox dispatch also hits the gate when the dispatcher picks a row. Sync and async share the cap on actual blocking-thread-in-use.
- Trigger CRUD likely stays in per-kind tables for schema sanity; the unification is conceptual + dispatch-layer, not schema-layer.
### Status
- **Universal outbox for async dispatch**: ✅ Decided 2026-06-01 — yes; all async ingress (KV/cron/pubsub/queue/email/dead-letter) writes to one outbox; one dispatcher reads it.
- **Sync HTTP via outbox (NATS-style inbox)**: ✅ Decided 2026-06-01 — in-process oneshot in v1.1.1; cluster-mode keeps the door open for `LISTEN/NOTIFY` keyed on `inbox_id` in v1.3+ (see §3 implementation table).
- **Routes-as-trigger conceptually**: ✅ yes — the dispatch layer treats routes and triggers uniformly.
- **Trigger storage shape: Layout E (parent + per-kind detail tables)**: ✅ Decided 2026-06-01. One shared `triggers` parent with common columns (`id`, `app_id`, `script_id`, `kind`, `enabled`, `dispatch_mode`, retry config, timestamps); one `<kind>_trigger_details` table per service (`kv_trigger_details`, `cron_trigger_details`, `pubsub_trigger_details`, `queue_trigger_details`, `email_trigger_details`, `dead_letter_trigger_details`). Outbox FKs to `triggers.id`; dead-letters FK same. Exact column set (notably `outbox.app_id` denormalization, whether `script_id` also lives on outbox, ON DELETE behavior on the parent vs detail tables) will be refined when v1.1.1 implementation lands.
- **`routes` table stays separate from the `triggers` parent for now**: ✅ Decided 2026-06-01. `routes` is Phase-3 production schema with its own trie-index columns; folding into the parent is a v1.2 cleanup, not a v1.1.1 requirement. Outbox discriminates HTTP rows via `source_kind = 'http'` and `trigger_id` referencing `routes.id` for HTTP, `triggers.id` for everything else.
- **Per-route `dispatch_mode: sync|async`**: ✅ Decided 2026-06-01 — ships in v1.1.1. Async returns `202 Accepted` with a JSON body `{ "accepted_at": "...", "execution_id": "..." }`. `dispatch_mode` is a route property fixed at route creation; scripts cannot switch modes mid-call.
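
As a rough illustration of the per-route `dispatch_mode` decision, the sketch below builds the `202 Accepted` acknowledgement for async routes. The field names `accepted_at` and `execution_id` come from the decision above; `DispatchMode`, `async_ack_body`, and `immediate_response` are hypothetical names, and the body is assembled by hand only to keep the example std-only.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum DispatchMode {
    Sync,
    Async,
}

/// Body of the `202 Accepted` acknowledgement. No JSON escaping is done,
/// so both inputs are assumed to be plain timestamp / UUID strings.
fn async_ack_body(accepted_at: &str, execution_id: &str) -> String {
    format!(
        "{{ \"accepted_at\": \"{}\", \"execution_id\": \"{}\" }}",
        accepted_at, execution_id
    )
}

/// What the orchestrator can send before the script has run.
/// Async routes are acknowledged at once; sync routes return None here
/// because their response only exists after the script finishes.
fn immediate_response(
    mode: DispatchMode,
    accepted_at: &str,
    execution_id: &str,
) -> Option<(u16, String)> {
    match mode {
        DispatchMode::Async => Some((202, async_ack_body(accepted_at, execution_id))),
        DispatchMode::Sync => None,
    }
}
```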
### Open calls
1. ~~Sync HTTP via outbox + per-request inbox~~ — ✅ Decided 2026-06-01: yes via outbox; in-process oneshot now, `LISTEN/NOTIFY` explicitly preserved for cluster mode (v1.3+).
2. ~~Ship `dispatch_mode: async` in v1.1.1~~ — ✅ Decided 2026-06-01: yes; `202 Accepted` + JSON body with `execution_id`; route-level config only.
3. ~~Trigger storage shape~~ — ✅ Decided 2026-06-01: Layout E (parent + per-kind detail tables); `routes` stays its own table for v1.1.x. Exact column set deferred to implementation PR.
---
## 3. NATS-style request/reply for sync HTTP
The constraint that makes "universal outbox" tricky: HTTP has a caller waiting. We can't write to outbox, return 202, and walk away — the user's browser expects `200 OK` with body. NATS's request/reply pattern resolves this elegantly.
### Pattern
```
HTTP request → orchestrator generates inbox_id, registers a oneshot channel
→ writes outbox row { source: http, payload, reply_to: inbox_id }
→ awaits on the channel (with timeout = script's wall-clock + buffer)
Dispatcher → picks outbox row
→ dispatches to executor (gate + spawn_blocking + Rhai)
→ if reply_to.is_some(): resolves the channel with the result
→ if reply_to.is_none(): records completion + retries on failure per trigger config
Orchestrator → channel resolves → returns response to HTTP caller
→ on timeout: returns 504 or 500 → see status-code calls below
```
The HTTP caller's experience is unchanged (synchronous request/response). Under the hood, dispatch is identical for every invocation source.
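
A minimal in-process sketch of this pattern follows. It makes several assumptions: `std::sync::mpsc` stands in for a oneshot channel, a spawned thread stands in for the outbox write plus dispatcher pickup, and the executor is a placeholder that always succeeds. `Inboxes`, `OutboxRow`, `dispatch`, and `handle_sync_http` are hypothetical names, not the shipped API.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

type InboxId = u64;
type ExecResult = Result<String, String>;

/// Outbox row as described above; `reply_to` is set only for sync HTTP.
struct OutboxRow {
    payload: String,
    reply_to: Option<InboxId>,
}

/// Per-orchestrator registry of waiting callers.
#[derive(Clone, Default)]
struct Inboxes {
    senders: Arc<Mutex<HashMap<InboxId, Sender<ExecResult>>>>,
}

impl Inboxes {
    fn register(&self, id: InboxId) -> Receiver<ExecResult> {
        let (tx, rx) = mpsc::channel();
        self.senders.lock().unwrap().insert(id, tx);
        rx
    }

    /// Returns false if nobody is waiting on this inbox any more.
    fn resolve(&self, id: InboxId, result: ExecResult) -> bool {
        match self.senders.lock().unwrap().remove(&id) {
            Some(tx) => tx.send(result).is_ok(),
            None => false,
        }
    }
}

/// Dispatcher side: run the script and reply if a caller is waiting.
fn dispatch(row: OutboxRow, inboxes: &Inboxes) {
    // Placeholder for gate + executor; always succeeds in this sketch.
    let result: ExecResult = Ok(format!("handled {}", row.payload));
    if let Some(id) = row.reply_to {
        inboxes.resolve(id, result);
    }
}

/// Orchestrator side: register the inbox, hand off the row, wait with a timeout.
fn handle_sync_http(payload: &str, inboxes: &Inboxes, timeout: Duration) -> (u16, String) {
    let inbox_id: InboxId = 1; // assumption: a real id would be unique per request
    let rx = inboxes.register(inbox_id);
    let row = OutboxRow { payload: payload.to_string(), reply_to: Some(inbox_id) };
    let worker_inboxes = inboxes.clone();
    thread::spawn(move || dispatch(row, &worker_inboxes));
    match rx.recv_timeout(timeout) {
        Ok(Ok(body)) => (200, body),
        Ok(Err(err)) => (502, err),
        // 504 here; whether this case should be 500 is a status-code question.
        Err(RecvTimeoutError::Timeout) => (504, "timed out".to_string()),
        Err(RecvTimeoutError::Disconnected) => (500, "dispatcher vanished".to_string()),
    }
}
```

The caller only ever sees the tuple returned by `handle_sync_http`; the channel and the outbox row stay internal.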
### Implementation by deployment mode
| Mode | Mechanism | Trade-off |
|---|---|---|
| **In-process (v1.1.1, MVP)** | Per-orchestrator `HashMap<InboxId, oneshot::Sender<Result>>`; dispatcher resolves the oneshot | Sub-ms wake-up; fails across process boundaries |
| **Cross-process (cluster mode v1.3+)** | Postgres `LISTEN/NOTIFY` keyed on `inbox_id`, with a `responses` row as durable backup | Sub-10ms wake-up; survives across nodes; needs careful long-listener management |
| **Polling fallback** | Orchestrator polls `responses` table for `inbox_id` every ~10ms | Simple; ~10ms minimum latency; only as fallback |
### Latency cost (honest numbers)
Per sync HTTP request, NATS-style adds: ~1-2ms Postgres write (outbox) + sub-ms dispatcher wake (in-process channel) + ~1ms response resolve = **~2-5ms overhead**. For most scripts (10-100ms execution), this is noise. PiCloud isn't optimizing for sub-ms; the architectural unification is worth a few ms.
### Default retry policy — decided
✅ Decided 2026-06-01:
| Knob | Default | Env override | Per-trigger column |
|---|---|---|---|
| Max attempts | 3 | `PICLOUD_TRIGGER_RETRY_MAX_ATTEMPTS` | `retry_max_attempts` |
| Backoff shape | exponential | `PICLOUD_TRIGGER_RETRY_BACKOFF` (`exponential` \| `linear` \| `constant`) | `retry_backoff` |
| Base delay | 1000ms | `PICLOUD_TRIGGER_RETRY_BASE_MS` | `retry_base_ms` |
| Jitter | ±20% | `PICLOUD_TRIGGER_RETRY_JITTER_PCT` | (not per-trigger; dispatcher-side) |
With the defaults, the delay after each failed attempt is **~1s / ~2s / ~4s** (each ±20%), for a total time-to-dead-letter of ~7s.
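
The arithmetic behind that schedule can be checked with a short sketch. The exponential shape (base × 2^(n−1)) reproduces the 1s / 2s / 4s figures; the `Linear` formula (base × n) is an assumption, since the notes do not define it. Function and type names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backoff {
    Exponential,
    Linear,
    Constant,
}

/// Nominal delay (before jitter) after the `failed_attempt`-th failure, 1-based.
fn nominal_delay_ms(shape: Backoff, base_ms: u64, failed_attempt: u32) -> u64 {
    match shape {
        Backoff::Exponential => base_ms * 2u64.pow(failed_attempt.saturating_sub(1)),
        // Assumption: linear grows by one base delay per failed attempt.
        Backoff::Linear => base_ms * u64::from(failed_attempt),
        Backoff::Constant => base_ms,
    }
}

/// Inclusive (min, max) range once a symmetric jitter percentage is applied.
fn jitter_bounds_ms(delay_ms: u64, jitter_pct: u64) -> (u64, u64) {
    let spread = delay_ms * jitter_pct / 100;
    (delay_ms - spread, delay_ms + spread)
}
```

Summing the three exponential delays at a 1000ms base gives 7000ms, and 20% jitter bounds the first delay to 800..=1200ms.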
**What triggers a retry:** any of Rhai runtime error, wall-clock timeout, operation-budget-exceeded, or platform-side failure (Postgres unavailable, executor crashed). Distinguishing them in the dispatcher is fiddly and the retry cost is bounded by `max_attempts`; if op-budget retries become dead-letter spam in practice, revisit.
**Per-trigger override:** the three retry columns on the `triggers` parent table (Layout E) take precedence over the env-configured defaults. Trigger CRUD endpoints accept these on create/update; if omitted, the env defaults are applied at write time (not lazily at dispatch — keeps the policy auditable from the row itself).
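
One way to picture the write-time resolution is sketched below. The env variable names and default values are the ones in the table; everything else (`RetryPolicy`, `RetryOverrides`, `env_defaults`, `resolve_at_write`) is a hypothetical shape chosen for illustration, with the lookup injected so the example stays deterministic.

```rust
#[derive(Debug, Clone, PartialEq)]
struct RetryPolicy {
    max_attempts: u32,
    backoff: String,
    base_ms: u64,
}

/// Optional per-trigger values as they might arrive on create/update.
#[derive(Debug, Default)]
struct RetryOverrides {
    retry_max_attempts: Option<u32>,
    retry_backoff: Option<String>,
    retry_base_ms: Option<u64>,
}

/// Build defaults from an env-style lookup, falling back to the documented values.
fn env_defaults(lookup: impl Fn(&str) -> Option<String>) -> RetryPolicy {
    RetryPolicy {
        max_attempts: lookup("PICLOUD_TRIGGER_RETRY_MAX_ATTEMPTS")
            .and_then(|v| v.parse().ok())
            .unwrap_or(3),
        backoff: lookup("PICLOUD_TRIGGER_RETRY_BACKOFF")
            .unwrap_or_else(|| "exponential".to_string()),
        base_ms: lookup("PICLOUD_TRIGGER_RETRY_BASE_MS")
            .and_then(|v| v.parse().ok())
            .unwrap_or(1000),
    }
}

/// Resolve once, at write time, so the stored row carries the full policy.
fn resolve_at_write(overrides: RetryOverrides, defaults: &RetryPolicy) -> RetryPolicy {
    RetryPolicy {
        max_attempts: overrides.retry_max_attempts.unwrap_or(defaults.max_attempts),
        backoff: overrides.retry_backoff.unwrap_or_else(|| defaults.backoff.clone()),
        base_ms: overrides.retry_base_ms.unwrap_or(defaults.base_ms),
    }
}
```

In a running process the lookup would presumably be `|k| std::env::var(k).ok()`. Because the result is stored on the row, a later change to the env defaults does not silently alter existing triggers.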
**Sync HTTP exception:** unchanged. `reply_to.is_some()` rows are never retried regardless of policy (see below).
### Retry policy — `reply_to` IS the signal
| Outbox row | Retry behavior |
|---|---|
| `reply_to.is_some()` | **Never retry.** Caller is waiting; retrying means the script might run twice and the caller gets one of two outcomes. Always: one attempt, surface result (success or failure) to inbox. |
| `reply_to.is_none()` | Retry per trigger's configured policy. Default: 3 attempts, exponential backoff (1s, 2s, 4s), dead-letter after. |
Per-trigger config lives on the trigger row:
```
trigger { source: cron, schedule: "0 */5 * * * *",
retry: { max_attempts: 5, backoff: exponential, base_ms: 1000 } }
trigger { source: pubsub, topic: "user.created",
retry: { max_attempts: 3, backoff: linear, base_ms: 500 } }
trigger { source: http, method: POST, path: "/api/foo",
dispatch_mode: sync } // retry absent — sync HTTP is always 1-attempt
```
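
The `reply_to` rule from the table above reduces to a small decision function. This is a sketch with hypothetical names (`NextStep`, `after_failure`); it assumes `attempts_so_far` counts the attempt that just failed.

```rust
#[derive(Debug, PartialEq)]
enum NextStep {
    /// Sync HTTP: surface the failure to the waiting caller, never retry.
    ReplyToCaller,
    /// Async row with attempts left: schedule another try.
    RetryAfterBackoff,
    /// Async row with the policy exhausted: move to dead letters.
    DeadLetter,
}

/// Decide what happens to an outbox row after a failed attempt.
fn after_failure(reply_to: Option<u64>, attempts_so_far: u32, max_attempts: u32) -> NextStep {
    if reply_to.is_some() {
        NextStep::ReplyToCaller
    } else if attempts_so_far < max_attempts {
        NextStep::RetryAfterBackoff
    } else {
        NextStep::DeadLetter
    }
}
```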
### Failure / crash handling
With NATS-style indirection, there are new ways for a sync HTTP request to vanish. Every failure path must resolve the orchestrator's oneshot channel with something:
| Failure mode | Detection | Caller sees |
|---|---|---|
| Script throws / runtime error | Executor returns `ExecError::Runtime` → written to inbox | 502 (or 500 — see status-code discussion) |
| Script exceeds wall-clock | `tokio::time::timeout` fires inside dispatcher → written to inbox | 504 (or 500) |
| Operation budget exceeded | Executor returns `ExecError::OperationBudgetExceeded` → inbox | 507 (or 500) |
| Executor process crashes mid-execution | `JoinError` → `ExecError::Runtime` → inbox | 500 |
| Dispatcher process dies between claim and reply | Orchestrator's wait times out | 500 |
| Outbox write fails (Postgres unavailable) | Orchestrator never publishes; immediate error | 500 |
| Orchestrator's own wait times out unexpectedly | Channel timeout fires before inbox resolves | 504 (or 500) |
Every path resolves the channel with a result. The orchestrator's outer timeout is the backstop for "dispatcher just died completely".
### Status code strategy — decided
✅ Decided 2026-06-01: keep the granular status codes (Option A), with one refinement — `500` is reserved for **platform** problems (dispatcher vanished, outbox write failed, inbox channel timed out unexpectedly), not used as a generic catch-all.
| Code | Cause | Who's at fault |
|---|---|---|
| 422 | Request validation failed | Client |
| 502 | Script threw / Rhai runtime error | User script |
| 503 | Gate refused (overloaded); `Retry-After: 1` | Platform (capacity) |
| 504 | Wall-clock timeout | Either (slow script or platform overload) |
| 507 | Operation budget exceeded | User script |
| 500 | Dispatcher vanished / outbox write failed / inbox channel timed out unexpectedly | Platform (bug or infra) |
Rationale: each code is actionable for the caller (back off, redesign as async, fix the script, file a bug). Flattening to `500` would collapse "script crashed" vs "overloaded" vs "your timeout is too tight" vs "platform broke" into one undifferentiated signal — losing both client-facing UX and our own observability/alerting axis.
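
The decided table maps directly onto a lookup. The sketch below uses a hypothetical `SyncOutcome` enum (the real error types, such as `ExecError`, are not reproduced here) and includes a 200 success case that the table leaves implicit.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SyncOutcome {
    Ok,
    ValidationFailed,
    ScriptError,
    GateRefused,
    WallClockTimeout,
    OperationBudgetExceeded,
    /// Dispatcher vanished, outbox write failed, or inbox timed out unexpectedly.
    PlatformFailure,
}

/// HTTP status for each outcome, following the decided table.
fn status_code(outcome: SyncOutcome) -> u16 {
    match outcome {
        SyncOutcome::Ok => 200,
        SyncOutcome::ValidationFailed => 422,
        SyncOutcome::ScriptError => 502,
        SyncOutcome::GateRefused => 503,
        SyncOutcome::WallClockTimeout => 504,
        SyncOutcome::OperationBudgetExceeded => 507,
        SyncOutcome::PlatformFailure => 500,
    }
}

/// Only the overloaded-gate case carries a `Retry-After` hint (1 second).
fn retry_after_secs(outcome: SyncOutcome) -> Option<u32> {
    match outcome {
        SyncOutcome::GateRefused => Some(1),
        _ => None,
    }
}
```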
### Status
- **NATS-style for sync HTTP**: ✅ Decided 2026-06-01 (see §2 open call 1).
- **`reply_to` presence as the "don't retry" signal**: ✅ Decided 2026-06-01 (folded with the NATS-style decision).
- **Status code strategy**: ✅ Decided 2026-06-01 — keep granular distinctions; `500` reserved for platform problems only.
- **Default retry policy**: ✅ Decided 2026-06-01 — 3 attempts / exponential / 1000ms base / ±20% jitter; all four env-overridable via `PICLOUD_TRIGGER_RETRY_*`; per-trigger columns on the parent table take precedence.
- **Cancel-on-timeout semantics**: ✅ Decided 2026-06-01 — option (b). Late results are discarded from the caller's POV (they already got a 504) but the dispatcher writes an `abandoned_executions` row whenever it tries to resolve a oneshot that's already closed/dropped. 7-day default retention via `PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS`; weekly GC sweep. A counter (`picloud_abandoned_executions_total{app_id}`) bumps on insert — that's the primary observability signal; the rows themselves are for forensics when the counter spikes. Only the dispatcher-after-orchestrator-timeout edge case writes a row; ordinary "script timed out, caller got 504" stays uneventful.
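
The cancel-on-timeout edge case can be sketched with a plain channel: once the orchestrator has given up and dropped its receiver, the dispatcher's send fails and the late result is recorded instead. This is an in-memory illustration, assuming `std::sync::mpsc` in place of a oneshot and a `Vec` plus counter in place of the `abandoned_executions` table and metric; `Dispatcher` and `AbandonedExecution` are hypothetical names.

```rust
use std::sync::mpsc;

#[derive(Debug, PartialEq)]
struct AbandonedExecution {
    inbox_id: u64,
    result: String,
}

#[derive(Default)]
struct Dispatcher {
    /// Stand-in for the `abandoned_executions` table.
    abandoned: Vec<AbandonedExecution>,
    /// Stand-in for the counter that bumps on insert.
    abandoned_total: u64,
}

impl Dispatcher {
    /// Try to hand the result to the waiting caller; record it if the caller is gone.
    fn deliver(&mut self, inbox_id: u64, tx: mpsc::Sender<String>, result: String) {
        // `send` returns the value inside the error when the receiver was dropped.
        if let Err(mpsc::SendError(late)) = tx.send(result) {
            self.abandoned.push(AbandonedExecution { inbox_id, result: late });
            self.abandoned_total += 1;
        }
    }
}
```

An on-time delivery leaves both the list and the counter untouched, which matches the intent that only this one edge case writes a row.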
### Open calls
1. ~~NATS-style request/reply for sync HTTP~~ — ✅ Decided 2026-06-01 (see §2 #3).
2. ~~Status code strategy~~ — ✅ Decided 2026-06-01: Option A (keep distinctions); 500 reserved for platform problems.
3. ~~Default retry policy on triggers~~ — ✅ Decided 2026-06-01: 3/exp/1000ms base + ±20% jitter; env-overridable via `PICLOUD_TRIGGER_RETRY_*`; per-trigger row columns override the env defaults.
4. ~~Cancel-on-timeout semantics~~ — ✅ Decided 2026-06-01: option (b) — `abandoned_executions` table, dispatcher-written, 7-day retention, metric counter on insert.
---
## 4. Dead-letter handling
Events that exhaust their retry policy land in a **separate `dead_letters` table** (not a flag on the outbox — outbox should stay a queue with fast inserts and scans). Users handle dead letters by registering a script for the new `dead_letter` **trigger kind**.
### Schema sketch
```sql
CREATE TABLE dead_letters (
id UUID PRIMARY KEY,
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
original_event_id UUID NOT NULL, -- the outbox row id
source TEXT NOT NULL, -- "kv", "cron", "pubsub", "queue", "email"
op TEXT NOT NULL,
trigger_id UUID, -- which trigger config fired (null for direct dispatches)
script_id UUID, -- which script failed
payload JSONB NOT NULL, -- the event payload, verbatim
attempt_count INT NOT NULL,
first_attempt_at TIMESTAMPTZ NOT NULL,
last_attempt_at TIMESTAMPTZ NOT NULL,
last_error TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
resolved_at TIMESTAMPTZ, -- null = unresolved
resolution TEXT -- "replayed" | "ignored" | "handled_by_script" | "handler_failed"
);
CREATE INDEX idx_dead_letters_app_unresolved
ON dead_letters(app_id) WHERE resolved_at IS NULL;
```
### Dead letter as trigger source
```
trigger {
source: dead_letter,
filter: { source: "kv" }, -- optional; defaults to "any source"
script_id: <your handler>,
dispatch_mode: async,
retry: { max_attempts: 1 } -- forced — see recursion stop rule below
}
```
Filterable on:
- `source`: only dead letters from a particular event source (kv, cron, pubsub, …)
- `trigger_id`: only dead letters from a particular trigger config
- `script_id`: only dead letters from a particular script
- No filter: every dead letter fires this handler
`ctx.event` for a dead-letter handler:
```rhai
ctx.event.source // "dead_letter"
ctx.event.dead_letter = #{
original: #{
source: "kv",
op: "insert",
collection: "widgets",
key: "k1",
payload: #{ ... }
},
attempts: 3,
last_error: "script timeout after 30s",
trigger_id: "...",
script_id: "...",
first_attempt_at: "2026-05-30T12:00:00.000Z",
last_attempt_at: "2026-05-30T12:00:14.000Z"
}
```
The handler can `log::error`, send `email::send` to admins, write to `docs::collection("incidents").create(...)`, post to external alerting via `http::post`, or call `dead_letters::replay(id)` if it decides retry is favorable.
### Recursion stop rule — decided
✅ Decided 2026-06-01: **dead-letter handlers execute once, no retry, and CANNOT themselves be dead-lettered.**
- The flag lives on the **execution/outbox row** (set by the dispatcher when it picks a row whose trigger has `kind = 'dead_letter'`), not on the trigger config. Same handler script could in principle be reused for non-DL work without inheriting the no-retry treatment.
- On handler failure:
- Full payload + error logged to structured logs
- Counter `picloud_dead_letter_handler_failures{app_id}` bumped
- Original dead-letter row annotated with `resolution = 'handler_failed'`
- **No retry, no second dead-letter row, no further fire.**
- **Missing handler script** (trigger references `script_id` that's been deleted): treated as a handler failure — same metric bump, same `resolution = 'handler_failed'`, same no-retry. Auto-disabling the trigger is deferred to v1.2; for v1.1.1 the user sees the metric spike and investigates.
- **Indirect loops** (DL handler writes to KV → fires a KV trigger → that handler fails → dead-letters → fires the same DL handler) are not blocked by this rule directly; they're bounded by the existing trigger-depth limit (`cx.trigger_depth`). The recursion-stop rule only prevents the *direct* infinite regress where a DL handler's failure would itself produce a DL row.
Rationale: if your alerting script is broken, the platform shouldn't try to alert about that with the same broken script. The chain has to terminate, period.
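The direct-recursion rule can be expressed as a small decision function. This is a sketch under assumptions: `FailureAction` and `on_failure` are hypothetical names, and the boolean input stands in for the per-execution flag that the dispatcher sets when the trigger has `kind = 'dead_letter'`.

```rust
/// What the dispatcher does after an execution fails. Illustrative names.
#[derive(Debug, PartialEq, Eq)]
pub enum FailureAction {
    /// Schedule another attempt under the trigger's retry policy.
    Retry,
    /// Retries exhausted: write a row to `dead_letters`.
    DeadLetter,
    /// A dead-letter handler failed: annotate the original row with
    /// `resolution = 'handler_failed'`, bump the counter, and stop.
    MarkHandlerFailed,
}

/// `is_dead_letter_handler` is the flag carried on the execution row.
/// `attempts_made` counts attempts including the one that just failed.
pub fn on_failure(
    is_dead_letter_handler: bool,
    attempts_made: u32,
    max_attempts: u32,
) -> FailureAction {
    if is_dead_letter_handler {
        // Never retried and never dead-lettered, whatever the retry config says.
        return FailureAction::MarkHandlerFailed;
    }
    if attempts_made < max_attempts {
        FailureAction::Retry
    } else {
        FailureAction::DeadLetter
    }
}
```

Checking the flag before the attempt count is what makes the chain terminate: a handler failure can never reach the `DeadLetter` branch.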
### Defaults — decided
✅ Decided 2026-06-01: **no automatic handler.** Dead letters land in the table; users opt into handling by registering a `dead_letter` trigger.
**Load-bearing commitment:** the v1.1.1 dashboard surfaces this state. Without dashboard surface, "no default handler" is irresponsible — users wouldn't know dead-letters exist until they queried Postgres directly. So shipping the table without the UI is not an option.
Required in v1.1.1 alongside the table:
- An **unresolved-count badge** per app, visible in the dashboard's app list and on the app detail page. Source query: `SELECT count(*) FROM dead_letters WHERE app_id = $1 AND resolved_at IS NULL`.
- A **per-app dead-letters list view** reachable from the badge. Columns: `created_at`, `source`, `op`, `script_id`, `last_error`, `attempt_count`, `first_attempt_at`, `last_attempt_at`. Per-row actions: **Replay** (re-inserts the original event into the outbox; dispatcher tries again from scratch) and **Mark resolved** (sets `resolution = 'ignored'`, no further action).
- A row detail panel showing the full payload + complete error history.
Rationale: most apps will run for months without ever needing a DL handler; the table is the durable record either way. The dashboard surface gives users the lightest-touch signal that something is wrong without committing v1.1.1 to building a notifications channel.
A heavier built-in default ("log to admin notifications channel") was considered and rejected — it would smuggle a notifications-surface design into v1.1.1 under the guise of a default, with real product-design questions (channel shape, configuration, opt-out, rate-limiting) that aren't worth answering yet. If the dashboard badge proves insufficient in practice, a structured-log fallback (writing to `execution_logs` with a known `dead_letter` shape) is an additive future change, not a breaking one.
### Sync HTTP failures don't dead-letter
Failures of sync HTTP requests (`reply_to.is_some()`) don't land in `dead_letters`, for three reasons: the caller already got an error response; dead-lettering every failed HTTP request would flood the table; and `execution_logs` already captures sync request failures. If a user wants alerts on HTTP endpoint failures, that's **monitoring** (v1.3+ territory), not dead-lettering.
### Pub/sub fan-out dead-letters independently
One `pubsub::publish` → N subscribers → each retries independently → each can independently dead-letter. So one publish can produce N dead-letter rows (one per subscriber that exhausted retries). Subscribers are independent failure domains.
### Manual replay — Rhai SDK scope decided
✅ Decided 2026-06-01: ship `dead_letters::replay(id)` and `dead_letters::resolve(id, reason)` in v1.1.1; **defer `dead_letters::list(filter)` to v1.2** to align with `docs::find()` query semantics.
| Surface | Use case | Shipping in |
|---|---|---|
| `POST /api/v1/admin/apps/{id}/dead_letters/{dl_id}/replay` | Admin clicks "replay" in dashboard | v1.1.1 |
| `POST /api/v1/admin/apps/{id}/dead_letters/{dl_id}/resolve` | Admin marks resolved via dashboard | v1.1.1 |
| `GET /api/v1/admin/apps/{id}/dead_letters` | Dashboard list view | v1.1.1 |
| `dead_letters::replay(id)` Rhai SDK | A handler script decides to retry programmatically | v1.1.1 |
| `dead_letters::resolve(id, reason)` Rhai SDK | A handler decides "this is fine, don't bother me" | v1.1.1 |
| `dead_letters::list(filter)` Rhai SDK | Bulk replay / cleanup scripts | **v1.2** (aligns with `docs::find()` query DSL) |
Replay re-inserts the original event into the outbox; dispatcher tries again from scratch.
**Authz:** both replay and resolve are gated by a new `Capability::AppDeadLetterManage(AppId)` checked inside the service methods. The capability is granted to app admins by default (existing Phase 3.5 role hierarchy). A public HTTP script running with `principal: None` would fail this check, which is correct.
**Trigger-execution principal (related decision):** ✅ a trigger execution runs as the principal that **registered the trigger**, captured on the trigger row at registration time. This gives a clean "the trigger fires as you" model and matches how cron jobs are typically conceptualized. The original event's principal (e.g. the anonymous caller of a public HTTP route) is recorded for forensics on the outbox row but does not become the execution principal. This is a wider trigger-framework decision surfaced here because dead-letter authz is the first concrete consumer; it applies to **every** trigger kind, not just dead-letter.
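A capability gate of the kind described above might look like the following sketch. Only the `Capability::AppDeadLetterManage(AppId)` shape comes from these notes. `Principal`, `AuthzError`, and `require` are hypothetical, and `AppId` is simplified to an integer here, whereas the schema uses a UUID.

```rust
/// Simplified stand-in; the schema sketch uses a UUID for app ids.
pub type AppId = u64;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Capability {
    AppDeadLetterManage(AppId),
}

/// Hypothetical principal carrying its granted capabilities.
pub struct Principal {
    pub capabilities: Vec<Capability>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AuthzError {
    /// No principal at all, e.g. a public HTTP script (`principal: None`).
    Anonymous,
    MissingCapability,
}

/// Check intended to run inside the service method for replay and resolve.
pub fn require(principal: Option<&Principal>, needed: &Capability) -> Result<(), AuthzError> {
    let p = principal.ok_or(AuthzError::Anonymous)?;
    if p.capabilities.contains(needed) {
        Ok(())
    } else {
        Err(AuthzError::MissingCapability)
    }
}
```

Because a trigger execution runs as the registering principal, a dead-letter handler registered by an app admin would pass this check for its own app, while an anonymous public route would not.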
### Retention — decided
✅ Decided 2026-06-01: **30 days, GC by `created_at`, env-overridable only (no per-app override in v1.1.1).**
- Default: 30 days
- Override: `PICLOUD_DEAD_LETTER_RETENTION_DAYS` (whole-deployment, not per-app)
- GC condition: `created_at < NOW() - retention` — applies to both resolved and unresolved rows uniformly. (Activity-age GC — keeping recently-resolved rows 30 days post-resolution — was considered and deferred; can switch if user feedback shows it's needed without breaking anything.)
- GC job: weekly sweep in `manager-core`, claiming via `FOR UPDATE SKIP LOCKED` to match the dispatcher's claim pattern.
Per-app retention overrides are deferred to a later release. The env var covers single-deployer needs; per-app settings would need a dashboard surface + permissions story that isn't worth smuggling into v1.1.1.
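The retention rule reduces to two small pieces: reading the override and applying `created_at < NOW() - retention`. The sketch below shows both with hypothetical function names. Falling back to the default on an unset, unparsable, or zero value is an assumption; these notes do not say how invalid values are handled.

```rust
use std::time::{Duration, SystemTime};

pub const DEFAULT_RETENTION_DAYS: u64 = 30;

/// `raw` is the value of `PICLOUD_DEAD_LETTER_RETENTION_DAYS`, if set.
/// Assumption: unset, unparsable, or zero values fall back to the default.
pub fn retention_days(raw: Option<&str>) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|days| *days > 0)
        .unwrap_or(DEFAULT_RETENTION_DAYS)
}

/// True when the row is strictly older than the retention window.
/// Applies uniformly to resolved and unresolved rows.
pub fn is_expired(created_at: SystemTime, now: SystemTime, retention_days: u64) -> bool {
    let retention = Duration::from_secs(retention_days.saturating_mul(86_400));
    match now.duration_since(created_at) {
        Ok(age) => age > retention,
        // `created_at` lies in the future (clock skew): never expired.
        Err(_) => false,
    }
}
```

In the real sweep the comparison would happen in SQL; the function only shows that a row exactly 30 days old is still kept.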
### Status
- **Separate `dead_letters` table**: leaning yes.
- **`dead_letter` as trigger kind**: leaning yes.
- **Recursion stop rule** (handlers can't be dead-lettered): ✅ Decided 2026-06-01 (above); flag lives on the execution; missing-handler case treated as handler failure.
- **No default handler** (rows sit in table; dashboard surfaces them): ✅ Decided 2026-06-01 — unresolved-count badge + per-app list view ship in v1.1.1 alongside the table.
- **Sync HTTP failures don't dead-letter**: leaning yes.
- **Retention**: ✅ Decided 2026-06-01 — 30 days, GC by `created_at`, env-only override (`PICLOUD_DEAD_LETTER_RETENTION_DAYS`); weekly `FOR UPDATE SKIP LOCKED` sweep in `manager-core`.
- **Rhai SDK scope**: ✅ Decided 2026-06-01 — `replay` + `resolve` ship in v1.1.1; `list` deferred to v1.2 to align with `docs::find()` query DSL. New `Capability::AppDeadLetterManage(AppId)`.
- **Trigger-execution principal**: ✅ Decided 2026-06-01 — trigger fires as the principal that registered it (captured on the trigger row at registration). Original event's principal is recorded on the outbox row for forensics but does not become the execution principal. Applies to all trigger kinds.
### Open calls
1. ~~Dead-letter handlers unretryable + can't be dead-lettered themselves~~ — ✅ Decided 2026-06-01: confirmed; flag on execution; missing-handler = `resolution = 'handler_failed'`; indirect loops bounded by `cx.trigger_depth`.
2. ~~No default dead-letter handler~~ — ✅ Decided 2026-06-01: confirmed; rows sit in the table by default. Dashboard unresolved-count badge + per-app DL list view (with Replay + Mark-resolved actions) ship in v1.1.1 alongside the table.
3. ~~30-day default retention~~ — ✅ Decided 2026-06-01: 30 days, GC by `created_at`, env-only override; per-app retention deferred.
4. ~~Rhai SDK for dead-letters in v1.1.1~~ — ✅ Decided 2026-06-01: `replay` + `resolve` ship; `list` deferred to v1.2 to align with `docs::find()`; new `Capability::AppDeadLetterManage(AppId)`. Related: trigger executions run as the trigger-registering principal.
---
## 5. Realtime updates for external clients
Apps built on PiCloud need a way for browser/mobile clients to receive live updates (chat messages, dashboard data, multiplayer state, notifications). Today's pub/sub is internal-only (script ↔ script via triggers).
### The chosen approach — decided
✅ Decided 2026-06-01: **Option C (one publish API, topics opt-in to external visibility) with the registration split below.**
- One `pubsub::publish_durable(topic, msg)` API for scripts — produces a single event regardless of who subscribes.
- Topics are **internal-only by default**: script triggers can subscribe; external clients cannot.
- **Externally-subscribable topics must be registered explicitly** (admin API + dashboard surface). Internal-only topics remain implicit — anyone can `publish_durable("any.topic", msg)` and triggers can subscribe without registration. To externalize: create a `topics` row with `external_subscribable = true` first.
- External clients connect to `GET /realtime/topics/{topic}` via SSE; they only receive messages from registered, externally-subscribable topics they're permitted to access.
**UI/security commitments** (the difference between C working and C being default-public in disguise):
1. The externally-subscribable opt-in is prominent UI, not a buried checkbox.
2. The topic list view shows "external: yes/no" as a first-class column.
3. Marking a topic externally-subscribable requires app admin role (capability-gated via `Capability::AppTopicManage(AppId)`).
4. The bit-flip is its own API endpoint (not a side-effect of generic topic update) so it carries an independent audit trail.
**Wins:** one publish API for scripts (DRY), topics are private by default (security), external visibility requires deliberate explicit registration (not just a config flag flipped during quick edits).
**Why not A (every topic externally-visible by default):** topic names tend to describe the event, not the audience; internal topics frequently carry PII or sensitive payloads; the Firebase-style "remember to lock it down" anti-pattern this whole design rejects.
**Why not B (separate `channels::` service):** doubles the publish API for almost-identical use cases; scripts wanting both internal triggers AND client push would publish twice; users wrap it in a helper and we're back at C with extra steps and no central policy enforcement.
### Transport: SSE first — decided
✅ Decided 2026-06-01: **SSE-only for v1.1.6. WebSocket added in a later release if real bidirectional demand emerges.**
- Simpler than WebSocket; works through any HTTP proxy without protocol upgrade
- Browsers auto-reconnect on disconnect (native `EventSource`)
- Covers the dominant use cases (chat-message-list updates, dashboard streams, notifications, IoT telemetry, build-status streams) cleanly
- Production-quality SSE requires HTTP/2 between Caddy and clients to dodge the per-origin connection cap on HTTP/1.1 — Caddy speaks HTTP/2 by default, so this is just a config note for the deploy docs
**Why not ship WS in v1.1.6:** WS is the right tool for sub-100ms bidirectional state (multiplayer games, CRDT collaborative editing, typing-indicator-level presence). On consumer hardware with Postgres-backed event distribution, that latency budget is dominated by the server stack anyway — WS would be paying implementation cost (frame management, ping/pong, close codes, backpressure protocol) without unlocking the latency it's designed for. SSE-only also frees v1.1.6 to invest in `@picloud/client` library quality instead of transport edge cases.
**Future addition path:** WebSocket coexists with SSE on a different endpoint (e.g. `/realtime/ws/{topic}`) backed by the same subscriber registry. Purely additive — no SSE clients break, no architecture decision in v1.1.6 closes the door.
### Auth model for external subscribers — decided
✅ Decided 2026-06-01: ship **public** + **HMAC-signed subscriber-token** auth in v1.1.6; **users-SDK session-based** auth follows in v1.1.8 (additive); **script-mediated per-subscribe** auth deferred to v1.2.
**Topic config columns:**
- `external_subscribable: bool` — can external clients ever subscribe?
- `auth_mode: 'public' | 'token'` — if external, what's the gate? (ignored when `external_subscribable = false`)
- v1.1.8 adds `auth_mode = 'session'` for users-SDK-based sessions; v1.2 adds `auth_mode = 'script'` for script-mediated.
**v1.1.6 trust flow (token-gated topics):**
| Hop | Auth mechanism |
|---|---|
| Script → its own token-mint endpoint | Existing API-key + app authz |
| Script → SDK helper to mint token | New `pubsub::subscriber_token(topics, ttl)` |
| Frontend → script's token endpoint | App's own auth (cookie/session/whatever the app defines) |
| Frontend → PiCloud SSE | Short-lived HMAC-signed subscriber token (bearer header) |
| SSE handler → token validation | HMAC verify, scope-check requested topic against token's allowed list |
The frontend **never** touches the app's API key. The script signs scoped, short-lived bearers (HMAC over `{topic_list, exp, app_id}`) with a secret derived from the app's API-key material. The SSE endpoint validates the signature without a DB lookup.
**Token TTL:** clamped 10s ≤ ttl ≤ 24h. Default 1h. Both bounds and default env-overridable (`PICLOUD_SUBSCRIBER_TOKEN_TTL_MIN_SEC`, `PICLOUD_SUBSCRIBER_TOKEN_TTL_MAX_SEC`, `PICLOUD_SUBSCRIBER_TOKEN_TTL_DEFAULT_SEC`).
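The clamp itself is a one-liner. The sketch below assumes the three env vars have already been parsed into a hypothetical `TtlBounds` struct; `effective_ttl` is likewise an illustrative name.

```rust
/// Bounds in seconds; defaults mirror the decided values (10s, 24h, 1h).
pub struct TtlBounds {
    pub min_sec: u64,
    pub max_sec: u64,
    pub default_sec: u64,
}

impl Default for TtlBounds {
    fn default() -> Self {
        Self { min_sec: 10, max_sec: 86_400, default_sec: 3_600 }
    }
}

/// Out-of-range requests are clamped rather than rejected.
/// Note: `clamp` panics if `min_sec > max_sec`, so a real implementation
/// would validate the env overrides at startup.
pub fn effective_ttl(requested_sec: Option<u64>, bounds: &TtlBounds) -> u64 {
    requested_sec
        .unwrap_or(bounds.default_sec)
        .clamp(bounds.min_sec, bounds.max_sec)
}
```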
**Token revocation:** none in v1.1.6 by design. HMAC bearers can't be revoked individually; rotation of the signing key invalidates all bearers wholesale. Short TTL is the safety mechanism. Per-token revocation arrives implicitly with v1.1.8's session-based auth (sessions CAN be invalidated).
**Public topics:** no auth at all. `GET /realtime/topics/{topic}` works for anyone if the topic has `external_subscribable = true AND auth_mode = 'public'`. Used for marketing-style broadcasts and public stat boards.
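Putting the topic config and the token checks together, the SSE handler's admission decision could be sketched as below. All type and function names are hypothetical. The HMAC signature check is assumed to have happened already, so `TokenClaims` represents a verified payload, and `AppId` is simplified to an integer.

```rust
pub enum AuthMode {
    Public,
    Token,
}

/// A row from the `topics` table; unregistered topics have no row.
pub struct Topic {
    pub external_subscribable: bool,
    pub auth_mode: AuthMode,
}

/// Claims from a subscriber token whose signature was already verified.
pub struct TokenClaims {
    pub app_id: u64,
    pub topics: Vec<String>,
    pub exp_unix: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Denied {
    NotExternal,
    TokenRequired,
    WrongApp,
    Expired,
    TopicNotInScope,
}

pub fn authorize_subscribe(
    topic_name: &str,
    app_id: u64,
    topic: Option<&Topic>,
    claims: Option<&TokenClaims>,
    now_unix: u64,
) -> Result<(), Denied> {
    // Internal-only is the default: a missing row or a false flag both deny.
    let topic = match topic {
        Some(t) if t.external_subscribable => t,
        _ => return Err(Denied::NotExternal),
    };
    match topic.auth_mode {
        AuthMode::Public => Ok(()),
        AuthMode::Token => {
            let c = claims.ok_or(Denied::TokenRequired)?;
            if c.app_id != app_id {
                return Err(Denied::WrongApp);
            }
            if now_unix >= c.exp_unix {
                return Err(Denied::Expired);
            }
            if !c.topics.iter().any(|t| t == topic_name) {
                return Err(Denied::TopicNotInScope);
            }
            Ok(())
        }
    }
}
```

None of these checks needs a database lookup beyond the topic row itself, which matches the stated goal of validating bearers without per-token state.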
### Status
- **Approach C (opt-in external subscription)**: ✅ Decided 2026-06-01 — internal-only by default; externally-subscribable topics require explicit registration + admin-role capability; UI surface treats the bit-flip as a deliberate, audited action.
- **SSE first, WebSocket later**: ✅ Decided 2026-06-01 — SSE-only in v1.1.6; WS deferred until concrete demand emerges; future addition is purely additive on a separate endpoint.
- **Public + token-gated auth in v1.1.6**: ✅ Decided 2026-06-01 — HMAC-signed subscriber-token flow (not raw API-key passing); `users::*` session-based and script-mediated auth deferred per the table above.
### Open calls
1. ~~Approach C confirmed~~ — ✅ Decided 2026-06-01: yes, with explicit registration required for externally-subscribable topics (internal-only stays implicit); new `Capability::AppTopicManage(AppId)`.
2. ~~SSE first, WebSocket deferred~~ — ✅ Decided 2026-06-01: SSE-only in v1.1.6; WS deferred to a later release; future addition is purely additive.
3. ~~Auth model~~ — ✅ Decided 2026-06-01: public + HMAC-signed subscriber tokens in v1.1.6; `users::*` session auth in v1.1.8; script-mediated auth in v1.2; token TTL clamped 10s–24h (default 1h), env-overridable; no per-token revocation in v1.1.6 (rely on TTL).
---
## 6. Frontend client library
Strategic positioning question: how much should PiCloud expose to frontend developers building apps on top of it?
### The two ends of the spectrum
| End | Frontend gets | Examples |
|---|---|---|
| **Minimalist** | HTTP to dev-defined script endpoints + SSE on dev-marked-public topics. Nothing else. | AWS Lambda + API Gateway, Cloudflare Workers, Deno Deploy |
| **Maximalist** | Direct client-side access to KV/docs/users/files. Frontend writes `kv.get()`, `docs.find()`, no Rhai script for trivial reads. | Firebase, Supabase, AWS Amplify |
PiCloud today sits at the minimalist end (services exist for scripts to use, not for frontends). Crossing to maximalist would be a real product pivot, not a feature add.
### The chosen approach: hybrid — decided
✅ Decided 2026-06-01: **Hybrid model. No direct service access from the frontend; client library standardizes script-mediated ceremony.**
Four pieces ship in `@picloud/client` for v1.1.6:
1. **Typed HTTP client to dev-defined endpoints** — `picloud.endpoint('/api/users').post({ name: 'alice' })`. Fetch wrapper with auth header injection, retry logic, structured error handling.
2. **SSE subscription** — `picloud.subscribe('chat-room-123', msg => …)`. Auto-reconnect, token refresh, backpressure.
3. **Auth flow helpers** — `picloud.auth.login(email, password)`, `picloud.auth.logout()`, `picloud.auth.token`. These call **dev-defined** endpoints under the hood (`/api/auth/login` etc.); the lib just standardizes the dance + token storage.
4. **Realtime-aware framework hooks** — `useTopic(topic)` for React, store-shape `subscribe(topic)` for Svelte. Thin polish over the SSE primitive; what frontend devs actually write.
Hard rule, load-bearing: **no `picloud.kv.get()` / `picloud.docs.find()` / `picloud.users.list()` from the frontend.** Withholding direct service access from the browser is a strategic and security commitment, not a v1.1.6 limitation. A frontend dev who wants `kv.get()` from the browser writes a 6-line Rhai script and binds it to a route. That friction is intentional: it makes the dev decide deliberately that the read is okay to expose.
**Why not Firebase-mode** (full direct service access):
- Different product, different competition (Supabase / Amplify / Appwrite have a 5-year head start and full-time teams).
- Requires security-rule language + per-row authorization evaluator + tooling that PiCloud's solo-dev audience cannot operate safely. Firebase's #1 cause of data exposure is misconfigured rules — well-documented, recurring.
- Script-as-gate is dramatically more defensible: the rules are just code, in the same language as the rest of the app, debuggable like any other code.
**Why not pure-minimalist** (no client lib, just docs):
- Every PiCloud frontend dev hand-rolls the same fetch wrapper, SSE reconnect, token refresh, login/logout dance. Shipping `@picloud/client` removes that boilerplate without expanding the security surface.
### Why hybrid, not maximalist
Firebase trades security for DX; the security-rule misconfiguration footgun is the #1 cause of accidental data exposure in serverless apps. PiCloud's "solo dev / consumer hardware" audience does not have the operational capacity to defend a Firebase-style attack surface against misconfiguration. The script layer is also where PiCloud differentiates — if frontends bypass scripts to talk directly to services, we're competing with Supabase head-to-head (unwinnable, they're better-resourced and have a 5-year head start).
### Why hybrid, not pure minimalist
A frontend dev shouldn't have to hand-roll fetch wrappers, SSE reconnect logic, and token-refresh dances. That stuff is identical across every app. Shipping it as `@picloud/client` is genuinely valuable — it doesn't expand the security surface (scripts still gate everything), it just removes boilerplate.
### TypeScript first — decided
✅ Decided 2026-06-01: **TypeScript only for v1.1.6. Other-language SDKs deferred, demand-driven, no preemptive ranking.**
- TS covers ~85% of the realistic v1.x audience (web + React Native mobile + Capacitor + Electron).
- Native iOS / Android / Python / Rust / Go users can hit the REST + SSE endpoints directly without an SDK; they lose the typed wrapper but aren't blocked from shipping.
- The REST + SSE surface is documented as the **public protocol contract** so that PiCloud or the community can later build other-language SDKs against a stable spec. PiCloud doesn't promise specific languages or timelines preemptively; a real user with a concrete use case is what triggers a new SDK.
- **Known caveat:** React Native doesn't ship a native `EventSource`. The TS client should runtime-detect and either fall back gracefully or require an explicit polyfill (`react-native-sse` / `react-native-event-source`) with clear docs. Not a blocker; worth surfacing in the v1.1.6 README.
### Status
- **Hybrid model (frontend through scripts only)**: ✅ Decided 2026-06-01 — confirmed; no direct service access from the browser; client lib standardizes script-mediated ceremony only.
- **TypeScript first, other languages deferred**: ✅ Decided 2026-06-01 — TS-only in v1.1.6; REST + SSE documented as public protocol contract; other languages demand-driven with no preemptive ranking; React Native SSE polyfill noted as known caveat.
- **Co-ship with realtime as v1.1.6**: ✅ Decided 2026-06-01 — server-side realtime AND `@picloud/client@1.0.0` ship together in v1.1.6. Built in parallel against a frozen REST + SSE spec. If v1.1.6 scope blows up under pressure, the lib is the deferrable piece (slips to v1.1.6.1); the realtime server itself doesn't slip.
- **Type safety / codegen**: ✅ Decided 2026-06-01 — defer codegen to v1.2+; v1.1.6 ships hand-written types with `endpoint<Req, Res>()` generic + optional client-side runtime validation via user-provided schemas (zod/valibot adapter; ~50 lines). No schema-declaration syntax in v1.1.6 — committing to that before v1.2's coherent codegen design would lock us into a shape we'd regret. Doc schemas (already arriving in v1.1.2) are the natural foundation for v1.2 codegen; script-endpoint schemas get designed alongside the generator, not before.
### Open calls
1. ~~Hybrid model~~ — ✅ Decided 2026-06-01: confirmed; no direct service access from the frontend; `@picloud/client` ships typed HTTP + SSE + auth-flow + framework hooks.
2. ~~TypeScript first, multi-language deferred~~ — ✅ Decided 2026-06-01: TS-only in v1.1.6; REST + SSE is the public protocol; other-language SDKs are demand-driven; React Native SSE polyfill caveat documented.
3. ~~Co-ship realtime + client lib~~ — ✅ Decided 2026-06-01: co-ship in v1.1.6, built in parallel against a frozen REST + SSE spec. Lib is the deferrable piece under scope pressure (slips to v1.1.6.1); server doesn't slip.
4. ~~Type safety / codegen~~ — ✅ Decided 2026-06-01: defer codegen to v1.2+; v1.1.6 ships hand-written types with `endpoint<Req, Res>()` generic + optional zod/valibot runtime validation; no schema declarations in v1.1.6.
---
## 7. Revised v1.1.x roadmap
Net changes vs the [blueprint §12](../serverless_cloud_blueprint.md) roadmap:
- **v1.1.5 pub/sub**: now via trigger outbox (drops `LISTEN/NOTIFY` plan), tightening implementation scope
- **NEW v1.1.6 Realtime Channels & Client Library**: realtime SSE + `@picloud/client` TS package; co-shipped
- **v1.1.7+ items shifted by one** (was v1.1.6/7/8 → now v1.1.7/8/9)
- **Dead letters and the unified outbox/dispatcher** are absorbed into v1.1.1's existing scope (triggers framework)
| Version | Capability |
|---|---|
| **v1.1.0** | **Foundation & Standard Library** — SDK shape, `Services` bundle, `SdkCallCx`, `ExecutionGate`, `ServiceEventEmitter` trait shape; stdlib utilities (regex, random, time, json, base64, hex, url). ✓ Shipped. |
| **v1.1.1** | **Storage & Events** — KV store keyed `(app_id, collection, key)`; triggers framework (universal outbox + dispatcher + NATS-style sync HTTP via inbox + per-trigger retry config + dead-letter table & `dead_letter` trigger source + trigger CRUD + `ctx.event` + depth limit); KV trigger kinds. |
| **v1.1.2** | **Documents** — `docs::collection(name).create/find/update/delete/list` with `docs:*` triggers. |
| **v1.1.3** | **Modules** — `scripts.kind`, per-app resolver replaces `DummyModuleResolver`, AST cache + dep-graph invalidation. |
| **v1.1.4** | **Outbound HTTP & Scheduled Tasks** — `http::*` with SSRF deny-list; cron triggers (small now that the framework exists). |
| **v1.1.5** | **Files & Pub/Sub** — filesystem-backed blobs (`files/<app_id>/<id[0:2]>/<id>`) with `files:*` triggers; pub/sub via the universal outbox with `pubsub:*` triggers. |
| **v1.1.6** | **Realtime Channels & Client Library** *(new)* — SSE-based external subscription to per-app pub/sub topics (public + HMAC-signed subscriber-token auth, minted via `pubsub::subscriber_token`); `@picloud/client` TypeScript package (typed HTTP via `endpoint<Req,Res>()`, SSE subscription, auth helpers, framework hooks). |
| **v1.1.7** | **Configuration & Email** *(was v1.1.6)* — encrypted per-app secrets; outbound `email::send/send_html` + inbound `email:receive` trigger. |
| **v1.1.8** | **User Management** *(was v1.1.7)* — `users::*` for in-script CRUD, auth, roles, invites, password reset. |
| **v1.1.9** | **Durable Queues & Function Composition** *(was v1.1.8)* — `queue::*` with `queue:receive` trigger; `invoke()` + `retry::*` (closures-as-args, re-entrant Rhai). |
| **v1.2** | **Workflows & Hierarchies** (per blueprint §Phase 5) — DAG execution, advanced docs query, interceptors, read triggers, audit log, script-mediated realtime auth, `dead_letters::list` (aligned with `docs::find()` query DSL), client-lib type codegen from script-declared schemas. |
| **v1.3+** | **Scale & Ops** (per blueprint §Phase 6) — cluster mode (NATS-style request/reply swaps to `LISTEN/NOTIFY`), cross-app data sharing, script versioning + rollback, rate limiting, richer auth, metrics, distributed tracing, webhooks, S3, monitoring/alerting on HTTP endpoint failures. |
The v1.1.9 release marks the end of the v1.1.x expansion cadence. v1.2 is the next minor product bump (phase milestone per [versioning policy](versioning.md)).
---
## Consolidated open calls
All 20 open calls were resolved on 2026-06-01. This section is retained as a quick decision index — each item links the original question to the decision recorded in its section above. Sections will be pruned individually as their decisions ship into code and the [serverless_cloud_blueprint.md](../serverless_cloud_blueprint.md).
### §1 — Messaging primitives
1. ~~Pub/sub durability via trigger outbox~~ — ✅ Decided 2026-06-01: `publish_durable` ships in v1.1.5; `publish_ephemeral` committed as a future API.
2. ~~Queue and pub/sub stay separate~~ — ✅ Decided 2026-06-01: separate top-level namespaces; no unifying messaging abstraction.
### §2 — Universal trigger outbox
3. ~~Sync HTTP via outbox + per-request inbox~~ — ✅ Decided 2026-06-01: yes via outbox; in-process oneshot for v1.1.1, `LISTEN/NOTIFY` preserved as the cluster-mode (v1.3+) cross-process variant.
4. ~~Ship `dispatch_mode: async` for HTTP routes in v1.1.1~~ — ✅ Decided 2026-06-01: yes; `202 Accepted` + JSON body with `execution_id`; route-level config only.
5. ~~Trigger storage shape~~ — ✅ Decided 2026-06-01: Layout E (parent `triggers` + per-kind `<kind>_trigger_details`); `routes` stays its own table for v1.1.x; column-set refinements deferred to implementation PR.
### §3 — NATS-style sync HTTP
6. ~~NATS-style request/reply for sync HTTP~~ — ✅ Decided 2026-06-01 (see §2 #3).
7. ~~Status code strategy~~ — ✅ Decided 2026-06-01: keep distinctions; `500` reserved for platform problems.
8. ~~Default retry policy on triggers~~ — ✅ Decided 2026-06-01: 3/exp/1000ms + ±20% jitter; env-overridable via `PICLOUD_TRIGGER_RETRY_*`; per-trigger columns override.
9. ~~Cancel-on-timeout semantics~~ — ✅ Decided 2026-06-01: (b) — `abandoned_executions` table; dispatcher-written; 7-day retention via `PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS`; metric counter on insert.
### §4 — Dead letters
10. ~~Dead-letter handlers unretryable + can't be dead-lettered themselves~~ — ✅ Decided 2026-06-01: confirmed; flag lives on the execution; missing handler = `resolution = 'handler_failed'`; indirect loops bounded by `cx.trigger_depth`.
11. ~~No default dead-letter handler~~ — ✅ Decided 2026-06-01: confirmed; rows sit in the table by default. Dashboard unresolved-count badge + per-app DL list view ship in v1.1.1.
12. ~~30-day default retention~~ — ✅ Decided 2026-06-01: 30 days, GC by `created_at`, env-only override (`PICLOUD_DEAD_LETTER_RETENTION_DAYS`).
13. ~~Rhai SDK for dead-letters in v1.1.1~~ — ✅ Decided 2026-06-01: `replay` + `resolve` in v1.1.1; `list` deferred to v1.2; new `Capability::AppDeadLetterManage(AppId)`. Related: trigger executions inherit the registrant's principal.
### §5 — Realtime
14. ~~Approach C confirmed~~ — ✅ Decided 2026-06-01: yes, with explicit registration required for externally-subscribable topics; new `Capability::AppTopicManage(AppId)`.
15. ~~SSE first, WebSocket deferred~~ — ✅ Decided 2026-06-01: SSE-only in v1.1.6; WS deferred.
16. ~~Auth model~~ — ✅ Decided 2026-06-01: public + HMAC-signed subscriber tokens in v1.1.6; `users::*` session auth in v1.1.8; script-mediated in v1.2; TTL 10s–24h (default 1h), env-overridable.
### §6 — Frontend client library
17. ~~Hybrid model~~ — ✅ Decided 2026-06-01: confirmed; no direct service access from the frontend; client lib standardizes script-mediated ceremony only.
18. ~~TypeScript first, multi-language deferred~~ — ✅ Decided 2026-06-01: TS-only in v1.1.6; REST + SSE is the public protocol contract.
19. ~~Co-ship realtime + client lib~~ — ✅ Decided 2026-06-01: co-ship in v1.1.6, parallel-built against a frozen spec; lib is the deferrable piece under scope pressure.
20. ~~Type safety / codegen~~ — ✅ Decided 2026-06-01: defer codegen to v1.2+; v1.1.6 ships hand-written types via `endpoint<Req, Res>()` + optional zod/valibot runtime validation.
---
## Lifecycle of this document
- **Created** at the v1.1.0 → v1.1.1 boundary (after the foundation PR series shipped).
- **Each section gets pruned** once its decisions ship and land in the blueprint.
- **Open calls are answered** in conversation, then folded into the corresponding section as "Decided: X" with the date.
- **Document deleted** when v1.1.9 ships — everything by then is either in the blueprint, in code, or explicitly deferred to v1.2+.

@@ -14,8 +14,8 @@ All of these carry the same version and are bumped together:
- Every crate in the Cargo workspace (via `version.workspace = true`) - Every crate in the Cargo workspace (via `version.workspace = true`)
- The dashboard's `package.json` - The dashboard's `package.json`
- Docker image tags (`picloud:0.2.0`) - Docker image tags (`picloud:1.1.0`)
- Git tags (`v0.2.0`) - Git tags (`v1.1.0`)
Defined once in [`Cargo.toml`](../Cargo.toml) under `[workspace.package]`. There is no scenario where one crate is at a different version than another in the same build. Defined once in [`Cargo.toml`](../Cargo.toml) under `[workspace.package]`. There is no scenario where one crate is at a different version than another in the same build.
@@ -106,19 +106,15 @@ A versioning scheme without enforcement decays in months. Five cheap mechanical
## When to bump what ## When to bump what
The product version follows SemVer applied pragmatically — we're pre-1.0, so the rules are looser: The product version uses SemVer with one carve-out for the platform's expansion cadence:
- **Patch** (`0.2.0 → 0.2.1`) — bug fixes, no surface change - **Major** (`1.x → 2.0`) — surface major bump on a user-facing contract: removed/renamed/retyped SDK function, retired API version, breaking schema change that requires user action, breaking wire-protocol change.
- **Minor** (`0.2 → 0.3`) — any surface bump, new features, or breaking changes (pre-1.0 license) - **Minor** (`1.1 → 1.2`) — phase milestone or coherent capability cluster. Bumped when the maintainer marks a release as "the platform moved forward in a way that warrants a number". Typically aligned with blueprint Phase boundaries (Phase 5 → v1.2, Phase 6 → v1.3+).
- **Major** (`0 → 1`) — first stable release; SDK and API both committed to long-term compatibility - **Patch** (`1.1.0 → 1.1.1`) — everything else: bug fixes AND **additive-only surface changes**. New SDK function, new admin endpoint, new schema migration that only adds tables/columns, new env var, new trigger kind — all patch.
After `1.0`, the product version follows strict SemVer based on the *worst* surface change: **Why the carve-out:** PiCloud ships in many small additive PRs (every v1.1.x release adds SDK surface). A strict "minor product bump per minor surface bump" rule would inflate the product version faster than the actual user-perceived "platform changed" milestones warrant. Patch-for-additions keeps the minor digit aligned with capability clusters, not individual feature drops.
- Any surface major bump → product major bump **Surface versions follow their own rules** (table above) and don't track the product version. A surface can independently hit its own `1.0` or `2.0`. The SDK in particular is likely to stabilize before the platform does, since scripts in production demand it.
- Any surface minor bump → product minor bump (at minimum)
- No surface changes → product patch
A surface can hit its own `1.0` independently of the product. The SDK in particular is likely to stabilize before the platform does, since scripts in production demand it.
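
The post-change bump rules in this hunk (the `1.x` column) can be read as a small decision function. The sketch below is an illustration only: `Change`, `Bump`, `product_bump`, and `apply` are hypothetical names, and the "maintainer marks a phase milestone" call is modeled as a plain boolean.

```rust
/// Kinds of change named in the revised rules.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Change {
    BugFix,
    /// New SDK function, admin endpoint, additive migration, env var, trigger kind.
    AdditiveSurface,
    /// Removed/renamed/retyped SDK function, retired API version, breaking schema
    /// or wire-protocol change.
    BreakingContract,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Bump {
    Patch,
    Minor,
    Major,
}

/// Pick the product bump: a breaking change forces major, a maintainer-marked
/// phase milestone gives minor, and everything else (including additive
/// surface changes) is a patch.
fn product_bump(changes: &[Change], phase_milestone: bool) -> Bump {
    if changes.contains(&Change::BreakingContract) {
        Bump::Major
    } else if phase_milestone {
        Bump::Minor
    } else {
        Bump::Patch
    }
}

/// Apply a bump to a (major, minor, patch) triple, resetting lower digits.
fn apply(version: (u64, u64, u64), bump: Bump) -> (u64, u64, u64) {
    match bump {
        Bump::Major => (version.0 + 1, 0, 0),
        Bump::Minor => (version.0, version.1 + 1, 0),
        Bump::Patch => (version.0, version.1, version.2 + 1),
    }
}
```

Under these assumptions an additive-only release moves `(1, 1, 0)` to `(1, 1, 1)`, and a phase milestone with no breaking change moves `(1, 1, 8)` to `(1, 2, 0)`. Surface versions (SDK, API, schema) are bumped separately and are not modeled here.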
--- ---
@@ -126,7 +122,7 @@ A surface can hit its own `1.0` independently of the product. The SDK in particu
| | Version | | | Version |
|---|---| |---|---|
| Product | `0.6.0` | | Product | `1.1.0` |
| SDK | `1.1` (adds `ctx.request.params`, `ctx.request.query`, `ctx.request.rest`) | | SDK | `1.1` (adds `ctx.request.params`, `ctx.request.query`, `ctx.request.rest`) |
| API | `1` (additive: `Script.app_id`, `Route.app_id`, `ExecutionLog.app_id`, new `/api/v1/admin/apps/*` and `/api/v1/admin/api-keys/*` endpoints, `?app=` filter on script list, `Authorization: Bearer pic_…` credential type, 403 responses on previously-401-only admin endpoints when the caller lacks the required capability) | | API | `1` (additive: `Script.app_id`, `Route.app_id`, `ExecutionLog.app_id`, new `/api/v1/admin/apps/*` and `/api/v1/admin/api-keys/*` endpoints, `?app=` filter on script list, `Authorization: Bearer pic_…` credential type, 403 responses on previously-401-only admin endpoints when the caller lacks the required capability) |
| Schema | `6` (matches `migrations/0006_users_authz.sql`) | | Schema | `6` (matches `migrations/0006_users_authz.sql`) |
@@ -138,15 +134,19 @@ Read live from `GET /version` on any running instance.
## Examples ## Examples
**Adding a `kv.*` SDK in v1.1+:** **Adding a `kv.*` SDK in v1.1.1:**
- Workspace bump: `0.2.0 → 0.3.0` (pre-1.0 minor) - Workspace bump: `1.1.0 → 1.1.1` (patch — additive SDK + schema, no breakage)
- SDK bump: `"1.0" → "1.1"` (added functions only) - SDK bump: `"1.1" → "1.2"` (added functions only)
- API bump: none (no new endpoints affect existing API contract) - API bump: none (admin endpoints for trigger CRUD are additive)
- Schema bump: `1 → 2` (`0002_kv_store.sql` adds the `kv_store` table) - Schema bump: `6 → 7` (`0007_kv_store.sql` adds the `kv_store` table)
**Cutting the v1.2 release (Phase 5: workflows, advanced query, interceptors):**
- Workspace bump: `1.1.8 → 1.2.0` (minor — phase milestone)
- Even if no individual change is breaking, the maintainer-marked phase transition warrants the minor digit.
**Renaming `ctx.execution_id` to `ctx.exec_id`:** **Renaming `ctx.execution_id` to `ctx.exec_id`:**
- SDK bump: `"1.x" → "2.0"` (breaking) - SDK bump: `"1.x" → "2.0"` (breaking — removed/retyped script-visible field)
- Product: minor bump pre-1.0, major bump post-1.0 - Workspace bump: `1.x.y → 2.0.0` (product major — user-facing contract break)
- Migration path: keep `ctx.execution_id` available in 1.x for a deprecation window, add `ctx.exec_id` alongside; flip to 2.0 only when both fields have shipped together for a release. - Migration path: keep `ctx.execution_id` available in 1.x for a deprecation window, add `ctx.exec_id` alongside; flip to 2.0 only when both fields have shipped together for a release.
**Adding pagination to `GET /api/v1/admin/scripts`:** **Adding pagination to `GET /api/v1/admin/scripts`:**