diff --git a/docs/sdk-shape.md b/docs/sdk-shape.md
new file mode 100644
index 0000000..15e0a7f
--- /dev/null
+++ b/docs/sdk-shape.md
@@ -0,0 +1,227 @@
+# SDK shape (v1.1.x stateful services)
+
+This document describes the architectural shape every v1.1.x SDK
+service follows. It is **not** a feature reference for any particular
+service — those live in their own docs as each PR lands (KV in v1.1.1,
+docs in v1.1.2, …). What follows is the contract those PRs implement
+against, so the surface stays consistent and the build doesn't drift.
+
+The shape was laid down in v1.1.0 (the SDK foundation PR). If you find
+yourself re-litigating any of it inside a service PR, push back and
+update this doc explicitly first.
+
+## Two kinds of Rhai modules
+
+**Stateless utility modules** (regex, time, json, base64, hex, url —
+landing as v1.1.0's stdlib PR) are registered once at engine build.
+They have no per-call state and no cross-app sensitivity. Implementation
+goes in `executor-core::engine::build_engine` next to the existing
+`log::` registration. They use Rhai's `register_static_module`.
+
+**Stateful service modules** (kv, docs, http, cron, files, pubsub,
+secrets, email, users, queue, invoke) are registered **per call** by
+`executor-core::sdk::register_all`. They need:
+
+- A service handle bundled in `picloud_shared::Services` (constructed
+  once at startup, cloned cheaply per call).
+- A per-call `SdkCallCx` carrying the calling app, principal,
+  execution ids, and trigger depth.
+- Closures that capture both, registered as Rhai native functions
+  inside a per-call `rhai::Module`.
+
+Mixing the two categories in one module is wrong — services that
+internally consult per-call context are stateful, period.
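The per-call registration described above can be illustrated with a minimal, std-only Rust sketch. It does not use Rhai: `ScriptBindings`, the in-memory `KvService`, and the numeric `AppId` are hypothetical stand-ins invented for this example, and only the names `Services`, `SdkCallCx`, and `register_all` come from the text. The point it tries to show is that closures built per call capture both the shared service handle and the call context, so nothing reachable from a script accepts an app id.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins: the names mirror the doc, the bodies are assumptions.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct AppId(u32);

struct SdkCallCx {
    app_id: AppId,
}

#[derive(Default)]
struct KvService {
    rows: Mutex<HashMap<(AppId, String, String), String>>,
}

impl KvService {
    // The tenant part of the key comes from the cx, never from an argument.
    fn set(&self, cx: &SdkCallCx, collection: &str, key: &str, value: &str) -> Result<(), String> {
        if collection.is_empty() {
            return Err("collection name must not be empty".to_string());
        }
        let id = (cx.app_id.clone(), collection.to_string(), key.to_string());
        self.rows.lock().unwrap().insert(id, value.to_string());
        Ok(())
    }

    fn get(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Option<String> {
        let id = (cx.app_id.clone(), collection.to_string(), key.to_string());
        self.rows.lock().unwrap().get(&id).cloned()
    }
}

// Shared bundle: built once at startup, cloned cheaply per call.
#[derive(Clone, Default)]
struct Services {
    kv: Arc<KvService>,
}

// What a script could call: closures that already hold the service and the cx.
struct ScriptBindings {
    kv_set: Box<dyn Fn(&str, &str, &str) -> Result<(), String>>,
    kv_get: Box<dyn Fn(&str, &str) -> Option<String>>,
}

// Per-call registration: no binding takes an app id as a parameter.
fn register_all(services: &Services, cx: Arc<SdkCallCx>) -> ScriptBindings {
    let (kv_w, cx_w) = (services.kv.clone(), cx.clone());
    let (kv_r, cx_r) = (services.kv.clone(), cx);
    ScriptBindings {
        kv_set: Box::new(move |c: &str, k: &str, v: &str| kv_w.set(&cx_w, c, k, v)),
        kv_get: Box::new(move |c: &str, k: &str| kv_r.get(&cx_r, c, k)),
    }
}

fn main() {
    let services = Services::default();
    let app_a = register_all(&services, Arc::new(SdkCallCx { app_id: AppId(1) }));
    let app_b = register_all(&services, Arc::new(SdkCallCx { app_id: AppId(2) }));

    (app_a.kv_set)("widgets", "k", "v").unwrap();
    assert_eq!((app_a.kv_get)("widgets", "k"), Some("v".to_string()));
    // Same collection and key, different app: nothing is visible.
    assert_eq!((app_b.kv_get)("widgets", "k"), None);
    // Empty collection names are rejected by the service layer.
    assert!((app_a.kv_set)("", "k", "v").is_err());
}
```

A stateless utility module would need none of this: it has no `cx` to capture, which is why the doc registers it once at engine build instead.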
+
+## `::` namespace style
+
+Every SDK module exposes itself under a `::` namespace, mirroring the
+existing `log::`:
+
+```rhai
+log::info("hello");                              // v1.0 — present
+let value = kv::collection("widgets").get("k");  // v1.1.1
+let resp = http::get("https://example.com");     // v1.1.4
+```
+
+Dotted-object syntax (`kv.get("widgets", "k")`) is **not** used.
+Rationale: `::` is consistent with Rust import syntax, doesn't
+require a wrapper "module object" in Rhai's scope, and keeps the
+module boundary obvious in scripts.
+
+## Handle pattern for collection-scoped services
+
+Services that operate on collections expose a **collection handle**
+returned by an `::collection(name)` constructor:
+
+```rhai
+let widgets = kv::collection("widgets");
+widgets.set("k", "v");
+let v = widgets.get("k");
+```
+
+Not `kv::set("widgets", "k", "v")`. The handle is a Rhai custom type
+the service registers; method calls bind to that type. This:
+
+- Removes the "did I get the collection-name argument right?" foot-gun.
+- Lets the implementation cache per-collection state on the handle
+  (prepared statements, connection affinity) without leaking that
+  into the call signature.
+- Pre-empts the "collection is implicit" failure mode where two
+  services in the same script accidentally share a default collection.
+
+`(app_id, collection, key)` is the identity tuple for KV; `(app_id,
+collection, id)` for docs. Collections are **mandatory**, not optional
+— even single-collection apps name their collection. The service layer
+rejects requests with empty collection names.
+
+## Error convention
+
+- **Throw on failure.** `widgets.set("k", "v")` throws a Rhai runtime
+  error on any operational problem (DB unavailable, payload too large,
+  authz denied). Scripts opting into error handling use Rhai's
+  `try/catch`.
+- **`()` for absent.** `widgets.get("missing")` returns `()` (Rhai
+  unit). Scripts test absence with `if v == () { ... }` or use the
+  matching `has(k)` predicate.
+- **`bool` for predicates.** `widgets.has(k)` is the cheap existence
+  check that doesn't deserialize the value.
+
+This convention is uniform across every v1.1.x service. Adding
+`Result`-flavoured variants is a design departure that requires a doc
+update before implementation.
+
+## `SdkCallCx` and cross-app isolation
+
+Every stateful service trait method takes `&SdkCallCx` as its first
+non-self argument. The cx carries:
+
+```rust
+pub struct SdkCallCx {
+    pub app_id: AppId,
+    pub principal: Option<Principal>,
+    pub execution_id: ExecutionId,
+    pub request_id: RequestId,
+    pub trigger_depth: u32,
+    pub root_execution_id: ExecutionId,
+}
+```
+
+**The service implementation MUST derive `app_id` from `cx.app_id` —
+never from a script-passed argument.** Scripts cannot name another
+app's data, period. The closure registered into Rhai captures the
+`Arc<SdkCallCx>` for the call; the script never sees or passes
+`app_id`.
+
+Why this matters: a `kv::set("widgets", "k", v)` call with a
+script-supplied `app_id` would be a tenant-isolation vulnerability if
+that arg ever leaked into the storage query. By deriving from the
+host-attached cx, the service can't be tricked.
+
+`principal` is `Option<Principal>` because the data plane is
+unauthenticated by default — public HTTP scripts run with `None`.
+Services that need an authenticated identity (e.g., `users::*`) check
+`cx.principal.is_some()` and throw if missing.
+
+## Sync ↔ async bridge
+
+Rhai is synchronous; service trait methods (KV writes, HTTP calls) are
+async. The bridge runs *inside the `spawn_blocking` thread* that
+already wraps `Engine::execute` (orchestrator-core's
+`LocalExecutorClient`):
+
+```rust
+// Inside a Rhai-registered closure.
+let runtime = tokio::runtime::Handle::current();
+let result = runtime.block_on(service.do_thing(&cx, args));
+```
+
+`Handle::current()` finds the same Tokio runtime that scheduled the
+`spawn_blocking`, so the `block_on` doesn't construct a fresh runtime.
+The thread is already off the async worker pool (that's what
+`spawn_blocking` does), so blocking inside it is safe.
+
+This pattern goes in every stateful service's registered Rhai closure.
+The first service PR (KV, v1.1.1) lands a helper so subsequent services
+don't reinvent the boilerplate.
+
+## `ServiceEventEmitter`
+
+Every stateful service that mutates data also emits events for the
+(future) triggers framework:
+
+```rust
+emitter.emit(&cx, ServiceEvent {
+    source: "kv",
+    op: "insert",
+    collection: Some("widgets".into()),
+    key: Some("k".into()),
+    payload: Some(new_value_json),
+    old_payload: None,
+}).await?;
+```
+
+v1.1.0 ships only `NoopEventEmitter`. The v1.1.1 triggers PR replaces
+that with an outbox-backed implementation: events land in a Postgres
+outbox table; a dispatcher worker reads them out-of-band, matches
+against registered triggers, and fans out script executions. The
+dispatcher enforces a depth limit via `cx.trigger_depth` so a
+trigger-fires-its-own-trigger chain can't run away.
+
+Services hold `Arc<dyn ServiceEventEmitter>` and emit unconditionally;
+the noop drops events, the real impl persists them. From the service's
+perspective the emission is fire-and-forget.
+
+## `ExecutionGate` and `PICLOUD_MAX_CONCURRENT_EXECUTIONS`
+
+A single global semaphore caps concurrent script executions. Default
+is 32; override via the `PICLOUD_MAX_CONCURRENT_EXECUTIONS` env var.
+Acquisition is **non-blocking, no queue** — if a permit isn't free,
+the request is refused immediately with HTTP 503 and a `Retry-After:
+1` header.
+
+Rationale: Rhai execution runs under `spawn_blocking`, which uses a
+finite pool of blocking threads (defaults to 512 in current Tokio).
+Without a cap, a script storm parks every blocking thread and starves
+every other workload (DB writes, log sinks, audit emission). Hard
+pushback is preferable to silent degradation.
+
+Per-app or per-script caps are deferred until a real workload demands
+them. The gate lives in `orchestrator-core::gate::ExecutionGate` and
+is constructed once in the picloud binary's `build_app`.
+
+## Registration: where future services hook in
+
+```rust
+// orchestrator-core / executor-core internal call path —
+// you do not implement this; you implement registration helpers
+// that future PRs call from here.
+pub fn register_all(engine: &mut RhaiEngine, services: &Services, cx: Arc<SdkCallCx>) {
+    // v1.1.1: register_kv(engine, services, cx.clone());
+    // v1.1.2: register_docs(engine, services, cx.clone());
+    // …
+}
+```
+
+Each service PR adds:
+
+1. A `Service` trait + impl in `manager-core` (since that's where the
+   DB-backed implementations live).
+2. A field on `picloud_shared::Services` (`pub kv: Arc`).
+3. A `register_kv` helper inside `executor-core::sdk::kv` that takes
+   the engine, the service, and the cx, then registers the Rhai
+   `::collection(...)` constructor and method bindings.
+4. A new `Capability` variant in `manager-core::authz` (e.g.
+   `AppKvRead(AppId)`) and a check inside the service impl.
+
+That sequence is the entire mechanical pattern; nothing here should
+require architecture-level discussion past v1.1.0.
+
+## What this doc does NOT cover
+
+- Service-specific schemas (KV table layout, docs query DSL, etc.) —
+  in each service PR.
+- Authentication and the admin auth model — see blueprint §11.5,
+  §11.6 and Phase 3.5.
+- The trigger dispatch design (outbox row layout, fan-out semantics,
+  trigger CRUD endpoints) — comes with v1.1.1.
+- Cluster mode considerations — deferred to v1.3+.
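As a closing illustration of the `ExecutionGate` behaviour described earlier in this file (non-blocking acquisition, no queue, immediate refusal), here is a minimal std-only Rust sketch. It is not the `orchestrator-core` implementation: the atomic counter, the `Permit` type, the `admit` helper, and the fallback to 32 on an unparsable env value are all assumptions made for the example. Only the env var name, the default of 32, and the 503 plus `Retry-After: 1` response come from the text.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical gate: a counter with a fixed ceiling, never a wait queue.
struct ExecutionGate {
    in_flight: AtomicUsize,
    limit: usize,
}

// RAII permit: dropping it frees the slot.
struct Permit {
    gate: Arc<ExecutionGate>,
}

impl ExecutionGate {
    fn new(limit: usize) -> Arc<Self> {
        Arc::new(Self { in_flight: AtomicUsize::new(0), limit })
    }

    // Non-blocking: either claims a slot right now or returns None.
    fn try_acquire(self: &Arc<Self>) -> Option<Permit> {
        let limit = self.limit;
        self.in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n < limit { Some(n + 1) } else { None }
            })
            .ok()
            .map(|_| Permit { gate: Arc::clone(self) })
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.gate.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

// Reads the cap from the environment; falling back to 32 on a missing
// or unparsable value is an assumption of this sketch.
fn limit_from_env() -> usize {
    std::env::var("PICLOUD_MAX_CONCURRENT_EXECUTIONS")
        .ok()
        .and_then(|raw| raw.parse::<usize>().ok())
        .unwrap_or(32)
}

// Maps a refused acquisition to the status and header the doc describes.
fn admit(gate: &Arc<ExecutionGate>) -> Result<Permit, (u16, &'static str)> {
    gate.try_acquire().ok_or((503, "Retry-After: 1"))
}

fn main() {
    let _configured = ExecutionGate::new(limit_from_env());

    // A tiny fixed limit keeps the demonstration deterministic.
    let gate = ExecutionGate::new(2);
    let first = admit(&gate).expect("first permit");
    let _second = admit(&gate).expect("second permit");
    // Third caller is refused immediately instead of waiting.
    assert_eq!(admit(&gate).err(), Some((503, "Retry-After: 1")));
    // Releasing a permit makes room again.
    drop(first);
    assert!(admit(&gate).is_ok());
}
```

Tying the slot to a value that is dropped when the execution finishes means an early return or panic in the handler still releases the permit.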
diff --git a/serverless_cloud_blueprint.md b/serverless_cloud_blueprint.md
index 4dee843..f3c2341 100644
--- a/serverless_cloud_blueprint.md
+++ b/serverless_cloud_blueprint.md
@@ -1022,9 +1022,9 @@ The scripts and routes endpoints keep their existing shape — this avoids forci

 ---

-## 11.6 Users, roles, and bearer-token auth (Phase 3.5) — Pending
+## 11.6 Users, roles, and bearer-token auth (Phase 3.5) — ✓ Shipped

-**Status**: pending. Targets `crates/manager-core/src/{authz,api_keys_api,api_key_repo}.rs`, an extended `auth_middleware.rs`, new shared types under `crates/shared/src/auth.rs`, migration `0006_users_authz.sql`.
+**Status**: shipped, ahead of the originally planned slot. Lives in `crates/manager-core/src/{authz,api_keys_api,api_key_repo}.rs`, the extended `auth_middleware.rs`, shared types under `crates/shared/src/auth.rs`, and migration `0006_users_authz.sql`. `can(principal, capability)` and `require(principal, capability)` are the single gate every admin handler goes through.

 **Purpose**: bridge Phase 3b → Phase 4. Phase 4's v1.1 SDKs (KV, docs, HTTP, cron) each gate access on the calling principal. Without a real authorization model in place, every SDK addition has to either invent its own gate or stay open. Phase 3.5 lands `can(principal, capability)` as the single check every future SDK + admin endpoint goes through, so v1.1 work focuses on data plane shape, not on re-litigating auth.

@@ -1223,7 +1223,7 @@ Defer to follow-up sessions: dashboard surfaces for invites / key minting (curl

 ---

-### Phase 3: v1.0.x — Foundations (Current focus)
+### Phase 3: v1.0.x — Foundations ✓ (Shipped)

 Three foundation pieces that must land before the v1.1 service expansion, because retrofitting them later is expensive.

@@ -1231,24 +1231,27 @@ Three foundation pieces that must land before the v1.1 service expansion, becaus

 **3b. Multi-app scoping** — ✓ shipped. See section 11.5. `apps`, `app_domains`, `app_slug_history` tables; `app_id` columns on `scripts`, `routes`, `execution_logs`. Migration assigns existing data to a `default` app and always claims `localhost`; a Rust-side bootstrap inserts a `Hello World` script + `/hello` route when the default app is empty. Orchestrator dispatch is two-phase (Host → app → route trie). `/api/v1/execute/{id}/*` continues to work without a public domain claim. Dashboard is app-hierarchical (`/admin/apps`, `/admin/apps/{slug}/...`); API stays flat with new endpoints under `/api/v1/admin/apps/*` and a `?app=` filter on script listing. Per-app admin roles deferred.

-**3c. Users, roles, and bearer-token auth** — pending. See section 11.6. Adds `instance_role` to `admin_users` (`owner`/`admin`/`member`), `app_members` for per-app `app_admin`/`editor`/`viewer` grants, and `api_keys` for `Authorization: Bearer pic_…` credentials. Unifies cookie-session and API-key paths behind a single `can(principal, capability)` gate; list endpoints filter by membership at SQL for `member` users. Dashboard surfaces, invites, MFA, service accounts, and the `picloud` CLI binary are deferred — schema room only.
+**3c. Users, roles, and bearer-token auth (Phase 3.5)** — ✓ shipped. See section 11.6. Adds `instance_role` to `admin_users` (`owner`/`admin`/`member`), `app_members` for per-app `app_admin`/`editor`/`viewer` grants, and `api_keys` for `Authorization: Bearer pic_…` credentials. Unifies cookie-session and API-key paths behind a single `can(principal, capability)` gate; list endpoints filter by membership at SQL for `member` users. Dashboard surfaces, invites, MFA, service accounts, and the `picloud` CLI binary are deferred — schema room only.

 **Why all three before v1.1**: every v1.1 service (KV, docs, users, etc.) needs both an `app_id` scoping key in its schema and a `Principal` to authorize against. Adding both now is one migration each on a small surface; adding them after the SDKs ship is many migrations on populated data plus a re-gate of every SDK call.

 ---

-### Phase 4: v1.1 (Expand Capabilities & Services)
-Ordered roughly by foundation value: each row enables the rows below it.
+### Phase 4: v1.1 (Expand Capabilities & Services) — Current focus

-1. **Rhai SDK: KV Store** (`kv.get/set/delete/has` with collections, scoped per app)
-2. **Rhai SDK: Document Store** (`docs.create/find/update/delete/list/query`, scoped per app)
-3. **Rhai SDK: HTTP** (`http.get/post/put/delete` with SSRF deny-list)
-4. **Cron triggers** (manager scheduler skeleton already exists; needs schedules table + `FOR UPDATE SKIP LOCKED` dispatch)
-5. **Rhai SDK: Email** (`email.send` via SMTP; needs per-deploy config)
-6. **Rhai SDK: User Management** (auth, CRUD, roles, permissions, invitations, password reset; depends on email for invites; scoped per app)
-7. **Queue triggers** (start with Postgres LISTEN/NOTIFY; RabbitMQ/Redis later if needed)
-8. **`invoke()` + `retry::*`** (function-to-function calls; execution_logs gain `parent_execution_id`)
-9. **Secrets management** (encrypted env vars, per app)
+Released in patch steps (v1.1.0 → v1.1.8), each landing one focused capability. The split lets each release ship behind tests + docs without long-lived branches. SDK shape (handle pattern, `::` namespace, error convention, `ExecutionGate`, `SdkCallCx`, `ServiceEventEmitter` — see §7.5 and [docs/sdk-shape.md](docs/sdk-shape.md)) is fixed in v1.1.0; every subsequent release fills in the contents without re-litigating the shape.
+
+| Version | Capability |
+|---------|------------|
+| **v1.1.0** | **Foundation & Standard Library** — SDK shape (`Services` bundle, `SdkCallCx`, `ExecutionGate`, `ServiceEventEmitter` trait shape); stdlib utilities (regex, random, time, json, base64, hex, url). |
+| **v1.1.1** | **Storage & Events** — KV store keyed `(app_id, collection, key)`; triggers framework (outbox + dispatcher + trigger CRUD + `ctx.event` + depth limit); KV trigger kinds. |
+| **v1.1.2** | **Documents** — `docs::collection(name).create/find/update/delete/list` with `docs:*` triggers. |
+| **v1.1.3** | **Modules** — `scripts.kind`, per-app resolver replaces `DummyModuleResolver`, AST cache + dep-graph invalidation. |
+| **v1.1.4** | **Outbound HTTP & Scheduled Tasks** — `http::*` with SSRF deny-list; cron triggers. |
+| **v1.1.5** | **Files & Messaging** — filesystem-backed blobs with `files:*` triggers; pub/sub via LISTEN/NOTIFY with `pubsub:*` triggers. |
+| **v1.1.6** | **Configuration & Email** — encrypted per-app secrets; outbound `email::send` / `send_html` + inbound `email:receive` trigger. |
+| **v1.1.7** | **User Management** — `users::*` for in-script CRUD, auth, roles, invites, password reset. |
+| **v1.1.8** | **Durable Queues & Function Composition** — `queue::*` with `queue:receive` trigger; `invoke()` + `retry::*` (closures-as-args, re-entrant Rhai). |

 ---

@@ -1309,59 +1312,71 @@ Ordered roughly by foundation value: each row enables the rows below it.
 | **ctx** (global) | `ctx.execution_id`, `ctx.script_id`, `ctx.script_name`, `ctx.request_id`, `ctx.trace_id`, `ctx.invocation_type`, `ctx.parent_execution_id`, `ctx.request.path`, `ctx.request.headers`, `ctx.request.body` | MVP+ |
 | **Response** | Return `{ statusCode, headers?, body }` | MVP |

+## 7.5 SDK Architecture (v1.1.x foundation)
+
+Stateful Rhai SDK services (KV, docs, HTTP, …) hang off a common shape laid down by the v1.1.0 SDK foundation PR. Full reference lives in [docs/sdk-shape.md](docs/sdk-shape.md); this section sketches the moving parts so other sections can refer to them by name.
+
+**`Services` bundle** (`picloud_shared::Services`) — a `#[non_exhaustive]` struct constructed once at startup. v1.1.0 ships it empty; each subsequent v1.1.x PR adds one `Arc` / `Arc` / … field. Held on `Engine`, passed by reference to the per-call registration hook.
+
+**Per-call context** (`picloud_shared::SdkCallCx`) — every stateful service trait method takes `&SdkCallCx` as its first non-self argument. Carries `app_id`, `Option<Principal>`, `execution_id`, `request_id`, and the `trigger_depth` / `root_execution_id` slots that the triggers framework populates. Services derive `app_id` from the cx — never from script-passed args. **That rule is the cross-app isolation boundary**; scripts cannot name another app's data.
+
+**Handle pattern** — collection-scoped services expose `kv::collection("widgets").get("k")`, not `kv::get("widgets", "k")`. Removes the wrong-collection-name foot-gun and lets implementations cache per-collection state. `(app_id, collection, key)` is the identity tuple for KV; `(app_id, collection, id)` for docs. Collections are mandatory.
+
+**Error convention** — throw on failure, `()` for absent, `bool` for predicates. Uniform across every v1.1.x service. Scripts opt into handling errors via Rhai's `try/catch`.
+
+**`ExecutionGate`** (`orchestrator-core::gate::ExecutionGate`) — single global semaphore capping concurrent script executions. Default 32, override via the `PICLOUD_MAX_CONCURRENT_EXECUTIONS` env var. Non-blocking — on overflow, the orchestrator returns HTTP 503 with `Retry-After: 1` immediately. No queue. Rationale: Rhai runs under `spawn_blocking`, so unbounded concurrency would park every blocking thread and starve every other workload.
+
+**`ServiceEventEmitter`** (`picloud_shared::ServiceEventEmitter`) — every mutating service method emits a `ServiceEvent { source, op, collection, key, payload, old_payload }`. v1.1.0 ships `NoopEventEmitter`; the real outbox-backed dispatcher lands with v1.1.1 (see 7.5.1).
+
+### 7.5.1 Trigger architecture (sketch)
+
+Triggers fire scripts in response to service events. Three locked properties; full design and CRUD endpoints land with v1.1.1.
+
+1. **Async outbox**: services emit events synchronously into a Postgres outbox table; a separate dispatcher worker reads, matches them against registered triggers, and fans out script executions. Service writes don't block on trigger fan-out.
+2. **Depth-limited**: each trigger-spawned execution increments `cx.trigger_depth`. The dispatcher refuses to fan out beyond a configured ceiling to prevent runaway feedback loops. `cx.root_execution_id` preserves the originating execution id for audit grouping.
+3. **Trigger model**: a trigger is `(service, event, filter) → script`, stored in a `triggers` table. The filter is the dispatcher's match predicate on the emitted `ServiceEvent`.
+
 ### 8.1 KV Store Service

-**Purpose**: Simple key-value persistence organized by collections, shared across script invocations and scripts.
+**Purpose**: Simple key-value persistence organized by collections, scoped per app and shared across script invocations and scripts within that app.

-**PostgreSQL Setup:**
+**PostgreSQL Schema:**
 ```sql
--- Enable hstore extension (one-time setup)
-CREATE EXTENSION IF NOT EXISTS hstore;
-
--- Create KV table with collection support
 CREATE TABLE kv_store (
+    app_id     UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
     collection TEXT NOT NULL,
-    key TEXT NOT NULL,
-    value hstore NOT NULL,
+    key        TEXT NOT NULL,
+    value      JSONB NOT NULL,
     expires_at TIMESTAMP,
     created_at TIMESTAMP DEFAULT NOW(),
     updated_at TIMESTAMP DEFAULT NOW(),
-
-    PRIMARY KEY (collection, key)
+
+    PRIMARY KEY (app_id, collection, key)
 );

-CREATE INDEX idx_kv_collection ON kv_store(collection);
-CREATE INDEX idx_kv_expires ON kv_store(expires_at)
+CREATE INDEX idx_kv_app_collection ON kv_store(app_id, collection);
+CREATE INDEX idx_kv_expires ON kv_store(expires_at)
     WHERE expires_at IS NOT NULL;
 ```

-**Why hstore + collections?**
-- Lightweight, purpose-built for key-value storage
-- Collections allow logical grouping (e.g., `kv:sessions`, `kv:counters`, `kv:flags`)
-- Faster than JSONB for simple KV use cases
-- Built-in indexing support
-- Keeps all data in one database (no Redis dependency)
+**Why JSONB + mandatory collections + `app_id` first:**
+- `(app_id, collection, key)` is the identity tuple. The PK begins with `app_id` so the index is naturally per-app; cross-app reads can't happen even if the service layer has a bug.
+- Collections are **mandatory** — every set / get / delete names one. The same key can legitimately live in multiple collections within one app (`sessions:abc` and `counters:abc` are distinct rows).
+- JSONB carries arbitrary script-side values (nested objects, arrays) without a separate serialization step. `hstore` was considered and ruled out — it doesn't carry nested types and would force a second JSONB column the moment a script writes a structured value.

-**Rhai SDK:**
+**Value-size cap:** 64 KiB per value, enforced at the service layer (script-visible error on overflow). The cap keeps KV "small fast values, not blob storage"; the v1.1.5 files SDK is the right home for large payloads.
+
+**Rhai SDK (handle pattern — see [docs/sdk-shape.md](docs/sdk-shape.md)):**
 ```rhai
-// Get a value from a collection
-let val = kv.get("sessions", "user:123"); // Returns object or null
+let sessions = kv::collection("sessions");
+sessions.set("user:123", #{ token: "abc", created: "2026-04-10" });
+let val = sessions.get("user:123"); // value or () if absent
+sessions.delete("user:123");
+sessions.set("user:123", #{ token: "xyz" }, 3600); // TTL in seconds
+if sessions.has("user:123") { ... }

-// Set a value in a collection
-kv.set("sessions", "user:123", { token: "abc", created: "2026-04-10" });
-
-// Delete a key from a collection
-kv.delete("sessions", "user:123");
-
-// Set with TTL (seconds)
-kv.set("sessions", "user:123", { token: "xyz" }, 3600);
-
-// Check if key exists in a collection
-if kv.has("sessions", "user:123") { ... }
-
-// Use different collections for different purposes
-kv.set("counters", "api:calls", 42);
-kv.set("flags", "feature:beta", true);
-kv.set("cache", "page:home", { html: "..." });
+// Distinct collections in one script — different handles.
+let counters = kv::collection("counters");
+counters.set("api:calls", 42);
 ```

 **Use Cases:**