docs(sdk): SDK-shape reference + blueprint updates for v1.1.x

Lands the developer-facing reference for the SDK shape every v1.1.x
service implements against, plus the blueprint changes the shape and
the recently-shipped Phase 3.5 imply:

  - New docs/sdk-shape.md — covers handle pattern, :: namespace,
    throw/() error convention, sync↔async bridge, cross-app isolation
    rule, ServiceEventEmitter, ExecutionGate + env var, stateless vs
    stateful module registration.
  - Blueprint §11.6 (Phase 3.5): Pending → ✓ Shipped, with a note that
    it landed ahead of the originally planned slot.
  - Blueprint §8.1 (KV Store): replace hstore schema + rationale with
    JSONB. PK becomes (app_id, collection, key); cross-app isolation
    is enforced at the index, not just the service layer. Note 64 KiB
    per-value cap enforced at the service layer (lands with the KV PR
    in v1.1.1).
  - Blueprint new §7.5 (SDK Architecture): brief overview pointing to
    docs/sdk-shape.md. Includes §7.5.1 sketch of the trigger
    architecture (outbox + depth limit + (service, event, filter) →
    script).
  - Blueprint §12 Phase 4: restructured to enumerate v1.1.0 through
    v1.1.8 with one focused capability per release. Current focus
    moves to Phase 4 (v1.1.0) now that Phase 3.5 is done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-30 18:57:44 +02:00
parent 902dd78027
commit 5302bd3192
2 changed files with 294 additions and 52 deletions

docs/sdk-shape.md Normal file

@@ -0,0 +1,227 @@
# SDK shape (v1.1.x stateful services)
This document describes the architectural shape every v1.1.x SDK
service follows. It is **not** a feature reference for any particular
service — those live in their own docs as each PR lands (KV in v1.1.1,
docs in v1.1.2, …). What follows is the contract those PRs implement
against, so the surface stays consistent and the build doesn't drift.
The shape was laid down in v1.1.0 (the SDK foundation PR). If you find
yourself re-litigating any of it inside a service PR, push back and
update this doc explicitly first.
## Two kinds of Rhai modules
**Stateless utility modules** (regex, time, json, base64, hex, url —
landing as v1.1.0's stdlib PR) are registered once at engine build.
They have no per-call state and no cross-app sensitivity. Implementation
goes in `executor-core::engine::build_engine` next to the existing
`log::` registration. They use Rhai's `register_static_module`.
**Stateful service modules** (kv, docs, http, cron, files, pubsub,
secrets, email, users, queue, invoke) are registered **per call** by
`executor-core::sdk::register_all`. They need:
- A service handle bundled in `picloud_shared::Services` (constructed
once at startup, cloned cheaply per call).
- A per-call `SdkCallCx` carrying the calling app, principal,
execution ids, and trigger depth.
- Closures that capture both, registered as Rhai native functions
inside a per-call `rhai::Module`.
Mixing the two categories in one module is wrong — services that
internally consult per-call context are stateful, period.
## `::` namespace style
Every SDK module exposes itself under a `::` namespace, mirroring the
existing `log::`:
```rhai
log::info("hello"); // v1.0 — present
let value = kv::collection("widgets").get("k"); // v1.1.1
let resp = http::get("https://example.com"); // v1.1.4
```
Dotted-object syntax (`kv.get("widgets", "k")`) is **not** used.
Rationale: `::` is consistent with Rust import syntax, doesn't
require a wrapper "module object" in Rhai's scope, and keeps the
module boundary obvious in scripts.
## Handle pattern for collection-scoped services
Services that operate on collections expose a **collection handle**
returned by a `::collection(name)` constructor:
```rhai
let widgets = kv::collection("widgets");
widgets.set("k", "v");
let v = widgets.get("k");
```
Not `kv::set("widgets", "k", "v")`. The handle is a Rhai custom type
the service registers; method calls bind to that type. This:
- Removes the "did I get the collection-name argument right?" foot-gun.
- Lets the implementation cache per-collection state on the handle
(prepared statements, connection affinity) without leaking that
into the call signature.
- Pre-empts the "collection is implicit" failure mode where two
services in the same script accidentally share a default collection.
`(app_id, collection, key)` is the identity tuple for KV; `(app_id,
collection, id)` for docs. Collections are **mandatory**, not optional
— even single-collection apps name their collection. The service layer
rejects requests with empty collection names.
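The handle pattern and identity tuple above can be sketched in plain Rust without the Rhai binding layer. This is a minimal std-only illustration, not the service's implementation: `MemoryKv`, `KvCollection`, and the string-backed `AppId` are hypothetical stand-ins, and the in-memory map replaces the real storage backend.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real `AppId` type.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct AppId(pub String);

/// In-memory stand-in for the KV backend, keyed by the identity tuple
/// `(app_id, collection, key)`.
#[derive(Default)]
pub struct MemoryKv {
    rows: HashMap<(AppId, String, String), String>,
}

/// Collection handle: binds the app and collection once, so method calls
/// only carry the key.
pub struct KvCollection<'a> {
    store: &'a mut MemoryKv,
    app_id: AppId,
    collection: String,
}

impl MemoryKv {
    /// Mirrors `kv::collection(name)`; empty names are rejected up front.
    pub fn collection(&mut self, app_id: AppId, name: &str) -> Result<KvCollection<'_>, String> {
        if name.is_empty() {
            return Err("collection name must not be empty".to_string());
        }
        Ok(KvCollection { store: self, app_id, collection: name.to_string() })
    }
}

impl KvCollection<'_> {
    /// Builds the full identity tuple for one key in this collection.
    fn row_key(&self, key: &str) -> (AppId, String, String) {
        (self.app_id.clone(), self.collection.clone(), key.to_string())
    }

    pub fn set(&mut self, key: &str, value: &str) {
        let row = self.row_key(key);
        self.store.rows.insert(row, value.to_string());
    }

    pub fn get(&self, key: &str) -> Option<String> {
        self.store.rows.get(&self.row_key(key)).cloned()
    }
}
```

Because the collection name is fixed when the handle is created, the same key in two collections resolves to two distinct rows, and there is no call shape in which the collection argument can be forgotten or swapped.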
## Error convention
- **Throw on failure.** `widgets.set("k", "v")` throws a Rhai runtime
error on any operational problem (DB unavailable, payload too large,
authz denied). Scripts opting into error handling use Rhai's
`try/catch`.
- **`()` for absent.** `widgets.get("missing")` returns `()` (Rhai
unit). Scripts test absence with `if v == () { ... }` or use the
matching `has(k)` predicate.
- **`bool` for predicates.** `widgets.has(k)` is the cheap existence
check that doesn't deserialize the value.
This convention is uniform across every v1.1.x service. Adding
`Result`-flavoured variants is a design departure that requires a doc
update before implementation.
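One way to picture the convention is as a mapping from a backend result to what the script observes. The sketch below is an assumption-laden illustration in std-only Rust: `ScriptValue`, `ServiceError`, `get_outcome`, and `has_outcome` are hypothetical names, and the `Err` arm stands in for the Rhai runtime error the script would see as a throw.

```rust
/// Hypothetical script-visible value; `Unit` models Rhai's `()`.
#[derive(Debug, PartialEq)]
pub enum ScriptValue {
    Unit,
    Bool(bool),
    Text(String),
}

/// Hypothetical operational failures reported by a backend.
#[derive(Debug, PartialEq)]
pub enum ServiceError {
    Unavailable,
    PayloadTooLarge,
    Denied,
}

/// `get`: absent becomes `()`, present becomes the value, and failures
/// propagate (the script would see them as a thrown runtime error).
pub fn get_outcome(found: Result<Option<String>, ServiceError>) -> Result<ScriptValue, ServiceError> {
    Ok(match found? {
        Some(text) => ScriptValue::Text(text),
        None => ScriptValue::Unit,
    })
}

/// `has`: always a `bool` on success, never `()`.
pub fn has_outcome(found: Result<bool, ServiceError>) -> Result<ScriptValue, ServiceError> {
    found.map(ScriptValue::Bool)
}
```

The point of the split is that absence is a normal value while failure is exceptional, so a script never has to distinguish "missing" from "broken" by inspecting the same return value.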
## `SdkCallCx` and cross-app isolation
Every stateful service trait method takes `&SdkCallCx` as its first
non-self argument. The cx carries:
```rust
pub struct SdkCallCx {
pub app_id: AppId,
pub principal: Option<Principal>,
pub execution_id: ExecutionId,
pub request_id: RequestId,
pub trigger_depth: u32,
pub root_execution_id: ExecutionId,
}
```
**The service implementation MUST derive `app_id` from `cx.app_id`,
never from a script-passed argument.** Scripts cannot name another
app's data, period. The closure registered into Rhai captures the
`Arc<SdkCallCx>` for the call; the script never sees or passes
`app_id`.
Why this matters: if `kv::set` accepted a script-supplied `app_id`,
any leak of that argument into the storage query would be a
tenant-isolation vulnerability. Because the tenant is derived from the
host-attached cx, the service can't be tricked.
`principal` is `Option<Principal>` because the data plane is
unauthenticated by default — public HTTP scripts run with `None`.
Services that need an authenticated identity (e.g., `users::*`) check
`cx.principal.is_some()` and throw if missing.
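The isolation rule and the principal check can be sketched as follows. This is a simplified std-only illustration, not the real trait: the cx is trimmed to two fields, `AppId` and `Principal` are hypothetical stand-ins, and `MemoryKv` and `require_principal` are names invented for the example.

```rust
use std::collections::HashMap;

/// Simplified stand-ins for the real `AppId` / `Principal` types.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct AppId(pub u32);

#[derive(Clone, Debug)]
pub struct Principal(pub String);

/// Trimmed-down cx: only the fields this sketch needs.
pub struct SdkCallCx {
    pub app_id: AppId,
    pub principal: Option<Principal>,
}

#[derive(Default)]
pub struct MemoryKv {
    rows: HashMap<(AppId, String, String), String>,
}

impl MemoryKv {
    /// No `app_id` parameter exists: the tenant always comes from `cx`.
    pub fn set(&mut self, cx: &SdkCallCx, collection: &str, key: &str, value: &str) {
        self.rows
            .insert((cx.app_id.clone(), collection.into(), key.into()), value.into());
    }

    pub fn get(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Option<&str> {
        self.rows
            .get(&(cx.app_id.clone(), collection.to_string(), key.to_string()))
            .map(String::as_str)
    }
}

/// Guard for services that need an authenticated identity.
pub fn require_principal(cx: &SdkCallCx) -> Result<&Principal, String> {
    cx.principal
        .as_ref()
        .ok_or_else(|| "authentication required".to_string())
}
```

With this shape, two apps writing the same collection and key never collide, and a call made with `principal: None` fails the guard before any user-scoped work starts.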
## Sync ↔ async bridge
Rhai is synchronous; service trait methods (KV writes, HTTP calls) are
async. The bridge runs *inside the `spawn_blocking` thread* that
already wraps `Engine::execute` (orchestrator-core's
`LocalExecutorClient`):
```rust
// Inside a Rhai-registered closure.
let runtime = tokio::runtime::Handle::current();
let result = runtime.block_on(service.do_thing(&cx, args));
```
`Handle::current()` finds the same Tokio runtime that scheduled the
`spawn_blocking`, so the `block_on` doesn't construct a fresh runtime.
The thread is already off the async worker pool (that's what
`spawn_blocking` does), so blocking inside it is safe.
This pattern goes in every stateful service's registered Rhai closure.
The first service PR (KV, v1.1.1) lands a helper so subsequent services
don't reinvent the boilerplate.
## `ServiceEventEmitter`
Every stateful service that mutates data also emits events for the
(future) triggers framework:
```rust
emitter.emit(&cx, ServiceEvent {
source: "kv",
op: "insert",
collection: Some("widgets".into()),
key: Some("k".into()),
payload: Some(new_value_json),
old_payload: None,
}).await?;
```
v1.1.0 ships only `NoopEventEmitter`. The v1.1.1 triggers PR replaces
that with an outbox-backed implementation: events land in a Postgres
outbox table; a dispatcher worker reads them out-of-band, matches
against registered triggers, and fans out script executions. The
dispatcher enforces a depth limit via `cx.trigger_depth` so a
trigger-fires-its-own-trigger chain can't run away.
Services hold `Arc<dyn ServiceEventEmitter>` and emit unconditionally;
the noop drops events, the real impl persists them. From the service's
perspective the emission is fire-and-forget.
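The noop-versus-real split can be illustrated with a synchronous, std-only simplification. Several things here are assumptions: the real `emit` is awaited and takes a cx, the field types of `ServiceEvent` are guesses based on the literal above, and `RecordingEmitter` and `KvService` are hypothetical stand-ins (the former for the outbox-backed emitter).

```rust
use std::sync::{Arc, Mutex};

/// Field names follow the `ServiceEvent` literal above; the types are guesses.
#[derive(Clone, Debug, PartialEq)]
pub struct ServiceEvent {
    pub source: &'static str,
    pub op: &'static str,
    pub collection: Option<String>,
    pub key: Option<String>,
    pub payload: Option<String>,
    pub old_payload: Option<String>,
}

/// Synchronous simplification of the emitter trait.
pub trait ServiceEventEmitter: Send + Sync {
    fn emit(&self, event: ServiceEvent) -> Result<(), String>;
}

/// Drops every event, like the emitter v1.1.0 ships.
pub struct NoopEventEmitter;

impl ServiceEventEmitter for NoopEventEmitter {
    fn emit(&self, _event: ServiceEvent) -> Result<(), String> {
        Ok(())
    }
}

/// Hypothetical in-memory stand-in for the outbox-backed emitter.
#[derive(Default)]
pub struct RecordingEmitter {
    pub outbox: Mutex<Vec<ServiceEvent>>,
}

impl ServiceEventEmitter for RecordingEmitter {
    fn emit(&self, event: ServiceEvent) -> Result<(), String> {
        self.outbox.lock().map_err(|e| e.to_string())?.push(event);
        Ok(())
    }
}

/// A service holds the trait object and emits unconditionally.
pub struct KvService {
    pub emitter: Arc<dyn ServiceEventEmitter>,
}

impl KvService {
    pub fn set(&self, collection: &str, key: &str, value: &str) -> Result<(), String> {
        // The storage write is elided; only the emission is shown.
        self.emitter.emit(ServiceEvent {
            source: "kv",
            op: "insert",
            collection: Some(collection.into()),
            key: Some(key.into()),
            payload: Some(value.into()),
            old_payload: None,
        })
    }
}
```

The service code is identical in both configurations; swapping the emitter behind the `Arc<dyn ServiceEventEmitter>` is the only change needed when the real implementation arrives.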
## `ExecutionGate` and `PICLOUD_MAX_CONCURRENT_EXECUTIONS`
A single global semaphore caps concurrent script executions. Default
is 32; override via the `PICLOUD_MAX_CONCURRENT_EXECUTIONS` env var.
Acquisition is **non-blocking, no queue** — if a permit isn't free,
the request is refused immediately with HTTP 503 and a `Retry-After:
1` header.
Rationale: Rhai execution runs under `spawn_blocking`, which uses a
finite pool of blocking threads (defaults to 512 in current Tokio).
Without a cap, a script storm parks every blocking thread and starves
every other workload (DB writes, log sinks, audit emission). Hard
pushback is preferable to silent degradation.
Per-app or per-script caps are deferred until a real workload demands
them. The gate lives in `orchestrator-core::gate::ExecutionGate` and
is constructed once in the picloud binary's `build_app`.
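The non-blocking, no-queue behaviour can be sketched with std atomics alone. This is an illustration under stated assumptions, not the code in `orchestrator-core::gate`: the real gate is described as a semaphore, the `Permit` type and `limit_from` helper are hypothetical, and treating an unparsable or zero override as "use the default" is a choice made for this sketch.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Std-only sketch of a non-blocking gate; the real `ExecutionGate` may differ.
pub struct ExecutionGate {
    in_flight: AtomicUsize,
    limit: usize,
}

/// RAII permit: dropping it frees the slot.
pub struct Permit {
    gate: Arc<ExecutionGate>,
}

impl ExecutionGate {
    pub fn new(limit: usize) -> Arc<Self> {
        Arc::new(Self { in_flight: AtomicUsize::new(0), limit })
    }

    /// Parses the override, falling back to the documented default of 32.
    /// Falling back on invalid or zero input is an assumption of this sketch.
    pub fn limit_from(raw: Option<&str>) -> usize {
        raw.and_then(|s| s.trim().parse::<usize>().ok())
            .filter(|n| *n > 0)
            .unwrap_or(32)
    }

    /// Never waits: `None` means the caller should answer 503 right away.
    pub fn try_acquire(self: &Arc<Self>) -> Option<Permit> {
        let limit = self.limit;
        self.in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n < limit { Some(n + 1) } else { None }
            })
            .ok()
            .map(|_| Permit { gate: Arc::clone(self) })
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.gate.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}
```

A caller would read the variable with `std::env::var("PICLOUD_MAX_CONCURRENT_EXECUTIONS").ok()` and pass `as_deref()` of that to `limit_from`. Holding the `Permit` for the duration of the script run ties the slot's lifetime to the execution, including early returns and panics that unwind.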
## Registration: where future services hook in
```rust
// orchestrator-core / executor-core internal call path —
// you do not implement this; you implement registration helpers
// that future PRs call from here.
pub fn register_all(engine: &mut RhaiEngine, services: &Services, cx: Arc<SdkCallCx>) {
// v1.1.1: register_kv(engine, services, cx.clone());
// v1.1.2: register_docs(engine, services, cx.clone());
// …
}
```
Each service PR adds:
1. A `Service` trait + impl in `manager-core` (since that's where the
DB-backed implementations live).
2. A field on `picloud_shared::Services` (`pub kv: Arc<dyn KvService>`).
3. A `register_kv` helper inside `executor-core::sdk::kv` that takes
the engine, the service, and the cx, then registers the Rhai
`::collection(...)` constructor and method bindings.
4. A new `Capability` variant in `manager-core::authz` (e.g.
`AppKvRead(AppId)`) and a check inside the service impl.
That sequence is the entire mechanical pattern; nothing here should
require architecture-level discussion past v1.1.0.
## What this doc does NOT cover
- Service-specific schemas (KV table layout, docs query DSL, etc.) —
in each service PR.
- Authentication and the admin auth model — see blueprint §11.5 and
§11.6 (Phase 3.5).
- The trigger dispatch design (outbox row layout, fan-out semantics,
trigger CRUD endpoints) — comes with v1.1.1.
- Cluster mode considerations — deferred to v1.3+.


@@ -1022,9 +1022,9 @@ The scripts and routes endpoints keep their existing shape — this avoids forci
---
## 11.6 Users, roles, and bearer-token auth (Phase 3.5) — Pending
## 11.6 Users, roles, and bearer-token auth (Phase 3.5) — ✓ Shipped
**Status**: pending. Targets `crates/manager-core/src/{authz,api_keys_api,api_key_repo}.rs`, an extended `auth_middleware.rs`, new shared types under `crates/shared/src/auth.rs`, migration `0006_users_authz.sql`.
**Status**: shipped, ahead of the originally planned slot. Lives in `crates/manager-core/src/{authz,api_keys_api,api_key_repo}.rs`, the extended `auth_middleware.rs`, shared types under `crates/shared/src/auth.rs`, and migration `0006_users_authz.sql`. `can(principal, capability)` and `require(principal, capability)` are the single gate every admin handler goes through.
**Purpose**: bridge Phase 3b → Phase 4. Phase 4's v1.1 SDKs (KV, docs, HTTP, cron) each gate access on the calling principal. Without a real authorization model in place, every SDK addition has to either invent its own gate or stay open. Phase 3.5 lands `can(principal, capability)` as the single check every future SDK + admin endpoint goes through, so v1.1 work focuses on data plane shape, not on re-litigating auth.
@@ -1223,7 +1223,7 @@ Defer to follow-up sessions: dashboard surfaces for invites / key minting (curl
---
### Phase 3: v1.0.x — Foundations (Current focus)
### Phase 3: v1.0.x — Foundations ✓ (Shipped)
Three foundation pieces that must land before the v1.1 service expansion, because retrofitting them later is expensive.
@@ -1231,24 +1231,27 @@ Three foundation pieces that must land before the v1.1 service expansion, becaus
**3b. Multi-app scoping** — ✓ shipped. See section 11.5. `apps`, `app_domains`, `app_slug_history` tables; `app_id` columns on `scripts`, `routes`, `execution_logs`. Migration assigns existing data to a `default` app and always claims `localhost`; a Rust-side bootstrap inserts a `Hello World` script + `/hello` route when the default app is empty. Orchestrator dispatch is two-phase (Host → app → route trie). `/api/v1/execute/{id}/*` continues to work without a public domain claim. Dashboard is app-hierarchical (`/admin/apps`, `/admin/apps/{slug}/...`); API stays flat with new endpoints under `/api/v1/admin/apps/*` and a `?app=` filter on script listing. Per-app admin roles deferred.
**3c. Users, roles, and bearer-token auth** — pending. See section 11.6. Adds `instance_role` to `admin_users` (`owner`/`admin`/`member`), `app_members` for per-app `app_admin`/`editor`/`viewer` grants, and `api_keys` for `Authorization: Bearer pic_…` credentials. Unifies cookie-session and API-key paths behind a single `can(principal, capability)` gate; list endpoints filter by membership at SQL for `member` users. Dashboard surfaces, invites, MFA, service accounts, and the `picloud` CLI binary are deferred — schema room only.
**3c. Users, roles, and bearer-token auth (Phase 3.5)** — ✓ shipped. See section 11.6. Adds `instance_role` to `admin_users` (`owner`/`admin`/`member`), `app_members` for per-app `app_admin`/`editor`/`viewer` grants, and `api_keys` for `Authorization: Bearer pic_…` credentials. Unifies cookie-session and API-key paths behind a single `can(principal, capability)` gate; list endpoints filter by membership at SQL for `member` users. Dashboard surfaces, invites, MFA, service accounts, and the `picloud` CLI binary are deferred — schema room only.
**Why all three before v1.1**: every v1.1 service (KV, docs, users, etc.) needs both an `app_id` scoping key in its schema and a `Principal` to authorize against. Adding both now is one migration each on a small surface; adding them after the SDKs ship is many migrations on populated data plus a re-gate of every SDK call.
---
### Phase 4: v1.1 (Expand Capabilities & Services)
Ordered roughly by foundation value: each row enables the rows below it.
### Phase 4: v1.1 (Expand Capabilities & Services) — Current focus
1. **Rhai SDK: KV Store** (`kv.get/set/delete/has` with collections, scoped per app)
2. **Rhai SDK: Document Store** (`docs.create/find/update/delete/list/query`, scoped per app)
3. **Rhai SDK: HTTP** (`http.get/post/put/delete` with SSRF deny-list)
4. **Cron triggers** (manager scheduler skeleton already exists; needs schedules table + `FOR UPDATE SKIP LOCKED` dispatch)
5. **Rhai SDK: Email** (`email.send` via SMTP; needs per-deploy config)
6. **Rhai SDK: User Management** (auth, CRUD, roles, permissions, invitations, password reset; depends on email for invites; scoped per app)
7. **Queue triggers** (start with Postgres LISTEN/NOTIFY; RabbitMQ/Redis later if needed)
8. **`invoke()` + `retry::*`** (function-to-function calls; execution_logs gain `parent_execution_id`)
9. **Secrets management** (encrypted env vars, per app)
Released in patch steps (v1.1.0 → v1.1.8), each landing one focused capability. The split lets each release ship behind tests + docs without long-lived branches. SDK shape (handle pattern, `::` namespace, error convention, `ExecutionGate`, `SdkCallCx`, `ServiceEventEmitter` — see §7.5 and [docs/sdk-shape.md](../docs/sdk-shape.md)) is fixed in v1.1.0; every subsequent release fills in the contents without re-litigating the shape.
| Version | Capability |
|---------|------------|
| **v1.1.0** | **Foundation & Standard Library** — SDK shape (`Services` bundle, `SdkCallCx`, `ExecutionGate`, `ServiceEventEmitter` trait shape); stdlib utilities (regex, random, time, json, base64, hex, url). |
| **v1.1.1** | **Storage & Events** — KV store keyed `(app_id, collection, key)`; triggers framework (outbox + dispatcher + trigger CRUD + `ctx.event` + depth limit); KV trigger kinds. |
| **v1.1.2** | **Documents** — `docs::collection(name).create/find/update/delete/list` with `docs:*` triggers. |
| **v1.1.3** | **Modules** — `scripts.kind`, per-app resolver replaces `DummyModuleResolver`, AST cache + dep-graph invalidation. |
| **v1.1.4** | **Outbound HTTP & Scheduled Tasks** — `http::*` with SSRF deny-list; cron triggers. |
| **v1.1.5** | **Files & Messaging** — filesystem-backed blobs with `files:*` triggers; pub/sub via LISTEN/NOTIFY with `pubsub:*` triggers. |
| **v1.1.6** | **Configuration & Email** — encrypted per-app secrets; outbound `email::send` / `send_html` + inbound `email:receive` trigger. |
| **v1.1.7** | **User Management** — `users::*` for in-script CRUD, auth, roles, invites, password reset. |
| **v1.1.8** | **Durable Queues & Function Composition** — `queue::*` with `queue:receive` trigger; `invoke()` + `retry::*` (closures-as-args, re-entrant Rhai). |
---
@@ -1309,59 +1312,71 @@ Ordered roughly by foundation value: each row enables the rows below it.
| **ctx** (global) | `ctx.execution_id`, `ctx.script_id`, `ctx.script_name`, `ctx.request_id`, `ctx.trace_id`, `ctx.invocation_type`, `ctx.parent_execution_id`, `ctx.request.path`, `ctx.request.headers`, `ctx.request.body` | MVP+ |
| **Response** | Return `{ statusCode, headers?, body }` | MVP |
## 7.5 SDK Architecture (v1.1.x foundation)
Stateful Rhai SDK services (KV, docs, HTTP, …) hang off a common shape laid down by the v1.1.0 SDK foundation PR. Full reference lives in [docs/sdk-shape.md](../docs/sdk-shape.md); this section sketches the moving parts so other sections can refer to them by name.
**`Services` bundle** (`picloud_shared::Services`) — an `#[non_exhaustive]` struct constructed once at startup. v1.1.0 ships it empty; each subsequent v1.1.x PR adds one `Arc<dyn KvService>` / `Arc<dyn DocsService>` / … field. Held on `Engine`, passed by reference to the per-call registration hook.
**Per-call context** (`picloud_shared::SdkCallCx`) — every stateful service trait method takes `&SdkCallCx` as its first non-self argument. Carries `app_id`, `Option<Principal>`, `execution_id`, `request_id`, and the `trigger_depth` / `root_execution_id` slots that the triggers framework populates. Services derive `app_id` from the cx — never from script-passed args. **That rule is the cross-app isolation boundary**; scripts cannot name another app's data.
**Handle pattern** — collection-scoped services expose `kv::collection("widgets").get("k")`, not `kv::get("widgets", "k")`. Removes the wrong-collection-name foot-gun and lets implementations cache per-collection state. `(app_id, collection, key)` is the identity tuple for KV; `(app_id, collection, id)` for docs. Collections are mandatory.
**Error convention** — throw on failure, `()` for absent, `bool` for predicates. Uniform across every v1.1.x service. Scripts opt into handling errors via Rhai's `try/catch`.
**`ExecutionGate`** (`orchestrator-core::gate::ExecutionGate`) — single global semaphore capping concurrent script executions. Default 32, override via the `PICLOUD_MAX_CONCURRENT_EXECUTIONS` env var. Non-blocking — on overflow, the orchestrator returns HTTP 503 with `Retry-After: 1` immediately. No queue. Rationale: Rhai runs under `spawn_blocking`, so unbounded concurrency would park every blocking thread and starve every other workload.
**`ServiceEventEmitter`** (`picloud_shared::ServiceEventEmitter`) — every mutating service method emits a `ServiceEvent { source, op, collection, key, payload, old_payload }`. v1.1.0 ships `NoopEventEmitter`; the real outbox-backed dispatcher lands with v1.1.1 (see 7.5.1).
### 7.5.1 Trigger architecture (sketch)
Triggers fire scripts in response to service events. Three locked properties; full design and CRUD endpoints land with v1.1.1.
1. **Async outbox**: services emit events synchronously into a Postgres outbox table; a separate dispatcher worker reads, matches them against registered triggers, and fans out script executions. Service writes don't block on trigger fan-out.
2. **Depth-limited**: each trigger-spawned execution increments `cx.trigger_depth`. The dispatcher refuses to fan out beyond a configured ceiling to prevent runaway feedback loops. `cx.root_execution_id` preserves the originating execution id for audit grouping.
3. **Trigger model**: a trigger is `(service, event, filter) → script`, stored in a `triggers` table. The filter is the dispatcher's match predicate on the emitted `ServiceEvent`.
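The depth limit and the `(service, event, filter) → script` match can be sketched as a pure function. Everything concrete here is hypothetical: the real tables and types land with v1.1.1, the filter is reduced to an optional collection name, and refusing fan-out when `trigger_depth >= max_depth` is one possible reading of "beyond a configured ceiling".

```rust
/// Hypothetical, simplified trigger row.
pub struct Trigger {
    pub service: &'static str,
    pub event: &'static str,
    /// Filter modelled as an optional collection name; `None` matches any.
    pub collection_filter: Option<&'static str>,
    pub script: &'static str,
}

/// Hypothetical view of an emitted event as the dispatcher sees it.
pub struct EmittedEvent<'a> {
    pub service: &'a str,
    pub event: &'a str,
    pub collection: &'a str,
    /// Depth of the execution that emitted this event.
    pub trigger_depth: u32,
}

/// Returns the scripts to run, or nothing once the depth ceiling is reached.
pub fn fan_out(triggers: &[Trigger], ev: &EmittedEvent<'_>, max_depth: u32) -> Vec<&'static str> {
    if ev.trigger_depth >= max_depth {
        return Vec::new();
    }
    triggers
        .iter()
        .filter(|t| t.service == ev.service && t.event == ev.event)
        .filter(|t| t.collection_filter.map_or(true, |c| c == ev.collection))
        .map(|t| t.script)
        .collect()
}
```

Each script returned here would run with `trigger_depth` one higher than the emitting execution, which is what eventually stops a trigger that keeps firing itself.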
### 8.1 KV Store Service
**Purpose**: Simple key-value persistence organized by collections, shared across script invocations and scripts.
**Purpose**: Simple key-value persistence organized by collections, scoped per app and shared across script invocations and scripts within that app.
**PostgreSQL Setup:**
**PostgreSQL Schema:**
```sql
-- Enable hstore extension (one-time setup)
CREATE EXTENSION IF NOT EXISTS hstore;
-- Create KV table with collection support
CREATE TABLE kv_store (
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
collection TEXT NOT NULL,
key TEXT NOT NULL,
value hstore NOT NULL,
value JSONB NOT NULL,
expires_at TIMESTAMP,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (collection, key)
PRIMARY KEY (app_id, collection, key)
);
CREATE INDEX idx_kv_collection ON kv_store(collection);
CREATE INDEX idx_kv_app_collection ON kv_store(app_id, collection);
CREATE INDEX idx_kv_expires ON kv_store(expires_at)
WHERE expires_at IS NOT NULL;
```
**Why hstore + collections?**
- Lightweight, purpose-built for key-value storage
- Collections allow logical grouping (e.g., `kv:sessions`, `kv:counters`, `kv:flags`)
- Faster than JSONB for simple KV use cases
- Built-in indexing support
- Keeps all data in one database (no Redis dependency)
**Why JSONB + mandatory collections + `app_id` first:**
- `(app_id, collection, key)` is the identity tuple. The PK begins with `app_id` so the index is naturally per-app; cross-app reads can't happen even if the service layer has a bug.
- Collections are **mandatory** — every set / get / delete names one. The same key can legitimately live in multiple collections within one app (`sessions:abc` and `counters:abc` are distinct rows).
- JSONB carries arbitrary script-side values (nested objects, arrays) without a separate serialization step. `hstore` was considered and ruled out — it doesn't carry nested types and would force a second JSONB column the moment a script writes a structured value.
**Rhai SDK:**
**Value-size cap:** 64 KiB per value, enforced at the service layer (script-visible error on overflow). The cap keeps KV "small fast values, not blob storage"; the v1.1.5 files SDK is the right home for large payloads.
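A service-layer check for the cap might look like the sketch below. The names `MAX_VALUE_BYTES` and `check_value_size` are hypothetical, and measuring the serialized JSON text (rather than some in-memory size) is an assumption of this sketch.

```rust
/// 64 KiB, the per-value cap named above.
pub const MAX_VALUE_BYTES: usize = 64 * 1024;

/// Checks the byte length of the serialized value against the cap.
/// The `Err` string stands in for the script-visible error.
pub fn check_value_size(serialized: &str) -> Result<(), String> {
    let len = serialized.len();
    if len > MAX_VALUE_BYTES {
        Err(format!("value is {} bytes; limit is {}", len, MAX_VALUE_BYTES))
    } else {
        Ok(())
    }
}
```

Running the check before the write keeps the failure on the throwing side of the error convention and avoids sending an oversized payload to the database at all.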
**Rhai SDK (handle pattern — see [docs/sdk-shape.md](../docs/sdk-shape.md)):**
```rhai
// Get a value from a collection
let val = kv.get("sessions", "user:123"); // Returns object or null
let sessions = kv::collection("sessions");
sessions.set("user:123", #{ token: "abc", created: "2026-04-10" });
let val = sessions.get("user:123"); // value or () if absent
sessions.delete("user:123");
sessions.set("user:123", #{ token: "xyz" }, 3600); // TTL in seconds
if sessions.has("user:123") { ... }
// Set a value in a collection
kv.set("sessions", "user:123", { token: "abc", created: "2026-04-10" });
// Delete a key from a collection
kv.delete("sessions", "user:123");
// Set with TTL (seconds)
kv.set("sessions", "user:123", { token: "xyz" }, 3600);
// Check if key exists in a collection
if kv.has("sessions", "user:123") { ... }
// Use different collections for different purposes
kv.set("counters", "api:calls", 42);
kv.set("flags", "feature:beta", true);
kv.set("cache", "page:home", { html: "..." });
// Distinct collections in one script — different handles.
let counters = kv::collection("counters");
counters.set("api:calls", 42);
```
**Use Cases:**