# SDK shape (v1.1.x stateful services)

This document describes the architectural shape every v1.1.x SDK service follows. It is **not** a feature reference for any particular service — those live in their own docs as each PR lands (KV in v1.1.1, docs in v1.1.2, …). What follows is the contract those PRs implement against, so the surface stays consistent and the build doesn't drift.

The shape was laid down in v1.1.0 (the SDK foundation PR). If you find yourself re-litigating any of it inside a service PR, push back and update this doc explicitly first.

## Two kinds of Rhai modules

**Stateless utility modules** (regex, time, json, base64, hex, url — landing as v1.1.0's stdlib PR) are registered once at engine build. They have no per-call state and no cross-app sensitivity. Implementation goes in `executor-core::engine::build_engine` next to the existing `log::` registration. They use Rhai's `register_static_module`.

**Stateful service modules** (kv, docs, http, cron, files, pubsub, secrets, email, users, queue, invoke) are registered **per call** by `executor-core::sdk::register_all`. They need:

- A service handle bundled in `picloud_shared::Services` (constructed once at startup, cloned cheaply per call).
- A per-call `SdkCallCx` carrying the calling app, principal, execution ids, and trigger depth.
- Closures that capture both, registered as Rhai native functions inside a per-call `rhai::Module`.

Mixing the two categories in one module is wrong — services that internally consult per-call context are stateful, period.

## `::` namespace style

Every SDK module exposes itself under a `::` namespace, mirroring the existing `log::`:

```rhai
log::info("hello");                              // v1.0 — present
let value = kv::collection("widgets").get("k");  // v1.1.1
let resp = http::get("https://example.com");     // v1.1.4
```

Dotted-object syntax (`kv.get("widgets", "k")`) is **not** used.
Rationale: `::` is consistent with Rust import syntax, doesn't require a wrapper "module object" in Rhai's scope, and keeps the module boundary obvious in scripts.

## Handle pattern for collection-scoped services

Services that operate on collections expose a **collection handle** returned by an `::collection(name)` constructor:

```rhai
let widgets = kv::collection("widgets");
widgets.set("k", "v");
let v = widgets.get("k");
```

Not `kv::set("widgets", "k", "v")`. The handle is a Rhai custom type the service registers; method calls bind to that type. This:

- Removes the "did I get the collection-name argument right?" foot-gun.
- Lets the implementation cache per-collection state on the handle (prepared statements, connection affinity) without leaking that into the call signature.
- Pre-empts the "collection is implicit" failure mode where two services in the same script accidentally share a default collection.

`(app_id, collection, key)` is the identity tuple for KV; `(app_id, collection, id)` for docs. Collections are **mandatory**, not optional — even single-collection apps name their collection. The service layer rejects requests with empty collection names.

## Error convention

- **Throw on failure.** `widgets.set("k", "v")` throws a Rhai runtime error on any operational problem (DB unavailable, payload too large, authz denied). Scripts opting into error handling use Rhai's `try/catch`.
- **`()` for absent.** `widgets.get("missing")` returns `()` (Rhai unit). Scripts test absence with `if v == () { ... }` or use the matching `has(k)` predicate.
- **`bool` for predicates.** `widgets.has(k)` is the cheap existence check that doesn't deserialize the value.

This convention is uniform across every v1.1.x service. Adding `Result`-flavoured variants is a design departure that requires a doc update before implementation.

## `SdkCallCx` and cross-app isolation

Every stateful service trait method takes `&SdkCallCx` as its first non-self argument.
The cx carries:

```rust
pub struct SdkCallCx {
    pub app_id: AppId,
    pub principal: Option,
    pub execution_id: ExecutionId,
    pub request_id: RequestId,
    pub trigger_depth: u32,
    pub root_execution_id: ExecutionId,
}
```

**The service implementation MUST derive `app_id` from `cx.app_id` — never from a script-passed argument.** Scripts cannot name another app's data, period. The closure registered into Rhai captures the `Arc` for the call; the script never sees or passes `app_id`.

Why this matters: a `kv::set("widgets", "k", v)` call with a script-supplied `app_id` would be a tenant-isolation vulnerability if that arg ever leaked into the storage query. By deriving from the host-attached cx, the service can't be tricked.

`principal` is `Option` because the data plane is unauthenticated by default — public HTTP scripts run with `None`. Services that need an authenticated identity (e.g., `users::*`) check `cx.principal.is_some()` and throw if missing.

## Sync ↔ async bridge

Rhai is synchronous; service trait methods (KV writes, HTTP calls) are async. The bridge runs *inside the `spawn_blocking` thread* that already wraps `Engine::execute` (orchestrator-core's `LocalExecutorClient`):

```rust
// Inside a Rhai-registered closure.
let runtime = tokio::runtime::Handle::current();
let result = runtime.block_on(service.do_thing(&cx, args));
```

`Handle::current()` finds the same Tokio runtime that scheduled the `spawn_blocking`, so the `block_on` doesn't construct a fresh runtime. The thread is already off the async worker pool (that's what `spawn_blocking` does), so blocking inside it is safe.

This pattern goes in every stateful service's registered Rhai closure. The first service PR (KV, v1.1.1) lands a helper so subsequent services don't reinvent the boilerplate.
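
To make the isolation rule from the `SdkCallCx` section concrete, here is a minimal std-only sketch of a service that builds its storage identity from `cx.app_id` and nothing else. It is not the real KV implementation: `InMemoryKv`, `KvError`, the string-backed `AppId`, and the one-field `SdkCallCx` are hypothetical stand-ins, and the methods are synchronous only to keep the example free of a Tokio dependency (the real trait methods are async). It also shows the mandatory-collection check and the absent-versus-error split in Rust terms, where `Ok(None)` would map to Rhai `()` and `Err` to a thrown runtime error.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Stand-in for the real `AppId`; a String newtype is an assumption.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct AppId(pub String);

/// Trimmed-down call context carrying only the field this sketch needs.
pub struct SdkCallCx {
    pub app_id: AppId,
}

/// Hypothetical error type; the binding layer would turn it into a Rhai throw.
#[derive(Debug, PartialEq)]
pub enum KvError {
    EmptyCollection,
}

/// Hypothetical in-memory store keyed by (app_id, collection, key).
#[derive(Default)]
pub struct InMemoryKv {
    rows: Mutex<HashMap<(AppId, String, String), String>>,
}

impl InMemoryKv {
    /// Builds the identity tuple. The app id comes from the host-attached cx;
    /// there is no parameter through which a script could supply one.
    fn ident(
        cx: &SdkCallCx,
        collection: &str,
        key: &str,
    ) -> Result<(AppId, String, String), KvError> {
        if collection.is_empty() {
            return Err(KvError::EmptyCollection);
        }
        Ok((cx.app_id.clone(), collection.to_string(), key.to_string()))
    }

    pub fn set(&self, cx: &SdkCallCx, collection: &str, key: &str, value: &str) -> Result<(), KvError> {
        let id = Self::ident(cx, collection, key)?;
        self.rows.lock().unwrap().insert(id, value.to_string());
        Ok(())
    }

    /// `Ok(None)` means "absent"; failures are reported through `Err`.
    pub fn get(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Result<Option<String>, KvError> {
        let id = Self::ident(cx, collection, key)?;
        Ok(self.rows.lock().unwrap().get(&id).cloned())
    }

    /// Existence check that never clones the stored value.
    pub fn has(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Result<bool, KvError> {
        let id = Self::ident(cx, collection, key)?;
        Ok(self.rows.lock().unwrap().contains_key(&id))
    }
}
```

With two contexts for different apps, a value written under one is invisible to the other even when collection and key match, because the only way to reach a row is through the `app_id` on the cx.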
## `ServiceEventEmitter`

Every stateful service that mutates data also emits events for the (future) triggers framework:

```rust
emitter.emit(&cx, ServiceEvent {
    source: "kv",
    op: "insert",
    collection: Some("widgets".into()),
    key: Some("k".into()),
    payload: Some(new_value_json),
    old_payload: None,
}).await?;
```

v1.1.0 ships only `NoopEventEmitter`. The v1.1.1 triggers PR replaces that with an outbox-backed implementation: events land in a Postgres outbox table; a dispatcher worker reads them out-of-band, matches against registered triggers, and fans out script executions. The dispatcher enforces a depth limit via `cx.trigger_depth` so a trigger-fires-its-own-trigger chain can't run away.

Services hold `Arc` and emit unconditionally; the noop drops events, the real impl persists them. From the service's perspective the emission is fire-and-forget.

## `ExecutionGate` and `PICLOUD_MAX_CONCURRENT_EXECUTIONS`

A single global semaphore caps concurrent script executions. Default is 32; override via the `PICLOUD_MAX_CONCURRENT_EXECUTIONS` env var. Acquisition is **non-blocking, no queue** — if a permit isn't free, the request is refused immediately with HTTP 503 and a `Retry-After: 1` header.

Rationale: Rhai execution runs under `spawn_blocking`, which uses a finite pool of blocking threads (defaults to 512 in current Tokio). Without a cap, a script storm parks every blocking thread and starves every other workload (DB writes, log sinks, audit emission). Hard pushback is preferable to silent degradation.

Per-app or per-script caps are deferred until a real workload demands them.

The gate lives in `orchestrator-core::gate::ExecutionGate` and is constructed once in the picloud binary's `build_app`.

## Registration: where future services hook in

```rust
// orchestrator-core / executor-core internal call path —
// you do not implement this; you implement registration helpers
// that future PRs call from here.
pub fn register_all(engine: &mut RhaiEngine, services: &Services, cx: Arc) {
    // v1.1.1: register_kv(engine, services, cx.clone());
    // v1.1.2: register_docs(engine, services, cx.clone());
    // …
}
```

Each service PR adds:

1. A `Service` trait + impl in `manager-core` (since that's where the DB-backed implementations live).
2. A field on `picloud_shared::Services` (`pub kv: Arc`).
3. A `register_kv` helper inside `executor-core::sdk::kv` that takes the engine, the service, and the cx, then registers the Rhai `::collection(...)` constructor and method bindings.
4. A new `Capability` variant in `manager-core::authz` (e.g. `AppKvRead(AppId)`) and a check inside the service impl.

That sequence is the entire mechanical pattern; nothing here should require architecture-level discussion past v1.1.0.

## What this doc does NOT cover

- Service-specific schemas (KV table layout, docs query DSL, etc.) — in each service PR.
- Authentication and the admin auth model — see blueprint §11.5, §11.6 and Phase 3.5.
- The trigger dispatch design (outbox row layout, fan-out semantics, trigger CRUD endpoints) — comes with v1.1.1.
- Cluster mode considerations — deferred to v1.3+.
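
## Appendix: execution gate sketch

Returning to the `ExecutionGate` section: the "non-blocking, no queue" behaviour can be illustrated with a small std-only sketch. This is not the code in `orchestrator-core::gate`; the doc describes a semaphore, while this example swaps in an atomic counter with an RAII permit so it runs without Tokio. The names `Permit`, `Refusal`, `try_acquire`, `admit`, and `parse_max` are hypothetical, and treating `0` or an unparsable env value as "use the default" is an assumption of the sketch.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Illustrative gate: a counter capped at `max`, no waiting, no queue.
pub struct ExecutionGate {
    max: usize,
    in_flight: AtomicUsize,
}

/// RAII permit; dropping it frees the slot.
pub struct Permit {
    gate: Arc<ExecutionGate>,
}

/// What a refused request would be translated into at the HTTP layer.
pub struct Refusal {
    pub status: u16,
    pub retry_after_secs: u64,
}

/// Parses the cap, falling back to the documented default of 32.
/// Rejecting 0 and non-numeric input is an assumption of this sketch.
pub fn parse_max(raw: Option<&str>) -> usize {
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|n| *n > 0)
        .unwrap_or(32)
}

impl ExecutionGate {
    pub fn new(max: usize) -> Arc<Self> {
        Arc::new(Self { max, in_flight: AtomicUsize::new(0) })
    }

    /// Reads `PICLOUD_MAX_CONCURRENT_EXECUTIONS` once, at construction time.
    pub fn from_env() -> Arc<Self> {
        let raw = std::env::var("PICLOUD_MAX_CONCURRENT_EXECUTIONS").ok();
        Self::new(parse_max(raw.as_deref()))
    }

    /// Returns immediately: `Some(permit)` if a slot is free, `None` otherwise.
    pub fn try_acquire(gate: &Arc<ExecutionGate>) -> Option<Permit> {
        let mut current = gate.in_flight.load(Ordering::Acquire);
        loop {
            if current >= gate.max {
                return None;
            }
            match gate.in_flight.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(Permit { gate: Arc::clone(gate) }),
                Err(observed) => current = observed, // lost a race; retry
            }
        }
    }

    pub fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Acquire)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.gate.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

/// Admission check: a permit, or a 503 with `Retry-After: 1`.
pub fn admit(gate: &Arc<ExecutionGate>) -> Result<Permit, Refusal> {
    ExecutionGate::try_acquire(gate).ok_or(Refusal { status: 503, retry_after_secs: 1 })
}
```

A caller would hold the `Permit` for the lifetime of the script execution and let it drop when the `spawn_blocking` task finishes. Because the slot is released in `Drop`, an early return or panic unwinding through the holder still frees it.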