Lands the developer-facing reference for the SDK shape every v1.1.x
service implements against, plus the blueprint changes the shape and
the recently-shipped Phase 3.5 imply:
- New docs/sdk-shape.md — covers handle pattern, :: namespace,
throw/() error convention, sync↔async bridge, cross-app isolation
rule, ServiceEventEmitter, ExecutionGate + env var, stateless vs
stateful module registration.
- Blueprint §11.6 (Phase 3.5): Pending → ✓ Shipped, with a note that
it landed ahead of the originally planned slot.
- Blueprint §8.1 (KV Store): replace hstore schema + rationale with
JSONB. PK becomes (app_id, collection, key); cross-app isolation
is enforced at the index, not just the service layer. Note 64 KiB
per-value cap enforced at the service layer (lands with the KV PR
in v1.1.1).
- Blueprint new §7.5 (SDK Architecture): brief overview pointing to
docs/sdk-shape.md. Includes §7.5.1 sketch of the trigger
architecture (outbox + depth limit + (service, event, filter) →
script).
- Blueprint §12 Phase 4: restructured to enumerate v1.1.0 through
v1.1.8 with one focused capability per release. Current focus
moves to Phase 4 (v1.1.0) now that Phase 3.5 is done.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SDK shape (v1.1.x stateful services)
This document describes the architectural shape every v1.1.x SDK service follows. It is not a feature reference for any particular service — those live in their own docs as each PR lands (KV in v1.1.1, docs in v1.1.2, …). What follows is the contract those PRs implement against, so the surface stays consistent and the build doesn't drift.
The shape was laid down in v1.1.0 (the SDK foundation PR). If you find yourself re-litigating any of it inside a service PR, push back and update this doc explicitly first.
Two kinds of Rhai modules
Stateless utility modules (regex, time, json, base64, hex, url —
landing as v1.1.0's stdlib PR) are registered once at engine build.
They have no per-call state and no cross-app sensitivity. Implementation
goes in executor-core::engine::build_engine next to the existing
log:: registration. They use Rhai's register_static_module.
Stateful service modules (kv, docs, http, cron, files, pubsub,
secrets, email, users, queue, invoke) are registered per call by
executor-core::sdk::register_all. They need:
- A service handle bundled in picloud_shared::Services (constructed once at startup, cloned cheaply per call).
- A per-call SdkCallCx carrying the calling app, principal, execution ids, and trigger depth.
- Closures that capture both, registered as Rhai native functions inside a per-call rhai::Module.
Mixing the two categories in one module is wrong — services that internally consult per-call context are stateful, period.
:: namespace style
Every SDK module exposes itself under a :: namespace, mirroring the
existing log:: module:
log::info("hello"); // v1.0 — present
let value = kv::collection("widgets").get("k"); // v1.1.1
let resp = http::get("https://example.com"); // v1.1.4
Dotted-object syntax (kv.get("widgets", "k")) is not used.
Rationale: :: is consistent with Rust import syntax, doesn't
require a wrapper "module object" in Rhai's scope, and keeps the
module boundary obvious in scripts.
Handle pattern for collection-scoped services
Services that operate on collections expose a collection handle
returned by the service's ::collection(name) constructor:
let widgets = kv::collection("widgets");
widgets.set("k", "v");
let v = widgets.get("k");
The flat form kv::set("widgets", "k", "v") is not offered. The handle is
a Rhai custom type the service registers; method calls bind to that type. This:
- Removes the "did I get the collection-name argument right?" foot-gun.
- Lets the implementation cache per-collection state on the handle (prepared statements, connection affinity) without leaking that into the call signature.
- Pre-empts the "collection is implicit" failure mode where two services in the same script accidentally share a default collection.
(app_id, collection, key) is the identity tuple for KV; (app_id, collection, id) for docs. Collections are mandatory, not optional
— even single-collection apps name their collection. The service layer
rejects requests with empty collection names.
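The handle pattern and the identity tuple can be sketched in plain Rust. This is a minimal illustration, not the service's actual implementation: MemoryKv and KvCollection are hypothetical names, the store is an in-memory map standing in for the database, and app_id is passed by the host here only because the sketch has no call context.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Identity tuple for KV rows: (app_id, collection, key).
type KvKey = (String, String, String);

/// Hypothetical in-memory stand-in for the real, DB-backed KV service.
#[derive(Clone, Default)]
pub struct MemoryKv {
    rows: Arc<Mutex<HashMap<KvKey, String>>>,
}

/// Collection-scoped handle; the collection name is fixed at construction.
pub struct KvCollection {
    store: MemoryKv,
    app_id: String,
    collection: String,
}

impl MemoryKv {
    /// Plays the role of `kv::collection(name)`: empty names are rejected up front.
    pub fn collection(&self, app_id: &str, name: &str) -> Result<KvCollection, String> {
        if name.is_empty() {
            return Err("collection name must not be empty".to_string());
        }
        Ok(KvCollection {
            store: self.clone(),
            app_id: app_id.to_string(),
            collection: name.to_string(),
        })
    }
}

impl KvCollection {
    /// Builds the full identity tuple so callers only ever pass the key.
    fn full_key(&self, key: &str) -> KvKey {
        (self.app_id.clone(), self.collection.clone(), key.to_string())
    }

    pub fn set(&self, key: &str, value: &str) {
        self.store.rows.lock().unwrap().insert(self.full_key(key), value.to_string());
    }

    pub fn get(&self, key: &str) -> Option<String> {
        self.store.rows.lock().unwrap().get(&self.full_key(key)).cloned()
    }
}
```

Because the collection name lives on the handle, set and get cannot be called with a mistyped or missing collection argument, and two handles for different collections never see each other's keys.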
Error convention
- Throw on failure. widgets.set("k", "v") throws a Rhai runtime error on any operational problem (DB unavailable, payload too large, authz denied). Scripts opting into error handling use Rhai's try/catch.
- () for absent. widgets.get("missing") returns () (Rhai unit). Scripts test absence with if v == () { ... } or use the matching has(k) predicate.
- bool for predicates. widgets.has(k) is the cheap existence check that doesn't deserialize the value.
This convention is uniform across every v1.1.x service. Adding
Result-flavoured variants is a design departure that requires a doc
update before implementation.
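On the host side, the convention amounts to a three-way mapping from a service result to what the script observes. The sketch below is an assumption-laden illustration: KvError, ScriptOutcome, and map_get are hypothetical names, and the real bindings would produce Rhai values and runtime errors instead of this enum.

```rust
/// Hypothetical operational failures a KV call might hit.
#[derive(Debug, PartialEq)]
pub enum KvError {
    Unavailable,
    PayloadTooLarge,
    Denied,
}

/// What the script observes once the host has mapped a service result.
#[derive(Debug, PartialEq)]
pub enum ScriptOutcome {
    /// A present value, handed to the script as-is.
    Value(String),
    /// Rhai unit `()`: the key is absent, which is not an error.
    Unit,
    /// A runtime error the script can intercept with try/catch.
    Thrown(String),
}

/// Maps the result of a `get` call: present, absent, or failed.
pub fn map_get(result: Result<Option<String>, KvError>) -> ScriptOutcome {
    match result {
        Ok(Some(value)) => ScriptOutcome::Value(value),
        Ok(None) => ScriptOutcome::Unit,
        Err(err) => ScriptOutcome::Thrown(format!("kv: {err:?}")),
    }
}
```

Keeping absence out of the error channel means a script only needs try/catch for genuine operational failures.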
SdkCallCx and cross-app isolation
Every stateful service trait method takes &SdkCallCx as its first
non-self argument. The cx carries:
pub struct SdkCallCx {
    pub app_id: AppId,
    pub principal: Option<Principal>,
    pub execution_id: ExecutionId,
    pub request_id: RequestId,
    pub trigger_depth: u32,
    pub root_execution_id: ExecutionId,
}
The service implementation MUST derive app_id from cx.app_id —
never from a script-passed argument. Scripts cannot name another
app's data, period. The closure registered into Rhai captures the
Arc<SdkCallCx> for the call; the script never sees or passes
app_id.
Why this matters: if a call such as kv::set("widgets", "k", v) accepted
a script-supplied app_id, that argument would become a tenant-isolation
vulnerability the moment it reached the storage query. Because app_id is
derived from the host-attached cx, the service can't be tricked.
principal is Option<Principal> because the data plane is
unauthenticated by default — public HTTP scripts run with None.
Services that need an authenticated identity (e.g., users::*) check
cx.principal.is_some() and throw if missing.
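The isolation rule can be illustrated with a trimmed-down context and an in-memory store. Everything here is a sketch under assumptions: AppId and Principal are simplified stand-ins for the real types, TenantKv and require_principal are hypothetical names, and the other cx fields are omitted.

```rust
use std::collections::HashMap;

// Simplified stand-ins: the real AppId and Principal types are not shown in this document.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct AppId(pub String);

#[derive(Clone, Debug)]
pub struct Principal(pub String);

/// Trimmed-down call context; execution ids and trigger depth are omitted.
pub struct SdkCallCx {
    pub app_id: AppId,
    pub principal: Option<Principal>,
}

/// Hypothetical in-memory store keyed by (app_id, collection, key).
#[derive(Default)]
pub struct TenantKv {
    rows: HashMap<(AppId, String, String), String>,
}

impl TenantKv {
    // No method takes an app id parameter: the tenant always comes from `cx`.
    pub fn set(&mut self, cx: &SdkCallCx, collection: &str, key: &str, value: &str) {
        let id = (cx.app_id.clone(), collection.to_string(), key.to_string());
        self.rows.insert(id, value.to_string());
    }

    pub fn get(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Option<&String> {
        let id = (cx.app_id.clone(), collection.to_string(), key.to_string());
        self.rows.get(&id)
    }
}

/// Guard for services that need an authenticated caller.
pub fn require_principal(cx: &SdkCallCx) -> Result<&Principal, String> {
    cx.principal
        .as_ref()
        .ok_or_else(|| "authenticated principal required".to_string())
}
```

Two contexts with different app_id values address disjoint rows even when the collection and key match, and nothing a caller passes can change which app a call is scoped to.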
Sync ↔ async bridge
Rhai is synchronous; service trait methods (KV writes, HTTP calls) are
async. The bridge runs inside the spawn_blocking thread that
already wraps Engine::execute (orchestrator-core's
LocalExecutorClient):
// Inside a Rhai-registered closure.
let runtime = tokio::runtime::Handle::current();
let result = runtime.block_on(service.do_thing(&cx, args));
Handle::current() finds the same Tokio runtime that scheduled the
spawn_blocking, so the block_on doesn't construct a fresh runtime.
The thread is already off the async worker pool (that's what
spawn_blocking does), so blocking inside it is safe.
This pattern goes in every stateful service's registered Rhai closure. The first service PR (KV, v1.1.1) lands a helper so subsequent services don't reinvent the boilerplate.
ServiceEventEmitter
Every stateful service that mutates data also emits events for the (future) triggers framework:
emitter.emit(&cx, ServiceEvent {
    source: "kv",
    op: "insert",
    collection: Some("widgets".into()),
    key: Some("k".into()),
    payload: Some(new_value_json),
    old_payload: None,
}).await?;
v1.1.0 ships only NoopEventEmitter. The v1.1.1 triggers PR replaces
that with an outbox-backed implementation: events land in a Postgres
outbox table; a dispatcher worker reads them out-of-band, matches
against registered triggers, and fans out script executions. The
dispatcher enforces a depth limit via cx.trigger_depth so a
trigger-fires-its-own-trigger chain can't run away.
Services hold Arc<dyn ServiceEventEmitter> and emit unconditionally;
the noop drops events, the real impl persists them. From the service's
perspective the emission is fire-and-forget.
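The emitter arrangement can be sketched with a trait object and two implementations. This is a simplified illustration, not the real interface: the trait method here is synchronous and takes only the trigger depth instead of the full cx, ServiceEvent carries a subset of the fields, MemoryOutbox stands in for the Postgres outbox, and MAX_TRIGGER_DEPTH is an assumed value because the document does not state the actual limit.

```rust
use std::sync::{Arc, Mutex};

/// Subset of the event fields shown in the example above.
#[derive(Clone, Debug, PartialEq)]
pub struct ServiceEvent {
    pub source: &'static str,
    pub op: &'static str,
    pub key: Option<String>,
}

/// Synchronous simplification; the real trait method is async and takes the cx.
pub trait ServiceEventEmitter: Send + Sync {
    fn emit(&self, trigger_depth: u32, event: ServiceEvent);
}

/// Drops every event.
pub struct NoopEventEmitter;

impl ServiceEventEmitter for NoopEventEmitter {
    fn emit(&self, _trigger_depth: u32, _event: ServiceEvent) {}
}

/// Hypothetical in-memory stand-in for the outbox table.
#[derive(Default)]
pub struct MemoryOutbox {
    pub rows: Mutex<Vec<(u32, ServiceEvent)>>,
}

impl ServiceEventEmitter for MemoryOutbox {
    fn emit(&self, trigger_depth: u32, event: ServiceEvent) {
        self.rows.lock().unwrap().push((trigger_depth, event));
    }
}

/// Assumed limit; the actual value is not given in this document.
pub const MAX_TRIGGER_DEPTH: u32 = 8;

/// Dispatcher-side check: events that are already too deep fire no triggers.
pub fn should_dispatch(trigger_depth: u32) -> bool {
    trigger_depth < MAX_TRIGGER_DEPTH
}

/// A service mutation emits unconditionally, whichever emitter it was given.
pub fn kv_set(emitter: &Arc<dyn ServiceEventEmitter>, trigger_depth: u32, key: &str) {
    // The storage write would happen here.
    let event = ServiceEvent { source: "kv", op: "insert", key: Some(key.to_string()) };
    emitter.emit(trigger_depth, event);
}
```

The service code is identical under both emitters; only the object behind the Arc decides whether an event is persisted or dropped.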
ExecutionGate and PICLOUD_MAX_CONCURRENT_EXECUTIONS
A single global semaphore caps concurrent script executions. Default
is 32; override via the PICLOUD_MAX_CONCURRENT_EXECUTIONS env var.
Acquisition is non-blocking, no queue — if a permit isn't free,
the request is refused immediately with HTTP 503 and a Retry-After: 1 header.
Rationale: Rhai execution runs under spawn_blocking, which uses a
finite pool of blocking threads (defaults to 512 in current Tokio).
Without a cap, a script storm parks every blocking thread and starves
every other workload (DB writes, log sinks, audit emission). Hard
pushback is preferable to silent degradation.
Per-app or per-script caps are deferred until a real workload demands
them. The gate lives in orchestrator-core::gate::ExecutionGate and
is constructed once in the picloud binary's build_app.
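The refuse-immediately behaviour can be sketched with an atomic counter. This is a std-only stand-in, not the code in orchestrator-core::gate: the internals, parse_limit, Permit, and refusal are hypothetical, and falling back to the default on a missing, malformed, or zero value is an assumed policy.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

pub const DEFAULT_MAX_CONCURRENT: usize = 32;

/// Parses the raw env var value; missing, malformed, or zero values
/// fall back to the default (an assumed policy).
pub fn parse_limit(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_MAX_CONCURRENT)
}

pub struct ExecutionGate {
    limit: usize,
    in_flight: Arc<AtomicUsize>,
}

/// Held for the duration of one execution; dropping it frees the slot.
pub struct Permit {
    in_flight: Arc<AtomicUsize>,
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

impl ExecutionGate {
    pub fn new(limit: usize) -> Self {
        Self { limit, in_flight: Arc::new(AtomicUsize::new(0)) }
    }

    /// Never waits: returns `None` at once when every slot is taken.
    pub fn try_acquire(&self) -> Option<Permit> {
        let limit = self.limit;
        self.in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n < limit { Some(n + 1) } else { None }
            })
            .ok()
            .map(|_| Permit { in_flight: Arc::clone(&self.in_flight) })
    }
}

/// HTTP status and Retry-After seconds for a refused request.
pub fn refusal() -> (u16, u32) {
    (503, 1)
}
```

At startup the limit would come from something like parse_limit(std::env::var("PICLOUD_MAX_CONCURRENT_EXECUTIONS").ok().as_deref()). A request handler calls try_acquire, answers with the refusal values when it gets None, and otherwise keeps the Permit alive until the script finishes.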
Registration: where future services hook in
// orchestrator-core / executor-core internal call path —
// you do not implement this; you implement registration helpers
// that future PRs call from here.
pub fn register_all(engine: &mut RhaiEngine, services: &Services, cx: Arc<SdkCallCx>) {
    // v1.1.1: register_kv(engine, services, cx.clone());
    // v1.1.2: register_docs(engine, services, cx.clone());
    // …
}
Each service PR adds:
- A Service trait + impl in manager-core (since that's where the DB-backed implementations live).
- A field on picloud_shared::Services (pub kv: Arc<dyn KvService>).
- A register_kv helper inside executor-core::sdk::kv that takes the engine, the service, and the cx, then registers the Rhai ::collection(...) constructor and method bindings.
- A new Capability variant in manager-core::authz (e.g. AppKvRead(AppId)) and a check inside the service impl.
That sequence is the entire mechanical pattern; nothing here should require architecture-level discussion past v1.1.0.
What this doc does NOT cover
- Service-specific schemas (KV table layout, docs query DSL, etc.) — in each service PR.
- Authentication and the admin auth model — see blueprint §11.5, §11.6 and Phase 3.5.
- The trigger dispatch design (outbox row layout, fan-out semantics, trigger CRUD endpoints) — comes with v1.1.1.
- Cluster mode considerations — deferred to v1.3+.