MechaCat02 5302bd3192 docs(sdk): SDK-shape reference + blueprint updates for v1.1.x

Lands the developer-facing reference for the SDK shape every v1.1.x
service implements against, plus the blueprint changes the shape and
the recently-shipped Phase 3.5 imply:

  - New docs/sdk-shape.md — covers handle pattern, :: namespace,
    throw/() error convention, sync↔async bridge, cross-app isolation
    rule, ServiceEventEmitter, ExecutionGate + env var, stateless vs
    stateful module registration.
  - Blueprint §11.6 (Phase 3.5): Pending → ✓ Shipped, with a note that
    it landed ahead of the originally planned slot.
  - Blueprint §8.1 (KV Store): replace hstore schema + rationale with
    JSONB. PK becomes (app_id, collection, key); cross-app isolation
    is enforced at the index, not just the service layer. Note 64 KiB
    per-value cap enforced at the service layer (lands with the KV PR
    in v1.1.1).
  - Blueprint new §7.5 (SDK Architecture): brief overview pointing to
    docs/sdk-shape.md. Includes §7.5.1 sketch of the trigger
    architecture (outbox + depth limit + (service, event, filter) →
    script).
  - Blueprint §12 Phase 4: restructured to enumerate v1.1.0 through
    v1.1.8 with one focused capability per release. Current focus
    moves to Phase 4 (v1.1.0) now that Phase 3.5 is done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-30 18:57:44 +02:00


SDK shape (v1.1.x stateful services)

This document describes the architectural shape every v1.1.x SDK service follows. It is not a feature reference for any particular service — those live in their own docs as each PR lands (KV in v1.1.1, docs in v1.1.2, …). What follows is the contract those PRs implement against, so the surface stays consistent and the build doesn't drift.

The shape was laid down in v1.1.0 (the SDK foundation PR). If you find yourself re-litigating any of it inside a service PR, push back and update this doc explicitly first.

Two kinds of Rhai modules

Stateless utility modules (regex, time, json, base64, hex, url — landing as v1.1.0's stdlib PR) are registered once at engine build. They have no per-call state and no cross-app sensitivity. Implementation goes in executor-core::engine::build_engine next to the existing log:: registration. They use Rhai's register_static_module.

Stateful service modules (kv, docs, http, cron, files, pubsub, secrets, email, users, queue, invoke) are registered per call by executor-core::sdk::register_all. They need:

A service handle bundled in picloud_shared::Services (constructed once at startup, cloned cheaply per call).
A per-call SdkCallCx carrying the calling app, principal, execution ids, and trigger depth.
Closures that capture both, registered as Rhai native functions inside a per-call rhai::Module.

Mixing the two categories in one module is wrong — services that internally consult per-call context are stateful, period.

`::` namespace style

Every SDK module exposes itself under a :: namespace, mirroring the existing log:: module:

log::info("hello");                              // v1.0 — present
let value = kv::collection("widgets").get("k");  // v1.1.1
let resp  = http::get("https://example.com");    // v1.1.4

Dotted-object syntax (kv.get("widgets", "k")) is not used. Rationale: :: is consistent with Rust import syntax, doesn't require a wrapper "module object" in Rhai's scope, and keeps the module boundary obvious in scripts.

Handle pattern for collection-scoped services

Services that operate on collections expose a collection handle returned by a ::collection(name) constructor:

let widgets = kv::collection("widgets");
widgets.set("k", "v");
let v = widgets.get("k");

The flat form kv::set("widgets", "k", "v") is not offered. The handle is a Rhai custom type that the service registers, and method calls bind to that type. This design:

Removes the "did I get the collection-name argument right?" foot-gun.
Lets the implementation cache per-collection state on the handle (prepared statements, connection affinity) without leaking that into the call signature.
Pre-empts the "collection is implicit" failure mode where two services in the same script accidentally share a default collection.

(app_id, collection, key) is the identity tuple for KV; (app_id, collection, id) for docs. Collections are mandatory, not optional — even single-collection apps name their collection. The service layer rejects requests with empty collection names.
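
The handle rules above can be sketched in plain Rust. This is a minimal illustration, not the service's actual code: the KvCollection type, its open constructor, and the in-memory Store are hypothetical stand-ins, and a real handle would be registered as a Rhai custom type and backed by the database.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for the storage backend: rows keyed by the
/// (app_id, collection, key) identity tuple.
pub type Store = Arc<Mutex<HashMap<(String, String, String), String>>>;

/// Hypothetical collection handle. The app id is supplied by the host,
/// the collection name by the script's `kv::collection(name)` call.
#[derive(Clone)]
pub struct KvCollection {
    app_id: String,
    collection: String,
    store: Store,
}

impl KvCollection {
    /// Constructor behind `kv::collection(name)`; empty names are rejected.
    pub fn open(store: Store, app_id: &str, collection: &str) -> Result<Self, String> {
        if collection.is_empty() {
            return Err("collection name must not be empty".to_string());
        }
        Ok(Self {
            app_id: app_id.to_string(),
            collection: collection.to_string(),
            store,
        })
    }

    /// Builds the full identity tuple; callers only ever pass the key.
    fn row_key(&self, key: &str) -> (String, String, String) {
        (self.app_id.clone(), self.collection.clone(), key.to_string())
    }

    pub fn set(&self, key: &str, value: &str) {
        self.store.lock().unwrap().insert(self.row_key(key), value.to_string());
    }

    pub fn get(&self, key: &str) -> Option<String> {
        self.store.lock().unwrap().get(&self.row_key(key)).cloned()
    }

    pub fn has(&self, key: &str) -> bool {
        self.store.lock().unwrap().contains_key(&self.row_key(key))
    }
}
```

Because the collection name is fixed when the handle is created, set, get, and has cannot be called with a mismatched collection argument, and two handles over the same store stay separated by their identity tuples.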

Error convention

Throw on failure. widgets.set("k", "v") throws a Rhai runtime error on any operational problem (DB unavailable, payload too large, authz denied). Scripts opting into error handling use Rhai's try/catch.
() for absent. widgets.get("missing") returns () (Rhai unit). Scripts test absence with if v == () { ... } or use the matching has(k) predicate.
bool for predicates. widgets.has(k) is the cheap existence check that doesn't deserialize the value.

This convention is uniform across every v1.1.x service. Adding Result-flavoured variants is a design departure that requires a doc update before implementation.
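
One way to picture the convention is as a single mapping from a service result to what the script observes. The sketch below assumes a hypothetical ServiceError enum and uses a ScriptOutcome enum as a stand-in for Rhai values; in a real binding, Thrown would be a raised Rhai runtime error and Unit would be Rhai's ().

```rust
/// Hypothetical error type returned by a service trait method.
#[derive(Debug, Clone, PartialEq)]
pub enum ServiceError {
    Unavailable,
    PayloadTooLarge,
    Denied,
}

/// Stand-in for what the script sees after a call returns.
#[derive(Debug, Clone, PartialEq)]
pub enum ScriptOutcome {
    Value(String),
    Unit,
    Bool(bool),
    Thrown(String),
}

/// `get`: a present value passes through, absence becomes unit,
/// and any operational failure is thrown.
pub fn bind_get(result: Result<Option<String>, ServiceError>) -> ScriptOutcome {
    match result {
        Ok(Some(value)) => ScriptOutcome::Value(value),
        Ok(None) => ScriptOutcome::Unit,
        Err(err) => ScriptOutcome::Thrown(format!("kv get failed: {:?}", err)),
    }
}

/// `has`: predicates return a bool and still throw on operational failure.
pub fn bind_has(result: Result<bool, ServiceError>) -> ScriptOutcome {
    match result {
        Ok(found) => ScriptOutcome::Bool(found),
        Err(err) => ScriptOutcome::Thrown(format!("kv has failed: {:?}", err)),
    }
}
```

Note that absence is never an error in this mapping: Ok(None) and Err(..) take different paths, which is what lets scripts test for () without a try/catch.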

`SdkCallCx` and cross-app isolation

Every stateful service trait method takes &SdkCallCx as its first non-self argument. The cx carries:

pub struct SdkCallCx {
    pub app_id: AppId,
    pub principal: Option<Principal>,
    pub execution_id: ExecutionId,
    pub request_id: RequestId,
    pub trigger_depth: u32,
    pub root_execution_id: ExecutionId,
}

The service implementation MUST derive app_id from cx.app_id — never from a script-passed argument. Scripts cannot name another app's data, period. The closure registered into Rhai captures the Arc<SdkCallCx> for the call; the script never sees or passes app_id.

Why this matters: if kv::set accepted a script-supplied app_id, any code path that let that argument reach the storage query would be a tenant-isolation vulnerability. Because the service derives the app from the host-attached cx, a script can't trick it into touching another app's data.

principal is Option<Principal> because the data plane is unauthenticated by default — public HTTP scripts run with None. Services that need an authenticated identity (e.g., users::*) check cx.principal.is_some() and throw if missing.
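
The isolation rule can be illustrated with a small std-only sketch. The types here are simplified stand-ins (SdkCallCx is reduced to the two fields used, and MemoryKv, make_get, and require_principal are hypothetical names); the point is only that no script-facing signature carries an app_id.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Simplified stand-ins for the real types; only the fields used here.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct AppId(pub String);

#[derive(Clone, Debug)]
pub struct Principal(pub String);

pub struct SdkCallCx {
    pub app_id: AppId,
    pub principal: Option<Principal>,
}

/// Hypothetical in-memory service keyed by (app_id, collection, key).
#[derive(Default)]
pub struct MemoryKv {
    rows: Mutex<HashMap<(AppId, String, String), String>>,
}

impl MemoryKv {
    // No app_id parameter anywhere: the tenant comes from cx only.
    pub fn set(&self, cx: &SdkCallCx, collection: &str, key: &str, value: &str) {
        let row = (cx.app_id.clone(), collection.to_string(), key.to_string());
        self.rows.lock().unwrap().insert(row, value.to_string());
    }

    pub fn get(&self, cx: &SdkCallCx, collection: &str, key: &str) -> Option<String> {
        let row = (cx.app_id.clone(), collection.to_string(), key.to_string());
        self.rows.lock().unwrap().get(&row).cloned()
    }
}

/// Builds the closure a script-facing `get` would call. It captures the
/// service and the per-call cx, so the script passes collection and key only.
pub fn make_get(
    service: Arc<MemoryKv>,
    cx: Arc<SdkCallCx>,
) -> impl Fn(&str, &str) -> Option<String> {
    move |collection, key| service.get(&cx, collection, key)
}

/// Guard for services that need an authenticated caller.
pub fn require_principal(cx: &SdkCallCx) -> Result<&Principal, String> {
    cx.principal
        .as_ref()
        .ok_or_else(|| "authentication required".to_string())
}
```

Two closures built from different cx values read from the same store but can never see each other's rows, because the app id is baked in at registration time rather than supplied per call.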

Sync ↔ async bridge

Rhai is synchronous; service trait methods (KV writes, HTTP calls) are async. The bridge runs inside the spawn_blocking thread that already wraps Engine::execute (orchestrator-core's LocalExecutorClient):

// Inside a Rhai-registered closure.
let runtime = tokio::runtime::Handle::current();
let result = runtime.block_on(service.do_thing(&cx, args));

Handle::current() finds the same Tokio runtime that scheduled the spawn_blocking, so the block_on doesn't construct a fresh runtime. The thread is already off the async worker pool (that's what spawn_blocking does), so blocking inside it is safe.

This pattern goes in every stateful service's registered Rhai closure. The first service PR (KV, v1.1.1) lands a helper so subsequent services don't reinvent the boilerplate.

`ServiceEventEmitter`

Every stateful service that mutates data also emits events for the (future) triggers framework:

emitter.emit(&cx, ServiceEvent {
    source:      "kv",
    op:          "insert",
    collection:  Some("widgets".into()),
    key:         Some("k".into()),
    payload:     Some(new_value_json),
    old_payload: None,
}).await?;

v1.1.0 ships only NoopEventEmitter. The v1.1.1 triggers PR replaces that with an outbox-backed implementation: events land in a Postgres outbox table; a dispatcher worker reads them out-of-band, matches against registered triggers, and fans out script executions. The dispatcher enforces a depth limit via cx.trigger_depth so a trigger-fires-its-own-trigger chain can't run away.

Services hold Arc<dyn ServiceEventEmitter> and emit unconditionally; the noop drops events, the real impl persists them. From the service's perspective the emission is fire-and-forget.
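
The emitter swap and the depth limit can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: the real emit call is awaited and takes the cx, the event here carries only a subset of the fields, MemoryOutbox stands in for the Postgres outbox table, and MAX_TRIGGER_DEPTH with its value of 4 is a hypothetical limit that the text does not specify.

```rust
use std::sync::{Arc, Mutex};

/// Reduced event shape; the payload fields are omitted in this sketch.
#[derive(Clone, Debug, PartialEq)]
pub struct ServiceEvent {
    pub source: &'static str,
    pub op: &'static str,
    pub collection: Option<String>,
    pub key: Option<String>,
}

/// Synchronous simplification of the emitter trait.
pub trait ServiceEventEmitter: Send + Sync {
    fn emit(&self, trigger_depth: u32, event: ServiceEvent) -> Result<(), String>;
}

/// Drops every event.
pub struct NoopEventEmitter;

impl ServiceEventEmitter for NoopEventEmitter {
    fn emit(&self, _trigger_depth: u32, _event: ServiceEvent) -> Result<(), String> {
        Ok(())
    }
}

/// In-memory stand-in for an outbox table: persists events with their depth.
#[derive(Default)]
pub struct MemoryOutbox {
    rows: Mutex<Vec<(u32, ServiceEvent)>>,
}

impl MemoryOutbox {
    pub fn row_count(&self) -> usize {
        self.rows.lock().unwrap().len()
    }
}

impl ServiceEventEmitter for MemoryOutbox {
    fn emit(&self, trigger_depth: u32, event: ServiceEvent) -> Result<(), String> {
        self.rows.lock().unwrap().push((trigger_depth, event));
        Ok(())
    }
}

/// Hypothetical limit; the actual value is not given in this document.
pub const MAX_TRIGGER_DEPTH: u32 = 4;

/// Dispatcher-side check: the depth for a triggered execution,
/// or None when the chain has to stop.
pub fn next_trigger_depth(current: u32) -> Option<u32> {
    let next = current.checked_add(1)?;
    (next <= MAX_TRIGGER_DEPTH).then_some(next)
}

/// Service-side code emits unconditionally, whichever emitter is installed.
pub fn emit_kv_insert(
    emitter: &Arc<dyn ServiceEventEmitter>,
    trigger_depth: u32,
    key: &str,
) -> Result<(), String> {
    emitter.emit(
        trigger_depth,
        ServiceEvent {
            source: "kv",
            op: "insert",
            collection: Some("widgets".into()),
            key: Some(key.into()),
        },
    )
}
```

The service code is identical under both emitters; only the value behind the Arc changes, which is why services can be written against the noop before the outbox-backed implementation exists.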

`ExecutionGate` and `PICLOUD_MAX_CONCURRENT_EXECUTIONS`

A single global semaphore caps concurrent script executions. Default is 32; override via the PICLOUD_MAX_CONCURRENT_EXECUTIONS env var. Acquisition is non-blocking, no queue — if a permit isn't free, the request is refused immediately with HTTP 503 and a Retry-After: 1 header.

Rationale: Rhai execution runs under spawn_blocking, which uses a finite pool of blocking threads (defaults to 512 in current Tokio). Without a cap, a script storm parks every blocking thread and starves every other workload (DB writes, log sinks, audit emission). Hard pushback is preferable to silent degradation.

Per-app or per-script caps are deferred until a real workload demands them. The gate lives in orchestrator-core::gate::ExecutionGate and is constructed once in the picloud binary's build_app.
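
The refuse-immediately behaviour can be sketched with an atomic counter. This is a std-only approximation, not the code in orchestrator-core::gate: the Permit type, limit_from_env, and refusal are hypothetical names, and falling back to the default on a missing or unparsable value is an assumption.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

pub const DEFAULT_MAX_CONCURRENT: usize = 32;

/// Parses the raw value of PICLOUD_MAX_CONCURRENT_EXECUTIONS.
/// Assumption: missing, unparsable, or zero values fall back to the default.
pub fn limit_from_env(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|n| *n > 0)
        .unwrap_or(DEFAULT_MAX_CONCURRENT)
}

pub struct ExecutionGate {
    limit: usize,
    in_flight: AtomicUsize,
}

/// RAII permit: dropping it releases the slot.
pub struct Permit {
    gate: Arc<ExecutionGate>,
}

impl ExecutionGate {
    pub fn new(limit: usize) -> Arc<Self> {
        Arc::new(Self { limit, in_flight: AtomicUsize::new(0) })
    }

    /// Non-blocking acquire with no queue: returns None when the gate is full.
    pub fn try_acquire(self: &Arc<Self>) -> Option<Permit> {
        let mut current = self.in_flight.load(Ordering::Acquire);
        loop {
            if current >= self.limit {
                return None;
            }
            match self.in_flight.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(Permit { gate: Arc::clone(self) }),
                // Another thread changed the count (or the CAS failed
                // spuriously); retry against the observed value.
                Err(observed) => current = observed,
            }
        }
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.gate.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

/// Shape of the refusal described above: status 503 plus Retry-After: 1.
pub fn refusal() -> (u16, (&'static str, &'static str)) {
    (503, ("Retry-After", "1"))
}
```

At startup the limit would be read with something like limit_from_env(std::env::var("PICLOUD_MAX_CONCURRENT_EXECUTIONS").ok().as_deref()). A request handler holds the Permit for the duration of the script run and returns refusal() when try_acquire yields None.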

Registration: where future services hook in

// orchestrator-core / executor-core internal call path —
// you do not implement this; you implement registration helpers
// that future PRs call from here.
pub fn register_all(engine: &mut RhaiEngine, services: &Services, cx: Arc<SdkCallCx>) {
    // v1.1.1: register_kv(engine, services, cx.clone());
    // v1.1.2: register_docs(engine, services, cx.clone());
    // …
}

Each service PR adds:

A Service trait + impl in manager-core (since that's where the DB-backed implementations live).
A field on picloud_shared::Services (pub kv: Arc<dyn KvService>).
A register_kv helper inside executor-core::sdk::kv that takes the engine, the service, and the cx, then registers the Rhai ::collection(...) constructor and method bindings.
A new Capability variant in manager-core::authz (e.g. AppKvRead(AppId)) and a check inside the service impl.

That sequence is the entire mechanical pattern; nothing here should require architecture-level discussion past v1.1.0.

What this doc does NOT cover

Service-specific schemas (KV table layout, docs query DSL, etc.) — in each service PR.
Authentication and the admin auth model — see blueprint §11.5, §11.6 and Phase 3.5.
The trigger dispatch design (outbox row layout, fan-out semantics, trigger CRUD endpoints) — comes with v1.1.1.
Cluster mode considerations — deferred to v1.3+.
