docs(sdk): SDK-shape reference + blueprint updates for v1.1.x

Lands the developer-facing reference for the SDK shape every v1.1.x
service implements against, plus the blueprint changes the shape and
the recently-shipped Phase 3.5 imply:

  - New docs/sdk-shape.md — covers handle pattern, :: namespace,
    throw/() error convention, sync↔async bridge, cross-app isolation
    rule, ServiceEventEmitter, ExecutionGate + env var, stateless vs
    stateful module registration.
  - Blueprint §11.6 (Phase 3.5): Pending → ✓ Shipped, with a note that
    it landed ahead of the originally planned slot.
  - Blueprint §8.1 (KV Store): replace hstore schema + rationale with
    JSONB. PK becomes (app_id, collection, key); cross-app isolation
    is enforced at the index, not just the service layer. Note 64 KiB
    per-value cap enforced at the service layer (lands with the KV PR
    in v1.1.1).
  - Blueprint new §7.5 (SDK Architecture): brief overview pointing to
    docs/sdk-shape.md. Includes §7.5.1 sketch of the trigger
    architecture (outbox + depth limit + (service, event, filter) →
    script).
  - Blueprint §12 Phase 4: restructured to enumerate v1.1.0 through
    v1.1.8 with one focused capability per release. Current focus
    moves to Phase 4 (v1.1.0) now that Phase 3.5 is done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-30 18:57:44 +02:00
parent 902dd78027
commit 5302bd3192
2 changed files with 294 additions and 52 deletions

docs/sdk-shape.md Normal file

@@ -0,0 +1,227 @@
# SDK shape (v1.1.x stateful services)
This document describes the architectural shape every v1.1.x SDK
service follows. It is **not** a feature reference for any particular
service — those live in their own docs as each PR lands (KV in v1.1.1,
docs in v1.1.2, …). What follows is the contract those PRs implement
against, so the surface stays consistent and the build doesn't drift.
The shape was laid down in v1.1.0 (the SDK foundation PR). If you find
yourself re-litigating any of it inside a service PR, push back and
update this doc explicitly first.
## Two kinds of Rhai modules
**Stateless utility modules** (regex, time, json, base64, hex, url —
landing as v1.1.0's stdlib PR) are registered once at engine build.
They have no per-call state and no cross-app sensitivity. Implementation
goes in `executor-core::engine::build_engine` next to the existing
`log::` registration. They use Rhai's `register_static_module`.
**Stateful service modules** (kv, docs, http, cron, files, pubsub,
secrets, email, users, queue, invoke) are registered **per call** by
`executor-core::sdk::register_all`. They need:
- A service handle bundled in `picloud_shared::Services` (constructed
once at startup, cloned cheaply per call).
- A per-call `SdkCallCx` carrying the calling app, principal,
execution ids, and trigger depth.
- Closures that capture both, registered as Rhai native functions
inside a per-call `rhai::Module`.
Mixing the two categories in one module is wrong — services that
internally consult per-call context are stateful, period.
## `::` namespace style
Every SDK module exposes itself under a `::` namespace, mirroring the
existing `log::`:
```rhai
log::info("hello"); // v1.0 — present
let value = kv::collection("widgets").get("k"); // v1.1.1
let resp = http::get("https://example.com"); // v1.1.4
```
Dotted-object syntax (`kv.get("widgets", "k")`) is **not** used.
Rationale: `::` is consistent with Rust import syntax, doesn't
require a wrapper "module object" in Rhai's scope, and keeps the
module boundary obvious in scripts.
## Handle pattern for collection-scoped services
Services that operate on collections expose a **collection handle**
returned by a `::collection(name)` constructor:
```rhai
let widgets = kv::collection("widgets");
widgets.set("k", "v");
let v = widgets.get("k");
```
Not `kv::set("widgets", "k", "v")`. The handle is a Rhai custom type
the service registers; method calls bind to that type. This:
- Removes the "did I get the collection-name argument right?" foot-gun.
- Lets the implementation cache per-collection state on the handle
(prepared statements, connection affinity) without leaking that
into the call signature.
- Pre-empts the "collection is implicit" failure mode where two
services in the same script accidentally share a default collection.
`(app_id, collection, key)` is the identity tuple for KV; `(app_id,
collection, id)` for docs. Collections are **mandatory**, not optional
— even single-collection apps name their collection. The service layer
rejects requests with empty collection names.
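The handle idea can be sketched in plain Rust without Rhai or a database. This is only an illustration under stated assumptions: `MemoryKv`, `KvCollection`, and `KvCollection::open` are hypothetical names, an in-memory map stands in for the real store, and `String` stands in for `AppId`. It shows the collection name being bound once on the handle, the `(app_id, collection, key)` identity tuple, and the rejection of empty collection names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Identity tuple for KV rows: (app_id, collection, key).
pub type KvKey = (String, String, String);

/// Hypothetical in-memory stand-in for the DB-backed KV service.
#[derive(Default)]
pub struct MemoryKv {
    rows: Mutex<HashMap<KvKey, String>>,
}

/// Hypothetical collection handle: the collection name is bound once,
/// so per-key calls cannot pass it incorrectly.
pub struct KvCollection {
    store: Arc<MemoryKv>,
    app_id: String, // supplied by the host, never by the script
    collection: String,
}

impl KvCollection {
    /// Plays the role of `kv::collection(name)`; empty names are rejected.
    pub fn open(store: Arc<MemoryKv>, app_id: &str, name: &str) -> Result<Self, String> {
        if name.is_empty() {
            return Err("collection name must not be empty".to_string());
        }
        Ok(Self {
            store,
            app_id: app_id.to_string(),
            collection: name.to_string(),
        })
    }

    /// Builds the full identity tuple for one key in this collection.
    fn row_key(&self, key: &str) -> KvKey {
        (self.app_id.clone(), self.collection.clone(), key.to_string())
    }

    pub fn set(&self, key: &str, value: &str) {
        self.store
            .rows
            .lock()
            .unwrap()
            .insert(self.row_key(key), value.to_string());
    }

    pub fn get(&self, key: &str) -> Option<String> {
        self.store.rows.lock().unwrap().get(&self.row_key(key)).cloned()
    }
}
```

Two handles opened with different collection names, or for different apps, never see each other's rows, because both values are part of every key the handle builds.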
## Error convention
- **Throw on failure.** `widgets.set("k", "v")` throws a Rhai runtime
error on any operational problem (DB unavailable, payload too large,
authz denied). Scripts opting into error handling use Rhai's
`try/catch`.
- **`()` for absent.** `widgets.get("missing")` returns `()` (Rhai
unit). Scripts test absence with `if v == () { ... }` or use the
matching `has(k)` predicate.
- **`bool` for predicates.** `widgets.has(k)` is the cheap existence
check that doesn't deserialize the value.
This convention is uniform across every v1.1.x service. Adding
`Result`-flavoured variants is a design departure that requires a doc
update before implementation.
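One way to picture the convention on the host side is as a mapping from a service result to what the script observes. The sketch below is an assumption-laden illustration, not the real binding code: `ServiceError`, `ScriptOutcome`, `get_outcome`, and `has_outcome` are hypothetical names, and values are simplified to `String`.

```rust
/// Hypothetical operational failures a service call can hit.
#[derive(Debug, PartialEq)]
pub enum ServiceError {
    Unavailable,
    PayloadTooLarge,
    Denied,
}

/// What the script observes: a value, Rhai unit `()`, or a thrown error.
#[derive(Debug, PartialEq)]
pub enum ScriptOutcome {
    Value(String),
    Unit,
    Thrown(String),
}

/// `get`: an absent key becomes `()`, an operational failure is thrown.
pub fn get_outcome(result: Result<Option<String>, ServiceError>) -> ScriptOutcome {
    match result {
        Ok(Some(v)) => ScriptOutcome::Value(v),
        Ok(None) => ScriptOutcome::Unit,
        Err(e) => ScriptOutcome::Thrown(format!("kv error: {e:?}")),
    }
}

/// `has`: a plain bool on success. This sketch assumes a failure is
/// still thrown rather than being reported as `false`.
pub fn has_outcome(result: Result<bool, ServiceError>) -> Result<bool, String> {
    result.map_err(|e| format!("kv error: {e:?}"))
}
```

Keeping "absent" and "failed" as separate outcomes is what lets a script write `if v == () { ... }` without accidentally swallowing a database outage.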
## `SdkCallCx` and cross-app isolation
Every stateful service trait method takes `&SdkCallCx` as its first
non-self argument. The cx carries:
```rust
pub struct SdkCallCx {
pub app_id: AppId,
pub principal: Option<Principal>,
pub execution_id: ExecutionId,
pub request_id: RequestId,
pub trigger_depth: u32,
pub root_execution_id: ExecutionId,
}
```
**The service implementation MUST derive `app_id` from `cx.app_id`,
never from a script-passed argument.** Scripts cannot name another
app's data, period. The closure registered into Rhai captures the
`Arc<SdkCallCx>` for the call; the script never sees or passes
`app_id`.
Why this matters: if a call such as `kv::set("widgets", "k", v)`
accepted a script-supplied `app_id`, it would become a tenant-isolation
vulnerability the moment that argument reached the storage query.
Because `app_id` comes only from the host-attached cx, a script has no
way to influence it.
`principal` is `Option<Principal>` because the data plane is
unauthenticated by default — public HTTP scripts run with `None`.
Services that need an authenticated identity (e.g., `users::*`) check
`cx.principal.is_some()` and throw if missing.
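The capture rule can be illustrated with an ordinary Rust closure. The following is a minimal sketch under assumptions: the `SdkCallCx` here is a trimmed-down stand-in with plain strings instead of `AppId` and `Principal`, and `Store`, `make_set`, and `require_principal` are hypothetical names. The closure signature has no `app_id` parameter at all, so a script has nothing to pass.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Trimmed-down stand-in for the document's `SdkCallCx`
/// (plain strings replace `AppId` and `Principal`).
pub struct SdkCallCx {
    pub app_id: String,
    pub principal: Option<String>,
}

/// Hypothetical in-memory store keyed by (app_id, collection, key).
pub type Store = Arc<Mutex<HashMap<(String, String, String), String>>>;

/// Builds the per-call `set` closure. The script supplies only
/// collection, key, and value; `app_id` comes from the captured cx.
pub fn make_set(store: Store, cx: Arc<SdkCallCx>) -> impl Fn(&str, &str, &str) {
    move |collection: &str, key: &str, value: &str| {
        let row = (cx.app_id.clone(), collection.to_string(), key.to_string());
        store.lock().unwrap().insert(row, value.to_string());
    }
}

/// Guard for services that need an authenticated caller (e.g. `users::*`).
pub fn require_principal(cx: &SdkCallCx) -> Result<&str, String> {
    cx.principal
        .as_deref()
        .ok_or_else(|| "authentication required".to_string())
}
```

A public HTTP script running with `principal: None` can still call the closure returned by `make_set`, while `require_principal` fails for it, matching the unauthenticated-by-default data plane.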
## Sync ↔ async bridge
Rhai is synchronous; service trait methods (KV writes, HTTP calls) are
async. The bridge runs *inside the `spawn_blocking` thread* that
already wraps `Engine::execute` (orchestrator-core's
`LocalExecutorClient`):
```rust
// Inside a Rhai-registered closure.
let runtime = tokio::runtime::Handle::current();
let result = runtime.block_on(service.do_thing(&cx, args));
```
`Handle::current()` finds the same Tokio runtime that scheduled the
`spawn_blocking`, so the `block_on` doesn't construct a fresh runtime.
The thread is already off the async worker pool (that's what
`spawn_blocking` does), so blocking inside it is safe.
This pattern goes in every stateful service's registered Rhai closure.
The first service PR (KV, v1.1.1) lands a helper so subsequent services
don't reinvent the boilerplate.
## `ServiceEventEmitter`
Every stateful service that mutates data also emits events for the
(future) triggers framework:
```rust
emitter.emit(&cx, ServiceEvent {
source: "kv",
op: "insert",
collection: Some("widgets".into()),
key: Some("k".into()),
payload: Some(new_value_json),
old_payload: None,
}).await?;
```
v1.1.0 ships only `NoopEventEmitter`. The v1.1.1 triggers PR replaces
that with an outbox-backed implementation: events land in a Postgres
outbox table; a dispatcher worker reads them out-of-band, matches
against registered triggers, and fans out script executions. The
dispatcher enforces a depth limit via `cx.trigger_depth` so a
trigger-fires-its-own-trigger chain can't run away.
Services hold `Arc<dyn ServiceEventEmitter>` and emit unconditionally;
the noop drops events, the real impl persists them. From the service's
perspective the emission is fire-and-forget: it awaits only the emit
call itself, never the dispatch that follows out-of-band.
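The swap between the noop and a persisting emitter can be sketched with a trait object. The sketch makes several simplifications that should be read as assumptions: the trait is synchronous here (the real one is async), `ServiceEvent` is reduced to four fields, the depth is passed directly instead of via `&cx`, and `MemoryOutbox`, `should_dispatch`, and `kv_insert` are hypothetical names. The strict `<` comparison in the depth check is also a guess at the limit's semantics.

```rust
use std::sync::{Arc, Mutex};

/// Reduced version of the document's `ServiceEvent`.
#[derive(Clone, Debug, PartialEq)]
pub struct ServiceEvent {
    pub source: &'static str,
    pub op: &'static str,
    pub collection: Option<String>,
    pub key: Option<String>,
}

/// Synchronous stand-in for the async `ServiceEventEmitter` trait.
pub trait ServiceEventEmitter: Send + Sync {
    fn emit(&self, trigger_depth: u32, event: ServiceEvent) -> Result<(), String>;
}

/// Drops every event, as the v1.1.0 emitter does.
pub struct NoopEventEmitter;

impl ServiceEventEmitter for NoopEventEmitter {
    fn emit(&self, _trigger_depth: u32, _event: ServiceEvent) -> Result<(), String> {
        Ok(())
    }
}

/// Hypothetical in-memory outbox; a real one would insert a Postgres row.
#[derive(Default)]
pub struct MemoryOutbox {
    pub rows: Mutex<Vec<(u32, ServiceEvent)>>,
}

impl ServiceEventEmitter for MemoryOutbox {
    fn emit(&self, trigger_depth: u32, event: ServiceEvent) -> Result<(), String> {
        self.rows.lock().unwrap().push((trigger_depth, event));
        Ok(())
    }
}

/// Hypothetical dispatcher check: events at or past the limit do not fan out.
pub fn should_dispatch(trigger_depth: u32, max_depth: u32) -> bool {
    trigger_depth < max_depth
}

/// A service write that emits through whichever emitter it holds.
pub fn kv_insert(
    emitter: &Arc<dyn ServiceEventEmitter>,
    depth: u32,
    key: &str,
) -> Result<(), String> {
    emitter.emit(
        depth,
        ServiceEvent {
            source: "kv",
            op: "insert",
            collection: Some("widgets".into()),
            key: Some(key.into()),
        },
    )
}
```

`kv_insert` is identical for both emitters, which is the point of holding the trait object: moving from the noop to an outbox-backed emitter requires no change in service code.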
## `ExecutionGate` and `PICLOUD_MAX_CONCURRENT_EXECUTIONS`
A single global semaphore caps concurrent script executions. Default
is 32; override via the `PICLOUD_MAX_CONCURRENT_EXECUTIONS` env var.
Acquisition is **non-blocking, no queue** — if a permit isn't free,
the request is refused immediately with HTTP 503 and a `Retry-After:
1` header.
Rationale: Rhai execution runs under `spawn_blocking`, which uses a
finite pool of blocking threads (defaults to 512 in current Tokio).
Without a cap, a script storm parks every blocking thread and starves
every other workload (DB writes, log sinks, audit emission). Hard
pushback is preferable to silent degradation.
Per-app or per-script caps are deferred until a real workload demands
them. The gate lives in `orchestrator-core::gate::ExecutionGate` and
is constructed once in the picloud binary's `build_app`.
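The non-blocking, no-queue behaviour can be sketched with the standard library alone. This is an illustration rather than the contents of `orchestrator-core::gate`: an atomic counter stands in for the semaphore, and `parse_limit`, `Permit`, `try_acquire`, and `refusal` are hypothetical names. Falling back to the default on an unparsable or zero value is also an assumption of this sketch.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Default permit count from the text above.
pub const DEFAULT_MAX_CONCURRENT: usize = 32;

/// Parses the raw `PICLOUD_MAX_CONCURRENT_EXECUTIONS` value; falls back to
/// the default when unset, unparsable, or zero (assumed behaviour).
pub fn parse_limit(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|n| *n > 0)
        .unwrap_or(DEFAULT_MAX_CONCURRENT)
}

/// Hypothetical std-only gate: a counter standing in for the semaphore.
pub struct ExecutionGate {
    limit: usize,
    in_flight: AtomicUsize,
}

/// Releases its slot when dropped.
pub struct Permit {
    gate: Arc<ExecutionGate>,
}

impl ExecutionGate {
    pub fn new(limit: usize) -> Arc<Self> {
        Arc::new(Self { limit, in_flight: AtomicUsize::new(0) })
    }

    /// Never waits: returns `None` at once when all permits are taken.
    pub fn try_acquire(self: &Arc<Self>) -> Option<Permit> {
        let limit = self.limit;
        self.in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n < limit { Some(n + 1) } else { None }
            })
            .ok()
            .map(|_| Permit { gate: Arc::clone(self) })
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.gate.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

/// Status code and header for a refused acquisition, as described above.
pub fn refusal() -> (u16, (&'static str, &'static str)) {
    (503, ("Retry-After", "1"))
}
```

A caller would read the limit with `parse_limit(std::env::var("PICLOUD_MAX_CONCURRENT_EXECUTIONS").ok().as_deref())`, hold the `Permit` for the duration of the script run, and answer with `refusal()` when `try_acquire` returns `None`.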
## Registration: where future services hook in
```rust
// orchestrator-core / executor-core internal call path —
// you do not implement this; you implement registration helpers
// that future PRs call from here.
pub fn register_all(engine: &mut RhaiEngine, services: &Services, cx: Arc<SdkCallCx>) {
// v1.1.1: register_kv(engine, services, cx.clone());
// v1.1.2: register_docs(engine, services, cx.clone());
// …
}
```
Each service PR adds:
1. A `Service` trait + impl in `manager-core` (since that's where the
DB-backed implementations live).
2. A field on `picloud_shared::Services` (`pub kv: Arc<dyn KvService>`).
3. A `register_kv` helper inside `executor-core::sdk::kv` that takes
the engine, the service, and the cx, then registers the Rhai
`::collection(...)` constructor and method bindings.
4. A new `Capability` variant in `manager-core::authz` (e.g.
`AppKvRead(AppId)`) and a check inside the service impl.
That sequence is the entire mechanical pattern; nothing here should
require architecture-level discussion past v1.1.0.
## What this doc does NOT cover
- Service-specific schemas (KV table layout, docs query DSL, etc.) —
in each service PR.
- Authentication and the admin auth model — see blueprint §11.5 and
§11.6 (Phase 3.5).
- The trigger dispatch design (outbox row layout, fan-out semantics,
trigger CRUD endpoints) — comes with v1.1.1.
- Cluster mode considerations — deferred to v1.3+.