docs: design Phase 3 admin auth and multi-app scoping

Adds blueprint sections 11.4 (admin auth) and 11.5 (app scoping) and restructures the section 12 roadmap to put both ahead of v1.1, since retrofitting app_id into KV/docs/users schemas after they ship is far more expensive than adding it now.

Admin auth: per-user admin_users (not a shared secret), Argon2id, env-var bootstrap that becomes inert after the first admin exists, session token doubling as bearer token, 24h sliding TTL. Schema designed forward-compatible with later RBAC.

App scoping: apps own scripts/routes/domains. Domain claims at app level (exact / wildcard / {param} parameterized) with collision check at claim time, so route-conflict errors stay strictly intra-app. Two-phase orchestrator dispatch (Host → app → route trie). Slug rename keeps the old slug as a permanent redirect until another app claims it. Fresh-install migration seeds a Hello World app; upgrades go into a default app instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 22:58:37 +02:00
parent 56de652f7a
commit 646bd55174
2 changed files with 326 additions and 42 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -8,6 +8,8 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 Authoritative design: [serverless_cloud_blueprint.md](serverless_cloud_blueprint.md). The blueprint is a living document — when architecture decisions are made in conversation that contradict it, treat the latest decision as truth and update the blueprint.

+**Current focus (Phase 3, pre-v1.1):** admin auth gate, then multi-app scoping. The latter introduces `apps` as the top-level isolation boundary for scripts, routes, domains, and (later) data. See blueprint §11.5 for the design. Every v1.1+ feature must assume `app_id` exists as a scoping dimension.
+
 ## Three-Service Architecture

 The platform splits into three logical services, each backed by a `*-core` library crate so the same logic runs in single-process MVP mode and split-process cluster mode:
@@ -26,7 +28,7 @@ In MVP, all three run in one process (`picloud` binary). In cluster mode, each r

 Versioned API surfaces live under `/api/v{N}/...`. See [docs/versioning.md](docs/versioning.md) for the full scheme.

- `/api/v1/admin/*` — manager (control plane: script CRUD, routes CRUD + check + match, logs, config)
+- `/api/v1/admin/*` — manager (control plane: script CRUD, routes CRUD + check + match, logs, config; apps CRUD once Phase 3b lands)
 - `/api/v1/execute/{id}` — orchestrator (data plane: invoke a script by ID, always-available bypass)
 - `/admin/*` — dashboard SPA (SvelteKit, `paths.base = '/admin'`)
 - `/healthz` — liveness (string `"ok"`)
@@ -37,6 +39,10 @@ Reserved path prefixes (rejected at route creation): `/api/`, `/admin/`, `/healt

 Caddy fronts everything. Same Caddyfile shape works for single-node and cluster — only upstream targets change.

+**Param syntax convention:** route paths use `:name` (e.g., `/users/:id`); domains (once apps land) use `{name}` (e.g., `{tenant}.example.com`). These are deliberately distinct — never use `:` in a domain context or `{}` in a route-path context.
+
+**Two-phase dispatch (Phase 3b onward):** the orchestrator first resolves `Host` → app (most-specific domain claim wins), then runs that app's route trie. The route matcher itself is unchanged and never sees other apps' routes.
+
 ## Tech Stack

 - **Rust 1.92+** workspace, pinned via `rust-toolchain.toml`
@@ -102,4 +108,6 @@ docs/

 ## Out of MVP

-Queue triggers, cron triggers, SMTP ingress, KV / docs / email / users / HTTP SDKs in scripts, interceptors, workflows, function-to-function `invoke()`, auth, multi-tenancy, secrets, metrics dashboard. All deferred to v1.1+ per the blueprint. Don't pre-build for them — but don't make decisions that close the door on them either.
+Queue triggers, cron triggers, SMTP ingress, KV / docs / email / users / HTTP SDKs in scripts, interceptors, workflows, function-to-function `invoke()`, secrets, metrics dashboard. All deferred to v1.1+ per the blueprint. Don't pre-build for them — but don't make decisions that close the door on them either.
+
+**Pulled forward to Phase 3 (pre-v1.1):** admin auth, multi-app scoping. Cross-app data sharing (export/import) stays at v1.3+; the initial cut enforces strict isolation. See blueprint §11.5.
--- a/serverless_cloud_blueprint.md
+++ b/serverless_cloud_blueprint.md
@@ -732,68 +732,344 @@ volumes:

 ---

+## 11.4 Admin Auth (Phase 3a)
+
+**Purpose**: gate the admin API (`/api/v1/admin/*`) and dashboard (`/admin/*`) behind per-user authentication. Today the surface is open — anyone reaching the bound port can create, edit, and delete scripts.
+
+**Why per-user, not a shared secret**: shared admin passwords get shared between humans, leave no audit trail, and can't be revoked per-person. Per-user accounts solve all three. The initial cut deliberately stops there — no roles, no per-app permissions — because that scope is small enough to ship in a single phase without blocking Phase 3b. Roles + per-app permissions are queued for v1.3+.
+
+### Naming: `admin_users` vs `users`
+
+We reserve the unqualified **`users`** table for the v1.1+ Rhai SDK feature (script-level end users — see §8.4). Platform-operator accounts live in **`admin_users`**. They are different concepts and never share rows, even when a PiCloud install hosts apps that themselves run user management.
+
+### Schema
+
+```sql
+CREATE TABLE admin_users (
+  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  username TEXT NOT NULL UNIQUE,
+  password_hash TEXT NOT NULL,       -- Argon2id
+  created_at TIMESTAMP DEFAULT NOW(),
+  updated_at TIMESTAMP DEFAULT NOW(),
+  last_login_at TIMESTAMP
+);
+
+CREATE TABLE admin_sessions (
+  token_hash TEXT PRIMARY KEY,       -- SHA-256 of the bearer token; raw token only exists in the login response + cookie
+  user_id UUID NOT NULL REFERENCES admin_users(id) ON DELETE CASCADE,
+  created_at TIMESTAMP DEFAULT NOW(),
+  expires_at TIMESTAMP NOT NULL,
+  last_used_at TIMESTAMP DEFAULT NOW()
+);
+
+CREATE INDEX idx_admin_sessions_user ON admin_sessions(user_id);
+CREATE INDEX idx_admin_sessions_expiry ON admin_sessions(expires_at);
+```
+
+**Password hashing**: Argon2id with OWASP-recommended parameters. This also resolves the v1.1+ open question about user-password hashing (§10): the platform settles on Argon2id once, here.
+
+### Bootstrap
+
+On startup, if `admin_users` is empty, the manager reads `PICLOUD_ADMIN_USERNAME` plus a password from env (or a config file) and inserts the row. Two password env vars are accepted, in this precedence:
+
+1. **`PICLOUD_ADMIN_PASSWORD_HASH`** (recommended) — pre-computed Argon2id PHC-format hash. The platform validates the string parses, then inserts it as-is. This avoids the raw password ever being written into env/compose files or process listings.
+2. **`PICLOUD_ADMIN_PASSWORD`** (fallback) — raw password. The platform hashes it with Argon2id defaults and discards the raw value. Simpler for first-time setup; less ideal for committed configs.
+
+If both are set, the hash wins and the raw value is ignored (with a warning logged). If neither is set on a fresh install, startup fails with a clear error pointing at the env vars.
+
+**Once that bootstrap row exists, the env vars become inert** — restarting with different values does not change the password. This is deliberate: the env var is a one-time setup hatch, not a recovery backdoor (a backdoor would let anyone with systemd-unit or compose-file access override any admin's password).
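The bootstrap precedence described above can be sketched as a pure decision function. This is only an illustration: the `Bootstrap` enum and `decide_bootstrap` are hypothetical names, the username check is omitted, and the actual Argon2id hashing and PHC validation are left to the caller.

```rust
/// What the manager should do with the bootstrap env vars at startup.
#[derive(Debug, PartialEq)]
enum Bootstrap {
    /// `admin_users` already has a row, so the env vars are inert.
    Inert,
    /// Insert the pre-computed hash as-is (after validating that it parses).
    UseHash { hash: String, raw_ignored: bool },
    /// Hash the raw password with Argon2id, then discard the raw value.
    HashRaw { raw: String },
}

/// `admin_count` is the current row count of `admin_users`; `hash` and `raw`
/// mirror PICLOUD_ADMIN_PASSWORD_HASH and PICLOUD_ADMIN_PASSWORD.
fn decide_bootstrap(
    admin_count: u64,
    hash: Option<&str>,
    raw: Option<&str>,
) -> Result<Bootstrap, String> {
    if admin_count > 0 {
        return Ok(Bootstrap::Inert);
    }
    match (hash, raw) {
        // The hash wins; the caller logs a warning when `raw_ignored` is true.
        (Some(h), r) => Ok(Bootstrap::UseHash {
            hash: h.to_string(),
            raw_ignored: r.is_some(),
        }),
        (None, Some(r)) => Ok(Bootstrap::HashRaw { raw: r.to_string() }),
        (None, None) => Err(
            "no admin exists: set PICLOUD_ADMIN_PASSWORD_HASH or PICLOUD_ADMIN_PASSWORD"
                .to_string(),
        ),
    }
}
```

Keeping the decision separate from the database write makes the "inert after first admin" rule easy to unit-test without a running Postgres.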
+
+Recovery is a separate manual flow:
+```sh
+picloud admin reset-password <username>
+```
+This requires shell access on the host (and therefore implies the operator already controls the box).
+
+### Login & Session
+
+```
+POST /api/v1/admin/auth/login
+{ "username": "...", "password": "..." }
+
+→ 200 OK
+Set-Cookie: picloud_session=<token>; HttpOnly; Secure; SameSite=Lax; Path=/
+{ "user": { "id": "...", "username": "..." }, "token": "<token>", "expires_at": "..." }
+```
+
+Token format: opaque random string (32 random bytes, base64-encoded). Stored hashed (SHA-256, per the `token_hash` column); the raw value lives only in the login response and the session cookie. The same token works as a bearer credential for non-browser clients:
+
+```
+Authorization: Bearer <token>
+```
+
+One token system serves both dashboard and CLI/CI clients — no separate "API token" concept. Personal long-lived API tokens can be added later as a distinct `admin_api_tokens` table if demand appears.
+
+**Session TTL** is a **24-hour sliding window**: each authenticated request bumps `expires_at` to `now + ttl` and `last_used_at` to `now`. The TTL itself is configurable per deploy via `PICLOUD_SESSION_TTL_HOURS` (default `24`). A separate background sweep deletes rows where `expires_at < now()`; until that sweep runs, expired rows are also rejected at auth-check time (so a stuck sweep can't extend session lifetime past expiry).
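A minimal sketch of the sliding-window check, assuming an in-memory stand-in for one `admin_sessions` row with times as Unix seconds. `Session` and `check_and_touch` are hypothetical names; the expiry test mirrors the sweep predicate `expires_at < now()`.

```rust
/// In-memory stand-in for one `admin_sessions` row. Times are Unix seconds.
#[derive(Debug, Clone, PartialEq)]
struct Session {
    expires_at: u64,
    last_used_at: u64,
}

/// Returns true and slides the window if the session is still valid at `now`.
/// Returns false and leaves the row untouched if it has already expired.
fn check_and_touch(session: &mut Session, now: u64, ttl_secs: u64) -> bool {
    if session.expires_at < now {
        // Expired rows are rejected even if the background sweep has not run.
        return false;
    }
    session.expires_at = now + ttl_secs;
    session.last_used_at = now;
    true
}
```

In a real deployment this would likely be a single conditional `UPDATE ... WHERE token_hash = $1 AND expires_at >= now()` so the check and the bump happen atomically.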
+
+Companion endpoints:
+- `POST /api/v1/admin/auth/logout` — deletes the session row.
+- `GET /api/v1/admin/auth/me` — returns the current authenticated user.
+
+### Admin User Management
+
+```
+GET    /api/v1/admin/admins             — list
+POST   /api/v1/admin/admins             — create
+GET    /api/v1/admin/admins/{id}        — get
+PATCH  /api/v1/admin/admins/{id}        — update (username, password)
+DELETE /api/v1/admin/admins/{id}        — delete (rejected if it would leave zero admins)
+```
+
+Initial cut: every authenticated admin can call all of these. No self-elevation concerns because there are no privilege levels yet.
+
+### Forward Compatibility
+
+Schema is intentionally simple so role/permission tables can be added without touching `admin_users`. Illustrative future shape:
+
+```sql
+CREATE TABLE admin_roles (
+  id UUID PRIMARY KEY,
+  name TEXT UNIQUE                       -- e.g., 'super_admin', 'app_editor', 'app_viewer'
+);
+
+CREATE TABLE admin_user_roles (
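+  admin_user_id UUID NOT NULL REFERENCES admin_users(id) ON DELETE CASCADE,
+  role_id       UUID NOT NULL REFERENCES admin_roles(id) ON DELETE RESTRICT,
+  app_id        UUID REFERENCES apps(id) ON DELETE CASCADE,   -- NULL for global roles
+  -- Not a PRIMARY KEY: PK columns are implicitly NOT NULL, which would forbid global roles.
+  UNIQUE NULLS NOT DISTINCT (admin_user_id, role_id, app_id)  -- requires PostgreSQL 15+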
+);
+```
+
+Permission checks land in middleware that initially only enforces "authenticated"; the same middleware is the seam where role checks slot in later. Don't pre-build the role tables — but keep the middleware shape such that adding them is a localized change.
+
+---
+
+## 11.5 App Scoping (Phase 3b)
+
+**Purpose**: PiCloud hosts multiple independent applications on one platform. Each app is the isolation boundary for scripts, routes, domains, and (later) data — App A cannot see or modify App B's resources except through HTTP calls between them.
+
+**Why this slot**: pulled forward from the original v1.3+ "multi-user / project namespacing" bullet. Adding the `app_id` scoping dimension to schemas while the surface is small is cheap; retrofitting it after KV, docs, users, etc. ship is a multi-table migration on populated data.
+
+### Apps Own Scripts
+
+Every script belongs to exactly one app (`scripts.app_id`, non-null). Script IDs remain globally unique UUIDs — the API operates on script IDs directly without needing `app_id` in the URL. The dashboard nests scripts under their app in URLs (see "Dashboard URL Layout" below) but the script ID alone is still enough to resolve them server-side.
+
+Cross-app script reuse is not done by linking. A future **duplicate-to-app** feature may copy a script's content and config into another app under a new ID, with **snapshot semantics**: the copy is independent, and changes to the original do not propagate. Genuine cross-app integration goes through HTTP calls (and, much later, an explicit export/import model for shared data).
+
+### Apps Own Domains
+
+Routes can no longer claim arbitrary hostnames freely. Each app declares a set of **domain claims**:
+
+| Form | Example | Matches |
+|---|---|---|
+| Exact host | `app.example.com` | only that exact host |
+| Single-label wildcard | `*.example.com` | one label deep: `foo.example.com`, not `a.b.example.com` |
+| Parameterized | `{tenant}.example.com` | same shape as wildcard; binds `tenant` into request context |
+
+**Syntax convention**: domain parameters use `{name}` (curly braces); route-path parameters use `:name` (colon). These are deliberately distinct so docs and conflict messages never confuse the two.
+
+Every app also implicitly carries the reserved claim `__internal__`, granting access to `/api/v1/execute/{id}/*` for that app's scripts. An app with no public domain still works for execute-by-id (and, later, cron triggers, queue triggers, etc.).
+
+When a route is created, its host must match one of the parent app's domain claims. The dashboard's route-creation UI offers a selector populated from the app's claims rather than a free-text host field.
+
+### Conflict Rules — Checked at Claim Time
+
+Domain-claim collisions are detected when a domain is added to an app, not when requests arrive:
+
+- **Exact vs identical exact** → reject ("domain already claimed").
+- **Exact vs wildcard** → allowed. `foo.example.com` (App A) coexists with `*.example.com` (App B); at request time the more-specific match wins, so A handles `foo.example.com`, B handles every other subdomain.
+- **Wildcard vs wildcard at the same shape** → reject. Two apps cannot both claim `*.example.com`. `{tenant}.example.com` has the same shape as `*.example.com` for this check — the parameter name is a binding, not a discriminator.
+
+Route-conflict errors are strictly **intra-app**. A user creating a route inside App A never sees an error that references App B. The only cross-app surface is "this domain is already claimed" at domain-claim time, which is honest and unavoidable.
+
+### Runtime Dispatch
+
+Request handling becomes a two-phase lookup:
+
+1. **Host → app**: pick the app whose claim most-specifically matches the request's `Host` header (exact beats wildcard; ties are impossible by the claim rules above).
+2. **Path → route**: run that app's route trie unchanged using the existing matcher.
+
+The orchestrator's route matcher does not learn about apps — it just operates on whichever app's table was selected in step 1. This keeps the existing conflict-detection logic intact.
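Phase 1 of the dispatch can be sketched as follows. This is an illustration under stated assumptions: `Claim`, `Resolved`, and `resolve_app` are hypothetical names, the host is assumed to arrive without a port, and matching is ASCII case-insensitive. Phase 2 (the route trie) is not shown because it is unchanged.

```rust
/// One domain claim, as stored in `app_domains.pattern`.
struct Claim<'a> {
    app: &'a str,     // hypothetical app identifier (slug or id)
    pattern: &'a str, // "app.example.com" | "*.example.com" | "{tenant}.example.com"
}

/// Result of phase 1: the owning app plus an optional bound domain parameter.
#[derive(Debug, PartialEq)]
struct Resolved<'a> {
    app: &'a str,
    param: Option<(&'a str, String)>, // (name, value), e.g. ("tenant", "acme")
}

fn resolve_app<'a>(host: &str, claims: &[Claim<'a>]) -> Option<Resolved<'a>> {
    let host = host.to_ascii_lowercase();
    // Exact claims are the most specific, so they are tried first.
    if let Some(c) = claims.iter().find(|c| c.pattern.eq_ignore_ascii_case(&host)) {
        return Some(Resolved { app: c.app, param: None });
    }
    // Wildcard and parameterized claims match exactly one leading label.
    let (label, rest) = host.split_once('.')?;
    if label.is_empty() {
        return None;
    }
    for c in claims {
        let Some((head, suffix)) = c.pattern.split_once('.') else { continue };
        if !suffix.eq_ignore_ascii_case(rest) {
            continue;
        }
        if head == "*" {
            return Some(Resolved { app: c.app, param: None });
        }
        if let Some(name) = head.strip_prefix('{').and_then(|h| h.strip_suffix('}')) {
            // Bind the matched label into the request context.
            return Some(Resolved { app: c.app, param: Some((name, label.to_string())) });
        }
    }
    None
}
```

Because the claim-time rules forbid two wildcard-shaped claims on the same suffix, the loop can return on the first match without comparing candidates.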
+
+### Local Development
+
+In local development, `localhost` is treated as a regular domain claimed by exactly one app. By default that is the app created by the bootstrap migration (Hello World on a fresh install, `default` on an upgrade; see "Bootstrap & Migration"). Dev and prod use the same dispatch model, so there is no second mental model.
+
+### Cross-App Data Sharing — Deferred
+
+Per-app isolation is the **default and only mode** in the initial cut. KV collection `users` in App A is distinct from KV collection `users` in App B; App B cannot read App A's data without an HTTP endpoint that App A explicitly exposes.
+
+A formal export/import model, where App B exports a collection under a public name and an admin grants App A read or read-write access, is a future addition. Until it ships, the escape hatch is function-to-function HTTP calls. Sharing is easier to add than to retract; isolation comes first.
+
+### Schema Sketch
+
+```sql
+CREATE TABLE apps (
+  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  slug TEXT NOT NULL UNIQUE,    -- URL-safe; used in dashboard paths
+  name TEXT NOT NULL,           -- display name; can be edited freely
+  description TEXT,
+  created_at TIMESTAMP DEFAULT NOW(),
+  updated_at TIMESTAMP DEFAULT NOW()
+);
+
+CREATE TABLE app_domains (
+  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
+  pattern TEXT NOT NULL,        -- 'app.example.com' | '*.example.com' | '{tenant}.example.com'
+  shape TEXT NOT NULL,          -- 'exact' | 'wildcard' | 'parameterized'
+  shape_key TEXT NOT NULL,      -- normalized form for collision check (parameterized → wildcard form)
+  created_at TIMESTAMP DEFAULT NOW(),
+
+  UNIQUE (shape_key)            -- two apps cannot share the same shape-key
+);
+
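+-- Sketch of the end state. On an upgrade with existing rows, add the column as nullable,
+-- backfill it with the default app's id, then apply SET NOT NULL.
+ALTER TABLE scripts ADD COLUMN app_id UUID NOT NULL REFERENCES apps(id) ON DELETE RESTRICT;
+ALTER TABLE routes  ADD COLUMN app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE;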
+
+-- Existing route uniqueness checks remain unchanged; they are now scoped within an app.
+```
+
+The `UNIQUE (shape_key)` constraint enforces the "same shape" rule at the DB level. Exact-vs-wildcard coexistence is allowed because exact hosts produce a different `shape_key` from wildcards.
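One way the `shape_key` normalization could look, as a sketch only. The function name is hypothetical, and lowercasing the pattern is an assumption that the design does not state.

```rust
/// Normalizes a domain pattern into the value stored in `app_domains.shape_key`.
/// Parameterized claims collapse to their wildcard form, so `{tenant}.example.com`
/// and `*.example.com` produce the same key; exact hosts keep their own key.
fn shape_key(pattern: &str) -> String {
    // Assumption: host patterns are compared case-insensitively.
    let p = pattern.to_ascii_lowercase();
    if let Some((head, rest)) = p.split_once('.') {
        if head.len() > 2 && head.starts_with('{') && head.ends_with('}') {
            return format!("*.{rest}");
        }
    }
    p
}
```

With this normalization, two claims collide exactly when their keys are equal, which is the condition the unique constraint checks.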
+
+### Bootstrap & Migration
+
+The migration's behavior **depends on whether the install already has user content**:
+
+- **Fresh install** (no pre-existing scripts or routes): seed a **"Hello World"** app with `localhost` as its sole domain claim, a `hello.rhai` script that returns a greeting, and a `/hello` GET route. This serves as the reference example for new users — they can hit `http://localhost:<port>/hello` immediately after first boot and see something work. The seed is intentionally minimal; future iterations may flesh it out.
+- **Upgrading install** (pre-existing scripts or routes): create a **"default"** app with `slug = 'default'`, `localhost` as its sole domain claim, and assign every existing script and route to it. The Hello World seed is **not** added in this case — adding it would pollute the user's existing content.
+
+The branch point is detected by inspecting whether `scripts` or `routes` had any rows before the migration ran.
+
+### Dashboard URL Layout
+
+The dashboard is **app-hierarchical**, using the app's `slug` for human-readable URLs:
+
+```
+/admin/apps                          — app list
+/admin/apps/new                      — create app
+/admin/apps/{slug}                   — app overview
+/admin/apps/{slug}/scripts           — scripts in this app
+/admin/apps/{slug}/scripts/{id}      — script detail (script ID still globally unique; slug is for breadcrumbs)
+/admin/apps/{slug}/routes            — routes in this app
+/admin/apps/{slug}/domains           — domain claims for this app
+/admin/apps/{slug}/settings          — app settings
+```
+
+Renaming an app changes its `slug`. The previous slug stays as a **permanent redirect** to the renamed app, persisting until another app (a new app or another rename) tries to claim that retired slug. When such a collision happens, the dashboard shows a warning before letting the operator proceed: *"`old-slug` currently redirects to app `bar` — using it here will break any external links that still target the old slug."* If the operator confirms, the redirect row is dropped and the slug is reused.
+
+Implementation sketch:
+
+```sql
+CREATE TABLE app_slug_history (
+  slug TEXT PRIMARY KEY,                       -- the retired slug
+  current_app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
+  retired_at TIMESTAMP DEFAULT NOW()
+);
+```
+
+Slug lookup order:
+1. `apps.slug = {slug}` → render the page directly.
+2. `app_slug_history.slug = {slug}` → `301` redirect to `/admin/apps/{current_app.slug}/<rest>`.
+3. Neither → `404`.
+
+Slug claim order (create or rename to a slug `S`):
+1. If `S` matches a current app's slug → reject as a conflict (the usual unique-constraint error).
+2. If `S` matches a row in `app_slug_history` → return a "needs confirmation" response. Dashboard surfaces the warning; on confirm, delete the history row inside the same transaction as the create/rename.
+3. Otherwise → proceed normally; if this was a rename, insert the old slug into `app_slug_history`.
+
+A rename back to an app's own retired slug is a special case: just delete the row from `app_slug_history` and don't warn.
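The slug claim order, including the rename-back special case, can be sketched with two in-memory maps standing in for `apps` and `app_slug_history`. All names here are hypothetical, and `u32` ids replace UUIDs for brevity.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum SlugClaim {
    /// Slug belongs to a current app: the usual unique-constraint error.
    Conflict,
    /// Slug is retired and redirects to another app: warn before reuse.
    NeedsConfirmation { redirects_to: u32 },
    /// Rename back to the app's own retired slug: drop the history row, no warning.
    ReclaimOwn,
    /// Slug is free.
    Proceed,
}

/// `apps` maps current slug -> app id; `history` maps retired slug -> app id.
/// `claimant` is `Some(id)` for a rename and `None` when creating a new app.
fn claim_slug(
    apps: &HashMap<String, u32>,
    history: &HashMap<String, u32>,
    slug: &str,
    claimant: Option<u32>,
) -> SlugClaim {
    if apps.contains_key(slug) {
        return SlugClaim::Conflict;
    }
    match history.get(slug) {
        Some(&owner) if Some(owner) == claimant => SlugClaim::ReclaimOwn,
        Some(&owner) => SlugClaim::NeedsConfirmation { redirects_to: owner },
        None => SlugClaim::Proceed,
    }
}
```

The caller would still perform the history-row delete and the create or rename inside one transaction, as described above.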
+
+### API URL Layout
+
+The HTTP API stays **flat**:
+
+```
+GET    /api/v1/admin/apps                     — list apps
+POST   /api/v1/admin/apps                     — create app
+GET    /api/v1/admin/apps/{id_or_slug}        — get app
+PATCH  /api/v1/admin/apps/{id_or_slug}        — update app
+DELETE /api/v1/admin/apps/{id_or_slug}        — delete app
+GET    /api/v1/admin/apps/{id_or_slug}/domains   — list/manage domain claims
+POST   /api/v1/admin/apps/{id_or_slug}/domains
+DELETE /api/v1/admin/apps/{id_or_slug}/domains/{domain_id}
+
+GET    /api/v1/admin/scripts                  — list scripts (now supports ?app={id_or_slug} filter)
+GET    /api/v1/admin/scripts/{id}             — unchanged; script IDs are globally unique
+... (rest of scripts/routes endpoints unchanged)
+```
+
+The scripts and routes endpoints keep their existing shape — this avoids forcing API consumers to a v2 migration. The new app-management endpoints are additive. Clients that want app context can use the `?app=` filter.
+
+---
+
 ## 12. Development Roadmap

-### Phase 1: MVP ✓ (Current)
- [x] Orchestrator: REST API for script CRUD + execute
- [x] Executor image: load + run Rhai script
- [x] Dashboard: upload script, deploy, delete
- [x] PostgreSQL: script storage + execution logs
- [ ] **Timeline**: 4-6 weeks
+### Phase 1: MVP ✓ (Shipped)
+- [x] Manager: REST API for script CRUD + executions log
+- [x] Orchestrator: HTTP ingress, route resolution, dispatch
+- [x] Executor: embedded Rhai engine with sandbox limits (replaces the original Docker-per-execution model — embedded gives better latency and less infra)
+- [x] Dashboard (SvelteKit): script upload, edit, routing config, execution log viewer
+- [x] PostgreSQL: scripts, routes, execution_logs; embedded migrations
+- [x] Caddy reverse proxy in front of everything

-**Deliverables:**
- Docker image for executor
- Rust binary (Orchestrator)
- Static HTML + Alpine.js dashboard
- docker-compose.yml for local/prod deployment
+**Delivered beyond original MVP scope:** custom routing (exact / prefix / param + host-aware) with conflict detection, per-script Rhai sandbox config, four-tab dashboard detail UI, structured versioning scheme (product + SDK + API + schema + wire) with `/version` self-report, Rhai editor with autocomplete / goto / find-usages / formatter, SDK contract + schema snapshot + integration test suites.

 ---

-### Phase 2: v1.0 (Polish & Usability)
- Script versioning + rollback
- Execution history dashboard (view logs, timings, errors)
- Better error messages (script parse errors, timeouts)
- Timeout/resource limit enforcement
- Container cleanup/GC
- Rhai SDK: `request()` function fully documented
+### Phase 2: v1.0 (Polish & Usability) ✓ (Shipped)
+- [x] Execution history dashboard
+- [x] Better error messages (Rhai parse errors, sandbox limits, timeouts)
+- [x] Timeout / resource-limit enforcement (per-script sandbox config)
+- [x] Rhai SDK docs current through SDK 1.1

-**Timeline**: 2-3 weeks
+(Script versioning + rollback remains deferred — see Phase 6.)

 ---

-### Phase 3: v1.1 (Expand Capabilities & Services)
- Queue-based triggers (RabbitMQ / Redis)
- Scheduled jobs (cron syntax)
- Secrets management (encrypted env vars)
- **Rhai SDK: KV Store** (`kv.get()`, `kv.set()`, `kv.delete()` with collections)
- **Rhai SDK: Document Store** (`docs.create()`, `docs.find()`, `docs.update()`, `docs.delete()` with schema validation)
- **Rhai SDK: User Management** (auth, CRUD, roles, permissions, invitations, password reset)
- **Rhai SDK: Email** (`email.send(to, subject, body)` via SMTP)
- Rhai SDK: `s3.*`, `queue.*`, `invoke()`, `retry.*()`
- External HTTP calls from scripts (`http.get()`, `http.post()`)
- Script versioning with automatic rollback on error
+### Phase 3: v1.0.x — Foundations (Current focus)

-**Timeline**: 8-10 weeks
+Two foundation pieces that must land before the v1.1 service expansion, because retrofitting them later is expensive.
+
+**3a. Admin auth** — see section 11.4. Per-user `admin_users` (not a shared secret), Argon2id passwords, env-var bootstrap of the first admin, session-token doubling as bearer token for API. No roles in this cut; schema is forward-compatible with later RBAC.
+
+**3b. Multi-app scoping** — see section 11.5. Introduce `apps`, `app_domains`, and `app_id` columns on `scripts` and `routes`. Migration assigns existing data to a `default` app (or seeds a `Hello World` app on fresh installs). Orchestrator dispatch becomes two-phase (Host → app → route). Reserved internal domain (`__internal__`) keeps `/api/v1/execute/{id}/*` working for app scripts without requiring a public hostname. Dashboard becomes app-hierarchical (`/admin/apps/{slug}/...`); API keeps its existing flat shape with new app-management endpoints under `/api/v1/admin/apps/*`.
+
+**Why both before v1.1**: every v1.1 service (KV, docs, users, etc.) needs an `app_id` scoping key in its schema. Adding it now, with one small migration on existing tables, is cheap. Adding it after those services ship is several migrations on populated data.

 ---

-### Phase 4: v1.2 (Advanced Workflows & Hierarchies)
+### Phase 4: v1.1 (Expand Capabilities & Services)
+Ordered roughly by foundation value: each item enables the items below it.
+
+1. **Rhai SDK: KV Store** (`kv.get/set/delete/has` with collections, scoped per app)
+2. **Rhai SDK: Document Store** (`docs.create/find/update/delete/list/query`, scoped per app)
+3. **Rhai SDK: HTTP** (`http.get/post/put/delete` with SSRF deny-list)
+4. **Cron triggers** (manager scheduler skeleton already exists; needs schedules table + `FOR UPDATE SKIP LOCKED` dispatch)
+5. **Rhai SDK: Email** (`email.send` via SMTP; needs per-deploy config)
+6. **Rhai SDK: User Management** (auth, CRUD, roles, permissions, invitations, password reset; depends on email for invites; scoped per app)
+7. **Queue triggers** (start with Postgres LISTEN/NOTIFY; RabbitMQ/Redis later if needed)
+8. **`invoke()` + `retry::*`** (function-to-function calls; execution_logs gain `parent_execution_id`)
+9. **Secrets management** (encrypted env vars, per app)
+
+---
+
+### Phase 5: v1.2 (Advanced Workflows & Hierarchies)
 - Function workflows (DAG execution, conditional branching, error handling)
- Function hierarchy (parent/child invocation, sync/async calls)
 - Nested workflows
 - Call graph visualization + execution tracing
- Advanced query support for document store (`docs.query()` with filters)
-
-**Timeline**: 6-8 weeks
+- Advanced query support for document store (`docs.query()` with filters: `$gt`, `$or`, etc.)
+- Service interceptors (see section 9.4)

 ---

-### Phase 5: v1.3+ (Scaling, Security, Observability)
- Multi-user / project namespacing
+### Phase 6: v1.3+ (Scaling, Security, Observability)
+- Cluster mode (split-process manager + per-node orchestrator + executor); cluster-mode wire protocol versioning
+- Cross-app data sharing (explicit export/import model — see section 11.5)
+- Script versioning + rollback (keep N historical versions in a side table; rollback endpoint)
 - Rate limiting on endpoints
- Auth (API keys, dashboard login)
+- Auth (richer model: API keys, OAuth, etc.)
 - Metrics + monitoring dashboard
- Container pooling / warm starts
 - Distributed tracing (OpenTelemetry)
 - Webhooks for execution events
 - S3 integration (object storage reads/writes)