docs: design Phase 3 admin auth and multi-app scoping

Adds blueprint sections 11.4 (admin auth) and 11.5 (app scoping) and
restructures the section 12 roadmap to put both ahead of v1.1, since
retrofitting app_id into KV/docs/users schemas after they ship is far
more expensive than adding it now.

Admin auth: per-user admin_users (not a shared secret), Argon2id,
env-var bootstrap that becomes inert after first admin exists, session
token doubling as bearer token, 24h sliding TTL. Schema designed
forward-compatible with later RBAC.

App scoping: apps own scripts/routes/domains. Domain claims at app
level (exact / wildcard / {param} parameterized) with collision check
at claim time, so route-conflict errors stay strictly intra-app.
Two-phase orchestrator dispatch (Host → app → route trie). Slug rename
keeps the old slug as a permanent redirect until another app claims
it. Fresh-install migration seeds a Hello World app; upgrades go into
a default app instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-24 22:58:37 +02:00
parent 56de652f7a
commit 646bd55174
2 changed files with 326 additions and 42 deletions

@@ -8,6 +8,8 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
Authoritative design: [serverless_cloud_blueprint.md](serverless_cloud_blueprint.md). The blueprint is a living document — when architecture decisions are made in conversation that contradict it, treat the latest decision as truth and update the blueprint.
**Current focus (Phase 3, pre-v1.1):** admin auth gate, then multi-app scoping. The latter introduces `apps` as the top-level isolation boundary for scripts, routes, domains, and (later) data. See blueprint §11.5 for the design. Every v1.1+ feature must assume `app_id` exists as a scoping dimension.
## Three-Service Architecture
The platform splits into three logical services, each backed by a `*-core` library crate so the same logic runs in single-process MVP mode and split-process cluster mode:
@@ -26,7 +28,7 @@ In MVP, all three run in one process (`picloud` binary). In cluster mode, each r
Versioned API surfaces live under `/api/v{N}/...`. See [docs/versioning.md](docs/versioning.md) for the full scheme.
- `/api/v1/admin/*` — manager (control plane: script CRUD, routes CRUD + check + match, logs, config)
- `/api/v1/admin/*` — manager (control plane: script CRUD, routes CRUD + check + match, logs, config; apps CRUD once Phase 3b lands)
- `/api/v1/execute/{id}` — orchestrator (data plane: invoke a script by ID, always-available bypass)
- `/admin/*` — dashboard SPA (SvelteKit, `paths.base = '/admin'`)
- `/healthz` — liveness (string `"ok"`)
@@ -37,6 +39,10 @@ Reserved path prefixes (rejected at route creation): `/api/`, `/admin/`, `/healt
Caddy fronts everything. Same Caddyfile shape works for single-node and cluster — only upstream targets change.
**Param syntax convention:** route paths use `:name` (e.g., `/users/:id`); domains (once apps land) use `{name}` (e.g., `{tenant}.example.com`). These are deliberately distinct — never use `:` in a domain context or `{}` in a route-path context.
**Two-phase dispatch (Phase 3b onward):** the orchestrator first resolves `Host` → app (most-specific domain claim wins), then runs that app's route trie. The route matcher itself is unchanged and never sees other apps' routes.
## Tech Stack
- **Rust 1.92+** workspace, pinned via `rust-toolchain.toml`
@@ -102,4 +108,6 @@ docs/
## Out of MVP
Queue triggers, cron triggers, SMTP ingress, KV / docs / email / users / HTTP SDKs in scripts, interceptors, workflows, function-to-function `invoke()`, auth, multi-tenancy, secrets, metrics dashboard. All deferred to v1.1+ per the blueprint. Don't pre-build for them — but don't make decisions that close the door on them either.
Queue triggers, cron triggers, SMTP ingress, KV / docs / email / users / HTTP SDKs in scripts, interceptors, workflows, function-to-function `invoke()`, secrets, metrics dashboard. All deferred to v1.1+ per the blueprint. Don't pre-build for them — but don't make decisions that close the door on them either.
**Pulled forward to Phase 3 (pre-v1.1):** admin auth, multi-app scoping. Cross-app data sharing (export/import) stays at v1.3+; the initial cut enforces strict isolation. See blueprint §11.5.

@@ -732,68 +732,344 @@ volumes:
---
## 11.4 Admin Auth (Phase 3a)
**Purpose**: gate the admin API (`/api/v1/admin/*`) and dashboard (`/admin/*`) behind per-user authentication. Today the surface is open — anyone reaching the bound port can create, edit, and delete scripts.
**Why per-user, not a shared secret**: shared admin passwords get shared between humans, leave no audit trail, and can't be revoked per-person. Per-user accounts solve all three. The initial cut deliberately stops there — no roles, no per-app permissions — because that scope is small enough to ship in a single phase without blocking Phase 3b. Roles + per-app permissions are queued for v1.3+.
### Naming: `admin_users` vs `users`
We reserve the unqualified **`users`** table for the v1.1+ Rhai SDK feature (script-level end users — see §8.4). Platform-operator accounts live in **`admin_users`**. They are different concepts and never share rows, even when a PiCloud install hosts apps that themselves run user management.
### Schema
```sql
CREATE TABLE admin_users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
username TEXT NOT NULL UNIQUE,
password_hash TEXT NOT NULL, -- Argon2id
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
last_login_at TIMESTAMP
);
CREATE TABLE admin_sessions (
token_hash TEXT PRIMARY KEY, -- SHA-256 of the bearer token; raw token only exists in the login response + cookie
user_id UUID NOT NULL REFERENCES admin_users(id) ON DELETE CASCADE,
created_at TIMESTAMP DEFAULT NOW(),
expires_at TIMESTAMP NOT NULL,
last_used_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_admin_sessions_user ON admin_sessions(user_id);
CREATE INDEX idx_admin_sessions_expiry ON admin_sessions(expires_at);
```
**Password hashing**: Argon2id with default OWASP parameters. This also resolves the v1.1+ open question about user-password hashing (§10) — the platform settles on Argon2id once, here.
### Bootstrap
On startup, if `admin_users` is empty, the manager reads `PICLOUD_ADMIN_USERNAME` plus a password from env (or a config file) and inserts the row. Two password env vars are accepted, in this precedence:
1. **`PICLOUD_ADMIN_PASSWORD_HASH`** (recommended) — pre-computed Argon2id PHC-format hash. The platform validates the string parses, then inserts it as-is. This avoids the raw password ever being written into env/compose files or process listings.
2. **`PICLOUD_ADMIN_PASSWORD`** (fallback) — raw password. The platform hashes it with Argon2id defaults and discards the raw value. Simpler for first-time setup; less ideal for committed configs.
If both are set, the hash wins and the raw value is ignored (with a warning logged). If neither is set on a fresh install, startup fails with a clear error pointing at the env vars.
**Once that bootstrap row exists, the env vars become inert** — restarting with different values does not change the password. This is deliberate: the env var is a one-time setup hatch, not a recovery backdoor (a backdoor would let anyone with systemd-unit or compose-file access override any admin's password).
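The precedence and inertness rules above can be sketched as a pure decision function. This is a minimal illustration, not the manager's actual code: the type and function names (`BootstrapPassword`, `BootstrapOutcome`, `resolve_bootstrap`) are hypothetical, the env values are passed in as plain options, and treating a missing username as a startup error is an assumption. Validating or computing the Argon2id hash is left out.

```rust
/// Source of the first admin's password (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum BootstrapPassword {
    /// Pre-computed Argon2id PHC string, inserted as-is after validation.
    Hash(String),
    /// Raw password that still has to be hashed before insertion.
    Raw(String),
}

#[derive(Debug, PartialEq)]
pub enum BootstrapOutcome {
    /// At least one admin exists: the env vars are ignored entirely.
    Inert,
    /// Insert the first admin. `warn_raw_ignored` is true when both password
    /// variables were set and the raw one was discarded.
    Create {
        username: String,
        password: BootstrapPassword,
        warn_raw_ignored: bool,
    },
}

/// Decide what the bootstrap step should do on startup.
/// `admin_count` is the current number of rows in `admin_users`.
pub fn resolve_bootstrap(
    admin_count: u64,
    username: Option<&str>,      // PICLOUD_ADMIN_USERNAME
    password_hash: Option<&str>, // PICLOUD_ADMIN_PASSWORD_HASH
    password: Option<&str>,      // PICLOUD_ADMIN_PASSWORD
) -> Result<BootstrapOutcome, String> {
    // Once any admin exists, the env vars are a no-op (no recovery backdoor).
    if admin_count > 0 {
        return Ok(BootstrapOutcome::Inert);
    }
    // Assumption: a missing or empty username is also a startup error.
    let username = username
        .filter(|u| !u.is_empty())
        .ok_or_else(|| "set PICLOUD_ADMIN_USERNAME".to_string())?;
    let (password, warn_raw_ignored) = match (password_hash, password) {
        // The hash wins; remember whether a raw value was also present.
        (Some(h), raw) => (BootstrapPassword::Hash(h.to_string()), raw.is_some()),
        (None, Some(p)) => (BootstrapPassword::Raw(p.to_string()), false),
        (None, None) => {
            return Err(
                "set PICLOUD_ADMIN_PASSWORD_HASH or PICLOUD_ADMIN_PASSWORD".to_string(),
            )
        }
    };
    Ok(BootstrapOutcome::Create {
        username: username.to_string(),
        password,
        warn_raw_ignored,
    })
}
```

Keeping the decision separate from the database insert makes the "inert after first admin" rule easy to unit-test without a running Postgres.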
Recovery is a separate manual flow:
```sh
picloud admin reset-password <username>
```
This requires shell access on the host (and therefore implies the operator already controls the box).
### Login & Session
```
POST /api/v1/admin/auth/login
{ "username": "...", "password": "..." }
→ 200 OK
Set-Cookie: picloud_session=<token>; HttpOnly; Secure; SameSite=Lax; Path=/
{ "user": { "id": "...", "username": "..." }, "token": "<token>", "expires_at": "..." }
```
Token format: an opaque random string (32 random bytes, base64-encoded). Stored hashed; the raw value lives only in the login response and the session cookie. The same token works as a bearer credential for non-browser clients:
```
Authorization: Bearer <token>
```
One token system serves both dashboard and CLI/CI clients — no separate "API token" concept. Personal long-lived API tokens can be added later as a distinct `admin_api_tokens` table if demand appears.
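Because one token serves both transports, the auth middleware only needs a single extraction step. The sketch below is illustrative: `extract_token` is a hypothetical helper, the header values are passed in as plain strings, and checking the `Authorization` header before the cookie is an assumption the design does not spell out.

```rust
/// Pull the raw session token out of a request.
/// `authorization` is the value of the `Authorization` header, `cookie` the
/// value of the `Cookie` header (both optional).
/// Assumption: the bearer header takes precedence over the cookie.
pub fn extract_token<'a>(
    authorization: Option<&'a str>,
    cookie: Option<&'a str>,
) -> Option<&'a str> {
    // Non-browser clients: `Authorization: Bearer <token>`.
    if let Some(value) = authorization {
        if let Some(token) = value.strip_prefix("Bearer ") {
            let token = token.trim();
            if !token.is_empty() {
                return Some(token);
            }
        }
    }
    // Browser clients: look for `picloud_session=<token>` among the cookies.
    cookie?
        .split(';')
        .filter_map(|pair| pair.trim().split_once('='))
        .find(|(name, _)| *name == "picloud_session")
        .map(|(_, value)| value)
        .filter(|value| !value.is_empty())
}
```

Either way the caller ends up with the same raw token, which is then hashed and looked up in `admin_sessions`.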
**Session TTL** is a **24-hour sliding window**: each authenticated request bumps `expires_at` to `now + ttl` and `last_used_at` to `now`. The TTL itself is configurable per deploy via `PICLOUD_SESSION_TTL_HOURS` (default `24`). A separate background sweep deletes rows where `expires_at < now()`; until that sweep runs, expired rows are also rejected at auth-check time (so a stuck sweep can't extend session lifetime past expiry).
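The sliding window and the independent sweep can be sketched with an in-memory stand-in for `admin_sessions`. The names `Session`, `touch_session`, and `sweep` are hypothetical, timestamps are plain Unix seconds, and the exact boundary (a row is treated as expired when `expires_at < now`, mirroring the sweep condition) is an assumption.

```rust
/// In-memory stand-in for one `admin_sessions` row; times are Unix seconds.
#[derive(Debug, Clone, PartialEq)]
pub struct Session {
    pub expires_at: u64,
    pub last_used_at: u64,
}

/// Auth-check step: reject an expired session, otherwise slide its window.
/// Returns `false` for expired rows even if the sweep has not removed them yet.
pub fn touch_session(session: &mut Session, now: u64, ttl_secs: u64) -> bool {
    if session.expires_at < now {
        return false; // expired: do not extend, do not authenticate
    }
    session.expires_at = now + ttl_secs;
    session.last_used_at = now;
    true
}

/// Background sweep: drop rows where `expires_at < now`.
/// Returns how many rows were removed.
pub fn sweep(sessions: &mut Vec<Session>, now: u64) -> usize {
    let before = sessions.len();
    sessions.retain(|s| s.expires_at >= now);
    before - sessions.len()
}
```

Since `touch_session` checks expiry itself, correctness never depends on how often `sweep` runs; the sweep only bounds table growth.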
Companion endpoints:
- `POST /api/v1/admin/auth/logout` — deletes the session row.
- `GET /api/v1/admin/auth/me` — returns the current authenticated user.
### Admin User Management
```
GET /api/v1/admin/admins — list
POST /api/v1/admin/admins — create
GET /api/v1/admin/admins/{id} — get
PATCH /api/v1/admin/admins/{id} — update (username, password)
DELETE /api/v1/admin/admins/{id} — delete (rejected if it would leave zero admins)
```
Initial cut: every authenticated admin can call all of these. No self-elevation concerns because there are no privilege levels yet.
### Forward Compatibility
Schema is intentionally simple so role/permission tables can be added without touching `admin_users`. Illustrative future shape:
```sql
CREATE TABLE admin_roles (
id UUID PRIMARY KEY,
name TEXT UNIQUE -- e.g., 'super_admin', 'app_editor', 'app_viewer'
);
CREATE TABLE admin_user_roles (
admin_user_id UUID REFERENCES admin_users(id) ON DELETE CASCADE,
role_id UUID REFERENCES admin_roles(id) ON DELETE RESTRICT,
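app_id UUID REFERENCES apps(id) ON DELETE CASCADE, -- NULL for global roles
-- app_id cannot be part of a PRIMARY KEY (PK columns are implicitly NOT NULL), so use a
-- unique constraint instead; NULLS NOT DISTINCT requires PostgreSQL 15+.
UNIQUE NULLS NOT DISTINCT (admin_user_id, role_id, app_id)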
);
```
Permission checks land in middleware that initially only enforces "authenticated"; the same middleware is the seam where role checks slot in later. Don't pre-build the role tables — but keep the middleware shape such that adding them is a localized change.
---
## 11.5 App Scoping (v1.x)
**Purpose**: PiCloud hosts multiple independent applications on one platform. Each app is the isolation boundary for scripts, routes, domains, and (later) data — App A cannot see or modify App B's resources except through HTTP calls between them.
**Why this slot**: pulled forward from the original v1.3+ "multi-user / project namespacing" bullet. Adding the `app_id` scoping dimension to schemas while the surface is small is cheap; retrofitting it after KV, docs, users, etc. ship is a multi-table migration on populated data.
### Apps Own Scripts
Every script belongs to exactly one app (`scripts.app_id`, non-null). Script IDs remain globally unique UUIDs — the API operates on script IDs directly without needing `app_id` in the URL. The dashboard nests scripts under their app in URLs (see "Dashboard URL Layout" below) but the script ID alone is still enough to resolve them server-side.
Cross-app script reuse is not done by linking. A future **duplicate-to-app** feature may copy a script's content and config into another app under a new ID, with **snapshot semantics**: the copy is independent, and changes to the original do not propagate. Genuine cross-app integration goes through HTTP calls (and, much later, an explicit export/import model for shared data).
### Apps Own Domains
Routes can no longer claim arbitrary hostnames freely. Each app declares a set of **domain claims**:
| Form | Example | Matches |
|---|---|---|
| Exact host | `app.example.com` | only that exact host |
| Single-label wildcard | `*.example.com` | one label deep: `foo.example.com`, not `a.b.example.com` |
| Parameterized | `{tenant}.example.com` | same shape as wildcard; binds `tenant` into request context |
**Syntax convention**: domain parameters use `{name}` (curly braces); route-path parameters use `:name` (colon). These are deliberately distinct so docs and conflict messages never confuse the two.
Every app also implicitly carries the reserved claim `__internal__`, granting access to `/api/v1/execute/{id}/*` for that app's scripts. An app with no public domain still works for execute-by-id (and, later, cron triggers, queue triggers, etc.).
When a route is created, its host must match one of the parent app's domain claims. The dashboard's route-creation UI offers a selector populated from the app's claims rather than a free-text host field.
### Conflict Rules — Checked at Claim Time
Domain-claim collisions are detected when a domain is added to an app, not when requests arrive:
- **Exact vs identical exact** → reject ("domain already claimed").
- **Exact vs wildcard** → allowed. `foo.example.com` (App A) coexists with `*.example.com` (App B); at request time the more-specific match wins, so A handles `foo.example.com`, B handles every other subdomain.
- **Wildcard vs wildcard at the same shape** → reject. Two apps cannot both claim `*.example.com`. `{tenant}.example.com` has the same shape as `*.example.com` for this check — the parameter name is a binding, not a discriminator.
Route-conflict errors are strictly **intra-app**. A user creating a route inside App A never sees an error that references App B. The only cross-app surface is "this domain is already claimed" at domain-claim time, which is honest and unavoidable.
### Runtime Dispatch
Request handling becomes a two-phase lookup:
1. **Host → app**: pick the app whose claim most-specifically matches the request's `Host` header (exact beats wildcard; ties are impossible by the claim rules above).
2. **Path → route**: run that app's route trie unchanged using the existing matcher.
The orchestrator's route matcher does not learn about apps — it just operates on whichever app's table was selected in step 1. This keeps the existing conflict-detection logic intact.
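Phase 1 of the lookup can be sketched as a small resolver over a list of claims. This is an illustration under stated assumptions: `Claim`, `wildcard_suffix`, and `resolve_app` are hypothetical names, patterns are assumed to be stored lowercase, the port is stripped naively (IPv6 literals are not handled), and binding the `{tenant}` value into the request context is omitted.

```rust
/// One domain claim; `app` is a hypothetical app identifier.
pub struct Claim<'a> {
    pub app: &'a str,
    /// "app.example.com", "*.example.com", or "{tenant}.example.com"
    pub pattern: &'a str,
}

/// For a wildcard or parameterized pattern, return the suffix that must follow
/// the first label (".example.com"); `None` for exact hosts.
fn wildcard_suffix(pattern: &str) -> Option<&str> {
    let (first, _rest) = pattern.split_once('.')?;
    let is_param = first.len() > 2 && first.starts_with('{') && first.ends_with('}');
    if first == "*" || is_param {
        Some(&pattern[first.len()..])
    } else {
        None
    }
}

/// Host → app: exact claims beat wildcard/parameterized claims.
pub fn resolve_app<'a>(claims: &[Claim<'a>], host: &str) -> Option<&'a str> {
    // Drop an optional ":port" and compare case-insensitively.
    let host = host.split(':').next().unwrap_or(host).to_ascii_lowercase();

    // Most specific first: an exact host claim.
    let exact = claims
        .iter()
        .find(|c| wildcard_suffix(c.pattern).is_none() && c.pattern == host.as_str());
    if let Some(c) = exact {
        return Some(c.app);
    }

    // Otherwise a single-label wildcard: exactly one non-empty label before the suffix.
    claims
        .iter()
        .find(|c| {
            wildcard_suffix(c.pattern)
                .and_then(|suffix| host.strip_suffix(suffix))
                .map_or(false, |label| !label.is_empty() && !label.contains('.'))
        })
        .map(|c| c.app)
}
```

The selected app's route table is then handed to the existing matcher unchanged, which is what keeps the matcher unaware of apps.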
### Local Development
In local development, `localhost` is treated as a regular domain claimed by exactly one app, by default the bootstrap "default" app installed at first run. Dev and prod use the same dispatch model, so there is no second mental model to learn.
### Cross-App Data Sharing — Deferred
Per-app isolation is the **default and only mode** in the initial cut. KV collection `users` in App A is distinct from KV collection `users` in App B; App B cannot read App A's data without an HTTP endpoint that App A explicitly exposes.
A formal export/import model — where App B exports a collection under a public name and admin grants App A read or read-write access — is a future addition. Until it ships, the escape hatch is function-to-function HTTP calls. Sharing is easier to add than to retract; isolation comes first.
### Schema Sketch
```sql
CREATE TABLE apps (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
slug TEXT NOT NULL UNIQUE, -- URL-safe; used in dashboard paths
name TEXT NOT NULL, -- display name; can be edited freely
description TEXT,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE app_domains (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
pattern TEXT NOT NULL, -- 'app.example.com' | '*.example.com' | '{tenant}.example.com'
shape TEXT NOT NULL, -- 'exact' | 'wildcard' | 'parameterized'
shape_key TEXT NOT NULL, -- normalized form for collision check (parameterized → wildcard form)
created_at TIMESTAMP DEFAULT NOW(),
UNIQUE (shape_key) -- two apps cannot share the same shape-key
);
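-- The statements below show the end state. On an upgraded install these tables already hold rows,
-- so the migration must add app_id as nullable, backfill it to the default app, then SET NOT NULL.
ALTER TABLE scripts ADD COLUMN app_id UUID NOT NULL REFERENCES apps(id) ON DELETE RESTRICT;
ALTER TABLE routes ADD COLUMN app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE;
-- Existing route uniqueness checks remain unchanged; they are now scoped within an app.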
```
The `UNIQUE (shape_key)` constraint enforces the "same shape" rule at the DB level. Exact-vs-wildcard coexistence is allowed because exact hosts produce a different `shape_key` from wildcards.
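One way to derive `shape_key` is sketched below, with a `HashSet` standing in for the unique index. The function names (`shape_key`, `try_claim`) and the lowercase normalization are assumptions for illustration; the blueprint only fixes that parameterized claims normalize to wildcard form.

```rust
use std::collections::HashSet;

/// Normalize a claim pattern into its collision key.
/// Parameterized claims collapse into wildcard form; exact and wildcard
/// patterns are kept as-is (lowercased, which is an assumption of this sketch).
pub fn shape_key(pattern: &str) -> String {
    let normalized = pattern.trim().to_ascii_lowercase();
    if let Some((first, rest)) = normalized.split_once('.') {
        if first.len() > 2 && first.starts_with('{') && first.ends_with('}') {
            return format!("*.{rest}");
        }
    }
    normalized
}

/// In-memory stand-in for `UNIQUE (shape_key)`: the insert fails when any app
/// already holds a claim with the same key.
pub fn try_claim(taken: &mut HashSet<String>, pattern: &str) -> Result<(), &'static str> {
    if taken.insert(shape_key(pattern)) {
        Ok(())
    } else {
        Err("domain already claimed")
    }
}
```

With this normalization, `foo.example.com` and `*.example.com` produce different keys and coexist, while `{tenant}.example.com` collides with an existing `*.example.com`.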
### Bootstrap & Migration
The migration's behavior **depends on whether the install already has user content**:
- **Fresh install** (no pre-existing scripts or routes): seed a **"Hello World"** app with `localhost` as its sole domain claim, a `hello.rhai` script that returns a greeting, and a `/hello` GET route. This serves as the reference example for new users — they can hit `http://localhost:<port>/hello` immediately after first boot and see something work. The seed is intentionally minimal; future iterations may flesh it out.
- **Upgrading install** (pre-existing scripts or routes): create a **"default"** app with `slug = 'default'`, `localhost` as its sole domain claim, and assign every existing script and route to it. The Hello World seed is **not** added in this case — adding it would pollute the user's existing content.
The branch point is detected by inspecting whether `scripts` had any rows before the migration ran.
### Dashboard URL Layout
The dashboard is **app-hierarchical**, using the app's `slug` for human-readable URLs:
```
/admin/apps — app list
/admin/apps/new — create app
/admin/apps/{slug} — app overview
/admin/apps/{slug}/scripts — scripts in this app
/admin/apps/{slug}/scripts/{id} — script detail (script ID still globally unique; slug is for breadcrumbs)
/admin/apps/{slug}/routes — routes in this app
/admin/apps/{slug}/domains — domain claims for this app
/admin/apps/{slug}/settings — app settings
```
Renaming an app changes its `slug`. The previous slug stays as a **permanent redirect** to the renamed app, persisting until another app (a new app or another rename) tries to claim that retired slug. When such a collision happens, the dashboard shows a warning before letting the operator proceed: *"`old-slug` currently redirects to app `bar` — using it here will break any external links that still target the old slug."* If the operator confirms, the redirect row is dropped and the slug is reused.
Implementation sketch:
```sql
CREATE TABLE app_slug_history (
slug TEXT PRIMARY KEY, -- the retired slug
current_app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
retired_at TIMESTAMP DEFAULT NOW()
);
```
Slug lookup order:
1. `apps.slug = {slug}` → render the page directly.
2. `app_slug_history.slug = {slug}` → `301` redirect to `/admin/apps/{current_app.slug}/<rest>`.
3. Neither → `404`.
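The lookup order above can be sketched with two maps standing in for `apps` and `app_slug_history`. `SlugLookup` and `lookup_slug` are hypothetical names, app IDs are shortened to `u32` for readability, and `rest` is assumed to be the remaining path including its leading slash (or empty).

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum SlugLookup {
    /// Step 1: the slug is current, render the page directly.
    Render { app_id: u32 },
    /// Step 2: the slug is retired, answer with a 301 to this location.
    Redirect { location: String },
    /// Step 3: unknown slug, answer with a 404.
    NotFound,
}

/// `apps` maps current slug → app id; `history` maps retired slug → current app id.
pub fn lookup_slug(
    apps: &HashMap<String, u32>,
    history: &HashMap<String, u32>,
    slug: &str,
    rest: &str,
) -> SlugLookup {
    if let Some(&app_id) = apps.get(slug) {
        return SlugLookup::Render { app_id };
    }
    if let Some(&app_id) = history.get(slug) {
        // Find the slug the target app uses today.
        if let Some((current, _)) = apps.iter().find(|(_, id)| **id == app_id) {
            return SlugLookup::Redirect {
                location: format!("/admin/apps/{current}{rest}"),
            };
        }
    }
    SlugLookup::NotFound
}
```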
Slug claim order (create or rename to a slug `S`):
1. If `S` matches a current app's slug → reject as a conflict (the usual unique-constraint error).
2. If `S` matches a row in `app_slug_history` → return a "needs confirmation" response. Dashboard surfaces the warning; on confirm, delete the history row inside the same transaction as the create/rename.
3. Otherwise → proceed normally; if this was a rename, insert the old slug into `app_slug_history`.
A rename back to an app's own retired slug is a special case: just delete the row from `app_slug_history` and don't warn.
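The claim order, including the rename-back special case, reduces to a small decision function. As before this is a sketch: `ClaimDecision` and `decide_slug_claim` are hypothetical names, app IDs are shortened to `u32`, and the database work (deleting or inserting history rows inside the transaction) is left to the caller.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum ClaimDecision {
    /// Step 1: the slug belongs to a current app.
    Conflict,
    /// Step 2: the slug is retired and redirects to another app; ask the operator.
    NeedsConfirmation { redirects_to: u32 },
    /// Special case: an app renames back to its own retired slug; drop the
    /// history row without a warning.
    ReclaimOwnRetired,
    /// Step 3: the slug is free.
    Proceed,
}

/// `apps` maps current slug → app id; `history` maps retired slug → current app id.
/// `claimant` is `Some(id)` for a rename and `None` for a newly created app.
pub fn decide_slug_claim(
    apps: &HashMap<String, u32>,
    history: &HashMap<String, u32>,
    claimant: Option<u32>,
    slug: &str,
) -> ClaimDecision {
    if apps.contains_key(slug) {
        return ClaimDecision::Conflict;
    }
    match history.get(slug) {
        Some(&owner) if claimant == Some(owner) => ClaimDecision::ReclaimOwnRetired,
        Some(&owner) => ClaimDecision::NeedsConfirmation { redirects_to: owner },
        None => ClaimDecision::Proceed,
    }
}
```

On `Proceed` for a rename, the caller would still insert the app's previous slug into `app_slug_history`, as step 3 describes.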
### API URL Layout
The HTTP API stays **flat**:
```
GET /api/v1/admin/apps — list apps
POST /api/v1/admin/apps — create app
GET /api/v1/admin/apps/{id_or_slug} — get app
PATCH /api/v1/admin/apps/{id_or_slug} — update app
DELETE /api/v1/admin/apps/{id_or_slug} — delete app
GET /api/v1/admin/apps/{id_or_slug}/domains — list/manage domain claims
POST /api/v1/admin/apps/{id_or_slug}/domains
DELETE /api/v1/admin/apps/{id_or_slug}/domains/{domain_id}
GET /api/v1/admin/scripts — list scripts (now supports ?app={id_or_slug} filter)
GET /api/v1/admin/scripts/{id} — unchanged; script IDs are globally unique
... (rest of scripts/routes endpoints unchanged)
```
The scripts and routes endpoints keep their existing shape — this avoids forcing API consumers to a v2 migration. The new app-management endpoints are additive. Clients that want app context can use the `?app=` filter.
---
## 12. Development Roadmap
### Phase 1: MVP ✓ (Current)
- [x] Orchestrator: REST API for script CRUD + execute
- [x] Executor image: load + run Rhai script
- [x] Dashboard: upload script, deploy, delete
- [x] PostgreSQL: script storage + execution logs
- [ ] **Timeline**: 4-6 weeks
### Phase 1: MVP ✓ (Shipped)
- [x] Manager: REST API for script CRUD + executions log
- [x] Orchestrator: HTTP ingress, route resolution, dispatch
- [x] Executor: embedded Rhai engine with sandbox limits (replaces the original Docker-per-execution model — embedded gives better latency and less infra)
- [x] Dashboard (SvelteKit): script upload, edit, routing config, execution log viewer
- [x] PostgreSQL: scripts, routes, execution_logs; embedded migrations
- [x] Caddy reverse proxy in front of everything
**Deliverables:**
- Docker image for executor
- Rust binary (Orchestrator)
- Static HTML + Alpine.js dashboard
- docker-compose.yml for local/prod deployment
**Delivered beyond original MVP scope:** custom routing (exact / prefix / param + host-aware) with conflict detection, per-script Rhai sandbox config, four-tab dashboard detail UI, structured versioning scheme (product + SDK + API + schema + wire) with `/version` self-report, Rhai editor with autocomplete / goto / find-usages / formatter, SDK contract + schema snapshot + integration test suites.
---
### Phase 2: v1.0 (Polish & Usability)
- Script versioning + rollback
- Execution history dashboard (view logs, timings, errors)
- Better error messages (script parse errors, timeouts)
- Timeout/resource limit enforcement
- Container cleanup/GC
- Rhai SDK: `request()` function fully documented
### Phase 2: v1.0 (Polish & Usability) ✓ (Shipped)
- [x] Execution history dashboard
- [x] Better error messages (Rhai parse errors, sandbox limits, timeouts)
- [x] Timeout / resource-limit enforcement (per-script sandbox config)
- [x] Rhai SDK docs current through SDK 1.1
**Timeline**: 2-3 weeks
(Script versioning + rollback remains deferred — see Phase 6.)
---
### Phase 3: v1.1 (Expand Capabilities & Services)
- Queue-based triggers (RabbitMQ / Redis)
- Scheduled jobs (cron syntax)
- Secrets management (encrypted env vars)
- **Rhai SDK: KV Store** (`kv.get()`, `kv.set()`, `kv.delete()` with collections)
- **Rhai SDK: Document Store** (`docs.create()`, `docs.find()`, `docs.update()`, `docs.delete()` with schema validation)
- **Rhai SDK: User Management** (auth, CRUD, roles, permissions, invitations, password reset)
- **Rhai SDK: Email** (`email.send(to, subject, body)` via SMTP)
- Rhai SDK: `s3.*`, `queue.*`, `invoke()`, `retry.*()`
- External HTTP calls from scripts (`http.get()`, `http.post()`)
- Script versioning with automatic rollback on error
### Phase 3: v1.0.x — Foundations (Current focus)
**Timeline**: 8-10 weeks
Two foundation pieces that must land before the v1.1 service expansion, because retrofitting them later is expensive.
**3a. Admin auth** — see section 11.4. Per-user `admin_users` (not a shared secret), Argon2id passwords, env-var bootstrap of the first admin, session-token doubling as bearer token for API. No roles in this cut; schema is forward-compatible with later RBAC.
**3b. Multi-app scoping** — see section 11.5. Introduce `apps`, `app_domains`, and `app_id` columns on `scripts` and `routes`. Migration assigns existing data to a `default` app (or seeds a `Hello World` app on fresh installs). Orchestrator dispatch becomes two-phase (Host → app → route). Reserved internal domain (`__internal__`) keeps `/api/v1/execute/{id}/*` working for app scripts without requiring a public hostname. Dashboard becomes app-hierarchical (`/admin/apps/{slug}/...`); API keeps its existing flat shape with new app-management endpoints under `/api/v1/admin/apps/*`.
**Why both before v1.1**: every v1.1 service (KV, docs, users, etc.) needs an `app_id` scoping key in its schema. Adding it now, with one small migration on existing tables, is cheap. Adding it after those services ship is several migrations on populated data.
---
### Phase 4: v1.2 (Advanced Workflows & Hierarchies)
### Phase 4: v1.1 (Expand Capabilities & Services)
Ordered roughly by foundation value: each row enables the rows below it.
1. **Rhai SDK: KV Store** (`kv.get/set/delete/has` with collections, scoped per app)
2. **Rhai SDK: Document Store** (`docs.create/find/update/delete/list/query`, scoped per app)
3. **Rhai SDK: HTTP** (`http.get/post/put/delete` with SSRF deny-list)
4. **Cron triggers** (manager scheduler skeleton already exists; needs schedules table + `FOR UPDATE SKIP LOCKED` dispatch)
5. **Rhai SDK: Email** (`email.send` via SMTP; needs per-deploy config)
6. **Rhai SDK: User Management** (auth, CRUD, roles, permissions, invitations, password reset; depends on email for invites; scoped per app)
7. **Queue triggers** (start with Postgres LISTEN/NOTIFY; RabbitMQ/Redis later if needed)
8. **`invoke()` + `retry::*`** (function-to-function calls; execution_logs gain `parent_execution_id`)
9. **Secrets management** (encrypted env vars, per app)
---
### Phase 5: v1.2 (Advanced Workflows & Hierarchies)
- Function workflows (DAG execution, conditional branching, error handling)
- Function hierarchy (parent/child invocation, sync/async calls)
- Nested workflows
- Call graph visualization + execution tracing
- Advanced query support for document store (`docs.query()` with filters)
**Timeline**: 6-8 weeks
- Advanced query support for document store (`docs.query()` with filters: `$gt`, `$or`, etc.)
- Service interceptors (see section 9.4)
---
### Phase 5: v1.3+ (Scaling, Security, Observability)
- Multi-user / project namespacing
### Phase 6: v1.3+ (Scaling, Security, Observability)
- Cluster mode (split-process manager + per-node orchestrator + executor); cluster-mode wire protocol versioning
- Cross-app data sharing (explicit export/import model — see section 11.5)
- Script versioning + rollback (keep N historical versions in a side table; rollback endpoint)
- Rate limiting on endpoints
- Auth (API keys, dashboard login)
- Auth (richer model: API keys, OAuth, etc.)
- Metrics + monitoring dashboard
- Container pooling / warm starts
- Distributed tracing (OpenTelemetry)
- Webhooks for execution events
- S3 integration (object storage reads/writes)