Two review-pass nits from the v1.1.0-foundation review:
- Blueprint §6 Tech Stack table still listed the database as
"PostgreSQL + hstore" with an hstore-for-KV rationale — directly
contradicting the §8.1 KV rewrite that explicitly rejected hstore
in favour of JSONB. Updates the row so the high-level summary
matches the §8.1 reasoning.
- LocalExecutorClient::execute now documents the permit-vs-timeout
interaction: when tokio::time::timeout fires the future drops and
the permit returns, but the detached spawn_blocking thread keeps
running until the Rhai script winds down. In-use blocking threads
can briefly exceed the gate's permit count after a timeout. Calling
it out so future readers don't read the implementation as buggy.
No behaviour change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Project Blueprint: Lightweight Event-Based Serverless Cloud

**Status**: Phase 4 — Blueprint Complete
**Last Updated**: 2026-05-27
**Audience**: Solo developer (DIY self-hosted)

---

## 1. Project Overview

### Vision
A lightweight, self-hosted, event-driven compute platform that allows developers to deploy and trigger Rhai scripts via HTTP endpoints. Scripts run in isolated containers, scale to zero when idle, and return structured responses. Optimized for resource efficiency on consumer hardware (< 100 functions).

### Core Value Proposition
- **Simple deployment**: Upload a Rhai script, get an HTTP endpoint
- **Minimal overhead**: Containers spawn on-demand, no persistent services running
- **DIY-friendly**: Run on modest hardware (single server, RPi-adjacent)
- **Extensible**: Pluggable storage, compute, and messaging later

### MVP Scope
**In Scope:**
- Dashboard: script upload + metadata (name, description, version, config)
- REST API: script CRUD operations
- HTTP-triggered script execution
- Request → Rhai script → JSON response
- PostgreSQL for script storage
- Docker for isolated execution
- Execution logs and basic observability

**Out of Scope (v1.1+):**
- Queue-based triggers
- Scheduled jobs (cron)
- Multi-user/projects
- External HTTP calls from scripts
- Metrics dashboards
- Secrets management
- Script versioning/rollback

### Success Criteria
1. Deploy a Rhai script in < 1 minute
2. Script responds to HTTP requests within 500ms (p95)
3. Runs on single modest server (2GB RAM, dual-core CPU)
4. No background services consume CPU when idle

---

## 2. Architecture Overview

### High-Level System Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│                       Self-Hosted Server                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌──────────────────────┐        ┌──────────────────────┐      │
│   │    Web Dashboard     │        │   Orchestrator API   │      │
│   │   (Alpine.js SPA)    │        │    (Rust + Axum)     │      │
│   │      Port 3000       │        │      Port 8080       │      │
│   └──────┬───────────────┘        └──────────┬───────────┘      │
│          │                                   │                  │
│          │ Upload script                     │ HTTP requests    │
│          │ Manage scripts                    │ Script metadata  │
│          │                                   │                  │
│          └────────────────┬──────────────────┘                  │
│                           │                                     │
│                   ┌───────▼────────┐                            │
│                   │   PostgreSQL   │                            │
│                   │ (scripts, MD)  │                            │
│                   └───────┬────────┘                            │
│                           │                                     │
│          ┌────────────────┼────────────────┐                    │
│          │                │                │                    │
│     ┌────▼─────┐     ┌────▼─────┐     ┌────▼─────┐              │
│     │Container │     │Container │     │Container │              │
│     │ Instance │     │ Instance │     │ Instance │  (on-demand) │
│     │(Rhai Ex.)│     │(Rhai Ex.)│     │(Rhai Ex.)│              │
│     └────┬─────┘     └────┬─────┘     └────┬─────┘              │
│          │                │                │                    │
│          └────────────────┼────────────────┘                    │
│                           │                                     │
│                 ┌─────────▼─────────┐                           │
│                 │   Docker Daemon   │                           │
│                 │ (container mgmt)  │                           │
│                 └───────────────────┘                           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### Data Flow: HTTP Request → Response

1. **HTTP Request** arrives at Orchestrator (`POST /api/execute/{script_id}`)
2. **Orchestrator** fetches script from PostgreSQL
3. **Docker daemon** spawns container from pre-built executor image
4. **Container startup** loads script into Rhai runtime + passes request context
5. **Rhai script** executes, processes request, returns JSON object
6. **Orchestrator** extracts `statusCode`, `headers`, `body` from response
7. **HTTP Response** sent to client
8. **Container** is destroyed (scale to zero)

---

## 3. Core Components

### 3.1 Orchestrator Service
**Language**: Rust
**Framework**: Axum
**Port**: 8080 (default)

**Responsibilities:**
- HTTP server (REST API for script management + trigger)
- Script lifecycle: fetch, validate, store
- Container orchestration: spawn, monitor, cleanup
- Request/response marshalling
- Error handling & logging

**Key Endpoints (MVP):**
- `POST /api/scripts` — upload script
- `GET /api/scripts` — list all scripts
- `DELETE /api/scripts/{id}` — delete script
- `POST /api/execute/{script_id}` — trigger script execution (with request body/headers)

**Internal Tasks:**
- Periodically clean up orphaned containers (optional, for MVP just GC on startup)
- Log execution events to stdout/logs

---

### 3.2 Executor Container Image
**Base**: `alpine:latest`
**Contents**:
- Rhai runtime (compiled binary or via package manager)
- Minimal libc (musl on Alpine)
- Script loader + executor wrapper
- Logging utilities

**Startup Flow:**
```bash
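# Sketch: SCRIPT_CONTENT and REQUEST_JSON are assumed to arrive as env vars
# (the design also allows stdin). Abort early if either is missing.
: "${SCRIPT_CONTENT:?script source not provided}"
: "${REQUEST_JSON:?request context not provided}"

# Write the script to disk, then hand it to the executor with the request.
SCRIPT_PATH=/tmp/script.rhai
printf '%s\n' "$SCRIPT_CONTENT" > "$SCRIPT_PATH"

rhai_executor --script "$SCRIPT_PATH" --request "$REQUEST_JSON"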
```

**Output**: JSON response to stdout, captured by Orchestrator

---

### 3.3 Dashboard (Web UI)
**Framework**: Alpine.js (MVP), Svelte (v1.0+)
**Port**: 3000 (default)

**Features (MVP):**
- Script upload form (file picker or textarea)
- Script metadata input (name, description, version, config)
- Config fields: timeout (s), memory limit (MB), enabled service access (DB/S3/queue/functions)
- List of deployed scripts
- Simple "Deploy" / "Delete" actions

**Technology Stack:**
- HTML + CSS + Alpine.js
- Fetch API to call Orchestrator
- No build step (initially), just serve static files

---

### 3.4 PostgreSQL Database
**Schema (MVP):**

```sql
CREATE TABLE scripts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL,
    description TEXT,
    version INT DEFAULT 1,
    script_content TEXT NOT NULL,

    -- Config
    timeout_seconds INT DEFAULT 30,
    memory_limit_mb INT DEFAULT 256,

    -- Service access (MVP: unused, future)
    access_db BOOLEAN DEFAULT false,
    access_s3 BOOLEAN DEFAULT false,
    access_queue BOOLEAN DEFAULT false,
    access_functions BOOLEAN DEFAULT false,

    -- Metadata
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),

    -- Execution tracking (MVP: optional)
    last_executed_at TIMESTAMP,
    execution_count INT DEFAULT 0
);

CREATE TABLE execution_logs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    script_id UUID REFERENCES scripts(id) ON DELETE CASCADE,
    request_path TEXT,
    request_headers JSONB,
    request_body JSONB,
    response_code INT,
    response_body JSONB,
    logs TEXT,
    duration_ms INT,
    status TEXT, -- 'success', 'timeout', 'error', etc.
    created_at TIMESTAMP DEFAULT NOW()
);
```

**Rationale:**
- Simple, relational structure
- `execution_logs` for audit trail + debugging (can be pruned later)
- JSONB for flexible config/response storage

---

## 4. Data Model

### Script Entity
```json
{
  "id": "uuid",
  "name": "Process Payment",
  "description": "Webhook handler for payment processor",
  "version": 1,
  "script_content": "let req = request();\nlet amt = req.body.amount;\n{ statusCode: 200, body: { processed: amt } }",
  "timeout_seconds": 30,
  "memory_limit_mb": 256,
  "access_db": false,
  "access_s3": false,
  "access_queue": false,
  "access_functions": false,
  "interceptors": {
    "s3": { "before_write": false },
    "documents": { "before_create": false },
    "queue": { "before_send": false }
  },
  "created_at": "2026-04-10T12:00:00Z",
  "updated_at": "2026-04-10T12:00:00Z",
  "last_executed_at": "2026-04-10T12:05:00Z",
  "execution_count": 42
}
```

### Execution Log Entity
```json
{
  "id": "uuid",
  "script_id": "uuid",
  "request_path": "/api/execute/script-123",
  "request_headers": { "content-type": "application/json" },
  "request_body": { "amount": 100 },
  "response_code": 200,
  "response_body": { "processed": 100 },
  "logs": "[12:05:10] Script started\n[12:05:11] Processing...",
  "duration_ms": 145,
  "status": "success",
  "created_at": "2026-04-10T12:05:11Z"
}
```

---

## 5. API Specification (MVP)

### 5.1 Upload Script
```
POST /api/scripts
Content-Type: application/json

{
  "name": "string",
  "description": "string",
  "script_content": "string",
  "timeout_seconds": 30,
  "memory_limit_mb": 256
}

Response: 201 Created
{
  "id": "uuid",
  "name": "...",
  ...
}
```

### 5.2 List Scripts
```
GET /api/scripts

Response: 200 OK
[
  { id: "...", name: "...", ... },
  { id: "...", name: "...", ... }
]
```

### 5.3 Delete Script
```
DELETE /api/scripts/{script_id}

Response: 204 No Content
```

### 5.4 Execute Script (via HTTP Endpoint)
```
POST /api/execute/{script_id}
Content-Type: application/json
[any headers]

[any request body]

Response: [script-returned status code]
{
  "..." : "..."
}
```

**Notes:**
- Script receives full HTTP request (path, headers, body)
- Response is script's JSON object (assumes `{ statusCode, headers, body }`)
- On error (timeout, crash): `{ statusCode: 500, body: "Server error" }`
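
The notes above can be sketched as a small normalisation step applied to the executor's stdout. This is an illustration in Python only (the Orchestrator itself is Rust); the function name `normalize_response`, the default status of 200 when `statusCode` is absent, and the empty default `headers` are assumptions made for the sketch, not documented behaviour.

```python
import json

# Fallback taken from the note above; the empty headers map is an assumption.
FALLBACK = {"statusCode": 500, "headers": {}, "body": "Server error"}


def normalize_response(raw_stdout):
    """Turn executor stdout into (status, headers, body), or the 500 fallback."""
    try:
        result = json.loads(raw_stdout)
    except (TypeError, ValueError):
        result = None  # not JSON at all (crash output, empty stdout, ...)
    if not isinstance(result, dict):
        result = FALLBACK
    status = result.get("statusCode", 200)  # assumed default when omitted
    valid = isinstance(status, int) and not isinstance(status, bool)
    if not valid or not 100 <= status <= 599:
        result, status = FALLBACK, 500
    headers = result.get("headers") or {}
    return status, headers, result.get("body")
```
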

---

## 6. Rhai SDK (MVP Stub)

For MVP, scripts have access to:

### Core Request/Response
- **ctx object**: Contains execution metadata + request data (see below)
- **Return value**: `{ statusCode: int, headers: object, body: object }`

### Context Object (Available Globally)
```rhai
// Execution metadata
ctx.execution_id          // UUID of this execution
ctx.script_id             // UUID of the script being run
ctx.script_name           // Name of the script
ctx.request_id            // Request ID for tracing
ctx.trace_id              // For call graphs (v1.2+)
ctx.invocation_type       // 'http', 'function', 'scheduled', etc.
ctx.parent_execution_id   // For function hierarchies (v1.2+)

// Request context
ctx.request.path          // HTTP path
ctx.request.headers       // HTTP headers object
ctx.request.body          // Request body (parsed JSON or raw)
```

### Structured Logging (v1.0+)
```rhai
log.info("Processing order", #{ order_id: 123, user: "alice" });
log.warn("Rate limit approaching", #{ remaining: 10 });
log.error("Payment failed", #{ error: "timeout", retry_count: 2 });
log.debug("Internal state", #{ state: #{ ... } });
```
**Output**: Captured in execution logs, searchable in dashboard

### Error Handling & Retry (v1.1+)
```rhai
// Retry a function with exponential backoff
let result = retry::call(
    || { invoke("process-data", #{ item: 123 }) },
    #{
        max_attempts: 3,
        backoff: "exponential", // or "linear"
        initial_delay_ms: 100,
        max_delay_ms: 5000
    }
);

// Retry an HTTP call
let response = retry::http_call(
    || { http.post("https://api.example.com/webhook", body) },
    #{
        max_attempts: 5,
        backoff: "exponential",
        on_retry: |attempt, error| {
            log.warn("Retry attempt", #{ attempt: attempt, error: error });
        }
    }
);

// Manual error handling
try {
    let data = invoke("might-fail", #{});
} catch (err) {
    log.error("Invocation failed", #{ error: err });
    return #{ statusCode: 500, body: #{ error: "Service unavailable" } };
}
```
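
The retry options above do not spell out how delays are derived, so the following Python sketch shows one plausible reading. It assumes that exponential backoff doubles `initial_delay_ms` on each retry, that linear backoff adds `initial_delay_ms` each time, that `max_delay_ms` caps every delay, and that the first attempt runs without waiting. The function name `backoff_delays` is hypothetical.

```python
def backoff_delays(max_attempts, backoff="exponential",
                   initial_delay_ms=100, max_delay_ms=5000):
    """Delays (ms) waited before each retry; the first attempt is immediate."""
    delays = []
    for retry in range(1, max_attempts):
        if backoff == "exponential":
            delay = initial_delay_ms * 2 ** (retry - 1)  # 100, 200, 400, ...
        elif backoff == "linear":
            delay = initial_delay_ms * retry             # 100, 200, 300, ...
        else:
            raise ValueError(f"unknown backoff: {backoff}")
        delays.append(min(delay, max_delay_ms))          # cap every delay
    return delays
```
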

---

## 6.1 Future: Document Schema Validation (v1.2+)

For documents, allow optional **schema definitions** similar to MongoDB:

```rhai
// Define schema when creating
docs.create("users",
    #{ name: "Alice", email: "alice@example.com" },
    #{
        schema: #{
            name: "string",
            email: "string",
            age: "number?", // optional
            tags: "array"
        }
    }
);

// Validate before update
docs.update("users", user_id,
    #{ age: 31 },
    #{ schema: #{ age: "number" } }
);
```

---

## 6.2 Example Script: Full SDK Usage

```rhai
// Get execution and request context
let user_id = ctx.request.body.user_id;

// Log start
log.info("Processing request", #{
    script: ctx.script_name,
    execution_id: ctx.execution_id
});

// Call another function with retry
let user_data = retry::call(
    || { invoke("fetch-user", #{ id: user_id }) },
    #{ max_attempts: 2, backoff: "linear" }
);

if user_data.statusCode != 200 {
    log.error("Failed to fetch user", #{ response: user_data });
    return #{ statusCode: 500, body: #{ error: "User fetch failed" } };
}

// Store in KV cache
kv.set("user-cache", `user:${user_id}`, user_data.body, 3600);

// Store in documents
let doc = docs.create("user-requests", #{
    user_id: user_id,
    request_at: "2026-04-10T12:00:00Z",
    status: "processed"
});

// Log completion
log.info("Request processed", #{
    doc_id: doc,
    user_id: user_id
});

return #{
    statusCode: 200,
    headers: #{ "Content-Type": "application/json" },
    body: #{ user: user_data.body, cached: true }
};
```

### 8.4 User Management Service
**Purpose**: Built-in user authentication, management, and invitations with secure password handling.

**PostgreSQL Schema:**
```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto; -- For password hashing

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL,
    password_salt TEXT NOT NULL,

    -- Profile
    name TEXT,
    locked BOOLEAN DEFAULT false,

    -- Roles & Permissions
    roles TEXT[] DEFAULT '{}', -- e.g., ["admin", "moderator"]
    permissions JSONB DEFAULT '{}', -- Custom permissions structure

    -- Metadata
    metadata JSONB DEFAULT '{}', -- Custom user data (profile pic URL, preferences, etc.)

    -- Audit
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    last_login_at TIMESTAMP,
    last_password_change_at TIMESTAMP
);

-- Invitations & password reset tokens
CREATE TABLE user_tokens (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID REFERENCES users(id) ON DELETE CASCADE,
    token_type TEXT NOT NULL, -- 'invite', 'password_reset', 'login_link'
    token_hash TEXT NOT NULL UNIQUE,
    expires_at TIMESTAMP NOT NULL,
    used_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_user_tokens_user_id ON user_tokens(user_id);
CREATE INDEX idx_user_tokens_type ON user_tokens(token_type);
```

**Rhai SDK (v1.1+):**

```rhai
// ===== CREATE & INVITE =====

// Create user with password
let user_id = users.create(#{
    email: "alice@example.com",
    password: "secure-password",
    name: "Alice Smith",
    roles: ["user"],
    metadata: #{ profile_pic: "https://..." }
});

// Send invite link (creates token, sends email)
users.send_invite(email); // → #{ token_sent: true, expires_in_days: 7 }

// Set password from invite/reset token
users.set_password_from_token(token, new_password); // → #{ user_id, success: true }

// ===== AUTHENTICATION =====

// Authenticate user
let user = users.authenticate(email, password);
if user {
    let user_id = user.id;
    let roles = user.roles;
} else {
    // Authentication failed
}

// Send password reset link
users.send_password_reset(email); // → #{ sent: true, expires_in_hours: 24 }

// Send login link (passwordless)
users.send_login_link(email); // → #{ sent: true, expires_in_minutes: 15 }

// Verify login link token
let user = users.verify_login_token(token);

// ===== READ & SEARCH =====

// Get user by ID
let user = users.get(user_id);

// Find user by email
let user = users.find_by_email("alice@example.com");

// Search users
let results = users.search(#{
    query: "alice", // Searches email, name
    limit: 50,
    offset: 0
});

// List users with filtering
let users_list = users.list(#{
    roles: ["admin"], // Filter by roles
    locked: false, // Include/exclude locked users
    limit: 100,
    offset: 0
});

// ===== UPDATE =====

// Update user data (except password)
users.update(user_id, #{
    name: "Alice Johnson",
    roles: ["user", "moderator"],
    metadata: #{ theme: "dark", notifications: true }
});

// Update password (requires old password or token)
users.update_password(user_id, old_password, new_password);
// → #{ success: true } or #{ error: "Wrong password" }

// ===== LOCK & DELETE =====

// Lock user (disable login)
users.lock(user_id); // → #{ success: true }

// Unlock user
users.unlock(user_id); // → #{ success: true }

// Delete user
users.delete(user_id); // → #{ success: true }

// ===== PERMISSIONS & ROLES =====

// Check if user has role
if users.has_role(user_id, "admin") {
    // Allow admin action
}

// Check if user has permission
if users.has_permission(user_id, "posts:delete") {
    // Allow deletion
}

// Grant role to user
users.add_role(user_id, "moderator");

// Revoke role
users.remove_role(user_id, "moderator");

// Set custom permissions
users.set_permissions(user_id, #{
    "posts:create": true,
    "posts:delete": false,
    "comments:moderate": true
});
```

**User Object (returned from get/auth/find):**
```json
{
  "id": "uuid",
  "email": "alice@example.com",
  "name": "Alice Smith",
  "roles": ["user", "moderator"],
  "permissions": { "posts:create": true },
  "metadata": { "theme": "dark" },
  "locked": false,
  "created_at": "2026-04-10T12:00:00Z",
  "updated_at": "2026-04-10T12:05:00Z",
  "last_login_at": "2026-04-10T11:55:00Z"
}
```

**Use Cases:**
- User registration with email verification
- Login flows (password or passwordless)
- Password reset flows
- Role-based access control (RBAC)
- User search/directory
- Account management (lock, delete)

---

| Layer | Technology | Rationale |
|-------|-----------|-----------|
| **Orchestrator** | Rust + Axum | Performance, safety, async-first; minimal overhead |
| **Dashboard** | Alpine.js + vanilla HTML/CSS | Zero dependencies, simple to deploy, fast enough for MVP |
| **Database** | PostgreSQL 15+ (`pgcrypto`) | Robust ACID database; JSONB carries data-plane values (v1.1+). See §8.1. |
| **Container Runtime** | Docker (Docker daemon) | Industry standard, simple CLI |
| **Executor Image** | Alpine Linux + Rhai | Minimal image size (~50-100MB), fast startup |
| **Scripting** | Rhai | Lightweight, embedded-friendly, safe by default |
| **Deployment** | Docker Compose (local) / systemd (production) | Simple multi-service orchestration |

---

## 11. Deployment Model (MVP)

### Local Development
```bash
# Clone repo
git clone <repo> serverless-cloud
cd serverless-cloud

# Start all services (Orchestrator + Dashboard + Postgres)
docker-compose up

# Dashboard: http://localhost:3000
# Orchestrator: http://localhost:8080
```

### Production (Single Server)
```bash
# On target machine:
# 1. Install Docker, Docker Compose
# 2. Deploy docker-compose.yml
# 3. Optionally: use systemd service to auto-restart on reboot

docker-compose -f docker-compose.prod.yml up -d
```

### docker-compose.yml (MVP Template)
```yaml
version: '3.8'
services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: serverless
      POSTGRES_USER: app
      POSTGRES_PASSWORD: changeme
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  orchestrator:
    build: ./orchestrator
    environment:
      DATABASE_URL: postgres://app:changeme@postgres:5432/serverless
      DOCKER_HOST: unix:///var/run/docker.sock
    ports:
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

  dashboard:
    image: nginx:alpine
    volumes:
      - ./dashboard/dist:/usr/share/nginx/html
    ports:
      - "3000:80"

volumes:
  postgres_data:
```

---

## 11.4 Admin Auth (Phase 3a) — Shipped

**Status**: shipped. Implementation lives in `crates/manager-core/src/{auth,auth_*,admin_user_repo,admin_session_repo,admin_users_api}.rs`; migration `0004_admin_auth.sql`.

**Purpose**: gate the admin API (`/api/v1/admin/*`) and dashboard (`/admin/*`) behind per-user authentication. Before this phase the surface was open — anyone reaching the bound port could create, edit, and delete scripts.

**Why per-user, not a shared secret**: shared admin passwords get shared between humans, leave no audit trail, and can't be revoked per-person. Per-user accounts solve all three. The initial cut deliberately stops there — no roles, no per-app permissions — because that scope is small enough to ship in a single phase without blocking Phase 3b. Roles + per-app permissions are queued for v1.3+.

### Naming: `admin_users` vs `users`

We reserve the unqualified **`users`** table for the v1.1+ Rhai SDK feature (script-level end users — see §8.4). Platform-operator accounts live in **`admin_users`**. They are different concepts and never share rows, even when a PiCloud install hosts apps that themselves run user management.

### Schema

```sql
CREATE TABLE admin_users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    username TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL, -- Argon2id (PHC string)
    is_active BOOLEAN NOT NULL DEFAULT TRUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_login_at TIMESTAMPTZ
);

CREATE TABLE admin_sessions (
    token_hash TEXT PRIMARY KEY, -- SHA-256(hex) of the bearer token; raw token only exists in the login response + cookie
    user_id UUID NOT NULL REFERENCES admin_users(id) ON DELETE CASCADE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    expires_at TIMESTAMPTZ NOT NULL,
    last_used_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX admin_sessions_user_idx ON admin_sessions (user_id);
CREATE INDEX admin_sessions_expiry_idx ON admin_sessions (expires_at);
```

`is_active` was added to the shipped cut so admins can be deactivated (login rejected, sessions wiped) without losing audit history; deletion still cascades sessions through the FK.

**Password hashing**: Argon2id with default OWASP parameters. This also resolves the v1.1+ open question about user-password hashing (§10) — the platform settles on Argon2id once, here.

### Bootstrap

On startup, if `admin_users` is empty, the manager reads `PICLOUD_ADMIN_USERNAME` plus a password from env (or a config file) and inserts the row. Two password env vars are accepted, in this precedence:

1. **`PICLOUD_ADMIN_PASSWORD_HASH`** (recommended) — pre-computed Argon2id PHC-format hash. The platform validates the string parses, then inserts it as-is. This avoids the raw password ever being written into env/compose files or process listings.
2. **`PICLOUD_ADMIN_PASSWORD`** (fallback) — raw password. The platform hashes it with Argon2id defaults and discards the raw value. Simpler for first-time setup; less ideal for committed configs.

If both are set, the hash wins and the raw value is ignored (with a warning logged). If neither is set on a fresh install, startup fails with a clear error pointing at the env vars.

**Once that bootstrap row exists, the env vars become inert** — restarting with different values does not change the password. This is deliberate: the env var is a one-time setup hatch, not a recovery backdoor (a backdoor would let anyone with systemd-unit or compose-file access override any admin's password).
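
The precedence rules above can be sketched as a single decision function. This is a Python illustration, not the shipped Rust code: the function name `resolve_bootstrap_password` is hypothetical, the `$argon2id$` prefix check is only a stand-in for real PHC parsing, and the actual Argon2id hashing of a raw password is left to the caller.

```python
def resolve_bootstrap_password(env, admin_count):
    """Return None when bootstrap is inert, else ("hash" | "raw", value)."""
    if admin_count > 0:
        return None  # env vars are ignored once any admin row exists
    pw_hash = env.get("PICLOUD_ADMIN_PASSWORD_HASH")
    raw = env.get("PICLOUD_ADMIN_PASSWORD")
    if pw_hash:
        if raw:
            print("warning: PICLOUD_ADMIN_PASSWORD ignored, hash takes precedence")
        # Stand-in for PHC parsing (assumption): only the prefix is checked here.
        if not pw_hash.startswith("$argon2id$"):
            raise ValueError("PICLOUD_ADMIN_PASSWORD_HASH is not an Argon2id PHC string")
        return ("hash", pw_hash)  # inserted as-is
    if raw:
        return ("raw", raw)  # caller hashes with Argon2id, then discards it
    raise RuntimeError("set PICLOUD_ADMIN_PASSWORD_HASH or PICLOUD_ADMIN_PASSWORD")
```

Keeping the "rows already exist" check first is what makes the env vars inert after the initial setup: no later branch can run once an admin is present.
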

Recovery is a separate manual flow:
```sh
picloud admin reset-password <username>
```
This requires shell access on the host (and therefore implies the operator already controls the box).

### Login & Session

```
POST /api/v1/admin/auth/login
{ "username": "...", "password": "..." }

→ 200 OK
Set-Cookie: picloud_session=<token>; HttpOnly; Secure; SameSite=Lax; Path=/
{ "user": { "id": "...", "username": "..." }, "token": "<token>", "expires_at": "..." }
```

Token format: opaque random string (32 bytes base64). Stored hashed; the raw value lives only in the login response and the session cookie. The same token works as a bearer credential for non-browser clients:

```
Authorization: Bearer <token>
```

One token system serves both dashboard and CLI/CI clients — no separate "API token" concept. Personal long-lived API tokens can be added later as a distinct `admin_api_tokens` table if demand appears.
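
A minimal Python sketch of the token handling described above, using only the standard library. Several details are assumptions rather than documented behaviour: the URL-safe base64 alphabet, hashing the encoded string (rather than the raw bytes) with SHA-256, and the helper names `new_session_token`, `hash_token`, and `bearer_token`.

```python
import base64
import hashlib
import secrets


def hash_token(raw):
    """Hex SHA-256 of the token string; this is what the session row stores."""
    return hashlib.sha256(raw.encode("ascii")).hexdigest()


def new_session_token():
    """Return (raw_token, token_hash). Only token_hash would be persisted."""
    raw = base64.urlsafe_b64encode(secrets.token_bytes(32)).decode("ascii")
    return raw, hash_token(raw)


def bearer_token(authorization_header):
    """Extract the token from 'Bearer <token>', or None for other schemes."""
    scheme, _, token = authorization_header.partition(" ")
    token = token.strip()
    return token if scheme.lower() == "bearer" and token else None
```

Looking a session up then means hashing the presented token and querying by `token_hash`, so a leaked database dump does not expose usable credentials.
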

**Session TTL** is a **24-hour sliding window**: each authenticated request bumps `expires_at` to `now + ttl` and `last_used_at` to `now`. The TTL itself is configurable per deploy via `PICLOUD_SESSION_TTL_HOURS` (default `24`). A separate background sweep deletes rows where `expires_at < now()`; until that sweep runs, expired rows are also rejected at auth-check time (so a stuck sweep can't extend session lifetime past expiry).
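
The sliding window can be illustrated with a small Python sketch. The in-memory `session` dict stands in for an `admin_sessions` row and the function name `touch_session` is hypothetical; the real check runs against PostgreSQL.

```python
from datetime import datetime, timedelta, timezone


def touch_session(session, now, ttl_hours=24):
    """Slide the window for a live session; return False if it has expired."""
    if session["expires_at"] < now:
        return False  # rejected at auth-check time even if the sweep is stuck
    session["last_used_at"] = now
    session["expires_at"] = now + timedelta(hours=ttl_hours)
    return True
```

Because an expired session is rejected before any update happens, a late request can never revive it, which matches the guarantee that a stuck sweep cannot extend a session past expiry.
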

Companion endpoints:
- `POST /api/v1/admin/auth/logout` — deletes the session row.
- `GET /api/v1/admin/auth/me` — returns the current authenticated user.

### Admin User Management

```
GET    /api/v1/admin/admins        — list
POST   /api/v1/admin/admins        — create ({ username, password })
GET    /api/v1/admin/admins/{id}   — get
PATCH  /api/v1/admin/admins/{id}   — update ({ username?, password?, is_active? })
DELETE /api/v1/admin/admins/{id}   — delete
```

Initial cut: every authenticated admin can call all of these. No self-elevation concerns because there are no privilege levels yet. The PATCH and DELETE handlers both refuse to leave the system with zero active admins (`422 Unprocessable Entity` with a clear message); PATCH that transitions `is_active` from true to false also wipes that user's sessions immediately.
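
The "never zero active admins" rule can be sketched as a pure check that runs before the handler touches the database. This Python version is illustrative only: `admins` is a plain dict of id to `is_active`, and the function name `guard_admin_change` and its return shape are assumptions.

```python
def guard_admin_change(admins, target_id, deactivate=False, delete=False):
    """admins maps admin id -> is_active. Returns (allowed, wipe_sessions)."""
    removes_active = bool(admins[target_id]) and (deactivate or delete)
    other_active = sum(
        1 for uid, active in admins.items() if active and uid != target_id
    )
    if removes_active and other_active == 0:
        return (False, False)  # handler would answer 422 Unprocessable Entity
    # Deactivation wipes sessions explicitly; deletion relies on the FK cascade.
    return (True, bool(deactivate and admins[target_id]))
```

In a concurrent setting the count and the update would need to happen in one transaction; the sketch ignores that and only shows the decision itself.
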

Validation: username `^[a-z0-9._-]{2,32}$`, password minimum 8 characters (no complexity rules — follows NIST 800-63B guidance).
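
As a quick illustration, the two rules translate directly into a validator. The Python below uses the regular expression exactly as stated; the function name `validate_admin_input` and the error strings are hypothetical.

```python
import re

USERNAME_RE = re.compile(r"^[a-z0-9._-]{2,32}$")
MIN_PASSWORD_LEN = 8


def validate_admin_input(username, password):
    """Return a list of validation errors; an empty list means the input is valid."""
    errors = []
    # fullmatch also rejects a trailing newline, which a bare `$` would allow.
    if not USERNAME_RE.fullmatch(username):
        errors.append("username must match ^[a-z0-9._-]{2,32}$")
    if len(password) < MIN_PASSWORD_LEN:
        errors.append("password must be at least 8 characters")
    return errors
```
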

Dashboard surface: `/admin/login` (unauthed), `/admin/admins` (user list with add / change-password / deactivate / reactivate / delete actions per row). The top-bar shows the logged-in admin and a logout button. Token is held in a Svelte store with a localStorage echo so a page refresh doesn't sign you out; cookie-based auth works in parallel for non-SPA browser hits.

### Forward Compatibility

Schema is intentionally simple so role/permission tables can be added without touching `admin_users`. Illustrative future shape:

```sql
CREATE TABLE admin_roles (
    id UUID PRIMARY KEY,
    name TEXT UNIQUE -- e.g., 'super_admin', 'app_editor', 'app_viewer'
);

CREATE TABLE admin_user_roles (
    admin_user_id UUID NOT NULL REFERENCES admin_users(id) ON DELETE CASCADE,
    role_id UUID NOT NULL REFERENCES admin_roles(id) ON DELETE RESTRICT,
    app_id UUID REFERENCES apps(id) ON DELETE CASCADE, -- NULL for global roles
    -- Not a PRIMARY KEY: that would force app_id to be NOT NULL (PostgreSQL 15+ syntax)
    UNIQUE NULLS NOT DISTINCT (admin_user_id, role_id, app_id)
);
|
||
```
|
||
|
||
Permission checks land in middleware that initially only enforces "authenticated"; the same middleware is the seam where role checks slot in later. Don't pre-build the role tables — but keep the middleware shape such that adding them is a localized change.
|
||
|
||
---
|
||
|
||
## 11.5 App Scoping (Phase 3b) — Shipped
|
||
|
||
**Status**: shipped. Implementation lives in:
|
||
- `crates/shared/src/{app,ids,script,route}.rs` — `App`, `AppDomain`, `AppId`, `app_id` fields on `Script`/`Route`/`ExecutionLog`.
|
||
- `crates/manager-core/src/{app_repo,app_domain_repo,apps_api,app_bootstrap}.rs` — repos + admin API + Hello-World seed.
|
||
- `crates/orchestrator-core/src/routing/{app_domains,pattern,table}.rs` — `AppDomainTable`, `parse_app_domain`, per-app `RouteTable`.
|
||
- Migration `0005_apps.sql`.
|
||
|
||
**Deviations from the design below**: none of substance. Two operational notes:
|
||
- The Hello-World seed lives in `crates/manager-core/seeds/hello.rhai` and is inserted by a Rust bootstrap step (`seed_hello_world_if_fresh`) rather than from the migration — keeps it testable and gives the dashboard editor real source to render. The migration always inserts the `default` app + `localhost` claim; the seed only fires when that app is otherwise empty.
|
||
- Per-app admin roles/permissions are deferred — every authenticated admin can act on every app. The middleware seam (`auth_middleware::require_admin`) is the place where role checks slot in later.
|
||
|
||
**Purpose**: PiCloud hosts multiple independent applications on one platform. Each app is the isolation boundary for scripts, routes, domains, and (later) data — App A cannot see or modify App B's resources except through HTTP calls between them.
|
||
|
||
**Why this slot**: pulled forward from the original v1.3+ "multi-user / project namespacing" bullet. Adding the `app_id` scoping dimension to schemas while the surface is small is cheap; retrofitting it after KV, docs, users, etc. ship is a multi-table migration on populated data.
|
||
|
||
### Apps Own Scripts
|
||
|
||
Every script belongs to exactly one app (`scripts.app_id`, non-null). Script IDs remain globally unique UUIDs — the API operates on script IDs directly without needing `app_id` in the URL. The dashboard nests scripts under their app in URLs (see "Dashboard URL Layout" below) but the script ID alone is still enough to resolve them server-side.
|
||
|
||
Cross-app script reuse is not done by linking. A future **duplicate-to-app** feature may copy a script's content and config into another app under a new ID, with **snapshot semantics**: the copy is independent, and changes to the original do not propagate. Genuine cross-app integration goes through HTTP calls (and, much later, an explicit export/import model for shared data).
|
||
|
||
### Apps Own Domains
|
||
|
||
Routes can no longer claim arbitrary hostnames freely. Each app declares a set of **domain claims**:
|
||
|
||
| Form | Example | Matches |
|---|---|---|
| Exact host | `app.example.com` | only that exact host |
| Single-label wildcard | `*.example.com` | one label deep: `foo.example.com`, not `a.b.example.com` |
| Parameterized | `{tenant}.example.com` | same shape as wildcard; binds `tenant` into request context |
|
||
|
||
**Syntax convention**: domain parameters use `{name}` (curly braces); route-path parameters use `:name` (colon). These are deliberately distinct so docs and conflict messages never confuse the two.
|
||
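The three claim forms in the table can be sketched as a small parser plus a host matcher. This is an illustrative sketch only: `Claim`, `parse_claim`, and `match_host` are hypothetical names (not the shipped `parse_app_domain`), and lowercasing hosts before comparison is an assumption.

```rust
/// Hypothetical in-memory form of one domain claim.
#[derive(Debug, PartialEq)]
enum Claim {
    Exact(String),
    /// `*.example.com`, stored as the parent suffix `example.com`.
    Wildcard(String),
    /// `{tenant}.example.com`, stored as parameter name plus parent suffix.
    Param { name: String, suffix: String },
}

/// A label or host with no pattern metacharacters.
fn is_plain(s: &str) -> bool {
    !s.is_empty() && !s.contains(|c: char| matches!(c, '*' | '{' | '}'))
}

/// Classifies a claim pattern; wildcards and parameters are only accepted
/// as the left-most label.
fn parse_claim(pattern: &str) -> Option<Claim> {
    let p = pattern.to_ascii_lowercase();
    if let Some((first, rest)) = p.split_once('.') {
        if first == "*" {
            return is_plain(rest).then(|| Claim::Wildcard(rest.to_string()));
        }
        if let Some(name) = first.strip_prefix('{').and_then(|f| f.strip_suffix('}')) {
            return (is_plain(name) && is_plain(rest)).then(|| Claim::Param {
                name: name.to_string(),
                suffix: rest.to_string(),
            });
        }
    }
    is_plain(&p).then(|| Claim::Exact(p.clone()))
}

/// Returns the parameter bindings when `host` satisfies the claim.
/// Wildcard and parameterized claims match exactly one extra label.
fn match_host(claim: &Claim, host: &str) -> Option<Vec<(String, String)>> {
    let host = host.to_ascii_lowercase();
    let one_label = |suffix: &str| -> Option<String> {
        let label = host.strip_suffix(suffix)?.strip_suffix('.')?;
        (!label.is_empty() && !label.contains('.')).then(|| label.to_string())
    };
    match claim {
        Claim::Exact(h) => (*h == host).then(Vec::new),
        Claim::Wildcard(suffix) => one_label(suffix).map(|_| Vec::new()),
        Claim::Param { name, suffix } => {
            one_label(suffix).map(|label| vec![(name.clone(), label)])
        }
    }
}
```

With these definitions, `{tenant}.example.com` matched against `acme.example.com` yields the binding `tenant = acme`, while `a.b.example.com` is rejected because it is two labels deep.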
|
||
Every app also implicitly carries the reserved claim `__internal__`, granting access to `/api/v1/execute/{id}/*` for that app's scripts. An app with no public domain still works for execute-by-id (and, later, cron triggers, queue triggers, etc.).
|
||
|
||
When a route is created, its host must match one of the parent app's domain claims. The dashboard's route-creation UI offers a selector populated from the app's claims rather than a free-text host field.
|
||
|
||
### Conflict Rules — Checked at Claim Time
|
||
|
||
Domain-claim collisions are detected when a domain is added to an app, not when requests arrive:
|
||
|
||
- **Exact vs identical exact** → reject ("domain already claimed").
- **Exact vs wildcard** → allowed. `foo.example.com` (App A) coexists with `*.example.com` (App B); at request time the more-specific match wins, so A handles `foo.example.com`, B handles every other subdomain.
- **Wildcard vs wildcard at the same shape** → reject. Two apps cannot both claim `*.example.com`. `{tenant}.example.com` has the same shape as `*.example.com` for this check: the parameter name is a binding, not a discriminator.
|
||
|
||
Route-conflict errors are strictly **intra-app**. A user creating a route inside App A never sees an error that references App B. The only cross-app surface is "this domain is already claimed" at domain-claim time, which is honest and unavoidable.
|
||
|
||
### Runtime Dispatch
|
||
|
||
Request handling becomes a two-phase lookup:
|
||
|
||
1. **Host → app**: pick the app whose claim most-specifically matches the request's `Host` header (exact beats wildcard; ties are impossible by the claim rules above).
2. **Path → route**: run the existing matcher, unchanged, against that app's route trie.
|
||
|
||
The orchestrator's route matcher does not learn about apps — it just operates on whichever app's table was selected in step 1. This keeps the existing conflict-detection logic intact.
|
||
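A minimal sketch of the two-phase lookup, under stated assumptions: `HostTable` and `dispatch` are hypothetical stand-ins (the real `AppDomainTable` and route trie are not shown in this document), app IDs are plain integers, the route table is a flat map rather than a trie, and the `Host` value is assumed to have had any port stripped already.

```rust
use std::collections::HashMap;

/// Illustrative host table; field names are assumptions.
#[derive(Default)]
struct HostTable {
    /// Full host -> app id.
    exact: HashMap<String, u32>,
    /// Parent suffix of `*.suffix` / `{x}.suffix` claims -> app id.
    one_label: HashMap<String, u32>,
}

impl HostTable {
    /// Phase 1: exact claims are consulted first, so they beat wildcards.
    fn resolve_app(&self, host: &str) -> Option<u32> {
        let host = host.to_ascii_lowercase();
        if let Some(app) = self.exact.get(&host) {
            return Some(*app);
        }
        // A one-label claim matches only when exactly one label precedes
        // the claimed suffix, so a single split is enough.
        let (_label, parent) = host.split_once('.')?;
        self.one_label.get(parent).copied()
    }
}

/// Phase 2: the path is looked up in the selected app's table only.
fn dispatch<'a>(
    hosts: &HostTable,
    routes: &'a HashMap<u32, HashMap<String, String>>,
    host: &str,
    path: &str,
) -> Option<&'a str> {
    let app = hosts.resolve_app(host)?;
    routes.get(&app)?.get(path).map(String::as_str)
}
```

Because phase 2 only ever sees one app's table, a route in App B is invisible to requests that resolved to App A, which is the property the paragraph above relies on.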
|
||
### Local Development
|
||
|
||
In local development, `localhost` is treated as a regular domain claimed by exactly one app; by default that is the bootstrap "default" app installed at first run. Dev and prod use the same dispatch model, so there is no second mental model to learn.
|
||
|
||
### Cross-App Data Sharing — Deferred
|
||
|
||
Per-app isolation is the **default and only mode** in the initial cut. KV collection `users` in App A is distinct from KV collection `users` in App B; App B cannot read App A's data without an HTTP endpoint that App A explicitly exposes.
|
||
|
||
A formal export/import model — where App B exports a collection under a public name and admin grants App A read or read-write access — is a future addition. Until it ships, the escape hatch is function-to-function HTTP calls. Sharing is easier to add than to retract; isolation comes first.
|
||
|
||
### Schema Sketch
|
||
|
||
```sql
CREATE TABLE apps (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    slug        TEXT NOT NULL UNIQUE,   -- URL-safe; used in dashboard paths
    name        TEXT NOT NULL,          -- display name; can be edited freely
    description TEXT,
    created_at  TIMESTAMP DEFAULT NOW(),
    updated_at  TIMESTAMP DEFAULT NOW()
);

CREATE TABLE app_domains (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    app_id     UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
    pattern    TEXT NOT NULL,   -- 'app.example.com' | '*.example.com' | '{tenant}.example.com'
    shape      TEXT NOT NULL,   -- 'exact' | 'wildcard' | 'parameterized'
    shape_key  TEXT NOT NULL,   -- normalized form for collision check (parameterized → wildcard form)
    created_at TIMESTAMP DEFAULT NOW(),

    UNIQUE (shape_key)          -- two apps cannot share the same shape-key
);

ALTER TABLE scripts ADD COLUMN app_id UUID NOT NULL REFERENCES apps(id) ON DELETE RESTRICT;
ALTER TABLE routes  ADD COLUMN app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE;

-- Existing route uniqueness checks remain unchanged; they are now scoped within an app.
```
|
||
|
||
The `UNIQUE (shape_key)` constraint enforces the "same shape" rule at the DB level. Exact-vs-wildcard coexistence is allowed because exact hosts produce a different `shape_key` from wildcards.
|
||
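To make the collision rule concrete, here is a hedged sketch of how a `shape_key` could be derived and how the unique constraint then plays out. The exact normalization used by the real migration is not shown in this document; this version assumes both one-label forms collapse to `*.suffix` and exact hosts are kept as-is. `shape_key` and `try_claim` are hypothetical helper names, and the `HashMap` stands in for the `UNIQUE (shape_key)` index.

```rust
use std::collections::HashMap;

/// Assumed normalization: `*.x` and `{name}.x` both become `*.x`;
/// exact hosts are lowercased and otherwise unchanged.
fn shape_key(pattern: &str) -> String {
    let p = pattern.to_ascii_lowercase();
    match p.split_once('.') {
        Some((first, rest))
            if first == "*" || (first.starts_with('{') && first.ends_with('}')) =>
        {
            format!("*.{}", rest)
        }
        _ => p.clone(),
    }
}

/// In-memory stand-in for the unique index: shape key -> owning app id.
/// The error deliberately names no other app, matching the rule that the
/// only cross-app surface is "this domain is already claimed".
fn try_claim(
    claims: &mut HashMap<String, u32>,
    app: u32,
    pattern: &str,
) -> Result<(), &'static str> {
    let key = shape_key(pattern);
    if claims.contains_key(&key) {
        return Err("domain already claimed");
    }
    claims.insert(key, app);
    Ok(())
}
```

Under this normalization `foo.example.com` and `*.example.com` produce different keys and can coexist, while `{tenant}.example.com` collides with an existing `*.example.com`.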
|
||
### Bootstrap & Migration
|
||
|
||
The migration's behavior **depends on whether the install already has user content**:
|
||
|
||
- **Fresh install** (no pre-existing scripts or routes): seed a **"Hello World"** app with `localhost` as its sole domain claim, a `hello.rhai` script that returns a greeting, and a `/hello` GET route. This serves as the reference example for new users: they can hit `http://localhost:<port>/hello` immediately after first boot and see something work. The seed is intentionally minimal; future iterations may flesh it out.
- **Upgrading install** (pre-existing scripts or routes): create a **"default"** app with `slug = 'default'`, `localhost` as its sole domain claim, and assign every existing script and route to it. The Hello World seed is **not** added in this case, because it would pollute the user's existing content.
|
||
|
||
The branch point is detected by inspecting whether `scripts` had any rows before the migration ran.
|
||
|
||
### Dashboard URL Layout
|
||
|
||
The dashboard is **app-hierarchical**, using the app's `slug` for human-readable URLs:
|
||
|
||
```
/admin/apps                      - app list
/admin/apps/new                  - create app
/admin/apps/{slug}               - app overview
/admin/apps/{slug}/scripts       - scripts in this app
/admin/apps/{slug}/scripts/{id}  - script detail (script ID still globally unique; slug is for breadcrumbs)
/admin/apps/{slug}/routes        - routes in this app
/admin/apps/{slug}/domains       - domain claims for this app
/admin/apps/{slug}/settings      - app settings
```
|
||
|
||
Renaming an app changes its `slug`. The previous slug stays as a **permanent redirect** to the renamed app, persisting until another app (a new app or another rename) tries to claim that retired slug. When such a collision happens, the dashboard shows a warning before letting the operator proceed: *"`old-slug` currently redirects to app `bar` — using it here will break any external links that still target the old slug."* If the operator confirms, the redirect row is dropped and the slug is reused.
|
||
|
||
Implementation sketch:
|
||
|
||
```sql
CREATE TABLE app_slug_history (
    slug           TEXT PRIMARY KEY,   -- the retired slug
    current_app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
    retired_at     TIMESTAMP DEFAULT NOW()
);
```
|
||
|
||
Slug lookup order:
1. `apps.slug = {slug}` → render the page directly.
2. `app_slug_history.slug = {slug}` → `301` redirect to `/admin/apps/{current_app.slug}/<rest>`.
3. Neither → `404`.

Slug claim order (create or rename to a slug `S`):
1. If `S` matches a current app's slug → reject as a conflict (the usual unique-constraint error).
2. If `S` matches a row in `app_slug_history` → return a "needs confirmation" response. The dashboard surfaces the warning; on confirm, delete the history row inside the same transaction as the create/rename.
3. Otherwise → proceed normally; if this was a rename, insert the old slug into `app_slug_history`.
|
||
|
||
A rename back to an app's own retired slug is a special case: just delete the row from `app_slug_history` and don't warn.
|
||
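The claim order, including the own-retired-slug special case, can be sketched as a pure decision function. `SlugClaim` and `check_slug` are hypothetical names, app IDs are plain integers, and the two maps stand in for `apps.slug` and `app_slug_history`; the transactional delete/insert steps are left to the caller.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum SlugClaim {
    /// A live app already uses this slug.
    Conflict,
    /// The slug is retired and currently redirects to another app.
    NeedsConfirmation { redirects_to: u32 },
    /// The renaming app is taking back its own retired slug: no warning.
    ReclaimOwn,
    /// The slug is free.
    Proceed,
}

/// `current`: live slug -> app id. `history`: retired slug -> app id it
/// redirects to. `renaming` is `Some(app)` for a rename, `None` for a create.
fn check_slug(
    current: &HashMap<String, u32>,
    history: &HashMap<String, u32>,
    renaming: Option<u32>,
    slug: &str,
) -> SlugClaim {
    if current.contains_key(slug) {
        return SlugClaim::Conflict;
    }
    match history.get(slug) {
        Some(&owner) if renaming == Some(owner) => SlugClaim::ReclaimOwn,
        Some(&owner) => SlugClaim::NeedsConfirmation { redirects_to: owner },
        None => SlugClaim::Proceed,
    }
}
```

Keeping the decision separate from the writes makes each of the four outcomes easy to unit-test without a database.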
|
||
### API URL Layout
|
||
|
||
The HTTP API stays **flat**:
|
||
|
||
```
GET    /api/v1/admin/apps                                   - list apps
POST   /api/v1/admin/apps                                   - create app
GET    /api/v1/admin/apps/{id_or_slug}                      - get app
PATCH  /api/v1/admin/apps/{id_or_slug}                      - update app
DELETE /api/v1/admin/apps/{id_or_slug}                      - delete app
GET    /api/v1/admin/apps/{id_or_slug}/domains              - list/manage domain claims
POST   /api/v1/admin/apps/{id_or_slug}/domains
DELETE /api/v1/admin/apps/{id_or_slug}/domains/{domain_id}

GET    /api/v1/admin/scripts                                - list scripts (now supports ?app={id_or_slug} filter)
GET    /api/v1/admin/scripts/{id}                           - unchanged; script IDs are globally unique
... (rest of scripts/routes endpoints unchanged)
```
|
||
|
||
The scripts and routes endpoints keep their existing shape — this avoids forcing API consumers to a v2 migration. The new app-management endpoints are additive. Clients that want app context can use the `?app=` filter.
|
||
|
||
---
|
||
|
||
## 11.6 Users, roles, and bearer-token auth (Phase 3.5) — ✓ Shipped
|
||
|
||
**Status**: shipped, ahead of the originally planned slot. Lives in `crates/manager-core/src/{authz,api_keys_api,api_key_repo}.rs`, the extended `auth_middleware.rs`, shared types under `crates/shared/src/auth.rs`, and migration `0006_users_authz.sql`. `can(principal, capability)` and `require(principal, capability)` are the single gate every admin handler goes through.
|
||
|
||
**Purpose**: bridge Phase 3b → Phase 4. Phase 4's v1.1 SDKs (KV, docs, HTTP, cron) each gate access on the calling principal. Without a real authorization model in place, every SDK addition has to either invent its own gate or stay open. Phase 3.5 lands `can(principal, capability)` as the single check every future SDK + admin endpoint goes through, so v1.1 work focuses on data plane shape, not on re-litigating auth.
|
||
|
||
**Why this slot**: same logic as Phase 3b. Adding a `Principal` parameter and a capability check to surfaces that don't exist yet is free; retrofitting them onto live SDK services after v1.1 ships is a refactor of every gate.
|
||
|
||
### Principal Model
|
||
|
||
One `Principal` value represents a human admin user. Service accounts (CI bots, Rhai scripts calling out) get **schema room** in this phase but no runtime support — `users.kind` style differentiation lands when Phase 4's `users.*` SDK arrives. Until then, every authenticated request resolves to exactly one admin row, whether the credential is a session cookie or a bearer API key.
|
||
|
||
```rust
pub struct Principal {
    pub user_id: UserId,             // alias of AdminUserId for the transition
    pub instance_role: InstanceRole,
    pub scopes: Option<Vec<Scope>>,  // None = cookie session (full role authority)
                                     // Some = API key (intersect with role)
    pub app_binding: Option<AppId>,  // API key bound to one app; denies other apps
}
```
|
||
|
||
### Instance Roles (one per user)
|
||
|
||
| Role | Powers |
|---|---|
| `owner` | full instance control, manage other owners, implicit `app_admin` on every app. Multiple owners allowed. |
| `admin` | create apps, invite users, implicit `app_admin` on every app. Cannot manage instance-wide settings (sandbox ceiling, etc.) or other owners. |
| `member` | invited into specific apps only. Cannot create apps, cannot invite. **Strict isolation enforced at SQL**: list endpoints filter with `WHERE app_id IN (SELECT app_id FROM app_members WHERE user_id = $1)`, so the API never returns apps a member isn't part of. |
|
||
|
||
The current Phase 3a `admin_users` rows all become `owner` via `DEFAULT 'owner'` on the new column. Multi-owner installs get a startup `tracing::warn!` listing the active owner usernames so the operator can demote extras via `PATCH /api/v1/admin/admins/{id}`.
|
||
|
||
### App-Scoped Roles (zero-to-many per user × app)
|
||
|
||
| Role | Grants |
|---|---|
| `app_admin` | settings, domain claims, delete app, **delete scripts** |
| `editor` | create + edit scripts, routes, sandbox config (no script delete) |
| `viewer` | read scripts + execution logs |
|
||
|
||
Implicit grants from instance role: every `owner` and every `admin` is `app_admin` on every app — a single-human install would otherwise have to add itself to each new app's `app_members`. Explicit `app_members` rows are the only path for `member` users.
|
||
|
||
Script **save** uses `AppWriteScript` (editor+); script **delete** uses `AppAdmin` (app_admin+). Editors can iterate on a script's source freely but cannot remove it — destructive cleanup stays with the role that also owns the app.
|
||
|
||
### Auth Methods — Same Principal, Different Extractor
|
||
|
||
Two credential types feed the same middleware:
|
||
|
||
1. **Session cookie** (Phase 3a, unchanged): `picloud_session=<token>`. Extracted by header or cookie. SHA-256 lookup against `admin_sessions.token_hash`. Sliding 24h TTL. Produces `Principal { scopes: None, app_binding: None }`.

2. **Bearer API key** (new): `Authorization: Bearer pic_<base32(32 random bytes)>`. The `pic_` prefix is the discriminator: present → API key path; absent → session path. The 8 chars immediately after `pic_` are indexed (`api_keys.prefix`); the full body after `pic_` is Argon2id-verified against each candidate's `hash`. The last-used timestamp is updated inline.
|
||
|
||
Both paths converge on the same `Principal` extension; handlers cannot tell which credential was presented unless they introspect `principal.scopes`.
|
||
|
||
### API Key Format & Storage
|
||
|
||
- Raw form: `pic_<base32(32 random bytes, no padding)>`, ~56 chars total.
- Stored: 8-char prefix + Argon2id PHC hash of the body. Raw value returned **exactly once** in the `POST /api/v1/admin/api-keys` response; never logged, never readable again.
- Optional `expires_at`. Lookup queries always filter `expires_at IS NULL OR expires_at > NOW()`.
- Optional `app_id` ("bound key"): every `App*(other_app)` capability is denied for this key, regardless of the user's role.
|
||
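The `pic_` discriminator and the indexed prefix can be sketched as a small classifier. Everything here is illustrative: `Credential` and `classify` are hypothetical names, and rejecting bodies that are not exactly 52 characters is an assumption (52 is simply the unpadded base32 length of 32 bytes, which gives the ~56-char total above once `pic_` is added). Argon2id verification and the session hash lookup are out of scope for the sketch.

```rust
#[derive(Debug, PartialEq)]
enum Credential<'a> {
    /// No `pic_` prefix: treated as a session token.
    Session(&'a str),
    /// `pic_` prefix: `prefix` is the indexed lookup key, `body` is what
    /// would be verified against the stored hash.
    ApiKey { prefix: &'a str, body: &'a str },
}

/// Unpadded base32 of 32 bytes: ceil(256 bits / 5 bits per char) = 52.
const KEY_BODY_LEN: usize = 52;

fn classify(token: &str) -> Option<Credential<'_>> {
    match token.strip_prefix("pic_") {
        // ASCII check keeps the 8-byte slice on a character boundary.
        Some(body) if body.len() == KEY_BODY_LEN && body.is_ascii() => {
            Some(Credential::ApiKey { prefix: &body[..8], body })
        }
        // Has the discriminator but is malformed: reject outright rather
        // than falling through to the session path.
        Some(_) => None,
        None if !token.is_empty() => Some(Credential::Session(token)),
        None => None,
    }
}
```

Rejecting a malformed `pic_` token instead of retrying it as a session is a design choice in this sketch; it avoids hashing and looking up a value that was clearly intended to be an API key.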
|
||
### Scope Set (intentionally narrow)
|
||
|
||
Exactly seven scopes; no further subdivision until a real use case appears:
|
||
|
||
`script:read`, `script:write`, `route:write`, `domain:manage`, `log:read`, `app:admin`, `instance:admin`
|
||
|
||
Mint-time validation rejects unknown values. Bound keys (`app_id` set) cannot carry `instance:*` scopes — the combination is irreconcilable (a bound credential cannot claim instance-wide authority) and is rejected with 422.
|
||
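Mint-time validation as described could look roughly like the following. The seven scope strings come from the list above; `MintError` and `validate_scopes` are hypothetical names, and mapping the second error to HTTP 422 is left to the caller.

```rust
/// The seven scopes listed in this section.
const KNOWN_SCOPES: [&str; 7] = [
    "script:read",
    "script:write",
    "route:write",
    "domain:manage",
    "log:read",
    "app:admin",
    "instance:admin",
];

#[derive(Debug, PartialEq)]
enum MintError {
    /// The scope string is not one of the known seven.
    UnknownScope(String),
    /// A key bound to one app asked for instance-wide authority.
    InstanceScopeOnBoundKey(String),
}

/// Validates the requested scopes for a new key. `bound_to_app` is true
/// when the request carries an `app_id`.
fn validate_scopes(scopes: &[&str], bound_to_app: bool) -> Result<(), MintError> {
    for s in scopes {
        if !KNOWN_SCOPES.contains(s) {
            return Err(MintError::UnknownScope(s.to_string()));
        }
        if bound_to_app && s.starts_with("instance:") {
            return Err(MintError::InstanceScopeOnBoundKey(s.to_string()));
        }
    }
    Ok(())
}
```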
|
||
### Effective Capability — `can(principal, capability)`
|
||
|
||
```
allow = role_grants(principal.instance_role, capability)
      ∧ (principal.scopes.is_none() ∨ required_scope(capability) ∈ principal.scopes)
      ∧ (principal.app_binding.is_none() ∨ capability.app_id() == principal.app_binding)
```
|
||
|
||
`role_grants` collapses the three sources (instance role, implicit app grants, explicit `app_members` rows) into a single yes/no. Each handler calls `state.authz.require(&principal, Capability::AppWrite(script.app_id))` after loading the resource, so the capability binds to the resource's actual `app_id` rather than to a path parameter the caller controls.
|
||
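The three-way conjunction can be written out as a short sketch. Types here are simplified stand-ins, not the shipped ones: the capability set is reduced to three variants, the capability-to-scope mapping in `required_scope` is an assumption, and the role check is passed in as a precomputed `role_grants` boolean so the sketch does not have to guess how roles map to capabilities.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Capability {
    AppRead(u32),
    AppWrite(u32),
    InstanceAdmin,
}

impl Capability {
    /// The app a capability targets, if any.
    fn app_id(self) -> Option<u32> {
        match self {
            Capability::AppRead(app) | Capability::AppWrite(app) => Some(app),
            Capability::InstanceAdmin => None,
        }
    }

    /// Assumed mapping from capability to the scope an API key must carry.
    fn required_scope(self) -> &'static str {
        match self {
            Capability::AppRead(_) => "script:read",
            Capability::AppWrite(_) => "script:write",
            Capability::InstanceAdmin => "instance:admin",
        }
    }
}

/// Reduced principal: only the two fields the scope and binding checks need.
struct Principal {
    scopes: Option<Vec<String>>,
    app_binding: Option<u32>,
}

/// `role_grants` is the already-computed answer of the role check.
fn can(principal: &Principal, cap: Capability, role_grants: bool) -> bool {
    let scope_ok = match &principal.scopes {
        None => true, // cookie session: full role authority
        Some(scopes) => scopes.iter().any(|s| s.as_str() == cap.required_scope()),
    };
    let binding_ok = match principal.app_binding {
        None => true,
        // A bound key only passes for capabilities on its own app, so
        // instance-level capabilities (app_id() == None) are denied.
        Some(app) => cap.app_id() == Some(app),
    };
    role_grants && scope_ok && binding_ok
}
```

Note that scopes and bindings can only narrow what the role already grants; neither can widen it, because all three conjuncts must hold.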
|
||
### Deactivation Symmetry
|
||
|
||
Phase 3a's `set_active(false)` wipes that user's `admin_sessions`. Phase 3.5 extends it to also set `expires_at = NOW()` on every row in `api_keys WHERE user_id = $1` — both credential surfaces become inert at the same moment, no enumeration window.
|
||
|
||
### CLI Auth Posture (forward note)
|
||
|
||
The eventual `picloud` CLI authenticates by **paste-the-token**, not OAuth: the user runs `picloud login`, the dashboard mints a fresh key (or the user mints one via `POST /api/v1/admin/api-keys`), and the CLI prompts for the raw token. The CLI binary itself is deferred; the dashboard surface and the bearer credential type land here so the CLI is a thin wrapper when it arrives.
|
||
|
||
### Schema (Migration 0006)
|
||
|
||
```sql
ALTER TABLE admin_users
    ADD COLUMN instance_role TEXT NOT NULL DEFAULT 'owner'
        CHECK (instance_role IN ('owner','admin','member')),
    ADD COLUMN email TEXT UNIQUE,
    ADD COLUMN mfa_secret TEXT;  -- reserved slot, not built

CREATE TABLE app_members (
    app_id     UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
    user_id    UUID NOT NULL REFERENCES admin_users(id) ON DELETE CASCADE,
    role       TEXT NOT NULL CHECK (role IN ('app_admin','editor','viewer')),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (app_id, user_id)
);
CREATE INDEX app_members_user_id_idx ON app_members (user_id);

CREATE TABLE api_keys (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id      UUID NOT NULL REFERENCES admin_users(id) ON DELETE CASCADE,
    hash         TEXT NOT NULL,     -- Argon2id PHC
    prefix       TEXT NOT NULL,     -- first 8 chars after `pic_`
    name         TEXT NOT NULL,
    scopes       TEXT[] NOT NULL,   -- intersected with role at check time
    app_id       UUID NULL REFERENCES apps(id) ON DELETE CASCADE,
    expires_at   TIMESTAMPTZ NULL,
    last_used_at TIMESTAMPTZ NULL,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX api_keys_prefix_idx ON api_keys (prefix);
CREATE INDEX api_keys_user_id_idx ON api_keys (user_id);

-- Reserved (not built this phase):
-- invites (token, email, instance_role, app_id, app_role, invited_by, expires_at, consumed_at)
-- service_accounts (id, name, owning_user_id, …)
```
|
||
|
||
### New Endpoints (additive — no API major bump)
|
||
|
||
```
POST   /api/v1/admin/api-keys       - { name, scopes[], app_id?, expires_at? }
                                      → 201 { …, raw_token }  (raw returned exactly once)
GET    /api/v1/admin/api-keys       - list caller's own keys (no raw)
DELETE /api/v1/admin/api-keys/{id}  - caller's own only
```
|
||
|
||
Every existing `/api/v1/admin/*` endpoint is re-gated from "any authed admin" to a specific `Capability`. Request/response shapes are unchanged; what changes is the set of callers each endpoint accepts (a `member` now gets 403 on app surfaces they're not part of, where before they would have been 401-or-200 depending only on session validity).
|
||
|
||
### App Member Management Endpoints
|
||
|
||
Exposes the `app_members` table as a first-class CRUD surface so app admins can manage who they share an app with from the dashboard, not just from SQL.
|
||
|
||
```
GET    /api/v1/admin/apps/{id_or_slug}/members
         list members (ordered by username), joined with admin_users
         for username / email / instance_role / is_active
POST   /api/v1/admin/apps/{id_or_slug}/members
         { user_id, role } → 201 enriched DTO
         409 on duplicate (promotions go through PATCH)
         422 if target user is_active = false
         422 if target user instance_role != 'member'
             (owners/admins have implicit authority; an explicit row would be dead weight)
PATCH  /api/v1/admin/apps/{id_or_slug}/members/{user_id}
         { role } → 200 enriched DTO
         404 if no existing membership
DELETE /api/v1/admin/apps/{id_or_slug}/members/{user_id}
         204 (idempotent; 204 also when missing)
```
|
||
|
||
All four are gated on `Capability::AppAdmin(app_id)`. Editors and viewers get 403 on list and never see the dashboard's Members tab.
|
||
|
||
**`my_role` on the app lookup endpoint.** `GET /api/v1/admin/apps/{id_or_slug}` now returns an additional `my_role: Option<AppRole>`, computed server-side from the principal: `Owner → app_admin`, `Admin → editor`, `Member → app_members.role`. The dashboard uses this single field to decide whether to render the Members tab (visible iff `my_role == app_admin`), keeping API and UI gate logic identical.
|
||
|
||
**No last-app-admin guard.** Unlike the last-owner protection on `admin_users`, removing the final `app_admin` row from `app_members` is allowed. Every `owner` instance-role user implicitly satisfies `Capability::AppAdmin(_)` via the top-level `role_grants` branch, so no app can become permanently orphaned — an owner can always re-issue grants. The `admin` instance role is only implicit *editor*, so it does **not** provide a fallback path; the owner guarantee alone is what makes the no-guard position safe.
|
||
|
||
**Dead-row sweep on promotion (deferred).** Promoting a user from `member` → `admin`/`owner` leaves their `app_members` rows in place. They become inert (implicit grants supersede), but are not auto-deleted. A future hook can sweep them; harmless for now.
|
||
|
||
Additive within `/api/v1/admin/...` — no API major bump per [docs/versioning.md](docs/versioning.md).
|
||
|
||
### Out of Scope (Phase 3.5)
|
||
|
||
Schema room only, not built:
|
||
|
||
- **Invites**: email-based join flow; `invites` table reserved in the migration comment block.
- **MFA / TOTP**: `mfa_secret` column reserved on `admin_users`.
- **Service accounts**: reserved as a future table; for now, every API key belongs to a human `admin_users` row.
|
||
|
||
Deferred to follow-up sessions: dashboard surfaces for invites and key minting (curl is the supported interface this phase; member management already has a dashboard tab, see above), OIDC / SAML / SCIM, the `picloud` CLI binary itself, email/SMTP delivery of invites, and audit log shipping.
|
||
|
||
---
|
||
|
||
## 12. Development Roadmap
|
||
|
||
### Phase 1: MVP ✓ (Shipped)
|
||
- [x] Manager: REST API for script CRUD + executions log
- [x] Orchestrator: HTTP ingress, route resolution, dispatch
- [x] Executor: embedded Rhai engine with sandbox limits (replaces the original Docker-per-execution model; embedded gives better latency and less infra)
- [x] Dashboard (SvelteKit): script upload, edit, routing config, execution log viewer
- [x] PostgreSQL: scripts, routes, execution_logs; embedded migrations
- [x] Caddy reverse proxy in front of everything
|
||
|
||
**Delivered beyond original MVP scope:** custom routing (exact / prefix / param + host-aware) with conflict detection, per-script Rhai sandbox config, four-tab dashboard detail UI, structured versioning scheme (product + SDK + API + schema + wire) with `/version` self-report, Rhai editor with autocomplete / goto / find-usages / formatter, SDK contract + schema snapshot + integration test suites.
|
||
|
||
---
|
||
|
||
### Phase 2: v1.0 (Polish & Usability) ✓ (Shipped)
|
||
- [x] Execution history dashboard
- [x] Better error messages (Rhai parse errors, sandbox limits, timeouts)
- [x] Timeout / resource-limit enforcement (per-script sandbox config)
- [x] Rhai SDK docs current through SDK 1.1
|
||
|
||
(Script versioning + rollback remains deferred — see Phase 6.)
|
||
|
||
---
|
||
|
||
### Phase 3: v1.0.x — Foundations ✓ (Shipped)
|
||
|
||
Three foundation pieces that must land before the v1.1 service expansion, because retrofitting them later is expensive.
|
||
|
||
**3a. Admin auth** — ✓ shipped. See section 11.4. Per-user `admin_users` (not a shared secret), Argon2id passwords, env-var bootstrap of the first admin, session-token doubling as bearer token for API. No roles in this cut; schema is forward-compatible with later RBAC.
|
||
|
||
**3b. Multi-app scoping** — ✓ shipped. See section 11.5. `apps`, `app_domains`, `app_slug_history` tables; `app_id` columns on `scripts`, `routes`, `execution_logs`. Migration assigns existing data to a `default` app and always claims `localhost`; a Rust-side bootstrap inserts a `Hello World` script + `/hello` route when the default app is empty. Orchestrator dispatch is two-phase (Host → app → route trie). `/api/v1/execute/{id}/*` continues to work without a public domain claim. Dashboard is app-hierarchical (`/admin/apps`, `/admin/apps/{slug}/...`); API stays flat with new endpoints under `/api/v1/admin/apps/*` and a `?app=` filter on script listing. Per-app admin roles deferred.
|
||
|
||
**3c. Users, roles, and bearer-token auth (Phase 3.5)** — ✓ shipped. See section 11.6. Adds `instance_role` to `admin_users` (`owner`/`admin`/`member`), `app_members` for per-app `app_admin`/`editor`/`viewer` grants, and `api_keys` for `Authorization: Bearer pic_…` credentials. Unifies cookie-session and API-key paths behind a single `can(principal, capability)` gate; list endpoints filter by membership at SQL for `member` users. Dashboard surfaces, invites, MFA, service accounts, and the `picloud` CLI binary are deferred — schema room only.
|
||
|
||
**Why all three before v1.1**: every v1.1 service (KV, docs, users, etc.) needs both an `app_id` scoping key in its schema and a `Principal` to authorize against. Adding both now is one migration each on a small surface; adding them after the SDKs ship is many migrations on populated data plus a re-gate of every SDK call.
|
||
|
||
---
|
||
|
||
### Phase 4: v1.1 (Expand Capabilities & Services) — Current focus
|
||
|
||
Released in patch steps (v1.1.0 → v1.1.8), each landing one focused capability. The split lets each release ship behind tests + docs without long-lived branches. SDK shape (handle pattern, `::` namespace, error convention, `ExecutionGate`, `SdkCallCx`, `ServiceEventEmitter` — see §7.5 and [docs/sdk-shape.md](../docs/sdk-shape.md)) is fixed in v1.1.0; every subsequent release fills in the contents without re-litigating the shape.
|
||
|
||
| Version | Capability |
|---------|------------|
| **v1.1.0** | **Foundation & Standard Library**: SDK shape (`Services` bundle, `SdkCallCx`, `ExecutionGate`, `ServiceEventEmitter` trait shape); stdlib utilities (regex, random, time, json, base64, hex, url). |
| **v1.1.1** | **Storage & Events**: KV store keyed `(app_id, collection, key)`; triggers framework (outbox + dispatcher + trigger CRUD + `ctx.event` + depth limit); KV trigger kinds. |
| **v1.1.2** | **Documents**: `docs::collection(name).create/find/update/delete/list` with `docs:*` triggers. |
| **v1.1.3** | **Modules**: `scripts.kind`, per-app resolver replaces `DummyModuleResolver`, AST cache + dep-graph invalidation. |
| **v1.1.4** | **Outbound HTTP & Scheduled Tasks**: `http::*` with SSRF deny-list; cron triggers. |
| **v1.1.5** | **Files & Messaging**: filesystem-backed blobs with `files:*` triggers; pub/sub via LISTEN/NOTIFY with `pubsub:*` triggers. |
| **v1.1.6** | **Configuration & Email**: encrypted per-app secrets; outbound `email::send` / `send_html` + inbound `email:receive` trigger. |
| **v1.1.7** | **User Management**: `users::*` for in-script CRUD, auth, roles, invites, password reset. |
| **v1.1.8** | **Durable Queues & Function Composition**: `queue::*` with `queue:receive` trigger; `invoke()` + `retry::*` (closures-as-args, re-entrant Rhai). |
|
||
|
||
---
|
||
|
||
### Phase 5: v1.2 (Advanced Workflows & Hierarchies)
|
||
- Function workflows (DAG execution, conditional branching, error handling)
- Nested workflows
- Call graph visualization + execution tracing
- Advanced query support for document store (`docs.query()` with filters: `$gt`, `$or`, etc.)
- Service interceptors (see section 9.4)
|
||
|
||
---
|
||
|
||
### Phase 6: v1.3+ (Scaling, Security, Observability)
|
||
- Cluster mode (split-process manager + per-node orchestrator + executor); cluster-mode wire protocol versioning
- Cross-app data sharing (explicit export/import model; see section 11.5)
- Script versioning + rollback (keep N historical versions in a side table; rollback endpoint)
- Rate limiting on endpoints
- Auth (richer model: API keys, OAuth, etc.)
- Metrics + monitoring dashboard
- Distributed tracing (OpenTelemetry)
- Webhooks for execution events
- S3 integration (object storage reads/writes)
|
||
|
||
---
|
||
|
||
## 7. Complete Rhai SDK Reference (MVP → v1.1+)
|
||
|
||
### Storage & Data
|
||
| Component | Methods | Availability |
|-----------|---------|--------------|
| **KV Store** | `kv.get(collection, key)`, `kv.set(collection, key, value, ttl?)`, `kv.delete(collection, key)`, `kv.has(collection, key)` | v1.1 |
| **Documents** | `docs.create(collection, data, schema?)`, `docs.find(collection, id)`, `docs.update(collection, id, data, schema?)`, `docs.delete(collection, id)`, `docs.list(collection, opts?)`, `docs.query(collection, filter?)` | v1.1 |
| **S3** | `s3.get(key)`, `s3.put(key, data)`, `s3.delete(key)`, `s3.list(prefix?)` | v1.1 |
| **Users** | `users.create(data)`, `users.get(id)`, `users.find_by_email(email)`, `users.search(query, limit, offset)`, `users.list(filters)`, `users.update(id, data)`, `users.authenticate(email, password)`, `users.update_password(id, old, new)`, `users.lock/unlock(id)`, `users.delete(id)`, `users.send_invite(email)`, `users.send_password_reset(email)`, `users.send_login_link(email)`, `users.has_role/permission(id, role/perm)`, `users.add/remove_role(id, role)` | v1.1 |
|
||
|
||
### Communication
|
||
| Component | Methods | Availability |
|-----------|---------|--------------|
| **Email** | `email.send(to, subject, body)`, `email.send_html(to, subject, html, text?)` | v1.1 |
| **HTTP** | `http.get(url, opts?)`, `http.post(url, body, opts?)`, `http.put(...)`, `http.delete(...)` | v1.1 |
|
||
|
||
### Functions & Execution
|
||
| Component | Methods | Availability |
|-----------|---------|--------------|
| **Invoke** | `invoke(function_id, args, opts?)`, `invoke_async(function_id, args)` | v1.1 |
| **Queue** | `queue.send(queue_name, message)`, `queue.send_batch(queue_name, messages)` | v1.1 |
| **Retry** | `retry::call(fn, opts)`, `retry::http_call(fn, opts)` | v1.1 |
|
||
|
||
### Observability & Context
|
||
| Component | Methods | Availability |
|-----------|---------|--------------|
| **Logging** | `log.info(msg, data?)`, `log.warn(msg, data?)`, `log.error(msg, data?)`, `log.debug(msg, data?)` | v1.0 |
| **Context** | `context().execution_id()`, `context().script_id()`, `context().request_id()`, `context().trace_id()`, `context().invocation_type()`, `context().parent_execution_id()` | v1.0+ |
|
||
|
||
### Request/Response & Context
|
||
| Component | Structure | Availability |
|-----------|-----------|--------------|
| **ctx** (global) | `ctx.execution_id`, `ctx.script_id`, `ctx.script_name`, `ctx.request_id`, `ctx.trace_id`, `ctx.invocation_type`, `ctx.parent_execution_id`, `ctx.request.path`, `ctx.request.headers`, `ctx.request.body` | MVP+ |
| **Response** | Return `{ statusCode, headers?, body }` | MVP |
|
||
|
||
## 7.5 SDK Architecture (v1.1.x foundation)
|
||
|
||
Stateful Rhai SDK services (KV, docs, HTTP, …) hang off a common shape laid down by the v1.1.0 SDK foundation PR. Full reference lives in [docs/sdk-shape.md](../docs/sdk-shape.md); this section sketches the moving parts so other sections can refer to them by name.
|
||
|
||
**`Services` bundle** (`picloud_shared::Services`): a `#[non_exhaustive]` struct constructed once at startup. v1.1.0 ships it empty; each subsequent v1.1.x PR adds one `Arc<dyn KvService>` / `Arc<dyn DocsService>` / … field. It is held on `Engine` and passed by reference to the per-call registration hook.
|
||
|
||
**Per-call context** (`picloud_shared::SdkCallCx`) — every stateful service trait method takes `&SdkCallCx` as its first non-self argument. Carries `app_id`, `Option<Principal>`, `execution_id`, `request_id`, and the `trigger_depth` / `root_execution_id` slots that the triggers framework populates. Services derive `app_id` from the cx — never from script-passed args. **That rule is the cross-app isolation boundary**; scripts cannot name another app's data.
|
||
|
||
**Handle pattern** — collection-scoped services expose `kv::collection("widgets").get("k")`, not `kv::get("widgets", "k")`. Removes the wrong-collection-name foot-gun and lets implementations cache per-collection state. `(app_id, collection, key)` is the identity tuple for KV; `(app_id, collection, id)` for docs. Collections are mandatory.
|
||
|
||
**Error convention** — throw on failure, `()` for absent, `bool` for predicates. Uniform across every v1.1.x service. Scripts opt into handling errors via Rhai's `try/catch`.
|
||
|
||
**`ExecutionGate`** (`orchestrator-core::gate::ExecutionGate`) — single global semaphore capping concurrent script executions. Default 32, override via the `PICLOUD_MAX_CONCURRENT_EXECUTIONS` env var. Non-blocking — on overflow, the orchestrator returns HTTP 503 with `Retry-After: 1` immediately. No queue. Rationale: Rhai runs under `spawn_blocking`, so unbounded concurrency would park every blocking thread and starve every other workload.
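
A non-blocking gate of this kind can be sketched with std atomics alone. This is not the real `ExecutionGate` (which is not shown in this document); `Gate`, `Permit`, `admit`, and `max_concurrent` are hypothetical names, and falling back to 32 on an unparsable or zero value is an assumption.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Illustrative non-blocking gate with a fixed number of slots.
#[derive(Clone)]
pub struct Gate {
    in_use: Arc<AtomicUsize>,
    max: usize,
}

/// Held for the duration of one execution; dropping it frees the slot.
pub struct Permit {
    in_use: Arc<AtomicUsize>,
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.in_use.fetch_sub(1, Ordering::AcqRel);
    }
}

impl Gate {
    pub fn new(max: usize) -> Self {
        Gate { in_use: Arc::new(AtomicUsize::new(0)), max }
    }

    /// Never waits: returns `None` immediately when every slot is taken.
    pub fn try_acquire(&self) -> Option<Permit> {
        let max = self.max;
        self.in_use
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n < max { Some(n + 1) } else { None }
            })
            .ok()
            .map(|_| Permit { in_use: Arc::clone(&self.in_use) })
    }
}

/// On overflow, yields the status code and `Retry-After` value described above.
pub fn admit(gate: &Gate) -> Result<Permit, (u16, &'static str)> {
    gate.try_acquire().ok_or((503, "1"))
}

/// Parses the raw env value; the fallback behavior is an assumption.
pub fn max_concurrent(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|n| *n > 0)
        .unwrap_or(32)
}
```

Because the permit is released on drop, a caller that returns early or panics still frees its slot.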
|
||
|
||
**`ServiceEventEmitter`** (`picloud_shared::ServiceEventEmitter`) — every mutating service method emits a `ServiceEvent { source, op, collection, key, payload, old_payload }`. v1.1.0 ships `NoopEventEmitter`; the real outbox-backed dispatcher lands with v1.1.1 (see 7.5.1).
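
As a rough sketch of the emitter shape: the event fields below are the ones listed above, but their concrete types, the synchronous `emit` signature, and the `RecordingEmitter` / `kv_set` helpers are assumptions made for illustration. The real trait may well be async or take the call context.

```rust
use std::sync::Mutex;

/// Field names follow the text above; the types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct ServiceEvent {
    pub source: String,
    pub op: String,
    pub collection: String,
    pub key: String,
    pub payload: Option<String>,
    pub old_payload: Option<String>,
}

/// Hypothetical synchronous version of the emitter trait.
pub trait ServiceEventEmitter: Send + Sync {
    fn emit(&self, event: ServiceEvent);
}

/// Discards every event, as the v1.1.0 placeholder is described to do.
pub struct NoopEventEmitter;

impl ServiceEventEmitter for NoopEventEmitter {
    fn emit(&self, _event: ServiceEvent) {}
}

/// Hypothetical test double that keeps emitted events in memory.
#[derive(Default)]
pub struct RecordingEmitter {
    pub events: Mutex<Vec<ServiceEvent>>,
}

impl ServiceEventEmitter for RecordingEmitter {
    fn emit(&self, event: ServiceEvent) {
        self.events.lock().unwrap().push(event);
    }
}

/// A mutating call emits exactly one event describing the change.
pub fn kv_set(
    emitter: &dyn ServiceEventEmitter,
    collection: &str,
    key: &str,
    old: Option<&str>,
    new: &str,
) {
    emitter.emit(ServiceEvent {
        source: "kv".to_owned(),
        op: "set".to_owned(),
        collection: collection.to_owned(),
        key: key.to_owned(),
        payload: Some(new.to_owned()),
        old_payload: old.map(str::to_owned),
    });
}
```

Swapping the no-op emitter for a real dispatcher then needs no change in the service code, since services only see the trait.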
|
||
|
||
### 7.5.1 Trigger architecture (sketch)
|
||
|
||
Triggers fire scripts in response to service events. Three locked properties; full design and CRUD endpoints land with v1.1.1.
|
||
|
||
1. **Async outbox**: services emit events synchronously into a Postgres outbox table; a separate dispatcher worker reads, matches them against registered triggers, and fans out script executions. Service writes don't block on trigger fan-out.
|
||
2. **Depth-limited**: each trigger-spawned execution increments `cx.trigger_depth`. The dispatcher refuses to fan out beyond a configured ceiling to prevent runaway feedback loops. `cx.root_execution_id` preserves the originating execution id for audit grouping.
|
||
3. **Trigger model**: a trigger is `(service, event, filter) → script`, stored in a `triggers` table. The filter is the dispatcher's match predicate on the emitted `ServiceEvent`.
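
The matching and depth-limit properties above can be sketched as a pure function. The outbox table and worker loop are left out, the event is simplified, and the filter is reduced to an optional collection name; `Event`, `Trigger`, and `fan_out` are all hypothetical names, and the actual filter language is not defined in this document.

```rust
/// Simplified event: only what the matcher needs.
#[derive(Debug, Clone)]
pub struct Event {
    pub source: String,     // e.g. "kv"
    pub op: String,         // e.g. "set"
    pub collection: String, // e.g. "sessions"
}

/// `(service, event, filter) -> script`, with the filter reduced to an
/// optional collection name (an assumption for this sketch).
#[derive(Debug, Clone)]
pub struct Trigger {
    pub service: String,
    pub event: String,
    pub collection_filter: Option<String>, // None matches every collection
    pub script: String,
}

impl Trigger {
    pub fn matches(&self, ev: &Event) -> bool {
        self.service == ev.source
            && self.event == ev.op
            && self
                .collection_filter
                .as_deref()
                .map_or(true, |c| c == ev.collection)
    }
}

/// Returns the scripts to run together with the depth their executions
/// would carry, or nothing once the ceiling would be exceeded.
pub fn fan_out<'a>(
    triggers: &'a [Trigger],
    ev: &Event,
    parent_depth: u32,
    max_depth: u32,
) -> Vec<(&'a str, u32)> {
    let child_depth = parent_depth + 1;
    if child_depth > max_depth {
        return Vec::new();
    }
    triggers
        .iter()
        .filter(|t| t.matches(ev))
        .map(|t| (t.script.as_str(), child_depth))
        .collect()
}
```

A script that writes to the collection it is triggered by would stop re-firing once `parent_depth` reaches the ceiling.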
|
||
|
||
### 8.1 KV Store Service
|
||
**Purpose**: Simple key-value persistence organized by collections, scoped per app and shared across script invocations and scripts within that app.
|
||
|
||
**PostgreSQL Schema:**
|
||
```sql
CREATE TABLE kv_store (
    app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
    collection TEXT NOT NULL,
    key TEXT NOT NULL,
    value JSONB NOT NULL,
    expires_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),

    PRIMARY KEY (app_id, collection, key)
);

CREATE INDEX idx_kv_app_collection ON kv_store(app_id, collection);
CREATE INDEX idx_kv_expires ON kv_store(expires_at)
    WHERE expires_at IS NOT NULL;
```
|
||
|
||
**Why JSONB + mandatory collections + `app_id` first:**
|
||
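- `(app_id, collection, key)` is the identity tuple. The PK begins with `app_id`, so every row lookup is keyed by app and the index is naturally per-app. Combined with services taking `app_id` from the call context rather than from script arguments, a script has no way to address another app's rows.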
|
||
- Collections are **mandatory** — every set / get / delete names one. The same key can legitimately live in multiple collections within one app (`sessions:abc` and `counters:abc` are distinct rows).
|
||
- JSONB carries arbitrary script-side values (nested objects, arrays) without a separate serialization step. `hstore` was considered and ruled out — it doesn't carry nested types and would force a second JSONB column the moment a script writes a structured value.
|
||
|
||
**Value-size cap:** 64 KiB per value, enforced at the service layer (script-visible error on overflow). The cap keeps KV "small fast values, not blob storage"; the v1.1.5 files SDK is the right home for large payloads.
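
A minimal sketch of how the cap and the SDK error convention (failure throws, absent yields `()`, predicates return `bool`) might look on the Rust side. `Collection` and `KvError` are hypothetical, and measuring the cap on the serialized JSON byte length is an assumption.

```rust
use std::collections::HashMap;

/// 64 KiB, as stated above.
pub const MAX_VALUE_BYTES: usize = 64 * 1024;

#[derive(Debug, PartialEq)]
pub enum KvError {
    ValueTooLarge { size: usize, max: usize },
}

/// Illustrative single-collection store holding serialized JSON strings.
#[derive(Default)]
pub struct Collection {
    rows: HashMap<String, String>,
}

impl Collection {
    /// Failure is an `Err`, which the SDK layer would surface to the
    /// script as a thrown error.
    pub fn set(&mut self, key: &str, json: &str) -> Result<(), KvError> {
        if json.len() > MAX_VALUE_BYTES {
            return Err(KvError::ValueTooLarge { size: json.len(), max: MAX_VALUE_BYTES });
        }
        self.rows.insert(key.to_owned(), json.to_owned());
        Ok(())
    }

    /// Absent is `None`, which would map to `()` on the script side.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.rows.get(key).map(String::as_str)
    }

    /// Predicates return a plain `bool`.
    pub fn has(&self, key: &str) -> bool {
        self.rows.contains_key(key)
    }
}
```

Rejecting oversized values before touching storage keeps the failure script-visible and leaves no partial write behind.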
|
||
|
||
**Rhai SDK (handle pattern — see [docs/sdk-shape.md](docs/sdk-shape.md)):**
|
||
```rhai
let sessions = kv::collection("sessions");
sessions.set("user:123", #{ token: "abc", created: "2026-04-10" });
let val = sessions.get("user:123"); // value or () if absent
sessions.delete("user:123");
sessions.set("user:123", #{ token: "xyz" }, 3600); // TTL in seconds
if sessions.has("user:123") {
    // key is present
}

// Distinct collections in one script — different handles.
let counters = kv::collection("counters");
counters.set("api:calls", 42);
```
|
||
|
||
**Use Cases:**
|
||
- Cache frequently accessed data
|
||
- Store user session state
|
||
- Counters, flags, feature toggles
|
||
- Rate limiting state (hit counts)
|
||
|
||
---
|
||
|
||
### 8.2 Document Store Service
|
||
**Purpose**: Flexible NoSQL-like storage for complex JSON documents, organized by collections.
|
||
|
||
**PostgreSQL Schema:**
|
||
```sql
CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    collection TEXT NOT NULL,
    data JSONB NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),

    UNIQUE(collection, id)
);

CREATE INDEX idx_docs_collection ON documents(collection);
CREATE INDEX idx_docs_data ON documents USING GIN(data);
```
|
||
|
||
**Rhai SDK:**
|
||
```rhai
// Create a document (Rhai object maps use the #{ ... } literal)
let doc_id = docs.create("users", #{
    name: "Alice",
    email: "alice@example.com",
    tags: ["vip", "beta"]
});

// Find by ID
let user = docs.find("users", doc_id);

// Update document
docs.update("users", doc_id, #{
    last_login: "2026-04-10T12:00:00Z"
});

// Delete document
docs.delete("users", doc_id);

// Query by field (simple equality, v1.2+ advanced queries)
let admins = docs.query("users", #{ role: "admin" });

// List all in collection (with pagination)
let all_users = docs.list("users", #{ limit: 100, offset: 0 });
```
|
||
|
||
**Use Cases:**
|
||
- User profiles, orders, transactions
|
||
- Event log / audit trail
|
||
- Content (posts, articles, comments)
|
||
- Configuration documents
|
||
- Workflow state
|
||
|
||
---
|
||
|
||
### 8.3 Email Service
|
||
**Purpose**: Send outgoing emails via SMTP.
|
||
|
||
**Configuration (stored in orchestrator config):**
|
||
```yaml
email:
  smtp_host: "smtp.gmail.com"
  smtp_port: 587
  smtp_user: "your-email@gmail.com"
  smtp_password: "app-password" # Or from secrets manager
  from_address: "noreply@yourdomain.com"
  from_name: "Serverless Cloud"
```
|
||
|
||
**Rhai SDK:**
|
||
```rhai
// Simple send
email.send(#{
    to: "user@example.com",
    subject: "Welcome!",
    body: "Hello, welcome to our service."
});

// HTML body
email.send(#{
    to: "user@example.com",
    subject: "Welcome!",
    html: "<h1>Welcome!</h1><p>Hello user.</p>",
    text: "Welcome! Hello user." // Fallback
});

// With CC, BCC, reply-to
email.send(#{
    to: "user@example.com",
    cc: "admin@example.com",
    bcc: "archive@example.com",
    reply_to: "support@example.com",
    subject: "Notification",
    body: "..."
});

// Template-like (basic string interpolation)
let name = ctx.request.body.name;
email.send(#{
    to: ctx.request.body.email,
    subject: `Welcome, ${name}!`,
    body: `Hi ${name},\n\nWelcome to our service.`
});
```
|
||
|
||
**Use Cases:**
|
||
- Welcome emails on sign-up
|
||
- Notifications (password reset, order status)
|
||
- Alerts from scripts
|
||
- Digest emails from queued data
|
||
|
||
---
|
||
|
||
## 9. v1.2+ Future Vision: Workflows & Hierarchies
|
||
|
||
### 9.1 Function Workflows (DAG Execution)
|
||
**Concept**: Chain multiple functions together in a directed acyclic graph (DAG).
|
||
|
||
**Example:**
|
||
```
Function A (process raw data)
        ↓
Function B (validate data)
        ↓
Function C (store in DB + send notification)
```
|
||
|
||
**Workflow Definition (YAML, v1.2+):**
|
||
```yaml
name: "data-pipeline"
description: "Process, validate, store data"

steps:
  - name: "process"
    function: "process-raw-data"
    input: "{{ trigger.body }}"

  - name: "validate"
    function: "validate-data"
    input: "{{ steps.process.output }}"
    on_error: "fail" # or "skip", "retry"

  - name: "store"
    function: "store-and-notify"
    input: "{{ steps.validate.output }}"
    timeout: 60
    retry:
      attempts: 3
      backoff: "exponential"

output: "{{ steps.store.output }}"
```
|
||
|
||
**Features:**
|
||
- Sequential execution (A → B → C)
|
||
- Parallel execution (B & C in parallel after A)
|
||
- Conditional branching (if A succeeds, run B; else run C)
|
||
- Error handling (fail fast, skip, retry with backoff)
|
||
- Data passing between steps (output of A → input of B)
|
||
- Workflow state tracking + execution history
|
||
- Timeout per step + total timeout
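
To make the sequential case and the error-handling modes concrete, here is a small sketch of a runner. It covers only sequential execution, data passing, and `fail` / `skip` / `retry`; parallel steps, branching, and timeouts are left out. All names are hypothetical, and treating `skip` as "pass the previous output through unchanged" is an assumption, since the blueprint does not define it.

```rust
use std::time::Duration;

/// Per-step error policy, mirroring the `on_error` / `retry` options above.
#[derive(Clone, Copy)]
pub enum OnError {
    Fail,
    Skip,
    Retry { attempts: u32 },
}

pub struct Step<'a> {
    pub name: &'a str,
    /// Takes the previous step's output and returns this step's output.
    pub run: Box<dyn FnMut(&str) -> Result<String, String> + 'a>,
    pub on_error: OnError,
}

/// Exponential backoff: base, 2x base, 4x base, ... (attempt starts at 0).
pub fn backoff_delay(base: Duration, attempt: u32) -> Duration {
    base.saturating_mul(2u32.saturating_pow(attempt))
}

/// Runs the steps in order, feeding each output into the next step.
pub fn run_workflow(steps: &mut [Step<'_>], trigger_body: &str) -> Result<String, String> {
    let mut input = trigger_body.to_owned();
    for step in steps.iter_mut() {
        let max_tries = match step.on_error {
            OnError::Retry { attempts } => attempts.max(1),
            _ => 1,
        };
        let mut outcome = Err(String::from("step did not run"));
        for _attempt in 0..max_tries {
            outcome = (step.run)(&input);
            if outcome.is_ok() {
                break;
            }
            // A real runner would wait for `backoff_delay(..)` before retrying.
        }
        match (outcome, step.on_error) {
            (Ok(out), _) => input = out,
            (Err(_), OnError::Skip) => {} // assumption: keep the previous output
            (Err(e), _) => return Err(format!("step {} failed: {}", step.name, e)),
        }
    }
    Ok(input)
}
```

Keeping each step as a function from input to output is what makes the `{{ steps.<name>.output }}` style of data passing straightforward to track.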
|
||
|
||
**Schema (PostgreSQL):**
|
||
```sql
CREATE TABLE workflows (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL UNIQUE,
    description TEXT,
    definition JSONB NOT NULL, -- YAML parsed as JSON
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE workflow_executions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    workflow_id UUID REFERENCES workflows(id),
    status TEXT, -- 'pending', 'running', 'success', 'failed'
    steps_state JSONB, -- { "process": { output: ... }, "validate": { output: ... } }
    error_message TEXT,
    started_at TIMESTAMP,
    completed_at TIMESTAMP
);
```
|
||
|
||
---
|
||
|
||
### 9.2 Function Hierarchy (Parent/Child Invocation)
|
||
**Concept**: Functions can invoke other functions and wait for results (like microservice calls).
|
||
|
||
**Example:**
|
||
```
Parent Function A
├─ Child Function B (sync call, waits)
├─ Child Function C (sync call, waits)
└─ Child Function D (async, fire-and-forget)
```
|
||
|
||
**Rhai SDK:**
|
||
```rhai
// Synchronous invoke (waits for result)
let result_b = invoke("function-b", #{ param: "value" });
let result_c = invoke("function-c", #{ param: "value" });

// Process results
if result_b.statusCode == 200 {
    let data = result_b.body;
    // ... process
}

// Asynchronous invoke (fire-and-forget)
invoke_async("function-d", #{ param: "value" });

// Invoke with timeout
let result = invoke("function-b", #{ param: "value" }, #{ timeout: 30 });
```
|
||
|
||
**Orchestrator Behavior:**
|
||
- Parent function execution starts container
|
||
- Child function invocation: spawn new container (nested execution)
|
||
- Sync: parent waits; async: parent continues
|
||
- Error handling: propagate up or catch locally
|
||
- Timeout cascading: child timeout ≤ parent timeout
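
The timeout-cascading rule can be sketched as a small calculation: a child never gets more time than the parent has left. The function names and the depth ceiling of 8 are hypothetical; the blueprint leaves the actual limit as an open question.

```rust
use std::time::Duration;

/// Hypothetical ceiling on nested invocations.
pub const MAX_CALL_DEPTH: u32 = 8;

/// Effective timeout for a child call: the requested value capped by the
/// parent's remaining budget. `None` means the parent has no time left.
pub fn child_timeout(
    parent_timeout: Duration,
    parent_elapsed: Duration,
    requested: Option<Duration>,
) -> Option<Duration> {
    let remaining = parent_timeout.checked_sub(parent_elapsed)?;
    if remaining.is_zero() {
        return None;
    }
    Some(match requested {
        Some(r) => r.min(remaining),
        None => remaining,
    })
}

/// Depth of a child execution, refused once the ceiling would be exceeded.
pub fn child_depth(parent_depth: u32) -> Result<u32, String> {
    let next = parent_depth + 1;
    if next > MAX_CALL_DEPTH {
        Err(format!("call depth {} exceeds limit {}", next, MAX_CALL_DEPTH))
    } else {
        Ok(next)
    }
}
```

For example, a parent with a 30 s timeout that has already run for 10 s can grant a child at most 20 s, even if the script asked for 60 s.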
|
||
|
||
**Call Graph Tracking:**
|
||
```
Function Execution Tree:
parent-func-exec-123
├─ child-b-exec-456 (sync, 200ms)
├─ child-c-exec-789 (sync, 500ms)
└─ child-d-exec-012 (async, initiated)

Total execution: 700ms (sum of the sync child times; the async child is not awaited)
```
|
||
|
||
**Schema (PostgreSQL):**
|
||
```sql
-- PostgreSQL needs one ADD COLUMN clause per column
ALTER TABLE execution_logs
    ADD COLUMN parent_execution_id UUID REFERENCES execution_logs(id),
    ADD COLUMN invocation_type TEXT, -- 'http', 'parent_sync', 'parent_async'
    ADD COLUMN call_depth INT DEFAULT 0; -- Track nesting level

CREATE INDEX idx_execution_parent ON execution_logs(parent_execution_id);
```
|
||
|
||
---
|
||
|
||
### 9.4 Service Interceptors & Middleware (v1.2+)
|
||
|
||
**Concept**: A script can act as middleware to intercept and validate/transform service operations before they execute.
|
||
|
||
**Use Cases:**
|
||
- Auth function intercepts S3 writes: validate user permissions
|
||
- Audit function intercepts document updates: log all mutations
|
||
- Rate-limiting function intercepts queue sends: enforce quotas
|
||
- Data validation function intercepts DB operations: enforce schema
|
||
|
||
**Script Configuration (at upload):**
|
||
```json
{
  "name": "auth-interceptor",
  "description": "Authorize S3 writes",
  "version": 1,
  "script_content": "...",

  "interceptors": {
    "s3": {
      "before_write": true,
      "before_read": false
    },
    "queue": {
      "before_send": true
    },
    "documents": {
      "before_create": true,
      "before_update": true,
      "before_delete": true
    },
    "kv": {
      "before_set": false,
      "before_delete": false
    }
  }
}
```
|
||
|
||
**Interceptor Script Execution:**
|
||
When another script calls `s3.put("bucket", "key", data)`:
|
||
1. Orchestrator checks if any interceptor is registered for `s3.before_write`
|
||
2. If yes, spawn interceptor script with context:
|
||
```rhai
ctx.operation = #{
    service: "s3",
    action: "write",
    bucket: "bucket",
    key: "key",
    caller_script_id: "...",
    caller_execution_id: "..."
};
ctx.data = #{ /* ... */ }; // The data being written
```
|
||
3. Interceptor script returns: `{ allowed: true/false, reason: "...", data: {...} }`
|
||
4. If `allowed: false`, reject the operation → error to caller
|
||
5. If `allowed: true`, use potentially modified `data` → execute `s3.put()`
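
The decision flow in the steps above can be sketched on the orchestrator side as follows. `Decision` mirrors the `{ allowed, reason, data }` return shape; `guarded_write`, the string payload type, and the default denial message are assumptions, and only a single interceptor is handled because chaining order is still an open question.

```rust
/// Mirrors the interceptor's return value.
#[derive(Debug, Clone, PartialEq)]
pub struct Decision {
    pub allowed: bool,
    pub reason: Option<String>,
    /// Possibly rewritten payload; `None` keeps the original.
    pub data: Option<String>,
}

/// Runs the optional interceptor and returns the payload to write,
/// or the error to hand back to the calling script.
pub fn guarded_write(
    interceptor: Option<&dyn Fn(&str) -> Decision>,
    data: &str,
) -> Result<String, String> {
    // Step 1: no interceptor registered, so the operation proceeds untouched.
    let Some(run) = interceptor else {
        return Ok(data.to_owned());
    };
    // Steps 2-3: run the interceptor and read its decision.
    let decision = run(data);
    if !decision.allowed {
        // Step 4: reject and surface the reason to the caller.
        return Err(decision
            .reason
            .unwrap_or_else(|| String::from("denied by interceptor")));
    }
    // Step 5: use the possibly modified data.
    Ok(decision.data.unwrap_or_else(|| data.to_owned()))
}
```

Returning the payload from the guard, rather than mutating it in place, keeps the "interceptor may transform the data" case explicit at the call site.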
|
||
|
||
**Interceptor Script Example:**
|
||
```rhai
// Auth interceptor for S3
let user_id = ctx.request.body.user_id;
let key = ctx.operation.key;

// Check if user owns this key (`has` returns a bool; `get` would return () when absent)
let allowed = kv::collection("permissions").has(`user:${user_id}:s3:${key}`);

if allowed {
    log.info("S3 write authorized", #{ user_id: user_id, key: key });
    #{
        allowed: true,
        data: ctx.data // Optionally transform/add metadata
    }
} else {
    log.warn("S3 write denied", #{ user_id: user_id, key: key });
    #{
        allowed: false,
        reason: "User does not have write permission"
    }
}
```
|
||
|
||
**Availability Matrix (v1.2+):**
|
||
| Service | Before Operations |
|---------|------------------|
| **S3** | read, write, delete, list |
| **Documents** | create, read, update, delete, query |
| **KV** | set, get, delete |
| **Queue** | send, send_batch |
| **Email** | send |
| **HTTP** | get, post, put, delete |
| **Functions (invoke)** | call, call_async |
| **Users** | create, update, authenticate, lock, delete |
|
||
|
||
**Notes:**
|
||
- Inbound HTTP triggers have no before-interceptors (they're entry points); the **HTTP** row in the matrix covers outbound requests made through the HTTP SDK
|
||
- Interceptors are **per-script, opt-in** (scripts only intercept what they explicitly configure)
|
||
- An interceptor that denies an operation returns `{ allowed: false }`; the original caller receives an error
- Interceptor failures are logged in the audit trail
|
||
- **v1.3+ consideration**: Global policies / RBAC layer on top of interceptors
|
||
|
||
---
|
||
|
||
## 10. Open Questions & Notes
|
||
|
||
### Architecture
|
||
- [ ] **Container image caching**: Should we keep a warm executor image in memory between requests? (v1.1 optimization)
|
||
- [ ] **Script isolation**: Do we need process-level isolation beyond Docker (seccomp, AppArmor)?
|
||
- [ ] **Networking**: Can scripts initiate outbound connections? (deferred to v1.1)
|
||
|
||
### v1.1 Services
|
||
- [ ] **KV expiration**: Background cleanup task for expired keys, or lazy deletion?
|
||
- [ ] **Document queries**: Start with simple equality, or support complex filters (v1.2)?
|
||
- [ ] **Email retries**: If SMTP fails, retry strategy (exponential backoff)?
|
||
- [ ] **SMTP configuration**: Environment variables, config file, or dashboard UI?
|
||
- [ ] **User password hashing**: Use bcrypt, Argon2, or scrypt? What cost factor?
|
||
- [ ] **User invitations**: Email template customization? Configurable expiration?
|
||
- [ ] **Passwordless login**: Email-based or SMS-based login links?
|
||
- [ ] **Session management**: Sessions table for tracking login tokens/refresh tokens?
|
||
- [ ] **2FA/MFA**: In-scope for v1.1 or defer to v1.2?
|
||
|
||
### v1.2+ Workflows & Hierarchies
|
||
- [ ] **Workflow DAG format**: YAML, JSON, or domain-specific language (DSL)?
|
||
- [ ] **Branching logic**: Simple if/else, or complex conditions (switch/case)?
|
||
- [ ] **Workflow versioning**: Support multiple versions with rollback?
|
||
- [ ] **Call graph limits**: Max depth of nested function calls (prevent runaway recursion)?
|
||
- [ ] **Timeout cascading**: How strictly to enforce (child ≤ parent)?
|
||
- [ ] **Observability**: Generate trace IDs for call graphs, visualize in dashboard?
|
||
|
||
### v1.2+ Service Interceptors
|
||
- [ ] **Interceptor chaining**: If multiple scripts intercept same operation, execution order?
|
||
- [ ] **Performance**: Interceptor overhead on every service call — caching/optimization needed?
|
||
- [ ] **Interceptor failures**: If interceptor times out, fail the entire operation or allow bypass?
|
||
- [ ] **Circular dependencies**: Prevent interceptor A calling service that triggers interceptor B calling A?
|
||
- [ ] **Audit trail**: Log all interceptor decisions (allowed/denied) automatically?
|
||
- [ ] **Debugging**: How to trace interceptor execution in logs/dashboard?
|
||
|
||
### Rhai & SDK
|
||
- [ ] **Module loading**: Can scripts `import` external Rhai modules? (probably no for MVP)
|
||
- [ ] **File system access**: Can scripts read/write to local filesystem? (no for MVP)
|
||
- [ ] **Request/response sizes**: Max payload size? (set sensible default, e.g., 10MB)
|
||
|
||
### Operations
|
||
- [ ] **Container logs**: Capture executor stdout/stderr → attach to execution log? (yes, nice to have)
|
||
- [ ] **Script parsing errors**: Fail at upload time or runtime? (recommend: upload validation in Rhai)
|
||
- [ ] **Garbage collection**: How often to prune old execution logs? (optional MVP, monthly default)
|
||
|
||
### Future Integrations
|
||
- [ ] **Metrics backend**: Prometheus, InfluxDB, or local file?
|
||
- [ ] **Log aggregation**: ELK, Loki, or just local files?
|
||
- [ ] **Secrets backend**: Hashicorp Vault, local encrypted file, or built-in?
|
||
|
||
---
|
||
|
||
## 13. Success Metrics (MVP)
|
||
|
||
1. **Deployment ease**: Script uploaded and responding to HTTP in < 1 minute
|
||
2. **Performance**: p95 latency < 500ms (including container startup)
|
||
3. **Resource efficiency**: Server CPU/memory stays < 30% at rest, scales only on active requests
|
||
4. **Reliability**: 99.5% uptime, no memory leaks or orphaned containers
|
||
5. **Developer experience**: Dashboard feels responsive, errors are clear
|
||
|
||
---
|
||
|
||
## 14. Assumptions & Dependencies
|
||
|
||
**Assumptions:**
|
||
- Single server, modest hardware (2GB+ RAM, dual-core CPU)
|
||
- Rhai is mature enough for MVP (checked v1.12+)
|
||
- Docker daemon available on target machine
|
||
- PostgreSQL can be containerized (not separate managed service)
|
||
|
||
**Dependencies:**
|
||
- Docker (for executor runtime)
|
||
- Rust 1.70+ (for Orchestrator build)
|
||
- Rhai crate (script execution)
|
||
- Axum crate (HTTP framework)
|
||
- PostgreSQL client library (sqlx or tokio-postgres)
|
||
- Alpine Linux (executor base image)
|
||
|
||
---
|
||
|
||
## 16. Next Steps
|
||
|
||
1. **Clarify any ambiguities** in this blueprint
|
||
2. **Spike: Rhai executor image** — build minimal Alpine + Rhai image, test startup time
|
||
3. **Spike: Axum API** — scaffold REST endpoints for script CRUD
|
||
4. **Spike: PostgreSQL schema** — finalize schema, migrations
|
||
5. **Build Phase 1**: Orchestrator → Dashboard → Executor → docker-compose integration
|
||
|
||
---
|
||
|
||
## Document Control
|
||
|
||
| Version | Date | Author | Notes |
|---------|------|--------|-------|
| 1.0 | 2026-04-10 | Blueprint | MVP scope, architecture, tech stack locked |
|
||
|