docs(v1.1.x): planning notes — in-flight decisions + revised roadmap

Consolidates the architectural conversations that followed the v1.1.0 release but haven't yet landed in the blueprint or in code. Six topic areas, each with status + open calls:

1. Messaging primitives — invoke vs pub/sub vs queue, recipient model and delivery semantics
2. Universal trigger outbox — async dispatch substrate for every event source (sync HTTP excepted, see #3)
3. NATS-style sync HTTP — per-request inbox + oneshot channel lets sync HTTP ride the outbox without losing the response path
4. Dead-letter handling — separate table, dead_letter trigger kind, recursion stop rule, retention defaults
5. Realtime updates — SSE-based external subscription to per-app pub/sub topics with opt-in exposure
6. Frontend client library — hybrid model (TS lib that talks to dev-defined script endpoints, not to services)

Plus a revised v1.1.x roadmap: realtime adds at v1.1.6 (was Config & Email), shifting later items by one to v1.1.9 (was v1.1.8).

20 open calls consolidated at the bottom, numbered for reference. Document is meant to be pruned as decisions ship; deleted entirely when v1.1.9 lands.

No blueprint changes yet — those wait for the open calls to be answered and the corresponding PRs to ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 20:24:53 +02:00
parent bb88b024d2
commit 10cfde9e40
1 changed files with 502 additions and 0 deletions
--- a/docs/v1.1.x-design-notes.md
+++ b/docs/v1.1.x-design-notes.md
@@ -0,0 +1,502 @@
+# v1.1.x design notes — in-flight decisions + revised roadmap
+
+Planning document for the v1.1.x release series. Companion to:
+
+- [`serverless_cloud_blueprint.md`](../serverless_cloud_blueprint.md) — authoritative design
+- [`docs/sdk-shape.md`](sdk-shape.md) — SDK conventions (settled in v1.1.0)
+- [`docs/stdlib-reference.md`](stdlib-reference.md) — stdlib API (settled in v1.1.0)
+- [`docs/versioning.md`](versioning.md) — versioning policy (post-1.0 carve-out settled with v1.1.0)
+
+Items in this doc are either **tentatively decided but not yet shipped** or **open calls awaiting the maintainer's decision**. Once an item ships, its content moves into the blueprint and the corresponding section here gets pruned.
+
+This document was created at the v1.1.0 → v1.1.1 boundary, capturing the architectural conversations that followed v1.1.0 but haven't yet landed in code or in the blueprint.
+
+---
+
+## 1. The three messaging primitives
+
+PiCloud will expose three distinct messaging concepts. The right way to slice them is along **recipient model** and **delivery semantics**:
+
+| | Recipients | Durability | Delivery | Retry on script failure | Mental model |
+|---|---|---|---|---|---|
+| **`invoke(script_id, args)`** | One **named** script | None (or fire-and-forget durable) | At-most-once sync, or at-least-once async | Caller-controlled via `retry::*` | Function call |
+| **`pubsub::publish(topic, msg)`** | **All** scripts subscribed via trigger | Through outbox | **At-least-once per subscriber** | Per-subscriber retry up to N, then dead-letter | Fan-out broadcast |
+| **`queue::enqueue(name, msg)`** | **Exactly one** consumer wins | Durable table | **At-least-once total** | Visibility timeout + nack-on-throw | Work distribution |
+
+**Critical distinction:** pub/sub and queue both end up at-least-once, but the **subscriber model** differs. Queue: 1 message → 1 delivery record → consumers compete. Pub/sub: 1 message → N delivery records (one per subscriber) → no competition.
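+
+A minimal sketch of that difference, in Rust since the runtime is Rust-based. The `Delivery` struct and both function names are hypothetical and not part of the blueprint; the point is only that a publish yields one record per subscriber while an enqueue yields a single record that consumers compete for.
+
+```rust
+/// One unit of work the dispatcher must deliver. Field names are illustrative.
+#[derive(Debug, Clone, PartialEq)]
+struct Delivery {
+    message_id: u64,
+    /// `Some(script)` for pub/sub (fixed recipient); `None` for queue
+    /// messages, where whichever consumer claims the row first wins.
+    subscriber: Option<String>,
+}
+
+/// Pub/sub: one publish becomes one delivery record per subscriber.
+fn fan_out(message_id: u64, subscribers: &[&str]) -> Vec<Delivery> {
+    subscribers
+        .iter()
+        .map(|s| Delivery { message_id, subscriber: Some(s.to_string()) })
+        .collect()
+}
+
+/// Queue: one enqueue becomes exactly one delivery record.
+fn enqueue(message_id: u64) -> Vec<Delivery> {
+    vec![Delivery { message_id, subscriber: None }]
+}
+```
+
+Each pub/sub record can then be retried and dead-lettered on its own, which is what makes subscribers independent failure domains.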
+
+### Pub/sub reframe — drop ephemeral, use the outbox
+
+The original blueprint plan was pub/sub via Postgres `LISTEN/NOTIFY` (ephemeral, sub-millisecond fan-out). Reframe to **reuse the triggers framework's outbox infrastructure**:
+
+- `pubsub::publish(topic, msg)` writes to the outbox
+- Dispatcher fans out one delivery record per subscribed script trigger
+- Each delivery retried on failure with the same machinery as KV / doc / file triggers
+- After N retries → dead-letter (see §4)
+
+**Wins:** one delivery model in the whole system, durable pub/sub for free, shared observability/retry/dead-letter tooling across every event-firing surface.
+
+**Cost:** ~1ms Postgres write per publish (vs in-memory NOTIFY). For solo-dev / consumer hardware, the right tradeoff. If sub-ms ever matters, `pubsub::publish_ephemeral` is a future addition that bypasses the outbox.
+
+### Queue stays separate
+
+Pub/sub-through-outbox cannot model "work distribution with backpressure" cleanly. Queue keeps its own table:
+
+- Producer: `queue::enqueue(name, msg)` → queue table
+- Consumer: `queue:receive` trigger fires when message available; runtime claims with `FOR UPDATE SKIP LOCKED` + visibility timeout
+- Script returns successfully → auto-ack (delete row)
+- Script throws → auto-nack (clear claim; message becomes visible again)
+- Visibility timeout exceeded → reclaim allowed (handles crashed consumers)
+- Max delivery attempts → dead-letter
+
+The queue table IS the outbox for queue semantics — no double-buffering.
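+
+The claim / ack / nack cycle above can be modeled in memory as follows. This is a sketch under assumptions: `QueueRow` and its fields are hypothetical stand-ins for table columns, and the linear scan stands in for what `FOR UPDATE SKIP LOCKED` would do in SQL. The max-delivery-attempts check is left out.
+
+```rust
+use std::time::{Duration, Instant};
+
+/// In-memory stand-in for one queue row (illustrative names).
+#[derive(Debug)]
+struct QueueRow {
+    id: u64,
+    attempts: u32,
+    /// While `Some(t)` and `now < t`, the row is claimed and invisible.
+    invisible_until: Option<Instant>,
+}
+
+/// Claim the first visible row. A row whose visibility timeout has expired
+/// counts as visible again, which covers crashed consumers.
+fn claim(rows: &mut [QueueRow], now: Instant, visibility: Duration) -> Option<u64> {
+    let row = rows
+        .iter_mut()
+        .find(|r| r.invisible_until.map_or(true, |t| now >= t))?;
+    row.attempts += 1;
+    row.invisible_until = Some(now + visibility);
+    Some(row.id)
+}
+
+/// Script returned successfully: auto-ack deletes the row.
+fn ack(rows: &mut Vec<QueueRow>, id: u64) {
+    rows.retain(|r| r.id != id);
+}
+
+/// Script threw: auto-nack clears the claim so the row is visible again.
+fn nack(rows: &mut [QueueRow], id: u64) {
+    if let Some(r) = rows.iter_mut().find(|r| r.id == id) {
+        r.invisible_until = None;
+    }
+}
+```
+
+A real implementation would compare `attempts` against the configured maximum at claim time and move the row to `dead_letters` instead of handing it out again.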
+
+### Status
+
+- **Pub/sub via trigger outbox**: leaning yes; needs final ack.
+- **Queue stays separate from pub/sub**: leaning yes.
+- **Drop `LISTEN/NOTIFY` plan**: leaning yes.
+
+### Open calls
+
+1. Pub/sub durability via trigger outbox (durable, at-least-once) — confirmed?
+2. Queue and pub/sub stay separate concepts (rather than unifying under a "messaging" abstraction with a subscription-mode flag) — confirmed?
+
+---
+
+## 2. Universal trigger outbox
+
+The triggers framework's outbox should be the universal substrate for **async dispatch**. Every event source that fires scripts asynchronously writes to the same outbox table; one dispatcher reads from it and routes to the executor with shared load control, retry, dead-letter, and trigger-depth tracking.
+
+### What runs through the outbox
+
+| Ingress | Path | Reason |
+|---|---|---|
+| **HTTP request (sync)** | Orchestrator writes outbox with `reply_to` → dispatcher → executor → reply to the waiting orchestrator (NATS-style indirection, see §3) | Caller is waiting; the per-request inbox carries the response back even though dispatch goes through the outbox |
+| **HTTP request (async, opt-in)** | Orchestrator writes outbox → returns 202 → dispatcher → executor | Webhooks, fire-and-forget endpoints; explicit opt-in via route config |
+| **Cron tick** | Scheduler writes outbox → dispatcher → executor | No caller; naturally async |
+| **KV / doc / file change** | Service writes outbox → dispatcher → executor | No caller; the originating script already returned |
+| **Pub/sub publish** | Service writes outbox → dispatcher → executor (per subscriber) | Fan-out semantics |
+| **Queue message** | Queue table IS the outbox; dispatcher claims via `FOR UPDATE SKIP LOCKED` | Avoids double-buffering |
+| **Inbound email** | SMTP receiver writes outbox → dispatcher → executor | No caller |
+
+### What this gives
+
+1. **One dispatcher = one place** for load control (the existing `ExecutionGate`), retry, dead-letter, trigger-depth tracking, fan-out. New event source = "write to outbox in this shape", nothing else.
+2. **Routes become a trigger kind**, conceptually. A route is `(source=http, filter=method+path, script_id, dispatch_mode=sync|async)`. Schema-wise the `routes` table likely stays separate from the new `triggers` table (polymorphic JSON columns get ugly), but the mental model collapses to "everything that fires a script is a trigger".
+3. **`dispatch_mode = async` is a per-route opt-in**. Webhook handlers can return 202 immediately and process in the background — dispatcher handles retries, caller gets a snappy ack.
+4. **Replay and debugging.** Every async invocation has an outbox row; admin can re-fire a trigger by re-dispatching the row.
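+5. **Decoupled lifecycle.** Dispatcher can be paused for maintenance without affecting async ingress (rows just queue in the outbox); HTTP can degrade (overflow 503s) without affecting async work already in the outbox. Sync HTTP routed through the dispatcher (§3) is the exception: while the dispatcher is paused, those requests wait until their timeout.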
+
+### What this doesn't change
+
+- Sync HTTP still hits the `ExecutionGate` the same way (now via the dispatcher).
+- Async outbox dispatch also hits the gate when the dispatcher picks a row. Sync and async share the cap on blocking threads actually in use.
+- Trigger CRUD likely stays in per-kind tables for schema sanity; the unification is conceptual + dispatch-layer, not schema-layer.
+
+### Status
+
+- **Universal outbox for async dispatch**: leaning yes.
+- **Routes-as-trigger conceptually**: leaning yes.
+- **Routes-as-trigger schema-wise**: leaning no (keep separate tables).
+- **Per-route `dispatch_mode: sync|async`**: leaning yes for v1.1.1 since the dispatcher is already being built.
+
+### Open calls
+
+1. Sync HTTP via outbox + per-request inbox pattern (NATS-style; see §3) — confirmed, or keep direct dispatch for sync?
+2. Ship `dispatch_mode: async` for HTTP routes in v1.1.1, or defer to a later release?
+3. Keep `routes` and `triggers` as separate tables (unified at the dispatcher only), or merge schemas?
+
+---
+
+## 3. NATS-style request/reply for sync HTTP
+
+The constraint that makes "universal outbox" tricky: HTTP has a caller waiting. We can't write to outbox, return 202, and walk away — the user's browser expects `200 OK` with body. NATS's request/reply pattern resolves this elegantly.
+
+### Pattern
+
+```
+HTTP request  →  orchestrator generates inbox_id, registers a oneshot channel
+              →  writes outbox row { source: http, payload, reply_to: inbox_id }
+              →  awaits on the channel (with timeout = script's wall-clock + buffer)
+
+Dispatcher    →  picks outbox row
+              →  dispatches to executor (gate + spawn_blocking + Rhai)
+              →  if reply_to.is_some(): resolves the channel with the result
+              →  if reply_to.is_none(): records completion + retries on failure per trigger config
+
+Orchestrator  →  channel resolves → returns response to HTTP caller
+              →  on timeout: returns 504 or 500 → see status-code calls below
+```
+
+The HTTP caller's experience is unchanged (synchronous request/response). Under the hood, dispatch is identical for every invocation source.
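+
+A sketch of the in-process inbox registry implied by the pattern, assuming a one-slot `std::sync::mpsc` channel as a stand-in for an async oneshot so the example needs no external crates. `InboxRegistry`, `Reply`, and the method names are hypothetical.
+
+```rust
+use std::collections::HashMap;
+use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
+use std::sync::Mutex;
+use std::time::Duration;
+
+type InboxId = u64;
+/// What the dispatcher hands back: script output or an error message.
+type Reply = Result<String, String>;
+
+/// Per-orchestrator map from inbox id to the sending half of a one-slot channel.
+#[derive(Default)]
+struct InboxRegistry {
+    next_id: Mutex<InboxId>,
+    pending: Mutex<HashMap<InboxId, SyncSender<Reply>>>,
+}
+
+impl InboxRegistry {
+    /// Orchestrator side: allocate an inbox and keep the receiver to wait on.
+    fn register(&self) -> (InboxId, Receiver<Reply>) {
+        let mut next = self.next_id.lock().unwrap();
+        *next += 1;
+        let (tx, rx) = sync_channel(1);
+        self.pending.lock().unwrap().insert(*next, tx);
+        (*next, rx)
+    }
+
+    /// Dispatcher side: deliver the result. Returns false when nobody is
+    /// waiting (unknown id, or the orchestrator already timed out).
+    fn resolve(&self, id: InboxId, reply: Reply) -> bool {
+        match self.pending.lock().unwrap().remove(&id) {
+            Some(tx) => tx.send(reply).is_ok(),
+            None => false,
+        }
+    }
+
+    /// Orchestrator side: wait for the reply, or give up after `timeout`.
+    fn wait(&self, id: InboxId, rx: Receiver<Reply>, timeout: Duration) -> Option<Reply> {
+        let reply = rx.recv_timeout(timeout).ok();
+        // Remove the entry so a late resolve finds nothing to send to.
+        self.pending.lock().unwrap().remove(&id);
+        reply
+    }
+}
+```
+
+A `false` return from `resolve` is the hook for the cancel-on-timeout question: it identifies a late result that no caller is waiting for.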
+
+### Implementation by deployment mode
+
+| Mode | Mechanism | Trade-off |
+|---|---|---|
+| **In-process (v1.1.1, MVP)** | Per-orchestrator `HashMap<InboxId, oneshot::Sender<Result>>`; dispatcher resolves the oneshot | Sub-ms wake-up; fails across process boundaries |
+| **Cross-process (cluster mode v1.3+)** | Postgres `LISTEN/NOTIFY` keyed on `inbox_id`, with a `responses` row as durable backup | Sub-10ms wake-up; survives across nodes; needs careful long-listener management |
+| **Polling fallback** | Orchestrator polls `responses` table for `inbox_id` every ~10ms | Simple; ~10ms minimum latency; only as fallback |
+
+### Latency cost (honest numbers)
+
+Per sync HTTP request, NATS-style adds: ~1-2ms Postgres write (outbox) + sub-ms dispatcher wake (in-process channel) + ~1ms response resolve = **~2-5ms overhead**. For most scripts (10-100ms execution), this is noise. PiCloud isn't optimizing for sub-ms; the architectural unification is worth a few ms.
+
+### Retry policy — `reply_to` IS the signal
+
+| Outbox row | Retry behavior |
+|---|---|
+| `reply_to.is_some()` | **Never retry.** Caller is waiting; retrying means the script might run twice and the caller gets one of two outcomes. Always: one attempt, surface result (success or failure) to inbox. |
+| `reply_to.is_none()` | Retry per trigger's configured policy. Default: 3 attempts, exponential backoff (1s, 2s, 4s), dead-letter after. |
+
+Per-trigger config lives on the trigger row:
+
+```
+trigger { source: cron,   schedule: "0 */5 * * * *",
+          retry: { max_attempts: 5, backoff: exponential, base_ms: 1000 } }
+
+trigger { source: pubsub, topic: "user.created",
+          retry: { max_attempts: 3, backoff: linear,      base_ms: 500  } }
+
+trigger { source: http,   method: POST, path: "/api/foo",
+          dispatch_mode: sync }   // retry absent — sync HTTP is always 1-attempt
+```
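+
+How the `retry` block could translate into delays, as a sketch. The type and function names are hypothetical. It assumes retries are numbered from 1, that exponential means `base_ms × 2^(n-1)` (matching the 1s / 2s / 4s default), and that linear means `base_ms × n`, which the notes do not spell out.
+
+```rust
+use std::time::Duration;
+
+#[derive(Debug, Clone, Copy)]
+enum Backoff {
+    Exponential,
+    Linear,
+}
+
+#[derive(Debug, Clone, Copy)]
+struct RetryPolicy {
+    max_attempts: u32,
+    backoff: Backoff,
+    base_ms: u64,
+}
+
+/// Delay before retry number `retry` (1-based; 0 is treated as 1).
+fn backoff_delay(policy: RetryPolicy, retry: u32) -> Duration {
+    let n = retry.max(1);
+    let factor = match policy.backoff {
+        // base, 2*base, 4*base, ... (shift capped to avoid overflow)
+        Backoff::Exponential => 1u64 << (n - 1).min(32),
+        // base, 2*base, 3*base, ...
+        Backoff::Linear => u64::from(n),
+    };
+    Duration::from_millis(policy.base_ms.saturating_mul(factor))
+}
+
+/// True once the event has used up its attempts and should be dead-lettered.
+fn exhausted(policy: RetryPolicy, attempts_made: u32) -> bool {
+    attempts_made >= policy.max_attempts
+}
+```
+
+Whether a delay also applies after the final attempt, before dead-lettering, is left open here; the sketch only computes the series.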
+
+### Failure / crash handling
+
+With NATS-style indirection, there are new ways for a sync HTTP request to vanish. Every failure path must resolve the orchestrator's oneshot channel with something:
+
+| Failure mode | Detection | Caller sees |
+|---|---|---|
+| Script throws / runtime error | Executor returns `ExecError::Runtime` → written to inbox | 502 (or 500 — see status-code discussion) |
+| Script exceeds wall-clock | `tokio::time::timeout` fires inside dispatcher → written to inbox | 504 (or 500) |
+| Operation budget exceeded | Executor returns `ExecError::OperationBudgetExceeded` → inbox | 507 (or 500) |
+| Executor task panics mid-execution | `JoinError` → `ExecError::Runtime` → inbox | 500 |
+| Dispatcher process dies between claim and reply | Orchestrator's wait times out | 500 |
+| Outbox write fails (Postgres unavailable) | Orchestrator never publishes; immediate error | 500 |
+| Orchestrator's own wait times out unexpectedly | Channel timeout fires before inbox resolves | 504 (or 500) |
+
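+Every path ends with the caller getting a response: either the channel resolves with a result, or the orchestrator fails fast (outbox write error), or it hits its outer timeout, which is the backstop for "dispatcher just died completely".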
+
+### Status code strategy — open question
+
+Today's orchestrator distinguishes 422 / 502 / 503 / 504 / 507 / 500. The user raised an "everything should be 500" framing. Two options:
+
+- **Option A (recommended):** keep existing distinctions. Script crashes → 502, timeouts → 504, overloaded → 503, parse errors → 422, operation budget exceeded → 507, dispatcher vanished → 500. Clients get actionable info.
+- **Option B:** flatten to 500 for everything that's "platform couldn't return a useful response". Simpler surface; loses actionable distinctions.
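+
+The two options side by side, as a sketch. The `Failure` and `Strategy` enums are hypothetical, and the sketch assumes Option B flattens every failure class, including parse errors; a real design might still keep 422 for those.
+
+```rust
+#[derive(Debug, Clone, Copy, PartialEq)]
+enum Failure {
+    ParseError,
+    ScriptError,
+    Overloaded,
+    Timeout,
+    OperationBudgetExceeded,
+    DispatcherVanished,
+}
+
+#[derive(Debug, Clone, Copy, PartialEq)]
+enum Strategy {
+    /// Option A: keep the existing distinctions.
+    Distinct,
+    /// Option B: everything becomes 500.
+    Flattened,
+}
+
+/// Map a failure class to the HTTP status the caller sees.
+fn status_code(strategy: Strategy, failure: Failure) -> u16 {
+    if strategy == Strategy::Flattened {
+        return 500;
+    }
+    match failure {
+        Failure::ParseError => 422,
+        Failure::ScriptError => 502,
+        Failure::Overloaded => 503,
+        Failure::Timeout => 504,
+        Failure::OperationBudgetExceeded => 507,
+        Failure::DispatcherVanished => 500,
+    }
+}
+```
+
+Keeping the mapping in one function means switching between A and B later would be a one-line change rather than a sweep through the orchestrator.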
+
+### Status
+
+- **NATS-style for sync HTTP**: leaning yes; resolves the outbox vs direct-dispatch tension.
+- **`reply_to` presence as the "don't retry" signal**: leaning yes.
+- **Default retry policy** (3 attempts, exp backoff 1s/2s/4s): proposed.
+
+### Open calls
+
+1. NATS-style request/reply for sync HTTP — confirmed?
+2. Status code strategy: keep existing distinctions (A, recommended) or flatten to 500 (B)?
+3. Default retry policy on triggers: 3 attempts with exp backoff (1s/2s/4s), or different defaults?
+4. Cancel-on-timeout semantics: if orchestrator's wait times out but executor finishes successfully later, do we (a) discard the late result, (b) write it to an "abandoned executions" table for debugging, or (c) attempt to ack the caller late? Leaning (b) — log + discard but keep the row for forensics.
+
+---
+
+## 4. Dead-letter handling
+
+Events that exhaust their retry policy land in a **separate `dead_letters` table** (not a flag on the outbox — outbox should stay a queue with fast inserts and scans). Users handle dead letters by registering a script for the new `dead_letter` **trigger kind**.
+
+### Schema sketch
+
+```sql
+CREATE TABLE dead_letters (
+  id                UUID PRIMARY KEY,
+  app_id            UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
+  original_event_id UUID NOT NULL,         -- the outbox row id
+  source            TEXT NOT NULL,         -- "kv", "cron", "pubsub", "queue", "email"
+  op                TEXT NOT NULL,
+  trigger_id        UUID,                  -- which trigger config fired (null for direct dispatches)
+  script_id         UUID,                  -- which script failed
+  payload           JSONB NOT NULL,        -- the event payload, verbatim
+  attempt_count     INT  NOT NULL,
+  first_attempt_at  TIMESTAMPTZ NOT NULL,
+  last_attempt_at   TIMESTAMPTZ NOT NULL,
+  last_error        TEXT NOT NULL,
+  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  resolved_at       TIMESTAMPTZ,           -- null = unresolved
+  resolution        TEXT                   -- "replayed" | "ignored" | "handled_by_script" | "handler_failed"
+);
+
+CREATE INDEX idx_dead_letters_app_unresolved
+  ON dead_letters(app_id) WHERE resolved_at IS NULL;
+```
+
+### Dead letter as trigger source
+
+```
+trigger {
+  source: dead_letter,
+  filter: { source: "kv" },        // optional; defaults to "any source"
+  script_id: <your handler>,
+  dispatch_mode: async,
+  retry: { max_attempts: 1 }       // forced; see recursion stop rule below
+}
+```
+
+Filterable on:
+- `source`: only dead letters from a particular event source (kv, cron, pubsub, …)
+- `trigger_id`: only dead letters from a particular trigger config
+- `script_id`: only dead letters from a particular script
+- No filter: every dead letter fires this handler
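+
+A sketch of how the dispatcher might evaluate such a filter. The struct and function names are hypothetical, and the sketch assumes that when several filter fields are set they are ANDed together, which the notes do not state.
+
+```rust
+/// Filter from a `dead_letter` trigger; `None` means "match anything".
+#[derive(Debug, Default)]
+struct DeadLetterFilter {
+    source: Option<String>,
+    trigger_id: Option<String>,
+    script_id: Option<String>,
+}
+
+/// The subset of a dead-letter row that filters look at.
+#[derive(Debug)]
+struct DeadLetter {
+    source: String,
+    trigger_id: Option<String>,
+    script_id: Option<String>,
+}
+
+/// An unset filter field always passes; a set one must equal the row's value.
+fn wanted(want: &Option<String>, got: Option<&str>) -> bool {
+    match want {
+        None => true,
+        Some(w) => got == Some(w.as_str()),
+    }
+}
+
+/// True when every set filter field matches the dead-letter row.
+fn filter_matches(filter: &DeadLetterFilter, dl: &DeadLetter) -> bool {
+    wanted(&filter.source, Some(dl.source.as_str()))
+        && wanted(&filter.trigger_id, dl.trigger_id.as_deref())
+        && wanted(&filter.script_id, dl.script_id.as_deref())
+}
+```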
+
+`ctx.event` for a dead-letter handler:
+
+```rhai
+ctx.event.source              // "dead_letter"
+ctx.event.dead_letter = #{
+    original: #{
+        source:     "kv",
+        op:         "insert",
+        collection: "widgets",
+        key:        "k1",
+        payload:    #{ ... }
+    },
+    attempts:        3,
+    last_error:      "script timeout after 30s",
+    trigger_id:      "...",
+    script_id:       "...",
+    first_attempt_at: "2026-05-30T12:00:00.000Z",
+    last_attempt_at:  "2026-05-30T12:01:03.000Z"
+}
+```
+
+The handler can call `log::error`, email admins via `email::send`, write to `docs::collection("incidents").create(...)`, post to external alerting via `http::post`, or call `dead_letters::replay(id)` if it decides a retry is worthwhile.
+
+### Recursion stop rule
+
+**Dead-letter handlers execute once, no retry, and CANNOT themselves be dead-lettered.**
+
+When the dispatcher invokes a dead-letter trigger, the resulting execution is marked `is_dead_letter_handler = true`. If it fails:
+
+- Failure is logged to the structured log (full payload + error)
+- A metric is bumped (`picloud_dead_letter_handler_failures`)
+- Original dead-letter row annotated with `resolution = "handler_failed"`
+- **Nothing else is fired.** Chain stops definitively.
+
+This is the only safe stop rule. If your alerting script is broken, the platform shouldn't try to alert about that with the same broken script.
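+
+The stop rule, combined with the `reply_to` rule from §3, reduces to a small decision function. This is a sketch; `FailedExecution`, `OnFailure`, and the field names other than `is_dead_letter_handler` are hypothetical.
+
+```rust
+#[derive(Debug, PartialEq)]
+enum OnFailure {
+    /// Sync HTTP: one attempt, error goes back to the waiting caller.
+    SurfaceToCaller,
+    Retry { next_attempt: u32 },
+    DeadLetter,
+    /// Dead-letter handler failed: log, bump the metric, mark the row
+    /// "handler_failed", and fire nothing else.
+    StopChain,
+}
+
+struct FailedExecution {
+    has_reply_to: bool,
+    is_dead_letter_handler: bool,
+    attempts_made: u32,
+    max_attempts: u32,
+}
+
+/// Decide what the dispatcher does after a failed execution.
+fn on_failure(exec: &FailedExecution) -> OnFailure {
+    if exec.is_dead_letter_handler {
+        return OnFailure::StopChain;
+    }
+    if exec.has_reply_to {
+        return OnFailure::SurfaceToCaller;
+    }
+    if exec.attempts_made < exec.max_attempts {
+        OnFailure::Retry { next_attempt: exec.attempts_made + 1 }
+    } else {
+        OnFailure::DeadLetter
+    }
+}
+```
+
+Checking `is_dead_letter_handler` first means no retry configuration on the handler's trigger can reopen the chain.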
+
+### Defaults
+
+**No automatic handler.** Dead letters silently land in the table. Users opt into handling by registering a trigger. The dashboard surfaces an unresolved-count badge so users notice.
+
+This avoids over-engineering — most apps will run for months without a dead-letter trigger; the table is the durable record either way.
+
+### Sync HTTP failures don't dead-letter
+
+Failed sync HTTP requests (`reply_to.is_some()`) don't land in `dead_letters`, for three reasons: the caller already got an error response; dead-lettering every failed HTTP request would flood the table; and `execution_logs` already captures sync request failures. If a user wants alerts on HTTP endpoint failures, that's **monitoring** (v1.3+ territory), not dead-lettering.
+
+### Pub/sub fan-out dead-letters independently
+
+One `pubsub::publish` → N subscribers → each retries independently → each can independently dead-letter. So one publish can produce N dead-letter rows (one per subscriber that exhausted retries). Subscribers are independent failure domains.
+
+### Manual replay
+
+| Surface | Use case |
+|---|---|
+| `POST /api/v1/admin/apps/{id}/dead_letters/{dl_id}/replay` | Admin clicks "replay" in dashboard |
+| `dead_letters::replay(id)` Rhai SDK | A handler script decides to retry programmatically |
+| `dead_letters::resolve(id, reason)` Rhai SDK | A handler decides "this is fine, don't bother me" |
+
+Replay re-inserts the original event into the outbox; dispatcher tries again from scratch.
+
+### Retention
+
+Time-based: delete dead letters older than 30 days by default. Configurable per-app via app settings, or globally via env var (`PICLOUD_DEAD_LETTER_RETENTION_DAYS`). A weekly GC job in the manager handles the deletion using `FOR UPDATE SKIP LOCKED`.
+
+### Status
+
+- **Separate `dead_letters` table**: leaning yes.
+- **`dead_letter` as trigger kind**: leaning yes.
+- **Recursion stop rule** (handlers can't be dead-lettered): leaning yes.
+- **No default handler** (rows just sit in table): leaning yes.
+- **Sync HTTP failures don't dead-letter**: leaning yes.
+
+### Open calls
+
+1. Dead-letter handlers unretryable + can't be dead-lettered themselves — confirmed?
+2. No default dead-letter handler (rows just sit in the table); user opts in — confirmed, or do you want a built-in "log to admin notifications channel" default?
+3. 30-day default retention sensible, or longer/shorter?
+4. Include Rhai SDK (`dead_letters::list/replay/resolve`) in v1.1.1 alongside admin endpoints, or defer the script-side surface to a later release?
+
+---
+
+## 5. Realtime updates for external clients
+
+Apps built on PiCloud need a way for browser/mobile clients to receive live updates (chat messages, dashboard data, multiplayer state, notifications). Pub/sub as planned in §1 is internal-only (script ↔ script via triggers).
+
+### The chosen approach
+
+**Option C (from prior debate): topics with opt-in external subscription.**
+
+- One `pubsub::publish(topic, msg)` API for scripts — produces a single event
+- Topics are **internal-only by default** — script triggers can subscribe; external clients cannot
+- Apps explicitly mark topics as externally-subscribable (per-topic config in dashboard / API)
+- External clients connect to `GET /realtime/topics/{topic}` via SSE and receive only messages published to topics they're permitted to subscribe to
+
+**Wins:** one publish API for scripts (DRY), topics don't leak by default (security), external visibility is an explicit opt-in per topic.
+
+### Transport: SSE first
+
+SSE (Server-Sent Events) for v1.x:
+
+- Simpler than WebSocket; works through any HTTP proxy without protocol upgrade
+- Browsers auto-reconnect on disconnect
+- Sufficient for "server-pushed events to the browser" (the dominant use case)
+
+WebSocket is added later if bidirectional comms (chat-style) warrant it.
+
+### Auth model for external subscribers
+
+Three flavors, ordered by complexity:
+
+- **Public topics** — anyone with the URL connects. For marketing-style broadcasts, public stat boards.
+- **Token-gated topics** — client presents a token issued by a script. Pusher / Ably-style. Token can be a PiCloud API key (v1.1.6) or a users-SDK session token (v1.1.8+).
+- **Script-mediated** — a script handles each subscribe request and decides yes/no. Most flexible, defer to v1.2.
+
+Ship public + token-gated in v1.1.6; defer script-mediated.
+
+### Status
+
+- **Approach C (opt-in external subscription)**: leaning yes.
+- **SSE first, WebSocket later**: leaning yes.
+- **Public + token-gated auth in v1.1.6**: leaning yes.
+
+### Open calls
+
+1. Approach C confirmed (vs A: pubsub IS realtime, B: separate `channels::` service)?
+2. SSE first, WebSocket deferred — confirmed, or ship both in v1.1.6?
+3. Auth: public + API-key gating in v1.1.6, defer users-SDK-based tokens to v1.1.8 follow-up — confirmed?
+
+---
+
+## 6. Frontend client library
+
+Strategic positioning question: how much should PiCloud expose to frontend developers building apps on top of it?
+
+### The two ends of the spectrum
+
+| End | Frontend gets | Examples |
+|---|---|---|
+| **Minimalist** | HTTP to dev-defined script endpoints + SSE on dev-marked-public topics. Nothing else. | AWS Lambda + API Gateway, Cloudflare Workers, Deno Deploy |
+| **Maximalist** | Direct client-side access to KV/docs/users/files. Frontend writes `kv.get()`, `docs.find()`, no Rhai script for trivial reads. | Firebase, Supabase, AWS Amplify |
+
+PiCloud today sits at the minimalist end (services exist for scripts to use, not for frontends). Crossing to maximalist would be a real product pivot, not a feature add.
+
+### The chosen approach: hybrid
+
+**Ship a client library that talks to scripts, not to services.** Specifically, three things:
+
+1. **Typed HTTP client to dev-defined endpoints** — `picloud.endpoint('/api/users').post({ name: 'alice' })`. Fetch wrapper with auth header injection, retry logic, structured error handling.
+2. **SSE subscription** — `picloud.subscribe('chat-room-123', msg => …)`. Auto-reconnect, token refresh, backpressure.
+3. **Auth flow helpers** — `picloud.auth.login(email, password)`, `picloud.auth.logout()`, `picloud.auth.token`. These call **dev-defined** endpoints under the hood (`/api/auth/login` etc.); the lib just standardizes the dance + token storage.
+
+Crucially: **no `picloud.kv.get()` or `picloud.docs.find()` from the frontend.** Those stay server-side, behind dev-written Rhai scripts.
+
+### Why hybrid, not maximalist
+
+Firebase trades security for DX; security-rule misconfiguration is a well-known cause of accidental data exposure in Firebase-style apps. PiCloud's "solo dev / consumer hardware" audience does not have the operational capacity to defend a Firebase-style attack surface against misconfiguration. The script layer is also where PiCloud differentiates: if frontends bypass scripts to talk directly to services, we're competing with Supabase head-to-head (unwinnable, they're better-resourced and have a 5-year head start).
+
+### Why hybrid, not pure minimalist
+
+A frontend dev shouldn't have to hand-roll fetch wrappers, SSE reconnect logic, and token-refresh dances. That stuff is identical across every app. Shipping it as `@picloud/client` is genuinely valuable — it doesn't expand the security surface (scripts still gate everything), it just removes boilerplate.
+
+### TypeScript first
+
+Ship TypeScript first. Cross-language story (Python, Swift, Kotlin, Rust, …) deferred until demand emerges. TS covers the dominant "web app + mobile via React Native" segment.
+
+### Status
+
+- **Hybrid model (frontend through scripts only)**: leaning yes.
+- **TypeScript first, other languages deferred**: leaning yes.
+- **Co-ship with realtime as v1.1.6**: leaning yes (SSE wrapper is the killer feature of the lib).
+
+### Open calls
+
+1. Hybrid model — confirmed, or do you want to seriously evaluate Firebase-mode?
+2. TypeScript first, multi-language deferred — confirmed?
+3. Co-ship realtime + client lib as v1.1.6, or split (server in v1.1.6, client lib later)?
+4. Type safety: hand-written types only, or aim for codegen from script-declared schemas? Codegen is big — defer to v1.2+ if at all?
+
+---
+
+## 7. Revised v1.1.x roadmap
+
+Net changes vs the [blueprint §12](../serverless_cloud_blueprint.md) roadmap:
+
+- **v1.1.5 pub/sub**: now via trigger outbox (drops `LISTEN/NOTIFY` plan), tightening implementation scope
+- **NEW v1.1.6 Realtime Channels & Client Library**: realtime SSE + `@picloud/client` TS package; co-shipped
+- **v1.1.7+ items shifted by one** (was v1.1.6/7/8 → now v1.1.7/8/9)
+- **Dead letters and the unified outbox/dispatcher** are absorbed into v1.1.1's existing scope (triggers framework)
+
+| Version | Capability |
+|---|---|
+| **v1.1.0** | **Foundation & Standard Library** — SDK shape, `Services` bundle, `SdkCallCx`, `ExecutionGate`, `ServiceEventEmitter` trait shape; stdlib utilities (regex, random, time, json, base64, hex, url). ✓ Shipped. |
+| **v1.1.1** | **Storage & Events** — KV store keyed `(app_id, collection, key)`; triggers framework (universal outbox + dispatcher + NATS-style sync HTTP via inbox + per-trigger retry config + dead-letter table & `dead_letter` trigger source + trigger CRUD + `ctx.event` + depth limit); KV trigger kinds. |
+| **v1.1.2** | **Documents** — `docs::collection(name).create/find/update/delete/list` with `docs:*` triggers. |
+| **v1.1.3** | **Modules** — `scripts.kind`, per-app resolver replaces `DummyModuleResolver`, AST cache + dep-graph invalidation. |
+| **v1.1.4** | **Outbound HTTP & Scheduled Tasks** — `http::*` with SSRF deny-list; cron triggers (small now that the framework exists). |
+| **v1.1.5** | **Files & Pub/Sub** — filesystem-backed blobs (`files/<app_id>/<id[0:2]>/<id>`) with `files:*` triggers; pub/sub via the universal outbox with `pubsub:*` triggers. |
+| **v1.1.6** | **Realtime Channels & Client Library** *(new)* — SSE-based external subscription to per-app pub/sub topics (public + API-key auth modes); `@picloud/client` TypeScript package (typed HTTP, SSE subscription, auth helpers). |
+| **v1.1.7** | **Configuration & Email** *(was v1.1.6)* — encrypted per-app secrets; outbound `email::send/send_html` + inbound `email:receive` trigger. |
+| **v1.1.8** | **User Management** *(was v1.1.7)* — `users::*` for in-script CRUD, auth, roles, invites, password reset. |
+| **v1.1.9** | **Durable Queues & Function Composition** *(was v1.1.8)* — `queue::*` with `queue:receive` trigger; `invoke()` + `retry::*` (closures-as-args, re-entrant Rhai). |
+| **v1.2** | **Workflows & Hierarchies** (per blueprint §Phase 5) — DAG execution, advanced docs query, interceptors, read triggers, audit log, script-mediated realtime auth. |
+| **v1.3+** | **Scale & Ops** (per blueprint §Phase 6) — cluster mode (NATS-style request/reply swaps to `LISTEN/NOTIFY`), cross-app data sharing, script versioning + rollback, rate limiting, richer auth, metrics, distributed tracing, webhooks, S3, monitoring/alerting on HTTP endpoint failures. |
+
+The v1.1.9 release marks the end of the v1.1.x expansion cadence. v1.2 is the next minor product bump (phase milestone per [versioning policy](versioning.md)).
+
+---
+
+## Consolidated open calls
+
+Numbered for easy reference in conversation. All currently unanswered.
+
+### §1 — Messaging primitives
+1. Pub/sub durability via trigger outbox (durable, at-least-once) — confirmed?
+2. Queue and pub/sub stay separate concepts (rather than unifying under a "messaging" abstraction with a subscription-mode flag) — confirmed?
+
+### §2 — Universal trigger outbox
+3. Sync HTTP via outbox + per-request inbox pattern (NATS-style; see §3) — confirmed, or keep direct dispatch for sync?
+4. Ship `dispatch_mode: async` for HTTP routes in v1.1.1, or defer to a later release?
+5. Keep `routes` and `triggers` as separate tables (unified at the dispatcher only), or merge schemas?
+
+### §3 — NATS-style sync HTTP
+6. NATS-style request/reply for sync HTTP — confirmed?
+7. Status code strategy: keep existing distinctions (recommended) or flatten to 500?
+8. Default retry policy on triggers: 3 attempts with exp backoff (1s/2s/4s), or different defaults?
+9. Cancel-on-timeout semantics: discard late results (a), write to "abandoned executions" table for debugging (b — recommended), or attempt to ack the caller late (c)?
+
+### §4 — Dead letters
+10. Dead-letter handlers unretryable + can't be dead-lettered themselves — confirmed?
+11. No default dead-letter handler (rows just sit in the table); user opts in — confirmed, or built-in "log to admin notifications channel" default?
+12. 30-day default retention sensible, or longer/shorter?
+13. Include Rhai SDK (`dead_letters::list/replay/resolve`) in v1.1.1 alongside admin endpoints, or defer the script-side surface to a later release?
+
+### §5 — Realtime
+14. Approach C confirmed (opt-in external subscription on pub/sub topics) vs A: pubsub IS realtime, B: separate `channels::` service?
+15. SSE first, WebSocket deferred — confirmed, or ship both in v1.1.6?
+16. Auth: public + API-key gating in v1.1.6, defer users-SDK-based tokens to v1.1.8 follow-up — confirmed?
+
+### §6 — Frontend client library
+17. Hybrid model (frontend through scripts only) — confirmed, or seriously evaluate Firebase-mode?
+18. TypeScript first, multi-language deferred — confirmed?
+19. Co-ship realtime + client lib as v1.1.6, or split (server in v1.1.6, client lib later)?
+20. Type safety: hand-written types only, or aim for codegen from script-declared schemas? Defer codegen to v1.2+ if at all?
+
+---
+
+## Lifecycle of this document
+
+- **Created** at the v1.1.0 → v1.1.1 boundary (after the foundation PR series shipped).
+- **Each section gets pruned** once its decisions ship and land in the blueprint.
+- **Open calls are answered** in conversation, then folded into the corresponding section as "Decided: X" with the date.
+- **Document deleted** when v1.1.9 ships — everything by then is either in the blueprint, in code, or explicitly deferred to v1.2+.