PiCloud/docs/v1.1.x-design-notes.md
MechaCat02 10cfde9e40 docs(v1.1.x): planning notes — in-flight decisions + revised roadmap
Consolidates the architectural conversations that followed the v1.1.0
release but haven't yet landed in the blueprint or in code. Six topic
areas, each with status + open calls:

  1. Messaging primitives — invoke vs pub/sub vs queue, recipient
     model and delivery semantics
  2. Universal trigger outbox — async dispatch substrate for every
     event source (sync HTTP excepted, see #3)
  3. NATS-style sync HTTP — per-request inbox + oneshot channel lets
     sync HTTP ride the outbox without losing the response path
  4. Dead-letter handling — separate table, dead_letter trigger kind,
     recursion stop rule, retention defaults
  5. Realtime updates — SSE-based external subscription to per-app
     pub/sub topics with opt-in exposure
  6. Frontend client library — hybrid model (TS lib that talks to
     dev-defined script endpoints, not to services)

Plus a revised v1.1.x roadmap: realtime adds at v1.1.6 (was Config &
Email), shifting later items by one to v1.1.9 (was v1.1.8).

20 open calls consolidated at the bottom, numbered for reference.
Document is meant to be pruned as decisions ship; deleted entirely
when v1.1.9 lands.

No blueprint changes yet — those wait for the open calls to be
answered and the corresponding PRs to ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 20:24:53 +02:00

v1.1.x design notes — in-flight decisions + revised roadmap

Planning document for the v1.1.x release series. Companion to:

Items in this doc are either tentatively decided but not yet shipped or open calls awaiting the maintainer's decision. Once an item ships, its content moves into the blueprint and the corresponding section here gets pruned.

This document was created at the v1.1.0 → v1.1.1 boundary, capturing the architectural conversations that followed v1.1.0 but haven't yet landed in code or in the blueprint.


1. The three messaging primitives

PiCloud will expose three distinct messaging concepts. The right way to slice them is along recipient model and delivery semantics:

| Primitive | Recipients | Durability | Delivery | Retry on script failure | Mental model |
| --- | --- | --- | --- | --- | --- |
| invoke(script_id, args) | One named script | None (or fire-and-forget durable) | At-most-once sync, or at-least-once async | Caller-controlled via retry::* | Function call |
| pubsub::publish(topic, msg) | All scripts subscribed via trigger | Through outbox | At-least-once per subscriber | Per-subscriber retry up to N, then dead-letter | Fan-out broadcast |
| queue::enqueue(name, msg) | Exactly one consumer wins | Durable table | At-least-once total | Visibility timeout + nack-on-throw | Work distribution |

Critical distinction: pub/sub and queue both end up at-least-once, but the subscriber model differs. Queue: 1 message → 1 delivery record → consumers compete. Pub/sub: 1 message → N delivery records (one per subscriber) → no competition.
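
The difference in delivery records can be sketched in a few lines. This is a minimal illustration only: the `Delivery` struct and both function names are hypothetical and not part of any PiCloud schema.

```rust
/// One row the dispatcher must deliver (hypothetical shape, for illustration).
#[derive(Debug, Clone, PartialEq)]
struct Delivery {
    message_id: u64,
    /// Some(script) = addressed to one subscriber; None = any consumer may claim it.
    recipient: Option<String>,
}

/// Pub/sub: one message becomes N delivery records, one per subscriber.
fn pubsub_deliveries(message_id: u64, subscribers: &[&str]) -> Vec<Delivery> {
    subscribers
        .iter()
        .map(|s| Delivery { message_id, recipient: Some(s.to_string()) })
        .collect()
}

/// Queue: one message becomes exactly one delivery record that consumers compete for.
fn queue_deliveries(message_id: u64) -> Vec<Delivery> {
    vec![Delivery { message_id, recipient: None }]
}
```

With three subscribers, `pubsub_deliveries` yields three independent records (each with its own retry history), while `queue_deliveries` always yields one.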

Pub/sub reframe — drop ephemeral, use the outbox

The original blueprint plan was pub/sub via Postgres LISTEN/NOTIFY (ephemeral, sub-millisecond fan-out). Reframe to reuse the triggers framework's outbox infrastructure:

  • pubsub::publish(topic, msg) writes to the outbox
  • Dispatcher fans out one delivery record per subscribed script trigger
  • Each delivery retried on failure with the same machinery as KV / doc / file triggers
  • After N retries → dead-letter (see §4)

Wins: one delivery model in the whole system, durable pub/sub for free, shared observability/retry/dead-letter tooling across every event-firing surface.

Cost: ~1ms Postgres write per publish (vs in-memory NOTIFY). For solo-dev / consumer hardware, the right tradeoff. If sub-ms ever matters, pubsub::publish_ephemeral is a future addition that bypasses the outbox.

Queue stays separate

Pub/sub-through-outbox cannot model "work distribution with backpressure" cleanly. Queue keeps its own table:

  • Producer: queue::enqueue(name, msg) → queue table
  • Consumer: queue:receive trigger fires when message available; runtime claims with FOR UPDATE SKIP LOCKED + visibility timeout
  • Script returns successfully → auto-ack (delete row)
  • Script throws → auto-nack (clear claim; message becomes visible again)
  • Visibility timeout exceeded → reclaim allowed (handles crashed consumers)
  • Max delivery attempts → dead-letter

The queue table IS the outbox for queue semantics — no double-buffering.
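
The claim / ack / nack lifecycle above can be modelled in memory to check the rules against each other. This is a sketch under stated assumptions: the `Queue`, `QueueRow`, and `Outcome` types are hypothetical, timestamps are passed in as plain milliseconds, and the real implementation would do the claim in Postgres with FOR UPDATE SKIP LOCKED rather than a linear scan.

```rust
/// In-memory stand-in for one row of the queue table (hypothetical fields).
#[derive(Debug)]
struct QueueRow {
    id: u64,
    attempts: u32,
    /// While Some(t) with t > now, the row is claimed and invisible.
    invisible_until_ms: Option<u64>,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Acked,
    Requeued,
    DeadLettered,
}

struct Queue {
    rows: Vec<QueueRow>,
    visibility_timeout_ms: u64,
    max_attempts: u32,
}

impl Queue {
    /// Claim the first visible row; claimed rows are skipped, expired claims are reclaimed.
    fn claim(&mut self, now_ms: u64) -> Option<u64> {
        let timeout = self.visibility_timeout_ms;
        let row = self.rows.iter_mut().find(|r| match r.invisible_until_ms {
            Some(t) => t <= now_ms, // expired claim (crashed consumer): reclaim allowed
            None => true,
        })?;
        row.attempts += 1;
        row.invisible_until_ms = Some(now_ms + timeout);
        Some(row.id)
    }

    /// Script finished: success acks (deletes the row); a throw nacks or dead-letters.
    fn finish(&mut self, id: u64, success: bool) -> Option<Outcome> {
        let idx = self.rows.iter().position(|r| r.id == id)?;
        if success {
            self.rows.remove(idx);
            return Some(Outcome::Acked);
        }
        if self.rows[idx].attempts >= self.max_attempts {
            self.rows.remove(idx);
            return Some(Outcome::DeadLettered);
        }
        self.rows[idx].invisible_until_ms = None; // clear claim; visible again
        Some(Outcome::Requeued)
    }
}
```

One simplification to note: this sketch only checks the attempt limit in `finish`, so a row whose consumers keep crashing would be reclaimed indefinitely. A real dispatcher would presumably also check the limit at claim time.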

Status

  • Pub/sub via trigger outbox: leaning yes; needs final ack.
  • Queue stays separate from pub/sub: leaning yes.
  • Drop LISTEN/NOTIFY plan: leaning yes.

Open calls

  1. Pub/sub durability via trigger outbox (durable, at-least-once) — confirmed?
  2. Queue and pub/sub stay separate concepts (rather than unifying under a "messaging" abstraction with a subscription-mode flag) — confirmed?

2. Universal trigger outbox

The triggers framework's outbox should be the universal substrate for async dispatch. Every event source that fires scripts asynchronously writes to the same outbox table; one dispatcher reads from it and routes to the executor with shared load control, retry, dead-letter, and trigger-depth tracking.

What runs through the outbox

| Ingress | Path | Reason |
| --- | --- | --- |
| HTTP request (sync) | Direct: orchestrator → executor → response (with NATS-style indirection; see §3) | Caller is waiting; the inbox pattern makes this work via the outbox |
| HTTP request (async, opt-in) | Orchestrator writes outbox → returns 202 → dispatcher → executor | Webhooks, fire-and-forget endpoints; explicit opt-in via route config |
| Cron tick | Scheduler writes outbox → dispatcher → executor | No caller; naturally async |
| KV / doc / file change | Service writes outbox → dispatcher → executor | No caller; the originating script already returned |
| Pub/sub publish | Service writes outbox → dispatcher → executor (per subscriber) | Fan-out semantics |
| Queue message | Queue table IS the outbox; dispatcher claims via FOR UPDATE SKIP LOCKED | Avoids double-buffering |
| Inbound email | SMTP receiver writes outbox → dispatcher → executor | No caller |

What this gives

  1. One dispatcher = one place for load control (the existing ExecutionGate), retry, dead-letter, trigger-depth tracking, fan-out. New event source = "write to outbox in this shape", nothing else.
  2. Routes become a trigger kind, conceptually. A route is (source=http, filter=method+path, script_id, dispatch_mode=sync|async). Schema-wise the routes table likely stays separate from the new triggers table (polymorphic JSON columns get ugly), but the mental model collapses to "everything that fires a script is a trigger".
  3. dispatch_mode = async is a per-route opt-in. Webhook handlers can return 202 immediately and process in the background — dispatcher handles retries, caller gets a snappy ack.
  4. Replay and debugging. Every async invocation has an outbox row; admin can re-fire a trigger by re-dispatching the row.
  5. Decoupled lifecycle. Dispatcher can be paused for maintenance without affecting HTTP ingress (it just queues); HTTP can degrade (overflow 503s) without affecting async work already in the outbox.

What this doesn't change

  • Sync HTTP still hits the ExecutionGate the same way (now via the dispatcher).
  • Async outbox dispatch also hits the gate when the dispatcher picks a row. Sync and async share the cap on actual blocking-thread-in-use.
  • Trigger CRUD likely stays in per-kind tables for schema sanity; the unification is conceptual + dispatch-layer, not schema-layer.

Status

  • Universal outbox for async dispatch: leaning yes.
  • Routes-as-trigger conceptually: leaning yes.
  • Routes-as-trigger schema-wise: leaning no (keep separate tables).
  • Per-route dispatch_mode: sync|async: leaning yes for v1.1.1 since the dispatcher is already being built.

Open calls

  1. Sync HTTP via outbox + per-request inbox pattern (NATS-style; see §3) — confirmed, or keep direct dispatch for sync?
  2. Ship dispatch_mode: async for HTTP routes in v1.1.1, or defer to a later release?
  3. Keep routes and triggers as separate tables (unified at the dispatcher only), or merge schemas?

3. NATS-style request/reply for sync HTTP

The constraint that makes "universal outbox" tricky: HTTP has a caller waiting. We can't write to outbox, return 202, and walk away — the user's browser expects 200 OK with body. NATS's request/reply pattern resolves this elegantly.

Pattern

HTTP request  →  orchestrator generates inbox_id, registers a oneshot channel
              →  writes outbox row { source: http, payload, reply_to: inbox_id }
              →  awaits on the channel (with timeout = script's wall-clock + buffer)

Dispatcher    →  picks outbox row
              →  dispatches to executor (gate + spawn_blocking + Rhai)
              →  if reply_to.is_some(): resolves the channel with the result
              →  if reply_to.is_none(): records completion + retries on failure per trigger config

Orchestrator  →  channel resolves → returns response to HTTP caller
              →  on timeout: returns 504 or 500 → see status-code calls below

The HTTP caller's experience is unchanged (synchronous request/response). Under the hood, dispatch is identical for every invocation source.
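
A minimal in-process sketch of the inbox registry follows. It uses `std::sync::mpsc` as a stand-in for a tokio oneshot channel so the example needs no external crates; the `Inboxes` type, its method names, and the `ScriptResult` alias are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError, Sender};
use std::sync::{Arc, Mutex};
use std::time::Duration;

type InboxId = u64;
type ScriptResult = Result<String, String>;

/// Registry of callers waiting for a reply, keyed by inbox id.
#[derive(Clone, Default)]
struct Inboxes {
    waiting: Arc<Mutex<HashMap<InboxId, Sender<ScriptResult>>>>,
}

impl Inboxes {
    /// Orchestrator side: register an inbox before writing the outbox row.
    fn register(&self, id: InboxId) -> Receiver<ScriptResult> {
        let (tx, rx) = channel();
        self.waiting.lock().unwrap().insert(id, tx);
        rx
    }

    /// Dispatcher side: resolve the inbox once; returns false if nobody is waiting.
    fn resolve(&self, id: InboxId, result: ScriptResult) -> bool {
        match self.waiting.lock().unwrap().remove(&id) {
            Some(tx) => tx.send(result).is_ok(),
            None => false,
        }
    }

    /// Orchestrator side: wait for the reply, or give up and drop the registration.
    fn wait(
        &self,
        id: InboxId,
        rx: Receiver<ScriptResult>,
        timeout: Duration,
    ) -> Result<ScriptResult, RecvTimeoutError> {
        let outcome = rx.recv_timeout(timeout);
        if outcome.is_err() {
            // Timed out: remove the sender so a late resolve() finds nobody waiting.
            self.waiting.lock().unwrap().remove(&id);
        }
        outcome
    }
}
```

Removing the entry on timeout means a late `resolve` returns `false`, which is the hook where a late result could be discarded or recorded, depending on how the cancel-on-timeout open call is decided.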

Implementation by deployment mode

| Mode | Mechanism | Trade-off |
| --- | --- | --- |
| In-process (v1.1.1, MVP) | Per-orchestrator `HashMap<InboxId, oneshot::Sender<Result>>`; dispatcher resolves the oneshot | Sub-ms wake-up; fails across process boundaries |
| Cross-process (cluster mode v1.3+) | Postgres LISTEN/NOTIFY keyed on inbox_id, with a responses row as durable backup | Sub-10ms wake-up; survives across nodes; needs careful long-listener management |
| Polling fallback | Orchestrator polls responses table for inbox_id every ~10ms | Simple; ~10ms minimum latency; only as fallback |

Latency cost (honest numbers)

Per sync HTTP request, NATS-style adds: ~1-2ms Postgres write (outbox) + sub-ms dispatcher wake (in-process channel) + ~1ms response resolve = ~2-5ms overhead. For most scripts (10-100ms execution), this is noise. PiCloud isn't optimizing for sub-ms; the architectural unification is worth a few ms.

Retry policy — reply_to IS the signal

| Outbox row | Retry behavior |
| --- | --- |
| reply_to.is_some() | Never retry. Caller is waiting; retrying means the script might run twice and the caller gets one of two outcomes. Always: one attempt, surface result (success or failure) to inbox. |
| reply_to.is_none() | Retry per trigger's configured policy. Default: 3 attempts, exponential backoff (1s, 2s, 4s), dead-letter after. |

Per-trigger config lives on the trigger row:

trigger { source: cron,   schedule: "0 */5 * * * *",
          retry: { max_attempts: 5, backoff: exponential, base_ms: 1000 } }

trigger { source: pubsub, topic: "user.created",
          retry: { max_attempts: 3, backoff: linear,      base_ms: 500  } }

trigger { source: http,   method: POST, path: "/api/foo",
          dispatch_mode: sync }   // retry absent — sync HTTP is always 1-attempt
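
The retry decision can be sketched as a single function. This assumes `max_attempts` counts the initial try, so a 3-attempt policy waits 1s and 2s and then dead-letters; whether the 4s delay in the default schedule is ever waited depends on that interpretation, which the notes leave open. `RetryPolicy`, `Backoff`, and `next_delay` are hypothetical names.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy)]
enum Backoff {
    Exponential,
    Linear,
}

#[derive(Debug, Clone, Copy)]
struct RetryPolicy {
    max_attempts: u32,
    backoff: Backoff,
    base_ms: u64,
}

/// Delay before the next try after `failed_attempts` failures (>= 1), or None to stop.
/// A row with `reply_to` set is never retried: the caller is waiting.
fn next_delay(policy: RetryPolicy, has_reply_to: bool, failed_attempts: u32) -> Option<Duration> {
    if has_reply_to || failed_attempts >= policy.max_attempts {
        return None;
    }
    let ms = match policy.backoff {
        // base, 2*base, 4*base, ... (shift is capped and the product saturates)
        Backoff::Exponential => policy
            .base_ms
            .saturating_mul(1u64 << failed_attempts.saturating_sub(1).min(32)),
        // base, 2*base, 3*base, ...
        Backoff::Linear => policy.base_ms.saturating_mul(failed_attempts as u64),
    };
    Some(Duration::from_millis(ms))
}
```

A `None` result on a row without `reply_to` is the point where the event would move to the dead-letter table (§4).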

Failure / crash handling

With NATS-style indirection, there are new ways for a sync HTTP request to vanish. Every failure path must resolve the orchestrator's oneshot channel with something:

| Failure mode | Detection | Caller sees |
| --- | --- | --- |
| Script throws / runtime error | Executor returns ExecError::Runtime → written to inbox | 502 (or 500; see status-code discussion) |
| Script exceeds wall-clock | tokio::time::timeout fires inside dispatcher → written to inbox | 504 (or 500) |
| Operation budget exceeded | Executor returns ExecError::OperationBudgetExceeded → inbox | 507 (or 500) |
| Executor process crashes mid-execution | JoinError → ExecError::Runtime → inbox | 500 |
| Dispatcher process dies between claim and reply | Orchestrator's wait times out | 500 |
| Outbox write fails (Postgres unavailable) | Orchestrator never publishes; immediate error | 500 |
| Orchestrator's own wait times out unexpectedly | Channel timeout fires before inbox resolves | 504 (or 500) |

Every path resolves the channel with a result. The orchestrator's outer timeout is the backstop for "dispatcher just died completely".

Status code strategy — open question

Today's orchestrator distinguishes 422 / 502 / 503 / 504 / 507 / 500. The user raised an "everything should be 500" framing. Two options:

  • Option A (recommended): keep existing distinctions. Script crashes → 502, timeouts → 504, overloaded → 503, parse errors → 422, dispatcher vanished → 500. Clients get actionable info.
  • Option B: flatten to 500 for everything that's "platform couldn't return a useful response". Simpler surface; loses actionable distinctions.
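
The two options differ only in one mapping function, which makes the choice cheap to revisit. A sketch, assuming a hypothetical `Failure` enum that mirrors the failure table and a `flatten` flag that selects Option B:

```rust
/// Failure kinds taken from the failure table; the enum itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Failure {
    ScriptError,             // script threw / runtime error
    WallClockTimeout,        // script exceeded its wall-clock limit
    OperationBudgetExceeded, // operation budget exhausted
    ExecutorCrashed,         // executor died mid-execution
    DispatcherVanished,      // orchestrator's outer timeout was the only signal
    OutboxWriteFailed,       // Postgres unavailable at publish time
}

/// Map a failure to an HTTP status. `flatten = true` is Option B (everything 500);
/// `flatten = false` is Option A (keep the existing distinctions).
fn status_for(failure: Failure, flatten: bool) -> u16 {
    if flatten {
        return 500;
    }
    match failure {
        Failure::ScriptError => 502,
        Failure::WallClockTimeout => 504,
        Failure::OperationBudgetExceeded => 507,
        Failure::ExecutorCrashed
        | Failure::DispatcherVanished
        | Failure::OutboxWriteFailed => 500,
    }
}
```

The 422 (parse error) and 503 (overloaded) cases are omitted here because they are rejected before dispatch and never reach the inbox.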

Status

  • NATS-style for sync HTTP: leaning yes; resolves the outbox vs direct-dispatch tension.
  • reply_to presence as the "don't retry" signal: leaning yes.
  • Default retry policy (3 attempts, exp backoff 1s/2s/4s): proposed.

Open calls

  1. NATS-style request/reply for sync HTTP — confirmed?
  2. Status code strategy: keep existing distinctions (A, recommended) or flatten to 500 (B)?
  3. Default retry policy on triggers: 3 attempts with exp backoff (1s/2s/4s), or different defaults?
  4. Cancel-on-timeout semantics: if orchestrator's wait times out but executor finishes successfully later, do we (a) discard the late result, (b) write it to an "abandoned executions" table for debugging, or (c) attempt to ack the caller late? Leaning (b) — log + discard but keep the row for forensics.

4. Dead-letter handling

Events that exhaust their retry policy land in a separate dead_letters table (not a flag on the outbox — outbox should stay a queue with fast inserts and scans). Users handle dead letters by registering a script for the new dead_letter trigger kind.

Schema sketch

CREATE TABLE dead_letters (
  id                UUID PRIMARY KEY,
  app_id            UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
  original_event_id UUID NOT NULL,         -- the outbox row id
  source            TEXT NOT NULL,         -- "kv", "cron", "pubsub", "queue", "email"
  op                TEXT NOT NULL,
  trigger_id        UUID,                  -- which trigger config fired (null for direct dispatches)
  script_id         UUID,                  -- which script failed
  payload           JSONB NOT NULL,        -- the event payload, verbatim
  attempt_count     INT  NOT NULL,
  first_attempt_at  TIMESTAMPTZ NOT NULL,
  last_attempt_at   TIMESTAMPTZ NOT NULL,
  last_error        TEXT NOT NULL,
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  resolved_at       TIMESTAMPTZ,           -- null = unresolved
  resolution        TEXT                   -- "replayed" | "ignored" | "handled_by_script" | "handler_failed"
);

CREATE INDEX idx_dead_letters_app_unresolved
  ON dead_letters(app_id) WHERE resolved_at IS NULL;

Dead letter as trigger source

trigger {
  source: dead_letter,
  filter: { source: "kv" },        // optional; defaults to "any source"
  script_id: <your handler>,
  dispatch_mode: async,
  retry: { max_attempts: 1 }       // forced; see recursion stop rule below
}

Filterable on:

  • source: only dead letters from a particular event source (kv, cron, pubsub, …)
  • trigger_id: only dead letters from a particular trigger config
  • script_id: only dead letters from a particular script
  • No filter: every dead letter fires this handler
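
Filter evaluation could look like the sketch below. It assumes that multiple filter fields combine with AND, which the notes do not state, and it uses plain strings in place of UUIDs to stay dependency-free. `DeadLetterFilter`, `DeadLetter`, and `field_matches` are hypothetical names.

```rust
/// Optional filter fields on a dead_letter trigger; None means "match anything".
#[derive(Default)]
struct DeadLetterFilter {
    source: Option<String>,
    trigger_id: Option<String>,
    script_id: Option<String>,
}

/// The subset of a dead_letters row that filters look at.
struct DeadLetter {
    source: String,
    trigger_id: Option<String>,
    script_id: Option<String>,
}

/// An unset filter field matches everything; a set one must equal the row's value.
fn field_matches(want: &Option<String>, got: Option<&str>) -> bool {
    match want {
        None => true,
        Some(w) => got == Some(w.as_str()),
    }
}

impl DeadLetterFilter {
    /// All set fields must match (AND semantics, an assumption of this sketch).
    fn matches(&self, dl: &DeadLetter) -> bool {
        field_matches(&self.source, Some(dl.source.as_str()))
            && field_matches(&self.trigger_id, dl.trigger_id.as_deref())
            && field_matches(&self.script_id, dl.script_id.as_deref())
    }
}
```

An empty filter (`DeadLetterFilter::default()`) matches every dead letter, which corresponds to the "No filter" case.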

ctx.event for a dead-letter handler:

ctx.event.source              // "dead_letter"
ctx.event.dead_letter = #{
    original: #{
        source:     "kv",
        op:         "insert",
        collection: "widgets",
        key:        "k1",
        payload:    #{ ... }
    },
    attempts:        3,
    last_error:      "script timeout after 30s",
    trigger_id:      "...",
    script_id:       "...",
    first_attempt_at: "2026-05-30T12:00:00.000Z",
    last_attempt_at:  "2026-05-30T12:00:14.000Z"
}

The handler can log::error, send email::send to admins, write to docs::collection("incidents").create(...), post to external alerting via http::post, or call dead_letters::replay(id) if it decides retry is favorable.

Recursion stop rule

Dead-letter handlers execute once, no retry, and CANNOT themselves be dead-lettered.

When the dispatcher invokes a dead-letter trigger, the resulting execution is marked is_dead_letter_handler = true. If it fails:

  • Failure is logged to the structured log (full payload + error)
  • A metric is bumped (picloud_dead_letter_handler_failures)
  • Original dead-letter row annotated with resolution = "handler_failed"
  • Nothing else is fired. Chain stops definitively.
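
As a sketch, the stop rule is the first branch of the dispatcher's failure handling, checked before any retry or dead-letter logic. The `Execution` struct and `AfterFailure` enum are hypothetical; only the `is_dead_letter_handler` flag and the "handler_failed" resolution come from the notes.

```rust
#[derive(Debug, PartialEq)]
enum AfterFailure {
    /// Attempts remain on an ordinary async event.
    Retry,
    /// Retries exhausted on an ordinary async event: insert into dead_letters.
    DeadLetter,
    /// Handler failed: log, bump the metric, annotate the original row; fire nothing.
    StopChain { resolution: &'static str },
    /// Sync HTTP: surface the error to the waiting caller only.
    ReplyToCaller,
}

struct Execution {
    is_dead_letter_handler: bool,
    has_reply_to: bool,
    attempts: u32,
    max_attempts: u32,
}

fn after_failure(e: &Execution) -> AfterFailure {
    // Checked first so a broken handler can never re-enter the dead-letter path.
    if e.is_dead_letter_handler {
        return AfterFailure::StopChain { resolution: "handler_failed" };
    }
    if e.has_reply_to {
        return AfterFailure::ReplyToCaller;
    }
    if e.attempts < e.max_attempts {
        AfterFailure::Retry
    } else {
        AfterFailure::DeadLetter
    }
}
```

Because the handler check runs before the attempt comparison, the outcome does not depend on the handler's retry config being forced to `max_attempts: 1`.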

This is the only safe stop rule. If your alerting script is broken, the platform shouldn't try to alert about that with the same broken script.

Defaults

No automatic handler. Dead letters silently land in the table. Users opt into handling by registering a trigger. The dashboard surfaces an unresolved-count badge so users notice.

This avoids over-engineering — most apps will run for months without a dead-letter trigger; the table is the durable record either way.

Sync HTTP failures don't dead-letter

Failures of sync HTTP requests (reply_to.is_some()) don't land in dead_letters, for three reasons: the caller already got an error response; every failed HTTP request landing in dead_letters would flood the table; and execution_logs already captures sync request failures. If a user wants alerts on HTTP endpoint failures, that's monitoring (v1.3+ territory), not dead-lettering.

Pub/sub fan-out dead-letters independently

One pubsub::publish → N subscribers → each retries independently → each can independently dead-letter. So one publish can produce N dead-letter rows (one per subscriber that exhausted retries). Subscribers are independent failure domains.

Manual replay

| Surface | Use case |
| --- | --- |
| POST /api/v1/admin/apps/{id}/dead_letters/{dl_id}/replay | Admin clicks "replay" in dashboard |
| dead_letters::replay(id) (Rhai SDK) | A handler script decides to retry programmatically |
| dead_letters::resolve(id, reason) (Rhai SDK) | A handler decides "this is fine, don't bother me" |

Replay re-inserts the original event into the outbox; dispatcher tries again from scratch.

Retention

Time-based: delete dead letters older than 30 days by default. Configurable per-app via app settings, or globally via env var (PICLOUD_DEAD_LETTER_RETENTION_DAYS). A weekly GC job in the manager handles the deletion using FOR UPDATE SKIP LOCKED.

Status

  • Separate dead_letters table: leaning yes.
  • dead_letter as trigger kind: leaning yes.
  • Recursion stop rule (handlers can't be dead-lettered): leaning yes.
  • No default handler (rows just sit in table): leaning yes.
  • Sync HTTP failures don't dead-letter: leaning yes.

Open calls

  1. Dead-letter handlers unretryable + can't be dead-lettered themselves — confirmed?
  2. No default dead-letter handler (rows just sit in the table); user opts in — confirmed, or do you want a built-in "log to admin notifications channel" default?
  3. 30-day default retention sensible, or longer/shorter?
  4. Include Rhai SDK (dead_letters::list/replay/resolve) in v1.1.1 alongside admin endpoints, or defer the script-side surface to a later release?

5. Realtime updates for external clients

Apps built on PiCloud need a way for browser/mobile clients to receive live updates (chat messages, dashboard data, multiplayer state, notifications). Today's pub/sub is internal-only (script ↔ script via triggers).

The chosen approach

Option C (from prior debate): topics with opt-in external subscription.

  • One pubsub::publish(topic, msg) API for scripts — produces a single event
  • Topics are internal-only by default — script triggers can subscribe; external clients cannot
  • Apps explicitly mark topics as externally-subscribable (per-topic config in dashboard / API)
  • External clients connect to GET /realtime/topics/{topic} via SSE and receive only messages published to topics they're permitted to subscribe to

Wins: one publish API for scripts (DRY), topics don't leak by default (security), external visibility is an explicit opt-in per topic.

Transport: SSE first

SSE (Server-Sent Events) for v1.x:

  • Simpler than WebSocket; works through any HTTP proxy without protocol upgrade
  • Browsers auto-reconnect on disconnect
  • Sufficient for "server-pushed events to the browser" (the dominant use case)

WebSocket is added later if bidirectional comms (chat-style) warrant it.

Auth model for external subscribers

Three flavors, ordered by complexity:

  • Public topics — anyone with the URL connects. For marketing-style broadcasts, public stat boards.
  • Token-gated topics — client presents a token issued by a script. Pusher / Ably-style. Token can be a PiCloud API key (v1.1.6) or a users-SDK session token (v1.1.8+).
  • Script-mediated — a script handles each subscribe request and decides yes/no. Most flexible, defer to v1.2.

Ship public + token-gated in v1.1.6; defer script-mediated.
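
The subscribe-time check for the two v1.1.6 auth modes can be sketched as a default-deny lookup. The `Exposure` enum and `Topics` struct are hypothetical, and token validation itself is reduced to a boolean here.

```rust
use std::collections::HashMap;

/// Per-topic exposure setting; topics are internal unless marked otherwise.
enum Exposure {
    Internal,
    Public,
    TokenGated,
}

struct Topics {
    exposure: HashMap<String, Exposure>,
}

impl Topics {
    /// Decide whether an external SSE client may subscribe.
    /// Unknown topics are treated as internal (default-deny).
    fn may_subscribe(&self, topic: &str, token_valid: bool) -> bool {
        match self.exposure.get(topic) {
            Some(Exposure::Public) => true,
            Some(Exposure::TokenGated) => token_valid,
            Some(Exposure::Internal) | None => false,
        }
    }
}
```

Script-mediated auth (deferred to v1.2) would add a fourth variant whose arm invokes a script instead of checking a flag.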

Status

  • Approach C (opt-in external subscription): leaning yes.
  • SSE first, WebSocket later: leaning yes.
  • Public + token-gated auth in v1.1.6: leaning yes.

Open calls

  1. Approach C confirmed (vs A: pubsub IS realtime, B: separate channels:: service)?
  2. SSE first, WebSocket deferred — confirmed, or ship both in v1.1.6?
  3. Auth: public + API-key gating in v1.1.6, defer users-SDK-based tokens to v1.1.8 follow-up — confirmed?

6. Frontend client library

Strategic positioning question: how much should PiCloud expose to frontend developers building apps on top of it?

The two ends of the spectrum

| End | Frontend gets | Examples |
| --- | --- | --- |
| Minimalist | HTTP to dev-defined script endpoints + SSE on dev-marked-public topics. Nothing else. | AWS Lambda + API Gateway, Cloudflare Workers, Deno Deploy |
| Maximalist | Direct client-side access to KV/docs/users/files. Frontend writes kv.get(), docs.find(), no Rhai script for trivial reads. | Firebase, Supabase, AWS Amplify |

PiCloud today sits at the minimalist end (services exist for scripts to use, not for frontends). Crossing to maximalist would be a real product pivot, not a feature add.

The chosen approach: hybrid

Ship a client library that talks to scripts, not to services. Specifically, three things:

  1. Typed HTTP client to dev-defined endpoints: picloud.endpoint('/api/users').post({ name: 'alice' }). Fetch wrapper with auth header injection, retry logic, structured error handling.
  2. SSE subscription: picloud.subscribe('chat-room-123', msg => …). Auto-reconnect, token refresh, backpressure.
  3. Auth flow helpers: picloud.auth.login(email, password), picloud.auth.logout(), picloud.auth.token. These call dev-defined endpoints under the hood (/api/auth/login etc.); the lib just standardizes the dance + token storage.

Crucially: no picloud.kv.get() or picloud.docs.find() from the frontend. Those stay server-side, behind dev-written Rhai scripts.

Why hybrid, not maximalist

Firebase trades security for DX; the security-rule misconfiguration footgun is the #1 cause of accidental data exposure in serverless apps. PiCloud's "solo dev / consumer hardware" audience does not have the operational capacity to defend a Firebase-style attack surface against misconfiguration. The script layer is also where PiCloud differentiates — if frontends bypass scripts to talk directly to services, we're competing with Supabase head-to-head (unwinnable, they're better-resourced and have a 5-year head start).

Why hybrid, not pure minimalist

A frontend dev shouldn't have to hand-roll fetch wrappers, SSE reconnect logic, and token-refresh dances. That stuff is identical across every app. Shipping it as @picloud/client is genuinely valuable — it doesn't expand the security surface (scripts still gate everything), it just removes boilerplate.

TypeScript first

Ship TypeScript first. Cross-language story (Python, Swift, Kotlin, Rust, …) deferred until demand emerges. TS covers the dominant "web app + mobile via React Native" segment.

Status

  • Hybrid model (frontend through scripts only): leaning yes.
  • TypeScript first, other languages deferred: leaning yes.
  • Co-ship with realtime as v1.1.6: leaning yes (SSE wrapper is the killer feature of the lib).

Open calls

  1. Hybrid model — confirmed, or do you want to seriously evaluate Firebase-mode?
  2. TypeScript first, multi-language deferred — confirmed?
  3. Co-ship realtime + client lib as v1.1.6, or split (server in v1.1.6, client lib later)?
  4. Type safety: hand-written types only, or aim for codegen from script-declared schemas? Codegen is big — defer to v1.2+ if at all?

7. Revised v1.1.x roadmap

Net changes vs the blueprint §12 roadmap:

  • v1.1.5 pub/sub: now via trigger outbox (drops LISTEN/NOTIFY plan), tightening implementation scope
  • NEW v1.1.6 Realtime Channels & Client Library: realtime SSE + @picloud/client TS package; co-shipped
  • v1.1.7+ items shifted by one (was v1.1.6/7/8 → now v1.1.7/8/9)
  • Dead letters and the unified outbox/dispatcher are absorbed into v1.1.1's existing scope (triggers framework)

| Version | Capability |
| --- | --- |
| v1.1.0 | Foundation & Standard Library: SDK shape, Services bundle, SdkCallCx, ExecutionGate, ServiceEventEmitter trait shape; stdlib utilities (regex, random, time, json, base64, hex, url). ✓ Shipped. |
| v1.1.1 | Storage & Events: KV store keyed (app_id, collection, key); triggers framework (universal outbox + dispatcher + NATS-style sync HTTP via inbox + per-trigger retry config + dead-letter table & dead_letter trigger source + trigger CRUD + ctx.event + depth limit); KV trigger kinds. |
| v1.1.2 | Documents: docs::collection(name).create/find/update/delete/list with docs:* triggers. |
| v1.1.3 | Modules: scripts.kind, per-app resolver replaces DummyModuleResolver, AST cache + dep-graph invalidation. |
| v1.1.4 | Outbound HTTP & Scheduled Tasks: http::* with SSRF deny-list; cron triggers (small now that the framework exists). |
| v1.1.5 | Files & Pub/Sub: filesystem-backed blobs (`files/<app_id>/<id[0:2]>/<id>`) with files:* triggers; pub/sub via the universal outbox with pubsub:* triggers. |
| v1.1.6 | Realtime Channels & Client Library (new): SSE-based external subscription to per-app pub/sub topics (public + API-key auth modes); @picloud/client TypeScript package (typed HTTP, SSE subscription, auth helpers). |
| v1.1.7 | Configuration & Email (was v1.1.6): encrypted per-app secrets; outbound email::send/send_html + inbound email:receive trigger. |
| v1.1.8 | User Management (was v1.1.7): users::* for in-script CRUD, auth, roles, invites, password reset. |
| v1.1.9 | Durable Queues & Function Composition (was v1.1.8): queue::* with queue:receive trigger; invoke() + retry::* (closures-as-args, re-entrant Rhai). |
| v1.2 | Workflows & Hierarchies (per blueprint §Phase 5): DAG execution, advanced docs query, interceptors, read triggers, audit log, script-mediated realtime auth. |
| v1.3+ | Scale & Ops (per blueprint §Phase 6): cluster mode (NATS-style request/reply swaps to LISTEN/NOTIFY), cross-app data sharing, script versioning + rollback, rate limiting, richer auth, metrics, distributed tracing, webhooks, S3, monitoring/alerting on HTTP endpoint failures. |

The v1.1.9 release marks the end of the v1.1.x expansion cadence. v1.2 is the next minor product bump (phase milestone per versioning policy).


Consolidated open calls

Numbered for easy reference in conversation. All currently unanswered.

§1 — Messaging primitives

  1. Pub/sub durability via trigger outbox (durable, at-least-once) — confirmed?
  2. Queue and pub/sub stay separate concepts (rather than unifying under a "messaging" abstraction with a subscription-mode flag) — confirmed?

§2 — Universal trigger outbox

  1. Sync HTTP via outbox + per-request inbox pattern (NATS-style; see §3) — confirmed, or keep direct dispatch for sync?
  2. Ship dispatch_mode: async for HTTP routes in v1.1.1, or defer to a later release?
  3. Keep routes and triggers as separate tables (unified at the dispatcher only), or merge schemas?

§3 — NATS-style sync HTTP

  1. NATS-style request/reply for sync HTTP — confirmed?
  2. Status code strategy: keep existing distinctions (recommended) or flatten to 500?
  3. Default retry policy on triggers: 3 attempts with exp backoff (1s/2s/4s), or different defaults?
  4. Cancel-on-timeout semantics: discard late results (a), write to "abandoned executions" table for debugging (b — recommended), or attempt to ack the caller late (c)?

§4 — Dead letters

  1. Dead-letter handlers unretryable + can't be dead-lettered themselves — confirmed?
  2. No default dead-letter handler (rows just sit in the table); user opts in — confirmed, or built-in "log to admin notifications channel" default?
  3. 30-day default retention sensible, or longer/shorter?
  4. Include Rhai SDK (dead_letters::list/replay/resolve) in v1.1.1 alongside admin endpoints, or defer the script-side surface to a later release?

§5 — Realtime

  1. Approach C confirmed (opt-in external subscription on pub/sub topics) vs A: pubsub IS realtime, B: separate channels:: service?
  2. SSE first, WebSocket deferred — confirmed, or ship both in v1.1.6?
  3. Auth: public + API-key gating in v1.1.6, defer users-SDK-based tokens to v1.1.8 follow-up — confirmed?

§6 — Frontend client library

  1. Hybrid model (frontend through scripts only) — confirmed, or seriously evaluate Firebase-mode?
  2. TypeScript first, multi-language deferred — confirmed?
  3. Co-ship realtime + client lib as v1.1.6, or split (server in v1.1.6, client lib later)?
  4. Type safety: hand-written types only, or aim for codegen from script-declared schemas? Defer codegen to v1.2+ if at all?

Lifecycle of this document

  • Created at the v1.1.0 → v1.1.1 boundary (after the foundation PR series shipped).
  • Each section gets pruned once its decisions ship and land in the blueprint.
  • Open calls are answered in conversation, then folded into the corresponding section as "Decided: X" with the date.
  • Document deleted when v1.1.9 ships — everything by then is either in the blueprint, in code, or explicitly deferred to v1.2+.