
MechaCat02 10cfde9e40 docs(v1.1.x): planning notes — in-flight decisions + revised roadmap

Consolidates the architectural conversations that followed the v1.1.0
release but haven't yet landed in the blueprint or in code. Six topic
areas, each with status + open calls:

  1. Messaging primitives — invoke vs pub/sub vs queue, recipient
     model and delivery semantics
  2. Universal trigger outbox — async dispatch substrate for every
     event source (sync HTTP excepted, see #3)
  3. NATS-style sync HTTP — per-request inbox + oneshot channel lets
     sync HTTP ride the outbox without losing the response path
  4. Dead-letter handling — separate table, dead_letter trigger kind,
     recursion stop rule, retention defaults
  5. Realtime updates — SSE-based external subscription to per-app
     pub/sub topics with opt-in exposure
  6. Frontend client library — hybrid model (TS lib that talks to
     dev-defined script endpoints, not to services)

Plus a revised v1.1.x roadmap: realtime adds at v1.1.6 (was Config &
Email), shifting later items by one to v1.1.9 (was v1.1.8).

20 open calls consolidated at the bottom, numbered for reference.
Document is meant to be pruned as decisions ship; deleted entirely
when v1.1.9 lands.

No blueprint changes yet — those wait for the open calls to be
answered and the corresponding PRs to ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-31 20:24:53 +02:00


v1.1.x design notes — in-flight decisions + revised roadmap

Planning document for the v1.1.x release series. Companion to:

serverless_cloud_blueprint.md — authoritative design
docs/sdk-shape.md — SDK conventions (settled in v1.1.0)
docs/stdlib-reference.md — stdlib API (settled in v1.1.0)
docs/versioning.md — versioning policy (post-1.0 carve-out settled with v1.1.0)

Items in this doc are either tentatively decided but not yet shipped or open calls awaiting the maintainer's decision. Once an item ships, its content moves into the blueprint and the corresponding section here gets pruned.

This document was created at the v1.1.0 → v1.1.1 boundary, capturing the architectural conversations that followed v1.1.0 but haven't yet landed in code or in the blueprint.

1. The three messaging primitives

PiCloud will expose three distinct messaging concepts. The right way to slice them is along recipient model and delivery semantics:

Primitive	Recipients	Durability	Delivery	Retry on script failure	Mental model
`invoke(script_id, args)`	One named script	None (or fire-and-forget durable)	At-most-once sync, or at-least-once async	Caller-controlled via `retry::*`	Function call
`pubsub::publish(topic, msg)`	All scripts subscribed via trigger	Through outbox	At-least-once per subscriber	Per-subscriber retry up to N, then dead-letter	Fan-out broadcast
`queue::enqueue(name, msg)`	Exactly one consumer wins	Durable table	At-least-once total	Visibility timeout + nack-on-throw	Work distribution

Critical distinction: pub/sub and queue both end up at-least-once, but the subscriber model differs. Queue: 1 message → 1 delivery record → consumers compete. Pub/sub: 1 message → N delivery records (one per subscriber) → no competition.

Pub/sub reframe — drop ephemeral, use the outbox

The original blueprint planned pub/sub over Postgres LISTEN/NOTIFY (ephemeral, sub-millisecond fan-out). The reframe reuses the triggers framework's outbox infrastructure instead:

pubsub::publish(topic, msg) writes to the outbox
Dispatcher fans out one delivery record per subscribed script trigger
Each delivery retried on failure with the same machinery as KV / doc / file triggers
After N retries → dead-letter (see §4)
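The fan-out step above can be sketched as a pure function. This is a minimal illustration, not the dispatcher's actual code: the `Delivery` struct, its fields, and `fan_out` are hypothetical names, and the real records would live in the outbox table rather than in a `Vec`.

```rust
/// One delivery record per (message, subscriber) pair.
/// Field names are illustrative, not taken from the blueprint.
#[derive(Debug, Clone, PartialEq)]
pub struct Delivery {
    pub message_id: u64,
    pub script_id: String,
    pub attempts: u32,
}

/// Pub/sub fan-out: one published message becomes N delivery records,
/// one for each subscribed script trigger. Each record is then retried
/// (and possibly dead-lettered) on its own.
pub fn fan_out(message_id: u64, subscribers: &[&str]) -> Vec<Delivery> {
    subscribers
        .iter()
        .map(|script_id| Delivery {
            message_id,
            script_id: script_id.to_string(),
            attempts: 0,
        })
        .collect()
}
```

A queue, by contrast, would create a single record for the same message and let consumers compete for it.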

Wins: one delivery model in the whole system, durable pub/sub for free, shared observability/retry/dead-letter tooling across every event-firing surface.

Cost: ~1ms Postgres write per publish (vs in-memory NOTIFY). For the solo-dev / consumer-hardware audience, that is the right tradeoff. If sub-ms latency ever matters, pubsub::publish_ephemeral is a possible future addition that bypasses the outbox.

Queue stays separate

Pub/sub-through-outbox cannot model "work distribution with backpressure" cleanly. Queue keeps its own table:

Producer: queue::enqueue(name, msg) → queue table
Consumer: queue:receive trigger fires when message available; runtime claims with FOR UPDATE SKIP LOCKED + visibility timeout
Script returns successfully → auto-ack (delete row)
Script throws → auto-nack (clear claim; message becomes visible again)
Visibility timeout exceeded → reclaim allowed (handles crashed consumers)
Max delivery attempts → dead-letter
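The consumer lifecycle above can be modeled in memory to make the state transitions concrete. This is a single-threaded sketch under stated assumptions: `Queue`, `QueueMessage`, `Outcome`, `claim`, and `settle` are hypothetical names, the `Vec` stands in for the queue table, and `claim` stands in for the `FOR UPDATE SKIP LOCKED` query. For brevity the max-attempts check only runs when a script fails, not when a claim expires.

```rust
use std::time::{Duration, Instant};

#[derive(Debug)]
pub struct QueueMessage {
    pub id: u64,
    pub attempts: u32,
    /// Set while a consumer holds the message; None means visible.
    pub claimed_until: Option<Instant>,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Acked,
    Requeued,
    DeadLettered,
}

pub struct Queue {
    pub messages: Vec<QueueMessage>,
    pub visibility: Duration,
    pub max_attempts: u32,
}

impl Queue {
    /// Claim the first visible message: never claimed, or claim expired
    /// (the expired case covers crashed consumers).
    pub fn claim(&mut self, now: Instant) -> Option<u64> {
        let visibility = self.visibility;
        let msg = self
            .messages
            .iter_mut()
            .find(|m| m.claimed_until.map_or(true, |until| until <= now))?;
        msg.attempts += 1;
        msg.claimed_until = Some(now + visibility);
        Some(msg.id)
    }

    /// Settle a claimed message once the script has finished.
    pub fn settle(&mut self, id: u64, script_ok: bool) -> Option<Outcome> {
        let idx = self.messages.iter().position(|m| m.id == id)?;
        if script_ok {
            // Auto-ack: delete the row.
            self.messages.remove(idx);
            return Some(Outcome::Acked);
        }
        if self.messages[idx].attempts >= self.max_attempts {
            // Out of attempts: the row would move to dead_letters.
            self.messages.remove(idx);
            return Some(Outcome::DeadLettered);
        }
        // Auto-nack: clear the claim so the message is visible again.
        self.messages[idx].claimed_until = None;
        Some(Outcome::Requeued)
    }
}
```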

The queue table IS the outbox for queue semantics — no double-buffering.

Status

Pub/sub via trigger outbox: leaning yes; needs final ack.
Queue stays separate from pub/sub: leaning yes.
Drop LISTEN/NOTIFY plan: leaning yes.

Open calls

Pub/sub durability via trigger outbox (durable, at-least-once) — confirmed?
Queue and pub/sub stay separate concepts (rather than unifying under a "messaging" abstraction with a subscription-mode flag) — confirmed?

2. Universal trigger outbox

The triggers framework's outbox should be the universal substrate for async dispatch. Every event source that fires scripts asynchronously writes to the same outbox table; one dispatcher reads from it and routes to the executor with shared load control, retry, dead-letter, and trigger-depth tracking.

What runs through the outbox

Ingress	Path	Reason
HTTP request (sync)	Direct: orchestrator → executor → response (with NATS-style indirection — see §3)	Caller is waiting; the inbox pattern makes this work via the outbox
HTTP request (async, opt-in)	Orchestrator writes outbox → returns 202 → dispatcher → executor	Webhooks, fire-and-forget endpoints; explicit opt-in via route config
Cron tick	Scheduler writes outbox → dispatcher → executor	No caller; naturally async
KV / doc / file change	Service writes outbox → dispatcher → executor	No caller; the originating script already returned
Pub/sub publish	Service writes outbox → dispatcher → executor (per subscriber)	Fan-out semantics
Queue message	Queue table IS the outbox; dispatcher claims via `FOR UPDATE SKIP LOCKED`	Avoids double-buffering
Inbound email	SMTP receiver writes outbox → dispatcher → executor	No caller

What this gives

One dispatcher = one place for load control (the existing ExecutionGate), retry, dead-letter, trigger-depth tracking, fan-out. New event source = "write to outbox in this shape", nothing else.
Routes become a trigger kind, conceptually. A route is (source=http, filter=method+path, script_id, dispatch_mode=sync|async). Schema-wise the routes table likely stays separate from the new triggers table (polymorphic JSON columns get ugly), but the mental model collapses to "everything that fires a script is a trigger".
dispatch_mode = async is a per-route opt-in. Webhook handlers can return 202 immediately and process in the background — dispatcher handles retries, caller gets a snappy ack.
Replay and debugging. Every async invocation has an outbox row; admin can re-fire a trigger by re-dispatching the row.
Decoupled lifecycle. Dispatcher can be paused for maintenance without affecting HTTP ingress (it just queues); HTTP can degrade (overflow 503s) without affecting async work already in the outbox.
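The "everything that fires a script is a trigger" mental model can be expressed as one type at the dispatch layer, even if the tables stay separate. The sketch below is an assumption-laden illustration: `Source`, `Trigger`, `IngressReply`, and `ingress_reply` are hypothetical names, and only a few sources are modeled.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DispatchMode {
    Sync,
    Async,
}

/// A route is just the HTTP variant of a trigger source.
#[derive(Debug, Clone, PartialEq)]
pub enum Source {
    Http { method: String, path: String, dispatch_mode: DispatchMode },
    Cron { schedule: String },
    KvChange { collection: String },
    PubSub { topic: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Trigger {
    pub source: Source,
    pub script_id: String,
}

/// What the ingress does after writing the outbox row.
#[derive(Debug, PartialEq)]
pub enum IngressReply {
    /// Sync HTTP: wait on the per-request inbox (see §3).
    AwaitInbox,
    /// Async HTTP: return 202 right away.
    Accepted202,
    /// Cron, KV, pub/sub: nobody is waiting.
    NoCaller,
}

pub fn ingress_reply(trigger: &Trigger) -> IngressReply {
    match &trigger.source {
        Source::Http { dispatch_mode: DispatchMode::Sync, .. } => IngressReply::AwaitInbox,
        Source::Http { dispatch_mode: DispatchMode::Async, .. } => IngressReply::Accepted202,
        Source::Cron { .. } | Source::KvChange { .. } | Source::PubSub { .. } => {
            IngressReply::NoCaller
        }
    }
}
```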

What this doesn't change

Sync HTTP still hits the ExecutionGate the same way (now via the dispatcher).
Async outbox dispatch also hits the gate when the dispatcher picks a row. Sync and async work share one cap on blocking threads actually in use.
Trigger CRUD likely stays in per-kind tables for schema sanity; the unification is conceptual + dispatch-layer, not schema-layer.

Status

Universal outbox for async dispatch: leaning yes.
Routes-as-trigger conceptually: leaning yes.
Routes-as-trigger schema-wise: leaning no (keep separate tables).
Per-route dispatch_mode: sync|async: leaning yes for v1.1.1 since the dispatcher is already being built.

Open calls

Sync HTTP via outbox + per-request inbox pattern (NATS-style; see §3) — confirmed, or keep direct dispatch for sync?
Ship dispatch_mode: async for HTTP routes in v1.1.1, or defer to a later release?
Keep routes and triggers as separate tables (unified at the dispatcher only), or merge schemas?

3. NATS-style request/reply for sync HTTP

The constraint that makes "universal outbox" tricky: HTTP has a caller waiting. We can't write to outbox, return 202, and walk away — the user's browser expects 200 OK with body. NATS's request/reply pattern resolves this elegantly.

Pattern

HTTP request  →  orchestrator generates inbox_id, registers a oneshot channel
              →  writes outbox row { source: http, payload, reply_to: inbox_id }
              →  awaits on the channel (with timeout = script's wall-clock + buffer)

Dispatcher    →  picks outbox row
              →  dispatches to executor (gate + spawn_blocking + Rhai)
              →  if reply_to.is_some(): resolves the channel with the result
              →  if reply_to.is_none(): records completion + retries on failure per trigger config

Orchestrator  →  channel resolves → returns response to HTTP caller
              →  on timeout: returns 504 or 500 → see status-code calls below

The HTTP caller's experience is unchanged (synchronous request/response). Under the hood, dispatch is identical for every invocation source.
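The in-process variant of this pattern can be sketched with the standard library alone. The notes name a tokio oneshot channel; the sketch below substitutes `std::sync::mpsc` and a blocking `recv_timeout` so it stays self-contained. `InboxRegistry`, `ScriptResult`, and the method names are hypothetical, and the 504/500 mapping is only one of the options still open in the status-code discussion.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError, Sender};
use std::sync::Mutex;
use std::time::Duration;

pub type InboxId = u64;
pub type ScriptResult = Result<String, String>;

#[derive(Default)]
pub struct InboxRegistry {
    next_id: Mutex<InboxId>,
    waiting: Mutex<HashMap<InboxId, Sender<ScriptResult>>>,
}

impl InboxRegistry {
    /// Orchestrator side: allocate an inbox id (the outbox row's reply_to)
    /// and keep the receiving end of the channel.
    pub fn register(&self) -> (InboxId, Receiver<ScriptResult>) {
        let mut next = self.next_id.lock().unwrap();
        *next += 1;
        let id = *next;
        let (tx, rx) = channel();
        self.waiting.lock().unwrap().insert(id, tx);
        (id, rx)
    }

    /// Dispatcher side: deliver the result. Returns false when nobody is
    /// waiting anymore (for example, the orchestrator already timed out).
    pub fn resolve(&self, id: InboxId, result: ScriptResult) -> bool {
        match self.waiting.lock().unwrap().remove(&id) {
            Some(tx) => tx.send(result).is_ok(),
            None => false,
        }
    }

    /// Orchestrator side: wait for the reply, or give up with a status code.
    pub fn await_reply(
        &self,
        id: InboxId,
        rx: Receiver<ScriptResult>,
        timeout: Duration,
    ) -> Result<ScriptResult, u16> {
        match rx.recv_timeout(timeout) {
            Ok(result) => Ok(result),
            Err(RecvTimeoutError::Timeout) => {
                // Forget the inbox so a late resolve() is reported as such.
                self.waiting.lock().unwrap().remove(&id);
                Err(504)
            }
            Err(RecvTimeoutError::Disconnected) => Err(500),
        }
    }
}
```

A late `resolve` returning `false` is the hook where an "abandoned executions" record could be written, if that open call lands on option (b).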

Implementation by deployment mode

Mode	Mechanism	Trade-off
In-process (v1.1.1, MVP)	Per-orchestrator `HashMap<InboxId, oneshot::Sender<Result>>`; dispatcher resolves the oneshot	Sub-ms wake-up; fails across process boundaries
Cross-process (cluster mode v1.3+)	Postgres `LISTEN/NOTIFY` keyed on `inbox_id`, with a `responses` row as durable backup	Sub-10ms wake-up; survives across nodes; needs careful long-listener management
Polling fallback	Orchestrator polls `responses` table for `inbox_id` every ~10ms	Simple; ~10ms minimum latency; only as fallback

Latency cost (honest numbers)

Per sync HTTP request, NATS-style adds: ~1-2ms Postgres write (outbox) + sub-ms dispatcher wake (in-process channel) + ~1ms response resolve = ~2-5ms overhead. For most scripts (10-100ms execution), this is noise. PiCloud isn't optimizing for sub-ms; the architectural unification is worth a few ms.

Retry policy — `reply_to` IS the signal

Outbox row	Retry behavior
`reply_to.is_some()`	Never retry. Caller is waiting; retrying means the script might run twice and the caller gets one of two outcomes. Always: one attempt, surface result (success or failure) to inbox.
`reply_to.is_none()`	Retry per trigger's configured policy. Default: 3 attempts, exponential backoff (1s, 2s, 4s), dead-letter after.

Per-trigger config lives on the trigger row:

trigger { source: cron,   schedule: "0 */5 * * * *",
          retry: { max_attempts: 5, backoff: exponential, base_ms: 1000 } }

trigger { source: pubsub, topic: "user.created",
          retry: { max_attempts: 3, backoff: linear,      base_ms: 500  } }

trigger { source: http,   method: POST, path: "/api/foo",
          dispatch_mode: sync }   // retry absent — sync HTTP is always 1-attempt
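The retry rule and the per-trigger config can be combined into one decision function. This is a sketch, not the dispatcher's implementation: `RetryPolicy`, `NextStep`, and `after_failure` are hypothetical names. It assumes `max_attempts` counts total attempts, so with the proposed default of 3 only the 1s and 2s delays are ever used; whether the 4s step applies is not settled by these notes.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Backoff {
    Exponential,
    Linear,
}

#[derive(Debug, Clone, Copy)]
pub struct RetryPolicy {
    pub max_attempts: u32,
    pub backoff: Backoff,
    pub base_ms: u64,
}

#[derive(Debug, PartialEq)]
pub enum NextStep {
    /// reply_to is set: one attempt only, hand the failure to the caller.
    SurfaceToInbox,
    RetryAfterMs(u64),
    DeadLetter,
}

/// `attempt` is the 1-based number of the attempt that just failed.
pub fn after_failure(has_reply_to: bool, attempt: u32, policy: RetryPolicy) -> NextStep {
    if has_reply_to {
        return NextStep::SurfaceToInbox;
    }
    if attempt >= policy.max_attempts {
        return NextStep::DeadLetter;
    }
    let delay_ms = match policy.backoff {
        // base, 2*base, 4*base, ... (shift capped to avoid overflow)
        Backoff::Exponential => policy
            .base_ms
            .saturating_mul(1u64 << attempt.saturating_sub(1).min(32)),
        // base, 2*base, 3*base, ...
        Backoff::Linear => policy.base_ms.saturating_mul(u64::from(attempt)),
    };
    NextStep::RetryAfterMs(delay_ms)
}
```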

Failure / crash handling

With NATS-style indirection, there are new ways for a sync HTTP request to vanish. Every failure path must resolve the orchestrator's oneshot channel with something:

Failure mode	Detection	Caller sees
Script throws / runtime error	Executor returns `ExecError::Runtime` → written to inbox	502 (or 500 — see status-code discussion)
Script exceeds wall-clock	`tokio::time::timeout` fires inside dispatcher → written to inbox	504 (or 500)
Operation budget exceeded	Executor returns `ExecError::OperationBudgetExceeded` → inbox	507 (or 500)
Executor process crashes mid-execution	`JoinError` → `ExecError::Runtime` → inbox	500
Dispatcher process dies between claim and reply	Orchestrator's wait times out	500
Outbox write fails (Postgres unavailable)	Orchestrator never publishes; immediate error	500
Orchestrator's own wait times out unexpectedly	Channel timeout fires before inbox resolves	504 (or 500)

Every path resolves the channel with a result. The orchestrator's outer timeout is the backstop for "dispatcher just died completely".

Status code strategy — open question

Today's orchestrator distinguishes 422 / 502 / 503 / 504 / 507 / 500. The user raised an "everything should be 500" framing. Two options:

Option A (recommended): keep existing distinctions. Script crashes → 502, timeouts → 504, overloaded → 503, parse errors → 422, dispatcher vanished → 500. Clients get actionable info.
Option B: flatten to 500 for everything that's "platform couldn't return a useful response". Simpler surface; loses actionable distinctions.
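The two options differ only in one mapping function, which makes the tradeoff easy to see side by side. The `Failure` enum and both function names are hypothetical. The Option A codes come from the failure table and the option text above; the Option B function assumes that parse errors are flattened too, which the notes do not settle.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Failure {
    ParseError,
    ScriptThrew,
    Overloaded,
    WallClockExceeded,
    OperationBudgetExceeded,
    DispatcherVanished,
}

/// Option A: keep the existing distinctions.
pub fn status_option_a(failure: Failure) -> u16 {
    match failure {
        Failure::ParseError => 422,
        Failure::ScriptThrew => 502,
        Failure::Overloaded => 503,
        Failure::WallClockExceeded => 504,
        Failure::OperationBudgetExceeded => 507,
        Failure::DispatcherVanished => 500,
    }
}

/// Option B: flatten everything to 500 (assumed to include parse errors).
pub fn status_option_b(_failure: Failure) -> u16 {
    500
}
```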

Status

NATS-style for sync HTTP: leaning yes; resolves the outbox vs direct-dispatch tension.
reply_to presence as the "don't retry" signal: leaning yes.
Default retry policy (3 attempts, exp backoff 1s/2s/4s): proposed.

Open calls

NATS-style request/reply for sync HTTP — confirmed?
Status code strategy: keep existing distinctions (A, recommended) or flatten to 500 (B)?
Default retry policy on triggers: 3 attempts with exp backoff (1s/2s/4s), or different defaults?
Cancel-on-timeout semantics: if orchestrator's wait times out but executor finishes successfully later, do we (a) discard the late result, (b) write it to an "abandoned executions" table for debugging, or (c) attempt to ack the caller late? Leaning (b) — log + discard but keep the row for forensics.

4. Dead-letter handling

Events that exhaust their retry policy land in a separate dead_letters table (not a flag on the outbox — outbox should stay a queue with fast inserts and scans). Users handle dead letters by registering a script for the new dead_letter trigger kind.

Schema sketch

CREATE TABLE dead_letters (
  id                UUID PRIMARY KEY,
  app_id            UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
  original_event_id UUID NOT NULL,         -- the outbox row id
  source            TEXT NOT NULL,         -- "kv", "cron", "pubsub", "queue", "email"
  op                TEXT NOT NULL,
  trigger_id        UUID,                  -- which trigger config fired (null for direct dispatches)
  script_id         UUID,                  -- which script failed
  payload           JSONB NOT NULL,        -- the event payload, verbatim
  attempt_count     INT  NOT NULL,
  first_attempt_at  TIMESTAMPTZ NOT NULL,
  last_attempt_at   TIMESTAMPTZ NOT NULL,
  last_error        TEXT NOT NULL,
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  resolved_at       TIMESTAMPTZ,           -- null = unresolved
  resolution        TEXT                   -- "replayed" | "ignored" | "handled_by_script" | "handler_failed"
);

CREATE INDEX idx_dead_letters_app_unresolved
  ON dead_letters(app_id) WHERE resolved_at IS NULL;

Dead letter as trigger source

trigger {
  source: dead_letter,
  filter: { source: "kv" },        // optional; defaults to "any source"
  script_id: <your handler>,
  dispatch_mode: async,
  retry: { max_attempts: 1 }       // forced; see recursion stop rule below
}

Filterable on:

source: only dead letters from a particular event source (kv, cron, pubsub, …)
trigger_id: only dead letters from a particular trigger config
script_id: only dead letters from a particular script
No filter: every dead letter fires this handler
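A filter check along these lines would decide whether a given dead letter fires a handler. This sketch assumes that when several filter fields are set they combine with AND, which the notes do not state; `DeadLetter`, `DeadLetterFilter`, and `matches` are hypothetical names, and IDs are plain strings for brevity.

```rust
#[derive(Debug, Clone)]
pub struct DeadLetter {
    pub source: String,
    pub trigger_id: Option<String>,
    pub script_id: Option<String>,
}

/// Every field is optional; an unset field matches anything.
#[derive(Debug, Clone, Default)]
pub struct DeadLetterFilter {
    pub source: Option<String>,
    pub trigger_id: Option<String>,
    pub script_id: Option<String>,
}

impl DeadLetterFilter {
    pub fn matches(&self, dl: &DeadLetter) -> bool {
        // An unset filter field passes; a set one must equal the actual value.
        fn field_ok(wanted: &Option<String>, actual: Option<&str>) -> bool {
            match wanted {
                None => true,
                Some(w) => actual == Some(w.as_str()),
            }
        }
        field_ok(&self.source, Some(dl.source.as_str()))
            && field_ok(&self.trigger_id, dl.trigger_id.as_deref())
            && field_ok(&self.script_id, dl.script_id.as_deref())
    }
}
```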

ctx.event for a dead-letter handler:

ctx.event.source              // "dead_letter"
ctx.event.dead_letter = #{
    original: #{
        source:     "kv",
        op:         "insert",
        collection: "widgets",
        key:        "k1",
        payload:    #{ ... }
    },
    attempts:        3,
    last_error:      "script timeout after 30s",
    trigger_id:      "...",
    script_id:       "...",
    first_attempt_at: "2026-05-30T12:00:00.000Z",
    last_attempt_at:  "2026-05-30T12:00:14.000Z"
}

The handler can call log::error, email admins via email::send, write to docs::collection("incidents").create(...), post to external alerting via http::post, or call dead_letters::replay(id) if it decides a retry is worthwhile.

Recursion stop rule

Dead-letter handlers execute once, no retry, and CANNOT themselves be dead-lettered.

When the dispatcher invokes a dead-letter trigger, the resulting execution is marked is_dead_letter_handler = true. If it fails:

Failure is logged to the structured log (full payload + error)
A metric is bumped (picloud_dead_letter_handler_failures)
Original dead-letter row annotated with resolution = "handler_failed"
Nothing else is fired. Chain stops definitively.

This is the only safe stop rule. If your alerting script is broken, the platform shouldn't try to alert about that with the same broken script.
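The stop rule amounts to a branch on the `is_dead_letter_handler` flag at the point where an execution has finally failed. A minimal sketch, with a hypothetical `Action` enum and function name; the metric name and resolution string are the ones listed above.

```rust
#[derive(Debug, PartialEq)]
pub enum Action {
    LogFailure,
    BumpMetric(&'static str),
    AnnotateOriginal(&'static str),
    InsertDeadLetter,
}

/// Decide what happens once an execution has exhausted its attempts.
pub fn on_final_failure(is_dead_letter_handler: bool) -> Vec<Action> {
    if is_dead_letter_handler {
        // The chain stops here: no new dead letter, no further trigger.
        vec![
            Action::LogFailure,
            Action::BumpMetric("picloud_dead_letter_handler_failures"),
            Action::AnnotateOriginal("handler_failed"),
        ]
    } else {
        // Ordinary async execution: record it in dead_letters.
        vec![Action::InsertDeadLetter]
    }
}
```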

Defaults

No automatic handler. Dead letters silently land in the table. Users opt into handling by registering a trigger. The dashboard surfaces an unresolved-count badge so users notice.

This avoids over-engineering — most apps will run for months without a dead-letter trigger; the table is the durable record either way.

Sync HTTP failures don't dead-letter

Failures of sync HTTP requests (reply_to.is_some()) don't land in dead_letters, for three reasons: the caller already got an error response; routing every failed HTTP request into dead_letters would flood the table; and execution_logs already captures sync request failures. If a user wants alerts on HTTP endpoint failures, that's monitoring (v1.3+ territory), not dead-lettering.

Pub/sub fan-out dead-letters independently

One pubsub::publish → N subscribers → each retries independently → each can independently dead-letter. So one publish can produce N dead-letter rows (one per subscriber that exhausted retries). Subscribers are independent failure domains.

Manual replay

Surface	Use case
`POST /api/v1/admin/apps/{id}/dead_letters/{dl_id}/replay`	Admin clicks "replay" in dashboard
`dead_letters::replay(id)` Rhai SDK	A handler script decides to retry programmatically
`dead_letters::resolve(id, reason)` Rhai SDK	A handler decides "this is fine, don't bother me"

Replay re-inserts the original event into the outbox; dispatcher tries again from scratch.

Retention

Time-based: delete dead letters older than 30 days by default. Configurable per-app via app settings, or globally via env var (PICLOUD_DEAD_LETTER_RETENTION_DAYS). A weekly GC job in the manager handles the deletion using FOR UPDATE SKIP LOCKED.
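The retention lookup and the expiry check could look roughly like this. The precedence (per-app setting, then the env var value, then the 30-day default) is an assumption, since the notes only say both knobs exist; `retention_days` and `is_expired` are hypothetical names, and the env var is passed in as a string so the function stays pure.

```rust
use std::time::{Duration, SystemTime};

pub const DEFAULT_RETENTION_DAYS: u64 = 30;

/// Assumed precedence: per-app setting, then the value of
/// PICLOUD_DEAD_LETTER_RETENTION_DAYS, then the default.
/// An unparsable env value is ignored.
pub fn retention_days(app_setting: Option<u64>, env_value: Option<&str>) -> u64 {
    app_setting
        .or_else(|| env_value.and_then(|v| v.trim().parse::<u64>().ok()))
        .unwrap_or(DEFAULT_RETENTION_DAYS)
}

/// True when the row is older than the retention window.
pub fn is_expired(created_at: SystemTime, now: SystemTime, days: u64) -> bool {
    let window = Duration::from_secs(days.saturating_mul(24 * 60 * 60));
    match now.duration_since(created_at) {
        Ok(age) => age > window,
        // created_at lies in the future (clock skew): keep the row.
        Err(_) => false,
    }
}
```

In the GC job, the caller would read the env var once with `std::env::var` and pass `Some(value.as_str())` when it is set.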

Status

Separate dead_letters table: leaning yes.
dead_letter as trigger kind: leaning yes.
Recursion stop rule (handlers can't be dead-lettered): leaning yes.
No default handler (rows just sit in table): leaning yes.
Sync HTTP failures don't dead-letter: leaning yes.

Open calls

Dead-letter handlers unretryable + can't be dead-lettered themselves — confirmed?
No default dead-letter handler (rows just sit in the table); user opts in — confirmed, or do you want a built-in "log to admin notifications channel" default?
30-day default retention sensible, or longer/shorter?
Include Rhai SDK (dead_letters::list/replay/resolve) in v1.1.1 alongside admin endpoints, or defer the script-side surface to a later release?

5. Realtime updates for external clients

Apps built on PiCloud need a way for browser/mobile clients to receive live updates (chat messages, dashboard data, multiplayer state, notifications). Today's pub/sub is internal-only (script ↔ script via triggers).

The chosen approach

Option C (from prior debate): topics with opt-in external subscription.

One pubsub::publish(topic, msg) API for scripts — produces a single event
Topics are internal-only by default — script triggers can subscribe; external clients cannot
Apps explicitly mark topics as externally-subscribable (per-topic config in dashboard / API)
External clients connect to GET /realtime/topics/{topic} via SSE and receive only messages published to topics they're permitted to subscribe to

Wins: one publish API for scripts (DRY), topics don't leak by default (security), external visibility is an explicit opt-in per topic.

Transport: SSE first

SSE (Server-Sent Events) for v1.x:

Simpler than WebSocket; works through any HTTP proxy without protocol upgrade
Browsers auto-reconnect on disconnect
Sufficient for "server-pushed events to the browser" (the dominant use case)

WebSocket is added later if bidirectional comms (chat-style) warrant it.

Auth model for external subscribers

Three flavors, ordered by complexity:

Public topics — anyone with the URL connects. For marketing-style broadcasts, public stat boards.
Token-gated topics — client presents a token issued by a script. Pusher / Ably-style. Token can be a PiCloud API key (v1.1.6) or a users-SDK session token (v1.1.8+).
Script-mediated — a script handles each subscribe request and decides yes/no. Most flexible, defer to v1.2.

Ship public + token-gated in v1.1.6; defer script-mediated.
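For the two auth flavors planned for v1.1.6, the subscribe check reduces to a small match on the topic's exposure setting. This is a sketch under assumptions: `Exposure` and `may_subscribe` are hypothetical names, and token validation is reduced to set membership, whereas a real check would verify an API key or session token.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Exposure {
    /// Default: only script triggers can subscribe.
    InternalOnly,
    /// Anyone with the URL can connect.
    Public,
    /// Client must present a valid token.
    TokenGated,
}

/// Decide whether an external SSE client may subscribe to a topic.
pub fn may_subscribe(
    exposure: Exposure,
    presented_token: Option<&str>,
    valid_tokens: &HashSet<String>,
) -> bool {
    match exposure {
        Exposure::InternalOnly => false,
        Exposure::Public => true,
        Exposure::TokenGated => presented_token.map_or(false, |t| valid_tokens.contains(t)),
    }
}
```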

Status

Approach C (opt-in external subscription): leaning yes.
SSE first, WebSocket later: leaning yes.
Public + token-gated auth in v1.1.6: leaning yes.

Open calls

Approach C confirmed (vs A: pubsub IS realtime, B: separate channels:: service)?
SSE first, WebSocket deferred — confirmed, or ship both in v1.1.6?
Auth: public + API-key gating in v1.1.6, defer users-SDK-based tokens to v1.1.8 follow-up — confirmed?

6. Frontend client library

Strategic positioning question: how much should PiCloud expose to frontend developers building apps on top of it?

The two ends of the spectrum

End	Frontend gets	Examples
Minimalist	HTTP to dev-defined script endpoints + SSE on dev-marked-public topics. Nothing else.	AWS Lambda + API Gateway, Cloudflare Workers, Deno Deploy
Maximalist	Direct client-side access to KV/docs/users/files. Frontend writes `kv.get()`, `docs.find()`, no Rhai script for trivial reads.	Firebase, Supabase, AWS Amplify

PiCloud today sits at the minimalist end (services exist for scripts to use, not for frontends). Crossing to maximalist would be a real product pivot, not a feature add.

The chosen approach: hybrid

Ship a client library that talks to scripts, not to services. Specifically, three things:

Typed HTTP client to dev-defined endpoints — picloud.endpoint('/api/users').post({ name: 'alice' }). Fetch wrapper with auth header injection, retry logic, structured error handling.
SSE subscription — picloud.subscribe('chat-room-123', msg => …). Auto-reconnect, token refresh, backpressure.
Auth flow helpers — picloud.auth.login(email, password), picloud.auth.logout(), picloud.auth.token. These call dev-defined endpoints under the hood (/api/auth/login etc.); the lib just standardizes the dance + token storage.

Crucially: no picloud.kv.get() or picloud.docs.find() from the frontend. Those stay server-side, behind dev-written Rhai scripts.

Why hybrid, not maximalist

Firebase trades security for DX; security-rule misconfiguration is a well-known footgun and a common cause of accidental data exposure in apps built that way. PiCloud's "solo dev / consumer hardware" audience does not have the operational capacity to defend a Firebase-style attack surface against misconfiguration. The script layer is also where PiCloud differentiates: if frontends bypass scripts to talk directly to services, we're competing with Supabase head-to-head, which is unwinnable (they're better-resourced and have a 5-year head start).

Why hybrid, not pure minimalist

A frontend dev shouldn't have to hand-roll fetch wrappers, SSE reconnect logic, and token-refresh dances. That stuff is identical across every app. Shipping it as @picloud/client is genuinely valuable — it doesn't expand the security surface (scripts still gate everything), it just removes boilerplate.

TypeScript first

Ship TypeScript first. Cross-language story (Python, Swift, Kotlin, Rust, …) deferred until demand emerges. TS covers the dominant "web app + mobile via React Native" segment.

Status

Hybrid model (frontend through scripts only): leaning yes.
TypeScript first, other languages deferred: leaning yes.
Co-ship with realtime as v1.1.6: leaning yes (SSE wrapper is the killer feature of the lib).

Open calls

Hybrid model — confirmed, or do you want to seriously evaluate Firebase-mode?
TypeScript first, multi-language deferred — confirmed?
Co-ship realtime + client lib as v1.1.6, or split (server in v1.1.6, client lib later)?
Type safety: hand-written types only, or aim for codegen from script-declared schemas? Codegen is big — defer to v1.2+ if at all?

7. Revised v1.1.x roadmap

Net changes vs the blueprint §12 roadmap:

v1.1.5 pub/sub: now via trigger outbox (drops LISTEN/NOTIFY plan), tightening implementation scope
NEW v1.1.6 Realtime Channels & Client Library: realtime SSE + @picloud/client TS package; co-shipped
v1.1.7+ items shifted by one (was v1.1.6/7/8 → now v1.1.7/8/9)
Dead letters and the unified outbox/dispatcher are absorbed into v1.1.1's existing scope (triggers framework)

Version	Capability
v1.1.0	Foundation & Standard Library — SDK shape, `Services` bundle, `SdkCallCx`, `ExecutionGate`, `ServiceEventEmitter` trait shape; stdlib utilities (regex, random, time, json, base64, hex, url). ✓ Shipped.
v1.1.1	Storage & Events — KV store keyed `(app_id, collection, key)`; triggers framework (universal outbox + dispatcher + NATS-style sync HTTP via inbox + per-trigger retry config + dead-letter table & `dead_letter` trigger source + trigger CRUD + `ctx.event` + depth limit); KV trigger kinds.
v1.1.2	Documents — `docs::collection(name).create/find/update/delete/list` with `docs:*` triggers.
v1.1.3	Modules — `scripts.kind`, per-app resolver replaces `DummyModuleResolver`, AST cache + dep-graph invalidation.
v1.1.4	Outbound HTTP & Scheduled Tasks — `http::*` with SSRF deny-list; cron triggers (small now that the framework exists).
v1.1.5	Files & Pub/Sub — filesystem-backed blobs (`files/<app_id>/<id[0:2]>/<id>`) with `files:` triggers; pub/sub via the universal outbox with `pubsub:` triggers.
v1.1.6	Realtime Channels & Client Library (new) — SSE-based external subscription to per-app pub/sub topics (public + API-key auth modes); `@picloud/client` TypeScript package (typed HTTP, SSE subscription, auth helpers).
v1.1.7	Configuration & Email (was v1.1.6) — encrypted per-app secrets; outbound `email::send/send_html` + inbound `email:receive` trigger.
v1.1.8	User Management (was v1.1.7) — `users::*` for in-script CRUD, auth, roles, invites, password reset.
v1.1.9	Durable Queues & Function Composition (was v1.1.8) — `queue::` with `queue:receive` trigger; `invoke()` + `retry::` (closures-as-args, re-entrant Rhai).
v1.2	Workflows & Hierarchies (per blueprint §Phase 5) — DAG execution, advanced docs query, interceptors, read triggers, audit log, script-mediated realtime auth.
v1.3+	Scale & Ops (per blueprint §Phase 6) — cluster mode (NATS-style request/reply swaps to `LISTEN/NOTIFY`), cross-app data sharing, script versioning + rollback, rate limiting, richer auth, metrics, distributed tracing, webhooks, S3, monitoring/alerting on HTTP endpoint failures.

The v1.1.9 release marks the end of the v1.1.x expansion cadence. v1.2 is the next minor product bump (phase milestone per versioning policy).

Consolidated open calls

Numbered for easy reference in conversation. All currently unanswered.

§1 — Messaging primitives

1. Pub/sub durability via trigger outbox (durable, at-least-once): confirmed?
2. Queue and pub/sub stay separate concepts (rather than unifying under a "messaging" abstraction with a subscription-mode flag): confirmed?

§2 — Universal trigger outbox

3. Sync HTTP via outbox + per-request inbox pattern (NATS-style; see §3): confirmed, or keep direct dispatch for sync?
4. Ship dispatch_mode: async for HTTP routes in v1.1.1, or defer to a later release?
5. Keep routes and triggers as separate tables (unified at the dispatcher only), or merge schemas?

§3 — NATS-style sync HTTP

6. NATS-style request/reply for sync HTTP: confirmed?
7. Status code strategy: keep existing distinctions (recommended) or flatten to 500?
8. Default retry policy on triggers: 3 attempts with exp backoff (1s/2s/4s), or different defaults?
9. Cancel-on-timeout semantics: discard late results (a), write to "abandoned executions" table for debugging (b, recommended), or attempt to ack the caller late (c)?

§4 — Dead letters

10. Dead-letter handlers unretryable + can't be dead-lettered themselves: confirmed?
11. No default dead-letter handler (rows just sit in the table); user opts in: confirmed, or built-in "log to admin notifications channel" default?
12. 30-day default retention sensible, or longer/shorter?
13. Include Rhai SDK (dead_letters::list/replay/resolve) in v1.1.1 alongside admin endpoints, or defer the script-side surface to a later release?

§5 — Realtime

14. Approach C confirmed (opt-in external subscription on pub/sub topics) vs A: pubsub IS realtime, B: separate channels:: service?
15. SSE first, WebSocket deferred: confirmed, or ship both in v1.1.6?
16. Auth: public + API-key gating in v1.1.6, defer users-SDK-based tokens to v1.1.8 follow-up: confirmed?

§6 — Frontend client library

17. Hybrid model (frontend through scripts only): confirmed, or seriously evaluate Firebase-mode?
18. TypeScript first, multi-language deferred: confirmed?
19. Co-ship realtime + client lib as v1.1.6, or split (server in v1.1.6, client lib later)?
20. Type safety: hand-written types only, or aim for codegen from script-declared schemas? Defer codegen to v1.2+ if at all?

Lifecycle of this document

Created at the v1.1.0 → v1.1.1 boundary (after the foundation PR series shipped).
Each section gets pruned once its decisions ship and land in the blueprint.
Open calls are answered in conversation, then folded into the corresponding section as "Decided: X" with the date.
Document deleted when v1.1.9 ships — everything by then is either in the blueprint, in code, or explicitly deferred to v1.2+.
