feat(v1.1.1-triggers): triggers + outbox schema + repos

Migrations 0008-0011 lay down the triggers framework's storage:

- `triggers` + `kv_trigger_details` + `dead_letter_trigger_details`
  (Layout E, design notes §2). Parent table carries common columns
  including `registered_by_principal` — the dispatcher uses this to
  run the trigger as the user that registered it (design notes §4).
- `outbox`: universal async dispatch substrate. KV/cron/pubsub/queue/
  email/dead-letter all write rows in the same shape; the dispatcher
  claims due rows via FOR UPDATE SKIP LOCKED. `reply_to` is the
  NATS-style inbox id for sync HTTP (commit 6) — its presence flags
  "don't retry" per the design.
- `dead_letters`: exact schema from design notes §4 with the four-
  value `resolution` CHECK constraint (`replayed | ignored |
  handled_by_script | handler_failed`) and partial index on
  unresolved rows for the dashboard badge.
- `abandoned_executions`: forensic table for the dispatcher's
  "tried to resolve a dropped inbox" edge case (design notes §3 #9).

Repo surfaces with Postgres impls behind traits so unit tests can
swap in-memory backings:
- `TriggerRepo` — CRUD + the `list_matching_kv` /
  `list_matching_dead_letter` hot paths the dispatcher uses.
  Includes a `collection_matches` helper that handles `*`, `prefix:*`,
  and exact-name globs.
- `OutboxRepo` — insert + claim-due + delete + reschedule.
- `DeadLetterRepo` — insert + get + list + unresolved-count +
  resolve + GC.
- `AbandonedRepo` — insert + GC.

`TriggerConfig::from_env` (new module) follows the existing
`SandboxCeiling` env-loading pattern for `PICLOUD_MAX_TRIGGER_DEPTH`,
`PICLOUD_TRIGGER_RETRY_*`, `PICLOUD_DEAD_LETTER_RETENTION_DAYS`, and
`PICLOUD_ABANDONED_EXECUTIONS_RETENTION_DAYS`.

`Capability::AppManageTriggers(AppId)` and `AppDeadLetterManage(AppId)`
join the enum. Both map onto the existing `Scope::AppAdmin` per the
seven-scope commitment; `role_satisfies` grants them at the
`AppAdmin` per-app role.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
MechaCat02
2026-06-01 21:46:45 +02:00
parent 6b99f74c48
commit 545d863199
11 changed files with 1651 additions and 3 deletions

View File

@@ -0,0 +1,72 @@
-- v1.1.1: Trigger framework — Layout E (design notes §2 + §7).
--
-- A parent `triggers` table holds the common columns (script_id, retry
-- config, dispatch_mode, registered-by principal); per-kind detail
-- tables hold the kind-specific filter columns. v1.1.1 ships two
-- kinds: KV (collection_glob + ops) and dead_letter (source / trigger
-- / script filters). Future kinds (cron, pubsub, queue, email) extend
-- the parent and add their own detail table.
--
-- `registered_by_principal` captures the admin user that registered
-- the trigger. The dispatcher resolves this back to a `Principal` at
-- execution time so the trigger runs as the user that set it up
-- (design notes §4: "a trigger execution runs as the principal that
-- registered the trigger").
--
-- HTTP routes stay in their own `routes` table for now (Phase 3
-- production schema with its own trie-index columns); the dispatcher
-- discriminates HTTP outbox rows by `source_kind = 'http'` and
-- `trigger_id` referencing `routes.id`. Folding routes into triggers
-- is a v1.2 cleanup, not a v1.1.1 requirement.
CREATE TABLE triggers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
script_id UUID NOT NULL REFERENCES scripts(id) ON DELETE CASCADE,
kind TEXT NOT NULL CHECK (kind IN ('kv', 'dead_letter')),
enabled BOOLEAN NOT NULL DEFAULT TRUE,
-- Async by default — sync would mean the trigger fires inline with
-- the originating mutation, which v1.1.1 doesn't support.
dispatch_mode TEXT NOT NULL DEFAULT 'async'
CHECK (dispatch_mode IN ('sync', 'async')),
-- Defaults applied at write time so the row is auditable on its
-- own. Per-trigger overrides set on create; the env-defined
-- defaults provide the fallback values.
retry_max_attempts INT NOT NULL,
retry_backoff TEXT NOT NULL
CHECK (retry_backoff IN ('exponential', 'linear', 'constant')),
retry_base_ms INT NOT NULL,
registered_by_principal UUID NOT NULL REFERENCES admin_users(id) ON DELETE CASCADE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- The dispatcher's hot lookup: "all enabled triggers for app X of
-- kind Y". Indexed only when enabled = TRUE so disabled rows don't
-- pollute the index.
CREATE INDEX idx_triggers_app_kind_enabled
ON triggers (app_id, kind)
WHERE enabled = TRUE;
-- One row per KV trigger. `collection_glob` accepts:
-- "*" — any collection in the app
-- "widgets" — exact match
-- "users:*" — prefix wildcard (matched in Rust, not SQL)
-- `ops` is the subset of {insert, update, delete} this trigger
-- subscribes to. Empty array means "any op" (the trigger fires on
-- every mutation; admin endpoint validates this).
CREATE TABLE kv_trigger_details (
trigger_id UUID PRIMARY KEY REFERENCES triggers(id) ON DELETE CASCADE,
collection_glob TEXT NOT NULL,
ops TEXT[] NOT NULL
);
-- One row per dead-letter trigger. All three filter columns are
-- nullable — NULL means "no filter on this dimension". A trigger
-- with all three nullable filters fires on every dead-letter row.
CREATE TABLE dead_letter_trigger_details (
trigger_id UUID PRIMARY KEY REFERENCES triggers(id) ON DELETE CASCADE,
source_filter TEXT,
trigger_id_filter UUID,
script_id_filter UUID
);

View File

@@ -0,0 +1,64 @@
-- v1.1.1: Universal trigger outbox — design notes §2.
--
-- One table for every async dispatch in the system. KV/cron/pubsub/
-- queue/email/dead-letter all write rows in this shape; the dispatcher
-- claims due rows with `FOR UPDATE SKIP LOCKED` and routes them to
-- the executor.
--
-- Sync HTTP also writes here (NATS-style inbox, design notes §3) —
-- `reply_to` carries an `inbox_id` that the orchestrator awaits on a
-- oneshot channel. `reply_to.is_some()` is the "don't retry" signal:
-- one attempt, surface the result via the inbox.
--
-- `trigger_id` is a polymorphic reference discriminated by
-- `source_kind`: for `source_kind='http'` it references `routes.id`;
-- otherwise it references `triggers.id`. Polymorphism handled in
-- Rust (the dispatcher); no DB-level FK because Postgres doesn't
-- support polymorphic FKs cleanly. NULL is allowed because direct
-- admin-replay paths may not have a triggering row at all.
--
-- `script_id` denormalized so the dispatcher resolves the target
-- script without an extra round-trip per row.
CREATE TABLE outbox (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
source_kind TEXT NOT NULL
CHECK (source_kind IN ('http', 'kv', 'dead_letter')),
-- Polymorphic — see comment above. No FK constraint.
trigger_id UUID,
-- Pre-resolved at write time so the dispatcher doesn't re-look it up.
script_id UUID,
-- NULL = async (retry per policy). Some(inbox_id) = sync HTTP
-- (never retry; resolve the inbox with the result).
reply_to UUID,
-- ServiceEvent + ExecRequest scaffold serialized as JSONB.
payload JSONB NOT NULL,
-- Forensic field — the principal that triggered the originating
-- event. NOT the execution principal for trigger fan-out (that
-- comes from `triggers.registered_by_principal`).
origin_principal UUID,
-- Trigger-depth as the dispatcher will hand it to the executor.
-- Read out into ExecRequest.trigger_depth at dispatch time.
trigger_depth INT NOT NULL DEFAULT 0,
-- Originating execution id (for audit log grouping). Equals the
-- root for direct invocations; preserved across fan-out chains.
root_execution_id UUID,
attempt_count INT NOT NULL DEFAULT 0,
next_attempt_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- Set inside the SELECT FOR UPDATE SKIP LOCKED transaction so
-- the dispatcher can't double-pick a row across concurrent loop
-- iterations.
claimed_at TIMESTAMPTZ,
claimed_by TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Hot index: the dispatcher's `WHERE next_attempt_at <= NOW() AND
-- claimed_at IS NULL` claim query. Partial index keeps the hot set
-- small even if the table grows large.
CREATE INDEX idx_outbox_due
ON outbox (next_attempt_at)
WHERE claimed_at IS NULL;
CREATE INDEX idx_outbox_app ON outbox (app_id);

View File

@@ -0,0 +1,50 @@
-- v1.1.1: dead_letters — design notes §4.
--
-- Async invocations that exhaust their retry policy land here. Each
-- row carries the original event payload verbatim plus the attempt
-- history so handlers (registered via `dead_letter` triggers) and the
-- dashboard can decide what to do.
--
-- Schema mirrors design notes §4. The CHECK constraint on
-- `resolution` enforces the closed vocabulary used by both the SDK
-- (`dead_letters::resolve(id, reason)`) and the recursion-stop rule
-- (`handler_failed`). Sync HTTP failures (`reply_to.is_some()`) never
-- land here — they're served via the inbox channel.
--
-- Indexes:
-- - partial index on unresolved rows: the dashboard's
-- unresolved-count badge query (`COUNT(*) WHERE app_id = $1 AND
-- resolved_at IS NULL`).
-- - GC index on `created_at`: the weekly retention sweep.
CREATE TABLE dead_letters (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
-- The outbox.id row that exhausted retries. The outbox row itself
-- has been deleted at this point.
original_event_id UUID NOT NULL,
source TEXT NOT NULL,
op TEXT NOT NULL,
-- Nullable because direct admin replays may have no trigger row.
trigger_id UUID,
script_id UUID,
payload JSONB NOT NULL,
attempt_count INT NOT NULL,
first_attempt_at TIMESTAMPTZ NOT NULL,
last_attempt_at TIMESTAMPTZ NOT NULL,
last_error TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
resolved_at TIMESTAMPTZ,
resolution TEXT
CHECK (resolution IN
('replayed', 'ignored', 'handled_by_script', 'handler_failed'))
);
-- Dashboard unresolved-count badge — partial index on the predicate
-- the query uses.
CREATE INDEX idx_dead_letters_app_unresolved
ON dead_letters (app_id)
WHERE resolved_at IS NULL;
-- GC sweep scans by creation time.
CREATE INDEX idx_dead_letters_gc ON dead_letters (created_at);

View File

@@ -0,0 +1,31 @@
-- v1.1.1: abandoned_executions — design notes §3 #9.
--
-- Forensic table for the "dispatcher tried to resolve a oneshot inbox
-- but the receiver was already dropped" edge case. The orchestrator
-- timed out (returned 504 to the caller) and gave up on the channel,
-- but then the dispatcher's execution succeeded later. The caller
-- never sees the result; the row exists so the operator can
-- correlate when the abandoned-counter metric spikes.
--
-- Only the dispatcher-after-orchestrator-timeout edge case writes
-- here; ordinary "script timed out, caller got 504" stays uneventful.
--
-- 7-day retention, GC by `created_at`, sweep alongside dead_letters.
CREATE TABLE abandoned_executions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
app_id UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
-- Original outbox row id (the row itself has been deleted).
outbox_id UUID NOT NULL,
script_id UUID,
-- The inbox channel id the dispatcher tried to resolve.
inbox_id UUID NOT NULL,
-- The HTTP status code the dispatcher attempted to send back.
status_code INT NOT NULL,
-- Truncated body / error description (capped at write time —
-- the dispatcher doesn't need to ship megabytes here).
result_summary TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_abandoned_executions_gc ON abandoned_executions (created_at);