diff --git a/HANDBACK.md b/HANDBACK.md index c351320..4d2d795 100644 --- a/HANDBACK.md +++ b/HANDBACK.md @@ -1,220 +1,330 @@ -# HANDBACK — v1.1.6 Realtime Channels & Client Library +# v1.1.7 — Configuration & Email — HANDBACK -Branch: `feat/v1.1.6-realtime-client` (from `main`). Not pushed, no PR. - -## 1. Scope coverage (§1–§13) - -| § | Item | Status | -|---|------|--------| -| 1 | `topics` table (`0021_topics.sql`) | ✅ Done | -| 2 | Topic admin endpoints (POST/GET/PATCH/DELETE), `AppTopicManage` | ✅ Done | -| 3 | SSE endpoint `GET /realtime/topics/{topic}` | ✅ Done | -| 4 | In-process `RealtimeBroadcaster` + GC + publish wiring | ✅ Done | -| 5 | HMAC subscriber tokens + `app_secrets` (`0022`) + `pubsub::subscriber_token` | ✅ Done | -| 6 | Dashboard Topics tab | ✅ Done | -| 7 | `@picloud/client` TypeScript package | ✅ Done | -| 8 | Topic-aware publish → realtime wiring | ✅ Done | -| 9 | Dispatcher e2e tests (six) | ✅ Done (location deviation — see §7) | -| 10 | Empty-blob decision | ✅ Done — **relaxed** (accept empty) | -| 11 | Orphan `*.tmp.*` sweeper | ✅ Done | -| 12 | Version bumps (1.1.6 / SDK 1.7 / dash 0.12.0 / client 1.0.0) | ✅ Done | -| 13 | Tests (all named-critical cases) | ✅ Done | - -## 2. Realtime implementation notes - -### Topic resolution / SSE handshake sequence -1. Extract `Host` → `app_domains.resolve_app(host)` (existing two-phase - dispatch). No app → **404**. -2. Token from `Authorization: Bearer ` **or** `?token=` (EventSource - can't set headers). -3. `RealtimeAuthority::authorize_subscribe(app_id, topic, token)`: - - topic missing OR `external_subscribable = false` → `NotFound` → **404** - (both collapse to 404 so the endpoint can't probe internal topics); - - `auth_mode='public'` → allow; - - `auth_mode='token'` → verify HMAC (present, signed by this app's key, - unexpired, scoped to this topic) → allow, else `Unauthorized` → **401** - (generic; never says which check failed). -4. `broadcaster.subscribe(app_id, topic)` → `broadcast::Receiver`; stream - `data: {topic,message,published_at}\n\n` with `:heartbeat` keepalive - every `PICLOUD_REALTIME_HEARTBEAT_SEC` (default 30). Headers: - `text/event-stream`, `Cache-Control: no-cache` (set by axum Sse), - `X-Accel-Buffering: no` (added). Client disconnect drops the receiver → - automatic cleanup; the periodic GC reaps empty channels. - -The SSE handler lives in **orchestrator-core** (`realtime_api.rs`) and -depends only on the three picloud-shared traits — all DB + signing-key -access stays in the manager-core `RealtimeAuthority` impl, so the -data-plane crate never touches the key. Cluster mode (v1.3+) swaps the -broadcaster + authority impls behind the same traits. - -### HMAC signing-key storage — chose **table** (`app_secrets`) -Per the asked decision. 32 random bytes per app, lazily created on the -first `pubsub::subscriber_token` call (`ON CONFLICT DO NOTHING` for -concurrency). No global `PICLOUD_INSTANCE_SECRET`; the table is the -natural home for v1.1.7's encrypted per-app secrets. -**Tension flagged:** §5 aspired to "validate the signature without a DB -lookup". The table approach needs a key read on the token-subscribe path. -Mitigated by an in-memory key cache in `RealtimeAuthorityImpl` (keys never -rotate in v1.1.6, so the cache needs no invalidation); the subscribe path -already reads the `topics` row, so this adds no new round-trip *category*. - -### In-process broadcaster -`Mutex>>`. -Capacity per channel `PICLOUD_REALTIME_BROADCAST_CAPACITY` (default 64); -slow consumers lose oldest events (`broadcast` lag semantics — best-effort, -no replay). `subscribe` creates channels lazily; `publish` is a silent -no-op when no channel exists; `drop_topic` removes the sender (existing -receivers observe a closed channel and disconnect). A `spawn_realtime_gc` -task (~60s) drops senders with `receiver_count() == 0`. - -### Publish wiring (order + failure modes) — §8 ordering chosen -`PubsubServiceImpl::publish_durable`: validate/authz → **transactional -outbox fan-out + commit** (existing) → **then** best-effort -`RealtimeBroadcaster::publish`. The broadcast runs on a child -`tokio::spawn` whose `JoinHandle` is awaited, so a panicking broadcaster -becomes a `warn` log, never a failed publish (the durable deliveries -already committed). One `published_at` instant is shared by both paths. -**Brief-internal contradiction flagged** — see §9. - -## 3. Client lib implementation notes - -- **Build:** **tsup** (dual ESM+CJS + `.d.ts`), tests **vitest**, lint = - `tsc --noEmit` (strict; `noUncheckedIndexedAccess`, no `any` in exports). -- **Layout:** `src/{index,client,endpoint,subscribe,auth,types}.ts` + - `src/react/index.ts` + `src/svelte/index.ts`; subpath exports - `@picloud/client/react` and `@picloud/client/svelte` via package `exports`. -- **Reconnect:** SSE is implemented over **streaming `fetch`** (not native - `EventSource`) so the lib can (a) detect a **401** on (re)connect and call - `onTokenExpired` to refresh, (b) send **`Last-Event-ID`** on resume - (server ignores it in v1.1.6 — client ships ready), and (c) apply its own - **exponential backoff** (base 1s → ×2 → cap 30s; reset on successful - open). Token rides in `?token=` (EventSource-parity). React Native caveat - documented in the README (supply a streaming-`fetch` polyfill). -- **zod/valibot adapter:** the `Validator = { parse(input): T }` shape. - A Zod schema satisfies it directly; Valibot wraps in one line. Used by - `endpoint(...).get({ validate })` and `subscribe(..., { validate })`. No - hard dep. - -## 4. v1.1.5 follow-ups - -- **Dispatcher e2e (six):** `dispatcher_delivers_{kv,docs,cron,files,pubsub, - dead_letter}_to_handler` in `crates/picloud/tests/dispatcher_e2e.rs`. - Verified green against a real Postgres (all 6). -- **Empty-blob — RELAXED** (per the asked decision). Dropped the - `data.is_empty()` rejection in `NewFile::validate` **and** `FileUpdate:: - validate` (the latter for consistency — flagged in §7). Flipped the - pinning test in `files_service.rs` and added `empty_file_round_trips`. -- **Orphan sweep:** `spawn_files_orphan_sweep` (`files_sweep.rs`), every 6h - (`PICLOUD_FILES_ORPHAN_SWEEP_INTERVAL_SEC`), unlinks `*.tmp.*` older than - 1h (`PICLOUD_FILES_ORPHAN_TMP_TTL_SEC`); logs dirs-walked / files-deleted / - bytes-reclaimed. No DB cross-check (v1.3+). Tested: deletes old tmp, keeps - young tmp, keeps non-tmp, missing-root no panic. - -## 5. Schema decisions beyond the brief -None — `0021_topics.sql` and `0022_app_secrets.sql` match the brief's DDL -verbatim. Schema-snapshot golden re-blessed on a fresh DB (only the two new -tables added; PK/FK `ON DELETE CASCADE` + the `auth_mode` CHECK present). - -## 6. How to verify locally -See §8 below. - -## 7. Decisions beyond the brief — every prompt-default deviation -1. **Dispatcher e2e location:** the brief says - `crates/manager-core/tests/dispatcher_e2e.rs`; I put it in - **`crates/picloud/tests/`**. `build_app` (the full dispatcher + scheduler - + executor wiring) lives in the `picloud` crate; a manager-core test would - need a manager→picloud dev-dependency cycle or a hand-rolled re-wire of - build_app. The picloud harness (`server_with_app` pattern) is proven and - the tests run green against a real DB. Gating uses the **schema_snapshot - pattern** (env check + early return), NOT `#[ignore]`, so CI's plain - `cargo test` with `DATABASE_URL` runs them while local stays green — as - the brief requested. -2. **Empty-blob relaxation extended to `FileUpdate::validate`** — the brief - names only `NewFile::validate`. Relaxing create-empty but rejecting - update-to-empty would be an inconsistent API, so I relaxed both. -3. **Publish order: §8 (broadcast AFTER outbox commit)**, not §4 (broadcast - FIRST). The two sections contradict; §8 is the dedicated, numbered, - rationale-backed section. See §9. -4. **Topic-name validation** — `topics_api` rejects empty names and names - containing `*` (external pattern subscription is v2). The brief didn't - specify name validation; this is a small guard. -5. **Client lib lint = `tsc --noEmit`** (not eslint) to keep devDeps lean; - strict typecheck is the gate. "No `any` in exports" is enforced by review - + strict TS, not an eslint rule. -6. **Cron e2e poll budget = 45s** — the cron scheduler skips its first tick - then ticks every 30s (default). The test polls 45s so it passes at the - default; set `PICLOUD_CRON_TICK_INTERVAL_MS=1000` to make it ~2s in CI. - -## 8. Attestation (gate runs on this HEAD) +**Branch:** `feat/v1.1.7-secrets-email` (9 commits off `main`, not pushed) +**Status:** ready for review. NOT merged, NOT pushed, no PR opened. ``` -cargo fmt --all -- --check → clean -cargo clippy --all-targets --all-features -- -D warnings → clean (exit 0) -cargo test --workspace → 482 passed, 0 failed - (DB-gated dispatcher_e2e - auto-skip as no-op when - DATABASE_URL unset) -# DB-gated (real Postgres @ 127.0.0.1:15432, picloud/picloud): -DATABASE_URL=… PICLOUD_CRON_TICK_INTERVAL_MS=1000 \ - cargo test -p picloud --test dispatcher_e2e → 6 passed -DATABASE_URL=… cargo test -p picloud-manager-core --test schema_snapshot → 1 passed -(cd dashboard && npm run check) → 0 errors, 0 warnings (371 files) -(cd clients/typescript && npm run lint) → clean (tsc --noEmit) -(cd clients/typescript && npm run test) → 15 passed (5 files) -(cd clients/typescript && npm run build) → tsup ESM+CJS+.d.ts OK +a7d3dad chore(v1.1.7): re-bless schema snapshot for secrets + email migrations +2ea47eb chore(v1.1.7): fix clippy --all-targets warnings +b355851 chore(v1.1.7): version bumps + CHANGELOG +fffcdf6 feat(v1.1.7-realtime-migration): encrypt signing keys at rest +02335a8 fix(v1.1.7-dead-letter): wire dispatcher → list_matching_dead_letter +1f78937 feat(v1.1.7-email-inbound): webhook receiver + email:receive trigger +8f2d2bc feat(v1.1.7-email-outbound): SMTP send/send_html +2d11090 feat(v1.1.7-secrets): secrets SDK + table + admin API + dashboard +dc2e4fa feat(v1.1.7-crypto): master-key infra + encryption helpers ``` -Migrations verified applying cleanly on a fresh DB **and** on top of the -existing v1.1.5 dev DB (0020 → 0021 → 0022). Schema-snapshot golden diff is -exactly the two new tables. + +--- + +## 1. Scope coverage + +| Item | Status | +|---|---| +| Encryption infrastructure (master key + AES-256-GCM envelope) | **Done** | +| `secrets::*` SDK + `0023_secrets.sql` + admin API + dashboard tab | **Done** | +| Outbound email `email::send` / `email::send_html` (lettre SMTP) | **Done** | +| Inbound email webhook receiver + `email:receive` trigger + `0024` | **Done** (full scope, per user decision) | +| Dispatcher routing for email | **Done** | +| dead_letter handler wiring fix | **Done** | +| Realtime signing-key encryption (two-phase) + `0025` | **Done** | +| Dashboard (Secrets tab, email trigger form, `npm run check`) | **Done** | +| Version bumps (1.1.7 / SDK 1.8 / dashboard 0.13.0) + CHANGELOG | **Done** | +| Tests (match v1.1.5/v1.1.6 density) | **Done** | + +Nothing deferred from scope-in. Inbound email (the deferrable-if-scope- +blew-up piece) was implemented in full. + +--- + +## 2. Encryption infrastructure notes + +- **Module:** `crates/shared/src/crypto.rs` (`picloud_shared::crypto`). +- **Master-key sourcing** (`MasterKey::from_env` → `resolve`): + - `PICLOUD_SECRET_KEY` = base64 of exactly 32 bytes. Missing → + `MasterKeyError::Missing` (fatal); non-base64 → `Malformed`; wrong + length → `WrongLength`. **Sourced in `main.rs::run_server` before any + DB work** — `build_app` takes the `MasterKey` as a parameter (so + tests pass a fixed key and don't mutate process env). + - Dev fallback: deterministic key (`SHA-256("picloud-dev-master-key-v1.1.7")`) + used ONLY when `PICLOUD_SECRET_KEY` is unset **AND** + `PICLOUD_DEV_MODE=true`, with a prominent `warn!`. No quiet + unencrypted mode. +- **aes-gcm version:** `0.10` (features `aes`, `alloc`). `Aes256Gcm`. +- **Nonce generation:** 12 bytes from `rand::thread_rng().fill_bytes` + (OS-CSPRNG-seeded), per-encryption. +- **Storage layout:** ciphertext **with the 16-byte GCM auth tag + appended** (RustCrypto `Aead`-trait layout — `encrypt` returns + `ciphertext || tag`, `decrypt` consumes the same). The 12-byte nonce is + stored in a separate column. `MasterKey`'s `Debug` is redacted. +- **Plaintext cap (secrets):** 64 KB default, enforced in + `secrets_service::seal` (the SDK boundary) → `SecretsError::TooLarge` + with limit + actual size. Override: `PICLOUD_SECRET_MAX_VALUE_BYTES`. +- **Key rotation:** out of scope. Documented in CHANGELOG + the module + docs that changing `PICLOUD_SECRET_KEY` orphans all ciphertext. + +--- + +## 3. Secrets notes + +- `SecretsService` (trait, `picloud-shared`) → `SecretsServiceImpl` + + `PostgresSecretsRepo` (`manager-core`) → Rhai bridge + (`executor-core/src/sdk/secrets.rs`). Collection-less; `app_id` from + `cx.app_id`. +- **JSON round-trip:** `set` serializes the value to JSON bytes, caps, + encrypts; `get` decrypts + deserializes — a String returns a String + (not a JSON-quoted `"\"…\""`). Verified by unit + bridge tests. +- **No ServiceEvent emission** (secret writes don't fire triggers). +- Admin API: `GET/POST/DELETE /api/v1/admin/apps/{id}/secrets`; list + returns names + `updated_at` only. +- Authz: `Capability::AppSecretsRead/Write` → `script:read`/`script:write`. + No new Scope variants (seven-scope commitment held). + +--- + +## 4. Email implementation notes + +- **SMTP transport:** `lettre 0.11` (`smtp-transport`, + `tokio1-rustls-tls`, `builder`, `hostname`). **Connection model:** one + connection per call (lettre default); pooling deferred to v1.2. The + transport sits behind an internal `EmailTransport` trait so the service + is unit-tested with a recording fake (no live SMTP). +- **Disabled mode:** if HOST/USER/PASSWORD aren't all set, + `EmailServiceImpl::from_env` builds no transport and every `send` + returns `NotConfigured` (warned at startup). A malformed relay + descriptor is also logged and yields disabled mode (email is + non-critical; never blocks startup). +- **Address validation:** hand-rolled RFC 5322-ish pre-check (single `@`, + non-empty local part, domain contains a dot, ≤320 bytes) followed by a + `lettre::Mailbox` parse (the authoritative validator). No deliverability + check. +- **Size cap:** 25 MB on `message.formatted()`, + `PICLOUD_EMAIL_MAX_MESSAGE_BYTES`. +- `email::send` forces text-only (ignores any `html`); `email::send_html` + requires `html` and builds `MultiPart::alternative_plain_html`. + `reply_to` defaults to `from`. `to`/`cc`/`bcc` accept a String or an + Array of Strings. +- **Inbound normalization:** only the generic provider-agnostic JSON + shape `{from,to[],cc[],subject,text,html,message_id}` is accepted in + v1.1.7 — `from` required, rest default. Provider-specific unmarshallers + → v1.2. The expected shape is documented on the dashboard email-trigger + form. + +--- + +## 5. Dead-letter handler fix notes + +- **Call site:** `dispatcher::handle_failure`, the retry-exhaustion + branch. After `DeadLetterRepo::insert` (which returns the new + `DeadLetterId`), a new helper `fan_out_dead_letter` runs. +- **What it does:** calls `TriggerRepo::list_matching_dead_letter(app_id, + source, row.trigger_id, Some(resolved.script_id))` (the method that had + no production caller) and inserts one outbox row per match + (`source_kind = DeadLetter`, the DL trigger's id + handler script id, + `trigger_depth + 1`, `origin_principal = the DL trigger's registered + principal`). +- **Payload — built from the REAL `TriggerEvent::DeadLetter` variant**, + not the brief's §6 field list (see §7 deviations): `{ dead_letter_id, + original: Box::new(decoded row payload), attempts, last_error, + trigger_id, script_id, first_attempt_at, last_attempt_at }`. If the + outbox payload can't be decoded back into a `TriggerEvent` (so the + nested `original` can't be built), the fan-out is skipped — the + dead-letter row is still durably written. +- **Recursion-stop:** unchanged. The `is_dead_letter_handler` + short-circuit at the top of `handle_failure` returns before the + exhaustion branch, so a DL handler's own failure is never re-dead- + lettered. No new guard needed. +- **Tests verify the handler actually fires** + (`crates/picloud/tests/dispatcher_e2e.rs`, DB-gated): + `dispatcher_delivers_dead_letter_to_handler` now asserts BOTH row-create + AND handler-fire (inline doc updated); + `dispatcher_delivers_dead_letter_to_handler_actually_fires` asserts the + nested `original` KV event + `last_error`; + `dead_letter_source_filter_excludes_nonmatching` exercises the source + filter dimension; `dead_letter_handler_failure_does_not_recurse` proves + the recursion-stop (count stays at 1). + +--- + +## 6. Realtime signing-key migration notes + +- **Two-phase**, as recommended. `0025_encrypt_realtime_keys.sql` adds + NULL-able `realtime_signing_key_encrypted` + `realtime_signing_key_nonce` + and `DROP NOT NULL` on the plaintext column (so new keys can be stored + encrypted-only). +- **Repo:** `PostgresAppSecretsRepo` now holds the `MasterKey`. New keys + are written encrypted-only; the read path (`signing_key` / + `get_or_create_signing_key`) prefers the encrypted columns and falls + back to plaintext during the compat window (pure `decode_signing_key` + helper, unit-tested for all four precedence states). +- **Startup task:** `migrate_plaintext_keys()` runs once in `build_app` + (after the master key is loaded), encrypting any rows that still have + plaintext but no encrypted value. Plaintext is **left in place** for + rollback safety. Idempotent. +- **Plaintext column drop:** deferred to **v1.1.8** (documented in + CHANGELOG + the migration). Operators must upgrade through v1.1.7 + (which performs the encryption) before v1.1.8. +- SSE keeps working: `RealtimeAuthorityImpl` is unchanged (it calls + `signing_key`). Verified by the pubsub e2e + unit tests; the dev DB + applied 0025 + the startup encryption cleanly during the test run. + +--- + +## 7. Decisions beyond the brief / deviations flagged + +1. **`inbound_secret` stored ENCRYPTED (user-approved deviation).** The + brief defaulted to a plaintext `inbound_secret` column on + `email_trigger_details`; the user chose to encrypt it via the master + key. Implemented: `0024` stores `inbound_secret_encrypted` + + `inbound_secret_nonce`; the admin endpoint seals the secret (as a JSON + string, via the secrets `seal` helper); the receiver `open`s it per + inbound POST to verify the HMAC. **Trade-off:** one AES-GCM decrypt per + inbound request on the hot path — negligible vs. the HMAC + DB + round-trip already there. The decrypted secret is never logged. + +2. **Brief-internal contradiction flagged, not reinterpreted — §6 + `TriggerEvent::DeadLetter` field names.** The brief's §6 sketches the + payload as `{source, op, original_event_id, original_payload, + attempt_count, last_error, …}`. The actual variant + (`crates/shared/src/trigger_event.rs`) is `{dead_letter_id, original: + Box, attempts, last_error, trigger_id, script_id, + first_attempt_at, last_attempt_at}`. I built the payload from the + **real** variant (which the brief itself instructs to "verify + serializes correctly"). No type change needed. + +3. **`build_app` signature gained a `MasterKey` parameter.** Rather than + sourcing the key inside `build_app` (which would force every e2e test + to set process env), `main.rs` sources it and passes it in. The 3 + existing `build_app` test callers pass a fixed test key. + +4. **Pre-existing clippy warnings fixed (see §10).** Four warnings predate + this work; I fixed them in a dedicated commit so the `-D warnings` + gate is green, and flag them as a latent finding. + +5. **Email-trigger retry settings** use the standard async defaults + (3 attempts, exponential, 1000 ms) — the brief didn't specify; matches + the cron/kv default shape. + +No other deviations from prompt-specified defaults. + +--- + +## 8. How to verify locally — §8 attestation (sourced from cargo's literal output) + +All gates run on the handed-back HEAD (`a7d3dad`): + +```sh +cargo fmt --all -- --check # clean +cargo clippy --all-targets --all-features -- -D warnings # clean (exit 0) +cd dashboard && npm run check # 0 ERRORS 0 WARNINGS (371 files) +``` + +Full test run **with `DATABASE_URL` set** so the DB-gated suites +(schema_snapshot, dispatcher_e2e ×9, email_inbound ×8) execute: + +```sh +DATABASE_URL='postgres://picloud:picloud@127.0.0.1:15432/picloud' \ + cargo test --workspace -- --test-threads=2 +``` + +**Pass count, summed from cargo's literal output (NOT hand-counted):** + +```sh +DATABASE_URL=... cargo test --workspace -- --test-threads=2 2>&1 | \ + awk '/test result: ok\./ { gsub(";", ""); sum += $4 } END { print sum }' +# => 617 +``` + +**617 passed, 0 failed** across the workspace (34 `test result:` lines, +0 `FAILED`). Largest binaries: 290 (manager-core lib), 74, 43, 32, 30; +plus `dispatcher_e2e` (9) and `email_inbound` (8). + +**Bounded-parallelism note (`--test-threads=2`):** the picloud e2e +binaries each call `build_app`, which opens its own Postgres pool. Under +full default parallelism against the *shared dev* Postgres, ~9 concurrent +`build_app`s exhaust connections and a couple of e2e tests flake on +timeout (observed: `dispatcher_delivers_pubsub_to_handler`, +`dead_letter_handler_failure_does_not_recurse`). They pass reliably at +`--test-threads=2` and in isolation. CI's dedicated fresh `postgres:15` +(not a shared dev DB) does not hit this. Environmental, not a correctness +issue — flagged so the reviewer runs the DB-gated suite with bounded +parallelism (or on CI). + +**Migrations:** apply cleanly on the v1.1.6 dev DB (0023→0025 applied +during the test run) and the schema-snapshot guardrail passes after +re-bless. The `BLESS` diff was exactly the new tables/columns/constraints +(secrets, email_trigger_details, app_secrets encrypted columns + +NULL-able plaintext, widened kind/source CHECKs, migrations 0023–0025) — +no unrelated drift. + +**Manual smoke:** the e2e suite covers secrets set/get/delete/list, +inbound signed POST → handler fires with `ctx.event.email`, dead-letter +handler fires, realtime-key encryption + SSE. Outbound email to a live +relay (mailtrap) was NOT exercised (no SMTP configured in this +environment) — asserted instead via recording-transport unit tests +(To/From/Subject/body, multipart parts, cc/bcc, reply_to). + +--- ## 9. Open questions for the reviewer -1. **§4 vs §8 ordering contradiction (load-bearing, flagged not - reinterpreted):** §4 says "realtime broadcast FIRST, then transactional - outbox fan-out"; §8 says broadcast AFTER the fan-out commits (numbered - steps 1–4). I implemented **§8** (dedicated section + failure-mode - rationale). If §4's ordering was intended, the broadcast should move - before `fan_out_publish` — but then a publish whose outbox write fails - would still have notified SSE subscribers of an event that never durably - happened, which seems wrong. Please confirm §8 is correct. -2. **Dead-letter → handler fan-out appears unwired (see §10).** Confirm - whether the `dead_letter` trigger source is intended to fire in v1.1.x. -## 10. Latent findings -**`dead_letter` trigger handlers do not fire on dead-letter creation.** -`dispatcher::handle_failure` writes the `dead_letters` row via -`DeadLetterRepo::insert` but never enqueues outbox deliveries for matching -`dead_letter`-kind triggers. `TriggerRepo::list_matching_dead_letter` is -defined + implemented but has **no production caller** (only the trait def, -the Postgres impl, and a test fake). So a registered `dead_letter` trigger -never runs from an exhausted-retry event. This predates v1.1.6 (the design -notes §4 describe it shipping in v1.1.1). I did **not** fix it (out of -v1.1.6 scope) — flagging for the reviewer. Consequence: the -`dispatcher_delivers_dead_letter_to_handler` e2e test asserts the -**dead-letter row is produced** (the wired behavior) rather than a handler -firing; documented in the test, and is the honest assertion until the -fan-out lands. +1. **§8 bounded-parallelism caveat** — acceptable, or should the e2e + harness share a single `build_app`/pool across tests in a binary? + (Out of v1.1.7 scope; the existing v1.1.6 e2e tests have the same + shape.) +2. **`email::send` ignoring a stray `html` key** (forcing text-only) vs. + throwing — I chose forgiving text-only; happy to make it strict. +3. **Inbound `received_at`** is stamped by the receiver (`Utc::now()`), + not read from a provider header — confirm that's the intended + semantics. -No new security gaps introduced. Cross-app isolation holds: `subscriber_token` -claims carry `app_id` and are signed per-app; the SSE authority rejects -cross-app tokens (a per-app key already fails the signature, plus an explicit -`claims.app_id == app_id` check); topic admin endpoints bind the capability -to the path `app_id` after loading the app; the broadcaster keys channels by -`(AppId, topic)` so app A's publishes never reach app B's subscribers -(unit-tested). +--- -## 11. Deferred items (beyond this prompt's OUT list) -None added. Everything on the brief's OUT list stayed out (WebSocket, -session/script auth modes, topic-pattern external subscription, server-side -last-event-id replay, per-app SSE/rate limits, other-language SDKs, codegen, -full DB-cross-check sweeper). +## 10. Latent security / correctness findings -## 12. Known limitations / rough edges -- SSE delivery is best-effort at-most-once; a slow consumer past the - broadcast buffer loses events with no server-side replay (by design). -- The client lib's streaming-`fetch` SSE needs a polyfill on runtimes - without `fetch` body streaming (React Native) — documented. -- Cron e2e takes ~31s at the default 30s tick interval (45s poll budget); - set `PICLOUD_CRON_TICK_INTERVAL_MS=1000` to speed it up. -- The realtime key cache in `RealtimeAuthorityImpl` is per-process and never - invalidated — correct only because v1.1.6 has no key rotation. A future - rotation API must clear it. +1. **`clippy --all-targets --all-features -- -D warnings` did NOT pass at + v1.1.6 HEAD** (verified by stashing this branch and re-running clippy + on the committed slice-1 tree). Four pre-existing warnings: + `double_must_use` on `realtime_router`, `map_unwrap_or` in + `pubsub_service`, `redundant_closure` in `topic_repo`, + `needless_raw_string_hashes` in a subscriber-token test. Fixed all four + (commit `2ea47eb`) so the gate is now green — flagging because it means + prior "clippy green" claims were likely run without `--all-targets` + (which compiles the test binaries). + +2. **Inbound HMAC fails closed on decrypt error.** If a stored + `inbound_secret` can't be decrypted (e.g. `PICLOUD_SECRET_KEY` + rotated), the receiver returns 401 — it refuses the POST rather than + silently skipping verification. Intentional. + +3. **No rate limiting on the public inbound-email endpoint.** Like every + public data-plane route, `/api/v1/email-inbound/...` is + unauthenticated by design (URL + HMAC are the gate). An unsigned + trigger (no `inbound_secret`) accepts any POST to its URL and enqueues + outbox rows — URL secrecy is the only guard, as documented. Mitigation + is operator-level (Caddy) rate limiting, the same answer as for other + public routes; no new gap introduced, but noted. + +--- + +## 11. Deferred items (unchanged from brief) + +Master-key rotation / per-app master key (v1.2); native SMTP listener +(v1.3+); provider-specific inbound unmarshallers, inbound attachments, +outbound SMTP connection pooling, per-app `from` validation / SPF / DKIM +(v1.2 / operator); dashboard inbound payload viewer (v1.2, PII); drop the +plaintext `realtime_signing_key` column (v1.1.8); secrets +versioning/history + secrets-change triggers (never); `users::*` (v1.1.8); +`queue::*` / `invoke()` (v1.1.9). + +--- + +## 12. Known limitations + +- Production `EmailTransport` is a per-call connection; high outbound + volume is connection-churn-bound until pooling (v1.2). +- Outbound `email::send` was not smoke-tested against a live relay in + this environment (no SMTP configured); the SMTP message contents are + asserted via recording-transport unit tests. +- The §8 DB-gated run requires bounded parallelism on a shared Postgres + (see §8); CI's dedicated Postgres does not.