# v1.1.7 — Configuration & Email — HANDBACK

**Branch:** `feat/v1.1.7-secrets-email` (9 commits off `main`, not pushed)

**Status:** ready for review. NOT merged, NOT pushed, no PR opened.

```
a7d3dad chore(v1.1.7): re-bless schema snapshot for secrets + email migrations
2ea47eb chore(v1.1.7): fix clippy --all-targets warnings
b355851 chore(v1.1.7): version bumps + CHANGELOG
fffcdf6 feat(v1.1.7-realtime-migration): encrypt signing keys at rest
02335a8 fix(v1.1.7-dead-letter): wire dispatcher → list_matching_dead_letter
1f78937 feat(v1.1.7-email-inbound): webhook receiver + email:receive trigger
8f2d2bc feat(v1.1.7-email-outbound): SMTP send/send_html
2d11090 feat(v1.1.7-secrets): secrets SDK + table + admin API + dashboard
dc2e4fa feat(v1.1.7-crypto): master-key infra + encryption helpers
```

---

## 1. Scope coverage

| Item | Status |
|---|---|
| Encryption infrastructure (master key + AES-256-GCM envelope) | **Done** |
| `secrets::*` SDK + `0023_secrets.sql` + admin API + dashboard tab | **Done** |
| Outbound email `email::send` / `email::send_html` (lettre SMTP) | **Done** |
| Inbound email webhook receiver + `email:receive` trigger + `0024` | **Done** (full scope, per user decision) |
| Dispatcher routing for email | **Done** |
| `dead_letter` handler wiring fix | **Done** |
| Realtime signing-key encryption (two-phase) + `0025` | **Done** |
| Dashboard (Secrets tab, email trigger form, `npm run check`) | **Done** |
| Version bumps (1.1.7 / SDK 1.8 / dashboard 0.13.0) + CHANGELOG | **Done** |
| Tests (match v1.1.5/v1.1.6 density) | **Done** |

Nothing in scope was deferred. Inbound email (the piece that could have been deferred if scope blew up) was implemented in full.

---

## 2. Encryption infrastructure notes

- **Module:** `crates/shared/src/crypto.rs` (`picloud_shared::crypto`).
- **Master-key sourcing** (`MasterKey::from_env` → `resolve`):
  - `PICLOUD_SECRET_KEY` = base64 of exactly 32 bytes. Missing → `MasterKeyError::Missing` (fatal); non-base64 → `Malformed`; wrong length → `WrongLength`. **Sourced in `main.rs::run_server` before any DB work**: `build_app` takes the `MasterKey` as a parameter, so tests pass a fixed key and don't mutate process env.
  - Dev fallback: a deterministic key (`SHA-256("picloud-dev-master-key-v1.1.7")`) is used ONLY when `PICLOUD_SECRET_KEY` is unset **AND** `PICLOUD_DEV_MODE=true`, with a prominent `warn!`. There is no quiet unencrypted mode.
- **aes-gcm version:** `0.10` (features `aes`, `alloc`), using `Aes256Gcm`.
- **Nonce generation:** 12 bytes per encryption from `rand::thread_rng().fill_bytes` (OS-CSPRNG-seeded).
- **Storage layout:** ciphertext **with the 16-byte GCM auth tag appended** (the RustCrypto `Aead`-trait layout: `encrypt` returns `ciphertext || tag`, `decrypt` consumes the same). The 12-byte nonce is stored in a separate column. `MasterKey`'s `Debug` output is redacted.
- **Plaintext cap (secrets):** 64 KB default, enforced in `secrets_service::seal` (the SDK boundary) → `SecretsError::TooLarge` carrying the limit and the actual size. Override: `PICLOUD_SECRET_MAX_VALUE_BYTES`.
- **Key rotation:** out of scope. The CHANGELOG and the module docs state that changing `PICLOUD_SECRET_KEY` orphans all existing ciphertext.

---

## 3. Secrets notes

- `SecretsService` (trait, `picloud-shared`) → `SecretsServiceImpl` + `PostgresSecretsRepo` (`manager-core`) → Rhai bridge (`executor-core/src/sdk/secrets.rs`). Collection-less; `app_id` comes from `cx.app_id`.
- **JSON round-trip:** `set` serializes the value to JSON bytes, applies the cap, and encrypts; `get` decrypts and deserializes, so a String returns a String (not a JSON-quoted `"\"…\""`). Verified by unit + bridge tests.
- **No ServiceEvent emission** (secret writes don't fire triggers).
- Admin API: `GET/POST/DELETE /api/v1/admin/apps/{id}/secrets`; list returns names + `updated_at` only.
- Authz: `Capability::AppSecretsRead/Write` → `script:read`/`script:write`. No new Scope variants (the seven-scope commitment held).

---

## 4. Email implementation notes

- **SMTP transport:** `lettre 0.11` (features `smtp-transport`, `tokio1-rustls-tls`, `builder`, `hostname`). **Connection model:** one connection per call (lettre default); pooling deferred to v1.2. The transport sits behind an internal `EmailTransport` trait, so the service is unit-tested with a recording fake (no live SMTP).
- **Disabled mode:** if HOST/USER/PASSWORD aren't all set, `EmailServiceImpl::from_env` builds no transport and every `send` returns `NotConfigured` (warned at startup). A malformed relay descriptor is also logged and yields disabled mode (email is non-critical and never blocks startup).
- **Address validation:** a hand-rolled RFC 5322-ish pre-check (single `@`, non-empty local part, domain contains a dot, ≤320 bytes) followed by a `lettre::Mailbox` parse (the authoritative validator). No deliverability check.
- **Size cap:** 25 MB on `message.formatted()`; override: `PICLOUD_EMAIL_MAX_MESSAGE_BYTES`.
- `email::send` forces text-only (ignores any `html`); `email::send_html` requires `html` and builds `MultiPart::alternative_plain_html`. `reply_to` defaults to `from`. `to`/`cc`/`bcc` accept a String or an Array of Strings.
- **Inbound normalization:** v1.1.7 accepts only the generic, provider-agnostic JSON shape `{from,to[],cc[],subject,text,html,message_id}`; `from` is required and the rest default. Provider-specific unmarshallers → v1.2. The expected shape is documented on the dashboard email-trigger form.

---

## 5. Dead-letter handler fix notes

- **Call site:** `dispatcher::handle_failure`, the retry-exhaustion branch. After `DeadLetterRepo::insert` (which returns the new `DeadLetterId`), a new helper `fan_out_dead_letter` runs.
- **What it does:** calls `TriggerRepo::list_matching_dead_letter(app_id, source, row.trigger_id, Some(resolved.script_id))` (the method that previously had no production caller) and inserts one outbox row per match (`source_kind = DeadLetter`, the DL trigger's id + handler script id, `trigger_depth + 1`, `origin_principal` = the DL trigger's registered principal).
- **Payload, built from the REAL `TriggerEvent::DeadLetter` variant**, not the brief's §6 field list (see §7 deviations): `{ dead_letter_id, original: Box::new(decoded row payload), attempts, last_error, trigger_id, script_id, first_attempt_at, last_attempt_at }`. If the outbox payload can't be decoded back into a `TriggerEvent` (so the nested `original` can't be built), the fan-out is skipped; the dead-letter row is still durably written.
- **Recursion-stop:** unchanged. The `is_dead_letter_handler` short-circuit at the top of `handle_failure` returns before the exhaustion branch, so a DL handler's own failure is never re-dead-lettered. No new guard needed.
- **Tests verify the handler actually fires** (`crates/picloud/tests/dispatcher_e2e.rs`, DB-gated):
  - `dispatcher_delivers_dead_letter_to_handler` now asserts BOTH row-create AND handler-fire (inline doc updated).
  - `dispatcher_delivers_dead_letter_to_handler_actually_fires` asserts the nested `original` KV event + `last_error`.
  - `dead_letter_source_filter_excludes_nonmatching` exercises the source-filter dimension.
  - `dead_letter_handler_failure_does_not_recurse` proves the recursion-stop (count stays at 1).

---

## 6. Realtime signing-key migration notes

- **Two-phase**, as recommended. `0025_encrypt_realtime_keys.sql` adds NULL-able `realtime_signing_key_encrypted` + `realtime_signing_key_nonce` columns and applies `DROP NOT NULL` to the plaintext column (so new keys can be stored encrypted-only).
- **Repo:** `PostgresAppSecretsRepo` now holds the `MasterKey`.
  New keys are written encrypted-only; the read path (`signing_key` / `get_or_create_signing_key`) prefers the encrypted columns and falls back to plaintext during the compat window (pure `decode_signing_key` helper, unit-tested for all four precedence states).
- **Startup task:** `migrate_plaintext_keys()` runs once in `build_app` (after the master key is loaded), encrypting any rows that still have plaintext but no encrypted value. Plaintext is **left in place** for rollback safety. Idempotent.
- **Plaintext column drop:** deferred to **v1.1.8** (documented in the CHANGELOG and the migration). Operators must upgrade through v1.1.7 (which performs the encryption) before moving to v1.1.8.
- SSE keeps working: `RealtimeAuthorityImpl` is unchanged (it calls `signing_key`). Verified by the pubsub e2e + unit tests; the dev DB applied 0025 and the startup encryption cleanly during the test run.

---

## 7. Decisions beyond the brief / deviations flagged

1. **`inbound_secret` stored ENCRYPTED (user-approved deviation).** The brief defaulted to a plaintext `inbound_secret` column on `email_trigger_details`; the user chose to encrypt it via the master key. Implemented: `0024` stores `inbound_secret_encrypted` + `inbound_secret_nonce`; the admin endpoint seals the secret (as a JSON string, via the secrets `seal` helper); the receiver `open`s it on each inbound POST to verify the HMAC. **Trade-off:** one AES-GCM decrypt per inbound request on the hot path, negligible next to the HMAC + DB round-trip already there. The decrypted secret is never logged.
2. **Brief-internal contradiction flagged, not reinterpreted: §6 `TriggerEvent::DeadLetter` field names.** The brief's §6 sketches the payload as `{source, op, original_event_id, original_payload, attempt_count, last_error, …}`. The actual variant (`crates/shared/src/trigger_event.rs`) is `{dead_letter_id, original: Box, attempts, last_error, trigger_id, script_id, first_attempt_at, last_attempt_at}`.
   I built the payload from the **real** variant (which the brief itself instructs to "verify serializes correctly"). No type change needed.
3. **`build_app` signature gained a `MasterKey` parameter.** Rather than sourcing the key inside `build_app` (which would force every e2e test to set process env), `main.rs` sources it and passes it in. The 3 existing `build_app` test callers pass a fixed test key.
4. **Pre-existing clippy warnings fixed (see §10).** Four warnings predate this work; I fixed them in a dedicated commit (`2ea47eb`) so the `-D warnings` gate is green, and flag them as a latent finding.
5. **Email-trigger retry settings** use the standard async defaults (3 attempts, exponential, 1000 ms). The brief didn't specify them; this matches the cron/kv default shape.

No other deviations from prompt-specified defaults.

---

## 8. How to verify locally — §8 attestation (sourced from cargo's literal output)

All gates run on the handed-back HEAD (`a7d3dad`):

```sh
cargo fmt --all -- --check                                 # clean
cargo clippy --all-targets --all-features -- -D warnings   # clean (exit 0)
cd dashboard && npm run check                              # 0 ERRORS 0 WARNINGS (371 files)
```

Full test run **with `DATABASE_URL` set**, so the DB-gated suites (schema_snapshot, dispatcher_e2e ×9, email_inbound ×8) execute:

```sh
DATABASE_URL='postgres://picloud:picloud@127.0.0.1:15432/picloud' \
  cargo test --workspace -- --test-threads=2
```

**Pass count, summed from cargo's literal output (NOT hand-counted):**

```sh
DATABASE_URL=... cargo test --workspace -- --test-threads=2 2>&1 | \
  awk '/test result: ok\./ { gsub(";", ""); sum += $4 } END { print sum }'
# => 617
```

**617 passed, 0 failed** across the workspace (34 `test result:` lines, 0 `FAILED`). Largest binaries: 290 (manager-core lib), 74, 43, 32, 30; plus `dispatcher_e2e` (9) and `email_inbound` (8).

**Bounded-parallelism note (`--test-threads=2`):** the picloud e2e binaries each call `build_app`, which opens its own Postgres pool.
Under full default parallelism against the *shared dev* Postgres, ~9 concurrent `build_app`s exhaust connections and a couple of e2e tests flake on timeout (observed: `dispatcher_delivers_pubsub_to_handler`, `dead_letter_handler_failure_does_not_recurse`). They pass reliably at `--test-threads=2` and in isolation. CI's dedicated fresh `postgres:15` (not a shared dev DB) does not hit this. This is environmental, not a correctness issue; it is flagged so the reviewer runs the DB-gated suite with bounded parallelism (or on CI).

**Migrations:** apply cleanly on the v1.1.6 dev DB (0023→0025 applied during the test run), and the schema-snapshot guardrail passes after re-bless. The `BLESS` diff was exactly the new tables/columns/constraints (secrets, email_trigger_details, app_secrets encrypted columns + NULL-able plaintext, widened kind/source CHECKs, migrations 0023–0025), with no unrelated drift.

**Manual smoke:** the e2e suite covers secrets set/get/delete/list, inbound signed POST → handler fires with `ctx.event.email`, dead-letter handler fires, and realtime-key encryption + SSE. Outbound email to a live relay (mailtrap) was NOT exercised (no SMTP configured in this environment); it is asserted instead via recording-transport unit tests (To/From/Subject/body, multipart parts, cc/bcc, reply_to).

---

## 9. Open questions for the reviewer

1. **§8 bounded-parallelism caveat:** acceptable, or should the e2e harness share a single `build_app`/pool across tests in a binary? (Out of v1.1.7 scope; the existing v1.1.6 e2e tests have the same shape.)
2. **`email::send` ignoring a stray `html` key** (forcing text-only) vs. throwing: I chose forgiving text-only; happy to make it strict.
3. **Inbound `received_at`** is stamped by the receiver (`Utc::now()`), not read from a provider header. Please confirm that's the intended semantics.

---

## 10. Latent security / correctness findings

1. **`clippy --all-targets --all-features -- -D warnings` did NOT pass at v1.1.6 HEAD** (verified by stashing this branch and re-running clippy on the committed slice-1 tree). Four pre-existing warnings: `double_must_use` on `realtime_router`, `map_unwrap_or` in `pubsub_service`, `redundant_closure` in `topic_repo`, `needless_raw_string_hashes` in a subscriber-token test. All four are fixed (commit `2ea47eb`), so the gate is now green. Flagging because it means prior "clippy green" claims were likely run without `--all-targets` (which compiles the test binaries).
2. **Inbound HMAC fails closed on decrypt error.** If a stored `inbound_secret` can't be decrypted (e.g. `PICLOUD_SECRET_KEY` rotated), the receiver returns 401: it refuses the POST rather than silently skipping verification. Intentional.
3. **No rate limiting on the public inbound-email endpoint.** Like every public data-plane route, `/api/v1/email-inbound/...` is unauthenticated by design (URL + HMAC are the gate). An unsigned trigger (no `inbound_secret`) accepts any POST to its URL and enqueues outbox rows; URL secrecy is the only guard, as documented. Mitigation is operator-level (Caddy) rate limiting, the same answer as for other public routes. No new gap is introduced, but it is noted.

---

## 11. Deferred items (unchanged from brief)

- Master-key rotation / per-app master key (v1.2).
- Native SMTP listener (v1.3+).
- Provider-specific inbound unmarshallers, inbound attachments, outbound SMTP connection pooling, per-app `from` validation / SPF / DKIM (v1.2 / operator).
- Dashboard inbound payload viewer (v1.2, PII).
- Drop the plaintext `realtime_signing_key` column (v1.1.8).
- Secrets versioning/history + secrets-change triggers (never).
- `users::*` (v1.1.8).
- `queue::*` / `invoke()` (v1.1.9).

---

## 12. Known limitations

- The production `EmailTransport` opens one connection per call, so high outbound volume is bound by connection churn until pooling lands (v1.2).
- Outbound `email::send` was not smoke-tested against a live relay in this environment (no SMTP configured); the SMTP message contents are asserted via recording-transport unit tests.
- The DB-gated suite requires bounded parallelism on a shared Postgres (see §8); CI's dedicated Postgres does not.
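
For reviewers who want to see what "asserted via recording-transport unit tests" means in practice, the pattern from §4 (transport behind an internal `EmailTransport` trait) can be sketched in plain Rust. This is a minimal, std-only sketch, not the crate's code: the `deliver` method, `OutgoingEmail`, `RecordingTransport`, and the simplified `EmailService::send` signature are hypothetical stand-ins, and the real service builds a lettre `Message` rather than a plain struct. It models three behaviours stated in §4: disabled mode returns `NotConfigured`, `email::send` drops any `html`, and `reply_to` defaults to `from`.

```rust
use std::sync::Mutex;

// Hypothetical, simplified message; the real service builds a lettre Message.
#[derive(Debug, Clone, PartialEq)]
pub struct OutgoingEmail {
    pub from: String,
    pub to: Vec<String>,
    pub reply_to: String,
    pub subject: String,
    pub text: String,
    pub html: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum EmailError {
    NotConfigured,
    Transport(String),
}

// Hypothetical stand-in for the internal `EmailTransport` trait.
pub trait EmailTransport {
    fn deliver(&self, msg: &OutgoingEmail) -> Result<(), String>;
}

// Test double: records every message instead of talking to an SMTP relay.
#[derive(Default)]
pub struct RecordingTransport {
    pub sent: Mutex<Vec<OutgoingEmail>>,
}

impl EmailTransport for RecordingTransport {
    fn deliver(&self, msg: &OutgoingEmail) -> Result<(), String> {
        self.sent.lock().unwrap().push(msg.clone());
        Ok(())
    }
}

pub struct EmailService<T: EmailTransport> {
    // `None` models disabled mode (SMTP settings incomplete at startup).
    pub transport: Option<T>,
}

impl<T: EmailTransport> EmailService<T> {
    /// Text-only send: any `html` argument is ignored, mirroring `email::send`.
    pub fn send(
        &self,
        from: &str,
        to: &[&str],
        reply_to: Option<&str>,
        subject: &str,
        text: &str,
        _html: Option<&str>,
    ) -> Result<(), EmailError> {
        let transport = self.transport.as_ref().ok_or(EmailError::NotConfigured)?;
        let msg = OutgoingEmail {
            from: from.to_string(),
            to: to.iter().map(|addr| addr.to_string()).collect(),
            // reply_to falls back to the sender when not supplied.
            reply_to: reply_to.unwrap_or(from).to_string(),
            subject: subject.to_string(),
            text: text.to_string(),
            html: None,
        };
        transport.deliver(&msg).map_err(EmailError::Transport)
    }
}
```

Keeping the transport behind a trait is what lets these assertions run without a relay; the flip side, as noted above, is that nothing in this style of test exercises real SMTP.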
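
The master-key precedence rules in §2 can be checked against a std-only sketch of the `resolve` decision. This is an illustration under assumptions, not the `picloud_shared::crypto` code: `Resolved`, the `resolve(raw, dev_mode)` signature, and `b64_decode` are hypothetical; the decoder is a deliberately lenient toy standing in for a real base64 library; and the dev-fallback branch returns a marker instead of deriving the SHA-256 key, since std has no SHA-256. Passing the raw value and the dev flag as arguments mirrors the §2 choice of reading env in `main.rs` so tests never mutate process env.

```rust
use std::fmt;

/// Hypothetical stand-in for `MasterKey`: 32 raw key bytes.
pub struct MasterKey([u8; 32]);

impl fmt::Debug for MasterKey {
    // Redacted so key material never reaches logs or panic messages.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("MasterKey(<redacted>)")
    }
}

#[derive(Debug, PartialEq)]
pub enum MasterKeyError {
    Missing,
    Malformed,
    WrongLength(usize),
}

#[derive(Debug)]
pub enum Resolved {
    Configured(MasterKey),
    // The real code derives SHA-256("picloud-dev-master-key-v1.1.7") here.
    DevFallback,
}

/// Toy standard-alphabet base64 decoder: strips trailing '=' and does not
/// validate padding placement. Returns None on any non-alphabet byte.
fn b64_decode(s: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let (mut acc, mut bits) = (0u32, 0u32);
    for c in s.trim_end_matches('=').bytes() {
        let v: u32 = match c {
            b'A'..=b'Z' => (c - b'A') as u32,
            b'a'..=b'z' => (c - b'a') as u32 + 26,
            b'0'..=b'9' => (c - b'0') as u32 + 52,
            b'+' => 62,
            b'/' => 63,
            _ => return None,
        };
        acc = (acc << 6) | v;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1u32 << bits) - 1; // keep only the not-yet-emitted low bits
        }
    }
    Some(out)
}

/// `raw` is the value of PICLOUD_SECRET_KEY (None if unset);
/// `dev_mode` is whether PICLOUD_DEV_MODE=true.
pub fn resolve(raw: Option<&str>, dev_mode: bool) -> Result<Resolved, MasterKeyError> {
    match raw {
        // The fallback applies ONLY when the variable is unset and dev mode is on.
        None if dev_mode => Ok(Resolved::DevFallback),
        None => Err(MasterKeyError::Missing),
        Some(s) => {
            let bytes = b64_decode(s).ok_or(MasterKeyError::Malformed)?;
            if bytes.len() != 32 {
                return Err(MasterKeyError::WrongLength(bytes.len()));
            }
            let mut key = [0u8; 32];
            key.copy_from_slice(&bytes);
            Ok(Resolved::Configured(MasterKey(key)))
        }
    }
}
```

Note that a set-but-malformed key is an error even with dev mode on; only an unset variable triggers the fallback.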
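
The §4 address pre-check (single `@`, non-empty local part, dot in the domain, at most 320 bytes) is small enough to show directly. A minimal sketch, assuming the checks run in that order; `precheck_address` and `AddressError` are hypothetical names, and in the real code an address that passes still goes through `lettre::Mailbox` parsing, which this sketch does not reproduce.

```rust
#[derive(Debug, PartialEq)]
pub enum AddressError {
    TooLong(usize),
    AtSignCount(usize),
    EmptyLocalPart,
    DomainWithoutDot,
}

/// Cheap structural pre-check; not an RFC 5322 parser.
pub fn precheck_address(addr: &str) -> Result<(), AddressError> {
    // Byte length, matching the "≤320 bytes" rule.
    if addr.len() > 320 {
        return Err(AddressError::TooLong(addr.len()));
    }
    let at_count = addr.matches('@').count();
    if at_count != 1 {
        return Err(AddressError::AtSignCount(at_count));
    }
    // Safe to unwrap: exactly one '@' was confirmed above.
    let (local, domain) = addr.split_once('@').unwrap();
    if local.is_empty() {
        return Err(AddressError::EmptyLocalPart);
    }
    if !domain.contains('.') {
        return Err(AddressError::DomainWithoutDot);
    }
    Ok(())
}
```

The point of the pre-check is a clear, cheap error for obviously broken input; it says nothing about deliverability.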
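
The §6 read-path precedence and the startup-migration criterion reduce to a small pure function over the two column groups. A sketch under assumptions: `SigningKeyRow`, `Sealed`, `KeySource`, `choose_source`, and `needs_startup_encryption` are hypothetical names rather than the actual `decode_signing_key` signature; the sketch only selects a source (decryption with the `MasterKey` would follow in the repo); and it assumes the "four precedence states" are the combinations of encrypted/plaintext present or absent.

```rust
/// Encrypted column pair: ciphertext (GCM tag appended) + 12-byte nonce.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Sealed<'a> {
    pub ciphertext: &'a [u8],
    pub nonce: &'a [u8],
}

/// One app's signing-key columns during the v1.1.7 compat window.
pub struct SigningKeyRow<'a> {
    pub encrypted: Option<Sealed<'a>>,
    pub plaintext: Option<&'a str>,
}

#[derive(Debug, PartialEq)]
pub enum KeySource<'a> {
    Encrypted(Sealed<'a>),
    LegacyPlaintext(&'a str),
    // Nothing stored: the get_or_create path presumably generates a new key.
    Absent,
}

/// Prefer the encrypted columns; fall back to plaintext only if they are empty.
pub fn choose_source<'a>(row: &SigningKeyRow<'a>) -> KeySource<'a> {
    match (row.encrypted, row.plaintext) {
        (Some(sealed), _) => KeySource::Encrypted(sealed),
        (None, Some(key)) => KeySource::LegacyPlaintext(key),
        (None, None) => KeySource::Absent,
    }
}

/// Startup-task criterion: plaintext present, encrypted value missing.
/// The plaintext itself is left in place for rollback safety.
pub fn needs_startup_encryption(row: &SigningKeyRow<'_>) -> bool {
    row.encrypted.is_none() && row.plaintext.is_some()
}
```

Because the criterion only looks for rows lacking an encrypted value, re-running it after a successful pass selects nothing, which is the idempotence §6 relies on.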