v1.1.7 — Configuration & Email — HANDBACK
Branch: feat/v1.1.7-secrets-email (9 commits off main, not pushed)
Status: ready for review. NOT merged, NOT pushed, no PR opened.
a7d3dad chore(v1.1.7): re-bless schema snapshot for secrets + email migrations
2ea47eb chore(v1.1.7): fix clippy --all-targets warnings
b355851 chore(v1.1.7): version bumps + CHANGELOG
fffcdf6 feat(v1.1.7-realtime-migration): encrypt signing keys at rest
02335a8 fix(v1.1.7-dead-letter): wire dispatcher → list_matching_dead_letter
1f78937 feat(v1.1.7-email-inbound): webhook receiver + email:receive trigger
8f2d2bc feat(v1.1.7-email-outbound): SMTP send/send_html
2d11090 feat(v1.1.7-secrets): secrets SDK + table + admin API + dashboard
dc2e4fa feat(v1.1.7-crypto): master-key infra + encryption helpers
1. Scope coverage
| Item | Status |
|---|---|
| Encryption infrastructure (master key + AES-256-GCM envelope) | Done |
| `secrets::*` SDK + `0023_secrets.sql` + admin API + dashboard tab | Done |
| Outbound email `email::send` / `email::send_html` (lettre SMTP) | Done |
| Inbound email webhook receiver + `email:receive` trigger + `0024` | Done (full scope, per user decision) |
| Dispatcher routing for email | Done |
| dead_letter handler wiring fix | Done |
| Realtime signing-key encryption (two-phase) + `0025` | Done |
| Dashboard (Secrets tab, email trigger form, `npm run check`) | Done |
| Version bumps (1.1.7 / SDK 1.8 / dashboard 0.13.0) + CHANGELOG | Done |
| Tests (match v1.1.5/v1.1.6 density) | Done |
Nothing deferred from scope-in. Inbound email (the deferrable-if-scope-blew-up piece) was implemented in full.
2. Encryption infrastructure notes
- Module: `crates/shared/src/crypto.rs` (`picloud_shared::crypto`).
- Master-key sourcing (`MasterKey::from_env` → `resolve`): `PICLOUD_SECRET_KEY` = base64 of exactly 32 bytes. Missing → `MasterKeyError::Missing` (fatal); non-base64 → `Malformed`; wrong length → `WrongLength`. Sourced in `main.rs::run_server` before any DB work; `build_app` takes the `MasterKey` as a parameter (so tests pass a fixed key and don't mutate process env).
- Dev fallback: deterministic key (`SHA-256("picloud-dev-master-key-v1.1.7")`) used ONLY when `PICLOUD_SECRET_KEY` is unset AND `PICLOUD_DEV_MODE=true`, with a prominent `warn!`. No quiet unencrypted mode.
- aes-gcm version: `0.10` (features `aes`, `alloc`). `Aes256Gcm`.
- Nonce generation: 12 bytes from `rand::thread_rng().fill_bytes` (OS-CSPRNG-seeded), per-encryption.
- Storage layout: ciphertext with the 16-byte GCM auth tag appended (RustCrypto `Aead`-trait layout: `encrypt` returns `ciphertext || tag`, `decrypt` consumes the same). The 12-byte nonce is stored in a separate column. `MasterKey`'s `Debug` is redacted.
- Plaintext cap (secrets): 64 KB default, enforced in `secrets_service::seal` (the SDK boundary) → `SecretsError::TooLarge` with limit + actual size. Override: `PICLOUD_SECRET_MAX_VALUE_BYTES`.
- Key rotation: out of scope. Documented in CHANGELOG + the module docs that changing `PICLOUD_SECRET_KEY` orphans all ciphertext.
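
For reviewers, a minimal std-only sketch of the key-sourcing precedence described above. It is not the code in `crypto.rs`: the function signature, the `KeySource` enum, and the hand-rolled base64 decoder are assumptions made so the example is self-contained, and the dev key derivation (SHA-256) is left out.

```rust
/// Hypothetical mirror of the three failure modes named above.
#[derive(Debug, PartialEq)]
enum MasterKeyError {
    Missing,
    Malformed,
    WrongLength(usize),
}

/// Hypothetical: records where the key came from. The dev key bytes
/// themselves are not derived in this sketch.
#[derive(Debug, PartialEq)]
enum KeySource {
    Env([u8; 32]),
    DevFallback,
}

/// Minimal standard-alphabet base64 decoder, for illustration only.
/// Production code would rely on a vetted crate instead.
fn decode_base64(input: &str) -> Option<Vec<u8>> {
    let body = input.trim_end_matches('=');
    let mut out = Vec::with_capacity(body.len() * 3 / 4);
    let mut buf: u32 = 0;
    let mut bits: u32 = 0;
    for c in body.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            _ => return None, // any other byte is not base64
        };
        buf = (buf << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1; // keep only the leftover low bits
        }
    }
    Some(out)
}

/// `secret` stands for the value of PICLOUD_SECRET_KEY (None when unset),
/// `dev_mode` for PICLOUD_DEV_MODE=true.
fn resolve(secret: Option<&str>, dev_mode: bool) -> Result<KeySource, MasterKeyError> {
    match secret {
        Some(encoded) => {
            let bytes = decode_base64(encoded.trim()).ok_or(MasterKeyError::Malformed)?;
            if bytes.len() != 32 {
                return Err(MasterKeyError::WrongLength(bytes.len()));
            }
            let mut key = [0u8; 32];
            key.copy_from_slice(&bytes);
            Ok(KeySource::Env(key))
        }
        // The dev fallback applies only when the key is unset AND dev mode is on.
        None if dev_mode => Ok(KeySource::DevFallback),
        None => Err(MasterKeyError::Missing),
    }
}
```

Taking the environment values as parameters keeps the function pure, which matches the stated goal of letting tests pass a fixed key without mutating process env.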
3. Secrets notes
- `SecretsService` (trait, `picloud-shared`) → `SecretsServiceImpl` + `PostgresSecretsRepo` (manager-core) → Rhai bridge (`executor-core/src/sdk/secrets.rs`). Collection-less; `app_id` from `cx.app_id`.
- JSON round-trip: `set` serializes the value to JSON bytes, caps, encrypts; `get` decrypts + deserializes, so a String returns a String (not a JSON-quoted `"\"…\""`). Verified by unit + bridge tests.
- No ServiceEvent emission (secret writes don't fire triggers).
- Admin API: `GET/POST/DELETE /api/v1/admin/apps/{id}/secrets`; list returns names + `updated_at` only.
- Authz: `Capability::AppSecretsRead/Write` → `script:read`/`script:write`. No new Scope variants (seven-scope commitment held).
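
The size cap applied between "serialize" and "encrypt" in `set` can be sketched as follows. This is an illustration, not the body of `secrets_service::seal`: the helper names are hypothetical, reading "64 KB" as 64 * 1024 bytes is an assumption, and so is falling back to the default when the override is unparsable.

```rust
/// Assumed default: "64 KB" read as 64 * 1024 bytes.
const DEFAULT_MAX_VALUE_BYTES: usize = 64 * 1024;

/// Hypothetical mirror of the error described above (limit + actual size).
#[derive(Debug, PartialEq)]
enum SecretsError {
    TooLarge { limit: usize, actual: usize },
}

/// Resolve the cap from the raw value of PICLOUD_SECRET_MAX_VALUE_BYTES.
/// Using the default for a missing or unparsable value is an assumption.
fn max_value_bytes(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_MAX_VALUE_BYTES)
}

/// Check the serialized JSON bytes before they are encrypted.
fn check_cap(json_bytes: &[u8], limit: usize) -> Result<(), SecretsError> {
    if json_bytes.len() > limit {
        return Err(SecretsError::TooLarge {
            limit,
            actual: json_bytes.len(),
        });
    }
    Ok(())
}
```

Checking the serialized bytes rather than the original value means the limit applies to what is actually stored, including JSON quoting overhead.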
4. Email implementation notes
- SMTP transport: `lettre 0.11` (`smtp-transport`, `tokio1-rustls-tls`, `builder`, `hostname`). Connection model: one connection per call (lettre default); pooling deferred to v1.2. The transport sits behind an internal `EmailTransport` trait so the service is unit-tested with a recording fake (no live SMTP).
- Disabled mode: if HOST/USER/PASSWORD aren't all set, `EmailServiceImpl::from_env` builds no transport and every `send` returns `NotConfigured` (warned at startup). A malformed relay descriptor is also logged and yields disabled mode (email is non-critical; never blocks startup).
- Address validation: hand-rolled RFC 5322-ish pre-check (single `@`, non-empty local part, domain contains a dot, ≤320 bytes) followed by a `lettre::Mailbox` parse (the authoritative validator). No deliverability check.
- Size cap: 25 MB on `message.formatted()`, `PICLOUD_EMAIL_MAX_MESSAGE_BYTES`.
- `email::send` forces text-only (ignores any `html`); `email::send_html` requires `html` and builds `MultiPart::alternative_plain_html`. `reply_to` defaults to `from`. `to`/`cc`/`bcc` accept a String or an Array of Strings.
- Inbound normalization: only the generic provider-agnostic JSON shape `{from,to[],cc[],subject,text,html,message_id}` is accepted in v1.1.7: `from` required, rest default. Provider-specific unmarshallers → v1.2. The expected shape is documented on the dashboard email-trigger form.
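
The address pre-check listed above is simple enough to sketch in a few lines. The function name is hypothetical and this is not the shipped validator; it only encodes the four stated rules and deliberately leaves everything else to the authoritative lettre parse that follows it.

```rust
/// Cheap structural pre-check: single '@', non-empty local part,
/// a dot in the domain, and at most 320 bytes overall.
fn precheck_address(addr: &str) -> bool {
    if addr.is_empty() || addr.len() > 320 {
        return false;
    }
    let mut parts = addr.split('@');
    let (local, domain) = match (parts.next(), parts.next(), parts.next()) {
        // Exactly two pieces means exactly one '@'.
        (Some(local), Some(domain), None) => (local, domain),
        _ => return false,
    };
    !local.is_empty() && domain.contains('.')
}
```

A pre-check like this rejects obviously malformed input early, but it accepts strings such as `a@.` that a full parser would refuse, which is why it cannot stand in for the second stage.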
5. Dead-letter handler fix notes
- Call site: `dispatcher::handle_failure`, the retry-exhaustion branch. After `DeadLetterRepo::insert` (which returns the new `DeadLetterId`), a new helper `fan_out_dead_letter` runs.
- What it does: calls `TriggerRepo::list_matching_dead_letter(app_id, source, row.trigger_id, Some(resolved.script_id))` (the method that had no production caller) and inserts one outbox row per match (`source_kind = DeadLetter`, the DL trigger's id + handler script id, `trigger_depth + 1`, `origin_principal` = the DL trigger's registered principal).
- Payload: built from the REAL `TriggerEvent::DeadLetter` variant, not the brief's §6 field list (see §7 deviations): `{ dead_letter_id, original: Box::new(decoded row payload), attempts, last_error, trigger_id, script_id, first_attempt_at, last_attempt_at }`. If the outbox payload can't be decoded back into a `TriggerEvent` (so the nested `original` can't be built), the fan-out is skipped; the dead-letter row is still durably written.
- Recursion-stop: unchanged. The `is_dead_letter_handler` short-circuit at the top of `handle_failure` returns before the exhaustion branch, so a DL handler's own failure is never re-dead-lettered. No new guard needed.
- Tests verify the handler actually fires (`crates/picloud/tests/dispatcher_e2e.rs`, DB-gated): `dispatcher_delivers_dead_letter_to_handler` now asserts BOTH row-create AND handler-fire (inline doc updated); `dispatcher_delivers_dead_letter_to_handler_actually_fires` asserts the nested `original` KV event + `last_error`; `dead_letter_source_filter_excludes_nonmatching` exercises the source filter dimension; `dead_letter_handler_failure_does_not_recurse` proves the recursion-stop (count stays at 1).
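
The control flow described above (short-circuit first, then dead-letter insert, then fan-out) can be modelled in memory as below. Every type here is a hypothetical stand-in for the real repos and rows, and the matching is reduced to the source dimension only; the real `list_matching_dead_letter` also takes the app, trigger, and script ids.

```rust
/// Hypothetical stand-in for an outbox row created by the fan-out.
#[derive(Debug, Clone, PartialEq)]
struct OutboxRow {
    trigger_id: u32,
    script_id: u32,
    trigger_depth: u32,
}

/// Hypothetical stand-in for a registered dead-letter trigger.
struct DlTrigger {
    id: u32,
    handler_script_id: u32,
    source: &'static str,
}

/// In-memory replacement for the dead-letter and outbox tables.
#[derive(Default)]
struct Store {
    dead_letters: Vec<u32>,
    outbox: Vec<OutboxRow>,
}

/// The facts about a failed delivery that this sketch needs.
struct Failure<'a> {
    source: &'a str,
    attempts: u32,
    max_attempts: u32,
    trigger_depth: u32,
    is_dead_letter_handler: bool,
}

fn handle_failure(store: &mut Store, dl_triggers: &[DlTrigger], failure: &Failure) {
    // Recursion-stop: a failing DL handler returns before the exhaustion branch.
    if failure.is_dead_letter_handler {
        return;
    }
    // Retries remaining: the retry path is not modelled here.
    if failure.attempts < failure.max_attempts {
        return;
    }
    // Exhausted: write the dead-letter row first, so it exists even if
    // nothing matches in the fan-out.
    let dead_letter_id = store.dead_letters.len() as u32 + 1;
    store.dead_letters.push(dead_letter_id);
    // Fan out: one outbox row per matching DL trigger, one level deeper.
    for trigger in dl_triggers.iter().filter(|t| t.source == failure.source) {
        store.outbox.push(OutboxRow {
            trigger_id: trigger.id,
            script_id: trigger.handler_script_id,
            trigger_depth: failure.trigger_depth + 1,
        });
    }
}
```

Calling `handle_failure` a second time with `is_dead_letter_handler: true` leaves both collections untouched, which is the same property `dead_letter_handler_failure_does_not_recurse` asserts against the real dispatcher.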
6. Realtime signing-key migration notes
- Two-phase, as recommended. `0025_encrypt_realtime_keys.sql` adds the NULL-able columns `realtime_signing_key_encrypted` + `realtime_signing_key_nonce` and applies `DROP NOT NULL` to the plaintext column (so new keys can be stored encrypted-only).
- Repo: `PostgresAppSecretsRepo` now holds the `MasterKey`. New keys are written encrypted-only; the read path (`signing_key` / `get_or_create_signing_key`) prefers the encrypted columns and falls back to plaintext during the compat window (pure `decode_signing_key` helper, unit-tested for all four precedence states).
- Startup task: `migrate_plaintext_keys()` runs once in `build_app` (after the master key is loaded), encrypting any rows that still have plaintext but no encrypted value. Plaintext is left in place for rollback safety. Idempotent.
- Plaintext column drop: deferred to v1.1.8 (documented in CHANGELOG + the migration). Operators must upgrade through v1.1.7 (which performs the encryption) before v1.1.8.
- SSE keeps working: `RealtimeAuthorityImpl` is unchanged (it calls `signing_key`). Verified by the pubsub e2e + unit tests; the dev DB applied 0025 + the startup encryption cleanly during the test run.
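
The read-path precedence during the compat window can be sketched as a pure function. The names below are hypothetical, and I am assuming the "four precedence states" are the four combinations of encrypted present/absent and plaintext present/absent; decryption itself is out of scope for the sketch.

```rust
/// Hypothetical result of choosing which stored material to use.
#[derive(Debug, PartialEq)]
enum KeyMaterial<'a> {
    Encrypted { ciphertext: &'a [u8], nonce: &'a [u8] },
    Plaintext(&'a [u8]),
    Absent,
}

/// Prefer the encrypted columns; fall back to plaintext only when no
/// encrypted value exists.
fn pick_signing_key<'a>(
    encrypted: Option<(&'a [u8], &'a [u8])>,
    plaintext: Option<&'a [u8]>,
) -> KeyMaterial<'a> {
    match (encrypted, plaintext) {
        (Some((ciphertext, nonce)), _) => KeyMaterial::Encrypted { ciphertext, nonce },
        (None, Some(key)) => KeyMaterial::Plaintext(key),
        (None, None) => KeyMaterial::Absent,
    }
}

/// Startup-migration predicate: plaintext present but no encrypted value.
/// Once a row is encrypted this returns false, so re-running is a no-op.
fn needs_migration(encrypted: Option<(&[u8], &[u8])>, plaintext: Option<&[u8]>) -> bool {
    encrypted.is_none() && plaintext.is_some()
}
```

Because the plaintext column is kept after migration, the "both present" state is the steady state in v1.1.7, and the encrypted value wins there.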
7. Decisions beyond the brief / deviations flagged
- `inbound_secret` stored ENCRYPTED (user-approved deviation). The brief defaulted to a plaintext `inbound_secret` column on `email_trigger_details`; the user chose to encrypt it via the master key. Implemented: `0024` stores `inbound_secret_encrypted` + `inbound_secret_nonce`; the admin endpoint seals the secret (as a JSON string, via the secrets `seal` helper); the receiver `open`s it per inbound POST to verify the HMAC. Trade-off: one AES-GCM decrypt per inbound request on the hot path, negligible vs. the HMAC + DB round-trip already there. The decrypted secret is never logged.
- Brief-internal contradiction flagged, not reinterpreted: §6 `TriggerEvent::DeadLetter` field names. The brief's §6 sketches the payload as `{source, op, original_event_id, original_payload, attempt_count, last_error, …}`. The actual variant (`crates/shared/src/trigger_event.rs`) is `{dead_letter_id, original: Box<TriggerEvent>, attempts, last_error, trigger_id, script_id, first_attempt_at, last_attempt_at}`. I built the payload from the real variant (which the brief itself instructs to "verify serializes correctly"). No type change needed.
- `build_app` signature gained a `MasterKey` parameter. Rather than sourcing the key inside `build_app` (which would force every e2e test to set process env), `main.rs` sources it and passes it in. The 3 existing `build_app` test callers pass a fixed test key.
- Pre-existing clippy warnings fixed (see §10). Four warnings predate this work; I fixed them in a dedicated commit so the `-D warnings` gate is green, and flag them as a latent finding.
- Email-trigger retry settings use the standard async defaults (3 attempts, exponential, 1000 ms). The brief didn't specify; this matches the cron/kv default shape.
No other deviations from prompt-specified defaults.
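
To make the retry defaults concrete, here is a sketch of the delay schedule they might produce. The handback only states "3 attempts, exponential, 1000 ms"; reading 1000 ms as the delay before the first retry and using a doubling factor are both assumptions, and the function name is hypothetical.

```rust
/// Delays (in ms) before each retry. The first attempt runs immediately,
/// so `max_attempts` attempts produce `max_attempts - 1` delays.
/// Assumption: each delay doubles the previous one.
fn retry_delays_ms(max_attempts: u32, base_ms: u64) -> Vec<u64> {
    (0..max_attempts.saturating_sub(1))
        .map(|retry| base_ms.saturating_mul(2u64.saturating_pow(retry)))
        .collect()
}
```

Under those assumptions, `retry_delays_ms(3, 1000)` yields two delays, 1000 ms and 2000 ms, before the event is dead-lettered.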
8. How to verify locally — §8 attestation (sourced from cargo's literal output)
All gates run on the handed-back HEAD (a7d3dad):
cargo fmt --all -- --check # clean
cargo clippy --all-targets --all-features -- -D warnings # clean (exit 0)
cd dashboard && npm run check # 0 ERRORS 0 WARNINGS (371 files)
Full test run with DATABASE_URL set so the DB-gated suites
(schema_snapshot, dispatcher_e2e ×9, email_inbound ×8) execute:
DATABASE_URL='postgres://picloud:picloud@127.0.0.1:15432/picloud' \
cargo test --workspace -- --test-threads=2
Pass count, summed from cargo's literal output (NOT hand-counted):
DATABASE_URL=... cargo test --workspace -- --test-threads=2 2>&1 | \
awk '/test result: ok\./ { gsub(";", ""); sum += $4 } END { print sum }'
# => 617
617 passed, 0 failed across the workspace (34 test result: lines,
0 FAILED). Largest binaries: 290 (manager-core lib), 74, 43, 32, 30;
plus dispatcher_e2e (9) and email_inbound (8).
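
For anyone who wants to re-derive the total without awk, the same summation can be sketched in Rust. The function name is hypothetical; it mirrors the one-liner above by taking the fourth whitespace-separated field of each `test result: ok.` line after stripping semicolons.

```rust
/// Sum the pass counts from `test result: ok. N passed; ...` lines.
/// Lines reporting FAILED are skipped, as in the awk filter.
fn sum_passed(cargo_output: &str) -> u64 {
    cargo_output
        .lines()
        .filter(|line| line.contains("test result: ok."))
        .filter_map(|line| {
            line.replace(';', "")
                .split_whitespace()
                .nth(3) // fields: "test", "result:", "ok.", "<N>"
                .and_then(|field| field.parse::<u64>().ok())
        })
        .sum()
}
```

Like the awk version, this counts only binaries that reported `ok.`, so a failing binary lowers the sum rather than raising an error; the separate "0 FAILED" check is what rules that case out.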
Bounded-parallelism note (--test-threads=2): the picloud e2e
binaries each call build_app, which opens its own Postgres pool. Under
full default parallelism against the shared dev Postgres, ~9 concurrent
build_apps exhaust connections and a couple of e2e tests flake on
timeout (observed: dispatcher_delivers_pubsub_to_handler,
dead_letter_handler_failure_does_not_recurse). They pass reliably at
--test-threads=2 and in isolation. CI's dedicated fresh postgres:15
(not a shared dev DB) does not hit this. Environmental, not a correctness
issue — flagged so the reviewer runs the DB-gated suite with bounded
parallelism (or on CI).
Migrations: apply cleanly on the v1.1.6 dev DB (0023→0025 applied
during the test run) and the schema-snapshot guardrail passes after
re-bless. The BLESS diff was exactly the new tables/columns/constraints
(secrets, email_trigger_details, app_secrets encrypted columns +
NULL-able plaintext, widened kind/source CHECKs, migrations 0023–0025) —
no unrelated drift.
Manual smoke: the e2e suite covers secrets set/get/delete/list,
inbound signed POST → handler fires with ctx.event.email, dead-letter
handler fires, realtime-key encryption + SSE. Outbound email to a live
relay (mailtrap) was NOT exercised (no SMTP configured in this
environment) — asserted instead via recording-transport unit tests
(To/From/Subject/body, multipart parts, cc/bcc, reply_to).
9. Open questions for the reviewer
- §8 bounded-parallelism caveat: acceptable, or should the e2e harness share a single `build_app`/pool across tests in a binary? (Out of v1.1.7 scope; the existing v1.1.6 e2e tests have the same shape.)
- `email::send` ignoring a stray `html` key (forcing text-only) vs. throwing: I chose forgiving text-only; happy to make it strict.
- Inbound `received_at` is stamped by the receiver (`Utc::now()`), not read from a provider header. Please confirm that's the intended semantics.
10. Latent security / correctness findings
- `clippy --all-targets --all-features -- -D warnings` did NOT pass at v1.1.6 HEAD (verified by stashing this branch and re-running clippy on the committed slice-1 tree). Four pre-existing warnings: `double_must_use` on `realtime_router`, `map_unwrap_or` in `pubsub_service`, `redundant_closure` in `topic_repo`, `needless_raw_string_hashes` in a subscriber-token test. Fixed all four (commit `2ea47eb`) so the gate is now green. Flagging because it means prior "clippy green" claims were likely run without `--all-targets` (which compiles the test binaries).
- Inbound HMAC fails closed on decrypt error. If a stored `inbound_secret` can't be decrypted (e.g. `PICLOUD_SECRET_KEY` rotated), the receiver returns 401: it refuses the POST rather than silently skipping verification. Intentional.
- No rate limiting on the public inbound-email endpoint. Like every public data-plane route, `/api/v1/email-inbound/...` is unauthenticated by design (URL + HMAC are the gate). An unsigned trigger (no `inbound_secret`) accepts any POST to its URL and enqueues outbox rows; URL secrecy is the only guard, as documented. Mitigation is operator-level (Caddy) rate limiting, the same answer as for other public routes; no new gap introduced, but noted.
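
The inbound gate described in the last two findings reduces to a small decision table, sketched below. All names are hypothetical, the HMAC comparison is stubbed as a closure because the sketch is std-only, and returning 401 for a signature mismatch is my assumption (the findings above only state the 401 for the decrypt-failure case).

```rust
/// Hypothetical outcome of the inbound gate.
#[derive(Debug, PartialEq)]
enum Verdict {
    Accept,
    Unauthorized, // maps to HTTP 401
}

/// Hypothetical view of the trigger's stored secret after the `open` attempt.
enum StoredSecret {
    Unsigned,           // no inbound_secret configured
    Decrypted(Vec<u8>), // opened successfully
    DecryptFailed,      // e.g. the master key was rotated
}

/// `signature_matches` stands in for the real HMAC verification.
fn gate_inbound<F>(secret: &StoredSecret, signature_matches: F) -> Verdict
where
    F: Fn(&[u8]) -> bool,
{
    match secret {
        // Unsigned triggers rely on URL secrecy alone.
        StoredSecret::Unsigned => Verdict::Accept,
        // Fail closed: never skip verification because the secret is unreadable.
        StoredSecret::DecryptFailed => Verdict::Unauthorized,
        StoredSecret::Decrypted(key) => {
            if signature_matches(key.as_slice()) {
                Verdict::Accept
            } else {
                Verdict::Unauthorized
            }
        }
    }
}
```

Keeping `DecryptFailed` distinct from `Unsigned` is the point of the design: collapsing the two would turn a key rotation into a silent downgrade to unsigned delivery.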
11. Deferred items (unchanged from brief)
Master-key rotation / per-app master key (v1.2); native SMTP listener
(v1.3+); provider-specific inbound unmarshallers, inbound attachments,
outbound SMTP connection pooling, per-app from validation / SPF / DKIM
(v1.2 / operator); dashboard inbound payload viewer (v1.2, PII); drop the
plaintext realtime_signing_key column (v1.1.8); secrets
versioning/history + secrets-change triggers (never); users::* (v1.1.8);
queue::* / invoke() (v1.1.9).
12. Known limitations
- Production `EmailTransport` is a per-call connection; high outbound volume is connection-churn-bound until pooling (v1.2).
- Outbound `email::send` was not smoke-tested against a live relay in this environment (no SMTP configured); the SMTP message contents are asserted via recording-transport unit tests.
- The §8 DB-gated run requires bounded parallelism on a shared Postgres (see §8); CI's dedicated Postgres does not.