Files
PiCloud/HANDBACK.md
MechaCat02 3cfb795206 docs(v1.1.7): handback report
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 22:46:33 +02:00

16 KiB
Raw Blame History

v1.1.7 — Configuration & Email — HANDBACK

Branch: feat/v1.1.7-secrets-email (9 commits off main, not pushed) Status: ready for review. NOT merged, NOT pushed, no PR opened.

a7d3dad chore(v1.1.7): re-bless schema snapshot for secrets + email migrations
2ea47eb chore(v1.1.7): fix clippy --all-targets warnings
b355851 chore(v1.1.7): version bumps + CHANGELOG
fffcdf6 feat(v1.1.7-realtime-migration): encrypt signing keys at rest
02335a8 fix(v1.1.7-dead-letter): wire dispatcher → list_matching_dead_letter
1f78937 feat(v1.1.7-email-inbound): webhook receiver + email:receive trigger
8f2d2bc feat(v1.1.7-email-outbound): SMTP send/send_html
2d11090 feat(v1.1.7-secrets): secrets SDK + table + admin API + dashboard
dc2e4fa feat(v1.1.7-crypto): master-key infra + encryption helpers

1. Scope coverage

Item Status
Encryption infrastructure (master key + AES-256-GCM envelope) Done
secrets::* SDK + 0023_secrets.sql + admin API + dashboard tab Done
Outbound email email::send / email::send_html (lettre SMTP) Done
Inbound email webhook receiver + email:receive trigger + 0024 Done (full scope, per user decision)
Dispatcher routing for email Done
dead_letter handler wiring fix Done
Realtime signing-key encryption (two-phase) + 0025 Done
Dashboard (Secrets tab, email trigger form, npm run check) Done
Version bumps (1.1.7 / SDK 1.8 / dashboard 0.13.0) + CHANGELOG Done
Tests (match v1.1.5/v1.1.6 density) Done

Nothing deferred from scope-in. Inbound email (the deferrable-if-scope- blew-up piece) was implemented in full.


2. Encryption infrastructure notes

  • Module: crates/shared/src/crypto.rs (picloud_shared::crypto).
  • Master-key sourcing (MasterKey::from_envresolve):
    • PICLOUD_SECRET_KEY = base64 of exactly 32 bytes. Missing → MasterKeyError::Missing (fatal); non-base64 → Malformed; wrong length → WrongLength. Sourced in main.rs::run_server before any DB workbuild_app takes the MasterKey as a parameter (so tests pass a fixed key and don't mutate process env).
    • Dev fallback: deterministic key (SHA-256("picloud-dev-master-key-v1.1.7")) used ONLY when PICLOUD_SECRET_KEY is unset AND PICLOUD_DEV_MODE=true, with a prominent warn!. No quiet unencrypted mode.
  • aes-gcm version: 0.10 (features aes, alloc). Aes256Gcm.
  • Nonce generation: 12 bytes from rand::thread_rng().fill_bytes (OS-CSPRNG-seeded), per-encryption.
  • Storage layout: ciphertext with the 16-byte GCM auth tag appended (RustCrypto Aead-trait layout — encrypt returns ciphertext || tag, decrypt consumes the same). The 12-byte nonce is stored in a separate column. MasterKey's Debug is redacted.
  • Plaintext cap (secrets): 64 KB default, enforced in secrets_service::seal (the SDK boundary) → SecretsError::TooLarge with limit + actual size. Override: PICLOUD_SECRET_MAX_VALUE_BYTES.
  • Key rotation: out of scope. Documented in CHANGELOG + the module docs that changing PICLOUD_SECRET_KEY orphans all ciphertext.

3. Secrets notes

  • SecretsService (trait, picloud-shared) → SecretsServiceImpl + PostgresSecretsRepo (manager-core) → Rhai bridge (executor-core/src/sdk/secrets.rs). Collection-less; app_id from cx.app_id.
  • JSON round-trip: set serializes the value to JSON bytes, caps, encrypts; get decrypts + deserializes — a String returns a String (not a JSON-quoted "\"…\""). Verified by unit + bridge tests.
  • No ServiceEvent emission (secret writes don't fire triggers).
  • Admin API: GET/POST/DELETE /api/v1/admin/apps/{id}/secrets; list returns names + updated_at only.
  • Authz: Capability::AppSecretsRead/Writescript:read/script:write. No new Scope variants (seven-scope commitment held).

4. Email implementation notes

  • SMTP transport: lettre 0.11 (smtp-transport, tokio1-rustls-tls, builder, hostname). Connection model: one connection per call (lettre default); pooling deferred to v1.2. The transport sits behind an internal EmailTransport trait so the service is unit-tested with a recording fake (no live SMTP).
  • Disabled mode: if HOST/USER/PASSWORD aren't all set, EmailServiceImpl::from_env builds no transport and every send returns NotConfigured (warned at startup). A malformed relay descriptor is also logged and yields disabled mode (email is non-critical; never blocks startup).
  • Address validation: hand-rolled RFC 5322-ish pre-check (single @, non-empty local part, domain contains a dot, ≤320 bytes) followed by a lettre::Mailbox parse (the authoritative validator). No deliverability check.
  • Size cap: 25 MB on message.formatted(), PICLOUD_EMAIL_MAX_MESSAGE_BYTES.
  • email::send forces text-only (ignores any html); email::send_html requires html and builds MultiPart::alternative_plain_html. reply_to defaults to from. to/cc/bcc accept a String or an Array of Strings.
  • Inbound normalization: only the generic provider-agnostic JSON shape {from,to[],cc[],subject,text,html,message_id} is accepted in v1.1.7 — from required, rest default. Provider-specific unmarshallers → v1.2. The expected shape is documented on the dashboard email-trigger form.

5. Dead-letter handler fix notes

  • Call site: dispatcher::handle_failure, the retry-exhaustion branch. After DeadLetterRepo::insert (which returns the new DeadLetterId), a new helper fan_out_dead_letter runs.
  • What it does: calls TriggerRepo::list_matching_dead_letter(app_id, source, row.trigger_id, Some(resolved.script_id)) (the method that had no production caller) and inserts one outbox row per match (source_kind = DeadLetter, the DL trigger's id + handler script id, trigger_depth + 1, origin_principal = the DL trigger's registered principal).
  • Payload — built from the REAL TriggerEvent::DeadLetter variant, not the brief's §6 field list (see §7 deviations): { dead_letter_id, original: Box::new(decoded row payload), attempts, last_error, trigger_id, script_id, first_attempt_at, last_attempt_at }. If the outbox payload can't be decoded back into a TriggerEvent (so the nested original can't be built), the fan-out is skipped — the dead-letter row is still durably written.
  • Recursion-stop: unchanged. The is_dead_letter_handler short-circuit at the top of handle_failure returns before the exhaustion branch, so a DL handler's own failure is never re-dead- lettered. No new guard needed.
  • Tests verify the handler actually fires (crates/picloud/tests/dispatcher_e2e.rs, DB-gated): dispatcher_delivers_dead_letter_to_handler now asserts BOTH row-create AND handler-fire (inline doc updated); dispatcher_delivers_dead_letter_to_handler_actually_fires asserts the nested original KV event + last_error; dead_letter_source_filter_excludes_nonmatching exercises the source filter dimension; dead_letter_handler_failure_does_not_recurse proves the recursion-stop (count stays at 1).

6. Realtime signing-key migration notes

  • Two-phase, as recommended. 0025_encrypt_realtime_keys.sql adds NULL-able realtime_signing_key_encrypted + realtime_signing_key_nonce and DROP NOT NULL on the plaintext column (so new keys can be stored encrypted-only).
  • Repo: PostgresAppSecretsRepo now holds the MasterKey. New keys are written encrypted-only; the read path (signing_key / get_or_create_signing_key) prefers the encrypted columns and falls back to plaintext during the compat window (pure decode_signing_key helper, unit-tested for all four precedence states).
  • Startup task: migrate_plaintext_keys() runs once in build_app (after the master key is loaded), encrypting any rows that still have plaintext but no encrypted value. Plaintext is left in place for rollback safety. Idempotent.
  • Plaintext column drop: deferred to v1.1.8 (documented in CHANGELOG + the migration). Operators must upgrade through v1.1.7 (which performs the encryption) before v1.1.8.
  • SSE keeps working: RealtimeAuthorityImpl is unchanged (it calls signing_key). Verified by the pubsub e2e + unit tests; the dev DB applied 0025 + the startup encryption cleanly during the test run.

7. Decisions beyond the brief / deviations flagged

  1. inbound_secret stored ENCRYPTED (user-approved deviation). The brief defaulted to a plaintext inbound_secret column on email_trigger_details; the user chose to encrypt it via the master key. Implemented: 0024 stores inbound_secret_encrypted + inbound_secret_nonce; the admin endpoint seals the secret (as a JSON string, via the secrets seal helper); the receiver opens it per inbound POST to verify the HMAC. Trade-off: one AES-GCM decrypt per inbound request on the hot path — negligible vs. the HMAC + DB round-trip already there. The decrypted secret is never logged.

  2. Brief-internal contradiction flagged, not reinterpreted — §6 TriggerEvent::DeadLetter field names. The brief's §6 sketches the payload as {source, op, original_event_id, original_payload, attempt_count, last_error, …}. The actual variant (crates/shared/src/trigger_event.rs) is {dead_letter_id, original: Box<TriggerEvent>, attempts, last_error, trigger_id, script_id, first_attempt_at, last_attempt_at}. I built the payload from the real variant (which the brief itself instructs to "verify serializes correctly"). No type change needed.

  3. build_app signature gained a MasterKey parameter. Rather than sourcing the key inside build_app (which would force every e2e test to set process env), main.rs sources it and passes it in. The 3 existing build_app test callers pass a fixed test key.

  4. Pre-existing clippy warnings fixed (see §10). Four warnings predate this work; I fixed them in a dedicated commit so the -D warnings gate is green, and flag them as a latent finding.

  5. Email-trigger retry settings use the standard async defaults (3 attempts, exponential, 1000 ms) — the brief didn't specify; matches the cron/kv default shape.

No other deviations from prompt-specified defaults.


8. How to verify locally — §8 attestation (sourced from cargo's literal output)

All gates run on the handed-back HEAD (a7d3dad):

cargo fmt --all -- --check                                   # clean
cargo clippy --all-targets --all-features -- -D warnings     # clean (exit 0)
cd dashboard && npm run check                                # 0 ERRORS 0 WARNINGS (371 files)

Full test run with DATABASE_URL set so the DB-gated suites (schema_snapshot, dispatcher_e2e ×9, email_inbound ×8) execute:

DATABASE_URL='postgres://picloud:picloud@127.0.0.1:15432/picloud' \
  cargo test --workspace -- --test-threads=2

Pass count, summed from cargo's literal output (NOT hand-counted):

DATABASE_URL=... cargo test --workspace -- --test-threads=2 2>&1 | \
  awk '/test result: ok\./ { gsub(";", ""); sum += $4 } END { print sum }'
# => 617

617 passed, 0 failed across the workspace (34 test result: lines, 0 FAILED). Largest binaries: 290 (manager-core lib), 74, 43, 32, 30; plus dispatcher_e2e (9) and email_inbound (8).

Bounded-parallelism note (--test-threads=2): the picloud e2e binaries each call build_app, which opens its own Postgres pool. Under full default parallelism against the shared dev Postgres, ~9 concurrent build_apps exhaust connections and a couple of e2e tests flake on timeout (observed: dispatcher_delivers_pubsub_to_handler, dead_letter_handler_failure_does_not_recurse). They pass reliably at --test-threads=2 and in isolation. CI's dedicated fresh postgres:15 (not a shared dev DB) does not hit this. Environmental, not a correctness issue — flagged so the reviewer runs the DB-gated suite with bounded parallelism (or on CI).

Migrations: apply cleanly on the v1.1.6 dev DB (0023→0025 applied during the test run) and the schema-snapshot guardrail passes after re-bless. The BLESS diff was exactly the new tables/columns/constraints (secrets, email_trigger_details, app_secrets encrypted columns + NULL-able plaintext, widened kind/source CHECKs, migrations 00230025) — no unrelated drift.

Manual smoke: the e2e suite covers secrets set/get/delete/list, inbound signed POST → handler fires with ctx.event.email, dead-letter handler fires, realtime-key encryption + SSE. Outbound email to a live relay (mailtrap) was NOT exercised (no SMTP configured in this environment) — asserted instead via recording-transport unit tests (To/From/Subject/body, multipart parts, cc/bcc, reply_to).


9. Open questions for the reviewer

  1. §8 bounded-parallelism caveat — acceptable, or should the e2e harness share a single build_app/pool across tests in a binary? (Out of v1.1.7 scope; the existing v1.1.6 e2e tests have the same shape.)
  2. email::send ignoring a stray html key (forcing text-only) vs. throwing — I chose forgiving text-only; happy to make it strict.
  3. Inbound received_at is stamped by the receiver (Utc::now()), not read from a provider header — confirm that's the intended semantics.

10. Latent security / correctness findings

  1. clippy --all-targets --all-features -- -D warnings did NOT pass at v1.1.6 HEAD (verified by stashing this branch and re-running clippy on the committed slice-1 tree). Four pre-existing warnings: double_must_use on realtime_router, map_unwrap_or in pubsub_service, redundant_closure in topic_repo, needless_raw_string_hashes in a subscriber-token test. Fixed all four (commit 2ea47eb) so the gate is now green — flagging because it means prior "clippy green" claims were likely run without --all-targets (which compiles the test binaries).

  2. Inbound HMAC fails closed on decrypt error. If a stored inbound_secret can't be decrypted (e.g. PICLOUD_SECRET_KEY rotated), the receiver returns 401 — it refuses the POST rather than silently skipping verification. Intentional.

  3. No rate limiting on the public inbound-email endpoint. Like every public data-plane route, /api/v1/email-inbound/... is unauthenticated by design (URL + HMAC are the gate). An unsigned trigger (no inbound_secret) accepts any POST to its URL and enqueues outbox rows — URL secrecy is the only guard, as documented. Mitigation is operator-level (Caddy) rate limiting, the same answer as for other public routes; no new gap introduced, but noted.


11. Deferred items (unchanged from brief)

Master-key rotation / per-app master key (v1.2); native SMTP listener (v1.3+); provider-specific inbound unmarshallers, inbound attachments, outbound SMTP connection pooling, per-app from validation / SPF / DKIM (v1.2 / operator); dashboard inbound payload viewer (v1.2, PII); drop the plaintext realtime_signing_key column (v1.1.8); secrets versioning/history + secrets-change triggers (never); users::* (v1.1.8); queue::* / invoke() (v1.1.9).


12. Known limitations

  • Production EmailTransport is a per-call connection; high outbound volume is connection-churn-bound until pooling (v1.2).
  • Outbound email::send was not smoke-tested against a live relay in this environment (no SMTP configured); the SMTP message contents are asserted via recording-transport unit tests.
  • The §8 DB-gated run requires bounded parallelism on a shared Postgres (see §8); CI's dedicated Postgres does not.