Files
PiCloud/HANDBACK.md
MechaCat02 6080fc67f6 docs(v1.1.4): handback report
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 20:26:44 +02:00

14 KiB
Raw Blame History

v1.1.4 Handback — Outbound HTTP SDK & Cron Triggers

Branch: feat/v1.1.4-http-cron (off main) Commits: 1 implementation commit (feat(v1.1.4): outbound HTTP SDK + cron triggers) + this HANDBACK commit.

Note on commit granularity: the brief suggested split feat(v1.1.4-http) / feat(v1.1.4-cron) commits. The two features are interleaved across shared files (Cargo.toml, crates/picloud/src/lib.rs, crates/manager-core/src/lib.rs, version.rs, services.rs), so cleanly-compiling per-theme commits aren't separable without interactive hunk staging (unavailable in this environment). I chose one coherent, green commit over shipping broken intermediates. Squash/relabel as you see fit.


Scope coverage

# Item Status
1 http::* SDK surface (get/post/put/patch/delete/head/post_form/request) Done
2 SSRF deny-list (resolved-IP, DNS-rebinding defense, scheme/port, body caps, UA, timeouts, PICLOUD_HTTP_ALLOW_PRIVATE) Done
3 http authz (Capability::AppHttpRequestscript:write, script-as-gate, no new Scope) Done
4 HttpService trait + HttpServiceImpl + Services wiring Done
5 Cron migration 0017 (Layout-E extension) Done
6 Cron scheduler tokio task (catch-up = fire-once) Done
7 ctx.event.cron shape + TriggerEvent::Cron Done
8 Dispatcher routing extension (… | Cron) Done
9 Dashboard cron trigger UI (minimal) Done
10a Redact ModuleSourceError::Backend at resolver boundary Done
10b Pin rhai = "=1.24" Done
10c CHANGELOG retroactive v1.1.3 cross-app-trigger security note Done
11 Version bumps (workspace 1.1.4, SDK 1.5, dashboard 0.10.0) Done
12 Tests (~50-70) Done — 70 new

SSRF policy implementation notes

  • reqwest hook. crates/manager-core/src/ssrf.rs defines SsrfResolver implementing reqwest::dns::Resolve, plugged in via ClientBuilder::dns_resolver. It delegates to the system resolver (injectable for tests — see DNS-rebinding test), then filters each IpAddr through SsrfPolicy::check. Because reqwest re-resolves at every connection (including each redirect hop), the policy applies post-redirect too.
  • dns_resolver is generic over a concrete R: Resolve (stores Arc<R>), so the resolver is passed as Arc<SsrfResolver>, not Arc<dyn Resolve>.
  • Literal-IP gap closed. reqwest only routes hostnames through the custom resolver — a URL with a literal-IP host (http://127.0.0.1/) bypasses it entirely. The impl therefore also runs SsrfPolicy::check on literal-IP hosts at URL-parse time (validate_url), on every hop. Both paths are tested.
  • IPv4-mapped IPv6 re-check. check_v6 calls Ipv6Addr::to_ipv4_mapped(); if Some, it re-runs the v4 deny-list against the embedded address (::ffff:127.0.0.1 → denied as "loopback").
  • Applied before AND after redirects. Redirects are followed manually (client built with redirect(Policy::none())) so per-request follow_redirects/max_redirects are honored; each hop re-validates scheme/port + literal-IP and re-resolves hostnames through the SSRF resolver.
  • Script-visible error format. "http: blocked by SSRF policy: <reason>" where <reason> is a CIDR category (loopback, private, link-local, carrier-grade-nat, multicast, reserved, unique-local, unspecified). The resolved IP is never included. The all-addresses-denied case surfaces as Ssrf (not a generic DNS error) via a marker error the resolver emits and the impl detects by walking the reqwest error source chain.

Cron scheduler implementation notes

  • Catch-up = fire-once. Matches the brief; no deviation. next_due returns a single canonical scheduled-at (first slot after last_fired_at, or created_at if never fired); after firing, last_fired_at = now, so the next tick sees only future slots. Verified live against Postgres: an every-second (* * * * * *) trigger with a 2s tick advanced last_fired_at ~once per 2s, not once per second.
  • No ExecutionGate contention. The scheduler only enqueues to the outbox (one row per due trigger per tick, in a FOR UPDATE OF d SKIP LOCKED transaction that also bumps last_fired_at). The existing dispatcher acquires the gate and delivers it identically to kv/docs/dead_letter — verified live (the cron outbox row was consumed, the script executed, the row deleted).
  • Timezone handling. chrono-tz. Invalid IANA names are rejected at the admin endpoint with a 422 (TriggersApiError::Invalid, message contains "timezone"); the repo re-validates defensively before insert.
  • Schema beyond the brief: none. Followed the brief exactly — schedule, timezone DEFAULT 'UTC', last_fired_at, idx_cron_triggers_due. No stored next_scheduled_at column (an exploration agent suggested one; the brief computes next-fire in-process, which I followed).

Tests added (70 new)

  • SSRF policy + resolver (ssrf.rs, 20): one per deny CIDR (127/8, 0/8, 10/8, 172.16/12, 192.168/16, 169.254/16 incl. metadata, 100.64/10, 224/4, 240/4, ::1, ::, fe80::/10, fc00::/7, ff00::/8); 172.x outside-range allowed; public v4/v6 allowed; IPv4-mapped re-check; allow_private disables all; resolver returns only allowed addrs; all-denied → SSRF marker; DNS rebinding (mock resolver: public then private — second denied); empty resolution ≠ SSRF.
  • HTTP client (http_service.rs, 16): GET/POST round-trips vs a hand-rolled TcpListener; body dispatch + default UA; custom UA override; empty body; non-2xx no-error; response cap via Content-Length; response cap mid-stream (no Content-Length); request body cap pre-send; redirect-to-max-then-throw; scheme rejection (file/ftp/gopher); port rejection (22/25/465/587); SSRF literal-loopback; SSRF hostname-resolves-to-loopback; timeout; authz (anon skips / member forbidden / member-with-role allowed).
  • Bridge integration (sdk_http.rs, 15): real Rhai engine under spawn_blocking vs a recording fake — status+JSON body, non-JSON string, empty→(), Map→JSON, String→text, ()→no body, headers+timeout forwarded, unknown opt key throws, timeout>max throws, non-2xx no-throw, network error throws http:, post_form url-encoding, request arbitrary method, default-UA carries script_id, cx.app_id forwarded for attribution.
  • Cron scheduler (cron_scheduler.rs, 11): 6-field schedule accept / 5-field
    • malformed reject; IANA tz accept / reject; due/not-due; never-fired uses created_at; catch-up fires exactly once after 5 missed windows; timezone affects fire time; bad schedule/tz → None.
  • Cron admin (triggers_api.rs, 6): create succeeds; invalid schedule; unknown timezone; module target rejected (v1.1.3 regression); cross-app target rejected (v1.1.3 regression); member-without-role forbidden.
  • Module redaction (2): modules.rs — backend error redacted from the script-visible error (no leak); module_redaction_logging.rs — original error is logged at ERROR level (captured via a global tracing subscriber).

Decisions beyond the brief (every prompt-default deviation, flagged)

  1. Three-arg split verb(url, body, opts) (user-approved during planning). Diverges from the brief's documented two-arg (url, opts) shape and generalizes the escape hatch to request(method, url, body, opts). Resolves the brief's internal contradiction (its Slack example http::post(url, #{text:...}) passed a bare body map, which would be an "unknown opt key" under the two-arg rule). The opts vocabulary is now exactly {headers, timeout_ms, follow_redirects, max_redirects}body_raw was dropped (raw strings go through the positional body as a String). The Slack example works unchanged (#{text:...} is the body).
  2. Cron crate = cron (0.12), not croner. The brief allowed either; cron handles the 6-field-with-seconds format and named weekdays (MON-FRI) used in the brief's example, and integrates with chrono Schedule::after.
  3. Catch-up = fire-once — matches the brief; called out explicitly as requested. No deviation.
  4. SdkCallCx gained a script_id field. The brief's default User-Agent is picloud/<v> (script:<script_id>), but SdkCallCx didn't carry the script id. Adding it (sourced from ExecRequest.script_id in the engine) is the clean home and doubles as the audit-attribution key the brief emphasizes. All 19 construction sites updated. The dead-letters admin cx uses a fresh sentinel id (no script executes there).
  5. SSRF also blocks IPv6 unspecified :: and IPv4 0.0.0.0 with reason "unspecified". 0.0.0.0/8 is in the brief's list; :: is not explicitly but is an obvious sibling hole, so I blocked it too (defensible superset).
  6. No reqwest feature additions neededdns_resolver and Response::chunk() compile under the existing default-features = false, features = ["json","rustls-tls"]. No cookie jar (cookies feature is off, so there's no jar to disable). Added url as an executor-core dep (for form_urlencoded in post_form).

How to verify locally (§8 attestation — run on this exact HEAD)

All four gates were run on the handback HEAD (the feat(v1.1.4) commit, before this markdown commit):

cargo fmt --all -- --check                               → exit 0
cargo clippy --all-targets --all-features -- -D warnings → exit 0
cargo test --workspace                                   → 427 passed, 0 failed
(cd dashboard && npm run check)                          → 0 errors, 0 warnings (369 files)

This HANDBACK commit is pure markdown (no gate-relevant files), so the numbers above hold for the final HEAD.

Migrations — verified against a real Postgres (dev stack, port 15432):

  • Fresh-DB replay: the #[sqlx::test] schema-snapshot test applies all migrations on a fresh ephemeral DB and matches the (re-blessed) golden — passes.
  • On-top-of-prior-state: booting picloud against a dev DB pinned at migration 0006 applied 0007…0017 cleanly ("migrations applied"); _sqlx_migrations max is now 17; cron_trigger_details + widened CHECKs present.

Live smoke performed:

  • Boot logged the PICLOUD_HTTP_ALLOW_PRIVATE warning and started the cron scheduler + HTTP service without panic.
  • Seeded an every-second cron trigger → scheduler set last_fired_at, dispatcher consumed the outbox row and ran the script (row deleted on success), and last_fired_at advanced at the tick cadence (fire-once confirmed). Smoke data cleaned up afterward.
  • HTTP GET / SSRF-block / body-dispatch behaviors are covered by the automated integration tests (real TcpListener round-trips + loopback/hostname SSRF blocks) rather than a manual curl flow, since a live SSRF-block smoke conflicts with the PICLOUD_HTTP_ALLOW_PRIVATE a local-server smoke requires.

To re-run the schema snapshot:

docker compose up -d postgres
DATABASE_URL=postgres://picloud:picloud@127.0.0.1:15432/picloud \
  cargo test -p picloud-manager-core --test schema_snapshot -- --include-ignored

⚠️ Latent issue found: stale schema-snapshot golden

crates/manager-core/tests/expected_schema.txt was significantly stale — the committed golden was missing many tables from prior releases (abandoned_executions, dead_letters, dead_letter_trigger_details, docs_*, etc.). The schema_snapshot test is #[ignore] (needs a DB), so it was apparently never re-blessed across v1.1.1v1.1.3 and silently drifted.

I re-blessed it, so the diff is large (+217 lines) but **only cron_trigger_details

  • the two widened CHECK constraints are v1.1.4-new** — the rest is pre-existing drift correction. The blessed golden now matches a clean replay (verified). Recommend the reviewer skim the diff to confirm, and consider whether the #[ignore] should be lifted in CI (with a DB service) so the golden can't drift again.

Latent security findings

None new beyond the (already-known, already-closed-in-v1.1.3) cross-app trigger gap, which §10c now documents in the CHANGELOG. The SSRF surface is the main security mechanism in this release; see the SSRF notes above for the defense-in-depth layering (resolver hook + literal-IP check + per-hop re-validation + IP-never-leaked errors).

One thing for the reviewer to weigh: the SSRF policy is a hardcoded deny-list with no per-app allow-list (deferred to v1.2 per the brief). An operator who needs a script to reach a private service has only the all-or-nothing PICLOUD_HTTP_ALLOW_PRIVATE global escape hatch today.

Open questions for the reviewer

  1. Three-arg HTTP shape (decision #1) — confirm you're happy with verb(url, body, opts) + dropping body_raw, vs the brief's documented two-arg form. This is the one user-facing API-shape divergence.
  2. Stale schema golden — OK to land the full re-bless in this PR, or would you prefer the drift correction split out?

Deferred items (per the brief's OUT list — not built)

WebSocket/SSE, streaming responses, HTTP/3, per-app outbound allow/deny lists, per-app rate limits, mTLS, request signing, cookie jar, cron backfill replay, cron next-fire preview, cron schedule history, drift compensation, module-import-over-HTTP, files/pubsub/secrets/email/users/queue.

Known limitations / rough edges

  • The dashboard Triggers tab lists all trigger kinds but only creates cron triggers (kv/docs creation remains API-only, unchanged from before). No next-fire-at preview (deferred to v1.2).
  • post_form / body field order follows Rhai map iteration order (BTreeMap-backed, so sorted/deterministic; not insertion order).
  • The cron scheduler tick is floored at 1s; sub-second schedules effectively fire at the tick cadence (by design — see the fire-once policy).
  • The stale REVIEW.md at repo root is the v1.1.3 reviewer's artifact; the v1.1.4 reviewer should overwrite it.