253 lines
14 KiB
Markdown
253 lines
14 KiB
Markdown
# v1.1.4 Handback — Outbound HTTP SDK & Cron Triggers
|
||
|
||
**Branch:** `feat/v1.1.4-http-cron` (off `main`)
|
||
**Commits:** 1 implementation commit (`feat(v1.1.4): outbound HTTP SDK + cron triggers`) + this HANDBACK commit.
|
||
|
||
> **Note on commit granularity:** the brief suggested split
|
||
> `feat(v1.1.4-http)` / `feat(v1.1.4-cron)` commits. The two features are
|
||
> interleaved across shared files (`Cargo.toml`, `crates/picloud/src/lib.rs`,
|
||
> `crates/manager-core/src/lib.rs`, `version.rs`, `services.rs`), so
|
||
> cleanly-*compiling* per-theme commits aren't separable without interactive
|
||
> hunk staging (unavailable in this environment). I chose one coherent,
|
||
> green commit over shipping broken intermediates. Squash/relabel as you see fit.
|
||
|
||
---
|
||
|
||
## Scope coverage
|
||
|
||
| # | Item | Status |
|
||
|---|------|--------|
|
||
| 1 | `http::*` SDK surface (get/post/put/patch/delete/head/post_form/request) | **Done** |
|
||
| 2 | SSRF deny-list (resolved-IP, DNS-rebinding defense, scheme/port, body caps, UA, timeouts, `PICLOUD_HTTP_ALLOW_PRIVATE`) | **Done** |
|
||
| 3 | http authz (`Capability::AppHttpRequest` → `script:write`, script-as-gate, no new Scope) | **Done** |
|
||
| 4 | `HttpService` trait + `HttpServiceImpl` + Services wiring | **Done** |
|
||
| 5 | Cron migration `0017` (Layout-E extension) | **Done** |
|
||
| 6 | Cron scheduler tokio task (catch-up = fire-once) | **Done** |
|
||
| 7 | `ctx.event.cron` shape + `TriggerEvent::Cron` | **Done** |
|
||
| 8 | Dispatcher routing extension (`… \| Cron`) | **Done** |
|
||
| 9 | Dashboard cron trigger UI (minimal) | **Done** |
|
||
| 10a | Redact `ModuleSourceError::Backend` at resolver boundary | **Done** |
|
||
| 10b | Pin `rhai = "=1.24"` | **Done** |
|
||
| 10c | CHANGELOG retroactive v1.1.3 cross-app-trigger security note | **Done** |
|
||
| 11 | Version bumps (workspace 1.1.4, SDK 1.5, dashboard 0.10.0) | **Done** |
|
||
| 12 | Tests (~50-70) | **Done** — 70 new |
|
||
|
||
---
|
||
|
||
## SSRF policy implementation notes
|
||
|
||
- **reqwest hook.** `crates/manager-core/src/ssrf.rs` defines `SsrfResolver`
|
||
implementing `reqwest::dns::Resolve`, plugged in via
|
||
`ClientBuilder::dns_resolver`. It delegates to the system resolver
|
||
(injectable for tests — see DNS-rebinding test), then filters each `IpAddr`
|
||
through `SsrfPolicy::check`. Because reqwest re-resolves at every connection
|
||
(including each redirect hop), the policy applies post-redirect too.
|
||
- **`dns_resolver` is generic over a concrete `R: Resolve`** (stores `Arc<R>`),
|
||
so the resolver is passed as `Arc<SsrfResolver>`, not `Arc<dyn Resolve>`.
|
||
- **Literal-IP gap closed.** reqwest only routes *hostnames* through the custom
|
||
resolver — a URL with a literal-IP host (`http://127.0.0.1/`) bypasses it
|
||
entirely. The impl therefore *also* runs `SsrfPolicy::check` on literal-IP
|
||
hosts at URL-parse time (`validate_url`), on every hop. Both paths are tested.
|
||
- **IPv4-mapped IPv6 re-check.** `check_v6` calls `Ipv6Addr::to_ipv4_mapped()`;
|
||
if `Some`, it re-runs the v4 deny-list against the embedded address
|
||
(`::ffff:127.0.0.1` → denied as "loopback").
|
||
- **Applied before AND after redirects.** Redirects are followed *manually*
|
||
(client built with `redirect(Policy::none())`) so per-request
|
||
`follow_redirects`/`max_redirects` are honored; each hop re-validates
|
||
scheme/port + literal-IP and re-resolves hostnames through the SSRF resolver.
|
||
- **Script-visible error format.** `"http: blocked by SSRF policy: <reason>"`
|
||
where `<reason>` is a CIDR category (`loopback`, `private`, `link-local`,
|
||
`carrier-grade-nat`, `multicast`, `reserved`, `unique-local`, `unspecified`).
|
||
**The resolved IP is never included.** The all-addresses-denied case surfaces
|
||
as `Ssrf` (not a generic DNS error) via a marker error the resolver emits and
|
||
the impl detects by walking the reqwest error source chain.
|
||
|
||
## Cron scheduler implementation notes
|
||
|
||
- **Catch-up = fire-once.** Matches the brief; no deviation. `next_due` returns a
|
||
single canonical scheduled-at (first slot after `last_fired_at`, or
|
||
`created_at` if never fired); after firing, `last_fired_at = now`, so the next
|
||
tick sees only future slots. **Verified live** against Postgres: an
|
||
every-second (`* * * * * *`) trigger with a 2s tick advanced `last_fired_at`
|
||
~once per 2s, not once per second.
|
||
- **No ExecutionGate contention.** The scheduler only enqueues to the outbox
|
||
(one row per due trigger per tick, in a `FOR UPDATE OF d SKIP LOCKED`
|
||
transaction that also bumps `last_fired_at`). The existing dispatcher acquires
|
||
the gate and delivers it identically to kv/docs/dead_letter — verified live
|
||
(the cron outbox row was consumed, the script executed, the row deleted).
|
||
- **Timezone handling.** `chrono-tz`. Invalid IANA names are rejected at the
|
||
admin endpoint with a 422 (`TriggersApiError::Invalid`, message contains
|
||
"timezone"); the repo re-validates defensively before insert.
|
||
- **Schema beyond the brief:** none. Followed the brief exactly — `schedule`,
|
||
`timezone DEFAULT 'UTC'`, `last_fired_at`, `idx_cron_triggers_due`. **No**
|
||
stored `next_scheduled_at` column (an exploration agent suggested one; the
|
||
brief computes next-fire in-process, which I followed).
|
||
|
||
---
|
||
|
||
## Tests added (70 new)
|
||
|
||
- **SSRF policy + resolver (`ssrf.rs`, 20):** one per deny CIDR (127/8, 0/8,
|
||
10/8, 172.16/12, 192.168/16, 169.254/16 incl. metadata, 100.64/10, 224/4,
|
||
240/4, ::1, ::, fe80::/10, fc00::/7, ff00::/8); 172.x outside-range allowed;
|
||
public v4/v6 allowed; IPv4-mapped re-check; `allow_private` disables all;
|
||
resolver returns only allowed addrs; all-denied → SSRF marker; **DNS rebinding**
|
||
(mock resolver: public then private — second denied); empty resolution ≠ SSRF.
|
||
- **HTTP client (`http_service.rs`, 16):** GET/POST round-trips vs a hand-rolled
|
||
`TcpListener`; body dispatch + default UA; custom UA override; empty body;
|
||
non-2xx no-error; response cap via Content-Length; response cap mid-stream
|
||
(no Content-Length); request body cap pre-send; redirect-to-max-then-throw;
|
||
scheme rejection (file/ftp/gopher); port rejection (22/25/465/587); SSRF
|
||
literal-loopback; SSRF hostname-resolves-to-loopback; timeout; authz
|
||
(anon skips / member forbidden / member-with-role allowed).
|
||
- **Bridge integration (`sdk_http.rs`, 15):** real Rhai engine under
|
||
`spawn_blocking` vs a recording fake — status+JSON body, non-JSON string,
|
||
empty→`()`, Map→JSON, String→text, `()`→no body, headers+timeout forwarded,
|
||
unknown opt key throws, timeout>max throws, non-2xx no-throw, network error
|
||
throws `http:`, `post_form` url-encoding, `request` arbitrary method,
|
||
default-UA carries `script_id`, `cx.app_id` forwarded for attribution.
|
||
- **Cron scheduler (`cron_scheduler.rs`, 11):** 6-field schedule accept / 5-field
|
||
+ malformed reject; IANA tz accept / reject; due/not-due; never-fired uses
|
||
created_at; **catch-up fires exactly once after 5 missed windows**; timezone
|
||
affects fire time; bad schedule/tz → None.
|
||
- **Cron admin (`triggers_api.rs`, 6):** create succeeds; invalid schedule;
|
||
unknown timezone; **module target rejected** (v1.1.3 regression); **cross-app
|
||
target rejected** (v1.1.3 regression); member-without-role forbidden.
|
||
- **Module redaction (2):** `modules.rs` — backend error redacted from the
|
||
script-visible error (no leak); `module_redaction_logging.rs` — original error
|
||
**is** logged at ERROR level (captured via a global tracing subscriber).
|
||
|
||
---
|
||
|
||
## Decisions beyond the brief (every prompt-default deviation, flagged)
|
||
|
||
1. **Three-arg split `verb(url, body, opts)`** (user-approved during planning).
|
||
Diverges from the brief's documented two-arg `(url, opts)` shape and
|
||
generalizes the escape hatch to `request(method, url, body, opts)`. Resolves
|
||
the brief's internal contradiction (its Slack example `http::post(url, #{text:...})`
|
||
passed a bare body map, which would be an "unknown opt key" under the
|
||
two-arg rule). The `opts` vocabulary is now exactly
|
||
`{headers, timeout_ms, follow_redirects, max_redirects}` — **`body_raw` was
|
||
dropped** (raw strings go through the positional body as a String). The
|
||
Slack example works unchanged (`#{text:...}` is the body).
|
||
2. **Cron crate = `cron` (0.12), not `croner`.** The brief allowed either; `cron`
|
||
handles the 6-field-with-seconds format and named weekdays (`MON-FRI`) used in
|
||
the brief's example, and integrates with chrono `Schedule::after`.
|
||
3. **Catch-up = fire-once** — matches the brief; called out explicitly as
|
||
requested. No deviation.
|
||
4. **`SdkCallCx` gained a `script_id` field.** The brief's default User-Agent is
|
||
`picloud/<v> (script:<script_id>)`, but `SdkCallCx` didn't carry the script
|
||
id. Adding it (sourced from `ExecRequest.script_id` in the engine) is the
|
||
clean home and doubles as the audit-attribution key the brief emphasizes. All
|
||
19 construction sites updated. The dead-letters admin cx uses a fresh sentinel
|
||
id (no script executes there).
|
||
5. **SSRF also blocks IPv6 unspecified `::` and IPv4 `0.0.0.0`** with reason
|
||
"unspecified". `0.0.0.0/8` is in the brief's list; `::` is not explicitly but
|
||
is an obvious sibling hole, so I blocked it too (defensible superset).
|
||
6. **No reqwest feature additions needed** — `dns_resolver` and `Response::chunk()`
|
||
compile under the existing `default-features = false, features = ["json","rustls-tls"]`.
|
||
No cookie jar (cookies feature is off, so there's no jar to disable). Added
|
||
`url` as an executor-core dep (for `form_urlencoded` in `post_form`).
|
||
|
||
---
|
||
|
||
## How to verify locally (§8 attestation — run on this exact HEAD)
|
||
|
||
All four gates were run on the handback HEAD (the `feat(v1.1.4)` commit, before
|
||
this markdown commit):
|
||
|
||
```
|
||
cargo fmt --all -- --check → exit 0
|
||
cargo clippy --all-targets --all-features -- -D warnings → exit 0
|
||
cargo test --workspace → 427 passed, 0 failed
|
||
(cd dashboard && npm run check) → 0 errors, 0 warnings (369 files)
|
||
```
|
||
|
||
This HANDBACK commit is pure markdown (no gate-relevant files), so the numbers
|
||
above hold for the final HEAD.
|
||
|
||
**Migrations — verified against a real Postgres (dev stack, port 15432):**
|
||
- Fresh-DB replay: the `#[sqlx::test]` schema-snapshot test applies all
|
||
migrations on a fresh ephemeral DB and matches the (re-blessed) golden — passes.
|
||
- On-top-of-prior-state: booting `picloud` against a dev DB pinned at migration
|
||
`0006` applied `0007…0017` cleanly (`"migrations applied"`); `_sqlx_migrations`
|
||
max is now `17`; `cron_trigger_details` + widened CHECKs present.
|
||
|
||
**Live smoke performed:**
|
||
- Boot logged the `PICLOUD_HTTP_ALLOW_PRIVATE` warning and started the cron
|
||
scheduler + HTTP service without panic.
|
||
- Seeded an every-second cron trigger → scheduler set `last_fired_at`, dispatcher
|
||
consumed the outbox row and ran the script (row deleted on success), and
|
||
`last_fired_at` advanced at the tick cadence (fire-once confirmed). Smoke data
|
||
cleaned up afterward.
|
||
- HTTP GET / SSRF-block / body-dispatch behaviors are covered by the automated
|
||
integration tests (real `TcpListener` round-trips + loopback/hostname SSRF
|
||
blocks) rather than a manual curl flow, since a live SSRF-block smoke
|
||
conflicts with the `PICLOUD_HTTP_ALLOW_PRIVATE` a local-server smoke requires.
|
||
|
||
To re-run the schema snapshot:
|
||
```
|
||
docker compose up -d postgres
|
||
DATABASE_URL=postgres://picloud:picloud@127.0.0.1:15432/picloud \
|
||
cargo test -p picloud-manager-core --test schema_snapshot -- --include-ignored
|
||
```
|
||
|
||
---
|
||
|
||
## ⚠️ Latent issue found: stale schema-snapshot golden
|
||
|
||
`crates/manager-core/tests/expected_schema.txt` was **significantly stale** — the
|
||
committed golden was missing many tables from prior releases
|
||
(`abandoned_executions`, `dead_letters`, `dead_letter_trigger_details`,
|
||
`docs_*`, etc.). The `schema_snapshot` test is `#[ignore]` (needs a DB), so it
|
||
was apparently never re-blessed across v1.1.1–v1.1.3 and silently drifted.
|
||
|
||
I re-blessed it, so the diff is large (+217 lines) but **only `cron_trigger_details`
|
||
+ the two widened CHECK constraints are v1.1.4-new** — the rest is pre-existing
|
||
drift correction. The blessed golden now matches a clean replay (verified).
|
||
Recommend the reviewer skim the diff to confirm, and consider whether the
|
||
`#[ignore]` should be lifted in CI (with a DB service) so the golden can't drift
|
||
again.
|
||
|
||
---
|
||
|
||
## Latent security findings
|
||
|
||
None new beyond the (already-known, already-closed-in-v1.1.3) cross-app trigger
|
||
gap, which §10c now documents in the CHANGELOG. The SSRF surface is the main
|
||
security mechanism in this release; see the SSRF notes above for the
|
||
defense-in-depth layering (resolver hook + literal-IP check + per-hop
|
||
re-validation + IP-never-leaked errors).
|
||
|
||
One thing for the reviewer to weigh: the SSRF policy is a hardcoded deny-list
|
||
with no per-app allow-list (deferred to v1.2 per the brief). An operator who
|
||
needs a script to reach a private service has only the all-or-nothing
|
||
`PICLOUD_HTTP_ALLOW_PRIVATE` global escape hatch today.
|
||
|
||
## Open questions for the reviewer
|
||
|
||
1. **Three-arg HTTP shape** (decision #1) — confirm you're happy with
|
||
`verb(url, body, opts)` + dropping `body_raw`, vs the brief's documented
|
||
two-arg form. This is the one user-facing API-shape divergence.
|
||
2. **Stale schema golden** — OK to land the full re-bless in this PR, or would
|
||
you prefer the drift correction split out?
|
||
|
||
## Deferred items (per the brief's OUT list — not built)
|
||
|
||
WebSocket/SSE, streaming responses, HTTP/3, per-app outbound allow/deny lists,
|
||
per-app rate limits, mTLS, request signing, cookie jar, cron backfill replay,
|
||
cron next-fire preview, cron schedule history, drift compensation,
|
||
module-import-over-HTTP, files/pubsub/secrets/email/users/queue.
|
||
|
||
## Known limitations / rough edges
|
||
|
||
- The dashboard Triggers tab lists all trigger kinds but only *creates* cron
|
||
triggers (kv/docs creation remains API-only, unchanged from before). No
|
||
next-fire-at preview (deferred to v1.2).
|
||
- `post_form` / body field order follows Rhai map iteration order
|
||
(`BTreeMap`-backed, so sorted/deterministic; not insertion order).
|
||
- The cron scheduler tick is floored at 1s; sub-second schedules effectively
|
||
fire at the tick cadence (by design — see the fire-once policy).
|
||
- The stale REVIEW.md at repo root is the v1.1.3 reviewer's artifact; the
|
||
v1.1.4 reviewer should overwrite it.
|