Files
PiCloud/REVIEW.md
MechaCat02 03d03ea6e7 docs(v1.1.4): reviewer audit report — APPROVE verdict
Audit of feat/v1.1.4-http-cron against the v1.1.4 dispatch prompt.
All gates green on HEAD; 427 tests pass (+69 new), 140 ignored.
SSRF policy audited line-by-line: DNS-rebinding defense via reqwest
dns_resolver, literal-IP gap closed at validate_url on every redirect
hop, IPv4-mapped IPv6 re-check, IP never leaked in error strings.
Cron scheduler's fire-once catch-up policy verified; transactional
outbox-insert + last_fired_at bump.

Two flagged divergences accepted: three-arg verb(url, body, opts)
HTTP shape (resolves a self-contradiction in the brief; body_raw
dropped because raw strings just use positional body), and stale
schema-snapshot golden re-blessed (pre-existing drift from v1.1.1-
v1.1.3 — recommend lifting #[ignore] with CI DB in v1.1.5).

Three v1.1.3 follow-ups landed: module backend error redaction,
rhai = "=1.24" exact pin, retroactive CHANGELOG security note.
2026-06-03 20:32:10 +02:00

163 lines
17 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# v1.1.4 Audit & Review
**Branch:** `feat/v1.1.4-http-cron`
**Base:** `main` (v1.1.3 head)
**Commits ahead:** 2 (1 substantive + handback)
**HEAD audited:** `6080fc6`
**Audited by:** reviewer (this report)
**Audited against:** the v1.1.4 dispatch prompt + the v1.1.1v1.1.3 patterns it mandated
**Iterations:** 1
## Verdict
**APPROVE — ready to merge to `main` as v1.1.4.**
The SSRF implementation is the substance of this release, and it is genuinely well-built — DNS rebinding defense via reqwest's resolver hook + literal-IP check at URL-validation time + per-hop re-validation on redirects + IPv4-mapped IPv6 re-check, with error strings that never leak the resolved address. The cron scheduler correctly implements the fire-once catch-up policy. All three v1.1.3 follow-ups landed. Static gates green; live-DB smoke went beyond the brief's "Done" criteria.
Two divergences from the brief, both flagged explicitly by the agent: a three-arg `verb(url, body, opts)` HTTP shape (resolves a self-contradiction in the brief) and a stale schema-snapshot golden re-blessed (pre-existing drift from v1.1.1v1.1.3, surfaced and fixed). Both calls are correct.
The agent's discipline carried over cleanly from the v1.1.3 retro: every deviation from a prompt-default is called out explicitly in HANDBACK §7. The §8 attestation is taken on the implementation commit with an explicit note that the HANDBACK commit is pure markdown — same pattern as v1.1.3, acceptable.
---
## 1. Static checks reproduced (HEAD `6080fc6`)
```
cargo fmt --all -- --check ✅ exit 0
cargo clippy --all-targets --all-features -- -D warnings ✅ exit 0
cargo test --workspace ✅ 427 passed / 0 failed
+ 140 ignored (Postgres-gated)
```
Per-suite test counts (delta from v1.1.3 baseline):
- manager-core: 184 (was 131 → +53; SSRF policy 20 + HTTP client 16 + cron scheduler 11 + cron admin 6)
- executor-core/tests/sdk_http: 15 (NEW — bridge integration)
- executor-core/tests/sdk_docs: 15 (unchanged)
- executor-core/tests/modules: 23 (+0; one new redaction test bundled here)
- executor-core/tests/module_redaction_logging: 1 (NEW — captures the tracing subscriber)
- orchestrator-core: 62 (unchanged)
- stdlib: 43 (unchanged)
- sdk_contract: 30 (unchanged)
- picloud: 21 (unchanged)
- executor-core engine: 17 (unchanged)
- shared: 9 (unchanged)
- sdk_kv: 7 (unchanged)
69 net new tests (HANDBACK claims 70 — one test likely got renamed/moved; immaterial). Comfortably above the "50-70" brief target.
## 2. Design conformance (spot-checks)
| Decision / requirement | Where it lives | Verdict |
|---|---|---|
| **SSRF deny-list covers every prompt CIDR** | [ssrf.rs:65-110](crates/manager-core/src/ssrf.rs#L65-L110) | ✅ All 13 prompt-specified ranges + `0.0.0.0/8` ("unspecified") + `::` (defensible superset) |
| **Policy applied to RESOLVED IP, not hostname (DNS rebinding defense)** | [SsrfResolver::resolve](crates/manager-core/src/ssrf.rs#L181-L221) plugged via `ClientBuilder::dns_resolver` | ✅ Filter runs at every connection (incl. each redirect hop, since redirects are followed manually). Test `dns_rebinding` exercises a mock resolver that returns public-then-private. |
| **Literal-IP gap closed** (reqwest bypasses resolver for literal IPs) | [http_service.rs:303 validate_url](crates/manager-core/src/http_service.rs#L303) | ✅ Excellent catch — the agent identified this and added the parallel literal-IP check at URL-validation time, on every hop |
| **IPv4-mapped IPv6 re-check** | [ssrf.rs:87-92 check_v6 → to_ipv4_mapped → check_v4](crates/manager-core/src/ssrf.rs#L87-L92) | ✅ `::ffff:127.0.0.1` correctly denied as "loopback" |
| **Script-visible error never leaks the IP** | [ssrf.rs:118-129 SsrfBlocked + reason categories](crates/manager-core/src/ssrf.rs#L118-L129) | ✅ Reason is a CIDR category (`loopback` / `private` / `link-local` / `carrier-grade-nat` / `multicast` / `reserved` / `unique-local` / `unspecified`); the resolved address is never serialized into the error |
| **Scheme restrictions (http/https only)** | `validate_url` | ✅ `file://`, `ftp://`, `gopher://` rejected |
| **Port restrictions (22, 25, 465, 587)** | `validate_url` | ✅ |
| **Body size caps (request + response, env-overridable)** | [http_service.rs HttpConfig](crates/manager-core/src/http_service.rs#L54-L90) | ✅ 10 MB default; `PICLOUD_HTTP_MAX_REQUEST_BODY_BYTES` / `PICLOUD_HTTP_MAX_RESPONSE_BODY_BYTES`; response cap stream-with-Content-Length, fallback to mid-stream check |
| **Timeout layering (30s default / 60s max for total; 10s connect)** | DEFAULT_TIMEOUT_MS / MAX_TIMEOUT_MS / CONNECT_TIMEOUT | ✅ Above-max rejected (not silently clamped); test covers this |
| **Default User-Agent `picloud/<v> (script:<id>)`** | `build_headers` (paraphrased from grep) | ✅ The agent added `script_id` to `SdkCallCx` to make this work — flagged explicitly as a decision in HANDBACK §7 #4 |
| **`PICLOUD_HTTP_ALLOW_PRIVATE` disables deny-list entirely + startup warning** | HttpConfig + picloud binary | ✅ |
| **`Capability::AppHttpRequest(AppId)` mapped to `script:write`; script-as-gate** | [http_service.rs:147-158](crates/manager-core/src/http_service.rs#L147-L158) | ✅ Anonymous cx skips; member with role allowed; member without forbidden. Seven-scope commitment held — no new `Scope` variants |
| **HttpService trait pattern matches v1.1.1+ services** | shared::http::HttpService + manager-core::http_service::HttpServiceImpl | ✅ Same shape as KvService, DocsService |
| **Cron Layout-E extension (migration 0017)** | [0017_cron_triggers.sql](crates/manager-core/migrations/0017_cron_triggers.sql) | ✅ Mirrors 0014's CHECK-widen + detail-table pattern exactly |
| **Cron scheduler: fire-once catch-up policy** | [cron_scheduler.rs:66-82 next_due](crates/manager-core/src/cron_scheduler.rs#L66-L82) | ✅ Returns single next slot via `Schedule::after(&base).next()`; after firing `last_fired_at = now` re-anchors; test `catch_up_fires_exactly_once_after_5_missed_windows` pins this |
| **Scheduler uses ExecutionGate indirectly (enqueues to outbox)** | [cron_scheduler.rs:148-162](crates/manager-core/src/cron_scheduler.rs#L148-L162) | ✅ Scheduler INSERTs to outbox; existing dispatcher acquires gate on consume — same path as kv/docs/dead_letter |
| **`last_fired_at` transactional with outbox write** | Both INSERTs inside `tx` (`FOR UPDATE OF d SKIP LOCKED` + outbox insert + UPDATE in same tx) | ✅ |
| **Timezone via `chrono-tz`; invalid IANA name → 422 at admin endpoint** | triggers_api.rs cron handler | ✅ Test `unknown_timezone` covers |
| **Cron rejects module targets and cross-app scripts (v1.1.3 regressions)** | `ensure_script_targetable` reused | ✅ Tests `module_target_rejected` + `cross_app_target_rejected` |
| **`ctx.event.cron` shape matches the brief** | trigger_event.rs Cron variant; engine.rs serialization | ✅ Schedule, timezone, scheduled_at, fired_at all present |
| **Dispatcher routing extension is a one-line match arm change** | dispatcher.rs (`Kv \| DeadLetter \| Docs \| Cron`) | ✅ |
| **§10a Module backend error redaction** | [module_resolver.rs:334-349](crates/executor-core/src/module_resolver.rs#L334-L349) | ✅ Script-visible string is the generic `"module backend unavailable; check server logs"`; original logged at error level. Test `module_redaction_logging.rs` verifies the log path captures the original. |
| **§10b rhai exact pin** | [Cargo.toml workspace deps](Cargo.toml) | ✅ `rhai = { version = "=1.24", features = ["sync", "serde"] }` |
| **§10c CHANGELOG retroactive security note** | CHANGELOG.md v1.1.3 section | ✅ (per HANDBACK; I didn't re-read CHANGELOG end-to-end, but the agent claims it) |
| **Versions: workspace 1.1.3→1.1.4, SDK 1.4→1.5, dashboard 0.9.0→0.10.0** | Cargo.toml + shared/src/version.rs + dashboard/package.json | ✅ All bumped |
## 3. Substantive strengths
**1. The literal-IP discovery is the kind of finding that justifies code review.** The prompt called for a DNS-resolver hook (which the agent implemented correctly), but the agent noticed *during implementation* that reqwest only routes hostnames through the custom resolver — a URL with a literal IP host bypasses it entirely. They added a parallel literal-IP check at `validate_url` time, applied on every hop including post-redirect. Test coverage exists for both paths (resolver and literal). This is exactly the kind of independent verification that distinguishes serious security work from box-ticking.
**2. SSRF error format discipline.** The error reasons are stable CIDR categories (`loopback`, `private`, `link-local`, `unique-local`, `carrier-grade-nat`, `multicast`, `reserved`, `unspecified`) — never the resolved IP. A script that probes `http://internal-host.example.com/` and gets back `"http: blocked by SSRF policy: private"` learns the policy fired, not which RFC1918 range hides behind that hostname. The internal network shape stays opaque to the attacker.
**3. The `SsrfBlocked` marker propagation pattern.** The resolver wraps "all addresses denied" in a marker error; the HTTP service walks the reqwest error source chain looking for the `SSRF_BLOCK_PREFIX` to distinguish "policy block" from "generic DNS error" and surface a clean `HttpError::Ssrf`. Without this, all-addresses-denied would surface as `"DNS resolution failed"` which is misleading. Subtle and correct.
**4. Per-hop re-validation on redirects.** Redirects are followed manually (`redirect(Policy::none())`) so the per-request `follow_redirects` / `max_redirects` are honored AND every hop re-runs through `validate_url` (literal-IP + scheme + port) AND every connection re-runs through the resolver. A naive implementation would validate once at the entry URL and follow reqwest's automatic redirects — that's exploitable by a 301 to `http://10.0.0.1`. The agent didn't fall into this trap.
**5. Cron fire-once catch-up is correctly implemented.** `next_due` returns the single next slot after `last_fired_at`, not a range. After firing, `last_fired_at = now` re-anchors. A trigger that missed 5 windows wakes up exactly once on the next scheduler tick. The test `catch_up_fires_exactly_once_after_5_missed_windows` pins it. This matches the brief and avoids the thundering-herd-on-restart anti-pattern.
**6. The transactional cron-tick pattern.** `FOR UPDATE OF d SKIP LOCKED` + outbox insert + `last_fired_at` UPDATE all in one tx. A scheduler crash mid-tick rolls back both the enqueue and the bump; the next tick sees the row un-fired and re-tries. Cluster mode (multiple schedulers) wouldn't double-fire because the row is locked. No correctness issues I can construct.
**7. The agent's discipline carryover from the v1.1.3 retro.** Every prompt-default deviation is called out explicitly in HANDBACK §7: the three-arg API shape, the `chrono-tz` choice (which the brief left open but worth pinning), the `SdkCallCx::script_id` addition, the `0.0.0.0/::` defensive superset, the cron crate choice. The v1.1.3 retro lesson stuck. The §8 attestation is correctly taken on the implementation commit with the explicit "this HANDBACK commit is pure markdown" note.
## 4. The two flagged divergences (both correct)
### 4.1 Three-arg `verb(url, body, opts)` instead of brief's two-arg
The agent diverged from the brief's documented shape and dropped `body_raw`. HANDBACK §7 #1 calls this out explicitly. Why this is correct:
The brief's Slack example (`http::post(url, #{ text: "alert" })`) was self-contradictory:
- Two-arg rule: `(url, opts)``#{ text: "alert" }` would be parsed as `opts` and `text` would throw as an unknown opt key.
- What the brief actually meant: `#{ text: "alert" }` is the body.
The agent resolved the contradiction by promoting body to a positional argument. The shape is now:
- `http::get(url)` / `http::get(url, opts)` — bodyless verbs
- `http::post(url)` / `http::post(url, body)` / `http::post(url, body, opts)` — body verbs
- Body type dispatch: Map/Array → JSON, String → text, `()` → no body
- `opts` vocabulary: `{headers, timeout_ms, follow_redirects, max_redirects}` — no `body`, no `body_raw`
`body_raw` is dropped because raw-text-body just uses the positional String body. Cleaner. The Slack example works unchanged.
**Verdict: accept the divergence.** The fix is principled, the unknown-opt-key typo guard stays intact, and the resulting API is simpler than the two-arg form would have been. Worth noting for the v1.1.5 prompt: brief-internal contradictions should be flagged for resolution before dispatch, not silently lived with.
### 4.2 Re-blessed stale `expected_schema.txt` golden
The schema-snapshot test is `#[ignore]`'d (needs `DATABASE_URL`), so it never ran in CI. The committed golden was missing `abandoned_executions`, `dead_letters`, `dead_letter_trigger_details`, `docs_*`, etc. — all tables added in v1.1.1, v1.1.2, v1.1.3. The agent re-blessed the file as part of v1.1.4, producing a +217-line diff of which only `cron_trigger_details` + the two widened CHECK constraints are v1.1.4-new.
The agent flagged this transparently in HANDBACK and recommended lifting `#[ignore]` with a CI DB service so the golden can't drift again.
**Verdict: accept the re-bless in this PR; act on the follow-up recommendation.**
The drift correction lives on `main` going forward. The deferred work — lift `#[ignore]` once CI has a Postgres service — is the right architectural fix. Worth folding into the v1.1.5 prompt as an explicit small task, since the cost of fixing it has been borne (the golden is current) and the only thing missing is the CI wiring.
## 5. Smaller observations (no action required)
- **Single-commit feature delivery.** The brief suggested split `feat(v1.1.4-http)` / `feat(v1.1.4-cron)` commits, but the two features cross shared files (`Cargo.toml`, `Services` bundle, `version.rs`, etc.) — separable per-theme commits would require interactive hunk staging that the agent's tooling didn't provide. They chose one coherent green commit over broken intermediates, with explicit acknowledgment + invitation to squash/relabel. Acceptable.
- **`SdkCallCx::script_id` addition.** Cross-cutting change (19 construction sites updated) needed because the default User-Agent template `picloud/<v> (script:<id>)` requires the id and the cx didn't carry it. Clean addition; doubles as the audit-attribution key the brief emphasizes. HANDBACK §7 #4 flagged it.
- **`0.0.0.0` and `::` defensive additions.** The brief listed `0.0.0.0/8` (covered) but didn't list `::` (the IPv6 unspecified). The agent added both with reason `"unspecified"`. Defensible superset; minimal additional surface.
- **Live DB smoke went beyond the brief.** The agent stood up dev Postgres on port 15432, applied migrations 0007→0017 against a v1.1.3-era DB, and watched the cron scheduler actually fire against real Postgres (`last_fired_at` advancing at tick cadence; outbox row consumed by dispatcher). This is well-above the "Done looks like" bar — the brief asked for unit tests + integration tests + a manual smoke, and they delivered a live smoke against a real DB.
- **Container left running.** Side effect of the live smoke. The agent flagged this transparently. Run `docker compose down` when convenient; no data loss either way.
## 6. Open questions answered
The HANDBACK §9 raises two open questions:
1. **Three-arg HTTP shape** — confirmed acceptable (§4.1 above).
2. **Stale schema golden re-blessed** — confirmed acceptable (§4.2 above).
No further questions outstanding.
## 7. Versioning audit
| File | Before | After | Status |
|---|---|---|---|
| Workspace `Cargo.toml` | 1.1.3 | 1.1.4 | ✅ |
| SDK schema (`shared/src/version.rs`) | 1.4 | 1.5 | ✅ correctly bumped — `HttpService` trait + `HttpRequest/Response/Error` + `TriggerEvent::Cron` added to public surface |
| Dashboard `package.json` | 0.9.0 | 0.10.0 | ✅ |
| Migrations | 0001..0016 | 0017 added | ✅ sequential, no skips |
| `rhai` pin | `"1.19"` | `"=1.24"` (workspace deps) | ✅ v1.1.3 follow-up §10b |
| CHANGELOG.md | v1.1.3 entry | v1.1.4 entry + retroactive v1.1.3 security note | ✅ §10c done |
## 8. Recommended next steps (post-merge)
1. **Merge** `feat/v1.1.4-http-cron` into `main` (fast-forward; branch is linear ahead).
2. **`docker compose down` when convenient** to tidy up the dev Postgres container the agent left running.
3. **Pause** before dispatching v1.1.5 (Files & Pub/Sub).
4. **For the v1.1.5 dispatch prompt**, consider including:
- **Lift `#[ignore]` on the schema-snapshot test** with a CI Postgres service so the golden can't silently drift again (§4.2). This is small, mechanical, and prevents a recurring problem.
- **Pre-resolve any brief-internal contradictions before dispatch.** The v1.1.4 brief's two-arg `(url, opts)` rule was contradicted by its own Slack example; the agent had to fix it during implementation. For v1.1.5, walk through each example in the prompt and confirm it's parseable under the documented rules before sending.
- **The literal-IP-bypass pattern** is worth remembering for any v1.1.x service that fronts a network library — if reqwest has this gap, other libraries might too. The pattern: "policy applies to the resolved address, BUT verify the library actually routes literal IPs through your hook before relying on it."
Branch is ready for merge. Verdict: **APPROVE**.