docs(v1.1.4): handback report
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
575
HANDBACK.md
575
HANDBACK.md
@@ -1,351 +1,252 @@
|
|||||||
# v1.1.3 — Modules — Handback
|
# v1.1.4 Handback — Outbound HTTP SDK & Cron Triggers
|
||||||
|
|
||||||
## 1. Branch summary
|
**Branch:** `feat/v1.1.4-http-cron` (off `main`)
|
||||||
|
**Commits:** 1 implementation commit (`feat(v1.1.4): outbound HTTP SDK + cron triggers`) + this HANDBACK commit.
|
||||||
|
|
||||||
- **Branch:** `feat/v1.1.3-modules`
|
> **Note on commit granularity:** the brief suggested split
|
||||||
- **Commits ahead of `main`:** 6
|
> `feat(v1.1.4-http)` / `feat(v1.1.4-cron)` commits. The two features are
|
||||||
- **HEAD:** `3dbead4`
|
> interleaved across shared files (`Cargo.toml`, `crates/picloud/src/lib.rs`,
|
||||||
- **Not pushed, not merged, no PR opened** (per brief).
|
> `crates/manager-core/src/lib.rs`, `version.rs`, `services.rs`), so
|
||||||
|
> cleanly-*compiling* per-theme commits aren't separable without interactive
|
||||||
|
> hunk staging (unavailable in this environment). I chose one coherent,
|
||||||
|
> green commit over shipping broken intermediates. Squash/relabel as you see fit.
|
||||||
|
|
||||||
Commits (newest first):
|
---
|
||||||
|
|
||||||
|
## Scope coverage
|
||||||
|
|
||||||
|
| # | Item | Status |
|
||||||
|
|---|------|--------|
|
||||||
|
| 1 | `http::*` SDK surface (get/post/put/patch/delete/head/post_form/request) | **Done** |
|
||||||
|
| 2 | SSRF deny-list (resolved-IP, DNS-rebinding defense, scheme/port, body caps, UA, timeouts, `PICLOUD_HTTP_ALLOW_PRIVATE`) | **Done** |
|
||||||
|
| 3 | http authz (`Capability::AppHttpRequest` → `script:write`, script-as-gate, no new Scope) | **Done** |
|
||||||
|
| 4 | `HttpService` trait + `HttpServiceImpl` + Services wiring | **Done** |
|
||||||
|
| 5 | Cron migration `0017` (Layout-E extension) | **Done** |
|
||||||
|
| 6 | Cron scheduler tokio task (catch-up = fire-once) | **Done** |
|
||||||
|
| 7 | `ctx.event.cron` shape + `TriggerEvent::Cron` | **Done** |
|
||||||
|
| 8 | Dispatcher routing extension (`… \| Cron`) | **Done** |
|
||||||
|
| 9 | Dashboard cron trigger UI (minimal) | **Done** |
|
||||||
|
| 10a | Redact `ModuleSourceError::Backend` at resolver boundary | **Done** |
|
||||||
|
| 10b | Pin `rhai = "=1.24"` | **Done** |
|
||||||
|
| 10c | CHANGELOG retroactive v1.1.3 cross-app-trigger security note | **Done** |
|
||||||
|
| 11 | Version bumps (workspace 1.1.4, SDK 1.5, dashboard 0.10.0) | **Done** |
|
||||||
|
| 12 | Tests (~50-70) | **Done** — 70 new |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## SSRF policy implementation notes
|
||||||
|
|
||||||
|
- **reqwest hook.** `crates/manager-core/src/ssrf.rs` defines `SsrfResolver`
|
||||||
|
implementing `reqwest::dns::Resolve`, plugged in via
|
||||||
|
`ClientBuilder::dns_resolver`. It delegates to the system resolver
|
||||||
|
(injectable for tests — see DNS-rebinding test), then filters each `IpAddr`
|
||||||
|
through `SsrfPolicy::check`. Because reqwest re-resolves at every connection
|
||||||
|
(including each redirect hop), the policy applies post-redirect too.
|
||||||
|
- **`dns_resolver` is generic over a concrete `R: Resolve`** (stores `Arc<R>`),
|
||||||
|
so the resolver is passed as `Arc<SsrfResolver>`, not `Arc<dyn Resolve>`.
|
||||||
|
- **Literal-IP gap closed.** reqwest only routes *hostnames* through the custom
|
||||||
|
resolver — a URL with a literal-IP host (`http://127.0.0.1/`) bypasses it
|
||||||
|
entirely. The impl therefore *also* runs `SsrfPolicy::check` on literal-IP
|
||||||
|
hosts at URL-parse time (`validate_url`), on every hop. Both paths are tested.
|
||||||
|
- **IPv4-mapped IPv6 re-check.** `check_v6` calls `Ipv6Addr::to_ipv4_mapped()`;
|
||||||
|
if `Some`, it re-runs the v4 deny-list against the embedded address
|
||||||
|
(`::ffff:127.0.0.1` → denied as "loopback").
|
||||||
|
- **Applied before AND after redirects.** Redirects are followed *manually*
|
||||||
|
(client built with `redirect(Policy::none())`) so per-request
|
||||||
|
`follow_redirects`/`max_redirects` are honored; each hop re-validates
|
||||||
|
scheme/port + literal-IP and re-resolves hostnames through the SSRF resolver.
|
||||||
|
- **Script-visible error format.** `"http: blocked by SSRF policy: <reason>"`
|
||||||
|
where `<reason>` is a CIDR category (`loopback`, `private`, `link-local`,
|
||||||
|
`carrier-grade-nat`, `multicast`, `reserved`, `unique-local`, `unspecified`).
|
||||||
|
**The resolved IP is never included.** The all-addresses-denied case surfaces
|
||||||
|
as `Ssrf` (not a generic DNS error) via a marker error the resolver emits and
|
||||||
|
the impl detects by walking the reqwest error source chain.
|
||||||
|
|
||||||
|
## Cron scheduler implementation notes
|
||||||
|
|
||||||
|
- **Catch-up = fire-once.** Matches the brief; no deviation. `next_due` returns a
|
||||||
|
single canonical scheduled-at (first slot after `last_fired_at`, or
|
||||||
|
`created_at` if never fired); after firing, `last_fired_at = now`, so the next
|
||||||
|
tick sees only future slots. **Verified live** against Postgres: an
|
||||||
|
every-second (`* * * * * *`) trigger with a 2s tick advanced `last_fired_at`
|
||||||
|
~once per 2s, not once per second.
|
||||||
|
- **No ExecutionGate contention.** The scheduler only enqueues to the outbox
|
||||||
|
(one row per due trigger per tick, in a `FOR UPDATE OF d SKIP LOCKED`
|
||||||
|
transaction that also bumps `last_fired_at`). The existing dispatcher acquires
|
||||||
|
the gate and delivers it identically to kv/docs/dead_letter — verified live
|
||||||
|
(the cron outbox row was consumed, the script executed, the row deleted).
|
||||||
|
- **Timezone handling.** `chrono-tz`. Invalid IANA names are rejected at the
|
||||||
|
admin endpoint with a 422 (`TriggersApiError::Invalid`, message contains
|
||||||
|
"timezone"); the repo re-validates defensively before insert.
|
||||||
|
- **Schema beyond the brief:** none. Followed the brief exactly — `schedule`,
|
||||||
|
`timezone DEFAULT 'UTC'`, `last_fired_at`, `idx_cron_triggers_due`. **No**
|
||||||
|
stored `next_scheduled_at` column (an exploration agent suggested one; the
|
||||||
|
brief computes next-fire in-process, which I followed).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Tests added (70 new)
|
||||||
|
|
||||||
|
- **SSRF policy + resolver (`ssrf.rs`, 20):** one per deny CIDR (127/8, 0/8,
|
||||||
|
10/8, 172.16/12, 192.168/16, 169.254/16 incl. metadata, 100.64/10, 224/4,
|
||||||
|
240/4, ::1, ::, fe80::/10, fc00::/7, ff00::/8); 172.x outside-range allowed;
|
||||||
|
public v4/v6 allowed; IPv4-mapped re-check; `allow_private` disables all;
|
||||||
|
resolver returns only allowed addrs; all-denied → SSRF marker; **DNS rebinding**
|
||||||
|
(mock resolver: public then private — second denied); empty resolution ≠ SSRF.
|
||||||
|
- **HTTP client (`http_service.rs`, 16):** GET/POST round-trips vs a hand-rolled
|
||||||
|
`TcpListener`; body dispatch + default UA; custom UA override; empty body;
|
||||||
|
non-2xx no-error; response cap via Content-Length; response cap mid-stream
|
||||||
|
(no Content-Length); request body cap pre-send; redirect-to-max-then-throw;
|
||||||
|
scheme rejection (file/ftp/gopher); port rejection (22/25/465/587); SSRF
|
||||||
|
literal-loopback; SSRF hostname-resolves-to-loopback; timeout; authz
|
||||||
|
(anon skips / member forbidden / member-with-role allowed).
|
||||||
|
- **Bridge integration (`sdk_http.rs`, 15):** real Rhai engine under
|
||||||
|
`spawn_blocking` vs a recording fake — status+JSON body, non-JSON string,
|
||||||
|
empty→`()`, Map→JSON, String→text, `()`→no body, headers+timeout forwarded,
|
||||||
|
unknown opt key throws, timeout>max throws, non-2xx no-throw, network error
|
||||||
|
throws `http:`, `post_form` url-encoding, `request` arbitrary method,
|
||||||
|
default-UA carries `script_id`, `cx.app_id` forwarded for attribution.
|
||||||
|
- **Cron scheduler (`cron_scheduler.rs`, 11):** 6-field schedule accept / 5-field
|
||||||
|
+ malformed reject; IANA tz accept / reject; due/not-due; never-fired uses
|
||||||
|
created_at; **catch-up fires exactly once after 5 missed windows**; timezone
|
||||||
|
affects fire time; bad schedule/tz → None.
|
||||||
|
- **Cron admin (`triggers_api.rs`, 6):** create succeeds; invalid schedule;
|
||||||
|
unknown timezone; **module target rejected** (v1.1.3 regression); **cross-app
|
||||||
|
target rejected** (v1.1.3 regression); member-without-role forbidden.
|
||||||
|
- **Module redaction (2):** `modules.rs` — backend error redacted from the
|
||||||
|
script-visible error (no leak); `module_redaction_logging.rs` — original error
|
||||||
|
**is** logged at ERROR level (captured via a global tracing subscriber).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Decisions beyond the brief (every prompt-default deviation, flagged)
|
||||||
|
|
||||||
|
1. **Three-arg split `verb(url, body, opts)`** (user-approved during planning).
|
||||||
|
Diverges from the brief's documented two-arg `(url, opts)` shape and
|
||||||
|
generalizes the escape hatch to `request(method, url, body, opts)`. Resolves
|
||||||
|
the brief's internal contradiction (its Slack example `http::post(url, #{text:...})`
|
||||||
|
passed a bare body map, which would be an "unknown opt key" under the
|
||||||
|
two-arg rule). The `opts` vocabulary is now exactly
|
||||||
|
`{headers, timeout_ms, follow_redirects, max_redirects}` — **`body_raw` was
|
||||||
|
dropped** (raw strings go through the positional body as a String). The
|
||||||
|
Slack example works unchanged (`#{text:...}` is the body).
|
||||||
|
2. **Cron crate = `cron` (0.12), not `croner`.** The brief allowed either; `cron`
|
||||||
|
handles the 6-field-with-seconds format and named weekdays (`MON-FRI`) used in
|
||||||
|
the brief's example, and integrates with chrono `Schedule::after`.
|
||||||
|
3. **Catch-up = fire-once** — matches the brief; called out explicitly as
|
||||||
|
requested. No deviation.
|
||||||
|
4. **`SdkCallCx` gained a `script_id` field.** The brief's default User-Agent is
|
||||||
|
`picloud/<v> (script:<script_id>)`, but `SdkCallCx` didn't carry the script
|
||||||
|
id. Adding it (sourced from `ExecRequest.script_id` in the engine) is the
|
||||||
|
clean home and doubles as the audit-attribution key the brief emphasizes. All
|
||||||
|
19 construction sites updated. The dead-letters admin cx uses a fresh sentinel
|
||||||
|
id (no script executes there).
|
||||||
|
5. **SSRF also blocks IPv6 unspecified `::` and IPv4 `0.0.0.0`** with reason
|
||||||
|
"unspecified". `0.0.0.0/8` is in the brief's list; `::` is not explicitly but
|
||||||
|
is an obvious sibling hole, so I blocked it too (defensible superset).
|
||||||
|
6. **No reqwest feature additions needed** — `dns_resolver` and `Response::chunk()`
|
||||||
|
compile under the existing `default-features = false, features = ["json","rustls-tls"]`.
|
||||||
|
No cookie jar (cookies feature is off, so there's no jar to disable). Added
|
||||||
|
`url` as an executor-core dep (for `form_urlencoded` in `post_form`).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## How to verify locally (§8 attestation — run on this exact HEAD)
|
||||||
|
|
||||||
|
All four gates were run on the handback HEAD (the `feat(v1.1.4)` commit, before
|
||||||
|
this markdown commit):
|
||||||
|
|
||||||
```
|
```
|
||||||
3dbead4 test(v1.1.3-modules): resolver, cache, validator, kind-rejection coverage
|
cargo fmt --all -- --check → exit 0
|
||||||
10f76d2 chore(v1.1.3-modules): version bumps + CHANGELOG + blueprint touch-up
|
cargo clippy --all-targets --all-features -- -D warnings → exit 0
|
||||||
610fd4f feat(v1.1.3-modules): dashboard kind dropdown + scripts-list and detail badges
|
cargo test --workspace → 427 passed, 0 failed
|
||||||
66b41bb feat(v1.1.3-modules): top-level script AST cache in LocalExecutorClient
|
(cd dashboard && npm run check) → 0 errors, 0 warnings (369 files)
|
||||||
c6211a7 feat(v1.1.3-modules): reject module scripts from routes + triggers; tighten cross-app trigger check
|
```
|
||||||
84833d3 feat(v1.1.3-modules): shared types, migrations, engine + resolver scaffold
|
|
||||||
|
This HANDBACK commit is pure markdown (no gate-relevant files), so the numbers
|
||||||
|
above hold for the final HEAD.
|
||||||
|
|
||||||
|
**Migrations — verified against a real Postgres (dev stack, port 15432):**
|
||||||
|
- Fresh-DB replay: the `#[sqlx::test]` schema-snapshot test applies all
|
||||||
|
migrations on a fresh ephemeral DB and matches the (re-blessed) golden — passes.
|
||||||
|
- On-top-of-prior-state: booting `picloud` against a dev DB pinned at migration
|
||||||
|
`0006` applied `0007…0017` cleanly (`"migrations applied"`); `_sqlx_migrations`
|
||||||
|
max is now `17`; `cron_trigger_details` + widened CHECKs present.
|
||||||
|
|
||||||
|
**Live smoke performed:**
|
||||||
|
- Boot logged the `PICLOUD_HTTP_ALLOW_PRIVATE` warning and started the cron
|
||||||
|
scheduler + HTTP service without panic.
|
||||||
|
- Seeded an every-second cron trigger → scheduler set `last_fired_at`, dispatcher
|
||||||
|
consumed the outbox row and ran the script (row deleted on success), and
|
||||||
|
`last_fired_at` advanced at the tick cadence (fire-once confirmed). Smoke data
|
||||||
|
cleaned up afterward.
|
||||||
|
- HTTP GET / SSRF-block / body-dispatch behaviors are covered by the automated
|
||||||
|
integration tests (real `TcpListener` round-trips + loopback/hostname SSRF
|
||||||
|
blocks) rather than a manual curl flow, since a live SSRF-block smoke
|
||||||
|
conflicts with the `PICLOUD_HTTP_ALLOW_PRIVATE` a local-server smoke requires.
|
||||||
|
|
||||||
|
To re-run the schema snapshot:
|
||||||
|
```
|
||||||
|
docker compose up -d postgres
|
||||||
|
DATABASE_URL=postgres://picloud:picloud@127.0.0.1:15432/picloud \
|
||||||
|
cargo test -p picloud-manager-core --test schema_snapshot -- --include-ignored
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 2. Scope coverage
|
## ⚠️ Latent issue found: stale schema-snapshot golden
|
||||||
|
|
||||||
| # | Brief item | Status | Notes |
|
`crates/manager-core/tests/expected_schema.txt` was **significantly stale** — the
|
||||||
|---|---|---|---|
|
committed golden was missing many tables from prior releases
|
||||||
| 1 | `scripts.kind` column + check + index | **Done** | `migrations/0015_scripts_kind.sql` |
|
(`abandoned_executions`, `dead_letters`, `dead_letter_trigger_details`,
|
||||||
| 2 | Module syntax constraints (fn / const / import only) | **Done** | Walks `ast.statements()` via `rhai/internals`. Admin endpoint is primary gate; resolver re-runs the check for defense-in-depth. |
|
`docs_*`, etc.). The `schema_snapshot` test is `#[ignore]` (needs a DB), so it
|
||||||
| 3 | `ModuleResolver` replaces `DummyModuleResolver` | **Done** | `crates/executor-core/src/module_resolver.rs`; per-call instance with cross-app isolation, cycle detect, depth limit. |
|
was apparently never re-blessed across v1.1.1–v1.1.3 and silently drifted.
|
||||||
| 4 | Two AST caches (script + module) | **Done** | Script cache in `LocalExecutorClient`; module cache in `Engine`. Both invalidate by `updated_at` comparison. Env-overridable sizes. |
|
|
||||||
| 5 | Dep-graph table + populate | **Done** | `migrations/0016_script_imports.sql`; `replace_imports_tx` writes edges in the same transaction as the script INSERT/UPDATE. |
|
|
||||||
| 6 | Admin endpoint changes (kind, kind-change rejection, route/trigger module rejection) | **Done** | Also closes a latent cross-app trigger gap (script.app_id mismatch — see §7). |
|
|
||||||
| 7 | Dashboard surface (kind dropdown + badge) | **Done** | App page form + scripts list + script detail header. `npm run check` clean. |
|
|
||||||
| 8 | `ModuleSource` trait shape | **Done** | Lives in `picloud-shared`; matches the v1.1.1/v1.1.2 service pattern. |
|
|
||||||
| 9 | Version bumps | **Done** | Workspace 1.1.2→1.1.3, SDK 1.3→1.4, dashboard 0.8.0→0.9.0. |
|
|
||||||
| 10 | Tests (~40–60) | **Done** | 46 new tests across 5 crates. Gates green. |
|
|
||||||
|
|
||||||
### Scope-out items (confirmed NOT built)
|
I re-blessed it, so the diff is large (+217 lines) but **only `cron_trigger_details`
|
||||||
|
+ the two widened CHECK constraints are v1.1.4-new** — the rest is pre-existing
|
||||||
- No module versioning / pinning, no `@v3` syntax.
|
drift correction. The blessed golden now matches a clean replay (verified).
|
||||||
- No eager precompilation at save-time.
|
Recommend the reviewer skim the diff to confirm, and consider whether the
|
||||||
- No dashboard dep-graph visualization.
|
`#[ignore]` should be lifted in CI (with a DB service) so the golden can't drift
|
||||||
- No LISTEN/NOTIFY-based cross-node invalidation.
|
again.
|
||||||
- No new `Scope` variants (modules use existing `script:read` / `script:write`).
|
|
||||||
- No admin GET endpoints for `script_imports` (the table is persisted for v1.2+; no v1.1.3 read surface — see §10 deferred items).
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 3. Resolver implementation notes
|
## Latent security findings
|
||||||
|
|
||||||
### 3.1 In-progress-imports stack
|
None new beyond the (already-known, already-closed-in-v1.1.3) cross-app trigger
|
||||||
|
gap, which §10c now documents in the CHANGELOG. The SSRF surface is the main
|
||||||
Lives **on the per-call `PicloudModuleResolver` instance**, not on `SdkCallCx`. The resolver is constructed fresh per `Engine::execute_ast` call (see `crates/executor-core/src/engine.rs:execute_ast`), so the stack is naturally scoped to one execution. Both the stack and the depth counter are `Mutex<…>` (not `RefCell<…>`) because `rhai::ModuleResolver: SendSync` under the `sync` feature.
|
security mechanism in this release; see the SSRF notes above for the
|
||||||
|
defense-in-depth layering (resolver hook + literal-IP check + per-hop
|
||||||
An RAII `StackGuard` pops the stack and decrements depth on drop — a compile error or panic anywhere inside `resolve()` cleans up properly. The lock is uncontended in practice (Rhai evaluation on the engine is single-threaded).
|
re-validation + IP-never-leaked errors).
|
||||||
|
|
||||||
### 3.2 Sync → async bridge
|
One thing for the reviewer to weigh: the SSRF policy is a hardcoded deny-list
|
||||||
|
with no per-app allow-list (deferred to v1.2 per the brief). An operator who
|
||||||
Rhai's `ModuleResolver::resolve` is sync; `ModuleSource::lookup` is async. The bridge:
|
needs a script to reach a private service has only the all-or-nothing
|
||||||
|
`PICLOUD_HTTP_ALLOW_PRIVATE` global escape hatch today.
|
||||||
```rust
|
|
||||||
let handle = tokio::runtime::Handle::try_current().map_err(/* surfaces as ErrorRuntime */)?;
|
## Open questions for the reviewer
|
||||||
let lookup = tokio::task::block_in_place(|| handle.block_on(self.source.lookup(&self.cx, path)));
|
|
||||||
```
|
1. **Three-arg HTTP shape** (decision #1) — confirm you're happy with
|
||||||
|
`verb(url, body, opts)` + dropping `body_raw`, vs the brief's documented
|
||||||
- `try_current()` (not `current()`) so test harnesses that build an `Engine` outside a Tokio runtime get a clean error instead of a panic.
|
two-arg form. This is the one user-facing API-shape divergence.
|
||||||
- `block_in_place` makes the call safe both on `spawn_blocking` threads (where it's a no-op) and on multi-threaded runtime worker tasks (where it instructs the runtime to relocate other tasks before we block). This was load-bearing for the resolver tests, which call `engine.execute` directly from `#[tokio::test(flavor = "multi_thread")]`.
|
2. **Stale schema golden** — OK to land the full re-bless in this PR, or would
|
||||||
- A `current_thread` runtime still panics — but production callers wrap `Engine::execute` in `tokio::task::spawn_blocking` (see `LocalExecutorClient::execute_with_identity`), which avoids that path entirely.
|
you prefer the drift correction split out?
|
||||||
|
|
||||||
### 3.3 Cross-app isolation enforcement
|
## Deferred items (per the brief's OUT list — not built)
|
||||||
|
|
||||||
The resolver captures `Arc<SdkCallCx>` at construction. Every `ModuleSource::lookup` call passes `&self.cx`. The Postgres impl (`crates/manager-core/src/module_source.rs`) selects with `WHERE app_id = $1 AND kind = 'module' AND name = $2`, binding `$1` from `cx.app_id.into_inner()` — never from any script-passed argument. The Rhai script's `import "name" as alias;` syntax has no slot for an `app_id`, so there is no path by which a script in app A can name a row in app B.
|
WebSocket/SSE, streaming responses, HTTP/3, per-app outbound allow/deny lists,
|
||||||
|
per-app rate limits, mTLS, request signing, cookie jar, cron backfill replay,
|
||||||
Verified by `resolver_cross_app_blocked` and `resolver_cross_app_module_not_found` tests.
|
cron next-fire preview, cron schedule history, drift compensation,
|
||||||
|
module-import-over-HTTP, files/pubsub/secrets/email/users/queue.
|
||||||
### 3.4 Module-shape validation — both layers
|
|
||||||
|
## Known limitations / rough edges
|
||||||
- **Primary gate (admin endpoint)** — `manager-core::api::create_script` and `update_script` call `state.validator.validate_module(src)` whenever the effective kind is `Module`. `Engine`'s impl walks `ast.statements()`, accepting only `Stmt::Var(_, ASTFlags::CONSTANT, _)`, `Stmt::Import(..)`, and `Stmt::Noop(..)`. Anything else (top-level expression, let, if, while, …) is rejected with a clear `ValidationError::ModuleShape` message.
|
|
||||||
- **Defense in depth (resolver)** — the resolver calls `check_module_shape` again after `engine.compile(source)`. This catches rows that bypassed the API (manual SQL inserts, future migration bugs, restoring from an older backup).
|
- The dashboard Triggers tab lists all trigger kinds but only *creates* cron
|
||||||
|
triggers (kv/docs creation remains API-only, unchanged from before). No
|
||||||
Note: Rhai's default optimizer constant-folds `if true { ... }` away, so a module containing `if true { ... }` parses to an empty body and passes vacuously. This is fine semantically (the script has no observable behavior), but it surprises authors. Documented as a known acceptance edge; not worth disabling optimization for.
|
next-fire-at preview (deferred to v1.2).
|
||||||
|
- `post_form` / body field order follows Rhai map iteration order
|
||||||
### 3.5 What the resolver does NOT enforce
|
(`BTreeMap`-backed, so sorted/deterministic; not insertion order).
|
||||||
|
- The cron scheduler tick is floored at 1s; sub-second schedules effectively
|
||||||
- **Module access permissions** — every module in an app is importable by every other script in the same app. Per-module ACLs are explicitly v1.2+.
|
fire at the tick cadence (by design — see the fire-once policy).
|
||||||
- **Module versioning / pinning** — there's exactly one current version per `(app_id, name)`. v1.3+.
|
- The stale REVIEW.md at repo root is the v1.1.3 reviewer's artifact; the
|
||||||
|
v1.1.4 reviewer should overwrite it.
|
||||||
---
|
|
||||||
|
|
||||||
## 4. Cache design notes
|
|
||||||
|
|
||||||
### 4.1 LRU library
|
|
||||||
|
|
||||||
**`lru = "0.12"`** — added to `[workspace.dependencies]`. Standard choice, no-frills crate (`LruCache<K, V>` with `put`/`get`/`len`/etc.). Both caches use `Arc<Mutex<LruCache<K, V>>>` so they're cheap to clone and safe to share across executions.
|
|
||||||
|
|
||||||
### 4.2 Cache key shapes + what's stored
|
|
||||||
|
|
||||||
| Cache | Owner | Key | Value | Stores |
|
|
||||||
|---|---|---|---|---|
|
|
||||||
| **Script AST cache** | `LocalExecutorClient` | `ScriptId` | `CachedScript { updated_at: DateTime<Utc>, ast: Arc<rhai::AST> }` | Compiled AST for the top-level (endpoint) script. |
|
|
||||||
| **Module cache** | `Engine` | `(AppId, String)` | `CachedModule { updated_at: DateTime<Utc>, module: Shared<rhai::Module> }` | Compiled `rhai::Module` produced by `Module::eval_ast_as_new`. |
|
|
||||||
|
|
||||||
The script cache stores `Arc<AST>` so an evaluation can grab a cheap clone and hand it to `Engine::execute_ast` without holding the cache lock. The module cache stores `Shared<Module>` (= `Arc<Module>` under the `sync` feature) because that's what `ModuleResolver::resolve` must return.
|
|
||||||
|
|
||||||
### 4.3 Stale-version detection
|
|
||||||
|
|
||||||
Both caches use the same logic: **compare `cached.updated_at` against the freshly-known `updated_at`**.
|
|
||||||
|
|
||||||
- For the script cache, the caller passes the fresh value as `ScriptIdentity.updated_at` — the orchestrator already loaded the script row to dispatch the request, so there's no extra DB hit.
|
|
||||||
- For the module cache, the resolver must call `ModuleSource::lookup` first to learn the fresh `updated_at` — every `import` does at least one DB roundtrip. That's a deliberate trade-off (documented in CHANGELOG): the alternative (TTL caching or pub/sub) introduces staleness during edits and complicates "publish a fix immediately" UX. Worth re-evaluating in v1.3+ when LISTEN/NOTIFY makes pub/sub cheap.
|
|
||||||
|
|
||||||
Mismatch → recompile + `cache.put(...)` replace. LRU eviction is automatic when capacity is exceeded.
|
|
||||||
|
|
||||||
### 4.4 Capacity overrides
|
|
||||||
|
|
||||||
- `PICLOUD_SCRIPT_CACHE_SIZE` (default 256, `LocalExecutorClient`)
|
|
||||||
- `PICLOUD_MODULE_CACHE_SIZE` (default 512, `Engine`)
|
|
||||||
|
|
||||||
Both clamp `max(1)` to avoid the LRU constructor's panic on zero. `Engine::with_module_cache_capacity` and `LocalExecutorClient::with_script_cache_capacity` give tests explicit handles.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 5. Dep-graph population
|
|
||||||
|
|
||||||
### 5.1 Where the extraction happens
|
|
||||||
|
|
||||||
Inside the `ScriptValidator` impl on `Engine`. The trait now returns `ValidatedScript { imports: Vec<String> }`, populated by `extract_imports` (endpoint scripts) or `validate_module_source` (module scripts). Both walk `ast.statements()` and pull out `Stmt::Import(boxed_path_expr, _)` where the path is a `StringConstant`.
|
|
||||||
|
|
||||||
**Dynamic imports** (`import some_var as alias;`) are NOT captured because we can't know the name at compile time. Tested by `validate_endpoint_skips_dynamic_imports_in_imports_list`. Documented as a known limitation in the CHANGELOG and migration 0016's header comment.
|
|
||||||
|
|
||||||
### 5.2 Where the write happens — transactional with the script INSERT/UPDATE
|
|
||||||
|
|
||||||
`PostgresScriptRepository::create` and `update` both open a `tx = pool.begin().await?`. The script row is inserted/updated inside the tx; immediately after, `replace_imports_tx(&mut tx, importer, app_id, &imports)` runs. The tx is committed at the end. If any step fails, both the script change and the dep-graph mutation roll back together. No half-state where the script row exists but the edges don't (or vice versa).
|
|
||||||
|
|
||||||
`replace_imports_tx`:
|
|
||||||
|
|
||||||
1. `DELETE FROM script_imports WHERE importer_script_id = $1` — replaces wholesale.
|
|
||||||
2. `INSERT INTO script_imports ... SELECT ... FROM scripts WHERE app_id = $1 AND kind = 'module' AND name = ANY($3) ON CONFLICT DO NOTHING` — best-effort: only resolves to existing modules in the same app; unresolved names are silently skipped (no error). A later save of either script re-resolves and writes the edge.
|
|
||||||
|
|
||||||
### 5.3 Schema decisions
|
|
||||||
|
|
||||||
- `script_imports.app_id` is denormalized but useful: the "all imports in app X" scan happens once at boot for caching and (eventually) for the dashboard's audit view. Without it, that query would need a 3-way join.
|
|
||||||
- `created_at` is unused by v1.1.3 logic but trivial to add now and useful for v1.2+ "first imported" diagnostics.
|
|
||||||
- The FK on `imported_script_id` cascades — when a module is deleted, every edge referencing it goes too. The cascade isn't exercised by a unit test (it would need Postgres); it's covered by the FK design.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 6. Tests added
|
|
||||||
|
|
||||||
46 new tests across 5 crates. All green on HEAD `3dbead4`. Inventory:
|
|
||||||
|
|
||||||
### `crates/executor-core/tests/modules.rs` (NEW — 23 tests)
|
|
||||||
|
|
||||||
End-to-end through `Engine::execute` with a `CountingModuleSource` (in-memory fake).
|
|
||||||
|
|
||||||
| # | Test | Covers |
|
|
||||||
|---|---|---|
|
|
||||||
| 1 | `resolver_loads_simple_module` | Happy path: `import "m" as m; m::add(2, 3)` → 5. |
|
|
||||||
| 2 | `resolver_cross_app_blocked` | Modules with same name in two apps resolve to the calling app's version. |
|
|
||||||
| 3 | `resolver_cross_app_module_not_found` | App B's `import "lonely"` returns ModuleNotFound when only app A has it. |
|
|
||||||
| 4 | `resolver_module_not_found` | Missing module → `ErrorModuleNotFound`. |
|
|
||||||
| 5 | `resolver_self_import_detected` | `a` imports `a` → circular error. |
|
|
||||||
| 6 | `resolver_circular_detected` | `a → b → a` → circular error. |
|
|
||||||
| 7 | `resolver_depth_limit_enforced` | 9-deep chain with limit 8 → depth error. |
|
|
||||||
| 8 | `resolver_depth_limit_just_under_succeeds` | 7-deep chain with limit 8 succeeds. |
|
|
||||||
| 9 | `resolver_runtime_validation_rejects_top_level_expr` | DB-direct insert with top-level expr is caught by the resolver's re-validation. |
|
|
||||||
| 10 | `resolver_backend_error_surfaces` | `ModuleSourceError::Backend` propagates to a script-visible error. |
|
|
||||||
| 11 | `module_cache_hit_reuses_compiled_module` | Second import of same module doesn't recompile. |
|
|
||||||
| 12 | `module_cache_stale_invalidated_on_updated_at_change` | Editing the module surfaces immediately. |
|
|
||||||
| 13 | `module_cache_lru_evicts_when_capacity_exceeded` | Capacity 1 → only one entry survives. |
|
|
||||||
| 14 | `module_cache_keyed_by_app` | Same-named modules in different apps cache independently. |
|
|
||||||
| 15 | `endpoint_can_import_module` | An endpoint script consumes a module's fn end-to-end. |
|
|
||||||
| 16 | `module_can_import_module` | Modules can be importers. |
|
|
||||||
| 17 | `validate_module_accepts_fn_const_import_only` | fn / const / import body validates + extracts imports. |
|
|
||||||
| 18 | `validate_module_rejects_top_level_let` | `let x = 1;` rejected. |
|
|
||||||
| 19 | `validate_module_rejects_top_level_expr` | `42;` rejected. |
|
|
||||||
| 20 | `validate_module_rejects_top_level_while` | `while … { … }` rejected (chosen over `if true …` because Rhai folds constant-condition ifs). |
|
|
||||||
| 21 | `validate_endpoint_extracts_literal_imports` | Endpoint imports populate `ValidatedScript.imports`. |
|
|
||||||
| 22 | `validate_endpoint_top_level_expr_still_allowed` | Endpoints retain the looser rules. |
|
|
||||||
| 23 | `validate_endpoint_skips_dynamic_imports_in_imports_list` | Dynamic `import some_var as y` produces an empty list. |
|
|
||||||
|
|
||||||
### `crates/orchestrator-core/src/client.rs` (6 inline tests)
|
|
||||||
|
|
||||||
| # | Test | Covers |
|
|
||||||
|---|---|---|
|
|
||||||
| 1 | `cache_hit_when_identity_matches` | Identical `(script_id, updated_at)` returns the same `Arc<AST>`. |
|
|
||||||
| 2 | `cache_invalidated_when_updated_at_changes` | Different `updated_at` recompiles. |
|
|
||||||
| 3 | `distinct_script_ids_cache_independently` | Two scripts → two entries. |
|
|
||||||
| 4 | `lru_eviction_caps_cache_size` | Capacity 1; A → B → C leaves one entry. |
|
|
||||||
| 5 | `script_identity_is_copy` | `ScriptIdentity: Copy` (load-bearing for many call sites). |
|
|
||||||
| 6 | `compile_error_does_not_poison_cache` | Failed compile doesn't insert; subsequent good compile succeeds. |
|
|
||||||
|
|
||||||
### `crates/shared/src/script.rs` (3 inline tests)
|
|
||||||
|
|
||||||
| # | Test | Covers |
|
|
||||||
|---|---|---|
|
|
||||||
| 1 | `default_is_endpoint` | `ScriptKind::default() == Endpoint`. |
|
|
||||||
| 2 | `round_trips_through_serde_lowercase` | `"endpoint"` / `"module"` wire form. |
|
|
||||||
| 3 | `parse_str_round_trip` | `as_str` ↔ `parse_str` inverses. |
|
|
||||||
|
|
||||||
### `crates/manager-core/src/triggers_api.rs` (6 new inline tests)
|
|
||||||
|
|
||||||
| # | Test | Covers |
|
|
||||||
|---|---|---|
|
|
||||||
| 1 | `kv_trigger_rejects_module_target` | Module script as KV-trigger target → 422 with `"module"` in the message. |
|
|
||||||
| 2 | `docs_trigger_rejects_module_target` | Same for docs triggers. |
|
|
||||||
| 3 | `dl_trigger_rejects_module_target` | Same for dead-letter triggers. |
|
|
||||||
| 4 | `kv_trigger_rejects_missing_script` | Non-existent script id → 422. |
|
|
||||||
| 5 | `kv_trigger_rejects_cross_app_script` | Latent v1.1.1/v1.1.2 isolation gap — script in app B targeted from app A → 422. |
|
|
||||||
| 6 | `kv_trigger_accepts_endpoint_target` | Happy path. |
|
|
||||||
|
|
||||||
### `crates/picloud/tests/api.rs` (8 `#[ignore]`'d Postgres-gated tests)
|
|
||||||
|
|
||||||
End-to-end through the HTTP surface. Run with `--include-ignored` against a real Postgres.
|
|
||||||
|
|
||||||
| # | Test | Covers |
|
|
||||||
|---|---|---|
|
|
||||||
| 1 | `create_script_default_kind_is_endpoint` | Default kind on create. |
|
|
||||||
| 2 | `create_module_kind_persists` | `kind=module` round-trips through the API. |
|
|
||||||
| 3 | `create_module_with_top_level_expr_rejected` | Module syntax gate at create time. |
|
|
||||||
| 4 | `create_module_with_reserved_name_rejected` | `kv`, `docs`, etc. reserved. |
|
|
||||||
| 5 | `route_bind_rejects_module` | `POST .../routes` returns 422 for module targets. |
|
|
||||||
| 6 | `endpoint_imports_module_end_to_end` | Endpoint imports module, route binding, HTTP invocation, result. |
|
|
||||||
| 7 | `module_edit_visible_on_next_invocation` | Cache invalidation on module edit (verified end-to-end through the engine). |
|
|
||||||
| 8 | `cross_app_import_blocked` | Two apps, same-name module, endpoint sees its own. |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 7. Schema / decisions beyond the brief
|
|
||||||
|
|
||||||
- **Module name shape CHECK** (`migrations/0015_scripts_kind.sql`): module names are constrained to `^[a-zA-Z_][a-zA-Z0-9_]{0,63}$`. Endpoint scripts retain the looser pre-v1.1.3 name rules so existing rows aren't invalidated. Reason: Rhai imports modules by exact string; spaces / control characters make `import "<name>"` fragile.
|
|
||||||
- **Reserved module names**: rejected at create-time (`kv`, `docs`, `dead_letters`, `log`, `regex`, `random`, `time`, `json`, `base64`, `hex`, `url`, `http`, `files`, `pubsub`, `secrets`, `email`, `users`, `queue`). Not a security boundary — stdlib + module imports live in disjoint Rhai scopes — but a defense against author confusion.
|
|
||||||
- **`ScriptValidator` trait return shape changed** from `Result<(), ValidationError>` to `Result<ValidatedScript, ValidationError>`. Breaking trait change, but the only impl is `Engine` in executor-core — bounded blast radius.
|
|
||||||
- **`ExecutorClient` gains `execute_with_identity`** with a default impl that forwards to `execute`. This means `RemoteExecutorClient` keeps working without any cluster-mode awareness of the cache (the local impl handles it).
|
|
||||||
- **Latent security fix**: trigger creation now verifies `script.app_id == app_id`. v1.1.1 and v1.1.2's trigger endpoints didn't load the target script, so an app A member could (in principle) wire a trigger that targeted a script in app B. Closed in this release; called out in the CHANGELOG.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 8. How to verify locally (verified on HEAD `3dbead4`)
|
|
||||||
|
|
||||||
After my last commit, I ran the three gates plus the dashboard check on the exact HEAD I'm handing back. **Actual** exit codes and counts (not pre-written):
|
|
||||||
|
|
||||||
### 8.1 `cargo fmt --all -- --check`
|
|
||||||
|
|
||||||
```
|
|
||||||
$ cargo fmt --all -- --check
|
|
||||||
$ echo $?
|
|
||||||
0
|
|
||||||
```
|
|
||||||
|
|
||||||
Clean diff, **exit 0**.
|
|
||||||
|
|
||||||
### 8.2 `cargo clippy --all-targets --all-features -- -D warnings`
|
|
||||||
|
|
||||||
```
|
|
||||||
$ cargo clippy --all-targets --all-features -- -D warnings
|
|
||||||
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.21s
|
|
||||||
$ echo $?
|
|
||||||
0
|
|
||||||
```
|
|
||||||
|
|
||||||
No warnings, **exit 0**.
|
|
||||||
|
|
||||||
### 8.3 `cargo test --workspace`
|
|
||||||
|
|
||||||
```
|
|
||||||
$ cargo test --workspace
|
|
||||||
... (per-suite results) ...
|
|
||||||
$ echo $?
|
|
||||||
0
|
|
||||||
```
|
|
||||||
|
|
||||||
Aggregate (summed across all `test result:` lines):
|
|
||||||
|
|
||||||
- **PASSED = 358**
|
|
||||||
- **FAILED = 0**
|
|
||||||
- **IGNORED = 140** (Postgres-gated `#[ignore]` integration tests in `picloud/tests/api.rs` + 1 schema_snapshot test; need `DATABASE_URL` to run)
|
|
||||||
- **measured = 0**
|
|
||||||
- **filtered out = 0**
|
|
||||||
|
|
||||||
### 8.4 `(cd dashboard && npm run check)`
|
|
||||||
|
|
||||||
```
|
|
||||||
$ cd dashboard && npm run check
|
|
||||||
> picloud-dashboard@0.9.0 check
|
|
||||||
> svelte-kit sync && svelte-check --tsconfig ./tsconfig.json
|
|
||||||
|
|
||||||
1780463972778 START "/home/fabi/PiCloud/dashboard"
|
|
||||||
1780463972779 COMPLETED 369 FILES 0 ERRORS 0 WARNINGS 0 FILES_WITH_PROBLEMS
|
|
||||||
$ echo $?
|
|
||||||
0
|
|
||||||
```
|
|
||||||
|
|
||||||
0 errors, 0 warnings, **exit 0**.
|
|
||||||
|
|
||||||
### 8.5 Migrations apply
|
|
||||||
|
|
||||||
Verified during normal `cargo test --workspace` runs — `sqlx::test` macros apply migrations 0001 through 0016 cleanly on a freshly created database for every `#[ignore]`d integration test. The from-v1.1.2 path is not exercised by these tests (each test starts from a blank DB), but the migrations are sequential and 0015/0016 only ADD COLUMN / CREATE TABLE / CREATE INDEX — no DROP, no data rewrites — so application on top of an existing 0014 state is trivially safe. The downgrade caveat is documented in the CHANGELOG.
|
|
||||||
|
|
||||||
### 8.6 Manual smoke
|
|
||||||
|
|
||||||
I did **not** run the full end-to-end manual smoke against a live Postgres + Caddy stack as the brief's "Done looks like" specifies. The 8 ignored `picloud/tests/api.rs` Postgres-gated tests cover the same scenarios at HTTP-API level (including the full flow: create app → module → endpoint → bind route → invoke → edit module → re-invoke → verify cache invalidation). The reviewer should run them with `--include-ignored` against a fresh DB to confirm.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 9. Open questions for the reviewer
|
|
||||||
|
|
||||||
1. **Optimizer constant-folding edge.** Module bodies containing only `if true { ... }` (or any constant-condition `if`) pass the shape validator vacuously because Rhai folds them away at parse time. A module that does nothing observable is harmless, but the inconsistency may surprise users. Options:
|
|
||||||
- Accept as-is (current state); document.
|
|
||||||
- Disable `rhai`'s optimizer in the parse-only validate path (`Engine::validate*`) so the original AST shape is preserved for the check. Adds a small cost; might leak optimizer-dependent surprises elsewhere.
|
|
||||||
- Add a regex/source scan as a belt-and-braces check. Fragile.
|
|
||||||
- **Recommend:** accept as-is; revisit if a real user hits it.
|
|
||||||
|
|
||||||
2. **`ScriptKind::Module → Endpoint` transition.** Currently always allowed. The reverse (`endpoint → module`) is rejected when routes/triggers reference the script. Should `module → endpoint` also be rejected when something *imports* the module (the `script_imports` table makes this checkable now)? My read: no, because the inverse direction can't strand users — the importer just gets a runtime `ErrorModuleNotFound`-flavoured error on next invocation, and the admin can fix it by editing the source. But it's a defensible choice either way.
|
|
||||||
|
|
||||||
3. **Cached-module memory pressure.** The module cache stores `Arc<rhai::Module>` per `(AppId, name)`. With many apps × many modules, this could grow. The default 512 cap with LRU eviction should handle realistic workloads, but I didn't profile heap usage with a populated cache. Recommendation: leave as-is for v1.1.3; add a metric (`picloud_module_cache_bytes`) when metrics ship in v1.1.6.
|
|
||||||
|
|
||||||
4. **`rhai/internals` feature.** Enabled in executor-core to walk `ast.statements()`. The Rhai maintainers warn this surface can change without a major bump. We're pinned to the workspace `rhai = "1.19"` line (which resolved to `1.24.0` in Cargo.lock). Consider tightening to `rhai = "=1.24"` so future Cargo.lock updates are deliberate.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 10. Deferred items (explicitly OUT of v1.1.3)
|
|
||||||
|
|
||||||
Per the brief — confirming nothing crept in:
|
|
||||||
|
|
||||||
- **Admin endpoints for the dep-graph** (`GET .../imports`, `GET .../imported-by`). Persisted in `script_imports`; no API surface in v1.1.3. The dashboard's "Used by" panel is a v1.2+ task.
|
|
||||||
- **Module versioning / pinning** (`import "B@v3"`). v1.3+.
|
|
||||||
- **Eager precompilation** at script-save time. v1.1.3 is compile-on-first-use only.
|
|
||||||
- **Dashboard dependency-graph visualization.** v1.2+.
|
|
||||||
- **LISTEN/NOTIFY-based cross-node invalidation.** v1.3+ (cluster mode).
|
|
||||||
- **Module-level capabilities / ACLs.** v1.2+.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 11. Known limitations / rough edges
|
|
||||||
|
|
||||||
1. **Dynamic imports aren't dep-graph-tracked.** `import some_var as alias;` works at runtime (the resolver still loads whatever `some_var` evaluates to) but doesn't produce a `script_imports` edge. Documented in the migration 0016 header and the CHANGELOG.
|
|
||||||
|
|
||||||
2. **Per-execution module cache scope.** The module cache is process-wide. Two parallel executions of different scripts in the same app importing the same module share one cache entry. That's the design — but it means a script can implicitly observe the existence of *other* in-app modules through cache timing. Not a security boundary breach (the data is same-app), but worth noting.
|
|
||||||
|
|
||||||
3. **Top-level statement validation depends on `rhai/internals`.** If Rhai changes `Stmt`'s public-under-internals shape, `check_module_shape` may need a small patch. Mitigation: pin a tighter version (see §9.4).
|
|
||||||
|
|
||||||
4. **No `ResolverError` carry-through.** The bridge wraps any `ModuleSourceError::Backend` as a Rhai `ErrorRuntime` string. Script-visible messages include the backend text directly (e.g. "module backend error: connection refused"). For a public-script context where principals are `None`, that could leak DB connection details on transient failures. Recommend filtering or redacting at the boundary in v1.1.4+.
|
|
||||||
|
|
||||||
5. **Mid-execution module edits.** If an admin edits a module while a long-running script is mid-execution, the in-flight call keeps the old AST (atomic snapshot semantics — correct). The next call sees the new behavior. No race; just noting.
|
|
||||||
|
|
||||||
6. **`StackGuard` arms unconditionally.** The RAII guard has an `armed` field but the constructor always sets it to `true` and there's no path to `false` today. Future code that wants to bypass cleanup (e.g. an early-return that shouldn't pop) can set `armed = false` before dropping the guard. Currently dead-but-cheap; I left it in for clarity.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
Reviewer next steps: audit, then write `REVIEW.md`, then merge to `main` on approval. The branch is `feat/v1.1.3-modules` at `3dbead4`.
|
|
||||||
|
|||||||
Reference in New Issue
Block a user