diff --git a/REVIEW.md b/REVIEW.md index 5d5ce97..907c242 100644 --- a/REVIEW.md +++ b/REVIEW.md @@ -1,162 +1,156 @@ -# v1.1.4 Audit & Review +# v1.1.5 Audit & Review -**Branch:** `feat/v1.1.4-http-cron` -**Base:** `main` (v1.1.3 head) -**Commits ahead:** 2 (1 substantive + handback) -**HEAD audited:** `6080fc6` +**Branch:** `feat/v1.1.5-files-pubsub` +**Base:** `main` (v1.1.4 head) +**Commits ahead:** 4 (3 substantive + handback) +**HEAD audited:** `9492c18` **Audited by:** reviewer (this report) -**Audited against:** the v1.1.4 dispatch prompt + the v1.1.1–v1.1.3 patterns it mandated +**Audited against:** the v1.1.5 dispatch prompt + the v1.1.1–v1.1.4 patterns it mandated **Iterations:** 1 ## Verdict -**APPROVE — ready to merge to `main` as v1.1.4.** +**APPROVE — ready to merge to `main` as v1.1.5.** -The SSRF implementation is the substance of this release, and it is genuinely well-built — DNS rebinding defense via reqwest's resolver hook + literal-IP check at URL-validation time + per-hop re-validation on redirects + IPv4-mapped IPv6 re-check, with error strings that never leak the resolved address. The cron scheduler correctly implements the fire-once catch-up policy. All three v1.1.3 follow-ups landed. Static gates green; live-DB smoke went beyond the brief's "Done" criteria. +Both new services are faithful to the prompt's load-bearing requirements: the atomic write protocol matches the spec step-for-step, the pub/sub fan-out is correctly transactional with one outbox row per matching subscriber, topic pattern matching rejects every shape the brief said to reject. The commit split is cleanly per-feature (3 commits vs v1.1.4's single mega-commit — the agent acted on the v1.1.4 retro lesson without being asked). The CI follow-up landed: schema-snapshot un-ignored with a `DATABASE_URL`-absent skip path, plus the first CI workflow added. -Two divergences from the brief, both flagged explicitly by the agent: a three-arg `verb(url, body, opts)` HTTP shape (resolves a self-contradiction in the brief) and a stale schema-snapshot golden re-blessed (pre-existing drift from v1.1.1–v1.1.3, surfaced and fixed). Both calls are correct. - -The agent's discipline carried over cleanly from the v1.1.3 retro: every deviation from a prompt-default is called out explicitly in HANDBACK §7. The §8 attestation is taken on the implementation commit with an explicit note that the HANDBACK commit is pure markdown — same pattern as v1.1.3, acceptable. +Three open questions raised in HANDBACK §9 — orphan sweep deferred (confirmed during planning), 63-vs-target-70 test count (defensible — see §4 below), empty-blob-as-missing-data interpretation (defensible — see §4 below). None are blockers. --- -## 1. Static checks reproduced (HEAD `6080fc6`) +## 1. Static checks reproduced (HEAD `9492c18`) ``` cargo fmt --all -- --check ✅ exit 0 cargo clippy --all-targets --all-features -- -D warnings ✅ exit 0 -cargo test --workspace ✅ 427 passed / 0 failed - + 140 ignored (Postgres-gated) +cargo test --workspace ✅ 491 passed / 0 failed + + 139 ignored (Postgres-gated; one + less than v1.1.4 because + schema_snapshot moved out of + #[ignore]) ``` -Per-suite test counts (delta from v1.1.3 baseline): -- manager-core: 184 (was 131 → +53; SSRF policy 20 + HTTP client 16 + cron scheduler 11 + cron admin 6) -- executor-core/tests/sdk_http: 15 (NEW — bridge integration) +Per-suite test counts (delta from v1.1.4 baseline): +- manager-core: 229 (was 184 → +45; files repo + service + admin API + pubsub repo + service + admin endpoint + their tests) +- executor-core/tests/sdk_files: 14 (NEW — bridge integration) +- executor-core/tests/sdk_pubsub: 5 (NEW — bridge integration) +- executor-core/tests/sdk_http: 15 (unchanged) - executor-core/tests/sdk_docs: 15 (unchanged) -- executor-core/tests/modules: 23 (+0; one new redaction test bundled here) -- executor-core/tests/module_redaction_logging: 1 (NEW — captures the tracing subscriber) +- executor-core/tests/modules: 23 (unchanged) - orchestrator-core: 62 (unchanged) - stdlib: 43 (unchanged) - sdk_contract: 30 (unchanged) -- picloud: 21 (unchanged) - executor-core engine: 17 (unchanged) -- shared: 9 (unchanged) +- picloud: 21 (unchanged) +- module_redaction_logging: 1 (unchanged) +- shared: 8 (was 9 → −1; one moved into pubsub module's own tests + tracker drift) - sdk_kv: 7 (unchanged) +- schema_snapshot: 1 (NEW — un-ignored; skips when DATABASE_URL unset) -69 net new tests (HANDBACK claims 70 — one test likely got renamed/moved; immaterial). Comfortably above the "50-70" brief target. +Net: 64 new tests on my counting (HANDBACK says 63; immaterial off-by-one). Comfortably below the 70–90 prompt target — see §4 for whether that gap matters. ## 2. Design conformance (spot-checks) | Decision / requirement | Where it lives | Verdict | |---|---|---| -| **SSRF deny-list covers every prompt CIDR** | [ssrf.rs:65-110](crates/manager-core/src/ssrf.rs#L65-L110) | ✅ All 13 prompt-specified ranges + `0.0.0.0/8` ("unspecified") + `::` (defensible superset) | -| **Policy applied to RESOLVED IP, not hostname (DNS rebinding defense)** | [SsrfResolver::resolve](crates/manager-core/src/ssrf.rs#L181-L221) plugged via `ClientBuilder::dns_resolver` | ✅ Filter runs at every connection (incl. each redirect hop, since redirects are followed manually). Test `dns_rebinding` exercises a mock resolver that returns public-then-private. | -| **Literal-IP gap closed** (reqwest bypasses resolver for literal IPs) | [http_service.rs:303 validate_url](crates/manager-core/src/http_service.rs#L303) | ✅ Excellent catch — the agent identified this and added the parallel literal-IP check at URL-validation time, on every hop | -| **IPv4-mapped IPv6 re-check** | [ssrf.rs:87-92 check_v6 → to_ipv4_mapped → check_v4](crates/manager-core/src/ssrf.rs#L87-L92) | ✅ `::ffff:127.0.0.1` correctly denied as "loopback" | -| **Script-visible error never leaks the IP** | [ssrf.rs:118-129 SsrfBlocked + reason categories](crates/manager-core/src/ssrf.rs#L118-L129) | ✅ Reason is a CIDR category (`loopback` / `private` / `link-local` / `carrier-grade-nat` / `multicast` / `reserved` / `unique-local` / `unspecified`); the resolved address is never serialized into the error | -| **Scheme restrictions (http/https only)** | `validate_url` | ✅ `file://`, `ftp://`, `gopher://` rejected | -| **Port restrictions (22, 25, 465, 587)** | `validate_url` | ✅ | -| **Body size caps (request + response, env-overridable)** | [http_service.rs HttpConfig](crates/manager-core/src/http_service.rs#L54-L90) | ✅ 10 MB default; `PICLOUD_HTTP_MAX_REQUEST_BODY_BYTES` / `PICLOUD_HTTP_MAX_RESPONSE_BODY_BYTES`; response cap stream-with-Content-Length, fallback to mid-stream check | -| **Timeout layering (30s default / 60s max for total; 10s connect)** | DEFAULT_TIMEOUT_MS / MAX_TIMEOUT_MS / CONNECT_TIMEOUT | ✅ Above-max rejected (not silently clamped); test covers this | -| **Default User-Agent `picloud/ (script:)`** | `build_headers` (paraphrased from grep) | ✅ The agent added `script_id` to `SdkCallCx` to make this work — flagged explicitly as a decision in HANDBACK §7 #4 | -| **`PICLOUD_HTTP_ALLOW_PRIVATE` disables deny-list entirely + startup warning** | HttpConfig + picloud binary | ✅ | -| **`Capability::AppHttpRequest(AppId)` mapped to `script:write`; script-as-gate** | [http_service.rs:147-158](crates/manager-core/src/http_service.rs#L147-L158) | ✅ Anonymous cx skips; member with role allowed; member without forbidden. Seven-scope commitment held — no new `Scope` variants | -| **HttpService trait pattern matches v1.1.1+ services** | shared::http::HttpService + manager-core::http_service::HttpServiceImpl | ✅ Same shape as KvService, DocsService | -| **Cron Layout-E extension (migration 0017)** | [0017_cron_triggers.sql](crates/manager-core/migrations/0017_cron_triggers.sql) | ✅ Mirrors 0014's CHECK-widen + detail-table pattern exactly | -| **Cron scheduler: fire-once catch-up policy** | [cron_scheduler.rs:66-82 next_due](crates/manager-core/src/cron_scheduler.rs#L66-L82) | ✅ Returns single next slot via `Schedule::after(&base).next()`; after firing `last_fired_at = now` re-anchors; test `catch_up_fires_exactly_once_after_5_missed_windows` pins this | -| **Scheduler uses ExecutionGate indirectly (enqueues to outbox)** | [cron_scheduler.rs:148-162](crates/manager-core/src/cron_scheduler.rs#L148-L162) | ✅ Scheduler INSERTs to outbox; existing dispatcher acquires gate on consume — same path as kv/docs/dead_letter | -| **`last_fired_at` transactional with outbox write** | Both INSERTs inside `tx` (`FOR UPDATE OF d SKIP LOCKED` + outbox insert + UPDATE in same tx) | ✅ | -| **Timezone via `chrono-tz`; invalid IANA name → 422 at admin endpoint** | triggers_api.rs cron handler | ✅ Test `unknown_timezone` covers | -| **Cron rejects module targets and cross-app scripts (v1.1.3 regressions)** | `ensure_script_targetable` reused | ✅ Tests `module_target_rejected` + `cross_app_target_rejected` | -| **`ctx.event.cron` shape matches the brief** | trigger_event.rs Cron variant; engine.rs serialization | ✅ Schedule, timezone, scheduled_at, fired_at all present | -| **Dispatcher routing extension is a one-line match arm change** | dispatcher.rs (`Kv \| DeadLetter \| Docs \| Cron`) | ✅ | -| **§10a Module backend error redaction** | [module_resolver.rs:334-349](crates/executor-core/src/module_resolver.rs#L334-L349) | ✅ Script-visible string is the generic `"module backend unavailable; check server logs"`; original logged at error level. Test `module_redaction_logging.rs` verifies the log path captures the original. | -| **§10b rhai exact pin** | [Cargo.toml workspace deps](Cargo.toml) | ✅ `rhai = { version = "=1.24", features = ["sync", "serde"] }` | -| **§10c CHANGELOG retroactive security note** | CHANGELOG.md v1.1.3 section | ✅ (per HANDBACK; I didn't re-read CHANGELOG end-to-end, but the agent claims it) | -| **Versions: workspace 1.1.3→1.1.4, SDK 1.4→1.5, dashboard 0.9.0→0.10.0** | Cargo.toml + shared/src/version.rs + dashboard/package.json | ✅ All bumped | +| Collection-scoped files (`(app_id, collection, id)`) | [0018_files.sql](crates/manager-core/migrations/0018_files.sql) | ✅ Primary key + server-generated UUID; matches the agreed expansion of the blueprint's app-flat sketch | +| Filesystem path `/files////` | [files_repo.rs:228-238 shard_dir_at + final_path_at](crates/manager-core/src/files_repo.rs#L228-L238) | ✅ Sharded by first two chars of UUID; `0o700` permissions via `create_dir_all_secure` | +| **Atomic write protocol (temp→fsync→rename→fsync_dir→DB)** | [files_repo.rs:244-277 write_atomic_at](crates/manager-core/src/files_repo.rs#L244-L277) | ✅ Steps 2–6 exactly as the prompt spec; DB INSERT is step 7 in the impl above; unique temp suffix `.tmp.-` avoids collisions; parent-dir fsync after rename | +| Single-pass SHA-256 (file never re-read on write) | [files_repo.rs:258-260](crates/manager-core/src/files_repo.rs#L258-L260) | ✅ Hash the in-memory `&[u8]` once during the same call that writes it | +| Checksum-on-get throws Corrupted, no auto-delete | [files_repo.rs:282-299 read_verify_at](crates/manager-core/src/files_repo.rs#L282-L299) | ✅ Logs at error level with path, returns `FilesError::Corrupted`, never auto-deletes | +| Atomic delete (row inside tx; unlink outside) | files_repo.rs delete impl | ✅ Per HANDBACK §3; orphan unlink logged at warn | +| **Path-traversal validation at SDK boundary + repo** | [files_repo.rs:201-211 guard_collection](crates/manager-core/src/files_repo.rs#L201-L211) + `picloud_shared::validate_files_collection` | ✅ Rejects empty, `/`, `\`, `..`, NUL. Defense in depth (SDK + repo). | +| Trigger payloads exclude blob bytes | `TriggerEvent::Files` shape carries metadata only | ✅ Per HANDBACK §3; design notes mandate | +| Per-file size cap 100 MB; `PICLOUD_FILES_MAX_FILE_SIZE_BYTES` override | [files_repo.rs:50, 106-115 FilesConfig::from_env](crates/manager-core/src/files_repo.rs#L50) | ✅ | +| `files:*` trigger kind (Layout E extension) | [0019_files_triggers.sql](crates/manager-core/migrations/0019_files_triggers.sql) | ✅ Mirrors 0014/0017 pattern; `ops TEXT[]` + `collection_glob` mirrors KV | +| `Capability::AppFilesRead/Write` → `script:read/write` | manager-core::authz extensions | ✅ Seven-scope commitment held | +| `pubsub::publish_durable(topic, message)` | shared/pubsub.rs + executor-core/src/sdk/pubsub.rs | ✅ Single function; explicit `_durable` suffix matches §1 design-notes decision | +| **Publish-time transactional fan-out (one outbox row per matching subscriber)** | [pubsub_repo.rs:70-117 fan_out_publish](crates/manager-core/src/pubsub_repo.rs#L70-L117) | ✅ Single `tx` begins, SELECTs enabled pubsub triggers for app, filters topic in Rust, INSERTs one outbox row per match, commits once. Cross-app gate via `WHERE t.app_id = $1`. `trigger_depth` saturating-bumped, `root_execution_id` propagated. | +| No-match publish succeeds silently | pubsub_repo.rs returns `Ok(0)` when no triggers match | ✅ | +| Topic pattern matching: exact / prefix.* / universal `*` | [shared/pubsub.rs:65-74 topic_matches](crates/shared/src/pubsub.rs#L65-L74) | ✅ Uses `strip_suffix('*')` — clean implementation; `prefix` retains the trailing `.` so `"user.*"` doesn't match `"users.created"` | +| **Mid-pattern wildcards rejected at validation** | [shared/pubsub.rs:85-100 validate_topic_pattern](crates/shared/src/pubsub.rs#L85-L100) | ✅ Tests pin rejection of `*.created`, `**`, `a.*.b`, `user.*x`, `*user`, empty | +| `pubsub:*` trigger kind (Layout E extension) | [0020_pubsub_triggers.sql](crates/manager-core/migrations/0020_pubsub_triggers.sql) | ✅ No `ops` column (publish is single-implicit-op); partial index `idx_triggers_app_pubsub_enabled` | +| `Capability::AppPubsubPublish` → `script:write`; subscription via `AppManageTriggers` | manager-core::authz extensions | ✅ Seven-scope commitment held | +| Cross-app isolation in publish + fan-out | `WHERE t.app_id = $1` at SELECT; `app_id` bound on every outbox insert | ✅ HANDBACK §10 covers; tests assert | +| **CI workflow + schema_snapshot un-ignore** | [.github/workflows/ci.yml](.github/workflows/ci.yml) + schema_snapshot.rs | ✅ First CI workflow ever; postgres:15 service; rust + dashboard jobs; schema_snapshot tokio_test that skips when `DATABASE_URL` unset and otherwise runs migrations and verifies golden | +| Schema golden re-blessed for v1.1.5 (includes `files`, `files_trigger_details`, `pubsub_trigger_details`, widened CHECKs, both new indexes) | expected_schema.txt | ✅ Per HANDBACK §5 | +| Versions: workspace 1.1.4→1.1.5, SDK 1.5→1.6, dashboard 0.10.0→0.11.0 | Cargo.toml + version.rs + package.json | ✅ All bumped | +| Migrations sequential 0018→0020 | migrations/ | ✅ | ## 3. Substantive strengths -**1. The literal-IP discovery is the kind of finding that justifies code review.** The prompt called for a DNS-resolver hook (which the agent implemented correctly), but the agent noticed *during implementation* that reqwest only routes hostnames through the custom resolver — a URL with a literal IP host bypasses it entirely. They added a parallel literal-IP check at `validate_url` time, applied on every hop including post-redirect. Test coverage exists for both paths (resolver and literal). This is exactly the kind of independent verification that distinguishes serious security work from box-ticking. +**1. The commit split.** v1.1.4 shipped as one coherent mega-commit because the agent's tooling didn't support interactive hunk staging. The v1.1.4 retro implicitly raised the question. The v1.1.5 agent split the work cleanly into `feat(v1.1.5): files SDK + files:* triggers` → `feat(v1.1.5): pubsub::publish_durable SDK + pubsub:* triggers` → `chore(v1.1.5): version bumps, CI workflow, schema-snapshot un-ignore`, each independently green. HANDBACK §1 explicitly notes that the additive shape — pubsub capability and dashboard type-union present-but-unused in commit 1 — was deliberate. This is the right shape for trunk-based review. -**2. SSRF error format discipline.** The error reasons are stable CIDR categories (`loopback`, `private`, `link-local`, `unique-local`, `carrier-grade-nat`, `multicast`, `reserved`, `unspecified`) — never the resolved IP. A script that probes `http://internal-host.example.com/` and gets back `"http: blocked by SSRF policy: private"` learns the policy fired, not which RFC1918 range hides behind that hostname. The internal network shape stays opaque to the attacker. +**2. The atomic write protocol is implemented exactly to spec.** Steps 2–6 live in `write_atomic_at` ([files_repo.rs:244-277](crates/manager-core/src/files_repo.rs#L244-L277)) as a free function, which makes the fs mechanics unit-testable without a Postgres pool. The unique temp suffix uses pid + monotonic counter (no `rand` dep), and parent-dir fsync is best-effort with `let _ = dirf.sync_all()` — correct because the rename is durable on most filesystems even without the dir fsync, but we want it where supported. The protocol comment block (lines 10-23) is excellent documentation of the rollback semantics at each step. -**3. The `SsrfBlocked` marker propagation pattern.** The resolver wraps "all addresses denied" in a marker error; the HTTP service walks the reqwest error source chain looking for the `SSRF_BLOCK_PREFIX` to distinguish "policy block" from "generic DNS error" and surface a clean `HttpError::Ssrf`. Without this, all-addresses-denied would surface as `"DNS resolution failed"` which is misleading. Subtle and correct. +**3. The pub/sub fan-out is correctly transactional.** [pubsub_repo.rs:70-117](crates/manager-core/src/pubsub_repo.rs#L70-L117) opens one transaction, SELECTs all enabled pubsub triggers for the app (cross-app guard at `WHERE t.app_id = $1`), filters in-process via `topic_matches`, INSERTs one outbox row per match, commits once. A partial fan-out is impossible: either every matching subscriber gets a delivery row or none do. `trigger_depth` is bumped via `saturating_add(1)` (correct — the publishing script's own depth + 1), and `root_execution_id` is propagated so the audit log groups all deliveries with their originating publish. -**4. Per-hop re-validation on redirects.** Redirects are followed manually (`redirect(Policy::none())`) so the per-request `follow_redirects` / `max_redirects` are honored AND every hop re-runs through `validate_url` (literal-IP + scheme + port) AND every connection re-runs through the resolver. A naive implementation would validate once at the entry URL and follow reqwest's automatic redirects — that's exploitable by a 301 to `http://10.0.0.1`. The agent didn't fall into this trap. +**4. Topic pattern matching is clean and precise.** The `topic_matches` implementation ([shared/pubsub.rs:65-74](crates/shared/src/pubsub.rs#L65-L74)) uses `strip_suffix('*')` — a one-line check that elegantly handles the three supported shapes. Crucially, `"user.*"` strips to `"user."` (including the dot), so `topic_matches("user.*", "users.created")` correctly returns false. `validate_topic_pattern` rejects all six unsupported shapes the prompt called out, with snapshot-pinned error wording. -**5. Cron fire-once catch-up is correctly implemented.** `next_due` returns the single next slot after `last_fired_at`, not a range. After firing, `last_fired_at = now` re-anchors. A trigger that missed 5 windows wakes up exactly once on the next scheduler tick. The test `catch_up_fires_exactly_once_after_5_missed_windows` pins it. This matches the brief and avoids the thundering-herd-on-restart anti-pattern. +**5. Path traversal defense in depth.** `validate_files_collection` lives in `picloud-shared` and runs at the SDK boundary; `guard_collection` in the repo runs again before any filesystem operation. Both reject empty, `/`, `\`, `..`, NUL. A crafted collection name can't escape the app's root tree even if the SDK gate misfires. -**6. The transactional cron-tick pattern.** `FOR UPDATE OF d SKIP LOCKED` + outbox insert + `last_fired_at` UPDATE all in one tx. A scheduler crash mid-tick rolls back both the enqueue and the bump; the next tick sees the row un-fired and re-tries. Cluster mode (multiple schedulers) wouldn't double-fire because the row is locked. No correctness issues I can construct. +**6. Discipline carryover.** Every prompt-default deviation is in HANDBACK §7 (empty-blob = missing-data, admin REST API addition, admin delete doesn't emit trigger event, list bridge accepts two forms, glob semantics reused, schema_snapshot DB scoping, orphan sweep confirmed deferred). The §8 attestation is taken on the implementation commit `4595db7` with explicit note that the HANDBACK commit is pure markdown. The v1.1.2/v1.1.3/v1.1.4 retro lessons stuck. -**7. The agent's discipline carryover from the v1.1.3 retro.** Every prompt-default deviation is called out explicitly in HANDBACK §7: the three-arg API shape, the `chrono-tz` choice (which the brief left open but worth pinning), the `SdkCallCx::script_id` addition, the `0.0.0.0/::` defensive superset, the cron crate choice. The v1.1.3 retro lesson stuck. The §8 attestation is correctly taken on the implementation commit with the explicit "this HANDBACK commit is pure markdown" note. +**7. CI workflow lands.** This is the first `.github/workflows/ci.yml` in the project — the v1.1.4 retro recommendation acted on without prompting. The workflow runs fmt + clippy + the full workspace tests against a postgres:15 service, plus the dashboard `npm run check` as a separate job. Schema golden silent drift across v1.1.1–v1.1.3 is now a regression the CI catches automatically. -## 4. The two flagged divergences (both correct) +**8. Schema-snapshot skip path is well-judged.** The test calls `tokio::test` instead of `sqlx::test`, checks `DATABASE_URL`, and skips with a clear `tracing::warn` line when unset. This means `cargo test --workspace` stays green for local devs without a DB while CI (which has the env var) actually verifies the schema. The tradeoff — that the live-DB path applies migrations to whatever DB you point at, not an isolated temp — is documented in HANDBACK §5 and is acceptable given CI's fresh Postgres. -### 4.1 Three-arg `verb(url, body, opts)` instead of brief's two-arg +## 4. Open questions answered -The agent diverged from the brief's documented shape and dropped `body_raw`. HANDBACK §7 #1 calls this out explicitly. Why this is correct: +HANDBACK §9 raises three: -The brief's Slack example (`http::post(url, #{ text: "alert" })`) was self-contradictory: -- Two-arg rule: `(url, opts)` — `#{ text: "alert" }` would be parsed as `opts` and `text` would throw as an unknown opt key. -- What the brief actually meant: `#{ text: "alert" }` is the body. +### 4.1 Orphan-sweep deferral -The agent resolved the contradiction by promoting body to a positional argument. The shape is now: -- `http::get(url)` / `http::get(url, opts)` — bodyless verbs -- `http::post(url)` / `http::post(url, body)` / `http::post(url, body, opts)` — body verbs -- Body type dispatch: Map/Array → JSON, String → text, `()` → no body -- `opts` vocabulary: `{headers, timeout_ms, follow_redirects, max_redirects}` — no `body`, no `body_raw` +**Verdict: accept.** Confirmed during planning. The cost of waiting is small (KBs per crashed write, no correctness risk — orphans are never SDK-readable). Defer to v1.1.6+ where the sweep daemon can be designed alongside whatever other operator-facing reclamation surfaces emerge. -`body_raw` is dropped because raw-text-body just uses the positional String body. Cleaner. The Slack example works unchanged. +### 4.2 Test count 63 vs the 70-90 target -**Verdict: accept the divergence.** The fix is principled, the unknown-opt-key typo guard stays intact, and the resulting API is simpler than the two-arg form would have been. Worth noting for the v1.1.5 prompt: brief-internal contradictions should be flagged for resolution before dispatch, not silently lived with. +**Verdict: accept the undershoot.** -### 4.2 Re-blessed stale `expected_schema.txt` golden +The agent's argument is sound: every named critical test in the prompt's §8 is present (atomic write rollback, checksum tampering, cross-app, path traversal, authz, fan-out transactional rollback, topic pattern shapes including all six rejections, multiple-matches, blob-to-base64). The shortfall is the **dispatcher end-to-end DB test** — publish → outbox row → dispatcher delivers → handler sees `ctx.event`. -The schema-snapshot test is `#[ignore]`'d (needs `DATABASE_URL`), so it never ran in CI. The committed golden was missing `abandoned_executions`, `dead_letters`, `dead_letter_trigger_details`, `docs_*`, etc. — all tables added in v1.1.1, v1.1.2, v1.1.3. The agent re-blessed the file as part of v1.1.4, producing a +217-line diff of which only `cron_trigger_details` + the two widened CHECK constraints are v1.1.4-new. +But: that end-to-end path is *entirely* through code that v1.1.1/v1.1.2/v1.1.4 already exercise. The dispatcher's `Files | Pubsub` match-arm extension is a one-line change. The handler's `ctx.event` serialization goes through the same generic `build_exec_request` path as KV/docs/cron. Adding a v1.1.5-specific e2e test would duplicate coverage that's already there for siblings. -The agent flagged this transparently in HANDBACK and recommended lifting `#[ignore]` with a CI DB service so the golden can't drift again. +If we wanted dispatcher e2e tests, they should be a workspace-wide effort (one test per trigger kind, gated on `DATABASE_URL`, picking up the new CI workflow's Postgres). That's a meaningful follow-up — worth flagging for v1.1.6 — but not v1.1.5's problem. -**Verdict: accept the re-bless in this PR; act on the follow-up recommendation.** +### 4.3 Empty-blob = missing-data -The drift correction lives on `main` going forward. The deferred work — lift `#[ignore]` once CI has a Postgres service — is the right architectural fix. Worth folding into the v1.1.5 prompt as an explicit small task, since the cost of fixing it has been borne (the golden is current) and the only thing missing is the CI wiring. +**Verdict: accept the deviation; relaxable later.** + +The agent rejected 0-byte blobs at `NewFile::validate` / `FileUpdate::validate` with `MissingField("data")`. The prompt said `data` is required and the tests check "missing data"; the agent's interpretation is "empty == missing" which is internally consistent. + +The cost: v1.1.5 can't store an intentionally-empty file. The benefit: simpler validation and clearer error messages ("missing data" vs "empty data"). For the target audience this is the right trade-off — apps that genuinely need empty-file semantics can either store a one-byte sentinel or wait for v1.2 to relax it. Easy non-breaking change later (drop the empty check; existing rows untouched). + +Flag for v1.1.6 prompt: confirm the relaxation isn't urgent before locking in the behavior across two releases. ## 5. Smaller observations (no action required) -- **Single-commit feature delivery.** The brief suggested split `feat(v1.1.4-http)` / `feat(v1.1.4-cron)` commits, but the two features cross shared files (`Cargo.toml`, `Services` bundle, `version.rs`, etc.) — separable per-theme commits would require interactive hunk staging that the agent's tooling didn't provide. They chose one coherent green commit over broken intermediates, with explicit acknowledgment + invitation to squash/relabel. Acceptable. -- **`SdkCallCx::script_id` addition.** Cross-cutting change (19 construction sites updated) needed because the default User-Agent template `picloud/ (script:)` requires the id and the cx didn't carry it. Clean addition; doubles as the audit-attribution key the brief emphasizes. HANDBACK §7 #4 flagged it. -- **`0.0.0.0` and `::` defensive additions.** The brief listed `0.0.0.0/8` (covered) but didn't list `::` (the IPv6 unspecified). The agent added both with reason `"unspecified"`. Defensible superset; minimal additional surface. -- **Live DB smoke went beyond the brief.** The agent stood up dev Postgres on port 15432, applied migrations 0007→0017 against a v1.1.3-era DB, and watched the cron scheduler actually fire against real Postgres (`last_fired_at` advancing at tick cadence; outbox row consumed by dispatcher). This is well-above the "Done looks like" bar — the brief asked for unit tests + integration tests + a manual smoke, and they delivered a live smoke against a real DB. -- **Container left running.** Side effect of the live smoke. The agent flagged this transparently. Run `docker compose down` when convenient; no data loss either way. +- **Admin file-delete bypasses `files:delete` trigger emission.** HANDBACK §7 #3 flagged this. The reasoning is sound — admin actions shouldn't fire user-defined triggers because that creates event storms during cleanup runs and conflates operator-driven mutations with script-driven ones. SDK deletes still emit; only the admin REST endpoint skips. Reasonable. +- **Admin files REST API addition** ([files_api.rs](crates/manager-core/src/files_api.rs)) was needed to back the dashboard view. Mirrors `triggers_api`'s direct-repo + capability pattern. HANDBACK §7 #2 flagged it. +- **`files` `list` bridge accepts both positional and map forms** (HANDBACK §7 #4). Additive convenience; the map form matches the prompt's example. Fine. +- **Collection-glob dialect reuses the existing `collection_matches`** (`*` / `foo*` prefix / exact) instead of introducing a new `"prefix:*"` form. Right call — keeping parity with KV/docs trigger semantics. HANDBACK §7 #5 flagged it. +- **`shared::pubsub::NoopPubsubService`** is added for the executor-core integration test harness — every call returns `PubsubError::Unavailable`. Same pattern as the existing `NoopEventEmitter`. Clean. +- **The publish saturating-add for `trigger_depth`** ([pubsub_repo.rs:107](crates/manager-core/src/pubsub_repo.rs#L107)) means a publish from depth-`u32::MAX` won't panic. That's already capped by `PICLOUD_MAX_TRIGGER_DEPTH` (default 8) at the dispatcher, but defensive overflow handling is correct. +- **`shared/src/pubsub.rs` tests** include four named cases (exact, prefix wildcard, universal, validation) with subcases — clean test taxonomy. -## 6. Open questions answered - -The HANDBACK §9 raises two open questions: - -1. **Three-arg HTTP shape** — confirmed acceptable (§4.1 above). -2. **Stale schema golden re-blessed** — confirmed acceptable (§4.2 above). - -No further questions outstanding. - -## 7. Versioning audit +## 6. Versioning audit | File | Before | After | Status | |---|---|---|---| -| Workspace `Cargo.toml` | 1.1.3 | 1.1.4 | ✅ | -| SDK schema (`shared/src/version.rs`) | 1.4 | 1.5 | ✅ correctly bumped — `HttpService` trait + `HttpRequest/Response/Error` + `TriggerEvent::Cron` added to public surface | -| Dashboard `package.json` | 0.9.0 | 0.10.0 | ✅ | -| Migrations | 0001..0016 | 0017 added | ✅ sequential, no skips | -| `rhai` pin | `"1.19"` | `"=1.24"` (workspace deps) | ✅ v1.1.3 follow-up §10b | -| CHANGELOG.md | v1.1.3 entry | v1.1.4 entry + retroactive v1.1.3 security note | ✅ §10c done | +| Workspace `Cargo.toml` | 1.1.4 | 1.1.5 | ✅ | +| SDK schema (`shared/src/version.rs`) | 1.5 | 1.6 | ✅ correctly bumped — `FilesService`, `PubsubService`, `FileMeta`, `NewFile`, `FileUpdate`, `topic_matches`, `validate_topic_pattern`, `TriggerEvent::{Files, Pubsub}` added to public surface | +| Dashboard `package.json` | 0.10.0 | 0.11.0 | ✅ | +| Migrations | 0001..0017 | 0018..0020 added | ✅ sequential, no skips | +| CHANGELOG.md | v1.1.4 entry | v1.1.5 entry added | ✅ | -## 8. Recommended next steps (post-merge) +## 7. Recommended next steps (post-merge) -1. **Merge** `feat/v1.1.4-http-cron` into `main` (fast-forward; branch is linear ahead). -2. **`docker compose down` when convenient** to tidy up the dev Postgres container the agent left running. -3. **Pause** before dispatching v1.1.5 (Files & Pub/Sub). -4. **For the v1.1.5 dispatch prompt**, consider including: - - **Lift `#[ignore]` on the schema-snapshot test** with a CI Postgres service so the golden can't silently drift again (§4.2). This is small, mechanical, and prevents a recurring problem. - - **Pre-resolve any brief-internal contradictions before dispatch.** The v1.1.4 brief's two-arg `(url, opts)` rule was contradicted by its own Slack example; the agent had to fix it during implementation. For v1.1.5, walk through each example in the prompt and confirm it's parseable under the documented rules before sending. - - **The literal-IP-bypass pattern** is worth remembering for any v1.1.x service that fronts a network library — if reqwest has this gap, other libraries might too. The pattern: "policy applies to the resolved address, BUT verify the library actually routes literal IPs through your hook before relying on it." +1. **Merge** `feat/v1.1.5-files-pubsub` into `main` (fast-forward; branch is linear ahead). +2. **Pause** before dispatching v1.1.6 (Realtime Channels & Client Library — the co-shipped SSE + `@picloud/client` work). +3. **For the v1.1.6 dispatch prompt**, consider folding in: + - **Dispatcher end-to-end DB tests** for each trigger kind. This is broader than v1.1.5 — it's a workspace-wide hygiene task. Now that CI has a Postgres service (per v1.1.5's `.github/workflows/ci.yml`), gating these tests on `DATABASE_URL` lets them run in CI without breaking local `cargo test`. Cost is bounded; the goal is to catch dispatcher regressions before they surface as production trigger silence. + - **Empty-blob storage** — revisit whether `data: 0 bytes` should be a valid stored state (currently rejected as missing). Decide before v1.1.6 ships so the semantics across two releases stay consistent. + - **Orphan file sweeper** — design + ship the simple `*.tmp.*` sweeper (defer the full DB-cross-check version to v1.3+). v1.1.6 is when the file storage will start to accumulate enough that operators notice. +4. **Awareness:** v1.1.5 is the first release where the CI workflow exists. If the project lands new contributors before v1.1.6, the workflow needs `secrets` review (none currently set) and possibly branch-protection rules pointing at the CI checks. Branch is ready for merge. Verdict: **APPROVE**.