docs(v1.1.6): handback report
Scope coverage, realtime + client-lib notes, §8 attestation (482 workspace tests; e2e 6/6 + schema snapshot verified on a real DB), every prompt-default deviation, and the latent dead_letter-trigger fan-out finding + §4-vs-§8 ordering question for the reviewer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
297
HANDBACK.md
297
HANDBACK.md
@@ -1,127 +1,220 @@
|
||||
# HANDBACK — v1.1.5 Files & Pub/Sub
|
||||
# HANDBACK — v1.1.6 Realtime Channels & Client Library
|
||||
|
||||
## §1 Branch + commits
|
||||
Branch: `feat/v1.1.6-realtime-client` (from `main`). Not pushed, no PR.
|
||||
|
||||
- **Branch:** `feat/v1.1.5-files-pubsub` (off `main`). Not pushed, not merged, no PR.
|
||||
- **Commits:** the two-feature split decided in planning + a finalize commit; HANDBACK is the 4th (docs):
|
||||
1. `6e132b6 feat(v1.1.5): files SDK + files:* triggers`
|
||||
2. `834c787 feat(v1.1.5): pubsub::publish_durable SDK + pubsub:* triggers`
|
||||
3. `4595db7 chore(v1.1.5): version bumps, CI workflow, schema-snapshot un-ignore`
|
||||
4. `docs(v1.1.5): handback report` (this file)
|
||||
## 1. Scope coverage (§1–§13)
|
||||
|
||||
Each of commits 1–3 is independently green (fmt + clippy + `cargo test --workspace`). Shared files (Cargo deps, `Services` bundle, `version.rs`, dispatcher arm, authz enum, CHANGELOG) are touched in both feature commits as planned — additive only, so commit 1 compiles green with the `AppPubsubPublish` capability and the dashboard `'pubsub'` type union present-but-unused until commit 2.
|
||||
| § | Item | Status |
|
||||
|---|------|--------|
|
||||
| 1 | `topics` table (`0021_topics.sql`) | ✅ Done |
|
||||
| 2 | Topic admin endpoints (POST/GET/PATCH/DELETE), `AppTopicManage` | ✅ Done |
|
||||
| 3 | SSE endpoint `GET /realtime/topics/{topic}` | ✅ Done |
|
||||
| 4 | In-process `RealtimeBroadcaster` + GC + publish wiring | ✅ Done |
|
||||
| 5 | HMAC subscriber tokens + `app_secrets` (`0022`) + `pubsub::subscriber_token` | ✅ Done |
|
||||
| 6 | Dashboard Topics tab | ✅ Done |
|
||||
| 7 | `@picloud/client` TypeScript package | ✅ Done |
|
||||
| 8 | Topic-aware publish → realtime wiring | ✅ Done |
|
||||
| 9 | Dispatcher e2e tests (six) | ✅ Done (location deviation — see §7) |
|
||||
| 10 | Empty-blob decision | ✅ Done — **relaxed** (accept empty) |
|
||||
| 11 | Orphan `*.tmp.*` sweeper | ✅ Done |
|
||||
| 12 | Version bumps (1.1.6 / SDK 1.7 / dash 0.12.0 / client 1.0.0) | ✅ Done |
|
||||
| 13 | Tests (all named-critical cases) | ✅ Done |
|
||||
|
||||
## §2 Scope coverage
|
||||
## 2. Realtime implementation notes
|
||||
|
||||
| Brief item | Status | Notes |
|
||||
|---|---|---|
|
||||
| §1 `files::*` SDK | ✅ | `create/head/get/update/delete/list`, blob in/out, metadata maps, throw-vs-`()` convention. |
|
||||
| §1 migration 0018_files.sql | ✅ | metadata table + `idx_files_app_collection`. Bytes on disk, never in PG. |
|
||||
| §1 atomic writes/deletes, checksum, size+name+type caps, authz, events | ✅ | See §3. |
|
||||
| §2 `files:*` trigger (Layout-E, 0019) | ✅ | widen 2 CHECKs + `files_trigger_details`; `TriggerEvent::Files` (metadata only); admin `POST /triggers/files`; `emit_files`; dispatcher arm. |
|
||||
| §3 `pubsub::publish_durable` SDK | ✅ | publish-time transactional fan-out; topic matching in Rust; succeed-silently on no match. |
|
||||
| §4 `pubsub:*` trigger (Layout-E, 0020) | ✅ | widen 2 CHECKs + `pubsub_trigger_details` + partial index; `TriggerEvent::Pubsub`; admin `POST /triggers/pubsub`; dispatcher arm. |
|
||||
| §5 dashboard Files view | ✅ | `apps/[slug]/files/+page.svelte` (list per collection, per-row delete w/ confirm). Backed by a new admin files API (§7.2). |
|
||||
| §5 dashboard Pub/Sub trigger form | ✅ | added to the Triggers tab beside Cron; trigger-list renders files + pubsub. `npm run check` clean. |
|
||||
| §6 schema_snapshot CI follow-up | ✅ | §6b skip-when-absent + un-ignore; §6a new `.github/workflows/ci.yml`. See §5. |
|
||||
| §7 version bumps | ✅ | workspace 1.1.4→1.1.5, SDK 1.5→1.6, dashboard 0.10.0→0.11.0, CHANGELOG, CLAUDE.md env table. |
|
||||
| §8 tests | ⚠️ | 63 new tests (target 70–90). Every *named* critical test covered; gap is the dispatcher end-to-end DB test (see §9.2). |
|
||||
### Topic resolution / SSE handshake sequence
|
||||
1. Extract `Host` → `app_domains.resolve_app(host)` (existing two-phase
|
||||
dispatch). No app → **404**.
|
||||
2. Token from `Authorization: Bearer <t>` **or** `?token=<t>` (EventSource
|
||||
can't set headers).
|
||||
3. `RealtimeAuthority::authorize_subscribe(app_id, topic, token)`:
|
||||
- topic missing OR `external_subscribable = false` → `NotFound` → **404**
|
||||
(both collapse to 404 so the endpoint can't probe internal topics);
|
||||
- `auth_mode='public'` → allow;
|
||||
- `auth_mode='token'` → verify HMAC (present, signed by this app's key,
|
||||
unexpired, scoped to this topic) → allow, else `Unauthorized` → **401**
|
||||
(generic; never says which check failed).
|
||||
4. `broadcaster.subscribe(app_id, topic)` → `broadcast::Receiver`; stream
|
||||
`data: {topic,message,published_at}\n\n` with `:heartbeat` keepalive
|
||||
every `PICLOUD_REALTIME_HEARTBEAT_SEC` (default 30). Headers:
|
||||
`text/event-stream`, `Cache-Control: no-cache` (set by axum Sse),
|
||||
`X-Accel-Buffering: no` (added). Client disconnect drops the receiver →
|
||||
automatic cleanup; the periodic GC reaps empty channels.
|
||||
|
||||
## §3 Files implementation notes
|
||||
The SSE handler lives in **orchestrator-core** (`realtime_api.rs`) and
|
||||
depends only on the three picloud-shared traits — all DB + signing-key
|
||||
access stays in the manager-core `RealtimeAuthority` impl, so the
|
||||
data-plane crate never touches the key. Cluster mode (v1.3+) swaps the
|
||||
broadcaster + authority impls behind the same traits.
|
||||
|
||||
**Service layering** (`FilesServiceImpl`, manager-core): validate collection (empty + traversal) → script-as-gate authz (`AppFilesRead`/`AppFilesWrite`, skipped when `cx.principal` is `None`) → field/size-cap validation → repo call keyed by `cx.app_id` → best-effort `ServiceEvent` emit. `executor-core` has **no** Postgres or filesystem dependency — both traits live in `picloud-shared`, the impl in manager-core.
|
||||
### HMAC signing-key storage — chose **table** (`app_secrets`)
|
||||
Per the asked decision. 32 random bytes per app, lazily created on the
|
||||
first `pubsub::subscriber_token` call (`ON CONFLICT DO NOTHING` for
|
||||
concurrency). No global `PICLOUD_INSTANCE_SECRET`; the table is the
|
||||
natural home for v1.1.7's encrypted per-app secrets.
|
||||
**Tension flagged:** §5 aspired to "validate the signature without a DB
|
||||
lookup". The table approach needs a key read on the token-subscribe path.
|
||||
Mitigated by an in-memory key cache in `RealtimeAuthorityImpl` (keys never
|
||||
rotate in v1.1.6, so the cache needs no invalidation); the subscribe path
|
||||
already reads the `topics` row, so this adds no new round-trip *category*.
|
||||
|
||||
**Atomic-write protocol** (`write_atomic_at`, a free fn so it's unit-testable without a pool):
|
||||
1. Validate collection path-safety (defensive — already enforced at the SDK boundary).
|
||||
2. `create_dir_all` the shard dir `<root>/files/<app_id>/<collection>/<id[0:2]>/<id>` with `0o700` (Unix `DirBuilderExt::mode`).
|
||||
3. SHA-256 the in-memory bytes (single pass — never re-reads the file) while writing to `<final>.tmp.<pid>-<atomic-counter>`.
|
||||
4. `sync_all()` the temp file.
|
||||
5. `rename(tmp, final)` — atomic on POSIX.
|
||||
6. `sync_all()` the parent dir (rename durability).
|
||||
7. INSERT/UPDATE the DB row.
|
||||
### In-process broadcaster
|
||||
`Mutex<HashMap<(AppId, String), broadcast::Sender<RealtimeEvent>>>`.
|
||||
Capacity per channel `PICLOUD_REALTIME_BROADCAST_CAPACITY` (default 64);
|
||||
slow consumers lose oldest events (`broadcast` lag semantics — best-effort,
|
||||
no replay). `subscribe` creates channels lazily; `publish` is a silent
|
||||
no-op when no channel exists; `drop_topic` removes the sender (existing
|
||||
receivers observe a closed channel and disconnect). A `spawn_realtime_gc`
|
||||
task (~60s) drops senders with `receiver_count() == 0`.
|
||||
|
||||
Rollback per step: crash in 1–5 → orphan `*.tmp.*` (never read; the pid+counter suffix avoids collisions); crash in 5–7 → bytes with no row, **never reachable via the SDK** because every read starts from the row. `update` reads the prior row first (existence + CDC `prev`), writes new bytes, then UPDATEs.
|
||||
### Publish wiring (order + failure modes) — §8 ordering chosen
|
||||
`PubsubServiceImpl::publish_durable`: validate/authz → **transactional
|
||||
outbox fan-out + commit** (existing) → **then** best-effort
|
||||
`RealtimeBroadcaster::publish`. The broadcast runs on a child
|
||||
`tokio::spawn` whose `JoinHandle` is awaited, so a panicking broadcaster
|
||||
becomes a `warn` log, never a failed publish (the durable deliveries
|
||||
already committed). One `published_at` instant is shared by both paths.
|
||||
**Brief-internal contradiction flagged** — see §9.
|
||||
|
||||
**Atomic-delete protocol** (`FsFilesRepo::delete`): `SELECT … FOR UPDATE` + `DELETE` in one transaction → commit → `unlink` outside the tx. Unlink failure leaves an orphan (logged at warn); failure before commit changes nothing. Returns the deleted metadata so the service can emit.
|
||||
## 3. Client lib implementation notes
|
||||
|
||||
**Path-traversal validation:** `picloud_shared::validate_files_collection` rejects empty / `/` / `\` / `..` / NUL at the SDK boundary; `FsFilesRepo::guard_collection` repeats it before any fs op. UUID ids can't produce traversal (verified defensively).
|
||||
- **Build:** **tsup** (dual ESM+CJS + `.d.ts`), tests **vitest**, lint =
|
||||
`tsc --noEmit` (strict; `noUncheckedIndexedAccess`, no `any` in exports).
|
||||
- **Layout:** `src/{index,client,endpoint,subscribe,auth,types}.ts` +
|
||||
`src/react/index.ts` + `src/svelte/index.ts`; subpath exports
|
||||
`@picloud/client/react` and `@picloud/client/svelte` via package `exports`.
|
||||
- **Reconnect:** SSE is implemented over **streaming `fetch`** (not native
|
||||
`EventSource`) so the lib can (a) detect a **401** on (re)connect and call
|
||||
`onTokenExpired` to refresh, (b) send **`Last-Event-ID`** on resume
|
||||
(server ignores it in v1.1.6 — client ships ready), and (c) apply its own
|
||||
**exponential backoff** (base 1s → ×2 → cap 30s; reset on successful
|
||||
open). Token rides in `?token=` (EventSource-parity). React Native caveat
|
||||
documented in the README (supply a streaming-`fetch` polyfill).
|
||||
- **zod/valibot adapter:** the `Validator<T> = { parse(input): T }` shape.
|
||||
A Zod schema satisfies it directly; Valibot wraps in one line. Used by
|
||||
`endpoint(...).get({ validate })` and `subscribe(..., { validate })`. No
|
||||
hard dep.
|
||||
|
||||
**Per-call SHA-256:** computed once over the in-memory `Vec<u8>` during the write (`sha2::Sha256`), hex-lowercased, stored on the row. The file is never re-read to hash. Known-vector tests pin `SHA-256("abc")` and `SHA-256("")`.
|
||||
## 4. v1.1.5 follow-ups
|
||||
|
||||
**Checksum-on-get:** `get` reads the file, re-hashes, compares to the stored checksum. Mismatch (or missing bytes while the row persists) → `FilesError::Corrupted`, logged at error level with the path, **no auto-delete**. To scripts this surfaces as a thrown Rhai error `"files: file content corrupted (checksum mismatch)"`.
|
||||
- **Dispatcher e2e (six):** `dispatcher_delivers_{kv,docs,cron,files,pubsub,
|
||||
dead_letter}_to_handler` in `crates/picloud/tests/dispatcher_e2e.rs`.
|
||||
Verified green against a real Postgres (all 6).
|
||||
- **Empty-blob — RELAXED** (per the asked decision). Dropped the
|
||||
`data.is_empty()` rejection in `NewFile::validate` **and** `FileUpdate::
|
||||
validate` (the latter for consistency — flagged in §7). Flipped the
|
||||
pinning test in `files_service.rs` and added `empty_file_round_trips`.
|
||||
- **Orphan sweep:** `spawn_files_orphan_sweep` (`files_sweep.rs`), every 6h
|
||||
(`PICLOUD_FILES_ORPHAN_SWEEP_INTERVAL_SEC`), unlinks `*.tmp.*` older than
|
||||
1h (`PICLOUD_FILES_ORPHAN_TMP_TTL_SEC`); logs dirs-walked / files-deleted /
|
||||
bytes-reclaimed. No DB cross-check (v1.3+). Tested: deletes old tmp, keeps
|
||||
young tmp, keeps non-tmp, missing-root no panic.
|
||||
|
||||
## §4 Pub/Sub implementation notes
|
||||
## 5. Schema decisions beyond the brief
|
||||
None — `0021_topics.sql` and `0022_app_secrets.sql` match the brief's DDL
|
||||
verbatim. Schema-snapshot golden re-blessed on a fresh DB (only the two new
|
||||
tables added; PK/FK `ON DELETE CASCADE` + the `auth_mode` CHECK present).
|
||||
|
||||
**Fan-out-at-publish-time, transactional** (`PostgresPubsubRepo::fan_out_publish`): one transaction — `SELECT` all enabled pubsub triggers for the app (joined to `pubsub_trigger_details`), filter by `topic_matches` in Rust, INSERT one `outbox` row (`source_kind='pubsub'`) per survivor, commit once. A mid-fan-out failure rolls back every row (no half-fan-out). Each delivery row then retries/dead-letters independently through the unchanged dispatcher (its trigger arm just gained `| OutboxSourceKind::Pubsub`).
|
||||
## 6. How to verify locally
|
||||
See §8 below.
|
||||
|
||||
**Topic pattern matching** runs in Rust (`picloud_shared::topic_matches`), not SQL: `"*"` → all; `"<prefix>.*"` → `starts_with("<prefix>.")`; otherwise exact. `validate_topic_pattern` (used at trigger creation in the admin endpoint and defensively in the repo) accepts only `*` / `<prefix>.*` / no-star-exact, rejecting `*.created`, `**`, `a.*.b`, `user.*x`, etc. with `"unsupported pubsub topic pattern: …"`.
|
||||
## 7. Decisions beyond the brief — every prompt-default deviation
|
||||
1. **Dispatcher e2e location:** the brief says
|
||||
`crates/manager-core/tests/dispatcher_e2e.rs`; I put it in
|
||||
**`crates/picloud/tests/`**. `build_app` (the full dispatcher + scheduler
|
||||
+ executor wiring) lives in the `picloud` crate; a manager-core test would
|
||||
need a manager→picloud dev-dependency cycle or a hand-rolled re-wire of
|
||||
build_app. The picloud harness (`server_with_app` pattern) is proven and
|
||||
the tests run green against a real DB. Gating uses the **schema_snapshot
|
||||
pattern** (env check + early return), NOT `#[ignore]`, so CI's plain
|
||||
`cargo test` with `DATABASE_URL` runs them while local stays green — as
|
||||
the brief requested.
|
||||
2. **Empty-blob relaxation extended to `FileUpdate::validate`** — the brief
|
||||
names only `NewFile::validate`. Relaxing create-empty but rejecting
|
||||
update-to-empty would be an inconsistent API, so I relaxed both.
|
||||
3. **Publish order: §8 (broadcast AFTER outbox commit)**, not §4 (broadcast
|
||||
FIRST). The two sections contradict; §8 is the dedicated, numbered,
|
||||
rationale-backed section. See §9.
|
||||
4. **Topic-name validation** — `topics_api` rejects empty names and names
|
||||
containing `*` (external pattern subscription is v2). The brief didn't
|
||||
specify name validation; this is a small guard.
|
||||
5. **Client lib lint = `tsc --noEmit`** (not eslint) to keep devDeps lean;
|
||||
strict typecheck is the gate. "No `any` in exports" is enforced by review
|
||||
+ strict TS, not an eslint rule.
|
||||
6. **Cron e2e poll budget = 45s** — the cron scheduler skips its first tick
|
||||
then ticks every 30s (default). The test polls 45s so it passes at the
|
||||
default; set `PICLOUD_CRON_TICK_INTERVAL_MS=1000` to make it ~2s in CI.
|
||||
|
||||
**No matching trigger → the publish succeeds, zero outbox rows** (the design-notes-preferred succeed-silently). `published_at` is stamped manager-side (`Utc::now()`) so every delivery agrees on one instant. `ctx.event.pubsub = #{ topic, message, published_at }`, `ctx.event.op = "publish"`.
|
||||
|
||||
There is **no `list_matching_pubsub` on `TriggerRepo`** — pubsub publishes directly (it's not a `ServiceEvent`), so the fan-out SELECT lives in `pubsub_repo`, not the `OutboxEventEmitter`. This is the one structural asymmetry vs files/kv/docs, intentional per the publish-time-fan-out decision.
|
||||
|
||||
## §5 CI follow-up (§6) status
|
||||
|
||||
- **Pre-existing CI:** none (no `.github/`, no `.gitlab-ci.yml`).
|
||||
- **§6a (added):** `.github/workflows/ci.yml` — a `rust` job with a `postgres:15` service (`DATABASE_URL=postgres://picloud:picloud@localhost:5432/picloud`) running `cargo fmt --all -- --check`, `cargo clippy --all-targets --all-features -- -D warnings`, `cargo test --workspace`; a separate `dashboard` job running `npm ci` + `npm run check`.
|
||||
- **§6b (done):** `schema_snapshot.rs` is no longer `#[ignore]`'d. Reworked from `#[sqlx::test]` to `#[tokio::test]` that **skips cleanly when `DATABASE_URL` is unset** (chosen over fail-loud so `cargo test --workspace` stays green locally) and otherwise connects, runs `sqlx::migrate!`, and dumps. Golden `expected_schema.txt` re-blessed (now contains `files`, `files_trigger_details`, `pubsub_trigger_details`, both widened CHECKs, `idx_files_app_collection`, `idx_triggers_app_pubsub_enabled`, and migrations 0018–0020).
|
||||
- **Tradeoff (documented):** the non-`sqlx::test` path applies migrations against the `DATABASE_URL` database directly rather than an isolated throwaway DB. Migrations are forward-only/idempotent and CI's Postgres is fresh, so the structural dump is identical; locally it will also apply 0018–0020 to whatever DB you point at.
|
||||
|
||||
## §6 Schema decisions beyond the brief
|
||||
|
||||
- `files` table is verbatim from the brief. `files_trigger_details` / `pubsub_trigger_details` mirror `kv_trigger_details` / `cron_trigger_details`.
|
||||
- `pubsub_trigger_details` has no `ops` column (a publish has a single implicit op) — only `topic_pattern`.
|
||||
- `idx_triggers_app_pubsub_enabled` is the third partial index of its kind (per the brief's note); deliberate duplication.
|
||||
|
||||
## §7 Decisions beyond the brief (every prompt-default deviation)
|
||||
|
||||
1. **Empty blob treated as a missing `data` field.** `NewFile::validate` / `FileUpdate::validate` reject 0-byte `data` with `FilesError::MissingField("data")`. The brief lists `data` as required and tests "missing … data"; the cleanest testable interpretation at the service layer is "empty == missing". Consequence: v1.1.5 cannot store an intentionally-empty file. Easy to relax later.
|
||||
2. **Admin files REST API added** (`files_api.rs`: `GET /apps/{id}/files?collection=…`, `DELETE /apps/{id}/files/{collection}/{file_id}`). The brief's §5 dashboard needs a backend but didn't spell out admin endpoints; I added a minimal one mirroring `triggers_api`'s direct-repo + capability pattern (`AppFilesRead` for list, `AppFilesWrite` for delete).
|
||||
3. **Admin file delete does NOT emit a `files:delete` trigger event.** It's an operator cleanup action, not a script mutation, so it goes straight to the repo. SDK deletes still emit. Flagging because "every successful mutation emits" could be read to include admin deletes.
|
||||
4. **Files `list` bridge accepts both positional and map forms** — `list()`, `list(cursor)`, `list(cursor, limit)`, and `list(#{ cursor, limit })` (the map form the brief's example used). Additive convenience.
|
||||
5. **Files collection-glob semantics reuse the existing `collection_matches`** (`*` / `foo*` prefix / exact), identical to kv/docs. The brief mentioned a `"prefix:*"` form in one spot; I kept parity with the established kv/docs matcher rather than introduce a new glob dialect.
|
||||
6. **schema_snapshot runs against the live `DATABASE_URL` DB** rather than an isolated temp DB (see §5).
|
||||
7. **Orphan sweep deferred to v1.1.6+** — confirmed with the user during planning (the brief's recommended default). No `*.tmp.*` sweeper daemon shipped.
|
||||
|
||||
## §8 How to verify locally — attestation (fresh run on HEAD `4595db7`)
|
||||
## 8. Attestation (gate runs on this HEAD)
|
||||
|
||||
```
|
||||
cargo fmt --all -- --check → exit 0
|
||||
cargo clippy --all-targets --all-features -- -D warnings → exit 0
|
||||
cargo test --workspace → 491 passed, 0 failed (exit 0)
|
||||
(schema_snapshot skips cleanly with no DATABASE_URL)
|
||||
cd dashboard && npm run check → 0 errors, 0 warnings (exit 0)
|
||||
cargo fmt --all -- --check → clean
|
||||
cargo clippy --all-targets --all-features -- -D warnings → clean (exit 0)
|
||||
cargo test --workspace → 482 passed, 0 failed
|
||||
(DB-gated dispatcher_e2e
|
||||
auto-skip as no-op when
|
||||
DATABASE_URL unset)
|
||||
# DB-gated (real Postgres @ 127.0.0.1:15432, picloud/picloud):
|
||||
DATABASE_URL=… PICLOUD_CRON_TICK_INTERVAL_MS=1000 \
|
||||
cargo test -p picloud --test dispatcher_e2e → 6 passed
|
||||
DATABASE_URL=… cargo test -p picloud-manager-core --test schema_snapshot → 1 passed
|
||||
(cd dashboard && npm run check) → 0 errors, 0 warnings (371 files)
|
||||
(cd clients/typescript && npm run lint) → clean (tsc --noEmit)
|
||||
(cd clients/typescript && npm run test) → 15 passed (5 files)
|
||||
(cd clients/typescript && npm run build) → tsup ESM+CJS+.d.ts OK
|
||||
```
|
||||
Migrations verified applying cleanly on a fresh DB **and** on top of the
|
||||
existing v1.1.5 dev DB (0020 → 0021 → 0022). Schema-snapshot golden diff is
|
||||
exactly the two new tables.
|
||||
|
||||
With a live Postgres (the schema guardrail actually verifies the schema):
|
||||
```
|
||||
DATABASE_URL=postgres://picloud:picloud@127.0.0.1:15432/picloud \
|
||||
cargo test -p picloud-manager-core --test schema_snapshot → test result: ok. 1 passed
|
||||
```
|
||||
Migrations 0018–0020 applied cleanly on top of the existing v1.1.4 dev DB during the re-bless — the same `sqlx::migrate!` replay CI runs on a fresh Postgres.
|
||||
## 9. Open questions for the reviewer
|
||||
1. **§4 vs §8 ordering contradiction (load-bearing, flagged not
|
||||
reinterpreted):** §4 says "realtime broadcast FIRST, then transactional
|
||||
outbox fan-out"; §8 says broadcast AFTER the fan-out commits (numbered
|
||||
steps 1–4). I implemented **§8** (dedicated section + failure-mode
|
||||
rationale). If §4's ordering was intended, the broadcast should move
|
||||
before `fan_out_publish` — but then a publish whose outbox write fails
|
||||
would still have notified SSE subscribers of an event that never durably
|
||||
happened, which seems wrong. Please confirm §8 is correct.
|
||||
2. **Dead-letter → handler fan-out appears unwired (see §10).** Confirm
|
||||
whether the `dead_letter` trigger source is intended to fire in v1.1.x.
|
||||
|
||||
Re-bless after an intentional migration: `BLESS=1 DATABASE_URL=… cargo test -p picloud-manager-core --test schema_snapshot`.
|
||||
## 10. Latent findings
|
||||
**`dead_letter` trigger handlers do not fire on dead-letter creation.**
|
||||
`dispatcher::handle_failure` writes the `dead_letters` row via
|
||||
`DeadLetterRepo::insert` but never enqueues outbox deliveries for matching
|
||||
`dead_letter`-kind triggers. `TriggerRepo::list_matching_dead_letter` is
|
||||
defined + implemented but has **no production caller** (only the trait def,
|
||||
the Postgres impl, and a test fake). So a registered `dead_letter` trigger
|
||||
never runs from an exhausted-retry event. This predates v1.1.6 (the design
|
||||
notes §4 describe it shipping in v1.1.1). I did **not** fix it (out of
|
||||
v1.1.6 scope) — flagging for the reviewer. Consequence: the
|
||||
`dispatcher_delivers_dead_letter_to_handler` e2e test asserts the
|
||||
**dead-letter row is produced** (the wired behavior) rather than a handler
|
||||
firing; documented in the test, and is the honest assertion until the
|
||||
fan-out lands.
|
||||
|
||||
**Not run this session:** the full running-binary manual smoke (a script that does `files::collection("uploads").create(...)` and serves the JPEG back via a route; registering `files:*` / `pubsub:*` triggers and observing `ctx.event`). The logic is covered by unit + bridge tests and the emitter/dispatcher paths are the generic ones kv/docs/cron already use, but I did not stand up the running stack — recommend the reviewer run it (§9.2).
|
||||
No new security gaps introduced. Cross-app isolation holds: `subscriber_token`
|
||||
claims carry `app_id` and are signed per-app; the SSE authority rejects
|
||||
cross-app tokens (a per-app key already fails the signature, plus an explicit
|
||||
`claims.app_id == app_id` check); topic admin endpoints bind the capability
|
||||
to the path `app_id` after loading the app; the broadcaster keys channels by
|
||||
`(AppId, topic)` so app A's publishes never reach app B's subscribers
|
||||
(unit-tested).
|
||||
|
||||
## §9 Open questions for the reviewer
|
||||
## 11. Deferred items (beyond this prompt's OUT list)
|
||||
None added. Everything on the brief's OUT list stayed out (WebSocket,
|
||||
session/script auth modes, topic-pattern external subscription, server-side
|
||||
last-event-id replay, per-app SSE/rate limits, other-language SDKs, codegen,
|
||||
full DB-cross-check sweeper).
|
||||
|
||||
1. **Orphan sweep** — deferred to v1.1.6+ per the planning decision. Confirm shipping v1.1.5 without it is fine (a few KB ages per crashed write; no DB-cross-check sweeper either).
|
||||
2. **Test count 63 vs the 70–90 target.** Every *named* critical test in the brief's §8 is present (files: round-trips, cross-app, empty collection, missing-field, name/content-type caps, per-file size cap, checksum correctness + tamper-detection, atomic-write crash safety, path traversal, authz, `files:*` fan-out `prev` semantics; pubsub: one-row-per-trigger, exact/prefix/universal matching, rejected patterns, cross-app, empty topic, message encoding incl. blob→base64, transactional rollback, multiple matches). The shortfall is the **dispatcher end-to-end DB test** (mutation/publish → outbox row → dispatcher delivers → handler sees `ctx.event`). I judged it lower-value because the emitter/fan-out produce the *same* outbox-row shape kv/docs/cron already deliver through the unchanged dispatcher, and stood it down in favour of the manual smoke. Want a `DATABASE_URL`-gated integration test added for it?
|
||||
3. **Empty-blob = missing-data** (§7.1) — acceptable, or should empty files be storable?
|
||||
|
||||
## §10 Latent security findings
|
||||
|
||||
None new. Checked specifically: (a) cross-app isolation is keyed on `cx.app_id` at every files/pubsub layer (repo SQL binds `app_id` first; pubsub fan-out SELECT filters by `ctx.app_id`); tests assert app A can't see/fire app B's files/triggers. (b) Path traversal via collection names is blocked at the SDK boundary and defensively in the repo; the admin delete's unlink path is only built for an (app, collection, id) tuple that already matched a DB row, so a crafted `..` segment can't unlink arbitrary files. (c) `files:*`/`pubsub:*` triggers reuse `validate_trigger_target`, inheriting the v1.1.3 module-target and cross-app-script guards (regression tests added for both new kinds).
|
||||
|
||||
## §11 Deferred items (per brief Scope-OUT + orphan-sweep decision)
|
||||
|
||||
`publish_ephemeral` (v1.2), per-app storage quotas (v1.2), file dedup (v1.2+), presigned URLs / external download tokens (v1.1.6+), streaming up/download (Rhai is sync), file-level ACLs (v1.2+), mid-pattern wildcards (v1.2), topic ACLs / external subscription / `topics` table (v1.1.6), realtime SSE (v1.1.6), and the **orphan-file sweep daemon** (v1.1.6+ — confirmed deferred).
|
||||
|
||||
## §12 Known limitations / rough edges
|
||||
|
||||
- **No orphan reclamation** — crashed writes leave `*.tmp.*`; rename-completed-but-DB-failed leaves unreferenced bytes. Both are harmless (never SDK-readable) but accumulate until v1.1.6's sweeper.
|
||||
- **Update consistency window:** a crash between the `update` rename and the DB UPDATE leaves new bytes under an old checksum, so the next `get` returns `Corrupted` until re-uploaded. This is the brief's accepted step-5–7 window, surfaced honestly.
|
||||
- **Pub/sub fan-out holds one transaction across all subscribers** — fine at v1.1.x scale; a topic-trie index is the v1.2 escape hatch if it becomes a hot path.
|
||||
- **Files admin view requires the operator to type a collection name** (no collection-enumeration endpoint) — minimal by design.
|
||||
- **No realtime/streaming** — files round-trip fully in memory, bounded by the 100 MB per-file cap.
|
||||
## 12. Known limitations / rough edges
|
||||
- SSE delivery is best-effort at-most-once; a slow consumer past the
|
||||
broadcast buffer loses events with no server-side replay (by design).
|
||||
- The client lib's streaming-`fetch` SSE needs a polyfill on runtimes
|
||||
without `fetch` body streaming (React Native) — documented.
|
||||
- Cron e2e takes ~31s at the default 30s tick interval (45s poll budget);
|
||||
set `PICLOUD_CRON_TICK_INTERVAL_MS=1000` to speed it up.
|
||||
- The realtime key cache in `RealtimeAuthorityImpl` is per-process and never
|
||||
invalidated — correct only because v1.1.6 has no key rotation. A future
|
||||
rotation API must clear it.
|
||||
|
||||
Reference in New Issue
Block a user