# HANDBACK — v1.1.6 Realtime Channels & Client Library Branch: `feat/v1.1.6-realtime-client` (from `main`). Not pushed, no PR. ## 1. Scope coverage (§1–§13) | § | Item | Status | |---|------|--------| | 1 | `topics` table (`0021_topics.sql`) | ✅ Done | | 2 | Topic admin endpoints (POST/GET/PATCH/DELETE), `AppTopicManage` | ✅ Done | | 3 | SSE endpoint `GET /realtime/topics/{topic}` | ✅ Done | | 4 | In-process `RealtimeBroadcaster` + GC + publish wiring | ✅ Done | | 5 | HMAC subscriber tokens + `app_secrets` (`0022`) + `pubsub::subscriber_token` | ✅ Done | | 6 | Dashboard Topics tab | ✅ Done | | 7 | `@picloud/client` TypeScript package | ✅ Done | | 8 | Topic-aware publish → realtime wiring | ✅ Done | | 9 | Dispatcher e2e tests (six) | ✅ Done (location deviation — see §7) | | 10 | Empty-blob decision | ✅ Done — **relaxed** (accept empty) | | 11 | Orphan `*.tmp.*` sweeper | ✅ Done | | 12 | Version bumps (1.1.6 / SDK 1.7 / dash 0.12.0 / client 1.0.0) | ✅ Done | | 13 | Tests (all named-critical cases) | ✅ Done | ## 2. Realtime implementation notes ### Topic resolution / SSE handshake sequence 1. Extract `Host` → `app_domains.resolve_app(host)` (existing two-phase dispatch). No app → **404**. 2. Token from `Authorization: Bearer ` **or** `?token=` (EventSource can't set headers). 3. `RealtimeAuthority::authorize_subscribe(app_id, topic, token)`: - topic missing OR `external_subscribable = false` → `NotFound` → **404** (both collapse to 404 so the endpoint can't probe internal topics); - `auth_mode='public'` → allow; - `auth_mode='token'` → verify HMAC (present, signed by this app's key, unexpired, scoped to this topic) → allow, else `Unauthorized` → **401** (generic; never says which check failed). 4. `broadcaster.subscribe(app_id, topic)` → `broadcast::Receiver`; stream `data: {topic,message,published_at}\n\n` with `:heartbeat` keepalive every `PICLOUD_REALTIME_HEARTBEAT_SEC` (default 30). Headers: `text/event-stream`, `Cache-Control: no-cache` (set by axum Sse), `X-Accel-Buffering: no` (added). Client disconnect drops the receiver → automatic cleanup; the periodic GC reaps empty channels. The SSE handler lives in **orchestrator-core** (`realtime_api.rs`) and depends only on the three picloud-shared traits — all DB + signing-key access stays in the manager-core `RealtimeAuthority` impl, so the data-plane crate never touches the key. Cluster mode (v1.3+) swaps the broadcaster + authority impls behind the same traits. ### HMAC signing-key storage — chose **table** (`app_secrets`) Per the asked decision. 32 random bytes per app, lazily created on the first `pubsub::subscriber_token` call (`ON CONFLICT DO NOTHING` for concurrency). No global `PICLOUD_INSTANCE_SECRET`; the table is the natural home for v1.1.7's encrypted per-app secrets. **Tension flagged:** §5 aspired to "validate the signature without a DB lookup". The table approach needs a key read on the token-subscribe path. Mitigated by an in-memory key cache in `RealtimeAuthorityImpl` (keys never rotate in v1.1.6, so the cache needs no invalidation); the subscribe path already reads the `topics` row, so this adds no new round-trip *category*. ### In-process broadcaster `Mutex>>`. Capacity per channel `PICLOUD_REALTIME_BROADCAST_CAPACITY` (default 64); slow consumers lose oldest events (`broadcast` lag semantics — best-effort, no replay). `subscribe` creates channels lazily; `publish` is a silent no-op when no channel exists; `drop_topic` removes the sender (existing receivers observe a closed channel and disconnect). A `spawn_realtime_gc` task (~60s) drops senders with `receiver_count() == 0`. ### Publish wiring (order + failure modes) — §8 ordering chosen `PubsubServiceImpl::publish_durable`: validate/authz → **transactional outbox fan-out + commit** (existing) → **then** best-effort `RealtimeBroadcaster::publish`. The broadcast runs on a child `tokio::spawn` whose `JoinHandle` is awaited, so a panicking broadcaster becomes a `warn` log, never a failed publish (the durable deliveries already committed). One `published_at` instant is shared by both paths. **Brief-internal contradiction flagged** — see §9. ## 3. Client lib implementation notes - **Build:** **tsup** (dual ESM+CJS + `.d.ts`), tests **vitest**, lint = `tsc --noEmit` (strict; `noUncheckedIndexedAccess`, no `any` in exports). - **Layout:** `src/{index,client,endpoint,subscribe,auth,types}.ts` + `src/react/index.ts` + `src/svelte/index.ts`; subpath exports `@picloud/client/react` and `@picloud/client/svelte` via package `exports`. - **Reconnect:** SSE is implemented over **streaming `fetch`** (not native `EventSource`) so the lib can (a) detect a **401** on (re)connect and call `onTokenExpired` to refresh, (b) send **`Last-Event-ID`** on resume (server ignores it in v1.1.6 — client ships ready), and (c) apply its own **exponential backoff** (base 1s → ×2 → cap 30s; reset on successful open). Token rides in `?token=` (EventSource-parity). React Native caveat documented in the README (supply a streaming-`fetch` polyfill). - **zod/valibot adapter:** the `Validator = { parse(input): T }` shape. A Zod schema satisfies it directly; Valibot wraps in one line. Used by `endpoint(...).get({ validate })` and `subscribe(..., { validate })`. No hard dep. ## 4. v1.1.5 follow-ups - **Dispatcher e2e (six):** `dispatcher_delivers_{kv,docs,cron,files,pubsub, dead_letter}_to_handler` in `crates/picloud/tests/dispatcher_e2e.rs`. Verified green against a real Postgres (all 6). - **Empty-blob — RELAXED** (per the asked decision). Dropped the `data.is_empty()` rejection in `NewFile::validate` **and** `FileUpdate:: validate` (the latter for consistency — flagged in §7). Flipped the pinning test in `files_service.rs` and added `empty_file_round_trips`. - **Orphan sweep:** `spawn_files_orphan_sweep` (`files_sweep.rs`), every 6h (`PICLOUD_FILES_ORPHAN_SWEEP_INTERVAL_SEC`), unlinks `*.tmp.*` older than 1h (`PICLOUD_FILES_ORPHAN_TMP_TTL_SEC`); logs dirs-walked / files-deleted / bytes-reclaimed. No DB cross-check (v1.3+). Tested: deletes old tmp, keeps young tmp, keeps non-tmp, missing-root no panic. ## 5. Schema decisions beyond the brief None — `0021_topics.sql` and `0022_app_secrets.sql` match the brief's DDL verbatim. Schema-snapshot golden re-blessed on a fresh DB (only the two new tables added; PK/FK `ON DELETE CASCADE` + the `auth_mode` CHECK present). ## 6. How to verify locally See §8 below. ## 7. Decisions beyond the brief — every prompt-default deviation 1. **Dispatcher e2e location:** the brief says `crates/manager-core/tests/dispatcher_e2e.rs`; I put it in **`crates/picloud/tests/`**. `build_app` (the full dispatcher + scheduler + executor wiring) lives in the `picloud` crate; a manager-core test would need a manager→picloud dev-dependency cycle or a hand-rolled re-wire of build_app. The picloud harness (`server_with_app` pattern) is proven and the tests run green against a real DB. Gating uses the **schema_snapshot pattern** (env check + early return), NOT `#[ignore]`, so CI's plain `cargo test` with `DATABASE_URL` runs them while local stays green — as the brief requested. 2. **Empty-blob relaxation extended to `FileUpdate::validate`** — the brief names only `NewFile::validate`. Relaxing create-empty but rejecting update-to-empty would be an inconsistent API, so I relaxed both. 3. **Publish order: §8 (broadcast AFTER outbox commit)**, not §4 (broadcast FIRST). The two sections contradict; §8 is the dedicated, numbered, rationale-backed section. See §9. 4. **Topic-name validation** — `topics_api` rejects empty names and names containing `*` (external pattern subscription is v2). The brief didn't specify name validation; this is a small guard. 5. **Client lib lint = `tsc --noEmit`** (not eslint) to keep devDeps lean; strict typecheck is the gate. "No `any` in exports" is enforced by review + strict TS, not an eslint rule. 6. **Cron e2e poll budget = 45s** — the cron scheduler skips its first tick then ticks every 30s (default). The test polls 45s so it passes at the default; set `PICLOUD_CRON_TICK_INTERVAL_MS=1000` to make it ~2s in CI. ## 8. Attestation (gate runs on this HEAD) ``` cargo fmt --all -- --check → clean cargo clippy --all-targets --all-features -- -D warnings → clean (exit 0) cargo test --workspace → 482 passed, 0 failed (DB-gated dispatcher_e2e auto-skip as no-op when DATABASE_URL unset) # DB-gated (real Postgres @ 127.0.0.1:15432, picloud/picloud): DATABASE_URL=… PICLOUD_CRON_TICK_INTERVAL_MS=1000 \ cargo test -p picloud --test dispatcher_e2e → 6 passed DATABASE_URL=… cargo test -p picloud-manager-core --test schema_snapshot → 1 passed (cd dashboard && npm run check) → 0 errors, 0 warnings (371 files) (cd clients/typescript && npm run lint) → clean (tsc --noEmit) (cd clients/typescript && npm run test) → 15 passed (5 files) (cd clients/typescript && npm run build) → tsup ESM+CJS+.d.ts OK ``` Migrations verified applying cleanly on a fresh DB **and** on top of the existing v1.1.5 dev DB (0020 → 0021 → 0022). Schema-snapshot golden diff is exactly the two new tables. ## 9. Open questions for the reviewer 1. **§4 vs §8 ordering contradiction (load-bearing, flagged not reinterpreted):** §4 says "realtime broadcast FIRST, then transactional outbox fan-out"; §8 says broadcast AFTER the fan-out commits (numbered steps 1–4). I implemented **§8** (dedicated section + failure-mode rationale). If §4's ordering was intended, the broadcast should move before `fan_out_publish` — but then a publish whose outbox write fails would still have notified SSE subscribers of an event that never durably happened, which seems wrong. Please confirm §8 is correct. 2. **Dead-letter → handler fan-out appears unwired (see §10).** Confirm whether the `dead_letter` trigger source is intended to fire in v1.1.x. ## 10. Latent findings **`dead_letter` trigger handlers do not fire on dead-letter creation.** `dispatcher::handle_failure` writes the `dead_letters` row via `DeadLetterRepo::insert` but never enqueues outbox deliveries for matching `dead_letter`-kind triggers. `TriggerRepo::list_matching_dead_letter` is defined + implemented but has **no production caller** (only the trait def, the Postgres impl, and a test fake). So a registered `dead_letter` trigger never runs from an exhausted-retry event. This predates v1.1.6 (the design notes §4 describe it shipping in v1.1.1). I did **not** fix it (out of v1.1.6 scope) — flagging for the reviewer. Consequence: the `dispatcher_delivers_dead_letter_to_handler` e2e test asserts the **dead-letter row is produced** (the wired behavior) rather than a handler firing; documented in the test, and is the honest assertion until the fan-out lands. No new security gaps introduced. Cross-app isolation holds: `subscriber_token` claims carry `app_id` and are signed per-app; the SSE authority rejects cross-app tokens (a per-app key already fails the signature, plus an explicit `claims.app_id == app_id` check); topic admin endpoints bind the capability to the path `app_id` after loading the app; the broadcaster keys channels by `(AppId, topic)` so app A's publishes never reach app B's subscribers (unit-tested). ## 11. Deferred items (beyond this prompt's OUT list) None added. Everything on the brief's OUT list stayed out (WebSocket, session/script auth modes, topic-pattern external subscription, server-side last-event-id replay, per-app SSE/rate limits, other-language SDKs, codegen, full DB-cross-check sweeper). ## 12. Known limitations / rough edges - SSE delivery is best-effort at-most-once; a slow consumer past the broadcast buffer loses events with no server-side replay (by design). - The client lib's streaming-`fetch` SSE needs a polyfill on runtimes without `fetch` body streaming (React Native) — documented. - Cron e2e takes ~31s at the default 30s tick interval (45s poll budget); set `PICLOUD_CRON_TICK_INTERVAL_MS=1000` to speed it up. - The realtime key cache in `RealtimeAuthorityImpl` is per-process and never invalidated — correct only because v1.1.6 has no key rotation. A future rotation API must clear it.