Scope coverage, realtime + client-lib notes, §8 attestation (482 workspace tests; e2e 6/6 + schema snapshot verified on a real DB), every prompt-default deviation, and the latent dead_letter-trigger fan-out finding + §4-vs-§8 ordering question for the reviewer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
13 KiB
HANDBACK — v1.1.6 Realtime Channels & Client Library
Branch: feat/v1.1.6-realtime-client (from main). Not pushed, no PR.
1. Scope coverage (§1–§13)
| § | Item | Status |
|---|---|---|
| 1 | topics table (0021_topics.sql) |
✅ Done |
| 2 | Topic admin endpoints (POST/GET/PATCH/DELETE), AppTopicManage |
✅ Done |
| 3 | SSE endpoint GET /realtime/topics/{topic} |
✅ Done |
| 4 | In-process RealtimeBroadcaster + GC + publish wiring |
✅ Done |
| 5 | HMAC subscriber tokens + app_secrets (0022) + pubsub::subscriber_token |
✅ Done |
| 6 | Dashboard Topics tab | ✅ Done |
| 7 | @picloud/client TypeScript package |
✅ Done |
| 8 | Topic-aware publish → realtime wiring | ✅ Done |
| 9 | Dispatcher e2e tests (six) | ✅ Done (location deviation — see §7) |
| 10 | Empty-blob decision | ✅ Done — relaxed (accept empty) |
| 11 | Orphan *.tmp.* sweeper |
✅ Done |
| 12 | Version bumps (1.1.6 / SDK 1.7 / dash 0.12.0 / client 1.0.0) | ✅ Done |
| 13 | Tests (all named-critical cases) | ✅ Done |
2. Realtime implementation notes
Topic resolution / SSE handshake sequence
- Extract
Host→app_domains.resolve_app(host)(existing two-phase dispatch). No app → 404. - Token from
Authorization: Bearer <t>or?token=<t>(EventSource can't set headers). RealtimeAuthority::authorize_subscribe(app_id, topic, token):- topic missing OR
external_subscribable = false→NotFound→ 404 (both collapse to 404 so the endpoint can't probe internal topics); auth_mode='public'→ allow;auth_mode='token'→ verify HMAC (present, signed by this app's key, unexpired, scoped to this topic) → allow, elseUnauthorized→ 401 (generic; never says which check failed).
- topic missing OR
broadcaster.subscribe(app_id, topic)→broadcast::Receiver; streamdata: {topic,message,published_at}\n\nwith:heartbeatkeepalive everyPICLOUD_REALTIME_HEARTBEAT_SEC(default 30). Headers:text/event-stream,Cache-Control: no-cache(set by axum Sse),X-Accel-Buffering: no(added). Client disconnect drops the receiver → automatic cleanup; the periodic GC reaps empty channels.
The SSE handler lives in orchestrator-core (realtime_api.rs) and
depends only on the three picloud-shared traits — all DB + signing-key
access stays in the manager-core RealtimeAuthority impl, so the
data-plane crate never touches the key. Cluster mode (v1.3+) swaps the
broadcaster + authority impls behind the same traits.
HMAC signing-key storage — chose table (app_secrets)
Per the asked decision. 32 random bytes per app, lazily created on the
first pubsub::subscriber_token call (ON CONFLICT DO NOTHING for
concurrency). No global PICLOUD_INSTANCE_SECRET; the table is the
natural home for v1.1.7's encrypted per-app secrets.
Tension flagged: §5 aspired to "validate the signature without a DB
lookup". The table approach needs a key read on the token-subscribe path.
Mitigated by an in-memory key cache in RealtimeAuthorityImpl (keys never
rotate in v1.1.6, so the cache needs no invalidation); the subscribe path
already reads the topics row, so this adds no new round-trip category.
In-process broadcaster
Mutex<HashMap<(AppId, String), broadcast::Sender<RealtimeEvent>>>.
Capacity per channel PICLOUD_REALTIME_BROADCAST_CAPACITY (default 64);
slow consumers lose oldest events (broadcast lag semantics — best-effort,
no replay). subscribe creates channels lazily; publish is a silent
no-op when no channel exists; drop_topic removes the sender (existing
receivers observe a closed channel and disconnect). A spawn_realtime_gc
task (~60s) drops senders with receiver_count() == 0.
Publish wiring (order + failure modes) — §8 ordering chosen
PubsubServiceImpl::publish_durable: validate/authz → transactional
outbox fan-out + commit (existing) → then best-effort
RealtimeBroadcaster::publish. The broadcast runs on a child
tokio::spawn whose JoinHandle is awaited, so a panicking broadcaster
becomes a warn log, never a failed publish (the durable deliveries
already committed). One published_at instant is shared by both paths.
Brief-internal contradiction flagged — see §9.
3. Client lib implementation notes
- Build: tsup (dual ESM+CJS +
.d.ts), tests vitest, lint =tsc --noEmit(strict;noUncheckedIndexedAccess, noanyin exports). - Layout:
src/{index,client,endpoint,subscribe,auth,types}.ts+src/react/index.ts+src/svelte/index.ts; subpath exports@picloud/client/reactand@picloud/client/sveltevia packageexports. - Reconnect: SSE is implemented over streaming
fetch(not nativeEventSource) so the lib can (a) detect a 401 on (re)connect and callonTokenExpiredto refresh, (b) sendLast-Event-IDon resume (server ignores it in v1.1.6 — client ships ready), and (c) apply its own exponential backoff (base 1s → ×2 → cap 30s; reset on successful open). Token rides in?token=(EventSource-parity). React Native caveat documented in the README (supply a streaming-fetchpolyfill). - zod/valibot adapter: the
Validator<T> = { parse(input): T }shape. A Zod schema satisfies it directly; Valibot wraps in one line. Used byendpoint(...).get({ validate })andsubscribe(..., { validate }). No hard dep.
4. v1.1.5 follow-ups
- Dispatcher e2e (six):
dispatcher_delivers_{kv,docs,cron,files,pubsub, dead_letter}_to_handlerincrates/picloud/tests/dispatcher_e2e.rs. Verified green against a real Postgres (all 6). - Empty-blob — RELAXED (per the asked decision). Dropped the
data.is_empty()rejection inNewFile::validateandFileUpdate:: validate(the latter for consistency — flagged in §7). Flipped the pinning test infiles_service.rsand addedempty_file_round_trips. - Orphan sweep:
spawn_files_orphan_sweep(files_sweep.rs), every 6h (PICLOUD_FILES_ORPHAN_SWEEP_INTERVAL_SEC), unlinks*.tmp.*older than 1h (PICLOUD_FILES_ORPHAN_TMP_TTL_SEC); logs dirs-walked / files-deleted / bytes-reclaimed. No DB cross-check (v1.3+). Tested: deletes old tmp, keeps young tmp, keeps non-tmp, missing-root no panic.
5. Schema decisions beyond the brief
None — 0021_topics.sql and 0022_app_secrets.sql match the brief's DDL
verbatim. Schema-snapshot golden re-blessed on a fresh DB (only the two new
tables added; PK/FK ON DELETE CASCADE + the auth_mode CHECK present).
6. How to verify locally
See §8 below.
7. Decisions beyond the brief — every prompt-default deviation
- Dispatcher e2e location: the brief says
crates/manager-core/tests/dispatcher_e2e.rs; I put it incrates/picloud/tests/.build_app(the full dispatcher + scheduler- executor wiring) lives in the
picloudcrate; a manager-core test would need a manager→picloud dev-dependency cycle or a hand-rolled re-wire of build_app. The picloud harness (server_with_apppattern) is proven and the tests run green against a real DB. Gating uses the schema_snapshot pattern (env check + early return), NOT#[ignore], so CI's plaincargo testwithDATABASE_URLruns them while local stays green — as the brief requested.
- executor wiring) lives in the
- Empty-blob relaxation extended to
FileUpdate::validate— the brief names onlyNewFile::validate. Relaxing create-empty but rejecting update-to-empty would be an inconsistent API, so I relaxed both. - Publish order: §8 (broadcast AFTER outbox commit), not §4 (broadcast FIRST). The two sections contradict; §8 is the dedicated, numbered, rationale-backed section. See §9.
- Topic-name validation —
topics_apirejects empty names and names containing*(external pattern subscription is v2). The brief didn't specify name validation; this is a small guard. - Client lib lint =
tsc --noEmit(not eslint) to keep devDeps lean; strict typecheck is the gate. "Noanyin exports" is enforced by review- strict TS, not an eslint rule.
- Cron e2e poll budget = 45s — the cron scheduler skips its first tick
then ticks every 30s (default). The test polls 45s so it passes at the
default; set
PICLOUD_CRON_TICK_INTERVAL_MS=1000to make it ~2s in CI.
8. Attestation (gate runs on this HEAD)
cargo fmt --all -- --check → clean
cargo clippy --all-targets --all-features -- -D warnings → clean (exit 0)
cargo test --workspace → 482 passed, 0 failed
(DB-gated dispatcher_e2e
auto-skip as no-op when
DATABASE_URL unset)
# DB-gated (real Postgres @ 127.0.0.1:15432, picloud/picloud):
DATABASE_URL=… PICLOUD_CRON_TICK_INTERVAL_MS=1000 \
cargo test -p picloud --test dispatcher_e2e → 6 passed
DATABASE_URL=… cargo test -p picloud-manager-core --test schema_snapshot → 1 passed
(cd dashboard && npm run check) → 0 errors, 0 warnings (371 files)
(cd clients/typescript && npm run lint) → clean (tsc --noEmit)
(cd clients/typescript && npm run test) → 15 passed (5 files)
(cd clients/typescript && npm run build) → tsup ESM+CJS+.d.ts OK
Migrations verified applying cleanly on a fresh DB and on top of the existing v1.1.5 dev DB (0020 → 0021 → 0022). Schema-snapshot golden diff is exactly the two new tables.
9. Open questions for the reviewer
- §4 vs §8 ordering contradiction (load-bearing, flagged not
reinterpreted): §4 says "realtime broadcast FIRST, then transactional
outbox fan-out"; §8 says broadcast AFTER the fan-out commits (numbered
steps 1–4). I implemented §8 (dedicated section + failure-mode
rationale). If §4's ordering was intended, the broadcast should move
before
fan_out_publish— but then a publish whose outbox write fails would still have notified SSE subscribers of an event that never durably happened, which seems wrong. Please confirm §8 is correct. - Dead-letter → handler fan-out appears unwired (see §10). Confirm
whether the
dead_lettertrigger source is intended to fire in v1.1.x.
10. Latent findings
dead_letter trigger handlers do not fire on dead-letter creation.
dispatcher::handle_failure writes the dead_letters row via
DeadLetterRepo::insert but never enqueues outbox deliveries for matching
dead_letter-kind triggers. TriggerRepo::list_matching_dead_letter is
defined + implemented but has no production caller (only the trait def,
the Postgres impl, and a test fake). So a registered dead_letter trigger
never runs from an exhausted-retry event. This predates v1.1.6 (the design
notes §4 describe it shipping in v1.1.1). I did not fix it (out of
v1.1.6 scope) — flagging for the reviewer. Consequence: the
dispatcher_delivers_dead_letter_to_handler e2e test asserts the
dead-letter row is produced (the wired behavior) rather than a handler
firing; documented in the test, and is the honest assertion until the
fan-out lands.
No new security gaps introduced. Cross-app isolation holds: subscriber_token
claims carry app_id and are signed per-app; the SSE authority rejects
cross-app tokens (a per-app key already fails the signature, plus an explicit
claims.app_id == app_id check); topic admin endpoints bind the capability
to the path app_id after loading the app; the broadcaster keys channels by
(AppId, topic) so app A's publishes never reach app B's subscribers
(unit-tested).
11. Deferred items (beyond this prompt's OUT list)
None added. Everything on the brief's OUT list stayed out (WebSocket, session/script auth modes, topic-pattern external subscription, server-side last-event-id replay, per-app SSE/rate limits, other-language SDKs, codegen, full DB-cross-check sweeper).
12. Known limitations / rough edges
- SSE delivery is best-effort at-most-once; a slow consumer past the broadcast buffer loses events with no server-side replay (by design).
- The client lib's streaming-
fetchSSE needs a polyfill on runtimes withoutfetchbody streaming (React Native) — documented. - Cron e2e takes ~31s at the default 30s tick interval (45s poll budget);
set
PICLOUD_CRON_TICK_INTERVAL_MS=1000to speed it up. - The realtime key cache in
RealtimeAuthorityImplis per-process and never invalidated — correct only because v1.1.6 has no key rotation. A future rotation API must clear it.