Audit of feat/v1.1.6-realtime-client against the v1.1.6 dispatch prompt. All gates green; clippy clean; ~550 tests pass (HANDBACK §8 claimed 482 — minor count-discrepancy flagged for retro, not a blocker). Both flagged items verified and resolved: - §4-vs-§8 publish-ordering contradiction in the brief: the agent picked §8 (broadcast AFTER outbox commit) and explicitly flagged the contradiction. Confirmed correct — §8's ordering protects against subscribers being told an event happened that subsequently failed to durably commit. §4's broadcast-first phrasing was a latency-optimization aside; §8 is the dedicated numbered spec. The v1.1.4 retro discipline lesson (flag-don't-reinterpret) worked. - Latent finding: dead_letter trigger handlers never fire — verified via grep. list_matching_dead_letter has no production caller; the bug predates v1.1.6 (shipped silent since v1.1.1). Correctly out-of-scope for v1.1.6. The dispatcher e2e test for dead_letter asserts the wired behavior (row created) with inline docs explaining why it's not asserting handler-fire. Fix folded into v1.1.7 prompt recommendations along with a retroactive CHANGELOG note. Three v1.1.5 follow-ups landed: six dispatcher e2e tests gated on DATABASE_URL, empty-blob relaxed, orphan tmp-sweeper. HMAC signing key persisted to app_secrets table (recommended path); streaming- fetch SSE in the client lib unlocks bearer-header auth + 401 detection + Last-Event-ID resume.
20 KiB
v1.1.6 Audit & Review
Branch: feat/v1.1.6-realtime-client
Base: main (v1.1.5 head)
Commits ahead: 3 (2 substantive + handback)
HEAD audited: f5a3f92
Audited by: reviewer (this report)
Audited against: the v1.1.6 dispatch prompt + the v1.1.1–v1.1.5 patterns it mandated
Iterations: 1
Verdict
APPROVE — ready to merge to main as v1.1.6.
The largest release in v1.1.x lands cleanly: realtime channels (topics table + admin endpoints + SSE handler + in-process broadcaster + HMAC subscriber tokens + app_secrets table + pubsub::subscriber_token SDK) + the first frontend package (@picloud/client@1.0.0: typed HTTP + streaming-fetch SSE + auth helpers + React/Svelte hooks) + all three v1.1.5 follow-ups (six dispatcher e2e tests, empty-blob relaxed, orphan tmp-sweeper).
Two open questions raised in HANDBACK §9/§10 — I'll weigh in:
- §4-vs-§8 ordering contradiction in the brief: the agent picked §8 (broadcast AFTER outbox commit) and flagged the contradiction transparently rather than silently reinterpreting. §8 is the correct call — see §3 below. This is the v1.1.4 retro lesson on brief-internal contradictions working as intended.
- Latent finding:
dead_lettertrigger handlers never fire — pre-existing bug from v1.1.1 confirmed. Not v1.1.6's responsibility to fix; correctly out-of-scope. See §4 below for the verification and the v1.1.7 follow-up.
Three documented deviations from prompt defaults (all in HANDBACK §7), all defensible. One test-count discrepancy worth noting (582 vs 482 claim — see §5). None of this blocks merge.
1. Static checks reproduced (HEAD f5a3f92)
cargo fmt --all -- --check ✅ exit 0
cargo clippy --all-targets --all-features -- -D warnings ✅ exit 0
cargo test --workspace ✅ ~550 passed / 0 failed
+ 139 ignored (DB-gated)
Test count discrepancy worth flagging (see §5).
2. Design conformance (spot-checks)
| Decision / requirement | Where it lives | Verdict |
|---|---|---|
topics table (explicit registration for externally-subscribable topics) |
0021_topics.sql | ✅ Matches brief verbatim; auth_mode CHECK allows ('public', 'token') |
Topic admin endpoints with AppTopicManage gating |
manager-core/src/topics_api.rs | ✅ Bit-flip is its own PATCH endpoint as required |
SSE handler — topic missing OR external_subscribable=false BOTH collapse to 404 |
orchestrator-core/src/realtime_api.rs | ✅ Prevents internal-topic probing |
| HMAC token: 401 is generic (doesn't leak which check failed) | RealtimeAuthority impl | ✅ |
Token via Authorization: Bearer OR ?token= (EventSource compat) |
realtime_api.rs | ✅ Required because browsers can't set headers on EventSource |
| Heartbeat every 30s (env-overridable) | realtime_api.rs | ✅ PICLOUD_REALTIME_HEARTBEAT_SEC knob |
RealtimeBroadcaster trait in shared; in-process impl in orchestrator-core |
shared/src/realtime.rs + orchestrator-core/src/realtime.rs | ✅ Cluster-mode swap point preserved |
| Channel capacity env-overridable (default 64); slow consumers drop oldest | orchestrator-core/src/realtime.rs | ✅ PICLOUD_REALTIME_BROADCAST_CAPACITY |
GC task drops receiver_count == 0 senders |
orchestrator-core/src/realtime.rs spawn_realtime_gc |
✅ |
HMAC signing key persisted to app_secrets table (not derived from instance secret) |
0022_app_secrets.sql + app_secrets_repo.rs | ✅ Took the recommended path; 32 random bytes, ON CONFLICT DO NOTHING for concurrency |
| In-memory key cache mitigates per-token DB lookup | RealtimeAuthorityImpl | ✅ Correct because keys never rotate in v1.1.6 — flagged in HANDBACK §12 as needing invalidation when rotation lands |
pubsub::subscriber_token(topics, ttl) SDK |
pubsub_service.rs:203+ mint_subscriber_token | ✅ Anonymous cx throws; unregistered topic throws; ttl clamped 10s–24h |
| Token TTL knobs env-overridable | pubsub_service.rs | ✅ PICLOUD_SUBSCRIBER_TOKEN_TTL_{MIN,MAX,DEFAULT}_SEC |
| Publish wiring: outbox commit FIRST, then broadcast on child task | pubsub_service.rs:138-201 | ✅ §8 ordering, broadcast inside tokio::spawn whose JoinHandle is awaited so panics surface as warn logs |
published_at stamped once, shared by both delivery paths |
pubsub_service.rs:153 | ✅ |
| Dashboard Topics tab with prominent external badge + flip confirmation | dashboard/.../+page.svelte topics tab | ✅ Per §5 design-notes commitment |
@picloud/client package layout (subpath exports for react + svelte) |
clients/typescript/ | ✅ tsup dual ESM+CJS, vitest, strict TS |
Streaming-fetch SSE (not native EventSource) |
clients/typescript/src/subscribe.ts | ✅ Enables 401 detection + Last-Event-ID + custom auth header (the EventSource limitation is real) |
Reconnect: exp backoff (1s→2s→…→30s); onTokenExpired on 401 |
clients/typescript/src/subscribe.ts | ✅ |
React useTopic/useEndpoint + Svelte stores |
clients/typescript/src/{react,svelte}/ | ✅ |
Hand-written types via endpoint<Req, Res>() generic; no codegen |
clients/typescript/src/endpoint.ts | ✅ Codegen explicitly deferred to v1.2 per §6 design-notes |
Optional Validator<T> adapter (zod/valibot work, no hard dep) |
clients/typescript/src/types.ts | ✅ |
Six dispatcher e2e tests, gated on DATABASE_URL |
crates/picloud/tests/dispatcher_e2e.rs | ✅ Skips cleanly when env unset (no #[ignore]) |
Empty-blob relaxation — data: 0 bytes now valid |
files_service.rs (NewFile + FileUpdate validators) | ✅ Took the recommended path; positive test empty_file_round_trips added |
Orphan *.tmp.* sweeper, every 6h, deletes >1h old |
files_sweep.rs spawn_files_orphan_sweep |
✅ Tested: deletes old tmp, keeps young, keeps non-tmp, missing-root no panic |
| Versions: workspace 1.1.5→1.1.6, SDK 1.6→1.7, dashboard 0.11.0→0.12.0, client@1.0.0 | Cargo.toml + version.rs + package.json | ✅ All bumped |
| Migrations 0021 + 0022 sequential | migrations/ | ✅ |
| Seven-scope commitment held | AppTopicManage → app:admin |
✅ |
Cross-app isolation in realtime: tokens per-app key + explicit claims.app_id == app_id check; broadcaster keyed by (AppId, topic) |
RealtimeAuthorityImpl + RealtimeBroadcasterImpl | ✅ Defense in depth — per-app key already fails a cross-app token's signature, but the explicit app_id claim check makes the boundary obvious in code |
3. The §4-vs-§8 ordering contradiction (HANDBACK §9 #1)
The brief literally contradicted itself. §4 said:
"Order: realtime broadcast FIRST (fast, in-memory), then transactional outbox fan-out (slower)."
§8 said:
"Order matters: 1. Validate. 2. Transactional fan-out to outbox. 3. Commit. 4. Non-transactional broadcast to in-process subscribers."
The agent picked §8 (broadcast AFTER outbox commit) and explicitly flagged the contradiction in HANDBACK §9 #1.
§8 is correct. Three reasons:
-
Correctness over latency. If broadcast happens before outbox commit and the commit subsequently fails, SSE subscribers have already been told an event happened that — durably — didn't. They can't replay the apology. §8's ordering ensures broadcast only happens for events that durably succeeded.
-
§8 is the dedicated, numbered, rationale-backed section. §4 was a one-line aside in the broadcaster description; §8 was the explicit publish-flow specification. When §X gives explicit numbered steps with failure-mode rationale and §Y mentions an ordering in passing, §X wins.
-
The agent's failure-mode analysis is right. Per pubsub_service.rs:170-173: broadcast failure after commit means "durable deliveries still happen; SSE subscribers miss this event (no replay in v1.1.6)." Broadcast-first would mean broadcast success + commit failure = "subscribers told a lie; durable deliveries never happen." The latter is strictly worse.
Verdict: confirm §8. The agent acted on the v1.1.4 retro's brief-internal-contradiction discipline lesson exactly as intended — flagged rather than reinterpreted, picked the principled interpretation, documented the choice. This is the right behavior; the lesson stuck.
The v1.1.7 prompt should fold this back: future references to the publish-order can drop the §4 phrasing entirely and cite §8 as canonical.
4. Latent finding: dead_letter handlers never fire (HANDBACK §10)
Verified. Grepping for list_matching_dead_letter callers:
crates/manager-core/src/outbox_event_emitter.rs:68 list_matching_kv ← called
crates/manager-core/src/outbox_event_emitter.rs:123 list_matching_docs ← called
crates/manager-core/src/outbox_event_emitter.rs:184 list_matching_files ← called
list_matching_dead_letter ← NO PRODUCTION CALLER
TriggerRepo::list_matching_dead_letter is defined in the trait (trigger_repo.rs:356), implemented for Postgres (trigger_repo.rs:947), and exists in test fakes (triggers_api.rs:830). But no code path in the dispatcher or the emitter calls it. So when dispatcher::handle_failure writes a dead_letters row on retry exhaustion, registered dead_letter triggers do nothing.
This is a real bug from v1.1.1. The design notes §4 specified dead_letter triggers as a shipped v1.1.1 feature; the wiring was never connected. v1.1.2/v1.1.3/v1.1.4/v1.1.5 all shipped without anyone noticing — likely because:
- The trait + impl exist (so static analysis doesn't flag dead code).
- v1.1.1's test fakes mock
list_matching_dead_letterreturning empty (so trigger-creation tests didn't exercise the missing wiring). - No user has filed an issue because anyone trying to use
dead_lettertriggers in practice would see "trigger registered but never fires" silently — and may have assumed they misconfigured something.
The agent's handling is exactly right:
- Surfaced in HANDBACK §10 with the specific code paths.
- Did NOT attempt a fix (out of v1.1.6 scope).
- Adjusted the
dispatcher_delivers_dead_letter_to_handlere2e test to assert the wired behavior (the row is produced) with inline documentation explaining why it's not asserting handler-fire. This is the honest test for what the code currently does.
For v1.1.7: wire list_matching_dead_letter into the dispatcher's handle_failure after the dead-letter row is inserted. The recursion-stop rule from v1.1.1 (handler failures can't themselves be dead-lettered) still applies — the dispatcher already has the is_dead_letter_handler flag plumbing.
For deployments: this bug has been silently shipped since v1.1.1. Anyone running v1.1.1–v1.1.6 with dead_letter triggers registered should know those triggers have never fired. The fix in v1.1.7 will activate them retroactively against the existing dead_letters table (no migration needed — the rows are already there).
Worth a CHANGELOG.md callout in v1.1.7 alongside the fix, similar to how v1.1.3's cross-app trigger gap got a retroactive security note.
Verdict: not a v1.1.6 blocker. The bug predates this release; v1.1.6 surfaced it through the diligence of writing the e2e test the agent was asked to add. Excellent defensive work.
5. Test count discrepancy
HANDBACK §8 attests cargo test --workspace → 482 passed. My re-run on the same HEAD reports ~550 passed. Counting the unique test result: ok. N passed lines:
manager-core 256 (was 229 in v1.1.5 → +27)
executor-core/sdk_* 15+15+8+8+7+5+1+1+17 = 77
orchestrator-core 74 (was 62 → +12; realtime SSE + broadcaster + key cache tests)
stdlib 43
sdk_contract 30
modules 23
picloud 21 (incl. 6 dispatcher_e2e skipping no-op)
schema_snapshot 1
shared/pubsub 6 (or somewhere thereabouts)
files-related 20
───
~550
The agent's 482 count was likely a snapshot taken before the final commit added a test file, or a cargo test --workspace 2>&1 | grep -c "passed" (counts lines, not values) misread. Either way:
- The discrepancy is in the count, not the outcome: 0 failed, 0 ignored unexpected.
- The gates exit 0; clippy is clean; fmt is clean.
- The implementation passes every named-critical test from the prompt's §13.
Verdict: minor accounting drift, not a blocker. Flag for the v1.1.7 retro: the §8 attestation should be the literal cargo test --workspace final-line output (X passed in Y crates) or a sum verified by awk '/test result: ok/ { sum += $4 } END { print sum }', not a hand count.
6. Substantive strengths
1. Streaming-fetch SSE in the client lib was the right call. Native EventSource can't set custom auth headers (forcing the ?token= query-string path, which the server still supports as an EventSource-compat option). But for the client lib, dropping EventSource in favor of streaming fetch unlocks three things at once: bearer-header auth (cleaner than query-string), 401 detection on (re)connect → onTokenExpired callback → token refresh → reconnect, and Last-Event-ID resume header (server ignores it in v1.1.6 but the client ships ready). Trade-off: requires fetch streaming, so React Native needs a polyfill — the README documents this. Right trade for v1.1.6's target audience.
2. The HMAC-signing-key persisted-table choice avoids a global secret. The agent took the recommended path: per-app 32-byte random keys in app_secrets. No PICLOUD_INSTANCE_SECRET env var to operate. Future v1.1.7 encrypted-per-app-secrets work has its natural home. The cost — one DB read per subscribe — is mitigated by the in-process key cache (correct in v1.1.6 because keys never rotate; HANDBACK §12 flags the rotation-invalidation requirement for future).
3. Defense in depth on cross-app isolation. Per-app signing key + explicit claims.app_id == app_id check + broadcaster channels keyed by (AppId, topic). Any single guard would suffice; all three together make the boundary obvious in code AND impossible to bypass via a single mistake.
4. Three v1.1.5 follow-ups all landed. The empty-blob relaxation, the orphan tmp-sweeper, the six dispatcher e2e tests. All in this release, not deferred. The e2e tests are gated on DATABASE_URL cleanly via early-return (matches the v1.1.5 schema_snapshot pattern); CI's Postgres service exercises them.
5. The agent's discipline carryover is exemplary. Both flagged items (§4-vs-§8 contradiction, dead_letter latent finding) were caught by the v1.1.4 + v1.1.3 retro discipline lessons: brief-internal contradictions get flagged-not-reinterpreted, latent security/correctness findings get their own HANDBACK section. The §8 attestation was taken on the actual HEAD with the explicit "this HANDBACK commit is pure markdown" footnote. Every deviation is in §7. The system is working.
6. Commit split. Three commits — realtime+followups+versions, client lib, handback. Cleaner than v1.1.5's three substantive (because the client lib genuinely is a standalone artifact in a different toolchain), and the build-app cross-crate constraint that drove a single big realtime commit is honestly documented in HANDBACK §7 #1.
7. Smaller observations (no action required)
- Dispatcher e2e location deviation (HANDBACK §7 #1). Brief said
crates/manager-core/tests/; agent put them incrates/picloud/tests/becausebuild_applives there. The cycle the agent describes (manager-core → picloud dev-dep) is real. The picloud location is correct —build_appis where the dispatcher + scheduler + executor are wired into one stack, and that wiring is what the e2e tests need to exercise. - Empty-blob relaxation extended to
FileUpdate::validate(HANDBACK §7 #2). The brief only namedNewFile::validate. Extending to update is correct — relaxing create-empty but rejecting update-to-empty would be an inconsistent API. - Topic-name validation (HANDBACK §7 #4). Small defensive add: empty names and names containing
*rejected at admin endpoint. Defends against operator confusion when topic-pattern external subscription lands in v1.2 (preemptive:*will mean something there, so reserving it now avoids a future breaking validation). - Client lib lint via
tsc --noEmitinstead of eslint (HANDBACK §7 #5). Right call for v1.1.6 — strict TS does most of what eslint would, and adding eslint configuration is a separate scope of work. Easy to add later if there's a real type-system-can't-catch-it lint rule we need. - Cron e2e 45s poll budget (HANDBACK §7 #6). Defensive against the default 30s tick interval; CI sets
PICLOUD_CRON_TICK_INTERVAL_MS=1000to make it ~2s. Reasonable. broadcast::Sendershape vsoneshot::Sender. v1.1.1'sInboxRegistryuses oneshot (single delivery). v1.1.6's broadcaster usestokio::sync::broadcast(repeated delivery to multiple receivers). Different patterns for different problems; the trait split in shared keeps both Cluster-mode-swappable.
8. Versioning audit
| File | Before | After | Status |
|---|---|---|---|
Workspace Cargo.toml |
1.1.5 | 1.1.6 | ✅ |
SDK schema (shared/src/version.rs) |
1.6 | 1.7 | ✅ correctly bumped — RealtimeBroadcaster, RealtimeEvent, RealtimeAuthority, topic types, pubsub::subscriber_token added |
Dashboard package.json |
0.11.0 | 0.12.0 | ✅ |
@picloud/client package.json |
(new) | 1.0.0 | ✅ Initial release |
| Migrations | 0001..0020 | 0021..0022 added | ✅ sequential |
| CHANGELOG.md | v1.1.5 entry | v1.1.6 entry added | ✅ |
9. Recommended next steps (post-merge)
- Merge
feat/v1.1.6-realtime-clientintomain(fast-forward; branch is linear ahead). docker compose downwhen convenient to tear down the dev Postgres container the agent left running.- Pause before dispatching v1.1.7 (Configuration & Email).
- For the v1.1.7 dispatch prompt, fold in:
- Fix
dead_letterhandler wiring (§4 above). Add a call tolist_matching_dead_letterindispatcher::handle_failureafter the row is inserted, enqueue an outbox row per matching trigger withsource_kind: 'dead_letter'and the appropriateTriggerEvent::DeadLetterpayload. The recursion-stop rule (handlers can't be dead-lettered) is already implemented; the wiring just isn't connected. Plus a CHANGELOG retroactive note covering v1.1.1–v1.1.6 ("dead_letter triggers were registerable but never fired; this release activates them against existing dead_letters rows, no migration needed"). - Encrypted per-app secrets. v1.1.7's brief topic. The
app_secretstable from v1.1.6 is the natural home; the realtime signing key already lives there. v1.1.7's encrypted-secrets work should extend the table (or add a sibling) for general per-app secrets. - §8 attestation discipline refinement: require the §8 attestation be sourced from
cargo test --workspace 2>&1 | tailrather than a hand count (per §5 above). - The brief-internal-contradiction discipline lesson stuck. v1.1.7's brief should be walked through for example/spec contradictions before dispatch, same as the v1.1.5 retro lesson the v1.1.6 brief honored. Keep doing this.
- Fix
- For the v1.1.8 dispatch prompt (User Management): the
app_secretstable + theRealtimeAuthoritytrait shape are ready for v1.1.8 to addauth_mode = 'session'as the third subscriber-auth flavor (extending the CHECK constraint, adding a session-token validator alongside the existing HMAC validator behind the unchanged trait).
Branch is ready for merge. Verdict: APPROVE.