feat: crawler job queue ops and dedup index (0.27.0)

Adds enqueue / lease / ack_done / ack_failed / release / reap_done on
crawler::jobs, backed by the existing crawler_jobs table. lease() uses a
single FOR UPDATE SKIP LOCKED CTE that also re-claims stale running rows
(crashed-worker recovery), and ack_failed applies an exponential backoff
capped at 1h before retrying.

Migration 0014 adds a partial unique index on (payload->>'chapter_id')
restricted to (pending|running) sync_chapter_content jobs, so producers
can just INSERT ... ON CONFLICT DO NOTHING without racing each other. The
slot frees again the moment the job leaves the in-flight states, so a
future force-refetch can re-enqueue.

Library-only: no daemon, no API hook. Those land in the next two phases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
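A minimal Rust sketch of the lease and retry rules described above follows. Only the FOR UPDATE SKIP LOCKED CTE, the stale-running reclaim and the 1h backoff cap come from this commit. The 10s base delay, the `State` and `Job` types, the column names (`id`, `run_after`, `lease_expires_at`) and the `LEASE_SQL` text are hypothetical and may not match crawler::jobs. `is_claimable` mirrors the query's WHERE clause in memory.

```rust
use std::time::Duration;

/// Hypothetical base delay: the commit only says the backoff is exponential
/// and capped at 1h, so this value is an assumption.
const BASE_DELAY: Duration = Duration::from_secs(10);
/// The 1h cap stated in the commit message.
const MAX_DELAY: Duration = Duration::from_secs(60 * 60);

/// Delay before the next attempt after `failures` consecutive failures
/// (1-based): BASE_DELAY * 2^(failures - 1), never more than MAX_DELAY.
pub fn backoff_delay(failures: u32) -> Duration {
    // Clamp the exponent so the shift cannot overflow a u64.
    let exp = failures.saturating_sub(1).min(32);
    let secs = BASE_DELAY.as_secs().saturating_mul(1u64 << exp);
    Duration::from_secs(secs).min(MAX_DELAY)
}

/// Hypothetical job states; the real enum in crawler::jobs may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum State {
    Pending,
    Running,
    Done,
    Dead,
}

/// Hypothetical row shape; timestamps are Unix seconds for simplicity.
pub struct Job {
    pub state: State,
    pub run_after: u64,        // earliest time a pending job may run
    pub lease_expires_at: u64, // when a running job's lease lapses
}

/// In-memory mirror of the WHERE clause in LEASE_SQL: pending rows that are
/// due, plus running rows whose lease expired (crashed-worker recovery).
pub fn is_claimable(job: &Job, now: u64) -> bool {
    match job.state {
        State::Pending => job.run_after <= now,
        State::Running => job.lease_expires_at <= now,
        State::Done | State::Dead => false,
    }
}

/// One possible shape of the single-statement lease query. Column names are
/// assumptions; $1 is the batch size, $2 the lease length (e.g. '5 minutes').
pub const LEASE_SQL: &str = "
WITH claimable AS (
    SELECT id FROM crawler_jobs
     WHERE (state = 'pending' AND run_after <= now())
        OR (state = 'running' AND lease_expires_at <= now())
     ORDER BY id
     LIMIT $1
       FOR UPDATE SKIP LOCKED
)
UPDATE crawler_jobs AS j
   SET state = 'running',
       lease_expires_at = now() + $2::interval
  FROM claimable
 WHERE j.id = claimable.id
RETURNING j.id";
```

With these assumptions, `backoff_delay(1)` is 10s, `backoff_delay(9)` is 2560s, and every attempt from the 10th on waits the full hour. SKIP LOCKED makes concurrent lease() calls pass over rows another transaction has already locked instead of blocking on them, so two workers cannot claim the same row in the same round.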
2026-05-25 19:59:09 +02:00
parent 89b84252a5
commit 93c7fd63fc
6 changed files with 676 additions and 6 deletions
--- a/backend/migrations/0014_crawler_jobs_dedup_index.sql
+++ b/backend/migrations/0014_crawler_jobs_dedup_index.sql
@@ -0,0 +1,15 @@
+-- Dedup SyncChapterContent jobs in flight.
+--
+-- Without this, the daemon's bookmark/cron enqueue paths would need a
+-- check-then-insert sequence, which races under concurrency. The partial
+-- unique index lets both producers use plain `INSERT ... ON CONFLICT DO
+-- NOTHING`: at most one (pending|running) job per chapter_id exists, and the
+-- slot frees again as soon as the job transitions to done/failed/dead, so a
+-- re-enqueue (e.g. a force-refetch) succeeds even before the row is reaped.
+--
+-- Scoped to sync_chapter_content payloads only so Discover / SyncManga /
+-- SyncChapterList jobs (which don't carry a chapter_id) remain un-deduped.
+CREATE UNIQUE INDEX crawler_jobs_chapter_content_dedup_idx
+    ON crawler_jobs ((payload->>'chapter_id'))
+ WHERE state IN ('pending', 'running')
+   AND payload->>'kind' = 'sync_chapter_content';
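The behavior this partial index gives producers can be illustrated with a small in-memory model. `JobTable`, `Job`, `State` and `ENQUEUE_SQL` are hypothetical stand-ins (the SQL assumes `payload` is jsonb and that the other crawler_jobs columns have defaults). The conflict rule itself follows the index definition: only in-flight sync_chapter_content rows with the same chapter_id block an insert.

```rust
/// Hypothetical job states, taken from the migration comment.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum State {
    Pending,
    Running,
    Done,
    Failed,
    Dead,
}

impl State {
    /// Mirrors the index predicate `state IN ('pending', 'running')`.
    fn in_flight(self) -> bool {
        matches!(self, State::Pending | State::Running)
    }
}

/// Hypothetical row: only the fields the index looks at.
pub struct Job {
    pub id: u64,
    pub kind: &'static str,         // payload->>'kind'
    pub chapter_id: Option<String>, // payload->>'chapter_id' (NULL if absent)
    pub state: State,
}

/// In-memory stand-in for crawler_jobs, used only to illustrate the index.
#[derive(Default)]
pub struct JobTable {
    jobs: Vec<Job>,
    next_id: u64,
}

impl JobTable {
    /// Behaves like `INSERT ... ON CONFLICT DO NOTHING` against the partial
    /// unique index: Some(id) if a row was inserted, None if it was skipped.
    pub fn insert_on_conflict_do_nothing(
        &mut self,
        kind: &'static str,
        chapter_id: Option<&str>,
    ) -> Option<u64> {
        // Rows outside the index predicate never conflict, and NULL keys are
        // distinct in a Postgres unique index, so only in-flight
        // sync_chapter_content rows with the same chapter_id block the insert.
        let blocked = kind == "sync_chapter_content"
            && chapter_id.is_some()
            && self.jobs.iter().any(|j| {
                j.kind == "sync_chapter_content"
                    && j.state.in_flight()
                    && j.chapter_id.as_deref() == chapter_id
            });
        if blocked {
            return None;
        }
        self.next_id += 1;
        self.jobs.push(Job {
            id: self.next_id,
            kind,
            chapter_id: chapter_id.map(String::from),
            state: State::Pending,
        });
        Some(self.next_id)
    }

    /// Stand-in for lease/ack moving a row between states.
    pub fn set_state(&mut self, id: u64, state: State) {
        if let Some(job) = self.jobs.iter_mut().find(|j| j.id == id) {
            job.state = state;
        }
    }
}

/// One possible producer query; RETURNING yields no row when skipped.
pub const ENQUEUE_SQL: &str = "
INSERT INTO crawler_jobs (state, payload)
VALUES ('pending',
        jsonb_build_object('kind', 'sync_chapter_content', 'chapter_id', $1::text))
ON CONFLICT DO NOTHING
RETURNING id";
```

Enqueuing the same chapter_id twice while the first job is pending or running yields `None` the second time. After `set_state(id, State::Done)` the same insert succeeds again, which is what lets a force-refetch re-enqueue. Jobs of other kinds, or without a chapter_id, never conflict.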