feat: crawler job queue ops and dedup index (0.27.0)
Adds enqueue / lease / ack_done / ack_failed / release / reap_done on crawler::jobs, backed by the existing crawler_jobs table. lease() uses a single FOR UPDATE SKIP LOCKED CTE that also re-claims stale running rows (crashed-worker recovery), and ack_failed applies an exponential backoff capped at 1h before retrying.

Migration 0014 adds a partial unique index on (payload->>'chapter_id') restricted to (pending|running) sync_chapter_content jobs, so producers can just INSERT ... ON CONFLICT DO NOTHING without racing each other. The slot frees again the moment the job leaves the in-flight states, so a future force-refetch can re-enqueue.

Library-only: no daemon, no API hook. Those land in the next two phases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
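For illustration only, the capped backoff described for ack_failed could look roughly like the sketch below. The commit states only that the backoff is exponential and capped at 1h; the function name `retry_delay`, the 30s base delay, and the doubling factor are assumptions and are not taken from the actual crawler::jobs code.

```rust
use std::time::Duration;

/// Hypothetical base delay; the commit only specifies the 1h cap.
const BASE_DELAY: Duration = Duration::from_secs(30);
/// Upper bound on the retry delay (1h, as stated in the commit message).
const MAX_DELAY: Duration = Duration::from_secs(60 * 60);

/// Delay before retry number `attempt` (1-based): the base delay doubled
/// for each earlier attempt, capped at `MAX_DELAY`.
fn retry_delay(attempt: u32) -> Duration {
    // Clamp the exponent so the shift below cannot overflow a u32.
    let exp = attempt.saturating_sub(1).min(31);
    let factor = 1u32 << exp;
    // saturating_mul avoids a panic on overflow for very large factors.
    BASE_DELAY.saturating_mul(factor).min(MAX_DELAY)
}
```

Under these assumed constants the delay doubles from 30s and reaches the 1h cap at the eighth attempt; a real implementation may use a different base or add jitter.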
15
backend/migrations/0014_crawler_jobs_dedup_index.sql
Normal file
@@ -0,0 +1,15 @@
-- Dedup SyncChapterContent jobs in flight.
--
-- Without this, the daemon's bookmark/cron enqueue paths would have to do a
-- pre-check + insert race that's incorrect under concurrency. The partial
-- unique index lets both producers use plain `INSERT ... ON CONFLICT DO
-- NOTHING`: at most one (pending|running) job per chapter_id exists, and the
-- slot frees again as soon as the job transitions to done/failed/dead so a
-- re-enqueue is possible after the row is reaped or a force-refetch is wanted.
--
-- Scoped to sync_chapter_content payloads only so Discover / SyncManga /
-- SyncChapterList jobs (which don't carry a chapter_id) remain un-deduped.
CREATE UNIQUE INDEX crawler_jobs_chapter_content_dedup_idx
    ON crawler_jobs ((payload->>'chapter_id'))
    WHERE state IN ('pending', 'running')
      AND payload->>'kind' = 'sync_chapter_content';
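The dedup semantics the index enforces can be sketched with a small in-memory model. This is not the crawler::jobs implementation and does not touch Postgres; the `Queue`, `Job`, and `State` types and the `enqueue` / `finish` methods are hypothetical names used only to show the behaviour: at most one in-flight (pending or running) job per chapter_id, with the slot freed once the job reaches a terminal state.

```rust
/// Job states named in the migration comment.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum State {
    Pending,
    Running,
    Done,
    Failed,
    Dead,
}

/// Hypothetical stand-in for a sync_chapter_content row.
struct Job {
    chapter_id: String,
    state: State,
}

/// Hypothetical in-memory stand-in for the crawler_jobs table.
#[derive(Default)]
struct Queue {
    jobs: Vec<Job>,
}

impl Queue {
    /// True when a pending or running job exists for this chapter.
    fn in_flight(&self, chapter_id: &str) -> bool {
        self.jobs.iter().any(|j| {
            j.chapter_id == chapter_id
                && matches!(j.state, State::Pending | State::Running)
        })
    }

    /// Models `INSERT ... ON CONFLICT DO NOTHING` against the partial
    /// unique index: returns false when the insert would be skipped.
    fn enqueue(&mut self, chapter_id: &str) -> bool {
        if self.in_flight(chapter_id) {
            return false;
        }
        self.jobs.push(Job {
            chapter_id: chapter_id.to_string(),
            state: State::Pending,
        });
        true
    }

    /// Moves the in-flight job for this chapter to the given state.
    fn finish(&mut self, chapter_id: &str, state: State) {
        for j in self.jobs.iter_mut() {
            if j.chapter_id == chapter_id
                && matches!(j.state, State::Pending | State::Running)
            {
                j.state = state;
            }
        }
    }
}
```

In this model a second `enqueue` for the same chapter returns false while the first job is still pending, and returns true again after `finish` moves that job to `State::Done`. In the real table the database enforces this atomically, which is what removes the pre-check race between producers.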