feat: crawler job queue ops and dedup index (0.27.0)

Adds enqueue / lease / ack_done / ack_failed / release / reap_done on
crawler::jobs, backed by the existing crawler_jobs table. lease() uses
a single FOR UPDATE SKIP LOCKED CTE that also re-claims stale running
rows (crashed-worker recovery), and ack_failed applies an exponential
backoff, capped at 1h, before the job is retried.
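The lease and retry rules above can be modelled in plain Rust. This is a minimal sketch, not the crawler::jobs code: the `State` enum, the `run_after` / `lease_expires_at` fields, the `<=` comparisons and the 30 s base delay are assumptions. The commit only says that lease() re-claims stale running rows and that the backoff is exponential and capped at 1h. Row locking (FOR UPDATE SKIP LOCKED) is not modelled, only the eligibility predicate and the delay arithmetic.

```rust
use std::time::Duration;

// Hypothetical job states; the real crawler_jobs states may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Pending,
    Running,
    Done,
}

// Hypothetical row shape; column names are assumptions, not from the commit.
struct Job {
    state: State,
    run_after: u64,        // unix seconds: earliest time a pending job may run
    lease_expires_at: u64, // unix seconds: only meaningful while Running
}

/// Eligibility predicate: due pending rows, plus running rows whose lease
/// has lapsed (assumed to mean the worker holding it crashed).
fn is_leasable(job: &Job, now: u64) -> bool {
    match job.state {
        State::Pending => job.run_after <= now,
        State::Running => job.lease_expires_at <= now,
        State::Done => false,
    }
}

/// Exponential backoff capped at 1h. `attempt` is 0 for the first retry;
/// the 30 s base is an assumed value.
fn backoff_delay(attempt: u32) -> Duration {
    const BASE_SECS: u64 = 30;
    const CAP_SECS: u64 = 60 * 60;
    // 2^attempt; checked_shl returns None for shifts >= 64, so saturate.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_secs(BASE_SECS.saturating_mul(factor).min(CAP_SECS))
}
```

With these assumed numbers the delays run 30 s, 60 s, 120 s, ... and hit the 1h cap from the eighth retry (`attempt` 7) onward; saturating arithmetic keeps very large attempt counts from overflowing.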

Migration 0014 adds a partial unique index on
(payload->>'chapter_id') restricted to (pending|running)
sync_chapter_content jobs, so producers can just
INSERT ... ON CONFLICT DO NOTHING without racing each other. The slot
frees again the moment the job leaves the in-flight states, so a
future force-refetch can re-enqueue.
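To see why the partial unique index removes the producer race, here is a toy in-memory model in Rust that assumes nothing about the real crawler::jobs API. A mutex-guarded set stands in for the index, `enqueue` plays the role of `INSERT ... ON CONFLICT DO NOTHING`, and `finish` plays the role of a job leaving pending|running. `InFlight`, `race_two_producers` and the chapter ids are hypothetical names.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy stand-in for the partial unique index: holds the chapter_ids of
/// sync_chapter_content jobs that are currently pending or running.
#[derive(Default)]
struct InFlight {
    chapters: Mutex<HashSet<String>>,
}

impl InFlight {
    /// Models INSERT ... ON CONFLICT DO NOTHING: true if this call created
    /// the job, false if one was already in flight (the "conflict" case).
    fn enqueue(&self, chapter_id: &str) -> bool {
        // HashSet::insert returns false when the key is already present.
        self.chapters.lock().unwrap().insert(chapter_id.to_owned())
    }

    /// Models the job moving to done/failed/dead: the row drops out of the
    /// index predicate, so the slot is free again.
    fn finish(&self, chapter_id: &str) {
        self.chapters.lock().unwrap().remove(chapter_id);
    }
}

/// Two producers (e.g. the bookmark and cron paths) race to enqueue the
/// same chapter; returns how many of them actually inserted a job.
fn race_two_producers(q: &Arc<InFlight>, chapter_id: &str) -> usize {
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let q = Arc::clone(q);
            let id = chapter_id.to_owned();
            thread::spawn(move || q.enqueue(&id))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&inserted| inserted)
        .count()
}
```

However the two threads interleave, exactly one insert wins, and a later enqueue succeeds again once `finish` has freed the slot. One PostgreSQL detail worth keeping in mind: a bare `ON CONFLICT DO NOTHING` handles a violation of any unique index on the table, whereas a statement that names a conflict target can only infer this partial index if it also repeats the predicate, e.g. `ON CONFLICT ((payload->>'chapter_id')) WHERE state IN ('pending', 'running') AND payload->>'kind' = 'sync_chapter_content' DO NOTHING`.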

Library-only — no daemon, no API hook. Those land in the next two
phases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-25 19:59:09 +02:00
parent 89b84252a5
commit 93c7fd63fc
6 changed files with 676 additions and 6 deletions

@@ -0,0 +1,15 @@
-- Dedup SyncChapterContent jobs in flight.
--
-- Without this, the daemon's bookmark/cron enqueue paths would need a
-- check-then-insert sequence, which races under concurrency. The partial
-- unique index lets both producers use plain `INSERT ... ON CONFLICT DO
-- NOTHING`: at most one (pending|running) job per chapter_id exists, and the
-- slot frees as soon as the job moves to done/failed/dead, so the chapter can
-- be re-enqueued later (after the old row is reaped, or for a force-refetch).
--
-- Scoped to sync_chapter_content payloads only so Discover / SyncManga /
-- SyncChapterList jobs (which don't carry a chapter_id) remain un-deduped.
CREATE UNIQUE INDEX crawler_jobs_chapter_content_dedup_idx
ON crawler_jobs ((payload->>'chapter_id'))
WHERE state IN ('pending', 'running')
AND payload->>'kind' = 'sync_chapter_content';
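Which rows can collide at all comes down to the index predicate plus its key expression. The Rust sketch below restates that for illustration only. `Row`, `dedup_key`, `would_conflict` and the `discover` kind string are hypothetical. Only the `sync_chapter_content` kind and the pending/running states come from the migration.

```rust
/// A crawler_jobs row reduced to what the index looks at. In the table,
/// `kind` and `chapter_id` live in the JSON payload; field names are illustrative.
struct Row<'a> {
    kind: &'a str,
    state: &'a str,
    chapter_id: Option<&'a str>,
}

/// Key the row contributes to the unique index, or None if it can never
/// conflict: either it is outside the WHERE clause, or its chapter_id is
/// NULL (PostgreSQL treats NULL keys as distinct by default).
fn dedup_key<'a>(row: &Row<'a>) -> Option<&'a str> {
    let in_flight = matches!(row.state, "pending" | "running");
    if in_flight && row.kind == "sync_chapter_content" {
        row.chapter_id
    } else {
        None
    }
}

/// True if inserting `new` would violate the index given the `existing` rows.
fn would_conflict(existing: &[Row<'_>], new: &Row<'_>) -> bool {
    match dedup_key(new) {
        None => false,
        Some(key) => existing.iter().any(|r| dedup_key(r) == Some(key)),
    }
}
```

A done row for the same chapter does not block a new pending job, and kinds outside `sync_chapter_content` (Discover, SyncManga, SyncChapterList) never reach the key comparison, so they stay un-deduped.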