fix(crawler): scope dispatch_target to live sources, newest first (0.36.4)

The chapter dispatcher's URL resolver had no dropped_at filter and no
ORDER BY. A chapter whose only chapter_sources row had been
soft-dropped was still dispatched against the stale URL, burning retry
budget on failures that were treated as transient but guaranteed to
recur. With multiple live sources, the LIMIT 1 winner was
nondeterministic.

Add `AND cs.dropped_at IS NULL` and `ORDER BY cs.last_seen_at DESC`
to dispatch_target, bringing it in lockstep with the enqueue queries
in pipeline.rs that already filter on dropped_at. Returns None when
all sources are dropped — callers in daemon.rs already treat None
as "ack the job, skip the work."
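
A minimal sketch of the caller-side contract described above, assuming
nothing about the real code in daemon.rs. `Target`, `Outcome`, and
`handle_job` are hypothetical names used only to illustrate how a None
result turns into an acknowledged job with no work done.

```rust
// Hypothetical sketch: these types and this function are illustrative
// and are not taken from daemon.rs.
#[derive(Debug, PartialEq)]
struct Target {
    manga_id: u64,
    source_url: String,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// A live source was resolved; content sync would run against it.
    Synced { manga_id: u64, source_url: String },
    /// No live source: the job is acknowledged and no work is done.
    AckedWithoutWork,
}

/// Maps the resolver's result onto what the dispatcher does with the job.
fn handle_job(target: Option<Target>) -> Outcome {
    match target {
        Some(t) => Outcome::Synced {
            manga_id: t.manga_id,
            source_url: t.source_url,
        },
        // The chapter is gone or every source row is dropped, so there
        // is nothing to fetch and nothing worth retrying.
        None => Outcome::AckedWithoutWork,
    }
}

fn main() {
    let live = Target {
        manga_id: 7,
        source_url: "https://example.invalid/ch/1".to_string(),
    };
    println!("{:?}", handle_job(Some(live)));
    println!("{:?}", handle_job(None));
}
```

Acknowledging rather than failing the job keeps an all-dropped chapter
from consuming retries it can never use.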

Tests in tests/repo_chapter.rs cover the three branches (freshest
live wins, dropped sources skipped, all-dropped returns None).
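
The three branches can be modelled in memory as follows. This is a
sketch of the selection rule only, not the tests in
tests/repo_chapter.rs: `SourceRow`, `row`, and `pick_live_source` are
hypothetical names, and plain integers stand in for timestamps.

```rust
// In-memory model only; the real resolver is a SQL query.
#[derive(Debug, Clone, PartialEq)]
struct SourceRow {
    source_url: &'static str,
    last_seen_at: u64,
    dropped_at: Option<u64>,
}

/// Small constructor to keep the examples short.
fn row(source_url: &'static str, last_seen_at: u64, dropped_at: Option<u64>) -> SourceRow {
    SourceRow { source_url, last_seen_at, dropped_at }
}

/// Mirrors `AND cs.dropped_at IS NULL ORDER BY cs.last_seen_at DESC LIMIT 1`:
/// drop soft-dropped rows, then take the most recently seen one.
fn pick_live_source(rows: &[SourceRow]) -> Option<&SourceRow> {
    rows.iter()
        .filter(|r| r.dropped_at.is_none())
        .max_by_key(|r| r.last_seen_at)
}

fn main() {
    // Freshest live source wins.
    let two_live = vec![row("https://a.example/ch1", 10, None), row("https://b.example/ch1", 20, None)];
    assert_eq!(pick_live_source(&two_live).map(|r| r.source_url), Some("https://b.example/ch1"));

    // A dropped source is skipped even when it was seen most recently.
    let mixed = vec![row("https://a.example/ch1", 10, None), row("https://c.example/ch1", 30, Some(31))];
    assert_eq!(pick_live_source(&mixed).map(|r| r.source_url), Some("https://a.example/ch1"));

    // All sources dropped: nothing to dispatch.
    let all_dropped = vec![row("https://c.example/ch1", 30, Some(31))];
    assert_eq!(pick_live_source(&all_dropped), None);

    println!("all three branches behave as described");
}
```

Two live rows with an identical last_seen_at would still tie under the
SQL ordering. The model does not try to reproduce that case.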

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-30 20:03:45 +02:00
parent 70e8a7895c
commit 655ea42731
5 changed files with 182 additions and 7 deletions

@@ -118,10 +118,21 @@ pub async fn page_count(pool: &PgPool, id: Uuid) -> sqlx::Result<Option<i32>> {
 .await
 }
 
-/// Look up the manga_id + most recent source_url for a chapter. Used
-/// by the daemon's chapter dispatcher to resolve the URL it needs to
-/// hand to `content::sync_chapter_content`. Returns `None` if the
-/// chapter (or its source row) is gone.
+/// Look up the manga_id + most recent live source_url for a chapter.
+/// Used by the daemon's chapter dispatcher to resolve the URL it needs
+/// to hand to `content::sync_chapter_content`.
+///
+/// Skips soft-dropped sources (`cs.dropped_at IS NOT NULL`) and breaks
+/// ties between multiple live sources by `last_seen_at DESC`, so the
+/// freshest still-attached URL wins. Returns `None` when the chapter
+/// is gone or all its source rows are dropped — callers in the
+/// dispatcher treat `None` as "ack the job, skip the work."
+///
+/// The enqueue queries (`pipeline::enqueue_bookmarked_pending` and
+/// `enqueue_pending_for_manga`) apply the same `dropped_at IS NULL`
+/// filter — this resolver stays in lockstep so a chapter that was
+/// dropped between enqueue and lease isn't dispatched against a stale
+/// URL.
 pub async fn dispatch_target(
 pool: &PgPool,
 chapter_id: Uuid,
@@ -131,6 +142,8 @@ pub async fn dispatch_target(
 FROM chapters c \
 JOIN chapter_sources cs ON cs.chapter_id = c.id \
 WHERE c.id = $1 \
+AND cs.dropped_at IS NULL \
+ORDER BY cs.last_seen_at DESC \
 LIMIT 1",
 )
 .bind(chapter_id)