
2 Commits

Author SHA1 Message Date
MechaCat02
e02d125f51 feat(crawler): live cover + chapter-content observability with realtime page counts
Extends the live dashboard so an operator can see exactly what's being
fetched, in realtime:

- Chapters currently being crawled are tracked in the status as `active_chapters`
  (manga title · ch.N) with a live page counter that climbs per stored page
  (set_chapter_pages, pushed via the existing watch→SSE). The dispatcher
  registers each via an RAII ChapterGuard (sync Mutex) that removes the
  entry on completion, panic, or timeout-drop — replacing the old per-worker
  slot model.
- Covers: status now carries the cover currently being fetched (`current_cover`,
  set around download_and_store_cover in both the metadata pass and backfill)
  and a `covers_queued` backlog count; CoverBackfill phase gains index/total.
- Two paginated backlog endpoints (fetched on demand, auto-refreshed when the
  live counts change): GET /admin/crawler/active-jobs (which chapters of which
  mangas are queued/running) and GET /admin/crawler/covers (mangas missing a
  cover). repo: list_active_jobs, list_missing_cover_mangas, count_missing_covers.
- dispatch_target now also returns manga title + chapter number.
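
The RAII registration described in the first bullet can be sketched roughly as below. This is a minimal illustration, not the code from this commit: only `ChapterGuard`, `active_chapters`, and `set_chapter_pages` are named in the message, while `Status`, `ActiveChapter`, `track_chapter`, `snapshot`, the `job_id` key, and every signature are assumptions made for the example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, MutexGuard};

/// Hypothetical status entry for one chapter that is being crawled.
#[derive(Debug, Clone, PartialEq)]
pub struct ActiveChapter {
    pub manga_title: String,
    pub chapter_number: u32,
    pub pages_stored: u32,
}

/// Hypothetical shared status; the map is keyed by an assumed job id.
#[derive(Clone, Default)]
pub struct Status {
    active_chapters: Arc<Mutex<HashMap<u64, ActiveChapter>>>,
}

impl Status {
    /// Registers a chapter and returns a guard that unregisters it on drop.
    pub fn track_chapter(&self, job_id: u64, manga_title: &str, chapter_number: u32) -> ChapterGuard {
        let entry = ActiveChapter {
            manga_title: manga_title.to_string(),
            chapter_number,
            pages_stored: 0,
        };
        self.lock().insert(job_id, entry);
        ChapterGuard { status: self.clone(), job_id }
    }

    /// Updates the live page counter for a tracked chapter (assumed signature).
    pub fn set_chapter_pages(&self, job_id: u64, pages: u32) {
        if let Some(chapter) = self.lock().get_mut(&job_id) {
            chapter.pages_stored = pages;
        }
    }

    /// Copies the current entries, e.g. for pushing to a dashboard.
    pub fn snapshot(&self) -> Vec<ActiveChapter> {
        self.lock().values().cloned().collect()
    }

    fn lock(&self) -> MutexGuard<'_, HashMap<u64, ActiveChapter>> {
        // Recover from poisoning so one panicking worker cannot wedge the status.
        self.active_chapters.lock().unwrap_or_else(|e| e.into_inner())
    }
}

/// Removes its chapter entry when dropped, whatever the reason for the drop.
pub struct ChapterGuard {
    status: Status,
    job_id: u64,
}

impl Drop for ChapterGuard {
    fn drop(&mut self) {
        self.status.lock().remove(&self.job_id);
    }
}
```

Because cleanup lives in `Drop`, the entry disappears on normal completion, during panic unwinding, and when a timed-out future holding the guard is dropped, which is the property the message relies on. A synchronous `Mutex` is sufficient in this sketch because the lock is never held across an await point.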

Frontend: the crawler page replaces the Workers table with an Active-chapters
table (live page bars), adds a current-cover line + covers-queued figure, and
two backlog sections (Queued chapters / Queued covers) with search + Pager,
auto-refetched via $effect on the live counts.

Tests: status guard/page + cover unit tests; repo list/count tests; endpoint
tests; frontend api tests. Version 0.53.1 -> 0.54.0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 20:41:51 +02:00
MechaCat02
655ea42731 fix(crawler): scope dispatch_target to live sources, newest first (0.36.4)
The chapter dispatcher's URL resolver had no dropped_at filter and no
ORDER BY, so a chapter whose only chapter_sources row had been
soft-dropped was still dispatched against the stale URL, burning retry
budget on failures that were classified as transient but guaranteed to
recur. With multiple live sources, the LIMIT 1 winner was
nondeterministic.

Add `AND cs.dropped_at IS NULL` and `ORDER BY cs.last_seen_at DESC`
to dispatch_target, bringing it in lockstep with the enqueue queries
in pipeline.rs that already filter on dropped_at. Returns None when
all sources are dropped — callers in daemon.rs already treat None
as "ack the job, skip the work."

Tests in tests/repo_chapter.rs cover the three branches (freshest
live wins, dropped sources skipped, all-dropped returns None).
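
The selection rule and its three branches can be illustrated with an in-memory sketch. This is not the query from the commit: the `ChapterSource` struct, its field types, and the slice-based `dispatch_target` signature are assumptions that stand in for the real `chapter_sources` table and SQL.

```rust
/// Hypothetical in-memory stand-in for a `chapter_sources` row.
#[derive(Debug, Clone, PartialEq)]
pub struct ChapterSource {
    pub url: String,
    /// Assumed Unix seconds; `Some` means the row was soft-dropped.
    pub dropped_at: Option<i64>,
    /// Assumed Unix seconds of the most recent sighting.
    pub last_seen_at: i64,
}

/// Mirrors `WHERE dropped_at IS NULL ORDER BY last_seen_at DESC LIMIT 1`:
/// skip dropped rows, then pick the most recently seen live one.
pub fn dispatch_target(sources: &[ChapterSource]) -> Option<&ChapterSource> {
    sources
        .iter()
        .filter(|s| s.dropped_at.is_none())
        .max_by_key(|s| s.last_seen_at)
}
```

Returning `Option` keeps the all-dropped case explicit, matching the "ack the job, skip the work" handling described above. Two live rows with the same `last_seen_at` are still ambiguous in the SQL form; a secondary sort key would be needed if that case matters.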

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 20:03:45 +02:00