Worker dispatch was already wrapped in AssertUnwindSafe(...)
.catch_unwind() — a panicking handler ack's the job failed and the
worker keeps going. The cron tick had no such guard: a panic in
metadata.run, enqueue_bookmarked_pending, reap_done, or
write_last_tick would kill the cron task. The JoinSet would drop it,
workers would keep running, and no future metadata pass would ever
fire until daemon restart.
Wrap the tick body (between advisory-lock acquire and unlock) in the
same AssertUnwindSafe(...).catch_unwind() pattern. The unlock and
connection drop run unconditionally so a panicked tick doesn't leave
the lock held for another replica.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three layering cleanups from REVIEW.md §5 / §3:
- Drop the three private `is_unique_violation` helpers in
repo::{user,chapter,bookmark} in favour of sqlx 0.8's
`DatabaseError::is_unique_violation()` method (already used by
repo::collection).
- Remove the unreachable 23505 branch in repo::chapter::create — the
(manga_id, number) UNIQUE was dropped in 0013, so the defensive arm
could no longer fire. A doc note records what to do if uniqueness
is re-added.
- Move three inline SQL queries out of handlers/daemon into repo
functions: bookmarks' chapter-belongs-to-manga guard
(`repo::chapter::belongs_to_manga`), the daemon's dispatch lookup
(`repo::chapter::dispatch_target`), and the daemon's page_count
safety net (`repo::chapter::page_count`). Restores the
handlers→repo layering invariant in CLAUDE.md.
- New `crawler::url_utils` module consolidates host_of / origin_of /
registrable_domain — they used to live in three crawler submodules
with diverging edge-case behaviour. Tests moved with them.
- Doc cross-references on repo::author::set_for_manga and
repo::genre::set_for_manga pointing to the crawler's name-keyed
variants, so the intentional duplication is discoverable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The backend now boots an internal crawler daemon that runs a daily
metadata pass (CRAWLER_DAILY_AT in CRAWLER_TZ, advisory-lock guarded
for multi-replica safety) and drains SyncChapterContent jobs from
crawler_jobs through a worker pool. Chromium launches lazily on first
job and is torn down after CRAWLER_IDLE_TIMEOUT_S seconds of inactivity.
Modules:
- crawler::browser_manager — lazy-launch / idle-teardown wrapper
around browser::Handle, with an on_launch hook that re-injects
PHPSESSID on every fresh Chromium spawn.
- crawler::pipeline — run_metadata_pass (the shared discover/upsert
/cover/sync-chapters loop) and the enqueue_bookmarked_pending helper
used by the cron tick.
- crawler::daemon — cron task + worker pool, behind two trait seams
(MetadataPass, ChapterDispatcher) so tests can inject stubs without
standing up Chromium or a live source.
Behavior:
- CRAWLER_DAEMON=false skips daemon spawn entirely (default for tests).
- Catch-up tick fires on startup if the last persisted slot was missed.
- A SyncOutcome::SessionExpired sets a sticky AtomicBool; workers
idle until operator restart with a refreshed PHPSESSID.
- Worker dispatch wrapped in catch_unwind so a panicking handler
marks the job failed instead of taking down the worker.
- Migration 0015 adds a small crawler_state k-v table for the
last_metadata_tick_at watermark.
Dep additions: chrono-tz (IANA TZ parsing).
CLI (bin/crawler) reuses pipeline::run_metadata_pass and now holds
the browser via BrowserManager so the on_launch session injection
flow stays in one place. Inline chapter-content sync semantics are
unchanged — the queue is for the daemon, force-refetches and manual
backfills still bypass it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>