feat: in-process crawler daemon with cron and worker pool (0.28.0)
The backend now boots an internal crawler daemon that runs a daily metadata pass (CRAWLER_DAILY_AT in CRAWLER_TZ, advisory-lock guarded for multi-replica safety) and drains SyncChapterContent jobs from crawler_jobs through a worker pool. Chromium launches lazily on the first job and is torn down after CRAWLER_IDLE_TIMEOUT_S seconds of inactivity.

Modules:
- crawler::browser_manager: lazy-launch / idle-teardown wrapper around browser::Handle, with an on_launch hook that re-injects PHPSESSID on every fresh Chromium spawn.
- crawler::pipeline: run_metadata_pass (the shared discover/upsert/cover/sync-chapters loop) and the enqueue_bookmarked_pending helper used by the cron tick.
- crawler::daemon: cron task + worker pool, behind two trait seams (MetadataPass, ChapterDispatcher) so tests can inject stubs without standing up Chromium or a live source.

Behavior:
- CRAWLER_DAEMON=false skips daemon spawn entirely (default for tests).
- Catch-up tick fires on startup if the last persisted slot was missed.
- A SyncOutcome::SessionExpired sets a sticky AtomicBool; workers idle until operator restart with a refreshed PHPSESSID.
- Worker dispatch is wrapped in catch_unwind so a panicking handler marks the job failed instead of taking down the worker.
- Migration 0015 adds a small crawler_state k-v table for the last_metadata_tick_at watermark.

Dep additions: chrono-tz (IANA TZ parsing).

CLI (bin/crawler) reuses pipeline::run_metadata_pass and now holds the browser via BrowserManager so the on_launch session injection flow stays in one place.

Inline chapter-content sync semantics are unchanged: the queue is for the daemon; force-refetches and manual backfills still bypass it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
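The two worker-side behaviors described in the message (the sticky session-expired flag and the catch_unwind guard) can be sketched roughly as follows. This is a synchronous, std-only illustration and not the code from this commit: the real daemon is async, and `Worker`, `JobStatus`, and the simplified two-variant `SyncOutcome` are hypothetical names introduced here. Reporting a session-expired job as failed is also an assumption.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical, simplified outcome of a chapter-sync handler.
#[derive(Debug, PartialEq)]
pub enum SyncOutcome {
    Synced,
    SessionExpired,
}

/// Hypothetical status recorded for a job after dispatch.
#[derive(Debug, PartialEq)]
pub enum JobStatus {
    Done,
    Failed(String),
    /// The worker is idling because the session expired earlier.
    Skipped,
}

pub struct Worker {
    /// Sticky flag: once set it is never cleared in-process.
    session_expired: AtomicBool,
}

impl Worker {
    pub fn new() -> Self {
        Worker { session_expired: AtomicBool::new(false) }
    }

    /// Runs one handler, turning a panic into a failed job instead of
    /// letting it unwind through the worker loop.
    pub fn dispatch<F>(&self, handler: F) -> JobStatus
    where
        F: FnOnce() -> SyncOutcome,
    {
        if self.session_expired.load(Ordering::Acquire) {
            return JobStatus::Skipped;
        }
        match panic::catch_unwind(AssertUnwindSafe(handler)) {
            Ok(SyncOutcome::Synced) => JobStatus::Done,
            Ok(SyncOutcome::SessionExpired) => {
                self.session_expired.store(true, Ordering::Release);
                // Assumption: the job that hit the expiry is marked failed.
                JobStatus::Failed("session expired".to_string())
            }
            Err(payload) => {
                // Panic payloads are usually &str or String.
                let msg = payload
                    .downcast_ref::<&str>()
                    .map(|s| (*s).to_string())
                    .or_else(|| payload.downcast_ref::<String>().cloned())
                    .unwrap_or_else(|| "unknown panic".to_string());
                JobStatus::Failed(format!("handler panicked: {msg}"))
            }
        }
    }
}
```

In this sketch a panicking handler leaves the worker usable for the next job, while a session expiry makes every later dispatch return `Skipped` until the process is restarted, which mirrors the "idle until operator restart" behavior described above.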
@@ -12,10 +12,21 @@ async fn main() -> anyhow::Result<()> {

     let config = mangalord::config::Config::from_env()?;
     let addr: SocketAddr = config.bind_address.parse()?;
-    let app = mangalord::app::build(config).await?;
+    let mangalord::app::AppHandle { router, daemon } = mangalord::app::build(config).await?;

     tracing::info!(%addr, "mangalord listening");
     let listener = tokio::net::TcpListener::bind(addr).await?;
-    axum::serve(listener, app).await?;
+    axum::serve(listener, router)
+        .with_graceful_shutdown(async {
+            let _ = tokio::signal::ctrl_c().await;
+            tracing::info!("ctrl-c received; shutting down");
+        })
+        .await?;
+
+    // Drain background tasks (crawler daemon) before exiting so Chromium
+    // gets a clean shutdown rather than relying on kill-on-drop.
+    if let Some(d) = daemon {
+        d.shutdown().await;
+    }
     Ok(())
 }
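The "drain before exit" step in the hunk above relies on a handle whose shutdown both signals the background task and waits for it to finish. A minimal std-only sketch of that pattern, using a thread and a channel in place of the async daemon, might look like this. `DaemonHandle`, `spawn_daemon`, and the poll counter are hypothetical and only illustrate the shape; the actual `shutdown()` in this commit is async and its internals are not shown in the diff.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle to a background task: a stop signal plus a join handle.
pub struct DaemonHandle {
    stop: Sender<()>,
    worker: JoinHandle<u32>,
}

/// Spawns a background loop that wakes every `poll` interval until told to stop.
/// The returned count of idle polls stands in for real work.
pub fn spawn_daemon(poll: Duration) -> DaemonHandle {
    let (stop, stop_rx) = mpsc::channel::<()>();
    let worker = thread::spawn(move || {
        let mut polls = 0u32;
        loop {
            match stop_rx.recv_timeout(poll) {
                // No stop signal yet: do one unit of (pretend) work.
                Err(RecvTimeoutError::Timeout) => polls += 1,
                // Stop requested, or the handle was dropped: exit the loop.
                Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
            }
        }
        // Cleanup (for the crawler, closing Chromium) would happen here.
        polls
    });
    DaemonHandle { stop, worker }
}

impl DaemonHandle {
    /// Signals the loop to stop, then blocks until it has exited.
    pub fn shutdown(self) -> u32 {
        let _ = self.stop.send(());
        self.worker.join().expect("daemon thread panicked")
    }
}
```

Taking `self` by value in `shutdown` means the handle cannot be reused afterwards, which matches the one-shot use in `main` where the daemon is drained once after the server stops.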