feat(crawler): live status surface, runtime session, dead-job repo, auto-restart

Adds the in-process observability + control infrastructure the admin
dashboard consumes:

- status.rs: CrawlerStatus/Phase/WorkerState + StatusHandle. The daemon
  publishes its current phase (idle/walking/fetching-metadata/cover-backfill),
  per-worker activity, and last-pass summary. Wired through the cron,
  run_metadata_pass, and the worker loop.
- session_control.rs: SessionController refreshes PHPSESSID at runtime. It
  rewrites the shared reqwest cookie jar, updates the value on_launch reads,
  persists to crawler_state (survives restart), and clears the expired flag.
  on_launch now reads the live session instead of a startup snapshot.
- RealChapterDispatcher auto-triggers a coordinated browser restart after
  CRAWLER_BROWSER_RESTART_THRESHOLD consecutive transient failures.
- repo::crawler: list_dead_jobs, requeue_dead_jobs (scoped to all jobs, one
  manga, or a single job; bypasses the quarantine and skips live duplicates),
  job_state_counts.
- AppState gains CrawlerControl bundling browser_manager + session + status
  + metadata_pass for the admin endpoints.
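
The status surface described above can be pictured with a minimal sketch.
This is not the code in status.rs: it assumes a std Mutex behind an Arc and a
synchronous set_phase, whereas the call sites in the diff await set_phase, so
the real handle is async. The Idle and CoverBackfill variant names, the usize
field types, and the snapshot() method are hypothetical.

```rust
use std::sync::{Arc, Mutex};

/// What the daemon is doing right now. WalkingList and FetchingMetadata
/// appear in the diff; Idle and CoverBackfill are assumed names.
#[derive(Debug, Clone, PartialEq)]
pub enum Phase {
    Idle,
    WalkingList,
    FetchingMetadata {
        index: usize, // assumed type
        total: usize, // assumed type
        title: String,
    },
    CoverBackfill,
}

/// Snapshot returned to readers such as an admin endpoint.
/// The real CrawlerStatus also carries worker state and a last-pass summary.
#[derive(Debug, Clone, PartialEq)]
pub struct CrawlerStatus {
    pub phase: Phase,
}

/// Cheaply cloneable handle: every clone points at the same shared status.
#[derive(Clone)]
pub struct StatusHandle {
    inner: Arc<Mutex<CrawlerStatus>>,
}

impl StatusHandle {
    /// Starts in the Idle phase.
    pub fn new() -> Self {
        Self {
            inner: Arc::new(Mutex::new(CrawlerStatus { phase: Phase::Idle })),
        }
    }

    /// Publisher side: overwrite the current phase.
    /// Synchronous here; the real method is awaited at its call sites.
    pub fn set_phase(&self, phase: Phase) {
        self.inner.lock().unwrap().phase = phase;
    }

    /// Reader side: copy out the current status (hypothetical method name).
    pub fn snapshot(&self) -> CrawlerStatus {
        self.inner.lock().unwrap().clone()
    }
}

impl Default for StatusHandle {
    fn default() -> Self {
        Self::new()
    }
}
```

In this sketch the crawler task keeps one clone and calls set_phase as it
progresses, while the admin side keeps another clone and calls snapshot().
Passing the handle as Option<&StatusHandle>, as run_metadata_pass does, lets
callers that have no dashboard skip publishing entirely.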

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
MechaCat02
2026-06-03 20:38:54 +02:00
parent 3f91bea768
commit cd0a1e13a9
12 changed files with 787 additions and 19 deletions

@@ -115,6 +115,7 @@ pub async fn run_metadata_pass(
     allowlist: &DownloadAllowlist,
     max_image_bytes: usize,
     max_consecutive_failures: u32,
+    status: Option<&crate::crawler::status::StatusHandle>,
     tor: Option<&crate::crawler::tor::TorController>,
 ) -> anyhow::Result<MetadataStats> {
     let lease = browser_manager
@@ -122,6 +123,9 @@ pub async fn run_metadata_pass(
         .await
         .context("acquire browser lease for metadata pass")?;
     let browser_ref: &chromiumoxide::Browser = &lease;
+    if let Some(s) = status {
+        s.set_phase(crate::crawler::status::Phase::WalkingList).await;
+    }
 
     let source = {
         let s = TargetSource::new(start_url.to_string());
@@ -226,6 +230,14 @@ pub async fn run_metadata_pass(
             continue;
         }
         stats.discovered += 1;
+        if let Some(s) = status {
+            s.set_phase(crate::crawler::status::Phase::FetchingMetadata {
+                index: stats.discovered,
+                total: max_refs,
+                title: r.title.clone(),
+            })
+            .await;
+        }
         tracing::info!(
             idx = stats.discovered,
             key = %r.source_manga_key,