feat(crawler): live status surface, runtime session, dead-job repo, auto-restart

Adds the in-process observability + control infrastructure the admin
dashboard consumes:

- status.rs: CrawlerStatus/Phase/WorkerState + StatusHandle. The daemon
  publishes its current phase (idle/walking/fetching-metadata/cover-backfill),
  per-worker activity, and last-pass summary. Wired through the cron,
  run_metadata_pass, and the worker loop.
- session_control.rs: SessionController refreshes PHPSESSID at runtime —
  rewrites the shared reqwest cookie jar, updates the value on_launch reads,
  persists to crawler_state (survives restart), and clears the expired flag.
  on_launch now reads the live session instead of a startup snapshot.
- RealChapterDispatcher auto-triggers a coordinated browser restart after
  CRAWLER_BROWSER_RESTART_THRESHOLD consecutive transient failures.
- repo::crawler: list_dead_jobs, requeue_dead_jobs (all/manga/job, bypassing
  the quarantine, skipping live duplicates), job_state_counts.
- AppState gains CrawlerControl bundling browser_manager + session + status
  + metadata_pass for the admin endpoints.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
MechaCat02
2026-06-03 20:38:54 +02:00
parent 3f91bea768
commit cd0a1e13a9
12 changed files with 787 additions and 19 deletions

View File

@@ -78,6 +78,7 @@ fn harness_with_auth_config(
// handlers return 503 in this config. Tests that need a stub
// resync service swap it in via `harness_with_resync`.
resync: None,
crawler: None,
};
Harness { app: router(state), _storage_dir: storage_dir }
}
@@ -152,6 +153,7 @@ pub fn harness_with_resync(
},
auth_limiter,
resync: Some(resync),
crawler: None,
};
Harness {
app: router(state),