Commit Graph

2 Commits

Author SHA1 Message Date
MechaCat02
9fe0f26d75 feat: in-process crawler daemon with cron and worker pool (0.28.0)
The backend now boots an internal crawler daemon that runs a daily
metadata pass (CRAWLER_DAILY_AT in CRAWLER_TZ, advisory-lock guarded
for multi-replica safety) and drains SyncChapterContent jobs from
crawler_jobs through a worker pool. Chromium launches lazily on first
job and is torn down after CRAWLER_IDLE_TIMEOUT_S seconds of inactivity.

Modules:
- crawler::browser_manager — lazy-launch / idle-teardown wrapper
  around browser::Handle, with an on_launch hook that re-injects
  PHPSESSID on every fresh Chromium spawn.
- crawler::pipeline — run_metadata_pass (the shared discover/upsert
  /cover/sync-chapters loop) and the enqueue_bookmarked_pending helper
  used by the cron tick.
- crawler::daemon — cron task + worker pool, behind two trait seams
  (MetadataPass, ChapterDispatcher) so tests can inject stubs without
  standing up Chromium or a live source.

Behavior:
- CRAWLER_DAEMON=false skips daemon spawn entirely (default for tests).
- Catch-up tick fires on startup if the last persisted slot was missed.
- A SyncOutcome::SessionExpired sets a sticky AtomicBool; workers
  idle until operator restart with a refreshed PHPSESSID.
- Worker dispatch wrapped in catch_unwind so a panicking handler
  marks the job failed instead of taking down the worker.
- Migration 0015 adds a small crawler_state k-v table for the
  last_metadata_tick_at watermark.

Dep additions: chrono-tz (IANA TZ parsing).

CLI (bin/crawler) reuses pipeline::run_metadata_pass and now holds
the browser via BrowserManager so the on_launch session injection
flow stays in one place. Inline chapter-content sync semantics are
unchanged — the queue is for the daemon, force-refetches and manual
backfills still bypass it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 20:32:02 +02:00
MechaCat02
6c1d04aaf4 chore: initial project scaffold
Set up Mangalord with a Rust/axum backend, SvelteKit frontend, Postgres,
and Docker Compose deployment. Establishes the architecture and TDD
patterns the project will extend:

- Hexagonal-ish backend layering (domain / repo / storage / api) with
  a pluggable Storage trait (LocalStorage today, S3 as a future impl).
- Initial migration: users, mangas, chapters, bookmarks.
- Vertical slice for mangas (list, search, create, get) with
  #[sqlx::test] integration coverage and storage unit tests.
- SvelteKit frontend using Svelte 5 runes, typed API client, Vitest
  unit tests and Playwright e2e with route mocking.
- CLAUDE.md documenting layering, TDD/git/SemVer workflow rules, and
  extension points (tags, fulltext search, OCR, S3, auth).
- Project-scoped .claude/settings.json with permission allowlist for
  the toolchain (git, cargo, npm/vite, docker, psql, gh, doc fetches).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:05:16 +02:00