Mangalord

Author	SHA1	Message	Date
MechaCat02	1eebb90e25	fix(crawler): unhang shutdown on lingering Arc<Browser>, silence WS noise (0.43.1) - Handle::close aborts its chromiumoxide driver task when another Arc<Browser> outlives the call, so shutdown returns instead of hanging on a stream that never terminates. Generic close_or_abort helper with regression tests covering both Arc paths. - daemon.shutdown() is wrapped in a 5s timeout in main as defense in depth. - Default RUST_LOG silences chromiumoxide::conn / chromiumoxide::handler WS-deserialize ERROR spam. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 14:47:36 +02:00
MechaCat02	030b27754b	feat(api): admin-initiated user creation via POST /admin/users (0.43.0) Pairs with the ALLOW_SELF_REGISTER toggle from 0.42.0: admins can mint accounts regardless of the toggle state, so a closed-membership deployment still has a working enrollment path. The endpoint accepts { username, password, is_admin? } so admins can mint co-admins in one call (avoiding a separate promote + extra audit row for the common "invite a co-admin" flow). Implementation: - POST /api/v1/admin/users guarded by RequireAdmin - Reuses validate_username / validate_password from api::auth (made pub(crate)) so the admin path can never produce an account self-register would reject and vice versa - repo::user::admin_create_user wraps INSERT + admin_audit insert in a single tx — same "audit reflects what committed" semantics as the existing admin_safe_* fns - Audit row: action="create_user", payload={username, is_admin} Frontend: - createAdminUser() in lib/api/admin.ts - /admin/users grows a collapsible "Create user" form above the table (username, password, "Make admin" checkbox). Errors surface inline; the list reloads on success. Backend tests: 7 new, including the headline `create_user_works_even_when_self_register_disabled` that pins the admin-create path is NOT gated by the public toggle.	2026-05-31 14:00:31 +02:00
MechaCat02	2f47faa11c	feat(auth): ALLOW_SELF_REGISTER toggle + public /auth/config endpoint (0.42.0) Lets operators run a closed-membership deployment by setting ALLOW_SELF_REGISTER=false (default true, so existing deploys are unaffected). When off, POST /auth/register returns 403 forbidden. The rate-limit token is consumed BEFORE the disabled check so the timing doesn't distinguish enabled-but-rejected from disabled — closes the toggle-state probe channel. New public GET /auth/config returns { self_register_enabled: bool } so the frontend can render its register affordances correctly without conflating "disabled" with "rate-limited" (which a probe attempt would). Frontend: a lightweight reactive `authConfig` store loads the flag once on root-layout mount (and again on /register direct navigation, which bypasses the layout's onMount). Header hides the Register link when the toggle is off; /register renders a "self-registration is disabled — ask an administrator" notice instead of the form. Admin-create endpoint that pairs with this toggle is intentionally not in this PR — it lands as the next branch (feat/admin-user-create). The toggle alone is independently useful for deployments that want to lock down enrollment without yet wiring an admin UI.	2026-05-31 13:56:18 +02:00
MechaCat02	6dd21451a8	chore: sync Cargo.lock for 0.41.2	2026-05-30 22:26:24 +02:00
MechaCat02	f6728dc71a	fix(admin): security-audit findings — paginate chapters, lock down unchecked helper (0.41.2) Addresses the security-audit findings on top of the admin feature stack: M1: /admin/mangas/:id/chapters now paginates (default limit 200, max 500). A long-runner with thousands of chapters would otherwise produce a multi-MB response with that many scalar subqueries per row — admin-only but a real stall risk on one expand-click. Adds explicit pagination tests for the cap and offset; frontend renders a "Showing first N of M" hint when the cap clips the result. L1: repo::user::set_is_admin renamed to set_is_admin_unchecked with a doc-comment pointing at admin_safe_set_is_admin for production use. The short name was a footgun — a future contributor reaching for it would silently bypass self-protection, the last-admin invariant, and the audit log. Used only by integration-test setup; production code goes through the admin_safe_* paths. CSRF posture: build_session_cookie carries a comment that the SameSite=Lax default is the project's CSRF defense for state-changing mutations and breaks the instant anyone adds a side-effecting GET under /admin/*. Spells out what to do then (Strict + explicit token check). Test counts: 43 backend admin tests + 12 vitest admin tests all green; svelte-check 0/0 across 446 files.	2026-05-30 22:23:55 +02:00
MechaCat02	aa2159ca06	fix(admin): three review findings — audit no-op, 404, chapter priority (0.41.1) - admin_safe_set_is_admin: short-circuit when target.is_admin == value, before writing audit. PATCH {is_admin: true} on someone already admin previously wrote a misleading "promote_user" row even though the UPDATE was a no-op. - list_chapters (/admin/mangas/:id/chapters): explicit exists() check on manga_id, returns 404 instead of 200 [] for a typo'd / deleted manga. - ChapterSyncState priority: the Failed branch now requires page_count = 0, so a chapter with pages on disk AND a historical dead job (from a re-download attempt that crashed) stays Synced. The old order contradicted Synced's documented "downloaded at some point" contract. Doc comments updated alongside the SQL. Three new regression tests pin the behaviour.	2026-05-30 21:58:15 +02:00
MechaCat02	b434c9b68d	feat(frontend): /admin dashboard with users/mangas/system views (0.41.0) Adds the SvelteKit /admin route tree backed by the admin endpoints landed in PR 1-4. Pages: Overview (alerts + summary cards), Users (list / promote-demote / delete), Mangas (list with sync state + expandable per-chapter state), System (live disk/mem/cpu bars, refreshing every 5s). Security model: the backend's RequireAdmin extractor is the actual boundary. /admin/+layout.ts calls getSystemStats() at load and translates the response — 401 → redirect to /login, 403 → throw SvelteKit error(403) which renders the framework error page. The header's "Admin" link is hidden unless `session.user?.is_admin`, but that's UX only. Carries `is_admin: boolean` through to the frontend User TS type so the header check works and so admin tables can show role per row. Vitest covers lib/api/admin.ts (10 tests: list/delete/PATCH for users, sync-state filter for mangas, nested chapter route, system disk-nullable case). Playwright is intentionally deferred until the routes stabilise — admin UI is operator-only and changes shape often in v0.	2026-05-30 21:49:39 +02:00
MechaCat02	cc4ec76d17	feat(api): admin system metrics endpoint with disk/mem/cpu alerts (0.40.0) Adds GET /api/v1/admin/system returning disk (scoped to storage_dir via statvfs), memory, CPU, and a server-side alerts array that fires at >90% disk or memory. Disk uses nix::sys::statvfs directly rather than sysinfo's Disks API to avoid mountpoint-matching gymnastics for the storage_dir. A new `Storage::local_root() -> Option<&Path>` trait method exposes the root; the default returns None so a future S3Storage gets `disk: null` in the response instead of fabricated numbers. CPU is sampled inline (refresh → 250ms sleep → refresh → read) so the endpoint adds 250ms of latency per call. No background-cache yet — admin traffic is low-volume and the moving parts aren't worth it until polling shows up. Alerts are evaluated server-side so the frontend can render them without re-implementing the thresholds.	2026-05-30 21:45:06 +02:00
MechaCat02	bf7c9b5c2a	feat(api): admin manga/chapter overview with derived sync state (0.39.0) Adds GET /api/v1/admin/mangas and /admin/mangas/:id/chapters guarded by RequireAdmin. Sync state is computed at query time from the existing crawler signals (manga_sources / chapter_sources / crawler_jobs) — no new state column is persisted, so the crawler stays the single writer of these signals. Per-manga priority: InProgress (in-flight sync_manga or sync_chapter_list job) > Dropped (all source rows soft-dropped) > Synced (default; covers user-uploaded mangas with zero source rows). Per-chapter priority: Downloading (in-flight sync_chapter_content) > Dropped (all source rows soft-dropped) > Failed (most-recent terminal job is dead) > NotDownloaded (page_count = 0) > Synced. The Failed check sits ABOVE NotDownloaded so the more informative "we tried and it died" state wins over "we never got around to it" — see the priority comment in repo/admin_view.rs. Migration 0020 adds a partial index on crawler_jobs((payload->>'source_manga_key')) for the one job kind (sync_manga) whose payload doesn't carry manga_id directly — without it the in-flight detection for a manga falls back to a seqscan over the job table.	2026-05-30 21:41:09 +02:00
MechaCat02	0b2018ceca	feat(api): admin user management endpoints with audit log (0.38.0) Adds /api/v1/admin/users list / DELETE / PATCH guarded by RequireAdmin, plus the audit-log substrate every future destructive admin endpoint will reuse. Safety properties: - Cannot self-delete or self-demote (409 conflict, message calls out "yourself" so the UI can render an explanation). - Cannot remove the last admin via either DELETE or demote. The check takes pg_advisory_xact_lock(ADMIN_INVARIANT_LOCK_KEY) and re-counts admins inside the same tx, closing the parallel-demote race that a bare "if count > 1" check would let through. The HTTP-serial path to this guard is structurally unreachable (the actor would have to be the lone admin demoting themselves, which the self-guard fires on first); the parallel race test exercises it via repo calls. Audit log (admin_audit table) records the action inside the same tx as the action itself, so a rolled-back action never leaves an orphan audit row. actor_user_id is ON DELETE SET NULL so the log outlives a later-deleted admin. target_id is not a FK because future audit kinds will target non-user rows.	2026-05-30 21:35:35 +02:00
MechaCat02	ab8b7acc34	feat(auth): admin role with cookie-only RequireAdmin extractor (0.37.0) Adds an `is_admin` flag on users plus the substrate every later PR in the admin feature builds on: - migration 0018 adds the column with default false - `repo::user::bootstrap_admin` creates or promotes the user named by `ADMIN_USERNAME` at startup, hashing `ADMIN_PASSWORD` only when the row is new — never overwriting an existing hash, so an operator can rotate the admin password via the UI without env-var conflict - `CurrentSessionUser` extractor accepts only the session cookie; `RequireAdmin` composes over it and additionally requires `user.is_admin`. Bearer tokens are intentionally excluded so an admin's bot token never inherits admin authority (privilege-escalation surface that bites every "API keys reuse user perms" auth design) - demotion is instant: `RequireAdmin` re-reads the user row each request `/api/v1/auth/me` now exposes `is_admin`; no other response embeds `User`, so no privacy fanout to audit.	2026-05-30 21:26:26 +02:00
MechaCat02	9925f54695	fix(crawler): narrow browser-dead heuristic to typed downcasts (0.36.7) anyhow_looks_browser_dead substring-matched any chain message containing channel / connection / websocket / transport / closed / nav timeout. Real chromium failures hit those words, but so do reqwest TCP-reset errors during CDN image downloads, sqlx pool-timeout errors, and any number of non-browser failures — each of which triggered a wasted chromium relaunch + session-probe re-run against the catalog's rate-limit budget. Drop the substring pass. Walk the chain looking only for typed NavError (flagged via is_likely_browser_dead) or CdpError. Every place we feed a chromium error into anyhow goes through one of those types, so the typed downcasts cover the real cases without the false-positive surface. NavError::is_likely_browser_dead also drops its own substring check on Cdp(e); any CdpError surfacing at the navigation layer means the chromium-facing channel is the failing layer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 20:41:59 +02:00
MechaCat02	eaa5afda50	fix(crawler): skip sync when empty chapters + prior > 0 (0.36.6) The wait_for_selector wait in 0.36.2 narrows the partial-render race window but doesn't close it: a render that takes longer than SELECTOR_TIMEOUT (10s) still hands an empty Vec to sync_manga_chapters, and the soft-drop branch flips every existing chapter to dropped_at. The next tick recovers but a manga's reader briefly stops working in between. Close it at the pipeline level. Between fetch_manga and the upsert/sync, if the parsed chapter list is empty and the prior live count for (source_id, source_manga_key) is > 0, treat the fetch as a transient failure: log, bump mangas_failed, skip upsert + sync + the seen.insert so a later batch / tick retries. Brand-new mangas with genuinely zero chapters (prior == 0) pass through unchanged. New repo helper repo::crawler::live_chapter_count_for_source_manga joins chapters → chapter_sources → manga_sources with dropped_at IS NULL — same lockstep as dispatch_target and the enqueue queries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 20:17:42 +02:00
MechaCat02	5c04b0532b	fix(crawler): panic-isolate the cron tick body (0.36.5) Worker dispatch was already wrapped in AssertUnwindSafe(...).catch_unwind() — a panicking handler acks the job as failed and the worker keeps going. The cron tick had no such guard: a panic in metadata.run, enqueue_bookmarked_pending, reap_done, or write_last_tick would kill the cron task. The JoinSet would drop it, workers would keep running, and no future metadata pass would ever fire until daemon restart. Wrap the tick body (between advisory-lock acquire and unlock) in the same AssertUnwindSafe(...).catch_unwind() pattern. The unlock and connection drop run unconditionally so a panicked tick doesn't leave the lock held for another replica. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 20:08:11 +02:00
MechaCat02	655ea42731	fix(crawler): scope dispatch_target to live sources, newest first (0.36.4) The chapter dispatcher's URL resolver had no dropped_at filter and no ORDER BY — a chapter whose only chapter_sources row had been soft-dropped was still dispatched against the stale URL, eating retry budget on guaranteed transients. With multiple live sources the LIMIT 1 winner was nondeterministic. Add `AND cs.dropped_at IS NULL` and `ORDER BY cs.last_seen_at DESC` to dispatch_target, bringing it in lockstep with the enqueue queries in pipeline.rs that already filter on dropped_at. Returns None when all sources are dropped — callers in daemon.rs already treat None as "ack the job, skip the work." Tests in tests/repo_chapter.rs cover the three branches (freshest live wins, dropped sources skipped, all-dropped returns None). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 20:03:45 +02:00
MechaCat02	70e8a7895c	fix(crawler): relaunch chromium on CDP / nav-timeout errors (0.36.3) BrowserManager only re-launched chromium when the cached handle was None. A crash mid-pass left the handle Some pointing at a dead process — every subsequent acquire returned the zombie Browser, and every nav cascaded CDP errors until the idle reaper fired. Add BrowserManager::invalidate(): take the inner mutex, drop the handle (closing it if present), and signal the next acquire to relaunch. Idempotent — invalidating an empty handle is a no-op. Wire detection via NavError::is_likely_browser_dead and a chain-walking anyhow_looks_browser_dead helper: substring-match common channel/connection/transport/WebSocket markers and surface NavError::Timeout as "presumed dead." Apply at both error boundaries — RealChapterDispatcher::dispatch and RealMetadataPass::run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 18:39:19 +02:00
MechaCat02	8e0b638e3f	fix(crawler): wait for page marker instead of fixed 1s sleep (0.36.2) A chromium snapshot taken between the wrapper-render and row-render phases let parse_chapter_list return Ok(vec![]) for a manga that actually has chapters — the soft-drop branch in sync_manga_chapters then flipped every existing chapter to dropped_at. Add wait_for_selector to crawler::nav. navigate() now takes a CSS marker matching the most-specific element the downstream parser will look for (one of LIST_PAGE_MARKER / DETAIL_PAGE_CHAPTERS_MARKER / DETAIL_PAGE_LAYOUT_MARKER). The wait is best-effort and capped by SELECTOR_TIMEOUT (10s); a legitimately empty page can still pass through because the parser's #chapter_table sentinel and the universal broken-page body check stay in force. Same pattern wired at the reader nav (a#pic_container) and probe nav (#logo), replacing the implicit assumption that the post-load JS had finished within 1 second. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 18:29:38 +02:00
MechaCat02	e2bd1462ba	fix(crawler): wrap wait_for_navigation in 30s timeout (0.36.1) A hung TLS handshake or a page that never fires load could wedge a worker (or the cron metadata pass) indefinitely — chromiumoxide imposes no navigation timeout of its own. New crawler::nav::wait_for_nav caps each navigation at NAV_TIMEOUT (30s) and returns a typed NavError so timeouts surface as transient (retryable) errors. Wired at the three navigation sites: - source::target::navigate (catalog/detail/pagination) - content::sync_chapter_content (chapter reader) - session::fetch_probe_html (session probe) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 18:10:51 +02:00
MechaCat02	9f56f283d4	feat(crawler): single-mode walker gated by recovery flag (0.36.0) Collapses the crawler to a single newest-first walker and replaces the N-consecutive-unchanged streak with a per-manga rule: stop on the first manga where metadata is Unchanged AND chapter sync reports zero new chapters. The early stop is gated by a per-source recovery flag stored in `crawler_state` — set to `false` when a run starts, back to `true` only on a clean exit (end-of-walk or intentional stop). A crashed run leaves the flag `false` automatically (no shutdown code runs), so the next tick walks the full catalog instead of bailing at the first caught-up manga. This means a crashed mid-walk run self-heals on the next tick: the flag stays `false`, the next walk visits every page (recovering anything the crash missed past its crash point), and steady state resumes once the recovery sweep reaches end-of-walk. Removed: - DiscoverMode enum, Backfill mode, the boundary re-check + displaced-refs machinery in TargetSourceWalker. - Drop-pass (mark_dropped_mangas) and seed-completion plumbing (mark_seed_completed / seed_completed_at). The recovery flag subsumes the seed-completion signal; drop detection was explicitly opted out. - JobPayload::Discover (no production callers). - CRAWLER_MODE / CRAWLER_INCREMENTAL_STOP_AFTER env vars and the CrawlerModePref config type. `should_mark_clean_exit(walked_to_completion, hit_stop_condition)` encodes the clean-exit truth table in its signature — `hit_limit` is deliberately absent so a future edit cannot accidentally count a caller-imposed cap as a clean exit. Net -501 lines, 261 backend tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 23:49:28 +02:00
MechaCat02	33f7e19077	fix(crawler): serialize sync_manga_chapters per-manga (0.35.6) Two concurrent calls of sync_manga_chapters for the same manga both read seen_keys, both run the drop UPDATE filtered on `NOT (key = ANY $3)`, and the later commit can soft-drop a chapter the earlier had just inserted (lost-update under MVCC). Today the cron tick is the only caller and the daemon-level advisory lock keeps it single-flight, but that lock is held on one pool connection and doesn't actually serialize the function: any future caller (bookmark hook, admin-triggered re-sync, parallel worker) would race against the cron. Add `pg_advisory_xact_lock(hashtextextended(manga_id::text, 0))` at the start of the transaction. Auto-releases on commit/rollback so a panic mid-call can't strand the lock. Lock keyed per-manga so calls for different mangas still parallelize. Test sync_chapters_serializes_concurrent_calls_for_same_manga spawns two tokio tasks calling the function concurrently with overlapping chapter lists and asserts every chapter survives. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 20:45:01 +02:00
MechaCat02	c6bb9160e3	fix(crawler): scope chapter_sources lookup per-manga (0.35.5) chapter_sources's PRIMARY KEY was (source_id, source_chapter_key) and the lookup in sync_manga_chapters didn't constrain by manga_id, so a source whose chapter slugs aren't globally unique (e.g. "chapter-1" appearing under multiple mangas) silently attributed every collision to the first manga that synced it. The INSERT path would have conflicted on the second manga's sync. Migration 0017 drops the old PK and rekeys on (source_id, chapter_id) — the natural identity of a per-source chapter attachment — and adds an index on (source_id, source_chapter_key) for the lookup path. The repo lookup now joins chapters and filters by manga_id; the UPDATE path keys on chapter_id directly (the row's natural identifier post-migration). Test sync_chapters_isolates_colliding_keys_across_mangas pins the contract end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 20:43:08 +02:00
MechaCat02	50763addcf	fix(crawler): quarantine recently-dead chapters from re-enqueue (0.35.4) The partial dedup index only blocks (pending|running) duplicates, so once a SyncChapterContent job transitions to 'dead' (max_attempts exhausted) the slot frees. Every subsequent cron tick re-enqueued the chapter — page_count = 0 and dropped_at IS NULL stay true — burned another max_attempts retries, and died again. Permanent-failure chapters spun forever. enqueue_bookmarked_pending and enqueue_pending_for_manga now skip chapters whose latest sync_chapter_content job is dead within CHAPTER_DEAD_QUARANTINE_DAYS (7). A failed chapter goes silent for a week, then gets one more shot — long enough for a transient site issue to resolve, short enough that permanent failures don't stay permanent if conditions change. Two integration tests pin both halves of the contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 20:42:41 +02:00
MechaCat02	766c6eebac	fix(crawler): guard ack_done/ack_failed/release on state='running' (0.35.3) The three lease-ack functions matched their UPDATE on the job id alone. If a lease expired and another worker re-leased the row, a late ack from the original worker would clobber the new lease's state, leased_until, and (for release) decrement its attempts. Add `AND state = 'running'` to each UPDATE and log a warn when rows_affected is zero, so a stolen lease shows up in telemetry without blocking the new lease holder's progress. Three new integration tests pin the contract: - ack_done_no_ops_when_lease_was_stolen - ack_failed_no_ops_when_state_is_not_running - release_no_ops_when_state_is_not_running Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 20:42:18 +02:00
MechaCat02	c686d6eb51	fix(crawler): sentinel-gate parse_chapter_list to stop false drops (0.35.2) parse_chapter_list previously returned Vec::new() on any selector miss. The empty list flowed into sync_manga_chapters, whose soft-drop branch then flipped every existing chapter's dropped_at to NOW(). Bookmarks subsequently pointed at dropped sources, and enqueue_bookmarked_pending (filters on cs.dropped_at IS NULL) silently stopped re-fetching pages. Same shape as the walker race fixed in 0.35.1: a transient parse miss masquerading as "source removed everything" → false soft-drop. Fix: require #chapter_table in the DOM. Present-but-empty is preserved as Ok(vec![]) so a freshly added manga with no published chapters still parses cleanly. Absent table is now Transient — the job system reschedules with backoff instead of treating the partial render as data. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 20:41:47 +02:00
MechaCat02	dea9b1aaa8	fix(crawler): close walker race against site reordering (0.35.1) The target site orders by update_date DESC, and any new or updated manga pushes everyone down by one slot. The paginated walker was blind to this drift: * Backfill (page last -> 1): shifts push items into pages already finished. The displaced manga was silently missed; with mark_dropped_mangas running on a fully-completed walk, items even got false-dropped because last_seen_at was stale. * Incremental (page 1 -> last): a shift causes the slot-last item of an already-read page to reappear on the next page, leading to a redundant fetch_manga and an inflated consecutive_unchanged streak. Fix is two-pronged: 1. Backfill boundary re-check. After fetching each page P, re-fetch the previously-walked page P+1 and check where its old slot-0 key now sits. If it slid to slot K, the first K entries are items that used to live on P and slid past us; they get appended to the batch. If the anchor is gone entirely (multi-page shift or it was bumped to page 1), the whole re-fetched page is processed conservatively and the pipeline dedup absorbs the noise. The re-check must be the last navigation of the iteration to close the within-iteration race. 2. Run-scoped dedup in run_metadata_pass. A HashSet<String> of source_manga_keys avoids double-processing. The set uses a contains-then-insert pattern with insert firing after a successful upsert, so a transient fetch/upsert failure leaves the key retryable if it reappears later in the same pass (via the boundary re-check or another batch). Incremental mode does not run the re-check (shifts move in the same direction as the walk); only the dedup helps it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 20:14:01 +02:00
MechaCat02	f57ca8e45c	feat: harden auth, shutdown, and session bundle (0.35.0) Three features bundled into one release: - rate-limit /auth/login, /register, /me/password (token bucket, 5 req/sec sustained with 10-request burst by default; 429 + Retry-After header on hit; tracing::warn! per hit so operators see attack patterns; AUTH_RATE_PER_SEC / AUTH_RATE_BURST env knobs) - handle SIGTERM for graceful container stops (replaces bare ctrl_c() with a select over ctrl_c + SignalKind::terminate() so docker compose stop runs the daemon shutdown path instead of letting Chromium leak past SIGKILL) - clear session.user on 401 from any API call (setOn401Hook in api/client.ts, registered from session.svelte.ts gated on $app/environment::browser so the SSR bundle never installs it; fixes "logged in but no bookmarks/collections" mid-session expiry state) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 20:27:21 +02:00
MechaCat02	8d34132883	bugfix: security & correctness bundle (0.34.1) Five fixes bundled into one release: - preserve user-attached tags across crawler upserts (repo::crawler::sync_tags now scopes to added_by IS NULL; orphaned attachments from deleted users are reaped as crawler-owned) - gate manga PATCH and cover endpoints on uploaded_by (require_can_edit in api::mangas; non-NULL uploaded_by must match the caller) - equalise login response time across user-existence branches (run argon2 against a OnceLock-cached dummy hash on the no-user branch so timing doesn't leak username existence) - crawler download defences (SSRF allowlist of host literals including IPv4-mapped IPv6 ranges, 32 MiB streamed size cap, reject non-whitelisted image types, three-way chapter-probe classifier replaces the binary #avatar_menu check) - tighten validation and clean up dead unload path (attach_tag + create_token enforce 64-char caps; LocalStorage rejects NUL bytes explicitly; reader flushFinalProgress drops the always-405 sendBeacon path) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 20:24:51 +02:00
MechaCat02	c5c1179e9d	chore: full hop-by-hop header strip and 60s timeout on /api/* proxy The SvelteKit proxy was only stripping host + content-length; the rest of RFC 7230 §6.1 (connection, keep-alive, proxy-authenticate, proxy-authorization, te, trailer, transfer-encoding, upgrade) leaked through to axum. Axum doesn't emit them so the impact is theoretical, but the proxy should be RFC-conformant. Also adds an AbortController with a configurable 60s timeout (BACKEND_PROXY_TIMEOUT_MS) so a wedged backend can't hang the browser request indefinitely — failures surface as the standard 502 upstream_unavailable envelope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 20:24:05 +02:00
MechaCat02	c320eda7cd	chore: dedupe is_unique_violation, lift SQL into repo, centralise URL parsing Three layering cleanups from REVIEW.md §5 / §3: - Drop the three private `is_unique_violation` helpers in repo::{user,chapter,bookmark} in favour of sqlx 0.8's `DatabaseError::is_unique_violation()` method (already used by repo::collection). - Remove the unreachable 23505 branch in repo::chapter::create — the (manga_id, number) UNIQUE was dropped in 0013, so the defensive arm could no longer fire. A doc note records what to do if uniqueness is re-added. - Move three inline SQL queries out of handlers/daemon into repo functions: bookmarks' chapter-belongs-to-manga guard (`repo::chapter::belongs_to_manga`), the daemon's dispatch lookup (`repo::chapter::dispatch_target`), and the daemon's page_count safety net (`repo::chapter::page_count`). Restores the handlers→repo layering invariant in CLAUDE.md. - New `crawler::url_utils` module consolidates host_of / origin_of / registrable_domain — they used to live in three crawler submodules with diverging edge-case behaviour. Tests moved with them. - Doc cross-references on repo::author::set_for_manga and repo::genre::set_for_manga pointing to the crawler's name-keyed variants, so the intentional duplication is discoverable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 20:24:05 +02:00
MechaCat02	bd9a6bd257	chore: drop dead 'failed' branch from crawler_jobs partial index 0012_crawler.sql's partial index on `state IN ('pending','failed')` indexes a state that no code path ever writes — ack_failed in crawler/jobs.rs only ever moves jobs to 'dead' or 'pending'. The 'failed' branch costs a write on every state change without ever matching a query. Drop it; the CHECK still allows 'failed' so a future migration can re-introduce it cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 20:24:05 +02:00
MechaCat02	ebc1966103	chore: run containers as non-root, add HEALTHCHECK, npm ci Backend: new `app` user (UID 10001), STORAGE_DIR pre-chowned so the named volume inherits ownership, curl installed for the HEALTHCHECK that pings /api/v1/health. The crawler's Chromium uses --no-sandbox already so dropping privileges costs nothing operationally. Frontend: switch `npm install` to `npm ci` (matches CI; deterministic versions; refuses to silently rewrite package-lock.json mid-build). Run as the built-in `node` user via --chown=node:node, add a busybox wget HEALTHCHECK on port 3000. Both images now expose container-level health so orchestrators can take a wedged container out of rotation instead of letting it keep serving timeouts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 20:24:05 +02:00
MechaCat02	e4333631e1	chore: run CI on PRs, require POSTGRES_PASSWORD, document HTTPS need - .gitea/workflows/deploy.yml: trigger on pull_request to main so PRs get test feedback; gate build-and-push + deploy on push events so PRs only run the test jobs (no registry push, no SSH deploy). - docker-compose.yml: change `${POSTGRES_PASSWORD:-mangalord}` to `${POSTGRES_PASSWORD:?...}` so a deploy without an .env fails fast instead of booting Postgres with a known-default credential. - .env.example: change the example value to a "change-me" sentinel, add a banner explaining that production needs HTTPS in front of the frontend container because COOKIE_SECURE=true makes browsers refuse cookies over plain HTTP. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 20:24:05 +02:00
MechaCat02	e7662d18d6	feat: gitea actions for build, push, and ssh deploy (0.34.0) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 06:56:13 +02:00
MechaCat02	45ce0d8f12	feat: incremental crawl mode with seed-completion gate (0.33.0) Daemon now auto-detects mode per source: Backfill until the first full walk records `seed_completed:<source>` in `crawler_state`, then Incremental (newest-first, stops after N consecutive Unchanged upserts). `CRAWLER_MODE` overrides to a fixed mode; CLI rejects `auto` since it has no pre-run DB state. `Source::discover` returns a lazy `DiscoverWalk` so Incremental can break out mid-walk without prefetching pages. The drop pass and seed marker are now gated on a true full walk — fixes a latent soft-drop of the index tail under partial sweeps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 06:41:26 +02:00
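The Incremental stop rule in 45ce0d8f12 (newest-first, stop after N consecutive Unchanged upserts) can be sketched as below. This is a minimal illustration only: `UpsertOutcome`, `incremental_walk`, and `stop_after` are hypothetical names, and the real walk is the lazy `DiscoverWalk`, which a plain iterator stands in for here.

```rust
/// Result of upserting one discovered manga. The variant names follow the
/// New/Updated/Unchanged outcomes mentioned elsewhere in this log; the type
/// itself is a hypothetical stand-in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UpsertOutcome {
    New,
    Updated,
    Unchanged,
}

/// Consumes a lazy, newest-first walk and returns how many items were
/// processed before stopping. `stop_after` plays the role of "N".
fn incremental_walk<I>(walk: I, stop_after: usize) -> usize
where
    I: IntoIterator<Item = UpsertOutcome>,
{
    let mut consecutive_unchanged = 0;
    let mut processed = 0;
    for outcome in walk {
        processed += 1;
        if outcome == UpsertOutcome::Unchanged {
            consecutive_unchanged += 1;
            // Enough unchanged entries in a row: assume the rest of the
            // index is already known and stop without fetching further.
            if consecutive_unchanged >= stop_after {
                break;
            }
        } else {
            // Any New/Updated entry resets the streak.
            consecutive_unchanged = 0;
        }
    }
    processed
}
```

Because the walk is consumed lazily, breaking out of the loop means later pagination pages are never requested, which is the point of returning a lazy walk from `Source::discover`.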
MechaCat02	51f42b03e9	feat: default crawler browser to headless (0.32.0) LaunchOptions::from_env() and LaunchOptions::default() now return BrowserMode::Headless. The in-process daemon (via CrawlerConfig::from_env) and the standalone crawler binary both pick this up — no display required for production runs, smaller resource footprint. `Headed` stays as an explicit opt-in via CRAWLER_BROWSER_MODE=headed for debugging or sites that fingerprint headless Chrome. New unit test locks the default in place. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 20:27:05 +02:00
MechaCat02	fa0a7da311	feat: edit existing manga metadata (0.31.0) Adds PUT /mangas/:id/cover (multipart) and DELETE /mangas/:id/cover so covers can be replaced or cleared after creation, and wires a dedicated /manga/[id]/edit SvelteKit route that combines the existing PATCH with the new cover endpoints. Cover PUT cleans up the old blob when the extension changes, swallowing StorageError::NotFound so a manually-gone file doesn't surface as a 404 to the client. Edit link on the manga detail page is gated on session.user, matching the auth posture of the underlying handlers. Also pins the local-dev port story via loadEnv() in vite.config.ts so VITE_PORT / BACKEND_URL from a (gitignored) .env keep the dev URL stable across runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 20:26:23 +02:00
MechaCat02	9ff49166a5	feat: transient-page detection across the crawler (0.30.0) Until now, when the target site returned its 403 "we're sorry, the request file are not found" response on a page that actually exists, selectors matched nothing and the crawler treated the page as "legitimately empty". Pagination walks silently dropped whole pages worth of mangas, fetch_manga skipped individual entries, and the startup session probe blamed PHPSESSID for what was a site hiccup. This branch adds a single detection layer that the whole pipeline routes through: - `crawler::detect`: PageError::Transient typed signal, plus two primitives (`is_broken_page_body` matches the universal 403 body; `has_logo_sentinel` asserts #logo, the site-wide header element) and a `retry_on_transient` helper that retries a closure on Transient with a small attempt budget. - `navigate()` screens every fetched body for the broken-page signature before handing it to a selector. - Parsers (`parse_manga_list_from`, `parse_manga_detail`, `parse_chapter_pages`) check their structural sentinels (#logo for full-layout pages; a#pic_container for the reader, which doesn't render #logo) and return Result<_, PageError>. Empty Vec is now reserved for genuinely empty pages. - `discover()` retries each pagination page up to 3× (2s apart) before failing the whole Discover job — at which point the existing job system's retry/backoff takes over for longer outages. - `verify_session` is three-state: broken-page → retry probe; #logo present but #avatar_menu absent → genuine logout (the only state that should blame PHPSESSID); both present → ok. Test coverage added at the helper level: 13 unit tests for the detection module (body signature, logo sentinel, PageError, retry helper), parser-level tests for both transient and legitimately-empty inputs, and 6 unit tests for the session probe classifier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 22:47:21 +02:00
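A rough sketch of the `retry_on_transient` idea from 9ff49166a5: retry a closure while it reports a transient page, within a small attempt budget. This version is synchronous and uses only the standard library; the actual helper's signature is not shown in this log, and the `Fatal` variant is an assumption added for illustration.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Simplified stand-in for the crawler's page error type.
#[derive(Debug, PartialEq)]
enum PageError {
    /// The site served its broken-page body; worth retrying.
    Transient,
    /// Hypothetical non-retryable failure, not taken from the log.
    Fatal(String),
}

/// Calls `op` up to `max_attempts` times, sleeping `delay` between tries.
/// Only `PageError::Transient` triggers a retry; success and any other
/// error are returned immediately.
fn retry_on_transient<T, F>(max_attempts: u32, delay: Duration, mut op: F) -> Result<T, PageError>
where
    F: FnMut() -> Result<T, PageError>,
{
    let mut attempt = 1;
    loop {
        match op() {
            Err(PageError::Transient) if attempt < max_attempts => {
                attempt += 1;
                sleep(delay);
            }
            // Ok, a non-transient error, or Transient with the budget spent.
            other => return other,
        }
    }
}
```

With a budget of 3 and a 2s delay this would mirror the `discover()` behaviour described above: three tries per pagination page, after which the error propagates to the job system's own retry/backoff.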
MechaCat02	b845d88766	feat: bookmark create enqueues SyncChapterContent jobs (0.29.0) After a successful bookmark insert, the create handler spawns a detached tokio task that calls pipeline::enqueue_pending_for_manga for every chapter of the manga where page_count = 0 and the source row is not dropped. Bookmark create returns 201 immediately; enqueue work happens in the background and its failure is logged without surfacing to the user (the daily cron sweeps anything missed). The Phase A dedup index handles re-bookmarks idempotently — deleting and recreating a bookmark does not duplicate in-flight jobs — and the Phase B worker pool drains them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 20:59:14 +02:00
MechaCat02	9fe0f26d75	feat: in-process crawler daemon with cron and worker pool (0.28.0) The backend now boots an internal crawler daemon that runs a daily metadata pass (CRAWLER_DAILY_AT in CRAWLER_TZ, advisory-lock guarded for multi-replica safety) and drains SyncChapterContent jobs from crawler_jobs through a worker pool. Chromium launches lazily on first job and is torn down after CRAWLER_IDLE_TIMEOUT_S seconds of inactivity. Modules: - crawler::browser_manager: lazy-launch / idle-teardown wrapper around browser::Handle, with an on_launch hook that re-injects PHPSESSID on every fresh Chromium spawn. - crawler::pipeline: run_metadata_pass (the shared discover/upsert/cover/sync-chapters loop) and the enqueue_bookmarked_pending helper used by the cron tick. - crawler::daemon: cron task + worker pool, behind two trait seams (MetadataPass, ChapterDispatcher) so tests can inject stubs without standing up Chromium or a live source. Behavior: - CRAWLER_DAEMON=false skips daemon spawn entirely (default for tests). - Catch-up tick fires on startup if the last persisted slot was missed. - A SyncOutcome::SessionExpired sets a sticky AtomicBool; workers idle until operator restart with a refreshed PHPSESSID. - Worker dispatch wrapped in catch_unwind so a panicking handler marks the job failed instead of taking down the worker. - Migration 0015 adds a small crawler_state k-v table for the last_metadata_tick_at watermark. Dep additions: chrono-tz (IANA TZ parsing). CLI (bin/crawler) reuses pipeline::run_metadata_pass and now holds the browser via BrowserManager so the on_launch session injection flow stays in one place. Inline chapter-content sync semantics are unchanged: the queue is for the daemon, force-refetches and manual backfills still bypass it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 20:32:02 +02:00
MechaCat02	93c7fd63fc	feat: crawler job queue ops and dedup index (0.27.0) Adds enqueue / lease / ack_done / ack_failed / release / reap_done on crawler::jobs, backed by the existing crawler_jobs table. lease() uses a single FOR UPDATE SKIP LOCKED CTE that also re-claims stale running rows (crashed-worker recovery), and ack_failed applies an exponential backoff capped at 1h before retrying. Migration 0014 adds a partial unique index on (payload->>'chapter_id') restricted to (pending|running) sync_chapter_content jobs, so producers can just INSERT ... ON CONFLICT DO NOTHING without racing each other. The slot frees again the moment the job leaves the in-flight states, so a future force-refetch can re-enqueue. Library-only: no daemon, no API hook. Those land in the next two phases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 19:59:09 +02:00
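The "exponential backoff capped at 1h" that ack_failed applies in 93c7fd63fc could look roughly like the following. Only the one-hour cap comes from the commit message; the 30-second base delay, the doubling factor, and the function name `backoff_delay` are assumptions made for the sketch.

```rust
use std::time::Duration;

/// Assumed base delay for the first retry (not stated in the log).
const BASE_DELAY_SECS: u64 = 30;
/// Cap taken from the commit message: one hour.
const MAX_DELAY_SECS: u64 = 60 * 60;

/// Delay before retry number `attempt` (0-based): base * 2^attempt,
/// clamped to the cap. Saturating arithmetic keeps large attempt counts
/// from overflowing.
fn backoff_delay(attempt: u32) -> Duration {
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    let secs = BASE_DELAY_SECS.saturating_mul(factor).min(MAX_DELAY_SECS);
    Duration::from_secs(secs)
}
```

Under these assumed constants the delay doubles from 30s and reaches the cap at the eighth failure, after which every retry waits the full hour.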
MechaCat02	89b84252a5	bugfix: subquery-wrap pending chapters query so DISTINCT + ORDER BY agree (0.26.1) PG rejects `SELECT DISTINCT c.id, c.manga_id, cs.source_url ... ORDER BY c.manga_id, c.created_at` because the ORDER BY references a column not in the DISTINCT projection. Wrap the DISTINCT in a subquery (which includes created_at) and apply the ORDER BY in the outer SELECT. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 22:20:15 +02:00
MechaCat02	728d704a66	feat: CRAWLER_KEEP_BROWSER_OPEN waits for Ctrl+C in headed mode (0.26.0) Debug aid: when set in headed mode, the crawler blocks on Ctrl+C at every shutdown point (early auth bails + normal completion) instead of closing the browser immediately. Operator can inspect DOM, cookies, and network state in the visible Chromium window before exit. Ignored in headless (no window to inspect) — logged as a warning if set under headless so the operator doesn't sit waiting. chromiumoxide's `Browser` is `kill_on_drop`, so the close-or-wait helper must await Ctrl+C before the Handle is dropped — otherwise the Chromium child gets killed out from under the operator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 21:33:18 +02:00
MechaCat02	d24e68c78d	feat: chapter content sync via PHPSESSID + per-host pacing (0.25.0) After the metadata pass, the crawler now fetches per-chapter image content for chapters belonging to bookmarked mangas. Logged-in chapter pages render every page image at once (no per-page navigation), so the crawler reuses the operator's browser session via a pasted PHPSESSID cookie. Each chapter sync is a single transaction: storage puts + page row inserts + page_count update commit together, or roll back together on any image error so the chapter stays at page_count=0 and is retried next run. New crawler modules: - `rate_limit::HostRateLimiters`: per-host buckets keyed by URL host, with optional per-host overrides. Replaces the single shared `Mutex<RateLimiter>`. Catalog and CDN no longer share a budget; default 1 req/s per host. - `session`: derives `.<registrable>.<tld>` from the start URL (override via `CRAWLER_COOKIE_DOMAIN` for multi-part TLDs), injects PHPSESSID into the Chromium cookie store, probes `#avatar_menu` at startup to fail fast on a bad/expired cookie. - `content`: parses `a#pic_container img:not(.loading)` with `pageN` id-based sorting (DOM order isn't trusted), then performs the atomic chapter sync. bin/crawler additions: - Concurrent chapter content phase via `futures_util::for_each_concurrent` (`CRAWLER_CHAPTER_WORKERS`, default 1). Browser is borrowed across workers — chromiumoxide allows concurrent `new_page` on `&self` — and per-host rate limit gates total RPS regardless of worker count. - reqwest gets the `cookies` feature, a `Jar` seeded with PHPSESSID for the catalog domain only (CDN intentionally not given the cookie), and `Referer` is set on cover + chapter image fetches. - New env knobs: `CRAWLER_PHPSESSID`, `CRAWLER_COOKIE_DOMAIN`, `CRAWLER_USER_AGENT`, `CRAWLER_CHAPTER_WORKERS`, `CRAWLER_SKIP_CHAPTER_CONTENT`, `CRAWLER_FORCE_REFETCH_CHAPTERS`, `CRAWLER_CDN_HOST` + `CRAWLER_CDN_RATE_MS`. 
- Mid-run session-expired detection: `#avatar_menu` is re-checked on every chapter page nav; first failure aborts the phase with a cookie-refresh message. Bookmark-driven enqueueing is sync-on-crawl-tick only: the bookmarked chapters with `page_count = 0` are queried at the start of the chapter-content phase. Sync-on-bookmark via an API hook is deferred to a follow-up branch — that needs a daemon consumer of crawler_jobs, which doesn't exist yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 00:28:36 +02:00
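The `pageN` id-based ordering mentioned for the `content` module in d24e68c78d can be illustrated as follows. This sketch assumes each image has already been extracted as an `(id, src)` pair and that images without a `pageN` id are skipped; both the input shape and the helper names are hypothetical.

```rust
/// Extracts N from an element id of the form `pageN`.
fn page_number(id: &str) -> Option<u32> {
    id.strip_prefix("page")?.parse().ok()
}

/// Orders image URLs by the numeric part of their `pageN` id rather than
/// by DOM order. Input pairs are (element id, image src).
fn sort_pages_by_id(mut images: Vec<(String, String)>) -> Vec<String> {
    // Assumption: anything without a parseable pageN id is not a page image.
    images.retain(|(id, _)| page_number(id).is_some());
    // Numeric sort, so page10 lands after page2 (a string sort would not).
    images.sort_by_key(|(id, _)| page_number(id));
    images.into_iter().map(|(_, src)| src).collect()
}
```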
MechaCat02	51346227dd	feat: route reader by chapter id, allow duplicate-numbered chapters (0.24.0) Real-world sources publish multiple chapters at the same number: different scanlators ("Ch.52 from bloomingdale" + "Ch.52 from mina"), translator notices and farewells, alt-translations. The (manga_id, number) UNIQUE constraint from 0001 silently collapsed all of those into a single row via the upsert path in repo::crawler. Migration 0013 drops the constraint; sync_manga_chapters now plain-INSERTs each SourceChapterRef so every parsed chapter survives as its own row. Identity moves from the (manga_id, number) tuple to the chapter UUID: - `GET /api/v1/mangas/:manga_id/chapters/:chapter_id` (replaces :number) - `GET /api/v1/mangas/:manga_id/chapters/:chapter_id/pages` - `repo::chapter::find_by_id_in_manga` (replaces find_by_manga_and_number) - Frontend reader route renamed to `/manga/[id]/chapter/[chapter_id]` - Chapter links throughout (manga page list, continue-reading CTA, reader prev/next, history rows, bookmark cards) use chapter.id - API clients getChapter/getChapterPages take a chapter id string read_progress + bookmarks already FK chapter_id; they only enrich with chapter_number for display, which is preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 23:37:07 +02:00
MechaCat02	c51353ead3	bugfix: chapter source key uses chapter id, not /pg-1/ (0.23.1) Listing links point at the reader's page 1 (`.../uu/br_chapter-N/pg-1/`). The generic `derive_key_from_url` took the last URL segment and returned `"pg-1"` for every chapter, so all parsed chapters collapsed onto a single `chapter_sources` row downstream and the first-manga chapter was the only row that survived. New `derive_chapter_key_from_url` strips a trailing `/pg-\d+/` before picking the chapter-identifying segment (`br_chapter-N` / `to_chapter-N`). Notices, hiatus rows, and duplicate-numbered chapters are preserved as distinct parser entries. The (manga_id, number) UNIQUE collapse in the chapters table is a separate, follow-up concern handled in feat/chapter-id-routing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 23:15:36 +02:00
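The fix in c51353ead3 (strip a trailing `/pg-\d+/` before picking the chapter segment) can be sketched without a regex crate as below. The function name matches the commit message, but the signature and body are assumptions, and `example.test` in the usage note is a placeholder host.

```rust
/// True for reader page segments such as `pg-1` or `pg-23`.
fn is_page_segment(seg: &str) -> bool {
    match seg.strip_prefix("pg-") {
        Some(rest) => !rest.is_empty() && rest.bytes().all(|b| b.is_ascii_digit()),
        None => false,
    }
}

/// Returns the chapter-identifying URL segment, skipping a trailing
/// `pg-N` segment if present. Empty segments (from leading, trailing, or
/// doubled slashes) are ignored.
fn derive_chapter_key_from_url(url: &str) -> Option<&str> {
    let mut segments = url.split('/').filter(|s| !s.is_empty()).rev();
    let last = segments.next()?;
    if is_page_segment(last) {
        // Listing links point at page 1 of the reader; step back one segment.
        segments.next()
    } else {
        Some(last)
    }
}
```

For a listing link like `https://example.test/series/uu/br_chapter-52/pg-1/` this yields `br_chapter-52`, whereas taking the last segment unconditionally would yield `pg-1` for every chapter, which is the collapse the commit describes.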
MechaCat02	b1a3a4e9d3	feat: crawler manga-list & metadata sync with cover download (0.23.0) - TargetSource: first concrete impl of the Source trait, modeled on the old Puppeteer crawler's selectors (+ status normalization, tag-count stripping, chapter list) - DiscoverMode::Backfill walks pagination last->1, reverse within each page (oldest-first); Incremental walks forward - RateLimiter (tokio-time aware) plumbed through FetchContext so the pagination walk honors the same per-host budget as the outer loop - repo::crawler: ensure_source, upsert_manga_from_source (returns New/Updated/Unchanged + current cover_image_path for backfill decisions), sync_manga_chapters, mark_dropped_mangas — all transactional, with case-insensitive lookups and source-insertable genres - Cover image download via reqwest+infer; stored under mangas/{id}/cover.{ext} via the Storage trait - Single CRAWLER_PROXY env wires both Chromium (--proxy-server) and reqwest::Proxy::all (HTTP/HTTPS/SOCKS5) - Crawler binary: positional start URL or $CRAWLER_START_URL, $CRAWLER_LIMIT (cap fetches + skip drop pass on partial runs), $CRAWLER_SKIP_CHAPTERS (disable selector AND sync), $CRAWLER_RATE_MS - Silences chromiumoxide 0.7's known CDP deserialize log spam via default tracing filter + CdpError::Serde downgrade - 9 sqlx integration tests + 11 selector/rate-limit unit tests	2026-05-21 22:04:23 +02:00
MechaCat02	26eccd0abe	feat: crawler scaffold with chromium launcher (0.22.0) - crawler module (browser, source trait, jobs, diff) + binary - chromiumoxide launcher with fetcher feature (auto-downloads Chromium on first run, caches under ~/.cache/mangalord/chromium) - LaunchOptions struct with extra_args, parseable from CRAWLER_BROWSER_MODE and CRAWLER_BROWSER_ARGS - migration 0012 introduces sources, manga_sources, chapter_sources, crawler_jobs - integration tests for headed + headless launch, ipify load+parse, and extra-args propagation (all #[ignore], opt-in)	2026-05-20 22:07:56 +02:00
MechaCat02	89b8785a40	bugfix: reader-nav is fully fixed; no settle-on-scroll (0.21.3)	2026-05-17 20:57:05 +02:00
MechaCat02	64ccc0ba84	bugfix: measure bar heights with ResizeObserver instead of magic numbers (0.21.2)	2026-05-17 20:47:32 +02:00
MechaCat02	215325ad2f	bugfix: reader nav sticks under the app header instead of behind it (0.21.1) The top offset was 44px (the header's 60px minus var(--space-4)), placing the bar inside the layout header. It now sticks at var(--app-header-h). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 20:42:38 +02:00

86 Commits