Mangalord

Author	SHA1	Message	Date
MechaCat02	679abae736	feat(chapter): preserve source-site order in chapter list (0.52.0) The user-facing chapter list was ordered by (number ASC, created_at ASC), which broke the source site's order in two ways: non-numeric entries ("notice. : Officials") parsed to number=0 and clustered at the top, even though the site placed them mid-list, and variants sharing a number ("Ch.14 : PH" / "Ch.14 : Official") were torn apart by the created_at tiebreak. Capture each chapter's position in the source DOM as `source_index` (0 = first = newest on this site) on every crawler sync, including the UPDATE branch so a new chapter prepended on the source shifts every existing row down by one on the next tick. The list query reverses this with `ORDER BY source_index DESC NULLS LAST, number ASC, created_at ASC` so the oldest chapter appears first, variants stay adjacent in the order the site shows them, and non-numeric entries land where the site placed them. User-uploaded chapters and pre-migration rows keep their NULL source_index and fall through to the prior number/created_at tiebreak via NULLS LAST. The reader's client-side `[...chapters].sort((a,b) => a.number - b.number)` is dropped; prev/next now walks the server-ordered array positionally so it traverses variants and non-numeric entries in display order. Existing data populates on the next cron tick or via admin force-resync. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-03 07:25:09 +02:00
MechaCat02	c134bdbbde	feat: cover retry backfill + admin force-resync for manga & chapter (0.50.0) Adds a per-tick cover-backfill pass to the crawler daemon so mangas whose cover download failed on first attempt get retried — the metadata pass's early-stop optimisation otherwise prevents the walk from revisiting them. Adds admin-only POST /admin/mangas/:id/resync and POST /admin/chapters/:id/resync that refetch metadata + cover (or chapter content with force_refetch) from the crawler source synchronously and return the refreshed row. Surfaced in the UI as "Force resync" buttons on the manga detail and reader pages, admin-only via session.user.is_admin. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-01 22:00:09 +02:00
MechaCat02	eaa5afda50	fix(crawler): skip sync when empty chapters + prior > 0 (0.36.6) The wait_for_selector wait in 0.36.2 narrows the partial-render race window but doesn't close it: a render that takes longer than SELECTOR_TIMEOUT (10s) still hands an empty Vec to sync_manga_chapters, and the soft-drop branch flips every existing chapter to dropped_at. The next tick recovers but a manga's reader briefly stops working in between. Close it at the pipeline level. Between fetch_manga and the upsert/sync, if the parsed chapter list is empty and the prior live count for (source_id, source_manga_key) is > 0, treat the fetch as a transient failure: log, bump mangas_failed, skip upsert + sync + the seen.insert so a later batch / tick retries. Brand-new mangas with genuinely zero chapters (prior == 0) pass through unchanged. New repo helper repo::crawler::live_chapter_count_for_source_manga joins chapters → chapter_sources → manga_sources with dropped_at IS NULL, the same lockstep as dispatch_target and the enqueue queries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 20:17:42 +02:00
MechaCat02	9f56f283d4	feat(crawler): single-mode walker gated by recovery flag (0.36.0) Collapses the crawler to a single newest-first walker and replaces the N-consecutive-unchanged streak with a per-manga rule: stop on the first manga where metadata is Unchanged AND chapter sync reports zero new chapters. The early stop is gated by a per-source recovery flag stored in `crawler_state` — set to `false` when a run starts, back to `true` only on a clean exit (end-of-walk or intentional stop). A crashed run leaves the flag `false` automatically (no shutdown code runs), so the next tick walks the full catalog instead of bailing at the first caught-up manga. This means a crashed mid-walk run self-heals on the next tick: the flag stays `false`, the next walk visits every page (recovering anything the crash missed past its crash point), and steady state resumes once the recovery sweep reaches end-of-walk. Removed: - DiscoverMode enum, Backfill mode, the boundary re-check + displaced-refs machinery in TargetSourceWalker. - Drop-pass (mark_dropped_mangas) and seed-completion plumbing (mark_seed_completed / seed_completed_at). The recovery flag subsumes the seed-completion signal; drop detection was explicitly opted out. - JobPayload::Discover (no production callers). - CRAWLER_MODE / CRAWLER_INCREMENTAL_STOP_AFTER env vars and the CrawlerModePref config type. `should_mark_clean_exit(walked_to_completion, hit_stop_condition)` encodes the clean-exit truth table in its signature — `hit_limit` is deliberately absent so a future edit cannot accidentally count a caller-imposed cap as a clean exit. Net -501 lines, 261 backend tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 23:49:28 +02:00
MechaCat02	33f7e19077	fix(crawler): serialize sync_manga_chapters per-manga (0.35.6) Two concurrent calls of sync_manga_chapters for the same manga both read seen_keys, both run the drop UPDATE filtered on `NOT (key = ANY $3)`, and the later commit can soft-drop a chapter the earlier had just inserted (lost-update under MVCC). Today the cron tick is the only caller and the daemon-level advisory lock keeps it single-flight, but that lock is held on one pool connection and doesn't actually serialize the function: any future caller (bookmark hook, admin-triggered re-sync, parallel worker) would race against the cron. Add `pg_advisory_xact_lock(hashtextextended(manga_id::text, 0))` at the start of the transaction. Auto-releases on commit/rollback so a panic mid-call can't strand the lock. Lock keyed per-manga so calls for different mangas still parallelize. Test sync_chapters_serializes_concurrent_calls_for_same_manga spawns two tokio tasks calling the function concurrently with overlapping chapter lists and asserts every chapter survives. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 20:45:01 +02:00
MechaCat02	c6bb9160e3	fix(crawler): scope chapter_sources lookup per-manga (0.35.5) chapter_sources's PRIMARY KEY was (source_id, source_chapter_key) and the lookup in sync_manga_chapters didn't constrain by manga_id, so a source whose chapter slugs aren't globally unique (e.g. "chapter-1" appearing under multiple mangas) silently attributed every collision to the first manga that synced it. The INSERT path would have conflicted on the second manga's sync. Migration 0017 drops the old PK and rekeys on (source_id, chapter_id) — the natural identity of a per-source chapter attachment — and adds an index on (source_id, source_chapter_key) for the lookup path. The repo lookup now joins chapters and filters by manga_id; the UPDATE path keys on chapter_id directly (the row's natural identifier post-migration). Test sync_chapters_isolates_colliding_keys_across_mangas pins the contract end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 20:43:08 +02:00
MechaCat02	8d34132883	bugfix: security & correctness bundle (0.34.1) Five fixes bundled into one release: - preserve user-attached tags across crawler upserts (repo::crawler::sync_tags now scopes to added_by IS NULL; orphaned attachments from deleted users are reaped as crawler-owned) - gate manga PATCH and cover endpoints on uploaded_by (require_can_edit in api::mangas; non-NULL uploaded_by must match the caller) - equalise login response time across user-existence branches (run argon2 against a OnceLock-cached dummy hash on the no-user branch so timing doesn't leak username existence) - crawler download defences (SSRF allowlist of host literals including IPv4-mapped IPv6 ranges, 32 MiB streamed size cap, reject non-whitelisted image types, three-way chapter-probe classifier replaces the binary #avatar_menu check) - tighten validation and clean up dead unload path (attach_tag + create_token enforce 64-char caps; LocalStorage rejects NUL bytes explicitly; reader flushFinalProgress drops the always-405 sendBeacon path) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 20:24:51 +02:00
MechaCat02	51346227dd	feat: route reader by chapter id, allow duplicate-numbered chapters (0.24.0) Real-world sources publish multiple chapters at the same number: different scanlators ("Ch.52 from bloomingdale" + "Ch.52 from mina"), translator notices and farewells, alt-translations. The (manga_id, number) UNIQUE constraint from 0001 silently collapsed all of those into a single row via the upsert path in repo::crawler. Migration 0013 drops the constraint; sync_manga_chapters now plain-INSERTs each SourceChapterRef so every parsed chapter survives as its own row. Identity moves from the (manga_id, number) tuple to the chapter UUID: - `GET /api/v1/mangas/:manga_id/chapters/:chapter_id` (replaces :number) - `GET /api/v1/mangas/:manga_id/chapters/:chapter_id/pages` - `repo::chapter::find_by_id_in_manga` (replaces find_by_manga_and_number) - Frontend reader route renamed to `/manga/[id]/chapter/[chapter_id]` - Chapter links throughout (manga page list, continue-reading CTA, reader prev/next, history rows, bookmark cards) use chapter.id - API clients getChapter/getChapterPages take a chapter id string read_progress + bookmarks already FK chapter_id; they only enrich with chapter_number for display, which is preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 23:37:07 +02:00
MechaCat02	b1a3a4e9d3	feat: crawler manga-list & metadata sync with cover download (0.23.0) - TargetSource: first concrete impl of the Source trait, modeled on the old Puppeteer crawler's selectors (+ status normalization, tag-count stripping, chapter list) - DiscoverMode::Backfill walks pagination last->1, reverse within each page (oldest-first); Incremental walks forward - RateLimiter (tokio-time aware) plumbed through FetchContext so the pagination walk honors the same per-host budget as the outer loop - repo::crawler: ensure_source, upsert_manga_from_source (returns New/Updated/Unchanged + current cover_image_path for backfill decisions), sync_manga_chapters, mark_dropped_mangas — all transactional, with case-insensitive lookups and source-insertable genres - Cover image download via reqwest+infer; stored under mangas/{id}/cover.{ext} via the Storage trait - Single CRAWLER_PROXY env wires both Chromium (--proxy-server) and reqwest::Proxy::all (HTTP/HTTPS/SOCKS5) - Crawler binary: positional start URL or $CRAWLER_START_URL, $CRAWLER_LIMIT (cap fetches + skip drop pass on partial runs), $CRAWLER_SKIP_CHAPTERS (disable selector AND sync), $CRAWLER_RATE_MS - Silences chromiumoxide 0.7's known CDP deserialize log spam via default tracing filter + CdpError::Serde downgrade - 9 sqlx integration tests + 11 selector/rate-limit unit tests	2026-05-21 22:04:23 +02:00
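
The chapter ordering described in commit 679abae736 can be illustrated outside the database. The following is a minimal sketch, not the repository's code: it mirrors `ORDER BY source_index DESC NULLS LAST, number ASC, created_at ASC` as an in-memory comparator and shows positional prev/next over the ordered list. The `ChapterRow` struct, the `row`, `display_order` and `neighbours` helpers, the `f64` type for `number`, and the integer stand-in for `created_at` are all assumptions made for illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical, trimmed-down chapter row; field names follow the commit text.
#[derive(Debug, Clone, PartialEq)]
pub struct ChapterRow {
    pub title: String,
    pub number: f64,               // assumed numeric type
    pub created_at: i64,           // stand-in for a timestamp
    pub source_index: Option<i32>, // None for user uploads and pre-migration rows
}

/// Convenience constructor for examples (hypothetical helper).
pub fn row(title: &str, number: f64, created_at: i64, source_index: Option<i32>) -> ChapterRow {
    ChapterRow { title: title.to_string(), number, created_at, source_index }
}

/// In-memory equivalent of
/// `ORDER BY source_index DESC NULLS LAST, number ASC, created_at ASC`.
pub fn display_order(a: &ChapterRow, b: &ChapterRow) -> Ordering {
    let by_source = match (a.source_index, b.source_index) {
        (Some(x), Some(y)) => y.cmp(&x),      // DESC: larger index (older) first
        (Some(_), None) => Ordering::Less,    // NULLS LAST
        (None, Some(_)) => Ordering::Greater,
        (None, None) => Ordering::Equal,
    };
    by_source
        .then_with(|| a.number.total_cmp(&b.number))
        .then_with(|| a.created_at.cmp(&b.created_at))
}

/// Positional prev/next over an already-ordered list, with no re-sort by number.
pub fn neighbours(
    ordered: &[ChapterRow],
    current: usize,
) -> (Option<&ChapterRow>, Option<&ChapterRow>) {
    let prev = current.checked_sub(1).and_then(|i| ordered.get(i));
    let next = ordered.get(current + 1);
    (prev, next)
}
```

Sorting a list with `rows.sort_by(display_order)` keeps same-numbered variants adjacent in the order the site lists them and leaves rows without a `source_index` at the end, where the number/created_at tiebreak decides their relative order.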

9 Commits
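
Several of the crawler commits above (9f56f283d4, eaa5afda50) describe small decision rules that are easy to state as pure functions. The sketch below is an illustration under assumptions, not the repository's implementation: only the name and parameters of `should_mark_clean_exit` come from the commit text, while its body, the `UpsertOutcome` and `FetchVerdict` enums, `should_stop_walk`, `classify_chapter_fetch`, and all other parameter names are hypothetical.

```rust
/// Outcome of a metadata upsert; variant names follow the 0.23.0 commit text.
#[derive(Debug, PartialEq, Eq)]
pub enum UpsertOutcome {
    New,
    Updated,
    Unchanged,
}

/// Hypothetical verdict for a fetched chapter list.
#[derive(Debug, PartialEq, Eq)]
pub enum FetchVerdict {
    Proceed,
    TransientFailure,
}

/// Clean exit: the walk reached the end, or it stopped on purpose at a
/// caught-up manga. There is no `hit_limit` parameter, so a caller-imposed
/// cap cannot be counted as a clean exit. (Body is an assumption.)
pub fn should_mark_clean_exit(walked_to_completion: bool, hit_stop_condition: bool) -> bool {
    walked_to_completion || hit_stop_condition
}

/// Per-manga early stop: allowed only if the previous run exited cleanly
/// (recovery flag was `true`), metadata is unchanged, and no chapters are new.
pub fn should_stop_walk(
    previous_run_clean: bool,
    outcome: &UpsertOutcome,
    new_chapters: usize,
) -> bool {
    previous_run_clean && matches!(outcome, UpsertOutcome::Unchanged) && new_chapters == 0
}

/// Empty-list guard: an empty parse for a manga that already has live
/// chapters is treated as a transient failure, so upsert and sync are skipped.
pub fn classify_chapter_fetch(parsed_chapters: usize, prior_live_count: i64) -> FetchVerdict {
    if parsed_chapters == 0 && prior_live_count > 0 {
        FetchVerdict::TransientFailure
    } else {
        FetchVerdict::Proceed
    }
}
```

Keeping these rules free of I/O makes the truth tables directly testable; in this sketch a brand-new manga with zero chapters (`prior_live_count == 0`) still yields `Proceed`, matching the behaviour the 0.36.6 message describes.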