Mangalord

Author	SHA1	Message	Date
MechaCat02	e50fc093c3	feat: add PRIVATE_MODE site-wide auth gate (0.48.0) When `PRIVATE_MODE=true`, every API path except a small allowlist (`/health`, `/auth/{config,login,logout,register}`) requires a valid session cookie or bearer token — anonymous reads are rejected with 401. Self-registration is force-disabled in private mode regardless of `ALLOW_SELF_REGISTER`, so a locked-down instance flips with a single switch (admins still mint accounts via `POST /admin/users`). The backend gate is a tower middleware that reuses the existing `CurrentUser` extractor, so the cookie + bearer paths cannot drift from per-handler auth. `/auth/config` now exposes the flag plus the effective `self_register_enabled` value so the frontend can render the navbar correctly on the first paint. On the frontend, a new universal root `+layout.ts` fetches the config and redirects anonymous visitors to `/login?next=<path>` before page-specific loads fire. The redirect is UX only — the backend middleware is the source of truth, so crafted requests still 401. Defaults stay public (`PRIVATE_MODE=false`); existing deployments need no env change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-01 20:11:22 +02:00
MechaCat02	72756cfef2	feat(crawler): honour CRAWLER_LIMIT in the in-process daemon (0.47.0) The CLI binary already capped runs at CRAWLER_LIMIT mangas, but the daemon's RealMetadataPass passed a hardcoded `0` (no cap) to `pipeline::run_metadata_pass`, so the env var was silently ignored once the daemon took over the metadata pass. Adds `manga_limit` to `CrawlerConfig`, reads it from `CRAWLER_LIMIT` (default 0 = no cap), and threads it through `RealMetadataPass::run` so a daemon-driven sweep stops at the same boundary as a CLI run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-01 20:07:01 +02:00
MechaCat02	4e20350645	fix(crawler): translate socks5h:// → socks5:// for Chromium --proxy-server Chromium doesn't know the socks5h scheme (curl/reqwest convention) and bails navigations with ERR_NO_SUPPORTED_PROXIES. It does, however, send destination hostnames over SOCKS5 by default, so stripping the `h` is a pure scheme rename; remote-DNS behaviour is preserved. reqwest keeps the user's original CRAWLER_PROXY string (`socks5h://...` remains valid and meaningful for it). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 20:56:45 +02:00
MechaCat02	713ca139c4	feat(deploy): add optional tor service to dev compose for native-backend dev Mirrors the prod tor service but with 127.0.0.1-only host port bindings so a `cargo run` on the host can reach 127.0.0.1:9050 / 9051. Default password baked in (overridable via TOR_CONTROL_PASSWORD env) since host-loopback is the only exposure surface — same friction-free posture as the postgres entry in this file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 20:56:45 +02:00
MechaCat02	e3cff9d874	fix(deploy): pivot tor service to password auth + wrapper entrypoint Dockurr/tor's stock entrypoint binds the control port to localhost (unreachable from a sibling container), refuses to run as a non-default user (its setup chowns dirs and su-execs down to its `tor` user, both requiring root), and skips its own HashedControlPassword injection whenever the user's torrc declares a ControlPort. The combination meant the original cookie-via-shared-volume design couldn't work without fighting the image. This commit: - Adds tor/entrypoint.sh, a small wrapper that hashes $PASSWORD with `tor --hash-password`, appends the hash to a writable copy of /etc/tor/torrc, then execs tor. Container runs as root only for that bring-up; the torrc's `User tor` directive drops privs after port binding. - Adds a healthcheck on the tor service that gates downstream containers on both 9050 + 9051 actually listening (was service_started, which fires before tor finishes bootstrap). - Loosens MaxCircuitDirtiness 60 → 600. The 60s value would have rotated mid-chapter for any chapter with > ~50 images, which is exactly the kind of fingerprint we're trying to avoid. - Wires TOR_CONTROL_PASSWORD as a REQUIRED .env var on both sides (PASSWORD on tor, CRAWLER_TOR_CONTROL_PASSWORD on backend). docker-compose.yml fails fast if unset. - Removes the tor-data shared volume on backend (cookie auth is no longer the default; operators wanting cookie can mount it back). - Documents the pivot + the cookie-vs-password tradeoff in .env.example. End-to-end validated: `docker compose up -d tor`, then `printf 'AUTHENTICATE "test"\r\nSIGNAL NEWNYM\r\nQUIT\r\n' | nc tor 9051` returns three `250 OK` lines. Audit ref: #2, #3, #6. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 20:25:54 +02:00
MechaCat02	d47e832613	fix(crawler): redact TorAuth::Password in Debug, drop NEWNYM info→debug The startup log line in app.rs and bin/crawler.rs `?t`-debug-formats the TorController, which through the derived Debug on TorAuth would expand TorAuth::Password(p) and leak the plaintext password to logs. Implement Debug manually on TorAuth — None / Password(<redacted>) / Cookie(<path>) — and lock the redaction with a regression test. Drop the per-NEWNYM success log from info to debug: a busy crawl rotates circuits many times per minute. Failed NEWNYMs already log at warn — those stay loud. Tightens the closed_connection_mid_reply_is_an_error assertion which was tautological (`closed connection` OR `AUTHENTICATE`) by driving the mock to read the AUTH line then drop, exercising only the EOF-mid-reply path. Audit ref: #7, #9, nit on tautological test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 20:25:36 +02:00
MechaCat02	c30c7a546f	fix(crawler): unify recircuit budget semantics — N = total attempts The three retry-with-recircuit sites disagreed: detect.rs's retry_on_transient_with_hook used "N = total attempts" (3 → 3 fetches), but session.rs's unauth branch and content.rs's chapter loop used "N = recircuits" (3 → 4 fetches). At the same wall-clock "max=3", different sites hit the upstream a different number of times. Unify on N = total attempts (matching the existing retry_on_transient convention). The CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS env var now means exactly what its name suggests. Disabling the recircuit feature collapses to max_attempts=1 (single attempt, no retry) — bit-for-bit pre-TOR behavior preserved. Adds a debug_assert!(max >= 1) on both helpers and a new content.rs test exercising the mixed Transient → Unauth → Ok sequence to lock in the shared-counter invariant. Audit ref: #5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 20:25:25 +02:00
MechaCat02	a0db7beb81	chore: bump to 0.46.0 for TOR proxy + recircuit feature CRAWLER_TOR_CONTROL_URL, _PASSWORD, _COOKIE_PATH, _RECIRCUIT_MAX_ATTEMPTS are new feature env vars; treat per CLAUDE.md as a minor bump (feat:). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 20:01:57 +02:00
MechaCat02	ecbbebafc4	feat(deploy): dockurr/tor service + torrc; wire crawler to use it by default Adds a `tor` service to the compose stack (dockurr/tor) with a torrc tuned for the crawler — SOCKS5 on 9050 with IsolateDestAddr + IsolateDestPort so NEWNYM picks up promptly, control port on 9051 with cookie auth, MaxCircuitDirtiness 60. Backend defaults CRAWLER_PROXY → socks5h://tor:9050 and CRAWLER_TOR_CONTROL_URL → tcp://tor:9051 so TOR + recircuit are on out-of-the-box. Operators can override both to empty in .env to opt out without removing the service. The tor-data named volume is mounted ro on the backend so it can read /var/lib/tor/control_auth_cookie; CookieAuthFileGroupReadable handles the permissions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 20:01:04 +02:00
MechaCat02	8c6378b877	feat(crawler): recircuit TOR on transient pages and unauthenticated probes - target.rs swaps retry_on_transient → retry_on_transient_with_hook, signaling NEWNYM via ctx.tor between attempts when configured. - session.rs gains verify_session_with_recircuit; the bare verify_session is now a one-line wrapper passing tor=None, unauth_max_recircuit=0. The inner run_session_probe_loop is pure-over-IO and unit-tested with closure-based fakes. - content.rs extracts fetch_chapter_html_once + the closure-driven fetch_chapter_html_with_recircuit, used by sync_chapter_content to retry on Transient or Unauthenticated up to a recircuit_budget. Budget = 0 (no TOR) preserves original behavior bit-for-bit. - app.rs and bin/crawler.rs construct the controller before on_launch and pass it into verify_session_with_recircuit, so a transient hiccup at startup no longer requires PHPSESSID rotation. Recircuit budget defaults to CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS (3). Errors from NEWNYM are logged and swallowed — failing to recircuit should not take down the crawl. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 20:01:04 +02:00
MechaCat02	8557e432a2	feat(crawler): plumb TorController through FetchContext and pipelines Adds CRAWLER_TOR_CONTROL_URL / _PASSWORD / _COOKIE_PATH / _RECIRCUIT_MAX_ATTEMPTS to CrawlerConfig and to bin/crawler.rs's env reads. Constructs an Option<Arc<TorController>> at daemon / CLI startup and threads it through FetchContext, pipeline::run_metadata_pass, and content::sync_chapter_content as Option<&TorController>. Pure scaffolding — the controller isn't used yet; behavior is unchanged. Next commit wires the retry hooks and session-probe recircuit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 19:59:47 +02:00
MechaCat02	d6d84dedcb	feat(crawler): retry_on_transient_with_hook for between-retry side effects Adds a sibling fn that fires a caller-supplied async hook between a transient failure and the next attempt. The existing retry_on_transient becomes a thin wrapper over it (no-op hook), so no call sites churn yet. Hook contract: fires only between attempts (N-1 times for N attempts), never after a non-transient error or after the final attempt. Designed for TOR NEWNYM, but the signature is generic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 19:59:47 +02:00
MechaCat02	d37b94871e	feat(crawler): TorController for control-port NEWNYM signaling Minimal client over tokio::net::TcpStream — AUTHENTICATE then SIGNAL NEWNYM, one-shot connection. Supports cookie-file and password auth (cookie preferred when both provided); covers the multi-line `250-...\r\n250 OK` reply form so future torrc tweaks won't confuse the parser. Not yet wired into the crawler — that lands in the next commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 19:59:47 +02:00
fabi	8e39fadd21	ci: build via host docker socket (plain build); fix missing daemon socket (#5)	2026-05-31 17:40:14 +00:00
fabi	3b3d13a0f6	fix(crawler): walk list pages incrementally; stop on empty page (0.45.1) (#4)	2026-05-31 16:37:14 +00:00
fabi	0f90af80cb	ci(test-backend): ubuntu-latest + rustup (fix node-not-found) (#3)	2026-05-31 16:18:21 +00:00
fabi	6b49a47d0a	feat(crawler): system Chromium via CRAWLER_CHROMIUM_BINARY (0.45.0) (#2)	2026-05-31 15:47:47 +00:00
fabi	e851355f28	Merge pull request 'ci: no-SSH local deploy + Dockerfile build fixes' (#1) from fix/ci-deploy-pipeline into main	2026-05-31 15:43:54 +00:00
fabi	2a0cc24c07	ci: deploy to the local stack over the runner socket, not SSH The runner lives on the deploy host and shares its docker daemon, so the deploy job runs `docker compose pull && up -d` against the central compose via a bind-mounted compose dir (docker:cli + docker_host: "-") instead of appleboy/ssh-action. Drops the SSH_* secrets and recreates only the two mangalord services at the freshly built SHA. Requires /mnt/ssd/docker-data in the runner's container.valid_volumes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 17:26:58 +02:00
fabi	a615b0aee7	fix(docker): unblock image builds on this host - backend dep-cache stage stubs only main.rs/lib.rs, but Cargo.toml declares a second [[bin]] crawler at src/bin/crawler.rs, so `cargo build --locked` aborts ("can't find bin crawler"). Stub it too. - runtime was debian:bookworm-slim (glibc 2.36) while rust:1-slim now tracks trixie (glibc 2.41) -> "GLIBC_2.39 not found" at boot. Pin the runtime to debian:trixie-slim so it matches the builder's glibc. - frontend healthcheck probed localhost (-> musl picks IPv6 ::1) but the Node server binds IPv4 0.0.0.0 only -> false "unhealthy". Probe 127.0.0.1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 17:26:58 +02:00
MechaCat02	a2826d6467	feat(crawler): CRAWLER_ALLOW_ANY_HOST bypasses the host allowlist (0.44.0) Operators whose sources shard images across numbered CDN subdomains can't pre-enumerate every host in CRAWLER_DOWNLOAD_ALLOWLIST. The new flag short-circuits the host check in DownloadAllowlist::contains while leaving scheme, localhost, and private-IP defenses in is_safe_url untouched; scraped URLs pointing at 10.x / 169.254.169.254 / file:// stay refused. Default is false; fail-closed posture is preserved unless the operator opts in. Wired into both the server (config::build_download_allowlist) and the bin/crawler.rs one-shot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 14:52:49 +02:00
MechaCat02	1eebb90e25	fix(crawler): unhang shutdown on lingering Arc<Browser>, silence WS noise (0.43.1) - Handle::close aborts its chromiumoxide driver task when another Arc<Browser> outlives the call, so shutdown returns instead of hanging on a stream that never terminates. Generic close_or_abort helper with regression tests covering both Arc paths. - daemon.shutdown() is wrapped in a 5s timeout in main as defense in depth. - Default RUST_LOG silences chromiumoxide::conn / chromiumoxide::handler WS-deserialize ERROR spam. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 14:47:36 +02:00
MechaCat02	030b27754b	feat(api): admin-initiated user creation via POST /admin/users (0.43.0) Pairs with the ALLOW_SELF_REGISTER toggle from 0.42.0: admins can mint accounts regardless of the toggle state, so a closed-membership deployment still has a working enrollment path. The endpoint accepts { username, password, is_admin? } so admins can mint co-admins in one call (avoiding a separate promote + extra audit row for the common "invite a co-admin" flow). Implementation: - POST /api/v1/admin/users guarded by RequireAdmin - Reuses validate_username / validate_password from api::auth (made pub(crate)) so the admin path can never produce an account self-register would reject and vice versa - repo::user::admin_create_user wraps INSERT + admin_audit insert in a single tx, with the same "audit reflects what committed" semantics as the existing admin_safe_* fns - Audit row: action="create_user", payload={username, is_admin} Frontend: - createAdminUser() in lib/api/admin.ts - /admin/users grows a collapsible "Create user" form above the table (username, password, "Make admin" checkbox). Errors surface inline; the list reloads on success. Backend tests: 7 new, including the headline `create_user_works_even_when_self_register_disabled`, which pins that the admin-create path is NOT gated by the public toggle.	2026-05-31 14:00:31 +02:00
MechaCat02	2f47faa11c	feat(auth): ALLOW_SELF_REGISTER toggle + public /auth/config endpoint (0.42.0) Lets operators run a closed-membership deployment by setting ALLOW_SELF_REGISTER=false (default true, so existing deploys are unaffected). When off, POST /auth/register returns 403 forbidden. The rate-limit token is consumed BEFORE the disabled check so the timing doesn't distinguish enabled-but-rejected from disabled — closes the toggle-state probe channel. New public GET /auth/config returns { self_register_enabled: bool } so the frontend can render its register affordances correctly without conflating "disabled" with "rate-limited" (which a probe attempt would). Frontend: a lightweight reactive `authConfig` store loads the flag once on root-layout mount (and again on /register direct navigation, which bypasses the layout's onMount). Header hides the Register link when the toggle is off; /register renders a "self-registration is disabled — ask an administrator" notice instead of the form. Admin-create endpoint that pairs with this toggle is intentionally not in this PR — it lands as the next branch (feat/admin-user-create). The toggle alone is independently useful for deployments that want to lock down enrollment without yet wiring an admin UI.	2026-05-31 13:56:18 +02:00
MechaCat02	6dd21451a8	chore: sync Cargo.lock for 0.41.2	2026-05-30 22:26:24 +02:00
MechaCat02	f6728dc71a	fix(admin): security-audit findings — paginate chapters, lock down unchecked helper (0.41.2) Addresses the security-audit findings on top of the admin feature stack: M1: /admin/mangas/:id/chapters now paginates (default limit 200, max 500). A long-runner with thousands of chapters would otherwise produce a multi-MB response with that many scalar subqueries per row — admin-only but a real stall risk on one expand-click. Adds explicit pagination tests for the cap and offset; frontend renders a "Showing first N of M" hint when the cap clips the result. L1: repo::user::set_is_admin renamed to set_is_admin_unchecked with a doc-comment pointing at admin_safe_set_is_admin for production use. The short name was a footgun — a future contributor reaching for it would silently bypass self-protection, the last-admin invariant, and the audit log. Used only by integration-test setup; production code goes through the admin_safe_* paths. CSRF posture: build_session_cookie carries a comment that the SameSite=Lax default is the project's CSRF defense for state-changing mutations and breaks the instant anyone adds a side-effecting GET under /admin/*. Spells out what to do then (Strict + explicit token check). Test counts: 43 backend admin tests + 12 vitest admin tests all green; svelte-check 0/0 across 446 files.	2026-05-30 22:23:55 +02:00
MechaCat02	aa2159ca06	fix(admin): three review findings — audit no-op, 404, chapter priority (0.41.1) - admin_safe_set_is_admin: short-circuit when target.is_admin == value, before writing audit. PATCH {is_admin: true} on someone already admin previously wrote a misleading "promote_user" row even though the UPDATE was a no-op. - list_chapters (/admin/mangas/:id/chapters): explicit exists() check on manga_id, returns 404 instead of 200 [] for a typo'd / deleted manga. - ChapterSyncState priority: the Failed branch now requires page_count = 0, so a chapter with pages on disk AND a historical dead job (from a re-download attempt that crashed) stays Synced. The old order contradicted Synced's documented "downloaded at some point" contract. Doc comments updated alongside the SQL. Three new regression tests pin the behaviour.	2026-05-30 21:58:15 +02:00
MechaCat02	b434c9b68d	feat(frontend): /admin dashboard with users/mangas/system views (0.41.0) Adds the SvelteKit /admin route tree backed by the admin endpoints landed in PR 1-4. Pages: Overview (alerts + summary cards), Users (list / promote-demote / delete), Mangas (list with sync state + expandable per-chapter state), System (live disk/mem/cpu bars, refreshing every 5s). Security model: the backend's RequireAdmin extractor is the actual boundary. /admin/+layout.ts calls getSystemStats() at load and translates the response — 401 → redirect to /login, 403 → throw SvelteKit error(403) which renders the framework error page. The header's "Admin" link is hidden unless `session.user?.is_admin`, but that's UX only. Carries `is_admin: boolean` through to the frontend User TS type so the header check works and so admin tables can show role per row. Vitest covers lib/api/admin.ts (10 tests: list/delete/PATCH for users, sync-state filter for mangas, nested chapter route, system disk-nullable case). Playwright is intentionally deferred until the routes stabilise — admin UI is operator-only and changes shape often in v0.	2026-05-30 21:49:39 +02:00
MechaCat02	cc4ec76d17	feat(api): admin system metrics endpoint with disk/mem/cpu alerts (0.40.0) Adds GET /api/v1/admin/system returning disk (scoped to storage_dir via statvfs), memory, CPU, and a server-side alerts array that fires at >90% disk or memory. Disk uses nix::sys::statvfs directly rather than sysinfo's Disks API to avoid mountpoint-matching gymnastics for the storage_dir. A new `Storage::local_root() -> Option<&Path>` trait method exposes the root; the default returns None so a future S3Storage gets `disk: null` in the response instead of fabricated numbers. CPU is sampled inline (refresh → 250ms sleep → refresh → read) so the endpoint adds 250ms of latency per call. No background-cache yet — admin traffic is low-volume and the moving parts aren't worth it until polling shows up. Alerts are evaluated server-side so the frontend can render them without re-implementing the thresholds.	2026-05-30 21:45:06 +02:00
MechaCat02	bf7c9b5c2a	feat(api): admin manga/chapter overview with derived sync state (0.39.0) Adds GET /api/v1/admin/mangas and /admin/mangas/:id/chapters guarded by RequireAdmin. Sync state is computed at query time from the existing crawler signals (manga_sources / chapter_sources / crawler_jobs) — no new state column is persisted, so the crawler stays the single writer of these signals. Per-manga priority: InProgress (in-flight sync_manga or sync_chapter_list job) > Dropped (all source rows soft-dropped) > Synced (default; covers user-uploaded mangas with zero source rows). Per-chapter priority: Downloading (in-flight sync_chapter_content) > Dropped (all source rows soft-dropped) > Failed (most-recent terminal job is dead) > NotDownloaded (page_count = 0) > Synced. The Failed check sits ABOVE NotDownloaded so the more informative "we tried and it died" state wins over "we never got around to it" — see the priority comment in repo/admin_view.rs. Migration 0020 adds a partial index on crawler_jobs((payload->>'source_manga_key')) for the one job kind (sync_manga) whose payload doesn't carry manga_id directly — without it the in-flight detection for a manga falls back to a seqscan over the job table.	2026-05-30 21:41:09 +02:00
MechaCat02	0b2018ceca	feat(api): admin user management endpoints with audit log (0.38.0) Adds /api/v1/admin/users list / DELETE / PATCH guarded by RequireAdmin, plus the audit-log substrate every future destructive admin endpoint will reuse. Safety properties: - Cannot self-delete or self-demote (409 conflict, message calls out "yourself" so the UI can render an explanation). - Cannot remove the last admin via either DELETE or demote. The check takes pg_advisory_xact_lock(ADMIN_INVARIANT_LOCK_KEY) and re-counts admins inside the same tx, closing the parallel-demote race that a bare "if count > 1" check would let through. The HTTP-serial path to this guard is structurally unreachable (the actor would have to be the lone admin demoting themselves, which the self-guard fires on first); the parallel race test exercises it via repo calls. Audit log (admin_audit table) records the action inside the same tx as the action itself, so a rolled-back action never leaves an orphan audit row. actor_user_id is ON DELETE SET NULL so the log outlives a later-deleted admin. target_id is not a FK because future audit kinds will target non-user rows.	2026-05-30 21:35:35 +02:00
MechaCat02	ab8b7acc34	feat(auth): admin role with cookie-only RequireAdmin extractor (0.37.0) Adds an `is_admin` flag on users plus the substrate every later PR in the admin feature builds on: - migration 0018 adds the column with default false - `repo::user::bootstrap_admin` creates or promotes the user named by `ADMIN_USERNAME` at startup, hashing `ADMIN_PASSWORD` only when the row is new — never overwriting an existing hash, so an operator can rotate the admin password via the UI without env-var conflict - `CurrentSessionUser` extractor accepts only the session cookie; `RequireAdmin` composes over it and additionally requires `user.is_admin`. Bearer tokens are intentionally excluded so an admin's bot token never inherits admin authority (privilege-escalation surface that bites every "API keys reuse user perms" auth design) - demotion is instant: `RequireAdmin` re-reads the user row each request `/api/v1/auth/me` now exposes `is_admin`; no other response embeds `User`, so no privacy fanout to audit.	2026-05-30 21:26:26 +02:00
MechaCat02	9925f54695	fix(crawler): narrow browser-dead heuristic to typed downcasts (0.36.7) anyhow_looks_browser_dead substring-matched any chain message containing channel / connection / websocket / transport / closed / nav timeout. Real chromium failures hit those words, but so do reqwest TCP-reset errors during CDN image downloads, sqlx pool-timeout errors, and any number of non-browser failures, each of which triggered a wasted chromium relaunch + session-probe re-run against the catalog's rate-limit budget. Drop the substring pass. Walk the chain looking only for typed NavError (flagged via is_likely_browser_dead) or CdpError. Every place we feed a chromium error into anyhow goes through one of those types, so the typed downcasts cover the real cases without the false-positive surface. NavError::is_likely_browser_dead also drops its own substring check on Cdp(e); any CdpError surfacing at the navigation layer means the chromium-facing channel is the failing layer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 20:41:59 +02:00
MechaCat02	eaa5afda50	fix(crawler): skip sync when empty chapters + prior > 0 (0.36.6) The wait_for_selector wait in 0.36.2 narrows the partial-render race window but doesn't close it: a render that takes longer than SELECTOR_TIMEOUT (10s) still hands an empty Vec to sync_manga_chapters, and the soft-drop branch flips every existing chapter to dropped_at. The next tick recovers but a manga's reader briefly stops working in between. Close it at the pipeline level. Between fetch_manga and the upsert/sync, if the parsed chapter list is empty and the prior live count for (source_id, source_manga_key) is > 0, treat the fetch as a transient failure: log, bump mangas_failed, skip upsert + sync + the seen.insert so a later batch / tick retries. Brand-new mangas with genuinely zero chapters (prior == 0) pass through unchanged. New repo helper repo::crawler::live_chapter_count_for_source_manga joins chapters → chapter_sources → manga_sources with dropped_at IS NULL, the same lockstep as dispatch_target and the enqueue queries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 20:17:42 +02:00
MechaCat02	5c04b0532b	fix(crawler): panic-isolate the cron tick body (0.36.5) Worker dispatch was already wrapped in AssertUnwindSafe(...) .catch_unwind() — a panicking handler ack's the job failed and the worker keeps going. The cron tick had no such guard: a panic in metadata.run, enqueue_bookmarked_pending, reap_done, or write_last_tick would kill the cron task. The JoinSet would drop it, workers would keep running, and no future metadata pass would ever fire until daemon restart. Wrap the tick body (between advisory-lock acquire and unlock) in the same AssertUnwindSafe(...).catch_unwind() pattern. The unlock and connection drop run unconditionally so a panicked tick doesn't leave the lock held for another replica. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 20:08:11 +02:00
MechaCat02	655ea42731	fix(crawler): scope dispatch_target to live sources, newest first (0.36.4) The chapter dispatcher's URL resolver had no dropped_at filter and no ORDER BY: a chapter whose only chapter_sources row had been soft-dropped was still dispatched against the stale URL, eating retry budget on guaranteed transients. With multiple live sources the LIMIT 1 winner was nondeterministic. Add `AND cs.dropped_at IS NULL` and `ORDER BY cs.last_seen_at DESC` to dispatch_target, bringing it in lockstep with the enqueue queries in pipeline.rs that already filter on dropped_at. Returns None when all sources are dropped; callers in daemon.rs already treat None as "ack the job, skip the work." Tests in tests/repo_chapter.rs cover the three branches (freshest live wins, dropped sources skipped, all-dropped returns None). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 20:03:45 +02:00
MechaCat02	70e8a7895c	fix(crawler): relaunch chromium on CDP / nav-timeout errors (0.36.3) BrowserManager only re-launched chromium when the cached handle was None. A crash mid-pass left the handle Some pointing at a dead process — every subsequent acquire returned the zombie Browser, and every nav cascaded CDP errors until the idle reaper fired. Add BrowserManager::invalidate(): take the inner mutex, drop the handle (closing it if present), and signal the next acquire to relaunch. Idempotent — invalidating an empty handle is a no-op. Wire detection via NavError::is_likely_browser_dead and a chain-walking anyhow_looks_browser_dead helper: substring-match common channel/connection/transport/WebSocket markers and surface NavError::Timeout as "presumed dead." Apply at both error boundaries — RealChapterDispatcher::dispatch and RealMetadataPass::run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 18:39:19 +02:00
MechaCat02	8e0b638e3f	fix(crawler): wait for page marker instead of fixed 1s sleep (0.36.2) A chromium snapshot taken between the wrapper-render and row-render phases let parse_chapter_list return Ok(vec![]) for a manga that actually has chapters — the soft-drop branch in sync_manga_chapters then flipped every existing chapter to dropped_at. Add wait_for_selector to crawler::nav. navigate() now takes a CSS marker matching the most-specific element the downstream parser will look for (one of LIST_PAGE_MARKER / DETAIL_PAGE_CHAPTERS_MARKER / DETAIL_PAGE_LAYOUT_MARKER). The wait is best-effort and capped by SELECTOR_TIMEOUT (10s); a legitimately empty page can still pass through because the parser's #chapter_table sentinel and the universal broken-page body check stay in force. Same pattern wired at the reader nav (a#pic_container) and probe nav (#logo), replacing the implicit assumption that the post-load JS had finished within 1 second. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 18:29:38 +02:00
MechaCat02	e2bd1462ba	fix(crawler): wrap wait_for_navigation in 30s timeout (0.36.1) A hung TLS handshake or a page that never fires load could wedge a worker (or the cron metadata pass) indefinitely — chromiumoxide imposes no navigation timeout of its own. New crawler::nav::wait_for_nav caps each navigation at NAV_TIMEOUT (30s) and returns a typed NavError so timeouts surface as transient (retryable) errors. Wired at the three navigation sites: - source::target::navigate (catalog/detail/pagination) - content::sync_chapter_content (chapter reader) - session::fetch_probe_html (session probe) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 18:10:51 +02:00
MechaCat02	9f56f283d4	feat(crawler): single-mode walker gated by recovery flag (0.36.0) Collapses the crawler to a single newest-first walker and replaces the N-consecutive-unchanged streak with a per-manga rule: stop on the first manga where metadata is Unchanged AND chapter sync reports zero new chapters. The early stop is gated by a per-source recovery flag stored in `crawler_state` — set to `false` when a run starts, back to `true` only on a clean exit (end-of-walk or intentional stop). A crashed run leaves the flag `false` automatically (no shutdown code runs), so the next tick walks the full catalog instead of bailing at the first caught-up manga. This means a crashed mid-walk run self-heals on the next tick: the flag stays `false`, the next walk visits every page (recovering anything the crash missed past its crash point), and steady state resumes once the recovery sweep reaches end-of-walk. Removed: - DiscoverMode enum, Backfill mode, the boundary re-check + displaced-refs machinery in TargetSourceWalker. - Drop-pass (mark_dropped_mangas) and seed-completion plumbing (mark_seed_completed / seed_completed_at). The recovery flag subsumes the seed-completion signal; drop detection was explicitly opted out. - JobPayload::Discover (no production callers). - CRAWLER_MODE / CRAWLER_INCREMENTAL_STOP_AFTER env vars and the CrawlerModePref config type. `should_mark_clean_exit(walked_to_completion, hit_stop_condition)` encodes the clean-exit truth table in its signature — `hit_limit` is deliberately absent so a future edit cannot accidentally count a caller-imposed cap as a clean exit. Net -501 lines, 261 backend tests passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 23:49:28 +02:00
MechaCat02	33f7e19077	fix(crawler): serialize sync_manga_chapters per-manga (0.35.6) Two concurrent calls of sync_manga_chapters for the same manga both read seen_keys, both run the drop UPDATE filtered on `NOT (key = ANY $3)`, and the later commit can soft-drop a chapter the earlier had just inserted (lost-update under MVCC). Today the cron tick is the only caller and the daemon-level advisory lock keeps it single-flight, but that lock is held on one pool connection and doesn't actually serialize the function: any future caller (bookmark hook, admin-triggered re-sync, parallel worker) would race against the cron. Add `pg_advisory_xact_lock(hashtextextended(manga_id::text, 0))` at the start of the transaction. Auto-releases on commit/rollback so a panic mid-call can't strand the lock. Lock keyed per-manga so calls for different mangas still parallelize. Test sync_chapters_serializes_concurrent_calls_for_same_manga spawns two tokio tasks calling the function concurrently with overlapping chapter lists and asserts every chapter survives. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 20:45:01 +02:00
MechaCat02	c6bb9160e3	fix(crawler): scope chapter_sources lookup per-manga (0.35.5) chapter_sources's PRIMARY KEY was (source_id, source_chapter_key) and the lookup in sync_manga_chapters didn't constrain by manga_id, so a source whose chapter slugs aren't globally unique (e.g. "chapter-1" appearing under multiple mangas) silently attributed every collision to the first manga that synced it. The INSERT path would have conflicted on the second manga's sync. Migration 0017 drops the old PK and rekeys on (source_id, chapter_id) — the natural identity of a per-source chapter attachment — and adds an index on (source_id, source_chapter_key) for the lookup path. The repo lookup now joins chapters and filters by manga_id; the UPDATE path keys on chapter_id directly (the row's natural identifier post-migration). Test sync_chapters_isolates_colliding_keys_across_mangas pins the contract end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 20:43:08 +02:00
MechaCat02	50763addcf	fix(crawler): quarantine recently-dead chapters from re-enqueue (0.35.4) The partial dedup index only blocks (pending\|running) duplicates, so once a SyncChapterContent job transitions to 'dead' (max_attempts exhausted) the slot frees. Every subsequent cron tick re-enqueued the chapter — page_count = 0 and dropped_at IS NULL stay true — burned another max_attempts retries, and died again. Permanent-failure chapters spun forever. enqueue_bookmarked_pending and enqueue_pending_for_manga now skip chapters whose latest sync_chapter_content job is dead within CHAPTER_DEAD_QUARANTINE_DAYS (7). A failed chapter goes silent for a week, then gets one more shot — long enough for a transient site issue to resolve, short enough that permanent failures don't stay permanent if conditions change. Two integration tests pin both halves of the contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 20:42:41 +02:00
MechaCat02	766c6eebac	fix(crawler): guard ack_done/ack_failed/release on state='running' (0.35.3) The three lease-ack functions matched their UPDATE on the job id alone. If a lease expired and another worker re-leased the row, a late ack from the original worker would clobber the new lease's state, leased_until, and (for release) decrement its attempts. Add `AND state = 'running'` to each UPDATE and log a warn when rows_affected is zero, so a stolen lease shows up in telemetry without blocking the new lease holder's progress. Three new integration tests pin the contract: - ack_done_no_ops_when_lease_was_stolen - ack_failed_no_ops_when_state_is_not_running - release_no_ops_when_state_is_not_running Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 20:42:18 +02:00
MechaCat02	c686d6eb51	fix(crawler): sentinel-gate parse_chapter_list to stop false drops (0.35.2) parse_chapter_list previously returned Vec::new() on any selector miss. The empty list flowed into sync_manga_chapters, whose soft-drop branch then flipped every existing chapter's dropped_at to NOW(). Bookmarks subsequently pointed at dropped sources, and enqueue_bookmarked_pending (filters on cs.dropped_at IS NULL) silently stopped re-fetching pages. Same shape as the walker race fixed in 0.35.1: a transient parse miss masquerading as "source removed everything" → false soft-drop. Fix: require #chapter_table in the DOM. Present-but-empty is preserved as Ok(vec![]) so a freshly added manga with no published chapters still parses cleanly. Absent table is now Transient — the job system reschedules with backoff instead of treating the partial render as data. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 20:41:47 +02:00
MechaCat02	dea9b1aaa8	fix(crawler): close walker race against site reordering (0.35.1) The target site orders by update_date DESC, and any new or updated manga pushes everyone down by one slot. The paginated walker was blind to this drift: * Backfill (page last -> 1): shifts push items into pages already finished. The displaced manga was silently missed; with mark_dropped_mangas running on a fully-completed walk, items even got false-dropped because last_seen_at was stale. * Incremental (page 1 -> last): a shift causes the slot-last item of an already-read page to reappear on the next page, leading to a redundant fetch_manga and an inflated consecutive_unchanged streak. Fix is two-pronged: 1. Backfill boundary re-check. After fetching each page P, re-fetch the previously-walked page P+1 and check where its old slot-0 key now sits. If it slid to slot K, the first K entries are items that used to live on P and slid past us; they get appended to the batch. If the anchor is gone entirely (multi-page shift or it was bumped to page 1), the whole re-fetched page is processed conservatively and the pipeline dedup absorbs the noise. The re-check must be the last navigation of the iteration to close the within-iteration race. 2. Run-scoped dedup in run_metadata_pass. A HashSet<String> of source_manga_keys avoids double-processing. The set uses a contains-then-insert pattern with insert firing after a successful upsert, so a transient fetch/upsert failure leaves the key retryable if it reappears later in the same pass (via the boundary re-check or another batch). Incremental mode does not run the re-check (shifts move in the same direction as the walk); only the dedup helps it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 20:14:01 +02:00
MechaCat02	f57ca8e45c	feat: harden auth, shutdown, and session bundle (0.35.0) Three features bundled into one release: - rate-limit /auth/login, /register, /me/password (token bucket, 5 req/sec sustained with 10-request burst by default; 429 + Retry-After header on hit; tracing::warn! per hit so operators see attack patterns; AUTH_RATE_PER_SEC / AUTH_RATE_BURST env knobs) - handle SIGTERM for graceful container stops (replaces bare ctrl_c() with a select over ctrl_c + SignalKind::terminate() so docker compose stop runs the daemon shutdown path instead of letting Chromium leak past SIGKILL) - clear session.user on 401 from any API call (setOn401Hook in api/client.ts, registered from session.svelte.ts gated on $app/environment::browser so the SSR bundle never installs it; fixes "logged in but no bookmarks/collections" mid-session expiry state) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 20:27:21 +02:00
MechaCat02	8d34132883	bugfix: security & correctness bundle (0.34.1) Five fixes bundled into one release: - preserve user-attached tags across crawler upserts (repo::crawler::sync_tags now scopes to added_by IS NULL; orphaned attachments from deleted users are reaped as crawler-owned) - gate manga PATCH and cover endpoints on uploaded_by (require_can_edit in api::mangas; non-NULL uploaded_by must match the caller) - equalise login response time across user-existence branches (run argon2 against a OnceLock-cached dummy hash on the no-user branch so timing doesn't leak username existence) - crawler download defences (SSRF allowlist of host literals including IPv4-mapped IPv6 ranges, 32 MiB streamed size cap, reject non-whitelisted image types, three-way chapter-probe classifier replaces the binary #avatar_menu check) - tighten validation and clean up dead unload path (attach_tag + create_token enforce 64-char caps; LocalStorage rejects NUL bytes explicitly; reader flushFinalProgress drops the always-405 sendBeacon path) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 20:24:51 +02:00
MechaCat02	c5c1179e9d	chore: full hop-by-hop header strip and 60s timeout on /api/* proxy The SvelteKit proxy was only stripping host + content-length; the rest of RFC 7230 §6.1 (connection, keep-alive, proxy-authenticate, proxy-authorization, te, trailer, transfer-encoding, upgrade) leaked through to axum. Axum doesn't emit them so the impact is theoretical, but the proxy should be RFC-conformant. Also adds an AbortController with a configurable 60s timeout (BACKEND_PROXY_TIMEOUT_MS) so a wedged backend can't hang the browser request indefinitely — failures surface as the standard 502 upstream_unavailable envelope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 20:24:05 +02:00
MechaCat02	c320eda7cd	chore: dedupe is_unique_violation, lift SQL into repo, centralise URL parsing Three layering cleanups from REVIEW.md §5 / §3: - Drop the three private `is_unique_violation` helpers in repo::{user,chapter,bookmark} in favour of sqlx 0.8's `DatabaseError::is_unique_violation()` method (already used by repo::collection). - Remove the unreachable 23505 branch in repo::chapter::create — the (manga_id, number) UNIQUE was dropped in 0013, so the defensive arm could no longer fire. A doc note records what to do if uniqueness is re-added. - Move three inline SQL queries out of handlers/daemon into repo functions: bookmarks' chapter-belongs-to-manga guard (`repo::chapter::belongs_to_manga`), the daemon's dispatch lookup (`repo::chapter::dispatch_target`), and the daemon's page_count safety net (`repo::chapter::page_count`). Restores the handlers→repo layering invariant in CLAUDE.md. - New `crawler::url_utils` module consolidates host_of / origin_of / registrable_domain — they used to live in three crawler submodules with diverging edge-case behaviour. Tests moved with them. - Doc cross-references on repo::author::set_for_manga and repo::genre::set_for_manga pointing to the crawler's name-keyed variants, so the intentional duplication is discoverable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 20:24:05 +02:00
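
The run-scoped dedup described in dea9b1aaa8 (0.35.1) can be sketched with the standard library alone. This is a minimal illustration, not the code in `run_metadata_pass`: the `Outcome` enum, the `process_once` helper, and the closure standing in for the fetch/upsert step are all hypothetical names. It shows only the contains-then-insert ordering, where the key is recorded after a successful upsert so a failed attempt stays retryable within the same pass.

```rust
use std::collections::HashSet;

/// Result of one attempt to process a manga key within a pass.
/// (Hypothetical type, introduced for this sketch only.)
#[derive(Debug, PartialEq)]
enum Outcome {
    Processed,
    SkippedDuplicate,
    FailedRetryable,
}

/// Process `key` at most once per pass.
///
/// `seen` is the run-scoped set of source manga keys. `upsert` stands in
/// for the real fetch + upsert step and is an assumption of this sketch.
fn process_once<F>(seen: &mut HashSet<String>, key: &str, upsert: F) -> Outcome
where
    F: FnOnce(&str) -> Result<(), String>,
{
    // Check first: a key that already succeeded in this pass is skipped.
    if seen.contains(key) {
        return Outcome::SkippedDuplicate;
    }
    match upsert(key) {
        Ok(()) => {
            // Insert only after success, so the key is marked done
            // exactly when the work has actually been committed.
            seen.insert(key.to_owned());
            Outcome::Processed
        }
        // On failure the key is NOT recorded, so a later reappearance
        // in the same pass gets another attempt.
        Err(_) => Outcome::FailedRetryable,
    }
}
```

Inserting before the upsert would be simpler but would turn a transient failure into a silent skip for the rest of the pass, which is the case the commit message calls out.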

107 Commits