When `PRIVATE_MODE=true`, every API path except a small allowlist
(`/health`, `/auth/{config,login,logout,register}`) requires a valid
session cookie or bearer token — anonymous reads are rejected with
401. Self-registration is force-disabled in private mode regardless
of `ALLOW_SELF_REGISTER`, so a locked-down instance flips with a
single switch (admins still mint accounts via `POST /admin/users`).
The backend gate is a tower middleware that reuses the existing
`CurrentUser` extractor, so the cookie + bearer paths cannot drift
from per-handler auth. `/auth/config` now exposes the flag plus the
effective `self_register_enabled` value so the frontend can render
the navbar correctly on the first paint.
On the frontend, a new universal root `+layout.ts` fetches the
config and redirects anonymous visitors to `/login?next=<path>`
before page-specific loads fire. The redirect is UX only — the
backend middleware is the source of truth, so crafted requests
still 401.
Defaults stay public (`PRIVATE_MODE=false`); existing deployments
need no env change.
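A minimal sketch of the gate decision as a pure function, assuming the allowlist is matched on the exact paths listed above. `gate`, `GateDecision`, and `effective_self_register` are illustrative names, not the middleware's real API, and any router prefix is left out.

```rust
/// Paths that stay reachable without credentials in private mode.
/// Mirrors the allowlist named in this message; prefixes are omitted.
const PUBLIC_PATHS: [&str; 5] = [
    "/health",
    "/auth/config",
    "/auth/login",
    "/auth/logout",
    "/auth/register",
];

#[derive(Debug, PartialEq, Eq)]
pub enum GateDecision {
    Allow,
    Unauthorized, // would map to HTTP 401
}

/// Hypothetical pure-function form of the private-mode gate.
pub fn gate(private_mode: bool, path: &str, authenticated: bool) -> GateDecision {
    if !private_mode || authenticated || PUBLIC_PATHS.contains(&path) {
        GateDecision::Allow
    } else {
        GateDecision::Unauthorized
    }
}

/// Self-registration is forced off in private mode, whatever the toggle says.
pub fn effective_self_register(private_mode: bool, allow_self_register: bool) -> bool {
    allow_self_register && !private_mode
}
```

With `private_mode` false the gate always allows, which matches the "defaults stay public" behaviour.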
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CLI binary already capped runs at CRAWLER_LIMIT mangas, but the
daemon's RealMetadataPass passed a hardcoded `0` (no cap) to
`pipeline::run_metadata_pass`, so the env var was silently ignored once
the daemon took over the metadata pass.
Adds `manga_limit` to `CrawlerConfig`, reads it from `CRAWLER_LIMIT`
(default 0 = no cap), and threads it through `RealMetadataPass::run`
so a daemon-driven sweep stops at the same boundary as a CLI run.
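A small sketch of the cap semantics, assuming unparsable input falls back to 0 (that fallback is this sketch's choice). `parse_manga_limit` and `hit_limit` are hypothetical helper names.

```rust
/// Parse a `CRAWLER_LIMIT`-style value. Unset or unparsable input yields 0,
/// which this message defines as "no cap".
pub fn parse_manga_limit(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse().ok()).unwrap_or(0)
}

/// True once a non-zero cap is set and `processed` mangas have been handled.
pub fn hit_limit(processed: usize, manga_limit: usize) -> bool {
    manga_limit != 0 && processed >= manga_limit
}
```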
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Chromium doesn't know the socks5h scheme (curl/reqwest convention)
and bails navigations with ERR_NO_SUPPORTED_PROXIES. It does, however,
send destination hostnames over SOCKS5 by default, so stripping the
`h` is a pure scheme rename — remote-DNS behaviour is preserved.
reqwest keeps the user's original CRAWLER_PROXY string (`socks5h://...`
remains valid and meaningful for it).
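The rename can be sketched as a single prefix swap. `chromium_proxy_url` is a hypothetical name, and the sketch assumes the proxy string starts with a lowercase scheme.

```rust
/// Map a `socks5h://` proxy URL to the `socks5://` form Chromium accepts.
/// Any other scheme is passed through unchanged.
pub fn chromium_proxy_url(crawler_proxy: &str) -> String {
    match crawler_proxy.strip_prefix("socks5h://") {
        Some(rest) => format!("socks5://{rest}"),
        None => crawler_proxy.to_string(),
    }
}
```

Only the copy handed to Chromium is rewritten; the original string would still go to reqwest as-is.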
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the prod tor service but with 127.0.0.1-only host port bindings
so a `cargo run` on the host can reach 127.0.0.1:9050 / 9051. Default
password baked in (overridable via TOR_CONTROL_PASSWORD env) since
host-loopback is the only exposure surface — same friction-free posture
as the postgres entry in this file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dockurr/tor's stock entrypoint binds the control port to localhost
(unreachable from a sibling container), refuses to run as a
non-default user (its setup chowns dirs and su-execs down to its
`tor` user, both requiring root), and skips its own
HashedControlPassword injection whenever the user's torrc declares
a ControlPort. The combination meant the original cookie-via-shared-
volume design couldn't work without fighting the image.
This commit:
- Adds tor/entrypoint.sh, a small wrapper that hashes $PASSWORD
with `tor --hash-password`, appends the hash to a writable copy
of /etc/tor/torrc, then execs tor. Container runs as root only
for that bring-up; the torrc's `User tor` directive drops privs
after port binding.
- Adds a healthcheck on the tor service that gates downstream
containers on both 9050 + 9051 actually listening (was
service_started, which fires before tor finishes bootstrap).
- Loosens MaxCircuitDirtiness 60 → 600. The 60s value would have
rotated mid-chapter for any chapter with > ~50 images, which is
exactly the kind of fingerprint we're trying to avoid.
- Wires TOR_CONTROL_PASSWORD as a REQUIRED .env var on both sides
(PASSWORD on tor, CRAWLER_TOR_CONTROL_PASSWORD on backend).
docker-compose.yml fails fast if unset.
- Removes the tor-data shared volume on backend (cookie auth is no
longer the default; operators wanting cookie can mount it back).
- Documents the pivot + the cookie-vs-password tradeoff in
.env.example.
End-to-end validated: `docker compose up -d tor`, then
`printf 'AUTHENTICATE "test"\r\nSIGNAL NEWNYM\r\nQUIT\r\n' | nc tor 9051`
returns three `250 OK` lines.
Audit ref: #2, #3, #6.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The startup log line in app.rs and bin/crawler.rs `?t`-debug-formats
the TorController, which through the derived Debug on TorAuth would
expand TorAuth::Password(p) and leak the plaintext password to logs.
Implement Debug manually on TorAuth — None / Password(<redacted>) /
Cookie(<path>) — and lock the redaction with a regression test.
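A minimal sketch of a redacting Debug impl, assuming `TorAuth` has roughly this shape. The variant payload types are guesses; only the three rendered forms come from this message.

```rust
use std::fmt;
use std::path::PathBuf;

/// Assumed shape of the auth enum; payload types are illustrative.
pub enum TorAuth {
    None,
    Password(String),
    Cookie(PathBuf),
}

impl fmt::Debug for TorAuth {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TorAuth::None => f.write_str("None"),
            // The secret is never read, so it cannot reach the formatter.
            TorAuth::Password(_) => f.write_str("Password(<redacted>)"),
            TorAuth::Cookie(path) => write!(f, "Cookie({})", path.display()),
        }
    }
}
```

Because the impl is manual, any struct that derives Debug and embeds `TorAuth` inherits the redaction.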
Drop the per-NEWNYM success log from info to debug: a busy crawl
rotates circuits many times per minute. Failed NEWNYMs already log
at warn — those stay loud.
Tightens the closed_connection_mid_reply_is_an_error assertion, which
was tautological (`closed connection` OR `AUTHENTICATE`), by driving
the mock to read the AUTH line and then drop the connection, so only
the EOF-mid-reply path is exercised.
Audit ref: #7, #9, nit on tautological test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The three retry-with-recircuit sites disagreed: detect.rs's
retry_on_transient_with_hook used "N = total attempts" (3 → 3
fetches), but session.rs's unauth branch and content.rs's chapter
loop used "N = recircuits" (3 → 4 fetches). With the same nominal
"max=3", different sites hit the upstream a different number of times.
Unify on N = total attempts (matching the existing
retry_on_transient convention). The CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS
env var now means exactly what its name suggests. Disabling the
recircuit feature collapses to max_attempts=1 (single attempt, no
retry) — bit-for-bit pre-TOR behavior preserved.
Adds a debug_assert!(max >= 1) on both helpers and a new
content.rs test exercising the mixed Transient → Unauth → Ok
sequence to lock in the shared-counter invariant.
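The "N = total attempts" rule with one shared counter can be sketched synchronously as below. The real helpers are async and their signatures are not shown here, so `Outcome` and `run_with_recircuit` are stand-ins.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Ok,
    Transient,
    Unauthenticated,
    Fatal,
}

/// Run `fetch` at most `max_attempts` times in total, calling `recircuit`
/// between attempts. Transient and Unauthenticated share one counter.
/// Returns (final outcome, fetches made, recircuits signalled).
pub fn run_with_recircuit<F, R>(max_attempts: u32, mut fetch: F, mut recircuit: R) -> (Outcome, u32, u32)
where
    F: FnMut() -> Outcome,
    R: FnMut(),
{
    debug_assert!(max_attempts >= 1);
    let mut fetches = 0;
    let mut recircuits = 0;
    loop {
        let outcome = fetch();
        fetches += 1;
        let retryable = matches!(outcome, Outcome::Transient | Outcome::Unauthenticated);
        if !retryable || fetches >= max_attempts {
            return (outcome, fetches, recircuits);
        }
        recircuit();
        recircuits += 1;
    }
}
```

With `max_attempts = 1` the loop makes a single fetch and never recircuits, which is the disabled-feature case.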
Audit ref: #5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CRAWLER_TOR_CONTROL_URL, _PASSWORD, _COOKIE_PATH,
_RECIRCUIT_MAX_ATTEMPTS are new feature env vars; treat per CLAUDE.md
as a minor bump (feat:).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a `tor` service to the compose stack (dockurr/tor) with a torrc
tuned for the crawler — SOCKS5 on 9050 with IsolateDestAddr +
IsolateDestPort so NEWNYM picks up promptly, control port on 9051
with cookie auth, MaxCircuitDirtiness 60.
Backend defaults CRAWLER_PROXY → socks5h://tor:9050 and
CRAWLER_TOR_CONTROL_URL → tcp://tor:9051 so TOR + recircuit are on
out-of-the-box. Operators can override both to empty in .env to opt
out without removing the service.
The tor-data named volume is mounted ro on the backend so it can read
/var/lib/tor/control_auth_cookie; CookieAuthFileGroupReadable handles
the permissions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- target.rs swaps retry_on_transient → retry_on_transient_with_hook,
signaling NEWNYM via ctx.tor between attempts when configured.
- session.rs gains verify_session_with_recircuit; the bare
verify_session is now a one-line wrapper passing tor=None,
unauth_max_recircuit=0. The inner run_session_probe_loop is
pure-over-IO and unit-tested with closure-based fakes.
- content.rs extracts fetch_chapter_html_once + the closure-driven
fetch_chapter_html_with_recircuit, used by sync_chapter_content to
retry on Transient or Unauthenticated up to a recircuit_budget.
Budget = 0 (no TOR) preserves original behavior bit-for-bit.
- app.rs and bin/crawler.rs construct the controller before on_launch
and pass it into verify_session_with_recircuit, so a transient
hiccup at startup no longer requires PHPSESSID rotation.
Recircuit budget defaults to CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS (3).
Errors from NEWNYM are logged and swallowed — failing to recircuit
should not take down the crawl.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds CRAWLER_TOR_CONTROL_URL / _PASSWORD / _COOKIE_PATH /
_RECIRCUIT_MAX_ATTEMPTS to CrawlerConfig and to bin/crawler.rs's
env reads. Constructs an Option<Arc<TorController>> at daemon /
CLI startup and threads it through FetchContext,
pipeline::run_metadata_pass, and content::sync_chapter_content as
Option<&TorController>.
Pure scaffolding — the controller isn't used yet; behavior is
unchanged. Next commit wires the retry hooks and session-probe
recircuit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a sibling fn that fires a caller-supplied async hook between a
transient failure and the next attempt. The existing
retry_on_transient becomes a thin wrapper over it (no-op hook), so
no call sites churn yet.
Hook contract: fires only between attempts (N-1 times for N
attempts), never after a non-transient error or after the final
attempt. Designed for TOR NEWNYM, but the signature is generic.
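A synchronous sketch of the hook contract, assuming a two-variant error type. The real function is async and its signature is not shown in this message, so `FetchError` and both signatures below are illustrative.

```rust
pub enum FetchError {
    Transient(String),
    Permanent(String),
}

/// Retry `op` up to `max_attempts` total attempts. `hook` fires only between
/// a transient failure and the next attempt, so at most N-1 times.
pub fn retry_on_transient_with_hook<T>(
    max_attempts: u32,
    mut op: impl FnMut() -> Result<T, FetchError>,
    mut hook: impl FnMut(),
) -> Result<T, FetchError> {
    let mut attempt = 1;
    loop {
        match op() {
            Err(FetchError::Transient(_)) if attempt < max_attempts => {
                hook();
                attempt += 1;
            }
            // Success, a permanent error, or the final transient failure.
            other => return other,
        }
    }
}

/// Thin wrapper with a no-op hook, so existing call sites stay unchanged.
pub fn retry_on_transient<T>(
    max_attempts: u32,
    op: impl FnMut() -> Result<T, FetchError>,
) -> Result<T, FetchError> {
    retry_on_transient_with_hook(max_attempts, op, || {})
}
```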
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Minimal client over tokio::net::TcpStream — AUTHENTICATE then
SIGNAL NEWNYM, one-shot connection. Supports cookie-file and
password auth (cookie preferred when both provided); covers the
multi-line `250-...\r\n250 OK` reply form so future torrc tweaks
won't confuse the parser.
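A sketch of reading one reply in the multi-line form, written against `std::io::BufRead` so it runs without tokio. `read_reply` is a hypothetical name, and the sketch ignores the protocol's `250+` data-reply form.

```rust
use std::io::{self, BufRead};

/// Read one control-port reply. Continuation lines look like `250-...` and
/// the final line like `250 OK`. Returns the final status code plus the text
/// after the separator on every line.
pub fn read_reply<R: BufRead>(reader: &mut R) -> io::Result<(u16, Vec<String>)> {
    let bad = || io::Error::new(io::ErrorKind::InvalidData, "malformed reply line");
    let mut payloads = Vec::new();
    loop {
        let mut raw = String::new();
        if reader.read_line(&mut raw)? == 0 {
            // EOF before a final `NNN ` line counts as an error.
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "closed connection mid-reply",
            ));
        }
        let line = raw.trim_end_matches(|c: char| c == '\r' || c == '\n');
        let code = line.get(..3).ok_or_else(bad)?.parse::<u16>().map_err(|_| bad())?;
        let sep = *line.as_bytes().get(3).ok_or_else(bad)?;
        if sep != b' ' && sep != b'-' {
            return Err(bad());
        }
        payloads.push(line[4..].to_string());
        if sep == b' ' {
            return Ok((code, payloads));
        }
    }
}
```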
Not yet wired into the crawler — that lands in the next commits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The runner lives on the deploy host and shares its docker daemon, so the
deploy job runs `docker compose pull && up -d` against the central compose
via a bind-mounted compose dir (docker:cli + docker_host: "-") instead of
appleboy/ssh-action. Drops the SSH_* secrets and recreates only the two
mangalord services at the freshly built SHA. Requires /mnt/ssd/docker-data
in the runner's container.valid_volumes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- backend dep-cache stage stubs only main.rs/lib.rs, but Cargo.toml
declares a second [[bin]] crawler at src/bin/crawler.rs, so
`cargo build --locked` aborts ("can't find bin crawler"). Stub it too.
- runtime was debian:bookworm-slim (glibc 2.36) while rust:1-slim now
tracks trixie (glibc 2.41) -> "GLIBC_2.39 not found" at boot. Pin the
runtime to debian:trixie-slim so it matches the builder's glibc.
- frontend healthcheck probed localhost (-> musl picks IPv6 ::1) but the
Node server binds IPv4 0.0.0.0 only -> false "unhealthy". Probe 127.0.0.1.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Operators whose sources shard images across numbered CDN subdomains
can't pre-enumerate every host in CRAWLER_DOWNLOAD_ALLOWLIST. The new
flag short-circuits the host check in DownloadAllowlist::contains
while leaving scheme, localhost, and private-IP defenses in
is_safe_url untouched — scraped URLs pointing at 10.x /
169.254.169.254 / file:// stay refused. Default is false; fail-closed
posture is preserved unless the operator opts in. Wired into both the
server (config::build_download_allowlist) and the bin/crawler.rs
one-shot.
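A rough sketch of how the short-circuit can sit behind the other defenses. It assumes scheme and host are already split out of the URL; the field name `allow_any_host`, the constructor, and the `is_safe_url` signature are all illustrative.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

pub struct DownloadAllowlist {
    hosts: HashSet<String>,
    allow_any_host: bool, // hypothetical name for the new flag
}

impl DownloadAllowlist {
    pub fn new(hosts: &[&str], allow_any_host: bool) -> Self {
        let hosts = hosts.iter().map(|h| h.to_string()).collect();
        Self { hosts, allow_any_host }
    }

    /// The flag short-circuits only this host check.
    pub fn contains(&self, host: &str) -> bool {
        self.allow_any_host || self.hosts.contains(host)
    }
}

/// Scheme, localhost, and private-IP checks run before the allowlist,
/// so the flag cannot bypass them.
pub fn is_safe_url(scheme: &str, host: &str, allowlist: &DownloadAllowlist) -> bool {
    if scheme != "http" && scheme != "https" {
        return false;
    }
    if host.eq_ignore_ascii_case("localhost") {
        return false;
    }
    if let Ok(ip) = host.parse::<IpAddr>() {
        let blocked = match ip {
            IpAddr::V4(v4) => v4.is_private() || v4.is_loopback() || v4.is_link_local(),
            IpAddr::V6(v6) => v6.is_loopback(),
        };
        if blocked {
            return false;
        }
    }
    allowlist.contains(host)
}
```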
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Handle::close aborts its chromiumoxide driver task when another
Arc<Browser> outlives the call, so shutdown returns instead of
hanging on a stream that never terminates. Generic close_or_abort
helper with regression tests covering both Arc paths.
- daemon.shutdown() is wrapped in a 5s timeout in main as defense
in depth.
- Default RUST_LOG silences chromiumoxide::conn / chromiumoxide::handler
WS-deserialize ERROR spam.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pairs with the ALLOW_SELF_REGISTER toggle from 0.42.0: admins can mint
accounts regardless of the toggle state, so a closed-membership
deployment still has a working enrollment path. The endpoint accepts
{ username, password, is_admin? } so admins can mint co-admins in one
call (avoiding a separate promote + extra audit row for the common
"invite a co-admin" flow).
Implementation:
- POST /api/v1/admin/users guarded by RequireAdmin
- Reuses validate_username / validate_password from api::auth (made
pub(crate)) so the admin path can never produce an account self-
register would reject and vice versa
- repo::user::admin_create_user wraps INSERT + admin_audit insert in
a single tx — same "audit reflects what committed" semantics as the
existing admin_safe_* fns
- Audit row: action="create_user", payload={username, is_admin}
Frontend:
- createAdminUser() in lib/api/admin.ts
- /admin/users grows a collapsible "Create user" form above the table
(username, password, "Make admin" checkbox). Errors surface inline;
the list reloads on success.
Backend tests: 7 new, including the headline
`create_user_works_even_when_self_register_disabled`, which pins that
the admin-create path is NOT gated by the public toggle.
Lets operators run a closed-membership deployment by setting
ALLOW_SELF_REGISTER=false (default true, so existing deploys are
unaffected). When off, POST /auth/register returns 403 forbidden. The
rate-limit token is consumed BEFORE the disabled check so the timing
doesn't distinguish enabled-but-rejected from disabled — closes the
toggle-state probe channel.
New public GET /auth/config returns { self_register_enabled: bool }
so the frontend can render its register affordances correctly
without conflating "disabled" with "rate-limited" (which a probe
attempt would).
Frontend: a lightweight reactive `authConfig` store loads the flag
once on root-layout mount (and again on /register direct navigation,
which bypasses the layout's onMount). Header hides the Register link
when the toggle is off; /register renders a "self-registration is
disabled — ask an administrator" notice instead of the form.
The admin-create endpoint that pairs with this toggle is intentionally
not in this PR; it lands as the next branch (feat/admin-user-create).
The toggle alone is independently useful for deployments that want
to lock down enrollment without yet wiring an admin UI.
Addresses the security-audit findings on top of the admin feature stack:
M1: /admin/mangas/:id/chapters now paginates (default limit 200, max 500).
A long-runner with thousands of chapters would otherwise produce a multi-MB
response and run the per-row scalar subqueries thousands of times: admin-only,
but a real stall risk on one expand-click. Adds explicit pagination tests for the cap
and offset; frontend renders a "Showing first N of M" hint when the cap
clips the result.
L1: repo::user::set_is_admin renamed to set_is_admin_unchecked with a
doc-comment pointing at admin_safe_set_is_admin for production use. The
short name was a footgun — a future contributor reaching for it would
silently bypass self-protection, the last-admin invariant, and the audit
log. Used only by integration-test setup; production code goes through
the admin_safe_* paths.
CSRF posture: build_session_cookie carries a comment that the
SameSite=Lax default is the project's CSRF defense for state-changing
mutations and breaks the instant anyone adds a side-effecting GET under
/admin/*. Spells out what to do then (Strict + explicit token check).
Test counts: 43 backend admin tests + 12 vitest admin tests all green;
svelte-check 0/0 across 446 files.
- admin_safe_set_is_admin: short-circuit when target.is_admin == value,
before writing audit. PATCH {is_admin: true} on someone already admin
previously wrote a misleading "promote_user" row even though the UPDATE
was a no-op.
- list_chapters (/admin/mangas/:id/chapters): explicit exists() check on
manga_id, returns 404 instead of 200 [] for a typo'd / deleted manga.
- ChapterSyncState priority: the Failed branch now requires page_count = 0,
so a chapter with pages on disk AND a historical dead job (from a
re-download attempt that crashed) stays Synced. The old order
contradicted Synced's documented "downloaded at some point" contract.
Doc comments updated alongside the SQL.
Three new regression tests pin the behaviour.
Adds the SvelteKit /admin route tree backed by the admin endpoints
landed in PR 1-4. Pages: Overview (alerts + summary cards), Users
(list / promote-demote / delete), Mangas (list with sync state +
expandable per-chapter state), System (live disk/mem/cpu bars,
refreshing every 5s).
Security model: the backend's RequireAdmin extractor is the actual
boundary. /admin/+layout.ts calls getSystemStats() at load and
translates the response — 401 → redirect to /login, 403 → throw
SvelteKit error(403) which renders the framework error page. The
header's "Admin" link is hidden unless `session.user?.is_admin`,
but that's UX only.
Carries `is_admin: boolean` through to the frontend User TS type so
the header check works and so admin tables can show role per row.
Vitest covers lib/api/admin.ts (10 tests: list/delete/PATCH for
users, sync-state filter for mangas, nested chapter route, system
disk-nullable case). Playwright is intentionally deferred until the
routes stabilise — admin UI is operator-only and changes shape often
in v0.
Adds GET /api/v1/admin/system returning disk (scoped to storage_dir
via statvfs), memory, CPU, and a server-side alerts array that fires
at >90% disk or memory.
Disk uses nix::sys::statvfs directly rather than sysinfo's Disks API
to avoid mountpoint-matching gymnastics for the storage_dir. A new
`Storage::local_root() -> Option<&Path>` trait method exposes the
root; the default returns None so a future S3Storage gets `disk:
null` in the response instead of fabricated numbers.
CPU is sampled inline (refresh → 250ms sleep → refresh → read) so the
endpoint adds 250ms of latency per call. No background-cache yet —
admin traffic is low-volume and the moving parts aren't worth it
until polling shows up.
Alerts are evaluated server-side so the frontend can render them
without re-implementing the thresholds.
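A sketch of the threshold evaluation, assuming usage arrives as `(used, total)` byte pairs and that `disk` is `None` when the storage backend has no local root. The function names and alert wording are illustrative.

```rust
pub const ALERT_THRESHOLD_PERCENT: f64 = 90.0;

/// Percentage of `total` in use; an unknown (zero) total reports 0.
pub fn percent_used(used: u64, total: u64) -> f64 {
    if total == 0 {
        0.0
    } else {
        used as f64 * 100.0 / total as f64
    }
}

/// Build the alerts array: one entry per resource strictly above 90%.
pub fn evaluate_alerts(disk: Option<(u64, u64)>, memory: (u64, u64)) -> Vec<String> {
    let mut alerts = Vec::new();
    if let Some((used, total)) = disk {
        let pct = percent_used(used, total);
        if pct > ALERT_THRESHOLD_PERCENT {
            alerts.push(format!("disk usage at {pct:.0}%"));
        }
    }
    let pct = percent_used(memory.0, memory.1);
    if pct > ALERT_THRESHOLD_PERCENT {
        alerts.push(format!("memory usage at {pct:.0}%"));
    }
    alerts
}
```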
Adds GET /api/v1/admin/mangas and /admin/mangas/:id/chapters guarded by
RequireAdmin. Sync state is computed at query time from the existing
crawler signals (manga_sources / chapter_sources / crawler_jobs) — no
new state column is persisted, so the crawler stays the single writer
of these signals.
Per-manga priority: InProgress (in-flight sync_manga or
sync_chapter_list job) > Dropped (all source rows soft-dropped) >
Synced (default; covers user-uploaded mangas with zero source rows).
Per-chapter priority: Downloading (in-flight sync_chapter_content) >
Dropped (all source rows soft-dropped) > Failed (most-recent terminal
job is dead) > NotDownloaded (page_count = 0) > Synced. The Failed
check sits ABOVE NotDownloaded so the more informative "we tried and
it died" state wins over "we never got around to it" — see the
priority comment in repo/admin_view.rs.
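The per-chapter priority order stated above can be sketched as a first-match chain. The real computation happens in SQL at query time, so `ChapterSignals` and its field names are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ChapterSyncState {
    Downloading,
    Dropped,
    Failed,
    NotDownloaded,
    Synced,
}

/// Hypothetical flattened view of the per-chapter crawler signals.
pub struct ChapterSignals {
    pub content_job_in_flight: bool,
    pub all_sources_dropped: bool,
    pub latest_terminal_job_dead: bool,
    pub page_count: u32,
}

/// First matching branch wins. Failed is checked before NotDownloaded.
pub fn chapter_sync_state(s: &ChapterSignals) -> ChapterSyncState {
    if s.content_job_in_flight {
        ChapterSyncState::Downloading
    } else if s.all_sources_dropped {
        ChapterSyncState::Dropped
    } else if s.latest_terminal_job_dead {
        ChapterSyncState::Failed
    } else if s.page_count == 0 {
        ChapterSyncState::NotDownloaded
    } else {
        ChapterSyncState::Synced
    }
}
```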
Migration 0020 adds a partial index on
crawler_jobs((payload->>'source_manga_key')) for the one job kind
(sync_manga) whose payload doesn't carry manga_id directly — without it
the in-flight detection for a manga falls back to a seqscan over the
job table.
Adds /api/v1/admin/users list / DELETE / PATCH guarded by RequireAdmin,
plus the audit-log substrate every future destructive admin endpoint
will reuse.
Safety properties:
- Cannot self-delete or self-demote (409 conflict, message calls out
"yourself" so the UI can render an explanation).
- Cannot remove the last admin via either DELETE or demote. The check
takes pg_advisory_xact_lock(ADMIN_INVARIANT_LOCK_KEY) and re-counts
admins inside the same tx, closing the parallel-demote race that a
bare "if count > 1" check would let through. The HTTP-serial path to
this guard is structurally unreachable (the actor would have to be
the lone admin demoting themselves, which the self-guard fires on
first); the parallel race test exercises it via repo calls.
Audit log (admin_audit table) records the action inside the same tx
as the action itself, so a rolled-back action never leaves an orphan
audit row. actor_user_id is ON DELETE SET NULL so the log outlives a
later-deleted admin. target_id is not a FK because future audit kinds
will target non-user rows.
Adds an `is_admin` flag on users plus the substrate every later PR in the
admin feature builds on:
- migration 0018 adds the column with default false
- `repo::user::bootstrap_admin` creates or promotes the user named by
`ADMIN_USERNAME` at startup, hashing `ADMIN_PASSWORD` only when the row
is new — never overwriting an existing hash, so an operator can rotate
the admin password via the UI without env-var conflict
- `CurrentSessionUser` extractor accepts only the session cookie;
`RequireAdmin` composes over it and additionally requires
`user.is_admin`. Bearer tokens are intentionally excluded so an
admin's bot token never inherits admin authority (privilege-escalation
surface that bites every "API keys reuse user perms" auth design)
- demotion is instant: `RequireAdmin` re-reads the user row each request
`/api/v1/auth/me` now exposes `is_admin`; no other response embeds
`User`, so no privacy fanout to audit.
anyhow_looks_browser_dead substring-matched any chain message
containing channel / connection / websocket / transport / closed /
nav timeout. Real chromium failures hit those words, but so do
reqwest TCP-reset errors during CDN image downloads, sqlx pool-
timeout errors, and any number of non-browser failures — each of
which triggered a wasted chromium relaunch + session-probe re-run
against the catalog's rate-limit budget.
Drop the substring pass. Walk the chain looking only for typed
NavError (flagged via is_likely_browser_dead) or CdpError. Every
place we feed a chromium error into anyhow goes through one of
those types, so the typed downcasts cover the real cases without
the false-positive surface.
NavError::is_likely_browser_dead also drops its own substring
check on Cdp(e); any CdpError surfacing at the navigation layer
means the chromium-facing channel is the failing layer.
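The typed walk can be illustrated with `std::error::Error` alone. The real code goes through anyhow, and `NavError` here is a one-field stand-in for the real type; `Context` is a hypothetical wrapper that plays the role of an anyhow context layer.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for the typed navigation error.
#[derive(Debug)]
pub struct NavError {
    pub browser_dead: bool,
}

impl NavError {
    pub fn is_likely_browser_dead(&self) -> bool {
        self.browser_dead
    }
}

impl fmt::Display for NavError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("navigation error")
    }
}

impl Error for NavError {}

/// Hypothetical context wrapper carrying a message and an optional cause.
#[derive(Debug)]
pub struct Context {
    pub msg: String,
    pub source: Option<Box<dyn Error + 'static>>,
}

impl fmt::Display for Context {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.msg)
    }
}

impl Error for Context {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}

/// Walk the cause chain and trust only typed errors, never message text.
pub fn looks_browser_dead(err: &(dyn Error + 'static)) -> bool {
    let mut current = Some(err);
    while let Some(e) = current {
        if let Some(nav) = e.downcast_ref::<NavError>() {
            if nav.is_likely_browser_dead() {
                return true;
            }
        }
        current = e.source();
    }
    false
}
```

An error whose message merely contains "connection closed" no longer matches, which is the false-positive case this commit removes.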
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wait_for_selector wait in 0.36.2 narrows the partial-render race
window but doesn't close it: a render that takes longer than
SELECTOR_TIMEOUT (10s) still hands an empty Vec to sync_manga_chapters,
and the soft-drop branch flips every existing chapter to dropped_at.
The next tick recovers but a manga's reader briefly stops working in
between.
Close it at the pipeline level. Between fetch_manga and the upsert/
sync, if the parsed chapter list is empty and the prior live count
for (source_id, source_manga_key) is > 0, treat the fetch as a
transient failure: log, bump mangas_failed, skip upsert + sync + the
seen.insert so a later batch / tick retries. Brand-new mangas with
genuinely zero chapters (prior == 0) pass through unchanged.
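The guard reduces to one comparison. A sketch with hypothetical names (`ListVerdict`, `judge_parsed_chapters`), assuming the prior live count has already been read from the database:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ListVerdict {
    /// Continue to upsert + chapter sync.
    Proceed,
    /// Log, bump the failure counter, skip upsert/sync and the dedup insert.
    TransientFailure,
}

/// An empty parse is only suspicious when live chapters existed before.
pub fn judge_parsed_chapters(parsed_len: usize, prior_live_count: i64) -> ListVerdict {
    if parsed_len == 0 && prior_live_count > 0 {
        ListVerdict::TransientFailure
    } else {
        ListVerdict::Proceed
    }
}
```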
New repo helper repo::crawler::live_chapter_count_for_source_manga
joins chapters → chapter_sources → manga_sources with dropped_at IS
NULL — same lockstep as dispatch_target and the enqueue queries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worker dispatch was already wrapped in AssertUnwindSafe(...)
.catch_unwind(), so a panicking handler acks the job as failed and the
worker keeps going. The cron tick had no such guard: a panic in
metadata.run, enqueue_bookmarked_pending, reap_done, or
write_last_tick would kill the cron task. The JoinSet would drop it,
workers would keep running, and no future metadata pass would ever
fire until daemon restart.
Wrap the tick body (between advisory-lock acquire and unlock) in the
same AssertUnwindSafe(...).catch_unwind() pattern. The unlock and
connection drop run unconditionally so a panicked tick doesn't leave
the lock held for another replica.
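A synchronous sketch of the pattern using `std::panic::catch_unwind`. The daemon's tick is async and would use the futures-style `catch_unwind` instead; `run_tick_guarded` and its closure parameters are illustrative.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Run the tick body, then always run `unlock`, even if the body panicked.
/// Returns true when the body completed without panicking.
pub fn run_tick_guarded(tick_body: impl FnOnce(), unlock: impl FnOnce()) -> bool {
    // AssertUnwindSafe: the caller vouches that state touched by the body
    // is not relied on after a panic.
    let result = catch_unwind(AssertUnwindSafe(tick_body));
    unlock();
    result.is_ok()
}
```

The caller can log a `false` return and let the next scheduled tick run as normal.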
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The chapter dispatcher's URL resolver had no dropped_at filter and no
ORDER BY — a chapter whose only chapter_sources row had been soft-
dropped was still dispatched against the stale URL, eating retry
budget on guaranteed transients. With multiple live sources the LIMIT
1 winner was nondeterministic.
Add `AND cs.dropped_at IS NULL` and `ORDER BY cs.last_seen_at DESC`
to dispatch_target, bringing it in lockstep with the enqueue queries
in pipeline.rs that already filter on dropped_at. Returns None when
all sources are dropped — callers in daemon.rs already treat None
as "ack the job, skip the work."
Tests in tests/repo_chapter.rs cover the three branches (freshest
live wins, dropped sources skipped, all-dropped returns None).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BrowserManager only re-launched chromium when the cached handle was
None. A crash mid-pass left the handle Some pointing at a dead
process — every subsequent acquire returned the zombie Browser, and
every nav cascaded CDP errors until the idle reaper fired.
Add BrowserManager::invalidate(): take the inner mutex, drop the
handle (closing it if present), and signal the next acquire to
relaunch. Idempotent — invalidating an empty handle is a no-op.
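A minimal model of the invalidate/acquire interplay, assuming a plain `std::sync::Mutex` and a counter in place of a real chromium launch. `Handle`, `State`, and the return values are illustrative.

```rust
use std::sync::Mutex;

/// Stand-in for a live browser handle; `generation` counts launches.
pub struct Handle {
    pub generation: u32,
}

struct State {
    handle: Option<Handle>,
    launches: u32,
}

pub struct BrowserManager {
    inner: Mutex<State>,
}

impl BrowserManager {
    pub fn new() -> Self {
        Self { inner: Mutex::new(State { handle: None, launches: 0 }) }
    }

    /// Return the current generation, launching only when no handle is cached.
    pub fn acquire(&self) -> u32 {
        let mut st = self.inner.lock().unwrap();
        if st.handle.is_none() {
            st.launches += 1;
            let generation = st.launches;
            st.handle = Some(Handle { generation });
        }
        st.handle.as_ref().map(|h| h.generation).unwrap_or(0)
    }

    /// Drop the cached handle so the next acquire relaunches.
    /// Returns whether a handle was present; a second call is a no-op.
    pub fn invalidate(&self) -> bool {
        self.inner.lock().unwrap().take_handle()
    }
}

impl State {
    fn take_handle(&mut self) -> bool {
        self.handle.take().is_some()
    }
}
```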
Wire detection via NavError::is_likely_browser_dead and a
chain-walking anyhow_looks_browser_dead helper: substring-match
common channel/connection/transport/WebSocket markers and surface
NavError::Timeout as "presumed dead." Apply at both error
boundaries — RealChapterDispatcher::dispatch and
RealMetadataPass::run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A chromium snapshot taken between the wrapper-render and row-render
phases let parse_chapter_list return Ok(vec![]) for a manga that
actually has chapters — the soft-drop branch in sync_manga_chapters
then flipped every existing chapter to dropped_at.
Add wait_for_selector to crawler::nav. navigate() now takes a CSS
marker matching the most-specific element the downstream parser will
look for (one of LIST_PAGE_MARKER / DETAIL_PAGE_CHAPTERS_MARKER /
DETAIL_PAGE_LAYOUT_MARKER). The wait is best-effort and capped by
SELECTOR_TIMEOUT (10s); a legitimately empty page can still pass
through because the parser's #chapter_table sentinel and the
universal broken-page body check stay in force.
Same pattern wired at the reader nav (a#pic_container) and probe
nav (#logo), replacing the implicit assumption that the post-load
JS had finished within 1 second.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A hung TLS handshake or a page that never fires load could wedge a
worker (or the cron metadata pass) indefinitely — chromiumoxide
imposes no navigation timeout of its own.
New crawler::nav::wait_for_nav caps each navigation at NAV_TIMEOUT
(30s) and returns a typed NavError so timeouts surface as transient
(retryable) errors. Wired at the three navigation sites:
- source::target::navigate (catalog/detail/pagination)
- content::sync_chapter_content (chapter reader)
- session::fetch_probe_html (session probe)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collapses the crawler to a single newest-first walker and replaces the
N-consecutive-unchanged streak with a per-manga rule: stop on the first
manga where metadata is Unchanged AND chapter sync reports zero new
chapters. The early stop is gated by a per-source recovery flag stored
in `crawler_state` — set to `false` when a run starts, back to `true`
only on a clean exit (end-of-walk or intentional stop). A crashed run
leaves the flag `false` automatically (no shutdown code runs), so the
next tick walks the full catalog instead of bailing at the first
caught-up manga.
This means a crashed mid-walk run self-heals on the next tick: the
flag stays `false`, the next walk visits every page (recovering
anything the crash missed past its crash point), and steady state
resumes once the recovery sweep reaches end-of-walk.
Removed:
- DiscoverMode enum, Backfill mode, the boundary re-check +
displaced-refs machinery in TargetSourceWalker.
- Drop-pass (mark_dropped_mangas) and seed-completion plumbing
(mark_seed_completed / seed_completed_at). The recovery flag
subsumes the seed-completion signal; drop detection was explicitly
opted out.
- JobPayload::Discover (no production callers).
- CRAWLER_MODE / CRAWLER_INCREMENTAL_STOP_AFTER env vars and the
CrawlerModePref config type.
`should_mark_clean_exit(walked_to_completion, hit_stop_condition)`
encodes the clean-exit truth table in its signature — `hit_limit` is
deliberately absent so a future edit cannot accidentally count a
caller-imposed cap as a clean exit.
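Read as code, the two rules look roughly like this. The body of `should_mark_clean_exit` is inferred from "end-of-walk or intentional stop", and `should_stop_early` is a hypothetical helper that assumes the flag value is the one read before the run reset it.

```rust
/// Clean exit means the walk finished or stopped on purpose.
/// There is deliberately no `hit_limit` parameter.
pub fn should_mark_clean_exit(walked_to_completion: bool, hit_stop_condition: bool) -> bool {
    walked_to_completion || hit_stop_condition
}

/// Per-manga early stop, allowed only when the previous run exited cleanly.
pub fn should_stop_early(
    previous_run_clean: bool,
    metadata_unchanged: bool,
    new_chapters: usize,
) -> bool {
    previous_run_clean && metadata_unchanged && new_chapters == 0
}
```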
Net -501 lines, 261 backend tests passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two concurrent calls of sync_manga_chapters for the same manga both
read seen_keys, both run the drop UPDATE filtered on `NOT (key = ANY
$3)`, and the later commit can soft-drop a chapter the earlier had
just inserted (lost-update under MVCC). Today the cron tick is the
only caller and the daemon-level advisory lock keeps it single-flight,
but that lock is held on one pool connection and doesn't actually
serialize the *function*: any future caller (bookmark hook,
admin-triggered re-sync, parallel worker) would race against the cron.
Add `pg_advisory_xact_lock(hashtextextended(manga_id::text, 0))` at
the start of the transaction. Auto-releases on commit/rollback so a
panic mid-call can't strand the lock. Lock keyed per-manga so calls
for different mangas still parallelize.
Test sync_chapters_serializes_concurrent_calls_for_same_manga spawns
two tokio tasks calling the function concurrently with overlapping
chapter lists and asserts every chapter survives.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chapter_sources's PRIMARY KEY was (source_id, source_chapter_key) and
the lookup in sync_manga_chapters didn't constrain by manga_id, so a
source whose chapter slugs aren't globally unique (e.g. "chapter-1"
appearing under multiple mangas) silently attributed every collision
to the first manga that synced it. The INSERT path would have
conflicted on the second manga's sync.
Migration 0017 drops the old PK and rekeys on (source_id, chapter_id)
— the natural identity of a per-source chapter attachment — and adds
an index on (source_id, source_chapter_key) for the lookup path. The
repo lookup now joins chapters and filters by manga_id; the UPDATE
path keys on chapter_id directly (the row's natural identifier
post-migration).
Test sync_chapters_isolates_colliding_keys_across_mangas pins the
contract end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The partial dedup index only blocks (pending|running) duplicates, so
once a SyncChapterContent job transitions to 'dead' (max_attempts
exhausted) the slot frees. Every subsequent cron tick re-enqueued the
chapter — page_count = 0 and dropped_at IS NULL stay true — burned
another max_attempts retries, and died again. Permanent-failure
chapters spun forever.
enqueue_bookmarked_pending and enqueue_pending_for_manga now skip
chapters whose latest sync_chapter_content job is dead within
CHAPTER_DEAD_QUARANTINE_DAYS (7). A failed chapter goes silent for a
week, then gets one more shot — long enough for a transient site
issue to resolve, short enough that permanent failures don't stay
permanent if conditions change.
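The quarantine window, sketched in Rust for illustration. The real filter lives in the enqueue SQL, so `should_enqueue` and its timestamp argument are assumptions.

```rust
use std::time::{Duration, SystemTime};

pub const CHAPTER_DEAD_QUARANTINE_DAYS: u64 = 7;

/// `latest_job_died_at` is when the chapter's most recent content job went
/// dead, or None if the latest job is not dead.
pub fn should_enqueue(latest_job_died_at: Option<SystemTime>, now: SystemTime) -> bool {
    let window = Duration::from_secs(CHAPTER_DEAD_QUARANTINE_DAYS * 24 * 60 * 60);
    match latest_job_died_at {
        None => true,
        Some(died_at) => match now.duration_since(died_at) {
            Ok(age) => age >= window,
            // A death timestamp in the future: stay quarantined.
            Err(_) => false,
        },
    }
}
```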
Two integration tests pin both halves of the contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The three lease-ack functions matched their UPDATE on the job id
alone. If a lease expired and another worker re-leased the row, a
late ack from the original worker would clobber the new lease's
state, leased_until, and (for release) decrement its attempts.
Add `AND state = 'running'` to each UPDATE and log a warn when
rows_affected is zero, so a stolen lease shows up in telemetry without
blocking the new lease holder's progress.
Three new integration tests pin the contract:
- ack_done_no_ops_when_lease_was_stolen
- ack_failed_no_ops_when_state_is_not_running
- release_no_ops_when_state_is_not_running
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parse_chapter_list previously returned Vec::new() on any selector
miss. The empty list flowed into sync_manga_chapters, whose soft-drop
branch then flipped every existing chapter's dropped_at to NOW().
Bookmarks subsequently pointed at dropped sources, and
enqueue_bookmarked_pending (filters on cs.dropped_at IS NULL) silently
stopped re-fetching pages.
Same shape as the walker race fixed in 0.35.1: a transient parse miss
masquerading as "source removed everything" → false soft-drop.
Fix: require #chapter_table in the DOM. Present-but-empty is preserved
as Ok(vec![]) so a freshly added manga with no published chapters
still parses cleanly. Absent table is now Transient — the job system
reschedules with backoff instead of treating the partial render as
data.
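The three outcomes can be sketched as below. The DOM lookup is
abstracted away: `chapter_table` is `None` when #chapter_table is
absent and `Some(rows)` when it is present. The signature and the
`ParseError` type are assumptions for illustration, not the crawler's
actual types.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    /// The page looked partially rendered; the caller should retry with backoff.
    Transient(String),
}

/// `chapter_table` models the result of looking up #chapter_table:
/// `None` = element absent, `Some(rows)` = element present with these rows.
pub fn parse_chapter_list(chapter_table: Option<&[&str]>) -> Result<Vec<String>, ParseError> {
    match chapter_table {
        // Absent table: never report "no chapters", or the soft-drop branch fires.
        None => Err(ParseError::Transient(
            "#chapter_table missing from DOM".to_string(),
        )),
        // Present table: an empty row set is legitimate data.
        Some(rows) => Ok(rows.iter().map(|r| r.to_string()).collect()),
    }
}
```

Only the `Ok` arm ever reaches sync_manga_chapters, so an empty list
now always means the source really lists no chapters.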
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The target site orders by update_date DESC, and any new or updated
manga pushes everyone down by one slot. The paginated walker was
blind to this drift:
* Backfill (page last -> 1): shifts push items into pages already
finished. The displaced manga was silently missed; with
mark_dropped_mangas running on a fully-completed walk, items even
got false-dropped because last_seen_at was stale.
* Incremental (page 1 -> last): a shift causes the slot-last item
of an already-read page to reappear on the next page, leading to
a redundant fetch_manga and an inflated consecutive_unchanged
streak.
Fix is two-pronged:
1. Backfill boundary re-check. After fetching each page P, re-fetch
the previously-walked page P+1 and check where its old slot-0
key now sits. If it slid to slot K, the first K entries are
items that used to live on P and slid past us; they get appended
to the batch. If the anchor is gone entirely (multi-page shift
or it was bumped to page 1), the whole re-fetched page is
processed conservatively and the pipeline dedup absorbs the
noise. The re-check must be the *last* navigation of the
iteration to close the within-iteration race.
2. Run-scoped dedup in run_metadata_pass. A HashSet<String> of
source_manga_keys avoids double-processing. The set uses a
contains-then-insert pattern with insert firing *after* a
successful upsert, so a transient fetch/upsert failure leaves
the key retryable if it reappears later in the same pass (via
the boundary re-check or another batch).
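A sketch of the contains-then-insert pattern from item 2, assuming a
closure stands in for fetch + upsert. `process_batch` and its
signature are hypothetical.

```rust
use std::collections::HashSet;

/// Processes one batch of source_manga_keys and returns how many were upserted.
/// `upsert` is a stand-in for fetch + upsert; Err models a transient failure.
pub fn process_batch<F>(seen: &mut HashSet<String>, batch: &[&str], mut upsert: F) -> usize
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut processed = 0;
    for &key in batch {
        // Already handled earlier in this run: skip.
        if seen.contains(key) {
            continue;
        }
        match upsert(key) {
            // Insert only after success, so the key is recorded exactly once.
            Ok(()) => {
                seen.insert(key.to_string());
                processed += 1;
            }
            // Failure: leave the key out of the set so a later sighting retries it.
            Err(_) => {}
        }
    }
    processed
}
```

Inserting before the upsert would be simpler, but a key that failed
once would then be skipped for the rest of the pass.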
Incremental mode does not run the re-check (shifts move in the
same direction as the walk); only the dedup helps it.
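The slot arithmetic of the backfill re-check can be sketched as a pure
function over the re-fetched page. `displaced_entries` is a
hypothetical helper; the real walker also has to navigate and order
its fetches, which is omitted here.

```rust
/// `refetched` is page P+1 as it looks now; `old_anchor` is the key that sat
/// at slot 0 of P+1 when that page was first walked.
/// Returns the entries that must be appended to the batch.
pub fn displaced_entries<'a>(refetched: &'a [String], old_anchor: &str) -> &'a [String] {
    match refetched.iter().position(|k| k.as_str() == old_anchor) {
        // Anchor slid to slot K: the first K entries used to live on page P.
        Some(k) => &refetched[..k],
        // Anchor gone: process the whole page and rely on the run-scoped dedup.
        None => refetched,
    }
}
```

With the anchor still at slot 0 the result is empty, which is the
common no-shift case.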
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three features bundled into one release:
- rate-limit /auth/login, /register, /me/password (token bucket,
5 req/sec sustained with 10-request burst by default; 429 +
Retry-After header on hit; tracing::warn! per hit so operators
see attack patterns; AUTH_RATE_PER_SEC / AUTH_RATE_BURST env knobs)
- handle SIGTERM for graceful container stops (replaces bare
ctrl_c() with a select over ctrl_c + SignalKind::terminate() so
docker compose stop runs the daemon shutdown path instead of
letting Chromium leak past SIGKILL)
- clear session.user on 401 from any API call (setOn401Hook in
api/client.ts, registered from session.svelte.ts and gated on
`browser` from $app/environment so the SSR bundle never installs it;
fixes the "logged in but no bookmarks/collections" state after a
mid-session expiry)
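For the rate limiter, a minimal token-bucket sketch using the default
numbers above (5 tokens/sec, burst of 10). The struct and its methods
are assumptions, not the middleware's actual types; time is passed in
as seconds so the logic stays deterministic.

```rust
pub struct TokenBucket {
    rate_per_sec: f64,
    burst: f64,
    tokens: f64,
    last_refill: f64,
}

impl TokenBucket {
    /// Starts full, so a fresh client can spend the whole burst at once.
    pub fn new(rate_per_sec: f64, burst: f64, now: f64) -> Self {
        TokenBucket { rate_per_sec, burst, tokens: burst, last_refill: now }
    }

    /// Ok(()) when the request is allowed; Err(seconds) is a Retry-After hint.
    pub fn try_acquire(&mut self, now: f64) -> Result<(), f64> {
        // Refill in proportion to elapsed time, capped at the burst size.
        if now > self.last_refill {
            let refill = (now - self.last_refill) * self.rate_per_sec;
            self.tokens = (self.tokens + refill).min(self.burst);
            self.last_refill = now;
        }
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            Ok(())
        } else {
            // Time until one whole token is available again.
            Err((1.0 - self.tokens) / self.rate_per_sec)
        }
    }
}
```

The `Err` value would feed the Retry-After header on the 429 response,
rounded up to whole seconds.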
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Five fixes bundled into one release:
- preserve user-attached tags across crawler upserts
(repo::crawler::sync_tags now scopes to added_by IS NULL; orphaned
attachments from deleted users are reaped as crawler-owned)
- gate manga PATCH and cover endpoints on uploaded_by (require_can_edit
in api::mangas; non-NULL uploaded_by must match the caller)
- equalise login response time across user-existence branches
(run argon2 against a OnceLock-cached dummy hash on the no-user
branch so timing doesn't leak username existence)
- crawler download defences (SSRF allowlist of host literals
including IPv4-mapped IPv6 ranges, 32 MiB streamed size cap,
reject non-whitelisted image types, three-way chapter-probe
classifier replaces the binary #avatar_menu check)
- tighten validation and clean up dead unload path
(attach_tag + create_token enforce 64-char caps; LocalStorage
rejects NUL bytes explicitly; reader flushFinalProgress drops
the always-405 sendBeacon path)
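The streamed size cap from the download defences can be sketched over
any `std::io::Read`. `read_capped` and `MAX_DOWNLOAD_BYTES` are
hypothetical names; the crawler presumably applies the same idea to
its HTTP response stream.

```rust
use std::io::{self, Read};

/// 32 MiB, the cap named in the commit message.
pub const MAX_DOWNLOAD_BYTES: usize = 32 * 1024 * 1024;

/// Reads `src` to the end, failing as soon as the total would exceed `cap`.
pub fn read_capped<R: Read>(mut src: R, cap: usize) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf = [0u8; 8192];
    loop {
        let n = match src.read(&mut buf) {
            Ok(n) => n,
            // Interrupted reads are retried, as std::io recommends.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if n == 0 {
            return Ok(out); // end of stream
        }
        // Check before copying so memory use never exceeds the cap.
        if out.len() + n > cap {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "download exceeds size cap",
            ));
        }
        out.extend_from_slice(&buf[..n]);
    }
}
```

Checking per chunk, rather than trusting Content-Length, bounds memory
even when the header is missing or wrong.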
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The SvelteKit proxy was only stripping host + content-length; the rest
of RFC 7230 §6.1 (connection, keep-alive, proxy-authenticate,
proxy-authorization, te, trailer, transfer-encoding, upgrade) leaked
through to axum. Axum doesn't emit them so the impact is theoretical,
but the proxy should be RFC-conformant. Also adds an AbortController
with a configurable 60s timeout (BACKEND_PROXY_TIMEOUT_MS) so a
wedged backend can't hang the browser request indefinitely — failures
surface as the standard 502 upstream_unavailable envelope.
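The header filter itself is small. The sketch below is written in Rust
for consistency with the backend, although the actual proxy is
SvelteKit code; `strip_hop_by_hop` and the tuple representation of
headers are assumptions.

```rust
/// host and content-length (already stripped) plus the eight headers listed above.
const STRIPPED_HEADERS: [&str; 10] = [
    "host",
    "content-length",
    "connection",
    "keep-alive",
    "proxy-authenticate",
    "proxy-authorization",
    "te",
    "trailer",
    "transfer-encoding",
    "upgrade",
];

/// Returns the headers that are safe to forward, preserving order.
pub fn strip_hop_by_hop(headers: &[(String, String)]) -> Vec<(String, String)> {
    headers
        .iter()
        .filter(|(name, _)| {
            // Header names are case-insensitive, so compare ignoring ASCII case.
            !STRIPPED_HEADERS
                .iter()
                .any(|&h| name.eq_ignore_ascii_case(h))
        })
        .cloned()
        .collect()
}
```

The sketch does not handle extra header names nominated in the
Connection field value, which a fully conformant proxy also removes.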
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five cleanups from REVIEW.md §5 / §3:
- Drop the three private `is_unique_violation` helpers in
repo::{user,chapter,bookmark} in favour of sqlx 0.8's
`DatabaseError::is_unique_violation()` method (already used by
repo::collection).
- Remove the unreachable 23505 branch in repo::chapter::create — the
(manga_id, number) UNIQUE was dropped in 0013, so the defensive arm
could no longer fire. A doc note records what to do if uniqueness
is re-added.
- Move three inline SQL queries out of handlers/daemon into repo
functions: bookmarks' chapter-belongs-to-manga guard
(`repo::chapter::belongs_to_manga`), the daemon's dispatch lookup
(`repo::chapter::dispatch_target`), and the daemon's page_count
safety net (`repo::chapter::page_count`). Restores the
handlers→repo layering invariant in CLAUDE.md.
- New `crawler::url_utils` module consolidates host_of / origin_of /
registrable_domain — they used to live in three crawler submodules
with diverging edge-case behaviour. Tests moved with them.
- Doc cross-references on repo::author::set_for_manga and
repo::genre::set_for_manga pointing to the crawler's name-keyed
variants, so the intentional duplication is discoverable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>