Compare commits

...

21 Commits

Author SHA1 Message Date
MechaCat02
d3935dc82f docs: hand-off report for 2026-06-05 session
Snapshot of main, in-flight branches, session-specific changes (CRAWLER_LIMIT
in the daemon, browser-restart clears session_expired), and dev-stack
commands — for whoever picks the work up next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-05 07:11:39 +02:00
MechaCat02
679abae736 feat(chapter): preserve source-site order in chapter list (0.52.0)
Some checks failed
deploy / test-backend (push) Failing after 11m48s
deploy / test-frontend (push) Successful in 9m45s
deploy / build-and-push (push) Has been skipped
deploy / deploy (push) Has been skipped
The user-facing chapter list was ordered by (number ASC, created_at ASC),
which broke the source site's order in two ways: non-numeric entries
("notice. : Officials") parsed to number=0 and clustered at the top,
even though the site placed them mid-list, and variants sharing a
number ("Ch.14 : PH" / "Ch.14 : Official") were torn apart by the
created_at tiebreak.

Capture each chapter's position in the source DOM as `source_index`
(0 = first = newest on this site) on every crawler sync, including the
UPDATE branch so a new chapter prepended on the source shifts every
existing row down by one on the next tick. The list query reverses
this with `ORDER BY source_index DESC NULLS LAST, number ASC,
created_at ASC` so the oldest chapter appears first, variants stay
adjacent in the order the site shows them, and non-numeric entries
land where the site placed them. User-uploaded chapters and pre-
migration rows keep their NULL source_index and fall through to the
prior number/created_at tiebreak via NULLS LAST.

The reader's client-side `[...chapters].sort((a,b) => a.number - b.number)`
is dropped; prev/next now walks the server-ordered array positionally
so it traverses variants and non-numeric entries in display order.

Existing data populates on the next cron tick or via admin force-resync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 07:25:09 +02:00
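The ordering rule from 679abae736 can be sketched in plain Rust as a comparator. This is only an illustration of `ORDER BY source_index DESC NULLS LAST, number ASC, created_at ASC`, not the project's query code: the `Chapter` struct, its field types, and the function names are assumptions made up for the example.

```rust
use std::cmp::Ordering;

/// Hypothetical, trimmed-down chapter row; the real table has more columns
/// and the field types here are assumptions.
#[derive(Debug, Clone)]
struct Chapter {
    title: &'static str,
    /// Position in the source DOM (0 = newest). None for user uploads and
    /// rows that pre-date the migration.
    source_index: Option<i32>,
    number: i32,
    created_at: i64,
}

/// Mirrors `ORDER BY source_index DESC NULLS LAST, number ASC, created_at ASC`.
fn display_order(a: &Chapter, b: &Chapter) -> Ordering {
    let by_source = match (a.source_index, b.source_index) {
        // Higher index = older chapter = shown first (DESC).
        (Some(x), Some(y)) => y.cmp(&x),
        // NULLS LAST: rows without a source position sort after the rest.
        (Some(_), None) => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (None, None) => Ordering::Equal,
    };
    by_source
        .then(a.number.cmp(&b.number))
        .then(a.created_at.cmp(&b.created_at))
}

/// Sorts a chapter list into display order in place.
fn sort_for_display(chapters: &mut [Chapter]) {
    chapters.sort_by(display_order);
}
```

With this rule, a non-numeric entry keeps the slot the site gave it, and two variants of the same chapter number stay adjacent regardless of their `created_at` values.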
MechaCat02
b812c6d16c fix(reader): drop "Chapter N:" prefix from chapter title display (0.51.2)
The chapter list on the manga detail page, the reader's chapter-select
dropdown, the continuous-mode chapter bar, the browser tab title, and
the profile upload-history entries all prepended "Chapter {number}:"
in front of the crawled site title. Source titles already include
"Ch.N" themselves and the manga page renders chapters inside an <ol>,
so the prefix duplicated information the user could already see.

A small chapterLabel(c) helper in $lib/api/chapters returns the site
title as-is, falling back to "Chapter {number}" only when the
crawler captured an empty title (link/option stays non-empty). The
five render sites now call it. The previous-/next-chapter nav
buttons still read "Previous chapter (Ch. N)" / "Next chapter (Ch. N)"
since those are wayfinding labels, not title display.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 07:22:17 +02:00
MechaCat02
e93eec89e5 fix(crawler): queue chapter content in ascending number order (0.51.1)
Both enqueue paths now order by chapters.number so the cron tick and the
bookmark hook insert jobs from chapter 1 upward instead of source-discovery
or random-UUID order. The lease query tiebreaks on created_at so jobs
sharing a batch's scheduled_at come off the queue in insertion order,
propagating the enqueue intent through to dequeue. Concurrent workers
and per-CDN latency can still drift actual completion order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 21:13:51 +02:00
MechaCat02
8818c890c5 feat(reader): chapter select dropdown for direct chapter jumps (0.51.0)
Adds a chapter `<select>` to the reader's top nav listing every chapter
of the current manga, defaulting to the open chapter; picking another
entry navigates straight to it without going back to the manga detail
page. Options use the "Ch. N — Title" form to match the existing
chapter tile and prev/next buttons in the reader bar.

Covered by a new Playwright spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 07:09:30 +02:00
MechaCat02
c134bdbbde feat: cover retry backfill + admin force-resync for manga & chapter (0.50.0)
Adds a per-tick cover-backfill pass to the crawler daemon so mangas whose
cover download failed on first attempt get retried — the metadata pass's
early-stop optimisation otherwise prevents the walk from revisiting them.

Adds admin-only POST /admin/mangas/:id/resync and POST /admin/chapters/:id/resync
that refetch metadata + cover (or chapter content with force_refetch) from the
crawler source synchronously and return the refreshed row. Surfaced in the
UI as "Force resync" buttons on the manga detail and reader pages,
admin-only via session.user.is_admin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 22:00:09 +02:00
MechaCat02
5c22dfdb41 feat: paginate list views, fix stale page titles, tidy admin filter bar
Bundle of small UI/UX fixes plus a build hygiene tweak.

* List pagination — Home (`/`) and `/authors/[id]` silently capped at
  the backend default of 50 with no UI to advance. New reusable
  `Pager.svelte` (Prev/Next + numbered with ellipsis), URL-synced
  `?page=N`, and filter/search/sort reset to page 1 so users aren't
  stranded on an out-of-range page. Count label now shows a range
  ("Showing 51–100 of 237").

* Stale page title — Pages without a `<svelte:head><title>` left the
  document title at whatever the last manga / author / collection page
  set it to. Move static-route titles into a route-id → title map in
  the root layout and invert every dynamic title to brand-first
  (`Mangalord | {X}`) for consistency.

* Admin filter bar — `/admin/mangas` search input had `flex: 1` and
  ballooned across the row, shoving the sync-state select + Search
  button to the far right. Cap at 24rem, vertical-align the row, and
  promote the previously aria-only "Sync state" label to visible text.

* Build hygiene — `backend/target` had grown to 68 GiB. Cleaned and
  added `[profile.dev] debug = "line-tables-only"` (and `[profile.test]`
  too) to cut future dev builds by ~50–70% while keeping line numbers
  in backtraces.

Also: configure vitest to resolve Svelte's browser entry so
`@testing-library/svelte` can mount components in jsdom — needed for
the new `Pager.svelte.test.ts`.

Bump 0.48.0 -> 0.49.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 21:18:53 +02:00
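The range label from 5c22dfdb41 ("Showing 51–100 of 237") comes down to a small piece of arithmetic. The sketch below shows it in Rust for illustration only; the actual `Pager.svelte` is frontend code, and `range_label` plus its out-of-range behaviour (returning `None`) are assumptions, not the component's API.

```rust
/// Builds a "Showing A–B of N" label for a 1-based page number.
/// Returns None when the page lies outside the result set (assumed behaviour).
fn range_label(page: usize, per_page: usize, total: usize) -> Option<String> {
    if page == 0 || per_page == 0 {
        return None;
    }
    // First item on the page, 1-based; checked math guards against overflow.
    let first = (page - 1).checked_mul(per_page)?.checked_add(1)?;
    if first > total {
        return None;
    }
    // Last item is capped at the total so a short final page reads correctly.
    let last = (first - 1).saturating_add(per_page).min(total);
    Some(format!("Showing {first}–{last} of {total}"))
}
```

An out-of-range page yielding no label is the case the commit avoids in the UI by resetting to page 1 whenever the filter, search, or sort changes.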
MechaCat02
e50fc093c3 feat: add PRIVATE_MODE site-wide auth gate (0.48.0)
When `PRIVATE_MODE=true`, every API path except a small allowlist
(`/health`, `/auth/{config,login,logout,register}`) requires a valid
session cookie or bearer token — anonymous reads are rejected with
401. Self-registration is force-disabled in private mode regardless
of `ALLOW_SELF_REGISTER`, so a locked-down instance flips with a
single switch (admins still mint accounts via `POST /admin/users`).

The backend gate is a tower middleware that reuses the existing
`CurrentUser` extractor, so the cookie + bearer paths cannot drift
from per-handler auth. `/auth/config` now exposes the flag plus the
effective `self_register_enabled` value so the frontend can render
the navbar correctly on the first paint.

On the frontend, a new universal root `+layout.ts` fetches the
config and redirects anonymous visitors to `/login?next=<path>`
before page-specific loads fire. The redirect is UX only — the
backend middleware is the source of truth, so crafted requests
still 401.

Defaults stay public (`PRIVATE_MODE=false`); existing deployments
need no env change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 20:11:22 +02:00
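The decision made by the PRIVATE_MODE gate in e50fc093c3 can be sketched as two pure functions. This is a simplified illustration, not the tower middleware itself: the function names are hypothetical, and exact-match comparison against un-prefixed paths is an assumption (the real router may mount these under a prefix).

```rust
/// Paths that stay reachable without a session in private mode,
/// taken from the allowlist named in the commit message.
const ANONYMOUS_ALLOWLIST: [&str; 5] = [
    "/health",
    "/auth/config",
    "/auth/login",
    "/auth/logout",
    "/auth/register",
];

/// True when the site-wide gate demands a valid session for `path`.
/// Exact-match comparison is an assumption made for this sketch.
fn gate_requires_session(private_mode: bool, path: &str) -> bool {
    private_mode && !ANONYMOUS_ALLOWLIST.contains(&path)
}

/// Effective self-registration flag: private mode force-disables it
/// regardless of ALLOW_SELF_REGISTER.
fn effective_self_register(private_mode: bool, allow_self_register: bool) -> bool {
    allow_self_register && !private_mode
}
```

When `private_mode` is false the gate never fires, which matches the commit's note that existing deployments need no env change.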
MechaCat02
72756cfef2 feat(crawler): honour CRAWLER_LIMIT in the in-process daemon (0.47.0)
The CLI binary already capped runs at CRAWLER_LIMIT mangas, but the
daemon's RealMetadataPass passed a hardcoded `0` (no cap) to
`pipeline::run_metadata_pass`, so the env var was silently ignored once
the daemon took over the metadata pass.

Adds `manga_limit` to `CrawlerConfig`, reads it from `CRAWLER_LIMIT`
(default 0 = no cap), and threads it through `RealMetadataPass::run`
so a daemon-driven sweep stops at the same boundary as a CLI run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 20:07:01 +02:00
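A minimal sketch of the `CRAWLER_LIMIT` semantics from 72756cfef2 (0 = no cap) follows. The helper names and the choice to surface a parse error for non-numeric values are assumptions; the commit only states that `manga_limit` is read from the env var with a default of 0.

```rust
use std::num::ParseIntError;

/// Parses a raw CRAWLER_LIMIT value. Unset or empty means 0 (no cap).
/// Returning an error for garbage input is an assumption of this sketch.
fn parse_manga_limit(raw: Option<&str>) -> Result<usize, ParseIntError> {
    match raw.map(str::trim) {
        None | Some("") => Ok(0),
        Some(value) => value.parse::<usize>(),
    }
}

/// True once a metadata pass should stop. A limit of 0 never stops the walk.
fn limit_reached(manga_limit: usize, fetched: usize) -> bool {
    manga_limit != 0 && fetched >= manga_limit
}
```

Sharing one check of this shape between the CLI and the daemon is what lets a daemon-driven sweep stop at the same boundary as a CLI run.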
MechaCat02
4e20350645 fix(crawler): translate socks5h:// → socks5:// for Chromium --proxy-server
All checks were successful
deploy / test-backend (push) Successful in 19m30s
deploy / test-frontend (push) Successful in 9m42s
deploy / build-and-push (push) Successful in 8m10s
deploy / deploy (push) Successful in 15s
Chromium doesn't know the socks5h scheme (curl/reqwest convention)
and bails navigations with ERR_NO_SUPPORTED_PROXIES. It does, however,
send destination hostnames over SOCKS5 by default, so stripping the
`h` is a pure scheme rename — remote-DNS behaviour is preserved.

reqwest keeps the user's original CRAWLER_PROXY string (`socks5h://...`
remains valid and meaningful for it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 20:56:45 +02:00
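The scheme rename described in 4e20350645 is small enough to sketch directly. The function name is hypothetical; the only behaviour taken from the commit is that `socks5h://` becomes `socks5://` for Chromium's `--proxy-server`, while reqwest keeps the original string.

```rust
/// Rewrites a `socks5h://` proxy URL to `socks5://` for Chromium's
/// `--proxy-server` flag. Any other URL is passed through unchanged.
fn chromium_proxy_url(proxy: &str) -> String {
    match proxy.strip_prefix("socks5h://") {
        Some(rest) => format!("socks5://{rest}"),
        None => proxy.to_string(),
    }
}
```

Only the scheme changes; host, port, and anything after them are copied through verbatim.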
MechaCat02
713ca139c4 feat(deploy): add optional tor service to dev compose for native-backend dev
Mirrors the prod tor service but with 127.0.0.1-only host port bindings
so a `cargo run` on the host can reach 127.0.0.1:9050 / 9051. Default
password baked in (overridable via TOR_CONTROL_PASSWORD env) since
host-loopback is the only exposure surface — same friction-free posture
as the postgres entry in this file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 20:56:45 +02:00
MechaCat02
e3cff9d874 fix(deploy): pivot tor service to password auth + wrapper entrypoint
Some checks failed
deploy / test-backend (push) Successful in 20m29s
deploy / test-frontend (push) Successful in 9m42s
deploy / deploy (push) Has been cancelled
deploy / build-and-push (push) Has been cancelled
Dockurr/tor's stock entrypoint binds the control port to localhost
(unreachable from a sibling container), refuses to run as a
non-default user (its setup chowns dirs and su-execs down to its
`tor` user, both requiring root), and skips its own
HashedControlPassword injection whenever the user's torrc declares
a ControlPort. The combination meant the original cookie-via-shared-
volume design couldn't work without fighting the image.

This commit:

- Adds tor/entrypoint.sh, a small wrapper that hashes $PASSWORD
  with `tor --hash-password`, appends the hash to a writable copy
  of /etc/tor/torrc, then execs tor. Container runs as root only
  for that bring-up; the torrc's `User tor` directive drops privs
  after port binding.
- Adds a healthcheck on the tor service that gates downstream
  containers on both 9050 + 9051 actually listening (was
  service_started, which fires before tor finishes bootstrap).
- Loosens MaxCircuitDirtiness 60 → 600. The 60s value would have
  rotated mid-chapter for any chapter with > ~50 images, which is
  exactly the kind of fingerprint we're trying to avoid.
- Wires TOR_CONTROL_PASSWORD as a REQUIRED .env var on both sides
  (PASSWORD on tor, CRAWLER_TOR_CONTROL_PASSWORD on backend).
  docker-compose.yml fails fast if unset.
- Removes the tor-data shared volume on backend (cookie auth is no
  longer the default; operators wanting cookie can mount it back).
- Documents the pivot + the cookie-vs-password tradeoff in
  .env.example.

End-to-end validated: `docker compose up -d tor`, then
`printf 'AUTHENTICATE "test"\r\nSIGNAL NEWNYM\r\nQUIT\r\n' | nc tor 9051`
returns three `250 OK` lines.

Audit ref: #2, #3, #6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 20:25:54 +02:00
MechaCat02
d47e832613 fix(crawler): redact TorAuth::Password in Debug, drop NEWNYM info→debug
The startup log line in app.rs and bin/crawler.rs `?t`-debug-formats
the TorController, which through the derived Debug on TorAuth would
expand TorAuth::Password(p) and leak the plaintext password to logs.
Implement Debug manually on TorAuth — None / Password(<redacted>) /
Cookie(<path>) — and lock the redaction with a regression test.

Drop the per-NEWNYM success log from info to debug: a busy crawl
rotates circuits many times per minute. Failed NEWNYMs already log
at warn — those stay loud.

Tightens the closed_connection_mid_reply_is_an_error assertion which
was tautological (`closed connection` OR `AUTHENTICATE`) by driving
the mock to read the AUTH line then drop, exercising only the
EOF-mid-reply path.

Audit ref: #7, #9, nit on tautological test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 20:25:36 +02:00
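The redaction in d47e832613 amounts to a hand-written `Debug` impl. The sketch below follows the three output forms the commit names (None / Password(<redacted>) / Cookie(<path>)); the exact variant payload types are assumptions, since the real `TorAuth` definition is not shown here.

```rust
use std::fmt;
use std::path::PathBuf;

/// Assumed shape of the auth enum; the payload types are illustrative.
enum TorAuth {
    None,
    Password(String),
    Cookie(PathBuf),
}

impl fmt::Debug for TorAuth {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TorAuth::None => f.write_str("None"),
            // Never print the secret, even when a parent struct is logged with `?`.
            TorAuth::Password(_) => f.write_str("Password(<redacted>)"),
            TorAuth::Cookie(path) => write!(f, "Cookie({})", path.display()),
        }
    }
}
```

Because a derived `Debug` on a containing struct delegates to this impl, a `?t`-style log of the controller no longer expands the password.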
MechaCat02
c30c7a546f fix(crawler): unify recircuit budget semantics — N = total attempts
The three retry-with-recircuit sites disagreed: detect.rs's
retry_on_transient_with_hook used "N = total attempts" (3 → 3
fetches), but session.rs's unauth branch and content.rs's chapter
loop used "N = recircuits" (3 → 4 fetches). With the same nominal
"max=3", different sites hit the upstream a different number of times.

Unify on N = total attempts (matching the existing
retry_on_transient convention). The CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS
env var now means exactly what its name suggests. Disabling the
recircuit feature collapses to max_attempts=1 (single attempt, no
retry) — bit-for-bit pre-TOR behavior preserved.

Adds a debug_assert!(max >= 1) on both helpers and a new
content.rs test exercising the mixed Transient → Unauth → Ok
sequence to lock in the shared-counter invariant.

Audit ref: #5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 20:25:25 +02:00
MechaCat02
a0db7beb81 chore: bump to 0.46.0 for TOR proxy + recircuit feature
CRAWLER_TOR_CONTROL_URL, _PASSWORD, _COOKIE_PATH,
_RECIRCUIT_MAX_ATTEMPTS are new feature env vars; treat per CLAUDE.md
as a minor bump (feat:).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 20:01:57 +02:00
MechaCat02
ecbbebafc4 feat(deploy): dockurr/tor service + torrc; wire crawler to use it by default
Adds a `tor` service to the compose stack (dockurr/tor) with a torrc
tuned for the crawler — SOCKS5 on 9050 with IsolateDestAddr +
IsolateDestPort so NEWNYM picks up promptly, control port on 9051
with cookie auth, MaxCircuitDirtiness 60.

Backend defaults CRAWLER_PROXY → socks5h://tor:9050 and
CRAWLER_TOR_CONTROL_URL → tcp://tor:9051 so TOR + recircuit are on
out-of-the-box. Operators can override both to empty in .env to opt
out without removing the service.

The tor-data named volume is mounted ro on the backend so it can read
/var/lib/tor/control_auth_cookie; CookieAuthFileGroupReadable handles
the permissions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 20:01:04 +02:00
MechaCat02
8c6378b877 feat(crawler): recircuit TOR on transient pages and unauthenticated probes
- target.rs swaps retry_on_transient → retry_on_transient_with_hook,
  signaling NEWNYM via ctx.tor between attempts when configured.
- session.rs gains verify_session_with_recircuit; the bare
  verify_session is now a one-line wrapper passing tor=None,
  unauth_max_recircuit=0. The inner run_session_probe_loop is
  pure-over-IO and unit-tested with closure-based fakes.
- content.rs extracts fetch_chapter_html_once + the closure-driven
  fetch_chapter_html_with_recircuit, used by sync_chapter_content to
  retry on Transient or Unauthenticated up to a recircuit_budget.
  Budget = 0 (no TOR) preserves original behavior bit-for-bit.
- app.rs and bin/crawler.rs construct the controller before on_launch
  and pass it into verify_session_with_recircuit, so a transient
  hiccup at startup no longer requires PHPSESSID rotation.

Recircuit budget defaults to CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS (3).
Errors from NEWNYM are logged and swallowed — failing to recircuit
should not take down the crawl.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 20:01:04 +02:00
MechaCat02
8557e432a2 feat(crawler): plumb TorController through FetchContext and pipelines
Adds CRAWLER_TOR_CONTROL_URL / _PASSWORD / _COOKIE_PATH /
_RECIRCUIT_MAX_ATTEMPTS to CrawlerConfig and to bin/crawler.rs's
env reads. Constructs an Option<Arc<TorController>> at daemon /
CLI startup and threads it through FetchContext,
pipeline::run_metadata_pass, and content::sync_chapter_content as
Option<&TorController>.

Pure scaffolding — the controller isn't used yet; behavior is
unchanged. Next commit wires the retry hooks and session-probe
recircuit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 19:59:47 +02:00
MechaCat02
d6d84dedcb feat(crawler): retry_on_transient_with_hook for between-retry side effects
Adds a sibling fn that fires a caller-supplied async hook between a
transient failure and the next attempt. The existing
retry_on_transient becomes a thin wrapper over it (no-op hook), so
no call sites churn yet.

Hook contract: fires only between attempts (N-1 times for N
attempts), never after a non-transient error or after the final
attempt. Designed for TOR NEWNYM, but the signature is generic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 19:59:47 +02:00
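The hook contract from d6d84dedcb, combined with the later "N = total attempts" convention, can be illustrated with a synchronous sketch. The real helper is async and its signature is not shown in this diff, so `retry_with_hook`, `FetchError`, and the closure parameters below are all assumptions.

```rust
/// Hypothetical error type: only `Transient` failures are retried.
#[derive(Debug, PartialEq)]
enum FetchError {
    Transient,
    Fatal,
}

/// Runs `attempt` up to `max_attempts` times in total. `between` fires only
/// between a transient failure and the next attempt, so at most
/// `max_attempts - 1` times, and never after the final attempt.
fn retry_with_hook<T>(
    max_attempts: u32,
    mut attempt: impl FnMut() -> Result<T, FetchError>,
    mut between: impl FnMut(),
) -> Result<T, FetchError> {
    debug_assert!(max_attempts >= 1);
    let mut tries = 0u32;
    loop {
        tries += 1;
        match attempt() {
            // Transient failure with budget left: run the hook, then retry.
            Err(FetchError::Transient) if tries < max_attempts => between(),
            // Success, non-transient error, or budget exhausted: stop here.
            other => return other,
        }
    }
}
```

With `max_attempts = 1` the hook never runs and the call collapses to a single attempt, which is the behaviour the commits describe for a crawl without TOR.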
MechaCat02
d37b94871e feat(crawler): TorController for control-port NEWNYM signaling
Minimal client over tokio::net::TcpStream — AUTHENTICATE then
SIGNAL NEWNYM, one-shot connection. Supports cookie-file and
password auth (cookie preferred when both provided); covers the
multi-line `250-...\r\n250 OK` reply form so future torrc tweaks
won't confuse the parser.

Not yet wired into the crawler — that lands in the next commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 19:59:47 +02:00
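The multi-line reply form mentioned in d37b94871e (`250-...\r\n250 OK`) can be handled by scanning for the first line whose status code is followed by a space. The sketch below is a simplified, hypothetical parser rather than the `TorController` code: it handles only the `-` continuation and the final-line form, and it leaves data replies (`+`) out of scope.

```rust
/// Scans a control-port reply and reports whether its final line is a 250.
/// Lines of the form `NNN-text` are continuations; `NNN text` ends the reply.
/// A reply that ends before a final line is treated as an error.
fn control_reply_ok(raw: &str) -> Result<bool, String> {
    for line in raw.split("\r\n").filter(|l| !l.is_empty()) {
        let (code, sep) = match (line.get(..3), line.as_bytes().get(3)) {
            (Some(code), Some(sep)) => (code, *sep),
            _ => return Err(format!("malformed reply line: {line:?}")),
        };
        match sep {
            // Final line: the status code decides the outcome.
            b' ' => return Ok(code == "250"),
            // Continuation line: keep reading.
            b'-' => continue,
            _ => return Err(format!("malformed reply line: {line:?}")),
        }
    }
    Err("connection closed before the final reply line".to_string())
}
```

Treating an early end of input as an error corresponds to the EOF-mid-reply case that the `closed_connection_mid_reply_is_an_error` test in d47e832613 exercises.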
8e39fadd21 ci: build via host docker socket (plain build); fix missing daemon socket (#5)
All checks were successful
deploy / test-backend (push) Successful in 19m12s
deploy / test-frontend (push) Successful in 9m43s
deploy / build-and-push (push) Successful in 8m11s
deploy / deploy (push) Successful in 11s
2026-05-31 17:40:14 +00:00
69 changed files with 5105 additions and 228 deletions


@@ -74,6 +74,10 @@ CRAWLER_DOWNLOAD_ALLOWLIST=
 CRAWLER_ALLOW_ANY_HOST=false
 # Hard cap on a single image body. Default 32 MiB.
 CRAWLER_MAX_IMAGE_BYTES=33554432
+# Max manga detail fetches per metadata pass (both the in-process daemon
+# and the `bin/crawler` CLI). 0 means no cap — let the source walker run
+# to completion. Useful for capped test runs against a new source.
+CRAWLER_LIMIT=0
 # Path to a system Chromium binary. When set, the crawler skips the
 # bundled-fetcher download. Required on platforms without a usable
 # upstream Chromium build (notably Linux_arm64 / Raspberry Pi). On
@@ -83,6 +87,43 @@ CRAWLER_MAX_IMAGE_BYTES=33554432
 # the image actually contains the binary.
 CRAWLER_CHROMIUM_BINARY=
+# ----- Crawler TOR proxy + recircuit -----
+# The compose stack ships a `tor` service (dockurr/tor) and defaults
+# CRAWLER_PROXY to it, so by default all crawler traffic exits via the
+# TOR network. To opt out, set CRAWLER_PROXY= (empty) AND
+# CRAWLER_TOR_CONTROL_URL= (empty) below — the tor service can stay
+# running, it just won't be used.
+#
+# Going through TOR adds latency to every fetch; image downloads in
+# particular slow noticeably. The win is on sites that rate-limit or
+# fingerprint by exit IP — NEWNYM recirculation makes a fresh exit
+# cheap to reach for.
+#
+# CRAWLER_PROXY: SOCKS5(h) URL. Use `socks5h://` (not `socks5://`) so
+# DNS resolution also goes through TOR, avoiding leaks via the host's
+# resolver. Leave unset to talk to the upstream directly.
+CRAWLER_PROXY=socks5h://tor:9050
+# Control-port URL for SIGNAL NEWNYM ("get a fresh circuit"). Triggered
+# automatically on bad pages (broken-page body, missing #logo) and on
+# the Unauthenticated session probe outcome. Leave unset to disable
+# the recircuit feature (the SOCKS proxy still works).
+CRAWLER_TOR_CONTROL_URL=tcp://tor:9051
+# Max NEWNYM-and-retry cycles per recircuit-eligible failure. Default 3.
+CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS=3
+# ----- TOR control-port password -----
+# Shared between the bundled dockurr/tor service (which hashes it into
+# its HashedControlPassword) and the backend's
+# CRAWLER_TOR_CONTROL_PASSWORD. REQUIRED — docker-compose.yml fails
+# fast if absent. Generate a strong random string; rotate by setting
+# a new value and restarting both `tor` and `backend`.
+#
+# Operators running their own non-dockurr tor daemon with cookie-file
+# auth can ignore this var and instead set
+# CRAWLER_TOR_CONTROL_COOKIE_PATH on the backend — the TorController
+# prefers cookie when both are present.
+TOR_CONTROL_PASSWORD=change-me-to-a-strong-random-string
 # ----- Frontend -----
 # The frontend container runs SvelteKit's Node adapter on :3000 and
 # proxies /api/* to BACKEND_URL via src/hooks.server.ts. In compose the


@@ -72,9 +72,17 @@ jobs:
     runs-on: ubuntu-latest
     needs: [test-backend, test-frontend]
     # PRs only run the test jobs; build + deploy are reserved for
-    # post-merge pushes to main. Without this gate every PR would push
-    # a tagged image to the registry and SSH-deploy to prod.
+    # post-merge pushes to main.
     if: github.event_name != 'pull_request'
+    # Build on the host docker daemon directly (docker-outside-of-docker):
+    # the runner shares the deploy host's daemon, so a plain `docker build`
+    # reuses the host's layer cache and avoids buildx's docker-container
+    # driver + the gha cache exporter — neither works against this single-host
+    # act_runner, and there is no in-job daemon socket unless we mount it.
+    container:
+      image: docker.gitea.com/runner-images:ubuntu-latest
+      volumes:
+        - /var/run/docker.sock:/var/run/docker.sock
     outputs:
       image_tag: ${{ steps.meta.outputs.image_tag }}
       version: ${{ steps.meta.outputs.version }}
@@ -93,48 +101,32 @@ jobs:
           echo "image_tag=${GITHUB_SHA}" >> "$GITHUB_OUTPUT"
           echo "version=${version}" >> "$GITHUB_OUTPUT"
-      - uses: docker/setup-buildx-action@v3
-      - name: docker login
-        uses: docker/login-action@v3
-        with:
-          registry: ${{ secrets.REGISTRY_URL }}
-          username: ${{ secrets.REGISTRY_USERNAME }}
-          password: ${{ secrets.REGISTRY_PASSWORD }}
-      - name: Build & push backend
-        uses: docker/build-push-action@v5
-        with:
-          context: ./backend
-          push: true
-          tags: |
-            ${{ secrets.REGISTRY_URL }}/mangalord-backend:latest
-            ${{ secrets.REGISTRY_URL }}/mangalord-backend:${{ steps.meta.outputs.image_tag }}
-            ${{ secrets.REGISTRY_URL }}/mangalord-backend:${{ steps.meta.outputs.version }}
-          cache-from: type=gha,scope=backend
-          cache-to: type=gha,mode=max,scope=backend
-      - name: Build & push frontend
-        uses: docker/build-push-action@v5
-        with:
-          context: ./frontend
-          push: true
-          tags: |
-            ${{ secrets.REGISTRY_URL }}/mangalord-frontend:latest
-            ${{ secrets.REGISTRY_URL }}/mangalord-frontend:${{ steps.meta.outputs.image_tag }}
-            ${{ secrets.REGISTRY_URL }}/mangalord-frontend:${{ steps.meta.outputs.version }}
-          cache-from: type=gha,scope=frontend
-          cache-to: type=gha,mode=max,scope=frontend
+      - name: Build & push backend + frontend
+        env:
+          REGISTRY_URL: ${{ secrets.REGISTRY_URL }}
+          REGISTRY_USERNAME: ${{ secrets.REGISTRY_USERNAME }}
+          REGISTRY_PASSWORD: ${{ secrets.REGISTRY_PASSWORD }}
+          IMAGE_TAG: ${{ steps.meta.outputs.image_tag }}
+          VERSION: ${{ steps.meta.outputs.version }}
+        run: |
+          set -eu
+          echo "$REGISTRY_PASSWORD" | docker login "$REGISTRY_URL" -u "$REGISTRY_USERNAME" --password-stdin
+          for svc in backend frontend; do
+            img="$REGISTRY_URL/mangalord-$svc"
+            docker build -t "$img:$IMAGE_TAG" -t "$img:latest" -t "$img:$VERSION" "./$svc"
+            for tag in "$IMAGE_TAG" latest "$VERSION"; do docker push "$img:$tag"; done
+          done
+          docker logout "$REGISTRY_URL"
   deploy:
     runs-on: ubuntu-latest
     needs: build-and-push
     if: github.event_name != 'pull_request'
     # Single-host deploy: the runner lives on the same box as the stack, so we
-    # drive the host docker daemon directly (act_runner shares its socket via
-    # `docker_host: "-"`) instead of SSHing out. The compose dir is bind-mounted
-    # at its REAL host path so compose's relative bind-mounts (./mangalord/...,
-    # ./Caddyfile) resolve; this requires `/mnt/ssd/docker-data` in the runner's
+    # drive the host docker daemon directly (the job mounts the host docker
+    # socket) instead of SSHing out. The compose dir is bind-mounted at its
+    # REAL host path so compose's relative bind-mounts (./mangalord/...,
+    # ./Caddyfile) resolve; both paths must be in the runner's
     # container.valid_volumes. The central compose references the images as
     # registry.mc02.dev/mangalord-*:${MANGALORD_TAG:-latest}, so we only pull
     # and recreate the two mangalord services at the freshly built SHA.
@@ -142,6 +134,7 @@ jobs:
       image: docker:cli
       volumes:
         - /mnt/ssd/docker-data:/mnt/ssd/docker-data
+        - /var/run/docker.sock:/var/run/docker.sock
     steps:
       - name: Deploy to the local stack
         working-directory: /mnt/ssd/docker-data

78
HANDOFF.md Normal file

@@ -0,0 +1,78 @@
# Hand-off — 2026-06-05
Snapshot of repo state for whoever picks up next. Pair with [CLAUDE.md](CLAUDE.md) for the architecture, dev rules, and command crib.
## Where main is
`main` is at **0.52.0** (commit `679abae`, `feat(chapter): preserve source-site order in chapter list`).
Recently landed (in order):
- `0.52.0` `feat(chapter)`: source-site order in chapter list; new `source_index` column + reverse-sort. Migration `0021_chapter_source_index.sql` runs on next backend start.
- `0.51.2` `fix(reader)`: drop `Chapter N:` prefix from title display; `chapterLabel()` helper in `$lib/api/chapters`.
- `0.47.0` `feat(crawler)`: honour `CRAWLER_LIMIT` in the in-process daemon (previously CLI-only).
All three were authored this session.
## In-flight branches (pushed, not yet on main)
The remote now has 18 branches that were previously local-only. The ones most relevant to current work:
| Branch | Tip ver | Notes |
| --- | --- | --- |
| `feat/crawler-observability-and-reliability` | 0.54.0 | Active dev branch — admin crawler dashboard, live status SSE, coordinated browser restart, dead-job requeue, runtime PHPSESSID refresh, reliability fixes. Now also carries the `restart_browser` clears-`session_expired` fix from this session (commit `fb4182f`, 0.53.1). |
| `feat/private-mode` | 0.48.0 | Site-wide auth gate via `PRIVATE_MODE`. Already merged into main during this session. Branch left around for reference. |
| `feat/cover-retry-and-force-resync` | 0.50.0 | Cover retry backfill + admin force-resync. Already merged into main. |
The older `bugfix/*`, `chore/*`, and other `feat/*` branches are pre-0.40 era WIP — they predate this session and may be stale; verify before reviving.
## Session-specific changes worth flagging
### 1. `CRAWLER_LIMIT` now caps the daemon
Before: `bin/crawler` honoured `CRAWLER_LIMIT`, but the in-process daemon called `pipeline::run_metadata_pass(..., 0, ...)` so it always swept the full catalog.
Fix: `CrawlerConfig.manga_limit` reads `CRAWLER_LIMIT` (default `0` = no cap), threaded through `RealMetadataPass::run`. Same env var as the CLI.
Tests: `backend/src/config.rs::tests::crawler_limit_env_populates_manga_limit` (and the unset default).
### 2. Browser restart now drops the sticky `session_expired` flag
On branch `feat/crawler-observability-and-reliability` only (not on main yet).
Bug: hitting the admin "Restart browser" button ran `coordinated_restart` (which re-runs `on_launch` → re-injects PHPSESSID → re-probes), but `session_expired` stayed `true` regardless. UI continued to report Session Expired and chapter workers kept idling.
Fix: on `Ok(())` from `coordinated_restart`, the handler now calls `c.session.clear_expired()`. Error path still leaves it set. See [backend/src/api/admin/crawler.rs:230-262](backend/src/api/admin/crawler.rs#L230-L262).
Test: `backend/src/crawler/session_control.rs::tests::clear_expired_flips_sticky_flag_without_touching_session`.
## Dev stack
Compose ships db + tor (frontend runs natively). See `docker-compose.dev.yml`.
```bash
docker compose -f docker-compose.dev.yml up -d
(cd frontend && npm run dev) # http://localhost:5175
```
Backend dev command template (fill in `<START_URL>`):
```bash
cd backend && \
CRAWLER_START_URL="<START_URL>" \
CRAWLER_LIMIT=96 \
CRAWLER_DAEMON=true \
CRAWLER_PROXY=socks5h://localhost:9050 \
CRAWLER_TOR_CONTROL_URL=tcp://localhost:9051 \
CRAWLER_TOR_CONTROL_PASSWORD=dev-tor-password \
CRAWLER_ALLOW_ANY_HOST=true \
ADMIN_USERNAME=admin ADMIN_PASSWORD=admin \
cargo run --release
```
`backend/.env` already pins `DATABASE_URL`, `BIND_ADDRESS=0.0.0.0:18080`, `STORAGE_DIR`, `COOKIE_SECURE=false`. It also extends `RUST_LOG` to silence the noisy `chromiumoxide::conn` / `chromiumoxide::handler` WS-deserialize lines.
## Open items / known nuance
- **Vite generates `vite.config.ts.timestamp-*.mjs`** as a transient artifact when the dev server is running. It's not in `.gitignore`; consider adding `frontend/vite.config.ts.timestamp-*.mjs` to root `.gitignore` to stop it showing up in `git status`. Deleting the file is safe — Vite re-creates as needed.
- **Pre-bump drift.** Manifest versions have been hand-bumped ahead of commits a couple of times during this session (e.g. `0.53.0` sitting uncommitted in the working tree). When you land work on the active branches, double-check the `backend/Cargo.toml`, `backend/Cargo.lock`, `frontend/package.json` triplet are in lockstep before committing.
- **`feat/crawler-observability-and-reliability` is multi-feature.** It carries observability, reliability, runtime session control, dashboard, and now the session-expired-clear fix — landing it as one squash on main would be a sizeable diff. Consider whether to split it (e.g. split out the dashboard + dead-job requeue into its own slice) before merging.

2
backend/Cargo.lock generated

@@ -1470,7 +1470,7 @@ checksum = "c41e0c4fef86961ac6d6f8a82609f55f31b05e4fce149ac5710e439df7619ba4"
 [[package]]
 name = "mangalord"
-version = "0.45.1"
+version = "0.52.0"
 dependencies = [
  "anyhow",
  "argon2",


@@ -1,6 +1,6 @@
 [package]
 name = "mangalord"
-version = "0.45.1"
+version = "0.52.0"
 edition = "2021"
 default-run = "mangalord"
@@ -57,3 +57,13 @@ http-body-util = "0.1"
mime = "0.3" mime = "0.3"
futures-util = "0.3" futures-util = "0.3"
tokio = { version = "1", features = ["test-util"] } tokio = { version = "1", features = ["test-util"] }
# Trim debug builds: keep line numbers in panics / backtraces but drop the
# full DWARF info (variable-level inspection in gdb/lldb). With a sqlx +
# axum + tokio dep tree the default ("full") leaves backend/target on the
# order of tens of GiB; this typically cuts ~50-70% off that.
[profile.dev]
debug = "line-tables-only"
[profile.test]
debug = "line-tables-only"


@@ -0,0 +1,18 @@
-- Capture each chapter's position in the source site's chapter list so
-- the user-facing list can preserve site order: variants of the same
-- chapter number (e.g. "Ch.14 : PH" next to "Ch.14 : Official") stay
-- adjacent, and non-numeric entries like "notice. : Officials" land
-- where the site placed them rather than clustering at the top under
-- number = 0.
--
-- Lower source_index = closer to the top of the source DOM = newer
-- chapter on this site (it renders newest-first). The list query
-- reverses this with ORDER BY source_index DESC so the oldest chapter
-- appears first in our UI.
--
-- NULL is the sentinel for user-uploaded chapters (no source row) and
-- for crawled rows that pre-date this migration. The list query keeps
-- the existing (number, created_at) tiebreak via NULLS LAST so those
-- fall through to the prior behaviour until the next crawler tick
-- populates the column.
ALTER TABLE chapters ADD COLUMN source_index INTEGER;
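
The ordering described in the migration comment can be sketched in plain Rust. This is only an in-memory illustration of `ORDER BY source_index DESC NULLS LAST, number ASC, created_at ASC`; `ChapterRow`, `row`, and `sorted_titles` are hypothetical names, and the `i32` / `i64` column types are assumptions rather than the real schema.

```rust
use std::cmp::Ordering;

/// Hypothetical, trimmed-down chapter row: only the sort keys are modelled.
#[derive(Debug, Clone)]
struct ChapterRow {
    title: &'static str,
    /// `None` = user upload or pre-migration row (SQL NULL).
    source_index: Option<i32>,
    number: i32,
    /// Stand-in for a timestamp.
    created_at: i64,
}

fn row(title: &'static str, source_index: Option<i32>, number: i32, created_at: i64) -> ChapterRow {
    ChapterRow { title, source_index, number, created_at }
}

/// Mirrors `source_index DESC NULLS LAST, number ASC, created_at ASC`.
fn display_order(a: &ChapterRow, b: &ChapterRow) -> Ordering {
    let by_source = match (a.source_index, b.source_index) {
        (Some(x), Some(y)) => y.cmp(&x),      // DESC: higher index (older) first
        (Some(_), None) => Ordering::Less,    // NULLS LAST
        (None, Some(_)) => Ordering::Greater, // NULLS LAST
        (None, None) => Ordering::Equal,      // fall through to the tiebreak
    };
    by_source
        .then(a.number.cmp(&b.number))
        .then(a.created_at.cmp(&b.created_at))
}

fn sorted_titles(mut rows: Vec<ChapterRow>) -> Vec<&'static str> {
    rows.sort_by(display_order);
    rows.into_iter().map(|r| r.title).collect()
}
```

With a source list that renders newest-first, the non-numeric entry stays where the site placed it, the two `Ch.14` variants stay adjacent, and rows without a `source_index` sort last by `(number, created_at)`.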


@@ -5,6 +5,7 @@
//! `crate::auth::extractor::RequireAdmin`). //! `crate::auth::extractor::RequireAdmin`).
pub mod mangas; pub mod mangas;
pub mod resync;
pub mod system; pub mod system;
pub mod users; pub mod users;
@@ -16,5 +17,6 @@ pub fn routes() -> Router<AppState> {
Router::new() Router::new()
.merge(users::routes()) .merge(users::routes())
.merge(mangas::routes()) .merge(mangas::routes())
.merge(resync::routes())
.merge(system::routes()) .merge(system::routes())
} }


@@ -0,0 +1,176 @@
//! Admin-triggered force resync of a single manga's metadata + cover,
//! or a single chapter's content.
//!
//! Both endpoints are admin-only (`RequireAdmin`, cookie-only) and run
//! synchronously with the request — the response carries the refreshed
//! resource so the UI can swap it in without a follow-up GET. The work
//! itself is delegated to [`ResyncService`] (set on AppState by
//! `app::build` when the crawler daemon is enabled); when the daemon
//! is disabled, both handlers return 503.
use axum::extract::{Path, State};
use axum::routing::post;
use axum::{Json, Router};
use serde::Serialize;
use serde_json::json;
use uuid::Uuid;
use crate::app::AppState;
use crate::auth::extractor::RequireAdmin;
use crate::crawler::resync::{ChapterResyncOutcome, ResyncError};
use crate::domain::manga::MangaDetail;
use crate::domain::Chapter;
use crate::error::{AppError, AppResult};
use crate::repo;
use crate::repo::crawler::UpsertStatus;
pub fn routes() -> Router<AppState> {
Router::new()
.route("/admin/mangas/:id/resync", post(resync_manga))
.route("/admin/chapters/:id/resync", post(resync_chapter))
}
#[derive(Debug, Serialize)]
pub struct MangaResyncResponse {
pub manga: MangaDetail,
/// `"new" | "updated" | "unchanged"` — mirrors [`UpsertStatus`].
pub metadata_status: &'static str,
pub cover_fetched: bool,
}
#[derive(Debug, Serialize)]
pub struct ChapterResyncResponse {
pub chapter: Chapter,
/// `"fetched" | "skipped"` — whether new pages landed or the
/// service short-circuited (e.g. chapter already had pages and the
/// session was lost so force was downgraded).
pub outcome: &'static str,
/// Page count when `outcome == "fetched"`. `None` for `skipped`.
pub pages: Option<usize>,
}
async fn resync_manga(
State(state): State<AppState>,
admin: RequireAdmin,
Path(manga_id): Path<Uuid>,
) -> AppResult<Json<MangaResyncResponse>> {
if !repo::manga::exists(&state.db, manga_id).await? {
return Err(AppError::NotFound);
}
let resync = state
.resync
.as_ref()
.ok_or_else(|| AppError::ServiceUnavailable(
"crawler daemon is disabled; force resync unavailable".into(),
))?;
let outcome = resync.resync_manga(manga_id).await.map_err(map_resync_err)?;
// Audit the action with the actor + the resync outcome so an
// operator-of-operators can answer "who refetched this manga, and
// did the cover land?" from the log alone.
repo::admin_audit::insert(
&state.db,
admin.0.id,
"manga_resync",
"manga",
Some(manga_id),
json!({
"metadata_status": status_str(outcome.metadata_status),
"cover_fetched": outcome.cover_fetched,
}),
)
.await?;
let manga = repo::manga::get_detail(&state.db, manga_id).await?;
Ok(Json(MangaResyncResponse {
manga,
metadata_status: status_str(outcome.metadata_status),
cover_fetched: outcome.cover_fetched,
}))
}
async fn resync_chapter(
State(state): State<AppState>,
admin: RequireAdmin,
Path(chapter_id): Path<Uuid>,
) -> AppResult<Json<ChapterResyncResponse>> {
let resync = state
.resync
.as_ref()
.ok_or_else(|| AppError::ServiceUnavailable(
"crawler daemon is disabled; force resync unavailable".into(),
))?;
// Look up the manga the chapter belongs to so we can return the
// refreshed chapter row in the response and 404 for unknown ids.
let manga_id: Option<Uuid> =
sqlx::query_scalar("SELECT manga_id FROM chapters WHERE id = $1")
.bind(chapter_id)
.fetch_optional(&state.db)
.await?;
let Some(manga_id) = manga_id else {
return Err(AppError::NotFound);
};
let outcome = resync
.resync_chapter(chapter_id)
.await
.map_err(map_resync_err)?;
let (outcome_str, pages) = match &outcome {
ChapterResyncOutcome::Fetched { pages, .. } => ("fetched", Some(*pages)),
ChapterResyncOutcome::Skipped { .. } => ("skipped", None),
};
repo::admin_audit::insert(
&state.db,
admin.0.id,
"chapter_resync",
"chapter",
Some(chapter_id),
json!({
"outcome": outcome_str,
"pages": pages,
}),
)
.await?;
let chapter = repo::chapter::find_by_id_in_manga(&state.db, manga_id, chapter_id)
.await?
.ok_or(AppError::NotFound)?;
Ok(Json(ChapterResyncResponse {
chapter,
outcome: outcome_str,
pages,
}))
}
fn status_str(s: UpsertStatus) -> &'static str {
match s {
UpsertStatus::New => "new",
UpsertStatus::Updated => "updated",
UpsertStatus::Unchanged => "unchanged",
}
}
/// Map [`ResyncError`] (and the anyhow envelopes wrapping it) onto the
/// right [`AppError`]. Anything else surfaces as a generic 500 via the
/// `Other` arm — the operator sees the underlying anyhow chain in
/// server logs, the client sees a clean envelope.
fn map_resync_err(err: anyhow::Error) -> AppError {
if let Some(rerr) = err.downcast_ref::<ResyncError>() {
match rerr {
ResyncError::NoMangaSource => AppError::ValidationFailed {
message: "manga has no live crawler source — cannot resync".into(),
details: json!({ "manga": "no_source" }),
},
ResyncError::NoChapterSource => AppError::ValidationFailed {
message: "chapter has no live crawler source — cannot resync".into(),
details: json!({ "chapter": "no_source" }),
},
}
} else {
AppError::Other(err)
}
}
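
The downcast-then-map shape of `map_resync_err` can be tried without the project's dependencies. The sketch below swaps `anyhow::Error` for the standard library's `Box<dyn Error>`, and `ApiError` is a hypothetical stand-in for `AppError`. One difference to keep in mind: the std `downcast_ref` only inspects the outermost error, so it is an approximation of the handler above, not a copy of it.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for the crawler's typed error.
#[derive(Debug)]
enum ResyncError {
    NoMangaSource,
    NoChapterSource,
}

impl fmt::Display for ResyncError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ResyncError::NoMangaSource => write!(f, "manga has no live crawler source"),
            ResyncError::NoChapterSource => write!(f, "chapter has no live crawler source"),
        }
    }
}

impl Error for ResyncError {}

/// Hypothetical API-level error; the real `AppError` carries more detail.
#[derive(Debug, PartialEq)]
enum ApiError {
    ValidationFailed { field: &'static str },
    Other(String),
}

/// Typed errors become a validation failure; anything else stays opaque.
fn map_err(err: Box<dyn Error + 'static>) -> ApiError {
    match err.downcast_ref::<ResyncError>() {
        Some(ResyncError::NoMangaSource) => ApiError::ValidationFailed { field: "manga" },
        Some(ResyncError::NoChapterSource) => ApiError::ValidationFailed { field: "chapter" },
        None => ApiError::Other(err.to_string()),
    }
}
```

Keeping the fallback arm opaque means an unexpected failure is never mistaken for a client-side validation problem.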


@@ -42,18 +42,22 @@ pub fn routes() -> Router<AppState> {
.route("/auth/tokens/:id", delete(delete_token)) .route("/auth/tokens/:id", delete(delete_token))
} }
/// Public, unauthenticated. Exposes anonymous-relevant auth policy /// Public, unauthenticated. Exposes anonymous-relevant auth policy so
/// (currently just whether self-registration is open) so the frontend /// the frontend can render its login / register affordances correctly
/// can render its login / register affordances correctly without a /// without a probe request that would conflate "disabled" with
/// probe request that would conflate "disabled" with "rate-limited". /// "rate-limited". `self_register_enabled` is the *effective* value
/// (`allow_self_register && !private_mode`), so a private-mode
/// instance reports `false` even if the raw flag is on.
#[derive(Debug, Serialize)] #[derive(Debug, Serialize)]
pub struct AuthConfigResponse { pub struct AuthConfigResponse {
pub self_register_enabled: bool, pub self_register_enabled: bool,
pub private_mode: bool,
} }
async fn auth_config(State(state): State<AppState>) -> Json<AuthConfigResponse> { async fn auth_config(State(state): State<AppState>) -> Json<AuthConfigResponse> {
Json(AuthConfigResponse { Json(AuthConfigResponse {
self_register_enabled: state.auth.allow_self_register, self_register_enabled: state.auth.allow_self_register && !state.auth.private_mode,
private_mode: state.auth.private_mode,
}) })
} }
@@ -103,7 +107,10 @@ async fn register(
// disabled and enabled paths both consume a token, and disabled // disabled and enabled paths both consume a token, and disabled
// returns 403 instead of running argon2. // returns 403 instead of running argon2.
check_auth_rate_limit(&state, "register")?; check_auth_rate_limit(&state, "register")?;
if !state.auth.allow_self_register { // Private mode force-blocks self-registration regardless of
// ALLOW_SELF_REGISTER — operators of locked-down instances mint
// accounts via `POST /admin/users` instead.
if !state.auth.allow_self_register || state.auth.private_mode {
return Err(AppError::Forbidden); return Err(AppError::Forbidden);
} }
let username = input.username.trim(); let username = input.username.trim();
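
The two conditions in this file are meant to be exact complements: `/auth/config` reports `allow_self_register && !private_mode`, and `register` rejects on `!allow_self_register || private_mode`. A minimal sketch of that relationship, with `AuthFlags` as a hypothetical stand-in for the relevant part of `AuthConfig`:

```rust
/// Hypothetical subset of the auth configuration.
#[derive(Debug, Clone, Copy)]
struct AuthFlags {
    allow_self_register: bool,
    private_mode: bool,
}

impl AuthFlags {
    /// Effective value reported to the frontend.
    fn self_register_enabled(self) -> bool {
        self.allow_self_register && !self.private_mode
    }

    /// Condition under which the register handler answers 403.
    fn register_blocked(self) -> bool {
        !self.allow_self_register || self.private_mode
    }
}

/// Checks all four flag combinations: the advertised value and the
/// enforced value must never disagree.
fn flags_are_consistent() -> bool {
    [false, true].iter().all(|&allow| {
        [false, true].iter().all(|&private| {
            let f = AuthFlags { allow_self_register: allow, private_mode: private };
            f.self_register_enabled() == !f.register_blocked()
        })
    })
}
```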


@@ -3,8 +3,10 @@ use std::sync::atomic::AtomicBool;
use anyhow::Context; use anyhow::Context;
use async_trait::async_trait; use async_trait::async_trait;
use axum::extract::DefaultBodyLimit; use axum::extract::{DefaultBodyLimit, FromRequestParts, Request, State};
use axum::http::{HeaderName, HeaderValue, Method}; use axum::http::{HeaderName, HeaderValue, Method};
use axum::middleware::{self, Next};
use axum::response::Response;
use axum::Router; use axum::Router;
use sqlx::postgres::PgPoolOptions; use sqlx::postgres::PgPoolOptions;
use sqlx::PgPool; use sqlx::PgPool;
@@ -12,7 +14,9 @@ use tokio_util::sync::CancellationToken;
use tower_http::cors::{AllowOrigin, CorsLayer}; use tower_http::cors::{AllowOrigin, CorsLayer};
use tower_http::trace::TraceLayer; use tower_http::trace::TraceLayer;
use crate::auth::extractor::CurrentUser;
use crate::auth::rate_limit::AuthRateLimiter; use crate::auth::rate_limit::AuthRateLimiter;
use crate::error::AppError;
use crate::config::{AuthConfig, Config, CrawlerConfig, UploadConfig}; use crate::config::{AuthConfig, Config, CrawlerConfig, UploadConfig};
use crate::crawler::browser_manager::{self, BrowserManager}; use crate::crawler::browser_manager::{self, BrowserManager};
use crate::crawler::content::{self, SyncOutcome}; use crate::crawler::content::{self, SyncOutcome};
@@ -20,6 +24,7 @@ use crate::crawler::daemon::{self, ChapterDispatcher, DaemonConfig, MetadataPass
use crate::crawler::jobs::JobPayload; use crate::crawler::jobs::JobPayload;
use crate::crawler::pipeline::{self, MetadataStats}; use crate::crawler::pipeline::{self, MetadataStats};
use crate::crawler::rate_limit::HostRateLimiters; use crate::crawler::rate_limit::HostRateLimiters;
use crate::crawler::resync::{RealResyncService, ResyncService};
use crate::crawler::safety::DownloadAllowlist; use crate::crawler::safety::DownloadAllowlist;
use crate::crawler::session; use crate::crawler::session;
use crate::repo; use crate::repo;
@@ -35,6 +40,12 @@ pub struct AppState {
/// One instance per AppState so tests stay isolated across the /// One instance per AppState so tests stay isolated across the
/// same process. /// same process.
pub auth_limiter: Arc<AuthRateLimiter>, pub auth_limiter: Arc<AuthRateLimiter>,
/// Admin-triggered force resync. `None` when the crawler daemon
/// is disabled (`CRAWLER_DAEMON=false`); admin handlers gate on
/// `.is_some()` and return 503 otherwise. Set by [`build`] from the
/// same wiring that builds the daemon's chapter dispatcher, so a
/// force resync uses the daemon's BrowserManager + rate limiters.
pub resync: Option<Arc<dyn ResyncService>>,
} }
/// Bundle returned by [`build`]. The router is what `axum::serve` consumes; /// Bundle returned by [`build`]. The router is what `axum::serve` consumes;
@@ -69,11 +80,12 @@ pub async fn build(config: Config) -> anyhow::Result<AppHandle> {
let storage: Arc<dyn Storage> = Arc::new(LocalStorage::new(config.storage_dir.clone())); let storage: Arc<dyn Storage> = Arc::new(LocalStorage::new(config.storage_dir.clone()));
let daemon = if config.crawler.daemon_enabled { let (daemon, resync) = if config.crawler.daemon_enabled {
Some(spawn_crawler_daemon(db.clone(), Arc::clone(&storage), &config.crawler).await?) let spawned = spawn_crawler_daemon(db.clone(), Arc::clone(&storage), &config.crawler).await?;
(Some(spawned.handle), Some(spawned.resync))
} else { } else {
tracing::info!("crawler daemon disabled (CRAWLER_DAEMON=false)"); tracing::info!("crawler daemon disabled (CRAWLER_DAEMON=false)");
None (None, None)
}; };
let auth_limiter = Arc::new(AuthRateLimiter::new(config.auth.rate_limit)); let auth_limiter = Arc::new(AuthRateLimiter::new(config.auth.rate_limit));
@@ -83,16 +95,26 @@ pub async fn build(config: Config) -> anyhow::Result<AppHandle> {
auth: config.auth.clone(), auth: config.auth.clone(),
upload: config.upload.clone(), upload: config.upload.clone(),
auth_limiter, auth_limiter,
resync,
}; };
let router = router(state).layer(cors_layer(&config.cors_allowed_origins)); let router = router(state).layer(cors_layer(&config.cors_allowed_origins));
Ok(AppHandle { router, daemon }) Ok(AppHandle { router, daemon })
} }
/// Bundle returned by [`spawn_crawler_daemon`]. The handle owns the
/// daemon's tasks; `resync` is the operator-trigger service shared with
/// `AppState` so admin endpoints can call into the same browser /
/// rate-limit machinery.
struct SpawnedDaemon {
handle: daemon::DaemonHandle,
resync: Arc<dyn ResyncService>,
}
async fn spawn_crawler_daemon( async fn spawn_crawler_daemon(
db: PgPool, db: PgPool,
storage: Arc<dyn Storage>, storage: Arc<dyn Storage>,
cfg: &CrawlerConfig, cfg: &CrawlerConfig,
) -> anyhow::Result<daemon::DaemonHandle> { ) -> anyhow::Result<SpawnedDaemon> {
// Reqwest client with cookie jar pre-seeded so CDN image fetches // Reqwest client with cookie jar pre-seeded so CDN image fetches
// include PHPSESSID. Same shape as bin/crawler.rs main(). // include PHPSESSID. Same shape as bin/crawler.rs main().
let cookie_jar = Arc::new(reqwest::cookie::Jar::default()); let cookie_jar = Arc::new(reqwest::cookie::Jar::default());
@@ -123,29 +145,49 @@ async fn spawn_crawler_daemon(
} }
let rate = Arc::new(rate); let rate = Arc::new(rate);
let tor = crate::crawler::tor::TorController::from_parts(
cfg.tor_control_url.as_deref(),
cfg.tor_control_password.as_deref(),
cfg.tor_control_cookie_path.as_deref(),
)
.context("build TorController from CRAWLER_TOR_CONTROL_* env")?
.map(Arc::new);
if let Some(t) = &tor {
tracing::info!(?t, "TOR control configured; transient pages will trigger NEWNYM");
}
let tor_recircuit_max = cfg.tor_recircuit_max_attempts;
// Browser manager. on_launch re-injects PHPSESSID on every fresh // Browser manager. on_launch re-injects PHPSESSID on every fresh
// chromium spawn so an idle teardown followed by re-launch stays // chromium spawn so an idle teardown followed by re-launch stays
// authenticated without operator action. // authenticated without operator action.
let mut launch_opts = cfg.browser.clone(); let mut launch_opts = cfg.browser.clone();
if let Some(proxy) = &cfg.proxy { if let Some(proxy) = &cfg.proxy {
launch_opts.extra_args.push(format!("--proxy-server={proxy}")); let chromium_proxy = crate::crawler::url_utils::chromium_proxy_arg(proxy);
launch_opts.extra_args.push(format!("--proxy-server={chromium_proxy}"));
} }
let on_launch = match (&cfg.phpsessid, &cfg.cookie_domain, &cfg.start_url) { let on_launch = match (&cfg.phpsessid, &cfg.cookie_domain, &cfg.start_url) {
(Some(sid), Some(domain), Some(start_url)) => { (Some(sid), Some(domain), Some(start_url)) => {
let sid = sid.clone(); let sid = sid.clone();
let domain = domain.clone(); let domain = domain.clone();
let start_url = start_url.clone(); let start_url = start_url.clone();
let tor_for_launch = tor.as_ref().map(Arc::clone);
let on_launch: browser_manager::OnLaunch = Arc::new(move |browser| { let on_launch: browser_manager::OnLaunch = Arc::new(move |browser| {
let sid = sid.clone(); let sid = sid.clone();
let domain = domain.clone(); let domain = domain.clone();
let start_url = start_url.clone(); let start_url = start_url.clone();
let tor_for_launch = tor_for_launch.as_ref().map(Arc::clone);
Box::pin(async move { Box::pin(async move {
session::inject_phpsessid(&browser, &sid, &domain) session::inject_phpsessid(&browser, &sid, &domain)
.await .await
.context("on_launch: inject_phpsessid")?; .context("on_launch: inject_phpsessid")?;
session::verify_session(&browser, &start_url) session::verify_session_with_recircuit(
.await &browser,
.context("on_launch: verify_session")?; &start_url,
tor_for_launch.as_deref(),
tor_recircuit_max,
)
.await
.context("on_launch: verify_session")?;
Ok(()) Ok(())
}) })
}); });
@@ -165,13 +207,26 @@ async fn spawn_crawler_daemon(
http: http.clone(), http: http.clone(),
rate: Arc::clone(&rate), rate: Arc::clone(&rate),
start_url: url.clone(), start_url: url.clone(),
manga_limit: cfg.manga_limit,
download_allowlist: cfg.download_allowlist.clone(), download_allowlist: cfg.download_allowlist.clone(),
max_image_bytes: cfg.max_image_bytes, max_image_bytes: cfg.max_image_bytes,
tor: tor.as_ref().map(Arc::clone),
}); });
m m
}); });
let dispatcher: Arc<dyn ChapterDispatcher> = Arc::new(RealChapterDispatcher { let dispatcher: Arc<dyn ChapterDispatcher> = Arc::new(RealChapterDispatcher {
browser_manager: Arc::clone(&browser_manager),
db: db.clone(),
storage: Arc::clone(&storage),
http: http.clone(),
rate: Arc::clone(&rate),
download_allowlist: cfg.download_allowlist.clone(),
max_image_bytes: cfg.max_image_bytes,
tor: tor.as_ref().map(Arc::clone),
});
let resync: Arc<dyn ResyncService> = Arc::new(RealResyncService {
browser_manager: Arc::clone(&browser_manager), browser_manager: Arc::clone(&browser_manager),
db: db.clone(), db: db.clone(),
storage: Arc::clone(&storage), storage: Arc::clone(&storage),
@@ -179,6 +234,7 @@ async fn spawn_crawler_daemon(
rate: Arc::clone(&rate), rate: Arc::clone(&rate),
download_allowlist: cfg.download_allowlist.clone(), download_allowlist: cfg.download_allowlist.clone(),
max_image_bytes: cfg.max_image_bytes, max_image_bytes: cfg.max_image_bytes,
tor: tor.as_ref().map(Arc::clone),
}); });
// Shared cancellation: daemon shutdown cancels the BrowserManager's // Shared cancellation: daemon shutdown cancels the BrowserManager's
@@ -215,7 +271,10 @@ async fn spawn_crawler_daemon(
}, },
); );
Ok(daemon_handle) Ok(SpawnedDaemon {
handle: daemon_handle,
resync,
})
} }
// Real impls of the daemon traits, owning the browser manager + I/O. Kept // Real impls of the daemon traits, owning the browser manager + I/O. Kept
@@ -230,8 +289,10 @@ struct RealMetadataPass {
http: reqwest::Client, http: reqwest::Client,
rate: Arc<HostRateLimiters>, rate: Arc<HostRateLimiters>,
start_url: String, start_url: String,
manga_limit: usize,
download_allowlist: DownloadAllowlist, download_allowlist: DownloadAllowlist,
max_image_bytes: usize, max_image_bytes: usize,
tor: Option<Arc<crate::crawler::tor::TorController>>,
} }
#[async_trait] #[async_trait]
@@ -244,10 +305,11 @@ impl MetadataPass for RealMetadataPass {
&self.http, &self.http,
&self.rate, &self.rate,
&self.start_url, &self.start_url,
0, self.manga_limit,
false, false,
&self.download_allowlist, &self.download_allowlist,
self.max_image_bytes, self.max_image_bytes,
self.tor.as_deref(),
) )
.await; .await;
if let Err(e) = &result { if let Err(e) = &result {
@@ -255,6 +317,36 @@ impl MetadataPass for RealMetadataPass {
self.browser_manager.invalidate().await; self.browser_manager.invalidate().await;
} }
} }
// Cover backfill follows the metadata pass even when the pass
// errored — the early-stop walk can complete its work and bail
// late, and a transient browser failure shouldn't cancel the
// residual cover backlog. The backfill has its own per-call cap
// so a runaway error stream can't monopolise the tick.
match pipeline::backfill_missing_covers(
&self.browser_manager,
&self.db,
self.storage.as_ref(),
&self.http,
&self.rate,
pipeline::COVER_BACKFILL_DEFAULT_MAX,
&self.download_allowlist,
self.max_image_bytes,
self.tor.as_deref(),
)
.await
{
Ok(stats) => {
if stats.considered > 0 {
tracing::info!(?stats, "cover backfill complete");
}
}
Err(e) => {
tracing::warn!(error = ?e, "cover backfill failed");
if crate::crawler::nav::anyhow_looks_browser_dead(&e) {
self.browser_manager.invalidate().await;
}
}
}
result result
} }
} }
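
The control flow of the hunk above (run the follow-up even when the primary pass failed, log the follow-up's outcome, and still return the primary result) can be sketched synchronously. `run_tick`, its closures, and the log strings are hypothetical; the real code is async and logs through `tracing`.

```rust
/// Runs `primary`, then always runs `backfill`, and returns the
/// primary result untouched together with the log lines produced.
fn run_tick<P, B>(primary: P, backfill: B) -> (Result<u32, String>, Vec<String>)
where
    P: FnOnce() -> Result<u32, String>,
    B: FnOnce() -> Result<u32, String>,
{
    let mut log = Vec::new();

    let result = primary();
    if let Err(e) = &result {
        log.push(format!("primary failed: {e}"));
    }

    // The backfill is not gated on `result`: a late failure in the
    // primary pass should not cancel the residual backlog.
    match backfill() {
        Ok(considered) if considered > 0 => log.push(format!("backfill considered {considered}")),
        Ok(_) => {} // nothing to do, stay quiet
        Err(e) => log.push(format!("backfill failed: {e}")),
    }

    (result, log)
}
```

A backfill failure is only logged, so it never masks or replaces the error from the primary pass.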
@@ -267,6 +359,7 @@ struct RealChapterDispatcher {
rate: Arc<HostRateLimiters>, rate: Arc<HostRateLimiters>,
download_allowlist: DownloadAllowlist, download_allowlist: DownloadAllowlist,
max_image_bytes: usize, max_image_bytes: usize,
tor: Option<Arc<crate::crawler::tor::TorController>>,
} }
#[async_trait] #[async_trait]
@@ -298,6 +391,7 @@ impl ChapterDispatcher for RealChapterDispatcher {
false, false,
&self.download_allowlist, &self.download_allowlist,
self.max_image_bytes, self.max_image_bytes,
self.tor.as_deref(),
) )
.await; .await;
drop(lease); drop(lease);
@@ -325,11 +419,62 @@ pub fn router(state: AppState) -> Router {
let max_request_bytes = state.upload.max_request_bytes; let max_request_bytes = state.upload.max_request_bytes;
Router::new() Router::new()
.nest("/api/v1", crate::api::routes()) .nest("/api/v1", crate::api::routes())
.layer(middleware::from_fn_with_state(
state.clone(),
private_mode_guard,
))
.layer(DefaultBodyLimit::max(max_request_bytes)) .layer(DefaultBodyLimit::max(max_request_bytes))
.with_state(state) .with_state(state)
.layer(TraceLayer::new_for_http()) .layer(TraceLayer::new_for_http())
} }
/// Paths reachable anonymously even when `PRIVATE_MODE=true`. Login and
/// logout are needed for the auth flow itself; `/health` is reserved
/// for load-balancer probes; `/auth/config` lets the frontend decide
/// whether to render the login form or its anonymous alternatives;
/// `/auth/register` is exempted from the gate so the handler can
/// return its informative `registration_disabled` 403 (the same code
/// public-mode deployments use when `ALLOW_SELF_REGISTER=false`) —
/// the handler itself force-blocks the request body in private mode,
/// so no account ever gets created here. Everything else demands a
/// valid session cookie or bearer token.
fn is_public_in_private_mode(path: &str) -> bool {
matches!(
path,
"/api/v1/health"
| "/api/v1/auth/config"
| "/api/v1/auth/login"
| "/api/v1/auth/logout"
| "/api/v1/auth/register"
)
}
/// Site-wide auth gate for `PRIVATE_MODE=true`. With the flag off this
/// is a no-op pass-through, so public deployments take no extra DB
/// hit. With it on, the guard reuses [`CurrentUser`] — the same
/// session-cookie-then-bearer-token logic the per-handler extractor
/// uses — so the two paths can never drift.
async fn private_mode_guard(
State(state): State<AppState>,
req: Request,
next: Next,
) -> Result<Response, AppError> {
if !state.auth.private_mode {
return Ok(next.run(req).await);
}
if is_public_in_private_mode(req.uri().path()) {
return Ok(next.run(req).await);
}
let (mut parts, body) = req.into_parts();
match CurrentUser::from_request_parts(&mut parts, &state).await {
Ok(_) => {
let req = Request::from_parts(parts, body);
Ok(next.run(req).await)
}
Err(_) => Err(AppError::Unauthenticated),
}
}
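
The decision made by `private_mode_guard` reduces to three checks. The sketch below models it as a pure function so the allowlist can be exercised without axum. `Gate` and `gate` are hypothetical names, `/api/v1/mangas` is an assumed example path, and `authenticated` stands in for a successful `CurrentUser` extraction (the real guard only performs that lookup when the first two checks do not already pass the request).

```rust
#[derive(Debug, PartialEq)]
enum Gate {
    Pass,
    Unauthenticated,
}

/// Same allowlist as the guard above.
fn is_public_in_private_mode(path: &str) -> bool {
    matches!(
        path,
        "/api/v1/health"
            | "/api/v1/auth/config"
            | "/api/v1/auth/login"
            | "/api/v1/auth/logout"
            | "/api/v1/auth/register"
    )
}

/// Pure model of the guard: flag off, allowlisted path, or a valid
/// credential lets the request through; anything else is a 401.
fn gate(private_mode: bool, path: &str, authenticated: bool) -> Gate {
    if !private_mode || is_public_in_private_mode(path) || authenticated {
        Gate::Pass
    } else {
        Gate::Unauthenticated
    }
}
```

Because the match is on exact strings, a sub-path or a trailing-slash variant of an allowlisted route is not exempt in this sketch.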
pub(crate) fn cors_layer(allowed_origins: &[String]) -> CorsLayer { pub(crate) fn cors_layer(allowed_origins: &[String]) -> CorsLayer {
if allowed_origins.is_empty() { if allowed_origins.is_empty() {
// Same-origin only — no CORS headers emitted. // Same-origin only — no CORS headers emitted.


@@ -78,6 +78,21 @@ async fn main() -> anyhow::Result<()> {
let proxy_url = std::env::var("CRAWLER_PROXY") let proxy_url = std::env::var("CRAWLER_PROXY")
.ok() .ok()
.filter(|s| !s.trim().is_empty()); .filter(|s| !s.trim().is_empty());
let tor_control_url = std::env::var("CRAWLER_TOR_CONTROL_URL")
.ok()
.filter(|s| !s.trim().is_empty());
let tor_control_password = std::env::var("CRAWLER_TOR_CONTROL_PASSWORD")
.ok()
.filter(|s| !s.trim().is_empty());
let tor_control_cookie_path = std::env::var("CRAWLER_TOR_CONTROL_COOKIE_PATH")
.ok()
.filter(|s| !s.trim().is_empty())
.map(std::path::PathBuf::from);
let tor_recircuit_max_attempts: u32 = std::env::var("CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS")
.ok()
.and_then(|s| s.parse().ok())
.unwrap_or(3)
.max(1);
let keep_browser_open = env_bool("CRAWLER_KEEP_BROWSER_OPEN", false); let keep_browser_open = env_bool("CRAWLER_KEEP_BROWSER_OPEN", false);
let db = PgPoolOptions::new() let db = PgPoolOptions::new()
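
The env handling added in this hunk follows two small patterns: blank values count as unset, and the attempt count falls back to 3 on a missing or unparsable value and is then clamped to at least 1. A sketch of both as pure functions, taking the raw value as an argument instead of reading the process environment; the function names are hypothetical.

```rust
/// Treats an empty or whitespace-only value as "not set".
fn non_empty(raw: Option<String>) -> Option<String> {
    raw.filter(|s| !s.trim().is_empty())
}

/// Missing or unparsable input falls back to 3; the result is never 0.
fn recircuit_attempts(raw: Option<&str>) -> u32 {
    raw.and_then(|s| s.parse::<u32>().ok())
        .unwrap_or(3)
        .max(1)
}
```

Note that `str::parse` does not trim, so a padded value such as `" 4 "` falls back to the default rather than parsing as 4.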
@@ -112,7 +127,8 @@ async fn main() -> anyhow::Result<()> {
let mut options = LaunchOptions::from_env(); let mut options = LaunchOptions::from_env();
if let Some(proxy) = &proxy_url { if let Some(proxy) = &proxy_url {
options.extra_args.push(format!("--proxy-server={proxy}")); let chromium_proxy = mangalord::crawler::url_utils::chromium_proxy_arg(proxy);
options.extra_args.push(format!("--proxy-server={chromium_proxy}"));
} }
let keep_open = match (keep_browser_open, options.mode) { let keep_open = match (keep_browser_open, options.mode) {
(true, BrowserMode::Headed) => true, (true, BrowserMode::Headed) => true,
@@ -144,6 +160,17 @@ async fn main() -> anyhow::Result<()> {
"starting crawler" "starting crawler"
); );
let tor = mangalord::crawler::tor::TorController::from_parts(
tor_control_url.as_deref(),
tor_control_password.as_deref(),
tor_control_cookie_path.as_deref(),
)
.context("build TorController from CRAWLER_TOR_CONTROL_* env")?
.map(Arc::new);
if let Some(t) = &tor {
tracing::info!(?t, "TOR control configured");
}
// BrowserManager with idle_timeout = ZERO so the CLI keeps Chromium // BrowserManager with idle_timeout = ZERO so the CLI keeps Chromium
// alive for the entire run — same lifecycle as the old direct // alive for the entire run — same lifecycle as the old direct
// `browser::launch()` flow. on_launch re-injects PHPSESSID + runs the // `browser::launch()` flow. on_launch re-injects PHPSESSID + runs the
@@ -153,17 +180,24 @@ async fn main() -> anyhow::Result<()> {
let sid = sid.clone(); let sid = sid.clone();
let domain = domain.clone(); let domain = domain.clone();
let start_url_clone = start_url.clone(); let start_url_clone = start_url.clone();
let tor_for_launch = tor.as_ref().map(Arc::clone);
Arc::new(move |browser| { Arc::new(move |browser| {
let sid = sid.clone(); let sid = sid.clone();
let domain = domain.clone(); let domain = domain.clone();
let start_url = start_url_clone.clone(); let start_url = start_url_clone.clone();
let tor_for_launch = tor_for_launch.as_ref().map(Arc::clone);
Box::pin(async move { Box::pin(async move {
session::inject_phpsessid(&browser, &sid, &domain) session::inject_phpsessid(&browser, &sid, &domain)
.await .await
.context("inject_phpsessid")?; .context("inject_phpsessid")?;
session::verify_session(&browser, &start_url) session::verify_session_with_recircuit(
.await &browser,
.context("verify_session")?; &start_url,
tor_for_launch.as_deref(),
tor_recircuit_max_attempts,
)
.await
.context("verify_session")?;
Ok(()) Ok(())
}) })
}) })
@@ -187,6 +221,7 @@ async fn main() -> anyhow::Result<()> {
skip_chapter_content || !session_ready, skip_chapter_content || !session_ready,
chapter_workers, chapter_workers,
force_refetch_chapters, force_refetch_chapters,
tor.clone(),
) )
.await; .await;
@@ -216,6 +251,7 @@ async fn run(
skip_chapter_content: bool, skip_chapter_content: bool,
chapter_workers: usize, chapter_workers: usize,
force_refetch_chapters: bool, force_refetch_chapters: bool,
tor: Option<Arc<mangalord::crawler::tor::TorController>>,
) -> anyhow::Result<()> { ) -> anyhow::Result<()> {
let mut rate = HostRateLimiters::new(Duration::from_millis(rate_ms)); let mut rate = HostRateLimiters::new(Duration::from_millis(rate_ms));
if let Some(host) = cdn_host { if let Some(host) = cdn_host {
@@ -267,6 +303,7 @@ async fn run(
skip_chapters, skip_chapters,
allowlist.as_ref(), allowlist.as_ref(),
max_image_bytes, max_image_bytes,
tor.as_deref(),
) )
.await?; .await?;
tracing::info!(?stats, "metadata pass complete"); tracing::info!(?stats, "metadata pass complete");
@@ -283,6 +320,7 @@ async fn run(
force_refetch_chapters, force_refetch_chapters,
Arc::clone(&allowlist), Arc::clone(&allowlist),
max_image_bytes, max_image_bytes,
tor.clone(),
) )
.await?; .await?;
} }
@@ -308,6 +346,7 @@ async fn sync_bookmarked_chapter_content(
force_refetch: bool, force_refetch: bool,
allowlist: Arc<mangalord::crawler::safety::DownloadAllowlist>, allowlist: Arc<mangalord::crawler::safety::DownloadAllowlist>,
max_image_bytes: usize, max_image_bytes: usize,
tor: Option<Arc<mangalord::crawler::tor::TorController>>,
) -> anyhow::Result<()> { ) -> anyhow::Result<()> {
let pending: Vec<(Uuid, Uuid, String)> = sqlx::query_as( let pending: Vec<(Uuid, Uuid, String)> = sqlx::query_as(
r#" r#"
@@ -345,6 +384,7 @@ async fn sync_bookmarked_chapter_content(
let rate = Arc::clone(&rate); let rate = Arc::clone(&rate);
let manager = Arc::clone(&manager); let manager = Arc::clone(&manager);
let allowlist = Arc::clone(&allowlist); let allowlist = Arc::clone(&allowlist);
let tor = tor.clone();
let stats = &stats; let stats = &stats;
async move { async move {
if session_expired.load(std::sync::atomic::Ordering::Relaxed) { if session_expired.load(std::sync::atomic::Ordering::Relaxed) {
@@ -371,6 +411,7 @@ async fn sync_bookmarked_chapter_content(
force_refetch, force_refetch,
allowlist.as_ref(), allowlist.as_ref(),
max_image_bytes, max_image_bytes,
tor.as_deref(),
) )
.await; .await;
drop(lease); drop(lease);


@@ -19,6 +19,14 @@ pub struct AuthConfig {
/// `POST /admin/users`. Defaults to `true` (open registration) /// `POST /admin/users`. Defaults to `true` (open registration)
/// for backward compatibility. /// for backward compatibility.
pub allow_self_register: bool, pub allow_self_register: bool,
/// When `true`, every API path except a small allowlist
/// (`/health`, `/auth/config`, `/auth/login`, `/auth/logout`,
/// `/auth/register`) requires a valid session cookie or bearer
/// token; anonymous reads are rejected with 401. `/auth/register`
/// stays reachable only so its handler can answer 403:
/// self-registration is force-disabled regardless of
/// [`Self::allow_self_register`], so a private instance is locked
/// down with a single switch.
/// Defaults to `false` (current public behaviour).
pub private_mode: bool,
} }
impl Default for AuthConfig { impl Default for AuthConfig {
@@ -33,6 +41,7 @@ impl Default for AuthConfig {
// defaults. // defaults.
rate_limit: crate::auth::rate_limit::RateLimitConfig::default(), rate_limit: crate::auth::rate_limit::RateLimitConfig::default(),
allow_self_register: true, allow_self_register: true,
private_mode: false,
} }
} }
} }
@@ -97,6 +106,20 @@ pub struct CrawlerConfig {
pub cookie_domain: Option<String>, pub cookie_domain: Option<String>,
pub user_agent: Option<String>, pub user_agent: Option<String>,
pub proxy: Option<String>, pub proxy: Option<String>,
/// `tcp://host:port`, `host:port`, or bare `host` (default port
/// 9051). When `None`, TOR-recircuit-on-transient is disabled and
/// the crawler behaves identically to pre-TOR releases.
pub tor_control_url: Option<String>,
/// HashedControlPassword auth. Used only when
/// `tor_control_cookie_path` is `None`.
pub tor_control_password: Option<String>,
/// Cookie-file auth path (e.g.
/// `/var/lib/tor/control_auth_cookie`). Takes precedence over
/// password when both are set.
pub tor_control_cookie_path: Option<PathBuf>,
/// Maximum NEWNYM-and-retry cycles per recircuit-eligible failure.
/// Defaults to 3.
pub tor_recircuit_max_attempts: u32,
pub browser: LaunchOptions,
/// Hosts the crawler is allowed to download images / covers from.
/// Always seeded with the host of `start_url` and (when set) the
@@ -105,6 +128,10 @@ pub struct CrawlerConfig {
pub download_allowlist: DownloadAllowlist,
/// Hard upper bound on a single image download. Defaults to 32 MiB.
pub max_image_bytes: usize,
/// Max manga detail fetches per metadata pass. `0` means no cap
/// (full sweep up to the source's own bound). Sourced from
/// `CRAWLER_LIMIT`, mirroring the CLI binary.
pub manga_limit: usize,
}
impl Default for CrawlerConfig {
@@ -124,9 +151,14 @@ impl Default for CrawlerConfig {
cookie_domain: None,
user_agent: None,
proxy: None,
+ tor_control_url: None,
+ tor_control_password: None,
+ tor_control_cookie_path: None,
+ tor_recircuit_max_attempts: 3,
browser: LaunchOptions::headless(),
download_allowlist: DownloadAllowlist::new(),
max_image_bytes: DEFAULT_MAX_IMAGE_BYTES,
+ manga_limit: 0,
}
}
}
@@ -158,6 +190,7 @@ impl Config {
) as u32,
},
allow_self_register: env_bool("ALLOW_SELF_REGISTER", true),
+ private_mode: env_bool("PRIVATE_MODE", false),
},
upload: UploadConfig {
max_request_bytes: env_usize("MAX_REQUEST_BYTES", 200 * 1024 * 1024),
@@ -234,9 +267,22 @@ impl CrawlerConfig {
proxy: std::env::var("CRAWLER_PROXY")
.ok()
.filter(|s| !s.trim().is_empty()),
tor_control_url: std::env::var("CRAWLER_TOR_CONTROL_URL")
.ok()
.filter(|s| !s.trim().is_empty()),
tor_control_password: std::env::var("CRAWLER_TOR_CONTROL_PASSWORD")
.ok()
.filter(|s| !s.trim().is_empty()),
tor_control_cookie_path: std::env::var("CRAWLER_TOR_CONTROL_COOKIE_PATH")
.ok()
.filter(|s| !s.trim().is_empty())
.map(PathBuf::from),
tor_recircuit_max_attempts: env_u64("CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS", 3)
.max(1) as u32,
browser: LaunchOptions::from_env(),
download_allowlist,
max_image_bytes: env_usize("CRAWLER_MAX_IMAGE_BYTES", DEFAULT_MAX_IMAGE_BYTES),
+ manga_limit: env_usize("CRAWLER_LIMIT", 0),
})
}
}
@@ -310,3 +356,64 @@ fn env_usize(name: &str, default: usize) -> usize {
.unwrap_or(default)
}
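The clamp applied to `CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS` above can be looked at in isolation. This is a minimal sketch, assuming `env_u64` falls back to its default on a missing or unparsable value (its body is not part of this diff). `parse_or_default` and `recircuit_attempts` are hypothetical names, and they take the raw value as an argument instead of reading the process environment, which keeps the example free of the env races the tests guard against.

```rust
/// Hypothetical stand-in for an env helper: parse the raw value if it is
/// present and numeric, otherwise return `default`.
fn parse_or_default(raw: Option<&str>, default: u64) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(default)
}

/// Mirrors the expression `env_u64(.., 3).max(1) as u32` from the diff:
/// default of 3, floor of 1 so the retry loop always runs at least once.
fn recircuit_attempts(raw: Option<&str>) -> u32 {
    parse_or_default(raw, 3).max(1) as u32
}
```

Because the `as u32` cast runs after the clamp, a value larger than `u32::MAX` would still truncate; converting with `u32::try_from` before clamping would be one way to avoid that.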
#[cfg(test)]
mod tests {
use super::*;
use std::sync::Mutex;
// Serialise env-touching tests so concurrent cargo-test threads don't
// race on the process-global env. Re-acquire on poison since a
// panicking test still leaves the env in a consistent state for us
// (we set/unset within each guard region).
static ENV_GUARD: Mutex<()> = Mutex::new(());
#[test]
fn crawler_limit_env_populates_manga_limit() {
let _g = ENV_GUARD.lock().unwrap_or_else(|p| p.into_inner());
std::env::set_var("CRAWLER_LIMIT", "96");
let cfg = CrawlerConfig::from_env().expect("from_env");
std::env::remove_var("CRAWLER_LIMIT");
assert_eq!(cfg.manga_limit, 96);
}
#[test]
fn crawler_limit_unset_defaults_to_zero() {
let _g = ENV_GUARD.lock().unwrap_or_else(|p| p.into_inner());
std::env::remove_var("CRAWLER_LIMIT");
let cfg = CrawlerConfig::from_env().expect("from_env");
assert_eq!(cfg.manga_limit, 0);
}
#[test]
fn private_mode_env_parses_true() {
let _g = ENV_GUARD.lock().unwrap_or_else(|p| p.into_inner());
std::env::set_var("PRIVATE_MODE", "true");
std::env::set_var("DATABASE_URL", "postgres://test");
let cfg = Config::from_env().expect("from_env");
std::env::remove_var("PRIVATE_MODE");
std::env::remove_var("DATABASE_URL");
assert!(cfg.auth.private_mode);
}
#[test]
fn private_mode_env_parses_false() {
let _g = ENV_GUARD.lock().unwrap_or_else(|p| p.into_inner());
std::env::set_var("PRIVATE_MODE", "false");
std::env::set_var("DATABASE_URL", "postgres://test");
let cfg = Config::from_env().expect("from_env");
std::env::remove_var("PRIVATE_MODE");
std::env::remove_var("DATABASE_URL");
assert!(!cfg.auth.private_mode);
}
#[test]
fn private_mode_defaults_to_false() {
let _g = ENV_GUARD.lock().unwrap_or_else(|p| p.into_inner());
std::env::remove_var("PRIVATE_MODE");
std::env::set_var("DATABASE_URL", "postgres://test");
let cfg = Config::from_env().expect("from_env");
std::env::remove_var("DATABASE_URL");
assert!(!cfg.auth.private_mode);
}
}
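The doc comments on `tor_control_url` and the two auth fields describe three accepted address forms and a precedence rule. The sketch below shows one way those rules could be implemented. It is an illustration only: `parse_control_addr`, `ControlAuth`, and `pick_auth` are hypothetical names, not the crate's `tor` module API, and IPv6 literals are deliberately not handled.

```rust
use std::path::PathBuf;

/// Default TOR control port, as stated in the field's doc comment.
const DEFAULT_TOR_CONTROL_PORT: u16 = 9051;

/// Hypothetical parser for `tcp://host:port`, `host:port`, or bare `host`.
fn parse_control_addr(raw: &str) -> Result<(String, u16), String> {
    let trimmed = raw.trim();
    // The scheme prefix is optional.
    let rest = trimmed.strip_prefix("tcp://").unwrap_or(trimmed);
    if rest.is_empty() {
        return Err("empty control address".to_string());
    }
    match rest.rsplit_once(':') {
        Some((host, port)) => {
            if host.is_empty() {
                return Err(format!("missing host in {raw:?}"));
            }
            let port: u16 = port
                .parse()
                .map_err(|_| format!("invalid port in {raw:?}"))?;
            Ok((host.to_string(), port))
        }
        // Bare host: fall back to the default control port.
        None => Ok((rest.to_string(), DEFAULT_TOR_CONTROL_PORT)),
    }
}

/// Hypothetical auth selection: cookie file wins when both are set.
#[derive(Debug, PartialEq)]
enum ControlAuth {
    Cookie(PathBuf),
    Password(String),
    None,
}

fn pick_auth(cookie: Option<PathBuf>, password: Option<String>) -> ControlAuth {
    match (cookie, password) {
        (Some(path), _) => ControlAuth::Cookie(path),
        (None, Some(pw)) => ControlAuth::Password(pw),
        (None, None) => ControlAuth::None,
    }
}
```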

@@ -73,39 +73,36 @@ pub enum SyncOutcome {
SessionExpired,
}
- /// Fetch all images for one chapter and persist them atomically. On
- /// any error after the first storage put, the DB transaction rolls
- /// back so the chapter stays at `page_count = 0` and is retried on the
- /// next run. Bytes already written to storage become orphans; a future
- /// reaper sweeps them.
- #[allow(clippy::too_many_arguments)]
- pub async fn sync_chapter_content(
- browser: &chromiumoxide::Browser,
- db: &PgPool,
- storage: &dyn Storage,
- http: &reqwest::Client,
- rate: &HostRateLimiters,
- chapter_id: Uuid,
- manga_id: Uuid,
- source_url: &str,
- force_refetch: bool,
- allowlist: &DownloadAllowlist,
- max_image_bytes: usize,
- ) -> anyhow::Result<SyncOutcome> {
- // Skip if already fetched, unless caller explicitly forces.
- if !force_refetch {
- let (page_count,): (i32,) =
- sqlx::query_as("SELECT page_count FROM chapters WHERE id = $1")
- .bind(chapter_id)
- .fetch_one(db)
- .await
- .context("read chapter page_count")?;
- if page_count > 0 {
- return Ok(SyncOutcome::Skipped);
- }
- }
- // Nav to chapter page (rate-limited per host).
+ /// Per-chapter max fetch attempts when TOR is configured. `N = 3` means
+ /// up to 3 total page fetches with 2 NEWNYM signals between them. When
+ /// TOR is not configured the effective budget collapses to 1 (single
+ /// attempt, no retry, no recircuit; bit-for-bit pre-TOR behavior).
+ const CHAPTER_RECIRCUIT_MAX_ATTEMPTS: u32 = 3;
+ /// Outcome of [`fetch_chapter_html_with_recircuit`]. `Ok` carries the
+ /// final reader HTML; the other two map to `sync_chapter_content`'s
+ /// existing failure modes.
+ #[derive(Debug)]
+ enum ChapterFetchOutcome {
+ Ok(String),
+ /// `ChapterProbe::Unauthenticated` after exhausting recircuit
+ /// budget (or with budget=0). Caller returns
+ /// `SyncOutcome::SessionExpired`.
+ SessionExpired,
+ /// `ChapterProbe::Transient` after exhausting recircuit budget
+ /// (or with budget=0). Caller bails so the dispatcher does
+ /// exponential backoff.
+ PersistentTransient,
+ }
+ /// Single rate-limited Chromium navigation to the chapter URL,
+ /// returning the page HTML. Extracted from `sync_chapter_content` so
+ /// the recircuit loop can call it once per attempt.
+ async fn fetch_chapter_html_once(
+ browser: &chromiumoxide::Browser,
+ rate: &HostRateLimiters,
+ source_url: &str,
+ ) -> anyhow::Result<String> {
rate.wait_for(source_url).await?;
let page = browser
.new_page(source_url)
@@ -124,28 +121,135 @@ pub async fn sync_chapter_content(
crate::crawler::nav::SELECTOR_TIMEOUT,
)
.await;
let html = page.content().await.context("read chapter html")?;
page.close().await.ok();
- // Three-way session classification: distinguishes a transient
- // hiccup (broken-page body or logged-in-but-no-reader) from a
- // genuine PHPSESSID expiry (no reader and no avatar widget). The
- // earlier binary `#avatar_menu` check conflated both and froze
- // every worker on a layout shift.
- match session::classify_chapter_probe(&html) {
- ChapterProbe::Unauthenticated => return Ok(SyncOutcome::SessionExpired),
- ChapterProbe::Transient => {
+ Ok(html)
+ }
+ /// Pure-over-IO loop: fetch + classify, up to `max_attempts` total
+ /// fetches. Between attempts, `recircuit` is invoked (a no-op when
+ /// TOR isn't configured). `max_attempts = 1` collapses to the
+ /// original single-shot behavior: `Unauthenticated` →
+ /// `SessionExpired`, `Transient` → `PersistentTransient` on the first
+ /// hit, no recircuit.
+ ///
+ /// Semantics match [`crate::crawler::detect::retry_on_transient`] and
/// [`run_session_probe_loop`]: `N` is **total attempts including the
/// first**, so `N = 3` means 3 fetches and up to 2 NEWNYM calls.
/// `Unauthenticated` and `Transient` share the budget — the loop
/// doesn't distinguish, so a sequence like Transient → Unauth → Ok
/// counts as 3 attempts.
async fn fetch_chapter_html_with_recircuit<F, Fut, R, RFut>(
mut fetch: F,
mut recircuit: R,
max_attempts: u32,
source_url_for_msg: &str,
) -> anyhow::Result<ChapterFetchOutcome>
where
F: FnMut() -> Fut,
Fut: std::future::Future<Output = anyhow::Result<String>>,
R: FnMut() -> RFut,
RFut: std::future::Future<Output = ()>,
{
debug_assert!(max_attempts >= 1, "max_attempts must be at least 1");
let mut attempt = 0u32;
loop {
attempt += 1;
let html = fetch().await?;
match session::classify_chapter_probe(&html) {
ChapterProbe::Ok => return Ok(ChapterFetchOutcome::Ok(html)),
ChapterProbe::Unauthenticated => {
if attempt >= max_attempts {
return Ok(ChapterFetchOutcome::SessionExpired);
}
tracing::warn!(
attempt,
max = max_attempts,
url = source_url_for_msg,
"chapter probe Unauthenticated; signaling TOR NEWNYM and retrying"
);
recircuit().await;
}
ChapterProbe::Transient => {
if attempt >= max_attempts {
return Ok(ChapterFetchOutcome::PersistentTransient);
}
tracing::warn!(
attempt,
max = max_attempts,
url = source_url_for_msg,
"chapter probe Transient; signaling TOR NEWNYM and retrying"
);
recircuit().await;
}
}
}
}
/// Fetch all images for one chapter and persist them atomically. On
/// any error after the first storage put, the DB transaction rolls
/// back so the chapter stays at `page_count = 0` and is retried on the
/// next run. Bytes already written to storage become orphans; a future
/// reaper sweeps them.
#[allow(clippy::too_many_arguments)]
pub async fn sync_chapter_content(
browser: &chromiumoxide::Browser,
db: &PgPool,
storage: &dyn Storage,
http: &reqwest::Client,
rate: &HostRateLimiters,
chapter_id: Uuid,
manga_id: Uuid,
source_url: &str,
force_refetch: bool,
allowlist: &DownloadAllowlist,
max_image_bytes: usize,
tor: Option<&crate::crawler::tor::TorController>,
) -> anyhow::Result<SyncOutcome> {
// Skip if already fetched, unless caller explicitly forces.
if !force_refetch {
let (page_count,): (i32,) =
sqlx::query_as("SELECT page_count FROM chapters WHERE id = $1")
.bind(chapter_id)
.fetch_one(db)
.await
.context("read chapter page_count")?;
if page_count > 0 {
return Ok(SyncOutcome::Skipped);
}
}
// Fetch + classify. With TOR configured, allow up to
// CHAPTER_RECIRCUIT_MAX_ATTEMPTS total page fetches with NEWNYM
// between each. Without TOR, collapse to 1 attempt (no retry, no
// recircuit) — matches the pre-TOR single-shot behavior bit-for-bit.
let max_attempts = if tor.is_some() { CHAPTER_RECIRCUIT_MAX_ATTEMPTS } else { 1 };
let html = match fetch_chapter_html_with_recircuit(
|| fetch_chapter_html_once(browser, rate, source_url),
|| async {
if let Some(t) = tor {
if let Err(e) = t.new_identity().await {
tracing::warn!(error = %e, "TOR NEWNYM failed; continuing with same circuit");
}
}
},
max_attempts,
source_url,
)
.await?
{
ChapterFetchOutcome::Ok(html) => html,
ChapterFetchOutcome::SessionExpired => return Ok(SyncOutcome::SessionExpired),
ChapterFetchOutcome::PersistentTransient => {
// Surface as a typed Err so the dispatcher path runs
// ack_failed with exponential backoff (rather than the
// session-expired sticky flag).
anyhow::bail!(
- "chapter page at {source_url} returned a transient response \
- (broken-page body or reader didn't render); will retry"
+ "chapter page at {source_url} returned a transient response after \
+ {max_attempts} attempt(s); will retry"
);
}
- ChapterProbe::Ok => {}
- }
+ };
let images = parse_chapter_pages(&html)
.with_context(|| format!("parse chapter pages at {source_url}"))?;
@@ -304,4 +408,214 @@ mod tests {
let err = parse_chapter_pages(html).expect_err("expected Transient");
assert!(err.is_transient(), "got non-transient: {err}");
}
// --- fetch_chapter_html_with_recircuit -------------------------------
const OK_HTML: &str = r#"<html><body><a id="pic_container"><img id="page1" src="x"/></a></body></html>"#;
const UNAUTH_HTML: &str = r#"<html><body><header><div id="logo">x</div></header><main>please log in</main></body></html>"#;
const TRANSIENT_HTML: &str = "<html><body><p>we're sorry, the request file are not found.</p></body></html>";
#[tokio::test]
async fn recircuit_loop_ok_first_attempt() {
let mut recircuits = 0u32;
let mut fetches = 0u32;
let outcome = fetch_chapter_html_with_recircuit(
|| {
fetches += 1;
async { Ok(OK_HTML.to_string()) }
},
|| {
recircuits += 1;
async {}
},
3,
"https://example/c",
)
.await
.expect("ok");
assert!(matches!(outcome, ChapterFetchOutcome::Ok(_)));
assert_eq!(fetches, 1);
assert_eq!(recircuits, 0);
}
#[tokio::test]
async fn recircuit_loop_unauth_with_single_attempt_returns_session_expired() {
// max_attempts=1 = TOR disabled, fail-fast on first Unauthenticated.
let mut recircuits = 0u32;
let mut fetches = 0u32;
let outcome = fetch_chapter_html_with_recircuit(
|| {
fetches += 1;
async { Ok(UNAUTH_HTML.to_string()) }
},
|| {
recircuits += 1;
async {}
},
1,
"https://example/c",
)
.await
.expect("ok-result");
assert!(matches!(outcome, ChapterFetchOutcome::SessionExpired));
assert_eq!(fetches, 1);
assert_eq!(recircuits, 0, "no recircuit when budget is 1 (TOR disabled)");
}
#[tokio::test]
async fn recircuit_loop_unauth_then_ok_within_budget() {
// max_attempts=3 = up to 3 fetches with 2 recircuits between.
let mut recircuits = 0u32;
let mut fetch_n = 0u32;
let outcome = fetch_chapter_html_with_recircuit(
|| {
fetch_n += 1;
let n = fetch_n;
async move {
if n == 1 {
Ok(UNAUTH_HTML.to_string())
} else {
Ok(OK_HTML.to_string())
}
}
},
|| {
recircuits += 1;
async {}
},
3,
"https://example/c",
)
.await
.expect("ok");
assert!(matches!(outcome, ChapterFetchOutcome::Ok(_)));
assert_eq!(fetch_n, 2);
assert_eq!(recircuits, 1);
}
#[tokio::test]
async fn recircuit_loop_unauth_exhausts_budget_returns_session_expired() {
let mut recircuits = 0u32;
let mut fetch_n = 0u32;
let outcome = fetch_chapter_html_with_recircuit(
|| {
fetch_n += 1;
async { Ok(UNAUTH_HTML.to_string()) }
},
|| {
recircuits += 1;
async {}
},
3,
"https://example/c",
)
.await
.expect("ok-result");
assert!(matches!(outcome, ChapterFetchOutcome::SessionExpired));
assert_eq!(fetch_n, 3, "max_attempts=3 → 3 fetches total");
assert_eq!(recircuits, 2, "2 recircuits between 3 fetches");
}
#[tokio::test]
async fn recircuit_loop_transient_then_ok_within_budget() {
let mut recircuits = 0u32;
let mut fetch_n = 0u32;
let outcome = fetch_chapter_html_with_recircuit(
|| {
fetch_n += 1;
let n = fetch_n;
async move {
if n < 3 {
Ok(TRANSIENT_HTML.to_string())
} else {
Ok(OK_HTML.to_string())
}
}
},
|| {
recircuits += 1;
async {}
},
3,
"https://example/c",
)
.await
.expect("ok");
assert!(matches!(outcome, ChapterFetchOutcome::Ok(_)));
assert_eq!(fetch_n, 3);
assert_eq!(recircuits, 2);
}
#[tokio::test]
async fn recircuit_loop_transient_exhausts_budget_returns_persistent() {
let mut recircuits = 0u32;
let mut fetch_n = 0u32;
let outcome = fetch_chapter_html_with_recircuit(
|| {
fetch_n += 1;
async { Ok(TRANSIENT_HTML.to_string()) }
},
|| {
recircuits += 1;
async {}
},
3,
"https://example/c",
)
.await
.expect("ok-result");
assert!(matches!(outcome, ChapterFetchOutcome::PersistentTransient));
assert_eq!(fetch_n, 3, "max_attempts=3 → 3 fetches total");
assert_eq!(recircuits, 2, "2 recircuits between 3 fetches");
}
#[tokio::test]
async fn recircuit_loop_mixed_transient_then_unauth_then_ok_shares_budget() {
// Audit-prompted regression: outcomes share the attempt counter.
// Sequence: Transient (attempt 1) → Unauth (attempt 2) → Ok (3).
let mut recircuits = 0u32;
let mut fetch_n = 0u32;
let outcome = fetch_chapter_html_with_recircuit(
|| {
fetch_n += 1;
let n = fetch_n;
async move {
match n {
1 => Ok(TRANSIENT_HTML.to_string()),
2 => Ok(UNAUTH_HTML.to_string()),
_ => Ok(OK_HTML.to_string()),
}
}
},
|| {
recircuits += 1;
async {}
},
3,
"https://example/c",
)
.await
.expect("ok");
assert!(matches!(outcome, ChapterFetchOutcome::Ok(_)));
assert_eq!(fetch_n, 3);
assert_eq!(recircuits, 2);
}
#[tokio::test]
async fn recircuit_loop_propagates_fetch_errors() {
let mut fetch_n = 0u32;
let err = fetch_chapter_html_with_recircuit(
|| {
fetch_n += 1;
async { Err(anyhow::anyhow!("nav timeout")) }
},
|| async {},
3,
"https://example/c",
)
.await
.expect_err("fetch error bubbles");
assert_eq!(fetch_n, 1);
assert!(format!("{err:#}").contains("nav timeout"));
}
}
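The attempt budget exercised by the tests above can be modelled without async, a browser, or TOR. The following is a simplified synchronous sketch, not the crate's code: `Probe`, `Outcome`, and `run_budget` are hypothetical names, and the scripted `responses` slice stands in for real page fetches. It illustrates that `max_attempts` counts total fetches and that `Unauthenticated` and `Transient` draw from the same counter.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Probe {
    Ok,
    Unauthenticated,
    Transient,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Ok,
    SessionExpired,
    PersistentTransient,
}

/// Walk a scripted list of probe results under a shared attempt budget.
/// Returns the final outcome and how many recircuits were requested.
fn run_budget(responses: &[Probe], max_attempts: u32) -> (Outcome, u32) {
    assert!(max_attempts >= 1, "max_attempts must be at least 1");
    let mut recircuits = 0u32;
    let mut attempt = 0u32;
    let mut scripted = responses.iter().copied();
    loop {
        attempt += 1;
        // Assumption for the sketch: an exhausted script behaves like a
        // transient page, so the loop always terminates via the budget.
        let probe = scripted.next().unwrap_or(Probe::Transient);
        let exhausted = attempt >= max_attempts;
        match probe {
            Probe::Ok => return (Outcome::Ok, recircuits),
            Probe::Unauthenticated if exhausted => {
                return (Outcome::SessionExpired, recircuits)
            }
            Probe::Transient if exhausted => {
                return (Outcome::PersistentTransient, recircuits)
            }
            // Budget remains: request a new circuit, then fetch again.
            _ => recircuits += 1,
        }
    }
}
```

With a budget of 1 the recircuit branch is unreachable, which corresponds to the TOR-disabled path in `sync_chapter_content`.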

@@ -80,13 +80,36 @@ pub fn has_logo_sentinel(doc: &scraper::Html) -> bool {
/// caller can fall back on the job system's retry/backoff once the
/// inline budget is exhausted.
pub async fn retry_on_transient<F, Fut, T>(
- mut op: F,
+ op: F,
max_attempts: u32,
delay: Duration,
) -> Result<T, PageError>
where
F: FnMut() -> Fut,
Fut: Future<Output = Result<T, PageError>>,
{
retry_on_transient_with_hook(op, max_attempts, delay, || async {}).await
}
/// Like [`retry_on_transient`] but invokes `on_retry` between a
/// transient failure and the subsequent sleep+retry. The hook does
/// **not** fire on the first attempt, after a non-transient error, or
/// after the final attempt (no retry follows). Hook failures are not
/// propagated — return `()` from the future and log inside if needed.
///
/// Wire the TOR controller's `new_identity` here to rotate circuits
/// between page-fetch retries; see [`crate::crawler::tor`].
pub async fn retry_on_transient_with_hook<F, Fut, T, H, HFut>(
mut op: F,
max_attempts: u32,
delay: Duration,
mut on_retry: H,
) -> Result<T, PageError>
where
F: FnMut() -> Fut,
Fut: Future<Output = Result<T, PageError>>,
H: FnMut() -> HFut,
HFut: Future<Output = ()>,
{
debug_assert!(max_attempts >= 1, "max_attempts must be at least 1");
let mut attempt = 0u32;
@@ -101,8 +124,9 @@ where
attempt,
max_attempts,
error = %e,
- "transient error; sleeping before retry"
+ "transient error; running on-retry hook and sleeping before retry"
);
+ on_retry().await;
tokio::time::sleep(delay).await;
}
}
@@ -247,4 +271,92 @@ mod tests {
assert_eq!(result.unwrap(), 7);
assert_eq!(attempt, 1);
}
#[tokio::test]
async fn hook_fires_once_between_transient_and_success() {
let mut attempt = 0u32;
let mut hook_calls = 0u32;
let result: Result<i32, PageError> = retry_on_transient_with_hook(
|| {
attempt += 1;
let n = attempt;
async move {
if n < 2 {
Err(PageError::transient("once"))
} else {
Ok(99)
}
}
},
5,
Duration::from_millis(0),
|| {
hook_calls += 1;
async {}
},
)
.await;
assert_eq!(result.unwrap(), 99);
assert_eq!(attempt, 2);
assert_eq!(hook_calls, 1, "hook fires exactly once between attempts");
}
#[tokio::test]
async fn hook_does_not_fire_when_first_attempt_succeeds() {
let mut hook_calls = 0u32;
let result: Result<i32, PageError> = retry_on_transient_with_hook(
|| async { Ok(1) },
5,
Duration::from_millis(0),
|| {
hook_calls += 1;
async {}
},
)
.await;
assert!(result.is_ok());
assert_eq!(hook_calls, 0);
}
#[tokio::test]
async fn hook_does_not_fire_after_non_transient_error() {
let mut hook_calls = 0u32;
let result: Result<i32, PageError> = retry_on_transient_with_hook(
|| async { Err(PageError::Other(anyhow::anyhow!("permanent"))) },
5,
Duration::from_millis(0),
|| {
hook_calls += 1;
async {}
},
)
.await;
assert!(result.is_err());
assert_eq!(hook_calls, 0, "non-transient must short-circuit before hook");
}
#[tokio::test]
async fn hook_does_not_fire_after_final_failed_attempt() {
// With max_attempts=3 and three persistent transients, the hook
// should run twice (between 1→2 and 2→3) — never a third time,
// because no retry follows attempt 3.
let mut attempt = 0u32;
let mut hook_calls = 0u32;
let result: Result<i32, PageError> = retry_on_transient_with_hook(
|| {
attempt += 1;
async { Err(PageError::transient("always")) }
},
3,
Duration::from_millis(0),
|| {
hook_calls += 1;
async {}
},
)
.await;
assert!(result.is_err());
assert_eq!(attempt, 3);
assert_eq!(hook_calls, 2, "hook fires N-1 times for N attempts that all fail transient");
}
}
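The hook contract documented on `retry_on_transient_with_hook` (no call on the first attempt, none after a non-transient error, none after the final attempt) can be shown in a blocking form. This is a sketch under assumptions: it drops the async futures and the sleep, and `PageError` here is a local two-variant stand-in, not the crate's type. `retry_with_hook` is a hypothetical name.

```rust
/// Local stand-in for the crate's error type, reduced to what the
/// retry decision needs.
#[derive(Debug, PartialEq)]
enum PageError {
    Transient(String),
    Other(String),
}

/// Blocking model of the retry loop: `on_retry` runs only between a
/// transient failure and the next attempt.
fn retry_with_hook<T>(
    mut op: impl FnMut() -> Result<T, PageError>,
    max_attempts: u32,
    mut on_retry: impl FnMut(),
) -> Result<T, PageError> {
    assert!(max_attempts >= 1, "max_attempts must be at least 1");
    let mut attempt = 0u32;
    loop {
        attempt += 1;
        match op() {
            Ok(value) => return Ok(value),
            // Transient and budget remains: run the hook, then loop.
            Err(PageError::Transient(_)) if attempt < max_attempts => on_retry(),
            // Non-transient, or transient on the last attempt: give up.
            Err(e) => return Err(e),
        }
    }
}
```

For `N` attempts that all fail with a transient error, the hook runs `N - 1` times, matching the last test in the diff.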

@@ -104,6 +104,12 @@ pub async fn enqueue(pool: &PgPool, payload: &JobPayload) -> sqlx::Result<Enqueu
///
/// `kind_filter` matches against `payload->>'kind'`; `None` means
/// any kind.
+ ///
+ /// Ties on `scheduled_at` (the common case: a cron batch enqueues
+ /// everything with the same default `now()`) break by `created_at`, so
+ /// jobs come off the queue in insertion order. The enqueue paths insert
+ /// chapter-content jobs in ascending `chapters.number` order, so this
+ /// tiebreaker is what propagates that intent through to dequeue.
pub async fn lease(
pool: &PgPool,
kind_filter: Option<&str>,
@@ -118,7 +124,7 @@ pub async fn lease(
WHERE (state = 'pending' OR (state = 'running' AND leased_until < now()))
AND scheduled_at <= now()
AND ($1::text IS NULL OR payload->>'kind' = $1)
- ORDER BY scheduled_at
+ ORDER BY scheduled_at, created_at
LIMIT $2
FOR UPDATE SKIP LOCKED
)
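The effect of the added `created_at` tiebreaker can be reproduced in memory. A minimal sketch, assuming integer timestamps and a hypothetical `Job` struct (the real rows live in Postgres and are ordered by the query above): jobs that share a `scheduled_at` come out in insertion order, which is what carries the enqueue order through to dequeue.

```rust
/// Hypothetical in-memory stand-in for a queued job row.
#[derive(Debug, Clone, PartialEq)]
struct Job {
    id: u32,
    scheduled_at: u64,
    created_at: u64,
}

/// Order jobs the way `ORDER BY scheduled_at, created_at` would and
/// return their ids.
fn lease_order(mut jobs: Vec<Job>) -> Vec<u32> {
    jobs.sort_by_key(|j| (j.scheduled_at, j.created_at));
    jobs.into_iter().map(|j| j.id).collect()
}
```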

@@ -23,7 +23,9 @@ pub mod jobs;
pub mod nav;
pub mod pipeline;
pub mod rate_limit;
+ pub mod resync;
pub mod safety;
pub mod session;
pub mod source;
+ pub mod tor;
pub mod url_utils;

@@ -13,7 +13,7 @@ use crate::crawler::jobs::{self, EnqueueResult, JobPayload};
use crate::crawler::rate_limit::HostRateLimiters;
use crate::crawler::safety::{fetch_bytes_capped, looks_like_image, DownloadAllowlist};
use crate::crawler::source::target::TargetSource;
- use crate::crawler::source::{FetchContext, Source};
+ use crate::crawler::source::{FetchContext, Source, SourceMangaRef};
use crate::repo;
use crate::repo::crawler::UpsertStatus;
use crate::storage::Storage;
@@ -103,6 +103,7 @@ pub async fn run_metadata_pass(
skip_chapters: bool,
allowlist: &DownloadAllowlist,
max_image_bytes: usize,
+ tor: Option<&crate::crawler::tor::TorController>,
) -> anyhow::Result<MetadataStats> {
let lease = browser_manager
.acquire()
@@ -121,6 +122,7 @@ pub async fn run_metadata_pass(
let ctx = FetchContext {
browser: browser_ref,
rate,
+ tor,
};
let source_id = source.id();
@@ -427,8 +429,8 @@ pub async fn enqueue_bookmarked_pending(pool: &PgPool) -> anyhow::Result<Enqueue
AND cj.state = 'dead'
AND cj.updated_at > now() - ($1::bigint || ' days')::interval
)
- GROUP BY cs.source_id, c.id, cs.source_chapter_key, c.manga_id, c.created_at
- ORDER BY c.manga_id, c.created_at ASC
+ GROUP BY cs.source_id, c.id, cs.source_chapter_key, c.manga_id, c.number, c.created_at
+ ORDER BY c.manga_id, c.number ASC, c.created_at ASC
"#,
)
.bind(CHAPTER_DEAD_QUARANTINE_DAYS)
@@ -469,7 +471,7 @@ pub async fn enqueue_pending_for_manga(
) -> anyhow::Result<EnqueueSummary> {
let rows: Vec<(String, Uuid, String)> = sqlx::query_as(
r#"
- SELECT DISTINCT cs.source_id, c.id AS chapter_id, cs.source_chapter_key
+ SELECT cs.source_id, c.id AS chapter_id, cs.source_chapter_key
FROM chapters c
JOIN chapter_sources cs ON cs.chapter_id = c.id
WHERE c.manga_id = $1
@@ -482,7 +484,8 @@ pub async fn enqueue_pending_for_manga(
AND cj.state = 'dead'
AND cj.updated_at > now() - ($2::bigint || ' days')::interval
)
- ORDER BY cs.source_id, c.id
+ GROUP BY cs.source_id, c.id, cs.source_chapter_key, c.number, c.created_at
+ ORDER BY c.number ASC, c.created_at ASC, cs.source_id
"#,
)
.bind(manga_id)
@@ -521,12 +524,133 @@ pub struct EnqueueSummary {
pub failed: usize,
}
#[derive(Debug, Default, Clone, Copy)]
pub struct CoverBackfillStats {
pub considered: usize,
pub fetched: usize,
pub failed: usize,
}
/// Default per-tick cap for [`backfill_missing_covers`]. The metadata pass
/// already retries covers when its walk reaches the affected manga; this
/// backfill exists to catch the residual case where the early-stop
/// optimisation prevents the walk from reaching mangas whose cover failed
/// on first attempt. A small cap is enough because the backlog only grows
/// from sporadic download failures, not from systematic misses.
pub const COVER_BACKFILL_DEFAULT_MAX: usize = 10;
/// Re-attempt cover downloads for mangas where `cover_image_path IS NULL`
/// but a live `manga_sources` row exists. Refetches the source detail
/// page (which is where the cover URL lives) and downloads the cover.
///
/// Bounded by `max_mangas` per call so a steady stream of failing covers
/// — e.g. a CDN host that's persistently 502 — can't monopolise a cron
/// tick. Orders by `manga_sources.last_seen_at DESC` so the freshest
/// missing-cover mangas are addressed first.
///
/// Failures are logged and counted, not raised: a single bad cover URL
/// must not stall every other backfill behind it.
#[allow(clippy::too_many_arguments)]
pub async fn backfill_missing_covers(
browser_manager: &BrowserManager,
db: &PgPool,
storage: &dyn Storage,
http: &reqwest::Client,
rate: &HostRateLimiters,
max_mangas: usize,
allowlist: &DownloadAllowlist,
max_image_bytes: usize,
tor: Option<&crate::crawler::tor::TorController>,
) -> anyhow::Result<CoverBackfillStats> {
let mut stats = CoverBackfillStats::default();
if max_mangas == 0 {
return Ok(stats);
}
let entries = repo::crawler::list_missing_covers(db, max_mangas as i64)
.await
.context("list_missing_covers")?;
if entries.is_empty() {
return Ok(stats);
}
let lease = browser_manager
.acquire()
.await
.context("acquire browser lease for cover backfill")?;
let browser_ref: &chromiumoxide::Browser = &lease;
let ctx = FetchContext { browser: browser_ref, rate, tor };
for entry in entries {
stats.considered += 1;
// Metadata-only TargetSource: skip chapter-list parsing so a
// missing-cover refetch doesn't soft-drop chapters on a partial
// render. Cover URL alone is what we need.
let source = TargetSource::new(entry.source_url.clone()).without_chapter_parsing();
let r = SourceMangaRef {
source_manga_key: entry.source_manga_key.clone(),
title: String::new(),
url: entry.source_url.clone(),
};
let cover_url = match source.fetch_manga(&ctx, &r).await {
Ok(manga) => manga.cover_url,
Err(e) => {
tracing::warn!(
manga_id = %entry.manga_id,
url = %entry.source_url,
error = ?e,
"cover backfill: fetch_manga failed"
);
stats.failed += 1;
continue;
}
};
let Some(cover_url) = cover_url else {
tracing::warn!(
manga_id = %entry.manga_id,
url = %entry.source_url,
"cover backfill: source returned no cover_url"
);
stats.failed += 1;
continue;
};
match download_and_store_cover(
db,
storage,
http,
rate,
&entry.source_url,
entry.manga_id,
&cover_url,
allowlist,
max_image_bytes,
)
.await
{
Ok(()) => stats.fetched += 1,
Err(e) => {
tracing::warn!(
manga_id = %entry.manga_id,
url = %entry.source_url,
error = ?e,
"cover backfill: download failed"
);
stats.failed += 1;
}
}
}
drop(lease);
Ok(stats)
}
/// Download a cover image and persist its storage path. Local to the
/// pipeline because the CLI still calls it from its inline chapter-content
/// loop; once the worker pool fully replaces that path we can fold this
/// into `pipeline` proper.
#[allow(clippy::too_many_arguments)]
- async fn download_and_store_cover(
+ pub(crate) async fn download_and_store_cover(
db: &PgPool,
storage: &dyn Storage,
http: &reqwest::Client,

@@ -0,0 +1,277 @@
//! Admin-triggered resync of a single manga's metadata + cover, or a
//! single chapter's content.
//!
//! The cron tick already retries covers and chapter content on its own
//! schedule. This module exists for the operator-controlled path:
//! "this manga's metadata is stale / its cover never landed / this
//! chapter is broken — pull from source now, not at the next daily
//! tick." Wired into the admin API, never into the queue, so the work
//! happens synchronously with the HTTP request and the admin sees the
//! refreshed row in the response.
//!
//! Shares the daemon's [`BrowserManager`], rate limiter, HTTP client,
//! and TOR controller so a force resync respects the same per-host
//! pacing and recircuit budget the daily crawl uses — admin actions
//! must not let an operator accidentally hammer the source.
use std::sync::Arc;
use anyhow::Context;
use async_trait::async_trait;
use sqlx::PgPool;
use uuid::Uuid;
use crate::crawler::browser_manager::BrowserManager;
use crate::crawler::content::{self, SyncOutcome};
use crate::crawler::pipeline;
use crate::crawler::rate_limit::HostRateLimiters;
use crate::crawler::safety::DownloadAllowlist;
use crate::crawler::source::target::TargetSource;
use crate::crawler::source::{FetchContext, Source, SourceMangaRef};
use crate::crawler::tor::TorController;
use crate::repo;
use crate::repo::crawler::UpsertStatus;
use crate::storage::Storage;
/// Outcome of [`ResyncService::resync_manga`]. Mirrors the bits the
/// admin UI cares about — was the row actually re-upserted, did the
/// cover land — so the response can show "metadata refreshed, cover
/// re-downloaded" or "metadata unchanged" without a second round-trip.
#[derive(Debug, Clone, Copy)]
pub struct MangaResyncOutcome {
pub manga_id: Uuid,
pub metadata_status: UpsertStatus,
pub cover_fetched: bool,
}
/// Outcome of [`ResyncService::resync_chapter`]. `Fetched` (carrying the
/// page count) is the success case; `Skipped` means the source row was
/// already gone or the chapter had no live source.
#[derive(Debug, Clone)]
pub enum ChapterResyncOutcome {
Fetched { chapter_id: Uuid, pages: usize },
Skipped { chapter_id: Uuid, reason: String },
}
/// Service exposed by the daemon to the admin API. Optional on
/// [`AppState`] — `None` when the crawler daemon is disabled
/// (`CRAWLER_DAEMON=false`), in which case admin handlers return 503.
#[async_trait]
pub trait ResyncService: Send + Sync {
async fn resync_manga(&self, manga_id: Uuid) -> anyhow::Result<MangaResyncOutcome>;
async fn resync_chapter(&self, chapter_id: Uuid) -> anyhow::Result<ChapterResyncOutcome>;
}
/// Errors with a stable shape so the API layer can map them to the
/// right HTTP status (404 vs 422 vs 5xx). Anything else surfaces as a
/// generic 500.
#[derive(Debug, thiserror::Error)]
pub enum ResyncError {
#[error("manga has no source to resync from")]
NoMangaSource,
#[error("chapter has no source to resync from")]
NoChapterSource,
}
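A minimal sketch of how an API layer might map these typed errors to status codes by downcasting. `status_for` and the 422/500 choices are hypothetical (the doc comment above names 404 vs 422 vs 5xx without assigning them), and the enum is re-declared by hand here so the sketch needs neither `thiserror` nor `anyhow`.

```rust
use std::error::Error;
use std::fmt;

/// Hand-written stand-in for the `ResyncError` enum above.
#[derive(Debug)]
enum ResyncError {
    NoMangaSource,
    NoChapterSource,
}

impl fmt::Display for ResyncError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match *self {
            ResyncError::NoMangaSource => f.write_str("manga has no source to resync from"),
            ResyncError::NoChapterSource => f.write_str("chapter has no source to resync from"),
        }
    }
}

impl Error for ResyncError {}

/// Hypothetical mapping: the status codes are assumptions, not the PR's.
fn status_for(err: &(dyn Error + 'static)) -> u16 {
    match err.downcast_ref::<ResyncError>() {
        // A known, typed failure: the request was valid but cannot be served.
        Some(&ResyncError::NoMangaSource) | Some(&ResyncError::NoChapterSource) => 422,
        // Anything else is treated as a generic server error.
        None => 500,
    }
}
```

With `anyhow::Error`, the same lookup would go through its own `downcast_ref::<ResyncError>()` method, so the handler can stay on `anyhow::Result` and still branch on the typed variants.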
pub struct RealResyncService {
pub browser_manager: Arc<BrowserManager>,
pub db: PgPool,
pub storage: Arc<dyn Storage>,
pub http: reqwest::Client,
pub rate: Arc<HostRateLimiters>,
pub download_allowlist: DownloadAllowlist,
pub max_image_bytes: usize,
pub tor: Option<Arc<TorController>>,
}
#[async_trait]
impl ResyncService for RealResyncService {
async fn resync_manga(&self, manga_id: Uuid) -> anyhow::Result<MangaResyncOutcome> {
// Pick the freshest live source row. Multi-source mangas
// (theoretical — only one Source impl today) get the row whose
// `last_seen_at` is newest; soft-dropped rows are skipped.
let row: Option<(String, String, String)> = sqlx::query_as(
"SELECT source_id, source_manga_key, source_url \
FROM manga_sources \
WHERE manga_id = $1 AND dropped_at IS NULL \
ORDER BY last_seen_at DESC \
LIMIT 1",
)
.bind(manga_id)
.fetch_optional(&self.db)
.await
.context("look up manga_sources for resync")?;
let Some((_source_id, source_manga_key, source_url)) = row else {
return Err(ResyncError::NoMangaSource.into());
};
let lease = self
.browser_manager
.acquire()
.await
.context("acquire browser lease for manga resync")?;
let browser_ref: &chromiumoxide::Browser = &lease;
let ctx = FetchContext {
browser: browser_ref,
rate: &self.rate,
tor: self.tor.as_deref(),
};
// Parse chapters too — a force resync is "make this manga fully
// current," not just metadata. The full pipeline handles the
// partial-render guard for us; we replicate the same caution
// here by skipping the chapter sync when the parser returned
// empty but the manga previously had chapters.
let source = TargetSource::new(source_url.clone());
let r = SourceMangaRef {
source_manga_key: source_manga_key.clone(),
title: String::new(),
url: source_url.clone(),
};
let manga = source
.fetch_manga(&ctx, &r)
.await
.with_context(|| format!("fetch_manga during resync of {manga_id}"))?;
// Partial-render guard: same logic as run_metadata_pass. This block
// only logs; the chapter sync further down re-checks the condition.
let source_id = source.id();
if manga.chapters.is_empty() {
let prior = repo::crawler::live_chapter_count_for_source_manga(
&self.db,
source_id,
&source_manga_key,
)
.await
.unwrap_or(0);
if prior != 0 {
tracing::warn!(
%manga_id,
source_url = %source_url,
"resync_manga: fetch returned empty chapters but prior count > 0; skipping chapter sync to avoid soft-drop"
);
}
}
let upsert = repo::crawler::upsert_manga_from_source(
&self.db,
source_id,
&source_url,
&manga,
)
.await
.with_context(|| format!("upsert_manga_from_source during resync of {manga_id}"))?;
// Cover refetch: force-download regardless of UpsertStatus.
// Admin clicked "resync" because they want the cover too.
let mut cover_fetched = false;
if let Some(cover_url) = manga.cover_url.as_deref() {
match pipeline::download_and_store_cover(
&self.db,
self.storage.as_ref(),
&self.http,
&self.rate,
&source_url,
upsert.manga_id,
cover_url,
&self.download_allowlist,
self.max_image_bytes,
)
.await
{
Ok(()) => cover_fetched = true,
Err(e) => tracing::warn!(
%manga_id,
error = ?e,
"resync_manga: cover download failed"
),
}
}
// Chapter sync — only when the partial-render guard above
// didn't bail.
let prior_chapter_count = repo::crawler::live_chapter_count_for_source_manga(
&self.db,
source_id,
&source_manga_key,
)
.await
.unwrap_or(0);
if !manga.chapters.is_empty() || prior_chapter_count == 0 {
match repo::crawler::sync_manga_chapters(
&self.db,
source_id,
upsert.manga_id,
&manga.chapters,
)
.await
{
Ok(diff) => tracing::info!(
%manga_id,
new = diff.new,
refreshed = diff.refreshed,
dropped = diff.dropped,
"resync_manga: chapters synced"
),
Err(e) => tracing::warn!(
%manga_id,
error = ?e,
"resync_manga: chapter sync failed"
),
}
}
drop(lease);
Ok(MangaResyncOutcome {
manga_id: upsert.manga_id,
metadata_status: upsert.status,
cover_fetched,
})
}
async fn resync_chapter(&self, chapter_id: Uuid) -> anyhow::Result<ChapterResyncOutcome> {
let row = repo::chapter::dispatch_target(&self.db, chapter_id)
.await
.context("look up chapter_sources for resync")?;
let Some((manga_id, source_url)) = row else {
return Err(ResyncError::NoChapterSource.into());
};
let lease = self
.browser_manager
.acquire()
.await
.context("acquire browser lease for chapter resync")?;
let result = content::sync_chapter_content(
&lease,
&self.db,
self.storage.as_ref(),
&self.http,
&self.rate,
chapter_id,
manga_id,
&source_url,
true,
&self.download_allowlist,
self.max_image_bytes,
self.tor.as_deref(),
)
.await;
drop(lease);
match result? {
SyncOutcome::Fetched { pages } => {
Ok(ChapterResyncOutcome::Fetched { chapter_id, pages })
}
SyncOutcome::Skipped => Ok(ChapterResyncOutcome::Skipped {
chapter_id,
reason: "chapter already had pages on disk".to_string(),
}),
SyncOutcome::SessionExpired => {
anyhow::bail!("source session expired — operator must refresh PHPSESSID")
}
}
}
}
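The partial-render guard in `resync_manga` boils down to one predicate. The sketch below isolates it as a pure function; `chapter_sync_is_safe` and its parameter types are hypothetical names chosen for illustration, not helpers from this PR.

```rust
/// Decide whether a freshly parsed chapter list may be synced.
///
/// `fetched` is how many chapters the parser returned; `prior_live` is how
/// many live chapters the database already holds for this source manga.
/// (Hypothetical helper: the PR inlines this condition instead.)
fn chapter_sync_is_safe(fetched: usize, prior_live: u64) -> bool {
    // An empty parse is only trusted when there is nothing to soft-drop.
    // If chapters existed before, an empty result is more likely a
    // partially rendered page than a manga that lost every chapter.
    fetched > 0 || prior_live == 0
}
```

For example, `chapter_sync_is_safe(0, 40)` is `false` (skip the sync and keep the 40 existing rows), while `chapter_sync_is_safe(0, 0)` and `chapter_sync_is_safe(12, 40)` are both `true`.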


@@ -162,37 +162,123 @@ const PROBE_RETRY_DELAY: Duration = Duration::from_secs(2);
 /// limiter. The trade is worth it — failing here costs ~1s; failing 30
 /// minutes into a backfill costs 30 minutes.
 pub async fn verify_session(browser: &Browser, probe_url: &str) -> anyhow::Result<()> {
-let mut attempt = 0u32;
+verify_session_with_recircuit(browser, probe_url, None, 0).await
+}
+/// Like [`verify_session`] but, when `tor` is `Some`, signals
+/// `SIGNAL NEWNYM` between retries on transient pages AND treats
+/// `Unauthenticated` as recoverable (up to `tor_max_attempts` total
+/// probes, calling NEWNYM between each).
+///
+/// `verify_session` is `verify_session_with_recircuit(..., None, _)`,
+/// which collapses the `Unauthenticated` budget to 1 attempt — i.e.
+/// fail-fast, exactly the pre-TOR behavior.
+pub async fn verify_session_with_recircuit(
+browser: &Browser,
+probe_url: &str,
+tor: Option<&crate::crawler::tor::TorController>,
+tor_max_attempts: u32,
+) -> anyhow::Result<()> {
+let unauth_max_attempts = if tor.is_some() { tor_max_attempts.max(1) } else { 1 };
+run_session_probe_loop(
+|| fetch_probe_html(browser, probe_url),
+|| async {
+if let Some(t) = tor {
+if let Err(e) = t.new_identity().await {
+tracing::warn!(error = %e, "TOR NEWNYM failed; continuing with same circuit");
+}
+}
+},
+PROBE_MAX_ATTEMPTS,
+unauth_max_attempts,
+PROBE_RETRY_DELAY,
+probe_url,
+)
+.await
+}
+/// Pure-over-IO loop body for the session probe. Generic over the
+/// fetch and recircuit closures so it can be unit-tested without a
+/// real browser or TOR daemon.
+///
+/// Both budgets count **total attempts**, including the first — so
+/// `transient_max_attempts = 3` allows 3 fetches and 2 recircuits
+/// between them, and `unauth_max_attempts = 1` means "fail-fast, no
+/// retry". This matches [`crate::crawler::detect::retry_on_transient`]
+/// and the content-path recircuit loop.
+///
+/// Outcomes:
+/// - `SessionProbe::Ok` → return `Ok(())`.
+/// - `SessionProbe::Unauthenticated` → recircuit + retry while
+/// under the unauth budget. After the cap, bail with the
+/// "PHPSESSID expired" diagnostic, mentioning the attempt count so
+/// a TOR-misconfig diagnosis is easier.
+/// - `SessionProbe::Transient` → same shape against the transient
+/// budget; bails with "site down or rate-limiting" after the cap.
+async fn run_session_probe_loop<F, Fut, R, RFut>(
+mut fetch_html: F,
+mut recircuit: R,
+transient_max_attempts: u32,
+unauth_max_attempts: u32,
+retry_delay: Duration,
+probe_url_for_msg: &str,
+) -> anyhow::Result<()>
+where
+F: FnMut() -> Fut,
+Fut: std::future::Future<Output = anyhow::Result<String>>,
+R: FnMut() -> RFut,
+RFut: std::future::Future<Output = ()>,
+{
+debug_assert!(transient_max_attempts >= 1);
+debug_assert!(unauth_max_attempts >= 1);
+let mut transient_attempts = 0u32;
+let mut unauth_attempts = 0u32;
 loop {
-attempt += 1;
-let html = fetch_probe_html(browser, probe_url).await?;
+let html = fetch_html().await?;
 match classify_probe(&html) {
 SessionProbe::Ok => {
-tracing::info!(attempt, "session probe ok — #logo + #avatar_menu present");
+tracing::info!(
+transient_attempts,
+unauth_attempts,
+"session probe ok — #logo + #avatar_menu present"
+);
 return Ok(());
 }
 SessionProbe::Unauthenticated => {
-return Err(anyhow!(
-"session probe failed — #avatar_menu not present at {probe_url} \
-(page rendered the normal layout); PHPSESSID is missing, expired, \
-or revoked. Refresh CRAWLER_PHPSESSID and re-run."
-));
-}
-SessionProbe::Transient if attempt < PROBE_MAX_ATTEMPTS => {
+unauth_attempts += 1;
+if unauth_attempts >= unauth_max_attempts {
+return Err(anyhow!(
+"session probe failed — #avatar_menu not present at {probe_url_for_msg} \
+after {unauth_attempts} attempt(s); PHPSESSID is missing, \
+expired, or revoked. Refresh CRAWLER_PHPSESSID and re-run."
+));
+}
 tracing::warn!(
-attempt,
-max_attempts = PROBE_MAX_ATTEMPTS,
-"session probe got a transient page; retrying"
+attempt = unauth_attempts,
+max_attempts = unauth_max_attempts,
+"session probe Unauthenticated despite PHPSESSID; signaling TOR \
+NEWNYM and retrying"
 );
-tokio::time::sleep(PROBE_RETRY_DELAY).await;
+recircuit().await;
+tokio::time::sleep(retry_delay).await;
 }
 SessionProbe::Transient => {
-return Err(anyhow!(
-"session probe failed — probe page at {probe_url} returned a \
-broken-page response after {PROBE_MAX_ATTEMPTS} attempts. \
-The site appears to be down or rate-limiting us; try again \
-later before refreshing CRAWLER_PHPSESSID."
-));
+transient_attempts += 1;
+if transient_attempts >= transient_max_attempts {
+return Err(anyhow!(
+"session probe failed — probe page at {probe_url_for_msg} returned \
+a broken-page response after {transient_max_attempts} attempts. \
+The site appears to be down or rate-limiting us; try again \
+later before refreshing CRAWLER_PHPSESSID."
+));
+}
+tracing::warn!(
+attempt = transient_attempts,
+max_attempts = transient_max_attempts,
+"session probe got a transient page; recircuit + retry"
+);
+recircuit().await;
+tokio::time::sleep(retry_delay).await;
 }
 }
 }
 }
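The budget semantics of the probe loop (both budgets count total attempts, so N attempts allow N-1 recircuits) can be sketched without async, a browser, or TOR. This is a simplified synchronous stand-in, not the PR's `run_session_probe_loop`: `Probe`, `probe_loop`, and the error strings are hypothetical, and the retry delay is left out.

```rust
/// Hypothetical classification of one probe fetch.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Probe {
    Ok,
    Unauthenticated,
    Transient,
}

/// Probe until success or until one of the two budgets is exhausted.
/// Each budget counts total attempts of its kind, including the first.
fn probe_loop(
    mut fetch: impl FnMut() -> Probe,
    mut recircuit: impl FnMut(),
    transient_max: u32,
    unauth_max: u32,
) -> Result<(), String> {
    let mut transient = 0u32;
    let mut unauth = 0u32;
    loop {
        match fetch() {
            Probe::Ok => return Ok(()),
            Probe::Unauthenticated => {
                unauth += 1;
                if unauth >= unauth_max {
                    return Err(format!("unauthenticated after {} attempt(s)", unauth));
                }
            }
            Probe::Transient => {
                transient += 1;
                if transient >= transient_max {
                    return Err(format!("transient after {} attempt(s)", transient));
                }
            }
        }
        // Only reached when a retry is still allowed: ask for a fresh
        // circuit, then go around the loop again.
        recircuit();
    }
}
```

Because the recircuit call sits after the budget check, a budget of 1 never recircuits at all, which is how the no-TOR path keeps its fail-fast behavior.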
@@ -336,6 +422,204 @@ mod tests {
assert_eq!(classify_chapter_probe(html), ChapterProbe::Ok);
}
// --- run_session_probe_loop -----------------------------------------
//
// These tests exercise the recircuit-aware loop without a real
// browser. The fetch and recircuit closures are mocked over Vecs of
// canned outcomes / counters.
const OK_HTML: &str = r#"<html><body><div id="logo"></div><div id="avatar_menu"></div></body></html>"#;
const UNAUTH_HTML: &str = r#"<html><body><div id="logo"></div></body></html>"#;
const TRANSIENT_HTML: &str = "<html><body><p>we're sorry, the request file are not found.</p></body></html>";
#[tokio::test]
async fn probe_loop_ok_on_first_attempt_does_not_recircuit() {
let mut recircuits = 0u32;
let mut fetched = 0u32;
run_session_probe_loop(
|| {
fetched += 1;
async { Ok(OK_HTML.to_string()) }
},
|| {
recircuits += 1;
async {}
},
3,
3,
Duration::from_millis(0),
"https://example/probe",
)
.await
.expect("ok on first attempt");
assert_eq!(fetched, 1);
assert_eq!(recircuits, 0);
}
#[tokio::test]
async fn probe_loop_unauth_then_ok_when_attempt_budget_available() {
// Budget = 3 total attempts. Unauth on call 1, ok on call 2.
let mut recircuits = 0u32;
let mut call = 0u32;
run_session_probe_loop(
|| {
call += 1;
let n = call;
async move {
if n == 1 {
Ok(UNAUTH_HTML.to_string())
} else {
Ok(OK_HTML.to_string())
}
}
},
|| {
recircuits += 1;
async {}
},
3,
3,
Duration::from_millis(0),
"https://example/probe",
)
.await
.expect("recovers after one recircuit");
assert_eq!(call, 2);
assert_eq!(recircuits, 1);
}
#[tokio::test]
async fn probe_loop_unauth_with_single_attempt_budget_fails_fast() {
// Budget = 1 total attempt = no retry (matches no-TOR behavior).
let mut recircuits = 0u32;
let mut call = 0u32;
let err = run_session_probe_loop(
|| {
call += 1;
async { Ok(UNAUTH_HTML.to_string()) }
},
|| {
recircuits += 1;
async {}
},
3,
1,
Duration::from_millis(0),
"https://example/probe",
)
.await
.expect_err("budget=1 → fail-fast");
assert_eq!(call, 1, "no retry when budget is 1");
assert_eq!(recircuits, 0);
let msg = format!("{err:#}");
assert!(msg.contains("Refresh CRAWLER_PHPSESSID"), "msg: {msg}");
assert!(msg.contains("after 1 attempt"), "expected attempt count in msg: {msg}");
}
#[tokio::test]
async fn probe_loop_unauth_after_exhausting_budget_emits_attempt_count() {
let mut recircuits = 0u32;
let mut call = 0u32;
let err = run_session_probe_loop(
|| {
call += 1;
async { Ok(UNAUTH_HTML.to_string()) }
},
|| {
recircuits += 1;
async {}
},
10, // transient budget irrelevant here
3, // 3 attempts total, 2 recircuits between
Duration::from_millis(0),
"https://example/probe",
)
.await
.expect_err("exhausts unauth budget");
assert_eq!(call, 3);
assert_eq!(recircuits, 2);
let msg = format!("{err:#}");
assert!(msg.contains("after 3 attempt"), "expected attempt count in error, got: {msg}");
}
#[tokio::test]
async fn probe_loop_transient_repeats_until_max_then_errors() {
let mut recircuits = 0u32;
let mut call = 0u32;
let err = run_session_probe_loop(
|| {
call += 1;
async { Ok(TRANSIENT_HTML.to_string()) }
},
|| {
recircuits += 1;
async {}
},
3,
1,
Duration::from_millis(0),
"https://example/probe",
)
.await
.expect_err("transient until max → fail");
assert_eq!(call, 3);
// Recircuit fires between attempts: 3 attempts → 2 recircuits.
assert_eq!(recircuits, 2);
let msg = format!("{err:#}");
assert!(msg.contains("broken-page response after 3 attempts"), "msg: {msg}");
}
#[tokio::test]
async fn probe_loop_transient_then_ok_returns_ok_after_one_recircuit() {
let mut recircuits = 0u32;
let mut call = 0u32;
run_session_probe_loop(
|| {
call += 1;
let n = call;
async move {
if n == 1 {
Ok(TRANSIENT_HTML.to_string())
} else {
Ok(OK_HTML.to_string())
}
}
},
|| {
recircuits += 1;
async {}
},
3,
1,
Duration::from_millis(0),
"https://example/probe",
)
.await
.expect("ok on second try");
assert_eq!(call, 2);
assert_eq!(recircuits, 1);
}
#[tokio::test]
async fn probe_loop_propagates_fetch_errors_immediately() {
let mut call = 0u32;
let err = run_session_probe_loop(
|| {
call += 1;
async { Err(anyhow!("nav timeout")) }
},
|| async {},
5,
5,
Duration::from_millis(0),
"https://example/probe",
)
.await
.expect_err("fetch error bubbles");
assert_eq!(call, 1);
assert!(format!("{err:#}").contains("nav timeout"));
}
#[test]
fn classify_probe_trusts_broken_body_over_stray_avatar_match() {
// Defensive: if a broken-page body somehow contains an


@@ -67,6 +67,10 @@ pub struct SourceChapter {
 pub struct FetchContext<'a> {
 pub browser: &'a Browser,
 pub rate: &'a crate::crawler::rate_limit::HostRateLimiters,
+/// Optional TOR control-port client. When `Some`, retry helpers
+/// signal `NEWNYM` between transient-page attempts so the next try
+/// draws a fresh exit. `None` keeps pre-TOR behavior.
+pub tor: Option<&'a crate::crawler::tor::TorController>,
 }
 /// Lazy iterator over discovered manga refs. The caller drives the


@@ -18,7 +18,7 @@ use super::{
 SourceMangaRef,
 };
 use crate::crawler::detect::{
-has_logo_sentinel, is_broken_page_body, retry_on_transient, PageError,
+has_logo_sentinel, is_broken_page_body, retry_on_transient_with_hook, PageError,
 };
 use crate::crawler::nav::{wait_for_nav, wait_for_selector, NavError, SELECTOR_TIMEOUT};
@@ -79,12 +79,13 @@ impl Source for TargetSource {
 // and the HTML is handed straight to the first `next_batch` call
 // so the walker doesn't re-fetch it. Page count is discovered
 // incrementally — see `TargetSourceWalker::next_batch`.
-let first_html = retry_on_transient(
+let first_html = retry_on_transient_with_hook(
 || async {
 navigate(ctx, self.base_url.as_str(), LIST_PAGE_MARKER).await
 },
 PAGE_TRANSIENT_RETRY_ATTEMPTS,
 PAGE_TRANSIENT_RETRY_DELAY,
+|| async { recircuit_if_configured(ctx.tor).await },
 )
 .await?;
@@ -169,7 +170,7 @@ impl DiscoverWalk for TargetSourceWalker {
 parse_manga_list_from(&doc)?
 }
 None => {
-retry_on_transient(
+retry_on_transient_with_hook(
 || async {
 let html = navigate(
 ctx,
@@ -182,12 +183,13 @@ impl DiscoverWalk for TargetSourceWalker {
 },
 PAGE_TRANSIENT_RETRY_ATTEMPTS,
 PAGE_TRANSIENT_RETRY_DELAY,
+|| async { recircuit_if_configured(ctx.tor).await },
 )
 .await?
 }
 }
 } else {
-retry_on_transient(
+retry_on_transient_with_hook(
 || async {
 let url = page_url(&self.base_url, page_num);
 let html = navigate(ctx, &url, LIST_PAGE_MARKER).await?;
@@ -196,6 +198,7 @@ impl DiscoverWalk for TargetSourceWalker {
 },
 PAGE_TRANSIENT_RETRY_ATTEMPTS,
 PAGE_TRANSIENT_RETRY_DELAY,
+|| async { recircuit_if_configured(ctx.tor).await },
 )
 .await?
 };
@@ -274,6 +277,20 @@ fn classify_navigate_html(html: String) -> Result<String, PageError> {
 Ok(html)
 }
+/// Hook for [`retry_on_transient_with_hook`]: when TOR is configured,
+/// signal `NEWNYM` so the next navigation draws a fresh exit. Errors
+/// from the controller are logged and swallowed — failing to recircuit
+/// shouldn't take down the crawl, the next attempt just runs on the
+/// same circuit as before.
+async fn recircuit_if_configured(tor: Option<&crate::crawler::tor::TorController>) {
+if let Some(t) = tor {
+if let Err(e) = t.new_identity().await {
+tracing::warn!(error = %e, "TOR NEWNYM failed; retrying on same circuit");
+}
+}
+}
 /// Substitutes the first `/N/` path segment with the target page
 /// number. Source impls that paginate via a different URL shape can
 /// override this — for the modeled site the segment is always present.
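The retry-with-hook pattern used by the call sites in this file can be sketched as follows. The real `retry_on_transient_with_hook` is not shown in this diff beyond its call shape (operation, attempt count, delay, hook), so `retry_with_hook` and `FetchError` below are hypothetical, synchronous stand-ins with the delay omitted.

```rust
/// Hypothetical error type: only `Transient` failures are retried.
#[derive(Debug, PartialEq)]
enum FetchError {
    Transient(String),
    Fatal(String),
}

/// Run `op` up to `max_attempts` times in total. After each transient
/// failure that still has budget left, call `between_attempts` (for
/// example, a recircuit request) before trying again.
fn retry_with_hook<T>(
    mut op: impl FnMut() -> Result<T, FetchError>,
    max_attempts: u32,
    mut between_attempts: impl FnMut(),
) -> Result<T, FetchError> {
    let mut attempt = 0u32;
    loop {
        attempt += 1;
        match op() {
            // Retryable and under budget: run the hook, then loop.
            Err(FetchError::Transient(_)) if attempt < max_attempts => between_attempts(),
            // Success, a fatal error, or the last transient failure.
            other => return other,
        }
    }
}
```

Passing a no-op closure as the hook gives plain retry behavior, which mirrors how `recircuit_if_configured` does nothing when `tor` is `None`.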

backend/src/crawler/tor.rs (new file, 446 lines)

@@ -0,0 +1,446 @@
//! TOR control-port client for `SIGNAL NEWNYM` ("recircuit").
//!
//! The crawler can be proxied through TOR (`CRAWLER_PROXY=socks5h://tor:9050`)
//! to randomize the exit IP seen by the target site. When the target
//! returns a "bad page" (its broken-template body, missing layout
//! sentinel, or unauthenticated probe despite a valid PHPSESSID), it
//! is often the current exit being rate-limited or fingerprinted rather
//! than a real failure. Asking the local TOR daemon for a new identity
//! over its control port (port 9051 by default) makes subsequent
//! connections draw a fresh circuit; combined with `IsolateDestAddr`
//! in torrc this is usually enough to clear the failure.
//!
//! Scope is deliberately tiny — `AUTHENTICATE` + `SIGNAL NEWNYM` over
//! a one-shot TCP connection. No `torut` dep, no hidden-service
//! plumbing, no event streaming.
//!
//! **Caveat for in-flight connections:** Chromium reuses sockets, so a
//! `NEWNYM` only affects *new* connections (in TOR terms, new circuits).
//! That's fine for our retry path — the next navigation opens a fresh
//! connection. We do not try to forcibly close existing streams.
use std::path::{Path, PathBuf};
use std::time::Duration;
use anyhow::{anyhow, bail, Context};
use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use tokio::net::TcpStream;
use tokio::time::timeout;
/// Default control port, applied when the configured URL omits one.
/// 9051 is the conventional `ControlPort` value in torrc.
const DEFAULT_CONTROL_PORT: u16 = 9051;
/// Connect timeout — generous enough for a slow compose start, short
/// enough that a misconfigured controller doesn't stall a crawl.
const CONNECT_TIMEOUT: Duration = Duration::from_secs(5);
/// Per-command read timeout. `SIGNAL NEWNYM` returns instantly on the
/// happy path; bound it so a half-broken control port can't hang us.
const READ_TIMEOUT: Duration = Duration::from_secs(5);
/// How the controller authenticates to the control port.
///
/// `Cookie` is preferred for compose deploys where the auth cookie file
/// is shared between the `tor` and `backend` containers via a named
/// volume. `Password` is the fallback when the cookie file isn't
/// reachable (different gid, no shared volume, etc.). `None` matches a
/// torrc with no `CookieAuthentication 1` and no `HashedControlPassword`
/// — useful for local experimentation, not for production.
///
/// `Debug` is implemented manually to redact the password (and the
/// cookie path, which is non-sensitive but uninteresting in logs).
/// Don't add `#[derive(Debug)]` — the controller is `?`-logged at
/// startup and a derive would expand the password into the trace.
#[derive(Clone)]
pub enum TorAuth {
None,
Password(String),
Cookie(PathBuf),
}
impl std::fmt::Debug for TorAuth {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
TorAuth::None => f.write_str("None"),
TorAuth::Password(_) => f.write_str("Password(<redacted>)"),
TorAuth::Cookie(_) => f.write_str("Cookie(<path>)"),
}
}
}
#[derive(Debug, Clone)]
pub struct TorController {
/// `host:port` string. Kept as a string (not a `SocketAddr`) so
/// docker-compose hostnames like `tor:9051` resolve at connect time.
addr: String,
auth: TorAuth,
}
impl TorController {
pub fn new(addr: impl Into<String>, auth: TorAuth) -> Self {
Self { addr: addr.into(), auth }
}
/// Build a controller from the env-config shape:
/// `url` (e.g. `tcp://tor:9051`, `127.0.0.1:9051`, or `tor`),
/// optional password, optional cookie path. Returns `Ok(None)` when
/// `url` is absent — that's the "TOR feature disabled" signal.
/// Cookie wins over password when both are set (rotates with TOR;
/// no secret to manage).
pub fn from_parts(
url: Option<&str>,
password: Option<&str>,
cookie_path: Option<&Path>,
) -> anyhow::Result<Option<Self>> {
let Some(url) = url else { return Ok(None) };
let addr = parse_control_url(url)?;
let auth = match (cookie_path, password) {
(Some(p), _) => TorAuth::Cookie(p.to_path_buf()),
(None, Some(p)) => TorAuth::Password(p.to_string()),
(None, None) => TorAuth::None,
};
Ok(Some(Self { addr, auth }))
}
/// Open the control port, `AUTHENTICATE`, `SIGNAL NEWNYM`, `QUIT`.
/// Each invocation is a fresh connection; the controller is cheap
/// to clone and stateless across calls.
pub async fn new_identity(&self) -> anyhow::Result<()> {
let stream = timeout(CONNECT_TIMEOUT, TcpStream::connect(&self.addr))
.await
.with_context(|| {
format!("timed out connecting to TOR control port {}", self.addr)
})?
.with_context(|| format!("connect to TOR control port {}", self.addr))?;
let (read, mut write) = stream.into_split();
let mut read = BufReader::new(read);
let auth_line = self.build_auth_line().await?;
write_line(&mut write, &auth_line).await?;
timeout(READ_TIMEOUT, expect_250(&mut read))
.await
.map_err(|_| anyhow!("TOR control AUTHENTICATE timed out"))?
.context("AUTHENTICATE")?;
write_line(&mut write, "SIGNAL NEWNYM").await?;
timeout(READ_TIMEOUT, expect_250(&mut read))
.await
.map_err(|_| anyhow!("TOR control SIGNAL NEWNYM timed out"))?
.context("SIGNAL NEWNYM")?;
// QUIT is courtesy; ignore errors — the daemon may close the
// socket before our QUIT lands and that's perfectly fine.
let _ = write_line(&mut write, "QUIT").await;
// Debug-level: a busy crawl can rotate circuits many times per
// minute, INFO is too chatty. Failures still log at WARN.
tracing::debug!(addr = %self.addr, "TOR NEWNYM signaled");
Ok(())
}
async fn build_auth_line(&self) -> anyhow::Result<String> {
match &self.auth {
TorAuth::None => Ok("AUTHENTICATE".to_string()),
TorAuth::Password(p) => Ok(format!("AUTHENTICATE \"{}\"", escape_quoted(p))),
TorAuth::Cookie(path) => {
let bytes = tokio::fs::read(path)
.await
.with_context(|| format!("read TOR cookie file {}", path.display()))?;
Ok(format!("AUTHENTICATE {}", hex_encode(&bytes)))
}
}
}
}
/// Parse `tcp://host:port`, `host:port`, or bare `host` into a
/// connect-time string. Default port is [`DEFAULT_CONTROL_PORT`].
fn parse_control_url(url: &str) -> anyhow::Result<String> {
let stripped = url.strip_prefix("tcp://").unwrap_or(url);
if stripped.is_empty() {
bail!("TOR control url is empty");
}
if stripped.contains(':') {
Ok(stripped.to_string())
} else {
Ok(format!("{stripped}:{DEFAULT_CONTROL_PORT}"))
}
}
fn escape_quoted(s: &str) -> String {
s.replace('\\', r"\\").replace('"', r#"\""#)
}
fn hex_encode(bytes: &[u8]) -> String {
let mut s = String::with_capacity(bytes.len() * 2);
for b in bytes {
s.push_str(&format!("{b:02x}"));
}
s
}
async fn write_line<W: tokio::io::AsyncWrite + Unpin>(
w: &mut W,
line: &str,
) -> anyhow::Result<()> {
w.write_all(line.as_bytes()).await?;
w.write_all(b"\r\n").await?;
w.flush().await?;
Ok(())
}
/// Drain a TOR control reply, accepting only status `250`. Handles
/// the protocol's three line forms: `XYZ ...` (single/end), `XYZ-...`
/// (continuation), `XYZ+...` (data block ended by a lone `.`). Our
/// commands only ever produce single-line `250 OK`, but we honor the
/// continuation forms so a future torrc that adds events / banners
/// doesn't confuse the parser.
async fn expect_250<R: AsyncBufReadExt + Unpin>(r: &mut R) -> anyhow::Result<()> {
loop {
let mut line = String::new();
let n = r.read_line(&mut line).await?;
if n == 0 {
bail!("TOR control port closed connection mid-reply");
}
let trimmed = line.trim_end_matches(['\r', '\n']);
if trimmed.len() < 4 {
bail!("malformed TOR control reply: {trimmed:?}");
}
let (code, rest) = trimmed.split_at(3);
if code != "250" {
bail!("TOR control replied {trimmed:?}");
}
let sep = rest.as_bytes()[0];
match sep {
b' ' => return Ok(()),
b'-' => continue,
b'+' => {
// Data block — read until a line consisting of only ".".
loop {
let mut data = String::new();
let n = r.read_line(&mut data).await?;
if n == 0 {
bail!("TOR control port closed mid-data-block");
}
if data.trim_end_matches(['\r', '\n']) == "." {
break;
}
}
}
_ => bail!("malformed TOR control reply separator: {trimmed:?}"),
}
}
}
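The three reply forms handled by `expect_250` can be exercised without a socket. The sketch below is a synchronous, std-only variant over `BufRead`; `expect_250_sync`, `skip_data_block`, and `bad` are hypothetical names, and errors are reported as `io::Error` rather than through `anyhow`.

```rust
use std::io::{self, BufRead};

fn bad(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

fn trim_eol(s: &str) -> &str {
    s.trim_end_matches(|c: char| c == '\r' || c == '\n')
}

/// Consume a `250+` data block, which ends at a line containing only ".".
fn skip_data_block<R: BufRead>(r: &mut R) -> io::Result<()> {
    loop {
        let mut data = String::new();
        if r.read_line(&mut data)? == 0 {
            return Err(bad("connection closed mid-data-block"));
        }
        if trim_eol(&data) == "." {
            return Ok(());
        }
    }
}

/// Read one control-port reply, accepting it only if every line has
/// status 250. Returns once the final (`250 ...`) line is seen.
fn expect_250_sync<R: BufRead>(r: &mut R) -> io::Result<()> {
    loop {
        let mut line = String::new();
        if r.read_line(&mut line)? == 0 {
            return Err(bad("connection closed mid-reply"));
        }
        let bytes = trim_eol(&line).as_bytes();
        // Three status digits plus one separator byte are required.
        if bytes.len() < 4 || !bytes.starts_with(b"250") {
            return Err(bad(&format!("unexpected reply: {:?}", trim_eol(&line))));
        }
        match bytes[3] {
            b' ' => return Ok(()),               // single or final line
            b'-' => continue,                    // continuation line
            b'+' => skip_data_block(&mut *r)?,   // data block, then keep reading
            _ => return Err(bad("malformed reply separator")),
        }
    }
}
```

The sketch compares bytes instead of slicing the string at index 3, which sidesteps char-boundary concerns if a peer ever sends non-ASCII input. Feeding it an in-memory `std::io::Cursor` is enough to try the continuation and data-block paths.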
#[cfg(test)]
mod tests {
use super::*;
use std::sync::{Arc, Mutex};
use tokio::io::AsyncWriteExt;
use tokio::net::TcpListener;
/// Spawn a mock control port that responds to each \r\n-terminated
/// inbound line with the next entry from `replies`. Each reply has
/// its own `\r\n` appended. Records received lines into `recorder`.
/// After `replies.len()` exchanges the task drops the socket — this
/// matches the real TOR behavior for QUIT (close after acking).
async fn spawn_mock(
replies: Vec<&'static str>,
recorder: Arc<Mutex<Vec<String>>>,
) -> String {
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let addr = listener.local_addr().unwrap().to_string();
tokio::spawn(async move {
let (sock, _) = listener.accept().await.unwrap();
let (r, mut w) = sock.into_split();
let mut r = BufReader::new(r);
for reply in replies {
let mut line = String::new();
let n = r.read_line(&mut line).await.unwrap_or(0);
if n == 0 {
return;
}
recorder
.lock()
.unwrap()
.push(line.trim_end_matches(['\r', '\n']).to_string());
w.write_all(reply.as_bytes()).await.unwrap();
w.write_all(b"\r\n").await.unwrap();
w.flush().await.unwrap();
}
});
addr
}
#[tokio::test]
async fn password_auth_then_newnym_writes_expected_sequence() {
let recorder = Arc::new(Mutex::new(Vec::new()));
// Two replies: AUTHENTICATE then SIGNAL NEWNYM. QUIT is
// fire-and-forget; the mock dropping the socket is the
// expected real-world behavior.
let addr =
spawn_mock(vec!["250 OK", "250 OK"], Arc::clone(&recorder)).await;
let controller = TorController::new(addr, TorAuth::Password("secret".into()));
controller.new_identity().await.expect("new_identity ok");
let recorded = recorder.lock().unwrap().clone();
assert_eq!(recorded.first().map(String::as_str), Some("AUTHENTICATE \"secret\""));
assert_eq!(recorded.get(1).map(String::as_str), Some("SIGNAL NEWNYM"));
}
#[tokio::test]
async fn cookie_auth_hex_encodes_file_bytes() {
let tmp = tempfile::NamedTempFile::new().unwrap();
let cookie: Vec<u8> = (0u8..32).collect();
std::fs::write(tmp.path(), &cookie).unwrap();
let recorder = Arc::new(Mutex::new(Vec::new()));
let addr =
spawn_mock(vec!["250 OK", "250 OK"], Arc::clone(&recorder)).await;
let controller =
TorController::new(addr, TorAuth::Cookie(tmp.path().to_path_buf()));
controller.new_identity().await.expect("new_identity ok");
let recorded = recorder.lock().unwrap().clone();
let expected_hex: String = cookie.iter().map(|b| format!("{b:02x}")).collect();
assert_eq!(
recorded.first().map(String::as_str),
Some(format!("AUTHENTICATE {expected_hex}").as_str())
);
}
#[tokio::test]
async fn no_auth_sends_bare_authenticate() {
let recorder = Arc::new(Mutex::new(Vec::new()));
let addr =
spawn_mock(vec!["250 OK", "250 OK"], Arc::clone(&recorder)).await;
let controller = TorController::new(addr, TorAuth::None);
controller.new_identity().await.expect("new_identity ok");
let recorded = recorder.lock().unwrap().clone();
assert_eq!(recorded.first().map(String::as_str), Some("AUTHENTICATE"));
}
#[tokio::test]
async fn non_250_reply_returns_err_with_reply_text() {
let recorder = Arc::new(Mutex::new(Vec::new()));
let addr = spawn_mock(
vec!["515 Bad authentication"],
Arc::clone(&recorder),
)
.await;
let controller =
TorController::new(addr, TorAuth::Password("wrong".into()));
let err = controller.new_identity().await.expect_err("should fail");
let msg = format!("{err:#}");
assert!(msg.contains("515"), "expected 515 in error, got: {msg}");
}
#[tokio::test]
async fn closed_connection_mid_reply_is_an_error() {
// Listener accepts the AUTH line then drops without replying —
// this exercises the EOF-mid-reply path in expect_250 (rather
// than tor's own error replies which are covered elsewhere).
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let addr = listener.local_addr().unwrap().to_string();
tokio::spawn(async move {
if let Ok((sock, _)) = listener.accept().await {
let (r, _w) = sock.into_split();
let mut r = BufReader::new(r);
let mut line = String::new();
let _ = r.read_line(&mut line).await; // read AUTH, ignore
// Drop _w (and the read half via scope exit) so the
// peer sees an immediate EOF on the next read.
}
});
let controller = TorController::new(addr, TorAuth::None);
let err = controller.new_identity().await.expect_err("should fail");
let msg = format!("{err:#}");
assert!(
msg.contains("closed connection"),
"expected EOF-mid-reply error, got: {msg}"
);
}
#[tokio::test]
async fn multi_line_250_continuation_is_accepted() {
let recorder = Arc::new(Mutex::new(Vec::new()));
// AUTHENTICATE reply uses the `250-...\r\n250 OK\r\n` form.
// Single reply string contains the whole multi-line response.
let addr = spawn_mock(
vec!["250-banner=foo\r\n250 OK", "250 OK"],
Arc::clone(&recorder),
)
.await;
let controller = TorController::new(addr, TorAuth::None);
controller.new_identity().await.expect("new_identity ok");
}
#[test]
fn from_parts_returns_none_when_url_unset() {
let c = TorController::from_parts(None, None, None).unwrap();
assert!(c.is_none());
}
#[test]
fn from_parts_prefers_cookie_over_password() {
let c = TorController::from_parts(
Some("tor:9051"),
Some("pw"),
Some(Path::new("/var/lib/tor/control_auth_cookie")),
)
.unwrap()
.expect("controller built");
assert!(matches!(c.auth, TorAuth::Cookie(_)));
}
#[test]
fn from_parts_falls_back_to_password_without_cookie() {
let c = TorController::from_parts(Some("tor:9051"), Some("pw"), None)
.unwrap()
.expect("controller built");
assert!(matches!(c.auth, TorAuth::Password(p) if p == "pw"));
}
#[test]
fn parse_control_url_accepts_tcp_scheme() {
assert_eq!(parse_control_url("tcp://127.0.0.1:9051").unwrap(), "127.0.0.1:9051");
}
#[test]
fn parse_control_url_defaults_port_when_omitted() {
assert_eq!(parse_control_url("tor").unwrap(), "tor:9051");
}
#[test]
fn parse_control_url_passes_through_host_port() {
assert_eq!(parse_control_url("tor:9999").unwrap(), "tor:9999");
}
#[test]
fn parse_control_url_rejects_empty() {
assert!(parse_control_url("").is_err());
assert!(parse_control_url("tcp://").is_err());
}
#[test]
fn escape_quoted_handles_quotes_and_backslashes() {
assert_eq!(escape_quoted(r#"a"b\c"#), r#"a\"b\\c"#);
}
#[test]
fn debug_format_redacts_password_and_cookie_path() {
// Regression: app.rs / bin/crawler.rs log the controller at
// startup via `tracing::info!(?t, ...)`. A derived Debug on
// TorAuth would expand TorAuth::Password(p) and leak the
// plaintext into logs.
let c = TorController::new("tor:9051", TorAuth::Password("super-secret".into()));
let dbg = format!("{c:?}");
assert!(!dbg.contains("super-secret"), "password leaked: {dbg}");
assert!(dbg.contains("<redacted>"), "expected <redacted>, got: {dbg}");
let c = TorController::new(
"tor:9051",
TorAuth::Cookie("/var/lib/tor/control_auth_cookie".into()),
);
let dbg = format!("{c:?}");
assert!(!dbg.contains("control_auth_cookie"), "cookie path leaked: {dbg}");
}
#[test]
fn hex_encode_zero_pads_low_bytes() {
assert_eq!(hex_encode(&[0x00, 0x0f, 0xff]), "000fff");
}
}
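
The `non_250_reply_returns_err_with_reply_text`, `closed_connection_mid_reply_is_an_error` and `multi_line_250_continuation_is_accepted` tests above pin three outcomes of reading a control-port reply. The block below is a minimal sketch of that classification over an already-buffered reply string. It is not the project's `expect_250`, which reads from a socket and is not shown in this section; `ReplyCheck` and `check_250` are hypothetical names introduced for illustration. It relies only on the Tor control protocol's line format: a three-digit status code followed by `-` for a continuation line or a space for the final line.

```rust
/// Hypothetical result type for scanning one buffered control-port reply.
#[derive(Debug, PartialEq, Eq)]
pub enum ReplyCheck {
    /// A final `250 ...` line was seen: the command succeeded.
    Ok,
    /// Input ended before a final `250 ` line (the EOF-mid-reply case).
    Incomplete,
    /// A non-250 or malformed line was seen; carries the offending line.
    Rejected(String),
}

/// Hypothetical helper: classify a reply such as "250-banner=foo\r\n250 OK".
/// Data replies (`250+`) are out of scope for this sketch and are rejected.
pub fn check_250(reply: &str) -> ReplyCheck {
    // `str::lines` splits on `\n` and also strips a trailing `\r`.
    for line in reply.lines() {
        let bytes = line.as_bytes();
        if bytes.len() < 4 || !line.starts_with("250") {
            return ReplyCheck::Rejected(line.to_string());
        }
        match bytes[3] {
            b' ' => return ReplyCheck::Ok, // final line of the reply
            b'-' => continue,              // continuation line, keep reading
            _ => return ReplyCheck::Rejected(line.to_string()),
        }
    }
    ReplyCheck::Incomplete
}
```

Returning the offending line in `Rejected` mirrors what the second test asserts: the error text should still contain the `515` status so the operator can see why authentication failed.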


@@ -91,6 +91,26 @@ pub fn registrable_domain(url: &str) -> Option<String> {
Some(format!(".{}", registrable.join(".")))
}
/// Normalise a SOCKS proxy URL for Chromium's `--proxy-server=` flag.
///
/// reqwest accepts both `socks5://` (resolve locally) and
/// `socks5h://` (resolve via the SOCKS server — important when the
/// proxy is TOR and we don't want the host's resolver to see the
/// target hostname). Chromium does **not** know the `socks5h` scheme
/// and refuses navigations with `ERR_NO_SUPPORTED_PROXIES`. It
/// already sends destination hostnames over SOCKS5 by default
/// regardless, so stripping the `h` is a pure scheme rename — the
/// remote-DNS behaviour is preserved.
///
/// Non-SOCKS schemes pass through unchanged.
pub fn chromium_proxy_arg(proxy: &str) -> String {
if let Some(rest) = proxy.strip_prefix("socks5h://") {
format!("socks5://{rest}")
} else {
proxy.to_string()
}
}
#[cfg(test)]
mod tests {
use super::*;
@@ -191,4 +211,34 @@ mod tests {
Some("[2001:db8::1]")
);
}
#[test]
fn chromium_proxy_arg_strips_socks5h_to_socks5() {
// Regression: passing socks5h:// to Chromium yields
// ERR_NO_SUPPORTED_PROXIES at navigation time.
assert_eq!(
chromium_proxy_arg("socks5h://127.0.0.1:9050"),
"socks5://127.0.0.1:9050"
);
assert_eq!(
chromium_proxy_arg("socks5h://tor:9050"),
"socks5://tor:9050"
);
}
#[test]
fn chromium_proxy_arg_passes_socks5_unchanged() {
assert_eq!(
chromium_proxy_arg("socks5://127.0.0.1:9050"),
"socks5://127.0.0.1:9050"
);
}
#[test]
fn chromium_proxy_arg_passes_non_socks_unchanged() {
assert_eq!(
chromium_proxy_arg("http://proxy.example:8080"),
"http://proxy.example:8080"
);
}
}
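
As a usage illustration for the hunk above, the sketch below shows how the normalised value might be turned into Chromium launch arguments. `chromium_proxy_arg` is copied from the diff so the block compiles on its own; `chromium_proxy_flags` is a hypothetical helper that does not appear in this change, and the real launch code may assemble its arguments differently.

```rust
/// Copied from the diff above: rename `socks5h://` to `socks5://`,
/// pass every other scheme through unchanged.
pub fn chromium_proxy_arg(proxy: &str) -> String {
    if let Some(rest) = proxy.strip_prefix("socks5h://") {
        format!("socks5://{rest}")
    } else {
        proxy.to_string()
    }
}

/// Hypothetical helper: build the `--proxy-server=` flag for an optional
/// proxy URL. No proxy configured means no extra flags.
pub fn chromium_proxy_flags(proxy: Option<&str>) -> Vec<String> {
    match proxy {
        Some(p) => vec![format!("--proxy-server={}", chromium_proxy_arg(p))],
        None => Vec::new(),
    }
}
```

Keeping the rename in one small function lets the same configured proxy URL be handed to reqwest untouched while Chromium receives the scheme it understands.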


@@ -21,6 +21,11 @@ pub enum AppError {
PayloadTooLarge(String),
#[error("unsupported media type: {0}")]
UnsupportedMediaType(String),
/// 503: a feature is currently unavailable, distinct from a 500
/// internal error. Used when admin actions require the crawler
/// daemon but it's been disabled (`CRAWLER_DAEMON=false`).
#[error("service unavailable: {0}")]
ServiceUnavailable(String),
/// 429 with an optional `Retry-After` header value (in seconds).
#[error("too many requests")]
TooManyRequests {
@@ -56,6 +61,7 @@ impl AppError {
AppError::Conflict(_) => "conflict",
AppError::PayloadTooLarge(_) => "payload_too_large",
AppError::UnsupportedMediaType(_) => "unsupported_media_type",
AppError::ServiceUnavailable(_) => "service_unavailable",
AppError::TooManyRequests { .. } => "too_many_requests",
AppError::ValidationFailed { .. } => "validation_failed",
AppError::Database(sqlx::Error::RowNotFound) => "not_found",
@@ -85,6 +91,9 @@ impl IntoResponse for AppError {
AppError::UnsupportedMediaType(msg) => {
(StatusCode::UNSUPPORTED_MEDIA_TYPE, msg.clone(), None)
}
AppError::ServiceUnavailable(msg) => {
(StatusCode::SERVICE_UNAVAILABLE, msg.clone(), None)
}
AppError::TooManyRequests { retry_after_secs } => {
// Emit `Retry-After: N` (RFC 6585 §4) so a well-behaved
// client can back off correctly. Done by building the


@@ -12,15 +12,20 @@ pub async fn list_for_manga(
  limit: i64,
  offset: i64,
  ) -> AppResult<Vec<Chapter>> {
- // Secondary sort by created_at gives duplicate-numbered chapters
- // (multiple uploaders/translations of the same number) a stable
- // order in lists and prev/next reader navigation.
+ // Display order = source-site order reversed. The crawler stamps
+ // `source_index` = position in the source DOM (0 = first = newest
+ // on this site, see migration 0021), so DESC puts the oldest
+ // chapter first and keeps the site's variant grouping and the
+ // placement of non-numeric entries (e.g. "notice. : Officials")
+ // intact. NULLS LAST keeps user-uploaded chapters (no source row)
+ // and rows that pre-date the migration below crawled rows; the
+ // (number, created_at) tail then orders them deterministically.
  let rows = sqlx::query_as::<_, Chapter>(
  r#"
  SELECT id, manga_id, number, title, page_count, created_at
  FROM chapters
  WHERE manga_id = $1
- ORDER BY number ASC, created_at ASC
+ ORDER BY source_index DESC NULLS LAST, number ASC, created_at ASC
  LIMIT $2 OFFSET $3
  "#,
  )
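
To make the new `ORDER BY source_index DESC NULLS LAST, number ASC, created_at ASC` clause concrete, here is an in-memory sketch of the same comparison. It is an illustration only: the real ordering happens in PostgreSQL, and `ChapterRow`, `display_order` and `sorted_labels` are hypothetical names, with `created_at` reduced to a plain integer timestamp.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the columns the list query sorts on.
#[derive(Debug, Clone)]
pub struct ChapterRow {
    pub label: &'static str,
    /// Position in the source DOM (0 = newest); None for uploads and
    /// rows that pre-date the migration.
    pub source_index: Option<i32>,
    pub number: i32,
    /// Simplified timestamp (assumption: larger means later).
    pub created_at: u64,
}

/// Mirrors `source_index DESC NULLS LAST, number ASC, created_at ASC`.
pub fn display_order(a: &ChapterRow, b: &ChapterRow) -> Ordering {
    let by_source = match (a.source_index, b.source_index) {
        (Some(x), Some(y)) => y.cmp(&x),        // DESC: larger index first
        (Some(_), None) => Ordering::Less,      // NULLS LAST
        (None, Some(_)) => Ordering::Greater,   // NULLS LAST
        (None, None) => Ordering::Equal,        // fall through to the tail
    };
    by_source
        .then(a.number.cmp(&b.number))
        .then(a.created_at.cmp(&b.created_at))
}

/// Sort rows into display order and return their labels.
pub fn sorted_labels(mut rows: Vec<ChapterRow>) -> Vec<&'static str> {
    rows.sort_by(display_order);
    rows.iter().map(|r| r.label).collect()
}
```

With a source page listing "Ch.14 : Official", "Ch.14 : PH", "notice. : Officials", "Ch.13" from top to bottom (indices 0 to 3) plus one user upload with no index, this comparison yields Ch.13, the notice, the two Ch.14 variants side by side, and the upload last. The non-numeric notice stays where the site placed it instead of jumping to the top with `number = 0`.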


@@ -352,7 +352,14 @@ pub async fn sync_manga_chapters(
  .map(|c| c.source_chapter_key.clone())
  .collect();
- for c in chapters {
+ for (idx, c) in chapters.iter().enumerate() {
+ // `source_index` captures the chapter's position in the source
+ // DOM (0 = first = newest on this site) so the list query can
+ // reverse it for the user-facing list; see migration 0021.
+ // Every sync overwrites the value on both branches, so a new
+ // chapter inserted at the top of the source shifts every other
+ // row down by one on the next tick.
+ let source_index = idx as i32;
  // Lookup is constrained by manga_id (via the chapters join) so a
  // source whose chapter slugs collide across mangas (e.g.
  // "chapter-1" appearing under two different mangas) attributes
@@ -382,14 +389,15 @@ pub async fn sync_manga_chapters(
  // identity is the UUID, not the number.
  let (chapter_id,): (Uuid,) = sqlx::query_as(
  r#"
- INSERT INTO chapters (manga_id, number, title, page_count)
- VALUES ($1, $2, $3, 0)
+ INSERT INTO chapters (manga_id, number, title, page_count, source_index)
+ VALUES ($1, $2, $3, 0, $4)
  RETURNING id
  "#,
  )
  .bind(manga_id)
  .bind(c.number)
  .bind(c.title.as_deref())
+ .bind(source_index)
  .fetch_one(&mut *tx)
  .await?;
  sqlx::query(
@@ -408,8 +416,11 @@ pub async fn sync_manga_chapters(
  diff.new += 1;
  }
  Some((chapter_id,)) => {
- sqlx::query("UPDATE chapters SET title = $1 WHERE id = $2")
+ sqlx::query(
+ "UPDATE chapters SET title = $1, source_index = $2 WHERE id = $3",
+ )
  .bind(c.title.as_deref())
+ .bind(source_index)
  .bind(chapter_id)
  .execute(&mut *tx)
  .await?;
@@ -542,6 +553,51 @@ pub async fn mark_run_completed(pool: &PgPool, source_id: &str) -> sqlx::Result<
Ok(())
}
/// List mangas whose `cover_image_path IS NULL` but a live
/// `manga_sources` row still attaches them to a source. The bounded
/// result feeds the cover-backfill pass in [`crate::crawler::pipeline`]:
/// each entry is one (manga, freshest source row) pair where a cover
/// re-download is in order.
///
/// Per-manga deduplication uses `DISTINCT ON (m.id)` keyed on the row
/// with the newest `last_seen_at`, so a manga that's surfaced by
/// multiple sources only produces one row (the freshest). Sort is
/// stable for tests.
pub async fn list_missing_covers(
pool: &PgPool,
max: i64,
) -> sqlx::Result<Vec<MissingCoverEntry>> {
let rows: Vec<(Uuid, String, String)> = sqlx::query_as(
r#"
SELECT DISTINCT ON (m.id) m.id, ms.source_manga_key, ms.source_url
FROM mangas m
JOIN manga_sources ms ON ms.manga_id = m.id
WHERE m.cover_image_path IS NULL
AND ms.dropped_at IS NULL
ORDER BY m.id, ms.last_seen_at DESC
LIMIT $1
"#,
)
.bind(max)
.fetch_all(pool)
.await?;
Ok(rows
.into_iter()
.map(|(manga_id, source_manga_key, source_url)| MissingCoverEntry {
manga_id,
source_manga_key,
source_url,
})
.collect())
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MissingCoverEntry {
pub manga_id: Uuid,
pub source_manga_key: String,
pub source_url: String,
}
/// Read the recovery flag for `source_id`. A missing row OR an
/// unparseable value reads as `true` ("clean"): the former covers the
/// first-ever run on a virgin DB (no recovery needed), the latter


@@ -0,0 +1,350 @@
//! Integration tests for the admin force-resync endpoints.
//!
//! Real resync work requires Chromium, so these tests swap in a stub
//! [`ResyncService`] to assert the handler-level contract: routing,
//! admin gate, 503 when the daemon is disabled, 404 / 422 mapping for
//! missing-resource / no-source cases, and the audit-log side effect.
mod common;
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use async_trait::async_trait;
use axum::http::StatusCode;
use serde_json::json;
use sqlx::PgPool;
use tower::ServiceExt;
use uuid::Uuid;
use mangalord::crawler::resync::{
ChapterResyncOutcome, MangaResyncOutcome, ResyncError, ResyncService,
};
use mangalord::repo;
use mangalord::repo::crawler::UpsertStatus;
/// Stub that records call counts and returns a canned outcome.
struct StubResync {
manga_calls: AtomicUsize,
chapter_calls: AtomicUsize,
/// When true, returns NoMangaSource / NoChapterSource.
no_source: bool,
}
impl StubResync {
fn new() -> Arc<Self> {
Arc::new(Self {
manga_calls: AtomicUsize::new(0),
chapter_calls: AtomicUsize::new(0),
no_source: false,
})
}
fn no_source() -> Arc<Self> {
Arc::new(Self {
manga_calls: AtomicUsize::new(0),
chapter_calls: AtomicUsize::new(0),
no_source: true,
})
}
}
#[async_trait]
impl ResyncService for StubResync {
async fn resync_manga(&self, manga_id: Uuid) -> anyhow::Result<MangaResyncOutcome> {
self.manga_calls.fetch_add(1, Ordering::SeqCst);
if self.no_source {
return Err(ResyncError::NoMangaSource.into());
}
Ok(MangaResyncOutcome {
manga_id,
metadata_status: UpsertStatus::Updated,
cover_fetched: true,
})
}
async fn resync_chapter(&self, chapter_id: Uuid) -> anyhow::Result<ChapterResyncOutcome> {
self.chapter_calls.fetch_add(1, Ordering::SeqCst);
if self.no_source {
return Err(ResyncError::NoChapterSource.into());
}
Ok(ChapterResyncOutcome::Fetched {
chapter_id,
pages: 7,
})
}
}
async fn promote_admin(pool: &PgPool, username: &str) {
let u = repo::user::find_by_username(pool, username)
.await
.unwrap()
.unwrap();
repo::user::set_is_admin_unchecked(pool, u.id, true)
.await
.unwrap();
}
async fn insert_manga(pool: &PgPool, title: &str) -> Uuid {
let (id,): (Uuid,) = sqlx::query_as(
"INSERT INTO mangas (title, status, alt_titles) VALUES ($1, 'ongoing', ARRAY[]::text[]) RETURNING id",
)
.bind(title)
.fetch_one(pool)
.await
.unwrap();
id
}
async fn insert_chapter(pool: &PgPool, manga_id: Uuid, number: i32, pages: i32) -> Uuid {
let (id,): (Uuid,) = sqlx::query_as(
"INSERT INTO chapters (manga_id, number, title, page_count) VALUES ($1, $2, NULL, $3) RETURNING id",
)
.bind(manga_id)
.bind(number)
.bind(pages)
.fetch_one(pool)
.await
.unwrap();
id
}
// ----- manga resync ---------------------------------------------------------
#[sqlx::test(migrations = "./migrations")]
async fn manga_resync_calls_service_and_returns_refreshed_detail(pool: PgPool) {
let stub = StubResync::new();
let h = common::harness_with_resync(pool.clone(), stub.clone());
let (username, cookie) = common::register_user(&h.app).await;
promote_admin(&pool, &username).await;
let manga_id = insert_manga(&pool, "Hello").await;
let resp = h
.app
.oneshot(common::post_json_with_cookie(
&format!("/api/v1/admin/mangas/{manga_id}/resync"),
json!({}),
&cookie,
))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::OK);
let body = common::body_json(resp).await;
// Stub returned Updated + cover_fetched=true.
assert_eq!(body["metadata_status"], "updated");
assert_eq!(body["cover_fetched"], true);
// Response includes the refreshed manga detail.
assert_eq!(body["manga"]["id"], manga_id.to_string());
assert_eq!(body["manga"]["title"], "Hello");
assert_eq!(stub.manga_calls.load(Ordering::SeqCst), 1);
// Audit row written.
let (audit_count,): (i64,) =
sqlx::query_as("SELECT count(*) FROM admin_audit WHERE action = 'manga_resync' AND target_id = $1")
.bind(manga_id)
.fetch_one(&pool)
.await
.unwrap();
assert_eq!(audit_count, 1);
}
#[sqlx::test(migrations = "./migrations")]
async fn manga_resync_returns_404_for_unknown_id(pool: PgPool) {
let stub = StubResync::new();
let h = common::harness_with_resync(pool.clone(), stub.clone());
let (username, cookie) = common::register_user(&h.app).await;
promote_admin(&pool, &username).await;
let resp = h
.app
.oneshot(common::post_json_with_cookie(
&format!("/api/v1/admin/mangas/{}/resync", Uuid::new_v4()),
json!({}),
&cookie,
))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::NOT_FOUND);
// Service must not have been called when the manga doesn't exist.
assert_eq!(stub.manga_calls.load(Ordering::SeqCst), 0);
}
#[sqlx::test(migrations = "./migrations")]
async fn manga_resync_maps_no_source_to_422(pool: PgPool) {
let stub = StubResync::no_source();
let h = common::harness_with_resync(pool.clone(), stub);
let (username, cookie) = common::register_user(&h.app).await;
promote_admin(&pool, &username).await;
let manga_id = insert_manga(&pool, "Manual upload, no crawler source").await;
let resp = h
.app
.oneshot(common::post_json_with_cookie(
&format!("/api/v1/admin/mangas/{manga_id}/resync"),
json!({}),
&cookie,
))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::UNPROCESSABLE_ENTITY);
let body = common::body_json(resp).await;
assert_eq!(body["error"]["details"]["manga"], "no_source");
}
#[sqlx::test(migrations = "./migrations")]
async fn manga_resync_returns_503_when_daemon_disabled(pool: PgPool) {
let h = common::harness(pool.clone());
let (username, cookie) = common::register_user(&h.app).await;
promote_admin(&pool, &username).await;
let manga_id = insert_manga(&pool, "Z").await;
let resp = h
.app
.oneshot(common::post_json_with_cookie(
&format!("/api/v1/admin/mangas/{manga_id}/resync"),
json!({}),
&cookie,
))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::SERVICE_UNAVAILABLE);
let body = common::body_json(resp).await;
assert_eq!(body["error"]["code"], "service_unavailable");
}
#[sqlx::test(migrations = "./migrations")]
async fn manga_resync_requires_admin(pool: PgPool) {
let stub = StubResync::new();
let h = common::harness_with_resync(pool.clone(), stub);
// Non-admin user.
let (_u, cookie) = common::register_user(&h.app).await;
let manga_id = insert_manga(&pool, "M").await;
let resp = h
.app
.oneshot(common::post_json_with_cookie(
&format!("/api/v1/admin/mangas/{manga_id}/resync"),
json!({}),
&cookie,
))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::FORBIDDEN);
}
// ----- chapter resync -------------------------------------------------------
#[sqlx::test(migrations = "./migrations")]
async fn chapter_resync_calls_service_and_returns_refreshed_chapter(pool: PgPool) {
let stub = StubResync::new();
let h = common::harness_with_resync(pool.clone(), stub.clone());
let (username, cookie) = common::register_user(&h.app).await;
promote_admin(&pool, &username).await;
let manga_id = insert_manga(&pool, "M").await;
let chapter_id = insert_chapter(&pool, manga_id, 1, 0).await;
let resp = h
.app
.oneshot(common::post_json_with_cookie(
&format!("/api/v1/admin/chapters/{chapter_id}/resync"),
json!({}),
&cookie,
))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::OK);
let body = common::body_json(resp).await;
assert_eq!(body["outcome"], "fetched");
assert_eq!(body["pages"], 7);
assert_eq!(body["chapter"]["id"], chapter_id.to_string());
assert_eq!(stub.chapter_calls.load(Ordering::SeqCst), 1);
let (audit_count,): (i64,) = sqlx::query_as(
"SELECT count(*) FROM admin_audit WHERE action = 'chapter_resync' AND target_id = $1",
)
.bind(chapter_id)
.fetch_one(&pool)
.await
.unwrap();
assert_eq!(audit_count, 1);
}
#[sqlx::test(migrations = "./migrations")]
async fn chapter_resync_returns_404_for_unknown_id(pool: PgPool) {
let stub = StubResync::new();
let h = common::harness_with_resync(pool.clone(), stub.clone());
let (username, cookie) = common::register_user(&h.app).await;
promote_admin(&pool, &username).await;
let resp = h
.app
.oneshot(common::post_json_with_cookie(
&format!("/api/v1/admin/chapters/{}/resync", Uuid::new_v4()),
json!({}),
&cookie,
))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::NOT_FOUND);
assert_eq!(stub.chapter_calls.load(Ordering::SeqCst), 0);
}
#[sqlx::test(migrations = "./migrations")]
async fn chapter_resync_maps_no_source_to_422(pool: PgPool) {
let stub = StubResync::no_source();
let h = common::harness_with_resync(pool.clone(), stub);
let (username, cookie) = common::register_user(&h.app).await;
promote_admin(&pool, &username).await;
let manga_id = insert_manga(&pool, "M").await;
let chapter_id = insert_chapter(&pool, manga_id, 1, 0).await;
let resp = h
.app
.oneshot(common::post_json_with_cookie(
&format!("/api/v1/admin/chapters/{chapter_id}/resync"),
json!({}),
&cookie,
))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::UNPROCESSABLE_ENTITY);
let body = common::body_json(resp).await;
assert_eq!(body["error"]["details"]["chapter"], "no_source");
}
#[sqlx::test(migrations = "./migrations")]
async fn chapter_resync_returns_503_when_daemon_disabled(pool: PgPool) {
let h = common::harness(pool.clone());
let (username, cookie) = common::register_user(&h.app).await;
promote_admin(&pool, &username).await;
let manga_id = insert_manga(&pool, "M").await;
let chapter_id = insert_chapter(&pool, manga_id, 1, 0).await;
let resp = h
.app
.oneshot(common::post_json_with_cookie(
&format!("/api/v1/admin/chapters/{chapter_id}/resync"),
json!({}),
&cookie,
))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::SERVICE_UNAVAILABLE);
}
#[sqlx::test(migrations = "./migrations")]
async fn chapter_resync_requires_admin(pool: PgPool) {
let stub = StubResync::new();
let h = common::harness_with_resync(pool.clone(), stub);
let (_u, cookie) = common::register_user(&h.app).await;
let manga_id = insert_manga(&pool, "M").await;
let chapter_id = insert_chapter(&pool, manga_id, 1, 0).await;
let resp = h
.app
.oneshot(common::post_json_with_cookie(
&format!("/api/v1/admin/chapters/{chapter_id}/resync"),
json!({}),
&cookie,
))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::FORBIDDEN);
}


@@ -49,6 +49,7 @@ fn admin_test_router(pool: PgPool) -> (Router, TempDir) {
auth,
upload: UploadConfig::default(),
auth_limiter,
resync: None,
};
let app = Router::new()
.nest("/api/v1", api::routes())


@@ -0,0 +1,189 @@
//! Site-wide auth gate (`PRIVATE_MODE=true`).
//!
//! With private mode on, every API path except a small allowlist
//! (`/health`, `/auth/config`, `/auth/login`, `/auth/logout`) requires
//! a valid session cookie or bearer token, and `/auth/register` is
//! force-blocked regardless of `ALLOW_SELF_REGISTER`. With private mode
//! off (the default), nothing changes; the `public_mode_*` tests
//! pin that regression guard.
mod common;
use serde_json::json;
use sqlx::PgPool;
use tower::ServiceExt;
use axum::http::StatusCode;
#[sqlx::test(migrations = "./migrations")]
async fn private_mode_blocks_anonymous_manga_list(pool: PgPool) {
let h = common::harness_with_private_mode(pool);
let resp = h.app.oneshot(common::get("/api/v1/mangas")).await.unwrap();
assert_eq!(resp.status(), StatusCode::UNAUTHORIZED);
}
#[sqlx::test(migrations = "./migrations")]
async fn private_mode_blocks_anonymous_files(pool: PgPool) {
let h = common::harness_with_private_mode(pool);
// The path doesn't have to exist — the guard runs before routing,
// so the response is 401 (not 404). That's the property the test
// is pinning: nothing leaks via crafted URLs.
let resp = h
.app
.oneshot(common::get("/api/v1/files/anything.png"))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::UNAUTHORIZED);
}
#[sqlx::test(migrations = "./migrations")]
async fn private_mode_allows_session_cookie_read(pool: PgPool) {
// Register through a non-private harness sharing the same DB pool
// so the session row exists. Then exercise the gate using a fresh
// private-mode harness against the same DB.
let public = common::harness(pool.clone());
let (_, cookie) = common::register_user(&public.app).await;
let private = common::harness_with_private_mode(pool);
let resp = private
.app
.oneshot(common::get_with_cookie("/api/v1/mangas", &cookie))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::OK);
}
#[sqlx::test(migrations = "./migrations")]
async fn private_mode_allows_bearer_token_read(pool: PgPool) {
let public = common::harness(pool.clone());
let (_, cookie) = common::register_user(&public.app).await;
let resp = public
.app
.clone()
.oneshot(common::post_json_with_cookie(
"/api/v1/auth/tokens",
json!({ "name": "private-mode-bot" }),
&cookie,
))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::CREATED);
let body = common::body_json(resp).await;
let bearer = body["bearer"].as_str().unwrap().to_string();
let private = common::harness_with_private_mode(pool);
let resp = private
.app
.oneshot(common::get_with_bearer("/api/v1/mangas", &bearer))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::OK);
}
#[sqlx::test(migrations = "./migrations")]
async fn private_mode_allows_login_endpoint_anonymous(pool: PgPool) {
// Seed a user via the public harness so login has credentials to
// verify against.
let public = common::harness(pool.clone());
let _ = public
.app
.clone()
.oneshot(common::post_json(
"/api/v1/auth/register",
json!({ "username": "alice", "password": "hunter2hunter2" }),
))
.await
.unwrap();
let private = common::harness_with_private_mode(pool);
let resp = private
.app
.oneshot(common::post_json(
"/api/v1/auth/login",
json!({ "username": "alice", "password": "hunter2hunter2" }),
))
.await
.unwrap();
// Reaches the login handler and succeeds — *not* 401 from the
// gate. That's the property we're pinning.
assert_eq!(resp.status(), StatusCode::OK);
}
#[sqlx::test(migrations = "./migrations")]
async fn private_mode_allows_health_and_config_anonymous(pool: PgPool) {
let h = common::harness_with_private_mode(pool);
let r = h
.app
.clone()
.oneshot(common::get("/api/v1/health"))
.await
.unwrap();
assert_eq!(r.status(), StatusCode::OK);
let r = h
.app
.oneshot(common::get("/api/v1/auth/config"))
.await
.unwrap();
assert_eq!(r.status(), StatusCode::OK);
}
#[sqlx::test(migrations = "./migrations")]
async fn private_mode_blocks_register_even_when_self_register_enabled(pool: PgPool) {
// harness_with_private_mode keeps `allow_self_register=true` (the
// default) — private mode is supposed to force-block register
// regardless. That's what this test pins.
let h = common::harness_with_private_mode(pool);
let resp = h
.app
.oneshot(common::post_json(
"/api/v1/auth/register",
json!({ "username": "alice", "password": "hunter2hunter2" }),
))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::FORBIDDEN);
let body = common::body_json(resp).await;
assert_eq!(body["error"]["code"], "forbidden");
}
#[sqlx::test(migrations = "./migrations")]
async fn auth_config_reports_private_mode_and_effective_self_register(pool: PgPool) {
let h = common::harness_with_private_mode(pool);
let resp = h
.app
.oneshot(common::get("/api/v1/auth/config"))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::OK);
let body = common::body_json(resp).await;
assert_eq!(body["private_mode"], true);
// Effective value: `allow_self_register && !private_mode` is false
// here even though the raw `allow_self_register` is true.
assert_eq!(body["self_register_enabled"], false);
}
#[sqlx::test(migrations = "./migrations")]
async fn public_mode_does_not_gate_anonymous_reads(pool: PgPool) {
// Regression guard: with private_mode off (the default), the gate
// must be a no-op so existing public deployments stay public.
let h = common::harness(pool);
let resp = h.app.oneshot(common::get("/api/v1/mangas")).await.unwrap();
assert_eq!(resp.status(), StatusCode::OK);
}
#[sqlx::test(migrations = "./migrations")]
async fn public_mode_reports_private_mode_false(pool: PgPool) {
let h = common::harness(pool);
let resp = h
.app
.oneshot(common::get("/api/v1/auth/config"))
.await
.unwrap();
assert_eq!(resp.status(), StatusCode::OK);
let body = common::body_json(resp).await;
assert_eq!(body["private_mode"], false);
assert_eq!(body["self_register_enabled"], true);
}
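
The decision table these tests pin can be summarised as a pure function. The block below is a sketch under stated assumptions, not the project's middleware: `GateDecision`, `gate` and `self_register_enabled` are hypothetical names, the allowlist paths are taken from the module doc and the test URLs above, and in the real code the 403 for `/auth/register` may come from the handler rather than from the gate itself.

```rust
/// Hypothetical outcome of the site-wide auth gate for one request.
#[derive(Debug, PartialEq, Eq)]
pub enum GateDecision {
    /// Request proceeds to routing (handlers may still apply their own checks).
    Allow,
    /// 401: private mode is on and the caller has no valid session or token.
    Unauthorized,
    /// 403: registration is force-blocked in private mode.
    Forbidden,
}

/// Paths reachable anonymously in private mode, per the module doc.
const ANON_ALLOWLIST: [&str; 4] = [
    "/api/v1/health",
    "/api/v1/auth/config",
    "/api/v1/auth/login",
    "/api/v1/auth/logout",
];

/// Sketch of the gate. `authenticated` stands for "valid session cookie
/// or bearer token".
pub fn gate(private_mode: bool, path: &str, authenticated: bool) -> GateDecision {
    if !private_mode {
        return GateDecision::Allow; // public deployments: gate is a no-op
    }
    if path == "/api/v1/auth/register" {
        return GateDecision::Forbidden; // regardless of ALLOW_SELF_REGISTER
    }
    if authenticated || ANON_ALLOWLIST.iter().any(|p| *p == path) {
        GateDecision::Allow
    } else {
        GateDecision::Unauthorized
    }
}

/// Effective value reported by `/auth/config` as `self_register_enabled`.
pub fn self_register_enabled(allow_self_register: bool, private_mode: bool) -> bool {
    allow_self_register && !private_mode
}
```

Because the check depends only on the path and the authentication state, an unknown path such as `/api/v1/files/anything.png` is rejected with 401 before routing could reveal whether it exists, which is the property `private_mode_blocks_anonymous_files` asserts.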


@@ -74,6 +74,10 @@ fn harness_with_auth_config(
max_file_bytes: 256 * 1024,
},
auth_limiter,
// Default harness has no crawler daemon wired up; admin resync
// handlers return 503 in this config. Tests that need a stub
// resync service swap it in via `harness_with_resync`.
resync: None,
};
Harness { app: router(state), _storage_dir: storage_dir }
}
@@ -92,6 +96,21 @@ pub fn harness_with_self_register_disabled(pool: PgPool) -> Harness {
harness_with_auth_config(pool, storage, storage_dir, auth)
}
/// Like [`harness`] but flips `PRIVATE_MODE` on so the site-wide auth
/// gate is exercised. `allow_self_register` stays at its default `true`
/// to verify that private mode force-disables self-registration on top
/// of whatever `ALLOW_SELF_REGISTER` says.
pub fn harness_with_private_mode(pool: PgPool) -> Harness {
let storage_dir = tempfile::tempdir().expect("tempdir");
let storage = Arc::new(LocalStorage::new(storage_dir.path()));
let auth = AuthConfig {
cookie_secure: false,
private_mode: true,
..AuthConfig::default()
};
harness_with_auth_config(pool, storage, storage_dir, auth)
}
/// Like [`harness`] but configures a tight auth rate limit. Used by
/// the brute-force-rate-limiting test.
pub fn harness_with_auth_rate_limit(
@@ -109,6 +128,37 @@ pub fn harness_with_auth_rate_limit(
harness_with_auth_config(pool, storage, storage_dir, auth)
}
/// Like [`harness`] but slots a caller-supplied [`ResyncService`] stub
/// into `AppState.resync`. Used by the admin resync tests so the
/// endpoint path is exercised without standing up a real Chromium.
pub fn harness_with_resync(
pool: PgPool,
resync: Arc<dyn mangalord::crawler::resync::ResyncService>,
) -> Harness {
let storage_dir = tempfile::tempdir().expect("tempdir");
let storage = Arc::new(LocalStorage::new(storage_dir.path()));
let auth = AuthConfig {
cookie_secure: false,
..AuthConfig::default()
};
let auth_limiter = Arc::new(AuthRateLimiter::new(auth.rate_limit));
let state = AppState {
db: pool,
storage,
auth,
upload: UploadConfig {
max_request_bytes: 4 * 1024 * 1024,
max_file_bytes: 256 * 1024,
},
auth_limiter,
resync: Some(resync),
};
Harness {
app: router(state),
_storage_dir: storage_dir,
}
}
/// Wraps a real `Storage` and fails on the N-th `put` call so tests can
/// assert that handlers roll their DB writes back when storage errors
/// mid-upload. Reads and other operations delegate to `inner`.


@@ -517,3 +517,132 @@ async fn enqueue_bookmarked_pending_resumes_after_quarantine_expires(pool: PgPoo
);
}
/// Helper: insert a chapter with the given `number` and a non-dropped
/// source row, returning the chapter id. Used by the ordering tests so
/// the setup boilerplate doesn't drown the assertion.
async fn insert_pending_chapter(
pool: &PgPool,
manga_id: Uuid,
number: i32,
source_chapter_key: &str,
) -> Uuid {
let chapter_id: Uuid = sqlx::query_scalar(
"INSERT INTO chapters (manga_id, number, page_count) VALUES ($1, $2, 0) RETURNING id",
)
.bind(manga_id)
.bind(number)
.fetch_one(pool)
.await
.unwrap();
sqlx::query(
"INSERT INTO chapter_sources (source_id, source_chapter_key, chapter_id, source_url) \
VALUES ($1, $2, $3, $4)",
)
.bind("target")
.bind(source_chapter_key)
.bind(chapter_id)
.bind(format!("https://example.com/{source_chapter_key}"))
.execute(pool)
.await
.unwrap();
chapter_id
}
#[sqlx::test(migrations = "./migrations")]
async fn enqueue_bookmarked_pending_queues_chapters_in_ascending_number_order(pool: PgPool) {
// Insert chapters with `number` values 3, 1, 2 in that insertion
// order — so `created_at` order (the previous tiebreaker) does NOT
// match number order. After enqueue + lease, the worker should see
// chapters 1, 2, 3 in that sequence.
let user_id: Uuid = sqlx::query_scalar(
"INSERT INTO users (username, password_hash) VALUES ($1, $2) RETURNING id",
)
.bind("alice")
.bind("not-a-real-hash")
.fetch_one(&pool)
.await
.unwrap();
let manga_id: Uuid = sqlx::query_scalar("INSERT INTO mangas (title) VALUES ($1) RETURNING id")
.bind("Test")
.fetch_one(&pool)
.await
.unwrap();
sqlx::query(
"INSERT INTO sources (id, name, base_url) VALUES ($1, $2, $3) ON CONFLICT DO NOTHING",
)
.bind("target")
.bind("Target")
.bind("https://example.com")
.execute(&pool)
.await
.unwrap();
let c3 = insert_pending_chapter(&pool, manga_id, 3, "ch3").await;
let c1 = insert_pending_chapter(&pool, manga_id, 1, "ch1").await;
let c2 = insert_pending_chapter(&pool, manga_id, 2, "ch2").await;
sqlx::query("INSERT INTO bookmarks (user_id, manga_id) VALUES ($1, $2)")
.bind(user_id)
.bind(manga_id)
.execute(&pool)
.await
.unwrap();
let summary = pipeline::enqueue_bookmarked_pending(&pool).await.unwrap();
assert_eq!(summary.inserted, 3);
let leases = jobs::lease(&pool, None, 10, std::time::Duration::from_secs(60))
.await
.unwrap();
let leased_chapter_ids: Vec<Uuid> = leases
.iter()
.map(|l| match &l.payload {
JobPayload::SyncChapterContent { chapter_id, .. } => *chapter_id,
other => panic!("unexpected payload kind: {other:?}"),
})
.collect();
assert_eq!(
leased_chapter_ids,
vec![c1, c2, c3],
"chapters must be leased in ascending chapter-number order, not insertion order"
);
}
#[sqlx::test(migrations = "./migrations")]
async fn enqueue_pending_for_manga_queues_chapters_in_ascending_number_order(pool: PgPool) {
// Same scenario as above but exercising the bookmark-create hook path
// (`enqueue_pending_for_manga`) which has its own ORDER BY.
let manga_id: Uuid = sqlx::query_scalar("INSERT INTO mangas (title) VALUES ($1) RETURNING id")
.bind("Test")
.fetch_one(&pool)
.await
.unwrap();
sqlx::query(
"INSERT INTO sources (id, name, base_url) VALUES ($1, $2, $3) ON CONFLICT DO NOTHING",
)
.bind("target")
.bind("Target")
.bind("https://example.com")
.execute(&pool)
.await
.unwrap();
let c3 = insert_pending_chapter(&pool, manga_id, 3, "ch3").await;
let c1 = insert_pending_chapter(&pool, manga_id, 1, "ch1").await;
let c2 = insert_pending_chapter(&pool, manga_id, 2, "ch2").await;
let summary = pipeline::enqueue_pending_for_manga(&pool, manga_id)
.await
.unwrap();
assert_eq!(summary.inserted, 3);
let leases = jobs::lease(&pool, None, 10, std::time::Duration::from_secs(60))
.await
.unwrap();
let leased_chapter_ids: Vec<Uuid> = leases
.iter()
.map(|l| match &l.payload {
JobPayload::SyncChapterContent { chapter_id, .. } => *chapter_id,
other => panic!("unexpected payload kind: {other:?}"),
})
.collect();
assert_eq!(leased_chapter_ids, vec![c1, c2, c3]);
}
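The two ordering tests above can be read against a small in-memory model. This is only a sketch, assuming the lease step orders by `scheduled_at` and breaks ties on `created_at`; `Job`, `enqueue_in_number_order`, and `lease_order` are hypothetical names for illustration (not the crate's API), and plain integers stand in for timestamps.

```rust
/// Hypothetical in-memory stand-in for a `crawler_jobs` row.
#[derive(Debug, Clone, PartialEq)]
struct Job {
    chapter_number: i32,
    scheduled_at: u64, // integer stand-in for a timestamp
    created_at: u64,   // integer stand-in for a timestamp
}

/// Enqueue chapters in ascending number order. Every job in the batch shares
/// one `scheduled_at`, while `created_at` grows with each insert.
fn enqueue_in_number_order(mut numbers: Vec<i32>, batch_time: u64) -> Vec<Job> {
    numbers.sort();
    numbers
        .into_iter()
        .enumerate()
        .map(|(i, n)| Job {
            chapter_number: n,
            scheduled_at: batch_time,
            created_at: batch_time + i as u64,
        })
        .collect()
}

/// Model of the lease ordering: `scheduled_at` first, `created_at` as tiebreak.
/// Returns the chapter numbers in the order a worker would see them.
fn lease_order(mut jobs: Vec<Job>) -> Vec<i32> {
    jobs.sort_by_key(|j| (j.scheduled_at, j.created_at));
    jobs.into_iter().map(|j| j.chapter_number).collect()
}
```

In this model, chapters inserted as 3, 1, 2 come back as 1, 2, 3 even if the stored rows are shuffled before the lease, because the shared `scheduled_at` leaves `created_at` as the deciding key.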


@@ -531,6 +531,89 @@ async fn reap_done_deletes_old_rows_keeps_fresh(pool: PgPool) {
assert_eq!(remaining, vec![fresh_id], "only fresh row remains");
}
#[sqlx::test(migrations = "./migrations")]
async fn lease_ties_on_scheduled_at_break_by_created_at(pool: PgPool) {
// Locks in the tiebreaker that lets enqueue order survive the lease
// step: when many jobs share `scheduled_at` (the common cron-batch
// case), the worker must pick the earliest-inserted row, not whatever
// Postgres returns in heap order. The enqueue path inserts chapters
// in chapter-number order, so this tiebreaker is what makes "queue
// in rising order" observable at the dequeue side too.
let a = match jobs::enqueue(&pool, &chapter_content_payload(Uuid::new_v4()))
.await
.unwrap()
{
EnqueueResult::Inserted(id) => id,
_ => unreachable!(),
};
let b = match jobs::enqueue(&pool, &chapter_content_payload(Uuid::new_v4()))
.await
.unwrap()
{
EnqueueResult::Inserted(id) => id,
_ => unreachable!(),
};
let c = match jobs::enqueue(&pool, &chapter_content_payload(Uuid::new_v4()))
.await
.unwrap()
{
EnqueueResult::Inserted(id) => id,
_ => unreachable!(),
};
// Pin `scheduled_at` to a single literal instant (shared across all
// three rows — `now()` would yield a different microsecond per UPDATE
// and make scheduled_at the actual sort key). Reverse `created_at`
// against insertion order so heap order would give the wrong answer.
let shared_scheduled = chrono::Utc::now() - chrono::Duration::hours(1);
sqlx::query(
"UPDATE crawler_jobs \
SET scheduled_at = $2, \
created_at = $3 \
WHERE id = $1",
)
.bind(a)
.bind(shared_scheduled)
.bind(chrono::Utc::now() - chrono::Duration::seconds(10))
.execute(&pool)
.await
.unwrap();
sqlx::query(
"UPDATE crawler_jobs \
SET scheduled_at = $2, \
created_at = $3 \
WHERE id = $1",
)
.bind(b)
.bind(shared_scheduled)
.bind(chrono::Utc::now() - chrono::Duration::seconds(20))
.execute(&pool)
.await
.unwrap();
sqlx::query(
"UPDATE crawler_jobs \
SET scheduled_at = $2, \
created_at = $3 \
WHERE id = $1",
)
.bind(c)
.bind(shared_scheduled)
.bind(chrono::Utc::now() - chrono::Duration::seconds(30))
.execute(&pool)
.await
.unwrap();
let leases = jobs::lease(&pool, None, 10, Duration::from_secs(60))
.await
.unwrap();
let order: Vec<Uuid> = leases.iter().map(|l| l.id).collect();
assert_eq!(
order,
vec![c, b, a],
"lease must return jobs in created_at order when scheduled_at ties"
);
}
#[sqlx::test(migrations = "./migrations")]
async fn reap_done_zero_is_a_no_op(pool: PgPool) {
let id = match jobs::enqueue(&pool, &chapter_content_payload(Uuid::new_v4()))


@@ -6,6 +6,7 @@
use mangalord::crawler::source::{SourceChapterRef, SourceManga};
use mangalord::repo::crawler::{self, ChapterDiff, UpsertStatus};
use mangalord::repo::chapter as chapter_repo;
use sqlx::PgPool;
use uuid::Uuid;
@@ -829,6 +830,107 @@ async fn sync_tags_garbage_collects_orphan_user_attachments(pool: PgPool) {
assert_eq!(orphan_rows, 0, "orphan user-attached tag should be reaped");
}
// ---- list_missing_covers ---------------------------------------------------
#[sqlx::test(migrations = "./migrations")]
async fn list_missing_covers_only_returns_rows_without_cover(pool: PgPool) {
crawler::ensure_source(&pool, "target", "T", "https://x.example")
.await
.unwrap();
let with_cover = sample_manga("with", "With Cover", "h1");
let without_cover = sample_manga("without", "No Cover", "h2");
let w = crawler::upsert_manga_from_source(&pool, "target", "https://x.example/with", &with_cover)
.await
.unwrap();
let nc = crawler::upsert_manga_from_source(&pool, "target", "https://x.example/without", &without_cover)
.await
.unwrap();
// Manually set a cover for `with` only.
sqlx::query("UPDATE mangas SET cover_image_path = 'mangas/x/cover.jpg' WHERE id = $1")
.bind(w.manga_id)
.execute(&pool)
.await
.unwrap();
let entries = crawler::list_missing_covers(&pool, 50).await.unwrap();
assert_eq!(entries.len(), 1, "exactly the manga without a cover");
assert_eq!(entries[0].manga_id, nc.manga_id);
assert_eq!(entries[0].source_manga_key, "without");
assert_eq!(entries[0].source_url, "https://x.example/without");
}
#[sqlx::test(migrations = "./migrations")]
async fn list_missing_covers_skips_dropped_source_rows(pool: PgPool) {
crawler::ensure_source(&pool, "target", "T", "https://x.example")
.await
.unwrap();
let m = sample_manga("foo", "Foo", "h1");
let up = crawler::upsert_manga_from_source(&pool, "target", "https://x.example/foo", &m)
.await
.unwrap();
sqlx::query("UPDATE manga_sources SET dropped_at = NOW() WHERE manga_id = $1")
.bind(up.manga_id)
.execute(&pool)
.await
.unwrap();
let entries = crawler::list_missing_covers(&pool, 50).await.unwrap();
assert!(
entries.is_empty(),
"dropped-source mangas must not be backfilled — no live source to fetch from"
);
}
#[sqlx::test(migrations = "./migrations")]
async fn list_missing_covers_respects_limit(pool: PgPool) {
crawler::ensure_source(&pool, "target", "T", "https://x.example")
.await
.unwrap();
for i in 0..5 {
let key = format!("m{i}");
let url = format!("https://x.example/{key}");
let m = sample_manga(&key, &format!("M{i}"), &format!("h{i}"));
let _ = crawler::upsert_manga_from_source(&pool, "target", &url, &m)
.await
.unwrap();
}
let entries = crawler::list_missing_covers(&pool, 3).await.unwrap();
assert_eq!(entries.len(), 3, "limit caps the result set");
}
#[sqlx::test(migrations = "./migrations")]
async fn list_missing_covers_deduplicates_per_manga(pool: PgPool) {
// A manga surfaced by two sources should produce ONE backfill
// entry, not two — otherwise the per-tick cap could be eaten by
// duplicates and starve other mangas.
crawler::ensure_source(&pool, "src-a", "A", "https://a.example")
.await
.unwrap();
crawler::ensure_source(&pool, "src-b", "B", "https://b.example")
.await
.unwrap();
let m = sample_manga("foo", "Foo", "h1");
let up = crawler::upsert_manga_from_source(&pool, "src-a", "https://a.example/foo", &m)
.await
.unwrap();
// Second source attaches to the SAME manga row.
sqlx::query(
"INSERT INTO manga_sources (source_id, source_manga_key, manga_id, source_url) \
VALUES ($1, $2, $3, $4)",
)
.bind("src-b")
.bind("foo-on-b")
.bind(up.manga_id)
.bind("https://b.example/foo")
.execute(&pool)
.await
.unwrap();
let entries = crawler::list_missing_covers(&pool, 50).await.unwrap();
assert_eq!(entries.len(), 1, "DISTINCT ON (m.id) collapses duplicate source rows");
}
#[sqlx::test(migrations = "./migrations")]
async fn re_appearing_manga_clears_dropped_at(pool: PgPool) {
crawler::ensure_source(&pool, "target", "T", "https://x.example")
@@ -860,3 +962,261 @@ async fn re_appearing_manga_clears_dropped_at(pool: PgPool) {
assert!(dropped.0.is_none());
assert_eq!(dropped.1, up.manga_id);
}
// ---- source_index: site-order preservation ----
//
// The user-facing chapter list reverses the source-site order so that
// the oldest chapter appears first. The crawler records each row's DOM
// position in `chapters.source_index` (0 = first in source DOM = newest
// on this site) on every sync; the list query orders by source_index
// DESC NULLS LAST, falling through to number/created_at for rows with
// no source row (e.g. user uploads).
#[sqlx::test(migrations = "./migrations")]
async fn source_index_set_on_insert_matches_dom_order(pool: PgPool) {
crawler::ensure_source(&pool, "target", "T", "https://x.example")
.await
.unwrap();
let m = sample_manga("foo", "Foo Manga", "hash-1");
let up = crawler::upsert_manga_from_source(&pool, "target", "https://x.example/foo", &m)
.await
.unwrap();
let chapters = vec![
SourceChapterRef {
source_chapter_key: "a".into(),
number: 30,
title: Some("Ch.30".into()),
url: "https://x.example/foo/a".into(),
},
SourceChapterRef {
source_chapter_key: "b".into(),
number: 29,
title: Some("Ch.29".into()),
url: "https://x.example/foo/b".into(),
},
SourceChapterRef {
source_chapter_key: "c".into(),
number: 28,
title: Some("Ch.28".into()),
url: "https://x.example/foo/c".into(),
},
];
crawler::sync_manga_chapters(&pool, "target", up.manga_id, &chapters)
.await
.unwrap();
let rows: Vec<(String, Option<i32>)> = sqlx::query_as(
"SELECT cs.source_chapter_key, c.source_index \
FROM chapters c \
JOIN chapter_sources cs ON cs.chapter_id = c.id \
WHERE c.manga_id = $1 \
ORDER BY cs.source_chapter_key",
)
.bind(up.manga_id)
.fetch_all(&pool)
.await
.unwrap();
assert_eq!(
rows,
vec![
("a".to_string(), Some(0)),
("b".to_string(), Some(1)),
("c".to_string(), Some(2)),
],
"source_index reflects enumerate() position in the input slice",
);
}
#[sqlx::test(migrations = "./migrations")]
async fn source_index_rewritten_on_resync_when_new_chapter_prepended(pool: PgPool) {
crawler::ensure_source(&pool, "target", "T", "https://x.example")
.await
.unwrap();
let m = sample_manga("foo", "Foo Manga", "hash-1");
let up = crawler::upsert_manga_from_source(&pool, "target", "https://x.example/foo", &m)
.await
.unwrap();
let first = vec![
SourceChapterRef {
source_chapter_key: "a".into(),
number: 1,
title: Some("Ch.1".into()),
url: "https://x.example/foo/a".into(),
},
SourceChapterRef {
source_chapter_key: "b".into(),
number: 2,
title: Some("Ch.2".into()),
url: "https://x.example/foo/b".into(),
},
];
crawler::sync_manga_chapters(&pool, "target", up.manga_id, &first)
.await
.unwrap();
// Second sync: a brand-new chapter appears at the top of the source
// (newest first on the site). All existing rows must shift their
// source_index down by one so the display order stays correct.
let second = vec![
SourceChapterRef {
source_chapter_key: "new".into(),
number: 3,
title: Some("Ch.3".into()),
url: "https://x.example/foo/new".into(),
},
SourceChapterRef {
source_chapter_key: "a".into(),
number: 1,
title: Some("Ch.1".into()),
url: "https://x.example/foo/a".into(),
},
SourceChapterRef {
source_chapter_key: "b".into(),
number: 2,
title: Some("Ch.2".into()),
url: "https://x.example/foo/b".into(),
},
];
crawler::sync_manga_chapters(&pool, "target", up.manga_id, &second)
.await
.unwrap();
let rows: Vec<(String, Option<i32>)> = sqlx::query_as(
"SELECT cs.source_chapter_key, c.source_index \
FROM chapters c \
JOIN chapter_sources cs ON cs.chapter_id = c.id \
WHERE c.manga_id = $1 \
ORDER BY cs.source_chapter_key",
)
.bind(up.manga_id)
.fetch_all(&pool)
.await
.unwrap();
assert_eq!(
rows,
vec![
("a".to_string(), Some(1)),
("b".to_string(), Some(2)),
("new".to_string(), Some(0)),
],
"new chapter takes index 0, existing rows shift down on UPDATE",
);
}
#[sqlx::test(migrations = "./migrations")]
async fn list_for_manga_returns_source_order_reversed(pool: PgPool) {
crawler::ensure_source(&pool, "target", "T", "https://x.example")
.await
.unwrap();
let m = sample_manga("foo", "Foo Manga", "hash-1");
let up = crawler::upsert_manga_from_source(&pool, "target", "https://x.example/foo", &m)
.await
.unwrap();
// Site DOM order (top-down = newest-first):
// ch11 (number = 11)
// notice (number = 0, non-numeric label on the site)
// ch10 (number = 10)
// Numbers deliberately disagree with DOM order: a number-based sort
// would put notice first, but the site places it between ch10 and
// ch11. Reversed-DOM display should yield [ch10, notice, ch11].
let chapters = vec![
SourceChapterRef {
source_chapter_key: "ch11".into(),
number: 11,
title: Some("Ch.11 : Official".into()),
url: "https://x.example/foo/11".into(),
},
SourceChapterRef {
source_chapter_key: "notice".into(),
number: 0,
title: Some("notice. : Officials".into()),
url: "https://x.example/foo/notice".into(),
},
SourceChapterRef {
source_chapter_key: "ch10".into(),
number: 10,
title: Some("Ch.10 : Official".into()),
url: "https://x.example/foo/10".into(),
},
];
crawler::sync_manga_chapters(&pool, "target", up.manga_id, &chapters)
.await
.unwrap();
let listed = chapter_repo::list_for_manga(&pool, up.manga_id, 50, 0)
.await
.unwrap();
let keys: Vec<String> = listed
.iter()
.map(|c| c.title.clone().unwrap_or_default())
.collect();
assert_eq!(
keys,
vec![
"Ch.10 : Official".to_string(),
"notice. : Officials".to_string(),
"Ch.11 : Official".to_string(),
],
"list returns chapters in reversed source-DOM order, so the \
oldest appears first and non-numeric entries land where the \
site placed them",
);
}
#[sqlx::test(migrations = "./migrations")]
async fn list_for_manga_places_null_source_index_last(pool: PgPool) {
crawler::ensure_source(&pool, "target", "T", "https://x.example")
.await
.unwrap();
let m = sample_manga("foo", "Foo Manga", "hash-1");
let up = crawler::upsert_manga_from_source(&pool, "target", "https://x.example/foo", &m)
.await
.unwrap();
// Crawled chapters get source_index 0 and 1; the upload path leaves
// it NULL. NULLS LAST plus the (number, created_at) tail means the
// upload sits after both crawled rows even though its number is in
// the middle.
let crawled = vec![
SourceChapterRef {
source_chapter_key: "a".into(),
number: 1,
title: Some("Ch.1".into()),
url: "https://x.example/foo/a".into(),
},
SourceChapterRef {
source_chapter_key: "b".into(),
number: 3,
title: Some("Ch.3".into()),
url: "https://x.example/foo/b".into(),
},
];
crawler::sync_manga_chapters(&pool, "target", up.manga_id, &crawled)
.await
.unwrap();
chapter_repo::create(&pool, up.manga_id, 2, Some("User upload Ch.2"), None)
.await
.unwrap();
let listed = chapter_repo::list_for_manga(&pool, up.manga_id, 50, 0)
.await
.unwrap();
let titles: Vec<String> = listed
.iter()
.map(|c| c.title.clone().unwrap_or_default())
.collect();
assert_eq!(
titles,
vec![
"Ch.3".to_string(),
"Ch.1".to_string(),
"User upload Ch.2".to_string(),
],
"crawled rows ordered by reversed source_index; user upload \
(NULL source_index) falls through to the end",
);
}
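The list ordering these tests pin down (`source_index DESC NULLS LAST, number ASC, created_at ASC`) can be sketched as a plain comparator. This is an illustrative in-memory model only, not the repo's query code; `ChapterRow`, `display_order`, and `sorted_titles` are hypothetical names, and an integer stands in for the `created_at` timestamp.

```rust
use std::cmp::Ordering;

/// Hypothetical in-memory stand-in for a `chapters` row.
#[derive(Debug, Clone, PartialEq)]
struct ChapterRow {
    title: &'static str,
    source_index: Option<i32>, // None models SQL NULL (user uploads, old rows)
    number: i32,
    created_at: u64, // integer stand-in for a timestamp
}

/// Mirrors `ORDER BY source_index DESC NULLS LAST, number ASC, created_at ASC`.
fn display_order(a: &ChapterRow, b: &ChapterRow) -> Ordering {
    let by_index = match (a.source_index, b.source_index) {
        (Some(x), Some(y)) => y.cmp(&x),      // DESC: larger index sorts first
        (Some(_), None) => Ordering::Less,    // NULLS LAST
        (None, Some(_)) => Ordering::Greater, // NULLS LAST
        (None, None) => Ordering::Equal,      // fall through to the tiebreaks
    };
    by_index
        .then(a.number.cmp(&b.number))
        .then(a.created_at.cmp(&b.created_at))
}

/// Sorts the rows into display order and returns their titles.
fn sorted_titles(mut rows: Vec<ChapterRow>) -> Vec<&'static str> {
    rows.sort_by(display_order);
    rows.into_iter().map(|r| r.title).collect()
}
```

Fed the rows from the NULLS LAST test (crawled chapters at index 0 and 1 plus one upload with no index), the comparator yields the crawled rows in reversed source order with the upload at the end.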


@@ -17,5 +17,28 @@ services:
timeout: 5s
retries: 10
# Optional: TOR daemon for crawler dev. Ports bind to 127.0.0.1 only
# — never the LAN — so a native `cargo run` on the host can reach
# 127.0.0.1:9050 / 9051. Mirrors the prod tor service (see
# docker-compose.yml), just with host-loopback ports and a default
# password baked in for friction-free dev.
tor:
image: dockurr/tor:latest
entrypoint: ["/bin/sh", "/usr/local/bin/mangalord-entrypoint.sh"]
environment:
PASSWORD: ${TOR_CONTROL_PASSWORD:-dev-tor-password}
volumes:
- ./tor/torrc:/etc/tor/torrc:ro
- ./tor/entrypoint.sh:/usr/local/bin/mangalord-entrypoint.sh:ro
ports:
- "127.0.0.1:9050:9050"
- "127.0.0.1:9051:9051"
healthcheck:
test: ["CMD-SHELL", "nc -z 127.0.0.1 9050 && nc -z 127.0.0.1 9051"]
interval: 5s
timeout: 5s
retries: 20
start_period: 30s
volumes:
mangalord-postgres-dev:


@@ -19,11 +19,48 @@ services:
timeout: 5s
retries: 10
tor:
# SOCKS5 proxy for the crawler, plus a control port so the backend
# can signal NEWNYM on bad pages. See tor/torrc for the daemon
# config; both ports are only `expose`d (compose-internal), never
# bound on the host.
#
# We bypass dockurr/tor's stock entrypoint because it binds the
# control port to localhost (unreachable from the backend
# container) and skips its own HashedControlPassword injection
# when the user's torrc declares a ControlPort. Our wrapper
# (tor/entrypoint.sh) generates the hash from $PASSWORD and execs
# tor with our torrc. Backend authenticates with the same plain
# string via CRAWLER_TOR_CONTROL_PASSWORD.
image: dockurr/tor:latest
entrypoint: ["/bin/sh", "/usr/local/bin/mangalord-entrypoint.sh"]
environment:
PASSWORD: ${TOR_CONTROL_PASSWORD:?TOR_CONTROL_PASSWORD must be set in .env}
volumes:
- ./tor/torrc:/etc/tor/torrc:ro
- ./tor/entrypoint.sh:/usr/local/bin/mangalord-entrypoint.sh:ro
expose:
- "9050"
- "9051"
# Wait for both control + SOCKS ports to listen before downstream
# services start. dockurr/tor's main process spawns before tor
# itself is bound, so `service_started` alone races the first
# NEWNYM call.
healthcheck:
test: ["CMD-SHELL", "nc -z 127.0.0.1 9050 && nc -z 127.0.0.1 9051"]
interval: 5s
timeout: 5s
retries: 20
start_period: 30s
restart: unless-stopped
backend:
build: ./backend
depends_on:
postgres:
condition: service_healthy
tor:
condition: service_healthy
environment:
DATABASE_URL: postgres://${POSTGRES_USER:-mangalord}:${POSTGRES_PASSWORD:?POSTGRES_PASSWORD must be set in .env}@postgres:5432/${POSTGRES_DB:-mangalord}
BIND_ADDRESS: 0.0.0.0:8080
@@ -44,6 +81,16 @@ services:
# arm64 deployments. Pair with `--build-arg INSTALL_CHROMIUM=true`
# so the image actually contains the binary.
CRAWLER_CHROMIUM_BINARY: ${CRAWLER_CHROMIUM_BINARY:-}
# TOR proxy + NEWNYM recircuit (see .env.example for details).
# Defaults assume the bundled `tor` service above; override
# CRAWLER_PROXY= and CRAWLER_TOR_CONTROL_URL= (both empty) in
# .env to disable. CRAWLER_TOR_CONTROL_PASSWORD MUST match the
# tor service's PASSWORD (both wired to the same TOR_CONTROL_PASSWORD
# .env var below).
CRAWLER_PROXY: ${CRAWLER_PROXY-socks5h://tor:9050}
CRAWLER_TOR_CONTROL_URL: ${CRAWLER_TOR_CONTROL_URL-tcp://tor:9051}
CRAWLER_TOR_CONTROL_PASSWORD: ${TOR_CONTROL_PASSWORD:?TOR_CONTROL_PASSWORD must be set in .env}
CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS: ${CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS:-3}
volumes:
- storage-data:/var/lib/mangalord/storage
# No host port mapping in the default setup — the frontend proxies


@@ -10,6 +10,15 @@ import { test, expect, type Page } from '@playwright/test';
const emptyPage = { items: [], page: { limit: 50, offset: 0, total: null } };
async function mockAnonymous(page: Page) {
// Force public mode so the root +layout.ts doesn't bounce us to /login
// (a dev backend with PRIVATE_MODE=true must not leak into E2E runs).
await page.route('**/api/v1/auth/config', async (route) => {
await route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({ self_register_enabled: true, private_mode: false })
});
});
await page.route('**/api/v1/auth/me', async (route) => {
await route.fulfill({
status: 401,
@@ -69,3 +78,53 @@ test('search updates the manga list', async ({ page }) => {
await expect(page.getByTestId('manga-list')).toContainText('Berserk');
expect(lastSearch).toBe('berserk');
});
test('clicking Next paginates to page 2 and updates the URL', async ({ page }) => {
await mockAnonymous(page);
// Fake a catalogue of 75 mangas; page 1 is ids 1..50, page 2 is ids 51..75.
const TOTAL = 75;
function mangaAt(i: number) {
return {
id: `m${i}`,
title: `Manga ${i}`,
author: 'Test',
description: null,
cover_image_path: null,
created_at: '2026-01-01T00:00:00Z',
updated_at: '2026-01-01T00:00:00Z',
authors: [],
genres: []
};
}
await page.route('**/api/v1/mangas*', async (route) => {
const url = new URL(route.request().url());
const limit = Number(url.searchParams.get('limit') ?? '50');
const offset = Number(url.searchParams.get('offset') ?? '0');
const items: ReturnType<typeof mangaAt>[] = [];
for (let i = offset + 1; i <= Math.min(offset + limit, TOTAL); i++) {
items.push(mangaAt(i));
}
await route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({
items,
page: { limit, offset, total: TOTAL }
})
});
});
await page.goto('/');
await expect(page.getByTestId('manga-total')).toContainText('Showing 1–50 of 75');
await expect(page.getByTestId('manga-list')).toContainText('Manga 1');
await expect(page.getByTestId('manga-list')).not.toContainText('Manga 75');
await page.getByTestId('manga-pager').getByRole('button', { name: /next/i }).click();
await expect(page).toHaveURL(/[?&]page=2(&|$)/);
await expect(page.getByTestId('manga-total')).toContainText('Showing 51–75 of 75');
await expect(page.getByTestId('manga-list')).toContainText('Manga 75');
await expect(page.getByTestId('manga-list')).not.toContainText('Manga 1');
});


@@ -0,0 +1,67 @@
import { test, expect, type Page } from '@playwright/test';
// Guards the title-on-nav behavior: without this, a stale title from
// the last manga / author page lingers when the user navigates to a
// generic page like /upload.
async function mockAnonymous(page: Page) {
await page.route('**/api/v1/auth/config', async (route) => {
await route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({ self_register_enabled: true, private_mode: false })
});
});
await page.route('**/api/v1/auth/me', async (route) => {
await route.fulfill({
status: 401,
contentType: 'application/json',
body: JSON.stringify({ error: { code: 'unauthenticated', message: 'unauthenticated' } })
});
});
await page.route('**/api/v1/mangas*', async (route) => {
await route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({ items: [], page: { limit: 50, offset: 0, total: 0 } })
});
});
}
test('static route titles use the brand-first layout map', async ({ page }) => {
await mockAnonymous(page);
await page.goto('/');
await expect(page).toHaveTitle('Mangalord');
await page.goto('/upload');
await expect(page).toHaveTitle('Mangalord | Upload');
await page.goto('/login');
await expect(page).toHaveTitle('Mangalord | Login');
await page.goto('/bookmarks');
await expect(page).toHaveTitle('Mangalord | Bookmarks');
await page.goto('/collections');
await expect(page).toHaveTitle('Mangalord | Collections');
});
test('title updates when navigating away from a content page', async ({ page }) => {
await mockAnonymous(page);
// Pretend we just left a manga detail page — the document title
// would have been overridden to "Mangalord | Berserk". Use evaluate
// to set it synthetically so we can assert the regression cleanly
// even though the dynamic page itself isn't mocked here.
await page.goto('/');
await page.evaluate(() => {
document.title = 'Mangalord | Berserk';
});
expect(await page.title()).toBe('Mangalord | Berserk');
// Client-side nav to /upload — the root layout must reassert its
// mapped title or the stale "Berserk" lingers.
await page.goto('/upload');
await expect(page).toHaveTitle('Mangalord | Upload');
});


@@ -0,0 +1,101 @@
import { test, expect, type Page } from '@playwright/test';
// Network-level mocks for the private-mode UX. The backend integration
// tests (api_private_mode.rs) cover the actual gate; here we only
// verify that the SvelteKit universal load redirects anonymous
// visitors to /login and then back to where they were going.
const userFixture = {
id: 'user-1',
username: 'alice',
created_at: '2026-01-01T00:00:00Z',
is_admin: false
};
const emptyPage = { items: [], page: { limit: 50, offset: 0, total: null } };
async function stubPrivateInstance(page: Page) {
let loggedIn = false;
// The flag that flips the gate on. Frontend reads it in
// `+layout.ts` to decide whether to redirect.
await page.route('**/api/v1/auth/config', async (route) => {
await route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({
self_register_enabled: false,
private_mode: true
})
});
});
await page.route('**/api/v1/auth/me', async (route) => {
if (loggedIn) {
await route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({ user: userFixture })
});
} else {
await route.fulfill({
status: 401,
contentType: 'application/json',
body: JSON.stringify({
error: { code: 'unauthenticated', message: 'unauthenticated' }
})
});
}
});
await page.route('**/api/v1/auth/login', async (route) => {
loggedIn = true;
await route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({ user: userFixture })
});
});
// The real backend would 401 these too in private mode; we stub
// success so the post-login navigation can render the home page
// without an additional redirect cycle.
await page.route('**/api/v1/mangas*', async (route) => {
await route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify(emptyPage)
});
});
}
test('private mode: anonymous visit to / redirects to /login?next=%2F', async ({ page }) => {
await stubPrivateInstance(page);
await page.goto('/');
await expect(page).toHaveURL(/\/login\?next=%2F$/);
await expect(page.getByTestId('login-username')).toBeVisible();
});
test('private mode: register link is hidden', async ({ page }) => {
await stubPrivateInstance(page);
await page.goto('/login');
await expect(page.getByTestId('nav-login')).toBeVisible();
// self_register_enabled is the effective value (false in private
// mode regardless of ALLOW_SELF_REGISTER), so the navbar must
// never render the register affordance here.
await expect(page.getByTestId('nav-register')).toHaveCount(0);
});
test('private mode: after login the user lands back on the requested page', async ({ page }) => {
await stubPrivateInstance(page);
// Visit the root page → bounced to /login with next= preserving the target.
await page.goto('/');
await expect(page).toHaveURL(/\/login\?next=%2F$/);
await page.getByTestId('login-username').fill('alice');
await page.getByTestId('login-password').fill('hunter2hunter2');
await page.getByTestId('login-submit').click();
// Authenticated → can now reach the home page without bouncing.
await expect(page.getByTestId('session-user')).toContainText('alice');
});


@@ -0,0 +1,167 @@
import { test, expect, type Page } from '@playwright/test';
const mangaId = '33333333-3333-3333-3333-333333333333';
const chapter1Id = 'c1111111-3333-3333-3333-333333333333';
const chapter2Id = 'c2222222-3333-3333-3333-333333333333';
const chapter3Id = 'c3333333-3333-3333-3333-333333333333';
const mangaFixture = {
id: mangaId,
title: 'Vinland Saga',
author: 'Makoto Yukimura',
description: null,
cover_image_path: null,
created_at: '2026-01-01T00:00:00Z',
updated_at: '2026-01-01T00:00:00Z'
};
const chaptersFixture = [
{
id: chapter1Id,
manga_id: mangaId,
number: 1,
title: 'Somewhere, Not Here',
page_count: 1,
created_at: '2026-01-01T00:00:00Z'
},
{
id: chapter2Id,
manga_id: mangaId,
number: 2,
title: null,
page_count: 1,
created_at: '2026-01-02T00:00:00Z'
},
{
id: chapter3Id,
manga_id: mangaId,
number: 3,
title: 'Sword Dance',
page_count: 1,
created_at: '2026-01-03T00:00:00Z'
}
];
function pageFixture(chapterId: string) {
return [
{
id: `p1111111-${chapterId.slice(1, 8)}-3333-3333-333333333333`,
chapter_id: chapterId,
page_number: 1,
storage_key: `mangas/${mangaId}/chapters/${chapterId}/pages/0001.png`,
content_type: 'image/png'
}
];
}
async function mockReaderApis(page: Page) {
// Force public mode so the layout doesn't bounce anonymous visitors
// to /login (the dev backend on this machine runs with
// PRIVATE_MODE=true, which the layout's universal load respects).
await page.route('**/api/v1/auth/config', (route) =>
route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({ self_register_enabled: true, private_mode: false })
})
);
await page.route('**/api/v1/auth/me', (route) =>
route.fulfill({
status: 401,
contentType: 'application/json',
body: JSON.stringify({ error: { code: 'unauthenticated', message: '' } })
})
);
await page.route('**/api/v1/auth/me/preferences', (route) =>
route.fulfill({
status: 401,
contentType: 'application/json',
body: JSON.stringify({ error: { code: 'unauthenticated', message: '' } })
})
);
await page.route('**/api/v1/me/bookmarks*', (route) =>
route.fulfill({
status: 401,
contentType: 'application/json',
body: JSON.stringify({ error: { code: 'unauthenticated', message: '' } })
})
);
await page.route(`**/api/v1/mangas/${mangaId}`, (route) =>
route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify(mangaFixture)
})
);
await page.route(new RegExp(`/api/v1/mangas/${mangaId}/chapters(\\?.*)?$`), (route) =>
route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({
items: chaptersFixture,
page: { limit: 200, offset: 0, total: chaptersFixture.length }
})
})
);
for (const c of chaptersFixture) {
await page.route(`**/api/v1/mangas/${mangaId}/chapters/${c.id}`, (route) =>
route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify(c)
})
);
await page.route(
`**/api/v1/mangas/${mangaId}/chapters/${c.id}/pages`,
(route) =>
route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({ pages: pageFixture(c.id) })
})
);
}
const png = Buffer.from(
'89504e470d0a1a0a0000000d49484452000000010000000108060000001f15c4890000000d49444154789c63000100000005000158a3b62a0000000049454e44ae426082',
'hex'
);
await page.route('**/api/v1/files/**', (route) =>
route.fulfill({ status: 200, contentType: 'image/png', body: png })
);
}
test('reader chapter select lists every chapter with the manga-detail-style label', async ({
page
}) => {
await mockReaderApis(page);
await page.goto(`/manga/${mangaId}/chapter/${chapter2Id}`);
const select = page.getByTestId('reader-chapter-select');
await expect(select).toBeVisible();
// The current chapter is preselected.
await expect(select).toHaveValue(chapter2Id);
// Each chapter rendered as "Ch. N — Title" (or "Ch. N" when title is null),
// in ascending number order — matching the prev/next sort.
const labels = await select.locator('option').allTextContents();
expect(labels.map((l) => l.trim())).toEqual([
'Ch. 1 — Somewhere, Not Here',
'Ch. 2',
'Ch. 3 — Sword Dance'
]);
});
test('choosing a chapter from the select navigates to that chapter', async ({ page }) => {
await mockReaderApis(page);
await page.goto(`/manga/${mangaId}/chapter/${chapter1Id}`);
await expect(page.getByTestId('reader-chapter-select')).toHaveValue(chapter1Id);
await page.getByTestId('reader-chapter-select').selectOption(chapter3Id);
await expect(page).toHaveURL(
new RegExp(`/manga/${mangaId}/chapter/${chapter3Id}$`)
);
await expect(page.getByTestId('reader-chapter-select')).toHaveValue(chapter3Id);
});


@@ -120,7 +120,7 @@ test('manga overview shows title, cover, and a chapter list', async ({ page }) =
await expect(page.getByTestId('manga-title')).toHaveText('Berserk'); await expect(page.getByTestId('manga-title')).toHaveText('Berserk');
await expect(page.getByTestId('manga-author')).toContainText('Kentaro Miura'); await expect(page.getByTestId('manga-author')).toContainText('Kentaro Miura');
await expect(page.getByTestId('manga-cover')).toBeVisible(); await expect(page.getByTestId('manga-cover')).toBeVisible();
await expect(page.getByTestId('chapter-list')).toContainText('Chapter 1'); await expect(page.getByTestId('chapter-list')).toContainText('The Brand');
await expect(page.getByTestId('bookmark-signin')).toBeVisible(); await expect(page.getByTestId('bookmark-signin')).toBeVisible();
}); });


@@ -1,6 +1,6 @@
{ {
"name": "mangalord-frontend", "name": "mangalord-frontend",
"version": "0.45.1", "version": "0.52.0",
"private": true, "private": true,
"type": "module", "type": "module",
"scripts": { "scripts": {


@@ -14,7 +14,9 @@ import {
createAdminUser, createAdminUser,
listAdminMangas, listAdminMangas,
listAdminChapters, listAdminChapters,
getSystemStats getSystemStats,
resyncManga,
resyncChapter
} from './admin'; } from './admin';
function ok(body: unknown, status = 200): Response { function ok(body: unknown, status = 200): Response {
@@ -242,4 +244,88 @@ describe('admin api client', () => {
const s = await getSystemStats(); const s = await getSystemStats();
expect(s.disk).toBeNull(); expect(s.disk).toBeNull();
}); });
// ---- force resync ----
it('resyncManga POSTs to /v1/admin/mangas/{id}/resync and returns the envelope', async () => {
const resp = {
manga: {
id: 'm-1',
title: 'T',
status: 'ongoing',
alt_titles: [],
description: null,
cover_image_path: 'mangas/m-1/cover.jpg',
created_at: '2026-01-01T00:00:00Z',
updated_at: '2026-01-02T00:00:00Z',
authors: [],
genres: [],
tags: []
},
metadata_status: 'updated',
cover_fetched: true
};
fetchSpy.mockResolvedValueOnce(ok(resp));
const got = await resyncManga('m-1');
expect(got.metadata_status).toBe('updated');
expect(got.cover_fetched).toBe(true);
expect(got.manga.id).toBe('m-1');
const url = fetchSpy.mock.calls[0][0] as string;
expect(url).toMatch(/\/v1\/admin\/mangas\/m-1\/resync$/);
const init = fetchSpy.mock.calls[0][1] as RequestInit;
expect(init.method).toBe('POST');
});
it('resyncManga surfaces 503 service_unavailable when the daemon is off', async () => {
fetchSpy.mockResolvedValueOnce(
envelope(503, 'service_unavailable', 'crawler daemon is disabled')
);
await expect(resyncManga('m-1')).rejects.toMatchObject({
status: 503,
code: 'service_unavailable'
});
});
it('resyncChapter POSTs to /v1/admin/chapters/{id}/resync and returns the envelope', async () => {
const resp = {
chapter: {
id: 'c-1',
manga_id: 'm-1',
number: 1,
title: 'Foo',
page_count: 7,
created_at: '2026-01-01T00:00:00Z'
},
outcome: 'fetched',
pages: 7
};
fetchSpy.mockResolvedValueOnce(ok(resp));
const got = await resyncChapter('c-1');
expect(got.outcome).toBe('fetched');
expect(got.pages).toBe(7);
expect(got.chapter.page_count).toBe(7);
const url = fetchSpy.mock.calls[0][0] as string;
expect(url).toMatch(/\/v1\/admin\/chapters\/c-1\/resync$/);
const init = fetchSpy.mock.calls[0][1] as RequestInit;
expect(init.method).toBe('POST');
});
it('resyncChapter handles the "skipped" outcome envelope', async () => {
const resp = {
chapter: {
id: 'c-1',
manga_id: 'm-1',
number: 1,
title: null,
page_count: 7,
created_at: '2026-01-01T00:00:00Z'
},
outcome: 'skipped',
pages: null
};
fetchSpy.mockResolvedValueOnce(ok(resp));
const got = await resyncChapter('c-1');
expect(got.outcome).toBe('skipped');
expect(got.pages).toBeNull();
});
}); });


@@ -5,6 +5,8 @@
import { request, type Page } from './client'; import { request, type Page } from './client';
import type { User } from './auth'; import type { User } from './auth';
import type { MangaDetail } from './mangas';
import type { Chapter } from './chapters';
// ---- users ----------------------------------------------------------------- // ---- users -----------------------------------------------------------------
@@ -176,3 +178,39 @@ export type SystemStats = {
export async function getSystemStats(): Promise<SystemStats> { export async function getSystemStats(): Promise<SystemStats> {
return request<SystemStats>('/v1/admin/system'); return request<SystemStats>('/v1/admin/system');
} }
// ---- force resync ----------------------------------------------------------
export type MangaResyncResponse = {
manga: MangaDetail;
metadata_status: 'new' | 'updated' | 'unchanged';
cover_fetched: boolean;
};
export type ChapterResyncResponse = {
chapter: Chapter;
outcome: 'fetched' | 'skipped';
/** Page count when `outcome === 'fetched'`; null when skipped. */
pages: number | null;
};
/** POST /v1/admin/mangas/:id/resync — refetches metadata + cover from
* the manga's live crawler source. Long-running (one HTTP request per
* Chromium nav + image download), so the UI should disable the trigger
* and surface progress. */
export async function resyncManga(id: string): Promise<MangaResyncResponse> {
return request<MangaResyncResponse>(
`/v1/admin/mangas/${encodeURIComponent(id)}/resync`,
{ method: 'POST' }
);
}
/** POST /v1/admin/chapters/:id/resync — force-refetches a chapter's
* pages even if `page_count > 0`. Same long-running caveat as
* `resyncManga`. */
export async function resyncChapter(id: string): Promise<ChapterResyncResponse> {
return request<ChapterResyncResponse>(
`/v1/admin/chapters/${encodeURIComponent(id)}/resync`,
{ method: 'POST' }
);
}


@@ -102,10 +102,14 @@ export async function deleteToken(id: string): Promise<void> {
} }
export type AuthConfig = { export type AuthConfig = {
/** When false, /v1/auth/register returns 403 and the UI should /** Effective value (`allow_self_register && !private_mode`).
* When false, /v1/auth/register returns 403 and the UI should
* hide its register affordance. Admins can still mint accounts * hide its register affordance. Admins can still mint accounts
* via POST /v1/admin/users. */ * via POST /v1/admin/users. */
self_register_enabled: boolean; self_register_enabled: boolean;
/** When true, every read endpoint requires auth and anonymous
* visitors are redirected to `/login` (see `+layout.ts`). */
private_mode: boolean;
}; };
/** Public — no auth, no cookie required. */ /** Public — no auth, no cookie required. */


@@ -11,7 +11,8 @@ import {
listChapters, listChapters,
getChapter, getChapter,
getChapterPages, getChapterPages,
createChapter createChapter,
chapterLabel
} from './chapters'; } from './chapters';
function ok(body: unknown): Response { function ok(body: unknown): Response {
@@ -129,6 +130,18 @@ describe('chapters api client', () => {
} }
}); });
describe('chapterLabel', () => {
it('returns the site title verbatim when present', () => {
expect(chapterLabel({ number: 7, title: 'Ch.7 : Official' })).toBe(
'Ch.7 : Official'
);
});
it('falls back to "Chapter {number}" when title is null', () => {
expect(chapterLabel({ number: 3, title: null })).toBe('Chapter 3');
});
});
it('getChapterPages unwraps the {pages} envelope into the array', async () => { it('getChapterPages unwraps the {pages} envelope into the array', async () => {
fetchSpy.mockResolvedValueOnce( fetchSpy.mockResolvedValueOnce(
ok({ ok({


@@ -14,6 +14,10 @@ export type ChaptersPage = {
page: Page; page: Page;
}; };
export function chapterLabel(c: Pick<Chapter, 'number' | 'title'>): string {
return c.title ?? `Chapter ${c.number}`;
}
export type ListOptions = { export type ListOptions = {
limit?: number; limit?: number;
offset?: number; offset?: number;
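A minimal sketch of how `chapterLabel` behaves at the edges, assuming a pared-down `Chapter` shape (the real type lives in `./chapters` and has more fields). Because the helper uses `??`, only `null` or `undefined` titles trigger the fallback; an empty-string title is returned as-is.

```typescript
// Hypothetical minimal shape for illustration; the real Chapter type has more fields.
type Chapter = { id: string; number: number; title: string | null };

// Same body as the helper added in this diff.
function chapterLabel(c: Pick<Chapter, 'number' | 'title'>): string {
  return c.title ?? `Chapter ${c.number}`;
}

console.log(chapterLabel({ number: 7, title: 'Ch.7 : Official' })); // "Ch.7 : Official"
console.log(chapterLabel({ number: 3, title: null }));              // "Chapter 3"
// `??` does not treat '' as missing, so an empty title yields an empty label.
console.log(JSON.stringify(chapterLabel({ number: 4, title: '' }))); // ""
```

If empty titles can reach the client, a caller that wants the numeric fallback for them would need to normalise the title first; nothing in this diff indicates whether that case occurs.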


@@ -16,6 +16,7 @@ import { getAuthConfig } from './api/auth';
class AuthConfigStore { class AuthConfigStore {
self_register_enabled = $state(true); self_register_enabled = $state(true);
private_mode = $state(false);
loaded = $state(false); loaded = $state(false);
private loading = false; private loading = false;
@@ -25,6 +26,7 @@ class AuthConfigStore {
try { try {
const cfg = await getAuthConfig(); const cfg = await getAuthConfig();
this.self_register_enabled = cfg.self_register_enabled; this.self_register_enabled = cfg.self_register_enabled;
this.private_mode = cfg.private_mode;
this.loaded = true; this.loaded = true;
} catch { } catch {
// Keep optimistic default; next page mount will retry. // Keep optimistic default; next page mount will retry.
@@ -32,6 +34,16 @@ class AuthConfigStore {
this.loading = false; this.loading = false;
} }
} }
/** Seed from server-rendered layout data so the very first paint
* doesn't flash the loading state. Used by `+layout.ts` /
* `+layout.svelte` on the universal-load path. Safe to call from
* SSR (no `browser` guard) since it touches only reactive state. */
seed(cfg: { self_register_enabled: boolean; private_mode: boolean }): void {
this.self_register_enabled = cfg.self_register_enabled;
this.private_mode = cfg.private_mode;
this.loaded = true;
}
} }
export const authConfig = new AuthConfigStore(); export const authConfig = new AuthConfigStore();


@@ -0,0 +1,128 @@
<script lang="ts">
type Props = {
page: number;
totalPages: number;
onChange: (page: number) => void;
testid?: string;
};
let { page, totalPages, onChange, testid }: Props = $props();
type Slot = number | 'ellipsis';
// Compact layout: always show first + last, surround the current page with
// its direct neighbours, and use "…" to elide the rest. Keeps the bar to
// at most 7 slots (page buttons plus ellipses) regardless of totalPages.
function buildSlots(p: number, total: number): Slot[] {
if (total <= 7) {
return Array.from({ length: total }, (_, i) => i + 1);
}
const out: Slot[] = [1];
if (p <= 4) {
for (let i = 2; i <= 5; i++) out.push(i);
out.push('ellipsis');
out.push(total);
} else if (p >= total - 3) {
out.push('ellipsis');
for (let i = total - 4; i <= total; i++) out.push(i);
} else {
out.push('ellipsis');
out.push(p - 1);
out.push(p);
out.push(p + 1);
out.push('ellipsis');
out.push(total);
}
return out;
}
const slots = $derived(buildSlots(page, totalPages));
</script>
{#if totalPages > 1}
<nav class="pager" aria-label="Pagination" data-testid={testid}>
<button
type="button"
class="step"
disabled={page <= 1}
onclick={() => onChange(page - 1)}
aria-label="Previous page"
>
Prev
</button>
{#each slots as slot, i (i)}
{#if slot === 'ellipsis'}
<span class="ellipsis" aria-hidden="true">…</span>
{:else}
<button
type="button"
class="num"
class:active={slot === page}
aria-current={slot === page ? 'page' : undefined}
aria-label={`Go to page ${slot}`}
onclick={() => onChange(slot)}
>
{slot}
</button>
{/if}
{/each}
<button
type="button"
class="step"
disabled={page >= totalPages}
onclick={() => onChange(page + 1)}
aria-label="Next page"
>
Next
</button>
</nav>
{/if}
<style>
.pager {
display: flex;
flex-wrap: wrap;
align-items: center;
gap: var(--space-1);
margin: var(--space-4) 0;
justify-content: center;
}
.step,
.num {
min-width: 36px;
height: 36px;
padding: 0 var(--space-2);
background: var(--surface);
border: 1px solid var(--border);
border-radius: var(--radius-md);
color: var(--text);
cursor: pointer;
font-size: var(--font-sm);
}
.step:hover:not(:disabled),
.num:hover:not(.active) {
border-color: var(--primary);
}
.step:disabled {
opacity: 0.4;
cursor: not-allowed;
}
.num.active {
background: var(--primary);
color: var(--primary-contrast);
border-color: var(--primary);
cursor: default;
}
.ellipsis {
padding: 0 var(--space-1);
color: var(--text-muted);
user-select: none;
}
</style>
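The slot layout can be checked outside the component. The sketch below lifts `buildSlots` out of the `<script>` block into plain TypeScript (same branches, slightly condensed) and prints the three layouts for a long list: current page in the middle, near the start, and near the end.

```typescript
type Slot = number | 'ellipsis';

// Standalone copy of the component's slot logic, condensed but branch-for-branch equivalent.
function buildSlots(p: number, total: number): Slot[] {
  // Short lists: show every page, no ellipsis.
  if (total <= 7) return Array.from({ length: total }, (_, i) => i + 1);
  const out: Slot[] = [1];
  if (p <= 4) {
    // Near the start: pages 1..5, then an ellipsis and the last page.
    for (let i = 2; i <= 5; i++) out.push(i);
    out.push('ellipsis', total);
  } else if (p >= total - 3) {
    // Near the end: first page, an ellipsis, then the last five pages.
    out.push('ellipsis');
    for (let i = total - 4; i <= total; i++) out.push(i);
  } else {
    // Middle: first, ellipsis, current with direct neighbours, ellipsis, last.
    out.push('ellipsis', p - 1, p, p + 1, 'ellipsis', total);
  }
  return out;
}

console.log(buildSlots(10, 24)); // [1, 'ellipsis', 9, 10, 11, 'ellipsis', 24]
console.log(buildSlots(2, 20));  // [1, 2, 3, 4, 5, 'ellipsis', 20]
console.log(buildSlots(18, 20)); // [1, 'ellipsis', 16, 17, 18, 19, 20]
```

Every layout for `total > 7` has exactly seven slots, so the bar width stays stable as the user pages through.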


@@ -0,0 +1,77 @@
import { describe, it, expect, vi, afterEach } from 'vitest';
import { render, screen, cleanup } from '@testing-library/svelte';
import Pager from './Pager.svelte';
afterEach(() => cleanup());
describe('Pager', () => {
it('renders nothing when totalPages <= 1', () => {
const { container } = render(Pager, { props: { page: 1, totalPages: 1, onChange: () => {} } });
expect(container.querySelector('nav')).toBeNull();
});
it('disables Prev on the first page and Next on the last', () => {
const { rerender } = render(Pager, {
props: { page: 1, totalPages: 5, onChange: () => {} }
});
expect((screen.getByRole('button', { name: /prev/i }) as HTMLButtonElement).disabled).toBe(true);
expect((screen.getByRole('button', { name: /next/i }) as HTMLButtonElement).disabled).toBe(false);
rerender({ page: 5, totalPages: 5, onChange: () => {} });
expect((screen.getByRole('button', { name: /prev/i }) as HTMLButtonElement).disabled).toBe(false);
expect((screen.getByRole('button', { name: /next/i }) as HTMLButtonElement).disabled).toBe(true);
});
it('marks the current page button as aria-current', () => {
render(Pager, { props: { page: 3, totalPages: 5, onChange: () => {} } });
const current = screen.getByRole('button', { name: /go to page 3/i });
expect(current.getAttribute('aria-current')).toBe('page');
});
it('fires onChange with the clicked page number', async () => {
const onChange = vi.fn();
render(Pager, { props: { page: 1, totalPages: 5, onChange } });
screen.getByRole('button', { name: /go to page 3/i }).click();
expect(onChange).toHaveBeenCalledWith(3);
});
it('Prev decrements and Next increments via onChange', () => {
const onChange = vi.fn();
render(Pager, { props: { page: 3, totalPages: 5, onChange } });
screen.getByRole('button', { name: /prev/i }).click();
screen.getByRole('button', { name: /next/i }).click();
expect(onChange).toHaveBeenNthCalledWith(1, 2);
expect(onChange).toHaveBeenNthCalledWith(2, 4);
});
it('shows every page button when totalPages <= 7', () => {
render(Pager, { props: { page: 4, totalPages: 7, onChange: () => {} } });
for (let n = 1; n <= 7; n++) {
expect(screen.getByRole('button', { name: new RegExp(`go to page ${n}$`, 'i') })).toBeTruthy();
}
});
it('collapses middle pages with ellipsis when totalPages > 7 and current is in the middle', () => {
render(Pager, { props: { page: 10, totalPages: 24, onChange: () => {} } });
// First and last are always shown
expect(screen.getByRole('button', { name: /go to page 1$/i })).toBeTruthy();
expect(screen.getByRole('button', { name: /go to page 24$/i })).toBeTruthy();
// Current and direct neighbours are shown
expect(screen.getByRole('button', { name: /go to page 9$/i })).toBeTruthy();
expect(screen.getByRole('button', { name: /go to page 10$/i })).toBeTruthy();
expect(screen.getByRole('button', { name: /go to page 11$/i })).toBeTruthy();
// Distant pages are NOT rendered as buttons
expect(screen.queryByRole('button', { name: /go to page 2$/i })).toBeNull();
expect(screen.queryByRole('button', { name: /go to page 23$/i })).toBeNull();
// Ellipsis appears on both sides
const ellipses = screen.getAllByText('…');
expect(ellipses.length).toBeGreaterThanOrEqual(2);
});
it('does not duplicate boundary buttons when current is near the edge', () => {
render(Pager, { props: { page: 2, totalPages: 20, onChange: () => {} } });
// Each page button rendered should be unique — no duplicate "go to page 1"
const first = screen.getAllByRole('button', { name: /go to page 1$/i });
expect(first.length).toBe(1);
});
});


@@ -1,6 +1,7 @@
<script lang="ts"> <script lang="ts">
import { onMount, onDestroy } from 'svelte'; import { onMount, onDestroy } from 'svelte';
import { goto } from '$app/navigation'; import { goto } from '$app/navigation';
import { page } from '$app/stores';
import { logout } from '$lib/api/auth'; import { logout } from '$lib/api/auth';
import { authConfig } from '$lib/auth-config.svelte'; import { authConfig } from '$lib/auth-config.svelte';
import { preferences } from '$lib/preferences.svelte'; import { preferences } from '$lib/preferences.svelte';
@@ -14,15 +15,49 @@
import Shield from '@lucide/svelte/icons/shield'; import Shield from '@lucide/svelte/icons/shield';
import '$lib/styles/tokens.css'; import '$lib/styles/tokens.css';
let { children } = $props(); let { children, data } = $props();
let loggingOut = $state(false); let loggingOut = $state(false);
let headerEl: HTMLElement | undefined = $state(); let headerEl: HTMLElement | undefined = $state();
// Static-route title map. Dynamic pages (manga / author / collection /
// chapter) override this via their own <svelte:head><title>, since the
// title depends on data the layout doesn't have. Routes omitted here
// (notably the dynamic ones) fall through to the bare brand and rely
// on the page to set the descriptive form.
const STATIC_TITLES: Record<string, string> = {
'/': 'Mangalord',
'/login': 'Mangalord | Login',
'/register': 'Mangalord | Register',
'/upload': 'Mangalord | Upload',
'/bookmarks': 'Mangalord | Bookmarks',
'/collections': 'Mangalord | Collections',
'/profile': 'Mangalord | Profile',
'/profile/account': 'Mangalord | Account',
'/profile/bookmarks': 'Mangalord | Bookmarks',
'/profile/collections': 'Mangalord | Collections',
'/profile/history': 'Mangalord | Reading history',
'/profile/preferences': 'Mangalord | Preferences',
'/admin': 'Mangalord | Admin',
'/admin/mangas': 'Mangalord | Admin · Mangas',
'/admin/users': 'Mangalord | Admin · Users',
'/admin/system': 'Mangalord | Admin · System'
};
const layoutTitle = $derived(STATIC_TITLES[$page.route?.id ?? ''] ?? 'Mangalord');
// Seed authConfig from the universal layout load. $effect keeps
// the store in sync if `data` is replaced by a subsequent layout
// load (client-side nav). The first run also covers initial
// hydration so the navbar's register link reflects the real
// server flag without a separate fetch.
$effect(() => {
authConfig.seed(data.authConfig);
});
onMount(() => { onMount(() => {
theme.init(); theme.init();
preferences.init(); preferences.init();
if (!session.loaded) session.refresh(); if (!session.loaded) session.refresh();
if (!authConfig.loaded) authConfig.load();
// Publish the header's measured height as a CSS custom // Publish the header's measured height as a CSS custom
// property so sticky descendants (e.g. the reader nav) can // property so sticky descendants (e.g. the reader nav) can
@@ -70,6 +105,10 @@
} }
</script> </script>
<svelte:head>
<title>{layoutTitle}</title>
</svelte:head>
<header bind:this={headerEl}> <header bind:this={headerEl}>
<nav aria-label="primary"> <nav aria-label="primary">
<a class="brand" href="/">Mangalord</a> <a class="brand" href="/">Mangalord</a>


@@ -0,0 +1,41 @@
// Universal root load. Surfaces /auth/config to every page so the
// navbar + layout can render without an extra round-trip, and — when
// the backend reports PRIVATE_MODE=true — bounces anonymous visitors
// to /login before any page-specific load fires. The backend
// middleware is still the source of truth for the gate; this just
// matches the UX so users don't see a page full of failed fetches.
import type { LayoutLoad } from './$types';
import { redirect } from '@sveltejs/kit';
import { getAuthConfig, me, type AuthConfig } from '$lib/api/auth';
// Paths reachable anonymously even when private_mode is on. /login is
// the entry point of the auth flow; everything else (including
// /register, which is force-blocked in private mode) bounces.
const PRIVATE_MODE_BYPASS = new Set(['/login']);
const PUBLIC_DEFAULTS: AuthConfig = {
self_register_enabled: true,
private_mode: false
};
export const load: LayoutLoad = async ({ url }) => {
let authConfig: AuthConfig = PUBLIC_DEFAULTS;
try {
authConfig = await getAuthConfig();
} catch {
// Fail-soft: keep the optimistic public-mode defaults so a
// backend hiccup doesn't lock anyone out of the login page.
// No private data can leak through here — the backend
// middleware is still authoritative for the gate.
}
if (authConfig.private_mode && !PRIVATE_MODE_BYPASS.has(url.pathname)) {
const user = await me().catch(() => null);
if (!user) {
const next = url.pathname + url.search;
redirect(302, `/login?next=${encodeURIComponent(next)}`);
}
}
return { authConfig };
};
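The redirect decision in this load function can be expressed as a pure function, which makes the gate easy to unit-test without SvelteKit. This is a sketch only: `privateModeRedirect` and its `signedIn` parameter are hypothetical names, and unlike the real load (which calls `me()` only when private mode is on and the path is not bypassed) the caller here supplies the sign-in state up front.

```typescript
type AuthConfig = { self_register_enabled: boolean; private_mode: boolean };

// Paths reachable anonymously in private mode, mirroring the set in the load function.
const PRIVATE_MODE_BYPASS = new Set(['/login']);

// Hypothetical helper: returns the redirect target, or null when the visitor may proceed.
function privateModeRedirect(
  cfg: AuthConfig,
  pathname: string,
  search: string,
  signedIn: boolean
): string | null {
  if (!cfg.private_mode) return null;                  // public instance: never bounce
  if (PRIVATE_MODE_BYPASS.has(pathname)) return null;  // /login must stay reachable
  if (signedIn) return null;                           // authenticated users pass through
  return `/login?next=${encodeURIComponent(pathname + search)}`;
}

const privateCfg: AuthConfig = { self_register_enabled: false, private_mode: true };
console.log(privateModeRedirect(privateCfg, '/manga/abc', '?page=2', false));
// "/login?next=%2Fmanga%2Fabc%3Fpage%3D2"
console.log(privateModeRedirect(privateCfg, '/login', '', false)); // null
console.log(privateModeRedirect(privateCfg, '/manga/abc', '', true)); // null
```

As the file's header comment notes, this client-side check only matches the UX; the backend middleware remains the actual gate.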


@@ -13,10 +13,13 @@
import { listTags, type Tag } from '$lib/api/tags'; import { listTags, type Tag } from '$lib/api/tags';
import Chip from '$lib/components/Chip.svelte'; import Chip from '$lib/components/Chip.svelte';
import MangaCard from '$lib/components/MangaCard.svelte'; import MangaCard from '$lib/components/MangaCard.svelte';
import Pager from '$lib/components/Pager.svelte';
import Search from '@lucide/svelte/icons/search'; import Search from '@lucide/svelte/icons/search';
import SlidersHorizontal from '@lucide/svelte/icons/sliders-horizontal'; import SlidersHorizontal from '@lucide/svelte/icons/sliders-horizontal';
import Plus from '@lucide/svelte/icons/plus'; import Plus from '@lucide/svelte/icons/plus';
const PAGE_SIZE = 50;
let mangas: MangaCardData[] = $state([]); let mangas: MangaCardData[] = $state([]);
let search = $state(''); let search = $state('');
let sort: MangaSort = $state('recent'); let sort: MangaSort = $state('recent');
@@ -36,11 +39,21 @@
let total: number | null = $state(null); let total: number | null = $state(null);
let loading = $state(true); let loading = $state(true);
let error: string | null = $state(null); let error: string | null = $state(null);
let currentPage = $state(1);
const activeFilterCount = $derived( const activeFilterCount = $derived(
(statusFilter ? 1 : 0) + selectedGenres.length + selectedTags.length (statusFilter ? 1 : 0) + selectedGenres.length + selectedTags.length
); );
const totalPages = $derived(
total != null && total > 0 ? Math.ceil(total / PAGE_SIZE) : 1
);
// 1-indexed range like "51–100 of 237", clamped to the actual loaded set
// in case the last page is short.
const rangeStart = $derived(mangas.length === 0 ? 0 : (currentPage - 1) * PAGE_SIZE + 1);
const rangeEnd = $derived((currentPage - 1) * PAGE_SIZE + mangas.length);
async function load() { async function load() {
loading = true; loading = true;
error = null; error = null;
@@ -50,7 +63,9 @@
status: statusFilter || undefined, status: statusFilter || undefined,
genreIds: selectedGenres.map((g) => g.id), genreIds: selectedGenres.map((g) => g.id),
tagIds: selectedTags.map((t) => t.id), tagIds: selectedTags.map((t) => t.id),
sort sort,
limit: PAGE_SIZE,
offset: (currentPage - 1) * PAGE_SIZE
}); });
mangas = result.items; mangas = result.items;
total = result.page.total; total = result.page.total;
@@ -71,11 +86,29 @@
params.set('genres', selectedGenres.map((g) => g.id).join(',')); params.set('genres', selectedGenres.map((g) => g.id).join(','));
if (selectedTags.length) if (selectedTags.length)
params.set('tags', selectedTags.map((t) => t.id).join(',')); params.set('tags', selectedTags.map((t) => t.id).join(','));
if (currentPage > 1) params.set('page', String(currentPage));
const qs = params.toString(); const qs = params.toString();
const url = qs ? `/?${qs}` : '/'; const url = qs ? `/?${qs}` : '/';
goto(url, { replaceState: true, keepFocus: true, noScroll: true }); goto(url, { replaceState: true, keepFocus: true, noScroll: true });
} }
// Filter / search / sort changes invalidate the current page — drop back
// to page 1 so the user isn't stranded on an out-of-range page when the
// result set shrinks. Direct page navigation calls `goToPage()` instead.
function resetAndReload() {
currentPage = 1;
syncUrl();
load();
}
function goToPage(p: number) {
if (p === currentPage) return;
currentPage = p;
syncUrl();
load();
if (browser) window.scrollTo({ top: 0, behavior: 'smooth' });
}
async function hydrateFromUrl() { async function hydrateFromUrl() {
// Parse the query and resolve the supplied ids back to full Tag / // Parse the query and resolve the supplied ids back to full Tag /
// Genre objects so the chip rows render real labels. // Genre objects so the chip rows render real labels.
@@ -100,6 +133,8 @@
const tags = await listTags({ limit: 50 }); const tags = await listTags({ limit: 50 });
selectedTags = tags.filter((t) => tagIds.includes(t.id)); selectedTags = tags.filter((t) => tagIds.includes(t.id));
} }
const pageParam = Number(url.searchParams.get('page') ?? '1');
currentPage = Number.isFinite(pageParam) && pageParam >= 1 ? Math.floor(pageParam) : 1;
// Open the filters panel if anything is active so the user can see why. // Open the filters panel if anything is active so the user can see why.
if (statusFilter || selectedGenres.length || selectedTags.length) { if (statusFilter || selectedGenres.length || selectedTags.length) {
filtersOpen = true; filtersOpen = true;
@@ -108,32 +143,27 @@
async function onSubmit(e: SubmitEvent) { async function onSubmit(e: SubmitEvent) {
e.preventDefault(); e.preventDefault();
syncUrl(); resetAndReload();
await load();
} }
function onSortChange() { function onSortChange() {
syncUrl(); resetAndReload();
load();
} }
function onStatusChange() { function onStatusChange() {
syncUrl(); resetAndReload();
load();
} }
function toggleGenre(g: Genre) { function toggleGenre(g: Genre) {
selectedGenres = selectedGenres.some((x) => x.id === g.id) selectedGenres = selectedGenres.some((x) => x.id === g.id)
? selectedGenres.filter((x) => x.id !== g.id) ? selectedGenres.filter((x) => x.id !== g.id)
: [...selectedGenres, g]; : [...selectedGenres, g];
syncUrl(); resetAndReload();
load();
} }
function removeTag(t: Tag) { function removeTag(t: Tag) {
selectedTags = selectedTags.filter((x) => x.id !== t.id); selectedTags = selectedTags.filter((x) => x.id !== t.id);
syncUrl(); resetAndReload();
load();
} }
function pickTag(t: Tag) { function pickTag(t: Tag) {
@@ -143,8 +173,7 @@
tagDraft = ''; tagDraft = '';
tagSuggestions = []; tagSuggestions = [];
tagSuggestHighlight = -1; tagSuggestHighlight = -1;
syncUrl(); resetAndReload();
load();
} }
function onTagDraftInput() { function onTagDraftInput() {
@@ -192,8 +221,7 @@
statusFilter = ''; statusFilter = '';
selectedGenres = []; selectedGenres = [];
selectedTags = []; selectedTags = [];
syncUrl(); resetAndReload();
load();
} }
onMount(async () => { onMount(async () => {
@@ -383,7 +411,7 @@
{:else} {:else}
{#if total !== null} {#if total !== null}
<p class="count" data-testid="manga-total"> <p class="count" data-testid="manga-total">
Showing {mangas.length} of {total} Showing {rangeStart}–{rangeEnd} of {total}
</p> </p>
{/if} {/if}
<ul class="manga-grid" data-testid="manga-list"> <ul class="manga-grid" data-testid="manga-list">
@@ -391,6 +419,12 @@
<MangaCard manga={m} authors={m.authors} genres={m.genres} /> <MangaCard manga={m} authors={m.authors} genres={m.genres} />
{/each} {/each}
</ul> </ul>
<Pager
page={currentPage}
{totalPages}
onChange={goToPage}
testid="manga-pager"
/>
{/if} {/if}
<style> <style>
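The pagination arithmetic used on this page (and repeated in the author route) can be collected into two small helpers. This is a sketch under assumptions: `parsePage` and `pageWindow` are hypothetical names, since the diff inlines the same expressions rather than sharing a helper.

```typescript
// Hypothetical helper: same expression the diff uses to read ?page= from the URL.
function parsePage(raw: string | null): number {
  const n = Number(raw ?? '1');
  // Non-numeric, zero, negative, and infinite values all fall back to page 1.
  return Number.isFinite(n) && n >= 1 ? Math.floor(n) : 1;
}

type PageWindow = { offset: number; totalPages: number; rangeStart: number; rangeEnd: number };

// Hypothetical helper: derives the request offset and the "Showing X–Y of N" range.
function pageWindow(
  currentPage: number,
  pageSize: number,
  loaded: number,
  total: number | null
): PageWindow {
  const offset = (currentPage - 1) * pageSize;
  return {
    offset,
    totalPages: total != null && total > 0 ? Math.ceil(total / pageSize) : 1,
    rangeStart: loaded === 0 ? 0 : offset + 1,
    rangeEnd: offset + loaded
  };
}

console.log(parsePage(null), parsePage('3'), parsePage('2.7'), parsePage('abc'), parsePage('0'));
// 1 3 2 1 1
console.log(pageWindow(5, 50, 37, 237));
// { offset: 200, totalPages: 5, rangeStart: 201, rangeEnd: 237 }
```

Using the number of items actually loaded for `rangeEnd`, rather than `currentPage * pageSize`, is what keeps the range correct on a short last page.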


@@ -71,16 +71,19 @@
> >
<input <input
type="search" type="search"
placeholder="search by title" placeholder="Search by title"
bind:value={search} bind:value={search}
data-testid="admin-mangas-search" data-testid="admin-mangas-search"
/> />
<select bind:value={syncFilter} aria-label="sync state"> <label class="sync-label">
<option value="">all states</option> <span>Sync state</span>
<option value="in_progress">in progress</option> <select bind:value={syncFilter} aria-label="sync state">
<option value="dropped">dropped</option> <option value="">All</option>
<option value="synced">synced</option> <option value="in_progress">In progress</option>
</select> <option value="dropped">Dropped</option>
<option value="synced">Synced</option>
</select>
</label>
<button type="submit">Search</button> <button type="submit">Search</button>
</form> </form>
@@ -173,17 +176,28 @@
} }
form { form {
display: flex; display: flex;
flex-wrap: wrap;
align-items: center;
gap: var(--space-2); gap: var(--space-2);
margin-bottom: var(--space-3); margin-bottom: var(--space-3);
} }
input[type='search'] { input[type='search'] {
flex: 1; flex: 1;
min-width: 0;
max-width: 24rem;
padding: var(--space-2) var(--space-3); padding: var(--space-2) var(--space-3);
border: 1px solid var(--border); border: 1px solid var(--border);
border-radius: var(--radius-md); border-radius: var(--radius-md);
background: var(--surface); background: var(--surface);
color: var(--text); color: var(--text);
} }
.sync-label {
display: inline-flex;
align-items: center;
gap: var(--space-2);
color: var(--text-muted);
font-size: var(--font-sm);
}
select { select {
padding: var(--space-2) var(--space-3); padding: var(--space-2) var(--space-3);
border-radius: var(--radius-md); border-radius: var(--radius-md);


@@ -1,15 +1,33 @@
<script lang="ts"> <script lang="ts">
import MangaCard from '$lib/components/MangaCard.svelte'; import MangaCard from '$lib/components/MangaCard.svelte';
import Pager from '$lib/components/Pager.svelte';
import ArrowLeft from '@lucide/svelte/icons/arrow-left'; import ArrowLeft from '@lucide/svelte/icons/arrow-left';
import { goto } from '$app/navigation';
import { page } from '$app/stores';
let { data } = $props(); let { data } = $props();
const author = $derived(data.author); const author = $derived(data.author);
const mangas = $derived(data.mangas); const mangas = $derived(data.mangas);
const total = $derived(data.total); const total = $derived(data.total);
const currentPage = $derived(data.currentPage);
const pageSize = $derived(data.pageSize);
const totalPages = $derived(
total != null && total > 0 ? Math.ceil(total / pageSize) : 1
);
const rangeStart = $derived(mangas.length === 0 ? 0 : (currentPage - 1) * pageSize + 1);
const rangeEnd = $derived((currentPage - 1) * pageSize + mangas.length);
function goToPage(p: number) {
if (p === currentPage) return;
const url = new URL($page.url);
if (p === 1) url.searchParams.delete('page');
else url.searchParams.set('page', String(p));
goto(url.pathname + url.search, { noScroll: false });
}
</script> </script>
<svelte:head> <svelte:head>
<title>{author.name} — Mangalord</title> <title>Mangalord | {author.name}</title>
</svelte:head> </svelte:head>
<nav class="back"> <nav class="back">
@@ -34,7 +52,7 @@
{:else} {:else}
{#if total != null} {#if total != null}
<p class="meta" data-testid="author-shown-of-total"> <p class="meta" data-testid="author-shown-of-total">
Showing {mangas.length} of {total} Showing {rangeStart}–{rangeEnd} of {total}
</p> </p>
{/if} {/if}
<ul class="manga-grid" data-testid="author-manga-list"> <ul class="manga-grid" data-testid="author-manga-list">
@@ -42,6 +60,12 @@
<MangaCard manga={m} testid={`author-manga-${m.id}`} /> <MangaCard manga={m} testid={`author-manga-${m.id}`} />
{/each} {/each}
</ul> </ul>
<Pager
page={currentPage}
{totalPages}
onChange={goToPage}
testid="author-pager"
/>
{/if} {/if}
<style> <style>


@@ -5,13 +5,27 @@ import type { PageLoad } from './$types';
export const ssr = false; export const ssr = false;
export const load: PageLoad = async ({ params }) => { const PAGE_SIZE = 50;
export const load: PageLoad = async ({ params, url }) => {
const pageParam = Number(url.searchParams.get('page') ?? '1');
const currentPage =
Number.isFinite(pageParam) && pageParam >= 1 ? Math.floor(pageParam) : 1;
try { try {
const [author, mangas] = await Promise.all([ const [author, mangas] = await Promise.all([
getAuthor(params.id), getAuthor(params.id),
listAuthorMangas(params.id, { limit: 50 }) listAuthorMangas(params.id, {
limit: PAGE_SIZE,
offset: (currentPage - 1) * PAGE_SIZE
})
]); ]);
return { author, mangas: mangas.items, total: mangas.page.total }; return {
author,
mangas: mangas.items,
total: mangas.page.total,
currentPage,
pageSize: PAGE_SIZE
};
} catch (e) { } catch (e) {
// 404 surfaces as a real SvelteKit error so the framework shell // 404 surfaces as a real SvelteKit error so the framework shell
// renders the standard not-found page instead of the route's // renders the standard not-found page instead of the route's
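
The paging arithmetic in this change (clamping `?page=` in the loader, then deriving `totalPages`, `rangeStart`, and `rangeEnd` in the component) can be sketched as two plain functions. This is a minimal sketch rather than the project's code: `parsePage` and `pageRange` are hypothetical names introduced for illustration, and only `PAGE_SIZE = 50` and the formulas themselves come from the diff.

```typescript
const PAGE_SIZE = 50;

interface PageRange {
  totalPages: number;
  rangeStart: number;
  rangeEnd: number;
}

// Hypothetical helper: clamp a raw ?page= value to a positive integer.
// Missing, non-numeric, zero, or negative values all fall back to page 1.
function parsePage(raw: string | null): number {
  const n = Number(raw ?? '1');
  return Number.isFinite(n) && n >= 1 ? Math.floor(n) : 1;
}

// Hypothetical helper: same formulas as the $derived values in the component.
function pageRange(
  currentPage: number,
  itemsOnPage: number,
  total: number | null,
  pageSize: number = PAGE_SIZE
): PageRange {
  const totalPages = total != null && total > 0 ? Math.ceil(total / pageSize) : 1;
  const rangeStart = itemsOnPage === 0 ? 0 : (currentPage - 1) * pageSize + 1;
  const rangeEnd = (currentPage - 1) * pageSize + itemsOnPage;
  return { totalPages, rangeStart, rangeEnd };
}
```

With 120 results, page 3 holds 20 items, so `pageRange(3, 20, 120)` gives a range of 101 to 120 over 3 pages. One consequence of the formulas as written: a page past the end (for example page 5 with no items) yields `rangeStart` 0 but `rangeEnd` 200, so a caller may want to clamp `currentPage` to `totalPages` before rendering.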


@@ -7,10 +7,6 @@
const error = $derived(data.error); const error = $derived(data.error);
</script> </script>
<svelte:head>
<title>Bookmarks — Mangalord</title>
</svelte:head>
<h1>Bookmarks</h1> <h1>Bookmarks</h1>
{#if error} {#if error}


@@ -5,10 +5,6 @@
const collections = $derived(data.collections); const collections = $derived(data.collections);
</script> </script>
<svelte:head>
<title>Collections — Mangalord</title>
</svelte:head>
<h1>Collections</h1> <h1>Collections</h1>
{#if !data.authenticated} {#if !data.authenticated}


@@ -75,7 +75,7 @@
</script> </script>
<svelte:head> <svelte:head>
<title>{collection.name} — Mangalord</title> <title>Mangalord | {collection.name}</title>
</svelte:head> </svelte:head>
<nav class="back"> <nav class="back">


@@ -0,0 +1,113 @@
import { describe, it, expect, vi, beforeEach } from 'vitest';
// Mock the API client *before* importing the load function so the
// module under test picks up the mock when it resolves its imports.
vi.mock('$lib/api/auth', () => ({
getAuthConfig: vi.fn(),
me: vi.fn()
}));
import { load } from './+layout';
import { getAuthConfig, me, type AuthConfig } from '$lib/api/auth';
type MinimalLoadEvent = { url: { pathname: string; search: string } };
function event(pathname: string, search = ''): MinimalLoadEvent {
return { url: { pathname, search } };
}
// `LayoutLoad`'s declared return type is `void | …`. Our `load`
// always returns `{ authConfig }`, but TypeScript can't narrow on
// that at the call site. Wrap to remove the `void` arm so the
// assertions stay terse.
async function callLoad(ev: MinimalLoadEvent): Promise<{ authConfig: AuthConfig }> {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const result = await load(ev as any);
return result as { authConfig: AuthConfig };
}
const PUBLIC_CFG = { self_register_enabled: true, private_mode: false };
const PRIVATE_CFG = { self_register_enabled: false, private_mode: true };
const aliceUser = {
id: 'u1',
username: 'alice',
created_at: '2026-01-01T00:00:00Z',
is_admin: false
};
describe('root +layout load', () => {
beforeEach(() => {
vi.mocked(getAuthConfig).mockReset();
vi.mocked(me).mockReset();
});
it('public mode: returns authConfig data, never calls me()', async () => {
vi.mocked(getAuthConfig).mockResolvedValue(PUBLIC_CFG);
const data = await callLoad(event('/'));
expect(data.authConfig).toEqual(PUBLIC_CFG);
expect(me).not.toHaveBeenCalled();
});
it('private mode + anonymous on `/`: throws redirect(302) to /login with next=', async () => {
vi.mocked(getAuthConfig).mockResolvedValue(PRIVATE_CFG);
vi.mocked(me).mockResolvedValue(null);
// eslint-disable-next-line @typescript-eslint/no-explicit-any
await expect(load(event('/') as any)).rejects.toMatchObject({
status: 302,
location: '/login?next=%2F'
});
});
it('private mode + anonymous on `/login`: passes through without redirect', async () => {
vi.mocked(getAuthConfig).mockResolvedValue(PRIVATE_CFG);
const data = await callLoad(event('/login'));
expect(data.authConfig.private_mode).toBe(true);
// me() must not run on the login page itself, otherwise anonymous
// visits make an extra round-trip every page load.
expect(me).not.toHaveBeenCalled();
});
it('private mode + logged-in user: no redirect, returns authConfig', async () => {
vi.mocked(getAuthConfig).mockResolvedValue(PRIVATE_CFG);
vi.mocked(me).mockResolvedValue(aliceUser);
const data = await callLoad(event('/'));
expect(data.authConfig).toEqual(PRIVATE_CFG);
});
it('private mode + anonymous: preserves pathname AND search in next=', async () => {
vi.mocked(getAuthConfig).mockResolvedValue(PRIVATE_CFG);
vi.mocked(me).mockResolvedValue(null);
await expect(
// eslint-disable-next-line @typescript-eslint/no-explicit-any
load(event('/manga/abc', '?page=3') as any)
).rejects.toMatchObject({
status: 302,
location: '/login?next=%2Fmanga%2Fabc%3Fpage%3D3'
});
});
it('private mode + anonymous on /register: redirects to /login (register is never reachable in private mode)', async () => {
vi.mocked(getAuthConfig).mockResolvedValue(PRIVATE_CFG);
vi.mocked(me).mockResolvedValue(null);
// eslint-disable-next-line @typescript-eslint/no-explicit-any
await expect(load(event('/register') as any)).rejects.toMatchObject({
status: 302,
location: '/login?next=%2Fregister'
});
});
it('getAuthConfig failure: falls back to public-mode defaults, no redirect', async () => {
// The backend middleware is the source of truth for the gate;
// if the config probe blips, fail soft so a brief outage doesn't
// lock everyone out of even the login page. No private data
// can leak because the backend still 401s every request.
vi.mocked(getAuthConfig).mockRejectedValue(new Error('network'));
const data = await callLoad(event('/'));
expect(data.authConfig).toEqual({
self_register_enabled: true,
private_mode: false
});
expect(me).not.toHaveBeenCalled();
});
});
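
The behaviour these tests pin down can be summarised as a small decision function. The layout's actual `load` is not part of this diff, so the following is only a sketch inferred from the assertions above; `loginGate` is a hypothetical name, and the real implementation throws a SvelteKit `redirect(302, …)` rather than returning a string.

```typescript
interface AuthConfig {
  self_register_enabled: boolean;
  private_mode: boolean;
}

// Hypothetical helper: returns the redirect location, or null when the
// request may proceed. Mirrors the cases exercised by the test suite.
function loginGate(
  cfg: AuthConfig,
  pathname: string,
  search: string,
  loggedIn: boolean
): string | null {
  if (!cfg.private_mode) return null; // public mode: never gate
  if (pathname === '/login') return null; // the login page itself passes through
  if (loggedIn) return null; // authenticated users pass through
  // Preserve both pathname and query string so login can return the user.
  return `/login?next=${encodeURIComponent(pathname + search)}`;
}
```

Encoding the whole `pathname + search` with `encodeURIComponent` is what produces values such as `%2Fmanga%2Fabc%3Fpage%3D3`, matching the expected `location` in the tests.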


@@ -1,13 +1,16 @@
<script lang="ts"> <script lang="ts">
import { fileUrl } from '$lib/api/client'; import { fileUrl, ApiError } from '$lib/api/client';
import { createBookmark, deleteBookmark, type Bookmark } from '$lib/api/bookmarks'; import { createBookmark, deleteBookmark, type Bookmark } from '$lib/api/bookmarks';
import { import {
attachTag, attachTag,
detachTag, detachTag,
type AuthorRef, type AuthorRef,
type GenreRef, type GenreRef,
type MangaDetail,
type TagRef type TagRef
} from '$lib/api/mangas'; } from '$lib/api/mangas';
import { resyncManga } from '$lib/api/admin';
import { chapterLabel } from '$lib/api/chapters';
import { listTags, type Tag } from '$lib/api/tags'; import { listTags, type Tag } from '$lib/api/tags';
import { session } from '$lib/session.svelte'; import { session } from '$lib/session.svelte';
import Chip from '$lib/components/Chip.svelte'; import Chip from '$lib/components/Chip.svelte';
@@ -16,9 +19,15 @@
import FolderPlus from '@lucide/svelte/icons/folder-plus'; import FolderPlus from '@lucide/svelte/icons/folder-plus';
import Pencil from '@lucide/svelte/icons/pencil'; import Pencil from '@lucide/svelte/icons/pencil';
import UploadCloud from '@lucide/svelte/icons/upload-cloud'; import UploadCloud from '@lucide/svelte/icons/upload-cloud';
import RefreshCw from '@lucide/svelte/icons/refresh-cw';
let { data } = $props(); let { data } = $props();
const manga = $derived(data.manga); // `manga` is locally overridable so a successful force resync can
// swap in the refreshed detail (new cover URL, refreshed status,
// etc.) without a router reload. Falls back to the server-loaded
// data otherwise.
let mangaOverride = $state<MangaDetail | null>(null);
const manga = $derived<MangaDetail>(mangaOverride ?? data.manga);
const chapters = $derived(data.chapters); const chapters = $derived(data.chapters);
const readProgress = $derived(data.readProgress); const readProgress = $derived(data.readProgress);
/** Chapter row from the local chapters list when present (so we /** Chapter row from the local chapters list when present (so we
@@ -37,6 +46,11 @@
continueChapter?.number ?? readProgress?.chapter_number ?? null continueChapter?.number ?? readProgress?.chapter_number ?? null
); );
const continueChapterTitle = $derived(continueChapter?.title ?? null); const continueChapterTitle = $derived(continueChapter?.title ?? null);
const continueLabel = $derived(
continueChapterNumber != null
? chapterLabel({ number: continueChapterNumber, title: continueChapterTitle })
: null
);
const authors = $derived<AuthorRef[]>(manga.authors); const authors = $derived<AuthorRef[]>(manga.authors);
const genres = $derived<GenreRef[]>(manga.genres); const genres = $derived<GenreRef[]>(manga.genres);
@@ -171,10 +185,35 @@
const statusLabel = $derived(manga.status === 'completed' ? 'Completed' : 'Ongoing'); const statusLabel = $derived(manga.status === 'completed' ? 'Completed' : 'Ongoing');
let collectionModalOpen = $state(false); let collectionModalOpen = $state(false);
// ---- Admin force resync ----
let resyncBusy = $state(false);
let resyncMessage = $state<{ kind: 'ok' | 'err'; text: string } | null>(null);
async function forceResync() {
if (!session.user?.is_admin || resyncBusy) return;
resyncBusy = true;
resyncMessage = null;
try {
const r = await resyncManga(manga.id);
mangaOverride = r.manga;
const coverNote = r.cover_fetched
? ' Cover re-downloaded.'
: ' Cover unchanged.';
resyncMessage = {
kind: 'ok',
text: `Metadata ${r.metadata_status}.${coverNote}`
};
} catch (e) {
const msg = e instanceof ApiError ? e.message : (e as Error).message;
resyncMessage = { kind: 'err', text: msg };
} finally {
resyncBusy = false;
}
}
</script> </script>
<svelte:head> <svelte:head>
<title>{manga.title} — Mangalord</title> <title>Mangalord | {manga.title}</title>
</svelte:head> </svelte:head>
<article> <article>
@@ -344,7 +383,34 @@
<UploadCloud size={16} aria-hidden="true" /> <UploadCloud size={16} aria-hidden="true" />
<span>Upload chapter</span> <span>Upload chapter</span>
</a> </a>
{#if session.user.is_admin}
<button
type="button"
class="action"
onclick={forceResync}
disabled={resyncBusy}
title="Refetch metadata + cover from the crawler source"
data-testid="force-resync-manga"
>
<RefreshCw
size={16}
aria-hidden="true"
class={resyncBusy ? 'spin' : ''}
/>
<span>{resyncBusy ? 'Resyncing…' : 'Force resync'}</span>
</button>
{/if}
</div> </div>
{#if resyncMessage}
<p
class="resync-msg"
class:err={resyncMessage.kind === 'err'}
role="status"
data-testid="force-resync-message"
>
{resyncMessage.text}
</p>
{/if}
{:else} {:else}
<a class="action" href="/login" data-testid="bookmark-signin"> <a class="action" href="/login" data-testid="bookmark-signin">
Sign in to bookmark or collect Sign in to bookmark or collect
@@ -371,7 +437,7 @@
> >
<span class="continue-label">Continue reading</span> <span class="continue-label">Continue reading</span>
<span class="continue-target"> <span class="continue-target">
Chapter {continueChapterNumber}{#if continueChapterTitle}: {continueChapterTitle}{/if} {continueLabel}
{#if readProgress && readProgress.page > 1} {#if readProgress && readProgress.page > 1}
— page {readProgress.page} — page {readProgress.page}
{/if} {/if}
@@ -385,7 +451,7 @@
{#each chapters as c (c.id)} {#each chapters as c (c.id)}
<li> <li>
<a href="/manga/{manga.id}/chapter/{c.id}"> <a href="/manga/{manga.id}/chapter/{c.id}">
Chapter {c.number}{#if c.title}: {c.title}{/if} {chapterLabel(c)}
</a> </a>
<span class="pages">({c.page_count} pages)</span> <span class="pages">({c.page_count} pages)</span>
</li> </li>
@@ -586,6 +652,29 @@
color: var(--text); color: var(--text);
} }
.resync-msg {
margin-top: var(--space-2);
color: var(--text-muted);
font-size: var(--font-sm);
}
.resync-msg.err {
color: var(--danger);
}
:global(.spin) {
animation: spin 0.9s linear infinite;
}
@keyframes spin {
from {
transform: rotate(0deg);
}
to {
transform: rotate(360deg);
}
}
.continue { .continue {
display: flex; display: flex;
flex-direction: column; flex-direction: column;


@@ -1,10 +1,12 @@
<script lang="ts"> <script lang="ts">
import { onMount, onDestroy } from 'svelte'; import { onMount, onDestroy } from 'svelte';
import { goto } from '$app/navigation'; import { goto, invalidateAll } from '$app/navigation';
import { fileUrl } from '$lib/api/client'; import { fileUrl, ApiError } from '$lib/api/client';
import { GAP_PX, type ReaderPageGap } from '$lib/api/preferences'; import { GAP_PX, type ReaderPageGap } from '$lib/api/preferences';
import { preferences } from '$lib/preferences.svelte'; import { preferences } from '$lib/preferences.svelte';
import { updateReadProgress } from '$lib/api/read_progress'; import { updateReadProgress } from '$lib/api/read_progress';
import { chapterLabel } from '$lib/api/chapters';
import { resyncChapter } from '$lib/api/admin';
import { readerFullscreen } from '$lib/reader-fullscreen.svelte'; import { readerFullscreen } from '$lib/reader-fullscreen.svelte';
import { session } from '$lib/session.svelte'; import { session } from '$lib/session.svelte';
import ChevronLeft from '@lucide/svelte/icons/chevron-left'; import ChevronLeft from '@lucide/svelte/icons/chevron-left';
@@ -15,6 +17,7 @@
import ScrollText from '@lucide/svelte/icons/scroll-text'; import ScrollText from '@lucide/svelte/icons/scroll-text';
import Maximize2 from '@lucide/svelte/icons/maximize-2'; import Maximize2 from '@lucide/svelte/icons/maximize-2';
import Minimize2 from '@lucide/svelte/icons/minimize-2'; import Minimize2 from '@lucide/svelte/icons/minimize-2';
import RefreshCw from '@lucide/svelte/icons/refresh-cw';
let { data } = $props(); let { data } = $props();
const manga = $derived(data.manga); const manga = $derived(data.manga);
@@ -26,28 +29,25 @@
const gapPx = $derived(GAP_PX[preferences.readerPageGap]); const gapPx = $derived(GAP_PX[preferences.readerPageGap]);
const pageTitle = $derived( const pageTitle = $derived(
chapter.title `Mangalord | ${manga.title} · ${chapterLabel(chapter)}`
? `${manga.title} — Ch. ${chapter.number}: ${chapter.title}`
: `${manga.title} — Ch. ${chapter.number}`
); );
// Prev/next chapter computed from the chapter list. listChapters // Prev/next chapter computed from the chapter list. listChapters
// returns chapters in number ASC order; we still resolve via find // returns chapters in display order (reversed source-site order, so
// rather than index because the current chapter's position may // oldest first — see backend repo::chapter::list_for_manga), and
// not be `chapter.number - 1` (sparse numbering / chapter 0.5 / // prev/next walks that order positionally. Resolving the current
// future skipped numbers). // index via `find` rather than `chapter.number - 1` matters because
const sortedChapters = $derived( // numbers aren't a reliable index: variants share numbers, non-
[...chapters].sort((a, b) => a.number - b.number) // numeric entries pin to 0, and uploads can sparse-fill.
);
const currentIdx = $derived( const currentIdx = $derived(
sortedChapters.findIndex((c) => c.id === chapter.id) chapters.findIndex((c) => c.id === chapter.id)
); );
const prevChapter = $derived( const prevChapter = $derived(
currentIdx > 0 ? sortedChapters[currentIdx - 1] : null currentIdx > 0 ? chapters[currentIdx - 1] : null
); );
const nextChapter = $derived( const nextChapter = $derived(
currentIdx >= 0 && currentIdx < sortedChapters.length - 1 currentIdx >= 0 && currentIdx < chapters.length - 1
? sortedChapters[currentIdx + 1] ? chapters[currentIdx + 1]
: null : null
); );
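
The positional prev/next walk described in the comment above can be sketched in isolation. This is an illustrative sketch, not the component's code: `neighbours` is a hypothetical name, and the sample list assumes the server already returns chapters in display order, as the comment states.

```typescript
// Hypothetical helper: find the current chapter by id, then step by
// position in the already-ordered list. Chapter numbers are never used
// as an index, so shared numbers and number=0 entries are handled.
function neighbours<T extends { id: string }>(
  chapters: readonly T[],
  currentId: string
): { prev: T | null; next: T | null } {
  const idx = chapters.findIndex((c) => c.id === currentId);
  return {
    prev: idx > 0 ? chapters[idx - 1] ?? null : null,
    next: idx >= 0 && idx < chapters.length - 1 ? chapters[idx + 1] ?? null : null
  };
}

// Sample data (invented for illustration): a notice with number 0 sits
// mid-list, and two variants share number 14.
const sample = [
  { id: 'a', number: 13 },
  { id: 'n', number: 0 },
  { id: 'b', number: 14 },
  { id: 'c', number: 14 }
];
```

Here `neighbours(sample, 'n')` steps back to `a` and forward to `b`, which a sort by `number` would not preserve. An id that is not in the list yields `null` for both directions.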
@@ -256,6 +256,36 @@
if (typeof window !== 'undefined') window.removeEventListener('keydown', onKeydown); if (typeof window !== 'undefined') window.removeEventListener('keydown', onKeydown);
}); });
// ---- Admin force resync (current chapter) ----
let resyncBusy = $state(false);
let resyncMessage = $state<{ kind: 'ok' | 'err'; text: string } | null>(null);
async function forceResync() {
if (!session.user?.is_admin || resyncBusy) return;
resyncBusy = true;
resyncMessage = null;
try {
const r = await resyncChapter(chapter.id);
if (r.outcome === 'fetched') {
resyncMessage = {
kind: 'ok',
text: `Refetched ${r.pages} page${r.pages === 1 ? '' : 's'}. Reloading…`
};
// Re-run all loaders for this route so the reader picks
// up the freshly-downloaded pages. The page.ts loader
// doesn't `depends()` on anything explicitly, so
// invalidateAll is the right brush here.
await invalidateAll();
} else {
resyncMessage = { kind: 'ok', text: 'No new pages — source had nothing fresh.' };
}
} catch (e) {
const msg = e instanceof ApiError ? e.message : (e as Error).message;
resyncMessage = { kind: 'err', text: msg };
} finally {
resyncBusy = false;
}
}
// ---- Reading progress tracking ---- // ---- Reading progress tracking ----
// //
// High-water mark seeded from the server: progress only ever moves // High-water mark seeded from the server: progress only ever moves
@@ -427,6 +457,27 @@
</a> </a>
<div class="controls" role="group" aria-label="reader options"> <div class="controls" role="group" aria-label="reader options">
<label class="chapter-field">
<span class="visually-hidden">Jump to chapter</span>
<select
class="chapter-select"
value={chapter.id}
onchange={(e) => {
const target = (e.currentTarget as HTMLSelectElement).value;
if (target && target !== chapter.id) {
void goto(`/manga/${manga.id}/chapter/${target}`);
}
}}
data-testid="reader-chapter-select"
>
{#each chapters as c (c.id)}
<option value={c.id}>
{chapterLabel(c)}
</option>
{/each}
</select>
</label>
<div class="mode-toggle" role="radiogroup" aria-label="layout"> <div class="mode-toggle" role="radiogroup" aria-label="layout">
<button <button
type="button" type="button"
@@ -481,6 +532,23 @@
{/if} {/if}
</span> </span>
{#if session.user?.is_admin}
<button
type="button"
class="reader-resync"
onclick={forceResync}
disabled={resyncBusy}
title={resyncMessage?.kind === 'err'
? resyncMessage.text
: 'Force refetch this chapter from the crawler source'}
aria-label="Force resync chapter"
data-testid="force-resync-chapter"
>
<RefreshCw size={16} aria-hidden="true" class={resyncBusy ? 'spin' : ''} />
<span>{resyncBusy ? 'Resyncing…' : 'Force resync'}</span>
</button>
{/if}
<button <button
type="button" type="button"
class="fullscreen-toggle" class="fullscreen-toggle"
@@ -494,6 +562,17 @@
</button> </button>
</nav> </nav>
{#if resyncMessage}
<p
class="resync-toast"
class:err={resyncMessage.kind === 'err'}
role="status"
data-testid="force-resync-message"
>
{resyncMessage.text}
</p>
{/if}
<!-- <!--
Floating exit affordance — only rendered while focus mode is on. Floating exit affordance — only rendered while focus mode is on.
Lives in the top-right corner with a low resting opacity so it Lives in the top-right corner with a low resting opacity so it
@@ -604,7 +683,7 @@
</span> </span>
</button> </button>
<span class="chapter-bar-current" aria-hidden="true"> <span class="chapter-bar-current" aria-hidden="true">
Ch. {chapter.number}{#if chapter.title}{chapter.title}{/if} {chapterLabel(chapter)}
</span> </span>
<button <button
type="button" type="button"
@@ -741,7 +820,8 @@
outline-offset: -2px; outline-offset: -2px;
} }
.gap-field select { .gap-field select,
.chapter-select {
height: 32px; height: 32px;
padding: 0 var(--space-2); padding: 0 var(--space-2);
background: var(--surface); background: var(--surface);
@@ -751,6 +831,13 @@
font-size: var(--font-sm); font-size: var(--font-sm);
} }
/* Cap the chapter dropdown's resting width so long titles don't
push the rest of the nav off-screen; the native control's
expanded menu still shows full option text on focus. */
.chapter-select {
max-width: 16rem;
}
.visually-hidden { .visually-hidden {
position: absolute; position: absolute;
width: 1px; width: 1px;
@@ -911,7 +998,8 @@
} }
/* ===== Focus-mode controls ===== */ /* ===== Focus-mode controls ===== */
.fullscreen-toggle { .fullscreen-toggle,
.reader-resync {
display: inline-flex; display: inline-flex;
align-items: center; align-items: center;
gap: var(--space-1); gap: var(--space-1);
@@ -925,12 +1013,52 @@
font-size: var(--font-xs); font-size: var(--font-xs);
} }
.fullscreen-toggle:hover { .fullscreen-toggle:hover,
.reader-resync:hover:not(:disabled) {
background: var(--surface-elevated); background: var(--surface-elevated);
color: var(--text); color: var(--text);
border-color: var(--primary); border-color: var(--primary);
} }
.reader-resync:disabled {
opacity: 0.7;
cursor: progress;
}
.resync-toast {
position: fixed;
top: calc(var(--app-header-h) + var(--reader-nav-h, 48px) + var(--space-2));
right: var(--space-3);
z-index: 11;
margin: 0;
padding: var(--space-2) var(--space-3);
max-width: min(420px, calc(100vw - 2 * var(--space-3)));
background: var(--surface);
color: var(--text);
border: 1px solid var(--primary);
border-radius: var(--radius-md);
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.12);
font-size: var(--font-sm);
}
.resync-toast.err {
border-color: var(--danger);
color: var(--danger);
}
:global(.spin) {
animation: spin 0.9s linear infinite;
}
@keyframes spin {
from {
transform: rotate(0deg);
}
to {
transform: rotate(360deg);
}
}
/* Small floating exit affordance — corner-pinned, low resting /* Small floating exit affordance — corner-pinned, low resting
opacity so it doesn't sit on the chapter image too aggressively opacity so it doesn't sit on the chapter image too aggressively
but is still findable without hover. */ but is still findable without hover. */


@@ -135,7 +135,7 @@
</script> </script>
<svelte:head> <svelte:head>
<title>Edit {manga.title} — Mangalord</title> <title>Mangalord | Edit · {manga.title}</title>
</svelte:head> </svelte:head>
<h1>Edit manga</h1> <h1>Edit manga</h1>


@@ -57,7 +57,7 @@
</script> </script>
<svelte:head> <svelte:head>
<title>Upload chapter {manga.title} — Mangalord</title> <title>Mangalord | Upload chapter · {manga.title}</title>
</svelte:head> </svelte:head>
<nav class="back"> <nav class="back">


@@ -35,10 +35,6 @@
); );
</script> </script>
<svelte:head>
<title>Profile — Mangalord</title>
</svelte:head>
<header class="profile-header"> <header class="profile-header">
<h1>Profile</h1> <h1>Profile</h1>
{#if !session.loaded} {#if !session.loaded}


@@ -1,5 +1,6 @@
<script lang="ts"> <script lang="ts">
import { fileUrl } from '$lib/api/client'; import { fileUrl } from '$lib/api/client';
import { chapterLabel } from '$lib/api/chapters';
import { clearReadProgress, type ReadProgressSummary } from '$lib/api/read_progress'; import { clearReadProgress, type ReadProgressSummary } from '$lib/api/read_progress';
import BookImage from '@lucide/svelte/icons/book-image'; import BookImage from '@lucide/svelte/icons/book-image';
import Trash2 from '@lucide/svelte/icons/trash-2'; import Trash2 from '@lucide/svelte/icons/trash-2';
@@ -186,7 +187,7 @@
<a href="/manga/{u.manga_id}" class="title">{u.manga_title}</a> <a href="/manga/{u.manga_id}" class="title">{u.manga_title}</a>
<span class="target"> <span class="target">
<a href="/manga/{u.manga_id}/chapter/{u.chapter.id}"> <a href="/manga/{u.manga_id}/chapter/{u.chapter.id}">
Chapter {u.chapter.number}{#if u.chapter.title}: {u.chapter.title}{/if} {chapterLabel(u.chapter)}
</a> </a>
<span class="muted">({u.chapter.page_count} pages)</span> <span class="muted">({u.chapter.page_count} pages)</span>
</span> </span>


@@ -184,10 +184,6 @@
} }
</script> </script>
<svelte:head>
<title>Upload — Mangalord</title>
</svelte:head>
<h1>Create manga</h1> <h1>Create manga</h1>
{#if !session.loaded} {#if !session.loaded}


@@ -21,6 +21,12 @@ export default defineConfig(({ mode }) => {
environment: 'jsdom', environment: 'jsdom',
include: ['src/**/*.test.ts'], include: ['src/**/*.test.ts'],
globals: false globals: false
},
resolve: {
// Use Svelte's browser entry under vitest so component tests can
// mount with @testing-library/svelte. The default (server entry)
// throws lifecycle_function_unavailable on mount().
conditions: mode === 'test' ? ['browser'] : []
} }
}; };
}); });

tor/entrypoint.sh Executable file

@@ -0,0 +1,40 @@
#!/bin/sh
# Mangalord wrapper around dockurr/tor's tor binary.
#
# We bypass the image's stock entrypoint for two reasons:
# 1. It generates a `ControlPort 9051` line that binds to localhost
# only (tor's default), but our backend lives in a separate
# container and needs to reach 0.0.0.0:9051.
# 2. It then *skips* writing HashedControlPassword whenever the
# user's torrc declares a ControlPort, so we can't both bind to
# 0.0.0.0 and benefit from its auto-hashing — it's one or the
# other. Doing the hashing ourselves is simpler than threading
# around its logic.
#
# This wrapper hashes $PASSWORD with `tor --hash-password`, appends a
# `HashedControlPassword` line to a writable copy of /etc/tor/torrc,
# then execs tor. The container starts as root (image default); tor
# binds 9050/9051, which don't require root, and running as root is
# acceptable inside a single-purpose container.
set -eu
if [ -z "${PASSWORD:-}" ]; then
echo "ERROR: PASSWORD env must be set (the plain string the backend will" >&2
echo " send as CRAWLER_TOR_CONTROL_PASSWORD)" >&2
exit 1
fi
# `tor --hash-password` prints the hash on the last line of stdout
# (preceded by initialization noise).
HASH=$(tor --hash-password "$PASSWORD" 2>/dev/null | tail -n1)
if [ -z "$HASH" ]; then
echo "ERROR: 'tor --hash-password' produced no output" >&2
exit 1
fi
# /etc/tor/torrc is bind-mounted read-only, so copy + append.
cp /etc/tor/torrc /tmp/torrc
printf '\n# Injected by mangalord-entrypoint.sh from $PASSWORD env.\nHashedControlPassword %s\n' "$HASH" >> /tmp/torrc
exec tor -f /tmp/torrc

tor/torrc Normal file

@@ -0,0 +1,38 @@
# torrc for the Mangalord crawler.
#
# Mounted into the dockurr/tor container at /etc/tor/torrc. The
# crawler talks to this daemon over the internal compose network only:
# `expose:` on the tor service surfaces 9050/9051 to sibling
# containers, never to the host.
# SOCKS5 proxy that reqwest and Chromium use. IsolateDestAddr +
# IsolateDestPort means each new (destination IP, port) draws a fresh
# circuit — so a SIGNAL NEWNYM picks up promptly on the next
# navigation instead of having to wait for an existing dirty circuit
# to age out.
SOCKSPort 0.0.0.0:9050 IsolateDestAddr IsolateDestPort
# Control port for SIGNAL NEWNYM. Our wrapper entrypoint
# (tor/entrypoint.sh) hashes the PASSWORD env var (see
# docker-compose.yml `tor.environment.PASSWORD`) and appends
# `HashedControlPassword <hash>` to a writable copy of this file.
# We just need to declare the port itself here.
ControlPort 0.0.0.0:9051
# Keep circuits dirty for a while so a single chapter (which serial-
# fetches all its images through the same SOCKS endpoint) finishes on
# one circuit rather than mid-circuit-rotating in a way that looks like
# anti-bot evasion to the target. NEWNYM still forces a fresh circuit
# immediately when we want one — this is just the idle-rotation knob.
MaxCircuitDirtiness 600
# Drop privileges to the image's `tor` user after binding ports.
# Required because /var/lib/tor (the image's DataDirectory volume)
# is owned by tor:tor and tor refuses to use a data dir it doesn't
# own. Our entrypoint runs as root only so it can call
# `tor --hash-password` and write /tmp/torrc.
User tor
# Data + logs.
DataDirectory /var/lib/tor
Log notice stdout
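
For orientation, the exchange a client performs on the control port to request a fresh circuit is short. The crawler's own client is not shown in this diff, so the following is a hedged sketch of the command text only; `quoteControlString`, `newnymCommands`, and `isOkReply` are hypothetical names, and opening the TCP connection to port 9051 is left to the caller.

```typescript
// Hypothetical helper: wrap a value as a control-protocol quoted string,
// escaping backslashes and double quotes.
function quoteControlString(s: string): string {
  return '"' + s.replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"';
}

// Hypothetical helper: the three CRLF-terminated commands a client sends
// to authenticate with the plain password and ask for a new identity.
function newnymCommands(password: string): string {
  return [
    `AUTHENTICATE ${quoteControlString(password)}`,
    'SIGNAL NEWNYM',
    'QUIT'
  ]
    .map((line) => line + '\r\n')
    .join('');
}

// Hypothetical helper: tor answers successful commands with a 250 status line.
function isOkReply(line: string): boolean {
  return line.startsWith('250');
}
```

The password sent here is the plain string from the `PASSWORD` env var; tor compares it against the `HashedControlPassword` line that the entrypoint writes. A client would typically check each reply with something like `isOkReply` before sending the next command.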