Compare commits
32 Commits
fix/test-b ... feat/crawl
| SHA1 |
|---|
| e02d125f51 |
| fb4182f68d |
| da6e320836 |
| 832042d2b7 |
| ec0a8f2b5d |
| 6f0a8d88c9 |
| 41bf9455a1 |
| cd0a1e13a9 |
| 3f91bea768 |
| 7a6815661f |
| 679abae736 |
| b812c6d16c |
| e93eec89e5 |
| 8818c890c5 |
| c134bdbbde |
| 5c22dfdb41 |
| e50fc093c3 |
| 72756cfef2 |
| 4e20350645 |
| 713ca139c4 |
| e3cff9d874 |
| d47e832613 |
| c30c7a546f |
| a0db7beb81 |
| ecbbebafc4 |
| 8c6378b877 |
| 8557e432a2 |
| d6d84dedcb |
| d37b94871e |
| 8e39fadd21 |
| 3b3d13a0f6 |
| 0f90af80cb |
41
.env.example
@@ -74,6 +74,10 @@ CRAWLER_DOWNLOAD_ALLOWLIST=
```diff
 CRAWLER_ALLOW_ANY_HOST=false
 # Hard cap on a single image body. Default 32 MiB.
 CRAWLER_MAX_IMAGE_BYTES=33554432
+# Max manga detail fetches per metadata pass (both the in-process daemon
+# and the `bin/crawler` CLI). 0 means no cap — let the source walker run
+# to completion. Useful for capped test runs against a new source.
+CRAWLER_LIMIT=0
 # Path to a system Chromium binary. When set, the crawler skips the
 # bundled-fetcher download. Required on platforms without a usable
 # upstream Chromium build (notably Linux_arm64 / Raspberry Pi). On
```
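`CRAWLER_LIMIT=0` meaning "no cap" maps naturally onto an optional iterator bound. The following is a minimal, std-only Rust sketch of that reading, not the crawler's actual code: `parse_crawler_limit`, `capped` and `limit_from_env` are hypothetical names, and treating an unset or blank variable the same as `0` is an assumption.

```rust
use std::env;
use std::num::ParseIntError;

/// Hypothetical: interpret the raw CRAWLER_LIMIT value.
/// `0` means "no cap"; unset or blank is treated the same way (assumption).
fn parse_crawler_limit(raw: Option<&str>) -> Result<Option<usize>, ParseIntError> {
    match raw.map(str::trim) {
        None | Some("") => Ok(None),
        Some(s) => {
            let n: usize = s.parse()?;
            Ok(if n == 0 { None } else { Some(n) })
        }
    }
}

/// Hypothetical: bound the detail fetches of one metadata pass.
/// `None` lets the source walker run to completion.
fn capped<I: Iterator>(fetches: I, cap: Option<usize>) -> impl Iterator<Item = I::Item> {
    fetches.take(cap.unwrap_or(usize::MAX))
}

/// Read the cap from the environment the way `.env.example` documents it.
fn limit_from_env() -> Result<Option<usize>, ParseIntError> {
    parse_crawler_limit(env::var("CRAWLER_LIMIT").ok().as_deref())
}
```

Rejecting a malformed value (rather than silently falling back to "no cap") keeps a typo from turning a capped test run into a full crawl.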
@@ -83,6 +87,43 @@ CRAWLER_MAX_IMAGE_BYTES=33554432
```diff
 # the image actually contains the binary.
 CRAWLER_CHROMIUM_BINARY=

+# ----- Crawler TOR proxy + recircuit -----
+# The compose stack ships a `tor` service (dockurr/tor) and defaults
+# CRAWLER_PROXY to it, so by default all crawler traffic exits via the
+# TOR network. To opt out, set CRAWLER_PROXY= (empty) AND
+# CRAWLER_TOR_CONTROL_URL= (empty) below — the tor service can stay
+# running, it just won't be used.
+#
+# Going through TOR adds latency to every fetch; image downloads in
+# particular slow noticeably. The win is on sites that rate-limit or
+# fingerprint by exit IP — NEWNYM recirculation makes a fresh exit
+# cheap to reach for.
+#
+# CRAWLER_PROXY: SOCKS5(h) URL. Use `socks5h://` (not `socks5://`) so
+# DNS resolution also goes through TOR, avoiding leaks via the host's
+# resolver. Leave unset to talk to the upstream directly.
+CRAWLER_PROXY=socks5h://tor:9050
+# Control-port URL for SIGNAL NEWNYM ("get a fresh circuit"). Triggered
+# automatically on bad pages (broken-page body, missing #logo) and on
+# the Unauthenticated session probe outcome. Leave unset to disable
+# the recircuit feature (the SOCKS proxy still works).
+CRAWLER_TOR_CONTROL_URL=tcp://tor:9051
+# Max NEWNYM-and-retry cycles per recircuit-eligible failure. Default 3.
+CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS=3
+
+# ----- TOR control-port password -----
+# Shared between the bundled dockurr/tor service (which hashes it into
+# its HashedControlPassword) and the backend's
+# CRAWLER_TOR_CONTROL_PASSWORD. REQUIRED — docker-compose.yml fails
+# fast if absent. Generate a strong random string; rotate by setting
+# a new value and restarting both `tor` and `backend`.
+#
+# Operators running their own non-dockurr tor daemon with cookie-file
+# auth can ignore this var and instead set
+# CRAWLER_TOR_CONTROL_COOKIE_PATH on the backend — the TorController
+# prefers cookie when both are present.
+TOR_CONTROL_PASSWORD=change-me-to-a-strong-random-string
+
 # ----- Frontend -----
 # The frontend container runs SvelteKit's Node adapter on :3000 and
 # proxies /api/* to BACKEND_URL via src/hooks.server.ts. In compose the
```
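The recircuit feature comes down to two control-port commands: `AUTHENTICATE` with the shared password, then `SIGNAL NEWNYM`, each answered by a `250` status line on success. Below is a std-only Rust sketch under stated assumptions: password auth only (the cookie-file path that the comment says `TorController` prefers is not modelled), `CRAWLER_TOR_CONTROL_URL` always has the `tcp://host:port` shape, and `CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS` is read as the number of NEWNYM-and-retry cycles after the first failure. All function names are hypothetical.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Encode a password as a control-protocol QuotedString:
/// wrap in double quotes, backslash-escape `\` and `"`.
fn quote(password: &str) -> String {
    let mut out = String::with_capacity(password.len() + 2);
    out.push('"');
    for ch in password.chars() {
        if ch == '\\' || ch == '"' {
            out.push('\\');
        }
        out.push(ch);
    }
    out.push('"');
    out
}

/// Send one command line (CRLF-terminated) and require a `250` reply.
fn command<R: BufRead, W: Write>(reader: &mut R, writer: &mut W, cmd: &str) -> io::Result<()> {
    writer.write_all(cmd.as_bytes())?;
    writer.write_all(b"\r\n")?;
    writer.flush()?;
    let mut line = String::new();
    reader.read_line(&mut line)?;
    if line.starts_with("250") {
        Ok(())
    } else {
        // Report only the verb so the password never ends up in logs.
        let verb = cmd.split(' ').next().unwrap_or(cmd);
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{verb} rejected: {}", line.trim_end()),
        ))
    }
}

/// Authenticate with the control-port password, then request a new circuit.
fn newnym<R: BufRead, W: Write>(reader: &mut R, writer: &mut W, password: &str) -> io::Result<()> {
    command(reader, writer, &format!("AUTHENTICATE {}", quote(password)))?;
    command(reader, writer, "SIGNAL NEWNYM")
}

/// `tcp://tor:9051` -> `tor:9051`; any other shape is rejected.
fn control_addr(url: &str) -> Option<&str> {
    url.strip_prefix("tcp://").filter(|rest| !rest.is_empty())
}

/// Connect to the control port named by a CRAWLER_TOR_CONTROL_URL-style value.
fn newnym_tcp(control_url: &str, password: &str) -> io::Result<()> {
    let addr = control_addr(control_url).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "expected tcp://host:port")
    })?;
    let stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(Duration::from_secs(10)))?;
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut writer = stream;
    newnym(&mut reader, &mut writer, password)
}

/// Run `attempt`; on failure, request a new circuit and retry, for at most
/// `max_cycles` NEWNYM-and-retry cycles (so up to `max_cycles + 1` attempts).
fn with_recircuit<T, E>(
    max_cycles: u32,
    mut attempt: impl FnMut() -> Result<T, E>,
    mut recircuit: impl FnMut() -> io::Result<()>,
) -> Result<T, E> {
    let mut result = attempt();
    for _ in 0..max_cycles {
        if result.is_ok() || recircuit().is_err() {
            break;
        }
        result = attempt();
    }
    result
}
```

Reading replies from a separate `BufRead` instead of the socket itself keeps the protocol logic testable against in-memory buffers.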
@@ -72,9 +72,17 @@ jobs:
```diff
     runs-on: ubuntu-latest
     needs: [test-backend, test-frontend]
     # PRs only run the test jobs; build + deploy are reserved for
-    # post-merge pushes to main. Without this gate every PR would push
-    # a tagged image to the registry and SSH-deploy to prod.
+    # post-merge pushes to main.
     if: github.event_name != 'pull_request'
+    # Build on the host docker daemon directly (docker-outside-of-docker):
+    # the runner shares the deploy host's daemon, so a plain `docker build`
+    # reuses the host's layer cache and avoids buildx's docker-container
+    # driver + the gha cache exporter — neither works against this single-host
+    # act_runner, and there is no in-job daemon socket unless we mount it.
+    container:
+      image: docker.gitea.com/runner-images:ubuntu-latest
+      volumes:
+        - /var/run/docker.sock:/var/run/docker.sock
     outputs:
       image_tag: ${{ steps.meta.outputs.image_tag }}
       version: ${{ steps.meta.outputs.version }}
```
@@ -93,48 +101,32 @@ jobs:
```diff
           echo "image_tag=${GITHUB_SHA}" >> "$GITHUB_OUTPUT"
           echo "version=${version}" >> "$GITHUB_OUTPUT"

-      - uses: docker/setup-buildx-action@v3
-
-      - name: docker login
-        uses: docker/login-action@v3
-        with:
-          registry: ${{ secrets.REGISTRY_URL }}
-          username: ${{ secrets.REGISTRY_USERNAME }}
-          password: ${{ secrets.REGISTRY_PASSWORD }}
-
-      - name: Build & push backend
-        uses: docker/build-push-action@v5
-        with:
-          context: ./backend
-          push: true
-          tags: |
-            ${{ secrets.REGISTRY_URL }}/mangalord-backend:latest
-            ${{ secrets.REGISTRY_URL }}/mangalord-backend:${{ steps.meta.outputs.image_tag }}
-            ${{ secrets.REGISTRY_URL }}/mangalord-backend:${{ steps.meta.outputs.version }}
-          cache-from: type=gha,scope=backend
-          cache-to: type=gha,mode=max,scope=backend
-
-      - name: Build & push frontend
-        uses: docker/build-push-action@v5
-        with:
-          context: ./frontend
-          push: true
-          tags: |
-            ${{ secrets.REGISTRY_URL }}/mangalord-frontend:latest
-            ${{ secrets.REGISTRY_URL }}/mangalord-frontend:${{ steps.meta.outputs.image_tag }}
-            ${{ secrets.REGISTRY_URL }}/mangalord-frontend:${{ steps.meta.outputs.version }}
-          cache-from: type=gha,scope=frontend
-          cache-to: type=gha,mode=max,scope=frontend
+      - name: Build & push backend + frontend
+        env:
+          REGISTRY_URL: ${{ secrets.REGISTRY_URL }}
+          REGISTRY_USERNAME: ${{ secrets.REGISTRY_USERNAME }}
+          REGISTRY_PASSWORD: ${{ secrets.REGISTRY_PASSWORD }}
+          IMAGE_TAG: ${{ steps.meta.outputs.image_tag }}
+          VERSION: ${{ steps.meta.outputs.version }}
+        run: |
+          set -eu
+          echo "$REGISTRY_PASSWORD" | docker login "$REGISTRY_URL" -u "$REGISTRY_USERNAME" --password-stdin
+          for svc in backend frontend; do
+            img="$REGISTRY_URL/mangalord-$svc"
+            docker build -t "$img:$IMAGE_TAG" -t "$img:latest" -t "$img:$VERSION" "./$svc"
+            for tag in "$IMAGE_TAG" latest "$VERSION"; do docker push "$img:$tag"; done
+          done
+          docker logout "$REGISTRY_URL"

   deploy:
     runs-on: ubuntu-latest
     needs: build-and-push
     if: github.event_name != 'pull_request'
     # Single-host deploy: the runner lives on the same box as the stack, so we
-    # drive the host docker daemon directly (act_runner shares its socket via
-    # `docker_host: "-"`) instead of SSHing out. The compose dir is bind-mounted
-    # at its REAL host path so compose's relative bind-mounts (./mangalord/...,
-    # ./Caddyfile) resolve; this requires `/mnt/ssd/docker-data` in the runner's
+    # drive the host docker daemon directly (the job mounts the host docker
+    # socket) instead of SSHing out. The compose dir is bind-mounted at its
+    # REAL host path so compose's relative bind-mounts (./mangalord/...,
+    # ./Caddyfile) resolve; both paths must be in the runner's
     # container.valid_volumes. The central compose references the images as
     # registry.mc02.dev/mangalord-*:${MANGALORD_TAG:-latest}, so we only pull
     # and recreate the two mangalord services at the freshly built SHA.
```
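The new `run:` step replaces the two `docker/build-push-action` steps with one shell loop, but it still pushes the same three tags per image (`$IMAGE_TAG`, `latest`, `$VERSION`). The Rust model below lists the six references the loop produces. `image_refs` is hypothetical, and `registry.example` in the usage is a placeholder, not the real `REGISTRY_URL`.

```rust
/// Every image reference the `run:` loop pushes: two services x three tags,
/// in the same order as `for svc ...` / `for tag ...` in the workflow.
fn image_refs(registry: &str, image_tag: &str, version: &str) -> Vec<String> {
    let mut refs = Vec::new();
    for svc in ["backend", "frontend"] {
        let img = format!("{registry}/mangalord-{svc}");
        for tag in [image_tag, "latest", version] {
            refs.push(format!("{img}:{tag}"));
        }
    }
    refs
}
```

For example, `image_refs("registry.example", "abc123", "0.54.0")` yields the backend's SHA, `latest` and version tags, followed by the same three for the frontend.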
@@ -142,6 +134,7 @@ jobs:
```diff
       image: docker:cli
       volumes:
         - /mnt/ssd/docker-data:/mnt/ssd/docker-data
+        - /var/run/docker.sock:/var/run/docker.sock
     steps:
       - name: Deploy to the local stack
         working-directory: /mnt/ssd/docker-data
```
2
backend/Cargo.lock
generated
@@ -1470,7 +1470,7 @@ checksum = "c41e0c4fef86961ac6d6f8a82609f55f31b05e4fce149ac5710e439df7619ba4"
```diff

 [[package]]
 name = "mangalord"
-version = "0.45.0"
+version = "0.54.0"
 dependencies = [
  "anyhow",
  "argon2",
```
@@ -1,6 +1,6 @@
```diff
 [package]
 name = "mangalord"
-version = "0.45.0"
+version = "0.54.0"
 edition = "2021"
 default-run = "mangalord"

```
@@ -57,3 +57,13 @@ http-body-util = "0.1"
```diff
 mime = "0.3"
 futures-util = "0.3"
 tokio = { version = "1", features = ["test-util"] }
+
+# Trim debug builds: keep line numbers in panics / backtraces but drop the
+# full DWARF info (variable-level inspection in gdb/lldb). With a sqlx +
+# axum + tokio dep tree the default ("full") leaves backend/target on the
+# order of tens of GiB; this typically cuts ~50–70% off that.
+[profile.dev]
+debug = "line-tables-only"
+
+[profile.test]
+debug = "line-tables-only"
```
18
backend/migrations/0021_chapter_source_index.sql
Normal file
@@ -0,0 +1,18 @@
```sql
-- Capture each chapter's position in the source site's chapter list so
-- the user-facing list can preserve site order: variants of the same
-- chapter number (e.g. "Ch.14 : PH" next to "Ch.14 : Official") stay
-- adjacent, and non-numeric entries like "notice. : Officials" land
-- where the site placed them rather than clustering at the top under
-- number = 0.
--
-- Lower source_index = closer to the top of the source DOM = newer
-- chapter on this site (it renders newest-first). The list query
-- reverses this with ORDER BY source_index DESC so the oldest chapter
-- appears first in our UI.
--
-- NULL is the sentinel for user-uploaded chapters (no source row) and
-- for crawled rows that pre-date this migration. The list query keeps
-- the existing (number, created_at) tiebreak via NULLS LAST so those
-- fall through to the prior behaviour until the next crawler tick
-- populates the column.
ALTER TABLE chapters ADD COLUMN source_index INTEGER;
```
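The migration comment describes the list ordering only in prose. Assuming the list query has the shape `ORDER BY source_index DESC NULLS LAST, number, created_at` (the exact SQL is not part of this diff), the Rust comparator below reproduces it: crawled rows come oldest-first by site position, so two variants of the same chapter number stay adjacent, and NULL rows (uploads, or crawled rows not yet re-indexed) fall through to the `(number, created_at)` tiebreak at the end. `ChapterRow` and `chapter_order` are hypothetical, and `number` is modelled as `f64` to allow fractional chapter numbers.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a chapter row, reduced to the sort keys.
#[derive(Debug)]
struct ChapterRow {
    source_index: Option<i32>,
    number: f64,
    created_at: i64, // e.g. a unix timestamp
}

/// Mirrors `ORDER BY source_index DESC NULLS LAST, number, created_at`.
fn chapter_order(a: &ChapterRow, b: &ChapterRow) -> Ordering {
    let by_index = match (a.source_index, b.source_index) {
        // DESC: a higher index sits lower in the newest-first DOM, i.e. older.
        (Some(x), Some(y)) => y.cmp(&x),
        // NULLS LAST: rows without a site position sort after indexed rows.
        (Some(_), None) => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (None, None) => Ordering::Equal,
    };
    by_index
        .then_with(|| a.number.total_cmp(&b.number))
        .then_with(|| a.created_at.cmp(&b.created_at))
}
```

Usage: `rows.sort_by(chapter_order)` on a `Vec<ChapterRow>`.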
491
backend/src/api/admin/crawler.rs
Normal file
@@ -0,0 +1,491 @@
|
|||||||
|
//! Admin-only crawler observability + control endpoints.
|
||||||
|
//!
|
||||||
|
//! Mounted under `/api/v1/admin/crawler*`, cookie-only via `RequireAdmin`.
|
||||||
|
//! All control endpoints return 503 when the crawler daemon is disabled
|
||||||
|
//! (`AppState.crawler == None`). Reads compose the live in-process status
|
||||||
|
//! ([`crate::crawler::status`]) with DB-derived queue counts and the
|
||||||
|
//! session/browser flags.
|
||||||
|
|
||||||
|
use std::convert::Infallible;
|
||||||
|
use std::time::Duration;
|
||||||
|
|
||||||
|
use axum::extract::{Query, State};
|
||||||
|
use axum::response::sse::{Event, KeepAlive, Sse};
|
||||||
|
use axum::routing::{get, post};
|
||||||
|
use axum::{Json, Router};
|
||||||
|
use futures_util::stream::Stream;
|
||||||
|
use serde::{Deserialize, Serialize};
|
||||||
|
use serde_json::json;
|
||||||
|
use uuid::Uuid;
|
||||||
|
|
||||||
|
use crate::app::{AppState, CrawlerControl};
|
||||||
|
use crate::auth::extractor::RequireAdmin;
|
||||||
|
use crate::crawler::browser_manager::RestartPhase;
|
||||||
|
use crate::crawler::status::{ActiveChapter, CoverTarget, LastPass, Phase};
|
||||||
|
use crate::error::{AppError, AppResult};
|
||||||
|
use crate::repo;
|
||||||
|
use crate::repo::crawler::{ActiveJob, DeadJob, MissingCoverRow, RequeueScope};
|
||||||
|
|
||||||
|
/// Backstop recompose interval for the SSE stream. Phase/worker/session
|
||||||
|
/// changes push instantly via the status `watch`; this only bounds the
|
||||||
|
/// staleness of DB-derived queue counts and the browser phase when those
|
||||||
|
/// change without an accompanying status poke.
|
||||||
|
const SSE_BACKSTOP: Duration = Duration::from_secs(5);
|
||||||
|
|
||||||
|
pub fn routes() -> Router<AppState> {
|
||||||
|
Router::new()
|
||||||
|
.route("/admin/crawler", get(get_status))
|
||||||
|
.route("/admin/crawler/stream", get(stream_status))
|
||||||
|
.route("/admin/crawler/run", post(run_now))
|
||||||
|
.route("/admin/crawler/browser/restart", post(restart_browser))
|
||||||
|
.route("/admin/crawler/session", post(update_session))
|
||||||
|
.route(
|
||||||
|
"/admin/crawler/session/clear-expired",
|
||||||
|
post(clear_session_expired),
|
||||||
|
)
|
||||||
|
.route("/admin/crawler/dead-jobs", get(list_dead_jobs))
|
||||||
|
.route("/admin/crawler/dead-jobs/requeue", post(requeue_dead_jobs))
|
||||||
|
.route("/admin/crawler/active-jobs", get(list_active_jobs))
|
||||||
|
.route("/admin/crawler/covers", get(list_covers))
|
||||||
|
}
|
||||||
|
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
// GET /admin/crawler — live status
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
#[derive(Debug, Serialize)]
|
||||||
|
struct QueueCounts {
|
||||||
|
pending: i64,
|
||||||
|
running: i64,
|
||||||
|
dead: i64,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Serialize)]
|
||||||
|
struct SessionStatus {
|
||||||
|
/// Whether the sticky session-expired flag is set (chapter workers idle).
|
||||||
|
expired: bool,
|
||||||
|
/// Whether a PHPSESSID is currently configured at all.
|
||||||
|
configured: bool,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Serialize)]
|
||||||
|
struct CrawlerStatusResponse {
|
||||||
|
/// `"running"` | `"disabled"`.
|
||||||
|
daemon: &'static str,
|
||||||
|
phase: Option<Phase>,
|
||||||
|
/// Configured chapter-worker count (for "N busy / M workers").
|
||||||
|
worker_count: usize,
|
||||||
|
/// Chapters being crawled right now, with live page counts.
|
||||||
|
active_chapters: Vec<ActiveChapter>,
|
||||||
|
/// The cover being fetched right now, if any.
|
||||||
|
current_cover: Option<CoverTarget>,
|
||||||
|
/// Mangas still queued for a cover fetch.
|
||||||
|
covers_queued: i64,
|
||||||
|
last_pass: LastPass,
|
||||||
|
session: SessionStatus,
|
||||||
|
/// `"healthy"` | `"draining"` | `"restarting"` | `"down"`.
|
||||||
|
browser: &'static str,
|
||||||
|
queue: QueueCounts,
|
||||||
|
}
|
||||||
|
|
||||||
|
fn browser_phase_str(p: RestartPhase) -> &'static str {
|
||||||
|
match p {
|
||||||
|
RestartPhase::Healthy => "healthy",
|
||||||
|
RestartPhase::Draining => "draining",
|
||||||
|
RestartPhase::Restarting => "restarting",
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Compose a full status snapshot from the in-memory status, the
|
||||||
|
/// browser/session flags, and a fresh DB queue-count query. Shared by the
|
||||||
|
/// one-shot `get_status` and the SSE `stream_status`.
|
||||||
|
async fn compose_status(state: &AppState) -> AppResult<CrawlerStatusResponse> {
|
||||||
|
let (pending, running, dead) = repo::crawler::job_state_counts(&state.db).await?;
|
||||||
|
let queue = QueueCounts {
|
||||||
|
pending,
|
||||||
|
running,
|
||||||
|
dead,
|
||||||
|
};
|
||||||
|
let covers_queued = repo::crawler::count_missing_covers(&state.db).await?;
|
||||||
|
|
||||||
|
Ok(match state.crawler.as_ref() {
|
||||||
|
None => CrawlerStatusResponse {
|
||||||
|
daemon: "disabled",
|
||||||
|
phase: None,
|
||||||
|
worker_count: 0,
|
||||||
|
active_chapters: Vec::new(),
|
||||||
|
current_cover: None,
|
||||||
|
covers_queued,
|
||||||
|
last_pass: LastPass::default(),
|
||||||
|
session: SessionStatus {
|
||||||
|
expired: false,
|
||||||
|
configured: false,
|
||||||
|
},
|
||||||
|
browser: "down",
|
||||||
|
queue,
|
||||||
|
},
|
||||||
|
Some(c) => {
|
||||||
|
let snap = c.status.snapshot().await;
|
||||||
|
CrawlerStatusResponse {
|
||||||
|
daemon: "running",
|
||||||
|
phase: Some(snap.phase),
|
||||||
|
worker_count: snap.worker_count,
|
||||||
|
active_chapters: snap.active_chapters,
|
||||||
|
current_cover: snap.current_cover,
|
||||||
|
covers_queued,
|
||||||
|
last_pass: snap.last_pass,
|
||||||
|
session: SessionStatus {
|
||||||
|
expired: c.session.is_expired(),
|
||||||
|
configured: c.session.current().await.is_some(),
|
||||||
|
},
|
||||||
|
browser: browser_phase_str(c.browser_manager.phase()),
|
||||||
|
queue,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn get_status(
|
||||||
|
State(state): State<AppState>,
|
||||||
|
_admin: RequireAdmin,
|
||||||
|
) -> AppResult<Json<CrawlerStatusResponse>> {
|
||||||
|
Ok(Json(compose_status(&state).await?))
|
||||||
|
}
|
||||||
|
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
// GET /admin/crawler/stream — Server-Sent Events live status
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
/// Push live status to the dashboard instead of polling. Emits a snapshot
|
||||||
|
/// immediately on connect, then on every status change (instant, via the
|
||||||
|
/// `watch` notifier) and on a [`SSE_BACKSTOP`] tick (to refresh DB queue
|
||||||
|
/// counts / browser phase that change without a status poke). The browser
|
||||||
|
/// opens this only while the crawler page is mounted and closes it on
|
||||||
|
/// navigate-away, so the subscription is scoped to the active page.
|
||||||
|
async fn stream_status(
|
||||||
|
State(state): State<AppState>,
|
||||||
|
_admin: RequireAdmin,
|
||||||
|
) -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
|
||||||
|
// Subscribe before the first emit so no change between the initial
|
||||||
|
// snapshot and the first await is lost.
|
||||||
|
let rx = state.crawler.as_ref().map(|c| c.status.subscribe());
|
||||||
|
|
||||||
|
let stream = futures_util::stream::unfold(
|
||||||
|
(state, rx, true),
|
||||||
|
|(state, mut rx, first)| async move {
|
||||||
|
// After the first immediate emit, wait for a change or the
|
||||||
|
// backstop tick before recomposing.
|
||||||
|
if !first {
|
||||||
|
match rx.as_mut() {
|
||||||
|
Some(rx) => {
|
||||||
|
tokio::select! {
|
||||||
|
_ = rx.changed() => {}
|
||||||
|
_ = tokio::time::sleep(SSE_BACKSTOP) => {}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
None => tokio::time::sleep(SSE_BACKSTOP).await,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Compose; on a transient DB error, emit a keep-alive comment
|
||||||
|
// rather than tearing down the stream.
|
||||||
|
let event = match compose_status(&state).await {
|
||||||
|
Ok(resp) => Event::default()
|
||||||
|
.event("status")
|
||||||
|
.json_data(&resp)
|
||||||
|
.unwrap_or_else(|_| Event::default().comment("serialize error")),
|
||||||
|
Err(_) => Event::default().comment("status unavailable"),
|
||||||
|
};
|
||||||
|
Some((Ok(event), (state, rx, false)))
|
||||||
|
},
|
||||||
|
);
|
||||||
|
|
||||||
|
Sse::new(stream).keep_alive(KeepAlive::default())
|
||||||
|
}
|
||||||
|
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
// POST /admin/crawler/run — trigger an out-of-cycle metadata pass
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
#[derive(Debug, Serialize)]
|
||||||
|
struct RunResponse {
|
||||||
|
started: bool,
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn run_now(
|
||||||
|
State(state): State<AppState>,
|
||||||
|
admin: RequireAdmin,
|
||||||
|
) -> AppResult<Json<RunResponse>> {
|
||||||
|
let c = require_crawler(&state)?;
|
||||||
|
let mp = c.metadata_pass.as_ref().ok_or_else(|| {
|
||||||
|
AppError::ServiceUnavailable("no source configured (CRAWLER_START_URL unset)".into())
|
||||||
|
})?;
|
||||||
|
let mp = std::sync::Arc::clone(mp);
|
||||||
|
// Fire-and-forget: the pass can run for minutes; the dashboard polls
|
||||||
|
// status for progress. Overlap with the daily cron is rare (daily) and
|
||||||
|
// both serialise on the single browser lease.
|
||||||
|
tokio::spawn(async move {
|
||||||
|
if let Err(e) = mp.run().await {
|
||||||
|
tracing::warn!(error = ?e, "manual metadata pass failed");
|
||||||
|
}
|
||||||
|
});
|
||||||
|
repo::admin_audit::insert(&state.db, admin.0.id, "crawler_run", "crawler", None, json!({}))
|
||||||
|
.await?;
|
||||||
|
Ok(Json(RunResponse { started: true }))
|
||||||
|
}
|
||||||
|
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
// POST /admin/crawler/browser/restart — coordinated restart
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
#[derive(Debug, Serialize)]
|
||||||
|
struct RestartResponse {
|
||||||
|
ok: bool,
|
||||||
|
error: Option<String>,
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn restart_browser(
|
||||||
|
State(state): State<AppState>,
|
||||||
|
admin: RequireAdmin,
|
||||||
|
) -> AppResult<Json<RestartResponse>> {
|
||||||
|
let c = require_crawler(&state)?;
|
||||||
|
let result = c.browser_manager.coordinated_restart(c.drain_deadline).await;
|
||||||
|
// A successful coordinated_restart re-runs on_launch, which re-injects
|
||||||
|
// PHPSESSID and re-probes — i.e. the session is live. Drop the sticky
|
||||||
|
// `session_expired` flag so chapter workers stop idling without
|
||||||
|
// requiring a second click on "Clear expired".
|
||||||
|
if result.is_ok() {
|
||||||
|
c.session.clear_expired();
|
||||||
|
}
|
||||||
|
// Push the post-restart browser phase to live subscribers immediately.
|
||||||
|
c.status.poke();
|
||||||
|
repo::admin_audit::insert(
|
||||||
|
&state.db,
|
||||||
|
admin.0.id,
|
||||||
|
"crawler_browser_restart",
|
||||||
|
"crawler",
|
||||||
|
None,
|
||||||
|
json!({ "ok": result.is_ok() }),
|
||||||
|
)
|
||||||
|
.await?;
|
||||||
|
Ok(Json(match result {
|
||||||
|
Ok(()) => RestartResponse {
|
||||||
|
ok: true,
|
||||||
|
error: None,
|
||||||
|
},
|
||||||
|
Err(e) => RestartResponse {
|
||||||
|
ok: false,
|
||||||
|
error: Some(format!("{e:#}")),
|
||||||
|
},
|
||||||
|
}))
|
||||||
|
}
|
||||||
|
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
// POST /admin/crawler/session — refresh PHPSESSID
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
#[derive(Debug, Deserialize)]
|
||||||
|
struct UpdateSessionRequest {
|
||||||
|
phpsessid: String,
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Serialize)]
|
||||||
|
struct UpdateSessionResponse {
|
||||||
|
/// Whether the post-update browser relaunch + session probe succeeded.
|
||||||
|
valid: bool,
|
||||||
|
error: Option<String>,
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn update_session(
|
||||||
|
State(state): State<AppState>,
|
||||||
|
admin: RequireAdmin,
|
||||||
|
Json(body): Json<UpdateSessionRequest>,
|
||||||
|
) -> AppResult<Json<UpdateSessionResponse>> {
|
||||||
|
let c = require_crawler(&state)?;
|
||||||
|
c.session
|
||||||
|
.update(&body.phpsessid)
|
||||||
|
.await
|
||||||
|
.map_err(|e| AppError::InvalidInput(format!("{e:#}")))?;
|
||||||
|
// Relaunch the browser so on_launch re-injects the new cookie and
|
||||||
|
// re-probes — the restart's success IS the session-validity signal.
|
||||||
|
let probe = c.browser_manager.coordinated_restart(c.drain_deadline).await;
|
||||||
|
// Session + browser state changed — push to live subscribers.
|
||||||
|
c.status.poke();
|
||||||
|
repo::admin_audit::insert(
|
||||||
|
&state.db,
|
||||||
|
admin.0.id,
|
||||||
|
"crawler_session_update",
|
||||||
|
"crawler",
|
||||||
|
None,
|
||||||
|
json!({ "valid": probe.is_ok() }),
|
||||||
|
)
|
||||||
|
.await?;
|
||||||
|
Ok(Json(match probe {
|
||||||
|
Ok(()) => UpdateSessionResponse {
|
||||||
|
valid: true,
|
||||||
|
error: None,
|
||||||
|
},
|
||||||
|
Err(e) => UpdateSessionResponse {
|
||||||
|
valid: false,
|
||||||
|
error: Some(format!("{e:#}")),
|
||||||
|
},
|
||||||
|
}))
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Serialize)]
|
||||||
|
struct ClearExpiredResponse {
|
||||||
|
cleared: bool,
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn clear_session_expired(
|
||||||
|
State(state): State<AppState>,
|
||||||
|
admin: RequireAdmin,
|
||||||
|
) -> AppResult<Json<ClearExpiredResponse>> {
|
||||||
|
let c = require_crawler(&state)?;
|
||||||
|
c.session.clear_expired();
|
||||||
|
// session.expired flipped — push to live subscribers.
|
||||||
|
c.status.poke();
|
||||||
|
repo::admin_audit::insert(
|
||||||
|
&state.db,
|
||||||
|
admin.0.id,
|
||||||
|
"crawler_session_clear_expired",
|
||||||
|
"crawler",
|
||||||
|
None,
|
||||||
|
json!({}),
|
||||||
|
)
|
||||||
|
.await?;
|
||||||
|
Ok(Json(ClearExpiredResponse { cleared: true }))
|
||||||
|
}
|
||||||
|
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
// Dead jobs
|
||||||
|
// ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
#[derive(Debug, Deserialize, Default)]
|
||||||
|
struct DeadJobsParams {
|
||||||
|
#[serde(default)]
|
||||||
|
search: Option<String>,
|
||||||
|
#[serde(default = "default_limit")]
|
||||||
|
limit: i64,
|
||||||
|
#[serde(default)]
|
||||||
|
offset: i64,
|
||||||
|
}
|
||||||
|
|
||||||
|
fn default_limit() -> i64 {
|
||||||
|
50
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn list_dead_jobs(
|
||||||
|
State(state): State<AppState>,
|
||||||
|
_admin: RequireAdmin,
|
||||||
|
Query(params): Query<DeadJobsParams>,
|
||||||
|
) -> AppResult<Json<crate::api::pagination::PagedResponse<DeadJob>>> {
|
||||||
|
let limit = params.limit.clamp(1, 200);
|
||||||
|
let offset = params.offset.max(0);
|
||||||
|
let search = params.search.filter(|s| !s.trim().is_empty());
|
||||||
|
let (items, total) =
|
||||||
|
repo::crawler::list_dead_jobs(&state.db, search.as_deref(), limit, offset).await?;
|
||||||
|
Ok(Json(crate::api::pagination::PagedResponse::with_total(
|
||||||
|
items, limit, offset, total,
|
||||||
|
)))
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Deserialize)]
|
||||||
|
#[serde(tag = "scope", rename_all = "snake_case")]
|
||||||
|
enum RequeueRequest {
|
||||||
|
All,
|
||||||
|
Manga { manga_id: Uuid },
|
||||||
|
Chapter { chapter_id: Uuid },
|
||||||
|
Job { job_id: Uuid },
|
||||||
|
}
|
||||||
|
|
||||||
|
#[derive(Debug, Serialize)]
|
||||||
|
struct RequeueResponse {
|
||||||
|
requeued: u64,
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn requeue_dead_jobs(
|
||||||
|
State(state): State<AppState>,
|
||||||
|
admin: RequireAdmin,
|
||||||
|
Json(body): Json<RequeueRequest>,
|
||||||
|
) -> AppResult<Json<RequeueResponse>> {
|
||||||
|
let scope = match &body {
|
||||||
|
RequeueRequest::All => RequeueScope::All,
|
||||||
|
RequeueRequest::Manga { manga_id } => RequeueScope::Manga(*manga_id),
|
||||||
|
RequeueRequest::Chapter { chapter_id } => RequeueScope::Chapter(*chapter_id),
|
||||||
|
RequeueRequest::Job { job_id } => RequeueScope::Job(*job_id),
|
||||||
|
};
|
||||||
|
```diff
+    let requeued = repo::crawler::requeue_dead_jobs(&state.db, scope).await?;
+    repo::admin_audit::insert(
+        &state.db,
+        admin.0.id,
+        "crawler_dead_jobs_requeue",
+        "crawler",
+        None,
+        json!({ "requeued": requeued, "scope": scope_label(&body) }),
+    )
+    .await?;
+    Ok(Json(RequeueResponse { requeued }))
+}
+
+fn scope_label(r: &RequeueRequest) -> &'static str {
+    match r {
+        RequeueRequest::All => "all",
+        RequeueRequest::Manga { .. } => "manga",
+        RequeueRequest::Chapter { .. } => "chapter",
+        RequeueRequest::Job { .. } => "job",
+    }
+}
+
+// ---------------------------------------------------------------------------
+// Queued-chapters + queued-covers backlogs (paginated, fetched on demand)
+// ---------------------------------------------------------------------------
+
+/// Pagination + title-search params shared by the backlog list endpoints.
+#[derive(Debug, Deserialize, Default)]
+struct ListParams {
+    #[serde(default)]
+    search: Option<String>,
+    #[serde(default = "default_limit")]
+    limit: i64,
+    #[serde(default)]
+    offset: i64,
+}
+
+async fn list_active_jobs(
+    State(state): State<AppState>,
+    _admin: RequireAdmin,
+    Query(params): Query<ListParams>,
+) -> AppResult<Json<crate::api::pagination::PagedResponse<ActiveJob>>> {
+    let limit = params.limit.clamp(1, 200);
+    let offset = params.offset.max(0);
+    let search = params.search.filter(|s| !s.trim().is_empty());
+    let (items, total) =
+        repo::crawler::list_active_jobs(&state.db, search.as_deref(), limit, offset).await?;
+    Ok(Json(crate::api::pagination::PagedResponse::with_total(
+        items, limit, offset, total,
+    )))
+}
+
+async fn list_covers(
+    State(state): State<AppState>,
+    _admin: RequireAdmin,
+    Query(params): Query<ListParams>,
+) -> AppResult<Json<crate::api::pagination::PagedResponse<MissingCoverRow>>> {
+    let limit = params.limit.clamp(1, 200);
+    let offset = params.offset.max(0);
+    let search = params.search.filter(|s| !s.trim().is_empty());
+    let (items, total) =
+        repo::crawler::list_missing_cover_mangas(&state.db, search.as_deref(), limit, offset)
+            .await?;
+    Ok(Json(crate::api::pagination::PagedResponse::with_total(
+        items, limit, offset, total,
+    )))
+}
+
+// ---------------------------------------------------------------------------
+
+fn require_crawler(state: &AppState) -> Result<&std::sync::Arc<CrawlerControl>, AppError> {
+    state.crawler.as_ref().ok_or_else(|| {
+        AppError::ServiceUnavailable("crawler daemon is disabled".into())
+    })
+}
```
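Both list handlers normalise the query string the same way before calling the repo: `limit` is clamped into `1..=200`, a negative `offset` becomes `0`, and a blank or whitespace-only `search` is treated as no filter. A std-only sketch of that normalisation pulled into a free function; `normalize_list_params` is a hypothetical name (not part of the diff), and the serde/axum extraction is left out.

```rust
/// Hypothetical free-function version of the normalisation that
/// `list_active_jobs` and `list_covers` perform inline.
fn normalize_list_params(
    search: Option<String>,
    limit: i64,
    offset: i64,
) -> (Option<String>, i64, i64) {
    // Bound the page size: at least one row, at most 200.
    let limit = limit.clamp(1, 200);
    // A negative offset is treated as the first page.
    let offset = offset.max(0);
    // An empty or whitespace-only search box means "no filter". A non-blank
    // term is passed through untrimmed, as in the handlers.
    let search = search.filter(|s| !s.trim().is_empty());
    (search, limit, offset)
}

fn main() {
    let (search, limit, offset) = normalize_list_params(Some("   ".to_string()), 1_000, -5);
    println!("{search:?} {limit} {offset}");
}
```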
```diff
@@ -4,7 +4,9 @@
 //! bot/API tokens cannot reach admin routes (see
 //! `crate::auth::extractor::RequireAdmin`).
 
+pub mod crawler;
 pub mod mangas;
+pub mod resync;
 pub mod system;
 pub mod users;
 
@@ -16,5 +18,7 @@ pub fn routes() -> Router<AppState> {
     Router::new()
         .merge(users::routes())
         .merge(mangas::routes())
+        .merge(resync::routes())
         .merge(system::routes())
+        .merge(crawler::routes())
 }
```
176
backend/src/api/admin/resync.rs
Normal file

```diff
@@ -0,0 +1,176 @@
+//! Admin-triggered force resync of a single manga's metadata + cover,
+//! or a single chapter's content.
+//!
+//! Both endpoints are admin-only (`RequireAdmin`, cookie-only) and run
+//! synchronously with the request — the response carries the refreshed
+//! resource so the UI can swap it in without a follow-up GET. The work
+//! itself is delegated to [`ResyncService`] (set on AppState by
+//! `app::build` when the crawler daemon is enabled); when the daemon
+//! is disabled, both handlers return 503.
+
+use axum::extract::{Path, State};
+use axum::routing::post;
+use axum::{Json, Router};
+use serde::Serialize;
+use serde_json::json;
+use uuid::Uuid;
+
+use crate::app::AppState;
+use crate::auth::extractor::RequireAdmin;
+use crate::crawler::resync::{ChapterResyncOutcome, ResyncError};
+use crate::domain::manga::MangaDetail;
+use crate::domain::Chapter;
+use crate::error::{AppError, AppResult};
+use crate::repo;
+use crate::repo::crawler::UpsertStatus;
+
+pub fn routes() -> Router<AppState> {
+    Router::new()
+        .route("/admin/mangas/:id/resync", post(resync_manga))
+        .route("/admin/chapters/:id/resync", post(resync_chapter))
+}
+
+#[derive(Debug, Serialize)]
+pub struct MangaResyncResponse {
+    pub manga: MangaDetail,
+    /// `"new" | "updated" | "unchanged"` — mirrors [`UpsertStatus`].
+    pub metadata_status: &'static str,
+    pub cover_fetched: bool,
+}
+
+#[derive(Debug, Serialize)]
+pub struct ChapterResyncResponse {
+    pub chapter: Chapter,
+    /// `"fetched" | "skipped"` — whether new pages landed or the
+    /// service short-circuited (e.g. chapter already had pages and the
+    /// session was lost so force was downgraded).
+    pub outcome: &'static str,
+    /// Page count when `outcome == "fetched"`. `None` for `skipped`.
+    pub pages: Option<usize>,
+}
+
+async fn resync_manga(
+    State(state): State<AppState>,
+    admin: RequireAdmin,
+    Path(manga_id): Path<Uuid>,
+) -> AppResult<Json<MangaResyncResponse>> {
+    if !repo::manga::exists(&state.db, manga_id).await? {
+        return Err(AppError::NotFound);
+    }
+    let resync = state
+        .resync
+        .as_ref()
+        .ok_or_else(|| AppError::ServiceUnavailable(
+            "crawler daemon is disabled; force resync unavailable".into(),
+        ))?;
+
+    let outcome = resync.resync_manga(manga_id).await.map_err(map_resync_err)?;
+
+    // Audit the action with the actor + the resync outcome so an
+    // operator-of-operators can answer "who refetched this manga, and
+    // did the cover land?" from the log alone.
+    repo::admin_audit::insert(
+        &state.db,
+        admin.0.id,
+        "manga_resync",
+        "manga",
+        Some(manga_id),
+        json!({
+            "metadata_status": status_str(outcome.metadata_status),
+            "cover_fetched": outcome.cover_fetched,
+        }),
+    )
+    .await?;
+
+    let manga = repo::manga::get_detail(&state.db, manga_id).await?;
+    Ok(Json(MangaResyncResponse {
+        manga,
+        metadata_status: status_str(outcome.metadata_status),
+        cover_fetched: outcome.cover_fetched,
+    }))
+}
+
+async fn resync_chapter(
+    State(state): State<AppState>,
+    admin: RequireAdmin,
+    Path(chapter_id): Path<Uuid>,
+) -> AppResult<Json<ChapterResyncResponse>> {
+    let resync = state
+        .resync
+        .as_ref()
+        .ok_or_else(|| AppError::ServiceUnavailable(
+            "crawler daemon is disabled; force resync unavailable".into(),
+        ))?;
+
+    // Look up the manga the chapter belongs to so we can return the
+    // refreshed chapter row in the response and 404 for unknown ids.
+    let manga_id: Option<Uuid> =
+        sqlx::query_scalar("SELECT manga_id FROM chapters WHERE id = $1")
+            .bind(chapter_id)
+            .fetch_optional(&state.db)
+            .await?;
+    let Some(manga_id) = manga_id else {
+        return Err(AppError::NotFound);
+    };
+
+    let outcome = resync
+        .resync_chapter(chapter_id)
+        .await
+        .map_err(map_resync_err)?;
+
+    let (outcome_str, pages) = match &outcome {
+        ChapterResyncOutcome::Fetched { pages, .. } => ("fetched", Some(*pages)),
+        ChapterResyncOutcome::Skipped { .. } => ("skipped", None),
+    };
+
+    repo::admin_audit::insert(
+        &state.db,
+        admin.0.id,
+        "chapter_resync",
+        "chapter",
+        Some(chapter_id),
+        json!({
+            "outcome": outcome_str,
+            "pages": pages,
+        }),
+    )
+    .await?;
+
+    let chapter = repo::chapter::find_by_id_in_manga(&state.db, manga_id, chapter_id)
+        .await?
+        .ok_or(AppError::NotFound)?;
+    Ok(Json(ChapterResyncResponse {
+        chapter,
+        outcome: outcome_str,
+        pages,
+    }))
+}
+
+fn status_str(s: UpsertStatus) -> &'static str {
+    match s {
+        UpsertStatus::New => "new",
+        UpsertStatus::Updated => "updated",
+        UpsertStatus::Unchanged => "unchanged",
+    }
+}
+
+/// Map [`ResyncError`] (and the anyhow envelopes wrapping it) onto the
+/// right [`AppError`]. Anything else surfaces as a generic 500 via the
+/// `Other` arm — the operator sees the underlying anyhow chain in
+/// server logs, the client sees a clean envelope.
+fn map_resync_err(err: anyhow::Error) -> AppError {
+    if let Some(rerr) = err.downcast_ref::<ResyncError>() {
+        match rerr {
+            ResyncError::NoMangaSource => AppError::ValidationFailed {
+                message: "manga has no live crawler source — cannot resync".into(),
+                details: json!({ "manga": "no_source" }),
+            },
+            ResyncError::NoChapterSource => AppError::ValidationFailed {
+                message: "chapter has no live crawler source — cannot resync".into(),
+                details: json!({ "chapter": "no_source" }),
+            },
+        }
+    } else {
+        AppError::Other(err)
+    }
+}
```
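`map_resync_err` recovers a typed `ResyncError` from a type-erased `anyhow::Error` with `downcast_ref`, maps the two known variants to a validation failure, and lets everything else fall through to the generic `Other` arm. The sketch below shows the same pattern using only the standard library's `Box<dyn Error + Send + Sync>`; `Mapped` and `map_resync_err_sketch` are hypothetical stand-ins for `AppError` and the real function. One difference worth knowing: anyhow's `downcast_ref` also sees through errors wrapped with `.context(...)`, whereas the std version below only inspects the outermost error type.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for the crate's `ResyncError`; only the variant names come
/// from the diff.
#[derive(Debug)]
enum ResyncError {
    NoMangaSource,
    NoChapterSource,
}

impl fmt::Display for ResyncError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ResyncError::NoMangaSource => f.write_str("manga has no live crawler source"),
            ResyncError::NoChapterSource => f.write_str("chapter has no live crawler source"),
        }
    }
}

impl Error for ResyncError {}

/// Hypothetical, simplified stand-in for `AppError`.
#[derive(Debug, PartialEq)]
enum Mapped {
    /// Known, client-actionable failure; carries the `details` key.
    ValidationFailed(&'static str),
    /// Anything else, surfaced as a generic server error.
    Other(String),
}

fn map_resync_err_sketch(err: Box<dyn Error + Send + Sync>) -> Mapped {
    // Try to recover the concrete error type from the type-erased box.
    match err.downcast_ref::<ResyncError>() {
        Some(ResyncError::NoMangaSource) => Mapped::ValidationFailed("manga"),
        Some(ResyncError::NoChapterSource) => Mapped::ValidationFailed("chapter"),
        None => Mapped::Other(err.to_string()),
    }
}

fn main() {
    println!("{:?}", map_resync_err_sketch(Box::new(ResyncError::NoMangaSource)));
    println!("{:?}", map_resync_err_sketch("browser crashed".into()));
}
```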
```diff
@@ -42,18 +42,22 @@ pub fn routes() -> Router<AppState> {
         .route("/auth/tokens/:id", delete(delete_token))
 }
 
-/// Public, unauthenticated. Exposes anonymous-relevant auth policy
-/// (currently just whether self-registration is open) so the frontend
-/// can render its login / register affordances correctly without a
-/// probe request that would conflate "disabled" with "rate-limited".
+/// Public, unauthenticated. Exposes anonymous-relevant auth policy so
+/// the frontend can render its login / register affordances correctly
+/// without a probe request that would conflate "disabled" with
+/// "rate-limited". `self_register_enabled` is the *effective* value
+/// (`allow_self_register && !private_mode`), so a private-mode
+/// instance reports `false` even if the raw flag is on.
 #[derive(Debug, Serialize)]
 pub struct AuthConfigResponse {
     pub self_register_enabled: bool,
+    pub private_mode: bool,
 }
 
 async fn auth_config(State(state): State<AppState>) -> Json<AuthConfigResponse> {
     Json(AuthConfigResponse {
-        self_register_enabled: state.auth.allow_self_register,
+        self_register_enabled: state.auth.allow_self_register && !state.auth.private_mode,
+        private_mode: state.auth.private_mode,
     })
 }
 
@@ -103,7 +107,10 @@ async fn register(
     // disabled and enabled paths both consume a token, and disabled
     // returns 403 instead of running argon2.
     check_auth_rate_limit(&state, "register")?;
-    if !state.auth.allow_self_register {
+    // Private mode force-blocks self-registration regardless of
+    // ALLOW_SELF_REGISTER — operators of locked-down instances mint
+    // accounts via `POST /admin/users` instead.
+    if !state.auth.allow_self_register || state.auth.private_mode {
         return Err(AppError::Forbidden);
     }
     let username = input.username.trim();
```
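After this change `auth_config` reports `allow_self_register && !private_mode`, and `register` rejects with 403 when `!allow_self_register || private_mode`. By De Morgan's law the second condition is exactly the negation of the first, so the flag the frontend renders and the check the server enforces cannot disagree. A small sketch that checks this over all four flag combinations; `AuthFlags` is a hypothetical stand-in for the relevant fields of the auth config.

```rust
/// Hypothetical stand-in for the two auth-config fields the diff reads.
#[derive(Clone, Copy)]
struct AuthFlags {
    allow_self_register: bool,
    private_mode: bool,
}

/// Value reported as `self_register_enabled` by `auth_config`.
fn self_register_enabled(f: AuthFlags) -> bool {
    f.allow_self_register && !f.private_mode
}

/// Condition under which `register` returns `AppError::Forbidden`.
fn register_forbidden(f: AuthFlags) -> bool {
    !f.allow_self_register || f.private_mode
}

fn main() {
    for allow in [false, true] {
        for private in [false, true] {
            let f = AuthFlags { allow_self_register: allow, private_mode: private };
            println!(
                "allow={allow} private={private} -> enabled={}",
                self_register_enabled(f)
            );
        }
    }
}
```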
```diff
@@ -1,10 +1,12 @@
 use std::sync::Arc;
-use std::sync::atomic::AtomicBool;
+use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
 
 use anyhow::Context;
 use async_trait::async_trait;
-use axum::extract::DefaultBodyLimit;
+use axum::extract::{DefaultBodyLimit, FromRequestParts, Request, State};
 use axum::http::{HeaderName, HeaderValue, Method};
+use axum::middleware::{self, Next};
+use axum::response::Response;
 use axum::Router;
 use sqlx::postgres::PgPoolOptions;
 use sqlx::PgPool;
@@ -12,7 +14,9 @@ use tokio_util::sync::CancellationToken;
 use tower_http::cors::{AllowOrigin, CorsLayer};
 use tower_http::trace::TraceLayer;
 
+use crate::auth::extractor::CurrentUser;
 use crate::auth::rate_limit::AuthRateLimiter;
+use crate::error::AppError;
 use crate::config::{AuthConfig, Config, CrawlerConfig, UploadConfig};
 use crate::crawler::browser_manager::{self, BrowserManager};
 use crate::crawler::content::{self, SyncOutcome};
@@ -20,6 +24,7 @@ use crate::crawler::daemon::{self, ChapterDispatcher, DaemonConfig, MetadataPass
 use crate::crawler::jobs::JobPayload;
 use crate::crawler::pipeline::{self, MetadataStats};
 use crate::crawler::rate_limit::HostRateLimiters;
+use crate::crawler::resync::{RealResyncService, ResyncService};
 use crate::crawler::safety::DownloadAllowlist;
 use crate::crawler::session;
 use crate::repo;
@@ -35,6 +40,30 @@ pub struct AppState {
     /// One instance per AppState so tests stay isolated across the
     /// same process.
     pub auth_limiter: Arc<AuthRateLimiter>,
+    /// Admin-triggered force resync. `None` when the crawler daemon
+    /// is disabled (`CRAWLER_DAEMON=false`); admin handlers gate on
+    /// `.is_some()` and return 503 otherwise. Set by [`build`] from the
+    /// same wiring that builds the daemon's chapter dispatcher, so a
+    /// force resync uses the daemon's BrowserManager + rate limiters.
+    pub resync: Option<Arc<dyn ResyncService>>,
+    /// Crawler observability + control handle (live status, coordinated
+    /// browser restart, runtime session, manual run). `None` when the
+    /// daemon is disabled; admin handlers gate on `.is_some()` → 503.
+    pub crawler: Option<Arc<CrawlerControl>>,
+}
+
+/// Shared handle the admin crawler endpoints use to observe and control
+/// the running daemon. Bundled so the handlers take one optional field on
+/// `AppState` rather than many.
+pub struct CrawlerControl {
+    pub browser_manager: Arc<BrowserManager>,
+    pub session: Arc<crate::crawler::session_control::SessionController>,
+    pub status: crate::crawler::status::StatusHandle,
+    /// Used by the "run metadata pass now" endpoint; `None` when no
+    /// `CRAWLER_START_URL` is configured (cron disabled).
+    pub metadata_pass: Option<Arc<dyn MetadataPass>>,
+    /// Drain budget for a manually-triggered coordinated browser restart.
+    pub drain_deadline: std::time::Duration,
 }
 
 /// Bundle returned by [`build`]. The router is what `axum::serve` consumes;
```
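`AppState` now carries the daemon-backed services as `Option<Arc<...>>`, and handlers turn `None` into a 503 with `as_ref().ok_or_else(...)`, as `require_crawler` and the resync handlers do. A std-only sketch of that gate, assuming a simplified `AppError` with a single `ServiceUnavailable` variant and a toy `ResyncService` trait (both hypothetical here, not the crate's real definitions):

```rust
use std::sync::Arc;

/// Toy stand-in for the crate's `ResyncService` trait (hypothetical).
trait ResyncService: Send + Sync {
    fn name(&self) -> &'static str;
}

struct RealResync;

impl ResyncService for RealResync {
    fn name(&self) -> &'static str {
        "real"
    }
}

/// Simplified error type; the real `AppError` has many more variants.
#[derive(Debug, PartialEq)]
enum AppError {
    ServiceUnavailable(String),
}

struct AppState {
    /// `None` when the crawler daemon is disabled.
    resync: Option<Arc<dyn ResyncService>>,
}

/// Borrow the service if it was wired up, otherwise fail with a 503-style error.
fn require_resync(state: &AppState) -> Result<&Arc<dyn ResyncService>, AppError> {
    state.resync.as_ref().ok_or_else(|| {
        AppError::ServiceUnavailable("crawler daemon is disabled".into())
    })
}

fn main() {
    let svc: Arc<dyn ResyncService> = Arc::new(RealResync);
    let enabled = AppState { resync: Some(svc) };
    let disabled = AppState { resync: None };
    println!("{:?}", require_resync(&enabled).map(|s| s.name()));
    println!("{:?}", require_resync(&disabled).map(|s| s.name()));
}
```

Keeping the whole handle optional means the 503 decision lives in one place per handler instead of in each service method.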
```diff
@@ -69,11 +98,12 @@ pub async fn build(config: Config) -> anyhow::Result<AppHandle> {
 
     let storage: Arc<dyn Storage> = Arc::new(LocalStorage::new(config.storage_dir.clone()));
 
-    let daemon = if config.crawler.daemon_enabled {
-        Some(spawn_crawler_daemon(db.clone(), Arc::clone(&storage), &config.crawler).await?)
+    let (daemon, resync, crawler) = if config.crawler.daemon_enabled {
+        let spawned = spawn_crawler_daemon(db.clone(), Arc::clone(&storage), &config.crawler).await?;
+        (Some(spawned.handle), Some(spawned.resync), Some(spawned.crawler))
     } else {
         tracing::info!("crawler daemon disabled (CRAWLER_DAEMON=false)");
-        None
+        (None, None, None)
     };
 
     let auth_limiter = Arc::new(AuthRateLimiter::new(config.auth.rate_limit));
@@ -83,21 +113,39 @@ pub async fn build(config: Config) -> anyhow::Result<AppHandle> {
         auth: config.auth.clone(),
         upload: config.upload.clone(),
         auth_limiter,
+        resync,
+        crawler,
     };
     let router = router(state).layer(cors_layer(&config.cors_allowed_origins));
     Ok(AppHandle { router, daemon })
 }
 
+/// Bundle returned by [`spawn_crawler_daemon`]. The handle owns the
+/// daemon's tasks; `resync` is the operator-trigger service shared with
+/// `AppState` so admin endpoints can call into the same browser /
+/// rate-limit machinery.
+struct SpawnedDaemon {
+    handle: daemon::DaemonHandle,
+    resync: Arc<dyn ResyncService>,
+    crawler: Arc<CrawlerControl>,
+}
+
 async fn spawn_crawler_daemon(
     db: PgPool,
     storage: Arc<dyn Storage>,
     cfg: &CrawlerConfig,
-) -> anyhow::Result<daemon::DaemonHandle> {
-    // Reqwest client with cookie jar pre-seeded so CDN image fetches
-    // include PHPSESSID. Same shape as bin/crawler.rs main().
+) -> anyhow::Result<SpawnedDaemon> {
+    // Reqwest client with a shared cookie jar so CDN image fetches include
+    // PHPSESSID. The same `Arc<Jar>` is held by the SessionController, so a
+    // runtime session refresh rewrites it in place. Initial value: a
+    // persisted runtime session (survives restart) takes precedence over
+    // CRAWLER_PHPSESSID env.
     let cookie_jar = Arc::new(reqwest::cookie::Jar::default());
+    let initial_sid = crate::crawler::session_control::SessionController::load_persisted(&db)
+        .await
+        .or_else(|| cfg.phpsessid.clone());
     if let (Some(sid), Some(domain), Some(start_url)) =
-        (&cfg.phpsessid, &cfg.cookie_domain, &cfg.start_url)
+        (&initial_sid, &cfg.cookie_domain, &cfg.start_url)
     {
         let cookie_str = format!("PHPSESSID={sid}; Domain={domain}; Path=/");
         let seed_url = reqwest::Url::parse(start_url)
```
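The initial session is resolved once at startup: a persisted runtime session, if any, wins over `CRAWLER_PHPSESSID`, and the cookie jar is seeded only when a session, `cookie_domain`, and `start_url` are all present. The sketch below isolates that decision logic with plain `Option`s; `initial_session` and `seed_cookie` are hypothetical helpers. The real code awaits `SessionController::load_persisted(&db)` and also parses `start_url` into a `reqwest::Url` for seeding; here `start_url` only has to be present.

```rust
/// Persisted runtime session first, then the env-configured one.
fn initial_session(persisted: Option<String>, env_sid: &Option<String>) -> Option<String> {
    // `or_else` only evaluates the fallback when `persisted` is `None`.
    persisted.or_else(|| env_sid.clone())
}

/// Cookie string to seed, or `None` if any required piece is missing.
fn seed_cookie(
    sid: &Option<String>,
    domain: &Option<String>,
    start_url: &Option<String>,
) -> Option<String> {
    match (sid, domain, start_url) {
        (Some(sid), Some(domain), Some(_start_url)) => {
            Some(format!("PHPSESSID={sid}; Domain={domain}; Path=/"))
        }
        _ => None,
    }
}

fn main() {
    let env_sid = Some("from-env".to_string());
    let sid = initial_session(None, &env_sid);
    let cookie = seed_cookie(
        &sid,
        &Some(".example.test".to_string()),
        &Some("https://example.test/".to_string()),
    );
    println!("{cookie:?}");
}
```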
```diff
@@ -107,7 +155,7 @@ async fn spawn_crawler_daemon(
     let mut http_builder = reqwest::Client::builder()
         .timeout(std::time::Duration::from_secs(30))
         .no_proxy()
-        .cookie_provider(cookie_jar);
+        .cookie_provider(Arc::clone(&cookie_jar));
     if let Some(ua) = &cfg.user_agent {
         http_builder = http_builder.user_agent(ua);
     }
@@ -123,27 +171,71 @@ async fn spawn_crawler_daemon(
     }
     let rate = Arc::new(rate);
 
+    let tor = crate::crawler::tor::TorController::from_parts(
+        cfg.tor_control_url.as_deref(),
+        cfg.tor_control_password.as_deref(),
+        cfg.tor_control_cookie_path.as_deref(),
+    )
+    .context("build TorController from CRAWLER_TOR_CONTROL_* env")?
+    .map(Arc::new);
+    if let Some(t) = &tor {
+        tracing::info!(?t, "TOR control configured; transient pages will trigger NEWNYM");
+    }
+    let tor_recircuit_max = cfg.tor_recircuit_max_attempts;
+
+    // Session controller + sticky session-expired flag. Created before the
+    // browser so the on_launch hook can read the *current* session value
+    // (rather than a value captured at startup), and so a runtime refresh
+    // updates the cookie everywhere.
+    let session_expired = Arc::new(AtomicBool::new(false));
+    let session_controller = crate::crawler::session_control::SessionController::new(
+        initial_sid,
+        Arc::clone(&cookie_jar),
+        cfg.cookie_domain.clone(),
+        cfg.start_url.clone(),
+        db.clone(),
+        Arc::clone(&session_expired),
+    );
+
+    // Live status surface, sized to the worker count.
+    let status = crate::crawler::status::StatusHandle::new(cfg.chapter_workers);
+
     // Browser manager. on_launch re-injects PHPSESSID on every fresh
     // chromium spawn so an idle teardown followed by re-launch stays
     // authenticated without operator action.
     let mut launch_opts = cfg.browser.clone();
     if let Some(proxy) = &cfg.proxy {
-        launch_opts.extra_args.push(format!("--proxy-server={proxy}"));
+        let chromium_proxy = crate::crawler::url_utils::chromium_proxy_arg(proxy);
+        launch_opts.extra_args.push(format!("--proxy-server={chromium_proxy}"));
     }
-    let on_launch = match (&cfg.phpsessid, &cfg.cookie_domain, &cfg.start_url) {
-        (Some(sid), Some(domain), Some(start_url)) => {
-            let sid = sid.clone();
+    let on_launch = match (&cfg.cookie_domain, &cfg.start_url) {
+        (Some(domain), Some(start_url)) => {
             let domain = domain.clone();
             let start_url = start_url.clone();
+            let tor_for_launch = tor.as_ref().map(Arc::clone);
+            let sc = Arc::clone(&session_controller);
             let on_launch: browser_manager::OnLaunch = Arc::new(move |browser| {
-                let sid = sid.clone();
                 let domain = domain.clone();
                 let start_url = start_url.clone();
+                let tor_for_launch = tor_for_launch.as_ref().map(Arc::clone);
+                let sc = Arc::clone(&sc);
                 Box::pin(async move {
+                    // Read the *current* session each launch so a runtime
+                    // refresh is picked up on the next (re)launch. No session
+                    // configured → run unauthenticated (metadata needs no auth).
+                    let Some(sid) = sc.current().await else {
+                        tracing::info!("on_launch: no session set — skipping inject + probe");
+                        return Ok(());
+                    };
                     session::inject_phpsessid(&browser, &sid, &domain)
                         .await
                         .context("on_launch: inject_phpsessid")?;
-                    session::verify_session(&browser, &start_url)
+                    session::verify_session_with_recircuit(
+                        &browser,
+                        &start_url,
+                        tor_for_launch.as_deref(),
+                        tor_recircuit_max,
+                    )
                         .await
                         .context("on_launch: verify_session")?;
                     Ok(())
```
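The reworked `on_launch` hook no longer captures a `sid` when it is built; it holds a handle to the session controller and calls `sc.current()` on every browser launch, skipping injection when no session is set. The difference matters because a value captured in a closure is frozen, while a lookup through a shared handle sees later refreshes. The sketch below models that with a synchronous `RwLock` instead of the real async `SessionController::current()`; `SessionHandle` and `make_on_launch` are hypothetical, and the hook returns the cookie string it would inject instead of driving a browser.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, synchronous stand-in for `SessionController`.
#[derive(Default)]
struct SessionHandle {
    sid: RwLock<Option<String>>,
}

impl SessionHandle {
    /// Read whatever session is current *now*.
    fn current(&self) -> Option<String> {
        self.sid.read().unwrap().clone()
    }

    /// Runtime refresh: replaces the session for every holder of the handle.
    fn set(&self, sid: &str) {
        *self.sid.write().unwrap() = Some(sid.to_string());
    }
}

/// Launch hook: returns the cookie it would inject, or `None` to run unauthenticated.
type OnLaunch = Box<dyn Fn() -> Option<String> + Send + Sync>;

fn make_on_launch(sc: Arc<SessionHandle>, domain: String) -> OnLaunch {
    Box::new(move || {
        // Looked up per launch, not captured once at construction time.
        let sid = sc.current()?;
        Some(format!("PHPSESSID={sid}; Domain={domain}; Path=/"))
    })
}

fn main() {
    let sc = Arc::new(SessionHandle::default());
    let on_launch = make_on_launch(Arc::clone(&sc), ".example.test".to_string());
    // No session has been set yet, so the hook skips injection.
    println!("{:?}", on_launch());
    sc.set("abc");
    println!("{:?}", on_launch());
}
```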
```diff
@@ -155,8 +247,6 @@ async fn spawn_crawler_daemon(
     };
     let browser_manager = BrowserManager::new(launch_opts, cfg.idle_timeout, on_launch);
 
-    let session_expired = Arc::new(AtomicBool::new(false));
-
     let metadata_pass: Option<Arc<dyn MetadataPass>> = cfg.start_url.as_ref().map(|url| {
         let m: Arc<dyn MetadataPass> = Arc::new(RealMetadataPass {
             browser_manager: Arc::clone(&browser_manager),
@@ -165,13 +255,32 @@ async fn spawn_crawler_daemon(
             http: http.clone(),
             rate: Arc::clone(&rate),
             start_url: url.clone(),
+            manga_limit: cfg.manga_limit,
             download_allowlist: cfg.download_allowlist.clone(),
             max_image_bytes: cfg.max_image_bytes,
+            metadata_max_consecutive_failures: cfg.metadata_max_consecutive_failures,
+            status: status.clone(),
+            tor: tor.as_ref().map(Arc::clone),
         });
         m
     });
 
     let dispatcher: Arc<dyn ChapterDispatcher> = Arc::new(RealChapterDispatcher {
+        browser_manager: Arc::clone(&browser_manager),
+        db: db.clone(),
+        storage: Arc::clone(&storage),
+        http: http.clone(),
+        rate: Arc::clone(&rate),
+        download_allowlist: cfg.download_allowlist.clone(),
+        max_image_bytes: cfg.max_image_bytes,
+        transient_failures: Arc::new(AtomicU32::new(0)),
+        restart_threshold: cfg.browser_restart_threshold,
+        drain_deadline: cfg.job_timeout,
+        status: status.clone(),
+        tor: tor.as_ref().map(Arc::clone),
+    });
+
+    let resync: Arc<dyn ResyncService> = Arc::new(RealResyncService {
         browser_manager: Arc::clone(&browser_manager),
         db: db.clone(),
         storage: Arc::clone(&storage),
```
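The dispatcher now receives `transient_failures: Arc::new(AtomicU32::new(0))` and `restart_threshold: cfg.browser_restart_threshold`; its field docs describe a count of consecutive transient chapter failures that resets on any success and drives an automatic coordinated browser restart. The diff does not show the counting code itself, so what follows is only one plausible shape for it, built around a hypothetical `FailureTracker`. Because several chapter workers share the counter, the sketch resets it with `compare_exchange` so that at most one caller triggers a restart per reset.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Hypothetical wrapper around the dispatcher's counter + threshold.
struct FailureTracker {
    transient_failures: Arc<AtomicU32>,
    restart_threshold: u32,
}

impl FailureTracker {
    /// Any success ends the streak.
    fn record_success(&self) {
        self.transient_failures.store(0, Ordering::Release);
    }

    /// Records one transient failure; returns `true` if this caller should
    /// trigger the coordinated browser restart.
    fn record_transient_failure(&self) -> bool {
        let n = self.transient_failures.fetch_add(1, Ordering::AcqRel) + 1;
        // A caller at or past the threshold triggers only if its
        // compare_exchange resets the counter, so concurrent workers that
        // cross the threshold together don't each start a restart.
        n >= self.restart_threshold
            && self
                .transient_failures
                .compare_exchange(n, 0, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
    }
}

fn main() {
    let tracker = FailureTracker {
        transient_failures: Arc::new(AtomicU32::new(0)),
        restart_threshold: 3,
    };
    for attempt in 1..=4 {
        let restart = tracker.record_transient_failure();
        println!("failure {attempt}: restart = {restart}");
    }
    tracker.record_success();
}
```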
```diff
@@ -179,6 +288,7 @@ async fn spawn_crawler_daemon(
         rate: Arc::clone(&rate),
         download_allowlist: cfg.download_allowlist.clone(),
         max_image_bytes: cfg.max_image_bytes,
+        tor: tor.as_ref().map(Arc::clone),
     });
 
     // Shared cancellation: daemon shutdown cancels the BrowserManager's
@@ -204,18 +314,32 @@ async fn spawn_crawler_daemon(
         db,
         cancel,
         DaemonConfig {
-            metadata_pass,
+            metadata_pass: metadata_pass.clone(),
             dispatcher,
             chapter_workers: cfg.chapter_workers,
             daily_at: cfg.daily_at,
             tz: cfg.tz,
             retention_days: cfg.retention_days,
             session_expired,
+            status: status.clone(),
+            job_timeout: cfg.job_timeout,
             extra_tasks: vec![reaper_task, shutdown_task],
         },
     );
 
-    Ok(daemon_handle)
+    let crawler = Arc::new(CrawlerControl {
+        browser_manager: Arc::clone(&browser_manager),
+        session: session_controller,
+        status,
+        metadata_pass,
+        drain_deadline: cfg.job_timeout,
+    });
+
+    Ok(SpawnedDaemon {
+        handle: daemon_handle,
+        resync,
+        crawler,
+    })
 }
 
 // Real impls of the daemon traits, owning the browser manager + I/O. Kept
@@ -230,8 +354,12 @@ struct RealMetadataPass {
     http: reqwest::Client,
     rate: Arc<HostRateLimiters>,
     start_url: String,
+    manga_limit: usize,
     download_allowlist: DownloadAllowlist,
     max_image_bytes: usize,
+    metadata_max_consecutive_failures: u32,
+    status: crate::crawler::status::StatusHandle,
+    tor: Option<Arc<crate::crawler::tor::TorController>>,
 }
 
 #[async_trait]
@@ -244,10 +372,13 @@ impl MetadataPass for RealMetadataPass {
             &self.http,
             &self.rate,
             &self.start_url,
-            0,
+            self.manga_limit,
             false,
             &self.download_allowlist,
             self.max_image_bytes,
+            self.metadata_max_consecutive_failures,
+            Some(&self.status),
+            self.tor.as_deref(),
         )
         .await;
         if let Err(e) = &result {
```
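`RealMetadataPass` now forwards `self.manga_limit` where it previously hard-coded `0`. Going by the `CRAWLER_LIMIT` entry in `.env.example`, `0` means no cap and any other value bounds the number of manga detail fetches per pass; that `manga_limit` is populated from `CRAWLER_LIMIT` is an assumption here. A minimal sketch of the "0 means unlimited" convention over a slice; `cap_walk` is hypothetical and stands in for the source walker.

```rust
/// Hypothetical: apply a `CRAWLER_LIMIT`-style cap where 0 means "no cap".
fn cap_walk<T: Clone>(items: &[T], limit: usize) -> Vec<T> {
    // Map the sentinel 0 to "effectively unbounded".
    let cap = if limit == 0 { usize::MAX } else { limit };
    // `take` stops early once `cap` items have been yielded.
    items.iter().take(cap).cloned().collect()
}

fn main() {
    println!("{:?}", cap_walk(&[1, 2, 3], 0));
    println!("{:?}", cap_walk(&[1, 2, 3], 2));
}
```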
```diff
@@ -255,6 +386,38 @@ impl MetadataPass for RealMetadataPass {
                 self.browser_manager.invalidate().await;
             }
         }
+        // Cover backfill follows the metadata pass even when the pass
+        // errored — the early-stop walk can complete its work and bail
+        // late, and a transient browser failure shouldn't cancel the
+        // residual cover backlog. The backfill has its own per-call cap
+        // so a runaway error stream can't monopolise the tick. It sets the
+        // CoverBackfill{index,total} phase + current_cover per entry.
+        match pipeline::backfill_missing_covers(
+            &self.browser_manager,
+            &self.db,
+            self.storage.as_ref(),
+            &self.http,
+            &self.rate,
+            pipeline::COVER_BACKFILL_DEFAULT_MAX,
+            &self.download_allowlist,
+            self.max_image_bytes,
+            Some(&self.status),
+            self.tor.as_deref(),
+        )
+        .await
+        {
+            Ok(stats) => {
+                if stats.considered > 0 {
+                    tracing::info!(?stats, "cover backfill complete");
+                }
+            }
+            Err(e) => {
+                tracing::warn!(error = ?e, "cover backfill failed");
+                if crate::crawler::nav::anyhow_looks_browser_dead(&e) {
+                    self.browser_manager.invalidate().await;
+                }
+            }
+        }
         result
     }
 }
```
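The metadata tick now runs the cover backfill unconditionally after the pass, logs and swallows the backfill's own error, and returns the pass's original `result`. A std-only sketch of that control flow, with closures standing in for the two passes; `run_tick`, `BackfillStats`, and the string-based log are hypothetical simplifications (the real code logs via `tracing` and also invalidates the browser when an error looks browser-dead).

```rust
/// Hypothetical summary returned by the cover backfill.
#[derive(Debug, Default)]
struct BackfillStats {
    considered: usize,
}

/// Runs the primary pass, then always runs the backfill. A backfill error is
/// logged and swallowed; the primary pass's result is what gets returned.
fn run_tick(
    primary: impl FnOnce() -> Result<u32, String>,
    backfill: impl FnOnce() -> Result<BackfillStats, String>,
    log: &mut Vec<String>,
) -> Result<u32, String> {
    let result = primary();
    if let Err(e) = &result {
        log.push(format!("metadata pass failed: {e}"));
    }
    match backfill() {
        Ok(stats) => {
            if stats.considered > 0 {
                log.push(format!("cover backfill complete: {stats:?}"));
            }
        }
        Err(e) => log.push(format!("cover backfill failed: {e}")),
    }
    result
}

fn main() {
    let mut log = Vec::new();
    let result = run_tick(
        || Err("browser died".to_string()),
        || Ok(BackfillStats { considered: 2 }),
        &mut log,
    );
    println!("{result:?} {log:?}");
}
```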
```diff
@@ -267,6 +430,17 @@ struct RealChapterDispatcher {
     rate: Arc<HostRateLimiters>,
     download_allowlist: DownloadAllowlist,
     max_image_bytes: usize,
+    /// Consecutive transient chapter failures; resets on any success.
+    /// Drives the automatic coordinated browser restart.
+    transient_failures: Arc<std::sync::atomic::AtomicU32>,
+    /// Consecutive-failure count that triggers an auto restart.
+    restart_threshold: u32,
+    /// How long a coordinated restart waits for in-flight leases to drain.
+    drain_deadline: std::time::Duration,
+    /// Live status surface — the dispatcher registers each chapter it
+    /// crawls (with a realtime page count) here.
+    status: crate::crawler::status::StatusHandle,
+    tor: Option<Arc<crate::crawler::tor::TorController>>,
 }
 
 #[async_trait]
```
@@ -281,10 +455,21 @@ impl ChapterDispatcher for RealChapterDispatcher {
         let row = repo::chapter::dispatch_target(&self.db, chapter_id)
             .await
             .context("look up chapter for dispatch")?;
-        let Some((manga_id, source_url)) = row else {
+        let Some((manga_id, source_url, manga_title, chapter_number)) = row else {
             // Chapter (or its source row) is gone — ack done.
             return Ok(SyncOutcome::Skipped);
         };
+        // Register the chapter as crawling now (live status). The
+        // guard removes it on every exit path — success, panic, or
+        // the worker's outer-timeout drop.
+        let _active = self.status.begin_chapter(crate::crawler::status::ActiveChapter {
+            manga_id,
+            manga_title,
+            chapter_id,
+            chapter_number,
+            pages_done: 0,
+            pages_total: None,
+        });
         let lease = self.browser_manager.acquire().await?;
         let result = content::sync_chapter_content(
             &lease,
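
The `_active` binding matters: the comment says the status guard removes the chapter on every exit path, which is what Rust drop semantics give a named binding, whereas `let _ = self.status.begin_chapter(..)` would drop the guard at the end of that statement. `StatusHandle` itself is not part of this diff, so the following is a hypothetical std-only registry (`Status`, `ActiveGuard`, and the `u64` ids are all assumptions) showing the drop-guard pattern:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical live-status registry: chapter id -> manga title.
#[derive(Clone, Default)]
pub struct Status {
    active: Arc<Mutex<HashMap<u64, String>>>,
}

/// Drop guard returned by `begin_chapter`; removes its entry when dropped.
pub struct ActiveGuard {
    status: Status,
    chapter_id: u64,
}

impl Status {
    pub fn begin_chapter(&self, chapter_id: u64, manga_title: &str) -> ActiveGuard {
        self.active
            .lock()
            .unwrap()
            .insert(chapter_id, manga_title.to_string());
        ActiveGuard { status: self.clone(), chapter_id }
    }

    pub fn active_count(&self) -> usize {
        self.active.lock().unwrap().len()
    }
}

impl Drop for ActiveGuard {
    fn drop(&mut self) {
        // Runs on normal return, during panic unwinding, and when a future
        // holding the guard is dropped (for example by an outer timeout).
        // Ignore poisoning so cleanup still happens after an unrelated panic.
        let mut map = self.status.active.lock().unwrap_or_else(|p| p.into_inner());
        map.remove(&self.chapter_id);
    }
}
```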
@@ -298,14 +483,38 @@ impl ChapterDispatcher for RealChapterDispatcher {
             false,
             &self.download_allowlist,
             self.max_image_bytes,
+            self.tor.as_deref(),
+            Some(&self.status),
         )
         .await;
         drop(lease);
         match result {
-            Ok(outcome) => Ok(outcome),
+            Ok(outcome) => {
+                // Any successful dispatch (including a clean Skipped)
+                // means the browser is healthy — reset the streak.
+                self.transient_failures.store(0, Ordering::Release);
+                Ok(outcome)
+            }
             Err(e) => {
+                let streak = self.transient_failures.fetch_add(1, Ordering::AcqRel) + 1;
                 if crate::crawler::nav::anyhow_looks_browser_dead(&e) {
+                    // Hard browser-dead: lazy invalidate (next acquire
+                    // relaunches). Reset the streak — we're recovering.
                     self.browser_manager.invalidate().await;
+                    self.transient_failures.store(0, Ordering::Release);
+                } else if self.restart_threshold > 0 && streak >= self.restart_threshold {
+                    // Persistent transients that TOR recircuit couldn't
+                    // fix — proactively restart Chromium.
+                    tracing::warn!(
+                        streak,
+                        threshold = self.restart_threshold,
+                        "auto browser restart: consecutive transient chapter failures"
+                    );
+                    let _ = self
+                        .browser_manager
+                        .coordinated_restart(self.drain_deadline)
+                        .await;
+                    self.transient_failures.store(0, Ordering::Release);
                 }
                 Err(e)
             }
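
The error arm keeps a consecutive-failure streak: any success resets it, a browser-dead error invalidates and resets it, and otherwise a coordinated restart fires once the streak reaches `restart_threshold` (`0` disables the check). The decision can be modeled without the browser; the `Action` enum and `Streak` type below are hypothetical stand-ins for the `invalidate()` and `coordinated_restart()` side effects:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical action the dispatcher would take after one failed chapter.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    None,
    Invalidate,
    CoordinatedRestart,
}

pub struct Streak {
    failures: AtomicU32,
    restart_threshold: u32,
}

impl Streak {
    pub fn new(restart_threshold: u32) -> Self {
        Self { failures: AtomicU32::new(0), restart_threshold }
    }

    /// Any successful dispatch resets the streak.
    pub fn on_success(&self) {
        self.failures.store(0, Ordering::Release);
    }

    /// Record a failure; `browser_dead` mirrors `anyhow_looks_browser_dead`.
    pub fn on_failure(&self, browser_dead: bool) -> Action {
        // fetch_add returns the previous value, so +1 is this failure's count.
        let streak = self.failures.fetch_add(1, Ordering::AcqRel) + 1;
        if browser_dead {
            self.failures.store(0, Ordering::Release);
            Action::Invalidate
        } else if self.restart_threshold > 0 && streak >= self.restart_threshold {
            self.failures.store(0, Ordering::Release);
            Action::CoordinatedRestart
        } else {
            Action::None
        }
    }
}
```

Because `fetch_add` and the later `store(0)` are separate operations, two workers failing concurrently can both observe a streak at or above the threshold. The dispatcher tolerates this because `coordinated_restart` is documented to collapse concurrent calls into one relaunch.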
@@ -325,11 +534,62 @@ pub fn router(state: AppState) -> Router {
     let max_request_bytes = state.upload.max_request_bytes;
     Router::new()
         .nest("/api/v1", crate::api::routes())
+        .layer(middleware::from_fn_with_state(
+            state.clone(),
+            private_mode_guard,
+        ))
         .layer(DefaultBodyLimit::max(max_request_bytes))
         .with_state(state)
         .layer(TraceLayer::new_for_http())
 }
 
+/// Paths reachable anonymously even when `PRIVATE_MODE=true`. Login and
+/// logout are needed for the auth flow itself; `/health` is reserved
+/// for load-balancer probes; `/auth/config` lets the frontend decide
+/// whether to render the login form or its anonymous alternatives;
+/// `/auth/register` is exempted from the gate so the handler can
+/// return its informative `registration_disabled` 403 (the same code
+/// public-mode deployments use when `ALLOW_SELF_REGISTER=false`) —
+/// the handler itself force-blocks the request body in private mode,
+/// so no account ever gets created here. Everything else demands a
+/// valid session cookie or bearer token.
+fn is_public_in_private_mode(path: &str) -> bool {
+    matches!(
+        path,
+        "/api/v1/health"
+            | "/api/v1/auth/config"
+            | "/api/v1/auth/login"
+            | "/api/v1/auth/logout"
+            | "/api/v1/auth/register"
+    )
+}
+
+/// Site-wide auth gate for `PRIVATE_MODE=true`. With the flag off this
+/// is a no-op pass-through, so public deployments take no extra DB
+/// hit. With it on, the guard reuses [`CurrentUser`] — the same
+/// session-cookie-then-bearer-token logic the per-handler extractor
+/// uses — so the two paths can never drift.
+async fn private_mode_guard(
+    State(state): State<AppState>,
+    req: Request,
+    next: Next,
+) -> Result<Response, AppError> {
+    if !state.auth.private_mode {
+        return Ok(next.run(req).await);
+    }
+    if is_public_in_private_mode(req.uri().path()) {
+        return Ok(next.run(req).await);
+    }
+    let (mut parts, body) = req.into_parts();
+    match CurrentUser::from_request_parts(&mut parts, &state).await {
+        Ok(_) => {
+            let req = Request::from_parts(parts, body);
+            Ok(next.run(req).await)
+        }
+        Err(_) => Err(AppError::Unauthenticated),
+    }
+}
+
 pub(crate) fn cors_layer(allowed_origins: &[String]) -> CorsLayer {
     if allowed_origins.is_empty() {
         // Same-origin only — no CORS headers emitted.

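
`private_mode_guard` reduces to three inputs: the `PRIVATE_MODE` flag, the exact-match path allowlist, and whether `CurrentUser` extraction succeeds. A framework-free sketch of that decision (the `Gate` enum and `gate` function are hypothetical; `authenticated` stands in for the extractor result):

```rust
/// Hypothetical outcome of the private-mode gate.
#[derive(Debug, PartialEq, Eq)]
pub enum Gate {
    Pass,
    Unauthenticated,
}

/// Same allowlist as `is_public_in_private_mode`: exact string matches only.
fn is_public_in_private_mode(path: &str) -> bool {
    matches!(
        path,
        "/api/v1/health"
            | "/api/v1/auth/config"
            | "/api/v1/auth/login"
            | "/api/v1/auth/logout"
            | "/api/v1/auth/register"
    )
}

/// In the real guard the session lookup only happens when the first two
/// checks fail; here `authenticated` is simply precomputed.
pub fn gate(private_mode: bool, path: &str, authenticated: bool) -> Gate {
    if !private_mode || is_public_in_private_mode(path) || authenticated {
        Gate::Pass
    } else {
        Gate::Unauthenticated
    }
}
```

Two details follow from the code: matching is on exact strings, so `/api/v1/health/` with a trailing slash is not public; and the `AuthConfig::private_mode` doc comment lists four allowlisted paths, whereas `is_public_in_private_mode` also admits `/api/v1/auth/register`.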
@@ -78,6 +78,21 @@ async fn main() -> anyhow::Result<()> {
     let proxy_url = std::env::var("CRAWLER_PROXY")
         .ok()
         .filter(|s| !s.trim().is_empty());
+    let tor_control_url = std::env::var("CRAWLER_TOR_CONTROL_URL")
+        .ok()
+        .filter(|s| !s.trim().is_empty());
+    let tor_control_password = std::env::var("CRAWLER_TOR_CONTROL_PASSWORD")
+        .ok()
+        .filter(|s| !s.trim().is_empty());
+    let tor_control_cookie_path = std::env::var("CRAWLER_TOR_CONTROL_COOKIE_PATH")
+        .ok()
+        .filter(|s| !s.trim().is_empty())
+        .map(std::path::PathBuf::from);
+    let tor_recircuit_max_attempts: u32 = std::env::var("CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS")
+        .ok()
+        .and_then(|s| s.parse().ok())
+        .unwrap_or(3)
+        .max(1);
     let keep_browser_open = env_bool("CRAWLER_KEEP_BROWSER_OPEN", false);
 
     let db = PgPoolOptions::new()
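
The new TOR variables reuse the `CRAWLER_PROXY` rules: unset and whitespace-only values both become `None`, and `CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS` falls back to 3 when unset or unparsable, then is clamped to at least 1. Here they are modeled as pure functions over `Option<&str>` (the names `non_blank` and `recircuit_attempts` are hypothetical), so the rules can be checked without mutating the process environment:

```rust
/// Mirrors `std::env::var(..).ok().filter(|s| !s.trim().is_empty())`:
/// unset or whitespace-only values become `None`; others are kept as-is.
fn non_blank(raw: Option<&str>) -> Option<String> {
    raw.filter(|s| !s.trim().is_empty()).map(String::from)
}

/// Mirrors the `CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS` parse: default 3 when
/// unset or unparsable as `u32`, never below 1.
fn recircuit_attempts(raw: Option<&str>) -> u32 {
    raw.and_then(|s| s.parse().ok()).unwrap_or(3).max(1)
}
```

Note that `filter` only tests the trimmed value; a non-blank value is returned untrimmed, and `" 5"` fails the `u32` parse and falls back to 3.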
@@ -112,7 +127,8 @@ async fn main() -> anyhow::Result<()> {
 
     let mut options = LaunchOptions::from_env();
     if let Some(proxy) = &proxy_url {
-        options.extra_args.push(format!("--proxy-server={proxy}"));
+        let chromium_proxy = mangalord::crawler::url_utils::chromium_proxy_arg(proxy);
+        options.extra_args.push(format!("--proxy-server={chromium_proxy}"));
     }
     let keep_open = match (keep_browser_open, options.mode) {
         (true, BrowserMode::Headed) => true,
@@ -144,6 +160,17 @@ async fn main() -> anyhow::Result<()> {
         "starting crawler"
     );
 
+    let tor = mangalord::crawler::tor::TorController::from_parts(
+        tor_control_url.as_deref(),
+        tor_control_password.as_deref(),
+        tor_control_cookie_path.as_deref(),
+    )
+    .context("build TorController from CRAWLER_TOR_CONTROL_* env")?
+    .map(Arc::new);
+    if let Some(t) = &tor {
+        tracing::info!(?t, "TOR control configured");
+    }
+
     // BrowserManager with idle_timeout = ZERO so the CLI keeps Chromium
     // alive for the entire run — same lifecycle as the old direct
     // `browser::launch()` flow. on_launch re-injects PHPSESSID + runs the
@@ -153,15 +180,22 @@ async fn main() -> anyhow::Result<()> {
         let sid = sid.clone();
         let domain = domain.clone();
         let start_url_clone = start_url.clone();
+        let tor_for_launch = tor.as_ref().map(Arc::clone);
         Arc::new(move |browser| {
             let sid = sid.clone();
             let domain = domain.clone();
             let start_url = start_url_clone.clone();
+            let tor_for_launch = tor_for_launch.as_ref().map(Arc::clone);
             Box::pin(async move {
                 session::inject_phpsessid(&browser, &sid, &domain)
                     .await
                     .context("inject_phpsessid")?;
-                session::verify_session(&browser, &start_url)
+                session::verify_session_with_recircuit(
+                    &browser,
+                    &start_url,
+                    tor_for_launch.as_deref(),
+                    tor_recircuit_max_attempts,
+                )
                 .await
                 .context("verify_session")?;
                 Ok(())
@@ -187,6 +221,7 @@ async fn main() -> anyhow::Result<()> {
         skip_chapter_content || !session_ready,
         chapter_workers,
         force_refetch_chapters,
+        tor.clone(),
     )
     .await;
 
@@ -216,6 +251,7 @@ async fn run(
     skip_chapter_content: bool,
     chapter_workers: usize,
     force_refetch_chapters: bool,
+    tor: Option<Arc<mangalord::crawler::tor::TorController>>,
 ) -> anyhow::Result<()> {
     let mut rate = HostRateLimiters::new(Duration::from_millis(rate_ms));
     if let Some(host) = cdn_host {
@@ -267,6 +303,12 @@ async fn run(
             skip_chapters,
             allowlist.as_ref(),
             max_image_bytes,
+            // Circuit-breaker disabled for the operator-driven CLI: a manual
+            // sweep should push through transient failures, not self-abort.
+            0,
+            // No live status surface for the one-shot CLI.
+            None,
+            tor.as_deref(),
         )
         .await?;
         tracing::info!(?stats, "metadata pass complete");
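
The CLI passes `0` as the consecutive-failure limit, and its comment describes that as disabling the circuit breaker; the daemon instead passes `metadata_max_consecutive_failures` (default 10). Assuming the pass interprets the value that way, the breaker can be sketched as a loop that counts consecutive failures, resets on success, and stops once the limit is reached. `run_pass`, `fetch`, and `PassOutcome` are hypothetical:

```rust
/// Hypothetical summary of one metadata pass.
#[derive(Debug, PartialEq, Eq)]
pub struct PassOutcome {
    pub fetched: usize,
    pub failed: usize,
    /// `true` when the breaker tripped (no clean exit recorded).
    pub aborted: bool,
}

/// Walk `ids`, calling `fetch` for each; abort after `max_consecutive`
/// failures in a row. `max_consecutive == 0` disables the breaker.
pub fn run_pass<F>(ids: &[u32], max_consecutive: u32, mut fetch: F) -> PassOutcome
where
    F: FnMut(u32) -> Result<(), String>,
{
    let mut out = PassOutcome { fetched: 0, failed: 0, aborted: false };
    let mut streak = 0u32;
    for &id in ids {
        match fetch(id) {
            Ok(()) => {
                out.fetched += 1;
                streak = 0; // any success resets the run of failures
            }
            Err(_) => {
                out.failed += 1;
                streak += 1;
                if max_consecutive > 0 && streak >= max_consecutive {
                    out.aborted = true;
                    break;
                }
            }
        }
    }
    out
}
```

An `aborted: true` result corresponds to the config comment's "does NOT mark a clean exit", which makes the next tick run a recovery sweep.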
@@ -283,6 +325,7 @@ async fn run(
             force_refetch_chapters,
             Arc::clone(&allowlist),
             max_image_bytes,
+            tor.clone(),
         )
         .await?;
     }
@@ -308,6 +351,7 @@ async fn sync_bookmarked_chapter_content(
     force_refetch: bool,
     allowlist: Arc<mangalord::crawler::safety::DownloadAllowlist>,
     max_image_bytes: usize,
+    tor: Option<Arc<mangalord::crawler::tor::TorController>>,
 ) -> anyhow::Result<()> {
     let pending: Vec<(Uuid, Uuid, String)> = sqlx::query_as(
         r#"
@@ -345,6 +389,7 @@ async fn sync_bookmarked_chapter_content(
             let rate = Arc::clone(&rate);
             let manager = Arc::clone(&manager);
             let allowlist = Arc::clone(&allowlist);
+            let tor = tor.clone();
             let stats = &stats;
             async move {
                 if session_expired.load(std::sync::atomic::Ordering::Relaxed) {
@@ -371,6 +416,9 @@ async fn sync_bookmarked_chapter_content(
                     force_refetch,
                     allowlist.as_ref(),
                     max_image_bytes,
+                    tor.as_deref(),
+                    // CLI one-shot — no live status surface.
+                    None,
                 )
                 .await;
                 drop(lease);

@@ -19,6 +19,14 @@ pub struct AuthConfig {
     /// `POST /admin/users`. Defaults to `true` (open registration)
     /// for backward compatibility.
     pub allow_self_register: bool,
+    /// When `true`, every API path except a small allowlist
+    /// (`/health`, `/auth/config`, `/auth/login`, `/auth/logout`)
+    /// requires a valid session cookie or bearer token — anonymous
+    /// reads are rejected with 401. Self-registration is also
+    /// force-disabled regardless of [`Self::allow_self_register`]
+    /// so a private instance is locked down with a single switch.
+    /// Defaults to `false` (current public behaviour).
+    pub private_mode: bool,
 }
 
 impl Default for AuthConfig {
@@ -33,6 +41,7 @@ impl Default for AuthConfig {
             // defaults.
             rate_limit: crate::auth::rate_limit::RateLimitConfig::default(),
             allow_self_register: true,
+            private_mode: false,
         }
     }
 }
@@ -97,6 +106,20 @@ pub struct CrawlerConfig {
     pub cookie_domain: Option<String>,
     pub user_agent: Option<String>,
     pub proxy: Option<String>,
+    /// `tcp://host:port`, `host:port`, or bare `host` (default port
+    /// 9051). When `None`, TOR-recircuit-on-transient is disabled and
+    /// the crawler behaves identically to pre-TOR releases.
+    pub tor_control_url: Option<String>,
+    /// HashedControlPassword auth. Used only when
+    /// `tor_control_cookie_path` is `None`.
+    pub tor_control_password: Option<String>,
+    /// Cookie-file auth path (e.g.
+    /// `/var/lib/tor/control_auth_cookie`). Takes precedence over
+    /// password when both are set.
+    pub tor_control_cookie_path: Option<PathBuf>,
+    /// Maximum NEWNYM-and-retry cycles per recircuit-eligible failure.
+    /// Defaults to 3.
+    pub tor_recircuit_max_attempts: u32,
     pub browser: LaunchOptions,
     /// Hosts the crawler is allowed to download images / covers from.
     /// Always seeded with the host of `start_url` and (when set) the
@@ -105,6 +128,23 @@ pub struct CrawlerConfig {
     pub download_allowlist: DownloadAllowlist,
     /// Hard upper bound on a single image download. Defaults to 32 MiB.
     pub max_image_bytes: usize,
+    /// Max manga detail fetches per metadata pass. `0` means no cap
+    /// (full sweep up to the source's own bound). Sourced from
+    /// `CRAWLER_LIMIT`, mirroring the CLI binary.
+    pub manga_limit: usize,
+    /// Hard upper bound on a single chapter-content job dispatch. A job
+    /// exceeding this is acked failed (exponential backoff) instead of
+    /// wedging a worker. Defaults to 600s. `CRAWLER_JOB_TIMEOUT_SECS`.
+    pub job_timeout: Duration,
+    /// Consecutive `fetch_manga` failures that abort a metadata pass
+    /// (circuit-breaker for a source outage). The pass does NOT mark a
+    /// clean exit, so the next tick does a recovery sweep. Defaults to
+    /// 10. `CRAWLER_METADATA_MAX_CONSECUTIVE_FAILURES`.
+    pub metadata_max_consecutive_failures: u32,
+    /// Consecutive transient chapter failures (after TOR recircuit is
+    /// exhausted) that trigger an automatic coordinated browser restart.
+    /// Defaults to 3. `CRAWLER_BROWSER_RESTART_THRESHOLD`.
+    pub browser_restart_threshold: u32,
 }
 
 impl Default for CrawlerConfig {
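
`manga_limit` uses `0` for "no cap". One way to honor such a value against a lazily walked source is to turn it into an `Option` and apply `Iterator::take` only when a cap is set; `cap` and `walk_limited` are hypothetical helpers, and the real walker may enforce the limit differently:

```rust
/// Convert a `CRAWLER_LIMIT`-style value (`0` = unlimited) into an optional cap.
fn cap(limit: usize) -> Option<usize> {
    (limit > 0).then_some(limit)
}

/// Collect at most `limit` items from a lazily walked source (`0` = all).
fn walk_limited<I: Iterator>(source: I, limit: usize) -> Vec<I::Item> {
    match cap(limit) {
        Some(n) => source.take(n).collect(),
        None => source.collect(),
    }
}
```

Since `take` pulls lazily, a walker that fetches one manga detail page per `next()` call would stop after `n` fetches rather than fetching everything and truncating.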
@@ -124,9 +164,17 @@ impl Default for CrawlerConfig {
             cookie_domain: None,
             user_agent: None,
             proxy: None,
+            tor_control_url: None,
+            tor_control_password: None,
+            tor_control_cookie_path: None,
+            tor_recircuit_max_attempts: 3,
             browser: LaunchOptions::headless(),
             download_allowlist: DownloadAllowlist::new(),
             max_image_bytes: DEFAULT_MAX_IMAGE_BYTES,
+            manga_limit: 0,
+            job_timeout: Duration::from_secs(600),
+            metadata_max_consecutive_failures: 10,
+            browser_restart_threshold: 3,
         }
     }
 }
@@ -158,6 +206,7 @@ impl Config {
                 ) as u32,
             },
             allow_self_register: env_bool("ALLOW_SELF_REGISTER", true),
+            private_mode: env_bool("PRIVATE_MODE", false),
         },
         upload: UploadConfig {
             max_request_bytes: env_usize("MAX_REQUEST_BYTES", 200 * 1024 * 1024),
@@ -234,9 +283,29 @@ impl CrawlerConfig {
             proxy: std::env::var("CRAWLER_PROXY")
                 .ok()
                 .filter(|s| !s.trim().is_empty()),
+            tor_control_url: std::env::var("CRAWLER_TOR_CONTROL_URL")
+                .ok()
+                .filter(|s| !s.trim().is_empty()),
+            tor_control_password: std::env::var("CRAWLER_TOR_CONTROL_PASSWORD")
+                .ok()
+                .filter(|s| !s.trim().is_empty()),
+            tor_control_cookie_path: std::env::var("CRAWLER_TOR_CONTROL_COOKIE_PATH")
+                .ok()
+                .filter(|s| !s.trim().is_empty())
+                .map(PathBuf::from),
+            tor_recircuit_max_attempts: env_u64("CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS", 3)
+                .max(1) as u32,
             browser: LaunchOptions::from_env(),
             download_allowlist,
             max_image_bytes: env_usize("CRAWLER_MAX_IMAGE_BYTES", DEFAULT_MAX_IMAGE_BYTES),
+            manga_limit: env_usize("CRAWLER_LIMIT", 0),
+            job_timeout: Duration::from_secs(env_u64("CRAWLER_JOB_TIMEOUT_SECS", 600).max(1)),
+            metadata_max_consecutive_failures: env_u64(
+                "CRAWLER_METADATA_MAX_CONSECUTIVE_FAILURES",
+                10,
+            ) as u32,
+            browser_restart_threshold: env_u64("CRAWLER_BROWSER_RESTART_THRESHOLD", 3).max(1)
+                as u32,
         })
     }
 }
@@ -310,3 +379,91 @@ fn env_usize(name: &str, default: usize) -> usize {
         .unwrap_or(default)
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::sync::Mutex;
+
+    // Serialise env-touching tests so concurrent cargo-test threads don't
+    // race on the process-global env. Re-acquire on poison since a
+    // panicking test still leaves the env in a consistent state for us
+    // (we set/unset within each guard region).
+    static ENV_GUARD: Mutex<()> = Mutex::new(());
+
+    #[test]
+    fn crawler_limit_env_populates_manga_limit() {
+        let _g = ENV_GUARD.lock().unwrap_or_else(|p| p.into_inner());
+        std::env::set_var("CRAWLER_LIMIT", "96");
+        let cfg = CrawlerConfig::from_env().expect("from_env");
+        std::env::remove_var("CRAWLER_LIMIT");
+        assert_eq!(cfg.manga_limit, 96);
+    }
+
+    #[test]
+    fn crawler_limit_unset_defaults_to_zero() {
+        let _g = ENV_GUARD.lock().unwrap_or_else(|p| p.into_inner());
+        std::env::remove_var("CRAWLER_LIMIT");
+        let cfg = CrawlerConfig::from_env().expect("from_env");
+        assert_eq!(cfg.manga_limit, 0);
+    }
+
+    #[test]
+    fn reliability_knobs_default_when_unset() {
+        let _g = ENV_GUARD.lock().unwrap_or_else(|p| p.into_inner());
+        std::env::remove_var("CRAWLER_JOB_TIMEOUT_SECS");
+        std::env::remove_var("CRAWLER_METADATA_MAX_CONSECUTIVE_FAILURES");
+        std::env::remove_var("CRAWLER_BROWSER_RESTART_THRESHOLD");
+        let cfg = CrawlerConfig::from_env().expect("from_env");
+        assert_eq!(cfg.job_timeout, Duration::from_secs(600));
+        assert_eq!(cfg.metadata_max_consecutive_failures, 10);
+        assert_eq!(cfg.browser_restart_threshold, 3);
+    }
+
+    #[test]
+    fn reliability_knobs_parse_from_env() {
+        let _g = ENV_GUARD.lock().unwrap_or_else(|p| p.into_inner());
+        std::env::set_var("CRAWLER_JOB_TIMEOUT_SECS", "120");
+        std::env::set_var("CRAWLER_METADATA_MAX_CONSECUTIVE_FAILURES", "5");
+        std::env::set_var("CRAWLER_BROWSER_RESTART_THRESHOLD", "7");
+        let cfg = CrawlerConfig::from_env().expect("from_env");
+        std::env::remove_var("CRAWLER_JOB_TIMEOUT_SECS");
+        std::env::remove_var("CRAWLER_METADATA_MAX_CONSECUTIVE_FAILURES");
+        std::env::remove_var("CRAWLER_BROWSER_RESTART_THRESHOLD");
+        assert_eq!(cfg.job_timeout, Duration::from_secs(120));
+        assert_eq!(cfg.metadata_max_consecutive_failures, 5);
+        assert_eq!(cfg.browser_restart_threshold, 7);
+    }
+
+    #[test]
+    fn private_mode_env_parses_true() {
+        let _g = ENV_GUARD.lock().unwrap_or_else(|p| p.into_inner());
+        std::env::set_var("PRIVATE_MODE", "true");
+        std::env::set_var("DATABASE_URL", "postgres://test");
+        let cfg = Config::from_env().expect("from_env");
+        std::env::remove_var("PRIVATE_MODE");
+        std::env::remove_var("DATABASE_URL");
+        assert!(cfg.auth.private_mode);
+    }
+
+    #[test]
+    fn private_mode_env_parses_false() {
+        let _g = ENV_GUARD.lock().unwrap_or_else(|p| p.into_inner());
+        std::env::set_var("PRIVATE_MODE", "false");
+        std::env::set_var("DATABASE_URL", "postgres://test");
+        let cfg = Config::from_env().expect("from_env");
+        std::env::remove_var("PRIVATE_MODE");
+        std::env::remove_var("DATABASE_URL");
+        assert!(!cfg.auth.private_mode);
+    }
+
+    #[test]
+    fn private_mode_defaults_to_false() {
+        let _g = ENV_GUARD.lock().unwrap_or_else(|p| p.into_inner());
+        std::env::remove_var("PRIVATE_MODE");
+        std::env::set_var("DATABASE_URL", "postgres://test");
+        let cfg = Config::from_env().expect("from_env");
+        std::env::remove_var("DATABASE_URL");
+        assert!(!cfg.auth.private_mode);
+    }
+}

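
The config tests recover `ENV_GUARD` with `unwrap_or_else(|p| p.into_inner())`, so one failing test cannot turn every later test into a `PoisonError` panic. A minimal std-only demonstration (the `GUARD`, `poison_guard`, and `lock_ignoring_poison` names are illustrative):

```rust
use std::sync::{Mutex, MutexGuard};
use std::thread;

static GUARD: Mutex<()> = Mutex::new(());

/// Poison `GUARD` by panicking on another thread while it is held.
fn poison_guard() {
    let handle = thread::spawn(|| {
        let _g = GUARD.lock().unwrap();
        panic!("test failed while holding the env guard");
    });
    // The join error carries the panic payload; we only need to know it panicked.
    assert!(handle.join().is_err());
}

/// Acquire the guard the way the config tests do, ignoring poisoning.
fn lock_ignoring_poison() -> MutexGuard<'static, ()> {
    GUARD.lock().unwrap_or_else(|p| p.into_inner())
}
```

Recovering does not clear the poison flag: every later `lock()` still returns `Err`, which is why each test repeats the `unwrap_or_else`.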
@@ -13,7 +13,7 @@
 //! until [`BrowserManager::shutdown`].
 
 use std::ops::Deref;
-use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::atomic::{AtomicBool, AtomicU8, AtomicUsize, Ordering};
 use std::sync::Arc;
 use std::time::Duration;
 
@@ -71,12 +71,42 @@ impl ActiveTracker {
     }
 }
 
+/// Lifecycle gate for a coordinated browser restart. `acquire()` parks
+/// while not [`RestartPhase::Healthy`] so no new navigation starts mid-
+/// restart; long-lived lease holders (the metadata pass) cooperate by
+/// checking [`BrowserManager::is_restart_pending`] at safe boundaries.
+#[derive(Clone, Copy, PartialEq, Eq, Debug)]
+pub enum RestartPhase {
+    /// Normal operation — acquires proceed.
+    Healthy,
+    /// Restart requested; new acquires park, waiting for in-flight leases
+    /// to drain.
+    Draining,
+    /// Chromium is being closed + relaunched.
+    Restarting,
+}
+
+const PHASE_HEALTHY: u8 = 0;
+const PHASE_DRAINING: u8 = 1;
+const PHASE_RESTARTING: u8 = 2;
+
 pub struct BrowserManager {
     inner: Mutex<Inner>,
     active: Arc<ActiveTracker>,
     launch_opts: LaunchOptions,
     idle_timeout: Duration,
     on_launch: OnLaunch,
+    /// Coarse lifecycle phase (one of the `PHASE_*` constants).
+    phase: AtomicU8,
+    /// Woken when the phase returns to `Healthy` so parked acquires resume.
+    resume: Notify,
+    /// Serialises coordinated restarts so concurrent requests collapse into
+    /// a single relaunch.
+    restart_lock: Mutex<()>,
+    /// Result of the most recent relaunch, so a caller that coalesced into
+    /// an in-progress restart reports that restart's real outcome instead
+    /// of a blind success.
+    last_restart_ok: AtomicBool,
 }
 
 struct Inner {
@@ -99,28 +129,72 @@ impl BrowserManager {
|
|||||||
launch_opts,
|
launch_opts,
|
||||||
idle_timeout,
|
idle_timeout,
|
||||||
on_launch,
|
on_launch,
|
||||||
|
phase: AtomicU8::new(PHASE_HEALTHY),
|
||||||
|
resume: Notify::new(),
|
||||||
|
restart_lock: Mutex::new(()),
|
||||||
|
last_restart_ok: AtomicBool::new(true),
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Current restart phase.
|
||||||
|
pub fn phase(&self) -> RestartPhase {
|
||||||
|
match self.phase.load(Ordering::Acquire) {
|
||||||
|
PHASE_DRAINING => RestartPhase::Draining,
|
||||||
|
PHASE_RESTARTING => RestartPhase::Restarting,
|
||||||
|
_ => RestartPhase::Healthy,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn set_phase(&self, phase: RestartPhase) {
|
||||||
|
let v = match phase {
|
||||||
|
RestartPhase::Healthy => PHASE_HEALTHY,
|
||||||
|
RestartPhase::Draining => PHASE_DRAINING,
|
||||||
|
RestartPhase::Restarting => PHASE_RESTARTING,
|
||||||
|
};
|
||||||
|
+        self.phase.store(v, Ordering::Release);
+    }
+
+    /// Whether a coordinated restart is in progress. Long-lived lease
+    /// holders poll this at safe boundaries and yield their lease so the
+    /// drain can complete promptly.
+    pub fn is_restart_pending(&self) -> bool {
+        self.phase() != RestartPhase::Healthy
+    }
+
+    /// Launch Chromium into `guard`, running the `on_launch` hook before
+    /// publishing the handle so a probe failure doesn't leave a half-
+    /// initialised browser behind.
+    async fn launch_into(&self, guard: &mut Inner) -> anyhow::Result<()> {
+        let handle = browser::launch(self.launch_opts.clone())
+            .await
+            .context("BrowserManager: launch chromium")?;
+        let shared = handle.shared();
+        if let Err(e) = (self.on_launch)(Arc::clone(&shared)).await {
+            let _ = handle.close().await;
+            return Err(e.context("BrowserManager: on_launch hook failed"));
+        }
+        guard.handle = Some(handle);
+        guard.shared = Some(shared);
+        Ok(())
+    }
+
     /// Acquire a shared browser lease. The first acquire after a teardown
     /// launches a fresh Chromium (and runs `on_launch`); subsequent acquires
     /// while a process is alive just bump the counter and clone the `Arc`.
     pub async fn acquire(&self) -> anyhow::Result<BrowserLease> {
+        // Park while a coordinated restart is draining/relaunching so no new
+        // navigation starts against a browser that's about to be torn down.
+        // The short sleep fallback guarantees liveness even if a `resume`
+        // notification is missed (classic Notify lost-wakeup).
+        while self.phase() != RestartPhase::Healthy {
+            tokio::select! {
+                _ = self.resume.notified() => {}
+                _ = tokio::time::sleep(Duration::from_millis(100)) => {}
+            }
+        }
         let mut guard = self.inner.lock().await;
         if guard.handle.is_none() {
-            let handle = browser::launch(self.launch_opts.clone())
-                .await
-                .context("BrowserManager: launch chromium")?;
-            let shared = handle.shared();
-            // Run the on-launch hook before publishing the handle so a session
-            // probe failure doesn't leave a half-initialized browser behind.
-            if let Err(e) = (self.on_launch)(Arc::clone(&shared)).await {
-                // Close the just-launched browser since we won't be using it.
-                let _ = handle.close().await;
-                return Err(e.context("BrowserManager: on_launch hook failed"));
-            }
-            guard.handle = Some(handle);
-            guard.shared = Some(shared);
+            self.launch_into(&mut guard).await?;
         }
         let browser = guard
             .shared
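`acquire` parks on a phase gate, and lease holders poll it through `is_restart_pending`. That gate is a single atomic read.

Below is a minimal std-only sketch of the idea. It assumes a `u8` encoding for `RestartPhase`; the real enum, its discriminants, and the field behind `phase()` are outside this hunk. `PhaseCell` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical mirror of the crawler's `RestartPhase`; the real
/// discriminants are not shown in this diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum RestartPhase {
    Healthy = 0,
    Draining = 1,
    Restarting = 2,
}

impl RestartPhase {
    fn from_u8(v: u8) -> Self {
        match v {
            0 => RestartPhase::Healthy,
            1 => RestartPhase::Draining,
            2 => RestartPhase::Restarting,
            _ => unreachable!("only set_phase writes this cell"),
        }
    }
}

/// Hypothetical lock-free holder for the current restart phase.
pub struct PhaseCell {
    phase: AtomicU8,
}

impl PhaseCell {
    pub fn new() -> Self {
        Self { phase: AtomicU8::new(RestartPhase::Healthy as u8) }
    }

    /// Acquire load pairs with the Release store in `set_phase`.
    pub fn phase(&self) -> RestartPhase {
        RestartPhase::from_u8(self.phase.load(Ordering::Acquire))
    }

    pub fn set_phase(&self, p: RestartPhase) {
        self.phase.store(p as u8, Ordering::Release);
    }

    /// Any phase other than Healthy means callers should park or yield.
    pub fn is_restart_pending(&self) -> bool {
        self.phase() != RestartPhase::Healthy
    }
}
```

The Release store matches `set_phase` in the diff. The Acquire load is an assumption that pairs with it. Together they mean a caller that observes `Healthy` also observes everything written before the phase was reset.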
@@ -134,6 +208,51 @@ impl BrowserManager {
         })
     }
 
+    /// Coordinated restart: block new acquires, wait for in-flight leases
+    /// to drain (up to `drain_deadline`, then force), close + relaunch
+    /// Chromium (re-running `on_launch` → re-inject session + probe), then
+    /// resume parked acquirers. Concurrent calls collapse into one
+    /// relaunch. The phase is always returned to `Healthy` — even if the
+    /// relaunch errors — so a failed restart never permanently wedges
+    /// acquisition (the next acquire retries the launch lazily).
+    pub async fn coordinated_restart(&self, drain_deadline: Duration) -> anyhow::Result<()> {
+        // Dedup: if a restart is already running, wait for it and report
+        // that restart's real outcome (not a blind success).
+        let _restart_guard = match self.restart_lock.try_lock() {
+            Ok(g) => g,
+            Err(_) => {
+                let _ = self.restart_lock.lock().await;
+                return if self.last_restart_ok.load(Ordering::Acquire) {
+                    Ok(())
+                } else {
+                    Err(anyhow::anyhow!("a concurrent coordinated browser restart failed"))
+                };
+            }
+        };
+
+        self.set_phase(RestartPhase::Draining);
+        await_drain(&self.active, drain_deadline).await;
+
+        self.set_phase(RestartPhase::Restarting);
+        let relaunch = {
+            let mut guard = self.inner.lock().await;
+            guard.shared = None;
+            if let Some(handle) = guard.handle.take() {
+                let _ = handle.close().await;
+            }
+            self.launch_into(&mut guard).await
+        };
+
+        self.last_restart_ok.store(relaunch.is_ok(), Ordering::Release);
+        self.set_phase(RestartPhase::Healthy);
+        self.resume.notify_waiters();
+        match &relaunch {
+            Ok(()) => tracing::info!("BrowserManager: coordinated restart complete"),
+            Err(e) => tracing::error!(error = ?e, "BrowserManager: coordinated restart relaunch failed"),
+        }
+        relaunch.context("coordinated_restart: relaunch")
+    }
+
     /// Forcefully close the cached browser regardless of active count.
     /// Used on daemon shutdown. After this returns the next acquire will
     /// re-launch from scratch.
@@ -176,6 +295,29 @@ impl BrowserManager {
     }
 }
 
+/// Wait for the active-lease count to reach zero, up to `deadline`. Wakes
+/// on the tracker's idle signal and re-checks on a short poll so a missed
+/// signal can't strand the drain. Returns when drained or when the
+/// deadline elapses (the caller then force-restarts). Extracted as a free
+/// fn so the timing logic is unit-testable without launching Chromium.
+async fn await_drain(active: &Arc<ActiveTracker>, deadline: Duration) {
+    let start = tokio::time::Instant::now();
+    while active.current() > 0 {
+        let Some(remaining) = deadline.checked_sub(start.elapsed()) else {
+            tracing::warn!(
+                active = active.current(),
+                "coordinated_restart: drain deadline exceeded — forcing relaunch"
+            );
+            return;
+        };
+        let nap = remaining.min(Duration::from_millis(250));
+        tokio::select! {
+            _ = active.idle_signal().notified() => {}
+            _ = tokio::time::sleep(nap) => {}
+        }
+    }
+}
+
 /// Background reaper. Returns immediately when `idle_timeout == 0`.
 /// Otherwise spawns a task that:
 /// 1. Waits on `idle_signal` (woken when active hits zero).
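The drain logic in `await_drain` can be exercised without Tokio or Chromium. It combines a deadline check (`checked_sub`), a bounded wait (`min(250 ms)`), and a wake on the idle signal.

Below is a blocking sketch that uses std `Mutex` + `Condvar` in place of Tokio's `Notify` and `select!`. `next_nap`, `MAX_NAP`, `ActiveCount`, and `await_drain_blocking` are hypothetical names, not part of the diff.

```rust
use std::sync::{Condvar, Mutex};
use std::time::{Duration, Instant};

/// Upper bound on a single wait, mirroring the 250 ms poll in `await_drain`.
const MAX_NAP: Duration = Duration::from_millis(250);

/// How long the next wait may last, or `None` once the deadline has
/// passed (the caller then forces the relaunch).
pub fn next_nap(deadline: Duration, elapsed: Duration) -> Option<Duration> {
    deadline
        .checked_sub(elapsed)
        .map(|remaining| remaining.min(MAX_NAP))
}

/// Hypothetical blocking stand-in for `ActiveTracker`.
pub struct ActiveCount {
    count: Mutex<usize>,
    idle: Condvar,
}

impl ActiveCount {
    pub fn new(initial: usize) -> Self {
        Self { count: Mutex::new(initial), idle: Condvar::new() }
    }

    /// Drop one lease; wake drain waiters on the transition to zero.
    pub fn release(&self) {
        let mut c = self.count.lock().unwrap();
        *c -= 1;
        if *c == 0 {
            self.idle.notify_all();
        }
    }

    pub fn current(&self) -> usize {
        *self.count.lock().unwrap()
    }
}

/// Returns `true` if every lease was released, `false` if the deadline
/// elapsed first (leases are still held; the caller force-restarts).
pub fn await_drain_blocking(active: &ActiveCount, deadline: Duration) -> bool {
    let start = Instant::now();
    let mut count = active.count.lock().unwrap();
    while *count > 0 {
        let Some(nap) = next_nap(deadline, start.elapsed()) else {
            return false;
        };
        // Releases the mutex while waiting; timeouts and spurious wakeups
        // simply re-check the count and the deadline.
        count = active.idle.wait_timeout(count, nap).unwrap().0;
    }
    true
}
```

Here the `Condvar` is waited on under the same mutex that guards the count, so a release cannot slip between the check and the wait. The 250 ms cap therefore only mirrors the diff's structure. In the Tokio version it is what rescues a missed `Notify` signal.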
@@ -270,6 +412,63 @@ mod tests {
         mgr.invalidate().await;
     }
 
+    #[tokio::test]
+    async fn await_drain_returns_immediately_when_already_idle() {
+        let active = ActiveTracker::new();
+        let start = tokio::time::Instant::now();
+        await_drain(&active, Duration::from_secs(5)).await;
+        assert!(start.elapsed() < Duration::from_millis(200), "no wait when idle");
+    }
+
+    #[tokio::test]
+    async fn await_drain_completes_when_lease_released() {
+        let active = ActiveTracker::new();
+        active.acquire();
+        let bg = {
+            let a = Arc::clone(&active);
+            tokio::spawn(async move {
+                tokio::time::sleep(Duration::from_millis(100)).await;
+                a.release();
+            })
+        };
+        // Generous deadline; should return shortly after the release, not
+        // at the deadline.
+        let start = tokio::time::Instant::now();
+        await_drain(&active, Duration::from_secs(5)).await;
+        assert!(start.elapsed() < Duration::from_secs(2), "drained on release");
+        assert_eq!(active.current(), 0);
+        bg.await.unwrap();
+    }
+
+    #[tokio::test]
+    async fn await_drain_force_returns_after_deadline_when_stuck() {
+        let active = ActiveTracker::new();
+        active.acquire(); // never released
+        let start = tokio::time::Instant::now();
+        await_drain(&active, Duration::from_millis(300)).await;
+        let elapsed = start.elapsed();
+        assert!(elapsed >= Duration::from_millis(250), "waited ~deadline: {elapsed:?}");
+        assert!(elapsed < Duration::from_secs(2), "but not forever: {elapsed:?}");
+        assert_eq!(active.current(), 1, "still held — caller force-restarts");
+    }
+
+    #[test]
+    fn phase_transitions_reflect_is_restart_pending() {
+        let mgr = BrowserManager::new(
+            crate::crawler::browser::LaunchOptions::default(),
+            Duration::ZERO,
+            noop_on_launch(),
+        );
+        assert_eq!(mgr.phase(), RestartPhase::Healthy);
+        assert!(!mgr.is_restart_pending());
+        mgr.set_phase(RestartPhase::Draining);
+        assert!(mgr.is_restart_pending());
+        mgr.set_phase(RestartPhase::Restarting);
+        assert!(mgr.is_restart_pending());
+        mgr.set_phase(RestartPhase::Healthy);
+        assert!(!mgr.is_restart_pending());
+    }
+
     #[tokio::test]
     async fn active_tracker_signals_idle_only_on_zero_transition() {
         let tracker = ActiveTracker::new();
@@ -73,39 +73,36 @@ pub enum SyncOutcome {
     SessionExpired,
 }
 
-/// Fetch all images for one chapter and persist them atomically. On
-/// any error after the first storage put, the DB transaction rolls
-/// back so the chapter stays at `page_count = 0` and is retried on the
-/// next run. Bytes already written to storage become orphans; a future
-/// reaper sweeps them.
-#[allow(clippy::too_many_arguments)]
-pub async fn sync_chapter_content(
-    browser: &chromiumoxide::Browser,
-    db: &PgPool,
-    storage: &dyn Storage,
-    http: &reqwest::Client,
-    rate: &HostRateLimiters,
-    chapter_id: Uuid,
-    manga_id: Uuid,
-    source_url: &str,
-    force_refetch: bool,
-    allowlist: &DownloadAllowlist,
-    max_image_bytes: usize,
-) -> anyhow::Result<SyncOutcome> {
-    // Skip if already fetched, unless caller explicitly forces.
-    if !force_refetch {
-        let (page_count,): (i32,) =
-            sqlx::query_as("SELECT page_count FROM chapters WHERE id = $1")
-                .bind(chapter_id)
-                .fetch_one(db)
-                .await
-                .context("read chapter page_count")?;
-        if page_count > 0 {
-            return Ok(SyncOutcome::Skipped);
-        }
-    }
+/// Per-chapter max fetch attempts when TOR is configured. `N = 3` means
+/// up to 3 total page fetches with 2 NEWNYM signals between them. When
+/// TOR is not configured the effective budget collapses to 1 (single
+/// attempt, no retry, no recircuit — bit-for-bit pre-TOR behavior).
+const CHAPTER_RECIRCUIT_MAX_ATTEMPTS: u32 = 3;
 
-    // Nav to chapter page (rate-limited per host).
+/// Outcome of [`fetch_chapter_html_with_recircuit`]. `Ok` carries the
+/// final reader HTML; the other two map to `sync_chapter_content`'s
+/// existing failure modes.
+#[derive(Debug)]
+enum ChapterFetchOutcome {
+    Ok(String),
+    /// `ChapterProbe::Unauthenticated` after exhausting recircuit
+    /// budget (or with budget=0). Caller returns
+    /// `SyncOutcome::SessionExpired`.
+    SessionExpired,
+    /// `ChapterProbe::Transient` after exhausting recircuit budget
+    /// (or with budget=0). Caller bails so the dispatcher does
+    /// exponential backoff.
+    PersistentTransient,
+}
+
+/// Single rate-limited Chromium navigation to the chapter URL,
+/// returning the page HTML. Extracted from `sync_chapter_content` so
+/// the recircuit loop can call it once per attempt.
+async fn fetch_chapter_html_once(
+    browser: &chromiumoxide::Browser,
+    rate: &HostRateLimiters,
+    source_url: &str,
+) -> anyhow::Result<String> {
     rate.wait_for(source_url).await?;
     let page = browser
         .new_page(source_url)
@@ -124,28 +121,140 @@ pub async fn sync_chapter_content(
         crate::crawler::nav::SELECTOR_TIMEOUT,
     )
     .await;
 
     let html = page.content().await.context("read chapter html")?;
     page.close().await.ok();
+    Ok(html)
+}
 
-    // Three-way session classification: distinguishes a transient
-    // hiccup (broken-page body or logged-in-but-no-reader) from a
-    // genuine PHPSESSID expiry (no reader and no avatar widget). The
-    // earlier binary `#avatar_menu` check conflated both and froze
-    // every worker on a layout shift.
+/// Pure-over-IO loop: fetch + classify, up to `max_attempts` total
+/// fetches. Between attempts, `recircuit` is invoked (a no-op when
+/// TOR isn't configured). `max_attempts = 1` collapses to the
+/// original single-shot behavior — `Unauthenticated` →
+/// `SessionExpired`, `Transient` → `PersistentTransient` on the first
+/// hit, no recircuit.
+///
+/// Semantics match [`crate::crawler::detect::retry_on_transient`] and
+/// [`run_session_probe_loop`]: `N` is **total attempts including the
+/// first**, so `N = 3` means 3 fetches and up to 2 NEWNYM calls.
+/// `Unauthenticated` and `Transient` share the budget — the loop
+/// doesn't distinguish, so a sequence like Transient → Unauth → Ok
+/// counts as 3 attempts.
+async fn fetch_chapter_html_with_recircuit<F, Fut, R, RFut>(
+    mut fetch: F,
+    mut recircuit: R,
+    max_attempts: u32,
+    source_url_for_msg: &str,
+) -> anyhow::Result<ChapterFetchOutcome>
+where
+    F: FnMut() -> Fut,
+    Fut: std::future::Future<Output = anyhow::Result<String>>,
+    R: FnMut() -> RFut,
+    RFut: std::future::Future<Output = ()>,
+{
+    debug_assert!(max_attempts >= 1, "max_attempts must be at least 1");
+    let mut attempt = 0u32;
+    loop {
+        attempt += 1;
+        let html = fetch().await?;
         match session::classify_chapter_probe(&html) {
-            ChapterProbe::Unauthenticated => return Ok(SyncOutcome::SessionExpired),
+            ChapterProbe::Ok => return Ok(ChapterFetchOutcome::Ok(html)),
+            ChapterProbe::Unauthenticated => {
+                if attempt >= max_attempts {
+                    return Ok(ChapterFetchOutcome::SessionExpired);
+                }
+                tracing::warn!(
+                    attempt,
+                    max = max_attempts,
+                    url = source_url_for_msg,
+                    "chapter probe Unauthenticated; signaling TOR NEWNYM and retrying"
+                );
+                recircuit().await;
+            }
             ChapterProbe::Transient => {
+                if attempt >= max_attempts {
+                    return Ok(ChapterFetchOutcome::PersistentTransient);
+                }
+                tracing::warn!(
+                    attempt,
+                    max = max_attempts,
+                    url = source_url_for_msg,
+                    "chapter probe Transient; signaling TOR NEWNYM and retrying"
+                );
+                recircuit().await;
+            }
+        }
+    }
+}
+
+/// Fetch one chapter's images and persist them. Each image is streamed to
+/// storage as it's fetched (peak memory ≈ one image, not the whole
+/// chapter); the page rows + `page_count` are then written in one short
+/// transaction. On any failure the chapter stays at `page_count = 0` (no
+/// partial rows) and the blobs already written are deleted best-effort by
+/// [`cleanup_orphans`], so a retry starts clean.
+#[allow(clippy::too_many_arguments)]
+pub async fn sync_chapter_content(
+    browser: &chromiumoxide::Browser,
+    db: &PgPool,
+    storage: &dyn Storage,
+    http: &reqwest::Client,
+    rate: &HostRateLimiters,
+    chapter_id: Uuid,
+    manga_id: Uuid,
+    source_url: &str,
+    force_refetch: bool,
+    allowlist: &DownloadAllowlist,
+    max_image_bytes: usize,
+    tor: Option<&crate::crawler::tor::TorController>,
+    // Optional live-status sink for the realtime page counter. The daemon
+    // dispatcher passes the shared handle (the chapter has already been
+    // registered via `begin_chapter`); the CLI / admin resync pass `None`.
+    progress: Option<&crate::crawler::status::StatusHandle>,
+) -> anyhow::Result<SyncOutcome> {
+    // Skip if already fetched, unless caller explicitly forces.
+    if !force_refetch {
+        let (page_count,): (i32,) =
+            sqlx::query_as("SELECT page_count FROM chapters WHERE id = $1")
+                .bind(chapter_id)
+                .fetch_one(db)
+                .await
+                .context("read chapter page_count")?;
+        if page_count > 0 {
+            return Ok(SyncOutcome::Skipped);
+        }
+    }
+
+    // Fetch + classify. With TOR configured, allow up to
+    // CHAPTER_RECIRCUIT_MAX_ATTEMPTS total page fetches with NEWNYM
+    // between each. Without TOR, collapse to 1 attempt (no retry, no
+    // recircuit) — matches the pre-TOR single-shot behavior bit-for-bit.
+    let max_attempts = if tor.is_some() { CHAPTER_RECIRCUIT_MAX_ATTEMPTS } else { 1 };
+    let html = match fetch_chapter_html_with_recircuit(
+        || fetch_chapter_html_once(browser, rate, source_url),
+        || async {
+            if let Some(t) = tor {
+                if let Err(e) = t.new_identity().await {
+                    tracing::warn!(error = %e, "TOR NEWNYM failed; continuing with same circuit");
+                }
+            }
+        },
+        max_attempts,
+        source_url,
+    )
+    .await?
+    {
+        ChapterFetchOutcome::Ok(html) => html,
+        ChapterFetchOutcome::SessionExpired => return Ok(SyncOutcome::SessionExpired),
+        ChapterFetchOutcome::PersistentTransient => {
             // Surface as a typed Err so the dispatcher path runs
             // ack_failed with exponential backoff (rather than the
             // session-expired sticky flag).
             anyhow::bail!(
-                "chapter page at {source_url} returned a transient response \
-                 (broken-page body or reader didn't render); will retry"
+                "chapter page at {source_url} returned a transient response after \
+                 {max_attempts} attempt(s); will retry"
             );
         }
-        ChapterProbe::Ok => {}
-    }
+    };
 
     let images = parse_chapter_pages(&html)
         .with_context(|| format!("parse chapter pages at {source_url}"))?;
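`fetch_chapter_html_with_recircuit` documents a budget rule: N counts total fetches including the first, and `Unauthenticated` and `Transient` draw on the same budget. That rule can be modelled synchronously.

Below is a sketch built on simplified stand-ins. `Probe`, `Outcome`, `run_with_recircuit`, and `attempt_budget` are hypothetical names. The async fetch and recircuit futures are replaced by plain closures.

```rust
/// Hypothetical stand-in for `ChapterProbe`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Probe {
    Ok,
    Unauthenticated,
    Transient,
}

/// Hypothetical stand-in for `ChapterFetchOutcome` (HTML payload omitted).
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Ok,
    SessionExpired,
    PersistentTransient,
}

/// Budget rule from `sync_chapter_content`: 3 attempts with TOR, else 1.
pub fn attempt_budget(tor_configured: bool) -> u32 {
    if tor_configured { 3 } else { 1 }
}

/// `max_attempts` counts every fetch, including the first; both failure
/// kinds consume the same budget, and `recircuit` runs only between fetches.
pub fn run_with_recircuit(
    mut fetch: impl FnMut() -> Probe,
    mut recircuit: impl FnMut(),
    max_attempts: u32,
) -> Outcome {
    assert!(max_attempts >= 1, "max_attempts must be at least 1");
    let mut attempt = 0;
    loop {
        attempt += 1;
        match fetch() {
            Probe::Ok => return Outcome::Ok,
            Probe::Unauthenticated if attempt >= max_attempts => return Outcome::SessionExpired,
            Probe::Transient if attempt >= max_attempts => return Outcome::PersistentTransient,
            // Budget left: rotate the circuit and try again.
            Probe::Unauthenticated | Probe::Transient => recircuit(),
        }
    }
}
```

With TOR configured, the trace Transient, Unauthenticated, Ok uses all three fetches and two recircuits. Without TOR, the first failure is final and no recircuit happens.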
@@ -156,28 +265,93 @@ pub async fn sync_chapter_content(
     // Resolve image URLs against the chapter URL (they may be relative).
     let base = reqwest::Url::parse(source_url).context("parse chapter URL")?;
 
-    // Fetch every image bytes-first into memory before writing
-    // anything. Lets us bail the whole chapter cleanly if any image
-    // fails — DB stays at page_count=0, no partial rows persisted.
-    let mut fetched: Vec<(i32, Vec<u8>, &'static str)> = Vec::with_capacity(images.len());
+    // Stream each image straight to storage as it's fetched, capping peak
+    // memory at a single image rather than the whole chapter. Track the
+    // keys written so they can be rolled back if a later page (or the
+    // final DB commit) fails — preserving the all-or-nothing guarantee
+    // without holding a DB transaction open across the network puts
+    // (which matters once `Storage` is backed by S3).
+    let total = images.len();
+    // Publish the now-known page total so the dashboard shows "0/N".
+    if let Some(p) = progress {
+        p.set_chapter_pages(chapter_id, 0, Some(total));
+    }
+    let mut written_keys: Vec<String> = Vec::with_capacity(total);
+    let mut stored: Vec<StoredPage> = Vec::with_capacity(total);
     for img in &images {
-        let url = base.join(&img.url).with_context(|| {
-            format!("join image URL {} onto {source_url}", img.url)
-        })?;
-        rate.wait_for(url.as_str()).await?;
-        let bytes = fetch_bytes_capped(
+        match download_and_store_page(
+            storage,
             http,
-            url.as_str(),
-            Some(source_url),
+            rate,
+            &base,
+            source_url,
+            manga_id,
+            chapter_id,
+            img,
             allowlist,
             max_image_bytes,
         )
-        .await?
-        .to_vec();
-        // Reject any non-image response: the only valid output of an
-        // image URL is an image. `infer` returns None on truncated
-        // bytes too, which also wants to be a failure not a silent
-        // `.bin` extension.
+        .await
+        {
+            Ok(page) => {
+                written_keys.push(page.storage_key.clone());
+                stored.push(page);
+                // Live page counter: push the climbing count to subscribers.
+                if let Some(p) = progress {
+                    p.set_chapter_pages(chapter_id, stored.len(), Some(total));
+                }
+            }
+            Err(e) => {
+                cleanup_orphans(storage, &written_keys).await;
+                return Err(e);
+            }
+        }
+    }
+
+    // Short transaction: page rows + page_count only, no network I/O. On
+    // failure, roll back the stored bytes so the chapter stays at
+    // page_count=0 and is retried cleanly next run.
+    if let Err(e) = persist_pages(db, chapter_id, &stored).await {
+        cleanup_orphans(storage, &written_keys).await;
+        return Err(e);
+    }
+
+    Ok(SyncOutcome::Fetched { pages: stored.len() })
+}
+
+/// A page image that has been written to storage and is awaiting its DB
+/// row. Carries everything `persist_pages` needs.
+pub(crate) struct StoredPage {
+    page_number: i32,
+    storage_key: String,
+    content_type: String,
+}
+
+/// Download a single page image, validate it's really an image, and write
+/// it to storage. Returns the storage key + content type. Does not touch
+/// the DB — persistence is batched into one short transaction afterward.
+#[allow(clippy::too_many_arguments)]
+async fn download_and_store_page(
+    storage: &dyn Storage,
+    http: &reqwest::Client,
+    rate: &HostRateLimiters,
+    base: &reqwest::Url,
+    source_url: &str,
+    manga_id: Uuid,
+    chapter_id: Uuid,
+    img: &ChapterImage,
+    allowlist: &DownloadAllowlist,
+    max_image_bytes: usize,
+) -> anyhow::Result<StoredPage> {
+    let url = base
+        .join(&img.url)
+        .with_context(|| format!("join image URL {} onto {source_url}", img.url))?;
+    rate.wait_for(url.as_str()).await?;
+    let bytes = fetch_bytes_capped(http, url.as_str(), Some(source_url), allowlist, max_image_bytes)
+        .await?;
+    // Reject any non-image response: the only valid output of an image URL
+    // is an image. `infer` returns None on truncated bytes too, which also
+    // wants to be a failure not a silent `.bin` extension.
     if !looks_like_image(&bytes) {
         anyhow::bail!(
             "image URL {url} returned non-image bytes \
@@ -188,24 +362,30 @@ pub async fn sync_chapter_content(
     let ext = infer::get(&bytes)
         .map(|k| k.extension())
         .expect("looks_like_image asserted infer succeeded");
-        fetched.push((img.page_number, bytes, ext));
-    }
-
-    // Atomic write: storage puts + page row inserts + page_count
-    // update, all in one transaction. If anything fails, rollback +
-    // the chapter is retried next run. Storage orphans the bytes; a
-    // reaper sweeps them later.
-    let mut tx = db.begin().await.context("open chapter sync tx")?;
-    for (page_number, bytes, ext) in &fetched {
     let key = format!(
         "mangas/{manga_id}/chapters/{chapter_id}/pages/{:04}.{ext}",
-        page_number
+        img.page_number
     );
     storage
-        .put(&key, bytes)
+        .put(&key, &bytes)
         .await
         .with_context(|| format!("put {key}"))?;
-        // (chapter_id, page_number) is unique — re-runs idempotent.
+    Ok(StoredPage {
+        page_number: img.page_number,
+        storage_key: key,
+        content_type: format!("image/{ext}"),
+    })
+}
+
+/// Persist the page rows + chapter `page_count` in one short transaction.
+/// `(chapter_id, page_number)` is unique so re-runs are idempotent.
+pub(crate) async fn persist_pages(
+    db: &PgPool,
+    chapter_id: Uuid,
+    stored: &[StoredPage],
+) -> anyhow::Result<()> {
+    let mut tx = db.begin().await.context("open chapter sync tx")?;
+    for page in stored {
         sqlx::query(
             "INSERT INTO pages (chapter_id, page_number, storage_key, content_type)
              VALUES ($1, $2, $3, $4)
@@ -214,22 +394,36 @@ pub async fn sync_chapter_content(
                  content_type = EXCLUDED.content_type",
         )
         .bind(chapter_id)
-        .bind(page_number)
-        .bind(&key)
-        .bind(format!("image/{ext}"))
+        .bind(page.page_number)
+        .bind(&page.storage_key)
+        .bind(&page.content_type)
         .execute(&mut *tx)
         .await
-        .with_context(|| format!("insert page row {page_number}"))?;
+        .with_context(|| format!("insert page row {}", page.page_number))?;
     }
     sqlx::query("UPDATE chapters SET page_count = $1 WHERE id = $2")
-        .bind(fetched.len() as i32)
+        .bind(stored.len() as i32)
         .bind(chapter_id)
         .execute(&mut *tx)
         .await
         .context("update page_count")?;
     tx.commit().await.context("commit chapter sync")?;
+    Ok(())
+}
 
-    Ok(SyncOutcome::Fetched { pages: fetched.len() })
+/// Best-effort delete of partially-written page blobs after a chapter sync
+/// fails, so a retry doesn't accumulate orphans. Errors are logged, not
+/// raised — a leftover blob is harmless and a future reaper can sweep it.
+pub(crate) async fn cleanup_orphans(storage: &dyn Storage, keys: &[String]) {
+    for key in keys {
+        if let Err(e) = storage.delete(key).await {
+            tracing::warn!(
+                %key,
+                error = ?e,
+                "failed to delete orphaned page blob after chapter sync failure"
+            );
+        }
+    }
 }
 
 // Suppress unused-import warning for `session::registrable_domain`
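The old buffered transaction is replaced by a write-then-compensate pattern. Each page is stored as soon as it arrives and its key is remembered. If a later page fails, the remembered keys are deleted.

Below is a sketch of that pattern with an in-memory store. `MemStorage`, `page_key`, and `store_chapter` are hypothetical names. The DB step done by `persist_pages` is left out, and page numbers here are 1-based positions rather than the parsed `page_number`.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the crate's `Storage` trait.
#[derive(Default)]
pub struct MemStorage {
    blobs: HashMap<String, Vec<u8>>,
}

impl MemStorage {
    pub fn put(&mut self, key: &str, bytes: &[u8]) {
        self.blobs.insert(key.to_string(), bytes.to_vec());
    }
    pub fn delete(&mut self, key: &str) {
        self.blobs.remove(key);
    }
    pub fn blob_count(&self) -> usize {
        self.blobs.len()
    }
}

/// Same key layout as `download_and_store_page` in the diff.
pub fn page_key(manga_id: &str, chapter_id: &str, page_number: i32, ext: &str) -> String {
    format!("mangas/{manga_id}/chapters/{chapter_id}/pages/{:04}.{ext}", page_number)
}

/// Store each page as soon as its (simulated) download succeeds; on the
/// first failure, delete every key written so far and return the error.
pub fn store_chapter(
    storage: &mut MemStorage,
    manga_id: &str,
    chapter_id: &str,
    pages: &[Result<Vec<u8>, String>],
) -> Result<Vec<String>, String> {
    let mut written_keys: Vec<String> = Vec::with_capacity(pages.len());
    for (i, page) in pages.iter().enumerate() {
        match page {
            Ok(bytes) => {
                let key = page_key(manga_id, chapter_id, i as i32 + 1, "jpg");
                storage.put(&key, bytes);
                written_keys.push(key);
            }
            Err(e) => {
                // Compensating rollback, as `cleanup_orphans` does.
                for key in &written_keys {
                    storage.delete(key);
                }
                return Err(e.clone());
            }
        }
    }
    Ok(written_keys)
}
```

In the real code, the page bytes come from `download_and_store_page` one at a time. The slice of `Result`s here only simulates each per-page download succeeding or failing.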
@@ -243,6 +437,90 @@ fn _keep_session_in_scope() {
 #[cfg(test)]
 mod tests {
     use super::*;
+    use crate::storage::LocalStorage;
 
+    #[tokio::test]
+    async fn cleanup_orphans_deletes_written_keys() {
+        let dir = tempfile::tempdir().unwrap();
+        let storage = LocalStorage::new(dir.path());
+        let keys = vec![
+            "mangas/m/chapters/c/pages/0001.jpg".to_string(),
+            "mangas/m/chapters/c/pages/0002.jpg".to_string(),
+        ];
+        for k in &keys {
+            storage.put(k, b"\xff\xd8\xff\xe0 jpeg-ish").await.unwrap();
+            assert!(storage.exists(k).await.unwrap());
+        }
+        cleanup_orphans(&storage, &keys).await;
+        for k in &keys {
+            assert!(!storage.exists(k).await.unwrap(), "{k} should be deleted");
+        }
+    }
+
+    #[tokio::test]
+    async fn cleanup_orphans_tolerates_missing_keys() {
+        // A key that was never written (e.g. the put itself failed) must
+        // not make cleanup error — it's best-effort.
+        let dir = tempfile::tempdir().unwrap();
+        let storage = LocalStorage::new(dir.path());
+        cleanup_orphans(&storage, &["never/written.jpg".to_string()]).await;
+    }
+
+    #[sqlx::test(migrations = "./migrations")]
+    async fn persist_pages_inserts_rows_and_sets_page_count(pool: PgPool) {
+        let manga_id = Uuid::new_v4();
+        let chapter_id = Uuid::new_v4();
+        sqlx::query("INSERT INTO mangas (id, title) VALUES ($1, 'T')")
+            .bind(manga_id)
+            .execute(&pool)
+            .await
+            .unwrap();
+        sqlx::query("INSERT INTO chapters (id, manga_id, number) VALUES ($1, $2, 1)")
+            .bind(chapter_id)
+            .bind(manga_id)
+            .execute(&pool)
+            .await
+            .unwrap();
+
+        let stored = vec![
+            StoredPage {
+                page_number: 1,
+                storage_key: "k/0001.jpg".into(),
+                content_type: "image/jpeg".into(),
+            },
+            StoredPage {
+                page_number: 2,
+                storage_key: "k/0002.jpg".into(),
+                content_type: "image/jpeg".into(),
+            },
+        ];
+        persist_pages(&pool, chapter_id, &stored).await.unwrap();
+
+        let page_count: i32 =
+            sqlx::query_scalar("SELECT page_count FROM chapters WHERE id = $1")
+                .bind(chapter_id)
+                .fetch_one(&pool)
+                .await
+                .unwrap();
+        assert_eq!(page_count, 2);
+        let rows: i64 =
+            sqlx::query_scalar("SELECT COUNT(*) FROM pages WHERE chapter_id = $1")
+                .bind(chapter_id)
+                .fetch_one(&pool)
+                .await
+                .unwrap();
+        assert_eq!(rows, 2);
+
+        // Idempotent re-run (force refetch path): same rows, page_count stable.
+        persist_pages(&pool, chapter_id, &stored).await.unwrap();
+        let rows2: i64 =
+            sqlx::query_scalar("SELECT COUNT(*) FROM pages WHERE chapter_id = $1")
+                .bind(chapter_id)
+                .fetch_one(&pool)
+                .await
+                .unwrap();
+        assert_eq!(rows2, 2, "re-run is idempotent via ON CONFLICT");
+    }
+
     #[test]
     fn parse_chapter_pages_skips_loader_and_sorts_by_id() {
@@ -304,4 +582,214 @@ mod tests {
         let err = parse_chapter_pages(html).expect_err("expected Transient");
         assert!(err.is_transient(), "got non-transient: {err}");
     }
+
+    // --- fetch_chapter_html_with_recircuit -------------------------------
+
+    const OK_HTML: &str = r#"<html><body><a id="pic_container"><img id="page1" src="x"/></a></body></html>"#;
+    const UNAUTH_HTML: &str = r#"<html><body><header><div id="logo">x</div></header><main>please log in</main></body></html>"#;
+    const TRANSIENT_HTML: &str = "<html><body><p>we're sorry, the request file are not found.</p></body></html>";
+
+    #[tokio::test]
+    async fn recircuit_loop_ok_first_attempt() {
+        let mut recircuits = 0u32;
+        let mut fetches = 0u32;
+        let outcome = fetch_chapter_html_with_recircuit(
+            || {
+                fetches += 1;
+                async { Ok(OK_HTML.to_string()) }
+            },
+            || {
+                recircuits += 1;
+                async {}
+            },
+            3,
+            "https://example/c",
+        )
+        .await
+        .expect("ok");
+        assert!(matches!(outcome, ChapterFetchOutcome::Ok(_)));
+        assert_eq!(fetches, 1);
+        assert_eq!(recircuits, 0);
+    }
+
+    #[tokio::test]
+    async fn recircuit_loop_unauth_with_single_attempt_returns_session_expired() {
+        // max_attempts=1 = TOR disabled, fail-fast on first Unauthenticated.
+        let mut recircuits = 0u32;
+        let mut fetches = 0u32;
+        let outcome = fetch_chapter_html_with_recircuit(
+            || {
+                fetches += 1;
+                async { Ok(UNAUTH_HTML.to_string()) }
+            },
+            || {
+                recircuits += 1;
+                async {}
+            },
+            1,
+            "https://example/c",
+        )
+        .await
+        .expect("ok-result");
+        assert!(matches!(outcome, ChapterFetchOutcome::SessionExpired));
+        assert_eq!(fetches, 1);
+        assert_eq!(recircuits, 0, "no recircuit when budget is 1 (TOR disabled)");
+    }
+
+    #[tokio::test]
+    async fn recircuit_loop_unauth_then_ok_within_budget() {
+        // max_attempts=3 = up to 3 fetches with 2 recircuits between.
+        let mut recircuits = 0u32;
+        let mut fetch_n = 0u32;
+        let outcome = fetch_chapter_html_with_recircuit(
+            || {
+                fetch_n += 1;
+                let n = fetch_n;
+                async move {
+                    if n == 1 {
+                        Ok(UNAUTH_HTML.to_string())
+                    } else {
+                        Ok(OK_HTML.to_string())
+                    }
+                }
+            },
+            || {
+                recircuits += 1;
+                async {}
+            },
+            3,
+            "https://example/c",
+        )
+        .await
+        .expect("ok");
+        assert!(matches!(outcome, ChapterFetchOutcome::Ok(_)));
+        assert_eq!(fetch_n, 2);
+        assert_eq!(recircuits, 1);
+    }
+
+    #[tokio::test]
+    async fn recircuit_loop_unauth_exhausts_budget_returns_session_expired() {
+        let mut recircuits = 0u32;
+        let mut fetch_n = 0u32;
+        let outcome = fetch_chapter_html_with_recircuit(
+            || {
+                fetch_n += 1;
+                async { Ok(UNAUTH_HTML.to_string()) }
+            },
+            || {
+                recircuits += 1;
+                async {}
+            },
+            3,
+            "https://example/c",
+        )
+        .await
+        .expect("ok-result");
+        assert!(matches!(outcome, ChapterFetchOutcome::SessionExpired));
+        assert_eq!(fetch_n, 3, "max_attempts=3 → 3 fetches total");
+        assert_eq!(recircuits, 2, "2 recircuits between 3 fetches");
+    }
+
+    #[tokio::test]
+    async fn recircuit_loop_transient_then_ok_within_budget() {
+        let mut recircuits = 0u32;
+        let mut fetch_n = 0u32;
+        let outcome = fetch_chapter_html_with_recircuit(
+            || {
+                fetch_n += 1;
+                let n = fetch_n;
+                async move {
+                    if n < 3 {
+                        Ok(TRANSIENT_HTML.to_string())
+                    } else {
+                        Ok(OK_HTML.to_string())
+                    }
+                }
+            },
+            || {
+                recircuits += 1;
+                async {}
+            },
+            3,
+            "https://example/c",
+        )
+        .await
+        .expect("ok");
+        assert!(matches!(outcome, ChapterFetchOutcome::Ok(_)));
+        assert_eq!(fetch_n, 3);
+        assert_eq!(recircuits, 2);
+    }
+
+    #[tokio::test]
+    async fn recircuit_loop_transient_exhausts_budget_returns_persistent() {
+        let mut recircuits = 0u32;
+        let mut fetch_n = 0u32;
+        let outcome = fetch_chapter_html_with_recircuit(
+            || {
+                fetch_n += 1;
+                async { Ok(TRANSIENT_HTML.to_string()) }
+            },
+            || {
+                recircuits += 1;
+                async {}
+            },
+            3,
+            "https://example/c",
+        )
+        .await
+        .expect("ok-result");
+        assert!(matches!(outcome, ChapterFetchOutcome::PersistentTransient));
+        assert_eq!(fetch_n, 3, "max_attempts=3 → 3 fetches total");
+        assert_eq!(recircuits, 2, "2 recircuits between 3 fetches");
+    }
+
+    #[tokio::test]
+    async fn recircuit_loop_mixed_transient_then_unauth_then_ok_shares_budget() {
+        // Audit-prompted regression: outcomes share the attempt counter.
+        // Sequence: Transient (attempt 1) → Unauth (attempt 2) → Ok (3).
+        let mut recircuits = 0u32;
+        let mut fetch_n = 0u32;
+        let outcome = fetch_chapter_html_with_recircuit(
+            || {
+                fetch_n += 1;
+                let n = fetch_n;
+                async move {
+                    match n {
+                        1 => Ok(TRANSIENT_HTML.to_string()),
+                        2 => Ok(UNAUTH_HTML.to_string()),
+                        _ => Ok(OK_HTML.to_string()),
+                    }
+                }
+            },
+            || {
+                recircuits += 1;
+                async {}
+            },
+            3,
+            "https://example/c",
+        )
+        .await
+        .expect("ok");
+        assert!(matches!(outcome, ChapterFetchOutcome::Ok(_)));
+        assert_eq!(fetch_n, 3);
+        assert_eq!(recircuits, 2);
+    }
+
+    #[tokio::test]
+    async fn recircuit_loop_propagates_fetch_errors() {
+        let mut fetch_n = 0u32;
+        let err = fetch_chapter_html_with_recircuit(
+            || {
+                fetch_n += 1;
+                async { Err(anyhow::anyhow!("nav timeout")) }
+            },
+            || async {},
+            3,
+            "https://example/c",
+        )
+        .await
+        .expect_err("fetch error bubbles");
+        assert_eq!(fetch_n, 1);
+        assert!(format!("{err:#}").contains("nav timeout"));
+    }
 }

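The tests in this hunk define the contract of `fetch_chapter_html_with_recircuit`:

* Ok, Unauthenticated and Transient pages share a single attempt budget.
* A circuit rotation happens only between attempts.
* Fetch errors propagate immediately.

Below is a minimal synchronous sketch of a loop that satisfies those tests. The names `PageKind`, `classify` and `fetch_with_recircuit` are hypothetical, and a substring classifier stands in for the real DOM checks. The real function is async and also takes the chapter URL. The tests do not show what a budget that ends on mixed failures returns, so letting the last attempt decide is an assumption.

```rust
/// How a fetched chapter page classifies (hypothetical; the real code
/// inspects the DOM for the reader container and the logo sentinel).
#[derive(Debug, PartialEq)]
enum PageKind {
    Ok,
    Unauthenticated,
    Transient,
}

/// Mirrors the three outcomes the tests assert on.
#[derive(Debug, PartialEq)]
enum ChapterFetchOutcome {
    Ok(String),
    SessionExpired,
    PersistentTransient,
}

/// Hypothetical classifier keyed on markers from the test fixtures.
fn classify(html: &str) -> PageKind {
    if html.contains("pic_container") {
        PageKind::Ok
    } else if html.contains("request file are not found") {
        PageKind::Transient
    } else {
        PageKind::Unauthenticated
    }
}

/// Synchronous sketch: at most `max_attempts` fetches share one budget,
/// `recircuit` runs only between attempts, and a fetch error returns
/// immediately without consuming the rest of the budget.
fn fetch_with_recircuit<F, R>(
    mut fetch: F,
    mut recircuit: R,
    max_attempts: u32,
) -> Result<ChapterFetchOutcome, String>
where
    F: FnMut() -> Result<String, String>,
    R: FnMut(),
{
    let mut last = PageKind::Transient;
    for attempt in 1..=max_attempts {
        let html = fetch()?; // transport errors bubble up unchanged
        last = classify(&html);
        if last == PageKind::Ok {
            return Ok(ChapterFetchOutcome::Ok(html));
        }
        if attempt < max_attempts {
            recircuit(); // fresh circuit before the next attempt
        }
    }
    // Budget exhausted: the final attempt's classification decides.
    Ok(match last {
        PageKind::Unauthenticated => ChapterFetchOutcome::SessionExpired,
        _ => ChapterFetchOutcome::PersistentTransient,
    })
}
```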
@@ -48,6 +48,7 @@ use tokio_util::sync::CancellationToken;
 use crate::crawler::content::SyncOutcome;
 use crate::crawler::jobs::{self, JobPayload, Lease, KIND_SYNC_CHAPTER_CONTENT};
 use crate::crawler::pipeline;
+use crate::crawler::status::{Phase, StatusHandle};
 
 /// Fixed `pg_try_advisory_lock` key. ASCII "MANGALRD" interpreted as a
 /// big-endian i64. Hardcoded so every replica agrees on the lock identity
@@ -56,6 +57,15 @@ pub const CRON_LOCK_KEY: i64 = 0x4D414E47414C5244;
 
 const STATE_KEY_LAST_TICK: &str = "last_metadata_tick_at";
 
+/// Lease window handed to `jobs::lease`. Kept short, but continuously
+/// extended by the per-job heartbeat (see [`WorkerContext::process_lease`])
+/// so a long-but-healthy job never lapses and gets stolen.
+const LEASE_DURATION: Duration = Duration::from_secs(60);
+
+/// How often the heartbeat renews the lease while a job runs. A third of
+/// the lease window leaves two missed-beat's slack before expiry.
+const LEASE_HEARTBEAT: Duration = Duration::from_secs(20);
+
 #[async_trait]
 pub trait MetadataPass: Send + Sync {
     async fn run(&self) -> anyhow::Result<pipeline::MetadataStats>;
@@ -77,6 +87,13 @@ pub struct DaemonConfig {
     pub tz: Tz,
     pub retention_days: u32,
     pub session_expired: Arc<AtomicBool>,
+    /// Live status surface updated by the cron + workers.
+    pub status: StatusHandle,
+    /// Hard upper bound on a single job's dispatch. A job that exceeds it
+    /// is acked failed (exponential backoff) rather than wedging a worker
+    /// forever. Must exceed [`LEASE_HEARTBEAT`] and the realistic
+    /// single-job runtime.
+    pub job_timeout: Duration,
     /// Tasks that should run alongside the cron + workers and be cancelled
     /// on shutdown. Used to hand the daemon ownership of the browser
     /// manager's idle reaper.
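The `job_timeout` doc comment states an ordering between durations. Below is a minimal sketch of a start-up check that enforces the part of it that can be checked mechanically. `check_lease_timing` is a hypothetical helper, not part of this diff. The check also derives the "two missed beats" of slack mentioned in the `LEASE_HEARTBEAT` comment. The "realistic single-job runtime" bound is an operational judgment and is left out.

```rust
use std::time::Duration;

// Constants as introduced earlier in this file's diff.
const LEASE_DURATION: Duration = Duration::from_secs(60);
const LEASE_HEARTBEAT: Duration = Duration::from_secs(20);

/// Hypothetical start-up check: the heartbeat must fire inside the lease
/// window, and `job_timeout` must exceed the heartbeat interval. On success
/// returns how many consecutive beats can be missed before the lease lapses.
fn check_lease_timing(job_timeout: Duration) -> Result<u64, String> {
    if LEASE_HEARTBEAT >= LEASE_DURATION {
        return Err("LEASE_HEARTBEAT must be shorter than LEASE_DURATION".to_string());
    }
    if job_timeout <= LEASE_HEARTBEAT {
        return Err(format!(
            "job_timeout {:?} must exceed LEASE_HEARTBEAT {:?}",
            job_timeout, LEASE_HEARTBEAT
        ));
    }
    // 60 s / 20 s = 3 beats per window; the 3rd lands at expiry, so 2 may be missed.
    Ok(LEASE_DURATION.as_secs() / LEASE_HEARTBEAT.as_secs() - 1)
}
```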
@@ -123,6 +140,8 @@ pub fn spawn(pool: PgPool, cancel: CancellationToken, cfg: DaemonConfig) -> Daem
         tz,
         retention_days,
         session_expired,
+        status,
+        job_timeout,
         extra_tasks,
     } = cfg;
 
@@ -134,6 +153,7 @@ pub fn spawn(pool: PgPool, cancel: CancellationToken, cfg: DaemonConfig) -> Daem
             tz,
             retention_days,
             metadata,
+            status: status.clone(),
         };
         join.spawn(async move { ctx.run().await });
     } else {
@@ -146,6 +166,8 @@ pub fn spawn(pool: PgPool, cancel: CancellationToken, cfg: DaemonConfig) -> Daem
             cancel: cancel.clone(),
             dispatcher: Arc::clone(&dispatcher),
             session_expired: Arc::clone(&session_expired),
+            status: status.clone(),
+            job_timeout,
             id: worker_id,
         };
         join.spawn(async move { ctx.run().await });
@@ -169,6 +191,7 @@ struct CronContext {
     tz: Tz,
     retention_days: u32,
     metadata: Arc<dyn MetadataPass>,
+    status: StatusHandle,
 }
 
 impl CronContext {
@@ -196,6 +219,11 @@ impl CronContext {
             // (NTP step, suspend/resume) don't strand us on a stale instant.
             let next = next_fire(Utc::now(), self.daily_at, self.tz);
             let wait = (next - Utc::now()).to_std().unwrap_or(Duration::ZERO);
+            self.status
+                .set_phase(Phase::Idle {
+                    next_fire: Some(next),
+                })
+                .await;
             tracing::info!(
                 next_fire_utc = %next.to_rfc3339(),
                 wait_seconds = wait.as_secs(),
@@ -243,9 +271,13 @@ impl CronContext {
         let metadata = &self.metadata;
         let pool = &self.pool;
         let retention_days = self.retention_days;
+        let status = &self.status;
         let body = async move {
             match metadata.run().await {
-                Ok(stats) => tracing::info!(?stats, "cron: metadata pass done"),
+                Ok(stats) => {
+                    status.record_pass(&stats, Utc::now()).await;
+                    tracing::info!(?stats, "cron: metadata pass done");
+                }
                 Err(e) => tracing::error!(?e, "cron: metadata pass failed"),
             }
             match pipeline::enqueue_bookmarked_pending(pool).await {
@@ -283,6 +315,8 @@ struct WorkerContext {
     cancel: CancellationToken,
     dispatcher: Arc<dyn ChapterDispatcher>,
     session_expired: Arc<AtomicBool>,
+    status: StatusHandle,
+    job_timeout: Duration,
     id: usize,
 }
 
@@ -303,7 +337,7 @@ impl WorkerContext {
                 &self.pool,
                 Some(KIND_SYNC_CHAPTER_CONTENT),
                 1,
-                Duration::from_secs(60),
+                LEASE_DURATION,
             )
             .await
             {
@@ -341,9 +375,59 @@ impl WorkerContext {
             }
         }
 
-        let outcome = AssertUnwindSafe(self.dispatcher.dispatch(lease.payload.clone()))
-            .catch_unwind()
+        // Heartbeat: keep the lease fresh while the (potentially long)
+        // dispatch runs, so a slow-but-healthy job is never re-leased and
+        // never inflates `attempts` toward `max_attempts`. Stops itself
+        // once the job is no longer ours (renew returns false).
+        let heartbeat = {
+            let hb_pool = self.pool.clone();
+            let hb_id = lease.id;
+            tokio::spawn(async move {
+                loop {
+                    tokio::time::sleep(LEASE_HEARTBEAT).await;
+                    match jobs::renew(&hb_pool, hb_id, LEASE_DURATION).await {
+                        Ok(true) => {}
+                        Ok(false) => break,
+                        Err(e) => {
+                            tracing::warn!(lease_id = %hb_id, ?e, "heartbeat renew failed");
+                        }
+                    }
+                }
+            })
+        };
+
+        // The "currently crawling" chapter (with its live page count) is
+        // registered by the dispatcher itself (RealChapterDispatcher) so it
+        // carries the manga/chapter identity + page progress and is removed
+        // via an RAII guard on every exit path.
+
+        // Outer timeout: a dispatch that exceeds `job_timeout` is acked
+        // failed (exponential backoff) rather than wedging the worker.
+        let dispatch = AssertUnwindSafe(self.dispatcher.dispatch(lease.payload.clone()))
+            .catch_unwind();
+        let outcome = tokio::time::timeout(self.job_timeout, dispatch).await;
+        heartbeat.abort();
+
+        let outcome = match outcome {
+            Ok(o) => o,
+            Err(_elapsed) => {
+                tracing::warn!(
+                    worker = self.id,
+                    lease_id = %lease.id,
+                    timeout_secs = self.job_timeout.as_secs(),
+                    "worker: dispatch timed out — ack failed"
+                );
+                let _ = jobs::ack_failed(
+                    &self.pool,
+                    lease.id,
+                    "dispatch timed out",
+                    lease.attempts,
+                    lease.max_attempts,
+                )
                 .await;
+                return;
+            }
+        };
         match outcome {
             Ok(Ok(SyncOutcome::Fetched { .. } | SyncOutcome::Skipped)) => {
                 let _ = jobs::ack_done(&self.pool, lease.id).await;
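The heartbeat in this hunk follows a common pattern: a side task renews the lease on a fixed cadence while the main task runs under an outer timeout, and the side task is stopped afterwards. Below is a minimal std-only sketch of the same shape. It uses threads and `mpsc::Receiver::recv_timeout` in place of `tokio::spawn` and `tokio::time::timeout`. `run_with_heartbeat` and its `renew` closure are hypothetical stand-ins for `process_lease` and `jobs::renew`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Std-thread analogue of the heartbeat + outer-timeout pattern above.
/// `renew` stands in for `jobs::renew` and returns `false` once the lease
/// is no longer ours. Returns `Some(result)` if `job` finished within
/// `job_timeout`, `None` otherwise.
fn run_with_heartbeat<T, J, R>(
    job: J,
    mut renew: R,
    heartbeat: Duration,
    job_timeout: Duration,
) -> Option<T>
where
    T: Send + 'static,
    J: FnOnce() -> T + Send + 'static,
    R: FnMut() -> bool + Send + 'static,
{
    let stop = Arc::new(AtomicBool::new(false));
    let hb_stop = Arc::clone(&stop);
    let hb = thread::spawn(move || {
        // Renew on a fixed cadence until stopped or the lease is lost.
        loop {
            thread::sleep(heartbeat);
            if hb_stop.load(Ordering::Acquire) || !renew() {
                break;
            }
        }
    });

    // Run the job on its own thread and wait at most `job_timeout`.
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(job()); // the receiver may be gone after a timeout
    });
    let result = rx.recv_timeout(job_timeout).ok();

    // Counterpart of `heartbeat.abort()`: stop renewing either way.
    stop.store(true, Ordering::Release);
    let _ = hb.join();
    result
}
```

The sketch differs from the tokio version in two ways.

* On expiry, `tokio::time::timeout` drops the dispatch future, so it is never polled again. A std thread cannot be cancelled, so a timed-out job in this sketch keeps running in the background.
* `JoinHandle::abort` stops the heartbeat task promptly. This sketch waits up to one heartbeat interval for the thread to notice the stop flag.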
@@ -355,6 +439,8 @@ impl WorkerContext {
                     "session expired — workers will idle until restart"
                 );
                 self.session_expired.store(true, Ordering::Release);
+                // Push the session-expired flip to live status subscribers.
+                self.status.poke();
                 let _ = jobs::release(&self.pool, lease.id).await;
             }
             Ok(Err(e)) => {

@@ -80,13 +80,36 @@ pub fn has_logo_sentinel(doc: &scraper::Html) -> bool {
 /// caller can fall back on the job system's retry/backoff once the
 /// inline budget is exhausted.
 pub async fn retry_on_transient<F, Fut, T>(
-    mut op: F,
+    op: F,
     max_attempts: u32,
     delay: Duration,
 ) -> Result<T, PageError>
 where
     F: FnMut() -> Fut,
     Fut: Future<Output = Result<T, PageError>>,
+{
+    retry_on_transient_with_hook(op, max_attempts, delay, || async {}).await
+}
+
+/// Like [`retry_on_transient`] but invokes `on_retry` between a
+/// transient failure and the subsequent sleep+retry. The hook does
+/// **not** fire on the first attempt, after a non-transient error, or
+/// after the final attempt (no retry follows). Hook failures are not
+/// propagated — return `()` from the future and log inside if needed.
+///
+/// Wire the TOR controller's `new_identity` here to rotate circuits
+/// between page-fetch retries; see [`crate::crawler::tor`].
+pub async fn retry_on_transient_with_hook<F, Fut, T, H, HFut>(
+    mut op: F,
+    max_attempts: u32,
+    delay: Duration,
+    mut on_retry: H,
+) -> Result<T, PageError>
+where
+    F: FnMut() -> Fut,
+    Fut: Future<Output = Result<T, PageError>>,
+    H: FnMut() -> HFut,
+    HFut: Future<Output = ()>,
 {
     debug_assert!(max_attempts >= 1, "max_attempts must be at least 1");
     let mut attempt = 0u32;
@@ -101,8 +124,9 @@ where
             attempt,
             max_attempts,
             error = %e,
-            "transient error; sleeping before retry"
+            "transient error; running on-retry hook and sleeping before retry"
         );
+        on_retry().await;
         tokio::time::sleep(delay).await;
     }
 }
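The hook contract documented above says the hook fires between a transient failure and the next attempt. It never fires on the first attempt, after a non-transient error, or after the last attempt. Below is a synchronous sketch of that contract. It makes two simplifying assumptions: a hypothetical two-variant `PageError` stands in for the crate's type, and `std::thread::sleep` replaces `tokio::time::sleep`.

```rust
use std::time::Duration;

/// Hypothetical two-variant stand-in for the crate's `PageError`.
#[derive(Debug, PartialEq)]
enum PageError {
    Transient(&'static str),
    Other(&'static str),
}

/// Synchronous sketch of the documented hook contract: `on_retry` runs
/// only between a transient failure and the next attempt, so it fires at
/// most `max_attempts - 1` times and never after a non-transient error.
fn retry_with_hook<T>(
    mut op: impl FnMut() -> Result<T, PageError>,
    max_attempts: u32,
    delay: Duration,
    mut on_retry: impl FnMut(),
) -> Result<T, PageError> {
    let mut attempt = 0u32;
    loop {
        attempt += 1;
        match op() {
            Ok(v) => return Ok(v),
            // Transient and budget remains: hook first, then back off.
            Err(PageError::Transient(_)) if attempt < max_attempts => {
                on_retry(); // e.g. request a new TOR circuit
                std::thread::sleep(delay);
            }
            // Non-transient, or the final attempt: no hook, no retry.
            Err(e) => return Err(e),
        }
    }
}
```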
@@ -247,4 +271,92 @@ mod tests {
         assert_eq!(result.unwrap(), 7);
         assert_eq!(attempt, 1);
     }
+
+    #[tokio::test]
+    async fn hook_fires_once_between_transient_and_success() {
+        let mut attempt = 0u32;
+        let mut hook_calls = 0u32;
+        let result: Result<i32, PageError> = retry_on_transient_with_hook(
+            || {
+                attempt += 1;
+                let n = attempt;
+                async move {
+                    if n < 2 {
+                        Err(PageError::transient("once"))
+                    } else {
+                        Ok(99)
+                    }
+                }
+            },
+            5,
+            Duration::from_millis(0),
+            || {
+                hook_calls += 1;
+                async {}
+            },
+        )
+        .await;
+        assert_eq!(result.unwrap(), 99);
+        assert_eq!(attempt, 2);
+        assert_eq!(hook_calls, 1, "hook fires exactly once between attempts");
+    }
+
+    #[tokio::test]
+    async fn hook_does_not_fire_when_first_attempt_succeeds() {
+        let mut hook_calls = 0u32;
+        let result: Result<i32, PageError> = retry_on_transient_with_hook(
+            || async { Ok(1) },
+            5,
+            Duration::from_millis(0),
+            || {
+                hook_calls += 1;
+                async {}
+            },
+        )
+        .await;
+        assert!(result.is_ok());
+        assert_eq!(hook_calls, 0);
+    }
+
+    #[tokio::test]
+    async fn hook_does_not_fire_after_non_transient_error() {
+        let mut hook_calls = 0u32;
+        let result: Result<i32, PageError> = retry_on_transient_with_hook(
+            || async { Err(PageError::Other(anyhow::anyhow!("permanent"))) },
+            5,
+            Duration::from_millis(0),
+            || {
+                hook_calls += 1;
+                async {}
+            },
+        )
+        .await;
+        assert!(result.is_err());
+        assert_eq!(hook_calls, 0, "non-transient must short-circuit before hook");
+    }
+
+    #[tokio::test]
+    async fn hook_does_not_fire_after_final_failed_attempt() {
+        // With max_attempts=3 and three persistent transients, the hook
+        // should run twice (between 1→2 and 2→3) — never a third time,
+        // because no retry follows attempt 3.
+        let mut attempt = 0u32;
+        let mut hook_calls = 0u32;
+        let result: Result<i32, PageError> = retry_on_transient_with_hook(
+            || {
+                attempt += 1;
+                async { Err(PageError::transient("always")) }
+            },
+            3,
+            Duration::from_millis(0),
+            || {
+                hook_calls += 1;
+                async {}
+            },
+        )
+        .await;
+        assert!(result.is_err());
+        assert_eq!(attempt, 3);
+        assert_eq!(hook_calls, 2, "hook fires N-1 times for N attempts that all fail transient");
+    }
 }

@@ -66,16 +66,33 @@ pub struct Lease {
     pub max_attempts: i32,
 }
 
-/// Exponential backoff for `ack_failed` retries. `attempts` is the
-/// post-increment value reported by `lease()` (so the first failure has
-/// `attempts == 1` and waits 60s, the second 120s, etc.). Capped at 1h to
-/// avoid runaway long sleeps that would outlive the daemon process.
-fn backoff_for(attempts: i32) -> Duration {
+/// Deterministic exponential backoff base for `ack_failed` retries.
+/// `attempts` is the post-increment value reported by `lease()` (so the
+/// first failure has `attempts == 1` and waits 60s, the second 120s,
+/// etc.). Capped at 1h to avoid runaway long sleeps that would outlive
+/// the daemon process. Jitter is applied separately by [`apply_jitter`].
+fn backoff_base(attempts: i32) -> Duration {
     let shift = attempts.saturating_sub(1).clamp(0, 20) as u32;
     let secs = 60u64.saturating_mul(1u64 << shift);
     Duration::from_secs(secs.min(3600))
 }
 
+/// Apply ±20% jitter to a backoff duration. `jitter` is a fraction in
+/// `[0.0, 1.0)` (e.g. `rand::random::<f64>()`), mapped to a multiplier in
+/// `[0.8, 1.2)`. Pure so the bounds stay unit-testable. Spreading retries
+/// avoids a thundering herd when a source outage fails many jobs at once.
+fn apply_jitter(base: Duration, jitter: f64) -> Duration {
+    let frac = jitter.clamp(0.0, 1.0);
+    let mult = 0.8 + 0.4 * frac; // [0.8, 1.2)
+    Duration::from_secs((base.as_secs_f64() * mult).round() as u64)
+}
+
+/// Jittered exponential backoff for `ack_failed`. Wraps [`backoff_base`]
+/// with a random ±20% spread.
+fn backoff_for(attempts: i32) -> Duration {
+    apply_jitter(backoff_base(attempts), rand::random::<f64>())
+}
+
 /// Insert a new pending job. For `SyncChapterContent` payloads the
 /// partial unique index `crawler_jobs_chapter_content_dedup_idx` blocks
 /// a second `(pending|running)` insert per chapter_id, returning
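Here is the resulting retry schedule in concrete terms. The base doubles from 60 s and saturates at 3600 s from the 7th attempt on. Jitter then scales it by a factor between 0.8 and 1.2. The sketch copies the two functions from this hunk so it compiles on its own. It adds a hypothetical `retry_window` helper, not part of the diff, that returns the jitter extremes for a given attempt.

```rust
use std::time::Duration;

// Copied from the hunk above so the sketch compiles on its own.
fn backoff_base(attempts: i32) -> Duration {
    let shift = attempts.saturating_sub(1).clamp(0, 20) as u32;
    let secs = 60u64.saturating_mul(1u64 << shift);
    Duration::from_secs(secs.min(3600))
}

fn apply_jitter(base: Duration, jitter: f64) -> Duration {
    let frac = jitter.clamp(0.0, 1.0);
    let mult = 0.8 + 0.4 * frac;
    Duration::from_secs((base.as_secs_f64() * mult).round() as u64)
}

/// Hypothetical helper: (earliest, latest) retry delay in seconds for an
/// attempt, i.e. the jitter extremes around the deterministic base.
fn retry_window(attempts: i32) -> (u64, u64) {
    let base = backoff_base(attempts);
    (apply_jitter(base, 0.0).as_secs(), apply_jitter(base, 1.0).as_secs())
}
```

Attempt 1 retries after 48 to 72 s, attempt 3 after 192 to 288 s, and attempts 7 onward after 2880 to 4320 s. Jitter is applied after the cap, so the longest possible delay is 4320 s (72 minutes). That is slightly above the 1 h cap described in the `backoff_base` comment.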
@@ -104,6 +121,12 @@ pub async fn enqueue(pool: &PgPool, payload: &JobPayload) -> sqlx::Result<Enqueu
 ///
 /// `kind_filter` matches against `payload->>'kind'`; `None` means
 /// any kind.
+///
+/// Ties on `scheduled_at` (the common case: a cron batch enqueues
+/// everything with the same default `now()`) break by `created_at`, so
+/// jobs come off the queue in insertion order. The enqueue paths insert
+/// chapter-content jobs in ascending `chapters.number` order, so this
+/// tiebreaker is what propagates that intent through to dequeue.
 pub async fn lease(
     pool: &PgPool,
     kind_filter: Option<&str>,
@@ -118,7 +141,7 @@ pub async fn lease(
         WHERE (state = 'pending' OR (state = 'running' AND leased_until < now()))
           AND scheduled_at <= now()
           AND ($1::text IS NULL OR payload->>'kind' = $1)
-        ORDER BY scheduled_at
+        ORDER BY scheduled_at, created_at
         LIMIT $2
         FOR UPDATE SKIP LOCKED
     )
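The new `ORDER BY scheduled_at, created_at` compares rows lexicographically: `created_at` only matters when two jobs share a `scheduled_at`. Below is a minimal in-memory sketch of that ordering. `JobRow` and `dequeue_order` are hypothetical names, and timestamps are plain integers for illustration.

```rust
/// Hypothetical in-memory row with just the columns the `ORDER BY` uses.
/// Timestamps are plain integers (e.g. microseconds) for the sketch.
struct JobRow {
    label: &'static str,
    scheduled_at: u64,
    created_at: u64,
}

/// Mirrors `ORDER BY scheduled_at, created_at`: tuples compare
/// lexicographically, so `created_at` only matters on a `scheduled_at` tie.
fn dequeue_order(mut jobs: Vec<JobRow>) -> Vec<&'static str> {
    jobs.sort_by_key(|j| (j.scheduled_at, j.created_at));
    jobs.into_iter().map(|j| j.label).collect()
}
```

Without the second key, SQL leaves the relative order of rows with equal `scheduled_at` unspecified, which is the gap the tiebreaker closes. One caveat applies if `created_at` defaults to `now()`, which the diff does not show. In PostgreSQL, `now()` returns the start time of the current transaction, so jobs inserted in a single transaction would share a `created_at` and still tie. `clock_timestamp()` or an identity column avoids that.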
@@ -153,6 +176,35 @@ pub async fn lease(
     Ok(leases)
 }
 
+/// Extend the lease on a still-owned `running` job. Returns `true` if the
+/// row was updated (we still hold the lease), `false` if the job is no
+/// longer `running` (re-leased after a missed heartbeat, or already
+/// acked) — the caller's heartbeat loop should stop. The `state =
+/// 'running'` guard mirrors [`ack_done`]'s rationale.
+///
+/// This is the heartbeat primitive: a worker renews periodically while a
+/// long-but-healthy job runs so `leased_until` never lapses, which would
+/// otherwise let another worker steal the in-flight job and spuriously
+/// inflate `attempts` toward `max_attempts`.
+pub async fn renew(
+    pool: &PgPool,
+    lease_id: Uuid,
+    lease_duration: Duration,
+) -> sqlx::Result<bool> {
+    let lease_ms: i64 = lease_duration.as_millis().min(i64::MAX as u128) as i64;
+    let res = sqlx::query(
+        "UPDATE crawler_jobs \
+         SET leased_until = now() + ($2::bigint || ' milliseconds')::interval, \
+             updated_at = now() \
+         WHERE id = $1 AND state = 'running'",
+    )
+    .bind(lease_id)
+    .bind(lease_ms)
+    .execute(pool)
+    .await?;
+    Ok(res.rows_affected() > 0)
+}
+
 /// Mark a leased job as successfully completed. The `state = 'running'`
 /// predicate guards against a late ack from a worker whose lease expired
 /// and was already re-leased by another worker: without it, the late ack
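`renew` is a guarded `UPDATE`: it extends `leased_until` only while the row is `running`, and `rows_affected() > 0` becomes the boolean the heartbeat loop checks. Below is a minimal in-memory model of that behaviour. `Row`, `State` and this free-standing `renew` are hypothetical stand-ins for the table and the SQL.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Clone, Copy, PartialEq)]
enum State {
    Running,
    Done,
}

/// Hypothetical in-memory row: only the columns `renew` touches.
struct Row {
    state: State,
    leased_until: Instant,
}

/// Model of `renew`'s guarded UPDATE: extend `leased_until` only while the
/// row is `running`, and report whether any row matched (the
/// `rows_affected() > 0` check).
fn renew(rows: &mut HashMap<u32, Row>, id: u32, lease: Duration, now: Instant) -> bool {
    match rows.get_mut(&id) {
        Some(row) if row.state == State::Running => {
            row.leased_until = now + lease;
            true
        }
        _ => false,
    }
}
```

Note what the predicate does not check. A job whose lease lapsed and was re-leased by another worker is `running` again, so `WHERE id = $1 AND state = 'running'` would, as written, still match and extend the new owner's lease. The same applies to the late-ack case described for `ack_done`. Detecting a change of owner would need an ownership value in the predicate, for example a per-lease token written by `lease()`. The diff does not show one, so whether the real schema has it is unknown.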
@@ -272,19 +324,48 @@ mod tests {
     use super::*;
 
     #[test]
-    fn backoff_grows_exponentially_and_caps_at_one_hour() {
+    fn backoff_base_grows_exponentially_and_caps_at_one_hour() {
         // attempts == 1 → 60s, doubling each step.
-        assert_eq!(backoff_for(1), Duration::from_secs(60));
-        assert_eq!(backoff_for(2), Duration::from_secs(120));
-        assert_eq!(backoff_for(3), Duration::from_secs(240));
-        assert_eq!(backoff_for(4), Duration::from_secs(480));
-        assert_eq!(backoff_for(5), Duration::from_secs(960));
-        assert_eq!(backoff_for(6), Duration::from_secs(1920));
+        assert_eq!(backoff_base(1), Duration::from_secs(60));
+        assert_eq!(backoff_base(2), Duration::from_secs(120));
+        assert_eq!(backoff_base(3), Duration::from_secs(240));
+        assert_eq!(backoff_base(4), Duration::from_secs(480));
+        assert_eq!(backoff_base(5), Duration::from_secs(960));
+        assert_eq!(backoff_base(6), Duration::from_secs(1920));
         // 7th: 60 * 64 = 3840 → capped to 3600.
-        assert_eq!(backoff_for(7), Duration::from_secs(3600));
-        assert_eq!(backoff_for(20), Duration::from_secs(3600));
+        assert_eq!(backoff_base(7), Duration::from_secs(3600));
+        assert_eq!(backoff_base(20), Duration::from_secs(3600));
         // Garbage / zero / negatives stay sane.
-        assert_eq!(backoff_for(0), Duration::from_secs(60));
-        assert_eq!(backoff_for(-5), Duration::from_secs(60));
+        assert_eq!(backoff_base(0), Duration::from_secs(60));
+        assert_eq!(backoff_base(-5), Duration::from_secs(60));
+    }
+
+    #[test]
+    fn apply_jitter_stays_within_plus_minus_twenty_percent() {
+        let base = Duration::from_secs(100);
+        // Lower bound (jitter = 0.0) → 0.8x.
+        assert_eq!(apply_jitter(base, 0.0), Duration::from_secs(80));
+        // Midpoint (jitter = 0.5) → 1.0x.
+        assert_eq!(apply_jitter(base, 0.5), Duration::from_secs(100));
+        // Upper end (jitter → 1.0) → ~1.2x.
+        assert_eq!(apply_jitter(base, 1.0), Duration::from_secs(120));
+        // Out-of-range inputs are clamped, never panic.
+        assert_eq!(apply_jitter(base, -3.0), Duration::from_secs(80));
+        assert_eq!(apply_jitter(base, 9.0), Duration::from_secs(120));
+    }
+
+    #[test]
+    fn backoff_for_random_jitter_stays_in_band() {
+        // The production wrapper draws its own randomness; assert the
+        // result for a mid-range attempt always lands within the jitter
+        // band of the base, across many draws.
+        let base = backoff_base(3).as_secs_f64(); // 240s
+        for _ in 0..1000 {
+            let v = backoff_for(3).as_secs_f64();
+            assert!(
+                v >= base * 0.8 - 1.0 && v <= base * 1.2 + 1.0,
+                "jittered backoff {v} outside band of base {base}"
+            );
+        }
     }
 }

@@ -23,7 +23,11 @@ pub mod jobs;
 pub mod nav;
 pub mod pipeline;
 pub mod rate_limit;
+pub mod resync;
 pub mod safety;
 pub mod session;
+pub mod session_control;
 pub mod source;
+pub mod status;
+pub mod tor;
 pub mod url_utils;

@@ -13,7 +13,7 @@ use crate::crawler::jobs::{self, EnqueueResult, JobPayload};
 use crate::crawler::rate_limit::HostRateLimiters;
 use crate::crawler::safety::{fetch_bytes_capped, looks_like_image, DownloadAllowlist};
 use crate::crawler::source::target::TargetSource;
-use crate::crawler::source::{FetchContext, Source};
+use crate::crawler::source::{FetchContext, Source, SourceMangaRef};
 use crate::repo;
 use crate::repo::crawler::UpsertStatus;
 use crate::storage::Storage;
@@ -65,6 +65,17 @@ pub(crate) fn should_mark_clean_exit(
     walked_to_completion || hit_stop_condition
 }
 
+/// Circuit-breaker: abort the walk once `consecutive` `fetch_manga`
+/// failures reach `threshold`. A `threshold` of 0 disables the breaker
+/// (unbounded — the legacy behaviour). When it fires the caller must NOT
+/// mark a clean exit, so the next tick does a recovery sweep over the
+/// catalog tail the aborted pass never reached.
+///
+/// Pure so the rule is unit-testable without the walker.
+pub(crate) fn should_abort_pass(consecutive: u32, threshold: u32) -> bool {
+    threshold > 0 && consecutive >= threshold
+}
+
 /// Runs the discover → fetch → upsert → cover → chapter-list-diff pipeline
 /// for the target source. Pure metadata; chapter content is enqueued as
 /// separate `SyncChapterContent` jobs by the caller after this returns.
@@ -103,12 +114,18 @@ pub async fn run_metadata_pass(
     skip_chapters: bool,
     allowlist: &DownloadAllowlist,
     max_image_bytes: usize,
+    max_consecutive_failures: u32,
+    status: Option<&crate::crawler::status::StatusHandle>,
+    tor: Option<&crate::crawler::tor::TorController>,
 ) -> anyhow::Result<MetadataStats> {
     let lease = browser_manager
         .acquire()
         .await
         .context("acquire browser lease for metadata pass")?;
     let browser_ref: &chromiumoxide::Browser = &lease;
+    if let Some(s) = status {
+        s.set_phase(crate::crawler::status::Phase::WalkingList).await;
+    }
 
     let source = {
         let s = TargetSource::new(start_url.to_string());
@@ -121,6 +138,7 @@ pub async fn run_metadata_pass(
     let ctx = FetchContext {
         browser: browser_ref,
         rate,
+        tor,
     };
 
     let source_id = source.id();
@@ -163,6 +181,11 @@ pub async fn run_metadata_pass(
     let mut walked_to_completion = false;
     let mut hit_limit = false;
     let mut hit_stop_condition = false;
+    // Circuit-breaker state: consecutive fetch_manga failures. A sustained
+    // run abort (source outage) leaves the pass un-clean → recovery sweep
+    // next tick.
+    let mut consecutive_failures = 0u32;
+    let mut hit_failure_breaker = false;
 
     'outer: loop {
         let batch = match walker.next_batch(&ctx).await? {
@@ -173,6 +196,17 @@ pub async fn run_metadata_pass(
             }
         };
         for r in batch {
+            // Cooperative checkpoint: if a coordinated browser restart is
+            // pending, yield our (long-lived) lease so the drain can
+            // proceed instead of stalling for the rest of the walk. The
+            // pass exits un-clean, so the next tick recovery-sweeps the
+            // tail we didn't reach.
+            if browser_manager.is_restart_pending() {
+                tracing::info!(
+                    "metadata pass: browser restart pending — yielding (recovery sweep next tick)"
+                );
+                break 'outer;
+            }
             if max_refs.map(|m| stats.discovered >= m).unwrap_or(false) {
                 hit_limit = true;
                 tracing::info!(cap = ?max_refs, "max_results reached; halting walk");
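`is_restart_pending()` is only called here; the `BrowserManager` side of the handshake is outside this diff. The sketch below assumes one plausible shape: a shared atomic flag that a restart coordinator sets and that long-lived lease holders poll between items. It shows why the checkpoint sits at the top of the per-ref loop. Once a restart is requested, at most the item already in flight finishes before the walk yields and reports an un-clean exit. `RestartSignal` and `walk` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for the flag behind
/// `BrowserManager::is_restart_pending()`.
#[derive(Default)]
struct RestartSignal {
    pending: AtomicBool,
}

impl RestartSignal {
    /// Called by whoever coordinates the browser restart.
    fn request_restart(&self) {
        self.pending.store(true, Ordering::Release);
    }

    /// Polled by lease holders at safe points.
    fn is_restart_pending(&self) -> bool {
        self.pending.load(Ordering::Acquire)
    }
}

/// Walks `items`, checking the flag before each one, as the metadata
/// pass does per manga ref. Returns (items processed, walked to completion);
/// a yield reports `false`, mirroring the un-clean exit.
fn walk(items: &[&str], signal: &RestartSignal, mut on_item: impl FnMut(&str)) -> (usize, bool) {
    let mut processed = 0;
    for &item in items {
        if signal.is_restart_pending() {
            // Stop here so the lease can be released for the drain.
            return (processed, false);
        }
        on_item(item);
        processed += 1;
    }
    (processed, true)
}
```

Because the flag is only checked between items, a fetch that is already running is never cancelled mid-request. The cost is at most one item's latency before the drain can proceed.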
@@ -196,13 +230,24 @@ pub async fn run_metadata_pass(
                 continue;
             }
             stats.discovered += 1;
+            if let Some(s) = status {
+                s.set_phase(crate::crawler::status::Phase::FetchingMetadata {
+                    index: stats.discovered,
+                    total: max_refs,
+                    title: r.title.clone(),
+                })
+                .await;
+            }
             tracing::info!(
                 idx = stats.discovered,
                 key = %r.source_manga_key,
                 "fetching metadata"
             );
             let manga = match source.fetch_manga(&ctx, &r).await {
-                Ok(m) => m,
+                Ok(m) => {
+                    consecutive_failures = 0;
+                    m
+                }
                 Err(e) => {
                     tracing::warn!(
                         key = %r.source_manga_key,
@@ -211,6 +256,17 @@ pub async fn run_metadata_pass(
                         "fetch_manga failed"
                     );
                     stats.mangas_failed += 1;
+                    consecutive_failures += 1;
+                    if should_abort_pass(consecutive_failures, max_consecutive_failures) {
+                        hit_failure_breaker = true;
+                        tracing::error!(
+                            consecutive_failures,
+                            threshold = max_consecutive_failures,
+                            "metadata pass: too many consecutive fetch_manga failures; \
+                             aborting (recovery sweep on next tick)"
+                        );
+                        break 'outer;
+                    }
                     continue;
                 }
             };
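The breaker's behaviour in the walk loop comes from two pieces of this diff. The `Ok(m)` arm resets the counter on any successful `fetch_manga`. The `Err(e)` arm increments it and consults `should_abort_pass`, where a threshold of 0 disables the breaker.

Below is a self-contained sketch of how the two interact over a scripted sequence of outcomes. `should_abort_pass` is copied from the diff; `walk_outcomes` is a hypothetical driver standing in for the async walker.

```rust
/// Copied from the diff: abort once `consecutive` failures reach
/// `threshold`; a threshold of 0 disables the breaker.
fn should_abort_pass(consecutive: u32, threshold: u32) -> bool {
    threshold > 0 && consecutive >= threshold
}

/// Hypothetical driver: `outcomes[i]` is `true` when the i-th fetch
/// succeeded. Returns (fetches attempted, failures counted, breaker hit).
fn walk_outcomes(outcomes: &[bool], threshold: u32) -> (usize, usize, bool) {
    let mut consecutive = 0u32;
    let (mut attempted, mut failed) = (0usize, 0usize);
    for &ok in outcomes {
        attempted += 1;
        if ok {
            // Any success resets the run, as in the `Ok(m)` arm.
            consecutive = 0;
            continue;
        }
        failed += 1;
        consecutive += 1;
        if should_abort_pass(consecutive, threshold) {
            // Mirrors `break 'outer`: the pass ends un-clean.
            return (attempted, failed, true);
        }
    }
    (attempted, failed, false)
}
```

Counting consecutive rather than total failures is what separates a source outage from background noise. Scattered per-manga failures never trip the breaker, however many there are, while a sustained run does.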
@@ -293,7 +349,14 @@ pub async fn run_metadata_pass(
                 || matches!(upsert.status, repo::crawler::UpsertStatus::Updated);
             if needs_cover {
                 if let Some(cover_url) = manga.cover_url.as_deref() {
-                    match download_and_store_cover(
+                    if let Some(s) = status {
+                        s.set_current_cover(Some(crate::crawler::status::CoverTarget {
+                            manga_id: upsert.manga_id,
+                            manga_title: manga.title.clone(),
+                        }))
+                        .await;
+                    }
+                    let cover_result = download_and_store_cover(
                         db,
                         storage,
                         http,
@@ -304,8 +367,11 @@ pub async fn run_metadata_pass(
                         allowlist,
                         max_image_bytes,
                     )
-                    .await
-                    {
+                    .await;
+                    if let Some(s) = status {
+                        s.set_current_cover(None).await;
+                    }
+                    match cover_result {
                         Ok(()) => stats.covers_fetched += 1,
                         Err(e) => tracing::warn!(
                             manga_id = %upsert.manga_id,
@@ -388,6 +454,7 @@ pub async fn run_metadata_pass(
         walked_to_completion,
         hit_limit,
         hit_stop_condition,
+        hit_failure_breaker,
         exited_cleanly,
         "metadata pass complete"
     );
@@ -427,8 +494,8 @@ pub async fn enqueue_bookmarked_pending(pool: &PgPool) -> anyhow::Result<Enqueue
               AND cj.state = 'dead'
               AND cj.updated_at > now() - ($1::bigint || ' days')::interval
           )
-        GROUP BY cs.source_id, c.id, cs.source_chapter_key, c.manga_id, c.created_at
-        ORDER BY c.manga_id, c.created_at ASC
+        GROUP BY cs.source_id, c.id, cs.source_chapter_key, c.manga_id, c.number, c.created_at
+        ORDER BY c.manga_id, c.number ASC, c.created_at ASC
         "#,
     )
     .bind(CHAPTER_DEAD_QUARANTINE_DAYS)
@@ -469,7 +536,7 @@ pub async fn enqueue_pending_for_manga(
 ) -> anyhow::Result<EnqueueSummary> {
     let rows: Vec<(String, Uuid, String)> = sqlx::query_as(
         r#"
-        SELECT DISTINCT cs.source_id, c.id AS chapter_id, cs.source_chapter_key
+        SELECT cs.source_id, c.id AS chapter_id, cs.source_chapter_key
         FROM chapters c
         JOIN chapter_sources cs ON cs.chapter_id = c.id
         WHERE c.manga_id = $1
@@ -482,7 +549,8 @@ pub async fn enqueue_pending_for_manga(
               AND cj.state = 'dead'
               AND cj.updated_at > now() - ($2::bigint || ' days')::interval
           )
-        ORDER BY cs.source_id, c.id
+        GROUP BY cs.source_id, c.id, cs.source_chapter_key, c.number, c.created_at
+        ORDER BY c.number ASC, c.created_at ASC, cs.source_id
         "#,
     )
     .bind(manga_id)
@@ -521,12 +589,149 @@ pub struct EnqueueSummary {
     pub failed: usize,
 }
 
+#[derive(Debug, Default, Clone, Copy)]
+pub struct CoverBackfillStats {
+    pub considered: usize,
+    pub fetched: usize,
+    pub failed: usize,
+}
+
+/// Default per-tick cap for [`backfill_missing_covers`]. The metadata pass
+/// already retries covers when its walk reaches the affected manga; this
+/// backfill exists to catch the residual case where the early-stop
+/// optimisation prevents the walk from reaching mangas whose cover failed
+/// on first attempt. A small cap is enough because the backlog only grows
+/// from sporadic download failures, not from systematic misses.
+pub const COVER_BACKFILL_DEFAULT_MAX: usize = 10;
+
+/// Re-attempt cover downloads for mangas where `cover_image_path IS NULL`
+/// but a live `manga_sources` row exists. Refetches the source detail
+/// page (which is where the cover URL lives) and downloads the cover.
+///
+/// Bounded by `max_mangas` per call so a steady stream of failing covers
+/// — e.g. a CDN host that's persistently 502 — can't monopolise a cron
+/// tick. Orders by `manga_sources.last_seen_at DESC` so the freshest
+/// missing-cover mangas are addressed first.
+///
+/// Failures are logged and counted, not raised: a single bad cover URL
+/// must not stall every other backfill behind it.
+#[allow(clippy::too_many_arguments)]
+pub async fn backfill_missing_covers(
+    browser_manager: &BrowserManager,
+    db: &PgPool,
+    storage: &dyn Storage,
+    http: &reqwest::Client,
+    rate: &HostRateLimiters,
+    max_mangas: usize,
+    allowlist: &DownloadAllowlist,
+    max_image_bytes: usize,
+    status: Option<&crate::crawler::status::StatusHandle>,
+    tor: Option<&crate::crawler::tor::TorController>,
+) -> anyhow::Result<CoverBackfillStats> {
+    let mut stats = CoverBackfillStats::default();
+    if max_mangas == 0 {
+        return Ok(stats);
+    }
+
+    let entries = repo::crawler::list_missing_covers(db, max_mangas as i64)
+        .await
+        .context("list_missing_covers")?;
+
+    if entries.is_empty() {
+        return Ok(stats);
+    }
+
+    let lease = browser_manager
+        .acquire()
+        .await
+        .context("acquire browser lease for cover backfill")?;
+    let browser_ref: &chromiumoxide::Browser = &lease;
+    let ctx = FetchContext { browser: browser_ref, rate, tor };
+
+    let total = entries.len();
+    for (index, entry) in entries.into_iter().enumerate() {
+        stats.considered += 1;
+        if let Some(s) = status {
+            s.set_phase(crate::crawler::status::Phase::CoverBackfill { index, total })
+                .await;
+        }
+        // Metadata-only TargetSource: skip chapter-list parsing so a
+        // missing-cover refetch doesn't soft-drop chapters on a partial
+        // render. Cover URL alone is what we need.
+        let source = TargetSource::new(entry.source_url.clone()).without_chapter_parsing();
+        let r = SourceMangaRef {
+            source_manga_key: entry.source_manga_key.clone(),
+            title: String::new(),
+            url: entry.source_url.clone(),
+        };
+        let manga = match source.fetch_manga(&ctx, &r).await {
+            Ok(manga) => manga,
+            Err(e) => {
+                tracing::warn!(
+                    manga_id = %entry.manga_id,
+                    url = %entry.source_url,
+                    error = ?e,
+                    "cover backfill: fetch_manga failed"
+                );
+                stats.failed += 1;
+                continue;
+            }
+        };
+        let Some(cover_url) = manga.cover_url.clone() else {
+            tracing::warn!(
+                manga_id = %entry.manga_id,
+                url = %entry.source_url,
+                "cover backfill: source returned no cover_url"
+            );
+            stats.failed += 1;
+            continue;
+        };
+        if let Some(s) = status {
+            s.set_current_cover(Some(crate::crawler::status::CoverTarget {
+                manga_id: entry.manga_id,
+                manga_title: manga.title.clone(),
+            }))
+            .await;
+        }
+        let cover_result = download_and_store_cover(
+            db,
+            storage,
+            http,
+            rate,
+            &entry.source_url,
+            entry.manga_id,
+            &cover_url,
+            allowlist,
+            max_image_bytes,
+        )
+        .await;
+        if let Some(s) = status {
+            s.set_current_cover(None).await;
+        }
+        match cover_result {
+            Ok(()) => stats.fetched += 1,
+            Err(e) => {
+                tracing::warn!(
+                    manga_id = %entry.manga_id,
+                    url = %entry.source_url,
+                    error = ?e,
+                    "cover backfill: download failed"
+                );
+                stats.failed += 1;
+            }
+        }
+    }
+
+    drop(lease);
+    Ok(stats)
+}
+
 /// Download a cover image and persist its storage path. Local to the
 /// pipeline because the CLI still calls it from its inline chapter-content
 /// loop; once the worker pool fully replaces that path we can fold this
 /// into `pipeline` proper.
 #[allow(clippy::too_many_arguments)]
-async fn download_and_store_cover(
+pub(crate) async fn download_and_store_cover(
     db: &PgPool,
     storage: &dyn Storage,
     http: &reqwest::Client,
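`backfill_missing_covers` follows a bounded, fail-soft pattern:

- A zero cap or an empty candidate list returns early.
- Every candidate is counted as considered.
- Each per-item error (fetch failure, missing `cover_url`, download failure) increments `failed` instead of aborting the loop.

Below is a synchronous sketch of just that accounting. It assumes a caller-supplied closure stands in for the `fetch_manga` + `download_and_store_cover` steps. `backfill` and `CoverError` are hypothetical names, and in the real code the cap is applied in SQL by `list_missing_covers`.

```rust
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct CoverBackfillStats {
    considered: usize,
    fetched: usize,
    failed: usize,
}

/// Hypothetical per-item failure modes, one per error path in the diff.
#[derive(Debug)]
enum CoverError {
    FetchFailed,
    NoCoverUrl,
    DownloadFailed,
}

/// Processes at most `max` candidates. Errors from `attempt` are logged
/// and counted, never propagated, so one bad URL cannot stall the rest.
fn backfill<T>(
    candidates: Vec<T>,
    max: usize,
    mut attempt: impl FnMut(&T) -> Result<(), CoverError>,
) -> CoverBackfillStats {
    let mut stats = CoverBackfillStats::default();
    if max == 0 {
        return stats;
    }
    for item in candidates.into_iter().take(max) {
        stats.considered += 1;
        match attempt(&item) {
            Ok(()) => stats.fetched += 1,
            Err(e) => {
                eprintln!("cover backfill: {e:?}");
                stats.failed += 1;
            }
        }
    }
    stats
}
```

Returning stats rather than an error keeps the cron tick's other work independent of any single cover. The `considered = fetched + failed` invariant makes a persistently failing CDN host visible in logs without blocking the queue.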
@@ -632,6 +837,18 @@ mod tests {
         assert!(!should_stop(false, UpsertStatus::New, None));
     }
 
+    #[test]
+    fn abort_pass_fires_at_threshold_and_respects_disable() {
+        // Disabled (0) never fires, no matter how many failures.
+        assert!(!should_abort_pass(0, 0));
+        assert!(!should_abort_pass(100, 0));
+        // Below threshold: keep going.
+        assert!(!should_abort_pass(9, 10));
+        // At/above threshold: abort.
+        assert!(should_abort_pass(10, 10));
+        assert!(should_abort_pass(11, 10));
+    }
+
     #[test]
     fn clean_exit_when_walked_to_completion() {
         // End-of-walk reached the catalog tail — the recovery flag may
279
backend/src/crawler/resync.rs
Normal file
@@ -0,0 +1,279 @@
+//! Admin-triggered resync of a single manga's metadata + cover, or a
+//! single chapter's content.
+//!
+//! The cron tick already retries covers and chapter content on its own
+//! schedule. This module exists for the operator-controlled path:
+//! "this manga's metadata is stale / its cover never landed / this
+//! chapter is broken — pull from source now, not at the next daily
+//! tick." Wired into the admin API, never into the queue, so the work
+//! happens synchronously with the HTTP request and the admin sees the
+//! refreshed row in the response.
+//!
+//! Shares the daemon's [`BrowserManager`], rate limiter, HTTP client,
+//! and TOR controller so a force resync respects the same per-host
+//! pacing and recircuit budget the daily crawl uses — admin actions
+//! must not let an operator accidentally hammer the source.
+
+use std::sync::Arc;
+
+use anyhow::Context;
+use async_trait::async_trait;
+use sqlx::PgPool;
+use uuid::Uuid;
+
+use crate::crawler::browser_manager::BrowserManager;
+use crate::crawler::content::{self, SyncOutcome};
+use crate::crawler::pipeline;
+use crate::crawler::rate_limit::HostRateLimiters;
+use crate::crawler::safety::DownloadAllowlist;
+use crate::crawler::source::target::TargetSource;
+use crate::crawler::source::{FetchContext, Source, SourceMangaRef};
+use crate::crawler::tor::TorController;
+use crate::repo;
+use crate::repo::crawler::UpsertStatus;
+use crate::storage::Storage;
+
+/// Outcome of [`ResyncService::resync_manga`]. Mirrors the bits the
+/// admin UI cares about — was the row actually re-upserted, did the
+/// cover land — so the response can show "metadata refreshed, cover
+/// re-downloaded" or "metadata unchanged" without a second round-trip.
+#[derive(Debug, Clone, Copy)]
+pub struct MangaResyncOutcome {
+    pub manga_id: Uuid,
+    pub metadata_status: UpsertStatus,
+    pub cover_fetched: bool,
+}
+
+/// Outcome of [`ResyncService::resync_chapter`]. `Fetched(pages)` is the
+/// success case; `Skipped` means the source row was already gone or the
+/// chapter had no live source.
+#[derive(Debug, Clone)]
+pub enum ChapterResyncOutcome {
+    Fetched { chapter_id: Uuid, pages: usize },
+    Skipped { chapter_id: Uuid, reason: String },
+}
+
+/// Service exposed by the daemon to the admin API. Optional on
+/// [`AppState`] — `None` when the crawler daemon is disabled
+/// (`CRAWLER_DAEMON=false`), in which case admin handlers return 503.
+#[async_trait]
+pub trait ResyncService: Send + Sync {
+    async fn resync_manga(&self, manga_id: Uuid) -> anyhow::Result<MangaResyncOutcome>;
+    async fn resync_chapter(&self, chapter_id: Uuid) -> anyhow::Result<ChapterResyncOutcome>;
+}
+
+/// Errors with a stable shape so the API layer can map them to the
+/// right HTTP status (404 vs 422 vs 5xx). Anything else surfaces as a
+/// generic 500.
+#[derive(Debug, thiserror::Error)]
+pub enum ResyncError {
+    #[error("manga has no source to resync from")]
+    NoMangaSource,
+    #[error("chapter has no source to resync from")]
+    NoChapterSource,
+}
+
+pub struct RealResyncService {
+    pub browser_manager: Arc<BrowserManager>,
+    pub db: PgPool,
+    pub storage: Arc<dyn Storage>,
+    pub http: reqwest::Client,
+    pub rate: Arc<HostRateLimiters>,
+    pub download_allowlist: DownloadAllowlist,
+    pub max_image_bytes: usize,
+    pub tor: Option<Arc<TorController>>,
+}
+
+#[async_trait]
+impl ResyncService for RealResyncService {
+    async fn resync_manga(&self, manga_id: Uuid) -> anyhow::Result<MangaResyncOutcome> {
+        // Pick the freshest live source row. Multi-source mangas
+        // (theoretical — only one Source impl today) get the row whose
+        // `last_seen_at` is newest; soft-dropped rows are skipped.
+        let row: Option<(String, String, String)> = sqlx::query_as(
+            "SELECT source_id, source_manga_key, source_url \
+             FROM manga_sources \
+             WHERE manga_id = $1 AND dropped_at IS NULL \
+             ORDER BY last_seen_at DESC \
+             LIMIT 1",
+        )
+        .bind(manga_id)
+        .fetch_optional(&self.db)
+        .await
+        .context("look up manga_sources for resync")?;
+        let Some((_source_id, source_manga_key, source_url)) = row else {
+            return Err(ResyncError::NoMangaSource.into());
+        };
+
+        let lease = self
+            .browser_manager
+            .acquire()
+            .await
+            .context("acquire browser lease for manga resync")?;
+        let browser_ref: &chromiumoxide::Browser = &lease;
+        let ctx = FetchContext {
+            browser: browser_ref,
+            rate: &self.rate,
+            tor: self.tor.as_deref(),
+        };
+
+        // Parse chapters too — a force resync is "make this manga fully
+        // current," not just metadata. The full pipeline handles the
+        // partial-render guard for us; we replicate the same caution
+        // here by skipping the chapter sync when the parser returned
+        // empty but the manga previously had chapters.
+        let source = TargetSource::new(source_url.clone());
+        let r = SourceMangaRef {
+            source_manga_key: source_manga_key.clone(),
+            title: String::new(),
+            url: source_url.clone(),
+        };
+        let manga = source
+            .fetch_manga(&ctx, &r)
+            .await
+            .with_context(|| format!("fetch_manga during resync of {manga_id}"))?;
+
+        // Partial-render guard: same logic as run_metadata_pass.
+        let source_id = source.id();
+        if !manga.chapters.is_empty() || {
+            let prior = repo::crawler::live_chapter_count_for_source_manga(
+                &self.db,
+                source_id,
+                &source_manga_key,
+            )
+            .await
+            .unwrap_or(0);
+            prior == 0
+        } {
+            // Either the new fetch surfaced chapters, or there were
+            // none before either — chapter sync is safe to run.
+        } else {
+            tracing::warn!(
+                %manga_id,
+                source_url = %source_url,
+                "resync_manga: fetch returned empty chapters but prior count > 0; skipping chapter sync to avoid soft-drop"
+            );
+        }
+
+        let upsert = repo::crawler::upsert_manga_from_source(
+            &self.db,
+            source_id,
+            &source_url,
+            &manga,
+        )
+        .await
+        .with_context(|| format!("upsert_manga_from_source during resync of {manga_id}"))?;
+
+        // Cover refetch: force-download regardless of UpsertStatus.
+        // Admin clicked "resync" because they want the cover too.
+        let mut cover_fetched = false;
+        if let Some(cover_url) = manga.cover_url.as_deref() {
+            match pipeline::download_and_store_cover(
+                &self.db,
+                self.storage.as_ref(),
+                &self.http,
+                &self.rate,
+                &source_url,
+                upsert.manga_id,
+                cover_url,
+                &self.download_allowlist,
+                self.max_image_bytes,
+            )
+            .await
+            {
+                Ok(()) => cover_fetched = true,
+                Err(e) => tracing::warn!(
+                    %manga_id,
+                    error = ?e,
+                    "resync_manga: cover download failed"
+                ),
+            }
+        }
+
+        // Chapter sync — only when the partial-render guard above
+        // didn't bail.
+        let prior_chapter_count = repo::crawler::live_chapter_count_for_source_manga(
+            &self.db,
+            source_id,
+            &source_manga_key,
+        )
+        .await
+        .unwrap_or(0);
+        if !manga.chapters.is_empty() || prior_chapter_count == 0 {
+            match repo::crawler::sync_manga_chapters(
+                &self.db,
+                source_id,
+                upsert.manga_id,
+                &manga.chapters,
+            )
+            .await
+            {
+                Ok(diff) => tracing::info!(
+                    %manga_id,
+                    new = diff.new,
+                    refreshed = diff.refreshed,
+                    dropped = diff.dropped,
+                    "resync_manga: chapters synced"
+                ),
+                Err(e) => tracing::warn!(
+                    %manga_id,
+                    error = ?e,
+                    "resync_manga: chapter sync failed"
+                ),
+            }
+        }
+
+        drop(lease);
+        Ok(MangaResyncOutcome {
+            manga_id: upsert.manga_id,
+            metadata_status: upsert.status,
+            cover_fetched,
+        })
+    }
+
+    async fn resync_chapter(&self, chapter_id: Uuid) -> anyhow::Result<ChapterResyncOutcome> {
+        let row = repo::chapter::dispatch_target(&self.db, chapter_id)
+            .await
+            .context("look up chapter_sources for resync")?;
+        let Some((manga_id, source_url, _title, _number)) = row else {
+            return Err(ResyncError::NoChapterSource.into());
+        };
+
+        let lease = self
+            .browser_manager
+            .acquire()
+            .await
+            .context("acquire browser lease for chapter resync")?;
+        let result = content::sync_chapter_content(
+            &lease,
+            &self.db,
+            self.storage.as_ref(),
+            &self.http,
+            &self.rate,
+            chapter_id,
+            manga_id,
+            &source_url,
+            true,
+            &self.download_allowlist,
+            self.max_image_bytes,
+            self.tor.as_deref(),
+            // Admin resync isn't a daemon worker slot — no live status.
+            None,
+        )
+        .await;
+        drop(lease);
+
+        match result? {
+            SyncOutcome::Fetched { pages } => {
+                Ok(ChapterResyncOutcome::Fetched { chapter_id, pages })
+            }
+            SyncOutcome::Skipped => Ok(ChapterResyncOutcome::Skipped {
+                chapter_id,
+                reason: "chapter already had pages on disk".to_string(),
+            }),
+            SyncOutcome::SessionExpired => {
+                anyhow::bail!("source session expired — operator must refresh PHPSESSID")
+            }
+        }
+    }
+}
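`resync_manga` gates chapter sync on the rule its comment attributes to `run_metadata_pass`:

- Sync when the fresh parse found chapters.
- Sync when the source had no live chapters before.
- Otherwise skip, because an empty parse for a manga that previously had chapters is more likely a partial render than a real delisting, and syncing it would soft-drop every chapter.

The diff inlines this condition twice, as `!manga.chapters.is_empty() || prior_chapter_count == 0`. The pure sketch below gives it a name. `should_sync_chapters` is hypothetical, and the `i64` count type is an assumption.

```rust
/// Hypothetical helper naming the condition inlined in `resync_manga`.
/// `prior_live_chapters` is assumed to be an i64 row count.
fn should_sync_chapters(parsed_chapters: usize, prior_live_chapters: i64) -> bool {
    // A parse that found chapters is always safe to diff against the DB;
    // an empty parse is only trusted when there was nothing to drop.
    parsed_chapters > 0 || prior_live_chapters == 0
}
```

One consequence of the code as written: both count queries use `.unwrap_or(0)`, so a failed count query is treated as "no prior chapters", which lets an empty parse through the guard in that case.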
@@ -162,38 +162,124 @@ const PROBE_RETRY_DELAY: Duration = Duration::from_secs(2);
|
|||||||
/// limiter. The trade is worth it — failing here costs ~1s; failing 30
|
/// limiter. The trade is worth it — failing here costs ~1s; failing 30
|
||||||
/// minutes into a backfill costs 30 minutes.
|
/// minutes into a backfill costs 30 minutes.
|
||||||
pub async fn verify_session(browser: &Browser, probe_url: &str) -> anyhow::Result<()> {
|
pub async fn verify_session(browser: &Browser, probe_url: &str) -> anyhow::Result<()> {
|
||||||
let mut attempt = 0u32;
|
verify_session_with_recircuit(browser, probe_url, None, 0).await
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Like [`verify_session`] but, when `tor` is `Some`, signals
|
||||||
|
/// `SIGNAL NEWNYM` between retries on transient pages AND treats
|
||||||
|
/// `Unauthenticated` as recoverable (up to `tor_max_attempts` total
|
||||||
|
/// probes, calling NEWNYM between each).
|
||||||
|
///
|
||||||
|
/// `verify_session` is `verify_session_with_recircuit(..., None, _)`,
|
||||||
|
/// which collapses the `Unauthenticated` budget to 1 attempt — i.e.
|
||||||
|
/// fail-fast, exactly the pre-TOR behavior.
|
+pub async fn verify_session_with_recircuit(
+    browser: &Browser,
+    probe_url: &str,
+    tor: Option<&crate::crawler::tor::TorController>,
+    tor_max_attempts: u32,
+) -> anyhow::Result<()> {
+    let unauth_max_attempts = if tor.is_some() { tor_max_attempts.max(1) } else { 1 };
+    run_session_probe_loop(
+        || fetch_probe_html(browser, probe_url),
+        || async {
+            if let Some(t) = tor {
+                if let Err(e) = t.new_identity().await {
+                    tracing::warn!(error = %e, "TOR NEWNYM failed; continuing with same circuit");
+                }
+            }
+        },
+        PROBE_MAX_ATTEMPTS,
+        unauth_max_attempts,
+        PROBE_RETRY_DELAY,
+        probe_url,
+    )
+    .await
+}
+
+/// Pure-over-IO loop body for the session probe. Generic over the
+/// fetch and recircuit closures so it can be unit-tested without a
+/// real browser or TOR daemon.
+///
+/// Both budgets count **total attempts**, including the first — so
+/// `transient_max_attempts = 3` allows 3 fetches and 2 recircuits
+/// between them, and `unauth_max_attempts = 1` means "fail-fast, no
+/// retry". This matches [`crate::crawler::detect::retry_on_transient`]
+/// and the content-path recircuit loop.
+///
+/// Outcomes:
+/// - `SessionProbe::Ok` → return `Ok(())`.
+/// - `SessionProbe::Unauthenticated` → recircuit + retry while
+///   under the unauth budget. After the cap, bail with the
+///   "PHPSESSID expired" diagnostic, mentioning the attempt count so
+///   a TOR-misconfig diagnosis is easier.
+/// - `SessionProbe::Transient` → same shape against the transient
+///   budget; bails with "site down or rate-limiting" after the cap.
+async fn run_session_probe_loop<F, Fut, R, RFut>(
+    mut fetch_html: F,
+    mut recircuit: R,
+    transient_max_attempts: u32,
+    unauth_max_attempts: u32,
+    retry_delay: Duration,
+    probe_url_for_msg: &str,
+) -> anyhow::Result<()>
+where
+    F: FnMut() -> Fut,
+    Fut: std::future::Future<Output = anyhow::Result<String>>,
+    R: FnMut() -> RFut,
+    RFut: std::future::Future<Output = ()>,
+{
+    debug_assert!(transient_max_attempts >= 1);
+    debug_assert!(unauth_max_attempts >= 1);
+    let mut transient_attempts = 0u32;
+    let mut unauth_attempts = 0u32;
     loop {
-        attempt += 1;
-        let html = fetch_probe_html(browser, probe_url).await?;
+        let html = fetch_html().await?;
+
         match classify_probe(&html) {
             SessionProbe::Ok => {
-                tracing::info!(attempt, "session probe ok — #logo + #avatar_menu present");
+                tracing::info!(
+                    transient_attempts,
+                    unauth_attempts,
+                    "session probe ok — #logo + #avatar_menu present"
+                );
                 return Ok(());
             }
             SessionProbe::Unauthenticated => {
+                unauth_attempts += 1;
+                if unauth_attempts >= unauth_max_attempts {
                     return Err(anyhow!(
-                    "session probe failed — #avatar_menu not present at {probe_url} \
-                     (page rendered the normal layout); PHPSESSID is missing, expired, \
-                     or revoked. Refresh CRAWLER_PHPSESSID and re-run."
+                        "session probe failed — #avatar_menu not present at {probe_url_for_msg} \
+                         after {unauth_attempts} attempt(s); PHPSESSID is missing, \
+                         expired, or revoked. Refresh CRAWLER_PHPSESSID and re-run."
                     ));
                 }
-            SessionProbe::Transient if attempt < PROBE_MAX_ATTEMPTS => {
                 tracing::warn!(
-                    attempt,
-                    max_attempts = PROBE_MAX_ATTEMPTS,
-                    "session probe got a transient page; retrying"
+                    attempt = unauth_attempts,
+                    max_attempts = unauth_max_attempts,
+                    "session probe Unauthenticated despite PHPSESSID; signaling TOR \
+                     NEWNYM and retrying"
                 );
-                tokio::time::sleep(PROBE_RETRY_DELAY).await;
+                recircuit().await;
+                tokio::time::sleep(retry_delay).await;
             }
             SessionProbe::Transient => {
+                transient_attempts += 1;
+                if transient_attempts >= transient_max_attempts {
                     return Err(anyhow!(
-                    "session probe failed — probe page at {probe_url} returned a \
-                     broken-page response after {PROBE_MAX_ATTEMPTS} attempts. \
+                        "session probe failed — probe page at {probe_url_for_msg} returned \
+                         a broken-page response after {transient_max_attempts} attempts. \
                          The site appears to be down or rate-limiting us; try again \
                          later before refreshing CRAWLER_PHPSESSID."
                     ));
                 }
+                tracing::warn!(
+                    attempt = transient_attempts,
+                    max_attempts = transient_max_attempts,
+                    "session probe got a transient page; recircuit + retry"
+                );
+                recircuit().await;
+                tokio::time::sleep(retry_delay).await;
+            }
         }
     }
 }
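The two-budget accounting in `run_session_probe_loop` can be checked in isolation. The block below is a synchronous, std-only sketch rather than the crate's code: `Probe`, `probe_loop`, and the `String` error type are hypothetical stand-ins, fetch failures are modeled as `Err`, and the sleep between attempts is omitted. It shows that each budget counts total attempts including the first, and that `recircuit` runs only when another attempt will follow.

```rust
/// Hypothetical stand-in for the three `SessionProbe` outcomes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Probe {
    Ok,
    Unauthenticated,
    Transient,
}

/// Synchronous sketch of the two-budget probe loop. Each budget counts
/// total attempts of its kind (including the first); `recircuit` is called
/// only when the loop is about to fetch again, never after the final try.
pub fn probe_loop<F, R>(
    mut fetch: F,
    mut recircuit: R,
    transient_max: u32,
    unauth_max: u32,
) -> Result<(), String>
where
    F: FnMut() -> Result<Probe, String>,
    R: FnMut(),
{
    assert!(transient_max >= 1 && unauth_max >= 1);
    let (mut transient, mut unauth) = (0u32, 0u32);
    loop {
        // Fetch errors (e.g. a navigation timeout) propagate immediately.
        match fetch()? {
            Probe::Ok => return Ok(()),
            Probe::Unauthenticated => {
                unauth += 1;
                if unauth >= unauth_max {
                    return Err(format!("unauthenticated after {unauth} attempt(s)"));
                }
            }
            Probe::Transient => {
                transient += 1;
                if transient >= transient_max {
                    return Err(format!("broken-page response after {transient} attempts"));
                }
            }
        }
        // Budget not exhausted: draw a fresh circuit before retrying.
        recircuit();
    }
}
```

Because the two counters are independent, a run that alternates Transient and Unauthenticated pages can make up to `transient_max + unauth_max - 1` fetches before either cap trips, which is worth keeping in mind when sizing both budgets.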
@@ -336,6 +422,204 @@ mod tests {
         assert_eq!(classify_chapter_probe(html), ChapterProbe::Ok);
     }

+    // --- run_session_probe_loop -----------------------------------------
+    //
+    // These tests exercise the recircuit-aware loop without a real
+    // browser. The fetch and recircuit closures are mocks that return
+    // canned HTML and bump call / recircuit counters.
+
+    const OK_HTML: &str = r#"<html><body><div id="logo"></div><div id="avatar_menu"></div></body></html>"#;
+    const UNAUTH_HTML: &str = r#"<html><body><div id="logo"></div></body></html>"#;
+    const TRANSIENT_HTML: &str = "<html><body><p>we're sorry, the request file are not found.</p></body></html>";
+
+    #[tokio::test]
+    async fn probe_loop_ok_on_first_attempt_does_not_recircuit() {
+        let mut recircuits = 0u32;
+        let mut fetched = 0u32;
+        run_session_probe_loop(
+            || {
+                fetched += 1;
+                async { Ok(OK_HTML.to_string()) }
+            },
+            || {
+                recircuits += 1;
+                async {}
+            },
+            3,
+            3,
+            Duration::from_millis(0),
+            "https://example/probe",
+        )
+        .await
+        .expect("ok on first attempt");
+        assert_eq!(fetched, 1);
+        assert_eq!(recircuits, 0);
+    }
+
+    #[tokio::test]
+    async fn probe_loop_unauth_then_ok_when_attempt_budget_available() {
+        // Budget = 3 total attempts. Unauth on call 1, ok on call 2.
+        let mut recircuits = 0u32;
+        let mut call = 0u32;
+        run_session_probe_loop(
+            || {
+                call += 1;
+                let n = call;
+                async move {
+                    if n == 1 {
+                        Ok(UNAUTH_HTML.to_string())
+                    } else {
+                        Ok(OK_HTML.to_string())
+                    }
+                }
+            },
+            || {
+                recircuits += 1;
+                async {}
+            },
+            3,
+            3,
+            Duration::from_millis(0),
+            "https://example/probe",
+        )
+        .await
+        .expect("recovers after one recircuit");
+        assert_eq!(call, 2);
+        assert_eq!(recircuits, 1);
+    }
+
+    #[tokio::test]
+    async fn probe_loop_unauth_with_single_attempt_budget_fails_fast() {
+        // Budget = 1 total attempt = no retry (matches no-TOR behavior).
+        let mut recircuits = 0u32;
+        let mut call = 0u32;
+        let err = run_session_probe_loop(
+            || {
+                call += 1;
+                async { Ok(UNAUTH_HTML.to_string()) }
+            },
+            || {
+                recircuits += 1;
+                async {}
+            },
+            3,
+            1,
+            Duration::from_millis(0),
+            "https://example/probe",
+        )
+        .await
+        .expect_err("budget=1 → fail-fast");
+        assert_eq!(call, 1, "no retry when budget is 1");
+        assert_eq!(recircuits, 0);
+        let msg = format!("{err:#}");
+        assert!(msg.contains("Refresh CRAWLER_PHPSESSID"), "msg: {msg}");
+        assert!(msg.contains("after 1 attempt"), "expected attempt count in msg: {msg}");
+    }
+
+    #[tokio::test]
+    async fn probe_loop_unauth_after_exhausting_budget_emits_attempt_count() {
+        let mut recircuits = 0u32;
+        let mut call = 0u32;
+        let err = run_session_probe_loop(
+            || {
+                call += 1;
+                async { Ok(UNAUTH_HTML.to_string()) }
+            },
+            || {
+                recircuits += 1;
+                async {}
+            },
+            10, // transient budget irrelevant here
+            3,  // 3 attempts total, 2 recircuits between
+            Duration::from_millis(0),
+            "https://example/probe",
+        )
+        .await
+        .expect_err("exhausts unauth budget");
+        assert_eq!(call, 3);
+        assert_eq!(recircuits, 2);
+        let msg = format!("{err:#}");
+        assert!(msg.contains("after 3 attempt"), "expected attempt count in error, got: {msg}");
+    }
+
+    #[tokio::test]
+    async fn probe_loop_transient_repeats_until_max_then_errors() {
+        let mut recircuits = 0u32;
+        let mut call = 0u32;
+        let err = run_session_probe_loop(
+            || {
+                call += 1;
+                async { Ok(TRANSIENT_HTML.to_string()) }
+            },
+            || {
+                recircuits += 1;
+                async {}
+            },
+            3,
+            1,
+            Duration::from_millis(0),
+            "https://example/probe",
+        )
+        .await
+        .expect_err("transient until max → fail");
+        assert_eq!(call, 3);
+        // Recircuit fires between attempts: 3 attempts → 2 recircuits.
+        assert_eq!(recircuits, 2);
+        let msg = format!("{err:#}");
+        assert!(msg.contains("broken-page response after 3 attempts"), "msg: {msg}");
+    }
+
+    #[tokio::test]
+    async fn probe_loop_transient_then_ok_returns_ok_after_one_recircuit() {
+        let mut recircuits = 0u32;
+        let mut call = 0u32;
+        run_session_probe_loop(
+            || {
+                call += 1;
+                let n = call;
+                async move {
+                    if n == 1 {
+                        Ok(TRANSIENT_HTML.to_string())
+                    } else {
+                        Ok(OK_HTML.to_string())
+                    }
+                }
+            },
+            || {
+                recircuits += 1;
+                async {}
+            },
+            3,
+            1,
+            Duration::from_millis(0),
+            "https://example/probe",
+        )
+        .await
+        .expect("ok on second try");
+        assert_eq!(call, 2);
+        assert_eq!(recircuits, 1);
+    }
+
+    #[tokio::test]
+    async fn probe_loop_propagates_fetch_errors_immediately() {
+        let mut call = 0u32;
+        let err = run_session_probe_loop(
+            || {
+                call += 1;
+                async { Err(anyhow!("nav timeout")) }
+            },
+            || async {},
+            5,
+            5,
+            Duration::from_millis(0),
+            "https://example/probe",
+        )
+        .await
+        .expect_err("fetch error bubbles");
+        assert_eq!(call, 1);
+        assert!(format!("{err:#}").contains("nav timeout"));
+    }
+
     #[test]
     fn classify_probe_trusts_broken_body_over_stray_avatar_match() {
         // Defensive: if a broken-page body somehow contains an

--- /dev/null
+++ b/backend/src/crawler/session_control.rs
@@ -0,0 +1,180 @@
+//! Runtime-updatable crawler session (PHPSESSID).
+//!
+//! At startup the session comes from `CRAWLER_PHPSESSID`, but it expires
+//! and previously needed a container restart to refresh. This controller
+//! lets an admin push a fresh cookie at runtime: it rewrites the reqwest
+//! cookie jar (CDN image fetches), updates the in-memory value the browser
+//! `on_launch` hook reads, persists it to `crawler_state` (so it survives
+//! a restart), and clears the sticky `session_expired` flag. A subsequent
+//! coordinated browser restart re-runs `on_launch`, re-injecting the new
+//! cookie into Chromium and re-probing.
+
+use std::sync::atomic::{AtomicBool, Ordering};
+use std::sync::Arc;
+
+use anyhow::Context;
+use serde_json::json;
+use sqlx::PgPool;
+use tokio::sync::RwLock;
+
+const STATE_KEY_RUNTIME_SESSION: &str = "runtime_session";
+
+pub struct SessionController {
+    /// Current PHPSESSID — what `on_launch` injects into a fresh browser.
+    phpsessid: RwLock<Option<String>>,
+    /// The same `Arc<Jar>` handed to the reqwest client; updating it here
+    /// updates the client's cookies (the jar is internally mutable).
+    cookie_jar: Arc<reqwest::cookie::Jar>,
+    cookie_domain: Option<String>,
+    start_url: Option<String>,
+    db: PgPool,
+    session_expired: Arc<AtomicBool>,
+}
+
+impl SessionController {
+    pub fn new(
+        initial: Option<String>,
+        cookie_jar: Arc<reqwest::cookie::Jar>,
+        cookie_domain: Option<String>,
+        start_url: Option<String>,
+        db: PgPool,
+        session_expired: Arc<AtomicBool>,
+    ) -> Arc<Self> {
+        Arc::new(Self {
+            phpsessid: RwLock::new(initial),
+            cookie_jar,
+            cookie_domain,
+            start_url,
+            db,
+            session_expired,
+        })
+    }
+
+    /// The PHPSESSID a fresh browser should inject (None when unset).
+    pub async fn current(&self) -> Option<String> {
+        self.phpsessid.read().await.clone()
+    }
+
+    /// Whether the sticky session-expired flag is set (chapter workers
+    /// idle while true).
+    pub fn is_expired(&self) -> bool {
+        self.session_expired.load(Ordering::Acquire)
+    }
+
+    /// Clear the session-expired flag without changing the cookie — used
+    /// when the operator knows the session is fine and wants workers to
+    /// resume immediately.
+    pub fn clear_expired(&self) {
+        self.session_expired.store(false, Ordering::Release);
+    }
+
+    /// Update the session everywhere: reqwest jar, in-memory value, and
+    /// persisted `crawler_state`. Clears the session-expired flag. Does
+    /// NOT relaunch the browser — the caller triggers a coordinated
+    /// restart so `on_launch` re-injects + re-probes.
+    pub async fn update(&self, sid: &str) -> anyhow::Result<()> {
+        let sid = sid.trim().to_string();
+        anyhow::ensure!(!sid.is_empty(), "PHPSESSID must not be empty");
+        // The value is spliced into a cookie string and a CDP CookieParam.
+        // Reject control chars and cookie delimiters so a pasted value
+        // can't smuggle extra attributes / break out of the cookie.
+        anyhow::ensure!(
+            sid.chars().all(|c| !c.is_control() && c != ';' && c != ','),
+            "PHPSESSID contains invalid characters"
+        );
+
+        if let (Some(domain), Some(start_url)) = (&self.cookie_domain, &self.start_url) {
+            let cookie_str = format!("PHPSESSID={sid}; Domain={domain}; Path=/");
+            let seed_url =
+                reqwest::Url::parse(start_url).context("parse start_url for cookie seed")?;
+            self.cookie_jar.add_cookie_str(&cookie_str, &seed_url);
+        }
+        *self.phpsessid.write().await = Some(sid.clone());
+        persist(&self.db, &sid).await.context("persist runtime session")?;
+        self.session_expired.store(false, Ordering::Release);
+        tracing::info!("crawler session updated at runtime");
+        Ok(())
+    }
+
+    /// Read a persisted runtime session (if any) from `crawler_state`.
+    /// Called at startup so a mid-day refresh survives a restart.
+    pub async fn load_persisted(db: &PgPool) -> Option<String> {
+        let row: Option<serde_json::Value> =
+            sqlx::query_scalar("SELECT value FROM crawler_state WHERE key = $1")
+                .bind(STATE_KEY_RUNTIME_SESSION)
+                .fetch_optional(db)
+                .await
+                .ok()
+                .flatten();
+        row.and_then(|v| {
+            v.get("phpsessid")
+                .and_then(|s| s.as_str())
+                .map(|s| s.to_string())
+        })
+    }
+}
+
+async fn persist(db: &PgPool, sid: &str) -> sqlx::Result<()> {
+    sqlx::query(
+        "INSERT INTO crawler_state (key, value, updated_at) \
+         VALUES ($1, $2, now()) \
+         ON CONFLICT (key) DO UPDATE \
+         SET value = EXCLUDED.value, updated_at = now()",
+    )
+    .bind(STATE_KEY_RUNTIME_SESSION)
+    .bind(json!({ "phpsessid": sid }))
+    .execute(db)
+    .await?;
+    Ok(())
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    fn controller(db: PgPool) -> Arc<SessionController> {
+        SessionController::new(
+            None,
+            Arc::new(reqwest::cookie::Jar::default()),
+            Some("example.com".into()),
+            Some("https://example.com/".into()),
+            db,
+            Arc::new(AtomicBool::new(true)),
+        )
+    }
+
+    #[sqlx::test(migrations = "./migrations")]
+    async fn update_rejects_empty_and_control_chars(pool: PgPool) {
+        let c = controller(pool);
+        assert!(c.update(" ").await.is_err(), "empty rejected");
+        assert!(c.update("abc\r\ndef").await.is_err(), "CRLF rejected");
+        assert!(c.update("ab;Domain=evil").await.is_err(), "semicolon rejected");
+        assert!(c.update("x,y").await.is_err(), "comma rejected");
+    }
+
+    #[sqlx::test(migrations = "./migrations")]
+    async fn update_persists_and_clears_expired_then_round_trips(pool: PgPool) {
+        let c = controller(pool.clone());
+        c.update("good-sid-123").await.unwrap();
+        assert_eq!(c.current().await.as_deref(), Some("good-sid-123"));
+        assert!(!c.is_expired(), "update clears the expired flag");
+        // Persisted to crawler_state and readable by a fresh load.
+        assert_eq!(
+            SessionController::load_persisted(&pool).await.as_deref(),
+            Some("good-sid-123")
+        );
+    }
+
+    #[sqlx::test(migrations = "./migrations")]
+    async fn clear_expired_flips_sticky_flag_without_touching_session(pool: PgPool) {
+        // The flag starts `true` per `controller(pool)`'s test wiring.
+        let c = controller(pool);
+        assert!(c.is_expired(), "test fixture starts with the flag set");
+        c.clear_expired();
+        assert!(!c.is_expired(), "clear_expired flips the sticky flag to false");
+        assert!(
+            c.current().await.is_none(),
+            "clear_expired does not invent a session"
+        );
+    }
+}
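`SessionController::update` accepts any trimmed value that contains no control characters, `;`, or `,`. The sketch below restates that check with std only; `sanitize_sid`, `cookie_str`, and `is_rfc6265_value` are hypothetical names, not crate APIs. It also shows a stricter alternative based on the RFC 6265 `cookie-octet` grammar for unquoted values, which additionally rejects spaces, double quotes, and backslashes. Whether the stricter rule fits depends on what the upstream site actually issues, so treat it as an option rather than a required fix.

```rust
/// Trim and validate a pasted session id before it is spliced into a
/// cookie string. Same rules as `update`: non-empty, no control chars
/// (blocks CR/LF), no ';' (would start a new attribute such as
/// `; Domain=evil`), no ',' (treated as a separator by some parsers).
pub fn sanitize_sid(raw: &str) -> Result<String, &'static str> {
    let sid = raw.trim();
    if sid.is_empty() {
        return Err("PHPSESSID must not be empty");
    }
    if !sid.chars().all(|c| !c.is_control() && c != ';' && c != ',') {
        return Err("PHPSESSID contains invalid characters");
    }
    Ok(sid.to_string())
}

/// Cookie string in the same shape `update` seeds the reqwest jar with.
pub fn cookie_str(sid: &str, domain: &str) -> String {
    format!("PHPSESSID={sid}; Domain={domain}; Path=/")
}

/// Stricter alternative: every byte must be an RFC 6265 `cookie-octet`,
/// i.e. printable US-ASCII minus space, DQUOTE, comma, semicolon, backslash.
pub fn is_rfc6265_value(s: &str) -> bool {
    s.bytes()
        .all(|b| matches!(b, 0x21 | 0x23..=0x2B | 0x2D..=0x3A | 0x3C..=0x5B | 0x5D..=0x7E))
}
```

The lenient rule still lets an inner space through (`sanitize_sid("has space")` succeeds), which is the main practical difference from the RFC grammar.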

@@ -67,6 +67,10 @@ pub struct SourceChapter {
 pub struct FetchContext<'a> {
     pub browser: &'a Browser,
     pub rate: &'a crate::crawler::rate_limit::HostRateLimiters,
+    /// Optional TOR control-port client. When `Some`, retry helpers
+    /// signal `NEWNYM` between transient-page attempts so the next try
+    /// draws a fresh exit. `None` keeps pre-TOR behavior.
+    pub tor: Option<&'a crate::crawler::tor::TorController>,
 }

 /// Lazy iterator over discovered manga refs. The caller drives the

@@ -7,7 +7,6 @@
 //! (`td:has(label:contains("Author:"))`) are implemented by walking
 //! the parsed tree.

-use std::collections::VecDeque;
 use std::time::Duration;

 use anyhow::Context;
@@ -19,7 +18,7 @@ use super::{
     SourceMangaRef,
 };
 use crate::crawler::detect::{
-    has_logo_sentinel, is_broken_page_body, retry_on_transient, PageError,
+    has_logo_sentinel, is_broken_page_body, retry_on_transient_with_hook, PageError,
 };
 use crate::crawler::nav::{wait_for_nav, wait_for_selector, NavError, SELECTOR_TIMEOUT};

@@ -75,33 +74,24 @@ impl Source for TargetSource {
         &self,
         ctx: &FetchContext<'_>,
     ) -> anyhow::Result<Box<dyn DiscoverWalk + Send>> {
-        // Always visit page 1 first because that's the only way to
-        // discover `last_page`. Retry it on transient — a broken first
-        // page would otherwise abort the whole walk before we've even
-        // started.
-        let first_html = retry_on_transient(
+        // Probe page 1 up front (with transient retry) for two reasons:
+        // a broken first page should abort cleanly rather than mid-walk,
+        // and the HTML is handed straight to the first `next_batch` call
+        // so the walker doesn't re-fetch it. Page count is discovered
+        // incrementally — see `TargetSourceWalker::next_batch`.
+        let first_html = retry_on_transient_with_hook(
             || async {
                 navigate(ctx, self.base_url.as_str(), LIST_PAGE_MARKER).await
             },
             PAGE_TRANSIENT_RETRY_ATTEMPTS,
             PAGE_TRANSIENT_RETRY_DELAY,
+            || async { recircuit_if_configured(ctx.tor).await },
         )
         .await?;
-        let last_page = {
-            let doc = scraper::Html::parse_document(&first_html);
-            parse_last_page(&doc)
-        };
-
-        let order = build_page_order(last_page);
-        tracing::info!(
-            last_page = ?last_page,
-            page_count = order.len(),
-            "walking pagination"
-        );

         Ok(Box::new(TargetSourceWalker {
             base_url: self.base_url.clone(),
-            pages_remaining: order,
+            next_page: 1,
             first_page_html: Some(first_html),
         }))
     }

@@ -147,24 +137,19 @@ impl Source for TargetSource {
     }
 }

-/// Build the queue of page numbers `TargetSource::discover` will walk.
-/// The site orders by `update_date DESC`, so newest-first is just the
-/// natural page order: `1..=last`. If `last_page` is unknown (source
-/// surfaces no pagination) only page 1 is visited.
-fn build_page_order(last_page: Option<i32>) -> VecDeque<i32> {
-    match last_page {
-        None => VecDeque::from([1]),
-        Some(last) => (1..=last).collect(),
-    }
-}
-
-/// Walker returned by [`TargetSource::discover`]. Pops one source-index
-/// page per `next_batch` call. Page 1's HTML is cached at construction
-/// time (the discover call needed it to read `last_page` anyway) so the
-/// batch covering page 1 doesn't re-fetch.
+/// Walker returned by [`TargetSource::discover`]. Walks pages `1..` in
+/// order, terminating as soon as a page renders cleanly with zero entries
+/// — that's the "we ran off the end of the index" signal. Page 1's HTML
+/// is cached at construction time (discover already had to fetch it for
+/// the transient probe) so the first batch doesn't re-fetch.
+///
+/// A genuinely empty `Ok(vec![])` from `parse_manga_list_from` is what
+/// stops us: the parser's `#logo` sentinel converts unrendered pages
+/// into transient errors before they reach this loop, so an empty
+/// parse result reliably means "no more entries."
 struct TargetSourceWalker {
     base_url: String,
-    pages_remaining: VecDeque<i32>,
+    next_page: i32,
     first_page_html: Option<String>,
 }

@@ -174,20 +159,18 @@ impl DiscoverWalk for TargetSourceWalker {
         &mut self,
         ctx: &FetchContext<'_>,
     ) -> anyhow::Result<Option<Vec<SourceMangaRef>>> {
-        let Some(page_num) = self.pages_remaining.pop_front() else {
-            return Ok(None);
-        };
+        let page_num = self.next_page;
         let page_refs = if page_num == 1 {
             // Reuse the cached page-1 HTML from the initial probe. Take
-            // it (rather than clone) so a malformed page-order queue
-            // that re-visits page 1 still falls back to a real fetch.
+            // it (rather than clone) so a future re-entry that somehow
+            // revisits page 1 still falls back to a real fetch.
             match self.first_page_html.take() {
                 Some(html) => {
                     let doc = scraper::Html::parse_document(&html);
                     parse_manga_list_from(&doc)?
                 }
                 None => {
-                    retry_on_transient(
+                    retry_on_transient_with_hook(
                         || async {
                             let html = navigate(
                                 ctx,
@@ -200,12 +183,13 @@ impl DiscoverWalk for TargetSourceWalker {
                         },
                         PAGE_TRANSIENT_RETRY_ATTEMPTS,
                         PAGE_TRANSIENT_RETRY_DELAY,
+                        || async { recircuit_if_configured(ctx.tor).await },
                     )
                     .await?
                 }
             }
         } else {
-            retry_on_transient(
+            retry_on_transient_with_hook(
                 || async {
                     let url = page_url(&self.base_url, page_num);
                     let html = navigate(ctx, &url, LIST_PAGE_MARKER).await?;
@@ -214,10 +198,15 @@ impl DiscoverWalk for TargetSourceWalker {
                 },
                 PAGE_TRANSIENT_RETRY_ATTEMPTS,
                 PAGE_TRANSIENT_RETRY_DELAY,
+                || async { recircuit_if_configured(ctx.tor).await },
             )
             .await?
         };
         tracing::info!(page_num, count = page_refs.len(), "page walked");
+        if page_refs.is_empty() {
+            return Ok(None);
+        }
+        self.next_page += 1;
         Ok(Some(page_refs))
     }
 }
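The incremental walk that replaces `build_page_order` can be modeled without a browser. In this sketch, `Walker` and the `fetch` closure are hypothetical stand-ins for `TargetSourceWalker` and navigate-plus-parse, and entries are plain `String`s. It assumes, as the walker's doc comment does, that a page past the end of the index renders an empty list (rather than, for example, repeating the last page), since that empty result is the only stop signal.

```rust
/// Hypothetical model of the incremental pagination walker.
pub struct Walker {
    next_page: u32,
    /// Page 1 entries from the up-front probe; `take()` uses them once.
    first_page: Option<Vec<String>>,
}

impl Walker {
    pub fn new(first_page: Vec<String>) -> Self {
        Walker { next_page: 1, first_page: Some(first_page) }
    }

    /// Returns `Ok(None)` once a page parses to zero entries. `fetch`
    /// stands in for "navigate + parse"; its errors propagate unchanged.
    pub fn next_batch<F>(&mut self, mut fetch: F) -> Result<Option<Vec<String>>, String>
    where
        F: FnMut(u32) -> Result<Vec<String>, String>,
    {
        let page = self.next_page;
        let refs = if page == 1 {
            // Reuse the cached page 1; fall back to a real fetch if it is gone.
            match self.first_page.take() {
                Some(cached) => cached,
                None => fetch(page)?,
            }
        } else {
            fetch(page)?
        };
        if refs.is_empty() {
            // Ran off the end of the index.
            return Ok(None);
        }
        // Advance only after a non-empty page.
        self.next_page += 1;
        Ok(Some(refs))
    }
}
```

With a three-page index, the walk fetches pages 2, 3, and 4 (page 1 comes from the cache) and stops on the empty page 4, so an index of N pages costs N network fetches in total including the probe.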

@@ -288,20 +277,20 @@ fn classify_navigate_html(html: String) -> Result<String, PageError> {
     Ok(html)
 }

-fn parse_last_page(doc: &scraper::Html) -> Option<i32> {
-    // Pagination links carry their page number as text. Take the
-    // numeric maximum so we don't depend on a specific layout (Prev,
-    // Next, ellipses, etc. all get filtered out by .parse).
-    let sel = scraper::Selector::parse("#left_side .pagination a").unwrap();
-    doc.select(&sel)
-        .filter_map(|a| {
-            collapse_whitespace(&a.text().collect::<String>())
-                .parse::<i32>()
-                .ok()
-        })
-        .max()
-}
+/// Hook for [`retry_on_transient_with_hook`]: when TOR is configured,
+/// signal `NEWNYM` so the next navigation draws a fresh exit. Errors
+/// from the controller are logged and swallowed — failing to recircuit
+/// shouldn't take down the crawl, the next attempt just runs on the
+/// same circuit as before.
+async fn recircuit_if_configured(tor: Option<&crate::crawler::tor::TorController>) {
+    if let Some(t) = tor {
+        if let Err(e) = t.new_identity().await {
+            tracing::warn!(error = %e, "TOR NEWNYM failed; retrying on same circuit");
+        }
+    }
+}

 /// Substitutes the first `/N/` path segment with the target page
 /// number. Source impls that paginate via a different URL shape can
 /// override this — for the modeled site the segment is always present.
@@ -853,29 +842,6 @@ mod tests {
         assert_eq!(parse_chapter_number("Special"), None);
     }

-    #[test]
-    fn parse_last_page_picks_highest_pagination_link() {
-        let html = r#"
-            <div id="left_side"><div class="pagination">
-              <a href="/list/1/">Prev</a>
-              <ol>
-                <li><a href="/list/1/">1</a></li>
-                <li><a href="/list/2/">2</a></li>
-                <li><a href="/list/47/">47</a></li>
-                <li><a href="/list/2/">Next</a></li>
-              </ol>
-            </div></div>
-        "#;
-        let doc = scraper::Html::parse_document(html);
-        assert_eq!(parse_last_page(&doc), Some(47));
-    }
-
-    #[test]
-    fn parse_last_page_none_when_no_pagination() {
-        let doc = scraper::Html::parse_document("<html></html>");
-        assert!(parse_last_page(&doc).is_none());
-    }
-
     #[test]
     fn page_url_substitutes_numeric_path_segment() {
         assert_eq!(
@@ -1024,28 +990,6 @@ mod tests {
         assert!(err.is_transient(), "got non-transient: {err}");
     }

-    #[test]
-    fn build_page_order_is_natural_one_to_last() {
-        // Newest-first is just the source's natural pagination order:
-        // (update_date DESC) lives at page 1, oldest at the last page.
-        let order = build_page_order(Some(3));
-        assert_eq!(Vec::from(order), vec![1, 2, 3]);
-    }
-
-    #[test]
-    fn build_page_order_falls_back_to_page_one_only_without_pagination() {
-        // Source surfaced no pagination control — visit page 1 alone
-        // and let the walk end after one batch.
-        let order = build_page_order(None);
-        assert_eq!(Vec::from(order), vec![1]);
-    }
-
-    #[test]
-    fn build_page_order_single_page_index_yields_one_entry() {
-        let order = build_page_order(Some(1));
-        assert_eq!(Vec::from(order), vec![1]);
-    }
-
     #[test]
     fn parse_chapter_list_returns_transient_when_table_missing() {
         // Partial render (post-load JS hadn't injected the table, layout

--- /dev/null
+++ b/backend/src/crawler/status.rs
@@ -0,0 +1,355 @@
//! Live, in-process crawler status.
|
||||||
|
//!
|
||||||
|
//! The metadata pass runs inline in the cron tick (it is not a
|
||||||
|
//! `crawler_jobs` row), so without this surface "what is the crawler doing
|
||||||
|
//! right now" is unanswerable from the dashboard. The daemon publishes its
|
||||||
|
//! current [`Phase`], the chapters being crawled right now (with a live
|
||||||
|
//! page count), and the cover being fetched into a shared [`StatusHandle`];
|
||||||
|
//! the admin endpoint reads a [`CrawlerStatus`] snapshot and composes it
|
||||||
|
//! with DB-derived counts + the session/browser flags.
|
||||||
|
//!
|
||||||
|
//! NOTE: this is per-process state. The deployment is a single server
|
||||||
|
//! (see CLAUDE.md), so an in-memory handle is sufficient; durable signals
|
||||||
|
//! (last-pass summary, runtime session) are persisted in `crawler_state`.
|
||||||
|
|
||||||
|
use std::collections::HashMap;
|
||||||
|
use std::sync::{Arc, Mutex};
|
||||||
|
|
||||||
|
use chrono::{DateTime, Utc};
|
||||||
|
use serde::Serialize;
|
||||||
|
use tokio::sync::{watch, RwLock};
|
||||||
|
use uuid::Uuid;
|
||||||
|
|
||||||
|
use crate::crawler::pipeline::MetadataStats;
|
||||||
|
|
||||||
/// What the daemon's metadata pass is doing right now. Serialised with an
/// internal `state` tag so the frontend can switch on it.
#[derive(Clone, Debug, Serialize)]
#[serde(tag = "state", rename_all = "snake_case")]
pub enum Phase {
    /// Sleeping until the next scheduled metadata pass.
    Idle { next_fire: Option<DateTime<Utc>> },
    /// Walking the source catalog list pages.
    WalkingList,
    /// Fetching one manga's metadata. `index`/`total` drive a progress bar
    /// (`total` is `None` when the source size is unknown / uncapped).
    FetchingMetadata {
        index: usize,
        total: Option<usize>,
        title: String,
    },
    /// Backfilling covers that failed on first attempt. `index`/`total`
    /// track progress through this tick's batch.
    CoverBackfill { index: usize, total: usize },
}

/// A chapter being downloaded right now, with a live page count. Keyed in
/// the status by `chapter_id`; inserted by the dispatcher when a job starts
/// and removed (via an RAII guard) when it finishes, panics, or times out.
#[derive(Clone, Debug, Serialize)]
pub struct ActiveChapter {
    pub manga_id: Uuid,
    pub manga_title: String,
    pub chapter_id: Uuid,
    pub chapter_number: i32,
    pub pages_done: usize,
    /// `None` until the chapter page list has been parsed.
    pub pages_total: Option<usize>,
}

/// The manga whose cover is being downloaded right now.
#[derive(Clone, Debug, Serialize)]
pub struct CoverTarget {
    pub manga_id: Uuid,
    pub manga_title: String,
}

/// Summary of the most recent metadata pass (persisted across restarts in
/// `crawler_state` by the cron; mirrored here for the live read).
#[derive(Clone, Debug, Serialize, Default)]
pub struct LastPass {
    pub at: Option<DateTime<Utc>>,
    pub discovered: usize,
    pub upserted: usize,
    pub covers_fetched: usize,
    pub mangas_failed: usize,
}

/// A point-in-time snapshot returned by [`StatusHandle::snapshot`]. The
/// session/browser/queue fields are composed at read time by the endpoint
/// (they live elsewhere), so they are not stored here.
#[derive(Clone, Debug, Serialize)]
pub struct CrawlerStatus {
    pub phase: Phase,
    /// Number of configured chapter workers (for "N busy / M workers").
    pub worker_count: usize,
    /// Chapters being downloaded right now, with live page counts.
    pub active_chapters: Vec<ActiveChapter>,
    pub last_pass: LastPass,
    /// The cover being downloaded right now, if any.
    pub current_cover: Option<CoverTarget>,
}

/// Scalar status state held under the async `RwLock`. Active chapters live
/// in a separate sync map so per-page updates and RAII removal don't need
/// to `.await` (removal happens in `Drop`).
#[derive(Clone, Debug)]
struct Scalar {
    phase: Phase,
    worker_count: usize,
    last_pass: LastPass,
    current_cover: Option<CoverTarget>,
}

/// Cloneable handle the daemon tasks use to publish status. Cheap to clone
/// (`Arc`). All writers funnel through the helper methods so locking stays
/// localised. Every mutation bumps a `watch` version so SSE subscribers
/// get pushed an update instead of polling.
#[derive(Clone)]
pub struct StatusHandle {
    scalar: Arc<RwLock<Scalar>>,
    /// Currently-downloading chapters keyed by `chapter_id`. A sync mutex so
    /// the RAII [`ChapterGuard`]'s `Drop` can remove without `.await`.
    active: Arc<Mutex<HashMap<Uuid, ActiveChapter>>>,
    /// Monotonic version bumped on every change. SSE handlers `subscribe()`
    /// and `await .changed()` for instant pushes. A `watch` receiver stays
    /// marked as changed until it observes the new value, so a change that
    /// lands between snapshots is never missed (bursts coalesce into one
    /// wakeup).
    version: Arc<watch::Sender<u64>>,
}

/// Lock the active map, recovering from a poisoned mutex (we never hold the
/// lock across a panic-prone section, so the data is still consistent).
fn lock_active(
    m: &Mutex<HashMap<Uuid, ActiveChapter>>,
) -> std::sync::MutexGuard<'_, HashMap<Uuid, ActiveChapter>> {
    m.lock().unwrap_or_else(|e| e.into_inner())
}

impl StatusHandle {
    pub fn new(num_workers: usize) -> Self {
        let (version, _rx) = watch::channel(0u64);
        Self {
            scalar: Arc::new(RwLock::new(Scalar {
                phase: Phase::Idle { next_fire: None },
                worker_count: num_workers.max(1),
                last_pass: LastPass::default(),
                current_cover: None,
            })),
            active: Arc::new(Mutex::new(HashMap::new())),
            version: Arc::new(version),
        }
    }

    fn bump(&self) {
        self.version.send_modify(|v| *v = v.wrapping_add(1));
    }

    /// A receiver whose `.changed()` resolves on the next status change.
    pub fn subscribe(&self) -> watch::Receiver<u64> {
        self.version.subscribe()
    }

    /// Signal a change without mutating in-memory state. Used when an
    /// *external* signal the live snapshot reflects (browser phase,
    /// session-expired flag, queue counts) has changed, so subscribers
    /// recompose promptly.
    pub fn poke(&self) {
        self.bump();
    }

    pub async fn set_phase(&self, phase: Phase) {
        self.scalar.write().await.phase = phase;
        self.bump();
    }

    /// Set (or clear) the cover being downloaded right now.
    pub async fn set_current_cover(&self, cover: Option<CoverTarget>) {
        self.scalar.write().await.current_cover = cover;
        self.bump();
    }

    /// Register a chapter as crawling now; returns a guard that removes it
    /// when dropped (on completion, panic-unwind, or timeout-drop).
    pub fn begin_chapter(&self, chapter: ActiveChapter) -> ChapterGuard {
        let id = chapter.chapter_id;
        lock_active(&self.active).insert(id, chapter);
        self.bump();
        ChapterGuard {
            active: Arc::clone(&self.active),
            version: Arc::clone(&self.version),
            chapter_id: id,
        }
    }

    /// Update the live page count of an in-flight chapter. Sync (no
    /// `.await`) so it's cheap to call once per stored page.
    pub fn set_chapter_pages(&self, chapter_id: Uuid, done: usize, total: Option<usize>) {
        {
            let mut map = lock_active(&self.active);
            if let Some(c) = map.get_mut(&chapter_id) {
                c.pages_done = done;
                c.pages_total = total;
            }
        }
        self.bump();
    }

    /// Record a finished metadata pass, stamping `last_pass.at` with the
    /// caller-supplied `at`.
    pub async fn record_pass(&self, stats: &MetadataStats, at: DateTime<Utc>) {
        self.scalar.write().await.last_pass = LastPass {
            at: Some(at),
            discovered: stats.discovered,
            upserted: stats.upserted,
            covers_fetched: stats.covers_fetched,
            mangas_failed: stats.mangas_failed,
        };
        self.bump();
    }

    /// Seed the last-pass summary from a persisted `crawler_state` value on
    /// startup so the dashboard isn't blank until the first tick.
    pub async fn set_last_pass(&self, last: LastPass) {
        self.scalar.write().await.last_pass = last;
        self.bump();
    }

    pub async fn snapshot(&self) -> CrawlerStatus {
        let scalar = self.scalar.read().await.clone();
        let mut active_chapters: Vec<ActiveChapter> =
            lock_active(&self.active).values().cloned().collect();
        // Stable, readable order: by chapter number then id.
        active_chapters.sort_by(|a, b| {
            a.chapter_number
                .cmp(&b.chapter_number)
                .then(a.chapter_id.cmp(&b.chapter_id))
        });
        CrawlerStatus {
            phase: scalar.phase,
            worker_count: scalar.worker_count,
            active_chapters,
            last_pass: scalar.last_pass,
            current_cover: scalar.current_cover,
        }
    }
}

/// RAII handle removing an [`ActiveChapter`] from the live status when the
/// chapter dispatch finishes, panics, or is dropped on timeout.
pub struct ChapterGuard {
    active: Arc<Mutex<HashMap<Uuid, ActiveChapter>>>,
    version: Arc<watch::Sender<u64>>,
    chapter_id: Uuid,
}

impl Drop for ChapterGuard {
    fn drop(&mut self) {
        lock_active(&self.active).remove(&self.chapter_id);
        self.version.send_modify(|v| *v = v.wrapping_add(1));
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    fn sample_chapter(n: i32) -> ActiveChapter {
        ActiveChapter {
            manga_id: Uuid::new_v4(),
            manga_title: "M".into(),
            chapter_id: Uuid::new_v4(),
            chapter_number: n,
            pages_done: 0,
            pages_total: None,
        }
    }

    #[tokio::test]
    async fn begin_chapter_shows_in_snapshot_and_guard_removes_on_drop() {
        let h = StatusHandle::new(2);
        let chap = sample_chapter(7);
        let cid = chap.chapter_id;
        {
            let _guard = h.begin_chapter(chap);
            let snap = h.snapshot().await;
            assert_eq!(snap.active_chapters.len(), 1);
            assert_eq!(snap.active_chapters[0].chapter_id, cid);
            assert_eq!(snap.worker_count, 2);
        }
        // Guard dropped → entry removed.
        let snap = h.snapshot().await;
        assert!(snap.active_chapters.is_empty());
    }

    #[tokio::test]
    async fn set_chapter_pages_updates_live_count() {
        let h = StatusHandle::new(1);
        let chap = sample_chapter(1);
        let cid = chap.chapter_id;
        let _guard = h.begin_chapter(chap);
        h.set_chapter_pages(cid, 3, Some(20));
        let snap = h.snapshot().await;
        assert_eq!(snap.active_chapters[0].pages_done, 3);
        assert_eq!(snap.active_chapters[0].pages_total, Some(20));
        // Updating an unknown chapter is a no-op, not a panic.
        h.set_chapter_pages(Uuid::new_v4(), 9, Some(9));
    }

    #[tokio::test]
    async fn snapshot_sorts_active_chapters_by_number() {
        let h = StatusHandle::new(2);
        let _g1 = h.begin_chapter(sample_chapter(5));
        let _g2 = h.begin_chapter(sample_chapter(2));
        let snap = h.snapshot().await;
        assert_eq!(snap.active_chapters[0].chapter_number, 2);
        assert_eq!(snap.active_chapters[1].chapter_number, 5);
    }

    #[tokio::test]
    async fn set_current_cover_round_trips() {
        let h = StatusHandle::new(1);
        let mid = Uuid::new_v4();
        h.set_current_cover(Some(CoverTarget {
            manga_id: mid,
            manga_title: "One Piece".into(),
        }))
        .await;
        assert_eq!(
            h.snapshot().await.current_cover.map(|c| c.manga_id),
            Some(mid)
        );
        h.set_current_cover(None).await;
        assert!(h.snapshot().await.current_cover.is_none());
    }

    #[tokio::test]
    async fn record_pass_captures_stats_and_timestamp() {
        let h = StatusHandle::new(1);
        let stats = MetadataStats {
            discovered: 5,
            upserted: 3,
            covers_fetched: 2,
            mangas_failed: 1,
        };
        let at = Utc::now();
        h.record_pass(&stats, at).await;
        let snap = h.snapshot().await;
        assert_eq!(snap.last_pass.discovered, 5);
        assert_eq!(snap.last_pass.upserted, 3);
        assert_eq!(snap.last_pass.at, Some(at));
    }

    #[tokio::test]
    async fn subscribe_resolves_on_mutation_poke_and_chapter_change() {
        let h = StatusHandle::new(1);
        let mut rx = h.subscribe();
        h.set_phase(Phase::WalkingList).await;
        rx.changed().await.unwrap();
        h.poke();
        rx.changed().await.unwrap();
        // begin_chapter + guard drop each bump the version.
        let g = h.begin_chapter(sample_chapter(1));
        rx.changed().await.unwrap();
        drop(g);
        rx.changed().await.unwrap();
    }
}
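The `StatusHandle` bookkeeping (insert an entry, bump a version, let an RAII guard undo both in `Drop`) can be exercised without tokio. The sketch below is a std-only analogue, not the module's code: `Registry` and `Guard` are hypothetical names, entries are keyed by `u32` instead of `Uuid`, and an `AtomicU64` counter stands in for the `tokio::sync::watch` sender, so it shows the insert/remove/version accounting but not the change notification SSE subscribers rely on.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical shared state: in-flight items plus a version counter that is
/// bumped on every change (std-only stand-in for `watch::Sender<u64>`).
#[derive(Clone, Default)]
pub struct Registry {
    active: Arc<Mutex<HashMap<u32, String>>>,
    version: Arc<AtomicU64>,
}

/// Removes its entry and bumps the version when dropped, so the entry goes
/// away on normal return, early `?` return, or panic unwinding.
pub struct Guard {
    active: Arc<Mutex<HashMap<u32, String>>>,
    version: Arc<AtomicU64>,
    id: u32,
}

impl Registry {
    /// Register `id` as in flight and hand back the guard that undoes it.
    pub fn begin(&self, id: u32, label: &str) -> Guard {
        self.active
            .lock()
            .unwrap_or_else(|e| e.into_inner()) // tolerate a poisoned lock
            .insert(id, label.to_string());
        self.version.fetch_add(1, Ordering::SeqCst);
        Guard {
            active: Arc::clone(&self.active),
            version: Arc::clone(&self.version),
            id,
        }
    }

    /// Sorted ids currently in flight (sorted for a stable read order).
    pub fn active_ids(&self) -> Vec<u32> {
        let mut ids: Vec<u32> = self
            .active
            .lock()
            .unwrap_or_else(|e| e.into_inner())
            .keys()
            .copied()
            .collect();
        ids.sort_unstable();
        ids
    }

    pub fn version(&self) -> u64 {
        self.version.load(Ordering::SeqCst)
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        self.active
            .lock()
            .unwrap_or_else(|e| e.into_inner())
            .remove(&self.id);
        self.version.fetch_add(1, Ordering::SeqCst);
    }
}
```

Because removal lives in `Drop`, cleanup does not depend on every exit path remembering to call a "finish" method. The one case it cannot cover is a build with `panic = "abort"`, where destructors never run.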
446
backend/src/crawler/tor.rs
Normal file
@@ -0,0 +1,446 @@
//! TOR control-port client for `SIGNAL NEWNYM` ("recircuit").
//!
//! The crawler can be proxied through TOR (`CRAWLER_PROXY=socks5h://tor:9050`)
//! to randomize the exit IP seen by the target site. When the target
//! returns a "bad page" (its broken-template body, missing layout
//! sentinel, or unauthenticated probe despite a valid PHPSESSID), it
//! is often the current exit being rate-limited or fingerprinted rather
//! than a real failure. Asking the local TOR daemon for a new identity
//! over its control port (port 9051 by default) makes subsequent
//! connections draw a fresh circuit; combined with `IsolateDestAddr`
//! in torrc this is usually enough to clear the failure.
//!
//! Scope is deliberately tiny: `AUTHENTICATE` + `SIGNAL NEWNYM` over
//! a one-shot TCP connection. No `torut` dep, no hidden-service
//! plumbing, no event streaming.
//!
//! **Caveat for in-flight connections:** Chromium reuses sockets, so a
//! `NEWNYM` only affects *new* connections (in TOR terms, new circuits).
//! That's fine for our retry path: the next navigation opens a fresh
//! connection. We do not try to forcibly close existing streams.

use std::path::{Path, PathBuf};
use std::time::Duration;

use anyhow::{anyhow, bail, Context};
use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use tokio::net::TcpStream;
use tokio::time::timeout;

/// Default control port (9051, the conventional TOR `ControlPort`).
const DEFAULT_CONTROL_PORT: u16 = 9051;
/// Connect timeout: generous enough for a slow compose start, short
/// enough that a misconfigured controller doesn't stall a crawl.
const CONNECT_TIMEOUT: Duration = Duration::from_secs(5);
/// Per-command read timeout. `SIGNAL NEWNYM` returns instantly on the
/// happy path; bound it so a half-broken control port can't hang us.
const READ_TIMEOUT: Duration = Duration::from_secs(5);

/// How the controller authenticates to the control port.
///
/// `Cookie` is preferred for compose deploys where the auth cookie file
/// is shared between the `tor` and `backend` containers via a named
/// volume. `Password` is the fallback when the cookie file isn't
/// reachable (different gid, no shared volume, etc.). `None` matches a
/// torrc with no `CookieAuthentication 1` and no `HashedControlPassword`;
/// useful for local experimentation, not for production.
///
/// `Debug` is implemented manually to redact the password (and the
/// cookie path, which is non-sensitive but uninteresting in logs).
/// Don't add `#[derive(Debug)]`: the controller is `?`-logged at
/// startup and a derive would expand the password into the trace.
#[derive(Clone)]
pub enum TorAuth {
    None,
    Password(String),
    Cookie(PathBuf),
}

impl std::fmt::Debug for TorAuth {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            TorAuth::None => f.write_str("None"),
            TorAuth::Password(_) => f.write_str("Password(<redacted>)"),
            TorAuth::Cookie(_) => f.write_str("Cookie(<path>)"),
        }
    }
}

#[derive(Debug, Clone)]
pub struct TorController {
    /// `host:port` string. Kept as a string (not a `SocketAddr`) so
    /// docker-compose hostnames like `tor:9051` resolve at connect time.
    addr: String,
    auth: TorAuth,
}

impl TorController {
    pub fn new(addr: impl Into<String>, auth: TorAuth) -> Self {
        Self { addr: addr.into(), auth }
    }

    /// Build a controller from the env-config shape:
    /// `url` (e.g. `tcp://tor:9051`, `127.0.0.1:9051`, or `tor`),
    /// optional password, optional cookie path. Returns `Ok(None)` when
    /// `url` is absent; that's the "TOR feature disabled" signal.
    /// Cookie wins over password when both are set (rotates with TOR;
    /// no secret to manage).
    pub fn from_parts(
        url: Option<&str>,
        password: Option<&str>,
        cookie_path: Option<&Path>,
    ) -> anyhow::Result<Option<Self>> {
        let Some(url) = url else { return Ok(None) };
        let addr = parse_control_url(url)?;
        let auth = match (cookie_path, password) {
            (Some(p), _) => TorAuth::Cookie(p.to_path_buf()),
            (None, Some(p)) => TorAuth::Password(p.to_string()),
            (None, None) => TorAuth::None,
        };
        Ok(Some(Self { addr, auth }))
    }

    /// Open the control port, `AUTHENTICATE`, `SIGNAL NEWNYM`, `QUIT`.
    /// Each invocation is a fresh connection; the controller is cheap
    /// to clone and stateless across calls.
    pub async fn new_identity(&self) -> anyhow::Result<()> {
        let stream = timeout(CONNECT_TIMEOUT, TcpStream::connect(&self.addr))
            .await
            .with_context(|| {
                format!("timed out connecting to TOR control port {}", self.addr)
            })?
            .with_context(|| format!("connect to TOR control port {}", self.addr))?;
        let (read, mut write) = stream.into_split();
        let mut read = BufReader::new(read);

        let auth_line = self.build_auth_line().await?;
        write_line(&mut write, &auth_line).await?;
        timeout(READ_TIMEOUT, expect_250(&mut read))
            .await
            .map_err(|_| anyhow!("TOR control AUTHENTICATE timed out"))?
            .context("AUTHENTICATE")?;

        write_line(&mut write, "SIGNAL NEWNYM").await?;
        timeout(READ_TIMEOUT, expect_250(&mut read))
            .await
            .map_err(|_| anyhow!("TOR control SIGNAL NEWNYM timed out"))?
            .context("SIGNAL NEWNYM")?;

        // QUIT is courtesy; ignore errors: the daemon may close the
        // socket before our QUIT lands and that's perfectly fine.
        let _ = write_line(&mut write, "QUIT").await;
        // Debug-level: a busy crawl can rotate circuits many times per
        // minute, INFO is too chatty. Failures still log at WARN.
        tracing::debug!(addr = %self.addr, "TOR NEWNYM signaled");
        Ok(())
    }

    async fn build_auth_line(&self) -> anyhow::Result<String> {
        match &self.auth {
            TorAuth::None => Ok("AUTHENTICATE".to_string()),
            TorAuth::Password(p) => Ok(format!("AUTHENTICATE \"{}\"", escape_quoted(p))),
            TorAuth::Cookie(path) => {
                let bytes = tokio::fs::read(path)
                    .await
                    .with_context(|| format!("read TOR cookie file {}", path.display()))?;
                Ok(format!("AUTHENTICATE {}", hex_encode(&bytes)))
            }
        }
    }
}

/// Parse `tcp://host:port`, `host:port`, or bare `host` into a
/// connect-time string. Default port is [`DEFAULT_CONTROL_PORT`].
fn parse_control_url(url: &str) -> anyhow::Result<String> {
    let stripped = url.strip_prefix("tcp://").unwrap_or(url);
    if stripped.is_empty() {
        bail!("TOR control url is empty");
    }
    if stripped.contains(':') {
        Ok(stripped.to_string())
    } else {
        Ok(format!("{stripped}:{DEFAULT_CONTROL_PORT}"))
    }
}

fn escape_quoted(s: &str) -> String {
    s.replace('\\', r"\\").replace('"', r#"\""#)
}

fn hex_encode(bytes: &[u8]) -> String {
    let mut s = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        s.push_str(&format!("{b:02x}"));
    }
    s
}

async fn write_line<W: tokio::io::AsyncWrite + Unpin>(
    w: &mut W,
    line: &str,
) -> anyhow::Result<()> {
    w.write_all(line.as_bytes()).await?;
    w.write_all(b"\r\n").await?;
    w.flush().await?;
    Ok(())
}

/// Drain a TOR control reply, accepting only status `250`. Handles
/// the protocol's three line forms: `XYZ ...` (single/end), `XYZ-...`
/// (continuation), `XYZ+...` (data block ended by a lone `.`). Our
/// commands only ever produce single-line `250 OK`, but we honor the
/// continuation and data-block forms so a multi-line `250` reply doesn't
/// confuse the parser. (Asynchronous `650` events are only sent after
/// `SETEVENTS`, which this client never issues.)
async fn expect_250<R: AsyncBufReadExt + Unpin>(r: &mut R) -> anyhow::Result<()> {
    loop {
        let mut line = String::new();
        let n = r.read_line(&mut line).await?;
        if n == 0 {
            bail!("TOR control port closed connection mid-reply");
        }
        let trimmed = line.trim_end_matches(['\r', '\n']);
        if trimmed.len() < 4 {
            bail!("malformed TOR control reply: {trimmed:?}");
        }
        let (code, rest) = trimmed.split_at(3);
        if code != "250" {
            bail!("TOR control replied {trimmed:?}");
        }
        let sep = rest.as_bytes()[0];
        match sep {
            b' ' => return Ok(()),
            b'-' => continue,
            b'+' => {
                // Data block: read until a line consisting of only ".".
                loop {
                    let mut data = String::new();
                    let n = r.read_line(&mut data).await?;
                    if n == 0 {
                        bail!("TOR control port closed mid-data-block");
                    }
                    if data.trim_end_matches(['\r', '\n']) == "." {
                        break;
                    }
                }
            }
            _ => bail!("malformed TOR control reply separator: {trimmed:?}"),
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::sync::{Arc, Mutex};
    use tokio::io::AsyncWriteExt;
    use tokio::net::TcpListener;

    /// Spawn a mock control port that responds to each \r\n-terminated
    /// inbound line with the next entry from `replies`. Each reply has
    /// its own `\r\n` appended. Records received lines into `recorder`.
    /// After `replies.len()` exchanges the task drops the socket without
    /// acking QUIT; real TOR acks QUIT and then closes, which looks the
    /// same to a client that ignores the QUIT reply.
    async fn spawn_mock(
        replies: Vec<&'static str>,
        recorder: Arc<Mutex<Vec<String>>>,
    ) -> String {
        let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
        let addr = listener.local_addr().unwrap().to_string();
        tokio::spawn(async move {
            let (sock, _) = listener.accept().await.unwrap();
            let (r, mut w) = sock.into_split();
            let mut r = BufReader::new(r);
            for reply in replies {
                let mut line = String::new();
                let n = r.read_line(&mut line).await.unwrap_or(0);
                if n == 0 {
                    return;
                }
                recorder
                    .lock()
                    .unwrap()
                    .push(line.trim_end_matches(['\r', '\n']).to_string());
                w.write_all(reply.as_bytes()).await.unwrap();
                w.write_all(b"\r\n").await.unwrap();
                w.flush().await.unwrap();
            }
        });
        addr
    }

    #[tokio::test]
    async fn password_auth_then_newnym_writes_expected_sequence() {
        let recorder = Arc::new(Mutex::new(Vec::new()));
        // Two replies: AUTHENTICATE then SIGNAL NEWNYM. QUIT is
        // fire-and-forget; the mock dropping the socket is the
        // expected real-world behavior.
        let addr =
            spawn_mock(vec!["250 OK", "250 OK"], Arc::clone(&recorder)).await;
        let controller = TorController::new(addr, TorAuth::Password("secret".into()));
        controller.new_identity().await.expect("new_identity ok");
        let recorded = recorder.lock().unwrap().clone();
        assert_eq!(recorded.first().map(String::as_str), Some("AUTHENTICATE \"secret\""));
        assert_eq!(recorded.get(1).map(String::as_str), Some("SIGNAL NEWNYM"));
    }

    #[tokio::test]
    async fn cookie_auth_hex_encodes_file_bytes() {
        let tmp = tempfile::NamedTempFile::new().unwrap();
        let cookie: Vec<u8> = (0u8..32).collect();
        std::fs::write(tmp.path(), &cookie).unwrap();
        let recorder = Arc::new(Mutex::new(Vec::new()));
        let addr =
            spawn_mock(vec!["250 OK", "250 OK"], Arc::clone(&recorder)).await;
        let controller =
            TorController::new(addr, TorAuth::Cookie(tmp.path().to_path_buf()));
        controller.new_identity().await.expect("new_identity ok");
        let recorded = recorder.lock().unwrap().clone();
        let expected_hex: String = cookie.iter().map(|b| format!("{b:02x}")).collect();
        assert_eq!(
            recorded.first().map(String::as_str),
            Some(format!("AUTHENTICATE {expected_hex}").as_str())
        );
    }

    #[tokio::test]
    async fn no_auth_sends_bare_authenticate() {
        let recorder = Arc::new(Mutex::new(Vec::new()));
        let addr =
            spawn_mock(vec!["250 OK", "250 OK"], Arc::clone(&recorder)).await;
        let controller = TorController::new(addr, TorAuth::None);
        controller.new_identity().await.expect("new_identity ok");
        let recorded = recorder.lock().unwrap().clone();
        assert_eq!(recorded.first().map(String::as_str), Some("AUTHENTICATE"));
    }

    #[tokio::test]
    async fn non_250_reply_returns_err_with_reply_text() {
        let recorder = Arc::new(Mutex::new(Vec::new()));
        let addr = spawn_mock(
            vec!["515 Bad authentication"],
            Arc::clone(&recorder),
        )
        .await;
        let controller =
            TorController::new(addr, TorAuth::Password("wrong".into()));
        let err = controller.new_identity().await.expect_err("should fail");
        let msg = format!("{err:#}");
        assert!(msg.contains("515"), "expected 515 in error, got: {msg}");
    }

    #[tokio::test]
    async fn closed_connection_mid_reply_is_an_error() {
        // Listener accepts the AUTH line then drops without replying. This
        // exercises the EOF-mid-reply path in expect_250 (rather than
        // tor's own error replies, which are covered elsewhere).
        let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
        let addr = listener.local_addr().unwrap().to_string();
        tokio::spawn(async move {
            if let Ok((sock, _)) = listener.accept().await {
                let (r, _w) = sock.into_split();
                let mut r = BufReader::new(r);
                let mut line = String::new();
                let _ = r.read_line(&mut line).await; // read AUTH, ignore
                // Drop _w (and the read half via scope exit) so the
                // peer sees an immediate EOF on the next read.
            }
        });
        let controller = TorController::new(addr, TorAuth::None);
        let err = controller.new_identity().await.expect_err("should fail");
        let msg = format!("{err:#}");
        assert!(
            msg.contains("closed connection"),
            "expected EOF-mid-reply error, got: {msg}"
        );
    }

    #[tokio::test]
    async fn multi_line_250_continuation_is_accepted() {
        let recorder = Arc::new(Mutex::new(Vec::new()));
        // AUTHENTICATE reply uses the `250-...\r\n250 OK\r\n` form.
        // Single reply string contains the whole multi-line response.
        let addr = spawn_mock(
            vec!["250-banner=foo\r\n250 OK", "250 OK"],
            Arc::clone(&recorder),
        )
        .await;
        let controller = TorController::new(addr, TorAuth::None);
        controller.new_identity().await.expect("new_identity ok");
    }

    #[test]
    fn from_parts_returns_none_when_url_unset() {
        let c = TorController::from_parts(None, None, None).unwrap();
        assert!(c.is_none());
    }

    #[test]
    fn from_parts_prefers_cookie_over_password() {
        let c = TorController::from_parts(
            Some("tor:9051"),
            Some("pw"),
            Some(Path::new("/var/lib/tor/control_auth_cookie")),
        )
        .unwrap()
        .expect("controller built");
        assert!(matches!(c.auth, TorAuth::Cookie(_)));
    }

    #[test]
    fn from_parts_falls_back_to_password_without_cookie() {
        let c = TorController::from_parts(Some("tor:9051"), Some("pw"), None)
            .unwrap()
            .expect("controller built");
        assert!(matches!(c.auth, TorAuth::Password(p) if p == "pw"));
    }

|
#[test]
|
||||||
|
fn parse_control_url_accepts_tcp_scheme() {
|
||||||
|
assert_eq!(parse_control_url("tcp://127.0.0.1:9051").unwrap(), "127.0.0.1:9051");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn parse_control_url_defaults_port_when_omitted() {
|
||||||
|
assert_eq!(parse_control_url("tor").unwrap(), "tor:9051");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn parse_control_url_passes_through_host_port() {
|
||||||
|
assert_eq!(parse_control_url("tor:9999").unwrap(), "tor:9999");
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn parse_control_url_rejects_empty() {
|
||||||
|
assert!(parse_control_url("").is_err());
|
||||||
|
assert!(parse_control_url("tcp://").is_err());
|
||||||
|
}
|
||||||
|
|
||||||
|
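The four `parse_control_url_*` tests pin down the normalisation rules without showing the implementation. Below is one way to satisfy them, sketched as a hypothetical stand-alone function (the crate's real one may differ). It strips an optional `tcp://` prefix, rejects an empty remainder, and appends 9051, the conventional Tor control port, when no numeric port is present.

```rust
/// Conventional Tor control port; assumed default for this sketch.
const DEFAULT_CONTROL_PORT: u16 = 9051;

/// Hypothetical re-implementation consistent with the tests above.
fn parse_control_url(raw: &str) -> Result<String, String> {
    let trimmed = raw.trim();
    // Accept both "tcp://host:port" and bare "host:port".
    let rest = trimmed.strip_prefix("tcp://").unwrap_or(trimmed);
    if rest.is_empty() {
        return Err(format!("empty Tor control URL: {raw:?}"));
    }
    // Treat the text after the last ':' as a port only if it parses as one,
    // so a bracketed IPv6 literal such as "[::1]" still gets the default.
    let has_port = rest
        .rsplit_once(':')
        .map_or(false, |(_, port)| port.parse::<u16>().is_ok());
    if has_port {
        Ok(rest.to_string())
    } else {
        Ok(format!("{rest}:{}", DEFAULT_CONTROL_PORT))
    }
}
```

Checking the suffix with `u16` parsing, rather than testing for any `:`, is a design choice that keeps bracketed IPv6 hosts working. Whether the crate handles that case is not visible in this diff.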
+    #[test]
+    fn escape_quoted_handles_quotes_and_backslashes() {
+        assert_eq!(escape_quoted(r#"a"b\c"#), r#"a\"b\\c"#);
+    }
+
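`escape_quoted` and `hex_encode` are presumably used to build the `AUTHENTICATE` command. The control-spec grammar, as we understand it, is `AUTHENTICATE [ SP 1*HEXDIG / QuotedString ] CRLF`: cookie authentication sends the cookie file's bytes hex-encoded, and password authentication sends a quoted string. The sketch below shows both encodings. `Credential` and `authenticate_line` are illustrative names, and the two helpers are re-implementations consistent with the tests, not the crate's code.

```rust
/// Re-implementation consistent with the test: backslash-escape the two
/// characters that would otherwise end or corrupt a QuotedString.
fn escape_quoted(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        if ch == '"' || ch == '\\' {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Re-implementation consistent with the test: two lowercase hex digits
/// per byte, zero-padded so 0x0f becomes "0f" rather than "f".
fn hex_encode(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Hypothetical credential type for this sketch only.
enum Credential<'a> {
    None,
    Password(&'a str),
    Cookie(&'a [u8]),
}

/// Illustrative AUTHENTICATE line builder. Passwords containing CR or LF
/// are not handled here.
fn authenticate_line(cred: &Credential<'_>) -> String {
    match cred {
        Credential::None => "AUTHENTICATE\r\n".to_string(),
        Credential::Password(p) => format!("AUTHENTICATE \"{}\"\r\n", escape_quoted(p)),
        Credential::Cookie(bytes) => format!("AUTHENTICATE {}\r\n", hex_encode(bytes)),
    }
}
```

The zero-padding matters because the cookie is raw binary. Without `{:02x}`, a byte such as `0x0f` would emit a single digit and shift every following byte.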
+    #[test]
+    fn debug_format_redacts_password_and_cookie_path() {
+        // Regression: app.rs / bin/crawler.rs log the controller at
+        // startup via `tracing::info!(?t, ...)`. A derived Debug on
+        // TorAuth would expand TorAuth::Password(p) and leak the
+        // plaintext into logs.
+        let c = TorController::new("tor:9051", TorAuth::Password("super-secret".into()));
+        let dbg = format!("{c:?}");
+        assert!(!dbg.contains("super-secret"), "password leaked: {dbg}");
+        assert!(dbg.contains("<redacted>"), "expected <redacted>, got: {dbg}");
+
+        let c = TorController::new(
+            "tor:9051",
+            TorAuth::Cookie("/var/lib/tor/control_auth_cookie".into()),
+        );
+        let dbg = format!("{c:?}");
+        assert!(!dbg.contains("control_auth_cookie"), "cookie path leaked: {dbg}");
+    }
+
+    #[test]
+    fn hex_encode_zero_pads_low_bytes() {
+        assert_eq!(hex_encode(&[0x00, 0x0f, 0xff]), "000fff");
+    }
+}
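The redaction regression test implies a hand-written `Debug` for `TorAuth`. Below is a minimal sketch, assuming the enum has the three variants the tests construct (`None`, `Password(String)`, `Cookie(PathBuf)`). The struct and its field names are hypothetical. Keeping the variant name in the output preserves the diagnostic value of the startup log while dropping the secret.

```rust
use std::fmt;
use std::path::PathBuf;

/// Hypothetical mirror of the crate's auth enum, for illustration only.
enum TorAuth {
    None,
    Password(String),
    Cookie(PathBuf),
}

// Hand-written Debug: the variant name stays visible for diagnostics, but
// the password and the cookie location are replaced with a marker.
impl fmt::Debug for TorAuth {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TorAuth::None => f.write_str("None"),
            TorAuth::Password(_) => f.write_str("Password(<redacted>)"),
            TorAuth::Cookie(_) => f.write_str("Cookie(<redacted>)"),
        }
    }
}

/// Hypothetical controller shape. Deriving Debug is safe here because the
/// only sensitive field delegates to the redacting impl above.
#[derive(Debug)]
struct TorController {
    addr: String,
    auth: TorAuth,
}
```

Because the redaction lives on the enum, any struct that derives `Debug` and embeds it inherits the safe output. Log call sites such as `tracing::info!(?t, ...)` need no changes.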
@@ -91,6 +91,26 @@ pub fn registrable_domain(url: &str) -> Option<String> {
     Some(format!(".{}", registrable.join(".")))
 }

+/// Normalise a SOCKS proxy URL for Chromium's `--proxy-server=` flag.
+///
+/// reqwest accepts both `socks5://` (resolve locally) and
+/// `socks5h://` (resolve via the SOCKS server — important when the
+/// proxy is TOR and we don't want the host's resolver to see the
+/// target hostname). Chromium does **not** know the `socks5h` scheme
+/// and refuses navigations with `ERR_NO_SUPPORTED_PROXIES`. It
+/// already sends destination hostnames over SOCKS5 by default
+/// regardless, so stripping the `h` is a pure scheme rename — the
+/// remote-DNS behaviour is preserved.
+///
+/// Non-SOCKS schemes pass through unchanged.
+pub fn chromium_proxy_arg(proxy: &str) -> String {
+    if let Some(rest) = proxy.strip_prefix("socks5h://") {
+        format!("socks5://{rest}")
+    } else {
+        proxy.to_string()
+    }
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
@@ -191,4 +211,34 @@ mod tests {
             Some("[2001:db8::1]")
         );
     }
+
+    #[test]
+    fn chromium_proxy_arg_strips_socks5h_to_socks5() {
+        // Regression: passing socks5h:// to Chromium yields
+        // ERR_NO_SUPPORTED_PROXIES at navigation time.
+        assert_eq!(
+            chromium_proxy_arg("socks5h://127.0.0.1:9050"),
+            "socks5://127.0.0.1:9050"
+        );
+        assert_eq!(
+            chromium_proxy_arg("socks5h://tor:9050"),
+            "socks5://tor:9050"
+        );
+    }
+
+    #[test]
+    fn chromium_proxy_arg_passes_socks5_unchanged() {
+        assert_eq!(
+            chromium_proxy_arg("socks5://127.0.0.1:9050"),
+            "socks5://127.0.0.1:9050"
+        );
+    }
+
+    #[test]
+    fn chromium_proxy_arg_passes_non_socks_unchanged() {
+        assert_eq!(
+            chromium_proxy_arg("http://proxy.example:8080"),
+            "http://proxy.example:8080"
+        );
+    }
 }
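Here is a usage sketch for `chromium_proxy_arg`. The same proxy string that configures reqwest can be normalised and passed to Chromium through its `--proxy-server=` switch. `proxy_switch` is a hypothetical launcher helper, and the function body is copied from the hunk above so the block compiles on its own.

```rust
/// Copy of `chromium_proxy_arg` from the hunk above.
fn chromium_proxy_arg(proxy: &str) -> String {
    if let Some(rest) = proxy.strip_prefix("socks5h://") {
        format!("socks5://{rest}")
    } else {
        proxy.to_string()
    }
}

/// Hypothetical launcher helper: turn an optional proxy URL shared with the
/// HTTP client (e.g. "socks5h://tor:9050") into a Chromium switch.
fn proxy_switch(proxy: Option<&str>) -> Option<String> {
    proxy.map(|p| format!("--proxy-server={}", chromium_proxy_arg(p)))
}
```

`strip_prefix` is case-sensitive, so a value written as `SOCKS5H://...` would pass through unchanged. If configuration may contain upper-case schemes, lower-casing the scheme before the check would close that gap.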
@@ -21,6 +21,11 @@ pub enum AppError {
     PayloadTooLarge(String),
     #[error("unsupported media type: {0}")]
     UnsupportedMediaType(String),
+    /// 503 — a feature is currently unavailable, distinct from a 5xx
+    /// internal error. Used when admin actions require the crawler
+    /// daemon but it's been disabled (`CRAWLER_DAEMON=false`).
+    #[error("service unavailable: {0}")]
+    ServiceUnavailable(String),
     /// 429 with an optional `Retry-After` header value (in seconds).
     #[error("too many requests")]
     TooManyRequests {
@@ -56,6 +61,7 @@ impl AppError {
             AppError::Conflict(_) => "conflict",
             AppError::PayloadTooLarge(_) => "payload_too_large",
             AppError::UnsupportedMediaType(_) => "unsupported_media_type",
+            AppError::ServiceUnavailable(_) => "service_unavailable",
             AppError::TooManyRequests { .. } => "too_many_requests",
             AppError::ValidationFailed { .. } => "validation_failed",
             AppError::Database(sqlx::Error::RowNotFound) => "not_found",
@@ -85,6 +91,9 @@ impl IntoResponse for AppError {
             AppError::UnsupportedMediaType(msg) => {
                 (StatusCode::UNSUPPORTED_MEDIA_TYPE, msg.clone(), None)
             }
+            AppError::ServiceUnavailable(msg) => {
+                (StatusCode::SERVICE_UNAVAILABLE, msg.clone(), None)
+            }
             AppError::TooManyRequests { retry_after_secs } => {
                 // Emit `Retry-After: N` (RFC 6585 §4) so a well-behaved
                 // client can back off correctly. Done by building the
@@ -12,15 +12,20 @@ pub async fn list_for_manga(
     limit: i64,
     offset: i64,
 ) -> AppResult<Vec<Chapter>> {
-    // Secondary sort by created_at gives duplicate-numbered chapters
-    // (multiple uploaders/translations of the same number) a stable
-    // order in lists and prev/next reader navigation.
+    // Display order = source-site order reversed. The crawler stamps
+    // `source_index` = position in the source DOM (0 = first = newest
+    // on this site, see migration 0021), so DESC puts the oldest
+    // chapter first and keeps the site's variant grouping and the
+    // placement of non-numeric entries (e.g. "notice. : Officials")
+    // intact. NULLS LAST keeps user-uploaded chapters (no source row)
+    // and rows that pre-date the migration below crawled rows; the
+    // (number, created_at) tail then orders them deterministically.
     let rows = sqlx::query_as::<_, Chapter>(
         r#"
         SELECT id, manga_id, number, title, page_count, created_at
         FROM chapters
         WHERE manga_id = $1
-        ORDER BY number ASC, created_at ASC
+        ORDER BY source_index DESC NULLS LAST, number ASC, created_at ASC
         LIMIT $2 OFFSET $3
         "#,
     )
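The new `ORDER BY source_index DESC NULLS LAST, number ASC, created_at ASC` is easier to reason about as a comparator. PostgreSQL sorts NULLs first under `DESC` by default, which is why the query spells out `NULLS LAST`. The sketch below models only the three sort columns with a hypothetical `Row` type and reproduces that ordering in memory.

```rust
use std::cmp::Ordering;

/// Hypothetical in-memory stand-in for a `chapters` row; only the sort
/// columns are modelled.
#[derive(Debug, Clone)]
struct Row {
    source_index: Option<i32>, // NULL for user uploads / pre-0021 rows
    number: i32,
    created_at: i64, // plain integer timestamp for illustration
}

/// Mirrors `ORDER BY source_index DESC NULLS LAST, number ASC, created_at ASC`.
fn display_order(a: &Row, b: &Row) -> Ordering {
    let by_source = match (a.source_index, b.source_index) {
        (Some(x), Some(y)) => y.cmp(&x),       // DESC: higher index first
        (Some(_), None) => Ordering::Less,     // NULLS LAST
        (None, Some(_)) => Ordering::Greater,
        (None, None) => Ordering::Equal,
    };
    by_source
        .then(a.number.cmp(&b.number))
        .then(a.created_at.cmp(&b.created_at))
}
```

Sorting crawled rows with indices 0, 1 and 2 plus two user uploads yields index 2, 1, 0 (oldest crawled chapter first), followed by the uploads ordered by number.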
@@ -133,14 +138,18 @@ pub async fn page_count(pool: &PgPool, id: Uuid) -> sqlx::Result<Option<i32>> {
 /// filter — this resolver stays in lockstep so a chapter that was
 /// dropped between enqueue and lease isn't dispatched against a stale
 /// URL.
+/// Returns `(manga_id, source_url, manga_title, chapter_number)`. The
+/// title + number feed the live "currently crawling" status; the rest is
+/// what the dispatcher needs to do the work.
 pub async fn dispatch_target(
     pool: &PgPool,
     chapter_id: Uuid,
-) -> sqlx::Result<Option<(Uuid, String)>> {
+) -> sqlx::Result<Option<(Uuid, String, String, i32)>> {
     sqlx::query_as(
-        "SELECT c.manga_id, cs.source_url \
+        "SELECT c.manga_id, cs.source_url, m.title, c.number \
          FROM chapters c \
          JOIN chapter_sources cs ON cs.chapter_id = c.id \
+         JOIN mangas m ON m.id = c.manga_id \
          WHERE c.id = $1 \
            AND cs.dropped_at IS NULL \
          ORDER BY cs.last_seen_at DESC \
@@ -17,8 +17,9 @@
 //! Each public function is a transaction boundary so a partial failure
 //! mid-call leaves the DB in its pre-call state.

-use chrono::Utc;
-use sqlx::{PgPool, Postgres, Transaction};
+use chrono::{DateTime, Utc};
+use serde::Serialize;
+use sqlx::{FromRow, PgPool, Postgres, Transaction};
 use uuid::Uuid;

 use crate::crawler::source::{SourceChapterRef, SourceManga};
@@ -352,7 +353,14 @@ pub async fn sync_manga_chapters(
         .map(|c| c.source_chapter_key.clone())
         .collect();

-    for c in chapters {
+    for (idx, c) in chapters.iter().enumerate() {
+        // `source_index` captures the chapter's position in the source
+        // DOM (0 = first = newest on this site) so the list query can
+        // reverse it for the user-facing list — see migration 0021.
+        // Every sync overwrites the value on both branches, so a new
+        // chapter inserted at the top of the source shifts every other
+        // row down by one on the next tick.
+        let source_index = idx as i32;
         // Lookup is constrained by manga_id (via the chapters join) so a
         // source whose chapter slugs collide across mangas (e.g.
         // "chapter-1" appearing under two different mangas) attributes
@@ -382,14 +390,15 @@ pub async fn sync_manga_chapters(
                 // identity is the UUID, not the number.
                 let (chapter_id,): (Uuid,) = sqlx::query_as(
                     r#"
-                    INSERT INTO chapters (manga_id, number, title, page_count)
-                    VALUES ($1, $2, $3, 0)
+                    INSERT INTO chapters (manga_id, number, title, page_count, source_index)
+                    VALUES ($1, $2, $3, 0, $4)
                     RETURNING id
                     "#,
                 )
                 .bind(manga_id)
                 .bind(c.number)
                 .bind(c.title.as_deref())
+                .bind(source_index)
                 .fetch_one(&mut *tx)
                 .await?;
                 sqlx::query(
@@ -408,8 +417,11 @@ pub async fn sync_manga_chapters(
                 diff.new += 1;
             }
             Some((chapter_id,)) => {
-                sqlx::query("UPDATE chapters SET title = $1 WHERE id = $2")
+                sqlx::query(
+                    "UPDATE chapters SET title = $1, source_index = $2 WHERE id = $3",
+                )
                 .bind(c.title.as_deref())
+                .bind(source_index)
                 .bind(chapter_id)
                 .execute(&mut *tx)
                 .await?;
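The comment in `sync_manga_chapters` says every sync restamps `source_index`, so a chapter added at the top of the source shifts the others down. The small model below makes that concrete. `stamp` and `display` are hypothetical, and chapter keys stand in for rows. After `ch-3` appears, `ch-1` moves from index 1 to 2, yet the user-facing order stays oldest-first.

```rust
/// Hypothetical model of one sync: the source lists chapters newest-first,
/// and each chapter's `source_index` is overwritten with its position.
fn stamp(source_listing: &[&str]) -> Vec<(String, i32)> {
    source_listing
        .iter()
        .enumerate()
        .map(|(idx, key)| (key.to_string(), idx as i32))
        .collect()
}

/// Keys in user-facing order (`source_index DESC`): oldest chapter first.
fn display(stamped: &[(String, i32)]) -> Vec<String> {
    let mut rows = stamped.to_vec();
    rows.sort_by(|a, b| b.1.cmp(&a.1));
    rows.into_iter().map(|(k, _)| k).collect()
}
```

Because both the insert and the update branch rebind `source_index`, stale positions never survive a sync. The absolute numbers change, but the relative order the list query depends on is preserved.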
@@ -542,6 +554,51 @@ pub async fn mark_run_completed(pool: &PgPool, source_id: &str) -> sqlx::Result<
     Ok(())
 }

+/// List mangas whose `cover_image_path IS NULL` but a live
+/// `manga_sources` row still attaches them to a source. The bounded
+/// result feeds the cover-backfill pass in [`crate::crawler::pipeline`]:
+/// each entry is one (manga, freshest source row) pair where a cover
+/// re-download is in order.
+///
+/// Per-manga deduplication uses `DISTINCT ON (m.id)` keyed on the row
+/// with the newest `last_seen_at`, so a manga that's surfaced by
+/// multiple sources only produces one row (the freshest). Sort is
+/// stable for tests.
+pub async fn list_missing_covers(
+    pool: &PgPool,
+    max: i64,
+) -> sqlx::Result<Vec<MissingCoverEntry>> {
+    let rows: Vec<(Uuid, String, String)> = sqlx::query_as(
+        r#"
+        SELECT DISTINCT ON (m.id) m.id, ms.source_manga_key, ms.source_url
+        FROM mangas m
+        JOIN manga_sources ms ON ms.manga_id = m.id
+        WHERE m.cover_image_path IS NULL
+          AND ms.dropped_at IS NULL
+        ORDER BY m.id, ms.last_seen_at DESC
+        LIMIT $1
+        "#,
+    )
+    .bind(max)
+    .fetch_all(pool)
+    .await?;
+    Ok(rows
+        .into_iter()
+        .map(|(manga_id, source_manga_key, source_url)| MissingCoverEntry {
+            manga_id,
+            source_manga_key,
+            source_url,
+        })
+        .collect())
+}
+
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct MissingCoverEntry {
+    pub manga_id: Uuid,
+    pub source_manga_key: String,
+    pub source_url: String,
+}
+
 /// Read the recovery flag for `source_id`. A missing row OR an
 /// unparseable value reads as `true` ("clean") — the former covers the
 /// first-ever run on a virgin DB (no recovery needed), the latter
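`DISTINCT ON (m.id)` keeps the first row of each `m.id` group under the query's `ORDER BY`. PostgreSQL requires the `DISTINCT ON` expressions to match the leftmost `ORDER BY` expressions, hence `ORDER BY m.id, ms.last_seen_at DESC`. The in-memory sketch below shows the effect, with integers standing in for UUIDs and timestamps. `Row` and `freshest_per_manga` are hypothetical. One consequence is worth noting: `LIMIT $1` applies after deduplication in `m.id` order, so each bounded batch is chosen by id, not by how long the cover has been missing.

```rust
use std::collections::BTreeMap;

/// Hypothetical joined row: (manga_id, last_seen_at, source_url).
type Row = (u32, i64, &'static str);

/// Mirrors `SELECT DISTINCT ON (m.id) ... ORDER BY m.id, ms.last_seen_at DESC
/// LIMIT max`: one row per manga, the freshest source wins, output by id.
/// On a `last_seen_at` tie the first row seen is kept; SQL leaves ties
/// unspecified.
fn freshest_per_manga(rows: &[Row], max: usize) -> Vec<Row> {
    let mut best: BTreeMap<u32, Row> = BTreeMap::new();
    for &row in rows {
        best.entry(row.0)
            .and_modify(|cur| {
                if row.1 > cur.1 {
                    *cur = row;
                }
            })
            .or_insert(row);
    }
    // BTreeMap iterates in key order, matching the leading `ORDER BY m.id`.
    best.into_values().take(max).collect()
}
```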
@@ -562,3 +619,327 @@ pub async fn last_run_completed_cleanly(
         .unwrap_or(true))
 }
+
+// ---------------------------------------------------------------------------
+// Dead-letter jobs: admin observability + requeue.
+// ---------------------------------------------------------------------------
+
+/// A `dead` crawler job joined to its chapter/manga context for the admin
+/// dead-letter view. Chapter columns are `Option` because the join is
+/// best-effort (the chapter may have been removed since the job died, or
+/// the job may be a non-chapter kind).
+#[derive(Debug, Clone, Serialize, FromRow)]
+pub struct DeadJob {
+    pub id: Uuid,
+    pub kind: String,
+    pub chapter_id: Option<Uuid>,
+    pub manga_id: Option<Uuid>,
+    pub manga_title: Option<String>,
+    pub chapter_number: Option<i32>,
+    pub attempts: i32,
+    pub max_attempts: i32,
+    pub last_error: Option<String>,
+    pub updated_at: DateTime<Utc>,
+}
+
+/// Paginated list of `dead` jobs, newest-failed first, joined to chapter +
+/// manga context. `search` filters on manga title (case-insensitive
+/// substring). Returns the page slice plus the unfiltered-by-page total.
+pub async fn list_dead_jobs(
+    pool: &PgPool,
+    search: Option<&str>,
+    limit: i64,
+    offset: i64,
+) -> sqlx::Result<(Vec<DeadJob>, i64)> {
+    let search_pat = search
+        .map(|s| format!("%{}%", s.trim()))
+        .filter(|p| p.len() > 2);
+
+    let items: Vec<DeadJob> = sqlx::query_as(
+        r#"
+        SELECT
+            cj.id,
+            cj.payload->>'kind' AS kind,
+            (cj.payload->>'chapter_id')::uuid AS chapter_id,
+            c.manga_id AS manga_id,
+            m.title AS manga_title,
+            c.number AS chapter_number,
+            cj.attempts,
+            cj.max_attempts,
+            cj.last_error,
+            cj.updated_at
+        FROM crawler_jobs cj
+        LEFT JOIN chapters c ON c.id = (cj.payload->>'chapter_id')::uuid
+        LEFT JOIN mangas m ON m.id = c.manga_id
+        WHERE cj.state = 'dead'
+          AND ($1::text IS NULL OR m.title ILIKE $1)
+        ORDER BY cj.updated_at DESC
+        LIMIT $2 OFFSET $3
+        "#,
+    )
+    .bind(&search_pat)
+    .bind(limit)
+    .bind(offset)
+    .fetch_all(pool)
+    .await?;
+
+    let total: i64 = sqlx::query_scalar(
+        r#"
+        SELECT COUNT(*)
+        FROM crawler_jobs cj
+        LEFT JOIN chapters c ON c.id = (cj.payload->>'chapter_id')::uuid
+        LEFT JOIN mangas m ON m.id = c.manga_id
+        WHERE cj.state = 'dead'
+          AND ($1::text IS NULL OR m.title ILIKE $1)
+        "#,
+    )
+    .bind(&search_pat)
+    .fetch_one(pool)
+    .await?;
+
+    Ok((items, total))
+}
+
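All three list functions build `search_pat` the same way. Since the pattern adds exactly two `%` characters, `p.len() > 2` is equivalent to "non-empty after trimming". Characters typed by the admin are not escaped, so `%` and `_` in a search act as wildcards. If literal matching is wanted, a variant along these lines would work. `search_pattern` is a hypothetical helper, and it relies on backslash being PostgreSQL's default escape character for `LIKE`/`ILIKE`.

```rust
/// Hypothetical variant of the `search_pat` construction: trims the input,
/// treats blank input as "no filter", and backslash-escapes LIKE
/// metacharacters so `%` and `_` match literally.
fn search_pattern(search: Option<&str>) -> Option<String> {
    let needle = search?.trim();
    if needle.is_empty() {
        return None;
    }
    let mut pat = String::with_capacity(needle.len() + 2);
    pat.push('%');
    for ch in needle.chars() {
        // Backslash is PostgreSQL's default LIKE/ILIKE escape character.
        if matches!(ch, '\\' | '%' | '_') {
            pat.push('\\');
        }
        pat.push(ch);
    }
    pat.push('%');
    Some(pat)
}
```

The returned `Option<String>` binds the same way as `search_pat`, so the `$1::text IS NULL OR m.title ILIKE $1` predicate would not need to change.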
+/// An in-flight chapter-content job (`pending` or `running`) joined to its
+/// chapter + manga, for the "queued chapters" admin view.
+#[derive(Debug, Clone, Serialize, FromRow)]
+pub struct ActiveJob {
+    pub id: Uuid,
+    pub chapter_id: Option<Uuid>,
+    pub manga_id: Option<Uuid>,
+    pub manga_title: Option<String>,
+    pub chapter_number: Option<i32>,
+    /// `"pending"` or `"running"`.
+    pub state: String,
+    pub attempts: i32,
+    pub max_attempts: i32,
+    pub updated_at: DateTime<Utc>,
+}
+
+/// Paginated list of `pending`/`running` chapter-content jobs (which
+/// chapters of which mangas are queued or being crawled). Running first,
+/// then by scheduled order. `search` filters on manga title.
+pub async fn list_active_jobs(
+    pool: &PgPool,
+    search: Option<&str>,
+    limit: i64,
+    offset: i64,
+) -> sqlx::Result<(Vec<ActiveJob>, i64)> {
+    let search_pat = search
+        .map(|s| format!("%{}%", s.trim()))
+        .filter(|p| p.len() > 2);
+
+    let items: Vec<ActiveJob> = sqlx::query_as(
+        r#"
+        SELECT
+            cj.id,
+            (cj.payload->>'chapter_id')::uuid AS chapter_id,
+            c.manga_id AS manga_id,
+            m.title AS manga_title,
+            c.number AS chapter_number,
+            cj.state,
+            cj.attempts,
+            cj.max_attempts,
+            cj.updated_at
+        FROM crawler_jobs cj
+        LEFT JOIN chapters c ON c.id = (cj.payload->>'chapter_id')::uuid
+        LEFT JOIN mangas m ON m.id = c.manga_id
+        WHERE cj.state IN ('pending','running')
+          AND cj.payload->>'kind' = 'sync_chapter_content'
+          AND ($1::text IS NULL OR m.title ILIKE $1)
+        ORDER BY (cj.state = 'running') DESC, cj.scheduled_at, cj.created_at
+        LIMIT $2 OFFSET $3
+        "#,
+    )
+    .bind(&search_pat)
+    .bind(limit)
+    .bind(offset)
+    .fetch_all(pool)
+    .await?;
+
+    let total: i64 = sqlx::query_scalar(
+        r#"
+        SELECT COUNT(*)
+        FROM crawler_jobs cj
+        LEFT JOIN chapters c ON c.id = (cj.payload->>'chapter_id')::uuid
+        LEFT JOIN mangas m ON m.id = c.manga_id
+        WHERE cj.state IN ('pending','running')
+          AND cj.payload->>'kind' = 'sync_chapter_content'
+          AND ($1::text IS NULL OR m.title ILIKE $1)
+        "#,
+    )
+    .bind(&search_pat)
+    .fetch_one(pool)
+    .await?;
+
+    Ok((items, total))
+}
+
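`ORDER BY (cj.state = 'running') DESC, ...` relies on PostgreSQL ordering `false` before `true`, so `DESC` puts running jobs first. Rust's `bool` orders the same way, so the in-memory equivalent wraps the flag in `Reverse`. `Job` and `sort_active` below are hypothetical, with integers standing in for timestamps.

```rust
use std::cmp::Reverse;

/// Hypothetical in-memory job row.
#[derive(Debug, Clone, PartialEq)]
struct Job {
    id: u32,
    state: &'static str, // "pending" or "running"
    scheduled_at: i64,
    created_at: i64,
}

/// Mirrors `ORDER BY (state = 'running') DESC, scheduled_at, created_at`.
/// `false < true` for bool, so `Reverse` on the flag puts running jobs first.
fn sort_active(jobs: &mut [Job]) {
    jobs.sort_by_key(|j| (Reverse(j.state == "running"), j.scheduled_at, j.created_at));
}
```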
+/// A manga whose cover is still missing (queued for cover fetch).
+#[derive(Debug, Clone, Serialize, FromRow)]
+pub struct MissingCoverRow {
+    pub manga_id: Uuid,
+    pub manga_title: String,
+}
+
+/// Count mangas with no cover yet but a live source row — the cover
+/// backlog the metadata pass + backfill drain.
+pub async fn count_missing_covers(pool: &PgPool) -> sqlx::Result<i64> {
+    sqlx::query_scalar(
+        r#"
+        SELECT COUNT(*) FROM mangas m
+        WHERE m.cover_image_path IS NULL
+          AND EXISTS (
+              SELECT 1 FROM manga_sources ms
+              WHERE ms.manga_id = m.id AND ms.dropped_at IS NULL
+          )
+        "#,
+    )
+    .fetch_one(pool)
+    .await
+}
+
+/// Paginated list of mangas queued for a cover fetch (no cover yet + a live
+/// source), with titles. `search` filters on title. Freshest source first.
+pub async fn list_missing_cover_mangas(
+    pool: &PgPool,
+    search: Option<&str>,
+    limit: i64,
+    offset: i64,
+) -> sqlx::Result<(Vec<MissingCoverRow>, i64)> {
+    let search_pat = search
+        .map(|s| format!("%{}%", s.trim()))
+        .filter(|p| p.len() > 2);
+
+    let items: Vec<MissingCoverRow> = sqlx::query_as(
+        r#"
+        SELECT m.id AS manga_id, m.title AS manga_title
+        FROM mangas m
+        WHERE m.cover_image_path IS NULL
+          AND EXISTS (
+              SELECT 1 FROM manga_sources ms
+              WHERE ms.manga_id = m.id AND ms.dropped_at IS NULL
+          )
+          AND ($1::text IS NULL OR m.title ILIKE $1)
+        ORDER BY m.updated_at DESC
+        LIMIT $2 OFFSET $3
+        "#,
+    )
+    .bind(&search_pat)
+    .bind(limit)
+    .bind(offset)
+    .fetch_all(pool)
+    .await?;
+
+    let total: i64 = sqlx::query_scalar(
+        r#"
+        SELECT COUNT(*) FROM mangas m
+        WHERE m.cover_image_path IS NULL
+          AND EXISTS (
+              SELECT 1 FROM manga_sources ms
+              WHERE ms.manga_id = m.id AND ms.dropped_at IS NULL
+          )
+          AND ($1::text IS NULL OR m.title ILIKE $1)
+        "#,
+    )
+    .bind(&search_pat)
+    .fetch_one(pool)
+    .await?;
+
+    Ok((items, total))
+}
+
+/// Scope of a dead-job requeue.
+#[derive(Debug, Clone)]
+pub enum RequeueScope {
+    /// Every dead job.
+    All,
+    /// Dead jobs whose chapter belongs to this manga.
+    Manga(Uuid),
+    /// Dead jobs for a single chapter.
+    Chapter(Uuid),
+    /// A single dead job by its id.
+    Job(Uuid),
+}
+
+/// Requeue dead jobs back to `pending` with a fresh attempt budget. This is
+/// an explicit operator override, so it bypasses the dead-letter quarantine
+/// the enqueue helpers honour (we act directly on the row). Returns the
+/// number of rows requeued.
+///
+/// Two invariants protect the partial unique dedup index
+/// `crawler_jobs_chapter_content_dedup_idx` (one `pending|running`
+/// sync_chapter_content job per chapter):
+/// 1. A chapter that already has a live (`pending|running`) job is
+///    skipped entirely (`NO_LIVE_DUP`).
+/// 2. When a chapter has *multiple* dead jobs, only the newest is
+///    revived (`DISTINCT ON` the chapter key) — without this, flipping
+///    two dead rows for the same chapter to `pending` in one statement
+///    would violate the index and abort the whole requeue. Non-chapter
+///    jobs fall back to their row id so each stays distinct.
+pub async fn requeue_dead_jobs(pool: &PgPool, scope: RequeueScope) -> sqlx::Result<u64> {
+    // Scope predicate spliced into the `pick` CTE. Only compile-time
+    // literals are interpolated; all values are bound below.
+    let scope_pred: &str = match scope {
+        RequeueScope::All => "",
+        RequeueScope::Manga(_) => {
+            "AND (cj.payload->>'chapter_id')::uuid IN \
+             (SELECT id FROM chapters WHERE manga_id = $1)"
+        }
+        RequeueScope::Chapter(_) => "AND (cj.payload->>'chapter_id')::uuid = $1",
+        RequeueScope::Job(_) => "AND cj.id = $1",
+    };
+
+    let sql = format!(
+        r#"
+        WITH pick AS (
+            SELECT DISTINCT ON (COALESCE(cj.payload->>'chapter_id', cj.id::text)) cj.id
+            FROM crawler_jobs cj
+            WHERE cj.state = 'dead'
+              {scope_pred}
+              AND NOT EXISTS (
+                  SELECT 1 FROM crawler_jobs live
+                  WHERE live.payload->>'kind' = 'sync_chapter_content'
+                    AND live.payload->>'chapter_id' = cj.payload->>'chapter_id'
+                    AND live.state IN ('pending','running')
+              )
+            ORDER BY COALESCE(cj.payload->>'chapter_id', cj.id::text), cj.updated_at DESC
+        )
+        UPDATE crawler_jobs
+        SET state = 'pending', attempts = 0, leased_until = NULL,
+            last_error = NULL, scheduled_at = now(), updated_at = now()
+        FROM pick
+        WHERE crawler_jobs.id = pick.id
+        "#
+    );
+
+    let mut q = sqlx::query(&sql);
+    match scope {
+        RequeueScope::All => {}
+        RequeueScope::Manga(id) | RequeueScope::Chapter(id) | RequeueScope::Job(id) => {
+            q = q.bind(id);
+        }
+    }
+    Ok(q.execute(pool).await?.rows_affected())
+}
+
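The two invariants in the `requeue_dead_jobs` doc comment are easier to check against a model. The sketch below reproduces the `pick` CTE for `RequeueScope::All` over a hypothetical in-memory `Job` type. Chapters with a live job are skipped, and each remaining chapter contributes only its newest dead job. Chapterless jobs are keyed by their own id, as `COALESCE(cj.payload->>'chapter_id', cj.id::text)` does. The model assumes every live job is a `sync_chapter_content` job.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory job row; integers stand in for UUIDs/timestamps.
#[derive(Debug, Clone, Copy)]
struct Job {
    id: u32,
    chapter: Option<u32>, // None for non-chapter job kinds
    state: &'static str,  // "pending" | "running" | "dead"
    updated_at: i64,
}

/// Mirrors the `pick` CTE with scope = All: returns the ids that would flip
/// back to `pending`, sorted for stable comparison.
fn pick_requeue(jobs: &[Job]) -> Vec<u32> {
    // Invariant 1: chapters that already have a live job are skipped.
    let live: HashSet<u32> = jobs
        .iter()
        .filter(|j| j.state == "pending" || j.state == "running")
        .filter_map(|j| j.chapter)
        .collect();
    // Invariant 2: one revived job per key, the newest by updated_at.
    // The bool keeps chapter ids and job ids in separate key spaces.
    let mut newest: HashMap<(bool, u32), Job> = HashMap::new();
    for j in jobs.iter().filter(|j| j.state == "dead") {
        if j.chapter.map_or(false, |c| live.contains(&c)) {
            continue;
        }
        let key = match j.chapter {
            Some(c) => (true, c),
            None => (false, j.id),
        };
        let slot = newest.entry(key).or_insert(*j);
        if j.updated_at > slot.updated_at {
            *slot = *j;
        }
    }
    let mut ids: Vec<u32> = newest.values().map(|j| j.id).collect();
    ids.sort_unstable();
    ids
}
```

With two dead jobs for one chapter, a dead job whose chapter already has a pending job, and two chapterless dead jobs, only the newest job of the first chapter and both chapterless jobs are picked. That is the outcome the partial unique index requires.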
+/// Count crawler jobs grouped by state — drives the dashboard queue
+/// gauges. Returns `(pending, running, dead)`.
+pub async fn job_state_counts(pool: &PgPool) -> sqlx::Result<(i64, i64, i64)> {
+    let rows: Vec<(String, i64)> =
+        sqlx::query_as("SELECT state, COUNT(*) FROM crawler_jobs GROUP BY state")
+            .fetch_all(pool)
+            .await?;
+    let mut pending = 0;
+    let mut running = 0;
+    let mut dead = 0;
+    for (state, n) in rows {
+        match state.as_str() {
+            "pending" => pending = n,
+            "running" => running = n,
+            "dead" => dead = n,
+            _ => {}
+        }
+    }
+    Ok((pending, running, dead))
+}
+
344
backend/tests/api_admin_crawler.rs
Normal file
@@ -0,0 +1,344 @@
+//! Integration tests for the admin crawler observability/control API.
+//!
+//! The default test harness wires `AppState.crawler = None` (no daemon),
+//! so the *control* endpoints return 503 and the *read* endpoints that
+//! work off the DB (status shell, dead-jobs list/requeue) still function.
+//! This is exactly the production "daemon disabled" posture.
+
+mod common;
+
+use std::time::Duration;
+
+use axum::http::StatusCode;
+use axum::Router;
+use http_body_util::BodyExt;
+use serde_json::json;
+use sqlx::PgPool;
+use tower::ServiceExt;
+use uuid::Uuid;
+
+use common::{body_json, get, get_with_cookie, post_json_with_cookie, register_user, harness};
+
+async fn seed_admin(pool: &PgPool, app: &Router) -> String {
+    let (username, cookie) = register_user(app).await;
+    let u = mangalord::repo::user::find_by_username(pool, &username)
+        .await
+        .unwrap()
+        .unwrap();
+    mangalord::repo::user::set_is_admin_unchecked(pool, u.id, true)
+        .await
+        .unwrap();
+    cookie
+}
+
|
async fn seed_dead_job(pool: &PgPool, title: &str) -> Uuid {
|
||||||
|
let manga_id = Uuid::new_v4();
|
||||||
|
let chapter_id = Uuid::new_v4();
|
||||||
|
sqlx::query("INSERT INTO mangas (id, title) VALUES ($1, $2)")
|
||||||
|
.bind(manga_id)
|
||||||
|
.bind(title)
|
||||||
|
.execute(pool)
|
||||||
|
.await
|
||||||
|
.unwrap();
|
||||||
|
sqlx::query("INSERT INTO chapters (id, manga_id, number) VALUES ($1, $2, 1)")
|
||||||
|
.bind(chapter_id)
|
||||||
|
.bind(manga_id)
|
||||||
|
.execute(pool)
|
||||||
|
.await
|
||||||
|
.unwrap();
|
||||||
|
let job_id = Uuid::new_v4();
|
||||||
|
sqlx::query(
|
||||||
|
"INSERT INTO crawler_jobs (id, payload, state, attempts, last_error) \
|
||||||
|
VALUES ($1, $2, 'dead', 5, 'boom')",
|
||||||
|
)
|
||||||
|
.bind(job_id)
|
||||||
|
.bind(json!({
|
||||||
|
"kind": "sync_chapter_content",
|
||||||
|
"source_id": "target",
|
||||||
|
"chapter_id": chapter_id,
|
||||||
|
"source_chapter_key": "k",
|
||||||
|
}))
|
||||||
|
.execute(pool)
|
||||||
|
.await
|
||||||
|
.unwrap();
|
||||||
|
job_id
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Seed a chapter-content job in a given state ('pending'/'running').
|
||||||
|
async fn seed_job(pool: &PgPool, title: &str, state: &str) {
|
||||||
|
let manga_id = Uuid::new_v4();
|
||||||
|
let chapter_id = Uuid::new_v4();
|
||||||
|
sqlx::query("INSERT INTO mangas (id, title) VALUES ($1, $2)")
|
||||||
|
.bind(manga_id)
|
||||||
|
.bind(title)
|
||||||
|
.execute(pool)
|
||||||
|
.await
|
||||||
|
.unwrap();
|
||||||
|
sqlx::query("INSERT INTO chapters (id, manga_id, number) VALUES ($1, $2, 1)")
|
||||||
|
.bind(chapter_id)
|
||||||
|
.bind(manga_id)
|
||||||
|
.execute(pool)
|
||||||
|
.await
|
||||||
|
.unwrap();
|
||||||
|
sqlx::query("INSERT INTO crawler_jobs (id, payload, state) VALUES ($1, $2, $3)")
|
||||||
|
.bind(Uuid::new_v4())
|
||||||
|
.bind(json!({
|
||||||
|
"kind": "sync_chapter_content",
|
||||||
|
"source_id": "target",
|
||||||
|
"chapter_id": chapter_id,
|
||||||
|
"source_chapter_key": "k",
|
||||||
|
}))
|
||||||
|
.bind(state)
|
||||||
|
.execute(pool)
|
||||||
|
.await
|
||||||
|
.unwrap();
|
||||||
|
}

/// Seed a manga with no cover + a live source row (queued for cover fetch).
async fn seed_missing_cover(pool: &PgPool, title: &str) {
    let manga_id = Uuid::new_v4();
    sqlx::query("INSERT INTO mangas (id, title, cover_image_path) VALUES ($1, $2, NULL)")
        .bind(manga_id)
        .bind(title)
        .execute(pool)
        .await
        .unwrap();
    sqlx::query("INSERT INTO sources (id, name, base_url) VALUES ('target','T','http://x') ON CONFLICT DO NOTHING")
        .execute(pool)
        .await
        .unwrap();
    sqlx::query(
        "INSERT INTO manga_sources (source_id, source_manga_key, manga_id, source_url) \
         VALUES ('target', $1, $2, 'http://x/m')",
    )
    .bind(format!("k-{manga_id}"))
    .bind(manga_id)
    .execute(pool)
    .await
    .unwrap();
}

#[sqlx::test(migrations = "./migrations")]
async fn active_jobs_and_covers_lists_over_http(pool: PgPool) {
    seed_job(&pool, "Naruto", "pending").await;
    seed_job(&pool, "Bleach", "running").await;
    seed_missing_cover(&pool, "One Piece").await;
    let h = harness(pool.clone());
    let cookie = seed_admin(&pool, &h.app).await;

    // Queued/active chapters.
    let resp = h
        .app
        .clone()
        .oneshot(get_with_cookie("/api/v1/admin/crawler/active-jobs", &cookie))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::OK);
    let body = body_json(resp).await;
    assert_eq!(body["page"]["total"], 2);

    // Queued covers.
    let resp = h
        .app
        .clone()
        .oneshot(get_with_cookie("/api/v1/admin/crawler/covers", &cookie))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::OK);
    let body = body_json(resp).await;
    assert_eq!(body["page"]["total"], 1);
    assert_eq!(body["items"][0]["manga_title"], "One Piece");

    // Both are admin-gated; spot-check active-jobs with a non-admin cookie.
    let (_u, plain) = register_user(&h.app).await;
    let resp = h
        .app
        .clone()
        .oneshot(get_with_cookie("/api/v1/admin/crawler/active-jobs", &plain))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::FORBIDDEN);
}

#[sqlx::test(migrations = "./migrations")]
async fn get_status_requires_admin(pool: PgPool) {
    let h = harness(pool);
    // Unauthenticated → 401.
    let resp = h.app.clone().oneshot(get("/api/v1/admin/crawler")).await.unwrap();
    assert_eq!(resp.status(), StatusCode::UNAUTHORIZED);

    // Authenticated non-admin → 403.
    let (_u, cookie) = register_user(&h.app).await;
    let resp = h
        .app
        .clone()
        .oneshot(get_with_cookie("/api/v1/admin/crawler", &cookie))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::FORBIDDEN);
}

#[sqlx::test(migrations = "./migrations")]
async fn get_status_reports_disabled_daemon_with_queue_counts(pool: PgPool) {
    seed_dead_job(&pool, "Naruto").await;
    let h = harness(pool.clone());
    let cookie = seed_admin(&pool, &h.app).await;

    let resp = h
        .app
        .clone()
        .oneshot(get_with_cookie("/api/v1/admin/crawler", &cookie))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::OK);
    let body = body_json(resp).await;
    assert_eq!(body["daemon"], "disabled");
    assert_eq!(body["queue"]["dead"], 1);
    assert_eq!(body["browser"], "down");
}

#[sqlx::test(migrations = "./migrations")]
async fn control_endpoints_return_503_when_daemon_disabled(pool: PgPool) {
    let h = harness(pool.clone());
    let cookie = seed_admin(&pool, &h.app).await;
    for uri in [
        "/api/v1/admin/crawler/run",
        "/api/v1/admin/crawler/browser/restart",
        "/api/v1/admin/crawler/session/clear-expired",
    ] {
        let resp = h
            .app
            .clone()
            .oneshot(post_json_with_cookie(uri, json!({}), &cookie))
            .await
            .unwrap();
        assert_eq!(
            resp.status(),
            StatusCode::SERVICE_UNAVAILABLE,
            "{uri} should be 503 when daemon disabled"
        );
    }
}

#[sqlx::test(migrations = "./migrations")]
async fn status_stream_requires_admin(pool: PgPool) {
    let h = harness(pool);
    // Unauthenticated → 401.
    let resp = h
        .app
        .clone()
        .oneshot(get("/api/v1/admin/crawler/stream"))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::UNAUTHORIZED);
    // Non-admin → 403.
    let (_u, cookie) = register_user(&h.app).await;
    let resp = h
        .app
        .clone()
        .oneshot(get_with_cookie("/api/v1/admin/crawler/stream", &cookie))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::FORBIDDEN);
}

#[sqlx::test(migrations = "./migrations")]
async fn status_stream_emits_initial_event(pool: PgPool) {
    let h = harness(pool.clone());
    let cookie = seed_admin(&pool, &h.app).await;

    let resp = h
        .app
        .clone()
        .oneshot(get_with_cookie("/api/v1/admin/crawler/stream", &cookie))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::OK);
    let ct = resp
        .headers()
        .get(axum::http::header::CONTENT_TYPE)
        .and_then(|v| v.to_str().ok())
        .unwrap_or_default()
        .to_string();
    assert!(ct.starts_with("text/event-stream"), "content-type was {ct:?}");

    // Accumulate frames (the immediate snapshot may arrive split across
    // frames) until the status payload appears, with an overall timeout so
    // the never-ending stream can't hang the test.
    let mut body = resp.into_body();
    let mut acc = String::new();
    let deadline = tokio::time::timeout(Duration::from_secs(5), async {
        loop {
            let Some(frame) = body.frame().await else { break };
            if let Ok(data) = frame.expect("frame ok").into_data() {
                acc.push_str(&String::from_utf8_lossy(&data));
                if acc.contains("\"daemon\"") {
                    break;
                }
            }
        }
    })
    .await;
    assert!(deadline.is_ok(), "did not receive status within 5s; got: {acc:?}");
    assert!(acc.contains("\"daemon\""), "missing status payload: {acc}");
    assert!(acc.contains("status"), "missing SSE event name: {acc}");
}

#[sqlx::test(migrations = "./migrations")]
async fn mutating_endpoints_reject_non_admin(pool: PgPool) {
    let h = harness(pool);
    // A logged-in non-admin must be forbidden from a mutating endpoint.
    let (_u, cookie) = register_user(&h.app).await;
    let resp = h
        .app
        .clone()
        .oneshot(post_json_with_cookie(
            "/api/v1/admin/crawler/dead-jobs/requeue",
            json!({ "scope": "all" }),
            &cookie,
        ))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::FORBIDDEN);
}

#[sqlx::test(migrations = "./migrations")]
async fn dead_jobs_list_and_requeue_over_http(pool: PgPool) {
    let job_id = seed_dead_job(&pool, "Bleach").await;
    let h = harness(pool.clone());
    let cookie = seed_admin(&pool, &h.app).await;

    // List.
    let resp = h
        .app
        .clone()
        .oneshot(get_with_cookie("/api/v1/admin/crawler/dead-jobs", &cookie))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::OK);
    let body = body_json(resp).await;
    assert_eq!(body["page"]["total"], 1);
    assert_eq!(body["items"][0]["manga_title"], "Bleach");

    // Requeue the single job.
    let resp = h
        .app
        .clone()
        .oneshot(post_json_with_cookie(
            "/api/v1/admin/crawler/dead-jobs/requeue",
            json!({ "scope": "job", "job_id": job_id }),
            &cookie,
        ))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::OK);
    let body = body_json(resp).await;
    assert_eq!(body["requeued"], 1);

    let state: String = sqlx::query_scalar("SELECT state FROM crawler_jobs WHERE id = $1")
        .bind(job_id)
        .fetch_one(&pool)
        .await
        .unwrap();
    assert_eq!(state, "pending");
}
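`status_stream_emits_initial_event` stops reading once `"daemon"` shows up in the byte accumulator and then only checks that the text `status` occurs somewhere, which would also pass if that word appeared inside the JSON payload. A stricter check could parse the accumulated text into SSE events first. Below is a minimal, std-only sketch, assuming the endpoint emits standard `event:` / `data:` fields with each event terminated by a blank line; `SseEvent` and `parse_sse_events` are hypothetical helpers, not part of this codebase.

```rust
/// One parsed Server-Sent Events message: the optional `event:` name and
/// the `data:` lines joined with '\n'.
#[derive(Debug, PartialEq)]
pub struct SseEvent {
    pub event: Option<String>,
    pub data: String,
}

/// Hypothetical helper: parse every *complete* event in `buf`. Events end at
/// a blank line, so a trailing partial event is left unparsed and the
/// function can be called again on the same, growing accumulator.
pub fn parse_sse_events(buf: &str) -> Vec<SseEvent> {
    // Normalise CRLF so both line-ending styles split the same way.
    let normalised = buf.replace("\r\n", "\n");
    let mut blocks: Vec<&str> = normalised.split("\n\n").collect();
    // The last block has no terminating blank line yet; it may be incomplete.
    blocks.pop();

    let mut events = Vec::new();
    for block in blocks {
        let mut event = None;
        let mut data_lines: Vec<&str> = Vec::new();
        for line in block.lines() {
            // Lines starting with ':' are comments (for example keep-alives).
            if line.starts_with(':') {
                continue;
            }
            // "field: value"; a single space after the colon is optional.
            let (field, value) = match line.split_once(':') {
                Some((f, v)) => (f, v.strip_prefix(' ').unwrap_or(v)),
                None => (line, ""),
            };
            match field {
                "event" => event = Some(value.to_string()),
                "data" => data_lines.push(value),
                _ => {} // `id`, `retry` and unknown fields are ignored here.
            }
        }
        if event.is_some() || !data_lines.is_empty() {
            events.push(SseEvent { event, data: data_lines.join("\n") });
        }
    }
    events
}
```

In the test loop, the exit condition could become `parse_sse_events(&acc).iter().any(|e| e.event.as_deref() == Some("status") && e.data.contains("\"daemon\""))`, which also waits for the terminating blank line instead of stopping mid-event.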

350
backend/tests/api_admin_resync.rs
Normal file
@@ -0,0 +1,350 @@
//! Integration tests for the admin force-resync endpoints.
//!
//! Real resync work requires Chromium, so these tests swap in a stub
//! [`ResyncService`] to assert the handler-level contract: routing,
//! admin gate, 503 when the daemon is disabled, 404 / 422 mapping for
//! missing-resource / no-source cases, and the audit-log side effect.

mod common;

use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};

use async_trait::async_trait;
use axum::http::StatusCode;
use serde_json::json;
use sqlx::PgPool;
use tower::ServiceExt;
use uuid::Uuid;

use mangalord::crawler::resync::{
    ChapterResyncOutcome, MangaResyncOutcome, ResyncError, ResyncService,
};
use mangalord::repo;
use mangalord::repo::crawler::UpsertStatus;

/// Stub that records call counts and returns a canned outcome.
struct StubResync {
    manga_calls: AtomicUsize,
    chapter_calls: AtomicUsize,
    /// When true, returns NoMangaSource / NoChapterSource.
    no_source: bool,
}

impl StubResync {
    fn new() -> Arc<Self> {
        Arc::new(Self {
            manga_calls: AtomicUsize::new(0),
            chapter_calls: AtomicUsize::new(0),
            no_source: false,
        })
    }
    fn no_source() -> Arc<Self> {
        Arc::new(Self {
            manga_calls: AtomicUsize::new(0),
            chapter_calls: AtomicUsize::new(0),
            no_source: true,
        })
    }
}

#[async_trait]
impl ResyncService for StubResync {
    async fn resync_manga(&self, manga_id: Uuid) -> anyhow::Result<MangaResyncOutcome> {
        self.manga_calls.fetch_add(1, Ordering::SeqCst);
        if self.no_source {
            return Err(ResyncError::NoMangaSource.into());
        }
        Ok(MangaResyncOutcome {
            manga_id,
            metadata_status: UpsertStatus::Updated,
            cover_fetched: true,
        })
    }
    async fn resync_chapter(&self, chapter_id: Uuid) -> anyhow::Result<ChapterResyncOutcome> {
        self.chapter_calls.fetch_add(1, Ordering::SeqCst);
        if self.no_source {
            return Err(ResyncError::NoChapterSource.into());
        }
        Ok(ChapterResyncOutcome::Fetched {
            chapter_id,
            pages: 7,
        })
    }
}

async fn promote_admin(pool: &PgPool, username: &str) {
    let u = repo::user::find_by_username(pool, username)
        .await
        .unwrap()
        .unwrap();
    repo::user::set_is_admin_unchecked(pool, u.id, true)
        .await
        .unwrap();
}

async fn insert_manga(pool: &PgPool, title: &str) -> Uuid {
    let (id,): (Uuid,) = sqlx::query_as(
        "INSERT INTO mangas (title, status, alt_titles) VALUES ($1, 'ongoing', ARRAY[]::text[]) RETURNING id",
    )
    .bind(title)
    .fetch_one(pool)
    .await
    .unwrap();
    id
}

async fn insert_chapter(pool: &PgPool, manga_id: Uuid, number: i32, pages: i32) -> Uuid {
    let (id,): (Uuid,) = sqlx::query_as(
        "INSERT INTO chapters (manga_id, number, title, page_count) VALUES ($1, $2, NULL, $3) RETURNING id",
    )
    .bind(manga_id)
    .bind(number)
    .bind(pages)
    .fetch_one(pool)
    .await
    .unwrap();
    id
}

// ----- manga resync ---------------------------------------------------------

#[sqlx::test(migrations = "./migrations")]
async fn manga_resync_calls_service_and_returns_refreshed_detail(pool: PgPool) {
    let stub = StubResync::new();
    let h = common::harness_with_resync(pool.clone(), stub.clone());
    let (username, cookie) = common::register_user(&h.app).await;
    promote_admin(&pool, &username).await;
    let manga_id = insert_manga(&pool, "Hello").await;

    let resp = h
        .app
        .oneshot(common::post_json_with_cookie(
            &format!("/api/v1/admin/mangas/{manga_id}/resync"),
            json!({}),
            &cookie,
        ))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::OK);
    let body = common::body_json(resp).await;
    // Stub returned Updated + cover_fetched=true.
    assert_eq!(body["metadata_status"], "updated");
    assert_eq!(body["cover_fetched"], true);
    // Response includes the refreshed manga detail.
    assert_eq!(body["manga"]["id"], manga_id.to_string());
    assert_eq!(body["manga"]["title"], "Hello");

    assert_eq!(stub.manga_calls.load(Ordering::SeqCst), 1);

    // Audit row written.
    let (audit_count,): (i64,) =
        sqlx::query_as("SELECT count(*) FROM admin_audit WHERE action = 'manga_resync' AND target_id = $1")
            .bind(manga_id)
            .fetch_one(&pool)
            .await
            .unwrap();
    assert_eq!(audit_count, 1);
}

#[sqlx::test(migrations = "./migrations")]
async fn manga_resync_returns_404_for_unknown_id(pool: PgPool) {
    let stub = StubResync::new();
    let h = common::harness_with_resync(pool.clone(), stub.clone());
    let (username, cookie) = common::register_user(&h.app).await;
    promote_admin(&pool, &username).await;

    let resp = h
        .app
        .oneshot(common::post_json_with_cookie(
            &format!("/api/v1/admin/mangas/{}/resync", Uuid::new_v4()),
            json!({}),
            &cookie,
        ))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::NOT_FOUND);
    // Service must not have been called when the manga doesn't exist.
    assert_eq!(stub.manga_calls.load(Ordering::SeqCst), 0);
}

#[sqlx::test(migrations = "./migrations")]
async fn manga_resync_maps_no_source_to_422(pool: PgPool) {
    let stub = StubResync::no_source();
    let h = common::harness_with_resync(pool.clone(), stub);
    let (username, cookie) = common::register_user(&h.app).await;
    promote_admin(&pool, &username).await;
    let manga_id = insert_manga(&pool, "Manual upload, no crawler source").await;

    let resp = h
        .app
        .oneshot(common::post_json_with_cookie(
            &format!("/api/v1/admin/mangas/{manga_id}/resync"),
            json!({}),
            &cookie,
        ))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::UNPROCESSABLE_ENTITY);
    let body = common::body_json(resp).await;
    assert_eq!(body["error"]["details"]["manga"], "no_source");
}

#[sqlx::test(migrations = "./migrations")]
async fn manga_resync_returns_503_when_daemon_disabled(pool: PgPool) {
    let h = common::harness(pool.clone());
    let (username, cookie) = common::register_user(&h.app).await;
    promote_admin(&pool, &username).await;
    let manga_id = insert_manga(&pool, "Z").await;

    let resp = h
        .app
        .oneshot(common::post_json_with_cookie(
            &format!("/api/v1/admin/mangas/{manga_id}/resync"),
            json!({}),
            &cookie,
        ))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::SERVICE_UNAVAILABLE);
    let body = common::body_json(resp).await;
    assert_eq!(body["error"]["code"], "service_unavailable");
}

#[sqlx::test(migrations = "./migrations")]
async fn manga_resync_requires_admin(pool: PgPool) {
    let stub = StubResync::new();
    let h = common::harness_with_resync(pool.clone(), stub);
    // Non-admin user.
    let (_u, cookie) = common::register_user(&h.app).await;
    let manga_id = insert_manga(&pool, "M").await;
    let resp = h
        .app
        .oneshot(common::post_json_with_cookie(
            &format!("/api/v1/admin/mangas/{manga_id}/resync"),
            json!({}),
            &cookie,
        ))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::FORBIDDEN);
}

// ----- chapter resync -------------------------------------------------------

#[sqlx::test(migrations = "./migrations")]
async fn chapter_resync_calls_service_and_returns_refreshed_chapter(pool: PgPool) {
    let stub = StubResync::new();
    let h = common::harness_with_resync(pool.clone(), stub.clone());
    let (username, cookie) = common::register_user(&h.app).await;
    promote_admin(&pool, &username).await;
    let manga_id = insert_manga(&pool, "M").await;
    let chapter_id = insert_chapter(&pool, manga_id, 1, 0).await;

    let resp = h
        .app
        .oneshot(common::post_json_with_cookie(
            &format!("/api/v1/admin/chapters/{chapter_id}/resync"),
            json!({}),
            &cookie,
        ))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::OK);
    let body = common::body_json(resp).await;
    assert_eq!(body["outcome"], "fetched");
    assert_eq!(body["pages"], 7);
    assert_eq!(body["chapter"]["id"], chapter_id.to_string());
    assert_eq!(stub.chapter_calls.load(Ordering::SeqCst), 1);

    let (audit_count,): (i64,) = sqlx::query_as(
        "SELECT count(*) FROM admin_audit WHERE action = 'chapter_resync' AND target_id = $1",
    )
    .bind(chapter_id)
    .fetch_one(&pool)
    .await
    .unwrap();
    assert_eq!(audit_count, 1);
}

#[sqlx::test(migrations = "./migrations")]
async fn chapter_resync_returns_404_for_unknown_id(pool: PgPool) {
    let stub = StubResync::new();
    let h = common::harness_with_resync(pool.clone(), stub.clone());
    let (username, cookie) = common::register_user(&h.app).await;
    promote_admin(&pool, &username).await;

    let resp = h
        .app
        .oneshot(common::post_json_with_cookie(
            &format!("/api/v1/admin/chapters/{}/resync", Uuid::new_v4()),
            json!({}),
            &cookie,
        ))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::NOT_FOUND);
    assert_eq!(stub.chapter_calls.load(Ordering::SeqCst), 0);
}

#[sqlx::test(migrations = "./migrations")]
async fn chapter_resync_maps_no_source_to_422(pool: PgPool) {
    let stub = StubResync::no_source();
    let h = common::harness_with_resync(pool.clone(), stub);
    let (username, cookie) = common::register_user(&h.app).await;
    promote_admin(&pool, &username).await;
    let manga_id = insert_manga(&pool, "M").await;
    let chapter_id = insert_chapter(&pool, manga_id, 1, 0).await;

    let resp = h
        .app
        .oneshot(common::post_json_with_cookie(
            &format!("/api/v1/admin/chapters/{chapter_id}/resync"),
            json!({}),
            &cookie,
        ))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::UNPROCESSABLE_ENTITY);
    let body = common::body_json(resp).await;
    assert_eq!(body["error"]["details"]["chapter"], "no_source");
}

#[sqlx::test(migrations = "./migrations")]
async fn chapter_resync_returns_503_when_daemon_disabled(pool: PgPool) {
    let h = common::harness(pool.clone());
    let (username, cookie) = common::register_user(&h.app).await;
    promote_admin(&pool, &username).await;
    let manga_id = insert_manga(&pool, "M").await;
    let chapter_id = insert_chapter(&pool, manga_id, 1, 0).await;

    let resp = h
        .app
        .oneshot(common::post_json_with_cookie(
            &format!("/api/v1/admin/chapters/{chapter_id}/resync"),
            json!({}),
            &cookie,
        ))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::SERVICE_UNAVAILABLE);
}

#[sqlx::test(migrations = "./migrations")]
async fn chapter_resync_requires_admin(pool: PgPool) {
    let stub = StubResync::new();
    let h = common::harness_with_resync(pool.clone(), stub);
    let (_u, cookie) = common::register_user(&h.app).await;
    let manga_id = insert_manga(&pool, "M").await;
    let chapter_id = insert_chapter(&pool, manga_id, 1, 0).await;
    let resp = h
        .app
        .oneshot(common::post_json_with_cookie(
            &format!("/api/v1/admin/chapters/{chapter_id}/resync"),
            json!({}),
            &cookie,
        ))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::FORBIDDEN);
}
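Taken together, the resync tests pin a small handler contract: 503 when no resync service is wired up (`resync: None`), 404 when the target row does not exist (and the service is never called), 422 with an `error.details` entry of `no_source` when the service reports no crawler source, and 200 otherwise. A std-only sketch of that decision table follows. `Target`, `NoSource`, `Reply` and `resync_status` are hypothetical stand-ins rather than the project's `ResyncError` or handler code, and the order of the 503 and 404 checks is an assumption.

```rust
/// Which resource the admin asked to resync (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Target {
    Manga,
    Chapter,
}

/// Hypothetical stand-in for the service error the tests exercise: the row
/// exists but has no linked crawler source (for example a manual upload).
#[derive(Debug, PartialEq)]
pub struct NoSource;

/// HTTP status plus an optional `error.details` entry, e.g. ("manga", "no_source").
pub type Reply = (u16, Option<(&'static str, &'static str)>);

/// Sketch of the mapping pinned by the resync tests. `daemon_enabled` mirrors
/// `AppState.resync` being `Some`, `exists` mirrors the row lookup, and `call`
/// stands in for `ResyncService::resync_*`. The service is only invoked once
/// both guards pass, matching the "service must not have been called" checks.
/// The admin gate (401/403) is assumed to run earlier and is not modelled.
pub fn resync_status<F>(target: Target, daemon_enabled: bool, exists: bool, call: F) -> Reply
where
    F: FnOnce() -> Result<(), NoSource>,
{
    if !daemon_enabled {
        return (503, None);
    }
    if !exists {
        return (404, None);
    }
    let key = match target {
        Target::Manga => "manga",
        Target::Chapter => "chapter",
    };
    match call() {
        Ok(()) => (200, None),
        Err(NoSource) => (422, Some((key, "no_source"))),
    }
}
```

If the real handler looked up the row before checking for the daemon, the first two branches would swap; the tests cannot distinguish the two orders because every 503 case uses an existing row.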

@@ -49,6 +49,8 @@ fn admin_test_router(pool: PgPool) -> (Router, TempDir) {
         auth,
         upload: UploadConfig::default(),
         auth_limiter,
+        resync: None,
+        crawler: None,
     };
     let app = Router::new()
         .nest("/api/v1", api::routes())

189
backend/tests/api_private_mode.rs
Normal file
@@ -0,0 +1,189 @@
//! Site-wide auth gate (`PRIVATE_MODE=true`).
//!
//! With private mode on, every API path except a small allowlist
//! (`/health`, `/auth/config`, `/auth/login`, `/auth/logout`) requires
//! a valid session cookie or bearer token, and `/auth/register` is
//! force-blocked regardless of `ALLOW_SELF_REGISTER`. With private mode
//! off (the default), nothing changes; the `public_mode_*` tests
//! pin that regression guard.

mod common;

use serde_json::json;
use sqlx::PgPool;
use tower::ServiceExt;

use axum::http::StatusCode;

#[sqlx::test(migrations = "./migrations")]
async fn private_mode_blocks_anonymous_manga_list(pool: PgPool) {
    let h = common::harness_with_private_mode(pool);
    let resp = h.app.oneshot(common::get("/api/v1/mangas")).await.unwrap();
    assert_eq!(resp.status(), StatusCode::UNAUTHORIZED);
}

#[sqlx::test(migrations = "./migrations")]
async fn private_mode_blocks_anonymous_files(pool: PgPool) {
    let h = common::harness_with_private_mode(pool);
    // The path doesn't have to exist because the guard runs before routing,
    // so the response is 401 (not 404). That's the property the test
    // is pinning: nothing leaks via crafted URLs.
    let resp = h
        .app
        .oneshot(common::get("/api/v1/files/anything.png"))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::UNAUTHORIZED);
}

#[sqlx::test(migrations = "./migrations")]
async fn private_mode_allows_session_cookie_read(pool: PgPool) {
    // Register through a non-private harness sharing the same DB pool
    // so the session row exists. Then exercise the gate using a fresh
    // private-mode harness against the same DB.
    let public = common::harness(pool.clone());
    let (_, cookie) = common::register_user(&public.app).await;

    let private = common::harness_with_private_mode(pool);
    let resp = private
        .app
        .oneshot(common::get_with_cookie("/api/v1/mangas", &cookie))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::OK);
}

#[sqlx::test(migrations = "./migrations")]
async fn private_mode_allows_bearer_token_read(pool: PgPool) {
    let public = common::harness(pool.clone());
    let (_, cookie) = common::register_user(&public.app).await;

    let resp = public
        .app
        .clone()
        .oneshot(common::post_json_with_cookie(
            "/api/v1/auth/tokens",
            json!({ "name": "private-mode-bot" }),
            &cookie,
        ))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::CREATED);
    let body = common::body_json(resp).await;
    let bearer = body["bearer"].as_str().unwrap().to_string();

    let private = common::harness_with_private_mode(pool);
    let resp = private
        .app
        .oneshot(common::get_with_bearer("/api/v1/mangas", &bearer))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::OK);
}

#[sqlx::test(migrations = "./migrations")]
async fn private_mode_allows_login_endpoint_anonymous(pool: PgPool) {
    // Seed a user via the public harness so login has credentials to
    // verify against.
    let public = common::harness(pool.clone());
    let _ = public
        .app
        .clone()
        .oneshot(common::post_json(
            "/api/v1/auth/register",
            json!({ "username": "alice", "password": "hunter2hunter2" }),
        ))
        .await
        .unwrap();

    let private = common::harness_with_private_mode(pool);
    let resp = private
        .app
        .oneshot(common::post_json(
            "/api/v1/auth/login",
            json!({ "username": "alice", "password": "hunter2hunter2" }),
        ))
        .await
        .unwrap();
    // Reaches the login handler and succeeds, *not* 401 from the
    // gate. That's the property we're pinning.
    assert_eq!(resp.status(), StatusCode::OK);
}

#[sqlx::test(migrations = "./migrations")]
async fn private_mode_allows_health_and_config_anonymous(pool: PgPool) {
    let h = common::harness_with_private_mode(pool);

    let r = h
        .app
        .clone()
        .oneshot(common::get("/api/v1/health"))
        .await
        .unwrap();
    assert_eq!(r.status(), StatusCode::OK);

    let r = h
        .app
        .oneshot(common::get("/api/v1/auth/config"))
        .await
        .unwrap();
    assert_eq!(r.status(), StatusCode::OK);
}

#[sqlx::test(migrations = "./migrations")]
async fn private_mode_blocks_register_even_when_self_register_enabled(pool: PgPool) {
    // harness_with_private_mode keeps `allow_self_register=true` (the
    // default); private mode is supposed to force-block register
    // regardless. That's what this test pins.
    let h = common::harness_with_private_mode(pool);
    let resp = h
        .app
        .oneshot(common::post_json(
            "/api/v1/auth/register",
            json!({ "username": "alice", "password": "hunter2hunter2" }),
        ))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::FORBIDDEN);
    let body = common::body_json(resp).await;
    assert_eq!(body["error"]["code"], "forbidden");
}

#[sqlx::test(migrations = "./migrations")]
async fn auth_config_reports_private_mode_and_effective_self_register(pool: PgPool) {
    let h = common::harness_with_private_mode(pool);
    let resp = h
        .app
        .oneshot(common::get("/api/v1/auth/config"))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::OK);
    let body = common::body_json(resp).await;
    assert_eq!(body["private_mode"], true);
    // Effective value: `allow_self_register && !private_mode` is false
    // here even though the raw `allow_self_register` is true.
    assert_eq!(body["self_register_enabled"], false);
}

#[sqlx::test(migrations = "./migrations")]
async fn public_mode_does_not_gate_anonymous_reads(pool: PgPool) {
    // Regression guard: with private_mode off (the default), the gate
    // must be a no-op so existing public deployments stay public.
    let h = common::harness(pool);
    let resp = h.app.oneshot(common::get("/api/v1/mangas")).await.unwrap();
    assert_eq!(resp.status(), StatusCode::OK);
}

#[sqlx::test(migrations = "./migrations")]
async fn public_mode_reports_private_mode_false(pool: PgPool) {
    let h = common::harness(pool);
    let resp = h
        .app
        .oneshot(common::get("/api/v1/auth/config"))
        .await
        .unwrap();
    assert_eq!(resp.status(), StatusCode::OK);
    let body = common::body_json(resp).await;
    assert_eq!(body["private_mode"], false);
    assert_eq!(body["self_register_enabled"], true);
}
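The private-mode tests describe a compact decision table. With `private_mode` off the gate is a no-op. With it on, `/health`, `/auth/config`, `/auth/login` and `/auth/logout` stay anonymous, `/auth/register` is refused with 403, and every other path (including ones that do not route anywhere) needs a session cookie or bearer token. The std-only sketch below models that table plus the effective `self_register_enabled` flag reported by `/auth/config`. `Gate`, `gate` and `self_register_enabled` are hypothetical names, and the sketch assumes paths are compared after the `/api/v1` prefix has been stripped.

```rust
/// Outcome of the site-wide auth gate (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Gate {
    /// Let the request through to routing.
    Pass,
    /// 401: private mode is on and no credentials were presented.
    Unauthorized,
    /// 403: registration is force-disabled in private mode.
    Forbidden,
}

/// Paths (relative to `/api/v1`) that stay reachable anonymously.
const ANONYMOUS_ALLOWLIST: [&str; 4] = ["/health", "/auth/config", "/auth/login", "/auth/logout"];

/// `authenticated` means a valid session cookie or bearer token was found.
/// The gate runs before routing, so unknown paths get 401 rather than 404.
pub fn gate(private_mode: bool, path: &str, authenticated: bool) -> Gate {
    if !private_mode {
        return Gate::Pass; // public deployments are unaffected
    }
    // Checked first so the block holds whatever ALLOW_SELF_REGISTER says.
    if path == "/auth/register" {
        return Gate::Forbidden;
    }
    if ANONYMOUS_ALLOWLIST.contains(&path) || authenticated {
        Gate::Pass
    } else {
        Gate::Unauthorized
    }
}

/// Value reported as `self_register_enabled` by `/auth/config`.
pub fn self_register_enabled(allow_self_register: bool, private_mode: bool) -> bool {
    allow_self_register && !private_mode
}
```

Whether the real code refuses `/auth/register` inside the gate or inside the register handler is not visible in this diff; the sketch only reproduces the observable statuses.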

@@ -74,6 +74,11 @@ fn harness_with_auth_config(
             max_file_bytes: 256 * 1024,
         },
         auth_limiter,
+        // Default harness has no crawler daemon wired up; admin resync
+        // handlers return 503 in this config. Tests that need a stub
+        // resync service swap it in via `harness_with_resync`.
+        resync: None,
+        crawler: None,
     };
     Harness { app: router(state), _storage_dir: storage_dir }
 }
@@ -92,6 +97,21 @@ pub fn harness_with_self_register_disabled(pool: PgPool) -> Harness {
    harness_with_auth_config(pool, storage, storage_dir, auth)
}

/// Like [`harness`] but flips `PRIVATE_MODE` on so the site-wide auth
/// gate is exercised. `allow_self_register` stays at its default `true`
/// to verify that private mode force-disables self-registration on top
/// of whatever `ALLOW_SELF_REGISTER` says.
pub fn harness_with_private_mode(pool: PgPool) -> Harness {
    let storage_dir = tempfile::tempdir().expect("tempdir");
    let storage = Arc::new(LocalStorage::new(storage_dir.path()));
    let auth = AuthConfig {
        cookie_secure: false,
        private_mode: true,
        ..AuthConfig::default()
    };
    harness_with_auth_config(pool, storage, storage_dir, auth)
}

/// Like [`harness`] but configures a tight auth rate limit. Used by
/// the brute-force-rate-limiting test.
pub fn harness_with_auth_rate_limit(
@@ -109,6 +129,38 @@ pub fn harness_with_auth_rate_limit(
    harness_with_auth_config(pool, storage, storage_dir, auth)
}

/// Like [`harness`] but slots a caller-supplied [`ResyncService`] stub
/// into `AppState.resync`. Used by the admin resync tests so the
/// endpoint path is exercised without standing up a real Chromium.
pub fn harness_with_resync(
    pool: PgPool,
    resync: Arc<dyn mangalord::crawler::resync::ResyncService>,
) -> Harness {
    let storage_dir = tempfile::tempdir().expect("tempdir");
    let storage = Arc::new(LocalStorage::new(storage_dir.path()));
    let auth = AuthConfig {
        cookie_secure: false,
        ..AuthConfig::default()
    };
    let auth_limiter = Arc::new(AuthRateLimiter::new(auth.rate_limit));
    let state = AppState {
        db: pool,
        storage,
        auth,
        upload: UploadConfig {
            max_request_bytes: 4 * 1024 * 1024,
            max_file_bytes: 256 * 1024,
        },
        auth_limiter,
        resync: Some(resync),
        crawler: None,
    };
    Harness {
        app: router(state),
        _storage_dir: storage_dir,
    }
}
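Both harnesses depend on `AppState.resync` being an `Option`. With `None`, the admin resync handlers answer 503. With `Some(stub)`, the endpoint runs against a test double. The std-only sketch below shows that pattern. The real `ResyncService` trait is not shown in this diff, so `Resync`, `resync_manga`, `StubResync` and `resync_handler` are hypothetical simplifications. The 202/500 statuses for the wired-up path are also assumptions.

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for `ResyncService`.
pub trait Resync: Send + Sync {
    fn resync_manga(&self, manga_key: &str) -> Result<(), String>;
}

/// Test double that always succeeds without touching a browser.
pub struct StubResync;

impl Resync for StubResync {
    fn resync_manga(&self, _manga_key: &str) -> Result<(), String> {
        Ok(())
    }
}

/// Returns the HTTP status an admin resync request would get.
pub fn resync_handler(resync: Option<&Arc<dyn Resync>>, manga_key: &str) -> u16 {
    match resync {
        // No crawler daemon wired up: service unavailable.
        None => 503,
        Some(svc) => match svc.resync_manga(manga_key) {
            Ok(()) => 202,
            Err(_) => 500,
        },
    }
}
```

Keeping the dependency behind a trait object is what allows `harness_with_resync` to swap in a stub without standing up Chromium.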

/// Wraps a real `Storage` and fails on the N-th `put` call so tests can
/// assert that handlers roll their DB writes back when storage errors
/// mid-upload. Reads and other operations delegate to `inner`.
@@ -40,6 +40,8 @@ fn make_cfg(
        tz: Tz::UTC,
        retention_days: 7,
        session_expired,
        status: mangalord::crawler::status::StatusHandle::new(workers),
        job_timeout: Duration::from_secs(60),
        extra_tasks: Vec::new(),
    }
}
@@ -88,6 +90,52 @@ impl ChapterDispatcher for PanickingDispatcher {
    }
}

/// Never completes — used to verify the worker's outer dispatch timeout.
struct HangingDispatcher {
    seen: AtomicUsize,
}

#[async_trait::async_trait]
impl ChapterDispatcher for HangingDispatcher {
    async fn dispatch(&self, _payload: JobPayload) -> anyhow::Result<SyncOutcome> {
        self.seen.fetch_add(1, Ordering::AcqRel);
        std::future::pending::<()>().await;
        unreachable!("hanging dispatcher never resolves");
    }
}

#[sqlx::test(migrations = "./migrations")]
async fn worker_times_out_a_hung_dispatch_and_acks_failed(pool: PgPool) {
    enqueue_chapter_job(&pool).await;
    let dispatcher = Arc::new(HangingDispatcher {
        seen: AtomicUsize::new(0),
    });
    let session_expired = Arc::new(std::sync::atomic::AtomicBool::new(false));
    let cancel = CancellationToken::new();
    let mut cfg = make_cfg(None, dispatcher.clone(), session_expired, 1);
    cfg.job_timeout = Duration::from_millis(300);
    let handle = daemon::spawn(pool.clone(), cancel.clone(), cfg);

    // The hung job should time out and return to pending with backoff
    // (attempts=1 < max=5). Poll for the recorded error.
    let mut timed_out = false;
    for _ in 0..40 {
        let n: i64 = sqlx::query_scalar(
            "SELECT COUNT(*) FROM crawler_jobs WHERE last_error = 'dispatch timed out'",
        )
        .fetch_one(&pool)
        .await
        .unwrap();
        if n == 1 {
            timed_out = true;
            break;
        }
        tokio::time::sleep(Duration::from_millis(50)).await;
    }
    handle.shutdown().await;
    assert!(timed_out, "hung dispatch must be acked failed with a timeout error");
    assert!(dispatcher.seen.load(Ordering::Acquire) >= 1);
}
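The hung-dispatch test relies on the worker wrapping each dispatch in an outer deadline (`job_timeout`) and recording `dispatch timed out` when that deadline fires. The daemon presumably uses an async timeout for this. The std-only sketch below models the same idea with a helper thread and `mpsc::Receiver::recv_timeout`. `dispatch_with_timeout` is a hypothetical helper, not the daemon's code.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `job` on a helper thread and gives up after `timeout`.
/// On timeout the error string matches the one the test polls for.
pub fn dispatch_with_timeout<F, T>(job: F, timeout: Duration) -> Result<T, String>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Sending fails harmlessly if the receiver already gave up.
        let _ = tx.send(job());
    });
    match rx.recv_timeout(timeout) {
        Ok(value) => Ok(value),
        Err(mpsc::RecvTimeoutError::Timeout) => Err("dispatch timed out".to_string()),
        // The sender was dropped without sending, i.e. the job panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => Err("dispatch panicked".to_string()),
    }
}
```

One difference from an async timeout is worth noting. Dropping a timed-out future cancels it, but a timed-out thread keeps running in the background. That is one reason an async timeout is the better fit inside the daemon.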

#[sqlx::test(migrations = "./migrations")]
async fn workers_drain_jobs_through_dispatcher(pool: PgPool) {
    enqueue_chapter_job(&pool).await;
@@ -517,3 +565,132 @@ async fn enqueue_bookmarked_pending_resumes_after_quarantine_expires(pool: PgPoo
    );
}

/// Helper: insert a chapter with the given `number` and a non-dropped
/// source row, returning the chapter id. Used by the ordering tests so
/// the setup boilerplate doesn't drown the assertion.
async fn insert_pending_chapter(
    pool: &PgPool,
    manga_id: Uuid,
    number: i32,
    source_chapter_key: &str,
) -> Uuid {
    let chapter_id: Uuid = sqlx::query_scalar(
        "INSERT INTO chapters (manga_id, number, page_count) VALUES ($1, $2, 0) RETURNING id",
    )
    .bind(manga_id)
    .bind(number)
    .fetch_one(pool)
    .await
    .unwrap();
    sqlx::query(
        "INSERT INTO chapter_sources (source_id, source_chapter_key, chapter_id, source_url) \
         VALUES ($1, $2, $3, $4)",
    )
    .bind("target")
    .bind(source_chapter_key)
    .bind(chapter_id)
    .bind(format!("https://example.com/{source_chapter_key}"))
    .execute(pool)
    .await
    .unwrap();
    chapter_id
}

#[sqlx::test(migrations = "./migrations")]
async fn enqueue_bookmarked_pending_queues_chapters_in_ascending_number_order(pool: PgPool) {
    // Insert chapters with `number` values 3, 1, 2 in that insertion
    // order — so `created_at` order (the previous tiebreaker) does NOT
    // match number order. After enqueue + lease, the worker should see
    // chapters 1, 2, 3 in that sequence.
    let user_id: Uuid = sqlx::query_scalar(
        "INSERT INTO users (username, password_hash) VALUES ($1, $2) RETURNING id",
    )
    .bind("alice")
    .bind("not-a-real-hash")
    .fetch_one(&pool)
    .await
    .unwrap();
    let manga_id: Uuid = sqlx::query_scalar("INSERT INTO mangas (title) VALUES ($1) RETURNING id")
        .bind("Test")
        .fetch_one(&pool)
        .await
        .unwrap();
    sqlx::query(
        "INSERT INTO sources (id, name, base_url) VALUES ($1, $2, $3) ON CONFLICT DO NOTHING",
    )
    .bind("target")
    .bind("Target")
    .bind("https://example.com")
    .execute(&pool)
    .await
    .unwrap();
    let c3 = insert_pending_chapter(&pool, manga_id, 3, "ch3").await;
    let c1 = insert_pending_chapter(&pool, manga_id, 1, "ch1").await;
    let c2 = insert_pending_chapter(&pool, manga_id, 2, "ch2").await;
    sqlx::query("INSERT INTO bookmarks (user_id, manga_id) VALUES ($1, $2)")
        .bind(user_id)
        .bind(manga_id)
        .execute(&pool)
        .await
        .unwrap();

    let summary = pipeline::enqueue_bookmarked_pending(&pool).await.unwrap();
    assert_eq!(summary.inserted, 3);

    let leases = jobs::lease(&pool, None, 10, std::time::Duration::from_secs(60))
        .await
        .unwrap();
    let leased_chapter_ids: Vec<Uuid> = leases
        .iter()
        .map(|l| match &l.payload {
            JobPayload::SyncChapterContent { chapter_id, .. } => *chapter_id,
            other => panic!("unexpected payload kind: {other:?}"),
        })
        .collect();
    assert_eq!(
        leased_chapter_ids,
        vec![c1, c2, c3],
        "chapters must be leased in ascending chapter-number order, not insertion order"
    );
}

#[sqlx::test(migrations = "./migrations")]
async fn enqueue_pending_for_manga_queues_chapters_in_ascending_number_order(pool: PgPool) {
    // Same scenario as above but exercising the bookmark-create hook path
    // (`enqueue_pending_for_manga`) which has its own ORDER BY.
    let manga_id: Uuid = sqlx::query_scalar("INSERT INTO mangas (title) VALUES ($1) RETURNING id")
        .bind("Test")
        .fetch_one(&pool)
        .await
        .unwrap();
    sqlx::query(
        "INSERT INTO sources (id, name, base_url) VALUES ($1, $2, $3) ON CONFLICT DO NOTHING",
    )
    .bind("target")
    .bind("Target")
    .bind("https://example.com")
    .execute(&pool)
    .await
    .unwrap();
    let c3 = insert_pending_chapter(&pool, manga_id, 3, "ch3").await;
    let c1 = insert_pending_chapter(&pool, manga_id, 1, "ch1").await;
    let c2 = insert_pending_chapter(&pool, manga_id, 2, "ch2").await;

    let summary = pipeline::enqueue_pending_for_manga(&pool, manga_id)
        .await
        .unwrap();
    assert_eq!(summary.inserted, 3);

    let leases = jobs::lease(&pool, None, 10, std::time::Duration::from_secs(60))
        .await
        .unwrap();
    let leased_chapter_ids: Vec<Uuid> = leases
        .iter()
        .map(|l| match &l.payload {
            JobPayload::SyncChapterContent { chapter_id, .. } => *chapter_id,
            other => panic!("unexpected payload kind: {other:?}"),
        })
        .collect();
    assert_eq!(leased_chapter_ids, vec![c1, c2, c3]);
}
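Both ordering tests pin the same contract: pending chapters are queued in ascending chapter `number`, regardless of the order in which their rows were created. In SQL this is an `ORDER BY number` on the enqueue query. The sketch below shows the equivalent in plain Rust over an in-memory list. `PendingChapter` and `enqueue_order` are hypothetical names. Breaking ties on equal numbers by id is an assumption that the tests do not check.

```rust
/// Hypothetical in-memory view of a pending chapter row.
#[derive(Debug, Clone, PartialEq)]
pub struct PendingChapter {
    pub id: u32,
    pub number: i32,
}

/// Returns chapter ids in the order they should be enqueued:
/// ascending chapter number, ties broken by id so the result is stable.
pub fn enqueue_order(mut chapters: Vec<PendingChapter>) -> Vec<u32> {
    chapters.sort_by_key(|c| (c.number, c.id));
    chapters.into_iter().map(|c| c.id).collect()
}
```

With the tests' insertion order (numbers 3, 1, 2), the result lists the chapter numbered 1 first, not the one inserted first.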
304
backend/tests/crawler_dead_jobs.rs
Normal file
@@ -0,0 +1,304 @@
//! Integration tests for the dead-letter and queue-inspection admin queries
//! in `repo::crawler`: listing dead and active jobs with manga/chapter
//! context, the scoped requeue (all / per-manga / per-chapter / single job),
//! and the missing-cover and job-state counts used by the admin dashboard.

use mangalord::repo::crawler::{self, RequeueScope};
use serde_json::json;
use sqlx::PgPool;
use uuid::Uuid;

/// Seed a manga with no cover + a live source row (so it's "queued for a
/// cover fetch"). Returns the manga id.
async fn seed_missing_cover(pool: &PgPool, title: &str) -> Uuid {
    let manga_id = Uuid::new_v4();
    sqlx::query("INSERT INTO mangas (id, title, cover_image_path) VALUES ($1, $2, NULL)")
        .bind(manga_id)
        .bind(title)
        .execute(pool)
        .await
        .unwrap();
    sqlx::query("INSERT INTO sources (id, name, base_url) VALUES ('target', 'T', 'http://x') ON CONFLICT DO NOTHING")
        .execute(pool)
        .await
        .unwrap();
    sqlx::query(
        "INSERT INTO manga_sources (source_id, source_manga_key, manga_id, source_url) \
         VALUES ('target', $1, $2, 'http://x/m')",
    )
    .bind(format!("k-{manga_id}"))
    .bind(manga_id)
    .execute(pool)
    .await
    .unwrap();
    manga_id
}

/// Seed a manga + chapter and return their ids.
async fn seed_chapter(pool: &PgPool, title: &str, number: i32) -> (Uuid, Uuid) {
    let manga_id = Uuid::new_v4();
    let chapter_id = Uuid::new_v4();
    sqlx::query("INSERT INTO mangas (id, title) VALUES ($1, $2)")
        .bind(manga_id)
        .bind(title)
        .execute(pool)
        .await
        .unwrap();
    sqlx::query("INSERT INTO chapters (id, manga_id, number) VALUES ($1, $2, $3)")
        .bind(chapter_id)
        .bind(manga_id)
        .bind(number)
        .execute(pool)
        .await
        .unwrap();
    (manga_id, chapter_id)
}

/// Insert a `crawler_jobs` row for a chapter-content job in the given
/// state, with `last_error` set to `'boom'`.
async fn insert_job(pool: &PgPool, chapter_id: Uuid, state: &str, attempts: i32) -> Uuid {
    let id = Uuid::new_v4();
    let payload = json!({
        "kind": "sync_chapter_content",
        "source_id": "target",
        "chapter_id": chapter_id,
        "source_chapter_key": "k",
    });
    sqlx::query(
        "INSERT INTO crawler_jobs (id, payload, state, attempts, last_error) \
         VALUES ($1, $2, $3, $4, 'boom')",
    )
    .bind(id)
    .bind(payload)
    .bind(state)
    .bind(attempts)
    .execute(pool)
    .await
    .unwrap();
    id
}

async fn state_of(pool: &PgPool, id: Uuid) -> String {
    sqlx::query_scalar::<_, String>("SELECT state FROM crawler_jobs WHERE id = $1")
        .bind(id)
        .fetch_one(pool)
        .await
        .unwrap()
}
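The tests below seed dead jobs with `attempts = 5`. This matches the `max=5` noted in the daemon's hung-dispatch test, where a job whose attempt count is still below the maximum goes back to `pending` with a backoff delay. The sketch below assumes that a job is parked as `dead` once its attempt count, already bumped by `lease`, reaches the maximum. It is a hypothetical illustration of that failure decision. The exponential formula and the one-hour cap are invented for the example, not taken from the crate.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum AfterFailure {
    Retry { delay: Duration },
    Dead,
}

/// `attempts` is the count after the failed run (lease bumps it).
/// Hypothetical backoff: base * 2^(attempts - 1), capped at one hour.
pub fn on_failure(attempts: u32, max_attempts: u32, base: Duration) -> AfterFailure {
    if attempts >= max_attempts {
        AfterFailure::Dead
    } else {
        // Clamp the exponent so the shift can never overflow a u32.
        let factor = 1u32 << attempts.saturating_sub(1).min(16);
        let delay = base.saturating_mul(factor).min(Duration::from_secs(3600));
        AfterFailure::Retry { delay }
    }
}
```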

#[sqlx::test(migrations = "./migrations")]
async fn list_dead_jobs_returns_context_and_total(pool: PgPool) {
    let (_m, c1) = seed_chapter(&pool, "Naruto", 700).await;
    insert_job(&pool, c1, "dead", 5).await;
    // A non-dead job must not appear.
    let (_m2, c2) = seed_chapter(&pool, "Bleach", 1).await;
    insert_job(&pool, c2, "pending", 0).await;

    let (items, total) = crawler::list_dead_jobs(&pool, None, 50, 0).await.unwrap();
    assert_eq!(total, 1);
    assert_eq!(items.len(), 1);
    let row = &items[0];
    assert_eq!(row.manga_title.as_deref(), Some("Naruto"));
    assert_eq!(row.chapter_number, Some(700));
    assert_eq!(row.attempts, 5);
    assert_eq!(row.last_error.as_deref(), Some("boom"));
}

#[sqlx::test(migrations = "./migrations")]
async fn list_dead_jobs_filters_by_title_search(pool: PgPool) {
    let (_m, c1) = seed_chapter(&pool, "Naruto", 700).await;
    insert_job(&pool, c1, "dead", 5).await;
    let (_m2, c2) = seed_chapter(&pool, "One Piece", 1).await;
    insert_job(&pool, c2, "dead", 5).await;

    let (items, total) = crawler::list_dead_jobs(&pool, Some("piece"), 50, 0)
        .await
        .unwrap();
    assert_eq!(total, 1);
    assert_eq!(items[0].manga_title.as_deref(), Some("One Piece"));
}

#[sqlx::test(migrations = "./migrations")]
async fn requeue_all_resets_dead_jobs_to_pending(pool: PgPool) {
    let (_m, c1) = seed_chapter(&pool, "A", 1).await;
    let (_m2, c2) = seed_chapter(&pool, "B", 1).await;
    let j1 = insert_job(&pool, c1, "dead", 5).await;
    let j2 = insert_job(&pool, c2, "dead", 5).await;

    let n = crawler::requeue_dead_jobs(&pool, RequeueScope::All)
        .await
        .unwrap();
    assert_eq!(n, 2);
    assert_eq!(state_of(&pool, j1).await, "pending");
    assert_eq!(state_of(&pool, j2).await, "pending");
    let attempts: i32 = sqlx::query_scalar("SELECT attempts FROM crawler_jobs WHERE id = $1")
        .bind(j1)
        .fetch_one(&pool)
        .await
        .unwrap();
    assert_eq!(attempts, 0, "attempts reset on requeue");
}

#[sqlx::test(migrations = "./migrations")]
async fn requeue_by_manga_scopes_to_that_manga(pool: PgPool) {
    let (m1, c1) = seed_chapter(&pool, "A", 1).await;
    let (_m2, c2) = seed_chapter(&pool, "B", 1).await;
    let j1 = insert_job(&pool, c1, "dead", 5).await;
    let j2 = insert_job(&pool, c2, "dead", 5).await;

    let n = crawler::requeue_dead_jobs(&pool, RequeueScope::Manga(m1))
        .await
        .unwrap();
    assert_eq!(n, 1);
    assert_eq!(state_of(&pool, j1).await, "pending");
    assert_eq!(state_of(&pool, j2).await, "dead", "other manga untouched");
}

#[sqlx::test(migrations = "./migrations")]
async fn requeue_by_chapter_scopes_to_that_chapter(pool: PgPool) {
    let (_m, c1) = seed_chapter(&pool, "A", 1).await;
    let (_m2, c2) = seed_chapter(&pool, "A", 2).await;
    let j1 = insert_job(&pool, c1, "dead", 5).await;
    let j2 = insert_job(&pool, c2, "dead", 5).await;

    let n = crawler::requeue_dead_jobs(&pool, RequeueScope::Chapter(c1))
        .await
        .unwrap();
    assert_eq!(n, 1);
    assert_eq!(state_of(&pool, j1).await, "pending");
    assert_eq!(state_of(&pool, j2).await, "dead", "other chapter untouched");
}

#[sqlx::test(migrations = "./migrations")]
async fn requeue_single_job(pool: PgPool) {
    let (_m, c1) = seed_chapter(&pool, "A", 1).await;
    let (_m2, c2) = seed_chapter(&pool, "B", 1).await;
    let j1 = insert_job(&pool, c1, "dead", 5).await;
    let j2 = insert_job(&pool, c2, "dead", 5).await;

    let n = crawler::requeue_dead_jobs(&pool, RequeueScope::Job(j1))
        .await
        .unwrap();
    assert_eq!(n, 1);
    assert_eq!(state_of(&pool, j1).await, "pending");
    assert_eq!(state_of(&pool, j2).await, "dead");
}

#[sqlx::test(migrations = "./migrations")]
async fn requeue_skips_dead_when_live_job_exists_for_same_chapter(pool: PgPool) {
    let (_m, c1) = seed_chapter(&pool, "A", 1).await;
    let dead = insert_job(&pool, c1, "dead", 5).await;
    // A live pending job for the SAME chapter already exists.
    insert_job(&pool, c1, "pending", 0).await;

    let n = crawler::requeue_dead_jobs(&pool, RequeueScope::All)
        .await
        .unwrap();
    assert_eq!(n, 0, "must not resurrect a dead job that has a live counterpart");
    assert_eq!(state_of(&pool, dead).await, "dead");
}

#[sqlx::test(migrations = "./migrations")]
async fn requeue_with_two_dead_jobs_for_one_chapter_revives_one_not_500(pool: PgPool) {
    // Regression: two dead jobs for the SAME chapter must not both flip to
    // pending in one statement — that would violate the partial unique
    // dedup index and abort the whole requeue.
    let (manga_id, c1) = seed_chapter(&pool, "A", 1).await;
    let older = insert_job(&pool, c1, "dead", 5).await;
    let newer = insert_job(&pool, c1, "dead", 5).await;
    // Backdate `older` by an hour so `newer` is unambiguously the most recent.
    sqlx::query("UPDATE crawler_jobs SET updated_at = now() - interval '1 hour' WHERE id = $1")
        .bind(older)
        .execute(&pool)
        .await
        .unwrap();

    for scope in [RequeueScope::All, RequeueScope::Manga(manga_id), RequeueScope::Chapter(c1)] {
        // Reset to two-dead before each scope variant.
        sqlx::query("UPDATE crawler_jobs SET state = 'dead' WHERE id = ANY($1)")
            .bind(vec![older, newer])
            .execute(&pool)
            .await
            .unwrap();
        let n = crawler::requeue_dead_jobs(&pool, scope)
            .await
            .expect("requeue must not error on duplicate dead jobs");
        assert_eq!(n, 1, "exactly one dead job per chapter is revived");
        // The newest one is the survivor; the other stays dead.
        assert_eq!(state_of(&pool, newer).await, "pending");
        assert_eq!(state_of(&pool, older).await, "dead");
    }
}
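The requeue tests pin down three rules:

1. A scope selects which dead jobs are candidates.
2. A chapter that already has a live (pending or running) job is skipped.
3. When one chapter has several dead jobs, only the most recently updated one is revived, with `attempts` reset to 0.

The in-memory sketch below illustrates only these rules, not the actual SQL. `Job`, `State`, `Scope` and `requeue` are hypothetical simplifications of `crawler_jobs` rows and `RequeueScope`. They use integer ids and an integer `updated_at` instead of UUIDs and timestamps.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Pending,
    Running,
    Dead,
}

#[derive(Debug, Clone)]
pub struct Job {
    pub id: u32,
    pub manga_id: u32,
    pub chapter_id: u32,
    pub state: State,
    pub attempts: i32,
    pub updated_at: u64,
}

/// Hypothetical mirror of `RequeueScope`.
pub enum Scope {
    All,
    Manga(u32),
    Chapter(u32),
    Job(u32),
}

fn in_scope(job: &Job, scope: &Scope) -> bool {
    match *scope {
        Scope::All => true,
        Scope::Manga(m) => job.manga_id == m,
        Scope::Chapter(c) => job.chapter_id == c,
        Scope::Job(id) => job.id == id,
    }
}

/// Flips eligible dead jobs back to pending; returns how many were revived.
pub fn requeue(jobs: &mut [Job], scope: Scope) -> usize {
    // Chapters that already have a live job must not get a second one.
    let live: HashSet<u32> = jobs
        .iter()
        .filter(|j| matches!(j.state, State::Pending | State::Running))
        .map(|j| j.chapter_id)
        .collect();

    // Newest dead candidate per chapter, stored as an index into `jobs`.
    let mut newest: HashMap<u32, usize> = HashMap::new();
    for (i, j) in jobs.iter().enumerate() {
        if j.state != State::Dead || !in_scope(j, &scope) || live.contains(&j.chapter_id) {
            continue;
        }
        let slot = newest.entry(j.chapter_id).or_insert(i);
        if jobs[*slot].updated_at < j.updated_at {
            *slot = i;
        }
    }

    for &i in newest.values() {
        jobs[i].state = State::Pending;
        jobs[i].attempts = 0;
    }
    newest.len()
}
```

Choosing a single survivor per chapter is what keeps the real requeue from tripping the partial unique dedup index. As the regression test notes, flipping two rows for the same chapter in one statement would abort the whole requeue.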

#[sqlx::test(migrations = "./migrations")]
async fn list_active_jobs_returns_pending_and_running_running_first(pool: PgPool) {
    let (_m, c1) = seed_chapter(&pool, "Naruto", 700).await;
    let (_m2, c2) = seed_chapter(&pool, "Bleach", 10).await;
    insert_job(&pool, c1, "pending", 0).await;
    insert_job(&pool, c2, "running", 1).await;
    // A dead job must NOT appear.
    let (_m3, c3) = seed_chapter(&pool, "Gone", 1).await;
    insert_job(&pool, c3, "dead", 5).await;

    let (items, total) = crawler::list_active_jobs(&pool, None, 50, 0).await.unwrap();
    assert_eq!(total, 2);
    assert_eq!(items.len(), 2);
    // Running first.
    assert_eq!(items[0].state, "running");
    assert_eq!(items[0].manga_title.as_deref(), Some("Bleach"));
    assert_eq!(items[1].state, "pending");
    assert_eq!(items[1].chapter_number, Some(700));
}

#[sqlx::test(migrations = "./migrations")]
async fn list_active_jobs_filters_by_title(pool: PgPool) {
    let (_m, c1) = seed_chapter(&pool, "Naruto", 1).await;
    let (_m2, c2) = seed_chapter(&pool, "One Piece", 1).await;
    insert_job(&pool, c1, "pending", 0).await;
    insert_job(&pool, c2, "pending", 0).await;
    let (items, total) = crawler::list_active_jobs(&pool, Some("piece"), 50, 0)
        .await
        .unwrap();
    assert_eq!(total, 1);
    assert_eq!(items[0].manga_title.as_deref(), Some("One Piece"));
}
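The `list_active_jobs` tests describe a listing contract with four parts:

- Only `pending` and `running` rows are included.
- Running jobs are listed first.
- An optional case-insensitive title filter applies (`"piece"` matches `"One Piece"`).
- The result is a `(page, total)` pair, where `total` presumably counts every match before `limit`/`offset` are applied.

The sketch below implements this contract in memory. `ActiveRow` and `list_active` are hypothetical names. The real query presumably does the same work in SQL.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ActiveRow {
    pub state: &'static str,
    pub manga_title: String,
}

/// Returns one page of active jobs plus the total match count.
pub fn list_active(
    rows: &[ActiveRow],
    search: Option<&str>,
    limit: usize,
    offset: usize,
) -> (Vec<ActiveRow>, usize) {
    let needle = search.map(|s| s.to_lowercase());
    let mut matches: Vec<ActiveRow> = rows
        .iter()
        .filter(|r| r.state == "pending" || r.state == "running")
        .filter(|r| match &needle {
            Some(n) => r.manga_title.to_lowercase().contains(n.as_str()),
            None => true,
        })
        .cloned()
        .collect();
    // Running before pending; sort_by_key is stable, so ties keep input order.
    matches.sort_by_key(|r| if r.state == "running" { 0 } else { 1 });
    let total = matches.len();
    let page = matches.into_iter().skip(offset).take(limit).collect();
    (page, total)
}
```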

#[sqlx::test(migrations = "./migrations")]
async fn missing_covers_count_and_list(pool: PgPool) {
    seed_missing_cover(&pool, "Naruto").await;
    seed_missing_cover(&pool, "Bleach").await;
    // A manga WITH a cover must not be counted.
    let with_cover = Uuid::new_v4();
    sqlx::query("INSERT INTO mangas (id, title, cover_image_path) VALUES ($1, 'Done', 'k.jpg')")
        .bind(with_cover)
        .execute(&pool)
        .await
        .unwrap();

    assert_eq!(crawler::count_missing_covers(&pool).await.unwrap(), 2);

    let (items, total) = crawler::list_missing_cover_mangas(&pool, None, 50, 0)
        .await
        .unwrap();
    assert_eq!(total, 2);
    assert_eq!(items.len(), 2);

    let (items, total) = crawler::list_missing_cover_mangas(&pool, Some("naru"), 50, 0)
        .await
        .unwrap();
    assert_eq!(total, 1);
    assert_eq!(items[0].manga_title, "Naruto");
}

#[sqlx::test(migrations = "./migrations")]
async fn job_state_counts_groups_by_state(pool: PgPool) {
    let (_m, c1) = seed_chapter(&pool, "A", 1).await;
    let (_m2, c2) = seed_chapter(&pool, "B", 1).await;
    let (_m3, c3) = seed_chapter(&pool, "C", 1).await;
    insert_job(&pool, c1, "pending", 0).await;
    insert_job(&pool, c2, "dead", 5).await;
    insert_job(&pool, c3, "dead", 5).await;

    let (pending, running, dead) = crawler::job_state_counts(&pool).await.unwrap();
    assert_eq!(pending, 1);
    assert_eq!(running, 0);
    assert_eq!(dead, 2);
}
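`job_state_counts` returns a `(pending, running, dead)` triple. The test expects a state with no rows (here `running`) to come back as 0, not to be missing. In SQL this is typically a `GROUP BY state` or `COUNT(*) FILTER (WHERE ...)` query. The sketch below does the same tally with std only. `state_counts` is a hypothetical function over plain state strings, and it ignores any other state such as `done`.

```rust
/// Tallies job states into (pending, running, dead).
/// Any other state (for example `done`) is ignored.
pub fn state_counts<'a, I>(states: I) -> (i64, i64, i64)
where
    I: IntoIterator<Item = &'a str>,
{
    let (mut pending, mut running, mut dead) = (0, 0, 0);
    for s in states {
        match s {
            "pending" => pending += 1,
            "running" => running += 1,
            "dead" => dead += 1,
            _ => {}
        }
    }
    (pending, running, dead)
}
```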
@@ -185,6 +185,68 @@ async fn lease_marks_running_and_bumps_attempts_and_sets_leased_until(pool: PgPo
    assert!(leased_until > chrono::Utc::now());
}

#[sqlx::test(migrations = "./migrations")]
async fn renew_extends_leased_until_while_running(pool: PgPool) {
    let id = match jobs::enqueue(&pool, &chapter_content_payload(Uuid::new_v4()))
        .await
        .unwrap()
    {
        EnqueueResult::Inserted(id) => id,
        EnqueueResult::Skipped => unreachable!(),
    };

    // Lease with a short window, then collapse leased_until to the recent
    // past so the renew is unambiguously an extension.
    let leases = jobs::lease(&pool, None, 1, Duration::from_secs(5))
        .await
        .unwrap();
    assert_eq!(leases.len(), 1);
    sqlx::query("UPDATE crawler_jobs SET leased_until = now() - interval '1 second' WHERE id = $1")
        .bind(id)
        .execute(&pool)
        .await
        .unwrap();

    let still_owned = jobs::renew(&pool, id, Duration::from_secs(120))
        .await
        .unwrap();
    assert!(still_owned, "renew on a running job returns true");

    let leased_until: chrono::DateTime<chrono::Utc> =
        sqlx::query_scalar("SELECT leased_until FROM crawler_jobs WHERE id = $1")
            .bind(id)
            .fetch_one(&pool)
            .await
            .unwrap();
    assert!(
        leased_until > chrono::Utc::now() + chrono::Duration::seconds(60),
        "leased_until pushed ~120s into the future"
    );
    assert_eq!(job_state(&pool, id).await, "running");
}

#[sqlx::test(migrations = "./migrations")]
async fn renew_is_noop_once_job_no_longer_running(pool: PgPool) {
    let id = match jobs::enqueue(&pool, &chapter_content_payload(Uuid::new_v4()))
        .await
        .unwrap()
    {
        EnqueueResult::Inserted(id) => id,
        EnqueueResult::Skipped => unreachable!(),
    };
    let leases = jobs::lease(&pool, None, 1, Duration::from_secs(60))
        .await
        .unwrap();
    // Job completes — heartbeat should now see it's no longer ours.
    jobs::ack_done(&pool, leases[0].id).await.unwrap();

    let still_owned = jobs::renew(&pool, id, Duration::from_secs(120))
        .await
        .unwrap();
    assert!(!still_owned, "renew on a non-running job returns false");
    assert_eq!(job_state(&pool, id).await, "done");
}
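Together, the two `renew` tests define the heartbeat contract:

- While a job is `running`, `renew` moves `leased_until` to now plus the requested window and returns `true`.
- Once the job has left `running` (here through `ack_done`), `renew` changes nothing and returns `false`, which tells the worker it no longer owns the job.

The std-only sketch below captures that rule. `LeasedJob` and `renew` are hypothetical, and they use `Instant` instead of database timestamps.

```rust
use std::time::{Duration, Instant};

#[derive(Debug)]
pub struct LeasedJob {
    pub state: &'static str,
    pub leased_until: Instant,
}

/// Extends the lease only if the job is still running.
/// Returns whether the caller still owns the job.
pub fn renew(job: &mut LeasedJob, window: Duration) -> bool {
    if job.state != "running" {
        return false;
    }
    job.leased_until = Instant::now() + window;
    true
}
```

In the database, this could be written as a single `UPDATE ... WHERE id = $1 AND state = 'running'`, whose affected-row count answers the ownership question. This is a suggested shape, not necessarily how `jobs::renew` is written.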

#[sqlx::test(migrations = "./migrations")]
async fn lease_with_kind_filter_only_matches_that_kind(pool: PgPool) {
    let manga_id = match jobs::enqueue(&pool, &sync_manga_payload("foo"))
@@ -531,6 +593,89 @@ async fn reap_done_deletes_old_rows_keeps_fresh(pool: PgPool) {
    assert_eq!(remaining, vec![fresh_id], "only fresh row remains");
}

#[sqlx::test(migrations = "./migrations")]
async fn lease_ties_on_scheduled_at_break_by_created_at(pool: PgPool) {
    // Locks in the tiebreaker that lets enqueue order survive the lease
    // step: when many jobs share `scheduled_at` (the common cron-batch
    // case), the worker must pick the earliest-inserted row, not whatever
    // Postgres returns in heap order. The enqueue path inserts chapters
    // in chapter-number order, so this tiebreaker is what makes "queue
    // in rising order" observable at the dequeue side too.
    let a = match jobs::enqueue(&pool, &chapter_content_payload(Uuid::new_v4()))
        .await
        .unwrap()
    {
        EnqueueResult::Inserted(id) => id,
        _ => unreachable!(),
    };
    let b = match jobs::enqueue(&pool, &chapter_content_payload(Uuid::new_v4()))
        .await
        .unwrap()
    {
        EnqueueResult::Inserted(id) => id,
        _ => unreachable!(),
    };
    let c = match jobs::enqueue(&pool, &chapter_content_payload(Uuid::new_v4()))
        .await
        .unwrap()
    {
        EnqueueResult::Inserted(id) => id,
        _ => unreachable!(),
    };

    // Pin `scheduled_at` to a single literal instant (shared across all
    // three rows — `now()` would yield a different microsecond per UPDATE
    // and make scheduled_at the actual sort key). Reverse `created_at`
    // against insertion order so heap order would give the wrong answer.
    let shared_scheduled = chrono::Utc::now() - chrono::Duration::hours(1);
    sqlx::query(
        "UPDATE crawler_jobs \
         SET scheduled_at = $2, \
             created_at = $3 \
         WHERE id = $1",
    )
    .bind(a)
    .bind(shared_scheduled)
    .bind(chrono::Utc::now() - chrono::Duration::seconds(10))
    .execute(&pool)
    .await
    .unwrap();
    sqlx::query(
        "UPDATE crawler_jobs \
         SET scheduled_at = $2, \
             created_at = $3 \
         WHERE id = $1",
    )
    .bind(b)
    .bind(shared_scheduled)
    .bind(chrono::Utc::now() - chrono::Duration::seconds(20))
    .execute(&pool)
    .await
    .unwrap();
    sqlx::query(
        "UPDATE crawler_jobs \
         SET scheduled_at = $2, \
             created_at = $3 \
         WHERE id = $1",
    )
    .bind(c)
    .bind(shared_scheduled)
    .bind(chrono::Utc::now() - chrono::Duration::seconds(30))
    .execute(&pool)
    .await
    .unwrap();

    let leases = jobs::lease(&pool, None, 10, Duration::from_secs(60))
        .await
        .unwrap();
    let order: Vec<Uuid> = leases.iter().map(|l| l.id).collect();
    assert_eq!(
        order,
        vec![c, b, a],
        "lease must return jobs in created_at order when scheduled_at ties"
    );
}
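The tiebreaker test amounts to a two-key sort on due jobs: `scheduled_at` first, then `created_at`. Rows that share a `scheduled_at` therefore come out oldest insert first, not in heap order. In SQL this would be `ORDER BY scheduled_at, created_at`. The sketch below applies the same comparison in memory. `QueuedJob` and `lease_order` are hypothetical, timestamps are plain integers, and the "due when `scheduled_at <= now`" filter is an assumption about the lease query.

```rust
#[derive(Debug, Clone, Copy)]
pub struct QueuedJob {
    pub id: char,
    pub scheduled_at: i64,
    pub created_at: i64,
}

/// Picks up to `limit` jobs that are due at `now`, in lease order:
/// earliest `scheduled_at` first, ties broken by earliest `created_at`.
pub fn lease_order(jobs: &[QueuedJob], now: i64, limit: usize) -> Vec<char> {
    let mut due: Vec<QueuedJob> = jobs
        .iter()
        .copied()
        .filter(|j| j.scheduled_at <= now)
        .collect();
    due.sort_by_key(|j| (j.scheduled_at, j.created_at));
    due.into_iter().take(limit).map(|j| j.id).collect()
}
```

This reproduces the test's setup: `a`, `b` and `c` share `scheduled_at`, and their `created_at` values are reversed against insertion order, so the expected order is `c`, `b`, `a`.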
|
||||||
|
|
||||||
#[sqlx::test(migrations = "./migrations")]
|
#[sqlx::test(migrations = "./migrations")]
|
||||||
async fn reap_done_zero_is_a_no_op(pool: PgPool) {
|
async fn reap_done_zero_is_a_no_op(pool: PgPool) {
|
||||||
let id = match jobs::enqueue(&pool, &chapter_content_payload(Uuid::new_v4()))
|
let id = match jobs::enqueue(&pool, &chapter_content_payload(Uuid::new_v4()))
|
||||||
|
|||||||
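The tie-break test relies on `jobs::lease` returning due jobs ordered by `scheduled_at` first and `created_at` second. The lease query itself is not part of this diff, so what follows is only a minimal in-memory sketch of that ordering. It assumes a hypothetical `Job` struct with integer timestamps. It also assumes that a job becomes eligible once its `scheduled_at` has passed; that eligibility rule is not confirmed by the source.

```rust
/// Hypothetical in-memory stand-in for a `crawler_jobs` row. Timestamps are
/// plain integers (seconds) so the sketch needs no chrono/sqlx dependency.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Job {
    pub id: &'static str,
    pub scheduled_at: i64,
    pub created_at: i64,
}

/// Earliest `scheduled_at` first; `created_at` (oldest first) breaks ties.
pub fn lease_order(jobs: &mut [Job]) {
    jobs.sort_by(|a, b| {
        a.scheduled_at
            .cmp(&b.scheduled_at)
            .then_with(|| a.created_at.cmp(&b.created_at))
    });
}

/// Assumption: a job is eligible once `scheduled_at <= now`. Returns at most
/// `limit` eligible jobs in lease order. Lease bookkeeping (expiry, owner)
/// is deliberately left out of this sketch.
pub fn lease(jobs: &[Job], now: i64, limit: usize) -> Vec<Job> {
    let mut due: Vec<Job> = jobs
        .iter()
        .filter(|j| j.scheduled_at <= now)
        .cloned()
        .collect();
    lease_order(&mut due);
    due.truncate(limit);
    due
}
```

`then_with` evaluates the tie-break lazily. As a result, `created_at` is only compared when two jobs share the same `scheduled_at`, which is exactly the situation the test constructs.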
@@ -6,6 +6,7 @@
 
 use mangalord::crawler::source::{SourceChapterRef, SourceManga};
 use mangalord::repo::crawler::{self, ChapterDiff, UpsertStatus};
+use mangalord::repo::chapter as chapter_repo;
 use sqlx::PgPool;
 use uuid::Uuid;
 
@@ -829,6 +830,107 @@ async fn sync_tags_garbage_collects_orphan_user_attachments(pool: PgPool) {
     assert_eq!(orphan_rows, 0, "orphan user-attached tag should be reaped");
 }
 
+// ---- list_missing_covers ---------------------------------------------------
+
+#[sqlx::test(migrations = "./migrations")]
+async fn list_missing_covers_only_returns_rows_without_cover(pool: PgPool) {
+    crawler::ensure_source(&pool, "target", "T", "https://x.example")
+        .await
+        .unwrap();
+    let with_cover = sample_manga("with", "With Cover", "h1");
+    let without_cover = sample_manga("without", "No Cover", "h2");
+    let _w = crawler::upsert_manga_from_source(&pool, "target", "https://x.example/with", &with_cover)
+        .await
+        .unwrap();
+    let nc = crawler::upsert_manga_from_source(&pool, "target", "https://x.example/without", &without_cover)
+        .await
+        .unwrap();
+
+    // Manually set a cover for `with` only.
+    sqlx::query("UPDATE mangas SET cover_image_path = 'mangas/x/cover.jpg' WHERE id = $1")
+        .bind(_w.manga_id)
+        .execute(&pool)
+        .await
+        .unwrap();
+
+    let entries = crawler::list_missing_covers(&pool, 50).await.unwrap();
+    assert_eq!(entries.len(), 1, "exactly the manga without a cover");
+    assert_eq!(entries[0].manga_id, nc.manga_id);
+    assert_eq!(entries[0].source_manga_key, "without");
+    assert_eq!(entries[0].source_url, "https://x.example/without");
+}
+
+#[sqlx::test(migrations = "./migrations")]
+async fn list_missing_covers_skips_dropped_source_rows(pool: PgPool) {
+    crawler::ensure_source(&pool, "target", "T", "https://x.example")
+        .await
+        .unwrap();
+    let m = sample_manga("foo", "Foo", "h1");
+    let up = crawler::upsert_manga_from_source(&pool, "target", "https://x.example/foo", &m)
+        .await
+        .unwrap();
+    sqlx::query("UPDATE manga_sources SET dropped_at = NOW() WHERE manga_id = $1")
+        .bind(up.manga_id)
+        .execute(&pool)
+        .await
+        .unwrap();
+
+    let entries = crawler::list_missing_covers(&pool, 50).await.unwrap();
+    assert!(
+        entries.is_empty(),
+        "dropped-source mangas must not be backfilled — no live source to fetch from"
+    );
+}
+
+#[sqlx::test(migrations = "./migrations")]
+async fn list_missing_covers_respects_limit(pool: PgPool) {
+    crawler::ensure_source(&pool, "target", "T", "https://x.example")
+        .await
+        .unwrap();
+    for i in 0..5 {
+        let key = format!("m{i}");
+        let url = format!("https://x.example/{key}");
+        let m = sample_manga(&key, &format!("M{i}"), &format!("h{i}"));
+        let _ = crawler::upsert_manga_from_source(&pool, "target", &url, &m)
+            .await
+            .unwrap();
+    }
+    let entries = crawler::list_missing_covers(&pool, 3).await.unwrap();
+    assert_eq!(entries.len(), 3, "limit caps the result set");
+}
+
+#[sqlx::test(migrations = "./migrations")]
+async fn list_missing_covers_deduplicates_per_manga(pool: PgPool) {
+    // A manga surfaced by two sources should produce ONE backfill
+    // entry, not two — otherwise the per-tick cap could be eaten by
+    // duplicates and starve other mangas.
+    crawler::ensure_source(&pool, "src-a", "A", "https://a.example")
+        .await
+        .unwrap();
+    crawler::ensure_source(&pool, "src-b", "B", "https://b.example")
+        .await
+        .unwrap();
+    let m = sample_manga("foo", "Foo", "h1");
+    let up = crawler::upsert_manga_from_source(&pool, "src-a", "https://a.example/foo", &m)
+        .await
+        .unwrap();
+    // Second source attaches to the SAME manga row.
+    sqlx::query(
+        "INSERT INTO manga_sources (source_id, source_manga_key, manga_id, source_url) \
+         VALUES ($1, $2, $3, $4)",
+    )
+    .bind("src-b")
+    .bind("foo-on-b")
+    .bind(up.manga_id)
+    .bind("https://b.example/foo")
+    .execute(&pool)
+    .await
+    .unwrap();
+
+    let entries = crawler::list_missing_covers(&pool, 50).await.unwrap();
+    assert_eq!(entries.len(), 1, "DISTINCT ON (m.id) collapses duplicate source rows");
+}
+
 #[sqlx::test(migrations = "./migrations")]
 async fn re_appearing_manga_clears_dropped_at(pool: PgPool) {
     crawler::ensure_source(&pool, "target", "T", "https://x.example")
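The `list_missing_covers` tests in the preceding hunk pin down three rules:

- Rows that already have a cover, or whose source row is dropped, are skipped.
- Only one entry is produced per manga, via the `DISTINCT ON (m.id)` named in the last assertion.
- The limit is applied after de-duplication.

Below is a minimal in-memory sketch of those rules. It uses a hypothetical flattened `SourceRow` type rather than the real query, which this diff does not show.

```rust
use std::collections::HashSet;

/// Hypothetical flattened row from a `mangas` JOIN `manga_sources` query.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SourceRow {
    pub manga_id: u32,
    pub cover_image_path: Option<String>,
    pub dropped: bool,
    pub source_url: String,
}

/// Skip rows that already have a cover or whose source row is dropped,
/// keep only the first row per manga (the role `DISTINCT ON (m.id)` plays
/// in SQL), and apply `limit` only after de-duplication so a manga listed
/// by several sources cannot use up the per-tick cap.
pub fn missing_covers(rows: &[SourceRow], limit: usize) -> Vec<SourceRow> {
    let mut seen = HashSet::new();
    rows.iter()
        .filter(|r| r.cover_image_path.is_none() && !r.dropped)
        // `insert` returns false for a manga_id already seen, dropping repeats.
        .filter(|r| seen.insert(r.manga_id))
        .take(limit)
        .cloned()
        .collect()
}
```

Applying `take(limit)` after the de-duplicating filter matters. If the cap came first, the two source rows for one manga would consume two slots, which is precisely the starvation the test comment warns about.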
@@ -860,3 +962,261 @@ async fn re_appearing_manga_clears_dropped_at(pool: PgPool) {
     assert!(dropped.0.is_none());
     assert_eq!(dropped.1, up.manga_id);
 }
+
+// ---- source_index: site-order preservation ----
+//
+// The user-facing chapter list reverses the source-site order so that
+// the oldest chapter appears first. The crawler records each row's DOM
+// position in `chapters.source_index` (0 = first in source DOM = newest
+// on this site) on every sync; the list query orders by source_index
+// DESC NULLS LAST, falling through to number/created_at for rows with
+// no source row (e.g. user uploads).
+
+#[sqlx::test(migrations = "./migrations")]
+async fn source_index_set_on_insert_matches_dom_order(pool: PgPool) {
+    crawler::ensure_source(&pool, "target", "T", "https://x.example")
+        .await
+        .unwrap();
+    let m = sample_manga("foo", "Foo Manga", "hash-1");
+    let up = crawler::upsert_manga_from_source(&pool, "target", "https://x.example/foo", &m)
+        .await
+        .unwrap();
+
+    let chapters = vec![
+        SourceChapterRef {
+            source_chapter_key: "a".into(),
+            number: 30,
+            title: Some("Ch.30".into()),
+            url: "https://x.example/foo/a".into(),
+        },
+        SourceChapterRef {
+            source_chapter_key: "b".into(),
+            number: 29,
+            title: Some("Ch.29".into()),
+            url: "https://x.example/foo/b".into(),
+        },
+        SourceChapterRef {
+            source_chapter_key: "c".into(),
+            number: 28,
+            title: Some("Ch.28".into()),
+            url: "https://x.example/foo/c".into(),
+        },
+    ];
+    crawler::sync_manga_chapters(&pool, "target", up.manga_id, &chapters)
+        .await
+        .unwrap();
+
+    let rows: Vec<(String, Option<i32>)> = sqlx::query_as(
+        "SELECT cs.source_chapter_key, c.source_index \
+         FROM chapters c \
+         JOIN chapter_sources cs ON cs.chapter_id = c.id \
+         WHERE c.manga_id = $1 \
+         ORDER BY cs.source_chapter_key",
+    )
+    .bind(up.manga_id)
+    .fetch_all(&pool)
+    .await
+    .unwrap();
+    assert_eq!(
+        rows,
+        vec![
+            ("a".to_string(), Some(0)),
+            ("b".to_string(), Some(1)),
+            ("c".to_string(), Some(2)),
+        ],
+        "source_index reflects enumerate() position in the input slice",
+    );
+}
+
+#[sqlx::test(migrations = "./migrations")]
+async fn source_index_rewritten_on_resync_when_new_chapter_prepended(pool: PgPool) {
+    crawler::ensure_source(&pool, "target", "T", "https://x.example")
+        .await
+        .unwrap();
+    let m = sample_manga("foo", "Foo Manga", "hash-1");
+    let up = crawler::upsert_manga_from_source(&pool, "target", "https://x.example/foo", &m)
+        .await
+        .unwrap();
+
+    let first = vec![
+        SourceChapterRef {
+            source_chapter_key: "a".into(),
+            number: 1,
+            title: Some("Ch.1".into()),
+            url: "https://x.example/foo/a".into(),
+        },
+        SourceChapterRef {
+            source_chapter_key: "b".into(),
+            number: 2,
+            title: Some("Ch.2".into()),
+            url: "https://x.example/foo/b".into(),
+        },
+    ];
+    crawler::sync_manga_chapters(&pool, "target", up.manga_id, &first)
+        .await
+        .unwrap();
+
+    // Second sync: a brand-new chapter appears at the top of the source
+    // (newest first on the site). All existing rows must shift their
+    // source_index down by one so the display order stays correct.
+    let second = vec![
+        SourceChapterRef {
+            source_chapter_key: "new".into(),
+            number: 3,
+            title: Some("Ch.3".into()),
+            url: "https://x.example/foo/new".into(),
+        },
+        SourceChapterRef {
+            source_chapter_key: "a".into(),
+            number: 1,
+            title: Some("Ch.1".into()),
+            url: "https://x.example/foo/a".into(),
+        },
+        SourceChapterRef {
+            source_chapter_key: "b".into(),
+            number: 2,
+            title: Some("Ch.2".into()),
+            url: "https://x.example/foo/b".into(),
+        },
+    ];
+    crawler::sync_manga_chapters(&pool, "target", up.manga_id, &second)
+        .await
+        .unwrap();
+
+    let rows: Vec<(String, Option<i32>)> = sqlx::query_as(
+        "SELECT cs.source_chapter_key, c.source_index \
+         FROM chapters c \
+         JOIN chapter_sources cs ON cs.chapter_id = c.id \
+         WHERE c.manga_id = $1 \
+         ORDER BY cs.source_chapter_key",
+    )
+    .bind(up.manga_id)
+    .fetch_all(&pool)
+    .await
+    .unwrap();
+    assert_eq!(
+        rows,
+        vec![
+            ("a".to_string(), Some(1)),
+            ("b".to_string(), Some(2)),
+            ("new".to_string(), Some(0)),
+        ],
+        "new chapter takes index 0, existing rows shift down on UPDATE",
+    );
+}
+
+#[sqlx::test(migrations = "./migrations")]
+async fn list_for_manga_returns_source_order_reversed(pool: PgPool) {
+    crawler::ensure_source(&pool, "target", "T", "https://x.example")
+        .await
+        .unwrap();
+    let m = sample_manga("foo", "Foo Manga", "hash-1");
+    let up = crawler::upsert_manga_from_source(&pool, "target", "https://x.example/foo", &m)
+        .await
+        .unwrap();
+
+    // Site DOM order (top-down = newest-first):
+    // ch11 (number = 11)
+    // notice (number = 0, non-numeric label on the site)
+    // ch10 (number = 10)
+    // Numbers deliberately disagree with DOM order: a number-based sort
+    // would put notice first, but the site places it between ch10 and
+    // ch11. Reversed-DOM display should yield [ch10, notice, ch11].
+    let chapters = vec![
+        SourceChapterRef {
+            source_chapter_key: "ch11".into(),
+            number: 11,
+            title: Some("Ch.11 : Official".into()),
+            url: "https://x.example/foo/11".into(),
+        },
+        SourceChapterRef {
+            source_chapter_key: "notice".into(),
+            number: 0,
+            title: Some("notice. : Officials".into()),
+            url: "https://x.example/foo/notice".into(),
+        },
+        SourceChapterRef {
+            source_chapter_key: "ch10".into(),
+            number: 10,
+            title: Some("Ch.10 : Official".into()),
+            url: "https://x.example/foo/10".into(),
+        },
+    ];
+    crawler::sync_manga_chapters(&pool, "target", up.manga_id, &chapters)
+        .await
+        .unwrap();
+
+    let listed = chapter_repo::list_for_manga(&pool, up.manga_id, 50, 0)
+        .await
+        .unwrap();
+    let keys: Vec<String> = listed
+        .iter()
+        .map(|c| c.title.clone().unwrap_or_default())
+        .collect();
+    assert_eq!(
+        keys,
+        vec![
+            "Ch.10 : Official".to_string(),
+            "notice. : Officials".to_string(),
+            "Ch.11 : Official".to_string(),
+        ],
+        "list returns chapters in reversed source-DOM order, so the \
+         oldest appears first and non-numeric entries land where the \
+         site placed them",
+    );
+}
+
+#[sqlx::test(migrations = "./migrations")]
+async fn list_for_manga_places_null_source_index_last(pool: PgPool) {
+    crawler::ensure_source(&pool, "target", "T", "https://x.example")
+        .await
+        .unwrap();
+    let m = sample_manga("foo", "Foo Manga", "hash-1");
+    let up = crawler::upsert_manga_from_source(&pool, "target", "https://x.example/foo", &m)
+        .await
+        .unwrap();
+
+    // Crawled chapters get source_index 0 and 1; the upload path leaves
+    // it NULL. NULLS LAST plus the (number, created_at) tail means the
+    // upload sits after both crawled rows even though its number is in
+    // the middle.
+    let crawled = vec![
+        SourceChapterRef {
+            source_chapter_key: "a".into(),
+            number: 1,
+            title: Some("Ch.1".into()),
+            url: "https://x.example/foo/a".into(),
+        },
+        SourceChapterRef {
+            source_chapter_key: "b".into(),
+            number: 3,
+            title: Some("Ch.3".into()),
+            url: "https://x.example/foo/b".into(),
+        },
+    ];
+    crawler::sync_manga_chapters(&pool, "target", up.manga_id, &crawled)
+        .await
+        .unwrap();
+
+    chapter_repo::create(&pool, up.manga_id, 2, Some("User upload Ch.2"), None)
+        .await
+        .unwrap();
+
+    let listed = chapter_repo::list_for_manga(&pool, up.manga_id, 50, 0)
+        .await
+        .unwrap();
+    let titles: Vec<String> = listed
+        .iter()
+        .map(|c| c.title.clone().unwrap_or_default())
+        .collect();
+    assert_eq!(
+        titles,
+        vec![
+            "Ch.3".to_string(),
+            "Ch.1".to_string(),
+            "User upload Ch.2".to_string(),
+        ],
+        "crawled rows ordered by reversed source_index; user upload \
+         (NULL source_index) falls through to the end",
+    );
+}
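The comment that introduces the `source_index` tests gives the list ordering as `source_index DESC NULLS LAST`, falling through to `number` and then `created_at`. Below is a sketch of that rule as a Rust comparator over a hypothetical `ChapterRow`. The comment does not say which direction the `number`/`created_at` tail sorts in, so ascending is an assumption here.

```rust
use std::cmp::Ordering;

/// Hypothetical in-memory chapter row; `created_at` is an integer timestamp.
#[derive(Debug, Clone)]
pub struct ChapterRow {
    pub title: String,
    pub source_index: Option<i32>,
    pub number: i32,
    pub created_at: i64,
}

/// `source_index DESC NULLS LAST`, then `number`, then `created_at`.
/// Assumption: the number/created_at tail is ascending.
pub fn display_cmp(a: &ChapterRow, b: &ChapterRow) -> Ordering {
    let by_index = match (a.source_index, b.source_index) {
        // DESC: a higher DOM index is an older chapter, so it comes first.
        (Some(x), Some(y)) => y.cmp(&x),
        // NULLS LAST: rows without a source index (e.g. uploads) go after.
        (Some(_), None) => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (None, None) => Ordering::Equal,
    };
    by_index
        .then_with(|| a.number.cmp(&b.number))
        .then_with(|| a.created_at.cmp(&b.created_at))
}

/// Sorts rows into the user-facing display order.
pub fn display_order(rows: &mut [ChapterRow]) {
    rows.sort_by(display_cmp);
}
```

PostgreSQL sorts NULLs first under `DESC` by default, which is why the query needs an explicit `NULLS LAST`. The explicit `match` above keeps that intent visible rather than relying on how `Option`'s `Ord` happens to treat `None`.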
@@ -109,7 +109,7 @@ async fn dispatch_target_prefers_most_recent_live_source(pool: PgPool) {
         seed_chapter_with_two_live_sources(&pool).await;
 
     let row = dispatch_target(&pool, chapter_id).await.unwrap();
-    let (_manga_id, source_url) =
+    let (_manga_id, source_url, _title, _number) =
         row.expect("two live sources should yield a dispatch target");
     assert_eq!(
         source_url, new_url,
@@ -133,7 +133,7 @@ async fn dispatch_target_skips_dropped_sources(pool: PgPool) {
         .unwrap();
 
     let row = dispatch_target(&pool, chapter_id).await.unwrap();
-    let (_manga_id, source_url) =
+    let (_manga_id, source_url, _title, _number) =
         row.expect("a single live source should still yield a dispatch target");
     assert!(
         source_url != new_url,
@@ -17,5 +17,28 @@ services:
       timeout: 5s
       retries: 10
 
+  # Optional: TOR daemon for crawler dev. Ports bind to 127.0.0.1 only
+  # — never the LAN — so a native `cargo run` on the host can reach
+  # 127.0.0.1:9050 / 9051. Mirrors the prod tor service (see
+  # docker-compose.yml), just with host-loopback ports and a default
+  # password baked in for friction-free dev.
+  tor:
+    image: dockurr/tor:latest
+    entrypoint: ["/bin/sh", "/usr/local/bin/mangalord-entrypoint.sh"]
+    environment:
+      PASSWORD: ${TOR_CONTROL_PASSWORD:-dev-tor-password}
+    volumes:
+      - ./tor/torrc:/etc/tor/torrc:ro
+      - ./tor/entrypoint.sh:/usr/local/bin/mangalord-entrypoint.sh:ro
+    ports:
+      - "127.0.0.1:9050:9050"
+      - "127.0.0.1:9051:9051"
+    healthcheck:
+      test: ["CMD-SHELL", "nc -z 127.0.0.1 9050 && nc -z 127.0.0.1 9051"]
+      interval: 5s
+      timeout: 5s
+      retries: 20
+      start_period: 30s
+
 volumes:
   mangalord-postgres-dev:
@@ -19,11 +19,48 @@ services:
       timeout: 5s
       retries: 10
 
+  tor:
+    # SOCKS5 proxy for the crawler, plus a control port so the backend
+    # can signal NEWNYM on bad pages. See tor/torrc for the daemon
+    # config; both ports are only `expose`d (compose-internal), never
+    # bound on the host.
+    #
+    # We bypass dockurr/tor's stock entrypoint because it binds the
+    # control port to localhost (unreachable from the backend
+    # container) and skips its own HashedControlPassword injection
+    # when the user's torrc declares a ControlPort. Our wrapper
+    # (tor/entrypoint.sh) generates the hash from $PASSWORD and execs
+    # tor with our torrc. Backend authenticates with the same plain
+    # string via CRAWLER_TOR_CONTROL_PASSWORD.
+    image: dockurr/tor:latest
+    entrypoint: ["/bin/sh", "/usr/local/bin/mangalord-entrypoint.sh"]
+    environment:
+      PASSWORD: ${TOR_CONTROL_PASSWORD:?TOR_CONTROL_PASSWORD must be set in .env}
+    volumes:
+      - ./tor/torrc:/etc/tor/torrc:ro
+      - ./tor/entrypoint.sh:/usr/local/bin/mangalord-entrypoint.sh:ro
+    expose:
+      - "9050"
+      - "9051"
+    # Wait for both control + SOCKS ports to listen before downstream
+    # services start. dockurr/tor's main process spawns before tor
+    # itself is bound, so `service_started` alone races the first
+    # NEWNYM call.
+    healthcheck:
+      test: ["CMD-SHELL", "nc -z 127.0.0.1 9050 && nc -z 127.0.0.1 9051"]
+      interval: 5s
+      timeout: 5s
+      retries: 20
+      start_period: 30s
+    restart: unless-stopped
+
   backend:
     build: ./backend
     depends_on:
       postgres:
         condition: service_healthy
+      tor:
+        condition: service_healthy
     environment:
       DATABASE_URL: postgres://${POSTGRES_USER:-mangalord}:${POSTGRES_PASSWORD:?POSTGRES_PASSWORD must be set in .env}@postgres:5432/${POSTGRES_DB:-mangalord}
       BIND_ADDRESS: 0.0.0.0:8080
@@ -44,6 +81,16 @@ services:
       # arm64 deployments. Pair with `--build-arg INSTALL_CHROMIUM=true`
       # so the image actually contains the binary.
       CRAWLER_CHROMIUM_BINARY: ${CRAWLER_CHROMIUM_BINARY:-}
+      # TOR proxy + NEWNYM recircuit (see .env.example for details).
+      # Defaults assume the bundled `tor` service above; override
+      # CRAWLER_PROXY= and CRAWLER_TOR_CONTROL_URL= (both empty) in
+      # .env to disable. CRAWLER_TOR_CONTROL_PASSWORD MUST match the
+      # tor service's PASSWORD (both wired to the same TOR_CONTROL_PASSWORD
+      # .env var below).
+      CRAWLER_PROXY: ${CRAWLER_PROXY-socks5h://tor:9050}
+      CRAWLER_TOR_CONTROL_URL: ${CRAWLER_TOR_CONTROL_URL-tcp://tor:9051}
+      CRAWLER_TOR_CONTROL_PASSWORD: ${TOR_CONTROL_PASSWORD:?TOR_CONTROL_PASSWORD must be set in .env}
+      CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS: ${CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS:-3}
     volumes:
       - storage-data:/var/lib/mangalord/storage
     # No host port mapping in the default setup — the frontend proxies
@@ -10,6 +10,15 @@ import { test, expect, type Page } from '@playwright/test';
 const emptyPage = { items: [], page: { limit: 50, offset: 0, total: null } };
 
 async function mockAnonymous(page: Page) {
+	// Force public mode so the root +layout.ts doesn't bounce us to /login
+	// (a dev backend with PRIVATE_MODE=true must not leak into E2E runs).
+	await page.route('**/api/v1/auth/config', async (route) => {
+		await route.fulfill({
+			status: 200,
+			contentType: 'application/json',
+			body: JSON.stringify({ self_register_enabled: true, private_mode: false })
+		});
+	});
 	await page.route('**/api/v1/auth/me', async (route) => {
 		await route.fulfill({
 			status: 401,
@@ -69,3 +78,53 @@ test('search updates the manga list', async ({ page }) => {
 	await expect(page.getByTestId('manga-list')).toContainText('Berserk');
 	expect(lastSearch).toBe('berserk');
 });
+
+test('clicking Next paginates to page 2 and updates the URL', async ({ page }) => {
+	await mockAnonymous(page);
+
+	// Fake a catalogue of 75 mangas; page 1 is ids 1..50, page 2 is ids 51..75.
+	const TOTAL = 75;
+	function mangaAt(i: number) {
+		return {
+			id: `m${i}`,
+			title: `Manga ${i}`,
+			author: 'Test',
+			description: null,
+			cover_image_path: null,
+			created_at: '2026-01-01T00:00:00Z',
+			updated_at: '2026-01-01T00:00:00Z',
+			authors: [],
+			genres: []
+		};
+	}
+
+	await page.route('**/api/v1/mangas*', async (route) => {
+		const url = new URL(route.request().url());
+		const limit = Number(url.searchParams.get('limit') ?? '50');
+		const offset = Number(url.searchParams.get('offset') ?? '0');
+		const items: ReturnType<typeof mangaAt>[] = [];
+		for (let i = offset + 1; i <= Math.min(offset + limit, TOTAL); i++) {
+			items.push(mangaAt(i));
+		}
+		await route.fulfill({
+			status: 200,
+			contentType: 'application/json',
+			body: JSON.stringify({
+				items,
+				page: { limit, offset, total: TOTAL }
+			})
+		});
+	});
+
+	await page.goto('/');
+	await expect(page.getByTestId('manga-total')).toContainText('Showing 1–50 of 75');
+	await expect(page.getByTestId('manga-list')).toContainText('Manga 1');
+	await expect(page.getByTestId('manga-list')).not.toContainText('Manga 75');
+
+	await page.getByTestId('manga-pager').getByRole('button', { name: /next/i }).click();
+
+	await expect(page).toHaveURL(/[?&]page=2(&|$)/);
+	await expect(page.getByTestId('manga-total')).toContainText('Showing 51–75 of 75');
+	await expect(page.getByTestId('manga-list')).toContainText('Manga 75');
+	await expect(page.getByTestId('manga-list')).not.toContainText('Manga 1');
+});
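The pagination test expects `?page=2` to map to `offset=50` and the summary to read `Showing 51–75 of 75`. The Svelte code that implements this is not part of this diff. The helpers below (`pageToOffset`, `showingLabel`, `withPage`) are therefore hypothetical names, and the sketch assumes a fixed page size of 50 and a non-empty page.

```typescript
// Hypothetical helpers; names and the page size are assumptions for illustration.
const PAGE_SIZE = 50;

// 1-based `?page=` value -> API offset. Non-integer or < 1 falls back to page 1.
function pageToOffset(page: number, limit: number = PAGE_SIZE): number {
	const p = Number.isInteger(page) && page >= 1 ? page : 1;
	return (p - 1) * limit;
}

// "Showing A–B of N" for a non-empty page of `count` items starting at `offset`.
function showingLabel(offset: number, count: number, total: number): string {
	return `Showing ${offset + 1}–${offset + count} of ${total}`;
}

// Returns path + query with `page` set; page 1 drops the parameter entirely.
function withPage(href: string, page: number): string {
	const u = new URL(href);
	if (page <= 1) u.searchParams.delete('page');
	else u.searchParams.set('page', String(page));
	return u.pathname + u.search;
}
```

Storing the page number in the query string, rather than in component state alone, is what makes the test's `toHaveURL(/[?&]page=2(&|$)/)` assertion possible. It also keeps deep links to a given page shareable.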
67
frontend/e2e/page-title.spec.ts
Normal file
@@ -0,0 +1,67 @@
+import { test, expect, type Page } from '@playwright/test';
+
+// Guards the title-on-nav behavior: without this, a stale title from
+// the last manga / author page lingers when the user navigates to a
+// generic page like /upload.
+
+async function mockAnonymous(page: Page) {
+	await page.route('**/api/v1/auth/config', async (route) => {
+		await route.fulfill({
+			status: 200,
+			contentType: 'application/json',
+			body: JSON.stringify({ self_register_enabled: true, private_mode: false })
+		});
+	});
+	await page.route('**/api/v1/auth/me', async (route) => {
+		await route.fulfill({
+			status: 401,
+			contentType: 'application/json',
+			body: JSON.stringify({ error: { code: 'unauthenticated', message: 'unauthenticated' } })
+		});
+	});
+	await page.route('**/api/v1/mangas*', async (route) => {
+		await route.fulfill({
+			status: 200,
+			contentType: 'application/json',
+			body: JSON.stringify({ items: [], page: { limit: 50, offset: 0, total: 0 } })
+		});
+	});
+}
+
+test('static route titles use the brand-first layout map', async ({ page }) => {
+	await mockAnonymous(page);
+
+	await page.goto('/');
+	await expect(page).toHaveTitle('Mangalord');
+
+	await page.goto('/upload');
+	await expect(page).toHaveTitle('Mangalord | Upload');
+
+	await page.goto('/login');
+	await expect(page).toHaveTitle('Mangalord | Login');
+
+	await page.goto('/bookmarks');
+	await expect(page).toHaveTitle('Mangalord | Bookmarks');
+
+	await page.goto('/collections');
+	await expect(page).toHaveTitle('Mangalord | Collections');
+});
+
+test('title updates when navigating away from a content page', async ({ page }) => {
+	await mockAnonymous(page);
+
+	// Pretend we just left a manga detail page — the document title
+	// would have been overridden to "Mangalord | Berserk". Use evaluate
+	// to set it synthetically so we can assert the regression cleanly
+	// even though the dynamic page itself isn't mocked here.
+	await page.goto('/');
+	await page.evaluate(() => {
+		document.title = 'Mangalord | Berserk';
+	});
+	expect(await page.title()).toBe('Mangalord | Berserk');
+
+	// Client-side nav to /upload — the root layout must reassert its
+	// mapped title or the stale "Berserk" lingers.
+	await page.goto('/upload');
+	await expect(page).toHaveTitle('Mangalord | Upload');
+});
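The page-title spec expects a brand-first mapping. The root route shows just `Mangalord`, and each mapped static route shows `Mangalord | <Section>`. Below is a sketch of such a lookup, assuming a hypothetical `titleFor` helper backed by a static map. The real root layout is not part of this diff.

```typescript
// Hypothetical sketch of a brand-first title map (not the app's actual layout code).
const BRAND = 'Mangalord';

// Static routes that get a "Brand | Section" title. Anything else falls back to
// the bare brand; dynamic pages (e.g. manga detail) are expected to override it.
const ROUTE_TITLES = new Map<string, string>([
	['/upload', 'Upload'],
	['/login', 'Login'],
	['/bookmarks', 'Bookmarks'],
	['/collections', 'Collections']
]);

function titleFor(pathname: string): string {
	const section = ROUTE_TITLES.get(pathname);
	return section ? `${BRAND} | ${section}` : BRAND;
}
```

Suppose a layout recomputed this value from the pathname on every navigation. It would then overwrite a stale per-page title such as `Mangalord | Berserk`, which is the regression the second test in this spec guards against.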
101
frontend/e2e/private-mode.spec.ts
Normal file
101
frontend/e2e/private-mode.spec.ts
Normal file
@@ -0,0 +1,101 @@
|
|||||||
|
import { test, expect, type Page } from '@playwright/test';
|
||||||
|
|
||||||
|
// Network-level mocks for the private-mode UX. The backend integration
|
||||||
|
// tests (api_private_mode.rs) cover the actual gate; here we only
|
||||||
|
// verify that the SvelteKit universal load redirects anonymous
|
||||||
|
// visitors to /login and then back to where they were going.
|
||||||
|
|
||||||
|
const userFixture = {
|
||||||
|
id: 'user-1',
|
||||||
|
username: 'alice',
|
||||||
|
created_at: '2026-01-01T00:00:00Z',
|
||||||
|
is_admin: false
|
||||||
|
};
|
||||||
|
const emptyPage = { items: [], page: { limit: 50, offset: 0, total: null } };
|
||||||
|
|
||||||
|
async function stubPrivateInstance(page: Page) {
|
||||||
|
let loggedIn = false;
|
||||||
|
|
||||||
|
// The flag that flips the gate on. Frontend reads it in
|
||||||
|
// `+layout.ts` to decide whether to redirect.
|
||||||
|
await page.route('**/api/v1/auth/config', async (route) => {
|
||||||
|
await route.fulfill({
|
||||||
|
status: 200,
|
||||||
|
contentType: 'application/json',
|
||||||
|
body: JSON.stringify({
|
||||||
|
self_register_enabled: false,
|
||||||
|
private_mode: true
|
||||||
|
})
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
await page.route('**/api/v1/auth/me', async (route) => {
|
||||||
|
if (loggedIn) {
|
||||||
|
await route.fulfill({
|
||||||
|
status: 200,
|
||||||
|
contentType: 'application/json',
|
||||||
|
body: JSON.stringify({ user: userFixture })
|
||||||
|
});
|
||||||
|
} else {
|
||||||
|
await route.fulfill({
|
||||||
|
status: 401,
|
||||||
|
contentType: 'application/json',
|
||||||
|
body: JSON.stringify({
|
||||||
|
error: { code: 'unauthenticated', message: 'unauthenticated' }
|
||||||
|
})
|
||||||
|
});
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
await page.route('**/api/v1/auth/login', async (route) => {
|
||||||
|
loggedIn = true;
|
||||||
|
await route.fulfill({
|
||||||
|
status: 200,
|
||||||
|
contentType: 'application/json',
|
||||||
|
body: JSON.stringify({ user: userFixture })
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
// The real backend would 401 these too in private mode; we stub
|
||||||
|
// success so the post-login navigation can render the home page
|
||||||
|
// without an additional redirect cycle.
|
||||||
|
await page.route('**/api/v1/mangas*', async (route) => {
|
||||||
|
await route.fulfill({
|
||||||
|
status: 200,
|
||||||
|
contentType: 'application/json',
|
||||||
|
body: JSON.stringify(emptyPage)
|
||||||
|
});
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
+test('private mode: anonymous visit to / redirects to /login?next=%2F', async ({ page }) => {
+	await stubPrivateInstance(page);
+	await page.goto('/');
+	await expect(page).toHaveURL(/\/login\?next=%2F$/);
+	await expect(page.getByTestId('login-username')).toBeVisible();
+});
+
+test('private mode: register link is hidden', async ({ page }) => {
+	await stubPrivateInstance(page);
+	await page.goto('/login');
+	await expect(page.getByTestId('nav-login')).toBeVisible();
+	// self_register_enabled is the effective value (false in private
+	// mode regardless of ALLOW_SELF_REGISTER), so the navbar must
+	// never render the register affordance here.
+	await expect(page.getByTestId('nav-register')).toHaveCount(0);
+});
+
+test('private mode: after login the user lands back on the requested page', async ({ page }) => {
+	await stubPrivateInstance(page);
+
+	// Visit a deep link → bounced to /login with next= preserving it.
+	await page.goto('/');
+	await expect(page).toHaveURL(/\/login\?next=%2F$/);
+
+	await page.getByTestId('login-username').fill('alice');
+	await page.getByTestId('login-password').fill('hunter2hunter2');
+	await page.getByTestId('login-submit').click();
+
+	// Authenticated → can now reach the home page without bouncing.
+	await expect(page.getByTestId('session-user')).toContainText('alice');
+});
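The private-mode specs pin the bounce target to `/login?next=%2F`: the requested path is percent-encoded into a `next` query parameter. The sketch below shows that round trip. It assumes two hypothetical helpers, `loginRedirectPath` and `resolveNext`; the real logic lives in the layout load and may differ. The path is encoded on the way out. On the way back, only same-origin paths are accepted, so `next` cannot become an open redirect.

```typescript
// Hypothetical helpers: the names and the validation rule are assumptions,
// not the app's actual +layout.ts / login-page code.

/** Login URL that carries the originally requested location in `next`. */
function loginRedirectPath(pathname: string, search = ''): string {
	return `/login?next=${encodeURIComponent(pathname + search)}`;
}

/**
 * Post-login destination from a decoded `next` value. Only same-origin
 * paths are accepted; protocol-relative ("//host", "/\host") and absolute
 * URLs fall back to "/" so `next` cannot be used as an open redirect.
 */
function resolveNext(next: string | null): string {
	if (!next || !next.startsWith('/') || next.startsWith('//') || next.startsWith('/\\')) {
		return '/';
	}
	return next;
}

// Round trip: build the redirect, then read `next` back the way a login
// page would (URLSearchParams percent-decodes it).
const bounced = loginRedirectPath('/manga/42', '?page=3');
const next = new URL(bounced, 'http://localhost').searchParams.get('next');
console.log(bounced, '->', resolveNext(next));
```

Checking `next` on the consuming side matters because anyone can craft a `/login?next=...` link.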
167
frontend/e2e/reader-chapter-select.spec.ts
Normal file
@@ -0,0 +1,167 @@
+import { test, expect, type Page } from '@playwright/test';
+
+const mangaId = '33333333-3333-3333-3333-333333333333';
+const chapter1Id = 'c1111111-3333-3333-3333-333333333333';
+const chapter2Id = 'c2222222-3333-3333-3333-333333333333';
+const chapter3Id = 'c3333333-3333-3333-3333-333333333333';
+
+const mangaFixture = {
+	id: mangaId,
+	title: 'Vinland Saga',
+	author: 'Makoto Yukimura',
+	description: null,
+	cover_image_path: null,
+	created_at: '2026-01-01T00:00:00Z',
+	updated_at: '2026-01-01T00:00:00Z'
+};
+
+const chaptersFixture = [
+	{
+		id: chapter1Id,
+		manga_id: mangaId,
+		number: 1,
+		title: 'Somewhere, Not Here',
+		page_count: 1,
+		created_at: '2026-01-01T00:00:00Z'
+	},
+	{
+		id: chapter2Id,
+		manga_id: mangaId,
+		number: 2,
+		title: null,
+		page_count: 1,
+		created_at: '2026-01-02T00:00:00Z'
+	},
+	{
+		id: chapter3Id,
+		manga_id: mangaId,
+		number: 3,
+		title: 'Sword Dance',
+		page_count: 1,
+		created_at: '2026-01-03T00:00:00Z'
+	}
+];
+
+function pageFixture(chapterId: string) {
+	return [
+		{
+			id: `p1111111-${chapterId.slice(1, 8)}-3333-3333-333333333333`,
+			chapter_id: chapterId,
+			page_number: 1,
+			storage_key: `mangas/${mangaId}/chapters/${chapterId}/pages/0001.png`,
+			content_type: 'image/png'
+		}
+	];
+}
+
+async function mockReaderApis(page: Page) {
+	// Force public mode so the layout doesn't bounce anonymous visitors
+	// to /login (the dev backend on this machine runs with
+	// PRIVATE_MODE=true, which the layout's universal load respects).
+	await page.route('**/api/v1/auth/config', (route) =>
+		route.fulfill({
+			status: 200,
+			contentType: 'application/json',
+			body: JSON.stringify({ self_register_enabled: true, private_mode: false })
+		})
+	);
+	await page.route('**/api/v1/auth/me', (route) =>
+		route.fulfill({
+			status: 401,
+			contentType: 'application/json',
+			body: JSON.stringify({ error: { code: 'unauthenticated', message: '' } })
+		})
+	);
+	await page.route('**/api/v1/auth/me/preferences', (route) =>
+		route.fulfill({
+			status: 401,
+			contentType: 'application/json',
+			body: JSON.stringify({ error: { code: 'unauthenticated', message: '' } })
+		})
+	);
+	await page.route('**/api/v1/me/bookmarks*', (route) =>
+		route.fulfill({
+			status: 401,
+			contentType: 'application/json',
+			body: JSON.stringify({ error: { code: 'unauthenticated', message: '' } })
+		})
+	);
+	await page.route(`**/api/v1/mangas/${mangaId}`, (route) =>
+		route.fulfill({
+			status: 200,
+			contentType: 'application/json',
+			body: JSON.stringify(mangaFixture)
+		})
+	);
+	await page.route(new RegExp(`/api/v1/mangas/${mangaId}/chapters(\\?.*)?$`), (route) =>
+		route.fulfill({
+			status: 200,
+			contentType: 'application/json',
+			body: JSON.stringify({
+				items: chaptersFixture,
+				page: { limit: 200, offset: 0, total: chaptersFixture.length }
+			})
+		})
+	);
+	for (const c of chaptersFixture) {
+		await page.route(`**/api/v1/mangas/${mangaId}/chapters/${c.id}`, (route) =>
+			route.fulfill({
+				status: 200,
+				contentType: 'application/json',
+				body: JSON.stringify(c)
+			})
+		);
+		await page.route(
+			`**/api/v1/mangas/${mangaId}/chapters/${c.id}/pages`,
+			(route) =>
+				route.fulfill({
+					status: 200,
+					contentType: 'application/json',
+					body: JSON.stringify({ pages: pageFixture(c.id) })
+				})
+		);
+	}
+	const png = Buffer.from(
+		'89504e470d0a1a0a0000000d49484452000000010000000108060000001f15c4890000000d49444154789c63000100000005000158a3b62a0000000049454e44ae426082',
+		'hex'
+	);
+	await page.route('**/api/v1/files/**', (route) =>
+		route.fulfill({ status: 200, contentType: 'image/png', body: png })
+	);
+}
+
+test('reader chapter select lists every chapter with the manga-detail-style label', async ({
+	page
+}) => {
+	await mockReaderApis(page);
+	await page.goto(`/manga/${mangaId}/chapter/${chapter2Id}`);
+
+	const select = page.getByTestId('reader-chapter-select');
+	await expect(select).toBeVisible();
+
+	// The current chapter is preselected.
+	await expect(select).toHaveValue(chapter2Id);
+
+	// Each chapter rendered as "Ch. N — Title" (or "Ch. N" when title is null),
+	// in ascending number order — matching the prev/next sort.
+	const labels = await select.locator('option').allTextContents();
+	expect(labels.map((l) => l.trim())).toEqual([
+		'Ch. 1 — Somewhere, Not Here',
+		'Ch. 2',
+		'Ch. 3 — Sword Dance'
+	]);
+});
+
+test('choosing a chapter from the select navigates to that chapter', async ({ page }) => {
+	await mockReaderApis(page);
+	await page.goto(`/manga/${mangaId}/chapter/${chapter1Id}`);
+
+	await expect(page.getByTestId('reader-chapter-select')).toHaveValue(chapter1Id);
+
+	await page.getByTestId('reader-chapter-select').selectOption(chapter3Id);
+
+	await expect(page).toHaveURL(
+		new RegExp(`/manga/${mangaId}/chapter/${chapter3Id}$`)
+	);
+	await expect(page.getByTestId('reader-chapter-select')).toHaveValue(chapter3Id);
+});
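The chapter-list mock uses a `RegExp` instead of a glob. The pattern matches the list endpoint with or without a query string, but not the per-chapter or per-page routes registered in the loop. The standalone check below exercises that pattern. The sample URLs and the `localhost:5173` origin are illustrative only.

```typescript
// Same pattern as the chapters-list route in mockReaderApis.
const mangaId = '33333333-3333-3333-3333-333333333333';
const chaptersList = new RegExp(`/api/v1/mangas/${mangaId}/chapters(\\?.*)?$`);

const origin = 'http://localhost:5173'; // illustrative dev-server origin
const chapterId = 'c1111111-3333-3333-3333-333333333333';
const samples: Array<[string, boolean]> = [
	[`${origin}/api/v1/mangas/${mangaId}/chapters`, true],
	[`${origin}/api/v1/mangas/${mangaId}/chapters?limit=200&offset=0`, true],
	// Per-chapter and per-page URLs must fall through to their own mocks.
	[`${origin}/api/v1/mangas/${mangaId}/chapters/${chapterId}`, false],
	[`${origin}/api/v1/mangas/${mangaId}/chapters/${chapterId}/pages`, false]
];

for (const [url, expected] of samples) {
	console.log(chaptersList.test(url) === expected ? 'ok' : 'MISMATCH', url);
}
```

The `(\\?.*)?$` suffix does the work. After `chapters`, the URL must either end or continue with `?`, so a `/` segment cannot match.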
@@ -120,7 +120,7 @@ test('manga overview shows title, cover, and a chapter list', async ({ page }) =
 	await expect(page.getByTestId('manga-title')).toHaveText('Berserk');
 	await expect(page.getByTestId('manga-author')).toContainText('Kentaro Miura');
 	await expect(page.getByTestId('manga-cover')).toBeVisible();
-	await expect(page.getByTestId('chapter-list')).toContainText('Chapter 1');
+	await expect(page.getByTestId('chapter-list')).toContainText('The Brand');
 	await expect(page.getByTestId('bookmark-signin')).toBeVisible();
 });
 
@@ -1,6 +1,6 @@
 {
 	"name": "mangalord-frontend",
-	"version": "0.45.0",
+	"version": "0.54.0",
 	"private": true,
 	"type": "module",
 	"scripts": {
@@ -14,7 +14,19 @@ import {
 	createAdminUser,
 	listAdminMangas,
 	listAdminChapters,
-	getSystemStats
+	getSystemStats,
+	resyncManga,
+	resyncChapter,
+	getCrawlerStatus,
+	crawlerStatusStreamUrl,
+	runCrawlerPass,
+	restartCrawlerBrowser,
+	updateCrawlerSession,
+	clearCrawlerSessionExpired,
+	listDeadJobs,
+	requeueDeadJobs,
+	listActiveJobs,
+	listMissingCovers
 } from './admin';
 
 function ok(body: unknown, status = 200): Response {
@@ -242,4 +254,211 @@ describe('admin api client', () => {
 		const s = await getSystemStats();
 		expect(s.disk).toBeNull();
 	});
+
+	// ---- force resync ----
+
+	it('resyncManga POSTs to /v1/admin/mangas/{id}/resync and returns the envelope', async () => {
+		const resp = {
+			manga: {
+				id: 'm-1',
+				title: 'T',
+				status: 'ongoing',
+				alt_titles: [],
+				description: null,
+				cover_image_path: 'mangas/m-1/cover.jpg',
+				created_at: '2026-01-01T00:00:00Z',
+				updated_at: '2026-01-02T00:00:00Z',
+				authors: [],
+				genres: [],
+				tags: []
+			},
+			metadata_status: 'updated',
+			cover_fetched: true
+		};
+		fetchSpy.mockResolvedValueOnce(ok(resp));
+		const got = await resyncManga('m-1');
+		expect(got.metadata_status).toBe('updated');
+		expect(got.cover_fetched).toBe(true);
+		expect(got.manga.id).toBe('m-1');
+		const url = fetchSpy.mock.calls[0][0] as string;
+		expect(url).toMatch(/\/v1\/admin\/mangas\/m-1\/resync$/);
+		const init = fetchSpy.mock.calls[0][1] as RequestInit;
+		expect(init.method).toBe('POST');
+	});
+
+	it('resyncManga surfaces 503 service_unavailable when the daemon is off', async () => {
+		fetchSpy.mockResolvedValueOnce(
+			envelope(503, 'service_unavailable', 'crawler daemon is disabled')
+		);
+		await expect(resyncManga('m-1')).rejects.toMatchObject({
+			status: 503,
+			code: 'service_unavailable'
+		});
+	});
+
+	it('resyncChapter POSTs to /v1/admin/chapters/{id}/resync and returns the envelope', async () => {
+		const resp = {
+			chapter: {
+				id: 'c-1',
+				manga_id: 'm-1',
+				number: 1,
+				title: 'Foo',
+				page_count: 7,
+				created_at: '2026-01-01T00:00:00Z'
+			},
+			outcome: 'fetched',
+			pages: 7
+		};
+		fetchSpy.mockResolvedValueOnce(ok(resp));
+		const got = await resyncChapter('c-1');
+		expect(got.outcome).toBe('fetched');
+		expect(got.pages).toBe(7);
+		expect(got.chapter.page_count).toBe(7);
+		const url = fetchSpy.mock.calls[0][0] as string;
+		expect(url).toMatch(/\/v1\/admin\/chapters\/c-1\/resync$/);
+		const init = fetchSpy.mock.calls[0][1] as RequestInit;
+		expect(init.method).toBe('POST');
+	});
+
+	it('resyncChapter handles the "skipped" outcome envelope', async () => {
+		const resp = {
+			chapter: {
+				id: 'c-1',
+				manga_id: 'm-1',
+				number: 1,
+				title: null,
+				page_count: 7,
+				created_at: '2026-01-01T00:00:00Z'
+			},
+			outcome: 'skipped',
+			pages: null
+		};
+		fetchSpy.mockResolvedValueOnce(ok(resp));
+		const got = await resyncChapter('c-1');
+		expect(got.outcome).toBe('skipped');
+		expect(got.pages).toBeNull();
+	});
+});
+
+describe('admin crawler api client', () => {
+	let fetchSpy: MockInstance<typeof globalThis.fetch>;
+	beforeEach(() => {
+		fetchSpy = vi.spyOn(globalThis, 'fetch');
+	});
+	afterEach(() => {
+		vi.restoreAllMocks();
+	});
+
+	const statusFixture = {
+		daemon: 'running',
+		phase: { state: 'fetching_metadata', index: 3, total: 10, title: 'One Piece' },
+		worker_count: 2,
+		active_chapters: [
+			{
+				manga_id: 'm-1',
+				manga_title: 'Bleach',
+				chapter_id: 'c-1',
+				chapter_number: 12,
+				pages_done: 4,
+				pages_total: 20
+			}
+		],
+		current_cover: { manga_id: 'm-2', manga_title: 'Naruto' },
+		covers_queued: 7,
+		last_pass: { at: null, discovered: 0, upserted: 0, covers_fetched: 0, mangas_failed: 0 },
+		session: { expired: false, configured: true },
+		browser: 'healthy',
+		queue: { pending: 2, running: 1, dead: 4 }
+	};
+
+	it('crawlerStatusStreamUrl points at the SSE endpoint under the API base', () => {
+		expect(crawlerStatusStreamUrl()).toMatch(/\/v1\/admin\/crawler\/stream$/);
+	});
+
+	it('getCrawlerStatus GETs /v1/admin/crawler with live chapter/cover fields', async () => {
+		fetchSpy.mockResolvedValueOnce(ok(statusFixture));
+		const s = await getCrawlerStatus();
+		expect(s.queue.dead).toBe(4);
+		expect(s.phase?.state).toBe('fetching_metadata');
+		expect(s.active_chapters[0].pages_done).toBe(4);
+		expect(s.active_chapters[0].pages_total).toBe(20);
+		expect(s.current_cover?.manga_title).toBe('Naruto');
+		expect(s.covers_queued).toBe(7);
+		const url = fetchSpy.mock.calls[0][0] as string;
+		expect(url).toMatch(/\/v1\/admin\/crawler$/);
+	});
+
+	it('listActiveJobs GETs /v1/admin/crawler/active-jobs with search', async () => {
+		fetchSpy.mockResolvedValueOnce(
+			ok({ items: [], page: { limit: 20, offset: 0, total: 0 } })
+		);
+		await listActiveJobs({ search: 'bleach' });
+		const url = fetchSpy.mock.calls[0][0] as string;
+		expect(url).toMatch(/\/v1\/admin\/crawler\/active-jobs\?/);
+		expect(url).toContain('search=bleach');
+	});
+
+	it('listMissingCovers GETs /v1/admin/crawler/covers', async () => {
+		fetchSpy.mockResolvedValueOnce(
+			ok({ items: [{ manga_id: 'm-1', manga_title: 'X' }], page: { limit: 20, offset: 0, total: 1 } })
+		);
+		const r = await listMissingCovers();
+		expect(r.items[0].manga_title).toBe('X');
+		expect(fetchSpy.mock.calls[0][0]).toMatch(/\/v1\/admin\/crawler\/covers$/);
+	});
+
+	it('runCrawlerPass POSTs /v1/admin/crawler/run', async () => {
+		fetchSpy.mockResolvedValueOnce(ok({ started: true }));
+		const r = await runCrawlerPass();
+		expect(r.started).toBe(true);
+		const init = fetchSpy.mock.calls[0][1] as RequestInit;
+		expect(init.method).toBe('POST');
+		expect(fetchSpy.mock.calls[0][0]).toMatch(/\/v1\/admin\/crawler\/run$/);
+	});
+
+	it('restartCrawlerBrowser POSTs the restart endpoint', async () => {
+		fetchSpy.mockResolvedValueOnce(ok({ ok: true, error: null }));
+		const r = await restartCrawlerBrowser();
+		expect(r.ok).toBe(true);
+		expect(fetchSpy.mock.calls[0][0]).toMatch(/\/v1\/admin\/crawler\/browser\/restart$/);
+	});
+
+	it('updateCrawlerSession POSTs the phpsessid body', async () => {
+		fetchSpy.mockResolvedValueOnce(ok({ valid: true, error: null }));
+		const r = await updateCrawlerSession('abc123');
+		expect(r.valid).toBe(true);
+		const init = fetchSpy.mock.calls[0][1] as RequestInit;
+		expect(init.method).toBe('POST');
+		expect(JSON.parse(init.body as string)).toEqual({ phpsessid: 'abc123' });
+	});
+
+	it('clearCrawlerSessionExpired POSTs clear-expired', async () => {
+		fetchSpy.mockResolvedValueOnce(ok({ cleared: true }));
+		const r = await clearCrawlerSessionExpired();
+		expect(r.cleared).toBe(true);
+		expect(fetchSpy.mock.calls[0][0]).toMatch(/\/v1\/admin\/crawler\/session\/clear-expired$/);
+	});
+
+	it('listDeadJobs forwards search + pagination', async () => {
+		fetchSpy.mockResolvedValueOnce(
+			ok({ items: [], page: { limit: 20, offset: 20, total: 0 } })
+		);
+		await listDeadJobs({ search: 'naruto', limit: 20, offset: 20 });
+		const url = fetchSpy.mock.calls[0][0] as string;
+		expect(url).toContain('search=naruto');
+		expect(url).toContain('offset=20');
+	});
+
+	it('requeueDeadJobs POSTs the scope body', async () => {
+		fetchSpy.mockResolvedValueOnce(ok({ requeued: 3 }));
+		const r = await requeueDeadJobs({ scope: 'manga', manga_id: 'm-9' });
+		expect(r.requeued).toBe(3);
+		const init = fetchSpy.mock.calls[0][1] as RequestInit;
+		expect(JSON.parse(init.body as string)).toEqual({ scope: 'manga', manga_id: 'm-9' });
+	});
+
+	it('surfaces a 503 as ApiError', async () => {
+		fetchSpy.mockResolvedValueOnce(envelope(503, 'service_unavailable', 'disabled'));
+		await expect(runCrawlerPass()).rejects.toMatchObject({ status: 503 });
+	});
 });
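The list clients `listDeadJobs`, `listActiveJobs` and `listMissingCovers` all assemble their query string the same way:

- `search` is sent only when it is non-empty.
- `limit` and `offset` are sent whenever they are not `null`/`undefined`, so `offset: 0` still goes out.
- The `?` is omitted entirely when nothing is set.

The sketch below factors that logic into one helper. `withListQuery` is hypothetical and not part of this change.

```typescript
// Hypothetical helper mirroring the inline logic of the three list clients.
type ListQuery = { search?: string; limit?: number; offset?: number };

function withListQuery(path: string, opts?: ListQuery): string {
	const params = new URLSearchParams();
	// Empty-string search is dropped, matching `if (opts?.search)`.
	if (opts?.search) params.set('search', opts.search);
	// `!= null` keeps 0, so the first page can still send offset=0.
	if (opts?.limit != null) params.set('limit', String(opts.limit));
	if (opts?.offset != null) params.set('offset', String(opts.offset));
	const qs = params.toString();
	return qs ? `${path}?${qs}` : path;
}

console.log(withListQuery('/v1/admin/crawler/dead-jobs', { search: 'naruto', limit: 20, offset: 20 }));
console.log(withListQuery('/v1/admin/crawler/covers'));
```

`URLSearchParams` serializes as `application/x-www-form-urlencoded`, so a search such as `one piece` is sent as `search=one+piece`.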
@@ -3,8 +3,10 @@
 // won't reach these routes). 403s thrown here propagate up to the
 // /admin layout, which renders the framework error page.
 
-import { request, type Page } from './client';
+import { request, apiUrl, type Page } from './client';
 import type { User } from './auth';
+import type { MangaDetail } from './mangas';
+import type { Chapter } from './chapters';
 
 // ---- users -----------------------------------------------------------------
 
@@ -176,3 +178,208 @@ export type SystemStats = {
 export async function getSystemStats(): Promise<SystemStats> {
 	return request<SystemStats>('/v1/admin/system');
 }
+
+// ---- force resync ----------------------------------------------------------
+
+export type MangaResyncResponse = {
+	manga: MangaDetail;
+	metadata_status: 'new' | 'updated' | 'unchanged';
+	cover_fetched: boolean;
+};
+
+export type ChapterResyncResponse = {
+	chapter: Chapter;
+	outcome: 'fetched' | 'skipped';
+	/** Page count when `outcome === 'fetched'`; null when skipped. */
+	pages: number | null;
+};
+
+/** POST /v1/admin/mangas/:id/resync — refetches metadata + cover from
+ * the manga's live crawler source. Long-running (one HTTP request per
+ * Chromium nav + image download), so the UI should disable the trigger
+ * and surface progress. */
+export async function resyncManga(id: string): Promise<MangaResyncResponse> {
+	return request<MangaResyncResponse>(
+		`/v1/admin/mangas/${encodeURIComponent(id)}/resync`,
+		{ method: 'POST' }
+	);
+}
+
+/** POST /v1/admin/chapters/:id/resync — force-refetches a chapter's
+ * pages even if `page_count > 0`. Same long-running caveat as
+ * `resyncManga`. */
+export async function resyncChapter(id: string): Promise<ChapterResyncResponse> {
+	return request<ChapterResyncResponse>(
+		`/v1/admin/chapters/${encodeURIComponent(id)}/resync`,
+		{ method: 'POST' }
+	);
+}
+
+// ---- crawler observability + control ---------------------------------------
+
+/** Current daemon activity. Discriminated on `state`. */
+export type CrawlerPhase =
+	| { state: 'idle'; next_fire: string | null }
+	| { state: 'walking_list' }
+	| { state: 'fetching_metadata'; index: number; total: number | null; title: string }
+	| { state: 'cover_backfill'; index: number; total: number };
+
+/** A chapter being crawled right now, with a live page count. */
+export type ActiveChapter = {
+	manga_id: string;
+	manga_title: string;
+	chapter_id: string;
+	chapter_number: number;
+	pages_done: number;
+	pages_total: number | null;
+};
+
+export type CrawlerLastPass = {
+	at: string | null;
+	discovered: number;
+	upserted: number;
+	covers_fetched: number;
+	mangas_failed: number;
+};
+
+export type CrawlerStatus = {
+	daemon: 'running' | 'disabled';
+	phase: CrawlerPhase | null;
+	worker_count: number;
+	active_chapters: ActiveChapter[];
+	current_cover: { manga_id: string; manga_title: string } | null;
+	covers_queued: number;
+	last_pass: CrawlerLastPass;
+	session: { expired: boolean; configured: boolean };
+	browser: 'healthy' | 'draining' | 'restarting' | 'down';
+	queue: { pending: number; running: number; dead: number };
+};
+
+export async function getCrawlerStatus(): Promise<CrawlerStatus> {
+	return request<CrawlerStatus>('/v1/admin/crawler');
+}
+
+/** URL of the Server-Sent Events live-status stream. Open with
+ * `new EventSource(...)` while the crawler page is mounted and close it on
+ * navigate-away so the subscription is scoped to the active page. Each
+ * message is a named `status` event whose `data` is a {@link CrawlerStatus}. */
+export function crawlerStatusStreamUrl(): string {
+	return apiUrl('/v1/admin/crawler/stream');
+}
+
+/** POST /v1/admin/crawler/run — trigger an out-of-cycle metadata pass. */
+export async function runCrawlerPass(): Promise<{ started: boolean }> {
+	return request('/v1/admin/crawler/run', { method: 'POST' });
+}
+
+/** POST /v1/admin/crawler/browser/restart — coordinated Chromium restart. */
+export async function restartCrawlerBrowser(): Promise<{ ok: boolean; error: string | null }> {
+	return request('/v1/admin/crawler/browser/restart', { method: 'POST' });
+}
+
+/** POST /v1/admin/crawler/session — refresh PHPSESSID and re-probe. */
+export async function updateCrawlerSession(
+	phpsessid: string
+): Promise<{ valid: boolean; error: string | null }> {
+	return request('/v1/admin/crawler/session', {
+		method: 'POST',
+		headers: { 'content-type': 'application/json' },
+		body: JSON.stringify({ phpsessid })
+	});
+}
+
+/** POST /v1/admin/crawler/session/clear-expired — resume idled workers. */
+export async function clearCrawlerSessionExpired(): Promise<{ cleared: boolean }> {
+	return request('/v1/admin/crawler/session/clear-expired', { method: 'POST' });
+}
+
+export type DeadJob = {
+	id: string;
+	kind: string;
+	chapter_id: string | null;
+	manga_id: string | null;
+	manga_title: string | null;
+	chapter_number: number | null;
+	attempts: number;
+	max_attempts: number;
+	last_error: string | null;
+	updated_at: string;
+};
+
+export type DeadJobsPage = { items: DeadJob[]; page: Page };
+
+export async function listDeadJobs(opts?: {
+	search?: string;
+	limit?: number;
+	offset?: number;
+}): Promise<DeadJobsPage> {
+	const params = new URLSearchParams();
+	if (opts?.search) params.set('search', opts.search);
+	if (opts?.limit != null) params.set('limit', String(opts.limit));
+	if (opts?.offset != null) params.set('offset', String(opts.offset));
+	const qs = params.toString();
+	return request<DeadJobsPage>(`/v1/admin/crawler/dead-jobs${qs ? `?${qs}` : ''}`);
+}
+
+/** Requeue scope: all dead jobs, one manga's, one chapter's, or a single job. */
+export type RequeueScope =
+	| { scope: 'all' }
+	| { scope: 'manga'; manga_id: string }
+	| { scope: 'chapter'; chapter_id: string }
+	| { scope: 'job'; job_id: string };
+
+export async function requeueDeadJobs(scope: RequeueScope): Promise<{ requeued: number }> {
+	return request('/v1/admin/crawler/dead-jobs/requeue', {
+		method: 'POST',
+		headers: { 'content-type': 'application/json' },
+		body: JSON.stringify(scope)
+	});
+}
+
+/** A queued/running chapter-content job (which chapters are queued). */
+export type ActiveJob = {
+	id: string;
+	chapter_id: string | null;
+	manga_id: string | null;
+	manga_title: string | null;
+	chapter_number: number | null;
+	state: 'pending' | 'running';
+	attempts: number;
+	max_attempts: number;
+	updated_at: string;
+};
+
+export type ActiveJobsPage = { items: ActiveJob[]; page: Page };
+
+/** GET /v1/admin/crawler/active-jobs — which chapters of which mangas are
+ * queued or running now. */
+export async function listActiveJobs(opts?: {
+	search?: string;
+	limit?: number;
+	offset?: number;
+}): Promise<ActiveJobsPage> {
+	const params = new URLSearchParams();
+	if (opts?.search) params.set('search', opts.search);
+	if (opts?.limit != null) params.set('limit', String(opts.limit));
+	if (opts?.offset != null) params.set('offset', String(opts.offset));
+	const qs = params.toString();
+	return request<ActiveJobsPage>(`/v1/admin/crawler/active-jobs${qs ? `?${qs}` : ''}`);
+}
+
+/** A manga queued for a cover fetch (no cover yet + a live source). */
+export type MissingCover = { manga_id: string; manga_title: string };
+export type MissingCoversPage = { items: MissingCover[]; page: Page };
+
+/** GET /v1/admin/crawler/covers — which manga covers are queued. */
+export async function listMissingCovers(opts?: {
+	search?: string;
+	limit?: number;
+	offset?: number;
+}): Promise<MissingCoversPage> {
+	const params = new URLSearchParams();
+	if (opts?.search) params.set('search', opts.search);
+	if (opts?.limit != null) params.set('limit', String(opts.limit));
+	if (opts?.offset != null) params.set('offset', String(opts.offset));
+	const qs = params.toString();
+	return request<MissingCoversPage>(`/v1/admin/crawler/covers${qs ? `?${qs}` : ''}`);
+}
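`CrawlerPhase` is a discriminated union on `state`. A `switch` therefore narrows each branch to its own fields. Assigning the leftover value to `never` turns any newly added state into a compile error.

Below is a sketch of a status-line formatter an admin page might use. `describePhase`, `chapterProgress` and their wording are hypothetical. The types are copied from this diff so the block compiles on its own.

```typescript
// Types copied from the crawler section of admin.ts in this diff.
type CrawlerPhase =
	| { state: 'idle'; next_fire: string | null }
	| { state: 'walking_list' }
	| { state: 'fetching_metadata'; index: number; total: number | null; title: string }
	| { state: 'cover_backfill'; index: number; total: number };

type ActiveChapter = {
	manga_id: string;
	manga_title: string;
	chapter_id: string;
	chapter_number: number;
	pages_done: number;
	pages_total: number | null;
};

// Hypothetical formatter; the labels are illustrative, not the app's copy.
function describePhase(phase: CrawlerPhase | null): string {
	if (phase === null) return 'no phase reported';
	switch (phase.state) {
		case 'idle':
			return phase.next_fire ? `idle until ${phase.next_fire}` : 'idle';
		case 'walking_list':
			return 'walking source list';
		case 'fetching_metadata':
			// `total` may be null (presumably not known yet); show it as '?'.
			return `metadata ${phase.index}/${phase.total ?? '?'}: ${phase.title}`;
		case 'cover_backfill':
			return `covers ${phase.index}/${phase.total}`;
		default: {
			// Adding a state to CrawlerPhase without handling it fails to compile here.
			const unhandled: never = phase;
			return unhandled;
		}
	}
}

// `pages_total` may be null (presumably before the page count is known).
function chapterProgress(c: ActiveChapter): string {
	return `${c.manga_title} ch. ${c.chapter_number}: ${c.pages_done}/${c.pages_total ?? '?'}`;
}

console.log(describePhase({ state: 'fetching_metadata', index: 3, total: 10, title: 'One Piece' }));
```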
@@ -102,10 +102,14 @@ export async function deleteToken(id: string): Promise<void> {
 }
 
 export type AuthConfig = {
-	/** When false, /v1/auth/register returns 403 and the UI should
+	/** Effective value (`allow_self_register && !private_mode`).
+	 * When false, /v1/auth/register returns 403 and the UI should
 	 * hide its register affordance. Admins can still mint accounts
 	 * via POST /v1/admin/users. */
 	self_register_enabled: boolean;
+	/** When true, every read endpoint requires auth and anonymous
+	 * visitors are redirected to `/login` (see `+layout.ts`). */
+	private_mode: boolean;
 };
 
 /** Public — no auth, no cookie required. */
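The `AuthConfig` comments state two rules:

- `self_register_enabled` is the effective value `allow_self_register && !private_mode`.
- When `private_mode` is on, anonymous visitors are redirected to `/login`.

The sketch below expresses both rules as pure functions. `effectiveSelfRegister` and `shouldRedirectToLogin` are hypothetical names. Exempting only `/login` itself is an assumption; some exemption is required, or the redirect would loop.

```typescript
type AuthConfig = { self_register_enabled: boolean; private_mode: boolean };

// Server-side rule as documented on AuthConfig.self_register_enabled.
function effectiveSelfRegister(allowSelfRegister: boolean, privateMode: boolean): boolean {
	return allowSelfRegister && !privateMode;
}

// Hypothetical client-side guard: only anonymous visitors on a private
// instance are bounced, and never from the login page itself (assumed
// to be the only exempt route in this sketch).
function shouldRedirectToLogin(cfg: AuthConfig, authenticated: boolean, pathname: string): boolean {
	if (!cfg.private_mode || authenticated) return false;
	return pathname !== '/login';
}

const privateCfg: AuthConfig = {
	self_register_enabled: effectiveSelfRegister(true, true),
	private_mode: true
};
console.log(privateCfg.self_register_enabled, shouldRedirectToLogin(privateCfg, false, '/'));
```

The private-mode e2e specs exercise exactly these two cases: `/` bounces for an anonymous visitor, and `/login` renders without a register link.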
@@ -11,7 +11,8 @@ import {
 	listChapters,
 	getChapter,
 	getChapterPages,
-	createChapter
+	createChapter,
+	chapterLabel
 } from './chapters';
 
 function ok(body: unknown): Response {
@@ -129,6 +130,18 @@ describe('chapters api client', () => {
 		}
 	});
+
+	describe('chapterLabel', () => {
+		it('returns the site title verbatim when present', () => {
+			expect(chapterLabel({ number: 7, title: 'Ch.7 : Official' })).toBe(
+				'Ch.7 : Official'
+			);
+		});
+
+		it('falls back to "Chapter {number}" when title is null', () => {
+			expect(chapterLabel({ number: 3, title: null })).toBe('Chapter 3');
+		});
+	});
 
 	it('getChapterPages unwraps the {pages} envelope into the array', async () => {
 		fetchSpy.mockResolvedValueOnce(
 			ok({
@@ -14,6 +14,10 @@ export type ChaptersPage = {
 	page: Page;
 };
+
+export function chapterLabel(c: Pick<Chapter, 'number' | 'title'>): string {
+	return c.title ?? `Chapter ${c.number}`;
+}
 
 export type ListOptions = {
 	limit?: number;
 	offset?: number;
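`chapterLabel` uses the site-provided title verbatim and only synthesizes `Chapter {number}` when `title` is null. The sketch below builds ascending select options with it. `chapterOptions` is a hypothetical helper, and `Chapter` is trimmed to the fields it uses. It sorts a copy numerically, so the caller's array is left untouched and chapter 10 sorts after chapter 2.

```typescript
// `Chapter` trimmed to the fields this sketch needs.
type Chapter = { id: string; number: number; title: string | null };

// Same body as chapterLabel in chapters.ts.
function chapterLabel(c: Pick<Chapter, 'number' | 'title'>): string {
	return c.title ?? `Chapter ${c.number}`;
}

// Hypothetical helper: numeric ascending sort on a copy, then label each.
function chapterOptions(chapters: readonly Chapter[]): { value: string; label: string }[] {
	return [...chapters]
		.sort((a, b) => a.number - b.number) // numeric, so 10 sorts after 2
		.map((c) => ({ value: c.id, label: chapterLabel(c) }));
}

const opts = chapterOptions([
	{ id: 'c10', number: 10, title: null },
	{ id: 'c2', number: 2, title: null },
	{ id: 'c1', number: 1, title: 'Ch.1 : Start' }
]);
console.log(opts.map((o) => o.label));
```

Because `??` only falls back on `null`/`undefined`, an empty-string title is kept as an empty label rather than replaced.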
@@ -12,6 +12,15 @@ export function fileUrl(key: string): string {
 	return `${BASE}/v1/files/${key}`;
 }
 
+/**
+ * Builds an API URL for non-`fetch` consumers (e.g. `EventSource` for SSE),
+ * applying the same `VITE_API_BASE` prefix as `request()`. `path` is the
+ * route after the base, e.g. `/v1/admin/crawler/stream`.
+ */
+export function apiUrl(path: string): string {
+	return `${BASE}${path}`;
+}
+
 export class ApiError extends Error {
 	constructor(
 		public readonly status: number,
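`apiUrl` joins `BASE` and `path` by plain concatenation. That works when the base has no trailing slash and `path` starts with one, as in the doc comment's `/v1/admin/crawler/stream` example. The sketch below is a hypothetical factory that tolerates either form. `makeApiUrl` and the `api.example.test` host are illustrative placeholders, not names from this codebase.

```typescript
// Hypothetical factory mirroring apiUrl(): prefix every route with a base.
// Unlike the one-line helper in the diff, it trims trailing "/" from the
// base and adds a leading "/" to the path, so the join never doubles or
// drops the separator.
function makeApiUrl(base: string): (path: string) => string {
	const trimmed = base.replace(/\/+$/, '');
	return (path: string) => `${trimmed}${path.startsWith('/') ? path : `/${path}`}`;
}

const apiUrl = makeApiUrl('https://api.example.test/');
// An SSE consumer would hand this string to `new EventSource(...)`.
console.log(apiUrl('/v1/admin/crawler/stream')); // https://api.example.test/v1/admin/crawler/stream
console.log(makeApiUrl('')('v1/files/abc')); // /v1/files/abc
```

If `VITE_API_BASE` is always configured without a trailing slash, the one-line helper in the diff behaves identically.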
@@ -16,6 +16,7 @@ import { getAuthConfig } from './api/auth';
 
 class AuthConfigStore {
 	self_register_enabled = $state(true);
+	private_mode = $state(false);
 	loaded = $state(false);
 	private loading = false;
 
@@ -25,6 +26,7 @@ class AuthConfigStore {
 		try {
 			const cfg = await getAuthConfig();
 			this.self_register_enabled = cfg.self_register_enabled;
+			this.private_mode = cfg.private_mode;
 			this.loaded = true;
 		} catch {
 			// Keep optimistic default; next page mount will retry.
@@ -32,6 +34,16 @@ class AuthConfigStore {
 			this.loading = false;
 		}
 	}
+
+	/** Seed from server-rendered layout data so the very first paint
+	 * doesn't flash the loading state. Used by `+layout.ts` /
+	 * `+layout.svelte` on the universal-load path. Safe to call from
+	 * SSR (no `browser` guard) since it touches only reactive state. */
+	seed(cfg: { self_register_enabled: boolean; private_mode: boolean }): void {
+		this.self_register_enabled = cfg.self_register_enabled;
+		this.private_mode = cfg.private_mode;
+		this.loaded = true;
+	}
 }
 
 export const authConfig = new AuthConfigStore();
128
frontend/src/lib/components/Pager.svelte
Normal file
@@ -0,0 +1,128 @@
+<script lang="ts">
+	type Props = {
+		page: number;
+		totalPages: number;
+		onChange: (page: number) => void;
+		testid?: string;
+	};
+
+	let { page, totalPages, onChange, testid }: Props = $props();
+
+	type Slot = number | 'ellipsis';
+
+	// Compact layout: always show first + last, surround the current page with
+	// its direct neighbours, and use "…" to elide the rest. Keeps the bar to
+	// at most 7 buttons regardless of totalPages.
+	function buildSlots(p: number, total: number): Slot[] {
+		if (total <= 7) {
+			return Array.from({ length: total }, (_, i) => i + 1);
+		}
+		const out: Slot[] = [1];
+		if (p <= 4) {
+			for (let i = 2; i <= 5; i++) out.push(i);
+			out.push('ellipsis');
+			out.push(total);
+		} else if (p >= total - 3) {
+			out.push('ellipsis');
+			for (let i = total - 4; i <= total; i++) out.push(i);
+		} else {
+			out.push('ellipsis');
+			out.push(p - 1);
+			out.push(p);
+			out.push(p + 1);
+			out.push('ellipsis');
+			out.push(total);
+		}
+		return out;
+	}
+
+	const slots = $derived(buildSlots(page, totalPages));
+</script>
+
+{#if totalPages > 1}
+	<nav class="pager" aria-label="Pagination" data-testid={testid}>
+		<button
+			type="button"
+			class="step"
+			disabled={page <= 1}
+			onclick={() => onChange(page - 1)}
+			aria-label="Previous page"
+		>
+			‹ Prev
+		</button>
+
+		{#each slots as slot, i (i)}
+			{#if slot === 'ellipsis'}
+				<span class="ellipsis" aria-hidden="true">…</span>
+			{:else}
+				<button
+					type="button"
+					class="num"
+					class:active={slot === page}
+					aria-current={slot === page ? 'page' : undefined}
+					aria-label={`Go to page ${slot}`}
+					onclick={() => onChange(slot)}
+				>
+					{slot}
+				</button>
+			{/if}
+		{/each}
+
+		<button
+			type="button"
+			class="step"
+			disabled={page >= totalPages}
+			onclick={() => onChange(page + 1)}
+			aria-label="Next page"
+		>
+			Next ›
+		</button>
+	</nav>
+{/if}
+
+<style>
+	.pager {
+		display: flex;
+		flex-wrap: wrap;
+		align-items: center;
+		gap: var(--space-1);
+		margin: var(--space-4) 0;
+		justify-content: center;
+	}
+
+	.step,
+	.num {
+		min-width: 36px;
+		height: 36px;
+		padding: 0 var(--space-2);
+		background: var(--surface);
+		border: 1px solid var(--border);
+		border-radius: var(--radius-md);
+		color: var(--text);
+		cursor: pointer;
+		font-size: var(--font-sm);
+	}
+
+	.step:hover:not(:disabled),
+	.num:hover:not(.active) {
+		border-color: var(--primary);
+	}
+
+	.step:disabled {
+		opacity: 0.4;
+		cursor: not-allowed;
+	}
+
+	.num.active {
+		background: var(--primary);
+		color: var(--primary-contrast);
+		border-color: var(--primary);
+		cursor: default;
+	}
+
+	.ellipsis {
+		padding: 0 var(--space-1);
+		color: var(--text-muted);
+		user-select: none;
+	}
+</style>
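`buildSlots` is a pure function, so its three layout branches can be checked outside the component. The component does not export it, so the sketch below is a standalone copy with the same logic; consecutive `push` calls are folded into single multi-argument calls. The `show` formatter is a hypothetical helper added for readability. Once `totalPages` exceeds 7, the result always has exactly seven entries, one or two of which are ellipses. The component's "at most 7 buttons" comment is therefore best read as "at most 7 slots".

```typescript
type Slot = number | 'ellipsis';

// Same logic as Pager.svelte: first and last always shown, the current page
// framed by its neighbours, "ellipsis" for the gaps.
function buildSlots(p: number, total: number): Slot[] {
	if (total <= 7) {
		return Array.from({ length: total }, (_, i) => i + 1);
	}
	const out: Slot[] = [1];
	if (p <= 4) {
		for (let i = 2; i <= 5; i++) out.push(i);
		out.push('ellipsis', total);
	} else if (p >= total - 3) {
		out.push('ellipsis');
		for (let i = total - 4; i <= total; i++) out.push(i);
	} else {
		out.push('ellipsis', p - 1, p, p + 1, 'ellipsis', total);
	}
	return out;
}

// Hypothetical helper: render slots the way the pager bar reads.
const show = (slots: Slot[]): string =>
	slots.map((s) => (s === 'ellipsis' ? '…' : String(s))).join(' ');

console.log(show(buildSlots(2, 20))); // 1 2 3 4 5 … 20
console.log(show(buildSlots(10, 24))); // 1 … 9 10 11 … 24
console.log(show(buildSlots(18, 20))); // 1 … 16 17 18 19 20
console.log(show(buildSlots(4, 7))); // 1 2 3 4 5 6 7
```

The near-edge branches show five consecutive numbers instead of only the neighbours. This keeps the slot count constant, so the bar does not change width as the user pages.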
77
frontend/src/lib/components/Pager.svelte.test.ts
Normal file
@@ -0,0 +1,77 @@
+import { describe, it, expect, vi, afterEach } from 'vitest';
+import { render, screen, cleanup } from '@testing-library/svelte';
+import Pager from './Pager.svelte';
+
+afterEach(() => cleanup());
+
+describe('Pager', () => {
+	it('renders nothing when totalPages <= 1', () => {
+		const { container } = render(Pager, { props: { page: 1, totalPages: 1, onChange: () => {} } });
+		expect(container.querySelector('nav')).toBeNull();
+	});
+
+	it('disables Prev on the first page and Next on the last', () => {
+		const { rerender } = render(Pager, {
+			props: { page: 1, totalPages: 5, onChange: () => {} }
+		});
+		expect((screen.getByRole('button', { name: /prev/i }) as HTMLButtonElement).disabled).toBe(true);
+		expect((screen.getByRole('button', { name: /next/i }) as HTMLButtonElement).disabled).toBe(false);
+
+		rerender({ page: 5, totalPages: 5, onChange: () => {} });
+		expect((screen.getByRole('button', { name: /prev/i }) as HTMLButtonElement).disabled).toBe(false);
+		expect((screen.getByRole('button', { name: /next/i }) as HTMLButtonElement).disabled).toBe(true);
+	});
+
+	it('marks the current page button as aria-current', () => {
+		render(Pager, { props: { page: 3, totalPages: 5, onChange: () => {} } });
+		const current = screen.getByRole('button', { name: /go to page 3/i });
+		expect(current.getAttribute('aria-current')).toBe('page');
+	});
+
+	it('fires onChange with the clicked page number', async () => {
+		const onChange = vi.fn();
+		render(Pager, { props: { page: 1, totalPages: 5, onChange } });
+		screen.getByRole('button', { name: /go to page 3/i }).click();
+		expect(onChange).toHaveBeenCalledWith(3);
+	});
+
+	it('Prev decrements and Next increments via onChange', () => {
+		const onChange = vi.fn();
+		render(Pager, { props: { page: 3, totalPages: 5, onChange } });
+		screen.getByRole('button', { name: /prev/i }).click();
+		screen.getByRole('button', { name: /next/i }).click();
+		expect(onChange).toHaveBeenNthCalledWith(1, 2);
+		expect(onChange).toHaveBeenNthCalledWith(2, 4);
+	});
+
+	it('shows every page button when totalPages <= 7', () => {
+		render(Pager, { props: { page: 4, totalPages: 7, onChange: () => {} } });
+		for (let n = 1; n <= 7; n++) {
+			expect(screen.getByRole('button', { name: new RegExp(`go to page ${n}$`, 'i') })).toBeTruthy();
+		}
+	});
+
+	it('collapses middle pages with ellipsis when totalPages > 7 and current is in the middle', () => {
+		render(Pager, { props: { page: 10, totalPages: 24, onChange: () => {} } });
+		// First and last are always shown
+		expect(screen.getByRole('button', { name: /go to page 1$/i })).toBeTruthy();
+		expect(screen.getByRole('button', { name: /go to page 24$/i })).toBeTruthy();
+		// Current and direct neighbours are shown
+		expect(screen.getByRole('button', { name: /go to page 9$/i })).toBeTruthy();
+		expect(screen.getByRole('button', { name: /go to page 10$/i })).toBeTruthy();
+		expect(screen.getByRole('button', { name: /go to page 11$/i })).toBeTruthy();
+		// Distant pages are NOT rendered as buttons
+		expect(screen.queryByRole('button', { name: /go to page 2$/i })).toBeNull();
+		expect(screen.queryByRole('button', { name: /go to page 23$/i })).toBeNull();
+		// Ellipsis appears on both sides
+		const ellipses = screen.getAllByText('…');
+		expect(ellipses.length).toBeGreaterThanOrEqual(2);
+	});
+
+	it('does not duplicate boundary buttons when current is near the edge', () => {
+		render(Pager, { props: { page: 2, totalPages: 20, onChange: () => {} } });
+		// Each page button rendered should be unique — no duplicate "go to page 1"
+		const first = screen.getAllByRole('button', { name: /go to page 1$/i });
+		expect(first.length).toBe(1);
+	});
+});
@@ -1,6 +1,7 @@
 <script lang="ts">
 	import { onMount, onDestroy } from 'svelte';
 	import { goto } from '$app/navigation';
+	import { page } from '$app/stores';
 	import { logout } from '$lib/api/auth';
 	import { authConfig } from '$lib/auth-config.svelte';
 	import { preferences } from '$lib/preferences.svelte';
@@ -14,15 +15,49 @@
 	import Shield from '@lucide/svelte/icons/shield';
 	import '$lib/styles/tokens.css';
 
-	let { children } = $props();
+	let { children, data } = $props();
 	let loggingOut = $state(false);
 	let headerEl: HTMLElement | undefined = $state();
 
+	// Static-route title map. Dynamic pages (manga / author / collection /
+	// chapter) override this via their own <svelte:head><title>, since the
+	// title depends on data the layout doesn't have. Routes omitted here
+	// (notably the dynamic ones) fall through to the bare brand and rely
+	// on the page to set the descriptive form.
+	const STATIC_TITLES: Record<string, string> = {
+		'/': 'Mangalord',
+		'/login': 'Mangalord | Login',
+		'/register': 'Mangalord | Register',
+		'/upload': 'Mangalord | Upload',
+		'/bookmarks': 'Mangalord | Bookmarks',
+		'/collections': 'Mangalord | Collections',
+		'/profile': 'Mangalord | Profile',
+		'/profile/account': 'Mangalord | Account',
+		'/profile/bookmarks': 'Mangalord | Bookmarks',
+		'/profile/collections': 'Mangalord | Collections',
+		'/profile/history': 'Mangalord | Reading history',
+		'/profile/preferences': 'Mangalord | Preferences',
+		'/admin': 'Mangalord | Admin',
+		'/admin/mangas': 'Mangalord | Admin · Mangas',
+		'/admin/users': 'Mangalord | Admin · Users',
+		'/admin/system': 'Mangalord | Admin · System'
+	};
+
+	const layoutTitle = $derived(STATIC_TITLES[$page.route?.id ?? ''] ?? 'Mangalord');
+
+	// Seed authConfig from the universal layout load. $effect keeps
+	// the store in sync if `data` is replaced by a subsequent layout
+	// load (client-side nav). The first run also covers initial
+	// hydration so the navbar's register link reflects the real
+	// server flag without a separate fetch.
+	$effect(() => {
+		authConfig.seed(data.authConfig);
+	});
+
 	onMount(() => {
 		theme.init();
 		preferences.init();
 		if (!session.loaded) session.refresh();
-		if (!authConfig.loaded) authConfig.load();
 
 		// Publish the header's measured height as a CSS custom
 		// property so sticky descendants (e.g. the reader nav) can
@@ -70,6 +105,10 @@
 	}
 </script>
 
+<svelte:head>
+	<title>{layoutTitle}</title>
+</svelte:head>
+
 <header bind:this={headerEl}>
 	<nav aria-label="primary">
 		<a class="brand" href="/">Mangalord</a>
41
frontend/src/routes/+layout.ts
Normal file
@@ -0,0 +1,41 @@
+// Universal root load. Surfaces /auth/config to every page so the
+// navbar + layout can render without an extra round-trip, and — when
+// the backend reports PRIVATE_MODE=true — bounces anonymous visitors
+// to /login before any page-specific load fires. The backend
+// middleware is still the source of truth for the gate; this just
+// matches the UX so users don't see a page full of failed fetches.
+import type { LayoutLoad } from './$types';
+import { redirect } from '@sveltejs/kit';
+import { getAuthConfig, me, type AuthConfig } from '$lib/api/auth';
+
+// Paths reachable anonymously even when private_mode is on. /login is
+// the entry point of the auth flow; everything else (including
+// /register, which is force-blocked in private mode) bounces.
+const PRIVATE_MODE_BYPASS = new Set(['/login']);
+
+const PUBLIC_DEFAULTS: AuthConfig = {
+	self_register_enabled: true,
+	private_mode: false
+};
+
+export const load: LayoutLoad = async ({ url }) => {
+	let authConfig: AuthConfig = PUBLIC_DEFAULTS;
+	try {
+		authConfig = await getAuthConfig();
+	} catch {
+		// Fail-soft: keep the optimistic public-mode defaults so a
+		// backend hiccup doesn't lock anyone out of the login page.
+		// No private data can leak through here — the backend
+		// middleware is still authoritative for the gate.
+	}
+
+	if (authConfig.private_mode && !PRIVATE_MODE_BYPASS.has(url.pathname)) {
+		const user = await me().catch(() => null);
+		if (!user) {
+			const next = url.pathname + url.search;
+			redirect(302, `/login?next=${encodeURIComponent(next)}`);
+		}
+	}
+
+	return { authConfig };
+};
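The gate in `load` depends on four inputs: the pathname, the query string, the `private_mode` flag, and whether `me()` returned a user. The sketch below frames it as a pure function so the decision can be checked without SvelteKit. `loginRedirectFor` is a hypothetical name. It returns the redirect target instead of calling `redirect()`, which only works inside a load function. The `signedIn` parameter stands in for the `me()` lookup.

```typescript
// Mirrors PRIVATE_MODE_BYPASS from +layout.ts.
const PRIVATE_MODE_BYPASS = new Set(['/login']);

// Hypothetical: returns the /login URL an anonymous visitor would be sent
// to, or null when the page may render. `search` is URL.search, i.e. either
// "" or a string starting with "?".
function loginRedirectFor(
	pathname: string,
	search: string,
	privateMode: boolean,
	signedIn: boolean
): string | null {
	if (!privateMode || PRIVATE_MODE_BYPASS.has(pathname) || signedIn) return null;
	const next = pathname + search;
	return `/login?next=${encodeURIComponent(next)}`;
}

console.log(loginRedirectFor('/admin', '?tab=users', true, false)); // /login?next=%2Fadmin%3Ftab%3Dusers
console.log(loginRedirectFor('/login', '', true, false)); // null
console.log(loginRedirectFor('/admin', '', false, false)); // null
console.log(loginRedirectFor('/register', '', true, false)); // /login?next=%2Fregister
```

The diff does not include the code that reads `next` after login. That code would need to decode the value and should accept only same-origin paths so the parameter cannot become an open redirect.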
@@ -13,10 +13,13 @@
 	import { listTags, type Tag } from '$lib/api/tags';
 	import Chip from '$lib/components/Chip.svelte';
 	import MangaCard from '$lib/components/MangaCard.svelte';
+	import Pager from '$lib/components/Pager.svelte';
 	import Search from '@lucide/svelte/icons/search';
 	import SlidersHorizontal from '@lucide/svelte/icons/sliders-horizontal';
 	import Plus from '@lucide/svelte/icons/plus';
 
+	const PAGE_SIZE = 50;
+
 	let mangas: MangaCardData[] = $state([]);
 	let search = $state('');
 	let sort: MangaSort = $state('recent');
@@ -36,11 +39,21 @@
 	let total: number | null = $state(null);
 	let loading = $state(true);
 	let error: string | null = $state(null);
+	let currentPage = $state(1);
 
 	const activeFilterCount = $derived(
 		(statusFilter ? 1 : 0) + selectedGenres.length + selectedTags.length
 	);
 
+	const totalPages = $derived(
+		total != null && total > 0 ? Math.ceil(total / PAGE_SIZE) : 1
+	);
+
+	// 1-indexed range like "51–100 of 237", clamped to the actual loaded set
+	// in case the last page is short.
+	const rangeStart = $derived(mangas.length === 0 ? 0 : (currentPage - 1) * PAGE_SIZE + 1);
+	const rangeEnd = $derived((currentPage - 1) * PAGE_SIZE + mangas.length);
+
 	async function load() {
 		loading = true;
 		error = null;
@@ -50,7 +63,9 @@
 				status: statusFilter || undefined,
 				genreIds: selectedGenres.map((g) => g.id),
 				tagIds: selectedTags.map((t) => t.id),
-				sort
+				sort,
+				limit: PAGE_SIZE,
+				offset: (currentPage - 1) * PAGE_SIZE
 			});
 			mangas = result.items;
 			total = result.page.total;
@@ -71,11 +86,29 @@
 			params.set('genres', selectedGenres.map((g) => g.id).join(','));
 		if (selectedTags.length)
 			params.set('tags', selectedTags.map((t) => t.id).join(','));
+		if (currentPage > 1) params.set('page', String(currentPage));
 		const qs = params.toString();
 		const url = qs ? `/?${qs}` : '/';
 		goto(url, { replaceState: true, keepFocus: true, noScroll: true });
 	}
 
+	// Filter / search / sort changes invalidate the current page — drop back
+	// to page 1 so the user isn't stranded on an out-of-range page when the
+	// result set shrinks. Direct page navigation calls `goToPage()` instead.
+	function resetAndReload() {
+		currentPage = 1;
+		syncUrl();
+		load();
+	}
+
+	function goToPage(p: number) {
+		if (p === currentPage) return;
+		currentPage = p;
+		syncUrl();
+		load();
+		if (browser) window.scrollTo({ top: 0, behavior: 'smooth' });
+	}
+
 	async function hydrateFromUrl() {
 		// Parse the query and resolve the supplied ids back to full Tag /
 		// Genre objects so the chip rows render real labels.
@@ -100,6 +133,8 @@
 			const tags = await listTags({ limit: 50 });
 			selectedTags = tags.filter((t) => tagIds.includes(t.id));
 		}
+		const pageParam = Number(url.searchParams.get('page') ?? '1');
+		currentPage = Number.isFinite(pageParam) && pageParam >= 1 ? Math.floor(pageParam) : 1;
 		// Open the filters panel if anything is active so the user can see why.
 		if (statusFilter || selectedGenres.length || selectedTags.length) {
 			filtersOpen = true;
@@ -108,32 +143,27 @@
 
 	async function onSubmit(e: SubmitEvent) {
 		e.preventDefault();
-		syncUrl();
-		await load();
+		resetAndReload();
 	}
 
 	function onSortChange() {
-		syncUrl();
-		load();
+		resetAndReload();
 	}
 
 	function onStatusChange() {
-		syncUrl();
-		load();
+		resetAndReload();
 	}
 
 	function toggleGenre(g: Genre) {
 		selectedGenres = selectedGenres.some((x) => x.id === g.id)
 			? selectedGenres.filter((x) => x.id !== g.id)
 			: [...selectedGenres, g];
-		syncUrl();
-		load();
+		resetAndReload();
 	}
 
 	function removeTag(t: Tag) {
 		selectedTags = selectedTags.filter((x) => x.id !== t.id);
-		syncUrl();
-		load();
+		resetAndReload();
 	}
 
 	function pickTag(t: Tag) {
@@ -143,8 +173,7 @@
 		tagDraft = '';
 		tagSuggestions = [];
 		tagSuggestHighlight = -1;
-		syncUrl();
-		load();
+		resetAndReload();
 	}
 
 	function onTagDraftInput() {
@@ -192,8 +221,7 @@
 		statusFilter = '';
 		selectedGenres = [];
 		selectedTags = [];
-		syncUrl();
-		load();
+		resetAndReload();
 	}
 
 	onMount(async () => {
@@ -383,7 +411,7 @@
 {:else}
 	{#if total !== null}
 		<p class="count" data-testid="manga-total">
-			Showing {mangas.length} of {total}
+			Showing {rangeStart}–{rangeEnd} of {total}
 		</p>
 	{/if}
 	<ul class="manga-grid" data-testid="manga-list">
@@ -391,6 +419,12 @@
 			<MangaCard manga={m} authors={m.authors} genres={m.genres} />
 		{/each}
 	</ul>
+	<Pager
+		page={currentPage}
+		{totalPages}
+		onChange={goToPage}
+		testid="manga-pager"
+	/>
 {/if}
 
 <style>
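The listing's new page arithmetic can be checked in isolation. It covers four pieces: `totalPages`, the `offset` sent to `listMangas`, the `?page=` parsing in `hydrateFromUrl`, and the "51–100 of 237" range. The sketch below lifts those formulas into plain functions. The function names are hypothetical, and `PAGE_SIZE` matches the diff. One deliberate difference: `shownRange` returns `0–0` for an empty page. The diff's `rangeEnd` would instead report `(currentPage - 1) * PAGE_SIZE` if a stale `?page=` pointed past the last page and the count line still rendered.

```typescript
const PAGE_SIZE = 50;

// Same rule as `totalPages`: at least one page, even for an empty result.
function pageCount(total: number | null, size = PAGE_SIZE): number {
	return total != null && total > 0 ? Math.ceil(total / size) : 1;
}

// Same rule as the `offset` passed to listMangas.
function offsetFor(page: number, size = PAGE_SIZE): number {
	return (page - 1) * size;
}

// Same parsing as hydrateFromUrl: missing, non-numeric, non-finite or < 1
// becomes 1; fractional values are floored.
function parsePageParam(raw: string | null): number {
	const n = Number(raw ?? '1');
	return Number.isFinite(n) && n >= 1 ? Math.floor(n) : 1;
}

// 1-indexed inclusive range for the "Showing a–b of total" line.
// Variation on the diff: both ends are 0 when the page came back empty.
function shownRange(page: number, loaded: number, size = PAGE_SIZE): [number, number] {
	if (loaded === 0) return [0, 0];
	const start = offsetFor(page, size) + 1;
	return [start, start + loaded - 1];
}

console.log(pageCount(237)); // 5
console.log(offsetFor(2)); // 50
console.log(shownRange(2, 50).join('–')); // 51–100
console.log(shownRange(5, 37).join('–')); // 201–237
console.log(parsePageParam('abc'), parsePageParam('2.7'), parsePageParam(null)); // 1 2 1
```

Clamping `currentPage` to `totalPages` once `total` is known would be another way to handle a stale `?page=`. The diff leaves that to the user navigating back with the pager.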
@@ -6,6 +6,7 @@
 		{ href: '/admin', label: 'Overview' },
 		{ href: '/admin/users', label: 'Users' },
 		{ href: '/admin/mangas', label: 'Mangas' },
+		{ href: '/admin/crawler', label: 'Crawler' },
 		{ href: '/admin/system', label: 'System' }
 	];
 </script>
838
frontend/src/routes/admin/crawler/+page.svelte
Normal file
838
frontend/src/routes/admin/crawler/+page.svelte
Normal file
@@ -0,0 +1,838 @@
|
|||||||
|
<script lang="ts">
|
||||||
|
import { onMount, onDestroy } from 'svelte';
|
||||||
|
import Modal from '$lib/components/Modal.svelte';
|
||||||
|
import Pager from '$lib/components/Pager.svelte';
|
||||||
|
import {
|
||||||
|
getCrawlerStatus,
|
||||||
|
crawlerStatusStreamUrl,
|
||||||
|
runCrawlerPass,
|
||||||
|
restartCrawlerBrowser,
|
||||||
|
updateCrawlerSession,
|
||||||
|
clearCrawlerSessionExpired,
|
||||||
|
listDeadJobs,
|
||||||
|
requeueDeadJobs,
|
||||||
|
listActiveJobs,
|
||||||
|
listMissingCovers,
|
||||||
|
type CrawlerStatus,
|
||||||
|
type CrawlerPhase,
|
||||||
|
type DeadJob,
|
||||||
|
type ActiveJob,
|
||||||
|
type MissingCover,
|
||||||
|
type RequeueScope
|
||||||
|
} from '$lib/api/admin';
|
||||||
|
|
||||||
|
let status: CrawlerStatus | null = $state(null);
|
||||||
|
let error: string | null = $state(null);
|
||||||
|
let notice: string | null = $state(null);
|
||||||
|
let live = $state(false);
|
||||||
|
let source: EventSource | null = null;
|
||||||
|
let busy = $state(false);
|
||||||
|
|
||||||
|
// Dead jobs
|
||||||
|
let deadJobs: DeadJob[] = $state([]);
|
||||||
|
let deadTotal = $state(0);
|
||||||
|
let deadSearch = $state('');
|
||||||
|
let deadPage = $state(1);
|
||||||
|
const DEAD_LIMIT = 20;
|
||||||
|
|
||||||
|
// Queued chapters (pending/running)
|
||||||
|
let activeJobs: ActiveJob[] = $state([]);
|
||||||
|
let activeTotal = $state(0);
|
||||||
|
let activeSearch = $state('');
|
||||||
|
let activePage = $state(1);
|
||||||
|
const ACTIVE_LIMIT = 20;
|
||||||
|
|
||||||
|
// Queued covers (mangas missing a cover)
|
||||||
|
let covers: MissingCover[] = $state([]);
|
||||||
|
let coversTotal = $state(0);
|
||||||
|
let coversSearch = $state('');
|
||||||
|
let coversPage = $state(1);
|
||||||
|
const COVERS_LIMIT = 20;
|
||||||
|
|
||||||
|
// Modals
|
||||||
|
let sessionModalOpen = $state(false);
|
||||||
|
let restartModalOpen = $state(false);
|
||||||
|
let phpsessid = $state('');
|
||||||
|
let sessionResult: string | null = $state(null);
|
||||||
|
|
||||||
|
async function refresh() {
|
||||||
|
try {
|
||||||
|
status = await getCrawlerStatus();
|
||||||
|
error = null;
|
||||||
|
} catch (e) {
|
||||||
|
error = e instanceof Error ? e.message : 'refresh failed';
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async function loadDeadJobs() {
|
||||||
|
try {
|
||||||
|
const resp = await listDeadJobs({
|
||||||
|
search: deadSearch.trim() || undefined,
|
||||||
|
limit: DEAD_LIMIT,
|
||||||
|
offset: (deadPage - 1) * DEAD_LIMIT
|
||||||
|
});
|
||||||
|
deadJobs = resp.items;
|
||||||
|
deadTotal = resp.page.total ?? resp.items.length;
|
||||||
|
} catch (e) {
|
||||||
|
error = e instanceof Error ? e.message : 'failed to load dead jobs';
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async function loadActiveJobs() {
|
||||||
|
try {
|
||||||
|
const resp = await listActiveJobs({
|
||||||
|
search: activeSearch.trim() || undefined,
|
||||||
|
limit: ACTIVE_LIMIT,
|
||||||
|
offset: (activePage - 1) * ACTIVE_LIMIT
|
||||||
|
});
|
||||||
|
activeJobs = resp.items;
|
||||||
|
activeTotal = resp.page.total ?? resp.items.length;
|
||||||
|
} catch (e) {
|
||||||
|
error = e instanceof Error ? e.message : 'failed to load queued chapters';
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async function loadCovers() {
|
||||||
|
try {
|
||||||
|
const resp = await listMissingCovers({
|
||||||
|
search: coversSearch.trim() || undefined,
|
||||||
|
limit: COVERS_LIMIT,
|
||||||
|
offset: (coversPage - 1) * COVERS_LIMIT
|
||||||
|
});
|
||||||
|
covers = resp.items;
|
||||||
|
coversTotal = resp.page.total ?? resp.items.length;
|
||||||
|
} catch (e) {
|
||||||
|
error = e instanceof Error ? e.message : 'failed to load queued covers';
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Auto-refresh the (fetched, not streamed) backlog lists when the live
|
||||||
|
// status shows the relevant counts moved — keeps the lists feeling live
|
||||||
|
// without pushing big payloads over SSE. `$effect` re-runs when these
|
||||||
|
// tracked values change.
|
||||||
|
let lastQueueKey = $state('');
|
||||||
|
let lastCoversKey = $state(-1);
|
||||||
|
$effect(() => {
|
||||||
|
const k = `${status?.queue.pending ?? 0}:${status?.queue.running ?? 0}`;
|
||||||
|
if (k !== lastQueueKey) {
|
||||||
|
lastQueueKey = k;
|
||||||
|
loadActiveJobs();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
$effect(() => {
|
||||||
|
const c = status?.covers_queued ?? -1;
|
||||||
|
if (c !== lastCoversKey) {
|
||||||
|
lastCoversKey = c;
|
||||||
|
loadCovers();
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// Live updates via Server-Sent Events instead of polling. The
|
||||||
|
// EventSource is opened on mount and closed on destroy, so the
|
||||||
|
// subscription exists only while this page is showing live data.
|
||||||
|
function openStream() {
|
||||||
|
const es = new EventSource(crawlerStatusStreamUrl(), { withCredentials: true });
|
||||||
|
es.addEventListener('status', (e) => {
|
||||||
|
try {
|
||||||
|
status = JSON.parse((e as MessageEvent).data) as CrawlerStatus;
|
||||||
|
error = null;
|
||||||
|
live = true;
|
||||||
|
} catch {
|
||||||
|
// ignore a malformed frame; the next one will replace it
|
||||||
|
}
|
||||||
|
});
|
||||||
|
es.onopen = () => {
|
||||||
|
live = true;
|
||||||
|
};
|
||||||
|
es.onerror = () => {
|
||||||
|
// The browser auto-reconnects; reflect the gap in the UI.
|
||||||
|
live = false;
|
||||||
|
};
|
||||||
|
source = es;
|
||||||
|
}
|
||||||
|
|
||||||
|
onMount(() => {
|
||||||
|
// One-shot fetch for instant initial paint + resilience if SSE is
|
||||||
|
// blocked; the stream then drives subsequent updates.
|
||||||
|
refresh();
|
||||||
|
loadDeadJobs();
|
||||||
|
openStream();
|
||||||
|
});
|
||||||
|
onDestroy(() => {
|
||||||
|
source?.close();
|
||||||
|
source = null;
|
||||||
|
});
|
||||||
|
|
+	async function withBusy(label: string, fn: () => Promise<void>) {
+		busy = true;
+		notice = null;
+		error = null;
+		try {
+			await fn();
+		} catch (e) {
+			error = e instanceof Error ? e.message : `${label} failed`;
+		} finally {
+			busy = false;
+			await refresh();
+		}
+	}
+
+	async function onRunPass() {
+		await withBusy('run pass', async () => {
+			await runCrawlerPass();
+			notice = 'Metadata pass started.';
+		});
+	}
+
+	async function onConfirmRestart() {
+		restartModalOpen = false;
+		await withBusy('restart browser', async () => {
+			const r = await restartCrawlerBrowser();
+			notice = r.ok ? 'Browser restarted.' : `Restart failed: ${r.error ?? 'unknown'}`;
+		});
+	}
+
+	async function onSaveSession() {
+		sessionResult = null;
+		busy = true;
+		try {
+			const r = await updateCrawlerSession(phpsessid);
+			sessionResult = r.valid
+				? '✓ Session valid — workers resumed.'
+				: `✕ Probe failed: ${r.error ?? 'unauthenticated'}`;
+			if (r.valid) {
+				sessionModalOpen = false;
+				phpsessid = '';
+				notice = 'Session updated.';
+			}
+		} catch (e) {
+			sessionResult = e instanceof Error ? e.message : 'update failed';
+		} finally {
+			busy = false;
+			await refresh();
+		}
+	}
+
+	async function onClearExpired() {
+		await withBusy('clear expired', async () => {
+			await clearCrawlerSessionExpired();
+			notice = 'Session-expired flag cleared.';
+		});
+	}
+
+	async function requeue(scope: RequeueScope) {
+		await withBusy('requeue', async () => {
+			const r = await requeueDeadJobs(scope);
+			notice = `Requeued ${r.requeued} job(s).`;
+			await loadDeadJobs();
+		});
+	}
+
+	function onSearchDead() {
+		deadPage = 1;
+		loadDeadJobs();
+	}
+
+	function onDeadPageChange(p: number) {
+		deadPage = p;
+		loadDeadJobs();
+	}
+
+	function onSearchActive() {
+		activePage = 1;
+		loadActiveJobs();
+	}
+	function onActivePageChange(p: number) {
+		activePage = p;
+		loadActiveJobs();
+	}
+	function onSearchCovers() {
+		coversPage = 1;
+		loadCovers();
+	}
+	function onCoversPageChange(p: number) {
+		coversPage = p;
+		loadCovers();
+	}
+
+	// ---- display helpers ----
+	function phaseLabel(p: CrawlerPhase | null): string {
+		if (!p) return 'Daemon disabled';
+		switch (p.state) {
+			case 'idle':
+				return p.next_fire
+					? `Idle — next pass ${new Date(p.next_fire).toLocaleString()}`
+					: 'Idle';
+			case 'walking_list':
+				return 'Walking source list';
+			case 'fetching_metadata':
+				return `Fetching metadata · ${p.index}/${p.total ?? '?'} · ${p.title}`;
+			case 'cover_backfill':
+				return `Backfilling covers · ${p.index + 1}/${p.total}`;
+		}
+	}
+
+	function phasePercent(p: CrawlerPhase | null): number | null {
+		if (p && p.state === 'fetching_metadata' && p.total && p.total > 0) {
+			return Math.min(100, (p.index / p.total) * 100);
+		}
+		return null;
+	}
+
+	function sessionPill(s: CrawlerStatus): { cls: string; text: string } {
+		if (s.daemon === 'disabled') return { cls: 'badge-not_downloaded', text: 'n/a' };
+		if (s.session.expired) return { cls: 'badge-in_progress', text: 'Expired' };
+		if (!s.session.configured) return { cls: 'badge-not_downloaded', text: 'Not set' };
+		return { cls: 'badge-synced', text: 'OK' };
+	}
+
+	function browserPill(s: CrawlerStatus): { cls: string; text: string } {
+		switch (s.browser) {
+			case 'healthy':
+				return { cls: 'badge-synced', text: 'Up' };
+			case 'draining':
+			case 'restarting':
+				return { cls: 'badge-in_progress', text: s.browser };
+			default:
+				return { cls: 'badge-not_downloaded', text: 'Down' };
+		}
+	}
+
+	const deadTotalPages = $derived(Math.max(1, Math.ceil(deadTotal / DEAD_LIMIT)));
+	const activeTotalPages = $derived(Math.max(1, Math.ceil(activeTotal / ACTIVE_LIMIT)));
+	const coversTotalPages = $derived(Math.max(1, Math.ceil(coversTotal / COVERS_LIMIT)));
+
+	function chapterPercent(c: { pages_done: number; pages_total: number | null }): number | null {
+		return c.pages_total && c.pages_total > 0
+			? Math.min(100, (c.pages_done / c.pages_total) * 100)
+			: null;
+	}
+</script>
+
+<div class="titlebar">
+	<h1>Crawler</h1>
+	<span class="livedot" class:on={live} title={live ? 'Live (SSE)' : 'Reconnecting…'}>
+		{live ? '● live' : '○ reconnecting…'}
+	</span>
+</div>
+
+{#if error}
+	<p class="error" role="alert">{error}</p>
+{/if}
+{#if notice}
+	<p class="notice" role="status">{notice}</p>
+{/if}
+
+{#if status}
+	<!-- Status hero -->
+	<section class="hero" data-testid="crawler-hero">
+		<div class="pills">
+			<span class="pill"
+				>Daemon
+				<span class={`badge ${status.daemon === 'running' ? 'badge-synced' : 'badge-not_downloaded'}`}
+					>{status.daemon}</span
+				></span
+			>
+			<span class="pill"
+				>Session <span class={`badge ${sessionPill(status).cls}`}>{sessionPill(status).text}</span></span
+			>
+			<span class="pill"
+				>Browser <span class={`badge ${browserPill(status).cls}`}>{browserPill(status).text}</span></span
+			>
+		</div>
+
+		<p class="phase" data-testid="crawler-phase">{phaseLabel(status.phase)}</p>
+		{#if phasePercent(status.phase) !== null}
+			{@render Bar({ percent: phasePercent(status.phase) ?? 0 })}
+		{/if}
+
+		{#if status.session.expired}
+			<p class="warn">
+				⚠ Chapter downloads paused — session expired. Metadata + list crawl continue.
+			</p>
+		{/if}
+
+		{#if status.current_cover}
+			<p class="cover" data-testid="current-cover">
+				🖼 Fetching cover: <strong>{status.current_cover.manga_title}</strong>
+			</p>
+		{/if}
+
+		<p class="lastpass">
+			Last pass:
+			{#if status.last_pass.at}
+				{new Date(status.last_pass.at).toLocaleString()} ·
+				{status.last_pass.discovered} seen · {status.last_pass.upserted} upserted ·
+				{status.last_pass.mangas_failed} failed
+			{:else}
+				— none yet this session
+			{/if}
+		</p>
+	</section>
+
+	<!-- Controls -->
+	<section class="controls">
+		<button onclick={onRunPass} disabled={busy || status.daemon !== 'running'}
+			>Run metadata pass now</button
+		>
+		<button onclick={() => (restartModalOpen = true)} disabled={busy || status.daemon !== 'running'}
+			>Restart browser</button
+		>
+		<button onclick={() => { sessionModalOpen = true; sessionResult = null; }} disabled={busy || status.daemon !== 'running'}
+			>Manage session…</button
+		>
+		{#if status.session.expired}
+			<button onclick={onClearExpired} disabled={busy}>Clear expired flag</button>
+		{/if}
+	</section>
+
+	<!-- Queue + covers stats -->
+	<section class="grid2">
+		<article>
+			<h2>Queue</h2>
+			<dl>
+				<dt>Pending</dt>
+				<dd>{status.queue.pending}</dd>
+				<dt>Running</dt>
+				<dd>{status.queue.running}</dd>
+				<dt>Dead</dt>
+				<dd>{status.queue.dead}</dd>
+				<dt>Covers queued</dt>
+				<dd>{status.covers_queued}</dd>
+			</dl>
+		</article>
+		<article>
+			<h2>Active chapters ({status.active_chapters.length}/{status.worker_count})</h2>
+			{#if status.active_chapters.length === 0}
+				<p class="muted">idle — no chapters downloading</p>
+			{:else}
+				<table class="active">
+					<tbody>
+						{#each status.active_chapters as c (c.chapter_id)}
+							<tr>
+								<td>{c.manga_title} · ch.{c.chapter_number}</td>
+								<td class="pagecount" data-testid="active-pages">
+									{c.pages_done}/{c.pages_total ?? '?'}
+								</td>
+								<td class="pagebar">
+									{#if chapterPercent(c) !== null}
+										{@render Bar({ percent: chapterPercent(c) ?? 0 })}
+									{/if}
+								</td>
+							</tr>
+						{/each}
+					</tbody>
+				</table>
+			{/if}
+		</article>
+	</section>
+{:else}
+	<p>Loading…</p>
+{/if}
+
+<!-- Queued chapters (pending/running backlog) -->
+<section class="backlog">
+	<div class="deadhead">
+		<h2>Queued chapters ({activeTotal})</h2>
+		<div class="deadtools">
+			<input
+				placeholder="Search manga…"
+				bind:value={activeSearch}
+				onkeydown={(e) => e.key === 'Enter' && onSearchActive()}
+			/>
+			<button onclick={onSearchActive}>Search</button>
+		</div>
+	</div>
+	{#if activeJobs.length === 0}
+		<p class="muted">No chapters queued.</p>
+	{:else}
+		<table class="dead">
+			<thead>
+				<tr>
+					<th>Manga / Chapter</th>
+					<th>State</th>
+					<th>Att.</th>
+				</tr>
+			</thead>
+			<tbody>
+				{#each activeJobs as j (j.id)}
+					<tr>
+						<td>
+							{j.manga_title ?? '(unknown)'}
+							{#if j.chapter_number != null}· ch.{j.chapter_number}{/if}
+						</td>
+						<td>
+							<span
+								class={`badge ${j.state === 'running' ? 'badge-downloading' : 'badge-not_downloaded'}`}
+								>{j.state}</span
+							>
+						</td>
+						<td>{j.attempts}/{j.max_attempts}</td>
+					</tr>
+				{/each}
+			</tbody>
+		</table>
+		<Pager page={activePage} totalPages={activeTotalPages} onChange={onActivePageChange} />
+	{/if}
+</section>
+
+<!-- Queued covers (mangas missing a cover) -->
+<section class="backlog">
+	<div class="deadhead">
+		<h2>Queued covers ({coversTotal})</h2>
+		<div class="deadtools">
+			<input
+				placeholder="Search manga…"
+				bind:value={coversSearch}
+				onkeydown={(e) => e.key === 'Enter' && onSearchCovers()}
+			/>
+			<button onclick={onSearchCovers}>Search</button>
+		</div>
+	</div>
+	{#if covers.length === 0}
+		<p class="muted">No covers queued 🎉</p>
+	{:else}
+		<table class="dead">
+			<thead>
+				<tr><th>Manga</th></tr>
+			</thead>
+			<tbody>
+				{#each covers as c (c.manga_id)}
+					<tr><td>{c.manga_title}</td></tr>
+				{/each}
+			</tbody>
+		</table>
+		<Pager page={coversPage} totalPages={coversTotalPages} onChange={onCoversPageChange} />
+	{/if}
+</section>
+
+<!-- Dead jobs -->
+<section class="deadjobs">
+	<div class="deadhead">
+		<h2>Dead jobs ({deadTotal})</h2>
+		<div class="deadtools">
+			<input
+				placeholder="Search manga…"
+				bind:value={deadSearch}
+				onkeydown={(e) => e.key === 'Enter' && onSearchDead()}
+			/>
+			<button onclick={onSearchDead}>Search</button>
+			<button
+				onclick={() => requeue({ scope: 'all' })}
+				disabled={busy || deadTotal === 0}>Requeue all ({deadTotal})</button
+			>
+		</div>
+	</div>
+
+	{#if deadJobs.length === 0}
+		<p class="muted">No dead jobs 🎉</p>
+	{:else}
+		<table class="dead">
+			<thead>
+				<tr>
+					<th>Manga / Chapter</th>
+					<th>Att.</th>
+					<th>Failed</th>
+					<th>Last error</th>
+					<th class="actions">Action</th>
+				</tr>
+			</thead>
+			<tbody>
+				{#each deadJobs as j (j.id)}
+					<tr>
+						<td>
+							{j.manga_title ?? '(unknown)'}
+							{#if j.chapter_number != null}· ch.{j.chapter_number}{/if}
+						</td>
+						<td>{j.attempts}/{j.max_attempts}</td>
+						<td>{new Date(j.updated_at).toLocaleDateString()}</td>
+						<td class="err" title={j.last_error ?? ''}>{j.last_error ?? '—'}</td>
+						<td class="actions">
+							<button onclick={() => requeue({ scope: 'job', job_id: j.id })} disabled={busy}
+								>Requeue</button
+							>
+							{#if j.manga_id}
+								<button
+									class="secondary"
+									onclick={() => requeue({ scope: 'manga', manga_id: j.manga_id! })}
+									disabled={busy}>Manga</button
+								>
+							{/if}
+						</td>
+					</tr>
+				{/each}
+			</tbody>
+		</table>
+		<Pager page={deadPage} totalPages={deadTotalPages} onChange={onDeadPageChange} />
+	{/if}
+</section>
+
+<!-- Restart confirm modal -->
+<Modal open={restartModalOpen} title="Restart browser" onClose={() => (restartModalOpen = false)} size="sm">
+	{#snippet children()}
+		<p>This relaunches Chromium and re-injects the session cookie.</p>
+		<ul class="coord">
+			<li>In-flight jobs are allowed to finish (bounded), then forced.</li>
+			<li>New jobs pause until the relaunch completes.</li>
+			<li>The metadata pass yields at its next checkpoint.</li>
+		</ul>
+	{/snippet}
+	{#snippet footer()}
+		<button onclick={() => (restartModalOpen = false)}>Cancel</button>
+		<button class="primary" onclick={onConfirmRestart} disabled={busy}>Restart</button>
+	{/snippet}
+</Modal>
+
+<!-- Session modal -->
+<Modal open={sessionModalOpen} title="Manage crawler session" onClose={() => (sessionModalOpen = false)} size="md">
+	{#snippet children()}
+		<label for="phpsessid">PHPSESSID</label>
+		<input id="phpsessid" type="password" bind:value={phpsessid} autocomplete="off" />
+		<p class="hint">
+			Saving rewrites the cookie everywhere, persists it, restarts the browser, and re-probes.
+		</p>
+		{#if sessionResult}
+			<p class="sessionresult">{sessionResult}</p>
+		{/if}
+	{/snippet}
+	{#snippet footer()}
+		<button onclick={() => (sessionModalOpen = false)}>Cancel</button>
+		<button class="primary" onclick={onSaveSession} disabled={busy || phpsessid.trim() === ''}
+			>Save & validate</button
+		>
+	{/snippet}
+</Modal>
+
+{#snippet Bar({ percent }: { percent: number })}
+	<div class="bar" role="progressbar" aria-valuenow={percent} aria-valuemin="0" aria-valuemax="100">
+		<div class="fill" style:width="{Math.min(100, Math.max(0, percent))}%"></div>
+		<span class="label">{percent.toFixed(0)}%</span>
+	</div>
+{/snippet}
+
+<style>
+	h1 {
+		margin: 0;
+	}
+	.titlebar {
+		display: flex;
+		align-items: baseline;
+		gap: var(--space-3);
+		margin-bottom: var(--space-4);
+	}
+	.livedot {
+		font-size: var(--font-sm);
+		color: var(--text-muted);
+	}
+	.livedot.on {
+		color: var(--success, #0a7d2c);
+	}
+	h2 {
+		margin: 0 0 var(--space-3) 0;
+		font-size: var(--font-sm);
+		color: var(--text-muted);
+		text-transform: uppercase;
+		letter-spacing: 0.04em;
+	}
+	.hero {
+		padding: var(--space-4);
+		border: 1px solid var(--border);
+		border-radius: var(--radius-md);
+		background: var(--surface);
+		margin-bottom: var(--space-4);
+	}
+	.pills {
+		display: flex;
+		gap: var(--space-4);
+		flex-wrap: wrap;
+		margin-bottom: var(--space-3);
+	}
+	.pill {
+		font-size: var(--font-sm);
+		color: var(--text-muted);
+		display: inline-flex;
+		align-items: center;
+		gap: var(--space-2);
+	}
+	.phase {
+		font-size: var(--font-lg);
+		font-weight: var(--weight-semibold);
+		margin: var(--space-2) 0;
+	}
+	.lastpass,
+	.hint {
+		color: var(--text-muted);
+		font-size: var(--font-sm);
+	}
+	.warn {
+		color: #92400e;
+		background: #fef3c7;
+		border: 1px solid #fcd34d;
+		padding: var(--space-2) var(--space-3);
+		border-radius: var(--radius-md);
+		font-size: var(--font-sm);
+	}
+	.controls {
+		display: flex;
+		gap: var(--space-2);
+		flex-wrap: wrap;
+		margin-bottom: var(--space-4);
+	}
+	.grid2 {
+		display: grid;
+		grid-template-columns: repeat(auto-fit, minmax(16rem, 1fr));
+		gap: var(--space-3);
+		margin-bottom: var(--space-4);
+	}
+	article {
+		padding: var(--space-3);
+		border: 1px solid var(--border);
+		border-radius: var(--radius-md);
+		background: var(--surface);
+	}
+	dl {
+		display: grid;
+		grid-template-columns: max-content 1fr;
+		gap: var(--space-1) var(--space-3);
+		margin: 0;
+		font-size: var(--font-sm);
+	}
+	dt {
+		color: var(--text-muted);
+	}
+	dd {
+		margin: 0;
+		font-family: var(--font-mono, monospace);
+	}
+	table {
+		width: 100%;
+		border-collapse: collapse;
+	}
+	th,
+	td {
+		padding: var(--space-2);
+		text-align: left;
+		border-bottom: 1px solid var(--border);
+		font-size: var(--font-sm);
+	}
+	.actions {
+		text-align: right;
+	}
+	.mono {
+		font-family: var(--font-mono, monospace);
+		font-size: var(--font-xs);
+	}
+	.err {
+		max-width: 22rem;
+		overflow: hidden;
+		text-overflow: ellipsis;
+		white-space: nowrap;
+		color: var(--text-muted);
+	}
+	.deadhead {
+		display: flex;
+		justify-content: space-between;
+		align-items: center;
+		gap: var(--space-3);
+		flex-wrap: wrap;
+	}
+	.deadtools {
+		display: flex;
+		gap: var(--space-2);
+	}
+	button.secondary {
+		background: var(--surface-elevated);
+	}
+	.notice {
+		color: var(--success, #0a7d2c);
+		padding: var(--space-2) var(--space-3);
+		border: 1px solid var(--success, #0a7d2c);
+		border-radius: var(--radius-md);
+		margin-bottom: var(--space-3);
+	}
+	.sessionresult {
+		margin-top: var(--space-2);
+		font-size: var(--font-sm);
+	}
+	.coord {
+		margin: var(--space-2) 0;
+		padding-left: var(--space-4);
+		font-size: var(--font-sm);
+		color: var(--text-muted);
+	}
+	/* badges (shared convention with admin/mangas) */
+	:global(.badge) {
+		display: inline-block;
+		padding: 0 var(--space-2);
+		border-radius: var(--radius-sm, 4px);
+		font-size: var(--font-xs);
+		font-weight: var(--weight-semibold);
+		text-transform: uppercase;
+		letter-spacing: 0.04em;
+		border: 1px solid var(--border);
+		background: var(--surface);
+	}
+	:global(.badge-synced) {
+		background: #dcfce7;
+		color: #166534;
+		border-color: #86efac;
+	}
+	:global(.badge-in_progress),
+	:global(.badge-downloading) {
+		background: #fef3c7;
+		color: #92400e;
+		border-color: #fcd34d;
+	}
+	:global(.badge-not_downloaded) {
+		background: var(--surface-elevated);
+		color: var(--text-muted);
+	}
+	.bar {
+		position: relative;
+		background: var(--surface-elevated);
+		border-radius: var(--radius-sm, 4px);
+		height: 1.5rem;
+		margin: var(--space-2) 0;
+		overflow: hidden;
+	}
+	.fill {
+		height: 100%;
+		background: #22c55e;
+		transition: width 0.3s ease;
+	}
+	.label {
+		position: absolute;
+		top: 50%;
+		left: 50%;
+		transform: translate(-50%, -50%);
+		font-size: var(--font-xs);
+		font-weight: var(--weight-semibold);
+	}
+	.error {
+		color: var(--danger, #dc2626);
+		padding: var(--space-2) var(--space-3);
+		border: 1px solid var(--danger, #dc2626);
+		border-radius: var(--radius-md);
+		margin-bottom: var(--space-3);
+	}
+	.muted {
+		color: var(--text-muted);
+	}
+	.cover {
+		font-size: var(--font-sm);
+	}
+	.backlog {
+		margin-top: var(--space-4);
+	}
+	.pagecount {
+		font-family: var(--font-mono, monospace);
+		font-size: var(--font-xs);
+		white-space: nowrap;
+	}
+	.pagebar {
+		width: 8rem;
+	}
+	table.active td {
+		vertical-align: middle;
+	}
+</style>
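A possible hardening of the `es.onerror` handler above, outside this diff. The handler's comment assumes the browser always auto-reconnects, but per the HTML Standard an `EventSource` only retries while `readyState` is `CONNECTING`. If the stream endpoint answers with a non-200 status or a non-`text/event-stream` content type, the connection is failed, `readyState` becomes `CLOSED`, and no retry follows, so the "reconnecting…" indicator would stay up indefinitely. The sketch below tells those two cases apart. It assumes a caller-supplied `reopen` callback (for example `openStream`) and a fixed 5 s delay; `EventSourceLike`, `streamState`, and `handleStreamError` are hypothetical names, and the interface exists only so the logic can run without a browser.

```typescript
// readyState values defined by the HTML Standard for EventSource
// (EventSource.CONNECTING / OPEN / CLOSED in browsers).
const CONNECTING = 0;
const OPEN = 1;
const CLOSED = 2;

// Hypothetical minimal shape of an EventSource, so this runs without a browser.
interface EventSourceLike {
	readonly readyState: number;
	close(): void;
}

type StreamState = 'live' | 'reconnecting' | 'closed';

function streamState(es: EventSourceLike): StreamState {
	switch (es.readyState) {
		case OPEN:
			return 'live';
		case CONNECTING:
			return 'reconnecting';
		case CLOSED:
		default:
			return 'closed';
	}
}

// Meant to be called from `onerror`. While readyState is CONNECTING the
// browser is already retrying, so nothing extra happens. Once it is CLOSED
// the browser has given up: close our handle and schedule a manual reopen
// through the (hypothetical) `reopen` callback.
function handleStreamError(
	es: EventSourceLike,
	reopen: () => void,
	schedule: (fn: () => void, ms: number) => unknown = (fn, ms) => setTimeout(fn, ms),
	delayMs = 5000
): StreamState {
	const state = streamState(es);
	if (state === 'closed') {
		es.close();
		schedule(reopen, delayMs);
	}
	return state;
}
```

Wiring it in would mean calling `handleStreamError(es, openStream)` from `es.onerror` (still setting `live = false`) and clearing any pending timer in `onDestroy`, so a page that has already been left cannot reopen the stream.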
@@ -3,6 +3,7 @@
 	import {
 		listAdminMangas,
 		listAdminChapters,
+		requeueDeadJobs,
 		type AdminMangasPage,
 		type AdminChapterRow,
 		type MangaSyncState
@@ -59,6 +60,27 @@
 	function badgeClass(state: string): string {
 		return `badge badge-${state}`;
 	}
+
+	let requeuingChapter: string | null = $state(null);
+
+	/** Requeue the dead job(s) for a single failed chapter, then refresh
+	 * that manga's chapter list so the pill updates. */
+	async function requeueChapter(mangaId: string, chapterId: string) {
+		requeuingChapter = chapterId;
+		error = null;
+		try {
+			await requeueDeadJobs({ scope: 'chapter', chapter_id: chapterId });
+			const resp = await listAdminChapters(mangaId, { limit: 500 });
+			chaptersByManga[mangaId] = {
+				items: resp.items,
+				total: resp.page.total ?? resp.items.length
+			};
+		} catch (e) {
+			error = e instanceof ApiError ? e.message : 'requeue failed';
+		} finally {
+			requeuingChapter = null;
+		}
+	}
 </script>
 
 <h1>Mangas</h1>
@@ -71,16 +93,19 @@
 	>
 		<input
 			type="search"
-			placeholder="search by title"
+			placeholder="Search by title"
 			bind:value={search}
 			data-testid="admin-mangas-search"
 		/>
+		<label class="sync-label">
+			<span>Sync state</span>
 			<select bind:value={syncFilter} aria-label="sync state">
-				<option value="">all states</option>
-				<option value="in_progress">in progress</option>
-				<option value="dropped">dropped</option>
-				<option value="synced">synced</option>
+				<option value="">All</option>
+				<option value="in_progress">In progress</option>
+				<option value="dropped">Dropped</option>
+				<option value="synced">Synced</option>
 			</select>
+		</label>
 		<button type="submit">Search</button>
 	</form>
 
@@ -150,6 +175,16 @@
 				<span class={badgeClass(c.sync_state)}>
 					{c.sync_state}
 				</span>
+				{#if c.sync_state === 'failed'}
+					<button
+						class="requeue"
+						onclick={() => requeueChapter(m.id, c.id)}
+						disabled={requeuingChapter === c.id}
+						title="Requeue this chapter"
+					>
+						↻ requeue
+					</button>
+				{/if}
 			</td>
 		</tr>
 	{/each}
@@ -173,17 +208,28 @@
 	}
 	form {
 		display: flex;
+		flex-wrap: wrap;
+		align-items: center;
 		gap: var(--space-2);
 		margin-bottom: var(--space-3);
 	}
 	input[type='search'] {
 		flex: 1;
+		min-width: 0;
+		max-width: 24rem;
 		padding: var(--space-2) var(--space-3);
 		border: 1px solid var(--border);
 		border-radius: var(--radius-md);
 		background: var(--surface);
 		color: var(--text);
 	}
+	.sync-label {
+		display: inline-flex;
+		align-items: center;
+		gap: var(--space-2);
+		color: var(--text-muted);
+		font-size: var(--font-sm);
+	}
 	select {
 		padding: var(--space-2) var(--space-3);
 		border-radius: var(--radius-md);
@@ -258,6 +304,11 @@
 		color: #991b1b;
 		border-color: #fca5a5;
 	}
+	.requeue {
+		margin-left: var(--space-2);
+		font-size: var(--font-xs);
+		padding: 0 var(--space-2);
+	}
 	.badge-not_downloaded {
 		background: var(--surface-elevated);
 		color: var(--text-muted);
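The call sites in this change pass four shapes to `requeueDeadJobs`: `{ scope: 'all' }`, `{ scope: 'job', job_id }`, `{ scope: 'manga', manga_id }` and, new here, `{ scope: 'chapter', chapter_id }`. `RequeueScope` itself is not part of these hunks. Assuming it is a discriminated union on `scope` with string IDs, the sketch below shows how an exhaustive `switch` lets the compiler flag any consumer that misses a newly added scope. `scopeLabel` and `assertNever` are hypothetical helpers, not part of the codebase.

```typescript
// Assumed shape of RequeueScope, inferred from the call sites in this diff;
// the real definition lives in the API client and may differ.
type RequeueScope =
	| { scope: 'all' }
	| { scope: 'job'; job_id: string }
	| { scope: 'manga'; manga_id: string }
	| { scope: 'chapter'; chapter_id: string };

// Exhaustiveness guard: this only type-checks when every variant of the
// union has been handled, because `x` must have been narrowed to `never`.
function assertNever(x: never): never {
	throw new Error(`unhandled requeue scope: ${JSON.stringify(x)}`);
}

// Hypothetical helper: describes what a requeue call targets.
function scopeLabel(s: RequeueScope): string {
	switch (s.scope) {
		case 'all':
			return 'all dead jobs';
		case 'job':
			return `job ${s.job_id}`;
		case 'manga':
			return `every dead job for manga ${s.manga_id}`;
		case 'chapter':
			return `dead job(s) for chapter ${s.chapter_id}`;
		default:
			return assertNever(s);
	}
}
```

Adding a fifth variant to the union without a matching `case` turns the `assertNever(s)` call into a type error, which is the point of routing the `default` branch through it.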
@@ -1,15 +1,33 @@
 <script lang="ts">
 	import MangaCard from '$lib/components/MangaCard.svelte';
+	import Pager from '$lib/components/Pager.svelte';
 	import ArrowLeft from '@lucide/svelte/icons/arrow-left';
+	import { goto } from '$app/navigation';
+	import { page } from '$app/stores';
 
 	let { data } = $props();
 	const author = $derived(data.author);
 	const mangas = $derived(data.mangas);
 	const total = $derived(data.total);
+	const currentPage = $derived(data.currentPage);
+	const pageSize = $derived(data.pageSize);
+	const totalPages = $derived(
+		total != null && total > 0 ? Math.ceil(total / pageSize) : 1
+	);
+	const rangeStart = $derived(mangas.length === 0 ? 0 : (currentPage - 1) * pageSize + 1);
+	const rangeEnd = $derived((currentPage - 1) * pageSize + mangas.length);
+
+	function goToPage(p: number) {
+		if (p === currentPage) return;
+		const url = new URL($page.url);
+		if (p === 1) url.searchParams.delete('page');
+		else url.searchParams.set('page', String(p));
+		goto(url.pathname + url.search, { noScroll: false });
+	}
 </script>
 
 <svelte:head>
-	<title>{author.name} — Mangalord</title>
+	<title>Mangalord | {author.name}</title>
 </svelte:head>
 
 <nav class="back">
@@ -34,7 +52,7 @@
 {:else}
 	{#if total != null}
 		<p class="meta" data-testid="author-shown-of-total">
-			Showing {mangas.length} of {total}
+			Showing {rangeStart}–{rangeEnd} of {total}
 		</p>
 	{/if}
 	<ul class="manga-grid" data-testid="author-manga-list">
@@ -42,6 +60,12 @@
 			<MangaCard manga={m} testid={`author-manga-${m.id}`} />
 		{/each}
 	</ul>
+	<Pager
+		page={currentPage}
+		{totalPages}
+		onChange={goToPage}
+		testid="author-pager"
+	/>
 {/if}
 
 <style>
@@ -5,13 +5,27 @@ import type { PageLoad } from './$types';
 
 export const ssr = false;
 
-export const load: PageLoad = async ({ params }) => {
+const PAGE_SIZE = 50;
+
+export const load: PageLoad = async ({ params, url }) => {
+	const pageParam = Number(url.searchParams.get('page') ?? '1');
+	const currentPage =
+		Number.isFinite(pageParam) && pageParam >= 1 ? Math.floor(pageParam) : 1;
 	try {
 		const [author, mangas] = await Promise.all([
 			getAuthor(params.id),
-			listAuthorMangas(params.id, { limit: 50 })
+			listAuthorMangas(params.id, {
+				limit: PAGE_SIZE,
+				offset: (currentPage - 1) * PAGE_SIZE
+			})
 		]);
-		return { author, mangas: mangas.items, total: mangas.page.total };
+		return {
+			author,
+			mangas: mangas.items,
+			total: mangas.page.total,
+			currentPage,
+			pageSize: PAGE_SIZE
+		};
 	} catch (e) {
 		// 404 surfaces as a real SvelteKit error so the framework shell
 		// renders the standard not-found page instead of the route's
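The author pagination is split across two files: `+page.ts` turns `?page=` into an `offset`, and `+page.svelte` derives `rangeStart`, `rangeEnd` and `totalPages` for the "Showing X–Y of N" line and the `Pager`. Pulling the same expressions into pure functions makes the edge cases easy to check. `parsePage` and `pageWindow` are hypothetical names; the arithmetic is copied from the hunks in this change rather than from any shared module in the codebase.

```typescript
// Same parsing rule as the author +page.ts load function: missing,
// non-numeric, non-finite or < 1 values fall back to page 1; fractions are floored.
function parsePage(raw: string | null): number {
	const n = Number(raw ?? '1');
	return Number.isFinite(n) && n >= 1 ? Math.floor(n) : 1;
}

interface PageWindow {
	offset: number; // passed to listAuthorMangas as `offset`
	rangeStart: number; // 1-based index of the first item shown, 0 when the page is empty
	rangeEnd: number; // 1-based index of the last item shown
	totalPages: number; // never below 1, so the pager always has a page
}

// Combines the offset math from +page.ts with the rangeStart / rangeEnd /
// totalPages derivations from +page.svelte.
function pageWindow(
	page: number,
	pageSize: number,
	itemsOnPage: number,
	total: number | null
): PageWindow {
	const offset = (page - 1) * pageSize;
	return {
		offset,
		rangeStart: itemsOnPage === 0 ? 0 : offset + 1,
		rangeEnd: offset + itemsOnPage,
		totalPages: total != null && total > 0 ? Math.ceil(total / pageSize) : 1
	};
}
```

For example, `?page=3` with 50 per page, 20 items returned and 120 in total gives an offset of 100 and "Showing 101–120 of 120" across 3 pages.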
@@ -7,10 +7,6 @@
 	const error = $derived(data.error);
 </script>
 
-<svelte:head>
-	<title>Bookmarks — Mangalord</title>
-</svelte:head>
-
 <h1>Bookmarks</h1>
 
 {#if error}
@@ -5,10 +5,6 @@
 	const collections = $derived(data.collections);
 </script>
 
-<svelte:head>
-	<title>Collections — Mangalord</title>
-</svelte:head>
-
 <h1>Collections</h1>
 
 {#if !data.authenticated}
@@ -75,7 +75,7 @@
|
|||||||
</script>
|
</script>
|
||||||
|
|
||||||
<svelte:head>
|
<svelte:head>
|
||||||
<title>{collection.name} — Mangalord</title>
|
<title>Mangalord | {collection.name}</title>
|
||||||
</svelte:head>
|
</svelte:head>
|
||||||
|
|
||||||
<nav class="back">
|
<nav class="back">
|
||||||
|
|||||||
113
frontend/src/routes/layout.test.ts
Normal file
@@ -0,0 +1,113 @@
|
|||||||
|
import { describe, it, expect, vi, beforeEach } from 'vitest';
|
||||||
|
|
||||||
|
// Mock the API client *before* importing the load function so the
|
||||||
|
// module under test picks up the mock when it resolves its imports.
|
||||||
|
vi.mock('$lib/api/auth', () => ({
|
||||||
|
getAuthConfig: vi.fn(),
|
||||||
|
me: vi.fn()
|
||||||
|
}));
|
||||||
|
|
||||||
|
import { load } from './+layout';
|
||||||
|
import { getAuthConfig, me, type AuthConfig } from '$lib/api/auth';
|
||||||
|
|
||||||
|
type MinimalLoadEvent = { url: { pathname: string; search: string } };
|
||||||
|
|
||||||
|
function event(pathname: string, search = ''): MinimalLoadEvent {
|
||||||
|
return { url: { pathname, search } };
|
||||||
|
}
|
||||||
|
|
||||||
|
// `LayoutLoad`'s declared return type is `void | …`. Our `load`
|
||||||
|
// always returns `{ authConfig }`, but TypeScript can't narrow on
|
||||||
|
// that at the call site. Wrap to remove the `void` arm so the
|
||||||
|
// assertions stay terse.
|
||||||
|
async function callLoad(ev: MinimalLoadEvent): Promise<{ authConfig: AuthConfig }> {
|
||||||
|
// eslint-disable-next-line @typescript-eslint/no-explicit-any
|
||||||
|
const result = await load(ev as any);
|
||||||
|
return result as { authConfig: AuthConfig };
|
||||||
|
}
|
||||||
|
|
||||||
|
const PUBLIC_CFG = { self_register_enabled: true, private_mode: false };
|
||||||
|
const PRIVATE_CFG = { self_register_enabled: false, private_mode: true };
|
||||||
|
|
||||||
|
const aliceUser = {
|
||||||
|
id: 'u1',
|
||||||
|
username: 'alice',
|
||||||
|
created_at: '2026-01-01T00:00:00Z',
|
||||||
|
is_admin: false
|
||||||
|
};
|
||||||
|
|
||||||
|
describe('root +layout load', () => {
|
||||||
|
beforeEach(() => {
|
||||||
|
vi.mocked(getAuthConfig).mockReset();
|
||||||
|
vi.mocked(me).mockReset();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('public mode: returns authConfig data, never calls me()', async () => {
|
||||||
|
vi.mocked(getAuthConfig).mockResolvedValue(PUBLIC_CFG);
|
||||||
|
const data = await callLoad(event('/'));
|
||||||
|
expect(data.authConfig).toEqual(PUBLIC_CFG);
|
||||||
|
expect(me).not.toHaveBeenCalled();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('private mode + anonymous on `/`: throws redirect(302) to /login with next=', async () => {
|
||||||
|
vi.mocked(getAuthConfig).mockResolvedValue(PRIVATE_CFG);
|
||||||
|
vi.mocked(me).mockResolvedValue(null);
|
||||||
|
// eslint-disable-next-line @typescript-eslint/no-explicit-any
|
||||||
|
await expect(load(event('/') as any)).rejects.toMatchObject({
|
||||||
|
status: 302,
|
||||||
|
location: '/login?next=%2F'
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
it('private mode + anonymous on `/login`: passes through without redirect', async () => {
|
||||||
|
vi.mocked(getAuthConfig).mockResolvedValue(PRIVATE_CFG);
|
||||||
|
const data = await callLoad(event('/login'));
|
||||||
|
expect(data.authConfig.private_mode).toBe(true);
|
||||||
|
// me() must not run on the login page itself, otherwise anonymous
|
||||||
|
// visits make an extra round-trip every page load.
|
||||||
|
expect(me).not.toHaveBeenCalled();
|
||||||
|
});
|
||||||
|
|
||||||
|
it('private mode + logged-in user: no redirect, returns authConfig', async () => {
|
||||||
|
vi.mocked(getAuthConfig).mockResolvedValue(PRIVATE_CFG);
|
||||||
|
vi.mocked(me).mockResolvedValue(aliceUser);
|
||||||
|
const data = await callLoad(event('/'));
|
||||||
|
expect(data.authConfig).toEqual(PRIVATE_CFG);
|
||||||
|
});
|
||||||
|
|
||||||
|
it('private mode + anonymous: preserves pathname AND search in next=', async () => {
|
||||||
|
vi.mocked(getAuthConfig).mockResolvedValue(PRIVATE_CFG);
|
||||||
|
vi.mocked(me).mockResolvedValue(null);
|
||||||
|
await expect(
|
||||||
|
// eslint-disable-next-line @typescript-eslint/no-explicit-any
|
||||||
|
load(event('/manga/abc', '?page=3') as any)
|
||||||
|
).rejects.toMatchObject({
|
||||||
|
status: 302,
|
||||||
|
location: '/login?next=%2Fmanga%2Fabc%3Fpage%3D3'
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
it('private mode + anonymous on /register: redirects to /login (register is never reachable in private mode)', async () => {
|
||||||
|
vi.mocked(getAuthConfig).mockResolvedValue(PRIVATE_CFG);
|
||||||
|
vi.mocked(me).mockResolvedValue(null);
|
||||||
|
// eslint-disable-next-line @typescript-eslint/no-explicit-any
|
||||||
|
await expect(load(event('/register') as any)).rejects.toMatchObject({
|
||||||
|
status: 302,
|
||||||
|
location: '/login?next=%2Fregister'
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
it('getAuthConfig failure: falls back to public-mode defaults, no redirect', async () => {
|
||||||
|
// The backend middleware is the source of truth for the gate;
|
||||||
|
// if the config probe blips, fail soft so a brief outage doesn't
|
||||||
|
// lock everyone out of even the login page. No private data
|
||||||
|
// can leak because the backend still 401s every request.
|
||||||
|
vi.mocked(getAuthConfig).mockRejectedValue(new Error('network'));
|
||||||
|
const data = await callLoad(event('/'));
|
||||||
|
expect(data.authConfig).toEqual({
|
||||||
|
self_register_enabled: true,
|
||||||
|
private_mode: false
|
||||||
|
});
|
||||||
|
expect(me).not.toHaveBeenCalled();
|
||||||
|
});
|
||||||
|
});
|
||||||
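The tests above pin down the root layout's private-mode gate. Below is a minimal sketch of the same decision as a pure function, assuming the behaviour the assertions describe. `gate`, `configOrDefault`, and the `isLoggedIn` thunk are hypothetical. The real `+layout.ts` throws SvelteKit's `redirect(302, location)` instead of returning a value:

```typescript
type AuthConfig = { self_register_enabled: boolean; private_mode: boolean };
type GateResult = { kind: 'pass' } | { kind: 'redirect'; status: 302; location: string };

// Fail-soft defaults used when the config probe throws. The backend
// middleware still 401s every request, so nothing private leaks.
const PUBLIC_DEFAULTS: AuthConfig = { self_register_enabled: true, private_mode: false };

// Hypothetical pure version of the gate. `isLoggedIn` is a thunk so the
// session probe only runs when its answer actually matters.
function gate(
  cfg: AuthConfig,
  pathname: string,
  search: string,
  isLoggedIn: () => boolean
): GateResult {
  // Public mode, or the login page itself: never probe the session.
  if (!cfg.private_mode || pathname === '/login') return { kind: 'pass' };
  if (isLoggedIn()) return { kind: 'pass' };
  // Preserve path and query so the user lands back where they started.
  return {
    kind: 'redirect',
    status: 302,
    location: `/login?next=${encodeURIComponent(pathname + search)}`
  };
}

// Hypothetical wrapper for the fail-soft config lookup.
function configOrDefault(probe: () => AuthConfig): AuthConfig {
  try {
    return probe();
  } catch {
    return PUBLIC_DEFAULTS;
  }
}
```

Passing the session check as a thunk makes the "never calls `me()`" assertions structural: public mode and `/login` return before the thunk can run.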
@@ -1,13 +1,16 @@
|
|||||||
<script lang="ts">
|
<script lang="ts">
|
||||||
import { fileUrl } from '$lib/api/client';
|
import { fileUrl, ApiError } from '$lib/api/client';
|
||||||
import { createBookmark, deleteBookmark, type Bookmark } from '$lib/api/bookmarks';
|
import { createBookmark, deleteBookmark, type Bookmark } from '$lib/api/bookmarks';
|
||||||
import {
|
import {
|
||||||
attachTag,
|
attachTag,
|
||||||
detachTag,
|
detachTag,
|
||||||
type AuthorRef,
|
type AuthorRef,
|
||||||
type GenreRef,
|
type GenreRef,
|
||||||
|
type MangaDetail,
|
||||||
type TagRef
|
type TagRef
|
||||||
} from '$lib/api/mangas';
|
} from '$lib/api/mangas';
|
||||||
|
import { resyncManga } from '$lib/api/admin';
|
||||||
|
import { chapterLabel } from '$lib/api/chapters';
|
||||||
import { listTags, type Tag } from '$lib/api/tags';
|
import { listTags, type Tag } from '$lib/api/tags';
|
||||||
import { session } from '$lib/session.svelte';
|
import { session } from '$lib/session.svelte';
|
||||||
import Chip from '$lib/components/Chip.svelte';
|
import Chip from '$lib/components/Chip.svelte';
|
||||||
@@ -16,9 +19,15 @@
|
|||||||
import FolderPlus from '@lucide/svelte/icons/folder-plus';
|
import FolderPlus from '@lucide/svelte/icons/folder-plus';
|
||||||
import Pencil from '@lucide/svelte/icons/pencil';
|
import Pencil from '@lucide/svelte/icons/pencil';
|
||||||
import UploadCloud from '@lucide/svelte/icons/upload-cloud';
|
import UploadCloud from '@lucide/svelte/icons/upload-cloud';
|
||||||
|
import RefreshCw from '@lucide/svelte/icons/refresh-cw';
|
||||||
|
|
||||||
let { data } = $props();
|
let { data } = $props();
|
||||||
const manga = $derived(data.manga);
|
// `manga` is locally overridable so a successful force resync can
|
||||||
|
// swap in the refreshed detail (new cover URL, refreshed status,
|
||||||
|
// etc.) without a router reload. Falls back to the server-loaded
|
||||||
|
// data otherwise.
|
||||||
|
let mangaOverride = $state<MangaDetail | null>(null);
|
||||||
|
const manga = $derived<MangaDetail>(mangaOverride ?? data.manga);
|
||||||
const chapters = $derived(data.chapters);
|
const chapters = $derived(data.chapters);
|
||||||
const readProgress = $derived(data.readProgress);
|
const readProgress = $derived(data.readProgress);
|
||||||
/** Chapter row from the local chapters list when present (so we
|
/** Chapter row from the local chapters list when present (so we
|
||||||
@@ -37,6 +46,11 @@
|
|||||||
continueChapter?.number ?? readProgress?.chapter_number ?? null
|
continueChapter?.number ?? readProgress?.chapter_number ?? null
|
||||||
);
|
);
|
||||||
const continueChapterTitle = $derived(continueChapter?.title ?? null);
|
const continueChapterTitle = $derived(continueChapter?.title ?? null);
|
||||||
|
const continueLabel = $derived(
|
||||||
|
continueChapterNumber != null
|
||||||
|
? chapterLabel({ number: continueChapterNumber, title: continueChapterTitle })
|
||||||
|
: null
|
||||||
|
);
|
||||||
|
|
||||||
const authors = $derived<AuthorRef[]>(manga.authors);
|
const authors = $derived<AuthorRef[]>(manga.authors);
|
||||||
const genres = $derived<GenreRef[]>(manga.genres);
|
const genres = $derived<GenreRef[]>(manga.genres);
|
||||||
@@ -171,10 +185,35 @@
|
|||||||
const statusLabel = $derived(manga.status === 'completed' ? 'Completed' : 'Ongoing');
|
const statusLabel = $derived(manga.status === 'completed' ? 'Completed' : 'Ongoing');
|
||||||
|
|
||||||
let collectionModalOpen = $state(false);
|
let collectionModalOpen = $state(false);
|
||||||
|
|
||||||
|
// ---- Admin force resync ----
|
||||||
|
let resyncBusy = $state(false);
|
||||||
|
let resyncMessage = $state<{ kind: 'ok' | 'err'; text: string } | null>(null);
|
||||||
|
async function forceResync() {
|
||||||
|
if (!session.user?.is_admin || resyncBusy) return;
|
||||||
|
resyncBusy = true;
|
||||||
|
resyncMessage = null;
|
||||||
|
try {
|
||||||
|
const r = await resyncManga(manga.id);
|
||||||
|
mangaOverride = r.manga;
|
||||||
|
const coverNote = r.cover_fetched
|
||||||
|
? ' Cover re-downloaded.'
|
||||||
|
: ' Cover unchanged.';
|
||||||
|
resyncMessage = {
|
||||||
|
kind: 'ok',
|
||||||
|
text: `Metadata ${r.metadata_status}.${coverNote}`
|
||||||
|
};
|
||||||
|
} catch (e) {
|
||||||
|
const msg = e instanceof ApiError ? e.message : (e as Error).message;
|
||||||
|
resyncMessage = { kind: 'err', text: msg };
|
||||||
|
} finally {
|
||||||
|
resyncBusy = false;
|
||||||
|
}
|
||||||
|
}
|
||||||
</script>
|
</script>
|
||||||
|
|
||||||
<svelte:head>
|
<svelte:head>
|
||||||
<title>{manga.title} — Mangalord</title>
|
<title>Mangalord | {manga.title}</title>
|
||||||
</svelte:head>
|
</svelte:head>
|
||||||
|
|
||||||
<article>
|
<article>
|
||||||
@@ -344,7 +383,34 @@
|
|||||||
<UploadCloud size={16} aria-hidden="true" />
|
<UploadCloud size={16} aria-hidden="true" />
|
||||||
<span>Upload chapter</span>
|
<span>Upload chapter</span>
|
||||||
</a>
|
</a>
|
||||||
|
{#if session.user.is_admin}
|
||||||
|
<button
|
||||||
|
type="button"
|
||||||
|
class="action"
|
||||||
|
onclick={forceResync}
|
||||||
|
disabled={resyncBusy}
|
||||||
|
title="Refetch metadata + cover from the crawler source"
|
||||||
|
data-testid="force-resync-manga"
|
||||||
|
>
|
||||||
|
<RefreshCw
|
||||||
|
size={16}
|
||||||
|
aria-hidden="true"
|
||||||
|
class={resyncBusy ? 'spin' : ''}
|
||||||
|
/>
|
||||||
|
<span>{resyncBusy ? 'Resyncing…' : 'Force resync'}</span>
|
||||||
|
</button>
|
||||||
|
{/if}
|
||||||
</div>
|
</div>
|
||||||
|
{#if resyncMessage}
|
||||||
|
<p
|
||||||
|
class="resync-msg"
|
||||||
|
class:err={resyncMessage.kind === 'err'}
|
||||||
|
role="status"
|
||||||
|
data-testid="force-resync-message"
|
||||||
|
>
|
||||||
|
{resyncMessage.text}
|
||||||
|
</p>
|
||||||
|
{/if}
|
||||||
{:else}
|
{:else}
|
||||||
<a class="action" href="/login" data-testid="bookmark-signin">
|
<a class="action" href="/login" data-testid="bookmark-signin">
|
||||||
Sign in to bookmark or collect
|
Sign in to bookmark or collect
|
||||||
@@ -371,7 +437,7 @@
|
|||||||
>
|
>
|
||||||
<span class="continue-label">Continue reading</span>
|
<span class="continue-label">Continue reading</span>
|
||||||
<span class="continue-target">
|
<span class="continue-target">
|
||||||
Chapter {continueChapterNumber}{#if continueChapterTitle}: {continueChapterTitle}{/if}
|
{continueLabel}
|
||||||
{#if readProgress && readProgress.page > 1}
|
{#if readProgress && readProgress.page > 1}
|
||||||
— page {readProgress.page}
|
— page {readProgress.page}
|
||||||
{/if}
|
{/if}
|
||||||
@@ -385,7 +451,7 @@
|
|||||||
{#each chapters as c (c.id)}
|
{#each chapters as c (c.id)}
|
||||||
<li>
|
<li>
|
||||||
<a href="/manga/{manga.id}/chapter/{c.id}">
|
<a href="/manga/{manga.id}/chapter/{c.id}">
|
||||||
Chapter {c.number}{#if c.title}: {c.title}{/if}
|
{chapterLabel(c)}
|
||||||
</a>
|
</a>
|
||||||
<span class="pages">({c.page_count} pages)</span>
|
<span class="pages">({c.page_count} pages)</span>
|
||||||
</li>
|
</li>
|
||||||
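Several templates now defer to `chapterLabel` from `$lib/api/chapters`, whose body is not part of this diff. Below is a hypothetical stand-in that reproduces the inline markup it replaced (`Chapter {number}{#if title}: {title}{/if}`). The real helper may format variants or non-numeric chapters differently:

```typescript
type ChapterLike = { number: number; title?: string | null };

// Hypothetical stand-in for chapterLabel. It matches the old inline
// template: "Chapter N" alone, or "Chapter N: Title" when a title exists.
function chapterLabel(c: ChapterLike): string {
  return c.title ? `Chapter ${c.number}: ${c.title}` : `Chapter ${c.number}`;
}
```

Centralising the label means the reader's chapter bar, the jump-to-chapter dropdown, and the manga page list can no longer drift apart, as the old `Ch. N` and `Chapter N` variants did.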
@@ -586,6 +652,29 @@
|
|||||||
color: var(--text);
|
color: var(--text);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
.resync-msg {
|
||||||
|
margin-top: var(--space-2);
|
||||||
|
color: var(--text-muted);
|
||||||
|
font-size: var(--font-sm);
|
||||||
|
}
|
||||||
|
|
||||||
|
.resync-msg.err {
|
||||||
|
color: var(--danger);
|
||||||
|
}
|
||||||
|
|
||||||
|
:global(.spin) {
|
||||||
|
animation: spin 0.9s linear infinite;
|
||||||
|
}
|
||||||
|
|
||||||
|
@keyframes spin {
|
||||||
|
from {
|
||||||
|
transform: rotate(0deg);
|
||||||
|
}
|
||||||
|
to {
|
||||||
|
transform: rotate(360deg);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
.continue {
|
.continue {
|
||||||
display: flex;
|
display: flex;
|
||||||
flex-direction: column;
|
flex-direction: column;
|
||||||
|
|||||||
@@ -1,10 +1,12 @@
|
|||||||
<script lang="ts">
|
<script lang="ts">
|
||||||
import { onMount, onDestroy } from 'svelte';
|
import { onMount, onDestroy } from 'svelte';
|
||||||
import { goto } from '$app/navigation';
|
import { goto, invalidateAll } from '$app/navigation';
|
||||||
import { fileUrl } from '$lib/api/client';
|
import { fileUrl, ApiError } from '$lib/api/client';
|
||||||
import { GAP_PX, type ReaderPageGap } from '$lib/api/preferences';
|
import { GAP_PX, type ReaderPageGap } from '$lib/api/preferences';
|
||||||
import { preferences } from '$lib/preferences.svelte';
|
import { preferences } from '$lib/preferences.svelte';
|
||||||
import { updateReadProgress } from '$lib/api/read_progress';
|
import { updateReadProgress } from '$lib/api/read_progress';
|
||||||
|
import { chapterLabel } from '$lib/api/chapters';
|
||||||
|
import { resyncChapter } from '$lib/api/admin';
|
||||||
import { readerFullscreen } from '$lib/reader-fullscreen.svelte';
|
import { readerFullscreen } from '$lib/reader-fullscreen.svelte';
|
||||||
import { session } from '$lib/session.svelte';
|
import { session } from '$lib/session.svelte';
|
||||||
import ChevronLeft from '@lucide/svelte/icons/chevron-left';
|
import ChevronLeft from '@lucide/svelte/icons/chevron-left';
|
||||||
@@ -15,6 +17,7 @@
|
|||||||
import ScrollText from '@lucide/svelte/icons/scroll-text';
|
import ScrollText from '@lucide/svelte/icons/scroll-text';
|
||||||
import Maximize2 from '@lucide/svelte/icons/maximize-2';
|
import Maximize2 from '@lucide/svelte/icons/maximize-2';
|
||||||
import Minimize2 from '@lucide/svelte/icons/minimize-2';
|
import Minimize2 from '@lucide/svelte/icons/minimize-2';
|
||||||
|
import RefreshCw from '@lucide/svelte/icons/refresh-cw';
|
||||||
|
|
||||||
let { data } = $props();
|
let { data } = $props();
|
||||||
const manga = $derived(data.manga);
|
const manga = $derived(data.manga);
|
||||||
@@ -26,28 +29,25 @@
|
|||||||
const gapPx = $derived(GAP_PX[preferences.readerPageGap]);
|
const gapPx = $derived(GAP_PX[preferences.readerPageGap]);
|
||||||
|
|
||||||
const pageTitle = $derived(
|
const pageTitle = $derived(
|
||||||
chapter.title
|
`Mangalord | ${manga.title} · ${chapterLabel(chapter)}`
|
||||||
? `${manga.title} — Ch. ${chapter.number}: ${chapter.title}`
|
|
||||||
: `${manga.title} — Ch. ${chapter.number}`
|
|
||||||
);
|
);
|
||||||
|
|
||||||
// Prev/next chapter computed from the chapter list. listChapters
|
// Prev/next chapter computed from the chapter list. listChapters
|
||||||
// returns chapters in number ASC order; we still resolve via find
|
// returns chapters in display order (reversed source-site order, so
|
||||||
// rather than index because the current chapter's position may
|
// oldest first — see backend repo::chapter::list_for_manga), and
|
||||||
// not be `chapter.number - 1` (sparse numbering / chapter 0.5 /
|
// prev/next walks that order positionally. Resolving the current
|
||||||
// future skipped numbers).
|
// index via `findIndex` rather than `chapter.number - 1` matters because
|
||||||
const sortedChapters = $derived(
|
// numbers aren't a reliable index: variants share numbers, non-
|
||||||
[...chapters].sort((a, b) => a.number - b.number)
|
// numeric entries pin to 0, and uploads can sparse-fill.
|
||||||
);
|
|
||||||
const currentIdx = $derived(
|
const currentIdx = $derived(
|
||||||
sortedChapters.findIndex((c) => c.id === chapter.id)
|
chapters.findIndex((c) => c.id === chapter.id)
|
||||||
);
|
);
|
||||||
const prevChapter = $derived(
|
const prevChapter = $derived(
|
||||||
currentIdx > 0 ? sortedChapters[currentIdx - 1] : null
|
currentIdx > 0 ? chapters[currentIdx - 1] : null
|
||||||
);
|
);
|
||||||
const nextChapter = $derived(
|
const nextChapter = $derived(
|
||||||
currentIdx >= 0 && currentIdx < sortedChapters.length - 1
|
currentIdx >= 0 && currentIdx < chapters.length - 1
|
||||||
? sortedChapters[currentIdx + 1]
|
? chapters[currentIdx + 1]
|
||||||
: null
|
: null
|
||||||
);
|
);
|
||||||
|
|
||||||
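The reader now walks the chapter list by position instead of re-sorting by number. Below is a minimal generic sketch of that lookup. `neighbours` is a hypothetical name, and the sketch assumes the list already arrives in display order, as the updated comment describes:

```typescript
// Positional prev/next over an already display-ordered list. Locating the
// current entry by id with findIndex, rather than computing number - 1,
// tolerates shared, zero-pinned, or sparse chapter numbers.
function neighbours<T extends { id: string }>(
  chapters: T[],
  currentId: string
): { prev: T | null; next: T | null } {
  const idx = chapters.findIndex((c) => c.id === currentId);
  if (idx < 0) return { prev: null, next: null };
  return {
    prev: idx > 0 ? chapters[idx - 1] : null,
    next: idx < chapters.length - 1 ? chapters[idx + 1] : null
  };
}
```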
@@ -256,6 +256,36 @@
|
|||||||
if (typeof window !== 'undefined') window.removeEventListener('keydown', onKeydown);
|
if (typeof window !== 'undefined') window.removeEventListener('keydown', onKeydown);
|
||||||
});
|
});
|
||||||
|
|
||||||
|
// ---- Admin force resync (current chapter) ----
|
||||||
|
let resyncBusy = $state(false);
|
||||||
|
let resyncMessage = $state<{ kind: 'ok' | 'err'; text: string } | null>(null);
|
||||||
|
async function forceResync() {
|
||||||
|
if (!session.user?.is_admin || resyncBusy) return;
|
||||||
|
resyncBusy = true;
|
||||||
|
resyncMessage = null;
|
||||||
|
try {
|
||||||
|
const r = await resyncChapter(chapter.id);
|
||||||
|
if (r.outcome === 'fetched') {
|
||||||
|
resyncMessage = {
|
||||||
|
kind: 'ok',
|
||||||
|
text: `Refetched ${r.pages} page${r.pages === 1 ? '' : 's'}. Reloading…`
|
||||||
|
};
|
||||||
|
// Re-run all loaders for this route so the reader picks
|
||||||
|
// up the freshly-downloaded pages. The page.ts loader
|
||||||
|
// doesn't `depends()` on anything explicitly, so
|
||||||
|
// invalidateAll is the right brush here.
|
||||||
|
await invalidateAll();
|
||||||
|
} else {
|
||||||
|
resyncMessage = { kind: 'ok', text: 'No new pages — source had nothing fresh.' };
|
||||||
|
}
|
||||||
|
} catch (e) {
|
||||||
|
const msg = e instanceof ApiError ? e.message : (e as Error).message;
|
||||||
|
resyncMessage = { kind: 'err', text: msg };
|
||||||
|
} finally {
|
||||||
|
resyncBusy = false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
// ---- Reading progress tracking ----
|
// ---- Reading progress tracking ----
|
||||||
//
|
//
|
||||||
// High-water mark seeded from the server: progress only ever moves
|
// High-water mark seeded from the server: progress only ever moves
|
||||||
@@ -427,6 +457,27 @@
|
|||||||
</a>
|
</a>
|
||||||
|
|
||||||
<div class="controls" role="group" aria-label="reader options">
|
<div class="controls" role="group" aria-label="reader options">
|
||||||
|
<label class="chapter-field">
|
||||||
|
<span class="visually-hidden">Jump to chapter</span>
|
||||||
|
<select
|
||||||
|
class="chapter-select"
|
||||||
|
value={chapter.id}
|
||||||
|
onchange={(e) => {
|
||||||
|
const target = (e.currentTarget as HTMLSelectElement).value;
|
||||||
|
if (target && target !== chapter.id) {
|
||||||
|
void goto(`/manga/${manga.id}/chapter/${target}`);
|
||||||
|
}
|
||||||
|
}}
|
||||||
|
data-testid="reader-chapter-select"
|
||||||
|
>
|
||||||
|
{#each chapters as c (c.id)}
|
||||||
|
<option value={c.id}>
|
||||||
|
{chapterLabel(c)}
|
||||||
|
</option>
|
||||||
|
{/each}
|
||||||
|
</select>
|
||||||
|
</label>
|
||||||
|
|
||||||
<div class="mode-toggle" role="radiogroup" aria-label="layout">
|
<div class="mode-toggle" role="radiogroup" aria-label="layout">
|
||||||
<button
|
<button
|
||||||
type="button"
|
type="button"
|
||||||
@@ -481,6 +532,23 @@
|
|||||||
{/if}
|
{/if}
|
||||||
</span>
|
</span>
|
||||||
|
|
||||||
|
{#if session.user?.is_admin}
|
||||||
|
<button
|
||||||
|
type="button"
|
||||||
|
class="reader-resync"
|
||||||
|
onclick={forceResync}
|
||||||
|
disabled={resyncBusy}
|
||||||
|
title={resyncMessage?.kind === 'err'
|
||||||
|
? resyncMessage.text
|
||||||
|
: 'Force refetch this chapter from the crawler source'}
|
||||||
|
aria-label="Force resync chapter"
|
||||||
|
data-testid="force-resync-chapter"
|
||||||
|
>
|
||||||
|
<RefreshCw size={16} aria-hidden="true" class={resyncBusy ? 'spin' : ''} />
|
||||||
|
<span>{resyncBusy ? 'Resyncing…' : 'Force resync'}</span>
|
||||||
|
</button>
|
||||||
|
{/if}
|
||||||
|
|
||||||
<button
|
<button
|
||||||
type="button"
|
type="button"
|
||||||
class="fullscreen-toggle"
|
class="fullscreen-toggle"
|
||||||
@@ -494,6 +562,17 @@
|
|||||||
</button>
|
</button>
|
||||||
</nav>
|
</nav>
|
||||||
|
|
||||||
|
{#if resyncMessage}
|
||||||
|
<p
|
||||||
|
class="resync-toast"
|
||||||
|
class:err={resyncMessage.kind === 'err'}
|
||||||
|
role="status"
|
||||||
|
data-testid="force-resync-message"
|
||||||
|
>
|
||||||
|
{resyncMessage.text}
|
||||||
|
</p>
|
||||||
|
{/if}
|
||||||
|
|
||||||
<!--
|
<!--
|
||||||
Floating exit affordance — only rendered while focus mode is on.
|
Floating exit affordance — only rendered while focus mode is on.
|
||||||
Lives in the top-right corner with a low resting opacity so it
|
Lives in the top-right corner with a low resting opacity so it
|
||||||
@@ -604,7 +683,7 @@
|
|||||||
</span>
|
</span>
|
||||||
</button>
|
</button>
|
||||||
<span class="chapter-bar-current" aria-hidden="true">
|
<span class="chapter-bar-current" aria-hidden="true">
|
||||||
Ch. {chapter.number}{#if chapter.title} — {chapter.title}{/if}
|
{chapterLabel(chapter)}
|
||||||
</span>
|
</span>
|
||||||
<button
|
<button
|
||||||
type="button"
|
type="button"
|
||||||
@@ -741,7 +820,8 @@
|
|||||||
outline-offset: -2px;
|
outline-offset: -2px;
|
||||||
}
|
}
|
||||||
|
|
||||||
.gap-field select {
|
.gap-field select,
|
||||||
|
.chapter-select {
|
||||||
height: 32px;
|
height: 32px;
|
||||||
padding: 0 var(--space-2);
|
padding: 0 var(--space-2);
|
||||||
background: var(--surface);
|
background: var(--surface);
|
||||||
@@ -751,6 +831,13 @@
|
|||||||
font-size: var(--font-sm);
|
font-size: var(--font-sm);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* Cap the chapter dropdown's resting width so long titles don't
|
||||||
|
push the rest of the nav off-screen; the native control's
|
||||||
|
expanded menu still shows full option text on focus. */
|
||||||
|
.chapter-select {
|
||||||
|
max-width: 16rem;
|
||||||
|
}
|
||||||
|
|
||||||
.visually-hidden {
|
.visually-hidden {
|
||||||
position: absolute;
|
position: absolute;
|
||||||
width: 1px;
|
width: 1px;
|
||||||
@@ -911,7 +998,8 @@
|
|||||||
}
|
}
|
||||||
|
|
||||||
/* ===== Focus-mode controls ===== */
|
/* ===== Focus-mode controls ===== */
|
||||||
.fullscreen-toggle {
|
.fullscreen-toggle,
|
||||||
|
.reader-resync {
|
||||||
display: inline-flex;
|
display: inline-flex;
|
||||||
align-items: center;
|
align-items: center;
|
||||||
gap: var(--space-1);
|
gap: var(--space-1);
|
||||||
@@ -925,12 +1013,52 @@
|
|||||||
font-size: var(--font-xs);
|
font-size: var(--font-xs);
|
||||||
}
|
}
|
||||||
|
|
||||||
.fullscreen-toggle:hover {
|
.fullscreen-toggle:hover,
|
||||||
|
.reader-resync:hover:not(:disabled) {
|
||||||
background: var(--surface-elevated);
|
background: var(--surface-elevated);
|
||||||
color: var(--text);
|
color: var(--text);
|
||||||
border-color: var(--primary);
|
border-color: var(--primary);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
.reader-resync:disabled {
|
||||||
|
opacity: 0.7;
|
||||||
|
cursor: progress;
|
||||||
|
}
|
||||||
|
|
||||||
|
.resync-toast {
|
||||||
|
position: fixed;
|
||||||
|
top: calc(var(--app-header-h) + var(--reader-nav-h, 48px) + var(--space-2));
|
||||||
|
right: var(--space-3);
|
||||||
|
z-index: 11;
|
||||||
|
margin: 0;
|
||||||
|
padding: var(--space-2) var(--space-3);
|
||||||
|
max-width: min(420px, calc(100vw - 2 * var(--space-3)));
|
||||||
|
background: var(--surface);
|
||||||
|
color: var(--text);
|
||||||
|
border: 1px solid var(--primary);
|
||||||
|
border-radius: var(--radius-md);
|
||||||
|
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.12);
|
||||||
|
font-size: var(--font-sm);
|
||||||
|
}
|
||||||
|
|
||||||
|
.resync-toast.err {
|
||||||
|
border-color: var(--danger);
|
||||||
|
color: var(--danger);
|
||||||
|
}
|
||||||
|
|
||||||
|
:global(.spin) {
|
||||||
|
animation: spin 0.9s linear infinite;
|
||||||
|
}
|
||||||
|
|
||||||
|
@keyframes spin {
|
||||||
|
from {
|
||||||
|
transform: rotate(0deg);
|
||||||
|
}
|
||||||
|
to {
|
||||||
|
transform: rotate(360deg);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
/* Small floating exit affordance — corner-pinned, low resting
|
/* Small floating exit affordance — corner-pinned, low resting
|
||||||
opacity so it doesn't sit on the chapter image too aggressively
|
opacity so it doesn't sit on the chapter image too aggressively
|
||||||
but is still findable without hover. */
|
but is still findable without hover. */
|
||||||
|
|||||||
@@ -135,7 +135,7 @@
|
|||||||
</script>
|
</script>
|
||||||
|
|
||||||
<svelte:head>
|
<svelte:head>
|
||||||
<title>Edit {manga.title} — Mangalord</title>
|
<title>Mangalord | Edit · {manga.title}</title>
|
||||||
</svelte:head>
|
</svelte:head>
|
||||||
|
|
||||||
<h1>Edit manga</h1>
|
<h1>Edit manga</h1>
|
||||||
|
|||||||
@@ -57,7 +57,7 @@
|
|||||||
</script>
|
</script>
|
||||||
|
|
||||||
<svelte:head>
|
<svelte:head>
|
||||||
<title>Upload chapter — {manga.title} — Mangalord</title>
|
<title>Mangalord | Upload chapter · {manga.title}</title>
|
||||||
</svelte:head>
|
</svelte:head>
|
||||||
|
|
||||||
<nav class="back">
|
<nav class="back">
|
||||||
|
|||||||
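The page titles in this diff move to a consistent `Mangalord | A · B` pattern, written out by hand in each template. Below is a hypothetical helper that captures the convention, for illustration only. It is not part of the codebase:

```typescript
// Hypothetical helper for the "Mangalord | A · B" title convention.
// Empty parts are dropped, so a bare call yields just the site name.
function pageTitle(...parts: string[]): string {
  const rest = parts.filter((p) => p.length > 0);
  return rest.length ? `Mangalord | ${rest.join(' · ')}` : 'Mangalord';
}
```

For example, `pageTitle('Upload chapter', manga.title)` would produce the same string as the new upload page's `<title>`.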
@@ -35,10 +35,6 @@
|
|||||||
);
|
);
|
||||||
</script>
|
</script>
|
||||||
|
|
||||||
<svelte:head>
|
|
||||||
<title>Profile — Mangalord</title>
|
|
||||||
</svelte:head>
|
|
||||||
|
|
||||||
<header class="profile-header">
|
<header class="profile-header">
|
||||||
<h1>Profile</h1>
|
<h1>Profile</h1>
|
||||||
{#if !session.loaded}
|
{#if !session.loaded}
|
||||||
|
|||||||
@@ -1,5 +1,6 @@
|
|||||||
<script lang="ts">
|
<script lang="ts">
|
||||||
import { fileUrl } from '$lib/api/client';
|
import { fileUrl } from '$lib/api/client';
|
||||||
|
import { chapterLabel } from '$lib/api/chapters';
|
||||||
import { clearReadProgress, type ReadProgressSummary } from '$lib/api/read_progress';
|
import { clearReadProgress, type ReadProgressSummary } from '$lib/api/read_progress';
|
||||||
import BookImage from '@lucide/svelte/icons/book-image';
|
import BookImage from '@lucide/svelte/icons/book-image';
|
||||||
import Trash2 from '@lucide/svelte/icons/trash-2';
|
import Trash2 from '@lucide/svelte/icons/trash-2';
|
||||||
@@ -186,7 +187,7 @@
|
|||||||
<a href="/manga/{u.manga_id}" class="title">{u.manga_title}</a>
|
<a href="/manga/{u.manga_id}" class="title">{u.manga_title}</a>
|
||||||
<span class="target">
|
<span class="target">
|
||||||
<a href="/manga/{u.manga_id}/chapter/{u.chapter.id}">
|
<a href="/manga/{u.manga_id}/chapter/{u.chapter.id}">
|
||||||
Chapter {u.chapter.number}{#if u.chapter.title}: {u.chapter.title}{/if}
|
{chapterLabel(u.chapter)}
|
||||||
</a>
|
</a>
|
||||||
<span class="muted">({u.chapter.page_count} pages)</span>
|
<span class="muted">({u.chapter.page_count} pages)</span>
|
||||||
</span>
|
</span>
|
||||||
|
|||||||
@@ -184,10 +184,6 @@
|
|||||||
}
|
}
|
||||||
</script>
|
</script>
|
||||||
|
|
||||||
<svelte:head>
|
|
||||||
<title>Upload — Mangalord</title>
|
|
||||||
</svelte:head>
|
|
||||||
|
|
||||||
<h1>Create manga</h1>
|
<h1>Create manga</h1>
|
||||||
|
|
||||||
{#if !session.loaded}
|
{#if !session.loaded}
|
||||||
|
|||||||
@@ -21,6 +21,12 @@ export default defineConfig(({ mode }) => {
|
|||||||
environment: 'jsdom',
|
environment: 'jsdom',
|
||||||
include: ['src/**/*.test.ts'],
|
include: ['src/**/*.test.ts'],
|
||||||
globals: false
|
globals: false
|
||||||
|
},
|
||||||
|
resolve: {
|
||||||
|
// Use Svelte's browser entry under vitest so component tests can
|
||||||
|
// mount with @testing-library/svelte. The default (server entry)
|
||||||
|
// throws lifecycle_function_unavailable on mount().
|
||||||
|
conditions: mode === 'test' ? ['browser'] : []
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
});
|
});
|
||||||
|
|||||||
40
tor/entrypoint.sh
Executable file
@@ -0,0 +1,40 @@
|
|||||||
|
#!/bin/sh
|
||||||
|
# Mangalord wrapper around dockurr/tor's tor binary.
|
||||||
|
#
|
||||||
|
# We bypass the image's stock entrypoint for two reasons:
|
||||||
|
# 1. It generates a `ControlPort 9051` line that binds to localhost
|
||||||
|
# only (tor's default), but our backend lives in a separate
|
||||||
|
# container and needs to reach 0.0.0.0:9051.
|
||||||
|
# 2. It then *skips* writing HashedControlPassword whenever the
|
||||||
|
# user's torrc declares a ControlPort, so we can't both bind to
|
||||||
|
# 0.0.0.0 and benefit from its auto-hashing — it's one or the
|
||||||
|
# other. Doing the hashing ourselves is simpler than threading
|
||||||
|
# around its logic.
|
||||||
|
#
|
||||||
|
# This wrapper hashes $PASSWORD with `tor --hash-password`, appends a
|
||||||
|
# `HashedControlPassword` line to a writable copy of /etc/tor/torrc,
|
||||||
|
# then execs tor. The container runs as root (the image default) so
|
||||||

# this script can hash and write /tmp/torrc; 9050/9051 are unprivileged
|
||||||

# ports, and tor drops to the `tor` user itself (see `User tor` in torrc).
|
||||||
|
|
||||||
|
set -eu
|
||||||
|
|
||||||
|
if [ -z "${PASSWORD:-}" ]; then
|
||||||
|
echo "ERROR: PASSWORD env must be set (the plain string the backend will" >&2
|
||||||
|
echo " send as CRAWLER_TOR_CONTROL_PASSWORD)" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# `tor --hash-password` prints the hash on the last line of stdout
|
||||||
|
# (preceded by initialization noise).
|
||||||
|
HASH=$(tor --hash-password "$PASSWORD" 2>/dev/null | tail -n1)
|
||||||
|
if [ -z "$HASH" ]; then
|
||||||
|
echo "ERROR: 'tor --hash-password' produced no output" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# /etc/tor/torrc is bind-mounted read-only, so copy + append.
|
||||||
|
cp /etc/tor/torrc /tmp/torrc
|
||||||
|
printf '\n# Injected by mangalord-entrypoint.sh from $PASSWORD env.\nHashedControlPassword %s\n' "$HASH" >> /tmp/torrc
|
||||||
|
|
||||||
|
exec tor -f /tmp/torrc
|
||||||
38
tor/torrc
Normal file
@@ -0,0 +1,38 @@
|
|||||||
|
# torrc for the Mangalord crawler.
|
||||||
|
#
|
||||||
|
# Mounted into the dockurr/tor container at /etc/tor/torrc. The
|
||||||
|
# crawler talks to this daemon over the internal compose network only:
|
||||||
|
# `expose:` on the tor service surfaces 9050/9051 to sibling
|
||||||
|
# containers, never to the host.
|
||||||
|
|
||||||
|
# SOCKS5 proxy that reqwest and Chromium use. IsolateDestAddr +
|
||||||
|
# IsolateDestPort means each new (destination IP, port) draws a fresh
|
||||||
|
# circuit — so a SIGNAL NEWNYM picks up promptly on the next
|
||||||
|
# navigation instead of having to wait for an existing dirty circuit
|
||||||
|
# to age out.
|
||||||
|
SOCKSPort 0.0.0.0:9050 IsolateDestAddr IsolateDestPort
|
||||||
|
|
||||||
|
# Control port for SIGNAL NEWNYM. The stock dockurr/tor entrypoint is
|
||||||

# bypassed (see entrypoint.sh); our wrapper hashes the PASSWORD env var
|
||||||

# (docker-compose.yml `tor.environment.PASSWORD`) and appends the
|
||||||

# `HashedControlPassword <hash>` line to a copy of this file. We just
|
||||||

# need to declare the port itself here.
|
||||||
|
ControlPort 0.0.0.0:9051
|
||||||
|
|
||||||
|
# Keep circuits dirty for a while so a single chapter (which serial-
|
||||||
|
# fetches all its images through the same SOCKS endpoint) finishes on
|
||||||
|
# one circuit rather than mid-circuit-rotating in a way that looks like
|
||||||
|
# anti-bot evasion to the target. NEWNYM still forces a fresh circuit
|
||||||
|
# immediately when we want one — this is just the idle-rotation knob.
|
||||||
|
MaxCircuitDirtiness 600
|
||||||
|
|
||||||
|
# Drop privileges to the image's `tor` user after binding ports.
|
||||||
|
# Required because /var/lib/tor (the image's DataDirectory volume)
|
||||||
|
# is owned by tor:tor and tor refuses to use a data dir it doesn't
|
||||||
|
# own. Our entrypoint runs as root only so it can call
|
||||||
|
# `tor --hash-password` and write /tmp/torrc.
|
||||||
|
User tor
|
||||||
|
|
||||||
|
# Data + logs.
|
||||||
|
DataDirectory /var/lib/tor
|
||||||
|
Log notice stdout
|
||||||
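Once the hashed password is in place, a client rotates circuits by authenticating on port 9051 with the plain password and sending `SIGNAL NEWNYM`. Below is a minimal sketch of that command sequence from Tor's control-port protocol, assuming password authentication. `quoteControlString` and `newnymScript` are hypothetical helpers, and the crawler's actual control client is not shown in this diff:

```typescript
// Quote a string as a control-protocol QuotedString: wrap it in double
// quotes and backslash-escape embedded backslashes and quotes.
function quoteControlString(s: string): string {
  return '"' + s.replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"';
}

// Hypothetical helper: the bytes a client writes to ControlPort 9051 to
// request fresh circuits. Every command line is CRLF-terminated; a healthy
// daemon replies with a "250" status after AUTHENTICATE and SIGNAL NEWNYM.
function newnymScript(password: string): string {
  return [`AUTHENTICATE ${quoteControlString(password)}`, 'SIGNAL NEWNYM', 'QUIT']
    .map((line) => line + '\r\n')
    .join('');
}
```

Because the SOCKS port isolates circuits by destination address and port, the next navigation after NEWNYM picks up a fresh circuit immediately.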