Adds a `tor` service to the compose stack (dockurr/tor) with a torrc tuned for the crawler: SOCKS5 on 9050 with IsolateDestAddr + IsolateDestPort so NEWNYM picks up promptly, control port on 9051 with cookie auth, MaxCircuitDirtiness 60.

Backend defaults CRAWLER_PROXY → socks5h://tor:9050 and CRAWLER_TOR_CONTROL_URL → tcp://tor:9051 so TOR + recircuit are on out-of-the-box. Operators can override both to empty in .env to opt out without removing the service.

The tor-data named volume is mounted ro on the backend so it can read /var/lib/tor/control_auth_cookie; CookieAuthFileGroupReadable handles the permissions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
122 lines
5.7 KiB
Plaintext
# Copy to .env for `docker compose up --build`. Local-dev runs (cargo run
# / npm run dev) read backend/.env if present, or pick up the variables
# from your shell.
#
# Production note: COOKIE_SECURE=true (the default below) makes browsers
# refuse to send the session cookie over plain HTTP. Run with a TLS-
# terminating reverse proxy (Caddy, Traefik, nginx) in front; the
# compose file here doesn't ship one. Local/dev runs without HTTPS
# should set COOKIE_SECURE=false.

# ----- Postgres -----
# These are read by the Postgres container *and* by DATABASE_URL below;
# changing them after the first boot won't migrate existing data, so set
# them up front for any new deployment.
#
# POSTGRES_PASSWORD is REQUIRED: docker-compose.yml fails fast if it
# isn't set in this file, to prevent a deploy without an .env booting
# Postgres with a publicly-known credential.
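# A possible way to generate a value, assuming OpenSSL is installed
# (hex output avoids characters that would need escaping if the
# password is also embedded in a URL such as DATABASE_URL):
#   openssl rand -hex 32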
POSTGRES_USER=mangalord
POSTGRES_PASSWORD=change-me-to-a-strong-random-string
POSTGRES_DB=mangalord

# ----- Backend -----
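# Keep the user, password, and database name in this URL in sync with
# the POSTGRES_* values above.
DATABASE_URL=postgres://mangalord:change-me-to-a-strong-random-string@postgres:5432/mangalord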
BIND_ADDRESS=0.0.0.0:8080
STORAGE_DIR=/var/lib/mangalord/storage
RUST_LOG=info,mangalord=debug,chromiumoxide::conn=off,chromiumoxide::handler=off

# ----- Auth / cookies -----
# COOKIE_SECURE controls whether the `Secure` flag is set on the session
# cookie. Keep `true` in production (HTTPS); set to `false` if you're
# serving over plain HTTP locally (e.g., behind a dev reverse proxy).
COOKIE_SECURE=true
# COOKIE_DOMAIN scopes the session cookie. Leave empty to default to the
# requesting host. Set when serving the API and frontend on subdomains of
# a shared parent (e.g., `.example.com`) so the cookie is shared.
COOKIE_DOMAIN=
# Session lifetime in days. Expired sessions are no longer accepted and
# get reaped lazily.
SESSION_TTL_DAYS=30

# ----- Auth brute-force rate limits -----
# Token-bucket budget shared across /auth/login, /auth/register, and
# /auth/me/password. Set AUTH_RATE_PER_SEC=0 to disable (e.g. behind a
# rate-limiting reverse proxy that already enforces a budget).
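# Assuming standard token-bucket semantics, the defaults below allow a
# burst of up to 10 back-to-back requests, after which the budget
# refills at 5 requests per second (an empty bucket is full again after
# 10 / 5 = 2 seconds).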
AUTH_RATE_PER_SEC=5
AUTH_RATE_BURST=10

# ----- CORS -----
# Comma-separated origins allowed to call the API with credentials.
# Default is empty: same-origin only. Set when frontend and backend live
# on different hosts. Example: https://app.example.com,https://app.example.de
CORS_ALLOWED_ORIGINS=

# ----- Upload limits -----
# Per-request body cap. axum rejects oversized requests with 413 before
# our handlers run. Default 200 MiB.
MAX_REQUEST_BYTES=209715200
# Per-image-part cap. Enforced after reading each part, so a single
# oversized image is rejected even when the total request fits.
# Default 20 MiB.
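# Both limits are given in bytes: 200 MiB = 200 * 1024 * 1024 =
# 209715200 and 20 MiB = 20 * 1024 * 1024 = 20971520. To pick another
# size, multiply the MiB count by 1048576.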
MAX_FILE_BYTES=20971520

# ----- Crawler download safety -----
# Hosts the crawler is allowed to fetch images/covers from, in addition
# to CRAWLER_START_URL's host and CRAWLER_CDN_HOST. Comma-separated.
# Defends against SSRF via scraped <img src="http://10.0.0.1/...">.
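# Example with two hypothetical hosts (placeholders, not real sources):
#   CRAWLER_DOWNLOAD_ALLOWLIST=img.example.com,static.example.net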
CRAWLER_DOWNLOAD_ALLOWLIST=
# Bypass the host allowlist entirely. Intended for sources that shard
# images across numbered CDN subdomains (cdn1/cdn2/…) where enumerating
# every host upfront is impractical. The private-IP / localhost / non-
# http(s) scheme defenses STAY ON: a scraped <img src="http://10.0.0.1/">
# is still refused with this flag set.
CRAWLER_ALLOW_ANY_HOST=false
# Hard cap on a single image body. Default 32 MiB.
CRAWLER_MAX_IMAGE_BYTES=33554432
# Path to a system Chromium binary. When set, the crawler skips the
# bundled-fetcher download. Required on platforms without a usable
# upstream Chromium build (notably Linux_arm64 / Raspberry Pi). On
# Debian: /usr/bin/chromium-headless-shell or /usr/bin/chromium. On
# Ubuntu the package is chromium-browser (different path). Pair with
# `docker compose build --build-arg INSTALL_CHROMIUM=true backend` so
# the image actually contains the binary.
CRAWLER_CHROMIUM_BINARY=

# ----- Crawler TOR proxy + recircuit -----
# The compose stack ships a `tor` service (dockurr/tor) and defaults
# CRAWLER_PROXY to it, so by default all crawler traffic exits via the
# TOR network. To opt out, set CRAWLER_PROXY= (empty) AND
# CRAWLER_TOR_CONTROL_URL= (empty) below; the tor service can stay
# running, it just won't be used.
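# Opt-out sketch: the same two variables with empty values.
#   CRAWLER_PROXY=
#   CRAWLER_TOR_CONTROL_URL=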
#
# CRAWLER_PROXY: SOCKS5(h) URL. Use `socks5h://` (not `socks5://`) so
# DNS resolution also goes through TOR, avoiding leaks via the host's
# resolver. Set to empty to talk to the upstream directly.
CRAWLER_PROXY=socks5h://tor:9050
# Control-port URL for SIGNAL NEWNYM ("get a fresh circuit"). Triggered
# automatically on bad pages (broken-page body, missing #logo) and on
# the Unauthenticated session probe outcome. Set to empty to disable the
# recircuit feature (the SOCKS proxy still works).
CRAWLER_TOR_CONTROL_URL=tcp://tor:9051
# Auth: cookie file (preferred) or password (HashedControlPassword).
# Cookie wins when both are set. The bundled torrc enables cookie auth
# and shares /var/lib/tor between containers via a named volume.
CRAWLER_TOR_CONTROL_COOKIE_PATH=/var/lib/tor/control_auth_cookie
# CRAWLER_TOR_CONTROL_PASSWORD=
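# If password auth is used instead, the matching torrc line can be
# produced with Tor's own hasher (the password here is a placeholder):
#   tor --hash-password 'my-control-password'
# The printed `16:...` string goes after HashedControlPassword in torrc;
# presumably the plain password is what CRAWLER_TOR_CONTROL_PASSWORD
# expects, which is an assumption about the backend.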
# Max NEWNYM-and-retry cycles per recircuit-eligible failure. Default 3.
CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS=3

# ----- Frontend -----
# The frontend container runs SvelteKit's Node adapter on :3000 and
# proxies /api/* to BACKEND_URL via src/hooks.server.ts. In compose the
# default `http://backend:8080` reaches the backend service over the
# internal docker network. Override only if you're running the
# frontend container against a backend somewhere else.
BACKEND_URL=http://backend:8080
# Per-request wall-clock cap for the /api/* reverse proxy (milliseconds).
# Default 300000 (5 min) covers a typical 200 MiB chapter upload over
# 25 Mbps; raise for users on slower upstream links or lower if a
# tighter front proxy already bounds the request lifetime.
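# Rough arithmetic, ignoring protocol overhead: 200 MiB is
# 1,677,721,600 bits, which takes about 67 s at 25 Mbps, so the default
# leaves over 4x headroom. The same upload would only hit the 300 s cap
# on links slower than about 5.6 Mbps.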
BACKEND_PROXY_TIMEOUT_MS=300000