feat(deploy): dockurr/tor service + torrc; wire crawler to use it by default

Adds a `tor` service (dockurr/tor) to the compose stack with a torrc
tuned for the crawler: SOCKS5 on 9050 with IsolateDestAddr +
IsolateDestPort so NEWNYM picks up promptly, control port on 9051
with cookie auth, and MaxCircuitDirtiness 60 (seconds).

Backend defaults CRAWLER_PROXY → socks5h://tor:9050 and
CRAWLER_TOR_CONTROL_URL → tcp://tor:9051, so Tor proxying and NEWNYM
recircuiting are enabled out of the box. Operators can set both to
empty in .env to opt out without removing the service.
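
The opt-out works because the compose file uses `${VAR-default}` rather
than `${VAR:-default}` for these two variables: the form without the
colon falls back only when the variable is unset, so an explicitly empty
value survives. A minimal Python sketch of that rule follows; `expand` is
a hypothetical helper written for illustration, not part of Compose or of
this repository.

```python
def expand(name: str, default: str, env: dict[str, str], colon: bool) -> str:
    """Mimic ${NAME-default} (colon=False) and ${NAME:-default} (colon=True)."""
    if name not in env:
        # Unset: both forms fall back to the default.
        return default
    value = env[name]
    if colon and value == "":
        # Set but empty: only the ':-' form falls back.
        return default
    return value


DEFAULT_PROXY = "socks5h://tor:9050"

# Variable absent from .env: the bundled tor service is used.
unset = expand("CRAWLER_PROXY", DEFAULT_PROXY, {}, colon=False)

# CRAWLER_PROXY= in .env: the empty value is kept, so the proxy is disabled.
opted_out = expand("CRAWLER_PROXY", DEFAULT_PROXY, {"CRAWLER_PROXY": ""}, colon=False)

# With ':-' the same empty value would have been replaced by the default.
not_opted_out = expand("CRAWLER_PROXY", DEFAULT_PROXY, {"CRAWLER_PROXY": ""}, colon=True)
```

This is also why CRAWLER_TOR_CONTROL_PASSWORD and
CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS can keep the `:-` form: for those, an
empty value and an unset value are meant to behave the same.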

The tor-data named volume is mounted read-only on the backend so it
can read /var/lib/tor/control_auth_cookie; CookieAuthFileGroupReadable
in the torrc makes the cookie file group-readable.
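
For reference, cookie authentication against the control port can be
sketched as below. This is an illustration of the Tor control protocol
under the defaults in this commit (host `tor`, port 9051, cookie at
/var/lib/tor/control_auth_cookie), not the backend's actual
implementation; `control_commands` and `request_new_circuit` are
hypothetical names.

```python
import socket
from pathlib import Path


def control_commands(cookie: bytes) -> list[bytes]:
    """Build the control-port lines: authenticate with the hex cookie, then NEWNYM."""
    return [
        f"AUTHENTICATE {cookie.hex()}\r\n".encode("ascii"),
        b"SIGNAL NEWNYM\r\n",
    ]


def request_new_circuit(
    host: str = "tor",
    port: int = 9051,
    cookie_path: str = "/var/lib/tor/control_auth_cookie",
    timeout: float = 5.0,
) -> bool:
    """Return True if Tor answered '250' to both AUTHENTICATE and SIGNAL NEWNYM."""
    # The cookie is raw bytes written by the tor service into the shared volume.
    cookie = Path(cookie_path).read_bytes()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as reader:
            for command in control_commands(cookie):
                sock.sendall(command)
                # Each successful command is acknowledged with a '250 ...' line.
                if not reader.readline().startswith(b"250"):
                    return False
    return True
```

Calling `request_new_circuit()` from inside the backend container would
only succeed if the process can read the cookie file, which is what the
read-only mount plus group-readable cookie are meant to allow.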

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-31 18:54:40 +02:00
parent 8c6378b877
commit ecbbebafc4
3 changed files with 84 additions and 0 deletions

@@ -19,11 +19,27 @@ services:
      timeout: 5s
      retries: 10
  tor:
    # SOCKS5 proxy for the crawler, plus a control port so the backend
    # can signal NEWNYM on bad pages. See tor/torrc for the daemon
    # config; both ports are only `expose`d (compose-internal), never
    # bound on the host.
    image: dockurr/tor:latest
    volumes:
      - ./tor/torrc:/etc/tor/torrc:ro
      - tor-data:/var/lib/tor
    expose:
      - "9050"
      - "9051"
    restart: unless-stopped
  backend:
    build: ./backend
    depends_on:
      postgres:
        condition: service_healthy
      tor:
        condition: service_started
    environment:
      DATABASE_URL: postgres://${POSTGRES_USER:-mangalord}:${POSTGRES_PASSWORD:?POSTGRES_PASSWORD must be set in .env}@postgres:5432/${POSTGRES_DB:-mangalord}
      BIND_ADDRESS: 0.0.0.0:8080
@@ -44,8 +60,19 @@ services:
      # arm64 deployments. Pair with `--build-arg INSTALL_CHROMIUM=true`
      # so the image actually contains the binary.
      CRAWLER_CHROMIUM_BINARY: ${CRAWLER_CHROMIUM_BINARY:-}
      # TOR proxy + NEWNYM recircuit (see .env.example for details).
      # Defaults assume the bundled `tor` service above; override to
      # empty strings to disable.
      CRAWLER_PROXY: ${CRAWLER_PROXY-socks5h://tor:9050}
      CRAWLER_TOR_CONTROL_URL: ${CRAWLER_TOR_CONTROL_URL-tcp://tor:9051}
      CRAWLER_TOR_CONTROL_COOKIE_PATH: ${CRAWLER_TOR_CONTROL_COOKIE_PATH-/var/lib/tor/control_auth_cookie}
      CRAWLER_TOR_CONTROL_PASSWORD: ${CRAWLER_TOR_CONTROL_PASSWORD:-}
      CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS: ${CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS:-3}
    volumes:
      - storage-data:/var/lib/mangalord/storage
      # Read the TOR control-auth cookie from the shared named volume.
      # Read-only on the backend side; the tor service is the writer.
      - tor-data:/var/lib/tor:ro
    # No host port mapping in the default setup — the frontend proxies
    # /api/* through its hooks.server.ts. Expose :8080 only if you want
    # to hit the API directly from the host (e.g., bot scripts during
@@ -67,3 +94,4 @@ services:
volumes:
  postgres-data:
  storage-data:
  tor-data: