Mangalord/docker-compose.yml
MechaCat02 e3cff9d874
fix(deploy): pivot tor service to password auth + wrapper entrypoint
Dockurr/tor's stock entrypoint binds the control port to localhost
(unreachable from a sibling container), refuses to run as a
non-default user (its setup chowns dirs and su-execs down to its
`tor` user, both requiring root), and skips its own
HashedControlPassword injection whenever the user's torrc declares
a ControlPort. The combination meant the original cookie-via-shared-
volume design couldn't work without fighting the image.

This commit:

- Adds tor/entrypoint.sh, a small wrapper that hashes $PASSWORD
  with `tor --hash-password`, appends the hash to a writable copy
  of /etc/tor/torrc, then execs tor. Container runs as root only
  for that bring-up; the torrc's `User tor` directive drops privs
  after port binding.
- Adds a healthcheck on the tor service that gates downstream
  containers on both 9050 + 9051 actually listening (was
  service_started, which fires before tor finishes bootstrap).
- Loosens MaxCircuitDirtiness 60 → 600. The 60s value would have
  rotated mid-chapter for any chapter with > ~50 images, which is
  exactly the kind of fingerprint we're trying to avoid.
- Wires TOR_CONTROL_PASSWORD as a REQUIRED .env var on both sides
  (PASSWORD on tor, CRAWLER_TOR_CONTROL_PASSWORD on backend).
  docker-compose.yml fails fast if unset.
- Removes the tor-data shared volume on backend (cookie auth is no
  longer the default; operators wanting cookie can mount it back).
- Documents the pivot + the cookie-vs-password tradeoff in
  .env.example.
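
The wrapper described in the first bullet could look roughly like the sketch below. tor/entrypoint.sh itself is not shown here, so this is an illustration under assumptions, not the committed script: the function names `build_torrc` and `main` and the output path `/tmp/torrc` are hypothetical. Only `tor --hash-password`, `tor -f`, the `HashedControlPassword` option, and the mount path `/usr/local/bin/mangalord-entrypoint.sh` come from tor or from this repository.

```shell
#!/bin/sh
# Hypothetical sketch of a wrapper like tor/entrypoint.sh (not the real script).
set -eu

# Build a writable torrc: the read-only base config plus a hashed password.
# $1 = base torrc, $2 = output torrc, $3 = plain-text control password
build_torrc() {
    # tor may print log lines before the hash, so keep only the last line.
    hashed=$(tor --hash-password "$3" | tail -n 1)
    cat "$1" > "$2"
    printf 'HashedControlPassword %s\n' "$hashed" >> "$2"
}

main() {
    : "${PASSWORD:?PASSWORD must be set}"
    # /tmp/torrc is an assumed location for the writable copy.
    build_torrc /etc/tor/torrc /tmp/torrc "$PASSWORD"
    # Replace the shell with tor so it receives container signals directly.
    exec tor -f /tmp/torrc
}

# Start tor only when run under the name the compose file mounts it as,
# so the functions above can be sourced elsewhere without side effects.
case "$0" in
    *mangalord-entrypoint.sh) main ;;
esac
```

Keeping the hashing in a separate function means the torrc assembly can be exercised with a stubbed `tor` command, without a running daemon.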

End-to-end validated: `docker compose up -d tor`, then
`printf 'AUTHENTICATE "test"\r\nSIGNAL NEWNYM\r\nQUIT\r\n' | nc tor 9051`
returns three `250` replies (`250 OK` twice, then `250 closing connection`).
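
The manual check above can be scripted. The helper below is a minimal sketch, and `expect_250_replies` is a hypothetical name, not something in this repository. It matches on the `250` status code rather than the full text, because in the tor control protocol the reply to QUIT reads `250 closing connection`, not `250 OK`.

```shell
# Succeeds only if stdin holds exactly $1 reply lines, each starting with "250".
expect_250_replies() {
    want=$1
    count=0
    while IFS= read -r line; do
        # Control-port replies end in CRLF; drop the carriage return.
        line=$(printf '%s' "$line" | tr -d '\r')
        case "$line" in
            250*) count=$((count + 1)) ;;
            *) return 1 ;;
        esac
    done
    [ "$count" -eq "$want" ]
}
```

Usage would be to append `| expect_250_replies 3` to the `nc tor 9051` pipeline and check the exit status.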

Audit ref: #2, #3, #6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 20:25:54 +02:00


# Production-like compose. Requires a populated `.env` next to this
# file: at minimum POSTGRES_PASSWORD and TOR_CONTROL_PASSWORD must be
# set to non-empty values (the `:?` form below fails fast otherwise).
# The frontend container expects HTTPS in front (Caddy/Traefik/nginx):
# with COOKIE_SECURE=true, browsers refuse to send the session cookie
# over plain HTTP.
services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-mangalord}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?POSTGRES_PASSWORD must be set in .env}
      POSTGRES_DB: ${POSTGRES_DB:-mangalord}
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-mangalord}"]
      interval: 5s
      timeout: 5s
      retries: 10

  tor:
    # SOCKS5 proxy for the crawler, plus a control port so the backend
    # can signal NEWNYM on bad pages. See tor/torrc for the daemon
    # config; both ports are only `expose`d (compose-internal), never
    # bound on the host.
    #
    # We bypass dockurr/tor's stock entrypoint because it binds the
    # control port to localhost (unreachable from the backend
    # container) and skips its own HashedControlPassword injection
    # when the user's torrc declares a ControlPort. Our wrapper
    # (tor/entrypoint.sh) generates the hash from $PASSWORD and execs
    # tor with our torrc. Backend authenticates with the same plain
    # string via CRAWLER_TOR_CONTROL_PASSWORD.
    image: dockurr/tor:latest
    entrypoint: ["/bin/sh", "/usr/local/bin/mangalord-entrypoint.sh"]
    environment:
      PASSWORD: ${TOR_CONTROL_PASSWORD:?TOR_CONTROL_PASSWORD must be set in .env}
    volumes:
      - ./tor/torrc:/etc/tor/torrc:ro
      - ./tor/entrypoint.sh:/usr/local/bin/mangalord-entrypoint.sh:ro
    expose:
      - "9050"
      - "9051"
    # Wait for both control + SOCKS ports to listen before downstream
    # services start. dockurr/tor's main process spawns before tor
    # itself is bound, so `service_started` alone races the first
    # NEWNYM call.
    healthcheck:
      test: ["CMD-SHELL", "nc -z 127.0.0.1 9050 && nc -z 127.0.0.1 9051"]
      interval: 5s
      timeout: 5s
      retries: 20
      start_period: 30s
    restart: unless-stopped

  backend:
    build: ./backend
    depends_on:
      postgres:
        condition: service_healthy
      tor:
        condition: service_healthy
    environment:
      DATABASE_URL: postgres://${POSTGRES_USER:-mangalord}:${POSTGRES_PASSWORD:?POSTGRES_PASSWORD must be set in .env}@postgres:5432/${POSTGRES_DB:-mangalord}
      BIND_ADDRESS: 0.0.0.0:8080
      STORAGE_DIR: /var/lib/mangalord/storage
      RUST_LOG: ${RUST_LOG:-info,mangalord=debug}
      # Auth / cookies: see .env.example for context.
      COOKIE_SECURE: ${COOKIE_SECURE:-true}
      COOKIE_DOMAIN: ${COOKIE_DOMAIN:-}
      SESSION_TTL_DAYS: ${SESSION_TTL_DAYS:-30}
      # CORS: same-origin by default; populate when serving the API on
      # a different host than the frontend.
      CORS_ALLOWED_ORIGINS: ${CORS_ALLOWED_ORIGINS:-}
      # Upload limits.
      MAX_REQUEST_BYTES: ${MAX_REQUEST_BYTES:-209715200}
      MAX_FILE_BYTES: ${MAX_FILE_BYTES:-20971520}
      # System-chromium override for the crawler. Leave blank to use the
      # bundled fetcher; set to e.g. /usr/bin/chromium-headless-shell on
      # arm64 deployments. Pair with `--build-arg INSTALL_CHROMIUM=true`
      # so the image actually contains the binary.
      CRAWLER_CHROMIUM_BINARY: ${CRAWLER_CHROMIUM_BINARY:-}
      # TOR proxy + NEWNYM recircuit (see .env.example for details).
      # Defaults assume the bundled `tor` service above; override
      # CRAWLER_PROXY= and CRAWLER_TOR_CONTROL_URL= (both empty) in
      # .env to disable. CRAWLER_TOR_CONTROL_PASSWORD MUST match the
      # tor service's PASSWORD (both wired to the same TOR_CONTROL_PASSWORD
      # .env var below).
      CRAWLER_PROXY: ${CRAWLER_PROXY-socks5h://tor:9050}
      CRAWLER_TOR_CONTROL_URL: ${CRAWLER_TOR_CONTROL_URL-tcp://tor:9051}
      CRAWLER_TOR_CONTROL_PASSWORD: ${TOR_CONTROL_PASSWORD:?TOR_CONTROL_PASSWORD must be set in .env}
      CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS: ${CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS:-3}
    volumes:
      - storage-data:/var/lib/mangalord/storage
    # No host port mapping in the default setup: the frontend proxies
    # /api/* through its hooks.server.ts. Expose :8080 only if you want
    # to hit the API directly from the host (e.g., bot scripts during
    # development).
    expose:
      - "8080"

  frontend:
    build: ./frontend
    depends_on:
      - backend
    environment:
      # SvelteKit's hooks.server.ts proxies /api/* to this URL so the
      # browser only ever talks to :3000 and cookies stay same-origin.
      BACKEND_URL: http://backend:8080
    ports:
      - "3000:3000"

volumes:
  postgres-data:
  storage-data:
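
The file above relies on three interpolation forms: `${VAR:-default}`, `${VAR-default}`, and `${VAR:?message}`. Compose borrows these from POSIX shell parameter expansion, so their differences can be shown in plain shell. The sketch below is illustrative only; the result variable names are made up, and it runs in a shell rather than through Compose itself.

```shell
# ${VAR-default}: the default applies only when VAR is unset.
unset CRAWLER_PROXY
unset_result="${CRAWLER_PROXY-socks5h://tor:9050}"

# An explicitly empty value survives, which is how `CRAWLER_PROXY=` in .env
# disables the proxy.
CRAWLER_PROXY=""
empty_dash="${CRAWLER_PROXY-socks5h://tor:9050}"

# ${VAR:-default}: the default also replaces an empty value.
empty_colon_dash="${CRAWLER_PROXY:-socks5h://tor:9050}"

# ${VAR:?message}: unset or empty aborts with the message. A subshell keeps
# the failure from ending this script.
if (unset TOR_CONTROL_PASSWORD; : "${TOR_CONTROL_PASSWORD:?must be set}") 2>/dev/null; then
    required_status="ok"
else
    required_status="failed fast"
fi

printf 'unset, dash form:       %s\n' "$unset_result"
printf 'empty, dash form:       [%s]\n' "$empty_dash"
printf 'empty, colon-dash form: %s\n' "$empty_colon_dash"
printf 'required form:          %s\n' "$required_status"
```

This is why CRAWLER_PROXY and CRAWLER_TOR_CONTROL_URL use the colon-less form: an operator can blank them out, whereas the `:-` variables fall back to their defaults when left empty.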