fix(deploy): pivot tor service to password auth + wrapper entrypoint

Dockurr/tor's stock entrypoint binds the control port to localhost
(unreachable from a sibling container), refuses to run as a
non-default user (its setup chowns dirs and su-execs down to its
`tor` user, both requiring root), and skips its own
HashedControlPassword injection whenever the user's torrc declares a
ControlPort. The combination meant the original
cookie-via-shared-volume design couldn't work without fighting the
image.

This commit:

- Adds tor/entrypoint.sh, a small wrapper that hashes $PASSWORD with
  `tor --hash-password`, appends the hash to a writable copy of
  /etc/tor/torrc, then execs tor. Container runs as root only for
  that bring-up; the torrc's `User tor` directive drops privs after
  port binding.
- Adds a healthcheck on the tor service that gates downstream
  containers on both 9050 + 9051 actually listening (was
  service_started, which fires before tor finishes bootstrap).
- Loosens MaxCircuitDirtiness 60 → 600. The 60s value would have
  rotated mid-chapter for any chapter with > ~50 images, which is
  exactly the kind of fingerprint we're trying to avoid.
- Wires TOR_CONTROL_PASSWORD as a REQUIRED .env var on both sides
  (PASSWORD on tor, CRAWLER_TOR_CONTROL_PASSWORD on backend).
  docker-compose.yml fails fast if unset.
- Removes the tor-data shared volume on backend (cookie auth is no
  longer the default; operators wanting cookie can mount it back).
- Documents the pivot + the cookie-vs-password tradeoff in
  .env.example.

End-to-end validated: `docker compose up -d tor`, then

    printf 'AUTHENTICATE "test"\r\nSIGNAL NEWNYM\r\nQUIT\r\n' | nc tor 9051

returns three `250 OK` lines.

Audit ref: #2, #3, #6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
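
The wrapper described in the commit message could look roughly like the sketch below. This is not the committed tor/entrypoint.sh, which is not shown on this page. The function name `append_hashed_password`, the writable path `/tmp/torrc`, and the `tail -n 1` filtering of the `tor --hash-password` output are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a wrapper entrypoint; NOT the committed
# tor/entrypoint.sh. Names and paths below are illustrative assumptions.
set -eu

# Copy a (possibly read-only) torrc to a writable path and append the
# hashed control password as a HashedControlPassword line.
#   $1 = source torrc
#   $2 = writable destination
#   $3 = hash as printed by `tor --hash-password` (e.g. "16:...")
append_hashed_password() {
    # `cat >` creates the destination with default permissions instead of
    # inheriting the read-only mode of the bind-mounted source.
    cat "$1" > "$2"
    printf '\nHashedControlPassword %s\n' "$3" >> "$2"
}

# Bring-up as a wrapper might run it inside the container. Left commented
# out so the sketch can be read or sourced without starting tor:
#
#   : "${PASSWORD:?PASSWORD must be set}"
#   # tor may print log lines before the hash, so keep only the last line.
#   hash=$(tor --hash-password "$PASSWORD" | tail -n 1)
#   append_hashed_password /etc/tor/torrc /tmp/torrc "$hash"
#   exec tor -f /tmp/torrc
```

Keeping the file manipulation in a small function separates it from the `exec tor` step, so the torrc handling can be checked without a tor binary present.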
@@ -24,13 +24,34 @@ services:
    # can signal NEWNYM on bad pages. See tor/torrc for the daemon
    # config; both ports are only `expose`d (compose-internal), never
    # bound on the host.
    #
    # We bypass dockurr/tor's stock entrypoint because it binds the
    # control port to localhost (unreachable from the backend
    # container) and skips its own HashedControlPassword injection
    # when the user's torrc declares a ControlPort. Our wrapper
    # (tor/entrypoint.sh) generates the hash from $PASSWORD and execs
    # tor with our torrc. Backend authenticates with the same plain
    # string via CRAWLER_TOR_CONTROL_PASSWORD.
    image: dockurr/tor:latest
    entrypoint: ["/bin/sh", "/usr/local/bin/mangalord-entrypoint.sh"]
    environment:
      PASSWORD: ${TOR_CONTROL_PASSWORD:?TOR_CONTROL_PASSWORD must be set in .env}
    volumes:
      - ./tor/torrc:/etc/tor/torrc:ro
      - tor-data:/var/lib/tor
      - ./tor/entrypoint.sh:/usr/local/bin/mangalord-entrypoint.sh:ro
    expose:
      - "9050"
      - "9051"
    # Wait for both control + SOCKS ports to listen before downstream
    # services start. dockurr/tor's main process spawns before tor
    # itself is bound, so `service_started` alone races the first
    # NEWNYM call.
    healthcheck:
      test: ["CMD-SHELL", "nc -z 127.0.0.1 9050 && nc -z 127.0.0.1 9051"]
      interval: 5s
      timeout: 5s
      retries: 20
      start_period: 30s
    restart: unless-stopped

  backend:
@@ -39,7 +60,7 @@ services:
       postgres:
         condition: service_healthy
       tor:
-        condition: service_started
+        condition: service_healthy
     environment:
       DATABASE_URL: postgres://${POSTGRES_USER:-mangalord}:${POSTGRES_PASSWORD:?POSTGRES_PASSWORD must be set in .env}@postgres:5432/${POSTGRES_DB:-mangalord}
       BIND_ADDRESS: 0.0.0.0:8080
@@ -61,18 +82,17 @@ services:
       # so the image actually contains the binary.
       CRAWLER_CHROMIUM_BINARY: ${CRAWLER_CHROMIUM_BINARY:-}
       # TOR proxy + NEWNYM recircuit (see .env.example for details).
-      # Defaults assume the bundled `tor` service above; override to
-      # empty strings to disable.
+      # Defaults assume the bundled `tor` service above; override
+      # CRAWLER_PROXY= and CRAWLER_TOR_CONTROL_URL= (both empty) in
+      # .env to disable. CRAWLER_TOR_CONTROL_PASSWORD MUST match the
+      # tor service's PASSWORD (both wired to the same TOR_CONTROL_PASSWORD
+      # .env var below).
       CRAWLER_PROXY: ${CRAWLER_PROXY-socks5h://tor:9050}
       CRAWLER_TOR_CONTROL_URL: ${CRAWLER_TOR_CONTROL_URL-tcp://tor:9051}
-      CRAWLER_TOR_CONTROL_COOKIE_PATH: ${CRAWLER_TOR_CONTROL_COOKIE_PATH-/var/lib/tor/control_auth_cookie}
-      CRAWLER_TOR_CONTROL_PASSWORD: ${CRAWLER_TOR_CONTROL_PASSWORD:-}
+      CRAWLER_TOR_CONTROL_PASSWORD: ${TOR_CONTROL_PASSWORD:?TOR_CONTROL_PASSWORD must be set in .env}
       CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS: ${CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS:-3}
     volumes:
       - storage-data:/var/lib/mangalord/storage
-      # Read the TOR control-auth cookie from the shared named volume.
-      # Read-only on the backend side; the tor service is the writer.
-      - tor-data:/var/lib/tor:ro
     # No host port mapping in the default setup — the frontend proxies
     # /api/* through its hooks.server.ts. Expose :8080 only if you want
     # to hit the API directly from the host (e.g., bot scripts during
@@ -94,4 +114,3 @@ services:
volumes:
  postgres-data:
  storage-data:
  tor-data:
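
The compose file in this diff uses three interpolation forms: `${VAR-default}`, `${VAR:-default}`, and `${VAR:?message}`. Compose interpolation follows the same rules as POSIX shell parameter expansion for these forms, so the difference can be sketched in plain sh. The function names below are made up for illustration; the variable names, defaults, and error message are taken from the diff.

```shell
#!/bin/sh
# Illustration only: mirrors the compose interpolation forms used in the
# diff with POSIX shell parameter expansion. Function names are hypothetical.

# ${VAR-default}: the default applies only when VAR is unset, so an explicit
# empty value (CRAWLER_PROXY= in .env) survives and disables the proxy.
proxy_value() {
    printf '%s\n' "${CRAWLER_PROXY-socks5h://tor:9050}"
}

# ${VAR:-default}: the default applies when VAR is unset OR empty.
max_attempts() {
    printf '%s\n' "${CRAWLER_TOR_RECIRCUIT_MAX_ATTEMPTS:-3}"
}

# ${VAR:?message}: the expansion fails when VAR is unset or empty. It runs
# in a subshell so the failure does not terminate the calling shell.
require_password() {
    ( : "${TOR_CONTROL_PASSWORD:?TOR_CONTROL_PASSWORD must be set in .env}" ) 2>/dev/null
}
```

This is why the comment in the diff tells operators to set `CRAWLER_PROXY=` and `CRAWLER_TOR_CONTROL_URL=` to empty values to disable the proxy: with the colon-less form an empty value is passed through, whereas `:-` would silently restore the default.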