feat(deploy): dockurr/tor service + torrc; wire crawler to use it by default

Adds a `tor` service (dockurr/tor) to the compose stack with a torrc
tuned for the crawler: SOCKS5 on 9050 with IsolateDestAddr +
IsolateDestPort so NEWNYM picks up promptly, control port on 9051
with cookie auth, and MaxCircuitDirtiness 60.

The backend now defaults CRAWLER_PROXY to socks5h://tor:9050 and
CRAWLER_TOR_CONTROL_URL to tcp://tor:9051, so Tor proxying and
circuit rotation (NEWNYM) are enabled out of the box. Operators can
opt out by setting both to empty in .env, without removing the
service.
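
The opt-out rule above can be sketched as follows. This is a minimal illustration in Python, not the backend's actual code: `resolve_setting` and the two constant names are hypothetical, and the sketch assumes that an unset variable falls back to the default while an explicitly empty one disables the feature.

```python
import os

# Defaults named in this commit message.
DEFAULT_PROXY = "socks5h://tor:9050"
DEFAULT_CONTROL_URL = "tcp://tor:9051"


def resolve_setting(env, name, default):
    """Return the default when `name` is unset, None when it is set to an
    empty string (operator opt-out), and the stripped value otherwise."""
    raw = env.get(name)
    if raw is None:
        return default
    raw = raw.strip()
    return raw or None


if __name__ == "__main__":
    # Hypothetical usage: read both settings from the process environment.
    proxy = resolve_setting(os.environ, "CRAWLER_PROXY", DEFAULT_PROXY)
    control = resolve_setting(os.environ, "CRAWLER_TOR_CONTROL_URL", DEFAULT_CONTROL_URL)
    print(f"proxy={proxy!r} control={control!r}")
```

If the defaults are applied through Compose variable interpolation instead, the form matters: `${VAR:-default}` also substitutes the default for an empty value, whereas `${VAR-default}` only does so when the variable is unset.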

The tor-data named volume is mounted read-only on the backend so it
can read /var/lib/tor/control_auth_cookie; CookieAuthFileGroupReadable
makes the cookie group-readable, so the backend does not have to run
as root.
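
For illustration, the cookie handshake followed by a NEWNYM request could look roughly like the Python sketch below. It is not the backend's implementation: the function names are hypothetical, the host, port and cookie path are taken from this commit, and error handling is reduced to checking for a 250 status line.

```python
import socket
from pathlib import Path

COOKIE_PATH = "/var/lib/tor/control_auth_cookie"  # path set in the torrc of this commit


def authenticate_line(cookie: bytes) -> bytes:
    """Build the AUTHENTICATE command; cookie auth sends the cookie bytes hex-encoded."""
    return b"AUTHENTICATE " + cookie.hex().encode("ascii") + b"\r\n"


def read_reply(reader) -> bytes:
    """Read one control-port reply line and raise unless its status code is 250."""
    line = reader.readline()
    if not line.startswith(b"250"):
        raise RuntimeError(f"tor control port replied: {line!r}")
    return line


def signal_newnym(host="tor", port=9051, cookie_path=COOKIE_PATH, timeout=5.0):
    """Authenticate with the shared cookie, then ask Tor for fresh circuits."""
    cookie = Path(cookie_path).read_bytes()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rwb") as stream:
            for command in (authenticate_line(cookie), b"SIGNAL NEWNYM\r\n"):
                stream.write(command)
                stream.flush()
                read_reply(stream)
```

Calling `signal_newnym()` from the backend container assumes the tor-data volume is mounted at /var/lib/tor there. Tor may rate-limit NEWNYM, so signals sent in quick succession can be delayed.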

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-31 18:54:40 +02:00
parent 8c6378b877
commit ecbbebafc4
3 changed files with 84 additions and 0 deletions

tor/torrc Normal file

@@ -0,0 +1,32 @@
# torrc for the Mangalord crawler.
#
# Mounted into the dockurr/tor container at /etc/tor/torrc. The
# crawler talks to this daemon over the internal compose network only:
# `expose:` on the tor service surfaces 9050/9051 to sibling
# containers, never to the host.

# SOCKS5 proxy that reqwest and Chromium use. IsolateDestAddr +
# IsolateDestPort keep streams to different (destination address,
# port) pairs on separate circuits, so a SIGNAL NEWNYM picks up
# promptly on the next navigation instead of having to wait for an
# existing dirty circuit to age out.
SOCKSPort 0.0.0.0:9050 IsolateDestAddr IsolateDestPort

# Control port for SIGNAL NEWNYM. Cookie auth means no secret to manage
# in .env — the cookie file is created by the daemon at startup and
# shared with the backend container via the named `tor-data` volume.
# CookieAuthFileGroupReadable lets the backend's gid read it without
# having to run as root.
ControlPort 0.0.0.0:9051
CookieAuthentication 1
CookieAuthFile /var/lib/tor/control_auth_cookie
CookieAuthFileGroupReadable 1

# Keep circuits short-lived so NEWNYM actually changes our visible
# exit soon. Default is 600s (10 min); 60s is short enough that retries
# after a brief site rate-limit window almost always see a new IP.
MaxCircuitDirtiness 60

# Data + logs.
DataDirectory /var/lib/tor
Log notice stdout