feat(deploy): dockurr/tor service + torrc; wire crawler to use it by default

Adds a `tor` service to the compose stack (dockurr/tor) with a torrc tuned for the crawler: SOCKS5 on 9050 with IsolateDestAddr + IsolateDestPort so NEWNYM picks up promptly, control port on 9051 with cookie auth, and MaxCircuitDirtiness 60.

The backend now defaults CRAWLER_PROXY → socks5h://tor:9050 and CRAWLER_TOR_CONTROL_URL → tcp://tor:9051, so Tor and recircuiting are on out of the box. Operators can set both to empty in .env to opt out without removing the service.

The tor-data named volume is mounted read-only on the backend so it can read /var/lib/tor/control_auth_cookie; CookieAuthFileGroupReadable handles the permissions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
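The default-with-opt-out behaviour described in the message can be sketched as below. This is a minimal illustration in Python, not the backend's actual code: the helper name `resolve_setting` is hypothetical, and only the two variable names and their default values come from this commit.

```python
from typing import Mapping, Optional

# Defaults named in the commit message.
DEFAULT_PROXY = "socks5h://tor:9050"
DEFAULT_CONTROL_URL = "tcp://tor:9051"


def resolve_setting(env: Mapping[str, str], name: str, default: str) -> Optional[str]:
    """Hypothetical helper: unset -> default, empty -> disabled, else the value."""
    raw = env.get(name)
    if raw is None:
        # Variable not present at all: fall back to the built-in default.
        return default
    # Present but empty (or whitespace only) is the explicit opt-out.
    return raw.strip() or None


if __name__ == "__main__":
    import os

    print(resolve_setting(os.environ, "CRAWLER_PROXY", DEFAULT_PROXY))
    print(resolve_setting(os.environ, "CRAWLER_TOR_CONTROL_URL", DEFAULT_CONTROL_URL))
```

For the empty override to reach the backend, the compose file has to distinguish "unset" from "empty". Compose interpolation of the form `${VAR:-default}` substitutes the default for an empty value as well, whereas `${VAR-default}` only does so when the variable is unset. Which form this stack uses is not shown in this diff.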
32
tor/torrc
Normal file
@@ -0,0 +1,32 @@
# torrc for the Mangalord crawler.
#
# Mounted into the dockurr/tor container at /etc/tor/torrc. The
# crawler talks to this daemon over the internal compose network only:
# `expose:` on the tor service surfaces 9050/9051 to sibling
# containers, never to the host.

# SOCKS5 proxy that reqwest and Chromium use. IsolateDestAddr +
# IsolateDestPort means each new (destination IP, port) draws a fresh
# circuit, so a SIGNAL NEWNYM picks up promptly on the next
# navigation instead of having to wait for an existing dirty circuit
# to age out.
SOCKSPort 0.0.0.0:9050 IsolateDestAddr IsolateDestPort

# Control port for SIGNAL NEWNYM. Cookie auth means no secret to manage
# in .env: the cookie file is created by the daemon at startup and
# shared with the backend container via the named `tor-data` volume.
# CookieAuthFileGroupReadable lets the backend's gid read it without
# having to run as root.
ControlPort 0.0.0.0:9051
CookieAuthentication 1
CookieAuthFile /var/lib/tor/control_auth_cookie
CookieAuthFileGroupReadable 1

# Keep circuits short-lived so NEWNYM actually changes our visible
# exit soon. Default is 600s (10 min); 60s is short enough that retries
# after a brief site rate-limit window almost always see a new IP.
MaxCircuitDirtiness 60

# Data + logs.
DataDirectory /var/lib/tor
Log notice stdout