Adds a `tor` service to the compose stack (dockurr/tor) with a torrc tuned for the crawler: SOCKS5 on 9050 with IsolateDestAddr + IsolateDestPort so NEWNYM picks up promptly, control port on 9051 with cookie auth, and MaxCircuitDirtiness 60.

The backend defaults CRAWLER_PROXY to socks5h://tor:9050 and CRAWLER_TOR_CONTROL_URL to tcp://tor:9051, so Tor and recircuiting are on out of the box. Operators can set both to empty in .env to opt out without removing the service.

The tor-data named volume is mounted read-only on the backend so it can read /var/lib/tor/control_auth_cookie; CookieAuthFileGroupReadable handles the permissions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
33 lines
1.3 KiB
Plaintext
# torrc for the Mangalord crawler.
#
# Mounted into the dockurr/tor container at /etc/tor/torrc. The
# crawler talks to this daemon over the internal compose network only:
# `expose:` on the tor service surfaces 9050/9051 to sibling
# containers, never to the host.

# SOCKS5 proxy that reqwest and Chromium use. IsolateDestAddr +
# IsolateDestPort means each new (destination address, port) pair draws
# a fresh circuit, so a SIGNAL NEWNYM picks up promptly on the next
# navigation instead of having to wait for an existing dirty circuit
# to age out.
SOCKSPort 0.0.0.0:9050 IsolateDestAddr IsolateDestPort

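# Example client setting, matching the backend default for this stack
# (socks5h asks the proxy to resolve hostnames, so DNS lookups also go
# through Tor rather than the container's resolver):
#   CRAWLER_PROXY=socks5h://tor:9050
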
# Control port for SIGNAL NEWNYM. Cookie auth means no secret to manage
# in .env: the cookie file is created by the daemon at startup and
# shared with the backend container via the named `tor-data` volume.
# CookieAuthFileGroupReadable lets the backend's gid read it without
# having to run as root.
ControlPort 0.0.0.0:9051
CookieAuthentication 1
CookieAuthFile /var/lib/tor/control_auth_cookie
CookieAuthFileGroupReadable 1

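# A sketch of the control-port exchange these settings enable, for
# orientation only. It assumes the client reads the 32-byte cookie file
# above and sends it hex-encoded; it is illustrative and not taken from
# the backend code. Tor may rate-limit how often NEWNYM takes effect.
#   C: AUTHENTICATE <64 hex characters of the cookie>
#   S: 250 OK
#   C: SIGNAL NEWNYM
#   S: 250 OK
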
# Keep circuits short-lived so NEWNYM actually changes our visible
# exit soon. Default is 600s (10 min); 60s is short enough that retries
# after a brief site rate-limit window almost always see a new IP.
MaxCircuitDirtiness 60

# Data + logs.
DataDirectory /var/lib/tor
Log notice stdout