The stock dockurr/tor entrypoint binds the control port to localhost
(unreachable from a sibling container), refuses to run as a
non-default user (its setup chowns directories and su-execs down to
its `tor` user, both of which require root), and skips its own
HashedControlPassword injection whenever the user's torrc declares
a ControlPort. Together these meant the original
cookie-via-shared-volume design couldn't work without fighting the
image.

This commit:
- Adds tor/entrypoint.sh, a small wrapper that hashes $PASSWORD
with `tor --hash-password`, appends the resulting
HashedControlPassword line to a writable copy of /etc/tor/torrc,
then execs tor. The container runs as root only for that bring-up;
the torrc's `User tor` directive drops privileges once the ports
are bound.
- Adds a healthcheck on the tor service that gates downstream
containers on both 9050 and 9051 actually listening (previously
service_started, which fires before tor finishes bootstrapping).
- Loosens MaxCircuitDirtiness 60 → 600. The 60s value would have
rotated circuits mid-chapter for any chapter with more than ~50
images, which is exactly the kind of fingerprint we're trying to
avoid.
- Wires TOR_CONTROL_PASSWORD as a REQUIRED .env var on both sides
(PASSWORD on tor, CRAWLER_TOR_CONTROL_PASSWORD on backend).
docker-compose.yml fails fast if unset.
- Removes the tor-data shared volume on backend (cookie auth is no
longer the default; operators who want cookie auth can mount it
back).
- Documents the pivot + the cookie-vs-password tradeoff in
.env.example.
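
The healthcheck gating above can be illustrated with a small port
probe. This is only a sketch in Python, not the healthcheck this
commit ships: the function names `port_is_listening` and
`tor_is_ready` and the default host are assumptions made for the
example.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not listening".
        return False


def tor_is_ready(host: str = "127.0.0.1", ports: tuple = (9050, 9051)) -> bool:
    """Healthy only when every listed port accepts connections."""
    return all(port_is_listening(host, port) for port in ports)


def exit_code(host: str = "127.0.0.1") -> int:
    """Map readiness to a process exit status: 0 healthy, 1 unhealthy."""
    return 0 if tor_is_ready(host) else 1
```

A probe like this only shows that both listeners accept connections;
it does not by itself prove that tor has built a usable circuit.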
End-to-end validated: `docker compose up -d tor`, then
`printf 'AUTHENTICATE "test"\r\nSIGNAL NEWNYM\r\nQUIT\r\n' | nc tor 9051`
returns three `250` success replies, one per command.
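
The same control-port exchange can be sketched in Python for callers
that would rather not shell out to nc. The helper names
(`control_commands`, `request_new_circuit`, `all_succeeded`) are
hypothetical and this is not the backend's actual client; it assumes
password auth and that tor closes the connection after QUIT.

```python
import socket


def control_commands(password: str) -> bytes:
    """Build the AUTHENTICATE / SIGNAL NEWNYM / QUIT sequence, CRLF-terminated."""
    # Escape backslashes and double quotes inside the quoted password.
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    lines = [f'AUTHENTICATE "{escaped}"', "SIGNAL NEWNYM", "QUIT"]
    return "".join(line + "\r\n" for line in lines).encode("utf-8")


def request_new_circuit(host: str, port: int, password: str,
                        timeout: float = 5.0) -> list:
    """Send the sequence and return the reply lines read until the peer closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(control_commands(password))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace").splitlines()


def all_succeeded(replies: list, expected: int = 3) -> bool:
    """True when exactly `expected` replies came back and each starts with 250."""
    return len(replies) == expected and all(r.startswith("250") for r in replies)
```

With the stack up, `all_succeeded(request_new_circuit("tor", 9051, "test"))`
would be the equivalent of the nc check above.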
Audit ref: #2, #3, #6.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a `tor` service to the compose stack (dockurr/tor) with a torrc
tuned for the crawler: SOCKS5 on 9050 with IsolateDestAddr and
IsolateDestPort so NEWNYM takes effect promptly, control port on 9051
with cookie auth, and MaxCircuitDirtiness 60.
Backend defaults CRAWLER_PROXY → socks5h://tor:9050 and
CRAWLER_TOR_CONTROL_URL → tcp://tor:9051 so Tor proxying and
recircuit are both enabled out of the box. Operators can override
both to empty in .env to opt out without removing the service.
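
The default-plus-opt-out behaviour can be sketched as follows. This
is an illustration in Python under assumptions: where the defaults
actually live (compose file or backend code) is not shown here, and
`DEFAULTS` and `resolve_setting` are hypothetical names.

```python
import os
from typing import Mapping, Optional

# Defaults as described above; their real location is an assumption.
DEFAULTS = {
    "CRAWLER_PROXY": "socks5h://tor:9050",
    "CRAWLER_TOR_CONTROL_URL": "tcp://tor:9051",
}


def resolve_setting(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Unset -> default; set but empty -> None (feature off); else the value."""
    value = env.get(name)
    if value is None:
        return DEFAULTS[name]
    # An explicit empty (or whitespace-only) override opts out.
    return value.strip() or None
```

The key design point is distinguishing "unset" from "set to empty":
only the latter disables the feature.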
The tor-data named volume is mounted read-only on the backend so it
can read /var/lib/tor/control_auth_cookie; CookieAuthFileGroupReadable
handles the permissions.
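
On the backend side, cookie auth amounts to reading the cookie file
and sending its bytes hex-encoded in AUTHENTICATE. A minimal Python
sketch, not the backend's actual code; `cookie_auth_line` and
`read_cookie` are hypothetical names.

```python
from pathlib import Path

# Path taken from the volume mount described above.
COOKIE_PATH = Path("/var/lib/tor/control_auth_cookie")


def cookie_auth_line(cookie: bytes) -> bytes:
    """Build the AUTHENTICATE command for cookie auth: the raw cookie, hex-encoded."""
    return b"AUTHENTICATE " + cookie.hex().encode("ascii") + b"\r\n"


def read_cookie(path: Path = COOKIE_PATH) -> bytes:
    """Read the raw cookie bytes; the caller needs read access to the file."""
    return path.read_bytes()
```

In use, `cookie_auth_line(read_cookie())` would be the first thing
written to the control connection.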
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>