Mangalord/tor/torrc
MechaCat02 e3cff9d874
fix(deploy): pivot tor service to password auth + wrapper entrypoint
Dockurr/tor's stock entrypoint binds the control port to localhost
(unreachable from a sibling container), refuses to run as a
non-default user (its setup chowns dirs and su-execs down to its
`tor` user, both requiring root), and skips its own
HashedControlPassword injection whenever the user's torrc declares
a ControlPort. The combination meant the original cookie-via-shared-
volume design couldn't work without fighting the image.

This commit:

- Adds tor/entrypoint.sh, a small wrapper that hashes $PASSWORD
  with `tor --hash-password`, appends the hash to a writable copy
  of /etc/tor/torrc, then execs tor. Container runs as root only
  for that bring-up; the torrc's `User tor` directive drops privs
  after port binding.
- Adds a healthcheck on the tor service that gates downstream
  containers on both 9050 + 9051 actually listening (was
  service_started, which fires before tor finishes bootstrap).
- Loosens MaxCircuitDirtiness 60 → 600. The 60s value would have
  rotated mid-chapter for any chapter with > ~50 images, which is
  exactly the kind of fingerprint we're trying to avoid.
- Wires TOR_CONTROL_PASSWORD as a REQUIRED .env var on both sides
  (PASSWORD on tor, CRAWLER_TOR_CONTROL_PASSWORD on backend).
  docker-compose.yml fails fast if unset.
- Removes the tor-data shared volume on backend (cookie auth is no
  longer the default; operators wanting cookie can mount it back).
- Documents the pivot + the cookie-vs-password tradeoff in
  .env.example.
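
For illustration, the wrapper described in the first bullet could look roughly like the sketch below. This is not the project's actual tor/entrypoint.sh, which is not shown here. The paths /etc/tor/torrc and /tmp/torrc come from this commit and the torrc comments; the function names `render_torrc` and `start_tor` are hypothetical, and taking the last line of `tor --hash-password` output is an assumption made to skip any warnings tor prints first.

```shell
#!/bin/sh
# Sketch of a wrapper entrypoint under the assumptions stated above.
set -eu

# Copy the read-only torrc to a writable path and append the hashed password.
# Arguments: <source torrc> <destination torrc> <hashed password>
render_torrc() {
    cp "$1" "$2"
    printf '\nHashedControlPassword %s\n' "$3" >> "$2"
}

# Hash $PASSWORD, render the runtime torrc, then replace this shell with tor.
start_tor() {
    : "${PASSWORD:?PASSWORD must be set}"
    # Assumption: the hash is the last line tor prints.
    hash="$(tor --hash-password "$PASSWORD" | tail -n 1)"
    render_torrc /etc/tor/torrc /tmp/torrc "$hash"
    exec tor -f /tmp/torrc
}
```

A real entrypoint would end with a call to `start_tor`. It is left out of the sketch so the functions can be sourced and exercised without a tor binary.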

End-to-end validated: `docker compose up -d tor`, then
`printf 'AUTHENTICATE "test"\r\nSIGNAL NEWNYM\r\nQUIT\r\n' | nc tor 9051`
returns three `250` replies (`250 OK` twice, then `250 closing connection`).
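
The manual check above can be wrapped into two small helpers, sketched here. The function names `control_request` and `reply_ok` are hypothetical, and the sketch assumes the password contains no `"` or `\` characters, which would need escaping inside the quoted AUTHENTICATE argument.

```shell
#!/bin/sh
# Sketch: build a control-port request and check the reply.

# Emit the three control-protocol commands, each terminated by CRLF.
control_request() {
    printf 'AUTHENTICATE "%s"\r\nSIGNAL NEWNYM\r\nQUIT\r\n' "$1"
}

# Succeed only if the reply holds exactly three final "250 ..." lines,
# one per command sent.
reply_ok() {
    [ "$(printf '%s\n' "$1" | grep -c '^250 ')" -eq 3 ]
}
```

Usage from a sibling container with `nc` available might be `reply="$(control_request "$PASSWORD" | nc tor 9051)"` followed by `reply_ok "$reply"`.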

Audit ref: #2, #3, #6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 20:25:54 +02:00


# torrc for the Mangalord crawler.
#
# Mounted into the dockurr/tor container at /etc/tor/torrc. The
# crawler talks to this daemon over the internal compose network only:
# `expose:` on the tor service surfaces 9050/9051 to sibling
# containers, never to the host.
# SOCKS5 proxy that reqwest and Chromium use. IsolateDestAddr +
# IsolateDestPort mean each new (destination address, port) pair draws
# its own circuit, so a SIGNAL NEWNYM takes effect promptly on the next
# navigation instead of having to wait for an existing dirty circuit
# to age out.
SOCKSPort 0.0.0.0:9050 IsolateDestAddr IsolateDestPort
# Control port for SIGNAL NEWNYM. Authentication is by password: our
# wrapper entrypoint (tor/entrypoint.sh) hashes the PASSWORD env var
# (see docker-compose.yml `tor.environment.PASSWORD`) with
# `tor --hash-password` and appends `HashedControlPassword <hash>` to
# the writable copy of this file that tor actually loads. We just need
# to declare the port itself here.
ControlPort 0.0.0.0:9051
# Let a circuit be reused for up to 10 minutes so a single chapter
# (which serial-fetches all its images through the same SOCKS endpoint)
# finishes on one circuit rather than rotating mid-chapter in a way that
# looks like anti-bot evasion to the target. NEWNYM still forces a fresh
# circuit immediately when we want one; this is just the idle-rotation knob.
MaxCircuitDirtiness 600
# Drop privileges to the image's `tor` user after binding ports.
# Required because /var/lib/tor (the image's DataDirectory volume)
# is owned by tor:tor and tor refuses to use a data dir it doesn't
# own. Our entrypoint runs as root only so it can call
# `tor --hash-password` and write /tmp/torrc.
User tor
# Data + logs.
DataDirectory /var/lib/tor
Log notice stdout
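
As a rough sanity check on the MaxCircuitDirtiness value above, the following sketch estimates how many serial image fetches fit into one circuit's reuse window. The 1200 ms per image figure is an assumption inferred from the commit's "60 s covers about 50 images" remark, not a measurement, and `images_per_window` is a hypothetical helper name.

```shell
#!/bin/sh
# Estimate only: images fetched serially within one circuit reuse window.
images_per_window() {
    dirtiness_s="$1"     # MaxCircuitDirtiness, in seconds
    ms_per_image="$2"    # assumed average fetch time per image, in milliseconds
    echo $(( dirtiness_s * 1000 / ms_per_image ))
}

images_per_window 60 1200     # old setting: prints 50
images_per_window 600 1200    # new setting: prints 500
```

Under that assumed pace, the 600 s window leaves about ten times the headroom of the old 60 s value. Slower targets shrink it proportionally.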