Four crawler defences in one PR (all four threats the review flagged
in §3 of REVIEW.md):
- New crawler::safety module with is_safe_url + accumulate_capped +
fetch_bytes_capped. Rejects non-http(s) schemes, RFC1918 / loopback
/ link-local / CGNAT / ULA / IPv6-link-local hosts, and any host
not on the operator's allowlist (defaults to CRAWLER_START_URL host
+ CRAWLER_CDN_HOST + CRAWLER_DOWNLOAD_ALLOWLIST extras).
- Streaming size cap (CRAWLER_MAX_IMAGE_BYTES, default 32 MiB) so a
10 GiB "image" can't fill memory before disk.
- looks_like_image() reject path: non-image bytes fail the chapter or
cover instead of being stored as .bin and served as
application/octet-stream.
- session::classify_chapter_probe: three-way classifier replaces the
binary #avatar_menu check at content.rs:115. A transient hiccup
(broken-page body, or logged-in-but-no-reader) now retries with
backoff instead of falsely freezing every worker on
session_expired.
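
For reviewers, a minimal sketch of the two crawler::safety checks described
above, using only std. This is not the code in this PR. The names
is_blocked_ip and is_safe_host and their signatures are hypothetical, and
this accumulate_capped is a synchronous stand-in for whatever streaming
reader the crawler actually uses. Scheme validation and DNS resolution of
hostnames are not covered here.

```rust
use std::io::{self, Read};
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// True when `ip` is in a range the crawler should never fetch from.
/// (Hypothetical helper; covers the ranges listed in the description.)
fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        // Unwrap IPv4-mapped addresses (::ffff:a.b.c.d) so they cannot
        // be used to sidestep the IPv4 checks.
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            Some(v4) => is_blocked_v4(v4),
            None => is_blocked_v6(v6),
        },
    }
}

fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    ip.is_private()                              // RFC 1918
        || ip.is_loopback()                      // 127.0.0.0/8
        || ip.is_link_local()                    // 169.254.0.0/16
        || (o[0] == 100 && (o[1] & 0xC0) == 64)  // CGNAT 100.64.0.0/10
}

fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    let first = ip.segments()[0];
    ip.is_loopback()                      // ::1
        || (first & 0xfe00) == 0xfc00     // ULA fc00::/7
        || (first & 0xffc0) == 0xfe80     // link-local fe80::/10
}

/// A host passes only if it is not a blocked IP literal and it appears
/// on the allowlist (assumed here to be a plain list of host names).
fn is_safe_host(host: &str, allowlist: &[&str]) -> bool {
    let bare = host.trim_start_matches('[').trim_end_matches(']');
    if let Ok(ip) = bare.parse::<IpAddr>() {
        if is_blocked_ip(ip) {
            return false;
        }
    }
    allowlist.iter().any(|a| a.eq_ignore_ascii_case(host))
}

/// Reads at most `max` bytes from `reader`. Pulling `max + 1` bytes is
/// enough to detect an oversized body without buffering all of it.
fn accumulate_capped<R: Read>(reader: R, max: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    reader.take(max.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > max {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "body exceeds size cap"));
    }
    Ok(buf)
}
```

In this sketch an allowlisted entry never overrides the IP check, so
is_safe_host("127.0.0.1", &["127.0.0.1"]) is still false.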

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>