feat: crawler manga-list & metadata sync with cover download (0.23.0)

- TargetSource: first concrete impl of the Source trait, modeled on
  the old Puppeteer crawler's selectors (+ status normalization,
  tag-count stripping, chapter list)
- DiscoverMode::Backfill walks pagination last->1, reverse within each
  page (oldest-first); Incremental walks forward
- RateLimiter (tokio-time aware) plumbed through FetchContext so the
  pagination walk honors the same per-host budget as the outer loop
- repo::crawler: ensure_source, upsert_manga_from_source (returns
  New/Updated/Unchanged + current cover_image_path for backfill
  decisions), sync_manga_chapters, mark_dropped_mangas; all
  transactional, with case-insensitive lookups and source-insertable
  genres
- Cover image download via reqwest+infer; stored under
  mangas/{id}/cover.{ext} via the Storage trait
- Single CRAWLER_PROXY env wires both Chromium (--proxy-server) and
  reqwest::Proxy::all (HTTP/HTTPS/SOCKS5)
- Crawler binary: positional start URL or $CRAWLER_START_URL,
  $CRAWLER_LIMIT (cap fetches + skip drop pass on partial runs),
  $CRAWLER_SKIP_CHAPTERS (disable selector AND sync), $CRAWLER_RATE_MS
- Silences chromiumoxide 0.7's known CDP deserialize log spam via
  default tracing filter + CdpError::Serde downgrade
- 9 sqlx integration tests + 11 selector/rate-limit unit tests
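
The Backfill/Incremental ordering described above can be sketched with plain vectors. This is a minimal illustration, not the crawler's code: `visit_order` is a hypothetical helper, and it assumes `pages[0]` is page 1 with each page listed newest-first.

```rust
/// Discovery order, named after the two modes in the commit message.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DiscoverMode {
    Backfill,
    Incremental,
}

/// Returns listing entries in the order they would be visited.
///
/// Assumption: `pages[0]` is page 1 and every page is newest-first,
/// so the oldest entry is the last item of the last page.
pub fn visit_order<T: Clone>(pages: &[Vec<T>], mode: DiscoverMode) -> Vec<T> {
    match mode {
        // Forward walk: page 1 to the last page, top to bottom.
        DiscoverMode::Incremental => pages.iter().flatten().cloned().collect(),
        // Backward walk: last page to page 1, bottom to top within each
        // page, which yields oldest-first overall.
        DiscoverMode::Backfill => pages
            .iter()
            .rev()
            .flat_map(|page| page.iter().rev())
            .cloned()
            .collect(),
    }
}
```

Walking oldest-first during a backfill means rows are inserted in roughly the order the source published them, which is presumably why the reversal happens both across pages and within each page.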
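
The shared per-host budget can be pictured as a minimum-interval limiter that both the outer loop and the pagination walk reserve slots from. The sketch below is std-only and takes the current time as an argument instead of using tokio's clock, so it is a stand-in for the idea rather than the `RateLimiter` in this commit; the `reserve` method is a hypothetical name.

```rust
use std::time::{Duration, Instant};

/// Minimum-interval limiter: at most one request per `interval`.
/// Stand-in sketch; the commit's RateLimiter is tokio-time aware.
pub struct RateLimiter {
    interval: Duration,
    next_allowed: Option<Instant>,
}

impl RateLimiter {
    pub fn new(interval: Duration) -> Self {
        Self { interval, next_allowed: None }
    }

    /// Reserves the next free slot and returns how long the caller
    /// should wait before sending its request.
    pub fn reserve(&mut self, now: Instant) -> Duration {
        // The slot starts either immediately or when the budget frees up.
        let start = match self.next_allowed {
            Some(at) if at > now => at,
            _ => now,
        };
        self.next_allowed = Some(start + self.interval);
        start.duration_since(now)
    }
}
```

Because every caller reserves from the same instance, a pagination fetch pushes back the next detail-page fetch and vice versa, which is the behavior the bullet describes. An async caller would sleep for the returned duration before issuing the request.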
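
One way the binary's inputs could be resolved is shown below. The environment variable names come from the commit message; everything else (`CrawlerConfig`, `load_config`, `run_drop_pass`, and the truthiness rule for `CRAWLER_SKIP_CHAPTERS`) is hypothetical. The environment is injected as a closure so the logic can be exercised without touching the process environment.

```rust
use std::time::Duration;

/// Hypothetical container for the crawler binary's inputs.
#[derive(Debug, PartialEq, Eq)]
pub struct CrawlerConfig {
    pub start_url: String,
    pub limit: Option<usize>,
    pub skip_chapters: bool,
    pub rate: Option<Duration>,
    pub proxy: Option<String>,
}

/// Resolves the config from an optional positional argument and an
/// environment lookup. The positional start URL wins over the variable.
pub fn load_config(
    positional: Option<String>,
    env: impl Fn(&str) -> Option<String>,
) -> Result<CrawlerConfig, String> {
    let start_url = positional
        .or_else(|| env("CRAWLER_START_URL"))
        .ok_or_else(|| "pass a start URL or set CRAWLER_START_URL".to_string())?;
    let limit = env("CRAWLER_LIMIT")
        .map(|v| v.parse::<usize>().map_err(|e| format!("CRAWLER_LIMIT: {e}")))
        .transpose()?;
    let rate = env("CRAWLER_RATE_MS")
        .map(|v| v.parse::<u64>().map_err(|e| format!("CRAWLER_RATE_MS: {e}")))
        .transpose()?
        .map(Duration::from_millis);
    // Assumption: any value other than "", "0" or "false" enables the flag.
    let skip_chapters = env("CRAWLER_SKIP_CHAPTERS")
        .map(|v| !matches!(v.as_str(), "" | "0" | "false"))
        .unwrap_or(false);
    Ok(CrawlerConfig {
        start_url,
        limit,
        skip_chapters,
        rate,
        proxy: env("CRAWLER_PROXY"),
    })
}

impl CrawlerConfig {
    /// A capped run sees only part of the catalogue, so a manga missing
    /// from it is not evidence that the source dropped it.
    pub fn run_drop_pass(&self) -> bool {
        self.limit.is_none()
    }
}
```

In a real binary the call would look like `load_config(std::env::args().nth(1), |key| std::env::var(key).ok())`.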
@@ -18,6 +18,7 @@ use std::path::PathBuf;
 
 use anyhow::Context;
 use chromiumoxide::browser::{Browser, BrowserConfig};
+use chromiumoxide::error::CdpError;
 use chromiumoxide::fetcher::{BrowserFetcher, BrowserFetcherOptions};
 use futures_util::StreamExt;
 use tokio::task::JoinHandle;
@@ -169,8 +170,16 @@ pub async fn launch(options: LaunchOptions) -> anyhow::Result<Handle> {
 
     let driver = tokio::spawn(async move {
         while let Some(event) = handler.next().await {
-            if let Err(err) = event {
-                tracing::warn!(?err, "chromium handler event error");
+            match event {
+                Ok(_) => {}
+                // chromiumoxide 0.7 ships fixed CDP type bindings, so any
+                // CDP event Chrome added later fails to deserialize. The
+                // connection is unaffected; these are noise. Suppress
+                // them so real failures stay visible.
+                Err(CdpError::Serde(_)) => {
+                    tracing::trace!("chromium emitted an unrecognized CDP event");
+                }
+                Err(err) => tracing::warn!(?err, "chromium handler event error"),
             }
         }
     });