fix(crawler): close walker race against site reordering (0.35.1)

The target site orders by update_date DESC, and any new or updated
manga pushes everyone down by one slot. The paginated walker was
blind to this drift:

* Backfill (page last -> 1): shifts push items into pages already
  finished. The displaced manga was silently missed; with
  mark_dropped_mangas running on a fully-completed walk, items even
  got false-dropped because last_seen_at was stale.
* Incremental (page 1 -> last): a shift causes the slot-last item
  of an already-read page to reappear on the next page, leading to
  a redundant fetch_manga and an inflated consecutive_unchanged
  streak.

Fix is two-pronged:

1. Backfill boundary re-check. After fetching each page P, re-fetch
   the previously-walked page P+1 and check where its old slot-0
   key now sits. If it slid to slot K, the first K entries are
   items that used to live on P and slid past us; they get appended
   to the batch. If the anchor is gone entirely (multi-page shift
   or it was bumped to page 1), the whole re-fetched page is
   processed conservatively and the pipeline dedup absorbs the
   noise. The re-check must be the *last* navigation of the
   iteration to close the within-iteration race.

2. Run-scoped dedup in run_metadata_pass. A HashSet<String> of
   source_manga_keys avoids double-processing. The set uses a
   contains-then-insert pattern with insert firing *after* a
   successful upsert, so a transient fetch/upsert failure leaves
   the key retryable if it reappears later in the same pass (via
   the boundary re-check or another batch).

Incremental mode does not run the re-check (shifts move in the
same direction as the walk); only the dedup helps it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
MechaCat02
2026-05-29 20:14:01 +02:00
parent f57ca8e45c
commit dea9b1aaa8
5 changed files with 294 additions and 4 deletions

2
backend/Cargo.lock generated
View File

@@ -1470,7 +1470,7 @@ checksum = "c41e0c4fef86961ac6d6f8a82609f55f31b05e4fce149ac5710e439df7619ba4"
[[package]]
name = "mangalord"
version = "0.35.0"
version = "0.35.1"
dependencies = [
"anyhow",
"argon2",