Compare commits

3 Commits

Author SHA1 Message Date
MechaCat02
0a3877be51 chore: full hop-by-hop header strip and 60s timeout on /api/* proxy
The SvelteKit proxy was only stripping host + content-length; the rest
of RFC 7230 §6.1 (connection, keep-alive, proxy-authenticate,
proxy-authorization, te, trailer, transfer-encoding, upgrade) leaked
through to axum. Axum doesn't emit them so the impact is theoretical,
but the proxy should be RFC-conformant. Also adds an AbortController
with a configurable 60s timeout (BACKEND_PROXY_TIMEOUT_MS) so a
wedged backend can't hang the browser request indefinitely — failures
surface as the standard 502 upstream_unavailable envelope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 20:07:39 +02:00
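A minimal sketch of the header stripping described in 0a3877be51, for illustration only. It is not the code from the commit. `stripHopByHop` and `FIXED_HOP_BY_HOP` are hypothetical names. In addition to the fixed list, RFC 7230 §6.1 requires a proxy to remove any header that the incoming `Connection` value nominates, so the sketch handles that case too, assuming the global `Headers` class available in Node 18+.

```typescript
// Fixed list from the commit message (hypothetical constant name).
const FIXED_HOP_BY_HOP = [
  'connection',
  'keep-alive',
  'proxy-authenticate',
  'proxy-authorization',
  'te',
  'trailer',
  'transfer-encoding',
  'upgrade'
];

// Valid HTTP header-name token, checked after lowercasing.
// Headers.delete() throws a TypeError on invalid names, so skip those.
const TOKEN = /^[!#$%&'*+\-.^_`|~0-9a-z]+$/;

// Hypothetical helper: returns a copy of `incoming` without hop-by-hop
// headers, including any header nominated by the Connection value.
function stripHopByHop(incoming: Headers): Headers {
  const out = new Headers(incoming);
  const nominated = (incoming.get('connection') ?? '')
    .split(',')
    .map((option) => option.trim().toLowerCase())
    .filter((option) => TOKEN.test(option));
  for (const name of [...nominated, ...FIXED_HOP_BY_HOP]) {
    out.delete(name);
  }
  return out;
}
```

With `Connection: keep-alive, x-trace` on the incoming request, this drops `x-trace` as well as the fixed entries, while unrelated headers such as `x-custom` pass through.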
MechaCat02
e7662d18d6 feat: gitea actions for build, push, and ssh deploy (0.34.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 06:56:13 +02:00
MechaCat02
45ce0d8f12 feat: incremental crawl mode with seed-completion gate (0.33.0)
Daemon now auto-detects mode per source: Backfill until the first
full walk records `seed_completed:<source>` in `crawler_state`, then
Incremental (newest-first, stops after N consecutive Unchanged
upserts). `CRAWLER_MODE` overrides to a fixed mode; CLI rejects
`auto` since it has no pre-run DB state.

`Source::discover` returns a lazy `DiscoverWalk` so Incremental can
break out mid-walk without prefetching pages. The drop pass and seed
marker are now gated on a true full walk — fixes a latent soft-drop
of the index tail under partial sweeps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 06:41:26 +02:00
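The Incremental stop rule from 45ce0d8f12 can be sketched as follows. The crawler itself is Rust; this TypeScript version is only an illustration of the logic in the commit message. `walkIncremental`, `WalkOutcome`, and the `Inserted`/`Updated` variants are hypothetical, and only `Unchanged` is named in the source.

```typescript
// Only 'Unchanged' comes from the commit message; the other variants are assumed.
type UpsertResult = 'Inserted' | 'Updated' | 'Unchanged';

interface WalkOutcome {
  processed: number;
  // True only when every page of the walk was consumed. Per the commit
  // message, the drop pass and seed marker should run only in that case.
  fullWalk: boolean;
}

// Consume a lazy, newest-first walk and stop after `stopAfter`
// consecutive Unchanged upserts. Returning from inside for...of closes
// the iterator, so later pages are never requested.
function walkIncremental(
  walk: Iterable<string[]>,
  upsert: (id: string) => UpsertResult,
  stopAfter: number
): WalkOutcome {
  let streak = 0;
  let processed = 0;
  for (const page of walk) {
    for (const id of page) {
      processed += 1;
      streak = upsert(id) === 'Unchanged' ? streak + 1 : 0;
      if (streak >= stopAfter) {
        return { processed, fullWalk: false };
      }
    }
  }
  return { processed, fullWalk: true };
}
```

Because the walk is an iterable rather than a prefetched array, an early return costs no extra page fetches, and `fullWalk: false` signals that a partial sweep must not soft-drop the unvisited index tail.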
9 changed files with 376 additions and 12 deletions

View File

@@ -51,3 +51,8 @@ MAX_FILE_BYTES=20971520
# internal docker network. Override only if you're running the
# frontend container against a backend somewhere else.
BACKEND_URL=http://backend:8080
# Per-request wall-clock cap for the /api/* reverse proxy (milliseconds).
# Default 300000 (5 min) covers a typical 200 MiB chapter upload over
# 25 Mbps; raise for users on slower upstream links or lower if a
# tighter front proxy already bounds the request lifetime.
BACKEND_PROXY_TIMEOUT_MS=300000

71
.gitea/README.md Normal file

@@ -0,0 +1,71 @@
# Gitea Actions
The [`deploy`](workflows/deploy.yml) workflow runs on every push to `main`
(and via manual `workflow_dispatch`). It tests, builds, pushes the images
to a private registry, and rolls the stack over by SSH on the target host.
## Required secrets
Set under *Repo Settings → Actions → Secrets*:
| Name | Example | Purpose |
| -------------------- | ------------------------ | ---------------------------------------------------------------- |
| `REGISTRY_URL` | `registry.example.com` | Registry host. No scheme, no trailing slash. |
| `REGISTRY_USERNAME` | `mangalord-ci` | `docker login` user. |
| `REGISTRY_PASSWORD` | `<token>` | `docker login` token/password. |
| `SSH_HOST` | `mangalord.example.com` | Deploy target hostname/IP. |
| `SSH_USER` | `deploy` | SSH user on the target (must be in the `docker` group). |
| `SSH_PRIVATE_KEY` | `-----BEGIN OPENSSH...` | Private key authorised in the target user's `authorized_keys`. |
| `SSH_PORT` | `22` | Optional. Defaults to `22` if unset. |
## Required variables
Set under *Repo Settings → Actions → Variables* (not secrets — they appear
in logs):
| Name | Example | Purpose |
| ------------- | ------------------------ | ---------------------------------------------------------------------- |
| `DEPLOY_PATH` | `/srv/mangalord` | Directory on target holding `docker-compose.yml`, `.env`, and the prod overlay. |
## One-time host setup
The workflow assumes the deploy target already has:
1. Docker + Docker Compose v2 installed and the `SSH_USER` in the `docker` group.
2. `$DEPLOY_PATH/docker-compose.yml` (copy of the repo's [docker-compose.yml](../docker-compose.yml)).
3. `$DEPLOY_PATH/docker-compose.prod.yml` (copy of the repo's [docker-compose.prod.yml](../docker-compose.prod.yml)).
4. `$DEPLOY_PATH/.env` populated from [.env.example](../.env.example) with production values (real `POSTGRES_PASSWORD`, `COOKIE_SECURE=true`, etc.).
Bootstrap once:
```bash
ssh deploy@mangalord.example.com
sudo mkdir -p /srv/mangalord && sudo chown deploy:deploy /srv/mangalord
cd /srv/mangalord
# place docker-compose.yml, docker-compose.prod.yml, and .env here
```
The first workflow run will pull the images, bring the stack up, and run
the embedded migrations on startup.
## Image tags
Every push produces three tags per image:
- `mangalord-{backend,frontend}:latest`
- `mangalord-{backend,frontend}:<git-sha>` — used by the deploy job; lets
you pin a deploy to a specific commit
- `mangalord-{backend,frontend}:<version>` — the version from
[backend/Cargo.toml](../backend/Cargo.toml) (verified in lockstep with
[frontend/package.json](../frontend/package.json))
## Rollback
SSH to the target, set `IMAGE_TAG` to a previous commit SHA, and re-up:
```bash
cd /srv/mangalord
export REGISTRY_URL=registry.example.com
export IMAGE_TAG=<previous-sha>
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
```

144
.gitea/workflows/deploy.yml Normal file

@@ -0,0 +1,144 @@
name: deploy
on:
push:
branches: [main]
workflow_dispatch:
jobs:
test-backend:
runs-on: ubuntu-latest
container:
image: rust:1-slim
services:
postgres:
image: postgres:16-alpine
env:
POSTGRES_USER: mangalord
POSTGRES_PASSWORD: mangalord
POSTGRES_DB: mangalord
options: >-
--health-cmd "pg_isready -U mangalord"
--health-interval 5s
--health-timeout 5s
--health-retries 10
env:
DATABASE_URL: postgres://mangalord:mangalord@postgres:5432/mangalord
steps:
- uses: actions/checkout@v4
- name: Install build deps
run: |
apt-get update
apt-get install -y --no-install-recommends pkg-config libssl-dev ca-certificates
- name: Cache cargo registry and target
uses: actions/cache@v4
with:
path: |
~/.cargo/registry
~/.cargo/git
backend/target
key: cargo-${{ runner.os }}-${{ hashFiles('backend/Cargo.lock') }}
restore-keys: |
cargo-${{ runner.os }}-
- name: cargo test
working-directory: backend
run: cargo test --locked
test-frontend:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: '22'
cache: npm
cache-dependency-path: frontend/package-lock.json
- name: npm ci
working-directory: frontend
run: npm ci
- name: vitest
working-directory: frontend
run: npm test
build-and-push:
runs-on: ubuntu-latest
needs: [test-backend, test-frontend]
outputs:
image_tag: ${{ steps.meta.outputs.image_tag }}
version: ${{ steps.meta.outputs.version }}
steps:
- uses: actions/checkout@v4
- name: Resolve image tags
id: meta
run: |
version="$(grep -m1 '^version' backend/Cargo.toml | cut -d'"' -f2)"
frontend_version="$(grep -m1 '"version"' frontend/package.json | cut -d'"' -f4)"
if [ "$version" != "$frontend_version" ]; then
echo "Version mismatch: backend=$version frontend=$frontend_version" >&2
exit 1
fi
echo "image_tag=${GITHUB_SHA}" >> "$GITHUB_OUTPUT"
echo "version=${version}" >> "$GITHUB_OUTPUT"
- uses: docker/setup-buildx-action@v3
- name: docker login
uses: docker/login-action@v3
with:
registry: ${{ secrets.REGISTRY_URL }}
username: ${{ secrets.REGISTRY_USERNAME }}
password: ${{ secrets.REGISTRY_PASSWORD }}
- name: Build & push backend
uses: docker/build-push-action@v5
with:
context: ./backend
push: true
tags: |
${{ secrets.REGISTRY_URL }}/mangalord-backend:latest
${{ secrets.REGISTRY_URL }}/mangalord-backend:${{ steps.meta.outputs.image_tag }}
${{ secrets.REGISTRY_URL }}/mangalord-backend:${{ steps.meta.outputs.version }}
cache-from: type=gha,scope=backend
cache-to: type=gha,mode=max,scope=backend
- name: Build & push frontend
uses: docker/build-push-action@v5
with:
context: ./frontend
push: true
tags: |
${{ secrets.REGISTRY_URL }}/mangalord-frontend:latest
${{ secrets.REGISTRY_URL }}/mangalord-frontend:${{ steps.meta.outputs.image_tag }}
${{ secrets.REGISTRY_URL }}/mangalord-frontend:${{ steps.meta.outputs.version }}
cache-from: type=gha,scope=frontend
cache-to: type=gha,mode=max,scope=frontend
deploy:
runs-on: ubuntu-latest
needs: build-and-push
steps:
- name: SSH deploy
uses: appleboy/ssh-action@v1.0.3
with:
host: ${{ secrets.SSH_HOST }}
username: ${{ secrets.SSH_USER }}
key: ${{ secrets.SSH_PRIVATE_KEY }}
port: ${{ secrets.SSH_PORT || 22 }}
envs: REGISTRY_URL,REGISTRY_USERNAME,REGISTRY_PASSWORD,IMAGE_TAG,DEPLOY_PATH
script_stop: true
script: |
set -euo pipefail
cd "$DEPLOY_PATH"
echo "$REGISTRY_PASSWORD" | docker login "$REGISTRY_URL" -u "$REGISTRY_USERNAME" --password-stdin
export REGISTRY_URL IMAGE_TAG
docker compose -f docker-compose.yml -f docker-compose.prod.yml pull
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
docker image prune -f
docker logout "$REGISTRY_URL"
env:
REGISTRY_URL: ${{ secrets.REGISTRY_URL }}
REGISTRY_USERNAME: ${{ secrets.REGISTRY_USERNAME }}
REGISTRY_PASSWORD: ${{ secrets.REGISTRY_PASSWORD }}
IMAGE_TAG: ${{ needs.build-and-push.outputs.image_tag }}
DEPLOY_PATH: ${{ vars.DEPLOY_PATH }}

2
backend/Cargo.lock generated

@@ -1470,7 +1470,7 @@ checksum = "c41e0c4fef86961ac6d6f8a82609f55f31b05e4fce149ac5710e439df7619ba4"
[[package]]
name = "mangalord"
-version = "0.33.0"
+version = "0.34.0"
dependencies = [
"anyhow",
"argon2",

backend/Cargo.toml

@@ -1,6 +1,6 @@
[package]
name = "mangalord"
-version = "0.33.0"
+version = "0.34.0"
edition = "2021"
default-run = "mangalord"

22
docker-compose.prod.yml Normal file

@@ -0,0 +1,22 @@
# Production overlay: layer on top of docker-compose.yml on the deploy
# host so the backend and frontend run from pre-built registry images
# instead of building locally.
#
# docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
#
# REGISTRY_URL and IMAGE_TAG are injected by .gitea/workflows/deploy.yml
# at deploy time. IMAGE_TAG defaults to `latest` so a manual
# `docker compose ... up -d` on the host still works.
services:
backend:
build: !reset null
image: ${REGISTRY_URL}/mangalord-backend:${IMAGE_TAG:-latest}
pull_policy: always
restart: unless-stopped
frontend:
build: !reset null
image: ${REGISTRY_URL}/mangalord-frontend:${IMAGE_TAG:-latest}
pull_policy: always
restart: unless-stopped

frontend/package.json

@@ -1,6 +1,6 @@
{
"name": "mangalord-frontend",
-"version": "0.33.0",
+"version": "0.34.0",
"private": true,
"type": "module",
"scripts": {

View File

@@ -118,4 +118,77 @@ describe('hooks.server proxy', () => {
expect(body.error.code).toBe('upstream_unavailable');
expect(errSpy).toHaveBeenCalled();
});
it('strips every hop-by-hop header listed in RFC 7230 §6.1', async () => {
// Defence in depth: axum doesn't emit these, but a future
// middleware that did would otherwise leak per-connection
// state across the proxy boundary.
fetchSpy.mockResolvedValueOnce(new Response('[]', { status: 200 }));
const resolve = vi.fn();
await handle({
event: makeEvent('/api/v1/health', {
headers: {
host: 'app.example.com',
'content-length': '0',
connection: 'keep-alive',
'keep-alive': 'timeout=5',
'proxy-authenticate': 'Basic realm=x',
'proxy-authorization': 'Basic xyz',
te: 'trailers',
trailer: 'Expires',
'transfer-encoding': 'chunked',
upgrade: 'websocket',
// A non-hop-by-hop header to ensure non-targets
// aren't accidentally stripped.
'x-custom': 'pass-through'
}
}),
resolve
});
const init = fetchSpy.mock.calls[0][1] as RequestInit;
const headers = init.headers as Headers;
for (const h of [
'host',
'content-length',
'connection',
'keep-alive',
'proxy-authenticate',
'proxy-authorization',
'te',
'trailer',
'transfer-encoding',
'upgrade'
]) {
expect(headers.get(h), `${h} should be stripped`).toBeNull();
}
expect(headers.get('x-custom')).toBe('pass-through');
});
it('aborts and returns 502 when the upstream stalls past the timeout', async () => {
const errSpy = vi.spyOn(console, 'error').mockImplementation(() => {});
// Simulate an aborted fetch (AbortController.abort() raises a
// DOMException with name 'AbortError' on Node's fetch). The
// handler should treat it as the same upstream_unavailable
// 502 it uses for any other network failure.
const abortErr = new DOMException('aborted', 'AbortError');
fetchSpy.mockRejectedValueOnce(abortErr);
const resolve = vi.fn();
const resp = await handle({ event: makeEvent('/api/v1/slow'), resolve });
expect(resp.status).toBe(502);
const body = await resp.json();
expect(body.error.code).toBe('upstream_unavailable');
expect(errSpy).toHaveBeenCalled();
});
it('attaches an AbortSignal to the upstream fetch so it can time out', async () => {
fetchSpy.mockResolvedValueOnce(new Response('[]', { status: 200 }));
const resolve = vi.fn();
await handle({ event: makeEvent('/api/v1/health'), resolve });
const init = fetchSpy.mock.calls[0][1] as RequestInit;
expect(init.signal).toBeInstanceOf(AbortSignal);
// The signal hasn't fired (handler returned in time), but its
// presence is the contract this test is pinning.
expect(init.signal?.aborted).toBe(false);
});
});

View File

@@ -12,20 +12,66 @@ import type { Handle } from '@sveltejs/kit';
const BACKEND_URL = process.env.BACKEND_URL ?? 'http://localhost:8080';
/**
* Hop-by-hop headers per RFC 7230 §6.1. These are scoped to a single
* transport-level connection and must not be forwarded by a proxy.
* Plus `host` and `content-length`: `host` would mislead the backend
* about its origin, and `content-length` is recomputed by the upstream
* fetch from the body stream.
*/
const HOP_BY_HOP_HEADERS = [
'host',
'content-length',
'connection',
'keep-alive',
'proxy-authenticate',
'proxy-authorization',
'te',
'trailer',
'transfer-encoding',
'upgrade'
];
/**
* Cap each proxied request at 5 minutes. The bound exists to surface
* a wedged backend (stuck on a slow DB query, deadlocked, etc.) as a
* 502 rather than letting the browser request hang indefinitely.
*
* The default leans toward the slow-upload end of the spectrum: at a
* 1 Mbps upstream, a 200 MiB chapter upload (the default
* `MAX_REQUEST_BYTES` cap) needs ~28 minutes; 300 s covers the more
* realistic 25 Mbps urban-broadband case (~67 s for the same upload)
* with comfortable headroom. Operators serving very slow clients
* should raise `BACKEND_PROXY_TIMEOUT_MS`; operators behind a
* tighter upstream proxy may want to lower it. A future improvement
* is an idle-based timeout (reset per chunk) instead of this
* wall-clock budget — that's a fair bit more code, deferred.
*/
const PROXY_TIMEOUT_MS = (() => {
const raw = process.env.BACKEND_PROXY_TIMEOUT_MS;
const n = raw ? Number(raw) : 300_000;
return Number.isFinite(n) && n > 0 ? n : 300_000;
})();
export const handle: Handle = async ({ event, resolve }) => {
if (event.url.pathname.startsWith('/api/')) {
const target = `${BACKEND_URL}${event.url.pathname}${event.url.search}`;
// Strip hop-by-hop headers — `host` would mislead the backend
// about the origin, and `content-length` will be recomputed.
const headers = new Headers(event.request.headers);
-headers.delete('host');
-headers.delete('content-length');
+for (const h of HOP_BY_HOP_HEADERS) headers.delete(h);
// AbortController times the upstream fetch out so a backend
// wedged on a slow DB query doesn't keep the browser request
// hanging forever. The `signal` is also wired into the
// RequestInit so the body stream is cancelled cleanly.
const ctrl = new AbortController();
const timeoutHandle = setTimeout(() => ctrl.abort(), PROXY_TIMEOUT_MS);
const init: RequestInit & { duplex?: 'half' } = {
method: event.request.method,
headers,
-redirect: 'manual'
+redirect: 'manual',
+signal: ctrl.signal
};
if (event.request.method !== 'GET' && event.request.method !== 'HEAD') {
init.body = event.request.body;
@@ -39,11 +85,13 @@ export const handle: Handle = async ({ event, resolve }) => {
upstream = await fetch(target, init);
} catch (e) {
// Network-layer failure (DNS / connection refused / TLS
-// handshake) — most commonly "backend container restarting".
-// SvelteKit's default 500 would be an HTML page that
-// client.ts can't .json(), which masks the real cause. Emit
-// the standard envelope with a dedicated code instead.
+// handshake / abort by timeout) — most commonly "backend
+// container restarting". SvelteKit's default 500 would be
+// an HTML page that client.ts can't .json(), which masks
+// the real cause. Emit the standard envelope with a
+// dedicated code instead.
console.error('Proxy to backend failed:', e);
clearTimeout(timeoutHandle);
return new Response(
JSON.stringify({
error: {
@@ -58,6 +106,7 @@ export const handle: Handle = async ({ event, resolve }) => {
);
}
clearTimeout(timeoutHandle);
return new Response(upstream.body, {
status: upstream.status,
statusText: upstream.statusText,