docs(blueprint): add §11.6 Phase 3.5 users, roles, and bearer-token auth

Specifies the unified can(principal, capability) gate, instance roles
(owner/admin/member), per-app memberships (app_admin/editor/viewer),
pic_-prefixed API keys, and the schema rooms for invites / MFA /
service accounts. Updates §12 Phase 3 to add 3c as a third foundation
piece alongside 3a (admin auth) and 3b (multi-app scoping).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-26 21:31:25 +02:00
parent a393f11344
commit 5546323cdc


@@ -1022,6 +1022,152 @@ The scripts and routes endpoints keep their existing shape — this avoids forci
---
## 11.6 Users, roles, and bearer-token auth (Phase 3.5) — Pending
**Status**: pending. Targets `crates/manager-core/src/{authz,api_keys_api,api_key_repo}.rs`, an extended `auth_middleware.rs`, new shared types under `crates/shared/src/auth.rs`, migration `0006_users_authz.sql`.
**Purpose**: bridge Phase 3b → Phase 4. Phase 4's v1.1 SDKs (KV, docs, HTTP, cron) each gate access on the calling principal. Without a real authorization model in place, every SDK addition must either invent its own gate or stay open. Phase 3.5 lands `can(principal, capability)` as the single check that every future SDK and admin endpoint goes through, so v1.1 work can focus on data-plane shape rather than re-litigating auth.
**Why this slot**: same logic as Phase 3b. Adding a `Principal` parameter and a capability check to surfaces that don't exist yet is free; retrofitting them onto live SDK services after v1.1 ships is a refactor of every gate.
### Principal Model
One `Principal` value represents a human admin user. Service accounts (CI bots, Rhai scripts calling out) get **schema room** in this phase but no runtime support; `users.kind`-style differentiation lands when Phase 4's `users.*` SDK arrives. Until then, every authenticated request resolves to exactly one admin row, whether the credential is a session cookie or a bearer API key.
```rust
pub struct Principal {
    pub user_id: UserId,             // alias of AdminUserId for the transition
    pub instance_role: InstanceRole,
    pub scopes: Option<Vec<Scope>>,  // None = cookie session (full role authority)
                                     // Some = API key (intersect with role)
    pub app_binding: Option<AppId>,  // API key bound to one app; denies other apps
}
```
### Instance Roles (one per user)
| Role | Powers |
|---|---|
| `owner` | full instance control, manage other owners, implicit `app_admin` on every app. Multiple owners allowed. |
| `admin` | create apps, invite users, implicit `editor` on every app. Cannot manage instance-wide settings or other owners. |
| `member` | invited into specific apps only. Cannot create apps, cannot invite. **Strict isolation enforced at SQL** — list endpoints `WHERE app_id IN (SELECT app_id FROM app_members WHERE user_id = $1)`; the API never returns apps a member isn't part of. |
The current Phase 3a `admin_users` rows all become `owner` via `DEFAULT 'owner'` on the new column. Multi-owner installs get a startup `tracing::warn!` listing the active owner usernames so the operator can demote extras via `PATCH /api/v1/admin/admins/{id}`.
### App-Scoped Roles (zero-to-many per user × app)
| Role | Grants |
|---|---|
| `app_admin` | settings, domain claims, delete app |
| `editor` | CRUD on scripts, routes, sandbox config |
| `viewer` | read scripts + execution logs |
Implicit grants from instance role: every `owner` is `app_admin` on every app; every `admin` is `editor` on every app. Explicit `app_members` rows are the only path for `member` users.
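The implicit-grant rule can be sketched as a small pure function. This is a minimal illustration, not the planned `authz.rs` code: the names `AppRole` and `effective_app_role` are hypothetical, and it assumes that the three app roles form a strict hierarchy (`viewer` < `editor` < `app_admin`) and that an explicit `app_members` row may raise, but never lower, the implicit grant. Neither assumption is stated in the text above.

```rust
/// Instance-wide role; one per user.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum InstanceRole {
    Owner,
    Admin,
    Member,
}

/// Per-app role. Variants are declared weakest-first so the derived `Ord`
/// ranks `Viewer < Editor < AppAdmin` (hierarchy is an assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum AppRole {
    Viewer,
    Editor,
    AppAdmin,
}

/// Combine the implicit grant from the instance role with an optional
/// explicit `app_members` row and keep the stronger of the two.
pub fn effective_app_role(instance: InstanceRole, explicit: Option<AppRole>) -> Option<AppRole> {
    let implicit = match instance {
        InstanceRole::Owner => Some(AppRole::AppAdmin),
        InstanceRole::Admin => Some(AppRole::Editor),
        // Members have no implicit access; only explicit rows count.
        InstanceRole::Member => None,
    };
    // `Option<T: Ord>` sorts `None` below every `Some(_)`.
    implicit.max(explicit)
}
```

Under these assumptions a `member` with no row resolves to `None`, which is the case the SQL-level list filter is meant to make unobservable.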
### Auth Methods — Same Principal, Different Extractor
Two credential types feed the same middleware:
1. **Session cookie** (Phase 3a, unchanged) — `picloud_session=<token>`. Extracted by header or cookie. SHA-256 lookup against `admin_sessions.token_hash`. Sliding 24h TTL. Produces `Principal { scopes: None, app_binding: None }`.
2. **Bearer API key** (new) — `Authorization: Bearer pic_<base32(32 random bytes)>`. The `pic_` prefix is the discriminator: present → API key path; absent → session path. The 8 chars immediately after `pic_` are indexed (`api_keys.prefix`); the full body after `pic_` is Argon2id-verified against each candidate's `hash`. Last-used timestamp updates inline.
Both paths converge on the same `Principal` extension; handlers cannot tell which credential was presented unless they introspect `principal.scopes`.
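A rough sketch of the `pic_` discriminator described above, using only the standard library. The type `Credential` and the function `classify` are hypothetical names, and the input is assumed to be the raw token after any `Bearer ` prefix or cookie name has been stripped. Hash verification and the database lookup are deliberately left out.

```rust
/// Which lookup path a presented token should take.
#[derive(Debug, PartialEq, Eq)]
pub enum Credential<'a> {
    /// `pic_`-prefixed API key: `prefix` selects candidate rows by index,
    /// `body` is what gets hash-verified against each candidate.
    ApiKey { prefix: &'a str, body: &'a str },
    /// Anything without the marker is treated as a session token.
    Session(&'a str),
}

const KEY_MARKER: &str = "pic_";
const PREFIX_LEN: usize = 8;

/// Classify a raw token. Returns `None` for a `pic_` token that is too
/// short to carry the 8-character indexed prefix.
pub fn classify(token: &str) -> Option<Credential<'_>> {
    match token.strip_prefix(KEY_MARKER) {
        Some(body) => {
            // `get` returns None instead of panicking on short input or
            // on a slice that would split a multi-byte character.
            let prefix = body.get(..PREFIX_LEN)?;
            Some(Credential::ApiKey { prefix, body })
        }
        None => Some(Credential::Session(token)),
    }
}
```

Returning `None` for a malformed key, rather than falling back to the session path, keeps a truncated `pic_` token from being looked up as a session; whether the real middleware does this is not specified.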
### API Key Format & Storage
- Raw form: `pic_<base32(32 random bytes, no padding)>` — ~56 chars total.
- Stored: 8-char prefix + Argon2id PHC hash of the body. Raw value returned **exactly once** in the `POST /api/v1/admin/api-keys` response; never logged, never readable again.
- Optional `expires_at`. Lookup queries always filter `expires_at IS NULL OR expires_at > NOW()`.
- Optional `app_id` ("bound key") — every `App*(other_app)` capability is denied for this key, regardless of the user's role.
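The "~56 chars" figure follows from the encoding: 32 bytes are 256 bits, base32 carries 5 bits per character, so the unpadded body is 52 characters, plus 4 for `pic_`. The sketch below checks that arithmetic with a hand-rolled encoder. It assumes the uppercase RFC 4648 alphabet, since the text only says "base32", and it takes the 32 secret bytes as input because the standard library has no CSPRNG. The function names are hypothetical.

```rust
/// RFC 4648 base32 alphabet (an assumption; the spec only says "base32").
const ALPHABET: &[u8; 32] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

/// Encode bytes as base32 without `=` padding.
pub fn base32_nopad(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() * 8 + 4) / 5);
    let mut buffer: u32 = 0; // pending bits, newest in the low end
    let mut bits: u32 = 0; // how many bits in `buffer` are valid
    for &b in bytes {
        buffer = (buffer << 8) | u32::from(b);
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            let idx = (buffer >> bits) & 0x1f;
            out.push(ALPHABET[idx as usize] as char);
        }
        // Drop the bits already emitted, keeping only the remainder.
        buffer &= (1 << bits) - 1;
    }
    if bits > 0 {
        // Left-align the final partial group with zero bits.
        let idx = (buffer << (5 - bits)) & 0x1f;
        out.push(ALPHABET[idx as usize] as char);
    }
    out
}

/// Build the raw key string from 32 secret bytes supplied by the caller
/// (they must come from a cryptographically secure source).
pub fn format_key(secret: &[u8; 32]) -> String {
    format!("pic_{}", base32_nopad(secret))
}
```

With this layout the indexed prefix is simply characters 4 through 11 of the raw key.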
### Scope Set (intentionally narrow)
Exactly seven scopes; no further subdivision until a real use case appears:
`script:read`, `script:write`, `route:write`, `domain:manage`, `log:read`, `app:admin`, `instance:admin`
Mint-time validation rejects unknown values. Bound keys (`app_id` set) cannot carry `instance:*` scopes — the combination is irreconcilable (a bound credential cannot claim instance-wide authority) and is rejected with 422.
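Mint-time validation could look roughly like the following. The scope strings are the seven listed above; the Rust names (`Scope`, `MintError`, `validate_mint`) are hypothetical, and the `bound_to_app` flag stands in for "the request carried an `app_id`".

```rust
use std::str::FromStr;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Scope {
    ScriptRead,
    ScriptWrite,
    RouteWrite,
    DomainManage,
    LogRead,
    AppAdmin,
    InstanceAdmin,
}

#[derive(Debug, PartialEq, Eq)]
pub enum MintError {
    /// The request named a scope outside the seven known values.
    UnknownScope(String),
    /// A bound key asked for instance-wide authority (the 422 case).
    InstanceScopeOnBoundKey,
}

impl FromStr for Scope {
    type Err = MintError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s {
            "script:read" => Scope::ScriptRead,
            "script:write" => Scope::ScriptWrite,
            "route:write" => Scope::RouteWrite,
            "domain:manage" => Scope::DomainManage,
            "log:read" => Scope::LogRead,
            "app:admin" => Scope::AppAdmin,
            "instance:admin" => Scope::InstanceAdmin,
            other => return Err(MintError::UnknownScope(other.to_string())),
        })
    }
}

/// Parse the requested scopes and enforce the bound-key rule.
pub fn validate_mint(scopes: &[&str], bound_to_app: bool) -> Result<Vec<Scope>, MintError> {
    let parsed = scopes
        .iter()
        .map(|s| s.parse::<Scope>())
        .collect::<Result<Vec<_>, _>>()?;
    if bound_to_app && parsed.contains(&Scope::InstanceAdmin) {
        return Err(MintError::InstanceScopeOnBoundKey);
    }
    Ok(parsed)
}
```

Parsing into an enum at the boundary means the check-time code never has to compare strings or handle an unknown scope.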
### Effective Capability — `can(principal, capability)`
```
allow = role_grants(principal.instance_role, capability)
      ∧ (principal.scopes.is_none() ∨ required_scope(capability) ∈ principal.scopes)
      ∧ (principal.app_binding.is_none() ∨ capability.app_id() == principal.app_binding)
```
`role_grants` collapses the three tables (instance role + implicit app grants + explicit `app_members`) into a single yes/no. Each handler calls `state.authz.require(&principal, Capability::AppWrite(script.app_id))` after loading the resource (so the capability binds to the resource's actual `app_id`, not a path param the caller controls).
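The three-way conjunction can be sketched in Rust as below. Several pieces are placeholders: `AppId` is a `u32` rather than a UUID, the result of `role_grants` is passed in as a plain `bool` instead of being computed from the role tables, and the `Capability` variants other than `AppWrite`, along with the capability-to-scope mapping, are invented for illustration.

```rust
/// Stand-in for the real UUID-backed app identifier.
pub type AppId = u32;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Scope {
    ScriptRead,
    ScriptWrite,
    InstanceAdmin,
}

/// `AppWrite` appears in the text; the other variants are hypothetical.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Capability {
    AppRead(AppId),
    AppWrite(AppId),
    InstanceManage,
}

impl Capability {
    /// The app this capability targets, if it is app-scoped.
    fn app_id(self) -> Option<AppId> {
        match self {
            Capability::AppRead(id) | Capability::AppWrite(id) => Some(id),
            Capability::InstanceManage => None,
        }
    }

    /// Assumed mapping from capability to the scope an API key must carry.
    fn required_scope(self) -> Scope {
        match self {
            Capability::AppRead(_) => Scope::ScriptRead,
            Capability::AppWrite(_) => Scope::ScriptWrite,
            Capability::InstanceManage => Scope::InstanceAdmin,
        }
    }
}

pub struct Principal {
    pub scopes: Option<Vec<Scope>>, // None = cookie session
    pub app_binding: Option<AppId>, // Some = key bound to one app
}

/// `role_allows` is the precomputed answer of `role_grants`.
pub fn can(principal: &Principal, cap: Capability, role_allows: bool) -> bool {
    // A cookie session (no scope list) passes the scope check outright.
    let scope_ok = principal
        .scopes
        .as_ref()
        .map_or(true, |s| s.contains(&cap.required_scope()));
    // A bound key only passes for capabilities on its own app.
    let binding_ok = principal
        .app_binding
        .map_or(true, |bound| cap.app_id() == Some(bound));
    role_allows && scope_ok && binding_ok
}
```

Because every term is conjoined, scopes and bindings can only narrow what the role already grants; no key can exceed its owner's authority.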
### Deactivation Symmetry
Phase 3a's `set_active(false)` wipes that user's `admin_sessions`. Phase 3.5 extends it to also set `expires_at = NOW()` on every row in `api_keys WHERE user_id = $1` — both credential surfaces become inert at the same moment, no enumeration window.
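Setting `expires_at = NOW()` is enough to kill a key only because the lookup filter uses a strict `>` comparison. The in-memory model below illustrates that interaction; `ApiKeyRow`, `is_live`, and `expire_keys_for` are hypothetical names, and timestamps are plain integer seconds rather than `TIMESTAMPTZ`.

```rust
/// Seconds since an arbitrary epoch; stand-in for TIMESTAMPTZ.
pub type Timestamp = u64;

pub struct ApiKeyRow {
    pub user_id: u32,
    pub expires_at: Option<Timestamp>,
}

/// Mirrors the lookup filter `expires_at IS NULL OR expires_at > NOW()`.
pub fn is_live(row: &ApiKeyRow, now: Timestamp) -> bool {
    row.expires_at.map_or(true, |t| t > now)
}

/// Mirrors the deactivation step: stamp every key of `user_id` with `now`.
pub fn expire_keys_for(rows: &mut [ApiKeyRow], user_id: u32, now: Timestamp) {
    for row in rows.iter_mut().filter(|r| r.user_id == user_id) {
        row.expires_at = Some(now);
    }
}
```

A key stamped at time `t` already fails `t > t`, so it is inert for any lookup at or after the deactivation instant, while other users' keys are untouched.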
### CLI Auth Posture (forward note)
The eventual `picloud` CLI authenticates by **paste-the-token**, not OAuth: the user runs `picloud login`, the dashboard mints a fresh key (or the user mints one via `POST /api/v1/admin/api-keys`), and the CLI prompts for the raw token. The CLI binary itself is deferred; the dashboard surface and the bearer credential type land here so the CLI is a thin wrapper when it arrives.
### Schema (Migration 0006)
```sql
ALTER TABLE admin_users
    ADD COLUMN instance_role TEXT NOT NULL DEFAULT 'owner'
        CHECK (instance_role IN ('owner','admin','member')),
    ADD COLUMN email TEXT UNIQUE,
    ADD COLUMN mfa_secret TEXT; -- reserved slot, not built

CREATE TABLE app_members (
    app_id     UUID NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
    user_id    UUID NOT NULL REFERENCES admin_users(id) ON DELETE CASCADE,
    role       TEXT NOT NULL CHECK (role IN ('app_admin','editor','viewer')),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (app_id, user_id)
);
CREATE INDEX app_members_user_id_idx ON app_members (user_id);

CREATE TABLE api_keys (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id      UUID NOT NULL REFERENCES admin_users(id) ON DELETE CASCADE,
    hash         TEXT NOT NULL,   -- Argon2id PHC
    prefix       TEXT NOT NULL,   -- first 8 chars after `pic_`
    name         TEXT NOT NULL,
    scopes       TEXT[] NOT NULL, -- intersected with role at check time
    app_id       UUID NULL REFERENCES apps(id) ON DELETE CASCADE,
    expires_at   TIMESTAMPTZ NULL,
    last_used_at TIMESTAMPTZ NULL,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX api_keys_prefix_idx ON api_keys (prefix);
CREATE INDEX api_keys_user_id_idx ON api_keys (user_id);

-- Reserved (not built this phase):
-- invites (token, email, instance_role, app_id, app_role, invited_by, expires_at, consumed_at)
-- service_accounts (id, name, owning_user_id, …)
```
### New Endpoints (additive — no API major bump)
```
POST /api/v1/admin/api-keys — { name, scopes[], app_id?, expires_at? }
→ 201 { …, raw_token } (raw returned exactly once)
GET /api/v1/admin/api-keys — list caller's own keys (no raw)
DELETE /api/v1/admin/api-keys/{id} — caller's own only
```
Every existing `/api/v1/admin/*` endpoint is re-gated from "any authed admin" to a specific `Capability`. Request/response shapes are unchanged; what changes is the set of callers each endpoint accepts (a `member` now gets 403 on app surfaces they're not part of, where before they would have been 401-or-200 depending only on session validity).
### Out of Scope (Phase 3.5)
Schema room only, not built:
- **Invites** — email-based join flow; `invites` table reserved in the migration comment block.
- **MFA / TOTP** — `mfa_secret` column reserved on `admin_users`.
- **Service accounts** — reserved as a future table; for now, every API key belongs to a human `admin_users` row.
Deferred to follow-up sessions: dashboard surfaces for invites / member management / key minting (curl is the supported interface this phase), OIDC / SAML / SCIM, the `picloud` CLI binary itself, email/SMTP delivery of invites, audit log shipping.
---
## 12. Development Roadmap
### Phase 1: MVP ✓ (Shipped)
@@ -1048,13 +1194,15 @@ The scripts and routes endpoints keep their existing shape — this avoids forci
### Phase 3: v1.0.x — Foundations (Current focus)
Three foundation pieces that must land before the v1.1 service expansion, because retrofitting them later is expensive.
**3a. Admin auth** — ✓ shipped. See section 11.4. Per-user `admin_users` (not a shared secret), Argon2id passwords, env-var bootstrap of the first admin, session-token doubling as bearer token for API. No roles in this cut; schema is forward-compatible with later RBAC.
**3b. Multi-app scoping** — ✓ shipped. See section 11.5. `apps`, `app_domains`, `app_slug_history` tables; `app_id` columns on `scripts`, `routes`, `execution_logs`. Migration assigns existing data to a `default` app and always claims `localhost`; a Rust-side bootstrap inserts a `Hello World` script + `/hello` route when the default app is empty. Orchestrator dispatch is two-phase (Host → app → route trie). `/api/v1/execute/{id}/*` continues to work without a public domain claim. Dashboard is app-hierarchical (`/admin/apps`, `/admin/apps/{slug}/...`); API stays flat with new endpoints under `/api/v1/admin/apps/*` and a `?app=` filter on script listing. Per-app admin roles deferred.
**3c. Users, roles, and bearer-token auth** — pending. See section 11.6. Adds `instance_role` to `admin_users` (`owner`/`admin`/`member`), `app_members` for per-app `app_admin`/`editor`/`viewer` grants, and `api_keys` for `Authorization: Bearer pic_…` credentials. Unifies cookie-session and API-key paths behind a single `can(principal, capability)` gate; list endpoints filter by membership at SQL for `member` users. Dashboard surfaces, invites, MFA, service accounts, and the `picloud` CLI binary are deferred — schema room only.
**Why all three before v1.1**: every v1.1 service (KV, docs, users, etc.) needs both an `app_id` scoping key in its schema and a `Principal` to authorize against. Adding both now is one migration each on a small surface; adding them after the SDKs ship is many migrations on populated data plus a re-gate of every SDK call.
---