PiCloud/docs/versioning.md
MechaCat02 0eaf4aee69 chore: versioning guardrail script for the structural checks
scripts/check-versioning.sh — POSIX sh, no dependencies, runs in
under a second. Three structural checks that don't need git
history (the parts that do need it stay deferred until we have CI
and a CHANGELOG file):

  1. Migration filenames are sequential 0001_*.sql, 0002_*.sql, ...
     with no gaps or duplicates. Catches "added migration with
     the wrong number" before it reaches review.
  2. SDK_VERSION in shared::version parses as MAJOR.MINOR
     (numeric, no extra components). Catches accidental
     PATCH-style bumps like "1.1.0" that the SemVer-for-SDKs
     rule in docs/versioning.md forbids.
  3. [workspace.package].version parses as MAJOR.MINOR.PATCH
     (numeric). Catches typos in the product version bump
     that would silently downgrade everywhere.

Each check prints a precise FAIL message identifying the
offending file/value when it trips. Verified by deliberately
breaking each one and confirming exit=1.

Run manually as `bash scripts/check-versioning.sh` for now; wires
into CI as soon as we have one. Docs/versioning.md updated to
reflect that items (3) and (4) are now in place and (5) is partly
implemented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 22:21:37 +02:00


Versioning

PiCloud carries one product version for the build you install, and independent versions on the four contracts that actually break for users. The product version answers "which build do I have"; surface versions answer "which contracts does that build honor".

This split exists because crate-level SemVer between, say, picloud-shared and picloud-manager-core is fiction — they always ship together. The boundaries that matter are user-facing: scripts depending on the SDK, callers hitting the HTTP API, databases shared across deploys, and (later) executor nodes talking to a manager.


What gets a version

Lockstep — one number for the whole thing

All of these carry the same version and are bumped together:

  • Every crate in the Cargo workspace (via version.workspace = true)
  • The dashboard's package.json
  • Docker image tags (picloud:0.2.0)
  • Git tags (v0.2.0)

Defined once in Cargo.toml under [workspace.package]. There is no scenario where one crate is at a different version than another in the same build.

Independent — versioned at each surface

| Surface | Where the version lives | Format | Bump rule |
|---|---|---|---|
| Rhai SDK | shared::version::SDK_VERSION, exposed to scripts as ctx.sdk_version | "major.minor" string | Minor: additions; Major: removals/renames/retyped |
| HTTP API | URL prefix /api/v{N}/...; shared::version::API_VERSION is the current major | integer | New integer when request/response shape, status semantics, or auth model changes |
| Database schema | Largest applied migration ID (manager-core::migrations::latest_version()) | integer, monotonic | One per forward migration; never edit a committed file |
| Inter-service wire (cluster mode, v1.3+) | X-PiCloud-Wire request header; shared::version::WIRE_VERSION | integer | New integer when RPC shape changes |

All five versions (the product version plus the four surface versions) live in one place so /version can return them honestly.
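As a rough sketch of what "one place" could look like, the module below collects the five numbers as constants and renders them as a JSON object. The names SDK_VERSION, API_VERSION and WIRE_VERSION come from the table above; PRODUCT_VERSION, SCHEMA_VERSION, the VersionReport struct and the JSON field names are assumptions made for illustration, and the values are the ones listed under Current versions.

```rust
// Hypothetical layout for a `shared::version`-style module.
pub const PRODUCT_VERSION: &str = "0.5.0"; // assumed name
pub const SDK_VERSION: &str = "1.1";
pub const API_VERSION: u32 = 1;
pub const SCHEMA_VERSION: u32 = 3; // assumed constant; the doc derives it from the latest migration
pub const WIRE_VERSION: u32 = 1;

/// Assumed response shape for `GET /version`.
pub struct VersionReport {
    pub product: &'static str,
    pub sdk: &'static str,
    pub api: u32,
    pub schema: u32,
    pub wire: u32,
}

impl VersionReport {
    /// Snapshot of every version this build carries.
    pub fn current() -> Self {
        Self {
            product: PRODUCT_VERSION,
            sdk: SDK_VERSION,
            api: API_VERSION,
            schema: SCHEMA_VERSION,
            wire: WIRE_VERSION,
        }
    }

    /// Hand-rolled JSON so the sketch needs no dependencies.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"product\":\"{}\",\"sdk\":\"{}\",\"api\":{},\"schema\":{},\"wire\":{}}}",
            self.product, self.sdk, self.api, self.schema, self.wire
        )
    }
}
```

A real handler would more likely serialize with a JSON library; the point here is only that every consumer reads the same constants.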


Per-surface compatibility rules

Rhai SDK (strictest)

Scripts run in production with no recompile. A wrong SDK bump silently breaks user code.

  • Patch (no SDK_VERSION change): doc fixes, internal optimizations. No script-observable change. SDK_VERSION carries only MAJOR.MINOR, so these ship under the product version alone.
  • Minor (1.2 → 1.3) — added functions; added optional ctx.* fields; relaxed limits; new variants accepted alongside old ones. Every script written for 1.2 must still run unchanged on 1.3.
  • Major (1 → 2) — anything removed, renamed, retyped, restricted, or made required.

Scripts can detect available features at runtime:

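// Compare components as integers; as strings, "1.10" sorts before "1.2".
let v = ctx.sdk_version.split(".");
if parse_int(v[0]) == 1 && parse_int(v[1]) >= 2 {
    // call kv.* (added in 1.2)
}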

The SDK contract suite (crates/executor-core/tests/sdk_contract.rs) runs golden scripts that exercise every documented SDK surface. They must pass on every commit. A minor bump that breaks any of them is a build failure.
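On the host side, the minor/major rule can be expressed as a small comparison on parsed components. This is a minimal sketch, not the project's implementation: the SdkVersion type and its parse and supports methods are hypothetical names. It rejects anything that is not exactly MAJOR.MINOR and compares numerically, because a plain string comparison ranks "1.10" below "1.2".

```rust
/// Parsed form of an SDK version string such as "1.2" (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SdkVersion {
    pub major: u32,
    pub minor: u32,
}

impl SdkVersion {
    /// Accepts exactly MAJOR.MINOR with numeric components.
    /// "1", "1.1.0" and "+1.2" are all rejected.
    pub fn parse(s: &str) -> Option<Self> {
        let (major, minor) = s.split_once('.')?;
        let is_num = |p: &str| !p.is_empty() && p.bytes().all(|b| b.is_ascii_digit());
        if !is_num(major) || !is_num(minor) {
            return None;
        }
        Some(Self {
            major: major.parse().ok()?,
            minor: minor.parse().ok()?,
        })
    }

    /// True if a script written against `required` should run unchanged on `self`:
    /// same major, and a minor at least as new.
    pub fn supports(self, required: SdkVersion) -> bool {
        self.major == required.major && self.minor >= required.minor
    }
}
```

A different major never satisfies the check, which matches the rule that a major bump may remove or rename what a script relies on.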

HTTP API

Path prefix is the version. Within a major, the following are non-breaking and welcome:

  • New endpoints
  • New optional request fields
  • New response fields (clients must ignore unknown fields)
  • New Deprecation: headers warning of upcoming removals

The following require a new major (/api/v2/...):

  • Removed endpoints, removed response fields, renamed fields
  • Changed request-field types or required-field additions
  • Changed status-code semantics for the same outcome
  • Auth model changes

When vN+1 ships, vN stays live for at least one product minor (so users have a release cycle to migrate). Deprecation is announced via the Deprecation: true and Sunset: <date> response headers on the old prefix before removal.
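To illustrate how the path prefix and the deprecation headers fit together, here is a sketch under stated assumptions: api_major and deprecation_headers are hypothetical helpers, and the Sunset value is passed in by the caller rather than computed.

```rust
/// Extracts N from a path of the form /api/v{N}/...; returns None for anything else.
pub fn api_major(path: &str) -> Option<u32> {
    let rest = path.strip_prefix("/api/v")?;
    let digits = rest.split('/').next()?;
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok()
}

/// Headers to attach when a request arrives on an older major that is still live.
/// `sunset` is whatever removal date the operators have announced (assumed input).
pub fn deprecation_headers(
    requested: u32,
    current: u32,
    sunset: &str,
) -> Vec<(&'static str, String)> {
    if requested < current {
        vec![
            ("Deprecation", "true".to_string()),
            ("Sunset", sunset.to_string()),
        ]
    } else {
        Vec::new()
    }
}
```

Requests on the current major get no extra headers; requests on an older, still-served major get both.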

Database schema

  • Forward-only. Never edit a migration that has shipped. If a migration was wrong, write a new one that fixes it.
  • Migrations are numbered sequentially (0001_init.sql, 0002_*.sql, ...). The number is the schema version.
  • On startup, a binary compares its newest embedded migration with the last-applied ID in the DB. If the DB is ahead, the binary refuses to start: that would imply a downgrade, which is never automatic. Otherwise it applies every embedded migration with an ID strictly greater than the last-applied one.
  • This makes rolling deploys safe: the schema is always "ahead of or equal to" any running binary in the cluster.
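The startup rule above reduces to a three-way comparison. The sketch below assumes migration IDs are contiguous integers starting at 1 and that a fresh database reports 0 as its last-applied ID; StartupPlan and plan_startup are illustrative names, not taken from manager-core.

```rust
/// What a binary should do with the schema at startup (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum StartupPlan {
    /// Apply these migration IDs in ascending order, then start.
    Apply(Vec<u32>),
    /// The schema already matches the binary.
    UpToDate,
    /// The database is newer than this binary; starting would imply a downgrade.
    Refuse { db: u32, embedded: u32 },
}

/// `embedded_latest` is the newest migration compiled into the binary;
/// `db_last_applied` is the newest one recorded in the database (0 if none).
pub fn plan_startup(embedded_latest: u32, db_last_applied: u32) -> StartupPlan {
    if db_last_applied > embedded_latest {
        StartupPlan::Refuse {
            db: db_last_applied,
            embedded: embedded_latest,
        }
    } else if db_last_applied == embedded_latest {
        StartupPlan::UpToDate
    } else {
        StartupPlan::Apply(((db_last_applied + 1)..=embedded_latest).collect())
    }
}
```

For example, a binary embedding migrations up to 3 that meets a database at 1 would apply 2 and 3, while a binary embedding up to 2 that meets a database at 3 would refuse to start.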

Wire protocol (cluster mode, v1.3+)

  • Inter-service RPCs include X-PiCloud-Wire: N.
  • A peer that doesn't recognize N refuses the call and returns 426 Upgrade Required with the version it speaks.
  • Both versions must be live in the cluster during rolling upgrades — current and current-minus-one — until all nodes agree on the new one.
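Since cluster mode is not implemented yet, the following is only a sketch of how a receiving peer might apply these rules. It assumes wire versions start at 1, that a missing or malformed header is refused, and that a peer accepts its current version and the one before it; WireDecision and check_wire are hypothetical names.

```rust
/// Outcome of checking an incoming X-PiCloud-Wire header (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum WireDecision {
    Accept,
    /// Maps to HTTP 426 Upgrade Required; `speaks` is the version reported back.
    UpgradeRequired { speaks: u32 },
}

/// `header` is the raw header value, if present; `current` is this peer's version.
pub fn check_wire(header: Option<&str>, current: u32) -> WireDecision {
    let requested = header.and_then(|h| h.trim().parse::<u32>().ok());
    match requested {
        // Accept the current version and current-minus-one (never version 0).
        Some(n) if n >= 1 && (n == current || n.checked_add(1) == Some(current)) => {
            WireDecision::Accept
        }
        _ => WireDecision::UpgradeRequired { speaks: current },
    }
}
```

Accepting current-minus-one is what lets old and new nodes coexist during a rolling upgrade.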

How we check and enforce

A versioning scheme without enforcement decays in months. Five cheap mechanical checks:

  1. Compile-time uniformity. All workspace crates inherit version.workspace = true. Drift is impossible to introduce.

  2. Runtime self-report. GET /version returns every surface version. Dashboards, monitoring, inter-service handshakes, and humans all read from one source. /healthz stays a plain "ok" string for k8s probes — version negotiation is a separate concern.

  3. Golden SDK contract tests. tests/sdk_contract/ Rhai scripts exercise every SDK surface and must pass on every commit. The contract is the test.

  4. Migration replay test. An integration test that boots a fresh Postgres, applies every migration in order, and asserts the resulting schema. Catches the most common mistake (edited-not-added migration).

  5. CI guardrail script. scripts/check-versioning.sh — runs the structural checks that don't need git history:

    • Migration files are numbered sequentially from 0001_*.sql with no gaps.
    • SDK_VERSION parses as MAJOR.MINOR (numeric, no extra components).
    • [workspace.package].version parses as MAJOR.MINOR.PATCH.

    Run manually as bash scripts/check-versioning.sh. Wires into CI when CI exists. Deferred to the same future PR that introduces CI: SDK-major-bump-needs-CHANGELOG and BREAKING: commit-message annotation (both need git history + a CHANGELOG file that doesn't exist yet).
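    The actual guardrail is a POSIX sh script; purely to make the first rule concrete, here is the same sequence check restated in Rust. The function name and the FAIL message wording are assumptions, not copied from scripts/check-versioning.sh.

```rust
/// Returns Ok(count) if `names` are exactly 0001_*.sql, 0002_*.sql, ...
/// with no gaps or duplicates; otherwise a FAIL message naming the offender.
pub fn check_migration_sequence(names: &[&str]) -> Result<usize, String> {
    let mut ids = Vec::with_capacity(names.len());
    for name in names {
        // The numeric prefix is everything before the first underscore.
        let prefix = name.split_once('_').map(|(p, _)| p).unwrap_or("");
        let well_formed = prefix.len() == 4
            && prefix.bytes().all(|b| b.is_ascii_digit())
            && name.ends_with(".sql");
        if !well_formed {
            return Err(format!("FAIL: unexpected migration filename: {name}"));
        }
        // Safe to unwrap: the prefix is exactly four ASCII digits.
        ids.push((prefix.parse::<usize>().unwrap(), *name));
    }
    ids.sort();
    // After sorting, the i-th file must carry the number i + 1.
    for (i, (id, name)) in ids.iter().enumerate() {
        if *id != i + 1 {
            return Err(format!("FAIL: expected migration {:04}, found {name}", i + 1));
        }
    }
    Ok(ids.len())
}
```

    Both a gap and a duplicate surface as the first position where the sorted number differs from its index.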

(3) and (4) are now in place: crates/executor-core/tests/sdk_contract.rs holds the SDK contract suite; crates/manager-core/tests/schema_snapshot.rs holds the schema snapshot guard.


When to bump what

The product version follows SemVer applied pragmatically — we're pre-1.0, so the rules are looser:

  • Patch (0.2.0 → 0.2.1) — bug fixes, no surface change
  • Minor (0.2 → 0.3) — any surface bump, new features, or breaking changes (permitted before 1.0)
  • Major (0 → 1) — first stable release; SDK and API both committed to long-term compatibility

After 1.0, the product version follows strict SemVer based on the worst surface change:

  • Any surface major bump → product major bump
  • Any surface minor bump → product minor bump (at minimum)
  • No surface changes → product patch
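Both sets of rules can be written as a function from the surface bumps in a release to the product bump. This is a sketch with hypothetical names (Bump, product_bump_pre_1_0, product_bump_post_1_0); an empty slice stands for "no surface changes".

```rust
/// Size of a version bump, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Bump {
    Patch,
    Minor,
    Major,
}

/// Pre-1.0: any surface bump, even a surface major, becomes a product minor.
pub fn product_bump_pre_1_0(surface_bumps: &[Bump]) -> Bump {
    if surface_bumps.iter().any(|b| *b > Bump::Patch) {
        Bump::Minor
    } else {
        Bump::Patch
    }
}

/// Post-1.0: the product bump is the worst surface bump, or patch if none.
pub fn product_bump_post_1_0(surface_bumps: &[Bump]) -> Bump {
    surface_bumps.iter().copied().max().unwrap_or(Bump::Patch)
}
```

For a release that pairs an SDK minor with an API major, the pre-1.0 rule yields a product minor and the post-1.0 rule yields a product major.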

A surface can hit its own 1.0 independently of the product. The SDK in particular is likely to stabilize before the platform does, since scripts in production demand it.


Current versions

| | Version |
|---|---|
| Product | 0.5.0 |
| SDK | 1.1 (adds ctx.request.params, ctx.request.query, ctx.request.rest) |
| API | 1 |
| Schema | 3 (matches migrations/0003_routes.sql) |
| Wire | 1 (reserved; cluster mode not implemented) |

Read live from GET /version on any running instance.


Examples

Adding a kv.* SDK in v1.1+:

  • Workspace bump: 0.2.0 → 0.3.0 (pre-1.0 minor)
  • SDK bump: "1.0" → "1.1" (added functions only)
  • API bump: none (the existing API contract is unchanged)
  • Schema bump: 1 → 2 (0002_kv_store.sql adds the kv_store table)

Renaming ctx.execution_id to ctx.exec_id:

  • SDK bump: "1.x" → "2.0" (breaking)
  • Product: minor bump pre-1.0, major bump post-1.0
  • Migration path: keep ctx.execution_id available in 1.x for a deprecation window, add ctx.exec_id alongside; flip to 2.0 only when both fields have shipped together for a release.

Adding pagination to GET /api/v1/admin/scripts:

  • New optional ?limit=&offset= query params with sensible defaults → no API bump
  • Response keeps the same shape; clients that don't pass limit see the old behavior → no API bump

Changing the response shape of GET /api/v1/admin/scripts/{id} to wrap in { script: {...} }:

  • Breaking. Ship as /api/v2/admin/scripts/{id}. Keep /api/v1 live until at least one product minor passes.