# 03. Deployment Strategy ## 1. Goals - Ship to founding members in ~12 weeks without compromising security invariants. - Maintain one-VPS-per-customer isolation. - Keep OpenClaw upstream-pinned and independently upgradeable. - Make tenant rollout reversible with fast rollback paths. ## 2. Environment Topology ## 2.1 Control Plane Environments | Environment | Purpose | Data | |---|---|---| | `dev` | Rapid feature iteration | Synthetic/local data | | `staging` | Release-candidate validation, e2e, load, security checks | Sanitized fixtures | | `prod-eu` | EU customers (default EU routing) | Real customer data | | `prod-us` | NA customers (default NA routing) | Real customer data | Control plane services (Hub + worker + notifications) are region-deployed with independent DBs and clear region affinity. ## 2.2 Tenant Environments - `sandbox tenants`: internal QA and interactive demo pool. - `canary tenants`: first real-production update recipients. - `general tenants`: full customer fleet. ## 3. Deployment Units ## 3.1 Control Plane Units - `hub-web-api` container (Next.js standalone runtime) - `hub-worker` container (automation + billing jobs) - `notifications` container (push/email delivery) - `postgres` (managed or self-hosted HA) ## 3.2 Tenant Units (Per Customer VPS) - `openclaw` container (upstream image/tag pinned) - `safety-wrapper` plugin package mounted into OpenClaw extension dir - `egress-proxy` service (localhost-only) - tool containers and nginx from provisioner - local SQLite data stores for secrets/approvals/metering ## 4. Provisioning Deployment Plan ## 4.1 Provisioner Mode Continue with existing one-shot SSH provisioner flow, retooled to: - deploy OpenClaw + Safety components - remove legacy orchestrator/sysadmin deployment - strip deprecated stacks and n8n references - write secrets into encrypted vault only (no plaintext long-lived config) ## 4.2 Immutable Artifact Inputs Provisioning uses pinned artifacts only: - OpenClaw release tag (`stable` channel pin) - Safety Wrapper image/package digest - Tool stack compose templates with hash - policy bundle version + checksum ## 5. Secrets And Credential Deployment - Registration token is one-time and short-lived. - Tenant API key returned at registration; only hash stored in Hub DB. - Provisioner writes bootstrap secrets to tmpfs file, consumed once, then shredded. - Existing plaintext job config path (`jobs//config.json`) replaced by encrypted payload + ephemeral decrypt-on-run. ## 6. Release Strategy ## 6.1 Control Plane - Trunk-based merges behind feature flags. - Deploy via Gitea Actions with staged promotions (`dev -> staging -> prod`). - DB migrations run in expand/contract pattern. ## 6.2 Tenant Plane Tenant updates split into independent channels: - `policy-only`: classification/autonomy/tool policy updates (no binary change) - `wrapper patch`: Safety Wrapper version bump - `openclaw bump`: upstream release bump (separate tracked campaign) Rollout: 1. Internal sandbox tenants 2. 5% canary customer tenants 3. 25% 4. 100% Auto-stop criteria: - redaction test failure - approval-routing failure >1% - tenant heartbeat drop >3% ## 7. Rollback Strategy ## 7.1 Control Plane Rollback - Keep last two container digests deployable. - Migration rollback policy: only for reversible migrations; otherwise hotfix-forward. ## 7.2 Tenant Rollback - Policy rollback via previous signed policy bundle. - Wrapper rollback to previous plugin package. - OpenClaw rollback to previous pinned stable tag after compatibility check. ## 8. Observability And SLOs ## 8.1 Required Telemetry - tenant heartbeat latency and freshness - approval queue latency (request -> decision) - redaction pipeline counters (matches by layer) - token usage ingest lag - provisioning success/failure per step ## 8.2 Launch SLO Targets - Hub API availability: 99.9% - Tenant heartbeat freshness: 99% under 2 minutes - Approval propagation: p95 < 5 seconds (Hub to mobile push) - Provisioning success first-attempt: >= 90% ## 9. Dual-Provider Strategy (Netcup + Hetzner) - Primary capacity pool on Netcup (EU/US). - Overflow path on Hetzner with same provisioner scripts and hardened baseline. - Provider adapter abstraction lives in Hub `server-provisioning` module; provisioner remains Debian-focused and provider-agnostic. ## 10. Cutover Plan From Current State 1. Freeze legacy orchestrator/sysadmin deployment paths. 2. Land prerequisite cleanup release (n8n/deprecated removal + credential leak fix). 3. Enable new tenant register/heartbeat APIs in Hub. 4. Provision first new-architecture internal tenant. 5. Execute parallel-run window (old and new provisioning flows side-by-side for internal only). 6. Flip default provisioning to new flow for production orders.