4.7 KiB
4.7 KiB
03. Deployment Strategy
1. Goals
- Ship to founding members in ~12 weeks without compromising security invariants.
- Maintain one-VPS-per-customer isolation.
- Keep OpenClaw upstream-pinned and independently upgradeable.
- Make tenant rollout reversible with fast rollback paths.
2. Environment Topology
2.1 Control Plane Environments
| Environment | Purpose | Data |
|---|---|---|
dev |
Rapid feature iteration | Synthetic/local data |
staging |
Release-candidate validation, e2e, load, security checks | Sanitized fixtures |
prod-eu |
EU customers (default EU routing) | Real customer data |
prod-us |
NA customers (default NA routing) | Real customer data |
Control plane services (Hub + worker + notifications) are region-deployed with independent DBs and clear region affinity.
2.2 Tenant Environments
sandbox tenants: internal QA and interactive demo pool.canary tenants: first real-production update recipients.general tenants: full customer fleet.
3. Deployment Units
3.1 Control Plane Units
hub-web-apicontainer (Next.js standalone runtime)hub-workercontainer (automation + billing jobs)notificationscontainer (push/email delivery)postgres(managed or self-hosted HA)
3.2 Tenant Units (Per Customer VPS)
openclawcontainer (upstream image/tag pinned)safety-wrapperplugin package mounted into OpenClaw extension diregress-proxyservice (localhost-only)- tool containers and nginx from provisioner
- local SQLite data stores for secrets/approvals/metering
4. Provisioning Deployment Plan
4.1 Provisioner Mode
Continue with existing one-shot SSH provisioner flow, retooled to:
- deploy OpenClaw + Safety components
- remove legacy orchestrator/sysadmin deployment
- strip deprecated stacks and n8n references
- write secrets into encrypted vault only (no plaintext long-lived config)
4.2 Immutable Artifact Inputs
Provisioning uses pinned artifacts only:
- OpenClaw release tag (
stablechannel pin) - Safety Wrapper image/package digest
- Tool stack compose templates with hash
- policy bundle version + checksum
5. Secrets And Credential Deployment
- Registration token is one-time and short-lived.
- Tenant API key returned at registration; only hash stored in Hub DB.
- Provisioner writes bootstrap secrets to tmpfs file, consumed once, then shredded.
- Existing plaintext job config path (
jobs/<id>/config.json) replaced by encrypted payload + ephemeral decrypt-on-run.
6. Release Strategy
6.1 Control Plane
- Trunk-based merges behind feature flags.
- Deploy via Gitea Actions with staged promotions (
dev -> staging -> prod). - DB migrations run in expand/contract pattern.
6.2 Tenant Plane
Tenant updates split into independent channels:
policy-only: classification/autonomy/tool policy updates (no binary change)wrapper patch: Safety Wrapper version bumpopenclaw bump: upstream release bump (separate tracked campaign)
Rollout:
- Internal sandbox tenants
- 5% canary customer tenants
- 25%
- 100%
Auto-stop criteria:
- redaction test failure
- approval-routing failure >1%
- tenant heartbeat drop >3%
7. Rollback Strategy
7.1 Control Plane Rollback
- Keep last two container digests deployable.
- Migration rollback policy: only for reversible migrations; otherwise hotfix-forward.
7.2 Tenant Rollback
- Policy rollback via previous signed policy bundle.
- Wrapper rollback to previous plugin package.
- OpenClaw rollback to previous pinned stable tag after compatibility check.
8. Observability And SLOs
8.1 Required Telemetry
- tenant heartbeat latency and freshness
- approval queue latency (request -> decision)
- redaction pipeline counters (matches by layer)
- token usage ingest lag
- provisioning success/failure per step
8.2 Launch SLO Targets
- Hub API availability: 99.9%
- Tenant heartbeat freshness: 99% under 2 minutes
- Approval propagation: p95 < 5 seconds (Hub to mobile push)
- Provisioning success first-attempt: >= 90%
9. Dual-Provider Strategy (Netcup + Hetzner)
- Primary capacity pool on Netcup (EU/US).
- Overflow path on Hetzner with same provisioner scripts and hardened baseline.
- Provider adapter abstraction lives in Hub
server-provisioningmodule; provisioner remains Debian-focused and provider-agnostic.
10. Cutover Plan From Current State
- Freeze legacy orchestrator/sysadmin deployment paths.
- Land prerequisite cleanup release (n8n/deprecated removal + credential leak fix).
- Enable new tenant register/heartbeat APIs in Hub.
- Provision first new-architecture internal tenant.
- Execute parallel-run window (old and new provisioning flows side-by-side for internal only).
- Flip default provisioning to new flow for production orders.