# LetsBe Biz — CI/CD Strategy
**Date:** February 27, 2026
**Team:** Claude Opus 4.6 Architecture Team
**Document:** 08 of 09
**Status:** Proposal (competing with an independent team)

---
## Table of Contents
1. [CI/CD Overview](#1-cicd-overview)
2. [Gitea Actions Pipelines](#2-gitea-actions-pipelines)
3. [Branch Strategy](#3-branch-strategy)
4. [Build & Publish](#4-build--publish)
5. [Deployment Workflows](#5-deployment-workflows)
6. [Rollback Procedures](#6-rollback-procedures)
7. [Secret Management in CI](#7-secret-management-in-ci)
8. [Quality Gates in CI](#8-quality-gates-in-ci)
9. [Monitoring & Alerting](#9-monitoring--alerting)
---
## 1. CI/CD Overview
### Platform: Gitea Actions
Gitea Actions is the CI/CD platform (Architecture Brief §9.1). It uses GitHub Actions-compatible YAML workflow syntax, making migration straightforward if needed later.
### Pipeline Architecture
```
Developer pushes code
┌──────────────────┐
│  Gitea Actions   │
│  Trigger: push   │
│                  │
│ 1. Lint          │
│ 2. Type Check    │
│ 3. Unit Tests    │
│ 4. Build         │
│ 5. Security Scan │
└────────┬─────────┘
    ┌────┴────┐
    │ Branch? │
    └────┬────┘
    ┌────┼────────────┐
    │    │            │
 feature develop     main
    │    │            │
    │    ▼            ▼
    │ Build Docker  Build Docker
    │ Push :dev     Push :latest
    │    │            │
    │    ▼            ▼
    │ Deploy to     Deploy to
    │ staging       production
    │                 │
    │                 ▼
    │          Canary rollout
    │         (tenant servers)
    └─► PR required to merge
```
### Environments
| Environment | Branch | Trigger | Purpose |
|-------------|--------|---------|---------|
| **Local** | Any | Manual | Developer testing |
| **CI** | Any push | Automatic | Lint, test, type check |
| **Staging** | `develop` | Automatic on merge | Integration testing, dogfooding |
| **Production** | `main` | Manual approval | Live customers |
---
## 2. Gitea Actions Pipelines
### 2.1 Monorepo CI Pipeline (All Packages)
```yaml
# .gitea/workflows/ci.yml
name: CI
on:
  push:
    branches: [main, develop, 'feature/**']
  pull_request:
    branches: [main, develop]
env:
  NODE_VERSION: '22'
jobs:
  lint-and-typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
      - name: Install dependencies
        run: npm ci
      - name: Lint
        run: npx turbo run lint
      - name: Type check
        run: npx turbo run typecheck
  unit-tests:
    runs-on: ubuntu-latest
    needs: lint-and-typecheck
    strategy:
      matrix:
        package:
          - safety-wrapper
          - secrets-proxy
          - hub
          - shared-types
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
      - name: Install dependencies
        run: npm ci
      - name: Run tests for ${{ matrix.package }}
        run: npx turbo run test --filter=${{ matrix.package }}
  security-scan:
    runs-on: ubuntu-latest
    needs: lint-and-typecheck
    steps:
      - uses: actions/checkout@v4
      - name: Check for secrets in code
        run: |
          npx @trufflesecurity/trufflehog git file://. --only-verified --fail
      - name: Dependency audit
        run: npm audit --audit-level=high
```
### 2.2 Safety Wrapper Pipeline
```yaml
# .gitea/workflows/safety-wrapper.yml
name: Safety Wrapper
on:
  push:
    paths:
      - 'packages/safety-wrapper/**'
      - 'packages/shared-types/**'
    branches: [main, develop]
jobs:
  p0-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '22'
      - run: npm ci
      - name: P0 Secrets Redaction Tests
        run: npx turbo run test:p0 --filter=secrets-proxy
      - name: P0 Command Classification Tests
        run: npx turbo run test:p0 --filter=safety-wrapper
      - name: P1 Autonomy Tests
        run: npx turbo run test:p1 --filter=safety-wrapper
  build-image:
    runs-on: ubuntu-latest
    needs: p0-tests
    if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/develop'
    steps:
      - uses: actions/checkout@v4
      - name: Set tag
        id: tag
        run: |
          if [ "${{ github.ref }}" = "refs/heads/main" ]; then
            echo "tag=latest" >> "$GITHUB_OUTPUT"
          else
            echo "tag=dev" >> "$GITHUB_OUTPUT"
          fi
      - name: Build Safety Wrapper image
        run: |
          docker build \
            -f packages/safety-wrapper/Dockerfile \
            -t code.letsbe.solutions/letsbe/safety-wrapper:${{ steps.tag.outputs.tag }} \
            -t code.letsbe.solutions/letsbe/safety-wrapper:${{ github.sha }} \
            .
      - name: Push to registry
        run: |
          echo "${{ secrets.REGISTRY_PASSWORD }}" | docker login code.letsbe.solutions -u ${{ secrets.REGISTRY_USER }} --password-stdin
          docker push code.letsbe.solutions/letsbe/safety-wrapper:${{ steps.tag.outputs.tag }}
          docker push code.letsbe.solutions/letsbe/safety-wrapper:${{ github.sha }}
```
### 2.3 Secrets Proxy Pipeline
```yaml
# .gitea/workflows/secrets-proxy.yml
name: Secrets Proxy
on:
  push:
    paths:
      - 'packages/secrets-proxy/**'
      - 'packages/shared-types/**'
    branches: [main, develop]
jobs:
  p0-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '22' }
      - run: npm ci
      - name: P0 Redaction Tests (must pass 100%)
        run: npx turbo run test:p0 --filter=secrets-proxy
      - name: Performance Benchmark
        run: npx turbo run test:benchmark --filter=secrets-proxy
  build-image:
    runs-on: ubuntu-latest
    needs: p0-tests
    if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/develop'
    steps:
      - uses: actions/checkout@v4
      - name: Build Secrets Proxy image
        # Tagged with the branch tag and the immutable commit SHA (see Image Tagging, §4)
        run: |
          docker build \
            -f packages/secrets-proxy/Dockerfile \
            -t code.letsbe.solutions/letsbe/secrets-proxy:${{ github.ref == 'refs/heads/main' && 'latest' || 'dev' }} \
            -t code.letsbe.solutions/letsbe/secrets-proxy:${{ github.sha }} \
            .
      - name: Push to registry
        run: |
          echo "${{ secrets.REGISTRY_PASSWORD }}" | docker login code.letsbe.solutions -u ${{ secrets.REGISTRY_USER }} --password-stdin
          docker push --all-tags code.letsbe.solutions/letsbe/secrets-proxy
```
### 2.4 Hub Pipeline
```yaml
# .gitea/workflows/hub.yml
name: Hub
on:
  push:
    paths:
      - 'packages/hub/**'
      - 'packages/shared-prisma/**'
    branches: [main, develop]
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16-alpine
        env:
          POSTGRES_DB: hub_test
          POSTGRES_USER: hub
          POSTGRES_PASSWORD: testpass
        ports: ['5432:5432']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '22' }
      - run: npm ci
      - name: Run Prisma migrations
        run: npx turbo run db:push --filter=hub
        env:
          DATABASE_URL: postgresql://hub:testpass@localhost:5432/hub_test
      - name: Run tests
        run: npx turbo run test --filter=hub
        env:
          DATABASE_URL: postgresql://hub:testpass@localhost:5432/hub_test
  build-image:
    runs-on: ubuntu-latest
    needs: test
    if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/develop'
    steps:
      - uses: actions/checkout@v4
      - name: Build Hub image
        # Tagged with the branch tag and the immutable commit SHA (see Image Tagging, §4)
        run: |
          docker build \
            -f packages/hub/Dockerfile \
            -t code.letsbe.solutions/letsbe/hub:${{ github.ref == 'refs/heads/main' && 'latest' || 'dev' }} \
            -t code.letsbe.solutions/letsbe/hub:${{ github.sha }} \
            .
      - name: Push to registry
        run: |
          echo "${{ secrets.REGISTRY_PASSWORD }}" | docker login code.letsbe.solutions -u ${{ secrets.REGISTRY_USER }} --password-stdin
          docker push --all-tags code.letsbe.solutions/letsbe/hub
```
### 2.5 Integration Test Pipeline
```yaml
# .gitea/workflows/integration.yml
name: Integration Tests
on:
  push:
    branches: [develop]
  workflow_dispatch:
jobs:
  integration:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '22' }
      - run: npm ci
      - name: Start integration stack
        run: docker compose -f test/docker-compose.integration.yml up -d --wait
        timeout-minutes: 5
      - name: Wait for services
        run: |
          for i in $(seq 1 30); do
            curl -sf http://localhost:8200/health && break || sleep 2
          done
          # Fail the step if the service never became healthy
          curl -sf http://localhost:8200/health
      - name: Run integration tests
        run: npx turbo run test:integration
      - name: Collect logs on failure
        if: failure()
        run: docker compose -f test/docker-compose.integration.yml logs > integration-logs.txt
      - name: Upload logs
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: integration-logs
          path: integration-logs.txt
      - name: Teardown
        if: always()
        run: docker compose -f test/docker-compose.integration.yml down -v
```
---
## 3. Branch Strategy
### Git Flow (Simplified)
```
main ─────────────────────────────────────────────────►
  │                                        ▲
  │                                        │ (merge via PR, requires approval)
  │                                        │
develop ──┬───────────┬───────────┬────────┤
          │           │           │
 feature/sw-skeleton  │  feature/hub-billing
                      │
            feature/secrets-proxy
hotfix/critical-fix ──────────────────────► main (direct merge for critical fixes)
```
### Branch Rules
| Branch | Protection | Merge Requirements |
|--------|-----------|-------------------|
| `main` | Protected; no direct pushes | PR from `develop` (or from `hotfix/*` for critical fixes); 1 approval; all CI checks pass; security scan passes |
| `develop` | Protected; no direct pushes | PR from feature branch; all CI checks pass |
| `feature/*` | Unprotected | Free to push; PR to `develop` when ready |
| `hotfix/*` | Unprotected | Merges via PR into both `main` and `develop`; 1 approval required |
### Naming Conventions
```
feature/sw-command-classification # Safety Wrapper feature
feature/hub-tenant-api # Hub feature
feature/mobile-chat-view # Mobile app feature
feature/prov-step10-rewrite # Provisioner feature
fix/secrets-proxy-jwt-detection # Bug fix
hotfix/redaction-bypass-cve # Critical security fix
```
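The convention above can be checked mechanically, for example in a pre-push hook or an early CI step. The following is a minimal sketch under assumptions: the function name `is_valid_branch` is hypothetical, and the allowed character set after the prefix (lowercase letters, digits, single hyphens) is inferred from the examples rather than stated in this proposal.

```bash
# Hypothetical helper: returns 0 when a branch name follows the naming convention.
# Assumption: segments after the prefix are lowercase alphanumerics joined by hyphens.
is_valid_branch() {
  case "$1" in
    main|develop) return 0 ;;   # long-lived branches are always valid
  esac
  printf '%s\n' "$1" | grep -Eq '^(feature|fix|hotfix)/[a-z0-9]+(-[a-z0-9]+)*$'
}

# Usage sketch:
#   is_valid_branch "feature/hub-tenant-api" || echo "branch name violates convention" >&2
```

A check like this only reports the violation; whether it should block a push is a team decision.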
### Release Tagging
```
v0.1.0 # First internal milestone (M1)
v0.2.0 # M2
v0.3.0 # M3
v1.0.0 # Founding member launch (M4)
v1.0.1 # First patch
v1.1.0 # First feature update post-launch
```
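The tag sequence above follows semantic versioning with a `v` prefix. As an illustration only, the sketch below validates a tag and computes the next one; `is_release_tag` and `next_version` are hypothetical helper names, not part of the proposed tooling.

```bash
# Hypothetical helper: returns 0 for tags of the form vMAJOR.MINOR.PATCH.
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Hypothetical helper: prints the next tag for a bump level (major|minor|patch).
next_version() {
  local version="${1#v}" level="$2"
  local major minor patch rest
  major="${version%%.*}"    # text before the first dot
  rest="${version#*.}"      # text after the first dot
  minor="${rest%%.*}"
  patch="${rest#*.}"
  case "$level" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown bump level: $level" >&2; return 1 ;;
  esac
  echo "v${major}.${minor}.${patch}"
}

# Usage sketch:
#   next_version v1.0.0 patch   # first patch after launch
#   next_version v1.0.1 minor   # first feature update post-launch
```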
---
## 4. Build & Publish
### Docker Image Strategy
| Image | Registry Path | Dockerfile Directory | Size Target |
|-------|--------------|----------------------|-------------|
| `letsbe/safety-wrapper` | `code.letsbe.solutions/letsbe/safety-wrapper` | `packages/safety-wrapper/` | <150MB |
| `letsbe/secrets-proxy` | `code.letsbe.solutions/letsbe/secrets-proxy` | `packages/secrets-proxy/` | <100MB |
| `letsbe/hub` | `code.letsbe.solutions/letsbe/hub` | `packages/hub/` | <500MB |
| `letsbe/ansible-runner` | `code.letsbe.solutions/letsbe/ansible-runner` | `packages/provisioner/` | Existing |

The Node.js images are built with the repository root as the Docker build context (`docker build -f packages/<name>/Dockerfile .`), so each Dockerfile can copy the root `package-lock.json` and sibling workspace packages.
### Multi-Stage Dockerfile Pattern
```dockerfile
# packages/safety-wrapper/Dockerfile
# Stage 1: Dependencies
FROM node:22-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
COPY packages/safety-wrapper/package.json ./packages/safety-wrapper/
COPY packages/shared-types/package.json ./packages/shared-types/
RUN npm ci --workspace=packages/safety-wrapper --workspace=packages/shared-types
# Stage 2: Build
FROM node:22-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY packages/safety-wrapper/ ./packages/safety-wrapper/
COPY packages/shared-types/ ./packages/shared-types/
COPY turbo.json package.json ./
RUN npx turbo run build --filter=safety-wrapper
# Stage 3: Production
FROM node:22-alpine AS runner
WORKDIR /app
RUN addgroup -g 1001 -S letsbe && adduser -S -u 1001 -G letsbe letsbe
COPY --from=builder /app/packages/safety-wrapper/dist ./dist
COPY --from=builder /app/packages/safety-wrapper/package.json ./
COPY --from=deps /app/node_modules ./node_modules
USER letsbe
EXPOSE 8200
CMD ["node", "dist/index.js"]
```
### Image Tagging
| Tag | When | Purpose |
|-----|------|---------|
| `:dev` | On merge to `develop` | Staging deployment |
| `:latest` | On merge to `main` | Production deployment |
| `:<git-sha>` | On every build | Immutable reference for debugging |
| `:v1.0.0` | On release tag | Version-pinned deployment |
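
The tagging table can be expressed as a single mapping from Git ref to image tags, which would let every image pipeline share one rule. This is a sketch under assumptions: `image_tags` is a hypothetical helper, and it assumes the runner exposes `GITHUB_REF` and `GITHUB_SHA` in the way the GitHub Actions-compatible syntax suggests.

```bash
# Hypothetical helper: prints the space-separated tags a build should receive.
#   $1 = git ref (e.g. refs/heads/main), $2 = commit SHA
image_tags() {
  local ref="$1" sha="$2"
  case "$ref" in
    refs/heads/main)    echo "latest $sha" ;;              # production
    refs/heads/develop) echo "dev $sha" ;;                 # staging
    refs/tags/v*)       echo "${ref#refs/tags/} $sha" ;;   # version-pinned release
    *)                  echo "$sha" ;;                     # immutable reference only
  esac
}

# Usage sketch (assumes GITHUB_REF and GITHUB_SHA are set by the runner):
#   for tag in $(image_tags "$GITHUB_REF" "$GITHUB_SHA"); do
#     docker tag hub:build "code.letsbe.solutions/letsbe/hub:$tag"
#   done
```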
---
## 5. Deployment Workflows
### 5.1 Central Platform (Hub) Deployment
```yaml
# .gitea/workflows/deploy-hub.yml
name: Deploy Hub
on:
  push:
    branches: [main]
    paths: ['packages/hub/**', 'packages/shared-prisma/**']
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Deploy to production
        run: |
          ssh -o StrictHostKeyChecking=no deploy@hub.letsbe.biz << 'EOF'
          set -e  # stop at the first failing command so the job fails too
          cd /opt/letsbe/hub
          docker compose pull hub
          docker compose up -d hub
          # Wait for health check
          for i in $(seq 1 30); do
            curl -sf http://localhost:3847/api/health && break || sleep 2
          done
          # Fail the deploy if the Hub never became healthy
          curl -sf http://localhost:3847/api/health
          # Run migrations (-T: no TTY is available in CI)
          docker compose exec -T hub npx prisma migrate deploy
          EOF
```
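A retry loop of the form `cmd && break || sleep 2` finishes with the exit status of the last `sleep`, so on its own it cannot signal that the service never became healthy. The sketch below wraps the same polling idea in a function that returns non-zero once the attempts are exhausted; `wait_until` is a hypothetical helper name.

```bash
# Hypothetical helper: retry a probe command until it succeeds.
#   $1 = maximum attempts, $2 = delay in seconds, remaining args = probe command
# Returns 0 on the first success and 1 if every attempt fails.
wait_until() {
  local attempts="$1" delay="$2" i
  shift 2
  for i in $(seq 1 "$attempts"); do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Usage sketch, with the Hub health endpoint used in this document:
#   wait_until 30 2 curl -sf http://localhost:3847/api/health || exit 1
```

Because the probe is passed as arguments, the same function could serve the Hub, the Safety Wrapper, and the integration stack.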
### 5.2 Tenant Server Update Pipeline
Tenant servers are updated via the Hub push mechanism (see 03-DEPLOYMENT-STRATEGY §7).
```yaml
# .gitea/workflows/tenant-update.yml
name: Tenant Server Update
on:
  workflow_dispatch:
    inputs:
      component:
        description: 'Component to update'
        required: true
        type: choice
        options: [safety-wrapper, secrets-proxy, openclaw]
      strategy:
        description: 'Rollout strategy'
        required: true
        type: choice
        options: [staging-only, canary-5pct, canary-25pct, full-rollout]
jobs:
  prepare:
    runs-on: ubuntu-latest
    steps:
      - name: Verify image exists
        run: |
          docker manifest inspect code.letsbe.solutions/letsbe/${{ inputs.component }}:latest
  rollout:
    runs-on: ubuntu-latest
    needs: prepare
    steps:
      - name: Trigger Hub rollout API
        run: |
          # --fail makes the step fail when the Hub returns an HTTP error
          curl --fail -X POST https://hub.letsbe.biz/api/v1/admin/rollout \
            -H "Authorization: Bearer ${{ secrets.HUB_ADMIN_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{
              "component": "${{ inputs.component }}",
              "tag": "latest",
              "strategy": "${{ inputs.strategy }}"
            }'
```
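To make the rollout strategies concrete, the sketch below converts a strategy name into a number of production tenants. It is an illustration only: how the Hub actually selects tenants is defined in 03-DEPLOYMENT-STRATEGY §7, and both the function name `rollout_target_count` and the round-up behaviour are assumptions.

```bash
# Hypothetical helper: prints how many production tenants a strategy would touch.
#   $1 = strategy name, $2 = total number of production tenants
# Assumption: percentages round up, so a canary on a small fleet still hits one tenant.
rollout_target_count() {
  local strategy="$1" total="$2" pct
  case "$strategy" in
    staging-only) echo 0; return 0 ;;   # no production tenants
    canary-5pct)  pct=5 ;;
    canary-25pct) pct=25 ;;
    full-rollout) pct=100 ;;
    *) echo "unknown strategy: $strategy" >&2; return 1 ;;
  esac
  echo $(( (total * pct + 99) / 100 ))  # integer ceiling of total * pct / 100
}
```

Under these assumptions, a fleet of 200 tenants gives 10 tenants for `canary-5pct` and 50 for `canary-25pct`.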
### 5.3 Staging Deployment (Automatic)
```yaml
# .gitea/workflows/deploy-staging.yml
name: Deploy Staging
on:
  push:
    branches: [develop]
jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - name: Deploy Hub to staging
        run: |
          ssh deploy@staging.letsbe.biz << 'EOF'
          set -e  # stop at the first failing command so the step fails
          cd /opt/letsbe/hub
          docker compose pull
          docker compose up -d
          # -T: no TTY is available in CI
          docker compose exec -T hub npx prisma migrate deploy
          EOF
      - name: Deploy tenant stack to staging VPS
        run: |
          ssh deploy@staging-tenant.letsbe.biz << 'EOF'
          set -e
          cd /opt/letsbe
          docker compose -f docker-compose.letsbe.yml pull
          docker compose -f docker-compose.letsbe.yml up -d
          EOF
      - name: Run smoke tests
        run: |
          curl -sf https://staging.letsbe.biz/api/health
          curl -sf https://staging-tenant.letsbe.biz:8200/health
          curl -sf https://staging-tenant.letsbe.biz:8100/health
```
---
## 6. Rollback Procedures
### 6.1 Hub Rollback
```bash
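# Roll back Hub to a previous version.
# Assumption: the compose file reads the image tag from a variable. HUB_TAG is a
# hypothetical name, mirroring SAFETY_WRAPPER_TAG in §6.2.
ssh deploy@hub.letsbe.biz << 'EOF'
set -e
cd /opt/letsbe/hub
# Pin the last known-good immutable tag. Re-pulling :latest would fetch the
# faulty build again, because :latest moves on every merge to main.
export HUB_TAG='<previous-sha>'
docker compose pull hub
docker compose up -d hub
# Verify health; the final curl fails the script if the Hub never recovers
for i in $(seq 1 30); do
  curl -sf http://localhost:3847/api/health && break || sleep 2
done
curl -sf http://localhost:3847/api/health
# Note: Prisma migrations are forward-only.
# `prisma migrate resolve --rolled-back` only marks a failed migration as rolled
# back in the migration history; schema changes must be undone by a new migration.
EOF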
```
### 6.2 Tenant Component Rollback
```bash
# Rollback Safety Wrapper on a specific tenant
ssh deploy@tenant-server << 'EOF'
set -e
cd /opt/letsbe
# Roll back to pinned SHA by setting the tag variable for this invocation
SAFETY_WRAPPER_TAG='<previous-sha>' docker compose -f docker-compose.letsbe.yml \
  up -d safety-wrapper
# Verify health
curl -sf http://127.0.0.1:8200/health
EOF
```
### 6.3 Rollback Decision Matrix
| Symptom | Action | Automatic? |
|---------|--------|-----------|
| Health check fails after deploy | Rollback to previous image | Yes (triggered by the post-deploy health check; a Docker restart policy alone only restarts the same image) |
| P0 tests fail in CI | Block merge; no deployment | Yes (CI gate) |
| Secrets redaction miss detected | EMERGENCY: rollback all tenants immediately | Manual (requires admin trigger) |
| Hub API errors >5% | Rollback Hub to previous version | Manual (monitoring alert) |
| Billing discrepancy | Investigate first; rollback billing code if confirmed | Manual |
### 6.4 Emergency Rollback Checklist
For critical security issues (e.g., redaction bypass):
1. **STOP** all tenant updates immediately (disable Hub rollout API)
2. **ROLLBACK** all affected components to last known-good version
3. **VERIFY** rollback successful (health checks, P0 tests)
4. **INVESTIGATE** root cause
5. **FIX** and add test case for the specific failure
6. **AUDIT** all tenants for potential exposure during the window
7. **NOTIFY** affected customers if secrets were potentially exposed
8. **POST-MORTEM** within 24 hours
---
## 7. Secret Management in CI
### Gitea Secrets Configuration
| Secret | Scope | Purpose |
|--------|-------|---------|
| `REGISTRY_USER` | Organization | Docker registry login |
| `REGISTRY_PASSWORD` | Organization | Docker registry password |
| `HUB_ADMIN_TOKEN` | Repository | Hub API authentication for deployments |
| `STAGING_SSH_KEY` | Repository | SSH key for staging deployment |
| `PRODUCTION_SSH_KEY` | Repository | SSH key for production deployment |
| `STRIPE_TEST_KEY` | Repository | Stripe test mode for integration tests |
### Rules
1. **Never** put secrets in workflow YAML files
2. **Never** echo secrets in CI logs (use `::add-mask::`)
3. **Never** pass secrets as command-line arguments (use environment variables)
4. SSH keys: use deploy keys with minimal permissions (read-only for CI, write for deploy)
5. Rotate all CI secrets quarterly
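
Rules 2 and 3 can be supported by two small shell helpers that a workflow step might source. This is a sketch, not part of the proposed pipelines: `require_env` and `mask_value` are hypothetical names, and the sketch assumes secrets reach the step as exported environment variables through the workflow's `env:` block.

```bash
# Hypothetical helper: fail early when a required variable is missing or empty.
# Only variable names are printed, never their values.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: emit the workflow command that asks the runner to mask
# a value in subsequent log output.
mask_value() {
  printf '::add-mask::%s\n' "$1"
}

# Usage sketch: the password is read from the environment and piped via stdin,
# so it never appears as a command-line argument.
#   require_env REGISTRY_USER REGISTRY_PASSWORD || exit 1
#   printf '%s' "$REGISTRY_PASSWORD" | docker login code.letsbe.solutions -u "$REGISTRY_USER" --password-stdin
```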
---
## 8. Quality Gates in CI
### Gate Configuration
```yaml
# In each pipeline, quality gates are enforced as job dependencies:
jobs:
  # Gate 1: Code quality
  lint:
    # Must pass before tests run
    ...
  typecheck:
    # Must pass before tests run
    ...
  # Gate 2: Correctness
  unit-tests:
    needs: [lint, typecheck]
    # Must pass before build
    ...
  # Gate 3: Security
  security-scan:
    needs: [lint]
    # Must pass before deploy
    ...
  # Gate 4: Build
  build:
    needs: [unit-tests, security-scan]
    # Must succeed before deploy
    ...
  # Gate 5: Deploy (only on protected branches)
  deploy:
    needs: [build]
    if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/develop'
    ...
```
### PR Merge Requirements
| Requirement | Enforcement |
|-------------|------------|
| All CI checks pass | Gitea branch protection rule |
| At least 1 approval | Gitea branch protection rule |
| No unresolved review comments | Convention (not enforced by Gitea) |
| P0 tests pass if security code changed | CI pipeline condition |
| No secrets detected in diff | trufflehog scan |
---
## 9. Monitoring & Alerting
### CI Pipeline Monitoring
| Metric | Alert Threshold | Action |
|--------|----------------|--------|
| Build duration | >15 min | Investigate; optimize caching |
| Test suite duration | >10 min | Investigate; parallelize tests |
| Failed builds on `develop` | >3 consecutive | Freeze merges; investigate |
| Failed deploys | Any | Automatic rollback; notify team |
| Security scan findings | Any critical | Block merge; assign to Security Lead |
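
The "more than 3 consecutive failures" rule for `develop` can be evaluated from a plain list of recent build results. The sketch below is illustrative: `trailing_failures` and `should_freeze_merges` are hypothetical names, and how the build history is fetched from Gitea is left out.

```bash
# Hypothetical helper: counts the failures at the end of a build history.
# Arguments are build results, oldest first, each "success" or "failure".
trailing_failures() {
  local count=0 result
  for result in "$@"; do
    if [ "$result" = "failure" ]; then
      count=$((count + 1))
    else
      count=0   # any success resets the streak
    fi
  done
  echo "$count"
}

# Returns 0 (freeze merges) when more than three consecutive builds have failed.
should_freeze_merges() {
  [ "$(trailing_failures "$@")" -gt 3 ]
}
```

With this reading of the threshold, three failures in a row do not yet freeze merges; the fourth does.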
### Deployment Monitoring
| Metric | Alert Threshold | Action |
|--------|----------------|--------|
| Hub health after deploy | Unhealthy for >60s | Automatic rollback |
| Tenant health after update | Unhealthy for >120s | Rollback specific tenant; pause rollout |
| Error rate post-deploy | >5% increase | Alert team; investigate |
| Latency post-deploy | >2× baseline | Alert team; investigate |
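
The health and latency thresholds in this table translate directly into simple comparisons. The sketch below shows one way a monitoring script might encode them; `post_deploy_action` and `latency_alert` are hypothetical names, and the inputs are assumed to come from whatever monitoring system is in place.

```bash
# Hypothetical helper: decides what to do after a deploy.
#   $1 = component (hub|tenant), $2 = seconds the component has been unhealthy
post_deploy_action() {
  local component="$1" unhealthy_seconds="$2" limit
  case "$component" in
    hub)    limit=60 ;;    # Hub: roll back after more than 60s unhealthy
    tenant) limit=120 ;;   # Tenant: roll back after more than 120s unhealthy
    *) echo "unknown component: $component" >&2; return 1 ;;
  esac
  if [ "$unhealthy_seconds" -gt "$limit" ]; then
    echo "rollback"
  else
    echo "wait"
  fi
}

# Returns 0 when current latency exceeds twice the baseline.
#   $1 = current latency in ms, $2 = baseline latency in ms
latency_alert() {
  [ "$1" -gt $(( 2 * $2 )) ]
}
```

For a tenant, a `rollback` result corresponds to rolling back that specific tenant and pausing the rollout, as listed in the table.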
### Notification Channels
| Event | Channel |
|-------|---------|
| CI failure on `main` | Team chat (immediate) |
| Security scan finding | Team chat + email to Security Lead |
| Deployment success | Team chat (informational) |
| Deployment failure | Team chat + email to on-call |
| Emergency rollback | Team chat + phone call to on-call |
---
*End of Document — 08 CI/CD Strategy*