# LetsBe Biz — Infrastructure Runbook
**Version:** 1.0
**Date:** February 26, 2026
**Authors:** Matt (Founder), Claude (Architecture)
**Status:** Engineering Spec — Ready for Implementation
**Companion docs:** Technical Architecture v1.2, Tool Catalog v2.2, Security & GDPR Framework v1.1
**Decision refs:** Foundation Document Decisions #18, #27

---
## 1. Purpose
This runbook is the operational reference for provisioning, managing, monitoring, and maintaining LetsBe Biz infrastructure. It covers the full lifecycle: from ordering a VPS through Netcup to deprovisioning a customer's server at account termination.
**Target audience:** Matt (operations), future engineering team, and the IT Admin AI agent (for self-referencing operational procedures).

---
## 2. Infrastructure Overview
### 2.1 Hosting Provider: Netcup
| Item | Detail |
|------|--------|
| **Provider** | Netcup GmbH (Karlsruhe, Germany) |
| **Product line** | VPS (Virtual Private Server) |
| **EU data center** | Netcup Nürnberg/Karlsruhe, Germany |
| **NA data center** | Netcup Manassas, Virginia, USA |
| **API** | SCP (Server Control Panel) REST API with OAuth2 Device Flow |
| **Hub integration** | Full — server ordering, power actions, metrics, snapshots, rescue mode via `netcupService.ts` |
### 2.2 Server Tiers
| Tier | vCPUs | RAM | Disk | Recommended Tools | Monthly Cost (est.) |
|------|-------|-----|------|-------------------|---------------------|
| Lite (€29) | 4 | 8 GB | 160 GB SSD | 5–8 tools | ~€8–12 |
| Build (€45) | 8 | 16 GB | 320 GB SSD | 10–15 tools | ~€14–18 |
| Scale (€75) | 12 | 32 GB | 640 GB SSD | 15–25 tools | ~€22–28 |
| Enterprise (€109) | 16 | 64 GB | 1.2 TB SSD | 28+ tools | ~€35–45 |
### 2.3 Network Architecture
```
Internet
Netcup VPS (public IP)
├── Port 80 (HTTP → 301 redirect to HTTPS)
├── Port 443 (HTTPS → nginx reverse proxy)
├── Port 22022 (SSH — hardened, key-only)
nginx (Alpine container)
├── *.{{domain}} → Route by subdomain to tool containers
│ ├── files.{{domain}} → 127.0.0.1:3023 (Nextcloud)
│ ├── crm.{{domain}} → 127.0.0.1:3025 (Odoo)
│ ├── chat.{{domain}} → 127.0.0.1:3026 (Chatwoot)
│ ├── blog.{{domain}} → 127.0.0.1:3029 (Ghost)
│ ├── mail.{{domain}} → 127.0.0.1:3031 (Stalwart Mail)
│ ├── ... (33 nginx configs total)
│ └── status.{{domain}} → 127.0.0.1:3008 (Uptime Kuma)
└── Internal only (not exposed via nginx):
├── 127.0.0.1:18789 (OpenClaw Gateway)
├── 127.0.0.1:8100 (Secrets Proxy)
└── Various internal tool ports
```
---
## 3. Provisioning Pipeline
### 3.1 End-to-End Flow
```
Customer signs up → Stripe payment → Hub creates Order
Hub Automation Worker (state machine)
├── PAYMENT_CONFIRMED → order VPS from Netcup (if AUTO mode)
├── AWAITING_SERVER → poll Netcup until VPS is ready
├── SERVER_READY → wait for DNS records
├── DNS_PENDING → verify A records for all subdomains
├── DNS_READY → trigger provisioning
├── PROVISIONING → spawn Docker provisioner container
│ │
│ ▼
│ letsbe-provisioner (10-step pipeline via SSH)
│ ├── Step 1: System packages (apt update, essentials)
│ ├── Step 2: Docker CE installation
│ ├── Step 3: Disable conflicting services
│ ├── Step 4: nginx + fallback config
│ ├── Step 5: UFW firewall (80, 443, 22022)
│ ├── Step 6: Admin user + SSH key (optional)
│ ├── Step 7: SSH hardening (port 22022, key-only)
│ ├── Step 8: Unattended security updates
│ ├── Step 9: Deploy tool stacks (docker-compose)
│ └── Step 10: Deploy OpenClaw + Safety Wrapper + bootstrap
├── FULFILLED → server is live, customer notified
└── FAILED → retry logic (1 min / 5 min / 15 min backoff, max 3 attempts)
```
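The retry schedule on the FAILED branch can be written as a small lookup. This is an illustrative sketch only: the function name `retry_delay_seconds` is hypothetical, and it assumes the three delays apply to attempts 1, 2 and 3 in that order, which the flow suggests but does not state outright.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delay (in seconds) before retry attempt N.
# Assumes attempts 1..3 use the 1 min / 5 min / 15 min delays in order.
retry_delay_seconds() {
  case "$1" in
    1) echo 60 ;;
    2) echo 300 ;;
    3) echo 900 ;;
    *) return 1 ;;   # no further retries: the order stays FAILED
  esac
}

# Example: print the schedule for all three attempts.
for attempt in 1 2 3; do
  printf 'attempt %d: wait %ds\n' "$attempt" "$(retry_delay_seconds "$attempt")"
done
```

A non-zero return for any attempt beyond the third gives the caller a simple signal to stop retrying.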
### 3.2 Provisioner Detail (setup.sh)
**Location:** `letsbe-provisioner/scripts/setup.sh` (~832 lines)
#### Step 1: System Packages
```bash
apt-get update && apt-get upgrade -y
apt-get install -y curl wget gnupg2 ca-certificates lsb-release apt-transport-https \
software-properties-common unzip jq htop iotop net-tools dnsutils certbot \
python3-certbot-nginx fail2ban rclone
```
#### Step 2: Docker CE
```bash
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" > /etc/apt/sources.list.d/docker.list
apt-get update && apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
systemctl enable --now docker
```
#### Step 3: Disable Conflicting Services
```bash
systemctl stop apache2 2>/dev/null || true
systemctl disable apache2 2>/dev/null || true
systemctl stop postfix 2>/dev/null || true
systemctl disable postfix 2>/dev/null || true
```
#### Step 4: nginx
Deploy nginx Alpine container with initial fallback config. SSL certificates provisioned via certbot after DNS is verified.
#### Step 5: UFW Firewall
```bash
ufw default deny incoming
ufw default allow outgoing
ufw allow 80/tcp # HTTP
ufw allow 443/tcp # HTTPS
ufw allow 22022/tcp # SSH (hardened port)
ufw allow 25/tcp # SMTP (Stalwart Mail)
ufw allow 587/tcp # SMTP submission
ufw allow 993/tcp # IMAPS
ufw --force enable
```
#### Step 6: Admin User
```bash
useradd -m -s /bin/bash -G docker letsbe-admin
mkdir -p /home/letsbe-admin/.ssh
echo "{{admin_ssh_public_key}}" > /home/letsbe-admin/.ssh/authorized_keys
chmod 700 /home/letsbe-admin/.ssh
chmod 600 /home/letsbe-admin/.ssh/authorized_keys
chown -R letsbe-admin:letsbe-admin /home/letsbe-admin/.ssh
```
#### Step 7: SSH Hardening
```bash
# /etc/ssh/sshd_config modifications:
Port 22022
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
LoginGraceTime 30
AllowUsers letsbe-admin
```
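Because a mistake in this step can lock everyone out, a presence check before restarting the SSH daemon may be worthwhile. The sketch below is an assumption, not part of `setup.sh`: `check_sshd_hardening` is a hypothetical name, and it only confirms that each directive appears as an exact, uncommented line. It does not resolve duplicates or `Include` files (for the effective configuration, `sshd -T` is the authoritative source).

```bash
#!/usr/bin/env bash
# Hypothetical pre-restart check: succeed only if the given sshd_config
# contains each Step 7 directive as an exact, uncommented line.
check_sshd_hardening() {
  local config=$1 directive missing=0
  for directive in "Port 22022" "PermitRootLogin no" "PasswordAuthentication no" \
                   "PubkeyAuthentication yes" "AllowUsers letsbe-admin"; do
    if ! grep -qxF -- "$directive" "$config"; then
      echo "missing: $directive" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Typical use on the server (commented out so the sketch has no side effects):
# check_sshd_hardening /etc/ssh/sshd_config && sshd -t && systemctl restart ssh
```

Since `AllowUsers letsbe-admin` and `PermitRootLogin no` remove every other login path, it is also worth confirming that the Step 6 admin user and key exist before the restart.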
#### Step 8: Unattended Security Updates
```bash
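apt-get install -y unattended-upgrades
# Enable periodic upgrades without the interactive debconf prompt (the provisioner runs unattended)
echo 'unattended-upgrades unattended-upgrades/enable_auto_updates boolean true' | debconf-set-selections
dpkg-reconfigure -f noninteractive unattended-upgrades
# Configure /etc/apt/apt.conf.d/50unattended-upgrades for security-only updates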
```
#### Step 9: Deploy Tool Stacks
For each tool selected by the customer:
```bash
# 1. Generate credentials (env_setup.sh)
# 50+ secrets: database passwords, admin tokens, API keys, JWT secrets
# Written to /opt/letsbe/env/credentials.env and per-tool .env files
# 2. Deploy Docker Compose stacks
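for stack in {{selected_tools}}; do
  cd "/opt/letsbe/stacks/$stack" || exit 1
  docker compose up -d
done
# 3. Deploy nginx configs per tool
for conf in {{selected_nginx_configs}}; do
  cp "/opt/letsbe/nginx/sites/$conf" /etc/nginx/sites-enabled/
done
nginx -t && nginx -s reload
# 4. Request SSL certificates
# The --nginx plugin validates via HTTP-01, which cannot issue wildcard names
# (Let's Encrypt requires DNS-01 for "*.{{domain}}"), so pass one -d per deployed subdomain.
certbot --nginx --non-interactive --agree-tos -m "ssl@{{domain}}" \
  -d "files.{{domain}}" -d "crm.{{domain}}"   # ...one -d per deployed subdomain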
```
#### Step 10: Deploy OpenClaw + Safety Wrapper + Bootstrap
```bash
# 1. Deploy OpenClaw container with Safety Wrapper extension pre-installed
cd /opt/letsbe/stacks/openclaw
docker compose up -d
# 2. Deploy Secrets Proxy
cd /opt/letsbe/stacks/secrets-proxy
docker compose up -d
# 3. Seed secrets registry from credentials.env
docker exec letsbe-openclaw /opt/letsbe/scripts/seed-secrets.sh
# 4. Generate tool-registry.json from deployed tools
docker exec letsbe-openclaw /opt/letsbe/scripts/generate-tool-registry.sh
# 5. Deploy SOUL.md files for each agent
# (generated from templates with tenant variables substituted)
# 6. Run initial setup browser automations
# (Cal.com, Chatwoot, Keycloak, Nextcloud, Stalwart Mail, Umami, Uptime Kuma)
# 7. Register with Hub
docker exec letsbe-openclaw /opt/letsbe/scripts/hub-register.sh
# 8. Clean up config.json (CRITICAL: remove plaintext passwords)
rm -f /opt/letsbe/config.json
```
### 3.3 Credential Generation (env_setup.sh)
**Location:** `letsbe-provisioner/scripts/env_setup.sh` (~678 lines)
Generates 50+ unique credentials per tenant:
| Category | Count | Examples |
|----------|-------|---------|
| Database passwords | 18 | PostgreSQL passwords for each tool with a DB |
| Admin passwords | 12 | Nextcloud admin, Keycloak admin, Odoo admin, etc. |
| API tokens | 10 | NocoDB API token, Ghost admin API key, etc. |
| JWT secrets | 5 | Chatwoot, Cal.com, OpenClaw, etc. |
| Encryption keys | 3 | Safety Wrapper registry key, backup encryption key |
| SSH keys | 2 | Admin key pair, Hub communication key |
| SMTP credentials | 2 | Stalwart Mail admin, relay credentials |
**Generation method:** `openssl rand -base64 32` for passwords, `openssl rand -hex 32` for tokens, `ssh-keygen -t ed25519` for SSH keys.
**Template rendering:** All `{{ variable }}` placeholders in Docker Compose files and nginx configs are substituted with generated values.
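
The substitution step can be pictured with a few lines of bash. This is a minimal sketch under stated assumptions, not the code in `env_setup.sh`: `render_template` is a hypothetical name, it accepts both the `{{ name }}` and `{{name}}` spellings, and it treats values as literal text.

```bash
#!/usr/bin/env bash
# Hypothetical renderer: reads a template on stdin and replaces
# {{ KEY }} / {{KEY}} with VALUE for every KEY=VALUE argument.
render_template() {
  local content pair key value
  content=$(cat)
  for pair in "$@"; do
    key=${pair%%=*}      # text before the first "="
    value=${pair#*=}     # text after the first "=" (may itself contain "=")
    # Quoted pattern and replacement are taken literally (no globbing, no "&" expansion).
    content=${content//"{{ $key }}"/"$value"}
    content=${content//"{{$key}}"/"$value"}
  done
  printf '%s\n' "$content"
}

# Example: fill a one-line template from stdin.
printf 'server_name files.{{ domain }};\n' | render_template domain=example.com
# prints: server_name files.example.com;
```

Only named keys are replaced, so Docker format strings such as `{{.Names}}` that appear elsewhere in the scripts would be left untouched by this approach.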
### 3.4 Post-Provisioning Verification
After step 10 completes, the provisioner runs health checks:
```bash
# 1. Verify all containers are running (-a includes exited containers, which plain `docker ps` hides)
docker ps -a --format "{{.Names}}: {{.Status}}" | grep -v ": Up" && exit 1
# 2. Verify nginx is serving
curl -sf https://{{domain}} > /dev/null || exit 1
# 3. Verify each tool's health endpoint
for tool in {{health_check_urls}}; do
curl -sf "$tool" > /dev/null || echo "WARNING: $tool not responding"
done
# 4. Verify Safety Wrapper registered with Hub
curl -sf http://127.0.0.1:8100/health || exit 1
# 5. Verify OpenClaw is responsive
curl -sf http://127.0.0.1:18789/health || exit 1
# 6. Report success to Hub
curl -sf -X PATCH "{{hub_url}}/api/v1/jobs/{{job_id}}" \
  -H "Authorization: Bearer {{runner_token}}" \
  -H "Content-Type: application/json" \
  -d '{"status": "COMPLETED"}'
```
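Containers often need some time after `docker compose up -d` before their health endpoints answer, so a single `curl` can fail spuriously. One way to absorb that is a bounded retry wrapper. The helper below is a sketch; the name `retry` and the attempt and delay values in the usage comment are assumptions rather than values taken from the provisioner.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to MAX times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local max=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Example use in the checks above (commented out; needs a running stack):
# retry 10 6 curl -sf http://127.0.0.1:18789/health > /dev/null || exit 1
```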
---
## 4. Backup System
### 4.1 Backup Architecture
**Location:** `letsbe-provisioner/scripts/backups.sh` (~473 lines)
**Schedule:** Daily via cron at 02:00 server local time
**Retention:** 7 daily backups + 4 weekly backups (rolling)
### 4.2 What Gets Backed Up
| Component | Method | Target |
|-----------|--------|--------|
| PostgreSQL databases (18) | `pg_dump --format=custom` | `/opt/letsbe/backups/daily/` |
| MySQL databases (2) | `mysqldump --single-transaction` | `/opt/letsbe/backups/daily/` |
| MongoDB databases (1) | `mongodump --archive` | `/opt/letsbe/backups/daily/` |
| Nextcloud files | rsync snapshot | `/opt/letsbe/backups/daily/nextcloud/` |
| Docker volumes (critical) | `docker run --volumes-from` tar | `/opt/letsbe/backups/daily/volumes/` |
| nginx configs | tar archive | `/opt/letsbe/backups/daily/nginx/` |
| OpenClaw state | tar of `~/.openclaw/` | `/opt/letsbe/backups/daily/openclaw/` |
| Safety Wrapper state | SQLite backup API | `/opt/letsbe/backups/daily/safety-wrapper/` |
| Credentials | Encrypted tar | `/opt/letsbe/backups/daily/credentials.enc` |
### 4.3 Remote Backup
After local backup completes, `rclone` syncs to a remote destination:
```bash
rclone sync /opt/letsbe/backups/ remote:backups/{{tenant_id}}/ \
--transfers 4 \
--checkers 8 \
--fast-list \
--log-file /var/log/letsbe/rclone.log
```
Remote destination options (configured per tenant):
- Netcup S3 (default)
- Customer-provided S3 bucket
- Customer-provided rclone remote
### 4.4 Backup Status Reporting
After each backup run, `backups.sh` writes a `backup-status.json`:
```json
{
  "timestamp": "2026-02-26T02:15:00Z",
  "status": "success",
  "duration_seconds": 847,
  "databases_backed_up": 21,
  "files_backed_up": true,
  "remote_sync": "success",
  "total_size_gb": 4.2,
  "errors": []
}
```
The Safety Wrapper monitors this file (Decision #27) and reports status to the Hub via heartbeat.
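
A check against this file might look like the following. It is a sketch, not the Safety Wrapper's actual logic: `backup_status_ok` is a hypothetical name, and the rule that both `status` and `remote_sync` must be `"success"` with an empty `errors` array is an assumption based on the example above. It relies on `jq`, which Step 1 installs.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed only if the status file reports a successful
# local backup and remote sync with no recorded errors. Requires jq.
backup_status_ok() {
  # jq -e exits non-zero when the filter's result is false or null.
  jq -e '.status == "success"
         and .remote_sync == "success"
         and (.errors | length) == 0' "$1" > /dev/null
}

# Example use on a tenant server (commented out; the file only exists there):
# backup_status_ok /opt/letsbe/backups/backup-status.json || echo "backup needs attention"
```

A fuller check would probably also compare `timestamp` against the current time, so that a stale file from a backup that silently stopped running is not read as healthy.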
### 4.5 Backup Rotation
```bash
# Daily: keep last 7 (-mindepth 1 stops find from matching, and deleting, daily/ itself)
find /opt/letsbe/backups/daily/ -mindepth 1 -maxdepth 1 -mtime +7 -exec rm -rf {} \;
# Weekly: copy Sunday's backup to weekly/, keep last 4
if [ "$(date +%u)" = "7" ]; then
  cp -a /opt/letsbe/backups/daily/ "/opt/letsbe/backups/weekly/$(date +%Y-%m-%d)/"
fi
find /opt/letsbe/backups/weekly/ -mindepth 1 -maxdepth 1 -mtime +28 -exec rm -rf {} \;
```
---
## 5. Restore Procedures
### 5.1 Per-Tool Restore
**Location:** `letsbe-provisioner/scripts/restore.sh` (~512 lines)
```bash
# Restore a specific tool's database from a daily backup
./restore.sh --tool nextcloud --date 2026-02-25
# Steps:
# 1. Stop the tool container
# 2. Restore database from backup
# 3. Restore files (if applicable)
# 4. Start the tool container
# 5. Verify health check
# 6. Report to Hub
```
### 5.2 Full Server Restore
For complete server recovery (e.g., VPS failure):
```
1. Order new VPS from Netcup (same region, same tier)
2. Run provisioner with --restore flag
- Steps 1-8: Standard server setup
- Step 9: Deploy tool stacks (empty)
- Step 10: Deploy OpenClaw + Safety Wrapper
3. Restore from remote backup:
rclone sync remote:backups/{{tenant_id}}/daily/ /opt/letsbe/backups/daily/
4. Run restore.sh --all
- Restores all 21 databases
- Restores all file volumes
- Restores OpenClaw state
- Restores Safety Wrapper secrets registry
- Restores credentials
5. Verify all tools are healthy
6. Update DNS if IP changed
7. Hub updates server connection record
```
### 5.3 Point-in-Time Recovery
For accidental data deletion by a user:
```
1. Identify the backup date that contains the needed data
2. Restore the specific tool to a temporary container:
./restore.sh --tool odoo --date 2026-02-23 --target temp
3. Extract the needed data from the temp container
4. Import the data into the production tool
5. Remove the temp container
```
---
## 6. Monitoring
### 6.1 Uptime Kuma (On-Tenant)
Each tenant VPS runs Uptime Kuma monitoring all local services:
| Monitor | Type | Interval | Alert Threshold |
|---------|------|----------|-----------------|
| nginx | HTTP(S) | 60s | 3 failures |
| Each tool container | HTTP | 120s | 3 failures |
| OpenClaw Gateway | HTTP (port 18789) | 60s | 2 failures |
| Secrets Proxy | HTTP (port 8100) | 60s | 2 failures |
| SSL certificate expiry | Certificate | Daily | 14 days before expiry |
| Disk usage | Push | 300s | >85% |
### 6.2 Hub-Level Monitoring
The Hub monitors all tenant servers centrally:
| Metric | Source | Check Interval | Alert |
|--------|--------|---------------|-------|
| Heartbeat received | Safety Wrapper | Expected every 5 min | Missing >15 min |
| Token usage rate | Safety Wrapper heartbeat | Every heartbeat | >90% pool consumed |
| Backup status | Safety Wrapper (reads backup-status.json) | Daily | Any backup failure |
| Container health | Portainer API (via Hub) | Every 10 min | Container crash/OOM |
| VPS metrics | Netcup SCP API | Every 15 min | CPU >90% sustained, disk >90% |
| OpenClaw version | Safety Wrapper heartbeat | Every heartbeat | Version mismatch with expected |
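
The heartbeat row reduces to simple timestamp arithmetic. The sketch below is illustrative and written in shell only to match the rest of this runbook; `heartbeat_state` and the `OK` / `LATE` / `ALERT` labels are hypothetical, and the Hub's real implementation is not shown in this document.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a tenant by the age of its last heartbeat.
# Arguments are Unix timestamps in seconds: last heartbeat, then "now".
heartbeat_state() {
  local last=$1 now=$2
  local age=$(( now - last ))
  if [ "$age" -gt 900 ]; then      # missing for more than 15 minutes: alert
    echo "ALERT"
  elif [ "$age" -gt 300 ]; then    # past the expected 5-minute interval
    echo "LATE"
  else
    echo "OK"
  fi
}

heartbeat_state 1000 1200   # heartbeat is 200 s old, prints OK
```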
### 6.3 GlitchTip (Error Tracking)
GlitchTip runs on each tenant and captures application errors from:
- OpenClaw (Node.js errors, unhandled rejections)
- Safety Wrapper (hook errors, tool execution failures)
- Tool containers that support Sentry-compatible error reporting
### 6.4 Diun (Container Update Notifications)
Diun monitors all Docker images for new releases:
```yaml
# /opt/letsbe/stacks/diun/docker-compose.yml
watch:
  schedule: "0 6 * * *" # Check daily at 06:00
notif:
  webhook:
    endpoint: "http://127.0.0.1:8100/webhooks/diun" # Safety Wrapper
    method: POST
```
The Safety Wrapper receives update notifications and:
1. Logs the available update
2. Reports to Hub via heartbeat
3. Does NOT auto-update (updates require IT Admin agent or manual action)
---
## 7. Maintenance Procedures
### 7.1 Tool Updates
Tool container updates are initiated by the IT Admin agent or manually:
```bash
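cd /opt/letsbe/stacks/{{tool}}
# 1. Tag the currently deployed image so a rollback target exists
docker tag {{tool}}:latest {{tool}}:previous
# 2. Back up the tool's database
/opt/letsbe/scripts/backups.sh --tool {{tool}}
# 3. Pull the new image
docker compose pull
# 4. Recreate the containers on the new image (brief downtime; this is not a rolling update)
docker compose up -d --force-recreate
# 5. Verify health check
curl -sf http://127.0.0.1:{{port}}/health
# 6. If the health check fails, roll back to the tagged image:
docker compose down
docker tag {{tool}}:previous {{tool}}:latest
docker compose up -d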
```
### 7.2 OpenClaw Updates
OpenClaw is pinned to a tested release tag. Update procedure:
```bash
# 1. Check upstream changelog for breaking changes
# 2. Test in staging VPS first
# 3. On tenant VPS:
cd /opt/letsbe/stacks/openclaw
# 4. Backup OpenClaw state
tar czf /opt/letsbe/backups/openclaw-pre-update.tar.gz ~/.openclaw/
# 5. Update image tag in docker-compose.yml
sed -i 's/openclaw:v2026.2.1/openclaw:v2026.3.0/' docker-compose.yml
# 6. Pull and recreate
docker compose pull && docker compose up -d --force-recreate
# 7. Verify
curl -sf http://127.0.0.1:18789/health
docker exec letsbe-openclaw openclaw --version
# 8. If verification fails, rollback:
docker compose down
sed -i 's/openclaw:v2026.3.0/openclaw:v2026.2.1/' docker-compose.yml
docker compose up -d
tar xzf /opt/letsbe/backups/openclaw-pre-update.tar.gz -C /
```
**Update cadence:** Monthly review of upstream changelog. Update only for security fixes or features we need. Never update on Fridays.
### 7.3 SSL Certificate Renewal
Let's Encrypt certificates auto-renew via certbot cron. Manual renewal if needed:
```bash
certbot renew --nginx --force-renewal
nginx -t && nginx -s reload
```
### 7.4 Credential Rotation
The IT Admin agent can rotate credentials for any tool:
```bash
# 1. Generate new credential
NEW_PASS=$(openssl rand -base64 32)
# 2. Update the tool's .env file ("|" as the sed delimiter, because base64 output can contain "/")
sed -i "s|^DB_PASSWORD=.*|DB_PASSWORD=$NEW_PASS|" /opt/letsbe/stacks/{{tool}}/.env
# 3. Update the database user's password
docker exec {{tool}}-db psql -c "ALTER USER {{user}} PASSWORD '$NEW_PASS';"
# 4. Restart the tool container
docker compose -f /opt/letsbe/stacks/{{tool}}/docker-compose.yml restart
# 5. Update the secrets registry
# (Safety Wrapper detects .env change and updates registry automatically)
# 6. Verify tool health
curl -sf http://127.0.0.1:{{port}}/health
```
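Values from `openssl rand -base64 32` can contain `/`, `+` and `=`, which makes in-place `sed` edits sensitive to the chosen delimiter. An alternative that never interprets the value is to rewrite the line without a regex replacement. The helper below is a sketch under that assumption; `set_env_var` is a hypothetical name and is not part of the provisioner scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in an env file without passing VALUE
# through sed, so any characters in the value are written literally.
set_env_var() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  # Keep every line except an existing assignment of KEY
  # (grep exits 1 when nothing is left, which is not an error here).
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through the original file to keep its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example use (commented out; the path only exists on a tenant server):
# set_env_var /opt/letsbe/stacks/{{tool}}/.env DB_PASSWORD "$NEW_PASS"
```

The assignment is appended at the end of the file, so line order can change; that is harmless for Docker Compose `.env` files, where order does not matter unless a key is duplicated.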
### 7.5 Disk Space Management
When disk usage exceeds 85%:
```bash
# 1. Check disk usage by directory
du -sh /opt/letsbe/stacks/* | sort -rh | head -20
du -sh /opt/letsbe/backups/* | sort -rh
# 2. Clean Docker resources
docker system prune -f # Remove stopped containers, unused networks
docker image prune -a -f # Remove unused images
docker volume prune -f # Remove unused volumes (CAREFUL: verify first)
# 3. Clean old logs
find /var/log -name "*.gz" -mtime +30 -delete
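# Gauge container log volume (stdout lines from the last 30 days); this only counts, it deletes nothing
docker container ls -a --format "{{.Names}}" | xargs -I {} docker logs {} --since 720h 2>/dev/null | wc -l
# 4. Clean old backups (if rotation isn't catching them); -mindepth 1 protects daily/ itself
find /opt/letsbe/backups/daily/ -mindepth 1 -maxdepth 1 -mtime +7 -exec rm -rf {} \;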
# 5. If still above 85%, recommend tier upgrade to user
```
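The 85% trigger itself can be checked from a script before any cleanup runs. This is a small sketch, not an excerpt from the provisioner: `disk_used_pct` is a hypothetical name, and the fallback to `/` exists only so the example runs on machines without `/opt/letsbe`.

```bash
#!/usr/bin/env bash
# Print the used-space percentage (integer, no % sign) of the filesystem
# holding the given path, parsed from POSIX-format df output.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical guard around the 85% threshold; falls back to / when
# /opt/letsbe does not exist on the machine running the sketch.
target=/opt/letsbe
[ -d "$target" ] || target=/
used=$(disk_used_pct "$target")
if [ "$used" -gt 85 ]; then
  echo "disk at ${used}%: start cleanup"
else
  echo "disk at ${used}%: no action needed"
fi
```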
---
## 8. Deprovisioning
### 8.1 Customer Cancellation Flow
```
Customer requests cancellation
Hub: 48-hour cooling-off period
│ (Customer can cancel the cancellation)
Hub: 30-day data export window begins
│ Customer can:
│ - Download files via Nextcloud
│ - Export CRM data via Odoo
│ - Export email via IMAP
│ - SSH into server for full access
│ - Request a full backup via Hub
Hub: After 30 days → trigger deprovisioning
├── Revoke Safety Wrapper Hub API key
├── Stop all containers
├── Delete remote backups (rclone purge)
├── Request VPS deletion via Netcup API
│ └── Netcup wipes disk and destroys VPS
├── Delete all Netcup snapshots
├── Remove DNS records
└── Hub: soft-delete account data, retain billing records (7 years per HGB §257)
```
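The timeline in this flow is plain date arithmetic. The sketch below assumes the 30-day export window starts when the 48-hour cooling-off period ends, which the diagram implies but does not state; `deprovision_date` is a hypothetical helper, not part of the Hub.

```bash
#!/usr/bin/env bash
# Hypothetical helper: earliest deprovisioning date for a cancellation
# requested on the given UTC date, assuming 2 days of cooling-off followed
# by a 30-day export window (32 days in total).
# Uses GNU date (-d), as found on the Ubuntu servers this runbook targets.
deprovision_date() {
  local start
  start=$(date -u -d "$1" +%s) || return 1
  date -u -d "@$(( start + 32 * 86400 ))" +%F
}

deprovision_date 2026-03-01   # prints 2026-04-02
```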
### 8.2 Emergency Server Isolation
If a tenant VPS is compromised or abusing the platform:
```bash
# 1. Revoke Hub API key immediately (Hub admin panel)
# 2. SSH into server (port 22022):
ssh -p 22022 letsbe-admin@{{server_ip}}
# 3. Stop the AI runtime
docker stop letsbe-openclaw letsbe-secrets-proxy
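# 4. Block new outbound connections. The established inbound SSH session on port 22022
#    keeps working because UFW accepts ESTABLISHED/RELATED traffic before user rules.
#    Container traffic is forwarded by Docker's own iptables rules and is not covered
#    by this policy, so stop any remaining containers that must be cut off as well.
ufw default deny outgoing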
# 5. Take a forensic snapshot via Netcup API
# 6. Assess and decide: remediate or deprovision
```
---
## 9. Disaster Recovery
### 9.1 Scenarios
| Scenario | RTO | RPO | Procedure |
|----------|-----|-----|-----------|
| Single container crash | <5 min | 0 (no data loss) | Auto-restart via Docker restart policy |
| Multiple container failure | <30 min | 0 | IT Admin agent investigates, restarts services |
| VPS disk corruption | 24 hours | 24 hours (last backup) | New VPS + restore from remote backup |
| VPS total loss | 24 hours | 24 hours | New VPS (same region) + restore |
| Netcup data center outage | 48 hours | 24 hours | New VPS in alternate region + restore |
| Hub outage | <1 hour | 0 (tenant VPS operates independently) | Hub restart/failover |
| OpenRouter outage | <5 min | 0 | Model fallback chain engages automatically |
### 9.2 Tenant VPS Operates Independently
A key architectural property: **tenant VPS continues operating even if the Hub is down.** The Safety Wrapper operates with its local config, the AI agents continue serving the user, and tools continue running. The Hub is needed only for:
- Billing and subscription management
- Config updates (new agents, autonomy changes)
- Approval queue (if approvals are routed through Hub instead of local)
- Monitoring dashboards
### 9.3 Recovery Testing
**Monthly:** Restore a random tool's database from backup on a staging VPS to verify backup integrity.
**Quarterly:** Full server restore drill — order a new VPS, run complete restore from remote backup, verify all tools and agents are functional.

---
## 10. Security Operations
### 10.1 SSH Access Audit
```bash
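# Review successful SSH logins (the unit is ssh.service on Ubuntu, sshd.service on some other distros)
journalctl -u ssh -u sshd --since "7 days ago" | grep "Accepted"
# Review failed SSH attempts (with key-only auth, most probes show up as "Invalid user")
journalctl -u ssh -u sshd --since "7 days ago" | grep -E "Failed|Invalid user"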
# Check fail2ban status
fail2ban-client status sshd
```
### 10.2 Container Security
```bash
# Check which user each container runs as (an empty user means root; should be minimal)
docker ps -q | xargs -r docker inspect --format '{{.Name}}: user={{.Config.User}}'
# Check for containers with excessive privileges
docker ps -q | xargs -r docker inspect --format '{{.Name}}: privileged={{.HostConfig.Privileged}}'
# Verify network isolation
docker network ls
docker network inspect bridge
```
### 10.3 Vulnerability Scanning
```bash
# Scan Docker images for known vulnerabilities (using Trivy)
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
aquasec/trivy image --severity HIGH,CRITICAL {{image_name}}
# Scan all running containers
docker ps --format "{{.Image}}" | sort -u | while read -r img; do
  docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
    aquasec/trivy image --severity HIGH,CRITICAL "$img"
done
```
### 10.4 Incident Response Checklist
```
[ ] 1. Contain: Isolate affected VPS (Section 8.2)
[ ] 2. Assess: Determine scope (which data, which users affected)
[ ] 3. Preserve: Take forensic snapshot before changes
[ ] 4. Notify: Hub alerts → Matt → customer (within timelines per GDPR Art. 33/34)
[ ] 5. Remediate: Fix the vulnerability, rotate compromised credentials
[ ] 6. Restore: From clean backup if data was corrupted
[ ] 7. Verify: Full health check on all services
[ ] 8. Document: Post-mortem with root cause, timeline, actions taken
[ ] 9. Improve: Update runbook/monitoring to prevent recurrence
```
---
## 11. Common Operations Quick Reference
| Task | Command / Procedure |
|------|---------------------|
| Check all containers | `docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"` |
| Restart a tool | `cd /opt/letsbe/stacks/{{tool}} && docker compose restart` |
| View tool logs | `docker logs --tail 100 -f {{container_name}}` |
| Check disk usage | `df -h /opt/letsbe` |
| Check RAM usage | `free -h` |
| Run manual backup | `/opt/letsbe/scripts/backups.sh` |
| Restore a tool | `/opt/letsbe/scripts/restore.sh --tool {{tool}} --date YYYY-MM-DD` |
| Check SSL expiry | `certbot certificates` |
| Renew SSL | `certbot renew --nginx` |
| Check Safety Wrapper | `curl http://127.0.0.1:8100/health` |
| Check OpenClaw | `curl http://127.0.0.1:18789/health` |
| View backup status | `jq . /opt/letsbe/backups/backup-status.json` |
| Check firewall | `ufw status verbose` |
| Check fail2ban | `fail2ban-client status sshd` |
---
## 12. Changelog
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-02-26 | Initial runbook. Covers: Netcup provisioning, 10-step pipeline, credential generation, backup/restore, monitoring stack, maintenance procedures, deprovisioning, disaster recovery, security operations, quick reference. |