# LetsBe Biz — Infrastructure Runbook

**Version:** 1.0
**Date:** February 26, 2026
**Authors:** Matt (Founder), Claude (Architecture)
**Status:** Engineering Spec — Ready for Implementation
**Companion docs:** Technical Architecture v1.2, Tool Catalog v2.2, Security & GDPR Framework v1.1
**Decision refs:** Foundation Document Decisions #18, #27

---

## 1. Purpose

This runbook is the operational reference for provisioning, managing, monitoring, and maintaining LetsBe Biz infrastructure. It covers the full lifecycle: from ordering a VPS through Netcup to deprovisioning a customer's server at account termination.

**Target audience:** Matt (operations), future engineering team, and the IT Admin AI agent (for self-referencing operational procedures).

---

## 2. Infrastructure Overview

### 2.1 Hosting Provider: Netcup

| Item | Detail |
|------|--------|
| **Provider** | Netcup GmbH (Karlsruhe, Germany) |
| **Product line** | VPS (Virtual Private Server) |
| **EU data center** | Netcup Nürnberg/Karlsruhe, Germany |
| **NA data center** | Netcup Manassas, Virginia, USA |
| **API** | SCP (Server Control Panel) REST API with OAuth2 Device Flow |
| **Hub integration** | Full — server ordering, power actions, metrics, snapshots, rescue mode via `netcupService.ts` |

### 2.2 Server Tiers

| Tier | vCPUs | RAM | Disk | Recommended Tools | Monthly Cost (est.) |
|------|-------|-----|------|-------------------|---------------------|
| Lite (€29) | 4 | 8 GB | 160 GB SSD | 5–8 tools | ~€8–12 |
| Build (€45) | 8 | 16 GB | 320 GB SSD | 10–15 tools | ~€14–18 |
| Scale (€75) | 12 | 32 GB | 640 GB SSD | 15–25 tools | ~€22–28 |
| Enterprise (€109) | 16 | 64 GB | 1.2 TB SSD | 28+ tools | ~€35–45 |

### 2.3 Network Architecture

```
Internet
    │
    ▼
Netcup VPS (public IP)
    │
    ├── Port 80 (HTTP → 301 redirect to HTTPS)
    ├── Port 443 (HTTPS → nginx reverse proxy)
    ├── Port 22022 (SSH — hardened, key-only)
    │
    ▼
nginx (Alpine container)
    │
    ├── *.{{domain}} → Route by subdomain to tool containers
    │   ├── files.{{domain}} → 127.0.0.1:3023 (Nextcloud)
    │   ├── crm.{{domain}} → 127.0.0.1:3025 (Odoo)
    │   ├── chat.{{domain}} → 127.0.0.1:3026 (Chatwoot)
    │   ├── blog.{{domain}} → 127.0.0.1:3029 (Ghost)
    │   ├── mail.{{domain}} → 127.0.0.1:3031 (Stalwart Mail)
    │   ├── ... (33 nginx configs total)
    │   └── status.{{domain}} → 127.0.0.1:3008 (Uptime Kuma)
    │
    └── Internal only (not exposed via nginx):
        ├── 127.0.0.1:18789 (OpenClaw Gateway)
        ├── 127.0.0.1:8100 (Secrets Proxy)
        └── Various internal tool ports
```

---

## 3. Provisioning Pipeline

### 3.1 End-to-End Flow

```
Customer signs up → Stripe payment → Hub creates Order
    │
    ▼
Hub Automation Worker (state machine)
    │
    ├── PAYMENT_CONFIRMED → order VPS from Netcup (if AUTO mode)
    ├── AWAITING_SERVER → poll Netcup until VPS is ready
    ├── SERVER_READY → wait for DNS records
    ├── DNS_PENDING → verify A records for all subdomains
    ├── DNS_READY → trigger provisioning
    ├── PROVISIONING → spawn Docker provisioner container
    │   │
    │   ▼
    │   letsbe-provisioner (10-step pipeline via SSH)
    │   ├── Step 1: System packages (apt update, essentials)
    │   ├── Step 2: Docker CE installation
    │   ├── Step 3: Disable conflicting services
    │   ├── Step 4: nginx + fallback config
    │   ├── Step 5: UFW firewall (80, 443, 22022, plus mail ports 25, 587, 993)
    │   ├── Step 6: Admin user + SSH key (optional)
    │   ├── Step 7: SSH hardening (port 22022, key-only)
    │   ├── Step 8: Unattended security updates
    │   ├── Step 9: Deploy tool stacks (docker-compose)
    │   └── Step 10: Deploy OpenClaw + Safety Wrapper + bootstrap
    │
    ├── FULFILLED → server is live, customer notified
    └── FAILED → retry logic (1min / 5min / 15min backoff, max 3 attempts)
```
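The retry schedule in the `FAILED` branch can be expressed as a small lookup. This is only an illustrative sketch: `retry_delay_seconds` is a hypothetical helper name, and the real logic lives in the Hub Automation Worker, whose code is not shown in this runbook.

```bash
# Map a retry attempt number (1-3) to the backoff delay in seconds.
# Hypothetical helper for illustration; not part of the Hub codebase.
retry_delay_seconds() {
  case "$1" in
    1) echo 60 ;;   # first retry after 1 minute
    2) echo 300 ;;  # second retry after 5 minutes
    3) echo 900 ;;  # third retry after 15 minutes
    *) return 1 ;;  # more than 3 attempts: give up, the order stays FAILED
  esac
}
```

A caller would sleep for `$(retry_delay_seconds "$attempt")` seconds before re-running the failed step, and stop retrying once the function returns non-zero.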

### 3.2 Provisioner Detail (setup.sh)

**Location:** `letsbe-provisioner/scripts/setup.sh` (~832 lines)

#### Step 1: System Packages

```bash
export DEBIAN_FRONTEND=noninteractive  # no interactive prompts during unattended provisioning
apt-get update && apt-get upgrade -y
apt-get install -y curl wget gnupg2 ca-certificates lsb-release apt-transport-https \
  software-properties-common unzip jq htop iotop net-tools dnsutils certbot \
  python3-certbot-nginx fail2ban rclone
```

#### Step 2: Docker CE

```bash
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" > /etc/apt/sources.list.d/docker.list
apt-get update && apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
systemctl enable --now docker
```

#### Step 3: Disable Conflicting Services

```bash
systemctl stop apache2 2>/dev/null || true
systemctl disable apache2 2>/dev/null || true
systemctl stop postfix 2>/dev/null || true
systemctl disable postfix 2>/dev/null || true
```

#### Step 4: nginx

Deploy the nginx Alpine container with an initial fallback config. SSL certificates are provisioned via certbot in Step 9, after DNS is verified.

#### Step 5: UFW Firewall

```bash
ufw default deny incoming
ufw default allow outgoing
ufw allow 80/tcp    # HTTP
ufw allow 443/tcp   # HTTPS
ufw allow 22022/tcp # SSH (hardened port)
ufw allow 25/tcp    # SMTP (Stalwart Mail)
ufw allow 587/tcp   # SMTP submission
ufw allow 993/tcp   # IMAPS
ufw --force enable
```

#### Step 6: Admin User

```bash
useradd -m -s /bin/bash -G docker letsbe-admin
mkdir -p /home/letsbe-admin/.ssh
echo "{{admin_ssh_public_key}}" > /home/letsbe-admin/.ssh/authorized_keys
chmod 700 /home/letsbe-admin/.ssh
chmod 600 /home/letsbe-admin/.ssh/authorized_keys
chown -R letsbe-admin:letsbe-admin /home/letsbe-admin/.ssh
```

#### Step 7: SSH Hardening

```bash
# /etc/ssh/sshd_config modifications:
Port 22022
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
LoginGraceTime 30
AllowUsers letsbe-admin
```

#### Step 8: Unattended Security Updates

```bash
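apt-get install -y unattended-upgrades
# Enable without a prompt ("-plow" opens an interactive dialog and would stall the provisioner)
echo "unattended-upgrades unattended-upgrades/enable_auto_updates boolean true" | debconf-set-selections
dpkg-reconfigure -f noninteractive unattended-upgrades
# Configure /etc/apt/apt.conf.d/50unattended-upgrades for security-only updates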
```

#### Step 9: Deploy Tool Stacks

For each tool selected by the customer:

```bash
# 1. Generate credentials (env_setup.sh)
#    50+ secrets: database passwords, admin tokens, API keys, JWT secrets
#    Written to /opt/letsbe/env/credentials.env and per-tool .env files

# 2. Deploy Docker Compose stacks
for stack in {{selected_tools}}; do
  cd /opt/letsbe/stacks/$stack
  docker compose up -d
done

# 3. Deploy nginx configs per tool
for conf in {{selected_nginx_configs}}; do
  cp /opt/letsbe/nginx/sites/$conf /etc/nginx/sites-enabled/
done
nginx -t && nginx -s reload

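# 4. Request SSL certificates
#    Wildcard names ("*.{{domain}}") can only be issued via the DNS-01 challenge,
#    which the nginx plugin does not perform, so list each deployed subdomain explicitly.
certbot --nginx --non-interactive --agree-tos -m "ssl@{{domain}}" \
  -d "files.{{domain}}" -d "crm.{{domain}}"  # ...one -d per deployed subdomain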
```

#### Step 10: Deploy OpenClaw + Safety Wrapper + Bootstrap

```bash
# 1. Deploy OpenClaw container with Safety Wrapper extension pre-installed
cd /opt/letsbe/stacks/openclaw
docker compose up -d

# 2. Deploy Secrets Proxy
cd /opt/letsbe/stacks/secrets-proxy
docker compose up -d

# 3. Seed secrets registry from credentials.env
docker exec letsbe-openclaw /opt/letsbe/scripts/seed-secrets.sh

# 4. Generate tool-registry.json from deployed tools
docker exec letsbe-openclaw /opt/letsbe/scripts/generate-tool-registry.sh

# 5. Deploy SOUL.md files for each agent
# (generated from templates with tenant variables substituted)

# 6. Run initial setup browser automations
# (Cal.com, Chatwoot, Keycloak, Nextcloud, Stalwart Mail, Umami, Uptime Kuma)

# 7. Register with Hub
docker exec letsbe-openclaw /opt/letsbe/scripts/hub-register.sh

# 8. Clean up config.json (CRITICAL: remove plaintext passwords)
rm -f /opt/letsbe/config.json
```

### 3.3 Credential Generation (env_setup.sh)

**Location:** `letsbe-provisioner/scripts/env_setup.sh` (~678 lines)

Generates 50+ unique credentials per tenant:

| Category | Count | Examples |
|----------|-------|---------|
| Database passwords | 18 | PostgreSQL passwords for each tool with a DB |
| Admin passwords | 12 | Nextcloud admin, Keycloak admin, Odoo admin, etc. |
| API tokens | 10 | NocoDB API token, Ghost admin API key, etc. |
| JWT secrets | 5 | Chatwoot, Cal.com, OpenClaw, etc. |
| Encryption keys | 3 | Safety Wrapper registry key, backup encryption key |
| SSH keys | 2 | Admin key pair, Hub communication key |
| SMTP credentials | 2 | Stalwart Mail admin, relay credentials |

**Generation method:** `openssl rand -base64 32` for passwords, `openssl rand -hex 32` for tokens, `ssh-keygen -t ed25519` for SSH keys.

**Template rendering:** All `{{ variable }}` placeholders in Docker Compose files and nginx configs are substituted with generated values.
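A minimal sketch of the placeholder substitution, assuming one key/value pair is rendered per call. `render_template` is a hypothetical name and the actual logic in `env_setup.sh` may differ. The replacement is done literally with `awk` rather than `sed`, because `openssl rand -base64` output can contain `/` and `+`, which interfere with a `sed` expression that uses `/` as its delimiter.

```bash
# Replace every "{{ KEY }}" in the template read from stdin with VALUE.
# Hypothetical helper; key and value travel via the environment so that
# no character in the value is treated as a pattern or an escape.
render_template() {
  TPL_KEY="$1" TPL_VALUE="$2" awk '
    BEGIN {
      needle = "{{ " ENVIRON["TPL_KEY"] " }}"
      value = ENVIRON["TPL_VALUE"]
    }
    {
      line = $0; out = ""
      # Literal (non-regex) search and replace, left to right
      while ((pos = index(line, needle)) > 0) {
        out = out substr(line, 1, pos - 1) value
        line = substr(line, pos + length(needle))
      }
      print out line
    }'
}
```

For example, `render_template db_password "$(openssl rand -base64 32)" < compose.tpl > docker-compose.yml` would fill in a single placeholder; the file names here are illustrative only.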

### 3.4 Post-Provisioning Verification

After step 10 completes, the provisioner runs health checks:

```bash
# 1. Verify all containers are running (-a includes exited containers, which plain `docker ps` hides)
docker ps -a --format "{{.Names}}: {{.Status}}" | grep -v ": Up" && exit 1

# 2. Verify nginx is serving
curl -sf https://{{domain}} > /dev/null || exit 1

# 3. Verify each tool's health endpoint
for tool in {{health_check_urls}}; do
  curl -sf "$tool" > /dev/null || echo "WARNING: $tool not responding"
done

# 4. Verify the Safety Wrapper endpoint (port 8100) is responding
curl -sf http://127.0.0.1:8100/health || exit 1

# 5. Verify OpenClaw is responsive
curl -sf http://127.0.0.1:18789/health || exit 1

# 6. Report success to Hub
curl -sf -X PATCH "{{hub_url}}/api/v1/jobs/{{job_id}}" \
  -H "Authorization: Bearer {{runner_token}}" \
  -H "Content-Type: application/json" \
  -d '{"status": "COMPLETED"}'
```
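The container check in step 1 can also be written as a filter that names the offending containers, which makes provisioning logs easier to read. This is a sketch under assumptions: `failed_containers` is a hypothetical helper, and it additionally flags containers whose status reports `(unhealthy)`, which the original check does not do.

```bash
# Read "name: status" lines on stdin and print the names of containers
# that are not up or that report an unhealthy healthcheck.
# Exits non-zero if at least one such container was found.
failed_containers() {
  awk -F': ' '
    $2 !~ /^Up/ || $2 ~ /unhealthy/ { print $1; failed = 1 }
    END { exit (failed ? 1 : 0) }
  '
}
```

Fed from Docker, it would be used as `docker ps -a --format '{{.Names}}: {{.Status}}' | failed_containers || exit 1`.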

---

## 4. Backup System

### 4.1 Backup Architecture

**Location:** `letsbe-provisioner/scripts/backups.sh` (~473 lines)
**Schedule:** Daily via cron at 02:00 server local time
**Retention:** 7 daily backups + 4 weekly backups (rolling)

### 4.2 What Gets Backed Up

| Component | Method | Target |
|-----------|--------|--------|
| PostgreSQL databases (18) | `pg_dump --format=custom` | `/opt/letsbe/backups/daily/` |
| MySQL databases (2) | `mysqldump --single-transaction` | `/opt/letsbe/backups/daily/` |
| MongoDB databases (1) | `mongodump --archive` | `/opt/letsbe/backups/daily/` |
| Nextcloud files | rsync snapshot | `/opt/letsbe/backups/daily/nextcloud/` |
| Docker volumes (critical) | `docker run --volumes-from` tar | `/opt/letsbe/backups/daily/volumes/` |
| nginx configs | tar archive | `/opt/letsbe/backups/daily/nginx/` |
| OpenClaw state | tar of `~/.openclaw/` | `/opt/letsbe/backups/daily/openclaw/` |
| Safety Wrapper state | SQLite backup API | `/opt/letsbe/backups/daily/safety-wrapper/` |
| Credentials | Encrypted tar | `/opt/letsbe/backups/daily/credentials.enc` |

### 4.3 Remote Backup

After local backup completes, `rclone` syncs to a remote destination:

```bash
rclone sync /opt/letsbe/backups/ remote:backups/{{tenant_id}}/ \
  --transfers 4 \
  --checkers 8 \
  --fast-list \
  --log-file /var/log/letsbe/rclone.log
```

Remote destination options (configured per tenant):
- Netcup S3 (default)
- Customer-provided S3 bucket
- Customer-provided rclone remote

### 4.4 Backup Status Reporting

After each backup run, `backups.sh` writes a `backup-status.json`:

```json
{
  "timestamp": "2026-02-26T02:15:00Z",
  "status": "success",
  "duration_seconds": 847,
  "databases_backed_up": 21,
  "files_backed_up": true,
  "remote_sync": "success",
  "total_size_gb": 4.2,
  "errors": []
}
```

The Safety Wrapper monitors this file (Decision #27) and reports status to the Hub via heartbeat.
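One way `backups.sh` could emit this file is sketched below. `write_backup_status` and its argument order are assumptions made for illustration, and the `errors` array is left empty for brevity.

```bash
# Print a backup-status.json document to stdout.
# args: status duration_seconds databases files_backed_up remote_sync total_size_gb
# Hypothetical helper; the real backups.sh may assemble the file differently.
write_backup_status() {
  printf '{\n'
  printf '  "timestamp": "%s",\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  printf '  "status": "%s",\n' "$1"
  printf '  "duration_seconds": %d,\n' "$2"
  printf '  "databases_backed_up": %d,\n' "$3"
  printf '  "files_backed_up": %s,\n' "$4"
  printf '  "remote_sync": "%s",\n' "$5"
  printf '  "total_size_gb": %s,\n' "$6"
  printf '  "errors": []\n'
  printf '}\n'
}
```

Writing to a temporary file and renaming it into place (`write_backup_status ... > status.tmp && mv status.tmp backup-status.json`) would keep the Safety Wrapper from reading a half-written file.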

### 4.5 Backup Rotation

```bash
# Daily: keep last 7 (-mindepth 1 so the daily/ directory itself is never a deletion candidate)
find /opt/letsbe/backups/daily/ -mindepth 1 -maxdepth 1 -mtime +7 -exec rm -rf {} \;

# Weekly: copy Sunday's backup to weekly/, keep last 4
if [ "$(date +%u)" = "7" ]; then
  cp -a /opt/letsbe/backups/daily/ "/opt/letsbe/backups/weekly/$(date +%Y-%m-%d)/"
fi
find /opt/letsbe/backups/weekly/ -mindepth 1 -maxdepth 1 -mtime +28 -exec rm -rf {} \;
```
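Before letting rotation delete anything, it can help to list what would be removed. The sketch below is a dry-run variant of the `find` expressions above; `list_expired` is a hypothetical helper name.

```bash
# Print the entries directly under DIR whose modification time is more
# than DAYS days in the past. Nothing is deleted.
# args: DIR DAYS
list_expired() {
  find "$1" -mindepth 1 -maxdepth 1 -mtime +"$2" -print
}
```

For example, `list_expired /opt/letsbe/backups/daily 7` shows the daily entries that the rotation step would remove on its next run.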

---

## 5. Restore Procedures

### 5.1 Per-Tool Restore

**Location:** `letsbe-provisioner/scripts/restore.sh` (~512 lines)

```bash
# Restore a specific tool's database from a daily backup
./restore.sh --tool nextcloud --date 2026-02-25

# Steps:
# 1. Stop the tool container
# 2. Restore database from backup
# 3. Restore files (if applicable)
# 4. Start the tool container
# 5. Verify health check
# 6. Report to Hub
```
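The flag handling for `restore.sh` might look like the sketch below. It covers only the `--tool`, `--date`, and `--target` flags used in this runbook; the function name, the variable names, and the `production` default for `--target` are assumptions, not taken from the actual script.

```bash
# Parse restore.sh-style flags into TOOL, BACKUP_DATE and TARGET.
# Returns 2 on a usage error. Names and defaults are illustrative.
parse_restore_args() {
  TOOL=""
  BACKUP_DATE=""
  TARGET="production"   # assumed default when --target is omitted
  while [ $# -gt 0 ]; do
    case "$1" in
      --tool|--date|--target)
        [ $# -ge 2 ] || { echo "missing value for $1" >&2; return 2; }
        case "$1" in
          --tool)   TOOL="$2" ;;
          --date)   BACKUP_DATE="$2" ;;
          --target) TARGET="$2" ;;
        esac
        shift 2 ;;
      *)
        echo "unknown option: $1" >&2; return 2 ;;
    esac
  done
  [ -n "$TOOL" ] || { echo "--tool is required" >&2; return 2; }
  # Accept only YYYY-MM-DD so the date can be used safely in a path
  case "$BACKUP_DATE" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "--date must be YYYY-MM-DD" >&2; return 2 ;;
  esac
}
```

Rejecting malformed dates early avoids stopping the tool container (step 1) for a backup that cannot be located.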

### 5.2 Full Server Restore

For complete server recovery (e.g., VPS failure):

```
1. Order new VPS from Netcup (same region, same tier)
2. Run provisioner with --restore flag
   - Steps 1-8: Standard server setup
   - Step 9: Deploy tool stacks (empty)
   - Step 10: Deploy OpenClaw + Safety Wrapper
3. Restore from remote backup:
   rclone sync remote:backups/{{tenant_id}}/daily/ /opt/letsbe/backups/daily/
4. Run restore.sh --all
   - Restores all 21 databases
   - Restores all file volumes
   - Restores OpenClaw state
   - Restores Safety Wrapper secrets registry
   - Restores credentials
5. Verify all tools are healthy
6. Update DNS if IP changed
7. Hub updates server connection record
```

### 5.3 Point-in-Time Recovery

For accidental data deletion by a user:

```
1. Identify the backup date that contains the needed data
2. Restore the specific tool to a temporary container:
   ./restore.sh --tool odoo --date 2026-02-23 --target temp
3. Extract the needed data from the temp container
4. Import the data into the production tool
5. Remove the temp container
```

---

## 6. Monitoring

### 6.1 Uptime Kuma (On-Tenant)

Each tenant VPS runs Uptime Kuma monitoring all local services:

| Monitor | Type | Interval | Alert Threshold |
|---------|------|----------|-----------------|
| nginx | HTTP(S) | 60s | 3 failures |
| Each tool container | HTTP | 120s | 3 failures |
| OpenClaw Gateway | HTTP (port 18789) | 60s | 2 failures |
| Secrets Proxy | HTTP (port 8100) | 60s | 2 failures |
| SSL certificate expiry | Certificate | Daily | 14 days before expiry |
| Disk usage | Push | 300s | >85% |

### 6.2 Hub-Level Monitoring

The Hub monitors all tenant servers centrally:

| Metric | Source | Check Interval | Alert |
|--------|--------|---------------|-------|
| Heartbeat received | Safety Wrapper | Expected every 5 min | Missing >15 min |
| Token usage rate | Safety Wrapper heartbeat | Every heartbeat | >90% pool consumed |
| Backup status | Safety Wrapper (reads backup-status.json) | Daily | Any backup failure |
| Container health | Portainer API (via Hub) | Every 10 min | Container crash/OOM |
| VPS metrics | Netcup SCP API | Every 15 min | CPU >90% sustained, disk >90% |
| OpenClaw version | Safety Wrapper heartbeat | Every heartbeat | Version mismatch with expected |
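The heartbeat rule in the first row reduces to a timestamp comparison. The sketch below only illustrates that rule; `heartbeat_is_stale` is a hypothetical name, and the Hub's own implementation is not shown in this runbook.

```bash
# Succeed (exit 0) when the last heartbeat is older than the allowed age.
# args: last_heartbeat_epoch now_epoch [max_age_seconds]
# Default max age is 900 s, i.e. the "Missing >15 min" alert threshold.
heartbeat_is_stale() {
  local max_age="${3:-900}"
  [ $(( $2 - $1 )) -gt "$max_age" ]
}
```

With heartbeats expected every 5 minutes, the 15-minute threshold tolerates two missed heartbeats before alerting.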

### 6.3 GlitchTip (Error Tracking)

GlitchTip runs on each tenant and captures application errors from:
- OpenClaw (Node.js errors, unhandled rejections)
- Safety Wrapper (hook errors, tool execution failures)
- Tool containers that support Sentry-compatible error reporting

### 6.4 Diun (Container Update Notifications)

Diun monitors all Docker images for new releases:

```yaml
# Diun configuration file for the stack in /opt/letsbe/stacks/diun/ (this is Diun config, not Compose syntax)
watch:
  schedule: "0 6 * * *"  # Check daily at 06:00
notif:
  webhook:
    endpoint: "http://127.0.0.1:8100/webhooks/diun"  # Safety Wrapper
    method: POST
```

The Safety Wrapper receives update notifications and:
1. Logs the available update
2. Reports to Hub via heartbeat
3. Does NOT auto-update (updates require IT Admin agent or manual action)

---

## 7. Maintenance Procedures

### 7.1 Tool Updates

Tool container updates are initiated by the IT Admin agent or manually:

```bash
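# 1. Tag the current image so the rollback in step 5 has a target, then pull the new image
cd /opt/letsbe/stacks/{{tool}}
docker tag {{tool}}:latest {{tool}}:previous
docker compose pull

# 2. Backup the tool's database
/opt/letsbe/scripts/backups.sh --tool {{tool}}

# 3. Recreate the containers on the new image (brief downtime; this is not a rolling update)
docker compose up -d --force-recreate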

# 4. Verify health check
curl -sf http://127.0.0.1:{{port}}/health

# 5. If health check fails, rollback:
docker compose down
docker tag {{tool}}:previous {{tool}}:latest
docker compose up -d
```
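A freshly recreated container often needs a few seconds before its health endpoint answers, so a single `curl` right after `up -d` can trigger an unnecessary rollback. The sketch below retries any command a fixed number of times; `wait_for_health` is a hypothetical helper, not part of the existing scripts.

```bash
# Run a command until it succeeds or the attempts are used up.
# args: attempts delay_seconds command [args...]
wait_for_health() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Step 4 could then become `wait_for_health 10 3 curl -sf "http://127.0.0.1:{{port}}/health"`, with the rollback in step 5 running only if that returns non-zero. The attempt count and delay are illustrative values.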

### 7.2 OpenClaw Updates

OpenClaw is pinned to a tested release tag. Update procedure:

```bash
# 1. Check upstream changelog for breaking changes
# 2. Test in staging VPS first

# 3. On tenant VPS:
cd /opt/letsbe/stacks/openclaw

# 4. Backup OpenClaw state
tar czf /opt/letsbe/backups/openclaw-pre-update.tar.gz ~/.openclaw/

# 5. Update image tag in docker-compose.yml
sed -i 's/openclaw:v2026.2.1/openclaw:v2026.3.0/' docker-compose.yml

# 6. Pull and recreate
docker compose pull && docker compose up -d --force-recreate

# 7. Verify
curl -sf http://127.0.0.1:18789/health
docker exec letsbe-openclaw openclaw --version

# 8. If verification fails, rollback:
docker compose down
sed -i 's/openclaw:v2026.3.0/openclaw:v2026.2.1/' docker-compose.yml
docker compose up -d
tar xzf /opt/letsbe/backups/openclaw-pre-update.tar.gz -C /
```

**Update cadence:** Monthly review of upstream changelog. Update only for security fixes or features we need. Never update on Fridays.

### 7.3 SSL Certificate Renewal

Let's Encrypt certificates auto-renew via certbot cron. Manual renewal if needed:

```bash
certbot renew --nginx --force-renewal
nginx -t && nginx -s reload
```

### 7.4 Credential Rotation

The IT Admin agent can rotate credentials for any tool:

```bash
# 1. Generate new credential
NEW_PASS=$(openssl rand -base64 32)

# 2. Update the tool's .env file ("|" as delimiter: base64 passwords can contain "/")
sed -i "s|^DB_PASSWORD=.*|DB_PASSWORD=$NEW_PASS|" /opt/letsbe/stacks/{{tool}}/.env

# 3. Update the database user's password
docker exec {{tool}}-db psql -c "ALTER USER {{user}} PASSWORD '$NEW_PASS';"

# 4. Restart the tool container
docker compose -f /opt/letsbe/stacks/{{tool}}/docker-compose.yml restart

# 5. Update the secrets registry
# (Safety Wrapper detects .env change and updates registry automatically)

# 6. Verify tool health
curl -sf http://127.0.0.1:{{port}}/health
```
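Step 2 can be made independent of which characters the new secret contains by rewriting the `.env` file with `awk` instead of `sed`. This is a sketch: `set_env_var` is a hypothetical helper, and it assumes simple `KEY=value` lines without quoting.

```bash
# Set KEY=VALUE in an env file, replacing an existing line or appending one.
# The value is passed via the environment, so characters such as "/", "&"
# or "|" are written literally. The file is replaced atomically.
# args: file key value
set_env_var() {
  local file="$1" tmp
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  ENV_KEY="$2" ENV_VALUE="$3" awk '
    BEGIN { k = ENVIRON["ENV_KEY"]; v = ENVIRON["ENV_VALUE"]; found = 0 }
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && chmod 600 "$tmp" && mv "$tmp" "$file"
}
```

Usage would be `set_env_var /opt/letsbe/stacks/{{tool}}/.env DB_PASSWORD "$NEW_PASS"`. The `chmod 600` is a conservative assumption about the intended permissions of credential files.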

### 7.5 Disk Space Management

When disk usage exceeds 85%:

```bash
# 1. Check disk usage by directory
du -sh /opt/letsbe/stacks/* | sort -rh | head -20
du -sh /opt/letsbe/backups/* | sort -rh

# 2. Clean Docker resources
docker system prune -f          # Remove stopped containers, unused networks
docker image prune -a -f        # Remove unused images
docker volume prune -f          # Remove unused volumes (CAREFUL: verify first)

# 3. Clean old logs
find /var/log -name "*.gz" -mtime +30 -delete
# List the largest container logs (json-file log driver) to find noisy containers
du -sh /var/lib/docker/containers/*/*-json.log 2>/dev/null | sort -rh | head -10

# 4. Clean old backups (if rotation isn't catching them)
find /opt/letsbe/backups/daily/ -mindepth 1 -maxdepth 1 -mtime +7 -exec rm -rf {} \;

# 5. If still above 85%, recommend tier upgrade to user
```

---

## 8. Deprovisioning

### 8.1 Customer Cancellation Flow

```
Customer requests cancellation
    │
    ▼
Hub: 48-hour cooling-off period
    │ (Customer can cancel the cancellation)
    ▼
Hub: 30-day data export window begins
    │ Customer can:
    │ - Download files via Nextcloud
    │ - Export CRM data via Odoo
    │ - Export email via IMAP
    │ - SSH into server for full access
    │ - Request a full backup via Hub
    ▼
Hub: After 30 days → trigger deprovisioning
    │
    ├── Revoke Safety Wrapper Hub API key
    ├── Stop all containers
    ├── Delete remote backups (rclone purge)
    ├── Request VPS deletion via Netcup API
    │   └── Netcup wipes disk and destroys VPS
    ├── Delete all Netcup snapshots
    ├── Remove DNS records
    └── Hub: soft-delete account data, retain billing records for the statutory retention period (HGB §257)
```

### 8.2 Emergency Server Isolation

If a tenant VPS is compromised or abusing the platform:

```bash
# 1. Revoke Hub API key immediately (Hub admin panel)
# 2. SSH into server (port 22022):
ssh -p 22022 letsbe-admin@{{server_ip}}

# 3. Stop the AI runtime
docker stop letsbe-openclaw letsbe-secrets-proxy

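# 4. Block new outbound connections. UFW evaluates rules in order, so an allow
#    added after "deny out to any" would never match. Changing the default policy
#    instead keeps the inbound SSH session on 22022 working, because replies on
#    established connections are still permitted.
ufw default deny outgoing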

# 5. Take a forensic snapshot via Netcup API
# 6. Assess and decide: remediate or deprovision
```

---

## 9. Disaster Recovery

### 9.1 Scenarios

| Scenario | RTO | RPO | Procedure |
|----------|-----|-----|-----------|
| Single container crash | <5 min | 0 (no data loss) | Auto-restart via Docker restart policy |
| Multiple container failure | <30 min | 0 | IT Admin agent investigates, restarts services |
| VPS disk corruption | 2–4 hours | 24 hours (last backup) | New VPS + restore from remote backup |
| VPS total loss | 2–4 hours | 24 hours | New VPS (same region) + restore |
| Netcup data center outage | 4–8 hours | 24 hours | New VPS in alternate region + restore |
| Hub outage | <1 hour | 0 (tenant VPS operates independently) | Hub restart/failover |
| OpenRouter outage | <5 min | 0 | Model fallback chain engages automatically |

### 9.2 Tenant VPS Operates Independently

A key architectural property: **each tenant VPS continues operating even if the Hub is down.** The Safety Wrapper operates with its local config, the AI agents continue serving the user, and tools continue running. The Hub is needed only for:
- Billing and subscription management
- Config updates (new agents, autonomy changes)
- Approval queue (if approvals are routed through Hub instead of local)
- Monitoring dashboards

### 9.3 Recovery Testing

**Monthly:** Restore a random tool's database from backup on a staging VPS to verify backup integrity.

**Quarterly:** Full server restore drill — order a new VPS, run complete restore from remote backup, verify all tools and agents are functional.
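For the monthly check, the tool can be chosen mechanically so the same databases are not tested every time out of habit. A minimal sketch, assuming GNU `shuf` is available on the staging VPS; `pick_random_tool` is a hypothetical helper.

```bash
# Print one of the given tool names, chosen at random.
# args: tool [tool...]
pick_random_tool() {
  printf '%s\n' "$@" | shuf -n 1
}
```

On the staging VPS this could feed the restore script, for example `/opt/letsbe/scripts/restore.sh --tool "$(pick_random_tool nextcloud odoo ghost)" --date YYYY-MM-DD`, with the tool list taken from the tenant's deployed stacks.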

---

## 10. Security Operations

### 10.1 SSH Access Audit

```bash
# Review successful SSH logins
journalctl -u sshd --since "7 days ago" | grep "Accepted"

# Review failed SSH attempts
journalctl -u sshd --since "7 days ago" | grep "Failed"

# Check fail2ban status
fail2ban-client status sshd
```

### 10.2 Container Security

```bash
# Check which user each container runs as (an empty user means root; should be minimal)
docker ps -q | xargs -r docker inspect --format "{{.Name}}: user={{.Config.User}}"

# Check for containers with excessive privileges
docker ps -q | xargs -r docker inspect --format "{{.Name}}: privileged={{.HostConfig.Privileged}}"

# Verify network isolation
docker network ls
docker network inspect bridge
```

### 10.3 Vulnerability Scanning

```bash
# Scan Docker images for known vulnerabilities (using Trivy)
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
  aquasec/trivy image --severity HIGH,CRITICAL {{image_name}}

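# Scan the images of all running containers (same containerized Trivy as above)
docker ps --format "{{.Image}}" | sort -u | while read -r img; do
  docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
    aquasec/trivy image --severity HIGH,CRITICAL "$img"
done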
```

### 10.4 Incident Response Checklist

```
[ ] 1. Contain: Isolate affected VPS (Section 8.2)
[ ] 2. Assess: Determine scope (which data, which users affected)
[ ] 3. Preserve: Take forensic snapshot before changes
[ ] 4. Notify: Hub alerts → Matt → customer (within timelines per GDPR Art. 33/34)
[ ] 5. Remediate: Fix the vulnerability, rotate compromised credentials
[ ] 6. Restore: From clean backup if data was corrupted
[ ] 7. Verify: Full health check on all services
[ ] 8. Document: Post-mortem with root cause, timeline, actions taken
[ ] 9. Improve: Update runbook/monitoring to prevent recurrence
```

---

## 11. Common Operations Quick Reference

| Task | Command / Procedure |
|------|---------------------|
| Check all containers | `docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"` |
| Restart a tool | `cd /opt/letsbe/stacks/{{tool}} && docker compose restart` |
| View tool logs | `docker logs --tail 100 -f {{container_name}}` |
| Check disk usage | `df -h /opt/letsbe` |
| Check RAM usage | `free -h` |
| Run manual backup | `/opt/letsbe/scripts/backups.sh` |
| Restore a tool | `/opt/letsbe/scripts/restore.sh --tool {{tool}} --date YYYY-MM-DD` |
| Check SSL expiry | `certbot certificates` |
| Renew SSL | `certbot renew --nginx` |
| Check Safety Wrapper | `curl http://127.0.0.1:8100/health` |
| Check OpenClaw | `curl http://127.0.0.1:18789/health` |
| View backup status | `jq . /opt/letsbe/backups/backup-status.json` |
| Check firewall | `ufw status verbose` |
| Check fail2ban | `fail2ban-client status sshd` |

---

## 12. Changelog

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-02-26 | Initial runbook. Covers: Netcup provisioning, 10-step pipeline, credential generation, backup/restore, monitoring stack, maintenance procedures, deprovisioning, disaster recovery, security operations, quick reference. |