docs(ops): backup/restore + email deliverability runbooks

Two new runbooks under docs/runbooks/ plus the automation scripts the
backup runbook references. Both are written so an operator who has only
the off-site backup credentials and the runbook can recover the system
unaided.

Backup/restore (Phase 4a):
- docs/runbooks/backup-and-restore.md — covers what gets backed up
  (Postgres / MinIO / .env+ENCRYPTION_KEY), schedule (hourly DB +
  hourly MinIO mirror, 7-day hourly + 30-day daily retention),
  cold-restore procedure with row-count verification, weekly drill
- scripts/backup/pg-backup.sh — pg_dump → gzip → optional GPG → mc
  upload, fails loud
- scripts/backup/minio-mirror.sh — incremental mc mirror, no --remove
  flag so accidental deletes on the live bucket can't cascade
- scripts/backup/restore.sh — interactive prod restore + --drill mode
  that runs against a sandbox DB and diffs row counts

Email deliverability (Phase 4b):
- docs/runbooks/email-deliverability.md — what the CRM sends, DNS
  records (SPF/DKIM/DMARC/MX), per-port override implications,
  diagnosis flow ("didn't arrive" → 4-step checklist starting with
  EMAIL_REDIRECT_TO), provider migration plan, realapi suite as the
  end-to-end probe

Tests still 778/778 vitest, tsc/lint clean — these phases are docs +
shell scripts, no code changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Matt Ciaccio
2026-04-28 20:10:30 +02:00
parent a3305a94f3
commit 6eb0d3dc92
5 changed files with 620 additions and 0 deletions

View File

@@ -0,0 +1,63 @@
#!/usr/bin/env bash
# Hourly PostgreSQL backup for Port Nimara CRM.
#
# Reads DATABASE_URL and BACKUP_S3_* from the environment. Dumps to a
# tmpfile, gzips, optionally GPG-encrypts to BACKUP_GPG_RECIPIENT, and
# uploads to s3://${BACKUP_S3_BUCKET}/pg/<hostname>/<UTC-date>/<hour>.dump.gz[.gpg].
#
# Designed to fail loud: any non-zero exit halts the script and propagates
# to the cron / CI runner so the operator sees the failure.
set -euo pipefail
: "${DATABASE_URL:?DATABASE_URL not set}"
: "${BACKUP_S3_BUCKET:?BACKUP_S3_BUCKET not set}"
: "${BACKUP_S3_ENDPOINT:?BACKUP_S3_ENDPOINT not set}"
: "${BACKUP_S3_ACCESS_KEY:?BACKUP_S3_ACCESS_KEY not set}"
: "${BACKUP_S3_SECRET_KEY:?BACKUP_S3_SECRET_KEY not set}"
HOST="${BACKUP_HOST_OVERRIDE:-$(hostname -s)}"
DATE_UTC="$(date -u +%Y-%m-%d)"
HOUR_UTC="$(date -u +%H)"
WORKDIR="$(mktemp -d)"
trap 'rm -rf "$WORKDIR"' EXIT
DUMP_FILE="$WORKDIR/${HOUR_UTC}.dump"
ARCHIVE_NAME="${HOUR_UTC}.dump.gz"
echo "[$(date -u +%FT%TZ)] Dumping $DATABASE_URL$DUMP_FILE"
pg_dump --format=custom --compress=9 --no-owner --no-privileges \
--file="$DUMP_FILE" "$DATABASE_URL"
# pg_dump's `custom` format is already compressed, but we wrap in gzip so
# the file looks the same regardless of the dump format on disk.
gzip -n "$DUMP_FILE"
GZ_FILE="${DUMP_FILE}.gz"
# Optional GPG layer. Only encrypt if the recipient is configured.
if [[ -n "${BACKUP_GPG_RECIPIENT:-}" ]]; then
echo "[$(date -u +%FT%TZ)] Encrypting for $BACKUP_GPG_RECIPIENT"
gpg --batch --yes --trust-model always \
--recipient "$BACKUP_GPG_RECIPIENT" \
--encrypt --output "${GZ_FILE}.gpg" "$GZ_FILE"
rm "$GZ_FILE"
GZ_FILE="${GZ_FILE}.gpg"
ARCHIVE_NAME="${ARCHIVE_NAME}.gpg"
fi
# Configure mc client for the backup destination.
MC_ALIAS="bk-$$"
mc alias set "$MC_ALIAS" "$BACKUP_S3_ENDPOINT" \
"$BACKUP_S3_ACCESS_KEY" "$BACKUP_S3_SECRET_KEY" \
--api S3v4 >/dev/null
REMOTE_PATH="${MC_ALIAS}/${BACKUP_S3_BUCKET}/pg/${HOST}/${DATE_UTC}/${ARCHIVE_NAME}"
echo "[$(date -u +%FT%TZ)] Uploading → $REMOTE_PATH"
mc cp --quiet "$GZ_FILE" "$REMOTE_PATH"
# Tag with retention metadata so lifecycle rules can decide what to expire.
mc tag set "$REMOTE_PATH" "kind=hourly&host=${HOST}&date=${DATE_UTC}" >/dev/null
mc alias remove "$MC_ALIAS" >/dev/null
echo "[$(date -u +%FT%TZ)] OK ${ARCHIVE_NAME} ($(du -h "$GZ_FILE" | cut -f1))"