pn-new-crm/docs/deployment-plan.md
Matt 0416dc8d39 docs(launch): website-integration env vars + cutover sequence
deployment-plan.md gains a full env-var reference (CRM + website) and the cutover env-flip sequence; launch-readiness.md gets the 2026-06-02 closeout; BACKLOG.md adds the deferred integration-health-panel idea (section L).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 17:22:12 +02:00

Production Deployment Plan — Port Nimara CRM

Status: DRAFT · pre-deployment · 2026-05-31
Target: https://crm.portnimara.com on the PN Cloud server.
Companion: docs/launch-readiness.md (Initiative 5: cutover).
Credentials live in private/deployment-creds.md (gitignored); never put secrets in this file.

Guardrails (non-negotiable)

  1. No change to anything on the prod server without Matt's explicit per-action approval. Recon/reads are fine; every sudo, every file write, every docker mutation, every certbot run is approved individually before it runs.
  2. Documenso is VITAL. It has broken on past upgrades. Nothing touches the Documenso DB, volumes, or container until a verified backup + S3↔DB reconciliation exists AND the upgrade step is explicitly approved.
  3. Work one phase at a time; verify before moving on. Keep a rollback for each mutating step.

Access (established 2026-05-31)

| What | Detail | Verified |
| --- | --- | --- |
| Prod server (SSH) | 45.142.177.246:22022, user stefan, key id_ed25519_2026 (macOS keychain) | connected, key auth |
| Gitea API | https://code.letsbe.solutions as matt (admin); reads build status, warnings, errors | v1.25.5, repo letsbe/pn-new-crm |
| Container registry | code.letsbe.solutions/letsbe/pn-new-crm/{crm-app,crm-worker} | CI pushes :latest + :<sha> |

Notes:

  • stefan is unprivileged (uid 1000, not in the docker group; sudo prompts for a password). Every docker / nginx / certbot / cert-read step needs sudo (root pass in private/deployment-creds.md; VERIFY, because the per-server creds file had MOPC's pass by mistake).
  • Reading build logs: GET /api/v1/repos/letsbe/pn-new-crm/actions/tasks (run status) + per-job logs; latest main build is success.
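
The build-status read above can be sketched as a small read-only helper. This is a minimal sketch, not the project's tooling: `GITEA_TOKEN` is a hypothetical environment variable name, and the `limit` page-size parameter is assumed from Gitea's general pagination convention.

```shell
# Sketch: list recent Actions tasks for the repo via the Gitea API (read-only).
# GITEA_TOKEN is a hypothetical env var name; never hard-code the token.
GITEA_URL="https://code.letsbe.solutions"
REPO="letsbe/pn-new-crm"

tasks_url() {
  # Print the endpoint URL; $1 = optional page size (default 5).
  printf '%s/api/v1/repos/%s/actions/tasks?limit=%s\n' "$GITEA_URL" "$REPO" "${1:-5}"
}

fetch_tasks() {
  # -f makes curl exit non-zero on HTTP errors instead of printing an error page.
  curl -fsS -H "Authorization: token ${GITEA_TOKEN:?set GITEA_TOKEN first}" "$(tasks_url "$@")"
}
```

Calling `fetch_tasks 3` would return the three most recent tasks as JSON; inspect the status field of the newest `main` run before deploying.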

How builds reach prod

git push origin main → Gitea Actions .gitea/workflows/build.yml:

  1. lint job: pnpm lint + pnpm exec tsc --noEmit.
  2. build-and-push job (main only): builds Dockerfile → crm-app and Dockerfile.worker → crm-worker, pushes :latest + :<sha> to the Gitea registry.

Prod pulls those images — it does not build. So a deploy is: push → wait for green CI → docker compose pull + up -d on the server.
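
A minimal sketch of that server-side deploy step, assuming it is run from the deploy directory that holds docker-compose.prod.yml. The `APPLY` switch is a hypothetical safety guard added for illustration: by default the script only prints what it would run, which keeps it consistent with the per-action approval guardrail.

```shell
COMPOSE_FILE="docker-compose.prod.yml"

run() {
  # Print the command; execute it only when APPLY=1 (hypothetical guard).
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

deploy() {
  # Pull the images CI pushed, then recreate the containers.
  # The && stops the sequence if the pull fails.
  run sudo docker compose -f "$COMPOSE_FILE" pull &&
    run sudo docker compose -f "$COMPOSE_FILE" up -d
}
```

Run `deploy` to review the commands, then `APPLY=1 deploy` once each step has been approved.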


Prod stack (docker-compose.prod.yml)

| Service | Image | Notes |
| --- | --- | --- |
| postgres | postgres:16-alpine | self-contained, volume pgdata |
| redis | redis:7-alpine | self-contained, volume redisdata (BullMQ + socket.io adapter) |
| crm-app | registry crm-app:latest | host 7100 → container 3000 |
| crm-worker | registry crm-worker:latest | BullMQ worker |

  • Storage: no MinIO service in the compose — the CRM uses external MinIO via system_settings.storage_backend + getStorageBackend(). The existing prod MinIO (:9000, s3.conf / minio.conf nginx vhosts) is the backend. Confirm bucket + keys (creds file §3).
  • Decision needed: does the CRM get its own Postgres (the compose default, isolated pgdata) or reuse an existing prod Postgres instance? Default = the compose's own Postgres (cleanest isolation). Confirm.

Phase 1 — crm.portnimara.com go-live

DNS already points crm.portnimara.com at the server. No crm.portnimara nginx vhost exists yet (fresh setup). Template: portnimara_dev.conf (reverse-proxy + Certbot pattern already in use on this box).

Pre-flight (no approval needed — prep only)

  • Assemble the prod .env for the CRM. Source of truth: src/lib/env.ts (Zod schema) + .env.example. Critical keys:
    • APP_URL=https://crm.portnimara.com
    • DATABASE_URL (compose Postgres), REDIS_*
    • storage / MinIO (endpoint, access/secret, bucket) — creds file §3
    • DOCUMENSO_API_URL (bare host, no /api/v1), DOCUMENSO_API_VERSION, API key
    • better-auth secret, WEBSITE_INTAKE_SECRET, SMTP/IMAP
    • EMAIL_REDIRECT_TO MUST be unset in prod.
  • Server can pull from the registry: docker login code.letsbe.solutions with a registry token (creds file §2 — generate a Gitea token; do not bake the account password into the server).
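
The pre-flight rules above can be checked mechanically before the .env is copied to the server. This is a rough sketch, not a replacement for the Zod schema in src/lib/env.ts: it covers only the keys named explicitly in this section and assumes unquoted `KEY=value` lines.

```shell
check_env_file() {
  # $1 = path to the prod .env file. Returns 0 only if every check passes.
  local file="$1" rc=0 key
  for key in APP_URL DATABASE_URL DOCUMENSO_API_URL WEBSITE_INTAKE_SECRET; do
    # A key counts as present only when it has a non-empty value.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing: $key"; rc=1
    fi
  done
  # EMAIL_REDIRECT_TO must not be set in prod.
  if grep -Eq '^EMAIL_REDIRECT_TO=.+' "$file"; then
    echo "forbidden: EMAIL_REDIRECT_TO is set"; rc=1
  fi
  # DOCUMENSO_API_URL must be the bare host, without /api/v1.
  if grep -Eq '^DOCUMENSO_API_URL=.*/api/v1/?$' "$file"; then
    echo "bad: DOCUMENSO_API_URL must not include /api/v1"; rc=1
  fi
  return "$rc"
}
```

Usage: `check_env_file .env && echo "env looks sane"`. The full schema remains the source of truth; this only catches the mistakes called out above.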

Step 1 — nginx vhost (⚠ approval)

  1. Create /etc/nginx/sites-available/crm_portnimara.conf modelled on portnimara_dev.conf: port-80 → 443 redirect + .well-known/acme-challenge location; port-443 server proxy_pass http://127.0.0.1:7100 with the same header block (Host, X-Real-IP, CF-Connecting-IP, X-Forwarded-*, websocket Upgrade/Connection for socket.io), client_max_body_size 64M, proxy_read_timeout 300, buffering off. HTTP-only first (no ssl_* lines yet) so Certbot can complete the challenge.
  2. Symlink into sites-enabled/.
  3. sudo nginx -t — must pass. Then sudo systemctl reload nginx.
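
For orientation, the HTTP-only first pass of the vhost might look roughly like the output of the helper below. This is a sketch under assumptions, not a copy of portnimara_dev.conf: the ACME webroot path and the exact header lines are guesses and should be taken from the real template. The helper only prints the config so it can be reviewed and diffed before anything is written to the server.

```shell
render_vhost() {
  # Prints an HTTP-only vhost (no TLS directives yet) for review.
  cat <<'EOF'
server {
    listen 80;
    server_name crm.portnimara.com;

    client_max_body_size 64M;

    # Webroot path is an assumption; copy the one used in portnimara_dev.conf.
    location /.well-known/acme-challenge/ {
        root /var/www/html;
    }

    location / {
        proxy_pass http://127.0.0.1:7100;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header CF-Connecting-IP $http_cf_connecting_ip;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # websocket upgrade for socket.io
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 300;
        proxy_buffering off;
    }
}
EOF
}
```

`render_vhost > crm_portnimara.conf` produces a local draft; copying it into /etc/nginx/sites-available/ is the mutating step that needs approval.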

Step 2 — TLS cert (⚠ approval)

  • sudo certbot --nginx -d crm.portnimara.com — pulls + installs the cert, rewrites the vhost with the managed ssl_certificate lines + 80→443 redirect. Re-run sudo nginx -t + reload.

Step 3 — bring up the container (⚠ approval)

  1. Place docker-compose.prod.yml + the prod .env in the deploy dir (e.g. /opt/pn-crm — confirm location).
  2. sudo docker login code.letsbe.solutions (registry token).
  3. sudo docker compose -f docker-compose.prod.yml pull.
  4. sudo docker compose -f docker-compose.prod.yml up -d.
  5. Watch for errors: sudo docker compose logs -f crm-app crm-worker.
  6. Apply schema: migrations via psql (per CLAUDE.md db:migrate is broken) or the app's push path — confirm the prod migration approach.
  7. Seed/bootstrap the port + admin user as needed.

Verify

  • curl -fsS https://crm.portnimara.com/api/public/health → {status:"ok"...}
  • Authenticated health w/ X-Intake-Secret → {checks:{db,redis}}
  • Login loads, branding renders, a berth list + a deal render.
  • socket.io realtime connects (websocket upgrade through nginx works).
  • No 42703 column errors (restart crm-app after any schema change).
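
The public health check above can be scripted as a pass/fail probe. A minimal sketch, assuming the endpoint returns JSON with a literal `"status":"ok"` pair (the plan shows the shape only in shorthand):

```shell
health_ok() {
  # Reads a response body on stdin; succeeds if it contains "status":"ok",
  # tolerating whitespace around the colon.
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

check_public_health() {
  # -f fails on HTTP >= 400, so a 5xx never reaches the body check.
  curl -fsS "https://crm.portnimara.com/api/public/health" | health_ok
}
```

Usage: `check_public_health && echo healthy`. The authenticated variant would add the X-Intake-Secret header to the same kind of probe.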

Phase 2 — Documenso v1.13.1 → v2.x upgrade (VITAL — execute SOBER, heavily gated)

Do not execute while impaired. This is the production signing system. Every mutating step needs an explicit, sober go/no-go. The runbook below is reference; the actual run is a scheduled session.

Verified facts (2026-05-31 recon + research)

| Item | Value |
| --- | --- |
| Current version | documenso/documenso:v1.13.1 (Oct 2025, last v1) |
| Latest version | v2.11.0 (May 2026). Path: 1.13.1 → 2.0.0 → … → 2.11.0 (major jump) |
| Compose | /root/docker-compose/documenso/docker-compose.yml (project documenso-production, services documenso + database) |
| DB | postgres:15, db documenso_db, user admin, vol documenso-production_documenso-database → /var/lib/postgresql/data |
| App port | container 3000 → host 3020; served at https://signatures.portnimara.dev (nginx documenso.conf, direct, no Cloudflare) |
| Storage | external MinIO, bucket signatures @ s3.portnimara.com, region eu-central-1 |
| Signing cert | /opt/documenso/certificate.p12 (+ passphrase in env) |

Research conclusions (sources in chat):

  • v1 API survives in v2: "API V1 is stable but deprecated; nothing breaks." So the CRM keeps working on v1 API; flip to v2 later. (Will be explicitly re-tested against the clone in Phase 0 before committing.)
  • Postgres 15 is v2's official DB — no DB-engine upgrade needed.
  • Env vars carry over unchanged; only NEXTAUTH_URL is dropped in v2 (auth now derives from NEXT_PUBLIC_WEBAPP_URL, already set correctly) — harmless leftover.
  • Upgrade = pull new image + restart; prisma migrate deploy auto-runs all pending migrations on startup.
  • Known migration-failure history (issue #1880: NOT-NULL column added without backfill). 1.13.1 is past that one, but it's the failure pattern to expect — hence the clone dry-run.
  • The login bounce (non-Secure cookie / NEXTAUTH_URL quirk) is plausibly fixed in v2's reworked auth, but treat that as a hoped-for bonus, not the goal.

Locked decisions (per Matt, 2026-05-31)

  • Dry-run on a clone first: yes. Target latest v2.11.0, staged through v2.0.0.
  • No-downtime caveat: true zero-downtime is not possible (migrations run on restart). Goal = brief + pre-rehearsed: validate fully on the clone, pre-pull the image, then a fast prod cutover in a low-traffic window.
  • CRM stays on Documenso v1 API after upgrade.
  • Backups: pg_dump + cert + compose/env pulled to the Mac (private/documenso-backups/, gitignored) and a cold volume snapshot kept on-server for fastest rollback.
  • Privilege: root via su (stefan isn't in the docker group; sudo needs a password we don't have — root pass works for su).

Phase 0 — Dry-run on a disposable clone (zero prod risk)

  • pg_dump -Fc documenso_db (live, no downtime) → restore into a throwaway postgres:15 + documenso:v2.11.0 stack on a different compose project + port, with a copy of the signing cert.
  • Watch prisma migrate deploy run the full 1.13.1→2.11.0 chain. Confirm: all migrations succeed, app boots, login works, existing documents render.
  • Re-test the CRM's v1 API calls against the clone → expect 200s.
  • If a migration fails: capture it, fix forward (or decide a target version that's clean) BEFORE touching prod.

Phase A — Prod backups (after Phase 0 passes; verified before any change)

  • pg_dump -Fc documenso_db → pull to private/documenso-backups/ on the Mac (off-box). Plus a plain SQL dump.
  • Cold volume snapshot: stop stack → tar documenso-production_documenso-database → keep on-server + copy off. (This is the gold rollback — Prisma migrations aren't reversible.)
  • Copy compose file + env + /opt/documenso/{certificate.p12,private.key,certificate.crt}.
  • MinIO signatures: read-only object inventory ({key,size,lastModified,etag}) + DB→storage-key mapping export (Document/DocumentData → storage key) so files can be re-matched if linkage breaks.
  • Test-restore the dump into a throwaway PG15; record SHA-256s.
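
Recording the SHA-256s can be as simple as the helper below, which writes a `SHA256SUMS` manifest next to the backup files. A minimal sketch: the manifest name is a convention chosen here, and the fallback to `shasum` is for the Mac, where `sha256sum` may not be installed.

```shell
sha256_of() {
  # Print only the hex digest of file $1.
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

record_checksums() {
  # $1 = backup directory; (re)writes $1/SHA256SUMS for every regular file in it.
  local dir="$1" f name
  : > "$dir/SHA256SUMS"
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    name="$(basename "$f")"
    if [ "$name" = "SHA256SUMS" ]; then continue; fi
    printf '%s  %s\n' "$(sha256_of "$f")" "$name" >> "$dir/SHA256SUMS"
  done
}
```

Usage: `record_checksums private/documenso-backups`. On Linux the manifest can later be verified from inside that directory with `sha256sum -c SHA256SUMS`.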

Phase B — Collation pre-fix (low risk; validate need on the clone first)

  • REFRESH COLLATION VERSION on documenso_db (+ template1/postgres) + reindex, so the libc 2.36→2.41 mismatch can't interfere with migration index ops.
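
A sketch of the SQL involved, printed by a helper so it can be reviewed before anything runs. It assumes PostgreSQL 15 and a session connected to documenso_db (REINDEX DATABASE only accepts the current database). The reindex is placed before the refresh, on the reasoning that the recorded version should only be updated once the indexes have been rebuilt; validate both the need and the order on the clone first.

```shell
collation_sql() {
  cat <<'SQL'
-- Read-only: compare the recorded collation version with the one libc reports now.
SELECT datname, datcollversion,
       pg_database_collation_actual_version(oid) AS actual_version
FROM pg_database;

-- Mutating (needs approval). Run while connected to documenso_db.
REINDEX DATABASE documenso_db;
ALTER DATABASE documenso_db REFRESH COLLATION VERSION;
ALTER DATABASE template1 REFRESH COLLATION VERSION;
ALTER DATABASE postgres REFRESH COLLATION VERSION;
SQL
}
```

The SELECT alone is safe to run at any time; rows where the two version columns differ are the ones affected by the libc change.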

Phase C — Prod upgrade (staged, pinned tags, low-traffic window)

  • Pre-pull images. Edit compose: v1.13.1 → v2.0.0 → up -d → watch migration logs → verify.
  • Then v2.0.0 → v2.11.0 → verify. Keep postgres:15.
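
The pinned-tag edit in each stage can be done with a small helper rather than by hand, which makes the change easy to review and repeat. A minimal sketch, assuming the compose file pins the app as `image: documenso/documenso:<tag>`; it rewrites only that line and leaves the postgres:15 image untouched.

```shell
set_documenso_tag() {
  # $1 = compose file, $2 = new tag (e.g. v2.0.0).
  local file="$1" tag="$2" tmp
  tmp="$(mktemp)"
  sed -E "s#(image:[[:space:]]*documenso/documenso:)[^[:space:]]+#\1${tag}#" "$file" > "$tmp" || return 1
  # Write back through cat so the original file keeps its owner and mode.
  cat "$tmp" > "$file" && rm -f "$tmp"
  # Show the resulting line for confirmation.
  grep -E 'image:[[:space:]]*documenso/documenso:' "$file"
}
```

Usage on a copy first: `set_documenso_tag docker-compose.yml v2.0.0`, then diff against the backup taken in Phase A before running up -d.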

Phase D — Verify

  • Login works; an existing completed envelope's PDF resolves from MinIO; send a test envelope; webhook reaches the CRM (X-Documenso-Secret, idempotent handleDocumentCompleted); reminders/void work.
  • CRM unchanged (still v1 API).

Phase E — Rollback (any failure)

  • Revert image tag + restore the volume snapshot (and/or DB dump) → back to v1.13.1 exactly.

Until Phase 0 passes AND a sober Phase A/C is explicitly approved step-by-step, do not touch the Documenso container, DB, volumes, or /opt/documenso.


Open decisions / what I need from you

  1. MinIO creds filled; Documenso DB creds filled (creds file §3/§4). Still need the Documenso API token + webhook secret (generate after login as matt@portnimara.com).
  2. Verify the root/sudo password (value kept in private/deployment-creds.md, not in this file; confirmed it works for su to root; it is not stefan's sudo password).
  3. CRM Postgres: own (compose default) or reuse an existing instance?
  4. Deploy dir for the CRM on the server (/opt/pn-crm?).
  5. Registry pull token — Gitea token for docker login on the server.
  6. Documenso target = v2.11.0, staged, clone-validated first.
  7. Maintenance window for the (brief, unavoidable) Documenso restart downtime.
  8. Off-box backup destination confirmed = Mac private/documenso-backups/ + on-server volume snapshot.

Progress log

  • 2026-05-31: Access established (SSH + Gitea API). Read-only recon done (nginx templates, prod compose, host port 7100). CRM deploy plan drafted. Documenso fully diagnosed read-only (v1.13.1, healthy app+DB, login issue = wrong email @letsbe vs @portnimara.com + a non-Secure-cookie quirk; 5432 publicly exposed + brute-forced; libc collation mismatch). Researched v2 upgrade (v2.11.0 latest, PG15 ok, env vars carry over, v1 API survives). Upgrade runbook drafted. No prod changes made; no backups taken.
  • 2026-06-01: Phase 0 dry-run PASSED (local, zero prod impact). Read-only pg_dump of prod (3.5 MB — metadata only) → restored into a throwaway postgres:15 → booted documenso:v2.11.0 against it. Result: full v1.13.1→v2.11.0 chain applied cleanly (All migrations have been successfully applied, 140→157, none unfinished), app boots (home 302, signin 200, v2 api 200), and v1 API still answers (400 not 404) → CRM safe. Dump saved at private/documenso-backups/ (off-box backup). Dry-run stack torn down 2026-06-01 after the pass (docker compose -p documenso-dryrun down -v — containers + anonymous volume + network removed; restored clone gone, off-box dump retained). Compose file kept at private/documenso-dryrun/docker-compose.yml for a re-run. Prod still untouched.

Environment variables — initial deployment + cutover

Single source of truth for the env each instance needs for the website<->CRM integration (added 2026-06-02). Every website-side CRM var is a no-op when unset, so the marketing site behaves exactly as today until these are filled at cutover. Full CRM schema: src/lib/env.ts.

CRM instance (crm.portnimara.com)

| Var | Value | Notes |
| --- | --- | --- |
| APP_URL | https://crm.portnimara.com | Absolute URLs + email links (the inquiry sales-alert "Open in CRM" button). |
| WEBSITE_INTAKE_SECRET | shared secret | MUST equal the website's CRM_INTAKE_SECRET. If unset, /api/public/website-inquiries returns 503 and refuses all intake. |
| EMAIL_REDIRECT_TO | unset in prod | Dev-only reroute; the prod build guard fails if it is set. |
| DATABASE_URL, REDIS_*, storage/MinIO, DOCUMENSO_*, SMTP_*, better-auth secret | per .env | Standard (see Phase 1 Pre-flight). |

Per-port settings (stored in system_settings, set via Admin UI — NOT env):

  • website_intake_email_enabled — boolean, default OFF. Flip ON at cutover so the CRM sends the registrant confirmation + staff alert for website inquiries (berth / residence / contact), reusing the branded templates + per-port From. Keep OFF until the website's own sending is turned off (see WEBSITE_INQUIRY_EMAILS_DISABLED) to avoid double-sends.
  • inquiry_notification_recipients (JSON string[]) — staff who receive berth + contact-form inquiry alerts.
  • residential_notification_recipients (JSON string[]) — staff who receive residence inquiry alerts.
  • inquiry_contact_email (string) — fallback alert recipient + reply-to.

Website instance (Nuxt marketing site — repo ron/website.git)

New vars for the CRM integration (read via process.env in Nitro; all no-op when unset → site unchanged):

| Var | Value | Enables | Set when |
| --- | --- | --- | --- |
| CRM_INTAKE_URL | https://crm.portnimara.com (bare host, no trailing slash) | Inquiry dual-write delivery + base URL for the berth feed | Cutover (safe earlier; just starts populating website_submissions) |
| CRM_INTAKE_SECRET | shared secret | Auth for the dual-write (X-Webhook-Secret); MUST equal CRM WEBSITE_INTAKE_SECRET | With CRM_INTAKE_URL |
| CRM_BERTHS_ENABLED | 1 (or true/yes) | Switches the public berth map/list to read from CRM /api/public/berths instead of NocoDB (requires CRM_INTAKE_URL) | Cutover, after CRM berth data is migrated + verified |
| WEBSITE_INQUIRY_EMAILS_DISABLED | 1 | Turns OFF the website's own Gmail confirmation + alert emails, handing email ownership to the CRM | Cutover, flipped together with CRM website_intake_email_enabled = ON |

UTM: no env var — cookieless; the client plugin reads utm_* from the landing URL and forwards them via an x-utm header.

Existing website env (keep, unchanged): NocoDB url/token, SMTP user/pass, alertRecipientsBerths/Residences/Contact, RECAPTCHA_SECRET, NUXT_PUBLIC_RECAPTCHA_SITE_KEY, Directus url. NocoDB stays as the berth fallback + the dual-write's primary target until the old system is retired; SMTP + alert recipients stay until WEBSITE_INQUIRY_EMAILS_DISABLED is set.

Cutover env-flip sequence (website)

  1. Confirm CRM is up, berth data migrated, and WEBSITE_INTAKE_SECRET set on the CRM.
  2. Set website CRM_INTAKE_URL + CRM_INTAKE_SECRET → verify a test inquiry lands in website_submissions.
  3. Flip CRM website_intake_email_enabled = ON and website WEBSITE_INQUIRY_EMAILS_DISABLED = 1 together → CRM is the single email owner.
  4. Set website CRM_BERTHS_ENABLED = 1 → public map reads from the CRM.
  5. Watch errors; rollback = unset the website vars (instant revert to NocoDB + website email).
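
The dependencies between the website variables can be checked before each flip. A minimal sketch covering only the rules stated in this section; whether the site's own parser treats truthy values case-insensitively is not documented here, so the helper accepts exactly 1, true and yes.

```shell
is_truthy() {
  # Accepts the literal values listed for CRM_BERTHS_ENABLED.
  case "$1" in 1|true|yes) return 0 ;; *) return 1 ;; esac
}

check_website_env() {
  # Reads the website vars from the current environment; returns 0 if consistent.
  local rc=0
  if [ -n "${CRM_INTAKE_URL:-}" ]; then
    case "$CRM_INTAKE_URL" in
      */) echo "CRM_INTAKE_URL must not end with a slash"; rc=1 ;;
    esac
    if [ -z "${CRM_INTAKE_SECRET:-}" ]; then
      echo "CRM_INTAKE_SECRET must be set together with CRM_INTAKE_URL"; rc=1
    fi
  fi
  if is_truthy "${CRM_BERTHS_ENABLED:-}" && [ -z "${CRM_INTAKE_URL:-}" ]; then
    echo "CRM_BERTHS_ENABLED requires CRM_INTAKE_URL"; rc=1
  fi
  return "$rc"
}
```

With nothing set the check passes, matching the "no-op when unset" behaviour; it fails only on half-configured states such as the berth flag being on without an intake URL.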

Progress log (cont.)

  • 2026-06-02: Website integration prep (local only; no prod changes, nothing pushed). Website repo (main, uncommitted): env-gated berth feed (CRM_BERTHS_ENABLED), cookieless UTM forwarding (no env), inquiry dual-write (pre-existing). Website email kill-switch added (WEBSITE_INQUIRY_EMAILS_DISABLED). CRM repo: flag-gated email ownership (website_intake_email_enabled, default OFF) reusing the inquiry + residential templates plus a new contact-form alert template, hooked into /api/public/website-inquiries. New website env vars documented above. CRM tsc-clean + unit test added; website berth/UTM vue-tsc-clean. Nothing deployed.