pn-new-crm/docs/deployment-plan.md
Matt 31ba72f344 chore(launch-prep): hide unfinished report/import surfaces, defer big builds
Ship-what's-done prep ahead of the prod cutover (launch ~today):

- Hide Financial + Marketing report cards from the reports landing
  (both were "Builder in development" placeholders gated on unbuilt
  data sources). Sales/Operational/Custom + templates/scheduling/
  exports remain live.
- Trim the Custom-report card copy to match the shipped basic builder
  (no group-by/filters yet; the builder page header was already honest).
- Hide the Bulk Import mockup from search-nav-catalog + the admin
  sections browser; /admin/import is now unreachable from the UI.
- Correct client-facing doc over-claims (waiting-list "next-in-line
  notification", Import) in features-list.md + new-system-feature-summary.md.
- Un-stale BACKLOG.md (Documenso phases 2-7 confirmed shipped).
- Log decisions + deferred work (full importer, full custom-builder,
  waiting-list, maintenance-log, paper-upload bug) to launch-readiness.md.

Deferred-importer design spec added at
docs/superpowers/specs/2026-06-01-bulk-import-design.md.

Verified: tsc --noEmit clean, eslint clean on changed files,
1512/1519 vitest pass (7 failures are Redis-down, unrelated).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 16:39:51 +02:00

Production Deployment Plan — Port Nimara CRM

Status: DRAFT · pre-deployment · 2026-05-31

Target: https://crm.portnimara.com on the PN Cloud server.

Companion: docs/launch-readiness.md (Initiative 5: cutover).

Credentials live in private/deployment-creds.md (gitignored); never put secrets in this file.

Guardrails (non-negotiable)

  1. No change to anything on the prod server without Matt's explicit per-action approval. Recon/reads are fine; every sudo, every file write, every docker mutation, every certbot run is approved individually before it runs.
  2. Documenso is VITAL. It has broken on past upgrades. Nothing touches the Documenso DB, volumes, or container until a verified backup + S3↔DB reconciliation exists AND the upgrade step is explicitly approved.
  3. Work one phase at a time; verify before moving on. Keep a rollback for each mutating step.

Access (established 2026-05-31)

| What | Detail | Verified |
| --- | --- | --- |
| Prod server (SSH) | 45.142.177.246:22022, user stefan, key id_ed25519_2026 (macOS keychain) | connected, key auth |
| Gitea API | https://code.letsbe.solutions as matt (admin); reads build status, warnings, errors | v1.25.5, repo letsbe/pn-new-crm |
| Container registry | code.letsbe.solutions/letsbe/pn-new-crm/{crm-app,crm-worker} | CI pushes :latest + :<sha> |

Notes:

  • stefan is unprivileged (uid 1000, not in the docker group; sudo prompts for a password). Every docker / nginx / certbot / cert-read step needs sudo (root password in private/deployment-creds.md; VERIFY it, since the per-server creds file had MOPC's password by mistake).
  • Reading build logs: GET /api/v1/repos/letsbe/pn-new-crm/actions/tasks (run status) + per-job logs; latest main build is success.

How builds reach prod

git push origin main → Gitea Actions .gitea/workflows/build.yml:

  1. lint job: pnpm lint + pnpm exec tsc --noEmit.
  2. build-and-push job (main only): builds Dockerfile → crm-app and Dockerfile.worker → crm-worker, pushes :latest + :<sha> to the Gitea registry.

Prod pulls those images — it does not build. So a deploy is: push → wait for green CI → docker compose pull + up -d on the server.
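The pull-and-restart half of that flow can be sketched as a dry-run-first helper. This is a sketch, not the project's deploy script: the `run` and `deploy_crm` names and the `APPLY` switch are assumptions, and it presumes the CI run for the pushed commit is already green and that the compose file sits in the current directory. By default it only prints the commands, which fits the per-action approval guardrail.

```shell
# Hypothetical helper: print (default) or execute (APPLY=1) the deploy commands.
# Assumes docker-compose.prod.yml is in the current directory on the server.
COMPOSE_FILE="${COMPOSE_FILE:-docker-compose.prod.yml}"

run() {
  # Echo every command; execute it only when APPLY=1 is set explicitly.
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

deploy_crm() {
  # Pull the images CI pushed, then recreate the containers from them.
  run sudo docker compose -f "$COMPOSE_FILE" pull &&
  run sudo docker compose -f "$COMPOSE_FILE" up -d
}
```

Calling `deploy_crm` previews the two commands; `APPLY=1 deploy_crm` runs them once the step has been approved.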


Prod stack (docker-compose.prod.yml)

| Service | Image | Notes |
| --- | --- | --- |
| postgres | postgres:16-alpine | self-contained, volume pgdata |
| redis | redis:7-alpine | self-contained, volume redisdata (BullMQ + socket.io adapter) |
| crm-app | registry crm-app:latest | host 7100 → container 3000 |
| crm-worker | registry crm-worker:latest | BullMQ worker |
  • Storage: no MinIO service in the compose — the CRM uses external MinIO via system_settings.storage_backend + getStorageBackend(). The existing prod MinIO (:9000, s3.conf / minio.conf nginx vhosts) is the backend. Confirm bucket + keys (creds file §3).
  • Decision needed: does the CRM get its own Postgres (the compose default, isolated pgdata) or reuse an existing prod Postgres instance? Default = the compose's own Postgres (cleanest isolation). Confirm.

Phase 1 — crm.portnimara.com go-live

DNS already points crm.portnimara.com at the server. No crm.portnimara nginx vhost exists yet (fresh setup). Template: portnimara_dev.conf (reverse-proxy + Certbot pattern already in use on this box).

Pre-flight (no approval needed — prep only)

  • Assemble the prod .env for the CRM. Source of truth: src/lib/env.ts (Zod schema) + .env.example. Critical keys:
    • APP_URL=https://crm.portnimara.com
    • DATABASE_URL (compose Postgres), REDIS_*
    • storage / MinIO (endpoint, access/secret, bucket) — creds file §3
    • DOCUMENSO_API_URL (bare host, no /api/v1), DOCUMENSO_API_VERSION, API key
    • better-auth secret, WEBSITE_INTAKE_SECRET, SMTP/IMAP
    • EMAIL_REDIRECT_TO MUST be unset in prod.
  • Server can pull from the registry: docker login code.letsbe.solutions with a registry token (creds file §2 — generate a Gitea token; do not bake the account password into the server).
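The rules in the critical-keys list can be checked mechanically before the .env goes anywhere near the server. The following is a minimal sketch, not a replacement for the Zod schema in src/lib/env.ts: the `check_prod_env` name is hypothetical, the key list covers only names spelled out above, and it assumes plain unquoted `KEY=value` lines.

```shell
# Hypothetical pre-flight check for the prod .env; not part of the repo.
# Usage: check_prod_env /path/to/.env  (prints problems, returns non-zero if any)
check_prod_env() {
  env_file="$1"
  problems=0

  # APP_URL must be the production origin named in this plan.
  if ! grep -qx 'APP_URL=https://crm.portnimara.com' "$env_file"; then
    echo "APP_URL is not https://crm.portnimara.com"
    problems=$((problems + 1))
  fi

  # EMAIL_REDIRECT_TO must be absent or empty in prod.
  if grep -Eq '^EMAIL_REDIRECT_TO=.+' "$env_file"; then
    echo "EMAIL_REDIRECT_TO is set"
    problems=$((problems + 1))
  fi

  # DOCUMENSO_API_URL must be a bare host, without the /api/v1 suffix.
  if grep -Eq '^DOCUMENSO_API_URL=.*/api/v1/?$' "$env_file"; then
    echo "DOCUMENSO_API_URL includes /api/v1"
    problems=$((problems + 1))
  fi

  # Keys named in this plan must at least be present.
  for key in DATABASE_URL DOCUMENSO_API_URL DOCUMENSO_API_VERSION WEBSITE_INTAKE_SECRET; do
    if ! grep -q "^${key}=" "$env_file"; then
      echo "missing ${key}"
      problems=$((problems + 1))
    fi
  done

  [ "$problems" -eq 0 ]
}
```

The function only reads the file, so it is safe to run during prep; a non-zero exit means at least one line of output needs attention.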

Step 1 — nginx vhost (⚠ approval)

  1. Create /etc/nginx/sites-available/crm_portnimara.conf modelled on portnimara_dev.conf: port-80 → 443 redirect + .well-known/acme-challenge location; port-443 server proxy_pass http://127.0.0.1:7100 with the same header block (Host, X-Real-IP, CF-Connecting-IP, X-Forwarded-*, websocket Upgrade/Connection for socket.io), client_max_body_size 64M, proxy_read_timeout 300, buffering off. HTTP-only first (no ssl_* lines yet) so Certbot can complete the challenge.
  2. Symlink into sites-enabled/.
  3. sudo nginx -t — must pass. Then sudo systemctl reload nginx.
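A possible shape for the HTTP-only first pass is sketched below as a renderer that prints the vhost to stdout, so it can be reviewed before anything is written under /etc/nginx. It is an assumption-heavy sketch: portnimara_dev.conf is not reproduced in this plan, so the exact header block, the `$http_cf_connecting_ip` pass-through, and the /var/www/html webroot are guesses, and `render_crm_vhost` is a hypothetical name.

```shell
# Hypothetical renderer for the HTTP-only first pass of the vhost.
# Prints to stdout; nothing is written to /etc/nginx by this function.
render_crm_vhost() {
  cat <<'EOF'
server {
    listen 80;
    server_name crm.portnimara.com;

    client_max_body_size 64M;

    # ACME HTTP-01 challenge; the webroot path is an assumption.
    location /.well-known/acme-challenge/ {
        root /var/www/html;
    }

    location / {
        proxy_pass http://127.0.0.1:7100;
        proxy_http_version 1.1;

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header CF-Connecting-IP $http_cf_connecting_ip;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket upgrade for socket.io.
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_read_timeout 300;
        proxy_buffering off;
    }
}
EOF
}
```

For example, `render_crm_vhost > crm_portnimara.conf` produces a local file to diff against portnimara_dev.conf; copying it into sites-available and reloading nginx remain approval-gated steps.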

Step 2 — TLS cert (⚠ approval)

  • sudo certbot --nginx -d crm.portnimara.com — pulls + installs the cert, rewrites the vhost with the managed ssl_certificate lines + 80→443 redirect. Re-run sudo nginx -t + reload.

Step 3 — bring up the container (⚠ approval)

  1. Place docker-compose.prod.yml + the prod .env in the deploy dir (e.g. /opt/pn-crm — confirm location).
  2. sudo docker login code.letsbe.solutions (registry token).
  3. sudo docker compose -f docker-compose.prod.yml pull.
  4. sudo docker compose -f docker-compose.prod.yml up -d.
  5. Watch for errors: sudo docker compose logs -f crm-app crm-worker.
  6. Apply schema: migrations via psql (per CLAUDE.md db:migrate is broken) or the app's push path — confirm the prod migration approach.
  7. Seed/bootstrap the port + admin user as needed.

Verify

  • curl -fsS https://crm.portnimara.com/api/public/health → {status:"ok"...}
  • Authenticated health w/ X-Intake-Secret → {checks:{db,redis}}
  • Login loads, branding renders, a berth list + a deal render.
  • socket.io realtime connects (websocket upgrade through nginx works).
  • No 42703 column errors (restart crm-app after any schema change).
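The public health check can be scripted so that a non-ok body fails loudly instead of being eyeballed. This sketch assumes the response is JSON containing a `"status": "ok"` pair, inferred from the summary above; the `health_ok` and `check_public_health` names are hypothetical.

```shell
# Hypothetical check: succeed only if the health body on stdin reports status "ok".
# The exact JSON shape is assumed from the {status:"ok"...} summary in this plan.
health_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Fetch the public health endpoint and evaluate it.
# -f makes curl fail on HTTP errors, -sS hides progress but keeps error messages.
check_public_health() {
  curl -fsS "${1:-https://crm.portnimara.com}/api/public/health" | health_ok
}
```

`check_public_health && echo healthy` covers the first bullet; the authenticated variant would add the X-Intake-Secret header, whose value belongs in the creds file rather than in a script.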

Phase 2 — Documenso v1.13.1 → v2.x upgrade (VITAL — execute SOBER, heavily gated)

Do not execute while impaired. This is the production signing system. Every mutating step needs an explicit, sober go/no-go. The runbook below is reference; the actual run is a scheduled session.

Verified facts (2026-05-31 recon + research)

| Item | Value |
| --- | --- |
| Current version | documenso/documenso:v1.13.1 (Oct 2025, last v1) |
| Latest version | v2.11.0 (May 2026). Path: 1.13.1 → 2.0.0 → … → 2.11.0 (major jump) |
| Compose | /root/docker-compose/documenso/docker-compose.yml (project documenso-production, services documenso + database) |
| DB | postgres:15, db documenso_db, user admin, vol documenso-production_documenso-database → /var/lib/postgresql/data |
| App port | container 3000 → host 3020; served at https://signatures.portnimara.dev (nginx documenso.conf, direct, no Cloudflare) |
| Storage | external MinIO, bucket signatures @ s3.portnimara.com, region eu-central-1 |
| Signing cert | /opt/documenso/certificate.p12 (+ passphrase in env) |

Research conclusions (sources in chat):

  • v1 API survives in v2: "API V1 is stable but deprecated; nothing breaks." So the CRM keeps working on v1 API; flip to v2 later. (Will be explicitly re-tested against the clone in Phase 0 before committing.)
  • Postgres 15 is v2's official DB — no DB-engine upgrade needed.
  • Env vars carry over unchanged; only NEXTAUTH_URL is dropped in v2 (auth now derives from NEXT_PUBLIC_WEBAPP_URL, already set correctly) — harmless leftover.
  • Upgrade = pull new image + restart; prisma migrate deploy auto-runs all pending migrations on startup.
  • Known migration-failure history (issue #1880: NOT-NULL column added without backfill). 1.13.1 is past that one, but it's the failure pattern to expect — hence the clone dry-run.
  • The login bounce (non-Secure cookie / NEXTAUTH_URL quirk) is plausibly fixed in v2's reworked auth, but treat that as a hoped-for bonus, not the goal.

Locked decisions (per Matt, 2026-05-31)

  • Dry-run on a clone first: yes. Target latest v2.11.0, staged through v2.0.0.
  • No-downtime caveat: true zero-downtime is not possible (migrations run on restart). Goal = brief + pre-rehearsed: validate fully on the clone, pre-pull the image, then a fast prod cutover in a low-traffic window.
  • CRM stays on Documenso v1 API after upgrade.
  • Backups: pg_dump + cert + compose/env pulled to the Mac (private/documenso-backups/, gitignored) and a cold volume snapshot kept on-server for fastest rollback.
  • Privilege: root via su (stefan isn't in the docker group; sudo needs a password we don't have — root pass works for su).

Phase 0 — Dry-run on a disposable clone (zero prod risk)

  • pg_dump -Fc documenso_db (live, no downtime) → restore into a throwaway postgres:15 + documenso:v2.11.0 stack on a different compose project + port, with a copy of the signing cert.
  • Watch prisma migrate deploy run the full 1.13.1→2.11.0 chain. Confirm: all migrations succeed, app boots, login works, existing documents render.
  • Re-test the CRM's v1 API calls against the clone → expect 200s.
  • If a migration fails: capture it, fix forward (or decide a target version that's clean) BEFORE touching prod.

Phase A — Prod backups (after Phase 0 passes; verified before any change)

  • pg_dump -Fc documenso_db → pull to private/documenso-backups/ on the Mac (off-box). Plus a plain SQL dump.
  • Cold volume snapshot: stop stack → tar documenso-production_documenso-database → keep on-server + copy off. (This is the gold rollback — Prisma migrations aren't reversible.)
  • Copy compose file + env + /opt/documenso/{certificate.p12,private.key,certificate.crt}.
  • MinIO signatures: read-only object inventory ({key,size,lastModified,etag}) + DB→storage-key mapping export (Document/DocumentData → storage key) so files can be re-matched if linkage breaks.
  • Test-restore the dump into a throwaway PG15; record SHA-256s.
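Recording the SHA-256s is easy to get wrong by hand, so here is a small sketch that writes a SHA256SUMS file next to the backups and re-verifies it later. The helper names are hypothetical; the tool fallback exists because the backups land on a Mac, where `shasum -a 256` may be available when `sha256sum` is not.

```shell
# Hypothetical helpers for recording and re-checking backup checksums.
# Uses sha256sum when available, otherwise shasum -a 256.
sha256_tool() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$@"
  else
    shasum -a 256 "$@"
  fi
}

# Write SHA256SUMS covering every regular file in the backup directory.
record_checksums() {
  ( cd "$1" && find . -type f ! -name SHA256SUMS | sort | while IFS= read -r f; do
      sha256_tool "$f"
    done > SHA256SUMS )
}

# Re-verify the directory against the recorded SHA256SUMS; non-zero on any mismatch.
verify_checksums() {
  ( cd "$1" && sha256_tool -c SHA256SUMS >/dev/null )
}
```

Running `record_checksums private/documenso-backups` right after the pull and `verify_checksums private/documenso-backups` before any restore gives a quick integrity check on the off-box copy.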

Phase B — Collation pre-fix (low risk; validate need on the clone first)

  • REFRESH COLLATION VERSION on documenso_db (+ template1/postgres) + reindex, so the libc 2.36→2.41 mismatch can't interfere with migration index ops.
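The statements involved can be kept in one place and piped to psql on the clone first. This is a sketch under assumptions: `collation_fix_sql` is a hypothetical name, and the ordering (reindex first, then refresh the recorded version) follows general PostgreSQL guidance rather than anything validated on this database. On PostgreSQL 15, `REINDEX DATABASE` has to be run while connected to the database it names.

```shell
# Hypothetical helper: emit the collation pre-fix SQL for review or piping to psql.
# Must be executed while connected to documenso_db (REINDEX DATABASE targets the current DB).
collation_fix_sql() {
  cat <<'EOF'
REINDEX DATABASE documenso_db;
ALTER DATABASE documenso_db REFRESH COLLATION VERSION;
ALTER DATABASE template1 REFRESH COLLATION VERSION;
ALTER DATABASE postgres REFRESH COLLATION VERSION;
EOF
}
```

On the clone this could be applied with `collation_fix_sql | psql -U admin -d documenso_db`; on prod it stays behind the same approval gate as every other mutation.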

Phase C — Prod upgrade (staged, pinned tags, low-traffic window)

  • Pre-pull images. Edit compose: v1.13.1 → v2.0.0 → up -d → watch migration logs → verify.
  • Then v2.0.0 → v2.11.0 → verify. Keep postgres:15.
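Each staged hop is a one-line tag change in the compose file, which is worth doing with a helper that keeps a backup copy rather than by hand. The sketch below assumes the image is declared as `image: documenso/documenso:<tag>` on a single line, matching the current version string in the facts table; `set_documenso_tag` is a hypothetical name.

```shell
# Hypothetical helper: pin the Documenso image tag in a compose file.
# Rewrites only lines of the form "image: documenso/documenso:<tag>"
# and leaves the previous file content in <file>.bak for rollback.
set_documenso_tag() {
  compose_file="$1"
  new_tag="$2"
  cp "$compose_file" "$compose_file.bak" &&
  sed -E "s#^([[:space:]]*image:[[:space:]]*documenso/documenso:)[^[:space:]]+#\\1${new_tag}#" \
    "$compose_file.bak" > "$compose_file"
}
```

For the first hop this would be `set_documenso_tag docker-compose.yml v2.0.0`, followed by a diff against the .bak file before the approved `up -d`; the database image line is left untouched, which keeps postgres:15 in place.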

Phase D — Verify

  • Login works; an existing completed envelope's PDF resolves from MinIO; send a test envelope; webhook reaches the CRM (X-Documenso-Secret, idempotent handleDocumentCompleted); reminders/void work.
  • CRM unchanged (still v1 API).

Phase E — Rollback (any failure)

  • Revert image tag + restore the volume snapshot (and/or DB dump) → back to v1.13.1 exactly.

Until Phase 0 passes AND a sober Phase A/C is explicitly approved step-by-step, do not touch the Documenso container, DB, volumes, or /opt/documenso.


Open decisions / what I need from you

  1. MinIO creds filled; Documenso DB creds filled (creds file §3/§4). Still need the Documenso API token + webhook secret (generate after login as matt@portnimara.com).
  2. Verify the root/sudo password (value kept in private/deployment-creds.md, not in this file; confirmed it works for su to root, but it is not stefan's sudo password).
  3. CRM Postgres: own (compose default) or reuse an existing instance?
  4. Deploy dir for the CRM on the server (/opt/pn-crm?).
  5. Registry pull token — Gitea token for docker login on the server.
  6. Documenso target = v2.11.0, staged, clone-validated first.
  7. Maintenance window for the (brief, unavoidable) Documenso restart downtime.
  8. Off-box backup destination confirmed = Mac private/documenso-backups/ + on-server volume snapshot.

Progress log

  • 2026-05-31: Access established (SSH + Gitea API). Read-only recon done (nginx templates, prod compose, host port 7100). CRM deploy plan drafted. Documenso fully diagnosed read-only (v1.13.1, healthy app+DB, login issue = wrong email @letsbe vs @portnimara.com + a non-Secure-cookie quirk; 5432 publicly exposed + brute-forced; libc collation mismatch). Researched v2 upgrade (v2.11.0 latest, PG15 ok, env vars carry over, v1 API survives). Upgrade runbook drafted. No prod changes made; no backups taken.
  • 2026-06-01: Phase 0 dry-run PASSED (local, zero prod impact). Read-only pg_dump of prod (3.5 MB — metadata only) → restored into a throwaway postgres:15 → booted documenso:v2.11.0 against it. Result: full v1.13.1→v2.11.0 chain applied cleanly (All migrations have been successfully applied, 140→157, none unfinished), app boots (home 302, signin 200, v2 api 200), and v1 API still answers (400 not 404) → CRM safe. Dump saved at private/documenso-backups/ (off-box backup). Dry-run stack torn down 2026-06-01 after the pass (docker compose -p documenso-dryrun down -v — containers + anonymous volume + network removed; restored clone gone, off-box dump retained). Compose file kept at private/documenso-dryrun/docker-compose.yml for a re-run. Prod still untouched.