# Production Deployment Plan — Port Nimara CRM
> **Status:** DRAFT · pre-deployment · 2026-05-31
> **Target:** `https://crm.portnimara.com` on the PN Cloud server.
> **Companion:** `docs/launch-readiness.md` (Initiative 5 — cutover).
> Credentials live in `private/deployment-creds.md` (gitignored) — **never
> put secrets in this file.**
## ⛔ Guardrails (non-negotiable)
1. **No change to anything on the prod server without Matt's explicit
per-action approval.** Recon/reads are fine; every `sudo`, every file
write, every `docker` mutation, every `certbot` run is approved
individually before it runs.
2. **Documenso is VITAL.** It has broken on past upgrades. Nothing touches
the Documenso DB, volumes, or container until a verified backup +
S3↔DB reconciliation exists AND the upgrade step is explicitly approved.
3. Work one phase at a time; verify before moving on. Keep a rollback for
each mutating step.
---
## Access (established 2026-05-31)
| What | Detail | Verified |
| ---------------------- | ---------------------------------------------------------------------------------------- | ------------------------------------ |
| **Prod server (SSH)** | `45.142.177.246:22022`, user `stefan`, key `id_ed25519_2026` (macOS keychain) | ✅ connected, key auth |
| **Gitea API** | `https://code.letsbe.solutions` as `matt` (admin) — reads build status, warnings, errors | ✅ v1.25.5, repo `letsbe/pn-new-crm` |
| **Container registry** | `code.letsbe.solutions/letsbe/pn-new-crm/{crm-app,crm-worker}` | ✅ CI pushes `:latest` + `:<sha>` |

Notes:

- `stefan` is **unprivileged** (uid 1000, not in the `docker` group; `sudo`
  prompts for a password). Every `docker` / `nginx` / `certbot` / cert-read
  step needs `sudo` (root pass in `private/deployment-creds.md`; **VERIFY**,
  because the per-server creds file had MOPC's pass by mistake).
- Reading build logs: `GET /api/v1/repos/letsbe/pn-new-crm/actions/tasks`
(run status) + per-job logs; latest `main` build is **success**.
---
## How builds reach prod
`git push origin main` → Gitea Actions `.gitea/workflows/build.yml`:
1. **lint** job: `pnpm lint` + `pnpm exec tsc --noEmit`.
2. **build-and-push** job (main only): builds `Dockerfile` → `crm-app` and
   `Dockerfile.worker` → `crm-worker`, pushes `:latest` + `:<sha>` to the
   Gitea registry.
Prod **pulls** those images — it does not build. So a deploy is:
push → wait for green CI → `docker compose pull` + `up -d` on the server.
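The pull-based deploy can be sketched as a small dry-run wrapper, which fits the per-action approval guardrail: by default it only prints the commands. The `run` helper, the `APPLY` switch, and the trailing `ps` check are assumptions for illustration, not part of the existing tooling.

```shell
#!/bin/sh
# Sketch only: prints the deploy commands unless APPLY=1 is set.
# COMPOSE_FILE matches the compose file named in this plan.
COMPOSE_FILE="${COMPOSE_FILE:-docker-compose.prod.yml}"

# run CMD...: echo the command in dry-run mode, execute it when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# deploy: pull the images CI pushed, then recreate the containers.
# Each step only runs if the previous one succeeded.
deploy() {
  run sudo docker compose -f "$COMPOSE_FILE" pull &&
    run sudo docker compose -f "$COMPOSE_FILE" up -d &&
    run sudo docker compose -f "$COMPOSE_FILE" ps
}
```

Calling `deploy` prints the three commands for review; `APPLY=1 deploy` executes them once CI is green and the step is approved.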

---
## Prod stack (`docker-compose.prod.yml`)
| Service | Image | Notes |
| ------------ | ---------------------------- | --------------------------------------------------------------- |
| `postgres` | `postgres:16-alpine` | self-contained, volume `pgdata` |
| `redis` | `redis:7-alpine` | self-contained, volume `redisdata` (BullMQ + socket.io adapter) |
| `crm-app` | registry `crm-app:latest` | **host `7100` → container `3000`** |
| `crm-worker` | registry `crm-worker:latest` | BullMQ worker |
- **Storage:** no MinIO service in the compose — the CRM uses **external
MinIO** via `system_settings.storage_backend` + `getStorageBackend()`.
The existing prod MinIO (`:9000`, `s3.conf` / `minio.conf` nginx vhosts)
is the backend. Confirm bucket + keys (creds file §3).
- **Decision needed:** does the CRM get its **own** Postgres (the compose
default, isolated `pgdata`) or reuse an existing prod Postgres instance?
Default = the compose's own Postgres (cleanest isolation). Confirm.
---
## Phase 1 — `crm.portnimara.com` go-live
DNS already points `crm.portnimara.com` at the server. No `crm.portnimara`
nginx vhost exists yet (fresh setup). Template: `portnimara_dev.conf`
(reverse-proxy + Certbot pattern already in use on this box).
### Pre-flight (no approval needed — prep only)
- [ ] Assemble the prod `.env` for the CRM. Source of truth: `src/lib/env.ts`
(Zod schema) + `.env.example`. Critical keys:
- `APP_URL=https://crm.portnimara.com`
- `DATABASE_URL` (compose Postgres), `REDIS_*`
- storage / MinIO (endpoint, access/secret, bucket) — creds file §3
- `DOCUMENSO_API_URL` (bare host, no `/api/v1`), `DOCUMENSO_API_VERSION`, API key
- better-auth secret, `WEBSITE_INTAKE_SECRET`, SMTP/IMAP
- **`EMAIL_REDIRECT_TO` MUST be unset in prod.**
- [ ] Server can pull from the registry: `docker login code.letsbe.solutions`
with a registry token (creds file §2 — generate a Gitea token; do
**not** bake the account password into the server).
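A quick pre-flight check of the assembled `.env` might look like the sketch below. It only verifies that named keys have a non-empty value and that `EMAIL_REDIRECT_TO` is absent; `src/lib/env.ts` remains the source of truth for the full schema. The function name `check_env` is hypothetical.

```shell
#!/bin/sh
# check_env FILE KEY...
# Fails (non-zero) if any KEY is missing or empty in FILE, or if
# EMAIL_REDIRECT_TO appears at all (even with an empty value).
check_env() {
  file="$1"
  shift
  status=0
  for key in "$@"; do
    # A key counts as present only as KEY=<non-empty value>.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing: ${key}"
      status=1
    fi
  done
  if grep -Eq '^EMAIL_REDIRECT_TO=' "$file"; then
    echo "forbidden in prod: EMAIL_REDIRECT_TO"
    status=1
  fi
  return "$status"
}
```

For example: `check_env .env APP_URL DATABASE_URL WEBSITE_INTAKE_SECRET DOCUMENSO_API_URL DOCUMENSO_API_VERSION`.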
### Step 1 — nginx vhost (⚠ approval)
1. Create `/etc/nginx/sites-available/crm_portnimara.conf` modelled on
   `portnimara_dev.conf`: port-80 → 443 redirect + `.well-known/acme-challenge`
   location; port-443 server `proxy_pass http://127.0.0.1:7100` with the same
   header block (Host, X-Real-IP, CF-Connecting-IP, `X-Forwarded-*`, websocket
   `Upgrade`/`Connection` for socket.io), `client_max_body_size 64M`,
   `proxy_read_timeout 300`, buffering off. **HTTP-only first** (no `ssl_*`
   lines yet) so Certbot can complete the challenge.
2. Symlink into `sites-enabled/`.
3. `sudo nginx -t` — must pass. Then `sudo systemctl reload nginx`.
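As a starting point for the HTTP-only stage, the vhost could look roughly like the output of the function below. It is a sketch under assumptions: the ACME webroot `/var/www/html` and the `CF-Connecting-IP` passthrough are guesses, and the real header block should be copied from `portnimara_dev.conf`.

```shell
#!/bin/sh
# print_http_vhost: emit an HTTP-only nginx server block for the CRM.
# The quoted heredoc keeps nginx variables such as $host literal.
print_http_vhost() {
  cat <<'EOF'
server {
    listen 80;
    server_name crm.portnimara.com;

    # Assumed webroot; take the real path from portnimara_dev.conf.
    location /.well-known/acme-challenge/ {
        root /var/www/html;
    }

    location / {
        proxy_pass http://127.0.0.1:7100;
        proxy_http_version 1.1;

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header CF-Connecting-IP $http_cf_connecting_ip;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket upgrade for socket.io
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        client_max_body_size 64M;
        proxy_read_timeout 300;
        proxy_buffering off;
    }
}
EOF
}
```

Once approved, the output can be written with `print_http_vhost | sudo tee /etc/nginx/sites-available/crm_portnimara.conf`, followed by the symlink and `sudo nginx -t`.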
### Step 2 — TLS cert (⚠ approval)
- `sudo certbot --nginx -d crm.portnimara.com` — pulls + installs the cert,
rewrites the vhost with the managed `ssl_certificate` lines + 80→443
redirect. Re-run `sudo nginx -t` + reload.
### Step 3 — bring up the container (⚠ approval)
1. Place `docker-compose.prod.yml` + the prod `.env` in the deploy dir
(e.g. `/opt/pn-crm` — confirm location).
2. `sudo docker login code.letsbe.solutions` (registry token).
3. `sudo docker compose -f docker-compose.prod.yml pull`.
4. `sudo docker compose -f docker-compose.prod.yml up -d`.
5. **Watch for errors:** `sudo docker compose logs -f crm-app crm-worker`.
6. Apply schema: migrations via `psql` (per CLAUDE.md, `db:migrate` is broken)
   or the app's push path; confirm the prod migration approach.
7. Seed/bootstrap the port + admin user as needed.
### Verify
- [ ] `curl -fsS https://crm.portnimara.com/api/public/health` → `{status:"ok"...}`
- [ ] Authenticated health w/ `X-Intake-Secret` → `{checks:{db,redis}}`
- [ ] Login loads, branding renders, a berth list + a deal render.
- [ ] socket.io realtime connects (websocket upgrade through nginx works).
- [ ] No `42703` column errors (restart `crm-app` after any schema change).
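The public health check can be scripted so it fails loudly instead of relying on eyeballing the response. This sketch assumes the body is JSON containing `"status":"ok"`, as the checklist suggests; the function names are hypothetical.

```shell
#!/bin/sh
# body_is_ok: read a response body on stdin; succeed only if it
# contains "status":"ok" (whitespace around the colon tolerated).
body_is_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# check_health BASE_URL: fetch the public health endpoint and test it.
# curl -f makes HTTP errors produce no body, so the check fails too.
check_health() {
  curl -fsS "$1/api/public/health" | body_is_ok
}
```

Usage: `check_health https://crm.portnimara.com && echo healthy`.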
---
## Phase 2 — Documenso v1.13.1 → v2.x upgrade (VITAL — execute SOBER, heavily gated)
> **Do not execute while impaired.** This is the production signing system.
> Every mutating step needs an explicit, sober go/no-go. The runbook below is
> reference; the actual run is a scheduled session.
### Verified facts (2026-05-31 recon + research)
| Item | Value |
| --------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| Current version | `documenso/documenso:v1.13.1` (Oct 2025 — last v1) |
| Latest version | **`v2.11.0`** (May 2026). Path: 1.13.1 → 2.0.0 → … → 2.11.0 (major jump) |
| Compose | `/root/docker-compose/documenso/docker-compose.yml` (project `documenso-production`, services `documenso` + `database`) |
| DB | `postgres:15`, db `documenso_db`, user `admin`, vol `documenso-production_documenso-database` → `/var/lib/postgresql/data` |
| App port | container `3000` → host `3020`; served at `https://signatures.portnimara.dev` (nginx `documenso.conf`, direct — **no Cloudflare**) |
| Storage | external MinIO, bucket `signatures` @ `s3.portnimara.com`, region `eu-central-1` |
| Signing cert | `/opt/documenso/certificate.p12` (+ passphrase in env) |

**Research conclusions (sources in chat):**
- **v1 API survives in v2** — _"API V1 is stable but deprecated; nothing breaks."_ So the CRM keeps working on v1 API; flip to v2 later. (Will be **explicitly re-tested against the clone in Phase 0** before committing.)
- **Postgres 15 is v2's official DB** — no DB-engine upgrade needed.
- **Env vars carry over unchanged**; only `NEXTAUTH_URL` is dropped in v2 (auth now derives from `NEXT_PUBLIC_WEBAPP_URL`, already set correctly) — harmless leftover.
- Upgrade = pull new image + restart; `prisma migrate deploy` auto-runs all pending migrations on startup.
- **Known migration-failure history** (issue #1880: NOT-NULL column added without backfill). 1.13.1 is past that one, but it's the failure pattern to expect — hence the clone dry-run.
- The login bounce (non-`Secure` cookie / `NEXTAUTH_URL` quirk) is plausibly fixed in v2's reworked auth, but treat that as a hoped-for bonus, not the goal.
### Locked decisions (per Matt, 2026-05-31)
- Dry-run on a clone first: **yes**. Target **latest v2.11.0**, staged through v2.0.0.
- **No-downtime caveat:** true zero-downtime is **not possible** (migrations run on restart). Goal = brief + pre-rehearsed: validate fully on the clone, pre-pull the image, then a fast prod cutover in a low-traffic window.
- CRM stays on Documenso **v1 API** after upgrade.
- Backups: `pg_dump` + cert + compose/env pulled to the Mac (`private/documenso-backups/`, gitignored) **and** a cold volume snapshot kept on-server for fastest rollback.
- Privilege: root via `su` (stefan isn't in the docker group; sudo needs a password we don't have — root pass works for `su`).
### Phase 0 — Dry-run on a disposable clone (zero prod risk)
- [ ] `pg_dump -Fc documenso_db` (live, no downtime) → restore into a throwaway `postgres:15` + `documenso:v2.11.0` stack on a **different compose project + port**, with a copy of the signing cert.
- [ ] Watch `prisma migrate deploy` run the full 1.13.1→2.11.0 chain. Confirm: all migrations succeed, app boots, **login works**, existing documents render.
- [ ] **Re-test the CRM's v1 API calls** against the clone → expect 200s.
- [ ] If a migration fails: capture it, fix forward (or decide a target version that's clean) BEFORE touching prod.
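The dry-run sequence above can be written down as a printable plan, so each command is reviewed before it is typed. This is a sketch under assumptions: the clone's compose project is named `documenso-dryrun` (as in the progress log), its services reuse the names `database` and `documenso`, the clone database `documenso_db` already exists, and the dump file name is arbitrary. Step 1 runs on the server; the rest run wherever the clone lives.

```shell
#!/bin/sh
# dryrun_plan: print (never execute) the clone dry-run commands.
PROD_COMPOSE="${PROD_COMPOSE:-/root/docker-compose/documenso/docker-compose.yml}"
DUMP="${DUMP:-documenso_db.dump}"

dryrun_plan() {
  cat <<EOF
# 1. live, read-only dump from the prod database container
docker compose -f ${PROD_COMPOSE} exec -T database pg_dump -U admin -Fc documenso_db > ${DUMP}
# 2. start only the throwaway database (separate project and port)
docker compose -p documenso-dryrun up -d database
# 3. restore the dump into the clone
docker compose -p documenso-dryrun exec -T database pg_restore -U admin -d documenso_db --no-owner < ${DUMP}
# 4. boot documenso v2.11.0 against the clone and follow the migration logs
docker compose -p documenso-dryrun up -d documenso
docker compose -p documenso-dryrun logs -f documenso
# 5. tear down, including the clone's volume
docker compose -p documenso-dryrun down -v
EOF
}
```

`dryrun_plan` only writes text to stdout, so it is safe to run anywhere.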
### Phase A — Prod backups (after Phase 0 passes; verified before any change)
- [ ] `pg_dump -Fc documenso_db` → pull to `private/documenso-backups/` on the Mac (off-box). Plus a plain SQL dump.
- [ ] Cold volume snapshot: stop stack → `tar` `documenso-production_documenso-database` → keep on-server + copy off. (This is the gold rollback — Prisma migrations aren't reversible.)
- [ ] Copy compose file + env + `/opt/documenso/{certificate.p12,private.key,certificate.crt}`.
- [ ] **MinIO `signatures`**: read-only object inventory (`{key,size,lastModified,etag}`) + DB→storage-key mapping export (Document/DocumentData → storage key) so files can be re-matched if linkage breaks.
- [ ] Test-restore the dump into a throwaway PG15; record SHA-256s.
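Recording and re-checking the SHA-256 of each backup artifact can be done with a few helpers. The sketch below falls back to `shasum -a 256` when `sha256sum` is not installed (typical on macOS, where the off-box copies live); the function names are hypothetical.

```shell
#!/bin/sh
# sha256_of FILE: print the hex digest only.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# record_sha256 FILE: write "<digest>  <basename>" to FILE.sha256.
record_sha256() {
  printf '%s  %s\n' "$(sha256_of "$1")" "$(basename "$1")" > "$1.sha256"
}

# verify_sha256 FILE: succeed only if FILE still matches FILE.sha256.
verify_sha256() {
  expected=$(awk '{print $1}' "$1.sha256")
  [ "$(sha256_of "$1")" = "$expected" ]
}
```

Run `record_sha256` on the server right after the dump, copy both files off-box, then run `verify_sha256` on the Mac to confirm the transfer was intact.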
### Phase B — Collation pre-fix (low risk; validate need on the clone first)
- [ ] `REFRESH COLLATION VERSION` on `documenso_db` (+ `template1`/`postgres`) + reindex, so the libc 2.36→2.41 mismatch can't interfere with migration index ops.
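For reference, the collation pre-fix amounts to a handful of SQL statements per database. The sketch below prints them for one database; it assumes PostgreSQL 15 (where `datcollversion` and `pg_database_collation_actual_version` exist) and that the session is connected to the database being reindexed. It rebuilds indexes before recording the new version, which is an ordering choice of this sketch.

```shell
#!/bin/sh
# collation_sql DBNAME: print the check + fix statements for one database.
collation_sql() {
  [ -n "$1" ] || return 1
  cat <<EOF
-- compare recorded vs. actual collation version
SELECT datname, datcollversion, pg_database_collation_actual_version(oid) AS actual
FROM pg_database;
-- rebuild indexes first, then record the new collation version
REINDEX DATABASE $1;
ALTER DATABASE $1 REFRESH COLLATION VERSION;
EOF
}
```

On the clone, this could be piped into `psql -U admin -d documenso_db`. `REINDEX` takes locks, so on prod it belongs inside the approved window.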
### Phase C — Prod upgrade (staged, pinned tags, low-traffic window)
- [ ] Pre-pull images. Edit compose: `v1.13.1 → v2.0.0` → `up -d` → watch migration logs → verify.
- [ ] Then `v2.0.0 → v2.11.0` → verify. Keep `postgres:15`.
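The tag edit itself is small enough to script with a safety check. This sketch assumes the compose file pins the image as `documenso/documenso:<tag>`; the helper name and the `.bak-<tag>` backup suffix are hypothetical.

```shell
#!/bin/sh
# bump_image_tag FILE FROM TO
# Refuses to edit unless FROM is the tag currently pinned in FILE,
# and keeps a copy of the previous file as FILE.bak-FROM.
bump_image_tag() {
  file="$1"
  from="$2"
  to="$3"
  grep -q "documenso/documenso:${from}" "$file" || {
    echo "expected tag ${from} not found in ${file}" >&2
    return 1
  }
  cp "$file" "$file.bak-${from}"
  sed "s|documenso/documenso:${from}|documenso/documenso:${to}|" \
    "$file.bak-${from}" > "$file"
}
```

For the staged path this would be called twice, `bump_image_tag docker-compose.yml v1.13.1 v2.0.0` and later `bump_image_tag docker-compose.yml v2.0.0 v2.11.0`, with a verify step in between. The backup copies also make the tag revert in Phase E a plain file restore.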
### Phase D — Verify
- [ ] Login works; an existing completed envelope's PDF resolves from MinIO; send a test envelope; **webhook reaches the CRM** (`X-Documenso-Secret`, idempotent `handleDocumentCompleted`); reminders/void work.
- [ ] CRM unchanged (still v1 API).
### Phase E — Rollback (any failure)
- [ ] Revert image tag + restore the volume snapshot (and/or DB dump) → back to v1.13.1 exactly.
> Until Phase 0 passes AND a sober Phase A/C is explicitly approved step-by-step, **do not touch the Documenso container, DB, volumes, or `/opt/documenso`.**
---
## Open decisions / what I need from you
1. ✅ MinIO creds filled; Documenso DB creds filled (creds file §3/§4). Still need the Documenso **API token** + **webhook secret** (generate after login as `matt@portnimara.com`).
2. **Verify the root/sudo password** (value lives in `private/deployment-creds.md`, not here; confirmed it works for `su` to root, but it is not stefan's sudo password).
3. **CRM Postgres:** own (compose default) or reuse an existing instance?
4. **Deploy dir** for the CRM on the server (`/opt/pn-crm`?).
5. **Registry pull token** — Gitea token for `docker login` on the server.
6. ✅ Documenso target = **v2.11.0**, staged, clone-validated first.
7. **Maintenance window** for the (brief, unavoidable) Documenso restart downtime.
8. **Off-box backup destination confirmed** = Mac `private/documenso-backups/` + on-server volume snapshot.
## Progress log
- 2026-05-31: Access established (SSH + Gitea API). Read-only recon done
(nginx templates, prod compose, host port 7100). CRM deploy plan drafted.
Documenso fully diagnosed read-only (v1.13.1, healthy app+DB, login issue =
wrong email `@letsbe` vs `@portnimara.com` + a non-Secure-cookie quirk;
5432 publicly exposed + brute-forced; libc collation mismatch). Researched
v2 upgrade (v2.11.0 latest, PG15 ok, env vars carry over, v1 API survives).
Upgrade runbook drafted. **No prod changes made; no backups taken.**
- 2026-06-01: **Phase 0 dry-run PASSED (local, zero prod impact).** Read-only
`pg_dump` of prod (3.5 MB — metadata only) → restored into a throwaway
`postgres:15` → booted `documenso:v2.11.0` against it. Result: full
v1.13.1→v2.11.0 chain applied cleanly (`All migrations have been
successfully applied`, 140→157, none unfinished), app boots (home 302,
signin 200, v2 api 200), and **v1 API still answers (400 not 404) → CRM
safe**. Dump saved at `private/documenso-backups/` (off-box backup).
Dry-run stack **torn down 2026-06-01** after the pass (`docker compose
-p documenso-dryrun down -v` — containers + anonymous volume + network
removed; restored clone gone, off-box dump retained). Compose file kept
at `private/documenso-dryrun/docker-compose.yml` for a re-run. Prod
still untouched.
---
## Environment variables — initial deployment + cutover
> Single source of truth for the env each instance needs for the
> website<->CRM integration (added 2026-06-02). **Every website-side CRM
> var is a no-op when unset**, so the marketing site behaves exactly as
> today until these are filled at cutover. Full CRM schema: `src/lib/env.ts`.
### CRM instance (`crm.portnimara.com`)
| Var | Value | Notes |
| ------------------------------------------------------------------------------------- | ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| `APP_URL` | `https://crm.portnimara.com` | Absolute URLs + email links (the inquiry sales-alert "Open in CRM" button). |
| `WEBSITE_INTAKE_SECRET` | shared secret | **MUST equal** the website's `CRM_INTAKE_SECRET`. If unset, `/api/public/website-inquiries` returns **503** and refuses all intake. |
| `EMAIL_REDIRECT_TO` | **unset in prod** | Dev-only reroute; the prod build guard fails if it is set. |
| `DATABASE_URL`, `REDIS_*`, storage/MinIO, `DOCUMENSO_*`, `SMTP_*`, better-auth secret | per `.env` | Standard (see Phase 1 Pre-flight). |

Per-port **settings** (stored in `system_settings`, set via Admin UI, NOT env):
- `website_intake_email_enabled` — boolean, **default OFF**. Flip ON at
cutover so the CRM sends the registrant confirmation + staff alert for
website inquiries (berth / residence / contact), reusing the branded
templates + per-port From. Keep OFF until the website's own sending is
turned off (see `WEBSITE_INQUIRY_EMAILS_DISABLED`) to avoid double-sends.
- `inquiry_notification_recipients` (JSON string[]) — staff who receive
berth + contact-form inquiry alerts.
- `residential_notification_recipients` (JSON string[]) — staff who receive
residence inquiry alerts.
- `inquiry_contact_email` (string) — fallback alert recipient + reply-to.
### Website instance (Nuxt marketing site — repo `ron/website.git`)
New vars for the CRM integration (read via `process.env` in Nitro;
**all no-op when unset → site unchanged**):
| Var | Value | Enables | Set when |
| --------------------------------- | ----------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
| `CRM_INTAKE_URL` | `https://crm.portnimara.com` (bare host, no trailing slash) | Inquiry dual-write delivery + base URL for the berth feed | Cutover (safe earlier; just starts populating `website_submissions`) |
| `CRM_INTAKE_SECRET` | shared secret | Auth for the dual-write (`X-Webhook-Secret`); **MUST equal** CRM `WEBSITE_INTAKE_SECRET` | With `CRM_INTAKE_URL` |
| `CRM_BERTHS_ENABLED` | `1` (or `true`/`yes`) | Switches the public berth map/list to read from CRM `/api/public/berths` instead of NocoDB (requires `CRM_INTAKE_URL`) | Cutover, after CRM berth data is migrated + verified |
| `WEBSITE_INQUIRY_EMAILS_DISABLED` | `1` | Turns OFF the website's own Gmail confirmation + alert emails, handing email ownership to the CRM | Cutover, flipped **together** with CRM `website_intake_email_enabled = ON` |

UTM: **no env var** (cookieless); the client plugin reads `utm_*` from the
landing URL and forwards them via an `x-utm` header.

Existing website env (keep, unchanged): NocoDB url/token, SMTP user/pass,
`alertRecipientsBerths/Residences/Contact`, `RECAPTCHA_SECRET`,
`NUXT_PUBLIC_RECAPTCHA_SITE_KEY`, Directus url. NocoDB stays as the berth
fallback + the dual-write's primary target until the old system is retired;
SMTP + alert recipients stay until `WEBSITE_INQUIRY_EMAILS_DISABLED` is set.
### Cutover env-flip sequence (website)
1. Confirm CRM is up, berth data migrated, and `WEBSITE_INTAKE_SECRET` set on the CRM.
2. Set website `CRM_INTAKE_URL` + `CRM_INTAKE_SECRET` → verify a test inquiry lands in `website_submissions`.
3. Flip CRM `website_intake_email_enabled = ON` **and** website `WEBSITE_INQUIRY_EMAILS_DISABLED = 1` together → CRM is the single email owner.
4. Set website `CRM_BERTHS_ENABLED = 1` → public map reads from the CRM.
5. Watch errors; rollback = unset the website vars (instant revert to NocoDB + website email).
## Progress log (cont.)
- 2026-06-02: **Website integration prep (local only; no prod changes, nothing pushed).**
Website repo (`main`, uncommitted): env-gated berth feed (`CRM_BERTHS_ENABLED`),
cookieless UTM forwarding (no env), inquiry dual-write (pre-existing). Website
email kill-switch added (`WEBSITE_INQUIRY_EMAILS_DISABLED`). CRM repo: flag-gated
email ownership (`website_intake_email_enabled`, default OFF) reusing the
inquiry + residential templates plus a new contact-form alert template, hooked into
`/api/public/website-inquiries`. New website env vars documented above. CRM
tsc-clean + unit test added; website berth/UTM vue-tsc-clean. Nothing deployed.
- 2026-06-02 (later): in-app notifications for website submissions + the
structured notification-recipient resolver (emails/users/roles/everyone,
backward-compatible) + the admin recipient-picker UI all shipped on the
`feat/website-intake-email-ownership` branch (CRM repo). Contact-form client
confirmation added on BOTH the website (gated by `WEBSITE_INQUIRY_EMAILS_DISABLED`)
and the CRM (gated by `website_intake_email_enabled`). tsc-clean; full vitest
suite (1570) green. Picker live browser-verify pending a dev server. Branch is
4 commits ahead of `main`, not merged, not pushed.
---
## Initial Deployment Runbook — execute-ready (assembled 2026-06-02)
> The single ordered checklist for go-live; detailed step content is in Phase 1
> / Phase 2 above + Initiative 5 (`launch-readiness.md`). **Guardrail stands: no
> prod-server mutation without per-action approval; reads/recon are free.** Per
> Matt (2026-06-02): assemble ALL inputs + the full plan before executing
> anything, including recon.
### Locked decisions
- **CRM Postgres:** OWN — compose-default `postgres:16`, isolated `pgdata`.
- **Deploy dir:** `/root/docker-compose/pn-crm/` (matches the other compose folders).
- **DB/Redis exposure:** bind to `127.0.0.1` ONLY — no public ports (the Documenso
`5432` public-exposure + brute-force lesson; the R1 port scan confirms).
- **Initial image:** include the email-ownership work — merge
`feat/website-intake-email-ownership` -> `main` -> push -> CI builds -> pull.
It is all flag-OFF by default, so it ships dormant + safe. (Alternative on
record: deploy `main` as-is, merge before the website cutover flip.)
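The localhost-only binding decision can be checked mechanically after the stack is up. The sketch below filters `ss -ltnH` output for listeners on the given ports that are bound to anything other than loopback. Port `6379` for Redis is the default and an assumption here; the function name is hypothetical.

```shell
#!/bin/sh
# public_listeners PORT...
# stdin: output of `ss -ltnH` (no header). Prints every local address
# on one of the PORTs that is not bound to 127.0.0.1 or [::1], and
# returns 1 if any such listener was found.
public_listeners() {
  ports=" $* "
  awk -v ports="$ports" '
    {
      addr = $4                       # "Local Address:Port" column
      n = split(addr, parts, ":")
      port = parts[n]
      host = substr(addr, 1, length(addr) - length(port) - 1)
      if (index(ports, " " port " ") == 0) next
      if (host == "127.0.0.1" || host == "[::1]") next
      print addr
      found = 1
    }
    END { exit (found ? 1 : 0) }
  '
}
```

On the server: `sudo ss -ltnH | public_listeners 5432 6379`. Empty output and exit status 0 mean neither port is publicly bound.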
### Prerequisites — gather BEFORE executing
| Need | For | Status |
| -------------------------------------------------------- | ---------------------------------------- | ------------------------ |
| SSH `stefan@45.142.177.246:22022` (key) | all recon + deploy | have |
| prod root pass (`su`) | docker / nginx / certbot | VERIFY (creds file) |
| Gitea registry pull token | `docker login` -> pull crm images | NEED (generate) |
| `WEBSITE_INTAKE_SECRET` (shared) | CRM `.env` + website `CRM_INTAKE_SECRET` | generate at P1 |
| Documenso API token + webhook secret | CRM `.env` (login `matt@portnimara.com`) | NEED |
| MinIO creds (endpoint/key/secret/bucket) for the new CRM | CRM `.env` storage | confirm (creds §3) |
| Legacy MinIO read creds | EOI backfill (D2) | NEED |
| Website-server root pass | Phase 4 env wiring | you provide at that step |
| Maintenance window | Documenso restart | schedule |
### Ordered steps (each gated)
- **Phase 0 [recon]** R1 port scan (external + internal listeners) · R2 NocoDB +
Documenso drift vs the 2026-06-01 pull · R3 fresh read-only Documenso `pg_dump`
-> re-run the `1.13.1->2.11.0` clone dry-run (final "won't break" check).
- **Phase 1 [APPROVAL]** P1 prod `.env` -> P2 `/root/docker-compose/pn-crm`
(localhost-bound DB/Redis) -> P3 nginx (HTTP-first) -> P4
`certbot --nginx -d crm.portnimara.com` -> P5 `docker login` + pull + up -> P6
schema + seed port/admin -> P7 verify (health, login, berths, socket.io).
- **Phase 2 [APPROVAL]** D1 load migrated data -> D2 MinIO EOI backfill -> D3
reconcile counts.
- **Phase 3 [APPROVAL · VITAL · together]** backups (pg_dump + cold volume
snapshot + cert + MinIO inventory, off-box) -> staged `1.13.1->2.0.0->2.11.0`
-> verify (login, existing envelope renders, test send, webhook reaches CRM,
CRM stays on v1 API) -> rollback ready.
- **Phase 4 [APPROVAL]** website env wiring on the other server -> cutover flips
(`CRM_INTAKE_URL`/`SECRET` on; then `website_intake_email_enabled` ON +
`WEBSITE_INQUIRY_EMAILS_DISABLED=1`; then `CRM_BERTHS_ENABLED=1`).
### Rollback anchors
- CRM: `docker compose -f docker-compose.prod.yml down` — the pn-crm stack is
isolated (own Postgres), zero impact on the other apps on the box.
- Documenso: revert the image tag + restore the cold volume snapshot / pg_dump.
- Website: unset the `CRM_*` env vars -> instant revert to NocoDB + website email.