# Production Deployment Plan — Port Nimara CRM

> **Status:** DRAFT · pre-deployment · 2026-05-31
> **Target:** `https://crm.portnimara.com` on the PN Cloud server.
> **Companion:** `docs/launch-readiness.md` (Initiative 5 — cutover).
> Credentials live in `private/deployment-creds.md` (gitignored) — **never
> put secrets in this file.**

## ⛔ Guardrails (non-negotiable)

1. **No change to anything on the prod server without Matt's explicit
   per-action approval.** Recon/reads are fine; every `sudo`, every file
   write, every `docker` mutation, every `certbot` run is approved
   individually before it runs.
2. **Documenso is VITAL.** It has broken on past upgrades. Nothing touches
   the Documenso DB, volumes, or container until a verified backup +
   S3↔DB reconciliation exists AND the upgrade step is explicitly approved.
3. Work one phase at a time; verify before moving on. Keep a rollback for
   each mutating step.

---

## Access (established 2026-05-31)

| What                   | Detail                                                                                   | Verified                             |
| ---------------------- | ---------------------------------------------------------------------------------------- | ------------------------------------ |
| **Prod server (SSH)**  | `45.142.177.246:22022`, user `stefan`, key `id_ed25519_2026` (macOS keychain)            | ✅ connected, key auth               |
| **Gitea API**          | `https://code.letsbe.solutions` as `matt` (admin) — reads build status, warnings, errors | ✅ v1.25.5, repo `letsbe/pn-new-crm` |
| **Container registry** | `code.letsbe.solutions/letsbe/pn-new-crm/{crm-app,crm-worker}`                           | ✅ CI pushes `:latest` + `:<sha>`    |

Notes:

- `stefan` is **unprivileged** (uid 1000, not in the `docker` group; `sudo`
  prompts for a password). Every `docker` / `nginx` / `certbot` / cert-read
  step needs `sudo` (root pass in `private/deployment-creds.md` — **VERIFY**;
  the per-server creds file had MOPC's pass by mistake).
- Reading build logs: `GET /api/v1/repos/letsbe/pn-new-crm/actions/tasks`
  (run status) + per-job logs; latest `main` build is **success**.

---

## How builds reach prod

`git push origin main` → Gitea Actions `.gitea/workflows/build.yml`:

1. **lint** job: `pnpm lint` + `pnpm exec tsc --noEmit`.
2. **build-and-push** job (main only): builds `Dockerfile` → `crm-app` and
   `Dockerfile.worker` → `crm-worker`, pushes `:latest` + `:<sha>` to the
   Gitea registry.

Prod **pulls** those images — it does not build. So a deploy is:
push → wait for green CI → `docker compose pull` + `up -d` on the server.

---

## Prod stack (`docker-compose.prod.yml`)

| Service      | Image                        | Notes                                                           |
| ------------ | ---------------------------- | --------------------------------------------------------------- |
| `postgres`   | `postgres:16-alpine`         | self-contained, volume `pgdata`                                 |
| `redis`      | `redis:7-alpine`             | self-contained, volume `redisdata` (BullMQ + socket.io adapter) |
| `crm-app`    | registry `crm-app:latest`    | **host `7100` → container `3000`**                              |
| `crm-worker` | registry `crm-worker:latest` | BullMQ worker                                                   |

- **Storage:** no MinIO service in the compose — the CRM uses **external
  MinIO** via `system_settings.storage_backend` + `getStorageBackend()`.
  The existing prod MinIO (`:9000`, `s3.conf` / `minio.conf` nginx vhosts)
  is the backend. Confirm bucket + keys (creds file §3).
- **Decision needed:** does the CRM get its **own** Postgres (the compose
  default, isolated `pgdata`) or reuse an existing prod Postgres instance?
  Default = the compose's own Postgres (cleanest isolation). Confirm.

---

## Phase 1 — `crm.portnimara.com` go-live

DNS already points `crm.portnimara.com` at the server. No `crm.portnimara`
nginx vhost exists yet (fresh setup). Template: `portnimara_dev.conf`
(reverse-proxy + Certbot pattern already in use on this box).

### Pre-flight (no approval needed — prep only)

- [ ] Assemble the prod `.env` for the CRM. Source of truth: `src/lib/env.ts`
      (Zod schema) + `.env.example`. Critical keys:
  - `APP_URL=https://crm.portnimara.com`
  - `DATABASE_URL` (compose Postgres), `REDIS_*`
  - storage / MinIO (endpoint, access/secret, bucket) — creds file §3
  - `DOCUMENSO_API_URL` (bare host, no `/api/v1`), `DOCUMENSO_API_VERSION`, API key
  - better-auth secret, `WEBSITE_INTAKE_SECRET`, SMTP/IMAP
  - **`EMAIL_REDIRECT_TO` MUST be unset in prod.**
- [ ] Server can pull from the registry: `docker login code.letsbe.solutions`
      with a registry token (creds file §2 — generate a Gitea token; do
      **not** bake the account password into the server).

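A minimal sketch of how the `.env` checklist above could be checked mechanically before anything is copied to the server. The function name `check_prod_env` is hypothetical, and the key list covers only the keys named in this section, not the full Zod schema in `src/lib/env.ts`.

```shell
#!/usr/bin/env bash
# Pre-flight check for a prod .env file (sketch; the key list is partial).
# Prints one line per problem and returns non-zero if any were found.
check_prod_env() {
  local env_file="$1" problems=0 key value

  # Keys named in the checklist; extend from src/lib/env.ts as needed.
  for key in APP_URL DATABASE_URL DOCUMENSO_API_URL DOCUMENSO_API_VERSION WEBSITE_INTAKE_SECRET; do
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      echo "missing or empty: ${key}"
      problems=$((problems + 1))
    fi
  done

  # EMAIL_REDIRECT_TO must not carry a value in prod.
  if grep -Eq '^EMAIL_REDIRECT_TO=.+' "$env_file"; then
    echo "EMAIL_REDIRECT_TO is set; it must be unset in prod"
    problems=$((problems + 1))
  fi

  # DOCUMENSO_API_URL must be the bare host, without an /api/v1 suffix.
  value="$(grep -E '^DOCUMENSO_API_URL=' "$env_file" | tail -n 1 | cut -d= -f2- || true)"
  value="${value%\"}"   # tolerate a double-quoted value
  value="${value#\"}"
  case "$value" in
    */api/v1 | */api/v1/)
      echo "DOCUMENSO_API_URL must not include /api/v1"
      problems=$((problems + 1))
      ;;
  esac

  [ "$problems" -eq 0 ]
}
```

Run it locally as `check_prod_env path/to/prod.env`; it only reads the file, so it fits the "prep only" scope of this step.
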
### Step 1 — nginx vhost (⚠ approval)

1. Create `/etc/nginx/sites-available/crm_portnimara.conf` modelled on
   `portnimara_dev.conf`: port-80 → 443 redirect + `.well-known/acme-challenge`
   location; port-443 server `proxy_pass http://127.0.0.1:7100` with the same
   header block (Host, X-Real-IP, CF-Connecting-IP, X-Forwarded-\*, websocket
   `Upgrade`/`Connection` for socket.io), `client_max_body_size 64M`,
   `proxy_read_timeout 300`, buffering off. **HTTP-only first** (no `ssl_*`
   lines yet) so Certbot can complete the challenge.
2. Symlink into `sites-enabled/`.
3. `sudo nginx -t` — must pass. Then `sudo systemctl reload nginx`.

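The HTTP-only first stage described in step 1 might look roughly like the sketch below. It renders the vhost to a scratch file for review rather than writing under `/etc/nginx`. The function name `render_crm_vhost` and the ACME webroot `/var/www/html` are assumptions; the real header block should be copied from `portnimara_dev.conf`, which was not reproduced here.

```shell
#!/usr/bin/env bash
# Render the HTTP-only first-stage vhost to a scratch file for review.
# Nothing is written under /etc/nginx here; installing it is an approved step.
render_crm_vhost() {
  local out="$1"
  cat > "$out" <<'NGINX'
server {
    listen 80;
    server_name crm.portnimara.com;

    client_max_body_size 64M;

    # ACME HTTP-01 challenge (webroot path is an assumption).
    location /.well-known/acme-challenge/ {
        root /var/www/html;
    }

    location / {
        proxy_pass http://127.0.0.1:7100;
        proxy_http_version 1.1;

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header CF-Connecting-IP $http_cf_connecting_ip;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # websocket upgrade for socket.io
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_read_timeout 300;
        proxy_buffering off;
    }
}
NGINX
}
```

Rendering to a temp path first (for example `render_crm_vhost /tmp/crm_portnimara.conf`) lets the file be diffed against `portnimara_dev.conf` before the approved copy into `sites-available/`.
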
### Step 2 — TLS cert (⚠ approval)

- `sudo certbot --nginx -d crm.portnimara.com` — pulls + installs the cert,
  rewrites the vhost with the managed `ssl_certificate` lines + 80→443
  redirect. Re-run `sudo nginx -t` + reload.

### Step 3 — bring up the container (⚠ approval)

1. Place `docker-compose.prod.yml` + the prod `.env` in the deploy dir
   (e.g. `/opt/pn-crm` — confirm location).
2. `sudo docker login code.letsbe.solutions` (registry token).
3. `sudo docker compose -f docker-compose.prod.yml pull`.
4. `sudo docker compose -f docker-compose.prod.yml up -d`.
5. **Watch for errors:** `sudo docker compose -f docker-compose.prod.yml logs -f crm-app crm-worker`
   (the `-f docker-compose.prod.yml` is needed because the file does not use a default compose file name).
6. Apply schema: migrations via `psql` (per CLAUDE.md `db:migrate` is broken)
   or the app's push path — confirm the prod migration approach.
7. Seed/bootstrap the port + admin user as needed.

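Because every mutating command needs per-action approval, the pull and `up -d` steps above could be wrapped so they print first and run only when explicitly switched on. This is a sketch, not the agreed procedure: `run`, `deploy_crm`, and the `DRY_RUN` / `COMPOSE_FILE` variables are hypothetical names.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default here) prints each command instead of running it,
# so the exact list can be approved before anything mutates the server.
COMPOSE_FILE="${COMPOSE_FILE:-docker-compose.prod.yml}"

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Pull the CI-built images, start the stack, then list container state.
deploy_crm() {
  run sudo docker compose -f "$COMPOSE_FILE" pull &&
    run sudo docker compose -f "$COMPOSE_FILE" up -d &&
    run sudo docker compose -f "$COMPOSE_FILE" ps
}
```

Calling `deploy_crm` as-is only prints the three commands; `DRY_RUN=0 deploy_crm` executes them from the deploy dir once each has been approved.
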
### Verify

- [ ] `curl -fsS https://crm.portnimara.com/api/public/health` → `{status:"ok"...}`
- [ ] Authenticated health w/ `X-Intake-Secret` → `{checks:{db,redis}}`
- [ ] Login loads, branding renders, a berth list + a deal render.
- [ ] socket.io realtime connects (websocket upgrade through nginx works).
- [ ] No `42703` column errors (restart `crm-app` after any schema change).

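The public health check in the first item can be turned into a pass/fail test. The sketch below assumes the response is JSON with a top-level `"status":"ok"` pair, as the checklist suggests; `health_is_ok` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Reads a health response on stdin; succeeds only if it reports status "ok".
health_is_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Usage from a machine that can reach prod:
#   curl -fsS https://crm.portnimara.com/api/public/health | health_is_ok && echo healthy
```
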
---

## Phase 2 — Documenso v1.13.1 → v2.x upgrade (VITAL — execute SOBER, heavily gated)

> **Do not execute while impaired.** This is the production signing system.
> Every mutating step needs an explicit, sober go/no-go. The runbook below is
> reference; the actual run is a scheduled session.

### Verified facts (2026-05-31 recon + research)

| Item            | Value                                                                                                                              |
| --------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| Current version | `documenso/documenso:v1.13.1` (Oct 2025 — last v1)                                                                                 |
| Latest version  | **`v2.11.0`** (May 2026). Path: 1.13.1 → 2.0.0 → … → 2.11.0 (major jump)                                                           |
| Compose         | `/root/docker-compose/documenso/docker-compose.yml` (project `documenso-production`, services `documenso` + `database`)            |
| DB              | `postgres:15`, db `documenso_db`, user `admin`, vol `documenso-production_documenso-database` → `/var/lib/postgresql/data`         |
| App port        | container `3000` → host `3020`; served at `https://signatures.portnimara.dev` (nginx `documenso.conf`, direct — **no Cloudflare**) |
| Storage         | external MinIO, bucket `signatures` @ `s3.portnimara.com`, region `eu-central-1`                                                   |
| Signing cert    | `/opt/documenso/certificate.p12` (+ passphrase in env)                                                                             |

**Research conclusions (sources in chat):**

- **v1 API survives in v2** — _"API V1 is stable but deprecated; nothing breaks."_ So the CRM keeps working on v1 API; flip to v2 later. (Will be **explicitly re-tested against the clone in Phase 0** before committing.)
- **Postgres 15 is v2's official DB** — no DB-engine upgrade needed.
- **Env vars carry over unchanged**; only `NEXTAUTH_URL` is dropped in v2 (auth now derives from `NEXT_PUBLIC_WEBAPP_URL`, already set correctly) — harmless leftover.
- Upgrade = pull new image + restart; `prisma migrate deploy` auto-runs all pending migrations on startup.
- **Known migration-failure history** (issue #1880: NOT-NULL column added without backfill). 1.13.1 is past that one, but it's the failure pattern to expect — hence the clone dry-run.
- The login bounce (non-`Secure` cookie / `NEXTAUTH_URL` quirk) is plausibly fixed in v2's reworked auth, but treat that as a hoped-for bonus, not the goal.

### Locked decisions (per Matt, 2026-05-31)

- Dry-run on a clone first: **yes**. Target **latest v2.11.0**, staged through v2.0.0.
- **No-downtime caveat:** true zero-downtime is **not possible** (migrations run on restart). Goal = brief + pre-rehearsed: validate fully on the clone, pre-pull the image, then a fast prod cutover in a low-traffic window.
- CRM stays on Documenso **v1 API** after upgrade.
- Backups: `pg_dump` + cert + compose/env pulled to the Mac (`private/documenso-backups/`, gitignored) **and** a cold volume snapshot kept on-server for fastest rollback.
- Privilege: root via `su` (stefan isn't in the docker group; sudo needs a password we don't have — root pass works for `su`).

### Phase 0 — Dry-run on a disposable clone (zero prod risk)

- [ ] `pg_dump -Fc documenso_db` (live, no downtime) → restore into a throwaway `postgres:15` + `documenso:v2.11.0` stack on a **different compose project + port**, with a copy of the signing cert.
- [ ] Watch `prisma migrate deploy` run the full 1.13.1→2.11.0 chain. Confirm: all migrations succeed, app boots, **login works**, existing documents render.
- [ ] **Re-test the CRM's v1 API calls** against the clone → expect 200s.
- [ ] If a migration fails: capture it, fix forward (or decide a target version that's clean) BEFORE touching prod.

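For the v1 API re-test, the useful signal is whether the v1 routes still exist on the clone, which an unauthenticated probe can show: a 400 or 401 means the router answered, a 404 means the route is gone. A small sketch of that classification follows. `v1_route_status` is a hypothetical helper, and the port and the `/api/v1/documents` path in the usage comment are assumptions to be replaced with the calls the CRM actually makes.

```shell
#!/usr/bin/env bash
# Classify an HTTP status code from a v1 API probe.
# 404 means the route is gone; 5xx means the app is unhealthy; any other
# 1xx-4xx code (200, 400, 401, 403...) shows the v1 router still answers.
v1_route_status() {
  case "$1" in
    404) echo "missing" ;;
    5??) echo "error" ;;
    [1-4]??) echo "present" ;;
    *) echo "unknown" ;;
  esac
}

# Usage against the clone (port and path are placeholders):
#   code="$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:3999/api/v1/documents)"
#   v1_route_status "$code"
```

A "present" result only proves the route exists; the authenticated CRM calls still need the "expect 200s" check from the list above.
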
### Phase A — Prod backups (after Phase 0 passes; verified before any change)

- [ ] `pg_dump -Fc documenso_db` → pull to `private/documenso-backups/` on the Mac (off-box). Plus a plain SQL dump.
- [ ] Cold volume snapshot: stop stack → `tar` `documenso-production_documenso-database` → keep on-server + copy off. (This is the gold rollback — Prisma migrations aren't reversible.)
- [ ] Copy compose file + env + `/opt/documenso/{certificate.p12,private.key,certificate.crt}`.
- [ ] **MinIO `signatures`**: read-only object inventory (`{key,size,lastModified,etag}`) + DB→storage-key mapping export (Document/DocumentData → storage key) so files can be re-matched if linkage breaks.
- [ ] Test-restore the dump into a throwaway PG15; record SHA-256s.

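Recording the SHA-256s from the last item can be scripted so the same manifest is checked again after each copy (server to Mac, and before any restore). The sketch below assumes all backup files sit flat in one directory; the helper names and the `SHA256SUMS` manifest file name are hypothetical.

```shell
#!/usr/bin/env bash
# Pick whichever SHA-256 tool exists (sha256sum on Linux, shasum on macOS).
sha256_cmd() {
  if command -v sha256sum >/dev/null 2>&1; then
    echo "sha256sum"
  else
    echo "shasum -a 256"
  fi
}

# Write a SHA256SUMS manifest covering every regular file in a backup dir.
record_checksums() {
  local dir="$1" tool f
  tool="$(sha256_cmd)"
  (
    cd "$dir" || exit 1
    for f in *; do
      if [ -f "$f" ] && [ "$f" != "SHA256SUMS" ]; then
        $tool "$f"
      fi
    done > SHA256SUMS
  )
}

# Re-verify the directory against its manifest; non-zero on any mismatch.
verify_checksums() {
  local dir="$1" tool
  tool="$(sha256_cmd)"
  ( cd "$dir" && $tool -c SHA256SUMS >/dev/null )
}
```

Typical use would be `record_checksums` on the server right after the dump and snapshot are written, then `verify_checksums private/documenso-backups` on the Mac after the pull.
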
### Phase B — Collation pre-fix (low risk; validate need on the clone first)

- [ ] `ALTER DATABASE documenso_db REFRESH COLLATION VERSION` (+ the same for `template1`/`postgres`) + reindex, so the libc 2.36→2.41 mismatch can't interfere with migration index ops.

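A sketch of the SQL this step implies, generated per database so it can be reviewed before it is piped into `psql`. It reindexes first and refreshes the recorded collation version second, since the refresh only updates the stored version and does not rebuild anything; that ordering is this sketch's choice, not something the runbook specifies. `collation_fix_sql` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the collation pre-fix SQL for one database.
# Reindex first, then record the new collation version.
collation_fix_sql() {
  local db="$1"
  printf 'REINDEX DATABASE "%s";\n' "$db"
  printf 'ALTER DATABASE "%s" REFRESH COLLATION VERSION;\n' "$db"
}

# Usage (REINDEX DATABASE must run while connected to that same database):
#   collation_fix_sql documenso_db | psql -U admin -d documenso_db
```

On the clone, run it once per database (`documenso_db`, `template1`, `postgres`) and confirm the collation-version warning stops appearing before deciding whether prod needs it.
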
### Phase C — Prod upgrade (staged, pinned tags, low-traffic window)

- [ ] Pre-pull images. Edit compose: `v1.13.1 → v2.0.0` → `up -d` → watch migration logs → verify.
- [ ] Then `v2.0.0 → v2.11.0` → verify. Keep `postgres:15`.

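The "edit compose" part of each stage is a one-line tag swap, and scripting it keeps the edit identical between the clone rehearsal and prod. A minimal sketch, assuming the compose file pins the image as `documenso/documenso:<tag>` (as listed under Verified facts); `bump_documenso_tag` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Swap one pinned Documenso tag for another, keeping a .bak copy.
# Fails without touching the file if the expected current tag is absent.
bump_documenso_tag() {
  local file="$1" from="$2" to="$3" from_re
  grep -Fq "documenso/documenso:${from}" "$file" || return 1
  from_re="${from//./\\.}"   # escape dots so they match literally in sed
  cp "$file" "${file}.bak"
  sed "s|documenso/documenso:${from_re}|documenso/documenso:${to}|" "${file}.bak" > "$file"
}

# Usage, one stage at a time:
#   bump_documenso_tag docker-compose.yml v1.13.1 v2.0.0
#   bump_documenso_tag docker-compose.yml v2.0.0 v2.11.0
```

The guard on the current tag means a stage cannot be applied twice or out of order, and the `.bak` file gives the tag-revert half of the Phase E rollback.
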
### Phase D — Verify

- [ ] Login works; an existing completed envelope's PDF resolves from MinIO; send a test envelope; **webhook reaches the CRM** (`X-Documenso-Secret`, idempotent `handleDocumentCompleted`); reminders/void work.
- [ ] CRM unchanged (still v1 API).

### Phase E — Rollback (any failure)

- [ ] Revert image tag + restore the volume snapshot (and/or DB dump) → back to v1.13.1 exactly.

> Until Phase 0 passes AND a sober Phase A/C is explicitly approved step-by-step, **do not touch the Documenso container, DB, volumes, or `/opt/documenso`.**

---

## Open decisions / what I need from you

1. ✅ MinIO creds filled; Documenso DB creds filled (creds file §3/§4). Still need the Documenso **API token** + **webhook secret** (generate after login as `matt@portnimara.com`).
2. **Verify the root/sudo password** (value kept in `private/deployment-creds.md`, not in this file; confirmed it works for `su` to root; it is not stefan's sudo password).
3. **CRM Postgres:** own (compose default) or reuse an existing instance?
4. **Deploy dir** for the CRM on the server (`/opt/pn-crm`?).
5. **Registry pull token** — Gitea token for `docker login` on the server.
6. ✅ Documenso target = **v2.11.0**, staged, clone-validated first.
7. **Maintenance window** for the (brief, unavoidable) Documenso restart downtime.
8. **Off-box backup destination confirmed** = Mac `private/documenso-backups/` + on-server volume snapshot.

## Progress log

- 2026-05-31: Access established (SSH + Gitea API). Read-only recon done
  (nginx templates, prod compose, host port 7100). CRM deploy plan drafted.
  Documenso fully diagnosed read-only (v1.13.1, healthy app+DB, login issue =
  wrong email `@letsbe` vs `@portnimara.com` + a non-Secure-cookie quirk;
  5432 publicly exposed + brute-forced; libc collation mismatch). Researched
  v2 upgrade (v2.11.0 latest, PG15 ok, env vars carry over, v1 API survives).
  Upgrade runbook drafted. **No prod changes made; no backups taken.**
- 2026-06-01: **Phase 0 dry-run PASSED (local, zero prod impact).** Read-only
  `pg_dump` of prod (3.5 MB — metadata only) → restored into a throwaway
  `postgres:15` → booted `documenso:v2.11.0` against it. Result: full
  v1.13.1→v2.11.0 chain applied cleanly (`All migrations have been
successfully applied`, 140→157, none unfinished), app boots (home 302,
  signin 200, v2 api 200), and **v1 API still answers (400 not 404) → CRM
  safe**. Dump saved at `private/documenso-backups/` (off-box backup).
  Dry-run stack **torn down 2026-06-01** after the pass (`docker compose
-p documenso-dryrun down -v` — containers + anonymous volume + network
  removed; restored clone gone, off-box dump retained). Compose file kept
  at `private/documenso-dryrun/docker-compose.yml` for a re-run. Prod
  still untouched.