# Legacy → New CRM Data Migration — Design Spec
> **Status:** DRAFT (2026-06-01) · scope locked · awaiting stage-map sign-off
> **Goal:** Translate all live legacy data + reconnect documents/EOIs so the
> new CRM "picks up exactly where we left off."
> **Companion:** `docs/launch-readiness.md` Initiative 5 · `docs/deployment-plan.md`
> **Source snapshot:** read-only `pg_dump` of prod NocoDB at
> `private/nocodb-snapshot/` (gitignored), restored locally as `nocodb_legacy`.
## 1. Source landscape (verified 2026-06-01)
Legacy data is spread across these systems (portal has **no DB of its own**):
| System | What | Migrate? |
| ------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
| **NocoDB "Port Nimara"** base `plplouets5zw1um` | Interests (255), Berths (117), Residences (45), multi-berth junction `_nc_m2m_Berths_Interests` (83), Website subs (Interest 64 / Contact 50 / BerthEOI 1), Newsletter (69), reminder/alert settings | ✅ |
| **NocoDB "Expenses"** base `p3hq2fxdevqcaq8` | Expenses (165); `invoices` empty | ✅ |
| **MinIO bucket `client-portal`** | EOIs, berth PDFs, receipts, business cards, general files | ✅ (Phase 2) |
| **MinIO bucket `signatures`** | Documenso signed PDFs | ✅ (Phase 2) |
| **Documenso v1.13.1** | Signing envelopes, linked per-deal by `documensoID` | ✅ (Phase 2) |
| 9 other NocoDB bases (Customer_List, Registered Interest, Form Submissions, 2nd Residential, Image Uploads, EOI Queue, …) | Old imports/experiments/backups | ❌ **excluded**: zero code refs; stale 7–14 months |
| Gmail (IMAP), Keycloak | Email archive, portal auth | ❌ out of scope (per Matt) |
**Authority for scope:** the live portal + website code reference table IDs in
**only** the two active bases above; the recency check confirms `Interests` is
the only actively-written table (last write 2026-05-21).
**Legacy has no Company entity** (everything is attributed to a person), so the
migration creates **clients + yachts (client-owned) + deals** — no companies.
## 2. Key linking facts
- **Client + yacht are inline on each Interests row** → extract + dedup.
- **`documensoID`** (e.g. `"82"`) on each deal → resolves to Documenso
`Envelope.secondaryId = 'document_' || documensoID` (verified: deal
`doc=114` → envelope `document_114`). The envelope's completed PDF = the
signed EOI. (Prod Documenso = v1.13.1, 140 migrations — confirmed.)
- **`Berth Number`** (mooring, e.g. `D31`) + the `_nc_m2m_Berths_Interests`
junction → multi-berth links.
- **Notes** = inline `Internal Notes` + `Extra Comments` (+ 5 rows in
`nc_comments`).
- Dedup key for people: **lowercased email → fallback canonical phone**.
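
The `documensoID` → `Envelope.secondaryId` rule above can be sketched as a small pure function. The name `envelopeSecondaryId` is hypothetical, and the numeric-only check is an assumption based on the examples (`"82"`, `114`), not a confirmed constraint.

```typescript
// Hypothetical helper: derive the Documenso Envelope.secondaryId for a legacy deal.
// Returns null when the deal has no usable documensoID.
function envelopeSecondaryId(documensoID: string | null | undefined): string | null {
  const id = (documensoID ?? "").trim();
  // Assumption: valid legacy IDs are purely numeric strings such as "82".
  if (!/^\d+$/.test(id)) return null;
  return `document_${id}`;
}
```

Deals that return `null` here would fall through to the other EOI sources described in Phase 2.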
## 3. Phase 1 — NocoDB → new CRM (data)
Build against the local `nocodb_legacy` snapshot; idempotent; every new row
stamped with its `legacy_nocodb_id` (add a nullable column or a side mapping
table `migration_id_map(entity, legacy_id, new_id)`).
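
As a rough illustration of the idempotency requirement, the sketch below uses an in-memory stand-in for the proposed `migration_id_map(entity, legacy_id, new_id)` table. The class name, the `Entity` union, and the `create` callback are assumptions made for the example; a real loader would back this with the database.

```typescript
type Entity = "client" | "yacht" | "interest";

// In-memory stand-in for the proposed migration_id_map table.
class MigrationIdMap {
  private readonly map = new Map<string, string>();

  private key(entity: Entity, legacyId: number): string {
    return `${entity}:${legacyId}`;
  }

  // Returns the recorded new_id if this legacy row was already loaded;
  // otherwise calls create() once and records its result.
  getOrCreate(entity: Entity, legacyId: number, create: () => string): string {
    const k = this.key(entity, legacyId);
    const existing = this.map.get(k);
    if (existing !== undefined) return existing;
    const newId = create();
    this.map.set(k, newId);
    return newId;
  }
}
```

Re-running the load with the same legacy ID returns the first `new_id` and never calls `create` a second time, which is the property the migration relies on.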
**Import order (FK-safe):** clients → yachts → interests → interest_berths →
notes → residential → expenses → website_submissions → settings.
### 3.1 Clients (from Interests, deduped)
Source fields → `clients`: `Full Name`→fullName (title-cased via the legacy
`normalizePersonName` rule), `Email Address`→primary email, `Phone Number`→canonical
phone, `Address`+`Place of Residence`→address/locality,
`Contact Method Preferred`→preferredContactMethod, `Source`→source,
`Lead Category`→(deal-level, see below). **Dedup:** group all 255 interests by
lowercased email (fallback canonical phone); one client per unique person,
N deals.
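
A minimal sketch of the dedup grouping, under stated assumptions: `canonicalPhone` is approximated as digits-only (the real canonicalization rule is not specified here), and rows with neither email nor phone are kept as their own client rather than merged. All names are hypothetical.

```typescript
interface LegacyInterest {
  id: number;
  email?: string | null;
  phone?: string | null;
}

// Assumption: digits-only is a stand-in for the real canonical-phone rule.
function canonicalPhone(raw: string): string {
  return raw.replace(/\D/g, "");
}

// Lowercased email first, canonical phone as the fallback.
function dedupKey(row: LegacyInterest): string | null {
  const email = row.email?.trim().toLowerCase();
  if (email) return `email:${email}`;
  const phone = row.phone ? canonicalPhone(row.phone) : "";
  return phone ? `phone:${phone}` : null;
}

// One map entry per unique person, holding that person's N deals.
function groupByClient(rows: LegacyInterest[]): Map<string, LegacyInterest[]> {
  const groups = new Map<string, LegacyInterest[]>();
  for (const row of rows) {
    // Assumption: rows with no email and no phone become single-deal clients.
    const key = dedupKey(row) ?? `legacy:${row.id}`;
    const bucket = groups.get(key) ?? [];
    bucket.push(row);
    groups.set(key, bucket);
  }
  return groups;
}
```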
### 3.2 Yachts (from Interests)
`Yacht Name`→name (skip `TBC`/blank), `Length`/`Width`/`Depth`→dims. **Unit
note:** legacy stores strings like `"50ft"` — parse number + unit, convert ft→m
to match the berth/yacht numeric schema (store original string in a note if
ambiguous). Owner = the deduped client (polymorphic `client`).
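
The unit handling might look like the sketch below. The accepted unit spellings and the two-decimal rounding are assumptions; anything without a recognizable unit is flagged as ambiguous so the original string can be kept in a note.

```typescript
const FEET_TO_METRES = 0.3048;

interface ParsedDimension {
  metres: number | null; // null when the value could not be converted safely
  original: string;      // kept so it can be stored in a note if ambiguous
  ambiguous: boolean;
}

function parseDimension(raw: string): ParsedDimension {
  const original = raw.trim();
  // Assumption: legacy values look like "50ft", "50 ft", "15.2m" or a bare number.
  const m = /^(\d+(?:\.\d+)?)\s*(ft|feet|'|m|metres|meters)?$/i.exec(original);
  if (!m) return { metres: null, original, ambiguous: true };
  const value = Number(m[1]);
  const unit = (m[2] ?? "").toLowerCase();
  // A bare number has no unit, so it cannot be converted with confidence.
  if (unit === "") return { metres: null, original, ambiguous: true };
  const isMetric = unit === "m" || unit.startsWith("met");
  const metres = isMetric ? value : value * FEET_TO_METRES;
  return { metres: Math.round(metres * 100) / 100, original, ambiguous: false };
}
```

For example, `parseDimension("50ft")` yields 15.24 m, while `parseDimension("50")` is returned as ambiguous with `metres: null`.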
### 3.3 Interests / deals
- **Stage:** map `Sales Process Level` (8) → new 7-stage pipeline — **see §4
(needs sign-off).**
- `Lead Category` (General / Friends and Family)→leadCategory, `Source`→source.
- Statuses: `EOI Status`, `Deposit 10% Status`, `Contract Status`,
`Contract Sent Status`, `Berth Info Sent Status` → drive stage + the new
EOI/contract/deposit fields; `Deposit 10% Status='Received'` → a `payments`
row (deposit) + auto-advance.
- Dates: `Date Added`/`Created At`→createdAt (DD-MM-YYYY → ISO; many are null —
fall back to Documenso/earliest signal), `EOI Time Sent`, `Time LOI Sent`.
- `documensoID` → stored for Phase 2 EOI relink.
- **Outcome:** `Sales Process Level='Contract Signed'` + deposit/contract
complete → won; otherwise open. (No explicit "lost" in legacy.)
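
The date handling in the list above could be sketched as follows. Function names are hypothetical, and "earliest signal" is interpreted here as the smallest of whatever other ISO dates are available for the deal, which is an assumption.

```typescript
// Converts a legacy "DD-MM-YYYY" string to ISO "YYYY-MM-DD".
// Returns null for empty, malformed, or impossible dates.
function legacyDateToIso(raw: string | null | undefined): string | null {
  const m = /^(\d{2})-(\d{2})-(\d{4})$/.exec((raw ?? "").trim());
  if (!m) return null;
  const day = Number(m[1]);
  const month = Number(m[2]);
  const year = Number(m[3]);
  const d = new Date(Date.UTC(year, month - 1, day));
  // Reject rollovers such as 31-02-2025, which Date would silently shift.
  const valid =
    d.getUTCFullYear() === year && d.getUTCMonth() === month - 1 && d.getUTCDate() === day;
  return valid ? `${m[3]}-${m[2]}-${m[1]}` : null;
}

// Falls back to the earliest other known date when Date Added is null or invalid.
function resolveCreatedAt(
  dateAdded: string | null,
  otherSignalsIso: Array<string | null>,
): string | null {
  const primary = legacyDateToIso(dateAdded);
  if (primary) return primary;
  // ISO dates sort correctly as plain strings.
  const known = otherSignalsIso.filter((s): s is string => s !== null).sort();
  return known[0] ?? null;
}
```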
### 3.4 interest_berths (multi-berth)
From `_nc_m2m_Berths_Interests` (83 links) → `interest_berths` via
`interestBerthsService`. `is_primary` = the `Berth Number` plain-text mooring
(or first link); `is_in_eoi_bundle` = true for signed/sent EOIs. Resolve berth
by mooring against the migrated 117 berths.
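
One way to express the `is_primary` rule is sketched below. The types and the case-insensitive mooring comparison are assumptions; the real load goes through `interestBerthsService`, whose signature is not shown in this spec.

```typescript
interface BerthLink {
  interestId: number;
  mooring: string; // e.g. "D31"
}

interface InterestBerthRow extends BerthLink {
  isPrimary: boolean;
  isInEoiBundle: boolean;
}

// Marks the link matching the plain-text Berth Number as primary,
// falling back to the first link when there is no match.
function buildInterestBerths(
  links: BerthLink[],
  berthNumber: string | null,
  eoiSentOrSigned: boolean,
): InterestBerthRow[] {
  const wanted = berthNumber?.trim().toUpperCase() ?? "";
  let primaryIndex = links.findIndex((l) => l.mooring.trim().toUpperCase() === wanted);
  if (primaryIndex === -1) primaryIndex = 0;
  return links.map((l, i) => ({
    ...l,
    isPrimary: i === primaryIndex,
    isInEoiBundle: eoiSentOrSigned,
  }));
}
```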
### 3.5 Notes
`Internal Notes` + `Extra Comments` (and `nc_comments`) → `interestNotes` via
`notes.service`, preserving original timestamps where present.
### 3.6 Residential
`Interests (Residences)` (45) → `residential_clients` + `residential_interests`
(dedup by email). The 2nd residential base (16 rows) is **excluded** (stale).
### 3.7 Expenses
`Expenses` base (165) → the expenses module. Map Time→date, Payer→payer,
Category→category, Price (string `"€1,234"`)→numeric+currency. Receipts linked
in Phase 2 (the `Receipts` images live in MinIO).
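
A possible shape for the `Price` parser is shown below. It assumes the comma in `"€1,234"` is a thousands separator and that `€` maps to `EUR`; both should be confirmed against the data before use. Unknown symbols return `null` so they surface as parse failures instead of being guessed.

```typescript
interface Money {
  amount: number;
  currency: string;
}

// Only the symbol seen in the legacy example is mapped; extend once others are confirmed.
const SYMBOL_TO_CURRENCY: Record<string, string> = { "€": "EUR" };

function parsePrice(raw: string): Money | null {
  const m = /^([^\d\s])\s*([\d,]+(?:\.\d+)?)$/.exec(raw.trim());
  if (!m) return null;
  const [, symbol = "", digits = ""] = m;
  const currency = SYMBOL_TO_CURRENCY[symbol];
  if (currency === undefined) return null;
  // Assumption: commas are thousands separators, so "1,234" means 1234.
  const amount = Number(digits.replace(/,/g, ""));
  return Number.isFinite(amount) ? { amount, currency } : null;
}
```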
### 3.8 Website submissions + settings
Website Interest/Contact/BerthEOI subs → `website_submissions`. `reminder_settings`
/`alert_settings` → best-effort into `system_settings`.
## 4. Stage mapping (8 → 7) — NEEDS SIGN-OFF
Legacy `Sales Process Level` → new pipeline stage (proposed):
| Legacy | New stage |
| ------------------------------- | --------------------------- |
| General Qualified Interest | `qualified` |
| Specific Qualified Interest | `nurturing` |
| EOI and NDA Sent | `eoi` |
| Signed EOI and NDA | `eoi` (EOI signed) |
| Made Reservation | `reservation` |
| Contract Negotiation            | `reservation` / `contract`? |
| Contract Negotiations Finalized | `contract` |
| Contract Signed | `contract` (won) |
Open questions for Matt: (a) is "General Qualified Interest" really `qualified`
or should some map to `enquiry`? (b) does "Contract Negotiation" belong in
`reservation` or `contract`? (c) treat `Contract Signed` as a closed-won
outcome?
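
Pending sign-off, the proposed table could be encoded as a lookup with the undecided "Contract Negotiation" row passed in as a parameter, so that question (b) does not get baked into the code. This is a sketch of the proposal only; the names are hypothetical and the `NewStage` union lists just the stages that appear in the table.

```typescript
type NewStage = "qualified" | "nurturing" | "eoi" | "reservation" | "contract";

// Rows of the proposed table that have a single target stage.
const PROPOSED_STAGE_MAP: Record<string, NewStage> = {
  "General Qualified Interest": "qualified",
  "Specific Qualified Interest": "nurturing",
  "EOI and NDA Sent": "eoi",
  "Signed EOI and NDA": "eoi",
  "Made Reservation": "reservation",
  "Contract Negotiations Finalized": "contract",
  "Contract Signed": "contract",
};

// contractNegotiationStage carries the answer to open question (b).
// Returns null for unknown levels so they can be routed to manual review.
function mapStage(
  legacyLevel: string,
  contractNegotiationStage: "reservation" | "contract",
): NewStage | null {
  const level = legacyLevel.trim();
  if (level === "Contract Negotiation") return contractNegotiationStage;
  return PROPOSED_STAGE_MAP[level] ?? null;
}
```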
## 5. Phase 2 — documents & EOIs (MinIO inventoried 2026-06-01)
Documents live in **three** MinIO buckets (verified):
- **`client-portal`** (248 objects, 240 MB) — cleanly foldered: `Berth-PDFs/`
(114, mooring in filename), `EOIs/` (95 signed EOIs foldered by client name),
`Client Documents/` (6), `Legal/` (14), `expense-sheets/` (2),
`client-emails/` (3 sent-email JSONs keyed `interest-<id>`).
- **`signatures`** (323) — Documenso's raw per-envelope store (many test dupes —
secondary source).
- **`database`** — NocoDB's own attachment store at
`database/nc/uploads/noco/plplouets5zw1um/mbs9hjauug4eseo/cjzx7y2h9sxwd0n/…`
(field `cjzx7y2h9sxwd0n` = `EOI_Document`). **This is where the pre-Documenso
("before/aside") signed EOIs live**, as NocoDB attachments.
**EOI coverage — verified, no missing signed EOI.** Of 255 interests, 48 are
EOI-signed; every one resolves to a recoverable PDF:
1. **~38 via `documensoID`** → `Envelope.secondaryId='document_'||id`
completed PDF (+ curated copy in `client-portal/EOIs/<name>/`).
2. **~10 old LOI-process deals** (no documensoID, `LOI=Signing Complete`) →
`EOI_Document` attachment in the **`database`** bucket.
3. **3 via explicit `S3_Documenso_Path`** → `client-portal/EOIs/`.
Backfill order per deal: prefer the curated `client-portal/EOIs/` copy → fall
back to Documenso (by secondaryId) → then the NocoDB `database` attachment. Each
file → store via `getStorageBackend()` → `files`+`documents` rows → `ensureEntityFolder`.
Still run a file↔deal reconciliation to flag orphan EOI files + confirm each
envelope PDF actually downloads.
4. **Berth PDFs:** `client-portal/Berth-PDFs/` (114) → `berth_pdf_versions`
(mooring parsed from filename).
5. **Receipts / business cards:** NOT in `client-portal` — likely in `forms`/
`images`/`directus` buckets (OpnForm uploads). Hunt only if wanted.
6. Unresolved → manual-review CSV.
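
The per-deal backfill preference (curated copy, then Documenso, then NocoDB attachment, else manual review) can be sketched as a pure selection step. Field and type names are hypothetical; the actual download and the `getStorageBackend()` call are deliberately left out.

```typescript
type EoiSource = "client-portal" | "documenso" | "nocodb-attachment";

interface EoiCandidates {
  curatedKey: string | null;    // object key under client-portal/EOIs/, if found
  documensoID: string | null;   // legacy documensoID, if present
  attachmentKey: string | null; // EOI_Document key in the database bucket, if present
}

interface EoiPick {
  source: EoiSource;
  ref: string;
}

// Returns the preferred source for a deal, or null when nothing resolves
// (such deals would go to the manual-review CSV).
function pickEoiSource(c: EoiCandidates): EoiPick | null {
  if (c.curatedKey) return { source: "client-portal", ref: c.curatedKey };
  if (c.documensoID) return { source: "documenso", ref: `document_${c.documensoID}` };
  if (c.attachmentKey) return { source: "nocodb-attachment", ref: c.attachmentKey };
  return null;
}
```

Keeping the choice separate from the I/O makes it easy to dry-run the order over all 48 signed deals before any file is copied.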
### ⚠ Crossover gate — in-flight Documenso signings
Documenso currently holds **6 PENDING** (sent, awaiting signature) + **6 DRAFT**
envelopes (of 58 total; 46 COMPLETED). PENDING: Thomas Nemic (2026-02-04), Davy
Morée (2025-11-28), Matthew Ciaccio (2025-11-24), Ben Sturge (2025-10-11), Van
der Merwe (2025-10-02), Charles Davis (2025-08-22) — most stale/likely abandoned,
only one from 2026. **Before the Documenso upgrade/crossover, review these:** void
the dead ones, let any genuine one finish — don't strand an active signature.
## 6. Verification & reconcile
**Validated run (2026-06-01, `extract-nocodb.ts`):** 255 interests → **232
unique clients** (1.10×; 21 with >1 deal roll up correctly), 39 yachts, 84
deal↔berth links (12 multi-berth), 63 notes. Stages 8→7: qualified 171 · eoi 51
· nurturing 30 · reservation 2 · contract 1. **EOI coverage 48/48 resolvable.**
Signing state (Documenso-authoritative): signed 48 · **awaiting_signature 3**
(interests 581/633/639 → migrate as "awaiting" + keep envelope link + display
pending) · none 204. Duplicate review: 1 exact-name (Etiennette Clamouze ×2), 0
fuzzy. Residential 45→35. Expenses 165 (0 parse fails). Output →
`private/migration-output/` (gitignored).
**In-flight signing display:** the 3 `awaiting_signature` deals load with the
interest's EOI state = sent/awaiting + the Documenso envelope linked, so the new
CRM's webhook/poll completes them and the UI shows "Waiting for signatures."
Reconcile the 6 Documenso PENDING: 3 link to deals (in-flight above); 3 are
abandoned re-sends of already-signed deals → void-review before crossover.
Remaining: spot-check 5 deals end-to-end after load.
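
The reported breakdowns can be checked mechanically: each one should partition its total. The helper below is a hypothetical sketch of a check `reconcile.ts` might include; the figures are the ones quoted in this spec.

```typescript
// Throws if the parts of a breakdown do not add up to the stated total.
function assertPartition(label: string, total: number, parts: Record<string, number>): void {
  const sum = Object.values(parts).reduce((a, b) => a + b, 0);
  if (sum !== total) {
    throw new Error(`${label}: parts sum to ${sum}, expected ${total}`);
  }
}

// Stage distribution of the 255 interests.
assertPartition("stages", 255, {
  qualified: 171, eoi: 51, nurturing: 30, reservation: 2, contract: 1,
});

// Signing state of the 255 interests.
assertPartition("signing state", 255, { signed: 48, awaiting_signature: 3, none: 204 });

// Documenso envelopes by status.
assertPartition("documenso envelopes", 58, { PENDING: 6, DRAFT: 6, COMPLETED: 46 });
```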
## 7. Deliverables (scripts/migration/)
- `probe-minio.ts` — bucket inventory (Phase 2 sizing; answers "are the
business cards there?").
- `extract-nocodb.ts` — read the snapshot, emit normalized JSON per entity.
- `transform-load.ts` — dedup + map + load via service helpers, idempotent.
- `backfill-documents.ts` — Phase 2 EOI/PDF/receipt backfill.
- `reconcile.ts` — final report.
## 8. Decisions locked (2026-06-01)
- Scope = the 2 active bases only; 9 others excluded; email/Keycloak out.
- Extract via read-only pg_dump snapshot (done).
- No company entities (legacy has none).
- Idempotent, keyed on `legacy_nocodb_id`.