pn-new-crm/docs/superpowers/specs/2026-06-01-legacy-data-migration-design.md
Matt 7dba1a47bb fix(migration): modernize stale NocoDB→CRM pipeline stage map to current 7 stages
The 2026-05-03 migration pipeline (src/lib/dedup/*) predates the 9→7
pipeline-stage refactor; its STAGE_MAP emitted invalid stages
(open/details_sent/eoi_sent/…) that would write bad pipeline_stage values
on --apply. Remap to the current PIPELINE_STAGES (enquiry/qualified/
nurturing/eoi/reservation/deposit_paid/contract) + a deposit-received →
deposit_paid override. Frozen-fixture test expectations updated (17/17 pass).

Validated: live --dry-run = 239 clients / 255 interests / 41 EOI docs
(matches independent snapshot analysis; pipeline is more conservative and
flags 3 borderline pairs for review).

Adds the migration design spec (source map, scope lock to Port Nimara +
Expenses bases, EOI coverage 48/48, in-flight Documenso state, remaining
gaps: interest eoiStatus, expenses, doc-blob backfill).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 19:03:32 +02:00

# Legacy → New CRM Data Migration — Design Spec
> **Status:** DRAFT (2026-06-01) · scope locked · awaiting stage-map sign-off
> **Goal:** Translate all live legacy data + reconnect documents/EOIs so the
> new CRM "picks up exactly where we left off."
> **Companion:** `docs/launch-readiness.md` Initiative 5 · `docs/deployment-plan.md`
> **Source snapshot:** read-only `pg_dump` of prod NocoDB at
> `private/nocodb-snapshot/` (gitignored), restored locally as `nocodb_legacy`.
## 1. Source landscape (verified 2026-06-01)
Legacy data is spread across these systems (portal has **no DB of its own**):
| System | What | Migrate? |
| ------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
| **NocoDB "Port Nimara"** base `plplouets5zw1um` | Interests (255), Berths (117), Residences (45), multi-berth junction `_nc_m2m_Berths_Interests` (83), Website subs (Interest 64 / Contact 50 / BerthEOI 1), Newsletter (69), reminder/alert settings | ✅ |
| **NocoDB "Expenses"** base `p3hq2fxdevqcaq8` | Expenses (165); `invoices` empty | ✅ |
| **MinIO bucket `client-portal`** | EOIs, berth PDFs, receipts, business cards, general files | ✅ (Phase 2) |
| **MinIO bucket `signatures`** | Documenso signed PDFs | ✅ (Phase 2) |
| **Documenso v1.13.1** | Signing envelopes, linked per-deal by `documensoID` | ✅ (Phase 2) |
| 9 other NocoDB bases (Customer_List, Registered Interest, Form Submissions, 2nd Residential, Image Uploads, EOI Queue, …) | Old imports/experiments/backups | ❌ **excluded** — zero code refs; stale 714 months |
| Gmail (IMAP), Keycloak | Email archive, portal auth | ❌ out of scope (per Matt) |
**Authority for scope:** the live portal + website code reference table IDs in
**only** the two active bases above; the recency check confirms `Interests` is
the only actively-written table (last write 2026-05-21).
**Legacy has no Company entity** (everything is attributed to a person), so the
migration creates **clients + yachts (client-owned) + deals** — no companies.
## 2. Key linking facts
- **Client + yacht are inline on each Interests row** → extract + dedup.
- **`documensoID`** (e.g. `"82"`) on each deal → resolves to Documenso
`Envelope.secondaryId = 'document_' || documensoID` (verified: deal
`doc=114` → envelope `document_114`). The envelope's completed PDF = the
signed EOI. (Prod Documenso = v1.13.1, 140 migrations — confirmed.)
- **`Berth Number`** (mooring, e.g. `D31`) + the `_nc_m2m_Berths_Interests`
junction → multi-berth links.
- **Notes** = inline `Internal Notes` + `Extra Comments` (+ 5 rows in
`nc_comments`).
- Dedup key for people: **lowercased email → fallback canonical phone**.
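
The `documensoID` → `secondaryId` rule above can be sketched as a small helper. This is a minimal illustration, not the pipeline's actual code: the function name `envelopeSecondaryId` is hypothetical, and treating non-numeric IDs as unresolvable is an assumption.

```typescript
// Hypothetical helper: derive the Documenso Envelope.secondaryId from a deal's documensoID.
function envelopeSecondaryId(documensoID: string | number | null | undefined): string | null {
  if (documensoID === null || documensoID === undefined) return null;
  const id = String(documensoID).trim();
  // Assumption: only digit-only IDs are valid; anything else is left for manual review.
  if (!/^\d+$/.test(id)) return null;
  return `document_${id}`;
}

const secondaryId = envelopeSecondaryId("114"); // "document_114", as in the verified example
```

Returning `null` instead of throwing lets the caller route the deal to the manual-review list without aborting the run.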
## 3. Phase 1 — NocoDB → new CRM (data)
Build against the local `nocodb_legacy` snapshot; idempotent; every new row
stamped with its `legacy_nocodb_id` (add a nullable column or a side mapping
table `migration_id_map(entity, legacy_id, new_id)`).
**Import order (FK-safe):** clients → yachts → interests → interest_berths →
notes → residential → expenses → website_submissions → settings.
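
The idempotency requirement could look roughly like the sketch below. It uses an in-memory stand-in for the `migration_id_map(entity, legacy_id, new_id)` side table; the class name, the `getOrCreate` method, and the entity names are all hypothetical, and a real loader would be asynchronous and backed by the database.

```typescript
// Hypothetical entity names; the real list would follow the import order.
type Entity = "client" | "yacht" | "interest";

// In-memory stand-in for migration_id_map(entity, legacy_id, new_id).
class MigrationIdMap {
  private readonly rows: Record<string, string> = {};

  private static key(entity: Entity, legacyId: number): string {
    return `${entity}:${legacyId}`;
  }

  // Return the already-migrated id, or create the row once and remember it.
  getOrCreate(entity: Entity, legacyId: number, create: () => string): string {
    const k = MigrationIdMap.key(entity, legacyId);
    const existing = this.rows[k];
    if (existing !== undefined) return existing;
    const newId = create();
    this.rows[k] = newId;
    return newId;
  }
}

// Loading the same legacy row twice creates the new row only once.
const idMap = new MigrationIdMap();
let createdCount = 0;
const loadInterest = () => idMap.getOrCreate("interest", 581, () => `interest-${++createdCount}`);
const firstRun = loadInterest();
const secondRun = loadInterest();
```

Here `firstRun` and `secondRun` hold the same new id and `createdCount` stays at 1, which is the property a re-run of the load needs.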
### 3.1 Clients (from Interests, deduped)
Source fields → `clients`: `Full Name`→fullName (title-cased via the legacy
`normalizePersonName` rule), `Email Address`→primary email, `Phone Number`→
canonical phone, `Address`+`Place of Residence`→address/locality,
`Contact Method Preferred`→preferredContactMethod, `Source`→source,
`Lead Category`→(deal-level, see below). **Dedup:** group all 255 interests by
lowercased email (fallback canonical phone); one client per unique person,
N deals.
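
The dedup grouping could be sketched as follows. The phone canonicalisation shown (digits only, leading `00` dropped) is an assumption, since the spec does not define "canonical phone"; `LegacyInterest`, `dedupKey`, and `groupByPerson` are hypothetical names.

```typescript
interface LegacyInterest {
  id: number;
  email?: string | null;
  phone?: string | null;
}

// Assumed canonicalisation: digits only, with a leading "00" international prefix dropped.
function canonicalPhone(raw: string | null | undefined): string | null {
  if (!raw) return null;
  let digits = raw.replace(/\D/g, "");
  if (digits.slice(0, 2) === "00") digits = digits.slice(2);
  return digits.length >= 6 ? digits : null;
}

// Lowercased email first, canonical phone as the fallback.
function dedupKey(row: LegacyInterest): string {
  const email = row.email ? row.email.trim().toLowerCase() : "";
  if (email) return `email:${email}`;
  const phone = canonicalPhone(row.phone);
  if (phone) return `phone:${phone}`;
  // No usable identifier: keep the row as its own person rather than merging blindly.
  return `legacy:${row.id}`;
}

// One bucket per unique person; each bucket holds that person's N deals.
function groupByPerson(rows: LegacyInterest[]): Record<string, LegacyInterest[]> {
  const groups: Record<string, LegacyInterest[]> = {};
  for (const row of rows) {
    const key = dedupKey(row);
    if (groups[key]) groups[key].push(row);
    else groups[key] = [row];
  }
  return groups;
}
```

One limitation of this sketch: a person who appears with an email on one row and only a phone on another ends up in two groups, so such pairs would still need the duplicate review.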
### 3.2 Yachts (from Interests)
`Yacht Name`→name (skip `TBC`/blank), `Length`/`Width`/`Depth`→dims. **Unit
note:** legacy stores strings like `"50ft"` — parse number + unit, convert ft→m
to match the berth/yacht numeric schema (store original string in a note if
ambiguous). Owner = the deduped client (polymorphic `client`).
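
A minimal sketch of the dimension parsing, assuming the legacy strings are a number followed by an optional unit. The accepted unit spellings and the two-decimal rounding are assumptions; `parseLengthToMetres` is a hypothetical name.

```typescript
const FEET_TO_METRES = 0.3048;

// Parse strings like "50ft" or "12.5 m" into metres.
// Returns null when the value has no unit or cannot be parsed, so the caller
// can keep the original string in a note instead of guessing.
function parseLengthToMetres(raw: string | null | undefined): number | null {
  if (!raw) return null;
  const m = /^(\d+(?:[.,]\d+)?)\s*(feet|foot|ft|'|metres?|meters?|m)?$/.exec(raw.trim().toLowerCase());
  if (!m) return null;
  const value = parseFloat(m[1].replace(",", "."));
  const unit = m[2];
  if (unit === undefined) return null; // ambiguous: number without a unit
  const metres = unit.charAt(0) === "m" ? value : value * FEET_TO_METRES;
  return Math.round(metres * 100) / 100;
}

const lengthInMetres = parseLengthToMetres("50ft"); // 15.24
```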
### 3.3 Interests / deals
- **Stage:** map `Sales Process Level` (8) → new 7-stage pipeline — **see §4
(needs sign-off).**
- `Lead Category` (General / Friends and Family)→leadCategory, `Source`→source.
- Statuses: `EOI Status`, `Deposit 10% Status`, `Contract Status`,
`Contract Sent Status`, `Berth Info Sent Status` → drive stage + the new
EOI/contract/deposit fields; `Deposit 10% Status='Received'` → a `payments`
row (deposit) + auto-advance.
- Dates: `Date Added`/`Created At`→createdAt (DD-MM-YYYY → ISO; many are null —
fall back to Documenso/earliest signal), `EOI Time Sent`, `Time LOI Sent`.
- `documensoID` → stored for Phase 2 EOI relink.
- **Outcome:** `Sales Process Level='Contract Signed'` + deposit/contract
complete → won; otherwise open. (No explicit "lost" in legacy.)
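
The date handling might be sketched like this. Both function names are hypothetical, and the fallback candidates (for example a Documenso timestamp) are assumed to have been converted to ISO dates already.

```typescript
// Convert a legacy "DD-MM-YYYY" string to an ISO date ("YYYY-MM-DD"), or null if unusable.
function legacyDateToIso(raw: string | null | undefined): string | null {
  if (!raw) return null;
  const m = /^(\d{2})-(\d{2})-(\d{4})$/.exec(raw.trim());
  if (!m) return null;
  const day = Number(m[1]);
  const month = Number(m[2]);
  const year = Number(m[3]);
  const d = new Date(Date.UTC(year, month - 1, day));
  // Date silently rolls over impossible dates, so verify the round trip.
  if (d.getUTCFullYear() !== year || d.getUTCMonth() !== month - 1 || d.getUTCDate() !== day) {
    return null;
  }
  return d.toISOString().slice(0, 10);
}

// Use the legacy date when present; otherwise fall back to the earliest other signal.
// ISO dates sort lexicographically, so a plain sort finds the earliest.
function resolveCreatedAt(
  legacyDate: string | null | undefined,
  fallbackIsoDates: Array<string | null | undefined>,
): string | null {
  const primary = legacyDateToIso(legacyDate);
  if (primary) return primary;
  const candidates = fallbackIsoDates.filter((d): d is string => !!d).sort();
  return candidates.length > 0 ? candidates[0] : null;
}
```

For example, `resolveCreatedAt(null, ["2025-11-28", null, "2025-10-02"])` would pick `"2025-10-02"`, the earliest available signal.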
### 3.4 interest_berths (multi-berth)
From `_nc_m2m_Berths_Interests` (83 links) → `interest_berths` via
`interestBerthsService`. `is_primary` = the link whose mooring matches the
plain-text `Berth Number` (or the first link if none matches);
`is_in_eoi_bundle` = true for signed/sent EOIs. Resolve berth
by mooring against the migrated 117 berths.
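
One way to read the `is_primary` rule is sketched below: prefer the junction link whose mooring matches the plain-text `Berth Number`, otherwise take the first link. That reading, the whitespace and case normalisation, and the names `BerthLink` and `pickPrimaryBerthId` are assumptions.

```typescript
interface BerthLink {
  berthId: number;
  mooring: string; // e.g. "D31"
}

function normalizeMooring(raw: string): string {
  return raw.replace(/\s+/g, "").toUpperCase();
}

// Pick the primary berth: the link matching the plain-text Berth Number, else the first link.
function pickPrimaryBerthId(links: BerthLink[], berthNumber: string | null | undefined): number | null {
  if (links.length === 0) return null;
  if (berthNumber) {
    const wanted = normalizeMooring(berthNumber);
    for (const link of links) {
      if (normalizeMooring(link.mooring) === wanted) return link.berthId;
    }
  }
  return links[0].berthId;
}
```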
### 3.5 Notes
`Internal Notes` + `Extra Comments` (and `nc_comments`) → `interestNotes` via
`notes.service`, preserving original timestamps where present.
### 3.6 Residential
`Interests (Residences)` (45) → `residential_clients` + `residential_interests`
(dedup by email). The 2nd residential base (16 rows) is **excluded** (stale).
### 3.7 Expenses
`Expenses` base (165) → the expenses module. Map Time→date, Payer→payer,
Category→category, Price (string `"€1,234"`)→numeric+currency. Receipts linked
in Phase 2 (the `Receipts` images live in MinIO).
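
The price conversion could be sketched as follows, assuming a leading currency symbol, `,` as the thousands separator, and `.` as the decimal point. The symbol table and the names `parsePrice` and `Money` are hypothetical.

```typescript
// Assumed symbol table; extend it if the Expenses base contains other currencies.
const CURRENCY_BY_SYMBOL: Record<string, string> = { "€": "EUR", "$": "USD", "£": "GBP" };

interface Money {
  amount: number;
  currency: string;
}

// Parse strings such as "€1,234" or "€12.50"; returns null so failures can be counted.
function parsePrice(raw: string | null | undefined): Money | null {
  if (!raw) return null;
  const m = /^([€$£])\s*(\d[\d,]*(?:\.\d{1,2})?)$/.exec(raw.trim());
  if (!m) return null;
  const amount = Number(m[2].replace(/,/g, ""));
  return { amount, currency: CURRENCY_BY_SYMBOL[m[1]] };
}

const price = parsePrice("€1,234"); // { amount: 1234, currency: "EUR" }
```

Returning `null` for anything that does not match keeps a parse-failure count easy to report in the reconcile step.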
### 3.8 Website submissions + settings
Website Interest/Contact/BerthEOI subs → `website_submissions`. `reminder_settings`
/`alert_settings` → best-effort into `system_settings`.
## 4. Stage mapping (8 → 7) — NEEDS SIGN-OFF
Legacy `Sales Process Level` → new pipeline stage (proposed):
| Legacy | New stage |
| ------------------------------- | --------------------------- |
| General Qualified Interest | `qualified` |
| Specific Qualified Interest | `nurturing` |
| EOI and NDA Sent | `eoi` |
| Signed EOI and NDA | `eoi` (EOI signed) |
| Made Reservation | `reservation` |
| Contract Negotiation            | `reservation` or `contract`? |
| Contract Negotiations Finalized | `contract` |
| Contract Signed | `contract` (won) |
Open questions for Matt: (a) is "General Qualified Interest" really `qualified`
or should some map to `enquiry`? (b) does "Contract Negotiation" belong in
`reservation` or `contract`? (c) treat `Contract Signed` as a closed-won
outcome?
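
The proposed table could be expressed as a lookup that refuses to guess the undecided row. This is a sketch of the proposal only, not the signed-off map: `PROPOSED_STAGE_MAP`, `mapStage`, and the `needsReview` flag are hypothetical names, and "Contract Negotiation" is deliberately left unmapped until question (b) is answered.

```typescript
type Stage = "enquiry" | "qualified" | "nurturing" | "eoi" | "reservation" | "deposit_paid" | "contract";

// Proposed mapping from the table above; null marks a row that is still undecided.
const PROPOSED_STAGE_MAP: Record<string, Stage | null> = {
  "General Qualified Interest": "qualified",
  "Specific Qualified Interest": "nurturing",
  "EOI and NDA Sent": "eoi",
  "Signed EOI and NDA": "eoi",
  "Made Reservation": "reservation",
  "Contract Negotiation": null, // open question (b): reservation or contract
  "Contract Negotiations Finalized": "contract",
  "Contract Signed": "contract",
};

interface StageResult {
  stage: Stage | null;
  needsReview: boolean;
}

function mapStage(level: string | null | undefined): StageResult {
  const key = (level || "").trim();
  if (!Object.prototype.hasOwnProperty.call(PROPOSED_STAGE_MAP, key)) {
    return { stage: null, needsReview: true }; // unknown or blank legacy value
  }
  const stage = PROPOSED_STAGE_MAP[key];
  return { stage, needsReview: stage === null };
}
```

Flagging unmapped values instead of defaulting them means a dry run surfaces every row that depends on the pending sign-off.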
## 5. Phase 2 — documents & EOIs (MinIO inventoried 2026-06-01)
Documents live in **three** MinIO buckets (verified):
- **`client-portal`** (248 objects, 240 MB) — cleanly foldered: `Berth-PDFs/`
(114, mooring in filename), `EOIs/` (95 signed EOIs foldered by client name),
`Client Documents/` (6), `Legal/` (14), `expense-sheets/` (2),
`client-emails/` (3 sent-email JSONs keyed `interest-<id>`).
- **`signatures`** (323) — Documenso's raw per-envelope store (many test dupes —
secondary source).
- **`database`** — NocoDB's own attachment store at
`database/nc/uploads/noco/plplouets5zw1um/mbs9hjauug4eseo/cjzx7y2h9sxwd0n/…`
(field `cjzx7y2h9sxwd0n` = `EOI_Document`). **This is where the pre-Documenso
("before/aside") signed EOIs live**, as NocoDB attachments.
**EOI coverage — verified, no missing signed EOI.** Of 255 interests, 48 are
EOI-signed; every one resolves to a recoverable PDF:
1. **~38 via `documensoID`** → `Envelope.secondaryId='document_'||id`
completed PDF (+ curated copy in `client-portal/EOIs/<name>/`).
2. **~10 old LOI-process deals** (no documensoID, `LOI=Signing Complete`) →
`EOI_Document` attachment in the **`database`** bucket.
3. **3 via explicit `S3_Documenso_Path`** → `client-portal/EOIs/`.
Backfill order per deal: prefer the curated `client-portal/EOIs/` copy → fall
back to Documenso (by secondaryId) → then the NocoDB `database` attachment. Each
→ store via `getStorageBackend()` → `files`+`documents` rows → `ensureEntityFolder`.
Still run a file↔deal reconciliation to flag orphan EOI files + confirm each
envelope PDF actually downloads.
4. **Berth PDFs:** `client-portal/Berth-PDFs/` (114) → `berth_pdf_versions`
(mooring parsed from filename).
5. **Receipts / business cards:** NOT in `client-portal` — likely in `forms`/
`images`/`directus` buckets (OpnForm uploads). Hunt only if wanted.
6. Unresolved → manual-review CSV.
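
The per-deal backfill preference (curated copy, then Documenso, then the NocoDB attachment, else manual review) could be sketched as a pure resolver. The `DealDocRefs` shape and all field names are hypothetical; the sketch only chooses a source and does not download or store anything.

```typescript
type EoiSource =
  | { kind: "curated"; path: string }
  | { kind: "documenso"; secondaryId: string }
  | { kind: "nocodb_attachment"; path: string }
  | { kind: "manual_review" };

// Hypothetical per-deal inputs gathered during extraction.
interface DealDocRefs {
  curatedPath?: string | null; // object key under client-portal/EOIs/
  documensoID?: string | null; // legacy documensoID, e.g. "114"
  attachmentPath?: string | null; // EOI_Document object key in the database bucket
}

function resolveEoiSource(refs: DealDocRefs): EoiSource {
  if (refs.curatedPath) return { kind: "curated", path: refs.curatedPath };
  if (refs.documensoID && /^\d+$/.test(refs.documensoID)) {
    return { kind: "documenso", secondaryId: `document_${refs.documensoID}` };
  }
  if (refs.attachmentPath) return { kind: "nocodb_attachment", path: refs.attachmentPath };
  return { kind: "manual_review" }; // goes to the manual-review CSV
}
```

Keeping the choice separate from the download step makes it possible to dry-run the resolver over all 255 interests and count how many deals land in each source before any file is copied.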
### ⚠ Crossover gate — in-flight Documenso signings
Documenso currently holds **6 PENDING** (sent, awaiting signature) + **6 DRAFT**
envelopes (of 58 total; 46 COMPLETED). PENDING: Thomas Nemic (2026-02-04), Davy
Morée (2025-11-28), Matthew Ciaccio (2025-11-24), Ben Sturge (2025-10-11), Van
der Merwe (2025-10-02), Charles Davis (2025-08-22) — most stale/likely abandoned,
only one from 2026. **Before the Documenso upgrade/crossover, review these:** void
the dead ones, let any genuine one finish — don't strand an active signature.
## 6. Verification & reconcile
**Validated run (2026-06-01, `extract-nocodb.ts`):** 255 interests → **232
unique clients** (1.10×; 21 with >1 deal roll up correctly), 39 yachts, 84
deal↔berth links (12 multi-berth), 63 notes. Stages 8→7: qualified 171 · eoi 51
· nurturing 30 · reservation 2 · contract 1. **EOI coverage 48/48 resolvable.**
Signing state (Documenso-authoritative): signed 48 · **awaiting_signature 3**
(interests 581/633/639 → migrate as "awaiting" + keep envelope link + display
pending) · none 204. Duplicate review: 1 exact-name (Etiennette Clamouze ×2), 0
fuzzy. Residential 45→35. Expenses 165 (0 parse fails). Output →
`private/migration-output/` (gitignored).
**In-flight signing display:** the 3 `awaiting_signature` deals load with the
interest's EOI state = sent/awaiting + the Documenso envelope linked, so the new
CRM's webhook/poll completes them and the UI shows "Waiting for signatures."
Reconcile the 6 Documenso PENDING: 3 link to deals (in-flight above); 3 are
abandoned re-sends of already-signed deals → void-review before crossover.
Remaining: spot-check 5 deals end-to-end after load.
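
A reconcile step could include a simple totals check like the sketch below, which confirms that each breakdown from the validated run adds up to the 255 source interests. The function name `checkBreakdowns` is hypothetical; the figures are the ones reported above.

```typescript
// Return one message per breakdown whose parts do not add up to the expected total.
function checkBreakdowns(total: number, breakdowns: Record<string, Record<string, number>>): string[] {
  const problems: string[] = [];
  for (const name of Object.keys(breakdowns)) {
    const parts = breakdowns[name];
    let sum = 0;
    for (const part of Object.keys(parts)) sum += parts[part];
    if (sum !== total) problems.push(`${name}: parts sum to ${sum}, expected ${total}`);
  }
  return problems;
}

// Figures from the validated 2026-06-01 run.
const reconcileProblems = checkBreakdowns(255, {
  stages: { qualified: 171, eoi: 51, nurturing: 30, reservation: 2, contract: 1 },
  signing: { signed: 48, awaiting_signature: 3, none: 204 },
});
```

With these figures `reconcileProblems` is empty; any row dropped or double-counted by a later change to the mapping would show up as a mismatch.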
## 7. Deliverables (scripts/migration/)
- `probe-minio.ts` — bucket inventory (Phase 2 sizing; answers "are the
business cards there?").
- `extract-nocodb.ts` — read the snapshot, emit normalized JSON per entity.
- `transform-load.ts` — dedup + map + load via service helpers, idempotent.
- `backfill-documents.ts` — Phase 2 EOI/PDF/receipt backfill.
- `reconcile.ts` — final report.
## 8. Decisions locked (2026-06-01)
- Scope = the 2 active bases only; 9 others excluded; email/Keycloak out.
- Extract via read-only pg_dump snapshot (done).
- No company entities (legacy has none).
- Idempotent, keyed on `legacy_nocodb_id`.