pn-new-crm/docs/superpowers/specs/2026-06-01-legacy-data-migration-design.md
Matt 7dba1a47bb fix(migration): modernize stale NocoDB→CRM pipeline stage map to current 7 stages
The 2026-05-03 migration pipeline (src/lib/dedup/*) predates the 9→7
pipeline-stage refactor; its STAGE_MAP emitted invalid stages
(open/details_sent/eoi_sent/…) that would write bad pipeline_stage values
on --apply. Remap to the current PIPELINE_STAGES (enquiry/qualified/
nurturing/eoi/reservation/deposit_paid/contract) + a deposit-received →
deposit_paid override. Frozen-fixture test expectations updated (17/17 pass).

Validated: live --dry-run = 239 clients / 255 interests / 41 EOI docs
(matches independent snapshot analysis; pipeline is more conservative and
flags 3 borderline pairs for review).

Adds the migration design spec (source map, scope lock to Port Nimara +
Expenses bases, EOI coverage 48/48, in-flight Documenso state, remaining
gaps: interest eoiStatus, expenses, doc-blob backfill).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 19:03:32 +02:00


Legacy → New CRM Data Migration — Design Spec

Status: DRAFT (2026-06-01) · scope locked · awaiting stage-map sign-off
Goal: Translate all live legacy data + reconnect documents/EOIs so the new CRM "picks up exactly where we left off."
Companion: docs/launch-readiness.md Initiative 5 · docs/deployment-plan.md
Source snapshot: read-only pg_dump of prod NocoDB at private/nocodb-snapshot/ (gitignored), restored locally as nocodb_legacy.

1. Source landscape (verified 2026-06-01)

Legacy data is spread across these systems (portal has no DB of its own):

| System | What | Migrate? |
| --- | --- | --- |
| NocoDB "Port Nimara" base plplouets5zw1um | Interests (255), Berths (117), Residences (45), multi-berth junction _nc_m2m_Berths_Interests (83), Website subs (Interest 64 / Contact 50 / BerthEOI 1), Newsletter (69), reminder/alert settings | Yes (Phase 1) |
| NocoDB "Expenses" base p3hq2fxdevqcaq8 | Expenses (165); invoices empty | Yes (Phase 1) |
| MinIO bucket client-portal | EOIs, berth PDFs, receipts, business cards, general files | Yes (Phase 2) |
| MinIO bucket signatures | Documenso signed PDFs | Yes (Phase 2) |
| Documenso v1.13.1 | Signing envelopes, linked per-deal by documensoID | Yes (Phase 2) |
| 9 other NocoDB bases (Customer_List, Registered Interest, Form Submissions, 2nd Residential, Image Uploads, EOI Queue, …) | Old imports/experiments/backups | Excluded: zero code refs; stale 7 to 14 months |
| Gmail (IMAP), Keycloak | Email archive, portal auth | Out of scope (per Matt) |

Authority for scope: the live portal + website code reference table IDs in only the two active bases above; the recency check confirms Interests is the only actively-written table (last write 2026-05-21).

Legacy has no Company entity (everything is attributed to a person), so the migration creates clients + yachts (client-owned) + deals — no companies.

2. Key linking facts

  • Client + yacht are inline on each Interests row → extract + dedup.
  • documensoID (e.g. "82") on each deal → resolves to Documenso Envelope.secondaryId = 'document_' || documensoID (verified: deal doc=114 → envelope document_114). The envelope's completed PDF = the signed EOI. (Prod Documenso = v1.13.1, 140 migrations — confirmed.)
  • Berth Number (mooring, e.g. D31) + the _nc_m2m_Berths_Interests junction → multi-berth links.
  • Notes = inline Internal Notes + Extra Comments (+ 5 rows in nc_comments).
  • Dedup key for people: lowercased email → fallback canonical phone.
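The documensoID → envelope link above can be sketched as a small helper. This is a minimal illustration, not the pipeline's actual code: the function name `envelopeSecondaryId` is hypothetical, and it assumes legacy IDs are plain digit strings like "82" (anything else is treated as missing and left for review).

```typescript
// Hypothetical helper: builds the Documenso Envelope.secondaryId for a legacy documensoID.
function envelopeSecondaryId(
  documensoID: string | number | null | undefined,
): string | null {
  if (documensoID === null || documensoID === undefined) return null;
  const id = String(documensoID).trim();
  // Assumption: legacy IDs are plain digits (e.g. "82"); other values are treated as missing.
  if (!/^\d+$/.test(id)) return null;
  return `document_${id}`;
}
```

Returning null instead of throwing keeps deals without a usable documensoID in the run, so they can fall through to the other EOI sources.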

3. Phase 1 — NocoDB → new CRM (data)

Build against the local nocodb_legacy snapshot; idempotent; every new row stamped with its legacy_nocodb_id (add a nullable column or a side mapping table migration_id_map(entity, legacy_id, new_id)).
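One way to get idempotency from the side mapping table is a get-or-create lookup keyed on (entity, legacy_id). The sketch below uses an in-memory Map as a stand-in for migration_id_map; the class and method names are hypothetical, and a real loader would back this with the database table instead.

```typescript
type Entity = "client" | "yacht" | "interest";

// In-memory stand-in for the migration_id_map(entity, legacy_id, new_id) table.
class MigrationIdMap {
  private readonly rows = new Map<string, string>();

  private key(entity: Entity, legacyId: number): string {
    return `${entity}:${legacyId}`;
  }

  get(entity: Entity, legacyId: number): string | undefined {
    return this.rows.get(this.key(entity, legacyId));
  }

  // Returns the existing new_id if this legacy row was already loaded;
  // otherwise runs `create` once and records the mapping.
  getOrCreate(entity: Entity, legacyId: number, create: () => string): string {
    const existing = this.get(entity, legacyId);
    if (existing !== undefined) return existing;
    const newId = create();
    this.rows.set(this.key(entity, legacyId), newId);
    return newId;
  }
}
```

Re-running the load with the same legacy IDs then returns the previously created rows instead of inserting duplicates.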

Import order (FK-safe): clients → yachts → interests → interest_berths → notes → residential → expenses → website_submissions → settings.

3.1 Clients (from Interests, deduped)

Source fields → clients: Full Name→fullName (title-cased via the legacy normalizePersonName rule), Email Address→primary email, Phone Number→canonical phone, Address + Place of Residence→address/locality, Contact Method Preferred→preferredContactMethod, Source→source. Lead Category is deal-level (see §3.3).

Dedup: group all 255 interests by lowercased email (fallback canonical phone); one client per unique person, N deals.
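The grouping rule can be sketched as follows. This is an illustration under stated assumptions, not the existing dedup pipeline: the `LegacyInterest` shape, `canonicalPhone`, `dedupKey` and `groupByPerson` are hypothetical names, and the phone canonicalization here (keep digits and a leading "+") is a simplification of whatever the real rule is.

```typescript
interface LegacyInterest {
  id: number;
  fullName: string;
  email?: string | null;
  phone?: string | null;
}

// Simplified canonicalization (assumption): keep digits and a leading "+".
function canonicalPhone(raw: string): string {
  const trimmed = raw.trim();
  const digits = trimmed.replace(/\D/g, "");
  if (digits === "") return "";
  return trimmed.startsWith("+") ? `+${digits}` : digits;
}

// Dedup key: lowercased email, else canonical phone, else null (no usable key).
function dedupKey(row: LegacyInterest): string | null {
  const email = row.email ? row.email.trim().toLowerCase() : "";
  if (email) return `email:${email}`;
  const phone = row.phone ? canonicalPhone(row.phone) : "";
  return phone ? `phone:${phone}` : null;
}

function groupByPerson(rows: LegacyInterest[]): Map<string, LegacyInterest[]> {
  const groups = new Map<string, LegacyInterest[]>();
  for (const row of rows) {
    // Rows without a key stay in their own single-row group rather than merging.
    const key = dedupKey(row) ?? `unkeyed:${row.id}`;
    const group = groups.get(key);
    if (group) group.push(row);
    else groups.set(key, [row]);
  }
  return groups;
}
```

Each resulting group becomes one client with N deals. Keeping unkeyed rows separate errs on the side of too many clients rather than wrong merges.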

3.2 Yachts (from Interests)

Yacht Name→name (skip TBC/blank), Length/Width/Depth→dims. Unit note: legacy stores strings like "50ft" — parse number + unit, convert ft→m to match the berth/yacht numeric schema (store original string in a note if ambiguous). Owner = the deduped client (polymorphic client).
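A minimal sketch of the unit parsing, assuming dimension strings look like "50ft" or "12.5 m". The function name and accepted unit spellings are assumptions; a bare number with no unit is treated as ambiguous and returned unparsed so the original string can go into a note.

```typescript
const FEET_TO_METRES = 0.3048; // exact by definition of the international foot

interface ParsedLength {
  metres: number | null; // null when the string could not be parsed
  original: string;      // kept so ambiguous values can be written to a note
}

function parseLengthToMetres(raw: string): ParsedLength {
  const original = raw.trim();
  const match = /^(\d+(?:\.\d+)?)\s*(ft|feet|foot|'|m|metres?|meters?)$/i.exec(original);
  if (!match) return { metres: null, original };
  const value = Number(match[1]);
  const unit = match[2].toLowerCase();
  const isFeet = unit === "ft" || unit === "feet" || unit === "foot" || unit === "'";
  const metres = isFeet ? value * FEET_TO_METRES : value;
  // Round to centimetres to avoid floating-point noise in stored values.
  return { metres: Math.round(metres * 100) / 100, original };
}
```

For example, `parseLengthToMetres("50ft")` yields 15.24 metres, while `parseLengthToMetres("50")` yields `metres: null` because the unit is missing.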

3.3 Interests / deals

  • Stage: map Sales Process Level (8) → new 7-stage pipeline — see §4 (needs sign-off).
  • Lead Category (General / Friends and Family)→leadCategory, Source→source.
  • Statuses: EOI Status, Deposit 10% Status, Contract Status, Contract Sent Status, Berth Info Sent Status → drive stage + the new EOI/contract/deposit fields; Deposit 10% Status='Received' → a payments row (deposit) + auto-advance.
  • Dates: Date Added/Created At→createdAt (DD-MM-YYYY → ISO; many are null — fall back to Documenso/earliest signal), EOI Time Sent, Time LOI Sent.
  • documensoID → stored for Phase 2 EOI relink.
  • Outcome: Sales Process Level='Contract Signed' + deposit/contract complete → won; otherwise open. (No explicit "lost" in legacy.)
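The date handling in the list above can be sketched like this. Both function names are hypothetical, and the fallback input is assumed to be a list of ISO dates (YYYY-MM-DD) already gathered from other signals such as Documenso.

```typescript
// Converts a legacy DD-MM-YYYY string to ISO YYYY-MM-DD, or null if missing/invalid.
function legacyDateToIso(raw: string | null | undefined): string | null {
  if (!raw) return null;
  const match = /^(\d{2})-(\d{2})-(\d{4})$/.exec(raw.trim());
  if (!match) return null;
  const day = Number(match[1]);
  const month = Number(match[2]);
  const year = Number(match[3]);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Reject impossible dates such as 31-02-2025, which Date would silently roll over.
  const valid =
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day;
  return valid ? date.toISOString().slice(0, 10) : null;
}

// createdAt falls back to the earliest other signal when the legacy date is missing.
function resolveCreatedAt(dateAdded: string | null, fallbackIsoDates: string[]): string | null {
  const parsed = legacyDateToIso(dateAdded);
  if (parsed) return parsed;
  // ISO dates sort chronologically when sorted as strings.
  const sorted = [...fallbackIsoDates].sort();
  return sorted.length > 0 ? sorted[0] : null;
}
```

Using `Date.UTC` keeps the result independent of the machine's local timezone.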

3.4 interest_berths (multi-berth)

From _nc_m2m_Berths_Interests (83 links) → interest_berths via interestBerthsService. is_primary = the Berth Number plain-text mooring (or first link); is_in_eoi_bundle = true for signed/sent EOIs. Resolve berth by mooring against the migrated 117 berths.
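The is_primary rule (match the plain-text Berth Number, else take the first link) might look like the sketch below. The `BerthLink` shape and `markPrimary` are hypothetical, and comparing moorings case-insensitively is an assumption.

```typescript
interface BerthLink {
  interestId: number;
  mooring: string; // e.g. "D31"
}

// Marks exactly one link as primary: the one matching the legacy Berth Number,
// or the first link when nothing matches.
function markPrimary(
  links: BerthLink[],
  berthNumber: string | null,
): Array<BerthLink & { isPrimary: boolean }> {
  const wanted = berthNumber ? berthNumber.trim().toUpperCase() : "";
  let primaryIndex = links.findIndex((l) => l.mooring.trim().toUpperCase() === wanted);
  if (primaryIndex === -1) primaryIndex = 0; // fall back to the first link
  return links.map((l, i) => ({ ...l, isPrimary: i === primaryIndex }));
}
```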

3.5 Notes

Internal Notes + Extra Comments (and nc_comments) → interestNotes via notes.service, preserving original timestamps where present.

3.6 Residential

Interests (Residences) (45) → residential_clients + residential_interests (dedup by email). The 2nd residential base (16 rows) is excluded (stale).

3.7 Expenses

Expenses base (165) → the expenses module. Map Time→date, Payer→payer, Category→category, Price (string "€1,234")→numeric+currency. Receipts linked in Phase 2 (the Receipts images live in MinIO).
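A sketch of the Price parsing, assuming the legacy strings use a leading currency symbol, a comma as thousands separator and an optional dot decimal (as in "€1,234"). The function name and the symbol table are hypothetical; strings that do not fit the pattern return null rather than being guessed at.

```typescript
interface Money {
  amount: number;
  currency: string; // ISO 4217 code
}

// Assumption: only these symbols appear in the legacy data.
const CURRENCY_BY_SYMBOL: Record<string, string> = { "€": "EUR", "$": "USD", "£": "GBP" };

function parsePrice(raw: string): Money | null {
  const match = /^([€$£])\s*(\d{1,3}(?:,\d{3})*|\d+)(\.\d{1,2})?$/.exec(raw.trim());
  if (!match) return null;
  const whole = match[2].replace(/,/g, ""); // drop thousands separators
  const amount = Number(whole + (match[3] || ""));
  return { amount, currency: CURRENCY_BY_SYMBOL[match[1]] };
}
```

Under these assumptions "€1,234" parses to 1234 EUR, and a string like "€1,23" is rejected because the comma cannot be a thousands separator there.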

3.8 Website submissions + settings

Website Interest/Contact/BerthEOI subs → website_submissions. reminder_settings / alert_settings → best-effort into system_settings.

4. Stage mapping (8 → 7) — NEEDS SIGN-OFF

Legacy Sales Process Level → new pipeline stage (proposed):

| Legacy Sales Process Level | New stage |
| --- | --- |
| General Qualified Interest | qualified |
| Specific Qualified Interest | nurturing |
| EOI and NDA Sent | eoi |
| Signed EOI and NDA | eoi (EOI signed) |
| Made Reservation | reservation |
| Contract Negotiation | reservation or contract? (undecided, see question b) |
| Contract Negotiations Finalized | contract |
| Contract Signed | contract (won) |

Open questions for Matt: (a) is "General Qualified Interest" really qualified or should some map to enquiry? (b) does "Contract Negotiation" belong in reservation or contract? (c) treat Contract Signed as a closed-won outcome?
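As an illustration of the proposed mapping, the sketch below encodes the table as a lookup and leaves the undecided "Contract Negotiation" row as an explicit parameter. It is not the pipeline's actual STAGE_MAP: the names `PROPOSED_STAGE_MAP` and `mapStage` are hypothetical, and the values are the proposal above, pending sign-off.

```typescript
type PipelineStage =
  | "enquiry" | "qualified" | "nurturing" | "eoi"
  | "reservation" | "deposit_paid" | "contract";

// Proposed (unsigned-off) mapping; "Contract Negotiation" is deliberately absent.
const PROPOSED_STAGE_MAP = new Map<string, PipelineStage>([
  ["General Qualified Interest", "qualified"],
  ["Specific Qualified Interest", "nurturing"],
  ["EOI and NDA Sent", "eoi"],
  ["Signed EOI and NDA", "eoi"],
  ["Made Reservation", "reservation"],
  ["Contract Negotiations Finalized", "contract"],
  ["Contract Signed", "contract"],
]);

// Returns null for unknown levels so they can be routed to manual review.
function mapStage(
  legacyLevel: string,
  contractNegotiationStage: "reservation" | "contract",
): PipelineStage | null {
  const level = legacyLevel.trim();
  if (level === "Contract Negotiation") return contractNegotiationStage;
  return PROPOSED_STAGE_MAP.get(level) ?? null;
}
```

Passing the open decision in as an argument means the answer to question (b) changes one call site rather than the map itself.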

5. Phase 2 — documents & EOIs (MinIO inventoried 2026-06-01)

Documents live in three MinIO buckets (verified):

  • client-portal (248 objects, 240 MB) — cleanly foldered: Berth-PDFs/ (114, mooring in filename), EOIs/ (95 signed EOIs foldered by client name), Client Documents/ (6), Legal/ (14), expense-sheets/ (2), client-emails/ (3 sent-email JSONs keyed interest-<id>).
  • signatures (323) — Documenso's raw per-envelope store (many test dupes — secondary source).
  • database — NocoDB's own attachment store at database/nc/uploads/noco/plplouets5zw1um/mbs9hjauug4eseo/cjzx7y2h9sxwd0n/… (field cjzx7y2h9sxwd0n = EOI_Document). This is where the pre-Documenso ("before/aside") signed EOIs live, as NocoDB attachments.

EOI coverage — verified, no missing signed EOI. Of 255 interests, 48 are EOI-signed; every one resolves to a recoverable PDF:

  1. ~38 via documensoID → Envelope.secondaryId = 'document_' || id → completed PDF (+ curated copy in client-portal/EOIs/<name>/).
  2. ~10 old LOI-process deals (no documensoID, LOI=Signing Complete) → EOI_Document attachment in the database bucket.
  3. 3 via explicit S3_Documenso_Path → client-portal/EOIs/.

Backfill order per deal: prefer the curated client-portal/EOIs/ copy → fall back to Documenso (by secondaryId) → then the NocoDB database attachment. Each → store via getStorageBackend() → files + documents rows → ensureEntityFolder. Still run a file↔deal reconciliation to flag orphan EOI files + confirm each envelope PDF actually downloads.
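The per-deal preference order can be expressed as a simple first-match chain. This sketch only picks the source; it does not download or store anything. The `EoiCandidates` shape and `pickEoiSource` are hypothetical names, and it assumes the candidate locations for each deal have already been discovered.

```typescript
type EoiSource = "client-portal" | "documenso" | "nocodb-attachment";

interface EoiCandidates {
  curatedKey?: string;    // object key under client-portal/EOIs/
  documensoID?: string;   // resolves via Envelope.secondaryId
  attachmentKey?: string; // EOI_Document attachment in the database bucket
}

// Picks the preferred source: curated copy, then Documenso, then NocoDB attachment.
function pickEoiSource(c: EoiCandidates): { source: EoiSource; ref: string } | null {
  if (c.curatedKey) return { source: "client-portal", ref: c.curatedKey };
  if (c.documensoID) return { source: "documenso", ref: `document_${c.documensoID}` };
  if (c.attachmentKey) return { source: "nocodb-attachment", ref: c.attachmentKey };
  return null; // nothing recoverable: route the deal to the manual-review CSV
}
```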

  1. Berth PDFs: client-portal/Berth-PDFs/ (114) → berth_pdf_versions (mooring parsed from filename).
  2. Receipts / business cards: NOT in client-portal — likely in forms/ images/directus buckets (OpnForm uploads). Hunt only if wanted.
  3. Unresolved → manual-review CSV.

⚠ Crossover gate — in-flight Documenso signings

Documenso currently holds 6 PENDING (sent, awaiting signature) + 6 DRAFT envelopes (of 58 total; 46 COMPLETED). PENDING: Thomas Nemic (2026-02-04), Davy Morée (2025-11-28), Matthew Ciaccio (2025-11-24), Ben Sturge (2025-10-11), Van der Merwe (2025-10-02), Charles Davis (2025-08-22) — most stale/likely abandoned, only one from 2026. Before the Documenso upgrade/crossover, review these: void the dead ones, let any genuine one finish — don't strand an active signature.

6. Verification & reconcile

Validated run (2026-06-01, extract-nocodb.ts):

  • Clients: 255 interests → 232 unique clients (1.10×; 21 with >1 deal roll up correctly).
  • Yachts, links, notes: 39 yachts, 84 deal↔berth links (12 multi-berth), 63 notes.
  • Stages 8→7: qualified 171 · eoi 51 · nurturing 30 · reservation 2 · contract 1.
  • EOI coverage: 48/48 resolvable.
  • Signing state (Documenso-authoritative): signed 48 · awaiting_signature 3 (interests 581/633/639 → migrate as "awaiting" + keep envelope link + display pending) · none 204.
  • Duplicate review: 1 exact-name (Etiennette Clamouze ×2), 0 fuzzy.
  • Residential: 45→35.
  • Expenses: 165 (0 parse fails).
  • Output → private/migration-output/ (gitignored).
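A reconcile step could include a cheap consistency check that every per-interest breakdown adds up to the interest count. The sketch below is a hypothetical helper (not part of reconcile.ts as specified), fed with the stage and signing-state figures from the validated run above.

```typescript
interface Breakdown {
  label: string;
  counts: Record<string, number>;
}

function sumCounts(counts: Record<string, number>): number {
  let total = 0;
  for (const key of Object.keys(counts)) total += counts[key];
  return total;
}

// Returns one message per breakdown whose counts do not add up to `expectedTotal`.
function checkBreakdowns(expectedTotal: number, breakdowns: Breakdown[]): string[] {
  const messages: string[] = [];
  for (const b of breakdowns) {
    const total = sumCounts(b.counts);
    if (total !== expectedTotal) {
      messages.push(`${b.label}: counts sum to ${total}, expected ${expectedTotal}`);
    }
  }
  return messages;
}

// Figures from the validated run: both breakdowns should cover all 255 interests.
const problems = checkBreakdowns(255, [
  { label: "stage", counts: { qualified: 171, eoi: 51, nurturing: 30, reservation: 2, contract: 1 } },
  { label: "signing", counts: { signed: 48, awaiting_signature: 3, none: 204 } },
]);
```

With the figures above, `problems` is empty, since both breakdowns sum to 255.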

In-flight signing display: the 3 awaiting_signature deals load with the interest's EOI state = sent/awaiting + the Documenso envelope linked, so the new CRM's webhook/poll completes them and the UI shows "Waiting for signatures." Reconcile the 6 Documenso PENDING: 3 link to deals (in-flight above); 3 are abandoned re-sends of already-signed deals → void-review before crossover.

Remaining: spot-check 5 deals end-to-end after load.

7. Deliverables (scripts/migration/)

  • probe-minio.ts — bucket inventory (Phase 2 sizing; answers "are the business cards there?").
  • extract-nocodb.ts — read the snapshot, emit normalized JSON per entity.
  • transform-load.ts — dedup + map + load via service helpers, idempotent.
  • backfill-documents.ts — Phase 2 EOI/PDF/receipt backfill.
  • reconcile.ts — final report.

8. Decisions locked (2026-06-01)

  • Scope = the 2 active bases only; 9 others excluded; email/Keycloak out.
  • Extract via read-only pg_dump snapshot (done).
  • No company entities (legacy has none).
  • Idempotent, keyed on legacy_nocodb_id.