Files
pn-new-crm/tests/unit/dedup/migration-transform.test.ts
Matt Ciaccio 18e5c124b0 feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.

Schema additions
================

- `client_merge_candidates` — pairs flagged by the background scoring
  job for the /admin/duplicates review queue. Status enum: pending /
  dismissed / merged. A unique (portId, clientAId, clientBId)
  constraint keeps the same pair from surfacing twice. Empty until P2
  lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
  rows (NocoDB Interest #624 → new client UUID) so re-running --apply
  against the same dry-run report skips already-imported entities.
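The migration filename below follows drizzle-kit's generated-name pattern, so assuming Drizzle ORM, the two tables might be declared roughly like this (a sketch: column names beyond those stated above, and the exact column types, are illustrative):

```typescript
import { pgTable, text, timestamp, unique, uuid } from 'drizzle-orm/pg-core';

// Hypothetical sketch of the two tables; only the status enum and the
// unique triple are taken from the description above.
export const clientMergeCandidates = pgTable(
  'client_merge_candidates',
  {
    id: uuid('id').primaryKey().defaultRandom(),
    portId: uuid('port_id').notNull(),
    clientAId: uuid('client_a_id').notNull(),
    clientBId: uuid('client_b_id').notNull(),
    status: text('status', { enum: ['pending', 'dismissed', 'merged'] })
      .notNull()
      .default('pending'),
    createdAt: timestamp('created_at').notNull().defaultNow(),
  },
  (t) => ({
    // Same pair can't surface twice in the review queue.
    pairUnique: unique().on(t.portId, t.clientAId, t.clientBId),
  }),
);

export const migrationSourceLinks = pgTable('migration_source_links', {
  id: uuid('id').primaryKey().defaultRandom(),
  sourceSystem: text('source_system').notNull(), // e.g. 'nocodb'
  sourceRowId: text('source_row_id').notNull(), // e.g. Interest #624
  targetId: uuid('target_id').notNull(), // new client UUID
  createdAt: timestamp('created_at').notNull().defaultNow(),
});
```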

Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.

Library
=======

src/lib/dedup/nocodb-source.ts
  Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
  auto-paginates until isLastPage, captures the table IDs from the
  2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
  parallel into one in-memory object the transform layer consumes.
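The auto-pagination loop can be sketched with the HTTP call abstracted away so the same logic runs anywhere; the `{ list, pageInfo: { isLastPage } }` shape matches the NocoDB v2 list endpoint, while `fetchAllRows` and its parameters are illustrative names, not the adapter's actual API:

```typescript
// Response shape of a single NocoDB v2 list call.
type Page<T> = { list: T[]; pageInfo: { isLastPage: boolean } };

// Keeps requesting pages until the API reports the last one.
// In the real adapter, fetchPage would wrap fetch() with the
// xc-token auth header and the table's record endpoint.
async function fetchAllRows<T>(
  fetchPage: (offset: number, limit: number) => Promise<Page<T>>,
  limit = 200,
): Promise<T[]> {
  const rows: T[] = [];
  let offset = 0;
  for (;;) {
    const page = await fetchPage(offset, limit);
    rows.push(...page.list);
    if (page.pageInfo.isLastPage) return rows;
    offset += limit;
  }
}
```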

src/lib/dedup/migration-transform.ts
  Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
    - normalizes name / email / phone / country via the dedup library
    - parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
    - maps the 8-stage `Sales Process Level` enum to the new 9-stage
      pipelineStage
    - filters yacht-name placeholders ('TBC', 'Na', etc.)
    - merges Internal Notes + Extra Comments + Berth Size Desired into
      a single notes blob
  Then runs `findClientMatches` pairwise (with blocking) and
  union-finds clusters of rows whose score crosses the auto-link
  threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
  Each cluster's "lead" row is picked by completeness score with
  recency tie-break.
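The clustering step above can be sketched with a minimal union-find: pairs at or over the auto-link threshold are unioned, and each surviving root identifies one merged client cluster (names here are illustrative, not the library's actual API):

```typescript
// Minimal union-find with path compression.
class UnionFind {
  private parent = new Map<number, number>();

  find(x: number): number {
    const p = this.parent.get(x);
    if (p === undefined || p === x) return x;
    const root = this.find(p);
    this.parent.set(x, root); // path compression
    return root;
  }

  union(a: number, b: number): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent.set(ra, rb);
  }
}

const AUTO_LINK_THRESHOLD = 90; // threshold from the design

// Union all auto-link pairs, then group every row id by its root.
function clusterPairs(
  pairs: { a: number; b: number; score: number }[],
): Map<number, number[]> {
  const uf = new UnionFind();
  for (const { a, b, score } of pairs) {
    if (score >= AUTO_LINK_THRESHOLD) uf.union(a, b);
  }
  const clusters = new Map<number, number[]>();
  for (const id of new Set(pairs.flatMap((p) => [p.a, p.b]))) {
    const root = uf.find(id);
    const members = clusters.get(root) ?? [];
    members.push(id);
    clusters.set(root, members);
  }
  return clusters;
}
```

Rows connected only by 50–89 scores never get unioned, so they fall out of this step as singletons and land in the 'needs review' bucket instead.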

src/lib/dedup/migration-report.ts
  Writes three artifacts to .migration/<timestamp>/:
    - report.csv  — one row per planned op, RFC-4180 escaped
    - summary.md  — human-skimmable overview
    - plan.json   — full structured plan for the --apply phase
  CSV cells with comma / quote / newline are quoted; internal quotes
  are doubled. No external CSV dep.
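The quoting rule reduces to a few lines, which is why no external dependency is needed; this is a sketch of the RFC-4180 behavior described, with an illustrative function name:

```typescript
// Quote a CSV cell only when it contains a comma, double quote,
// or newline; embedded quotes are doubled per RFC 4180.
function csvCell(value: string): string {
  if (/[",\r\n]/.test(value)) {
    return `"${value.replace(/"/g, '""')}"`;
  }
  return value;
}
```

For example, `csvCell('Di Costanzo, Gianfranco')` wraps the cell in quotes, while a plain cell passes through untouched.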

src/lib/dedup/phone-parse.ts
  Script-safe wrapper around libphonenumber-js's `core` entry that
  loads `metadata.min.json` directly. The default `index.cjs.js`
  bundled by libphonenumber hits a metadata-shape interop bug under
  Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
  The dedup `normalizePhone` and `find-matches` both use this wrapper
  now so the same code path runs in vitest, Next.js, and the migration
  CLI without surprises.

src/lib/dedup/normalize.ts
  Tightened country resolution: added Caribbean short-form aliases
  ('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
  US locations seen in the NocoDB dump (Boston, Tampa, Fort
  Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
  to drop the `isValid()` strict check — the libphonenumber min build
  rejects many real NANP-territory numbers, and dedup only needs a
  canonical E.164 to compare.
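The resolution order can be sketched as two lookup maps tried in sequence; the entries here are the examples named above, and the real tables and normalization in normalize.ts are larger and may differ:

```typescript
// Short-form country aliases (illustrative subset).
const COUNTRY_ALIASES: Record<string, string> = {
  antigua: 'AG',
  'st kitts': 'KN',
};

// City fallbacks for locations seen in the NocoDB dump.
const CITY_TO_COUNTRY: Record<string, string> = {
  boston: 'US',
  tampa: 'US',
  'fort lauderdale': 'US',
  'port jefferson': 'US',
  nantucket: 'US',
};

// Try the alias map first, then the city map; undefined if neither hits.
function resolveCountry(raw: string): string | undefined {
  const key = raw.trim().toLowerCase();
  return COUNTRY_ALIASES[key] ?? CITY_TO_COUNTRY[key];
}
```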

CLI
===

scripts/migrate-from-nocodb.ts
  pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
    → Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
       runs the transform, writes report. No DB writes.
  pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
    → Stubbed; exits with `not yet implemented` and a pointer to the
       design doc. Apply phase ships in a follow-up.

Tests
=====

tests/unit/dedup/migration-transform.test.ts (7 cases)
  Fixture-based regression. A frozen 12-row NocoDB snapshot covers
  every duplicate pattern in the design (§1.2). The test asserts:
    - 12 input rows → 7 unique clients (cluster math is right)
    - Patterns A / B / C / E auto-link
    - Pattern F (Etiennette Clamouze) does NOT auto-link
    - Every interest preserved as its own row even when clients merge
    - 8-stage → 9-stage enum mapping is correct per spec
    - Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
      client) — the design's signature win
    - Output is deterministic (run twice, identical)

Validation against real data
============================

Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
  - 237 clients (15 merged into 13 clusters)
  - 252 interests (one per source row)
  - 406 contacts, 52 addresses
  - 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
  - 3 pairs flagged for review (Camazou, Zasso, one new)
  - 1 phone placeholder flagged

Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00

/**
 * Migration transform — fixture-based regression test.
 *
 * Feeds the transform a small frozen NocoDB snapshot containing one
 * representative row from each duplicate pattern documented in
 * docs/superpowers/specs/2026-05-03-dedup-and-migration-design.md §1.2,
 * and asserts the resulting plan matches the algorithm's expected
 * behavior. If any future change starts merging Pattern F (Etiennette
 * Clamouze) or stops merging Pattern A (Deepak Ramchandani), this
 * test fails immediately.
 */
import { describe, expect, it } from 'vitest';
import { transformSnapshot } from '@/lib/dedup/migration-transform';
import type { NocoDbRow, NocoDbSnapshot } from '@/lib/dedup/nocodb-source';

function row(fields: Partial<NocoDbRow> & { Id: number }): NocoDbRow {
  return fields as NocoDbRow;
}
const FIXTURE: NocoDbSnapshot = {
  fetchedAt: '2026-05-03T12:00:00.000Z',
  berths: [],
  residentialInterests: [],
  websiteInterestSubmissions: [],
  websiteContactFormSubmissions: [],
  websiteBerthEoiSupplements: [],
  interests: [
    // Pattern A: pure double-submit (Deepak Ramchandani #624/#625)
    row({
      Id: 624,
      'Full Name': 'Deepak Ramchandani',
      'Email Address': 'dannyrams8888@gmail.com',
      'Phone Number': '+17215868888',
      'Sales Process Level': 'General Qualified Interest',
    }),
    row({
      Id: 625,
      'Full Name': 'Deepak Ramchandani',
      'Email Address': 'dannyrams8888@gmail.com',
      'Phone Number': '+17215868888',
      'Sales Process Level': 'General Qualified Interest',
    }),
    // Pattern B: phone format variance (Howard Wiarda #236/#536)
    row({
      Id: 236,
      'Full Name': 'Howard Wiarda',
      'Email Address': 'hwiarda@hotmail.com',
      'Phone Number': '574-274-0548',
      'Place of Residence': 'USA',
      'Sales Process Level': 'General Qualified Interest',
    }),
    row({
      Id: 536,
      'Full Name': 'Howard Wiarda',
      'Email Address': 'hwiarda@hotmail.com',
      'Phone Number': '+15742740548',
      'Sales Process Level': 'General Qualified Interest',
    }),
    // Pattern C: name capitalization (Nicolas Ruiz #681/#682/#683 — three rows)
    row({
      Id: 681,
      'Full Name': 'Nicolas Ruiz',
      'Email Address': 'ruiz.nicolas@ufl.edu',
      'Phone Number': '+17862006617',
      'Sales Process Level': 'General Qualified Interest',
    }),
    row({
      Id: 682,
      'Full Name': 'Nicolas Ruiz',
      'Email Address': 'ruiz.nicolas@ufl.edu',
      'Phone Number': '+17862006617',
      'Sales Process Level': 'Specific Qualified Interest',
    }),
    row({
      Id: 683,
      'Full Name': 'Nicolas Ruiz',
      'Email Address': 'Ruiz.Nicolas@ufl.edu',
      'Phone Number': '+17862006617',
      'Sales Process Level': 'General Qualified Interest',
    }),
    // Pattern E: surname typo with same email + phone (Constanzo/Costanzo)
    row({
      Id: 336,
      'Full Name': 'Gianfranco Di Costanzo',
      'Email Address': 'gdc@nauticall.com',
      'Phone Number': '+17542628669',
      'Yacht Name': 'GEMINI',
      'Sales Process Level': 'Contract Signed',
    }),
    row({
      Id: 585,
      'Full Name': 'Gianfranco Di Constanzo',
      'Email Address': 'gdc@nauticall.com',
      'Phone Number': '+17542628669',
      'Yacht Name': 'CALYPSO',
      'Sales Process Level': 'Signed EOI and NDA',
    }),
    // Pattern F: same name, different country phones (Etiennette Clamouze)
    row({
      Id: 188,
      'Full Name': 'Etiennette Clamouze',
      'Email Address': 'clamouze.etiennette@gmail.com',
      'Phone Number': '+33767780640',
      'Sales Process Level': 'General Qualified Interest',
    }),
    row({
      Id: 717,
      'Full Name': 'Etiennette Clamouze',
      'Email Address': 'Etiennette@the-manoah.com',
      'Phone Number': '+12645815607',
      'Sales Process Level': 'General Qualified Interest',
    }),
    // Single isolated row to verify non-duplicates pass through
    row({
      Id: 999,
      'Full Name': 'Lone Wolf',
      'Email Address': 'lone@example.com',
      'Phone Number': '+15551234567',
      'Sales Process Level': 'General Qualified Interest',
    }),
  ],
};
describe('transformSnapshot — fixture regression', () => {
  it('produces the expected number of clients + interests', () => {
    const plan = transformSnapshot(FIXTURE);
    // 12 input rows → 7 unique clients (Deepak: 1, Wiarda: 1, Ruiz: 1,
    // Constanzo: 1, Etiennette x2: 2, Lone: 1). Etiennette stays as 2
    // because Pattern F is correctly NOT auto-merged.
    expect(plan.stats.outputClients).toBe(7);
    expect(plan.stats.outputInterests).toBe(12); // one per source row
  });

  it('auto-links every Pattern A–E cluster', () => {
    const plan = transformSnapshot(FIXTURE);
    const linkedSourceIds = new Set<number>();
    for (const link of plan.autoLinks) {
      linkedSourceIds.add(link.leadSourceId);
      for (const merged of link.mergedSourceIds) {
        linkedSourceIds.add(merged);
      }
    }
    // Pattern A: 624 + 625
    expect(linkedSourceIds.has(624) && linkedSourceIds.has(625)).toBe(true);
    // Pattern B: 236 + 536
    expect(linkedSourceIds.has(236) && linkedSourceIds.has(536)).toBe(true);
    // Pattern C: 681 + 682 + 683 (three-way)
    expect(linkedSourceIds.has(681) && linkedSourceIds.has(682) && linkedSourceIds.has(683)).toBe(
      true,
    );
    // Pattern E: 336 + 585
    expect(linkedSourceIds.has(336) && linkedSourceIds.has(585)).toBe(true);
  });

  it('does NOT auto-link Pattern F (Etiennette Clamouze, different country)', () => {
    const plan = transformSnapshot(FIXTURE);
    const linkedSourceIds = new Set<number>();
    for (const link of plan.autoLinks) {
      linkedSourceIds.add(link.leadSourceId);
      for (const merged of link.mergedSourceIds) {
        linkedSourceIds.add(merged);
      }
    }
    // Both Etiennette rows must remain as separate clients.
    expect(linkedSourceIds.has(188)).toBe(false);
    expect(linkedSourceIds.has(717)).toBe(false);
  });

  it('preserves every interest as its own row even when clients merge', () => {
    const plan = transformSnapshot(FIXTURE);
    const sourceIds = plan.interests.map((i) => i.sourceId).sort((a, b) => a - b);
    expect(sourceIds).toEqual([188, 236, 336, 536, 585, 624, 625, 681, 682, 683, 717, 999]);
  });

  it('maps the legacy 8-stage enum to new pipeline stages', () => {
    const plan = transformSnapshot(FIXTURE);
    const stagesById = new Map(plan.interests.map((i) => [i.sourceId, i.pipelineStage]));
    expect(stagesById.get(681)).toBe('open'); // General Qualified Interest
    expect(stagesById.get(682)).toBe('details_sent'); // Specific Qualified Interest
    expect(stagesById.get(336)).toBe('contract_signed'); // Contract Signed
    expect(stagesById.get(585)).toBe('eoi_signed'); // Signed EOI and NDA
  });

  it('attaches different yachts to one merged Constanzo client', () => {
    const plan = transformSnapshot(FIXTURE);
    const constanzoClient = plan.clients.find(
      (c) => c.sourceIds.includes(336) && c.sourceIds.includes(585),
    );
    expect(constanzoClient).toBeDefined();
    const yachtsForConstanzo = plan.interests
      .filter((i) => i.clientTempId === constanzoClient!.tempId)
      .map((i) => i.yachtName)
      .sort();
    expect(yachtsForConstanzo).toEqual(['CALYPSO', 'GEMINI']);
  });

  it('produces deterministic output (same input → same plan)', () => {
    // The transform is pure — running it twice should yield bit-identical
    // results. Catches order-dependent bugs in the dedup clustering.
    const a = transformSnapshot(FIXTURE);
    const b = transformSnapshot(FIXTURE);
    expect(JSON.stringify(a.stats)).toBe(JSON.stringify(b.stats));
    expect(a.autoLinks.length).toBe(b.autoLinks.length);
  });
});