Files
pn-new-crm/tests/unit/dedup/migration-transform.test.ts

214 lines
7.6 KiB
TypeScript
Raw Permalink Normal View History

feat(dedup): NocoDB migration script + tables (P3 dry-run) Lands the one-shot migration pipeline from the legacy NocoDB Interests base into the new client/interest schema. Dry-run mode is fully operational: pulls the live snapshot, runs the dedup library, and writes a CSV + Markdown report under .migration/<timestamp>/. The --apply phase is stubbed for a follow-up PR per the design's P3 implementation sequence. Schema additions ================ - `client_merge_candidates` — pairs flagged by the background scoring job for the /admin/duplicates review queue. Status enum: pending / dismissed / merged. Unique-(portId, clientAId, clientBId) so the same pair can't surface twice. Empty until P2 lands the cron. - `migration_source_links` — idempotency ledger. Maps source-system rows (NocoDB Interest #624 → new client UUID) so re-running --apply against the same dry-run report skips already-imported entities. Both tables ship with the migration `0020_unusual_azazel.sql` — already applied to the local dev DB during this commit's preparation. Library ======= src/lib/dedup/nocodb-source.ts Read-only adapter for the legacy NocoDB v2 API. xc-token auth, auto-paginates until isLastPage, captures the table IDs from the 2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in parallel into one in-memory object the transform layer consumes. src/lib/dedup/migration-transform.ts Pure function: NocoDB snapshot in, MigrationPlan out. Per row: - normalizes name / email / phone / country via the dedup library - parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats - maps the 8-stage `Sales Process Level` enum to the new 9-stage pipelineStage - filters yacht-name placeholders ('TBC', 'Na', etc.) - merges Internal Notes + Extra Comments + Berth Size Desired into a single notes blob Then runs `findClientMatches` pairwise (with blocking) and union-finds clusters of rows whose score crosses the auto-link threshold (90). Lower-scoring pairs (50–89) become 'needs review'. Each cluster's "lead" row is picked by completeness score with recency tie-break. src/lib/dedup/migration-report.ts Writes three artifacts to .migration/<timestamp>/: - report.csv — one row per planned op, RFC-4180 escaped - summary.md — human-skimmable overview - plan.json — full structured plan for the --apply phase CSV cells with comma / quote / newline are quoted; internal quotes are doubled. No external CSV dep. src/lib/dedup/phone-parse.ts Script-safe wrapper around libphonenumber-js's `core` entry that loads `metadata.min.json` directly. The default `index.cjs.js` bundled by libphonenumber hits a metadata-shape interop bug under Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it. The dedup `normalizePhone` and `find-matches` both use this wrapper now so the same code path runs in vitest, Next.js, and the migration CLI without surprises. src/lib/dedup/normalize.ts Tightened country resolution: added Caribbean short-form aliases ('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the US locations seen in the NocoDB dump (Boston, Tampa, Fort Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing to drop the `isValid()` strict check — the libphonenumber min build rejects many real NANP-territory numbers, and dedup only needs a canonical E.164 to compare. CLI === scripts/migrate-from-nocodb.ts pnpm tsx scripts/migrate-from-nocodb.ts --dry-run → Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars), runs the transform, writes report. No DB writes. pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/ → Stubbed; exits with `not yet implemented` and a pointer to the design doc. Apply phase ships in a follow-up. Tests ===== tests/unit/dedup/migration-transform.test.ts (7 cases) Fixture-based regression. A frozen 12-row NocoDB snapshot covers every duplicate pattern in the design (§1.2). The test asserts: - 12 input rows → 7 unique clients (cluster math is right) - Patterns A / B / C / E auto-link - Pattern F (Etiennette Clamouze) does NOT auto-link - Every interest preserved as its own row even when clients merge - 8-stage → 9-stage enum mapping is correct per spec - Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one client) — the design's signature win - Output is deterministic (run twice, identical) Validation against real data ============================ Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the live NocoDB. Result on 252 Interests rows: - 237 clients (15 merged into 13 clusters) - 252 interests (one per source row) - 406 contacts, 52 addresses - 13 auto-linked clusters (every confirmed cluster from §1.2 audit) - 3 pairs flagged for review (Camazou, Zasso, one new) - 1 phone placeholder flagged Total dedup test count: 57 (50 from P1 + 7 fixture tests). Lint: clean. Tsc: clean for new files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
/**
* Migration transform fixture-based regression test.
*
* Feeds the transform a small frozen NocoDB snapshot containing one
* representative row from each duplicate pattern documented in
* docs/superpowers/specs/2026-05-03-dedup-and-migration-design.md §1.2,
* and asserts the resulting plan matches the algorithm's expected
* behavior. If any future change starts merging Pattern F (Etiennette
* Clamouze) or stops merging Pattern A (Deepak Ramchandani), this
* test fails immediately.
*/
import { describe, expect, it } from 'vitest';
import { transformSnapshot } from '@/lib/dedup/migration-transform';
import type { NocoDbRow, NocoDbSnapshot } from '@/lib/dedup/nocodb-source';
function row(fields: Partial<NocoDbRow> & { Id: number }): NocoDbRow {
return fields as NocoDbRow;
}
const FIXTURE: NocoDbSnapshot = {
fetchedAt: '2026-05-03T12:00:00.000Z',
berths: [],
residentialInterests: [],
websiteInterestSubmissions: [],
websiteContactFormSubmissions: [],
websiteBerthEoiSupplements: [],
interests: [
// Pattern A: pure double-submit (Deepak Ramchandani #624/#625)
row({
Id: 624,
'Full Name': 'Deepak Ramchandani',
'Email Address': 'dannyrams8888@gmail.com',
'Phone Number': '+17215868888',
'Sales Process Level': 'General Qualified Interest',
}),
row({
Id: 625,
'Full Name': 'Deepak Ramchandani',
'Email Address': 'dannyrams8888@gmail.com',
'Phone Number': '+17215868888',
'Sales Process Level': 'General Qualified Interest',
}),
// Pattern B: phone format variance (Howard Wiarda #236/#536)
row({
Id: 236,
'Full Name': 'Howard Wiarda',
'Email Address': 'hwiarda@hotmail.com',
'Phone Number': '574-274-0548',
'Place of Residence': 'USA',
'Sales Process Level': 'General Qualified Interest',
}),
row({
Id: 536,
'Full Name': 'Howard Wiarda',
'Email Address': 'hwiarda@hotmail.com',
'Phone Number': '+15742740548',
'Sales Process Level': 'General Qualified Interest',
}),
// Pattern C: name capitalization (Nicolas Ruiz #681/#682/#683 — three rows)
row({
Id: 681,
'Full Name': 'Nicolas Ruiz',
'Email Address': 'ruiz.nicolas@ufl.edu',
'Phone Number': '+17862006617',
'Sales Process Level': 'General Qualified Interest',
}),
row({
Id: 682,
'Full Name': 'Nicolas Ruiz',
'Email Address': 'ruiz.nicolas@ufl.edu',
'Phone Number': '+17862006617',
'Sales Process Level': 'Specific Qualified Interest',
}),
row({
Id: 683,
'Full Name': 'Nicolas Ruiz',
'Email Address': 'Ruiz.Nicolas@ufl.edu',
'Phone Number': '+17862006617',
'Sales Process Level': 'General Qualified Interest',
}),
// Pattern E: surname typo with same email + phone (Constanzo/Costanzo)
row({
Id: 336,
'Full Name': 'Gianfranco Di Costanzo',
'Email Address': 'gdc@nauticall.com',
'Phone Number': '+17542628669',
'Yacht Name': 'GEMINI',
'Sales Process Level': 'Contract Signed',
}),
row({
Id: 585,
'Full Name': 'Gianfranco Di Constanzo',
'Email Address': 'gdc@nauticall.com',
'Phone Number': '+17542628669',
'Yacht Name': 'CALYPSO',
'Sales Process Level': 'Signed EOI and NDA',
}),
// Pattern F: same name, different country phones (Etiennette Clamouze)
row({
Id: 188,
'Full Name': 'Etiennette Clamouze',
'Email Address': 'clamouze.etiennette@gmail.com',
'Phone Number': '+33767780640',
'Sales Process Level': 'General Qualified Interest',
}),
row({
Id: 717,
'Full Name': 'Etiennette Clamouze',
'Email Address': 'Etiennette@the-manoah.com',
'Phone Number': '+12645815607',
'Sales Process Level': 'General Qualified Interest',
}),
// Single isolated row to verify non-duplicates pass through
row({
Id: 999,
'Full Name': 'Lone Wolf',
'Email Address': 'lone@example.com',
'Phone Number': '+15551234567',
'Sales Process Level': 'General Qualified Interest',
}),
],
};
describe('transformSnapshot — fixture regression', () => {
it('produces the expected number of clients + interests', () => {
const plan = transformSnapshot(FIXTURE);
// 12 input rows → 7 unique clients (Deepak: 1, Wiarda: 1, Ruiz: 1,
// Constanzo: 1, Etiennette x2: 2, Lone: 1). Etiennette stays as 2
// because Pattern F is correctly NOT auto-merged.
expect(plan.stats.outputClients).toBe(7);
expect(plan.stats.outputInterests).toBe(12); // one per source row
});
it('auto-links every Pattern AE cluster', () => {
const plan = transformSnapshot(FIXTURE);
const linkedSourceIds = new Set<number>();
for (const link of plan.autoLinks) {
linkedSourceIds.add(link.leadSourceId);
for (const merged of link.mergedSourceIds) {
linkedSourceIds.add(merged);
}
}
// Pattern A: 624 + 625
expect(linkedSourceIds.has(624) && linkedSourceIds.has(625)).toBe(true);
// Pattern B: 236 + 536
expect(linkedSourceIds.has(236) && linkedSourceIds.has(536)).toBe(true);
// Pattern C: 681 + 682 + 683 (three-way)
expect(linkedSourceIds.has(681) && linkedSourceIds.has(682) && linkedSourceIds.has(683)).toBe(
true,
);
// Pattern E: 336 + 585
expect(linkedSourceIds.has(336) && linkedSourceIds.has(585)).toBe(true);
});
it('does NOT auto-link Pattern F (Etiennette Clamouze, different country)', () => {
const plan = transformSnapshot(FIXTURE);
const linkedSourceIds = new Set<number>();
for (const link of plan.autoLinks) {
linkedSourceIds.add(link.leadSourceId);
for (const merged of link.mergedSourceIds) {
linkedSourceIds.add(merged);
}
}
// Both Etiennette rows must remain as separate clients.
expect(linkedSourceIds.has(188)).toBe(false);
expect(linkedSourceIds.has(717)).toBe(false);
});
it('preserves every interest as its own row even when clients merge', () => {
const plan = transformSnapshot(FIXTURE);
const sourceIds = plan.interests.map((i) => i.sourceId).sort((a, b) => a - b);
expect(sourceIds).toEqual([188, 236, 336, 536, 585, 624, 625, 681, 682, 683, 717, 999]);
});
it('maps the legacy 8-stage enum to new pipeline stages', () => {
const plan = transformSnapshot(FIXTURE);
const stagesById = new Map(plan.interests.map((i) => [i.sourceId, i.pipelineStage]));
expect(stagesById.get(681)).toBe('open'); // General Qualified Interest
expect(stagesById.get(682)).toBe('details_sent'); // Specific Qualified Interest
expect(stagesById.get(336)).toBe('contract_signed'); // Contract Signed
expect(stagesById.get(585)).toBe('eoi_signed'); // Signed EOI and NDA
});
it('attaches different yachts to one merged Constanzo client', () => {
const plan = transformSnapshot(FIXTURE);
const constanzoClient = plan.clients.find(
(c) => c.sourceIds.includes(336) && c.sourceIds.includes(585),
);
expect(constanzoClient).toBeDefined();
const yachtsForConstanzo = plan.interests
.filter((i) => i.clientTempId === constanzoClient!.tempId)
.map((i) => i.yachtName)
.sort();
expect(yachtsForConstanzo).toEqual(['CALYPSO', 'GEMINI']);
});
it('produces deterministic output (same input → same plan)', () => {
// The transform is pure — running it twice should yield bit-identical
// results. Catches order-dependent bugs in the dedup clustering.
const a = transformSnapshot(FIXTURE);
const b = transformSnapshot(FIXTURE);
expect(JSON.stringify(a.stats)).toBe(JSON.stringify(b.stats));
expect(a.autoLinks.length).toBe(b.autoLinks.length);
});
});