feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests base into the new client/interest schema. Dry-run mode is fully operational: it pulls the live snapshot, runs the dedup library, and writes a CSV + Markdown report under .migration/<timestamp>/. The --apply phase is stubbed for a follow-up PR, per the design's P3 implementation sequence.

Schema additions
================

- `client_merge_candidates` — pairs flagged by the background scoring job for the /admin/duplicates review queue. Status enum: pending / dismissed / merged. Unique on (portId, clientAId, clientBId) so the same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system rows (NocoDB Interest #624 → new client UUID) so re-running --apply against the same dry-run report skips already-imported entities.

Both tables ship with migration `0020_unusual_azazel.sql`, already applied to the local dev DB while preparing this commit.

Library
=======

src/lib/dedup/nocodb-source.ts
  Read-only adapter for the legacy NocoDB v2 API. xc-token auth, auto-paginates until isLastPage, captures the table IDs from the 2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in parallel into one in-memory object the transform layer consumes.

src/lib/dedup/migration-transform.ts
  Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
  - normalizes name / email / phone / country via the dedup library
  - parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
  - maps the 8-stage `Sales Process Level` enum to the new 9-stage pipelineStage
  - filters yacht-name placeholders ('TBC', 'Na', etc.)
  - merges Internal Notes + Extra Comments + Berth Size Desired into a single notes blob
  Then runs `findClientMatches` pairwise (with blocking) and union-finds clusters of rows whose score crosses the auto-link threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
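The clustering step described above can be sketched with a plain union-find. This is an illustrative sketch, not the actual `migration-transform` internals: the `ScoredPair` shape, function names, and row indices are assumptions; only the threshold values (auto-link at 90) come from the commit message.

```typescript
// Sketch only: union-find over pairwise match scores. Pairs at or above
// the auto-link threshold are merged into one cluster; lower-scoring
// pairs are left for human review (handled elsewhere).
const AUTO_LINK_THRESHOLD = 90;

interface ScoredPair {
  a: number; // row index
  b: number; // row index
  score: number;
}

function clusterRows(rowCount: number, pairs: ScoredPair[]): number[][] {
  // parent[i] points toward the representative of i's cluster.
  const parent = Array.from({ length: rowCount }, (_, i) => i);
  const find = (x: number): number =>
    parent[x] === x ? x : (parent[x] = find(parent[x])); // path compression

  for (const { a, b, score } of pairs) {
    if (score >= AUTO_LINK_THRESHOLD) parent[find(a)] = find(b);
  }

  // Group rows by their cluster representative.
  const clusters = new Map<number, number[]>();
  for (let i = 0; i < rowCount; i++) {
    const root = find(i);
    if (!clusters.has(root)) clusters.set(root, []);
    clusters.get(root)!.push(i);
  }
  return [...clusters.values()];
}
```

Transitive merges fall out for free: if rows 0↔1 and 1↔2 both cross the threshold, all three land in one cluster even if 0↔2 never scored.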
  Each cluster's "lead" row is picked by completeness score with a recency tie-break.

src/lib/dedup/migration-report.ts
  Writes three artifacts to .migration/<timestamp>/:
  - report.csv — one row per planned op, RFC-4180 escaped
  - summary.md — human-skimmable overview
  - plan.json — full structured plan for the --apply phase
  CSV cells containing a comma / quote / newline are quoted; internal quotes are doubled. No external CSV dependency.

src/lib/dedup/phone-parse.ts
  Script-safe wrapper around libphonenumber-js's `core` entry that loads `metadata.min.json` directly. The default `index.cjs.js` bundled by libphonenumber hits a metadata-shape interop bug under Node 25 + tsx (`{ default }` wrapping); core + JSON sidesteps it. The dedup `normalizePhone` and `find-matches` both use this wrapper now, so the same code path runs in vitest, Next.js, and the migration CLI without surprises.

src/lib/dedup/normalize.ts
  Tightened country resolution: added Caribbean short-form aliases ('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the US locations seen in the NocoDB dump (Boston, Tampa, Fort Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing to drop the strict `isValid()` check — the libphonenumber min build rejects many real NANP-territory numbers, and dedup only needs a canonical E.164 to compare.

CLI
===

scripts/migrate-from-nocodb.ts

  pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
    Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars), runs the transform, writes the report. No DB writes.

  pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
    Stubbed; exits with `not yet implemented` and a pointer to the design doc. The apply phase ships in a follow-up.

Tests
=====

tests/unit/dedup/migration-transform.test.ts (7 cases)
  Fixture-based regression. A frozen 12-row NocoDB snapshot covers every duplicate pattern in the design (§1.2).
  The test asserts:
  - 12 input rows → 7 unique clients (cluster math is right)
  - Patterns A / B / C / E auto-link
  - Pattern F (Etiennette Clamouze) does NOT auto-link
  - Every interest is preserved as its own row even when clients merge
  - 8-stage → 9-stage enum mapping is correct per spec
  - Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one client) — the design's signature win
  - Output is deterministic (run twice, identical)

Validation against real data
============================

Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from the §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged

Total dedup test count: 57 (50 from P1 + 7 fixture tests). Lint: clean. Tsc: clean for new files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
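The lead-row rule mentioned above ("completeness score with recency tie-break") is simple enough to sketch. Everything here is an illustrative assumption — the `Row` shape, field names, and epoch-millisecond `updatedAt` are not the transform's real types, only the selection rule comes from the commit message:

```typescript
// Illustrative only: pick a cluster's lead row as the one with the most
// non-empty fields, breaking ties by the most recent update timestamp.
interface Row {
  id: number;
  fields: Record<string, string | null>;
  updatedAt: number; // epoch ms (assumed representation)
}

function completeness(row: Row): number {
  return Object.values(row.fields).filter((v) => v != null && v !== '').length;
}

function pickLead(cluster: Row[]): Row {
  return cluster.reduce((best, row) => {
    const diff = completeness(row) - completeness(best);
    if (diff > 0) return row;
    if (diff === 0 && row.updatedAt > best.updatedAt) return row;
    return best;
  });
}
```

The lead row keeps its identity; the other rows' interests still survive as separate rows, per the fixture assertion above.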
.gitignore (+1 line)
@@ -34,3 +34,4 @@ docker-compose.override.yml
 # Mobile audit screenshots — generated locally, regenerable
 /.audit/
+.migration/
scripts/migrate-from-nocodb.ts (new file, 144 lines)
@@ -0,0 +1,144 @@
/**
 * One-shot migration: legacy NocoDB Interests → new client/interest split.
 *
 * Usage:
 *
 *   pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
 *     Pulls the live NocoDB base, runs the transform + dedup pipeline,
 *     writes a report to .migration/<timestamp>/. NO database writes.
 *
 *   pnpm tsx scripts/migrate-from-nocodb.ts --dry-run --port-slug harbor-royale
 *     Same, but tags the planned writes with the named port (matters for
 *     the apply phase — every client/interest belongs to one port).
 *
 *   pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
 *     [Not yet implemented — apply phase comes in a follow-up PR.]
 *
 * Design reference: docs/superpowers/specs/2026-05-03-dedup-and-migration-design.md §9.
 */

import 'dotenv/config';

import path from 'node:path';
import { fileURLToPath } from 'node:url';

import { fetchSnapshot, loadNocoDbConfig } from '@/lib/dedup/nocodb-source';
import { transformSnapshot } from '@/lib/dedup/migration-transform';
import { resolveReportPaths, writeReport } from '@/lib/dedup/migration-report';

interface CliArgs {
  dryRun: boolean;
  apply: boolean;
  portSlug: string | null;
  reportDir: string | null;
}

function parseArgs(argv: string[]): CliArgs {
  const args: CliArgs = {
    dryRun: false,
    apply: false,
    portSlug: null,
    reportDir: null,
  };
  for (let i = 0; i < argv.length; i += 1) {
    const a = argv[i]!;
    if (a === '--dry-run') args.dryRun = true;
    else if (a === '--apply') args.apply = true;
    else if (a === '--port-slug') args.portSlug = argv[++i] ?? null;
    else if (a === '--report') args.reportDir = argv[++i] ?? null;
    else if (a === '-h' || a === '--help') {
      printHelp();
      process.exit(0);
    } else {
      console.error(`Unknown argument: ${a}`);
      printHelp();
      process.exit(1);
    }
  }
  return args;
}

function printHelp(): void {
  console.log(`Usage:
  pnpm tsx scripts/migrate-from-nocodb.ts --dry-run [--port-slug <slug>]
      Pulls NocoDB → transforms → writes report to .migration/<timestamp>/.
      No database writes.

  pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
      Apply phase. (Not yet implemented.)

Flags:
  --dry-run           Read NocoDB, write report only.
  --apply             Actually write to the new DB. (Not yet supported.)
  --port-slug <slug>  Port slug to attach to all imported entities.
                      Defaults to the first available port if omitted.
  --report <dir>      Path to a previously-generated report dir
                      (only used by --apply).
  -h, --help          Show this help.
`);
}

async function main(): Promise<void> {
  const args = parseArgs(process.argv.slice(2));

  if (!args.dryRun && !args.apply) {
    console.error('Must specify --dry-run or --apply');
    printHelp();
    process.exit(1);
  }

  if (args.apply) {
    console.error('--apply is not yet implemented in this version. P3 ships dry-run first.');
    console.error('See docs/superpowers/specs/2026-05-03-dedup-and-migration-design.md §9.2.');
    process.exit(2);
  }

  // ── Dry-run path ───────────────────────────────────────────────────────────

  console.log('[migrate] Loading NocoDB config…');
  const config = loadNocoDbConfig();
  console.log(`[migrate] Source: ${config.url}`);

  console.log('[migrate] Fetching snapshot from NocoDB…');
  const start = Date.now();
  const snapshot = await fetchSnapshot(config);
  const elapsed = ((Date.now() - start) / 1000).toFixed(1);
  console.log(
    `[migrate] Snapshot fetched in ${elapsed}s — ${snapshot.interests.length} interests, ${snapshot.residentialInterests.length} residential, ${snapshot.berths.length} berths.`,
  );

  console.log('[migrate] Running transform + dedup pipeline…');
  const plan = transformSnapshot(snapshot);

  // Resolve output paths relative to the worktree root (the script itself
  // lives in scripts/; we want the .migration dir at the repo root).
  const scriptDir = path.dirname(fileURLToPath(import.meta.url));
  const repoRoot = path.resolve(scriptDir, '..');
  const generatedAt = new Date().toISOString();
  const paths = resolveReportPaths(repoRoot);

  console.log(`[migrate] Writing report to ${paths.rootDir}…`);
  await writeReport(paths, plan, generatedAt);

  // ── Console summary ──────────────────────────────────────────────────────
  const s = plan.stats;
  console.log('');
  console.log('=== Migration Plan Summary ===');
  console.log(
    `  Input:  ${s.inputInterestRows} interests, ${s.inputResidentialRows} residential interests`,
  );
  console.log(`  Output: ${s.outputClients} clients, ${s.outputInterests} interests`);
  console.log(`          ${s.outputContacts} contacts, ${s.outputAddresses} addresses`);
  console.log(
    `  Dedup:  ${s.autoLinkedClusters} auto-linked clusters, ${s.needsReviewPairs} pairs flagged for review`,
  );
  console.log(`  Quality: ${s.flaggedRows} rows flagged (see report.csv)`);
  console.log('');
  console.log(`  Full report: ${paths.summaryPath}`);
  console.log('');
}

main().catch((err) => {
  console.error('[migrate] Fatal error:', err);
  process.exit(1);
});
src/lib/db/migrations/0020_unusual_azazel.sql (new file, 30 lines)
@@ -0,0 +1,30 @@
CREATE TABLE "client_merge_candidates" (
    "id" text PRIMARY KEY NOT NULL,
    "port_id" text NOT NULL,
    "client_a_id" text NOT NULL,
    "client_b_id" text NOT NULL,
    "score" integer NOT NULL,
    "reasons" jsonb NOT NULL,
    "status" text DEFAULT 'pending' NOT NULL,
    "created_at" timestamp with time zone DEFAULT now() NOT NULL,
    "resolved_at" timestamp with time zone,
    "resolved_by" text
);
--> statement-breakpoint
CREATE TABLE "migration_source_links" (
    "id" text PRIMARY KEY NOT NULL,
    "source_system" text NOT NULL,
    "source_id" text NOT NULL,
    "target_entity_type" text NOT NULL,
    "target_entity_id" text NOT NULL,
    "applied_id" text NOT NULL,
    "applied_by" text,
    "applied_at" timestamp with time zone DEFAULT now() NOT NULL
);
--> statement-breakpoint
ALTER TABLE "client_merge_candidates" ADD CONSTRAINT "client_merge_candidates_port_id_ports_id_fk" FOREIGN KEY ("port_id") REFERENCES "public"."ports"("id") ON DELETE no action ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "client_merge_candidates" ADD CONSTRAINT "client_merge_candidates_client_a_id_clients_id_fk" FOREIGN KEY ("client_a_id") REFERENCES "public"."clients"("id") ON DELETE cascade ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "client_merge_candidates" ADD CONSTRAINT "client_merge_candidates_client_b_id_clients_id_fk" FOREIGN KEY ("client_b_id") REFERENCES "public"."clients"("id") ON DELETE cascade ON UPDATE no action;--> statement-breakpoint
CREATE INDEX "idx_cmc_port_status" ON "client_merge_candidates" USING btree ("port_id","status");--> statement-breakpoint
CREATE UNIQUE INDEX "idx_cmc_pair" ON "client_merge_candidates" USING btree ("port_id","client_a_id","client_b_id");--> statement-breakpoint
CREATE UNIQUE INDEX "idx_msl_source_target" ON "migration_source_links" USING btree ("source_system","source_id","target_entity_type");
src/lib/db/migrations/meta/0020_snapshot.json (new file, 10,482 lines — diff suppressed because it is too large)

src/lib/db/migrations/meta/_journal.json (path inferred; the captured diff lost this file header)
@@ -141,6 +141,13 @@
     "when": 1777671562738,
     "tag": "0019_lazy_vampiro",
     "breakpoints": true
+  },
+  {
+    "idx": 20,
+    "version": "7",
+    "when": 1777811835982,
+    "tag": "0020_unusual_azazel",
+    "breakpoints": true
   }
 ]
}
src/lib/db/schema (clients schema module; file path not shown in the captured diff)
@@ -2,6 +2,7 @@ import {
   pgTable,
   text,
   boolean,
+  integer,
   timestamp,
   jsonb,
   index,
@@ -145,6 +146,54 @@ export const clientMergeLog = pgTable(
   (table) => [index('idx_cml_port').on(table.portId)],
 );
+
+/**
+ * Pairs of clients flagged by the background scoring job as potential
+ * duplicates. The `/admin/duplicates` review queue reads from here.
+ *
+ * Lifecycle:
+ * - Background job inserts a row when a pair scores >= the
+ *   `dedup_review_queue_threshold` system setting.
+ * - User reviews in the admin UI and either merges (status='merged')
+ *   or dismisses (status='dismissed').
+ * - Subsequent runs of the scoring job skip pairs already
+ *   `dismissed` so the same false-positive doesn't keep reappearing.
+ *   A future score increase recreates the row.
+ *
+ * Pairs are stored canonically with `clientAId < clientBId` (string
+ * comparison) so the same pair only generates one row regardless of
+ * scoring direction.
+ */
+export const clientMergeCandidates = pgTable(
+  'client_merge_candidates',
+  {
+    id: text('id')
+      .primaryKey()
+      .$defaultFn(() => crypto.randomUUID()),
+    portId: text('port_id')
+      .notNull()
+      .references(() => ports.id),
+    clientAId: text('client_a_id')
+      .notNull()
+      .references(() => clients.id, { onDelete: 'cascade' }),
+    clientBId: text('client_b_id')
+      .notNull()
+      .references(() => clients.id, { onDelete: 'cascade' }),
+    score: integer('score').notNull(),
+    /** Human-readable rule list, e.g. ["email match", "phone match"]. */
+    reasons: jsonb('reasons').notNull(),
+    status: text('status').notNull().default('pending'), // pending | dismissed | merged
+    createdAt: timestamp('created_at', { withTimezone: true }).notNull().defaultNow(),
+    resolvedAt: timestamp('resolved_at', { withTimezone: true }),
+    resolvedBy: text('resolved_by'),
+  },
+  (table) => [
+    index('idx_cmc_port_status').on(table.portId, table.status),
+    // Same pair shouldn't surface twice — enforce uniqueness on the
+    // canonical (a < b) ordering.
+    uniqueIndex('idx_cmc_pair').on(table.portId, table.clientAId, table.clientBId),
+  ],
+);
+
 export const clientAddresses = pgTable(
   'client_addresses',
   {
@@ -190,3 +239,5 @@ export type ClientMergeLog = typeof clientMergeLog.$inferSelect;
 export type NewClientMergeLog = typeof clientMergeLog.$inferInsert;
 export type ClientAddress = typeof clientAddresses.$inferSelect;
 export type NewClientAddress = typeof clientAddresses.$inferInsert;
+export type ClientMergeCandidate = typeof clientMergeCandidates.$inferSelect;
+export type NewClientMergeCandidate = typeof clientMergeCandidates.$inferInsert;
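The canonical `clientAId < clientBId` ordering documented in the schema comment above can be sketched as a tiny helper. The function name is hypothetical, not part of this commit; only the convention (lexicographic string comparison, one orientation per pair) comes from the doc comment:

```typescript
// Sketch: store duplicate pairs in one canonical orientation so the
// unique index on (port_id, client_a_id, client_b_id) deduplicates
// both scoring directions. `canonicalPair` is a hypothetical name.
function canonicalPair(x: string, y: string): [a: string, b: string] {
  return x < y ? [x, y] : [y, x];
}
```

Whichever direction the scoring job encounters a pair, the insert targets the same row, and the unique index rejects the second attempt.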
src/lib/db/schema barrel (path inferred from content; file header lost in the captured diff)
@@ -56,5 +56,8 @@ export * from './ai-usage';
 // GDPR export tracking (Phase 3d)
 export * from './gdpr';
+
+// Migration ledger (one-shot scripts — NocoDB import etc.)
+export * from './migration';
 
 // Relations (must come last — references all tables)
 export * from './relations';
src/lib/db/schema/migration.ts (new file, 48 lines)
@@ -0,0 +1,48 @@
import { pgTable, text, timestamp, uniqueIndex } from 'drizzle-orm/pg-core';

/**
 * Idempotency ledger for one-shot data migrations from external sources
 * (e.g. the legacy NocoDB Interests table).
 *
 * Every entity created during a migration script's `--apply` run gets a
 * row here mapping the source-system row identifier to the new-system
 * entity id. Re-running `--apply` against the same report skips rows
 * already linked, so partial-failure resumption is just "run again."
 *
 * One source row can generate multiple new entities (e.g. one NocoDB
 * Interests row → one client + one interest + one yacht), so the
 * uniqueness constraint includes `target_entity_type`.
 */
export const migrationSourceLinks = pgTable(
  'migration_source_links',
  {
    id: text('id')
      .primaryKey()
      .$defaultFn(() => crypto.randomUUID()),
    /** e.g. 'nocodb_interests', 'nocodb_residences', 'nocodb_website_submissions'. */
    sourceSystem: text('source_system').notNull(),
    /** Source row identifier as a string (NocoDB IDs are integers; we keep
     * text here for forward compat with other sources). */
    sourceId: text('source_id').notNull(),
    /** e.g. 'client', 'interest', 'yacht', 'document'. */
    targetEntityType: text('target_entity_type').notNull(),
    /** UUID of the new-system entity (clients.id, interests.id, etc.). */
    targetEntityId: text('target_entity_id').notNull(),
    /** Apply-id from the migration run that created this link — pairs with
     * the on-disk apply manifest so `--rollback --apply-id <id>` knows
     * exactly which links to remove. */
    appliedId: text('applied_id').notNull(),
    appliedBy: text('applied_by'),
    appliedAt: timestamp('applied_at', { withTimezone: true }).notNull().defaultNow(),
  },
  (table) => [
    uniqueIndex('idx_msl_source_target').on(
      table.sourceSystem,
      table.sourceId,
      table.targetEntityType,
    ),
  ],
);

export type MigrationSourceLink = typeof migrationSourceLinks.$inferSelect;
export type NewMigrationSourceLink = typeof migrationSourceLinks.$inferInsert;
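A hedged sketch of how the ledger's unique key gives "run again" resumption. The apply phase is stubbed in this commit, so everything below is an assumption about how `--apply` might consult the ledger; the function names and shapes are invented for illustration:

```typescript
// Assumption: --apply would key each planned write by
// (sourceSystem, sourceId, targetEntityType) — the same triple as the
// idx_msl_source_target unique index — and skip keys already present.
type LinkKey = string;

function linkKey(sourceSystem: string, sourceId: string, targetEntityType: string): LinkKey {
  // NUL separator avoids ambiguity if an id ever contains ':' etc.
  return `${sourceSystem}\u0000${sourceId}\u0000${targetEntityType}`;
}

interface PlannedWrite {
  sourceSystem: string;
  sourceId: string;
  targetEntityType: string;
}

function remainingWrites(planned: PlannedWrite[], existing: Set<LinkKey>): PlannedWrite[] {
  return planned.filter(
    (p) => !existing.has(linkKey(p.sourceSystem, p.sourceId, p.targetEntityType)),
  );
}
```

After a partial failure, the rows written before the crash are in the ledger, so a rerun filters them out and resumes where it stopped.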
src/lib/dedup/find-matches.ts (path inferred from the commit message; file header lost in the captured diff)
@@ -15,7 +15,7 @@
  * Design reference: docs/superpowers/specs/2026-05-03-dedup-and-migration-design.md §4.
  */
 
-import { parsePhone } from '@/lib/i18n/phone';
+import { parsePhoneScriptSafe as parsePhone } from './phone-parse';
 
 import { levenshtein } from './normalize';
274
src/lib/dedup/migration-report.ts
Normal file
274
src/lib/dedup/migration-report.ts
Normal file
@@ -0,0 +1,274 @@
|
|||||||
|
/**
|
||||||
|
* Migration report writer — turns a `MigrationPlan` (from
|
||||||
|
* `migration-transform.ts`) into a CSV + a human-readable Markdown
|
||||||
|
* summary on disk under `.migration/<timestamp>/`.
|
||||||
|
*
|
||||||
|
* The CSV format is intentionally machine-friendly (one row per
|
||||||
|
* planned operation) so it can be diffed across runs and inspected
|
||||||
|
* by hand. The summary is designed for "open this in your editor and
|
||||||
|
* eyeball it for 5 minutes before --apply."
|
||||||
|
*/
|
||||||
|
|
||||||
|
import { promises as fs } from 'node:fs';
|
||||||
|
import path from 'node:path';
|
||||||
|
|
||||||
|
import type { MigrationPlan } from './migration-transform';
|
||||||
|
|
||||||
|
// ─── Output directory ───────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
export interface ReportPaths {
|
||||||
|
rootDir: string;
|
||||||
|
csvPath: string;
|
||||||
|
summaryPath: string;
|
||||||
|
planJsonPath: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Resolve report paths relative to the worktree root. The timestamped
|
||||||
|
* directory is created lazily by `writeReport`. */
|
||||||
|
export function resolveReportPaths(
|
||||||
|
rootDir: string,
|
||||||
|
timestamp: string = new Date().toISOString().replace(/[:.]/g, '-'),
|
||||||
|
): ReportPaths {
|
||||||
|
const dir = path.join(rootDir, '.migration', timestamp);
|
||||||
|
return {
|
||||||
|
rootDir: dir,
|
||||||
|
csvPath: path.join(dir, 'report.csv'),
|
||||||
|
summaryPath: path.join(dir, 'summary.md'),
|
||||||
|
planJsonPath: path.join(dir, 'plan.json'),
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
// ─── CSV row shape ──────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
interface CsvRow {
|
||||||
|
op: string; // create_client / create_contact / create_interest / auto_link / flag / needs_review
|
||||||
|
reason: string;
|
||||||
|
source_id: string;
|
||||||
|
target_table: string;
|
||||||
|
target_value: string;
|
||||||
|
confidence: string;
|
||||||
|
manual_review: 'true' | 'false';
|
||||||
|
}
|
||||||
|
|
||||||
|
// Trivial CSV escape: quote any cell that contains comma / quote / newline,
|
||||||
|
// double up internal quotes per RFC 4180. No need for a dependency.
|
||||||
|
function csvEscape(s: string): string {
|
||||||
|
if (/[",\n\r]/.test(s)) {
|
||||||
|
return `"${s.replace(/"/g, '""')}"`;
|
||||||
|
}
|
||||||
|
return s;
|
||||||
|
}
|
||||||
|
|
||||||
|
function rowToCsvLine(r: CsvRow): string {
|
||||||
|
return [
|
||||||
|
r.op,
|
||||||
|
r.reason,
|
||||||
|
r.source_id,
|
||||||
|
r.target_table,
|
||||||
|
r.target_value,
|
||||||
|
r.confidence,
|
||||||
|
r.manual_review,
|
||||||
|
]
|
||||||
|
.map(csvEscape)
|
||||||
|
.join(',');
|
||||||
|
}
|
||||||
|
|
||||||
|
// ─── Build CSV ──────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
export function buildCsv(plan: MigrationPlan): string {
|
||||||
|
const lines: string[] = [];
|
||||||
|
lines.push(
|
||||||
|
[
|
||||||
|
'op',
|
||||||
|
'reason',
|
||||||
|
'source_id',
|
||||||
|
'target_table',
|
||||||
|
'target_value',
|
||||||
|
'confidence',
|
||||||
|
'manual_review',
|
||||||
|
].join(','),
|
||||||
|
);
|
||||||
|
|
||||||
|
for (const client of plan.clients) {
|
||||||
|
lines.push(
|
||||||
|
rowToCsvLine({
|
||||||
|
op: 'create_client',
|
||||||
|
reason: client.sourceIds.length > 1 ? 'auto-merged cluster' : 'new',
|
||||||
|
source_id: client.sourceIds.join('|'),
|
||||||
|
target_table: 'clients.fullName',
|
||||||
|
target_value: client.fullName,
|
||||||
|
confidence: 'N/A',
|
||||||
|
manual_review: 'false',
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
for (const c of client.contacts) {
|
||||||
|
lines.push(
|
||||||
|
rowToCsvLine({
|
||||||
|
op: 'create_contact',
|
||||||
|
reason: c.flagged ?? 'new',
|
||||||
|
source_id: client.sourceIds.join('|'),
|
||||||
|
target_table: `clientContacts.${c.channel}`,
|
||||||
|
target_value: c.value,
|
||||||
|
confidence: 'N/A',
|
||||||
|
manual_review: c.flagged ? 'true' : 'false',
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
}
|
||||||
|
for (const a of client.addresses) {
|
||||||
|
lines.push(
|
||||||
|
rowToCsvLine({
|
||||||
|
op: 'create_address',
|
||||||
|
reason: 'address text present',
|
||||||
|
source_id: client.sourceIds.join('|'),
|
||||||
|
target_table: 'clientAddresses.countryIso',
|
||||||
|
target_value: a.countryIso ?? '(unresolved)',
|
||||||
|
confidence: a.countryConfidence ?? 'fallback',
|
||||||
|
manual_review: a.countryConfidence === 'fallback' || !a.countryIso ? 'true' : 'false',
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
for (const interest of plan.interests) {
|
||||||
|
lines.push(
|
||||||
|
rowToCsvLine({
|
||||||
|
op: 'create_interest',
|
||||||
|
reason: `pipelineStage=${interest.pipelineStage}`,
|
||||||
|
source_id: String(interest.sourceId),
|
||||||
|
target_table: 'interests',
|
||||||
|
target_value: `${interest.berthMooringNumber ?? '(no berth)'} / ${interest.yachtName ?? '(no yacht)'}`,
|
||||||
|
confidence: 'N/A',
|
||||||
|
manual_review: 'false',
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
for (const link of plan.autoLinks) {
|
||||||
|
lines.push(
|
||||||
|
rowToCsvLine({
|
||||||
|
op: 'auto_link',
|
||||||
|
reason: link.reasons.join(' + '),
|
||||||
|
source_id: `${link.leadSourceId}<-${link.mergedSourceIds.join(',')}`,
|
||||||
|
target_table: 'clients',
|
||||||
|
target_value: '(merged into lead)',
|
||||||
|
confidence: `score=${link.score}`,
|
||||||
|
manual_review: 'false',
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
for (const pair of plan.needsReview) {
|
||||||
|
lines.push(
|
||||||
|
rowToCsvLine({
|
||||||
|
op: 'needs_review',
|
||||||
|
reason: pair.reasons.join(' + '),
|
||||||
|
source_id: `${pair.aSourceId}<->${pair.bSourceId}`,
|
||||||
|
target_table: 'clients',
|
||||||
|
target_value: '(human review required)',
|
||||||
|
confidence: `score=${pair.score}`,
|
||||||
|
manual_review: 'true',
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
for (const flag of plan.flags) {
|
||||||
|
lines.push(
|
||||||
|
rowToCsvLine({
|
||||||
|
op: 'flag',
|
||||||
|
reason: flag.reason,
|
||||||
|
source_id: String(flag.sourceId),
|
||||||
|
target_table: flag.sourceTable,
|
||||||
|
target_value: JSON.stringify(flag.details ?? {}),
|
||||||
|
confidence: 'N/A',
|
||||||
|
manual_review: 'true',
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
return lines.join('\n') + '\n';
|
||||||
|
}
|
||||||
|
|
||||||
|
// ─── Build summary markdown ─────────────────────────────────────────────────
|
||||||
|
|
||||||
|
export function buildSummary(plan: MigrationPlan, generatedAt: string): string {
|
||||||
|
const s = plan.stats;
|
||||||
|
const lines: string[] = [];
|
||||||
|
lines.push(`# Migration Dry-Run — ${generatedAt}`);
|
||||||
|
lines.push('');
|
||||||
|
lines.push('## Input');
|
||||||
|
lines.push(`- ${s.inputInterestRows} NocoDB Interests`);
|
||||||
|
lines.push(`- ${s.inputResidentialRows} NocoDB Residential Interests`);
|
||||||
|
lines.push('');
|
||||||
|
lines.push('## Outcome');
|
||||||
|
lines.push(`- ${s.outputClients} clients`);
|
||||||
|
lines.push(`- ${s.outputInterests} interests (one per source row, linked to deduped client)`);
|
||||||
|
lines.push(`- ${s.outputContacts} client_contacts`);
|
||||||
|
lines.push(`- ${s.outputAddresses} client_addresses`);
|
||||||
|
lines.push('');
|
||||||
|
lines.push('## Auto-linked clusters');
|
||||||
|
if (plan.autoLinks.length === 0) {
|
||||||
|
lines.push('_None — every input row maps to a unique client._');
|
||||||
|
} else {
|
||||||
|
for (const link of plan.autoLinks) {
|
||||||
|
const merged = link.mergedSourceIds.length;
|
||||||
|
lines.push(
|
||||||
|
        `- Lead row \`${link.leadSourceId}\` ← merged ${merged} other row${merged === 1 ? '' : 's'} (\`${link.mergedSourceIds.join(', ')}\`) — score ${link.score} via ${link.reasons.join(' + ')}`,
      );
    }
  }

  lines.push('');
  lines.push('## Pairs flagged for human review');
  if (plan.needsReview.length === 0) {
    lines.push('_None._');
  } else {
    for (const pair of plan.needsReview) {
      lines.push(
        `- Rows \`${pair.aSourceId}\` ↔ \`${pair.bSourceId}\` — score ${pair.score} (${pair.reasons.join(' + ')})`,
      );
    }
  }

  lines.push('');
  lines.push('## Data quality flags');
  if (plan.flags.length === 0) {
    lines.push('_No quality issues._');
  } else {
    const byReason = new Map<string, number>();
    for (const f of plan.flags) {
      byReason.set(f.reason, (byReason.get(f.reason) ?? 0) + 1);
    }
    for (const [reason, count] of [...byReason].sort((a, b) => b[1] - a[1])) {
      lines.push(`- **${count}× ${reason}**`);
    }
    lines.push('');
    lines.push('### Detail');
    for (const f of plan.flags.slice(0, 30)) {
      lines.push(
        `- \`${f.sourceTable}#${f.sourceId}\`: ${f.reason}${f.details ? ` — \`${JSON.stringify(f.details)}\`` : ''}`,
      );
    }
    if (plan.flags.length > 30) {
      lines.push(`- _… and ${plan.flags.length - 30} more (see report.csv for full list)_`);
    }
  }

  lines.push('');
  lines.push('## Next step');
  lines.push('');
  lines.push('Eyeball the auto-linked + flagged-for-review pairs above.');
  lines.push('When satisfied, re-run the script with `--apply --report .migration/<this-dir>/`.');
  lines.push('Apply will refuse to run if the source NocoDB has changed since this dry-run.');

  return lines.join('\n') + '\n';
}

// ─── Write to disk ──────────────────────────────────────────────────────────

export async function writeReport(
  paths: ReportPaths,
  plan: MigrationPlan,
  generatedAt: string,
): Promise<void> {
  await fs.mkdir(paths.rootDir, { recursive: true });
  await fs.writeFile(paths.csvPath, buildCsv(plan), 'utf-8');
  await fs.writeFile(paths.summaryPath, buildSummary(plan, generatedAt), 'utf-8');
  await fs.writeFile(paths.planJsonPath, JSON.stringify(plan, null, 2), 'utf-8');
}
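`buildCsv` itself falls outside this hunk; the quoting rule it implements (per the commit message: cells containing a comma, quote, or newline are quoted, internal quotes doubled, no external CSV dependency) can be sketched standalone. `escapeCsvCell` and `toCsvLine` are illustrative names, not the actual exports:

```typescript
// Minimal RFC-4180 cell escaping: quote the cell only when it contains a
// comma, double-quote, or newline; double any internal quotes.
function escapeCsvCell(value: string): string {
  if (/[",\r\n]/.test(value)) {
    return `"${value.replace(/"/g, '""')}"`;
  }
  return value;
}

function toCsvLine(cells: string[]): string {
  return cells.map(escapeCsvCell).join(',');
}

// toCsvLine(['a', 'b "q"', 'c,d']) → 'a,"b ""q""","c,d"'
```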
576  src/lib/dedup/migration-transform.ts  Normal file
@@ -0,0 +1,576 @@
/**
 * Pure transform: NocoDB snapshot → planned new-system entities + dedup result.
 *
 * Used by the migration script's `--dry-run` (to produce the report) and
 * `--apply` (to actually write). Keeping this pure means the same code
 * runs in both modes, in tests against the frozen fixture, and in the
 * one-off CLI run against the live base.
 *
 * No side effects, no DB calls, no external services.
 */

import {
  normalizeName,
  normalizeEmail,
  normalizePhone,
  resolveCountry,
  type NormalizedPhone,
} from './normalize';
import { findClientMatches, type MatchCandidate } from './find-matches';
import type { CountryCode } from '@/lib/i18n/countries';
import type { NocoDbRow, NocoDbSnapshot } from './nocodb-source';

// ─── Plan output ────────────────────────────────────────────────────────────

export interface PlannedClient {
  /** Stable id derived from the deduped cluster's lead row. Used by the
   * apply phase to reference newly-created clients before they exist
   * in the DB. */
  tempId: string;
  /** Source row IDs that contributed to this client (one if no duplicates,
   * many if dedup merged a cluster). */
  sourceIds: number[];
  fullName: string;
  surnameToken?: string;
  countryIso: CountryCode | null;
  preferredContactMethod: string | null;
  source: string | null;
  contacts: PlannedContact[];
  addresses: PlannedAddress[];
}

export interface PlannedContact {
  channel: 'email' | 'phone' | 'whatsapp' | 'other';
  value: string;
  valueE164?: string | null;
  valueCountry?: CountryCode | null;
  isPrimary: boolean;
  flagged?: string;
}

export interface PlannedAddress {
  streetAddress: string | null;
  city: string | null;
  countryIso: CountryCode | null;
  /** When confidence is low, the migration script flags the row for
   * human review. */
  countryConfidence: 'exact' | 'fuzzy' | 'city' | 'fallback' | null;
}

export interface PlannedInterest {
  /** NocoDB row id this interest came from. */
  sourceId: number;
  /** tempId of the planned client this interest hangs off. */
  clientTempId: string;
  pipelineStage: string;
  leadCategory: string | null;
  source: string | null;
  notes: string | null;
  /** Mooring number; the apply phase resolves this to a berthId via the
   * new-system Berths table. */
  berthMooringNumber: string | null;
  yachtName: string | null;
  /** Date stamps for milestone columns. ISO strings if parseable. */
  dateEoiSent: string | null;
  dateEoiSigned: string | null;
  dateDepositReceived: string | null;
  dateContractSent: string | null;
  dateContractSigned: string | null;
  dateLastContact: string | null;
  /** Documenso linkage carried forward when present so the document
   * record can be stitched up downstream. */
  documensoId: string | null;
}

export interface MigrationFlag {
  sourceTable: 'interests' | 'residential_interests' | 'website_interest_submissions';
  sourceId: number;
  reason: string;
  details?: Record<string, unknown>;
}

export interface MigrationPlan {
  clients: PlannedClient[];
  interests: PlannedInterest[];
  flags: MigrationFlag[];
  /** Pairs that the migration would auto-link (high score). */
  autoLinks: Array<{
    leadSourceId: number;
    mergedSourceIds: number[];
    score: number;
    reasons: string[];
  }>;
  /** Pairs that need human review (medium score). Each pair shows up
   * in the migration report; the user resolves before --apply. */
  needsReview: Array<{ aSourceId: number; bSourceId: number; score: number; reasons: string[] }>;
  stats: MigrationStats;
}

export interface MigrationStats {
  inputInterestRows: number;
  inputResidentialRows: number;
  outputClients: number;
  outputInterests: number;
  outputContacts: number;
  outputAddresses: number;
  flaggedRows: number;
  autoLinkedClusters: number;
  needsReviewPairs: number;
}

export interface TransformOptions {
  /** ISO country used when a phone has no prefix and the row has no
   * Place of Residence. Defaults to AI (Anguilla / Port Nimara's home). */
  defaultPhoneCountry: CountryCode;
  /** Score thresholds for auto-link vs human review. Should match the
   * per-port `system_settings` values once the runtime UI is in place. */
  thresholds: {
    autoLink: number;
    needsReview: number;
  };
}

const DEFAULT_OPTIONS: TransformOptions = {
  defaultPhoneCountry: 'AI',
  thresholds: { autoLink: 90, needsReview: 50 },
};
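One subtlety worth noting: `transformSnapshot` below merges caller options over these defaults with a single object spread, which is shallow — a caller overriding `thresholds` must supply both fields, since a top-level key from the override replaces the default wholesale. A standalone sketch of that behavior (names here are illustrative):

```typescript
interface Options {
  defaultPhoneCountry: string;
  thresholds: { autoLink: number; needsReview: number };
}

const DEFAULTS: Options = {
  defaultPhoneCountry: 'AI',
  thresholds: { autoLink: 90, needsReview: 50 },
};

// Shallow merge: top-level keys from `overrides` win wholesale, so an
// overridden `thresholds` object must be complete.
function mergeOptions(overrides: Partial<Options>): Options {
  return { ...DEFAULTS, ...overrides };
}

// mergeOptions({}).thresholds.autoLink → 90
// mergeOptions({ thresholds: { autoLink: 95, needsReview: 60 } }).thresholds.needsReview → 60
```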
// ─── Stage mapping ──────────────────────────────────────────────────────────

const STAGE_MAP: Record<string, string> = {
  'General Qualified Interest': 'open',
  'Specific Qualified Interest': 'details_sent',
  'EOI and NDA Sent': 'eoi_sent',
  'Signed EOI and NDA': 'eoi_signed',
  'Made Reservation': 'deposit_10pct',
  'Contract Negotiation': 'contract_sent',
  'Contract Negotiations Finalized': 'contract_sent',
  'Contract Signed': 'contract_signed',
};

const LEAD_CATEGORY_MAP: Record<string, string> = {
  General: 'general_interest',
  'Friends and Family': 'general_interest',
};

const SOURCE_MAP: Record<string, string> = {
  portal: 'website',
  Form: 'website',
  External: 'manual',
};

// ─── Date parsing ───────────────────────────────────────────────────────────

/**
 * Parse a date the legacy NocoDB might have stored in DD-MM-YYYY,
 * DD/MM/YYYY, YYYY-MM-DD, or ISO format. Returns an ISO string or null.
 */
function parseFlexibleDate(input: unknown): string | null {
  if (typeof input !== 'string' || input.trim() === '') return null;
  const s = input.trim();

  // Already ISO
  if (/^\d{4}-\d{2}-\d{2}/.test(s)) {
    const d = new Date(s);
    return Number.isNaN(d.getTime()) ? null : d.toISOString();
  }

  // DD-MM-YYYY or DD/MM/YYYY
  const m = s.match(/^(\d{1,2})[-/](\d{1,2})[-/](\d{4})$/);
  if (m) {
    const [, day, month, year] = m;
    const iso = `${year}-${month!.padStart(2, '0')}-${day!.padStart(2, '0')}`;
    const d = new Date(iso);
    return Number.isNaN(d.getTime()) ? null : d.toISOString();
  }

  // Anything else: try the Date constructor as a last resort
  const d = new Date(s);
  return Number.isNaN(d.getTime()) ? null : d.toISOString();
}
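A quick sanity pass over the three legacy formats — this is a verbatim standalone copy of the function, runnable on its own (note that day-first input like `03/05/2023` resolves to 3 May, not 5 March, since the legacy base uses DD/MM ordering):

```typescript
function parseFlexibleDate(input: unknown): string | null {
  if (typeof input !== 'string' || input.trim() === '') return null;
  const s = input.trim();

  // Already ISO: YYYY-MM-DD, parsed as UTC midnight by the Date constructor.
  if (/^\d{4}-\d{2}-\d{2}/.test(s)) {
    const d = new Date(s);
    return Number.isNaN(d.getTime()) ? null : d.toISOString();
  }

  // DD-MM-YYYY or DD/MM/YYYY: reorder into ISO before parsing.
  const m = s.match(/^(\d{1,2})[-/](\d{1,2})[-/](\d{4})$/);
  if (m) {
    const [, day, month, year] = m;
    const iso = `${year}-${month!.padStart(2, '0')}-${day!.padStart(2, '0')}`;
    const d = new Date(iso);
    return Number.isNaN(d.getTime()) ? null : d.toISOString();
  }

  // Anything else: Date constructor as a last resort; garbage → null.
  const d = new Date(s);
  return Number.isNaN(d.getTime()) ? null : d.toISOString();
}

// parseFlexibleDate('03/05/2023') → '2023-05-03T00:00:00.000Z'
// parseFlexibleDate('2023-05-03') → '2023-05-03T00:00:00.000Z'
// parseFlexibleDate('nonsense')   → null
```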
// ─── Main transform ─────────────────────────────────────────────────────────

/**
 * Run the full transform pipeline against a NocoDB snapshot. Pure
 * function — the same input always produces the same plan.
 */
export function transformSnapshot(
  snapshot: NocoDbSnapshot,
  options: Partial<TransformOptions> = {},
): MigrationPlan {
  const opts = { ...DEFAULT_OPTIONS, ...options };

  const flags: MigrationFlag[] = [];
  // Build per-row candidates first so we can run dedup before assigning
  // tempIds (clients with multiple source rows merge into one tempId).
  const perRow = snapshot.interests.map((row) => rowToCandidate(row, 'interests', opts, flags));

  // Dedup pass 1: every row scored against every other row (within the
  // same pool). The blocking strategy in `findClientMatches` keeps this
  // cheap even for the full 252-row dataset.
  const clusters = clusterByDedup(perRow, opts);

  // Build the planned clients + interests from the clusters.
  const clients: PlannedClient[] = [];
  const interests: PlannedInterest[] = [];
  const autoLinks: MigrationPlan['autoLinks'] = [];
  const needsReview: MigrationPlan['needsReview'] = [];

  for (const cluster of clusters) {
    const lead = cluster.leadCandidate;
    const tempId = `client-${lead.row.Id}`;

    // Build the client record from the lead row, then merge in any
    // contact info / address info from the other rows in the cluster.
    const planned = buildPlannedClient(tempId, cluster, opts);
    clients.push(planned);

    // Each row in the cluster becomes its own interest record.
    for (const member of cluster.members) {
      const interest = buildPlannedInterest(member.row, tempId);
      interests.push(interest);
    }

    if (cluster.members.length > 1) {
      autoLinks.push({
        leadSourceId: lead.row.Id,
        mergedSourceIds: cluster.members.filter((m) => m !== lead).map((m) => m.row.Id),
        score: cluster.maxScore,
        reasons: cluster.reasons,
      });
    }

    for (const pair of cluster.reviewPairs) {
      needsReview.push(pair);
    }
  }

  return {
    clients,
    interests,
    flags,
    autoLinks,
    needsReview,
    stats: {
      inputInterestRows: snapshot.interests.length,
      inputResidentialRows: snapshot.residentialInterests.length,
      outputClients: clients.length,
      outputInterests: interests.length,
      outputContacts: clients.reduce((sum, c) => sum + c.contacts.length, 0),
      outputAddresses: clients.reduce((sum, c) => sum + c.addresses.length, 0),
      flaggedRows: flags.length,
      autoLinkedClusters: autoLinks.length,
      needsReviewPairs: needsReview.length,
    },
  };
}
// ─── Helpers ────────────────────────────────────────────────────────────────

interface RowCandidate {
  row: NocoDbRow;
  candidate: MatchCandidate;
  /** Phone normalize result for the row's primary phone; used downstream
   * to attach valueE164 + country to the planned contact. */
  phoneResult: NormalizedPhone | null;
  /** Country resolved from "Place of Residence". */
  countryIso: CountryCode | null;
  countryConfidence: 'exact' | 'fuzzy' | 'city' | null;
  /** Normalized email or null. */
  email: string | null;
  /** Display name from `normalizeName`. */
  displayName: string;
}

function rowToCandidate(
  row: NocoDbRow,
  sourceTable: MigrationFlag['sourceTable'],
  opts: TransformOptions,
  flags: MigrationFlag[],
): RowCandidate {
  const rawName = (row['Full Name'] as string | undefined) ?? '';
  const rawEmail = (row['Email Address'] as string | undefined) ?? '';
  const rawPhone = (row['Phone Number'] as string | undefined) ?? '';
  const rawCountry = (row['Place of Residence'] as string | undefined) ?? '';

  const normName = normalizeName(rawName);
  const email = normalizeEmail(rawEmail);
  const country = resolveCountry(rawCountry);
  const phoneCountry = country.iso ?? opts.defaultPhoneCountry;
  const phoneResult = normalizePhone(rawPhone, phoneCountry as CountryCode);

  // Surface anything weird so the report can show it.
  if (rawPhone && !phoneResult?.e164) {
    flags.push({
      sourceTable,
      sourceId: row.Id,
      reason: phoneResult?.flagged ? `phone ${phoneResult.flagged}` : 'phone unparseable',
      details: { rawPhone },
    });
  }
  if (rawEmail && !email) {
    flags.push({
      sourceTable,
      sourceId: row.Id,
      reason: 'email invalid',
      details: { rawEmail },
    });
  }
  if (rawCountry && !country.iso) {
    flags.push({
      sourceTable,
      sourceId: row.Id,
      reason: 'country unresolved',
      details: { rawCountry },
    });
  }

  const candidate: MatchCandidate = {
    id: String(row.Id),
    fullName: normName.display || null,
    surnameToken: normName.surnameToken ?? null,
    emails: email ? [email] : [],
    phonesE164: phoneResult?.e164 ? [phoneResult.e164] : [],
    countryIso: country.iso ?? null,
  };

  return {
    row,
    candidate,
    phoneResult,
    countryIso: country.iso ?? null,
    countryConfidence: country.confidence,
    email,
    displayName: normName.display,
  };
}
interface Cluster {
  /** The cluster's "lead" row (most complete + most recent). */
  leadCandidate: RowCandidate;
  members: RowCandidate[];
  maxScore: number;
  reasons: string[];
  /** Pairs in this cluster that scored medium (need review). */
  reviewPairs: Array<{ aSourceId: number; bSourceId: number; score: number; reasons: string[] }>;
}

function clusterByDedup(rows: RowCandidate[], opts: TransformOptions): Cluster[] {
  // Use a union-find structure indexed by row id. Every pair with a
  // score >= autoLink threshold gets unioned. Pairs in [needsReview,
  // autoLink) accumulate onto the cluster's reviewPairs list — they're
  // surfaced for human triage but not auto-merged.
  const parent = new Map<string, string>();
  for (const r of rows) parent.set(r.candidate.id, r.candidate.id);
  const find = (id: string): string => {
    let cur = id;
    while (parent.get(cur) !== cur) {
      const next = parent.get(cur)!;
      parent.set(cur, parent.get(next)!); // path compression
      cur = parent.get(cur)!;
    }
    return cur;
  };
  const union = (a: string, b: string) => {
    const rootA = find(a);
    const rootB = find(b);
    if (rootA !== rootB) parent.set(rootA, rootB);
  };

  const clusterReasons = new Map<string, string[]>();
  const clusterMaxScore = new Map<string, number>();
  const clusterReviewPairs = new Map<string, Cluster['reviewPairs']>();

  // Score every candidate against every candidate after it, so each pair
  // is considered exactly once. The find-matches function does its own
  // blocking so this stays cheap.
  for (let i = 0; i < rows.length; i += 1) {
    const left = rows[i]!;
    const remainingPool = rows.slice(i + 1).map((r) => r.candidate);
    if (remainingPool.length === 0) continue;
    const matches = findClientMatches(left.candidate, remainingPool, {
      highScore: opts.thresholds.autoLink,
      mediumScore: opts.thresholds.needsReview,
    });

    for (const m of matches) {
      if (m.score >= opts.thresholds.autoLink) {
        union(left.candidate.id, m.candidate.id);
        const root = find(left.candidate.id);
        clusterMaxScore.set(root, Math.max(clusterMaxScore.get(root) ?? 0, m.score));
        const existing = clusterReasons.get(root) ?? [];
        for (const reason of m.reasons) {
          if (!existing.includes(reason)) existing.push(reason);
        }
        clusterReasons.set(root, existing);
      } else if (m.score >= opts.thresholds.needsReview) {
        // Medium — track on whichever cluster `left` belongs to.
        const root = find(left.candidate.id);
        const list = clusterReviewPairs.get(root) ?? [];
        list.push({
          aSourceId: parseInt(left.candidate.id, 10),
          bSourceId: parseInt(m.candidate.id, 10),
          score: m.score,
          reasons: m.reasons,
        });
        clusterReviewPairs.set(root, list);
      }
    }
  }

  // Group rows by their cluster root.
  const byRoot = new Map<string, RowCandidate[]>();
  for (const r of rows) {
    const root = find(r.candidate.id);
    const list = byRoot.get(root) ?? [];
    list.push(r);
    byRoot.set(root, list);
  }

  // Build cluster objects, choosing the most-complete row as the lead.
  const clusters: Cluster[] = [];
  for (const [root, members] of byRoot) {
    const lead = pickLead(members);
    clusters.push({
      leadCandidate: lead,
      members,
      maxScore: clusterMaxScore.get(root) ?? 0,
      reasons: clusterReasons.get(root) ?? [],
      reviewPairs: clusterReviewPairs.get(root) ?? [],
    });
  }
  return clusters;
}
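The union-find inside `clusterByDedup` is easy to exercise in isolation. A trimmed sketch with the same `find`/`union` pair (path compression included), showing that two high-scoring pairs A–B and B–C transitively collapse into one cluster:

```typescript
// Minimal union-find over string ids, mirroring the structure above.
function makeUnionFind(ids: string[]) {
  const parent = new Map<string, string>();
  for (const id of ids) parent.set(id, id);

  const find = (id: string): string => {
    let cur = id;
    while (parent.get(cur) !== cur) {
      const next = parent.get(cur)!;
      parent.set(cur, parent.get(next)!); // path compression
      cur = parent.get(cur)!;
    }
    return cur;
  };

  const union = (a: string, b: string) => {
    const rootA = find(a);
    const rootB = find(b);
    if (rootA !== rootB) parent.set(rootA, rootB);
  };

  return { find, union };
}

// const uf = makeUnionFind(['1', '2', '3', '4']);
// uf.union('1', '2'); uf.union('2', '3');
// → '1', '2', '3' now share one root; '4' stays its own cluster.
```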
function pickLead(rows: RowCandidate[]): RowCandidate {
  // Pick the row with the most populated fields, breaking ties by
  // recency (highest Id, since NocoDB IDs are monotonic).
  return rows.reduce((best, current) => {
    const bestScore = completenessScore(best);
    const currentScore = completenessScore(current);
    if (currentScore > bestScore) return current;
    if (currentScore === bestScore && current.row.Id > best.row.Id) return current;
    return best;
  });
}

function completenessScore(r: RowCandidate): number {
  let score = 0;
  if (r.email) score += 1;
  if (r.phoneResult?.e164) score += 1;
  if (r.row['Address']) score += 0.5;
  if (r.row['Yacht Name']) score += 0.5;
  if (r.row['Source']) score += 0.25;
  if (r.row['Lead Category']) score += 0.25;
  if (r.row['Internal Notes']) score += 0.25;
  return score;
}
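Lead selection is a reduce over (completeness, Id). A standalone sketch with a simplified two-field completeness score, demonstrating the highest-Id tie-break (the `Row` shape here is illustrative, not the real `RowCandidate`):

```typescript
interface Row {
  id: number;
  email: string | null;
  phone: string | null;
}

// Completeness first; equal scores fall back to the highest id, which
// stands in for recency when ids are monotonic.
function pickLead(rows: Row[]): Row {
  const score = (r: Row) => (r.email ? 1 : 0) + (r.phone ? 1 : 0);
  return rows.reduce((best, cur) => {
    if (score(cur) > score(best)) return cur;
    if (score(cur) === score(best) && cur.id > best.id) return cur;
    return best;
  });
}

// Two equally complete rows: the newer one (id 2) wins the tie-break.
```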
function buildPlannedClient(
  tempId: string,
  cluster: Cluster,
  opts: TransformOptions,
): PlannedClient {
  const lead = cluster.leadCandidate;

  // Collect distinct emails + phones from across the cluster — duplicate
  // submissions often come with different contact methods we want to
  // preserve as multiple rows in `client_contacts`.
  const seenEmails = new Set<string>();
  const seenPhones = new Set<string>();
  const contacts: PlannedContact[] = [];

  for (const member of cluster.members) {
    if (member.email && !seenEmails.has(member.email)) {
      seenEmails.add(member.email);
      contacts.push({
        channel: 'email',
        value: member.email,
        isPrimary: contacts.length === 0,
      });
    }
    if (member.phoneResult?.e164 && !seenPhones.has(member.phoneResult.e164)) {
      seenPhones.add(member.phoneResult.e164);
      const isFirstPhone = !contacts.some((c) => c.channel === 'phone');
      contacts.push({
        channel: 'phone',
        value: member.phoneResult.e164,
        valueE164: member.phoneResult.e164,
        valueCountry: member.phoneResult.country,
        isPrimary: isFirstPhone && contacts.every((c) => !c.isPrimary || c.channel === 'email'),
        flagged: member.phoneResult.flagged,
      });
    }
  }

  // Primary-contact invariant: the first contact collected is primary
  // unless the row explicitly preferred phone.
  const preferredMethod = (lead.row['Contact Method Preferred'] as string | undefined)
    ?.toLowerCase()
    ?.trim();

  // Address: only build one if the lead row has meaningful address text.
  const rawAddress = (lead.row['Address'] as string | undefined)?.trim();
  const addresses: PlannedAddress[] = [];
  if (rawAddress) {
    addresses.push({
      streetAddress: rawAddress,
      city: null,
      countryIso: lead.countryIso ?? opts.defaultPhoneCountry,
      countryConfidence: lead.countryConfidence ?? 'fallback',
    });
  }

  const sourceFromRow = (lead.row['Source'] as string | undefined) ?? null;
  const mappedSource = sourceFromRow ? (SOURCE_MAP[sourceFromRow] ?? 'manual') : null;

  return {
    tempId,
    sourceIds: cluster.members.map((m) => m.row.Id),
    fullName: lead.displayName,
    surnameToken: lead.candidate.surnameToken ?? undefined,
    countryIso: lead.countryIso,
    preferredContactMethod: preferredMethod ?? null,
    source: mappedSource,
    contacts,
    addresses,
  };
}
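The email/phone collection above is a seen-set dedupe that preserves first-encounter order, which is what makes the "first contact is primary" invariant work. The core pattern, isolated (helper name is illustrative):

```typescript
// Dedupe values across cluster members, keeping first-encounter order
// and skipping nulls. Later duplicates are dropped, so the first
// member to contribute a value determines its position.
function collectDistinct(values: Array<string | null>): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const v of values) {
    if (v && !seen.has(v)) {
      seen.add(v);
      out.push(v);
    }
  }
  return out;
}

// collectDistinct(['a@x.com', null, 'b@x.com', 'a@x.com']) → ['a@x.com', 'b@x.com']
```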
function buildPlannedInterest(row: NocoDbRow, clientTempId: string): PlannedInterest {
  const stage = (row['Sales Process Level'] as string | undefined) ?? '';
  const cat = (row['Lead Category'] as string | undefined) ?? '';

  const notesParts: string[] = [];
  const internalNotes = row['Internal Notes'] as string | undefined;
  const extraComments = row['Extra Comments'] as string | undefined;
  if (internalNotes?.trim()) notesParts.push(internalNotes.trim());
  if (extraComments?.trim()) notesParts.push(`Extra Comments: ${extraComments.trim()}`);
  const berthSize = row['Berth Size Desired'] as string | undefined;
  if (berthSize?.trim()) notesParts.push(`Berth size desired: ${berthSize.trim()}`);

  return {
    sourceId: row.Id,
    clientTempId,
    pipelineStage: STAGE_MAP[stage] ?? 'open',
    leadCategory: LEAD_CATEGORY_MAP[cat] ?? null,
    source: ((row['Source'] as string | undefined) ?? null) || null,
    notes: notesParts.join('\n\n') || null,
    berthMooringNumber: (row['Berth Number'] as string | undefined) ?? null,
    yachtName: (() => {
      const n = (row['Yacht Name'] as string | undefined)?.trim();
      // Filter placeholder values used by sales reps for "we don't know yet".
      if (!n) return null;
      if (['TBC', 'Na', 'NA', 'na', 'N/A', 'TBD', 'tbd'].includes(n)) return null;
      return n;
    })(),
    dateEoiSent: parseFlexibleDate(row['EOI Time Sent']),
    dateEoiSigned: parseFlexibleDate(row['all_signed_notified_at'] ?? row['developerSignTime']),
    dateDepositReceived: null, // not directly tracked in the legacy schema
    dateContractSent: parseFlexibleDate(row['Time LOI Sent']),
    dateContractSigned: parseFlexibleDate(row['developerSignTime']),
    dateLastContact: parseFlexibleDate(row['Created At'] ?? row['Date Added']),
    documensoId: (row['documensoID'] as string | undefined) ?? null,
  };
}
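The yacht-name filter and notes merge above reduce to two small pure helpers. Sketched standalone under those assumptions — the helper names are illustrative, not the module's actual exports:

```typescript
// Placeholder strings sales reps used for "we don't know yet".
const YACHT_PLACEHOLDERS = new Set(['TBC', 'Na', 'NA', 'na', 'N/A', 'TBD', 'tbd']);

// Null out placeholder yacht names; keep anything else, trimmed.
function cleanYachtName(raw: string | undefined): string | null {
  const n = raw?.trim();
  if (!n || YACHT_PLACEHOLDERS.has(n)) return null;
  return n;
}

// Merge note fragments into one blob, dropping empties, blank-line separated.
function mergeNotes(parts: Array<string | undefined>): string | null {
  const kept = parts.map((p) => p?.trim()).filter((p): p is string => !!p);
  return kept.join('\n\n') || null;
}

// cleanYachtName('TBC')           → null
// cleanYachtName(' Sea Breeze ')  → 'Sea Breeze'
// mergeNotes(['a', '', undefined, 'b']) → 'a\n\nb'
```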
152  src/lib/dedup/nocodb-source.ts  Normal file
@@ -0,0 +1,152 @@
/**
 * Read-only adapter for the legacy NocoDB Port Nimara base.
 *
 * Used by the one-shot migration script (`scripts/migrate-from-nocodb.ts`)
 * to pull every Interest, Residential Interest, and Website Submission
 * row from the source-of-truth NocoDB tables. No mutations.
 *
 * Auth: `xc-token` header per the NocoDB v2 API.
 *
 * The shape returned is a verbatim record of the row's fields — the caller
 * is responsible for mapping to the new schema via `migration-transform.ts`.
 */

import { z } from 'zod';

// ─── Configuration ──────────────────────────────────────────────────────────

const ConfigSchema = z.object({
  url: z.string().url(),
  token: z.string().min(1),
});

export interface NocoDbConfig {
  url: string;
  token: string;
}

export function loadNocoDbConfig(env: NodeJS.ProcessEnv = process.env): NocoDbConfig {
  return ConfigSchema.parse({
    url: env.NOCODB_URL,
    token: env.NOCODB_TOKEN,
  });
}

// ─── Table identifiers ──────────────────────────────────────────────────────
//
// These IDs are stable per the NocoDB base — they were captured during the
// 2026-05-03 audit and won't change unless the base is rebuilt. If the
// base is reset, regenerate them from `getTablesList`.
export const NOCO_TABLES = {
  interests: 'mbs9hjauug4eseo',
  residentialInterests: 'mscfpwwwjuds4nt',
  websiteInterestSubmissions: 'mevkpcih67c6jsm',
  websiteContactFormSubmissions: 'mxk5cd0pmwnwlcl',
  websiteBerthEoiSupplements: 'mglmioo0ku8zgqj',
  berths: 'mczgos9hr3oa9qc',
} as const;

// ─── HTTP shape ─────────────────────────────────────────────────────────────

interface NocoDbListResponse<T> {
  list: T[];
  pageInfo: {
    totalRows: number;
    page: number;
    pageSize: number;
    isFirstPage: boolean;
    isLastPage: boolean;
  };
}

/** A row's `Id` is always present. The rest of the fields vary per table. */
export type NocoDbRow = Record<string, unknown> & { Id: number };
// ─── Public API ─────────────────────────────────────────────────────────────

/**
 * Fetch all rows from a NocoDB table. Auto-paginates until the API
 * reports `isLastPage`. The legacy base is small (252 Interests rows
 * being the largest table) so we keep this simple — no streaming.
 */
export async function fetchAllRows(
  tableId: string,
  config: NocoDbConfig,
  pageSize = 250,
): Promise<NocoDbRow[]> {
  const all: NocoDbRow[] = [];
  let page = 1;
  // Hard cap to prevent infinite-loop bugs if pageInfo lies. Each page
  // pulls up to `pageSize` rows, so 200 pages * 250 = 50k rows is the
  // maximum we'll ever fetch from one table.
  const MAX_PAGES = 200;

  while (page <= MAX_PAGES) {
    const url = new URL(`${config.url}/api/v2/tables/${tableId}/records`);
    url.searchParams.set('limit', String(pageSize));
    url.searchParams.set('offset', String((page - 1) * pageSize));

    const res = await fetch(url, {
      headers: {
        'xc-token': config.token,
        accept: 'application/json',
      },
    });

    if (!res.ok) {
      throw new Error(
        `NocoDB fetch failed: ${res.status} ${res.statusText} — table ${tableId} page ${page}`,
      );
    }

    const json = (await res.json()) as NocoDbListResponse<NocoDbRow>;
    all.push(...json.list);

    if (json.pageInfo.isLastPage || json.list.length === 0) break;
    page += 1;
  }

  return all;
}
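The offset/limit loop is straightforward to unit-test once the HTTP call is injected. A sketch with a hypothetical `fetchPage` callback standing in for the real `fetch` against `/api/v2/tables/<id>/records` (the callback and `fetchAll` name are assumptions for illustration, not part of the module):

```typescript
interface Page<T> {
  list: T[];
  isLastPage: boolean;
}

// Accumulate pages until the source says it's done, with a hard page cap
// mirroring MAX_PAGES above to guard against a lying pageInfo.
async function fetchAll<T>(
  fetchPage: (limit: number, offset: number) => Promise<Page<T>>,
  pageSize = 250,
  maxPages = 200,
): Promise<T[]> {
  const all: T[] = [];
  for (let page = 1; page <= maxPages; page += 1) {
    const { list, isLastPage } = await fetchPage(pageSize, (page - 1) * pageSize);
    all.push(...list);
    if (isLastPage || list.length === 0) break;
  }
  return all;
}
```

Driving it with an in-memory source of 600 rows at a page size of 250 takes exactly three calls and preserves row order.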
/**
 * Convenience snapshot — pulls every table the migration cares about
 * in parallel. The returned shape is the input the transform layer expects.
 */
export interface NocoDbSnapshot {
  interests: NocoDbRow[];
  residentialInterests: NocoDbRow[];
  websiteInterestSubmissions: NocoDbRow[];
  websiteContactFormSubmissions: NocoDbRow[];
  websiteBerthEoiSupplements: NocoDbRow[];
  berths: NocoDbRow[];
  fetchedAt: string;
}

export async function fetchSnapshot(config: NocoDbConfig): Promise<NocoDbSnapshot> {
  const [
    interests,
    residentialInterests,
    websiteInterestSubmissions,
    websiteContactFormSubmissions,
    websiteBerthEoiSupplements,
    berths,
  ] = await Promise.all([
    fetchAllRows(NOCO_TABLES.interests, config),
    fetchAllRows(NOCO_TABLES.residentialInterests, config),
    fetchAllRows(NOCO_TABLES.websiteInterestSubmissions, config),
    fetchAllRows(NOCO_TABLES.websiteContactFormSubmissions, config),
    fetchAllRows(NOCO_TABLES.websiteBerthEoiSupplements, config),
    fetchAllRows(NOCO_TABLES.berths, config),
  ]);

  return {
    interests,
    residentialInterests,
    websiteInterestSubmissions,
    websiteContactFormSubmissions,
    websiteBerthEoiSupplements,
    berths,
    fetchedAt: new Date().toISOString(),
  };
}
|
||||||
@@ -12,7 +12,7 @@
 import { z } from 'zod';

 import { ALL_COUNTRY_CODES, getCountryName, type CountryCode } from '@/lib/i18n/countries';
-import { parsePhone } from '@/lib/i18n/phone';
+import { parsePhoneScriptSafe as parsePhone } from './phone-parse';

 // ─── Names ──────────────────────────────────────────────────────────────────

@@ -228,10 +228,18 @@ export function normalizePhone(
   // 7. Parse via the existing i18n helper (libphonenumber-js under the hood).
   const parsed = parsePhone(cleaned, defaultCountry);
-  if (!parsed.e164 || !parsed.isValid) {
+  if (!parsed.e164) {
+    // Couldn't even produce a canonical form — genuinely garbage.
     return { e164: null, country: null, display: null, flagged: 'unparseable' };
   }

+  // Note: we deliberately don't gate on `parsed.isValid`. The
+  // libphonenumber-js `min` build returns isValid=false for many real
+  // numbers (NANP territories share +1; some country metadata is
+  // truncated). For dedup we only need a canonical E.164 string to
+  // compare; strict validity is the form layer's problem, not ours.
+  // If a string-only test (e.g. "abc-not-a-phone") gets here, parse
+  // returns null e164 anyway and the branch above handles it.
   return {
     e164: parsed.e164,
     country: parsed.country,
@@ -261,6 +269,13 @@ const COUNTRY_ALIASES: Record<string, CountryCode> = {
   'st barth': 'BL',
   'st barths': 'BL',
   'st barthelemy': 'BL',
+  // Caribbean short-forms whose canonical Intl names are awkward
+  // ("Antigua and Barbuda", "Saint Vincent and the Grenadines", etc.).
+  antigua: 'AG',
+  barbuda: 'AG',
+  'st kitts': 'KN',
+  'saint kitts': 'KN',
+  nevis: 'KN',
 };

 /**
@@ -275,6 +290,20 @@ const CITY_TO_COUNTRY: Record<string, CountryCode> = {
   'kansas city': 'US',
   'sag harbor': 'US',
   'new york': 'US',
+  // Cities that came out unresolved from the 2026-05-03 NocoDB dry-run.
+  // Using lowercase (post-normalize keys).
+  boston: 'US',
+  tampa: 'US',
+  'fort lauderdale': 'US',
+  'port jefferson': 'US',
+  nantucket: 'US',
+  // US state abbreviations that often appear standalone or as suffix:
+  ' fl': 'US',
+  ' ma': 'US',
+  ' ny': 'US',
+  ' tx': 'US',
+  ' ca': 'US',
+  // International
   london: 'GB',
   paris: 'FR',
 };
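A sketch of how the lowercase city keys and the space-prefixed state keys above are presumably consumed. Illustrative only: `resolvePlaceCountry` and its normalization steps are assumptions, and the map is a small sample; the real lookup lives in normalize.ts:

```typescript
// Illustrative sample of the city/state map (not the full normalize.ts table).
const CITY_TO_COUNTRY_SAMPLE: Record<string, string> = {
  boston: 'US',
  'fort lauderdale': 'US',
  ' fl': 'US',
  ' ny': 'US',
  london: 'GB',
};

// Hypothetical resolver showing why the state keys carry a leading space.
function resolvePlaceCountry(raw: string): string | null {
  // Lowercase and collapse punctuation/whitespace: "Miami, FL" → "miami fl".
  const key = raw.toLowerCase().replace(/[.,]/g, ' ').replace(/\s+/g, ' ').trim();
  if (key in CITY_TO_COUNTRY_SAMPLE) return CITY_TO_COUNTRY_SAMPLE[key];
  // The leading space means state keys only match as a trailing token, so a
  // word that merely ends in "fl" never resolves.
  for (const [alias, country] of Object.entries(CITY_TO_COUNTRY_SAMPLE)) {
    if (alias.startsWith(' ') && key.endsWith(alias)) return country;
  }
  return null;
}
```

Under this reading, "Boston" and "Miami, FL" both resolve to US, while an unrelated word that happens to end in "fl" does not.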
src/lib/dedup/phone-parse.ts (new file, 66 lines)
@@ -0,0 +1,66 @@
/**
 * Script-safe phone parser.
 *
 * The project's existing `src/lib/i18n/phone.ts` imports from
 * `libphonenumber-js`, which under Node 25 + tsx loader hits a
 * metadata-shape interop bug (`{ default }` wrapping the JSON). It
 * works fine in Next.js + vitest, but a `tsx scripts/...` invocation
 * blows up.
 *
 * This wrapper bypasses the bundled `index.cjs.js` and calls
 * `libphonenumber-js/core` directly with metadata loaded as raw JSON.
 * Same surface as the i18n helper; usable from both runtimes.
 *
 * Used by the dedup library's `normalizePhone`. The runtime UI still
 * imports `i18n/phone` directly — no reason to touch a working path.
 */

// eslint-disable-next-line @typescript-eslint/no-require-imports
const core: typeof import('libphonenumber-js/core') = require('libphonenumber-js/core');
// Load the JSON directly. The bundled `index.cjs.js` does the same
// thing but its `require('../metadata.min.json')` hits a Node 25 ESM
// interop bug that wraps the JSON in `{ default }`. Importing the
// JSON file by absolute path through the package root sidesteps it.
// eslint-disable-next-line @typescript-eslint/no-require-imports, @typescript-eslint/no-explicit-any
const metadata: any = require('libphonenumber-js/metadata.min.json');

import type { CountryCode } from '@/lib/i18n/countries';

export interface ParsedPhone {
  e164: string | null;
  country: CountryCode | null;
  national: string | null;
  international: string | null;
  isValid: boolean;
}

const EMPTY: ParsedPhone = {
  e164: null,
  country: null,
  national: null,
  international: null,
  isValid: false,
};

export function parsePhoneScriptSafe(raw: string, defaultCountry?: CountryCode): ParsedPhone {
  const trimmed = raw.trim();
  if (!trimmed) return EMPTY;
  try {
    // The core entry expects its own `CountryCode` type from
    // libphonenumber-js. Our `CountryCode` type is the same set of ISO
    // alpha-2 codes (we re-derive from the same Intl source) so this
    // cast is structurally equivalent, not lossy.
    // eslint-disable-next-line @typescript-eslint/no-explicit-any
    const parsed = core.parsePhoneNumberFromString(trimmed, defaultCountry as any, metadata);
    if (!parsed) return EMPTY;
    return {
      e164: parsed.number,
      country: (parsed.country ?? null) as CountryCode | null,
      national: parsed.formatNational(),
      international: parsed.formatInternational(),
      isValid: parsed.isValid(),
    };
  } catch {
    return EMPTY;
  }
}
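For context on the `{ default }` wrapping: phone-parse.ts sidesteps the problem by requiring the JSON through the package root, but a generic defensive alternative (hypothetical, not used in this PR) is to accept either module shape:

```typescript
// Hypothetical helper, NOT part of phone-parse.ts: accept a module value that
// may arrive either as the raw JSON or wrapped in { default } by ESM interop.
function unwrapDefault<T>(mod: T | { default: T }): T {
  if (mod !== null && typeof mod === 'object' && 'default' in mod) {
    return (mod as { default: T }).default;
  }
  return mod as T;
}
```

The trade-off is that a module legitimately exposing a `default` key alongside others would be mis-unwrapped, which is presumably why the PR pins the import path instead.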
tests/unit/dedup/migration-transform.test.ts (new file, 213 lines)
@@ -0,0 +1,213 @@
/**
 * Migration transform — fixture-based regression test.
 *
 * Feeds the transform a small frozen NocoDB snapshot containing one
 * representative row from each duplicate pattern documented in
 * docs/superpowers/specs/2026-05-03-dedup-and-migration-design.md §1.2,
 * and asserts the resulting plan matches the algorithm's expected
 * behavior. If any future change starts merging Pattern F (Etiennette
 * Clamouze) or stops merging Pattern A (Deepak Ramchandani), this
 * test fails immediately.
 */
import { describe, expect, it } from 'vitest';

import { transformSnapshot } from '@/lib/dedup/migration-transform';
import type { NocoDbRow, NocoDbSnapshot } from '@/lib/dedup/nocodb-source';

function row(fields: Partial<NocoDbRow> & { Id: number }): NocoDbRow {
  return fields as NocoDbRow;
}

const FIXTURE: NocoDbSnapshot = {
  fetchedAt: '2026-05-03T12:00:00.000Z',
  berths: [],
  residentialInterests: [],
  websiteInterestSubmissions: [],
  websiteContactFormSubmissions: [],
  websiteBerthEoiSupplements: [],
  interests: [
    // Pattern A: pure double-submit (Deepak Ramchandani #624/#625)
    row({
      Id: 624,
      'Full Name': 'Deepak Ramchandani',
      'Email Address': 'dannyrams8888@gmail.com',
      'Phone Number': '+17215868888',
      'Sales Process Level': 'General Qualified Interest',
    }),
    row({
      Id: 625,
      'Full Name': 'Deepak Ramchandani',
      'Email Address': 'dannyrams8888@gmail.com',
      'Phone Number': '+17215868888',
      'Sales Process Level': 'General Qualified Interest',
    }),

    // Pattern B: phone format variance (Howard Wiarda #236/#536)
    row({
      Id: 236,
      'Full Name': 'Howard Wiarda',
      'Email Address': 'hwiarda@hotmail.com',
      'Phone Number': '574-274-0548',
      'Place of Residence': 'USA',
      'Sales Process Level': 'General Qualified Interest',
    }),
    row({
      Id: 536,
      'Full Name': 'Howard Wiarda',
      'Email Address': 'hwiarda@hotmail.com',
      'Phone Number': '+15742740548',
      'Sales Process Level': 'General Qualified Interest',
    }),

    // Pattern C: name capitalization (Nicolas Ruiz #681/#682/#683 — three rows)
    row({
      Id: 681,
      'Full Name': 'Nicolas Ruiz',
      'Email Address': 'ruiz.nicolas@ufl.edu',
      'Phone Number': '+17862006617',
      'Sales Process Level': 'General Qualified Interest',
    }),
    row({
      Id: 682,
      'Full Name': 'Nicolas Ruiz',
      'Email Address': 'ruiz.nicolas@ufl.edu',
      'Phone Number': '+17862006617',
      'Sales Process Level': 'Specific Qualified Interest',
    }),
    row({
      Id: 683,
      'Full Name': 'Nicolas Ruiz',
      'Email Address': 'Ruiz.Nicolas@ufl.edu',
      'Phone Number': '+17862006617',
      'Sales Process Level': 'General Qualified Interest',
    }),

    // Pattern E: surname typo with same email + phone (Constanzo/Costanzo)
    row({
      Id: 336,
      'Full Name': 'Gianfranco Di Costanzo',
      'Email Address': 'gdc@nauticall.com',
      'Phone Number': '+17542628669',
      'Yacht Name': 'GEMINI',
      'Sales Process Level': 'Contract Signed',
    }),
    row({
      Id: 585,
      'Full Name': 'Gianfranco Di Constanzo',
      'Email Address': 'gdc@nauticall.com',
      'Phone Number': '+17542628669',
      'Yacht Name': 'CALYPSO',
      'Sales Process Level': 'Signed EOI and NDA',
    }),

    // Pattern F: same name, different country phones (Etiennette Clamouze)
    row({
      Id: 188,
      'Full Name': 'Etiennette Clamouze',
      'Email Address': 'clamouze.etiennette@gmail.com',
      'Phone Number': '+33767780640',
      'Sales Process Level': 'General Qualified Interest',
    }),
    row({
      Id: 717,
      'Full Name': 'Etiennette Clamouze',
      'Email Address': 'Etiennette@the-manoah.com',
      'Phone Number': '+12645815607',
      'Sales Process Level': 'General Qualified Interest',
    }),

    // Single isolated row to verify non-duplicates pass through
    row({
      Id: 999,
      'Full Name': 'Lone Wolf',
      'Email Address': 'lone@example.com',
      'Phone Number': '+15551234567',
      'Sales Process Level': 'General Qualified Interest',
    }),
  ],
};

describe('transformSnapshot — fixture regression', () => {
  it('produces the expected number of clients + interests', () => {
    const plan = transformSnapshot(FIXTURE);

    // 12 input rows → 7 unique clients (Deepak: 1, Wiarda: 1, Ruiz: 1,
    // Constanzo: 1, Etiennette x2: 2, Lone: 1). Etiennette stays as 2
    // because Pattern F is correctly NOT auto-merged.
    expect(plan.stats.outputClients).toBe(7);
    expect(plan.stats.outputInterests).toBe(12); // one per source row
  });

  it('auto-links every Pattern A–E cluster', () => {
    const plan = transformSnapshot(FIXTURE);
    const linkedSourceIds = new Set<number>();
    for (const link of plan.autoLinks) {
      linkedSourceIds.add(link.leadSourceId);
      for (const merged of link.mergedSourceIds) {
        linkedSourceIds.add(merged);
      }
    }

    // Pattern A: 624 + 625
    expect(linkedSourceIds.has(624) && linkedSourceIds.has(625)).toBe(true);
    // Pattern B: 236 + 536
    expect(linkedSourceIds.has(236) && linkedSourceIds.has(536)).toBe(true);
    // Pattern C: 681 + 682 + 683 (three-way)
    expect(linkedSourceIds.has(681) && linkedSourceIds.has(682) && linkedSourceIds.has(683)).toBe(
      true,
    );
    // Pattern E: 336 + 585
    expect(linkedSourceIds.has(336) && linkedSourceIds.has(585)).toBe(true);
  });

  it('does NOT auto-link Pattern F (Etiennette Clamouze, different country)', () => {
    const plan = transformSnapshot(FIXTURE);
    const linkedSourceIds = new Set<number>();
    for (const link of plan.autoLinks) {
      linkedSourceIds.add(link.leadSourceId);
      for (const merged of link.mergedSourceIds) {
        linkedSourceIds.add(merged);
      }
    }
    // Both Etiennette rows must remain as separate clients.
    expect(linkedSourceIds.has(188)).toBe(false);
    expect(linkedSourceIds.has(717)).toBe(false);
  });

  it('preserves every interest as its own row even when clients merge', () => {
    const plan = transformSnapshot(FIXTURE);
    const sourceIds = plan.interests.map((i) => i.sourceId).sort((a, b) => a - b);
    expect(sourceIds).toEqual([188, 236, 336, 536, 585, 624, 625, 681, 682, 683, 717, 999]);
  });

  it('maps the legacy 8-stage enum to new pipeline stages', () => {
    const plan = transformSnapshot(FIXTURE);
    const stagesById = new Map(plan.interests.map((i) => [i.sourceId, i.pipelineStage]));
    expect(stagesById.get(681)).toBe('open'); // General Qualified Interest
    expect(stagesById.get(682)).toBe('details_sent'); // Specific Qualified Interest
    expect(stagesById.get(336)).toBe('contract_signed'); // Contract Signed
    expect(stagesById.get(585)).toBe('eoi_signed'); // Signed EOI and NDA
  });

  it('attaches different yachts to one merged Constanzo client', () => {
    const plan = transformSnapshot(FIXTURE);
    const constanzoClient = plan.clients.find(
      (c) => c.sourceIds.includes(336) && c.sourceIds.includes(585),
    );
    expect(constanzoClient).toBeDefined();
    const yachtsForConstanzo = plan.interests
      .filter((i) => i.clientTempId === constanzoClient!.tempId)
      .map((i) => i.yachtName)
      .sort();
    expect(yachtsForConstanzo).toEqual(['CALYPSO', 'GEMINI']);
  });

  it('produces deterministic output (same input → same plan)', () => {
    // The transform is pure — running it twice should yield bit-identical
    // results. Catches order-dependent bugs in the dedup clustering.
    const a = transformSnapshot(FIXTURE);
    const b = transformSnapshot(FIXTURE);
    expect(JSON.stringify(a.stats)).toBe(JSON.stringify(b.stats));
    expect(a.autoLinks.length).toBe(b.autoLinks.length);
  });
});