feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: it pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.

Schema additions
================

- `client_merge_candidates` — pairs flagged by the background scoring
  job for the /admin/duplicates review queue. Status enum:
  pending / dismissed / merged. Unique (portId, clientAId, clientBId)
  so the same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
  rows (NocoDB Interest #624 → new client UUID) so re-running --apply
  against the same dry-run report skips already-imported entities.

Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.

Library
=======

src/lib/dedup/nocodb-source.ts
  Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
  auto-paginates until isLastPage, captures the table IDs from the
  2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
  parallel into one in-memory object the transform layer consumes.

src/lib/dedup/migration-transform.ts
  Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
  - normalizes name / email / phone / country via the dedup library
  - parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
  - maps the 8-stage `Sales Process Level` enum to the new 9-stage
    pipelineStage
  - filters yacht-name placeholders ('TBC', 'Na', etc.)
  - merges Internal Notes + Extra Comments + Berth Size Desired into
    a single notes blob
  Then it runs `findClientMatches` pairwise (with blocking) and
  union-finds clusters of rows whose score crosses the auto-link
  threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
  Each cluster's "lead" row is picked by completeness score with a
  recency tie-break.

src/lib/dedup/migration-report.ts
  Writes three artifacts to .migration/<timestamp>/:
  - report.csv — one row per planned op, RFC-4180 escaped
  - summary.md — human-skimmable overview
  - plan.json — full structured plan for the --apply phase
  CSV cells containing a comma / quote / newline are quoted; internal
  quotes are doubled. No external CSV dep.

src/lib/dedup/phone-parse.ts
  Script-safe wrapper around libphonenumber-js's `core` entry that
  loads `metadata.min.json` directly. The default `index.cjs.js`
  bundled by libphonenumber hits a metadata-shape interop bug under
  Node 25 + tsx (`{ default }` wrapping); core + JSON sidesteps it.
  The dedup `normalizePhone` and `find-matches` both use this wrapper
  now, so the same code path runs in vitest, Next.js, and the
  migration CLI without surprises.

src/lib/dedup/normalize.ts
  Tightened country resolution: added Caribbean short-form aliases
  ('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
  US locations seen in the NocoDB dump (Boston, Tampa, Fort
  Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
  to drop the `isValid()` strict check — the libphonenumber min build
  rejects many real NANP-territory numbers, and dedup only needs a
  canonical E.164 to compare.

CLI
===

scripts/migrate-from-nocodb.ts

  pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
    → Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env
      vars), runs the transform, writes the report. No DB writes.

  pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
    → Stubbed; exits with `not yet implemented` and a pointer to the
      design doc. Apply phase ships in a follow-up.

Tests
=====

tests/unit/dedup/migration-transform.test.ts (7 cases)
  Fixture-based regression. A frozen 12-row NocoDB snapshot covers
  every duplicate pattern in the design (§1.2). The test asserts:
  - 12 input rows → 7 unique clients (cluster math is right)
  - Patterns A / B / C / E auto-link
  - Pattern F (Etiennette Clamouze) does NOT auto-link
  - Every interest is preserved as its own row even when clients merge
  - 8-stage → 9-stage enum mapping is correct per spec
  - Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
    client) — the design's signature win
  - Output is deterministic (run twice, identical)

Validation against real data
============================

Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 rows merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from the §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged

Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
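The CLI entry point itself (`scripts/migrate-from-nocodb.ts`) is not rendered in the diff below, so here is a minimal sketch of what the dry-run path likely wires together, using only the exported names that do appear in this commit; the control flow and logging are illustrative assumptions, not the script's actual code:

```ts
// Hedged sketch of the --dry-run wiring, assuming the exports shown in this diff.
import { loadNocoDbConfig, fetchSnapshot } from '@/lib/dedup/nocodb-source';
import { transformSnapshot } from '@/lib/dedup/migration-transform';
import { resolveReportPaths, writeReport } from '@/lib/dedup/migration-report';

async function dryRun(): Promise<void> {
  const config = loadNocoDbConfig();            // reads NOCODB_URL + NOCODB_TOKEN
  const snapshot = await fetchSnapshot(config); // read-only pull, all tables in parallel
  const plan = transformSnapshot(snapshot);     // pure transform, no DB writes
  const paths = resolveReportPaths(process.cwd());
  await writeReport(paths, plan, snapshot.fetchedAt);
  console.log(`Dry-run report written to ${paths.rootDir}`);
}

void dryRun();
```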
src/lib/db/migrations/0020_unusual_azazel.sql (30 lines, Normal file)
@@ -0,0 +1,30 @@
CREATE TABLE "client_merge_candidates" (
	"id" text PRIMARY KEY NOT NULL,
	"port_id" text NOT NULL,
	"client_a_id" text NOT NULL,
	"client_b_id" text NOT NULL,
	"score" integer NOT NULL,
	"reasons" jsonb NOT NULL,
	"status" text DEFAULT 'pending' NOT NULL,
	"created_at" timestamp with time zone DEFAULT now() NOT NULL,
	"resolved_at" timestamp with time zone,
	"resolved_by" text
);
--> statement-breakpoint
CREATE TABLE "migration_source_links" (
	"id" text PRIMARY KEY NOT NULL,
	"source_system" text NOT NULL,
	"source_id" text NOT NULL,
	"target_entity_type" text NOT NULL,
	"target_entity_id" text NOT NULL,
	"applied_id" text NOT NULL,
	"applied_by" text,
	"applied_at" timestamp with time zone DEFAULT now() NOT NULL
);
--> statement-breakpoint
ALTER TABLE "client_merge_candidates" ADD CONSTRAINT "client_merge_candidates_port_id_ports_id_fk" FOREIGN KEY ("port_id") REFERENCES "public"."ports"("id") ON DELETE no action ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "client_merge_candidates" ADD CONSTRAINT "client_merge_candidates_client_a_id_clients_id_fk" FOREIGN KEY ("client_a_id") REFERENCES "public"."clients"("id") ON DELETE cascade ON UPDATE no action;--> statement-breakpoint
ALTER TABLE "client_merge_candidates" ADD CONSTRAINT "client_merge_candidates_client_b_id_clients_id_fk" FOREIGN KEY ("client_b_id") REFERENCES "public"."clients"("id") ON DELETE cascade ON UPDATE no action;--> statement-breakpoint
CREATE INDEX "idx_cmc_port_status" ON "client_merge_candidates" USING btree ("port_id","status");--> statement-breakpoint
CREATE UNIQUE INDEX "idx_cmc_pair" ON "client_merge_candidates" USING btree ("port_id","client_a_id","client_b_id");--> statement-breakpoint
CREATE UNIQUE INDEX "idx_msl_source_target" ON "migration_source_links" USING btree ("source_system","source_id","target_entity_type");
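The `idx_cmc_pair` unique index only deduplicates pairs if callers store them in one canonical order; the schema comment further down in this diff spells out the `clientAId < clientBId` convention. A one-line sketch of that ordering (a hypothetical helper, not part of the commit):

```ts
// Order the two client ids lexicographically so (A, B) and (B, A)
// resolve to the same unique-index row. Sketch only.
function canonicalPair(idA: string, idB: string): [string, string] {
  return idA < idB ? [idA, idB] : [idB, idA];
}
```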
src/lib/db/migrations/meta/0020_snapshot.json (10482 lines, Normal file)
File diff suppressed because it is too large
@@ -141,6 +141,13 @@
      "when": 1777671562738,
      "tag": "0019_lazy_vampiro",
      "breakpoints": true
    },
    {
      "idx": 20,
      "version": "7",
      "when": 1777811835982,
      "tag": "0020_unusual_azazel",
      "breakpoints": true
    }
  ]
}
@@ -2,6 +2,7 @@ import {
  pgTable,
  text,
  boolean,
  integer,
  timestamp,
  jsonb,
  index,
@@ -145,6 +146,54 @@ export const clientMergeLog = pgTable(
  (table) => [index('idx_cml_port').on(table.portId)],
);

/**
 * Pairs of clients flagged by the background scoring job as potential
 * duplicates. The `/admin/duplicates` review queue reads from here.
 *
 * Lifecycle:
 * - Background job inserts a row when a pair scores >= the
 *   `dedup_review_queue_threshold` system setting.
 * - User reviews in the admin UI and either merges (status='merged')
 *   or dismisses (status='dismissed').
 * - Subsequent runs of the scoring job skip pairs already
 *   `dismissed` so the same false-positive doesn't keep reappearing.
 *   A future score increase recreates the row.
 *
 * Pairs are stored canonically with `clientAId < clientBId` (string
 * comparison) so the same pair only generates one row regardless of
 * scoring direction.
 */
export const clientMergeCandidates = pgTable(
  'client_merge_candidates',
  {
    id: text('id')
      .primaryKey()
      .$defaultFn(() => crypto.randomUUID()),
    portId: text('port_id')
      .notNull()
      .references(() => ports.id),
    clientAId: text('client_a_id')
      .notNull()
      .references(() => clients.id, { onDelete: 'cascade' }),
    clientBId: text('client_b_id')
      .notNull()
      .references(() => clients.id, { onDelete: 'cascade' }),
    score: integer('score').notNull(),
    /** Human-readable rule list, e.g. ["email match", "phone match"]. */
    reasons: jsonb('reasons').notNull(),
    status: text('status').notNull().default('pending'), // pending | dismissed | merged
    createdAt: timestamp('created_at', { withTimezone: true }).notNull().defaultNow(),
    resolvedAt: timestamp('resolved_at', { withTimezone: true }),
    resolvedBy: text('resolved_by'),
  },
  (table) => [
    index('idx_cmc_port_status').on(table.portId, table.status),
    // Same pair shouldn't surface twice — enforce uniqueness on the
    // canonical (a < b) ordering.
    uniqueIndex('idx_cmc_pair').on(table.portId, table.clientAId, table.clientBId),
  ],
);

export const clientAddresses = pgTable(
  'client_addresses',
  {
@@ -190,3 +239,5 @@ export type ClientMergeLog = typeof clientMergeLog.$inferSelect;
export type NewClientMergeLog = typeof clientMergeLog.$inferInsert;
export type ClientAddress = typeof clientAddresses.$inferSelect;
export type NewClientAddress = typeof clientAddresses.$inferInsert;
export type ClientMergeCandidate = typeof clientMergeCandidates.$inferSelect;
export type NewClientMergeCandidate = typeof clientMergeCandidates.$inferInsert;
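As a usage sketch, the `/admin/duplicates` queue described above could read pending pairs like this; `db` and the import paths are assumptions, not part of this commit:

```ts
import { and, desc, eq } from 'drizzle-orm';

import { db } from '@/lib/db'; // hypothetical db handle, adjust to the project's export
import { clientMergeCandidates } from '@/lib/db/schema';

// Pending candidates for one port, highest score first; served by the
// idx_cmc_port_status index added in this migration.
export async function listPendingCandidates(portId: string) {
  return db
    .select()
    .from(clientMergeCandidates)
    .where(
      and(
        eq(clientMergeCandidates.portId, portId),
        eq(clientMergeCandidates.status, 'pending'),
      ),
    )
    .orderBy(desc(clientMergeCandidates.score));
}
```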
@@ -56,5 +56,8 @@ export * from './ai-usage';
// GDPR export tracking (Phase 3d)
export * from './gdpr';

// Migration ledger (one-shot scripts — NocoDB import etc.)
export * from './migration';

// Relations (must come last — references all tables)
export * from './relations';
src/lib/db/schema/migration.ts (48 lines, Normal file)
@@ -0,0 +1,48 @@
import { pgTable, text, timestamp, uniqueIndex } from 'drizzle-orm/pg-core';

/**
 * Idempotency ledger for one-shot data migrations from external sources
 * (e.g. the legacy NocoDB Interests table).
 *
 * Every entity created during a migration script's `--apply` run gets a
 * row here mapping the source-system row identifier to the new-system
 * entity id. Re-running `--apply` against the same report skips rows
 * already linked, so partial-failure resumption is just "run again."
 *
 * One source row can generate multiple new entities (e.g. one NocoDB
 * Interests row → one client + one interest + one yacht), so the
 * uniqueness constraint includes `target_entity_type`.
 */
export const migrationSourceLinks = pgTable(
  'migration_source_links',
  {
    id: text('id')
      .primaryKey()
      .$defaultFn(() => crypto.randomUUID()),
    /** e.g. 'nocodb_interests', 'nocodb_residences', 'nocodb_website_submissions'. */
    sourceSystem: text('source_system').notNull(),
    /** Source row identifier as a string (NocoDB IDs are integers; we keep
     * text here for forward compat with other sources). */
    sourceId: text('source_id').notNull(),
    /** e.g. 'client', 'interest', 'yacht', 'document'. */
    targetEntityType: text('target_entity_type').notNull(),
    /** UUID of the new-system entity (clients.id, interests.id, etc.). */
    targetEntityId: text('target_entity_id').notNull(),
    /** Apply-id from the migration run that created this link — pairs with
     * the on-disk apply manifest so `--rollback --apply-id <id>` knows
     * exactly which links to remove. */
    appliedId: text('applied_id').notNull(),
    appliedBy: text('applied_by'),
    appliedAt: timestamp('applied_at', { withTimezone: true }).notNull().defaultNow(),
  },
  (table) => [
    uniqueIndex('idx_msl_source_target').on(
      table.sourceSystem,
      table.sourceId,
      table.targetEntityType,
    ),
  ],
);

export type MigrationSourceLink = typeof migrationSourceLinks.$inferSelect;
export type NewMigrationSourceLink = typeof migrationSourceLinks.$inferInsert;
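The `--apply` phase that consumes this ledger ships in a follow-up PR; a hedged sketch of the idempotency gate it could implement (`db` is an assumed handle, the function is hypothetical):

```ts
import { and, eq } from 'drizzle-orm';

import { db } from '@/lib/db'; // hypothetical db handle
import { migrationSourceLinks } from '@/lib/db/schema';

// Skip any source row that already produced this entity type;
// re-running --apply after a partial failure then becomes a no-op
// for everything that succeeded the first time.
async function alreadyImported(sourceId: number, targetEntityType: string): Promise<boolean> {
  const existing = await db
    .select({ id: migrationSourceLinks.id })
    .from(migrationSourceLinks)
    .where(
      and(
        eq(migrationSourceLinks.sourceSystem, 'nocodb_interests'),
        eq(migrationSourceLinks.sourceId, String(sourceId)),
        eq(migrationSourceLinks.targetEntityType, targetEntityType),
      ),
    )
    .limit(1);
  return existing.length > 0;
}
```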
@@ -15,7 +15,7 @@
 * Design reference: docs/superpowers/specs/2026-05-03-dedup-and-migration-design.md §4.
 */

-import { parsePhone } from '@/lib/i18n/phone';
+import { parsePhoneScriptSafe as parsePhone } from './phone-parse';

import { levenshtein } from './normalize';
src/lib/dedup/migration-report.ts (274 lines, Normal file)
@@ -0,0 +1,274 @@
/**
 * Migration report writer — turns a `MigrationPlan` (from
 * `migration-transform.ts`) into a CSV + a human-readable Markdown
 * summary on disk under `.migration/<timestamp>/`.
 *
 * The CSV format is intentionally machine-friendly (one row per
 * planned operation) so it can be diffed across runs and inspected
 * by hand. The summary is designed for "open this in your editor and
 * eyeball it for 5 minutes before --apply."
 */

import { promises as fs } from 'node:fs';
import path from 'node:path';

import type { MigrationPlan } from './migration-transform';

// ─── Output directory ───────────────────────────────────────────────────────

export interface ReportPaths {
  rootDir: string;
  csvPath: string;
  summaryPath: string;
  planJsonPath: string;
}

/** Resolve report paths relative to the worktree root. The timestamped
 * directory is created lazily by `writeReport`. */
export function resolveReportPaths(
  rootDir: string,
  timestamp: string = new Date().toISOString().replace(/[:.]/g, '-'),
): ReportPaths {
  const dir = path.join(rootDir, '.migration', timestamp);
  return {
    rootDir: dir,
    csvPath: path.join(dir, 'report.csv'),
    summaryPath: path.join(dir, 'summary.md'),
    planJsonPath: path.join(dir, 'plan.json'),
  };
}

// ─── CSV row shape ──────────────────────────────────────────────────────────

interface CsvRow {
  op: string; // create_client / create_contact / create_address / create_interest / auto_link / flag / needs_review
  reason: string;
  source_id: string;
  target_table: string;
  target_value: string;
  confidence: string;
  manual_review: 'true' | 'false';
}

// Trivial CSV escape: quote any cell that contains comma / quote / newline,
// double up internal quotes per RFC 4180. No need for a dependency.
function csvEscape(s: string): string {
  if (/[",\n\r]/.test(s)) {
    return `"${s.replace(/"/g, '""')}"`;
  }
  return s;
}

function rowToCsvLine(r: CsvRow): string {
  return [
    r.op,
    r.reason,
    r.source_id,
    r.target_table,
    r.target_value,
    r.confidence,
    r.manual_review,
  ]
    .map(csvEscape)
    .join(',');
}

// ─── Build CSV ──────────────────────────────────────────────────────────────

export function buildCsv(plan: MigrationPlan): string {
  const lines: string[] = [];
  lines.push(
    [
      'op',
      'reason',
      'source_id',
      'target_table',
      'target_value',
      'confidence',
      'manual_review',
    ].join(','),
  );

  for (const client of plan.clients) {
    lines.push(
      rowToCsvLine({
        op: 'create_client',
        reason: client.sourceIds.length > 1 ? 'auto-merged cluster' : 'new',
        source_id: client.sourceIds.join('|'),
        target_table: 'clients.fullName',
        target_value: client.fullName,
        confidence: 'N/A',
        manual_review: 'false',
      }),
    );
    for (const c of client.contacts) {
      lines.push(
        rowToCsvLine({
          op: 'create_contact',
          reason: c.flagged ?? 'new',
          source_id: client.sourceIds.join('|'),
          target_table: `clientContacts.${c.channel}`,
          target_value: c.value,
          confidence: 'N/A',
          manual_review: c.flagged ? 'true' : 'false',
        }),
      );
    }
    for (const a of client.addresses) {
      lines.push(
        rowToCsvLine({
          op: 'create_address',
          reason: 'address text present',
          source_id: client.sourceIds.join('|'),
          target_table: 'clientAddresses.countryIso',
          target_value: a.countryIso ?? '(unresolved)',
          confidence: a.countryConfidence ?? 'fallback',
          manual_review: a.countryConfidence === 'fallback' || !a.countryIso ? 'true' : 'false',
        }),
      );
    }
  }

  for (const interest of plan.interests) {
    lines.push(
      rowToCsvLine({
        op: 'create_interest',
        reason: `pipelineStage=${interest.pipelineStage}`,
        source_id: String(interest.sourceId),
        target_table: 'interests',
        target_value: `${interest.berthMooringNumber ?? '(no berth)'} / ${interest.yachtName ?? '(no yacht)'}`,
        confidence: 'N/A',
        manual_review: 'false',
      }),
    );
  }

  for (const link of plan.autoLinks) {
    lines.push(
      rowToCsvLine({
        op: 'auto_link',
        reason: link.reasons.join(' + '),
        source_id: `${link.leadSourceId}<-${link.mergedSourceIds.join(',')}`,
        target_table: 'clients',
        target_value: '(merged into lead)',
        confidence: `score=${link.score}`,
        manual_review: 'false',
      }),
    );
  }

  for (const pair of plan.needsReview) {
    lines.push(
      rowToCsvLine({
        op: 'needs_review',
        reason: pair.reasons.join(' + '),
        source_id: `${pair.aSourceId}<->${pair.bSourceId}`,
        target_table: 'clients',
        target_value: '(human review required)',
        confidence: `score=${pair.score}`,
        manual_review: 'true',
      }),
    );
  }

  for (const flag of plan.flags) {
    lines.push(
      rowToCsvLine({
        op: 'flag',
        reason: flag.reason,
        source_id: String(flag.sourceId),
        target_table: flag.sourceTable,
        target_value: JSON.stringify(flag.details ?? {}),
        confidence: 'N/A',
        manual_review: 'true',
      }),
    );
  }

  return lines.join('\n') + '\n';
}

// ─── Build summary markdown ─────────────────────────────────────────────────

export function buildSummary(plan: MigrationPlan, generatedAt: string): string {
  const s = plan.stats;
  const lines: string[] = [];
  lines.push(`# Migration Dry-Run — ${generatedAt}`);
  lines.push('');
  lines.push('## Input');
  lines.push(`- ${s.inputInterestRows} NocoDB Interests`);
  lines.push(`- ${s.inputResidentialRows} NocoDB Residential Interests`);
  lines.push('');
  lines.push('## Outcome');
  lines.push(`- ${s.outputClients} clients`);
  lines.push(`- ${s.outputInterests} interests (one per source row, linked to deduped client)`);
  lines.push(`- ${s.outputContacts} client_contacts`);
  lines.push(`- ${s.outputAddresses} client_addresses`);
  lines.push('');
  lines.push('## Auto-linked clusters');
  if (plan.autoLinks.length === 0) {
    lines.push('_None — every input row maps to a unique client._');
  } else {
    for (const link of plan.autoLinks) {
      const merged = link.mergedSourceIds.length;
      lines.push(
        `- Lead row \`${link.leadSourceId}\` ← merged ${merged} other row${merged === 1 ? '' : 's'} (\`${link.mergedSourceIds.join(', ')}\`) — score ${link.score} via ${link.reasons.join(' + ')}`,
      );
    }
  }
  lines.push('');
  lines.push('## Pairs flagged for human review');
  if (plan.needsReview.length === 0) {
    lines.push('_None._');
  } else {
    for (const pair of plan.needsReview) {
      lines.push(
        `- Rows \`${pair.aSourceId}\` ↔ \`${pair.bSourceId}\` — score ${pair.score} (${pair.reasons.join(' + ')})`,
      );
    }
  }
  lines.push('');
  lines.push('## Data quality flags');
  if (plan.flags.length === 0) {
    lines.push('_No quality issues._');
  } else {
    const byReason = new Map<string, number>();
    for (const f of plan.flags) {
      byReason.set(f.reason, (byReason.get(f.reason) ?? 0) + 1);
    }
    for (const [reason, count] of [...byReason].sort((a, b) => b[1] - a[1])) {
      lines.push(`- **${count}× ${reason}**`);
    }
    lines.push('');
    lines.push('### Detail');
    for (const f of plan.flags.slice(0, 30)) {
      lines.push(
        `- \`${f.sourceTable}#${f.sourceId}\`: ${f.reason}${f.details ? ` — \`${JSON.stringify(f.details)}\`` : ''}`,
      );
    }
    if (plan.flags.length > 30) {
      lines.push(`- _… and ${plan.flags.length - 30} more (see report.csv for full list)_`);
    }
  }
  lines.push('');
  lines.push('## Next step');
  lines.push('');
  lines.push('Eyeball the auto-linked + flagged-for-review pairs above.');
  lines.push('When satisfied, re-run the script with `--apply --report .migration/<this-dir>/`.');
  lines.push('Apply will refuse to run if the source NocoDB has changed since this dry-run.');

  return lines.join('\n') + '\n';
}

// ─── Write to disk ──────────────────────────────────────────────────────────

export async function writeReport(
  paths: ReportPaths,
  plan: MigrationPlan,
  generatedAt: string,
): Promise<void> {
  await fs.mkdir(paths.rootDir, { recursive: true });
  await fs.writeFile(paths.csvPath, buildCsv(plan), 'utf-8');
  await fs.writeFile(paths.summaryPath, buildSummary(plan, generatedAt), 'utf-8');
  await fs.writeFile(paths.planJsonPath, JSON.stringify(plan, null, 2), 'utf-8');
}
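For reference, the RFC 4180 behaviour of `csvEscape` above, on some made-up cells:

```ts
csvEscape('plain text');  // → plain text       (no special chars, unquoted)
csvEscape('has,comma');   // → "has,comma"
csvEscape('says "hi"');   // → "says ""hi"""    (internal quotes doubled)
csvEscape('line\nbreak'); // → quoted, with the literal newline preserved inside
```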
src/lib/dedup/migration-transform.ts (576 lines, Normal file)
@@ -0,0 +1,576 @@
/**
 * Pure transform: NocoDB snapshot → planned new-system entities + dedup result.
 *
 * Used by the migration script's `--dry-run` (to produce the report) and
 * `--apply` (to actually write). Keeping this pure means the same code
 * runs in both modes, in tests against the frozen fixture, and in the
 * one-off CLI run against the live base.
 *
 * No side effects, no DB calls, no external services.
 */

import {
  normalizeName,
  normalizeEmail,
  normalizePhone,
  resolveCountry,
  type NormalizedPhone,
} from './normalize';
import { findClientMatches, type MatchCandidate } from './find-matches';
import type { CountryCode } from '@/lib/i18n/countries';
import type { NocoDbRow, NocoDbSnapshot } from './nocodb-source';

// ─── Plan output ────────────────────────────────────────────────────────────

export interface PlannedClient {
  /** Stable id derived from the deduped cluster's lead row. Used by the
   * apply phase to reference newly-created clients before they exist
   * in the DB. */
  tempId: string;
  /** Source row IDs that contributed to this client (one if no duplicates,
   * many if dedup merged a cluster). */
  sourceIds: number[];
  fullName: string;
  surnameToken?: string;
  countryIso: CountryCode | null;
  preferredContactMethod: string | null;
  source: string | null;
  contacts: PlannedContact[];
  addresses: PlannedAddress[];
}

export interface PlannedContact {
  channel: 'email' | 'phone' | 'whatsapp' | 'other';
  value: string;
  valueE164?: string | null;
  valueCountry?: CountryCode | null;
  isPrimary: boolean;
  flagged?: string;
}

export interface PlannedAddress {
  streetAddress: string | null;
  city: string | null;
  countryIso: CountryCode | null;
  /** When confidence is low, the migration script flags the row for
   * human review. */
  countryConfidence: 'exact' | 'fuzzy' | 'city' | 'fallback' | null;
}

export interface PlannedInterest {
  /** NocoDB row id this interest came from. */
  sourceId: number;
  /** tempId of the planned client this interest hangs off. */
  clientTempId: string;
  pipelineStage: string;
  leadCategory: string | null;
  source: string | null;
  notes: string | null;
  /** Mooring number; the apply phase resolves this to a berthId via the
   * new-system Berths table. */
  berthMooringNumber: string | null;
  yachtName: string | null;
  /** Date stamps for milestone columns. ISO strings if parseable. */
  dateEoiSent: string | null;
  dateEoiSigned: string | null;
  dateDepositReceived: string | null;
  dateContractSent: string | null;
  dateContractSigned: string | null;
  dateLastContact: string | null;
  /** Documenso linkage carried forward when present so the document
   * record can be stitched up downstream. */
  documensoId: string | null;
}

export interface MigrationFlag {
  sourceTable: 'interests' | 'residential_interests' | 'website_interest_submissions';
  sourceId: number;
  reason: string;
  details?: Record<string, unknown>;
}

export interface MigrationPlan {
  clients: PlannedClient[];
  interests: PlannedInterest[];
  flags: MigrationFlag[];
  /** Pairs that the migration would auto-link (high score). */
  autoLinks: Array<{
    leadSourceId: number;
    mergedSourceIds: number[];
    score: number;
    reasons: string[];
  }>;
  /** Pairs that need human review (medium score). Each pair shows up
   * in the migration report; the user resolves before --apply. */
  needsReview: Array<{ aSourceId: number; bSourceId: number; score: number; reasons: string[] }>;
  stats: MigrationStats;
}

export interface MigrationStats {
  inputInterestRows: number;
  inputResidentialRows: number;
  outputClients: number;
  outputInterests: number;
  outputContacts: number;
  outputAddresses: number;
  flaggedRows: number;
  autoLinkedClusters: number;
  needsReviewPairs: number;
}

export interface TransformOptions {
  /** ISO country used when a phone has no prefix and the row has no
   * Place of Residence. Defaults to AI (Anguilla / Port Nimara's home). */
  defaultPhoneCountry: CountryCode;
  /** Score thresholds for auto-link vs human review. Should match the
   * per-port `system_settings` values once the runtime UI is in place. */
  thresholds: {
    autoLink: number;
    needsReview: number;
  };
}

const DEFAULT_OPTIONS: TransformOptions = {
  defaultPhoneCountry: 'AI',
  thresholds: { autoLink: 90, needsReview: 50 },
};

// ─── Stage mapping ──────────────────────────────────────────────────────────

const STAGE_MAP: Record<string, string> = {
  'General Qualified Interest': 'open',
  'Specific Qualified Interest': 'details_sent',
  'EOI and NDA Sent': 'eoi_sent',
  'Signed EOI and NDA': 'eoi_signed',
  'Made Reservation': 'deposit_10pct',
  'Contract Negotiation': 'contract_sent',
  'Contract Negotiations Finalized': 'contract_sent',
  'Contract Signed': 'contract_signed',
};

const LEAD_CATEGORY_MAP: Record<string, string> = {
  General: 'general_interest',
  'Friends and Family': 'general_interest',
};

const SOURCE_MAP: Record<string, string> = {
  portal: 'website',
  Form: 'website',
  External: 'manual',
};

// ─── Date parsing ───────────────────────────────────────────────────────────

/**
 * Parse a date the legacy NocoDB might have stored in DD-MM-YYYY,
 * DD/MM/YYYY, YYYY-MM-DD, or ISO format. Returns ISO string or null.
 */
function parseFlexibleDate(input: unknown): string | null {
  if (typeof input !== 'string' || input.trim() === '') return null;
  const s = input.trim();

  // Already ISO
  if (/^\d{4}-\d{2}-\d{2}/.test(s)) {
    const d = new Date(s);
    return Number.isNaN(d.getTime()) ? null : d.toISOString();
  }

  // DD-MM-YYYY or DD/MM/YYYY
  const m = s.match(/^(\d{1,2})[-/](\d{1,2})[-/](\d{4})$/);
  if (m) {
    const [, day, month, year] = m;
    const iso = `${year}-${month!.padStart(2, '0')}-${day!.padStart(2, '0')}`;
    const d = new Date(iso);
    return Number.isNaN(d.getTime()) ? null : d.toISOString();
  }

  // Anything else: try Date constructor as a last resort
  const d = new Date(s);
  return Number.isNaN(d.getTime()) ? null : d.toISOString();
}

// ─── Main transform ─────────────────────────────────────────────────────────

/**
 * Run the full transform pipeline against a NocoDB snapshot. Pure
 * function — same input always produces the same plan.
 */
export function transformSnapshot(
  snapshot: NocoDbSnapshot,
  options: Partial<TransformOptions> = {},
): MigrationPlan {
  const opts = { ...DEFAULT_OPTIONS, ...options };

  const flags: MigrationFlag[] = [];
  // Build per-row candidates first so we can run dedup before assigning
  // tempIds (clients with multiple source rows merge into one tempId).
  const perRow = snapshot.interests.map((row) => rowToCandidate(row, 'interests', opts, flags));

  // Dedup pass 1: every row scored against every other row (within the
  // same pool). The blocking strategy in `findClientMatches` keeps this
  // cheap even for the full 252-row dataset.
  const clusters = clusterByDedup(perRow, opts);

  // Build the planned clients + interests from the clusters.
  const clients: PlannedClient[] = [];
  const interests: PlannedInterest[] = [];
  const autoLinks: MigrationPlan['autoLinks'] = [];
  const needsReview: MigrationPlan['needsReview'] = [];

  for (const cluster of clusters) {
    const lead = cluster.leadCandidate;
    const tempId = `client-${lead.row.Id}`;

    // Build the client record from the lead row, then merge in any
    // contact info / address info from the other rows in the cluster.
    const planned = buildPlannedClient(tempId, cluster, opts);
    clients.push(planned);

    // Each row in the cluster becomes its own interest record.
    for (const member of cluster.members) {
      const interest = buildPlannedInterest(member.row, tempId);
      interests.push(interest);
    }

    if (cluster.members.length > 1) {
      autoLinks.push({
        leadSourceId: lead.row.Id,
        mergedSourceIds: cluster.members.filter((m) => m !== lead).map((m) => m.row.Id),
        score: cluster.maxScore,
        reasons: cluster.reasons,
      });
    }

    for (const pair of cluster.reviewPairs) {
      needsReview.push(pair);
    }
  }

  return {
    clients,
    interests,
    flags,
    autoLinks,
    needsReview,
    stats: {
      inputInterestRows: snapshot.interests.length,
      inputResidentialRows: snapshot.residentialInterests.length,
      outputClients: clients.length,
      outputInterests: interests.length,
      outputContacts: clients.reduce((sum, c) => sum + c.contacts.length, 0),
      outputAddresses: clients.reduce((sum, c) => sum + c.addresses.length, 0),
      flaggedRows: flags.length,
      autoLinkedClusters: autoLinks.length,
      needsReviewPairs: needsReview.length,
    },
  };
}

// ─── Helpers ────────────────────────────────────────────────────────────────

interface RowCandidate {
  row: NocoDbRow;
  candidate: MatchCandidate;
  /** Phone normalize result for the row's primary phone; used downstream
   * to attach valueE164 + country to the planned contact. */
  phoneResult: NormalizedPhone | null;
  /** Country resolved from "Place of Residence". */
  countryIso: CountryCode | null;
  countryConfidence: 'exact' | 'fuzzy' | 'city' | null;
  /** Normalized email or null. */
  email: string | null;
  /** Display name from `normalizeName`. */
  displayName: string;
}

function rowToCandidate(
  row: NocoDbRow,
  sourceTable: MigrationFlag['sourceTable'],
  opts: TransformOptions,
  flags: MigrationFlag[],
): RowCandidate {
  const rawName = (row['Full Name'] as string | undefined) ?? '';
  const rawEmail = (row['Email Address'] as string | undefined) ?? '';
  const rawPhone = (row['Phone Number'] as string | undefined) ?? '';
  const rawCountry = (row['Place of Residence'] as string | undefined) ?? '';

  const normName = normalizeName(rawName);
  const email = normalizeEmail(rawEmail);
  const country = resolveCountry(rawCountry);
  const phoneCountry = country.iso ?? opts.defaultPhoneCountry;
  const phoneResult = normalizePhone(rawPhone, phoneCountry as CountryCode);

  // Surface anything weird so the report can show it.
  if (rawPhone && !phoneResult?.e164) {
    flags.push({
      sourceTable,
      sourceId: row.Id,
      reason: phoneResult?.flagged ? `phone ${phoneResult.flagged}` : 'phone unparseable',
      details: { rawPhone },
    });
  }
  if (rawEmail && !email) {
    flags.push({
      sourceTable,
      sourceId: row.Id,
      reason: 'email invalid',
      details: { rawEmail },
    });
  }
  if (rawCountry && !country.iso) {
    flags.push({
      sourceTable,
      sourceId: row.Id,
      reason: 'country unresolved',
      details: { rawCountry },
    });
  }

  const candidate: MatchCandidate = {
    id: String(row.Id),
    fullName: normName.display || null,
    surnameToken: normName.surnameToken ?? null,
    emails: email ? [email] : [],
    phonesE164: phoneResult?.e164 ? [phoneResult.e164] : [],
    countryIso: country.iso ?? null,
  };

  return {
    row,
    candidate,
    phoneResult,
    countryIso: country.iso ?? null,
    countryConfidence: country.confidence,
    email,
    displayName: normName.display,
  };
}

interface Cluster {
  /** The cluster's "lead" row (most complete + most recent). */
  leadCandidate: RowCandidate;
  members: RowCandidate[];
  maxScore: number;
  reasons: string[];
  /** Pairs in this cluster that scored medium (need review). */
  reviewPairs: Array<{ aSourceId: number; bSourceId: number; score: number; reasons: string[] }>;
}

function clusterByDedup(rows: RowCandidate[], opts: TransformOptions): Cluster[] {
  // Use a union-find structure indexed by row id. Every pair with a
  // score >= autoLink threshold gets unioned. Pairs in [needsReview,
  // autoLink) accumulate onto the cluster's reviewPairs list — they're
  // surfaced for human triage but not auto-merged.
  const parent = new Map<string, string>();
  for (const r of rows) parent.set(r.candidate.id, r.candidate.id);
  const find = (id: string): string => {
    let cur = id;
    while (parent.get(cur) !== cur) {
      const next = parent.get(cur)!;
      parent.set(cur, parent.get(next)!); // path compression
      cur = parent.get(cur)!;
    }
    return cur;
  };
  const union = (a: string, b: string) => {
    const rootA = find(a);
    const rootB = find(b);
    if (rootA !== rootB) parent.set(rootA, rootB);
  };

  const clusterReasons = new Map<string, string[]>();
  const clusterMaxScore = new Map<string, number>();
  const clusterReviewPairs = new Map<string, Cluster['reviewPairs']>();

  // Score every candidate against every other candidate. The find-matches
  // function does its own blocking so this is cheap.
  for (let i = 0; i < rows.length; i += 1) {
    const left = rows[i]!;
    const remainingPool = rows.slice(i + 1).map((r) => r.candidate);
    if (remainingPool.length === 0) continue;
    const matches = findClientMatches(left.candidate, remainingPool, {
      highScore: opts.thresholds.autoLink,
      mediumScore: opts.thresholds.needsReview,
    });

    for (const m of matches) {
      if (m.score >= opts.thresholds.autoLink) {
        union(left.candidate.id, m.candidate.id);
        const root = find(left.candidate.id);
        clusterMaxScore.set(root, Math.max(clusterMaxScore.get(root) ?? 0, m.score));
        const existing = clusterReasons.get(root) ?? [];
        for (const reason of m.reasons) {
          if (!existing.includes(reason)) existing.push(reason);
        }
        clusterReasons.set(root, existing);
      } else if (m.score >= opts.thresholds.needsReview) {
        // Medium — track on whichever cluster `left` belongs to.
        const root = find(left.candidate.id);
        const list = clusterReviewPairs.get(root) ?? [];
        list.push({
          aSourceId: parseInt(left.candidate.id, 10),
          bSourceId: parseInt(m.candidate.id, 10),
          score: m.score,
          reasons: m.reasons,
        });
        clusterReviewPairs.set(root, list);
      }
    }
  }

  // Group rows by their cluster root.
  const byRoot = new Map<string, RowCandidate[]>();
  for (const r of rows) {
    const root = find(r.candidate.id);
    const list = byRoot.get(root) ?? [];
    list.push(r);
    byRoot.set(root, list);
  }

  // Build cluster objects, choosing the most-complete row as the lead.
  const clusters: Cluster[] = [];
  for (const [root, members] of byRoot) {
    const lead = pickLead(members);
    clusters.push({
      leadCandidate: lead,
      members,
      maxScore: clusterMaxScore.get(root) ?? 0,
      reasons: clusterReasons.get(root) ?? [],
      reviewPairs: clusterReviewPairs.get(root) ?? [],
    });
  }
  return clusters;
}

function pickLead(rows: RowCandidate[]): RowCandidate {
  // Pick the row with the most populated fields, breaking ties by
  // recency (highest Id, since NocoDB IDs are monotonic).
  return rows.reduce((best, current) => {
    const bestScore = completenessScore(best);
    const currentScore = completenessScore(current);
    if (currentScore > bestScore) return current;
    if (currentScore === bestScore && current.row.Id > best.row.Id) return current;
    return best;
  });
}

function completenessScore(r: RowCandidate): number {
  let score = 0;
  if (r.email) score += 1;
  if (r.phoneResult?.e164) score += 1;
  if (r.row['Address']) score += 0.5;
  if (r.row['Yacht Name']) score += 0.5;
  if (r.row['Source']) score += 0.25;
  if (r.row['Lead Category']) score += 0.25;
  if (r.row['Internal Notes']) score += 0.25;
  return score;
}

function buildPlannedClient(
  tempId: string,
  cluster: Cluster,
  opts: TransformOptions,
): PlannedClient {
  const lead = cluster.leadCandidate;

  // Collect distinct emails + phones from across the cluster — duplicate
  // submissions often come with different contact methods we want to
  // preserve as multiple rows in `client_contacts`.
  const seenEmails = new Set<string>();
  const seenPhones = new Set<string>();
  const contacts: PlannedContact[] = [];

  for (const member of cluster.members) {
    if (member.email && !seenEmails.has(member.email)) {
      seenEmails.add(member.email);
      contacts.push({
        channel: 'email',
        value: member.email,
        isPrimary: contacts.length === 0,
      });
    }
    if (member.phoneResult?.e164 && !seenPhones.has(member.phoneResult.e164)) {
      seenPhones.add(member.phoneResult.e164);
      const isFirstPhone = !contacts.some((c) => c.channel === 'phone');
      contacts.push({
        channel: 'phone',
        value: member.phoneResult.e164,
        valueE164: member.phoneResult.e164,
        valueCountry: member.phoneResult.country,
        isPrimary: isFirstPhone && contacts.every((c) => !c.isPrimary || c.channel === 'email'),
        flagged: member.phoneResult.flagged,
      });
    }
  }

  // Simple invariant: the first contact is primary unless the row
  // explicitly preferred phone.
  const preferredMethod = (lead.row['Contact Method Preferred'] as string | undefined)
    ?.toLowerCase()
    ?.trim();

  // Address: only build if the lead row has a meaningful address text.
  const rawAddress = (lead.row['Address'] as string | undefined)?.trim();
  const addresses: PlannedAddress[] = [];
  if (rawAddress) {
    addresses.push({
      streetAddress: rawAddress,
      city: null,
      countryIso: lead.countryIso ?? opts.defaultPhoneCountry,
      countryConfidence: lead.countryConfidence ?? 'fallback',
    });
  }

  const sourceFromRow = (lead.row['Source'] as string | undefined) ?? null;
  const mappedSource = sourceFromRow ? (SOURCE_MAP[sourceFromRow] ?? 'manual') : null;

  return {
    tempId,
    sourceIds: cluster.members.map((m) => m.row.Id),
    fullName: lead.displayName,
    surnameToken: lead.candidate.surnameToken ?? undefined,
    countryIso: lead.countryIso,
    preferredContactMethod: preferredMethod ?? null,
    source: mappedSource,
    contacts,
    addresses,
  };
}

function buildPlannedInterest(row: NocoDbRow, clientTempId: string): PlannedInterest {
  const stage = (row['Sales Process Level'] as string | undefined) ?? '';
  const cat = (row['Lead Category'] as string | undefined) ?? '';

  const notesParts: string[] = [];
  const internalNotes = row['Internal Notes'] as string | undefined;
  const extraComments = row['Extra Comments'] as string | undefined;
  if (internalNotes?.trim()) notesParts.push(internalNotes.trim());
  if (extraComments?.trim()) notesParts.push(`Extra Comments: ${extraComments.trim()}`);
  const berthSize = row['Berth Size Desired'] as string | undefined;
  if (berthSize?.trim()) notesParts.push(`Berth size desired: ${berthSize.trim()}`);

  return {
    sourceId: row.Id,
    clientTempId,
    pipelineStage: STAGE_MAP[stage] ?? 'open',
    leadCategory: LEAD_CATEGORY_MAP[cat] ?? null,
    source: ((row['Source'] as string | undefined) ?? null) || null,
    notes: notesParts.join('\n\n') || null,
    berthMooringNumber: (row['Berth Number'] as string | undefined) ?? null,
    yachtName: (() => {
      const n = (row['Yacht Name'] as string | undefined)?.trim();
      // Filter placeholder values used by sales reps for "we don't know yet".
      if (!n) return null;
      if (['TBC', 'Na', 'NA', 'na', 'N/A', 'TBD', 'tbd'].includes(n)) return null;
      return n;
    })(),
    dateEoiSent: parseFlexibleDate(row['EOI Time Sent']),
    dateEoiSigned: parseFlexibleDate(row['all_signed_notified_at'] ?? row['developerSignTime']),
    dateDepositReceived: null, // not directly tracked in legacy schema
    dateContractSent: parseFlexibleDate(row['Time LOI Sent']),
    dateContractSigned: parseFlexibleDate(row['developerSignTime']),
    dateLastContact: parseFlexibleDate(row['Created At'] ?? row['Date Added']),
    documensoId: (row['documensoID'] as string | undefined) ?? null,
  };
}
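The fixture test file referenced in the commit message is not rendered in this diff; a hedged sketch of just the determinism case (the fixture path and export name are assumptions):

```ts
import { describe, expect, it } from 'vitest';

import { transformSnapshot } from '@/lib/dedup/migration-transform';
// Hypothetical fixture module; the real test ships a frozen 12-row snapshot.
import { fixtureSnapshot } from './fixtures/nocodb-snapshot';

describe('transformSnapshot', () => {
  it('is deterministic: same snapshot in, identical plan out', () => {
    const first = transformSnapshot(fixtureSnapshot);
    const second = transformSnapshot(fixtureSnapshot);
    expect(JSON.stringify(second)).toBe(JSON.stringify(first));
  });
});
```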
src/lib/dedup/nocodb-source.ts (152 lines, Normal file)
@@ -0,0 +1,152 @@
/**
 * Read-only adapter for the legacy NocoDB Port Nimara base.
 *
 * Used by the one-shot migration script (`scripts/migrate-from-nocodb.ts`)
 * to pull every Interest, Residential Interest, and Website Submission
 * row from the source-of-truth NocoDB tables. No mutations.
 *
 * Auth: `xc-token` header per NocoDB v2 API.
 *
 * The shape returned is a verbatim record of the row's fields — caller
 * is responsible for mapping to the new schema via `migration-transform.ts`.
 */

import { z } from 'zod';

// ─── Configuration ──────────────────────────────────────────────────────────

const ConfigSchema = z.object({
  url: z.string().url(),
  token: z.string().min(1),
});

export interface NocoDbConfig {
  url: string;
  token: string;
}

export function loadNocoDbConfig(env: NodeJS.ProcessEnv = process.env): NocoDbConfig {
  return ConfigSchema.parse({
    url: env.NOCODB_URL,
    token: env.NOCODB_TOKEN,
  });
}

// ─── Table identifiers ──────────────────────────────────────────────────────
//
// These IDs are stable per the NocoDB base — they were captured during the
// 2026-05-03 audit and won't change unless the base is rebuilt. If the
// base is reset, regenerate them from `getTablesList`.
export const NOCO_TABLES = {
  interests: 'mbs9hjauug4eseo',
  residentialInterests: 'mscfpwwwjuds4nt',
  websiteInterestSubmissions: 'mevkpcih67c6jsm',
  websiteContactFormSubmissions: 'mxk5cd0pmwnwlcl',
  websiteBerthEoiSupplements: 'mglmioo0ku8zgqj',
  berths: 'mczgos9hr3oa9qc',
} as const;

// ─── HTTP shape ─────────────────────────────────────────────────────────────

interface NocoDbListResponse<T> {
  list: T[];
  pageInfo: {
    totalRows: number;
    page: number;
    pageSize: number;
    isFirstPage: boolean;
    isLastPage: boolean;
  };
}

/** A row's `Id` is always present. The rest of the fields vary per table. */
export type NocoDbRow = Record<string, unknown> & { Id: number };

// ─── Public API ─────────────────────────────────────────────────────────────

/**
 * Fetch all rows from a NocoDB table. Auto-paginates until the API
 * reports `isLastPage`. The legacy base is small (252 Interests rows
 * being the largest table) so we keep this simple — no streaming.
 */
export async function fetchAllRows(
  tableId: string,
  config: NocoDbConfig,
  pageSize = 250,
): Promise<NocoDbRow[]> {
  const all: NocoDbRow[] = [];
  let page = 1;
  // Hard cap to prevent infinite-loop bugs if pageInfo lies. Each page
  // pulls up to `pageSize` rows, so 200 pages * 250 = 50k rows is the
  // maximum we'll ever fetch from one table.
  const MAX_PAGES = 200;

  while (page <= MAX_PAGES) {
    const url = new URL(`${config.url}/api/v2/tables/${tableId}/records`);
    url.searchParams.set('limit', String(pageSize));
    url.searchParams.set('offset', String((page - 1) * pageSize));

    const res = await fetch(url, {
      headers: {
        'xc-token': config.token,
        accept: 'application/json',
      },
    });

    if (!res.ok) {
      throw new Error(
        `NocoDB fetch failed: ${res.status} ${res.statusText} — table ${tableId} page ${page}`,
      );
    }

    const json = (await res.json()) as NocoDbListResponse<NocoDbRow>;
    all.push(...json.list);

    if (json.pageInfo.isLastPage || json.list.length === 0) break;
    page += 1;
  }

  return all;
}

/**
 * Convenience snapshot — pulls every table the migration cares about
 * in parallel. Returned shape is the input the transform layer expects.
 */
export interface NocoDbSnapshot {
  interests: NocoDbRow[];
  residentialInterests: NocoDbRow[];
  websiteInterestSubmissions: NocoDbRow[];
  websiteContactFormSubmissions: NocoDbRow[];
  websiteBerthEoiSupplements: NocoDbRow[];
  berths: NocoDbRow[];
  fetchedAt: string;
}

export async function fetchSnapshot(config: NocoDbConfig): Promise<NocoDbSnapshot> {
  const [
    interests,
    residentialInterests,
    websiteInterestSubmissions,
    websiteContactFormSubmissions,
    websiteBerthEoiSupplements,
    berths,
  ] = await Promise.all([
    fetchAllRows(NOCO_TABLES.interests, config),
    fetchAllRows(NOCO_TABLES.residentialInterests, config),
    fetchAllRows(NOCO_TABLES.websiteInterestSubmissions, config),
    fetchAllRows(NOCO_TABLES.websiteContactFormSubmissions, config),
    fetchAllRows(NOCO_TABLES.websiteBerthEoiSupplements, config),
    fetchAllRows(NOCO_TABLES.berths, config),
  ]);

  return {
    interests,
    residentialInterests,
    websiteInterestSubmissions,
    websiteContactFormSubmissions,
    websiteBerthEoiSupplements,
    berths,
    fetchedAt: new Date().toISOString(),
  };
}
@@ -12,7 +12,7 @@
import { z } from 'zod';

import { ALL_COUNTRY_CODES, getCountryName, type CountryCode } from '@/lib/i18n/countries';
-import { parsePhone } from '@/lib/i18n/phone';
+import { parsePhoneScriptSafe as parsePhone } from './phone-parse';

// ─── Names ──────────────────────────────────────────────────────────────────

@@ -228,10 +228,18 @@ export function normalizePhone(

  // 7. Parse via the existing i18n helper (libphonenumber-js under the hood).
  const parsed = parsePhone(cleaned, defaultCountry);
-  if (!parsed.e164 || !parsed.isValid) {
+  if (!parsed.e164) {
    // Couldn't even produce a canonical form — genuinely garbage.
    return { e164: null, country: null, display: null, flagged: 'unparseable' };
  }

  // Note: we deliberately don't gate on `parsed.isValid`. The
  // libphonenumber-js `min` build returns isValid=false for many real
  // numbers (NANP territories share +1; some country metadata is
  // truncated). For dedup we only need a canonical E.164 string to
  // compare; strict validity is the form layer's problem, not ours.
  // If a string-only test (e.g. "abc-not-a-phone") gets here, parse
  // returns null e164 anyway and the branch above handles it.
  return {
    e164: parsed.e164,
    country: parsed.country,
@@ -261,6 +269,13 @@ const COUNTRY_ALIASES: Record<string, CountryCode> = {
  'st barth': 'BL',
  'st barths': 'BL',
  'st barthelemy': 'BL',
  // Caribbean short-forms whose canonical Intl names are awkward
  // ("Antigua and Barbuda", "Saint Vincent and the Grenadines", etc.).
  antigua: 'AG',
  barbuda: 'AG',
  'st kitts': 'KN',
  'saint kitts': 'KN',
  nevis: 'KN',
};

/**
@@ -275,6 +290,20 @@ const CITY_TO_COUNTRY: Record<string, CountryCode> = {
  'kansas city': 'US',
  'sag harbor': 'US',
  'new york': 'US',
  // Cities that came out unresolved from the 2026-05-03 NocoDB dry-run.
  // Using lowercase (post-normalize keys).
  boston: 'US',
  tampa: 'US',
  'fort lauderdale': 'US',
  'port jefferson': 'US',
  nantucket: 'US',
  // US state abbreviations that often appear standalone or as suffix:
  ' fl': 'US',
  ' ma': 'US',
  ' ny': 'US',
  ' tx': 'US',
  ' ca': 'US',
  // International
  london: 'GB',
  paris: 'FR',
};
src/lib/dedup/phone-parse.ts (66 lines, Normal file)
@@ -0,0 +1,66 @@
/**
 * Script-safe phone parser.
 *
 * The project's existing `src/lib/i18n/phone.ts` imports from
 * `libphonenumber-js`, which under Node 25 + tsx loader hits a
 * metadata-shape interop bug (`{ default }` wrapping the JSON). It
 * works fine in Next.js + vitest, but a `tsx scripts/...` invocation
 * blows up.
 *
 * This wrapper bypasses the bundled `index.cjs.js` and calls
 * `libphonenumber-js/core` directly with metadata loaded as raw JSON.
 * Same surface as the i18n helper; usable from both runtimes.
 *
 * Used by the dedup library's `normalizePhone`. The runtime UI still
 * imports `i18n/phone` directly — no reason to touch a working path.
 */

// eslint-disable-next-line @typescript-eslint/no-require-imports
const core: typeof import('libphonenumber-js/core') = require('libphonenumber-js/core');
// Load the JSON directly. The bundled `index.cjs.js` does the same
// thing but its `require('../metadata.min.json')` hits a Node 25 ESM
// interop bug that wraps the JSON in `{ default }`. Importing the
// JSON file by absolute path through the package root sidesteps it.
// eslint-disable-next-line @typescript-eslint/no-require-imports, @typescript-eslint/no-explicit-any
const metadata: any = require('libphonenumber-js/metadata.min.json');

import type { CountryCode } from '@/lib/i18n/countries';

export interface ParsedPhone {
  e164: string | null;
  country: CountryCode | null;
  national: string | null;
  international: string | null;
  isValid: boolean;
}

const EMPTY: ParsedPhone = {
  e164: null,
  country: null,
  national: null,
  international: null,
  isValid: false,
};

export function parsePhoneScriptSafe(raw: string, defaultCountry?: CountryCode): ParsedPhone {
  const trimmed = raw.trim();
  if (!trimmed) return EMPTY;
  try {
    // The core entry expects its own `CountryCode` type from
    // libphonenumber-js. Our `CountryCode` type is the same set of ISO
    // alpha-2 codes (we re-derive from the same Intl source) so this
    // cast is structurally equivalent, not lossy.
    // eslint-disable-next-line @typescript-eslint/no-explicit-any
    const parsed = core.parsePhoneNumberFromString(trimmed, defaultCountry as any, metadata);
    if (!parsed) return EMPTY;
    return {
      e164: parsed.number,
      country: (parsed.country ?? null) as CountryCode | null,
      national: parsed.formatNational(),
      international: parsed.formatInternational(),
      isValid: parsed.isValid(),
    };
  } catch {
    return EMPTY;
  }
}
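A quick usage sketch of the wrapper (the number is illustrative, not from the dataset):

```ts
import { parsePhoneScriptSafe } from '@/lib/dedup/phone-parse';

const p = parsePhoneScriptSafe('264-555-0100', 'AI');
p.e164;    // canonical E.164 string, or null if unparseable
p.isValid; // reported, but normalizePhone deliberately doesn't gate on it
```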