pn-new-crm/src/lib/dedup/nocodb-source.ts

153 lines
5.0 KiB
TypeScript

feat(dedup): NocoDB migration script + tables (P3 dry-run)

Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and writes
a CSV + Markdown report under .migration/<timestamp>/. The --apply phase
is stubbed for a follow-up PR per the design's P3 implementation sequence.

Schema additions
================

- `client_merge_candidates`: pairs flagged by the background scoring job
  for the /admin/duplicates review queue. Status enum: pending /
  dismissed / merged. Unique-(portId, clientAId, clientBId) so the same
  pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links`: idempotency ledger. Maps source-system rows
  (NocoDB Interest #624 → new client UUID) so re-running --apply against
  the same dry-run report skips already-imported entities.

Both tables ship with the migration `0020_unusual_azazel.sql`, already
applied to the local dev DB during this commit's preparation.

Library
=======

src/lib/dedup/nocodb-source.ts
  Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
  auto-paginates until isLastPage, captures the table IDs from the
  2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
  parallel into one in-memory object the transform layer consumes.

src/lib/dedup/migration-transform.ts
  Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
  - normalizes name / email / phone / country via the dedup library
  - parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
  - maps the 8-stage `Sales Process Level` enum to the new 9-stage
    pipelineStage
  - filters yacht-name placeholders ('TBC', 'Na', etc.)
  - merges Internal Notes + Extra Comments + Berth Size Desired into a
    single notes blob
  Then runs `findClientMatches` pairwise (with blocking) and union-finds
  clusters of rows whose score crosses the auto-link threshold (90).
  Lower-scoring pairs (50–89) become 'needs review'. Each cluster's
  "lead" row is picked by completeness score with recency tie-break.

src/lib/dedup/migration-report.ts
  Writes three artifacts to .migration/<timestamp>/:
  - report.csv: one row per planned op, RFC-4180 escaped
  - summary.md: human-skimmable overview
  - plan.json: full structured plan for the --apply phase
  CSV cells with comma / quote / newline are quoted; internal quotes
  are doubled. No external CSV dep.

src/lib/dedup/phone-parse.ts
  Script-safe wrapper around libphonenumber-js's `core` entry that loads
  `metadata.min.json` directly. The default `index.cjs.js` bundled by
  libphonenumber hits a metadata-shape interop bug under Node 25 + tsx
  (`{ default }` wrapping); core+JSON sidesteps it. The dedup
  `normalizePhone` and `find-matches` both use this wrapper now so the
  same code path runs in vitest, Next.js, and the migration CLI without
  surprises.

src/lib/dedup/normalize.ts
  Tightened country resolution: added Caribbean short-form aliases
  ('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
  US locations seen in the NocoDB dump (Boston, Tampa, Fort Lauderdale,
  Port Jefferson, Nantucket). Also relaxed phone parsing to drop the
  `isValid()` strict check: the libphonenumber min build rejects many
  real NANP-territory numbers, and dedup only needs a canonical E.164
  to compare.

CLI
===

scripts/migrate-from-nocodb.ts

  pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
    → Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
      runs the transform, writes report. No DB writes.

  pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
    → Stubbed; exits with `not yet implemented` and a pointer to the
      design doc. Apply phase ships in a follow-up.

Tests
=====

tests/unit/dedup/migration-transform.test.ts (7 cases)
  Fixture-based regression. A frozen 12-row NocoDB snapshot covers every
  duplicate pattern in the design (§1.2). The test asserts:
  - 12 input rows → 7 unique clients (cluster math is right)
  - Patterns A / B / C / E auto-link
  - Pattern F (Etiennette Clamouze) does NOT auto-link
  - Every interest preserved as its own row even when clients merge
  - 8-stage → 9-stage enum mapping is correct per spec
  - Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
    client), the design's signature win
  - Output is deterministic (run twice, identical)

Validation against real data
============================

Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the live
NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged

Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
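
The clustering step described under migration-transform.ts (union-find over pairs scoring 90 or above, with 50–89 routed to review) could look roughly like the sketch below. The `ScoredPair` shape, the `UnionFind` class, and `clusterRows` are hypothetical names for illustration; the real `findClientMatches` output and the transform's internals are not shown in this commit view.

```typescript
// Hypothetical pair shape; the real dedup library's types may differ.
interface ScoredPair {
  a: number; // source row Id
  b: number; // source row Id
  score: number; // match score, assumed to be on a 0-100 scale
}

const AUTO_LINK_THRESHOLD = 90; // threshold quoted in the commit message
const REVIEW_THRESHOLD = 50; // lower bound of the 'needs review' band

class UnionFind {
  private parent = new Map<number, number>();

  find(x: number): number {
    const p = this.parent.get(x);
    if (p === undefined) {
      this.parent.set(x, x); // first sighting: x is its own root
      return x;
    }
    if (p === x) return x;
    const root = this.find(p);
    this.parent.set(x, root); // path compression
    return root;
  }

  union(a: number, b: number): void {
    const ra = this.find(a);
    const rb = this.find(b);
    // Point the larger root at the smaller so the result is deterministic.
    if (ra !== rb) this.parent.set(Math.max(ra, rb), Math.min(ra, rb));
  }
}

function clusterRows(rowIds: number[], pairs: ScoredPair[]) {
  const uf = new UnionFind();
  const needsReview: ScoredPair[] = [];
  for (const id of rowIds) uf.find(id);
  for (const pair of pairs) {
    if (pair.score >= AUTO_LINK_THRESHOLD) uf.union(pair.a, pair.b);
    else if (pair.score >= REVIEW_THRESHOLD) needsReview.push(pair);
  }
  // Group row Ids by their root; Map preserves first-seen order.
  const clusters = new Map<number, number[]>();
  for (const id of rowIds) {
    const root = uf.find(id);
    const members = clusters.get(root) ?? [];
    members.push(id);
    clusters.set(root, members);
  }
  return { clusters: [...clusters.values()], needsReview };
}
```

Because union-find is transitive, rows 1 and 3 end up in the same cluster when 1–2 and 2–3 both auto-link, even if the 1–3 pair was never scored.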
2026-05-03 14:50:01 +02:00
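
The commit message describes report.csv as RFC-4180 escaped with no external CSV dependency: cells containing a comma, quote, or newline are quoted, and internal quotes are doubled. A minimal sketch of that rule follows; `escapeCsvCell` and `toCsvLine` are hypothetical names, not the functions in migration-report.ts.

```typescript
// Quote a cell only when it contains a comma, double quote, or line break;
// inside a quoted cell, each double quote is doubled.
function escapeCsvCell(value: unknown): string {
  const text = value === null || value === undefined ? '' : String(value);
  if (/[",\r\n]/.test(text)) {
    return `"${text.replace(/"/g, '""')}"`;
  }
  return text;
}

// Join the escaped cells of one record with commas.
function toCsvLine(cells: unknown[]): string {
  return cells.map(escapeCsvCell).join(',');
}
```

RFC 4180 specifies CRLF between records, so a writer built on this sketch would join lines with `'\r\n'`.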
/**
 * Read-only adapter for the legacy NocoDB Port Nimara base.
 *
 * Used by the one-shot migration script (`scripts/migrate-from-nocodb.ts`)
 * to pull every Interest, Residential Interest, and Website Submission
 * row from the source-of-truth NocoDB tables. No mutations.
 *
 * Auth: `xc-token` header per NocoDB v2 API.
 *
 * The shape returned is a verbatim record of each row's fields; the caller
 * is responsible for mapping to the new schema via `migration-transform.ts`.
 */
import { z } from 'zod';
// ─── Configuration ──────────────────────────────────────────────────────────
const ConfigSchema = z.object({
  url: z.string().url(),
  token: z.string().min(1),
});

export interface NocoDbConfig {
  url: string;
  token: string;
}

export function loadNocoDbConfig(env: NodeJS.ProcessEnv = process.env): NocoDbConfig {
  return ConfigSchema.parse({
    url: env.NOCODB_URL,
    token: env.NOCODB_TOKEN,
  });
}
// ─── Table identifiers ──────────────────────────────────────────────────────
//
// These IDs are stable per the NocoDB base — they were captured during the
// 2026-05-03 audit and won't change unless the base is rebuilt. If the
// base is reset, regenerate them from `getTablesList`.
export const NOCO_TABLES = {
  interests: 'mbs9hjauug4eseo',
  residentialInterests: 'mscfpwwwjuds4nt',
  websiteInterestSubmissions: 'mevkpcih67c6jsm',
  websiteContactFormSubmissions: 'mxk5cd0pmwnwlcl',
  websiteBerthEoiSupplements: 'mglmioo0ku8zgqj',
  berths: 'mczgos9hr3oa9qc',
} as const;
// ─── HTTP shape ─────────────────────────────────────────────────────────────
interface NocoDbListResponse<T> {
  list: T[];
  pageInfo: {
    totalRows: number;
    page: number;
    pageSize: number;
    isFirstPage: boolean;
    isLastPage: boolean;
  };
}
/** A row's `Id` is always present. The rest of the fields vary per table. */
export type NocoDbRow = Record<string, unknown> & { Id: number };
// ─── Public API ─────────────────────────────────────────────────────────────
/**
 * Fetch all rows from a NocoDB table. Auto-paginates until the API
 * reports `isLastPage`. The legacy base is small (252 Interests rows
 * being the largest table), so we keep this simple: no streaming.
 */
export async function fetchAllRows(
  tableId: string,
  config: NocoDbConfig,
  pageSize = 250,
): Promise<NocoDbRow[]> {
  const all: NocoDbRow[] = [];
  let page = 1;
  // Hard cap to prevent infinite-loop bugs if pageInfo lies. Each page
  // pulls up to `pageSize` rows, so 200 pages * 250 = 50k rows is the
  // maximum we'll ever fetch from one table.
  const MAX_PAGES = 200;
  while (page <= MAX_PAGES) {
    const url = new URL(`${config.url}/api/v2/tables/${tableId}/records`);
    url.searchParams.set('limit', String(pageSize));
    url.searchParams.set('offset', String((page - 1) * pageSize));
    const res = await fetch(url, {
      headers: {
        'xc-token': config.token,
        accept: 'application/json',
      },
    });
    if (!res.ok) {
      throw new Error(
        `NocoDB fetch failed: ${res.status} ${res.statusText} — table ${tableId} page ${page}`,
      );
    }
    const json = (await res.json()) as NocoDbListResponse<NocoDbRow>;
    all.push(...json.list);
    if (json.pageInfo.isLastPage || json.list.length === 0) break;
    page += 1;
  }
  return all;
}
/**
 * Convenience snapshot: pulls every table the migration cares about
 * in parallel. The returned shape is the input the transform layer expects.
 */
export interface NocoDbSnapshot {
  interests: NocoDbRow[];
  residentialInterests: NocoDbRow[];
  websiteInterestSubmissions: NocoDbRow[];
  websiteContactFormSubmissions: NocoDbRow[];
  websiteBerthEoiSupplements: NocoDbRow[];
  berths: NocoDbRow[];
  fetchedAt: string;
}

export async function fetchSnapshot(config: NocoDbConfig): Promise<NocoDbSnapshot> {
  const [
    interests,
    residentialInterests,
    websiteInterestSubmissions,
    websiteContactFormSubmissions,
    websiteBerthEoiSupplements,
    berths,
  ] = await Promise.all([
    fetchAllRows(NOCO_TABLES.interests, config),
    fetchAllRows(NOCO_TABLES.residentialInterests, config),
    fetchAllRows(NOCO_TABLES.websiteInterestSubmissions, config),
    fetchAllRows(NOCO_TABLES.websiteContactFormSubmissions, config),
    fetchAllRows(NOCO_TABLES.websiteBerthEoiSupplements, config),
    fetchAllRows(NOCO_TABLES.berths, config),
  ]);
  return {
    interests,
    residentialInterests,
    websiteInterestSubmissions,
    websiteContactFormSubmissions,
    websiteBerthEoiSupplements,
    berths,
    fetchedAt: new Date().toISOString(),
  };
}
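
The offset-based pagination in `fetchAllRows` can be exercised without a live NocoDB instance by injecting the page fetcher. The sketch below is a standalone illustration, not part of the adapter: `collectAllPages`, `PageFetcher`, and `makeFakeTable` are hypothetical names. It differs from the file above in one respect: it throws when the page cap is reached, whereas `fetchAllRows` returns whatever it has collected so far.

```typescript
interface Page<T> {
  list: T[];
  pageInfo: { isLastPage: boolean };
}

// Hypothetical seam: anything that returns one page for an offset and limit.
type PageFetcher<T> = (offset: number, limit: number) => Promise<Page<T>>;

async function collectAllPages<T>(
  fetchPage: PageFetcher<T>,
  pageSize = 250,
  maxPages = 200,
): Promise<T[]> {
  const all: T[] = [];
  for (let page = 1; page <= maxPages; page += 1) {
    const { list, pageInfo } = await fetchPage((page - 1) * pageSize, pageSize);
    all.push(...list);
    // Same stop conditions as fetchAllRows: last page reported, or empty page.
    if (pageInfo.isLastPage || list.length === 0) return all;
  }
  // Design choice in this sketch only: fail loudly instead of truncating.
  throw new Error(`Pagination did not finish within ${maxPages} pages`);
}

// In-memory stand-in for a NocoDB table with `totalRows` rows.
function makeFakeTable(totalRows: number): PageFetcher<{ Id: number }> {
  const rows = Array.from({ length: totalRows }, (_, i) => ({ Id: i + 1 }));
  return async (offset, limit) => {
    const list = rows.slice(offset, offset + limit);
    return { list, pageInfo: { isLastPage: offset + limit >= rows.length } };
  };
}
```

For a one-shot migration, failing on the cap may be preferable to a silently truncated snapshot, though at 252 rows the cap is far from being reached either way.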