feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
/**
|
|
|
|
|
* One-shot migration: legacy NocoDB Interests → new client/interest split.
|
|
|
|
|
*
|
|
|
|
|
* Usage:
|
|
|
|
|
*
|
|
|
|
|
* pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
|
|
|
|
|
* Pulls the live NocoDB base, runs the transform + dedup pipeline,
|
|
|
|
|
* writes a report to .migration/<timestamp>/. NO database writes.
|
|
|
|
|
*
|
feat(dedup): wire --apply path for NocoDB migration
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.
src/lib/dedup/migration-apply.ts (new):
Idempotent apply driver. Walks the MigrationPlan, inserting clients,
contacts, addresses, yacht stubs, and interests, threading every
insert through the migration_source_links ledger so re-runs against
the same data are safe. Per-entity transactions (not one giant
transaction) so partial-failure resumption is just "run again."
Per-entity behavior:
- clients: idempotent on (source_system, source_id, target_type=client)
across the entire dedup cluster — if any source row already maps
to a client, reuse that record.
- contacts: bulk insert, primary email + primary phone independent.
- addresses: bulk insert, port_id required (schema enforces it),
first address marked primary when multiple.
- yachts: minimal stub when the legacy interest had a yachtName,
currentOwnerType=client + currentOwnerId=migrated client. Linked
via migration_source_links target_type=yacht.
- interests: looks up berthId via mooring number, yachtId via the
stub above. Carries Documenso ID forward when present.
surnameToken from PlannedClient is dropped on insert (it's a dedup
blocking-index artifact; runtime dedup re-derives from fullName).
scripts/migrate-from-nocodb.ts:
- Removes the "not yet implemented" guard for --apply.
- Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
the env var is set, OR --unsafe-skip-redirect-check is also passed
(production cutover only). Refers to docs/operations/outbound-comms-safety.md.
- Re-fetches NocoDB at apply time (rather than reading a saved report
dir) so the data is always fresh. Re-running is safe via the
idempotency ledger.
- Resolves target port via --port-slug (or first port if omitted).
- Generates a UUID applyId tagged on every link, which pairs with a
future --rollback flag.
- Apply summary prints inserted/skipped counts per entity type plus
the first 20 warnings.
Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
|
|
|
* pnpm tsx scripts/migrate-from-nocodb.ts --dry-run --port-slug port-nimara
|
feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
* Same, but tags the planned writes with the named port (matters for
|
|
|
|
|
* the apply phase — every client/interest belongs to one port).
|
|
|
|
|
*
|
feat(dedup): wire --apply path for NocoDB migration
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.
src/lib/dedup/migration-apply.ts (new):
Idempotent apply driver. Walks the MigrationPlan, inserting clients,
contacts, addresses, yacht stubs, and interests, threading every
insert through the migration_source_links ledger so re-runs against
the same data are safe. Per-entity transactions (not one giant
transaction) so partial-failure resumption is just "run again."
Per-entity behavior:
- clients: idempotent on (source_system, source_id, target_type=client)
across the entire dedup cluster — if any source row already maps
to a client, reuse that record.
- contacts: bulk insert, primary email + primary phone independent.
- addresses: bulk insert, port_id required (schema enforces it),
first address marked primary when multiple.
- yachts: minimal stub when the legacy interest had a yachtName,
currentOwnerType=client + currentOwnerId=migrated client. Linked
via migration_source_links target_type=yacht.
- interests: looks up berthId via mooring number, yachtId via the
stub above. Carries Documenso ID forward when present.
surnameToken from PlannedClient is dropped on insert (it's a dedup
blocking-index artifact; runtime dedup re-derives from fullName).
scripts/migrate-from-nocodb.ts:
- Removes the "not yet implemented" guard for --apply.
- Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
the env var is set, OR --unsafe-skip-redirect-check is also passed
(production cutover only). Refers to docs/operations/outbound-comms-safety.md.
- Re-fetches NocoDB at apply time (rather than reading a saved report
dir) so the data is always fresh. Re-running is safe via the
idempotency ledger.
- Resolves target port via --port-slug (or first port if omitted).
- Generates a UUID applyId tagged on every link, which pairs with a
future --rollback flag.
- Apply summary prints inserted/skipped counts per entity type plus
the first 20 warnings.
Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
|
|
|
* pnpm tsx scripts/migrate-from-nocodb.ts --apply --port-slug port-nimara
|
|
|
|
|
* Re-fetches NocoDB, re-transforms, then writes the planned rows
|
|
|
|
|
* into the target port via the idempotent `migration_source_links`
|
|
|
|
|
* ledger. Re-runs are safe — already-imported source IDs are skipped.
|
|
|
|
|
* REQUIRES `EMAIL_REDIRECT_TO` to be set in env (safety net) unless
|
|
|
|
|
* `--unsafe-skip-redirect-check` is also passed.
|
feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
*
|
|
|
|
|
* Design reference: docs/superpowers/specs/2026-05-03-dedup-and-migration-design.md §9.
|
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
import 'dotenv/config';
|
feat(dedup): wire --apply path for NocoDB migration
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.
src/lib/dedup/migration-apply.ts (new):
Idempotent apply driver. Walks the MigrationPlan, inserting clients,
contacts, addresses, yacht stubs, and interests, threading every
insert through the migration_source_links ledger so re-runs against
the same data are safe. Per-entity transactions (not one giant
transaction) so partial-failure resumption is just "run again."
Per-entity behavior:
- clients: idempotent on (source_system, source_id, target_type=client)
across the entire dedup cluster — if any source row already maps
to a client, reuse that record.
- contacts: bulk insert, primary email + primary phone independent.
- addresses: bulk insert, port_id required (schema enforces it),
first address marked primary when multiple.
- yachts: minimal stub when the legacy interest had a yachtName,
currentOwnerType=client + currentOwnerId=migrated client. Linked
via migration_source_links target_type=yacht.
- interests: looks up berthId via mooring number, yachtId via the
stub above. Carries Documenso ID forward when present.
surnameToken from PlannedClient is dropped on insert (it's a dedup
blocking-index artifact; runtime dedup re-derives from fullName).
scripts/migrate-from-nocodb.ts:
- Removes the "not yet implemented" guard for --apply.
- Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
the env var is set, OR --unsafe-skip-redirect-check is also passed
(production cutover only). Refers to docs/operations/outbound-comms-safety.md.
- Re-fetches NocoDB at apply time (rather than reading a saved report
dir) so the data is always fresh. Re-running is safe via the
idempotency ledger.
- Resolves target port via --port-slug (or first port if omitted).
- Generates a UUID applyId tagged on every link, which pairs with a
future --rollback flag.
- Apply summary prints inserted/skipped counts per entity type plus
the first 20 warnings.
Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
|
|
|
import { randomUUID } from 'node:crypto';
|
feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
import path from 'node:path';
|
|
|
|
|
import { fileURLToPath } from 'node:url';
|
|
|
|
|
|
feat(dedup): wire --apply path for NocoDB migration
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.
src/lib/dedup/migration-apply.ts (new):
Idempotent apply driver. Walks the MigrationPlan, inserting clients,
contacts, addresses, yacht stubs, and interests, threading every
insert through the migration_source_links ledger so re-runs against
the same data are safe. Per-entity transactions (not one giant
transaction) so partial-failure resumption is just "run again."
Per-entity behavior:
- clients: idempotent on (source_system, source_id, target_type=client)
across the entire dedup cluster — if any source row already maps
to a client, reuse that record.
- contacts: bulk insert, primary email + primary phone independent.
- addresses: bulk insert, port_id required (schema enforces it),
first address marked primary when multiple.
- yachts: minimal stub when the legacy interest had a yachtName,
currentOwnerType=client + currentOwnerId=migrated client. Linked
via migration_source_links target_type=yacht.
- interests: looks up berthId via mooring number, yachtId via the
stub above. Carries Documenso ID forward when present.
surnameToken from PlannedClient is dropped on insert (it's a dedup
blocking-index artifact; runtime dedup re-derives from fullName).
scripts/migrate-from-nocodb.ts:
- Removes the "not yet implemented" guard for --apply.
- Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
the env var is set, OR --unsafe-skip-redirect-check is also passed
(production cutover only). Refers to docs/operations/outbound-comms-safety.md.
- Re-fetches NocoDB at apply time (rather than reading a saved report
dir) so the data is always fresh. Re-running is safe via the
idempotency ledger.
- Resolves target port via --port-slug (or first port if omitted).
- Generates a UUID applyId tagged on every link, which pairs with a
future --rollback flag.
- Apply summary prints inserted/skipped counts per entity type plus
the first 20 warnings.
Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
|
|
|
import { eq } from 'drizzle-orm';
|
|
|
|
|
|
|
|
|
|
import { db } from '@/lib/db';
|
|
|
|
|
import { ports } from '@/lib/db/schema/ports';
|
|
|
|
|
import { applyPlan } from '@/lib/dedup/migration-apply';
|
feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
import { fetchSnapshot, loadNocoDbConfig } from '@/lib/dedup/nocodb-source';
|
|
|
|
|
import { transformSnapshot } from '@/lib/dedup/migration-transform';
|
|
|
|
|
import { resolveReportPaths, writeReport } from '@/lib/dedup/migration-report';
|
|
|
|
|
|
|
|
|
|
interface CliArgs {
|
|
|
|
|
dryRun: boolean;
|
|
|
|
|
apply: boolean;
|
|
|
|
|
portSlug: string | null;
|
|
|
|
|
reportDir: string | null;
|
feat(dedup): wire --apply path for NocoDB migration
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.
src/lib/dedup/migration-apply.ts (new):
Idempotent apply driver. Walks the MigrationPlan, inserting clients,
contacts, addresses, yacht stubs, and interests, threading every
insert through the migration_source_links ledger so re-runs against
the same data are safe. Per-entity transactions (not one giant
transaction) so partial-failure resumption is just "run again."
Per-entity behavior:
- clients: idempotent on (source_system, source_id, target_type=client)
across the entire dedup cluster — if any source row already maps
to a client, reuse that record.
- contacts: bulk insert, primary email + primary phone independent.
- addresses: bulk insert, port_id required (schema enforces it),
first address marked primary when multiple.
- yachts: minimal stub when the legacy interest had a yachtName,
currentOwnerType=client + currentOwnerId=migrated client. Linked
via migration_source_links target_type=yacht.
- interests: looks up berthId via mooring number, yachtId via the
stub above. Carries Documenso ID forward when present.
surnameToken from PlannedClient is dropped on insert (it's a dedup
blocking-index artifact; runtime dedup re-derives from fullName).
scripts/migrate-from-nocodb.ts:
- Removes the "not yet implemented" guard for --apply.
- Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
the env var is set, OR --unsafe-skip-redirect-check is also passed
(production cutover only). Refers to docs/operations/outbound-comms-safety.md.
- Re-fetches NocoDB at apply time (rather than reading a saved report
dir) so the data is always fresh. Re-running is safe via the
idempotency ledger.
- Resolves target port via --port-slug (or first port if omitted).
- Generates a UUID applyId tagged on every link, which pairs with a
future --rollback flag.
- Apply summary prints inserted/skipped counts per entity type plus
the first 20 warnings.
Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
|
|
|
unsafeSkipRedirectCheck: boolean;
|
feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
function parseArgs(argv: string[]): CliArgs {
|
|
|
|
|
const args: CliArgs = {
|
|
|
|
|
dryRun: false,
|
|
|
|
|
apply: false,
|
|
|
|
|
portSlug: null,
|
|
|
|
|
reportDir: null,
|
feat(dedup): wire --apply path for NocoDB migration
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.
src/lib/dedup/migration-apply.ts (new):
Idempotent apply driver. Walks the MigrationPlan, inserting clients,
contacts, addresses, yacht stubs, and interests, threading every
insert through the migration_source_links ledger so re-runs against
the same data are safe. Per-entity transactions (not one giant
transaction) so partial-failure resumption is just "run again."
Per-entity behavior:
- clients: idempotent on (source_system, source_id, target_type=client)
across the entire dedup cluster — if any source row already maps
to a client, reuse that record.
- contacts: bulk insert, primary email + primary phone independent.
- addresses: bulk insert, port_id required (schema enforces it),
first address marked primary when multiple.
- yachts: minimal stub when the legacy interest had a yachtName,
currentOwnerType=client + currentOwnerId=migrated client. Linked
via migration_source_links target_type=yacht.
- interests: looks up berthId via mooring number, yachtId via the
stub above. Carries Documenso ID forward when present.
surnameToken from PlannedClient is dropped on insert (it's a dedup
blocking-index artifact; runtime dedup re-derives from fullName).
scripts/migrate-from-nocodb.ts:
- Removes the "not yet implemented" guard for --apply.
- Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
the env var is set, OR --unsafe-skip-redirect-check is also passed
(production cutover only). Refers to docs/operations/outbound-comms-safety.md.
- Re-fetches NocoDB at apply time (rather than reading a saved report
dir) so the data is always fresh. Re-running is safe via the
idempotency ledger.
- Resolves target port via --port-slug (or first port if omitted).
- Generates a UUID applyId tagged on every link, which pairs with a
future --rollback flag.
- Apply summary prints inserted/skipped counts per entity type plus
the first 20 warnings.
Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
|
|
|
unsafeSkipRedirectCheck: false,
|
feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
};
|
|
|
|
|
for (let i = 0; i < argv.length; i += 1) {
|
|
|
|
|
const a = argv[i]!;
|
|
|
|
|
if (a === '--dry-run') args.dryRun = true;
|
|
|
|
|
else if (a === '--apply') args.apply = true;
|
|
|
|
|
else if (a === '--port-slug') args.portSlug = argv[++i] ?? null;
|
|
|
|
|
else if (a === '--report') args.reportDir = argv[++i] ?? null;
|
feat(dedup): wire --apply path for NocoDB migration
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.
src/lib/dedup/migration-apply.ts (new):
Idempotent apply driver. Walks the MigrationPlan, inserting clients,
contacts, addresses, yacht stubs, and interests, threading every
insert through the migration_source_links ledger so re-runs against
the same data are safe. Per-entity transactions (not one giant
transaction) so partial-failure resumption is just "run again."
Per-entity behavior:
- clients: idempotent on (source_system, source_id, target_type=client)
across the entire dedup cluster — if any source row already maps
to a client, reuse that record.
- contacts: bulk insert, primary email + primary phone independent.
- addresses: bulk insert, port_id required (schema enforces it),
first address marked primary when multiple.
- yachts: minimal stub when the legacy interest had a yachtName,
currentOwnerType=client + currentOwnerId=migrated client. Linked
via migration_source_links target_type=yacht.
- interests: looks up berthId via mooring number, yachtId via the
stub above. Carries Documenso ID forward when present.
surnameToken from PlannedClient is dropped on insert (it's a dedup
blocking-index artifact; runtime dedup re-derives from fullName).
scripts/migrate-from-nocodb.ts:
- Removes the "not yet implemented" guard for --apply.
- Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
the env var is set, OR --unsafe-skip-redirect-check is also passed
(production cutover only). Refers to docs/operations/outbound-comms-safety.md.
- Re-fetches NocoDB at apply time (rather than reading a saved report
dir) so the data is always fresh. Re-running is safe via the
idempotency ledger.
- Resolves target port via --port-slug (or first port if omitted).
- Generates a UUID applyId tagged on every link, which pairs with a
future --rollback flag.
- Apply summary prints inserted/skipped counts per entity type plus
the first 20 warnings.
Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
|
|
|
else if (a === '--unsafe-skip-redirect-check') args.unsafeSkipRedirectCheck = true;
|
feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
else if (a === '-h' || a === '--help') {
|
|
|
|
|
printHelp();
|
|
|
|
|
process.exit(0);
|
|
|
|
|
} else {
|
|
|
|
|
console.error(`Unknown argument: ${a}`);
|
|
|
|
|
printHelp();
|
|
|
|
|
process.exit(1);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return args;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
function printHelp(): void {
|
|
|
|
|
console.log(`Usage:
|
|
|
|
|
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run [--port-slug <slug>]
|
|
|
|
|
Pulls NocoDB → transforms → writes report to .migration/<timestamp>/.
|
|
|
|
|
No database writes.
|
|
|
|
|
|
feat(dedup): wire --apply path for NocoDB migration
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.
src/lib/dedup/migration-apply.ts (new):
Idempotent apply driver. Walks the MigrationPlan, inserting clients,
contacts, addresses, yacht stubs, and interests, threading every
insert through the migration_source_links ledger so re-runs against
the same data are safe. Per-entity transactions (not one giant
transaction) so partial-failure resumption is just "run again."
Per-entity behavior:
- clients: idempotent on (source_system, source_id, target_type=client)
across the entire dedup cluster — if any source row already maps
to a client, reuse that record.
- contacts: bulk insert, primary email + primary phone independent.
- addresses: bulk insert, port_id required (schema enforces it),
first address marked primary when multiple.
- yachts: minimal stub when the legacy interest had a yachtName,
currentOwnerType=client + currentOwnerId=migrated client. Linked
via migration_source_links target_type=yacht.
- interests: looks up berthId via mooring number, yachtId via the
stub above. Carries Documenso ID forward when present.
surnameToken from PlannedClient is dropped on insert (it's a dedup
blocking-index artifact; runtime dedup re-derives from fullName).
scripts/migrate-from-nocodb.ts:
- Removes the "not yet implemented" guard for --apply.
- Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
the env var is set, OR --unsafe-skip-redirect-check is also passed
(production cutover only). Refers to docs/operations/outbound-comms-safety.md.
- Re-fetches NocoDB at apply time (rather than reading a saved report
dir) so the data is always fresh. Re-running is safe via the
idempotency ledger.
- Resolves target port via --port-slug (or first port if omitted).
- Generates a UUID applyId tagged on every link, which pairs with a
future --rollback flag.
- Apply summary prints inserted/skipped counts per entity type plus
the first 20 warnings.
Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
|
|
|
pnpm tsx scripts/migrate-from-nocodb.ts --apply --port-slug <slug>
|
|
|
|
|
Re-fetches NocoDB, re-transforms, writes via migration_source_links
|
|
|
|
|
ledger. Idempotent — safe to re-run. Requires EMAIL_REDIRECT_TO set
|
|
|
|
|
(unless --unsafe-skip-redirect-check is also passed).
|
feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
|
|
|
|
|
Flags:
|
feat(dedup): wire --apply path for NocoDB migration
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.
src/lib/dedup/migration-apply.ts (new):
Idempotent apply driver. Walks the MigrationPlan, inserting clients,
contacts, addresses, yacht stubs, and interests, threading every
insert through the migration_source_links ledger so re-runs against
the same data are safe. Per-entity transactions (not one giant
transaction) so partial-failure resumption is just "run again."
Per-entity behavior:
- clients: idempotent on (source_system, source_id, target_type=client)
across the entire dedup cluster — if any source row already maps
to a client, reuse that record.
- contacts: bulk insert, primary email + primary phone independent.
- addresses: bulk insert, port_id required (schema enforces it),
first address marked primary when multiple.
- yachts: minimal stub when the legacy interest had a yachtName,
currentOwnerType=client + currentOwnerId=migrated client. Linked
via migration_source_links target_type=yacht.
- interests: looks up berthId via mooring number, yachtId via the
stub above. Carries Documenso ID forward when present.
surnameToken from PlannedClient is dropped on insert (it's a dedup
blocking-index artifact; runtime dedup re-derives from fullName).
scripts/migrate-from-nocodb.ts:
- Removes the "not yet implemented" guard for --apply.
- Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
the env var is set, OR --unsafe-skip-redirect-check is also passed
(production cutover only). Refers to docs/operations/outbound-comms-safety.md.
- Re-fetches NocoDB at apply time (rather than reading a saved report
dir) so the data is always fresh. Re-running is safe via the
idempotency ledger.
- Resolves target port via --port-slug (or first port if omitted).
- Generates a UUID applyId tagged on every link, which pairs with a
future --rollback flag.
- Apply summary prints inserted/skipped counts per entity type plus
the first 20 warnings.
Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
|
|
|
--dry-run Read NocoDB, write report only.
|
|
|
|
|
--apply Actually write rows to the DB.
|
|
|
|
|
--port-slug <slug> Port slug to attach to all imported
|
|
|
|
|
entities. Defaults to the first
|
|
|
|
|
available port if omitted.
|
|
|
|
|
--report <dir> Path to a previously-generated report
|
|
|
|
|
dir (only used by --apply).
|
|
|
|
|
--unsafe-skip-redirect-check Skip the EMAIL_REDIRECT_TO precondition
|
|
|
|
|
check. Only use in production cutover.
|
|
|
|
|
-h, --help Show this help.
|
feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
`);
|
|
|
|
|
}
|
|
|
|
|
|
feat(dedup): wire --apply path for NocoDB migration
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.
src/lib/dedup/migration-apply.ts (new):
Idempotent apply driver. Walks the MigrationPlan, inserting clients,
contacts, addresses, yacht stubs, and interests, threading every
insert through the migration_source_links ledger so re-runs against
the same data are safe. Per-entity transactions (not one giant
transaction) so partial-failure resumption is just "run again."
Per-entity behavior:
- clients: idempotent on (source_system, source_id, target_type=client)
across the entire dedup cluster — if any source row already maps
to a client, reuse that record.
- contacts: bulk insert, primary email + primary phone independent.
- addresses: bulk insert, port_id required (schema enforces it),
first address marked primary when multiple.
- yachts: minimal stub when the legacy interest had a yachtName,
currentOwnerType=client + currentOwnerId=migrated client. Linked
via migration_source_links target_type=yacht.
- interests: looks up berthId via mooring number, yachtId via the
stub above. Carries Documenso ID forward when present.
surnameToken from PlannedClient is dropped on insert (it's a dedup
blocking-index artifact; runtime dedup re-derives from fullName).
scripts/migrate-from-nocodb.ts:
- Removes the "not yet implemented" guard for --apply.
- Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
the env var is set, OR --unsafe-skip-redirect-check is also passed
(production cutover only). Refers to docs/operations/outbound-comms-safety.md.
- Re-fetches NocoDB at apply time (rather than reading a saved report
dir) so the data is always fresh. Re-running is safe via the
idempotency ledger.
- Resolves target port via --port-slug (or first port if omitted).
- Generates a UUID applyId tagged on every link, which pairs with a
future --rollback flag.
- Apply summary prints inserted/skipped counts per entity type plus
the first 20 warnings.
Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
|
|
|
/**
|
|
|
|
|
* Resolve the target port: use the slug if provided, otherwise the first
|
|
|
|
|
* port found. Errors out cleanly if the slug doesn't match any port.
|
|
|
|
|
*/
|
|
|
|
|
async function resolvePort(slug: string | null): Promise<{ id: string; slug: string }> {
|
|
|
|
|
if (slug) {
|
|
|
|
|
const [p] = await db
|
|
|
|
|
.select({ id: ports.id, slug: ports.slug })
|
|
|
|
|
.from(ports)
|
|
|
|
|
.where(eq(ports.slug, slug))
|
|
|
|
|
.limit(1);
|
|
|
|
|
if (!p) {
|
|
|
|
|
console.error(`No port found with slug "${slug}".`);
|
|
|
|
|
process.exit(1);
|
|
|
|
|
}
|
|
|
|
|
return { id: p.id, slug: p.slug };
|
|
|
|
|
}
|
|
|
|
|
const [first] = await db.select({ id: ports.id, slug: ports.slug }).from(ports).limit(1);
|
|
|
|
|
if (!first) {
|
|
|
|
|
console.error('No ports exist in the target DB. Seed at least one port before applying.');
|
|
|
|
|
process.exit(1);
|
|
|
|
|
}
|
|
|
|
|
return { id: first.id, slug: first.slug };
|
|
|
|
|
}
|
|
|
|
|
|
feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
async function main(): Promise<void> {
|
|
|
|
|
const args = parseArgs(process.argv.slice(2));
|
|
|
|
|
|
|
|
|
|
if (!args.dryRun && !args.apply) {
|
|
|
|
|
console.error('Must specify --dry-run or --apply');
|
|
|
|
|
printHelp();
|
|
|
|
|
process.exit(1);
|
|
|
|
|
}
|
|
|
|
|
|
feat(dedup): wire --apply path for NocoDB migration
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.
src/lib/dedup/migration-apply.ts (new):
Idempotent apply driver. Walks the MigrationPlan, inserting clients,
contacts, addresses, yacht stubs, and interests, threading every
insert through the migration_source_links ledger so re-runs against
the same data are safe. Per-entity transactions (not one giant
transaction) so partial-failure resumption is just "run again."
Per-entity behavior:
- clients: idempotent on (source_system, source_id, target_type=client)
across the entire dedup cluster — if any source row already maps
to a client, reuse that record.
- contacts: bulk insert, primary email + primary phone independent.
- addresses: bulk insert, port_id required (schema enforces it),
first address marked primary when multiple.
- yachts: minimal stub when the legacy interest had a yachtName,
currentOwnerType=client + currentOwnerId=migrated client. Linked
via migration_source_links target_type=yacht.
- interests: looks up berthId via mooring number, yachtId via the
stub above. Carries Documenso ID forward when present.
surnameToken from PlannedClient is dropped on insert (it's a dedup
blocking-index artifact; runtime dedup re-derives from fullName).
scripts/migrate-from-nocodb.ts:
- Removes the "not yet implemented" guard for --apply.
- Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
the env var is set, OR --unsafe-skip-redirect-check is also passed
(production cutover only). Refers to docs/operations/outbound-comms-safety.md.
- Re-fetches NocoDB at apply time (rather than reading a saved report
dir) so the data is always fresh. Re-running is safe via the
idempotency ledger.
- Resolves target port via --port-slug (or first port if omitted).
- Generates a UUID applyId tagged on every link, which pairs with a
future --rollback flag.
- Apply summary prints inserted/skipped counts per entity type plus
the first 20 warnings.
Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
|
|
|
// Safety gate: --apply must run with EMAIL_REDIRECT_TO set, unless the
|
|
|
|
|
// operator explicitly opts out (production cutover).
|
|
|
|
|
if (args.apply && !process.env.EMAIL_REDIRECT_TO && !args.unsafeSkipRedirectCheck) {
|
|
|
|
|
console.error(
|
|
|
|
|
'--apply requires EMAIL_REDIRECT_TO to be set in the environment as a safety net.',
|
|
|
|
|
);
|
|
|
|
|
console.error('See docs/operations/outbound-comms-safety.md for the rationale.');
|
|
|
|
|
console.error(
|
|
|
|
|
'If you are running the production cutover and have read that doc, add ' +
|
|
|
|
|
'--unsafe-skip-redirect-check to override.',
|
|
|
|
|
);
|
feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
process.exit(2);
|
|
|
|
|
}
|
|
|
|
|
|
feat(dedup): wire --apply path for NocoDB migration
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.
src/lib/dedup/migration-apply.ts (new):
Idempotent apply driver. Walks the MigrationPlan, inserting clients,
contacts, addresses, yacht stubs, and interests, threading every
insert through the migration_source_links ledger so re-runs against
the same data are safe. Per-entity transactions (not one giant
transaction) so partial-failure resumption is just "run again."
Per-entity behavior:
- clients: idempotent on (source_system, source_id, target_type=client)
across the entire dedup cluster — if any source row already maps
to a client, reuse that record.
- contacts: bulk insert, primary email + primary phone independent.
- addresses: bulk insert, port_id required (schema enforces it),
first address marked primary when multiple.
- yachts: minimal stub when the legacy interest had a yachtName,
currentOwnerType=client + currentOwnerId=migrated client. Linked
via migration_source_links target_type=yacht.
- interests: looks up berthId via mooring number, yachtId via the
stub above. Carries Documenso ID forward when present.
surnameToken from PlannedClient is dropped on insert (it's a dedup
blocking-index artifact; runtime dedup re-derives from fullName).
scripts/migrate-from-nocodb.ts:
- Removes the "not yet implemented" guard for --apply.
- Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
the env var is set, OR --unsafe-skip-redirect-check is also passed
(production cutover only). Refers to docs/operations/outbound-comms-safety.md.
- Re-fetches NocoDB at apply time (rather than reading a saved report
dir) so the data is always fresh. Re-running is safe via the
idempotency ledger.
- Resolves target port via --port-slug (or first port if omitted).
- Generates a UUID applyId tagged on every link, which pairs with a
future --rollback flag.
- Apply summary prints inserted/skipped counts per entity type plus
the first 20 warnings.
Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
|
|
|
// ── Fetch + transform (shared by dry-run and apply) ──────────────────────
|
feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
|
|
|
|
|
console.log('[migrate] Loading NocoDB config…');
|
|
|
|
|
const config = loadNocoDbConfig();
|
|
|
|
|
console.log(`[migrate] Source: ${config.url}`);
|
|
|
|
|
|
|
|
|
|
console.log('[migrate] Fetching snapshot from NocoDB…');
|
|
|
|
|
const start = Date.now();
|
|
|
|
|
const snapshot = await fetchSnapshot(config);
|
|
|
|
|
const elapsed = ((Date.now() - start) / 1000).toFixed(1);
|
|
|
|
|
console.log(
|
|
|
|
|
`[migrate] Snapshot fetched in ${elapsed}s — ${snapshot.interests.length} interests, ${snapshot.residentialInterests.length} residential, ${snapshot.berths.length} berths.`,
|
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
console.log('[migrate] Running transform + dedup pipeline…');
|
|
|
|
|
const plan = transformSnapshot(snapshot);
|
|
|
|
|
|
feat(dedup): wire --apply path for NocoDB migration
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.
src/lib/dedup/migration-apply.ts (new):
Idempotent apply driver. Walks the MigrationPlan, inserting clients,
contacts, addresses, yacht stubs, and interests, threading every
insert through the migration_source_links ledger so re-runs against
the same data are safe. Per-entity transactions (not one giant
transaction) so partial-failure resumption is just "run again."
Per-entity behavior:
- clients: idempotent on (source_system, source_id, target_type=client)
across the entire dedup cluster — if any source row already maps
to a client, reuse that record.
- contacts: bulk insert, primary email + primary phone independent.
- addresses: bulk insert, port_id required (schema enforces it),
first address marked primary when multiple.
- yachts: minimal stub when the legacy interest had a yachtName,
currentOwnerType=client + currentOwnerId=migrated client. Linked
via migration_source_links target_type=yacht.
- interests: looks up berthId via mooring number, yachtId via the
stub above. Carries Documenso ID forward when present.
surnameToken from PlannedClient is dropped on insert (it's a dedup
blocking-index artifact; runtime dedup re-derives from fullName).
scripts/migrate-from-nocodb.ts:
- Removes the "not yet implemented" guard for --apply.
- Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
the env var is set, OR --unsafe-skip-redirect-check is also passed
(production cutover only). Refers to docs/operations/outbound-comms-safety.md.
- Re-fetches NocoDB at apply time (rather than reading a saved report
dir) so the data is always fresh. Re-running is safe via the
idempotency ledger.
- Resolves target port via --port-slug (or first port if omitted).
- Generates a UUID applyId tagged on every link, which pairs with a
future --rollback flag.
- Apply summary prints inserted/skipped counts per entity type plus
the first 20 warnings.
Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
|
|
|
// Resolve output paths relative to the worktree root.
|
feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
const scriptDir = path.dirname(fileURLToPath(import.meta.url));
|
|
|
|
|
const repoRoot = path.resolve(scriptDir, '..');
|
|
|
|
|
const generatedAt = new Date().toISOString();
|
|
|
|
|
const paths = resolveReportPaths(repoRoot);
|
|
|
|
|
|
|
|
|
|
console.log(`[migrate] Writing report to ${paths.rootDir}…`);
|
|
|
|
|
await writeReport(paths, plan, generatedAt);
|
|
|
|
|
|
feat(dedup): wire --apply path for NocoDB migration
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.
src/lib/dedup/migration-apply.ts (new):
Idempotent apply driver. Walks the MigrationPlan, inserting clients,
contacts, addresses, yacht stubs, and interests, threading every
insert through the migration_source_links ledger so re-runs against
the same data are safe. Per-entity transactions (not one giant
transaction) so partial-failure resumption is just "run again."
Per-entity behavior:
- clients: idempotent on (source_system, source_id, target_type=client)
across the entire dedup cluster — if any source row already maps
to a client, reuse that record.
- contacts: bulk insert, primary email + primary phone independent.
- addresses: bulk insert, port_id required (schema enforces it),
first address marked primary when multiple.
- yachts: minimal stub when the legacy interest had a yachtName,
currentOwnerType=client + currentOwnerId=migrated client. Linked
via migration_source_links target_type=yacht.
- interests: looks up berthId via mooring number, yachtId via the
stub above. Carries Documenso ID forward when present.
surnameToken from PlannedClient is dropped on insert (it's a dedup
blocking-index artifact; runtime dedup re-derives from fullName).
scripts/migrate-from-nocodb.ts:
- Removes the "not yet implemented" guard for --apply.
- Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
the env var is set, OR --unsafe-skip-redirect-check is also passed
(production cutover only). Refers to docs/operations/outbound-comms-safety.md.
- Re-fetches NocoDB at apply time (rather than reading a saved report
dir) so the data is always fresh. Re-running is safe via the
idempotency ledger.
- Resolves target port via --port-slug (or first port if omitted).
- Generates a UUID applyId tagged on every link, which pairs with a
future --rollback flag.
- Apply summary prints inserted/skipped counts per entity type plus
the first 20 warnings.
Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
|
|
|
// ── Plan summary ─────────────────────────────────────────────────────────
|
feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
const s = plan.stats;
|
|
|
|
|
console.log('');
|
|
|
|
|
console.log('=== Migration Plan Summary ===');
|
|
|
|
|
console.log(
|
|
|
|
|
` Input: ${s.inputInterestRows} interests, ${s.inputResidentialRows} residential interests`,
|
|
|
|
|
);
|
|
|
|
|
console.log(` Output: ${s.outputClients} clients, ${s.outputInterests} interests`);
|
|
|
|
|
console.log(` ${s.outputContacts} contacts, ${s.outputAddresses} addresses`);
|
|
|
|
|
console.log(
|
|
|
|
|
` Dedup: ${s.autoLinkedClusters} auto-linked clusters, ${s.needsReviewPairs} pairs flagged for review`,
|
|
|
|
|
);
|
|
|
|
|
console.log(` Quality: ${s.flaggedRows} rows flagged (see report.csv)`);
|
|
|
|
|
console.log('');
|
|
|
|
|
console.log(` Full report: ${paths.summaryPath}`);
|
feat(dedup): wire --apply path for NocoDB migration
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.
src/lib/dedup/migration-apply.ts (new):
Idempotent apply driver. Walks the MigrationPlan, inserting clients,
contacts, addresses, yacht stubs, and interests, threading every
insert through the migration_source_links ledger so re-runs against
the same data are safe. Per-entity transactions (not one giant
transaction) so partial-failure resumption is just "run again."
Per-entity behavior:
- clients: idempotent on (source_system, source_id, target_type=client)
across the entire dedup cluster — if any source row already maps
to a client, reuse that record.
- contacts: bulk insert, primary email + primary phone independent.
- addresses: bulk insert, port_id required (schema enforces it),
first address marked primary when multiple.
- yachts: minimal stub when the legacy interest had a yachtName,
currentOwnerType=client + currentOwnerId=migrated client. Linked
via migration_source_links target_type=yacht.
- interests: looks up berthId via mooring number, yachtId via the
stub above. Carries Documenso ID forward when present.
surnameToken from PlannedClient is dropped on insert (it's a dedup
blocking-index artifact; runtime dedup re-derives from fullName).
scripts/migrate-from-nocodb.ts:
- Removes the "not yet implemented" guard for --apply.
- Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
the env var is set, OR --unsafe-skip-redirect-check is also passed
(production cutover only). Refers to docs/operations/outbound-comms-safety.md.
- Re-fetches NocoDB at apply time (rather than reading a saved report
dir) so the data is always fresh. Re-running is safe via the
idempotency ledger.
- Resolves target port via --port-slug (or first port if omitted).
- Generates a UUID applyId tagged on every link, which pairs with a
future --rollback flag.
- Apply summary prints inserted/skipped counts per entity type plus
the first 20 warnings.
Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
|
|
|
|
|
|
|
|
if (args.dryRun) {
|
|
|
|
|
console.log('');
|
|
|
|
|
console.log('Dry-run complete. Re-run with --apply to write rows.');
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// ── Apply path ───────────────────────────────────────────────────────────
|
|
|
|
|
|
|
|
|
|
const port = await resolvePort(args.portSlug);
|
|
|
|
|
const applyId = randomUUID();
|
|
|
|
|
|
|
|
|
|
console.log('');
|
|
|
|
|
console.log(`[migrate] Applying to port "${port.slug}" (id=${port.id})`);
|
|
|
|
|
console.log(`[migrate] Apply id: ${applyId}`);
|
|
|
|
|
console.log('[migrate] Inserting…');
|
|
|
|
|
|
|
|
|
|
const applyStart = Date.now();
|
|
|
|
|
const result = await applyPlan(plan, { port, applyId });
|
|
|
|
|
const applyElapsed = ((Date.now() - applyStart) / 1000).toFixed(1);
|
|
|
|
|
|
|
|
|
|
console.log('');
|
|
|
|
|
console.log('=== Apply Result ===');
|
|
|
|
|
console.log(` Time: ${applyElapsed}s`);
|
|
|
|
|
console.log(
|
|
|
|
|
` Clients: ${result.clientsInserted} inserted, ${result.clientsSkipped} already linked`,
|
|
|
|
|
);
|
|
|
|
|
console.log(` Contacts: ${result.contactsInserted} inserted`);
|
|
|
|
|
console.log(` Addresses: ${result.addressesInserted} inserted`);
|
|
|
|
|
console.log(` Yachts: ${result.yachtsInserted} inserted`);
|
|
|
|
|
console.log(
|
|
|
|
|
` Interests: ${result.interestsInserted} inserted, ${result.interestsSkipped} already linked`,
|
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
if (result.warnings.length > 0) {
|
|
|
|
|
console.log('');
|
|
|
|
|
console.log('Warnings:');
|
|
|
|
|
for (const w of result.warnings.slice(0, 20)) {
|
|
|
|
|
console.log(` - ${w}`);
|
|
|
|
|
}
|
|
|
|
|
if (result.warnings.length > 20) {
|
|
|
|
|
console.log(` … ${result.warnings.length - 20} more`);
|
|
|
|
|
}
|
|
|
|
|
}
|
feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.
Schema additions
================
- `client_merge_candidates` — pairs flagged by the background scoring
job for the /admin/duplicates review queue. Status enum: pending /
dismissed / merged. Unique-(portId, clientAId, clientBId) so the
same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
rows (NocoDB Interest #624 → new client UUID) so re-running --apply
against the same dry-run report skips already-imported entities.
Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.
Library
=======
src/lib/dedup/nocodb-source.ts
Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
auto-paginates until isLastPage, captures the table IDs from the
2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
parallel into one in-memory object the transform layer consumes.
src/lib/dedup/migration-transform.ts
Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
- normalizes name / email / phone / country via the dedup library
- parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
- maps the 8-stage `Sales Process Level` enum to the new 9-stage
pipelineStage
- filters yacht-name placeholders ('TBC', 'Na', etc.)
- merges Internal Notes + Extra Comments + Berth Size Desired into
a single notes blob
Then runs `findClientMatches` pairwise (with blocking) and
union-finds clusters of rows whose score crosses the auto-link
threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
Each cluster's "lead" row is picked by completeness score with
recency tie-break.
src/lib/dedup/migration-report.ts
Writes three artifacts to .migration/<timestamp>/:
- report.csv — one row per planned op, RFC-4180 escaped
- summary.md — human-skimmable overview
- plan.json — full structured plan for the --apply phase
CSV cells with comma / quote / newline are quoted; internal quotes
are doubled. No external CSV dep.
src/lib/dedup/phone-parse.ts
Script-safe wrapper around libphonenumber-js's `core` entry that
loads `metadata.min.json` directly. The default `index.cjs.js`
bundled by libphonenumber hits a metadata-shape interop bug under
Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
The dedup `normalizePhone` and `find-matches` both use this wrapper
now so the same code path runs in vitest, Next.js, and the migration
CLI without surprises.
src/lib/dedup/normalize.ts
Tightened country resolution: added Caribbean short-form aliases
('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
US locations seen in the NocoDB dump (Boston, Tampa, Fort
Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
to drop the `isValid()` strict check — the libphonenumber min build
rejects many real NANP-territory numbers, and dedup only needs a
canonical E.164 to compare.
CLI
===
scripts/migrate-from-nocodb.ts
pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
→ Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
runs the transform, writes report. No DB writes.
pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
→ Stubbed; exits with `not yet implemented` and a pointer to the
design doc. Apply phase ships in a follow-up.
Tests
=====
tests/unit/dedup/migration-transform.test.ts (7 cases)
Fixture-based regression. A frozen 12-row NocoDB snapshot covers
every duplicate pattern in the design (§1.2). The test asserts:
- 12 input rows → 7 unique clients (cluster math is right)
- Patterns A / B / C / E auto-link
- Pattern F (Etiennette Clamouze) does NOT auto-link
- Every interest preserved as its own row even when clients merge
- 8-stage → 9-stage enum mapping is correct per spec
- Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
client) — the design's signature win
- Output is deterministic (run twice, identical)
Validation against real data
============================
Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
- 237 clients (15 merged into 13 clusters)
- 252 interests (one per source row)
- 406 contacts, 52 addresses
- 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
- 3 pairs flagged for review (Camazou, Zasso, one new)
- 1 phone placeholder flagged
Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
|
|
|
console.log('');
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
main().catch((err) => {
|
|
|
|
|
console.error('[migrate] Fatal error:', err);
|
|
|
|
|
process.exit(1);
|
|
|
|
|
});
|