Commit Graph

7 Commits

Author SHA1 Message Date
Matt Ciaccio
05be89ec6f feat(berths): normalize mooring numbers to canonical form
Sweep CRM mooring numbers from the legacy hyphen+padded form ("A-01")
to the canonical bare form ("A1") used by NocoDB, the public website,
the per-berth PDFs, and the Documenso EOI templates. Drift was
introduced by the original load-berths-to-port-nimara.ts seed; this
gates the Phase 3 public-website cutover where /berths/A1 URLs would
404 against a CRM still storing "A-01".

- 0024 data migration: idempotent regexp_replace + post-update sanity
  check that surfaces any non-conforming rows for manual triage.
- Invert normalizeLegacyMooring in dedup/migration-apply: it now
  canonicalizes ("D-32" -> "D32") instead of legacy-izing.
- Update tiptap-to-pdfme example tokens, EOI fixture moorings, and
  smoke-test seed moorings.
- Refresh seed-data/berths.json to canonical form; drop the now-
  redundant legacyMooringNumber field.
- Delete scripts/load-berths-to-port-nimara.ts (superseded in 0c).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 01:59:26 +02:00
Matt Ciaccio
8699f81879 chore(style): codebase em-dash sweep + minor layout polish
Some checks failed
Build & Push Docker Images / lint (push) Failing after 1m18s
Build & Push Docker Images / build-and-push (push) Has been skipped
Replaces every em-dash and en-dash with regular ASCII hyphens
across comments, JSX strings, and dev-facing logs. Mostly cosmetic
but stops the inconsistent mix that crept in over the last few
months (some files used em-dashes in comments, others didn't,
some used both).

Bundles two small dashboard-layout tweaks that touch a couple of
already-modified files:
- (dashboard)/layout.tsx main padding goes from p-6 to pt-3 px-6
  pb-6 so page content sits closer to the topbar.
- Sidebar now receives the ports list it needs for the footer
  port switcher.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:57:01 +02:00
Matt Ciaccio
d62822c284 fix(migration): NocoDB import safety + dedup helpers + lead-source backfill
migration-apply: residential client + interest inserts now wrap in
db.transaction so a partial failure can't leave an orphan client
row without its interest (or vice versa).

migration-transform: buildPlannedDocument returns null when there
are no signers so the apply pass doesn't try to send a Documenso
envelope without recipients. mapDocumentStatus gets an explicit
"Awaiting Further Details" branch that no longer auto-promotes via
stale sign-time fields. parseFlexibleDate handles ISO and DD-MM-YYYY
inputs uniformly.

backfill-legacy-lead-source: chunk UPDATE WHERE clause now
isNull(source) on top of the inArray match, so a re-run can't
overwrite a more accurate source written between batches.

Adds 235 lines of vitest coverage on migration-transform.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:56:18 +02:00
Matt Ciaccio
c612bbdfd9 fix(migration): legacy bare-mooring lookup + port-nimara berth backfill
Some checks failed
Build & Push Docker Images / lint (push) Failing after 1m12s
Build & Push Docker Images / build-and-push (push) Has been skipped
Two issues surfaced when applying the migration to dev:

1. Mooring number format mismatch
   The legacy NocoDB Interests table writes bare mooring strings
   ("D32", "B16", "A4"), but the new berths table (mirroring the
   NocoDB Berths snapshot) uses zero-padded dashed form ("D-32",
   "B-16", "A-04"). The interest→berth lookup missed every reference.

   migration-apply.ts now tries the literal value first, then falls
   back to a normalized form via `normalizeLegacyMooring(raw)`:
     "D32" -> "D-32"
     "A4"  -> "A-04"
     "E18" -> "E-18"
   Multi-mooring strings ("A3, D30") are left as-is so they surface in
   the warnings list for human review rather than silently picking one.

2. port-nimara only had the 12 hand-rolled seed berths, not the 117-
   berth NocoDB snapshot
   The mobile-foundation seed only places those 12 in port-nimara; the
   117-berth snapshot was added later but only seeded into Marina
   Azzurra (the secondary test port). Migrated interests reference
   moorings well beyond A-01..D-03, so most lookups failed.

   New scripts/load-berths-to-port-nimara.ts: idempotently loads any
   missing snapshot berths into port-nimara without disturbing the
   existing 12 (skips moorings that already exist). Run once;
   subsequent runs no-op.

Result of full migration run on dev:
  237 clients inserted (out of 245 total — 8 from prior seed)
  406 contacts, 52 addresses, 38 yachts, 252 interests
  27 interest→berth links resolved (only 13 source rows had a Berth
  field set in NocoDB to begin with — most legacy interests are early
  inquiries with no berth assignment)
  1 unresolved warning: source=277 has multi-mooring "A3, D30"

Verified in UI:
  /port-nimara/clients shows real names (John-michael Seelye, Reza
  Amjad, Etiennette Clamouze, …)
  /port-nimara/clients/<id> renders contacts (gmail.com addresses,
  E.164 phones), tab counts (Interests N, Yachts N), pipeline summary
  Dashboard: 245 clients, 266 active interests, $46.5M pipeline value
  Pipeline funnel chart now shows real distribution (180 Open, 45
  EOI Signed, dropoff through stages)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 21:05:11 +02:00
Matt Ciaccio
c45aac551d feat(dedup): wire --apply path for NocoDB migration
Some checks failed
Build & Push Docker Images / lint (push) Successful in 1m12s
Build & Push Docker Images / build-and-push (push) Failing after 3m41s
Completes the migration script's apply phase, which was stubbed at
the P3 ship to defer until after the runtime surfaces (P2) and the
comms safety net were in place. Both prerequisites just landed on
main, so this unblocks the actual data import.

src/lib/dedup/migration-apply.ts (new):
  Idempotent apply driver. Walks the MigrationPlan, inserting clients,
  contacts, addresses, yacht stubs, and interests, threading every
  insert through the migration_source_links ledger so re-runs against
  the same data are safe. Per-entity transactions (not one giant
  transaction) so partial-failure resumption is just "run again."

  Per-entity behavior:
    - clients: idempotent on (source_system, source_id, target_type=client)
      across the entire dedup cluster — if any source row already maps
      to a client, reuse that record.
    - contacts: bulk insert, primary email + primary phone independent.
    - addresses: bulk insert, port_id required (schema enforces it),
      first address marked primary when multiple.
    - yachts: minimal stub when the legacy interest had a yachtName,
      currentOwnerType=client + currentOwnerId=migrated client. Linked
      via migration_source_links target_type=yacht.
    - interests: looks up berthId via mooring number, yachtId via the
      stub above. Carries Documenso ID forward when present.

  surnameToken from PlannedClient is dropped on insert (it's a dedup
  blocking-index artifact; runtime dedup re-derives from fullName).

scripts/migrate-from-nocodb.ts:
  - Removes the "not yet implemented" guard for --apply.
  - Adds EMAIL_REDIRECT_TO precondition gate: --apply errors out unless
    the env var is set, OR --unsafe-skip-redirect-check is also passed
    (production cutover only). Refers to docs/operations/outbound-comms-safety.md.
  - Re-fetches NocoDB at apply time (rather than reading a saved report
    dir) so the data is always fresh. Re-running is safe via the
    idempotency ledger.
  - Resolves target port via --port-slug (or first port if omitted).
  - Generates a UUID applyId tagged on every link, which pairs with a
    future --rollback flag.
  - Apply summary prints inserted/skipped counts per entity type plus
    the first 20 warnings.

Verification: 0 tsc errors, 926/926 vitest passing, lint clean.
The actual end-to-end run requires NOCODB_URL + NOCODB_TOKEN in .env
which aren't configured in this checkout; that's the operator's next
step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 19:53:04 +02:00
Matt Ciaccio
18e5c124b0 feat(dedup): NocoDB migration script + tables (P3 dry-run)
Lands the one-shot migration pipeline from the legacy NocoDB Interests
base into the new client/interest schema. Dry-run mode is fully
operational: pulls the live snapshot, runs the dedup library, and
writes a CSV + Markdown report under .migration/<timestamp>/. The
--apply phase is stubbed for a follow-up PR per the design's P3
implementation sequence.

Schema additions
================

- `client_merge_candidates` — pairs flagged by the background scoring
  job for the /admin/duplicates review queue. Status enum: pending /
  dismissed / merged. Unique-(portId, clientAId, clientBId) so the
  same pair can't surface twice. Empty until P2 lands the cron.
- `migration_source_links` — idempotency ledger. Maps source-system
  rows (NocoDB Interest #624 → new client UUID) so re-running --apply
  against the same dry-run report skips already-imported entities.

Both tables ship with the migration `0020_unusual_azazel.sql` —
already applied to the local dev DB during this commit's preparation.

Library
=======

src/lib/dedup/nocodb-source.ts
  Read-only adapter for the legacy NocoDB v2 API. xc-token auth,
  auto-paginates until isLastPage, captures the table IDs from the
  2026-05-03 audit. `fetchSnapshot()` pulls every relevant table in
  parallel into one in-memory object the transform layer consumes.

src/lib/dedup/migration-transform.ts
  Pure function: NocoDB snapshot in, MigrationPlan out. Per row:
    - normalizes name / email / phone / country via the dedup library
    - parses the legacy DD-MM-YYYY / DD/MM/YYYY / ISO date formats
    - maps the 8-stage `Sales Process Level` enum to the new 9-stage
      pipelineStage
    - filters yacht-name placeholders ('TBC', 'Na', etc.)
    - merges Internal Notes + Extra Comments + Berth Size Desired into
      a single notes blob
  Then runs `findClientMatches` pairwise (with blocking) and
  union-finds clusters of rows whose score crosses the auto-link
  threshold (90). Lower-scoring pairs (50–89) become 'needs review'.
  Each cluster's "lead" row is picked by completeness score with
  recency tie-break.

src/lib/dedup/migration-report.ts
  Writes three artifacts to .migration/<timestamp>/:
    - report.csv  — one row per planned op, RFC-4180 escaped
    - summary.md  — human-skimmable overview
    - plan.json   — full structured plan for the --apply phase
  CSV cells with comma / quote / newline are quoted; internal quotes
  are doubled. No external CSV dep.

src/lib/dedup/phone-parse.ts
  Script-safe wrapper around libphonenumber-js's `core` entry that
  loads `metadata.min.json` directly. The default `index.cjs.js`
  bundled by libphonenumber hits a metadata-shape interop bug under
  Node 25 + tsx (`{ default }` wrapping); core+JSON sidesteps it.
  The dedup `normalizePhone` and `find-matches` both use this wrapper
  now so the same code path runs in vitest, Next.js, and the migration
  CLI without surprises.

src/lib/dedup/normalize.ts
  Tightened country resolution: added Caribbean short-form aliases
  ('antigua' → AG, 'st kitts' → KN, etc.) and a city map covering the
  US locations seen in the NocoDB dump (Boston, Tampa, Fort
  Lauderdale, Port Jefferson, Nantucket). Also relaxed phone parsing
  to drop the `isValid()` strict check — the libphonenumber min build
  rejects many real NANP-territory numbers, and dedup only needs a
  canonical E.164 to compare.

CLI
===

scripts/migrate-from-nocodb.ts
  pnpm tsx scripts/migrate-from-nocodb.ts --dry-run
    → Pulls the live NocoDB base (NOCODB_URL + NOCODB_TOKEN env vars),
       runs the transform, writes report. No DB writes.
  pnpm tsx scripts/migrate-from-nocodb.ts --apply --report .migration/<dir>/
    → Stubbed; exits with `not yet implemented` and a pointer to the
       design doc. Apply phase ships in a follow-up.

Tests
=====

tests/unit/dedup/migration-transform.test.ts (7 cases)
  Fixture-based regression. A frozen 12-row NocoDB snapshot covers
  every duplicate pattern in the design (§1.2). The test asserts:
    - 12 input rows → 7 unique clients (cluster math is right)
    - Patterns A / B / C / E auto-link
    - Pattern F (Etiennette Clamouze) does NOT auto-link
    - Every interest preserved as its own row even when clients merge
    - 8-stage → 9-stage enum mapping is correct per spec
    - Multi-yacht merge (Constanzo CALYPSO + Costanzo GEMINI under one
      client) — the design's signature win
    - Output is deterministic (run twice, identical)

Validation against real data
============================

Ran `pnpm tsx scripts/migrate-from-nocodb.ts --dry-run` against the
live NocoDB. Result on 252 Interests rows:
  - 237 clients (15 merged into 13 clusters)
  - 252 interests (one per source row)
  - 406 contacts, 52 addresses
  - 13 auto-linked clusters (every confirmed cluster from §1.2 audit)
  - 3 pairs flagged for review (Camazou, Zasso, one new)
  - 1 phone placeholder flagged

Total dedup test count: 57 (50 from P1 + 7 fixture tests).
Lint: clean. Tsc: clean for new files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:50:01 +02:00
Matt Ciaccio
8b077e1999 feat(dedup): normalization + match-finding library (P1)
The pure-logic spine of the client deduplication system spec'd in
docs/superpowers/specs/2026-05-03-dedup-and-migration-design.md.
Two modules, JSX-free, vitest-tested against fixtures drawn directly
from real dirty values observed in the legacy NocoDB Interests audit.

src/lib/dedup/normalize.ts
- normalizeName: trims whitespace, replaces \r/\n/\t, intelligently
  title-cases ALL-CAPS surnames while keeping particles (van / de /
  dalla / etc.) lowercase mid-name. Preserves Irish O' surnames and
  the "slash-with-company" structure ("Daniel Wainstein / 7 Knots,
  LLC") seen in production. Returns a surnameToken (lowercased last
  non-particle token) for use as a dedup blocking key.
- normalizeEmail: trim + lowercase + zod email validation. Plus-aliases
  preserved; null on invalid.
- normalizePhone: pre-cleans the input (strips spreadsheet apostrophes,
  carriage returns, dots/dashes/parens, converts 00 prefix to +) then
  delegates to libphonenumber-js. Detects multi-number fields ("a/b",
  "a;b") and placeholder fakes (8+ consecutive zeros, e.g.
  +447000000000). Flags every quirk so the migration report and runtime
  audit log can surface it.
- resolveCountry: maps free-text country/region input to ISO-3166-1
  alpha-2 via alias → exact (vs. Intl-derived names) → city → fuzzy
  (Levenshtein ≤ 2). Fuzzy is gated by length so 4-char inputs ("Mars")
  don't false-positive against short country names.
- levenshtein: standard iterative implementation, exported for reuse
  by find-matches.

src/lib/dedup/find-matches.ts
- findClientMatches: builds three blocking indexes off the pool (email
  / phone / surname-token), gathers the comparison set via union, and
  scores each candidate via the rule set in design §4.2:
    Email match            +60
    Phone E.164 match      +50  (≥ 8 digits, excludes placeholder zeros)
    Name exact match       +20
    Surname + given fuzzy  +15  (Levenshtein ≤ 1)
    Negative: shared email but different phone country  −15
    Negative: name match but no shared contact          −20
  Score is clamped to [0,100]. Confidence tier ('high' / 'medium' /
  'low') is derived from configurable thresholds passed in by the
  caller — defaults are highScore=90, mediumScore=50.

tests/unit/dedup/normalize.test.ts (38 cases)
Every dirty-data pattern from design §1.3 has a fixture: carriage
returns in names, ALL-CAPS surnames, lowercase entries, particles,
slash-with-company, plus-aliases, capitalized email localparts,
spreadsheet-apostrophe phones, multi-number phones, placeholder
phones, 00-prefix phones, French/UK local-format phones,
Saint-Barthélemy diacritic variants, Kansas City fallback.

tests/unit/dedup/find-matches.test.ts (12 cases)
Each duplicate cluster from design §1.2 has a test:
- Pattern A (Deepak Ramchandani — pure double-submit) → high
- Pattern B (Howard Wiarda — phone format variance) → high
- Pattern C (Nicolas Ruiz — name capitalization) → high
- Pattern D (Chris/Christopher Allen — name shortening) → high
- Pattern E (Christopher Camazou — typo on resubmit) → high or medium
- Pattern E (Constanzo/Costanzo — surname typo, multi-yacht) → high
- Pattern F (Etiennette Clamouze — same name, different country) →
  must NOT auto-merge
- Pattern F (Bruno+Bruce — shared household contact) → no match
- Negative evidence (same email, different phone country) → medium
- Blocking (no shared keys → 0 matches)
- Sort order (high before low)
- Empty pool

Total: 50 new tests, all green. Zero changes to runtime behavior or
schema; unblocks P2 (runtime surfaces) and P3 (NocoDB migration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:28:59 +02:00