Commit Graph

2 Commits

Author SHA1 Message Date
221ae5784e chore(autonomous-session): consolidate uncommitted work from prior session
Bundles the prior autonomous-session output that was sitting unstaged:

- Em-dash sweep across src/ + tests/ (en-dash/em-dash to hyphen, ~2280 instances)
- country-flag-icons rollout (CountryFlag component, replaces emoji glyphs that
  never rendered on Windows; lazy-loads the 3x2 SVG index as a single chunk
  after the per-subpath dynamic-import approach silently failed in webpack)
- Admin IA Phase 1+2: 7-domain regroup, 41 to 38 pages, /admin/berths index,
  redirects (ocr to ai, reports to dashboard, invitations to users),
  docs/admin-ia-proposal.md
- Per-template email tester (registry + endpoint + UI on Email admin page)
- Cancel-document mode picker (delete-from-Documenso vs keep-for-audit)
- Dashboard PDF report: 25 widgets, SVG charts, date-range picker, 11 resolvers
- Customize-widgets per-region sortables at xl+ (charts/rails/feed); single
  flat sortable below xl when the layout stacks; per-viewport saved orders
- Audit doc updates capturing each shipped item
- Lint fixes: react-compiler immutability in DonutChart (reduce instead of
  let-reassign), set-state-in-effect disables in CountryFlag and
  UploadForSigning preview-bytes effect, unused 'confirm' destructures in
  interest contract + reservation tabs, unescaped apostrophe in test-template
  card copy
2026-05-23 00:52:59 +02:00
Matt Ciaccio
8b077e1999 feat(dedup): normalization + match-finding library (P1)
The pure-logic spine of the client deduplication system spec'd in
docs/superpowers/specs/2026-05-03-dedup-and-migration-design.md.
Two modules, JSX-free, vitest-tested against fixtures drawn directly
from real dirty values observed in the legacy NocoDB Interests audit.

src/lib/dedup/normalize.ts
- normalizeName: trims whitespace, replaces \r/\n/\t, intelligently
  title-cases ALL-CAPS surnames while keeping particles (van / de /
  dalla / etc.) lowercase mid-name. Preserves Irish O' surnames and
  the "slash-with-company" structure ("Daniel Wainstein / 7 Knots,
  LLC") seen in production. Returns a surnameToken (lowercased last
  non-particle token) for use as a dedup blocking key.
- normalizeEmail: trim + lowercase + zod email validation. Plus-aliases
  preserved; null on invalid.
- normalizePhone: pre-cleans the input (strips spreadsheet apostrophes,
  carriage returns, dots/dashes/parens, converts 00 prefix to +) then
  delegates to libphonenumber-js. Detects multi-number fields ("a/b",
  "a;b") and placeholder fakes (8+ consecutive zeros, e.g.
  +447000000000). Flags every quirk so the migration report and runtime
  audit log can surface it.
- resolveCountry: maps free-text country/region input to ISO-3166-1
  alpha-2 via alias → exact (vs. Intl-derived names) → city → fuzzy
  (Levenshtein ≤ 2). Fuzzy is gated by length so 4-char inputs ("Mars")
  don't false-positive against short country names.
- levenshtein: standard iterative implementation, exported for reuse
  by find-matches.

src/lib/dedup/find-matches.ts
- findClientMatches: builds three blocking indexes off the pool (email
  / phone / surname-token), gathers the comparison set via union, and
  scores each candidate via the rule set in design §4.2:
    Email match            +60
    Phone E.164 match      +50  (≥ 8 digits, excludes placeholder zeros)
    Name exact match       +20
    Surname + given fuzzy  +15  (Levenshtein ≤ 1)
    Negative: shared email but different phone country  −15
    Negative: name match but no shared contact          −20
  Score is clamped to [0,100]. Confidence tier ('high' / 'medium' /
  'low') is derived from configurable thresholds passed in by the
  caller — defaults are highScore=90, mediumScore=50.

tests/unit/dedup/normalize.test.ts (38 cases)
Every dirty-data pattern from design §1.3 has a fixture: carriage
returns in names, ALL-CAPS surnames, lowercase entries, particles,
slash-with-company, plus-aliases, capitalized email localparts,
spreadsheet-apostrophe phones, multi-number phones, placeholder
phones, 00-prefix phones, French/UK local-format phones,
Saint-Barthélemy diacritic variants, Kansas City fallback.

tests/unit/dedup/find-matches.test.ts (12 cases)
Each duplicate cluster from design §1.2 has a test:
- Pattern A (Deepak Ramchandani — pure double-submit) → high
- Pattern B (Howard Wiarda — phone format variance) → high
- Pattern C (Nicolas Ruiz — name capitalization) → high
- Pattern D (Chris/Christopher Allen — name shortening) → high
- Pattern E (Christopher Camazou — typo on resubmit) → high or medium
- Pattern E (Constanzo/Costanzo — surname typo, multi-yacht) → high
- Pattern F (Etiennette Clamouze — same name, different country) →
  must NOT auto-merge
- Pattern F (Bruno+Bruce — shared household contact) → no match
- Negative evidence (same email, different phone country) → medium
- Blocking (no shared keys → 0 matches)
- Sort order (high before low)
- Empty pool

Total: 50 new tests, all green. Zero changes to runtime behavior or
schema; unblocks P2 (runtime surfaces) and P3 (NocoDB migration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 14:28:59 +02:00