pn-new-crm/docs/superpowers/specs/2026-06-01-bulk-import-design.md
Matt 31ba72f344 chore(launch-prep): hide unfinished report/import surfaces, defer big builds
Ship-what's-done prep ahead of the prod cutover (launch ~today):

- Hide Financial + Marketing report cards from the reports landing
  (both were "Builder in development" placeholders gated on unbuilt
  data sources). Sales/Operational/Custom + templates/scheduling/
  exports remain live.
- Trim the Custom-report card copy to match the shipped basic builder
  (no group-by/filters yet; the builder page header was already honest).
- Hide the Bulk Import mockup from search-nav-catalog + the admin
  sections browser; /admin/import is now unreachable from the UI.
- Correct client-facing doc over-claims (waiting-list "next-in-line
  notification", Import) in features-list.md + new-system-feature-summary.md.
- Un-stale BACKLOG.md (Documenso phases 2-7 confirmed shipped).
- Log decisions + deferred work (full importer, full custom-builder,
  waiting-list, maintenance-log, paper-upload bug) to launch-readiness.md.

Deferred-importer design spec added at
docs/superpowers/specs/2026-06-01-bulk-import-design.md.

Verified: tsc --noEmit clean, eslint clean on changed files,
1512/1519 vitest pass (7 failures are Redis-down, unrelated).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 16:39:51 +02:00


Bulk CSV/XLSX Importer — Design Spec

Status: Approved (2026-06-01) · ready for implementation plan
Driver: Replace the static admin/import mockup with a real self-serve importer.
Primary purpose: one-time cutover migration of legacy NocoDB/portal data into the new CRM at launch.
Tracker: docs/launch-readiness.md · feature-completeness batch.

Purpose & scope

A visual importer that ingests CSV/XLSX exports of the legacy system and loads them into the CRM with column-mapping, dry-run preview, dedup, and per-batch undo. Built for the cutover migration but engineered as a reusable engine (it can serve ongoing ops later without a rewrite).

In scope — seven entities, imported in dependency order so foreign keys resolve by natural key:

| # | Entity | Dedup match-key | FKs resolved by natural key |
|---|--------|-----------------|-----------------------------|
| 1 | Companies | name (case-insensitive) | |
| 2 | Clients | primary email → fallback canonical phone | |
| 3 | Yachts | name + owner (or HIN if present) | owner → client email / company name |
| 4 | Berths | mooringNumber (canonical ^[A-Z]+\d+$) | |
| 5 | Interests/deals | default create-new (flag likely dupes by client+berth+stage) | client → email, primary berth → mooring |
| 6 | Tenancies | client + berth + startDate | client → email, berth → mooring |
| 7 | Expenses | date + amount + description (or none) | |

Berths are included for UI consistency even though scripts/import-berths-from-nocodb.ts already covers them via CLI.
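
To make the match-keys in the table concrete, here is a minimal sketch of key normalization for three of the entities. The helper names (companyKey, clientKey, berthKey), the email:/phone: prefixes, and the "digits only" phone rule are assumptions for illustration; the spec only fixes the key columns and the canonical mooring pattern.

```typescript
// Hypothetical helpers: names and normalization rules are illustrative assumptions.
const MOORING_RE = /^[A-Z]+\d+$/; // canonical mooring form from the table

/** Companies: case-insensitive name. Returns null when the name is blank. */
function companyKey(name: string): string | null {
  const k = name.trim().toLowerCase();
  return k === "" ? null : k;
}

/** Clients: primary email, falling back to a canonical phone. */
function clientKey(email: string | undefined, phone: string | undefined): string | null {
  const e = (email ?? "").trim().toLowerCase();
  if (e !== "") return `email:${e}`;
  // Assumption: "canonical phone" is treated as digits only; the real rule may differ.
  const p = (phone ?? "").replace(/\D/g, "");
  return p === "" ? null : `phone:${p}`;
}

/** Berths: trimmed, upper-cased mooring number that must match the canonical pattern. */
function berthKey(mooringNumber: string): string | null {
  const k = mooringNumber.trim().toUpperCase();
  return MOORING_RE.test(k) ? k : null;
}
```

A null key means the row cannot be deduplicated or used as an FK lookup value, which a dry-run would presumably surface as a row error.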

Non-goals (v1): full pre-update snapshot/revert of updated rows (undo covers inserts only); streaming multi-GB files (migration files are small); scheduling/automation of imports; importing attachments/PDFs (handled by the Initiative 5 MinIO backfill scripts, separate).

Architecture — generic engine + per-entity adapter registry

One pipeline parameterised by a per-entity adapter, mirroring the existing src/lib/reports/custom/registry.ts and settings-registry patterns.

src/lib/import/registry.ts exports IMPORT_ENTITY_KEYS and IMPORT_REGISTRY: Record<ImportEntityKey, ImportAdapter>. Each adapter:

interface ImportAdapter {
  key: ImportEntityKey;
  label: string;
  order: number; // dependency order (companies=1 … expenses=7)
  dependsOn: ImportEntityKey[];
  /** Target fields drive the column-mapping UI + zod validation. */
  targetFields: ImportField[]; // { key, label, required, type, zod }
  /** Natural key used for dedup + as the FK-resolution lookup value. */
  matchKey: (row: MappedRow) => string | null;
  /** Resolve FK ids by natural key against the live DB. Returns ids or a
   *  per-field resolution error. */
  resolveForeignKeys: (row: MappedRow, ctx: ImportCtx) => Promise<FkResult>;
  /** Dedup lookup — find an existing row by matchKey within the port. */
  findExisting: (portId: string, matchKey: string) => Promise<{ id: string } | null>;
  /** Writes delegate to the EXISTING service helpers so audit logging,
   *  validation, and polymorphic-ownership rules come for free. */
  insert: (row: ResolvedRow, ctx: ImportCtx) => Promise<{ id: string }>;
  update: (existingId: string, row: ResolvedRow, ctx: ImportCtx) => Promise<void>;
}

Adding an entity = adding one adapter + registering it. No engine change.
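
As a rough sketch of how the registry could enforce dependency order, the snippet below reduces each adapter to its ordering metadata. The entity key strings and the AdapterMeta / importOrder names are assumptions (the spec does not list the values of IMPORT_ENTITY_KEYS); the dependsOn entries are inferred from the FK column of the entity table.

```typescript
// Assumed key strings; the real IMPORT_ENTITY_KEYS may be spelled differently.
type ImportEntityKey =
  | "companies" | "clients" | "yachts" | "berths"
  | "interests" | "tenancies" | "expenses";

// Only the ordering-related slice of ImportAdapter, for illustration.
interface AdapterMeta {
  key: ImportEntityKey;
  order: number;
  dependsOn: ImportEntityKey[];
}

const REGISTRY: Record<ImportEntityKey, AdapterMeta> = {
  companies: { key: "companies", order: 1, dependsOn: [] },
  clients: { key: "clients", order: 2, dependsOn: [] },
  yachts: { key: "yachts", order: 3, dependsOn: ["clients", "companies"] },
  berths: { key: "berths", order: 4, dependsOn: [] },
  interests: { key: "interests", order: 5, dependsOn: ["clients", "berths"] },
  tenancies: { key: "tenancies", order: 6, dependsOn: ["clients", "berths"] },
  expenses: { key: "expenses", order: 7, dependsOn: [] },
};

/** Returns entity keys in import order; throws if a dependency is not ordered earlier. */
function importOrder(registry: Record<ImportEntityKey, AdapterMeta>): ImportEntityKey[] {
  const adapters = Object.values(registry);
  for (const adapter of adapters) {
    for (const dep of adapter.dependsOn) {
      if (registry[dep].order >= adapter.order) {
        throw new Error(`${adapter.key} must be ordered after ${dep}`);
      }
    }
  }
  return adapters.sort((a, b) => a.order - b.order).map((a) => a.key);
}
```

A check like this could run once at module load or in a unit test, so a mis-ordered adapter fails fast instead of producing FK resolution misses during a dry-run.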

Pipeline (BullMQ import queue, concurrency 1)

The queue + worker already exist (src/lib/queue/workers/import.ts is currently a documented no-op). We replace the no-op body with the real processor and add a producer.

  1. Upload & parse. Drag-drop CSV/XLSX → parse (papaparse for CSV; ExcelJS already installed for XLSX) → raw rows. The uploaded file is stored via getStorageBackend() under a temp prefix so the worker can re-read it; cleaned up after commit or on expiry.
  2. Map columns. Auto-suggest mappings by fuzzy header match to the adapter's targetFields; user overrides; save mapping as a per-port template (import_mappings) for re-runs.
  3. Dry-run (no writes). Per row: apply mapping → zod-validate → resolveForeignKeys → findExisting → classify as will-insert | will-update | will-skip | error(line, reason). Surface counts, a sample of rows, and a downloadable line-numbered error report.
  4. Commit. Producer enqueues the job; the worker streams rows applying the chosen conflict policy (skip-matches / update-matches / error-on-match) via the adapter's insert/update. Per-row try/catch so valid rows still land; every action recorded in import_batch_rows; import_batches updated with live progress + final counts.
  5. History + Undo. Admin list of batches (status, counts, error-report download). Undo deletes the rows a batch inserted, in reverse dependency order, refusing if any inserted row now has dependents created outside the batch. Updates are marked non-revertible in v1.
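
The dry-run step (3) can be sketched as a pure per-row classifier. This is a simplified, synchronous illustration: the real adapter methods are async and hit the database, and the DryRunSteps shape, the "header is line 1" convention, and the sample data are all assumptions.

```typescript
type ConflictPolicy = "skip-matches" | "update-matches" | "error-on-match";

type RowClass =
  | { kind: "will-insert" | "will-update" | "will-skip"; line: number }
  | { kind: "error"; line: number; reason: string };

// Synchronous stand-ins for zod validation and the adapter's FK/dedup lookups.
interface DryRunSteps<Row> {
  validate: (row: Row) => string | null;           // null means valid
  resolveForeignKeys: (row: Row) => string | null; // null means all FKs resolved
  findExisting: (row: Row) => boolean;
}

function classifyRow<Row>(
  row: Row, line: number, steps: DryRunSteps<Row>, policy: ConflictPolicy,
): RowClass {
  const invalid = steps.validate(row);
  if (invalid !== null) return { kind: "error", line, reason: invalid };
  const fkError = steps.resolveForeignKeys(row);
  if (fkError !== null) return { kind: "error", line, reason: fkError };
  if (!steps.findExisting(row)) return { kind: "will-insert", line };
  if (policy === "skip-matches") return { kind: "will-skip", line };
  if (policy === "update-matches") return { kind: "will-update", line };
  return { kind: "error", line, reason: "matches an existing record" };
}

function dryRun<Row>(rows: Row[], steps: DryRunSteps<Row>, policy: ConflictPolicy) {
  // Assumption: line 1 of the file is the header, so data starts at line 2.
  const results = rows.map((row, i) => classifyRow(row, i + 2, steps, policy));
  const counts: Record<RowClass["kind"], number> = {
    "will-insert": 0, "will-update": 0, "will-skip": 0, error: 0,
  };
  for (const r of results) counts[r.kind] += 1;
  return { results, counts };
}

// Illustrative in-memory data in place of the live DB.
interface SampleRow { email: string; berth: string }
const existingEmails = new Set(["a@x.com"]);
const knownBerths = new Set(["A1"]);
const sampleSteps: DryRunSteps<SampleRow> = {
  validate: (r) => (r.email === "" ? "email is required" : null),
  resolveForeignKeys: (r) => (knownBerths.has(r.berth) ? null : `unknown berth ${r.berth}`),
  findExisting: (r) => existingEmails.has(r.email),
};
const sampleRows: SampleRow[] = [
  { email: "a@x.com", berth: "A1" },   // matches an existing record
  { email: "new@x.com", berth: "A1" }, // new record
  { email: "", berth: "A1" },          // fails validation
  { email: "c@x.com", berth: "Z9" },   // FK miss
];
```

Because nothing here writes, the same classifier can be re-run with a different policy to show the user how the counts would change before they commit.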

Data model (3 new tables; no changes to entity tables)

  • import_batches: id, port_id, entity_type, filename, storage_key, status (uploaded|dry_run|committing|completed|failed|undone), total_rows, inserted, updated, skipped, errored, mapping_json, conflict_policy, created_by, created_at, completed_at.
  • import_batch_rows: id, batch_id, row_number, action (inserted|updated|skipped|errored), entity_id (nullable), error (nullable). Powers the error report + undo. Migration-scale volume is fine.
  • import_mappings: id, port_id, entity_type, name, mapping_json, created_by, created_at. Saved column mappings, reusable across runs.
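
One way the per-row records could roll up into the batch counters and a progress percentage is sketched below. The TypeScript shapes mirror the column lists above in camelCase; the tally and progressPercent helpers are hypothetical, not existing code.

```typescript
type RowAction = "inserted" | "updated" | "skipped" | "errored";

// Mirrors import_batch_rows (camelCased); id and batch_id omitted for brevity.
interface BatchRow {
  rowNumber: number;
  action: RowAction;
  entityId: string | null;
  error: string | null;
}

// Mirrors the four counter columns on import_batches.
interface BatchCounts {
  inserted: number;
  updated: number;
  skipped: number;
  errored: number;
}

/** Aggregates row actions into the batch counters. */
function tally(rows: BatchRow[]): BatchCounts {
  const counts: BatchCounts = { inserted: 0, updated: 0, skipped: 0, errored: 0 };
  for (const row of rows) counts[row.action] += 1;
  return counts;
}

/** Whole-number progress derived from processed rows versus total_rows. */
function progressPercent(totalRows: number, counts: BatchCounts): number {
  if (totalRows <= 0) return 0;
  const processed = counts.inserted + counts.updated + counts.skipped + counts.errored;
  return Math.min(100, Math.floor((processed / totalRows) * 100));
}
```

Deriving progress from the counters keeps import_batches as the single source the UI polls, with no separate progress column needed.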

The migration is added via the project's psql-applied numbered migration flow; restart next dev afterwards (prepared-statement cache caveat, per CLAUDE.md).

Validation, errors, conflict policy

  • Per-row zod from each adapter's targetFields; failures collected with row number + field + message, never aborting the whole file.
  • Downloadable error report (CSV: row, field, message) from any dry-run or completed batch.
  • Conflict policy chosen per import, surfaced at the dry-run step (three distinct behaviours for a matched row):
    • skip-matches — insert new, leave matched rows untouched. Default; safe to re-run.
    • update-matches — insert new, overwrite matched rows with the file's values (correct earlier mistakes).
    • error-on-match — treat a match as a row error to review, importing nothing for it (strictest).
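
The downloadable error report (row, field, message) could be produced along these lines. This is a hand-rolled sketch so that it stays self-contained; the RowError shape and function names are assumptions, and a CSV library could be used instead.

```typescript
interface RowError {
  row: number;
  field: string;
  message: string;
}

/** Quotes a cell when it contains a comma, quote, or line break (RFC 4180 style). */
function csvCell(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

/** Builds the report with a header row and CRLF line endings. */
function errorReportCsv(errors: RowError[]): string {
  const lines = ["row,field,message"];
  for (const e of errors) {
    lines.push([e.row, e.field, e.message].map(csvCell).join(","));
  }
  return lines.join("\r\n") + "\r\n";
}
```

Escaping matters here because validation messages often contain commas or quoted values, which would otherwise shift columns when the report is opened in a spreadsheet.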

UI

A 4-step wizard mirroring the existing bulk-add-berths wizard:

  1. Pick entity (registry-driven, shown in dependency order with a hint) + upload file.
  2. Map columns (auto-suggested; load a saved mapping; save current).
  3. Dry-run preview — counts (new / update / skip / error), sample table, error-report download, pick conflict policy.
  4. Commit — progress bar (worker reports % via batch counts) → result summary with link to History.
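
The auto-suggested mapping in step 2 might work roughly as follows. "Fuzzy" is simplified here to normalized equality followed by substring containment; the TargetField slice, the suggestMapping name, and the two-pass strategy are assumptions, not the agreed algorithm.

```typescript
// Only the slice of ImportField needed for header matching.
interface TargetField {
  key: string;
  label: string;
}

/** Lower-cases and strips everything except letters and digits. */
const norm = (s: string): string => s.toLowerCase().replace(/[^a-z0-9]/g, "");

/** Maps each target field key to a suggested file header, or null if none fits. */
function suggestMapping(
  headers: string[], fields: TargetField[],
): Record<string, string | null> {
  const mapping: Record<string, string | null> = {};
  const used = new Set<string>();

  const claim = (field: TargetField, match: (h: string, c: string) => boolean): void => {
    if (mapping[field.key]) return; // already mapped in an earlier pass
    const candidates = [norm(field.key), norm(field.label)].filter((c) => c !== "");
    const hit = headers.find(
      (h) => !used.has(h) && norm(h) !== "" && candidates.some((c) => match(norm(h), c)),
    );
    mapping[field.key] = hit ?? null;
    if (hit !== undefined) used.add(hit);
  };

  // Pass 1: exact normalized matches win, so a loose match cannot steal a header.
  for (const f of fields) claim(f, (h, c) => h === c);
  // Pass 2: containment in either direction for the fields still unmapped.
  for (const f of fields) claim(f, (h, c) => h.includes(c) || c.includes(h));
  return mapping;
}
```

Fields left as null are the ones the user must map by hand (or leave unmapped if optional), and any suggestion can be overridden before the mapping is saved.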

Plus an Import History tab: batch list + status + counts + error report + Undo. Replaces the static mockup at src/app/(dashboard)/[portSlug]/admin/import/page.tsx.

Permissions & tenancy

Gate behind a new data.import permission (admin-tier). Every query + write is port_id-scoped; FK resolution only matches within the port.

Testing (TDD)

  • Per-adapter unit tests (one suite each): column mapping, zod validation (valid + each failure mode), matchKey, resolveForeignKeys (hit / miss / ambiguous), findExisting dedup.
  • Dry-run classifier integration test on a seeded DB: a fixture file yielding one of each class (insert / update / skip / error).
  • Commit worker integration test: each conflict policy; partial-failure (valid rows land, errored rows reported); idempotent re-run.
  • Undo test: deletes inserted rows; refuses when an inserted row has an outside dependent.
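
The undo behaviour under test can be sketched as "check everything first, then delete in reverse dependency order". The InsertedRow and UndoDeps shapes are hypothetical, the callbacks stand in for real queries, and checking all rows before deleting any is an assumption about how the refusal is implemented.

```typescript
interface InsertedRow {
  entityType: string;
  order: number; // adapter dependency order (companies=1 … expenses=7)
  entityId: string;
}

// Stand-ins for DB access; real implementations would be async and port-scoped.
interface UndoDeps {
  /** True if the row has dependents that were not created by this batch. */
  hasOutsideDependents: (row: InsertedRow) => boolean;
  deleteRow: (row: InsertedRow) => void;
}

type UndoResult =
  | { ok: true; deleted: string[] }
  | { ok: false; blockedBy: string[] };

function undoBatch(inserted: InsertedRow[], deps: UndoDeps): UndoResult {
  // Refuse up front so a blocked undo never leaves the batch half-deleted.
  const blockedBy = inserted
    .filter((row) => deps.hasOutsideDependents(row))
    .map((row) => row.entityId);
  if (blockedBy.length > 0) return { ok: false, blockedBy };

  // Reverse dependency order: highest order first, so children go before parents.
  const ordered = [...inserted].sort((a, b) => b.order - a.order);
  const deleted: string[] = [];
  for (const row of ordered) {
    deps.deleteRow(row);
    deleted.push(row.entityId);
  }
  return { ok: true, deleted };
}
```

An undo test could inject in-memory versions of the two callbacks and assert both the deletion order and that a blocked undo deletes nothing.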

Decisions locked (defaults the user approved 2026-06-01)

  • Rollback depth: inserts-only undo; updates non-revertible in v1.
  • Partial failure: valid rows commit, errors reported (not all-or-nothing).
  • Berths: included in the UI importer despite the existing CLI.
  • All seven entities in scope.
  • Purpose: one-time cutover migration (engine reusable for ongoing ops).