# Bulk CSV/XLSX Importer: Design Spec

> **Status:** Approved (2026-06-01) · ready for implementation plan
> **Driver:** Replace the static `admin/import` mockup with a real
> self-serve importer. Primary purpose: **one-time cutover migration**
> of legacy NocoDB/portal data into the new CRM at launch.
> **Tracker:** `docs/launch-readiness.md` · feature-completeness batch.

## Purpose & scope

A visual importer that ingests CSV/XLSX exports of the legacy system and loads them into the CRM with column-mapping, dry-run preview, dedup, and per-batch undo. Built for the cutover migration but engineered as a reusable engine (it can serve ongoing ops later without a rewrite).

**In scope: seven entities**, imported in dependency order so foreign keys resolve by natural key:

| #   | Entity          | Dedup match-key                                                  | FKs resolved by natural key             |
| --- | --------------- | ---------------------------------------------------------------- | --------------------------------------- |
| 1   | Companies       | `name` (case-insensitive)                                        | none                                    |
| 2   | Clients         | primary `email` → fallback canonical `phone`                     | none                                    |
| 3   | Yachts          | `name` + owner (or HIN if present)                               | owner → client email / company name     |
| 4   | Berths          | `mooringNumber` (canonical `^[A-Z]+\d+$`)                        | none                                    |
| 5   | Interests/deals | default **create-new** (flag likely dupes by client+berth+stage) | client → email, primary berth → mooring |
| 6   | Tenancies       | client + berth + `startDate`                                     | client → email, berth → mooring         |
| 7   | Expenses        | `date` + `amount` + `description` (or none)                      | none                                    |

Berths are included for UI consistency even though `scripts/import-berths-from-nocodb.ts` already covers them via CLI.

**Non-goals (v1):** full pre-update snapshot/revert of _updated_ rows (undo covers inserts only); streaming multi-GB files (migration files are small); scheduling/automation of imports; importing attachments/PDFs (handled by the Initiative 5 MinIO backfill scripts, separate).
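The match-key column in the scope table implies some normalisation before two values are compared. A minimal sketch of what that could look like for companies, berths, and clients follows. The helper names are hypothetical, and the exact canonicalisation rules (stripping spaces and hyphens from mooring numbers, reducing phones to digits) are assumptions for illustration, not rules taken from the codebase.

```typescript
// Hypothetical helpers: names and canonicalisation rules are assumptions.
const MOORING_RE = /^[A-Z]+\d+$/; // canonical form from the scope table

/** Companies: case-insensitive name match. */
function companyKey(name: string): string | null {
  const key = name.trim().toLowerCase();
  return key === "" ? null : key;
}

/** Berths: uppercase, drop spaces/hyphens (assumed), then require the canonical form. */
function mooringKey(raw: string): string | null {
  const key = raw.toUpperCase().replace(/[\s-]+/g, "");
  return MOORING_RE.test(key) ? key : null;
}

/** Clients: primary email, falling back to a digits-only phone (assumed canonical form). */
function clientKey(row: { email?: string; phone?: string }): string | null {
  const email = row.email?.trim().toLowerCase();
  if (email) return `email:${email}`;
  const digits = (row.phone ?? "").replace(/\D+/g, "");
  return digits ? `phone:${digits}` : null;
}
```

Returning `null` when no usable key exists lets the caller treat the row as unmatched (or as an error) instead of deduplicating on an empty string.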
## Architecture: generic engine + per-entity adapter registry

One pipeline parameterised by a per-entity **adapter**, mirroring the existing `src/lib/reports/custom/registry.ts` and settings-registry patterns.

`src/lib/import/registry.ts` exports `IMPORT_ENTITY_KEYS` and `IMPORT_REGISTRY: Record<ImportEntityKey, ImportAdapter>`. Each adapter:

```ts
interface ImportAdapter {
  key: ImportEntityKey;
  label: string;
  order: number; // dependency order (companies=1 … expenses=7)
  dependsOn: ImportEntityKey[];
  /** Target fields drive the column-mapping UI + zod validation. */
  targetFields: ImportField[]; // { key, label, required, type, zod }
  /** Natural key used for dedup + as the FK-resolution lookup value. */
  matchKey: (row: MappedRow) => string | null;
  /** Resolve FK ids by natural key against the live DB. Returns ids or a
   * per-field resolution error. (`FkResolution` is an assumed name for
   * that result type.) */
  resolveForeignKeys: (row: MappedRow, ctx: ImportCtx) => Promise<FkResolution>;
  /** Dedup lookup: find an existing row by matchKey within the port. */
  findExisting: (portId: string, matchKey: string) => Promise<{ id: string } | null>;
  /** Writes delegate to the EXISTING service helpers so audit logging,
   * validation, and polymorphic-ownership rules come for free. */
  insert: (row: ResolvedRow, ctx: ImportCtx) => Promise<{ id: string }>;
  // Return type assumed to be void.
  update: (existingId: string, row: ResolvedRow, ctx: ImportCtx) => Promise<void>;
}
```

Adding an entity = adding one adapter + registering it. No engine change.

## Pipeline (BullMQ `import` queue, concurrency 1)

The queue + worker already exist (`src/lib/queue/workers/import.ts` is currently a documented no-op). We replace the no-op body with the real processor and add a producer.

1. **Upload & parse.** Drag-drop CSV/XLSX → parse (papaparse for CSV; **ExcelJS already installed** for XLSX) → raw rows. The uploaded file is stored via `getStorageBackend()` under a temp prefix so the worker can re-read it; it is cleaned up after commit or on expiry.
2. **Map columns.** Auto-suggest mappings by fuzzy header match to the adapter's `targetFields`; user overrides; **save mapping as a per-port template** (`import_mappings`) for re-runs.
3. **Dry-run (no writes).** Per row: apply mapping → zod-validate → `resolveForeignKeys` → `findExisting` → classify as `will-insert | will-update | will-skip | error(line, reason)`. Surface counts + a sample of rows + a downloadable line-numbered error report.
4. **Commit.** Producer enqueues the job; the worker streams rows, applying the chosen **conflict policy** (`skip-matches` / `update-matches` / `error-on-match`) via the adapter's `insert`/`update`. Each row runs in its own try/catch so valid rows still land; every action is recorded in `import_batch_rows`; `import_batches` is updated with live progress + final counts.
5. **History + Undo.** Admin list of batches (status, counts, error-report download). **Undo** deletes the rows a batch _inserted_, in reverse dependency order, refusing if any inserted row now has dependents created outside the batch. Updates are marked non-revertible in v1.

## Data model (3 new tables; no changes to entity tables)

- **`import_batches`**: `id, port_id, entity_type, filename, storage_key, status (uploaded|dry_run|committing|completed|failed|undone), total_rows, inserted, updated, skipped, errored, mapping_json, conflict_policy, created_by, created_at, completed_at`.
- **`import_batch_rows`**: `id, batch_id, row_number, action (inserted|updated|skipped|errored), entity_id (nullable), error (nullable)`. Powers the error report + undo. Migration-scale volume is fine.
- **`import_mappings`**: `id, port_id, entity_type, name, mapping_json, created_by, created_at`. Saved column mappings, reusable across runs.

Migration added via the project's `psql`-applied numbered migration flow; restart `next dev` after (prepared-statement cache caveat per CLAUDE.md).
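The dry-run classification (pipeline step 3) and the conflict policy applied at commit (step 4) reduce to a small decision per row, which is also what ends up in the `action` column of `import_batch_rows`. The sketch below is one way to express it as a pure function. `RowCheck`, `classifyRow`, and `countOutcomes` are hypothetical names, and it assumes validation and FK-resolution failures have already been collapsed into a single message per row.

```typescript
type ConflictPolicy = "skip-matches" | "update-matches" | "error-on-match";

type RowOutcome =
  | { kind: "will-insert" }
  | { kind: "will-update"; existingId: string }
  | { kind: "will-skip"; existingId: string }
  | { kind: "error"; line: number; reason: string };

// Hypothetical shape: the per-row facts gathered before classification.
interface RowCheck {
  line: number; // line number in the uploaded file
  validationError: string | null; // zod or FK-resolution failure, if any
  existingId: string | null; // id returned by findExisting, if matched
}

function classifyRow(check: RowCheck, policy: ConflictPolicy): RowOutcome {
  // Invalid rows are always errors, regardless of policy.
  if (check.validationError !== null) {
    return { kind: "error", line: check.line, reason: check.validationError };
  }
  // No match: the row is new under every policy.
  if (check.existingId === null) return { kind: "will-insert" };
  // Matched row: the policy decides.
  switch (policy) {
    case "skip-matches":
      return { kind: "will-skip", existingId: check.existingId };
    case "update-matches":
      return { kind: "will-update", existingId: check.existingId };
    case "error-on-match":
      return {
        kind: "error",
        line: check.line,
        reason: `matches existing row ${check.existingId}`,
      };
  }
}

/** Tally outcomes for the dry-run summary (new / update / skip / error). */
function countOutcomes(outcomes: RowOutcome[]): Record<RowOutcome["kind"], number> {
  const counts = { "will-insert": 0, "will-update": 0, "will-skip": 0, error: 0 };
  for (const outcome of outcomes) counts[outcome.kind] += 1;
  return counts;
}
```

Keeping the decision free of database calls means the dry-run preview and the commit worker can share it, so the counts shown in the preview are produced by the same logic that later drives the writes.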
## Validation, errors, conflict policy

- **Per-row zod** from each adapter's `targetFields`; failures collected with row number + field + message, never aborting the whole file.
- **Downloadable error report** (CSV: row, field, message) from any dry-run or completed batch.
- **Conflict policy** chosen per import, surfaced at the dry-run step (three distinct behaviours for a matched row):
  - `skip-matches`: insert new, leave matched rows untouched. Default; safe to re-run.
  - `update-matches`: insert new, overwrite matched rows with the file's values (correct earlier mistakes).
  - `error-on-match`: treat a match as a row error to review, importing nothing for it (strictest).

## UI

A 4-step wizard mirroring the existing **bulk-add-berths wizard**:

1. Pick entity (registry-driven, shown in dependency order with a hint) + upload file.
2. Map columns (auto-suggested; load a saved mapping; save current).
3. Dry-run preview: counts (new / update / skip / error), sample table, error-report download, pick conflict policy.
4. Commit: progress bar (worker reports % via batch counts) → result summary with link to History.

Plus an **Import History** tab: batch list + status + counts + error report + **Undo**. Replaces the static mockup at `src/app/(dashboard)/[portSlug]/admin/import/page.tsx`.

## Permissions & tenancy

Gate behind a new `data.import` permission (admin-tier). Every query + write is `port_id`-scoped; FK resolution only matches within the port.

## Testing (TDD)

- **Per-adapter unit tests** (one suite each): column mapping, zod validation (valid + each failure mode), `matchKey`, `resolveForeignKeys` (hit / miss / ambiguous), `findExisting` dedup.
- **Dry-run classifier integration test** on a seeded DB: a fixture file yielding one of each class (insert / update / skip / error).
- **Commit worker integration test**: each conflict policy; partial-failure (valid rows land, errored rows reported); idempotent re-run.
- **Undo test**: deletes inserted rows; refuses when an inserted row has an outside dependent.

## Decisions locked (defaults the user approved 2026-06-01)

- Rollback depth: **inserts-only undo**; updates non-revertible in v1.
- Partial failure: **valid rows commit**, errors reported (not all-or-nothing).
- Berths: **included** in the UI importer despite the existing CLI.
- All seven entities in scope.
- Purpose: one-time cutover migration (engine reusable for ongoing ops).
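
The inserts-only undo locked in above (pipeline step 5) could be planned as a pure step before anything is deleted: refuse if any inserted row has an outside dependent, otherwise order deletions from the most dependent entity back to the least. This is a rough sketch under stated assumptions. The entity keys other than `companies` and `expenses` are guessed from the scope table, `hasOutsideDependents` is a hypothetical callback standing in for what would be a database lookup, and the input is assumed to possibly span several entity types (for example when several batches are undone together).

```typescript
// Entity keys are assumptions, apart from companies and expenses.
type EntityKey =
  | "companies" | "clients" | "yachts" | "berths"
  | "interests" | "tenancies" | "expenses";

// Dependency order from the scope table (companies = 1 ... expenses = 7).
const ORDER: Record<EntityKey, number> = {
  companies: 1, clients: 2, yachts: 3, berths: 4,
  interests: 5, tenancies: 6, expenses: 7,
};

interface InsertedRow {
  entityType: EntityKey;
  entityId: string;
}

type UndoPlan =
  | { ok: true; deletions: InsertedRow[] }
  | { ok: false; blockedBy: InsertedRow[] };

function planUndo(
  inserted: InsertedRow[],
  // Hypothetical check: does this row now have dependents created outside the batch?
  hasOutsideDependents: (row: InsertedRow) => boolean,
): UndoPlan {
  // Refuse the whole undo if any inserted row is referenced from outside.
  const blockedBy = inserted.filter(hasOutsideDependents);
  if (blockedBy.length > 0) return { ok: false, blockedBy };

  // Reverse dependency order: highest order number is deleted first.
  const deletions = [...inserted].sort(
    (a, b) => ORDER[b.entityType] - ORDER[a.entityType],
  );
  return { ok: true, deletions };
}
```

Planning first and deleting second keeps the refusal case free of side effects, which matches the undo test: either every inserted row is removed or none is.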