# Bulk CSV/XLSX Importer — Design Spec

> **Status:** Approved (2026-06-01) · ready for implementation plan
> **Driver:** Replace the static `admin/import` mockup with a real
> self-serve importer. Primary purpose: **one-time cutover migration**
> of legacy NocoDB/portal data into the new CRM at launch.
> **Tracker:** `docs/launch-readiness.md` · feature-completeness batch.

## Purpose & scope

A visual importer that ingests CSV/XLSX exports of the legacy system and
loads them into the CRM with column-mapping, dry-run preview, dedup, and
per-batch undo. Built for the cutover migration but engineered as a
reusable engine (it can serve ongoing ops later without a rewrite).

**In scope — seven entities**, imported in dependency order so foreign
keys resolve by natural key:

| #   | Entity          | Dedup match-key                                                  | FKs resolved by natural key             |
| --- | --------------- | ---------------------------------------------------------------- | --------------------------------------- |
| 1   | Companies       | `name` (case-insensitive)                                        | —                                       |
| 2   | Clients         | primary `email` → fallback canonical `phone`                     | —                                       |
| 3   | Yachts          | `name` + owner (or HIN if present)                               | owner → client email / company name     |
| 4   | Berths          | `mooringNumber` (canonical `^[A-Z]+\d+$`)                        | —                                       |
| 5   | Interests/deals | default **create-new** (flag likely dupes by client+berth+stage) | client → email, primary berth → mooring |
| 6   | Tenancies       | client + berth + `startDate`                                     | client → email, berth → mooring         |
| 7   | Expenses        | `date` + `amount` + `description` (or none)                      | —                                       |

Berths are included for UI consistency even though
`scripts/import-berths-from-nocodb.ts` already covers them via CLI.
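
A few of the match keys in the table above can be sketched as pure functions. This is a minimal illustration, not the adapters' actual code: the helper names, the `email:`/`phone:` prefixes, the digits-only phone canonicalisation, and the trim/uppercase step before the mooring-number check are all assumptions.

```ts
// Hypothetical helpers; names and normalisation rules are assumptions.

/** Companies: case-insensitive name. */
function companyMatchKey(companyName: string): string | null {
  const key = companyName.trim().toLowerCase();
  return key.length > 0 ? key : null;
}

/** Clients: primary email, falling back to a canonical phone. */
function clientMatchKey(row: { email?: string; phone?: string }): string | null {
  const email = (row.email ?? "").trim().toLowerCase();
  if (email.length > 0) return `email:${email}`;
  // Assumed canonical form: digits only, keeping a leading "+".
  const raw = (row.phone ?? "").trim();
  const digits = raw.replace(/\D/g, "");
  if (digits.length === 0) return null;
  return `phone:${raw.startsWith("+") ? "+" : ""}${digits}`;
}

/** Berths: the key must match the canonical ^[A-Z]+\d+$ form. */
function berthMatchKey(mooringNumber: string): string | null {
  const candidate = mooringNumber.trim().toUpperCase();
  return /^[A-Z]+\d+$/.test(candidate) ? candidate : null;
}
```

Returning `null` mirrors the `matchKey` contract (`string | null`): a row without a usable key cannot be deduplicated and has to be handled by the caller.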

**Non-goals (v1):** full pre-update snapshot/revert of _updated_ rows
(undo covers inserts only); streaming multi-GB files (migration files
are small); scheduling/automation of imports; importing attachments/PDFs
(handled separately by the Initiative 5 MinIO backfill scripts).

## Architecture — generic engine + per-entity adapter registry

One pipeline parameterised by a per-entity **adapter**, mirroring the
existing `src/lib/reports/custom/registry.ts` and settings-registry
patterns.

`src/lib/import/registry.ts` exports `IMPORT_ENTITY_KEYS` and
`IMPORT_REGISTRY: Record<ImportEntityKey, ImportAdapter>`. Each adapter:

```ts
interface ImportAdapter {
  key: ImportEntityKey;
  label: string;
  order: number; // dependency order (companies=1 … expenses=7)
  dependsOn: ImportEntityKey[];
  /** Target fields drive the column-mapping UI + zod validation. */
  targetFields: ImportField[]; // { key, label, required, type, zod }
  /** Natural key used for dedup + as the FK-resolution lookup value. */
  matchKey: (row: MappedRow) => string | null;
  /** Resolve FK ids by natural key against the live DB. Returns ids or a
   *  per-field resolution error. */
  resolveForeignKeys: (row: MappedRow, ctx: ImportCtx) => Promise<FkResult>;
  /** Dedup lookup — find an existing row by matchKey within the port. */
  findExisting: (portId: string, matchKey: string) => Promise<{ id: string } | null>;
  /** Writes delegate to the EXISTING service helpers so audit logging,
   *  validation, and polymorphic-ownership rules come for free. */
  insert: (row: ResolvedRow, ctx: ImportCtx) => Promise<{ id: string }>;
  update: (existingId: string, row: ResolvedRow, ctx: ImportCtx) => Promise<void>;
}
```

Adding an entity = adding one adapter + registering it. No engine change.
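
As a rough sketch of how the registry could drive import order, the snippet below models only the ordering fields of `ImportAdapter`. `AdapterOrdering`, `ORDERING_REGISTRY`, and `adaptersInDependencyOrder` are hypothetical names, and the `dependsOn` values are inferred from the FK column of the scope table rather than taken from real adapters.

```ts
// Sketch only: models just the ordering fields of the spec's ImportAdapter.
const IMPORT_ENTITY_KEYS = [
  "companies",
  "clients",
  "yachts",
  "berths",
  "interests",
  "tenancies",
  "expenses",
] as const;
type ImportEntityKey = (typeof IMPORT_ENTITY_KEYS)[number];

interface AdapterOrdering {
  key: ImportEntityKey;
  label: string;
  order: number;
  dependsOn: ImportEntityKey[];
}

// dependsOn is an assumption derived from the "FKs resolved" column.
const ORDERING_REGISTRY: Record<ImportEntityKey, AdapterOrdering> = {
  companies: { key: "companies", label: "Companies", order: 1, dependsOn: [] },
  clients: { key: "clients", label: "Clients", order: 2, dependsOn: [] },
  yachts: { key: "yachts", label: "Yachts", order: 3, dependsOn: ["companies", "clients"] },
  berths: { key: "berths", label: "Berths", order: 4, dependsOn: [] },
  interests: { key: "interests", label: "Interests/deals", order: 5, dependsOn: ["clients", "berths"] },
  tenancies: { key: "tenancies", label: "Tenancies", order: 6, dependsOn: ["clients", "berths"] },
  expenses: { key: "expenses", label: "Expenses", order: 7, dependsOn: [] },
};

/** Adapters sorted by `order`; throws if a dependency is not imported earlier. */
function adaptersInDependencyOrder(
  registry: Record<ImportEntityKey, AdapterOrdering>,
): AdapterOrdering[] {
  const sorted = IMPORT_ENTITY_KEYS.map((key) => registry[key]).sort(
    (a, b) => a.order - b.order,
  );
  for (const adapter of sorted) {
    for (const dep of adapter.dependsOn) {
      if (registry[dep].order >= adapter.order) {
        throw new Error(`${adapter.key} depends on ${dep}, which is not imported earlier`);
      }
    }
  }
  return sorted;
}
```

A check like this could run in a unit test so that a newly registered adapter with an inconsistent `order` fails fast instead of producing unresolved foreign keys at import time.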

## Pipeline (BullMQ `import` queue, concurrency 1)

The queue + worker already exist (`src/lib/queue/workers/import.ts` is
currently a documented no-op). We replace the no-op body with the real
processor and add a producer.

1. **Upload & parse.** Drag-drop CSV/XLSX → parse (papaparse for CSV;
   **ExcelJS already installed** for XLSX) → raw rows. The uploaded file
   is stored via `getStorageBackend()` under a temp prefix so the worker
   can re-read it; cleaned up after commit or on expiry.
2. **Map columns.** Auto-suggest mappings by fuzzy header match to the
   adapter's `targetFields`; user overrides; **save mapping as a per-port
   template** (`import_mappings`) for re-runs.
3. **Dry-run (no writes).** Per row: apply mapping → zod-validate →
   `resolveForeignKeys` → `findExisting` → classify as
   `will-insert | will-update | will-skip | error(line, reason)`. Surface
   counts + a sample of rows + a downloadable line-numbered error report.
4. **Commit.** Producer enqueues the job; the worker streams rows applying
   the chosen **conflict policy** (`skip-matches` / `update-matches` /
   `error-on-match`) via the adapter's `insert`/`update`. Per-row try/catch
   so valid rows still land; every action recorded in `import_batch_rows`;
   `import_batches` updated with live progress + final counts.
5. **History + Undo.** Admin list of batches (status, counts, error-report
   download). **Undo** deletes the rows a batch _inserted_, in reverse
   dependency order, refusing if any inserted row now has dependents
   created outside the batch. Updates are marked non-revertible in v1.
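
The dry-run classification in step 3 can be sketched as a pure function over the results of the earlier per-row stages. The input shape (`DryRunInput`) and the function names are hypothetical; the sketch also assumes that the chosen conflict policy decides how a matched row is classified, which the spec implies but does not spell out.

```ts
type ConflictPolicy = "skip-matches" | "update-matches" | "error-on-match";

type DryRunClass =
  | { kind: "will-insert" }
  | { kind: "will-update"; existingId: string }
  | { kind: "will-skip"; existingId: string }
  | { kind: "error"; line: number; reason: string };

// Hypothetical summary of what the earlier stages produced for one row.
interface DryRunInput {
  line: number;
  validationError: string | null; // zod failure message, if any
  fkError: string | null; // resolveForeignKeys failure, if any
  existingId: string | null; // findExisting result
}

/** Classify one row without writing anything. */
function classifyRow(input: DryRunInput, policy: ConflictPolicy): DryRunClass {
  if (input.validationError !== null) {
    return { kind: "error", line: input.line, reason: input.validationError };
  }
  if (input.fkError !== null) {
    return { kind: "error", line: input.line, reason: input.fkError };
  }
  if (input.existingId === null) {
    return { kind: "will-insert" };
  }
  if (policy === "update-matches") {
    return { kind: "will-update", existingId: input.existingId };
  }
  if (policy === "skip-matches") {
    return { kind: "will-skip", existingId: input.existingId };
  }
  // error-on-match: a match is reported as a row error.
  return { kind: "error", line: input.line, reason: "matches an existing record" };
}

/** Counts per class, as surfaced in the dry-run preview. */
function countClasses(classes: DryRunClass[]): Record<DryRunClass["kind"], number> {
  const counts = { "will-insert": 0, "will-update": 0, "will-skip": 0, error: 0 };
  for (const c of classes) counts[c.kind] += 1;
  return counts;
}
```

Keeping the classifier free of I/O means the commit worker could reuse the same decision logic, so the preview and the real run cannot drift apart.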

## Data model (3 new tables; no changes to entity tables)

- **`import_batches`** — `id, port_id, entity_type, filename, storage_key,
  status (uploaded|dry_run|committing|completed|failed|undone),
  total_rows, inserted, updated, skipped, errored, mapping_json,
  conflict_policy, created_by, created_at, completed_at`.
- **`import_batch_rows`** — `id, batch_id, row_number, action
  (inserted|updated|skipped|errored), entity_id (nullable), error
  (nullable)`. Powers the error report and undo; one record per file row
  is acceptable at migration-scale volume.
- **`import_mappings`** — `id, port_id, entity_type, name, mapping_json,
  created_by, created_at`. Saved column mappings, reusable across runs.
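
To illustrate how the per-row log relates to the batch counters, here is a small sketch that derives the `inserted`/`updated`/`skipped`/`errored` totals from `import_batch_rows` records and turns them into a progress percentage. The camelCase TypeScript shapes and both function names are assumptions; the real worker may well increment the counters directly instead of recomputing them.

```ts
type RowAction = "inserted" | "updated" | "skipped" | "errored";

// Assumed in-memory shape of one import_batch_rows record.
interface ImportBatchRow {
  rowNumber: number;
  action: RowAction;
  entityId: string | null;
  error: string | null;
}

interface BatchCounts {
  inserted: number;
  updated: number;
  skipped: number;
  errored: number;
}

/** Recompute the import_batches counters from the per-row log. */
function tallyBatchRows(rows: ImportBatchRow[]): BatchCounts {
  const counts: BatchCounts = { inserted: 0, updated: 0, skipped: 0, errored: 0 };
  for (const row of rows) counts[row.action] += 1;
  return counts;
}

/** Whole-number progress for the UI; 0 when the file has no rows. */
function progressPercent(counts: BatchCounts, totalRows: number): number {
  if (totalRows <= 0) return 0;
  const processed = counts.inserted + counts.updated + counts.skipped + counts.errored;
  return Math.min(100, Math.floor((processed / totalRows) * 100));
}
```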

The migration is added through the project's numbered, `psql`-applied
migration flow; restart `next dev` afterwards (see the prepared-statement
cache caveat in CLAUDE.md).

## Validation, errors, conflict policy

- **Per-row zod** from each adapter's `targetFields`; failures collected
  with row number + field + message, never aborting the whole file.
- **Downloadable error report** (CSV: row, field, message) from any
  dry-run or completed batch.
- **Conflict policy** chosen per import, surfaced at the dry-run step
  (three distinct behaviours for a matched row):
  - `skip-matches` — insert new, leave matched rows untouched. Default;
    safe to re-run.
  - `update-matches` — insert new, overwrite matched rows with the file's
    values (correct earlier mistakes).
  - `error-on-match` — treat a match as a row error to review, importing
    nothing for it (strictest).
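
The downloadable error report described above could be produced along these lines. The `RowError` shape and function names are hypothetical, and the quoting follows the common RFC 4180 convention (quote cells containing commas, quotes, or line breaks, and double any embedded quotes), which is an assumption about the desired output.

```ts
// Hypothetical shape of one collected failure.
interface RowError {
  row: number;
  field: string;
  message: string;
}

/** Quote a cell only when it contains a comma, quote, or line break. */
function csvCell(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

/** Build the report with a header line and one line per error. */
function buildErrorReport(errors: RowError[]): string {
  const lines = ["row,field,message"];
  for (const e of errors) {
    lines.push([String(e.row), csvCell(e.field), csvCell(e.message)].join(","));
  }
  return lines.join("\r\n");
}
```

Because the report is built from collected `(row, field, message)` records, the same function can serve both a dry run and a completed batch.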

## UI

A 4-step wizard mirroring the existing **bulk-add-berths wizard**:

1. Pick entity (registry-driven, shown in dependency order with a hint) +
   upload file.
2. Map columns (auto-suggested; load a saved mapping; save current).
3. Dry-run preview — counts (new / update / skip / error), sample table,
   error-report download, pick conflict policy.
4. Commit — progress bar (worker reports % via batch counts) → result
   summary with link to History.
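
One possible way to auto-suggest mappings in step 2 is to normalise headers and compare them to each target field's `key` and `label` by edit distance. The spec only says "fuzzy header match", so the Levenshtein-based scoring, the `0.7` threshold, and every name below are assumptions for illustration.

```ts
// Only the parts of a target field that matter for header matching.
interface TargetFieldLite {
  key: string;
  label: string;
}

function normaliseHeader(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]/g, "");
}

/** Classic Levenshtein distance using two rolling rows. */
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

/** Similarity in [0, 1]; 1 means identical after normalisation. */
function headerSimilarity(a: string, b: string): number {
  const x = normaliseHeader(a);
  const y = normaliseHeader(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 0 : 1 - editDistance(x, y) / longest;
}

/** Map each file header to the best-scoring field key, or null if too weak. */
function suggestMapping(
  headers: string[],
  fields: TargetFieldLite[],
  threshold = 0.7,
): Record<string, string | null> {
  const mapping: Record<string, string | null> = {};
  for (const header of headers) {
    let bestKey: string | null = null;
    let bestScore = 0;
    for (const field of fields) {
      const score = Math.max(
        headerSimilarity(header, field.key),
        headerSimilarity(header, field.label),
      );
      if (score > bestScore) {
        bestScore = score;
        bestKey = field.key;
      }
    }
    mapping[header] = bestScore >= threshold ? bestKey : null;
  }
  return mapping;
}
```

This sketch does not stop two headers from being suggested for the same field; since the user can override every suggestion, that may be acceptable, but a real implementation might want to enforce uniqueness.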

Plus an **Import History** tab: batch list + status + counts + error
report + **Undo**. Replaces the static mockup at
`src/app/(dashboard)/[portSlug]/admin/import/page.tsx`.
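
The refusal rule behind **Undo** can be sketched as a planning step that runs before any delete. Everything here is hypothetical: `InsertedRowRef`, `UndoPlan`, and the `countOutsideDependents` callback stand in for whatever lookup the real implementation uses, and the sketch simply deletes in descending row order within one batch as a stand-in for reverse dependency order.

```ts
// One row the batch inserted (action = inserted in import_batch_rows).
interface InsertedRowRef {
  rowNumber: number;
  entityId: string;
}

type UndoPlan =
  | { ok: true; deleteIds: string[] }
  | { ok: false; blockedIds: string[] };

/**
 * Decide whether a batch can be undone.
 * countOutsideDependents is an assumed lookup returning how many records
 * created outside the batch now reference the given entity.
 */
function planUndo(
  inserted: InsertedRowRef[],
  countOutsideDependents: (entityId: string) => number,
): UndoPlan {
  const blockedIds = inserted
    .filter((row) => countOutsideDependents(row.entityId) > 0)
    .map((row) => row.entityId);
  if (blockedIds.length > 0) {
    // Refuse the whole undo rather than deleting only part of the batch.
    return { ok: false, blockedIds };
  }
  const deleteIds = inserted
    .slice()
    .sort((a, b) => b.rowNumber - a.rowNumber)
    .map((row) => row.entityId);
  return { ok: true, deleteIds };
}
```

Separating planning from deletion lets the UI explain why an undo was refused (the blocked ids) without having touched any data.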

## Permissions & tenancy

Gate behind a new `data.import` permission (admin-tier). Every query +
write is `port_id`-scoped; FK resolution only matches within the port.

## Testing (TDD)

- **Per-adapter unit tests** (one suite each): column mapping, zod
  validation (valid + each failure mode), `matchKey`, `resolveForeignKeys`
  (hit / miss / ambiguous), `findExisting` dedup.
- **Dry-run classifier integration test** on a seeded DB: a fixture file
  yielding one of each class (insert / update / skip / error).
- **Commit worker integration test**: each conflict policy; partial-failure
  (valid rows land, errored rows reported); idempotent re-run.
- **Undo test**: deletes inserted rows; refuses when an inserted row has an
  outside dependent.

## Decisions locked (defaults the user approved 2026-06-01)

- Rollback depth: **inserts-only undo**; updates non-revertible in v1.
- Partial failure: **valid rows commit**, errors reported (not
  all-or-nothing).
- Berths: **included** in the UI importer despite the existing CLI.
- All seven entities in scope.
- Purpose: one-time cutover migration (engine reusable for ongoing ops).