docs/superpowers/specs/2026-06-01-bulk-import-design.md

# Bulk CSV/XLSX Importer — Design Spec

> **Status:** Approved (2026-06-01) · ready for implementation plan
> **Driver:** Replace the static `admin/import` mockup with a real
> self-serve importer. Primary purpose: **one-time cutover migration**
> of legacy NocoDB/portal data into the new CRM at launch.
> **Tracker:** `docs/launch-readiness.md` · feature-completeness batch.

## Purpose & scope

A visual importer that ingests CSV/XLSX exports of the legacy system and
loads them into the CRM with column-mapping, dry-run preview, dedup, and
per-batch undo. Built for the cutover migration but engineered as a
reusable engine (it can serve ongoing ops later without a rewrite).

**In scope — seven entities**, imported in dependency order so foreign
keys resolve by natural key:

| #   | Entity          | Dedup match-key                                                  | FKs resolved by natural key             |
| --- | --------------- | ---------------------------------------------------------------- | --------------------------------------- |
| 1   | Companies       | `name` (case-insensitive)                                        | —                                       |
| 2   | Clients         | primary `email` → fallback canonical `phone`                     | —                                       |
| 3   | Yachts          | `name` + owner (or HIN if present)                               | owner → client email / company name     |
| 4   | Berths          | `mooringNumber` (canonical `^[A-Z]+\d+$`)                        | —                                       |
| 5   | Interests/deals | default **create-new** (flag likely dupes by client+berth+stage) | client → email, primary berth → mooring |
| 6   | Tenancies       | client + berth + `startDate`                                     | client → email, berth → mooring         |
| 7   | Expenses        | `date` + `amount` + `description` (or none)                      | —                                       |

Berths are included for UI consistency even though
`scripts/import-berths-from-nocodb.ts` already covers them via CLI.
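
To make the match-key column concrete, here is a minimal sketch of three of the keys in the table. The function names, the `email:`/`phone:` prefixes, the digits-only phone canonicalisation, and the upper-casing of mooring numbers are all assumptions made for illustration, not the project's actual rules.

```ts
// Hypothetical helpers: names and normalisation details are assumptions.

/** Companies: case-insensitive name match. */
function companyMatchKey(companyName: string): string | null {
  const key = companyName.trim().toLowerCase();
  return key === "" ? null : key;
}

/** Clients: primary email, falling back to a canonical phone (digits only here). */
function clientMatchKey(email?: string, phone?: string): string | null {
  const e = (email ?? "").trim().toLowerCase();
  if (e !== "") return `email:${e}`;
  const digits = (phone ?? "").replace(/\D/g, "");
  return digits === "" ? null : `phone:${digits}`;
}

/** Berths: the mooring number must fit the canonical ^[A-Z]+\d+$ shape. */
function berthMatchKey(mooringNumber: string): string | null {
  const m = mooringNumber.trim().toUpperCase();
  return /^[A-Z]+\d+$/.test(m) ? m : null;
}
```

Returning `null` when no usable key exists lets the caller treat the row as "no dedup possible" rather than matching on an empty string.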

**Non-goals (v1):** full pre-update snapshot/revert of _updated_ rows
(undo covers inserts only); streaming multi-GB files (migration files
are small); scheduling/automation of imports; importing attachments/PDFs
(handled by the Initiative 5 MinIO backfill scripts, separate).

## Architecture — generic engine + per-entity adapter registry

One pipeline parameterised by a per-entity **adapter**, mirroring the
existing `src/lib/reports/custom/registry.ts` and settings-registry
patterns.

`src/lib/import/registry.ts` exports `IMPORT_ENTITY_KEYS` and
`IMPORT_REGISTRY: Record<ImportEntityKey, ImportAdapter>`. Each adapter:

```ts
interface ImportAdapter {
  key: ImportEntityKey;
  label: string;
  order: number; // dependency order (companies=1 … expenses=7)
  dependsOn: ImportEntityKey[];
  /** Target fields drive the column-mapping UI + zod validation. */
  targetFields: ImportField[]; // { key, label, required, type, zod }
  /** Natural key used for dedup + as the FK-resolution lookup value. */
  matchKey: (row: MappedRow) => string | null;
  /** Resolve FK ids by natural key against the live DB. Returns ids or a
   *  per-field resolution error. */
  resolveForeignKeys: (row: MappedRow, ctx: ImportCtx) => Promise<FkResult>;
  /** Dedup lookup — find an existing row by matchKey within the port. */
  findExisting: (portId: string, matchKey: string) => Promise<{ id: string } | null>;
  /** Writes delegate to the EXISTING service helpers so audit logging,
   *  validation, and polymorphic-ownership rules come for free. */
  insert: (row: ResolvedRow, ctx: ImportCtx) => Promise<{ id: string }>;
  update: (existingId: string, row: ResolvedRow, ctx: ImportCtx) => Promise<void>;
}
```

Adding an entity = adding one adapter + registering it. No engine change.
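
As an illustration of how `order` and `dependsOn` could work together, the sketch below sorts adapter metadata by `order` and checks that every dependency is imported earlier. The entity key strings, the `AdapterMeta` shape, and `importOrder` are hypothetical; the dependencies are read off the FK column of the scope table.

```ts
// Assumed key names; the spec does not list the exact ImportEntityKey values.
type ImportEntityKey =
  | "companies" | "clients" | "yachts" | "berths"
  | "interests" | "tenancies" | "expenses";

// Only the ordering-related slice of an adapter.
interface AdapterMeta {
  key: ImportEntityKey;
  order: number;
  dependsOn: ImportEntityKey[];
}

const adapterMetas: AdapterMeta[] = [
  { key: "companies", order: 1, dependsOn: [] },
  { key: "clients", order: 2, dependsOn: [] },
  { key: "yachts", order: 3, dependsOn: ["clients", "companies"] },
  { key: "berths", order: 4, dependsOn: [] },
  { key: "interests", order: 5, dependsOn: ["clients", "berths"] },
  { key: "tenancies", order: 6, dependsOn: ["clients", "berths"] },
  { key: "expenses", order: 7, dependsOn: [] },
];

/** Sort by `order`, throwing if an adapter depends on one that comes later. */
function importOrder(adapters: AdapterMeta[]): ImportEntityKey[] {
  const sorted = [...adapters].sort((a, b) => a.order - b.order);
  const seen: ImportEntityKey[] = [];
  for (const adapter of sorted) {
    for (const dep of adapter.dependsOn) {
      if (seen.indexOf(dep) === -1) {
        throw new Error(`${adapter.key} depends on ${dep}, which is not imported earlier`);
      }
    }
    seen.push(adapter.key);
  }
  return seen;
}
```

A check like this could run once at registry load, so a mis-ordered adapter fails fast instead of producing unresolved foreign keys mid-import.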

## Pipeline (BullMQ `import` queue, concurrency 1)

The queue + worker already exist (`src/lib/queue/workers/import.ts` is
currently a documented no-op). We replace the no-op body with the real
processor and add a producer.

1. **Upload & parse.** Drag-drop CSV/XLSX → parse (papaparse for CSV;
   **ExcelJS already installed** for XLSX) → raw rows. The uploaded file
   is stored via `getStorageBackend()` under a temp prefix so the worker
   can re-read it; cleaned up after commit or on expiry.
2. **Map columns.** Auto-suggest mappings by fuzzy header match to the
   adapter's `targetFields`; user overrides; **save mapping as a per-port
   template** (`import_mappings`) for re-runs.
3. **Dry-run (no writes).** Per row: apply mapping → zod-validate →
   `resolveForeignKeys` → `findExisting` → classify as
   `will-insert | will-update | will-skip | error(line, reason)`. Surface
   counts + a sample of rows + a downloadable line-numbered error report.
4. **Commit.** Producer enqueues the job; the worker streams rows applying
   the chosen **conflict policy** (`skip-matches` / `update-matches` /
   `error-on-match`) via the adapter's `insert`/`update`. Per-row try/catch
   so valid rows still land; every action recorded in `import_batch_rows`;
   `import_batches` updated with live progress + final counts.
5. **History + Undo.** Admin list of batches (status, counts, error-report
   download). **Undo** deletes the rows a batch _inserted_, in reverse
   dependency order, refusing if any inserted row now has dependents
   created outside the batch. Updates are marked non-revertible in v1.

## Data model (3 new tables; no changes to entity tables)

- **`import_batches`** — `id, port_id, entity_type, filename, storage_key,
  status (uploaded|dry_run|committing|completed|failed|undone),
  total_rows, inserted, updated, skipped, errored, mapping_json,
  conflict_policy, created_by, created_at, completed_at`.
- **`import_batch_rows`** — `id, batch_id, row_number, action
  (inserted|updated|skipped|errored), entity_id (nullable), error
  (nullable)`. Powers the error report + undo. Migration-scale volume is
  fine.
- **`import_mappings`** — `id, port_id, entity_type, name, mapping_json,
  created_by, created_at`. Saved column mappings, reusable across runs.

The migration is added via the project's `psql`-applied numbered migration
flow; restart `next dev` afterwards (prepared-statement cache caveat per
CLAUDE.md).
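
To show how `import_batch_rows` and the counters on `import_batches` might relate, the following sketch records one row per processed line and derives the counts from those rows. The camel-cased field names, 1-based `rowNumber`, and the `commitRows`/`summarise` helpers are assumptions; the per-row `try/catch` mirrors the commit step so one bad row does not stop the rest.

```ts
type RowAction = "inserted" | "updated" | "skipped" | "errored";

// In-memory shape of an import_batch_rows record (assumed field names).
interface BatchRow {
  rowNumber: number;
  action: RowAction;
  entityId: string | null;
  error: string | null;
}

interface BatchCounts {
  inserted: number;
  updated: number;
  skipped: number;
  errored: number;
}

/** Apply `write` to every row; a thrown error becomes an `errored` record. */
function commitRows<Row>(
  rows: Row[],
  write: (row: Row) => { action: "inserted" | "updated" | "skipped"; entityId: string | null },
): BatchRow[] {
  return rows.map((row, index): BatchRow => {
    const rowNumber = index + 1; // assumption: 1-based data-row numbering
    try {
      const { action, entityId } = write(row);
      return { rowNumber, action, entityId, error: null };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { rowNumber, action: "errored", entityId: null, error: message };
    }
  });
}

/** Derive the batch counters from the recorded rows. */
function summarise(rows: BatchRow[]): BatchCounts {
  const counts: BatchCounts = { inserted: 0, updated: 0, skipped: 0, errored: 0 };
  for (const row of rows) counts[row.action] += 1;
  return counts;
}
```

Deriving counts from the row records keeps the two tables consistent by construction; a real worker would more likely increment the counters as it goes to drive the progress bar.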

## Validation, errors, conflict policy

- **Per-row zod** from each adapter's `targetFields`; failures collected
  with row number + field + message, never aborting the whole file.
- **Downloadable error report** (CSV: row, field, message) from any
  dry-run or completed batch.
- **Conflict policy** chosen per import, surfaced at the dry-run step
  (three distinct behaviours for a matched row):
  - `skip-matches` — insert new, leave matched rows untouched. Default;
    safe to re-run.
  - `update-matches` — insert new, overwrite matched rows with the file's
    values (correct earlier mistakes).
  - `error-on-match` — treat a match as a row error to review, importing
    nothing for it (strictest).
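
A minimal sketch of the downloadable error report, assuming failures are collected as `{ row, field, message }` objects. `RowFailure`, `csvCell`, and `errorReportCsv` are hypothetical names; the quoting follows the usual CSV convention of wrapping cells that contain commas, quotes, or line breaks and doubling embedded quotes.

```ts
interface RowFailure {
  row: number;
  field: string;
  message: string;
}

/** Quote a cell only when it contains a comma, quote, or line break. */
function csvCell(value: string | number): string {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

/** Build the report with a header row and CRLF line endings. */
function errorReportCsv(failures: RowFailure[]): string {
  const lines = ["row,field,message"];
  for (const f of failures) {
    lines.push([f.row, f.field, f.message].map(csvCell).join(","));
  }
  return lines.join("\r\n") + "\r\n";
}
```

Escaping matters here because validation messages routinely quote the offending value, which may itself contain commas.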

## UI

A 4-step wizard mirroring the existing **bulk-add-berths wizard**:

1. Pick entity (registry-driven, shown in dependency order with a hint) +
   upload file.
2. Map columns (auto-suggested; load a saved mapping; save current).
3. Dry-run preview — counts (new / update / skip / error), sample table,
   error-report download, pick conflict policy.
4. Commit — progress bar (worker reports % via batch counts) → result
   summary with link to History.

Plus an **Import History** tab: batch list + status + counts + error
report + **Undo**. Replaces the static mockup at
`src/app/(dashboard)/[portSlug]/admin/import/page.tsx`.
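
For step 2 of the wizard, the auto-suggestion could start from something as simple as the sketch below. It is a normalise-and-compare heuristic standing in for whichever fuzzy matcher is eventually chosen; `TargetField`, `normaliseHeader`, and `suggestMapping` are hypothetical, and duplicate file headers are not handled.

```ts
interface TargetField {
  key: string;
  label: string;
}

/** Lower-case and drop everything except letters and digits. */
function normaliseHeader(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]/g, "");
}

/** Map each file header to a target field key, or null when nothing fits. */
function suggestMapping(headers: string[], fields: TargetField[]): Record<string, string | null> {
  const mapping: Record<string, string | null> = {};
  const used: string[] = [];
  const isFree = (f: TargetField): boolean => used.indexOf(f.key) === -1;
  const claim = (header: string, hit: TargetField | undefined): void => {
    if (hit) {
      mapping[header] = hit.key;
      used.push(hit.key);
    }
  };

  // Pass 1: exact match on the normalised key or label.
  for (const header of headers) {
    const h = normaliseHeader(header);
    mapping[header] = null;
    claim(header, fields.filter(isFree).filter(
      (f) => normaliseHeader(f.key) === h || normaliseHeader(f.label) === h,
    )[0]);
  }

  // Pass 2: containment, only for headers still unmapped.
  for (const header of headers) {
    const h = normaliseHeader(header);
    if (mapping[header] !== null || h === "") continue;
    claim(header, fields.filter(isFree).filter(
      (f) => h.indexOf(normaliseHeader(f.key)) !== -1 || normaliseHeader(f.label).indexOf(h) !== -1,
    )[0]);
  }
  return mapping;
}
```

Running exact matches first keeps a loosely matching header from claiming a field that a later header matches exactly; the user can still override every suggestion.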

## Permissions & tenancy

Gate behind a new `data.import` permission (admin-tier). Every query +
write is `port_id`-scoped; FK resolution only matches within the port.
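
The port-scoping rule for FK resolution can be illustrated with an in-memory lookup. `ClientRecord`, `FkLookup`, and `resolveClientByEmail` are hypothetical, and a real adapter would query the database with a `port_id` filter instead of scanning an array.

```ts
interface ClientRecord {
  id: string;
  portId: string;
  email: string;
}

type FkLookup =
  | { outcome: "hit"; id: string }
  | { outcome: "miss" }
  | { outcome: "ambiguous"; ids: string[] };

/** Resolve a client by email, considering only rows in the given port. */
function resolveClientByEmail(clients: ClientRecord[], portId: string, email: string): FkLookup {
  const wanted = email.trim().toLowerCase();
  const matches = clients.filter(
    (c) => c.portId === portId && c.email.toLowerCase() === wanted,
  );
  const [first, ...rest] = matches;
  if (first === undefined) return { outcome: "miss" };
  if (rest.length === 0) return { outcome: "hit", id: first.id };
  return { outcome: "ambiguous", ids: matches.map((c) => c.id) };
}
```

An identical email in another port is simply invisible to the lookup, so it can neither satisfy nor confuse the resolution.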

## Testing (TDD)

- **Per-adapter unit tests** (one suite each): column mapping, zod
  validation (valid + each failure mode), `matchKey`, `resolveForeignKeys`
  (hit / miss / ambiguous), `findExisting` dedup.
- **Dry-run classifier integration test** on a seeded DB: a fixture file
  yielding one of each class (insert / update / skip / error).
- **Commit worker integration test**: each conflict policy; partial-failure
  (valid rows land, errored rows reported); idempotent re-run.
- **Undo test**: deletes inserted rows; refuses when an inserted row has an
  outside dependent.
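
The undo behaviour under test could be modelled roughly as follows. Everything named here (`InsertedRow`, `Dependent`, `planUndo`, and the `createdByBatchId` field) is an assumption; in particular, each inserted row carries its adapter `order` so the sketch works whether a batch spans one entity type or several.

```ts
interface InsertedRow {
  entityType: string;
  entityId: string;
  order: number; // the adapter's dependency order
}

// A row elsewhere in the DB that references an inserted row.
interface Dependent {
  id: string;
  dependsOnId: string;
  createdByBatchId: string | null; // null = created outside any import
}

type UndoPlan =
  | { kind: "deletable"; deleteOrder: string[] }
  | { kind: "refused"; blockedBy: string[] };

function planUndo(batchId: string, inserted: InsertedRow[], dependents: Dependent[]): UndoPlan {
  const insertedIds = inserted.map((r) => r.entityId);

  // Refuse if anything created outside this batch points at an inserted row.
  const blockers = dependents
    .filter((d) => insertedIds.indexOf(d.dependsOnId) !== -1 && d.createdByBatchId !== batchId)
    .map((d) => d.id);
  if (blockers.length > 0) return { kind: "refused", blockedBy: blockers };

  // Delete in reverse dependency order: highest `order` first.
  const deleteOrder = [...inserted]
    .sort((a, b) => b.order - a.order)
    .map((r) => r.entityId);
  return { kind: "deletable", deleteOrder };
}
```

Separating the plan from the actual deletes makes both test cases (clean undo and refusal) checkable without touching a database.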

## Decisions locked (defaults the user approved 2026-06-01)

- Rollback depth: **inserts-only undo**; updates non-revertible in v1.
- Partial failure: **valid rows commit**, errors reported (not
  all-or-nothing).
- Berths: **included** in the UI importer despite the existing CLI.
- All seven entities in scope.
- Purpose: one-time cutover migration (engine reusable for ongoing ops).

chore(launch-prep): hide unfinished report/import surfaces, defer big builds

Ship-what's-done prep ahead of the prod cutover (launch ~today):

- Hide Financial + Marketing report cards from the reports landing (both
  were "Builder in development" placeholders gated on unbuilt data
  sources). Sales/Operational/Custom + templates/scheduling/exports remain
  live.
- Trim the Custom-report card copy to match the shipped basic builder (no
  group-by/filters yet; the builder page header was already honest).
- Hide the Bulk Import mockup from search-nav-catalog + the admin sections
  browser; /admin/import is now unreachable from the UI.
- Correct client-facing doc over-claims (waiting-list "next-in-line
  notification", Import) in features-list.md +
  new-system-feature-summary.md.
- Un-stale BACKLOG.md (Documenso phases 2-7 confirmed shipped).
- Log decisions + deferred work (full importer, full custom-builder,
  waiting-list, maintenance-log, paper-upload bug) to launch-readiness.md.

Deferred-importer design spec added at
docs/superpowers/specs/2026-06-01-bulk-import-design.md.

Verified: tsc --noEmit clean, eslint clean on changed files, 1512/1519
vitest pass (7 failures are Redis-down, unrelated).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-01 16:39:51 +02:00
