# Bulk CSV/XLSX Importer — Design Spec
> **Status:** Approved (2026-06-01) · ready for implementation plan
> **Driver:** Replace the static `admin/import` mockup with a real
> self-serve importer. Primary purpose: **one-time cutover migration**
> of legacy NocoDB/portal data into the new CRM at launch.
> **Tracker:** `docs/launch-readiness.md` · feature-completeness batch.
## Purpose & scope
A visual importer that ingests CSV/XLSX exports of the legacy system and
loads them into the CRM with column-mapping, dry-run preview, dedup, and
per-batch undo. Built for the cutover migration but engineered as a
reusable engine (it can serve ongoing ops later without a rewrite).
**In scope — seven entities**, imported in dependency order so foreign
keys resolve by natural key:
| # | Entity | Dedup match-key | FKs resolved by natural key |
| --- | --------------- | ---------------------------------------------------------------- | --------------------------------------- |
| 1 | Companies | `name` (case-insensitive) | — |
| 2 | Clients | primary `email` → fallback canonical `phone` | — |
| 3 | Yachts | `name` + owner (or HIN if present) | owner → client email / company name |
| 4 | Berths | `mooringNumber` (canonical `^[A-Z]+\d+$`) | — |
| 5 | Interests/deals | default **create-new** (flag likely dupes by client+berth+stage) | client → email, primary berth → mooring |
| 6 | Tenancies | client + berth + `startDate` | client → email, berth → mooring |
| 7 | Expenses | `date` + `amount` + `description` (or none) | — |

Berths are included for UI consistency even though
`scripts/import-berths-from-nocodb.ts` already covers them via CLI.
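To make the match keys in the table concrete, here is a minimal sketch of how three of them might be normalised. The function names, the whitespace/hyphen stripping for mooring numbers, and the "digits and `+` only" phone canonicalisation are assumptions made for illustration; the spec fixes only the case-insensitive name match, the email-then-phone fallback, and the `^[A-Z]+\d+$` mooring pattern.

```typescript
// Hypothetical helpers: names and normalisation details are illustrative only.

/** Companies: case-insensitive name. Returns null when the name is blank. */
function companyMatchKey(name: string): string | null {
  const key = name.trim().toLowerCase();
  return key.length > 0 ? key : null;
}

/** Berths: upper-case, drop spaces/hyphens (assumed), then enforce ^[A-Z]+\d+$. */
function canonicalMooring(raw: string): string | null {
  const key = raw.replace(/[\s-]+/g, "").toUpperCase();
  return /^[A-Z]+\d+$/.test(key) ? key : null;
}

/** Clients: primary email first, canonical phone as the fallback. */
function clientMatchKey(row: { email?: string; phone?: string }): string | null {
  const email = row.email?.trim().toLowerCase();
  if (email) return `email:${email}`;
  // Assumed canonical form: keep digits and "+" only.
  const phone = row.phone?.replace(/[^\d+]/g, "");
  return phone ? `phone:${phone}` : null;
}
```

Prefixing the client key with `email:` or `phone:` keeps the two key spaces from colliding; that prefix is a design choice of this sketch, not something the spec requires.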

**Non-goals (v1):** full pre-update snapshot/revert of _updated_ rows
(undo covers inserts only); streaming multi-GB files (migration files
are small); scheduling/automation of imports; importing attachments/PDFs
(handled separately by the Initiative 5 MinIO backfill scripts).
## Architecture — generic engine + per-entity adapter registry
One pipeline parameterised by a per-entity **adapter**, mirroring the
existing `src/lib/reports/custom/registry.ts` and settings-registry
patterns.
`src/lib/import/registry.ts` exports `IMPORT_ENTITY_KEYS` and
`IMPORT_REGISTRY: Record<ImportEntityKey, ImportAdapter>`. Each adapter:
```ts
interface ImportAdapter {
  key: ImportEntityKey;
  label: string;
  order: number; // dependency order (companies=1 … expenses=7)
  dependsOn: ImportEntityKey[];
  /** Target fields drive the column-mapping UI + zod validation. */
  targetFields: ImportField[]; // { key, label, required, type, zod }
  /** Natural key used for dedup + as the FK-resolution lookup value. */
  matchKey: (row: MappedRow) => string | null;
  /** Resolve FK ids by natural key against the live DB. Returns ids or a
   *  per-field resolution error. */
  resolveForeignKeys: (row: MappedRow, ctx: ImportCtx) => Promise<FkResult>;
  /** Dedup lookup: find an existing row by matchKey within the port. */
  findExisting: (portId: string, matchKey: string) => Promise<{ id: string } | null>;
  /** Writes delegate to the EXISTING service helpers so audit logging,
   *  validation, and polymorphic-ownership rules come for free. */
  insert: (row: ResolvedRow, ctx: ImportCtx) => Promise<{ id: string }>;
  update: (existingId: string, row: ResolvedRow, ctx: ImportCtx) => Promise<void>;
}
```
Adding an entity = adding one adapter + registering it. No engine change.
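As a rough illustration of "one adapter plus one registry entry", the sketch below registers two cut-down adapters and lists them in dependency order. `AdapterSketch`, `REGISTRY_SKETCH`, and `adaptersInOrder` are hypothetical names, and the adapter is reduced to its synchronous fields; the real `ImportAdapter` also carries `targetFields`, FK resolution, and the write methods.

```typescript
// Simplified stand-ins for the spec's types; the real shapes will be richer.
type ImportEntityKey = "companies" | "clients";
type MappedRow = Record<string, string | undefined>;

// Only the synchronous slice of ImportAdapter, to keep the sketch short.
interface AdapterSketch {
  key: ImportEntityKey;
  label: string;
  order: number;
  dependsOn: ImportEntityKey[];
  matchKey: (row: MappedRow) => string | null;
}

const companiesAdapter: AdapterSketch = {
  key: "companies",
  label: "Companies",
  order: 1,
  dependsOn: [],
  // Case-insensitive name, per the dedup table.
  matchKey: (row) => (row.name ?? "").trim().toLowerCase() || null,
};

const clientsAdapter: AdapterSketch = {
  key: "clients",
  label: "Clients",
  order: 2,
  dependsOn: [],
  // Email only here; the phone fallback is omitted for brevity.
  matchKey: (row) => (row.email ?? "").trim().toLowerCase() || null,
};

// Registering an adapter is the only step needed to expose a new entity.
const REGISTRY_SKETCH: Record<ImportEntityKey, AdapterSketch> = {
  clients: clientsAdapter,
  companies: companiesAdapter,
};

// Entities are listed by `order`, regardless of the object's key order.
function adaptersInOrder(): AdapterSketch[] {
  return Object.values(REGISTRY_SKETCH).sort((a, b) => a.order - b.order);
}
```

Typing the registry as `Record<ImportEntityKey, …>` means the compiler flags any entity key that has no adapter, which is one way to keep the key list and the registry in step.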
## Pipeline (BullMQ `import` queue, concurrency 1)
The queue + worker already exist (`src/lib/queue/workers/import.ts` is
currently a documented no-op). We replace the no-op body with the real
processor and add a producer.
1. **Upload & parse.** Drag-drop CSV/XLSX → parse (papaparse for CSV;
**ExcelJS already installed** for XLSX) → raw rows. The uploaded file
is stored via `getStorageBackend()` under a temp prefix so the worker
can re-read it; cleaned up after commit or on expiry.
2. **Map columns.** Auto-suggest mappings by fuzzy header match to the
adapter's `targetFields`; user overrides; **save mapping as a per-port
template** (`import_mappings`) for re-runs.
3. **Dry-run (no writes).** Per row: apply mapping → zod-validate →
`resolveForeignKeys` → `findExisting` → classify as
`will-insert | will-update | will-skip | error(line, reason)`. Surface
counts + a sample of rows + a downloadable line-numbered error report.
4. **Commit.** Producer enqueues the job; the worker streams rows applying
the chosen **conflict policy** (`skip-matches` / `update-matches` /
`error-on-match`) via the adapter's `insert`/`update`. Per-row try/catch
so valid rows still land; every action recorded in `import_batch_rows`;
`import_batches` updated with live progress + final counts.
5. **History + Undo.** Admin list of batches (status, counts, error-report
download). **Undo** deletes the rows a batch _inserted_, in reverse
dependency order, refusing if any inserted row now has dependents
created outside the batch. Updates are marked non-revertible in v1.
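The dry-run step of the pipeline above can be sketched as a pure classifier. This is a simplified, synchronous illustration: `DryRunHooks`, `classifyRow`, and `summarise` are hypothetical names, and the hooks stand in for zod validation, `resolveForeignKeys`, and `findExisting`, which are asynchronous in the adapter interface.

```typescript
type ConflictPolicy = "skip-matches" | "update-matches" | "error-on-match";

type DryRunResult =
  | { kind: "will-insert" | "will-update" | "will-skip"; line: number }
  | { kind: "error"; line: number; reason: string };

// Hypothetical hooks standing in for validation, FK resolution and dedup.
interface DryRunHooks<Row> {
  validate: (row: Row) => string | null; // error message, or null if valid
  resolveFks: (row: Row) => string | null; // error message, or null if resolved
  exists: (row: Row) => boolean; // true when the match key hits an existing row
}

function classifyRow<Row>(
  row: Row,
  line: number,
  policy: ConflictPolicy,
  hooks: DryRunHooks<Row>,
): DryRunResult {
  const invalid = hooks.validate(row);
  if (invalid) return { kind: "error", line, reason: invalid };

  const fkError = hooks.resolveFks(row);
  if (fkError) return { kind: "error", line, reason: fkError };

  if (!hooks.exists(row)) return { kind: "will-insert", line };

  // The row matches an existing record: the conflict policy decides.
  switch (policy) {
    case "skip-matches":
      return { kind: "will-skip", line };
    case "update-matches":
      return { kind: "will-update", line };
    case "error-on-match":
      return { kind: "error", line, reason: "matches an existing row" };
  }
}

// Counts shown in the preview, one bucket per class.
function summarise(results: DryRunResult[]): Record<DryRunResult["kind"], number> {
  const counts = { "will-insert": 0, "will-update": 0, "will-skip": 0, error: 0 };
  for (const r of results) counts[r.kind] += 1;
  return counts;
}
```

Because the classifier performs no writes, the same function could in principle be reused by the commit worker to decide which adapter method to call for each row.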
## Data model (3 new tables; no changes to entity tables)
- **`import_batches`** — `id, port_id, entity_type, filename, storage_key,
status (uploaded|dry_run|committing|completed|failed|undone),
total_rows, inserted, updated, skipped, errored, mapping_json,
conflict_policy, created_by, created_at, completed_at`.
- **`import_batch_rows`** — `id, batch_id, row_number, action
(inserted|updated|skipped|errored), entity_id (nullable), error
(nullable)`. Powers the error report and undo. One row per imported
line is acceptable at migration-scale volumes.
- **`import_mappings`** — `id, port_id, entity_type, name, mapping_json,
created_by, created_at`. Saved column mappings, reusable across runs.

The migration is added through the project's numbered, `psql`-applied
migration flow; restart `next dev` afterwards (see the prepared-statement
cache caveat in CLAUDE.md).
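As a small illustration of how the `import_batches` counters could drive live progress, the sketch below mirrors the status values and count columns as TypeScript types and derives a percentage from them. The camelCase field names and the formula (rows processed so far over `total_rows`) are assumptions; the spec only says progress is reported through the batch counts.

```typescript
// Status values as listed for import_batches.
type BatchStatus =
  | "uploaded"
  | "dry_run"
  | "committing"
  | "completed"
  | "failed"
  | "undone";

// Assumed camelCase view of the count columns.
interface ImportBatchCounts {
  totalRows: number;
  inserted: number;
  updated: number;
  skipped: number;
  errored: number;
}

/** Whole-number percentage of rows processed so far, clamped to 0..100. */
function progressPercent(c: ImportBatchCounts): number {
  if (c.totalRows <= 0) return 0;
  const processed = c.inserted + c.updated + c.skipped + c.errored;
  return Math.min(100, Math.floor((processed / c.totalRows) * 100));
}

/** A batch can only be undone once it has finished committing (assumed rule). */
function isUndoable(status: BatchStatus): boolean {
  return status === "completed";
}
```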
## Validation, errors, conflict policy
- **Per-row zod** from each adapter's `targetFields`; failures collected
with row number + field + message, never aborting the whole file.
- **Downloadable error report** (CSV: row, field, message) from any
dry-run or completed batch.
- **Conflict policy** chosen per import, surfaced at the dry-run step
  (three distinct behaviours for a matched row):
  - `skip-matches`: insert new, leave matched rows untouched. Default;
    safe to re-run.
  - `update-matches`: insert new, overwrite matched rows with the file's
    values (correct earlier mistakes).
  - `error-on-match`: treat a match as a row error to review, importing
    nothing for it (strictest).
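The downloadable error report from the list above (CSV with row, field, message) could be produced along these lines. `RowError`, `csvCell`, and `buildErrorReport` are hypothetical names; the quoting follows the usual CSV convention of wrapping a cell in double quotes and doubling any embedded quote, and the CRLF line ending is a choice of this sketch.

```typescript
interface RowError {
  row: number; // line number in the uploaded file
  field: string;
  message: string;
}

/** Quote a cell only when it contains a quote, comma, or line break. */
function csvCell(value: string | number): string {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

/** Build the report with a header row, sorted by file line number. */
function buildErrorReport(errors: RowError[]): string {
  const lines = ["row,field,message"];
  const sorted = [...errors].sort((a, b) => a.row - b.row);
  for (const e of sorted) {
    lines.push([e.row, e.field, e.message].map((cell) => csvCell(cell)).join(","));
  }
  return lines.join("\r\n") + "\r\n";
}
```

Sorting a copy rather than the input keeps the caller's array untouched, which matters if the same error list also feeds the on-screen preview.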
## UI
A 4-step wizard mirroring the existing **bulk-add-berths wizard**:
1. Pick entity (registry-driven, shown in dependency order with a hint) +
upload file.
2. Map columns (auto-suggested; load a saved mapping; save current).
3. Dry-run preview — counts (new / update / skip / error), sample table,
error-report download, pick conflict policy.
4. Commit — progress bar (worker reports % via batch counts) → result
summary with link to History.

Plus an **Import History** tab: batch list + status + counts + error
report + **Undo**. Replaces the static mockup at
`src/app/(dashboard)/[portSlug]/admin/import/page.tsx`.
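For the auto-suggested mappings in wizard step 2, one possible approach is to normalise each header and pick the closest target field by edit distance. The spec says only "fuzzy header match", so the Levenshtein metric, the `maxDistance` threshold of 2, and the names `normalise`, `editDistance`, and `suggestMapping` are all assumptions of this sketch.

```typescript
interface TargetField {
  key: string;
  label: string;
}

/** Lowercase and keep letters/digits only: "E-mail Address" -> "emailaddress". */
function normalise(header: string): string {
  return header.toLowerCase().replace(/[^a-z0-9]/g, "");
}

/** Levenshtein edit distance, computed one row at a time. */
function editDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

/** Map each file header to the closest field key, or null if nothing is close. */
function suggestMapping(
  headers: string[],
  fields: TargetField[],
  maxDistance = 2,
): Record<string, string | null> {
  const mapping: Record<string, string | null> = {};
  for (const header of headers) {
    const h = normalise(header);
    let best: { key: string; distance: number } | null = null;
    for (const field of fields) {
      const distance = Math.min(
        editDistance(h, normalise(field.key)),
        editDistance(h, normalise(field.label)),
      );
      if (best === null || distance < best.distance) {
        best = { key: field.key, distance };
      }
    }
    mapping[header] = best !== null && best.distance <= maxDistance ? best.key : null;
  }
  return mapping;
}
```

Headers left as `null` would simply appear unmapped in the wizard for the user to assign or ignore; an abbreviation such as "Mooring No." is too far from `mooringNumber` for this metric, which is where a saved mapping helps.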
## Permissions & tenancy
Gate behind a new `data.import` permission (admin-tier). Every query +
write is `port_id`-scoped; FK resolution only matches within the port.
## Testing (TDD)
- **Per-adapter unit tests** (one suite each): column mapping, zod
validation (valid + each failure mode), `matchKey`, `resolveForeignKeys`
(hit / miss / ambiguous), `findExisting` dedup.
- **Dry-run classifier integration test** on a seeded DB: a fixture file
yielding one of each class (insert / update / skip / error).
- **Commit worker integration test**: each conflict policy; partial-failure
(valid rows land, errored rows reported); idempotent re-run.
- **Undo test**: deletes inserted rows; refuses when an inserted row has an
outside dependent.
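The undo rule exercised by the last test can be sketched as a pure planning function over `import_batch_rows`. This is an illustration under assumptions: `DependentRef` and `planUndo` are hypothetical, dependents are passed in rather than queried, and "reverse dependency order" is simplified to reverse row order within a single-entity batch.

```typescript
interface BatchRow {
  batchId: string;
  rowNumber: number;
  action: "inserted" | "updated" | "skipped" | "errored";
  entityId: string | null;
}

// A record that points at an imported row; null means created outside any import.
interface DependentRef {
  parentId: string;
  createdByBatchId: string | null;
}

type UndoPlan =
  | { ok: true; deleteIds: string[] }
  | { ok: false; blockedBy: string[] };

function planUndo(batchId: string, rows: BatchRow[], dependents: DependentRef[]): UndoPlan {
  // Only rows this batch inserted are candidates; updates are non-revertible in v1.
  const insertedIds = rows
    .filter((r) => r.batchId === batchId && r.action === "inserted" && r.entityId !== null)
    .sort((a, b) => b.rowNumber - a.rowNumber) // last inserted, first deleted
    .map((r) => r.entityId as string);

  const inserted = new Set(insertedIds);

  // Refuse if any inserted row has a dependent created outside this batch.
  const blocking = dependents
    .filter((d) => inserted.has(d.parentId) && d.createdByBatchId !== batchId)
    .map((d) => d.parentId);
  const blockedBy = Array.from(new Set(blocking));

  return blockedBy.length > 0
    ? { ok: false, blockedBy }
    : { ok: true, deleteIds: insertedIds };
}
```

Returning a plan instead of deleting directly keeps the refusal check testable without a database, and the caller can run the deletes in one transaction once the plan is `ok`.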
## Decisions locked (defaults the user approved 2026-06-01)
- Rollback depth: **inserts-only undo**; updates non-revertible in v1.
- Partial failure: **valid rows commit**, errors reported (not
all-or-nothing).
- Berths: **included** in the UI importer despite the existing CLI.
- All seven entities in scope.
- Purpose: one-time cutover migration (engine reusable for ongoing ops).