# Bulk CSV/XLSX Importer — Design Spec

> **Status:** Approved (2026-06-01) · ready for implementation plan
>
> **Driver:** Replace the static `admin/import` mockup with a real
> self-serve importer. Primary purpose: **one-time cutover migration**
> of legacy NocoDB/portal data into the new CRM at launch.
>
> **Tracker:** `docs/launch-readiness.md` · feature-completeness batch.

## Purpose & scope

A visual importer that ingests CSV/XLSX exports of the legacy system and
loads them into the CRM with column-mapping, dry-run preview, dedup, and
per-batch undo. Built for the cutover migration but engineered as a
reusable engine (it can serve ongoing ops later without a rewrite).

**In scope — seven entities**, imported in dependency order so foreign
keys resolve by natural key:

| #   | Entity          | Dedup match-key                                                  | FKs resolved by natural key             |
| --- | --------------- | ---------------------------------------------------------------- | --------------------------------------- |
| 1   | Companies       | `name` (case-insensitive)                                        | —                                       |
| 2   | Clients         | primary `email` → fallback canonical `phone`                     | —                                       |
| 3   | Yachts          | `name` + owner (or HIN if present)                               | owner → client email / company name     |
| 4   | Berths          | `mooringNumber` (canonical `^[A-Z]+\d+$`)                        | —                                       |
| 5   | Interests/deals | default **create-new** (flag likely dupes by client+berth+stage) | client → email, primary berth → mooring |
| 6   | Tenancies       | client + berth + `startDate`                                     | client → email, berth → mooring         |
| 7   | Expenses        | `date` + `amount` + `description` (or none)                      | —                                       |

Berths are included for UI consistency even though
`scripts/import-berths-from-nocodb.ts` already covers them via CLI.

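The match-keys in the table could be normalised along these lines before `findExisting` runs. This is a minimal sketch, not the adapters' actual code: the row shapes, the whitespace collapsing for company names, the digits-only phone canonicalisation, the upper-casing and hyphen/space stripping of mooring numbers, and the `email:`/`phone:` prefixes (which would tell a lookup which column to query) are all assumptions.

```ts
// Hypothetical row shapes; the real MappedRow type lives in the adapter code.
type CompanyRow = { name?: string | null };
type ClientRow = { email?: string | null; phone?: string | null };
type BerthRow = { mooringNumber?: string | null };

// Companies: case-insensitive name. Collapsing inner whitespace is an assumption.
function companyMatchKey(row: CompanyRow): string | null {
  const name = row.name?.trim().replace(/\s+/g, " ").toLowerCase();
  return name ? name : null;
}

// Assumed meaning of "canonical phone": digits only, keeping a leading "+".
function canonicalPhone(raw: string): string | null {
  const trimmed = raw.trim();
  const digits = trimmed.replace(/\D/g, "");
  if (digits.length === 0) return null;
  return trimmed.startsWith("+") ? `+${digits}` : digits;
}

// Clients: primary email, falling back to the canonical phone.
// The prefix records which column the key came from (assumption).
function clientMatchKey(row: ClientRow): string | null {
  const email = row.email?.trim().toLowerCase();
  if (email) return `email:${email}`;
  const phone = row.phone ? canonicalPhone(row.phone) : null;
  return phone ? `phone:${phone}` : null;
}

// Berths: the key must match ^[A-Z]+\d+$ after (assumed) normalisation.
const MOORING_RE = /^[A-Z]+\d+$/;
function berthMatchKey(row: BerthRow): string | null {
  const mooring = row.mooringNumber?.trim().toUpperCase().replace(/[\s-]/g, "");
  return mooring && MOORING_RE.test(mooring) ? mooring : null;
}
```

Returning `null` for an unusable key lets the dry-run report the row as an error instead of silently inserting a duplicate.
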
**Non-goals (v1):** full pre-update snapshot/revert of _updated_ rows
(undo covers inserts only); streaming multi-GB files (migration files
are small); scheduling/automation of imports; importing attachments/PDFs
(handled separately by the Initiative 5 MinIO backfill scripts).

## Architecture — generic engine + per-entity adapter registry

One pipeline parameterised by a per-entity **adapter**, mirroring the
existing `src/lib/reports/custom/registry.ts` and settings-registry
patterns.

`src/lib/import/registry.ts` exports `IMPORT_ENTITY_KEYS` and
`IMPORT_REGISTRY: Record<ImportEntityKey, ImportAdapter>`. Each adapter:

```ts
interface ImportAdapter {
  key: ImportEntityKey;
  label: string;
  order: number; // dependency order (companies=1 … expenses=7)
  dependsOn: ImportEntityKey[];
  /** Target fields drive the column-mapping UI + zod validation. */
  targetFields: ImportField[]; // { key, label, required, type, zod }
  /** Natural key used for dedup + as the FK-resolution lookup value. */
  matchKey: (row: MappedRow) => string | null;
  /** Resolve FK ids by natural key against the live DB. Returns ids or a
   * per-field resolution error. */
  resolveForeignKeys: (row: MappedRow, ctx: ImportCtx) => Promise<FkResult>;
  /** Dedup lookup — find an existing row by matchKey within the port. */
  findExisting: (portId: string, matchKey: string) => Promise<{ id: string } | null>;
  /** Writes delegate to the EXISTING service helpers so audit logging,
   * validation, and polymorphic-ownership rules come for free. */
  insert: (row: ResolvedRow, ctx: ImportCtx) => Promise<{ id: string }>;
  update: (existingId: string, row: ResolvedRow, ctx: ImportCtx) => Promise<void>;
}
```

Adding an entity = adding one adapter + registering it. No engine change.

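Because `order` and `dependsOn` both encode the dependency graph, they can drift apart as adapters are added. A minimal sketch of a check that could run when the registry is built, assuming simplified entity keys (e.g. `interests` for Interests/deals) and only the three adapter fields it needs; `AdapterMeta` and `validateImportOrder` are hypothetical names:

```ts
// Assumed key spellings; the real list is IMPORT_ENTITY_KEYS.
type ImportEntityKey =
  | "companies"
  | "clients"
  | "yachts"
  | "berths"
  | "interests"
  | "tenancies"
  | "expenses";

// Only the ImportAdapter fields this check needs.
interface AdapterMeta {
  key: ImportEntityKey;
  order: number;
  dependsOn: ImportEntityKey[];
}

// Returns a sorted copy (by `order`), throwing if a dependency is unregistered
// or would be imported at the same time as, or after, the entity that needs it.
function validateImportOrder(adapters: AdapterMeta[]): AdapterMeta[] {
  const byKey = new Map<ImportEntityKey, AdapterMeta>();
  for (const a of adapters) byKey.set(a.key, a);

  for (const a of adapters) {
    for (const dep of a.dependsOn) {
      const d = byKey.get(dep);
      if (!d) throw new Error(`${a.key} depends on unregistered entity ${dep}`);
      if (d.order >= a.order) {
        throw new Error(`${a.key} (order ${a.order}) must come after ${dep} (order ${d.order})`);
      }
    }
  }
  return [...adapters].sort((x, y) => x.order - y.order);
}
```

Running this once at module load (or in a unit test) would catch an adapter registered with an `order` that contradicts its `dependsOn`, before any import runs in the wrong sequence.
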
## Pipeline (BullMQ `import` queue, concurrency 1)

The queue + worker already exist (`src/lib/queue/workers/import.ts` is
currently a documented no-op). We replace the no-op body with the real
processor and add a producer.

1. **Upload & parse.** Drag-drop CSV/XLSX → parse (papaparse for CSV;
   **ExcelJS already installed** for XLSX) → raw rows. The uploaded file
   is stored via `getStorageBackend()` under a temp prefix so the worker
   can re-read it, and is cleaned up after commit or on expiry.
2. **Map columns.** Auto-suggest mappings by fuzzy header match to the
   adapter's `targetFields`; the user can override any suggestion and
   **save the mapping as a per-port template** (`import_mappings`) for re-runs.
3. **Dry-run (no writes).** Per row: apply mapping → zod-validate →
   `resolveForeignKeys` → `findExisting` → classify as
   `will-insert | will-update | will-skip | error(line, reason)`. Surface
   counts, a sample of rows, and a downloadable line-numbered error report.
4. **Commit.** The producer enqueues the job; the worker streams rows, applying
   the chosen **conflict policy** (`skip-matches` / `update-matches` /
   `error-on-match`) via the adapter's `insert`/`update`. A per-row try/catch
   lets valid rows land even when others fail; every action is recorded in
   `import_batch_rows`, and `import_batches` is updated with live progress
   and final counts.
5. **History + Undo.** Admin list of batches (status, counts, error-report
   download). **Undo** deletes the rows a batch _inserted_, in reverse
   dependency order, refusing if any inserted row now has dependents
   created outside the batch. Updates are marked non-revertible in v1.

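The dry-run classification in step 3 is a pure function of the earlier checks plus the chosen conflict policy, which makes it straightforward to unit-test. A minimal sketch, assuming the zod and FK failures have already been formatted as strings; `RowCheck`, `classifyRow` and `countByKind` are hypothetical names:

```ts
type ConflictPolicy = "skip-matches" | "update-matches" | "error-on-match";

type Classification =
  | { kind: "will-insert" }
  | { kind: "will-update"; existingId: string }
  | { kind: "will-skip"; existingId: string }
  | { kind: "error"; line: number; reason: string };

// What the earlier dry-run steps produced for one row (hypothetical shape).
interface RowCheck {
  line: number; // line number in the uploaded file
  validationErrors: string[]; // zod failures, already formatted
  fkErrors: string[]; // resolveForeignKeys failures
  existingId: string | null; // findExisting result; null means no match
}

function classifyRow(check: RowCheck, policy: ConflictPolicy): Classification {
  // Invalid or unresolvable rows are errors regardless of policy.
  const problems = [...check.validationErrors, ...check.fkErrors];
  if (problems.length > 0) {
    return { kind: "error", line: check.line, reason: problems.join("; ") };
  }
  if (check.existingId === null) return { kind: "will-insert" };
  // A matched row's outcome depends on the conflict policy.
  if (policy === "skip-matches") return { kind: "will-skip", existingId: check.existingId };
  if (policy === "update-matches") return { kind: "will-update", existingId: check.existingId };
  return { kind: "error", line: check.line, reason: `matches existing record ${check.existingId}` };
}

// Counts for the dry-run summary (new / update / skip / error).
function countByKind(results: Classification[]): Record<Classification["kind"], number> {
  const counts: Record<Classification["kind"], number> = {
    "will-insert": 0,
    "will-update": 0,
    "will-skip": 0,
    error: 0,
  };
  for (const r of results) counts[r.kind] += 1;
  return counts;
}
```

Keeping classification free of I/O means the same function can back both the dry-run preview and the commit worker's decision, so the preview cannot disagree with what commit actually does.
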
## Data model (3 new tables; no changes to entity tables)

- **`import_batches`** — `id, port_id, entity_type, filename, storage_key,
  status (uploaded|dry_run|committing|completed|failed|undone),
  total_rows, inserted, updated, skipped, errored, mapping_json,
  conflict_policy, created_by, created_at, completed_at`.
- **`import_batch_rows`** — `id, batch_id, row_number, action
  (inserted|updated|skipped|errored), entity_id (nullable), error
  (nullable)`. Powers the error report + undo. Migration-scale volume is
  fine.
- **`import_mappings`** — `id, port_id, entity_type, name, mapping_json,
  created_by, created_at`. Saved column mappings, reusable across runs.

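`import_batch_rows` holds what undo needs. A minimal sketch of the refusal check only, assuming a hypothetical `findDependents` callback that returns the ids of rows referencing an entity, and ids that are unique across tables (e.g. UUIDs). It deliberately does not decide the deletion order; that stays with the caller.

```ts
// Mirrors the import_batch_rows columns, camel-cased (hypothetical type).
interface BatchRow {
  rowNumber: number;
  action: "inserted" | "updated" | "skipped" | "errored";
  entityId: string | null;
}

type UndoPlan =
  | { ok: true; deleteIds: string[] }
  | { ok: false; blockedBy: { entityId: string; dependents: string[] }[] };

function planUndo(
  rows: BatchRow[],
  findDependents: (entityId: string) => string[], // hypothetical lookup
): UndoPlan {
  // Only inserts are revertible in v1; updated/skipped/errored rows are ignored.
  const inserted = rows
    .filter((r) => r.action === "inserted" && r.entityId !== null)
    .map((r) => r.entityId as string);
  const insertedSet = new Set(inserted);

  // A dependent blocks undo only if it was created outside this batch.
  const blockedBy = inserted
    .map((entityId) => ({
      entityId,
      dependents: findDependents(entityId).filter((d) => !insertedSet.has(d)),
    }))
    .filter((b) => b.dependents.length > 0);

  if (blockedBy.length > 0) return { ok: false, blockedBy };
  return { ok: true, deleteIds: inserted };
}
```

Returning the blockers rather than a bare boolean would let the History tab explain why an undo was refused.
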
The migration is added via the project's `psql`-applied numbered migration
flow; restart `next dev` afterwards (prepared-statement cache caveat per
CLAUDE.md).

## Validation, errors, conflict policy

- **Per-row zod** from each adapter's `targetFields`; failures collected
  with row number + field + message, never aborting the whole file.
- **Downloadable error report** (CSV: row, field, message) from any
  dry-run or completed batch.
- **Conflict policy** chosen per import, surfaced at the dry-run step
  (three distinct behaviours for a matched row):
  - `skip-matches` — insert new, leave matched rows untouched. Default;
    safe to re-run.
  - `update-matches` — insert new, overwrite matched rows with the file's
    values (correct earlier mistakes).
  - `error-on-match` — treat a match as a row error to review, importing
    nothing for it (strictest).

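The error report is a plain three-column CSV. A minimal sketch with RFC 4180-style quoting, so messages containing commas, quotes or newlines survive being opened in a spreadsheet; `RowError`, `csvCell` and `buildErrorReport` are hypothetical names, and sorting by row is an assumption:

```ts
interface RowError {
  row: number;
  field: string;
  message: string;
}

// Quote a cell when it contains a comma, quote, CR or LF; double embedded quotes.
function csvCell(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function buildErrorReport(errors: RowError[]): string {
  // Sorting by row keeps the report in the same order as the source file.
  const sorted = [...errors].sort((a, b) => a.row - b.row);
  const lines: (string | number)[][] = [
    ["row", "field", "message"],
    ...sorted.map((e) => [e.row, e.field, e.message]),
  ];
  // RFC 4180 uses CRLF line breaks.
  return lines.map((cols) => cols.map(csvCell).join(",")).join("\r\n") + "\r\n";
}
```

If messages echo raw cell values from the uploaded file, it may also be worth neutralising cells that begin with `=`, `+`, `-` or `@` before admins open the report in Excel (spreadsheet formula injection).
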
## UI

A 4-step wizard mirroring the existing **bulk-add-berths wizard**:

1. Pick entity (registry-driven, shown in dependency order with a hint) +
   upload file.
2. Map columns (auto-suggested; load a saved mapping; save current).
3. Dry-run preview — counts (new / update / skip / error), sample table,
   error-report download, pick conflict policy.
4. Commit — progress bar (worker reports % via batch counts) → result
   summary with link to History.

Plus an **Import History** tab: batch list + status + counts + error
report + **Undo**. Replaces the static mockup at
`src/app/(dashboard)/[portSlug]/admin/import/page.tsx`.

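The auto-suggested mapping in wizard step 2 needs some definition of a fuzzy header match. One simple option, sketched here as an assumption rather than the chosen algorithm, scores each header against every target field's key and label using normalised Levenshtein similarity, then assigns the best pairs greedily so no header or field is used twice. `TargetField`, `suggestMapping` and the 0.7 threshold are hypothetical:

```ts
interface TargetField {
  key: string;
  label: string;
} // subset of ImportField

// Lower-case and drop everything except letters and digits.
function normalise(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Classic dynamic-programming Levenshtein distance, one row at a time.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

// 1 = identical after normalisation, 0 = nothing in common.
function similarity(a: string, b: string): number {
  const x = normalise(a);
  const y = normalise(b);
  if (!x || !y) return 0;
  return 1 - levenshtein(x, y) / Math.max(x.length, y.length);
}

// Map header -> field key, best-scoring pairs first.
function suggestMapping(headers: string[], fields: TargetField[], threshold = 0.7): Map<string, string> {
  const pairs: { header: string; field: string; score: number }[] = [];
  for (const header of headers) {
    for (const f of fields) {
      const score = Math.max(similarity(header, f.key), similarity(header, f.label));
      if (score >= threshold) pairs.push({ header, field: f.key, score });
    }
  }
  pairs.sort((p, q) => q.score - p.score); // stable, so ties keep header order
  const mapping = new Map<string, string>();
  const usedFields = new Set<string>();
  for (const p of pairs) {
    if (mapping.has(p.header) || usedFields.has(p.field)) continue;
    mapping.set(p.header, p.field);
    usedFields.add(p.field);
  }
  return mapping;
}
```

A header such as `Phone Number` scores below the threshold against `phone`, so the user still confirms or overrides suggestions; saved mappings cover repeat files.
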
## Permissions & tenancy

Gate behind a new `data.import` permission (admin-tier). Every query +
write is `port_id`-scoped; FK resolution only matches within the port.

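Port scoping has to happen inside the FK lookup itself rather than afterwards, otherwise a client from another port could come back as a hit. A minimal sketch of an email-based client lookup returning the hit / miss / ambiguous outcomes the tests below cover; the in-memory array stands in for a port-scoped DB query, and `ClientRecord`, `FkResolution` and `resolveClientByEmail` are hypothetical names:

```ts
// Hypothetical in-memory stand-in for the clients table.
interface ClientRecord {
  id: string;
  portId: string;
  email: string;
}

type FkResolution =
  | { status: "hit"; id: string }
  | { status: "miss"; field: string; value: string }
  | { status: "ambiguous"; field: string; value: string; candidates: string[] };

function resolveClientByEmail(clients: ClientRecord[], portId: string, email: string): FkResolution {
  const needle = email.trim().toLowerCase();
  // Filter by port first, so another port's client can never match.
  const matches = clients.filter((c) => c.portId === portId && c.email.toLowerCase() === needle);
  if (matches.length === 1) return { status: "hit", id: matches[0].id };
  if (matches.length === 0) return { status: "miss", field: "client", value: email };
  return { status: "ambiguous", field: "client", value: email, candidates: matches.map((c) => c.id) };
}
```

Treating "ambiguous" as its own outcome, rather than picking the first match, surfaces legacy duplicates as row errors in the dry-run report.
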
## Testing (TDD)

- **Per-adapter unit tests** (one suite each): column mapping, zod
  validation (valid + each failure mode), `matchKey`, `resolveForeignKeys`
  (hit / miss / ambiguous), `findExisting` dedup.
- **Dry-run classifier integration test** on a seeded DB: a fixture file
  yielding one of each class (insert / update / skip / error).
- **Commit worker integration test**: each conflict policy; partial-failure
  (valid rows land, errored rows reported); idempotent re-run.
- **Undo test**: deletes inserted rows; refuses when an inserted row has an
  outside dependent.

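The commit-worker properties listed above (per-row failure isolation, idempotent re-run under `skip-matches`) can be exercised without Postgres or BullMQ. A minimal in-memory harness, assuming a simplified client row keyed by email; it is synchronous for brevity, whereas the real adapter methods return Promises. `MemoryClients` and `commit` are hypothetical names:

```ts
type ConflictPolicy = "skip-matches" | "update-matches" | "error-on-match";
type Action = "inserted" | "updated" | "skipped" | "errored";

interface Row {
  line: number;
  email: string;
  name: string;
}
interface LogEntry {
  line: number;
  action: Action;
  error?: string;
}

// Hypothetical in-memory adapter mirroring findExisting / insert / update.
class MemoryClients {
  private records: { id: string; email: string; name: string }[] = [];

  findExisting(email: string): { id: string } | null {
    const hit = this.records.find((r) => r.email === email);
    return hit ? { id: hit.id } : null;
  }

  insert(row: Row): { id: string } {
    if (!row.name) throw new Error("name is required");
    const id = `c${this.records.length + 1}`;
    this.records.push({ id, email: row.email, name: row.name });
    return { id };
  }

  update(existingId: string, row: Row): void {
    const hit = this.records.find((r) => r.id === existingId);
    if (hit) hit.name = row.name;
  }

  get size(): number {
    return this.records.length;
  }
}

// Per-row try/catch: a failing row is logged as errored and the loop moves on.
function commit(rows: Row[], db: MemoryClients, policy: ConflictPolicy): LogEntry[] {
  const log: LogEntry[] = [];
  for (const row of rows) {
    try {
      const existing = db.findExisting(row.email);
      if (!existing) {
        db.insert(row);
        log.push({ line: row.line, action: "inserted" });
      } else if (policy === "skip-matches") {
        log.push({ line: row.line, action: "skipped" });
      } else if (policy === "update-matches") {
        db.update(existing.id, row);
        log.push({ line: row.line, action: "updated" });
      } else {
        throw new Error(`matches existing record ${existing.id}`);
      }
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      log.push({ line: row.line, action: "errored", error: message });
    }
  }
  return log;
}
```

Running the same file twice under `skip-matches` should leave the second run with only `skipped` and `errored` actions and no new records, which is the idempotence the integration test asserts.
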
## Decisions locked (defaults the user approved 2026-06-01)

- Rollback depth: **inserts-only undo**; updates non-revertible in v1.
- Partial failure: **valid rows commit**, errors reported (not
  all-or-nothing).
- Berths: **included** in the UI importer despite the existing CLI.
- All seven entities in scope.
- Purpose: one-time cutover migration (engine reusable for ongoing ops).