Ship-what's-done prep ahead of the prod cutover (launch ~today):

- Hide Financial + Marketing report cards from the reports landing (both were "Builder in development" placeholders gated on unbuilt data sources). Sales/Operational/Custom + templates/scheduling/exports remain live.
- Trim the Custom-report card copy to match the shipped basic builder (no group-by/filters yet; the builder page header was already honest).
- Hide the Bulk Import mockup from search-nav-catalog + the admin sections browser; /admin/import is now unreachable from the UI.
- Correct client-facing doc over-claims (waiting-list "next-in-line notification", Import) in features-list.md + new-system-feature-summary.md.
- Un-stale BACKLOG.md (Documenso phases 2-7 confirmed shipped).
- Log decisions + deferred work (full importer, full custom-builder, waiting-list, maintenance-log, paper-upload bug) to launch-readiness.md.

Deferred-importer design spec added at docs/superpowers/specs/2026-06-01-bulk-import-design.md.

Verified: tsc --noEmit clean, eslint clean on changed files, 1512/1519 vitest pass (7 failures are Redis-down, unrelated).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bulk CSV/XLSX Importer — Design Spec
Status: Approved (2026-06-01) · ready for implementation plan
Driver: Replace the static admin/import mockup with a real self-serve importer. Primary purpose: one-time cutover migration of legacy NocoDB/portal data into the new CRM at launch.
Tracker: docs/launch-readiness.md · feature-completeness batch.
Purpose & scope
A visual importer that ingests CSV/XLSX exports of the legacy system and loads them into the CRM with column-mapping, dry-run preview, dedup, and per-batch undo. Built for the cutover migration but engineered as a reusable engine (it can serve ongoing ops later without a rewrite).
In scope — seven entities, imported in dependency order so foreign keys resolve by natural key:
| # | Entity | Dedup match-key | FKs resolved by natural key |
|---|---|---|---|
| 1 | Companies | name (case-insensitive) | none |
| 2 | Clients | primary email → fallback canonical phone | none |
| 3 | Yachts | name + owner (or HIN if present) | owner → client email / company name |
| 4 | Berths | mooringNumber (canonical ^[A-Z]+\d+$) | none |
| 5 | Interests/deals | default create-new (flag likely dupes by client+berth+stage) | client → email, primary berth → mooring |
| 6 | Tenancies | client + berth + startDate | client → email, berth → mooring |
| 7 | Expenses | date + amount + description (or none) | none |
Berths are included for UI consistency even though
scripts/import-berths-from-nocodb.ts already covers them via CLI.
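To make the match-key column concrete, here is a minimal sketch of three of the keys from the table. The helper names are hypothetical, the digits-only phone form is only a stand-in for whatever "canonical phone" means in the codebase, and uppercasing the mooring number before testing it is an assumption rather than something the spec states.

```typescript
// Hypothetical helpers; names and normalisation details are illustrative.

/** Companies: case-insensitive name match. */
function companyMatchKey(name: string): string | null {
  const key = name.trim().toLowerCase();
  return key === "" ? null : key;
}

/** Clients: primary email first, then a phone fallback.
 *  ASSUMPTION: "canonical phone" is approximated as digits only. */
function clientMatchKey(row: { email?: string; phone?: string }): string | null {
  const email = row.email?.trim().toLowerCase();
  if (email) return `email:${email}`;
  const digits = (row.phone ?? "").replace(/\D/g, "");
  return digits ? `phone:${digits}` : null;
}

/** Berths: the key must fit the canonical ^[A-Z]+\d+$ shape from the table.
 *  ASSUMPTION: input is trimmed and uppercased before the test. */
const MOORING_RE = /^[A-Z]+\d+$/;
function berthMatchKey(mooringNumber: string): string | null {
  const key = mooringNumber.trim().toUpperCase();
  return MOORING_RE.test(key) ? key : null;
}
```

Returning null rather than throwing lets the caller decide whether a row without a usable key is an error or a plain create-new.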
Non-goals (v1): full pre-update snapshot/revert of updated rows (undo covers inserts only); streaming multi-GB files (migration files are small); scheduling/automation of imports; importing attachments/PDFs (handled by the Initiative 5 MinIO backfill scripts, separate).
Architecture — generic engine + per-entity adapter registry
One pipeline parameterised by a per-entity adapter, mirroring the
existing src/lib/reports/custom/registry.ts and settings-registry
patterns.
src/lib/import/registry.ts exports IMPORT_ENTITY_KEYS and
IMPORT_REGISTRY: Record<ImportEntityKey, ImportAdapter>. Each adapter:
    interface ImportAdapter {
      key: ImportEntityKey;
      label: string;
      order: number; // dependency order (companies=1 … expenses=7)
      dependsOn: ImportEntityKey[];
      /** Target fields drive the column-mapping UI + zod validation. */
      targetFields: ImportField[]; // { key, label, required, type, zod }
      /** Natural key used for dedup + as the FK-resolution lookup value. */
      matchKey: (row: MappedRow) => string | null;
      /** Resolve FK ids by natural key against the live DB. Returns ids or a
       * per-field resolution error. */
      resolveForeignKeys: (row: MappedRow, ctx: ImportCtx) => Promise<FkResult>;
      /** Dedup lookup: find an existing row by matchKey within the port. */
      findExisting: (portId: string, matchKey: string) => Promise<{ id: string } | null>;
      /** Writes delegate to the EXISTING service helpers so audit logging,
       * validation, and polymorphic-ownership rules come for free. */
      insert: (row: ResolvedRow, ctx: ImportCtx) => Promise<{ id: string }>;
      update: (existingId: string, row: ResolvedRow, ctx: ImportCtx) => Promise<void>;
    }
Adding an entity = adding one adapter + registering it. No engine change.
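As a rough illustration of how order and dependsOn could be checked at registration time, the sketch below keeps only those two adapter fields. The key strings and the dependsOn lists are assumptions read off the FK column of the scope table, and importOrder is a hypothetical helper, not an existing export.

```typescript
// ASSUMPTION: the spec does not list the literal key strings.
type ImportEntityKey =
  | "companies" | "clients" | "yachts" | "berths"
  | "interests" | "tenancies" | "expenses";

// Trimmed-down adapter shape: only the ordering fields.
interface AdapterMeta {
  key: ImportEntityKey;
  order: number;
  dependsOn: ImportEntityKey[];
}

const REGISTRY_META: Record<ImportEntityKey, AdapterMeta> = {
  companies: { key: "companies", order: 1, dependsOn: [] },
  clients: { key: "clients", order: 2, dependsOn: [] },
  yachts: { key: "yachts", order: 3, dependsOn: ["clients", "companies"] },
  berths: { key: "berths", order: 4, dependsOn: [] },
  interests: { key: "interests", order: 5, dependsOn: ["clients", "berths"] },
  tenancies: { key: "tenancies", order: 6, dependsOn: ["clients", "berths"] },
  expenses: { key: "expenses", order: 7, dependsOn: [] },
};

/** Returns keys sorted by `order`, throwing if any adapter is ordered
 *  at or before something it depends on. */
function importOrder(registry: Record<ImportEntityKey, AdapterMeta>): ImportEntityKey[] {
  const metas = Object.values(registry).sort((a, b) => a.order - b.order);
  for (const meta of metas) {
    for (const dep of meta.dependsOn) {
      if (registry[dep].order >= meta.order) {
        throw new Error(`${meta.key} must be ordered after ${dep}`);
      }
    }
  }
  return metas.map((m) => m.key);
}
```

A check like this would catch a mis-numbered adapter when the registry is loaded instead of at FK-resolution time during an import.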
Pipeline (BullMQ import queue, concurrency 1)
The queue + worker already exist (src/lib/queue/workers/import.ts is
currently a documented no-op). We replace the no-op body with the real
processor and add a producer.
- Upload & parse. Drag-drop CSV/XLSX → parse (papaparse for CSV; ExcelJS already installed for XLSX) → raw rows. The uploaded file is stored via getStorageBackend() under a temp prefix so the worker can re-read it; cleaned up after commit or on expiry.
- Map columns. Auto-suggest mappings by fuzzy header match to the adapter's targetFields; user overrides; save mapping as a per-port template (import_mappings) for re-runs.
- Dry-run (no writes). Per row: apply mapping → zod-validate → resolveForeignKeys → findExisting → classify as will-insert | will-update | will-skip | error(line, reason). Surface counts + a sample of rows + a downloadable line-numbered error report.
- Commit. Producer enqueues the job; the worker streams rows applying the chosen conflict policy (skip-matches / update-matches / error-on-match) via the adapter's insert/update. Per-row try/catch so valid rows still land; every action recorded in import_batch_rows; import_batches updated with live progress + final counts.
- History + Undo. Admin list of batches (status, counts, error-report download). Undo deletes the rows a batch inserted, in reverse dependency order, refusing if any inserted row now has dependents created outside the batch. Updates are marked non-revertible in v1.
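The dry-run classification step can be sketched as a pure function over the per-row results. This assumes validation, FK resolution, and the findExisting lookup have already run and that the conflict policy is known at classification time; DryRunInput, classifyRow, and countByKind are hypothetical names.

```typescript
type ConflictPolicy = "skip-matches" | "update-matches" | "error-on-match";

type RowClass =
  | { kind: "will-insert" }
  | { kind: "will-update"; existingId: string }
  | { kind: "will-skip"; existingId: string }
  | { kind: "error"; line: number; reason: string };

// Hypothetical per-row summary of the earlier pipeline steps.
interface DryRunInput {
  line: number;
  validationError: string | null; // first validation failure, if any
  fkError: string | null;         // FK resolution failure, if any
  existingId: string | null;      // result of the dedup lookup
}

function classifyRow(input: DryRunInput, policy: ConflictPolicy): RowClass {
  const failure = input.validationError ?? input.fkError;
  if (failure !== null) return { kind: "error", line: input.line, reason: failure };
  if (input.existingId === null) return { kind: "will-insert" };
  switch (policy) {
    case "skip-matches":
      return { kind: "will-skip", existingId: input.existingId };
    case "update-matches":
      return { kind: "will-update", existingId: input.existingId };
    case "error-on-match":
      return { kind: "error", line: input.line, reason: "matches an existing row" };
  }
}

/** Counts per class, as surfaced in the dry-run preview. */
function countByKind(classes: RowClass[]): Record<RowClass["kind"], number> {
  const counts: Record<RowClass["kind"], number> = {
    "will-insert": 0, "will-update": 0, "will-skip": 0, error: 0,
  };
  for (const c of classes) counts[c.kind] += 1;
  return counts;
}
```

Keeping the classifier free of I/O means the dry-run and the commit worker could share it, which helps the preview counts and the final counts agree.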
Data model (3 new tables; no changes to entity tables)
- import_batches: id, port_id, entity_type, filename, storage_key, status (uploaded|dry_run|committing|completed|failed|undone), total_rows, inserted, updated, skipped, errored, mapping_json, conflict_policy, created_by, created_at, completed_at.
- import_batch_rows: id, batch_id, row_number, action (inserted|updated|skipped|errored), entity_id (nullable), error (nullable). Powers the error report + undo. Migration-scale volume is fine.
- import_mappings: id, port_id, entity_type, name, mapping_json, created_by, created_at. Saved column mappings, reusable across runs.
Migration added via the project's psql-applied numbered migration flow;
restart next dev after (prepared-statement cache caveat per CLAUDE.md).
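The count columns on import_batches are enough to derive a progress figure from a single row. The sketch below assumes camelCase field names on the application side (the columns themselves are snake_case) and a hypothetical progressPercent helper; it is not taken from the codebase.

```typescript
// Status values as listed for import_batches.status.
type BatchStatus =
  | "uploaded" | "dry_run" | "committing"
  | "completed" | "failed" | "undone";

// ASSUMPTION: camelCase mirror of the count columns.
interface ImportBatchCounts {
  status: BatchStatus;
  totalRows: number;
  inserted: number;
  updated: number;
  skipped: number;
  errored: number;
}

/** Percentage of rows the worker has processed so far, clamped to 0..100. */
function progressPercent(batch: ImportBatchCounts): number {
  if (batch.totalRows <= 0) return 0;
  const processed = batch.inserted + batch.updated + batch.skipped + batch.errored;
  return Math.min(100, Math.floor((processed / batch.totalRows) * 100));
}
```

Errored rows count as processed here, so a batch with failures still reaches 100% once every row has been attempted.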
Validation, errors, conflict policy
- Per-row zod from each adapter's targetFields; failures collected with row number + field + message, never aborting the whole file.
- Downloadable error report (CSV: row, field, message) from any dry-run or completed batch.
- Conflict policy chosen per import, surfaced at the dry-run step (three distinct behaviours for a matched row):
  - skip-matches: insert new, leave matched rows untouched. Default; safe to re-run.
  - update-matches: insert new, overwrite matched rows with the file's values (correct earlier mistakes).
  - error-on-match: treat a match as a row error to review, importing nothing for it (strictest).
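A minimal sketch of the collect-then-report behaviour follows. To stay self-contained it uses hand-written field checks in place of the zod schemas; FieldCheck, collectErrors, and errorReportCsv are hypothetical names, and numbering data rows from 2 (header on line 1) is an assumption.

```typescript
interface RowError { row: number; field: string; message: string; }

// Stand-in for a per-field schema: returns an error message or null.
interface FieldCheck {
  field: string;
  check: (value: string | undefined) => string | null;
}

/** Runs every check on every row and never stops at the first failure. */
function collectErrors(
  rows: Array<Record<string, string | undefined>>,
  checks: FieldCheck[],
  firstRowNumber = 2, // ASSUMPTION: line 1 is the header row
): RowError[] {
  const errors: RowError[] = [];
  rows.forEach((row, i) => {
    for (const { field, check } of checks) {
      const message = check(row[field]);
      if (message !== null) errors.push({ row: firstRowNumber + i, field, message });
    }
  });
  return errors;
}

/** Quotes a cell only when it contains a comma, quote, or line break. */
function csvCell(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

/** Builds the row, field, message report as CSV text. */
function errorReportCsv(errors: RowError[]): string {
  const lines: Array<Array<string | number>> = [
    ["row", "field", "message"],
    ...errors.map((e) => [e.row, e.field, e.message]),
  ];
  return lines.map((cells) => cells.map(csvCell).join(",")).join("\r\n");
}
```

Because errors are plain data, the same list can feed both the on-screen sample and the downloadable report.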
UI
A 4-step wizard mirroring the existing bulk-add-berths wizard:
- Pick entity (registry-driven, shown in dependency order with a hint) + upload file.
- Map columns (auto-suggested; load a saved mapping; save current).
- Dry-run preview — counts (new / update / skip / error), sample table, error-report download, pick conflict policy.
- Commit — progress bar (worker reports % via batch counts) → result summary with link to History.
Plus an Import History tab: batch list + status + counts + error
report + Undo. Replaces the static mockup at
src/app/(dashboard)/[portSlug]/admin/import/page.tsx.
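The auto-suggested mappings in the wizard's Map columns step could start from something as simple as the sketch below. It is a normalise-and-compare heuristic standing in for whichever fuzzy matcher is eventually chosen; suggestMapping and the TargetField shape are hypothetical.

```typescript
interface TargetField { key: string; label: string; }

/** Lowercases and drops everything except letters and digits. */
const normalise = (s: string): string => s.toLowerCase().replace(/[^a-z0-9]/g, "");

/** Loose match: one normalised string contains the other (min 3 chars). */
const overlaps = (a: string, b: string): boolean =>
  a.length >= 3 && b.length >= 3 && (a.includes(b) || b.includes(a));

/** Maps each file header to a target field key, or null when nothing fits. */
function suggestMapping(
  headers: string[],
  fields: TargetField[],
): Record<string, string | null> {
  const mapping: Record<string, string | null> = {};
  for (const h of headers) mapping[h] = null;
  const used = new Set<string>();

  const assign = (matches: (header: string, field: string) => boolean): void => {
    for (const h of headers) {
      if (mapping[h] !== null) continue;
      const n = normalise(h);
      const hit = fields.find(
        (f) => !used.has(f.key) && (matches(n, normalise(f.key)) || matches(n, normalise(f.label))),
      );
      if (hit) {
        mapping[h] = hit.key;
        used.add(hit.key);
      }
    }
  };

  assign((a, b) => a === b); // pass 1: exact matches claim their fields first
  assign(overlaps);          // pass 2: loose matches for whatever is left
  return mapping;
}
```

Running the exact pass first keeps a loosely matching early header from taking a field that a later header matches exactly. Unmapped headers stay null so the user can resolve them by hand.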
Permissions & tenancy
Gate behind a new data.import permission (admin-tier). Every query +
write is port_id-scoped; FK resolution only matches within the port.
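To illustrate port-scoped FK resolution, the sketch below resolves a client by email against an in-memory list that stands in for a port_id-filtered query. ClientRef, FkResolution, and resolveClientByEmail are hypothetical names, and the hit / miss / ambiguous outcomes mirror the cases named in the testing plan.

```typescript
interface ClientRef { id: string; portId: string; email: string; }

type FkResolution =
  | { ok: true; id: string }
  | { ok: false; reason: "miss" | "ambiguous" };

/** Resolves a client id by email, considering only rows in the given port.
 *  ASSUMPTION: the array stands in for a port-scoped database query. */
function resolveClientByEmail(
  clients: ClientRef[],
  portId: string,
  email: string,
): FkResolution {
  const wanted = email.trim().toLowerCase();
  const hits = clients.filter(
    (c) => c.portId === portId && c.email.toLowerCase() === wanted,
  );
  const [only] = hits;
  if (hits.length === 1 && only) return { ok: true, id: only.id };
  return { ok: false, reason: hits.length === 0 ? "miss" : "ambiguous" };
}
```

A client with the same email in another port is invisible to the lookup, so it can neither satisfy nor confuse the resolution.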
Testing (TDD)
- Per-adapter unit tests (one suite each): column mapping, zod validation (valid + each failure mode), matchKey, resolveForeignKeys (hit / miss / ambiguous), findExisting dedup.
- Dry-run classifier integration test on a seeded DB: a fixture file yielding one of each class (insert / update / skip / error).
- Commit worker integration test: each conflict policy; partial-failure (valid rows land, errored rows reported); idempotent re-run.
- Undo test: deletes inserted rows; refuses when an inserted row has an outside dependent.
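The undo rule exercised by the last test can be sketched as a planning function that either returns a delete order or names the rows blocking the undo. InsertedRow, Dependent, and planUndo are hypothetical, and the sketch assumes row ids are unique across entities (for example UUIDs).

```typescript
// A row the batch inserted, tagged with its entity's dependency order.
interface InsertedRow { entity: string; order: number; id: string; }

// An existing child row that references a parent row by id.
interface Dependent { parentId: string; childId: string; }

type UndoPlan =
  | { ok: true; deleteOrder: InsertedRow[] }
  | { ok: false; blockedBy: Dependent[] };

/** Refuses when any inserted row has a dependent created outside the batch;
 *  otherwise returns the rows in reverse dependency order. */
function planUndo(inserted: InsertedRow[], dependents: Dependent[]): UndoPlan {
  const batchIds = new Set(inserted.map((r) => r.id));
  const blockedBy = dependents.filter(
    (d) => batchIds.has(d.parentId) && !batchIds.has(d.childId),
  );
  if (blockedBy.length > 0) return { ok: false, blockedBy };
  // Highest order first, so children are deleted before their parents.
  const deleteOrder = [...inserted].sort((a, b) => b.order - a.order);
  return { ok: true, deleteOrder };
}
```

Returning the blocking rows rather than a bare refusal would let the History tab explain why a batch cannot be undone.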
Decisions locked (defaults the user approved 2026-06-01)
- Rollback depth: inserts-only undo; updates non-revertible in v1.
- Partial failure: valid rows commit, errors reported (not all-or-nothing).
- Berths: included in the UI importer despite the existing CLI.
- All seven entities in scope.
- Purpose: one-time cutover migration (engine reusable for ongoing ops).