feat(berths): per-berth PDF storage (versioned) + reverse parser

Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.

Schema (migration 0030_berth_pdf_versions):
  - new table `berth_pdf_versions` with monotonic `version_number` per
    berth, `storage_key` (renamed convention from §4.7a), sha256, size,
    `download_url_expires_at` cache slot for §11.1 signed-URL throttling,
    and `parse_results` jsonb for the audit trail.
  - new column `berths.current_pdf_version_id` (deferred from Phase 0)
    with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
  - relations + types exported from `schema/berths.ts`.
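The monotonic per-berth `version_number` allocation can be pictured with a small in-memory sketch. This is a hypothetical stand-in for the `SELECT max(version_number) + 1` the real service runs inside a transaction; `VersionRow` and `nextVersionNumber` are illustrative names, not code from the repo:

```typescript
// Hypothetical in-memory model of per-berth version allocation.
interface VersionRow {
  berthId: string;
  versionNumber: number;
}

function nextVersionNumber(rows: VersionRow[], berthId: string): number {
  const max = rows
    .filter((r) => r.berthId === berthId)
    .reduce((acc, r) => Math.max(acc, r.versionNumber), 0);
  return max + 1; // the first upload for a berth gets version 1
}
```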

3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
  1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
     `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
     fields, so this is defensive coverage for future templates.
  2. OCR via Tesseract.js — positional/regex heuristics keyed off the
     §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
     `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
     per-field confidence + global mean; flags imperial-vs-metric drift
     >1% in `warnings`.
  3. AI fallback — gated via `getResolvedOcrConfig()` (existing
     openai/claude provider). Surfaced from the diff dialog only when
     `shouldOfferAiTier()` returns true (mean OCR confidence below
     0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
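The tier-3 gate and the drift check above can be sketched as two pure functions. `shouldOfferAiTier` exists in the codebase but its signature here is assumed; the 0.55 threshold and the >1% drift rule are from this commit, and the foot-to-metre conversion is standard:

```typescript
// Assumed signature: the AI tier is only surfaced when mean OCR
// confidence falls below the 0.55 threshold.
const AI_TIER_THRESHOLD = 0.55;

function shouldOfferAiTier(meanOcrConfidence: number): boolean {
  return meanOcrConfidence < AI_TIER_THRESHOLD;
}

// Flags pairs like "32 ft / 9.75 m" whose converted values disagree by >1%.
function imperialMetricDrift(feet: number, metres: number): boolean {
  const expectedMetres = feet * 0.3048;
  return Math.abs(expectedMetres - metres) / expectedMetres > 0.01;
}
```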

Service layer (`lib/services/berth-pdf.service.ts`):
  - `uploadBerthPdf()` — magic-byte check, size cap, version-number
    bump + current pointer in one transaction.
  - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
    flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
    columns; warns on mooring-number-in-PDF mismatch (§14.6).
  - `applyParseResults()` — hard allowlist of writable columns;
    stamps `appliedFields` onto `parse_results` for audit.
  - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
  - `listBerthPdfVersions()` — version list with 15-min signed URLs.
  - `getMaxUploadMb()` — port-override → global → default 15 lookup
    on `system_settings.berth_pdf_max_upload_mb`.
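The ±1% tolerance that `reconcilePdfWithBerth()` applies to numeric columns might look like the following. This is a minimal sketch; `numericallyEqual` is an illustrative name, and normalizing by the larger magnitude is an assumption about how the real comparison is written:

```typescript
// Hypothetical sketch of the ±1% equality used when deciding whether a
// CRM value and a PDF value "agree" on a numeric column.
function numericallyEqual(crmValue: number, pdfValue: number, tolerance = 0.01): boolean {
  if (crmValue === pdfValue) return true;
  const base = Math.max(Math.abs(crmValue), Math.abs(pdfValue));
  return Math.abs(crmValue - pdfValue) / base <= tolerance;
}
```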

§14.6 critical mitigations:
  - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
    storage object and rejects the request.
  - Size cap from `system_settings.berth_pdf_max_upload_mb` (default
    15 MB); enforced in the upload-url presign AND server-side.
  - 0-byte uploads rejected.
  - Mooring-number mismatch surfaces as a `warnings[]` entry on the
    reconcile result so the rep sees it in the diff dialog.
  - Imperial vs metric ±1% tolerance in both the parser warnings and
    the reconcile equality check.
  - Path traversal already blocked at the storage layer (Phase 6a).
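The magic-byte check reduces to a five-byte prefix comparison. A minimal sketch, assuming only what the commit states (the `%PDF-` prefix and outright rejection of 0-byte uploads); the function name is illustrative:

```typescript
// "%PDF-" as raw bytes: 0x25 0x50 0x44 0x46 0x2d.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

function looksLikePdf(bytes: Uint8Array): boolean {
  if (bytes.length === 0) return false; // 0-byte uploads are rejected outright
  return PDF_MAGIC.every((b, i) => bytes[i] === b);
}
```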

API + UI:
  - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
    HMAC-signed proxy URL (filesystem) sized to the per-port cap.
  - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
    `backend.head()`, writes the row, bumps `current_pdf_version_id`.
  - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
  - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
  - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
    rep-confirmed diff payload.
  - New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
    with current-PDF panel, version history, Replace PDF button, and
    `<PdfReconcileDialog>` for the auto-applied + conflicts UX.
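From the browser's side, the endpoints above compose into a presign → PUT → register sequence. The sketch below is hypothetical client code: the endpoint paths match the routes listed here, but the response shapes (`data.url`, `data.method`, `data.storageKey`) are inferred from the handlers, not guaranteed API:

```typescript
// Illustrative helper for the two routes involved in an upload.
function pdfEndpoints(berthId: string) {
  return {
    presign: `/api/v1/berths/${berthId}/pdf-upload-url`,
    versions: `/api/v1/berths/${berthId}/pdf-versions`,
  };
}

// Hypothetical client-side upload flow; response shapes are assumed.
async function uploadBerthPdfFromBrowser(
  berthId: string,
  file: Blob & { name: string },
  sha256: string,
) {
  const { presign, versions } = pdfEndpoints(berthId);
  // 1. Ask the server for a presigned (or HMAC-proxy) upload URL.
  const presignRes = await fetch(presign, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ fileName: file.name, sizeBytes: file.size }),
  });
  const { data } = await presignRes.json();
  // 2. PUT the bytes straight to the storage backend.
  await fetch(data.url, {
    method: data.method,
    body: file,
    headers: { 'Content-Type': 'application/pdf' },
  });
  // 3. Register the version row; the server re-verifies via backend.head().
  await fetch(versions, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      storageKey: data.storageKey,
      fileName: file.name,
      fileSizeBytes: file.size,
      sha256,
    }),
  });
}
```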

System settings:
  - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
    + server-side validation. Resolved port-override → global → default.
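The port-override → global → default resolution is a simple nullish cascade. A sketch under stated assumptions: the two lookup tables stand in for `system_settings` rows, and only the default of 15 MB is taken from the commit itself:

```typescript
const DEFAULT_MAX_UPLOAD_MB = 15;

// Hypothetical in-memory model of the setting lookup; the real service
// reads `system_settings.berth_pdf_max_upload_mb` from the database.
function resolveMaxUploadMb(
  portOverrides: Map<string, number>,
  globalSetting: number | undefined,
  portId: string,
): number {
  return portOverrides.get(portId) ?? globalSetting ?? DEFAULT_MAX_UPLOAD_MB;
}
```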

Tests:
  - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
    feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
    drift warning, AI-tier gate.
  - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
    pdf-lib AcroForm round-trip.
  - `tests/integration/berth-pdf-versions.test.ts` — upload, version-
    number bump, magic-byte rejection, reconcile auto-applied vs
    conflicts vs ±1% tolerance, mooring-number warning,
    applyParseResults allowlist enforcement, rollback semantics.

Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
commit 249ffe3e4a (parent 83693dd993)
Author: Matt Ciaccio
Date: 2026-05-05 03:34:24 +02:00
22 changed files with 13349 additions and 0 deletions

@@ -0,0 +1,70 @@
/**
 * Returns a presigned URL the browser can use to PUT a PDF directly to the
 * active storage backend. The URL is constrained by content-length-range up
 * to `system_settings.berth_pdf_max_upload_mb` (default 15 MB) per §11.1.
 *
 * For S3 backends this is a true signed URL; for filesystem backends it's a
 * CRM-internal proxy URL with an HMAC token (see `FilesystemBackend`).
 */
import { NextResponse } from 'next/server';
import { type RouteHandler } from '@/lib/api/helpers';
import { db } from '@/lib/db';
import { berths } from '@/lib/db/schema/berths';
import { eq } from 'drizzle-orm';
import { errorResponse, NotFoundError, ValidationError } from '@/lib/errors';
import { getMaxUploadMb } from '@/lib/services/berth-pdf.service';
import { getStorageBackend } from '@/lib/storage';

interface PostBody {
  fileName: string;
  /** Size hint in bytes — used to early-reject oversized uploads before we
   * burn a presigned URL. */
  sizeBytes?: number;
}

export const postHandler: RouteHandler = async (req, _ctx, params) => {
  try {
    const body = (await req.json()) as Partial<PostBody>;
    const fileName = (body.fileName ?? '').trim();
    if (!fileName) throw new ValidationError('fileName is required');

    const berthRow = await db.query.berths.findFirst({ where: eq(berths.id, params.id!) });
    if (!berthRow) throw new NotFoundError('Berth');

    const maxMb = await getMaxUploadMb(berthRow.portId);
    const maxBytes = maxMb * 1024 * 1024;
    if (typeof body.sizeBytes === 'number' && body.sizeBytes > maxBytes) {
      throw new ValidationError(
        `File exceeds ${maxMb} MB upload cap (got ${(body.sizeBytes / 1024 / 1024).toFixed(1)} MB).`,
      );
    }

    // Provisional version number: the actual row insert happens in POST
    // /pdf-versions and re-computes via SELECT max+1 inside a transaction,
    // so a race between two reps just shifts which one wins the version
    // slot. The storage key is UUID-namespaced (`crypto.randomUUID()`), so
    // collisions in the storage layer are practically impossible.
    const sanitized = fileName.replace(/[^a-zA-Z0-9._-]/g, '_').slice(0, 200) || 'berth.pdf';
    const storageKey = `berths/${params.id!}/uploads/${crypto.randomUUID()}_${sanitized}`;

    const backend = await getStorageBackend();
    const presigned = await backend.presignUpload(storageKey, {
      contentType: 'application/pdf',
      expirySeconds: 900,
    });

    return NextResponse.json({
      data: {
        url: presigned.url,
        method: presigned.method,
        storageKey,
        maxBytes,
        backend: backend.name,
      },
    });
  } catch (error) {
    return errorResponse(error);
  }
};

@@ -0,0 +1,5 @@
import { withAuth, withPermission } from '@/lib/api/helpers';
import { postHandler } from './handlers';
export const POST = withAuth(withPermission('berths', 'edit', postHandler));

@@ -0,0 +1,14 @@
import { NextResponse } from 'next/server';
import { type RouteHandler } from '@/lib/api/helpers';
import { errorResponse } from '@/lib/errors';
import { rollbackToVersion } from '@/lib/services/berth-pdf.service';
export const postHandler: RouteHandler = async (_req, _ctx, params) => {
  try {
    const result = await rollbackToVersion(params.id!, params.versionId!);
    return NextResponse.json({ data: result });
  } catch (error) {
    return errorResponse(error);
  }
};

@@ -0,0 +1,5 @@
import { withAuth, withPermission } from '@/lib/api/helpers';
import { postHandler } from './handlers';
export const POST = withAuth(withPermission('berths', 'edit', postHandler));

@@ -0,0 +1,88 @@
/**
 * Route handlers for `/api/v1/berths/[id]/pdf-versions` (Phase 6b).
 *
 * Lives in handlers.ts (not route.ts) so integration tests can call them
 * directly, bypassing the auth/permission middleware (per CLAUDE.md
 * "Route handler exports" convention).
 */
import { NextResponse } from 'next/server';
import { type RouteHandler } from '@/lib/api/helpers';
import { errorResponse, ValidationError } from '@/lib/errors';
import { listBerthPdfVersions, uploadBerthPdf } from '@/lib/services/berth-pdf.service';

interface PostBody {
  storageKey: string;
  fileName: string;
  fileSizeBytes: number;
  sha256: string;
  parseResults?: {
    engine: 'acroform' | 'ocr' | 'ai';
    extracted?: Record<string, unknown>;
    meanConfidence?: number;
    warnings?: string[];
  };
}

export const getHandler: RouteHandler = async (_req, _ctx, params) => {
  try {
    const versions = await listBerthPdfVersions(params.id!);
    return NextResponse.json({ data: versions });
  } catch (error) {
    return errorResponse(error);
  }
};

export const postHandler: RouteHandler = async (req, ctx, params) => {
  try {
    const body = (await req.json()) as Partial<PostBody>;
    if (!body.storageKey || !body.fileName) {
      throw new ValidationError('storageKey and fileName are required');
    }
    if (
      typeof body.fileSizeBytes !== 'number' ||
      !Number.isInteger(body.fileSizeBytes) ||
      body.fileSizeBytes <= 0
    ) {
      throw new ValidationError('fileSizeBytes must be a positive integer');
    }
    if (!body.sha256 || typeof body.sha256 !== 'string') {
      throw new ValidationError('sha256 is required');
    }
    const result = await uploadBerthPdf({
      berthId: params.id!,
      storageKey: body.storageKey,
      fileName: body.fileName,
      fileSizeBytes: body.fileSizeBytes,
      sha256: body.sha256,
      uploadedBy: ctx.userId,
      parseResult: body.parseResults
        ? {
            engine: body.parseResults.engine,
            // Reconstruct just enough of the ParseResult shape to round-trip
            // through serialization; the rep already saw the conflicts in the
            // diff dialog, so storing the engine + extracted fields is what
            // we need for audit.
            fields: Object.fromEntries(
              Object.entries(body.parseResults.extracted ?? {}).flatMap(([k, v]) => {
                if (v && typeof v === 'object' && 'value' in v) {
                  const obj = v as { value: unknown; confidence?: number };
                  return [
                    [
                      k,
                      {
                        value: obj.value as never,
                        confidence: typeof obj.confidence === 'number' ? obj.confidence : 1,
                        engine: body.parseResults!.engine,
                      },
                    ],
                  ];
                }
                // Drop malformed entries instead of round-tripping `undefined`.
                return [];
              }),
            ) as never,
            meanConfidence: body.parseResults.meanConfidence ?? 1,
            warnings: body.parseResults.warnings ?? [],
          }
        : undefined,
    });
    return NextResponse.json({ data: result }, { status: 201 });
  } catch (error) {
    return errorResponse(error);
  }
};

@@ -0,0 +1,24 @@
import { NextResponse } from 'next/server';
import { type RouteHandler } from '@/lib/api/helpers';
import { errorResponse, ValidationError } from '@/lib/errors';
import { applyParseResults, type ExtractedBerthFields } from '@/lib/services/berth-pdf.service';
interface PostBody {
  versionId: string;
  fieldsToApply: Partial<ExtractedBerthFields>;
}

export const postHandler: RouteHandler = async (req, _ctx, params) => {
  try {
    const body = (await req.json()) as Partial<PostBody>;
    if (!body.versionId) throw new ValidationError('versionId is required');
    if (!body.fieldsToApply || typeof body.fieldsToApply !== 'object') {
      throw new ValidationError('fieldsToApply must be an object');
    }
    const result = await applyParseResults(params.id!, body.versionId, body.fieldsToApply);
    return NextResponse.json({ data: result });
  } catch (error) {
    return errorResponse(error);
  }
};

@@ -0,0 +1,5 @@
import { withAuth, withPermission } from '@/lib/api/helpers';
import { postHandler } from './handlers';
export const POST = withAuth(withPermission('berths', 'edit', postHandler));

@@ -0,0 +1,6 @@
import { withAuth, withPermission } from '@/lib/api/helpers';
import { getHandler, postHandler } from './handlers';
export const GET = withAuth(withPermission('berths', 'view', getHandler));
export const POST = withAuth(withPermission('berths', 'edit', postHandler));