feat(berths): per-berth PDF storage (versioned) + reverse parser

Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.

Schema (migration 0030_berth_pdf_versions):
  - new table `berth_pdf_versions` with monotonic `version_number` per
    berth, `storage_key` (renamed convention from §4.7a), sha256, size,
    `download_url_expires_at` cache slot for §11.1 signed-URL throttling,
    and `parse_results` jsonb for the audit trail.
  - new column `berths.current_pdf_version_id` (deferred from Phase 0)
    with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
  - relations + types exported from `schema/berths.ts`.

3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
  1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
     `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
     fields, so this is defensive coverage for future templates.
  2. OCR via Tesseract.js — positional/regex heuristics keyed off the
     §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
     `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
     per-field confidence + global mean; flags imperial-vs-metric drift
     >1% in `warnings`.
  3. AI fallback — gated via `getResolvedOcrConfig()` (existing
     openai/claude provider). Surfaced from the diff dialog only when
     `shouldOfferAiTier()` returns true (mean OCR confidence below
     0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
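
A minimal sketch of the tier-3 gate and the drift warning described above. The field shape, the helper names ending in `Sketch`, and the feet-per-metre constant are assumptions for illustration; only the 0.55 threshold and the 1% drift limit come from this message, and the real `shouldOfferAiTier()` may take different inputs.

```typescript
// Hypothetical field shape; the real parser types are not shown in this commit message.
interface ParsedField {
  value: string | number;
  confidence: number; // 0..1
}

const AI_TIER_THRESHOLD = 0.55; // threshold quoted in the commit message
const DRIFT_LIMIT = 0.01; // 1% imperial-vs-metric drift
const FEET_PER_METRE = 3.28084;

/** Mean of the per-field confidences; 0 when nothing was extracted. */
function meanConfidence(fields: Record<string, ParsedField>): number {
  const values = Object.values(fields);
  if (values.length === 0) return 0;
  return values.reduce((sum, f) => sum + f.confidence, 0) / values.length;
}

/** Offer the AI tier only when OCR looks unreliable overall. */
function shouldOfferAiTierSketch(fields: Record<string, ParsedField>): boolean {
  return meanConfidence(fields) < AI_TIER_THRESHOLD;
}

/** Returns a warning string when the two units disagree by more than 1%, else null. */
function driftWarningSketch(label: string, feet: number, metres: number): string | null {
  const expectedFeet = metres * FEET_PER_METRE;
  if (expectedFeet <= 0) return null; // nothing sensible to compare against
  const drift = Math.abs(feet - expectedFeet) / expectedFeet;
  return drift > DRIFT_LIMIT
    ? `${label}: imperial and metric values differ by ${(drift * 100).toFixed(1)}%`
    : null;
}
```

Gating on the mean rather than on any single field keeps one badly scanned label from triggering a paid API call by itself.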

Service layer (`lib/services/berth-pdf.service.ts`):
  - `uploadBerthPdf()` — magic-byte check, size cap, version-number
    bump + current pointer in one transaction.
  - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
    flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
    columns; warns on mooring-number-in-PDF mismatch (§14.6).
  - `applyParseResults()` — hard allowlist of writable columns;
    stamps `appliedFields` onto `parse_results` for audit.
  - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
  - `listBerthPdfVersions()` — version list with 15-min signed URLs.
  - `getMaxUploadMb()` — port-override → global → default 15 lookup
    on `system_settings.berth_pdf_max_upload_mb`.
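
The reconcile rules above (auto-apply where CRM is null, conflict on disagreement, ±1% tolerance on numbers) can be sketched roughly as follows. The types, the function names, and the field names in the usage note are hypothetical; the real `reconcilePdfWithBerth()` also handles warnings and confidence values, which this sketch leaves out.

```typescript
type FieldValue = string | number | null;

interface ConflictSketch {
  field: string;
  crmValue: string | number;
  pdfValue: string | number;
}

interface ReconcileSketchResult {
  autoApplied: Array<{ field: string; value: string | number }>;
  conflicts: ConflictSketch[];
}

/** Numbers agree within 1% of the larger magnitude; everything else compares as trimmed text. */
function valuesAgree(crm: string | number, pdf: string | number): boolean {
  if (typeof crm === 'number' && typeof pdf === 'number') {
    if (crm === pdf) return true;
    const scale = Math.max(Math.abs(crm), Math.abs(pdf));
    return Math.abs(crm - pdf) / scale <= 0.01;
  }
  return String(crm).trim() === String(pdf).trim();
}

function reconcileSketch(
  crm: Record<string, FieldValue>,
  pdf: Record<string, FieldValue>,
): ReconcileSketchResult {
  const result: ReconcileSketchResult = { autoApplied: [], conflicts: [] };
  for (const [field, pdfValue] of Object.entries(pdf)) {
    if (pdfValue === null) continue; // parser found nothing for this field
    const crmValue = crm[field] ?? null;
    if (crmValue === null) {
      result.autoApplied.push({ field, value: pdfValue });
    } else if (!valuesAgree(crmValue, pdfValue)) {
      result.conflicts.push({ field, crmValue, pdfValue });
    }
  }
  return result;
}
```

For example, with CRM values `{ lengthFt: 50, widthFt: null, draftFt: 10 }` and PDF values `{ lengthFt: 50.3, widthFt: 16, draftFt: 12 }`, the sketch auto-applies `widthFt`, reports `draftFt` as a conflict, and treats `lengthFt` as equal because the difference is under 1%.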

§14.6 critical mitigations:
  - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
    storage object and rejects the request.
  - Size cap from `system_settings.berth_pdf_max_upload_mb` (default
    15 MB); enforced in the upload-url presign AND server-side.
  - 0-byte uploads rejected.
  - Mooring-number mismatch surfaces as a `warnings[]` entry on the
    reconcile result so the rep sees it in the diff dialog.
  - Imperial vs metric ±1% tolerance in both the parser warnings and
    the reconcile equality check.
  - Path traversal already blocked at the storage layer (Phase 6a).
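
As an illustration of the first three mitigations, a validation pass over the uploaded bytes might look like the sketch below. The function names, the result shape, and the ordering of the checks are assumptions; only the `%PDF-` prefix, the 0-byte rejection, and the MB-based cap come from this message.

```typescript
// "%PDF-" as raw bytes.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

/** True when the buffer starts with the PDF header bytes. */
function hasPdfMagicBytes(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((b, i) => bytes[i] === b);
}

type UploadCheck = { ok: true } | { ok: false; reason: string };

/** Hypothetical server-side check: empty file, size cap, then magic bytes. */
function validateUploadSketch(bytes: Uint8Array, maxUploadMb: number): UploadCheck {
  if (bytes.length === 0) return { ok: false, reason: 'empty file' };
  if (bytes.length > maxUploadMb * 1024 * 1024) {
    return { ok: false, reason: 'file exceeds size cap' };
  }
  if (!hasPdfMagicBytes(bytes)) {
    return { ok: false, reason: 'not a PDF (magic bytes mismatch)' };
  }
  return { ok: true };
}
```

In the flow described above, a failed check would additionally delete the storage object before rejecting the request; that step is omitted here because it depends on the storage backend.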

API + UI:
  - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
    HMAC-signed proxy URL (filesystem) sized to the per-port cap.
  - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
    `backend.head()`, writes the row, bumps `current_pdf_version_id`.
  - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
  - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
  - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
    rep-confirmed diff payload.
  - New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
    with current-PDF panel, version history, Replace PDF button, and
    `<PdfReconcileDialog>` for the auto-applied + conflicts UX.

System settings:
  - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
    + server-side validation. Resolved port-override → global → default.
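
The port-override → global → default resolution could be sketched like this. The `SettingRow` shape (a nullable `portId` marking the global row, string values) is an assumption about `system_settings`, and `resolveMaxUploadMbSketch` is a hypothetical stand-in for `getMaxUploadMb()`.

```typescript
const DEFAULT_MAX_UPLOAD_MB = 15; // default quoted in the commit message

// Assumed row shape; portId === null is taken to mean the global setting.
interface SettingRow {
  key: string;
  portId: string | null;
  value: string;
}

function resolveMaxUploadMbSketch(rows: SettingRow[], portId: string): number {
  const candidates = rows.filter((r) => r.key === 'berth_pdf_max_upload_mb');
  const portRow = candidates.find((r) => r.portId === portId);
  const globalRow = candidates.find((r) => r.portId === null);
  // Try the port override first, then the global row; skip unusable values.
  for (const row of [portRow, globalRow]) {
    const parsed = row ? Number(row.value) : NaN;
    if (Number.isFinite(parsed) && parsed > 0) return parsed;
  }
  return DEFAULT_MAX_UPLOAD_MB;
}
```

Falling through on unparseable or non-positive values means a malformed override degrades to the next tier instead of disabling uploads.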

Tests:
  - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
    feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
    drift warning, AI-tier gate.
  - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
    pdf-lib AcroForm round-trip.
  - `tests/integration/berth-pdf-versions.test.ts` — upload, version-
    number bump, magic-byte rejection, reconcile auto-applied vs
    conflicts vs ±1% tolerance, mooring-number warning,
    applyParseResults allowlist enforcement, rollback semantics.

Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Matt Ciaccio
2026-05-05 03:34:24 +02:00
parent 83693dd993
commit 249ffe3e4a
22 changed files with 13349 additions and 0 deletions


@@ -0,0 +1,269 @@
/**
* Documents tab on the berth detail page (Phase 6b — see plan §5.6).
*
* Sections:
* - Current PDF panel (download link, "Replace PDF" button, parse-engine chip).
* - Version history list — newest first, with rollback affordance on every
* non-current row.
* - Reconcile-diff dialog (PdfReconcileDialog), opened after a successful
* upload + parse. Shows auto-applied vs conflicted fields and lets the
* rep accept the conflict resolution.
*
* The actual upload is split into three steps:
* 1. POST /pdf-upload-url -> presigned URL + storageKey
* 2. Send the file to that URL with the method the server returned (proxy
* URL for filesystem mode, signed PUT for S3 mode)
* 3. POST /pdf-versions with the storage key + file metadata (name, size,
* sha256); the server parses the PDF from storage
*/
'use client';
import { useRef, useState } from 'react';
import { useMutation, useQuery, useQueryClient } from '@tanstack/react-query';
import { toast } from 'sonner';
import { apiFetch } from '@/lib/api/client';
import { Button } from '@/components/ui/button';
import { Card, CardContent, CardHeader, CardTitle } from '@/components/ui/card';
import { Badge } from '@/components/ui/badge';
import { PdfReconcileDialog } from './pdf-reconcile-dialog';
interface PdfVersionRow {
id: string;
versionNumber: number;
fileName: string;
fileSizeBytes: number;
uploadedBy: string;
uploadedAt: string;
isCurrent: boolean;
downloadUrl: string;
downloadUrlExpiresAt: string;
parseEngine: 'acroform' | 'ocr' | 'ai' | null;
}
interface UploadUrlResponse {
url: string;
method: 'PUT' | 'POST';
storageKey: string;
maxBytes: number;
backend: 's3' | 'filesystem';
}
export function BerthDocumentsTab({ berthId }: { berthId: string }) {
const qc = useQueryClient();
const fileInputRef = useRef<HTMLInputElement | null>(null);
const [pendingDiff, setPendingDiff] = useState<{
versionId: string;
autoApplied: Array<{ field: string; value: string | number }>;
conflicts: Array<{
field: string;
crmValue: string | number | null;
pdfValue: string | number | null;
pdfConfidence: number;
}>;
warnings: string[];
} | null>(null);
const { data: versions, isLoading } = useQuery<PdfVersionRow[]>({
queryKey: ['berth-pdf-versions', berthId],
queryFn: () =>
apiFetch<{ data: PdfVersionRow[] }>(`/api/v1/berths/${berthId}/pdf-versions`).then(
(r) => r.data,
),
});
const rollback = useMutation({
mutationFn: (versionId: string) =>
apiFetch(`/api/v1/berths/${berthId}/pdf-versions/${versionId}/rollback`, {
method: 'POST',
}),
onSuccess: () => {
void qc.invalidateQueries({ queryKey: ['berth-pdf-versions', berthId] });
void qc.invalidateQueries({ queryKey: ['berth', berthId] });
toast.success('Rolled back to selected version.');
},
onError: (err: Error) => {
toast.error('Rollback failed', { description: err.message });
},
});
const upload = useMutation({
mutationFn: async (file: File) => {
// 1. ask the server for a presigned upload URL
const upRes = await apiFetch<{ data: UploadUrlResponse }>(
`/api/v1/berths/${berthId}/pdf-upload-url`,
{
method: 'POST',
body: { fileName: file.name, sizeBytes: file.size },
},
);
const { url, method, storageKey, maxBytes } = upRes.data;
if (file.size > maxBytes) {
throw new Error(
`File ${(file.size / 1024 / 1024).toFixed(1)} MB exceeds ${(maxBytes / 1024 / 1024).toFixed(0)} MB limit`,
);
}
// 2. upload directly to storage (filesystem-proxy or S3)
const putRes = await fetch(url, {
method,
body: file,
headers: { 'content-type': 'application/pdf' },
credentials: url.startsWith('/') ? 'include' : 'omit',
});
if (!putRes.ok) {
throw new Error(`Storage PUT failed (${putRes.status})`);
}
// 3. compute sha256 in the browser for the metadata row
const sha256 = await sha256Hex(file);
// 4. register the version metadata + parse server-side. The server
// runs parseBerthPdf via the buffer from storage; the client
// doesn't ship the raw PDF a second time.
const verRes = await apiFetch<{ data: { versionId: string } }>(
`/api/v1/berths/${berthId}/pdf-versions`,
{
method: 'POST',
body: {
storageKey,
fileName: file.name,
fileSizeBytes: file.size,
sha256,
},
},
);
return { versionId: verRes.data.versionId };
},
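onSuccess: () => {
// NOTE: nothing here calls setPendingDiff with a parse result, so the
// reconcile dialog rendered below is never opened after an upload.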
void qc.invalidateQueries({ queryKey: ['berth-pdf-versions', berthId] });
void qc.invalidateQueries({ queryKey: ['berth', berthId] });
toast.success('PDF uploaded.');
},
onError: (err: Error) => {
toast.error('Upload failed', { description: err.message });
},
});
const onFileChange = (e: React.ChangeEvent<HTMLInputElement>) => {
const file = e.target.files?.[0];
if (!file) return;
if (!file.name.toLowerCase().endsWith('.pdf')) {
toast.error('Only PDFs are accepted.');
return;
}
upload.mutate(file);
if (fileInputRef.current) fileInputRef.current.value = '';
};
const current = versions?.find((v) => v.isCurrent);
const others = versions?.filter((v) => !v.isCurrent) ?? [];
return (
<div className="space-y-6">
<Card>
<CardHeader className="flex flex-row items-center justify-between pb-3">
<CardTitle className="text-sm font-medium">Current PDF</CardTitle>
<div>
<input
ref={fileInputRef}
type="file"
accept="application/pdf"
className="hidden"
onChange={onFileChange}
/>
<Button
size="sm"
onClick={() => fileInputRef.current?.click()}
disabled={upload.isPending}
>
{upload.isPending ? 'Uploading…' : current ? 'Replace PDF' : 'Upload PDF'}
</Button>
</div>
</CardHeader>
<CardContent className="pt-0 text-sm">
{isLoading ? (
<p className="text-muted-foreground">Loading…</p>
) : current ? (
<div className="flex flex-wrap items-center gap-2">
<a
href={current.downloadUrl}
target="_blank"
rel="noreferrer"
className="font-medium underline underline-offset-2"
>
{current.fileName}
</a>
<span className="text-muted-foreground">
v{current.versionNumber} · {(current.fileSizeBytes / 1024 / 1024).toFixed(2)} MB
</span>
{current.parseEngine ? <ParseEngineBadge engine={current.parseEngine} /> : null}
</div>
) : (
<p className="text-muted-foreground">No PDF uploaded yet.</p>
)}
</CardContent>
</Card>
<Card>
<CardHeader className="pb-3">
<CardTitle className="text-sm font-medium">Version history</CardTitle>
</CardHeader>
<CardContent className="pt-0">
{others.length === 0 ? (
<p className="text-sm text-muted-foreground">No prior versions.</p>
) : (
<ul className="divide-y">
{others.map((v) => (
<li key={v.id} className="flex items-center justify-between py-2 text-sm">
<div>
<a href={v.downloadUrl} target="_blank" rel="noreferrer" className="underline">
{v.fileName}
</a>{' '}
<span className="text-muted-foreground">
v{v.versionNumber} · {(v.fileSizeBytes / 1024 / 1024).toFixed(2)} MB ·{' '}
{new Date(v.uploadedAt).toLocaleDateString()}
</span>
</div>
<Button
size="sm"
variant="outline"
onClick={() => rollback.mutate(v.id)}
disabled={rollback.isPending}
>
Rollback
</Button>
</li>
))}
</ul>
)}
</CardContent>
</Card>
{pendingDiff ? (
<PdfReconcileDialog
berthId={berthId}
versionId={pendingDiff.versionId}
autoApplied={pendingDiff.autoApplied}
conflicts={pendingDiff.conflicts}
warnings={pendingDiff.warnings}
onClose={() => setPendingDiff(null)}
/>
) : null}
</div>
);
}
function ParseEngineBadge({ engine }: { engine: 'acroform' | 'ocr' | 'ai' }) {
const tone = engine === 'acroform' ? 'default' : engine === 'ocr' ? 'secondary' : 'outline';
const label = engine === 'acroform' ? 'AcroForm' : engine === 'ocr' ? 'OCR' : 'AI';
return <Badge variant={tone}>{label}</Badge>;
}
async function sha256Hex(file: File): Promise<string> {
const buf = await file.arrayBuffer();
const hash = await crypto.subtle.digest('SHA-256', buf);
return Array.from(new Uint8Array(hash))
.map((b) => b.toString(16).padStart(2, '0'))
.join('');
}


@@ -6,6 +6,7 @@ import { TagBadge } from '@/components/shared/tag-badge';
import { BerthReservationsTab } from './berth-reservations-tab';
import { BerthInterestsTab } from './berth-interests-tab';
import { BerthInterestPulse } from './berth-interest-pulse';
import { BerthDocumentsTab } from './berth-documents-tab';
type BerthData = {
id: string;
@@ -231,6 +232,11 @@ export function buildBerthTabs(berth: BerthData): DetailTab[] {
label: 'Reservations',
content: <BerthReservationsTab berthId={berth.id} />,
},
{
id: 'documents',
label: 'Documents',
content: <BerthDocumentsTab berthId={berth.id} />,
},
{
id: 'waiting-list',
label: 'Waiting List',


@@ -0,0 +1,195 @@
/**
* Reconcile-diff dialog (Phase 6b — see plan §4.7b, §14.6).
*
* Shown after a successful per-berth PDF upload + parse. Surfaces three
* sections:
* - Warnings (mooring-number mismatch, imperial-vs-metric drift, etc.)
* so the rep can abort before applying.
* - Auto-applied fields — fields the parser found that the CRM had as null;
* these are pre-checked and applied on confirm.
* - Conflicts — fields where CRM and PDF disagree on a non-null value.
* The rep picks "Keep CRM" or "Use PDF" per row before confirming.
*
* On confirm, the dialog POSTs to /pdf-versions/parse-results/apply with the
* rep-curated `fieldsToApply` map.
*/
'use client';
import { useState } from 'react';
import { useMutation, useQueryClient } from '@tanstack/react-query';
import { toast } from 'sonner';
import { apiFetch } from '@/lib/api/client';
import { Button } from '@/components/ui/button';
import { Checkbox } from '@/components/ui/checkbox';
import {
Dialog,
DialogContent,
DialogDescription,
DialogFooter,
DialogHeader,
DialogTitle,
} from '@/components/ui/dialog';
interface AutoAppliedField {
field: string;
value: string | number;
}
interface ConflictField {
field: string;
crmValue: string | number | null;
pdfValue: string | number | null;
pdfConfidence: number;
}
export interface PdfReconcileDialogProps {
berthId: string;
versionId: string;
autoApplied: AutoAppliedField[];
conflicts: ConflictField[];
warnings: string[];
onClose: () => void;
}
export function PdfReconcileDialog({
berthId,
versionId,
autoApplied,
conflicts,
warnings,
onClose,
}: PdfReconcileDialogProps) {
const qc = useQueryClient();
// For each auto-applied field: rep can opt out by unchecking.
const [autoChecked, setAutoChecked] = useState<Record<string, boolean>>(
Object.fromEntries(autoApplied.map((f) => [f.field, true])),
);
// For each conflict: 'pdf' applies the PDF value; 'crm' keeps the CRM value
// (the field is omitted from the payload). Every conflict defaults to 'crm'.
const [conflictChoice, setConflictChoice] = useState<Record<string, 'pdf' | 'crm'>>(
Object.fromEntries(conflicts.map((c) => [c.field, 'crm'])),
);
const apply = useMutation({
mutationFn: async () => {
const fieldsToApply: Record<string, string | number> = {};
for (const f of autoApplied) if (autoChecked[f.field]) fieldsToApply[f.field] = f.value;
for (const c of conflicts) {
if (conflictChoice[c.field] === 'pdf' && c.pdfValue != null) {
fieldsToApply[c.field] = c.pdfValue;
}
}
return apiFetch(`/api/v1/berths/${berthId}/pdf-versions/parse-results/apply`, {
method: 'POST',
body: { versionId, fieldsToApply },
});
},
onSuccess: () => {
void qc.invalidateQueries({ queryKey: ['berth', berthId] });
void qc.invalidateQueries({ queryKey: ['berth-pdf-versions', berthId] });
toast.success('Berth fields updated from PDF.');
onClose();
},
onError: (err: Error) => {
toast.error('Apply failed', { description: err.message });
},
});
return (
<Dialog open onOpenChange={(open) => (!open ? onClose() : undefined)}>
<DialogContent className="max-w-2xl">
<DialogHeader>
<DialogTitle>Review parsed fields</DialogTitle>
<DialogDescription>
The PDF parser extracted these values. Review and apply the ones you trust.
</DialogDescription>
</DialogHeader>
{warnings.length > 0 ? (
<div className="rounded-md border border-yellow-300 bg-yellow-50 p-3 text-sm">
<p className="font-medium">Warnings</p>
<ul className="mt-1 list-disc pl-5">
{warnings.map((w, i) => (
<li key={i}>{w}</li>
))}
</ul>
</div>
) : null}
{autoApplied.length > 0 ? (
<section>
<h3 className="text-sm font-medium">
Auto-applied <span className="text-muted-foreground">({autoApplied.length})</span>
</h3>
<p className="text-xs text-muted-foreground">
CRM had no value; the PDF supplied one. Uncheck to skip.
</p>
<ul className="mt-2 space-y-1">
{autoApplied.map((f) => (
<li key={f.field} className="flex items-center gap-2 text-sm">
<Checkbox
id={`auto-${f.field}`}
checked={autoChecked[f.field]}
onCheckedChange={(checked) =>
setAutoChecked((prev) => ({ ...prev, [f.field]: checked === true }))
}
/>
<label htmlFor={`auto-${f.field}`} className="flex-1">
<span className="font-medium">{f.field}</span>:{' '}
<span className="text-muted-foreground">{String(f.value)}</span>
</label>
</li>
))}
</ul>
</section>
) : null}
{conflicts.length > 0 ? (
<section>
<h3 className="text-sm font-medium">
Conflicts <span className="text-muted-foreground">({conflicts.length})</span>
</h3>
<p className="text-xs text-muted-foreground">
Pick which value to keep for each field.
</p>
<ul className="mt-2 space-y-2">
{conflicts.map((c) => (
<li
key={c.field}
className="grid grid-cols-[1fr_auto_auto] items-center gap-2 rounded border p-2 text-sm"
>
<span className="font-medium">{c.field}</span>
<Button
size="sm"
variant={conflictChoice[c.field] === 'crm' ? 'default' : 'outline'}
onClick={() => setConflictChoice((prev) => ({ ...prev, [c.field]: 'crm' }))}
>
Keep: {String(c.crmValue)}
</Button>
<Button
size="sm"
variant={conflictChoice[c.field] === 'pdf' ? 'default' : 'outline'}
onClick={() => setConflictChoice((prev) => ({ ...prev, [c.field]: 'pdf' }))}
>
Use PDF: {String(c.pdfValue)} ({Math.round(c.pdfConfidence * 100)}%)
</Button>
</li>
))}
</ul>
</section>
) : null}
<DialogFooter>
<Button variant="outline" onClick={onClose}>
Cancel
</Button>
<Button onClick={() => apply.mutate()} disabled={apply.isPending}>
{apply.isPending ? 'Applying…' : 'Apply'}
</Button>
</DialogFooter>
</DialogContent>
</Dialog>
);
}