pn-new-crm/tests/unit/services/berth-pdf-acroform.test.ts

feat(berths): per-berth PDF storage (versioned) + reverse parser

Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd): every file write goes through `getStorageBackend()`; no direct minio imports.

Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.

3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib: pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js: positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`.
3. AI fallback: gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.

Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()`: magic-byte check, size cap, version-number bump + current pointer in one transaction.
- `reconcilePdfWithBerth()`: auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()`: hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()`: pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()`: version list with 15-min signed URLs.
- `getMaxUploadMb()`: port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`.

§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).

API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url`: presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions`: verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions`: version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply`: rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX.

System settings:
- `berth_pdf_max_upload_mb` (default 15): caps presigned-upload size + server-side validation. Resolved port-override → global → default.

Tests:
- `tests/unit/services/berth-pdf-parser.test.ts`: magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts`: synthetic pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts`: upload, version-number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics.

Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
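
The upload mitigations listed in the commit message (0-byte rejection, size cap, `%PDF-` magic-byte check) can be illustrated with a small sketch. This is not the repository's `uploadBerthPdf()` implementation: the helper names `ascii`, `looksLikePdf` and `validateUpload`, the order of the checks, and the error strings are all assumptions made for illustration.

```typescript
// Hypothetical sketch of the upload guards described in the commit message.
// None of these names come from the repository.

// ASCII bytes for "%PDF-", the header every PDF file starts with.
const PDF_MAGIC: number[] = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Small helper so the example needs no Node-specific Buffer type.
function ascii(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) out[i] = s.charCodeAt(i);
  return out;
}

// True when the first five bytes match "%PDF-".
function looksLikePdf(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((b, i) => bytes[i] === b);
}

// Returns null when the upload passes, otherwise a reason string.
// Check order (empty, size, magic bytes) is an assumption.
function validateUpload(bytes: Uint8Array, maxUploadMb: number): string | null {
  if (bytes.length === 0) return 'empty upload';
  if (bytes.length > maxUploadMb * 1024 * 1024) return 'exceeds size cap';
  if (!looksLikePdf(bytes)) return 'magic-byte mismatch';
  return null;
}
```

In the real service the cap would come from `getMaxUploadMb()` (default 15), and a mismatch would also delete the stored object; the sketch only covers the pure validation step.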
/**
 * AcroForm-tier test for parseBerthPdf. Builds a synthetic PDF with named
 * AcroForm fields via pdf-lib and asserts the parser pulls them out without
 * needing OCR.
 */
import { describe, expect, it } from 'vitest';
import { PDFDocument } from 'pdf-lib';

import { parseBerthPdf } from '@/lib/services/berth-pdf-parser';

async function buildAcroFormPdf(): Promise<Buffer> {
  const doc = await PDFDocument.create();
  doc.addPage([400, 400]);
  const form = doc.getForm();
  const fields: Array<[string, string]> = [
    ['mooring_number', 'A1'],
    ['length_ft', '206.67'],
    ['length_m', '63'],
    ['width_ft', '46.58'],
    ['width_m', '14.2'],
    ['power_capacity', '330'],
    ['voltage', '480'],
    ['weekly_rate_high_usd', '11341'],
    ['weekly_rate_low_usd', '8100'],
    ['daily_rate_high_usd', '1890'],
    ['daily_rate_low_usd', '1350'],
    ['pricing_valid_until', '2025-09-15'],
    ['bow_facing', 'East'],
    ['mooring_type', 'Side Pier / Med Mooring'],
  ];
  for (const [name, value] of fields) {
    const field = form.createTextField(name);
    field.setText(value);
  }
  const bytes = await doc.save();
  return Buffer.from(bytes);
}

describe('parseBerthPdf — AcroForm tier', () => {
  it('extracts named fields and skips OCR', async () => {
    const buf = await buildAcroFormPdf();
    const result = await parseBerthPdf(buf, { skipOcr: true });
    expect(result.engine).toBe('acroform');
    expect(result.fields.mooringNumber?.value).toBe('A1');
    expect(result.fields.lengthFt?.value).toBeCloseTo(206.67, 1);
    expect(result.fields.lengthM?.value).toBe(63);
    expect(result.fields.weeklyRateHighUsd?.value).toBe(11341);
    expect(result.fields.pricingValidUntil?.value).toBe('2025-09-15');
    expect(result.fields.bowFacing?.value).toBe('East');
    expect(result.meanConfidence).toBe(1);
  });

  it('rejects a non-PDF buffer via magic-byte check', async () => {
    await expect(parseBerthPdf(Buffer.from('not a pdf'))).rejects.toThrow(/magic-byte/);
  });
});
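
The fixture's imperial and metric pairs (`206.67` ft / `63` m, `46.58` ft / `14.2` m) agree with each other well inside the ±1% tolerance mentioned in the commit message, so they should not trigger a drift warning. A minimal sketch of such a check follows, assuming drift is measured relative to the metric value converted to feet; the names `driftRatio` and `withinTolerance` are hypothetical and the parser's actual formula may differ.

```typescript
// Hypothetical sketch of an imperial-vs-metric drift check.
// The conversion factor is exact: 1 ft = 0.3048 m.
const FEET_PER_METRE = 1 / 0.3048;

// Relative difference between the stated feet value and the feet value
// implied by the stated metres value.
function driftRatio(feet: number, metres: number): number {
  const expectedFeet = metres * FEET_PER_METRE;
  return Math.abs(feet - expectedFeet) / expectedFeet;
}

// True when the pair agrees within the tolerance (default 1%).
function withinTolerance(feet: number, metres: number, tolerance = 0.01): boolean {
  return driftRatio(feet, metres) <= tolerance;
}

// Fixture values from the test above: both pairs are consistent.
const lengthOk = withinTolerance(206.67, 63);
const widthOk = withinTolerance(46.58, 14.2);

// A deliberately inconsistent pair (about 1.6% off) would be flagged.
const driftedOk = withinTolerance(210, 63);
```

Here `lengthOk` and `widthOk` are `true` and `driftedOk` is `false`; a test for the drift warning could reuse the fixture with a mismatched pair like the last one.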