Files
pn-new-crm/src/lib/services/berth-pdf.service.ts

659 lines
24 KiB
TypeScript
Raw Normal View History

feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
/**
* Berth PDF management service (Phase 6b see plan §4.7b, §11.1, §14.6).
*
* Responsibilities:
* - Upload a per-berth PDF (versioned), via the active `StorageBackend`.
* - Verify the magic bytes (`%PDF-`) before persisting; delete the storage
* object on mismatch (§14.6 critical).
* - Reconcile the parsed fields against the current berth row, surfacing
* conflicts for the rep's diff dialog and auto-applying nullable gaps.
* - Enforce per-port size cap from `system_settings.berth_pdf_max_upload_mb`.
* - Generate signed download URLs for the version list.
*/
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
import { createHash } from 'node:crypto';
import { and, desc, eq, isNull, max, sql } from 'drizzle-orm';
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
import { db } from '@/lib/db';
import { berths, berthPdfVersions } from '@/lib/db/schema/berths';
import { systemSettings } from '@/lib/db/schema/system';
import { ConflictError, NotFoundError, ValidationError } from '@/lib/errors';
import { logger } from '@/lib/logger';
import { getStorageBackend } from '@/lib/storage';
import {
type ExtractedBerthFields,
type ParseResult,
type ParserEngine,
isPdfMagic,
} from './berth-pdf-parser';
// ─── shared types ────────────────────────────────────────────────────────────
export interface ReconcileConflict {
field: keyof ExtractedBerthFields;
crmValue: string | number | null;
pdfValue: string | number | null;
/** Confidence the parser assigned to the PDF value (0..1). */
pdfConfidence: number;
}
export interface ReconcileResult {
/** Fields where CRM was null and the PDF supplied a value; these can be
* applied automatically (the rep still sees them as "Auto-applied" chips). */
autoApplied: Array<{ field: keyof ExtractedBerthFields; value: string | number }>;
/** Fields where CRM and PDF disagree on a non-null value. The diff dialog
* shows these as a side-by-side comparison; nothing is written until the
* rep confirms via the apply endpoint. */
conflicts: ReconcileConflict[];
/** Pure-warning bucket e.g. mooring-number mismatch with the berth being
* uploaded to (§14.6). */
warnings: string[];
/** Engine that produced the parse — surfaced on the diff UI. */
engine: ParserEngine;
}
// Field allowlist for reconcile/apply. Mirrors `berths` columns; we never
// blindly write `crypto.randomUUID()` or anything outside this set so a
// rogue parser tier can't poison the schema.
const APPLIABLE_FIELDS: ReadonlyArray<keyof ExtractedBerthFields> = [
'lengthFt',
'lengthM',
'widthFt',
'widthM',
'draftFt',
'draftM',
'waterDepth',
'waterDepthM',
'bowFacing',
'sidePontoon',
'powerCapacity',
'voltage',
'mooringType',
'cleatType',
'cleatCapacity',
'bollardType',
'bollardCapacity',
'access',
'price',
'weeklyRateHighUsd',
'weeklyRateLowUsd',
'dailyRateHighUsd',
'dailyRateLowUsd',
'pricingValidUntil',
];
// Numeric berths columns are stored as `numeric` (Drizzle returns string).
// This set tells the apply path which fields need stringification.
const NUMERIC_FIELDS = new Set<keyof ExtractedBerthFields>([
'lengthFt',
'lengthM',
'widthFt',
'widthM',
'draftFt',
'draftM',
'waterDepth',
'waterDepthM',
'powerCapacity',
'voltage',
'price',
'weeklyRateHighUsd',
'weeklyRateLowUsd',
'dailyRateHighUsd',
'dailyRateLowUsd',
]);
// Tolerance for imperial vs metric reconcile. Same threshold as the parser.
const IMPERIAL_METRIC_TOLERANCE = 0.01;
// ─── settings helpers ────────────────────────────────────────────────────────
/** Resolve `berth_pdf_max_upload_mb` with port-override → global → default 15. */
export async function getMaxUploadMb(portId: string): Promise<number> {
const KEY = 'berth_pdf_max_upload_mb';
const [portRow] = await db
.select()
.from(systemSettings)
.where(and(eq(systemSettings.key, KEY), eq(systemSettings.portId, portId)));
if (portRow && typeof portRow.value === 'number') return portRow.value;
if (portRow && typeof portRow.value === 'string') {
const n = Number(portRow.value);
if (Number.isFinite(n)) return n;
}
const [globalRow] = await db
.select()
.from(systemSettings)
.where(and(eq(systemSettings.key, KEY), isNull(systemSettings.portId)));
if (globalRow && typeof globalRow.value === 'number') return globalRow.value;
if (globalRow && typeof globalRow.value === 'string') {
const n = Number(globalRow.value);
if (Number.isFinite(n)) return n;
}
return 15;
}
// ─── upload + version management ─────────────────────────────────────────────
export interface UploadBerthPdfArgs {
berthId: string;
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
/**
* Acting tenant. Every public service entrypoint requires this so the berth
* lookup can be scoped to `(berth.id, port_id)` without it a rep with
* berths:edit on port A could supply a port B berth UUID and write/read
* cross-tenant data. NotFoundError on mismatch.
*/
portId: string;
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
/** Already-uploaded storage key (the upload-url endpoint generated it) OR
* undefined to make this service compute one. */
storageKey?: string;
/** Raw bytes when the server proxies the upload (filesystem mode); when
* callers used a presigned PUT they pass `storageKey` and skip this. */
buffer?: Buffer;
fileName: string;
uploadedBy: string;
/** Pre-computed sha256 hex from the client (verified server-side anyway). */
sha256?: string;
/** Pre-computed bytes (used for the size cap pre-flight on direct uploads). */
fileSizeBytes?: number;
/** Result of running `parseBerthPdf` server-side. Optional the rep may
* have skipped parsing on a re-upload. */
parseResult?: ParseResult;
}
export interface UploadBerthPdfResult {
versionId: string;
storageKey: string;
versionNumber: number;
fileSizeBytes: number;
contentSha256: string;
}
/**
* Persist a per-berth PDF version. Either the raw `buffer` or a pre-uploaded
* `storageKey` (with optional `buffer` for verification) is required.
*
* Critical mitigations enforced here:
* - §14.6 magic-byte check against the buffer when present.
* - §14.6 size cap from `berth_pdf_max_upload_mb`.
* - Storage key namespaced under `berths/{id}/v{n}/...` so two reps racing
* on the same berth can't collide (the version-number unique index in
* the DB does the dedup).
*/
export async function uploadBerthPdf(args: UploadBerthPdfArgs): Promise<UploadBerthPdfResult> {
// 1. Resolve the berth + port for size-cap lookup.
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
// Tenant-scoped lookup — NotFoundError when the berth lives in a different
// port so a rep on port A cannot upload PDFs against port B's berths.
const berthRow = await db.query.berths.findFirst({
where: and(eq(berths.id, args.berthId), eq(berths.portId, args.portId)),
});
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
if (!berthRow) throw new NotFoundError('Berth');
const maxMb = await getMaxUploadMb(berthRow.portId);
const maxBytes = maxMb * 1024 * 1024;
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
// 2. Per-berth advisory lock prevents two concurrent uploads from both
// computing version `v3` and racing to write blobs (the unique index
// on (berth_id, version_number) would catch the second insert, but
// only AFTER its blob is already in storage — leaving an orphan).
// The lock is scoped to a transaction wrapping the version-number
// read AND the blob write, so concurrent uploads serialize cleanly.
// NB: hash the UUID into a 32-bit int for pg_advisory_xact_lock(int).
const berthLockKey = hashBerthIdToInt(args.berthId);
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
// 3. Magic bytes + size when we have the buffer in hand.
const backend = await getStorageBackend();
const buffer = args.buffer;
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
// UUID-based storage key path so two concurrent uploads can't collide
// on the same blob path (the version_number suffix used to be in the
// key but is now a separate DB column allocated under the per-berth
// advisory lock — see step 4).
let versionNumber = 1;
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
let storageKey =
args.storageKey ??
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
`berths/${args.berthId}/${crypto.randomUUID()}/${sanitizeFileName(args.fileName)}`;
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
let sizeBytes = args.fileSizeBytes ?? buffer?.length ?? 0;
let sha256 = args.sha256 ?? '';
if (buffer) {
if (!isPdfMagic(buffer)) {
// Best-effort cleanup if the storage already has a partial.
if (args.storageKey) await backend.delete(args.storageKey).catch(() => undefined);
throw new ValidationError(
'Uploaded file failed PDF magic-byte check (does not start with %PDF-).',
);
}
if (buffer.length === 0) throw new ValidationError('Uploaded PDF is empty (0 bytes).');
if (buffer.length > maxBytes) {
throw new ValidationError(
`PDF exceeds ${maxMb} MB upload cap (got ${(buffer.length / 1024 / 1024).toFixed(1)} MB).`,
);
}
const written = await backend.put(storageKey, buffer, { contentType: 'application/pdf' });
storageKey = written.key;
sizeBytes = written.sizeBytes;
sha256 = written.sha256;
} else if (args.storageKey) {
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
// Browser uploaded directly via presigned URL — verify via HEAD + magic bytes.
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
const head = await backend.head(args.storageKey);
if (!head) {
throw new ValidationError('Uploaded object not found at expected storage key.');
}
if (head.sizeBytes === 0) {
await backend.delete(args.storageKey).catch(() => undefined);
throw new ValidationError('Uploaded PDF is empty (0 bytes).');
}
if (head.sizeBytes > maxBytes) {
await backend.delete(args.storageKey).catch(() => undefined);
throw new ValidationError(
`PDF exceeds ${maxMb} MB upload cap (got ${(head.sizeBytes / 1024 / 1024).toFixed(1)} MB).`,
);
}
if (head.contentType !== 'application/pdf' && head.contentType !== 'application/octet-stream') {
await backend.delete(args.storageKey).catch(() => undefined);
throw new ValidationError(
`Uploaded object content-type is ${head.contentType}; expected application/pdf.`,
);
}
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
// Magic-byte check on the presign path (§14.6 critical) - browser-
// uploaded objects could be anything until we read the bytes. Stream
// just the first 5 bytes; abort early on mismatch and delete the blob.
const probeBytes = await readFirstBytes(backend, args.storageKey, 5);
if (!isPdfMagic(probeBytes)) {
await backend.delete(args.storageKey).catch(() => undefined);
throw new ValidationError(
'Uploaded file failed PDF magic-byte check (does not start with %PDF-).',
);
}
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
sizeBytes = head.sizeBytes;
sha256 = args.sha256 ?? '';
storageKey = args.storageKey;
} else {
throw new ValidationError('Either buffer or storageKey is required.');
}
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
// 4. Take the per-berth advisory lock, compute version_number under
// the lock, insert + bump pointer. All inside a single transaction
// so the lock + writes commit atomically.
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
const versionId = crypto.randomUUID();
await db.transaction(async (tx) => {
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
await tx.execute(sql`SELECT pg_advisory_xact_lock(${berthLockKey})`);
versionNumber = await nextVersionNumberTx(tx, args.berthId);
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
await tx.insert(berthPdfVersions).values({
id: versionId,
berthId: args.berthId,
versionNumber,
storageKey,
fileName: args.fileName,
fileSizeBytes: sizeBytes,
contentSha256: sha256,
uploadedBy: args.uploadedBy,
parseResults: args.parseResult ? serializeParseResult(args.parseResult) : null,
});
await tx
.update(berths)
.set({ currentPdfVersionId: versionId, updatedAt: new Date() })
.where(eq(berths.id, args.berthId));
});
logger.info(
{ berthId: args.berthId, versionId, versionNumber, storageKey, sizeBytes },
'Berth PDF version saved',
);
return { versionId, storageKey, versionNumber, fileSizeBytes: sizeBytes, contentSha256: sha256 };
}
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
/** Tx-bound variant — same SELECT MAX(...) but inside the caller's transaction. */
async function nextVersionNumberTx(
tx: Parameters<Parameters<typeof db.transaction>[0]>[0],
berthId: string,
): Promise<number> {
const [row] = await tx
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
.select({ max: max(berthPdfVersions.versionNumber) })
.from(berthPdfVersions)
.where(eq(berthPdfVersions.berthId, berthId));
return (row?.max ?? 0) + 1;
}
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
/**
* Hash a UUID berthId into a 32-bit signed integer for pg_advisory_xact_lock.
* Uses the first 4 bytes of sha256 reinterpreted as int32 collisions are
* theoretically possible but the lock is per-berth so a collision just
* means two different berths' uploads serialize through the same key,
* which is harmless (correctness preserved, slight contention only).
*/
function hashBerthIdToInt(berthId: string): number {
const h = createHash('sha256').update(berthId).digest();
// Read as signed 32-bit big-endian; pg_advisory_xact_lock(int) signature.
return h.readInt32BE(0);
}
/**
* Stream just the first `n` bytes of a stored object so the magic-byte
* check on the presigned-PUT path can run without buffering the whole
* file. Returns a Buffer of up to `n` bytes (less if the file is shorter).
*/
async function readFirstBytes(
backend: Awaited<ReturnType<typeof getStorageBackend>>,
key: string,
n: number,
): Promise<Buffer> {
const stream = await backend.get(key);
const chunks: Buffer[] = [];
let total = 0;
for await (const chunk of stream as AsyncIterable<Buffer | string>) {
const buf = typeof chunk === 'string' ? Buffer.from(chunk) : chunk;
chunks.push(buf);
total += buf.length;
if (total >= n) break;
}
// Best-effort dispose - some streams are still readable after iteration.
if (typeof (stream as { destroy?: () => void }).destroy === 'function') {
(stream as unknown as { destroy: () => void }).destroy();
}
return Buffer.concat(chunks).subarray(0, n);
}
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
function sanitizeFileName(raw: string): string {
// Preserve the extension; replace spaces / disallowed chars with '_' so the
// result satisfies the storage-key validation regex.
const last = raw.split(/[\\/]/).pop() ?? raw;
return last.replace(/[^a-zA-Z0-9._-]/g, '_').slice(0, 200) || 'berth.pdf';
}
function serializeParseResult(parse: ParseResult): Record<string, unknown> {
return {
engine: parse.engine,
extracted: Object.fromEntries(
Object.entries(parse.fields).map(([k, v]) => [
k,
v ? { value: v.value, confidence: v.confidence } : null,
]),
),
meanConfidence: parse.meanConfidence,
warnings: parse.warnings,
};
}
// ─── reconcile + apply ───────────────────────────────────────────────────────
/**
* Walk every parsed field; classify into:
* - `autoApplied` when the CRM column is null/empty.
* - `conflicts` when both sides have a non-null value and they disagree.
*
* Numeric tolerance: ±1% (matches §14.6 imperial-vs-metric guidance, applied
* uniformly across all numeric columns since the same rounding noise affects
* weekly/daily rates too).
*/
export async function reconcilePdfWithBerth(
berthId: string,
parsed: ParseResult,
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
/** Tenant scope. NotFoundError on cross-port lookups. */
portId: string,
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
): Promise<ReconcileResult> {
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
const berthRow = await db.query.berths.findFirst({
where: and(eq(berths.id, berthId), eq(berths.portId, portId)),
});
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
if (!berthRow) throw new NotFoundError('Berth');
const fields = parsed.fields;
const autoApplied: ReconcileResult['autoApplied'] = [];
const conflicts: ReconcileConflict[] = [];
const warnings: string[] = [...parsed.warnings];
// §14.6 — mooring-number mismatch warning.
const pdfMooring = fields.mooringNumber?.value;
if (
pdfMooring &&
typeof pdfMooring === 'string' &&
pdfMooring.toUpperCase() !== berthRow.mooringNumber.toUpperCase()
) {
warnings.push(
`PDF says berth ${pdfMooring} but uploading to ${berthRow.mooringNumber}. Confirm before applying.`,
);
}
for (const key of APPLIABLE_FIELDS) {
const parsedField = fields[key];
if (!parsedField || parsedField.value == null) continue;
const crmRaw = (berthRow as Record<string, unknown>)[key];
const crmValue = normalizeForCompare(key, crmRaw);
const pdfValue = normalizeForCompare(key, parsedField.value);
if (crmValue == null || crmValue === '') {
autoApplied.push({ field: key, value: parsedField.value as string | number });
continue;
}
if (!valuesEqual(crmValue, pdfValue, NUMERIC_FIELDS.has(key))) {
conflicts.push({
field: key,
crmValue: crmValue as string | number | null,
pdfValue: pdfValue as string | number | null,
pdfConfidence: parsedField.confidence,
});
}
}
return { autoApplied, conflicts, warnings, engine: parsed.engine };
}
/**
* Apply a rep-confirmed slice of the reconcile diff to the berth row. The
* caller passes the canonical `ExtractedBerthFields` keys; anything outside
* `APPLIABLE_FIELDS` is silently dropped to keep this endpoint a hard
* allowlist.
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
*
* Mooring-mismatch gate (§14.6 critical): when the version's stored
* `parseResults.warnings` contains a mooring-mismatch warning, the apply
* is rejected unless the caller passes `confirmMooringMismatch: true`.
* This is the service-side enforcement of the "force re-confirm" rule
* UI confirmation alone is not enough.
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
*/
export async function applyParseResults(
berthId: string,
versionId: string,
fieldsToApply: Partial<ExtractedBerthFields>,
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
/** Tenant scope. NotFoundError when berth lives in a different port. */
portId: string,
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
opts: { confirmMooringMismatch?: boolean } = {},
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
): Promise<{ updatedFields: Array<keyof ExtractedBerthFields> }> {
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
const berthRow = await db.query.berths.findFirst({
where: and(eq(berths.id, berthId), eq(berths.portId, portId)),
});
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
if (!berthRow) throw new NotFoundError('Berth');
const versionRow = await db.query.berthPdfVersions.findFirst({
where: and(eq(berthPdfVersions.id, versionId), eq(berthPdfVersions.berthId, berthId)),
});
if (!versionRow) throw new NotFoundError('Berth PDF version');
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
// §14.6 mooring-mismatch gate.
const priorParse = versionRow.parseResults as { warnings?: string[] } | null;
const hasMooringMismatch = (priorParse?.warnings ?? []).some(
(w) => /uploading to/i.test(w) && /berth/i.test(w),
);
if (hasMooringMismatch && !opts.confirmMooringMismatch) {
throw new ConflictError(
'PDF mooring mismatch with target berth. Pass confirmMooringMismatch=true to override.',
);
}
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
const update: Record<string, unknown> = {};
const applied: Array<keyof ExtractedBerthFields> = [];
for (const key of APPLIABLE_FIELDS) {
const value = fieldsToApply[key];
if (value === undefined) continue;
if (value === null) {
update[key] = null;
applied.push(key);
continue;
}
if (NUMERIC_FIELDS.has(key)) {
const n = typeof value === 'number' ? value : Number(value);
if (!Number.isFinite(n)) continue;
// numeric columns expect strings to preserve precision.
update[key] = String(n);
} else {
update[key] = String(value);
}
applied.push(key);
}
if (applied.length === 0) {
throw new ValidationError('No appliable fields supplied.');
}
update.updatedAt = new Date();
await db.transaction(async (tx) => {
await tx.update(berths).set(update).where(eq(berths.id, berthId));
// Stamp the applied-field set onto parse_results for audit.
const prior = (versionRow.parseResults as Record<string, unknown> | null) ?? {};
await tx
.update(berthPdfVersions)
.set({
parseResults: {
...prior,
appliedFields: applied,
appliedAt: new Date().toISOString(),
},
})
.where(eq(berthPdfVersions.id, versionId));
});
return { updatedFields: applied };
}
// ─── version listing + rollback ──────────────────────────────────────────────
export interface BerthPdfVersionListItem {
id: string;
versionNumber: number;
fileName: string;
fileSizeBytes: number;
uploadedBy: string;
uploadedAt: Date;
isCurrent: boolean;
/** Pre-signed download URL (15-min TTL). */
downloadUrl: string;
downloadUrlExpiresAt: Date;
parseEngine: ParserEngine | null;
}
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
export async function listBerthPdfVersions(
berthId: string,
/** Tenant scope. NotFoundError when berth lives in a different port. */
portId: string,
): Promise<BerthPdfVersionListItem[]> {
const berthRow = await db.query.berths.findFirst({
where: and(eq(berths.id, berthId), eq(berths.portId, portId)),
});
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
if (!berthRow) throw new NotFoundError('Berth');
const rows = await db
.select()
.from(berthPdfVersions)
.where(eq(berthPdfVersions.berthId, berthId))
.orderBy(desc(berthPdfVersions.versionNumber));
const backend = await getStorageBackend();
feat(expenses): streaming expense-PDF export + receipt-less expense flag + audit-3 fixes Replaces the legacy text-only expense PDF (was just dumping rows into a single pdfme text field — no images, no pagination) with a proper streaming export modelled on the legacy Nuxt client-portal but re-architected for memory safety. The legacy implementation OOM'd on hundreds of receipts because it: - buffered every receipt image into memory simultaneously - accumulated PDF chunks into an array, concat'd at end - base64-encoded the whole PDF into a JSON response (3x peak memory) - had no image downscaling The new design: - `streamExpensePdf()` (src/lib/services/expense-pdf.service.ts): pdfkit pipes bytes directly to the HTTP response (no Buffer accumulation). Receipts are processed serially so peak heap is one image at a time. Sharp downscales any receipt > 500 KB or > 1500 px to JPEG q80 — typical 8 MB phone photo collapses to ~250 KB. For a 500-receipt export, peak RSS stays under ~100 MB; legacy needed >2 GB for the same input. - Pages: cover summary box (count, totals, currency equiv, optional processing fee), grouped expense table (groupBy=none|payer|category| date), one-page-per-receipt with header (establishment, amount, date, payer, category, file name) and full-bleed image. - Storage backend abstraction — receipts stream from `getStorageBackend().get(storageKey)`, works on MinIO/S3/filesystem. - Route: POST /api/v1/expenses/export/pdf streams binary application/pdf with cache-control:no-store. Validator caps expenseIds at 1000 to prevent runaway loops. Receipt-less expense flow (per user request): - Schema: 0033 migration adds `expenses.no_receipt_acknowledged` boolean (default false). - Validator: createExpenseSchema requires either receiptFileIds OR noReceiptAcknowledged=true; the .refine() error message tells the rep exactly what to do. updateExpenseSchema is partial and skips the rule (existing rows can be edited without re-acknowledging). - PDF: receiptless expenses get an inline red "(no receipt)" tag in the establishment cell + a red footer warning in the summary box showing the count and at-risk amount. - The legacy parent-company reimbursement queue may refuse to pay receiptless expenses, so the warning is load-bearing for ops. Audit-3 fixes piggy-backed: - 🔴 Tesseract OCR runtime now races a 30s timeout (CPU-bomb DoS protection — a crafted PDF rasterizing to high-res noise could pin the worker indefinitely). - 🟠 brochures.service.ts:listBrochures dropped a wasted query (the legacy single-brochure fast-path was discarding its result on the multi-brochure branch). - 🟠 berth-pdf.service.ts:listBerthPdfVersions now Promise.all's the presignDownload calls instead of awaiting each in a for-loop — 20-version berths went from 20× round-trip to 1×. - 🟡 public berths route no longer logs the full `row` object on enum drift (was dumping price + amenity columns into ops logs). - 🟡 dropped the dead `void sql` import from public berths route. Tests still 1163/1163. tsc clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:38:32 +02:00
// Presign in parallel — for an S3 backend each call is a separate HTTP
// round-trip, so a 20-version berth used to take 20× the latency in
// the sequential loop. Promise.all collapses to ~1× round-trip.
const presigned = await Promise.all(
rows.map((row) =>
backend.presignDownload(row.storageKey, {
expirySeconds: 900,
filename: row.fileName,
contentType: 'application/pdf',
}),
),
);
return rows.map((row, i) => {
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
const parseEngine = (row.parseResults as { engine?: ParserEngine } | null)?.engine ?? null;
feat(expenses): streaming expense-PDF export + receipt-less expense flag + audit-3 fixes Replaces the legacy text-only expense PDF (was just dumping rows into a single pdfme text field — no images, no pagination) with a proper streaming export modelled on the legacy Nuxt client-portal but re-architected for memory safety. The legacy implementation OOM'd on hundreds of receipts because it: - buffered every receipt image into memory simultaneously - accumulated PDF chunks into an array, concat'd at end - base64-encoded the whole PDF into a JSON response (3x peak memory) - had no image downscaling The new design: - `streamExpensePdf()` (src/lib/services/expense-pdf.service.ts): pdfkit pipes bytes directly to the HTTP response (no Buffer accumulation). Receipts are processed serially so peak heap is one image at a time. Sharp downscales any receipt > 500 KB or > 1500 px to JPEG q80 — typical 8 MB phone photo collapses to ~250 KB. For a 500-receipt export, peak RSS stays under ~100 MB; legacy needed >2 GB for the same input. - Pages: cover summary box (count, totals, currency equiv, optional processing fee), grouped expense table (groupBy=none|payer|category| date), one-page-per-receipt with header (establishment, amount, date, payer, category, file name) and full-bleed image. - Storage backend abstraction — receipts stream from `getStorageBackend().get(storageKey)`, works on MinIO/S3/filesystem. - Route: POST /api/v1/expenses/export/pdf streams binary application/pdf with cache-control:no-store. Validator caps expenseIds at 1000 to prevent runaway loops. Receipt-less expense flow (per user request): - Schema: 0033 migration adds `expenses.no_receipt_acknowledged` boolean (default false). - Validator: createExpenseSchema requires either receiptFileIds OR noReceiptAcknowledged=true; the .refine() error message tells the rep exactly what to do. updateExpenseSchema is partial and skips the rule (existing rows can be edited without re-acknowledging). - PDF: receiptless expenses get an inline red "(no receipt)" tag in the establishment cell + a red footer warning in the summary box showing the count and at-risk amount. - The legacy parent-company reimbursement queue may refuse to pay receiptless expenses, so the warning is load-bearing for ops. Audit-3 fixes piggy-backed: - 🔴 Tesseract OCR runtime now races a 30s timeout (CPU-bomb DoS protection — a crafted PDF rasterizing to high-res noise could pin the worker indefinitely). - 🟠 brochures.service.ts:listBrochures dropped a wasted query (the legacy single-brochure fast-path was discarding its result on the multi-brochure branch). - 🟠 berth-pdf.service.ts:listBerthPdfVersions now Promise.all's the presignDownload calls instead of awaiting each in a for-loop — 20-version berths went from 20× round-trip to 1×. - 🟡 public berths route no longer logs the full `row` object on enum drift (was dumping price + amenity columns into ops logs). - 🟡 dropped the dead `void sql` import from public berths route. Tests still 1163/1163. tsc clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:38:32 +02:00
return {
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
id: row.id,
versionNumber: row.versionNumber,
fileName: row.fileName,
fileSizeBytes: row.fileSizeBytes,
uploadedBy: row.uploadedBy,
uploadedAt: row.uploadedAt,
isCurrent: berthRow.currentPdfVersionId === row.id,
feat(expenses): streaming expense-PDF export + receipt-less expense flag + audit-3 fixes Replaces the legacy text-only expense PDF (was just dumping rows into a single pdfme text field — no images, no pagination) with a proper streaming export modelled on the legacy Nuxt client-portal but re-architected for memory safety. The legacy implementation OOM'd on hundreds of receipts because it: - buffered every receipt image into memory simultaneously - accumulated PDF chunks into an array, concat'd at end - base64-encoded the whole PDF into a JSON response (3x peak memory) - had no image downscaling The new design: - `streamExpensePdf()` (src/lib/services/expense-pdf.service.ts): pdfkit pipes bytes directly to the HTTP response (no Buffer accumulation). Receipts are processed serially so peak heap is one image at a time. Sharp downscales any receipt > 500 KB or > 1500 px to JPEG q80 — typical 8 MB phone photo collapses to ~250 KB. For a 500-receipt export, peak RSS stays under ~100 MB; legacy needed >2 GB for the same input. - Pages: cover summary box (count, totals, currency equiv, optional processing fee), grouped expense table (groupBy=none|payer|category| date), one-page-per-receipt with header (establishment, amount, date, payer, category, file name) and full-bleed image. - Storage backend abstraction — receipts stream from `getStorageBackend().get(storageKey)`, works on MinIO/S3/filesystem. - Route: POST /api/v1/expenses/export/pdf streams binary application/pdf with cache-control:no-store. Validator caps expenseIds at 1000 to prevent runaway loops. Receipt-less expense flow (per user request): - Schema: 0033 migration adds `expenses.no_receipt_acknowledged` boolean (default false). - Validator: createExpenseSchema requires either receiptFileIds OR noReceiptAcknowledged=true; the .refine() error message tells the rep exactly what to do. updateExpenseSchema is partial and skips the rule (existing rows can be edited without re-acknowledging). - PDF: receiptless expenses get an inline red "(no receipt)" tag in the establishment cell + a red footer warning in the summary box showing the count and at-risk amount. - The legacy parent-company reimbursement queue may refuse to pay receiptless expenses, so the warning is load-bearing for ops. Audit-3 fixes piggy-backed: - 🔴 Tesseract OCR runtime now races a 30s timeout (CPU-bomb DoS protection — a crafted PDF rasterizing to high-res noise could pin the worker indefinitely). - 🟠 brochures.service.ts:listBrochures dropped a wasted query (the legacy single-brochure fast-path was discarding its result on the multi-brochure branch). - 🟠 berth-pdf.service.ts:listBerthPdfVersions now Promise.all's the presignDownload calls instead of awaiting each in a for-loop — 20-version berths went from 20× round-trip to 1×. - 🟡 public berths route no longer logs the full `row` object on enum drift (was dumping price + amenity columns into ops logs). - 🟡 dropped the dead `void sql` import from public berths route. Tests still 1163/1163. tsc clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:38:32 +02:00
downloadUrl: presigned[i]!.url,
downloadUrlExpiresAt: presigned[i]!.expiresAt,
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
parseEngine,
feat(expenses): streaming expense-PDF export + receipt-less expense flag + audit-3 fixes Replaces the legacy text-only expense PDF (was just dumping rows into a single pdfme text field — no images, no pagination) with a proper streaming export modelled on the legacy Nuxt client-portal but re-architected for memory safety. The legacy implementation OOM'd on hundreds of receipts because it: - buffered every receipt image into memory simultaneously - accumulated PDF chunks into an array, concat'd at end - base64-encoded the whole PDF into a JSON response (3x peak memory) - had no image downscaling The new design: - `streamExpensePdf()` (src/lib/services/expense-pdf.service.ts): pdfkit pipes bytes directly to the HTTP response (no Buffer accumulation). Receipts are processed serially so peak heap is one image at a time. Sharp downscales any receipt > 500 KB or > 1500 px to JPEG q80 — typical 8 MB phone photo collapses to ~250 KB. For a 500-receipt export, peak RSS stays under ~100 MB; legacy needed >2 GB for the same input. - Pages: cover summary box (count, totals, currency equiv, optional processing fee), grouped expense table (groupBy=none|payer|category| date), one-page-per-receipt with header (establishment, amount, date, payer, category, file name) and full-bleed image. - Storage backend abstraction — receipts stream from `getStorageBackend().get(storageKey)`, works on MinIO/S3/filesystem. - Route: POST /api/v1/expenses/export/pdf streams binary application/pdf with cache-control:no-store. Validator caps expenseIds at 1000 to prevent runaway loops. Receipt-less expense flow (per user request): - Schema: 0033 migration adds `expenses.no_receipt_acknowledged` boolean (default false). - Validator: createExpenseSchema requires either receiptFileIds OR noReceiptAcknowledged=true; the .refine() error message tells the rep exactly what to do. updateExpenseSchema is partial and skips the rule (existing rows can be edited without re-acknowledging). - PDF: receiptless expenses get an inline red "(no receipt)" tag in the establishment cell + a red footer warning in the summary box showing the count and at-risk amount. - The legacy parent-company reimbursement queue may refuse to pay receiptless expenses, so the warning is load-bearing for ops. Audit-3 fixes piggy-backed: - 🔴 Tesseract OCR runtime now races a 30s timeout (CPU-bomb DoS protection — a crafted PDF rasterizing to high-res noise could pin the worker indefinitely). - 🟠 brochures.service.ts:listBrochures dropped a wasted query (the legacy single-brochure fast-path was discarding its result on the multi-brochure branch). - 🟠 berth-pdf.service.ts:listBerthPdfVersions now Promise.all's the presignDownload calls instead of awaiting each in a for-loop — 20-version berths went from 20× round-trip to 1×. - 🟡 public berths route no longer logs the full `row` object on enum drift (was dumping price + amenity columns into ops logs). - 🟡 dropped the dead `void sql` import from public berths route. Tests still 1163/1163. tsc clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:38:32 +02:00
};
});
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
}
/**
* Set `berths.current_pdf_version_id` to the requested version. Per §14.6,
* this does NOT re-parse and re-update the berth columns that's a separate
* deliberate "extract data from this version" action.
*/
export async function rollbackToVersion(
berthId: string,
versionId: string,
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
/** Tenant scope. NotFoundError when berth lives in a different port. */
portId: string,
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
): Promise<{ versionId: string; versionNumber: number }> {
const versionRow = await db.query.berthPdfVersions.findFirst({
where: and(eq(berthPdfVersions.id, versionId), eq(berthPdfVersions.berthId, berthId)),
});
if (!versionRow) throw new NotFoundError('Berth PDF version');
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
const berthRow = await db.query.berths.findFirst({
where: and(eq(berths.id, berthId), eq(berths.portId, portId)),
});
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
if (!berthRow) throw new NotFoundError('Berth');
if (berthRow.currentPdfVersionId === versionId) {
throw new ConflictError('That version is already current; rollback is a no-op.');
}
await db
.update(berths)
.set({ currentPdfVersionId: versionId, updatedAt: new Date() })
.where(eq(berths.id, berthId));
return { versionId, versionNumber: versionRow.versionNumber };
}
// ─── compare helpers ─────────────────────────────────────────────────────────
function normalizeForCompare(
key: keyof ExtractedBerthFields,
raw: unknown,
): string | number | null {
if (raw == null) return null;
if (NUMERIC_FIELDS.has(key)) {
const n = typeof raw === 'number' ? raw : Number(String(raw).replace(/[^0-9.\-]/g, ''));
return Number.isFinite(n) ? n : null;
}
if (typeof raw === 'string') return raw.trim();
return String(raw);
}
function valuesEqual(a: unknown, b: unknown, isNumeric: boolean): boolean {
if (a == null && b == null) return true;
if (a == null || b == null) return false;
if (isNumeric) {
const an = Number(a);
const bn = Number(b);
if (!Number.isFinite(an) || !Number.isFinite(bn)) return false;
if (an === bn) return true;
if (bn === 0) return Math.abs(an - bn) < 0.0001;
return Math.abs(an - bn) / Math.abs(bn) <= IMPERIAL_METRIC_TOLERANCE;
}
return String(a).trim().toLowerCase() === String(b).trim().toLowerCase();
}
// ─── re-exports the route layer leans on ─────────────────────────────────────
export { parseBerthPdf } from './berth-pdf-parser';
export type {
ExtractedBerthFields,
ParsedField,
ParseResult,
ParserEngine,
} from './berth-pdf-parser';