Files
pn-new-crm/src/lib/services/berth-pdf.service.ts

671 lines
25 KiB
TypeScript
Raw Normal View History

feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
/**
* Berth PDF management service (Phase 6b see plan §4.7b, §11.1, §14.6).
*
* Responsibilities:
* - Upload a per-berth PDF (versioned), via the active `StorageBackend`.
* - Verify the magic bytes (`%PDF-`) before persisting; delete the storage
* object on mismatch (§14.6 critical).
* - Reconcile the parsed fields against the current berth row, surfacing
* conflicts for the rep's diff dialog and auto-applying nullable gaps.
* - Enforce per-port size cap from `system_settings.berth_pdf_max_upload_mb`.
* - Generate signed download URLs for the version list.
*/
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
import { createHash } from 'node:crypto';
import { and, desc, eq, isNull, max, sql } from 'drizzle-orm';
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
import { db } from '@/lib/db';
import { berths, berthPdfVersions } from '@/lib/db/schema/berths';
import { systemSettings } from '@/lib/db/schema/system';
feat(errors): platform-wide request ids + error codes + admin inspector End-to-end error-handling overhaul. A user hitting any failure now sees a plain-text message + stable error code + reference id. A super admin can paste the id into /admin/errors/<id> for the full request shape, sanitized body, error stack, and a heuristic likely-cause hint. REQUEST CONTEXT (AsyncLocalStorage) - src/lib/request-context.ts mints a per-request frame carrying requestId + portId + userId + method + path + start timestamp. - withAuth wraps every authenticated handler in runWithRequestContext and accepts an upstream X-Request-Id header (validated shape) or generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id response header, including early-return 401/403/4xx paths. - Pino logger reads from the same context via mixin — every log line emitted during the request automatically carries the ids with no per-call threading. ERROR CODE REGISTRY - src/lib/error-codes.ts defines stable DOMAIN_REASON codes with HTTP status + plain-text user-facing message (no jargon, written for the rep on the phone with a customer). - New CodedError class wraps a registered code + optional internalMessage (admin-only — never sent to client). - Existing AppError subclasses got plain-text default rewrites so legacy throw sites improve immediately without migration. - High-impact services migrated to specific codes: expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths (CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY / PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender (INTEREST_PORT_MISMATCH). ERROR ENVELOPE - errorResponse always sets X-Request-Id header + requestId field. - 5xx responses include a "Quote error ID …" friendly line. - 4xx kept clean (validation, permission, not-found don't pollute the inspector — they're already in audit log). PERSISTENCE (error_events table, migration 0040) - One row per 5xx, keyed on requestId, with method/path/status/error name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap; password/token/secret/etc keys redacted)/duration/IP/UA/metadata. - captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code so the classifier can recognize FK / unique / NOT NULL / schema- drift violations. - Failure to persist is logged-not-thrown. LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts) - 4-pass heuristic (first match wins): 1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique, 42703 schema drift, 53300 connection limit, …) 2. Error class name (AbortError, TimeoutError, FetchError, ZodError) 3. Stack-path patterns (/lib/storage/, /lib/email/, documenso, openai|claude, /queue/workers/) 4. Free-text message keywords (econnrefused, rate limit, timeout, unauthorized|invalid api key) - Returns { label, hint, subsystem } for the inspector badge. CLIENT SIDE - apiFetch throws structured ApiError with message + code + requestId + details + retryAfter. - toastError() helper renders the standard 3-line toast: plain message / Error code: X / Reference ID: Y [Copy ID]. ADMIN INSPECTOR - /<port>/admin/errors lists captured 5xx with status badge + path + likely-culprit badge + truncated message + reference id. Filter by status code; auto-refresh via TanStack Query. - /<port>/admin/errors/<requestId> deep-dive: request shape, full error name+message+stack, sanitized body excerpt, raw metadata, registered-code lookup (so admin can compare to what user saw), likely-culprit hint with subsystem tag. - /<port>/admin/errors/codes is the in-app code reference page — every registered code grouped by domain prefix, searchable, with HTTP status + user message inline. Linked from inspector header so admins can flip to it while triaging. - Permission: admin.view_audit_log. Super admins see all ports; regular admins port-scoped. - system-monitoring dashboard now surfaces error_events alongside permission_denied audit + queue failed jobs (RecentError gains source: 'request' variant). DOCS - docs/error-handling.md walks through coded errors, plain-text message guidelines, client toasting, admin inspector usage, persistence rules, classifier internals, pruning, and the legacy → CodedError migration path. MIGRATION SAFETY - Audit confirmed all 41 migrations (0000-0040) apply cleanly in journal order against an empty DB. 0040 references ports(id) which exists from 0000. 0035/0038 don't deadlock under sequential psql -f. Removed redundant idx_ds_sent_by from 0038 (created in 0037). Tests: 1168/1168 vitest passing. tsc clean. - security-error-responses tests updated for plain-text messages + new optional response keys (code/requestId/message). - berth-pdf-versions tests assert stable error codes via toMatchObject({ code }) rather than message regex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
import { CodedError, ConflictError, NotFoundError, ValidationError } from '@/lib/errors';
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
import { logger } from '@/lib/logger';
import { getStorageBackend } from '@/lib/storage';
import {
type ExtractedBerthFields,
type ParseResult,
type ParserEngine,
isPdfMagic,
} from './berth-pdf-parser';
// ─── shared types ────────────────────────────────────────────────────────────
export interface ReconcileConflict {
field: keyof ExtractedBerthFields;
crmValue: string | number | null;
pdfValue: string | number | null;
/** Confidence the parser assigned to the PDF value (0..1). */
pdfConfidence: number;
}
export interface ReconcileResult {
/** Fields where CRM was null and the PDF supplied a value; these can be
* applied automatically (the rep still sees them as "Auto-applied" chips). */
autoApplied: Array<{ field: keyof ExtractedBerthFields; value: string | number }>;
/** Fields where CRM and PDF disagree on a non-null value. The diff dialog
* shows these as a side-by-side comparison; nothing is written until the
* rep confirms via the apply endpoint. */
conflicts: ReconcileConflict[];
/** Pure-warning bucket e.g. mooring-number mismatch with the berth being
* uploaded to (§14.6). */
warnings: string[];
/** Engine that produced the parse — surfaced on the diff UI. */
engine: ParserEngine;
}
// Field allowlist for reconcile/apply. Mirrors `berths` columns; we never
// blindly write `crypto.randomUUID()` or anything outside this set so a
// rogue parser tier can't poison the schema.
const APPLIABLE_FIELDS: ReadonlyArray<keyof ExtractedBerthFields> = [
'lengthFt',
'lengthM',
'widthFt',
'widthM',
'draftFt',
'draftM',
'waterDepth',
'waterDepthM',
'bowFacing',
'sidePontoon',
'powerCapacity',
'voltage',
'mooringType',
'cleatType',
'cleatCapacity',
'bollardType',
'bollardCapacity',
'access',
'price',
'weeklyRateHighUsd',
'weeklyRateLowUsd',
'dailyRateHighUsd',
'dailyRateLowUsd',
'pricingValidUntil',
];
// Numeric berths columns are stored as `numeric` (Drizzle returns string).
// This set tells the apply path which fields need stringification.
const NUMERIC_FIELDS = new Set<keyof ExtractedBerthFields>([
'lengthFt',
'lengthM',
'widthFt',
'widthM',
'draftFt',
'draftM',
'waterDepth',
'waterDepthM',
'powerCapacity',
'voltage',
'price',
'weeklyRateHighUsd',
'weeklyRateLowUsd',
'dailyRateHighUsd',
'dailyRateLowUsd',
]);
// Tolerance for imperial vs metric reconcile. Same threshold as the parser.
const IMPERIAL_METRIC_TOLERANCE = 0.01;
// ─── settings helpers ────────────────────────────────────────────────────────
/** Resolve `berth_pdf_max_upload_mb` with port-override → global → default 15. */
export async function getMaxUploadMb(portId: string): Promise<number> {
const KEY = 'berth_pdf_max_upload_mb';
const [portRow] = await db
.select()
.from(systemSettings)
.where(and(eq(systemSettings.key, KEY), eq(systemSettings.portId, portId)));
if (portRow && typeof portRow.value === 'number') return portRow.value;
if (portRow && typeof portRow.value === 'string') {
const n = Number(portRow.value);
if (Number.isFinite(n)) return n;
}
const [globalRow] = await db
.select()
.from(systemSettings)
.where(and(eq(systemSettings.key, KEY), isNull(systemSettings.portId)));
if (globalRow && typeof globalRow.value === 'number') return globalRow.value;
if (globalRow && typeof globalRow.value === 'string') {
const n = Number(globalRow.value);
if (Number.isFinite(n)) return n;
}
return 15;
}
// ─── upload + version management ─────────────────────────────────────────────
export interface UploadBerthPdfArgs {
berthId: string;
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
/**
* Acting tenant. Every public service entrypoint requires this so the berth
* lookup can be scoped to `(berth.id, port_id)` without it a rep with
* berths:edit on port A could supply a port B berth UUID and write/read
* cross-tenant data. NotFoundError on mismatch.
*/
portId: string;
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
/** Already-uploaded storage key (the upload-url endpoint generated it) OR
* undefined to make this service compute one. */
storageKey?: string;
/** Raw bytes when the server proxies the upload (filesystem mode); when
* callers used a presigned PUT they pass `storageKey` and skip this. */
buffer?: Buffer;
fileName: string;
uploadedBy: string;
/** Pre-computed sha256 hex from the client (verified server-side anyway). */
sha256?: string;
/** Pre-computed bytes (used for the size cap pre-flight on direct uploads). */
fileSizeBytes?: number;
/** Result of running `parseBerthPdf` server-side. Optional the rep may
* have skipped parsing on a re-upload. */
parseResult?: ParseResult;
}
export interface UploadBerthPdfResult {
versionId: string;
storageKey: string;
versionNumber: number;
fileSizeBytes: number;
contentSha256: string;
}
/**
* Persist a per-berth PDF version. Either the raw `buffer` or a pre-uploaded
* `storageKey` (with optional `buffer` for verification) is required.
*
* Critical mitigations enforced here:
* - §14.6 magic-byte check against the buffer when present.
* - §14.6 size cap from `berth_pdf_max_upload_mb`.
* - Storage key namespaced under `berths/{id}/v{n}/...` so two reps racing
* on the same berth can't collide (the version-number unique index in
* the DB does the dedup).
*/
export async function uploadBerthPdf(args: UploadBerthPdfArgs): Promise<UploadBerthPdfResult> {
// 1. Resolve the berth + port for size-cap lookup.
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
// Tenant-scoped lookup — NotFoundError when the berth lives in a different
// port so a rep on port A cannot upload PDFs against port B's berths.
const berthRow = await db.query.berths.findFirst({
where: and(eq(berths.id, args.berthId), eq(berths.portId, args.portId)),
});
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
if (!berthRow) throw new NotFoundError('Berth');
const maxMb = await getMaxUploadMb(berthRow.portId);
const maxBytes = maxMb * 1024 * 1024;
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
// 2. Per-berth advisory lock prevents two concurrent uploads from both
// computing version `v3` and racing to write blobs (the unique index
// on (berth_id, version_number) would catch the second insert, but
// only AFTER its blob is already in storage — leaving an orphan).
// The lock is scoped to a transaction wrapping the version-number
// read AND the blob write, so concurrent uploads serialize cleanly.
// NB: hash the UUID into a 32-bit int for pg_advisory_xact_lock(int).
const berthLockKey = hashBerthIdToInt(args.berthId);
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
// 3. Magic bytes + size when we have the buffer in hand.
const backend = await getStorageBackend();
const buffer = args.buffer;
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
// UUID-based storage key path so two concurrent uploads can't collide
// on the same blob path (the version_number suffix used to be in the
// key but is now a separate DB column allocated under the per-berth
// advisory lock — see step 4).
let versionNumber = 1;
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
let storageKey =
args.storageKey ??
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
`berths/${args.berthId}/${crypto.randomUUID()}/${sanitizeFileName(args.fileName)}`;
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
let sizeBytes = args.fileSizeBytes ?? buffer?.length ?? 0;
let sha256 = args.sha256 ?? '';
if (buffer) {
if (!isPdfMagic(buffer)) {
// Best-effort cleanup if the storage already has a partial.
if (args.storageKey) await backend.delete(args.storageKey).catch(() => undefined);
feat(errors): platform-wide request ids + error codes + admin inspector End-to-end error-handling overhaul. A user hitting any failure now sees a plain-text message + stable error code + reference id. A super admin can paste the id into /admin/errors/<id> for the full request shape, sanitized body, error stack, and a heuristic likely-cause hint. REQUEST CONTEXT (AsyncLocalStorage) - src/lib/request-context.ts mints a per-request frame carrying requestId + portId + userId + method + path + start timestamp. - withAuth wraps every authenticated handler in runWithRequestContext and accepts an upstream X-Request-Id header (validated shape) or generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id response header, including early-return 401/403/4xx paths. - Pino logger reads from the same context via mixin — every log line emitted during the request automatically carries the ids with no per-call threading. ERROR CODE REGISTRY - src/lib/error-codes.ts defines stable DOMAIN_REASON codes with HTTP status + plain-text user-facing message (no jargon, written for the rep on the phone with a customer). - New CodedError class wraps a registered code + optional internalMessage (admin-only — never sent to client). - Existing AppError subclasses got plain-text default rewrites so legacy throw sites improve immediately without migration. - High-impact services migrated to specific codes: expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths (CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY / PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender (INTEREST_PORT_MISMATCH). ERROR ENVELOPE - errorResponse always sets X-Request-Id header + requestId field. - 5xx responses include a "Quote error ID …" friendly line. - 4xx kept clean (validation, permission, not-found don't pollute the inspector — they're already in audit log). PERSISTENCE (error_events table, migration 0040) - One row per 5xx, keyed on requestId, with method/path/status/error name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap; password/token/secret/etc keys redacted)/duration/IP/UA/metadata. - captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code so the classifier can recognize FK / unique / NOT NULL / schema- drift violations. - Failure to persist is logged-not-thrown. LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts) - 4-pass heuristic (first match wins): 1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique, 42703 schema drift, 53300 connection limit, …) 2. Error class name (AbortError, TimeoutError, FetchError, ZodError) 3. Stack-path patterns (/lib/storage/, /lib/email/, documenso, openai|claude, /queue/workers/) 4. Free-text message keywords (econnrefused, rate limit, timeout, unauthorized|invalid api key) - Returns { label, hint, subsystem } for the inspector badge. CLIENT SIDE - apiFetch throws structured ApiError with message + code + requestId + details + retryAfter. - toastError() helper renders the standard 3-line toast: plain message / Error code: X / Reference ID: Y [Copy ID]. ADMIN INSPECTOR - /<port>/admin/errors lists captured 5xx with status badge + path + likely-culprit badge + truncated message + reference id. Filter by status code; auto-refresh via TanStack Query. - /<port>/admin/errors/<requestId> deep-dive: request shape, full error name+message+stack, sanitized body excerpt, raw metadata, registered-code lookup (so admin can compare to what user saw), likely-culprit hint with subsystem tag. - /<port>/admin/errors/codes is the in-app code reference page — every registered code grouped by domain prefix, searchable, with HTTP status + user message inline. Linked from inspector header so admins can flip to it while triaging. - Permission: admin.view_audit_log. Super admins see all ports; regular admins port-scoped. - system-monitoring dashboard now surfaces error_events alongside permission_denied audit + queue failed jobs (RecentError gains source: 'request' variant). DOCS - docs/error-handling.md walks through coded errors, plain-text message guidelines, client toasting, admin inspector usage, persistence rules, classifier internals, pruning, and the legacy → CodedError migration path. MIGRATION SAFETY - Audit confirmed all 41 migrations (0000-0040) apply cleanly in journal order against an empty DB. 0040 references ports(id) which exists from 0000. 0035/0038 don't deadlock under sequential psql -f. Removed redundant idx_ds_sent_by from 0038 (created in 0037). Tests: 1168/1168 vitest passing. tsc clean. - security-error-responses tests updated for plain-text messages + new optional response keys (code/requestId/message). - berth-pdf-versions tests assert stable error codes via toMatchObject({ code }) rather than message regex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
throw new CodedError('BERTHS_PDF_MAGIC_BYTE');
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
}
feat(errors): platform-wide request ids + error codes + admin inspector End-to-end error-handling overhaul. A user hitting any failure now sees a plain-text message + stable error code + reference id. A super admin can paste the id into /admin/errors/<id> for the full request shape, sanitized body, error stack, and a heuristic likely-cause hint. REQUEST CONTEXT (AsyncLocalStorage) - src/lib/request-context.ts mints a per-request frame carrying requestId + portId + userId + method + path + start timestamp. - withAuth wraps every authenticated handler in runWithRequestContext and accepts an upstream X-Request-Id header (validated shape) or generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id response header, including early-return 401/403/4xx paths. - Pino logger reads from the same context via mixin — every log line emitted during the request automatically carries the ids with no per-call threading. ERROR CODE REGISTRY - src/lib/error-codes.ts defines stable DOMAIN_REASON codes with HTTP status + plain-text user-facing message (no jargon, written for the rep on the phone with a customer). - New CodedError class wraps a registered code + optional internalMessage (admin-only — never sent to client). - Existing AppError subclasses got plain-text default rewrites so legacy throw sites improve immediately without migration. - High-impact services migrated to specific codes: expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths (CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY / PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender (INTEREST_PORT_MISMATCH). ERROR ENVELOPE - errorResponse always sets X-Request-Id header + requestId field. - 5xx responses include a "Quote error ID …" friendly line. - 4xx kept clean (validation, permission, not-found don't pollute the inspector — they're already in audit log). PERSISTENCE (error_events table, migration 0040) - One row per 5xx, keyed on requestId, with method/path/status/error name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap; password/token/secret/etc keys redacted)/duration/IP/UA/metadata. - captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code so the classifier can recognize FK / unique / NOT NULL / schema- drift violations. - Failure to persist is logged-not-thrown. LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts) - 4-pass heuristic (first match wins): 1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique, 42703 schema drift, 53300 connection limit, …) 2. Error class name (AbortError, TimeoutError, FetchError, ZodError) 3. Stack-path patterns (/lib/storage/, /lib/email/, documenso, openai|claude, /queue/workers/) 4. Free-text message keywords (econnrefused, rate limit, timeout, unauthorized|invalid api key) - Returns { label, hint, subsystem } for the inspector badge. CLIENT SIDE - apiFetch throws structured ApiError with message + code + requestId + details + retryAfter. - toastError() helper renders the standard 3-line toast: plain message / Error code: X / Reference ID: Y [Copy ID]. ADMIN INSPECTOR - /<port>/admin/errors lists captured 5xx with status badge + path + likely-culprit badge + truncated message + reference id. Filter by status code; auto-refresh via TanStack Query. - /<port>/admin/errors/<requestId> deep-dive: request shape, full error name+message+stack, sanitized body excerpt, raw metadata, registered-code lookup (so admin can compare to what user saw), likely-culprit hint with subsystem tag. - /<port>/admin/errors/codes is the in-app code reference page — every registered code grouped by domain prefix, searchable, with HTTP status + user message inline. Linked from inspector header so admins can flip to it while triaging. - Permission: admin.view_audit_log. Super admins see all ports; regular admins port-scoped. - system-monitoring dashboard now surfaces error_events alongside permission_denied audit + queue failed jobs (RecentError gains source: 'request' variant). DOCS - docs/error-handling.md walks through coded errors, plain-text message guidelines, client toasting, admin inspector usage, persistence rules, classifier internals, pruning, and the legacy → CodedError migration path. MIGRATION SAFETY - Audit confirmed all 41 migrations (0000-0040) apply cleanly in journal order against an empty DB. 0040 references ports(id) which exists from 0000. 0035/0038 don't deadlock under sequential psql -f. Removed redundant idx_ds_sent_by from 0038 (created in 0037). Tests: 1168/1168 vitest passing. tsc clean. - security-error-responses tests updated for plain-text messages + new optional response keys (code/requestId/message). - berth-pdf-versions tests assert stable error codes via toMatchObject({ code }) rather than message regex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
if (buffer.length === 0) throw new CodedError('BERTHS_PDF_EMPTY');
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
if (buffer.length > maxBytes) {
feat(errors): platform-wide request ids + error codes + admin inspector End-to-end error-handling overhaul. A user hitting any failure now sees a plain-text message + stable error code + reference id. A super admin can paste the id into /admin/errors/<id> for the full request shape, sanitized body, error stack, and a heuristic likely-cause hint. REQUEST CONTEXT (AsyncLocalStorage) - src/lib/request-context.ts mints a per-request frame carrying requestId + portId + userId + method + path + start timestamp. - withAuth wraps every authenticated handler in runWithRequestContext and accepts an upstream X-Request-Id header (validated shape) or generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id response header, including early-return 401/403/4xx paths. - Pino logger reads from the same context via mixin — every log line emitted during the request automatically carries the ids with no per-call threading. ERROR CODE REGISTRY - src/lib/error-codes.ts defines stable DOMAIN_REASON codes with HTTP status + plain-text user-facing message (no jargon, written for the rep on the phone with a customer). - New CodedError class wraps a registered code + optional internalMessage (admin-only — never sent to client). - Existing AppError subclasses got plain-text default rewrites so legacy throw sites improve immediately without migration. - High-impact services migrated to specific codes: expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths (CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY / PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender (INTEREST_PORT_MISMATCH). ERROR ENVELOPE - errorResponse always sets X-Request-Id header + requestId field. - 5xx responses include a "Quote error ID …" friendly line. - 4xx kept clean (validation, permission, not-found don't pollute the inspector — they're already in audit log). PERSISTENCE (error_events table, migration 0040) - One row per 5xx, keyed on requestId, with method/path/status/error name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap; password/token/secret/etc keys redacted)/duration/IP/UA/metadata. - captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code so the classifier can recognize FK / unique / NOT NULL / schema- drift violations. - Failure to persist is logged-not-thrown. LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts) - 4-pass heuristic (first match wins): 1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique, 42703 schema drift, 53300 connection limit, …) 2. Error class name (AbortError, TimeoutError, FetchError, ZodError) 3. Stack-path patterns (/lib/storage/, /lib/email/, documenso, openai|claude, /queue/workers/) 4. Free-text message keywords (econnrefused, rate limit, timeout, unauthorized|invalid api key) - Returns { label, hint, subsystem } for the inspector badge. CLIENT SIDE - apiFetch throws structured ApiError with message + code + requestId + details + retryAfter. - toastError() helper renders the standard 3-line toast: plain message / Error code: X / Reference ID: Y [Copy ID]. ADMIN INSPECTOR - /<port>/admin/errors lists captured 5xx with status badge + path + likely-culprit badge + truncated message + reference id. Filter by status code; auto-refresh via TanStack Query. - /<port>/admin/errors/<requestId> deep-dive: request shape, full error name+message+stack, sanitized body excerpt, raw metadata, registered-code lookup (so admin can compare to what user saw), likely-culprit hint with subsystem tag. - /<port>/admin/errors/codes is the in-app code reference page — every registered code grouped by domain prefix, searchable, with HTTP status + user message inline. Linked from inspector header so admins can flip to it while triaging. - Permission: admin.view_audit_log. Super admins see all ports; regular admins port-scoped. - system-monitoring dashboard now surfaces error_events alongside permission_denied audit + queue failed jobs (RecentError gains source: 'request' variant). DOCS - docs/error-handling.md walks through coded errors, plain-text message guidelines, client toasting, admin inspector usage, persistence rules, classifier internals, pruning, and the legacy → CodedError migration path. MIGRATION SAFETY - Audit confirmed all 41 migrations (0000-0040) apply cleanly in journal order against an empty DB. 0040 references ports(id) which exists from 0000. 0035/0038 don't deadlock under sequential psql -f. Removed redundant idx_ds_sent_by from 0038 (created in 0037). Tests: 1168/1168 vitest passing. tsc clean. - security-error-responses tests updated for plain-text messages + new optional response keys (code/requestId/message). - berth-pdf-versions tests assert stable error codes via toMatchObject({ code }) rather than message regex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
throw new CodedError('BERTHS_PDF_TOO_LARGE', {
internalMessage: `PDF exceeds ${maxMb} MB upload cap (got ${(buffer.length / 1024 / 1024).toFixed(1)} MB).`,
});
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
}
const written = await backend.put(storageKey, buffer, { contentType: 'application/pdf' });
storageKey = written.key;
sizeBytes = written.sizeBytes;
sha256 = written.sha256;
} else if (args.storageKey) {
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
// Browser uploaded directly via presigned URL — verify via HEAD + magic bytes.
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
const head = await backend.head(args.storageKey);
if (!head) {
throw new ValidationError('Uploaded object not found at expected storage key.');
}
if (head.sizeBytes === 0) {
await backend.delete(args.storageKey).catch(() => undefined);
feat(errors): platform-wide request ids + error codes + admin inspector End-to-end error-handling overhaul. A user hitting any failure now sees a plain-text message + stable error code + reference id. A super admin can paste the id into /admin/errors/<id> for the full request shape, sanitized body, error stack, and a heuristic likely-cause hint. REQUEST CONTEXT (AsyncLocalStorage) - src/lib/request-context.ts mints a per-request frame carrying requestId + portId + userId + method + path + start timestamp. - withAuth wraps every authenticated handler in runWithRequestContext and accepts an upstream X-Request-Id header (validated shape) or generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id response header, including early-return 401/403/4xx paths. - Pino logger reads from the same context via mixin — every log line emitted during the request automatically carries the ids with no per-call threading. ERROR CODE REGISTRY - src/lib/error-codes.ts defines stable DOMAIN_REASON codes with HTTP status + plain-text user-facing message (no jargon, written for the rep on the phone with a customer). - New CodedError class wraps a registered code + optional internalMessage (admin-only — never sent to client). - Existing AppError subclasses got plain-text default rewrites so legacy throw sites improve immediately without migration. - High-impact services migrated to specific codes: expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths (CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY / PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender (INTEREST_PORT_MISMATCH). ERROR ENVELOPE - errorResponse always sets X-Request-Id header + requestId field. - 5xx responses include a "Quote error ID …" friendly line. - 4xx kept clean (validation, permission, not-found don't pollute the inspector — they're already in audit log). PERSISTENCE (error_events table, migration 0040) - One row per 5xx, keyed on requestId, with method/path/status/error name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap; password/token/secret/etc keys redacted)/duration/IP/UA/metadata. - captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code so the classifier can recognize FK / unique / NOT NULL / schema- drift violations. - Failure to persist is logged-not-thrown. LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts) - 4-pass heuristic (first match wins): 1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique, 42703 schema drift, 53300 connection limit, …) 2. Error class name (AbortError, TimeoutError, FetchError, ZodError) 3. Stack-path patterns (/lib/storage/, /lib/email/, documenso, openai|claude, /queue/workers/) 4. Free-text message keywords (econnrefused, rate limit, timeout, unauthorized|invalid api key) - Returns { label, hint, subsystem } for the inspector badge. CLIENT SIDE - apiFetch throws structured ApiError with message + code + requestId + details + retryAfter. - toastError() helper renders the standard 3-line toast: plain message / Error code: X / Reference ID: Y [Copy ID]. ADMIN INSPECTOR - /<port>/admin/errors lists captured 5xx with status badge + path + likely-culprit badge + truncated message + reference id. Filter by status code; auto-refresh via TanStack Query. - /<port>/admin/errors/<requestId> deep-dive: request shape, full error name+message+stack, sanitized body excerpt, raw metadata, registered-code lookup (so admin can compare to what user saw), likely-culprit hint with subsystem tag. - /<port>/admin/errors/codes is the in-app code reference page — every registered code grouped by domain prefix, searchable, with HTTP status + user message inline. Linked from inspector header so admins can flip to it while triaging. - Permission: admin.view_audit_log. Super admins see all ports; regular admins port-scoped. - system-monitoring dashboard now surfaces error_events alongside permission_denied audit + queue failed jobs (RecentError gains source: 'request' variant). DOCS - docs/error-handling.md walks through coded errors, plain-text message guidelines, client toasting, admin inspector usage, persistence rules, classifier internals, pruning, and the legacy → CodedError migration path. MIGRATION SAFETY - Audit confirmed all 41 migrations (0000-0040) apply cleanly in journal order against an empty DB. 0040 references ports(id) which exists from 0000. 0035/0038 don't deadlock under sequential psql -f. Removed redundant idx_ds_sent_by from 0038 (created in 0037). Tests: 1168/1168 vitest passing. tsc clean. - security-error-responses tests updated for plain-text messages + new optional response keys (code/requestId/message). - berth-pdf-versions tests assert stable error codes via toMatchObject({ code }) rather than message regex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
throw new CodedError('BERTHS_PDF_EMPTY');
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
}
if (head.sizeBytes > maxBytes) {
await backend.delete(args.storageKey).catch(() => undefined);
feat(errors): platform-wide request ids + error codes + admin inspector End-to-end error-handling overhaul. A user hitting any failure now sees a plain-text message + stable error code + reference id. A super admin can paste the id into /admin/errors/<id> for the full request shape, sanitized body, error stack, and a heuristic likely-cause hint. REQUEST CONTEXT (AsyncLocalStorage) - src/lib/request-context.ts mints a per-request frame carrying requestId + portId + userId + method + path + start timestamp. - withAuth wraps every authenticated handler in runWithRequestContext and accepts an upstream X-Request-Id header (validated shape) or generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id response header, including early-return 401/403/4xx paths. - Pino logger reads from the same context via mixin — every log line emitted during the request automatically carries the ids with no per-call threading. ERROR CODE REGISTRY - src/lib/error-codes.ts defines stable DOMAIN_REASON codes with HTTP status + plain-text user-facing message (no jargon, written for the rep on the phone with a customer). - New CodedError class wraps a registered code + optional internalMessage (admin-only — never sent to client). - Existing AppError subclasses got plain-text default rewrites so legacy throw sites improve immediately without migration. - High-impact services migrated to specific codes: expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths (CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY / PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender (INTEREST_PORT_MISMATCH). ERROR ENVELOPE - errorResponse always sets X-Request-Id header + requestId field. - 5xx responses include a "Quote error ID …" friendly line. - 4xx kept clean (validation, permission, not-found don't pollute the inspector — they're already in audit log). PERSISTENCE (error_events table, migration 0040) - One row per 5xx, keyed on requestId, with method/path/status/error name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap; password/token/secret/etc keys redacted)/duration/IP/UA/metadata. - captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code so the classifier can recognize FK / unique / NOT NULL / schema- drift violations. - Failure to persist is logged-not-thrown. LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts) - 4-pass heuristic (first match wins): 1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique, 42703 schema drift, 53300 connection limit, …) 2. Error class name (AbortError, TimeoutError, FetchError, ZodError) 3. Stack-path patterns (/lib/storage/, /lib/email/, documenso, openai|claude, /queue/workers/) 4. Free-text message keywords (econnrefused, rate limit, timeout, unauthorized|invalid api key) - Returns { label, hint, subsystem } for the inspector badge. CLIENT SIDE - apiFetch throws structured ApiError with message + code + requestId + details + retryAfter. - toastError() helper renders the standard 3-line toast: plain message / Error code: X / Reference ID: Y [Copy ID]. ADMIN INSPECTOR - /<port>/admin/errors lists captured 5xx with status badge + path + likely-culprit badge + truncated message + reference id. Filter by status code; auto-refresh via TanStack Query. - /<port>/admin/errors/<requestId> deep-dive: request shape, full error name+message+stack, sanitized body excerpt, raw metadata, registered-code lookup (so admin can compare to what user saw), likely-culprit hint with subsystem tag. - /<port>/admin/errors/codes is the in-app code reference page — every registered code grouped by domain prefix, searchable, with HTTP status + user message inline. Linked from inspector header so admins can flip to it while triaging. - Permission: admin.view_audit_log. Super admins see all ports; regular admins port-scoped. - system-monitoring dashboard now surfaces error_events alongside permission_denied audit + queue failed jobs (RecentError gains source: 'request' variant). DOCS - docs/error-handling.md walks through coded errors, plain-text message guidelines, client toasting, admin inspector usage, persistence rules, classifier internals, pruning, and the legacy → CodedError migration path. MIGRATION SAFETY - Audit confirmed all 41 migrations (0000-0040) apply cleanly in journal order against an empty DB. 0040 references ports(id) which exists from 0000. 0035/0038 don't deadlock under sequential psql -f. Removed redundant idx_ds_sent_by from 0038 (created in 0037). Tests: 1168/1168 vitest passing. tsc clean. - security-error-responses tests updated for plain-text messages + new optional response keys (code/requestId/message). - berth-pdf-versions tests assert stable error codes via toMatchObject({ code }) rather than message regex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
throw new CodedError('BERTHS_PDF_TOO_LARGE', {
internalMessage: `PDF exceeds ${maxMb} MB upload cap (got ${(head.sizeBytes / 1024 / 1024).toFixed(1)} MB).`,
});
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
}
if (head.contentType !== 'application/pdf' && head.contentType !== 'application/octet-stream') {
await backend.delete(args.storageKey).catch(() => undefined);
throw new ValidationError(
`Uploaded object content-type is ${head.contentType}; expected application/pdf.`,
);
}
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
// Magic-byte check on the presign path (§14.6 critical) - browser-
// uploaded objects could be anything until we read the bytes. Stream
// just the first 5 bytes; abort early on mismatch and delete the blob.
const probeBytes = await readFirstBytes(backend, args.storageKey, 5);
if (!isPdfMagic(probeBytes)) {
await backend.delete(args.storageKey).catch(() => undefined);
throw new ValidationError(
'Uploaded file failed PDF magic-byte check (does not start with %PDF-).',
);
}
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
sizeBytes = head.sizeBytes;
sha256 = args.sha256 ?? '';
storageKey = args.storageKey;
} else {
throw new ValidationError('Either buffer or storageKey is required.');
}
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
// 4. Take the per-berth advisory lock, compute version_number under
// the lock, insert + bump pointer. All inside a single transaction
// so the lock + writes commit atomically.
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
const versionId = crypto.randomUUID();
await db.transaction(async (tx) => {
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
await tx.execute(sql`SELECT pg_advisory_xact_lock(${berthLockKey})`);
versionNumber = await nextVersionNumberTx(tx, args.berthId);
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
await tx.insert(berthPdfVersions).values({
id: versionId,
berthId: args.berthId,
versionNumber,
storageKey,
fileName: args.fileName,
fileSizeBytes: sizeBytes,
contentSha256: sha256,
uploadedBy: args.uploadedBy,
parseResults: args.parseResult ? serializeParseResult(args.parseResult) : null,
});
await tx
.update(berths)
.set({ currentPdfVersionId: versionId, updatedAt: new Date() })
.where(eq(berths.id, args.berthId));
});
logger.info(
{ berthId: args.berthId, versionId, versionNumber, storageKey, sizeBytes },
'Berth PDF version saved',
);
return { versionId, storageKey, versionNumber, fileSizeBytes: sizeBytes, contentSha256: sha256 };
}
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
/** Tx-bound variant — same SELECT MAX(...) but inside the caller's transaction. */
async function nextVersionNumberTx(
tx: Parameters<Parameters<typeof db.transaction>[0]>[0],
berthId: string,
): Promise<number> {
const [row] = await tx
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
.select({ max: max(berthPdfVersions.versionNumber) })
.from(berthPdfVersions)
.where(eq(berthPdfVersions.berthId, berthId));
return (row?.max ?? 0) + 1;
}
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
/**
* Hash a UUID berthId into a 32-bit signed integer for pg_advisory_xact_lock.
* Uses the first 4 bytes of sha256 reinterpreted as int32 collisions are
* theoretically possible but the lock is per-berth so a collision just
* means two different berths' uploads serialize through the same key,
* which is harmless (correctness preserved, slight contention only).
*/
function hashBerthIdToInt(berthId: string): number {
const h = createHash('sha256').update(berthId).digest();
// Read as signed 32-bit big-endian; pg_advisory_xact_lock(int) signature.
return h.readInt32BE(0);
}
/**
* Stream just the first `n` bytes of a stored object so the magic-byte
* check on the presigned-PUT path can run without buffering the whole
* file. Returns a Buffer of up to `n` bytes (less if the file is shorter).
*/
async function readFirstBytes(
backend: Awaited<ReturnType<typeof getStorageBackend>>,
key: string,
n: number,
): Promise<Buffer> {
const stream = await backend.get(key);
const chunks: Buffer[] = [];
let total = 0;
for await (const chunk of stream as AsyncIterable<Buffer | string>) {
const buf = typeof chunk === 'string' ? Buffer.from(chunk) : chunk;
chunks.push(buf);
total += buf.length;
if (total >= n) break;
}
// Best-effort dispose - some streams are still readable after iteration.
if (typeof (stream as { destroy?: () => void }).destroy === 'function') {
(stream as unknown as { destroy: () => void }).destroy();
}
return Buffer.concat(chunks).subarray(0, n);
}
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
function sanitizeFileName(raw: string): string {
// Preserve the extension; replace spaces / disallowed chars with '_' so the
// result satisfies the storage-key validation regex.
const last = raw.split(/[\\/]/).pop() ?? raw;
return last.replace(/[^a-zA-Z0-9._-]/g, '_').slice(0, 200) || 'berth.pdf';
}
function serializeParseResult(parse: ParseResult): Record<string, unknown> {
return {
engine: parse.engine,
extracted: Object.fromEntries(
Object.entries(parse.fields).map(([k, v]) => [
k,
v ? { value: v.value, confidence: v.confidence } : null,
]),
),
meanConfidence: parse.meanConfidence,
warnings: parse.warnings,
};
}
// ─── reconcile + apply ───────────────────────────────────────────────────────
/**
* Walk every parsed field; classify into:
* - `autoApplied` when the CRM column is null/empty.
* - `conflicts` when both sides have a non-null value and they disagree.
*
* Numeric tolerance: ±1% (matches §14.6 imperial-vs-metric guidance, applied
* uniformly across all numeric columns since the same rounding noise affects
* weekly/daily rates too).
*/
export async function reconcilePdfWithBerth(
berthId: string,
parsed: ParseResult,
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
/** Tenant scope. NotFoundError on cross-port lookups. */
portId: string,
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
): Promise<ReconcileResult> {
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
const berthRow = await db.query.berths.findFirst({
where: and(eq(berths.id, berthId), eq(berths.portId, portId)),
});
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
if (!berthRow) throw new NotFoundError('Berth');
const fields = parsed.fields;
const autoApplied: ReconcileResult['autoApplied'] = [];
const conflicts: ReconcileConflict[] = [];
const warnings: string[] = [...parsed.warnings];
// §14.6 — mooring-number mismatch warning.
const pdfMooring = fields.mooringNumber?.value;
if (
pdfMooring &&
typeof pdfMooring === 'string' &&
pdfMooring.toUpperCase() !== berthRow.mooringNumber.toUpperCase()
) {
warnings.push(
`PDF says berth ${pdfMooring} but uploading to ${berthRow.mooringNumber}. Confirm before applying.`,
);
}
for (const key of APPLIABLE_FIELDS) {
const parsedField = fields[key];
if (!parsedField || parsedField.value == null) continue;
const crmRaw = (berthRow as Record<string, unknown>)[key];
const crmValue = normalizeForCompare(key, crmRaw);
const pdfValue = normalizeForCompare(key, parsedField.value);
if (crmValue == null || crmValue === '') {
autoApplied.push({ field: key, value: parsedField.value as string | number });
continue;
}
if (!valuesEqual(crmValue, pdfValue, NUMERIC_FIELDS.has(key))) {
conflicts.push({
field: key,
crmValue: crmValue as string | number | null,
pdfValue: pdfValue as string | number | null,
pdfConfidence: parsedField.confidence,
});
}
}
return { autoApplied, conflicts, warnings, engine: parsed.engine };
}
/**
* Apply a rep-confirmed slice of the reconcile diff to the berth row. The
* caller passes the canonical `ExtractedBerthFields` keys; anything outside
* `APPLIABLE_FIELDS` is silently dropped to keep this endpoint a hard
* allowlist.
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
*
* Mooring-mismatch gate (§14.6 critical): when the version's stored
* `parseResults.warnings` contains a mooring-mismatch warning, the apply
* is rejected unless the caller passes `confirmMooringMismatch: true`.
* This is the service-side enforcement of the "force re-confirm" rule
* UI confirmation alone is not enough.
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
*/
export async function applyParseResults(
berthId: string,
versionId: string,
fieldsToApply: Partial<ExtractedBerthFields>,
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
/** Tenant scope. NotFoundError when berth lives in a different port. */
portId: string,
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
opts: { confirmMooringMismatch?: boolean } = {},
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
): Promise<{ updatedFields: Array<keyof ExtractedBerthFields> }> {
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
const berthRow = await db.query.berths.findFirst({
where: and(eq(berths.id, berthId), eq(berths.portId, portId)),
});
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
if (!berthRow) throw new NotFoundError('Berth');
const versionRow = await db.query.berthPdfVersions.findFirst({
where: and(eq(berthPdfVersions.id, versionId), eq(berthPdfVersions.berthId, berthId)),
});
if (!versionRow) throw new NotFoundError('Berth PDF version');
fix(audit): post-review hardening across phases 0-7 15 of 17 findings from the consolidated audit (3 reviewer agents on the previously-shipped phase commits). Remaining two are nice-to-have follow-ups deferred. Critical (data integrity / security): - Public berths API: closed-deal junction rows no longer flip a berth to "Under Offer" - filter on `interests.outcome IS NULL` so won/ lost/cancelled don't pollute public-map status. Both list + single-mooring routes. - Recommender heat: cancelled outcomes now count as fall-throughs (SQL was `LIKE 'lost%'` which silently dropped them, leaving cancelled-only berths stuck in tier A). - Filesystem presignDownload returns an absolute URL (origin from APP_URL) so emailed download links resolve from external mail clients. - Magic-byte verification on the presigned-PUT path: both per-berth PDFs and brochures stream the first 5 bytes via the storage backend and reject + delete on `%PDF-` mismatch (was only enforced when the server saw the buffer; presign-PUT was wide open). - Replay-protection TTL aligned to the token's own expiry (was a fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling 25 days. - Brochures unique partial index on (port_id) WHERE is_default=true + 0032 migration. Closes the read-then-write race in the create/ update transactions. Important: - Recommender SQL: defense-in-depth `i.port_id = $portId` filter on the aggregates CTE. - berth-pdf service: per-berth pg_advisory_xact_lock around the version-number SELECT + insert. Storage key is now UUID-based so concurrent uploads can't collide on blob paths. Replaces `nextVersionNumber` with the tx-bound variant. - berth-pdf apply: rejects with ConflictError when parse_results contain a mooring-mismatch warning unless the caller passes `confirmMooringMismatch: true` (force-reconfirm gate was UI-only). - Send-out body: HTML-escape brochure filename in the download-link fallback (XSS guard). - parseDecimalWithUnit rejects negative numbers. - listClients DISTINCT ON for primary contact resolution: bounds contact-row count to ~2 per client. Defensive: - verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite. - Replaced sql ANY() with inArray() in interest-berths. Tests: 1145 -> 1163 passing. Deferred: bulk-send rate limit (no bulk endpoint today), markdown italic regex breaking links with asterisks (cosmetic). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
// §14.6 mooring-mismatch gate.
const priorParse = versionRow.parseResults as { warnings?: string[] } | null;
const hasMooringMismatch = (priorParse?.warnings ?? []).some(
(w) => /uploading to/i.test(w) && /berth/i.test(w),
);
if (hasMooringMismatch && !opts.confirmMooringMismatch) {
throw new ConflictError(
'PDF mooring mismatch with target berth. Pass confirmMooringMismatch=true to override.',
);
}
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
const update: Record<string, unknown> = {};
const applied: Array<keyof ExtractedBerthFields> = [];
fix(audit): backlog sweep — partial archived indexes, custom-fields per-entity gate, polish Wave through the 2026-05-07 backlog of small/concrete audit-final-deferred items (deferring the Documenso Phases 2-7 build and items needing design decisions or live external instances). DB schema: - Migration 0046 converts 5 composite (port_id, archived_at) indexes to partial WHERE archived_at IS NULL — clients, interests, yachts, and both residential tables. Smaller, faster planner choice for the dominant list-query shape. Multi-tenant isolation: - document_sends now verifies recipient.interestId belongs to the port before landing on the audit row (the surrounding clientId check was already port-scoped; interestId pollution was the gap). Routes / API: - /api/v1/custom-fields/[entityId] requires entityType query param and gates on the matching resource permission (clients/interests/berths/ yachts/companies). Fixes the cross-resource gap where a user with clients.view could read company custom-field values. - Admin user list trash button wrapped in PermissionGate (edit was already gated; remove was not). Service polish: - berth-recommender accepts string-shaped JSONB booleans ('true'/'false') so admin UIs that wrap values as strings don't silently fall through to defaults. - expense-pdf renderReceiptHeader anchors all text positions to a captured baseY rather than reading mutating doc.y after rect+stroke. Headers no longer drift on the first receipt page after a soft page break. - berth-pdf apply: collect non-finite numeric coercion drops + warn-log them so partial silent drops are observable (was invisible because the no-fields-supplied check only fires when ALL drop). - Storage cache fingerprint comment documenting the encrypted-secret invariant + the explicit invalidation hook. UI polish: - invoice-detail typed: replaced two `any` casts with a proper InvoiceDetailData / LineItem / LinkedExpense interface set. - YachtForm now accepts initialOwner prop. Wired through: - client-yachts-tab passes { type: 'client', id: clientId } - interest-form passes { type: 'client', id: selectedClientId } - Interest-form yacht picker now includes company-owned yachts where the selected client is a member (fetches client.companies and feeds YachtPicker an array filter). Plus an inline "Add new" button that opens YachtForm pre-bound to the client. - YachtPicker accepts ownerFilter as single OR array for "match any" semantics. BACKLOG.md updated with what landed vs what's still deferred (and why each deferred item is genuinely larger than this push warrants). Tests: 1185/1185 vitest, tsc clean.
2026-05-07 21:45:42 +02:00
// Capture keys whose values were supplied but couldn't be coerced
// (e.g. a numeric column receiving a non-finite or non-numeric value).
// Without this, partial silent drops are invisible because the
// "no appliable fields supplied" check only fires when EVERY key drops.
const dropped: Array<{ key: keyof ExtractedBerthFields; reason: string }> = [];
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
for (const key of APPLIABLE_FIELDS) {
const value = fieldsToApply[key];
if (value === undefined) continue;
if (value === null) {
update[key] = null;
applied.push(key);
continue;
}
if (NUMERIC_FIELDS.has(key)) {
const n = typeof value === 'number' ? value : Number(value);
fix(audit): backlog sweep — partial archived indexes, custom-fields per-entity gate, polish Wave through the 2026-05-07 backlog of small/concrete audit-final-deferred items (deferring the Documenso Phases 2-7 build and items needing design decisions or live external instances). DB schema: - Migration 0046 converts 5 composite (port_id, archived_at) indexes to partial WHERE archived_at IS NULL — clients, interests, yachts, and both residential tables. Smaller, faster planner choice for the dominant list-query shape. Multi-tenant isolation: - document_sends now verifies recipient.interestId belongs to the port before landing on the audit row (the surrounding clientId check was already port-scoped; interestId pollution was the gap). Routes / API: - /api/v1/custom-fields/[entityId] requires entityType query param and gates on the matching resource permission (clients/interests/berths/ yachts/companies). Fixes the cross-resource gap where a user with clients.view could read company custom-field values. - Admin user list trash button wrapped in PermissionGate (edit was already gated; remove was not). Service polish: - berth-recommender accepts string-shaped JSONB booleans ('true'/'false') so admin UIs that wrap values as strings don't silently fall through to defaults. - expense-pdf renderReceiptHeader anchors all text positions to a captured baseY rather than reading mutating doc.y after rect+stroke. Headers no longer drift on the first receipt page after a soft page break. - berth-pdf apply: collect non-finite numeric coercion drops + warn-log them so partial silent drops are observable (was invisible because the no-fields-supplied check only fires when ALL drop). - Storage cache fingerprint comment documenting the encrypted-secret invariant + the explicit invalidation hook. UI polish: - invoice-detail typed: replaced two `any` casts with a proper InvoiceDetailData / LineItem / LinkedExpense interface set. - YachtForm now accepts initialOwner prop. Wired through: - client-yachts-tab passes { type: 'client', id: clientId } - interest-form passes { type: 'client', id: selectedClientId } - Interest-form yacht picker now includes company-owned yachts where the selected client is a member (fetches client.companies and feeds YachtPicker an array filter). Plus an inline "Add new" button that opens YachtForm pre-bound to the client. - YachtPicker accepts ownerFilter as single OR array for "match any" semantics. BACKLOG.md updated with what landed vs what's still deferred (and why each deferred item is genuinely larger than this push warrants). Tests: 1185/1185 vitest, tsc clean.
2026-05-07 21:45:42 +02:00
if (!Number.isFinite(n)) {
dropped.push({ key, reason: `non-finite numeric (${typeof value}: ${String(value)})` });
continue;
}
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
// numeric columns expect strings to preserve precision.
update[key] = String(n);
} else {
update[key] = String(value);
}
applied.push(key);
}
if (applied.length === 0) {
throw new ValidationError('No appliable fields supplied.');
}
fix(audit): backlog sweep — partial archived indexes, custom-fields per-entity gate, polish Wave through the 2026-05-07 backlog of small/concrete audit-final-deferred items (deferring the Documenso Phases 2-7 build and items needing design decisions or live external instances). DB schema: - Migration 0046 converts 5 composite (port_id, archived_at) indexes to partial WHERE archived_at IS NULL — clients, interests, yachts, and both residential tables. Smaller, faster planner choice for the dominant list-query shape. Multi-tenant isolation: - document_sends now verifies recipient.interestId belongs to the port before landing on the audit row (the surrounding clientId check was already port-scoped; interestId pollution was the gap). Routes / API: - /api/v1/custom-fields/[entityId] requires entityType query param and gates on the matching resource permission (clients/interests/berths/ yachts/companies). Fixes the cross-resource gap where a user with clients.view could read company custom-field values. - Admin user list trash button wrapped in PermissionGate (edit was already gated; remove was not). Service polish: - berth-recommender accepts string-shaped JSONB booleans ('true'/'false') so admin UIs that wrap values as strings don't silently fall through to defaults. - expense-pdf renderReceiptHeader anchors all text positions to a captured baseY rather than reading mutating doc.y after rect+stroke. Headers no longer drift on the first receipt page after a soft page break. - berth-pdf apply: collect non-finite numeric coercion drops + warn-log them so partial silent drops are observable (was invisible because the no-fields-supplied check only fires when ALL drop). - Storage cache fingerprint comment documenting the encrypted-secret invariant + the explicit invalidation hook. UI polish: - invoice-detail typed: replaced two `any` casts with a proper InvoiceDetailData / LineItem / LinkedExpense interface set. - YachtForm now accepts initialOwner prop. Wired through: - client-yachts-tab passes { type: 'client', id: clientId } - interest-form passes { type: 'client', id: selectedClientId } - Interest-form yacht picker now includes company-owned yachts where the selected client is a member (fetches client.companies and feeds YachtPicker an array filter). Plus an inline "Add new" button that opens YachtForm pre-bound to the client. - YachtPicker accepts ownerFilter as single OR array for "match any" semantics. BACKLOG.md updated with what landed vs what's still deferred (and why each deferred item is genuinely larger than this push warrants). Tests: 1185/1185 vitest, tsc clean.
2026-05-07 21:45:42 +02:00
if (dropped.length > 0) {
logger.warn(
{ berthId, versionId, dropped },
'Berth PDF apply: silently dropped fields that failed type coercion',
);
}
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
update.updatedAt = new Date();
await db.transaction(async (tx) => {
await tx.update(berths).set(update).where(eq(berths.id, berthId));
// Stamp the applied-field set onto parse_results for audit.
const prior = (versionRow.parseResults as Record<string, unknown> | null) ?? {};
await tx
.update(berthPdfVersions)
.set({
parseResults: {
...prior,
appliedFields: applied,
appliedAt: new Date().toISOString(),
},
})
.where(eq(berthPdfVersions.id, versionId));
});
return { updatedFields: applied };
}
// ─── version listing + rollback ──────────────────────────────────────────────
export interface BerthPdfVersionListItem {
id: string;
versionNumber: number;
fileName: string;
fileSizeBytes: number;
uploadedBy: string;
uploadedAt: Date;
isCurrent: boolean;
/** Pre-signed download URL (15-min TTL). */
downloadUrl: string;
downloadUrlExpiresAt: Date;
parseEngine: ParserEngine | null;
}
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
export async function listBerthPdfVersions(
berthId: string,
/** Tenant scope. NotFoundError when berth lives in a different port. */
portId: string,
): Promise<BerthPdfVersionListItem[]> {
const berthRow = await db.query.berths.findFirst({
where: and(eq(berths.id, berthId), eq(berths.portId, portId)),
});
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
if (!berthRow) throw new NotFoundError('Berth');
const rows = await db
.select()
.from(berthPdfVersions)
.where(eq(berthPdfVersions.berthId, berthId))
.orderBy(desc(berthPdfVersions.versionNumber));
const backend = await getStorageBackend();
feat(expenses): streaming expense-PDF export + receipt-less expense flag + audit-3 fixes Replaces the legacy text-only expense PDF (was just dumping rows into a single pdfme text field — no images, no pagination) with a proper streaming export modelled on the legacy Nuxt client-portal but re-architected for memory safety. The legacy implementation OOM'd on hundreds of receipts because it: - buffered every receipt image into memory simultaneously - accumulated PDF chunks into an array, concat'd at end - base64-encoded the whole PDF into a JSON response (3x peak memory) - had no image downscaling The new design: - `streamExpensePdf()` (src/lib/services/expense-pdf.service.ts): pdfkit pipes bytes directly to the HTTP response (no Buffer accumulation). Receipts are processed serially so peak heap is one image at a time. Sharp downscales any receipt > 500 KB or > 1500 px to JPEG q80 — typical 8 MB phone photo collapses to ~250 KB. For a 500-receipt export, peak RSS stays under ~100 MB; legacy needed >2 GB for the same input. - Pages: cover summary box (count, totals, currency equiv, optional processing fee), grouped expense table (groupBy=none|payer|category| date), one-page-per-receipt with header (establishment, amount, date, payer, category, file name) and full-bleed image. - Storage backend abstraction — receipts stream from `getStorageBackend().get(storageKey)`, works on MinIO/S3/filesystem. - Route: POST /api/v1/expenses/export/pdf streams binary application/pdf with cache-control:no-store. Validator caps expenseIds at 1000 to prevent runaway loops. Receipt-less expense flow (per user request): - Schema: 0033 migration adds `expenses.no_receipt_acknowledged` boolean (default false). - Validator: createExpenseSchema requires either receiptFileIds OR noReceiptAcknowledged=true; the .refine() error message tells the rep exactly what to do. updateExpenseSchema is partial and skips the rule (existing rows can be edited without re-acknowledging). - PDF: receiptless expenses get an inline red "(no receipt)" tag in the establishment cell + a red footer warning in the summary box showing the count and at-risk amount. - The legacy parent-company reimbursement queue may refuse to pay receiptless expenses, so the warning is load-bearing for ops. Audit-3 fixes piggy-backed: - 🔴 Tesseract OCR runtime now races a 30s timeout (CPU-bomb DoS protection — a crafted PDF rasterizing to high-res noise could pin the worker indefinitely). - 🟠 brochures.service.ts:listBrochures dropped a wasted query (the legacy single-brochure fast-path was discarding its result on the multi-brochure branch). - 🟠 berth-pdf.service.ts:listBerthPdfVersions now Promise.all's the presignDownload calls instead of awaiting each in a for-loop — 20-version berths went from 20× round-trip to 1×. - 🟡 public berths route no longer logs the full `row` object on enum drift (was dumping price + amenity columns into ops logs). - 🟡 dropped the dead `void sql` import from public berths route. Tests still 1163/1163. tsc clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:38:32 +02:00
// Presign in parallel — for an S3 backend each call is a separate HTTP
// round-trip, so a 20-version berth used to take 20× the latency in
// the sequential loop. Promise.all collapses to ~1× round-trip.
const presigned = await Promise.all(
rows.map((row) =>
backend.presignDownload(row.storageKey, {
expirySeconds: 900,
filename: row.fileName,
contentType: 'application/pdf',
}),
),
);
return rows.map((row, i) => {
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
const parseEngine = (row.parseResults as { engine?: ParserEngine } | null)?.engine ?? null;
feat(expenses): streaming expense-PDF export + receipt-less expense flag + audit-3 fixes Replaces the legacy text-only expense PDF (was just dumping rows into a single pdfme text field — no images, no pagination) with a proper streaming export modelled on the legacy Nuxt client-portal but re-architected for memory safety. The legacy implementation OOM'd on hundreds of receipts because it: - buffered every receipt image into memory simultaneously - accumulated PDF chunks into an array, concat'd at end - base64-encoded the whole PDF into a JSON response (3x peak memory) - had no image downscaling The new design: - `streamExpensePdf()` (src/lib/services/expense-pdf.service.ts): pdfkit pipes bytes directly to the HTTP response (no Buffer accumulation). Receipts are processed serially so peak heap is one image at a time. Sharp downscales any receipt > 500 KB or > 1500 px to JPEG q80 — typical 8 MB phone photo collapses to ~250 KB. For a 500-receipt export, peak RSS stays under ~100 MB; legacy needed >2 GB for the same input. - Pages: cover summary box (count, totals, currency equiv, optional processing fee), grouped expense table (groupBy=none|payer|category| date), one-page-per-receipt with header (establishment, amount, date, payer, category, file name) and full-bleed image. - Storage backend abstraction — receipts stream from `getStorageBackend().get(storageKey)`, works on MinIO/S3/filesystem. - Route: POST /api/v1/expenses/export/pdf streams binary application/pdf with cache-control:no-store. Validator caps expenseIds at 1000 to prevent runaway loops. Receipt-less expense flow (per user request): - Schema: 0033 migration adds `expenses.no_receipt_acknowledged` boolean (default false). - Validator: createExpenseSchema requires either receiptFileIds OR noReceiptAcknowledged=true; the .refine() error message tells the rep exactly what to do. updateExpenseSchema is partial and skips the rule (existing rows can be edited without re-acknowledging). - PDF: receiptless expenses get an inline red "(no receipt)" tag in the establishment cell + a red footer warning in the summary box showing the count and at-risk amount. - The legacy parent-company reimbursement queue may refuse to pay receiptless expenses, so the warning is load-bearing for ops. Audit-3 fixes piggy-backed: - 🔴 Tesseract OCR runtime now races a 30s timeout (CPU-bomb DoS protection — a crafted PDF rasterizing to high-res noise could pin the worker indefinitely). - 🟠 brochures.service.ts:listBrochures dropped a wasted query (the legacy single-brochure fast-path was discarding its result on the multi-brochure branch). - 🟠 berth-pdf.service.ts:listBerthPdfVersions now Promise.all's the presignDownload calls instead of awaiting each in a for-loop — 20-version berths went from 20× round-trip to 1×. - 🟡 public berths route no longer logs the full `row` object on enum drift (was dumping price + amenity columns into ops logs). - 🟡 dropped the dead `void sql` import from public berths route. Tests still 1163/1163. tsc clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:38:32 +02:00
return {
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
id: row.id,
versionNumber: row.versionNumber,
fileName: row.fileName,
fileSizeBytes: row.fileSizeBytes,
uploadedBy: row.uploadedBy,
uploadedAt: row.uploadedAt,
isCurrent: berthRow.currentPdfVersionId === row.id,
feat(expenses): streaming expense-PDF export + receipt-less expense flag + audit-3 fixes Replaces the legacy text-only expense PDF (was just dumping rows into a single pdfme text field — no images, no pagination) with a proper streaming export modelled on the legacy Nuxt client-portal but re-architected for memory safety. The legacy implementation OOM'd on hundreds of receipts because it: - buffered every receipt image into memory simultaneously - accumulated PDF chunks into an array, concat'd at end - base64-encoded the whole PDF into a JSON response (3x peak memory) - had no image downscaling The new design: - `streamExpensePdf()` (src/lib/services/expense-pdf.service.ts): pdfkit pipes bytes directly to the HTTP response (no Buffer accumulation). Receipts are processed serially so peak heap is one image at a time. Sharp downscales any receipt > 500 KB or > 1500 px to JPEG q80 — typical 8 MB phone photo collapses to ~250 KB. For a 500-receipt export, peak RSS stays under ~100 MB; legacy needed >2 GB for the same input. - Pages: cover summary box (count, totals, currency equiv, optional processing fee), grouped expense table (groupBy=none|payer|category| date), one-page-per-receipt with header (establishment, amount, date, payer, category, file name) and full-bleed image. - Storage backend abstraction — receipts stream from `getStorageBackend().get(storageKey)`, works on MinIO/S3/filesystem. - Route: POST /api/v1/expenses/export/pdf streams binary application/pdf with cache-control:no-store. Validator caps expenseIds at 1000 to prevent runaway loops. Receipt-less expense flow (per user request): - Schema: 0033 migration adds `expenses.no_receipt_acknowledged` boolean (default false). - Validator: createExpenseSchema requires either receiptFileIds OR noReceiptAcknowledged=true; the .refine() error message tells the rep exactly what to do. updateExpenseSchema is partial and skips the rule (existing rows can be edited without re-acknowledging). - PDF: receiptless expenses get an inline red "(no receipt)" tag in the establishment cell + a red footer warning in the summary box showing the count and at-risk amount. - The legacy parent-company reimbursement queue may refuse to pay receiptless expenses, so the warning is load-bearing for ops. Audit-3 fixes piggy-backed: - 🔴 Tesseract OCR runtime now races a 30s timeout (CPU-bomb DoS protection — a crafted PDF rasterizing to high-res noise could pin the worker indefinitely). - 🟠 brochures.service.ts:listBrochures dropped a wasted query (the legacy single-brochure fast-path was discarding its result on the multi-brochure branch). - 🟠 berth-pdf.service.ts:listBerthPdfVersions now Promise.all's the presignDownload calls instead of awaiting each in a for-loop — 20-version berths went from 20× round-trip to 1×. - 🟡 public berths route no longer logs the full `row` object on enum drift (was dumping price + amenity columns into ops logs). - 🟡 dropped the dead `void sql` import from public berths route. Tests still 1163/1163. tsc clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:38:32 +02:00
downloadUrl: presigned[i]!.url,
downloadUrlExpiresAt: presigned[i]!.expiresAt,
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
parseEngine,
feat(expenses): streaming expense-PDF export + receipt-less expense flag + audit-3 fixes Replaces the legacy text-only expense PDF (was just dumping rows into a single pdfme text field — no images, no pagination) with a proper streaming export modelled on the legacy Nuxt client-portal but re-architected for memory safety. The legacy implementation OOM'd on hundreds of receipts because it: - buffered every receipt image into memory simultaneously - accumulated PDF chunks into an array, concat'd at end - base64-encoded the whole PDF into a JSON response (3x peak memory) - had no image downscaling The new design: - `streamExpensePdf()` (src/lib/services/expense-pdf.service.ts): pdfkit pipes bytes directly to the HTTP response (no Buffer accumulation). Receipts are processed serially so peak heap is one image at a time. Sharp downscales any receipt > 500 KB or > 1500 px to JPEG q80 — typical 8 MB phone photo collapses to ~250 KB. For a 500-receipt export, peak RSS stays under ~100 MB; legacy needed >2 GB for the same input. - Pages: cover summary box (count, totals, currency equiv, optional processing fee), grouped expense table (groupBy=none|payer|category| date), one-page-per-receipt with header (establishment, amount, date, payer, category, file name) and full-bleed image. - Storage backend abstraction — receipts stream from `getStorageBackend().get(storageKey)`, works on MinIO/S3/filesystem. - Route: POST /api/v1/expenses/export/pdf streams binary application/pdf with cache-control:no-store. Validator caps expenseIds at 1000 to prevent runaway loops. Receipt-less expense flow (per user request): - Schema: 0033 migration adds `expenses.no_receipt_acknowledged` boolean (default false). - Validator: createExpenseSchema requires either receiptFileIds OR noReceiptAcknowledged=true; the .refine() error message tells the rep exactly what to do. updateExpenseSchema is partial and skips the rule (existing rows can be edited without re-acknowledging). - PDF: receiptless expenses get an inline red "(no receipt)" tag in the establishment cell + a red footer warning in the summary box showing the count and at-risk amount. - The legacy parent-company reimbursement queue may refuse to pay receiptless expenses, so the warning is load-bearing for ops. Audit-3 fixes piggy-backed: - 🔴 Tesseract OCR runtime now races a 30s timeout (CPU-bomb DoS protection — a crafted PDF rasterizing to high-res noise could pin the worker indefinitely). - 🟠 brochures.service.ts:listBrochures dropped a wasted query (the legacy single-brochure fast-path was discarding its result on the multi-brochure branch). - 🟠 berth-pdf.service.ts:listBerthPdfVersions now Promise.all's the presignDownload calls instead of awaiting each in a for-loop — 20-version berths went from 20× round-trip to 1×. - 🟡 public berths route no longer logs the full `row` object on enum drift (was dumping price + amenity columns into ops logs). - 🟡 dropped the dead `void sql` import from public berths route. Tests still 1163/1163. tsc clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:38:32 +02:00
};
});
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
}
/**
* Set `berths.current_pdf_version_id` to the requested version. Per §14.6,
* this does NOT re-parse and re-update the berth columns that's a separate
* deliberate "extract data from this version" action.
*/
export async function rollbackToVersion(
berthId: string,
versionId: string,
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
/** Tenant scope. NotFoundError when berth lives in a different port. */
portId: string,
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
): Promise<{ versionId: string; versionNumber: number }> {
const versionRow = await db.query.berthPdfVersions.findFirst({
where: and(eq(berthPdfVersions.id, versionId), eq(berthPdfVersions.berthId, berthId)),
});
if (!versionRow) throw new NotFoundError('Berth PDF version');
fix(security): scope berth-pdf service entrypoints by portId Post-merge security review caught a cross-tenant authorization bypass in the per-berth PDF endpoints (HIGH severity, confidence 10): GET /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-versions POST /api/v1/berths/[id]/pdf-upload-url POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback POST /api/v1/berths/[id]/pdf-versions/parse-results/apply Each handler looked up the target berth by id only — `eq(berths.id, ...)`. withAuth resolves ctx.portId from the user-controlled X-Port-Id header (only verifying the user has SOME role on that port), and withPermission('berths', 'view'|'edit', ...) is a coarse capability check, not a row-level grant. A rep with berths:edit on Port A could supply a Port B berth UUID and: - list + receive 15-min presigned download URLs to every PDF version - mint an upload URL targeting `berths/<port-B-id>/uploads/...` - POST a new version (overwriting current_pdf_version_id on foreign berth) - rollback to any prior version on a foreign berth - apply rep-confirmed parse-result fields onto a foreign berth's columns Sibling routes (waiting-list etc.) already pair the id filter with `eq(berths.portId, ctx.portId)`, so this was an omission, not design. Fix: - Push `portId: string` into uploadBerthPdf, listBerthPdfVersions, rollbackToVersion, applyParseResults, reconcilePdfWithBerth. - Each function now filters the berth lookup with `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws NotFoundError on mismatch (no foreign-port disclosure). - Inline the same `and(...)` filter in the pdf-upload-url handler. - Every handler passes ctx.portId through. Coverage: - New `cross-port tenant guard` test exercises every entrypoint with a foreign-port id and asserts NotFoundError. - 1164/1164 vitest passing. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
const berthRow = await db.query.berths.findFirst({
where: and(eq(berths.id, berthId), eq(berths.portId, portId)),
});
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
if (!berthRow) throw new NotFoundError('Berth');
if (berthRow.currentPdfVersionId === versionId) {
feat(errors): platform-wide request ids + error codes + admin inspector End-to-end error-handling overhaul. A user hitting any failure now sees a plain-text message + stable error code + reference id. A super admin can paste the id into /admin/errors/<id> for the full request shape, sanitized body, error stack, and a heuristic likely-cause hint. REQUEST CONTEXT (AsyncLocalStorage) - src/lib/request-context.ts mints a per-request frame carrying requestId + portId + userId + method + path + start timestamp. - withAuth wraps every authenticated handler in runWithRequestContext and accepts an upstream X-Request-Id header (validated shape) or generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id response header, including early-return 401/403/4xx paths. - Pino logger reads from the same context via mixin — every log line emitted during the request automatically carries the ids with no per-call threading. ERROR CODE REGISTRY - src/lib/error-codes.ts defines stable DOMAIN_REASON codes with HTTP status + plain-text user-facing message (no jargon, written for the rep on the phone with a customer). - New CodedError class wraps a registered code + optional internalMessage (admin-only — never sent to client). - Existing AppError subclasses got plain-text default rewrites so legacy throw sites improve immediately without migration. - High-impact services migrated to specific codes: expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths (CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY / PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender (INTEREST_PORT_MISMATCH). ERROR ENVELOPE - errorResponse always sets X-Request-Id header + requestId field. - 5xx responses include a "Quote error ID …" friendly line. - 4xx kept clean (validation, permission, not-found don't pollute the inspector — they're already in audit log). PERSISTENCE (error_events table, migration 0040) - One row per 5xx, keyed on requestId, with method/path/status/error name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap; password/token/secret/etc keys redacted)/duration/IP/UA/metadata. - captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code so the classifier can recognize FK / unique / NOT NULL / schema- drift violations. - Failure to persist is logged-not-thrown. LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts) - 4-pass heuristic (first match wins): 1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique, 42703 schema drift, 53300 connection limit, …) 2. Error class name (AbortError, TimeoutError, FetchError, ZodError) 3. Stack-path patterns (/lib/storage/, /lib/email/, documenso, openai|claude, /queue/workers/) 4. Free-text message keywords (econnrefused, rate limit, timeout, unauthorized|invalid api key) - Returns { label, hint, subsystem } for the inspector badge. CLIENT SIDE - apiFetch throws structured ApiError with message + code + requestId + details + retryAfter. - toastError() helper renders the standard 3-line toast: plain message / Error code: X / Reference ID: Y [Copy ID]. ADMIN INSPECTOR - /<port>/admin/errors lists captured 5xx with status badge + path + likely-culprit badge + truncated message + reference id. Filter by status code; auto-refresh via TanStack Query. - /<port>/admin/errors/<requestId> deep-dive: request shape, full error name+message+stack, sanitized body excerpt, raw metadata, registered-code lookup (so admin can compare to what user saw), likely-culprit hint with subsystem tag. - /<port>/admin/errors/codes is the in-app code reference page — every registered code grouped by domain prefix, searchable, with HTTP status + user message inline. Linked from inspector header so admins can flip to it while triaging. - Permission: admin.view_audit_log. Super admins see all ports; regular admins port-scoped. - system-monitoring dashboard now surfaces error_events alongside permission_denied audit + queue failed jobs (RecentError gains source: 'request' variant). DOCS - docs/error-handling.md walks through coded errors, plain-text message guidelines, client toasting, admin inspector usage, persistence rules, classifier internals, pruning, and the legacy → CodedError migration path. MIGRATION SAFETY - Audit confirmed all 41 migrations (0000-0040) apply cleanly in journal order against an empty DB. 0040 references ports(id) which exists from 0000. 0035/0038 don't deadlock under sequential psql -f. Removed redundant idx_ds_sent_by from 0038 (created in 0037). Tests: 1168/1168 vitest passing. tsc clean. - security-error-responses tests updated for plain-text messages + new optional response keys (code/requestId/message). - berth-pdf-versions tests assert stable error codes via toMatchObject({ code }) rather than message regex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
throw new CodedError('BERTHS_VERSION_ALREADY_CURRENT');
feat(berths): per-berth PDF storage (versioned) + reverse parser Phase 6b of the berth-recommender refactor (see docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6). Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every file write goes through `getStorageBackend()`; no direct minio imports. Schema (migration 0030_berth_pdf_versions): - new table `berth_pdf_versions` with monotonic `version_number` per berth, `storage_key` (renamed convention from §4.7a), sha256, size, `download_url_expires_at` cache slot for §11.1 signed-URL throttling, and `parse_results` jsonb for the audit trail. - new column `berths.current_pdf_version_id` (deferred from Phase 0) with FK to `berth_pdf_versions(id)` ON DELETE SET NULL. - relations + types exported from `schema/berths.ts`. 3-tier reverse parser (`lib/services/berth-pdf-parser.ts`): 1. AcroForm via pdf-lib — pulls named fields (`length_ft`, `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such fields, so this is defensive coverage for future templates. 2. OCR via Tesseract.js — positional/regex heuristics keyed off the §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`, `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns per-field confidence + global mean; flags imperial-vs-metric drift >1% in `warnings`. 3. AI fallback — gated via `getResolvedOcrConfig()` (existing openai/claude provider). Surfaced from the diff dialog only when `shouldOfferAiTier()` returns true (mean OCR confidence below 0.55 threshold), so OPENAI_API_KEY isn't burned on every upload. Service layer (`lib/services/berth-pdf.service.ts`): - `uploadBerthPdf()` — magic-byte check, size cap, version-number bump + current pointer in one transaction. - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null; flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric columns; warns on mooring-number-in-PDF mismatch (§14.6). - `applyParseResults()` — hard allowlist of writable columns; stamps `appliedFields` onto `parse_results` for audit. - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6). - `listBerthPdfVersions()` — version list with 15-min signed URLs. - `getMaxUploadMb()` — port-override → global → default 15 lookup on `system_settings.berth_pdf_max_upload_mb`. §14.6 critical mitigations: - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the storage object and rejects the request. - Size cap from `system_settings.berth_pdf_max_upload_mb` (default 15 MB); enforced in the upload-url presign AND server-side. - 0-byte uploads rejected. - Mooring-number mismatch surfaces as a `warnings[]` entry on the reconcile result so the rep sees it in the diff dialog. - Imperial vs metric ±1% tolerance in both the parser warnings and the reconcile equality check. - Path traversal already blocked at the storage layer (Phase 6a). API + UI: - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or HMAC-signed proxy URL (filesystem) sized to the per-port cap. - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via `backend.head()`, writes the row, bumps `current_pdf_version_id`. - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs. - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`. - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` — rep-confirmed diff payload. - New "Documents" tab on the berth detail page (`berth-tabs.tsx`) with current-PDF panel, version history, Replace PDF button, and `<PdfReconcileDialog>` for the auto-applied + conflicts UX. System settings: - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size + server-side validation. Resolved port-override → global → default. Tests: - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes, feet-inches, human dates, full §9.2-shaped OCR text → 18 fields, drift warning, AI-tier gate. - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic pdf-lib AcroForm round-trip. - `tests/integration/berth-pdf-versions.test.ts` — upload, version- number bump, magic-byte rejection, reconcile auto-applied vs conflicts vs ±1% tolerance, mooring-number warning, applyParseResults allowlist enforcement, rollback semantics. Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run` green at 1103/1103. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
}
await db
.update(berths)
.set({ currentPdfVersionId: versionId, updatedAt: new Date() })
.where(eq(berths.id, berthId));
return { versionId, versionNumber: versionRow.versionNumber };
}
// ─── compare helpers ─────────────────────────────────────────────────────────
function normalizeForCompare(
key: keyof ExtractedBerthFields,
raw: unknown,
): string | number | null {
if (raw == null) return null;
if (NUMERIC_FIELDS.has(key)) {
const n = typeof raw === 'number' ? raw : Number(String(raw).replace(/[^0-9.\-]/g, ''));
return Number.isFinite(n) ? n : null;
}
if (typeof raw === 'string') return raw.trim();
return String(raw);
}
function valuesEqual(a: unknown, b: unknown, isNumeric: boolean): boolean {
if (a == null && b == null) return true;
if (a == null || b == null) return false;
if (isNumeric) {
const an = Number(a);
const bn = Number(b);
if (!Number.isFinite(an) || !Number.isFinite(bn)) return false;
if (an === bn) return true;
if (bn === 0) return Math.abs(an - bn) < 0.0001;
return Math.abs(an - bn) / Math.abs(bn) <= IMPERIAL_METRIC_TOLERANCE;
}
return String(a).trim().toLowerCase() === String(b).trim().toLowerCase();
}
// ─── re-exports the route layer leans on ─────────────────────────────────────
export { parseBerthPdf } from './berth-pdf-parser';
export type {
ExtractedBerthFields,
ParsedField,
ParseResult,
ParserEngine,
} from './berth-pdf-parser';