feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
/**
|
|
|
|
|
|
* Berth PDF management service (Phase 6b — see plan §4.7b, §11.1, §14.6).
|
|
|
|
|
|
*
|
|
|
|
|
|
* Responsibilities:
|
|
|
|
|
|
* - Upload a per-berth PDF (versioned), via the active `StorageBackend`.
|
|
|
|
|
|
* - Verify the magic bytes (`%PDF-`) before persisting; delete the storage
|
|
|
|
|
|
* object on mismatch (§14.6 critical).
|
|
|
|
|
|
* - Reconcile the parsed fields against the current berth row, surfacing
|
|
|
|
|
|
* conflicts for the rep's diff dialog and auto-applying nullable gaps.
|
|
|
|
|
|
* - Enforce per-port size cap from `system_settings.berth_pdf_max_upload_mb`.
|
|
|
|
|
|
* - Generate signed download URLs for the version list.
|
|
|
|
|
|
*/
|
|
|
|
|
|
|
2026-05-05 04:07:03 +02:00
|
|
|
|
import { createHash } from 'node:crypto';
|
|
|
|
|
|
|
|
|
|
|
|
import { and, desc, eq, isNull, max, sql } from 'drizzle-orm';
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
|
|
|
|
|
|
import { db } from '@/lib/db';
|
|
|
|
|
|
import { berths, berthPdfVersions } from '@/lib/db/schema/berths';
|
|
|
|
|
|
import { systemSettings } from '@/lib/db/schema/system';
|
feat(errors): platform-wide request ids + error codes + admin inspector
End-to-end error-handling overhaul. A user hitting any failure now sees
a plain-text message + stable error code + reference id. A super admin
can paste the id into /admin/errors/<id> for the full request shape,
sanitized body, error stack, and a heuristic likely-cause hint.
REQUEST CONTEXT (AsyncLocalStorage)
- src/lib/request-context.ts mints a per-request frame carrying
requestId + portId + userId + method + path + start timestamp.
- withAuth wraps every authenticated handler in runWithRequestContext
and accepts an upstream X-Request-Id header (validated shape) or
generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id
response header, including early-return 401/403/4xx paths.
- Pino logger reads from the same context via mixin — every log
line emitted during the request automatically carries the ids
with no per-call threading.
ERROR CODE REGISTRY
- src/lib/error-codes.ts defines stable DOMAIN_REASON codes with
HTTP status + plain-text user-facing message (no jargon, written
for the rep on the phone with a customer).
- New CodedError class wraps a registered code + optional
internalMessage (admin-only — never sent to client).
- Existing AppError subclasses got plain-text default rewrites so
legacy throw sites improve immediately without migration.
- High-impact services migrated to specific codes:
expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths
(CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY /
PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender
(INTEREST_PORT_MISMATCH).
ERROR ENVELOPE
- errorResponse always sets X-Request-Id header + requestId field.
- 5xx responses include a "Quote error ID …" friendly line.
- 4xx kept clean (validation, permission, not-found don't pollute
the inspector — they're already in audit log).
PERSISTENCE (error_events table, migration 0040)
- One row per 5xx, keyed on requestId, with method/path/status/error
name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap;
password/token/secret/etc keys redacted)/duration/IP/UA/metadata.
- captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code
so the classifier can recognize FK / unique / NOT NULL / schema-
drift violations.
- Failure to persist is logged-not-thrown.
LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts)
- 4-pass heuristic (first match wins):
1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique,
42703 schema drift, 53300 connection limit, …)
2. Error class name (AbortError, TimeoutError, FetchError,
ZodError)
3. Stack-path patterns (/lib/storage/, /lib/email/, documenso,
openai|claude, /queue/workers/)
4. Free-text message keywords (econnrefused, rate limit, timeout,
unauthorized|invalid api key)
- Returns { label, hint, subsystem } for the inspector badge.
CLIENT SIDE
- apiFetch throws structured ApiError with message + code + requestId
+ details + retryAfter.
- toastError() helper renders the standard 3-line toast:
plain message / Error code: X / Reference ID: Y [Copy ID].
ADMIN INSPECTOR
- /<port>/admin/errors lists captured 5xx with status badge + path +
likely-culprit badge + truncated message + reference id. Filter by
status code; auto-refresh via TanStack Query.
- /<port>/admin/errors/<requestId> deep-dive: request shape, full
error name+message+stack, sanitized body excerpt, raw metadata,
registered-code lookup (so admin can compare to what user saw),
likely-culprit hint with subsystem tag.
- /<port>/admin/errors/codes is the in-app code reference page —
every registered code grouped by domain prefix, searchable, with
HTTP status + user message inline. Linked from inspector header
so admins can flip to it while triaging.
- Permission: admin.view_audit_log. Super admins see all ports;
regular admins port-scoped.
- system-monitoring dashboard now surfaces error_events alongside
permission_denied audit + queue failed jobs (RecentError gains
source: 'request' variant).
DOCS
- docs/error-handling.md walks through coded errors, plain-text
message guidelines, client toasting, admin inspector usage,
persistence rules, classifier internals, pruning, and the
legacy → CodedError migration path.
MIGRATION SAFETY
- Audit confirmed all 41 migrations (0000-0040) apply cleanly in
journal order against an empty DB. 0040 references ports(id)
which exists from 0000. 0035/0038 don't deadlock under sequential
psql -f. Removed redundant idx_ds_sent_by from 0038 (created in
0037).
Tests: 1168/1168 vitest passing. tsc clean.
- security-error-responses tests updated for plain-text messages
+ new optional response keys (code/requestId/message).
- berth-pdf-versions tests assert stable error codes via
toMatchObject({ code }) rather than message regex.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
|
|
|
|
import { CodedError, ConflictError, NotFoundError, ValidationError } from '@/lib/errors';
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
import { logger } from '@/lib/logger';
|
|
|
|
|
|
import { getStorageBackend } from '@/lib/storage';
|
|
|
|
|
|
|
|
|
|
|
|
import {
|
|
|
|
|
|
type ExtractedBerthFields,
|
|
|
|
|
|
type ParseResult,
|
|
|
|
|
|
type ParserEngine,
|
|
|
|
|
|
isPdfMagic,
|
|
|
|
|
|
} from './berth-pdf-parser';
|
|
|
|
|
|
|
|
|
|
|
|
// ─── shared types ────────────────────────────────────────────────────────────
|
|
|
|
|
|
|
|
|
|
|
|
export interface ReconcileConflict {
|
|
|
|
|
|
field: keyof ExtractedBerthFields;
|
|
|
|
|
|
crmValue: string | number | null;
|
|
|
|
|
|
pdfValue: string | number | null;
|
|
|
|
|
|
/** Confidence the parser assigned to the PDF value (0..1). */
|
|
|
|
|
|
pdfConfidence: number;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
export interface ReconcileResult {
|
|
|
|
|
|
/** Fields where CRM was null and the PDF supplied a value; these can be
|
|
|
|
|
|
* applied automatically (the rep still sees them as "Auto-applied" chips). */
|
|
|
|
|
|
autoApplied: Array<{ field: keyof ExtractedBerthFields; value: string | number }>;
|
|
|
|
|
|
/** Fields where CRM and PDF disagree on a non-null value. The diff dialog
|
|
|
|
|
|
* shows these as a side-by-side comparison; nothing is written until the
|
|
|
|
|
|
* rep confirms via the apply endpoint. */
|
|
|
|
|
|
conflicts: ReconcileConflict[];
|
|
|
|
|
|
/** Pure-warning bucket — e.g. mooring-number mismatch with the berth being
|
|
|
|
|
|
* uploaded to (§14.6). */
|
|
|
|
|
|
warnings: string[];
|
|
|
|
|
|
/** Engine that produced the parse — surfaced on the diff UI. */
|
|
|
|
|
|
engine: ParserEngine;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// Field allowlist for reconcile/apply. Mirrors `berths` columns; we never
|
|
|
|
|
|
// blindly write `crypto.randomUUID()` or anything outside this set so a
|
|
|
|
|
|
// rogue parser tier can't poison the schema.
|
|
|
|
|
|
const APPLIABLE_FIELDS: ReadonlyArray<keyof ExtractedBerthFields> = [
|
|
|
|
|
|
'lengthFt',
|
|
|
|
|
|
'lengthM',
|
|
|
|
|
|
'widthFt',
|
|
|
|
|
|
'widthM',
|
|
|
|
|
|
'draftFt',
|
|
|
|
|
|
'draftM',
|
|
|
|
|
|
'waterDepth',
|
|
|
|
|
|
'waterDepthM',
|
|
|
|
|
|
'bowFacing',
|
|
|
|
|
|
'sidePontoon',
|
|
|
|
|
|
'powerCapacity',
|
|
|
|
|
|
'voltage',
|
|
|
|
|
|
'mooringType',
|
|
|
|
|
|
'cleatType',
|
|
|
|
|
|
'cleatCapacity',
|
|
|
|
|
|
'bollardType',
|
|
|
|
|
|
'bollardCapacity',
|
|
|
|
|
|
'access',
|
|
|
|
|
|
'price',
|
|
|
|
|
|
'weeklyRateHighUsd',
|
|
|
|
|
|
'weeklyRateLowUsd',
|
|
|
|
|
|
'dailyRateHighUsd',
|
|
|
|
|
|
'dailyRateLowUsd',
|
|
|
|
|
|
'pricingValidUntil',
|
|
|
|
|
|
];
|
|
|
|
|
|
|
|
|
|
|
|
// Numeric berths columns are stored as `numeric` (Drizzle returns string).
|
|
|
|
|
|
// This set tells the apply path which fields need stringification.
|
|
|
|
|
|
const NUMERIC_FIELDS = new Set<keyof ExtractedBerthFields>([
|
|
|
|
|
|
'lengthFt',
|
|
|
|
|
|
'lengthM',
|
|
|
|
|
|
'widthFt',
|
|
|
|
|
|
'widthM',
|
|
|
|
|
|
'draftFt',
|
|
|
|
|
|
'draftM',
|
|
|
|
|
|
'waterDepth',
|
|
|
|
|
|
'waterDepthM',
|
|
|
|
|
|
'powerCapacity',
|
|
|
|
|
|
'voltage',
|
|
|
|
|
|
'price',
|
|
|
|
|
|
'weeklyRateHighUsd',
|
|
|
|
|
|
'weeklyRateLowUsd',
|
|
|
|
|
|
'dailyRateHighUsd',
|
|
|
|
|
|
'dailyRateLowUsd',
|
|
|
|
|
|
]);
|
|
|
|
|
|
|
|
|
|
|
|
// Tolerance for imperial vs metric reconcile. Same threshold as the parser.
|
|
|
|
|
|
const IMPERIAL_METRIC_TOLERANCE = 0.01;
|
|
|
|
|
|
|
|
|
|
|
|
// ─── settings helpers ────────────────────────────────────────────────────────
|
|
|
|
|
|
|
|
|
|
|
|
/** Resolve `berth_pdf_max_upload_mb` with port-override → global → default 15. */
|
|
|
|
|
|
export async function getMaxUploadMb(portId: string): Promise<number> {
|
|
|
|
|
|
const KEY = 'berth_pdf_max_upload_mb';
|
|
|
|
|
|
const [portRow] = await db
|
|
|
|
|
|
.select()
|
|
|
|
|
|
.from(systemSettings)
|
|
|
|
|
|
.where(and(eq(systemSettings.key, KEY), eq(systemSettings.portId, portId)));
|
|
|
|
|
|
if (portRow && typeof portRow.value === 'number') return portRow.value;
|
|
|
|
|
|
if (portRow && typeof portRow.value === 'string') {
|
|
|
|
|
|
const n = Number(portRow.value);
|
|
|
|
|
|
if (Number.isFinite(n)) return n;
|
|
|
|
|
|
}
|
|
|
|
|
|
const [globalRow] = await db
|
|
|
|
|
|
.select()
|
|
|
|
|
|
.from(systemSettings)
|
|
|
|
|
|
.where(and(eq(systemSettings.key, KEY), isNull(systemSettings.portId)));
|
|
|
|
|
|
if (globalRow && typeof globalRow.value === 'number') return globalRow.value;
|
|
|
|
|
|
if (globalRow && typeof globalRow.value === 'string') {
|
|
|
|
|
|
const n = Number(globalRow.value);
|
|
|
|
|
|
if (Number.isFinite(n)) return n;
|
|
|
|
|
|
}
|
|
|
|
|
|
return 15;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// ─── upload + version management ─────────────────────────────────────────────
|
|
|
|
|
|
|
|
|
|
|
|
export interface UploadBerthPdfArgs {
|
|
|
|
|
|
berthId: string;
|
fix(security): scope berth-pdf service entrypoints by portId
Post-merge security review caught a cross-tenant authorization bypass
in the per-berth PDF endpoints (HIGH severity, confidence 10):
GET /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-upload-url
POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback
POST /api/v1/berths/[id]/pdf-versions/parse-results/apply
Each handler looked up the target berth by id only — `eq(berths.id, ...)`.
withAuth resolves ctx.portId from the user-controlled X-Port-Id header
(only verifying the user has SOME role on that port), and
withPermission('berths', 'view'|'edit', ...) is a coarse capability
check, not a row-level grant. A rep with berths:edit on Port A could
supply a Port B berth UUID and:
- list + receive 15-min presigned download URLs to every PDF version
- mint an upload URL targeting `berths/<port-B-id>/uploads/...`
- POST a new version (overwriting current_pdf_version_id on foreign berth)
- rollback to any prior version on a foreign berth
- apply rep-confirmed parse-result fields onto a foreign berth's columns
Sibling routes (waiting-list etc.) already pair the id filter with
`eq(berths.portId, ctx.portId)`, so this was an omission, not design.
Fix:
- Push `portId: string` into uploadBerthPdf, listBerthPdfVersions,
rollbackToVersion, applyParseResults, reconcilePdfWithBerth.
- Each function now filters the berth lookup with
`and(eq(berths.id, ...), eq(berths.portId, portId))` and throws
NotFoundError on mismatch (no foreign-port disclosure).
- Inline the same `and(...)` filter in the pdf-upload-url handler.
- Every handler passes ctx.portId through.
Coverage:
- New `cross-port tenant guard` test exercises every entrypoint with a
foreign-port id and asserts NotFoundError.
- 1164/1164 vitest passing. Typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
|
|
|
|
/**
|
|
|
|
|
|
* Acting tenant. Every public service entrypoint requires this so the berth
|
|
|
|
|
|
* lookup can be scoped to `(berth.id, port_id)` — without it a rep with
|
|
|
|
|
|
* berths:edit on port A could supply a port B berth UUID and write/read
|
|
|
|
|
|
* cross-tenant data. NotFoundError on mismatch.
|
|
|
|
|
|
*/
|
|
|
|
|
|
portId: string;
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
/** Already-uploaded storage key (the upload-url endpoint generated it) OR
|
|
|
|
|
|
* undefined to make this service compute one. */
|
|
|
|
|
|
storageKey?: string;
|
|
|
|
|
|
/** Raw bytes when the server proxies the upload (filesystem mode); when
|
|
|
|
|
|
* callers used a presigned PUT they pass `storageKey` and skip this. */
|
|
|
|
|
|
buffer?: Buffer;
|
|
|
|
|
|
fileName: string;
|
|
|
|
|
|
uploadedBy: string;
|
|
|
|
|
|
/** Pre-computed sha256 hex from the client (verified server-side anyway). */
|
|
|
|
|
|
sha256?: string;
|
|
|
|
|
|
/** Pre-computed bytes (used for the size cap pre-flight on direct uploads). */
|
|
|
|
|
|
fileSizeBytes?: number;
|
|
|
|
|
|
/** Result of running `parseBerthPdf` server-side. Optional — the rep may
|
|
|
|
|
|
* have skipped parsing on a re-upload. */
|
|
|
|
|
|
parseResult?: ParseResult;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
export interface UploadBerthPdfResult {
|
|
|
|
|
|
versionId: string;
|
|
|
|
|
|
storageKey: string;
|
|
|
|
|
|
versionNumber: number;
|
|
|
|
|
|
fileSizeBytes: number;
|
|
|
|
|
|
contentSha256: string;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
|
|
* Persist a per-berth PDF version. Either the raw `buffer` or a pre-uploaded
|
|
|
|
|
|
* `storageKey` (with optional `buffer` for verification) is required.
|
|
|
|
|
|
*
|
|
|
|
|
|
* Critical mitigations enforced here:
|
|
|
|
|
|
* - §14.6 magic-byte check against the buffer when present.
|
|
|
|
|
|
* - §14.6 size cap from `berth_pdf_max_upload_mb`.
|
|
|
|
|
|
* - Storage key namespaced under `berths/{id}/v{n}/...` so two reps racing
|
|
|
|
|
|
* on the same berth can't collide (the version-number unique index in
|
|
|
|
|
|
* the DB does the dedup).
|
|
|
|
|
|
*/
|
|
|
|
|
|
export async function uploadBerthPdf(args: UploadBerthPdfArgs): Promise<UploadBerthPdfResult> {
|
|
|
|
|
|
// 1. Resolve the berth + port for size-cap lookup.
|
fix(security): scope berth-pdf service entrypoints by portId
Post-merge security review caught a cross-tenant authorization bypass
in the per-berth PDF endpoints (HIGH severity, confidence 10):
GET /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-upload-url
POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback
POST /api/v1/berths/[id]/pdf-versions/parse-results/apply
Each handler looked up the target berth by id only — `eq(berths.id, ...)`.
withAuth resolves ctx.portId from the user-controlled X-Port-Id header
(only verifying the user has SOME role on that port), and
withPermission('berths', 'view'|'edit', ...) is a coarse capability
check, not a row-level grant. A rep with berths:edit on Port A could
supply a Port B berth UUID and:
- list + receive 15-min presigned download URLs to every PDF version
- mint an upload URL targeting `berths/<port-B-id>/uploads/...`
- POST a new version (overwriting current_pdf_version_id on foreign berth)
- rollback to any prior version on a foreign berth
- apply rep-confirmed parse-result fields onto a foreign berth's columns
Sibling routes (waiting-list etc.) already pair the id filter with
`eq(berths.portId, ctx.portId)`, so this was an omission, not design.
Fix:
- Push `portId: string` into uploadBerthPdf, listBerthPdfVersions,
rollbackToVersion, applyParseResults, reconcilePdfWithBerth.
- Each function now filters the berth lookup with
`and(eq(berths.id, ...), eq(berths.portId, portId))` and throws
NotFoundError on mismatch (no foreign-port disclosure).
- Inline the same `and(...)` filter in the pdf-upload-url handler.
- Every handler passes ctx.portId through.
Coverage:
- New `cross-port tenant guard` test exercises every entrypoint with a
foreign-port id and asserts NotFoundError.
- 1164/1164 vitest passing. Typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
|
|
|
|
// Tenant-scoped lookup — NotFoundError when the berth lives in a different
|
|
|
|
|
|
// port so a rep on port A cannot upload PDFs against port B's berths.
|
|
|
|
|
|
const berthRow = await db.query.berths.findFirst({
|
|
|
|
|
|
where: and(eq(berths.id, args.berthId), eq(berths.portId, args.portId)),
|
|
|
|
|
|
});
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
if (!berthRow) throw new NotFoundError('Berth');
|
|
|
|
|
|
const maxMb = await getMaxUploadMb(berthRow.portId);
|
|
|
|
|
|
const maxBytes = maxMb * 1024 * 1024;
|
|
|
|
|
|
|
2026-05-05 04:07:03 +02:00
|
|
|
|
// 2. Per-berth advisory lock prevents two concurrent uploads from both
|
|
|
|
|
|
// computing version `v3` and racing to write blobs (the unique index
|
|
|
|
|
|
// on (berth_id, version_number) would catch the second insert, but
|
|
|
|
|
|
// only AFTER its blob is already in storage — leaving an orphan).
|
|
|
|
|
|
// The lock is scoped to a transaction wrapping the version-number
|
|
|
|
|
|
// read AND the blob write, so concurrent uploads serialize cleanly.
|
|
|
|
|
|
// NB: hash the UUID into a 32-bit int for pg_advisory_xact_lock(int).
|
|
|
|
|
|
const berthLockKey = hashBerthIdToInt(args.berthId);
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
|
|
|
|
|
|
// 3. Magic bytes + size when we have the buffer in hand.
|
|
|
|
|
|
const backend = await getStorageBackend();
|
|
|
|
|
|
const buffer = args.buffer;
|
2026-05-05 04:07:03 +02:00
|
|
|
|
// UUID-based storage key path so two concurrent uploads can't collide
|
|
|
|
|
|
// on the same blob path (the version_number suffix used to be in the
|
|
|
|
|
|
// key but is now a separate DB column allocated under the per-berth
|
|
|
|
|
|
// advisory lock — see step 4).
|
|
|
|
|
|
let versionNumber = 1;
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
let storageKey =
|
|
|
|
|
|
args.storageKey ??
|
2026-05-05 04:07:03 +02:00
|
|
|
|
`berths/${args.berthId}/${crypto.randomUUID()}/${sanitizeFileName(args.fileName)}`;
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
let sizeBytes = args.fileSizeBytes ?? buffer?.length ?? 0;
|
|
|
|
|
|
let sha256 = args.sha256 ?? '';
|
|
|
|
|
|
|
|
|
|
|
|
if (buffer) {
|
|
|
|
|
|
if (!isPdfMagic(buffer)) {
|
|
|
|
|
|
// Best-effort cleanup if the storage already has a partial.
|
|
|
|
|
|
if (args.storageKey) await backend.delete(args.storageKey).catch(() => undefined);
|
feat(errors): platform-wide request ids + error codes + admin inspector
End-to-end error-handling overhaul. A user hitting any failure now sees
a plain-text message + stable error code + reference id. A super admin
can paste the id into /admin/errors/<id> for the full request shape,
sanitized body, error stack, and a heuristic likely-cause hint.
REQUEST CONTEXT (AsyncLocalStorage)
- src/lib/request-context.ts mints a per-request frame carrying
requestId + portId + userId + method + path + start timestamp.
- withAuth wraps every authenticated handler in runWithRequestContext
and accepts an upstream X-Request-Id header (validated shape) or
generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id
response header, including early-return 401/403/4xx paths.
- Pino logger reads from the same context via mixin — every log
line emitted during the request automatically carries the ids
with no per-call threading.
ERROR CODE REGISTRY
- src/lib/error-codes.ts defines stable DOMAIN_REASON codes with
HTTP status + plain-text user-facing message (no jargon, written
for the rep on the phone with a customer).
- New CodedError class wraps a registered code + optional
internalMessage (admin-only — never sent to client).
- Existing AppError subclasses got plain-text default rewrites so
legacy throw sites improve immediately without migration.
- High-impact services migrated to specific codes:
expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths
(CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY /
PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender
(INTEREST_PORT_MISMATCH).
ERROR ENVELOPE
- errorResponse always sets X-Request-Id header + requestId field.
- 5xx responses include a "Quote error ID …" friendly line.
- 4xx kept clean (validation, permission, not-found don't pollute
the inspector — they're already in audit log).
PERSISTENCE (error_events table, migration 0040)
- One row per 5xx, keyed on requestId, with method/path/status/error
name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap;
password/token/secret/etc keys redacted)/duration/IP/UA/metadata.
- captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code
so the classifier can recognize FK / unique / NOT NULL / schema-
drift violations.
- Failure to persist is logged-not-thrown.
LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts)
- 4-pass heuristic (first match wins):
1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique,
42703 schema drift, 53300 connection limit, …)
2. Error class name (AbortError, TimeoutError, FetchError,
ZodError)
3. Stack-path patterns (/lib/storage/, /lib/email/, documenso,
openai|claude, /queue/workers/)
4. Free-text message keywords (econnrefused, rate limit, timeout,
unauthorized|invalid api key)
- Returns { label, hint, subsystem } for the inspector badge.
CLIENT SIDE
- apiFetch throws structured ApiError with message + code + requestId
+ details + retryAfter.
- toastError() helper renders the standard 3-line toast:
plain message / Error code: X / Reference ID: Y [Copy ID].
ADMIN INSPECTOR
- /<port>/admin/errors lists captured 5xx with status badge + path +
likely-culprit badge + truncated message + reference id. Filter by
status code; auto-refresh via TanStack Query.
- /<port>/admin/errors/<requestId> deep-dive: request shape, full
error name+message+stack, sanitized body excerpt, raw metadata,
registered-code lookup (so admin can compare to what user saw),
likely-culprit hint with subsystem tag.
- /<port>/admin/errors/codes is the in-app code reference page —
every registered code grouped by domain prefix, searchable, with
HTTP status + user message inline. Linked from inspector header
so admins can flip to it while triaging.
- Permission: admin.view_audit_log. Super admins see all ports;
regular admins port-scoped.
- system-monitoring dashboard now surfaces error_events alongside
permission_denied audit + queue failed jobs (RecentError gains
source: 'request' variant).
DOCS
- docs/error-handling.md walks through coded errors, plain-text
message guidelines, client toasting, admin inspector usage,
persistence rules, classifier internals, pruning, and the
legacy → CodedError migration path.
MIGRATION SAFETY
- Audit confirmed all 41 migrations (0000-0040) apply cleanly in
journal order against an empty DB. 0040 references ports(id)
which exists from 0000. 0035/0038 don't deadlock under sequential
psql -f. Removed redundant idx_ds_sent_by from 0038 (created in
0037).
Tests: 1168/1168 vitest passing. tsc clean.
- security-error-responses tests updated for plain-text messages
+ new optional response keys (code/requestId/message).
- berth-pdf-versions tests assert stable error codes via
toMatchObject({ code }) rather than message regex.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
|
|
|
|
throw new CodedError('BERTHS_PDF_MAGIC_BYTE');
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
}
|
feat(errors): platform-wide request ids + error codes + admin inspector
End-to-end error-handling overhaul. A user hitting any failure now sees
a plain-text message + stable error code + reference id. A super admin
can paste the id into /admin/errors/<id> for the full request shape,
sanitized body, error stack, and a heuristic likely-cause hint.
REQUEST CONTEXT (AsyncLocalStorage)
- src/lib/request-context.ts mints a per-request frame carrying
requestId + portId + userId + method + path + start timestamp.
- withAuth wraps every authenticated handler in runWithRequestContext
and accepts an upstream X-Request-Id header (validated shape) or
generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id
response header, including early-return 401/403/4xx paths.
- Pino logger reads from the same context via mixin — every log
line emitted during the request automatically carries the ids
with no per-call threading.
ERROR CODE REGISTRY
- src/lib/error-codes.ts defines stable DOMAIN_REASON codes with
HTTP status + plain-text user-facing message (no jargon, written
for the rep on the phone with a customer).
- New CodedError class wraps a registered code + optional
internalMessage (admin-only — never sent to client).
- Existing AppError subclasses got plain-text default rewrites so
legacy throw sites improve immediately without migration.
- High-impact services migrated to specific codes:
expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths
(CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY /
PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender
(INTEREST_PORT_MISMATCH).
ERROR ENVELOPE
- errorResponse always sets X-Request-Id header + requestId field.
- 5xx responses include a "Quote error ID …" friendly line.
- 4xx kept clean (validation, permission, not-found don't pollute
the inspector — they're already in audit log).
PERSISTENCE (error_events table, migration 0040)
- One row per 5xx, keyed on requestId, with method/path/status/error
name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap;
password/token/secret/etc keys redacted)/duration/IP/UA/metadata.
- captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code
so the classifier can recognize FK / unique / NOT NULL / schema-
drift violations.
- Failure to persist is logged-not-thrown.
LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts)
- 4-pass heuristic (first match wins):
1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique,
42703 schema drift, 53300 connection limit, …)
2. Error class name (AbortError, TimeoutError, FetchError,
ZodError)
3. Stack-path patterns (/lib/storage/, /lib/email/, documenso,
openai|claude, /queue/workers/)
4. Free-text message keywords (econnrefused, rate limit, timeout,
unauthorized|invalid api key)
- Returns { label, hint, subsystem } for the inspector badge.
CLIENT SIDE
- apiFetch throws structured ApiError with message + code + requestId
+ details + retryAfter.
- toastError() helper renders the standard 3-line toast:
plain message / Error code: X / Reference ID: Y [Copy ID].
ADMIN INSPECTOR
- /<port>/admin/errors lists captured 5xx with status badge + path +
likely-culprit badge + truncated message + reference id. Filter by
status code; auto-refresh via TanStack Query.
- /<port>/admin/errors/<requestId> deep-dive: request shape, full
error name+message+stack, sanitized body excerpt, raw metadata,
registered-code lookup (so admin can compare to what user saw),
likely-culprit hint with subsystem tag.
- /<port>/admin/errors/codes is the in-app code reference page —
every registered code grouped by domain prefix, searchable, with
HTTP status + user message inline. Linked from inspector header
so admins can flip to it while triaging.
- Permission: admin.view_audit_log. Super admins see all ports;
regular admins port-scoped.
- system-monitoring dashboard now surfaces error_events alongside
permission_denied audit + queue failed jobs (RecentError gains
source: 'request' variant).
DOCS
- docs/error-handling.md walks through coded errors, plain-text
message guidelines, client toasting, admin inspector usage,
persistence rules, classifier internals, pruning, and the
legacy → CodedError migration path.
MIGRATION SAFETY
- Audit confirmed all 41 migrations (0000-0040) apply cleanly in
journal order against an empty DB. 0040 references ports(id)
which exists from 0000. 0035/0038 don't deadlock under sequential
psql -f. Removed redundant idx_ds_sent_by from 0038 (created in
0037).
Tests: 1168/1168 vitest passing. tsc clean.
- security-error-responses tests updated for plain-text messages
+ new optional response keys (code/requestId/message).
- berth-pdf-versions tests assert stable error codes via
toMatchObject({ code }) rather than message regex.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
|
|
|
|
if (buffer.length === 0) throw new CodedError('BERTHS_PDF_EMPTY');
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
if (buffer.length > maxBytes) {
|
feat(errors): platform-wide request ids + error codes + admin inspector
End-to-end error-handling overhaul. A user hitting any failure now sees
a plain-text message + stable error code + reference id. A super admin
can paste the id into /admin/errors/<id> for the full request shape,
sanitized body, error stack, and a heuristic likely-cause hint.
REQUEST CONTEXT (AsyncLocalStorage)
- src/lib/request-context.ts mints a per-request frame carrying
requestId + portId + userId + method + path + start timestamp.
- withAuth wraps every authenticated handler in runWithRequestContext
and accepts an upstream X-Request-Id header (validated shape) or
generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id
response header, including early-return 401/403/4xx paths.
- Pino logger reads from the same context via mixin — every log
line emitted during the request automatically carries the ids
with no per-call threading.
ERROR CODE REGISTRY
- src/lib/error-codes.ts defines stable DOMAIN_REASON codes with
HTTP status + plain-text user-facing message (no jargon, written
for the rep on the phone with a customer).
- New CodedError class wraps a registered code + optional
internalMessage (admin-only — never sent to client).
- Existing AppError subclasses got plain-text default rewrites so
legacy throw sites improve immediately without migration.
- High-impact services migrated to specific codes:
expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths
(CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY /
PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender
(INTEREST_PORT_MISMATCH).
ERROR ENVELOPE
- errorResponse always sets X-Request-Id header + requestId field.
- 5xx responses include a "Quote error ID …" friendly line.
- 4xx kept clean (validation, permission, not-found don't pollute
the inspector — they're already in audit log).
PERSISTENCE (error_events table, migration 0040)
- One row per 5xx, keyed on requestId, with method/path/status/error
name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap;
password/token/secret/etc keys redacted)/duration/IP/UA/metadata.
- captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code
so the classifier can recognize FK / unique / NOT NULL / schema-
drift violations.
- Failure to persist is logged-not-thrown.
LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts)
- 4-pass heuristic (first match wins):
1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique,
42703 schema drift, 53300 connection limit, …)
2. Error class name (AbortError, TimeoutError, FetchError,
ZodError)
3. Stack-path patterns (/lib/storage/, /lib/email/, documenso,
openai|claude, /queue/workers/)
4. Free-text message keywords (econnrefused, rate limit, timeout,
unauthorized|invalid api key)
- Returns { label, hint, subsystem } for the inspector badge.
CLIENT SIDE
- apiFetch throws structured ApiError with message + code + requestId
+ details + retryAfter.
- toastError() helper renders the standard 3-line toast:
plain message / Error code: X / Reference ID: Y [Copy ID].
ADMIN INSPECTOR
- /<port>/admin/errors lists captured 5xx with status badge + path +
likely-culprit badge + truncated message + reference id. Filter by
status code; auto-refresh via TanStack Query.
- /<port>/admin/errors/<requestId> deep-dive: request shape, full
error name+message+stack, sanitized body excerpt, raw metadata,
registered-code lookup (so admin can compare to what user saw),
likely-culprit hint with subsystem tag.
- /<port>/admin/errors/codes is the in-app code reference page —
every registered code grouped by domain prefix, searchable, with
HTTP status + user message inline. Linked from inspector header
so admins can flip to it while triaging.
- Permission: admin.view_audit_log. Super admins see all ports;
regular admins port-scoped.
- system-monitoring dashboard now surfaces error_events alongside
permission_denied audit + queue failed jobs (RecentError gains
source: 'request' variant).
DOCS
- docs/error-handling.md walks through coded errors, plain-text
message guidelines, client toasting, admin inspector usage,
persistence rules, classifier internals, pruning, and the
legacy → CodedError migration path.
MIGRATION SAFETY
- Audit confirmed all 41 migrations (0000-0040) apply cleanly in
journal order against an empty DB. 0040 references ports(id)
which exists from 0000. 0035/0038 don't deadlock under sequential
psql -f. Removed redundant idx_ds_sent_by from 0038 (created in
0037).
Tests: 1168/1168 vitest passing. tsc clean.
- security-error-responses tests updated for plain-text messages
+ new optional response keys (code/requestId/message).
- berth-pdf-versions tests assert stable error codes via
toMatchObject({ code }) rather than message regex.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
|
|
|
|
throw new CodedError('BERTHS_PDF_TOO_LARGE', {
|
|
|
|
|
|
internalMessage: `PDF exceeds ${maxMb} MB upload cap (got ${(buffer.length / 1024 / 1024).toFixed(1)} MB).`,
|
|
|
|
|
|
});
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
}
|
|
|
|
|
|
const written = await backend.put(storageKey, buffer, { contentType: 'application/pdf' });
|
|
|
|
|
|
storageKey = written.key;
|
|
|
|
|
|
sizeBytes = written.sizeBytes;
|
|
|
|
|
|
sha256 = written.sha256;
|
|
|
|
|
|
} else if (args.storageKey) {
|
2026-05-05 04:07:03 +02:00
|
|
|
|
// Browser uploaded directly via presigned URL — verify via HEAD + magic bytes.
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
const head = await backend.head(args.storageKey);
|
|
|
|
|
|
if (!head) {
|
|
|
|
|
|
throw new ValidationError('Uploaded object not found at expected storage key.');
|
|
|
|
|
|
}
|
|
|
|
|
|
if (head.sizeBytes === 0) {
|
|
|
|
|
|
await backend.delete(args.storageKey).catch(() => undefined);
|
feat(errors): platform-wide request ids + error codes + admin inspector
End-to-end error-handling overhaul. A user hitting any failure now sees
a plain-text message + stable error code + reference id. A super admin
can paste the id into /admin/errors/<id> for the full request shape,
sanitized body, error stack, and a heuristic likely-cause hint.
REQUEST CONTEXT (AsyncLocalStorage)
- src/lib/request-context.ts mints a per-request frame carrying
requestId + portId + userId + method + path + start timestamp.
- withAuth wraps every authenticated handler in runWithRequestContext
and accepts an upstream X-Request-Id header (validated shape) or
generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id
response header, including early-return 401/403/4xx paths.
- Pino logger reads from the same context via mixin — every log
line emitted during the request automatically carries the ids
with no per-call threading.
ERROR CODE REGISTRY
- src/lib/error-codes.ts defines stable DOMAIN_REASON codes with
HTTP status + plain-text user-facing message (no jargon, written
for the rep on the phone with a customer).
- New CodedError class wraps a registered code + optional
internalMessage (admin-only — never sent to client).
- Existing AppError subclasses got plain-text default rewrites so
legacy throw sites improve immediately without migration.
- High-impact services migrated to specific codes:
expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths
(CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY /
PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender
(INTEREST_PORT_MISMATCH).
ERROR ENVELOPE
- errorResponse always sets X-Request-Id header + requestId field.
- 5xx responses include a "Quote error ID …" friendly line.
- 4xx kept clean (validation, permission, not-found don't pollute
the inspector — they're already in audit log).
PERSISTENCE (error_events table, migration 0040)
- One row per 5xx, keyed on requestId, with method/path/status/error
name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap;
password/token/secret/etc keys redacted)/duration/IP/UA/metadata.
- captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code
so the classifier can recognize FK / unique / NOT NULL / schema-
drift violations.
- Failure to persist is logged-not-thrown.
LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts)
- 4-pass heuristic (first match wins):
1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique,
42703 schema drift, 53300 connection limit, …)
2. Error class name (AbortError, TimeoutError, FetchError,
ZodError)
3. Stack-path patterns (/lib/storage/, /lib/email/, documenso,
openai|claude, /queue/workers/)
4. Free-text message keywords (econnrefused, rate limit, timeout,
unauthorized|invalid api key)
- Returns { label, hint, subsystem } for the inspector badge.
CLIENT SIDE
- apiFetch throws structured ApiError with message + code + requestId
+ details + retryAfter.
- toastError() helper renders the standard 3-line toast:
plain message / Error code: X / Reference ID: Y [Copy ID].
ADMIN INSPECTOR
- /<port>/admin/errors lists captured 5xx with status badge + path +
likely-culprit badge + truncated message + reference id. Filter by
status code; auto-refresh via TanStack Query.
- /<port>/admin/errors/<requestId> deep-dive: request shape, full
error name+message+stack, sanitized body excerpt, raw metadata,
registered-code lookup (so admin can compare to what user saw),
likely-culprit hint with subsystem tag.
- /<port>/admin/errors/codes is the in-app code reference page —
every registered code grouped by domain prefix, searchable, with
HTTP status + user message inline. Linked from inspector header
so admins can flip to it while triaging.
- Permission: admin.view_audit_log. Super admins see all ports;
regular admins port-scoped.
- system-monitoring dashboard now surfaces error_events alongside
permission_denied audit + queue failed jobs (RecentError gains
source: 'request' variant).
DOCS
- docs/error-handling.md walks through coded errors, plain-text
message guidelines, client toasting, admin inspector usage,
persistence rules, classifier internals, pruning, and the
legacy → CodedError migration path.
MIGRATION SAFETY
- Audit confirmed all 41 migrations (0000-0040) apply cleanly in
journal order against an empty DB. 0040 references ports(id)
which exists from 0000. 0035/0038 don't deadlock under sequential
psql -f. Removed redundant idx_ds_sent_by from 0038 (created in
0037).
Tests: 1168/1168 vitest passing. tsc clean.
- security-error-responses tests updated for plain-text messages
+ new optional response keys (code/requestId/message).
- berth-pdf-versions tests assert stable error codes via
toMatchObject({ code }) rather than message regex.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
|
|
|
|
throw new CodedError('BERTHS_PDF_EMPTY');
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
}
|
|
|
|
|
|
if (head.sizeBytes > maxBytes) {
|
|
|
|
|
|
await backend.delete(args.storageKey).catch(() => undefined);
|
feat(errors): platform-wide request ids + error codes + admin inspector
End-to-end error-handling overhaul. A user hitting any failure now sees
a plain-text message + stable error code + reference id. A super admin
can paste the id into /admin/errors/<id> for the full request shape,
sanitized body, error stack, and a heuristic likely-cause hint.
REQUEST CONTEXT (AsyncLocalStorage)
- src/lib/request-context.ts mints a per-request frame carrying
requestId + portId + userId + method + path + start timestamp.
- withAuth wraps every authenticated handler in runWithRequestContext
and accepts an upstream X-Request-Id header (validated shape) or
generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id
response header, including early-return 401/403/4xx paths.
- Pino logger reads from the same context via mixin — every log
line emitted during the request automatically carries the ids
with no per-call threading.
ERROR CODE REGISTRY
- src/lib/error-codes.ts defines stable DOMAIN_REASON codes with
HTTP status + plain-text user-facing message (no jargon, written
for the rep on the phone with a customer).
- New CodedError class wraps a registered code + optional
internalMessage (admin-only — never sent to client).
- Existing AppError subclasses got plain-text default rewrites so
legacy throw sites improve immediately without migration.
- High-impact services migrated to specific codes:
expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths
(CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY /
PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender
(INTEREST_PORT_MISMATCH).
ERROR ENVELOPE
- errorResponse always sets X-Request-Id header + requestId field.
- 5xx responses include a "Quote error ID …" friendly line.
- 4xx kept clean (validation, permission, not-found don't pollute
the inspector — they're already in audit log).
PERSISTENCE (error_events table, migration 0040)
- One row per 5xx, keyed on requestId, with method/path/status/error
name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap;
password/token/secret/etc keys redacted)/duration/IP/UA/metadata.
- captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code
so the classifier can recognize FK / unique / NOT NULL / schema-
drift violations.
- Failure to persist is logged-not-thrown.
LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts)
- 4-pass heuristic (first match wins):
1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique,
42703 schema drift, 53300 connection limit, …)
2. Error class name (AbortError, TimeoutError, FetchError,
ZodError)
3. Stack-path patterns (/lib/storage/, /lib/email/, documenso,
openai|claude, /queue/workers/)
4. Free-text message keywords (econnrefused, rate limit, timeout,
unauthorized|invalid api key)
- Returns { label, hint, subsystem } for the inspector badge.
CLIENT SIDE
- apiFetch throws structured ApiError with message + code + requestId
+ details + retryAfter.
- toastError() helper renders the standard 3-line toast:
plain message / Error code: X / Reference ID: Y [Copy ID].
ADMIN INSPECTOR
- /<port>/admin/errors lists captured 5xx with status badge + path +
likely-culprit badge + truncated message + reference id. Filter by
status code; auto-refresh via TanStack Query.
- /<port>/admin/errors/<requestId> deep-dive: request shape, full
error name+message+stack, sanitized body excerpt, raw metadata,
registered-code lookup (so admin can compare to what user saw),
likely-culprit hint with subsystem tag.
- /<port>/admin/errors/codes is the in-app code reference page —
every registered code grouped by domain prefix, searchable, with
HTTP status + user message inline. Linked from inspector header
so admins can flip to it while triaging.
- Permission: admin.view_audit_log. Super admins see all ports;
regular admins port-scoped.
- system-monitoring dashboard now surfaces error_events alongside
permission_denied audit + queue failed jobs (RecentError gains
source: 'request' variant).
DOCS
- docs/error-handling.md walks through coded errors, plain-text
message guidelines, client toasting, admin inspector usage,
persistence rules, classifier internals, pruning, and the
legacy → CodedError migration path.
MIGRATION SAFETY
- Audit confirmed all 41 migrations (0000-0040) apply cleanly in
journal order against an empty DB. 0040 references ports(id)
which exists from 0000. 0035/0038 don't deadlock under sequential
psql -f. Removed redundant idx_ds_sent_by from 0038 (created in
0037).
Tests: 1168/1168 vitest passing. tsc clean.
- security-error-responses tests updated for plain-text messages
+ new optional response keys (code/requestId/message).
- berth-pdf-versions tests assert stable error codes via
toMatchObject({ code }) rather than message regex.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
|
|
|
|
throw new CodedError('BERTHS_PDF_TOO_LARGE', {
|
|
|
|
|
|
internalMessage: `PDF exceeds ${maxMb} MB upload cap (got ${(head.sizeBytes / 1024 / 1024).toFixed(1)} MB).`,
|
|
|
|
|
|
});
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
}
|
|
|
|
|
|
if (head.contentType !== 'application/pdf' && head.contentType !== 'application/octet-stream') {
|
|
|
|
|
|
await backend.delete(args.storageKey).catch(() => undefined);
|
|
|
|
|
|
throw new ValidationError(
|
|
|
|
|
|
`Uploaded object content-type is ${head.contentType}; expected application/pdf.`,
|
|
|
|
|
|
);
|
|
|
|
|
|
}
|
2026-05-05 04:07:03 +02:00
|
|
|
|
// Magic-byte check on the presign path (§14.6 critical) - browser-
|
|
|
|
|
|
// uploaded objects could be anything until we read the bytes. Stream
|
|
|
|
|
|
// just the first 5 bytes; abort early on mismatch and delete the blob.
|
|
|
|
|
|
const probeBytes = await readFirstBytes(backend, args.storageKey, 5);
|
|
|
|
|
|
if (!isPdfMagic(probeBytes)) {
|
|
|
|
|
|
await backend.delete(args.storageKey).catch(() => undefined);
|
|
|
|
|
|
throw new ValidationError(
|
|
|
|
|
|
'Uploaded file failed PDF magic-byte check (does not start with %PDF-).',
|
|
|
|
|
|
);
|
|
|
|
|
|
}
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
sizeBytes = head.sizeBytes;
|
|
|
|
|
|
sha256 = args.sha256 ?? '';
|
|
|
|
|
|
storageKey = args.storageKey;
|
|
|
|
|
|
} else {
|
|
|
|
|
|
throw new ValidationError('Either buffer or storageKey is required.');
|
|
|
|
|
|
}
|
|
|
|
|
|
|
2026-05-05 04:07:03 +02:00
|
|
|
|
// 4. Take the per-berth advisory lock, compute version_number under
|
|
|
|
|
|
// the lock, insert + bump pointer. All inside a single transaction
|
|
|
|
|
|
// so the lock + writes commit atomically.
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
const versionId = crypto.randomUUID();
|
|
|
|
|
|
await db.transaction(async (tx) => {
|
2026-05-05 04:07:03 +02:00
|
|
|
|
await tx.execute(sql`SELECT pg_advisory_xact_lock(${berthLockKey})`);
|
|
|
|
|
|
versionNumber = await nextVersionNumberTx(tx, args.berthId);
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
await tx.insert(berthPdfVersions).values({
|
|
|
|
|
|
id: versionId,
|
|
|
|
|
|
berthId: args.berthId,
|
|
|
|
|
|
versionNumber,
|
|
|
|
|
|
storageKey,
|
|
|
|
|
|
fileName: args.fileName,
|
|
|
|
|
|
fileSizeBytes: sizeBytes,
|
|
|
|
|
|
contentSha256: sha256,
|
|
|
|
|
|
uploadedBy: args.uploadedBy,
|
|
|
|
|
|
parseResults: args.parseResult ? serializeParseResult(args.parseResult) : null,
|
|
|
|
|
|
});
|
|
|
|
|
|
await tx
|
|
|
|
|
|
.update(berths)
|
|
|
|
|
|
.set({ currentPdfVersionId: versionId, updatedAt: new Date() })
|
|
|
|
|
|
.where(eq(berths.id, args.berthId));
|
|
|
|
|
|
});
|
|
|
|
|
|
|
|
|
|
|
|
logger.info(
|
|
|
|
|
|
{ berthId: args.berthId, versionId, versionNumber, storageKey, sizeBytes },
|
|
|
|
|
|
'Berth PDF version saved',
|
|
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
|
|
return { versionId, storageKey, versionNumber, fileSizeBytes: sizeBytes, contentSha256: sha256 };
|
|
|
|
|
|
}
|
|
|
|
|
|
|
2026-05-05 04:07:03 +02:00
|
|
|
|
/** Tx-bound variant — same SELECT MAX(...) but inside the caller's transaction. */
|
|
|
|
|
|
async function nextVersionNumberTx(
|
|
|
|
|
|
tx: Parameters<Parameters<typeof db.transaction>[0]>[0],
|
|
|
|
|
|
berthId: string,
|
|
|
|
|
|
): Promise<number> {
|
|
|
|
|
|
const [row] = await tx
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
.select({ max: max(berthPdfVersions.versionNumber) })
|
|
|
|
|
|
.from(berthPdfVersions)
|
|
|
|
|
|
.where(eq(berthPdfVersions.berthId, berthId));
|
|
|
|
|
|
return (row?.max ?? 0) + 1;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
2026-05-05 04:07:03 +02:00
|
|
|
|
/**
|
|
|
|
|
|
* Hash a UUID berthId into a 32-bit signed integer for pg_advisory_xact_lock.
|
|
|
|
|
|
* Uses the first 4 bytes of sha256 reinterpreted as int32 — collisions are
|
|
|
|
|
|
* theoretically possible but the lock is per-berth so a collision just
|
|
|
|
|
|
* means two different berths' uploads serialize through the same key,
|
|
|
|
|
|
* which is harmless (correctness preserved, slight contention only).
|
|
|
|
|
|
*/
|
|
|
|
|
|
function hashBerthIdToInt(berthId: string): number {
|
|
|
|
|
|
const h = createHash('sha256').update(berthId).digest();
|
|
|
|
|
|
// Read as signed 32-bit big-endian; pg_advisory_xact_lock(int) signature.
|
|
|
|
|
|
return h.readInt32BE(0);
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
|
|
* Stream just the first `n` bytes of a stored object so the magic-byte
|
|
|
|
|
|
* check on the presigned-PUT path can run without buffering the whole
|
|
|
|
|
|
* file. Returns a Buffer of up to `n` bytes (less if the file is shorter).
|
|
|
|
|
|
*/
|
|
|
|
|
|
async function readFirstBytes(
|
|
|
|
|
|
backend: Awaited<ReturnType<typeof getStorageBackend>>,
|
|
|
|
|
|
key: string,
|
|
|
|
|
|
n: number,
|
|
|
|
|
|
): Promise<Buffer> {
|
|
|
|
|
|
const stream = await backend.get(key);
|
|
|
|
|
|
const chunks: Buffer[] = [];
|
|
|
|
|
|
let total = 0;
|
|
|
|
|
|
for await (const chunk of stream as AsyncIterable<Buffer | string>) {
|
|
|
|
|
|
const buf = typeof chunk === 'string' ? Buffer.from(chunk) : chunk;
|
|
|
|
|
|
chunks.push(buf);
|
|
|
|
|
|
total += buf.length;
|
|
|
|
|
|
if (total >= n) break;
|
|
|
|
|
|
}
|
|
|
|
|
|
// Best-effort dispose - some streams are still readable after iteration.
|
|
|
|
|
|
if (typeof (stream as { destroy?: () => void }).destroy === 'function') {
|
|
|
|
|
|
(stream as unknown as { destroy: () => void }).destroy();
|
|
|
|
|
|
}
|
|
|
|
|
|
return Buffer.concat(chunks).subarray(0, n);
|
|
|
|
|
|
}
|
|
|
|
|
|
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
function sanitizeFileName(raw: string): string {
|
|
|
|
|
|
// Preserve the extension; replace spaces / disallowed chars with '_' so the
|
|
|
|
|
|
// result satisfies the storage-key validation regex.
|
|
|
|
|
|
const last = raw.split(/[\\/]/).pop() ?? raw;
|
|
|
|
|
|
return last.replace(/[^a-zA-Z0-9._-]/g, '_').slice(0, 200) || 'berth.pdf';
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
function serializeParseResult(parse: ParseResult): Record<string, unknown> {
|
|
|
|
|
|
return {
|
|
|
|
|
|
engine: parse.engine,
|
|
|
|
|
|
extracted: Object.fromEntries(
|
|
|
|
|
|
Object.entries(parse.fields).map(([k, v]) => [
|
|
|
|
|
|
k,
|
|
|
|
|
|
v ? { value: v.value, confidence: v.confidence } : null,
|
|
|
|
|
|
]),
|
|
|
|
|
|
),
|
|
|
|
|
|
meanConfidence: parse.meanConfidence,
|
|
|
|
|
|
warnings: parse.warnings,
|
|
|
|
|
|
};
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// ─── reconcile + apply ───────────────────────────────────────────────────────
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
|
|
* Walk every parsed field; classify into:
|
|
|
|
|
|
* - `autoApplied` when the CRM column is null/empty.
|
|
|
|
|
|
* - `conflicts` when both sides have a non-null value and they disagree.
|
|
|
|
|
|
*
|
|
|
|
|
|
* Numeric tolerance: ±1% (matches §14.6 imperial-vs-metric guidance, applied
|
|
|
|
|
|
* uniformly across all numeric columns since the same rounding noise affects
|
|
|
|
|
|
* weekly/daily rates too).
|
|
|
|
|
|
*/
|
|
|
|
|
|
export async function reconcilePdfWithBerth(
|
|
|
|
|
|
berthId: string,
|
|
|
|
|
|
parsed: ParseResult,
|
fix(security): scope berth-pdf service entrypoints by portId
Post-merge security review caught a cross-tenant authorization bypass
in the per-berth PDF endpoints (HIGH severity, confidence 10):
GET /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-upload-url
POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback
POST /api/v1/berths/[id]/pdf-versions/parse-results/apply
Each handler looked up the target berth by id only — `eq(berths.id, ...)`.
withAuth resolves ctx.portId from the user-controlled X-Port-Id header
(only verifying the user has SOME role on that port), and
withPermission('berths', 'view'|'edit', ...) is a coarse capability
check, not a row-level grant. A rep with berths:edit on Port A could
supply a Port B berth UUID and:
- list + receive 15-min presigned download URLs to every PDF version
- mint an upload URL targeting `berths/<port-B-id>/uploads/...`
- POST a new version (overwriting current_pdf_version_id on foreign berth)
- rollback to any prior version on a foreign berth
- apply rep-confirmed parse-result fields onto a foreign berth's columns
Sibling routes (waiting-list etc.) already pair the id filter with
`eq(berths.portId, ctx.portId)`, so this was an omission, not design.
Fix:
- Push `portId: string` into uploadBerthPdf, listBerthPdfVersions,
rollbackToVersion, applyParseResults, reconcilePdfWithBerth.
- Each function now filters the berth lookup with
`and(eq(berths.id, ...), eq(berths.portId, portId))` and throws
NotFoundError on mismatch (no foreign-port disclosure).
- Inline the same `and(...)` filter in the pdf-upload-url handler.
- Every handler passes ctx.portId through.
Coverage:
- New `cross-port tenant guard` test exercises every entrypoint with a
foreign-port id and asserts NotFoundError.
- 1164/1164 vitest passing. Typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
|
|
|
|
/** Tenant scope. NotFoundError on cross-port lookups. */
|
|
|
|
|
|
portId: string,
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
): Promise<ReconcileResult> {
|
fix(security): scope berth-pdf service entrypoints by portId
Post-merge security review caught a cross-tenant authorization bypass
in the per-berth PDF endpoints (HIGH severity, confidence 10):
GET /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-upload-url
POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback
POST /api/v1/berths/[id]/pdf-versions/parse-results/apply
Each handler looked up the target berth by id only — `eq(berths.id, ...)`.
withAuth resolves ctx.portId from the user-controlled X-Port-Id header
(only verifying the user has SOME role on that port), and
withPermission('berths', 'view'|'edit', ...) is a coarse capability
check, not a row-level grant. A rep with berths:edit on Port A could
supply a Port B berth UUID and:
- list + receive 15-min presigned download URLs to every PDF version
- mint an upload URL targeting `berths/<port-B-id>/uploads/...`
- POST a new version (overwriting current_pdf_version_id on foreign berth)
- rollback to any prior version on a foreign berth
- apply rep-confirmed parse-result fields onto a foreign berth's columns
Sibling routes (waiting-list etc.) already pair the id filter with
`eq(berths.portId, ctx.portId)`, so this was an omission, not design.
Fix:
- Push `portId: string` into uploadBerthPdf, listBerthPdfVersions,
rollbackToVersion, applyParseResults, reconcilePdfWithBerth.
- Each function now filters the berth lookup with
`and(eq(berths.id, ...), eq(berths.portId, portId))` and throws
NotFoundError on mismatch (no foreign-port disclosure).
- Inline the same `and(...)` filter in the pdf-upload-url handler.
- Every handler passes ctx.portId through.
Coverage:
- New `cross-port tenant guard` test exercises every entrypoint with a
foreign-port id and asserts NotFoundError.
- 1164/1164 vitest passing. Typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
|
|
|
|
const berthRow = await db.query.berths.findFirst({
|
|
|
|
|
|
where: and(eq(berths.id, berthId), eq(berths.portId, portId)),
|
|
|
|
|
|
});
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
if (!berthRow) throw new NotFoundError('Berth');
|
|
|
|
|
|
const fields = parsed.fields;
|
|
|
|
|
|
|
|
|
|
|
|
const autoApplied: ReconcileResult['autoApplied'] = [];
|
|
|
|
|
|
const conflicts: ReconcileConflict[] = [];
|
|
|
|
|
|
const warnings: string[] = [...parsed.warnings];
|
|
|
|
|
|
|
|
|
|
|
|
// §14.6 — mooring-number mismatch warning.
|
|
|
|
|
|
const pdfMooring = fields.mooringNumber?.value;
|
|
|
|
|
|
if (
|
|
|
|
|
|
pdfMooring &&
|
|
|
|
|
|
typeof pdfMooring === 'string' &&
|
|
|
|
|
|
pdfMooring.toUpperCase() !== berthRow.mooringNumber.toUpperCase()
|
|
|
|
|
|
) {
|
|
|
|
|
|
warnings.push(
|
|
|
|
|
|
`PDF says berth ${pdfMooring} but uploading to ${berthRow.mooringNumber}. Confirm before applying.`,
|
|
|
|
|
|
);
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
for (const key of APPLIABLE_FIELDS) {
|
|
|
|
|
|
const parsedField = fields[key];
|
|
|
|
|
|
if (!parsedField || parsedField.value == null) continue;
|
|
|
|
|
|
|
|
|
|
|
|
const crmRaw = (berthRow as Record<string, unknown>)[key];
|
|
|
|
|
|
const crmValue = normalizeForCompare(key, crmRaw);
|
|
|
|
|
|
const pdfValue = normalizeForCompare(key, parsedField.value);
|
|
|
|
|
|
|
|
|
|
|
|
if (crmValue == null || crmValue === '') {
|
|
|
|
|
|
autoApplied.push({ field: key, value: parsedField.value as string | number });
|
|
|
|
|
|
continue;
|
|
|
|
|
|
}
|
|
|
|
|
|
if (!valuesEqual(crmValue, pdfValue, NUMERIC_FIELDS.has(key))) {
|
|
|
|
|
|
conflicts.push({
|
|
|
|
|
|
field: key,
|
|
|
|
|
|
crmValue: crmValue as string | number | null,
|
|
|
|
|
|
pdfValue: pdfValue as string | number | null,
|
|
|
|
|
|
pdfConfidence: parsedField.confidence,
|
|
|
|
|
|
});
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
return { autoApplied, conflicts, warnings, engine: parsed.engine };
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
|
|
* Apply a rep-confirmed slice of the reconcile diff to the berth row. The
|
|
|
|
|
|
* caller passes the canonical `ExtractedBerthFields` keys; anything outside
|
|
|
|
|
|
* `APPLIABLE_FIELDS` is silently dropped to keep this endpoint a hard
|
|
|
|
|
|
* allowlist.
|
2026-05-05 04:07:03 +02:00
|
|
|
|
*
|
|
|
|
|
|
* Mooring-mismatch gate (§14.6 critical): when the version's stored
|
|
|
|
|
|
* `parseResults.warnings` contains a mooring-mismatch warning, the apply
|
|
|
|
|
|
* is rejected unless the caller passes `confirmMooringMismatch: true`.
|
|
|
|
|
|
* This is the service-side enforcement of the "force re-confirm" rule —
|
|
|
|
|
|
* UI confirmation alone is not enough.
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
*/
|
|
|
|
|
|
export async function applyParseResults(
|
|
|
|
|
|
berthId: string,
|
|
|
|
|
|
versionId: string,
|
|
|
|
|
|
fieldsToApply: Partial<ExtractedBerthFields>,
|
fix(security): scope berth-pdf service entrypoints by portId
Post-merge security review caught a cross-tenant authorization bypass
in the per-berth PDF endpoints (HIGH severity, confidence 10):
GET /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-upload-url
POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback
POST /api/v1/berths/[id]/pdf-versions/parse-results/apply
Each handler looked up the target berth by id only — `eq(berths.id, ...)`.
withAuth resolves ctx.portId from the user-controlled X-Port-Id header
(only verifying the user has SOME role on that port), and
withPermission('berths', 'view'|'edit', ...) is a coarse capability
check, not a row-level grant. A rep with berths:edit on Port A could
supply a Port B berth UUID and:
- list + receive 15-min presigned download URLs to every PDF version
- mint an upload URL targeting `berths/<port-B-id>/uploads/...`
- POST a new version (overwriting current_pdf_version_id on foreign berth)
- rollback to any prior version on a foreign berth
- apply rep-confirmed parse-result fields onto a foreign berth's columns
Sibling routes (waiting-list etc.) already pair the id filter with
`eq(berths.portId, ctx.portId)`, so this was an omission, not design.
Fix:
- Push `portId: string` into uploadBerthPdf, listBerthPdfVersions,
rollbackToVersion, applyParseResults, reconcilePdfWithBerth.
- Each function now filters the berth lookup with
`and(eq(berths.id, ...), eq(berths.portId, portId))` and throws
NotFoundError on mismatch (no foreign-port disclosure).
- Inline the same `and(...)` filter in the pdf-upload-url handler.
- Every handler passes ctx.portId through.
Coverage:
- New `cross-port tenant guard` test exercises every entrypoint with a
foreign-port id and asserts NotFoundError.
- 1164/1164 vitest passing. Typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
|
|
|
|
/** Tenant scope. NotFoundError when berth lives in a different port. */
|
|
|
|
|
|
portId: string,
|
2026-05-05 04:07:03 +02:00
|
|
|
|
opts: { confirmMooringMismatch?: boolean } = {},
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
): Promise<{ updatedFields: Array<keyof ExtractedBerthFields> }> {
|
fix(security): scope berth-pdf service entrypoints by portId
Post-merge security review caught a cross-tenant authorization bypass
in the per-berth PDF endpoints (HIGH severity, confidence 10):
GET /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-upload-url
POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback
POST /api/v1/berths/[id]/pdf-versions/parse-results/apply
Each handler looked up the target berth by id only — `eq(berths.id, ...)`.
withAuth resolves ctx.portId from the user-controlled X-Port-Id header
(only verifying the user has SOME role on that port), and
withPermission('berths', 'view'|'edit', ...) is a coarse capability
check, not a row-level grant. A rep with berths:edit on Port A could
supply a Port B berth UUID and:
- list + receive 15-min presigned download URLs to every PDF version
- mint an upload URL targeting `berths/<port-B-id>/uploads/...`
- POST a new version (overwriting current_pdf_version_id on foreign berth)
- rollback to any prior version on a foreign berth
- apply rep-confirmed parse-result fields onto a foreign berth's columns
Sibling routes (waiting-list etc.) already pair the id filter with
`eq(berths.portId, ctx.portId)`, so this was an omission, not design.
Fix:
- Push `portId: string` into uploadBerthPdf, listBerthPdfVersions,
rollbackToVersion, applyParseResults, reconcilePdfWithBerth.
- Each function now filters the berth lookup with
`and(eq(berths.id, ...), eq(berths.portId, portId))` and throws
NotFoundError on mismatch (no foreign-port disclosure).
- Inline the same `and(...)` filter in the pdf-upload-url handler.
- Every handler passes ctx.portId through.
Coverage:
- New `cross-port tenant guard` test exercises every entrypoint with a
foreign-port id and asserts NotFoundError.
- 1164/1164 vitest passing. Typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
|
|
|
|
const berthRow = await db.query.berths.findFirst({
|
|
|
|
|
|
where: and(eq(berths.id, berthId), eq(berths.portId, portId)),
|
|
|
|
|
|
});
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
if (!berthRow) throw new NotFoundError('Berth');
|
|
|
|
|
|
const versionRow = await db.query.berthPdfVersions.findFirst({
|
|
|
|
|
|
where: and(eq(berthPdfVersions.id, versionId), eq(berthPdfVersions.berthId, berthId)),
|
|
|
|
|
|
});
|
|
|
|
|
|
if (!versionRow) throw new NotFoundError('Berth PDF version');
|
|
|
|
|
|
|
2026-05-05 04:07:03 +02:00
|
|
|
|
// §14.6 mooring-mismatch gate.
|
|
|
|
|
|
const priorParse = versionRow.parseResults as { warnings?: string[] } | null;
|
|
|
|
|
|
const hasMooringMismatch = (priorParse?.warnings ?? []).some(
|
|
|
|
|
|
(w) => /uploading to/i.test(w) && /berth/i.test(w),
|
|
|
|
|
|
);
|
|
|
|
|
|
if (hasMooringMismatch && !opts.confirmMooringMismatch) {
|
|
|
|
|
|
throw new ConflictError(
|
|
|
|
|
|
'PDF mooring mismatch with target berth. Pass confirmMooringMismatch=true to override.',
|
|
|
|
|
|
);
|
|
|
|
|
|
}
|
|
|
|
|
|
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
const update: Record<string, unknown> = {};
|
|
|
|
|
|
const applied: Array<keyof ExtractedBerthFields> = [];
|
fix(audit): backlog sweep — partial archived indexes, custom-fields per-entity gate, polish
Wave through the 2026-05-07 backlog of small/concrete audit-final-deferred
items (deferring the Documenso Phases 2-7 build and items needing design
decisions or live external instances).
DB schema:
- Migration 0046 converts 5 composite (port_id, archived_at) indexes to
partial WHERE archived_at IS NULL — clients, interests, yachts, and
both residential tables. Smaller, faster planner choice for the
dominant list-query shape.
Multi-tenant isolation:
- document_sends now verifies recipient.interestId belongs to the port
before landing on the audit row (the surrounding clientId check was
already port-scoped; interestId pollution was the gap).
Routes / API:
- /api/v1/custom-fields/[entityId] requires entityType query param and
gates on the matching resource permission (clients/interests/berths/
yachts/companies). Fixes the cross-resource gap where a user with
clients.view could read company custom-field values.
- Admin user list trash button wrapped in PermissionGate (edit was
already gated; remove was not).
Service polish:
- berth-recommender accepts string-shaped JSONB booleans
('true'/'false') so admin UIs that wrap values as strings don't
silently fall through to defaults.
- expense-pdf renderReceiptHeader anchors all text positions to a
captured baseY rather than reading mutating doc.y after rect+stroke.
Headers no longer drift on the first receipt page after a soft page
break.
- berth-pdf apply: collect non-finite numeric coercion drops + warn-log
them so partial silent drops are observable (was invisible because
the no-fields-supplied check only fires when ALL drop).
- Storage cache fingerprint comment documenting the encrypted-secret
invariant + the explicit invalidation hook.
UI polish:
- invoice-detail typed: replaced two `any` casts with a proper
InvoiceDetailData / LineItem / LinkedExpense interface set.
- YachtForm now accepts initialOwner prop. Wired through:
- client-yachts-tab passes { type: 'client', id: clientId }
- interest-form passes { type: 'client', id: selectedClientId }
- Interest-form yacht picker now includes company-owned yachts where
the selected client is a member (fetches client.companies and feeds
YachtPicker an array filter). Plus an inline "Add new" button that
opens YachtForm pre-bound to the client.
- YachtPicker accepts ownerFilter as single OR array for "match any"
semantics.
BACKLOG.md updated with what landed vs what's still deferred (and why
each deferred item is genuinely larger than this push warrants).
Tests: 1185/1185 vitest, tsc clean.
2026-05-07 21:45:42 +02:00
|
|
|
|
// Capture keys whose values were supplied but couldn't be coerced
|
|
|
|
|
|
// (e.g. a numeric column receiving a non-finite or non-numeric value).
|
|
|
|
|
|
// Without this, partial silent drops are invisible because the
|
|
|
|
|
|
// "no appliable fields supplied" check only fires when EVERY key drops.
|
|
|
|
|
|
const dropped: Array<{ key: keyof ExtractedBerthFields; reason: string }> = [];
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
for (const key of APPLIABLE_FIELDS) {
|
|
|
|
|
|
const value = fieldsToApply[key];
|
|
|
|
|
|
if (value === undefined) continue;
|
|
|
|
|
|
if (value === null) {
|
|
|
|
|
|
update[key] = null;
|
|
|
|
|
|
applied.push(key);
|
|
|
|
|
|
continue;
|
|
|
|
|
|
}
|
|
|
|
|
|
if (NUMERIC_FIELDS.has(key)) {
|
|
|
|
|
|
const n = typeof value === 'number' ? value : Number(value);
|
fix(audit): backlog sweep — partial archived indexes, custom-fields per-entity gate, polish
Wave through the 2026-05-07 backlog of small/concrete audit-final-deferred
items (deferring the Documenso Phases 2-7 build and items needing design
decisions or live external instances).
DB schema:
- Migration 0046 converts 5 composite (port_id, archived_at) indexes to
partial WHERE archived_at IS NULL — clients, interests, yachts, and
both residential tables. Smaller, faster planner choice for the
dominant list-query shape.
Multi-tenant isolation:
- document_sends now verifies recipient.interestId belongs to the port
before landing on the audit row (the surrounding clientId check was
already port-scoped; interestId pollution was the gap).
Routes / API:
- /api/v1/custom-fields/[entityId] requires entityType query param and
gates on the matching resource permission (clients/interests/berths/
yachts/companies). Fixes the cross-resource gap where a user with
clients.view could read company custom-field values.
- Admin user list trash button wrapped in PermissionGate (edit was
already gated; remove was not).
Service polish:
- berth-recommender accepts string-shaped JSONB booleans
('true'/'false') so admin UIs that wrap values as strings don't
silently fall through to defaults.
- expense-pdf renderReceiptHeader anchors all text positions to a
captured baseY rather than reading mutating doc.y after rect+stroke.
Headers no longer drift on the first receipt page after a soft page
break.
- berth-pdf apply: collect non-finite numeric coercion drops + warn-log
them so partial silent drops are observable (was invisible because
the no-fields-supplied check only fires when ALL drop).
- Storage cache fingerprint comment documenting the encrypted-secret
invariant + the explicit invalidation hook.
UI polish:
- invoice-detail typed: replaced two `any` casts with a proper
InvoiceDetailData / LineItem / LinkedExpense interface set.
- YachtForm now accepts initialOwner prop. Wired through:
- client-yachts-tab passes { type: 'client', id: clientId }
- interest-form passes { type: 'client', id: selectedClientId }
- Interest-form yacht picker now includes company-owned yachts where
the selected client is a member (fetches client.companies and feeds
YachtPicker an array filter). Plus an inline "Add new" button that
opens YachtForm pre-bound to the client.
- YachtPicker accepts ownerFilter as single OR array for "match any"
semantics.
BACKLOG.md updated with what landed vs what's still deferred (and why
each deferred item is genuinely larger than this push warrants).
Tests: 1185/1185 vitest, tsc clean.
2026-05-07 21:45:42 +02:00
|
|
|
|
if (!Number.isFinite(n)) {
|
|
|
|
|
|
dropped.push({ key, reason: `non-finite numeric (${typeof value}: ${String(value)})` });
|
|
|
|
|
|
continue;
|
|
|
|
|
|
}
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
// numeric columns expect strings to preserve precision.
|
|
|
|
|
|
update[key] = String(n);
|
|
|
|
|
|
} else {
|
|
|
|
|
|
update[key] = String(value);
|
|
|
|
|
|
}
|
|
|
|
|
|
applied.push(key);
|
|
|
|
|
|
}
|
|
|
|
|
|
if (applied.length === 0) {
|
|
|
|
|
|
throw new ValidationError('No appliable fields supplied.');
|
|
|
|
|
|
}
|
fix(audit): backlog sweep — partial archived indexes, custom-fields per-entity gate, polish
Wave through the 2026-05-07 backlog of small/concrete audit-final-deferred
items (deferring the Documenso Phases 2-7 build and items needing design
decisions or live external instances).
DB schema:
- Migration 0046 converts 5 composite (port_id, archived_at) indexes to
partial WHERE archived_at IS NULL — clients, interests, yachts, and
both residential tables. Smaller, faster planner choice for the
dominant list-query shape.
Multi-tenant isolation:
- document_sends now verifies recipient.interestId belongs to the port
before landing on the audit row (the surrounding clientId check was
already port-scoped; interestId pollution was the gap).
Routes / API:
- /api/v1/custom-fields/[entityId] requires entityType query param and
gates on the matching resource permission (clients/interests/berths/
yachts/companies). Fixes the cross-resource gap where a user with
clients.view could read company custom-field values.
- Admin user list trash button wrapped in PermissionGate (edit was
already gated; remove was not).
Service polish:
- berth-recommender accepts string-shaped JSONB booleans
('true'/'false') so admin UIs that wrap values as strings don't
silently fall through to defaults.
- expense-pdf renderReceiptHeader anchors all text positions to a
captured baseY rather than reading mutating doc.y after rect+stroke.
Headers no longer drift on the first receipt page after a soft page
break.
- berth-pdf apply: collect non-finite numeric coercion drops + warn-log
them so partial silent drops are observable (was invisible because
the no-fields-supplied check only fires when ALL drop).
- Storage cache fingerprint comment documenting the encrypted-secret
invariant + the explicit invalidation hook.
UI polish:
- invoice-detail typed: replaced two `any` casts with a proper
InvoiceDetailData / LineItem / LinkedExpense interface set.
- YachtForm now accepts initialOwner prop. Wired through:
- client-yachts-tab passes { type: 'client', id: clientId }
- interest-form passes { type: 'client', id: selectedClientId }
- Interest-form yacht picker now includes company-owned yachts where
the selected client is a member (fetches client.companies and feeds
YachtPicker an array filter). Plus an inline "Add new" button that
opens YachtForm pre-bound to the client.
- YachtPicker accepts ownerFilter as single OR array for "match any"
semantics.
BACKLOG.md updated with what landed vs what's still deferred (and why
each deferred item is genuinely larger than this push warrants).
Tests: 1185/1185 vitest, tsc clean.
2026-05-07 21:45:42 +02:00
|
|
|
|
if (dropped.length > 0) {
|
|
|
|
|
|
logger.warn(
|
|
|
|
|
|
{ berthId, versionId, dropped },
|
|
|
|
|
|
'Berth PDF apply: silently dropped fields that failed type coercion',
|
|
|
|
|
|
);
|
|
|
|
|
|
}
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
update.updatedAt = new Date();
|
|
|
|
|
|
|
|
|
|
|
|
await db.transaction(async (tx) => {
|
|
|
|
|
|
await tx.update(berths).set(update).where(eq(berths.id, berthId));
|
|
|
|
|
|
// Stamp the applied-field set onto parse_results for audit.
|
|
|
|
|
|
const prior = (versionRow.parseResults as Record<string, unknown> | null) ?? {};
|
|
|
|
|
|
await tx
|
|
|
|
|
|
.update(berthPdfVersions)
|
|
|
|
|
|
.set({
|
|
|
|
|
|
parseResults: {
|
|
|
|
|
|
...prior,
|
|
|
|
|
|
appliedFields: applied,
|
|
|
|
|
|
appliedAt: new Date().toISOString(),
|
|
|
|
|
|
},
|
|
|
|
|
|
})
|
|
|
|
|
|
.where(eq(berthPdfVersions.id, versionId));
|
|
|
|
|
|
});
|
|
|
|
|
|
|
|
|
|
|
|
return { updatedFields: applied };
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// ─── version listing + rollback ──────────────────────────────────────────────
|
|
|
|
|
|
|
|
|
|
|
|
export interface BerthPdfVersionListItem {
|
|
|
|
|
|
id: string;
|
|
|
|
|
|
versionNumber: number;
|
|
|
|
|
|
fileName: string;
|
|
|
|
|
|
fileSizeBytes: number;
|
|
|
|
|
|
uploadedBy: string;
|
|
|
|
|
|
uploadedAt: Date;
|
|
|
|
|
|
isCurrent: boolean;
|
|
|
|
|
|
/** Pre-signed download URL (15-min TTL). */
|
|
|
|
|
|
downloadUrl: string;
|
|
|
|
|
|
downloadUrlExpiresAt: Date;
|
|
|
|
|
|
parseEngine: ParserEngine | null;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
fix(security): scope berth-pdf service entrypoints by portId
Post-merge security review caught a cross-tenant authorization bypass
in the per-berth PDF endpoints (HIGH severity, confidence 10):
GET /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-upload-url
POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback
POST /api/v1/berths/[id]/pdf-versions/parse-results/apply
Each handler looked up the target berth by id only — `eq(berths.id, ...)`.
withAuth resolves ctx.portId from the user-controlled X-Port-Id header
(only verifying the user has SOME role on that port), and
withPermission('berths', 'view'|'edit', ...) is a coarse capability
check, not a row-level grant. A rep with berths:edit on Port A could
supply a Port B berth UUID and:
- list + receive 15-min presigned download URLs to every PDF version
- mint an upload URL targeting `berths/<port-B-id>/uploads/...`
- POST a new version (overwriting current_pdf_version_id on foreign berth)
- rollback to any prior version on a foreign berth
- apply rep-confirmed parse-result fields onto a foreign berth's columns
Sibling routes (waiting-list etc.) already pair the id filter with
`eq(berths.portId, ctx.portId)`, so this was an omission, not design.
Fix:
- Push `portId: string` into uploadBerthPdf, listBerthPdfVersions,
rollbackToVersion, applyParseResults, reconcilePdfWithBerth.
- Each function now filters the berth lookup with
`and(eq(berths.id, ...), eq(berths.portId, portId))` and throws
NotFoundError on mismatch (no foreign-port disclosure).
- Inline the same `and(...)` filter in the pdf-upload-url handler.
- Every handler passes ctx.portId through.
Coverage:
- New `cross-port tenant guard` test exercises every entrypoint with a
foreign-port id and asserts NotFoundError.
- 1164/1164 vitest passing. Typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
|
|
|
|
export async function listBerthPdfVersions(
|
|
|
|
|
|
berthId: string,
|
|
|
|
|
|
/** Tenant scope. NotFoundError when berth lives in a different port. */
|
|
|
|
|
|
portId: string,
|
|
|
|
|
|
): Promise<BerthPdfVersionListItem[]> {
|
|
|
|
|
|
const berthRow = await db.query.berths.findFirst({
|
|
|
|
|
|
where: and(eq(berths.id, berthId), eq(berths.portId, portId)),
|
|
|
|
|
|
});
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
if (!berthRow) throw new NotFoundError('Berth');
|
|
|
|
|
|
|
|
|
|
|
|
const rows = await db
|
|
|
|
|
|
.select()
|
|
|
|
|
|
.from(berthPdfVersions)
|
|
|
|
|
|
.where(eq(berthPdfVersions.berthId, berthId))
|
|
|
|
|
|
.orderBy(desc(berthPdfVersions.versionNumber));
|
|
|
|
|
|
|
|
|
|
|
|
const backend = await getStorageBackend();
|
feat(expenses): streaming expense-PDF export + receipt-less expense flag + audit-3 fixes
Replaces the legacy text-only expense PDF (was just dumping rows into a
single pdfme text field — no images, no pagination) with a proper
streaming export modelled on the legacy Nuxt client-portal but
re-architected for memory safety. The legacy implementation OOM'd on
hundreds of receipts because it:
- buffered every receipt image into memory simultaneously
- accumulated PDF chunks into an array, concat'd at end
- base64-encoded the whole PDF into a JSON response (3x peak memory)
- had no image downscaling
The new design:
- `streamExpensePdf()` (src/lib/services/expense-pdf.service.ts):
pdfkit pipes bytes directly to the HTTP response (no Buffer
accumulation). Receipts are processed serially so peak heap is one
image at a time. Sharp downscales any receipt > 500 KB or > 1500 px
to JPEG q80 — typical 8 MB phone photo collapses to ~250 KB. For a
500-receipt export, peak RSS stays under ~100 MB; legacy needed >2
GB for the same input.
- Pages: cover summary box (count, totals, currency equiv, optional
processing fee), grouped expense table (groupBy=none|payer|category|
date), one-page-per-receipt with header (establishment, amount,
date, payer, category, file name) and full-bleed image.
- Storage backend abstraction — receipts stream from
`getStorageBackend().get(storageKey)`, works on MinIO/S3/filesystem.
- Route: POST /api/v1/expenses/export/pdf streams binary
application/pdf with cache-control:no-store. Validator caps
expenseIds at 1000 to prevent runaway loops.
Receipt-less expense flow (per user request):
- Schema: 0033 migration adds `expenses.no_receipt_acknowledged`
boolean (default false).
- Validator: createExpenseSchema requires either receiptFileIds OR
noReceiptAcknowledged=true; the .refine() error message tells the
rep exactly what to do. updateExpenseSchema is partial and skips
the rule (existing rows can be edited without re-acknowledging).
- PDF: receiptless expenses get an inline red "(no receipt)" tag in
the establishment cell + a red footer warning in the summary box
showing the count and at-risk amount.
- The legacy parent-company reimbursement queue may refuse to pay
receiptless expenses, so the warning is load-bearing for ops.
Audit-3 fixes piggy-backed:
- 🔴 Tesseract OCR runtime now races a 30s timeout (CPU-bomb DoS
protection — a crafted PDF rasterizing to high-res noise could
pin the worker indefinitely).
- 🟠 brochures.service.ts:listBrochures dropped a wasted query (the
legacy single-brochure fast-path was discarding its result on the
multi-brochure branch).
- 🟠 berth-pdf.service.ts:listBerthPdfVersions now Promise.all's the
presignDownload calls instead of awaiting each in a for-loop —
20-version berths went from 20× round-trip to 1×.
- 🟡 public berths route no longer logs the full `row` object on
enum drift (was dumping price + amenity columns into ops logs).
- 🟡 dropped the dead `void sql` import from public berths route.
Tests still 1163/1163. tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:38:32 +02:00
|
|
|
|
// Presign in parallel — for an S3 backend each call is a separate HTTP
|
|
|
|
|
|
// round-trip, so a 20-version berth used to take 20× the latency in
|
|
|
|
|
|
// the sequential loop. Promise.all collapses to ~1× round-trip.
|
|
|
|
|
|
const presigned = await Promise.all(
|
|
|
|
|
|
rows.map((row) =>
|
|
|
|
|
|
backend.presignDownload(row.storageKey, {
|
|
|
|
|
|
expirySeconds: 900,
|
|
|
|
|
|
filename: row.fileName,
|
|
|
|
|
|
contentType: 'application/pdf',
|
|
|
|
|
|
}),
|
|
|
|
|
|
),
|
|
|
|
|
|
);
|
|
|
|
|
|
|
|
|
|
|
|
return rows.map((row, i) => {
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
const parseEngine = (row.parseResults as { engine?: ParserEngine } | null)?.engine ?? null;
|
feat(expenses): streaming expense-PDF export + receipt-less expense flag + audit-3 fixes
Replaces the legacy text-only expense PDF (was just dumping rows into a
single pdfme text field — no images, no pagination) with a proper
streaming export modelled on the legacy Nuxt client-portal but
re-architected for memory safety. The legacy implementation OOM'd on
hundreds of receipts because it:
- buffered every receipt image into memory simultaneously
- accumulated PDF chunks into an array, concat'd at end
- base64-encoded the whole PDF into a JSON response (3x peak memory)
- had no image downscaling
The new design:
- `streamExpensePdf()` (src/lib/services/expense-pdf.service.ts):
pdfkit pipes bytes directly to the HTTP response (no Buffer
accumulation). Receipts are processed serially so peak heap is one
image at a time. Sharp downscales any receipt > 500 KB or > 1500 px
to JPEG q80 — typical 8 MB phone photo collapses to ~250 KB. For a
500-receipt export, peak RSS stays under ~100 MB; legacy needed >2
GB for the same input.
- Pages: cover summary box (count, totals, currency equiv, optional
processing fee), grouped expense table (groupBy=none|payer|category|
date), one-page-per-receipt with header (establishment, amount,
date, payer, category, file name) and full-bleed image.
- Storage backend abstraction — receipts stream from
`getStorageBackend().get(storageKey)`, works on MinIO/S3/filesystem.
- Route: POST /api/v1/expenses/export/pdf streams binary
application/pdf with cache-control:no-store. Validator caps
expenseIds at 1000 to prevent runaway loops.
Receipt-less expense flow (per user request):
- Schema: 0033 migration adds `expenses.no_receipt_acknowledged`
boolean (default false).
- Validator: createExpenseSchema requires either receiptFileIds OR
noReceiptAcknowledged=true; the .refine() error message tells the
rep exactly what to do. updateExpenseSchema is partial and skips
the rule (existing rows can be edited without re-acknowledging).
- PDF: receiptless expenses get an inline red "(no receipt)" tag in
the establishment cell + a red footer warning in the summary box
showing the count and at-risk amount.
- The legacy parent-company reimbursement queue may refuse to pay
receiptless expenses, so the warning is load-bearing for ops.
Audit-3 fixes piggy-backed:
- 🔴 Tesseract OCR runtime now races a 30s timeout (CPU-bomb DoS
protection — a crafted PDF rasterizing to high-res noise could
pin the worker indefinitely).
- 🟠 brochures.service.ts:listBrochures dropped a wasted query (the
legacy single-brochure fast-path was discarding its result on the
multi-brochure branch).
- 🟠 berth-pdf.service.ts:listBerthPdfVersions now Promise.all's the
presignDownload calls instead of awaiting each in a for-loop —
20-version berths went from 20× round-trip to 1×.
- 🟡 public berths route no longer logs the full `row` object on
enum drift (was dumping price + amenity columns into ops logs).
- 🟡 dropped the dead `void sql` import from public berths route.
Tests still 1163/1163. tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:38:32 +02:00
|
|
|
|
return {
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
id: row.id,
|
|
|
|
|
|
versionNumber: row.versionNumber,
|
|
|
|
|
|
fileName: row.fileName,
|
|
|
|
|
|
fileSizeBytes: row.fileSizeBytes,
|
|
|
|
|
|
uploadedBy: row.uploadedBy,
|
|
|
|
|
|
uploadedAt: row.uploadedAt,
|
|
|
|
|
|
isCurrent: berthRow.currentPdfVersionId === row.id,
|
feat(expenses): streaming expense-PDF export + receipt-less expense flag + audit-3 fixes
Replaces the legacy text-only expense PDF (was just dumping rows into a
single pdfme text field — no images, no pagination) with a proper
streaming export modelled on the legacy Nuxt client-portal but
re-architected for memory safety. The legacy implementation OOM'd on
hundreds of receipts because it:
- buffered every receipt image into memory simultaneously
- accumulated PDF chunks into an array, concat'd at end
- base64-encoded the whole PDF into a JSON response (3x peak memory)
- had no image downscaling
The new design:
- `streamExpensePdf()` (src/lib/services/expense-pdf.service.ts):
pdfkit pipes bytes directly to the HTTP response (no Buffer
accumulation). Receipts are processed serially so peak heap is one
image at a time. Sharp downscales any receipt > 500 KB or > 1500 px
to JPEG q80 — typical 8 MB phone photo collapses to ~250 KB. For a
500-receipt export, peak RSS stays under ~100 MB; legacy needed >2
GB for the same input.
- Pages: cover summary box (count, totals, currency equiv, optional
processing fee), grouped expense table (groupBy=none|payer|category|
date), one-page-per-receipt with header (establishment, amount,
date, payer, category, file name) and full-bleed image.
- Storage backend abstraction — receipts stream from
`getStorageBackend().get(storageKey)`, works on MinIO/S3/filesystem.
- Route: POST /api/v1/expenses/export/pdf streams binary
application/pdf with cache-control:no-store. Validator caps
expenseIds at 1000 to prevent runaway loops.
Receipt-less expense flow (per user request):
- Schema: 0033 migration adds `expenses.no_receipt_acknowledged`
boolean (default false).
- Validator: createExpenseSchema requires either receiptFileIds OR
noReceiptAcknowledged=true; the .refine() error message tells the
rep exactly what to do. updateExpenseSchema is partial and skips
the rule (existing rows can be edited without re-acknowledging).
- PDF: receiptless expenses get an inline red "(no receipt)" tag in
the establishment cell + a red footer warning in the summary box
showing the count and at-risk amount.
- The legacy parent-company reimbursement queue may refuse to pay
receiptless expenses, so the warning is load-bearing for ops.
Audit-3 fixes piggy-backed:
- 🔴 Tesseract OCR runtime now races a 30s timeout (CPU-bomb DoS
protection — a crafted PDF rasterizing to high-res noise could
pin the worker indefinitely).
- 🟠 brochures.service.ts:listBrochures dropped a wasted query (the
legacy single-brochure fast-path was discarding its result on the
multi-brochure branch).
- 🟠 berth-pdf.service.ts:listBerthPdfVersions now Promise.all's the
presignDownload calls instead of awaiting each in a for-loop —
20-version berths went from 20× round-trip to 1×.
- 🟡 public berths route no longer logs the full `row` object on
enum drift (was dumping price + amenity columns into ops logs).
- 🟡 dropped the dead `void sql` import from public berths route.
Tests still 1163/1163. tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:38:32 +02:00
|
|
|
|
downloadUrl: presigned[i]!.url,
|
|
|
|
|
|
downloadUrlExpiresAt: presigned[i]!.expiresAt,
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
parseEngine,
|
feat(expenses): streaming expense-PDF export + receipt-less expense flag + audit-3 fixes
Replaces the legacy text-only expense PDF (was just dumping rows into a
single pdfme text field — no images, no pagination) with a proper
streaming export modelled on the legacy Nuxt client-portal but
re-architected for memory safety. The legacy implementation OOM'd on
hundreds of receipts because it:
- buffered every receipt image into memory simultaneously
- accumulated PDF chunks into an array, concat'd at end
- base64-encoded the whole PDF into a JSON response (3x peak memory)
- had no image downscaling
The new design:
- `streamExpensePdf()` (src/lib/services/expense-pdf.service.ts):
pdfkit pipes bytes directly to the HTTP response (no Buffer
accumulation). Receipts are processed serially so peak heap is one
image at a time. Sharp downscales any receipt > 500 KB or > 1500 px
to JPEG q80 — typical 8 MB phone photo collapses to ~250 KB. For a
500-receipt export, peak RSS stays under ~100 MB; legacy needed >2
GB for the same input.
- Pages: cover summary box (count, totals, currency equiv, optional
processing fee), grouped expense table (groupBy=none|payer|category|
date), one-page-per-receipt with header (establishment, amount,
date, payer, category, file name) and full-bleed image.
- Storage backend abstraction — receipts stream from
`getStorageBackend().get(storageKey)`, works on MinIO/S3/filesystem.
- Route: POST /api/v1/expenses/export/pdf streams binary
application/pdf with cache-control:no-store. Validator caps
expenseIds at 1000 to prevent runaway loops.
Receipt-less expense flow (per user request):
- Schema: 0033 migration adds `expenses.no_receipt_acknowledged`
boolean (default false).
- Validator: createExpenseSchema requires either receiptFileIds OR
noReceiptAcknowledged=true; the .refine() error message tells the
rep exactly what to do. updateExpenseSchema is partial and skips
the rule (existing rows can be edited without re-acknowledging).
- PDF: receiptless expenses get an inline red "(no receipt)" tag in
the establishment cell + a red footer warning in the summary box
showing the count and at-risk amount.
- The legacy parent-company reimbursement queue may refuse to pay
receiptless expenses, so the warning is load-bearing for ops.
Audit-3 fixes piggy-backed:
- 🔴 Tesseract OCR runtime now races a 30s timeout (CPU-bomb DoS
protection — a crafted PDF rasterizing to high-res noise could
pin the worker indefinitely).
- 🟠 brochures.service.ts:listBrochures dropped a wasted query (the
legacy single-brochure fast-path was discarding its result on the
multi-brochure branch).
- 🟠 berth-pdf.service.ts:listBerthPdfVersions now Promise.all's the
presignDownload calls instead of awaiting each in a for-loop —
20-version berths went from 20× round-trip to 1×.
- 🟡 public berths route no longer logs the full `row` object on
enum drift (was dumping price + amenity columns into ops logs).
- 🟡 dropped the dead `void sql` import from public berths route.
Tests still 1163/1163. tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:38:32 +02:00
|
|
|
|
};
|
|
|
|
|
|
});
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
|
|
* Set `berths.current_pdf_version_id` to the requested version. Per §14.6,
|
|
|
|
|
|
* this does NOT re-parse and re-update the berth columns — that's a separate
|
|
|
|
|
|
* deliberate "extract data from this version" action.
|
|
|
|
|
|
*/
|
|
|
|
|
|
export async function rollbackToVersion(
|
|
|
|
|
|
berthId: string,
|
|
|
|
|
|
versionId: string,
|
fix(security): scope berth-pdf service entrypoints by portId
Post-merge security review caught a cross-tenant authorization bypass
in the per-berth PDF endpoints (HIGH severity, confidence 10):
GET /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-upload-url
POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback
POST /api/v1/berths/[id]/pdf-versions/parse-results/apply
Each handler looked up the target berth by id only — `eq(berths.id, ...)`.
withAuth resolves ctx.portId from the user-controlled X-Port-Id header
(only verifying the user has SOME role on that port), and
withPermission('berths', 'view'|'edit', ...) is a coarse capability
check, not a row-level grant. A rep with berths:edit on Port A could
supply a Port B berth UUID and:
- list + receive 15-min presigned download URLs to every PDF version
- mint an upload URL targeting `berths/<port-B-id>/uploads/...`
- POST a new version (overwriting current_pdf_version_id on foreign berth)
- rollback to any prior version on a foreign berth
- apply rep-confirmed parse-result fields onto a foreign berth's columns
Sibling routes (waiting-list etc.) already pair the id filter with
`eq(berths.portId, ctx.portId)`, so this was an omission, not design.
Fix:
- Push `portId: string` into uploadBerthPdf, listBerthPdfVersions,
rollbackToVersion, applyParseResults, reconcilePdfWithBerth.
- Each function now filters the berth lookup with
`and(eq(berths.id, ...), eq(berths.portId, portId))` and throws
NotFoundError on mismatch (no foreign-port disclosure).
- Inline the same `and(...)` filter in the pdf-upload-url handler.
- Every handler passes ctx.portId through.
Coverage:
- New `cross-port tenant guard` test exercises every entrypoint with a
foreign-port id and asserts NotFoundError.
- 1164/1164 vitest passing. Typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
|
|
|
|
/** Tenant scope. NotFoundError when berth lives in a different port. */
|
|
|
|
|
|
portId: string,
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
): Promise<{ versionId: string; versionNumber: number }> {
|
|
|
|
|
|
const versionRow = await db.query.berthPdfVersions.findFirst({
|
|
|
|
|
|
where: and(eq(berthPdfVersions.id, versionId), eq(berthPdfVersions.berthId, berthId)),
|
|
|
|
|
|
});
|
|
|
|
|
|
if (!versionRow) throw new NotFoundError('Berth PDF version');
|
fix(security): scope berth-pdf service entrypoints by portId
Post-merge security review caught a cross-tenant authorization bypass
in the per-berth PDF endpoints (HIGH severity, confidence 10):
GET /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-versions
POST /api/v1/berths/[id]/pdf-upload-url
POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback
POST /api/v1/berths/[id]/pdf-versions/parse-results/apply
Each handler looked up the target berth by id only — `eq(berths.id, ...)`.
withAuth resolves ctx.portId from the user-controlled X-Port-Id header
(only verifying the user has SOME role on that port), and
withPermission('berths', 'view'|'edit', ...) is a coarse capability
check, not a row-level grant. A rep with berths:edit on Port A could
supply a Port B berth UUID and:
- list + receive 15-min presigned download URLs to every PDF version
- mint an upload URL targeting `berths/<port-B-id>/uploads/...`
- POST a new version (overwriting current_pdf_version_id on foreign berth)
- rollback to any prior version on a foreign berth
- apply rep-confirmed parse-result fields onto a foreign berth's columns
Sibling routes (waiting-list etc.) already pair the id filter with
`eq(berths.portId, ctx.portId)`, so this was an omission, not design.
Fix:
- Push `portId: string` into uploadBerthPdf, listBerthPdfVersions,
rollbackToVersion, applyParseResults, reconcilePdfWithBerth.
- Each function now filters the berth lookup with
`and(eq(berths.id, ...), eq(berths.portId, portId))` and throws
NotFoundError on mismatch (no foreign-port disclosure).
- Inline the same `and(...)` filter in the pdf-upload-url handler.
- Every handler passes ctx.portId through.
Coverage:
- New `cross-port tenant guard` test exercises every entrypoint with a
foreign-port id and asserts NotFoundError.
- 1164/1164 vitest passing. Typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
|
|
|
|
const berthRow = await db.query.berths.findFirst({
|
|
|
|
|
|
where: and(eq(berths.id, berthId), eq(berths.portId, portId)),
|
|
|
|
|
|
});
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
if (!berthRow) throw new NotFoundError('Berth');
|
|
|
|
|
|
|
|
|
|
|
|
if (berthRow.currentPdfVersionId === versionId) {
|
feat(errors): platform-wide request ids + error codes + admin inspector
End-to-end error-handling overhaul. A user hitting any failure now sees
a plain-text message + stable error code + reference id. A super admin
can paste the id into /admin/errors/<id> for the full request shape,
sanitized body, error stack, and a heuristic likely-cause hint.
REQUEST CONTEXT (AsyncLocalStorage)
- src/lib/request-context.ts mints a per-request frame carrying
requestId + portId + userId + method + path + start timestamp.
- withAuth wraps every authenticated handler in runWithRequestContext
and accepts an upstream X-Request-Id header (validated shape) or
generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id
response header, including early-return 401/403/4xx paths.
- Pino logger reads from the same context via mixin — every log
line emitted during the request automatically carries the ids
with no per-call threading.
ERROR CODE REGISTRY
- src/lib/error-codes.ts defines stable DOMAIN_REASON codes with
HTTP status + plain-text user-facing message (no jargon, written
for the rep on the phone with a customer).
- New CodedError class wraps a registered code + optional
internalMessage (admin-only — never sent to client).
- Existing AppError subclasses got plain-text default rewrites so
legacy throw sites improve immediately without migration.
- High-impact services migrated to specific codes:
expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths
(CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY /
PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender
(INTEREST_PORT_MISMATCH).
ERROR ENVELOPE
- errorResponse always sets X-Request-Id header + requestId field.
- 5xx responses include a "Quote error ID …" friendly line.
- 4xx kept clean (validation, permission, not-found don't pollute
the inspector — they're already in audit log).
PERSISTENCE (error_events table, migration 0040)
- One row per 5xx, keyed on requestId, with method/path/status/error
name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap;
password/token/secret/etc keys redacted)/duration/IP/UA/metadata.
- captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code
so the classifier can recognize FK / unique / NOT NULL / schema-
drift violations.
- Failure to persist is logged-not-thrown.
LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts)
- 4-pass heuristic (first match wins):
1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique,
42703 schema drift, 53300 connection limit, …)
2. Error class name (AbortError, TimeoutError, FetchError,
ZodError)
3. Stack-path patterns (/lib/storage/, /lib/email/, documenso,
openai|claude, /queue/workers/)
4. Free-text message keywords (econnrefused, rate limit, timeout,
unauthorized|invalid api key)
- Returns { label, hint, subsystem } for the inspector badge.
CLIENT SIDE
- apiFetch throws structured ApiError with message + code + requestId
+ details + retryAfter.
- toastError() helper renders the standard 3-line toast:
plain message / Error code: X / Reference ID: Y [Copy ID].
ADMIN INSPECTOR
- /<port>/admin/errors lists captured 5xx with status badge + path +
likely-culprit badge + truncated message + reference id. Filter by
status code; auto-refresh via TanStack Query.
- /<port>/admin/errors/<requestId> deep-dive: request shape, full
error name+message+stack, sanitized body excerpt, raw metadata,
registered-code lookup (so admin can compare to what user saw),
likely-culprit hint with subsystem tag.
- /<port>/admin/errors/codes is the in-app code reference page —
every registered code grouped by domain prefix, searchable, with
HTTP status + user message inline. Linked from inspector header
so admins can flip to it while triaging.
- Permission: admin.view_audit_log. Super admins see all ports;
regular admins port-scoped.
- system-monitoring dashboard now surfaces error_events alongside
permission_denied audit + queue failed jobs (RecentError gains
source: 'request' variant).
DOCS
- docs/error-handling.md walks through coded errors, plain-text
message guidelines, client toasting, admin inspector usage,
persistence rules, classifier internals, pruning, and the
legacy → CodedError migration path.
MIGRATION SAFETY
- Audit confirmed all 41 migrations (0000-0040) apply cleanly in
journal order against an empty DB. 0040 references ports(id)
which exists from 0000. 0035/0038 don't deadlock under sequential
psql -f. Removed redundant idx_ds_sent_by from 0038 (created in
0037).
Tests: 1168/1168 vitest passing. tsc clean.
- security-error-responses tests updated for plain-text messages
+ new optional response keys (code/requestId/message).
- berth-pdf-versions tests assert stable error codes via
toMatchObject({ code }) rather than message regex.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
|
|
|
|
throw new CodedError('BERTHS_VERSION_ALREADY_CURRENT');
|
feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
await db
|
|
|
|
|
|
.update(berths)
|
|
|
|
|
|
.set({ currentPdfVersionId: versionId, updatedAt: new Date() })
|
|
|
|
|
|
.where(eq(berths.id, berthId));
|
|
|
|
|
|
|
|
|
|
|
|
return { versionId, versionNumber: versionRow.versionNumber };
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// ─── compare helpers ─────────────────────────────────────────────────────────
|
|
|
|
|
|
|
|
|
|
|
|
function normalizeForCompare(
|
|
|
|
|
|
key: keyof ExtractedBerthFields,
|
|
|
|
|
|
raw: unknown,
|
|
|
|
|
|
): string | number | null {
|
|
|
|
|
|
if (raw == null) return null;
|
|
|
|
|
|
if (NUMERIC_FIELDS.has(key)) {
|
|
|
|
|
|
const n = typeof raw === 'number' ? raw : Number(String(raw).replace(/[^0-9.\-]/g, ''));
|
|
|
|
|
|
return Number.isFinite(n) ? n : null;
|
|
|
|
|
|
}
|
|
|
|
|
|
if (typeof raw === 'string') return raw.trim();
|
|
|
|
|
|
return String(raw);
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
function valuesEqual(a: unknown, b: unknown, isNumeric: boolean): boolean {
|
|
|
|
|
|
if (a == null && b == null) return true;
|
|
|
|
|
|
if (a == null || b == null) return false;
|
|
|
|
|
|
if (isNumeric) {
|
|
|
|
|
|
const an = Number(a);
|
|
|
|
|
|
const bn = Number(b);
|
|
|
|
|
|
if (!Number.isFinite(an) || !Number.isFinite(bn)) return false;
|
|
|
|
|
|
if (an === bn) return true;
|
|
|
|
|
|
if (bn === 0) return Math.abs(an - bn) < 0.0001;
|
|
|
|
|
|
return Math.abs(an - bn) / Math.abs(bn) <= IMPERIAL_METRIC_TOLERANCE;
|
|
|
|
|
|
}
|
|
|
|
|
|
return String(a).trim().toLowerCase() === String(b).trim().toLowerCase();
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// ─── re-exports the route layer leans on ─────────────────────────────────────
|
|
|
|
|
|
|
|
|
|
|
|
export { parseBerthPdf } from './berth-pdf-parser';
|
|
|
|
|
|
export type {
|
|
|
|
|
|
ExtractedBerthFields,
|
|
|
|
|
|
ParsedField,
|
|
|
|
|
|
ParseResult,
|
|
|
|
|
|
ParserEngine,
|
|
|
|
|
|
} from './berth-pdf-parser';
|