feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.
Schema (migration 0030_berth_pdf_versions):
- new table `berth_pdf_versions` with monotonic `version_number` per
berth, `storage_key` (renamed convention from §4.7a), sha256, size,
`download_url_expires_at` cache slot for §11.1 signed-URL throttling,
and `parse_results` jsonb for the audit trail.
- new column `berths.current_pdf_version_id` (deferred from Phase 0)
with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
- relations + types exported from `schema/berths.ts`.
3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
`mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
fields, so this is defensive coverage for future templates.
2. OCR via Tesseract.js — positional/regex heuristics keyed off the
§9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
`WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
per-field confidence + global mean; flags imperial-vs-metric drift
>1% in `warnings`.
3. AI fallback — gated via `getResolvedOcrConfig()` (existing
openai/claude provider). Surfaced from the diff dialog only when
`shouldOfferAiTier()` returns true (mean OCR confidence below
0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.
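The tier-3 gate can be sketched as follows. This is a standalone simplification, not the actual module: the real `shouldOfferAiTier()` lives in `lib/services/berth-pdf-parser.ts`, and the field/type names here are illustrative.

```typescript
// Illustrative sketch of the AI-tier gate described above.
interface OcrFieldResult {
  value: string;
  confidence: number; // 0..1 per-field confidence from the OCR heuristics
}

const AI_TIER_THRESHOLD = 0.55; // mean-confidence floor from this commit

function meanConfidence(fields: Record<string, OcrFieldResult>): number {
  const scores = Object.values(fields).map((f) => f.confidence);
  if (scores.length === 0) return 0; // nothing extracted: definitely offer AI
  return scores.reduce((a, b) => a + b, 0) / scores.length;
}

// Only surface the AI tier when OCR did poorly, so the API key
// isn't burned on every upload.
function shouldOfferAiTier(fields: Record<string, OcrFieldResult>): boolean {
  return meanConfidence(fields) < AI_TIER_THRESHOLD;
}
```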
Service layer (`lib/services/berth-pdf.service.ts`):
- `uploadBerthPdf()` — magic-byte check, size cap, version-number
bump + current pointer in one transaction.
- `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
columns; warns on mooring-number-in-PDF mismatch (§14.6).
- `applyParseResults()` — hard allowlist of writable columns;
stamps `appliedFields` onto `parse_results` for audit.
- `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
- `listBerthPdfVersions()` — version list with 15-min signed URLs.
- `getMaxUploadMb()` — port-override → global → default 15 lookup
on `system_settings.berth_pdf_max_upload_mb`.
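The reconcile decision per numeric field can be sketched as follows. This is a standalone simplification under assumed names; the real comparison lives inside `reconcilePdfWithBerth()`.

```typescript
// Illustrative sketch of the +/-1% numeric tolerance described above.
function withinOnePercent(crmValue: number, pdfValue: number): boolean {
  if (crmValue === pdfValue) return true; // also covers 0 === 0
  const reference = Math.max(Math.abs(crmValue), Math.abs(pdfValue));
  return Math.abs(crmValue - pdfValue) / reference <= 0.01;
}

type FieldDecision = 'auto-apply' | 'conflict' | 'keep';

// CRM null  => auto-apply the PDF value.
// Agreement within +/-1% => keep the CRM value.
// Otherwise  => flag a conflict for the rep in the diff dialog.
function decideNumericField(crm: number | null, pdf: number): FieldDecision {
  if (crm === null) return 'auto-apply';
  return withinOnePercent(crm, pdf) ? 'keep' : 'conflict';
}
```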
§14.6 critical mitigations:
- Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
storage object and rejects the request.
- Size cap from `system_settings.berth_pdf_max_upload_mb` (default
15 MB); enforced in the upload-url presign AND server-side.
- 0-byte uploads rejected.
- Mooring-number mismatch surfaces as a `warnings[]` entry on the
reconcile result so the rep sees it in the diff dialog.
- Imperial vs metric ±1% tolerance in both the parser warnings and
the reconcile equality check.
- Path traversal already blocked at the storage layer (Phase 6a).
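The magic-byte check amounts to verifying the 5 ASCII bytes `%PDF-` at offset 0. A minimal sketch (function name illustrative; the real check runs in `uploadBerthPdf()` before any row is written):

```typescript
// Illustrative sketch of the magic-byte check described above.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

function hasPdfMagic(bytes: Uint8Array): boolean {
  // A short buffer can't contain the header; this also rejects 0-byte uploads.
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((expected, i) => bytes[i] === expected);
}
```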
API + UI:
- `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
HMAC-signed proxy URL (filesystem) sized to the per-port cap.
- `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
`backend.head()`, writes the row, bumps `current_pdf_version_id`.
- `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
- `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
- `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
rep-confirmed diff payload.
- New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
with current-PDF panel, version history, Replace PDF button, and
`<PdfReconcileDialog>` for the auto-applied + conflicts UX.
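The filesystem-backend proxy URL is HMAC-signed so it can't be forged or replayed after expiry. A minimal sketch under assumed names (the real signing lives in the pdf-upload-url route; query-param names and secret handling are illustrative):

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Illustrative sketch of an HMAC-signed, expiring proxy URL.
function signUploadUrl(path: string, expiresAtMs: number, secret: string): string {
  const sig = createHmac('sha256', secret).update(`${path}:${expiresAtMs}`).digest('hex');
  return `${path}?expires=${expiresAtMs}&sig=${sig}`;
}

function verifyUploadUrl(
  path: string,
  expiresAtMs: number,
  sig: string,
  secret: string,
  nowMs: number,
): boolean {
  if (nowMs > expiresAtMs) return false; // expired
  const expected = createHmac('sha256', secret).update(`${path}:${expiresAtMs}`).digest();
  const given = Buffer.from(sig, 'hex');
  // Length guard first: timingSafeEqual throws on unequal lengths.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```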
System settings:
- `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
+ server-side validation. Resolved port-override → global → default.
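The resolution order reduces to a nullish-coalescing chain. A sketch (parameter names illustrative; the real lookup in `getMaxUploadMb()` reads `system_settings`):

```typescript
// Illustrative sketch of the port-override -> global -> default resolution.
const DEFAULT_MAX_UPLOAD_MB = 15;

function resolveMaxUploadMb(
  portOverrideMb: number | null,
  globalMb: number | null,
): number {
  return portOverrideMb ?? globalMb ?? DEFAULT_MAX_UPLOAD_MB;
}
```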
Tests:
- `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
drift warning, AI-tier gate.
- `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
pdf-lib AcroForm round-trip.
- `tests/integration/berth-pdf-versions.test.ts` — upload, version-
number bump, magic-byte rejection, reconcile auto-applied vs
conflicts vs ±1% tolerance, mooring-number warning,
applyParseResults allowlist enforcement, rollback semantics.
Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
src/lib/db/migrations/0030_berth_pdf_versions.sql (new file, 24 lines)
@@ -0,0 +1,24 @@
CREATE TABLE "berth_pdf_versions" (
	"id" text PRIMARY KEY NOT NULL,
	"berth_id" text NOT NULL,
	"version_number" integer NOT NULL,
	"storage_key" text NOT NULL,
	"file_name" text NOT NULL,
	"file_size_bytes" integer NOT NULL,
	"content_sha256" text NOT NULL,
	"uploaded_by" text NOT NULL,
	"uploaded_at" timestamp with time zone DEFAULT now() NOT NULL,
	"download_url_expires_at" timestamp with time zone,
	"parse_results" jsonb
);
--> statement-breakpoint
ALTER TABLE "berths" ADD COLUMN "current_pdf_version_id" text;--> statement-breakpoint
ALTER TABLE "berth_pdf_versions" ADD CONSTRAINT "berth_pdf_versions_berth_id_berths_id_fk" FOREIGN KEY ("berth_id") REFERENCES "public"."berths"("id") ON DELETE cascade ON UPDATE no action;--> statement-breakpoint
CREATE UNIQUE INDEX "berth_pdf_versions_berth_version_idx" ON "berth_pdf_versions" USING btree ("berth_id","version_number");--> statement-breakpoint
CREATE INDEX "idx_bpv_berth" ON "berth_pdf_versions" USING btree ("berth_id","uploaded_at");--> statement-breakpoint
-- berths.current_pdf_version_id -> berth_pdf_versions.id (added after both tables
-- exist to break the circular FK declaration; ON DELETE SET NULL so deleting the
-- pointed-at row keeps the berth and just clears the pointer).
ALTER TABLE "berths" ADD CONSTRAINT "berths_current_pdf_version_id_fk"
	FOREIGN KEY ("current_pdf_version_id") REFERENCES "public"."berth_pdf_versions"("id")
	ON DELETE SET NULL ON UPDATE NO ACTION;
src/lib/db/migrations/meta/0030_snapshot.json (new file, 11010 lines; diff suppressed because it is too large)
@@ -211,6 +211,13 @@
      "when": 1777941465866,
      "tag": "0029_puzzling_romulus",
      "breakpoints": true
    },
    {
      "idx": 30,
      "version": "7",
      "when": 1777944021221,
      "tag": "0030_berth_pdf_versions",
      "breakpoints": true
    }
  ]
}
@@ -76,6 +76,11 @@ export const berths = pgTable(
    // against updated_at to detect human edits made after the last import,
    // so re-running the import doesn't clobber CRM-side overrides.
    lastImportedAt: timestamp('last_imported_at', { withTimezone: true }),
    // Pointer to the active per-berth PDF version (Phase 6b). Null until a
    // rep uploads the first PDF; a later rollback can re-target this column
    // to any prior `berth_pdf_versions.id`. The full history lives in
    // `berth_pdf_versions` — this column is just the "current" pointer.
    currentPdfVersionId: text('current_pdf_version_id'),
    createdAt: timestamp('created_at', { withTimezone: true }).notNull().defaultNow(),
    updatedAt: timestamp('updated_at', { withTimezone: true }).notNull().defaultNow(),
  },
@@ -181,6 +186,46 @@ export const berthMaintenanceLog = pgTable(
  (table) => [index('idx_bml_berth').on(table.berthId), index('idx_bml_port').on(table.portId)],
);

/**
 * Per-berth PDF version history (Phase 6b — see plan §3.3 / §4.7b).
 *
 * Each upload creates a new row with a monotonic `versionNumber` per berth.
 * The active version is referenced by `berths.current_pdf_version_id`. The
 * storage_key points at the file in the active `StorageBackend` (s3/filesystem),
 * which is resolved at access time via `getStorageBackend()`.
 *
 * `parseResults` captures what the 3-tier reverse parser extracted at upload
 * time plus any conflicts the rep resolved in the diff dialog. Kept as audit
 * trail; rolling back to a prior version does NOT replay these (per §14.6).
 */
export const berthPdfVersions = pgTable(
  'berth_pdf_versions',
  {
    id: text('id')
      .primaryKey()
      .$defaultFn(() => crypto.randomUUID()),
    berthId: text('berth_id')
      .notNull()
      .references(() => berths.id, { onDelete: 'cascade' }),
    versionNumber: integer('version_number').notNull(),
    /** Object key in the active storage backend (renamed from `s3_key` per §4.7a). */
    storageKey: text('storage_key').notNull(),
    fileName: text('file_name').notNull(),
    fileSizeBytes: integer('file_size_bytes').notNull(),
    contentSha256: text('content_sha256').notNull(),
    uploadedBy: text('uploaded_by').notNull(),
    uploadedAt: timestamp('uploaded_at', { withTimezone: true }).notNull().defaultNow(),
    /** Cached signed-URL expiry per §11.1 — re-sign only when within 1h of expiry. */
    downloadUrlExpiresAt: timestamp('download_url_expires_at', { withTimezone: true }),
    /** { engine: 'acroform'|'ocr'|'ai', extracted: {...}, conflicts: [...], appliedFields: [...] } */
    parseResults: jsonb('parse_results'),
  },
  (table) => [
    uniqueIndex('berth_pdf_versions_berth_version_idx').on(table.berthId, table.versionNumber),
    index('idx_bpv_berth').on(table.berthId, table.uploadedAt),
  ],
);

export const berthTags = pgTable(
  'berth_tags',
  {
@@ -202,3 +247,5 @@ export type BerthWaitingList = typeof berthWaitingList.$inferSelect;
export type NewBerthWaitingList = typeof berthWaitingList.$inferInsert;
export type BerthMaintenanceLog = typeof berthMaintenanceLog.$inferSelect;
export type NewBerthMaintenanceLog = typeof berthMaintenanceLog.$inferInsert;
export type BerthPdfVersion = typeof berthPdfVersions.$inferSelect;
export type NewBerthPdfVersion = typeof berthPdfVersions.$inferInsert;
@@ -40,6 +40,7 @@ import {
  berthWaitingList,
  berthMaintenanceLog,
  berthTags,
  berthPdfVersions,
} from './berths';

// Reservations
@@ -411,6 +412,19 @@ export const berthsRelations = relations(berths, ({ one, many }) => ({
  tags: many(berthTags),
  interestBerths: many(interestBerths),
  reminders: many(reminders),
  pdfVersions: many(berthPdfVersions),
  currentPdfVersion: one(berthPdfVersions, {
    fields: [berths.currentPdfVersionId],
    references: [berthPdfVersions.id],
    relationName: 'berthCurrentPdfVersion',
  }),
}));

export const berthPdfVersionsRelations = relations(berthPdfVersions, ({ one }) => ({
  berth: one(berths, {
    fields: [berthPdfVersions.berthId],
    references: [berths.id],
  }),
}));

export const berthMapDataRelations = relations(berthMapData, ({ one }) => ({