Commit Graph

4 Commits

Author SHA1 Message Date
Matt Ciaccio
d4b3a1338f fix(security): scope berth-pdf service entrypoints by portId
Post-merge security review caught a cross-tenant authorization bypass
in the per-berth PDF endpoints (HIGH severity, confidence 10):

  GET    /api/v1/berths/[id]/pdf-versions
  POST   /api/v1/berths/[id]/pdf-versions
  POST   /api/v1/berths/[id]/pdf-upload-url
  POST   /api/v1/berths/[id]/pdf-versions/[versionId]/rollback
  POST   /api/v1/berths/[id]/pdf-versions/parse-results/apply

Each handler looked up the target berth by id only — `eq(berths.id, ...)`.
withAuth resolves ctx.portId from the user-controlled X-Port-Id header
(only verifying the user has SOME role on that port), and
withPermission('berths', 'view'|'edit', ...) is a coarse capability
check, not a row-level grant. A rep with berths:edit on Port A could
supply a Port B berth UUID and:
- list + receive 15-min presigned download URLs to every PDF version
- mint an upload URL targeting `berths/<port-B-id>/uploads/...`
- POST a new version (overwriting current_pdf_version_id on foreign berth)
- rollback to any prior version on a foreign berth
- apply rep-confirmed parse-result fields onto a foreign berth's columns

Sibling routes (waiting-list etc.) already pair the id filter with
`eq(berths.portId, ctx.portId)`, so this was an omission, not design.

Fix:
- Push `portId: string` into uploadBerthPdf, listBerthPdfVersions,
  rollbackToVersion, applyParseResults, reconcilePdfWithBerth.
- Each function now filters the berth lookup with
  `and(eq(berths.id, ...), eq(berths.portId, portId))` and throws
  NotFoundError on mismatch (no foreign-port disclosure).
- Inline the same `and(...)` filter in the pdf-upload-url handler.
- Every handler passes ctx.portId through.

Coverage:
- New `cross-port tenant guard` test exercises every entrypoint with a
  foreign-port id and asserts NotFoundError.
- 1164/1164 vitest passing. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:31:33 +02:00
Matt Ciaccio
014bbe1923 feat(expenses): streaming expense-PDF export + receipt-less expense flag + audit-3 fixes
Replaces the legacy text-only expense PDF (was just dumping rows into a
single pdfme text field — no images, no pagination) with a proper
streaming export modelled on the legacy Nuxt client-portal but
re-architected for memory safety. The legacy implementation OOM'd on
hundreds of receipts because it:
  - buffered every receipt image into memory simultaneously
  - accumulated PDF chunks into an array, concat'd at end
  - base64-encoded the whole PDF into a JSON response (3x peak memory)
  - had no image downscaling

The new design:
  - `streamExpensePdf()` (src/lib/services/expense-pdf.service.ts):
    pdfkit pipes bytes directly to the HTTP response (no Buffer
    accumulation). Receipts are processed serially so peak heap is one
    image at a time. Sharp downscales any receipt > 500 KB or > 1500 px
    to JPEG q80 — typical 8 MB phone photo collapses to ~250 KB. For a
    500-receipt export, peak RSS stays under ~100 MB; legacy needed >2
    GB for the same input.
  - Pages: cover summary box (count, totals, currency equiv, optional
    processing fee), grouped expense table (groupBy=none|payer|category|
    date), one-page-per-receipt with header (establishment, amount,
    date, payer, category, file name) and full-bleed image.
  - Storage backend abstraction — receipts stream from
    `getStorageBackend().get(storageKey)`, works on MinIO/S3/filesystem.
  - Route: POST /api/v1/expenses/export/pdf streams binary
    application/pdf with cache-control:no-store. Validator caps
    expenseIds at 1000 to prevent runaway loops.

Receipt-less expense flow (per user request):
  - Schema: 0033 migration adds `expenses.no_receipt_acknowledged`
    boolean (default false).
  - Validator: createExpenseSchema requires either receiptFileIds OR
    noReceiptAcknowledged=true; the .refine() error message tells the
    rep exactly what to do. updateExpenseSchema is partial and skips
    the rule (existing rows can be edited without re-acknowledging).
  - PDF: receiptless expenses get an inline red "(no receipt)" tag in
    the establishment cell + a red footer warning in the summary box
    showing the count and at-risk amount.
  - The legacy parent-company reimbursement queue may refuse to pay
    receiptless expenses, so the warning is load-bearing for ops.

Audit-3 fixes piggy-backed:
  - 🔴 Tesseract OCR runtime now races a 30s timeout (CPU-bomb DoS
    protection — a crafted PDF rasterizing to high-res noise could
    pin the worker indefinitely).
  - 🟠 brochures.service.ts:listBrochures dropped a wasted query (the
    legacy single-brochure fast-path was discarding its result on the
    multi-brochure branch).
  - 🟠 berth-pdf.service.ts:listBerthPdfVersions now Promise.all's the
    presignDownload calls instead of awaiting each in a for-loop —
    20-version berths went from 20× round-trip to 1×.
  - 🟡 public berths route no longer logs the full `row` object on
    enum drift (was dumping price + amenity columns into ops logs).
  - 🟡 dropped the dead `void sql` import from public berths route.

Tests still 1163/1163. tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:38:32 +02:00
Matt Ciaccio
86372a857f fix(audit): post-review hardening across phases 0-7
15 of 17 findings from the consolidated audit (3 reviewer agents on
the previously-shipped phase commits). Remaining two are nice-to-have
follow-ups deferred.

Critical (data integrity / security):
- Public berths API: closed-deal junction rows no longer flip a berth
  to "Under Offer" - filter on `interests.outcome IS NULL` so won/
  lost/cancelled don't pollute public-map status. Both list +
  single-mooring routes.
- Recommender heat: cancelled outcomes now count as fall-throughs
  (SQL was `LIKE 'lost%'` which silently dropped them, leaving
  cancelled-only berths stuck in tier A).
- Filesystem presignDownload returns an absolute URL (origin from
  APP_URL) so emailed download links resolve from external mail
  clients.
- Magic-byte verification on the presigned-PUT path: both per-berth
  PDFs and brochures stream the first 5 bytes via the storage backend
  and reject + delete on `%PDF-` mismatch (was only enforced when the
  server saw the buffer; presign-PUT was wide open).
- Replay-protection TTL aligned to the token's own expiry (was a
  fixed 30 min, but send-out tokens live 24 h). Floor 60 s, ceiling
  25 days.
- Brochures unique partial index on (port_id) WHERE is_default=true
  + 0032 migration. Closes the read-then-write race in the create/
  update transactions.

Important:
- Recommender SQL: defense-in-depth `i.port_id = $portId` filter on
  the aggregates CTE.
- berth-pdf service: per-berth pg_advisory_xact_lock around the
  version-number SELECT + insert. Storage key is now UUID-based so
  concurrent uploads can't collide on blob paths. Replaces
  `nextVersionNumber` with the tx-bound variant.
- berth-pdf apply: rejects with ConflictError when parse_results
  contain a mooring-mismatch warning unless the caller passes
  `confirmMooringMismatch: true` (force-reconfirm gate was UI-only).
- Send-out body: HTML-escape brochure filename in the download-link
  fallback (XSS guard).
- parseDecimalWithUnit rejects negative numbers.
- listClients DISTINCT ON for primary contact resolution: bounds
  contact-row count to ~2 per client.

Defensive:
- verifyProxyToken rejects NaN/Infinity expiries via Number.isFinite.
- Replaced sql ANY() with inArray() in interest-berths.

Tests: 1145 -> 1163 passing.

Deferred: bulk-send rate limit (no bulk endpoint today), markdown
italic regex breaking links with asterisks (cosmetic).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:07:03 +02:00
Matt Ciaccio
249ffe3e4a feat(berths): per-berth PDF storage (versioned) + reverse parser
Phase 6b of the berth-recommender refactor (see
docs/berth-recommender-and-pdf-plan.md §3.2, §3.3, §4.7b, §11.1, §14.6).
Builds on the Phase 6a pluggable storage backend (commit 83693dd) — every
file write goes through `getStorageBackend()`; no direct minio imports.

Schema (migration 0030_berth_pdf_versions):
  - new table `berth_pdf_versions` with monotonic `version_number` per
    berth, `storage_key` (renamed convention from §4.7a), sha256, size,
    `download_url_expires_at` cache slot for §11.1 signed-URL throttling,
    and `parse_results` jsonb for the audit trail.
  - new column `berths.current_pdf_version_id` (deferred from Phase 0)
    with FK to `berth_pdf_versions(id)` ON DELETE SET NULL.
  - relations + types exported from `schema/berths.ts`.

3-tier reverse parser (`lib/services/berth-pdf-parser.ts`):
  1. AcroForm via pdf-lib — pulls named fields (`length_ft`,
     `mooring_number`, etc.) at confidence 1. Sample PDF has 0 such
     fields, so this is defensive coverage for future templates.
  2. OCR via Tesseract.js — positional/regex heuristics keyed off the
     §9.2 layout (Length/Width/Water Depth as `<imperial> / <metric>`,
     `WEEK HIGH / LOW`, `CONFIRMED THROUGH UNTIL <date>`, etc.). Returns
     per-field confidence + global mean; flags imperial-vs-metric drift
     >1% in `warnings`.
  3. AI fallback — gated via `getResolvedOcrConfig()` (existing
     openai/claude provider). Surfaced from the diff dialog only when
     `shouldOfferAiTier()` returns true (mean OCR confidence below
     0.55 threshold), so OPENAI_API_KEY isn't burned on every upload.

Service layer (`lib/services/berth-pdf.service.ts`):
  - `uploadBerthPdf()` — magic-byte check, size cap, version-number
    bump + current pointer in one transaction.
  - `reconcilePdfWithBerth()` — auto-applies fields where CRM is null;
    flags conflicts when CRM and PDF disagree; tolerates ±1% on numeric
    columns; warns on mooring-number-in-PDF mismatch (§14.6).
  - `applyParseResults()` — hard allowlist of writable columns;
    stamps `appliedFields` onto `parse_results` for audit.
  - `rollbackToVersion()` — pointer flip only, never re-parses (§14.6).
  - `listBerthPdfVersions()` — version list with 15-min signed URLs.
  - `getMaxUploadMb()` — port-override → global → default 15 lookup
    on `system_settings.berth_pdf_max_upload_mb`.

§14.6 critical mitigations:
  - Magic-byte check (`%PDF-`) on every upload; mismatch deletes the
    storage object and rejects the request.
  - Size cap from `system_settings.berth_pdf_max_upload_mb` (default
    15 MB); enforced in the upload-url presign AND server-side.
  - 0-byte uploads rejected.
  - Mooring-number mismatch surfaces as a `warnings[]` entry on the
    reconcile result so the rep sees it in the diff dialog.
  - Imperial vs metric ±1% tolerance in both the parser warnings and
    the reconcile equality check.
  - Path traversal already blocked at the storage layer (Phase 6a).

API + UI:
  - `POST /api/v1/berths/[id]/pdf-upload-url` — presigned URL (S3) or
    HMAC-signed proxy URL (filesystem) sized to the per-port cap.
  - `POST /api/v1/berths/[id]/pdf-versions` — verifies the upload via
    `backend.head()`, writes the row, bumps `current_pdf_version_id`.
  - `GET /api/v1/berths/[id]/pdf-versions` — version list + signed URLs.
  - `POST /api/v1/berths/[id]/pdf-versions/[versionId]/rollback`.
  - `POST /api/v1/berths/[id]/pdf-versions/parse-results/apply` —
    rep-confirmed diff payload.
  - New "Documents" tab on the berth detail page (`berth-tabs.tsx`)
    with current-PDF panel, version history, Replace PDF button, and
    `<PdfReconcileDialog>` for the auto-applied + conflicts UX.

System settings:
  - `berth_pdf_max_upload_mb` (default 15) — caps presigned-upload size
    + server-side validation. Resolved port-override → global → default.

Tests:
  - `tests/unit/services/berth-pdf-parser.test.ts` — magic bytes,
    feet-inches, human dates, full §9.2-shaped OCR text → 18 fields,
    drift warning, AI-tier gate.
  - `tests/unit/services/berth-pdf-acroform.test.ts` — synthetic
    pdf-lib AcroForm round-trip.
  - `tests/integration/berth-pdf-versions.test.ts` — upload, version-
    number bump, magic-byte rejection, reconcile auto-applied vs
    conflicts vs ±1% tolerance, mooring-number warning,
    applyParseResults allowlist enforcement, rollback semantics.

Acceptance: `pnpm exec tsc --noEmit` clean, `pnpm exec vitest run`
green at 1103/1103.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 03:34:24 +02:00