audit: Tier 0 quick wins — EMAIL_REDIRECT_TO prod guard + storage routing + metadata masking

Tier 0.2: src/lib/env.ts now refuses boot when NODE_ENV=production AND
EMAIL_REDIRECT_TO is set. Sendmail logs the rewrite at warn (was debug)
so dev/staging windows where someone forgets to unset are immediately
visible.

Tier 0.6: backup_jobs.storage_path added to TABLES_WITH_STORAGE_KEYS in
src/lib/storage/migrate.ts. Flipping the storage backend used to
silently orphan every pg_dump artefact — last-resort recovery path is
now actually portable.

Tier 1.7: createAuditLog now runs metadata through maskSensitiveFields
(was only applied to old/new value diffs). Portal-auth, crm-invite,
hard-delete and email-accounts services were writing raw emails into
this column unbounded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-12 17:02:10 +02:00
parent a7b72801be
commit 0baca41693
13 changed files with 297 additions and 249 deletions

View File

@@ -1660,7 +1660,7 @@ primary berth, refreshing".
`user.email` directly on the Better-Auth `user` row but never deletes the
target user's existing sessions. Concurrent sessions of the affected user
keep working under the new email (because Better-Auth indexes sessions by
user_id, not email) — that's fine. **But the _previous_ email is now free**
user*id, not email) — that's fine. \*\*But the \_previous* email is now free\*\*
to be claimed by a fresh signup before the admin sends the "your email was
changed" notice. There's no unique constraint that prevents an attacker
from re-registering as `old@example.com` and taking over outgoing identity
@@ -1688,13 +1688,13 @@ Fix: `CREATE UNIQUE INDEX user_email_changes_one_pending ON user_email_changes
### H-6. Email-confirm token isn't atomically consumed
`src/app/api/v1/me/email/confirm/[token]/route.ts:28-57`. Three separate
statements: SELECT pending, UPDATE user, UPDATE pending.applied_at. No
statements: SELECT pending, UPDATE user, UPDATE pending.applied*at. No
transaction wrapper. A user who double-clicks the email link (or a link
preview pre-fetcher like Outlook SafeLinks) fires two near-simultaneous
GETs. Both pass the `appliedAt IS NULL` check, both flip `user.email`
(idempotent — same value), both mark applied. Functional, but the second
audit-log entry is misleading. More importantly: if the second click
arrives 200ms later AND the user re-fired a _different_ change in between
arrives 200ms later AND the user re-fired a \_different* change in between
that the first click happened to apply, you've stomped state.
Fix: single transaction, `SELECT … FOR UPDATE` the pending row, branch on
@@ -5332,7 +5332,7 @@ Either (a) collect every user on the port and run the same `deepMerge` chain per
### H4. Cross-port email collisions are non-deterministic
`public/interests/route.ts:90-114`: when a client_contact with the same email exists on a _different_ port, the code creates a new client. But `tx.query.clientContacts.findFirst` returns "any matching row" with no `ORDER BY` — subsequent submissions may pick either port's row first. Net: same email used cross-port, then resubmitted to the original port, can spawn 2nd/3rd same-port clients. Fix: filter the lookup by joining to `clients.port_id`, or scope the contact lookup to clients owned by the target port from the start.
`public/interests/route.ts:90-114`: when a client*contact with the same email exists on a \_different* port, the code creates a new client. But `tx.query.clientContacts.findFirst` returns "any matching row" with no `ORDER BY` — subsequent submissions may pick either port's row first. Net: same email used cross-port, then resubmitted to the original port, can spawn 2nd/3rd same-port clients. Fix: filter the lookup by joining to `clients.port_id`, or scope the contact lookup to clients owned by the target port from the start.
### H5. `portName` hardcoded as `'Port Nimara'` in four call sites
@@ -6099,10 +6099,10 @@ of unrelated context.
### C3. `src/lib/services/notes.service.ts` — 1121 lines, near-pure duplication
6 entity-type branches per operation (clients / interests / yachts / companies
/ residential_clients / residential_interests). The `create` function alone
/ residential*clients / residential_interests). The `create` function alone
(lines 689846) is 158 lines of 6 copy-pasted insert-then-profile-lookup
blocks; same for `update` (8471019) and `deleteNote` (1020+). A
`tableForEntity()` dispatcher is _defined_ at line 82 then immediately silenced
`tableForEntity()` dispatcher is \_defined* at line 82 then immediately silenced
(`void tableForEntity;` line 98) — i.e. the abstraction was started, abandoned,
and the dead helper left in place. Aggregated listers (`listForClient/Yacht/
Company/ResidentialClientAggregated`) are 4 near-identical 100-line bodies.
@@ -6404,10 +6404,12 @@ Every cross-port-aware code path re-derives "which ports can this user touch?" f
## 33. S3 vs internal DB pathing + storage routing audit (storage-pathing-auditor)
# Audit — S3 vs Internal DB Pathing + Storage Routing
Scope: `src/lib/storage/*`, every `getStorageBackend()` consumer, migration script, magic-byte enforcement, encryption-at-rest boundary.
Date: 2026-05-12
## Boundary summary (what lives where)
- **In DB (Postgres):** file metadata only — `files.storage_path`, `berth_pdf_versions.storage_key`, `brochure_versions.storage_key`, `gdpr_exports.storage_key`, `backup_jobs.storage_path`, user-avatar FK (`user_profiles.avatar_file_id` → `files`), document signing state (`documents.signed_file_id`). AES-256-GCM-encrypted **secrets**: `system_settings.storage_s3_secret_key_encrypted`, `storage_proxy_hmac_secret_encrypted`, `email_accounts.credentials_enc`, `webhooks.secret`, `ocr_config.api_key_encrypted`. No BYTEA / JSONB blobs found (`grep BYTEA → 0`).
- **In backend (S3/filesystem):** every uploaded blob — signed PDFs (`buildStoragePath(slug,'eoi-signed',…)`), per-berth PDFs (`berths/{id}/…`), brochures, avatars, GDPR exports, pg_dump backups, expense receipts, generated reports, template source PDFs, send-out attachment fallbacks.
- **Routing:** `getStorageBackend()` reads global `system_settings.storage_backend` ('s3'|'filesystem'), caches by config fingerprint, invalidated via `resetStorageBackendCache()` on settings write or migration flip. Code never imports `minio/Client` outside `s3.ts` (verified — only legacy `buildStoragePath` helper survives in `src/lib/minio/index.ts`). Interface methods: put/get/head/delete/listByPrefix/presignUpload/presignDownload — both backends implement all 7.
@@ -6415,62 +6417,79 @@ Date: 2026-05-12
## CRITICAL
### C1. `backup_jobs.storage_path` missing from `TABLES_WITH_STORAGE_KEYS` — silent backup loss on backend swap
`src/lib/storage/migrate.ts:55-60` lists only `files`, `berth_pdf_versions`, `brochure_versions`, `gdpr_exports`. `backup_jobs.storage_path` (pg_dump artefacts written by `src/lib/services/backup.service.ts:54+72`) is **not** in the list. Flipping S3 → filesystem (or vice-versa) leaves every historical database backup pointing at the old backend — `getBackupDownloadUrl(id)` will 404 / NoSuchKey, and the admin won't know until they try to restore. This is the worst category of data loss because backups are the recovery path of last resort. The comment in `migrate.ts:51` calls out `report_snapshots` as a future addition but mentions nothing about `backup_jobs`. **Add `{ table: 'backup_jobs', keyColumn: 'storage_path', pkColumn: 'id' }` and ship the line with a smoke test.**
### C2. Orphan-blob risk: every `backend.put` runs outside the `db.insert(files)` transaction
Pattern repeated across 9+ services (`files.ts:68-92`, `documents.service.ts:833-854` and `1134-1183`, `external-eoi.service.ts:71-96` — comment at L67-70 explicitly acknowledges "orphan reaper handles those" but **no reaper exists**, `invoices.ts:603`, `document-templates.ts:537,674`, `reports.service.ts:231`, `gdpr-export.service.ts:169`, `backup.service.ts:62`, `berth-pdf.service.ts:229`). Sequence is: PUT bytes → DB INSERT. If insert fails or the process dies in between, the blob is permanent and unreferenced. Only `handleDocumentCompleted` (`documents.service.ts:1110`) has an early-return idempotency gate; the rest leak. Over months of operation an S3 prefix accumulates dozens-to-hundreds of orphans that pay storage cost forever and survive every backup-restore. **Add an orphan-reaper maintenance job** that walks `listByPrefix()` against the union of all `storage_*` columns and deletes blobs older than 24 h without a DB pointer. Also wrap the `put + insert` pairs in a try/catch that explicitly deletes on insert failure (cheap defense in depth).
### C3. S3 backend stores blobs without server-side encryption (SSE-S3 / SSE-KMS)
`S3Backend.put()` (`src/lib/storage/s3.ts:191`) passes only `Content-Type` to `client.putObject`. No `x-amz-server-side-encryption` header. Bytes-at-rest encryption depends entirely on the bucket's default-encryption policy, which is invisible to the application — a customer who provisions a MinIO/B2/R2 bucket without server-side encryption gets cleartext signed contracts, GDPR exports, and `pg_dump` archives sitting on disk. Same audit posture as SMTP/IMAP creds (which **are** AES-GCM in the DB) demands the same guarantee for the blob plane. **Either add `ServerSideEncryption: 'AES256'` to every `putObject` call, or surface a boot-time check that asserts the bucket has default-encryption enabled and refuses to start otherwise** (similar to the `MULTI_NODE_DEPLOYMENT` guard on FilesystemBackend).
## HIGH
### H1. Berth-PDF presigned-upload keys are not port-scoped
`src/app/api/v1/berths/[id]/pdf-upload-url/handlers.ts:58` builds `berths/{berthId}/uploads/{uuid}_{name}` — no leading `${portSlug}/`. Result: the optional port-binding (`p` field on the HMAC token, enforced in `filesystem.ts:184-188` and documented in `index.ts:43-49`) cannot be wired here, and the storage-key namespacing convention diverges from `buildStoragePath` (which always prefixes the port slug). Tenant isolation today relies on the up-front `berths.portId === ctx.portId` check before mint, but the defense-in-depth port-binding is unwired. **Normalise the key to `${portSlug}/berths/...` and pass `portSlug` into `backend.presignUpload`.**
### H2. `presignDownload` callers never pass `portSlug` — port-binding token guard is dead code
`presignDownloadUrl(...)` (`storage/index.ts:233`) accepts `portSlug` and only 1 of ~12 callers uses it. `files.ts:117,128`, `backup.service.ts:115`, `portal.service.ts:351`, `reports.service.ts:170`, `gdpr-export.service.ts:224,282` all pass undefined. The filesystem-proxy GET will therefore accept any valid HMAC token regardless of the storage-key's port prefix. The check is genuinely defensible (see `filesystem.ts:179`) but never engaged. **Plumb the active port slug through every call site, or remove the optional `p` field and the verifier code so the contract isn't misleading.**
### H3. `S3Backend.put` and `backup.service` buffer entire blobs into memory
`s3.ts:187` (`Buffer.isBuffer(body) ? body : await streamToBuffer(body)`) and `backup.service.ts:60-62` (concatenates the entire pg_dump dump into memory before put). For a multi-GB database dump the worker OOMs. Comment at `s3.ts:184-187` explicitly says "typical files are under 50MB" but `runPgDump` writes a dump file whose size scales with the tenant. **Use `client.fPutObject` (file-path streaming) for backups; for streamable callers expose a `putStream(key, stream, sizeBytes, opts)` overload that pipes without `streamToBuffer`.**
### H4. Migrator's `copyAndVerify` double-buffers every blob and has no streaming hash
`storage/migrate.ts:170-204` reads source → Buffer, sha256, put, then re-reads target → Buffer, sha256 again. For a 5 GB pg_dump (see C1 — once added) this allocates ~10 GB of heap. The sha256-verify round-trip is the right idea; **pipe through `crypto.createHash` on both legs**, never buffer.
### H5. `S3Backend.presignUpload` lacks content-type / content-length binding
`s3.ts:249-256` only calls `presignedPutObject(bucket, key, expiry)`. The signed URL does not bind `Content-Type` or `Content-Length` — a browser can PUT 1 GB of arbitrary bytes against an EOI-signed key. Caps and magic-byte checks fire only on the **register** call afterwards (`registerBrochureVersion` and `uploadBerthPdf` HEAD-then-stream-first-5-bytes path). That's sufficient for the two consumers today, but the gate is one-deep — a future caller that forgets to wire a register endpoint exposes raw S3 directly. **Switch to MinIO `presignedPostPolicy` with `content-length-range` + `Content-Type` conditions so the binding is on the signature itself.**
## MEDIUM
### M1. CLAUDE.md drift on "TABLES_WITH_STORAGE_KEYS populated in 9a5ba87"
CLAUDE.md says the migrator covers "every blob in `files`, `berth_pdf_versions`, `brochure_versions`, `gdpr_exports`". Verified true — but **backup_jobs is the missing 5th** (see C1). Update the doc + add a unit test that asserts the array matches the set of tables with a `storage_*` column.
### M2. `email-compose.service.ts:124` reads attachment bytes into a Buffer
Each attachment under the `email_attach_threshold_mb` cap is fetched via `storage.get(...)` and concatenated. With multiple recipients × multiple attachments the worker holds N × size MB simultaneously. **Stream into `nodemailer`'s `content: <Readable>` API directly.**
### M3. UUID storage keys never check existence before put (no `If-None-Match: "*"`)
`crypto.randomUUID()` collision is astronomical, but a buggy caller passing a duplicate key (or a re-run of a worker after a partial DB rollback) silently overwrites. **Cheap belt: pass `If-None-Match: '*'` (S3) or `O_EXCL` (filesystem) — surfaces double-writes loudly.**
### M4. Per-port S3 routing not possible / `listByPrefix` unbounded
Storage config rows are global (`portId IS NULL`). Multi-tenant can't direct port-A vs port-B to separate buckets / KMS keys. `listByPrefix` returns every key in one array — script-only today but a footgun if called with empty prefix in production. **Document the global-config assumption; add a cursor variant before any per-port-bucket customer lands.**
### M5. `storage_filesystem_root` change invalidates outstanding HMAC tokens silently
Cache swaps, but tokens minted under the old root still verify HMAC; `resolveKeyForProxy` then 404s under the new root. Customer download links emailed an hour earlier break with no warning. **Either refuse runtime root changes, or warn in admin UI.**
### M6. Avatar URLs re-presign every 15 min — browser cache broken
No CDN / `s-maxage` fronts hot reads. Per-page avatar GET burns a presign + S3 round-trip. **Issue 24 h URLs for `category='avatar'`, or front with the Next.js Image route.**
### M7. Verified clean
- `withTimeout(...)` wraps every minio call (s3.ts L143/150/190/203/219/237/285/292/300); `system-monitoring.service.ts:153` adds its own 5 s deadline. **No bare minio calls escape.** ✓
- `MULTI_NODE_DEPLOYMENT` guard reads `env.MULTI_NODE_DEPLOYMENT` (zod-coerced, `env.ts:80`), test at `filesystem-backend.test.ts:139`. ✓
### M8. Magic-byte enforcement
- **In-server uploads:** `files.ts:58` (`bufferMatchesMime`), `berth-pdf.service.ts:218` (`isPdfMagic`). ✓
- **Presigned-PUT post-upload register:** `brochures.service.ts:258` (first-5-byte stream + `%PDF-`), `berth-pdf.service.ts:259` (`readFirstBytes` + `isPdfMagic`). ✓
- **Filesystem proxy PUT:** inline check `route.ts:220` when token's `c=application/pdf`. ✓
- **S3 direct PUT:** no inline check (relies on the register endpoint). Acceptable per CLAUDE.md, but document divergence: a future S3 consumer that forgets to call register leaks the gate.
## Verified-clean (informational)
- No BYTEA / binary-JSONB blob columns. ✓
- Single canonical key format mismatch (`storage_path` vs `storage_key`) is documented + handled by per-table column mapping. ✓
- `validateStorageKey` rejects traversal, absolute paths, dotfiles, and >1024 chars. ✓
@@ -6480,6 +6499,7 @@ No CDN / `s-maxage` fronts hot reads. Per-page avatar GET burns a presign + S3 r
- All blob keys are UUID-namespaced — collision-safe, not deterministic-audit-style. ✓
## Recommended ordering
1. **C1** (one-line fix + smoke test) before any backend migration ships.
2. **C2** orphan reaper — cron job behind `maintenance` worker.
3. **C3** SSE-S3 — single-line putObject change + bucket-policy assertion at boot.