pn-new-crm/docs/AUDIT-2026-05-12.md
Matt d3960af340 feat: warm-up deps — ts-reset, web-vitals, RHF devtool, query-broadcast
Four low-risk adds before the Zod 4 / drizzle-zod headliner:

- @total-typescript/ts-reset: tightens TS stdlib types globally (JSON.parse
  → unknown, fetch().json() → unknown, .filter(Boolean) narrows, Set
  literals respect typed Set targets). Caught 179 latent type errors;
  fixed all production sites (8 files) and added `any` cast escape hatch
  in test files (ESLint exemption scoped to tests/).
- web-vitals + /api/v1/internal/vitals endpoint + WebVitalsReporter
  client component: establishes Core Web Vitals baseline (LCP/INP/CLS/
  FCP/TTFB) via navigator.sendBeacon. Required before optimisation work.
- @hookform/devtools + FormDevtool wrapper: dev-only RHF state inspector,
  lazy-loaded via next/dynamic so the chunk is excluded from prod
  bundles entirely.
- @tanstack/query-broadcast-client-experimental: cross-tab cache sync
  via BroadcastChannel — wired in query-provider.tsx, 1-liner.

Audit doc updated with sections 35 + 36 (PDF stack overhaul + comprehensive
second-pass package sweep) covering ~20 package adoption candidates and
4-5 deprecation candidates.

Verified: tsc clean, vitest 1293/1293 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 18:16:18 +02:00


Port Nimara CRM — Comprehensive Platform Audit

Generated: 2026-05-12 (session run)
Branch: feat/documents-folders
Method: 19 parallel audit agents on Claude Opus 4.7, read-only static analysis. Each agent owned a single domain and wrote a CRITICAL/HIGH/MEDIUM-grouped report. This document consolidates the reports and overlays the fixes already shipped during the session.


How to read this document

  1. Executive summary lists every CRITICAL finding (must address before production), per domain.
  2. Already fixed in this session is a manifest of the changes I shipped while the audit was running. Don't re-fix these.
  3. Cross-cutting priority queue is the top ~15 highest-impact findings across the entire codebase, ordered. Tackle these first.
  4. Per-domain reports below contain the full text of every agent's report verbatim — useful when you sit down to actually fix a specific area.
  5. Methodology + agent roster appendix at the bottom lists who looked at what.

Severity is the auditor's judgment, not mine — I have not re-graded findings. Treat anything tagged CRITICAL as a real block on shipping.


Executive summary

CRITICAL findings (must address)

| # | Domain | File | Issue | Status |
|---|---|---|---|---|
| 1 | Security | src/app/api/v1/admin/users/[id]/permission-overrides/route.ts | Admins could grant themselves every permission leaf via self-target | FIXED this session |
| 2 | Security | src/app/api/auth/resolve-identifier/route.ts | Username enumeration via hit/miss response shape + no rate limit | FIXED this session |
| 3 | Services | src/lib/services/users.service.ts (admin email-change) | account.accountId not updated → user can't sign in with either old or new email after admin rotation; sessions also not revoked | FIXED this session |
| 4 | Observability | src/lib/services/search-nav-catalog.ts | 10 NAV_CATALOG entries pointed at routes that don't exist (/admin/audit-log, /admin/error-events, /user-settings, 7×/settings/<x>) | FIXED this session |
| 5 | Auth flow | src/middleware.ts | Token-gated email confirm/cancel routes blocked by session 401 | FIXED this session |
| 6 | Email | src/lib/env.ts + src/lib/email/index.ts | EMAIL_REDIRECT_TO has no NODE_ENV=production guard; a stray prod env value silently funnels every email to one inbox | Open |
| 7 | Email | every template | URL interpolations into href="…" and link text are unescaped; a " in any URL breaks out, no scheme rejection | Open |
| 8 | Data model | src/lib/db/migrations/0052_audit_critical_fixes.sql | CREATE INDEX CONCURRENTLY silently never runs because there's no real db:migrate runner; six composite indexes missing in prod | Open |
| 9 | Data model | db:push flow | Two structural constraints (berths.current_pdf_version_id circular FK, system_settings NULLS NOT DISTINCT) not in db:push; fresh-deploy diverges from prod | Open |
| 10 | Services | documents.service.ts: handleDocumentCompleted | Orphan-blob window: failure between storage.put and documents.update leaves the blob and marks status='completed' with no signedFileId | Open |
| 11 | GDPR | src/lib/services/gdpr-bundle-builder.ts | Article-15 export missing portal_users, email_threads/messages, document_sends, reminders, files, scratchpadNotes, client_merge_log, contact_log, website_submissions, form_submissions | Open |
| 12 | GDPR | src/lib/services/client-hard-delete.service.ts | "Right to be forgotten" doesn't actually erase: verbatim PII survives in email_messages.body_html, files, document_sends.recipient_email forever | Open |
| 13 | GDPR | src/app/api/auth/resolve-identifier/route.ts (post-fix) | Still echoes the real canonical email on a successful username hit (rate-limited but enumerable) | Partial (see Open follow-ups) |
| 14 | GDPR | audit_logs.metadata field | Not covered by maskSensitiveFields; raw PII (emails, IPs, names) accumulates unbounded with no retention cron | Open |
| 15 | Observability | src/app/api/webhooks/documenso/route.ts | Webhook handler bypasses the platform-error pipeline entirely; admin/errors silent on Documenso webhook crashes | Open |
| 16 | UI/UX | 16 sites use native window.confirm() | Bypasses ConfirmationDialog / AlertDialog for destructive flows (cancel signing, delete files, archive interest/company/yacht…) | Open |
| 17 | Documenso | documenso-client.ts v1↔v2 routing | (Pending full report) | In progress |
| 18 | Concurrency | (see report) | Various race windows on multi-rep edits + partial-unique-index inserts | Open |

HIGH-priority queue

Listed after CRITICALs in the priority queue section below.


Already fixed in this session

These changes are on the feat/documents-folders branch (post-commit 660553c and onward). Do not re-fix.

Security

  • Self-target privilege escalation block: src/app/api/v1/admin/users/[id]/permission-overrides/route.ts now refuses PUT when targetUserId === ctx.userId. Additionally, the body is now sanitised against a canonical ALLOWED_RESOURCE_ACTIONS allow-list mirroring RolePermissions, so unknown resource/action keys are stripped before write. Cross-tenant pollution check added (refuses overrides for users without a user_port_roles row in the caller's port).
  • Username enumeration kill: src/app/api/auth/resolve-identifier/route.ts now (a) shares the auth 5-per-15-min rate-limit bucket keyed by client IP, (b) returns a synthetic @auth.invalid email on miss so hit and miss are indistinguishable in shape. (Note: the GDPR auditor flagged that the hit path still echoes a real canonical email, an information leak that warrants a deeper redesign; see Open follow-ups.)
  • Email-change account/session rotation: src/lib/services/users.service.ts now also updates account.accountId for the credential provider (Better Auth's actual login key) AND revokes every active session row when an admin rotates a user's email. Previously the user could not sign in with either old or new email after rotation.
  • Middleware unblocks token-gated email routes: src/middleware.ts adds /api/v1/me/email/confirm/ and /api/v1/me/email/cancel/ to PUBLIC_PATHS so the confirm/cancel links work in a fresh browser without an existing session.

Search + navigation

  • NAV_CATALOG dead-link sweep: src/lib/services/search-nav-catalog.ts corrected 10 entries that pointed to non-existent routes. /admin/audit-log → /admin/audit, /admin/error-events → /admin/errors, /user-settings → /settings/profile, and the 7 phantom /settings/<x> entries redirected to their real /admin/<x> homes.
  • Topbar global search extended — every admin sub-card now indexed in NAV_CATALOG with curated keywords (client portal, ai scoring, pipeline weights, recommender heat weights, etc.). Results sort to the bottom of the cmd-K dropdown, beneath entity hits.
  • Admin sections page search: in src/components/admin/admin-sections-browser.tsx, AdminSection gained a keywords?: string[] field, populated for System Settings (mirrors KNOWN_SETTINGS), AI configuration, OCR, Users, and Website analytics. filteredMatches haystack now includes those keywords.

User management

  • Disable / enable button — third Power/PowerOff action button on the desktop user list + matching dropdown item on the mobile card. Backed by userProfiles.isActive (already enforced by withAuth → 403 on disabled accounts).
  • UserForm tabs + permissions matrix — UserForm now wraps Profile & role + Permissions in tabs. New UserPermissionMatrix component renders the full RolePermissions shape with three-state per-leaf toggle (Inherit / Grant / Deny). The matrix is role="radiogroup" + aria-checked per option, and shows an amber callout explaining that overrides save on their own button. Dirty-state tracked via originalOverrides comparison.
  • First/last name + admin email change — UserForm collects first + last name (canonical) alongside displayName. Email change behind an AlertDialog confirmation; on confirm sends an automated notice to the prior address (new template src/lib/email/templates/admin-email-change.ts).
  • Phone formatting — UserForm swaps the bare tel input for the shared PhoneInput (country combobox + AsYouType + E.164 storage).

Optional username sign-in

  • Migration 0054_user_profiles_username.sql adds username column (2..30 chars, regex ^[a-z0-9._-]{2,30}$, partial unique index on LOWER(username)).
  • Login page now accepts email OR username via /api/auth/resolve-identifier.
  • Self-service username card on src/components/settings/user-settings.tsx.
  • /api/v1/me PATCH now accepts username with allow-list + reserved-name check + uniqueness check before write.
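
The format and reserved-name checks described above could be combined into one validator, roughly as sketched here. The regex is the one quoted from migration 0054; the RESERVED_USERNAMES entries, the validateUsername name, and the choice to lowercase before checking are assumptions, since this document does not show the real implementation. The uniqueness check needs the database and is left out.

```typescript
// Regex quoted from migration 0054_user_profiles_username.sql.
const USERNAME_PATTERN = /^[a-z0-9._-]{2,30}$/;

// Hypothetical reserved list; the real one is not shown in this document.
const RESERVED_USERNAMES: ReadonlySet<string> = new Set(["admin", "root", "support"]);

type UsernameCheck =
  | { ok: true; username: string }
  | { ok: false; reason: "invalid_format" | "reserved" };

function validateUsername(input: string): UsernameCheck {
  // Assumption: normalise first so the check lines up with the LOWER(username) index.
  const username = input.trim().toLowerCase();
  if (!USERNAME_PATTERN.test(username)) return { ok: false, reason: "invalid_format" };
  if (RESERVED_USERNAMES.has(username)) return { ok: false, reason: "reserved" };
  return { ok: true, username };
}
```

A caller would run this before the uniqueness query, so malformed or reserved names never reach the database.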

Per-user permission overrides

  • Migration 0055_user_permission_overrides.sql adds the table.
  • Effective-permissions resolver in src/lib/api/helpers.ts now layers user overrides on top of role + port-role overrides + residential toggle.
  • GET / PUT /api/v1/admin/users/[id]/permission-overrides endpoints.

Role + enum normalization

  • formatRole() + ROLE_LABELS in src/lib/constants.ts — replaces the ad-hoc humanizeRole in sidebar.tsx and prettifyRoleName in role-list.tsx. user-list, user-card, role-list, user-form now render "Sales Agent" instead of "sales_agent".
  • formatOutcome() + OUTCOME_LABELS for interest outcomes. Updated client-columns.tsx, realtime-toasts.tsx, interest-detail-header.tsx, command-search.tsx.
  • Pipeline stage normalization extended to: next-in-line-notify.service.ts, command-search.tsx (interest + residential interest bucket), yacht-tabs.tsx, interest-picker.tsx, ai.ts worker email body, pipeline-report.ts + revenue-report.ts PDF generators.
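
A formatter of the kind described above might look like the following sketch. The label entries and the title-case fallback are assumptions for illustration; the real ROLE_LABELS and formatRole live in src/lib/constants.ts and may differ.

```typescript
// Hypothetical label table; a Map avoids accidental hits on Object.prototype keys.
const ROLE_LABELS = new Map<string, string>([
  ["super_admin", "Super Admin"],
  ["sales_agent", "Sales Agent"],
]);

function formatRole(role: string): string {
  // Prefer a curated label; otherwise title-case the snake_case key.
  const curated = ROLE_LABELS.get(role);
  if (curated !== undefined) return curated;
  return role
    .split("_")
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join(" ");
}
```

The fallback means a role added to the database before its label is curated still renders readably.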

Auto-memory

  • Saved feedback memory: "Be thorough — audit everything that ends in a user-facing notification". (Memory subsystem is /Users/matt/.claude/projects/...)

Cross-cutting priority queue

Tackle in this order. C-prefix = CRITICAL still open; H-prefix = HIGH.

  1. [C] Wire a real db:migrate runner — without it, 0052_audit_critical_fixes.sql silently never creates 6 composite indexes (data-model C1). Recommended: a tsx script that reads migrations in order, splits on --> statement-breakpoint, runs CREATE INDEX CONCURRENTLY outside a tx, and tracks state in a __drizzle_migrations table. Same script gives you db:migrate:status for prod readiness.
  2. [C] Add EMAIL_REDIRECT_TO prod guard: src/lib/env.ts should refine to reject when NODE_ENV === 'production', and src/lib/email/index.ts should logger.warn at boot when set (not debug). 5 minutes of work, prevents an extremely-bad-day class of incident.
  3. [C] Fix orphan-blob window in handleDocumentCompleted: src/lib/services/documents.service.ts:1100-1253. Wrap the storage.put + files.insert + documents.update sequence in a transaction or a saga with a compensating delete. The current catch-block path also incorrectly marks status='completed' with no signedFileId, hiding the failure from reps.
  4. [C] Escape URLs in email templates — every template in src/lib/email/templates/* inlines ${data.link} etc. into href/text without escaping. Move all template rendering through a shared escapeUrl helper and add scheme allow-listing (http(s) only).
  5. [C] Eliminate the 16 native window.confirm() calls — each one is a destructive flow that bypasses ConfirmationDialog / AlertDialog. ui-ux-auditor lists the sites; high-leverage UX fix.
  6. [C] GDPR export completeness: gdpr-bundle-builder.ts must include portal_users, email_threads/messages, document_sends, reminders, files, scratchpadNotes, client_merge_log, contact_log, website_submissions, form_submissions. This is a regulator-finding-level gap.
  7. [C] Right-to-be-forgotten actually erase: client-hard-delete.service.ts currently nullifies FKs but leaves verbatim PII in email_messages.body_html, files, document_sends.recipient_email. Add a true wipe path (or document the limitation in the legal text and gate the feature behind a "we cannot fully erase X" warning).
  8. [C] Add user_permission_overrides.user_id FK + onDelete='set null' on nullable client refs — data-model H1+H2. Migration 0056.
  9. [C] Resolve-identifier hit-path still leaks email — replace the API entirely with a server-side signIn proxy that takes {identifier, password} and never returns the canonical email at all. Current rate-limited hit still echoes real emails to anyone with a guessable username.
  10. [H] Re-audit audit_logs.metadata masking — extend maskSensitiveFields to cover audit_logs.metadata; add a 90-day retention cron (mirroring error_events).
  11. [H] Webhook → error pipeline: documenso/route.ts should captureErrorEvent on handler crash. Apply the same to every other webhook route.
  12. [H] Wire admin email-template subject editor — 5 of 8 templates ignore overrides.subject; admins see "Saved" with zero effect. email-auditor H1+H2.
  13. [H] Wire admin signature/footer fields: /admin/email writes email_signature_html + email_footer_html, which the shell never reads. Either delete or wire.
  14. [H] PII redaction in audit/error pipeline: error_events.request_body_excerpt sanitizer redacts password/token but not email/phone/name/dob/address.
  15. [H] Notification email worker XSS: workers/notifications.ts:65-71 interpolates notif.description and notif.link into HTML unescaped. Apply escapeHtml + URL allow-list.
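
Items 4 and 15 both come down to escaping before interpolation plus an http(s)-only scheme check. A minimal sketch of such helpers follows. escapeHtml and escapeUrl are named in the queue above, but their real signatures are not shown in this document, so the shapes here (and the renderLink helper) are assumptions.

```typescript
// Escape the five characters that matter inside HTML text and attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Returns an attribute-safe URL, or null when the input is not an absolute http(s) URL.
function escapeUrl(raw: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return null; // not parseable as an absolute URL
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return null;
  return escapeHtml(parsed.toString());
}

// Hypothetical template helper: falls back to plain text when the URL is rejected.
function renderLink(url: string, label: string): string {
  const href = escapeUrl(url);
  if (href === null) return escapeHtml(label);
  return `<a href="${href}">${escapeHtml(label)}</a>`;
}
```

Routing every template through one helper like renderLink keeps the scheme allow-list in a single place, so a javascript: link degrades to inert text instead of reaching the recipient.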

Per-domain reports

Each section below is the agent's report verbatim. File:line refs reference the repo as it stands at the start of the audit session — some have already been addressed (see "Already fixed in this session" above).


1. Security + API + auth audit (security-auditor + early api-security run)

Two reports — the team-spawned security-auditor and an earlier standalone run. Both included verbatim.

Report A: security-auditor (team)

Security / API / Auth Audit — feat/documents-folders branch

Read-only audit of the pn-crm repo. Scope: auth wrappers, tenant scoping, public/webhook endpoints, the just-shipped username-resolve + permission-overrides + admin email-change flows, CSRF posture, audit-log coverage.

No CRITICAL issues found — auth helpers (withAuth / withPermission / requireSuperAdmin) are applied consistently across src/app/api/v1/**, public endpoints all use timing-safe secret compares + per-IP rate limits, and the Documenso webhook idempotency + per-port secret resolution is sound. The findings below are HIGH / MEDIUM.


HIGH

H1. resolve-identifier leaks username→email mapping AND has no rate limit

File: src/app/api/auth/resolve-identifier/route.ts (lines 25-58)

The route's own docstring claims it "pairs with the global login-attempt limiter", but no enforcePublicRateLimit / checkRateLimit is actually called in the handler. Unauthenticated attackers can POST {identifier:"matt"} at unbounded volume; on a hit the response is {email:"matt@letsbe.solutions"}, on a miss the response echoes the raw input. That makes existence trivially decidable (response contains @ ↔ hit), and on a hit the caller also learns the actual email address. Usernames are typically far more guessable than emails (first names, social handles), so this becomes a one-way username → email harvester usable for downstream phishing / password spraying. Fix: wrap with enforcePublicRateLimit(req, 'portalSignIn', identifier.toLowerCase()) (or a new loginIdentifier bucket) AND stop echoing the resolved email: either return {ok:true} and require the caller to POST (username,password) together to a single sign-in endpoint that does the lookup server-side, or return an opaque short-lived token that Better Auth's sign-in step can redeem internally.
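
The "single sign-in endpoint that does the lookup server-side" option can be sketched as below. Everything here is hypothetical: SignInDeps, signInWithIdentifier, and the synchronous lookups are stand-ins for what would really be async database and Better Auth calls. The point is only that the resolved email never leaves the server and that an unknown username and a wrong password yield the same body.

```typescript
interface SignInDeps {
  // Hypothetical lookups; real ones would be async DB / Better Auth calls.
  findEmailByUsername(username: string): string | null;
  verifyPassword(email: string, password: string): boolean;
}

type SignInResult = { ok: true } | { ok: false; error: "invalid_credentials" };

function signInWithIdentifier(
  identifier: string,
  password: string,
  deps: SignInDeps,
): SignInResult {
  const normalized = identifier.trim().toLowerCase();
  // An identifier containing "@" is treated as an email; otherwise resolve it server-side.
  const email = normalized.includes("@") ? normalized : deps.findEmailByUsername(normalized);
  // Unknown username and wrong password produce the same response body.
  if (email === null || !deps.verifyPassword(email, password)) {
    return { ok: false, error: "invalid_credentials" };
  }
  return { ok: true };
}
```

This sketch does not address timing differences between the miss path and the wrong-password path, and it would still need the rate limiter the finding asks for.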

H2. Admin email-change leaves emailVerified true → account takeover via reset

File: src/lib/services/users.service.ts (lines 233-262, 355-387)

updateUser rotates user.email directly when an admin edits the address (lines 246-247) but never resets emailVerified. A hostile or compromised admin can point any victim's account at an attacker-controlled mailbox, then trigger the existing "forgot password" flow on the new address and silently hijack the account; the existing notifyAdminEmailChange notice fires to the old address fire-and-forget and is documented as non-blocking ("failure to send doesn't roll back"). There is also no createAuditLog specifically for the email-change: the generic update audit at line 287 buries the change inside newValue: data rather than emitting a dedicated email_change action that monitoring can alert on. Fix: when wantsEmailChange, set emailVerified: false in the Better Auth user update, write a dedicated severity: 'warning' audit row with {oldEmail, newEmail, changedBy}, and require the recipient to click the existing /api/v1/me/email/confirm/[token] flow before the rotation applies, i.e. mint a user_email_changes row rather than direct-UPDATE.

H3. Permission-overrides PUT accepts arbitrary keys → JSONB pollution + deep-merge surprise

File: src/app/api/v1/admin/users/[id]/permission-overrides/route.ts (lines 31-35, 97-141)

updateOverridesSchema is z.record(z.string(), z.record(z.string(), z.boolean())) — no allow-list against the known RolePermissions resource/action keys. An admin (or a stolen admin session) can persist arbitrary keys into user_permission_overrides.permission_overrides. Two concrete impacts: (a) future deep-merge logic that maps unknown keys into newly added resources promotes the rogue keys silently (silent privilege creep when new permissions ship); (b) the JSONB can be bloated to harm downstream readers. Fix: validate against KNOWN_PERMISSION_LEAVES derived from RolePermissions (resource → action set), reject unknown keys with ValidationError, and bound the merged blob size as /api/v1/me/route.ts already does for preferences. The GET handler is fine — it only reads what was already persisted.
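
The allow-list check recommended above can be sketched without any schema library. The KNOWN_PERMISSION_LEAVES contents below are a made-up subset; the real set would be derived from RolePermissions, whose shape is not reproduced in this document. findUnknownLeaves is a hypothetical helper name.

```typescript
type Overrides = Record<string, Record<string, boolean>>;

// Hypothetical subset of the RolePermissions shape (resource -> allowed actions).
// A Map is used so keys like "constructor" cannot match Object.prototype members.
const KNOWN_PERMISSION_LEAVES = new Map<string, ReadonlySet<string>>([
  ["clients", new Set(["view", "edit"])],
  ["admin", new Set(["manage_users", "manage_settings"])],
]);

// Returns every unknown resource or "resource.action" key found in the input.
function findUnknownLeaves(input: Overrides): string[] {
  const unknown: string[] = [];
  for (const [resource, actions] of Object.entries(input)) {
    const allowed = KNOWN_PERMISSION_LEAVES.get(resource);
    if (allowed === undefined) {
      unknown.push(resource);
      continue;
    }
    for (const action of Object.keys(actions)) {
      if (!allowed.has(action)) unknown.push(`${resource}.${action}`);
    }
  }
  return unknown;
}
```

A PUT handler could reject the request with a validation error whenever the returned list is non-empty, which surfaces typo'd keys instead of persisting them.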

H4. /api/v1/me/email/confirm|cancel/[token] is unreachable for logged-out users (middleware 401)

File: src/app/api/v1/me/email/cancel/[token]/route.ts, src/app/api/v1/me/email/confirm/[token]/route.ts, src/middleware.ts (PUBLIC_PATHS list, lines 8-20)

The handlers correctly skip withAuth ("the token IS the proof") but /api/v1/me/email/... is not in PUBLIC_PATHS, so middleware.ts returns a 401 JSON for any unauthenticated request — exactly the case a user clicking the confirm link from email on a different device will hit. End result: every confirm/cancel click from a logged-out browser fails with "Authentication required". Also, the GET request applies an irreversible state mutation with no CSRF guard (the origin-check in middleware only fires for STATE_CHANGING_METHODS). Fix: move these handlers under /api/auth/email-change/{confirm,cancel}/[token] so they're covered by the /api/auth/ PUBLIC_PATHS prefix, OR add /api/v1/me/email/ to PUBLIC_PATHS. Convert the GET mutation to a POST landing page (one-click confirm form) so cross-site image/prefetch tags can't silently flip state.


MEDIUM

M1. Direct Schema.parse(body) instead of parseBody(req, schema)

Files: src/app/api/v1/admin/custom-fields/[fieldId]/route.ts:18-19, src/app/api/v1/search/route.ts:11, src/app/api/v1/files/upload/route.ts:21, src/app/api/v1/companies/[id]/members/[mid]/handlers.ts:29, src/app/api/public/website-inquiries/route.ts:97-98, src/app/api/public/residential-inquiries/route.ts:51-52, src/app/api/public/interests/route.ts:47-48, src/app/api/portal/auth/{sign-in,forgot-password,reset-password,activate,change-password}/route.ts, src/app/api/auth/{set-password,resolve-identifier}/route.ts.

CLAUDE.md explicitly requires parseBody so the 400 envelope + field-errors shape stays uniform (the frontend's toastError hook depends on it). Most of these are caught by an outer try/catch that routes ZodError into errorResponse, which masks the issue, but the response shape diverges (a thrown ZodError becomes a generic 500 unless errorResponse maps it). Admin route custom-fields/[fieldId] is the worst case: a malformed PATCH body 500s instead of 400-with-field-errors. Fix: swap to parseBody(req, schema) in the admin/internal routes; the portal / public auth routes intentionally use safeParse + manual ValidationError mapping and can be left as-is.

M2. CSRF origin check disabled in development

File: src/middleware.ts (line 80)

process.env.NODE_ENV !== 'development' gates the origin check. If a production deployment is ever booted with NODE_ENV=development accidentally (shell export leakage, container override, "debug deploy"), all CSRF defense-in-depth is silently off — SameSite=Lax still helps but isn't enough for legacy browsers / extension contexts. Fix: key the bypass on an explicit DISABLE_CSRF_FOR_LAN=1 env var that's defaulted to unset and refused in lib/env.ts when NODE_ENV==='production'.
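
The explicit-flag approach suggested above might be expressed as a small pure function, sketched here. DISABLE_CSRF_FOR_LAN is the variable name proposed in the finding; CsrfEnv and shouldEnforceOriginCheck are hypothetical names, and the real guard would sit in lib/env.ts and the middleware.

```typescript
interface CsrfEnv {
  NODE_ENV?: string;
  DISABLE_CSRF_FOR_LAN?: string;
}

// Throws when the bypass flag is set in production; otherwise reports
// whether the origin check should run. NODE_ENV alone never disables it.
function shouldEnforceOriginCheck(env: CsrfEnv): boolean {
  const bypassRequested = env.DISABLE_CSRF_FOR_LAN === "1";
  if (bypassRequested && env.NODE_ENV === "production") {
    throw new Error("DISABLE_CSRF_FOR_LAN must not be set in production");
  }
  return !bypassRequested;
}
```

With this shape, a deployment accidentally booted with NODE_ENV=development keeps the origin check, because only the dedicated flag can turn it off.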

M3. Permission-override audit log lacks severity escalation

File: src/app/api/v1/admin/users/[id]/permission-overrides/route.ts:124-134

Changing user permission grants is exactly the action an attacker would take after compromising an admin; the audit row should be emitted with severity:'warning' (matching the email_change_cancelled precedent in src/app/api/v1/me/email/cancel/[token]/route.ts:46) so the audit UI's default filter surfaces it. Today it's a vanilla action:'update' lost in the noise.

M4. /api/public/interests audit row stores client phone in metadata

File: src/app/api/public/interests/route.ts:254-271

The audit row's newValue and surrounding metadata capture ip plus foreign keys, which is fine, but data.phone is held in scope and could easily slip in during a future edit. Today the row is OK; flag as a place to add a regression test. (Not a finding to act on, just a watch-list item for the broader audit team.)

M5. Filesystem storage proxy: token leak via Referer

File: src/app/api/storage/[token]/route.ts:42-119

Cache-Control: private, no-store is set on the response, but the URL itself (with the HMAC token in the path) leaks via the Referer header when the downloaded asset is opened inside a browser tab that then navigates to a third-party link. Single-use replay protection mitigates reuse, but a token still-in-window is good for one stolen download. Fix: either rotate to a POST-with-token-in-body form (breaks <a download>), or set Referrer-Policy: no-referrer on the response and document that issuers should mint with the shortest possible expiry. Lower-impact because filesystem mode is single-tenant per the boot guard.

M6. /api/v1/clients/bulk-hard-delete lacks per-IP rate-limit

File: src/app/api/v1/clients/bulk-hard-delete/route.ts (no withRateLimit)

The sibling bulk-hard-delete-request/route.ts is wrapped in withRateLimit but the actual delete endpoint is not. A compromised admin session could fan out hundreds of irrevocable hard-deletes in a tight loop with no limiter to slow it down. Fix: add withRateLimit('destructiveBulk', ...) or similar with a 5/minute cap; the existing audit row will still be emitted, but the limiter caps the blast radius.


Verified clean (no finding)

  • withAuth / withPermission / requireSuperAdmin applied uniformly: every route.ts under src/app/api/v1/** was checked; the only files without the wrappers are me/email/{confirm,cancel}/[token]/route.ts (covered by H4) which intentionally use bearer-token auth.
  • withAuth enforces port-context via X-Port-Id header / preferences, never from body (helpers.ts:160-168).
  • Documenso webhook: timing-safe per-port secret resolution, replay guard via signatureHash unique index, per-handler portScope forwarded so a documensoId reused across ports can't cross-mutate.
  • Public website-intake: timing-safe verifySecret with length-equal buffer pad, refusal-by-default when WEBSITE_INTAKE_SECRET unset, per-IP rate-limit, unknown port slug → generic 400 (no input echo).
  • Raw sql`...` usage scanned across src/lib/services and src/app/api: every interpolation is via Drizzle's parameter binding (sql`... ${foo} ...`); no string concatenation gaps found.
  • Storage proxy upload (PUT) does HMAC verify + single-use replay + size cap + PDF magic-byte enforcement before disk write.

— security-auditor (read-only audit; no source files edited)

Report B: api-security (standalone earlier run)

API + Auth + Security Audit: Port Nimara CRM

Scope: src/app/api/**, src/lib/api/helpers.ts, src/lib/auth/**, src/middleware.ts, plus the newly-added permission-overrides and resolve-identifier flows.

CRITICAL

1. Privilege escalation via PUT /api/v1/admin/users/[id]/permission-overrides

src/app/api/v1/admin/users/[id]/permission-overrides/route.ts:97-141

The PUT handler gates only on withPermission('admin', 'manage_users', …) and never verifies that params.id !== ctx.userId. Any user who holds admin.manage_users can target their own userId and write a userPermissionOverrides row that grants every leaf ({ admin: { manage_users: true, manage_settings: true, … }, … }). Because withAuth deep-merges userOverride.permissionOverrides last in the chain (src/lib/api/helpers.ts:227-238), the row wins over the base role and instantly escalates the caller to admin-of-everything on the next request. The companion removeUserFromPort service in src/lib/services/users.service.ts:319 does have a self-target guard — the same guard is missing here. Fix: in the PUT handler, throw ForbiddenError when targetUserId === ctx.userId && !ctx.isSuperAdmin, and require super-admin to flip admin.* leaves (or any leaf that the calling user cannot already grant). Tier-2 fix: rotate this row to require super-admin outright; admin-of-port shouldn't be able to mint persistent overrides for peers anyway.
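
The self-target guard proposed in this finding is small enough to sketch in full. ForbiddenError is named in the report, but its constructor is not shown, so a local stand-in is defined here; AuthContext and assertNotSelfTarget are hypothetical names. This follows the report's suggested condition (super-admins exempt), which may differ from the fix that actually shipped.

```typescript
// Local stand-in for the project's ForbiddenError (real constructor not shown here).
class ForbiddenError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ForbiddenError";
  }
}

// Hypothetical slice of the auth context passed to route handlers.
interface AuthContext {
  userId: string;
  isSuperAdmin: boolean;
}

// Refuses edits where a non-super-admin caller targets their own user id.
function assertNotSelfTarget(ctx: AuthContext, targetUserId: string): void {
  if (targetUserId === ctx.userId && !ctx.isSuperAdmin) {
    throw new ForbiddenError("You cannot edit your own permission overrides");
  }
}
```

Calling this at the top of the PUT handler, before any read or write, closes the escalation path regardless of what the request body contains.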

2. /api/auth/resolve-identifier has no rate-limit — username enumeration

src/app/api/auth/resolve-identifier/route.ts:25-59

The endpoint is unauthenticated, sits behind /api/auth/* (so the middleware origin check is skipped per src/middleware.ts:46-49), and does NO rate-limit / throttling. The header comment claims it "pairs with the global login-attempt limiter" but that limiter is only triggered when the subsequent sign-in call runs — an attacker hitting just this endpoint with a wordlist is unconstrained. While the response shape is the same on hit and miss ({ email: <string> }), the content differs: a hit returns an @-bearing email, a miss returns the unchanged raw input. So with one HTTP call per candidate an attacker deterministically learns which usernames map to real accounts; they then funnel only the validated emails into the rate-limited sign-in flow, defeating the per-account brute-force ceiling. Fix: wrap in enforcePublicRateLimit(req, 'portalSignIn', normalized) (or a new bucket like usernameResolve with ~10/15min per-IP), and consider returning a constant fake-email when the username doesn't resolve so hit/miss are indistinguishable at the response-body level too.
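
To make the suggested per-IP bucket concrete, here is a minimal in-memory fixed-window limiter. It is only an illustration of the technique: the project's enforcePublicRateLimit is not shown in this document, FixedWindowLimiter is a hypothetical class, and a real deployment would need shared storage rather than a per-process Map.

```typescript
interface Bucket {
  count: number;
  windowStart: number;
}

class FixedWindowLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true when the call is allowed, false when the key is over its limit.
  allow(key: string, now: number = Date.now()): boolean {
    const bucket = this.buckets.get(key);
    if (bucket === undefined || now - bucket.windowStart >= this.windowMs) {
      // First call for this key, or the previous window has expired.
      this.buckets.set(key, { count: 1, windowStart: now });
      return true;
    }
    if (bucket.count >= this.limit) return false;
    bucket.count += 1;
    return true;
  }
}

// Roughly the "~10/15min per-IP" bucket suggested in the finding.
const usernameResolveLimiter = new FixedWindowLimiter(10, 15 * 60 * 1000);
```

A handler would call usernameResolveLimiter.allow(clientIp) before doing any lookup and return 429 when it yields false. Stale buckets are never evicted in this sketch, so a long-running process would want periodic cleanup.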

HIGH

3. permission-overrides PUT does not validate the override shape

src/app/api/v1/admin/users/[id]/permission-overrides/route.ts:31-34, 97-141

updateOverridesSchema is z.record(z.string(), z.record(z.string(), z.boolean())) — any resource name and any action key is accepted. This stores garbage in user_permission_overrides.permission_overrides forever, and silently typo'd keys ('clien_ts.view') won't take effect but won't 400 either. More importantly, there is no allow-list against the RolePermissions shape defined in src/lib/db/schema/users.ts:6, so a future code path that does Object.keys(permissions).forEach(…) could be surprised by a foreign resource appearing in the merged map. Fix: derive a Zod allow-list at module load from the canonical RolePermissions shape (the same VALID_MERGE_TOKENS pattern the templates code uses) and reject unknown resource/action keys with 400.

4. permission-overrides PUT writes for users not assigned to the current port

src/app/api/v1/admin/users/[id]/permission-overrides/route.ts:97-122

The PUT inserts/updates a (userId, portId) row without first verifying that targetUserId actually has a user_port_roles row for ctx.portId. An admin at port A can mint override rows for users belonging only to port B (the row is keyed on the admin's portId, so it's a "future override that would activate if the user ever joins this port"). Functionally inert today, but pollutes the override table across tenants and breaks the implicit "you can only manage users in your port" invariant the rest of the admin/users routes enforce. The GET path does the implicit validation by failing the port-role lookup; the PUT should mirror it. Fix: findFirst on userPortRoles with (targetUserId, ctx.portId) first; 404 if missing, mirroring updateUser at src/lib/services/users.service.ts:216-219.

5. Email-change confirm endpoint cannot be aborted after compromise window

src/app/api/v1/me/email/confirm/[token]/route.ts:42-57

Token-based unauthenticated swap. The flow looks otherwise correct (sha256-hashed token, expiry, single-use via appliedAt, race-checked uniqueness). What's missing: when a confirmation completes, all other outstanding userEmailChanges rows for the same userId should be cancelled, and all existing Better Auth sessions for that user should be revoked. Today, if an attacker compromises the account and requests an email change to an attacker-owned address, the victim who spots the cancel email is racing the attacker: once the attacker confirms, the victim's cancel link still works on the other pending row but not on the now-applied change, and the attacker's existing CRM session (pn-crm.session_token) survives the swap. Fix: in the confirm handler, after the email UPDATE, also db.delete(sessions).where(eq(sessions.userId, pending.userId)) (or whatever the Better-Auth session table is called) and mark all other open userEmailChanges rows for that user as cancelled. Mirror the cancel-handler behaviour. Severity is HIGH not CRITICAL because the attacker needs the session in the first place.

6. Public /api/auth/[...all] audits the attempted email but doesn't bound brute-force timing

src/app/api/auth/[...all]/route.ts:100-146

Better Auth handles sign-in rate-limiting internally (it has a built-in limiter when configured), but I see no explicit enforcePublicRateLimit wrapper around this catch-all. The loginAttempt bucket I expected in src/lib/rate-limit.ts isn't present in the listing; the closest is portalSignIn, which is wired only to the portal sign-in handler, not the CRM sign-in. If Better Auth's default limiter isn't actively configured in src/lib/auth/index.ts:55-113 (and I don't see a rateLimit: block there), the CRM login endpoint is effectively unrate-limited and the resolve-identifier finding compounds into a real brute-force window. Fix: add an enforcePublicRateLimit(req, 'crmSignIn', attemptedEmail) call inside withAuthAudit before forwarding to upstream.POST(forwardReq) when isSignIn, keyed per-email; declare the bucket in rate-limit.ts mirroring portalSignIn's shape.

MEDIUM

7. CRM updateUser cross-tenant email change has no notification when target is super-admin

src/lib/services/users.service.ts:236-262

When an admin at port A updates a user (including a super-admin who happens to have a port-role row at port A), the email-change flow flips Better Auth's identity instantly with only a courtesy email to the prior address. There's no challenge / token round-trip — the admin acts unilaterally. Self-service email change (/api/v1/me/email) DOES require token confirmation; admin-initiated should at least block when the target is a super-admin or require the change to go through the same confirm-token flow. Fix: gate wantsEmailChange on !profile.isSuperAdmin || ctx.isSuperAdmin and/or always use the token flow even for admin-initiated changes.

8. permission-overrides PUT does not write audit log atomically with the DB write

src/app/api/v1/admin/users/[id]/permission-overrides/route.ts:111-134

The existing row is read, then conditionally update-or-insert, but two concurrent PUTs against the same (userId, portId) race: both see existing as the same value, both call update, second writer wins silently with a last-write audit log that's missing the intermediate state. Severity is medium because the audit log still captures both writers' new values and there's no correctness invariant broken — just a forensic gap. Fix: wrap the read + update/insert in withTransaction with FOR UPDATE (or use an upsert with returning('old')-equivalent semantics) and log oldValue from the locked row.

9. Documenso webhook returns 200 on every failure including dedup, which masks crashes

src/app/api/webhooks/documenso/route.ts:264-268

The handler's outermost try/catch logs err but always returns 200. That's the correct posture for signature-invalid traffic (don't leak signal), but also masks downstream handler crashes — Documenso will never retry a 5xx because it never sees one. The handlers are documented as idempotent (handleDocumentCompleted early-returns on duplicate completion), so a retry storm wouldn't double-write, but the missing retry signal turns one transient DB failure into a permanently dropped event. Fix: return 500 on the catch branch so Documenso retries; keep 200 for secret-invalid (line 100) and dedup (line 123) since those are intentional no-ops.
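The status-code policy in the fix can be sketched as a small mapping. The outcome names are illustrative and not taken from the route; only the 200-versus-500 split comes from the finding.

```typescript
// Outcomes the webhook route can reach; names are illustrative, not from the codebase.
type WebhookOutcome = 'processed' | 'invalid-secret' | 'duplicate' | 'handler-error';

// Map each outcome to the HTTP status the sender should see.
function statusForOutcome(outcome: WebhookOutcome): number {
  switch (outcome) {
    case 'processed':
    case 'invalid-secret': // intentional no-op: do not leak signal to unauthenticated callers
    case 'duplicate': // intentional no-op: the event was already handled
      return 200;
    case 'handler-error': // transient failure: a 5xx lets the sender retry
      return 500;
  }
}
```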

10. withAuth deep-merge: permission overrides only ADD permissions, never EXPLICITLY DENY

src/lib/api/helpers.ts:73-98, 233-238

deepMerge does a recursive shallow assignment — userOverride.permissionOverrides overwrites leaves wholesale. So {clients: {view: false}} works as a deny. However the override is keyed by resource → action map, and the override row stores Partial<RolePermissions>. There's no "tri-state" (inherit/grant/deny) expressed at the DB layer — the comment in the route says "use null at a leaf to clear an override" but the Zod schema only accepts z.boolean() per leaf, not null. So the UI cannot actually clear an override leaf via this endpoint without removing the resource key entirely from the JSON. Worth aligning the schema with the documented contract. Fix: accept z.union([z.boolean(), z.null()]) and strip null leaves server-side before writing.
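A sketch of the server-side half of that fix, assuming the override payload is a resource → action → leaf map. The type names and stripNullLeaves are hypothetical; the route's real schema is Zod-based and is not reproduced here.

```typescript
// A leaf is true (grant), false (deny), or null (clear the override and inherit).
type Leaf = boolean | null;
type OverrideInput = Record<string, Record<string, Leaf>>;
type OverrideStored = Record<string, Record<string, boolean>>;

// Drop null leaves before writing, and drop resources left with no leaves at all.
function stripNullLeaves(input: OverrideInput): OverrideStored {
  const out: OverrideStored = {};
  for (const [resource, actions] of Object.entries(input)) {
    const kept: Record<string, boolean> = {};
    for (const [action, value] of Object.entries(actions)) {
      if (value !== null) kept[action] = value;
    }
    if (Object.keys(kept).length > 0) out[resource] = kept;
  }
  return out;
}
```

With this shape, false stays an explicit deny while null simply disappears from the stored JSON, which matches the documented "use null at a leaf to clear an override" contract.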

11. Origin check disabled in dev — but process.env.NODE_ENV check is per-process

src/middleware.ts:79-89

CSRF defense-in-depth is skipped when NODE_ENV !== 'production'. The dev/staging boundary is correct in principle, but staging deployments typically run with NODE_ENV=production, while CI / preview-builds may not. Worth confirming the Dockerfile (Dockerfile) sets NODE_ENV=production on any environment that's reachable from the internet. Note also that the fallback at src/middleware.ts:68-69 allows a request with neither Origin nor Referer through — this is correct for server-side fetches but means any HTTP client that strips both headers (curl with -H "Origin:") bypasses the check. Combined with SameSite=strict cookies the residual risk is low.

12. me/email confirm/cancel tokens are URL-only — referer leakage risk

src/app/api/v1/me/email/route.ts:88-89, src/app/api/v1/me/email/confirm/[token]/route.ts:24-35

The confirm/cancel URLs are emailed as ${baseUrl}/api/v1/me/email/confirm/${rawToken}. The user clicks from their inbox; the email client opens the URL in a browser which then renders /settings?emailChange=confirmed (a redirect). If /settings makes any third-party request before navigating away, the Referer header carries the full confirm URL including the token. The token is single-use and short-lived, so the post-redirect exposure window is small, but defensively the route should Referrer-Policy: no-referrer on the redirect response. Fix: res.headers.set('Referrer-Policy', 'no-referrer') on the NextResponse.redirect(...) call.

Summary

Two CRITICAL findings: self-targetable permission-overrides escalation (finding 1) and unlimited username harvesting at /api/auth/resolve-identifier (finding 2). Both are direct consequences of the recently-added routes that prompted this audit. The remainder are mostly hardening. The v1/* surface overall is well-disciplined: nearly every route under /api/v1/** flows through withAuth(withPermission(...)), body parsing consistently uses parseBody (only public/auth handlers use raw req.json() for documented reasons), and the few raw sql`…` usages I sampled (admin/website-submissions, admin/document-sends, search/recently-viewed) all interpolate via the parameterized tag form rather than string concat. Multi-tenant scoping looks consistent: services accept ctx.portId and the defense-in-depth pattern is well-applied (e.g. the berth-recommender note in CLAUDE.md). The Documenso webhook receiver has solid replay/dedup/secret discipline.


2. UI/UX consistency + accessibility audit (ui-ux-auditor)

UI/UX Consistency + Accessibility Audit

Scope: Form patterns, dialog/sheet/drawer choices, mobile parity, enum leakage, empty/loading states, badge tones, a11y, plus the recently added surfaces (UserForm tabs, UserList Power toggle, UserPermissionMatrix, Login identifier field, user settings username card).


CRITICAL

C1 — window.confirm() / confirm() used for destructive flows (>=15 sites)

Files using native browser confirm instead of ConfirmationDialog (which wraps AlertDialog):

  • src/components/clients/contacts-editor.tsx:115 — remove contact
  • src/components/clients/client-files-tab.tsx:50 — delete file
  • src/components/yachts/yacht-list.tsx:187 — archive yacht (bulk)
  • src/components/admin/document-templates/template-version-history.tsx:54 — restore older version
  • src/components/shared/addresses-editor.tsx:77 — remove address
  • src/components/documents/document-detail.tsx:160 — cancel/void signing envelope
  • src/components/interests/interest-list.tsx:314 — archive interest
  • src/components/interests/interest-tabs.tsx:483 — outcome/archival flow
  • src/components/interests/interest-eoi-tab.tsx:299 — cancel EOI
  • src/components/interests/interest-reservation-tab.tsx:313 — cancel contract
  • src/components/interests/interest-contact-log-tab.tsx:222 — delete contact log
  • src/components/interests/interest-contract-tab.tsx:310 — cancel contract
  • src/components/interests/interest-documents-tab.tsx:80 — delete file
  • src/components/companies/company-files-tab.tsx:50 — delete file
  • src/components/companies/company-list.tsx:201 — archive company
  • src/components/documents/document-list.tsx:136 — delete document

Why it matters: native confirm cannot be styled, bypasses our <AlertDialog> keyboard semantics, no focus trap, no destructive-action red styling, fails focus-return after dismiss; inconsistent with the rest of the app which uses ConfirmationDialog. Several of these are catastrophic (cancel signing envelope, hard-delete file, archive company). Fix: replace each with <ConfirmationDialog destructive title=… description=… onConfirm={…}> matching the pattern in user-list.tsx.

C2 — UserForm "Permissions" tab silently drops unsaved overrides

src/components/admin/users/user-form.tsx:204-212 and user-permission-matrix.tsx:175-191. The matrix has its own "Save overrides" button; the parent Sheet's "Save changes" only persists Profile-tab fields. onSaveStateChange is declared in the matrix props but never passed by user-form.tsx (line 206), so the parent has no idea overrides are dirty. A user who toggles Inherit/Grant/Deny then clicks "Save changes" loses everything when the Sheet closes — no warning, no toast. Fix: lift overrides state to user-form.tsx, persist both endpoints inside persist(), or track dirty state via onSaveStateChange and block Sheet close with an AlertDialog.


HIGH

H1 — Raw enum render via .replace(/_/g, ' ') outside constants.ts (40+ sites)

Examples (not exhaustive):

  • src/components/documents/documents-hub.tsx:292, document-detail.tsx:204,210,386, entity-folder-view.tsx:63, hub-root-view.tsx:69, signing-details-dialog.tsx:123 (status, eventType, documentType)
  • src/components/reservations/reservation-detail.tsx:230,285,339 (tenureType, agreement status)
  • src/components/berths/berth-status-suggestion-dialog.tsx:61,65
  • src/components/expenses/expense-detail.tsx:229,233, expense-card.tsx:71, expense-columns.tsx:121, expense-form-dialog.tsx:257,278, expense-filters.tsx:16
  • src/components/admin/audit/audit-log-list.tsx:234-235, roles/role-list.tsx:223,239, roles/role-form.tsx:123
  • src/components/admin/users/user-permission-matrix.tsx:101 — local formatAction duplicates pattern
  • src/components/dashboard/source-conversion-chart.tsx:60, activity-feed.tsx:34,44
  • src/components/scan/scan-shell.tsx:227,242
  • src/components/interests/linked-berths-list.tsx:94, interest-tabs.tsx:40
  • src/app/(portal)/portal/{my-yachts,documents,interests}/page.tsx — portal-side enum leakage
  • src/components/search/command-search.tsx:939,965 — fallback after STAGE_LABELS

Fix: route through stageLabel, formatRole, formatOutcome, formatSource (already in constants.ts); add formatDocumentStatus, formatTenureType, formatEventType, formatExpenseCategory, formatPaymentMethod, formatBerthStatus, formatPermissionAction to constants.ts and replace call-sites. Removes "manage memberships" / "Eoi Signed" inconsistencies.
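One possible shape for the new formatters, assuming a label map with a humanising fallback. The map entries and the humanizeEnum helper are hypothetical; the real enum values live in the schema and the existing helpers in constants.ts may be structured differently.

```typescript
// Hypothetical label map; entries are examples, not the project's real enum.
const DOCUMENT_STATUS_LABELS: Record<string, string> = {
  draft: 'Draft',
  eoi_signed: 'EOI signed',
  completed: 'Completed',
};

// Fallback for unmapped values: swap underscores for spaces, capitalise the first letter only.
function humanizeEnum(value: string): string {
  const spaced = value.replace(/_/g, ' ');
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
}

// Single entry point for call-sites, replacing inline .replace(/_/g, ' ').
function formatDocumentStatus(status: string): string {
  return DOCUMENT_STATUS_LABELS[status] ?? humanizeEnum(status);
}
```

Keeping acronyms such as "EOI" in the map rather than in the fallback is what removes the "Eoi Signed" style of inconsistency.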

H2 — Mobile parity: 18 list components have no cardRender

DataTable already supports cardRender; without it the mobile view falls back to a raw horizontal-scroll table (bad UX on iOS):

  • src/components/reservations/reservation-list.tsx, berth-reservations-list.tsx
  • src/components/website-analytics/top-list.tsx
  • src/components/shared/notes-list.tsx
  • src/components/residential/residential-clients-list.tsx, residential-interests-list.tsx
  • src/components/documents/document-list.tsx
  • src/components/interests/linked-berths-list.tsx, recommendation-list.tsx
  • src/components/email/email-accounts-list.tsx, email-threads-list.tsx
  • src/components/reports/reports-list.tsx
  • src/components/admin/document-templates/template-list.tsx, forms/form-template-list.tsx, roles/role-list.tsx, tags/tag-list.tsx, ports/port-list.tsx

Fix: add cardRender mirroring desktop columns. UserCard/ClientCard/InterestCard are good templates.

H3 — User settings phone field is unbound on load

src/components/settings/user-settings.tsx:69-92. loadProfile() reads firstName, lastName, email, etc., but never reads phone into state. Yet saveProfile() at line 143 sends phone: phone || null, which clears the user's stored phone on every save. Also, the country as never cast at line 298 is unsound: when no country is selected the PhoneInput shows a US flag even for European users. Fix: add phone to MeResponse + setPhone(res.data.profile?.phone ?? ''). Store country alongside phone (the PhoneInput value is {e164, country}, so persist the parsed country).

H4 — UserPermissionMatrix three-state toggle has no a11y semantics

user-permission-matrix.tsx:247-267 — three sibling <button> elements with no role="radiogroup"/role="radio"/aria-checked. Screen readers announce "button, Grant" with no indication which is selected, what the options are, or that they're mutually exclusive. Also no focus ring on the active option. Fix: wrap in <div role="radiogroup" aria-label={${action} permission}> and set role="radio" aria-checked={state === opt} on each. Or use Radix RadioGroup for keyboard arrow navigation.
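A framework-free sketch of the radio semantics described above, assuming the three states are named inherit, grant, and deny. The radioAttrs and nextOption helpers are hypothetical; Radix RadioGroup provides equivalent behaviour out of the box.

```typescript
type OverrideState = 'inherit' | 'grant' | 'deny';
const OPTIONS: readonly OverrideState[] = ['inherit', 'grant', 'deny'];

interface RadioAttrs {
  role: 'radio';
  'aria-checked': boolean;
  tabIndex: 0 | -1;
}

// Attributes to spread onto each option button inside a role="radiogroup" wrapper.
// Only the selected option is tabbable (roving tabindex).
function radioAttrs(current: OverrideState, option: OverrideState): RadioAttrs {
  const selected = current === option;
  return { role: 'radio', 'aria-checked': selected, tabIndex: selected ? 0 : -1 };
}

// Arrow-key navigation: step to the next or previous option, wrapping at the ends.
function nextOption(current: OverrideState, direction: 1 | -1): OverrideState {
  const i = OPTIONS.indexOf(current);
  return OPTIONS[(i + direction + OPTIONS.length) % OPTIONS.length] as OverrideState;
}
```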

H5 — Login page form errors not associated with inputs

src/app/(auth)/login/page.tsx:84-119. <p className="text-sm text-destructive">{errors.identifier.message}</p> is rendered after the input, but the <Input> has no aria-describedby pointing at it and no aria-invalid={!!errors.identifier}. Same for password. Screen readers won't read the error message when focus lands on the input. Fix: give each error <p id="identifier-error">, add aria-describedby={errors.identifier ? 'identifier-error' : undefined} and aria-invalid={!!errors.identifier} on the Input.

H6 — Desktop sidebar nav lacks aria-current="page"

src/components/layout/sidebar.tsx:177-201 (NavItemLink) — uses active for visual styling but doesn't set aria-current on the <Link>. Mobile bottom tabs already do this (mobile-bottom-tabs.tsx:85). Screen-reader users cannot identify the current page in the desktop sidebar. Fix: aria-current={active ? 'page' : undefined} on the <Link>.


MEDIUM

M1 — Berth status pills use ad-hoc Tailwind colors instead of StatusPill

src/components/berths/berth-columns.tsx:117-119, berth-card.tsx:21-23, berth-detail-header.tsx:90 use bg-green-100 text-green-800, bg-yellow-100, bg-red-100. The codebase has StatusPill (src/components/ui/status-pill.tsx) with semantic tokens (success-bg, warning-bg, error-bg) already used by docs/reservations. Berth statuses (available/under_offer/sold) map cleanly to active/expired/rejected pill states. Fix: replace ad-hoc badges with <StatusPill status={…}> and extend statusPillVariants if a new tone is needed.

M2 — UserList "Active"/"Disabled" badge inconsistent with StatusPill convention

src/components/admin/users/user-list.tsx:104-115 uses <Badge variant="default" className="bg-green-600"> with an inline green override and <Badge variant="destructive">. The Power/PowerOff icons and ShieldCheck/ShieldOff icons (also lines 107/112) lack aria-hidden; text is present, so it's not blocking, just inconsistent. Fix: use <StatusPill status="active">Active</StatusPill> / <StatusPill status="archived">Disabled</StatusPill>; add aria-hidden to all decorative lucide-react icons in the table.

M3 — Only 5 of 73 dashboard routes have a loading.tsx

Only clients/[clientId], invoices, expenses, admin/errors, admin/errors/[requestId] have route-level loading skeletons. The rest fall back to a blank flash. Lists/details that fetch via React Query show a skeleton inside the component, but full-page navigations show nothing. Fix: add loading.tsx per route segment that returns a <Skeleton> matching the page chrome (sidebar/topbar already render via the layout).

M4 — UserPermissionMatrix loading state uses text, not Skeleton

user-permission-matrix.tsx:193-197 renders "Loading permissions…" text. Other list/detail loaders in the app use <Skeleton> from @/components/ui/skeleton. Adds inconsistency. Fix: replace with a Skeleton grid mirroring the accordion shape.

M5 — Settings transient messages persist forever instead of toasting

user-settings.tsx lines 167, 184, 197 (usernameMsg, emailMsg, resetMsg) — these useState strings stay rendered as a <span> next to their button indefinitely. Login uses toast.error(); reset-password and other auth surfaces also use sonner. Fix: swap to toast.success() / toast.error(). Removes stale messages and the inconsistency between auth and settings.

M6 — Email-or-username Login input: visible placeholder collides with sr-only space

src/app/(auth)/login/page.tsx:93. placeholder="you@example.com or yourname" contains two literal spaces. Mac VoiceOver reads "you at example dot com or yourname", which is fine, but the double space is sloppy formatting. Also, the placeholder duplicates the Label "Email or username", and a placeholder is unreliable for instructions (it clears on focus). Fix: single-space the placeholder, or move the format hint into a <p id="identifier-hint" className="text-xs text-muted-foreground"> and wire aria-describedby.

M7 — User settings username card: client-side pattern validation never surfaces inline

user-settings.tsx:359-386. pattern="^[a-z0-9._-]{2,30}$" is set on the input. HTML5 validation only fires on form submit (this isn't inside a <form>); the Save button is a plain <Button onClick>. So invalid input only fails server-side with a generic 400. No aria-describedby pointing at the helper text (line 382-385). Fix: add a zod-resolved react-hook-form mini-form OR validate on blur and show inline error; wire aria-describedby="username-help".
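For the validate-on-blur option, a minimal sketch that reuses the pattern quoted above. The validateUsername helper and its error messages are hypothetical; only the regular expression comes from the input's pattern attribute.

```typescript
// Same pattern as the input's pattern attribute and the DB CHECK.
const USERNAME_PATTERN = /^[a-z0-9._-]{2,30}$/;

// Returns an inline error message, or null when the value is acceptable.
function validateUsername(value: string): string | null {
  if (value.length === 0) return 'Username is required.';
  if (!USERNAME_PATTERN.test(value)) {
    return 'Use 2-30 lowercase letters, digits, dots, underscores, or hyphens.';
  }
  return null;
}
```

Calling this from the input's onBlur handler and rendering the result in the element referenced by aria-describedby would surface the error before the request reaches the server.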

M8 — UserForm Tabs: focus does not follow tab switch & no dirty-tab indicator

user-form.tsx:194-212 — switching from Profile to Permissions doesn't move keyboard focus to the matrix; switching back loses scroll position. The Permissions tab trigger is disabled for new-user mode (correct) but has no tooltip explaining why. Fix: Radix Tabs handles focus by default; verify and add a title / aria-describedby on the disabled trigger with explanation. Add a small "•" dot on the trigger label when overrides are dirty (depends on C2 fix).

M9 — Email confirmation AlertDialog in UserForm: default focus + return focus

user-form.tsx:362-387 — opens on submit. Radix returns focus to the submit button after close (good), but the dialog's <AlertDialogAction> triggers persist() without disabling itself during the network call; rapid double-click can fire two PATCHes. Also disabled={loading} is set on action but not on <AlertDialogCancel> re-enable timing. Fix: add a submitting guard or rely on existing loading state for both buttons; close dialog only after persist() resolves.

M10 — Decorative icons missing aria-hidden

Across user-list.tsx, user-card.tsx, documents-hub.tsx, berth-status-suggestion-dialog.tsx, status pills with <ShieldCheck>, <Power>, <PowerOff>, <Globe>, etc., the icons supplement text — they should carry aria-hidden="true" so screen readers don't double-announce. Mixed across the codebase; some lucide imports get it, most don't.

M11 — Drawer vs Sheet usage drift

src/components/clients/client-interests-tab.tsx:217 uses Vaul <Drawer> for an interest preview, while every other detail-preview surface (yacht preview, company preview, reservation preview) uses <Sheet>. Vaul drawers are intended for mobile bottom-sheets; using it for an inline preview on desktop is inconsistent. Fix: standardize on <Sheet side="right"> for desktop right-rail previews; reserve <Drawer> for the mobile More menu (more-sheet.tsx).


LOW

L1 — user-permission-matrix.tsx:264 button label cosmetic uppercase done inline

{opt[0]!.toUpperCase() + opt.slice(1)} works, but the non-null ! and the inline transform inside JSX are brittle. Consider an OPTION_LABELS constant.
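A sketch of that constant, assuming the option values are the lowercase inherit, grant, and deny strings implied by the toggle:

```typescript
type OverrideOption = 'inherit' | 'grant' | 'deny';

// Explicit labels replace the inline capitalisation; the Record type makes a
// missing label a compile-time error when a new option is added.
const OPTION_LABELS: Record<OverrideOption, string> = {
  inherit: 'Inherit',
  grant: 'Grant',
  deny: 'Deny',
};

function optionLabel(opt: OverrideOption): string {
  return OPTION_LABELS[opt];
}
```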

L2 — UserList action column uses title="…" instead of accessible tooltip

user-list.tsx:135,147,180 — relies on native browser tooltips. They don't appear on touch and don't surface to screen readers; the <span className="sr-only"> carries the label which is correct, but consider Radix Tooltip for parity with the rest of the app.

L3 — Login page brand color hardcoded

src/app/(auth)/login/page.tsx:106,123. #007bff / #0069d9 hex values are hardcoded instead of using the brand-500 / brand-600 design tokens. Same issue in sidebar.tsx:190,196,379 (#3a7bc8).

L4 — formatAction duplicated locally in matrix instead of in constants.ts

user-permission-matrix.tsx:100-102 re-implements the title-case replace. Move to constants.ts as formatPermissionAction (used in 3+ files: role-list.tsx, role-form.tsx, matrix).

L5 — Hard-coded "border-amber-300 bg-amber-50" warning callouts (15+ sites)

Across bulk-archive-wizard.tsx, hard-delete-dialog.tsx, smart-archive-dialog.tsx, smart-restore-dialog.tsx, pdf-reconcile-dialog.tsx, user-settings.tsx:321, etc. Need a shared <Callout tone="warning|info|danger|success"> primitive that reads from design tokens.


Verified OK

  • Form helper coverage: react-hook-form + zodResolver, PhoneInput, CountryCombobox, TimezoneCombobox, InlineEditableField, InlineTagEditor are present and used consistently in client/yacht/company/interest forms.
  • parseBody + errorResponse envelope convention holding for new endpoints checked.
  • ConfirmationDialog correctly returns focus and traps focus via Radix AlertDialog.
  • StatusPill is the right primitive; just under-adopted (M1, M2).
  • Mobile bottom tabs handle aria-current correctly (template for H6).
  • UserCard already adds aria-label="Actions for ${displayName}" on the icon-only MoreHorizontal trigger.

3. Data model + migrations + relations audit (data-model-auditor)

Data Model + Migrations + Relations Audit

Scope: src/lib/db/schema/*.ts (24 files) and migrations 0000-0055. ~92 tables, multi-tenant on port_id. Drizzle ORM + postgres-js.


CRITICAL

C1 — No prod migration runner; 0052 uses CREATE INDEX CONCURRENTLY

package.json exposes only db:generate / db:push / db:studio. There is no db:migrate script, no usage of drizzle-orm/postgres-js/migrator, and no in-repo SQL replay loop. The numbered SQL files are applied by hand via psql (implicitly). 0052_audit_critical_fixes.sql runs CREATE INDEX CONCURRENTLY for six composite indexes and its header explicitly forbids wrapping in BEGIN/COMMIT — anyone running it via Drizzle's default migrator (which wraps each file in a single tx) or psql -1 will see it abort silently. The aggregated-projection queries on files/documents then fall back to seq scans in prod. Action: ship a real prod migrator that respects per-file transaction hints, or split 0052 into pre/post files, and document the runbook in CLAUDE.md.
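One way a runner could respect per-file transaction hints is to classify each SQL file before executing it. This is a sketch under stated assumptions: the "-- migrate:no-transaction" header is a hypothetical convention that does not exist in the repo, and the CONCURRENTLY detection is a plain regular expression rather than a SQL parser.

```typescript
// Decide whether a migration file must run outside BEGIN/COMMIT.
// PostgreSQL rejects CREATE INDEX CONCURRENTLY inside a transaction block.
function mustRunOutsideTransaction(sql: string): boolean {
  // Hypothetical opt-out header; not an existing repo convention.
  if (/^\s*--\s*migrate:no-transaction\b/im.test(sql)) return true;
  // Strip line comments so a commented-out statement does not trigger the rule.
  const withoutComments = sql.replace(/--.*$/gm, '');
  return /\b(CREATE|DROP)\s+(UNIQUE\s+)?INDEX\s+CONCURRENTLY\b/i.test(withoutComments);
}

interface MigrationPlan {
  file: string;
  transactional: boolean;
}

// Order files by their numeric prefix (via filename sort) and tag each one.
function planMigrations(files: { name: string; sql: string }[]): MigrationPlan[] {
  return [...files]
    .sort((a, b) => a.name.localeCompare(b.name))
    .map((f) => ({ file: f.name, transactional: !mustRunOutsideTransaction(f.sql) }));
}
```

The executor would then wrap only the transactional entries in a single transaction and run the others statement by statement.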

C2 — db:push skips two structural constraints

Both are flagged in source comments:

  1. berths.current_pdf_version_id → berth_pdf_versions.id FK (circular dep, set up by 0030).
  2. system_settings_key_port_idx NULLS NOT DISTINCT flag (0047) — required so global settings with port_id IS NULL are unique by key alone.

A fresh-deploy or developer onboarding via db:push produces a structurally divergent DB: dangling pointers on the active berth column, and silent duplicate global (key, NULL) settings accumulating over time. Action: post-push reconciler, or kill db:push for prod and rely solely on the SQL files.


HIGH

H1 — New user_permission_overrides.user_id lacks any FK

Migration 0055 declares user_id TEXT NOT NULL with no REFERENCES "user"(id). Compare portRoleOverrides (cascades on both port_id and role_id). Deleting a user leaves orphaned override rows; a future user.id collision (e.g. re-creating a user with the same id via fixture seed) re-applies them. Same pattern on userPortRoles.userId. The broader codebase treats better-auth user IDs as opaque strings deliberately (~17 columns), but this is a brand-new CRM-owned table where a real FK was straightforward. Action: ship 0056 adding FOREIGN KEY (user_id) REFERENCES "user"(id) ON DELETE CASCADE to user_permission_overrides and user_port_roles.

H2 — Nullable client FKs without set null block hard-delete

documents.clientId (line 72), files.clientId (line 30), email_threads.clientId, formSubmissions.clientId, documentTemplates.sourceFileId, generatedReports.fileId — nullable, declared .references(...), no onDelete. The new admin.permanently_delete_clients permission will fail with FK violation on any client with attached files/documents. The aggregated projection already preserves history via FK snapshots, so ON DELETE SET NULL is the documented intent. Action: add onDelete: 'set null' + a 0056 migration. Same shape applies to berthReservations notNull parents (berth_id, port_id, client_id, yacht_id) which have no onDelete declared in Drizzle (Drizzle emits NO ACTION — correct behavior but inconsistent with the explicit audit pattern in 0042).

H3 — yachts.current_owner_id (and friends) are polymorphic, unconstrained

The current_owner_type discriminator has the 0036 CHECK; the paired current_owner_id has no guarantee the referenced client/company row exists. Same hole on yacht_ownership_history.owner_id, invoices.billing_entity_id, audit_logs.entityId, notifications.entityId. The owner-resolver returns null for missing rows, but direct reads (audit dossier, ownership history rendering) trust the id. Action: daily reconciler reading (owner_type, owner_id) pairs against the discriminator's target table, surfacing orphan counts in the admin inspector.
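The core of such a reconciler is a set-membership check per discriminator value. The sketch below assumes the (owner_type, owner_id) pairs and the existing ids have already been loaded into memory; the type and function names are hypothetical, and a real job would more likely run this as an anti-join in SQL.

```typescript
type OwnerType = 'client' | 'company';

interface OwnerRef {
  rowId: string; // id of the row holding the polymorphic pointer (e.g. a yacht)
  ownerType: OwnerType;
  ownerId: string;
}

// Return every reference whose target row is missing from the discriminator's table.
function findOrphans(refs: OwnerRef[], existing: Record<OwnerType, Set<string>>): OwnerRef[] {
  return refs.filter((ref) => !existing[ref.ownerType].has(ref.ownerId));
}

// Orphan counts per owner type, suitable for an admin inspector tile.
function orphanCounts(orphans: OwnerRef[]): Record<OwnerType, number> {
  const counts: Record<OwnerType, number> = { client: 0, company: 0 };
  for (const o of orphans) counts[o.ownerType] += 1;
  return counts;
}
```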

H4 — Migration 0042's billing_entity_id backfill is a tombstone

UPDATE invoices SET billing_entity_id = COALESCE(NULLIF(client_name, ''), id) WHERE billing_entity_id = '' writes a clientName string as if it were an entity id. The CHECK billing_entity_id <> '' passes, but downstream billing_entity_type='client' resolution returns null forever for these rows. The fix is right (won't fail the migration) but no follow-up tooling logs the tombstones. Action: count post-0042 rows where resolver returns null and expose in the admin inspector.

H5 — System-folder write protection is service-only

assertNotSystemManaged lives in the folders service. Nothing at the DB level rejects UPDATE document_folders SET name='x' WHERE system_managed=true. The 0052-tightened chk_system_folder_shape constrains shape but not write-access. One careless db.update away from breaking the system roots invariant.


MEDIUM

M1 — Missing partial indexes on archived_at

0046 partial-archived indexes covered clients, interests, yachts, residential_clients, residential_interests. Missing: companies.archivedAt (filtered in companies.service), document_folders.archived_at (filtered in hub list queries). Volume is low so it's M, not H.

M2 — userPortRoles allows multiple roles per (user, port)

Unique index is on (userId, portId, roleId) — two role rows for the same (user, port) are permitted. getEffectivePermissions reads findFirst without an ORDER BY and silently picks one. Either tighten to (userId, portId) or union-OR the permissions across all assigned roles.
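If the union-OR route is chosen, the merge could look like the sketch below. It assumes permissions are a resource → action → boolean map; the Permissions type and unionPermissions helper are hypothetical and do not reflect how getEffectivePermissions is written today.

```typescript
type Permissions = Record<string, Record<string, boolean>>;

// An action is allowed if any assigned role allows it.
function unionPermissions(roles: Permissions[]): Permissions {
  const merged: Permissions = {};
  for (const role of roles) {
    for (const [resource, actions] of Object.entries(role)) {
      const target = merged[resource] ?? (merged[resource] = {});
      for (const [action, allowed] of Object.entries(actions)) {
        target[action] = (target[action] ?? false) || allowed;
      }
    }
  }
  return merged;
}
```

Because the result no longer depends on which row findFirst happens to return, the missing ORDER BY stops mattering.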

M3 — interest_berths.berth_id is restrict with no UI escape hatch

onDelete: 'restrict' is the right protective behaviour, but admins hard-deleting a berth hit a raw FK error message. Offer a "detach this berth from N interests" admin button before delete, or soften to set null with a service-side warning.

M4 — audit_logs.searchText (tsvector) lacks a GIN index in Drizzle

The column is declared but only btree indexes appear in the Drizzle table definition. Confirm 0044 (or earlier) ships USING gin (search_text) — if absent, FTS scans linearly. Action: verify and add GIN if missing.

M5 — Username docstring drift

user_profiles.username CHECK is ^[a-z0-9._-]{2,30}$ (matches the validator), but the TS docstring (src/lib/db/schema/users.ts:249) says "3-30 chars". Cosmetic.

M6 — Polymorphic CHECK coverage gap on document_folders.entity_type

The CHECK round (0036+0042) covered yachts.current_owner_type, invoices.billing_entity_type, yacht_ownership_history.owner_type, document_sends.document_kind. Missing: a constraint that document_folders.entity_type IN ('root','client','company','yacht') for user-created folders. chk_system_folder_shape only fires when system_managed = true.

M7 — JSONB blobs without DB-level validators

system_settings.value, audit_logs.metadata/oldValue/newValue, notifications.metadata, savedViews.filters/sortConfig/columnConfig, berth_pdf_versions.parseResults. The permission-overrides PUT route is well sanitized (ALLOWED_RESOURCE_ACTIONS allow-list before write). userProfiles.preferences is validated and 8KB-capped at the API. The others rely on per-caller validators only.

M8 — scratchpadNotes.linkedClientId crosses ports without enforcement

Notes are user-scoped (no portId), but the linked client lives in a port. A user reassigned between ports could open stale notes pointing at clients in a port they no longer access. UI port-scoped queries hide them, but raw API exposure does not.

M9 — 0027 nationality-ISO backfill is non-idempotent on dirty data

Re-running after manual edits overwrites the nationality_iso column. CLAUDE.md notes the last_imported_at guard for berths (0024/0034 mooring normalization) but 0027 has no such guard.

M10 — currency_rates has no retention

(base, target) is the only unique index; daily polling accumulates rows forever. Low-priority (daily volume is small).


Migration replayability — verdict

Idempotency is strong across 0036+: DO $$ … EXCEPTION WHEN duplicate_object blocks, IF NOT EXISTS on every CREATE INDEX, NOT VALID + VALIDATE pattern in 0042/0044/0052. The 0028→0029 split (data move then DROP interests.berth_id) is correct. 0046 DROP IF EXISTS + CREATE IF NOT EXISTS is correct. The 0050/0051/0052 folder-lifecycle chain forms a clean migration sequence with the shape CHECK tightened in the right order.

The single replayability cliff is C1 above: 0052 + the absent migrator.


Partial-unique indexes — all verified present

| Constraint | Index | Source |
| --- | --- | --- |
| one primary berth per interest | idx_ib_one_primary WHERE is_primary | interests.ts |
| one default brochure per port (non-archived) | idx_brochures_one_default_per_port WHERE is_default AND archived_at IS NULL | brochures.ts |
| username case-insensitive | idx_user_profiles_username_unique ON LOWER(username) WHERE NOT NULL | 0054 |
| one open alert per fingerprint | idx_alerts_fingerprint_open WHERE resolved_at IS NULL | insights.ts |
| one active yacht owner | idx_yoh_active WHERE end_date IS NULL | yachts.ts |
| one primary contact per (client, channel) | idx_cc_one_primary_per_channel WHERE is_primary | clients.ts |
| one active reservation per berth | idx_br_active WHERE status='active' | reservations.ts |
| one subfolder per entity per port | uniq_document_folders_entity WHERE entity_id IS NOT NULL | documents.ts (0051) |
| one global setting per key (NULLS NOT DISTINCT) | system_settings_key_port_idx | 0047 (see C2) |
| one primary client address | idx_ca_primary WHERE is_primary | clients.ts |
| one primary company address | idx_compa_primary WHERE is_primary | companies.ts |

Summary

  • CRITICAL (2): no prod migration runner for 0052's CONCURRENTLY indexes; db:push skips two structural constraints (circular FK, NULLS NOT DISTINCT).
  • HIGH (5): missing FK on new user_permission_overrides.user_id; nullable client FKs without set null block hard-delete; polymorphic owner_id un-validated; 0042 billing_entity_id tombstones invisible; system-folder write-protection is service-only.
  • MEDIUM (10): missing partial indexes on companies + document_folders; userPortRoles allows duplicate roles; interest_berths restrict has no UI escape; audit_logs FTS GIN to verify; misc docstring drift, polymorphic CHECK gap on folders, JSONB writes without DB validators, scratchpad cross-port, 0027 idempotency, currency_rates retention.

Recommended sequencing: ship a real prod migration runner (C1), then a 0056 follow-up that closes H1 + H2 (the FK on user_permission_overrides.user_id and onDelete: 'set null' on the nullable client FKs).


4. Services + realtime + queue + storage audit (services-auditor)

Services + Realtime + Queue + Storage Audit

Scope: business-logic correctness, webhook idempotency, BullMQ workers, Socket.IO fan-out, storage backend, cross-entity port isolation, the just-added notifyAdminEmailChange helper.

Repo: new-pn-crm @ feat/documents-folders. Audit window: ~22 min. Read-only.


CRITICAL

C1. updateUserInPort email-change bypasses Better Auth account row

Files: src/lib/services/users.service.ts:236-262, 355-387

db.update(user).set({ email: ... }) writes the new email directly to the Better Auth user table. The Better Auth account table (src/lib/db/schema/users.ts:194-210, providerId='credential') carries an accountId column that is typically the user's email — used by Better Auth's password-login flow to resolve a credential row. The update does NOT touch account.accountId, does NOT invalidate active sessions, does NOT update account.updatedAt, and does NOT use Better Auth's admin API (auth.api.updateUser / setEmail). Failure modes:

  • After cutover the user cannot sign in with the new email (Better Auth resolves the credential by old accountId).
  • Existing sessions (cookie keyed to userId) continue to work with the new email already showing in profile — confusing UX, no forced re-auth.
  • The whole flow runs outside any transaction: the userProfiles update (line 230), user update (line 247), userPortRoles update (line 281), audit log, and notification-fire are five independent writes. A failure between them leaves partial state with no rollback.
  • No idempotency under retry: there is no guard that the email actually differs from the current account.accountId, and the email-change notification is fire-and-forget — a retried admin request re-fires the courtesy email and rewrites all rows.

Fix: route through auth.api.updateUser (or write account.accountId + bump session invalidation) and wrap in a transaction.

C2. handleDocumentCompleted orphan blobs on mid-flight failure

File: src/lib/services/documents.service.ts:1100-1253

The idempotency early-return (doc.status === 'completed' && doc.signedFileId) only fires when both flags are set. The sequence is:

  1. downloadSignedPdf (line 1120) — may throw.
  2. storage.put(storagePath, signedPdfBuffer) (line 1134) — succeeds → blob exists.
  3. ensureEntityFolder (line 1148) — best-effort.
  4. db.insert(files) (line 1166) — succeeds → file row exists pointing at blob.
  5. db.update(documents).set({status:'completed', signedFileId}) (line 1185) — if this fails (e.g. transient connection loss after files insert), the document keeps signedFileId = NULL.

On the retry from Documenso, the early-return short-circuit is bypassed (signedFileId still NULL). The function re-downloads, re-generates a new UUID (crypto.randomUUID() at line 1131), re-puts to a new key, inserts a second files row, and only then updates the document. The first blob from step 2 + the first files row are now orphaned (unreachable via document, but the file row still exists and may surface in aggregated listings with no docs link).

Additionally, the catch block (line 1244) marks status='completed' with no signedFileId, so the document is presented to the rep as "complete" while the signed PDF was never persisted. Subsequent webhook retries will re-run the handler (the early-return does not fire), but if Documenso stops retrying after the Nth attempt, the document is permanently stuck "completed with no file."

Fix options: (a) wrap files.insert + documents.update in one transaction; (b) delete the blob in the catch when the file row insert succeeds but the document update fails; (c) refuse to mark status='completed' in the catch — leave as-is so the next retry / cron poll succeeds.


HIGH

H1. Notification email XSS (unescaped HTML interpolation)

File: src/lib/queue/workers/notifications.ts:65-71

`<p>${notif.description ?? notif.title}</p>${
  notif.link ? `<p><a href="${process.env.APP_URL}${notif.link}">View in CRM</a></p>` : ''
}`;

notif.description, notif.title, and notif.link are interpolated into HTML with no escaping. notif.link is mostly internal-generated (/documents/{id}) but several call sites push user-derived values into description (filenames, client names, custom alert text). A description of <img src=x onerror=...> ships as live HTML to the recipient's inbox. Lower-severity than C1 because most notifications are admin-only and the recipient is internal staff, but still an XSS-via-email primitive. Use the same renderEmailBody (allowlist) helper the send-out flow uses.
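A minimal sketch of the escaping step, assuming plain entity-escaping is acceptable where the allowlist helper is not wired in. escapeHtml, renderNotificationHtml and the NotificationLike shape are hypothetical names for illustration, not the repo's renderEmailBody:

```typescript
// Hypothetical helper: escapes the five HTML-significant characters.
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Assumed minimal shape of a notification row (illustrative only).
interface NotificationLike {
  title: string;
  description?: string | null;
  link?: string | null;
}

function renderNotificationHtml(notif: NotificationLike, appUrl: string): string {
  const body = escapeHtml(notif.description ?? notif.title);
  // Only app-relative paths ("/x", not "//host") are turned into a link.
  const safeLink =
    notif.link && /^\/(?!\/)/.test(notif.link) ? escapeHtml(notif.link) : null;
  const linkHtml = safeLink
    ? `<p><a href="${escapeHtml(appUrl)}${safeLink}">View in CRM</a></p>`
    : "";
  return `<p>${body}</p>${linkHtml}`;
}
```

With this shape, a description such as `<img src=x onerror=...>` reaches the inbox as visible text rather than markup, and a protocol-relative link is dropped instead of rendered.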

H2. expense-dedup.markBestDuplicate lost-update race

File: src/lib/services/expense-dedup.service.ts:58-73

scanForDuplicates returns candidates, then markBestDuplicate writes duplicateOf. Two concurrent dedup-engine runs on a pair (A,B) can each mark the other as the duplicate → mutual duplicateOf cycle, both archived later by mergeDuplicate. No advisory lock, no transaction encompassing scan + update. Also: scanForDuplicates does not filter archived_at IS NULL, so already-merged sources can resurface as candidates.

H3. notes.service dead-code dispatch helper

File: src/lib/services/notes.service.ts:80-98

tableForEntity is defined and immediately void-discarded — every CRUD branch inlines its own switch. New entity types (e.g. residential_clients) added to the type union are silently missed by inlined branches because the exhaustive-switch compiler check is absent. This is the actual drift-vector for the polymorphic dispatch CLAUDE.md called out. Either delete the helper or refactor every CRUD operation to go through it.
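The compiler check the finding refers to can be sketched as below. The entity union and the table names are placeholders chosen for illustration, not the repo's actual values:

```typescript
// Hypothetical entity union; the real one lives in notes.service.ts.
type NoteEntity = "client" | "interest" | "yacht" | "company";

// Accepting `never` makes the call a compile error as soon as a new
// union member is added without a matching case.
function assertNever(value: never): never {
  throw new Error(`Unhandled entity type: ${String(value)}`);
}

// Illustrative single dispatch point; table names are placeholders.
function tableNameForEntity(entity: NoteEntity): string {
  switch (entity) {
    case "client":
      return "client_notes";
    case "interest":
      return "interest_notes";
    case "yacht":
      return "yacht_notes";
    case "company":
      return "company_notes";
    default:
      return assertNever(entity);
  }
}
```

If every CRUD branch goes through one function like this, adding a member to the union fails type-checking at the default branch instead of being silently skipped.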

H4. Socket-server max-connections race

File: src/lib/socket/server.ts:103-106

const userSockets = await io!.in(`user:${session.user.id}`).fetchSockets();
if (userSockets.length >= 10) return next(new Error('Maximum connections reached'));

Between fetchSockets() and the eventual socket.join(user:${userId}) at line 132, another concurrent handshake can pass the same check. Under burst reconnect (e.g. flaky network across many tabs), users get 11+ sockets. The Redis adapter's fetchSockets is multi-pod-aware, but the gating is not atomic. Use a Redis INCR keyed by user:${id}:conn_count with TTL fallback, decrement on disconnect.
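A rough sketch of the increment-then-check pattern suggested above. CounterStore is a hypothetical abstraction standing in for Redis INCR/DECR, and the in-memory class exists only so the sketch runs on its own. The TTL fallback is not modelled here:

```typescript
// Hypothetical abstraction over an atomic counter (Redis INCR/DECR in prod).
interface CounterStore {
  incr(key: string): Promise<number>;
  decr(key: string): Promise<number>;
}

// In-memory stand-in for the sketch. Unlike Redis DECR it clamps at zero.
class InMemoryCounterStore implements CounterStore {
  private counts = new Map<string, number>();

  async incr(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }

  async decr(key: string): Promise<number> {
    const next = Math.max(0, (this.counts.get(key) ?? 0) - 1);
    this.counts.set(key, next);
    return next;
  }
}

const MAX_CONNECTIONS_PER_USER = 10;

// Increment first, then check: the increment itself is the reservation,
// so two concurrent handshakes can never both observe "9".
async function tryAcquireSlot(
  store: CounterStore,
  userId: string,
  max: number = MAX_CONNECTIONS_PER_USER,
): Promise<boolean> {
  const key = `user:${userId}:conn_count`;
  const count = await store.incr(key);
  if (count > max) {
    await store.decr(key); // roll back the reservation we could not keep
    return false;
  }
  return true;
}

// Call from the socket's disconnect handler.
async function releaseSlot(store: CounterStore, userId: string): Promise<void> {
  await store.decr(`user:${userId}:conn_count`);
}
```

The difference from the fetchSockets() gate is that there is no window between reading the count and claiming the slot.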

H5. Documenso webhook timing side-channel

File: route.ts:60-68 + documenso-webhook.ts:13-21

verifyDocumensoSecret short-circuits on length !== expected.length before timingSafeEqual. Combined with the linear scan across all per-port secrets, response-time deltas leak the number of ports and the length of each secret. Marginal but easy fix: pad to fixed size.
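One way to get fixed-size inputs is to hash both sides before comparing, which is a substitute for literal padding rather than what the current code does. In the sketch below, secretsMatch, matchAnySecret and SecretEntry are hypothetical names. createHash and timingSafeEqual are the standard node:crypto functions:

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// SHA-256 always yields 32 bytes, so the comparison below never sees
// inputs of different lengths.
function sha256(value: string): Buffer {
  return createHash("sha256").update(value, "utf8").digest();
}

function secretsMatch(provided: string, expected: string): boolean {
  return timingSafeEqual(sha256(provided), sha256(expected));
}

// Assumed shape of a per-port secret entry (illustrative only).
interface SecretEntry {
  portId: string | null;
  secret: string;
}

// Compares against every candidate without breaking out of the loop,
// so elapsed time does not reveal which position matched.
function matchAnySecret(provided: string, candidates: SecretEntry[]): SecretEntry | null {
  let matched: SecretEntry | null = null;
  for (const candidate of candidates) {
    const ok = secretsMatch(provided, candidate.secret);
    if (ok && matched === null) matched = candidate;
  }
  return matched;
}
```

This removes the length and match-position signals. Total time still scales with the number of candidates, so the port count is not hidden by this alone.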

H6. Global Documenso secret silently drops events under multi-tenant ambiguity

File: src/lib/services/documents.service.ts:967-996

resolveWebhookDocument correctly refuses to mutate when documensoId matches multiple ports AND no portId was passed. The webhook route now resolves portId from the matched secret (good — see comment at line 138-143). But the global env.DOCUMENSO_WEBHOOK_SECRET fallback entry returns portId: null (port-config.ts:370), and any port still using the global secret falls back to the "ambiguous → refuse" path. Result: if two ports share the global secret, valid completion events get silently dropped instead of routed. The dedupe + dead-letter on the inbound side doesn't surface this — it just looks like Documenso never delivered. Recommend: require per-port secrets for production and warn loudly when more than one port resolves to portId: null.


MEDIUM

M1. Storage migration loads each blob fully into Node memory

File: src/lib/storage/migrate.ts:170-204 (copyAndVerify)

for await (chunk of stream) { chunks.push(chunk) } materializes the full blob in memory twice (source read + verify re-read) per file. A 200MB signed PDF or GDPR export blows the worker. Consider piping through crypto.createHash('sha256') + tee to the target backend instead of Buffer.concat. The pre-flight free-disk check (line 298-310) does Promise.all(refs.map(head)) for every blob in the table — for large files tables that's thousands of round-trips before any copy starts.

M2. archiveInterest next-in-line dossier outside transaction

File: src/lib/services/interests.service.ts:1067-1112

The IIFE that builds next-in-line notifications fires after softDelete(interests, ...) and evaluateRule — both already queued via void. If the IIFE throws after the interest is archived but before notifications send, only a logger.error lands; the archived interest stays archived with no rep notification. Acceptable as best-effort, but the dossier doesn't run inside the same audit-context request (the createAuditLog call happens earlier), so an operator reading the audit trail sees "archived" without seeing what notifications were attempted. Consider attaching the dossier result to the audit metadata.

M3. attachWorkerAudit always records portId: null

File: src/lib/queue/audit-helpers.ts:50-86

Every job-failure audit row is written with portId: null. Multi-port operators querying their port-scoped audit log will not see worker failures that affected their port (e.g. a documenso-void job carrying portId in job.data). The worker has access to job.data.portId for most queues — extract it where present.

M4. RECURRING_JOB_NAMES drift

File: src/lib/queue/audit-helpers.ts:27-48

Hardcoded Set requires manual sync against scheduler.ts. Typos silently demote cron heartbeats to regular completion logs. Either co-locate or compute from the scheduler module at boot.
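A sketch of the "compute from the scheduler" option, assuming the scheduler can export its definitions as a plain array. The job names and cron strings below are invented for illustration:

```typescript
// Hypothetical scheduler definitions; names and cron strings are placeholders.
const RECURRING_JOBS = [
  { name: "imap-sync", cron: "*/5 * * * *" },
  { name: "currency-rates-refresh", cron: "0 * * * *" },
  { name: "documenso-poll", cron: "*/10 * * * *" },
] as const;

// The union type is derived from the same array, so a typo at a call site
// is a compile error rather than a silently demoted heartbeat.
type RecurringJobName = (typeof RECURRING_JOBS)[number]["name"];

// Derived once at module load; there is no second list to keep in sync.
const DERIVED_RECURRING_JOB_NAMES: ReadonlySet<string> = new Set(
  RECURRING_JOBS.map((job) => job.name),
);

function isRecurringJob(name: string): name is RecurringJobName {
  return DERIVED_RECURRING_JOB_NAMES.has(name);
}
```

The audit helper would then import isRecurringJob instead of maintaining its own Set.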

M5. Aggregated workflow listing surfaces draft workflows

File: src/lib/services/documents.service.ts:1888

INFLIGHT_STATUSES = ['draft', 'sent', 'partially_signed'] includes draft. CLAUDE.md describes the UI section as "Signing-in-progress" — drafts have not been sent. Confirm intent.

M6. Documenso secrets stored plaintext in system_settings

File: src/lib/services/port-config.ts:351-373

listDocumensoWebhookSecrets reads systemSettings.value directly — no decryption. SMTP/IMAP passwords are AES-256-GCM-encrypted per CLAUDE.md; the Documenso webhook secret should be too.

M7. import worker is a no-op

File: src/lib/queue/workers/import.ts:13-17

process() body is // TODO(L2). Any job pushed to the import queue silently completes with no work — every CSV import is a silent success if the producer side ships first.


Observations on what is solid

  • handleDocumentCompleted idempotency gate (line 1110) is correct when reached. The hazard is the partial-write window above (C2), not the gate itself.
  • resolveWebhookDocument correctly refuses to mutate on multi-port ambiguity.
  • Socket auth middleware (server.ts:91-124) cross-checks the client-supplied auth.portId against userPortRoles — closes the prior tenant-room hijack.
  • Storage filesystem backend correctly refuses to start when MULTI_NODE_DEPLOYMENT=true (filesystem.ts:218) using the zod-validated env, not raw process.env.
  • Magic-byte verification is enforced both for brochures (brochures.service.ts:241-263) and berth PDFs (berth-pdf.service.ts:234-262) with delete-on-mismatch cleanup.
  • File-aggregation projection (files.ts:316-379, 526-579) applies port_id at the entry-point assert, on companies.port_id / clients.port_id / yachts.port_id joins, on files.port_id in the predicate, and on the documents LEFT JOIN's residual (line 567). Defense-in-depth is consistent.
  • Webhook worker has DNS-rebinding SSRF re-resolution at dispatch (webhooks.ts:18-45) and dead-letter handling with operator notifications.

Headline asks: C1 (Better Auth identity rotation), C2 (orphan-blob window), H1 (notification email XSS), H6 (global webhook secret ambiguity drops events silently).


5. Performance + code-trim + render-smoothness audit (perf-test-auditor)

Performance + Testing-Coverage Audit

Branch: feat/documents-folders · Scope: static analysis only. Numbers: 116 vitest files / 1293 tests · 33 smoke specs · 68 services files (15 with a unit-test file → 78 % of services have zero unit tests).


CRITICAL

C1. Zero test coverage for the user-mgmt + permission-override slice just shipped (commit 660553c)

git diff main adds: username sign-in, identifier resolver, per-user permission-override matrix, role-label rendering, search keyword index, user disable/email-change paths, dashboard widget toggles.

grep -rn 'username\|resolve-identifier\|permission-overrides' tests/ returns no matches. Not one smoke spec, not one integration test, not one unit test. The feature ships dark.

Highest-risk slices:

  • POST /api/auth/resolve-identifier: public, unauthenticated, rate-limited via a shared auth bucket. Anti-enumeration relies on a synthetic @auth.invalid fall-through. A wrong-shape regression here silently re-enables username enumeration. Needs a vitest test with hit/miss/empty/error paths.
  • PUT /api/v1/admin/users/[id]/permission-overrides: the schema allow-list (lines 47-80 of the route) is hand-maintained against RolePermissions. A drift here lets an admin grant themselves unlisted leaves. There's already an if (targetUserId === ctx.userId) self-target check; no test ensures it stays.
  • The UserPermissionMatrix is the only UI for the new overrides table and is not rendered by any spec.

→ Fix: add at minimum one smoke spec under tests/e2e/smoke/24-admin-features.spec.ts that logs in with username, opens an admin user, toggles a grant/deny, and reloads. Add a vitest test against resolve-identifier covering the four branches.

C2. Documents-hub aggregated projection runs 2 × (N companies + N yachts + N clients) sequential queries

src/lib/services/documents.service.ts:1923-1956 (workflow groups) and the file-aggregation cousin do this:

for (const {id, name} of related.companies) {
  const g = await fetchWorkflowGroupRows(portId, eq(documents.companyId, id));
  …
}

fetchWorkflowGroupRows itself issues a SELECT + a separate COUNT (2 round-trips). For a client with 5 companies + 5 yachts + 3 sibling clients, opening the Documents tab fires (5+5+3)×2 = 26 sequential queries on the inflight projection alone, plus another ~26 on the files-aggregated cousin (mentioned in CLAUDE.md), so ~50 sequential round-trips for a single tab open.

→ Fix: switch to a single SQL WHERE … IN (UNNEST(:companyIds)) GROUP BY :source_kind returning grouped rows + a count window, or at minimum Promise.all the per-id calls so latency is parallel.


HIGH

H1. listUsers is sequential and unbounded (no pagination)

src/lib/services/users.service.ts:16-104 — two sequential SELECTs (port-role rows then super-admin rows). Should be one query with a UNION/LEFT JOIN, or at minimum Promise.all. No limit/offset. For the multi-tenant install where a port could grow to thousands of users this becomes O(N) memory + payload per admin page open. GET /api/v1/admin/users also lacks pagination.

→ Fix: collapse to one SQL with LEFT JOIN userPortRoles … OR userProfiles.isSuperAdmin, add limit/offset, surface { data, total, hasMore }.

H2. DataTable rebuilds the columns array on every render

src/components/shared/data-table.tsx:109-137 constructs allColumns on every render with no useMemo. TanStack Table's docs explicitly warn this resets internal state (sorting, column resizing, virtual scrolling indices) every render. For the clients/interests lists with 50+ rows and 10+ columns this stalls every parent state change.

→ Fix: useMemo(() => […selectColumn?, …columns], [bulkActions, columns]).

H3. Recharts is statically imported in widget-registry.tsx — every dashboard chart ships in the initial bundle

src/components/dashboard/widget-registry.tsx:15-25 static-imports 7 chart files which in turn pull recharts (~80-150 KB gzipped). The registry is the only entry point for the dashboard so the first dashboard load pays the entire recharts cost even for users whose widgets are all hidden.

→ Fix: const PipelineFunnelChart = dynamic(() => import('./pipeline-funnel-chart').then(m => m.PipelineFunnelChart), { ssr:false, loading: () => <WidgetSkeleton/> }) per chart. Same fix for website-analytics/pageviews-chart.tsx.

H4. tiptap-to-pdfme.ts (571-line module) ships to the client just for TEMPLATE_VARIABLES

src/components/admin/document-templates/{template-form,template-preview}.tsx import TEMPLATE_VARIABLES from @/lib/pdf/tiptap-to-pdfme. The named import drags the whole module (~570 lines of TipTap→pdfme transform logic) into the client bundle even though only ~60 lines of constant data are used. The @pdfme/common import is type-only so that part is stripped, but the runtime code still ships.

→ Fix: split TEMPLATE_VARIABLES into a leaf file (@/lib/pdf/template-variables.ts) that has no other imports; have tiptap-to-pdfme.ts re-export it for server-side callers.

H5. notifications.service.ts:updatePreferences runs N sequential upserts in a loop

src/lib/services/notifications.service.ts:368-385 — one INSERT … ON CONFLICT per preference row. For ~30 notification types that's 30 round trips per "Save preferences" click. Trivially batchable as a single db.insert().values(rows).onConflictDoUpdate(…).

H6. GET .../permission-overrides chains 5 sequential round-trips

src/app/api/v1/admin/users/[id]/permission-overrides/route.ts:99-138 goes profile → portRole → role → portOverride → userOverride sequentially. Each is independent on the userId once profile is loaded; collapse the trailing four into Promise.all.

H7. command-search.tsx invalidates two query keys every time the dropdown opens

src/components/search/command-search.tsx:142-146:

useEffect(() => {
  if (!showDropdown) return;
  queryClient.invalidateQueries({ queryKey: ['search', 'recently-viewed'] });
  queryClient.invalidateQueries({ queryKey: ['search', 'recent-terms'] });
}, [showDropdown, queryClient]);

Each time the user clicks the search box, two queries refire. The useSearch hook already sets staleTime: 30_000 for these. Invalidating on every open defeats the staleTime entirely. Use the existing staleTime or refetchOnMount: 'always' for a single trigger.


MEDIUM

M1. UserPermissionMatrix re-creates setOverrides closures every render

src/components/admin/users/user-permission-matrix.tsx:158-169 defines setState (non-memoized) and passes it down inside .map() rows. The component itself is small (~180 leaves) so the impact is modest, but the 3-state buttons render 540 closures every save. Wrap setState/getState in useCallback, or pull them out as module-scope pure helpers taking (overrides, setOverrides).

M2. dashboard.service.ts:getPipelineForecast scans every active interest into memory

Lines 119-156 fetch every non-archived interest + primary berth and reduce in JS. Push to SQL with SUM(price * CASE WHEN stage = … END) GROUP BY pipeline_stage.

M3. documents.service.ts:listDocuments LEFT-JOINs documents on signed_file_id, but no pagination on folder views

grep indicates listDocuments has no limit/offset when folderId is set. A folder with 1000+ files would dump the whole set. Verify with a quick read; if missing, add the same { limit, offset } pattern used by /api/v1/clients.

M4. service tests gap: 53 of 68 service files have zero unit-test files

Service files with no test include several high-risk surfaces:

  • interests.service.ts (multi-berth, primary-flag invariants)
  • documents.service.ts (folder soft-rescue, owner-wins chain, system-folder lock)
  • document-folders.service.ts (cycle prevention, sibling uniqueness)
  • notifications.service.ts (preference dedupe key, watcher fan-out)
  • users.service.ts (createUser, role assignment, deactivate path)
  • dashboard.service.ts (forecast math, hot-deal rank)
  • client-merge.service.ts, client-hard-delete.service.ts, client-archive.service.ts (destructive paths — exactly the surfaces most worth testing)
  • interest-berths.service.ts (the "never query from outside this service" rule has no integration test enforcing the partial unique index logic)
  • email-accounts.service.ts (AES-256-GCM round-trip, no test ensures the decrypt path stays sound after a key rotation)
  • recently-viewed.service.ts, search.service.ts (search bucket expansion already partly tested but service-level branches missing)

Even one happy-path + one edge-case test per file would lift the coverage floor enormously.

M5. Playwright coverage gap matches the new flows verbatim

grep -rn 'username\|permission-overrides\|disable.*user\|email[-_]change\|widget.*toggle' tests/e2e/ returns no hits.

Specs that should exist but don't:

  • tests/e2e/smoke/01-auth.spec.ts: currently only tests email login; needs a username-only path.
  • tests/e2e/smoke/24-admin-features.spec.ts: needs a permission-override three-state toggle path and a user disable path.
  • tests/e2e/smoke/10-dashboard.spec.ts: needs a widget visibility toggle + reload assertion (the persistence path in user_profiles.preferences has only the strict-allow-list cap, no UI-level integration test).

M6. realtime-toasts.tsx was modified without a Playwright spec or vitest unit

Modified in this branch; no spec covers toast deduplication / port-id filtering. Realtime fan-out is a textbook noisy-neighbor surface — a regression here floods users with toast spam.

M7. interest-detail-header.tsx, yacht-tabs.tsx, user-card.tsx modifications

These three changed in this branch with no corresponding spec change. The smoke 02-crud-spine.spec.ts exercises the underlying CRUD but doesn't assert the new inline-edit visuals shipped in commit 04a5949.

M8. clients.service.ts:986 db.query.clients.findMany with no limit

The function looks like a "find all matching" helper. If it's reached by any non-internal call path it could dump every client. Worth a direct read and a limit arg.

M9. command-search.tsx paste handler awaits apiFetch inside onPaste synchronously

Lines 189-206 — onPaste is async and awaits apiFetch before e.preventDefault(). By the time the fetch resolves, the paste event has already been processed and the text dropped into the input. The preventDefault inside the if (res.found && res.href) block silently no-ops on most pastes. Either preventDefault unconditionally up-front, or read e.clipboardData and treat as plain "lookup + navigate" without trying to cancel.

M10. audit-search.service.ts:80 and gdpr-export.service.ts:266 use findMany without limit

Both are admin-only but a long-running port renders the audit page in tens of seconds.


Good news / verified safe

  • serverExternalPackages in next.config.ts keeps pino/bullmq/ioredis/minio/postgres/better-auth/nodemailer off the client.
  • NAV_CATALOG (175 entries) is only reached via dynamic import() from search.service.ts. Server-only, not in any client bundle.
  • lucide-react, pdfme/generator, pdf-lib, recharts are absent from RSC client boundaries except the dashboard widgets (H3).
  • lucide-react imports are all named (tree-shake safe).
  • No sync crypto, no sync PDF rendering in request handlers. JSON.parse only on cheap surfaces.
  • useSearch debounce + keepPreviousData + staleTime: 30s is correct.

6. Observability + i18n + docs-drift audit (obs-i18n-docs-auditor)

Observability + i18n + Docs-Drift Audit

Auditor: obs-i18n-docs-auditor • Branch: feat/documents-folders • Date: 2026-05-12 Scope: A) createAuditLog coverage, pino discipline, error-event pipeline; B) timezone / currency / country / date-picker; C) CLAUDE.md, BACKLOG.md, numbered specs, admin-search keywords, NAV_CATALOG hrefs.


CRITICAL

C1. Dead hrefs in the Cmd-K nav catalog

src/lib/services/search-nav-catalog.ts has three confirmed-dead entries; cmd-K search routes users to non-existent routes:

| Catalog href | Actual route | Notes |
| --- | --- | --- |
| /:portSlug/admin/audit-log | /:portSlug/admin/audit | Audit log card link |
| /:portSlug/admin/error-events | /:portSlug/admin/errors | Super-admin platform errors |
| /:portSlug/user-settings | /:portSlug/settings/profile | User-menu uses correct path |

Also dead — these :portSlug/settings/<X> paths have no folder under src/app/(dashboard)/[portSlug]/settings/; the only subroute that exists is profile/:

  • /:portSlug/settings/email
  • /:portSlug/settings/branding
  • /:portSlug/settings/templates
  • /:portSlug/settings/storage
  • /:portSlug/settings/recommender
  • /:portSlug/settings/tags
  • /:portSlug/settings/notifications

These look like aliases that were intended to deep-link inside /settings tabs but never wired up. Either redirect them to /admin/<x> (which all exist) or render real settings/<x> pages.

C2. Webhooks bypass the platform-error pipeline

src/app/api/webhooks/documenso/route.ts is the only webhook route in the repo and it does NOT call errorResponse(...) / captureErrorEvent(...). The handler always returns 200 with logger.error(...) only, so admin/errors never sees Documenso webhook crashes — the CLAUDE.md/docs imply errors flow into error_events universally but webhooks are silently outside that flow. Recommended: wrap the handler in a try/catch that calls captureErrorEvent({ statusCode: 500, error, metadata: { source: 'webhook', event } }) before returning 200.


HIGH

H1. PDF templates hard-code en-GB date locale (ignores user prefs)

Every numbered PDF template hardcodes toLocaleString('en-GB', …) / toLocaleDateString('en-GB') regardless of the rendering user's locale/timezone:

  • src/lib/pdf/templates/interest-summary-template.ts:85,162
  • src/lib/pdf/templates/client-summary-template.ts:97,133,143,156
  • src/lib/pdf/templates/berth-spec-template.ts:172,187
  • src/lib/pdf/templates/invoice-template.ts:116
  • src/lib/pdf/templates/reports/{activity,occupancy,pipeline,revenue}-report.ts — all use 'en-GB'
  • src/lib/email/templates/document-signing.ts:141: completedAt.toLocaleString('en-GB', …)

CLAUDE.md and the new dashboard greeting / timezone-drift banner suggest the rep's locale + timezone is honoured end-to-end. It isn't — at the PDF / signing-email surface we silently revert to en-GB. User-preference timezone/locale from user_profiles is plumbed nowhere into these templates.
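A minimal sketch of what plumbing the preference through could look like, assuming the template receives the rendering user's locale and IANA timezone. UserLocalePrefs and formatDateTimeForUser are hypothetical names. The en-GB/UTC fallback is an assumption chosen to keep output close to today's behaviour when no preference is stored:

```typescript
// Assumed shape of the per-user preference (illustrative only).
interface UserLocalePrefs {
  locale: string;   // BCP 47 tag, e.g. "en-GB"
  timeZone: string; // IANA zone, e.g. "Europe/Paris"
}

// Fallback is an assumption, used only when a preference is missing.
const FALLBACK_PREFS: UserLocalePrefs = { locale: "en-GB", timeZone: "UTC" };

function formatDateTimeForUser(date: Date, prefs: Partial<UserLocalePrefs> = {}): string {
  const locale = prefs.locale ?? FALLBACK_PREFS.locale;
  const timeZone = prefs.timeZone ?? FALLBACK_PREFS.timeZone;
  return new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "short",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    timeZone,
  }).format(date);
}
```

Each template would take the prefs as an argument instead of calling toLocaleString('en-GB', …) inline, so the same completedAt instant renders in the recipient's zone.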

H2. PDF templates hard-code USD price formatting & build raw Number().toLocaleString() strings

  • interest-summary-template.ts:112, berth-spec-template.ts:127,172: berth.priceCurrency ?? 'USD' followed by Number(price).toLocaleString() (no Intl currency formatter, no grouping conventions per locale).
  • reports/pipeline-report.ts:93, reports/revenue-report.ts:78,86: Number(...).toLocaleString() with no currency code at all in revenue report.

Single-source formatCurrency() exists at src/lib/utils/currency.ts and is used everywhere else — these templates should call it.
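For reference, this is roughly what an Intl-based formatter gives over raw Number().toLocaleString(). formatMoney is a hypothetical stand-in written for illustration; the signature of the repo's own formatCurrency() was not inspected and may differ:

```typescript
// Hypothetical stand-in for illustration; not the repo's formatCurrency().
function formatMoney(
  amount: number | string,
  currency: string,
  locale: string = "en-US",
): string {
  const value = typeof amount === "string" ? Number(amount) : amount;
  if (!Number.isFinite(value)) {
    throw new RangeError(`Not a finite amount: ${String(amount)}`);
  }
  // style: "currency" attaches the symbol and applies the locale's
  // grouping and decimal conventions in one step.
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(value);
}
```

For example, formatMoney(1234.5, "USD") yields "$1,234.50", and a non-numeric price string raises instead of rendering "NaN" into a PDF.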

H3. Dashboard widgets hard-code 'USD' despite per-port berths_default_currency

berths_default_currency is a system_settings key (admin/settings/settings-manager.tsx:223). But:

  • src/components/dashboard/kpi-cards.tsx:19: formatCurrency(value, 'USD', …)
  • src/components/dashboard/revenue-forecast.tsx:25: same
  • src/components/dashboard/pipeline-value-tile.tsx:45,47: same. Here the inner data field is pipelineValueUsd (the backend converts to USD before sending), so the tile's "USD-denominated" comment is accurate. The KPI and forecast tiles, however, silently render EUR/GBP ports as USD.

H4. CLAUDE.md missing two new auth surfaces

Migrations 0054 (user_profiles.username) and 0055 (user_permission_overrides) shipped in this branch. CLAUDE.md has zero mention of:

  • Username sign-in alternative (login form + resolve-identifier endpoint + src/lib/validators/username.ts).
  • Per-user permission overrides (effective-permission chain is now: role → port_role_overrides → user_permission_overrides).

The "Conventions / Auth" section currently implies user_port_roles.role is the leaf authority. New developers won't know to apply user-level overrides when reasoning about effective permissions.

H5. feedback_pwa_assets_pending memory is stale

User memory says PWA assets (icon-192.png, icon-512.png, icon-512-maskable.png) must be added before shipping Phase B scanner. All three exist in public/ plus apple-touch-icon.png. Memory should be cleared.


MEDIUM

M1. archiveBrochure has no createAuditLog call

src/lib/services/brochures.service.ts:191 — service-level archive (archivedAt + isDefault: false) commits without an audit row. Every other archive/delete in this branch (yachts, clients, companies, interests, berths, documents, document-folders, files, invoices, document-templates, email-accounts, users, roles, portal-auth, custom-fields) creates audit logs. Brochures is the outlier — same UX risk as the others (admin can swap default brochure with no trail).

M2. PII risk: portal-auth logs the email address of unknown / disabled-portal users

src/lib/services/portal-auth.service.ts:356,373,423 log email, user.email. Logger redact paths cover passwords / tokens / encrypted blobs but not email / *.email. For most CRM logging this is fine (emails are not secret in this app), but the portal-reset paths specifically log emails of users outside the active session — a quiet PII surface in log aggregators. Recommend either (a) hash-prefix the email (hash6(email)) before logging, or (b) accept-but-document.
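Option (a) could look like the sketch below. hash6 is the name suggested above; the SHA-256 choice and the trim/lower-case normalisation are assumptions:

```typescript
import { createHash } from "node:crypto";

// Returns the first 6 hex characters of SHA-256 over the normalised email.
// Enough to correlate log lines for one address without printing it.
function hash6(email: string): string {
  return createHash("sha256")
    .update(email.trim().toLowerCase(), "utf8")
    .digest("hex")
    .slice(0, 6);
}
```

A log call would then carry something like { emailHash: hash6(email) } in place of the address. An unsalted prefix is pseudonymisation rather than anonymisation: anyone holding a candidate address can recompute it, which is usually acceptable for log correlation but worth stating in the logging docs.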

M3. Pino logger.info discipline — Documenso & IMAP chatter

  • src/app/api/webhooks/documenso/route.ts:122,156,187,243,258,262 — six logger.info(…) per webhook fire (duplicate skip, lifecycle event, unhandled event type). At realistic Documenso traffic + retry pressure this is noisy. Consider downgrading the 'Documenso lifecycle event' line at L258 (fires on every valid event) to logger.debug.
  • src/lib/services/email-threads.service.ts:290,298,358 — IMAP sync logs mailbox.exists, messageCount, 'No new messages to sync' at info on every poll. At 5-min poll cadence × 24h × N accounts this floods info-level logs. Should be logger.debug.

M4. Timezone-aware reminder dueAt storage looks correct but UI hands off naïve strings

reminders.dueAt is stored as TIMESTAMPTZ (reminders.service.ts:179: new Date(data.dueAt)). The validator accepts an ISO string. <DateTimePicker> in date-time inputs reads new Date(input) from the browser, where YYYY-MM-DDTHH:mm strings are interpreted as local time and full ISO strings ending in Z as UTC. Worth a focused look at the picker component to confirm it emits Z-terminated ISO; otherwise "reminder at 9 AM" means 9 AM browser-local on creation, and the server's formatInTimeZone against the rep's chosen TZ will misalign. I did not open <DateTimePicker> itself in this audit; flagging as a 30-minute follow-up.
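The ambiguity can be closed at the validator by rejecting strings that carry no explicit offset. A small sketch, with hasExplicitOffset and parseDueAt as hypothetical names:

```typescript
// Matches ISO date-times whose time part ends in "Z" or "+hh:mm" / "-hh:mm".
const ISO_WITH_OFFSET = /T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})$/;

function hasExplicitOffset(iso: string): boolean {
  return ISO_WITH_OFFSET.test(iso.trim());
}

// Refuses naive strings such as "2026-05-12T09:00", which new Date()
// would otherwise interpret in whatever zone the parsing machine uses.
function parseDueAt(input: string): Date {
  if (!hasExplicitOffset(input)) {
    throw new RangeError(`dueAt must carry an explicit UTC offset: ${input}`);
  }
  const parsed = new Date(input.trim());
  if (Number.isNaN(parsed.getTime())) {
    throw new RangeError(`dueAt is not a valid date-time: ${input}`);
  }
  return parsed;
}
```

With this in place a picker that emits naive strings fails loudly at the API boundary instead of storing a shifted instant.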

M5. CLAUDE.md numbered-spec section frames 01-15 as authoritative

CLAUDE.md says:

"Numbered spec files in repo root (01-…through 15-…) contain detailed architecture decisions, feature specs, DB schema docs, API catalog, and implementation sequence."

These specs document the pre-rebuild Nuxt 3 / NocoDB system being migrated FROM. 01-CONSOLIDATED-SYSTEM-SPEC.md header reads "Compiled: 2026-03-11" and the stack tables describe Nuxt 3 SPA, NocoDB, Keycloak OIDC, etc. — none of which are the live Next.js + Drizzle + better-auth stack. New contributors reading CLAUDE.md will be sent down the wrong path. Recommend reframing to "legacy reference for the rebuild target" or moving them to docs/legacy/.

M6. BACKLOG.md doc-folders entry stale vs reality

docs/BACKLOG.md E. "Hidden / stubbed UI tabs" still lists Company Documents tab as landed 2026-05-08 but in the same section says Berth Waiting List + Maintenance Log tabs are "Removed entirely; revisit if/when product asks" — yet src/components/berths/berth-tabs.tsx still imports and renders the tabs strip (the comment in CLAUDE.md is silent on this). Not blocker — just a doc/code drift.

M7. Admin sections browser missing two real admin routes

src/components/admin/admin-sections-browser.tsx registers 30 hrefs. Two /[portSlug]/admin/* routes exist but are NOT surfaced in the browser:

  • /admin/brochures (full UI exists at src/app/(dashboard)/[portSlug]/admin/brochures/page.tsx)
  • /admin/errors (super-admin platform-errors inspector, real route)

The NAV_CATALOG catalog (Cmd-K) covers "Platform errors" via the (dead — see C1) error-events href but no entry for /admin/brochures. Reps cannot discover the brochure admin surface from either the section card grid or global search.

M8. Settings-manager keyword catalog drift

admin-sections-browser keywords list (settings card) is in sync with settings-manager.tsx KNOWN_SETTINGS keys today (21 keys, 39 aliases). However, two settings exist in production that are NOT in either list:

  • documenso_signing_order (CLAUDE.md L46) — typeable via Documenso admin page, not the generic Settings card; reasonable to omit but flag if you want unified search.
  • documenso_redirect_url — same.

Not a bug — just confirming the drift surface is intentional.


What was NOT in scope but worth a quick note

  • Audit-coverage spot-check for sensitive mutations: clients/yachts/companies/interests/berths/documents/document-folders/document-templates/invoices/users/roles/portal-auth/files/custom-fields/document-sends/email-accounts ALL call createAuditLog. Only brochures.archiveBrochure is missing (M1). No other gaps spotted in services that I sampled.
  • Pino redact paths cover passwords, tokens, secrets, encrypted blobs, Authorization headers, cookie headers, two-level nesting — comprehensive. Only soft gap is email field (M2).
  • Error pipeline: errors.ts → captureErrorEvent is invoked on every 5xx route response; the error_events table is read by admin/errors. Looks complete for API routes — gap is webhooks (C2).
  • Country/nationality: consistently stored as ISO-3166-1 alpha-2 across clients, companies, residential — validators centralized in src/lib/validators/i18n.ts. Good.

  1. (C1) Fix the 10 dead search-nav-catalog.ts hrefs — pure typo fixes, very high user-visible impact.
  2. (C2) Wrap webhook handler in captureErrorEvent.
  3. (H4) Add CLAUDE.md sections for username + user permission overrides.
  4. (H1, H2, H3) PDF & dashboard locale/currency consistency pass — plumb user prefs through, kill remaining 'en-GB' / 'USD' hardcodes.
  5. (M1) createAuditLog in archiveBrochure.
  6. (M5) Reframe or relocate numbered specs 01-15.
  7. (M3) Demote 2-3 chatty logger.info lines to logger.debug.
  8. (H5) Clear stale pwa_assets_pending user memory.



7. Concurrency + race conditions audit (concurrency-auditor)

Concurrency & Race-Condition Audit — pn-crm

Scope: 22-minute read-only sweep of services, queue workers, webhook handlers, and schema invariants. Findings grouped by severity. Line references against feat/documents-folders.


CRITICAL

C-1. handleDocumentCompleted TOCTOU lets two concurrent webhook retries both download + persist the signed PDF

src/lib/services/documents.service.ts:1100-1190

The idempotency gate if (doc.status === 'completed' && doc.signedFileId) return; is read outside any row lock. The CLAUDE.md note ("idempotent — early-returns when …") is true for sequential retries but not for concurrent ones.

Real-world hit: Documenso retries DOCUMENT_COMPLETED on a 5xx, and the local poll worker also reconciles. If both arrive within milliseconds (e.g. the receiver was slow once then retried while the poll worker also fires), both pass the gate, both call downloadSignedPdf + storage.put, both db.insert(files), and both UPDATE documents.signed_file_id. The losing file row stays in files, but its blob has no documents row pointing at it → permanent orphan blob plus a duplicate file in the entity folder.

Fix: wrap the whole block in db.transaction + SELECT … FROM documents WHERE id = $1 FOR UPDATE before re-checking the gate. (Or add a partial unique index (document_id) WHERE document_type='signed_pdf' on files so the DB rejects the second insert.)

C-2. BullMQ jobs have NO jobId — every webhook retry / duplicate enqueue creates a new job

src/lib/queue/index.ts:24-39 (queue defaults), every call site of queue.add(...) (see inquiry-notifications.service.ts:51,118, webhook-dispatch.ts:59, webhooks.service.ts:323,374, invoices.ts:661,790, gdpr-export.service.ts:101, reports.service.ts:90, email-draft.service.ts:59, notifications.service.ts:165, expenses.ts:190, documents.service.ts send-out paths).

BullMQ deduplicates only when callers pass { jobId: stableKey }. Nothing in the repo does. Implications:

  • A second Documenso 5xx retry that comes through handleDocumentCompleted doesn't go through BullMQ, but webhook outbound deliveries via webhook-dispatch.ts go through queue.add('deliver', …). If dispatchEvent is called twice for the same event id, both rows get a delivery job and the external endpoint sees the event twice.
  • notifications.service.ts:165 enqueues a notification-email job per insert; the dedupeKey collapses DB rows but not jobs. A createNotification that fails after the dedupe collapse but before the job add can leave the queue short; a successful dedupe still adds a fresh job each call.
  • The maintenance queue (concurrency 1, attempts 3) backs off on failure and with no removeOnFail cap on count, a misbehaving job that errors thousands of times can balloon Redis.

Fix: every "logically once-per-X" job needs jobId: `${name}:${entityId}`, plus a removeOnFail: { count: 1000 } cap on the queue defaults.
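A minimal sketch of the stable-id idea, assuming hypothetical helper names (stableJobId, dedupedJobOptions); only jobId and removeOnFail are BullMQ job options. BullMQ's documentation advises against ':' inside custom job ids because it is the Redis key separator, so the sketch substitutes '-'; verify against the installed version before copying the separator choice.

```typescript
// Hypothetical helpers for illustration; only `jobId` and `removeOnFail`
// correspond to real BullMQ job options.
type KeepJobs = { age?: number; count?: number };

interface DedupedJobOptions {
  jobId: string;
  removeOnFail: KeepJobs;
}

const SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60;

/** Build a deterministic id so a repeated enqueue for the same entity maps to the same job. */
function stableJobId(name: string, entityId: string): string {
  // ':' is the Redis key separator, so it is replaced; '-' is an arbitrary choice.
  const clean = (part: string): string => part.replace(/:/g, '-');
  return `${clean(name)}-${clean(entityId)}`;
}

/** Options to pass as the third argument of queue.add(name, data, opts). */
function dedupedJobOptions(name: string, entityId: string): DedupedJobOptions {
  return {
    jobId: stableJobId(name, entityId),
    // Keep failed jobs for at most 7 days AND at most 1000 entries.
    removeOnFail: { age: SEVEN_DAYS_SECONDS, count: 1000 },
  };
}
```

A call site would then look like queue.add('deliver', payload, dedupedJobOptions('deliver', eventId)). Note that the id only deduplicates while a job with that id still exists in Redis; once a completed job has been removed, the same id can be enqueued again.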

C-3. advanceStageIfBehind double-fires evaluateRule on parallel webhook deliveries

src/lib/services/interests.service.ts:881-908. The current-stage read is plain db.query.interests.findFirst, no FOR UPDATE. Two concurrent calls (DOCUMENT_SIGNED + DOCUMENT_COMPLETED, both routed to eoi_signed, arriving in parallel because Documenso may send the RECIPIENT_SIGNED + DOCUMENT_COMPLETED pair near-simultaneously) both observe currentIdx < targetIdx, both call changeInterestStage, both trigger evaluateRule('eoi_signed', …). The downstream berth-rule then auto-flips the berth status twice, and if the rule has any side effect like queue.add (it does; see berth-rules-engine), you get two of them.

The comment at line 1274 ("Guard against double-fire") shows the author noticed the risk but only added an idempotency check on the eoi-signed-and-beyond branch, not on the advanceStageIfBehind path itself.

Fix: pull the interest row with FOR UPDATE inside advanceStageIfBehind and call changeInterestStage in the same transaction, OR move the rule-fire side effect inside changeInterestStage and gate it on the actual UPDATE returning a row (i.e. .returning() + check updated.pipelineStage === target).


HIGH

H-1. moveFolder cycle check is not under a row lock — concurrent moves can create a cycle

src/lib/services/document-folders.service.ts:212-275. The cycle check walks ancestors of the destination outside any transaction; the actual UPDATE happens in a separate statement. Two reps moving folders simultaneously can each pass the cycle check against pre-state, then both commit, leaving folder A under B under A. The system folders are protected by assertNotSystemManaged, but any two user folders are vulnerable. Subsequent reads (the cursor walks in listDocuments(..., includeDescendants=true)) would infinite-loop until the seen guard bails — but the tree is now inconsistent.

Fix: open a transaction at the top, SELECT FOR UPDATE the moving folder, walk the ancestor chain inside the same transaction, then UPDATE. PostgreSQL's default READ COMMITTED isolation doesn't see other in-flight updates without the lock.
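The ancestor walk itself can be sketched as a pure function, assuming the parent links have already been read inside the transaction after the lock is taken. The name wouldCreateCycle and the Map-based shape are illustrative, not the service's actual API.

```typescript
type FolderId = string;

/**
 * Returns true if moving `movingId` under `destinationId` would make a folder
 * its own ancestor. `parentOf` maps each folder id to its parent (null = root)
 * and is assumed to be a snapshot read inside the locking transaction.
 */
function wouldCreateCycle(
  parentOf: ReadonlyMap<FolderId, FolderId | null>,
  movingId: FolderId,
  destinationId: FolderId | null,
): boolean {
  const seen = new Set<FolderId>();
  let cursor: FolderId | null = destinationId;
  while (cursor !== null) {
    // Destination is the moving folder itself or lives inside its subtree.
    if (cursor === movingId) return true;
    // The tree is already cyclic somewhere above the destination; refuse the move.
    if (seen.has(cursor)) return true;
    seen.add(cursor);
    cursor = parentOf.get(cursor) ?? null;
  }
  return false;
}
```

One caveat worth checking during the fix: if two moves target each other (A under B while B goes under A), locking only the moving row lets each transaction walk a chain the other is about to change. Serializing moves per port, for example with the advisory-lock pattern already used for berth-PDF versions, is one way to close that gap.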

H-2. Berth-PDF upload writes the blob BEFORE acquiring the advisory lock

src/lib/services/berth-pdf.service.ts:204-294. Step 3 calls backend.put(...); step 4 takes the per-berth advisory lock + inserts the version row. If the transaction in step 4 fails for any reason other than the unique-index conflict (e.g. FK violation, network blip, statement timeout), the blob is already at its UUID path with no DB row pointing at it → orphan blob. The author noted the unique-index ⇒ orphan risk is mitigated by the UUID path (the second blob gets its own path so no overwrite), but didn't address the "tx aborts, blob stays" branch.

Fix: stage the blob in storage after allocating the version row (or wrap both in a saga that deletes the blob on tx rollback via a finalizer).

H-3. upsertInterestBerth isPrimary=true race demotes nothing then both inserts succeed

src/lib/services/interest-berths.service.ts:181-265. Inside the transaction it runs UPDATE … SET isPrimary=false WHERE interestId=$1 AND isPrimary=true, then INSERT … (isPrimary: true). At default READ COMMITTED, two concurrent calls setPrimaryBerth(X, A) and setPrimaryBerth(X, B) both run the UPDATE: the first matches no rows and so takes no lock, while the second's UPDATE may hit the row freshly inserted by the first. The partial unique index on (interestId) WHERE isPrimary=true catches the second insert, but only after the first tx commits. If both txns interleave their UPDATE+INSERT before commit, PostgreSQL serializes the unique-index check and one fails with 23505. Currently that bubbles up as a generic 500, not as a friendly conflict, and a fast retry would succeed because the loser would see the winner's row and simply demote it. So the data invariant holds, but the UX surfaces a confusing error.

Fix: catch 23505 on idx_interest_berths_one_primary and either retry once or map to a ConflictError so the toast says "another rep just changed the primary berth, refreshing".
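A sketch of the 23505 translation, assuming the driver error exposes the SQLSTATE as code and the index name as constraint (node-postgres) or constraint_name (postgres.js). The ConflictError class here is a local stand-in for the repo's own; if the ORM wraps driver errors, the check would need to look at the wrapped cause instead.

```typescript
// Local stand-in for the repo's ConflictError; the real constructor may differ.
class ConflictError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ConflictError';
  }
}

const UNIQUE_VIOLATION = '23505'; // PostgreSQL SQLSTATE for unique_violation

/** True when `err` is a unique violation raised by the named index or constraint. */
function isUniqueViolation(err: unknown, constraint: string): boolean {
  if (typeof err !== 'object' || err === null) return false;
  const e = err as { code?: unknown; constraint?: unknown; constraint_name?: unknown };
  const name = e.constraint ?? e.constraint_name;
  return e.code === UNIQUE_VIOLATION && name === constraint;
}

/** Map the expected violation to a ConflictError; pass every other error through untouched. */
function toConflict(err: unknown, constraint: string, message: string): unknown {
  return isUniqueViolation(err, constraint) ? new ConflictError(message) : err;
}
```

In a service this would sit in the catch block: throw toConflict(err, 'idx_interest_berths_one_primary', 'Another rep just changed the primary berth'). The same helper covers the username case in M-2 with idx_user_profiles_username_unique.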

H-4. Admin email-change leaves orphan sessions on the old email

src/lib/services/users.service.ts:233-262. The admin UI flips user.email directly on the Better-Auth user row but never deletes the target user's existing sessions. Concurrent sessions of the affected user keep working under the new email (because Better-Auth indexes sessions by user_id, not email), and that's fine. But the previous email is now free to be claimed by a fresh signup before the admin sends the "your email was changed" notice. There's no unique constraint that prevents an attacker from re-registering as old@example.com and taking over outgoing identity artefacts (audit logs reference user_id not email, so this is just identity hygiene; still, the surface exists). Worse, there's no emailVerified = false reset on the swap, so the new email is auto-treated as verified without ever receiving a confirmation.

Fix: in the same transaction, also revoke the user's sessions if the change is admin-initiated (db.delete(session).where(eq(session.userId, …))), and re-set emailVerified = false so the next sign-in goes through the re-verify flow.

H-5. userEmailChanges has no partial unique index on (userId) WHERE not applied/cancelled

src/lib/db/schema/users.ts:360-379. A user can spam the self-service email-change endpoint to create unlimited pending rows. Each row mails the NEW address. Anti-abuse is missing at the DB layer — only application-side rate limit (which I didn't fully audit) stands between a user and unbounded email send-out from your domain.

Fix: CREATE UNIQUE INDEX user_email_changes_one_pending ON user_email_changes (user_id) WHERE applied_at IS NULL AND cancelled_at IS NULL;

H-6. Email-confirm token isn't atomically consumed

src/app/api/v1/me/email/confirm/[token]/route.ts:28-57. Three separate statements: SELECT pending, UPDATE user, UPDATE pending.applied_at. No transaction wrapper. A user who double-clicks the email link (or a link preview pre-fetcher like Outlook SafeLinks) fires two near-simultaneous GETs. Both pass the appliedAt IS NULL check, both flip user.email (idempotent, same value), both mark applied. Functional, but the second audit-log entry is misleading. More importantly: if the second click arrives 200ms later AND the user re-fired a different change in between that the first click happened to apply, you've stomped state.

Fix: single transaction, SELECT … FOR UPDATE the pending row, branch on its post-lock state.


MEDIUM

M-1. Unbounded fan-out on Promise.all per recipient

  • src/lib/services/inquiry-notifications.service.ts:116-129 — fans out one emailQueue.add per external recipient with no concurrency cap. With a ports admin who lists 500 emails (no UI cap I saw), one inquiry submission pushes 500 queue inserts concurrently. Redis survives it; the surge in pipelined Redis commands can stall co-tenant queues.
  • src/lib/services/notifications.service.ts:344-358 — same shape for document events: one createNotification per recipient, fully parallel. Each createNotification does its own DB insert AND its own queue.add('send-notification-email', …). Big-port notifications can fan out to dozens of users simultaneously per document event.

Fix: use p-limit(10) (already in similar shape elsewhere) or batch with queue.addBulk. Not data-corrupting; tail-latency / resource concern.
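The batching variant can be sketched without any dependency, assuming a hypothetical job name ('send-email') and payload shape; queue.addBulk is the BullMQ call each batch would be handed to.

```typescript
interface BulkJob<T> {
  name: string;
  data: T;
}

/** Split `items` into consecutive batches of at most `size` elements. */
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

/** Turn a recipient list into batches shaped for queue.addBulk(). */
function toEmailBatches(
  recipients: readonly string[],
  batchSize = 50,
): BulkJob<{ to: string }>[][] {
  return chunk(recipients, batchSize).map((batch) =>
    // 'send-email' is a placeholder job name for this sketch.
    batch.map((to) => ({ name: 'send-email', data: { to } })),
  );
}
```

Awaiting emailQueue.addBulk(batch) for one batch at a time inside a for...of loop bounds the number of in-flight Redis commands to the batch size, which is the tail-latency concern raised above.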

M-2. Username uniqueness TOCTOU surfaces as 500 instead of ConflictError

src/app/api/v1/me/route.ts:137-145. The LOWER(username) SELECT runs outside any lock, and the partial unique index idx_user_profiles_username_unique is the actual guard. Two reps claiming dm concurrently: one succeeds, the other gets a generic 500 (the 23505 is not caught and rewritten). The pre-check shows "available" right before the failed write, which is a worse UX than a clean "already taken" message.

Fix: catch 23505 on the unique index name and translate to ConflictError.

M-3. ensureSystemRoots self-heal recursion not bounded

src/lib/services/document-folders.service.ts:512-516. If ensureSystemRoots throws transiently (e.g. DB hiccup), ensureEntityFolder recurses into itself with no depth guard. In normal operation the second pass will find the root and return; in a pathological case (root is created but the post-insert SELECT fails repeatedly), this can stack. Low-likelihood but trivial to fix with a "called from self-heal already" flag.

M-4. Optimistic UI rollback drops the user's pending edits

src/components/interests/pipeline-board.tsx:120-143. The optimistic update overwrites query data without snapshotting the prior value — on error, the rollback path just invalidateQueries, which refetches from the server. If two reps drag the same card, the second's drop happens after the first's server commit; the server then accepts the second drag, but the first rep's view briefly shows their own change before the next invalidation pulls in the truth. Last-write-wins semantics with no warning. Acceptable today for single-rep ports; will get reported as a bug when teams scale. No version header (If-Unmodified-Since / ETag) anywhere in the API.

M-5. Filesystem storage backend not multi-node — silent corruption if MULTI_NODE_DEPLOYMENT mis-set

CLAUDE.md and src/lib/storage/ say the filesystem backend refuses to start when MULTI_NODE_DEPLOYMENT=true. If that env var defaults to unset and the operator forgets to flip it, two app nodes both write to their own local FS, each thinking they own the only copy. Not a code bug but a configuration cliff edge — worth re-stating in the deploy runbook.

M-6. BullMQ default removeOnFail retains failed jobs for 7 days without count cap

src/lib/queue/index.ts:33. Same volume risk as C-2: a noisy worker that fails 10k times per day fills Redis. Add count: 1000.


Notes on what looks GOOD

  • archiveClientWithDecisions (client-archive.service.ts:165-300) — proper FOR UPDATE on client + berth rows inside one transaction. Exemplary.
  • Berth-PDF version-number allocation under pg_advisory_xact_lock — correct pattern, modulo H-2 above.
  • NocoDB import (scripts/import-berths-from-nocodb.ts:137) — stable 64-bit advisory key, scoped to transaction.
  • Document-folder backfill (scripts/backfill-document-folders.ts:77) — per-port advisory key via hashtext(portId), idempotent.
  • ensureSystemRoots ON CONFLICT DO NOTHING pattern is race-safe.
  • ensureEntityFolder correctly distinguishes entity-id race vs sibling-name race and re-SELECTs the winner.
  • interestBerths partial unique index on is_primary enforces the invariant at the DB layer (H-3 is only a UX gripe, not a data integrity issue).
  • brochures.is_default partial unique index works the same way.
  • Documenso verifyDocumensoSecret uses timing-safe equality (good).

  1. C-1 (signed-PDF orphan) — add row lock or unique index. Highest data-loss risk.
  2. C-2 (jobIds everywhere) — broad blast radius, mechanical fix.
  3. H-4/H-5 (admin email change + pending-change unique index) — security-adjacent.
  4. C-3, H-1, H-2, H-3, H-6 — correctness in real-world retry/burst scenarios.
  5. Medium tier as time allows.



8. GDPR + privacy + PII audit (gdpr-auditor)

GDPR + Privacy + PII Audit

Repo: new-pn-crm @ feat/documents-folders. Read-only audit. Findings are grouped by severity. Line numbers are approximate.


CRITICAL

C1. GDPR export bundle is materially incomplete — Article 15 violation

src/lib/services/gdpr-bundle-builder.ts enumerates only a subset of tables that hold the data subject's PII. The following tables reference the client (client_id FK) but are NOT included in the bundle:

  • portal_users — the portal account itself (email, name, lastLoginAt, isActive, createdBy). Strictly required: a copy of the account record is core "data we hold about you."
  • email_threads / email_messages — full inbound/outbound correspondence including bodyText, bodyHtml, attachment IDs. This is the most PII-dense table in the system.
  • document_sends — brochure / send-out audit with recipient_email (brochures.ts).
  • reminders — operations table with clientId FK.
  • formSubmissions (public form intake) — already collected via documents for the linked path, but rows where client_id is set directly are missed.
  • files — files attached directly to a client (files.client_id ≠ via documents). The builder pulls documents only.
  • scratchpadNotes.linkedClientId — rep-side free-text notes that reference the client.
  • clientMergeLog — historical merge records that survived earlier deduplications.
  • contact_log (referenced from operations.ts).
  • website_submissions (raw inbound inquiries before they were promoted to interests).

The bundle currently advertises itself as the Article-15 dump. Hand-delivering it would expose the controller to a regulator finding of incomplete disclosure. Fix: widen buildClientBundle to cover every client_id-referencing table (the schema grep below produces the complete list); the audit-log limit of 500 events should also be lifted or paginated for long-tenured clients.

C2. "Right to be forgotten" leaves email correspondence bodies intact

client-hard-delete.service.ts nullifies emailThreads.clientId so the thread (with bodyHtml / bodyText of every inbound + outbound message + the subject's address in from_address / to_addresses) survives the delete in perpetuity. Same pattern for files.clientId, documents.clientId, formSubmissions.clientId, reminders.clientId, documentSends.clientId. The justification in the file ("keep their audit history") is reasonable for audit metadata but the actual PII content (email body, file contents, form answers, recipient emails on brochure sends) is preserved verbatim. A subject who exercised their right to erasure has not, in practice, been erased.

Fix options: (a) cascade-delete email_threads / email_messages on client hard-delete; (b) blank body_text / body_html / address columns inline; (c) require a separate "destructive erasure" mode that the smart-archive flow ladders into.

C3. Username → email enumeration on public endpoint

src/app/api/auth/resolve-identifier/route.ts returns the canonical email when a known username is supplied (line 88: return NextResponse.json({ email: rows[0]!.email })). The miss-path returns a synthetic .invalid address, which protects hit/miss equality, but any successful hit leaks the linked email to an anonymous caller. The rate limit is 5/15min/IP: sufficient to thwart wordlist brute-force, but known or leaked usernames are still trivial to walk. Harvesting emails for credential stuffing is also the main reason a malicious actor would call this endpoint.

Fix: don't echo the resolved email back to the client. Instead, set a short-lived signed cookie / Redis key keyed by the IP+identifier that the subsequent signIn call consumes, or hand the resolved email straight to Better Auth server-side and return only { ok: true }.

C4. Audit-log metadata is unmasked and stores raw PII forever

src/lib/audit.ts maskSensitiveFields covers oldValue/newValue only — metadata is written raw. Multiple call-sites stuff full email addresses into metadata:

  • client-hard-delete.service.ts:135 (metadata: { sentTo: u.email })
  • client-hard-delete.service.ts:350 (bulk variant)
  • portal-auth.service.ts:123 / 187 / 336 / 363 / 380 / 403 (every portal lifecycle event)
  • crm-invite.service.ts:206 / 264 (metadata: { email: invite.email })
  • email-accounts.service.ts:76 / 145 (emailAddress)

Compounded with H1 (no audit-log retention), every staff member's, invitee's, and portal user's email lives in audit_logs.metadata indefinitely with ip_address + user_agent next to it. This is GDPR data-minimisation/storage-limitation breach territory.

Fix: extend maskSensitiveFields to walk metadata recursively, or stop emitting full emails into metadata (use user IDs + a join on demand). The masking set also needs emailAddress and sentTo aliases.
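A sketch of the recursive walk, assuming JSON-like metadata. The function name maskMetadata, the key set, and the '[redacted]' placeholder are illustrative; the real maskSensitiveFields signature and masking format are not shown in this audit.

```typescript
// Keys are compared lower-cased; extend the set with whatever aliases call sites use.
const PII_KEYS = new Set(['email', 'emailaddress', 'sentto']);

function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== 'object') return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

/** Return a copy of `value` with every PII-keyed entry replaced, at any nesting depth. */
function maskMetadata(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(maskMetadata);
  if (!isPlainObject(value)) return value; // primitives, Dates, etc. are returned as-is
  const out: Record<string, unknown> = {};
  for (const [key, inner] of Object.entries(value)) {
    out[key] = PII_KEYS.has(key.toLowerCase()) ? '[redacted]' : maskMetadata(inner);
  }
  return out;
}
```

Applied at the single write path in audit.ts, this would cover every call site listed above without touching them individually. Free-text values that merely contain an email are not caught by a key-based mask.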


HIGH

H1. No retention policy on audit_logs

src/lib/queue/scheduler.ts registers retention crons for ai_usage, error_events, website_submissions, but not audit_logs. The schema documents the table as kept indefinitely (no pruning worker exists). With IP + user-agent on every row, plus PII in metadata (C4), the table grows unbounded and retains PII forever.

Fix: add an audit-log-retention maintenance job. Recommended split: keep severity ∈ {warning, error, critical} and source = auth for 2 years (legal/security), prune everything else after 12 months. Make the window admin-configurable.
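The recommended split can be expressed as a small predicate. This sketch reads the recommendation as a union (a row gets the long window if it has an elevated severity or comes from the auth source), which is an assumption; the field names are also illustrative. A real job would express the same rule as a SQL DELETE rather than filter rows in application code.

```typescript
interface AuditRow {
  severity: string;
  source: string;
  createdAt: Date;
}

const ELEVATED_SEVERITIES = new Set(['warning', 'error', 'critical']);
const LONG_RETENTION_MONTHS = 24;
const DEFAULT_RETENTION_MONTHS = 12;

/** Retention window in months for a given row (assumed union of the two criteria). */
function retentionMonths(row: Pick<AuditRow, 'severity' | 'source'>): number {
  const longLived = ELEVATED_SEVERITIES.has(row.severity) || row.source === 'auth';
  return longLived ? LONG_RETENTION_MONTHS : DEFAULT_RETENTION_MONTHS;
}

/** True when the row is older than its retention window as of `now`. */
function isPrunable(row: AuditRow, now: Date): boolean {
  const cutoff = new Date(now.getTime());
  cutoff.setUTCMonth(cutoff.getUTCMonth() - retentionMonths(row));
  return row.createdAt.getTime() < cutoff.getTime();
}
```

Making the two constants settings-backed would satisfy the "admin-configurable" part of the recommendation.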

H2. error_events request-body excerpts redact only secret-shaped keys

error-events.service.ts SENSITIVE_KEYS redacts password/token/apiKey/creditCard/ssn etc. — but NOT email, phone, name, dob, address. Any 5xx on POST /api/v1/clients, POST /api/v1/portal-users, POST /api/v1/clients/[id]/contacts, POST /api/v1/admin/users lands the requester's full client-create payload in error_events.request_body_excerpt. Retention is 90 days (good) but the captured rows are visible to every super-admin via the inspector. Fix: add PII keys to SENSITIVE_KEYS or whitelist-only the body schema per route.

H3. Email recipient address logged at debug level

src/lib/email/index.ts:154. Every outbound email logs { to, originalTo, subject }. In prod this line is emitted only if LOG_LEVEL=debug, so usually safe, but the originalTo field also leaks the redirect-target's real address when EMAIL_REDIRECT_TO is set in dev. Tighten to messageId + portId + bool once redirect path is exercised.

H4. Portal lastLoginAt & email kept after client hard-delete

On client hard-delete the portal_users.client_id cascade fires, so the portal user is removed — good. But portal_users.email has a global unique index (idx_portal_users_email_unique) with no port_id. A previously-deleted portal user blocks a new portal account at a different port from re-using that email until/unless the cascade fires. More importantly, if cascade ever doesn't run (e.g. archive-only, no hard delete), the portal account row survives with the email. Verify the archive path also disables/erases the portal user, or document the asymmetry.

H5. Encryption-key rotation is non-incremental for SMTP/IMAP creds

src/lib/utils/encryption.ts hard-codes a single env var (EMAIL_CREDENTIAL_KEY) with no key-version or KID stored on the ciphertext. Rotating the key requires an offline mass re-encrypt; there is no migration path. The same applies to the S3 secret key (storage/s3.ts:74), webhook secret (workers/webhooks.ts:116), and storage_proxy_hmac (storage/filesystem.ts:415). Each decrypt-failure path falls through silently. Fix: prefix ciphertexts with a kid field, support 2 active keys at once, and ship a rotation script that re-wraps ciphertexts to the new key.
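A minimal sketch of a kid-prefixed envelope using Node's built-in crypto module. The token layout (kid.iv.tag.body, base64 parts joined by dots), the Keyring type, and all function names are assumptions for illustration, not the format encryption.ts uses today; key ids must not contain a dot.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

type Keyring = Record<string, Buffer>; // kid -> 32-byte AES-256 key

/** Encrypt with the key named `kid` and prefix the ciphertext with that id. */
function encryptWithKid(plaintext: string, kid: string, keys: Keyring): string {
  const key = keys[kid];
  if (!key) throw new Error(`unknown key id: ${kid}`);
  const iv = randomBytes(12); // 96-bit IV, the recommended size for GCM
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const body = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [kid, iv.toString('base64'), tag.toString('base64'), body.toString('base64')].join('.');
}

/** Decrypt by looking up the key id stored on the ciphertext; throws instead of failing silently. */
function decryptWithKid(token: string, keys: Keyring): string {
  const [kid, ivB64, tagB64, bodyB64] = token.split('.');
  const key = kid ? keys[kid] : undefined;
  if (!key || !ivB64 || !tagB64 || bodyB64 === undefined) {
    throw new Error('malformed ciphertext or unknown key id');
  }
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(ivB64, 'base64'));
  decipher.setAuthTag(Buffer.from(tagB64, 'base64'));
  const plain = Buffer.concat([decipher.update(Buffer.from(bodyB64, 'base64')), decipher.final()]);
  return plain.toString('utf8');
}

/** Rotation step: decrypt under whichever key the token names, re-encrypt under `newKid`. */
function rewrap(token: string, newKid: string, keys: Keyring): string {
  return encryptWithKid(decryptWithKid(token, keys), newKid, keys);
}
```

With both the old and new key present in the keyring, a rotation script can rewrap rows incrementally while reads keep working; the old key is dropped once no stored token carries its kid.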

H6. Activation/reset tokens travel in URL query strings

portal-auth.service.ts:147 / 408 and crm-invite.service.ts:71 / 233 ship ?token=… in the activation/reset links. The token hash is stored server-side (good), but URL-borne tokens land in browser history, reverse-proxy access logs, Cloudflare logs, and Referer headers if the activation page links anywhere external (e.g. terms-of-service). Common pattern but worth flagging — consider hash fragments (#token=…) which browsers never put in Referer.

H7. IP address recorded on every audit event without a lawful-basis note

audit_logs.ip_address (system.ts:38) and the legacy second copy (line 305) are populated unconditionally. Storing IPs is lawful under "legitimate interest" for security-relevant events, but for routine update/view/create of a client record the lawful-basis argument is much thinner under recent EU regulator guidance. Fix: only retain ip_address on source ∈ {auth, webhook} and on severity ∈ {warning, error, critical}; null it on routine user-source events at write time.


MEDIUM

M1. Recent-search Redis key holds verbatim free-text queries

search.service.ts:2147 saves the raw search term to Redis under recent-search:<userId>:<portId>. If a rep types a client's email/phone/SSN to find them, that string lives in Redis with a 7-day TTL (per the constant) and is not in the GDPR bundle. Low-volume, but document and add to the bundle.

M2. Hard-delete confirmation email contains client name verbatim

client-hard-delete.service.ts sends a confirmation code email to the requester with the deleted client's fullName in subject + body. Reasonable for human verification, but it means the operator's mailbox (often Gmail/Outlook) holds the to-be-erased client's name after deletion. Document this in the privacy notice or strip to initials.

M3. GDPR export ZIP retention overlaps with subject-erasure

The bundle expires 30 days after generation (EXPIRY_DAYS = 30) — but if a subject requests both export and erasure inside that window, the staged ZIP in MinIO will outlive the database row. The cleanup cron only checks expires_at. Fix: when hardDeleteClient runs, delete any non-expired gdpr_exports blobs for that client immediately.

M4. Documenso webhook & document_sends body leak via audit metadata

docs.documenso webhook handler logs signatureHash only (good), but document_sends rows store full recipient_email on set null cascade — so when the linked client is hard-deleted, the recipient email survives on the send row. Same pattern as C2 but for the brochure channel.

M5. portal_auth_tokens.tokenHash is SHA-256, not constant-time-compared

Tokens are hashed before storage (good), but the lookup on idx_portal_tokens_hash_unique uses normal equality. Because the comparison runs against the SHA-256 hash through an index lookup rather than against the raw token, timing attacks are not viable here. Flagged for documentation only.

M6. error_events.error_stack may contain user-supplied strings

Stack traces are 4 KB capped and only fire on 5xx, but PG-driver errors include the offending statement / parameter values in .message (e.g. duplicate-key violations expose the conflicting email). Already mitigated by error-coding most known cases through CodedError, but a defensive scrub on errorMessage for @-shaped or +\d{6,} substrings would harden the inspector.
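A sketch of such a defensive scrub, assuming it runs on the message string before the row is written. The function name and the replacement markers are illustrative; the patterns are deliberately loose and will over-match rather than under-match.

```typescript
// Anything shaped like local@domain.tld, stopping at whitespace, quotes, brackets and parens.
const EMAIL_LIKE = /[^\s@<>"'()]+@[^\s@<>"'()]+\.[^\s@<>"'()]+/g;
// A '+' followed by six or more digits, as suggested in the finding.
const PHONE_LIKE = /\+\d{6,}/g;

/** Replace email-shaped and phone-shaped substrings in an error message. */
function scrubErrorMessage(message: string): string {
  return message.replace(EMAIL_LIKE, '[email]').replace(PHONE_LIKE, '[phone]');
}
```

For example, a PostgreSQL duplicate-key detail such as Key (email)=(jane@example.com) already exists. comes out as Key (email)=([email]) already exists.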

M7. EMAIL_REDIRECT_TO is enforced only by env, not by a build assertion

The README warns it must be unset in production but there's no runtime guard. A misconfigured prod could silently redirect ALL outbound client mail to a single inbox. Fix: in src/lib/env.ts, refuse EMAIL_REDIRECT_TO when NODE_ENV === 'production'.

M8. Cross-port portal_users.email unique index leaks tenancy

Multi-tenant model says ports are isolated, but the global-unique email index means port A can probe whether email X is already a portal user at any other port by attempting to invite them and reading the conflict error. Tiny enumeration vector, fix by scoping the unique index to (port_id, lower(email)).


Notes / good practices observed

  • audit.ts maskSensitiveFields exists and is applied to old/new JSON.
  • logger.ts ships a thorough redact.paths list covering auth headers, encrypted credentials, cookies.
  • GDPR-export uses presigned 7-day URL (acceptable, behind email auth).
  • Hard-delete is two-factor (permission + email code + typed name) and gated on prior archive.
  • error_events has a 90-day retention cron (migration 0040).
  • AES-256-GCM for encryption at rest is correctly implemented with random IV + auth tag.

Top-3 fix priority

  1. C1 + C2 — complete the GDPR export bundle and make hard-delete actually erase email/file/document content (not just sever FKs).
  2. C3 — stop echoing real emails out of the public resolve-identifier endpoint.
  3. C4 + H1 + H2 — mask PII in audit_logs.metadata and error_events.request_body_excerpt; add an audit_logs retention cron.

9. Email deliverability + template quality audit (email-auditor)

Audit #9 — Email deliverability + template quality

Scope: src/lib/email/**, src/components/admin/email-*, src/app/(dashboard)/[portSlug]/admin/email/page.tsx. Read-only.

Severity legend: CRITICAL = production-breaking / security; HIGH = silent feature bug or data leak; MEDIUM = rendering / brand / spam risk; LOW = polish.


CRITICAL

C1. EMAIL_REDIRECT_TO has no production guard

src/lib/env.ts:41 declares it z.string().email().optional() with no NODE_ENV constraint. src/lib/email/index.ts:131-133 silently rewrites every recipient when it is set. CLAUDE.md says "must be unset in production" but nothing enforces it: a stray prod .env value would funnel every client/portal/EOI invitation to one address with zero alarms. The only signal is a logger.debug line (index.ts:154-156) — debug, not warn, and no startup banner. The companion documenso-client.ts, email-compose.service.ts, and webhooks.ts paths also silently honour it. Add a refinement in env.ts rejecting it (or at least logger.fatal-ing) when NODE_ENV === 'production', and emit a startup logger.warn whenever it is set so it shows up in container logs.
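The guard can be sketched independently of the schema library, assuming a hypothetical checkEmailRedirect helper called once at startup. In env.ts the same rule would more likely live in a schema-level refinement.

```typescript
interface EmailEnv {
  NODE_ENV?: string;
  EMAIL_REDIRECT_TO?: string;
}

interface RedirectCheck {
  redirectTo: string | null;
  warning: string | null;
}

/** Refuse EMAIL_REDIRECT_TO in production; otherwise return a warning to log at startup. */
function checkEmailRedirect(env: EmailEnv): RedirectCheck {
  const redirectTo = env.EMAIL_REDIRECT_TO?.trim() || null;
  if (redirectTo && env.NODE_ENV === 'production') {
    throw new Error('EMAIL_REDIRECT_TO must be unset when NODE_ENV=production');
  }
  return {
    redirectTo,
    warning: redirectTo
      ? `EMAIL_REDIRECT_TO is set: all outbound mail is redirected to ${redirectTo}`
      : null,
  };
}
```

Calling checkEmailRedirect(process.env) during boot and passing any returned warning to logger.warn would give both the hard stop in production and the visible banner in container logs for other environments.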

C2. Unescaped URL interpolation into href attributes (XSS-able in browser previews)

Every template inlines ${data.link} / ${data.signingUrl} / ${data.loginUrl} / ${item.link} / ${data.inboxLink} / ${data.crmDeepLink} / ${crmUrl} (the last is escaped in inquiry-sales-notification.ts:34, the sole exception) directly into href="…" and into the visible link text. escapeHtml is applied to every recipient name and copy string but never to URLs:

  • crm-invite.ts:42, 48
  • portal-auth.ts:50, 56, 106, 110
  • admin-email-change.ts:48
  • document-signing.ts:92, 98, 211, 216
  • notification-digest.ts:58, 74
  • residential-inquiry.ts:76

Most URLs come from server-built strings (token endpoint + base URL), but notification-digest.items[].link is sourced from notification rows whose deep links can include user-typed entity titles / search queries depending on the producer. A single " in any of those will break out of the attribute. Email clients (Apple Mail, Outlook Web) render the resulting HTML and attribute injection becomes click-jacking / open-redirect. Cheapest fix: pass every URL through encodeURI or an escapeAttribute helper before interpolation, and reject javascript: / data: schemes at the helper level. None of the templates currently verify https:// prefix.
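A sketch of a helper that does both jobs (scheme allowlist plus attribute escaping), assuming absolute URLs only. The names safeHref and escapeAttribute and the '#' fallback are illustrative; the templates' existing escapeHtml may already cover the escaping half.

```typescript
/** Escape the characters that can break out of a quoted HTML attribute. */
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

/** Return an attribute-safe http(s) URL, or `fallback` for anything else. */
function safeHref(raw: string, fallback = '#'): string {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return fallback; // relative or malformed URLs are rejected in this sketch
  }
  if (parsed.protocol !== 'https:' && parsed.protocol !== 'http:') {
    return fallback; // blocks javascript:, data:, and any other scheme
  }
  return escapeAttribute(parsed.href);
}
```

Templates would then interpolate href="${safeHref(data.link)}" and use the same value for the visible link text. Restricting the allowlist to https: alone is a one-line change if plain http links are never expected.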


HIGH

H1. Template-subject override mechanism is silently disconnected for ~half the catalog

src/lib/email/template-catalog.ts:39-98 advertises 8 templates as customisable, and src/components/admin/email-templates-admin.tsx exposes a subject editor. But several templates don't accept overrides.subject and so the admin's edit is silently ignored:

  • inquiry-client-confirmation.ts:23-26 — no overrides.subject path
  • inquiry-sales-notification.ts:24 — same
  • residential-inquiry.ts:19, 61 — both functions, no override path
  • crm-invite.ts:26 — no override path

Only portal-auth.ts (activation/reset) and document-signing.ts honour overrides.subject. Admins editing the "Inquiry — client confirmation" subject in /admin/email-templates see the green "Saved" toast and nothing changes. This is the kind of bug users don't report; they assume their override worked.

H2. Catalog defaultSubject strings DO NOT MATCH the literal subjects the code emits

The admin UI shows Default: <catalog string> so users can tell whether they have customised, but most of these comparisons are broken because the catalog strings diverge from the subjects the templates actually emit:

| Template | Catalog default | Code-emitted subject |
| --- | --- | --- |
| crm_invite | You have been invited to {{portName}} CRM | You're invited to the ${portName} CRM |
| inquiry_client_confirmation | We received your inquiry — {{portName}} | Thank You for Your Interest in Berth ${mooringNumber} (or …in a ${portName} Berth) |
| inquiry_sales_notification | New berth inquiry — {{clientName}} | New Interest - ${portName} |
| residential_inquiry_client_confirmation | We received your residential inquiry — {{portName}} | Thank You for Your Interest - ${portName} Residences |
| residential_inquiry_sales_alert | New residential inquiry — {{clientName}} | New Residential Inquiry - ${data.fullName} |
| portal_activation | Activate your {{portName}} client portal account | matches |
| portal_reset | Reset your {{portName}} client portal password | matches |

Combined with H1 this means the entire admin-customisation surface is half wired. Pick one of: (a) wire overrides.subject through every template and remove the divergence, or (b) drop the catalog rows for templates that can't be customised yet.

H3. Email Settings page exposes dead form fields

src/app/(dashboard)/[portSlug]/admin/email/page.tsx:34-48 lets admins set email_signature_html and email_footer_html. getPortEmailConfig (port-config.ts:153-154,179-180) reads them into PortEmailConfig.signatureHtml / footerHtml. But sendEmail (src/lib/email/index.ts) never reads or injects them, and shell.ts only reads the unrelated branding_email_footer_html / branding_email_header_html keys (via getBrandingShell → getPortBrandingConfig, lines 39-40 of shell.ts).

Result: the "Default signature (HTML)" and "Email footer (HTML)" controls on /admin/email are write-only sinks. Admins customise the footer; outbound emails never include it. There is a real customer-confidence hit here — a port admin will set a legal disclaimer expecting it on every send. Either (a) wire cfg.footerHtml/signatureHtml into renderShell or sendEmail, or (b) delete the fields from the admin page and consolidate on the Branding-page keys.

H4. residential-inquiry.ts returns NO plaintext fallback

residentialClientConfirmation (line 41) and residentialSalesAlert (line 78) return { subject, html } only, with no text. Every other template returns text. Lack of a text part materially hurts spam scoring and breaks plain-text-only readers (some BlackBerry, screen-reader bridges, legacy MTAs). sendEmail (index.ts:144-152) honours text when present, so the consequence is "no plain-text MIME part is attached": Gmail will still render it but SpamAssassin's MIME_HTML_ONLY adds points.

H5. inquiry-sales-notification includes crmUrl (which the admin may set) without scheme validation

inquiry-sales-notification.ts:34 does escapeHtml(crmUrl) then drops it into both href and visible text. Escaping prevents attribute breakout but a javascript: or data:text/html scheme survives entity-encoding. Validate the scheme is https?: server-side before passing in (the producer probably already does; defence-in-depth here is one regex).
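The scheme check can be sketched as below. This is a minimal illustration, assuming a hypothetical helper named isHttpUrl (it is not an existing function in the codebase); it parses with the standard URL constructor rather than a regex and accepts only http: and https:.

```typescript
// Hypothetical helper, not part of the audited codebase.
// Returns true only for absolute URLs whose scheme is http: or https:.
function isHttpUrl(value: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(value);
  } catch {
    // Relative or malformed input cannot be parsed and is rejected.
    return false;
  }
  return parsed.protocol === 'http:' || parsed.protocol === 'https:';
}

// Example: javascript: and data: URLs survive HTML escaping but fail here.
const candidates = ['https://crm.example.com/interests/1', 'javascript:alert(1)'];
const accepted = candidates.filter(isHttpUrl);
```

A caller would run this before escapeHtml and either drop the link or substitute a safe default when it returns false.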


MEDIUM

M1. Admin-authored emailHeaderHtml / emailFooterHtml injected raw

shell.ts:39-40, 64-66 interpolates branding HTML directly: ${headerHtml} is emitted unescaped whenever it is set. Source is system_settings.branding_email_header_html (admin-only write). An admin account compromise → arbitrary HTML in every outbound email (tracking pixels exfiltrating recipient IPs, phishing forms in some clients). Email clients largely strip script, but other markup and CSS-position overlays still work in Apple Mail / Outlook desktop. Mitigation: run admin input through a server-side sanitiser (DOMPurify / sanitize-html with an email-safe allowlist).

M2. No dark-mode safety

shell.ts:42-74 ships no <meta name="color-scheme" content="light dark"> / <meta name="supported-color-schemes"> and no @media (prefers-color-scheme: dark) rules. Apple Mail and Gmail auto-invert backgrounds: the #ffffff card stays white but body text (#333333) gets pseudo-darkened in some clients, and the #666 muted copy can drop below contrast threshold. Cards with box-shadow: 0 2px 4px rgba(0,0,0,0.1) render as halos in dark mode.

M3. No MSO / Outlook fallbacks

The shell has no <!--[if mso]> conditionals. The CTA buttons are CSS-padded <a> — Outlook 2016/2019 on Windows renders them as tiny underlined text (the link works but the button shape is gone). Recommended: VML rect fallback inside <!--[if mso]> for every CTA, or switch to bulletproof-button pattern (mso-padding-alt, text-decoration:none etc.).

M4. Background image won't render where it matters most

shell.ts:55 sets background-image: url('…Overhead_1_blur.png') on the outer <table>. Outlook strips background-image entirely (no VML fallback supplied); Gmail mobile sometimes too. The background-color:#f2f2f2 fallback works but the brand impression is lost in the highest-volume client. Either drop the bg image (CLAUDE.md flags moving the asset off s3.portnimara.com anyway) or add VML rect for Outlook.

M5. No preheader

No hidden inbox-preview text. Currently the first visible line ("Welcome to {portName} CRM" or "Just a quick reminder") leaks into the preview pane after the subject. A 1px hidden preheader (<div style="display:none;max-height:0;overflow:hidden;">…</div>) is one of the highest-ROI deliverability tweaks; missing here.

M6. Logo width="100" without 2x source / explicit height

shell.ts:62. Apple Mail on retina renders this scaled — acceptable since the source PNG is 250px wide. But height is unset which forces some clients to recompute mid-render (jank). Add height="100" (assuming square) and the asset is fine.


LOW

  • L1. Hardcoded en-GB date locale. document-signing.ts:141 calls toLocaleString('en-GB', …). CRM is positioned as multi-port; once a non-UK port arrives this is wrong. Read locale from port-config.
  • L2. No List-Unsubscribe / List-Unsubscribe-Post headers. sendEmail adds none. Gmail's Feb-2024 bulk-sender requirements made List-Unsubscribe: <mailto:> table-stakes for any sender exceeding 5k/day. Transactional senders technically exempt, but with notification digests + inquiry confirmations flowing through the same SMTP path, hitting that threshold is plausible. One-line fix.
  • L3. No Message-ID / In-Reply-To threading for digest-style mails. Each digest is a new thread; users will hate this once volume rises.
  • L4. logger.debug in sendEmail (index.ts:154) emits recipient address. PII in log lines. At debug level so prod typically masks, but worth pino-redacting to and originalTo.
  • L5. crmInviteEmail, adminEmailChangeEmail, notificationDigestEmail not in TEMPLATE_KEYS. Means the admin can't customise their subjects at all — inconsistent with the rest. Either add them or document the omission.
  • L6. Hardcoded English copy. Every template — buttons ("Sign in", "Activate account", "Set up your account"), greetings ("Dear …", "Hi …"), legal-ish boilerplate ("If you didn't request this …"). No i18n hook. Out-of-scope for v1 but flag for the Phase-7 cutover note in project_email_ownership_at_cutover.md.
  • L7. SMTP_FROM fallback in sendEmail builds noreply@${env.SMTP_HOST}. If SMTP_HOST is smtp.gmail.com the From becomes noreply@smtp.gmail.com — invalid sender, instant SPF fail. Acceptable because production sets SMTP_FROM explicitly, but worth a logger.warn when this fallback is hit.
  • L8. Subject prefix when redirecting (index.ts:132-134) — [redirected from x@y] appears verbatim and is fine in dev, but if EMAIL_REDIRECT_TO ever slips into prod (see C1) this is the only forensic trail.

What's good

  • All admin-supplied content (names, descriptions, custom messages, notes) is consistently escapeHtml-ed before interpolation. The only escape gaps are URLs (C2).
  • Per-port branding shell is well-isolated; getBrandingShell falls back to defaults cleanly.
  • resolveAttachments enforces portId cross-tenant isolation (index.ts:94-96).
  • SMTP timeouts are explicit (SMTP_TIMEOUTS, index.ts:20-24) — averts the BullMQ-slot starvation the comments warn about.
  • EMAIL_REDIRECT_TO plumbing is consistent across sendEmail, documenso-client, email-compose, webhooks workers — when set, every outbound channel honours it.

Suggested fix order

  1. C1 (production guard for EMAIL_REDIRECT_TO)
  2. C2 (URL escaping / scheme allowlist)
  3. H3 (delete or wire up the dead Email Settings fields — fastest unblocker for admins)
  4. H1 + H2 (fix catalog / wire override paths)
  5. H4 (add plaintext to residential templates)
  6. M1 (sanitise admin HTML)
  7. M2 / M3 / M5 (dark mode + MSO + preheader)
  8. The Lows whenever convenient.

Total: ~1460 words.


10. Error UX + failure-mode resilience audit (error-ux-auditor)

Error UX + Failure-Mode Resilience Audit

Repo: /Users/matt/Repos/new-pn-crm — branch feat/documents-folders. Scope: route-segment error/not-found/loading coverage, error-boundary placement, toast quality, leak surface, degradation when Redis / SMTP / Documenso / MinIO are down.


CRITICAL

C1. Only ONE error.tsx for the entire app, no not-found.tsx per group, no global-error.tsx

find src/app -name "error.tsx" returns exactly src/app/(dashboard)/error.tsx. Plus src/app/not-found.tsx (root) and a single loading.tsx under clients/[clientId]. 73 dashboard pages, 0 portal/auth/scanner error files.

Consequences:

  1. A throw inside /portal/* (dashboard, invoices, documents, profile) has no boundary — Next default unstyled page, no branding, no requestId.
  2. (auth) (login, reset-password, set-password) same exposure.
  3. (scanner)/[portSlug]/scan — receipt scanner on a phone, throws on Tesseract/OpenAI failure, no fallback.
  4. notFound() calls inside portal routes fall through to the root not-found.tsx, which links to /dashboard — wrong destination for portal users (lands them at CRM login).
  5. No global-error.tsx — if RootLayout throws (it reads cookies + ALS), user gets Next default.

Add at minimum:

  • src/app/(portal)/error.tsx + src/app/(portal)/portal/not-found.tsx
  • src/app/(auth)/error.tsx (wrapped in BrandedAuthShell)
  • src/app/(scanner)/[portSlug]/scan/error.tsx
  • src/app/(dashboard)/[portSlug]/not-found.tsx (port-aware link target)
  • src/app/global-error.tsx

C2. 14+ naked toast.error(err.message) call sites bypass toastError()

grep "toast.error.*err.message" returns 14 hits: client-list.tsx, hard-delete-dialog.tsx, bulk-archive-wizard.tsx, smart-restore-dialog.tsx, smart-archive-dialog.tsx, bulk-hard-delete-dialog.tsx, portal/change-password-form.tsx, more. These drop:

  • the stable error code line
  • the Reference ID line
  • the "Copy ID" action button

On a 500 they show "Internal server error" with nothing copyable. On a network failure they show "Failed to fetch" (raw TypeError). Several of these files already import toastError for other call sites; the swap is mechanical: toast.error(err instanceof Error ? err.message : 'X failed') → toastError(err, 'X failed').

C3. apiFetch collapses 502/504 with non-JSON body to "Bad Gateway", no requestId

src/lib/api/client.ts:75:

const error = await res.json().catch(() => ({ error: res.statusText }));

Reverse-proxy error pages (nginx, Cloudflare) deliver HTML, not JSON. The user gets ApiError{ message: "Bad Gateway", code: null, requestId: null } — no "Copy ID" action. When proxy fails, the user has nothing to paste to support. Synthesize a client-side correlation ID + a "The server is unreachable. Please try again." string when status >= 500 && JSON.parse fails.
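A minimal sketch of that fallback follows. The helper names (parseErrorBody, clientCorrelationId), the ResponseLike shape, and the JSON body field names (error, code, requestId) are assumptions for illustration; only the ApiError fields message, code and requestId come from the finding above.

```typescript
// Minimal shape of the fetch Response members this sketch relies on.
interface ResponseLike {
  status: number;
  statusText: string;
  json(): Promise<unknown>;
}

interface ParsedApiError {
  message: string;
  code: string | null;
  requestId: string | null;
}

// Hypothetical: builds a client-side correlation ID when neither the server
// nor the proxy in front of it supplied one.
function clientCorrelationId(): string {
  return `client-${Date.now().toString(36)}-${Math.random().toString(36).slice(2, 10)}`;
}

async function parseErrorBody(res: ResponseLike): Promise<ParsedApiError> {
  try {
    // Assumed body shape: { error, code, requestId }.
    const body = (await res.json()) as { error?: string; code?: string; requestId?: string };
    return {
      message: body.error ?? res.statusText,
      code: body.code ?? null,
      requestId: body.requestId ?? null,
    };
  } catch {
    // Non-JSON body (for example an HTML error page from nginx or Cloudflare).
    return {
      message: res.status >= 500 ? 'The server is unreachable. Please try again.' : res.statusText,
      code: null,
      requestId: clientCorrelationId(),
    };
  }
}
```

The synthesized ID is only useful to support if it is also logged or sent with the next request, which this sketch does not attempt.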

C4. Redis outage wedges every rate-limited route — including login

src/lib/redis.ts uses maxRetriesPerRequest: 3. After exhaustion every call from checkRateLimit() (rate-limit.ts:44) throws ioredis errors. withRateLimit doesn't try/catch, so it bubbles to errorResponse() as 500. /api/auth/* is wrapped in withRateLimit('auth'), so a Redis blip 500s login. Same exposure for portal sign-in (portalSignIn limiter on /api/portal/auth/sign-in).

Fix: in checkRateLimit, catch redis errors and fail-open for auth / portal-signin (log "rate-limit subsystem unavailable, allowing request") or fall back to a local in-memory limiter.
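One way the in-memory fallback could look, as a sketch only. LocalWindowLimiter and checkWithFallback are hypothetical names, and the primary callback stands in for the existing Redis-backed check, whose real signature is not shown in this audit.

```typescript
interface LimiterResult {
  allowed: boolean;
  degraded: boolean; // true when the primary store was unavailable
}

// Per-process fixed-window counter, used only while the primary store is down.
class LocalWindowLimiter {
  private readonly hits = new Map<string, { count: number; resetAt: number }>();
  private readonly max: number;
  private readonly windowMs: number;

  constructor(max: number, windowMs: number) {
    this.max = max;
    this.windowMs = windowMs;
  }

  check(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now >= entry.resetAt) {
      this.hits.set(key, { count: 1, resetAt: now + this.windowMs });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.max;
  }
}

async function checkWithFallback(
  key: string,
  primary: (key: string) => Promise<boolean>, // stand-in for the Redis-backed check
  fallback: LocalWindowLimiter,
  onDegraded: (err: unknown) => void = () => {},
): Promise<LimiterResult> {
  try {
    return { allowed: await primary(key), degraded: false };
  } catch (err) {
    onDegraded(err); // e.g. log "rate-limit subsystem unavailable"
    return { allowed: fallback.check(key), degraded: true };
  }
}
```

A per-process counter is weaker than the shared Redis limit when several instances are running, but it keeps login available while still bounding abuse per instance.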

Same audit needed for BullMQ getQueue().add() calls — confirm user-blocking enqueues (sendForSigning, requestGdprExport) degrade to "we'll process this later" instead of 500.


HIGH

H1. SMTP failure semantics differ across callers

sendEmail() has 10s/10s/30s timeouts (email/index.ts:20) — good. Callers diverge:

  • users.service.ts:381 (admin email-change notify) — logger.warn + swallow. ✓
  • me/email/route.ts:93 uses Promise.allSettled. ✓
  • document-signing-emails.service.ts:169,211,247 throws → 500. Documenso already sent; user sees a 500 even though the workflow succeeded. Wrap in try/catch + mark delivery_status: 'failed' so the inbox panel surfaces a retry button.
  • queue/workers/email.ts — BullMQ retries 5× then permanent failure. No DLQ admin surface (webhooks have one at webhooks.ts:281; mirror it).

H2. Storage timeout error lacks semantic name → bad classifier hint

src/lib/storage/s3.ts:52:

throw new Error(`S3 ${label} timed out after ${ms}ms`);

error-classifier.ts ERROR_NAME_HINTS looks for TimeoutError; this throws plain Error. The path-based classifier catches "Storage backend" first but loses the timeout-vs-misconfig distinction. Define a class TimeoutError extends Error and throw it from withTimeout.
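A sketch of the named error and a wrapper that throws it. The withTimeout signature here is an assumption (the real one in s3.ts is not shown in this audit); the message format matches the existing throw.

```typescript
class TimeoutError extends Error {
  readonly label: string;
  readonly ms: number;

  constructor(label: string, ms: number) {
    super(`S3 ${label} timed out after ${ms}ms`);
    // Keep instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = 'TimeoutError'; // the value a name-based classifier can match on
    this.label = label;
    this.ms = ms;
  }
}

// Assumed signature: races the work against a timer and rejects with
// TimeoutError if the timer wins.
function withTimeout<T>(label: string, ms: number, work: Promise<T>): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(label, ms)), ms);
  });
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

With this in place a classifier can branch on err.name === 'TimeoutError' (or instanceof) instead of parsing the message.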

H3. Documenso outage: error codes good, UI feedback poor

documenso-client.ts:42-60 maps to DOCUMENSO_TIMEOUT, _AUTH_FAILURE, _UPSTREAM_ERROR. Toasts render cleanly. Missing:

  • The signing page doesn't show "Documenso is unreachable, your draft is saved." Users refresh and assume the draft is gone.
  • The webhook receiver has no per-port rate-limit on 5xx. Documenso retry storms can land if our handler regresses.

H4. Heavy components have no error boundaries except /dashboard/page.tsx widgets

WidgetErrorBoundary is used in 4 dashboard widgets. NOT wrapped around:

  • command-search.tsx (1177 lines) — mounted in header; one render throw kills the entire shell.
  • invoice-pdf-preview.tsx (pdfme — known to throw on malformed font/image).
  • pageviews-chart.tsx, pipeline-funnel-chart.tsx, charts outside /dashboard/page.tsx.
  • The signed-PDF iframe inside documents/[id]/page.tsx — when MinIO is down, chrome-internal error renders in-place with no retry.

Wrap each in WidgetErrorBoundary with a sensible fallback.

H5. New + public routes bypass errorResponse()

grep "errorResponse" shows 691 hits. The exceptions don't propagate X-Request-Id and produce inconsistent shapes:

  • src/app/api/storage/[token]/route.ts — bare NextResponse.json({error:'Invalid or expired token'}), no requestId.
  • src/app/api/public/website-inquiries/route.ts:75,122 — bare {error:'Unauthorized'} / {error:'Unknown port'}.
  • src/app/api/webhooks/documenso/route.ts:100{ok:false, error:'Invalid secret'} 200. Returning 200 is correct (no Documenso retry storms), but the literal string "Invalid secret" confirms the endpoint expects a secret. Drop the string.
  • src/app/api/auth/resolve-identifier/route.ts:91 — defensive 200 returning synthetic email. By design — keep.

MEDIUM

M1. Root not-found.tsx links to /dashboard for every visitor

The root not-found.tsx links to /dashboard. Portal users hit /portal/dashboard, unauth users need /login or /portal/login. Detect cookie/route prefix.

M2. Suspense boundaries are sparse — 9 across src/app, 0 in components

Only set-password, portal/activate, portal/reset-password wrap useSearchParams in Suspense. Every detail page (yacht, company, interest, berth, document, invoice, expense, reservation) flashes empty header on direct URL visits because there's no loading.tsx. Only clients/[clientId]/loading.tsx exists — replicate the pattern across detail routes.

M3. error_events capture is fire-and-forget — DB write failure swallowed silently

void captureErrorEvent({…}) (errors.ts:146, 170). If the DB is up but the insert fails (FK to a deleted user, etc.) the row is lost forever and the super-admin can't trace the original error. Add a fallback that writes to pino with tag:'error_event_capture_failed' so the super-admin can grep server logs as a last resort.

M4. PG-error 23505 sometimes leaks as 500 instead of ConflictError

berth-reservations.service.ts:29-43 and document-folders.service.ts:18-31 explicitly map 23505 → ConflictError. Confirm clients.service.ts, companies.service.ts, yachts.service.ts write paths do the same — at least one or two likely bubble a raw PG error and 500 on duplicate email / duplicate mooring instead of a 409 with a friendly "this name is already in use" message.
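The mapping could be factored into a shared wrapper along these lines. This is a sketch: mapUniqueViolation is a hypothetical name, the ConflictError class below is a stand-in for the project's own, and it assumes the driver error exposes the SQLSTATE as a code property (if the ORM wraps the error, the check would need to look at the wrapped cause instead).

```typescript
// Stand-in for the project's ConflictError.
class ConflictError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = 'ConflictError';
  }
}

// PostgreSQL SQLSTATE for unique_violation.
const PG_UNIQUE_VIOLATION = '23505';

function isUniqueViolation(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    (err as { code?: unknown }).code === PG_UNIQUE_VIOLATION
  );
}

// Runs a write and converts a unique-constraint failure into a 409-style error.
// Any other error is rethrown untouched.
async function mapUniqueViolation<T>(friendly: string, op: () => Promise<T>): Promise<T> {
  try {
    return await op();
  } catch (err) {
    if (isUniqueViolation(err)) throw new ConflictError(friendly);
    throw err;
  }
}
```

Each write path would then wrap its insert or update in mapUniqueViolation with its own friendly message.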

M5. /api/ready doesn't exist

/api/health (liveness) returns 200 unconditionally — correct. The comment promises /api/ready for deep checks, but find shows no such file. /api/public/health does deep checks gated by WEBSITE_INTAKE_SECRET — wire k8s readiness probe to it or stub /api/ready.

M6. formatErrorBanner (admin inline forms) doesn't render "Copy ID" action

Lives in toast-error.ts next to toastError(). The toast version has a button; the banner is plain text. Admin users hitting a 500 from inline forms get the reference ID printed but can't click-to-copy. Either build <ErrorBanner err={…}> as a React component or accept the gap.

M7. Worker BullMQ failures have no user-visible surface beyond webhooks

logger.error({jobId, err}, '<queue> job failed') is uniform across all workers (email, documents, notifications, reports, export, ai, webhooks). Only webhooks.ts:281 plumbs a dead_letter notification on permanent failure. Notification/email/export workers should follow suit — for example, a stuck GDPR export should email the user "your export failed, retry from /settings/data."

M8. Portal auth pages would lose brand on render error

Portal-auth pages wrap content in BrandedAuthShell. A throw inside the shell or form lands at Next default page (no (portal)/error.tsx). Add (portal)/portal/error.tsx that renders <BrandedAuthShell> around the error so the brand survives.


Summary

| Severity | Count |
| --- | --- |
| CRITICAL | 4 |
| HIGH | 5 |
| MEDIUM | 8 |

Highest leverage: ship the missing route-segment files listed under C1, sweep the 14+ bare toast.error(err.message) sites to toastError() (C2), make checkRateLimit fail-open when Redis is down (C4). Together these mean every user-visible degradation is branded + every 5xx surfaces a copy-pasteable reference ID.


14. Documenso integration depth audit (documenso-auditor)

Documenso Integration Depth Audit — Task #14

Scope: documenso-client.ts, documenso-payload.ts, eoi-context.ts, app/api/webhooks/documenso/route.ts. Read-only. Severity: CRITICAL / HIGH / MEDIUM.


CRITICAL

C1. In-app EOI pathway bypasses per-port Documenso config

generateAndSignViaInApp in document-templates.ts calls documensoCreate(...) and documensoSend(...) without portId (lines 831-843). resolveCreds then returns the global env triple (DOCUMENSO_API_URL, DOCUMENSO_API_KEY, DOCUMENSO_API_VERSION).

Consequences on multi-tenant deployments:

  • Per-port apiVersion ignored → a v2 port silently hits v1 endpoint paths (or vice versa); createDocument/sendDocument pick the wrong branch.
  • Per-port apiKey ignored → auth fails on tenants whose key is only in system_settings.documenso_api_key_override.
  • redirectUrl and signingOrder SEQUENTIAL/PARALLEL settings never plumbed — the in-app pathway passes no meta arg. Signers always land on Documenso's default thank-you page and v2 ports always sign PARALLEL regardless of admin choice.

Fix: thread portId and a CreateDocumentMeta built from getPortDocumensoConfig(portId) into both calls, mirroring generateAndSignViaDocumensoTemplate at lines 894-910.

C2. handleDocumentCompleted idempotency has a real cross-channel race

The early-return at documents.service.ts:1110 (if (doc.status === 'completed' && doc.signedFileId) return;) is necessary but not sufficient. Two write paths can race:

  1. Webhook receiver → handleDocumentCompleted.
  2. Background poll worker jobs/processors/documenso-poll.ts:63 → same call (same args).

The route-level documentEvents.signatureHash dedup only catches webhook→webhook repeats. It does not catch webhook + poll, because the poll worker bypasses the webhook entry point and has no signatureHash row. Both can:

  1. Resolve doc (status=sent, signedFileId=null).
  2. Pass the gate.
  3. downloadSignedPdf → storage.put → db.insert(files) → db.update(documents).set({ status:'completed', signedFileId }).

Outcome: two files rows, two MinIO blobs; the second UPDATE overwrites signedFileId, orphaning the first row + blob (no DB pointer, never GC'd).

Fix: wrap gate-and-write in a transaction with SELECT ... FOR UPDATE on the documents row, or a pre-claim UPDATE documents SET status='completing' WHERE id=? AND status != 'completed' RETURNING * that atomically reserves the row.

C3. Webhook silently swallows handler errors → permanent event loss

route.ts:264-266 catches every handler throw, logs, returns 200 (intentional: "always 200"). But a transient storage/DB failure inside handleDocumentCompleted is lost forever: Documenso records the event as delivered and never retries. Poll worker is the only safety net.

Fix: on handler throw, return non-200 so Documenso retries (bounded budget); or push the raw body onto a BullMQ replay queue.


HIGH

H1. Multi-berth Berth Range Documenso template field still pending

buildDocumensoPayload writes formValues['Berth Range']: context.eoiBerthRange (line 157), and eoi-context.ts:128-135 populates it from interest_berths.is_in_eoi_bundle=true via formatBerthRange(). The live Documenso v1 template does not yet have this field (CLAUDE.md confirms). Documenso v1's templates/{id}/generate-document silently drops unknown formValues keys, so multi-berth EOIs currently render with only the primary mooring in Berth Number.

The in-app pathway (pdf/fill-eoi-form.ts:62-67) fails loudly when its AcroForm field is missing; the Documenso pathway fails silently. Add a startup GET /api/v1/templates/{id} preflight that warns when Berth Range is absent.

H2. placeFields v2 path is unverified against a live Documenso 2.x instance

documenso-client.ts:636 has an explicit "must be confirmed against a live Documenso 2.x instance — top v2 risk" comment. Concerns:

  • Body uses recipientId: String(f.recipientId); v2 may want numeric ID or string token — unverified.
  • Geometry name mapping (positionX/positionY/width/height vs v1 pageX/...) is correct in shape, unverified in field naming.
  • fieldMeta shipped verbatim; v2's create-many schema unpinned.

Any port flipped to apiVersion='v2' using upload-and-place is rolling the dice until realapi run is green.

H3. v1 fallback for CHECKBOX/DROPDOWN/RADIO is broken — silently

fieldTypeNeedsMeta permits CHECKBOX/DROPDOWN/RADIO. On v1, placeFields strips fieldMeta (lines 663-671 omit it) and v1's /documents/{id}/fields doesn't accept option metadata. A CHECKBOX placed on a v1 port renders as an unconfigured input with no options.

Code comment acknowledges ("falls back to blank-input behaviour"), but the placement UI gives no signal. Add a v1-aware preflight that disables these field types when apiVersion='v1'.

H4. sendReminder v2 redistribute recipient scoping is unverified

sendReminder v2 (lines 391-407) ships { envelopeId, recipientIds: [signerId] } to /api/v2/envelope/redistribute. The leading comment contradicts the body: "redistributes to all pending recipients on the envelope. Single-recipient targeting requires admin-side filtering."

If v2 ignores recipientIds, every "remind one signer" click resends to everyone, including already-completed signers — embarrassment risk on multi-signer EOIs. Realapi verification needed; reconcile comment with implementation either way.


MEDIUM

M1. apiVersion='v1' template-flow caveat correct but locks out v2 features

generateDocumentFromTemplate is hard-coded to /api/v1/templates/{id}/generate-document regardless of apiVersion. v2 instances accept this via backward-compat. Risk: a v2-native admin who built a template in the v2 UI may have field IDs but no stable field names, so formValues keyed by name won't match. If Documenso drops v1 compat, every template-flow EOI breaks atomically. Plan now to capture per-template field-ID metadata in admin settings.

M2. getPageDimensions cache + A4 assumption

documenso-client.ts:597 returns DEFAULT_PAGE_DIMENSIONS = { 595, 842 } (A4 portrait, pt) unconditionally, so the cache is dead code. Fine for the A4 EOI source PDF; for admin-uploaded contracts in Letter/A3/landscape, percent→pixel conversion is wrong by 5-30%, placing fields off-page or in the wrong band. Capture real page size via pdf-lib at upload time.
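To make the drift concrete, here is a small worked example of a percent-to-points conversion. percentToPoints is a hypothetical helper (the real conversion code is not shown in this audit); the A4 figures are the defaults quoted above and US Letter is 612 × 792 pt.

```typescript
interface PageSize {
  width: number; // PDF points (1/72 inch)
  height: number;
}

const A4: PageSize = { width: 595, height: 842 };
const US_LETTER: PageSize = { width: 612, height: 792 };

// Hypothetical conversion: field position given as a percentage of the page.
function percentToPoints(xPct: number, yPct: number, page: PageSize): { x: number; y: number } {
  return { x: (xPct / 100) * page.width, y: (yPct / 100) * page.height };
}

// A signature field placed at 50% across, 90% down, on a US Letter upload.
const assumed = percentToPoints(50, 90, A4); // what the A4 default produces
const actual = percentToPoints(50, 90, US_LETTER); // what the page really needs

// Vertical drift is about 45 pt; horizontal drift is about -8.5 pt.
const verticalDriftPt = assumed.y - actual.y;
const horizontalDriftPt = assumed.x - actual.x;
```

The drift grows with distance from the origin, which is why fields near the bottom of a page (typically signature blocks) are the most affected.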

M3. normalizeDocument recipient id collapses to '' on missing fields

Line 75: id: String(rec.recipientId ?? rec.id ?? ''). When both keys are absent (malformed response), id becomes ''; downstream maps keyed by recipient id collapse all phantoms into one bucket. Throw or filter when id is empty.
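A sketch of the throwing variant. RawRecipient and recipientIdOrThrow are illustrative names, and the raw field types are assumptions about the Documenso response shape.

```typescript
// Assumed raw shape: either key may carry the identifier, or both may be absent.
interface RawRecipient {
  recipientId?: number | string | null;
  id?: number | string | null;
}

// Returns the normalised id, or throws instead of collapsing to ''.
function recipientIdOrThrow(rec: RawRecipient): string {
  const raw = rec.recipientId ?? rec.id;
  if (raw === undefined || raw === null || String(raw) === '') {
    throw new Error('Documenso recipient is missing both recipientId and id');
  }
  return String(raw);
}
```

If dropping malformed recipients is preferable to failing the whole normalisation, the same check can drive a filter with a warning log instead of a throw.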

M4. applyPayloadRedirect /email$/i regex is fragile

documenso-client.ts:148 matches keys ending in email. A future field like notificationEmailAddress or cc_email_2 would be missed and could leak past EMAIL_REDIRECT_TO. Either widen the heuristic, or declare email fields explicitly in DocumensoTemplatePayload and rewrite only those.
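The gap is easy to demonstrate with the two field names mentioned above. The key list below is illustrative (signerEmail is a made-up example), and the widened pattern is one possible heuristic, not a recommendation over an explicit field list.

```typescript
// The heuristic described in this finding: keys that END in "email".
const currentHeuristic = /email$/i;

// One possible widening: keys that CONTAIN "email" anywhere.
const widenedHeuristic = /email/i;

const exampleKeys = ['signerEmail', 'email', 'notificationEmailAddress', 'cc_email_2'];

// Keys that would slip past the redirect rewrite today.
const missedByCurrent = exampleKeys.filter((k) => !currentHeuristic.test(k));
// missedByCurrent is ['notificationEmailAddress', 'cc_email_2']

const missedByWidened = exampleKeys.filter((k) => !widenedHeuristic.test(k));
// missedByWidened is []
```

The widened pattern can over-match (a boolean such as emailVerified would be caught too), which is the argument for declaring the email fields explicitly.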

M5. voidDocument 404-idempotency loses tenant signal

On 404, log + return silently. The local doc may still have status='sent', so a retry re-attempts. Mostly benign — but set local status='voided' on 404 so DB converges with remote-not-found reality.

M6. EOI hard-gate error code

eoi-context.ts:206 produces Cannot generate EOI - missing required client details: .... Labels are clean (good), but no structured code/field array — UI can't deep-link to the missing tab. Add code: 'EOI_GATE' + missing fields array.

M7. Webhook signatureHash covers replay but not v2 timestamp drift

Confirmed body-sha256 dedups same-payload retries. If v2 ever varies signedAt on retry, the per-recipient ${signatureHash}:signed:${email} keys differ → repeat processing. The per-recipient document_events index protects writes there, but handleRecipientSigned likely also advances interest stage — verify that side-effect is idempotent too.


What's solid

  • normalizeDocument id↔documentId symmetric; downstream consumes the legacy id form consistently — no stray reads of documentId/recipientId.
  • canonicalizeEvent correctly maps DOCUMENT_SIGNED → document.signed and routes v2 aliases (RECIPIENT_SIGNED, RECIPIENT_VIEWED) to v1 equivalents with a telemetry log line.
  • verifyDocumensoSecret timing-safe, iterates per-port + global env, rate-limits bad-secret IPs.
  • handleDocumentCompleted early-return is the right shape for the common same-channel retry case. Cross-channel race (C2) is separate.
  • eoi-context.eoiBerthRange plumbing correctly walks interest_berths.is_in_eoi_bundle=true and produces the compact range. Gap is template-side (H1).
  • SEQUENTIAL/PARALLEL signingOrder correctly wired in generateAndSignViaDocumensoTemplate (document-templates.ts:909). Gap is the in-app pathway (C1).
  • buildDocumensoPayload.meta.distributionMethod = 'NONE' — distribute invoked separately by sendDocument. Correct on both versions.
  • EOI hard-gate matches Section 2 requirements (name/address/email); yacht + berth correctly optional.

Pending — all complete

All 19 audit tasks finished. Every report is inlined above.


Appendix: methodology + agent roster

Audit was run as a single pn-crm-audit Claude Code team. Each teammate was a separate Claude Opus 4.7 instance with read-only static-analysis scope (no file edits permitted by the brief). Time budget: 22 minutes per agent. Reports were written to /tmp/audit-*.md and consolidated here.

Team members

| Agent | Task | Output |
| --- | --- | --- |
| security-auditor | #1 Security + API + auth | /tmp/audit-security.md |
| ui-ux-auditor | #2 UI/UX + a11y | /tmp/audit-ui-ux.md |
| data-model-auditor | #3 Data model + migrations | /tmp/audit-data-model.md |
| services-auditor | #4 Services + realtime + storage | /tmp/audit-services.md |
| perf-test-auditor | #5 Performance + code-trim + render | /tmp/audit-perf-test.md |
| obs-i18n-docs-auditor | #6 Observability + i18n + docs | /tmp/audit-obs-i18n-docs.md |
| concurrency-auditor | #7 Concurrency + races | /tmp/audit-concurrency.md |
| gdpr-auditor | #8 GDPR + PII | /tmp/audit-gdpr.md |
| email-auditor | #9 Email deliverability | /tmp/audit-email.md |
| error-ux-auditor | #10 Error UX + failure modes | /tmp/audit-error-ux.md |
| reporting-auditor | #11 Reporting math | pending |
| onboarding-auditor | #12 Onboarding UX | pending |
| pdf-auditor | #13 PDF + brand assets | pending |
| documenso-auditor | #14 Documenso depth | /tmp/audit-documenso.md |
| copy-auditor | #15 Copy + terminology | pending |
| deps-auditor | #16 Deps + supply chain | pending |
| build-auditor | #17 Build + prod readiness | pending |
| recommender-auditor | #18 Berth recommender | pending |
| search-auditor | #19 Search relevance | pending |

11. Reporting + analytics math correctness (reporting-auditor)

Task #11 — Reporting + Analytics Math Correctness

Scope: dashboard widgets, kanban "active deals", pipeline-report PDF, revenue-report PDF, dashboard.service.ts, report-generators.ts, analytics.service.ts. Read-only audit.

Canonical pipeline stages live in src/lib/constants.tsPIPELINE_STAGES: open, details_sent, in_communication, eoi_sent, eoi_signed, deposit_10pct, contract_sent, contract_signed, completed. STAGE_WEIGHTS matches.


CRITICAL

C1. Hot-deals card ranks/labels on non-existent stage names

src/lib/services/dashboard.service.ts:198-208 (getHotDeals) builds a CASE that references 'in_comms' and 'deposit_10'. The DB column interests.pipeline_stage stores 'in_communication' and 'deposit_10pct'. Both real stages fall through to ELSE 0, collapsing the rank ladder so any eoi_sent deal outranks every in_communication/deposit_10pct deal, and ordering inside the top tier becomes "newest updatedAt wins" instead of "furthest along."

The frontend mirror in src/components/dashboard/hot-deals-card.tsx:26-36 (STAGE_LABELS) uses the same wrong keys (deposit_10, in_comms), so the badge for those two stages renders the raw enum string deposit_10pct / in_communication instead of "Deposit 10%" / "In Comms." Fix both files; prefer importing STAGE_LABELS from @/lib/constants rather than re-declaring it.
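One way to avoid hand-typed stage keys is to derive the rank from the canonical list. This is a sketch: the stage names are the ones listed at the top of this report, but stageRank is a hypothetical helper and the actual rank values used inside getHotDeals are not shown in this audit.

```typescript
// Canonical order as listed for PIPELINE_STAGES in src/lib/constants.ts.
const PIPELINE_STAGES = [
  'open',
  'details_sent',
  'in_communication',
  'eoi_sent',
  'eoi_signed',
  'deposit_10pct',
  'contract_sent',
  'contract_signed',
  'completed',
] as const;

// Hypothetical helper: 1-based position in the canonical order, 0 for unknown keys.
function stageRank(stage: string): number {
  return (PIPELINE_STAGES as readonly string[]).indexOf(stage) + 1;
}

// The misspelled keys from the finding both fall through to 0,
// which is the same collapse the CASE ... ELSE 0 produces.
const misspelled = [stageRank('in_comms'), stageRank('deposit_10')];
const canonical = [stageRank('in_communication'), stageRank('deposit_10pct')];
```

The same array can feed the SQL CASE (generated from the list) and the frontend label lookup, so a renamed stage cannot drift between the two.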

C2. Revenue PDF "TOTAL COMPLETED REVENUE" silently includes lost & cancelled deals

setInterestOutcome in interests.service.ts:919-943 forces pipelineStage = 'completed' for every outcome (won, lost_*, cancelled). fetchRevenueData in report-generators.ts:126-140 then sums berth prices for pipelineStage='completed' AND archivedAt IS NULL with no outcome filter, and the PDF prints the result as TOTAL COMPLETED REVENUE (revenue-report.ts:97). Result: a marina with 1 won + 10 lost deals at €1M berths reports €11M completed revenue. Add eq(interests.outcome, 'won') to the completedRevenue query (and probably to the per-stage breakdown).
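The arithmetic from the example above, as an in-memory sketch. The Deal shape and completedRevenue function are illustrative only (the real query is SQL via the ORM), and the single 'lost' outcome stands in for the lost_* family.

```typescript
type Outcome = 'won' | 'lost' | 'cancelled' | null;

interface Deal {
  pipelineStage: string;
  outcome: Outcome;
  archived: boolean;
  berthPrice: number; // EUR
}

// Sums berth prices for completed, non-archived deals.
// wonOnly = false mirrors the current query; wonOnly = true adds the outcome filter.
function completedRevenue(deals: Deal[], wonOnly: boolean): number {
  return deals
    .filter((d) => d.pipelineStage === 'completed' && !d.archived)
    .filter((d) => !wonOnly || d.outcome === 'won')
    .reduce((sum, d) => sum + d.berthPrice, 0);
}

// 1 won + 10 lost deals, each on a 1M berth. Every outcome is stored as 'completed'.
const deals: Deal[] = [
  { pipelineStage: 'completed', outcome: 'won', archived: false, berthPrice: 1_000_000 },
  ...Array.from({ length: 10 }, (): Deal => ({
    pipelineStage: 'completed',
    outcome: 'lost',
    archived: false,
    berthPrice: 1_000_000,
  })),
];

const reportedToday = completedRevenue(deals, false); // 11,000,000
const withOutcomeFilter = completedRevenue(deals, true); // 1,000,000
```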

C3. Pipeline PDF stageCounts query has no GROUP BY

report-generators.ts:54-60:

db.select({ stage: interests.pipelineStage, count: count() })
  .from(interests)
  .where(...);   // ← no .groupBy()

Postgres rejects a non-aggregated column without GROUP BY (42803). Either the pipeline PDF report has been crashing silently in the worker queue for any port with rows, or every run produces a single row that misses every stage but one. Add .groupBy(interests.pipelineStage).


HIGH

H1. "Active interest" means four different things across surfaces

| Surface | Filter |
| --- | --- |
| getKpis / getPipelineCounts / getRevenueForecast (dashboard tiles + forecast) | archivedAt IS NULL AND (outcome IS NULL OR outcome='won') |
| computePipelineFunnel (analytics funnel) | same, but additionally bounded by createdAt BETWEEN range |
| listInterestsForBoard (kanban), interests.service.ts:194 | archivedAt IS NULL only ⇒ lost & cancelled cards still appear on the board (they all sit in the completed column because of C2) |
| getHotDeals | archivedAt IS NULL AND outcome IS NULL (also excludes won; intentional per comment but worth flagging) |
| fetchPipelineData / fetchRevenueData (PDF reports) | archivedAt IS NULL only ⇒ includes lost & cancelled |
| computeRevenueBreakdown (invoices) | unrelated definition: by invoice status |

A rep who reads "12 Active Deals" on the tile then opens the kanban can see 17 cards, because the kanban silently includes 5 lost deals routed to the completed column. Consolidate into a single activeInterestsWhere(port) helper and reuse everywhere.

H2. Occupancy rate uses two different sources, same dashboard

  • getKpis (KPI tile) + fetchOccupancyData (PDF) compute occupancy from berths.status IN ('sold','under_offer').
  • computeOccupancyTimeline (chart on the analytics page) computes occupancy from berth_reservations overlap with each day, with total = COUNT(berths).

The two are unrelated: a berth marked sold with no active reservation contributes to the tile but not the timeline; a berth marked available with an active reservation contributes to the timeline but not the tile. Reps will see the tile read 64% and the chart's right-most point read 12% on the same day. Pick one definition (status-based is the documented one in CLAUDE.md) and align the timeline.

H3. Revenue PDF stage breakdown is unweighted; dashboard forecast is weighted

fetchRevenueData.stageRevenue (report-generators.ts:107-118) does SUM(berths.price) per stage with no pipeline_weights multiplier. The dashboard RevenueForecast widget multiplies by pipeline_weights[stage]. So:

  • Tile shows €420K (weighted).
  • Revenue PDF "Revenue by Pipeline Stage" for the same data shows €1.6M (unweighted).

The two are reconcilable in principle but no rep will guess that. Either weight the PDF the same way, or rename the PDF column to "Berth Price by Stage (gross)".

H4. pipeline_weights defaults duplicated in two source files

src/lib/constants.ts:68 (STAGE_WEIGHTS) and src/components/admin/settings/settings-manager.tsx:76-86 hard-code the same object. Drift between the two means admins editing settings could see different defaults than the forecast actually uses. The settings form should import { STAGE_WEIGHTS } from '@/lib/constants' and spread it as defaultValue.

H5. getRevenueForecast silently zeroes out stages with missing weight keys

dashboard.service.ts:139 does weights[stage] ?? 0. If an admin saves pipeline_weights as { "in_comms": 0.2, ... } (legacy key) or simply omits a stage, every active interest at the missing stage contributes €0 to the forecast — no warning, no fallback to STAGE_WEIGHTS[stage]. Validate the saved JSON against PIPELINE_STAGES at write time, OR fall back to the constant per-key (weights[stage] ?? STAGE_WEIGHTS[stage]).
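Both remedies can be sketched in a few lines. The stage keys and default weights below are placeholders, not the project's real PIPELINE_STAGES or STAGE_WEIGHTS, and the 0..1 weight range is an assumption.

```typescript
// Placeholder stage keys and defaults, for illustration only.
const PIPELINE_STAGES = ['open', 'eoi_sent', 'contract'] as const;
type Stage = (typeof PIPELINE_STAGES)[number];
const STAGE_WEIGHTS: Record<Stage, number> = { open: 0.1, eoi_sent: 0.4, contract: 0.8 };

// Per-key fallback: a missing key uses the constant instead of silently becoming 0.
// `??` keeps an explicit 0 that an admin saved on purpose.
function resolveWeight(saved: Partial<Record<string, number>>, stage: Stage): number {
  return saved[stage] ?? STAGE_WEIGHTS[stage];
}

// Write-time validation: report unknown keys and missing or out-of-range weights.
function validateWeights(saved: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const known = new Set<string>(PIPELINE_STAGES);
  for (const key of Object.keys(saved)) {
    if (!known.has(key)) errors.push(`unknown stage key: ${key}`);
  }
  for (const stage of PIPELINE_STAGES) {
    const value = saved[stage];
    if (typeof value !== 'number' || value < 0 || value > 1) {
      errors.push(`missing or invalid weight: ${stage}`);
    }
  }
  return errors;
}

// A payload with a legacy key and one omitted stage, as described in the finding.
const legacyPayloadErrors = validateWeights({ in_comms: 0.2, open: 0.1, eoi_sent: 0.4 });
```

Doing both is cheap: validation stops bad JSON from being saved, and the fallback protects the forecast against rows saved before validation existed.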


MEDIUM

M1. Interests with no primary berth disappear from "pipeline value"

getKpis and getRevenueForecast use INNER JOIN interest_berths ON isPrimary=true. An interest without a primary-berth link (legitimate while the rep is still sourcing) contributes 0 to pipelineValueUsd and to totalWeightedValue, but is still counted in activeInterests and on the kanban. Mismatch between deal count and value. Surface a footnote (e.g. "5 deals not yet matched to a berth") or LEFT JOIN with a price-coalesce.

M2. "Top Interests by Value" PDF includes lost deals

fetchPipelineData.topInterestsRows (lines 68-83) orders by berths.price DESC NULLS LAST with no outcome filter. A €4M lost deal will sit at the top of the report. Add (outcome IS NULL OR outcome='won').

M3. PDF stage order hardcoded inside both templates

pipeline-report.ts:58-68 and revenue-report.ts:55-65 redeclare the canonical stage order. Renaming a stage in constants.ts will leave the renamed stage appended to the "unknown stages" tail block instead of in its proper position. Import and iterate PIPELINE_STAGES.

M4. selectDistinct in pipelineValueUsd is correct but fragile

dashboard.service.ts:39-47 selectDistinct({ berthId, price }) happens to dedupe correctly because berthId is unique. If a future schema lets two interest_berths rows reference the same berth as primary (the partial unique index permits this if the other row has isPrimary=false), the join would still emit one row per primary-only match. Today's behaviour is fine; a comment in the code claims correctness but doesn't explain why. Add a one-line note tied to the partial unique index.

M5. getHotDeals ordering tiebreaker uses updatedAt while UI shows lastContact

The query orders by desc(rank), desc(updatedAt) (dashboard.service.ts:234) but the card surfaces last touched X ago from dateLastContact. When a stage rank ties, the card with the most recent edit (rename, tag change, stage move) wins, not the most recent contact. Reps will be confused why an interest with 30-day-old lastContact sits above one with 2-day-old contact. Either order by coalesce(dateLastContact, updatedAt) or drop the "last touched" copy.

M6. Source-conversion denominator includes still-open deals, not just closed ones

getSourceConversion denominator is "every non-archived interest of that source" (dashboard.service.ts:262). For a source with 100 leads / 5 won / 0 lost / 95 still open, conversion = 5%. A source with 5 leads / 5 won shows 100%. The metric isn't wrong, but the description text "Won deals as a percentage of leads per source" implies a closed funnel; consider switching denominator to won + lost for the "true" rate, or rename the label.
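The difference between the two denominators is easiest to see with the numbers from the example above. The function names here are hypothetical; only the arithmetic comes from the finding.

```typescript
interface SourceCounts {
  won: number;
  lost: number;
  open: number;
}

// Current behaviour as described: won divided by every non-archived interest.
function openFunnelRate(c: SourceCounts): number {
  const total = c.won + c.lost + c.open;
  return total === 0 ? 0 : c.won / total;
}

// Alternative: won divided by closed deals only (won + lost).
// Returns null when nothing has closed yet, so the UI can show "n/a" instead of 0%.
function closedFunnelRate(c: SourceCounts): number | null {
  const closed = c.won + c.lost;
  return closed === 0 ? null : c.won / closed;
}

const bigSource: SourceCounts = { won: 5, lost: 0, open: 95 };
const bigOpenRate = openFunnelRate(bigSource); // 5 / 100
const bigClosedRate = closedFunnelRate(bigSource); // 5 / 5
```

The same source reads 5% under one definition and 100% under the other, which is why the label needs to say which denominator is in use.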


Summary

3 CRITICAL bugs (hot-deals stage typos, lost-revenue mislabelled "completed", missing GROUP BY in pipeline PDF), 5 HIGH inconsistencies (active-deal definition splits 4 ways; occupancy split 2 ways; weighted vs unweighted revenue; duplicated weight defaults; silent zero-weighting), and 6 MEDIUM polish issues. The single most leveraged fix is consolidating one activeInterestsWhere() helper used by every surface, plus adding an outcome='won' filter to the revenue PDF and a GROUP BY to the pipeline PDF.


13. PDF + brand-asset correctness (pdf-auditor)

PDF + brand-asset correctness — audit

Scope: src/lib/pdf/**, src/lib/templates/{merge-fields,berth-range}.ts, src/lib/services/{documenso-payload,brochures,berth-pdf}.service.ts, docs/eoi-documenso-field-mapping.md, assets/eoi-template.pdf.

Severity bands: CRITICAL = customer-visible silent data loss / crash; HIGH = visible quality regression / wrong number on a customer-facing artefact; MEDIUM = polish + future-proofing.


CRITICAL

C-1. Live Documenso template still missing Berth Range field

  • src/lib/services/documenso-payload.ts:157 always emits the Berth Range formValue, and formatBerthRange() produces compact range strings for the multi-berth bundle.
  • docs/eoi-documenso-field-mapping.md:34 flags that the live template (id 8) does not yet have the Berth Range field. Documenso silently ignores unknown formValues keys.
  • Net effect: every multi-berth EOI shipped via the Documenso pathway currently renders only the primary Berth Number. The expanded range (e.g. A1-A3, B5-B7) is dropped end-to-end, with no warning on the Documenso side — the bundle context is lost from the signed PDF.
  • Same field is also addressed defensively in the in-app pathway (src/lib/pdf/fill-eoi-form.ts:60-72), which logs a warning, but only when the in-app template is the one being used.
  • Action: Add the Berth Range text field to Documenso template 8 (mirror the AcroForm field name + size on the source PDF). Once added, single-berth EOIs are unaffected because formatBerthRange collapses a single mooring to its raw form.

C-2. tiptap→pdfme page break is wrong for letter / mixed for A4

  • src/lib/pdf/tiptap-to-pdfme.ts:51-54:
    • PAGE_WIDTH_MM = 170 is correct for A4 (210 − 2×20) but is treated as the only page format.
    • PAGE_BREAK_THRESHOLD = 250 is hard-coded; A4 page height is 297 mm and the threshold of 250 leaves 47 mm of unused space at the bottom and ignores the real bottom margin (≈ 20 mm).
  • eoi-standard-inapp.ts:67 declares @page { size: letter; ... }, i.e. the seeded HTML template is Letter-sized while the serialiser is working in A4 millimetre coordinates. The template body is authored at a different page size than the engine that lays it out.
  • Net effect: long custom templates either truncate (overflow into the bottom margin, content clipped by pdfme when fields run past page height) or break at the wrong vertical position. The bug is invisible in the seeded template because its content is short, but any port that edits the template to add a few clauses sees clipped output.
  • Action: Make page format a per-template attribute (Letter vs A4), drive both the page width and the break threshold from it (Letter content height ≈ 254 mm, A4 ≈ 277 mm with 10 mm bottom margin), and reject HTML-template @page size: values that disagree with the per-template setting.
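One way to derive both numbers from a per-template page format is sketched below. The type names and the uniform 20 mm margin are assumptions; the page dimensions are the standard A4 (210 × 297 mm) and US Letter (215.9 × 279.4 mm) sizes.

```typescript
type PageFormat = 'A4' | 'Letter';

interface MarginsMm {
  top: number;
  right: number;
  bottom: number;
  left: number;
}

// Standard physical page sizes in millimetres.
const PAGE_SIZE_MM: Record<PageFormat, { width: number; height: number }> = {
  A4: { width: 210, height: 297 },
  Letter: { width: 215.9, height: 279.4 },
};

// Content width and page-break threshold both follow from format + margins,
// replacing the hard-coded PAGE_WIDTH_MM / PAGE_BREAK_THRESHOLD pair.
function pageGeometry(format: PageFormat, margins: MarginsMm) {
  const { width, height } = PAGE_SIZE_MM[format];
  return {
    contentWidthMm: width - margins.left - margins.right,
    breakThresholdMm: height - margins.bottom,
  };
}

// ASSUMPTION: a uniform 20 mm margin on every side.
const uniform20: MarginsMm = { top: 20, right: 20, bottom: 20, left: 20 };
const a4Geometry = pageGeometry('A4', uniform20); // contentWidthMm 170, breakThresholdMm 277
const letterGeometry = pageGeometry('Letter', uniform20);
```

With this shape, a template that declares Letter gets a narrower break threshold automatically, and the serialiser no longer needs to know which format it is laying out.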

C-3. tiptap→pdfme silently drops inline italic / underline + the whole image node
  • extractParagraphContent (tiptap-to-pdfme.ts:146-164) only records bold and ignores italic and underline marks. The validator accepts these marks (they're not in UNSUPPORTED_NODES) so an admin saves a template with italics, the preview renders bold-only, and they ship the wrong artefact to a client.
  • processNode for image (line 354) does state.y += 20 and never adds a field. The serialiser reserves 20 mm of whitespace and drops the image entirely. The "Insert image" affordance in the template editor (if exposed) is non-functional today.
  • The validator does NOT list the visible mark names it supports, so admins cannot reason about what's safe to use.
  • Action: Either honour italic/underline via per-segment fields, or reject them at validation time the same way blockquote is rejected. For images, either implement the image pdfme schema or reject image nodes outright.
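The "reject at validation time" option could look roughly like the walker below. The node shape follows the usual Tiptap JSON layout (type, marks, content); the set names and the choice of which marks count as supported are assumptions based on this finding, not the project's actual validator.

```typescript
interface TiptapMark {
  type: string;
}

interface TiptapNode {
  type: string;
  text?: string;
  marks?: TiptapMark[];
  content?: TiptapNode[];
}

// ASSUMPTION: per the finding, bold is the only mark the serialiser honours today.
const SUPPORTED_MARKS = new Set<string>(['bold']);
// Node types the serialiser cannot render and should refuse up front.
const REJECTED_NODES = new Set<string>(['blockquote', 'image']);

// Walk the document and collect one message per unsupported mark or node.
function findUnsupported(node: TiptapNode, path: string = node.type): string[] {
  const problems: string[] = [];
  if (REJECTED_NODES.has(node.type)) {
    problems.push(`${path}: node type "${node.type}" is not rendered`);
  }
  for (const mark of node.marks ?? []) {
    if (!SUPPORTED_MARKS.has(mark.type)) {
      problems.push(`${path}: mark "${mark.type}" is not rendered`);
    }
  }
  (node.content ?? []).forEach((child, index) => {
    problems.push(...findUnsupported(child, `${path} > ${child.type}[${index}]`));
  });
  return problems;
}

const sampleDoc: TiptapNode = {
  type: 'doc',
  content: [
    {
      type: 'paragraph',
      content: [{ type: 'text', text: 'Terms', marks: [{ type: 'bold' }, { type: 'italic' }] }],
    },
    { type: 'image' },
  ],
};
const sampleProblems = findUnsupported(sampleDoc);
```

Exporting SUPPORTED_MARKS to the editor UI would also answer the third bullet: admins could see which marks are safe before saving.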

HIGH

H-1. No font registration → unsupported glyph silent fallback

  • src/lib/pdf/generate.ts calls pdfme/generator.generate({ template, inputs }) with no options.font. pdfme ships only Roboto by default.
  • The tiptap serialiser sets fontName: 'Helvetica' | 'Helvetica-Bold' (tiptap-to-pdfme.ts:205-237). pdfme without a registered Helvetica font silently falls back to its embedded Roboto; the bold variant is also a substitution. This is invisible in dev because Roboto has full Latin + Latin-1 coverage, but non-Latin glyphs (Greek, Cyrillic, Arabic for AED-tagged clients, the د.إ AED symbol from currency.ts:14) render as tofu (empty boxes).
  • The currency dropdown advertises AED and JPY. The JPY symbol ¥ is Latin-1 and renders fine, but the AED symbol د.إ is Arabic script that Roboto does NOT cover.
  • Action: Register a Unicode-coverage font (Noto Sans + Noto Sans Arabic + Noto Sans CJK) once and pass it to generate(). Mirror the same font on the in-app EOI when that pipeline is built. Until then, the AED currency code in SUPPORTED_CURRENCIES is a footgun on every PDF that renders price.

H-2. Locale inconsistency in money + date formatting

  • Mixed locale strategy in the reporting + summary templates:
    • revenue-report.ts:78,86: Number(...).toLocaleString(undefined, ...) (default locale; in Node 20 inside Docker this is en-US.UTF-8 via the standalone image's LANG; on the dev mac it picks the OS locale → different decimal/thousands separators server-side).
    • pipeline-report.ts:93: Number(...).toLocaleString() (default locale, no formatting opts).
    • Almost every other template hard-codes 'en-GB' for dates.
  • The interest-summary and berth-spec templates render Price as ${currency} ${Number(price).toLocaleString()} — they bypass formatCurrency() and therefore drop the proper currency symbol formatting (USD 45,000 instead of $45,000.00). invoice-template.ts uses formatCurrency() correctly; the inconsistency is a UX bug.
  • Pipeline report renders "Berth Price" with no currency at all (pipeline-report.ts:92-94): a 45 000 figure is meaningless without it.
  • Action: Route every money render in src/lib/pdf/templates/** through formatCurrency() (src/lib/utils/currency.ts:37), with an explicit locale: 'en-GB' to match the dates. Same for the reports' date stamps.
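The core of the fix is passing an explicit locale rather than undefined. The helper below is a hypothetical stand-in, not the project's formatCurrency(); it only shows the Intl.NumberFormat call that makes output independent of the host's LANG.

```typescript
// Hypothetical helper: explicit locale so Docker and a developer laptop agree.
function formatMoney(amount: number, currency: string, locale: string = 'en-GB'): string {
  return new Intl.NumberFormat(locale, {
    style: 'currency',
    currency,
    // narrowSymbol asks for the short symbol form; without it some locales
    // prefix a country marker to foreign currencies.
    currencyDisplay: 'narrowSymbol',
  }).format(amount);
}

const usdPrice = formatMoney(45000, 'USD');
const eurPrice = formatMoney(1234.5, 'EUR');
```

Because the locale is an argument with a fixed default, a template can never fall through to the server's OS locale by accident.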

H-3. Page overflow in fixed-height schemas

  • Every template in src/lib/pdf/templates/ uses fixed position and height slots:
    • client-summary-template.ts reserves 80 mm for the interests list (line 51) and 60 mm for recent activity (line 60). pdfme truncates text that exceeds the slot height; there is no "overflow → next page" mechanism in the template definition.
    • interest-summary-template.ts:65-69 reserves 85 mm for the timeline; with 30 events at 8 pt that's ~3 lines/event = clipped after ~10 events.
    • activity-report.ts:46 reserves 120 mm for activityDetails, and the data layer slices to data.logs.slice(0, 30) (line 70) — the slice masks the bug, but if the report layer sends more logs the bottom rows are clipped.
    • pipeline-report.ts:38-50 allocates 100 mm for summary and 100 mm for details; both can spill on ports with many stages + many top interests.
  • pdfme's failure mode is silent clipping, not visible truncation with … or a "continued on next page" marker.
  • Action: Either move large lists onto multi-page schemas (push fields onto subsequent schemas[i]) or add explicit pagination inside build*Inputs with a deterministic "showing N of M" tail.
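Both options reduce to small list operations inside build*Inputs. The function names and the lines-per-slot figures below are illustrative assumptions; the real capacity depends on each slot's height and font size.

```typescript
// Option 1: split a long list into page-sized chunks, one chunk per schema page.
function paginateLines(lines: string[], linesPerPage: number): string[][] {
  if (linesPerPage < 1) throw new RangeError('linesPerPage must be at least 1');
  const pages: string[][] = [];
  for (let i = 0; i < lines.length; i += linesPerPage) {
    pages.push(lines.slice(i, i + linesPerPage));
  }
  return pages;
}

// Option 2: keep a single page but make the truncation visible.
// The tail line takes one slot, so only maxLines - 1 items are shown.
function capWithTail(lines: string[], maxLines: number): string[] {
  if (maxLines < 1) throw new RangeError('maxLines must be at least 1');
  if (lines.length <= maxLines) return lines;
  const shown = maxLines - 1;
  return [...lines.slice(0, shown), `Showing ${shown} of ${lines.length}`];
}

const events = Array.from({ length: 25 }, (_, i) => `Event ${i + 1}`);
const eventPages = paginateLines(events, 10);
const cappedEvents = capWithTail(events, 10);
```

Either way the reader can tell that content continues, which is the property the fixed-height slots currently lack.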

H-4. Numeric/date inputs pass undefined/null through new Date()

  • invoice-template.ts:117 renders Due: ${invoice.dueDate} raw. When dueDate is null the field reads Due: null. Other templates use formatDate()-style helpers that return 'N/A', but the invoice template doesn't.
  • client-summary-template.ts:97,143 and interest-summary-template.ts call new Date(client.createdAt as string | Date) without guarding against undefined. new Date(undefined) yields Invalid Date whose toLocaleDateString returns 'Invalid Date' — that string ends up in the PDF.
  • Action: Add a single formatDate(value, fallback='—') helper in src/lib/utils/date.ts, reuse across all templates; the existing private one in interest-summary-template.ts:83-86 should be hoisted.
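A sketch of what the shared helper might look like. The signature follows the suggestion above; the en-GB locale matches the other templates, and the UTC time zone is an added assumption so the output does not depend on the server's zone.

```typescript
// Returns the fallback for null, undefined, empty, or unparseable input
// instead of letting "null" or "Invalid Date" reach the PDF.
function formatDate(
  value: string | number | Date | null | undefined,
  fallback: string = '—',
): string {
  if (value === null || value === undefined || value === '') return fallback;
  const date = value instanceof Date ? value : new Date(value);
  if (Number.isNaN(date.getTime())) return fallback;
  // ASSUMPTION: dates are rendered in UTC; swap in the port's zone if needed.
  return date.toLocaleDateString('en-GB', { timeZone: 'UTC' });
}

const dueLabel = `Due: ${formatDate(null)}`;
const createdLabel = formatDate('2026-05-12T00:00:00Z');
```

Checking getTime() for NaN is the piece the current call sites miss: new Date(undefined) constructs without throwing, so the failure only shows up as text in the rendered document.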

MEDIUM

M-1. No accessibility / tagged PDF output

  • All PDFs we produce are untagged (pdfme uses raw pdf-lib under the hood; generate.ts does not call setTitle, setLanguage, setProducer, or anything to enable StructTreeRoot).
  • WCAG 2.1 7.1 / PDF/UA-1 compliance is unmet. For a port that contracts with a public-sector tenant or runs accessibility reviews on outbound EOIs, this is a procurement blocker.
  • The in-app EOI HTML template has zero aria-* attributes and a table-based layout (eoi-standard-inapp.ts:184-209).
  • Action: At minimum set Title, Author, Subject, Lang=en-GB metadata in the in-app EOI fill path (fill-eoi-form.ts) — pdf-lib supports doc.setTitle() etc. without adding accessibility tags. Track tagged-PDF / PDF/UA as a follow-up item.

M-2. EOI in-app source PDF — silent field-name drift

  • fill-eoi-form.ts:42-50 swallows every getTextField() / getCheckBox() exception so a re-cut template whose AcroForm field names changed (e.g. Berth Number → Berth_Number) will produce a "successful" PDF with empty fields. Only Berth Range is special-cased to log when missing.
  • Action: Promote the silent-skip pattern to also log a warning per missing field (already done correctly for the new Berth Range field — apply same treatment to Name, Email, Address, Yacht Name, Length, Width, Draft, Berth Number, Lease_10, Purchase). Without it, the only way to notice a corrupted template is QA on a signed PDF.
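The per-field warning can be sketched against a minimal structural interface, so the example runs without pdf-lib. FormLike mirrors only the one behaviour that matters here (getTextField throws when the field is absent, as pdf-lib's form does); the function name and warning text are hypothetical.

```typescript
interface TextFieldLike {
  setText(value: string): void;
}

// Minimal stand-in for the form object: the lookup throws when the field is missing.
interface FormLike {
  getTextField(name: string): TextFieldLike;
}

// Fill every field, but record and warn about each one the template lacks
// instead of swallowing the exception.
function fillTextFields(
  form: FormLike,
  values: Record<string, string>,
  warn: (message: string) => void,
): string[] {
  const missing: string[] = [];
  for (const [name, value] of Object.entries(values)) {
    try {
      form.getTextField(name).setText(value);
    } catch {
      missing.push(name);
      warn(`EOI template is missing AcroForm field "${name}"`);
    }
  }
  return missing;
}
```

Returning the list of missing names lets the caller decide whether a drifted template is a warning or a hard failure, for example failing the job when any required field is absent.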

M-3. Form not flattened → signer can edit pre-filled fields

  • fill-eoi-form.ts:124 saves the doc unflattened. The comment on line 94 explicitly justifies this ("recipient can still tweak fields if needed before signing"). For an EOI/LOI this is risky: the signer can edit the address, yacht dimensions, or berth number after the fact, and the unflattened PDF carries the edits without the developer/approver re-acknowledging.
  • Documenso pathway is fine — Documenso flattens server-side before producing the signed artefact — but the in-app pathway emits the raw filled AcroForm to the storage backend as-is.
  • Action: Flatten the AcroForm (form.flatten() before doc.save()) for the in-app pathway, OR mark the relevant fields as read-only via field.enableReadOnly(). The "tweak before signing" justification belongs to a draft preview, not the production artefact.

M-4. formatBerthRange warning is noisy at warn-level

  • berth-range.ts:64 logs WARN per non-canonical mooring. The CLAUDE.md mooring spec (^[A-Z]+\d+$) was data-normalised in Phase 0, but historical archived rows + the (deleted) / (archived) suffix scheme on entity folders can leak into the bundle. Every multi-berth EOI containing a legacy mooring spins a stack of warnings.
  • Action: Downgrade the per-mooring warning to debug; emit a single warn summary per formatBerthRange() call when the passthrough list is non-empty.

M-5. berth-spec-template.ts waitingList truncation

  • 50 mm × 8 pt ≈ 12 lines (berth-spec-template.ts:67-70); the waiting-list join key is position ordered 1..N and there is no data-side cap. Ports with > 12 waitlisted clients silently lose the tail of the list on the spec PDF.
  • Same shape problem as H-3 but lower impact (berth-spec is internal).

M-6. assets/eoi-template.pdf — opacity / single source

  • The whole in-app pathway depends on a single committed binary at assets/eoi-template.pdf. There is no sha256 pinned in assets/README.md, no script that regenerates it from a known good source, and the AcroForm field shape is documented only in the mapping doc + the JSDoc of loadEoiTemplatePdf. A swap of this file by anyone with repo access changes legal output silently.
  • Action: Add EXPECTED_SHA256 to assets/README.md + a startup-time check (or test) that the source PDF's sha matches before falling back to EOI_TEMPLATE_PDF_PATH. Same applies to any shipped brochure default.
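The integrity check itself is a few lines with Node's built-in crypto module. The function names are hypothetical, and where the expected digest is stored (README, constant, or environment) is left open.

```typescript
import { createHash } from 'node:crypto';

// Hex-encoded SHA-256 of the template bytes.
function sha256Hex(bytes: Uint8Array): string {
  return createHash('sha256').update(bytes).digest('hex');
}

// Throws when the committed binary does not match the pinned digest.
function assertTemplateIntegrity(bytes: Uint8Array, expectedSha256: string): void {
  const actual = sha256Hex(bytes);
  if (actual !== expectedSha256.toLowerCase()) {
    throw new Error(`EOI template hash mismatch: expected ${expectedSha256}, got ${actual}`);
  }
}
```

In practice the bytes would come from reading assets/eoi-template.pdf at startup or in a unit test, so that a swapped file fails loudly before any EOI is generated.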

M-7. Reports / pdfme schemas — no portName brand asset

  • Every report template hard-codes 'Port Nimara' as the fallback in build*Inputs. The CRM is multi-tenant; an admin generating a report for a different port falls back to the wrong brand if the port lookup fails (e.g. report job runs without a hydrated port). Default should be the empty string or '(port)', not a competitor port's brand.

M-8. Brochures + per-berth PDF — no upload-time render audit

  • These are user-uploaded PDFs, not engine-rendered, so the template-quality items above don't apply. The relevant integrity controls (magic-byte check, sha256, size cap, version snapshot) are in place in berth-pdf.service.ts:217-264 and the brochure upload flow. No findings for these two flows.

Summary

| # | sev | file | item |
|---|---|---|---|
| C-1 | CRIT | Documenso template (live) | Berth Range field missing; multi-berth ranges dropped end-to-end |
| C-2 | CRIT | tiptap-to-pdfme.ts | A4 vs Letter page mismatch + hard-coded 250 mm break threshold |
| C-3 | CRIT | tiptap-to-pdfme.ts | italic/underline marks and image nodes silently dropped |
| H-1 | HIGH | generate.ts + tiptap serialiser | no font registration → AED/JP/Greek/Cyrillic glyphs missing |
| H-2 | HIGH | reports + summaries | locale-default toLocaleString server-side + currency bypass |
| H-3 | HIGH | every pdfme template | fixed-height slots clip overflow with no pagination |
| H-4 | HIGH | invoice-template.ts + summaries | raw null/undefined date passthrough renders "Invalid Date" |
| M-1 | MED | all PDFs | no tagged-PDF / PDF/UA metadata |
| M-2 | MED | fill-eoi-form.ts | silent field-name drift in source EOI PDF |
| M-3 | MED | fill-eoi-form.ts | in-app EOI ships unflattened AcroForm |
| M-4 | MED | berth-range.ts | noisy per-mooring warn log |
| M-5 | MED | berth-spec-template.ts | waitingList overflow |
| M-6 | MED | assets/eoi-template.pdf | no sha pinning of source binary |
| M-7 | MED | report templates | wrong-port fallback brand 'Port Nimara' |
| M-8 | MED | brochure + per-berth uploads | no issues; upload integrity controls in place |



15. Customer-facing copy + terminology audit (copy-auditor)

Task #15 Customer-facing copy + terminology audit

Scope: CRM (src/components, src/app/(dashboard)), client portal (src/app/(portal), src/components/portal), branded email templates (src/lib/email/templates), PDF templates (src/lib/pdf/templates), public marketing site (website/). Read-only audit; no edits.


CRITICAL

C1. Four interchangeable nouns for the same domain entity

The same record is called interest, lead, prospect, and deal across surfaces. Sales reps and clients see all four within a single session.

  • Entity / schema / URL: interest (everywhere — DB, /interests, portal nav, page titles).
  • "Lead":
    • src/components/clients/client-interests-tab.tsx:30 LEAD_CATEGORY_LABELS = { hot_lead: 'Hot lead', … } and the column header literally rendered as <dt>Lead</dt>.
    • src/components/interests/interest-tabs.tsx:~736 section heading <h3>Lead</h3> + <EditableRow label="Lead Category">.
    • src/components/berths/berth-interests-tab.tsx:44 hot_lead: 'Hot Lead' (Title Case mismatch with sibling above).
    • src/components/dashboard/lead-source-chart.tsx + source-conversion-chart.tsx widget title "Lead Source Attribution".
  • "Prospect":
    • src/components/berths/berth-detail-header.tsx:~275 form label Linked prospect (optional) + helper Link this status change to the prospect (interest) it relates to. — explicitly parenthesises the canonical name as a synonym.
    • Residential uses prospect as a stage value (Prospect chip in residential-clients-list.tsx, residential-client tabs) — confusing because elsewhere "prospect" means the record itself.
  • "Deal":
    • src/components/berths/berth-tabs.tsx tab label Deal Documents, API path /api/v1/berths/[id]/deal-documents.
    • src/components/clients/bulk-archive-wizard.tsx placeholder Why are you archiving this late-stage deal? and smart-archive-dialog.tsx heading Late-stage deal — confirmation required.
    • src/components/dashboard/hot-deals-card.tsx, widget label "Hot deals".
    • Pervasive in code comments inside interest-tabs.tsx, inline-stage-picker.tsx — comments will leak into future copy.

Recommendation: pick one client-facing noun (the domain choice is interest; "deal" is fine as marketing/internal shorthand for hot interests but should never appear in fields/labels). Rename Deal Documents → Interest Documents, "Linked prospect" → "Linked interest", <dt>Lead</dt> / <h3>Lead</h3> → Buyer profile or Category. Residential prospect is a stage so leave alone but consider renaming to enquiry or new to free up the word.

C2. Raw machine status strings leak to the client portal

src/app/(portal)/portal/interests/page.tsx:80 renders

<span>EOI: {interest.eoiStatus.replace(/_/g, ' ')}</span>

and line 65 the same pattern for leadCategory. Clients see EOI: waiting for signatures, EOI: partially signed, hot lead, etc. — the underscores are stripped but the enum vocabulary is not translated. "hot lead" exposed to the client is also a privacy/optics issue (we are telling the prospect we classified them).

Fix: add a PORTAL_EOI_STATUS_LABEL map (e.g. waiting_for_signatures → "Awaiting your signature", signed → "Signed"); never render leadCategory in the portal at all.
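A possible shape for the portal map, with a neutral fallback so an unmapped enum value never reaches the client verbatim. Only the first two labels come from the suggestion above; the remaining entry, the fallback strings, and the function name are assumptions.

```typescript
// Client-facing labels keyed by the internal eoiStatus value.
const PORTAL_EOI_STATUS_LABEL: Record<string, string | undefined> = {
  waiting_for_signatures: 'Awaiting your signature',
  signed: 'Signed',
  // ASSUMPTION: wording for this value is illustrative only.
  partially_signed: 'Awaiting remaining signatures',
};

// Never fall back to the raw enum string: unknown values get a neutral label.
function portalEoiStatusLabel(status: string | null | undefined): string {
  if (!status) return 'Not started';
  return PORTAL_EOI_STATUS_LABEL[status] ?? 'In progress';
}

const portalLine = `Expression of Interest: ${portalEoiStatusLabel('waiting_for_signatures')}`;
```

The neutral fallback matters more than the exact wording: it means adding a new enum value on the backend cannot leak an underscore-stripped machine string into the portal.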

C3. Signing-status labels diverge across three surfaces

For the same enum (draft | sent | partially_signed | completed | expired | cancelled):

| Surface | Label set |
|---|---|
| interest-eoi-tab.tsx / interest-contract-tab.tsx / interest-reservation-tab.tsx | Draft / Awaiting signatures / Partially signed / Signed / Expired / Cancelled |
| documents-hub.tsx STATUS_PILL_MAP + document-list.tsx | Renders raw enum (<Badge>{doc.status}</Badge>): sent, partially_signed, completed displayed verbatim |
| signing-progress.tsx STATUS_LABELS | Only Pending / Signed / Declined; missing Sent, Expired, Cancelled |
| notification-digest.ts email | eoi_signed: 'EOI signed', eoi_completed: 'EOI completed'; "signed" vs "completed" used as if different events |
| Realtime toast (realtime-toasts.tsx) | EOI fully signed (yet another phrase) |

A user clicks a status pill in Documents Hub (shows partially_signed), opens the interest EOI tab (shows Partially signed), gets a toast that says EOI fully signed, and an email that says EOI completed — four phrasings for one document. Centralise via src/lib/labels/document-status.ts (already a pattern for seed-data etc.) and import everywhere.
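A centralised label module could be as small as the sketch below. The enum values and labels are the ones listed for the interest tabs above; the export names are hypothetical.

```typescript
// The shared enum, declared once so the label map can be checked for completeness.
const DOCUMENT_STATUSES = [
  'draft',
  'sent',
  'partially_signed',
  'completed',
  'expired',
  'cancelled',
] as const;
type DocumentStatus = (typeof DOCUMENT_STATUSES)[number];

// Record<DocumentStatus, string> makes the compiler reject a missing status,
// which is how signing-progress.tsx ended up with only three labels.
const DOCUMENT_STATUS_LABELS: Record<DocumentStatus, string> = {
  draft: 'Draft',
  sent: 'Awaiting signatures',
  partially_signed: 'Partially signed',
  completed: 'Signed',
  expired: 'Expired',
  cancelled: 'Cancelled',
};

function documentStatusLabel(status: DocumentStatus): string {
  return DOCUMENT_STATUS_LABELS[status];
}
```

Toasts and digest emails can then build their phrasing from the same label (for example prefixing the document type), so "signed", "completed", and "fully signed" stop drifting apart.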


HIGH

H1. "Save" button verbiage has six forms

Inventory of submit buttons across src/components/:

  • Save — inline editors, addresses-editor, contacts-editor, inline-phone-field, settings-manager, image-cropper.
  • Save Changes (Title Case) — client-form, yacht-form, berth-form, expense-form, company-form, interest-form, reminder-form, role-form, tag-form, custom-field-form, port-form, webhook-form, template-form.
  • Save changes (sentence case) — admin/users/user-form, interest-contact-log-tab.
  • Save profile, Save username, Save preferences, Save overrides, Save template, Save view — descriptive variants.
  • Saving... (ASCII three dots) vs Saving… (single ellipsis char) — both appear, ~50/50 split.

Decide: sentence case (Save changes) and standardise the loader as Saving… (Unicode ellipsis matches Prettier-friendly UTF-8 elsewhere in the codebase). The same form-pattern with a different casing in adjacent admin sections (user-form Save changes vs role-form Save Changes) is a likely playwright-visual diff source too.

H2. "New X" vs "Create X" mismatch on the same surface

Empty-state CTA and form submit button often disagree:

  • Clients list action: label: 'New Client' → opens sheet titled New Client → submit button Create Client.
  • Same pattern for Yacht, Expense, Company, Role, Tag, Template, Webhook, Port.
  • aria-label="New interest" on interest-list.tsx, button text Create Interest.

Pick one verb per action lifecycle (New … for affordances, Create … for confirm, OR unify to a single Create … throughout). The current pattern teaches the user two words for the same action.

H3. Public marketing form CTAs are all Submit

All five website forms (website/components/pn/specific/website/{form,contact,berths-item,supplement-eoi,register,news-item}/form.vue) use a bare Submit button. The matching confirmation email subjects say "Thank You for Your Interest" and the PDF title is "Expression of Interest". The CTA doesn't mention what the user is submitting.

Recommendation: replace with action-specific verbs — Register interest, Send enquiry, Request a call back (already used as helper text above the Submit on berths-item/form.vue). Loading state Submitting form... is also redundant — Sending… is shorter and matches the CRM email send loader.

H4. EOI vs Expression of Interest abbreviation discipline

Both forms appear, but the split is currently inverse to what client-facing surfaces should do:

  • Client-facing surfaces (portal /portal/documents page, email template documentLabel, website pages, PDF body, Documenso template-form select option) correctly spell out Expression of Interest.
  • But the portal interests page (/portal/interests) and the portal documents page header both say EOIs alongside Expression of Interest; the text-sm text-gray-500 mt-1 subheading reads "Your contracts, EOIs, and signed agreements".
  • Realtime toast to staff says EOI fully signed — fine for staff but the same toast also fires for portal users if they have a session? (Worth verifying; if so, full form needed.)
  • PDF body (eoi-standard-inapp.ts:177) introduces the abbreviation correctly: This Expression of Interest (the "EOI") — good. Other PDFs (interest-summary-template.ts) use raw EOI status: … without that introduction.

Rule of thumb: portal/email/marketing → full form; CRM internal UI → EOI. Audit the portal pages to remove all EOI mentions.

H5. Email greeting + sign-off tone drift

Across src/lib/email/templates/:

  • Greeting: Dear {name}, (portal-auth, crm-invite, inquiry-client-confirmation, residential-inquiry, document-signing — all three modes), Hello {name}, (admin-email-change), Hi {name}, (notification-digest), Welcome, (fallback in crm-invite/portal-auth), Dear Administrator, (inquiry-sales-notification).
  • Sign-off: Best regards, (inquiry-client, residential-inquiry), Thanks, (admin-email-change), Thank you, (document-signing), The {portName} team (most), {senderName} (signing-invitation when provided).

Pattern: client-touching emails should land on one greeting (Dear {name},) and one sign-off (Best regards, / The {portName} team). The casual Hi {name}, on the notification-digest is fine because it's internal to staff, but Hello on admin-email-change is just a third style for the same internal audience.


MEDIUM

M1. "Signing envelope" jargon leaks into a user-facing dialog

smart-archive-dialog.tsx exposes:

  • <option value="leave">Leave envelope pending</option>
  • <option value="void_documenso">Void the signing envelope</option>

"Envelope" is Documenso/DocuSign internal vocabulary. Replace with Leave signing request pending / Cancel the signing request. The Documenso admin page is OK to keep envelope (dev-facing settings).

M2. Override / Confirm overloaded action verb

interest-stage-picker.tsx:179 shows {overrideEffective ? 'Override stage' : 'Confirm'}. The non-override label is too generic; users land on a stage-change dialog and the primary button just says "Confirm". Suggest Move to {stage} (parameterised) or Update stage.

M3. Loading state punctuation inconsistency

Saving... (ASCII) vs Saving… (Unicode …). Easy global codemod; matters for Playwright visual diffs and for screen-reader pronunciation (three dots get read out as "dot dot dot").

M4. Reminder/alert verb spread

Acknowledge / Dismiss / Mark complete / Resolve (audit log) — four near-synonyms for "I dealt with it". Reminders use Mark complete, Alerts use Acknowledge + Dismiss. Acceptable if the semantics differ (acknowledge = seen, complete = done) but the current copy doesn't make that distinction clear.

M5. "Hot Lead" / "Hot lead" casing within the same domain

  • client-interests-tab.tsx and interest-card.tsx: Hot lead.
  • berth-interests-tab.tsx and interest-filters.tsx: Hot Lead.
  • dashboard/hot-deals-card.tsx: EOI Signed, EOI Sent (Title Case).
  • General CRM trend is sentence case — Title Case in these three files is the outlier.

Suggested follow-ups

  1. Add src/lib/labels/document-status.ts; refactor documents-hub, document-list, interest-{eoi,contract,reservation}-tab, signing-progress, notification-digest to import it. (C3)
  2. Portal: never render eoiStatus / leadCategory raw; map first. (C2)
  3. Rename Deal Documents tab + /deal-documents route to Documents. (C1)
  4. Codemod Save Changes → Save changes, Saving... → Saving…, unify New X vs Create X. (H1, H2, M3)
  5. Website: replace bare Submit on five forms with action-specific verbs. (H3)
  6. Portal/email/PDF: drop bare EOI abbreviation in favour of Expression of Interest. (H4)
  7. Standardise email greeting/sign-off pair per audience tier. (H5)
  8. Replace envelope jargon in smart-archive-dialog.tsx. (M1)

Verified clean: Inquiry spelling consistent (American); crm-invite.ts use of "CRM" is staff-only and intentional; reports PDFs only use enum strings internally.


16. Dependency + supply-chain hygiene audit (deps-auditor)

Dependency + Supply-Chain Hygiene Audit

Repo: new-pn-crm @ feat/documents-folders · Date: 2026-05-12 · Auditor: task #16

Inputs: pnpm audit, pnpm outdated, pnpm licenses list [--prod], pnpm why, pnpm install --frozen-lockfile, package.json, pnpm-lock.yaml, Dockerfile*.

Headline: No known CVEs (pnpm audit → 0 across info/low/moderate/high/critical), no GPL/AGPL anywhere in the tree, lockfile is intact and reproducible. Real risk concentrates in two places: a Node 20 base image at/past EOL, and a @types/node major-version mismatch that lets the type-checker greenlight runtime APIs that don't exist in Node 20. Everything else is incremental.


CRITICAL

C1 — @types/node@^25.6.2 against Node 20 runtime

  • What: package.json line 111 pins @types/node to ^25.6.2; resolved version is 25.6.2. Every Dockerfile and the esbuild target (--target=node20) ships Node 20.
  • Why it bites: Node 25 is the Current release line — it includes APIs added after Node 20 (e.g. recent node:sqlite evolution, node:test additions, process.permission, newer fs.glob shapes, updated Web* globals). TypeScript cannot tell you you've called something that won't exist on the runtime — the build passes, the prod worker crashes at first call.
  • Severity: CRITICAL — silent landmine, no compile warning, no audit signal.
  • Fix: Downgrade to @types/node@^20. If you genuinely want to consume Node-22 APIs, also bump the base images to node:22-alpine (see C2) and --target=node22.

C2 — Node 20 LTS at end-of-life

  • What: All three Dockerfiles (Dockerfile, Dockerfile.dev, Dockerfile.worker) use FROM node:20-alpine (no minor pin). Node 20 entered Maintenance LTS in Oct 2024 and reached EOL on 2026-04-30, i.e. ~2 weeks before today (2026-05-12). The image will still build, but Node 20 no longer receives security patches from upstream. Alpine's package security advisories will continue for OS libs only.
  • Severity: CRITICAL — the base image is the largest surface in the SBOM and is now unpatched against new V8/Node CVEs.
  • Fix: Move to node:22-alpine (Maintenance LTS, supported through Apr 2027). Pin the minor digest (node:22.11-alpine@sha256:…) for reproducibility. Bump esbuild --target=node22 in build:server / build:worker scripts. No app-code change expected: the codebase already uses ESM-native idioms.

HIGH

H1 — @types/pdfkit mis-classified as a runtime dependency

  • What: package.json line 62 puts @types/pdfkit@^0.17.6 under dependencies alongside pdfkit. Type packages are compile-time only.
  • Impact: Slightly bloats prod node_modules and the Docker prod image; more importantly it's a classification smell — anyone reasoning about supply-chain surface will look at dependencies and assume it's executed.
  • Fix: Move to devDependencies alongside the other @types/*.

H2 — Deprecated transitive: glob@10.5.0

  • What: pnpm-lock.yaml carries glob@10.5.0 with the upstream notice "Old versions of glob are not supported and contain widely publicized security vulnerabilities … please update." pnpm why glob traces it to archiver-utils@5.0.2 ← archiver@7.0.1 (a direct dependency, used by GDPR exports per CLAUDE.md).
  • Impact: glob < 11 has known prototype-pollution-class issues. pnpm audit doesn't flag them because the advisories require an exploitable callpath, but the deprecation notice is the upstream signal to upgrade.
  • Fix: Bump archiver to ^8.0.0 (already shown in pnpm outdated). Archiver 8 pulls a current glob and the API is source-compatible for the way src/lib/services/gdpr-export.service.ts uses it. Verify with the GDPR export Playwright case after upgrade.

H3 — Deprecated transitive: @esbuild-kit/{core-utils,esm-loader}

  • What: Both are marked "Merged into tsx: https://tsx.is" by upstream. They come from drizzle-kit@0.31.10 and better-auth@1.6.9 — not directly fixable here.
  • Severity: HIGH (visibility only; no known exploit). The packages still function but receive no upstream maintenance.
  • Fix: Track drizzle-kit and better-auth releases; both maintainers have open PRs migrating to bare tsx. No local change today — file as a watch item.

H4 — pnpm.overrides uses floating ranges

  • What: package.json pnpm.overrides:
    vite:    "8.0.5"        // pinned ✓
    esbuild: ">=0.25.0"     // floating ✗
    postcss: ">=8.5.10"     // floating ✗
    
  • Impact: The >= overrides re-resolve on every pnpm install --no-lockfile / pnpm update. They were added as CVE-fix safety nets, but their floating shape defeats the lockfile's reproducibility guarantee on the very transitives that prompted the override in the first place.
  • Fix: Replace ">=0.25.0" / ">=8.5.10" with the actual resolved versions (currently 0.27.7 and 8.5.14), or use exact pins. Re-evaluate whenever you bump esbuild/postcss.

MEDIUM

M1 — Major-version upgrades available

Captured from pnpm outdated (today vs latest):

| Package | Current | Latest | Risk |
| --- | --- | --- | --- |
| next | 15.5.18 | 16.2.6 | App Router breaking changes in 16; defer until React/Next stabilise together |
| eslint + eslint-config-next | 9 / 15 | 10 / 16 | Lint-only; do alongside Next 16 |
| zod | 3.25.76 | 4.4.3 | Wide blast radius: every src/lib/validators/*.ts + createTemplateSchema VALID_MERGE_TOKENS allow-list logic. Plan as its own task |
| tailwindcss | 3.4.19 | 4.3.0 | Config migration (Tailwind 4 = Lightning CSS); schedule with design tokens work |
| @hookform/resolvers | 3.10.0 | 5.2.2 | API change for zod resolver; paired with zod 4 |
| react-day-picker | 9 | 10 | Verify in calendar/date pickers |
| archiver | 7.0.1 | 8.0.0 | Clears H2; do first, it's narrow |
| esbuild (dev) | 0.27.7 | 0.28.0 | Patch-y; trivial |

Minor upgrades (bullmq, better-auth, @tanstack/react-query, vitest, @playwright/test, @types/node patch, libphonenumber-js, tailwind-merge, lint-staged, react-grab) are all single-digit-bumps, no risk.

M2 — dotenv lives in devDependencies but is imported by production-runnable scripts

  • What: dotenv is devDependencies (line 117) but is imported by scripts/backfill-document-folders.ts (documented in CLAUDE.md as a deploy step), scripts/import-berths-from-nocodb.ts, scripts/db-reset.ts, src/lib/db/seed.ts, etc.
  • Impact: Anyone who runs pnpm install --prod and then pnpm db:backfill:doc-folders (a documented deploy command) fails at module resolution. Not exploited today because deploy runs pnpm install without --prod for those steps, but the contract is implicit.
  • Fix: Either (a) move dotenv to dependencies, or (b) document in CLAUDE.md that the backfill must be run from a full-deps image / dev workstation. (a) is the smaller foot-gun.

M3 — node:20-alpine is unpinned (floats on minor + digest)

  • What: No minor or digest pin on the FROM lines.
  • Impact: Two builds an hour apart can land on different base layers; SBOM drifts without code changes; pre-existing CVE-fix bumps reach prod unnoticed (mostly a good thing, but it has caught me out in audits).
  • Fix: Use node:22.11-alpine@sha256:<digest> once you move to 22. Re-pin monthly as part of dependency hygiene.

M4 — No engines field in package.json

  • What: package.json has no engines.node / engines.pnpm. The packageManager field pins pnpm to 10.33.2, but Node is implicit.
  • Impact: pnpm install doesn't enforce the runtime; a contributor on Node 18 will install successfully and only fail later. CI hides this because Docker is the source of truth.
  • Fix: Add "engines": { "node": ">=22 <23", "pnpm": ">=10" } and turn on engine-strict=true in .npmrc if you want hard enforcement.

LICENSE AUDIT (prod tree)

No GPL or AGPL anywhere. Non-permissive licenses found:

| Package | License | Disposition |
| --- | --- | --- |
| @img/sharp-libvips-darwin-arm64 (and other arches) | LGPL-3.0-or-later | OK: dynamic link, native binding; LGPL §5 covers this use |
| dompurify | MPL-2.0 OR Apache-2.0 | OK: dual; you may rely on Apache-2.0 |
| @zone-eu/mailsplit (transitive of mailparser) | MIT OR EUPL-1.1+ | OK: dual; MIT chosen |
| caniuse-lite | CC-BY-4.0 | OK: data only, attribution satisfied by upstream notices |
| postgres (driver) | Unlicense | OK: public-domain-style |
| axe-core (dev only) | MPL-2.0 | OK: dev/test, not redistributed |
| lightningcss, lightningcss-darwin-arm64 (dev/build) | MPL-2.0 | OK: build-time, MPL is file-scoped |
| tslib | 0BSD | OK |

No UNLICENSED / "Custom" / SSPL packages.


LOCKFILE + AUDIT INTEGRITY

  • pnpm audit → No known vulnerabilities.
  • pnpm audit --json metadata shows 989 deps, 0 vulns, 0 dev (because the audit metadata reports dependencies after filtering — clean).
  • pnpm install --frozen-lockfile → "Lockfile is up to date", no warnings, no peer-dep Unmet/Conflict lines. Husky prepare hook ran clean.
  • pnpm-lock.yaml has 139 peerDependencies: entries — all satisfied (no peer:missing markers in the resolved graph).
  • No "phantom" deps detected — only the two deprecated chains in H2/H3.

  1. C1 + C2 together — bump base image to node:22.11-alpine, drop @types/node to ^22, esbuild --target=node22. Smoke test build + worker.
  2. H1 — move @types/pdfkit to devDependencies (1-line PR).
  3. H2 → archiver@^8.0.0; run GDPR-export Playwright case.
  4. H4 — replace floating overrides with exact pins.
  5. M2 — promote dotenv to dependencies, OR document deploy contract.
  6. M4 — add engines field.
  7. M1 majors — schedule one-at-a-time, starting with archiver (done in #3) and esbuild. Next 16 / Zod 4 / Tailwind 4 are each their own project.

Total touch points: ~6 single-line PRs + 3 scheduled major-bump tracks.


17. Build + deploy + prod readiness audit (build-auditor)

Audit #17 — Build + Deploy + Prod Readiness

Scope: Dockerfile, Dockerfile.dev, Dockerfile.worker, docker-compose.yml, docker-compose.prod.yml, docker-compose.dev.yml, next.config.ts, src/lib/env.ts, .env.example, plus the entry points src/server.ts / src/worker.ts and health endpoints.

Branch at audit time: feat/documents-folders.


CRITICAL

C1 — No .dockerignore in repo root

cat .dockerignore returns no such file. Build context at audit time:

  • node_modules = 4.9 GB
  • .next = 2.7 GB
  • .git = 41 MB
  • Plus storage/, playwright-report/, test-results/, tests/, scripts/, screenshots, .env*.

Every docker build ships ~7.6 GB to the daemon. Worse, Dockerfile.dev and the builder stage of Dockerfile / Dockerfile.worker all do COPY . ., which means:

  • .env, .env.local, .env.dev (if present) end up in build-layer history of the builder stage. The runner stage doesn't re-copy them, but intermediate layers are cacheable and pushable; a careless --target builder push leaks secrets.
  • Local node_modules (built on macOS) get shipped to an Alpine builder and then ignored — silent waste, and node_modules/sharp darwin binaries collide with the musl install.
  • Test snapshots / fixtures get baked into the trace.

Fix: add .dockerignore covering at minimum: node_modules, .next, .git, .env*, dist, storage, playwright-report, test-results, tests, coverage, *.log, .DS_Store, .vscode, .idea, .husky, docker-compose.*.yml (not needed inside the image).

C2 — EMAIL_REDIRECT_TO has no production refusal guard

CLAUDE.md is explicit: "must be unset in production". The Zod schema in src/lib/env.ts:41 accepts it unconditionally as z.string().email().optional(). If a staging .env leaks into a prod deploy (a very common ops mistake with the current env_file: .env setup — see M4), every outbound client email, EOI signing invite, and webhook delivery silently routes to the staging mailbox and the production user sees… nothing.

Evidence of the blast radius — EMAIL_REDIRECT_TO short-circuits:

  • src/lib/email/index.ts:131 — all SMTP recipients rewritten.
  • src/lib/services/documenso-client.ts:118-180 — all Documenso recipient lists + template formValues overridden.
  • src/lib/queue/workers/webhooks.ts:94-107 — webhook deliveries fully suppressed.

Fix: add a superRefine (or schema-level cross-check) that hard-fails when NODE_ENV === 'production' && EMAIL_REDIRECT_TO is set. Belt-and-braces: log a logger.fatal and process.exit(1) from src/lib/email/index.ts boot if the condition is reached.
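A minimal sketch of the refuse-to-start check, written as a plain function so it does not depend on the actual shape of src/lib/env.ts. The helper names (`checkEmailRedirectGuard`, `assertEmailRedirectGuard`) and the error text are hypothetical; only the `NODE_ENV` / `EMAIL_REDIRECT_TO` condition comes from the finding above.

```typescript
// Hypothetical helpers for illustration; not taken from src/lib/env.ts.
type EnvLike = Record<string, string | undefined>;

/** Returns a list of problems; an empty list means the env is acceptable. */
function checkEmailRedirectGuard(env: EnvLike): string[] {
  const problems: string[] = [];
  const redirect = env["EMAIL_REDIRECT_TO"]?.trim();
  // A redirect address is only a problem when the process runs as production.
  if (env["NODE_ENV"] === "production" && redirect) {
    problems.push("EMAIL_REDIRECT_TO must be unset when NODE_ENV=production");
  }
  return problems;
}

/** Boot-time wrapper: throws so the process refuses to start. */
function assertEmailRedirectGuard(env: EnvLike): void {
  const problems = checkEmailRedirectGuard(env);
  if (problems.length > 0) {
    throw new Error(problems.join("; "));
  }
}
```

If the check lives inside the Zod schema instead, the same condition could sit in a `.superRefine((env, ctx) => …)` callback that calls `ctx.addIssue` for each returned problem; calling the throwing wrapper once more at email-module boot gives the belt-and-braces behaviour described above.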

C3 — Custom server depends on socket.io that may not be in the standalone trace

Dockerfile runner stage copies only .next/standalone/, .next/static/, public/, and dist/server.js (renamed server-custom.js). There is no separate pnpm install --prod in the runner — every runtime dep must arrive via Next's output: 'standalone' tracer.

src/server.ts imports @/lib/socket/server, which in turn imports Server from 'socket.io' along with @socket.io/redis-adapter. esbuild bundles server.ts with --packages=external, so at runtime server-custom.js does require('socket.io') against /app/node_modules. Neither socket.io nor @socket.io/redis-adapter is in next.config.ts:66 serverExternalPackages, and no Next route ever imports @/lib/socket/server (the socket server is only instantiated by the custom entry point), so the Next tracer has no reason to include them in .next/standalone/node_modules.

If this has been working in prod it's only because the packages get pulled in transitively via something Next does see. The dependency is invisible to the build system — a Next minor upgrade could drop them from the trace tomorrow.

Fix: add both to serverExternalPackages, and extend outputFileTracingIncludes for the custom-server bundle, or COPY them explicitly into the runner from the deps stage:

COPY --from=deps --chown=nextjs:nodejs /app/node_modules/socket.io ./node_modules/socket.io
COPY --from=deps --chown=nextjs:nodejs /app/node_modules/@socket.io ./node_modules/@socket.io

(Same risk applies to anything else only the custom server imports — audit src/server.ts import graph.)


HIGH

H1 — CSP keeps 'unsafe-inline' on script-src in production

next.config.ts:31 sets script-src 'self' 'unsafe-inline' regardless of isProd. Only 'unsafe-eval' is gated. With 'unsafe-inline' on, CSP's XSS defence is defanged: any reflected/stored XSS still executes inline. The comment claims it's for Tailwind/Radix runtime styles, but those affect style-src, not script-src. Move to a nonce- or hash-based script policy in prod.

H2 — NEXT_PUBLIC_APP_URL not in Zod schema, but baked at build time

.env.example:67 lists NEXT_PUBLIC_APP_URL but src/lib/env.ts does not validate it. The builder stage runs pnpm build with SKIP_ENV_VALIDATION=1, so Next inlines an empty string when the var is missing. src/providers/socket-provider.tsx:67 then runs io(process.env.NEXT_PUBLIC_APP_URL!, {...}) → io('', {...}) → browser falls back to window.location.origin, which silently works in most cases but breaks the moment the CRM is fronted by a different origin than the socket gateway. src/lib/auth/client.ts:12 has the same risk for the auth base URL during SSR.

Fix: add NEXT_PUBLIC_APP_URL: z.string().url() to the schema and pass it into the builder stage (via --build-arg + ARG NEXT_PUBLIC_APP_URL in Dockerfile). Drop SKIP_ENV_VALIDATION=1 from the builder stage, or at least surface a build-time warning for missing NEXT_PUBLIC_* vars.
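As a rough illustration of the "surface a build-time warning" option, the sketch below reports which required public URL variables are missing or malformed. `findInvalidPublicUrls` is a hypothetical helper, not something that exists in the repo, and it stands in for (rather than replaces) the `z.string().url()` schema entry.

```typescript
// Hypothetical build-time check; the function name is illustrative only.
type EnvLike = Record<string, string | undefined>;

/** Returns the keys whose values are missing or are not absolute http(s) URLs. */
function findInvalidPublicUrls(env: EnvLike, required: string[]): string[] {
  return required.filter((key) => {
    const value = env[key];
    if (!value) return true; // undefined or empty string
    try {
      const url = new URL(value);
      return url.protocol !== "http:" && url.protocol !== "https:";
    } catch {
      return true; // not parseable as an absolute URL
    }
  });
}
```

A prebuild script could call this with the process environment and `["NEXT_PUBLIC_APP_URL"]`, then fail or warn when the returned list is non-empty.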

H3 — Dockerfile.dev runs as root and re-installs on every layer rebuild

  • No USER directive → dev container is root inside the bind-mounted /app. Any pnpm dev-spawned child can write to host-mounted files as root.
  • Combined with C1 (no .dockerignore), the COPY package.json pnpm-lock.yaml ./ followed by pnpm dev over a bind mount means the host node_modules shadow the in-container install on macOS (different platform), and dev images frequently break on sharp/tesseract.js until rebuilt.

Fix: create a node user (or reuse uid 1001), chown /app, drop privileges, and ship a working .dockerignore so the build context isn't 7.6 GB.

H4 — docker-compose.prod.yml has no resource limits, no log rotation

crm-app, crm-worker, postgres, redis all run with default unlimited memory and the default json-file log driver. On a small VPS one runaway worker OOMs the host. The default log driver has no rotation, so disks fill silently. Add deploy.resources.limits (or top-level mem_limit in non-swarm mode) and logging: driver: json-file, options: { max-size: "10m", max-file: "5" } to every service.

H5 — Compose healthcheck targets localhost:3000, but env.PORT is configurable

docker-compose.prod.yml:45 and docker-compose.yml:43 hardcode http://localhost:3000/api/health. If a deploy sets PORT=8080 via .env, the container listens on 8080, the healthcheck stays on 3000 → permanent "unhealthy" → restart loop. Either drop PORT from env.ts (the schema validates it but compose ignores it) or templatize the healthcheck (wget … http://localhost:${PORT:-3000}/api/health).


MEDIUM

M1 — Worker healthcheck only pings Redis

Dockerfile.worker:38-39 checks Redis.ping(). A wedged BullMQ consumer (silent disconnect from the queue stream but TCP alive) passes this probe while jobs queue forever. Upgrade to read a sentinel BullMQ heartbeat key the worker writes on each job loop, or expose a tiny HTTP /healthz from the worker that asserts queue.client.status === 'ready' on the named queues.
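One way to picture the heartbeat option: the worker writes a millisecond timestamp to a sentinel key on each job loop, and the probe treats the worker as healthy only while that timestamp is recent. The sketch below covers just the freshness decision; the key name, the storage call, and both function names are assumptions left to the caller.

```typescript
// Hypothetical helpers; how the timestamp is stored and read is not shown.

/** Parse the raw string a key-value store would return for the sentinel key. */
function parseHeartbeat(raw: string | null): number | null {
  if (raw === null || raw.trim() === "") return null;
  const n = Number(raw);
  return Number.isFinite(n) ? n : null;
}

/** Healthy only when a heartbeat exists and is no older than maxAgeMs. */
function isHeartbeatFresh(lastBeatMs: number | null, nowMs: number, maxAgeMs: number): boolean {
  if (lastBeatMs === null) return false;
  return nowMs - lastBeatMs <= maxAgeMs;
}
```

The probe would exit non-zero when `isHeartbeatFresh` returns false, which catches a consumer that is still connected to Redis but has stopped looping.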

M2 — Worker re-installs deps in the runner stage

Dockerfile.worker:31-32 does pnpm install --frozen-lockfile --prod in the runner: a network round-trip on every build, even though the deps stage already has the full tree. Move to COPY --from=deps /app/node_modules ./node_modules then pnpm prune --prod, or use pnpm deploy --prod --filter <pkg>. Saves ~30-60s per build and removes a network failure mode.

M3 — next.config.ts serverExternalPackages likely incomplete

socket.io, @socket.io/redis-adapter, imapflow, mailparser, pdf-lib, pdfme, sharp, tesseract.js are all heavy native/CJS-leaning deps used server-side. Only 8 are listed. Anything missing risks bundling into the Next route trace (slower cold start, larger lambda/standalone size, possible runtime require failures for native bindings). Audit the import graph and add the rest.

M4 — env_file: .env puts every secret into the container env

docker-compose.prod.yml:36,59. Anyone with docker inspect or /proc/<pid>/environ access on the host reads BETTER_AUTH_SECRET, EMAIL_CREDENTIAL_KEY, DOCUMENSO_API_KEY, DOCUMENSO_WEBHOOK_SECRET in plaintext. Switch to docker secrets (/run/secrets/...) or a sidecar mount and have env.ts read from file paths when the _FILE suffix is present.
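A small sketch of the `_FILE` convention mentioned above, assuming env.ts does not implement it today. `resolveSecret` is a hypothetical helper, and the file reader is injected so the logic stays testable; in the real module it would likely be a thin wrapper around `fs.readFileSync(path, "utf8")`.

```typescript
// Hypothetical helper; the _FILE lookup order is an assumption, not existing behaviour.
type EnvLike = Record<string, string | undefined>;
type ReadFile = (path: string) => string;

/**
 * Resolve a setting: prefer KEY_FILE (path to a mounted secret) over KEY.
 * Returns undefined when neither is set.
 */
function resolveSecret(env: EnvLike, key: string, readFile: ReadFile): string | undefined {
  const filePath = env[`${key}_FILE`];
  if (filePath) {
    // Secret files commonly end with a trailing newline; strip exactly one.
    return readFile(filePath).replace(/\r?\n$/, "");
  }
  return env[key];
}
```

With this shape, compose can mount `/run/secrets/...` files and set only the `*_FILE` variables, so the secret values never appear in `docker inspect` output.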

M5 — .env.example missing schema entries

Not in .env.example: MULTI_NODE_DEPLOYMENT, WEBSITE_INTAKE_SECRET, EMAIL_REDIRECT_TO (intentional per docs but the doc note exists; add a commented # EMAIL_REDIRECT_TO= line so devs know it's an option), DOCUMENSO_CLIENT_RECIPIENT_ID / DOCUMENSO_DEVELOPER_RECIPIENT_ID / DOCUMENSO_APPROVAL_RECIPIENT_ID (env.ts has all three with defaults, but they should still be documented), PORT. The EMAIL_CREDENTIAL_KEY placeholder is 64 zeros — fine for dev but worth a comment that prod must rotate.

M6 — Node 20-alpine, no PID-1 init

Both Dockerfiles use node:20-alpine (past upstream EOL as of 2026-04-30; node:22-alpine is the supported LTS line). Neither installs tini/dumb-init. Node handles SIGTERM itself in these entrypoints so it's not broken, but if any child process is ever spawned (e.g. tesseract worker pool) zombie reaping is on Node. Cheap upgrade: RUN apk add --no-cache tini, followed by a separate ENTRYPOINT ["/sbin/tini", "--"] instruction.

M7 — Dockerfile runner has no HEALTHCHECK directive (only compose has one)

Image-level HEALTHCHECK makes the image self-describing — useful for non-compose orchestrators (swarm, nomad, k8s readinessProbe via exec). Add the same wget … /api/health line to the app Dockerfile as the worker Dockerfile already does for Redis.

M8 — CSP connect-src https: / img-src https: are wide

Tighten to an allow-list once per-port branding exposes the configured S3 host.

M9 — Builder stage never sets NODE_ENV=production

Dockerfile:14-15 sets NEXT_TELEMETRY_DISABLED=1 + SKIP_ENV_VALIDATION=1 but not NODE_ENV. next.config.ts:3 branches on isProd for CSP — make this deterministic with ENV NODE_ENV=production above RUN pnpm build.


Quick-win checklist

  1. Add .dockerignore (C1).
  2. Refuse-to-start when EMAIL_REDIRECT_TO set in prod (C2).
  3. Pin socket.io into the standalone trace (C3).
  4. Remove 'unsafe-inline' from script-src in prod CSP (H1).
  5. Validate NEXT_PUBLIC_APP_URL at build (H2).
  6. Add compose resource + log limits (H4).
  7. Templatize healthcheck PORT (H5).

18. Berth recommender quality audit (recommender-auditor)

Audit — src/lib/services/berth-recommender.service.ts

Read-only audit. Scope per task #18: tier ladder, heat weights, max-oversize cap, fallthrough policy paths, port-isolation defense-in-depth, CTE correctness, cooldown / late-stage settings, n+1 risk, edge cases.

Code as of feat/documents-folders @ 660553c.


CRITICAL

None blocking ship. The recommender's entry-point port guard (interestInput.portId !== args.portId → CodedError) and the feasible CTE's b.port_id = $portId correctly fence cross-tenant queries at the top level. The remaining issues are correctness / defense-in-depth.


HIGH

H1. active_interest_count lacks i.id IS NOT NULL defense-in-depth filter

aggregates CTE (lines 475-479):

COUNT(*) FILTER (
  WHERE i.archived_at IS NULL
    AND i.outcome IS NULL
    AND ib.is_specific_interest = true
) AS active_interest_count

The LEFT JOIN on interests i ON i.id = ib.interest_id AND i.port_id = $portId intentionally sets i.id = NULL when an interest_berths row points at a cross-port interest (orphan / legacy data). For an ib row with is_specific_interest = true whose i.id was nulled by the port-filter, the FILTER evaluates archived_at IS NULL → TRUE, outcome IS NULL → TRUE, is_specific_interest = true → TRUE — and the row is counted as an active interest against the feasible berth, mis-classifying it as Tier C (or D if combined with the H2 issue below).

total_interest_count correctly guards with FILTER (WHERE i.id IS NOT NULL) and the inline comment promises "FILTER also enforces port isolation defense-in-depth," but only total_interest_count carries that guard. The documented project precedent for the documents-hub aggregator is "defense-in-depth port_id filter at every join — entry-point check alone is rejected." The recommender should mirror that.

Fix: Add AND i.id IS NOT NULL to the active_interest_count filter (also worth adding to max_active_stage for consistency — see M3).

H2. max_active_stage not filtered by is_specific_interest = true

Lines 483-496:

COALESCE(
  MAX(CASE i.pipeline_stage ...) FILTER (
    WHERE i.archived_at IS NULL AND i.outcome IS NULL
  ),
  0
) AS max_active_stage,

The inline comment on active_interest_count is explicit: "An EOI-bundle-only link (is_specific_interest=false, is_in_eoi_bundle=true) is legal coverage, not a pitch, and shouldn't demote the berth." That intent is honoured by active_interest_count but violated by max_active_stage, which takes the maximum over all open ib rows regardless of the is_specific_interest flag.

Concrete failure: berth X is part of an EOI bundle for interest A (at deposit_10pct, EOI-bundle-only — legal coverage, not a pitch). No specific-interest link on X. The recommender computes active_interest_count = 0 (correct) but max_active_stage = 6 (deposit_10pct). classifyTier looks at activeInterestCount > 0 && maxActiveStage >= 6. The first clause is false → tier A (correct). So in this specific case the bug is masked.

But mixed case: berth X has both an EOI-bundle-only deep-stage link AND a specific-interest link at details_sent. active_interest_count = 1, max_active_stage = 6 (from the bundle link) → Tier D. Per the documented semantics it should be Tier C (the only pitch is at details_sent = 2). This falsely sends late-stage warnings into the UI and, when tier_ladder_hide_late_stage = true, hides the berth that should still be recommendable.

Fix: Add AND ib.is_specific_interest = true to the max_active_stage FILTER, to match active_interest_count.
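The mixed case can be reproduced in a few lines. This is a sketch, not the service code: `classifyTier` is reconstructed from the tier rules described in this audit, and `aggregate` is a hypothetical in-memory stand-in for the SQL FILTER clauses, with a flag that toggles the proposed fix. All links are assumed open (not archived, no outcome).

```typescript
// Reconstruction for illustration; the real classifyTier and SQL may differ in detail.
type Tier = "A" | "B" | "C" | "D";
const LATE_STAGE_THRESHOLD = 6; // deposit_10pct, per the audit text

interface BerthAggregates {
  activeInterestCount: number;
  maxActiveStage: number;
  lostCount: number;
}

function classifyTier(a: BerthAggregates): Tier {
  if (a.activeInterestCount > 0) {
    return a.maxActiveStage >= LATE_STAGE_THRESHOLD ? "D" : "C";
  }
  return a.lostCount > 0 ? "B" : "A";
}

interface OpenLink {
  stage: number;
  isSpecificInterest: boolean;
}

/** Mimics the aggregates; `stageOnlyFromSpecific` applies the H2 fix. */
function aggregate(links: OpenLink[], stageOnlyFromSpecific: boolean): BerthAggregates {
  const specific = links.filter((l) => l.isSpecificInterest);
  const stageSource = stageOnlyFromSpecific ? specific : links;
  return {
    activeInterestCount: specific.length,
    maxActiveStage: stageSource.reduce((m, l) => Math.max(m, l.stage), 0),
    lostCount: 0,
  };
}

const mixedCase: OpenLink[] = [
  { stage: 6, isSpecificInterest: false }, // EOI-bundle-only link at deposit_10pct
  { stage: 2, isSpecificInterest: true }, // specific pitch at details_sent
];

const tierBeforeFix = classifyTier(aggregate(mixedCase, false)); // "D"
const tierAfterFix = classifyTier(aggregate(mixedCase, true)); // "C"
```

The bundle-only case stays at Tier A either way, which is why the bug is only visible once a specific-interest link is also present.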


MEDIUM

M1. Tier-B heat suppressed when berth has any active interest

recommendBerths (lines 587-598): heat is only computed when tier === 'B'. A berth with strong fall-through history plus a single fresh tire-kicker active interest is classified C (active > 0, no late stage), heat = null, and all the recovery signals (recency / furthest stage / interest count / EOI count) become invisible in the UI. The pipeline reason chip degrades to "1 active interest in early stage" and the rep loses context about whether the berth has a history of falling through at contract_signed.

Defensible as a design choice — the tier already encodes "needs attention" — but documenting the trade-off (or surfacing a "history" indicator independent of tier) would close the gap.

M2. pipeline_stage = 'completed' (stage 9) absent from CASE expressions

Both max_active_stage and fallthrough_max_stage CASE blocks enumerate open … contract_signed (1-8) with ELSE 0. The schema comment at src/lib/db/schema/interests.ts:18 lists completed as the ninth stage. An interest at pipeline_stage='completed' with outcome IS NULL (defective but possible) falls into the ELSE 0 branch, producing the same maxStage as "no data." Practically not harmful because won outcomes drop the row from the active filter, but the silent collapse to 0 is fragile if the data ever drifts. Either add the completed arm explicitly or replace the CASE with a join against a stage-order lookup so the JS constant and the SQL arm stay in lock-step.

M3. CTE LEFT JOIN allows null-side rows into all aggregates

Same root cause as H1, narrower impact:

  • lost_count: filter requires i.outcome IS NOT NULL → safe.
  • latest_fallthrough_at, fallthrough_max_stage: same outcome IS NOT NULL guard → safe.
  • eoi_signed_count: i.eoi_status = 'signed' → null-on-null → safe.
  • max_active_stage: filter is i.archived_at IS NULL AND i.outcome IS NULL → both NULLs match → row is included with CASE returning 0 → COALESCE(..., 0) masks it. Safe in practice but only by accident.

Adding the i.id IS NOT NULL predicate to every active-side filter is cheap, matches the documents-hub precedent, and makes the intent self-documenting.


LOW

L1. Negative / zero admin values not validated

asNumber accepts any finite number. topNDefault = 0 returns an empty recommendation list; maxOversizePct = -50 produces a multiplier of 0.5 that combines with the length_ft >= desiredLengthFt filter to make every berth infeasible; fallthroughCooldownDays = -30 puts the cutoff in the future and silently disables the cooldown (every fall-through is "before" the future cutoff). Consider clamping at parse time (Math.max(0, n) for non-negative settings, Math.max(1, n) for topN).
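A possible shape for the parse-time clamp, assuming a hypothetical `clampSetting` helper that sits alongside (or replaces) asNumber. The fallback values in the usage lines are placeholders, not the service's real defaults.

```typescript
// Hypothetical helper; the name and fallback semantics are illustrative.

/** Parse an admin setting, fall back when it is not a finite number, then clamp to [min, max]. */
function clampSetting(
  raw: unknown,
  fallback: number,
  min: number,
  max: number = Number.POSITIVE_INFINITY,
): number {
  let n = Number.NaN;
  if (typeof raw === "number") {
    n = raw;
  } else if (typeof raw === "string" && raw.trim() !== "") {
    n = Number(raw);
  }
  if (!Number.isFinite(n)) return fallback;
  return Math.min(max, Math.max(min, n));
}

// The three failure modes listed above, with placeholder fallbacks:
const topN = clampSetting(0, 10, 1); // 1, never an empty list
const maxOversizePct = clampSetting(-50, 0, 0); // 0, multiplier stays >= 1
const cooldownDays = clampSetting(-30, 0, 0); // 0, cutoff never lands in the future
```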

L2. outcome::text cast is a no-op

interests.outcome is declared text(...) (not an enum) — the explicit ::text cast inside LIKE 'lost%' is redundant. Harmless; safe to drop.

L3. Hard-coded heat normalisation constants

computeHeat uses 5 (interest count) and 3 (EOI count) as the "saturate-at" caps and 30 / 365 days for the recency curve. These are not admin-tunable. Per-port behaviour expectations may differ — a port that sees 20+ interests on hot berths will have interestCount saturating early. Promote to settings if tuning lands as a real need; otherwise document the assumption.

L4. Width-only feasibility cap uses 8× L/W heuristic

When desiredLengthFt is null but desiredWidthFt is set, the upper length cap is width * 8 * (1 + oversizePct/100). Inline comment owns this as a pragmatic guard. Worth a unit test pinning the ratio so a future tweak doesn't silently widen the cap.
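To make the ratio easy to pin in a unit test, the bounds arithmetic can be isolated in a pure function. The sketch below follows the formulas quoted in this audit; `lengthBounds` and `WIDTH_TO_LENGTH_RATIO` are hypothetical names, and the absence of a lower bound in the width-only branch is an assumption, since the audit only describes the upper cap.

```typescript
// Sketch of the feasibility bounds; names are illustrative, formulas are from the audit text.
const WIDTH_TO_LENGTH_RATIO = 8; // the 8x L/W heuristic

interface LengthBounds {
  minFt: number | null;
  maxFt: number | null;
}

function lengthBounds(
  desiredLengthFt: number | null,
  desiredWidthFt: number | null,
  oversizePct: number,
): LengthBounds {
  const multiplier = 1 + oversizePct / 100;
  if (desiredLengthFt !== null) {
    // Inclusive on both ends: desired <= length_ft <= desired * multiplier.
    return { minFt: desiredLengthFt, maxFt: desiredLengthFt * multiplier };
  }
  if (desiredWidthFt !== null) {
    // Width-only: cap length via the ratio heuristic (no lower bound assumed here).
    return { minFt: null, maxFt: desiredWidthFt * WIDTH_TO_LENGTH_RATIO * multiplier };
  }
  return { minFt: null, maxFt: null };
}
```

A test that asserts `lengthBounds(null, 10, 25).maxFt === 100` would fail loudly if someone later changes the ratio.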


Architecture / structure — clean

  • Tier ladder (classifyTier): A/B/C/D mapping is correct and matches the doc-string. Tier C/D requires activeInterestCount > 0; D needs maxActiveStage >= LATE_STAGE_THRESHOLD (= 6, deposit_10pct). Tier B requires lostCount > 0. Tier A is the fall-through default. Verified.
  • Heat defaults (30 / 40 / 15 / 15) sum to 100, and computeHeat re-normalises via norm = 100 / weightSum so admin tuning that doesn't sum to 100 still produces a 0..100 score. Final Math.max(0, Math.min(100, ...)) clamps. Verified.
  • Max-oversize cap arithmetic: oversizeMultiplier = 1 + pct/100, applied as length_ft <= desired * multiplier. Inclusive upper bound; the lower bound length_ft >= desired is also inclusive. Symmetric and correct.
  • Fallthrough policy paths:
    • immediate_with_heat → no cooldown filter, heat surfaces immediately.
    • cooldown → tier B berths whose latestFallthroughAt > now - cooldownDays are skipped; non-B berths unaffected.
    • never_auto_recommend → tier B berths skipped entirely (heat still computed but never reaches the output). All three paths correct.
  • tier_ladder_hide_late_stage: default true → showLateStage = false → tier D rows dropped at line 564. Caller can override via the showLateStage arg. Correct.
  • N+1 risk: scoring loop is pure JS over the pre-fetched rowset. The three-query shape (loadRecommenderSettings, loadInterestInput, main CTE) is constant. No issue.
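The heat re-normalisation noted in the list above can be sketched as follows. This is an illustration under assumptions: each signal is taken as already scaled to 0..1, and the mapping of the 30 / 40 / 15 / 15 defaults onto recency / furthest stage / interest count / EOI count (in that order) is a guess, since the audit does not state which weight belongs to which signal.

```typescript
// Sketch only; the weight-to-signal mapping below is assumed, not confirmed by the source.
interface HeatParts {
  recency: number;
  furthestStage: number;
  interestCount: number;
  eoiCount: number;
}

const DEFAULT_HEAT_WEIGHTS: HeatParts = { recency: 30, furthestStage: 40, interestCount: 15, eoiCount: 15 };

/** Scale a raw count to 0..1, saturating at `cap` (e.g. 5 interests, 3 EOIs). */
function saturate(count: number, cap: number): number {
  return Math.min(1, Math.max(0, count / cap));
}

/** Weighted sum of 0..1 signals, re-normalised so any positive weight set yields 0..100. */
function computeHeat(signals: HeatParts, weights: HeatParts = DEFAULT_HEAT_WEIGHTS): number {
  const weightSum = weights.recency + weights.furthestStage + weights.interestCount + weights.eoiCount;
  if (weightSum <= 0) return 0;
  const norm = 100 / weightSum;
  const raw =
    (signals.recency * weights.recency +
      signals.furthestStage * weights.furthestStage +
      signals.interestCount * weights.interestCount +
      signals.eoiCount * weights.eoiCount) *
    norm;
  return Math.max(0, Math.min(100, raw));
}
```

Doubling every weight leaves the score unchanged, which is the property that makes admin tuning safe when the weights do not sum to 100.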

Edge cases — verified

  • No history: LEFT JOIN yields one null-side row, all FILTER predicates short-circuit, counts = 0, COALESCEs return 0 → Tier A. ✓
  • All-lost history: active = 0, lost > 0 → Tier B; cooldown / never paths each gate correctly; heat computes from fall-through fields. ✓
  • Mixed open + lost: active > 0 dominates → Tier C/D, heat = null (see M1 trade-off). ✓ (with caveat)
  • Won outcome: not matched by outcome LIKE 'lost%' OR outcome = 'cancelled', doesn't inflate lost_count or contaminate fallthrough stage. ✓
  • Cross-port leakage: prevented at the entry point and the feasible CTE; partial defense-in-depth gap at the aggregates layer (H1, M3).

19. Search relevance audit (search-auditor)

Search relevance audit — task #19

Scope: src/lib/services/search.service.ts, src/lib/services/search-nav-catalog.ts, src/components/search/command-search.tsx, src/hooks/use-search.ts, plus the resolve-id route used by paste detection.

Method: Read each file in full, traced the ranking formulas, simulated the three test queries against scoreEntry, audited the graph-expansion merge for permission leakage, and spot-checked the catalog for duplicates.


Spot-check results (the three required queries)

All three pass — but with a duplicate-result wrinkle (see HIGH-2).

| Query | Top entry | Score | Why |
| --- | --- | --- | --- |
| ai | /admin/ai "AI configuration" | 80 | label.startsWith("ai") |
| smtp | /settings/email "Email accounts (SMTP / IMAP)" | 60 | label.includes('smtp') beats keyword-exact (50) on the /admin/email twin |
| client portal | /admin/settings "System Settings" | 50 | exact keyword match |

Runner-ups for ai: System Settings (35, ai interest scoring keyword prefix) and Profile (20, "avatar" substring). Acceptable noise floor.
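The scores in these spot-checks follow the ladder this audit describes for scoreEntry (label-exact 100, label-prefix 80, label-substring 60, keyword-exact 50, keyword-prefix 35, keyword-substring 20). A minimal sketch of such a ladder, assuming plain case-insensitive string matching; the real function may tokenise or weight differently, and the `NavEntry` shape and sample keywords are illustrative.

```typescript
// Sketch of the scoring ladder; entry shape and matching details are assumptions.
interface NavEntry {
  href: string;
  label: string;
  keywords: string[];
}

function scoreEntry(entry: NavEntry, query: string): number {
  const q = query.trim().toLowerCase();
  if (!q) return 0;
  const label = entry.label.toLowerCase();
  if (label === q) return 100;
  if (label.startsWith(q)) return 80;
  if (label.includes(q)) return 60;
  const keywords = entry.keywords.map((k) => k.toLowerCase());
  if (keywords.some((k) => k === q)) return 50;
  if (keywords.some((k) => k.startsWith(q))) return 35;
  if (keywords.some((k) => k.includes(q))) return 20;
  return 0;
}

const aiConfig: NavEntry = { href: "/admin/ai", label: "AI configuration", keywords: [] };
const emailAccounts: NavEntry = { href: "/settings/email", label: "Email accounts (SMTP / IMAP)", keywords: [] };
const systemSettings: NavEntry = {
  href: "/admin/settings",
  label: "System Settings",
  keywords: ["client portal", "ai interest scoring"],
};
```

Because the label checks run before the keyword checks, a label substring hit (60) always outranks an exact keyword hit (50), which is the behaviour the smtp row relies on.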


CRITICAL

None. The system is solid overall — sanitization is correct, port isolation is consistent, the affinity boost is bounded, and paste detection is port-scoped via the resolve-id endpoint (good — prevents cross-tenant navigation on super-admin paste).


HIGH

HIGH-1 — Graph expansion bypasses per-bucket permission gates (authorization leak)

search() (lines 1809-1865) gates each direct-match bucket via can(opts, '<x>.view'). Then expandGraph runs unconditionally on whichever direct matches survived, and its output is pushed into mergedClients / mergedInterests / mergedYachts / mergedCompanies / mergedBerths via mergeWithExpansion (lines 1911-1915) without re-checking the destination bucket's permission.

Concrete leak: a user with berths.view but no interests.view who searches A12:

  • direct: berth A12 surfaces
  • expansion: interestsFromBerths → populates expandedInterests → merged into mergedInterests → returned to the client
  • The dropdown renders rows with the client's full name + pipeline stage from interests the user cannot otherwise read

Similar leaks: berth name via yacht-direct match → expandedBerths; client names via company-direct match → expandedClients; etc.

Fix: gate the expansion writes — only push expanded.X into mergedX when can(opts, '<X>.view'). Cleanest: pass the can(...) results into mergeWithExpansion as a "destination allowed" boolean.
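A sketch of the "destination allowed" variant, assuming rows carry a string `id` and that the merge de-duplicates and caps as the current helper appears to. The signature is hypothetical; the actual mergeWithExpansion in search.service.ts may take different arguments.

```typescript
// Hypothetical signature; only the permission gate is the point of this sketch.
interface SearchRow {
  id: string;
}

/**
 * Merge direct matches with graph-expansion rows, direct first.
 * When the caller lacks view permission on the destination bucket, nothing is returned,
 * so expansion can never populate a bucket the user could not search directly.
 */
function mergeWithExpansion<T extends SearchRow>(
  direct: T[],
  expanded: T[],
  destinationAllowed: boolean,
  cap: number,
): T[] {
  if (!destinationAllowed) return [];
  const seen = new Set<string>();
  const out: T[] = [];
  for (const row of [...direct, ...expanded]) {
    if (out.length >= cap) break;
    if (seen.has(row.id)) continue;
    seen.add(row.id);
    out.push(row);
  }
  return out;
}
```

The call site would pass the same `can(opts, 'interests.view')` result it already computes for the direct bucket, so the A12 example above returns an empty interests bucket for a berths-only user.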

HIGH-2 — Six catalog labels are duplicated under different hrefs

The catalog has both /settings/X and /admin/X entries with near-identical labels, so common queries return two visually-similar rows pointing at different pages:

| Query | Hits |
| --- | --- |
| tags | /settings/tags "Tags" + /admin/tags "Tags" |
| branding | /settings/branding + /admin/branding |
| templates | /settings/templates "Document templates" + /admin/templates "Document templates" |
| storage | /settings/storage + /admin/storage |
| analytics/umami | /website-analytics + /admin/website-analytics |
| email/smtp | /settings/email + /admin/email |

For users who have both manage_settings permissions, the dropdown shows two indistinguishable rows. Recommendation: either (a) collapse to one canonical entry per concept, or (b) disambiguate the label suffix (e.g., "Email accounts (admin)" vs "Email accounts (self-serve)"). The duplication reflects the underlying double-page structure, which deserves its own product decision.

HIGH-3 — looksLikeEmail / wantPhone are computed then discarded

Lines 1804-1807 compute wantEmail and wantPhone, then lines 1885-1886 do void wantEmail; void wantPhone; with a TODO-style comment. Dead code paid for on every request. Either delete it or wire it into the bucket reordering the comment promises.


MEDIUM

M-1 — applyAffinity re-sorts AFTER mergeWithExpansion, breaking the direct-first guarantee

mergeWithExpansion (line 1754) carefully puts direct matches before expansion rows. Then apply() (line 1905) re-sorts the merged list by recently-touched membership — a recently-touched related-via row can leapfrog a direct (non-touched) match. Either intentional (and should be documented) or a bug (and the merge ordering is wasted work). The current behavior surprises me: I expect direct matches to always win at the top.

M-2 — searchOtherPorts mixes tsvector + trigram + ILIKE inconsistently

Clients section uses tsvector OR ILIKE; berths section uses b.mooring_number % ${query} (pg_trgm operator with the default 0.3 threshold). Berths are short codes — trigram on them is unreliable ("A12" trigram similarity to "B12" is ~0.5, both surface). Standardize: berths should match via prefix only (consistent with the in-port searchBerths).

M-3 — searchNotes interest-branch source_label drops when no primary berth

Line 1166: b.mooring_number AS source_label is null when the interest has no primary berth, so the row's sourceLabel falls back to the generic "Interest" via labelForSource. The interest's client name would be a far more useful label (the interests bucket uses it). Patch: COALESCE with the client name via an extra JOIN.

M-4 — Paste-detection regex hard-codes invoice numbering shape

INVOICE_RE = /^INV-\d{6}-\d+$/i (line 92) assumes the legacy 6-digit prefix. The resolve-id endpoint also accepts invoice_number lookup, so non-matching shapes silently fall through to free-text search. Not a security issue, but if invoice numbering changes the paste shortcut breaks invisibly. Consider expanding to /^INV[-_/].+$/i and letting the resolve-id endpoint be the source of truth.

M-5 — Non-ASCII characters in names are stripped by tsquery sanitizer

buildPrefixTsquery (line 278) strips [^a-z0-9_], so Šibenik, Łukasz, Müller all reduce to empty tokens. The trigram fallback similarity() saves most of these (it's diacritic-tolerant for >0.3 similarity), but exact-prefix matching on accented names is lost. For Croatian / Polish / German tenant names this matters. Consider unaccent() before sanitization or relax the regex to \p{L}.
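
A minimal sketch of the two options on the application side, assuming the sanitizer works token by token. sanitizeToken and foldDiacritics are hypothetical names, not the existing buildPrefixTsquery internals, and the NFD fold is a JavaScript stand-in for what unaccent() would do in Postgres.

```typescript
// Keep any Unicode letter or digit plus underscore (the \p{L} relaxation).
const NON_WORD = new RegExp("[^\\p{L}\\p{N}_]", "gu");
// Combining marks left behind by canonical decomposition.
const COMBINING_MARKS = new RegExp("\\p{M}", "gu");

function sanitizeToken(raw: string): string {
  return raw.toLowerCase().replace(NON_WORD, "");
}

// Decompose accented letters and drop the marks: "ü" becomes "u", "Š" becomes "S".
// Letters without a canonical decomposition, such as "Ł", pass through unchanged.
function foldDiacritics(raw: string): string {
  return raw.normalize("NFD").replace(COMBINING_MARKS, "");
}

console.log(sanitizeToken("Šibenik"));                // accented letter survives
console.log(sanitizeToken(foldDiacritics("Müller"))); // folded to plain ASCII
console.log(sanitizeToken(foldDiacritics("Łukasz"))); // "ł" is kept, not folded
```

Whichever variant is chosen, the indexed tsvector side has to be built with the same folding, otherwise a folded query will not match an unfolded index entry.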

M-6 — expandGraph issues N+1-style queries for each direct ID set

The LIMIT ${perBucketCap * direct.<X>Ids.length} pattern (e.g., line 1387, 1463, 1486) scales the row cap by direct-match count. With limit=5 and 5 direct berth matches, that's 25 expansion rows fetched, then merged into the same 10-row limit * 2 cap downstream — most fetched rows are thrown away. Minor cost; cap globally instead.

M-7 — searchDocuments JOIN on document_signers has no port_id filter

Defense-in-depth: ds.signer_email ILIKE match is filtered through d.port_id, but the JOIN itself doesn't carry the port filter. Documents are FK-scoped, so no leak today, but the recommender pattern in this codebase (per CLAUDE.md) says "defense-in-depth port_id filter at every join." Apply the same here.

M-8 — import() of searchNavCatalog inside search() is sync wrapped in two awaits

Line 1867 — await Promise.resolve((await import('@/lib/services/search-nav-catalog')).searchNavCatalog(...)). The dynamic import is fine (avoids a circular dep), but Promise.resolve wrapping a sync result then awaiting it is dead ceremony. Inline or await import(...).then(...).

M-9 — Bucket ordering matches spec: notes second-to-last, navigation last ✓

BUCKETS in command-search.tsx (lines 60-81): confirmed. Notes is index 14, Navigation is index 15. buildFlatRows preserves this order, and the comments at lines 75-79 and 1135-1138 document the rationale.


What works well

  • scoreEntry ladder (label-exact 100 → label-prefix 80 → label-substring 60 → kw-exact 50 → kw-prefix 35 → kw-substring 20) is correct and matches the spec.
  • Paste detection: regex narrowness is fine because resolve-id is port-scoped and the fallback is normal search.
  • The NEVER_TSQUERY / NEVER_PHONE sentinels (lines 385-386) correctly avoid Postgres-evaluation-order surprises that would otherwise break NULL guards in WHERE.
  • searchBerths exact-match short-circuit (line 757) is the right UX call: typing "A1" when A1 exists should not also dump A10-A19.
  • Catalog requires is permission-gated correctly and searchNavCatalog respects both requires and superAdminOnly.
  • mergeWithExpansion uses a Set dedupe — direct match wins, no duplicate rows.
  • applyAffinity is stable wrt original order (line 327) when the touched-set is empty.
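
The scoreEntry ladder listed above can be sketched as a pure function. This is an illustration of the ladder only, not the shipped implementation: the CatalogEntry shape, case-insensitive matching, and the score of 0 for a miss are assumptions.

```typescript
// Hypothetical catalog entry shape for illustration.
interface CatalogEntry {
  label: string;
  keywords: string[];
}

// First matching rung wins; label matches always outrank keyword matches.
function scoreEntry(query: string, entry: CatalogEntry): number {
  const q = query.trim().toLowerCase();
  if (q === "") return 0;

  const label = entry.label.toLowerCase();
  if (label === q) return 100;          // label-exact
  if (label.startsWith(q)) return 80;   // label-prefix
  if (label.includes(q)) return 60;     // label-substring

  const keywords = entry.keywords.map((k) => k.toLowerCase());
  if (keywords.some((k) => k === q)) return 50;         // kw-exact
  if (keywords.some((k) => k.startsWith(q))) return 35; // kw-prefix
  if (keywords.some((k) => k.includes(q))) return 20;   // kw-substring
  return 0;
}

const entry: CatalogEntry = { label: "Invoices", keywords: ["billing", "payments"] };
console.log(scoreEntry("inv", entry), scoreEntry("bill", entry), scoreEntry("xyz", entry));
```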

Recommendations, ranked

  1. Fix HIGH-1 immediately — graph-expansion permission leak. One-line gate per bucket merge.
  2. Resolve HIGH-2 catalog duplicates — product decision needed.
  3. Decide on M-1 — direct-first vs affinity-first. Document the chosen rule in the service docstring.
  4. Clean up HIGH-3 dead code or wire it up to actually reorder buckets for email/phone-shaped queries.
  5. Sweep through M-2 / M-5 / M-7 in a single pass — all are SQL-shape fixes in the same file.

12. Onboarding + first-run UX audit (onboarding-auditor)

Audit · Onboarding + first-run UX (task #12)

Scope: src/app/(dashboard)/[portSlug]/admin/onboarding, ensureSystemRoots, seed-bootstrap.ts, the required-settings gates (SMTP / branding / EOI signers / recommender), empty-state copy on the main lists, and the "what works out of the box" path after POST /api/v1/admin/ports.

Bottom line: the checklist is the right shape but three of its nine auto-checks read the wrong setting key, the forms step links to nowhere, fresh ports ship with zero domain data (no berths, no tags, no signers), and nothing prompts a freshly-invited admin to even open the checklist. A new port is technically usable for clients/companies but cannot generate an EOI without manual SQL or several blind admin visits.


CRITICAL

C1. Three checklist auto-checks read keys that no admin page ever writes

src/components/admin/onboarding-checklist.tsx STEPS declares autoCheckSettingKey values that don't match what the linked admin pages actually persist:

| Step | Checklist reads | Admin page actually writes |
| --- | --- | --- |
| email | sales_email_smtp_host | smtp_host_override (email page) / sales_smtp_host (sales-email card) |
| documenso | documenso_api_url | documenso_api_url_override |
| settings | recommender_top_n_default | nothing (DEFAULT_RECOMMENDER_SETTINGS covers all keys, so the admin never has to save) |

Effect: a port that has actually been fully configured will still show those three steps as incomplete. The "manual mark done" fallback is hidden behind an extra click, and the percentage bar is permanently stuck below 70 %. This makes the checklist actively misleading — operators stop trusting it.

Fix: rename the keys to the _override variants (or both) and drop the recommender auto-check (or check heat_weight_* whose presence genuinely means "admin tuned it").

C2. forms step href is broken

STEPS[8].href = '../' resolves through the Link template to /${portSlug}/admin/../ → /${portSlug}/ (the dashboard). The intended target (src/app/(dashboard)/[portSlug]/admin/forms/page.tsx) exists and is what the description references. Should be 'forms'.
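
A minimal sketch of why '../' lands on the dashboard, assuming the Link template concatenates /{portSlug}/admin/{href} as described above. resolveStepHref is a hypothetical helper and the base origin is only there to satisfy the URL constructor.

```typescript
// Assumed link template: /{portSlug}/admin/{href}. The URL parser then
// collapses dot segments the same way the browser does on navigation.
function resolveStepHref(portSlug: string, href: string): string {
  const raw = `/${portSlug}/admin/${href}`;
  return new URL(raw, "http://localhost").pathname;
}

console.log(resolveStepHref("demo", "../"));   // collapses to the port dashboard
console.log(resolveStepHref("demo", "forms")); // stays under /admin
```

A unit test that resolves every STEPS[n].href this way and asserts the result still starts with /{portSlug}/admin/ would catch this class of mistake.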

C3. No gate on EOI signer identity

The checklist treats documenso_api_url (sic — see C1) as proof of Documenso readiness, but the EOI pathway also requires documenso_developer_name, documenso_developer_email, documenso_approver_name, documenso_approver_email, and documenso_eoi_template_id. Without these, buildDocumensoPayload sends recipients with empty names/emails or the template-generate call 404s on a missing template id. There is no visible warning until a rep tries to send the first EOI and Documenso bounces it. Add an autoCheckSettingKey (or a derived multi-key check) for each so the step doesn't go green until the developer + approver + template are all populated.
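
A minimal sketch of the derived multi-key check, using the five setting keys named above. The Settings shape and the missingEoiSettings helper are hypothetical; the real checklist would read these values from wherever port settings are loaded.

```typescript
// Keys the EOI pathway needs, as listed in the finding above.
const EOI_REQUIRED_KEYS = [
  "documenso_developer_name",
  "documenso_developer_email",
  "documenso_approver_name",
  "documenso_approver_email",
  "documenso_eoi_template_id",
] as const;

// Hypothetical flat view of a port's settings.
type Settings = Record<string, string | null | undefined>;

// Returns the keys that are absent or blank; an empty array means the step can go green.
function missingEoiSettings(settings: Settings): string[] {
  return EOI_REQUIRED_KEYS.filter((key) => {
    const value = settings[key];
    return value == null || value.trim() === "";
  });
}

console.log(missingEoiSettings({ documenso_developer_name: "Dev Co", documenso_approver_email: " " }));
```

Returning the missing keys rather than a boolean lets the checklist name exactly what is still unset instead of showing a bare red step.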

C4. ensureSystemRoots is awaited but its failure mode poisons port creation

src/lib/services/ports.service.ts:46 awaits ensureSystemRoots(...) after the INSERT INTO ports has committed (no surrounding tx). The inline comment claims "non-fatal if this throws" — but a throw propagates out of createPort, the route returns 500, and the operator sees a failure even though the port row is live. The next admin action self-heals through ensureEntityFolder's fallback, but the failed response leaves the operator suspicious and re-POSTing produces a 409 slug already exists. Either wrap port + folders in a transaction or catch + log + continue here so the error message matches the comment's promise.
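
A minimal sketch of the catch + log + continue option. The Deps interface is a hypothetical seam that stands in for the real INSERT, ensureSystemRoots, and the logger; only the control flow is the point.

```typescript
interface Port {
  id: string;
  slug: string;
}

// Hypothetical dependencies, injected so the sketch runs without a database.
interface Deps {
  insertPort(slug: string): Promise<Port>;
  ensureSystemRoots(portId: string): Promise<void>;
  logError(message: string, err: unknown): void;
}

async function createPort(slug: string, deps: Deps): Promise<Port> {
  const port = await deps.insertPort(slug); // committed at this point
  try {
    await deps.ensureSystemRoots(port.id);
  } catch (err) {
    // Non-fatal, matching the inline comment: folders self-heal on first use.
    deps.logError(`ensureSystemRoots failed for port ${port.id}`, err);
  }
  return port;
}
```

The transactional alternative gives a cleaner all-or-nothing guarantee; this variant only makes the observed behaviour match what the comment already promises.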


HIGH

H1. createPort seeds nothing beyond folders

createPort only writes the port row and the three system folders. It does not seed:

  • Default tags (the checklist asks for "starter tags" but offers no one-click default set)
  • Default brochure (rep can't send the "send brochure" flow until one is uploaded; nothing flags this)
  • Berths (no UI to add berths; the only path is scripts/import-berths-from-nocodb.ts)
  • berth_rules (defaults vary per trigger and are off for berth_unlinked — fine, but the absence isn't surfaced)
  • email_from_address / branding_app_name (used in emails but not validated; sending mail with a blank from address fails silently on most providers)
  • Recommender weight rows (defaults work but the onboarding step reads the absence as "incomplete" — see C1)

Net effect: an admin can finish every onboarding step and still have a port that can't generate an EOI (no berths, no developer/approver, possibly no template) or send a brochure (no brochure exists). The checklist needs either (a) a "Seed defaults" button on port creation that writes recommended starter rows, or (b) explicit failing gates per domain.

H2. Storage step has no in-app action

autoCheckSettingKey: 'storage_backend' only flips green when a row exists in system_settings — but the default backend (s3) is inferred in code from loadStorageConfig() when no row is present, so a perfectly functional s3-backed install never writes that row and the step stays red forever. /admin/storage is read-only (status panel + test connection); switching backends still requires a manual UPDATE system_settings + pnpm tsx scripts/migrate-storage.ts. Either add the writer UI or change this step to verify getStorageBackend() round-trips a probe object.

H3. Roles step auto-ticks immediately

/api/v1/admin/roles → listRoles() returns all roles unfiltered by portId, so the six global system roles created by seedBootstrap make the count > 0 on the freshest possible port. The step turns green without the admin doing a thing, and the description "Create roles & assign users" implies they did. Auto-check should be a per-port subset, e.g. count rows in user_port_roles for portId.

H4. Nothing points a freshly invited admin at the checklist

The onboarding checklist lives under Admin → Tenancy → "Onboarding checklist" (admin-sections-browser.tsx:300), described as "read-only references" (which it is not; it has working manual checkboxes). A freshly invited port-admin who logs in lands on /{portSlug} and sees empty stat cards, with no banner, toast, or "Finish setup" CTA pointing at the checklist. Discoverability is effectively zero unless they know the URL. At minimum: dashboard banner when < X of the auto-checks are passing, dismissible per user.

H5. Berth list empty state misleads fresh ports

src/components/berths/berth-list.tsx: title="No berths found", description="Berths are imported from external sources. Adjust your filters...". On a port with zero berths there is nothing to filter — the copy implies the data exists but is hidden. Should branch on totalCount === 0 && noFiltersActive and link to /admin/import with the exact pnpm tsx scripts/import-berths-from-nocodb.ts command, or to a future in-app importer.

H6. Two competing EmptyState components

src/components/ui/empty-state.tsx uses {body, actions}, while src/components/shared/empty-state.tsx uses {description, action}. Different list pages consume different ones (e.g. clients/yachts use shared/, documents-hub uses ui/). Same visual but divergent props will trip up any future "improve onboarding copy" pass. Consolidate.


MEDIUM

M1. Branding auto-check anchors on logo only

branding_logo_url is the proxy for "branding done", but branding_app_name and branding_primary_color are more functionally load-bearing (app name shows in email subjects, color in CTAs). Consider branding_app_name as the gate — or any-of.

M2. Tags step has no "Apply default set" affordance

/admin/tags starts blank. Onboarding tells the operator to "define starter tags" but offers no recommended palette. Add a one-click "Apply recommended set (Hot / Warm / Cold / VIP / Press)" or similar so operators have an opinionated baseline they can edit.

M3. Settings auto-check confuses "value exists" with "operator chose it"

Once the admin opens /admin/settings and saves without changing anything, settings-manager.tsx writes the default back as a real row and the checklist turns green. That's a side effect, not informed consent. Use a sentinel ("admin saw this page") rather than a defaultable knob.

M4. admin-sections-browser description is wrong

"Setup checklist for fresh ports (read-only references)" — OnboardingChecklist has working toggleManual + persisted state. Update the copy or it discourages clicking in.

M5. Vocabularies are global-code-constant

Interest sources / statuses / contact reasons come from VOCABULARIES in src/lib/vocabularies.ts, not from per-port settings. Fine for MVP, but the onboarding doc says "vocabularies" implying configurability. Either expose per-port overrides or remove the mention.

M6. Documents hub root view doesn't tell admins why Clients/, Companies/, and Yachts/ exist

On first visit to /{portSlug}/documents, the system roots are present (from ensureSystemRoots) but with zero children. Empty-state copy ("Upload a file...") doesn't explain that the three locked system folders will auto-populate as deals progress.


What works well

  • seedBootstrap is genuinely idempotent and safe to re-run.
  • ensureSystemRoots race semantics are clean; the partial-unique index pattern is exemplary.
  • DEFAULT_RECOMMENDER_SETTINGS plus loadRecommenderSettings's layered (port > global > default) lookup means recommender is the one subsystem that genuinely works zero-config.
  • The checklist UI affordances (progress bar, auto-detected hint, manual-override button) are solid; only the wiring is wrong.



27. Type-safety + drizzle leak audit (types-auditor)

Type-Safety + Drizzle Leak Audit — Task #27

Branch: feat/documents-folders · 2026-05-12

Top-line counts (src/, ts+tsx)

| Pattern | Count |
| --- | --- |
| as unknown as | 72 |
| as any (raw, mostly route hrefs) | 69 |
| // eslint-disable @typescript-eslint/no-explicit-any | 73 |
| // @ts-ignore / // @ts-expect-error | 0 |
| as Route (typed-routes cast) | 17 |
| $inferSelect / $inferInsert direct exports | 0 |
| Bare : any parameter (not eslint-disabled) | 2 functional + 2 declarations |

Good news up front: no @ts-ignore / @ts-expect-error anywhere, and no $inferSelect type leaked through the API boundary as a public response contract. Service return shapes go through { data } envelopes; drizzle row types stay internal.


CRITICAL

1. tx: any in client-restore service — bypasses Drizzle's transaction type contract

src/lib/services/client-restore.service.ts:361

tx: any,

This parameter receives a Drizzle transaction client and threads writes through 12+ downstream tables in a multi-step restore. A typo'd table or wrong column type goes undetected at compile time. Type it as the transaction handle, Parameters<Parameters<typeof db.transaction>[0]>[0]; the single-level Parameters<typeof db.transaction>[0] is the callback, not the tx it receives (see src/lib/db/utils.ts:17 for the same shape applied via as unknown as).
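
A minimal sketch of deriving the transaction-handle type from the client instead of writing any. The db object here is a stand-in with the same callback shape as a Drizzle client (transaction takes a function that receives the tx); Tx, restoreClient, and the table names are hypothetical.

```typescript
// Stand-in transaction handle; the real one would be Drizzle's transaction type.
interface Tx {
  insert(table: string, row: Record<string, unknown>): Promise<void>;
}

const written: string[] = [];

// Stand-in client: transaction() hands a tx to the callback, like the real db.
const db = {
  async transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T> {
    const tx: Tx = {
      insert: async (table) => {
        written.push(table);
      },
    };
    return fn(tx);
  },
};

// Level one is the callback; level two is the handle the callback receives.
type TxCallback = Parameters<typeof db.transaction>[0];
type DbTx = Parameters<TxCallback>[0];

// The restore step now gets compile-time checking on every tx call.
async function restoreClient(tx: DbTx, clientId: string): Promise<void> {
  await tx.insert("clients", { id: clientId });
  await tx.insert("client_contacts", { clientId });
}
```

Exporting the derived alias once (for example next to the db export) keeps every service that accepts a tx on the same type without importing driver-specific generics.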

2. useQuery<any> + apiFetch<{ data: any }> on berth detail page

src/components/berths/berth-detail.tsx:20-25, 60

const { data, isLoading } = useQuery<any>({...});
apiFetch<{ data: any }>(`/api/v1/berths/${berthId}`)
const berth = data as any;

Three escape hatches stacked on the highest-traffic detail page. Every field access downstream is unchecked: a service-side rename from mooringNumber to mooring_number would silently render undefined. Replace with a BerthDetailResponse type co-located with the service.

3. Portal-auth and public routes bypass parseBody

6 portal + 3 public-intake routes use raw await req.json() instead of the project-standard parseBody(req, schema):

  • src/app/api/portal/auth/{forgot-password,reset-password,sign-in,activate,change-password}/route.ts
  • src/app/api/auth/set-password/route.ts
  • src/app/api/public/{residential-inquiries,website-inquiries,interests}/route.ts
  • src/app/api/v1/admin/custom-fields/[fieldId]/route.ts (intentional — comment explains)

CLAUDE.md mandates parseBody so 400 errors have field-level shape the toast hook recognizes. ZodErrors from schema.parse after raw req.json() become generic 500s. Custom-fields one is justified; the other 9 are not.


HIGH

4. mergePerms double-cast in new permission-overrides route

src/app/api/v1/admin/users/[id]/permission-overrides/route.ts:254, 259

const out = { ...(base as unknown as Record<string, Record<string, boolean>>) };
return out as unknown as RolePermissions;

Comment acknowledges this duplicates withAuth's deepMerge. Either reuse deepMerge from helpers.ts (lines 202-205, 234-237 already use the same pattern) or extract a typed helper mergePermsTyped(base, patch): RolePermissions. Two implementations of permission merge is a divergence risk.

5. Audit-log as unknown as Record<string, unknown> epidemic

21 occurrences across services that write oldValue / newValue to audit_logs:

  • invoices.ts × 7, expenses.ts × 6, documents.service.ts × 2, berths.service.ts × 2, companies.service.ts, company-memberships.service.ts, yachts.service.ts, document-templates.ts, ocr-config.service.ts × 2, ai-budget.service.ts × 2

Wide repetition of the same widening cast is a smell — every service does the same dance to fit Drizzle row types into the audit JSONB column. Fix: introduce toAuditJson<T>(row: T): Record<string, unknown> once in src/lib/services/audit.ts (same pattern gdpr-bundle-builder.ts already uses — toJsonRow, line 152 comment explicitly cites avoiding this). Removes 21 unsafe casts in one shot.
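
A minimal sketch of what such a helper could look like. The JSON round trip is an assumption about the desired behaviour, not a description of toJsonRow: it produces exactly what a JSONB column can hold (Dates become ISO strings, undefined properties are dropped) and would throw on bigint values.

```typescript
// One audited place for the widening; callers pass any row-shaped object.
function toAuditJson<T extends object>(row: T): Record<string, unknown> {
  // Round-trip through JSON so the result is JSON-safe by construction.
  return JSON.parse(JSON.stringify(row)) as Record<string, unknown>;
}

// Hypothetical row for illustration.
const invoiceRow = { id: 7, issuedAt: new Date(0), note: undefined as string | undefined };
console.log(toAuditJson(invoiceRow));
```

If the round-trip cost matters on hot paths, a shallow spread with the same signature keeps the single audited cast while skipping serialization.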

6. next/typedRoutes defeated by 49 as any href casts

router.push(..) and <Link href={..}> with template-literal dynamic URLs get widened to string, which isn't assignable to Route<string>. Components compensate via as any (49 sites) or as Route (17 sites). Hotspots: command-search.tsx (10), topbar.tsx (10), user-menu.tsx (8), reservation-list.tsx (6), residential lists/headers (8).

This nullifies the value of experimental.typedRoutes everywhere it matters most — dynamic navigation in shells, search, and detail headers. Fix: introduce route(path: string): Route helper in src/lib/routes.ts that does the cast in one audited place; ban as any/as Route for href via ESLint rule. Bonus: makes it possible to migrate to a real typed-routes wrapper later.

7. RolePermissions ↔ Record<string, unknown> round-trip in withAuth chain

src/lib/api/helpers.ts:203, 235 — two layers of permission merge cast both directions to satisfy deepMerge's untyped signature. deepMerge should be generic over <T extends Record<string, unknown>> and accept RolePermissions directly. Same problem as #4; same fix.
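
A minimal sketch of a generic deepMerge that accepts a permissions object directly. The RolePermissions shape below is hypothetical (the real one is not shown in this audit), and the single remaining cast lives inside the helper rather than at each call site.

```typescript
// Recursive partial: nested objects may be patched key by key.
type DeepPartial<T> = {
  [K in keyof T]?: T[K] extends Record<string, unknown> ? DeepPartial<T[K]> : T[K];
};

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Returns a new object; neither base nor patch is mutated.
function deepMerge<T extends Record<string, unknown>>(base: T, patch: DeepPartial<T>): T {
  const out: Record<string, unknown> = { ...base };
  for (const [key, value] of Object.entries(patch)) {
    if (value === undefined) continue; // absent keys keep the base value
    const current = out[key];
    out[key] = isPlainObject(current) && isPlainObject(value) ? deepMerge(current, value) : value;
  }
  return out as T;
}

// Hypothetical permissions shape for illustration.
type RolePermissions = {
  clients: { view: boolean; edit: boolean };
  invoices: { view: boolean };
};

const base: RolePermissions = {
  clients: { view: true, edit: false },
  invoices: { view: false },
};
const merged = deepMerge(base, { clients: { edit: true } });
console.log(merged);
```

Note that an interface (as opposed to a type alias) would not satisfy the Record<string, unknown> constraint without an index signature, so the real RolePermissions declaration may need adjusting.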


MEDIUM

8. Record<string, unknown> JSONB writes without zod re-parse at write time

Server-side blobs stored to system_settings.value, userProfiles.preferences, audit JSONB columns:

  • ocr-config.service.ts:79, 85: value: value as unknown as Record<string, unknown>. Upstream zod parse exists, so safe in practice, but the cast hides the relationship.
  • ai-budget.service.ts:88, 94 — same pattern.
  • me/route.ts:148-168 is the good model: explicit ALLOWED_PREF_KEYS allow-list + 8KB size cap + zod via parseBody. Use it as the template for the other two.
  • components/admin/settings/settings-manager.tsx:237, 250 and settings-form-card.tsx:79-93: client-side Record<string, unknown> state. Admin-only surfaces, low risk, but no per-key shape check before PUT.

9. Dynamic-sort key cast in invoices list

src/lib/services/invoices.ts:199

column: invoices[query.sort as keyof typeof invoices] as unknown as PgColumn,

The query.sort zod schema should already enum-restrict sort keys to actual columns; if so, the inner as keyof typeof invoices is redundant. If query.sort is a free string, this is also a SQL-shape risk surface (mitigated only because Drizzle column proxy will throw on unknown keys). Verify the validator enum is exhaustive.

10. Template-preview accepts arbitrary content as TipTap

src/app/api/v1/admin/templates/preview/route.ts:32

const doc = body.content as unknown as TipTapNode;

Admin-gated, so blast radius limited, but the renderer downstream assumes well-formed TipTap JSON. Add a minimal tipTapNodeSchema zod check at the boundary — a malformed node tree would otherwise throw deep in the renderer with a useless stack trace.

11. Node stream ↔ Web stream casts

5 sites cast between NodeJS.ReadableStream, Readable, and ReadableStream<Uint8Array> via as unknown as:

  • src/app/api/storage/[token]/route.ts:103
  • src/lib/services/expense-pdf.service.ts:507-510
  • src/lib/services/document-sends.service.ts:374
  • src/lib/services/brochures.service.ts:255, berth-pdf.service.ts:350

Type system gap is genuine (Node Readable ↔ Web ReadableStream don't have a structural match in lib.dom + @types/node). Centralize in src/lib/storage/stream-bridge.ts with named helpers toWebStream(readable) / toNodeStream(web) — removes the casts from feature code.

12. as unknown as { destroy: () => void } stream cleanup

brochures.service.ts:255 and berth-pdf.service.ts:350 reach into stream internals because the storage backend's return type doesn't expose destroy. Add destroy?(): void to the StorageBackend.get() return type so cleanup is part of the contract.

13. as unknown as string for pdfme BLANK_PDF sentinel

9 PDF templates carry basePdf: 'BLANK_PDF' as unknown as string. This is a known pdfme upstream type-def bug — the string literal 'BLANK_PDF' is accepted at runtime but typed as Uint8Array | string. Wrap once: const BLANK_PDF = 'BLANK_PDF' as unknown as string; exported from src/lib/pdf/constants.ts. Removes 9 casts.

14. Drizzle self-FK uses : any

src/lib/db/schema/system.ts:43

revertOf: text('revert_of').references((): any => auditLogs.id),

Standard Drizzle workaround for forward-references, but the official typing is (): AnyPgColumn. Swap.

15. phone-parse.ts metadata require

src/lib/dedup/phone-parse.ts:25: const metadata: any = require(...) for libphonenumber-js/metadata.min.json. CommonJS interop hack; replace with import metadata from 'libphonenumber-js/metadata.min.json' + resolveJsonModule: true (already on in tsconfig.json).


Drizzle leak check — clean

Searched for $inferSelect / $inferInsert exports crossing the API boundary: zero hits in src/. Services return Drizzle row types internally, but every API route wraps them in { data } envelopes (confirmed by spot-check across invoices, berths, clients, documents). The Record<string, unknown> widenings flagged above happen at write time into JSONB columns, not at read time across the API surface. No PII columns or internal-only fields slip through.


Recommendations, ranked

  1. Critical first: fix #1 (tx: any), #2 (berth detail <any>), #3 (parseBody for portal auth), roughly 3-4h.
  2. One helper, big win: toAuditJson<T> (#5) — removes 21 casts.
  3. Route helper: route() (#6) — removes 49+ as any and unblocks future real typed-routes adoption.
  4. Stream bridge: centralize Node↔Web conversion (#11) — removes 5 casts.
  5. PDF constant: extract BLANK_PDF (#13) — removes 9 casts.

Net effect: ~85 of 145 escape hatches removed with five focused refactors, and the remaining ones become small enough to justify case-by-case.


31. Auth flow polish audit (auth-flow-auditor)

Auth Flow Polish Audit — Task #31

Scope: CRM (auth)/ pages (login, reset-password, set-password), portal (portal)/ pages (login, activate, reset-password, forgot-password), email-change confirm/cancel landing, /api/auth/resolve-identifier, withAuth gates and Better Auth config.

Severities: CRITICAL = silent security/data risk · HIGH = real user can hit a dead-end · MEDIUM = polish/copy that erodes trust.


CRITICAL

C1 — Password reset does not revoke existing sessions on either flow

Better Auth's sendResetPassword (src/lib/auth/index.ts:73) is configured with no onPasswordReset / revokeAllSessions hook; the same is true for resetPassword in portal-auth.service.ts:428. Outcome: a compromised cookie keeps working forever after the legitimate owner does the "forgot password" dance. This is the canonical "session-bumping on reset" guarantee users assume and we're not delivering it. Add a step that deletes every row in sessions (CRM) and portal_auth_tokens + active portal_sessions (portal) for the affected user inside the same transaction that writes the new password hash.

C2 - Disabled users can still sign in and keep their existing sessions

withAuth rejects with 403 "Account disabled" when userProfiles.isActive === false (src/lib/api/helpers.ts:152), but:

  • auth.api.signInEmail itself doesn't know about isActive — a disabled user can still complete /login and be redirected to /dashboard, where every API call then 403s.
  • Setting isActive=false in updateUser (users.service.ts:227) never deletes the existing sessions row, so an already-logged-in disabled user keeps every page that doesn't hit /api/v1 working, and any cached SSR page loads.

Fix: (a) on signIn add a profile lookup and reject before issuing the cookie; (b) on isActive=false flip, delete from sessions for that userId; (c) middleware should treat a 403 from an API as a global redirect to /login?reason=disabled.


HIGH

H1 - No landing pages for expired, used, or disabled states

Every token failure today is surfaced as a toast on a still-functional form, or as a 400 JSON error that the user only sees if they actually submit the form.

  • set-password/page.tsx (CRM) handles !token (the "Link is missing or invalid" branch, line 73-88) but does NOT distinguish "token present but expired" from "token present but already used" — both surface as a toast.error(body.message) and leave the form interactive, inviting an infinite retry loop.
  • The portal password-set-form.tsx is identical (line 63-67): expired/used tokens render as a red <p> under the form.
  • There is no /account-disabled page; the user just sees 403 Account disabled text from the JSON response in DevTools and gets stuck on /dashboard rendering nothing (or rendered SSR shell with broken API calls).

Recommend: a single <TokenStateMessage state="expired" | "used" | "invalid" /> component that the page server-renders by doing a HEAD-style validate call on mount, plus a /disabled route the middleware redirects to.

H2 — Email-change /settings?emailChange=confirmed|cancelled query param is never consumed

The confirm/cancel redirect URLs at api/v1/me/email/{confirm,cancel}/[token]/route.ts:71/50 set ?emailChange=confirmed|cancelled. Grep shows ZERO consumer in src/app or src/components. The redirect succeeds, but the user lands on the bare /settings page with no banner, no toast, no confirmation — for a security-sensitive action this looks broken / makes users wonder if it took. Wire a banner in user-settings.tsx keyed off useSearchParams().get('emailChange').

H3 - One-click GET cancel link can be triggered by link prefetchers

api/v1/me/email/cancel/[token]/route.ts is a one-click GET that wipes the pending change. Gmail/Outlook link prefetchers, antivirus URL scanners, and corporate proxies will auto-fetch links and cancel a legitimate request without the user ever clicking. Pattern for safety: GET renders a confirmation page (Are you sure? button), POST executes. Same fix needed on confirm if a link-scanner could pre-confirm an attacker's address-change before the real user sees the cancel link.
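
A minimal sketch of the GET-renders / POST-executes split using the standard Request and Response types. handleCancel and the injected cancel function are hypothetical; the real route would call whatever service currently wipes the pending change, and the redirect target mirrors the emailChange query param already used by these routes.

```typescript
// Hypothetical mutation, injected so the sketch runs without a database.
type CancelFn = (token: string) => Promise<void>;

async function handleCancel(req: Request, token: string, cancel: CancelFn): Promise<Response> {
  if (req.method === "GET" || req.method === "HEAD") {
    // Safe methods only render a confirmation form; nothing is mutated,
    // so prefetchers and URL scanners cannot cancel on the user's behalf.
    const html = `<form method="post"><button type="submit">Cancel email change</button></form>`;
    return new Response(html, {
      status: 200,
      headers: { "content-type": "text/html; charset=utf-8" },
    });
  }
  if (req.method === "POST") {
    await cancel(token);
    return new Response(null, {
      status: 303,
      headers: { location: "/settings?emailChange=cancelled" },
    });
  }
  return new Response(null, { status: 405, headers: { allow: "GET, HEAD, POST" } });
}
```

The form posts back to the same URL, so the token stays in the path and no extra hidden field is needed.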

H4 — set-password (CRM) success path has no auto-sign-in

set-password/page.tsx:64 toasts "Password set successfully" then routes to /login. The user has to type their email and the password they just chose, again. For invite flows this is the worst conversion point. Either (a) auto-sign-in via auth.api.signInEmail after the consume call returns, or (b) at minimum prefill the email field on /login. (Portal's activate flow has the same problem in password-set-form.tsx:97).

H5 — Reset-password expiry shows no time estimate; users hit "expired" cold

CRM (auth)/reset-password/page.tsx:62 says "we have sent a password reset link" with no TTL. Better Auth default reset token expiry is 1 hour (the email body on line 79 mentions "expires in 1 hour" but the success page doesn't echo this). Portal forgot-password (forgot-password/page.tsx:43) correctly says "expires in 30 minutes". Make the CRM message say "Link expires in 1 hour" so users at the airport know whether to wait.

H6 — resolve-identifier returns 429 with { email: '' } which bypasses the synthetic miss path

api/auth/resolve-identifier/route.ts:56 returns { email: '' } on rate-limit. Client (login/page.tsx:56) does payload.email?.trim() || identifier — so the original username (without @) is passed into authClient.signIn.email. Better Auth rejects it as "invalid email format" instead of "invalid credentials", which is a distinguishably-different error from the normal miss case and re-opens the enumeration channel the synthetic-email defence was built to close. Return { email: syntheticEmail(raw) } on the 429 path too (status code can stay 429).
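
A minimal sketch of the fallthrough. pickSignInEmail mirrors the client expression quoted above; the syntheticEmail body is a hypothetical stand-in, since the real implementation is not shown in this audit.

```typescript
// Mirrors the client expression: an empty email falls back to the raw identifier.
function pickSignInEmail(payloadEmail: string | undefined, identifier: string): string {
  return payloadEmail?.trim() || identifier;
}

// Hypothetical stand-in: any well-formed address that can never match a real user.
function syntheticEmail(raw: string): string {
  const local = raw.trim().toLowerCase().replace(/[^a-z0-9]/g, "") || "user";
  return `${local}@unknown.invalid`;
}

// 429 path today: the bare username reaches the sign-in call.
console.log(pickSignInEmail("", "jdoe"));
// 429 path with the fix: the sign-in call sees an email-shaped value.
console.log(pickSignInEmail(syntheticEmail("jdoe"), "jdoe"));
```

With the fix, the rate-limited path and the normal miss path both end in the same credentials error, so the response no longer distinguishes them.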


MEDIUM

M1 — Login error toast leaks Better Auth wording

login/page.tsx:65 uses result.error.message ?? 'Invalid credentials'. Better Auth surfaces strings like "User not found" / "Invalid password" / "Email or password is invalid" depending on the path — the first two are an enumeration leak that bypasses the resolve-identifier defence. Always overwrite to a fixed 'Email or password is incorrect.' and only log the underlying reason server-side.

M2 — Portal sign-in error message is friendlier than CRM

Portal: 'Invalid email or password'. CRM: raw Better Auth message OR 'Invalid credentials'. Unify on "Email or password is incorrect" everywhere (matches CRM (auth)/login/page.tsx:65 and portal (portal)/portal/login/page.tsx:37) — the CRM phrasing "Invalid credentials" is jargon.

M3 — set-password divergence: form-validation TTL mismatch

CRM set-password/page.tsx requires min 9 chars (line 16). Portal password-set-form.tsx also 9 (line 23). But the activation/CRM invite TTL diverges silently: CRM invite = 72h (crm-invite.service.ts:17), portal activation = 72h (portal-auth.service.ts:25), portal reset = 30min, CRM reset = 60min. The "request a new link" copy in the invalid-token branch should embed the actual TTL so admins debugging "why doesn't this work" don't have to read the schema.

M4 — set-password (CRM) error fallback is inconsistent shape

(auth)/set-password/page.tsx:60 reads body.message ?? body.error — but api/auth/set-password/route.ts uses errorResponse(err) which emits { error }. The message key is dead code, fine, but the legacy comment on set-password/route.ts:24 says envelopes were normalised in commit "auditor-F §32" — the page should match: body.error ?? 'Failed to set password.'.

M5 — Portal forgot-password 30-min TTL is short for international clients

30 minutes is aggressive when emails routinely sit in spam quarantine for 5-15 minutes before clearing. CRM reset's 60min is a sensible floor. Either lift to 60min or surface the 30min countdown more aggressively in the email + landing page.

M6 — login Suspense fallback for set-password renders empty shell

set-password/page.tsx:143 falls back to <BrandedAuthShell>{null}</BrandedAuthShell> — a flash of empty branded card while useSearchParams resolves. Replace with a skeleton or "Verifying link…" microcopy; the empty state reads as "page broken" for ~100ms on slow networks.

M7 — /portal/activate Suspense fallback is unbranded grey div

portal/activate/page.tsx:8 falls back to a plain Loading… div — jarring after the branded email. Mirror the CRM set-password pattern with <BrandedAuthShell>. Same on portal/reset-password/page.tsx:8.

M8 - Invalid-link recovery in password-set-form.tsx always targets /portal/forgot-password

password-set-form.tsx:86 always points to /portal/forgot-password. For activation the user has no password yet, so /portal/forgot-password returns the silent 200 and the admin has to manually resendActivation. Branch on endpoint, or give portal users a self-service "Resend activation".

M9 — No "Remember me" / shared-device control

Better Auth session expiresIn: 24h (auth/index.ts:94); portal token also 24h. No checkbox to shorten on a shared device, no copy saying so. Add a session-only cookie path.

M10 — Portal login next param is unvalidated

portal/login/page.tsx:42: router.replace(next as never) where next = search.get('next'). Open redirect: /portal/login?next=https://evil.example navigates cross-site after sign-in. Validate next.startsWith('/portal/') before using.
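
A minimal sketch of the validation. safeNextPath and its fallback value are hypothetical; the backslash rejection goes slightly beyond the startsWith check suggested above and is included only as extra caution.

```typescript
// Only same-site portal paths are allowed through; everything else gets the fallback.
function safeNextPath(next: string | null, fallback: string = "/portal"): string {
  if (next === null) return fallback;
  if (!next.startsWith("/portal/")) return fallback; // rejects absolute and protocol-relative URLs
  if (next.includes("\\")) return fallback; // some browsers treat backslashes as slashes
  return next;
}

console.log(safeNextPath("https://evil.example"));
console.log(safeNextPath("//evil.example"));
console.log(safeNextPath("/portal/documents"));
```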


Summary

  • 2 CRITICAL (no session-revoke on password reset; disabled-user keeps session)
  • 6 HIGH (no expired/used/disabled landing pages; emailChange success param consumed by nobody; GET-cancellation prefetch risk; no auto-sign-in after set-password; missing TTL copy; rate-limit branch leaks enumeration)
  • 10 MEDIUM (copy inconsistencies, branded-shell drift, open redirect in portal next, no shared-device session control)

Token mechanics themselves are sound (32-byte CSPRNG, SHA-256 storage, single-use markers, dual rate-limit buckets, anti-enumeration silent-200 on forgot-password, dummy-hash timing equalisation in portal signIn). The polish gaps are in what happens after a token succeeds or fails — landing pages, banners, session lifecycle.


30. Image + asset hygiene audit (asset-auditor)

Image + Asset Hygiene Audit (Task #30)

Scope: uploaded-image handling across avatar, brochure, berth-PDF, generic file uploader, receipt scanner, and the new portrait avatar cropper. EXIF, MIME spoof, polyglots, server-side resize, dimension caps, SVG/GIF risk, filename sanitisation, Content-Disposition, per-surface size caps.

Files reviewed (highlights):

  • src/lib/constants/file-validation.ts (allow-list + magic bytes)
  • src/lib/services/files.ts (uploadFile + previews)
  • src/lib/services/storage.ts (sanitizeFilename)
  • src/app/api/v1/me/avatar/route.ts + src/components/shared/image-cropper-dialog.tsx
  • src/app/api/v1/files/upload/route.ts
  • src/app/api/storage/[token]/route.ts (filesystem-backend proxy)
  • src/lib/services/berth-pdf.service.ts, brochures.service.ts, expense-pdf.service.ts
  • src/app/api/v1/documents/[id]/download/[...slug]/handlers.ts

CRITICAL

C1. No server-side image normalisation on avatar / generic image uploads — EXIF (GPS) is persisted and served verbatim

Where: src/app/api/v1/me/avatar/route.ts:46-68, src/lib/services/files.ts:45-72.

The /api/v1/me/avatar handler takes the multipart body, checks size (≤2 MB), runs bufferMatchesMime (first-bytes-only) and writes the bytes straight to storage. The "cropper" (image-cropper-dialog.tsx) does run a Canvas re-encode client-side, which incidentally drops EXIF — but a malicious user (or simply any user with curl) can bypass the cropper by POSTing the raw image directly to the same endpoint. The route accepts whatever JPEG/PNG/WebP/GIF arrives and the generic uploader (/api/v1/files/upload) has the same property: no sharp().rotate().toBuffer() normalisation, no EXIF strip, no ICC profile reset, no re-encode.

Result: every photo uploaded from a phone — receipt scans (/expenses/scan), client/yacht photo attachments via FileUploadZone, manual avatar PUTs — is served from MinIO with full EXIF: GPS latitude/longitude, device serial, photographer name, original capture timestamp.

GDPR/PII exposure (audit #8 already flagged related issues). The previewer just hands a presigned URL straight to the browser, so any rep / client portal visitor with a download URL gets the metadata.

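Fix: run every accepted image/* payload through sharp(buf).rotate().toBuffer() (or .toFormat(jpeg|png|webp)) in uploadFile() before backend.put(...). sharp strips EXIF/XMP/ICC metadata on output unless .withMetadata() is called, so the pipeline must not call it; .rotate() with no arguments bakes the EXIF orientation into the pixels before the tag is dropped. Sharp is already a dependency (used by expense-pdf.service.ts). The same wrapper can enforce a max-pixel cap (see C2).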

C2. No max-dimension / decompression-bomb gate on uploaded images

Where: src/lib/services/files.ts:38-72, src/app/api/v1/me/avatar/route.ts.

MAX_FILE_SIZE is 50 MB (avatar: 2 MB). Neither path inspects width/height. A 2 MB highly-compressed PNG can decode to >300 megapixels (e.g. a 30000×30000 palette PNG). Any downstream consumer that decodes:

  • the <AvatarImage> in the React UI,
  • pdf-lib/pdfme embedding the image into a generated client/interest PDF,
  • sharp resize in expense-pdf.service.ts,

will OOM or pin a worker. The receipt-PDF service runs sharp only when raw.byteLength > 500 KB, so a 400 KB decompression-bomb PNG falls under the threshold: no resize or dimension cap is applied on the embed path, and PDFKit attempts to embed the raw bytes.

Fix: in the normalisation step from C1, cap output to MAX_DIMENSION = 4096 (or 2048 for avatar) using sharp(...).resize({fit:'inside',withoutEnlargement:true}) and reject any source whose metadata() width × height exceeds LIMIT before allocating the decode buffer.
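
As an illustration of rejecting on declared dimensions before any decode, the sketch below reads width and height straight from a PNG's IHDR header. It is PNG-only and the function names are hypothetical; a real gate would cover every allowed format, and sharp's own limitInputPixels constructor option is another way to get a similar guard.

```typescript
// PNG layout: 8-byte signature, 4-byte chunk length, "IHDR", then
// big-endian uint32 width (offset 16) and height (offset 20).
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const IHDR = 0x49484452; // ASCII "IHDR"

interface Dimensions {
  width: number;
  height: number;
}

function readPngDimensions(bytes: Uint8Array): Dimensions | null {
  if (bytes.length < 24) return null;
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) return null;
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint32(12) !== IHDR) return null;
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// Returns true when the upload should be rejected.
function exceedsPixelLimit(bytes: Uint8Array, maxPixels: number): boolean {
  const dims = readPngDimensions(bytes);
  if (dims === null) return true; // unparsable header: refuse rather than guess
  return dims.width * dims.height > maxPixels;
}
```

With maxPixels = 4096 * 4096, the 30000×30000 palette PNG from the example above is refused after reading 24 bytes.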


HIGH

H1. Magic-byte check is prefix-only — polyglots pass

Where: src/lib/constants/file-validation.ts:48-87.

bufferMatchesMime checks the leading 3-8 bytes. PNG/JPEG/GIF/WebP/ZIP-based office formats all share short, well-known prefixes. A file beginning FF D8 FF ... <PDF body> ... <HTML> <script>...</script> passes as image/jpeg, lives in storage as image/jpeg, and gets served from a presigned MinIO URL with Content-Type: image/jpeg. With nosniff set this is mostly inert in modern browsers, but:

  • The S3-presigned download URL is hit directly by the browser (the proxy with X-Content-Type-Options: nosniff is only on the filesystem backend at /api/storage/[token]). MinIO/S3 does not add nosniff automatically.
  • The signed URL is on the same origin's CDN for portal users when MinIO is fronted by the marketing site, raising same-origin sniff risk.

The avatar/general path has no trailing-byte gate. Compare with the PDF path, which at least checks the %PDF- prefix (good) but does not enforce a trailing EOF marker: the same shape of weakness.

Fix:

  • After the prefix check, run sharp(buf).toFormat(declared) re-encode (from C1) which strips any non-image trailer.
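  • Force ResponseContentDisposition/ResponseContentType on the presigned download (minio-js supports both via respHeaders) so the object is always served with the server-derived type and disposition regardless of object metadata. Presigned response overrides only cover the standard response-* headers, so X-Content-Type-Options: nosniff itself has to be added by whatever proxy or CDN fronts MinIO.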

H2. Filesystem backend proxy enforces stronger checks than the S3 path

Where: src/app/api/storage/[token]/route.ts:217-225 vs src/lib/storage/s3.ts:249-262.

The filesystem PUT proxy does a magic-byte check on the streamed body when the token's declared content-type is application/pdf. The S3 presigned PUT (used in prod) lets the browser stream straight to MinIO — the only post-upload verification is in berth-pdf.service.ts:234-262 and brochures.service.ts:230-263. Generic image uploads via S3 presigned PUT have no post-upload verification at all because no caller currently mints presigns for arbitrary images — but the abstraction allows it. If a future caller ever presigns a non-PDF, the S3 path will accept anything.

Fix: make presignUpload accept a verifyMagicBytes: true flag and require every caller to opt in/out explicitly. Or wrap S3 presigns in a one-shot post-upload head + first-5-bytes verifier (the brochure path already does this; lift it into getStorageBackend().registerUpload(...)).

H3. Animated GIF is allowed with no frame cap

Where: ALLOWED_MIME_TYPES includes image/gif. No upstream consumer inspects metadata().pages or metadata().delay.

A 50 MB animated GIF with 5000 frames at 5 ms delay will burn CPU on every rep's client list view and on PDF embed. Also a known browser DoS vector.

Fix: during the sharp normalisation (C1), pass { animated: false } so only the first frame is kept, or set pages: 1. Or drop GIF from ALLOWED_MIME_TYPES entirely — the CRM has no real reason to accept it (reps share PNG/JPEG, brochures are PDF).

H4. Avatar Content-Type echoes browser-declared MIME — preview endpoint trusts blindly

Where: src/app/api/v1/me/avatar/route.ts:53 (mimeType: fileEntry.type || 'image/jpeg').

fileEntry.type is the browser-declared type. Magic bytes are checked, but the stored content-type is still the declared one. An uploader cannot simply claim image/webp and send a plain JPEG (the signature check rejects that), but a crafted polyglot can pass WebP's RIFF check while embedding extra bytes, and the stored mimeType is then wrong. The downstream PREVIEWABLE_MIMES check keys off files.mimeType, so the server's content-type lies.

Fix: after the magic-byte check, derive the canonical MIME from the matched signature (one entry per signature) and store that, not the browser-declared form.
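
One possible shape for the signature-to-MIME derivation, assuming the four image types named in C1 (JPEG/PNG/WebP/GIF). canonicalImageMime is a hypothetical name; the real signature table lives in file-validation.ts and may differ.

```typescript
type ImageMime = "image/jpeg" | "image/png" | "image/gif" | "image/webp";

// True when `sig` appears in `buf` at `offset`.
function hasBytes(buf: Uint8Array, sig: number[], offset = 0): boolean {
  if (buf.length < offset + sig.length) return false;
  return sig.every((byte, i) => buf[offset + i] === byte);
}

// Derives the MIME type from the bytes alone; the browser-declared type is ignored.
function canonicalImageMime(buf: Uint8Array): ImageMime | null {
  if (hasBytes(buf, [0xff, 0xd8, 0xff])) return "image/jpeg";
  if (hasBytes(buf, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  if (hasBytes(buf, [0x47, 0x49, 0x46, 0x38])) return "image/gif"; // "GIF8"
  // WebP is a RIFF container: "RIFF" <4-byte size> "WEBP".
  if (hasBytes(buf, [0x52, 0x49, 0x46, 0x46]) && hasBytes(buf, [0x57, 0x45, 0x42, 0x50], 8)) {
    return "image/webp";
  }
  return null;
}
```

The route would store the returned value as mimeType and reject the upload when it is null or disagrees with the allow-list, instead of persisting fileEntry.type.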


MEDIUM

M1. Content-Disposition for /api/v1/documents/[id]/download/... lacks RFC 5987

Where: src/app/api/v1/documents/[id]/download/[...slug]/handlers.ts:68,92-94.

sanitizeFilenameForHeader replaces "/\/CRLF with _ but emits only filename="..." — Unicode filenames render as mojibake / get truncated by Firefox + Safari. The /api/storage/[token] proxy gets this right (filename*=UTF-8''<encoded>); the doc download doesn't. The other ad-hoc PDF exports (clients/[id]/export-pdf, berths/[id]/export-pdf, interests/[id]/export-pdf) hard-code ASCII filenames and skip RFC 5987 too — acceptable because they're constant, but the doc download is dynamic.

Fix: mirror the storage-proxy form: attachment; filename="<sanitised-ascii>"; filename*=UTF-8''<encoded> and switch the disposition from inline to attachment for non-previewable MIMEs (the current inline lets a malicious file open in-page even with nosniff).
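
A sketch of the header form described above, with an ASCII fallback plus an RFC 5987 filename* parameter. The function name and the previewable flag are assumptions for illustration; the existing sanitizeFilenameForHeader may already cover part of this.

```typescript
// Builds a Content-Disposition value with both filename and filename*.
function contentDisposition(filename: string, previewable: boolean): string {
  const disposition = previewable ? "inline" : "attachment";

  // Fallback for legacy clients: printable ASCII only, no quote or backslash.
  // Replacing control characters here also removes CR/LF (header injection).
  const ascii = filename.replace(/[^\x20-\x7e]/g, "_").replace(/["\\]/g, "_");

  // RFC 5987 ext-value: UTF-8 percent-encoding. encodeURIComponent leaves
  // ' ( ) * unescaped, and those are not valid attr-chars, so escape them too.
  const encoded = encodeURIComponent(filename).replace(
    /['()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase(),
  );

  return `${disposition}; filename="${ascii}"; filename*=UTF-8''${encoded}`;
}
```

encodeURIComponent throws on lone surrogates, so this assumes the filename has already been through the sanitiser discussed in M2.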

M2. sanitizeFilename doesn't strip RTL/zero-width Unicode

Where: src/lib/services/storage.ts:15-22.

Strips [/\\:], NUL, and \x01-\x1f\x7f. Doesn't touch:

  • U+202E RIGHT-TO-LEFT OVERRIDE: classic Windows-icon-spoof vector (invoice_<U+202E>fdp.exe displays as invoice_exe.pdf).
  • U+200B/U+200C/U+FEFF zero-widths: collision spoofs in folder listings.
  • Surrogate halves.

Fix: Unicode-normalise (name.normalize('NFC')) then drop the Cf/bidi-control category, e.g. via /\p{Cf}/gu, and strip lone surrogates.
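
A rough sketch of that extra pass, meant to run alongside the existing sanitizeFilename rules rather than replace them. stripInvisibleFormatting is a hypothetical name. Dropping the whole Cf category also removes U+200D ZERO WIDTH JOINER, which breaks joined emoji sequences; that trade-off is assumed acceptable for filenames.

```typescript
// Removes format characters (bidi overrides, zero-widths, BOM) and lone surrogates.
function stripInvisibleFormatting(name: string): string {
  return name
    .normalize("NFC")
    .replace(/\p{Cf}/gu, "") // General_Category=Format: U+202E, U+200B, U+FEFF, ...
    .replace(/\p{Cs}/gu, ""); // with the u flag this only matches unpaired surrogates
}
```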

M3. No per-surface size caps beyond avatar/PDF

Where: file-validation.ts (50 MB), avatar (2 MB), berth PDFs (admin setting), brochure (admin setting). The generic uploader has only the 50 MB ceiling — applies equally to a yacht photo, a maintenance-log attachment, a client document scan. Reps could legitimately upload a 49 MB phone-camera PNG and it would be embedded into PDFs without resize.

Fix: uploadFile should branch on category (avatar, yacht_photo, maintenance, attachment, …) and apply per-category byte + dimension caps, not a flat 50 MB.

M4. text/plain / text/csv have no signature verification

Where: file-validation.ts:71-72 (intentionally unconstrained), served through the same presigned URL path as binary files. A user can upload evil.html claiming text/plain; with nosniff plus the stored Content-Type: text/plain modern browsers display it as text, but stale links that get loaded via <iframe src="..."> will render as the declared type. Lower risk than C1/H1 but worth tightening.

Fix: sniff: reject when the first 512 bytes contain <script, <html, <!DOCTYPE, or non-printable bytes outside common encodings. Or require an explicit category: 'text' for these MIMEs and refuse them on the avatar / attachment surfaces.
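
The sniff could look roughly like the following. It assumes UTF-8 (or ASCII) payloads, so UTF-16 text would be rejected because of its NUL bytes; the marker list mirrors the one above and the function name is hypothetical.

```typescript
const HTML_MARKERS = ["<script", "<html", "<!doctype"];

// Inspects at most the first 512 bytes of a text/plain or text/csv upload.
function looksLikePlainText(bytes: Uint8Array): boolean {
  const head = bytes.subarray(0, 512);
  for (let i = 0; i < head.length; i++) {
    const b = head[i];
    const allowedControl = b === 0x09 || b === 0x0a || b === 0x0d; // tab, LF, CR
    if ((b < 0x20 && !allowedControl) || b === 0x7f) return false;
  }
  const text = new TextDecoder("utf-8").decode(head).toLowerCase();
  return !HTML_MARKERS.some((marker) => text.includes(marker));
}
```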

M5. Image cropper outputs only 512 px JPEG @0.85 — no enforcement that the upload matches

Where: src/components/shared/image-cropper-dialog.tsx:51,70.

outputWidth = 512 is the only client-side cap. Once the cropped JPEG hits the server, the server does not verify that the avatar is square or under some pixel ceiling — the server just sees a 2 MB image. A scripted client can ship a 4000×4000 JPEG straight to /api/v1/me/avatar because the cropper is client-side. Tied to C1's fix (normalise + resize server-side).

M6. No HEIC/HEIF support — iOS share-sheet uploads silently fail

Where: ALLOWED_MIME_TYPES. The iPhone default photo format is image/heic; the receipt scanner (scan-shell.tsx:494) uses accept="image/*" so the browser allows the pick, then the upload 400s with "type not allowed". UX regression more than security.

Fix: add image/heic + image/heif to the allow-list and transcode to JPEG in the same normalisation pass (sharp 0.34 supports HEIC via libvips + libheif, but check the deploy image first).

Already strong (no action)

  • PDF magic-byte gate on both in-server and presigned-PUT paths (berth-pdf.service.ts, brochures.service.ts, filesystem proxy).
  • SVG excluded from ALLOWED_MIME_TYPES — no SVG-XSS surface in user uploads. The only SVG generation is the chart-card data-URI which is produced by the CRM, not user-controlled.
  • Filesystem-backend proxy sets X-Content-Type-Options: nosniff + Cache-Control: private, no-store + single-use HMAC tokens.
  • Storage key derivation is UUID-based (generateStorageKey) so original filename never controls a path — no path-traversal surface from filenames.
  • uploadFile allow-list + size cap + magic-byte composition.

Suggested fix order

  1. C1 + C2 + H1 together: introduce a normalizeAndStoreImage() helper in src/lib/services/files.ts that runs every accepted image through sharp().rotate().resize().toFormat() (without .withMetadata(), so metadata is stripped by default) before backend.put(). Drops EXIF, kills polyglot trailers, caps pixels.
  2. H4 + M5: derive canonical MIME from the matched signature; treat mimeType field as server-authoritative.
  3. M1 + M2: tighten filename + disposition headers.
  4. H2: lift the post-upload verify out of berth-pdf / brochure into the storage abstraction.
  5. M3 / M6: per-surface caps + HEIC transcode (deploy-image work).
  6. H3 / M4: drop GIF or freeze to first frame; sniff text payloads.



21. Mobile + PWA + iOS quirks audit (mobile-pwa-auditor)

Mobile + PWA + iOS quirks audit

Branch: feat/documents-folders · Scope: src/app/(scanner), src/components/layout/mobile/*, src/components/search/mobile-search-overlay.tsx, src/components/shared/drawer.tsx, src/middleware.ts, public/manifest.json, public/icon-*.png, root layout viewport/metadata, tailwind.config.ts safe-area utilities.


CRITICAL

None blocking ship.

HIGH

H1. No service worker is registered — /scan PWA has zero offline capability

Grep serviceWorker|navigator.serviceWorker|workbox|next-pwa returns nothing across src/ + public/. The per-port manifest declares display: 'standalone' and the scanner's whole product premise is "rep walks the marina with a phone capturing receipts", i.e. exactly the situation where Wi-Fi drops to nothing between pontoons. Consequences:

  • iOS Add-to-Home Screen installs succeed but cold-launch with no signal fails at the first network call (Next.js page chunks 404 in WebView).
  • The OCR + upload + create-expense chain in ScanShell (src/components/scan/scan-shell.tsx) has no offline-queue / retry. kind: 'error' is rendered and the only artifact is the in-memory blob — closing the PWA loses the photo and the manual-typed fields.
  • Android Chrome will refuse to fire beforeinstallprompt without a service worker, so the install prompt never auto-surfaces.

Fix (in priority order): (1) ship a minimal Workbox/next-pwa SW that precaches the scanner route + Tesseract WASM + lucide icons, (2) wrap the expense submit in an outbox (IndexedDB queue → background sync), (3) capture beforeinstallprompt and surface an "Install" CTA inside the idle-state scanner card.

H2. Two manifests overlap — root /manifest.json and dynamic /[portSlug]/scan/manifest.webmanifest

  • Root layout (src/app/layout.tsx:47) declares manifest: '/manifest.json' with start_url: '/', theme_color: #0f172a (slate-900). Root viewport says themeColor: '#1e2844' (navy). Two different theme colors → Chrome will pick the <meta name="theme-color"> from <head> (navy) but the manifest install splash will use #0f172a. Cosmetic mismatch on install.
  • Scanner manifest overrides scope to /<portSlug>/scan with theme_color: #3a7bc8 (brand blue) and viewport themeColor: '#3a7bc8' — internally consistent. ✓
  • Issue: if a rep visits /<portSlug>/dashboard and hits "Add to Home Screen" (rare but possible), they get a PWA whose start_url is / which redirects to /login on every cold-launch because the root <head> resolves the unscoped manifest first. There is no <link rel="manifest"> swap between the two surfaces; Next.js's generateMetadata on the scanner route DOES override the root metadata (verified at src/app/(scanner)/[portSlug]/scan/layout.tsx:28), but root /manifest.json still defines a competing PWA.

Fix: either narrow the root manifest's scope and start_url to /login (so non-scanner installs land on auth), or remove root manifest: and lean solely on the per-port scoped scanner manifest. Add start_url: '/<portSlug>/dashboard' per-port via a second dynamic manifest for the main app, if installable main-app is even desired.

H3. iOS standalone status-bar / safe-area mismatch in the scanner

  • Scanner layout declares appleWebApp.statusBarStyle: 'default' (src/app/(scanner)/[portSlug]/scan/layout.tsx:32) — that's the white-bar-with-black-text style that iOS draws OPAQUELY above the WebView, NOT under it.
  • viewport.viewportFit: 'cover' is set (line 46) which tells iOS to let content extend under safe areas.
  • ScanShell (src/components/scan/scan-shell.tsx:449) renders <main className="mx-auto ... min-h-[100dvh] w-full max-w-xl ... px-4 py-6 sm:py-10"> with no pt-safe-top, no pb-safe-bottom, no safe-left/safe-right.
  • Result on iPhone 14/15 with home indicator + standalone install: the "Capture receipt" / "Save expense" buttons sit flush against (or under) the home-indicator stripe. The brand logo at the top is fine because py-6 sm:py-10 happens to clear the notch — by accident, not by design.

Fix: add pb-[calc(env(safe-area-inset-bottom)+1rem)] to <main>, switch statusBarStyle to 'black-translucent' so the brand-blue theme paints over the status area (or keep 'default' AND remove viewportFit: 'cover'), and add pl-safe-left pr-safe-right for the landscape edge case.

H4. Dashboard mobile shell uses min-h-screen (100vh) instead of 100dvh

src/components/layout/mobile/mobile-layout.tsx:24,29 uses min-h-screen twice. On iOS Safari (not standalone) 100vh is the LARGE viewport height (URL bar collapsed), so on first paint the page renders ~75-100px taller than visible. The bottom tab bar is position: fixed so it lands correctly, but <main>'s min-h-screen means content scrolls below the visible viewport on initial load: reps see a blank strip past the tab bar until the URL bar collapses on first scroll.

Fix: swap both min-h-screen for min-h-[100dvh] (Tailwind 3 supports dynamic viewport units). The scanner layout already does this correctly (src/app/(scanner)/[portSlug]/scan/layout.tsx:68).

MEDIUM

M1. Touch targets below 44pt in the mobile search overlay

src/components/search/mobile-search-overlay.tsx:

  • "Cancel" button (line 273) is plain text — no min-height, hit-area ≈ 16px tall. Thumb-prone position next to the keyboard.
  • Clear-X button (line 260) is size-7 = 28px. Below Apple HIG 44pt.
  • Bucket chips (line 344) are px-3 py-1.5 text-xs → ~28px tall. Apple HIG 44pt fail; they're scrollable so misses are recoverable, but each chip needs min-h-[44px] or a transparent expanded hit-box (before:absolute before:inset-0 before:-my-2).

M2. Inline-editable-field hit-areas too small for marina-glove use

src/components/shared/inline-editable-field.tsx:133,172,257 uses h-8 (32px) and h-7 (28px) for the edit-mode inputs and select triggers. Detail pages on mobile share this pattern. Apple HIG fail; reps with wet/salty fingers on a pontoon will mis-tap. Bump to h-11 (44px) on mobile or guard with a min-h-[44px] md:h-8 mobile-first override.

M3. visualViewport.offsetTop ignored in search overlay positioning

src/components/search/mobile-search-overlay.tsx:76-86 subscribes to visualViewport.resize + scroll and reads vv.height. The drawer uses top: 12px + computed height. But vv.offsetTop (the visual-viewport's vertical offset within the layout viewport) is not consulted. On iOS Safari with keyboard up + rubber-band scroll, the visual viewport can shift relative to layout; the drawer's top: 12px is layout-viewport-relative, so the top of the drawer can briefly clip up under the URL/status bar. Minor visual artifact; only affects scrolled-during-typing states.

Fix: top: ${(vv?.offsetTop ?? 0) + 12}px.

M4. Mobile bottom tabs lack safe-left / safe-right insets

src/components/layout/mobile/mobile-bottom-tabs.tsx:42-47 uses pb-safe-bottom only. The dynamic manifest forces orientation: 'portrait' ONLY when installed as a PWA. In Safari (pre-install) on iPhone landscape, the bottom tab bar tucks under the notch. Add pl-safe-left pr-safe-right (Tailwind pl-safe-left resolves to padding-left: env(safe-area-inset-left)).

M5. Stale memory + suspiciously small PNGs

project_pwa_assets_pending.md claims icons must be added; all four exist in /public (icon-192 = 688B, icon-512 = 2411B, 512-maskable = 2411B, apple-touch = 654B; dated 2026-05-03). Memory note is stale — delete it. However: 688B / 2411B is small for a real branded PWA icon — these look like placeholders. Swap in production artwork before launch.

M6. apple-touch-icon at /apple-touch-icon.png not referenced by the scanner manifest

The root metadata icons block (src/app/layout.tsx:40-46) declares apple: '/apple-touch-icon.png' (180×180). The scanner layout only sets manifest: + appleWebApp; it inherits the root icons.apple because Next.js does shallow-merge of metadata. ✓ but only because of inheritance; explicit confirmation in a comment would prevent future regressions if someone overrides icons: in the scanner layout.

M7. No apple-mobile-web-app-status-bar-style mismatch detection between routes

Root layout: 'black-translucent' (matches navy theme + safe-area inset). Scanner: 'default' (white opaque bar). When a rep navigates from /scan into the main CRM via a deep link inside the same PWA install, iOS uses the install-time status bar style and ignores per-page overrides — so depending on which surface they installed FROM, every other surface looks wrong. Pick one style and apply globally; recommend 'black-translucent' plus consistent safe-area-inset usage on every shell.

M8. Vaul drawer repositionInputs={false} defaults are correct, but iOS keyboard layoutViewport vs visualViewport edge case

src/components/shared/drawer.tsx:20-22 defaults shouldScaleBackground: false and repositionInputs: false. The comments in mobile-search-overlay.tsx:106-118 describe the iOS reasoning correctly. Verified ✓. However, the MoreSheet's <DrawerContent> uses default bottom: 0 anchoring (no visualViewport-based height override). If MoreSheet ever gains a text input, it'll exhibit the same scroll-then-jump the search overlay had to special-case. Currently MoreSheet is link tiles only, so this is a non-issue unless inputs are added.

M9. No <NoScript> or offline fallback page anywhere

If the scanner PWA cold-launches with no network and no service worker (H1), Next.js's standalone-mode router will fail-soft to a blank screen. There is no not-found.tsx, error.tsx, or offline.tsx in src/app/(scanner)/[portSlug]/scan/. Goes hand-in-hand with H1.

M10. The legacy /expenses/scan page coexists with the new /scan PWA flow

src/app/(dashboard)/[portSlug]/expenses/scan/page.tsx is a desktop-flavored scan-receipt page inside the dashboard shell — different from the standalone PWA at /[portSlug]/scan. Both upload to the same /api/v1/expenses/scan-receipt and /api/v1/expenses endpoints, but the user-facing flows diverge (the dashboard one has both camera + file picker buttons; the PWA one is camera-first). Confusion risk; pick one or clearly label the dashboard surface as "Upload receipt (desktop)" vs the PWA "Scan receipt".

M11. interests/interest-list.tsx FAB safe-area offset is hand-rolled

Line 350 hardcodes bottom-[calc(env(safe-area-inset-bottom)+86px)] where 86 = tab-bar height (56) + 30px gap. If tab-bar height changes, FAB collides. Extract MOBILE_TAB_BAR_HEIGHT to a shared constant or CSS var.

Quality nits

  • Scanner manifest short_name: 'Scanner' vs appleWebApp.title: 'PN Scanner' → installed-app label differs across iOS/Android. Unify on "PN Scanner".
  • safe-left/safe-right Tailwind utilities are declared (tailwind.config.ts:150-154) but never referenced anywhere in src/.
  • must-revalidate on manifest Cache-Control is redundant alongside max-age=300.

What's solid

  • Per-port dynamic manifest with proper scope + start_url.
  • viewportFit: 'cover' + safe-area-inset utilities in topbar/bottom-tabs.
  • -webkit-tap-highlight-color: transparent global (globals.css:98).
  • Vaul defaults shouldScaleBackground: false, repositionInputs: false (drawer.tsx:20-22) match iOS+Vaul known-issue guidance.
  • visualViewport.height tracking for above-keyboard sizing (modulo M3).
  • Drawer GPU-compositing hints (globals.css:261-267).
  • HEIC-safe capture (accept="image/*" + capture="environment").
  • Tesseract.js on-device first, AI optional — privacy-respecting fallback.
  • Middleware correctly exempts /scan/manifest.webmanifest + /scan from auth (middleware.ts:17,33).

Top 3 to fix before launch: H1 (service worker + offline queue), H2 (manifest scope overlap), H3 (scanner safe-area bottom-button collision). Everything else is polish.


26. Multi-currency + FX correctness audit (currency-auditor)

Multi-currency + FX correctness audit — task #26

Scope: USD-vs-port-currency across berths/invoices/reports/expenses, FX snapshotting, currency_rates retention, rounding, mixed-currency dashboard totals, PDF math, berths_default_currency, hardcoded USD, formatCurrency. Read-only. Branch feat/documents-folders.


CRITICAL

C1. Dashboard "Pipeline Value" sums mixed currencies as USD

dashboard.service.ts:39-51 and :95-160 reduce berths.price into pipelineValueUsd without reading berths.priceCurrency, then the UI labels the result 'USD' (pipeline-value-tile.tsx:45-47, kpi-cards.tsx:19, revenue-forecast.tsx:25). Same bug in getRevenueForecast (weighted pipeline) and the stage-weights total. A single non-USD berth poisons the headline KPI; masked today only because Port Nimara is USD-only. With the new per-port berths_default_currency setting this will detonate as soon as a second port chooses EUR/GBP.

Fix shape: either (a) refuse to aggregate mixed currencies and render a grouped figure like the Revenue Breakdown chart already does, or (b) convert per row via convert(price, priceCurrency, 'USD') and surface the conversion timestamp. (a) is safer — (b) hides FX risk in one number.
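
Option (a) might reduce to something like the sketch below. The PricedBerth shape is an assumption (price as a nullable numeric string, priceCurrency as an ISO code), and plain floating-point sums are used only for illustration.

```typescript
// Assumed row shape; the real Drizzle row type may differ.
interface PricedBerth {
  price: string | null;
  priceCurrency: string;
}

// One total per currency instead of a single mislabelled "USD" figure.
function pipelineValueByCurrency(rows: PricedBerth[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    if (row.price === null) continue;
    const amount = Number(row.price);
    if (!Number.isFinite(amount)) continue; // skip unparsable prices
    totals.set(row.priceCurrency, (totals.get(row.priceCurrency) ?? 0) + amount);
  }
  return totals;
}
```

A tile can then render one line per map entry, and only show a single headline number when the map has exactly one key.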

C2. Revenue / Pipeline PDF reports drop currency entirely

pdf/templates/reports/revenue-report.ts:78-97 and pipeline-report.ts:91-100 render amounts with Number(...).toLocaleString(undefined, …) — no currency code, no symbol, no formatCurrency. The generator (report-generators.ts:106-147) sums berths.price across all currencies, again ignoring priceCurrency. PDF output reads TOTAL COMPLETED REVENUE: 1,234,567.00 with no unit. Plus the implicit undefined locale means the same PDF renders differently between US-en and de-DE nodes — non-deterministic under Next.js standalone runtime. Combined with C1 these are the highest-risk financial artefacts in the app — they ship to ownership.

C3. expenses.amountUsd snapshot is brittle and date-misaligned

expenses.ts:117-135 and :227-249 snapshot amountUsd + exchangeRate on the row at create/update — good. But:

  • Frankfurter unreachable at create time → amountUsd = null, exchangeRate = null. The PDF (expense-pdf.service.ts:235-246) falls back to 1:1 with a footnote but no aggregate-total guard — totals silently undercount the foreign-currency portion.
  • The snapshot uses the rate at edit time, not expenseDate. An expense from 6 months ago, edited today, gets today's FX. The correct anchor is expenseDate.

Expenses is the only table that snapshots FX. Invoices, berths, yacht maintenance costs, and EOIs store amount + ISO code only and re-resolve FX live at display — see H1.


HIGH

H1. currency_rates has no history / retention

db/schema/system.ts:207-222 — one row per (base, target). refreshRates() (currency.ts:36-68) upserts in place; only the latest rate ever exists. Consequences:

  • Cannot value an old invoice at its issue-date rate.
  • No FX audit trail — if Frankfurter returns bad data the prior value is gone.
  • The 6-hourly cron (queue/scheduler.ts:31) overwrites silently.

Fix: append-only table (fetchedAt in PK), getRate(from, to, asOf?) selects the most recent row ≤ asOf. Pairs with M8.
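
The asOf lookup, sketched over an in-memory array so the selection rule is explicit. RateRow and rateAsOf are hypothetical names; in the real service this would be a query ordered by fetchedAt descending with a limit of 1.

```typescript
interface RateRow {
  base: string;
  target: string;
  rate: number;
  fetchedAt: Date;
}

// Returns the most recent rate fetched at or before `asOf`, or null if none exists.
function rateAsOf(history: RateRow[], base: string, target: string, asOf: Date): RateRow | null {
  if (base === target) return { base, target, rate: 1, fetchedAt: asOf };
  let best: RateRow | null = null;
  for (const row of history) {
    if (row.base !== base || row.target !== target) continue;
    if (row.fetchedAt.getTime() > asOf.getTime()) continue; // too new for this valuation date
    if (best === null || row.fetchedAt.getTime() > best.fetchedAt.getTime()) best = row;
  }
  return best;
}
```

Passing expenseDate (or an invoice issue date) as asOf is what lets an edit made months later keep the original valuation.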

H2. Rounding policy is undocumented and currency-blind

currency.ts:23 does Number((amount*rate).toFixed(2)), which pins to 2 decimals regardless of currency. JPY has 0 fractional digits, so a USD → JPY conversion stores .45 JPY which is unspendable; a JPY → USD conversion floors at 1 cent precision when 1 yen ≈ $0.0066. No banker's-rounding helper exists, no Math.round policy, no doc.

Invoice math (services/invoices.ts:251-276, :435-466) does (subtotal * discountPct) / 100 and subtotal - discountAmount + feeAmount with no rounding before String()-ing into numeric columns. A 2% discount on a $100.10 subtotal stores '2.002' and '98.098'. The displayed total (Intl truncates at 2dp) and the stored total diverge by sub-cent amounts for every percentage-discounted invoice.
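
One way a currency-aware rounding helper could look, taking the minor-unit count from Intl rather than a hand-kept table. The names are hypothetical, the policy shown is round-half-up via Math.round (not banker's rounding), and it assumes amounts in an ordinary monetary range that stringify without exponent notation.

```typescript
// Minor-unit digits for an ISO 4217 code as reported by Intl (USD: 2, JPY: 0).
function fractionDigits(currency: string): number {
  const resolved = new Intl.NumberFormat("en", { style: "currency", currency }).resolvedOptions();
  return resolved.maximumFractionDigits ?? 2;
}

function roundToCurrency(amount: number, currency: string): number {
  const digits = fractionDigits(currency);
  // Shift the decimal point through exponent notation rather than multiplying,
  // which avoids binary artefacts such as 1.005 * 100 = 100.49999999999999.
  const shifted = Math.round(Number(`${amount}e${digits}`));
  return Number(`${shifted}e-${digits}`);
}
```

Applied to the example above, the 2% discount on $100.10 would be stored as 2 and the total as 98.1 instead of '2.002' and '98.098', so the stored and displayed values agree.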

H3. formatCurrency cents-clamp hides fractions on berth pricing

utils/currency.ts:55-56 clamps minFractionDigits to 0 when maxFractionDigits: 0 is passed — correct for headline tiles but also the default for berth-card / berth-columns / berth-tabs price (berth-card.tsx:91, berth-columns.tsx:185, berth-tabs.tsx:410). €1,250,000.50 renders as "€1,250,001" with no tooltip. Low impact today; will confuse yacht-show buyers once non-round prices land.

H4. Berth recommender ranks prices currency-blind

berth-recommender.service.ts scores by berths.price with no FX normalization. Multi-currency tier ranking is meaningless. Heat weights in system_settings are tuned per-port; admins have no way to spot the skew. Same root as C1 but isolated to the recommender.

H5. /api/v1/currency/convert swallows rate-unavailable as data: null

/api/v1/currency/convert/route.ts:19 does not differentiate "rate unavailable" from "amount was zero" — both return { data: null } with 200. Callers that distinguish these need a separate error envelope. expenses.ts and expense-pdf.service.ts handle null correctly; the API surface does not.


MEDIUM

M1. No cold-start bootstrap for currency_rates

queue/scheduler.ts:31 runs every 6h; on a fresh db:seed the table is empty for up to 6h and every convert() returns null. Seed initial rates in seed-bootstrap.ts or self-trigger on cron registration. Masked today because seeded ports are all USD (USD→USD short-circuits at currency.ts:9).

M2. seed-bootstrap.ts hardcodes USD for every port

seed-bootstrap.ts:42,49 — both demo ports default to USD. The schema admits per-port currency but no EUR/GBP demo port exists. Multi-currency correctness has zero seed/fixture coverage. Adding one non-USD demo port would surface C1/C2/H4 in smoke output.

M3. Hardcoded "Rates (USD)" column header

berth-columns.tsx:324 — header reads 'Rates (USD)' regardless of the row's priceCurrency. Column body is currency-aware; header lies for non-USD rows.

M4. EOI / interest-summary PDFs use prefix code instead of formatCurrency

pdf/templates/interest-summary-template.ts:112, berth-spec-template.ts:127,172 render 'USD 1,234,000' rather than $1,234,000. Inconsistent with the invoice template and in-app UI. Surfaces to clients in EOI bundles.

M5. OCR receipt parser maps $ → USD unconditionally

ocr/parse-receipt-text.ts:17. CAD/AUD/HKD/SGD all print $. Force confirmation when the port's defaultCurrency isn't USD.

M6. Expense form/scan defaults hardcode USD rather than port default

expense-form-dialog.tsx:61,85,215,227, expenses/scan/page.tsx:63,314, scan-shell.tsx:102. A rep at a EUR-default port changes the dropdown on every expense.

M7. Synthesized inverse rates drift

refreshRates() stores 1/rate rounded to 6dp (currency.ts:60). USD→EUR→USD round-trips diverge from identity by basis points; matters for the expense-pdf USD→EUR chain. Fetch base=USD and base=EUR separately from Frankfurter rather than synthesizing.

M8. Unique index blocks the H1 fix

currency_rates_base_target_idx makes append-only history a breaking migration. Flagged so the H1 fix is planned with the index drop.


Notes / non-issues

  • formatCurrency is well-defended; consolidate the ad-hoc toLocaleString({ style: 'currency' }) in expense-columns.tsx / expense-detail.tsx onto it.
  • getRate caching in expense-pdf.service.ts:215-231 is the right shape — reuse for any other batch conversion path.
  • Documenso payloads carry currency through unchanged; no FX in that path.

Top 3 to fix first: C1 (dashboard mixed-currency totals), C2 (report PDFs drop currency entirely), H1 (no FX history/retention).


29. Outbound webhook delivery audit (outbound-webhook-auditor)

Outbound Webhooks — Audit (Task #29)

Scope: src/app/(dashboard)/[portSlug]/admin/webhooks/, src/app/api/v1/admin/webhooks/**, src/lib/services/webhooks.service.ts, src/lib/services/webhook-dispatch.ts, src/lib/services/webhook-event-map.ts, src/lib/queue/workers/webhooks.ts, src/lib/validators/webhooks.ts, src/lib/db/schema/system.ts, src/lib/utils/encryption.ts, src/lib/queue/index.ts. Read-only.


CRITICAL

C1 — Signature has no replay protection

workers/webhooks.ts:120-134 — HMAC covers only the JSON body (sha256=HMAC(secret, bodyString)). The body contains a timestamp field, but it's not separately authenticated/headered in a way the receiver can verify in a freshness window. A captured request replays verbatim, signature still valid. No X-Webhook-Timestamp, no nonce, no documented receiver dedup contract.

Fix: Stripe-style signature = HMAC(secret, ${ts}.${body}) with X-Webhook-Timestamp header, and document that receivers must reject |now − ts| > 5 min. Also document X-Webhook-Delivery (already sent) as the receiver-side idempotency key.
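
A minimal sketch of the timestamped signature, assuming Node's built-in crypto module. The function names, the 5-minute tolerance constant, and the seconds-based timestamp are illustrative assumptions, not the worker's current code.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: freshness window of 5 minutes, timestamps in whole seconds.
const TOLERANCE_SECONDS = 5 * 60;

// Sender side: sign "<ts>.<body>" so the timestamp is covered by the MAC.
export function signPayload(secret: string, body: string, ts: number): string {
  const mac = createHmac("sha256", secret).update(`${ts}.${body}`).digest("hex");
  return `sha256=${mac}`;
}

// Receiver side: reject stale timestamps first, then compare in constant time.
export function verifySignature(
  secret: string,
  body: string,
  tsHeader: string,
  signatureHeader: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const ts = Number(tsHeader);
  if (!Number.isInteger(ts)) return false;
  if (Math.abs(nowSeconds - ts) > TOLERANCE_SECONDS) return false;
  const expected = Buffer.from(signPayload(secret, body, ts));
  const received = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so check length first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

A replayed request keeps its original timestamp, so it fails the freshness check once the window has passed; inside the window the receiver still needs X-Webhook-Delivery for dedup.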

C2 — webhook_deliveries grows unbounded

schema/system.ts:107-126 — no reaper anywhere; searches for the table outside writers returned zero hits. Every event, retry, test, and redeliver writes a row with full payload JSONB plus up to 1 KB response_body. BullMQ's removeOnComplete/removeOnFail only prunes Redis, not Postgres. On a port subscribed to high-volume events (berth.status_changed, interest.stage_changed, invoice.*) this is unbounded write-amplification.

Fix: maintenance job pruning by status + age (e.g. 30 d success, 90 d dead_letter), gated by a system_settings retention key. Add (status, createdAt) index for the scan.
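
The retention rule could be sketched as a pure predicate that a maintenance job applies before deleting. The status names, the policy shape, and the 30 d / 90 d defaults below are assumptions taken from the example figures above, not the real schema.

```typescript
// Assumption: these status values approximate webhook_deliveries.status.
type DeliveryStatus = "pending" | "success" | "failed" | "dead_letter";

interface RetentionPolicy {
  successDays: number;
  deadLetterDays: number;
}

interface DeliveryRow {
  status: DeliveryStatus;
  createdAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const DEFAULT_POLICY: RetentionPolicy = { successDays: 30, deadLetterDays: 90 };

// True when the row is older than the retention window for its status.
export function isPrunable(
  row: DeliveryRow,
  now: Date,
  policy: RetentionPolicy = DEFAULT_POLICY,
): boolean {
  const ageMs = now.getTime() - row.createdAt.getTime();
  if (row.status === "success") return ageMs > policy.successDays * DAY_MS;
  if (row.status === "dead_letter") return ageMs > policy.deadLetterDays * DAY_MS;
  // pending / failed rows may still be retried, so they are never pruned here.
  return false;
}
```

In a real job the same cutoffs would more likely be pushed into a single DELETE with a status + createdAt predicate, which is what the suggested index serves.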

C3 — Worker dispatches with empty signature when secret is NULL

workers/webhooks.ts:111-134, schema :97: secret is nullable; the worker silently sends header X-Webhook-Signature: '' when it is missing. Compliant receivers reject, mis-coded ones accept. Creation always generates a secret, so NULL implies DB tampering or a future migration mistake, but defence-in-depth still warrants a hard fail.

Fix: dead-letter with reason missing_signing_secret; ideally make webhooks.secret NOT NULL.


HIGH

H1 — DNS-rebinding TOCTOU

workers/webhooks.ts:18-45, 147-167: resolveAndCheckHost() does its own dns.lookup, then hands the hostname back to fetch, which resolves again before connecting. The validator's comment (validators/webhooks.ts:69-70) defers rebind to the worker, but the worker's check is independent of the actual connect; a rebind between lookup and fetch still hits internal IPs.

Fix: pin the connect to the resolved IP via an undici Agent with connect: { lookup: () => allowedIp }, and keep the original hostname in the request URL so the Host header, TLS SNI, and certificate validation still use it. Or inspect socket.remoteAddress post-connect and abort on mismatch.
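
The `lookup: () => allowedIp` shorthand hides the callback signature that Node's socket layer expects. A hedged sketch of a pinned lookup follows; `createPinnedLookup` is a hypothetical helper name, and how it is wired into the worker's fetch is left open.

```typescript
import { isIP } from "node:net";

// Shape of the callback Node passes to a custom `lookup` function.
type LookupCallback = (
  err: Error | null,
  address: string | { address: string; family: number }[],
  family?: number,
) => void;

// Returns a lookup function that ignores DNS and always yields the
// already-validated IP, so the connect cannot be rebound after the check.
export function createPinnedLookup(allowedIp: string) {
  const family = isIP(allowedIp); // 4, 6, or 0 when not an IP literal
  if (family === 0) throw new Error(`not an IP literal: ${allowedIp}`);
  return (
    _hostname: string,
    options: { all?: boolean },
    callback: LookupCallback,
  ): void => {
    // With `all: true` the caller expects an array of address records.
    if (options.all === true) callback(null, [{ address: allowedIp, family }]);
    else callback(null, allowedIp, family);
  };
}
```

Because only name resolution is replaced, the hostname in the URL is untouched and TLS verification continues to run against it.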

H2 — Retry policy too short / no jitter

queue/index.ts:14, 29-34: maxAttempts: 3, exponential 1000 ms. Real schedule ≈ 1 s / 2 s / 4 s, so a 30 s receiver outage during a deploy permanently dead-letters every in-flight event; super-admins get a notification storm and events are lost unless redelivered manually. Industry norm (Stripe, GitHub) is ≥5 attempts over hours with jitter.

Fix: bump to 8 to 10 attempts, exponential base 30_000 ms with jitter, surface the next-retry-time in the admin detail view.
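
One way to express the proposed schedule is a pure delay function that the queue's backoff hook could call. The "equal jitter" variant, the 6-hour cap, and the function name are assumptions for illustration; the queue wiring is not shown.

```typescript
// Delay before retry number `attempt` (1-based), with equal jitter:
// half of the exponential step is fixed, the other half is randomised.
export function backoffDelayMs(
  attempt: number,
  baseMs = 30_000,
  capMs = 6 * 60 * 60 * 1000, // assumed ceiling of 6 hours
  random: () => number = Math.random,
): number {
  const exponential = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  const half = exponential / 2;
  return Math.round(half + random() * half);
}
```

With the defaults the first retry lands between 15 s and 30 s and the third between 60 s and 120 s, so a short deploy outage no longer exhausts the budget, and the jitter keeps retries from many deliveries from arriving in lockstep.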

H3 — No circuit-breaker on chronically failing endpoints

workers/webhooks.ts:231-287 — after a dead_letter the webhook stays is_active=true. Five global worker slots (concurrency: 5) get saturated by a broken subscriber's 3×10 s retry cycle, starving other ports' webhooks. The dead_letter notification dedupes per delivery, so 1000 events → 1000 alerts.

Fix: rolling failure counter on webhooks; auto-set is_active=false after N consecutive dead_letters and alert once. Coalesce notification dedupeKey by webhook+day.
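
The auto-disable rule can be sketched as a small state transition plus a coalesced dedupe key. The state shape, the threshold of 5, and the key format are hypothetical; the audit only specifies "N consecutive dead_letters" and "webhook+day".

```typescript
interface BreakerState {
  consecutiveDeadLetters: number;
  isActive: boolean;
}

type DeliveryOutcome = "success" | "dead_letter";

// Applies one delivery outcome; `shouldAlert` is true only on the
// transition from active to disabled, so the alert fires once.
export function recordOutcome(
  state: BreakerState,
  outcome: DeliveryOutcome,
  threshold = 5,
): { state: BreakerState; shouldAlert: boolean } {
  if (outcome === "success") {
    return { state: { ...state, consecutiveDeadLetters: 0 }, shouldAlert: false };
  }
  const count = state.consecutiveDeadLetters + 1;
  const tripped = state.isActive && count >= threshold;
  return {
    state: { consecutiveDeadLetters: count, isActive: state.isActive && !tripped },
    shouldAlert: tripped,
  };
}

// One notification key per webhook per UTC day.
export function deadLetterDedupeKey(webhookId: string, at: Date): string {
  return `webhook-dead-letter:${webhookId}:${at.toISOString().slice(0, 10)}`;
}
```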

H4 — EMAIL_REDIRECT_TO short-circuit writes status dead_letter

workers/webhooks.ts:94-109 — semantically wrong; "exhausted retries" is what dead_letter means in admin UI. The SSRF-blocked path at :148-166 shares the same status. Doesn't fire alerts (alert path requires isFinalAttempt), but pollutes the deliveries list.

Fix: introduce a skipped (or paused) status, use it for both paths.

H5 — Payloads are ID-only; redeliver re-sends stale data

All 18 dispatchWebhookEvent(...) callsites pass only { clientId, berthId, interestId, ... }. Receivers must call back for anything beyond the ID — yet webhooks fire for archived / merged / deleted entities (client.archived, client.merged, yacht.ownership_transferred). Worse, redeliverWebhookDelivery (webhooks.service.ts:283-343) clones the payload verbatim, so a replay after GDPR erasure resurrects the deleted ID and a replay after a client merge resurfaces the pre-merge identity.

Fix: snapshot a minimal {id, name, status, archived} at dispatch time; on redeliver, re-check entity existence and short-circuit to skipped if the row is gone.

H6 — Test endpoint has no rate limit

webhooks.service.ts:347-383 — combined with H3 a rapid-fire test can stall the queue. Add a per-webhook test throttle (e.g. 1/sec).


MEDIUM

M1 — SSRF denylist gap

validators/webhooks.ts:18-43 covers RFC1918, loopback, link-local (incl. AWS IMDS 169.254/16), CGNAT 100.64/10 (catches Alibaba's 100.100.100.200), IPv6 ULA / link-local, GCP/Azure named metadata hosts. Missing: Oracle Cloud metadata 192.0.0.192. Add the literal.
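
For reference, an IPv4 denylist check of this kind might look like the sketch below. The CIDR list mirrors the ranges named above plus the proposed literal; it is not a copy of validators/webhooks.ts, and IPv6 is left out for brevity.

```typescript
// Ranges as described in the finding; 192.0.0.192/32 is the proposed addition.
const BLOCKED_V4_CIDRS: string[] = [
  "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", // RFC 1918
  "127.0.0.0/8",    // loopback
  "169.254.0.0/16", // link-local, incl. 169.254.169.254 metadata
  "100.64.0.0/10",  // CGNAT, incl. 100.100.100.200
  "192.0.0.192/32", // Oracle Cloud metadata, per this audit
];

function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// Fails closed: anything that does not parse as dotted-quad IPv4 is blocked.
export function isBlockedIpv4(ip: string): boolean {
  const addr = ipv4ToInt(ip);
  if (addr === null) return true;
  return BLOCKED_V4_CIDRS.some((cidr) => {
    const [base, bits] = cidr.split("/") as [string, string];
    const start = ipv4ToInt(base);
    if (start === null) return true;
    const size = 2 ** (32 - Number(bits));
    return addr >= start && addr < start + size;
  });
}
```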

M2 — HTTPS check is create-time only

webhook.url isn't re-validated at dispatch. A bad migration or DB edit could let http:// through. Add url.startsWith('https://') in the worker before fetch.

M3 — secretMasked decrypts on every list/get

webhooks.service.ts:80-99, 103-123 runs decrypt() per row to compute a 5+3-char mask. The mask is deterministic from the plaintext; cache it in a new column secret_masked so the read path doesn't exercise the encryption key per webhook.

M4 — Shared encryption key naming

utils/encryption.ts:7 reads EMAIL_CREDENTIAL_KEY but encrypts webhook secrets, SMTP creds, and IMAP creds with it. Rotation spans multiple tables; the name implies email-only and invites config drift. Rename to APP_CREDENTIAL_KEY (alias the old name) and document the rotation runbook.

M5 — No event-name versioning

webhook-event-map.ts exports a flat list. When data inside interest.stage_changed changes shape, every receiver breaks silently. Add either a X-Webhook-Version header or event name suffix (.v2) before this surfaces to external integrators.

M6 — responseBody may carry third-party PII

workers/webhooks.ts:191, 197 stores up to 1 KB of the receiver's response and surfaces it through the admin deliveries list. If a receiver echoes data in 4xx bodies it lands in webhook_deliveries.response_body and the pino warn line. Flag for the GDPR DPIA; consider redacting headers / scrubbing on storage.

M7 — SSRF-blocked delivery isn't audit-logged

workers/webhooks.ts:148-166 updates the row but skips createAuditLog. Success and final-fail paths both write audit rows. Add one here — these are the deliveries you most want forensics on.

M8 — No port-id assertion on the BullMQ payload at worker entry

workers/webhooks.ts:69 trusts portId from the job and uses it for notifications + audit. The producer is internal and writes consistent data, so this is defence-in-depth, but the worker could fetch the webhook row and assert webhook.portId === payload.portId before proceeding.


What's good

  • AES-256-GCM at rest; secret returned to admin once only on create / regenerate (webhooks.service.ts:71-75, 225-229).
  • HTTPS-only create-time validation, comprehensive IPv4 + IPv6 private-range denylist, named cloud metadata hosts.
  • DNS re-resolution at dispatch time — intent is right (TOCTOU gap noted in H1).
  • Idempotent delivery row created before enqueue (webhook-dispatch.ts:48-57); worker crash leaves a recoverable pending row.
  • BullMQ retries + dead_letter handling + super-admin notification; redeliver path preserves the original failed row + tags the replay payload with retried_from / retried_at.
  • Multi-tenant guard at every read/write (webhooks.service.ts:108, 137, 174, 200, 244, 292, 352) and on the dispatcher subscription query (webhook-dispatch.ts:33-39).
  • EMAIL_REDIRECT_TO dev kill-switch.
  • 10 s fetch timeout via AbortController.
  • Permission gating: every admin route wraps in withPermission('admin', 'manage_webhooks', …); redeliver + regenerate-secret included.

Priority

Land C1 (timestamp-in-signature) + C2 (deliveries reaper) + H2 (retry policy) + H3 (auto-disable) before exposing webhooks to external integrators. C3, H1, H5 are smaller patches but should ship in the same release. MEDIUM items can batch behind a single "webhook hardening" follow-up.


20. Authorization model integrity audit (authz-auditor)

Authorization model integrity audit

Branch: feat/documents-folders · Date: 2026-05-12 · Auditor: authz-auditor Scope: every API route's permission gate, port-scope SQL filters, per-user override merge semantics, isSuperAdmin bypass paths, residential toggle, Documenso webhook port resolution. Read-only.


CRITICAL

C-1 — Privilege escalation via user-permission-overrides PUT

File: src/app/api/v1/admin/users/[id]/permission-overrides/route.ts lines 153-244 (plus the parallel issue in src/lib/services/users.service.ts updateUser lines 285-292).

The PUT endpoint gates on withPermission('admin', 'manage_users') and refuses self-target (line 163), but it does NOT verify that the caller already holds the permission they are granting to the target user. A port admin holding ONLY admin.manage_users can therefore:

  1. Mint a colleague with admin.permanently_delete_clients = true, admin.system_backup = true, admin.manage_settings = true, documents.delete = true, interests.override_stage = true, etc.
  2. Have the colleague execute those actions on their behalf, or
  3. Re-flip leaves on the colleague's record at will because nothing in the override-merge path knows the granting admin was unprivileged.

The same path exists in updateUser (role reassignment): roleId is validated to exist (line 289) but there is no "you can only assign a role whose effective permission set ⊆ your own" check. Because admin/roles POST is super-admin-only, role-creation is safe, but role assignment is the privilege-escalation surface since a sales_director-equivalent could promote a peer to a super-admin-flavoured role.

The audit log records the change so the activity is detectable, but detection is not prevention. Self-target block on the override route is necessary but not sufficient — the admin can just bounce the elevated permission off a sock-puppet account.

Fix: before writing permissionOverrides, compute the caller's effective permission map and reject any leaf in the new override that is true while the caller's matching leaf is false. Same check on the roleId change in updateUser — compare the new role's effective permission set against the caller's and refuse on any superset.
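
The subset check could be sketched as follows, assuming permissions form a two-level resource → action → boolean map (the shape implied by ctx.permissions?.<resource>?.<action>). `findEscalatedLeaves` is a hypothetical helper name.

```typescript
type PermissionMap = { [resource: string]: { [action: string]: boolean } };

// Returns every "resource.action" leaf that `requested` sets to true
// while the caller's own effective map does not hold it.
export function findEscalatedLeaves(
  caller: PermissionMap,
  requested: PermissionMap,
): string[] {
  const escalated: string[] = [];
  for (const [resource, actions] of Object.entries(requested)) {
    for (const [action, granted] of Object.entries(actions)) {
      if (granted === true && caller[resource]?.[action] !== true) {
        escalated.push(`${resource}.${action}`);
      }
    }
  }
  return escalated;
}
```

The route would reject the PUT when the returned list is non-empty. The same function covers role reassignment if `requested` is the new role's effective permission set. Leaves set to false never count as escalation, so an admin can still revoke permissions they do not hold.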


HIGH

H-1 — Listing endpoints without an explicit withPermission gate

grep -L "withPermission|requireSuperAdmin|requirePermission" against withAuth routes turns up 31 files. Most are legitimate self-service surfaces (/me, /notifications, /currency/*, /users/me/preferences, /alerts/*/dismiss|acknowledge, /saved-views/[id] — ownership-checked) or correctly do an in-handler check (/clients/bulk, /companies/bulk, /yachts/bulk, /interests/bulk — all gate on ctx.permissions?.<resource>?.<action>).

The outliers worth flagging:

  • /api/v1/alerts/route.ts GET — no permission gate. Anyone in the port with valid auth can read every alert row (audit blockers, GDPR alerts, permission-denied alerts, etc.). Service listAlertsForPort scopes on portId so cross-tenant leakage is contained, but the alert payload exposes internal-only signals (e.g. who triggered a permission_denied). Either gate on admin.view_audit_log or filter the payload by sensitivity tier.
  • /api/v1/vocabularies/route.ts GET — intentionally permissionless per the comment (vocabularies feed pickers across the app). Fine — port-scoped.
  • /api/v1/settings/feature-flag/route.ts GET — port-scoped, returns a single boolean for a key the client names. Acceptable.
  • /api/v1/search/route.ts GET — relies on the service's can() helper to skip buckets the caller can't see (search.service.ts line 305). Good. includeOtherPorts correctly gates on ctx.isSuperAdmin (line 20).

H-2 — Search service: residential buckets

src/lib/services/search.service.ts line 305 (can()) honours permissions.residential_clients.view and .residential_interests.view. The withAuth resolver sets these to true when portRole.residentialAccess is true (helpers.ts lines 209-221) BEFORE the per-user override layer runs (lines 227-238). So a per-user override with residential_clients.view = false will take effect. This was verified by tracing deepMerge (helpers.ts lines 73-98): the source false boolean replaces the target true at the leaf because the recursion only triggers when both sides are objects. Per-user false correctly bubbles through. Pass.
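
The merge behaviour described here can be reproduced with a small stand-in. This is a re-implementation written from the description (recurse only when both sides are objects, otherwise assign), not the code in helpers.ts.

```typescript
type PermissionTree = { [key: string]: boolean | PermissionTree };

function isTree(value: unknown): value is PermissionTree {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Source wins at every leaf; recursion happens only for object/object pairs.
export function deepMerge(target: PermissionTree, source: PermissionTree): PermissionTree {
  const out: PermissionTree = { ...target };
  for (const [key, value] of Object.entries(source)) {
    const existing = out[key];
    out[key] = isTree(existing) && isTree(value) ? deepMerge(existing, value) : value;
  }
  return out;
}

// Residential toggle set everything to true; the user override removes view.
const afterToggle: PermissionTree = { residential_clients: { view: true, delete: true } };
const userOverride: PermissionTree = { residential_clients: { view: false } };
export const effective = deepMerge(afterToggle, userOverride);
// effective.residential_clients is { view: false, delete: true }
```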

H-3 — withAuth userOverride fetch costs a round-trip on every request

Not a security issue but a perf and coupling note: every authenticated request now runs three sequential queries when the user is not a super-admin (userPortRoles → portRoleOverrides → userPermissionOverrides). Hot routes inherit the latency tax. Consider a Promise.all for the override pair, or a per-request memoize keyed on (userId, portId); withAuth itself runs once per request, but middleware-adjacent paths also resolve permissions.


MEDIUM

M-1 — withAuth residential toggle bypasses portRoleOverrides for residential.*

helpers.ts lines 209-221: when portRole.residentialAccess === true, the resolver replaces permissions.residential_clients and permissions.residential_interests with a hardcoded all-true map. If a port-role-override set residential_clients.delete = false (e.g. "this port lets reps see but not delete residential rows"), the residential toggle silently overrides that. By design? Maybe: the toggle is documented as "full residential access". Still, it would be surprising if an admin set up the port-role-override expecting it to constrain toggled users. Document or compose more carefully.

The per-user permission override still wins (it runs after, line 227), so a deliberate admin can recover, but the precedence is subtle.

M-2 — parseBody-vs-req.json consistency on bulk routes

All four bulk routes (clients, yachts, companies, interests) use parseBody correctly. The bulk permission check pattern is repeated four times with the same shape — extract a requireOneOf(ctx, [{resource, action}, ...]) helper to avoid drift when a new bulk route ships.
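
A possible shape for the shared helper, assuming the auth context exposes the same resource → action map the bulk routes already read. `AuthContextLike` and `ForbiddenError` are placeholder types for this sketch, not the project's real ones.

```typescript
export interface PermissionCheck {
  resource: string;
  action: string;
}

// Placeholder for the real request context type.
export interface AuthContextLike {
  permissions?: Record<string, Record<string, boolean> | undefined>;
}

export class ForbiddenError extends Error {
  constructor(checks: PermissionCheck[]) {
    super(`requires one of: ${checks.map((c) => `${c.resource}.${c.action}`).join(", ")}`);
    this.name = "ForbiddenError";
  }
}

// Throws unless the caller holds at least one of the listed permissions.
export function requireOneOf(ctx: AuthContextLike, checks: PermissionCheck[]): void {
  const allowed = checks.some(
    ({ resource, action }) => ctx.permissions?.[resource]?.[action] === true,
  );
  if (!allowed) throw new ForbiddenError(checks);
}
```

A bulk handler would call it once at the top, for example `requireOneOf(ctx, [{ resource: "clients", action: "archive" }, { resource: "clients", action: "edit" }])`, where the action names are illustrative.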

M-3 — documents.feature-flags and documents.wizard

Both routes wrap with withAuth + withPermission('documents', 'view'). The feature-flags route returns Documenso/template feature toggles — fine. The wizard route fetches drafts. Spot-check passed; both scope to ctx.portId in the service.

M-4 — Documenso webhook port resolution: verified correct

src/app/api/webhooks/documenso/route.ts lines 58-101: secret enumeration over listDocumensoWebhookSecrets() with verifyDocumensoSecret (timing-safe). The matched portId threads through portScope (line 143) to every per-recipient and per-document handler. resolveWebhookDocument (documents.service.ts lines 967-996) refuses to mutate when the lookup is ambiguous across ports without a portId. Pass. No cross-tenant write surface.

One small nit: the webhook returns 200 on invalid-secret to avoid leaking signal (line 100) but the audit row records a webhook_failed with portId: null. Rate-limited per IP (line 77). Fine.

M-5 — requireSuperAdmin always requires a portId in userPortRoles?

No. Super admins skip the userPortRoles lookup entirely (helpers.ts line 174 condition), but still need portId set somewhere (header or profile.preferences.defaultPortId, lines 161-164) unless they're hitting a no-port endpoint. The gate on line 166 only fires when !portId && !isSuperAdmin. A super admin without a portId in the request will have ctx.portId = '' and ctx.portSlug = ''; any route that uses ctx.portId in a SQL filter will match nothing, which is a fail-safe but produces confusing empty UIs. Worth documenting that super-admin requests SHOULD always carry an X-Port-Id.

M-6 — requireSuperAdmin audit-logs denials with empty entityId

helpers.ts line 298: entityId: ''. The audit row is functional but harder to query later. Set to attemptedAction or the route path for forensics.


Pass / verified

  • deepMerge false-propagation: false at any leaf correctly overwrites true in the role baseline because the recursion guard requires both sides to be objects (helpers.ts lines 81-88). Boolean → boolean falls into the else branch and assigns directly.
  • Override layer ordering: role → port-role-override → residential toggle → user-permission-override. User override wins last. Self-target on PUT rejected (route line 163).
  • Listing port_id SQL filter (sampled): clients, interests, yachts, companies, documents, files, berths, invoices, expenses, reminders, residential_clients, residential_interests, alerts, error_events, audit_logs, notes (all four polymorphic types) — every service list* function constrains on portId in the WHERE clause. Search service goes further with defense-in-depth port_id filters inside each per-bucket query (search.service.ts lines 361, 443, 490, 539, 600, 667, 717, 772, 805, 856, 902, 952, 995, 1035, 1069, 1107, 1156, 1172, 1186, 1200, 1384).
  • /admin/ports/[id]: explicit assertPortInScope blocks cross-tenant access by non-super-admins (route lines 15-20). Pass.
  • /admin/error-events: super admins see all, regular admins scoped to ctx.portId; the [requestId] route additionally re-checks the row's portId and returns 404 (not 403) on mismatch to avoid existence leak.
  • isSuperAdmin writability: not in createUserSchema / updateUserSchema. Only settable via the invitation flow with an explicit if (body.isSuperAdmin && !ctx.isSuperAdmin) throw guard (/admin/invitations/route.ts line 40). Pass.
  • Documenso webhook: secret enumeration is timing-safe; ambiguous cross-port documensoId lookups refuse to mutate; portScope threaded to every handler. Pass.

Summary punch list

| Sev | Item | File / fix |
| --- | --- | --- |
| CRIT | Privilege escalation via permission-overrides PUT and role reassignment | permission-overrides/route.ts, users.service.ts updateUser: refuse to grant any leaf the caller doesn't hold |
| HIGH | /api/v1/alerts GET ungated | Add withPermission('admin','view_audit_log') or filter payload |
| MED | Residential toggle silently overrides port-role-override on residential_* | Document precedence in helpers.ts or compose via deepMerge |
| MED | withAuth runs three sequential override queries per request | Parallelize override fetches |
| MED | Bulk-route permission check duplicated 4× | Extract requireOneOf helper |
| MED | requireSuperAdmin audit row carries empty entityId | Set to attemptedAction |
| INFO | Super-admin request without X-Port-Id produces empty ctx.portId and silently empty queries | Document; consider 400 |



25. Inquiry → CRM funnel correctness audit (funnel-auditor)

Inquiry → CRM Funnel Audit

Scope: src/app/api/public/{interests,residential-inquiries,website-inquiries}, src/app/api/v1/admin/website-submissions/*, src/lib/services/inquiry-notifications.service.ts, src/components/admin/inquiry-inbox.tsx, src/lib/validators/{interests,residential}.ts, settings keys inquiry_contact_email / inquiry_notification_recipients / residential_notification_recipients.

Read-only; no edits.


CRITICAL

C1. "Convert to client" prefill goes nowhere — every conversion is double-typed

inquiry-inbox.tsx:127-135 flips the row to converted, then pushes /clients?prefill_name=…&prefill_email=…&prefill_phone=…&prefill_source=website&prefill_inquiry_id=…. A repo-wide grep for any of those five keys returns only the writer — no client form / page / hook ever reads prefill_*. So Convert: (a) flushes the inbox row to "converted" eagerly, (b) drops the operator on a blank New Client form, (c) loses the inquiry_id ↔ client/interest linkage permanently because nothing persists it. The triage state is now lying ("converted" with no downstream entity), and operators retype the payload from the inbox card. Either consume the params in client-form.tsx (with a hidden inquiry_id that the create endpoint persists into clients.metadata or a new inquiry_origin_id FK), or revert the eager state flip so the inbox stays honest until the client is actually saved.

C2. Two parallel intake pipelines with no correlation → duplicate interests + zombie inbox rows

/api/public/interests directly creates clients + yacht + interest rows and queues notifications. /api/public/website-inquiries is the new "dual-write capture" that stores the raw payload in website_submissions for triage. The website is expected to call both (the docstring on website-inquiries/route.ts says "AFTER its existing NocoDB write succeeds"). Nothing links them. Result for a single berth-form POST:

  1. interests row created automatically with notifications fired.
  2. website_submissions row inserted with triage_state='open'.
  3. Operator opens the inbox, sees the "open" card, clicks Convert → second interests row.
  4. Inbox UI never sees that step 1 already happened; the heat-scored interest from step 1 is silently shadowed.

Either the public form path should write submission_id onto the created interests row (and the inbox should auto-mark converted whenever a matching interest exists), or the two pipelines need to be merged into one. Right now they coexist and contradict each other.

C3. Email dedup is case-sensitive — capital-letter resubmission spawns duplicates

public/interests/route.ts:91 matches clientContacts.value === data.email (no lower()). The supporting idx_cc_email index (clients.ts:91) is also raw-value, not lower(value). Two POSTs as Matt@Example.com and matt@example.com produce two separate clients, yachts, and interests, and the recommender now has split history for the same human. The companies branch (route.ts:122) gets this right (sql`lower(${companies.name}) = lower(${data.company.name})`); the email branch must match: lowercase on insert and on lookup, plus a partial unique index on lower(value) WHERE channel='email'.
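
The "lowercase on insert and on lookup" rule amounts to routing both paths through one normaliser. The sketch below uses an in-memory Map in place of the database to show the effect; `normalizeEmail` and `findOrCreateByEmail` are hypothetical names, and trimming is an added assumption.

```typescript
// Single normaliser shared by the write path and the lookup path.
export function normalizeEmail(raw: string): string {
  return raw.trim().toLowerCase();
}

// In-memory stand-in for "look up contact by email, else create client".
export function findOrCreateByEmail<T>(
  index: Map<string, T>,
  rawEmail: string,
  create: () => T,
): { record: T; reused: boolean } {
  const key = normalizeEmail(rawEmail);
  const existing = index.get(key);
  if (existing !== undefined) return { record: existing, reused: true };
  const record = create();
  index.set(key, record);
  return { record, reused: false };
}
```

With this in place, Matt@Example.com and matt@example.com resolve to the same key, and the `reused` flag is the kind of value the audit-log metadata could record.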


HIGH

H1. Residential clients have no dedup at all

residential-inquiries/route.ts:73-93 always inserts a fresh residential_clients row. There's no email/phone match, no unique index on residential_clients.email (verified — schema/residential.ts has no uniqueIndex). Every resubmit = new prospect. Sales gets a bloated list of phantoms. Mirror the berth-path dedup (lowercased-email lookup → reuse → open a new residential_interests row only).

H2. findUsersWithInterestsPermission ignores user_permission_overrides (migration 0055)

inquiry-notifications.service.ts:139-158 reads only roles.permissions. The request-time auth path in lib/api/helpers.ts:227-238 correctly layers role → port-role-override → user-override, but this fan-out helper does not. Symptoms:

  • User granted interests.view via override only → never gets new-inquiry pings.
  • User had interests.view in role but their override removed it → still gets pinged.

Either (a) collect every user on the port and run the same deepMerge chain per user, or (b) move permission-resolution into a single service helper both callers use.

H3. Bogus portId fails as a 500 (FK violation) instead of 400

public/interests/route.ts:51-52 accepts the portId query/header but never verifies the port exists before the transaction. An invalid id surfaces as a Postgres FK error from clients.port_id, returned as a generic 500. The residential endpoint (residential-inquiries/route.ts:58-61) validates upfront via db.query.ports.findFirst — make the berth route do the same.

H4. Cross-port email collisions are non-deterministic

public/interests/route.ts:90-114: when a client contact with the same email exists on a different port, the code creates a new client. But tx.query.clientContacts.findFirst returns "any matching row" with no ORDER BY, so subsequent submissions may pick either port's row first. Net: same email used cross-port, then resubmitted to the original port, can spawn 2nd/3rd same-port clients. Fix: filter the lookup by joining to clients.port_id, or scope the contact lookup to clients owned by the target port from the start.

H5. portName hardcoded as 'Port Nimara' in four call sites

  • inquiry-notifications.service.ts:57, 126 (client confirmation + sales alert)
  • residential-inquiries/route.ts:158, 207 (subject tokens)

The author left a // future: resolve from getPortBrandingConfig comment. The moment a second port has a marketing site, every email reads "Port Nimara" regardless of recipient. Wire through getBrandingShell(portId).portName (already loaded for the HTML body via branding-resolver.ts).

H6. Residential confirmation ignores inquiry_contact_email

residential-inquiries/route.ts:150 hardcodes contactEmail: 'sales@portnimara.com' in the client confirmation email. The berth path reads the per-port inquiry_contact_email setting (inquiry-notifications.service.ts:44). Settings UI (settings-manager.tsx:96) advertises this setting controls both — but it doesn't. Admins can't reroute residential replies.

H7. Residential sales alert bypasses the email queue

residential-inquiries/route.ts:214 calls await sendEmail(recipients, …) synchronously inside sendResidentialNotifications. The berth path enqueues via BullMQ (inquiry-notifications.service.ts:51,118). When SMTP is slow or down, the notification is lost with no retry. The POST itself should not hang or 500, because the call is wrapped in .catch and awaited after the response is returned, so the worker event loop is the only victim. Move to the email queue like the berth path.


MEDIUM

M1. No UTM / referrer / attribution capture anywhere

publicInterestSchema and publicResidentialInquirySchema have no utm_source, utm_medium, utm_campaign, referrer, landing_page fields. source is hardcoded 'website' (interests.ts:231). Berth-recommender heat scoring and lead-source dashboards (audit #11) cannot differentiate organic vs paid vs broker referral. The website_submissions.payload JSONB at least preserves whatever the website chooses to forward — but interests itself stores only the literal string 'website'. Add an attribution block to both validators + columns (interests.utm_*, residential_interests.utm_*) and persist what the website hands us.

M2. Public routes use req.json(); schema.parse(body) instead of parseBody

public/interests/route.ts:47-48 and public/residential-inquiries/route.ts:51-52. CLAUDE.md explicitly flags this: "Always use parseBody(req, schema) from @/lib/api/route-helpers" so the error envelope is field-level 400 instead of a generic 500.

M3. Company / yacht / phone matching missing trim + phone-E164 dedup

  • companies match (route.ts:121-124) is case-insensitive but not whitespace-trimmed: "Acme Ltd " ≠ "Acme Ltd".
  • Phone contact dedup uses raw clientContacts.value, never valueE164. The same number formatted differently is a duplicate row.
  • Yachts always insert; resubmissions create a fresh yacht every time even if the hull/registration is identical.

M4. Response envelope inconsistency

  • Berth route: { data: { id, message } } at 201 — close to canonical envelope.
  • Residential route: { success: true, clientId, interestId } at 201, the legacy success:true shape that CLAUDE.md says was normalized away on 2026-05-07.

Pick one and update consumers.

M5. Inquiry inbox payload-key extractor is brittle

pickName/pickEmail/pickPhone in inquiry-inbox.tsx:58-78 use a small set of candidate keys but never compose first_name + last_name. Website payloads that send {first_name, last_name} without a name/fullName field render as (no name supplied). Two-line addresses and contact-form payloads silently lose the operator's first hint of who submitted.
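
A sketch of a name extractor that falls back to composing the split fields. The candidate keys are limited to the ones named above (name, fullName, first_name, last_name); the real pickName may check others.

```typescript
type SubmissionPayload = Record<string, unknown>;

// Returns a trimmed non-empty string, or undefined for anything else.
function asText(value: unknown): string | undefined {
  return typeof value === "string" && value.trim() !== "" ? value.trim() : undefined;
}

export function pickName(payload: SubmissionPayload): string {
  const direct = asText(payload["name"]) ?? asText(payload["fullName"]);
  if (direct !== undefined) return direct;
  // Fall back to first_name + last_name, tolerating either being absent.
  const composed = [asText(payload["first_name"]), asText(payload["last_name"])]
    .filter((part): part is string => part !== undefined)
    .join(" ");
  return composed !== "" ? composed : "(no name supplied)";
}
```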

M6. Audit log misses dedup decisions

createAuditLog (public/interests/route.ts:254-271) records the interest creation but not whether the client+yacht were created fresh vs reused. Forensics ("did this lead come from the form or get manually entered?") become guesswork. Add metadata.dedup = { clientReused: boolean, companyReused: boolean }.

M7. Yacht inserted with status: 'active' even on speculative form leads

route.ts:179. There's no "prospect" yacht state, so every unconverted interest still leaves an "active" yacht. Active-yacht counts in reports become inflated. Consider a 'prospect' status or a deferred-insert pattern keyed on interests.outcome != 'lost_*'.

M8. Admin website-submissions list permission mismatch

The inbox at /[portSlug]/admin/inquiries is the marketing-funnel triage surface, but /api/v1/admin/website-submissions/route.ts:23 gates GET on admin.view_audit_log. A sales lead reviewing submissions doesn't conceptually need audit-log access. Introduce a dedicated inquiries.view / inquiries.triage permission (consistent with the rest of the permission matrix) so this can be granted independently.


Settings application — verified flow

  • inquiry_contact_email (string, per-port): consumed by berth client-confirmation email (inquiry-notifications.service.ts:44); not consumed by residential confirmation (H6). Falls back to sales@portnimara.com literal.
  • inquiry_notification_recipients (JSON array, per-port): consumed by berth sales-alert fan-out (inquiry-notifications.service.ts:106). Empty array = no external alert. No de-dup against role-based recipients (a user listed here who also has interests.view gets two pings).
  • residential_notification_recipients (JSON array, per-port): consumed by residential alert; falls back to [inquiry_contact_email] if empty (residential-inquiries/route.ts:174-179). Correct envelope.

All three settings are surfaced on the admin Settings page (settings-manager.tsx:96-117) so admins can edit them; default values match the service-side fallbacks.


32. Improvements + nice-to-haves + genuine AI integration opportunities (improvements-auditor)

This is a forward-looking proposal report, not a defect audit. Grouped HIGH-VALUE / MEDIUM / EXPLORE with effort estimates and "what NOT to AI-ify" critical pass.

Audit #32 — Improvements, Nice-to-Haves & AI Opportunities

Scope: Forward-looking proposals, not a defect audit. Every proposal grounded in real surfaces seen in this repo (file paths cited). For each: user benefit, implementation sketch, effort estimate (S/M/L), and risk note where it matters.

Effort key: S = ≤½ day, M = 1-3 days, L = >3 days / cross-cutting.


Section A — UX / Feature Improvements

A · HIGH-VALUE

A1. Bulk actions on Berths, Companies, Yachts Bulk archive/tag/move flow exists in src/components/clients/client-list.tsx + src/components/interests/interest-list.tsx (single /bulk endpoint per domain), but Berths, Companies, and Yachts use the same data-table.tsx shell with BulkAction[] support and never pass any. Reps regularly need to retag a batch of yachts after import or move 30 berths to a new pricing band.

  • Sketch: Add bulkActions=[...] wired through the existing data-table.tsx API; mirror the /api/v1/clients/bulk and /api/v1/interests/bulk endpoint pattern for berths, companies, yachts. interest-list.tsx lines 124-280 are the reference implementation.
  • Effort: M
  • Risk: low — pattern already tested for two domains; ensure permission gate per action mirrors single-entity gates.

A2. Smart undo banner for archive / outcome / stage-change Already have client-restore.service.ts + a smart-restore-dialog component, and stage rollback would be supported by audit logs. Reps lose minutes every time they fat-finger an archive or set an outcome on the wrong card on the pipeline board.

  • Sketch: After any archive / outcome-set / interest_archived / interest_completed trigger, raise a Sonner toast with an "Undo" action for 8s, calling the existing restore service or a tiny reverse-mutation endpoint. Hook into the mutation onSuccess in interest-list.tsx, client-list.tsx, pipeline-board.tsx, and interest-outcome-dialog.tsx.
  • Effort: M
  • Risk: berth-rules-engine has already fired side-effects (berth_unlinked, interest_completed cascade). Undo must replay the reverse rule or explicitly skip rule-engine via a skipRules flag — otherwise undo leaves stale berth status.

A3. "What changed since I last looked" digest on detail pages The entity-activity.service.ts + use-track-entity-view.ts infrastructure is already in place — every detail view is tracked. Reps open a deal they haven't touched in a week and have to manually scroll the activity feed.

  • Sketch: On detail page load, query activity items with createdAt > lastViewedAt (from recently-viewed.service.ts) and render a dismissable "3 new things since 5 days ago: signed EOI, +€2k deposit, new note from María" strip above entity-activity-feed.tsx.
  • Effort: M
  • Risk: none meaningful — purely additive.

A4. j/k row navigation + o open + e edit + / focus filter on list pages Cmd-K is already wired in command-search.tsx; reps still mouse-hop between rows in data-table.tsx. Power users on busy pipeline days are the loudest beneficiaries.

  • Sketch: Add a useListKeyboardNav(rows, activeIndex) hook used inside data-table.tsx. j/k move active row, o/Enter opens detail, e triggers inline-edit on the first inline-editable cell, / focuses the filter input. Respect e.target being an input.
  • Effort: S
  • Risk: must be globally disabled inside dialogs/forms — use the same document.activeElement instanceof HTMLInputElement guard already in command-search.
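
The key handling in A4 can be sketched as a pure function, kept apart from React so it is easy to unit-test. This is only a sketch: the names ListNavAction and resolveListNavKey, and the isTyping flag (standing in for the document.activeElement guard), are illustrative assumptions and not existing code in data-table.tsx.

```typescript
// Hypothetical sketch: map a key press to a list-navigation action.
type ListNavAction =
  | { type: 'move'; index: number }
  | { type: 'open'; index: number }
  | { type: 'edit'; index: number }
  | { type: 'focusFilter' }
  | { type: 'none' };

function resolveListNavKey(
  key: string,
  activeIndex: number,
  rowCount: number,
  isTyping: boolean, // true when focus sits in an input, textarea, or dialog
): ListNavAction {
  // Never steal keystrokes from a field the user is typing into.
  if (isTyping) return { type: 'none' };
  // "/" works even on an empty list, so the user can change the filter.
  if (key === '/') return { type: 'focusFilter' };
  if (rowCount === 0) return { type: 'none' };

  const clamp = (i: number): number => Math.max(0, Math.min(rowCount - 1, i));
  switch (key) {
    case 'j':
      return { type: 'move', index: clamp(activeIndex + 1) };
    case 'k':
      return { type: 'move', index: clamp(activeIndex - 1) };
    case 'o':
    case 'Enter':
      return { type: 'open', index: clamp(activeIndex) };
    case 'e':
      return { type: 'edit', index: clamp(activeIndex) };
    default:
      return { type: 'none' };
  }
}
```

A hook such as useListKeyboardNav would then only translate the returned action into state updates and router calls.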

A5. Quick-create overlay (cmd-K → "+ New …") Command-search currently navigates but doesn't create. Reps regularly want to drop a client/interest/reminder without leaving the current page (e.g. a quick call comes in while reviewing a berth).

  • Sketch: Extend command-search.tsx palette with + New client, + New interest, + New reminder, + Log call. Each opens a drawer-mounted minimal form (3 fields max) using the existing forms wrapped in Drawer instead of Dialog. Re-use client-form.tsx, reminder-form.tsx in a "compact" mode prop.
  • Effort: M
  • Risk: low — entirely additive UI.

A · MEDIUM

A6. Smarter defaults from "my last used" Today client-form.tsx, interest-form.tsx, expense-form-dialog.tsx, and reminder-form.tsx reset every field. A rep doing 12 interests in a row re-types the same source / currency / lead source.

  • Sketch: Persist last-submitted values per form per user under user_profiles.preferences.formDefaults (same shape used for dashboardWidgets per widget-registry.tsx comments). On form open, prefill from preferences, mark prefilled fields with subtle "(last used)" hint. Provide a "Reset defaults" link in the form footer.
  • Effort: S
  • Risk: leaks tag/source preference into the wrong port for super-admins switching ports — scope key by (userId, portId, formName).

A7. Pipeline board: drag-to-stage with confirm on "won/lost" pipeline-board.tsx exists. Today reps must click a card → open the interest → open outcome dialog. Drag-to-stage is the natural kanban gesture.

  • Sketch: Add @dnd-kit/sortable (use it if it is already in the tree; if not, it is a very light add). Wire onDragEnd to inline-stage-picker.tsx's mutation. Dropping into the won/lost columns opens interest-outcome-dialog.tsx instead of silently setting the stage.
  • Effort: M
  • Risk: berth-rules-engine fires on eoi_sent / contract_signed triggers — make sure stage drag uses the same advanceStageIfBehind codepath, not a raw stage update.

A8. Saved-view sharing within a port saved-views.service.ts is per-user. Sales teams want a shared "Hot leads — March" view.

  • Sketch: Add visibility: 'private' | 'shared' column to savedViews; service list() returns own + shared. Permission gate: savedViews.share (new). Show a "Share" toggle in save-view-dialog.tsx.
  • Effort: M
  • Risk: low — additive; ensure shared views can't expose entity rows the viewer lacks permission for (filter happens server-side on data fetch, not view definition, so already safe).

A9. Bulk "Move to folder" in documents hub Documents hub (hub-root-view.tsx, entity-folder-view.tsx, flat-folder-listing.tsx) supports single-item move via move-to-folder-dialog.tsx. No multi-select. Admins post-importing 200 docs spend 200 clicks.

  • Sketch: Add row-checkboxes to document-list.tsx, surface Move to folder as a bulk action. Reuse existing move-to-folder-dialog.tsx accepting an array. Service already supports the operation per-item; wrap in a single transaction.
  • Effort: S
  • Risk: system-managed folders already reject mutations via assertNotSystemManaged — bulk move must respect this per-item and report per-item errors (partial success).

A10. Reminder snooze presets in a single hotkey snooze-dialog.tsx exists with a Date picker. Reps want "tomorrow morning", "next Mon", "in 1 week" one-tap.

  • Sketch: Add quick-buttons row to snooze dialog. Same options as Gmail's snooze. Pre-compute target dates relative to user timezone (already wired via inline-timezone-field.tsx).
  • Effort: S
  • Risk: DST — use the existing formatInTimezone helpers, don't add raw ms.

A11. Dashboard widget: "My open EOIs — needs nudge" 13 widgets in widget-registry.tsx; none surface "EOIs sent ≥ 5 days ago, not yet signed, no reminder set". This is the single most actionable rep widget — the deal that's slipping.

  • Sketch: New widget eoi_followups querying documents where status='sent', computed sent_age_days > N (from system_settings.eoi_nudge_days, default 5), grouped by client. Include "Send reminder" action calling existing sendReminder Documenso wrapper.
  • Effort: M
  • Risk: none.

A12. Dashboard widget: "Berths I'm watching" Multiple reps end up specialising on berth subsets. Today no way to pin.

  • Sketch: Add a watchedBerths array under user preferences, "watch" toggle in berth-detail-header.tsx, widget rendering status changes since last view.
  • Effort: S-M

A · EXPLORE

A13. Pipeline "what's due this week" board view A second pipeline-board view mode that columns by next-action-date instead of stage. Useful when stage is similar across many deals but timing varies.

  • Sketch: Toggle in pipeline-board.tsx header switching between stage-mode and date-mode. Bin into "Today / This week / Next week / Later".
  • Effort: M

A14. Inline-editable pipeline-board cards pipeline-card.tsx is read-only; double-click → edit value/notes in place, mirroring the <InlineEditableField> pattern already used everywhere on detail pages.

  • Effort: S

A15. "Open in new tab" cmd-click on any entity row data-table.tsx row click navigates. Need to make every row a real <a href> so cmd-click + middle-click behave natively. Power users coming from Linear / Notion will expect this.

  • Effort: S
  • Risk: keyboard-nav handler from A4 must not interfere with native link semantics.

Section B — Subtle Ergonomic Wins

B · HIGH-VALUE

B1. Auto-save indicator on <InlineEditableField> Inline-editable fields blur-save silently. Reps occasionally close the tab thinking their edit didn't take.

  • Sketch: Tiny "Saved · just now" timestamp ghost-text near the field for 2s after mutation success; "Saving…" spinner while pending. Surface in inline-editable-field.tsx and inline-tag-editor.tsx.
  • Effort: S

B2. Empty-state CTAs everywhere empty-state.tsx exists but several lists fall back to "No results" plain text (e.g. interest-eoi-tab when no EOI yet, client-yachts-tab when none linked).

  • Sketch: Audit every list/tab consumer, wire <EmptyState> with a primary CTA (e.g. "Generate EOI", "Link yacht").
  • Effort: S

B3. Copy-to-clipboard with smarter format Mooring numbers (A1), client phones, IBANs all benefit from "Copy" affordance. Today users select-and-copy from inline-editable fields which produces inconsistent whitespace.

  • Sketch: Add tiny "copy" icon-button next to inline-phone-field.tsx, mooring number display in berth-detail-header.tsx, and bank details in invoice detail. Use the standard navigator.clipboard.writeText with a 1s "Copied" tooltip.
  • Effort: S

B · MEDIUM

B4. Visual indicator for system-managed folders CLAUDE.md says folder-tree-sidebar shows lock markers on system folders. Add the same visual rule to move-to-folder-dialog.tsx — today the dialog lets you select system folders (and gets rejected later by assertNotSystemManaged).

  • Effort: S

B5. "Recently viewed" rail in command-search recently-viewed.service.ts exists; cmd-K opens to all-purpose search. Show last-5-viewed entities at top of palette when no query typed.

  • Effort: S

B6. Inline phone-to-call / phone-to-WhatsApp links inline-phone-field.tsx renders text. Wrap in tel: and append a WhatsApp icon linking to https://wa.me/<E.164>. For a port-side sales team WhatsApp is the primary channel.

  • Effort: S
  • Risk: phone numbers without a + country code break wa.me, so only render the link when the number is E.164-valid.
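
The E.164 guard for B6 could look roughly like the following. It is a minimal sketch: phoneLinks and PhoneLinks are hypothetical names, the normalisation step is an assumption about how reps type numbers, and the wa.me URL uses the international number without the leading "+", which is the form WhatsApp's click-to-chat links expect.

```typescript
// Hypothetical helper: build tel: and wa.me links only for E.164-valid numbers.
const E164 = /^\+[1-9]\d{1,14}$/;

interface PhoneLinks {
  tel: string;
  whatsapp: string;
}

function phoneLinks(raw: string): PhoneLinks | null {
  // Drop spaces, dashes, dots and parentheses before validating.
  const normalized = raw.replace(/[\s\-().]/g, '');
  // No country code (or otherwise invalid): the caller renders plain text.
  if (!E164.test(normalized)) return null;
  return {
    tel: `tel:${normalized}`,
    // wa.me takes the digits only, without the leading "+".
    whatsapp: `https://wa.me/${normalized.slice(1)}`,
  };
}
```

Returning null rather than a best-effort link keeps inline-phone-field.tsx from rendering a WhatsApp icon that opens a broken chat.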

B7. Toast deduplication for realtime invalidation realtime-toasts.tsx (touched in current branch). Multi-edit sessions where one rep edits 8 fields generate 8 toasts on the watching rep's screen. Coalesce within 2s.

  • Effort: S

B8. Filter chip "save as view" shortcut filter-chips.tsx + saved-views-dropdown.tsx exist. Add a small "Save current filters as view" inline button when there's an unsaved filter delta.

  • Effort: S

B · EXPLORE

B9. Command-palette macros "send EOI to last-viewed client", "create reminder in 3 days for current client", etc. Recorded by holding a key while performing actions, then invokable via cmd-K → "Run macro".

  • Effort: L
  • Risk: niche; design-heavy for low payoff. Push to backlog.

B10. Inline timezone awareness on dates timezone-drift-banner.tsx warns of drift. Extend: every formatDate in detail headers shows Mon 14 May · 14:32 (your time) · 15:32 (client time) on hover when client timezone is known.

  • Effort: S

B11. "Pin" comment/note notes.service.ts is polymorphic; add a pinned: boolean column and surface pinned notes at the top of every tab.

  • Effort: S

Section C — Genuine AI Integration Opportunities

Existing AI surfaces grounded in this repo: admin/ai and admin/ocr admin pages; email-draft.service.ts (compose suggestion via /api/v1/ai/email-draft); interest-scoring.service.ts (pure SQL — not AI today, candidate for AI uplift); berth-pdf-parser.ts (AI is the 3rd parser tier); expense-ocr.service.ts + receipt-scanner.ts (OCR + structuring); ai-budget.service.ts (cost-budget gate). The OpenAI SDK is wired but optional. All proposals below assume model calls go through a service that respects ai-budget and an explicit per-port enable flag.

C · HIGH-VALUE

C1. Auto-summarize a client / interest on detail open When a rep opens a client/interest, summarize: "5 EOIs over 18 months, 2 archived, last touched 12 days ago by María, current stage is contract-out — last note suggests cash-flow concern; berth A4 is the primary." Plays directly into A3 (what-changed digest).

  • Sketch: New /api/v1/ai/entity-summary endpoint accepting entityType + entityId, gathering activity log + notes + linked entities (already available via entity-activity.service.ts), prompting GPT for a 3-sentence summary. Cache by (entityId, last_activity_id) in Redis. Surface as a collapsible card above detail-header-strip.tsx. Always show "View source" → activity feed; never hide raw data.
  • Effort: M
  • Risk: confabulation — model invents a number. Mitigate: structured prompt that returns JSON with claims: [{text, sourceActivityIds: []}], render only claims with non-empty source IDs. Hard 200-token cap.
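
The confabulation mitigation in C1 amounts to a filter over the model's JSON. A minimal sketch follows, assuming the response shape claims: [{text, sourceActivityIds}] from the bullet above; the function name groundedClaims and the extra check against the set of activity IDs actually sent to the model are assumptions added for illustration.

```typescript
interface SummaryClaim {
  text: string;
  sourceActivityIds: string[];
}

// Keep only claims that cite at least one activity row we actually supplied.
function groundedClaims(modelJson: string, knownActivityIds: Set<string>): SummaryClaim[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(modelJson);
  } catch {
    return []; // malformed output: show no summary rather than a guess
  }
  if (typeof parsed !== 'object' || parsed === null) return [];
  const claims = (parsed as { claims?: unknown }).claims;
  if (!Array.isArray(claims)) return [];

  const result: SummaryClaim[] = [];
  for (const c of claims) {
    if (typeof c !== 'object' || c === null) continue;
    const { text, sourceActivityIds } = c as { text?: unknown; sourceActivityIds?: unknown };
    if (typeof text !== 'string' || !Array.isArray(sourceActivityIds)) continue;
    // Drop IDs the model invented; a claim with no surviving ID is discarded.
    const ids = sourceActivityIds.filter(
      (id): id is string => typeof id === 'string' && knownActivityIds.has(id),
    );
    if (ids.length > 0) result.push({ text, sourceActivityIds: ids });
  }
  return result;
}
```

Each surviving claim carries its source IDs, which is what lets the card link back to the activity feed.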

C2. Semantic search across notes, email bodies & document content search-nav-catalog.ts is keyword-based. Reps searching "the client who was worried about wave exposure" can't find anything. The biggest practical AI win in a CRM.

  • Sketch: Add an embeddings table (pgvector — already supported by Postgres). Embed notes.body, email_messages.text, signed-document OCR text, on insert via a new BullMQ embeddings worker (sibling to workers/ai.ts). Add /api/v1/search/semantic returning ranked entityIds. Toggle in cmd-K palette between "Exact match" and "Semantic". Cite source row per hit.
  • Effort: L
  • Risk: PII flowing to OpenAI embeddings. Use a local embedding model (gte-small via fastembed/onnx) per lib/ai design — never ship raw notes to OpenAI for embedding. Document this clearly in CLAUDE.md.

C3. Interest scoring uplift — hybrid SQL + lightweight learned model interest-scoring.service.ts is pure rule-based (pipelineAge, stageSpeed, etc.). It works but reps disagree on signal weights. Train a per-port logistic regression on historical outcome (won/lost) using current factors + a few new ones (days since last note, last email response time, deposit pattern). Output a calibrated probability.

  • Sketch: New nightly job train-interest-model in workers/ai.ts using a tiny library (no GPT — pure numerical). Persist coefficients in system_settings.interest_model. Service applies them at scoring time. Expose model AUC on admin/ai.
  • Effort: L
  • Risk: per-port data thin (cold start). Default to SQL weights until ≥30 closed interests exist. Document drift detection — refuse to serve a model with AUC ≤ 0.6.
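
The cold-start and drift rules in C3 reduce to a small gate in front of the scorer. The sketch below takes the thresholds (30 closed interests, AUC 0.6) from the bullet above; LearnedModel, ScorerChoice and chooseScorer are hypothetical names, and the stored model shape is an assumption.

```typescript
interface LearnedModel {
  auc: number;
  closedInterests: number; // size of the training set for this port
  coefficients: Record<string, number>;
}

type ScorerChoice =
  | { kind: 'sql-weights'; reason: string }
  | { kind: 'learned'; model: LearnedModel };

const MIN_CLOSED_INTERESTS = 30;
const MIN_AUC = 0.6;

// Decide per port whether the learned model may be served at scoring time.
function chooseScorer(model: LearnedModel | null): ScorerChoice {
  if (!model) return { kind: 'sql-weights', reason: 'no trained model' };
  if (model.closedInterests < MIN_CLOSED_INTERESTS) {
    return { kind: 'sql-weights', reason: 'cold start: too few closed interests' };
  }
  if (model.auc <= MIN_AUC) {
    return { kind: 'sql-weights', reason: 'AUC at or below threshold' };
  }
  return { kind: 'learned', model };
}
```

Returning the reason alongside the fallback makes it cheap to show on admin/ai why a port is still on SQL weights.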

C4. Smart reminder suggestions from email content Inbox (email-threads-list.tsx) already exists. When a client email contains "Let's chat next Tuesday" or "I'll get back to you in two weeks", surface a one-click "Create reminder for 21 May".

  • Sketch: On new email_messages insert, the existing worker calls a new extractActionableDates(body) GPT prompt returning JSON {candidates: [{date, summary, confidence}]}. Surface as a banner in email-threads-list.tsx and in the matching interest's reminder rail. Never auto-create — always suggest.
  • Effort: M
  • Risk: dates in client signatures / disclaimers ("This email was generated on …") fool the model. Filter low-confidence; cap one suggestion per message.

C · MEDIUM

C5. "Why this berth?" + "Why not?" explanation for the recommender berth-recommender.service.ts outputs a tier (A/B/C/D) + heat score. Reps can't always articulate to the client why a specific berth made the shortlist.

  • Sketch: Add an LLM rephrasing step over the structured tier-reasoning JSON (already produced by the service). Returns plain-English: "Tier A: matches your yacht's 22m LOA + 5m beam, on the protected pontoon, currently available, no historical pushback." Render inside berth-recommender-panel.tsx. Source data is fully structured → low confabulation risk.
  • Effort: S
  • Risk: explanation must never contradict the structured tier. Add an automated unit assertion that the explanation contains the tier label and the dimensions field.

C6. Auto-draft post-meeting note from a voice memo Reps walk back from a viewing with a 90s phone recording. Today they re-type. Drop the audio into the client's notes tab, Whisper transcribes + GPT summarizes into note-friendly bullet points.

  • Sketch: Add audio-note-upload action to notes-list.tsx. Worker pipeline: upload via storage backend → Whisper → GPT bullets → insert as a draft note flagged ai_generated=true. Rep reviews + saves.
  • Effort: M
  • Risk: Whisper accent accuracy on Polish / Italian names. Always preserve the raw audio + transcript alongside the bullets; never delete the source.

C7. Translation for portal/client comms Polish reps writing English. English reps writing Polish. Currently they paste into Google Translate.

  • Sketch: Add a translate-icon button to compose-dialog.tsx and notes-list.tsx. One-click translates a draft into the client's preferred language (already tracked on clients.preferredLanguage). Show both versions side-by-side before send.
  • Effort: S
  • Risk: never auto-translate without rep confirmation, especially for any contractual phrasing.

C8. Document-template merge-field auto-population from client context merge-fields.ts catalog + eoi-context.ts already do structured population. Where merge fields lack a structured source (admin templates with {{custom_intro}} blanks), an LLM could draft from notes + client profile. Rep then reviews.

  • Sketch: New "Suggest draft" button on each blank merge field at template-fill time. Returns 2-3 phrasings; rep picks one.
  • Effort: M
  • Risk: see "what NOT to AI-ify" below — this is borderline. Allowable only for non-legal merge fields (greeting, intro paragraph), explicitly blocked for legal/financial blanks.

C9. Photo categorisation for berth/yacht uploads Berth PDFs are parsed; raw photos uploaded to yacht/berth detail aren't tagged. AI auto-tagging would speed search for "yachts with a bowsprit" or "berths with a fixed davit".

  • Sketch: On image upload via image-cropper-dialog.tsx's completion, queue a vision job that returns 3-5 tags (drawn from a controlled vocabulary). Store as photo metadata. Search filters use vocabulary terms.
  • Effort: M
  • Risk: vision-model bias / hallucinated features. Constrain output to a port-defined vocabulary list; reject anything outside it.

C · EXPLORE

C10. Conflict / clause-mismatch detection across templates and signed copies When admins edit a template, did the new clause contradict something they wrote in another template? When a counterparty returns a "with edits" PDF (currently uploaded via external-eoi-upload-dialog.tsx), did they alter a non-trivial clause?

  • Sketch: Embed each clause; on template save, surface "this clause is 0.92 similar to but materially differs from a clause in Template X". On external-EOI upload, diff against the canonical template's text and flag deltas in a yellow strip with "Reviewed by [rep]" before the rep can finalize.
  • Effort: L
  • Risk: false confidence — see "what NOT to AI-ify". Acceptable only as an assistive flag, never as a green-light. UI copy must say "Possible material difference detected — review required" not "No material difference".

C11. Expense anomaly detection beyond expense-dedup.service.ts expense-dedup.service.ts handles exact duplicates. Layered AI: detect amounts outside the rolling p95 for the same vendor, or trip-labels that look mismatched against expense date.

  • Sketch: Nightly job computes per-vendor p95 and flags outliers as expense_anomaly reminders for the admin.
  • Effort: M
  • Risk: low — it's a soft flag, not an auto-action. No money movement is gated.
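
The per-vendor p95 check in C11 is plain arithmetic and needs no model call. A minimal sketch, assuming a nearest-rank percentile and a minimum sample size before a vendor is judged (the minSamples default of 20 and all names here are illustrative, not taken from expense-dedup.service.ts):

```typescript
interface Expense {
  id: string;
  vendor: string;
  amount: number;
}

// Nearest-rank percentile on a sorted copy; one of several common definitions.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)] ?? 0;
}

// Return incoming expenses whose amount exceeds the vendor's historical p95.
function flagOutliers(history: Expense[], incoming: Expense[], minSamples = 20): Expense[] {
  const byVendor = new Map<string, number[]>();
  for (const e of history) {
    const amounts = byVendor.get(e.vendor) ?? [];
    amounts.push(e.amount);
    byVendor.set(e.vendor, amounts);
  }
  return incoming.filter((e) => {
    const amounts = byVendor.get(e.vendor);
    // Too little history to judge: stay silent rather than raise noise.
    if (!amounts || amounts.length < minSamples) return false;
    return e.amount > percentile(amounts, 95);
  });
}
```

The nightly job would turn each returned expense into an expense_anomaly reminder; nothing is blocked or auto-corrected.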

C12. Smart vocabulary maintenance vocabularies table holds lead-sources etc. Over time, reps spawn synonyms ("Inst.", "Instagram", "IG"). Cluster + suggest merges to the admin.

  • Effort: S-M

Section C+ — What NOT to AI-ify (critical pass)

These places either carry liability if the model confabulates, or have a tighter ground-truth than AI can match. Refuse the AI proposal even if it sounds appealing.

  • Legal text in EOIs, contracts, reservation agreements. eoi-context.ts, document-templates.service.ts, reservation-agreement-context.ts. The merge-field allow-list (VALID_MERGE_TOKENS in merge-fields.ts) exists precisely to keep AI out of legal copy. Never AI-generate a clause; never AI-paraphrase a clause "for readability"; never AI-translate a clause and present the translation as binding. Keep all legal text rep-authored or counsel-authored, period.
  • Money flow. Invoice amounts, deposit allocation, currency conversions, FX rate selection (currency.ts, invoices.ts). The audit-26 multi-currency audit is in flight precisely because money math has to be deterministic and reconcilable. AI here = unrecoverable customer trust damage on a single mistake.
  • Regulatory / GDPR responses. gdpr-export.service.ts, gdpr-bundle-builder.ts. Subject-access requests must return exactly what's in the database, with no LLM summarization layer that could omit a record.
  • Signing decisions. The Documenso webhook (handleDocumentCompleted idempotency, audit-tier 1) is the source of truth that a contract was signed. AI must never infer signing state from email content. If the contract isn't in the webhook stream as DOCUMENT_COMPLETED, it isn't signed.
  • Berth assignment auto-commit. berth-recommender.service.ts is intentionally pure SQL; the rules engine is intentionally suggest by default. Don't change that — auto-binding a berth to a client based on an LLM "judgment" is exactly the kind of mistake that ends in a refund and an apology. Recommend, never auto-assign.
  • Mooring-number / dimensions parsing. The 3-tier PDF parser (AcroForm → OCR → AI) escalates to AI only when OCR confidence is low and a rep clicks "AI parse" and a mooring-mismatch confirmation is required at apply time (berth-pdf.service.ts). Don't lower any of those guards.
  • Pipeline outcome ("won" / "lost"). This drives revenue reporting (reports.service.ts). Setting an outcome must remain a human decision. AI may suggest "this looks won based on the signed contract", but the human clicks the button.
  • Email send-side text in template-driven send-outs. document-sends.service.ts rate-limits and audits. AI-generated wording is fine for free-form composes (compose-dialog.tsx) where the rep reviews. AI-generated wording is not fine on bulk template sends where one bad phrasing reaches 50 clients before anyone notices.
  • Audit log entries. Audit logs (audit.service.ts) must remain raw structured events. Never let AI rewrite or compress them.
  • Permission overrides. user_permission_overrides (new in this branch). AI must never suggest or auto-apply grant/revoke — that's a security primitive.

Implementation sequencing recommendation

If the team wants a 2-sprint shipping bundle aligned with the existing branch's themes:

  1. Sprint 1 (UX, low risk): A1, A4, A5, A6, A11, B1, B2, B3, B5 — everything tagged S or low-M, no new infra.
  2. Sprint 2 (AI runway): Build the lib/ai skeleton (budget gate is in place; need a local-embedding pipeline + a worker) → land C1 (entity summary) and C5 (recommender explanation), both low-risk because they wrap structured data. Defer C2 (semantic search) until the embedding worker is proven.
  3. Backlog: A2 (smart undo — needs rules-engine reverse design), A7 (drag-to-stage on board), C3 (learned scoring — needs sufficient closed-deal volume per port), C10 (clause conflict — handle with extreme care).

Every C-section proposal should ship behind a per-port admin toggle (system_settings.ai_features.<name>) and respect ai-budget.service.ts. Every AI surface must cite its source rows or be flagged as "AI assistance".

— End of report —


22. Date/time + DST + scheduled jobs audit (datetime-auditor)

Date/time + DST + scheduled jobs audit — 2026-05-12

Scope: BullMQ cron schedules, reminder dueAt round-trip, TZ drift banner, server-side date formatting, ISO-8601, jobs that fire around midnight in user TZ vs server UTC, DST transitions, leap years, end-of-month.

CRITICAL

C1 — Reminder dueAt round-trip shifts by user-TZ offset on every edit

src/components/reminders/reminder-form.tsx:86,99,119

setDueAt(reminder.dueAt.slice(0, 16)); // line 86 — load
tomorrow.toISOString().slice(0, 16); // line 99 — default
new Date(dueAt).toISOString(); // line 119 — submit

reminder.dueAt is an ISO-8601 UTC string (...Z). slice(0, 16) keeps only the first 16 characters, dropping the seconds and the Z suffix, which yields 2026-05-15T13:30. That value is fed into an <input type="datetime-local">, which interprets it as local time. On submit, new Date('2026-05-15T13:30') parses as local time and .toISOString() converts back to UTC, subtracting the user's UTC offset. So in Warsaw (CEST, UTC+2) every save of an existing reminder shifts the time backward by 2 h. Open and save again, and it shifts another 2 h. End result: a reminder created for 10:00 local (stored as 08:00Z) is stored as 06:00Z after the first edit, 04:00Z after the second, and keeps sliding until it wraps past midnight onto the previous day.

The "default tomorrow 9 AM" path has the same bug at creation time: tomorrow.setHours(9,0,0,0) gives 09:00 local, then .toISOString().slice(0,16) converts to UTC and drops the Z, so the input shows 07:00 (the UTC value) to the user, who reads it as 07:00 local. On submit it stores 05:00 UTC.

The contact-log dialog at src/components/interests/interest-contact-log-tab.tsx:459-469 already implements the correct pattern (localIsoString building the local HH:MM from getHours()/getMinutes()). Port it to reminder-form.tsx and snooze-dialog.tsx. Same applies to any other future datetime-local binding.
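
A minimal sketch of the round-trip-safe pattern, modelled on the description of localIsoString above rather than copied from it. The function names are hypothetical; the only behaviour relied on is that new Date() parses a date-time string without an offset as local time.

```typescript
const pad = (n: number): string => String(n).padStart(2, '0');

// UTC ISO string -> value for <input type="datetime-local"> in the browser's zone.
function toDatetimeLocalValue(iso: string): string {
  const d = new Date(iso);
  // Local getters, never iso.slice(0, 16): the slice would show UTC digits as local.
  return (
    `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}` +
    `T${pad(d.getHours())}:${pad(d.getMinutes())}`
  );
}

// datetime-local value (local wall time, no zone marker) -> UTC ISO string for the API.
function fromDatetimeLocalValue(value: string): string {
  return new Date(value).toISOString();
}

// "Tomorrow 09:00 local" as a picker value, without going through toISOString().slice().
function defaultDueAtValue(now: Date = new Date()): string {
  const t = new Date(now);
  t.setDate(t.getDate() + 1);
  t.setHours(9, 0, 0, 0);
  return toDatetimeLocalValue(t.toISOString());
}
```

With these helpers, loading a reminder and saving it unchanged sends back the same instant, whatever the browser's zone.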

C2 — BullMQ recurring jobs run in UTC, not in port-local time

src/lib/queue/scheduler.ts:66-72

await queue.upsertJobScheduler(
  job.name,
  { pattern: job.pattern }, // no `tz` option
  { data: {}, name: job.name },
);

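BullMQ's RepeatOptions takes an optional tz; when it is unset, the cron pattern is evaluated in the worker process's own timezone, which is UTC on the UTC-deployed container. Concrete fallout for the Warsaw port (CET/CEST, UTC+1/+2):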

| Pattern | Intent | Actual fire (CET / CEST) |
| --- | --- | --- |
| `0 8 * * *` (invoice-overdue, tenure-expiry) | "8 AM local" | 09:00 winter / 10:00 summer |
| `0 2 * * *` (database-backup) | "2 AM local" | 03:00 winter / 04:00 summer |
| `0 4 * * *` (session-cleanup, gdpr cleanup) | "4 AM local" | 05:00 winter / 06:00 summer |
| `0 3 * * 0` (backup-cleanup) | "Sunday 3 AM" | Sun 04:00 winter / 05:00 summer |

Twice a year (last Sun of March, last Sun of October) the local firing time visibly shifts by an hour and admin docs ("daily check at 8 AM") silently break. Fix: pass tz: process.env.SCHEDULER_TZ ?? 'Europe/Warsaw' (or read per-port — see also C3) to every upsertJobScheduler. The hourly/sub-hourly patterns (* * * * *, */N * * * *, 0 * * * *) are TZ-invariant and don't need a tz.
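
A sketch of how the scheduler loop might pass tz only where it matters. SchedulerQueue is a structural stand-in for the single BullMQ method used in scheduler.ts, so the example runs without the library; isTzInvariant, repeatOptionsFor and registerRecurringJobs are hypothetical names, and the wildcard test is a simplification that assumes whole-hour UTC offsets.

```typescript
// Stand-in for the one queue method used here (mirrors the call shown above).
interface SchedulerQueue {
  upsertJobScheduler(
    id: string,
    repeat: { pattern: string; tz?: string },
    template: { name: string; data: Record<string, unknown> },
  ): Promise<unknown>;
}

interface RecurringJob {
  name: string;
  pattern: string;
}

// True when hour, day-of-month, month and day-of-week are all wildcards,
// i.e. the pattern only constrains minutes (and optionally seconds).
function isTzInvariant(pattern: string): boolean {
  const fields = pattern.trim().split(/\s+/).slice(-5); // ignore an optional seconds field
  return fields.length === 5 && fields.slice(1).every((f) => f === '*');
}

function repeatOptionsFor(pattern: string, tz: string): { pattern: string; tz?: string } {
  return isTzInvariant(pattern) ? { pattern } : { pattern, tz };
}

async function registerRecurringJobs(
  queue: SchedulerQueue,
  jobs: RecurringJob[],
  tz: string, // e.g. the SCHEDULER_TZ value, or a per-port zone
): Promise<void> {
  for (const job of jobs) {
    await queue.upsertJobScheduler(job.name, repeatOptionsFor(job.pattern, tz), {
      name: job.name,
      data: {},
    });
  }
}
```

Passing tz unconditionally would also be correct; the split just keeps the sub-hourly schedulers byte-identical to what is registered today.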

C3 — report-scheduler never advances next_run_at

src/lib/queue/workers/reports.ts:22-50, src/lib/services/reports.service.ts

The minutely scheduler selects WHERE next_run_at <= now(), enqueues a generate-report job, and inserts a generated_reports row — but does not bump scheduled_reports.next_run_at. There is no other write of that column anywhere in the service layer or API. Effect: once a scheduled report comes due, the worker re-queues it every minute, forever, until a human zeros the row out. For weekly/monthly reports this means an instant flood of duplicate emails to recipients.

After enqueueing, write a new next_run_at derived from the cron expression (use cron-parser or equivalent; project already vendors croner-style logic via BullMQ's repeat machinery). Wrap the SELECT + UPDATE in a transaction with FOR UPDATE SKIP LOCKED so two scheduler ticks racing on the same row can't double-fire.

HIGH

H1 — detectOverdue compares against UTC "today"

src/lib/services/invoices.ts:763

const today = new Date().toISOString().split('T')[0]!;
// ... lt(invoices.dueDate, today)

invoices.due_date is a DATE. Building "today" from toISOString() returns the UTC calendar date. The cron fires at 08:00 UTC (= 09:00 / 10:00 local), so today-in-UTC and today-in-Warsaw agree at that moment. But if a human ever calls detectOverdue between 00:00-02:00 local (still yesterday in UTC), the comparison date is one day behind the port's calendar, so invoices that became overdue at local midnight are not flagged until the UTC date catches up. For a port west of UTC the same mismatch runs the other way and flags invoices a day early. Compute the comparison date in port-local time (Intl + formatToParts).
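
The port-local calendar date can be derived with Intl alone. A minimal sketch; localDateInZone is a hypothetical name, and where the zone string comes from (ports.timezone or similar) is left to the caller.

```typescript
// Calendar date (YYYY-MM-DD) of an instant as seen in a given IANA zone.
function localDateInZone(instant: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
  }).formatToParts(instant);
  // Pick fields by type so the result does not depend on locale ordering.
  const get = (type: Intl.DateTimeFormatPartTypes): string =>
    parts.find((p) => p.type === type)?.value ?? '';
  return `${get('year')}-${get('month')}-${get('day')}`;
}
```

detectOverdue would then compare invoices.dueDate against localDateInZone(new Date(), portTimezone) instead of the UTC split.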

H2 — Server-side PDF/email date formatting has no timeZone

src/lib/pdf/templates/reports/*.ts, src/lib/pdf/templates/*.ts, src/lib/email/templates/document-signing.ts:141

Many calls of the form new Date().toLocaleString('en-GB') or new Date(...).toLocaleDateString('en-GB') with no { timeZone } option. On a UTC-deployed Docker container the output is UTC even when the PDF context is per-port-local. "Generated: 11/05/2026, 22:30:00" on a report a Warsaw rep opens at 00:30 the next morning is confusing. Pass { timeZone: portTimezone } (resolve from ports.timezone or port_settings) into every server-side formatter.

H3 — Notification-digest TZ gate skips a day on DST spring-forward

src/lib/services/notification-digest.service.ts:79-83

The local-hour gate works correctly in steady state, but on the spring-forward boundary (e.g. Warsaw, last Sunday of March, 02:00 → 03:00 CEST) a digest configured for 02:00 is skipped entirely, because the local hour jumps from 01 to 03. Conversely, on fall-back (CEST → CET, 03:00 → 02:00) a 02:00 digest fires twice in the same calendar day. Document the gap or, better, gate on a last-sent (port_id, local-date) record rather than on the hour alone.
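
One way to express the (port_id, local-date) gate is sketched below. It assumes the caller stores the last local date on which a digest went out; localClock and shouldSendDigest are hypothetical names, and using "hour >= configured hour" instead of equality is a design choice made here so that a skipped hour is caught up on the next tick.

```typescript
interface LocalClock {
  date: string; // YYYY-MM-DD in the port's zone
  hour: number; // 0-23 in the port's zone
}

function localClock(instant: Date, timeZone: string): LocalClock {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hourCycle: 'h23',
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
    hour: '2-digit',
  }).formatToParts(instant);
  const get = (type: Intl.DateTimeFormatPartTypes): string =>
    parts.find((p) => p.type === type)?.value ?? '';
  return {
    date: `${get('year')}-${get('month')}-${get('day')}`,
    hour: Number(get('hour')),
  };
}

// Send at most once per local calendar day, at or after the configured hour.
function shouldSendDigest(
  now: Date,
  timeZone: string,
  digestHour: number,
  lastSentLocalDate: string | null,
): boolean {
  const { date, hour } = localClock(now, timeZone);
  return hour >= digestHour && lastSentLocalDate !== date;
}
```

On spring-forward the 03:00 tick sends the 02:00 digest; on fall-back the second 02:00 is suppressed because the local date has already been recorded.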

H4 — Reminders fire/list use new Date() against UTC-stored timestamps but UI shows port-local

src/lib/services/reminders.service.ts:87, 105, 515

lte(reminders.dueAt, new Date()) is correct (dueAt is timestamptz), and processOverdueReminders, which runs every 15 minutes, emails users on the first tick after the UTC instant passes. If a rep sets a reminder for "Friday 17:00" in Warsaw, the email lands at about 17:00 CEST, which is fine. The problem is the rendering: the email template (notifications insert) renders the time in the server's zone, which is the same issue as H2. Verify that the user-facing email body renders dueAt in the recipient's preferred timezone (userProfile.preferences.timezone), not server UTC.

MEDIUM

M1 — TZ-drift banner endpoint asymmetry

src/components/dashboard/timezone-drift-banner.tsx:62-75

Reads from GET /api/v1/me (returns profile.preferences.timezone), writes to PATCH /api/v1/users/me/preferences (a different preferences JSONB row). Both endpoints exist and both ultimately update user_profiles.preferences, so functionally fine — but having two endpoints write the same blob with different validators (/me allow-lists {dark_mode, locale, timezone, tablePreferences}, /users/me/preferences uses updateUserPreferencesSchema) means a key accepted on one endpoint may be silently dropped on the other. Either merge into a single endpoint or document which is canonical.

M2 — Alpine small-ICU risk for per-port Intl.DateTimeFormat({ timeZone })

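notification-digest.service.ts localHourFor and any future per-TZ formatter need full-ICU. If the Docker base is Alpine without full-icu, named zones silently fall back to UTC and the catch swallows it. Add a startup self-test confirming that the hour produced by Intl.DateTimeFormat('en', { timeZone: 'Europe/Warsaw', hour: 'numeric' }).format(new Date()) differs from the UTC hour. The hour option matters: the default format prints only the date, which equals the UTC date for most of the day.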
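
A possible shape for the startup self-test, using a fixed probe instant so the expected hours are known in advance. hourInZone and hasNamedZoneSupport are hypothetical names; what the caller does on failure (log, refuse to boot) is not specified in this audit.

```typescript
function hourInZone(instant: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hourCycle: 'h23',
    hour: '2-digit',
  }).formatToParts(instant);
  return Number(parts.find((p) => p.type === 'hour')?.value);
}

// Returns false when named zones are unavailable or silently resolve to UTC.
function hasNamedZoneSupport(): boolean {
  // Fixed winter instant: Warsaw is UTC+1, so the hours must be 13 and 12.
  const probe = new Date('2026-01-15T12:00:00Z');
  try {
    return hourInZone(probe, 'Europe/Warsaw') === 13 && hourInZone(probe, 'UTC') === 12;
  } catch {
    // A runtime that does not know the zone name throws RangeError.
    return false;
  }
}
```

Probing a fixed instant avoids a test that passes or fails depending on the time of day the container starts.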

M3 — Contact-log followUpAt validator is looser than reminders

src/lib/validators/interest-contact-log.ts:14,23

z.coerce.date() accepts unzoned strings. Tighten to z.string().datetime() to match the direct reminders endpoint.

M4 — BR-060 follow-up uses raw ms-arithmetic for "days since"

src/lib/services/reminders.service.ts:438

(now - lastActivity) / 86_400_000 under/over-counts by 1 h across DST boundaries. Cosmetic for 14-day windows; document the rounding bias.

M5 — Greeting hourly tick uses setInterval(3_600_000)

src/components/dashboard/dashboard-shell.tsx:113 — drifts across DST. Use a recursive setTimeout keyed to next local hour boundary.
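
The recursive setTimeout suggested for M5 might be sketched as follows. msUntilNextLocalHour and scheduleHourly are hypothetical names; the delay is computed from the local minute, second and millisecond fields, so each tick re-aligns to the wall clock instead of accumulating drift.

```typescript
// Milliseconds from `now` to the next local hh:00:00.000 boundary.
function msUntilNextLocalHour(now: Date): number {
  const elapsedInHour =
    now.getMinutes() * 60_000 + now.getSeconds() * 1_000 + now.getMilliseconds();
  return 3_600_000 - elapsedInHour;
}

// Calls onTick at every local hour boundary; returns a cancel function.
function scheduleHourly(onTick: () => void): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const arm = (): void => {
    timer = setTimeout(() => {
      onTick();
      arm(); // recompute the delay from the current clock on every tick
    }, msUntilNextLocalHour(new Date()));
  };
  arm();
  return () => clearTimeout(timer);
}
```

In dashboard-shell.tsx the cancel function would be returned from the effect so the timer is cleared on unmount.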

ISO-8601 conformance summary

  • Reminder writes/emit: z.string().datetime() + .toISOString()
  • Contact-log writes: z.coerce.date() — loose, see M3.
  • type="date" fields serialize as YYYY-MM-DD matching DB DATE. ✓
  • PDF/email render: mixed; H2 covers the missing timeZone.

Round-trip recap (picker → DB → email)

  1. datetime-local value is local time, no TZ marker.
  2. new Date(v).toISOString() → UTC Z form to API.
  3. DB timestamptz stores the instant.
  4. Re-render to picker via localIsoString(iso) (build local YMD/HM from getHours() etc.) — never iso.slice(0,16).
  5. Email/PDF render with { timeZone: portOrUserTz }.

C1 is the only place this breaks today. Once C1 is fixed, together with C2 and C3, the chain is consistent.

Out of scope

  • No node-cron / croner jobs outside BullMQ.
  • No Date.UTC construction; everything via new Date(...) / Date.now().
  • No Temporal adoption; defer until Node 22 LTS unflags it.

24. File lifecycle + storage drift audit (file-lifecycle-auditor)

Audit — File lifecycle + storage drift

Scope: orphan blobs, stale folder rows, avatar cleanup, EOI signed-PDF orphans, brochure / berth_pdf version retention, storage-swap migration completeness, demoteSystemFolderOnEntityDelete, file_id orphans after document delete, GDPR-export ZIP retention.

Branch: feat/documents-folders @ 660553c. Read-only.


CRITICAL

C1. Avatar replacement leaks files rows + S3 blobs forever

src/app/api/v1/me/avatar/route.ts POST uploads a NEW file via uploadFile() and overwrites user_profiles.avatar_file_id — but never reads or deletes the previous id. Every "Replace photo" leaks one DB row + one blob, untethered (no client_id/yacht_id/company_id), so invisible to every existing UI sweep.

// no read of old avatarFileId, no cleanup
await db
  .update(userProfiles)
  .set({ avatarFileId: record.id, updatedAt: new Date() })
  .where(eq(userProfiles.userId, ctx.userId));

Fix: SELECT the prior avatar_file_id, call deleteFile() (already handles ref-check + blob + audit), wrapped in try/catch so a stale-blob failure doesn't block the new avatar.
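
The fix can be sketched with the database and file service injected as plain functions, so the ordering and the try/catch are visible without the Drizzle and storage wiring. AvatarDeps and replaceAvatar are hypothetical names; deleteFile here stands for the existing service function mentioned above, whose real signature is not shown in this audit.

```typescript
interface AvatarDeps {
  getAvatarFileId(userId: string): Promise<string | null>;
  setAvatarFileId(userId: string, fileId: string): Promise<void>;
  deleteFile(fileId: string): Promise<void>; // ref-check + blob + audit
  logWarning(message: string, error: unknown): void;
}

async function replaceAvatar(deps: AvatarDeps, userId: string, newFileId: string): Promise<void> {
  // Read the old pointer BEFORE overwriting it, otherwise it is lost for good.
  const previousId = await deps.getAvatarFileId(userId);
  await deps.setAvatarFileId(userId, newFileId);

  if (previousId && previousId !== newFileId) {
    try {
      await deps.deleteFile(previousId);
    } catch (error) {
      // A stale-blob failure must not block the new avatar; leave it for a sweep.
      deps.logWarning(`avatar cleanup failed for file ${previousId}`, error);
    }
  }
}
```

Deleting after the pointer swap means a crash between the two steps leaves an orphan at worst, never a profile pointing at a deleted file.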


HIGH

H1. handleDocumentCompleted put-before-insert leaks signed-PDF blobs on retry storms

src/lib/services/documents.service.ts:1131-1188. Sequence: storage.put → db.insert(files) → db.update(documents).set(signedFileId). The idempotency gate at line 1110 stops a second webhook from minting a second blob, but only if doc.status === 'completed' AND signedFileId is set, which requires step 3 to have run. If step 2 OR step 3 throws on attempt N, the blob from step 1 survives with no DB pointer; Documenso retries; the gate doesn't trip (status still not 'completed'); step 1 runs again with a fresh UUID storage path. Each retry compounds an orphan.

Fix: either insert the files row in a pending state BEFORE storage.put (so failure rolls back via FK / explicit cleanup), or reuse a stable storage key derived from documents.id so retries overwrite the same blob.
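
The second option (a stable key) can be illustrated with an in-memory stand-in for the storage backend. The key layout below is an assumption, since the audit does not show the real `buildStoragePath()` format, and `InMemoryStorage` / `handleCompletedAttempt` are hypothetical names used only for this sketch.

```typescript
// Hypothetical key layout; the audit does not show buildStoragePath()'s real format.
function signedPdfKey(portSlug: string, documentId: string): string {
  return `${portSlug}/eoi-signed/${documentId}.pdf`;
}

// In-memory stand-in for the storage backend: put() overwrites by key.
class InMemoryStorage {
  private readonly blobs = new Map<string, string>();

  put(key: string, body: string): void {
    this.blobs.set(key, body);
  }

  blobCount(): number {
    return this.blobs.size;
  }
}

// Simulated webhook attempt: put first, then the DB steps, which may throw.
function handleCompletedAttempt(
  storage: InMemoryStorage,
  portSlug: string,
  documentId: string,
  pdf: string,
  dbSteps: () => void,
): void {
  // The key depends only on the document id, so a retry overwrites the same blob.
  storage.put(signedPdfKey(portSlug, documentId), pdf);
  dbSteps(); // stands in for: insert files row + update documents.signedFileId
}
```

With a UUID-per-attempt key, three attempts with two DB failures would leave three blobs; with the derived key there is only ever one, whatever the retry count.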

H2. deleteDocument strands fileId + signedFileId rows + blobs

src/lib/services/documents.service.ts:596-616 does db.delete(documents) only. Both file FKs are plain references() (no cascade, no SET NULL) — the document row vanishes but the files rows + blobs survive with no link back. For a cancelled/expired doc with signedFileId (the sent/partially_signed block at line 599 doesn't cover these), the signed contract PDF — containing PII — is permanently orphaned in storage.

Fix: in deleteDocument, also delete dependent files rows via deleteFile(), or refuse the delete if files attached (mirroring deleteFile's ref-check).

H3. Brochure versions: zero cleanup, ever

src/lib/services/brochures.service.ts:191 archiveBrochure only flips archivedAt + clears isDefault. No version-row delete, no blob delete. No "delete prior version" admin endpoint, no retention cron, no rolling cap. CLAUDE.md says "Archived brochures retain version history" — that's by design, but there's also zero path to ever drop one. With ~10 MB PDFs iterated monthly, linear unbounded growth.

Fix: admin deleteBrochureVersion(brochureId, versionId) endpoint (blob delete via getStorageBackend().delete() + row delete in tx); refuse to delete the only remaining non-archived version. Optionally brochure_version_retention_count system setting.

H4. berth_pdf_versions has no cleanup mechanism

Symmetric problem. src/lib/services/berth-pdf.service.ts inserts a fresh row + UUID-keyed blob per upload (line 213); old versions accumulate forever. current_pdf_version_id advances; history-by-design is unbounded-by-default. For a port with hundreds of berths reuploaded under parser iterations, this is the largest storage footprint in the system.

Fix: admin "Delete this version" action on the version-history list, gated so the current_pdf_version_id cannot be deleted. Storage delete + row delete in a tx.
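
For H3 and H4 alike, the selection rule behind a rolling cap can be kept as a pure function that a worker or admin endpoint calls before touching storage. This is a sketch under assumptions: `PdfVersion`, `selectPrunableVersions`, and the `keepCount` parameter are hypothetical and are not names from the codebase.

```typescript
// Hypothetical row shape; createdAt is epoch milliseconds.
interface PdfVersion {
  id: string;
  createdAt: number;
}

// Returns the ids that may be deleted: everything except the newest
// `keepCount` versions and the version currently pointed at.
function selectPrunableVersions(
  versions: readonly PdfVersion[],
  currentVersionId: string | null,
  keepCount: number,
): string[] {
  if (!Number.isInteger(keepCount) || keepCount < 1) {
    throw new RangeError("keepCount must be a positive integer");
  }
  const newestFirst = [...versions].sort((a, b) => b.createdAt - a.createdAt);
  const keep = new Set(newestFirst.slice(0, keepCount).map((v) => v.id));
  if (currentVersionId !== null) {
    keep.add(currentVersionId); // the live version is never prunable
  }
  return newestFirst.filter((v) => !keep.has(v.id)).map((v) => v.id);
}
```

Because `keepCount` must be at least 1, the function can never select the only remaining version, which matches the guard proposed for brochures.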


MEDIUM

M1. files.client_id lacks an explicit onDelete — fragile

src/lib/db/schema/documents.ts:30: clientId: text('client_id').references(() => clients.id) (no onDelete). Migration 0000 records ON DELETE no action. The only existing client-delete path (client-hard-delete.service.ts:193) explicitly nullifies files.client_id first, so it works — but any future bulk-delete / port-teardown / dev script bypassing hardDeleteClient will FK-violate. Compare files.yacht_id + files.company_id, both set null (added in 0042).

Fix: new migration to ON DELETE SET NULL files.client_id. Removes the implicit invariant that hard-delete is the only legal path.

M2. demoteSystemFolderOnEntityDelete is wired for clients only

One caller (client-hard-delete.service.ts:236). No hardDeleteYacht / hardDeleteCompany exists today, so not currently broken — but it's a landmine when those flows ship. Both must call demoteSystemFolderOnEntityDelete(portId, 'yacht'|'company', id).

M3. Hard-deleted-client files become un-swept root orphans

client-hard-delete.service.ts:193 nullifies files.clientId and demotes the system folder to "{name} (deleted)". The file rows now have clientId=null + folder_id pointing at the demoted folder — discoverable in the demoted folder but never automatically dropped. The HARD delete of the client doesn't actually hard-delete their files. Inconsistent with the "hard" naming AND with GDPR Article 17.

Fix: mid-transaction (before the nullify), capture the affected file IDs; post-transaction call deleteFile() on each (handles blob + audit). Alternatively: nightly worker that drops file rows where every entity FK is null + no doc/expense/maint reference + created_at < N days.

M4. GDPR export cleanup retries forever on storage failure

src/lib/queue/workers/maintenance.ts:97-108. If storage.delete(row.storageKey) throws, the catch increments failed but does NOT delete the DB row. Next 4 AM run, same row reappears; same failure; same warn. No max-retry, no dead-letter, no admin escalation. A permanently broken storage path silently piles up infinite warns AND the GDPR-erasure obligation never completes.

Fix: track delete_attempts per row; after N failures either force-delete the DB row + log the orphan-blob to an admin-visible orphans table, or escalate at pino error + Sentry.
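
One way to express the bounded-retry rule is a small decision function the worker consults after each storage delete. This is a sketch only: `decideCleanup`, the `CleanupDecision` shape, and the `maxAttempts` parameter are hypothetical.

```typescript
type CleanupDecision =
  | { action: "delete-row" }
  | { action: "retry-later"; attempts: number }
  | { action: "escalate"; attempts: number };

// previousAttempts is the stored delete_attempts value before this run.
function decideCleanup(
  storageDeleteSucceeded: boolean,
  previousAttempts: number,
  maxAttempts: number,
): CleanupDecision {
  if (storageDeleteSucceeded) {
    return { action: "delete-row" };
  }
  const attempts = previousAttempts + 1;
  // At the ceiling, stop looping: record the orphan or raise an error-level alert.
  return attempts >= maxAttempts
    ? { action: "escalate", attempts }
    : { action: "retry-later", attempts };
}
```

The worker would persist `attempts` on `retry-later` and, on `escalate`, take whichever of the two paths above (orphans table or error + Sentry) is chosen.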

M5. migrate.ts table list has no drift guard

src/lib/storage/migrate.ts:52 explicitly admits: "The report_snapshots table called out in the audit does not exist yet. Add it here when it lands." This is a manual checklist with no enforcement — any future table that adds a storage_key/storage_path and forgets to extend TABLES_WITH_STORAGE_KEYS will silently leave its blobs behind on every backend swap.

Fix: integration test that diffs information_schema.columns WHERE column_name IN ('storage_key','storage_path') against TABLES_WITH_STORAGE_KEYS. Failing test forces an update before the new table can ship.
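
The diff at the heart of such a test could look like the sketch below. It assumes the `information_schema.columns` rows have already been fetched into plain objects; `StorageColumn`, `RegisteredTable`, and `findUnregisteredStorageColumns` are hypothetical names, and the registry entry shape mirrors only the `table` / `keyColumn` fields.

```typescript
// One row per (table, column) returned by the information_schema query.
interface StorageColumn {
  tableName: string;
  columnName: string;
}

// Mirrors the relevant fields of a TABLES_WITH_STORAGE_KEYS entry.
interface RegisteredTable {
  table: string;
  keyColumn: string;
}

// Returns "table.column" for every storage column the registry does not cover.
function findUnregisteredStorageColumns(
  schemaColumns: readonly StorageColumn[],
  registered: readonly RegisteredTable[],
): string[] {
  const known = new Set(registered.map((r) => `${r.table}.${r.keyColumn}`));
  return schemaColumns
    .map((c) => `${c.tableName}.${c.columnName}`)
    .filter((qualified) => !known.has(qualified))
    .sort();
}
```

The integration test then asserts that the returned array is empty, so any new `storage_key` / `storage_path` column fails CI until the registry is extended.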

M6. deleteFolderSoftRescue: no per-row audit + opaque sibling-name collision

src/lib/services/document-folders.service.ts:283-326:

  • Only the folder delete itself is audit-logged; the bulk re-parent of N documents + N files leaves no per-row trail. An auditor cannot reconstruct "which folder did this signed contract land in?"
  • If a re-parented child folder's name collides with an existing sibling at the destination, the UPDATE throws on uniq_document_folders_sibling_name and the tx rolls back. Error propagates as a raw "duplicate key" — compare moveFolder, which catches via isSiblingNameConflict and returns a useful 409.

Fix: (a) emit one bulk audit row with metadata: { docsMoved, filesMoved, rescuedTo }; (b) wrap the UPDATE in the same conflict catch.

M7. listTree silently drops orphan folder rows

document-folders.service.ts:95 logs "listTree: orphan folder row … dropped from tree". Defensive — but the orphans aren't auto-healed and aren't surfaced anywhere. Post-soft-rescue this shouldn't happen, but if it does (race, manual SQL, future bug), the row hides forever.

Fix: daily maintenance worker counts documentFolders WHERE parent_id IS NOT NULL AND parent_id NOT IN (SELECT id FROM documentFolders) and emits a metric / log.
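
The same check can be run in memory over the rows `listTree` already loads, which makes it easy to unit-test. `FolderRow` and `findOrphanFolders` are hypothetical names for this sketch.

```typescript
// Minimal projection of a document-folder row.
interface FolderRow {
  id: string;
  parentId: string | null;
}

// A folder is an orphan when it names a parent that is not in the row set.
function findOrphanFolders(rows: readonly FolderRow[]): FolderRow[] {
  const ids = new Set(rows.map((r) => r.id));
  return rows.filter((r) => r.parentId !== null && !ids.has(r.parentId));
}
```

A maintenance worker could emit `findOrphanFolders(rows).length` as the metric, so a non-zero value surfaces instead of only appearing in a log line.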


Summary

| Sev | ID | Finding | File | Effort |
| --- | --- | --- | --- | --- |
| CRIT | C1 | Avatar replace leaks rows + blobs | api/v1/me/avatar/route.ts | XS |
| HIGH | H1 | completed-webhook put-before-insert orphan | services/documents.service.ts:1131 | S |
| HIGH | H2 | deleteDocument strands signed PDF | services/documents.service.ts:596 | S |
| HIGH | H3 | Brochure versions: no cleanup ever | services/brochures.service.ts | M |
| HIGH | H4 | Berth PDF versions: no cleanup ever | services/berth-pdf.service.ts | M |
| MED | M1 | files.client_id lacks onDelete | schema/documents.ts:30 | XS migration |
| MED | M2 | demoteSystemFolderOnEntityDelete client-only | services/document-folders.service.ts:733 | XS (future) |
| MED | M3 | Hard-delete client leaves orphan files | services/client-hard-delete.service.ts:193 | S |
| MED | M4 | GDPR cleanup loops on storage failure | queue/workers/maintenance.ts:97 | S |
| MED | M5 | Migrate table list has no drift guard | lib/storage/migrate.ts:55 | S test |
| MED | M6 | Soft-rescue: no per-row audit + opaque collision | services/document-folders.service.ts:283 | S |
| MED | M7 | Orphan folder rows logged, never healed | services/document-folders.service.ts:95 | XS |

Biggest cumulative storage waste: H3 + H4 (uncapped version retention) and C1 (per-user avatar churn). Most dangerous correctness/GDPR findings: H1 (silent signed-PDF orphan under Documenso retry) and H2 (signed PII PDFs surviving document deletion).


28. Code quality + maintainability hotspots audit (maintainability-auditor)

Audit — Code Quality & Maintainability Hotspots (task #28)

Scope: cyclomatic complexity hotspots, files >500 lines, services violating SRP, monster components, cross-domain duplication, abandoned scaffolding. Read-only.

Top-line numbers: 9 source files >700 lines; 22 files >500 lines. TODO/FIXME/HACK markers: only 3 files (3 markers total) — drift is not the problem here; sheer file size and per-entity duplication are.


CRITICAL

C1. src/lib/services/documents.service.ts — 1982 lines, 33 exports, 30 imports, ~7 distinct concerns

One file owns: document CRUD, hub listing, signing send-flow (sendForSigning, ~200 lines, 10+ branches), manual upload (uploadSignedManually), 6 Documenso webhook handlers (handleDocumentCompleted 224 lines / 11 branches, handleRecipientSigned, …Expired, …Rejected, …Cancelled, …Opened), template-driven wizard (createFromWizard), and aggregated-by-entity projection (listInflightWorkflowsAggregatedByEntity + fetchWorkflowGroupRows). Single strongest SRP violation in the codebase. Recommend split: documents.service.ts (CRUD+detail), documents-signing.ts (send/cancel/manual-upload), documents-webhook-handlers.ts (the 6 handlers), documents-aggregation.ts (the hub projection). Webhook handlers in particular are inbound-event logic, not service CRUD, and dynamic-import circular deps with interests.service.advanceStageIfBehind cross the boundary today.

C2. src/lib/services/search.service.ts — 2163 lines, single file

26 exports, 14 per-entity searchX helpers (clients, residential clients, yachts, companies, interests, residential interests, berths, invoices, expenses, documents, files, reminders, brochures, tags, notes, otherPorts), plus expandGraph (~420 lines, 14+ branches), search orchestrator, and recent-search storage. Cohesive in purpose but no single dev can hold this in head. Recommend: search/buckets/*.ts (one per entity), search/expand-graph.ts, search/orchestrator.ts. Touching one bucket today forces reading 2000+ lines of unrelated context.

C3. src/lib/services/notes.service.ts — 1121 lines, near-pure duplication

6 entity-type branches per operation (clients / interests / yachts / companies / residential_clients / residential_interests). The create function alone (lines 689-846) is 158 lines of 6 copy-pasted insert-then-profile-lookup blocks; same for update (847-1019) and deleteNote (1020+). A tableForEntity() dispatcher is defined at line 82 then immediately silenced (void tableForEntity; line 98), i.e. the abstraction was started, abandoned, and the dead helper left in place. Aggregated listers (listForClient/Yacht/Company/ResidentialClientAggregated) are 4 near-identical 100-line bodies. Recommend: dispatch table { table, fk, link } keyed by entityType + single generic insert/update/delete; collapses ~600 lines.
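
A rough shape for that dispatch table is sketched below. Every table name, foreign-key column, and link route here is a placeholder invented for illustration; only the six entity types and the `{ table, fk, link }` triple come from the finding above.

```typescript
type NoteEntityType =
  | "clients"
  | "interests"
  | "yachts"
  | "companies"
  | "residential_clients"
  | "residential_interests";

interface NoteTarget {
  table: string; // placeholder table name
  fk: string; // placeholder foreign-key column
  link: (portSlug: string, entityId: string) => string;
}

// One row per entity type replaces six copy-pasted branches.
const NOTE_TARGETS: Record<NoteEntityType, NoteTarget> = {
  clients: { table: "client_notes", fk: "client_id", link: (p, id) => `/${p}/clients/${id}` },
  interests: { table: "interest_notes", fk: "interest_id", link: (p, id) => `/${p}/interests/${id}` },
  yachts: { table: "yacht_notes", fk: "yacht_id", link: (p, id) => `/${p}/yachts/${id}` },
  companies: { table: "company_notes", fk: "company_id", link: (p, id) => `/${p}/companies/${id}` },
  residential_clients: {
    table: "residential_client_notes",
    fk: "residential_client_id",
    link: (p, id) => `/${p}/residential-clients/${id}`,
  },
  residential_interests: {
    table: "residential_interest_notes",
    fk: "residential_interest_id",
    link: (p, id) => `/${p}/residential-interests/${id}`,
  },
};

interface NoteInsert {
  table: string;
  values: Record<string, string>;
  link: string;
}

// Generic builder: the only per-entity variation is looked up, not branched on.
function buildNoteInsert(
  entityType: NoteEntityType,
  portSlug: string,
  entityId: string,
  authorId: string,
  body: string,
): NoteInsert {
  const target = NOTE_TARGETS[entityType];
  return {
    table: target.table,
    values: { [target.fk]: entityId, author_id: authorId, body },
    link: target.link(portSlug, entityId),
  };
}
```

Typing the table as `Record<NoteEntityType, NoteTarget>` means adding a seventh entity type fails compilation until its row is supplied, which is the guarantee the copy-pasted branches cannot give.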

C4. src/components/interests/interest-tabs.tsx — 959 lines, single file

OverviewTab is 415 lines of inline JSX (456-870). Inline helpers MilestoneSection, MilestoneAdvanceButton, FutureMilestones, EditableRow, InfoRow, useInterestPatch, useStageMutation, humanizeStatus all share this file. Single file owns the entire detail-page overview, milestone widget, mutation hooks, and tab definition. Recommend split: interest-overview-tab.tsx, interest-milestones.tsx, hooks/use-interest-patch.ts.


HIGH

H1. Two near-named template services live side-by-side

src/lib/services/document-templates.ts (955 lines — CRM template flow: listTemplates, generateAndSign, EOI generation) and src/lib/services/document-templates.service.ts (262 lines — Admin TipTap template flow with audit-log versioning). Both export listTemplates, getTemplateById, createTemplate, updateTemplate against different schemas. Different consumers import each by accident-prone path. Strongly recommend renaming the admin one to admin-document-templates.service.ts (it already prefixes its functions with …AdminTemplate…).

H2. Per-entity component duplication is system-wide (4× scaffolding)

For each of clients / yachts / companies / interests there exist near-parallel: <entity>-list.tsx, <entity>-columns.tsx, <entity>-filters.tsx, <entity>-form.tsx, <entity>-detail-header.tsx, <entity>-card.tsx, <entity>-tabs.tsx, <entity>-files-tab.tsx. Confirmed near-identical pairs:

  • client-files-tab.tsx vs company-files-tab.tsx — 88 lines each, only difference is the entity-key parameter (clientId vs companyId) in 6 spots. ~95% byte-identical. Should be <EntityFilesTab entityType=…>.
  • client-list.tsx (350) / yacht-list.tsx (295) / company-list.tsx (308) / interest-list.tsx (469): same imports, same TanStack-table wiring, same bulk-action shape, parameterised only by columns + filters + form components.

A generic <EntityListShell columns={…} filters={…} form={…} /> would collapse ~1400 lines into ~400. Similarly forms: interest-form (756) + company-form (706) share the same react-hook-form skeleton.

H3. src/lib/services/expense-pdf.service.ts — 987 lines, SRP-spanning

Mixes: query/fetch (fetchExpenseRows, resolveReceiptFiles), grouping (groupRows, groupKey, computeTotals), image processing (maybeResizeImage, streamToBuffer), and PDFKit layout primitives (addHeader, addSummaryBox, addExpenseTable, addReceiptPages, renderReceiptHeader, addReceiptErrorPage, addFooter). 17 functions, 3 unrelated concerns. Recommend: expense-pdf/data.ts, expense-pdf/layout.ts, expense-pdf/index.ts.

H4. src/components/search/command-search.tsx — 1177 lines, 10 inline subcomponents

CommandSearch (268 lines) + FilterChipRow, ChipButton, EmptyStateBeforeSearch, ResultsRegion, ZeroState, QuickCreateButton, ResultRow, Badge, SectionHeading, BucketSection, plus buildFlatRows (327 lines, branch-heavy). The inline subcomponents are reusable in principle but private to this file by virtue of co-location. Recommend: search/internal/{filter-chips,result-row,bucket-section,build-flat-rows,empty-states}.tsx. buildFlatRows deserves its own file with its own test.

H5. src/lib/services/interests.service.ts — 1273 lines, 17 exports

Owns 6 state-transition mutations (changeInterestStage, advanceStageIfBehind, setInterestOutcome, clearInterestOutcome, archiveInterest, restoreInterest), berth-linking (linkBerth/unlinkBerth), tag setter, board projection (listInterestsForBoard, ~75 lines), list+detail. State-transition logic could move to interests-lifecycle.ts; board projection to interests-board.ts. Two interest CRUD helpers (getInterestById 112 lines, listInterests 184 lines) both build elaborate shaped reads — they're load-bearing but should probably both run through a single projection helper.


MEDIUM

M1. Cyclomatic-density hotspots (informal — branch-count per body)

  • documents.service.handleDocumentCompleted — 224 lines, 11 branches.
  • documents.service.sendForSigning — 200 lines, 10 branches.
  • search.service.expandGraph — 420 lines, 14+ branches across entity types.
  • documents.service.uploadSignedManually — ~110 lines.
  • interests.service.changeInterestStage — ~140 lines.
  • notes.service.create/update/deleteNote — 6 inline entity branches each.

M2. Abandoned scaffolding — void <identifier> silencing

The codebase has 7+ deliberate void <symbol> statements added to keep imports/symbols around for future use:

  • src/lib/services/notes.service.ts:98: void tableForEntity; (full helper abandoned)
  • src/lib/services/alert-rules.ts:331: const _unused = { gt, desc, alertsTable }; void _unused; (3 stale imports)
  • src/app/api/v1/clients/bulk/route.ts:227-228: void HIGH_STAKES_STAGES; void ({} as PipelineStage);
  • src/app/api/v1/admin/email-templates/route.ts:91: void eq;
  • src/app/api/v1/admin/website-submissions/route.ts:76: void lt;
  • src/app/api/v1/interests/bulk/route.ts:134-135: void inArray; void withPermission;

Either the future-PR landed without removing the placeholder, or the abstraction was never built. Each is a small lint-clean-up; collectively they signal unfinished refactors. Decide per case: implement the dispatcher (notes), or delete the dead imports.

M3. Real TODO/FIXME — only 3 in the entire src tree

  • src/lib/queue/workers/import.ts:13: // TODO(L2): implement import job handlers (worker is a stub).
  • src/lib/queue/scheduler.ts:44: // TODO(L2): make per-user schedule configurable.
  • src/components/interests/interest-detail.tsx:26: JSDoc remark, not a todo.

The import worker stub is the only real loose end — confirm whether import jobs are needed before shipping, otherwise delete the worker registration to avoid an empty queue.

M4. Cross-service implicit coupling via dynamic-import circles

documents.service imports advanceStageIfBehind from interests.service statically; interests.service imports evaluateRule from berth-rules-engine; berth-rules-engine calls services via await import(...) to dodge the cycle. The dynamic-import workaround masks circular ownership: the rules engine is effectively the orchestrator of state changes across documents + interests + invoices. Worth either (a) hoisting the rules engine to a top-level coordinator that the services don't import back, or (b) documenting the cycle explicitly in CLAUDE.md so the next dev doesn't break it.

M5. Largest leaf components without inline subcomponents

  • interest-form.tsx (756) and company-form.tsx (706) are single components. Both define schema + form + nested pickers in one file. Could benefit from interest-form-fields/{dimensions,category,picker}.tsx.
  • interests/linked-berths-list.tsx (530) and documents/documents-hub.tsx (537) sit just above the threshold; readable but on the edge.

M6. Re-export shims (legacy import boundary)

src/components/clients/pipeline-constants.ts — "Re-export from the canonical source so legacy imports keep working." Audit the consumer list and migrate imports to the canonical path; remove the shim.


Notes / non-issues

  • TODO/FIXME hygiene is excellent (3 markers across 148k LOC).
  • The 18 services with audit.service.ts-style pattern are short and cohesive — no monster spread.
  • Drizzle schema split (one file per domain in src/lib/db/schema/) is clean; relations.ts (953 lines) is large but central by design.
  • dashboard-shell.tsx (243 lines) is not a monster — single composition surface, leaves widgets in their own files. Healthy pattern.

Suggested order of operations

  1. Rename document-templates.service.ts → admin-document-templates.service.ts (H1; one-day safety win).
  2. Build <EntityFilesTab entityType="…"> and delete the two copies (H2; warm-up).
  3. Replace notes.service entity-switch ladders with a dispatch table (C3).
  4. Split documents.service along the natural seams: CRUD / signing / webhooks / aggregation (C1).
  5. Split search.service into per-bucket files (C2).
  6. Split interest-tabs.tsx and command-search.tsx (C4, H4).
  7. Sweep void <symbol> placeholders (M2).

Total estimated reduction: ~3500 lines of code via deduplication + better split points, no functional change.


23. Multi-port super-admin flow audit (multi-port-auditor)

Audit: Multi-Port Super-Admin Flow (Task #23)

Scope: super-admin "otherPorts" search extension, port-switcher UX, cross-port report queries, every isSuperAdmin bypass path, accidental data bleed, X-Port-Id header handling, port_id default resolution from preferences, the super-admin-only /admin/ports listing.

Read-only audit. No edits made. Roughly ranked by blast-radius.


CRITICAL

C1. Port-switcher race — first request after navigation can hit the WRONG port

src/providers/port-provider.tsx:38-48, src/components/layout/user-menu.tsx:65-73, src/lib/api/client.ts:50-63.

PortProvider reads the URL slug at render and reconciles Zustand inside a useEffect. apiFetch reads useUIStore.getState().currentPortId synchronously. For a super-admin who is on /port-A/clients and clicks /port-B/clients (or hits a deep link from search/external nav), the first round of queries fires before the reconcile effect commits — sending X-Port-Id = port-A while the page chrome renders port-B. Listings come back from port-A and render inside port-B's shell ⇒ silent cross-port data bleed in the UI.

handlePortChange does invalidate React Query AND push the route, but setPort (Zustand setter) is sync — and the router.push is async. Any queries kicked off by the new route's components before the next tick can still read stale state on the initial mount. The reconcile happens on the second render.

Fix sketch: Have apiFetch derive portId from window.location.pathname FIRST and fall back to Zustand, not the reverse. The slug is authoritative; Zustand is a cache. (The current code only consults the URL when Zustand is empty.)
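
A sketch of that precedence, assuming the port slug is the first path segment and that a slug-to-id map is already available on the client. Both are assumptions, and `slugFromPathname` / `resolvePortId` are hypothetical helper names.

```typescript
// Assumes the port slug is the first path segment, e.g. "/port-b/clients".
function slugFromPathname(pathname: string): string | null {
  const first = pathname.split("/").filter((segment) => segment.length > 0)[0];
  return first ?? null;
}

// URL first, cached store value second: the slug is authoritative.
function resolvePortId(
  pathname: string,
  slugToId: ReadonlyMap<string, string>,
  cachedPortId: string | null,
): string | null {
  const slug = slugFromPathname(pathname);
  const fromUrl = slug !== null ? slugToId.get(slug) : undefined;
  // Routes whose first segment is not a known port slug fall back to the cache.
  return fromUrl ?? cachedPortId;
}
```

In the race described above, the pathname is already `/port-B/...` when the first queries fire, so the header follows the page even while the Zustand value still says port-A.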

C2. apiFetch slug-to-id fallback is dead for non-super-admins

src/lib/api/client.ts:18-40.

The fallback for "Zustand not hydrated yet" calls /api/v1/admin/ports. That endpoint has requireSuperAdmin(ctx, 'admin.ports.list') (src/app/api/v1/admin/ports/route.ts:16). For a port director on a hard refresh, the request 403s, resolvePortIdFromSlug returns null, apiFetch ships the request with no X-Port-Id header — and withAuth then falls back to preferences.defaultPortId, which (per next finding) is also unwritable. End state for the user: a 400 "Port context required" on every initial request after a cold reload, until Zustand re-hydrates from localStorage. Suggest a public/authed /api/v1/me/ports lookup that is permission-free.

C3. defaultPortId preference is read by withAuth but the /me PATCH allow-list refuses to write it

src/lib/api/helpers.ts:160-164 reads (profile.preferences as { defaultPortId?: string })?.defaultPortId as the X-Port-Id fallback.

src/app/api/v1/me/route.ts:45-66 defines preferences with z.object({...}).strict() and the allow-list ALLOWED_PREF_KEYS = new Set(['dark_mode', 'locale', 'timezone', 'tablePreferences']) at line 154. defaultPortId is silently stripped at every write. The fallback in withAuth is therefore dead — preferences.defaultPortId can only ever be set by a hand-rolled db.update. For super-admins this means: no header ⇒ no portId ⇒ ctx.portId = '' ⇒ every WHERE port_id = '' returns empty. Mild UX bug for super-admins but silent. Either remove the dead fallback or add defaultPortId to the strict schema + allow-list.


HIGH

H1. searchOtherPorts ignores per-port ACL for super-admin extension (theoretical, currently fine)

src/lib/services/search.service.ts:1232-1314. The docstring at line 31 promises "ports the user can access other than portId". The implementation just excludes excludePortId and joins every other row in ports. Today super-admins can access every port, so the behavior matches. Risk: if a future "regional super-admin" role lands and reuses this code path (opts.includeOtherPorts && opts.isSuperAdmin) the leak is total — no ACL filter. Recommend passing in the set of accessible portIds as a parameter and using it in the port_lookup CTE WHERE, even though the current gate is binary.

H2. /api/v1/admin/users/[id]/permission-overrides PUT — port directors can promote anyone in their port to "owns everything"

src/app/api/v1/admin/users/[id]/permission-overrides/route.ts:153-244.

The route gates on admin.manage_users (port-scoped), and rejects self-target (line 163) + targets not assigned to the same port (line 173). But there is no guard preventing a port director from writing admin.permanently_delete_clients: true, system_backup: true, manage_users: true, etc. onto a different user in the same port — and then logging in as that user (or asking that user) to act with elevated permissions. Self-target is blocked but co-conspirator escalation is not. Mitigation idea: cap the overrides a non-super-admin can set to the leaves they themselves hold (effectively ctx.permissions ∩ overrides). The audit log is recorded, so this is detectable post-hoc, but not prevented.
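
The proposed cap reduces to an intersection check. The sketch below assumes permissions are available as a flat map of leaf name to boolean, which may not match the real shape of `ctx.permissions`; `capOverrides` and the leaf names in the usage are illustrative.

```typescript
type PermissionMap = Readonly<Record<string, boolean>>;

interface CappedOverrides {
  allowed: Record<string, boolean>;
  rejected: string[];
}

// A non-super-admin may only set overrides for leaves they hold themselves.
function capOverrides(
  callerPermissions: PermissionMap,
  requested: PermissionMap,
  callerIsSuperAdmin: boolean,
): CappedOverrides {
  const allowed: Record<string, boolean> = {};
  const rejected: string[] = [];
  for (const [leaf, value] of Object.entries(requested)) {
    if (callerIsSuperAdmin || callerPermissions[leaf] === true) {
      allowed[leaf] = value;
    } else {
      rejected.push(leaf);
    }
  }
  return { allowed, rejected };
}
```

Whether the route should silently drop the `rejected` leaves or return a 403 listing them is a product decision; failing loudly is probably the safer default for an escalation guard.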

H3. AdminLayout vs admin-API permission asymmetry

src/app/(dashboard)/[portSlug]/admin/layout.tsx:31-33 redirects every non-super-admin away from /[portSlug]/admin/.... Meanwhile /api/v1/admin/** endpoints are mostly gated on admin.manage_settings / admin.manage_users / admin.view_audit_log — leaves that the port-director role holds. So a port director can hit the APIs (via curl, scripts, or non-/admin UI surfaces such as settings/) but the matching UI is hidden behind a super-admin redirect. Pick a side: either gate the API endpoints on requireSuperAdmin, or let port directors into the corresponding sub-pages of /admin/ (alerting on the ones that should remain super-admin only — backup, queues, storage, ports, invitations).

H4. Super-admin with empty ctx.portId silently filters to zero rows

src/lib/api/helpers.ts:166-168 — only non-super-admins are blocked when portId is null. A super-admin without an X-Port-Id header AND without a preferences.defaultPortId (which is currently every super-admin per C3) gets ctx.portId = ''. Downstream services that do WHERE port_id = ${portId} silently return empty data, which is harmless. But endpoints that BRANCH on isSuperAdmin ? undefined : ctx.portId (e.g. error-events route.ts:32) will hand undefined to the service and return EVERY tenant's rows. Currently only the error-events listing does this — but the pattern is risky. A scoped super-admin with the wrong header today sees one port; without the header they see ALL ports — surprising to admins debugging "why am I seeing port-X data on port-Y?". Recommend an explicit ?allPorts=true opt-in on those endpoints rather than coupling cross-port reads to a missing header.
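
A sketch of the explicit opt-in, with the scope made a discriminated union so a service can never receive an ambiguous `undefined`. `PortScope` and `resolvePortScope` are hypothetical names, and the error messages are placeholders.

```typescript
type PortScope = { kind: "single"; portId: string } | { kind: "all" };

interface ScopeInput {
  isSuperAdmin: boolean;
  portId: string | null; // from X-Port-Id (or a default), null when absent
  allPorts: boolean; // parsed from an explicit ?allPorts=true
}

function resolvePortScope(input: ScopeInput): PortScope {
  if (input.allPorts) {
    if (!input.isSuperAdmin) {
      throw new Error("allPorts requires super-admin");
    }
    return { kind: "all" };
  }
  // A missing header is an error for everyone, never an implicit "all ports".
  if (input.portId === null || input.portId === "") {
    throw new Error("Port context required");
  }
  return { kind: "single", portId: input.portId };
}
```

Services then switch on `scope.kind`, which forces each cross-port read to be a deliberate branch rather than a side effect of a missing header.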


MEDIUM

M1. Port switcher only invalidates queries, doesn't abort in-flight ones

src/components/layout/user-menu.tsx:65-73. queryClient.invalidateQueries() marks queries stale but lets in-flight ones finish and write into the cache. If a long-running fetch (e.g. PDF generation, expensive report) was started under port-A and resolves after the user switches to port-B, the cache entry is now port-A data keyed on a query that the new page treats as port-B. Worth pairing with cancelQueries() and a re-key on portId (most query keys appear to not embed portId).

M2. /api/v1/expenses/export/parent-company lost its isSuperAdmin guard

src/app/api/v1/expenses/export/parent-company/route.ts:9-12. The comment says "Hard isSuperAdmin check used to lock out port admins who held expenses.export = true" — but the check is no longer in the route body, it now relies on the perm gate alone. The service exportParentCompany is single-port (filters expenses.port_id), so this is not a cross-port leak today. But the doc-vs-code drift should be reconciled either by adding requireSuperAdmin back or by deleting the stale comment.

M3. Search "otherPorts" cross-port hits expose port-level metadata to ALL super-admin queries

src/lib/services/search.service.ts:1862-1864, src/app/api/v1/search/route.ts:20. Toggle includeOtherPorts defaults to false — but any super-admin can flip the query param. The merge into SearchResults.otherPorts returns portId/portSlug/portName/type/id/label/sub from up to 5 other ports per request without rate-limiting the cross-port enumeration. Pairs with the existing search rate-limit (if any) — confirm and add a tighter ceiling on searchOtherPorts(query, limit). Currently limit defaults to whatever the searchQuery schema permits.

M4. Super-admin dashboard redirect always picks first port alphabetically

src/app/dashboard/page.tsx:24-27: db.query.ports.findFirst({ orderBy: portsTable.name }). Predictable and stable, but ignores any "last-used port" signal. Combined with C3, a super-admin who manually picks port-B then closes the tab returns to port-A on next login. Cosmetic but disorienting. Easiest fix: persist last_used_port_id in userProfiles.preferences and read it here.

M5. Webhook + document workers fan out to ALL super-admins for in-app notifications

src/lib/queue/workers/webhooks.ts:264, src/lib/queue/workers/documents.ts:73. Both fetch every isSuperAdmin=true AND isActive=true user to send notifications. Not a security issue; flagging because a future "regional super-admin" rollout will make the broadcast list quietly cross-tenant. Wrap the queries in a notifySuperAdmins(portId) helper now so the porting work is one diff later.

M6. /admin/ports/[id] PATCH lets super-admin mutate any port without the rate-limit gate

src/app/api/v1/admin/ports/[id]/route.ts:34-50: no withRateLimit on a PATCH that touches every port-wide setting (timezone, currency, branding…). Lower priority because the callers are few and trusted, but it pairs naturally with the audit log.

M7. AuthContext has no accessiblePortIds set

Every cross-port-aware code path re-derives "which ports can this user touch?" from userPortRoles or isSuperAdmin. Hoist into AuthContext (computed once in withAuth) so future endpoints don't have to re-implement the resolution and so cross-port filters can apply inArray(table.portId, ctx.accessiblePortIds) uniformly.
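
The resolution itself is small enough to compute once per request. The sketch below uses hypothetical row shapes (`PortRole`, `AuthUser`) and helper names; it also shows how a cross-port search (H1) could derive its "other ports" list from the same set instead of from "every row in ports".

```typescript
// Hypothetical projection of a userPortRoles row.
interface PortRole {
  userId: string;
  portId: string;
}

interface AuthUser {
  id: string;
  isSuperAdmin: boolean;
}

// Computed once (e.g. inside withAuth) and stored on the auth context.
function computeAccessiblePortIds(
  user: AuthUser,
  allPortIds: readonly string[],
  roles: readonly PortRole[],
): string[] {
  if (user.isSuperAdmin) {
    return [...allPortIds];
  }
  const own = new Set(roles.filter((r) => r.userId === user.id).map((r) => r.portId));
  return allPortIds.filter((id) => own.has(id));
}

// "Ports the user can access other than portId", as the search docstring promises.
function otherAccessiblePorts(
  accessiblePortIds: readonly string[],
  currentPortId: string,
): string[] {
  return accessiblePortIds.filter((id) => id !== currentPortId);
}
```

If a scoped super-admin role appears later, only `computeAccessiblePortIds` has to change; every consumer that filters on the precomputed set picks up the new rule automatically.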


Findings that audit clean

  • /api/v1/admin/ports GET/POST correctly require requireSuperAdmin (route.ts:16,28).
  • /api/v1/admin/ports/[id] correctly enforces port-in-scope for non-super-admins (assertPortInScope, line 15-20).
  • /api/v1/admin/invitations correctly rejects port-director-minted super-admin invites (line 40-42).
  • /api/v1/admin/audit is strictly port-scoped (line 40) — no cross-tenant peek even for super-admins.
  • withAuth correctly refuses requests where the body tries to set portId (header-only); body-based portId is documented as forbidden (line 156-159).
  • Reports service consistently uses ctx.portId in WHERE clauses (reports.service.ts:103-163); no super-admin cross-port aggregation paths.
  • Public berth/inquiry endpoints take their portId from a query param / dedicated header, never from auth context — correctly decoupled from session port.

Suggested order of operations

  1. Fix C1 by making the URL slug authoritative inside apiFetch.
  2. Fix C2 with a small /api/v1/me/accessible-ports endpoint usable by every authed user.
  3. Add defaultPortId to the /me PATCH allow-list (C3) — or strip the fallback from withAuth.
  4. Add the "overrides ∩ caller's own perms" cap to permission-overrides PUT (H2).
  5. Reconcile AdminLayout vs admin-API gating (H3) — write one document of which leaves are super-admin only.
  6. Hoist accessiblePortIds into AuthContext (M7) ahead of the next cross-port feature.

33. S3 vs internal DB pathing + storage routing audit (storage-pathing-auditor)

Audit — S3 vs Internal DB Pathing + Storage Routing

Scope: src/lib/storage/*, every getStorageBackend() consumer, migration script, magic-byte enforcement, encryption-at-rest boundary. Date: 2026-05-12

Boundary summary (what lives where)

  • In DB (Postgres): file metadata only, namely files.storage_path, berth_pdf_versions.storage_key, brochure_versions.storage_key, gdpr_exports.storage_key, backup_jobs.storage_path, user-avatar FK (user_profiles.avatar_file_id → files), document signing state (documents.signed_file_id). AES-256-GCM-encrypted secrets: system_settings.storage_s3_secret_key_encrypted, storage_proxy_hmac_secret_encrypted, email_accounts.credentials_enc, webhooks.secret, ocr_config.api_key_encrypted. No BYTEA / JSONB blobs found (grep BYTEA → 0).
  • In backend (S3/filesystem): every uploaded blob — signed PDFs (buildStoragePath(slug,'eoi-signed',…)), per-berth PDFs (berths/{id}/…), brochures, avatars, GDPR exports, pg_dump backups, expense receipts, generated reports, template source PDFs, send-out attachment fallbacks.
  • Routing: getStorageBackend() reads global system_settings.storage_backend ('s3'|'filesystem'), caches by config fingerprint, invalidated via resetStorageBackendCache() on settings write or migration flip. Code never imports minio/Client outside s3.ts (verified — only legacy buildStoragePath helper survives in src/lib/minio/index.ts). Interface methods: put/get/head/delete/listByPrefix/presignUpload/presignDownload — both backends implement all 7.

CRITICAL

C1. backup_jobs.storage_path missing from TABLES_WITH_STORAGE_KEYS — silent backup loss on backend swap

src/lib/storage/migrate.ts:55-60 lists only files, berth_pdf_versions, brochure_versions, gdpr_exports. backup_jobs.storage_path (pg_dump artefacts written by src/lib/services/backup.service.ts:54+72) is not in the list. Flipping S3 → filesystem (or vice-versa) leaves every historical database backup pointing at the old backend — getBackupDownloadUrl(id) will 404 / NoSuchKey, and the admin won't know until they try to restore. This is the worst category of data loss because backups are the recovery path of last resort. The comment in migrate.ts:51 calls out report_snapshots as a future addition but mentions nothing about backup_jobs. Add { table: 'backup_jobs', keyColumn: 'storage_path', pkColumn: 'id' } and ship the line with a smoke test.

C2. Orphan-blob risk: every backend.put runs outside the db.insert(files) transaction

Pattern repeated across 9+ services (files.ts:68-92, documents.service.ts:833-854 and 1134-1183, external-eoi.service.ts:71-96 — comment at L67-70 explicitly acknowledges "orphan reaper handles those" but no reaper exists, invoices.ts:603, document-templates.ts:537,674, reports.service.ts:231, gdpr-export.service.ts:169, backup.service.ts:62, berth-pdf.service.ts:229). Sequence is: PUT bytes → DB INSERT. If insert fails or the process dies in between, the blob is permanent and unreferenced. Only handleDocumentCompleted (documents.service.ts:1110) has an early-return idempotency gate; the rest leak. Over months of operation an S3 prefix accumulates dozens-to-hundreds of orphans that pay storage cost forever and survive every backup-restore. Add an orphan-reaper maintenance job that walks listByPrefix() against the union of all storage_* columns and deletes blobs older than 24 h without a DB pointer. Also wrap the put + insert pairs in a try/catch that explicitly deletes on insert failure (cheap defense in depth).
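
The two mitigations could look roughly like this. It is a sketch under stated assumptions: BlobStore is a hypothetical two-method slice of the storage interface, putThenInsert and selectOrphans are illustrative names, and the 24 h threshold is the one proposed above.

```typescript
// Hypothetical minimal slice of the storage interface (put/delete only).
interface BlobStore {
  put(key: string, body: Buffer): Promise<void>;
  delete(key: string): Promise<void>;
}

// Writes the blob, then runs the DB insert. If the insert throws, the blob is
// deleted before the error is rethrown, so a failed insert leaves no orphan.
// A process crash between the two steps is NOT covered; that is the reaper's job.
async function putThenInsert<T>(
  store: BlobStore,
  key: string,
  body: Buffer,
  insertRow: () => Promise<T>,
): Promise<T> {
  await store.put(key, body);
  try {
    return await insertRow();
  } catch (insertError) {
    try {
      await store.delete(key);
    } catch {
      // Best effort only: a failed cleanup is picked up by the reaper later.
    }
    throw insertError;
  }
}

interface BlobEntry {
  key: string;
  lastModified: Date;
}

// Reaper selection rule: a blob is an orphan when no DB column references its
// key AND it is older than the grace period (default 24 h).
function selectOrphans(
  blobs: readonly BlobEntry[],
  referencedKeys: ReadonlySet<string>,
  now: Date,
  minAgeMs: number = 24 * 60 * 60 * 1000,
): string[] {
  return blobs
    .filter((blob) => !referencedKeys.has(blob.key))
    .filter((blob) => now.getTime() - blob.lastModified.getTime() > minAgeMs)
    .map((blob) => blob.key);
}
```

The grace period matters because a blob written seconds ago may simply be waiting for its insert; deleting it would turn a slow request into data loss.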

C3. S3 backend stores blobs without server-side encryption (SSE-S3 / SSE-KMS)

S3Backend.put() (src/lib/storage/s3.ts:191) passes only Content-Type to client.putObject. No x-amz-server-side-encryption header. Bytes-at-rest encryption depends entirely on the bucket's default-encryption policy, which is invisible to the application — a customer who provisions a MinIO/B2/R2 bucket without server-side encryption gets cleartext signed contracts, GDPR exports, and pg_dump archives sitting on disk. Same audit posture as SMTP/IMAP creds (which are AES-GCM in the DB) demands the same guarantee for the blob plane. Either add ServerSideEncryption: 'AES256' to every putObject call, or surface a boot-time check that asserts the bucket has default-encryption enabled and refuses to start otherwise (similar to the MULTI_NODE_DEPLOYMENT guard on FilesystemBackend).
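
On the wire, SSE is requested with the x-amz-server-side-encryption request header. The sketch below only builds that header set; buildPutHeaders is a hypothetical helper, and whether s3.ts can forward these through the client's metadata argument is an assumption to verify against the client in use.

```typescript
type SseMode = 'AES256' | 'aws:kms';

// Builds the headers for an encrypted PUT. 'AES256' requests SSE-S3;
// 'aws:kms' requests SSE-KMS and may name a specific key.
function buildPutHeaders(
  contentType: string,
  sse: SseMode = 'AES256',
  kmsKeyId?: string,
): Record<string, string> {
  const headers: Record<string, string> = {
    'Content-Type': contentType,
    'x-amz-server-side-encryption': sse,
  };
  if (sse === 'aws:kms' && kmsKeyId !== undefined) {
    headers['x-amz-server-side-encryption-aws-kms-key-id'] = kmsKeyId;
  }
  return headers;
}
```

S3-compatible stores differ in which SSE modes they accept, so the boot-time bucket check remains useful even with the header in place.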

HIGH

H1. Berth-PDF presigned-upload keys are not port-scoped

src/app/api/v1/berths/[id]/pdf-upload-url/handlers.ts:58 builds berths/{berthId}/uploads/{uuid}_{name} — no leading ${portSlug}/. Result: the optional port-binding (p field on the HMAC token, enforced in filesystem.ts:184-188 and documented in index.ts:43-49) cannot be wired here, and the storage-key namespacing convention diverges from buildStoragePath (which always prefixes the port slug). Tenant isolation today relies on the up-front berths.portId === ctx.portId check before mint, but the defense-in-depth port-binding is unwired. Normalise the key to ${portSlug}/berths/... and pass portSlug into backend.presignUpload.
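
A sketch of the normalised key shape, assuming the handler has the port slug in scope. buildBerthUploadKey and keyBelongsToPort are hypothetical names, and the file-name sanitising rule is an illustration rather than the existing behaviour.

```typescript
// Builds a port-scoped upload key: {portSlug}/berths/{berthId}/uploads/{uuid}_{name}.
function buildBerthUploadKey(
  portSlug: string,
  berthId: string,
  uuid: string,
  fileName: string,
): string {
  // Replace anything outside a conservative character set so the key stays path-safe.
  const safeName = fileName.replace(/[^A-Za-z0-9._-]/g, '_');
  return `${portSlug}/berths/${berthId}/uploads/${uuid}_${safeName}`;
}

// The check the port-binding guard needs: does this key live under the port's prefix?
function keyBelongsToPort(key: string, portSlug: string): boolean {
  return key.startsWith(`${portSlug}/`);
}
```

With the slug leading the key, the same prefix test works for berth uploads and for every key produced by buildStoragePath.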

H2. presignDownload callers never pass portSlug — port-binding token guard is dead code

presignDownloadUrl(...) (storage/index.ts:233) accepts portSlug and only 1 of ~12 callers uses it. files.ts:117,128, backup.service.ts:115, portal.service.ts:351, reports.service.ts:170, gdpr-export.service.ts:224,282 all pass undefined. The filesystem-proxy GET will therefore accept any valid HMAC token regardless of the storage-key's port prefix. The check is genuinely defensible (see filesystem.ts:179) but never engaged. Plumb the active port slug through every call site, or remove the optional p field and the verifier code so the contract isn't misleading.
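
To make the intended invariant concrete, here is a sketch of a token whose p field is always set at mint time and always checked at verify time. The payload layout (k, op, exp, p), the body.signature encoding, and both function names are hypothetical; only the idea of an HMAC token with an op and a port binding comes from the findings above.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical payload layout; the real token in filesystem.ts may differ.
interface ProxyTokenPayload {
  k: string; // storage key
  op: 'get' | 'put';
  exp: number; // expiry, unix seconds
  p: string; // port slug the key must live under
}

function sign(body: string, secret: string): string {
  return createHmac('sha256', secret).update(body).digest('base64url');
}

// portSlug is a required argument so no call site can silently skip the binding.
function mintToken(
  key: string,
  op: 'get' | 'put',
  exp: number,
  portSlug: string,
  secret: string,
): string {
  const payload: ProxyTokenPayload = { k: key, op, exp, p: portSlug };
  const body = Buffer.from(JSON.stringify(payload)).toString('base64url');
  return `${body}.${sign(body, secret)}`;
}

// Returns the payload when signature, expiry, and port binding all hold; otherwise null.
function verifyToken(token: string, secret: string, nowSeconds: number): ProxyTokenPayload | null {
  const [body, mac] = token.split('.');
  if (!body || !mac) return null;
  const expected = Buffer.from(sign(body, secret));
  const actual = Buffer.from(mac);
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;
  const payload = JSON.parse(Buffer.from(body, 'base64url').toString('utf8')) as ProxyTokenPayload;
  if (payload.exp <= nowSeconds) return null;
  // Port binding: a token minted for port A must not unlock a key under port B.
  if (!payload.k.startsWith(`${payload.p}/`)) return null;
  return payload;
}
```

Making the slug mandatory in the signature of the mint function is what turns the guard from optional into enforced; the alternative named above (removing p entirely) is equally consistent, just weaker.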

H3. S3Backend.put and backup.service buffer entire blobs into memory

s3.ts:187 (Buffer.isBuffer(body) ? body : await streamToBuffer(body)) and backup.service.ts:60-62 (concatenates the entire pg_dump dump into memory before put). For a multi-GB database dump the worker OOMs. Comment at s3.ts:184-187 explicitly says "typical files are under 50MB" but runPgDump writes a dump file whose size scales with the tenant. Use client.fPutObject (file-path streaming) for backups; for streamable callers expose a putStream(key, stream, sizeBytes, opts) overload that pipes without streamToBuffer.

H4. Migrator's copyAndVerify double-buffers every blob and has no streaming hash

storage/migrate.ts:170-204 reads source → Buffer, sha256, put, then re-reads target → Buffer, sha256 again. For a 5 GB pg_dump (see C1 — once added) this allocates ~10 GB of heap. The sha256-verify round-trip is the right idea; pipe through crypto.createHash on both legs, never buffer.
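
One way to hash without buffering is a pass-through transform that updates a digest as bytes flow by. This is a sketch: HashingPassThrough and copyAndHash are hypothetical names, and the sink is assumed to be whatever writable the target backend exposes.

```typescript
import { createHash } from 'node:crypto';
import { Transform, type Readable, type TransformCallback, type Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Forwards every chunk unchanged while feeding it into a running SHA-256.
class HashingPassThrough extends Transform {
  private readonly hash = createHash('sha256');

  override _transform(chunk: Buffer, _encoding: BufferEncoding, callback: TransformCallback): void {
    this.hash.update(chunk);
    callback(null, chunk);
  }

  // Call once, after the stream has finished.
  digestHex(): string {
    return this.hash.digest('hex');
  }
}

// Copies source to sink and returns the SHA-256 of the bytes that passed through.
// Memory use is bounded by the stream high-water marks, not by the blob size.
async function copyAndHash(source: Readable, sink: Writable): Promise<string> {
  const hasher = new HashingPassThrough();
  await pipeline(source, hasher, sink);
  return hasher.digestHex();
}
```

The verify leg can reuse the same transform: stream the target object through it into a discarding writable and compare the two hex digests.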

H5. S3Backend.presignUpload lacks content-type / content-length binding

s3.ts:249-256 only calls presignedPutObject(bucket, key, expiry). The signed URL does not bind Content-Type or Content-Length — a browser can PUT 1 GB of arbitrary bytes against an EOI-signed key. Caps and magic-byte checks fire only on the register call afterwards (registerBrochureVersion and uploadBerthPdf HEAD-then-stream-first-5-bytes path). That's sufficient for the two consumers today, but the gate is one-deep — a future caller that forgets to wire a register endpoint exposes raw S3 directly. Switch to MinIO presignedPostPolicy with content-length-range + Content-Type conditions so the binding is on the signature itself.

MEDIUM

M1. CLAUDE.md drift on "TABLES_WITH_STORAGE_KEYS populated in 9a5ba87"

CLAUDE.md says the migrator covers "every blob in files, berth_pdf_versions, brochure_versions, gdpr_exports". Verified true — but backup_jobs is the missing 5th (see C1). Update the doc + add a unit test that asserts the array matches the set of tables with a storage_* column.

M2. email-compose.service.ts:124 reads attachment bytes into a Buffer

Each attachment under the email_attach_threshold_mb cap is fetched via storage.get(...) and concatenated. With multiple recipients × multiple attachments the worker holds N × size MB simultaneously. Stream into nodemailer's content: <Readable> API directly.

M3. UUID storage keys never check existence before put (no If-None-Match: "*")

crypto.randomUUID() collision is astronomical, but a buggy caller passing a duplicate key (or a re-run of a worker after a partial DB rollback) silently overwrites. Cheap belt: pass If-None-Match: '*' (S3) or O_EXCL (filesystem) — surfaces double-writes loudly.
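
For the filesystem side, Node's 'wx' open flag gives the O_EXCL behaviour directly. The sketch below uses the synchronous API for brevity; writeBlobExclusive and DuplicateKeyError are hypothetical names, and the real backend would presumably use the promise-based equivalent.

```typescript
import { writeFileSync } from 'node:fs';

// Raised instead of silently overwriting an existing blob.
class DuplicateKeyError extends Error {
  readonly path: string;

  constructor(path: string) {
    super(`refusing to overwrite existing blob at ${path}`);
    this.name = 'DuplicateKeyError';
    this.path = path;
  }
}

// Creates the file only if it does not exist yet.
// The 'wx' flag makes the open fail with EEXIST when the path is already taken.
function writeBlobExclusive(absPath: string, body: Buffer): void {
  try {
    writeFileSync(absPath, body, { flag: 'wx' });
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === 'EEXIST') {
      throw new DuplicateKeyError(absPath);
    }
    throw err;
  }
}
```

Callers that legitimately replace a blob (none were identified above) would need an explicit overwrite path, which is the point: overwriting becomes a decision rather than a side effect.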

M4. Per-port S3 routing not possible / listByPrefix unbounded

Storage config rows are global (portId IS NULL). Multi-tenant can't direct port-A vs port-B to separate buckets / KMS keys. listByPrefix returns every key in one array — script-only today but a footgun if called with empty prefix in production. Document the global-config assumption; add a cursor variant before any per-port-bucket customer lands.

M5. storage_filesystem_root change invalidates outstanding HMAC tokens silently

Cache swaps, but tokens minted under the old root still verify HMAC; resolveKeyForProxy then 404s under the new root. Customer download links emailed an hour earlier break with no warning. Either refuse runtime root changes, or warn in admin UI.

M6. Avatar URLs re-presign every 15 min — browser cache broken

No CDN / s-maxage fronts hot reads. Per-page avatar GET burns a presign + S3 round-trip. Issue 24 h URLs for category='avatar', or front with the Next.js Image route.

M7. Verified clean

  • withTimeout(...) wraps every minio call (s3.ts L143/150/190/203/219/237/285/292/300); system-monitoring.service.ts:153 adds its own 5 s deadline. No bare minio calls escape.
  • MULTI_NODE_DEPLOYMENT guard reads env.MULTI_NODE_DEPLOYMENT (zod-coerced, env.ts:80), test at filesystem-backend.test.ts:139. ✓

M8. Magic-byte enforcement

  • In-server uploads: files.ts:58 (bufferMatchesMime), berth-pdf.service.ts:218 (isPdfMagic). ✓
  • Presigned-PUT post-upload register: brochures.service.ts:258 (first-5-byte stream + %PDF-), berth-pdf.service.ts:259 (readFirstBytes + isPdfMagic). ✓
  • Filesystem proxy PUT: inline check route.ts:220 when token's c=application/pdf. ✓
  • S3 direct PUT: no inline check (relies on the register endpoint). Acceptable per CLAUDE.md, but document divergence: a future S3 consumer that forgets to call register leaks the gate.

Verified-clean (informational)

  • No BYTEA / binary-JSONB blob columns. ✓
  • Single canonical key format mismatch (storage_path vs storage_key) is documented + handled by per-table column mapping. ✓
  • validateStorageKey rejects traversal, absolute paths, dotfiles, and >1024 chars. ✓
  • Proxy token op-binding (get vs put) is enforced — replay across ops blocked. ✓
  • Proxy single-use replay protection via Redis SET NX with TTL pinned to token expiry. ✓
  • Filesystem HMAC secret falls back to a derived dev value but throws in production when unset. ✓
  • All blob keys are UUID-namespaced — collision-safe, not deterministic-audit-style. ✓

Suggested fix order

  1. C1 (one-line fix + smoke test) before any backend migration ships.
  2. C2 orphan reaper — cron job behind maintenance worker.
  3. C3 SSE-S3 — single-line putObject change + bucket-policy assertion at boot.
  4. H1 + H2 port-binding plumbing (small refactor, big invariant).
  5. H3 + H4 + M2 streaming pass over backup + migrator + email attachments.
  6. Remainder during next storage-config UI sweep.

34. Dependency upgrade analysis — Context7-assisted (follow-up after deps-auditor)

Post-session follow-up. Where the original deps-auditor covered abandonment + vulnerabilities, this section queries upstream changelogs via Context7 to weigh the pros/cons of pulling every available major. Use this as your bump roadmap.

Dependency upgrade analysis (Context7-assisted)

Companion to the deps-auditor report from the original 33-agent run. That auditor checked vulnerabilities + abandonment + license risk; this follow-up adds per-dep pros/cons of bumping to the latest stable, informed by upstream changelogs/docs queried via Context7.

Top-line baseline: pnpm audit reports 0 known vulnerabilities. No GPL/AGPL contamination. Lockfile reproducible. We are safe TODAY without any upgrade; everything below is "should we pull the next major in?" prioritization.


At a glance — what's outdated

| Package | Current | Latest | Bump size |
| --- | --- | --- | --- |
| next | 15.5.18 | 16.2.6 | major |
| eslint-config-next | 15.5.18 | 16.2.6 | major (matches next) |
| zod | 3.25.76 | 4.4.3 | major |
| tailwindcss | 3.4.19 | 4.3.0 | major |
| @hookform/resolvers | 3.10.0 | 5.2.2 | TWO majors |
| archiver | 7.0.1 | 8.0.0 | major |
| react-day-picker | 9.14.0 | 10.0.0 | major |
| eslint | 9.39.4 | 10.3.0 | major |
| esbuild | 0.27.7 | 0.28.0 | pre-1.0 minor (effectively major) |
| @playwright/test | 1.59.1 | 1.60.0 | minor |
| libphonenumber-js | 1.12.43 | 1.13.1 | minor |
| tailwind-merge | 3.5.0 | 3.6.0 | minor |
| bullmq | 5.76.6 | 5.76.8 | patch |
| @tanstack/react-query | 5.100.9 | 5.100.10 | patch |
| better-auth | 1.6.9 | 1.6.10 | patch |
| vitest | 4.1.5 | 4.1.6 | patch |
| lint-staged | 17.0.3 | 17.0.4 | patch |
| @vitest/coverage-v8 | 4.1.5 | 4.1.6 | patch |

@types/node deliberately pinned to ^20.19 to match Node 20 runtime (audit findings — was previously ^25 against a Node 20 runtime, which greenlit non-existent APIs).


Tier A — Pull the patches in now (zero-risk wins)

bullmq, @tanstack/react-query + react-query-devtools, better-auth, vitest + @vitest/coverage-v8, lint-staged, @playwright/test, libphonenumber-js, tailwind-merge.

Pros: patch / minor bumps, bug fixes only, no API changes documented. Cons: none material — pin-bumps after a 30-second pnpm install verify and full vitest run. Recommended: DO as one batch commit. ~5 minutes.


Tier B — Per-major analysis

B-1 — Next.js 15.5 → 16.2 (touches every API route + middleware)

Upstream summary (via Context7):

  • middleware.ts is renamed to proxy.ts in Next 16. The named export changes from middleware to proxy, and config flags are renamed to match (skipMiddlewareUrlNormalize → skipProxyUrlNormalize). Edge runtime is NOT supported in proxy; if you need edge runtime you must keep middleware.ts (we already use the Node runtime, so this is just a rename for us).
  • Async cookies() / headers() / params / searchParams was the Next-15 change; Next 16 hardens the warning into an error. We're already async-safe (CLAUDE.md confirms the upgrade landed).
  • Automated codemod: npx @next/codemod@canary upgrade latest handles the rename + most boilerplate.

Risk for us:

  • src/middleware.ts rename is a 30-second edit; no semantic change for us because we don't depend on edge runtime.
  • The Documenso webhook + websocket server custom-server path (src/server.ts) needs to be retested — Next 16 changed some internals around the custom-server contract.
  • eslint-config-next must bump in lockstep (already at 15.5.18 → 16.2.6).
  • Turbopack defaults shifted; our dev script (next dev --turbopack -H 0.0.0.0) needs a quick smoke run.

Recommended: WAIT 2-4 weeks. Next 16 dropped recently; let the field's bug reports settle. Then run the codemod + a full playwright smoke. Effort: 1-2h.


B-2 — Zod 3 → 4 (touches every validator file)

Upstream summary (via Context7):

  • Top-level format helpers — z.email() / z.uuid() / z.url() etc. replace z.string().email() / .uuid() / .url(). Old form is deprecated but still works.
  • Error customization unified: { message: '...' } → { error: '...' }. Old form deprecated.
  • z.function() API completely redesigned — now takes input/output schemas upfront, returns a function factory (not a schema).
  • Faster parsing: upstream benchmarks report up to ~14× on string parsing, with smaller gains on arrays and objects.
  • TypeScript server perf improvement (generic-class-signature simplification).

Risk for us:

  • We have ~30 validator files using z.string().email() / .uuid() style and { message: '...' } style throughout. Both still work in 4.x, but they are marked @deprecated, so they show up as editor and lint noise until migrated.
  • @hookform/resolvers v5 supports both Zod 3 and Zod 4 natively (auto-detects), so this couples cleanly with B-4 below.
  • We don't use z.function() anywhere, so the biggest breaking change is a non-issue for us.

Recommended: GO once Tier A is in. Codemod-friendly: a single Find/Replace pass on z.string().email() → z.email() etc. covers ~95% of the churn. Effort: 2-3h including running full vitest + writing replacement codemods.
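
The Find/Replace pass can be sketched as a plain string rewrite. rewriteZodFormats is a hypothetical helper and only covers the three formats named above; it deliberately leaves chains such as z.string().min(1).email() alone, since those need a manual reorder.

```typescript
// Formats whose z.string().<format>() form has a top-level replacement in Zod 4.
const FORMAT_HELPERS = ['email', 'uuid', 'url'] as const;

// Rewrites z.string().email( to z.email( (and likewise for uuid/url).
// Only the direct z.string().<format>( chain is matched; anything with an
// intermediate call such as .min(1) is left untouched for manual review.
function rewriteZodFormats(source: string): string {
  const pattern = new RegExp(
    `\\bz\\.string\\(\\)\\.(${FORMAT_HELPERS.join('|')})\\(`,
    'g',
  );
  return source.replace(pattern, (_match: string, helper: string) => `z.${helper}(`);
}
```

Running it over src/lib/validators/ and diffing the result is a cheap way to size the remaining manual churn before committing to the bump.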


B-3 — Tailwind CSS 3 → 4 (touches tailwind.config.ts, globals.css, every dynamic-class site)

Upstream summary (via Context7):

  • All-new Oxide engine — 5× faster full builds, 100× faster incremental.
  • CSS-first config: tailwind.config.ts is no longer the default. Theme is defined in globals.css via @theme + CSS custom properties (--color-brand: …); a legacy JS config can still be loaded explicitly with the @config directive.
  • PostCSS plugin consolidation: postcss.config.mjs switches from tailwindcss + autoprefixer + postcss-import plugins to single @tailwindcss/postcss.
  • Built on native cascade layers, OKLCH colors, container queries, @starting-style, popovers.
  • Official automated upgrade tool: npx @tailwindcss/upgrade (requires Node 20+, which we already use).

Risk for us:

  • We have a custom tailwind.config.ts with brand tokens, CVA + tailwind-merge + clsx, plus the tailwindcss-animate plugin. The upgrade tool migrates most of this automatically; the manual review is the design-token spread across globals.css.
  • shadcn/ui components (components/ui/*) use cn() + arbitrary values heavily. Some [--variable] syntax has changed in v4.
  • tailwindcss-animate may not yet support v4; confirm, or swap for tw-animate-css (the v4-native successor shadcn now recommends; see 35.C).

Recommended: HIGH-RISK / HIGH-REWARD. Park until you have a clear afternoon. The build-time speedup is genuinely meaningful for dev experience. Run the official upgrade tool on a throwaway branch first; visually diff a handful of critical pages before merging. Effort: 3-4h on a focused day; visual regressions are the variable.


B-4 — @hookform/resolvers 3 → 5 (touches every form file)

Upstream summary (via Context7):

  • v5 supports both Zod 3 and Zod 4 simultaneously via auto-detection — pulls zod/v4 if you opt into it explicitly.
  • Resolver options shape is the same as v3 ({ mode: 'async' | 'sync', raw?: boolean }).
  • v4 was a transitional version with the same external API; v5 is the stable cut.

Risk for us:

  • Coupled with the Zod 4 upgrade — if we stay on Zod 3, v5 still works (the resolver detects Zod-3 schemas via shape probing). Bumping resolvers without bumping Zod is safe.

Recommended: GO IN LOCKSTEP with B-2 (Zod 4). Effort: 5 min once Zod 4 is in.


B-5 — archiver 7 → 8 (touches GDPR-export bundle + backup-restore)

Upstream summary: Library "/gajus/archiver" not found in Context7 — fallback to npm changelog. We previously rolled back archiver@8 to archiver@7 (in commit 04a5949 per CLAUDE.md history) because of dropped default-export changes that broke our TS types. v8 stabilised since then.

Risk for us:

  • Last time we tried this it broke. Read the v8 changelog before retrying.
  • Used only for GDPR export + backup-restore — narrow blast radius. A failed upgrade is non-customer-facing.

Recommended: DEFER. Stay on 7 until either v8 demonstrably fixes a CVE / bug we care about, or until we have a green test suite to verify nothing regressed. Re-attempt only when there's a forcing function.


B-6 — react-day-picker 9 → 10 (touches every date-picker site)

Upstream summary: v10 is a recent cut. Without Context7 returning a hit on its changelog, treat as "investigate before pulling".

Risk for us:

  • Used in ~6 surfaces (reminder form, EOI date fields, expense date, invoice due-date, dashboard date-range picker). A breaking change to the calendar render path would affect every form.

Recommended: DEFER 2-3 weeks to let bug reports surface. Effort to actually do it: ~1h once the spec is reviewed.


B-7 — eslint 9 → 10 + eslint-config-next (touches CI)

Risk for us:

  • ESLint 10 likely drops support for some legacy rule configs.
  • eslint-config-next should bump in lockstep with next (B-1).

Recommended: PAIR WITH B-1. No standalone value to bumping eslint without bumping Next.


B-8 — esbuild 0.27 → 0.28 (touches build pipeline)

Risk for us:

  • We use esbuild via pnpm.overrides plus directly in build:server and build:worker scripts.
  • Pre-1.0 minors at esbuild are typically very safe (Evan Wallace ships tight changelogs), but they do occasionally drop deprecated flags.

Recommended: GO. Bundle the bump with the Tier A patches. Effort: 1 min + a pnpm build smoke.


Tier C — Things to leave alone

  • drizzle-orm 0.45.2 — current major. No upgrade needed.
  • react 19.2.6 / react-dom 19.2.6 — current React 19. Stable.
  • @radix-ui/* — all current. These ship patch updates frequently; consider a quarterly sweep but not blocking.
  • @dnd-kit/*, @pdfme/*, socket.io, bullmq, pino, postgres, minio, ioredis, pdf-lib, pdfkit, sharp, tesseract.js, recharts, cmdk, vaul, sonner, zustand, next-themes, date-fns, clsx, class-variance-authority, jose, nodemailer, mailparser, imapflow, openai, lucide-react, react-easy-crop, react-hook-form — all current within their major lines and either no risk-worthy bump available or already bumped.

Recommended sequencing

  1. Now — pull Tier A patches as one commit (~5 min).
  2. Now — esbuild 0.27 → 0.28 in same commit.
  3. Next focused half-day — Zod 4 + @hookform/resolvers v5 together. Coupled because resolvers v5 supports both. Codemod-able.
  4. 2-4 weeks — Next 15 → 16 + eslint-config-next 16 + eslint 10. Lockstep. Run @next/codemod first.
  5. When a tester-friendly afternoon opens up — Tailwind 4 via the official upgrade tool, with visual review across critical pages.
  6. Defer indefinitely — archiver 8, react-day-picker 10 (neither is delivering us anything we need).

Non-goal: chasing the bleeding edge on every dep. The audit's baseline finding stands — we are secure today. These are mostly developer-experience and perf wins, not security blockers.


35. Package adoption + PDF stack overhaul (Context7-assisted follow-up)

Companion to section 34. The deps-upgrade analysis answered "should we bump what we already have?" — this section answers two follow-on questions:

  1. PDF stack — are pdfme + pdfkit + pdf-lib the right tools? (No.)
  2. What aren't we using that we should be? — comprehensive sweep of the modern ecosystem against our actual pain points and codebase patterns.

User-directed exclusions:

  • react-hotkeys-hook (no keyboard-shortcut UX target).

35.A — PDF stack overhaul

Current state (5 packages, 4 distinct use cases)

| Package | Where it lives in our code | Use case |
| --- | --- | --- |
| @pdfme/common + generator + schemas v6.1.2 | src/lib/pdf/generate.ts + 8 template files | Declarative report/invoice/EOI templates |
| pdf-lib v1.17.1 | src/lib/pdf/fill-eoi-form.ts, src/lib/services/berth-pdf-parser.ts | AcroForm fill (EOI) + uploaded-PDF parsing (berth specs) |
| pdfkit v0.18.0 + @types/pdfkit | src/lib/services/expense-pdf.service.ts (only site) | Streaming receipt-attached expense reports |
| tesseract.js v7.0.0 | src/lib/ocr/tesseract-client.ts + scan-shell | Berth PDF OCR fallback |
| Bridge layer: 571-line src/lib/pdf/tiptap-to-pdfme.ts | Admin template builder | Tiptap JSON → pdfme schema converter |

Pain points

  • The 571-line tiptap-to-pdfme.ts bridge is fragile glue between a rich-text format (Tiptap JSON) and a declarative PDF schema (pdfme). Every supported formatting subset (bold, italic, headings, lists, tables, images) is hand-coded. Adding blockquote / codeBlock / horizontalRule / taskList is currently rejected at save time because the bridge doesn't support them.
  • pdfme templates are JSON blobs with positional { x, y } coordinates. Reading/editing them is painful (compare invoice-template.ts vs a declarative React component).
  • @pdfme/generator ships a heavy runtime including the schema engine and font loaders — irrelevant for our use case because we're code-driven, not visual-editor-driven.
  • 3 different generation libraries (pdfme + pdfkit + pdf-lib) means three different mental models, three different test patterns, three different failure modes.

Recommendation per use case

Use case 1 — Template-driven PDFs (8 templates): invoice, client-summary, interest-summary, berth-spec, revenue-report, occupancy-report, pipeline-report, eoi-standard-inapp.

→ Replace with @react-pdf/renderer (/diegomura/react-pdf, 161 snippets, benchmark 87.75).

Why it wins for us:

  • Declarative React components — uses the same skills we already have. No more positional { x, y } JSON.
  • Server-side rendering modes: renderToBuffer (HTTP responses), renderToStream (large reports), renderToFile (background jobs). All three usage patterns are documented and idiomatic — replaces pdfme's generate() call cleanly.
  • First-class page-break controls — break, wrap={false}, minPresenceAhead, orphans, widows. pdfme has none of these; we'd be hand-implementing them today if we needed them.
  • Fixed headers/footers via fixed prop with auto page-number rendering (render={({ pageNumber, totalPages }) => …}). We currently re-render header content per page in pdfme.
  • The Tiptap bridge problem dissolves: a rich-text component renders Tiptap JSON directly via a recursive component (~80 LOC, replaces 571 LOC). No more constrained-subset rejections at save time.
  • Tree-shakes — only the components we import ship; pdfme's generator pulls the full schema engine regardless.

Concrete migration cost: rewrite 8 templates as JSX. The shape is 1:1 with our current pdfme schemas (header section, repeating items, footer totals), so it's a mechanical translation. ~4-6 hours total. Bridge layer (571 LOC) goes to zero.

Caveats from Context7:

  • Font registration is explicit (Font.register({ family, src })) — our current fonts move from pdfme's font loader to a one-time call at boot.
  • No Tailwind class support — uses StyleSheet.create({ ... }) with a flexbox-style subset. Familiar to React Native devs.

Use case 2 — AcroForm fill (EOI):

→ Keep pdf-lib. Best-in-class for editing existing PDFs. No replacement candidate is better. Already used correctly in fill-eoi-form.ts.

Use case 3 — Uploaded PDF parsing (berth specs):

→ Add unpdf (/unjs/unpdf, 66 snippets) for text extraction; keep pdf-lib for AcroForm field extraction.

Why:

  • unpdf is the unjs ecosystem's serverless-friendly PDF parser built on pdf.js. Returns { totalPages, text } per page in one call.
  • Better than pdf-lib for text extraction because pdf-lib's text APIs are read-positional, not read-flow.
  • getDocumentProxy() lets us share one parse across extractText, extractLinks, getMeta — useful for the 3-tier berth parser (AcroForm first, OCR fallback, AI fallback) because we can grab all metadata in one pass.

Our current parser uses pdf-lib's low-level text extraction which has known issues with positionally-rendered text (the OCR fallback fires more often than necessary). unpdf.extractText would reduce that fallback rate.

Use case 4 — Streaming receipt-attached expense reports:

→ Keep pdfkit short-term, migrate to @react-pdf/renderer.renderToStream medium-term.

Why keep:

  • expense-pdf.service.ts is the only pdfkit consumer. Its streaming pattern (500 receipts at <100MB RSS) is the load-bearing reason for pdfkit's existence in our deps.
  • @react-pdf/renderer.renderToStream documented in Context7 supports the same use case — but verification needs an actual perf test against a 500-receipt fixture before committing.

Migration plan:

  • Phase 1 (now): replace pdfme templates with @react-pdf/renderer.
  • Phase 2 (after we have @react-pdf/renderer in the codebase): re-test expense-pdf with renderToStream against the 500-receipt fixture. If memory stays under 200MB, swap pdfkit out. If not, keep pdfkit and document the constraint.

Net result after Phase 1

Remove: @pdfme/common, @pdfme/generator, @pdfme/schemas, 571-line bridge file.

Keep: pdf-lib (AcroForm), pdfkit (streaming expenses, pending Phase 2), tesseract.js (OCR).

Add: @react-pdf/renderer, unpdf.

Deps net: -1 (three @pdfme packages out, two packages in), -571 LOC of bridge code, +standard declarative API for all templates.


35.B — High-value package additions (prioritized)

Each row below has been validated via Context7 unless marked otherwise.

Tier 1 — Adopt alongside the planned Zod 4 / Tailwind 4 work

| Package | Replaces / unlocks | Where it lands in our code | Effort |
| --- | --- | --- | --- |
| drizzle-zod (already in drizzle-orm) | ~30 hand-maintained validators in src/lib/validators/ | createInsertSchema(clients).omit({ id: true, portId: true }) etc. | 2-3h |
| @react-pdf/renderer | 8 pdfme templates + 571-line tiptap bridge | src/lib/pdf/templates/* | 4-6h |
| react-email + @react-email/components | 8 hand-strung HTML templates in src/lib/email/templates/ | Each becomes a .tsx component, rendered via await render(<…/>) then handed to nodemailer unchanged | 2-3h (one template at a time) |
| @tanstack/react-virtual | Pagination on client-list, yacht-list, berth-list, audit-log-list, inbox | useVirtualizer({ count, getScrollElement, estimateSize }) inside the list shells | 1h per list × 5 lists |
| ts-pattern | 19-case dispatch in search.service.ts, 13-case Documenso webhook, 12-case client-restore.service.ts, 10-case recently-viewed/route.ts, 10-case custom-fields/[entityId]/route.ts | match(input).with(...).exhaustive() | 30 min per site; start with the Documenso webhook |
| unpdf | Hand-rolled text extraction in berth-pdf-parser.ts | extractText(await getDocumentProxy(buf)) | 1h |

Tier 2 — Independent adopts (polish + perf)

| Package | What it does for us | Effort |
| --- | --- | --- |
| @formkit/auto-animate | One-liner useAutoAnimate() ref on any list. Drops into: deal pipeline kanban (pipeline-board.tsx), reminders rail, alerts rail, files list, notes list. Zero CSS. ~2kb. | 5 min per site |
| motion (formerly framer-motion) | Layout animations for kanban reorder (currently snaps), Vaul drawer enter/exit polish, sheet/drawer slides, <AnimatePresence> for inline edits. ~15kb gzip but tree-shakes well. | 1-2h to wire the kanban first |
| use-debounce | Replaces ad-hoc setTimeout debounce in yacht-picker, client-picker, audit-log-list, send-document-dialog, custom-fields-section, berth-picker, interest-picker, dedup-suggestion-panel (8 sites). Typed useDebouncedCallback. ~3kb. | 30 min total |
| fast-deep-equal | Memo comparator for DataTable and React Query select functions. Drops re-renders when stable references arrive with new identity. ~1kb. | 20 min |
| @upstash/ratelimit | Replaces hand-rolled rate limiters in src/lib/rate-limit.ts, api/helpers.ts, route-helpers.ts, document-sends.service.ts. Sliding-window / fixed-window / token-bucket algorithms tested at scale. Intended to run on our existing Redis, but verify first: the library is written against the @upstash/redis REST client rather than ioredis. | 1-2h |

Tier 3 — Strategic adopts (bigger commitments)

| Package | What it unlocks | Notes |
| --- | --- | --- |
| next-safe-action | Type-safe server actions with built-in Zod validation, ownership middleware, useHookFormAction hook. Each form drops ~30 LOC of apiFetch + toastError + mutation-hook plumbing to ~5. | Pairs with useHookFormAction which already speaks Zod/RHF. Migrate gradually — use for new forms first, keep API routes for external callers. Couples with Zod 4 (since safe-action v8+ targets Zod 4 best). |
| @axe-core/playwright | Accessibility audit during smoke tests. The 33-agent audit flagged WCAG gaps; this catches regressions automatically. | ~30 LOC of test setup. Fails CI on new violations. |
| @tiptap/core + @tiptap/react + extension packs | Real rich-text editor for notes (clients/interests/yachts/companies all have polymorphic notes). Currently plain text. Sales reps note things like "call after 4pm UTC, prefers WhatsApp" — bold/italic/links/lists/mentions would help. Tiptap's JSON output format is already in our codebase (the bridge layer), so we'd be storing the same shape we already render. | Decision: keep notes plain or upgrade to rich? If yes, ~3h to wire one entity's notes; the others copy the pattern. |
| @next/bundle-analyzer | Wraps next.config.ts. Generates client + server bundle treemaps after every build. Catches when a tiny PR pulls in recharts on a route that shouldn't have it. The 33-agent audit flagged recharts + pdfme as bundle bloat — this is the tooling to keep that honest. | 15 min setup. Run with ANALYZE=true pnpm build. |
| @sentry/nextjs | Error tracking with frontend + backend correlation, release tracking, source maps, performance traces, replay (optional). We have pino logs but no aggregation/alerting/correlation. Important once we have customer-facing users. | Decision: do we want a SaaS dependency? Self-hosted GlitchTip is also an option (Sentry-protocol-compatible). |
| @vercel/og (or satori) | Generate Open Graph images for shared docs/portal links. Currently the portal has no social previews; if a client shares their EOI link in WhatsApp/Email, the preview is blank. | ~10 LOC per route. 1h for portal share routes. |
| papaparse | CSV import/export. Sales reps frequently ask for "export to Excel." Plays well with our existing TanStack Table data. | ~17kb. 30 min for client/interest list export. |
| @formkit/tempo OR date-fns helpers | We have 44 files with hand-rolled new Date().toLocaleString() / .toLocaleDateString(). Centralize via a formatDate(date, format, timezone) helper using date-fns (already installed) — no new package needed if we use date-fns's format, formatDistance, formatRelative which we already have. This is a refactor, not an adoption. | 2-3h sweep |

Tier 4 — Defer or skip

| Package | Reason |
| --- | --- |
| next-pwa / @serwist/next | PWA assets pending (per MEMORY.md). When that lands, @serwist/next is the modern choice (next-pwa is unmaintained). For now, skip. |
| next-intl / i18next / @lingui/core | No i18n target today. When we localize, next-intl is the strongest Next.js App Router integration. For now, skip. |
| @knocklabs/node + @knocklabs/react | Notification center + channel routing + preferences UI. Likely overkill — we have a simple in-app + email notification system that works. Revisit if we add SMS or push. |
| inngest / trigger.dev | Background jobs with observability. We use BullMQ; revisit only if we need step functions / cross-service workflows. |
| posthog-js | Product analytics + feature flags + session recording. We have Umami for web analytics; PostHog adds product-level tracking. Decision pending. |
| @growthbook/growthbook | Feature flags only. We don't have any flagged features today. |
| fuse.js / minisearch | Client-side fuzzy search. Useful for already-loaded list filtering, but TanStack Table's built-in filter is usually enough. |
| @uppy/core + @uppy/dashboard | Rich file upload UI with resume, chunking. We have basic file inputs (0 patterns found in audit grep) — not currently a pain point. |
| @tanstack/react-form | Newer form library from the TanStack team (not the react-hook-form maintainers). RHF is mature, well-known, and we have 8 forms on it. No compelling migration. |
| valibot / arktype | Faster zod alternatives. We're committed to Zod 4. |
| react-hotkeys-hook | Excluded per user direction. |

35.C — Deprecation / cleanup candidates

| Package | Reason | Action |
| --- | --- | --- |
| @radix-ui/react-icons | We use lucide-react everywhere. Audit grep shows no imports from @radix-ui/react-icons. | Drop after grep-confirm. ~30s. |
| @pdfme/common + @pdfme/generator + @pdfme/schemas | Replaced by @react-pdf/renderer in Phase 1. | After PDF migration. |
| tailwindcss-animate v1.0.7 | Last published 2024, no v4 support. Replace with tw-animate-css (the v4-native successor shadcn now recommends). | Required if we move to Tailwind 4. |
| @types/pdfkit | Tops at v7.0.0. We're on pdfkit v0.18; types are loose but functional. Keep until we migrate expense-pdf to @react-pdf/renderer. | Defer. |
| pino-pretty in dependencies | Should be devDependencies only; ships ~500kb to prod worker images if it leaks into the runtime path. Audit-verify the build doesn't include it; move if it does. | 5 min check. |

35.D — Surfaced refactor opportunities (no new package required)

These came up while sweeping for package gaps. They're refactor wins, not package adoptions.

| Opportunity | Concrete sites | Tool |
| --- | --- | --- |
| Centralize date formatting | 44 files with hand-rolled .toLocaleString() / .toLocaleDateString() | formatDate(date, format, timezone) helper using existing date-fns |
| Centralize debounce | 8 picker/list components | use-debounce (or hand-rolled hook) |
| Centralize rate-limiting | 4 hand-rolled limiters | @upstash/ratelimit |
| Replace 5-9 large switch statements with exhaustive matchers | search.service.ts (19 cases), Documenso webhook (13), client-restore.service.ts (12), recently-viewed/route.ts (10), custom-fields/[entityId]/route.ts (10) | ts-pattern |
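
The exhaustiveness property that motivates ts-pattern in the last row can be seen in a dependency-free form. The sketch below uses a plain TypeScript `never` check rather than ts-pattern itself; it gives the same compile-time guarantee that ts-pattern's `.exhaustive()` provides, without the pattern-matching syntax. The `WebhookEvent` union and its event names are hypothetical stand-ins, not the actual Documenso payloads.

```typescript
// Hypothetical event union for illustration; the real webhook has more cases.
type WebhookEvent =
  | { type: "document.signed"; documentId: number }
  | { type: "document.rejected"; documentId: number; reason: string };

// Accepts only `never`, so it type-checks only when every case is handled.
function assertNever(value: never): never {
  throw new Error(`Unhandled case: ${JSON.stringify(value)}`);
}

function describeEvent(e: WebhookEvent): string {
  switch (e.type) {
    case "document.signed":
      return `signed:${e.documentId}`;
    case "document.rejected":
      return `rejected:${e.documentId}:${e.reason}`;
    default:
      // Adding a member to WebhookEvent without a matching case
      // turns this call into a compile error.
      return assertNever(e);
  }
}
```

The design point is that a forgotten case fails at build time instead of falling through silently at runtime, which is the bug class the audit attributes to the large switch statements.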

35.E — Final adoption order (revised, incorporating section 35)

This supersedes section 34's sequencing where they overlap.

  1. Now (one focused day): Zod 4 + @hookform/resolvers 5 + drizzle-zod. One PR. Codemod-friendly. Highest correctness payoff.
  2. Independent (any time): react-email migration of one template (portal-auth.ts recommended first), then expand. Independent of any version upgrade.
  3. Independent (any time): @react-pdf/renderer + unpdf. Replace 8 pdfme templates, delete 571-LOC bridge, add unpdf to berth parser.
  4. Independent (any time): ts-pattern in the Documenso webhook switch first (the audit's bug-class poster child), then sweep the other 4 sites.
  5. Independent (any time): @tanstack/react-virtual on client-list first, copy pattern to 4 other lists.
  6. Independent (any time): @formkit/auto-animate sprinkle. 5-minute wins per site.
  7. Independent (any time): @next/bundle-analyzer install. 15-min setup; ongoing bundle hygiene.
  8. Next focused half-day: motion wire to the kanban for smooth reorder.
  9. 2-4 weeks: Next 15 → 16 + eslint-config-next 16 + eslint 10 (lockstep, codemod).
  10. Focused afternoon: Tailwind 4 via official upgrade tool + swap tailwindcss-animate for tw-animate-css.
  11. When we have a new form to build: pilot next-safe-action there; backfill existing forms gradually.
  12. Decision required first: @sentry/nextjs (SaaS dep), @tiptap/* (rich notes Y/N?), posthog-js (analytics scope), papaparse (CSV export priority).

35.F — Skipped per user direction

  • react-hotkeys-hook — no keyboard-shortcut UX target across the platform.

36. Second-pass package sweep — mobile, fluidity, data speed, DX

Section 35 covered the headline adoption candidates. This section is the deliberate second sweep the user requested — looking specifically for libraries we may have missed across four dimensions: current functionality gaps, optimization (mobile included), UI fluidity, and data retrieval/writing speed.

Findings are grouped by dimension. Each entry says (a) what we have now, (b) what the library adds, (c) where in our codebase it'd land, (d) effort.


36.A — Data speed & concurrency

36.A.1 p-queue + p-limit + p-retry (Sindre Sorhus suite)

Concrete pain: 74 Promise.all(...) sites in services/routes. 8 mass-operation services (expense-pdf, berth-pdf, brochures, backup, document-templates, email-compose, documents, email-threads). Naive Promise.all([...mapped]) will:

  • Fire all 500 expense receipts to S3 simultaneously → MinIO connection pool exhaustion + memory spike (expense-pdf.service.ts docs explicitly call this out as a past problem).
  • Fire all bulk-send-document calls at Documenso simultaneously → hit Documenso's per-second rate limit, cause cascade failures.
  • Fire all email-compose attachments at SMTP simultaneously → SMTP connection limit on Mailgun/SES drops requests silently.

p-limit caps concurrency: pLimit(5) runs at most 5 at a time. p-queue is p-limit + interval rate limiting + pause/resume. p-retry handles exponential backoff retries for transient failures.

Land sites:

  • expense-pdf.service.ts — already has streaming logic, but the per-receipt S3 get calls are unbounded.
  • email-compose.service.ts — bulk send-out is the obvious one.
  • backup.service.ts — GDPR export streaming.
  • documents.service.ts — multi-file folder operations.

Effort: 30 min per service. ~1.5kb each.
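
To make the concurrency cap concrete, here is a dependency-free sketch of the behaviour p-limit provides: at most `limit` tasks in flight at once, with results returned in input order. The helper name `mapWithLimit` is hypothetical and not part of our codebase or of p-limit.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight.
// Hypothetical helper illustrating the idea behind pLimit(n).
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until the list is drained.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as T, i);
    }
  }

  // Start `limit` runners (fewer if the list is shorter) and wait for all.
  const runners = Array.from(
    { length: Math.min(limit, items.length) },
    () => runner(),
  );
  await Promise.all(runners);
  return results;
}
```

With p-limit itself the equivalent shape is `const limit = pLimit(5)` followed by `Promise.all(items.map((item) => limit(() => work(item))))`. In both forms a rejected task still rejects the whole batch, so per-item error handling (or p-retry) would be layered on separately.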

36.A.2 @tanstack/query-broadcast-client-experimental

Concrete pain: A rep has the CRM open in two tabs. They update a client in tab A — tab B's stale cache continues showing old values until the next refetch.

What it adds: BroadcastChannel sync between tabs. Free cross-tab cache coherence with no server roundtrips.

Land site: One line in src/providers/query-provider.tsx:

broadcastQueryClient({ queryClient, broadcastChannel: 'pn-crm' });

Effort: 5 minutes. ~2kb.

36.A.3 Underused Drizzle ORM features (no new package)

We have drizzle-orm 0.45.2 and use ~60% of its capabilities.

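  • db.batch(...) for atomic multi-statement writes in one round trip. Currently we use explicit db.transaction(async (tx) => {...}) blocks everywhere; batch is shorter and lets the driver pipeline. Caveat: Drizzle documents the batch API only for specific drivers (LibSQL, Neon HTTP, D1), so confirm it is available on our postgres.js driver before planning around it.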
  • Prepared statements via .prepare() — repeated queries (e.g., getClient(id) called per-request) can be prepared once at boot and reused. Postgres saves the parse+plan cost.
  • with (CTE) clauses — we have 30+ places where we'd benefit from WITH active_interests AS (...) SELECT ... instead of joining the same subquery twice. Audit found N+1 patterns; CTEs flatten them.

Land sites: the recommender SQL aggregate (already uses CTEs), dashboard.service.ts analytics queries, search.service.ts graph expansion. These are all "we already wrote raw SQL strings; rewriting as typed Drizzle CTEs" wins.

Effort: opportunistic. No package change.

36.A.4 postgres.js cursor for large reads

We have postgres ^3.4.9. Its await sql`...`.cursor(rows => ...) streams large result sets in batches without buffering all rows. Currently the GDPR-export bundling and the backup dump-tables paths buffer everything in memory.

Land sites: backup.service.ts, gdpr-export.service.ts (when we build it — currently parked).

Effort: opportunistic refactor when we touch those services.


36.B — UI fluidity & animation

36.B.1 @use-gesture/react (mobile gestures)

Concrete pain: mobile users can't swipe sideways between kanban columns or pinch-zoom berth photos, and the Vaul drawer's built-in swipe-to-dismiss has no tuned velocity thresholds. The audit's mobile pass flagged these.

What it adds: declarative gesture handlers (useDrag, usePinch, useScroll). Composes with motion for spring-physics responses.

Land sites:

  • Pipeline kanban: swipe between stage columns on mobile.
  • Vaul drawer: swipe-down to dismiss (Vaul already does this, but adding custom velocity thresholds via @use-gesture polishes the feel).
  • Berth/yacht photo galleries: pinch-zoom.

Effort: 1h to wire one site as the template. ~5kb.

36.B.2 embla-carousel-react

Concrete pain: berth photos and yacht photos render as static grids (per the audit). On mobile, users want to swipe through them.

What it adds: a lightweight, touch-native, accessibility-compliant carousel (embla-carousel-react). Plays with motion if we want fancy transitions. shadcn/ui has a Carousel component built on this, drop-in via the shadcn CLI.

Effort: npx shadcn@latest add carousel, then 10 lines to render the photo array. ~10kb gzip.

36.B.3 yet-another-react-lightbox

Concrete pain: clicking a berth photo currently navigates to a fullscreen image route or doesn't expand at all. Sales reps want lightbox-style preview.

What it adds: fullscreen lightbox with keyboard nav, zoom, swipe, slideshow, captions. Plugin system for video/PDF embed if we extend.

Land sites: berth/yacht detail pages, client docs preview.

Effort: 1h. ~15kb gzip with plugins.

36.B.4 react-resizable-panels

Concrete pain: the docs hub has a fixed-width folder sidebar (per CLAUDE.md's documents-hub rewrite). Power users on wide monitors want to drag-resize it.

What it adds: keyboard-accessible resizable split panes with persistent sizing (localStorage). shadcn/ui has a Resizable component built on this.

Land sites: docs hub (sidebar | content), email inbox (folder | thread), admin settings (nav | section).

Effort: npx shadcn@latest add resizable, drop in. ~5kb.


36.C — Mobile optimization

36.C.1 browser-image-compression

Concrete pain: the expense-scanner (scan-shell.tsx) and receipt upload paths accept full-resolution phone photos (typically 4-12 MB each). Mobile users on cellular pay bandwidth + battery for sending roughly 8-24× more data than necessary (4-12 MB versus a ~500 KB resized JPG). The server then re-runs sharp to resize anyway.

What it adds: client-side image compression in WebWorker before upload. Targets maxSizeMB, maxWidthOrHeight, useWebWorker. The server still validates magic-bytes + sharp-resizes, but receives a 500KB-resized JPG instead of a 12MB original.

Concrete win: a rep on 3G uploading a receipt: ~30s wait → ~5s wait. Server CPU spent in sharp drops to a fraction, since the image arrives already downsized and the server-side resize has little left to do.

Effort: 30 min to wire scan-shell.tsx. ~25kb gzip (worker-bundled so zero main-thread cost).

36.C.2 partysocket

Concrete pain: mobile users on flaky networks frequently lose the Socket.IO connection. Our current client uses Socket.IO's built-in reconnect, which is good but not great for mobile.

What it adds: drop-in WebSocket wrapper with:

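  • Exponential backoff with jitter (Socket.IO's client also backs off with a randomization factor, capped at reconnectionDelayMax, so this is a tuning difference rather than a missing feature).
  • Message queue while disconnected (Socket.IO likewise buffers emits while disconnected, unless they are sent with the volatile flag).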
  • Auto-reconnect on online event + visibilitychange (page wake).
  • Optional auto-detect connection quality (slow vs fast).

Land site: src/providers/socket-provider.tsx.

Effort: depends — partysocket works with raw WS, not Socket.IO's protocol. For Socket.IO we'd need socket.io-client + manual reconnect tuning, or migrate the realtime layer to plain WebSockets (significant). Park as a "mobile flake" investigation, not an immediate adoption.
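
For the "manual reconnect tuning" option, the underlying technique is a capped exponential delay with random jitter. The sketch below shows the general "full jitter" variant; it is not the internal algorithm of either partysocket or Socket.IO, and the function name and default values are assumptions chosen for illustration.

```typescript
// Delay before reconnect attempt `attempt` (0-based), in milliseconds.
// Full jitter: pick uniformly in [0, min(maxMs, baseMs * 2^attempt)).
// `random` is injectable so the schedule can be tested deterministically.
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 500,
  maxMs: number = 30000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

Jitter matters on mobile because many clients tend to wake at once (for example after a network blip); spreading their retries avoids a synchronized reconnect burst against the realtime server.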

36.C.3 react-virtuoso (alternative to TanStack Virtual)

Concrete pain: the inbox (src/components/layout/inbox.tsx) uses a plain <ScrollArea className="max-h-[400px]"> with no virtualization. For users with hundreds of unread items, mobile scrolling chugs.

What it adds: specialized virtualization for chat-like / inbox-like UIs with variable-height items and "scroll to bottom on new message" semantics. TanStack Virtual is more headless / generic; Virtuoso is opinionated and better for inbox-shaped UIs.

Land site: inbox.tsx specifically. For the regular lists (client/yacht/berth), TanStack Virtual is still the right call (section 35.B.4).

Effort: 45 min. ~10kb.

36.C.4 @formkit/auto-animate (revisit for mobile)

Already in section 35.B but worth re-emphasising: on mobile, list items appearing/disappearing without animation feels janky. Free polish.


36.D — Input quality & forms

36.D.1 react-imask or react-number-format

Concrete pain: we have currency inputs, phone inputs, date inputs spread across berth-form, expense-form, invoice-form, client-form. The audit flagged inconsistent formatting (decimals, thousand-separators, phone-prefix handling).

What it adds: declarative input masks, e.g. <IMaskInput mask={Number} scale={2} thousandsSeparator="," />. Plays cleanly with react-hook-form.

react-number-format is the lighter-weight, currency-specific option. react-imask covers more patterns (phone, date, custom).

Land sites: ~6 form components.

Effort: 30 min per form × 6 = 3h. OR keep our hand-rolled formatters and don't add the dep. Decision pending.

36.D.2 @hookform/devtools (dev-only)

What it adds: a floating panel in the browser showing react-hook-form state in real time (values, errors, isDirty, isValid, touched fields). Massive debug-time win.

Land site: wrap forms in <DevTool control={form.control} /> in dev builds only.

Effort: 15 min. dev-only, ships zero to prod.


36.E — Security & sanitization

36.E.1 isomorphic-dompurify

Concrete pain: src/lib/utils/markdown-email.ts hand-rolls HTML escape + safe-link rendering for email bodies. The audit raised XSS concerns (CRIT-2 in section 4) about admin-supplied content in templates and email bodies. Our hand-rolled escapeHtml is correct for the basic cases, but DOMPurify handles edge cases the audit listed (data URLs, nested encoding, javascript: in href attrs).

What it adds: battle-tested HTML sanitizer used by Google, Microsoft, GitHub. Works in Node + browser (the isomorphic- prefix is the SSR-compatible wrapper around the regular dompurify).

Land sites:

  • renderEmailBody() in markdown-email.ts.
  • Anywhere we render user-supplied HTML (template preview, document body display).

Effort: 1h migration + audit. ~25kb (Node) / ~50kb (browser), acceptable.

36.E.2 @noble/hashes (already covered by better-auth)

We use better-auth for password hashing. No need to add.

36.E.3 WebAuthn / Passkeys (@simplewebauthn/server + /browser)

What it adds: passwordless authentication via device passkeys (Touch ID, Windows Hello, YubiKey). Better Auth has a passkey plugin that wraps these.

Decision required: is passwordless a 2026 roadmap item?


36.F — Observability & perf measurement

36.F.1 web-vitals

Concrete pain: we have no real-user perf data. We don't know our P75 LCP, P75 INP, or P75 CLS across our user base. Any future perf optimization (Cache Components, Tailwind 4, dynamic imports) is shooting in the dark without baseline measurement.

What it adds: Google's official Core Web Vitals library. Ships onLCP, onINP, onCLS, onFCP, onTTFB callbacks. Reports values once per page lifecycle.

Land site: src/app/(dashboard)/layout.tsx — wire a listener that POSTs vitals to /api/v1/internal/vitals (new endpoint, append to existing client_metrics table or similar). 30 LOC end-to-end.

Effort: 1h including backend logging. ~2kb. High value because without this we're guessing about perf wins.
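
A minimal sketch of the client half, assuming the endpoint path named above. To stay import-free it declares a local `VitalMetric` shape containing only the fields it forwards (the web-vitals metric object carries these among others), and takes the transport as a parameter; `reportVital` and `Transport` are hypothetical names.

```typescript
// Subset of the metric fields we forward to the backend.
interface VitalMetric {
  name: string; // e.g. "LCP", "INP", "CLS"
  value: number;
  id: string;
  rating?: string;
}

// Transport is injected so the function can be exercised outside a browser.
type Transport = (url: string, body: string) => boolean;

// Endpoint path as proposed in the text; the route itself is not built yet.
const VITALS_ENDPOINT = "/api/v1/internal/vitals";

function reportVital(metric: VitalMetric, path: string, send: Transport): boolean {
  const body = JSON.stringify({
    name: metric.name,
    // Round to 3 decimals: enough for CLS, avoids noisy float tails.
    value: Math.round(metric.value * 1000) / 1000,
    id: metric.id,
    rating: metric.rating ?? null,
    path,
  });
  return send(VITALS_ENDPOINT, body);
}

// Browser wiring would look roughly like this (assumed, untested here):
//   import { onLCP, onINP, onCLS } from "web-vitals";
//   const beacon: Transport = (url, body) => navigator.sendBeacon(url, body);
//   onLCP((m) => reportVital(m, location.pathname, beacon));
```

Using sendBeacon as the transport lets the report survive page unload, which is when several of these metrics are finalized.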

36.F.2 pino-http

Concrete pain: we have request logging via custom middleware. pino-http is the canonical pino HTTP request logger with automatic request-id propagation, response time, status code, and integration with our pino logger. Likely already partially implemented via our hand-rolled middleware.

Effort: check existing middleware first — may already cover this.

36.F.3 @sentry/nextjs (revisit from section 35)

Covered in 35.B Tier 3. Adoption gated on the SaaS-dep decision.


36.G — TypeScript ergonomics

36.G.1 @total-typescript/ts-reset

Concrete pain: TypeScript's stdlib types have well-known foot-guns:

  • Array.isArray(x) narrows to any[] (drops the actual type).
  • JSON.parse(s) returns any (defeats type safety entirely).
  • fetch().json() returns Promise<any>.
  • .filter(Boolean) doesn't remove null | undefined from the type.
  • Array.prototype.includes is too strict on its argument.

ts-reset is a single .d.ts import (import '@total-typescript/ts-reset') that fixes all of these globally. Used by Anthropic, Stripe, Vercel internally.

Concrete impact: likely catches 10-20 latent bugs across our 1000+ TS files where someone called JSON.parse(body) and continued treating the result as a typed object without parsing through Zod.

Effort: 1 line in src/types/globals.d.ts. dev-time only, ships zero runtime.
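
Once JSON.parse returns unknown, call sites have to narrow before use. The sketch below shows that pattern with a hand-written type guard so it runs without dependencies; in our code the guard would more likely be a Zod schema. `WebhookBody` and both function names are hypothetical.

```typescript
// Hypothetical payload shape for illustration.
interface WebhookBody {
  event: string;
  documentId: number;
}

// Type guard: the only route from `unknown` to `WebhookBody`.
function isWebhookBody(x: unknown): x is WebhookBody {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  return typeof r["event"] === "string" && typeof r["documentId"] === "number";
}

function parseWebhookBody(raw: string): WebhookBody | null {
  // The explicit `unknown` annotation mirrors what ts-reset makes the default.
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // malformed JSON
  }
  return isWebhookBody(parsed) ? parsed : null;
}
```

Without ts-reset the `unknown` annotation is opt-in and easy to forget; with it, skipping the guard becomes a compile error at every JSON.parse site.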

36.G.2 type-fest

What it adds: ~150 utility types (SetRequired, SetOptional, PartialDeep, MergeDeep, Promisable, Jsonifiable, etc.) that extend TypeScript's built-ins.

Land sites: anywhere we're hand-rolling Omit<X, Y> & Pick<Z, W> gymnastics — type-fest usually has a named util that's clearer.

Effort: opportunistic. ~0kb runtime (types only).

36.G.3 tsc-files

Concrete pain: pre-commit hook runs ESLint on staged files (fast) but no type-check. Type errors slip through to CI.

What it adds: typecheck only the staged TS files and their dependencies, not the full repo. Drops a pre-commit hook from "skip because too slow" to "always on, sub-2-second."

Land site: .husky/pre-commit + lint-staged.config.mjs, with the entry "*.ts": ["tsc-files --noEmit"].

Effort: 15 min.


36.H — In-browser PDF viewing

36.H.1 pdfjs-dist + a viewer wrapper

Concrete pain: the docs hub (per CLAUDE.md) lets users upload and file PDFs. There's currently no in-app preview — clicking a file likely downloads it or opens in a new tab. A real CRM should preview the PDF inline.

What it adds:

  • pdfjs-dist is Mozilla's pdf.js — the engine.
  • @react-pdf-viewer/core is the most feature-rich React wrapper (zoom, search, annotations).
  • Alternatively, react-pdf (Wojtek Maj's, not @react-pdf/renderer) is a lighter wrapper.

Land site: docs hub file detail / preview pane. EOI signing preview in admin.

Effort: 2-3h for a polished viewer with zoom + page nav. ~150kb gzip (pdf.js is unavoidable; lazy-load only when preview opens).

Note vs section 35.A: @react-pdf/renderer (generator) and pdfjs-dist (viewer) are complementary. We need both: one to make PDFs, one to show them.


36.I — Testing & development data

36.I.1 @faker-js/faker

Concrete pain: seed data is currently hand-maintained (mostly). Faker would replace hand-rolled fake names, emails, addresses, phone numbers, vehicle/yacht names, dates, marina locations with reproducible, locale-aware fakes.

Land site: src/lib/db/seed.ts, src/lib/db/seed-synthetic.ts.

Effort: 1-2h. ~3MB gzip — dev-only, not shipped.

36.I.2 msw (Mock Service Worker)

Concrete pain: integration tests that hit external services (Documenso, SMTP, IMAP) either skip in CI or fail intermittently. msw intercepts fetch/HTTP at the network layer in tests so we can mock external responses deterministically.

Land site: tests/integration/ setup — wrap Documenso + SMTP clients with MSW handlers.

Effort: 2-3h. dev-only.


36.J — Workflow & state machines

36.J.1 @xstate/react

Audit found only one multi-step flow (send-document-dialog.tsx). EOI signing has steps but they're sequential, not state-machine-y. The GDPR export job is a backend state machine but bullmq handles it.

Verdict: not warranted right now. Revisit if we build the client-onboarding flow or the multi-step EOI-with-multi-berth-and-payment-and-signing wizard the roadmap mentions.


36.K — Search & filtering

36.K.1 Postgres-native FTS (no new package — schema migration)

Concrete pain: search.service.ts uses LIKE '%term%' on client/yacht/company tables. Slow at scale; doesn't rank.

What we could add: Postgres tsvector columns + GIN indexes + a single to_tsquery() call per search. This is all native Postgres, with no new npm dep. Drizzle supports it via sql`...` template literals.

Effort: migration (30 min) + service refactor (2h) + e2e re-run.
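
One detail of the service refactor is that to_tsquery() rejects raw user input containing its operator characters, so the search term needs normalizing first. The sketch below is one hypothetical way to do that, producing a prefix-matching query string that would be passed to to_tsquery() as a bound parameter. The helper name is an assumption, and the ASCII-only tokenizer is a simplification: accented names would need Unicode-aware splitting.

```typescript
// Turns free-text input into a prefix-matching tsquery string,
// e.g. "Port Nim" -> "port:* & nim:*". Returns null when nothing searchable remains.
// ASCII-only for simplicity; this drops accented characters.
function toPrefixTsQuery(input: string): string | null {
  const terms = input
    .toLowerCase()
    // Split on anything that is not a letter or digit, which also strips
    // tsquery operators such as & | ! : * ( ) and quotes.
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
  if (terms.length === 0) return null;
  return terms.map((t) => `${t}:*`).join(" & ");
}
```

Returning null for empty input lets the caller skip the query entirely rather than send an empty tsquery to Postgres.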

36.K.2 External search engines (meilisearch, typesense)

Verdict: overkill until we're past 100k clients per port. Postgres FTS will hold for years. Defer indefinitely.


36.L — Final updated adoption order (incorporating section 36)

Layered on section 35.E:

Same-day adopts (low-risk, high-leverage):

  • @total-typescript/ts-reset — 1-line type-safety upgrade. Do this before any Zod 4 work — it'll catch latent bugs along the way.
  • web-vitals — establish perf baseline before any optimization.
  • @hookform/devtools — dev-only DX win.

Adopt alongside section 35.B Tier 1:

  • p-limit — pair with the section 35 mass-operation refactors. The Documenso bulk-send path is the highest-priority site.
  • @tanstack/query-broadcast-client-experimental — 1-liner in the query provider.

Adopt with mobile/UX work:

  • browser-image-compression — wire into scan-shell first.
  • embla-carousel-react + yet-another-react-lightbox — pair with berth/yacht photo gallery work.
  • react-resizable-panels — pair with docs hub UX work.
  • @use-gesture/react — pair with kanban-on-mobile polish.

Adopt with security pass:

  • isomorphic-dompurify — replaces hand-rolled escapeHtml. Pair with the audit's XSS hardening pass.

Adopt with the docs hub Phase 2:

  • pdfjs-dist + viewer wrapper — when in-app PDF preview becomes a user request.

Park / defer:

  • partysocket (requires Socket.IO investigation first).
  • @xstate/react (no current target).
  • External search engines.
  • WebAuthn / passkeys (roadmap decision).

36.M — Final summary

The first sweep (section 35) found the headline replacements: Zod 4 + drizzle-zod + react-email + @react-pdf/renderer is the single highest-leverage week of work.

This second sweep (section 36) found the operational hardening layer:

  • p-limit family for the 74 unbounded Promise.all sites.
  • @total-typescript/ts-reset for free type safety across 1000+ files.
  • web-vitals to establish a perf baseline before we optimize.
  • isomorphic-dompurify to harden the email/template rendering.
  • browser-image-compression for mobile bandwidth / battery.
  • @tanstack/query-broadcast-client-experimental for free cross-tab cache sync.
  • react-resizable-panels + embla-carousel-react + yet-another-react-lightbox for the photo/preview surfaces.

Together with section 35, this gives us a concrete shopping list of ~20 packages with explicit land-sites in our code and effort estimates, plus 5-6 cleanup-candidate removals. Adopting all of them would shed ~600 LOC of hand-rolled code, eliminate ~5 categories of latent bugs (timezone, XSS, race conditions, type stdlib quirks, missing exhaustiveness), and meaningfully improve mobile UX + perf measurability.


Bottom line: the deps audit (section 34) showed we're secure today. This section (35) shows where we can make the codebase meaningfully better — smaller, cleaner, more declarative — by leveraging packages we don't yet use. The single highest-leverage move is Zod 4 + drizzle-zod + react-email in the same focused day: it kills the validator-drift problem, lands the 14× parse-perf win, and starts paying down the hand-strung-email-templates debt all at once. The PDF stack overhaul (35.A) is the second-highest-leverage move: removing pdfme + the 571-line Tiptap bridge in favor of declarative React components is a category-of-bug eliminator, not just a refactor.