Commit Graph

6 Commits

Author SHA1 Message Date
2c57082d8d fix(P1): postgres-js pool reliability — F8
During the audit the dev server twice entered a stuck state where every
query 500'd with `write CONNECT_TIMEOUT` while the DB was healthy (1/100
connections used, queryable from psql immediately). The Docker bridge can
silently drop TCP sockets and postgres-js holds the stale handles until
max_lifetime expires.

- connect_timeout: 10 → 5  (fail fast)
- max_lifetime: 30min → 10min  (recycle before staleness accumulates)
- onnotice: surface NOTICE/WARNING for visibility

Reduces the window of stuck state. Full recovery still requires a
restart if the pool hard-fails. pgbouncer in production is the proper
long-term answer; this is the safe one-file change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 22:40:24 +02:00
de8726a9b9 fix(db): disable drizzle dev logger by default + pool max=30 (was 60)
Two changes consolidated as the root-cause fix for the recurring dev
server hangs:

1) DEV pool max 60 → 30. 60 caused 60 simultaneous query log lines
   written via process.stderr per page-load on heavy admin pages.
   stderr write backpressure stalled the Node event loop, manifesting
   as full HTTP request hangs (TCP accept worked, server never wrote
   the response). 30 is enough headroom for the clients-page aggregate
   fanout (≈12 queries) + sidebar widgets without the log-storm.

2) DRIZZLE_LOG opt-in. Drizzle's `logger: true` setting writes every
   query (full SQL + params) to stderr. With 30 concurrent queries the
   stderr buffer fills faster than the terminal can drain. Default is
   now off in dev; set DRIZZLE_LOG=1 explicitly when you need it.

Stress-tested with rapid navigation across /dashboard /clients
/documents /yachts /companies /interests /berths /website-analytics —
all 200, no hangs, no timeouts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 16:18:01 +02:00
eaa01d25f9 perf(db): bump postgres pool to 60 in development to prevent hub-hang under fanout load
Default max=20 was saturating during normal admin clickthrough — the
clients list page does aggregate-per-client queries (yachts, memberships,
interests, contacts) that fan out 5-6 connections per row, plus
dashboard analytics, plus React Query refetch-on-focus. With 20 slots,
the server appeared to hang for 30s (statement_timeout) until queries
released their slots. Production keeps the conservative max=20 since
multi-replica deployments share the postgres max_connections budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 15:56:25 +02:00
Matt Ciaccio
7bd969b41a fix(audit-integrations): SMTP/PG/Socket.IO timeouts, prompt injection, secret-at-rest
A focused review of every external integration surfaced six issues the
original audit missed.  Fixed here.

HIGH
* Socket.IO had an unconditional 30-second idle disconnect on every
  socket.  The comment on the line acknowledged it was "for development
  only, would be longer in prod" but no NODE_ENV guard existed, and the
  `socket.onAny` listener only resets on inbound client events — every
  dashboard connection that received only server-push events would have
  been torn down every 30s in production.  Removed the manual idle
  timer entirely; Socket.IO's pingTimeout / pingInterval handles
  dead-transport detection at the protocol level.
* SMTP transporters had no `connectionTimeout` / `greetingTimeout` /
  `socketTimeout`.  Nodemailer's defaults are 2 minutes for connect
  and unlimited for socket — a hung SMTP server would have held a
  BullMQ `email` worker concurrency slot for up to 10 min per job
  (5 retries × 2 min).  Set 10s/10s/30s on both the system transporter
  in `src/lib/email/index.ts` and the user-account transporter in
  `email-compose.service.ts`.

MEDIUM
* PostgreSQL pool had no `statement_timeout` /
  `idle_in_transaction_session_timeout`.  A slow query or transaction
  held by a crashed handler would have eventually exhausted the
  20-connection pool.  30s statement cap, 10s idle-in-tx cap, plus
  `max_lifetime: 30min` to recycle connections.
* `umami_password` and `umami_api_token` were stored as plaintext in
  `system_settings` (the SMTP and S3 secret paths use AES-GCM).  The
  reader now passes them through `readSecret()` which auto-detects
  the encrypted `iv:cipher:tag` shape and decrypts, falling back to
  legacy plaintext so operators can rotate without a flag-day.
* AI email-draft worker interpolated `additionalInstructions` (user-
  controlled) directly into the OpenAI prompt — a hostile rep could
  close the instructions block and inject prompt directives that
  override the system prompt.  Added `sanitizeForPrompt()` that
  strips newlines + quote chars, caps at 500 chars, and the prompt
  now wraps the value in a "treat as data not commands" preamble.

LOW
* Legacy `ensureBucket()` in `src/lib/minio/index.ts` was unguarded —
  if any future code imported it (currently no callers), a misconfigured
  prod deploy could mint a fresh empty bucket.  Now matches the gate
  used by the pluggable S3Backend (`MINIO_AUTO_CREATE_BUCKET=true`
  required) so the legacy export and the new pluggable path agree.

Confirmed not-an-issue: BullMQ Workers create connections via
`{ url }` options object, and BullMQ sets `maxRetriesPerRequest: null`
internally for those — no fix needed.  The shared `redis` singleton
that does keep `maxRetriesPerRequest: 3` is used only for direct
Redis ops (rate-limit sliding window, etc.), never for blocking
BullMQ commands, so the value is correct there.

Test status: 1175/1175 vitest, tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:31:50 +02:00
Matt Ciaccio
4036c16f39 test(infra): vitest globalSetup teardown purges test-port-* leaks
Integration tests use makePort() which writes ports with slug 'test-port-{rand}'
and never cleans up. Result: 17,564 leaked rows in dev that slowed every page
load fetching the port-switcher list (and was contributing to smoke flakes).

Adds tests/global-setup.ts with a teardown() that DELETEs every 'test-port-%'
row plus its dependent rows across 30+ tables in one CTE. Wires it into
vitest.config.ts via globalSetup. Adds closeDb() helper so the teardown can
end the postgres-js pool cleanly (kills the 'Tests closed but Vite server
won't exit' warning).

Also lands docs/superpowers/specs/2026-04-28-country-phone-timezone-design.md
— full-scope agenda for the country dropdown / E.164 phone input /
country-driven timezone autofill work, ~7 dev days across 10 PRs. Per
user request: 'let's do this full-fledged if we're gonna do it'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 13:28:15 +02:00
67d7e6e3d5 Initial commit: Port Nimara CRM (Layers 0-4)
Some checks failed
Build & Push Docker Images / build-and-push (push) Has been cancelled
Build & Push Docker Images / deploy (push) Has been cancelled
Build & Push Docker Images / lint (push) Has been cancelled
Full CRM rebuild with Next.js 15, TypeScript, Tailwind, Drizzle ORM,
PostgreSQL, Redis, BullMQ, MinIO, and Socket.io. Includes 461 source
files covering clients, berths, interests/pipeline, documents/EOI,
expenses/invoices, email, notifications, dashboard, admin, and
client portal. CI/CD via Gitea Actions with Docker builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:52:51 +01:00