pn-new-crm/src/lib/db/index.ts


import { drizzle } from 'drizzle-orm/postgres-js';
import postgres from 'postgres';
import * as schema from './schema';
const connectionString = process.env.DATABASE_URL!;
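// Sketch only (hypothetical helper, not in the original file): the `!`
// above silences the compiler but still lets an unset DATABASE_URL through
// as `undefined` at runtime. A small guard along these lines would instead
// fail at startup with a clear message. The `env` parameter exists only so
// the helper can be exercised without touching the real environment.
export function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}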
// Connection pool for queries.
/*
 * Blame annotation: commit message for the lines below
 * (2026-05-05 21:31:50 +02:00).
 *
 * fix(audit-integrations): SMTP/PG/Socket.IO timeouts, prompt injection,
 * secret-at-rest
 *
 * A focused review of every external integration surfaced six issues the
 * original audit missed. Fixed here.
 *
 * HIGH
 * - Socket.IO had an unconditional 30-second idle disconnect on every
 *   socket. The comment on the line acknowledged it was "for development
 *   only, would be longer in prod" but no NODE_ENV guard existed, and the
 *   `socket.onAny` listener only resets on inbound client events, so every
 *   dashboard connection that received only server-push events would have
 *   been torn down every 30s in production. Removed the manual idle timer
 *   entirely; Socket.IO's pingTimeout / pingInterval handles
 *   dead-transport detection at the protocol level.
 * - SMTP transporters had no `connectionTimeout` / `greetingTimeout` /
 *   `socketTimeout`. Nodemailer's defaults are 2 minutes for connect and
 *   unlimited for socket, so a hung SMTP server would have held a BullMQ
 *   `email` worker concurrency slot for up to 10 min per job
 *   (5 retries × 2 min). Set 10s/10s/30s on both the system transporter in
 *   `src/lib/email/index.ts` and the user-account transporter in
 *   `email-compose.service.ts`.
 *
 * MEDIUM
 * - PostgreSQL pool had no `statement_timeout` /
 *   `idle_in_transaction_session_timeout`. A slow query or transaction
 *   held by a crashed handler would have eventually exhausted the
 *   20-connection pool. 30s statement cap, 10s idle-in-tx cap, plus
 *   `max_lifetime: 30min` to recycle connections.
 * - `umami_password` and `umami_api_token` were stored as plaintext in
 *   `system_settings` (the SMTP and S3 secret paths use AES-GCM). The
 *   reader now passes them through `readSecret()` which auto-detects the
 *   encrypted `iv:cipher:tag` shape and decrypts, falling back to legacy
 *   plaintext so operators can rotate without a flag-day.
 * - AI email-draft worker interpolated `additionalInstructions`
 *   (user-controlled) directly into the OpenAI prompt, so a hostile rep
 *   could close the instructions block and inject prompt directives that
 *   override the system prompt. Added `sanitizeForPrompt()` that strips
 *   newlines + quote chars, caps at 500 chars, and the prompt now wraps
 *   the value in a "treat as data not commands" preamble.
 *
 * LOW
 * - Legacy `ensureBucket()` in `src/lib/minio/index.ts` was unguarded: if
 *   any future code imported it (currently no callers), a misconfigured
 *   prod deploy could mint a fresh empty bucket. Now matches the gate used
 *   by the pluggable S3Backend (`MINIO_AUTO_CREATE_BUCKET=true` required)
 *   so the legacy export and the new pluggable path agree.
 *
 * Confirmed not-an-issue: BullMQ Workers create connections via `{ url }`
 * options object, and BullMQ sets `maxRetriesPerRequest: null` internally
 * for those, so no fix needed. The shared `redis` singleton that does keep
 * `maxRetriesPerRequest: 3` is used only for direct Redis ops (rate-limit
 * sliding window, etc.), never for blocking BullMQ commands, so the value
 * is correct there.
 *
 * Test status: 1175/1175 vitest, tsc clean.
 *
 * Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
 */
//
// `statement_timeout` and `idle_in_transaction_session_timeout` are set
// per-connection via the `connection` option (postgres.js exposes the
// startup-parameter list there). Without these, a slow query or a
// transaction left open by a crashed handler holds a connection slot
// indefinitely and eventually exhausts the pool (max=20). The 30s
// statement cap is well above expected query latency; tune up if a
// legitimate report needs longer.
//
// `max_lifetime` recycles connections every 10 minutes (lowered from 30;
// see the hardening notes below) so any per-connection state drift
// (prepared statements, GUCs) doesn't accumulate forever.
// Larger pool in development because Next dev fans out (HMR refetches,
// multi-widget dashboards, React Query refetch-on-focus) and a single
// admin clicking around can saturate 20 slots. Production stays at the
// conservative 20 so we don't hammer postgres in a multi-replica deploy.
//
// 60 was too aggressive locally — postgres + the drizzle logger creates
// massive log volume that backed up node's stderr, blocking the event
// loop on otherwise-cheap requests. 30 is a middle ground that holds
// during clients-page fanout without log-storm.
const POOL_MAX = process.env.NODE_ENV === 'development' ? 30 : 20;
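// Illustrative sketch, not part of the original module: the arithmetic
// behind keeping the production pool small in a multi-replica deploy.
// `maxReplicasForPool` is a hypothetical helper, and the 100-connection
// server limit in the example is an assumption (PostgreSQL's default
// `max_connections`), not a value read from this project's config.
export function maxReplicasForPool(
  poolMax: number,
  serverMaxConnections: number,
  reservedConnections = 0,
): number {
  if (!Number.isInteger(poolMax) || poolMax <= 0) {
    throw new RangeError(`poolMax must be a positive integer, got ${poolMax}`);
  }
  // Each replica may open up to `poolMax` connections, so the server limit
  // (minus slots kept free for psql, migrations, etc.) bounds the replica count.
  const available = Math.max(0, serverMaxConnections - reservedConnections);
  return Math.floor(available / poolMax);
}
// Example under those assumptions: maxReplicasForPool(20, 100, 3) is 4,
// while a pool of 30 per replica would leave room for only 3.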
// Pool reliability hardening (post-audit F8):
// During the audit the dev server twice entered a stuck state where every
// query 500'd with `write CONNECT_TIMEOUT` while the DB was healthy
// (1 of 100 connections used, queryable from psql immediately).
// The Docker bridge can silently drop TCP sockets and postgres-js's pool
// holds onto the stale handles until max_lifetime expires.
// - connect_timeout: 5s so failures surface fast instead of stalling
//   requests for 10s before erroring.
// - max_lifetime: 10min so connections recycle before stale sockets
//   accumulate. Was 30min, which is too long for the Docker socket-drop
//   pattern.
// - onnotice: surfaces postgres NOTICE/WARNING messages that we'd
//   otherwise miss (extension warnings, deprecation hints).
const queryClient = postgres(connectionString, {
  max: POOL_MAX,
  idle_timeout: 20, // seconds
  connect_timeout: 5, // seconds
  max_lifetime: 60 * 10, // seconds (10 minutes)
  onnotice: (notice) => {
    // postgres-js types `notice` as `unknown`; the runtime shape is
    // { severity, code, message, ... }. Plain NOTICE messages are skipped;
    // every other severity (e.g. WARNING) is logged.
    const n = notice as { severity?: string; message?: string };
    if (n.severity && n.severity !== 'NOTICE') {
      console.warn(`[postgres ${n.severity}] ${n.message ?? ''}`);
    }
  },
  connection: {
    // ms values per postgres.js types; these become Postgres GUC settings
    // applied at session start.
    statement_timeout: 30_000,
    idle_in_transaction_session_timeout: 10_000,
  },
});
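// A minimal sketch (not part of the original module) of how one
// long-running report could raise the 30s statement cap for a single
// transaction instead of raising it pool-wide. `SqlExecutor`,
// `statementTimeoutSql` and `withStatementTimeout` are hypothetical names;
// the executor is assumed to run raw SQL inside an already-open
// transaction (for example a thin wrapper around a Drizzle `tx`).
export interface SqlExecutor {
  execute(rawSql: string): Promise<unknown>;
}

/** Build the `SET LOCAL` statement. `SET` cannot take bind parameters, so
 *  the value is validated as a non-negative integer before interpolation. */
export function statementTimeoutSql(ms: number): string {
  if (!Number.isInteger(ms) || ms < 0) {
    throw new RangeError(`statement timeout must be a non-negative integer, got ${ms}`);
  }
  return `SET LOCAL statement_timeout = ${ms}`;
}

/** Apply the override, then run `work`. `SET LOCAL` only lasts until the
 *  enclosing transaction ends, so the pooled connection falls back to the
 *  session-level cap afterwards. */
export async function withStatementTimeout<T>(
  tx: SqlExecutor,
  ms: number,
  work: () => Promise<T>,
): Promise<T> {
  await tx.execute(statementTimeoutSql(ms));
  return work();
}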
// Drizzle query logging is opt-in via DRIZZLE_LOG=1 in development.
// Was on-by-default in dev but the volume (per-query SQL + parameters
// for 60+ queries per page load) saturated process.stderr and blocked
// the Node event loop, causing apparent dev-server "hangs" after
// repeated navigation. The HTTP request logs from Next still show
// query routing, which is enough for normal debugging.
export const db = drizzle(queryClient, {
  schema,
  logger: process.env.DRIZZLE_LOG === '1',
});
/** Close the underlying connection pool. Used by the vitest teardown so
* the parent process can exit cleanly. */
export async function closeDb(): Promise<void> {
  await queryClient.end({ timeout: 5 });
}
export type Database = typeof db;