Bundles the prior autonomous-session output that was sitting unstaged:
- Em-dash sweep across src/ + tests/ (en-dash/em-dash to hyphen, ~2280 instances)
- country-flag-icons rollout (CountryFlag component, replaces emoji glyphs that
never rendered on Windows; lazy-loads the 3x2 SVG index as a single chunk
after the per-subpath dynamic-import approach silently failed in webpack)
- Admin IA Phase 1+2: 7-domain regroup, 41 to 38 pages, /admin/berths index,
redirects (ocr to ai, reports to dashboard, invitations to users),
docs/admin-ia-proposal.md
- Per-template email tester (registry + endpoint + UI on Email admin page)
- Cancel-document mode picker (delete-from-Documenso vs keep-for-audit)
- Dashboard PDF report: 25 widgets, SVG charts, date-range picker, 11 resolvers
- Customize-widgets per-region sortables at xl+ (charts/rails/feed); single
flat sortable below xl when the layout stacks; per-viewport saved orders
- Audit doc updates capturing each shipped item
- Lint fixes: react-compiler immutability in DonutChart (reduce instead of
let-reassign), set-state-in-effect disables in CountryFlag and
UploadForSigning preview-bytes effect, unused 'confirm' destructures in
interest contract + reservation tabs, unescaped apostrophe in test-template
card copy
During the audit the dev server twice entered a stuck state where every
query 500'd with `write CONNECT_TIMEOUT` while the DB was healthy (1/100
connections used, queryable from psql immediately). The Docker bridge can
silently drop TCP sockets and postgres-js holds the stale handles until
max_lifetime expires.
- connect_timeout: 10 → 5 (fail fast)
- max_lifetime: 30min → 10min (recycle before staleness accumulates)
- onnotice: surface NOTICE/WARNING for visibility
Reduces the window of stuck state. Full recovery still requires a
restart if the pool hard-fails. pgbouncer in production is the proper
long-term answer; this is the safe one-file change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes consolidated as the root-cause fix for the recurring dev
server hangs:
1) DEV pool max 60 → 30. 60 caused 60 simultaneous query log lines
written via process.stderr per page-load on heavy admin pages.
stderr write backpressure stalled the Node event loop, manifesting
as full HTTP request hangs (TCP accept worked, server never wrote
the response). 30 is enough headroom for the clients-page aggregate
fanout (≈12 queries) + sidebar widgets without the log-storm.
2) DRIZZLE_LOG opt-in. Drizzle's `logger: true` setting writes every
query (full SQL + params) to stderr. With 30 concurrent queries the
stderr buffer fills faster than the terminal can drain. Default is
now off in dev; set DRIZZLE_LOG=1 explicitly when you need it.
Stress-tested with rapid navigation across /dashboard /clients
/documents /yachts /companies /interests /berths /website-analytics —
all 200, no hangs, no timeouts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Default max=20 was saturating during normal admin clickthrough — the
clients list page does aggregate-per-client queries (yachts, memberships,
interests, contacts) that fan out 5-6 connections per row, plus
dashboard analytics, plus React Query refetch-on-focus. With 20 slots,
the server appeared to hang for 30s (statement_timeout) until queries
released their slots. Production keeps the conservative max=20 since
multi-replica deployments share the postgres max_connections budget.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A focused review of every external integration surfaced six issues the
original audit missed. Fixed here.
HIGH
* Socket.IO had an unconditional 30-second idle disconnect on every
socket. The comment on the line acknowledged it was "for development
only, would be longer in prod" but no NODE_ENV guard existed, and the
`socket.onAny` listener only resets on inbound client events — every
dashboard connection that received only server-push events would have
been torn down every 30s in production. Removed the manual idle
timer entirely; Socket.IO's pingTimeout / pingInterval handles
dead-transport detection at the protocol level.
* SMTP transporters had no `connectionTimeout` / `greetingTimeout` /
`socketTimeout`. Nodemailer's defaults are 2 minutes for connect
and unlimited for socket — a hung SMTP server would have held a
BullMQ `email` worker concurrency slot for up to 10 min per job
(5 retries × 2 min). Set 10s/10s/30s on both the system transporter
in `src/lib/email/index.ts` and the user-account transporter in
`email-compose.service.ts`.
MEDIUM
* PostgreSQL pool had no `statement_timeout` /
`idle_in_transaction_session_timeout`. A slow query or transaction
held by a crashed handler would have eventually exhausted the
20-connection pool. 30s statement cap, 10s idle-in-tx cap, plus
`max_lifetime: 30min` to recycle connections.
* `umami_password` and `umami_api_token` were stored as plaintext in
`system_settings` (the SMTP and S3 secret paths use AES-GCM). The
reader now passes them through `readSecret()` which auto-detects
the encrypted `iv:cipher:tag` shape and decrypts, falling back to
legacy plaintext so operators can rotate without a flag-day.
* AI email-draft worker interpolated `additionalInstructions` (user-
controlled) directly into the OpenAI prompt — a hostile rep could
close the instructions block and inject prompt directives that
override the system prompt. Added `sanitizeForPrompt()` that
strips newlines + quote chars, caps at 500 chars, and the prompt
now wraps the value in a "treat as data not commands" preamble.
LOW
* Legacy `ensureBucket()` in `src/lib/minio/index.ts` was unguarded —
if any future code imported it (currently no callers), a misconfigured
prod deploy could mint a fresh empty bucket. Now matches the gate
used by the pluggable S3Backend (`MINIO_AUTO_CREATE_BUCKET=true`
required) so the legacy export and the new pluggable path agree.
Confirmed not-an-issue: BullMQ Workers create connections via
`{ url }` options object, and BullMQ sets `maxRetriesPerRequest: null`
internally for those — no fix needed. The shared `redis` singleton
that does keep `maxRetriesPerRequest: 3` is used only for direct
Redis ops (rate-limit sliding window, etc.), never for blocking
BullMQ commands, so the value is correct there.
Test status: 1175/1175 vitest, tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Integration tests use makePort() which writes ports with slug 'test-port-{rand}'
and never cleans up. Result: 17,564 leaked rows in dev that slowed every page
load fetching the port-switcher list (and was contributing to smoke flakes).
Adds tests/global-setup.ts with a teardown() that DELETEs every 'test-port-%'
row plus its dependent rows across 30+ tables in one CTE. Wires it into
vitest.config.ts via globalSetup. Adds closeDb() helper so the teardown can
end the postgres-js pool cleanly (kills the 'Tests closed but Vite server
won't exit' warning).
Also lands docs/superpowers/specs/2026-04-28-country-phone-timezone-design.md
— full-scope agenda for the country dropdown / E.164 phone input /
country-driven timezone autofill work, ~7 dev days across 10 PRs. Per
user request: 'let's do this full-fledged if we're gonna do it'.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>