fix(audit-verification): regressions found in post-Tier-6 review

Two parallel reviews of the Tier 0–6 work surfaced one CRITICAL
regression and a handful of remaining cross-tenant gaps that the
original audit didn't enumerate. All fixed here:

CRITICAL
* document-reminders.processReminderQueue — the new bulk-fetch
  leftJoin to documentTemplates was scoped on `templateType` alone.
  Templates of the same type exist in every port; the cartesian
  explosion would have fired one Documenso reminder PER matching
  template-row per cron tick (a 5-port deploy = 5 reminders to the
  same signer per cycle). Added eq(documentTemplates.portId, portId)
  to the join.
* All five remaining Documenso webhook handlers (RecipientSigned /
  Completed / Opened / Rejected / Cancelled) now accept an optional
  portId and share a resolveWebhookDocument() helper that refuses to
  mutate when no port was resolved and the lookup is ambiguous across
  tenants. Tier 5's port-scoping was applied only to Expired; the
  route now forwards the matched portId to every handler. Subsequent
  UPDATEs have their WHERE clauses tightened to (id, portId) for
  defense-in-depth.
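
The join fan-out from the first item can be reproduced without a
database. This is a minimal sketch, assuming simplified in-memory
rows: the `Reminder` and `Template` shapes and the `leftJoin` helper
are hypothetical stand-ins, not the project's actual schema or query
builder calls.

```typescript
// Hypothetical row shapes; the real tables carry more columns.
type Reminder = { id: string; portId: string; templateType: string };
type Template = { id: string; portId: string; templateType: string };

// Naive in-memory left join: one output row per matching template,
// or a single row with `template: null` when nothing matches.
function leftJoin(
  reminders: Reminder[],
  templates: Template[],
  on: (r: Reminder, t: Template) => boolean,
): Array<{ reminder: Reminder; template: Template | null }> {
  return reminders.flatMap((r) => {
    const matches = templates.filter((t) => on(r, t));
    return matches.length > 0
      ? matches.map((t) => ({ reminder: r, template: t }))
      : [{ reminder: r, template: null }];
  });
}

// Five ports, each with its own template of the same type.
const templates: Template[] = [1, 2, 3, 4, 5].map((n) => ({
  id: `tpl-${n}`,
  portId: `port-${n}`,
  templateType: "nda",
}));
const reminders: Reminder[] = [
  { id: "rem-1", portId: "port-1", templateType: "nda" },
];

// Join on templateType alone: one row per port.
const unscoped = leftJoin(
  reminders,
  templates,
  (r, t) => r.templateType === t.templateType,
);

// Join on templateType AND portId: exactly one row.
const scoped = leftJoin(
  reminders,
  templates,
  (r, t) => r.templateType === t.templateType && r.portId === t.portId,
);

console.log(unscoped.length, scoped.length); // 5 1
```

If each joined row triggers one reminder, the unscoped join sends
five per tick for a single pending reminder, matching the 5-port
figure above.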

HIGH
* verifyDocumensoSecret rejects when `expected` is empty —
  timingSafeEqual(0-bytes, 0-bytes) was returning true, so a dev env
  with a blank DOCUMENSO_WEBHOOK_SECRET would accept a request whose
  X-Documenso-Secret header was also missing/empty.
  listDocumensoWebhookSecrets skips the env entry when blank.
* /api/public/health — the website-intake-secret comparison was a
  string `===` (not constant-time). Switched to timingSafeEqual via
  Buffer.from().
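
Both items come down to the same comparison rule. A minimal sketch of
that rule, assuming Node's built-in `crypto.timingSafeEqual`; the
`secretsMatch` name and signature are hypothetical and are not the
project's verifyDocumensoSecret:

```typescript
import { timingSafeEqual } from "node:crypto";

// Hypothetical helper; name and signature are illustrative only.
function secretsMatch(provided: string | undefined, expected: string): boolean {
  // A blank expected secret means "not configured": never a match.
  if (expected.length === 0) return false;
  const a = Buffer.from(provided ?? "", "utf8");
  const b = Buffer.from(expected, "utf8");
  // timingSafeEqual throws on unequal byte lengths, so check first.
  if (a.length !== b.length) return false;
  return timingSafeEqual(a, b);
}

// The underlying pitfall: two empty buffers compare as equal.
const emptyBuffersEqual = timingSafeEqual(Buffer.alloc(0), Buffer.alloc(0));
console.log(emptyBuffersEqual); // true
console.log(secretsMatch("", "")); // false
console.log(secretsMatch("s3cret", "s3cret")); // true
```

The early length check reveals only whether the lengths differ, which
is the usual trade-off when wrapping timingSafeEqual.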

MEDIUM
* server.ts SIGTERM ordering — Socket.io closes BEFORE the HTTP
  drain so long-poll websockets stop holding the server open past
  the compose stop_grace_period.
* /api/v1/me PATCH preferences merge — allow-list filter on the
  merged JSONB so legacy rows from the old .passthrough() era stop
  silently re-shipping their bloat to disk.
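
The preferences merge can be pictured as merge-then-filter. A minimal
sketch, assuming a made-up allow-list: the keys in `ALLOWED_KEYS` and
the `mergePreferences` helper are hypothetical, since the real
preference schema is not part of this diff.

```typescript
// Hypothetical allow-list; the real preference keys are not shown here.
const ALLOWED_KEYS = ["theme", "locale", "notifications"] as const;
type Preferences = Partial<Record<(typeof ALLOWED_KEYS)[number], unknown>>;

function mergePreferences(
  stored: Record<string, unknown>,
  patch: Record<string, unknown>,
): Preferences {
  // Patch values win over stored values.
  const merged: Record<string, unknown> = { ...stored, ...patch };
  // Copy only allow-listed keys, so unknown legacy keys are dropped
  // instead of being written back on every PATCH.
  const out: Preferences = {};
  for (const key of ALLOWED_KEYS) {
    if (key in merged) out[key] = merged[key];
  }
  return out;
}

const result = mergePreferences(
  { theme: "dark", legacyBlob: "stale data" },
  { locale: "en" },
);
console.log(JSON.stringify(result)); // {"theme":"dark","locale":"en"}
```

Filtering the merged object rather than only the incoming patch is
what cleans up rows that already contain unknown keys.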

Migration fixes (deploy-blocking)
* 0041 referenced `port_role_overrides.permissions` (column is
  `permission_overrides`) — overrides are partial JSONB and don't
  need backfilling at all (deepMerge resolves edit from the base
  role). Removed the override UPDATEs entirely.
* 0042 switched all FK + CHECK adds to NOT VALID + VALIDATE so the
  brief table-lock phase is decoupled from the row-scan validation,
  giving a cleaner abort-and-restart story if a constraint catches
  dirty production data. Added a pre-cleanup UPDATE for
  invoices.billing_entity_id = '' rows (backfills from clientName,
  falls back to the row id) so the new non-empty CHECK passes on a
  dirty table.
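
The 0042 pattern, sketched as a statement builder. This is an
illustration under assumptions, not the migration itself: the
`twoPhaseCheck` helper, the constraint name, the `client_name` column
name, and the `id::text` cast are all hypothetical; only
`invoices.billing_entity_id` comes from the description above.

```typescript
// Hypothetical helper: returns the two statements of a two-phase CHECK add.
function twoPhaseCheck(table: string, constraint: string, expr: string): string[] {
  return [
    // Phase 1: register the constraint without scanning existing rows.
    `ALTER TABLE ${table} ADD CONSTRAINT ${constraint} CHECK (${expr}) NOT VALID;`,
    // Phase 2: scan existing rows; can be re-run after cleaning dirty data.
    `ALTER TABLE ${table} VALIDATE CONSTRAINT ${constraint};`,
  ];
}

const statements: string[] = [
  // Pre-cleanup: fill blank ids from the client name, else from the row id.
  // Column names and the cast are assumptions.
  "UPDATE invoices SET billing_entity_id = COALESCE(NULLIF(client_name, ''), id::text) " +
    "WHERE billing_entity_id = '';",
  ...twoPhaseCheck(
    "invoices",
    "invoices_billing_entity_id_nonempty", // hypothetical constraint name
    "billing_entity_id <> ''",
  ),
];

for (const sql of statements) console.log(sql);
```

Running the cleanup before the VALIDATE step is what lets validation
succeed on a table that still has blank rows when the migration starts.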

Test status: 1175/1175 vitest, tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Matt Ciaccio
2026-05-05 21:19:39 +02:00
parent 83239104e0
commit 63c4073e64
10 changed files with 218 additions and 150 deletions


@@ -23,20 +23,26 @@ const dev = env.NODE_ENV !== 'production';
 async function gracefulShutdown(signal: string, httpServer: HttpServer): Promise<void> {
   logger.info({ signal }, 'Shutdown signal received; closing connections');
-  // Stop accepting new HTTP connections, then drain in-flight ones.
+  // Order matters: close Socket.io first so it stops accepting new
+  // sockets and emits disconnect events while the HTTP server is still
+  // up to flush them. `httpServer.close` only stops new connections;
+  // it waits for keep-alive HTTP and long-poll websockets to drain on
+  // their own, so without an explicit io.close() upfront the polls hold
+  // the server past the compose stop_grace_period and the process gets
+  // SIGKILL'd mid-frame.
+  await closeSocketServer().catch((err) => logger.warn({ err }, 'closeSocketServer error'));
+  // Then drain the HTTP layer.
   await new Promise<void>((resolve) => {
     httpServer.close((err) => {
       if (err) logger.warn({ err }, 'httpServer.close emitted error');
       resolve();
     });
-    // Hard timeout — `httpServer.close` waits for ALL keep-alive sockets
-    // to drain on their own, which can stretch much longer than the
-    // compose stop_grace_period. 25s leaves headroom under a 30s grace.
+    // Hard timeout — 25s leaves headroom under a 30s compose grace
+    // period before SIGKILL would arrive anyway.
     setTimeout(() => resolve(), 25_000).unref();
   });
-  await closeSocketServer().catch((err) => logger.warn({ err }, 'closeSocketServer error'));
   try {
     redis.disconnect();
   } catch (err) {