feat(errors): platform-wide request ids + error codes + admin inspector

End-to-end error-handling overhaul. A user hitting any failure now sees
a plain-text message + stable error code + reference id. A super admin
can paste the id into /admin/errors/<id> for the full request shape,
sanitized body, error stack, and a heuristic likely-cause hint.

REQUEST CONTEXT (AsyncLocalStorage)
- src/lib/request-context.ts mints a per-request frame carrying
  requestId + portId + userId + method + path + start timestamp.
- withAuth wraps every authenticated handler in runWithRequestContext
  and accepts an upstream X-Request-Id header (validated shape) or
  generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id
  response header, including early-return 401/403/4xx paths.
- Pino logger reads from the same context via mixin, so every log line
  emitted during the request automatically carries the ids with no
  per-call threading.

ERROR CODE REGISTRY
- src/lib/error-codes.ts defines stable DOMAIN_REASON codes with HTTP
  status + plain-text user-facing message (no jargon, written for the
  rep on the phone with a customer).
- New CodedError class wraps a registered code + optional
  internalMessage (admin-only, never sent to client).
- Existing AppError subclasses got plain-text default rewrites so
  legacy throw sites improve immediately without migration.
- High-impact services migrated to specific codes: expenses
  (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths
  (CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY /
  PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender
  (INTEREST_PORT_MISMATCH).

ERROR ENVELOPE
- errorResponse always sets X-Request-Id header + requestId field.
- 5xx responses include a "Quote error ID …" friendly line.
- 4xx kept clean (validation, permission, not-found don't pollute the
  inspector; they're already in audit log).
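
The request-context frame described above can be sketched as follows. This is a minimal illustration, not the contents of src/lib/request-context.ts, which is not shown in this diff: the field names and the id pattern are taken from the withAuth hunk, while resolveRequestId and logFields are hypothetical helper names (logFields stands in for whatever the Pino mixin returns).

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

// Frame shape inferred from the object withAuth passes in; the real
// module may carry more fields.
interface RequestContext {
  requestId: string;
  portId: string;
  userId: string;
  method: string;
  path: string;
  startedAt: number;
}

const storage = new AsyncLocalStorage<RequestContext>();

/** Run `fn` inside a frame; everything it calls or awaits sees that frame. */
function runWithRequestContext<T>(ctx: RequestContext, fn: () => T): T {
  return storage.run(ctx, fn);
}

/** Current frame, or undefined when called outside a request. */
function getRequestContext(): RequestContext | undefined {
  return storage.getStore();
}

/** Accept an upstream id only if it matches the safe pattern; else mint a UUID. */
function resolveRequestId(incoming: string | null): string {
  return incoming && /^[A-Za-z0-9-]{8,64}$/.test(incoming) ? incoming : randomUUID();
}

/** Hypothetical mixin body: the fields merged into every log line. */
function logFields(): Record<string, string> {
  const ctx = getRequestContext();
  return ctx ? { requestId: ctx.requestId, portId: ctx.portId, userId: ctx.userId } : {};
}
```

Because the frame is a plain mutable object, withAuth can start it with empty portId/userId and fill them in once the session is resolved; later log lines then pick up the enriched values without any argument threading.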

PERSISTENCE (error_events table, migration 0040)
- One row per 5xx, keyed on requestId, with method/path/status/error
  name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap;
  password/token/secret/etc keys redacted)/duration/IP/UA/metadata.
- captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code so
  the classifier can recognize FK / unique / NOT NULL / schema-drift
  violations.
- Failure to persist is logged, not thrown.

LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts)
- 4-pass heuristic (first match wins):
  1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique, 42703
     schema drift, 53300 connection limit, …)
  2. Error class name (AbortError, TimeoutError, FetchError, ZodError)
  3. Stack-path patterns (/lib/storage/, /lib/email/, documenso,
     openai|claude, /queue/workers/)
  4. Free-text message keywords (econnrefused, rate limit, timeout,
     unauthorized|invalid api key)
- Returns { label, hint, subsystem } for the inspector badge.

CLIENT SIDE
- apiFetch throws structured ApiError with message + code + requestId +
  details + retryAfter.
- toastError() helper renders the standard 3-line toast: plain message /
  Error code: X / Reference ID: Y [Copy ID].

ADMIN INSPECTOR
- /<port>/admin/errors lists captured 5xx with status badge + path +
  likely-culprit badge + truncated message + reference id. Filter by
  status code; auto-refresh via TanStack Query.
- /<port>/admin/errors/<requestId> deep-dive: request shape, full error
  name+message+stack, sanitized body excerpt, raw metadata,
  registered-code lookup (so admin can compare to what user saw),
  likely-culprit hint with subsystem tag.
- /<port>/admin/errors/codes is the in-app code reference page: every
  registered code grouped by domain prefix, searchable, with HTTP
  status + user message inline. Linked from inspector header so admins
  can flip to it while triaging.
- Permission: admin.view_audit_log. Super admins see all ports; regular
  admins port-scoped.
- system-monitoring dashboard now surfaces error_events alongside
  permission_denied audit + queue failed jobs (RecentError gains
  source: 'request' variant).

DOCS
- docs/error-handling.md walks through coded errors, plain-text message
  guidelines, client toasting, admin inspector usage, persistence
  rules, classifier internals, pruning, and the legacy → CodedError
  migration path.

MIGRATION SAFETY
- Audit confirmed all 41 migrations (0000-0040) apply cleanly in
  journal order against an empty DB. 0040 references ports(id) which
  exists from 0000. 0035/0038 don't deadlock under sequential psql -f.
  Removed redundant idx_ds_sent_by from 0038 (created in 0037).

Tests: 1168/1168 vitest passing. tsc clean.
- security-error-responses tests updated for plain-text messages + new
  optional response keys (code/requestId/message).
- berth-pdf-versions tests assert stable error codes via
  toMatchObject({ code }) rather than message regex.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
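
The four-pass, first-match-wins classifier from the LIKELY-CULPRIT CLASSIFIER section can be sketched as follows. src/lib/error-classifier.ts is not part of the diff shown here, so this is an illustration under assumptions: the SQLSTATE codes, class names, stack patterns and keywords come from the commit message, but the CapturedError input shape, the label/hint/subsystem strings, and the null return for "no match" are all hypothetical.

```typescript
interface Culprit {
  label: string;
  hint: string;
  subsystem: string;
}

// Assumed input shape; the real classifier presumably reads an error_events row.
interface CapturedError {
  name?: string;
  message?: string;
  stack?: string;
  sqlState?: string;
}

// Pass 1: Postgres SQLSTATE → human reason (illustrative wording).
const BY_SQLSTATE = new Map<string, Culprit>([
  ['23503', { label: 'FK violation', hint: 'Referenced row is missing.', subsystem: 'database' }],
  ['23505', { label: 'Unique violation', hint: 'Duplicate value.', subsystem: 'database' }],
  ['42703', { label: 'Schema drift', hint: 'Column does not exist.', subsystem: 'database' }],
  ['53300', { label: 'Connection limit', hint: 'Too many connections.', subsystem: 'database' }],
]);

// Pass 2: error class name.
const BY_NAME = new Map<string, Culprit>([
  ['AbortError', { label: 'Aborted', hint: 'Request was cancelled.', subsystem: 'network' }],
  ['TimeoutError', { label: 'Timeout', hint: 'Upstream call timed out.', subsystem: 'network' }],
  ['FetchError', { label: 'Fetch failed', hint: 'Outbound HTTP failed.', subsystem: 'network' }],
  ['ZodError', { label: 'Validation', hint: 'Payload failed schema.', subsystem: 'validation' }],
]);

// Pass 3: stack-path patterns.
const BY_STACK: Array<[RegExp, Culprit]> = [
  [/\/lib\/storage\//, { label: 'Storage', hint: 'Check file storage.', subsystem: 'storage' }],
  [/\/lib\/email\//, { label: 'Email', hint: 'Check mail transport.', subsystem: 'email' }],
  [/documenso/i, { label: 'Documenso', hint: 'Check signing service.', subsystem: 'documents' }],
  [/openai|claude/i, { label: 'AI provider', hint: 'Check provider status.', subsystem: 'ai' }],
  [/\/queue\/workers\//, { label: 'Worker', hint: 'Check queue worker.', subsystem: 'queue' }],
];

// Pass 4: free-text message keywords (matched against the lowercased message).
const BY_MESSAGE: Array<[RegExp, Culprit]> = [
  [/econnrefused/, { label: 'Refused', hint: 'Dependency is down.', subsystem: 'network' }],
  [/rate limit/, { label: 'Rate limited', hint: 'Back off and retry.', subsystem: 'upstream' }],
  [/timeout/, { label: 'Timeout', hint: 'Upstream call timed out.', subsystem: 'network' }],
  [/unauthorized|invalid api key/, { label: 'Bad credentials', hint: 'Check API keys.', subsystem: 'config' }],
];

/** Runs the passes in order and returns the first hit, or null if none match. */
function classify(err: CapturedError): Culprit | null {
  const bySql = err.sqlState ? BY_SQLSTATE.get(err.sqlState) : undefined;
  if (bySql) return bySql;
  const byName = err.name ? BY_NAME.get(err.name) : undefined;
  if (byName) return byName;
  const stack = err.stack ?? '';
  for (const [pattern, culprit] of BY_STACK) if (pattern.test(stack)) return culprit;
  const message = (err.message ?? '').toLowerCase();
  for (const [pattern, culprit] of BY_MESSAGE) if (pattern.test(message)) return culprit;
  return null;
}
```

Ordering the passes from most to least specific means a structured signal such as a SQLSTATE always outranks a keyword that merely happens to appear in the message.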
2026-05-05 14:12:59 +02:00
parent c4a41d5f5b
commit 4723994bdc
26 changed files with 2027 additions and 169 deletions
--- a/src/lib/api/client.ts
+++ b/src/lib/api/client.ts
@@ -74,13 +74,57 @@ export async function apiFetch<T = unknown>(url: string, opts: ApiFetchOptions =

  if (!res.ok) {
    const error = await res.json().catch(() => ({ error: res.statusText }));
-    throw Object.assign(new Error(error.error ?? 'Request failed'), {
+    // Surface the request id so toasts can display "Reference ID: …" and
+    // the user can copy it to a support ticket. Server-side wrappers
+    // always set X-Request-Id, even on early-return 401/403 paths.
+    const requestId = error.requestId ?? res.headers.get('x-request-id') ?? null;
+    throw new ApiError({
+      message: error.error ?? error.message ?? 'Request failed',
      status: res.status,
-      code: error.code,
-      details: error.details,
+      code: error.code ?? null,
+      details: error.details ?? null,
+      requestId,
+      retryAfter: typeof error.retryAfter === 'number' ? error.retryAfter : null,
    });
  }

  if (res.status === 204) return undefined as T;
  return res.json() as Promise<T>;
 }
+
+/**
+ * Structured client-side error thrown by `apiFetch`. Carries the stable
+ * fields a toast / error boundary needs to render a useful message:
+ *
+ *   - `message`: plain-text, ready to show to the user
+ *   - `code`:    stable error code from `src/lib/error-codes.ts`
+ *   - `requestId`: reference id the user passes to support; an admin
+ *                  looks up the row at `/admin/errors/<requestId>`
+ *
+ * Mutations should use the `toastError(err)` helper rather than reading
+ * these fields directly — that keeps the toast format consistent.
+ */
+export class ApiError extends Error {
+  status: number;
+  code: string | null;
+  details: unknown;
+  requestId: string | null;
+  retryAfter: number | null;
+
+  constructor(args: {
+    message: string;
+    status: number;
+    code: string | null;
+    details: unknown;
+    requestId: string | null;
+    retryAfter: number | null;
+  }) {
+    super(args.message);
+    this.name = 'ApiError';
+    this.status = args.status;
+    this.code = args.code;
+    this.details = args.details;
+    this.requestId = args.requestId;
+    this.retryAfter = args.retryAfter;
+  }
+}
--- a/src/lib/api/helpers.ts
+++ b/src/lib/api/helpers.ts
@@ -1,3 +1,5 @@
+import { randomUUID } from 'node:crypto';
+
 import { and, eq } from 'drizzle-orm';
 import { NextRequest, NextResponse } from 'next/server';

@@ -8,6 +10,7 @@ import { type RolePermissions } from '@/lib/db/schema/users';
 import { createAuditLog } from '@/lib/audit';
 import { errorResponse } from '@/lib/errors';
 import { logger } from '@/lib/logger';
+import { runWithRequestContext, getRequestContext } from '@/lib/request-context';
 import {
  checkRateLimit,
  rateLimiters,
@@ -99,118 +102,151 @@ export function withAuth(
  routeContext: { params: Promise<Record<string, string>> },
 ) => Promise<NextResponse> {
  return async (req, routeContext) => {
-    try {
-      // 1. Validate session via Better Auth.
-      const session = await auth.api.getSession({ headers: req.headers });
-      if (!session?.user) {
-        return NextResponse.json({ error: 'Authentication required' }, { status: 401 });
-      }
+    // Mint or accept a request id BEFORE entering the ALS frame so every
+    // log line + the response header reference the same value. Clients
+    // (or upstream proxies) may pre-supply via X-Request-Id; otherwise
+    // generate a fresh UUID. Pattern-validated so a crafted header can't
+    // smuggle log-injection chars.
+    const incomingId = req.headers.get('x-request-id');
+    const requestId =
+      incomingId && /^[A-Za-z0-9-]{8,64}$/.test(incomingId) ? incomingId : randomUUID();

-      // 2. Load the CRM user profile (keyed on Better Auth user ID).
-      const profile = await db.query.userProfiles.findFirst({
-        where: eq(userProfiles.userId, session.user.id),
-      });
-      if (!profile || !profile.isActive) {
-        return NextResponse.json({ error: 'Account disabled' }, { status: 403 });
-      }
+    /** Stamp `X-Request-Id` onto every response leaving the wrapper. */
+    const tag = (res: NextResponse): NextResponse => {
+      res.headers.set('X-Request-Id', requestId);
+      return res;
+    };

-      // 3. Resolve port context.
-      //    Port ID comes from the X-Port-Id header (set by the client after port
-      //    selection), falling back to the user's default port from preferences.
-      //    It NEVER comes from the request body - SECURITY-GUIDELINES.md §2.1.
-      const portIdFromHeader = req.headers.get('X-Port-Id');
-      const portId =
-        portIdFromHeader ??
-        (profile.preferences as { defaultPortId?: string } | null)?.defaultPortId ??
-        null;
+    return runWithRequestContext(
+      {
+        requestId,
+        portId: '',
+        userId: '',
+        method: req.method,
+        path: new URL(req.url).pathname,
+        startedAt: Date.now(),
+      },
+      async () => {
+        try {
+          // 1. Validate session via Better Auth.
+          const session = await auth.api.getSession({ headers: req.headers });
+          if (!session?.user) {
+            return tag(NextResponse.json({ error: 'Authentication required' }, { status: 401 }));
+          }

-      if (!portId && !profile.isSuperAdmin) {
-        return NextResponse.json({ error: 'Port context required' }, { status: 400 });
-      }
+          // 2. Load the CRM user profile (keyed on Better Auth user ID).
+          const profile = await db.query.userProfiles.findFirst({
+            where: eq(userProfiles.userId, session.user.id),
+          });
+          if (!profile || !profile.isActive) {
+            return tag(NextResponse.json({ error: 'Account disabled' }, { status: 403 }));
+          }

-      // 4. Resolve effective permissions.
-      let permissions: RolePermissions | null = null;
-      let portSlug = '';
+          // 3. Resolve port context. Port id comes from the X-Port-Id
+          //    header (set by the client after port selection), falling
+          //    back to the user's default port preference. NEVER from the
+          //    request body — SECURITY-GUIDELINES.md §2.1.
+          const portIdFromHeader = req.headers.get('X-Port-Id');
+          const portId =
+            portIdFromHeader ??
+            (profile.preferences as { defaultPortId?: string } | null)?.defaultPortId ??
+            null;

-      if (!profile.isSuperAdmin && portId) {
-        const portRole = await db.query.userPortRoles.findFirst({
-          where: and(eq(userPortRoles.userId, profile.userId), eq(userPortRoles.portId, portId)),
-          with: {
-            role: true,
-            port: true,
-          },
-        });
+          if (!portId && !profile.isSuperAdmin) {
+            return tag(NextResponse.json({ error: 'Port context required' }, { status: 400 }));
+          }

-        if (!portRole) {
-          return NextResponse.json({ error: 'No access to this port' }, { status: 403 });
-        }
+          // 4. Resolve effective permissions.
+          let permissions: RolePermissions | null = null;
+          let portSlug = '';

-        permissions = { ...(portRole.role.permissions as RolePermissions) };
-        portSlug = (portRole.port as { slug: string } | null)?.slug ?? '';
+          if (!profile.isSuperAdmin && portId) {
+            const portRole = await db.query.userPortRoles.findFirst({
+              where: and(
+                eq(userPortRoles.userId, profile.userId),
+                eq(userPortRoles.portId, portId),
+              ),
+              with: {
+                role: true,
+                port: true,
+              },
+            });

-        // Apply port-specific role overrides (deep-merge on top of base role).
-        const override = await db.query.portRoleOverrides.findFirst({
-          where: and(
-            eq(portRoleOverrides.portId, portId),
-            eq(portRoleOverrides.roleId, portRole.roleId),
-          ),
-        });
+            if (!portRole) {
+              return tag(NextResponse.json({ error: 'No access to this port' }, { status: 403 }));
+            }

-        if (override?.permissionOverrides) {
-          permissions = deepMerge(
-            permissions as unknown as Record<string, unknown>,
-            override.permissionOverrides as Record<string, unknown>,
-          ) as RolePermissions;
-        }
+            permissions = { ...(portRole.role.permissions as RolePermissions) };
+            portSlug = (portRole.port as { slug: string } | null)?.slug ?? '';

-        // Per-user residential toggle - flips the residential domain on
-        // top of whatever the role grants. We never use it to *revoke*
-        // residential access from a role that already grants it.
-        if (portRole.residentialAccess && permissions) {
-          permissions = {
-            ...permissions,
-            residential_clients: { view: true, create: true, edit: true, delete: true },
-            residential_interests: {
-              view: true,
-              create: true,
-              edit: true,
-              delete: true,
-              change_stage: true,
+            // Apply port-specific role overrides (deep-merge on top of base role).
+            const override = await db.query.portRoleOverrides.findFirst({
+              where: and(
+                eq(portRoleOverrides.portId, portId),
+                eq(portRoleOverrides.roleId, portRole.roleId),
+              ),
+            });
+
+            if (override?.permissionOverrides) {
+              permissions = deepMerge(
+                permissions as unknown as Record<string, unknown>,
+                override.permissionOverrides as Record<string, unknown>,
+              ) as RolePermissions;
+            }
+
+            // Per-user residential toggle.
+            if (portRole.residentialAccess && permissions) {
+              permissions = {
+                ...permissions,
+                residential_clients: { view: true, create: true, edit: true, delete: true },
+                residential_interests: {
+                  view: true,
+                  create: true,
+                  edit: true,
+                  delete: true,
+                  change_stage: true,
+                },
+              };
+            }
+          } else if (profile.isSuperAdmin && portId) {
+            const port = await db.query.ports.findFirst({
+              where: eq(ports.id, portId),
+            });
+            if (!port) {
+              return tag(NextResponse.json({ error: 'Port not found' }, { status: 404 }));
+            }
+            portSlug = port.slug;
+          }
+
+          // Now that the user + port are resolved, enrich the ALS frame
+          // so log lines + error_events rows pick up the identifiers.
+          const frame = getRequestContext();
+          if (frame) {
+            frame.userId = profile.userId;
+            frame.portId = portId ?? '';
+          }
+
+          const ctx: AuthContext = {
+            userId: profile.userId,
+            portId: portId ?? '',
+            portSlug,
+            isSuperAdmin: profile.isSuperAdmin,
+            permissions,
+            user: {
+              email: session.user.email,
+              name: session.user.name,
            },
+            ipAddress: req.headers.get('x-forwarded-for')?.split(',')[0]?.trim() ?? 'unknown',
+            userAgent: req.headers.get('user-agent') ?? 'unknown',
          };
-        }
-      } else if (profile.isSuperAdmin && portId) {
-        // Super admin still needs portSlug for response context.
-        // We also validate the portId actually exists - a super-admin session
-        // must not be able to operate against a fabricated portId.
-        const port = await db.query.ports.findFirst({
-          where: eq(ports.id, portId),
-        });
-        if (!port) {
-          return NextResponse.json({ error: 'Port not found' }, { status: 404 });
-        }
-        portSlug = port.slug;
-      }

-      const ctx: AuthContext = {
-        userId: profile.userId,
-        portId: portId ?? '',
-        portSlug,
-        isSuperAdmin: profile.isSuperAdmin,
-        permissions,
-        user: {
-          email: session.user.email,
-          name: session.user.name,
-        },
-        ipAddress: req.headers.get('x-forwarded-for')?.split(',')[0]?.trim() ?? 'unknown',
-        userAgent: req.headers.get('user-agent') ?? 'unknown',
-      };
-
-      const params = await routeContext.params;
-      return await handler(req, ctx, params);
-    } catch (error) {
-      return errorResponse(error);
-    }
+          const params = await routeContext.params;
+          return tag(await handler(req, ctx, params));
+        } catch (error) {
+          return tag(errorResponse(error));
+        }
+      },
+    );
  };
 }

--- a/src/lib/api/toast-error.ts
+++ b/src/lib/api/toast-error.ts
@@ -0,0 +1,49 @@
+'use client';
+
+import { toast } from 'sonner';
+
+import { ApiError } from '@/lib/api/client';
+
+/**
+ * Render an API error as a toast in the consistent platform format:
+ *
+ *   ┌─────────────────────────────────────────────┐
+ *   │ {plain-text message}                        │
+ *   │                                             │
+ *   │ Error code: EXPENSES_RECEIPT_REQUIRED       │
+ *   │ Reference ID: ab12-cd34-…  [Copy]           │
+ *   └─────────────────────────────────────────────┘
+ *
+ * Use this anywhere a `useMutation({ onError })` would otherwise just
+ * call `toast.error(err.message)`. Falls back gracefully when the error
+ * isn't an ApiError (network errors, programmer errors, etc.).
+ */
+export function toastError(err: unknown, fallback = 'Something went wrong.'): void {
+  if (err instanceof ApiError) {
+    const lines: string[] = [];
+    if (err.code) lines.push(`Error code: ${err.code}`);
+    if (err.requestId) lines.push(`Reference ID: ${err.requestId}`);
+    toast.error(err.message, {
+      description: lines.length > 0 ? lines.join('\n') : undefined,
+      // Long enough to read the message + grab the reference id.
+      duration: 8_000,
+      action: err.requestId
+        ? {
+            label: 'Copy ID',
+            onClick: () => {
+              if (typeof navigator !== 'undefined' && navigator.clipboard) {
+                const copied = navigator.clipboard.writeText(err.requestId!);
+                void copied.then(() => toast.success('Reference ID copied'), () => undefined);
+              }
+            },
+          }
+        : undefined,
+    });
+    return;
+  }
+  if (err instanceof Error) {
+    toast.error(err.message || fallback);
+    return;
+  }
+  toast.error(fallback);
+}
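
The three-line format that toastError renders can also be expressed as a pure function, which makes the branching easy to unit-test without sonner or a browser. The sketch below is illustrative only: formatErrorLines and the ApiErrorLike structural type are hypothetical names that do not exist in the commit, and the structural check stands in for the `instanceof ApiError` test so the block needs no imports.

```typescript
// Structural stand-in for ApiError (hypothetical; the real helper uses instanceof).
interface ApiErrorLike {
  message: string;
  code: string | null;
  requestId: string | null;
}

function isApiErrorLike(err: unknown): err is ApiErrorLike {
  if (typeof err !== 'object' || err === null) return false;
  const e = err as Record<string, unknown>;
  return typeof e.message === 'string' && 'code' in e && 'requestId' in e;
}

/**
 * Returns the lines a toast would show: the plain message first, then
 * "Error code: …" and "Reference ID: …" when those fields are present.
 */
function formatErrorLines(err: unknown, fallback = 'Something went wrong.'): string[] {
  if (isApiErrorLike(err)) {
    const lines = [err.message];
    if (err.code) lines.push(`Error code: ${err.code}`);
    if (err.requestId) lines.push(`Reference ID: ${err.requestId}`);
    return lines;
  }
  // Network or programmer errors: show the message, or the fallback if empty.
  if (err instanceof Error) return [err.message || fallback];
  return [fallback];
}
```

A mutation's onError handler would still call toastError(err); a function like this is only useful where the same text is needed outside a toast, for example in an error boundary or a test assertion.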