# Error handling

## Overview

Every authenticated request runs inside an `AsyncLocalStorage` frame that carries a `requestId` (UUID) plus the resolved `portId` / `userId` / HTTP method / path / start time. The id surfaces:

- as `X-Request-Id` on every response header (success or failure)
- inside every pino log line emitted during the request
- in the JSON error body returned to the client (`requestId` field)
- as the primary key of the `error_events` row written when a 5xx fires

A user who hits a failure can copy the **Reference ID** from the toast, and a super admin can paste it into `//admin/errors/` to see the full request context, sanitized body, error stack, and a heuristic "likely culprit" hint.

## Throwing errors from a service

Use `CodedError` with a registered code:

```ts
import { CodedError } from '@/lib/errors';

if (!hasReceipts && !ack) {
  throw new CodedError('EXPENSES_RECEIPT_REQUIRED');
}
```

The code drives:

- the HTTP status (defined in `src/lib/error-codes.ts`)
- the **plain-text user-facing message** (no jargon; written for the rep on the phone with a customer)
- the stable identifier the user can quote to support

For more verbose internal context (admin-only), use `internalMessage`:

```ts
throw new CodedError('CROSS_PORT_LINK_REJECTED', {
  internalMessage: `interest ${a.id} (port ${a.portId}) ↔ berth ${b.id} (port ${b.portId})`,
});
```

The `internalMessage` lands in the `error_events` row and the admin inspector but **never** reaches the client.

## Adding a new error code

1. Open `src/lib/error-codes.ts`.
2. Add an entry to the `ERROR_CODES` map. Convention: `DOMAIN_REASON` in SCREAMING_SNAKE_CASE.

   ```ts
   FOO_INVALID_BAR: {
     status: 400,
     userMessage: 'That bar value is no good. Please try another.',
   },
   ```

3. Use it: `throw new CodedError('FOO_INVALID_BAR')`.
4. The code, status, and message are now contractually stable. Never rename a code once it has shipped: documentation, UI, and external integrations may pin to it.

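
To make the relationship between the registry and `CodedError` concrete, here is a minimal, self-contained sketch. It is not the real implementation: the shape of `ERROR_CODES`, the `CodedError` constructor, and the `toClientBody` helper are all assumptions for illustration. Only the `FOO_INVALID_BAR` entry is taken from the steps above.

```typescript
// Hypothetical stand-in for the registry in src/lib/error-codes.ts.
// Only FOO_INVALID_BAR comes from this document; the object shape is assumed.
const ERROR_CODES = {
  FOO_INVALID_BAR: {
    status: 400,
    userMessage: 'That bar value is no good. Please try another.',
  },
} as const;

// Deriving the union from the map means an unregistered code fails to compile.
type ErrorCode = keyof typeof ERROR_CODES;

// Assumed shape of CodedError; the real class in '@/lib/errors' may differ.
class CodedError extends Error {
  readonly code: ErrorCode;
  readonly status: number;
  readonly internalMessage: string | undefined;

  constructor(code: ErrorCode, options: { internalMessage?: string } = {}) {
    // The user-facing message and HTTP status are looked up from the registry.
    super(ERROR_CODES[code].userMessage);
    this.name = 'CodedError';
    this.code = code;
    this.status = ERROR_CODES[code].status;
    this.internalMessage = options.internalMessage;
  }
}

// Hypothetical serializer: it copies only client-safe fields,
// so internalMessage is never part of the response body.
function toClientBody(err: CodedError, requestId: string) {
  return { code: err.code, message: err.message, requestId };
}

const sample = new CodedError('FOO_INVALID_BAR', {
  internalMessage: 'bar=-1 from CSV import',
});
console.log(sample.status, toClientBody(sample, 'req-123'));
```

Typing the constructor argument as `keyof typeof ERROR_CODES` is one way to enforce the "registered code" rule at compile time; whether the real codebase does this is not stated here.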
## Plain-text message guidelines

User-facing messages should:

- Avoid internal jargon (no "constraint violation", "FK", "row lock").
- Be written for a rep on the phone with a customer.
- Include the suggested next action when natural ("Ask an admin if you think you should").
- Not include any technical detail that doesn't help the user; the request id + error code carry that.

Verbose technical detail belongs in `internalMessage` (admin-only).

## Client side

In a `useMutation`, render errors with the shared helper:

```ts
import { toastError } from '@/lib/api/toast-error';

const mutation = useMutation({
  mutationFn: () => apiFetch('/api/v1/foo', { method: 'POST', body: { ... } }),
  onSuccess: () => { ... },
  onError: (err) => toastError(err),
});
```

The toast renders three lines:

```
{plain-text message}
Error code: EXPENSES_RECEIPT_REQUIRED
Reference ID: 8f3c-ab12-…  [Copy ID]
```

The "Copy ID" action puts the request id on the clipboard so the user can paste it into a support ticket.

## Admin inspector

`//admin/errors` lists captured 5xx errors:

- Status badge + method + path
- "Likely culprit" badge (heuristic: Postgres SQLSTATE, error name, stack-path patterns, message keywords)
- Truncated error name + message
- Timestamp + reference id

Click any row for `//admin/errors/` which shows:

- Request shape (method / path / when / duration / port / user / IP / UA)
- Likely culprit + plain-English hint + subsystem tag
- Full error name, message, stack head (first 4 KB)
- Sanitized request body excerpt (max 1 KB; sensitive keys redacted)
- Raw metadata (Postgres SQLSTATE codes, internalMessage, etc.)

Permission: `admin.view_audit_log`. Super admins see every port's errors; regular admins are scoped to their active port.

## What gets persisted

| Status | error_events row? | Toast shows code? |
| ------ | ----------------- | ----------------- |
| 4xx    | No                | Yes               |
| 5xx    | **Yes**           | Yes               |

4xx errors are user-action mistakes (validation, not-found, permission denied).
They're visible in the audit log but not the error inspector; that table is reserved for platform faults.

5xx errors hit the `errorEvents` table via `captureErrorEvent` inside `errorResponse`, which:

1. Reads the request context from ALS.
2. Sanitizes + truncates the body (1 KB cap, sensitive keys redacted).
3. Pulls Postgres `code` / `severity` / `cause.code` if the underlying error is a `postgres` driver error.
4. Truncates the stack to 4 KB.
5. Inserts one row keyed on `requestId` with `ON CONFLICT DO NOTHING`.

Failure to persist NEVER throws: the user is already getting an error response, and we don't want a logging-pipeline failure to mask it.

## Likely-culprit classifier

`src/lib/error-classifier.ts` runs four passes against an `error_events` row, first match wins:

1. **Postgres SQLSTATE** (from `metadata.code`): 23502 NOT NULL, 23503 FK, 23505 unique, 23514 CHECK, 42703 schema drift, 42P01 missing table, 40001 serialization, 53300 connection limit, …
2. **Error class name**: `AbortError`, `TimeoutError`, `FetchError`, `ZodError`.
3. **Stack path**: `/lib/storage/`, `/lib/email/`, `documenso`, `openai|claude`, `/queue/workers/`.
4. **Message free-text**: `econnrefused`, `rate limit`, `timeout`, `unauthorized|invalid api key`.

Returns `null` when nothing matches; the inspector renders "Uncategorized" in that case. Adding a new heuristic is a one-line edit to the relevant array.

## Pruning

`error_events` rows are dropped after 90 days by the maintenance worker (TODO: confirm the worker has the deletion path; if not, add a periodic job that runs `DELETE FROM error_events WHERE created_at < now() - interval '90 days'`).

## Migration path for legacy throws

Existing `NotFoundError` / `ForbiddenError` / `ConflictError` / `ValidationError` / `RateLimitError` still work; the user-facing messages on these classes have been rewritten to plain-text defaults.
Migration to `CodedError` happens opportunistically: when touching a service to fix something else, swap the throw site for a registered code. A follow-up audit pass should walk `git grep "throw new ValidationError"` and migrate the user-impactful ones to specific codes.
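
As an illustration of the opportunistic swap, the sketch below shows one throw site before and after migration. The function names, the legacy message, and both class bodies are hypothetical stand-ins so the example runs on its own; the real `ValidationError` and `CodedError` are assumed to carry more than this.

```typescript
// Minimal stand-ins so the sketch is self-contained; not the real classes.
class ValidationError extends Error {}

class CodedError extends Error {
  readonly code: string;

  constructor(code: string) {
    super(code);
    this.code = code;
  }
}

// Before: a legacy throw with a generic class and a free-form message.
function assertBarLegacy(bar: number): void {
  if (bar < 0) {
    throw new ValidationError('Invalid bar');
  }
}

// After: the same check, now raising a registered, stable code.
function assertBar(bar: number): void {
  if (bar < 0) {
    throw new CodedError('FOO_INVALID_BAR');
  }
}
```

The check itself is unchanged; only the throw site moves to a code that support and the admin inspector can key on.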