Files
pn-new-crm/src/lib/db/migrations/0040_error_events.sql

29 lines
894 B
MySQL
Raw Normal View History

feat(errors): platform-wide request ids + error codes + admin inspector End-to-end error-handling overhaul. A user hitting any failure now sees a plain-text message + stable error code + reference id. A super admin can paste the id into /admin/errors/<id> for the full request shape, sanitized body, error stack, and a heuristic likely-cause hint. REQUEST CONTEXT (AsyncLocalStorage) - src/lib/request-context.ts mints a per-request frame carrying requestId + portId + userId + method + path + start timestamp. - withAuth wraps every authenticated handler in runWithRequestContext and accepts an upstream X-Request-Id header (validated shape) or generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id response header, including early-return 401/403/4xx paths. - Pino logger reads from the same context via mixin — every log line emitted during the request automatically carries the ids with no per-call threading. ERROR CODE REGISTRY - src/lib/error-codes.ts defines stable DOMAIN_REASON codes with HTTP status + plain-text user-facing message (no jargon, written for the rep on the phone with a customer). - New CodedError class wraps a registered code + optional internalMessage (admin-only — never sent to client). - Existing AppError subclasses got plain-text default rewrites so legacy throw sites improve immediately without migration. - High-impact services migrated to specific codes: expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths (CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY / PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender (INTEREST_PORT_MISMATCH). ERROR ENVELOPE - errorResponse always sets X-Request-Id header + requestId field. - 5xx responses include a "Quote error ID …" friendly line. - 4xx kept clean (validation, permission, not-found don't pollute the inspector — they're already in audit log). PERSISTENCE (error_events table, migration 0040) - One row per 5xx, keyed on requestId, with method/path/status/error name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap; password/token/secret/etc keys redacted)/duration/IP/UA/metadata. - captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code so the classifier can recognize FK / unique / NOT NULL / schema- drift violations. - Failure to persist is logged-not-thrown. LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts) - 4-pass heuristic (first match wins): 1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique, 42703 schema drift, 53300 connection limit, …) 2. Error class name (AbortError, TimeoutError, FetchError, ZodError) 3. Stack-path patterns (/lib/storage/, /lib/email/, documenso, openai|claude, /queue/workers/) 4. Free-text message keywords (econnrefused, rate limit, timeout, unauthorized|invalid api key) - Returns { label, hint, subsystem } for the inspector badge. CLIENT SIDE - apiFetch throws structured ApiError with message + code + requestId + details + retryAfter. - toastError() helper renders the standard 3-line toast: plain message / Error code: X / Reference ID: Y [Copy ID]. ADMIN INSPECTOR - /<port>/admin/errors lists captured 5xx with status badge + path + likely-culprit badge + truncated message + reference id. Filter by status code; auto-refresh via TanStack Query. - /<port>/admin/errors/<requestId> deep-dive: request shape, full error name+message+stack, sanitized body excerpt, raw metadata, registered-code lookup (so admin can compare to what user saw), likely-culprit hint with subsystem tag. - /<port>/admin/errors/codes is the in-app code reference page — every registered code grouped by domain prefix, searchable, with HTTP status + user message inline. Linked from inspector header so admins can flip to it while triaging. - Permission: admin.view_audit_log. Super admins see all ports; regular admins port-scoped. - system-monitoring dashboard now surfaces error_events alongside permission_denied audit + queue failed jobs (RecentError gains source: 'request' variant). DOCS - docs/error-handling.md walks through coded errors, plain-text message guidelines, client toasting, admin inspector usage, persistence rules, classifier internals, pruning, and the legacy → CodedError migration path. MIGRATION SAFETY - Audit confirmed all 41 migrations (0000-0040) apply cleanly in journal order against an empty DB. 0040 references ports(id) which exists from 0000. 0035/0038 don't deadlock under sequential psql -f. Removed redundant idx_ds_sent_by from 0038 (created in 0037). Tests: 1168/1168 vitest passing. tsc clean. - security-error-responses tests updated for plain-text messages + new optional response keys (code/requestId/message). - berth-pdf-versions tests assert stable error codes via toMatchObject({ code }) rather than message regex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
-- Per-request error capture table powering the super-admin error
-- inspector. A user pastes the "Error ID: …" they saw on a failed
-- mutation; the admin pulls the full row.
--
-- Pruned at 90 days by the maintenance worker.
CREATE TABLE IF NOT EXISTS error_events (
request_id text PRIMARY KEY,
port_id text REFERENCES ports(id) ON DELETE SET NULL,
user_id text,
status_code integer NOT NULL,
method text NOT NULL,
path text NOT NULL,
error_name text,
error_message text,
error_stack text,
request_body_excerpt text,
user_agent text,
ip_address text,
duration_ms integer,
metadata jsonb DEFAULT '{}'::jsonb,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_error_events_port_created
ON error_events(port_id, created_at);
CREATE INDEX IF NOT EXISTS idx_error_events_status_created
ON error_events(status_code, created_at);