feat(errors): platform-wide request ids + error codes + admin inspector
End-to-end error-handling overhaul. A user hitting any failure now sees
a plain-text message + stable error code + reference id. A super admin
can paste the id into /admin/errors/<id> for the full request shape,
sanitized body, error stack, and a heuristic likely-cause hint.
REQUEST CONTEXT (AsyncLocalStorage)
- src/lib/request-context.ts mints a per-request frame carrying
requestId + portId + userId + method + path + start timestamp.
- withAuth wraps every authenticated handler in runWithRequestContext
and accepts an upstream X-Request-Id header (validated shape) or
generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id
response header, including early-return 401/403/4xx paths.
- Pino logger reads from the same context via mixin — every log
line emitted during the request automatically carries the ids
with no per-call threading.
ERROR CODE REGISTRY
- src/lib/error-codes.ts defines stable DOMAIN_REASON codes with
HTTP status + plain-text user-facing message (no jargon, written
for the rep on the phone with a customer).
- New CodedError class wraps a registered code + optional
internalMessage (admin-only — never sent to client).
- Existing AppError subclasses got plain-text default rewrites so
legacy throw sites improve immediately without migration.
- High-impact services migrated to specific codes:
expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths
(CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY /
PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender
(INTEREST_PORT_MISMATCH).
ERROR ENVELOPE
- errorResponse always sets X-Request-Id header + requestId field.
- 5xx responses include a "Quote error ID …" friendly line.
- 4xx kept clean (validation, permission, not-found don't pollute
the inspector — they're already in audit log).
PERSISTENCE (error_events table, migration 0040)
- One row per 5xx, keyed on requestId, with method/path/status/error
name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap;
password/token/secret/etc keys redacted)/duration/IP/UA/metadata.
- captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code
so the classifier can recognize FK / unique / NOT NULL / schema-
drift violations.
- Failure to persist is logged-not-thrown.
LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts)
- 4-pass heuristic (first match wins):
1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique,
42703 schema drift, 53300 connection limit, …)
2. Error class name (AbortError, TimeoutError, FetchError,
ZodError)
3. Stack-path patterns (/lib/storage/, /lib/email/, documenso,
openai|claude, /queue/workers/)
4. Free-text message keywords (econnrefused, rate limit, timeout,
unauthorized|invalid api key)
- Returns { label, hint, subsystem } for the inspector badge.
CLIENT SIDE
- apiFetch throws structured ApiError with message + code + requestId
+ details + retryAfter.
- toastError() helper renders the standard 3-line toast:
plain message / Error code: X / Reference ID: Y [Copy ID].
ADMIN INSPECTOR
- /<port>/admin/errors lists captured 5xx with status badge + path +
likely-culprit badge + truncated message + reference id. Filter by
status code; auto-refresh via TanStack Query.
- /<port>/admin/errors/<requestId> deep-dive: request shape, full
error name+message+stack, sanitized body excerpt, raw metadata,
registered-code lookup (so admin can compare to what user saw),
likely-culprit hint with subsystem tag.
- /<port>/admin/errors/codes is the in-app code reference page —
every registered code grouped by domain prefix, searchable, with
HTTP status + user message inline. Linked from inspector header
so admins can flip to it while triaging.
- Permission: admin.view_audit_log. Super admins see all ports;
regular admins port-scoped.
- system-monitoring dashboard now surfaces error_events alongside
permission_denied audit + queue failed jobs (RecentError gains
source: 'request' variant).
DOCS
- docs/error-handling.md walks through coded errors, plain-text
message guidelines, client toasting, admin inspector usage,
persistence rules, classifier internals, pruning, and the
legacy → CodedError migration path.
MIGRATION SAFETY
- Audit confirmed all 41 migrations (0000-0040) apply cleanly in
journal order against an empty DB. 0040 references ports(id)
which exists from 0000. 0035/0038 don't deadlock under sequential
psql -f. Removed redundant idx_ds_sent_by from 0038 (created in
0037).
Tests: 1168/1168 vitest passing. tsc clean.
- security-error-responses tests updated for plain-text messages
+ new optional response keys (code/requestId/message).
- berth-pdf-versions tests assert stable error codes via
toMatchObject({ code }) rather than message regex.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
/ * *
* Heuristic "likely culprit" classifier for the admin error inspector .
*
* Given an ` error_events ` row , returns a short human - readable label +
* a longer hint pointing at the probable root cause . This is best - effort
chore(autonomous-session): consolidate uncommitted work from prior session
Bundles the prior autonomous-session output that was sitting unstaged:
- Em-dash sweep across src/ + tests/ (en-dash/em-dash to hyphen, ~2280 instances)
- country-flag-icons rollout (CountryFlag component, replaces emoji glyphs that
never rendered on Windows; lazy-loads the 3x2 SVG index as a single chunk
after the per-subpath dynamic-import approach silently failed in webpack)
- Admin IA Phase 1+2: 7-domain regroup, 41 to 38 pages, /admin/berths index,
redirects (ocr to ai, reports to dashboard, invitations to users),
docs/admin-ia-proposal.md
- Per-template email tester (registry + endpoint + UI on Email admin page)
- Cancel-document mode picker (delete-from-Documenso vs keep-for-audit)
- Dashboard PDF report: 25 widgets, SVG charts, date-range picker, 11 resolvers
- Customize-widgets per-region sortables at xl+ (charts/rails/feed); single
flat sortable below xl when the layout stacks; per-viewport saved orders
- Audit doc updates capturing each shipped item
- Lint fixes: react-compiler immutability in DonutChart (reduce instead of
let-reassign), set-state-in-effect disables in CountryFlag and
UploadForSigning preview-bytes effect, unused 'confirm' destructures in
interest contract + reservation tabs, unescaped apostrophe in test-template
card copy
2026-05-23 00:52:59 +02:00
* - the goal is to save the admin five minutes of stack reading on the
feat(errors): platform-wide request ids + error codes + admin inspector
End-to-end error-handling overhaul. A user hitting any failure now sees
a plain-text message + stable error code + reference id. A super admin
can paste the id into /admin/errors/<id> for the full request shape,
sanitized body, error stack, and a heuristic likely-cause hint.
REQUEST CONTEXT (AsyncLocalStorage)
- src/lib/request-context.ts mints a per-request frame carrying
requestId + portId + userId + method + path + start timestamp.
- withAuth wraps every authenticated handler in runWithRequestContext
and accepts an upstream X-Request-Id header (validated shape) or
generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id
response header, including early-return 401/403/4xx paths.
- Pino logger reads from the same context via mixin — every log
line emitted during the request automatically carries the ids
with no per-call threading.
ERROR CODE REGISTRY
- src/lib/error-codes.ts defines stable DOMAIN_REASON codes with
HTTP status + plain-text user-facing message (no jargon, written
for the rep on the phone with a customer).
- New CodedError class wraps a registered code + optional
internalMessage (admin-only — never sent to client).
- Existing AppError subclasses got plain-text default rewrites so
legacy throw sites improve immediately without migration.
- High-impact services migrated to specific codes:
expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths
(CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY /
PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender
(INTEREST_PORT_MISMATCH).
ERROR ENVELOPE
- errorResponse always sets X-Request-Id header + requestId field.
- 5xx responses include a "Quote error ID …" friendly line.
- 4xx kept clean (validation, permission, not-found don't pollute
the inspector — they're already in audit log).
PERSISTENCE (error_events table, migration 0040)
- One row per 5xx, keyed on requestId, with method/path/status/error
name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap;
password/token/secret/etc keys redacted)/duration/IP/UA/metadata.
- captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code
so the classifier can recognize FK / unique / NOT NULL / schema-
drift violations.
- Failure to persist is logged-not-thrown.
LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts)
- 4-pass heuristic (first match wins):
1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique,
42703 schema drift, 53300 connection limit, …)
2. Error class name (AbortError, TimeoutError, FetchError,
ZodError)
3. Stack-path patterns (/lib/storage/, /lib/email/, documenso,
openai|claude, /queue/workers/)
4. Free-text message keywords (econnrefused, rate limit, timeout,
unauthorized|invalid api key)
- Returns { label, hint, subsystem } for the inspector badge.
CLIENT SIDE
- apiFetch throws structured ApiError with message + code + requestId
+ details + retryAfter.
- toastError() helper renders the standard 3-line toast:
plain message / Error code: X / Reference ID: Y [Copy ID].
ADMIN INSPECTOR
- /<port>/admin/errors lists captured 5xx with status badge + path +
likely-culprit badge + truncated message + reference id. Filter by
status code; auto-refresh via TanStack Query.
- /<port>/admin/errors/<requestId> deep-dive: request shape, full
error name+message+stack, sanitized body excerpt, raw metadata,
registered-code lookup (so admin can compare to what user saw),
likely-culprit hint with subsystem tag.
- /<port>/admin/errors/codes is the in-app code reference page —
every registered code grouped by domain prefix, searchable, with
HTTP status + user message inline. Linked from inspector header
so admins can flip to it while triaging.
- Permission: admin.view_audit_log. Super admins see all ports;
regular admins port-scoped.
- system-monitoring dashboard now surfaces error_events alongside
permission_denied audit + queue failed jobs (RecentError gains
source: 'request' variant).
DOCS
- docs/error-handling.md walks through coded errors, plain-text
message guidelines, client toasting, admin inspector usage,
persistence rules, classifier internals, pruning, and the
legacy → CodedError migration path.
MIGRATION SAFETY
- Audit confirmed all 41 migrations (0000-0040) apply cleanly in
journal order against an empty DB. 0040 references ports(id)
which exists from 0000. 0035/0038 don't deadlock under sequential
psql -f. Removed redundant idx_ds_sent_by from 0038 (created in
0037).
Tests: 1168/1168 vitest passing. tsc clean.
- security-error-responses tests updated for plain-text messages
+ new optional response keys (code/requestId/message).
- berth-pdf-versions tests assert stable error codes via
toMatchObject({ code }) rather than message regex.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
* common cases ( FK violations , schema drift , external service outages ,
* timeouts ) without giving false confidence on the unusual ones .
*
* The classifier reads from data already on the row ( no DB lookups ) ,
* so it can run inside a server - component render with no extra cost .
* /
import type { ErrorEvent } from '@/lib/db/schema/system' ;
export interface LikelyCulprit {
/** Short label for a badge / column. */
label : string ;
/** Longer hint shown in the detail view, with a "next step" suggestion. */
hint : string ;
/** Subsystem tag for filtering: 'db' | 'storage' | 'email' | … */
subsystem : string ;
}
/ * *
* Postgres SQLSTATE codes commonly thrown by ` postgres ` driver wrappers .
* Drizzle bubbles these up unchanged on the ` code ` field . We translate
* the most - common ones into plain - English admin hints .
* /
const PG_CODE_HINTS : Record < string , LikelyCulprit > = {
'23502' : {
label : 'NOT NULL violation' ,
chore(autonomous-session): consolidate uncommitted work from prior session
Bundles the prior autonomous-session output that was sitting unstaged:
- Em-dash sweep across src/ + tests/ (en-dash/em-dash to hyphen, ~2280 instances)
- country-flag-icons rollout (CountryFlag component, replaces emoji glyphs that
never rendered on Windows; lazy-loads the 3x2 SVG index as a single chunk
after the per-subpath dynamic-import approach silently failed in webpack)
- Admin IA Phase 1+2: 7-domain regroup, 41 to 38 pages, /admin/berths index,
redirects (ocr to ai, reports to dashboard, invitations to users),
docs/admin-ia-proposal.md
- Per-template email tester (registry + endpoint + UI on Email admin page)
- Cancel-document mode picker (delete-from-Documenso vs keep-for-audit)
- Dashboard PDF report: 25 widgets, SVG charts, date-range picker, 11 resolvers
- Customize-widgets per-region sortables at xl+ (charts/rails/feed); single
flat sortable below xl when the layout stacks; per-viewport saved orders
- Audit doc updates capturing each shipped item
- Lint fixes: react-compiler immutability in DonutChart (reduce instead of
let-reassign), set-state-in-effect disables in CountryFlag and
UploadForSigning preview-bytes effect, unused 'confirm' destructures in
interest contract + reservation tabs, unescaped apostrophe in test-template
card copy
2026-05-23 00:52:59 +02:00
hint : 'A required column was missing on insert. Check the validator vs the schema - a recently added .notNull() column may not have a default.' ,
feat(errors): platform-wide request ids + error codes + admin inspector
End-to-end error-handling overhaul. A user hitting any failure now sees
a plain-text message + stable error code + reference id. A super admin
can paste the id into /admin/errors/<id> for the full request shape,
sanitized body, error stack, and a heuristic likely-cause hint.
REQUEST CONTEXT (AsyncLocalStorage)
- src/lib/request-context.ts mints a per-request frame carrying
requestId + portId + userId + method + path + start timestamp.
- withAuth wraps every authenticated handler in runWithRequestContext
and accepts an upstream X-Request-Id header (validated shape) or
generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id
response header, including early-return 401/403/4xx paths.
- Pino logger reads from the same context via mixin — every log
line emitted during the request automatically carries the ids
with no per-call threading.
ERROR CODE REGISTRY
- src/lib/error-codes.ts defines stable DOMAIN_REASON codes with
HTTP status + plain-text user-facing message (no jargon, written
for the rep on the phone with a customer).
- New CodedError class wraps a registered code + optional
internalMessage (admin-only — never sent to client).
- Existing AppError subclasses got plain-text default rewrites so
legacy throw sites improve immediately without migration.
- High-impact services migrated to specific codes:
expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths
(CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY /
PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender
(INTEREST_PORT_MISMATCH).
ERROR ENVELOPE
- errorResponse always sets X-Request-Id header + requestId field.
- 5xx responses include a "Quote error ID …" friendly line.
- 4xx kept clean (validation, permission, not-found don't pollute
the inspector — they're already in audit log).
PERSISTENCE (error_events table, migration 0040)
- One row per 5xx, keyed on requestId, with method/path/status/error
name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap;
password/token/secret/etc keys redacted)/duration/IP/UA/metadata.
- captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code
so the classifier can recognize FK / unique / NOT NULL / schema-
drift violations.
- Failure to persist is logged-not-thrown.
LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts)
- 4-pass heuristic (first match wins):
1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique,
42703 schema drift, 53300 connection limit, …)
2. Error class name (AbortError, TimeoutError, FetchError,
ZodError)
3. Stack-path patterns (/lib/storage/, /lib/email/, documenso,
openai|claude, /queue/workers/)
4. Free-text message keywords (econnrefused, rate limit, timeout,
unauthorized|invalid api key)
- Returns { label, hint, subsystem } for the inspector badge.
CLIENT SIDE
- apiFetch throws structured ApiError with message + code + requestId
+ details + retryAfter.
- toastError() helper renders the standard 3-line toast:
plain message / Error code: X / Reference ID: Y [Copy ID].
ADMIN INSPECTOR
- /<port>/admin/errors lists captured 5xx with status badge + path +
likely-culprit badge + truncated message + reference id. Filter by
status code; auto-refresh via TanStack Query.
- /<port>/admin/errors/<requestId> deep-dive: request shape, full
error name+message+stack, sanitized body excerpt, raw metadata,
registered-code lookup (so admin can compare to what user saw),
likely-culprit hint with subsystem tag.
- /<port>/admin/errors/codes is the in-app code reference page —
every registered code grouped by domain prefix, searchable, with
HTTP status + user message inline. Linked from inspector header
so admins can flip to it while triaging.
- Permission: admin.view_audit_log. Super admins see all ports;
regular admins port-scoped.
- system-monitoring dashboard now surfaces error_events alongside
permission_denied audit + queue failed jobs (RecentError gains
source: 'request' variant).
DOCS
- docs/error-handling.md walks through coded errors, plain-text
message guidelines, client toasting, admin inspector usage,
persistence rules, classifier internals, pruning, and the
legacy → CodedError migration path.
MIGRATION SAFETY
- Audit confirmed all 41 migrations (0000-0040) apply cleanly in
journal order against an empty DB. 0040 references ports(id)
which exists from 0000. 0035/0038 don't deadlock under sequential
psql -f. Removed redundant idx_ds_sent_by from 0038 (created in
0037).
Tests: 1168/1168 vitest passing. tsc clean.
- security-error-responses tests updated for plain-text messages
+ new optional response keys (code/requestId/message).
- berth-pdf-versions tests assert stable error codes via
toMatchObject({ code }) rather than message regex.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
subsystem : 'db' ,
} ,
'23503' : {
label : 'FK violation' ,
hint : 'A referenced row no longer exists (or never did). Check whether the parent was archived/deleted, or the FK was created without ON DELETE handling.' ,
subsystem : 'db' ,
} ,
'23505' : {
label : 'Unique violation' ,
hint : 'A duplicate value hit a UNIQUE index. Common causes: duplicate name within the same port, retried writes after a transient error, or a partial unique index that should fire.' ,
subsystem : 'db' ,
} ,
'23514' : {
label : 'CHECK violation' ,
hint : 'A value failed a CHECK constraint (e.g. polymorphic discriminator outside its allowed set). Verify the input matches the schema enum.' ,
subsystem : 'db' ,
} ,
'42703' : {
label : 'Schema drift' ,
chore(autonomous-session): consolidate uncommitted work from prior session
Bundles the prior autonomous-session output that was sitting unstaged:
- Em-dash sweep across src/ + tests/ (en-dash/em-dash to hyphen, ~2280 instances)
- country-flag-icons rollout (CountryFlag component, replaces emoji glyphs that
never rendered on Windows; lazy-loads the 3x2 SVG index as a single chunk
after the per-subpath dynamic-import approach silently failed in webpack)
- Admin IA Phase 1+2: 7-domain regroup, 41 to 38 pages, /admin/berths index,
redirects (ocr to ai, reports to dashboard, invitations to users),
docs/admin-ia-proposal.md
- Per-template email tester (registry + endpoint + UI on Email admin page)
- Cancel-document mode picker (delete-from-Documenso vs keep-for-audit)
- Dashboard PDF report: 25 widgets, SVG charts, date-range picker, 11 resolvers
- Customize-widgets per-region sortables at xl+ (charts/rails/feed); single
flat sortable below xl when the layout stacks; per-viewport saved orders
- Audit doc updates capturing each shipped item
- Lint fixes: react-compiler immutability in DonutChart (reduce instead of
let-reassign), set-state-in-effect disables in CountryFlag and
UploadForSigning preview-bytes effect, unused 'confirm' destructures in
interest contract + reservation tabs, unescaped apostrophe in test-template
card copy
2026-05-23 00:52:59 +02:00
hint : 'A column referenced by the query does not exist in the database. The most recent migration probably has not been applied - run pnpm db:push or apply the SQL file.' ,
feat(errors): platform-wide request ids + error codes + admin inspector
End-to-end error-handling overhaul. A user hitting any failure now sees
a plain-text message + stable error code + reference id. A super admin
can paste the id into /admin/errors/<id> for the full request shape,
sanitized body, error stack, and a heuristic likely-cause hint.
REQUEST CONTEXT (AsyncLocalStorage)
- src/lib/request-context.ts mints a per-request frame carrying
requestId + portId + userId + method + path + start timestamp.
- withAuth wraps every authenticated handler in runWithRequestContext
and accepts an upstream X-Request-Id header (validated shape) or
generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id
response header, including early-return 401/403/4xx paths.
- Pino logger reads from the same context via mixin — every log
line emitted during the request automatically carries the ids
with no per-call threading.
ERROR CODE REGISTRY
- src/lib/error-codes.ts defines stable DOMAIN_REASON codes with
HTTP status + plain-text user-facing message (no jargon, written
for the rep on the phone with a customer).
- New CodedError class wraps a registered code + optional
internalMessage (admin-only — never sent to client).
- Existing AppError subclasses got plain-text default rewrites so
legacy throw sites improve immediately without migration.
- High-impact services migrated to specific codes:
expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths
(CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY /
PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender
(INTEREST_PORT_MISMATCH).
ERROR ENVELOPE
- errorResponse always sets X-Request-Id header + requestId field.
- 5xx responses include a "Quote error ID …" friendly line.
- 4xx kept clean (validation, permission, not-found don't pollute
the inspector — they're already in audit log).
PERSISTENCE (error_events table, migration 0040)
- One row per 5xx, keyed on requestId, with method/path/status/error
name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap;
password/token/secret/etc keys redacted)/duration/IP/UA/metadata.
- captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code
so the classifier can recognize FK / unique / NOT NULL / schema-
drift violations.
- Failure to persist is logged-not-thrown.
LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts)
- 4-pass heuristic (first match wins):
1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique,
42703 schema drift, 53300 connection limit, …)
2. Error class name (AbortError, TimeoutError, FetchError,
ZodError)
3. Stack-path patterns (/lib/storage/, /lib/email/, documenso,
openai|claude, /queue/workers/)
4. Free-text message keywords (econnrefused, rate limit, timeout,
unauthorized|invalid api key)
- Returns { label, hint, subsystem } for the inspector badge.
CLIENT SIDE
- apiFetch throws structured ApiError with message + code + requestId
+ details + retryAfter.
- toastError() helper renders the standard 3-line toast:
plain message / Error code: X / Reference ID: Y [Copy ID].
ADMIN INSPECTOR
- /<port>/admin/errors lists captured 5xx with status badge + path +
likely-culprit badge + truncated message + reference id. Filter by
status code; auto-refresh via TanStack Query.
- /<port>/admin/errors/<requestId> deep-dive: request shape, full
error name+message+stack, sanitized body excerpt, raw metadata,
registered-code lookup (so admin can compare to what user saw),
likely-culprit hint with subsystem tag.
- /<port>/admin/errors/codes is the in-app code reference page —
every registered code grouped by domain prefix, searchable, with
HTTP status + user message inline. Linked from inspector header
so admins can flip to it while triaging.
- Permission: admin.view_audit_log. Super admins see all ports;
regular admins port-scoped.
- system-monitoring dashboard now surfaces error_events alongside
permission_denied audit + queue failed jobs (RecentError gains
source: 'request' variant).
DOCS
- docs/error-handling.md walks through coded errors, plain-text
message guidelines, client toasting, admin inspector usage,
persistence rules, classifier internals, pruning, and the
legacy → CodedError migration path.
MIGRATION SAFETY
- Audit confirmed all 41 migrations (0000-0040) apply cleanly in
journal order against an empty DB. 0040 references ports(id)
which exists from 0000. 0035/0038 don't deadlock under sequential
psql -f. Removed redundant idx_ds_sent_by from 0038 (created in
0037).
Tests: 1168/1168 vitest passing. tsc clean.
- security-error-responses tests updated for plain-text messages
+ new optional response keys (code/requestId/message).
- berth-pdf-versions tests assert stable error codes via
toMatchObject({ code }) rather than message regex.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
subsystem : 'db' ,
} ,
'42P01' : {
label : 'Missing table' ,
hint : 'The query referenced a table that does not exist. Either a migration is unapplied or the table was renamed.' ,
subsystem : 'db' ,
} ,
'40001' : {
label : 'Serialization failure' ,
hint : 'Two transactions raced. Retrying the operation usually resolves it; if it persists, look for hot-row contention.' ,
subsystem : 'db' ,
} ,
'57014' : {
label : 'Query cancelled' ,
hint : 'The query exceeded the configured timeout or the client disconnected mid-flight.' ,
subsystem : 'db' ,
} ,
'53300' : {
label : 'Connection limit' ,
hint : 'The database connection pool is exhausted. Look for leaked connections or scale the pool.' ,
subsystem : 'db' ,
} ,
} ;
/** Classify by error name (stable across providers). */
const ERROR_NAME_HINTS : Record < string , LikelyCulprit > = {
AbortError : {
label : 'Request aborted' ,
hint : 'The client disconnected (closed the tab, navigated away) before the response finished. Usually benign.' ,
subsystem : 'request' ,
} ,
TimeoutError : {
label : 'Timeout' ,
hint : 'An upstream call exceeded its time budget. Check the external service health (Documenso, MinIO, OpenAI, SMTP).' ,
subsystem : 'request' ,
} ,
FetchError : {
label : 'External service unreachable' ,
hint : 'A fetch() call failed. Likely the Documenso/MinIO/OpenAI/SMTP host is down or blocked by firewall.' ,
subsystem : 'integration' ,
} ,
ZodError : {
label : 'Validation' ,
hint : 'A zod schema rejected the input. The details array on the response shows which fields failed.' ,
subsystem : 'validation' ,
} ,
} ;
/** Classify by stack-path heuristics. The first match wins. */
const STACK_PATH_HINTS : Array < { pattern : RegExp ; culprit : LikelyCulprit } > = [
{
pattern : /\/lib\/storage\// ,
culprit : {
label : 'Storage backend' ,
hint : 'Failure inside MinIO/S3 or the filesystem proxy. Check storage availability and the backend config in admin > storage.' ,
subsystem : 'storage' ,
} ,
} ,
{
pattern : /\/lib\/email\// ,
culprit : {
label : 'Email subsystem' ,
hint : 'SMTP/IMAP error. Check the configured account credentials in admin > email and the SMTP provider status.' ,
subsystem : 'email' ,
} ,
} ,
{
pattern : /documenso/i ,
culprit : {
label : 'Documenso integration' ,
hint : 'Failure talking to Documenso. Check the API host + key in admin > integrations and verify Documenso uptime.' ,
subsystem : 'integration' ,
} ,
} ,
{
pattern : /openai|claude/i ,
culprit : {
label : 'AI provider' ,
hint : 'OpenAI/Claude call failed. Likely a provider outage, expired key, or rate-limit ceiling. Falls back to template draft when available.' ,
subsystem : 'integration' ,
} ,
} ,
{
pattern : /\/queue\/workers\// ,
culprit : {
label : 'Background worker' ,
hint : 'A BullMQ job threw. Check Redis health and the worker logs for the failed job id.' ,
subsystem : 'queue' ,
} ,
} ,
] ;
chore(autonomous-session): consolidate uncommitted work from prior session
Bundles the prior autonomous-session output that was sitting unstaged:
- Em-dash sweep across src/ + tests/ (en-dash/em-dash to hyphen, ~2280 instances)
- country-flag-icons rollout (CountryFlag component, replaces emoji glyphs that
never rendered on Windows; lazy-loads the 3x2 SVG index as a single chunk
after the per-subpath dynamic-import approach silently failed in webpack)
- Admin IA Phase 1+2: 7-domain regroup, 41 to 38 pages, /admin/berths index,
redirects (ocr to ai, reports to dashboard, invitations to users),
docs/admin-ia-proposal.md
- Per-template email tester (registry + endpoint + UI on Email admin page)
- Cancel-document mode picker (delete-from-Documenso vs keep-for-audit)
- Dashboard PDF report: 25 widgets, SVG charts, date-range picker, 11 resolvers
- Customize-widgets per-region sortables at xl+ (charts/rails/feed); single
flat sortable below xl when the layout stacks; per-viewport saved orders
- Audit doc updates capturing each shipped item
- Lint fixes: react-compiler immutability in DonutChart (reduce instead of
let-reassign), set-state-in-effect disables in CountryFlag and
UploadForSigning preview-bytes effect, unused 'confirm' destructures in
interest contract + reservation tabs, unescaped apostrophe in test-template
card copy
2026-05-23 00:52:59 +02:00
/** Classify by free-text scan of the error message - last-resort. */
feat(errors): platform-wide request ids + error codes + admin inspector
End-to-end error-handling overhaul. A user hitting any failure now sees
a plain-text message + stable error code + reference id. A super admin
can paste the id into /admin/errors/<id> for the full request shape,
sanitized body, error stack, and a heuristic likely-cause hint.
REQUEST CONTEXT (AsyncLocalStorage)
- src/lib/request-context.ts mints a per-request frame carrying
requestId + portId + userId + method + path + start timestamp.
- withAuth wraps every authenticated handler in runWithRequestContext
and accepts an upstream X-Request-Id header (validated shape) or
generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id
response header, including early-return 401/403/4xx paths.
- Pino logger reads from the same context via mixin — every log
line emitted during the request automatically carries the ids
with no per-call threading.
ERROR CODE REGISTRY
- src/lib/error-codes.ts defines stable DOMAIN_REASON codes with
HTTP status + plain-text user-facing message (no jargon, written
for the rep on the phone with a customer).
- New CodedError class wraps a registered code + optional
internalMessage (admin-only — never sent to client).
- Existing AppError subclasses got plain-text default rewrites so
legacy throw sites improve immediately without migration.
- High-impact services migrated to specific codes:
expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths
(CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY /
PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender
(INTEREST_PORT_MISMATCH).
ERROR ENVELOPE
- errorResponse always sets X-Request-Id header + requestId field.
- 5xx responses include a "Quote error ID …" friendly line.
- 4xx kept clean (validation, permission, not-found don't pollute
the inspector — they're already in audit log).
PERSISTENCE (error_events table, migration 0040)
- One row per 5xx, keyed on requestId, with method/path/status/error
name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap;
password/token/secret/etc keys redacted)/duration/IP/UA/metadata.
- captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code
so the classifier can recognize FK / unique / NOT NULL / schema-
drift violations.
- Failure to persist is logged-not-thrown.
LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts)
- 4-pass heuristic (first match wins):
1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique,
42703 schema drift, 53300 connection limit, …)
2. Error class name (AbortError, TimeoutError, FetchError,
ZodError)
3. Stack-path patterns (/lib/storage/, /lib/email/, documenso,
openai|claude, /queue/workers/)
4. Free-text message keywords (econnrefused, rate limit, timeout,
unauthorized|invalid api key)
- Returns { label, hint, subsystem } for the inspector badge.
CLIENT SIDE
- apiFetch throws structured ApiError with message + code + requestId
+ details + retryAfter.
- toastError() helper renders the standard 3-line toast:
plain message / Error code: X / Reference ID: Y [Copy ID].
ADMIN INSPECTOR
- /<port>/admin/errors lists captured 5xx with status badge + path +
likely-culprit badge + truncated message + reference id. Filter by
status code; auto-refresh via TanStack Query.
- /<port>/admin/errors/<requestId> deep-dive: request shape, full
error name+message+stack, sanitized body excerpt, raw metadata,
registered-code lookup (so admin can compare to what user saw),
likely-culprit hint with subsystem tag.
- /<port>/admin/errors/codes is the in-app code reference page —
every registered code grouped by domain prefix, searchable, with
HTTP status + user message inline. Linked from inspector header
so admins can flip to it while triaging.
- Permission: admin.view_audit_log. Super admins see all ports;
regular admins port-scoped.
- system-monitoring dashboard now surfaces error_events alongside
permission_denied audit + queue failed jobs (RecentError gains
source: 'request' variant).
DOCS
- docs/error-handling.md walks through coded errors, plain-text
message guidelines, client toasting, admin inspector usage,
persistence rules, classifier internals, pruning, and the
legacy → CodedError migration path.
MIGRATION SAFETY
- Audit confirmed all 41 migrations (0000-0040) apply cleanly in
journal order against an empty DB. 0040 references ports(id)
which exists from 0000. 0035/0038 don't deadlock under sequential
psql -f. Removed redundant idx_ds_sent_by from 0038 (created in
0037).
Tests: 1168/1168 vitest passing. tsc clean.
- security-error-responses tests updated for plain-text messages
+ new optional response keys (code/requestId/message).
- berth-pdf-versions tests assert stable error codes via
toMatchObject({ code }) rather than message regex.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
const MESSAGE_HINTS : Array < { pattern : RegExp ; culprit : LikelyCulprit } > = [
{
pattern : /econnrefused|enotfound|getaddrinfo/i ,
culprit : {
label : 'Network connection refused' ,
hint : 'A network call could not reach its host. Check that the dependency (DB, Redis, MinIO, SMTP) is running and reachable.' ,
subsystem : 'integration' ,
} ,
} ,
{
pattern : /rate.?limit/i ,
culprit : {
label : 'Rate limited' ,
hint : 'An upstream provider rate-limited us. Common with SMTP, OpenAI, and Documenso. Back off or raise the per-port cap.' ,
subsystem : 'integration' ,
} ,
} ,
{
pattern : /unauthorized|invalid.?(api.?)?key|401/i ,
culprit : {
label : 'Auth failure' ,
hint : 'A credential was rejected by an upstream service. Check the encrypted secrets in admin > integrations.' ,
subsystem : 'integration' ,
} ,
} ,
{
pattern : /timeout/i ,
culprit : {
label : 'Timeout' ,
hint : 'An operation exceeded its time budget. May be a slow upstream call or a heavy DB query.' ,
subsystem : 'request' ,
} ,
} ,
] ;
/ * *
* Best - effort culprit classification . Returns null when nothing
chore(autonomous-session): consolidate uncommitted work from prior session
Bundles the prior autonomous-session output that was sitting unstaged:
- Em-dash sweep across src/ + tests/ (en-dash/em-dash to hyphen, ~2280 instances)
- country-flag-icons rollout (CountryFlag component, replaces emoji glyphs that
never rendered on Windows; lazy-loads the 3x2 SVG index as a single chunk
after the per-subpath dynamic-import approach silently failed in webpack)
- Admin IA Phase 1+2: 7-domain regroup, 41 to 38 pages, /admin/berths index,
redirects (ocr to ai, reports to dashboard, invitations to users),
docs/admin-ia-proposal.md
- Per-template email tester (registry + endpoint + UI on Email admin page)
- Cancel-document mode picker (delete-from-Documenso vs keep-for-audit)
- Dashboard PDF report: 25 widgets, SVG charts, date-range picker, 11 resolvers
- Customize-widgets per-region sortables at xl+ (charts/rails/feed); single
flat sortable below xl when the layout stacks; per-viewport saved orders
- Audit doc updates capturing each shipped item
- Lint fixes: react-compiler immutability in DonutChart (reduce instead of
let-reassign), set-state-in-effect disables in CountryFlag and
UploadForSigning preview-bytes effect, unused 'confirm' destructures in
interest contract + reservation tabs, unescaped apostrophe in test-template
card copy
2026-05-23 00:52:59 +02:00
* matches - the inspector will display "Uncategorized" .
feat(errors): platform-wide request ids + error codes + admin inspector
End-to-end error-handling overhaul. A user hitting any failure now sees
a plain-text message + stable error code + reference id. A super admin
can paste the id into /admin/errors/<id> for the full request shape,
sanitized body, error stack, and a heuristic likely-cause hint.
REQUEST CONTEXT (AsyncLocalStorage)
- src/lib/request-context.ts mints a per-request frame carrying
requestId + portId + userId + method + path + start timestamp.
- withAuth wraps every authenticated handler in runWithRequestContext
and accepts an upstream X-Request-Id header (validated shape) or
generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id
response header, including early-return 401/403/4xx paths.
- Pino logger reads from the same context via mixin — every log
line emitted during the request automatically carries the ids
with no per-call threading.
ERROR CODE REGISTRY
- src/lib/error-codes.ts defines stable DOMAIN_REASON codes with
HTTP status + plain-text user-facing message (no jargon, written
for the rep on the phone with a customer).
- New CodedError class wraps a registered code + optional
internalMessage (admin-only — never sent to client).
- Existing AppError subclasses got plain-text default rewrites so
legacy throw sites improve immediately without migration.
- High-impact services migrated to specific codes:
expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths
(CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY /
PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender
(INTEREST_PORT_MISMATCH).
ERROR ENVELOPE
- errorResponse always sets X-Request-Id header + requestId field.
- 5xx responses include a "Quote error ID …" friendly line.
- 4xx kept clean (validation, permission, not-found don't pollute
the inspector — they're already in audit log).
PERSISTENCE (error_events table, migration 0040)
- One row per 5xx, keyed on requestId, with method/path/status/error
name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap;
password/token/secret/etc keys redacted)/duration/IP/UA/metadata.
- captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code
so the classifier can recognize FK / unique / NOT NULL / schema-
drift violations.
- Failure to persist is logged-not-thrown.
LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts)
- 4-pass heuristic (first match wins):
1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique,
42703 schema drift, 53300 connection limit, …)
2. Error class name (AbortError, TimeoutError, FetchError,
ZodError)
3. Stack-path patterns (/lib/storage/, /lib/email/, documenso,
openai|claude, /queue/workers/)
4. Free-text message keywords (econnrefused, rate limit, timeout,
unauthorized|invalid api key)
- Returns { label, hint, subsystem } for the inspector badge.
CLIENT SIDE
- apiFetch throws structured ApiError with message + code + requestId
+ details + retryAfter.
- toastError() helper renders the standard 3-line toast:
plain message / Error code: X / Reference ID: Y [Copy ID].
ADMIN INSPECTOR
- /<port>/admin/errors lists captured 5xx with status badge + path +
likely-culprit badge + truncated message + reference id. Filter by
status code; auto-refresh via TanStack Query.
- /<port>/admin/errors/<requestId> deep-dive: request shape, full
error name+message+stack, sanitized body excerpt, raw metadata,
registered-code lookup (so admin can compare to what user saw),
likely-culprit hint with subsystem tag.
- /<port>/admin/errors/codes is the in-app code reference page —
every registered code grouped by domain prefix, searchable, with
HTTP status + user message inline. Linked from inspector header
so admins can flip to it while triaging.
- Permission: admin.view_audit_log. Super admins see all ports;
regular admins port-scoped.
- system-monitoring dashboard now surfaces error_events alongside
permission_denied audit + queue failed jobs (RecentError gains
source: 'request' variant).
DOCS
- docs/error-handling.md walks through coded errors, plain-text
message guidelines, client toasting, admin inspector usage,
persistence rules, classifier internals, pruning, and the
legacy → CodedError migration path.
MIGRATION SAFETY
- Audit confirmed all 41 migrations (0000-0040) apply cleanly in
journal order against an empty DB. 0040 references ports(id)
which exists from 0000. 0035/0038 don't deadlock under sequential
psql -f. Removed redundant idx_ds_sent_by from 0038 (created in
0037).
Tests: 1168/1168 vitest passing. tsc clean.
- security-error-responses tests updated for plain-text messages
+ new optional response keys (code/requestId/message).
- berth-pdf-versions tests assert stable error codes via
toMatchObject({ code }) rather than message regex.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
* /
export function classifyError ( row : ErrorEvent ) : LikelyCulprit | null {
// 1. Postgres SQLSTATE on the metadata bag.
const meta = row . metadata as { code? : unknown ; pgCode? : unknown } | null ;
const pgCode =
( typeof meta ? . code === 'string' && meta . code ) ||
( typeof meta ? . pgCode === 'string' && meta . pgCode ) ||
null ;
if ( pgCode && PG_CODE_HINTS [ pgCode ] ) return PG_CODE_HINTS [ pgCode ] ;
// 2. Error class name.
if ( row . errorName ) {
const named = ERROR_NAME_HINTS [ row . errorName ] ;
if ( named ) return named ;
}
// 3. Stack path heuristics.
if ( row . errorStack ) {
for ( const { pattern , culprit } of STACK_PATH_HINTS ) {
if ( pattern . test ( row . errorStack ) ) return culprit ;
}
}
// 4. Message free-text.
if ( row . errorMessage ) {
for ( const { pattern , culprit } of MESSAGE_HINTS ) {
if ( pattern . test ( row . errorMessage ) ) return culprit ;
}
}
return null ;
}