Initial commit: Port Nimara CRM (Layers 0-4)
Full CRM rebuild with Next.js 15, TypeScript, Tailwind, Drizzle ORM,
PostgreSQL, Redis, BullMQ, MinIO, and Socket.io. Includes 461 source
files covering clients, berths, interests/pipeline, documents/EOI,
expenses/invoices, email, notifications, dashboard, admin, and
client portal. CI/CD via Gitea Actions with Docker builds.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:52:51 +01:00
import { db } from '@/lib/db';
feat(errors): platform-wide request ids + error codes + admin inspector
End-to-end error-handling overhaul. A user hitting any failure now sees
a plain-text message + stable error code + reference id. A super admin
can paste the id into /admin/errors/<id> for the full request shape,
sanitized body, error stack, and a heuristic likely-cause hint.
REQUEST CONTEXT (AsyncLocalStorage)
- src/lib/request-context.ts mints a per-request frame carrying
requestId + portId + userId + method + path + start timestamp.
- withAuth wraps every authenticated handler in runWithRequestContext
and accepts an upstream X-Request-Id header (validated shape) or
generates a fresh UUID. The id ALWAYS leaves on the X-Request-Id
response header, including early-return 401/403/4xx paths.
- Pino logger reads from the same context via mixin — every log
line emitted during the request automatically carries the ids
with no per-call threading.
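
The request-context idea above can be sketched roughly as follows. This is a minimal illustration built on Node's `AsyncLocalStorage`, not the contents of src/lib/request-context.ts: the field types, the `REQUEST_ID_SHAPE` rule, and the `resolveRequestId` / `getRequestContext` helpers are assumptions made for the example.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

// Hypothetical frame shape: the commit lists these fields but not their types.
interface RequestContext {
  requestId: string;
  portId: string | null;
  userId: string | null;
  method: string;
  path: string;
  startedAt: number;
}

const storage = new AsyncLocalStorage<RequestContext>();

// Assumed "validated shape": a UUID-like id. The real rule is not stated.
const REQUEST_ID_SHAPE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Accept a well-formed upstream id, otherwise mint a fresh UUID.
function resolveRequestId(upstream: string | null | undefined): string {
  return upstream && REQUEST_ID_SHAPE.test(upstream) ? upstream : randomUUID();
}

// Everything called inside `fn` (including awaited work) sees `ctx`.
function runWithRequestContext<T>(ctx: RequestContext, fn: () => T): T {
  return storage.run(ctx, fn);
}

// Returns undefined when called outside a request frame.
function getRequestContext(): RequestContext | undefined {
  return storage.getStore();
}
```

A logger mixin can then call `getRequestContext()` on every log line, which is what removes the need to pass ids through each call site.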
ERROR CODE REGISTRY
- src/lib/error-codes.ts defines stable DOMAIN_REASON codes with
HTTP status + plain-text user-facing message (no jargon, written
for the rep on the phone with a customer).
- New CodedError class wraps a registered code + optional
internalMessage (admin-only — never sent to client).
- Existing AppError subclasses got plain-text default rewrites so
legacy throw sites improve immediately without migration.
- High-impact services migrated to specific codes:
expenses (RECEIPT_REQUIRED, INVOICE_LINKED), interest-berths
(CROSS_PORT_LINK_REJECTED), berth-pdf (PDF_MAGIC_BYTE / PDF_EMPTY /
PDF_TOO_LARGE / VERSION_ALREADY_CURRENT), recommender
(INTEREST_PORT_MISMATCH).
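
A rough sketch of how a code registry and a `CodedError` class could fit together is shown below. The code names come from this commit message; the HTTP statuses, the message wording, and the `toClientBody` helper are assumptions for illustration and may not match src/lib/error-codes.ts.

```typescript
// Illustrative registry entries. Statuses and wording are assumed values.
const ERROR_CODES = {
  RECEIPT_REQUIRED: {
    status: 400,
    message: 'Please attach a receipt before saving this expense.',
  },
  PDF_TOO_LARGE: {
    status: 413,
    message: 'That PDF is too large to upload.',
  },
} as const;

type ErrorCode = keyof typeof ERROR_CODES;

class CodedError extends Error {
  readonly code: ErrorCode;
  readonly status: number;
  // Admin-only detail; deliberately left out of toClientBody().
  readonly internalMessage: string | undefined;

  constructor(code: ErrorCode, internalMessage?: string) {
    // The user-facing text always comes from the registry, never the caller.
    super(ERROR_CODES[code].message);
    this.name = 'CodedError';
    this.code = code;
    this.status = ERROR_CODES[code].status;
    this.internalMessage = internalMessage;
  }

  // Hypothetical helper: the subset of fields that is safe to send to a client.
  toClientBody(requestId: string): { code: ErrorCode; message: string; requestId: string } {
    return { code: this.code, message: this.message, requestId };
  }
}
```

Keeping the user-facing text in the registry means a throw site only picks a code, and tests can assert on the code instead of matching message text.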
ERROR ENVELOPE
- errorResponse always sets X-Request-Id header + requestId field.
- 5xx responses include a "Quote error ID …" friendly line.
- 4xx kept clean (validation, permission, not-found don't pollute
the inspector — they're already in audit log).
PERSISTENCE (error_events table, migration 0040)
- One row per 5xx, keyed on requestId, with method/path/status/error
name+message/stack head (4KB cap)/sanitized body excerpt (1KB cap;
password/token/secret/etc keys redacted)/duration/IP/UA/metadata.
- captureErrorEvent extracts Postgres SQLSTATE/severity/cause.code
so the classifier can recognize FK / unique / NOT NULL / schema-
drift violations.
- Failure to persist is logged-not-thrown.
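
The sanitized body excerpt described above might look something like this sketch. The key pattern, the `[REDACTED]` placeholder, and the function names are assumptions (the commit only names password/token/secret "etc"), and the cap here counts string characters rather than bytes.

```typescript
// Assumed key pattern; the full redaction list is not given in the commit.
const SENSITIVE_KEY = /password|token|secret/i;
const BODY_EXCERPT_CAP = 1024; // the 1KB cap, measured in characters here

// Recursively replace values stored under sensitive-looking keys.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEY.test(key) ? '[REDACTED]' : redact(inner);
    }
    return out;
  }
  return value;
}

// Redact first, then cap, so a secret can never survive inside the excerpt.
function sanitizedBodyExcerpt(body: unknown): string {
  const json: unknown = JSON.stringify(redact(body));
  const text = typeof json === 'string' ? json : '';
  return text.length > BODY_EXCERPT_CAP ? text.slice(0, BODY_EXCERPT_CAP) : text;
}
```

Note that the excerpt is stored as a plain string: once it has been cut at the cap it is usually no longer valid JSON.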
LIKELY-CULPRIT CLASSIFIER (src/lib/error-classifier.ts)
- 4-pass heuristic (first match wins):
1. Postgres SQLSTATE → human reason (23503 FK, 23505 unique,
42703 schema drift, 53300 connection limit, …)
2. Error class name (AbortError, TimeoutError, FetchError,
ZodError)
3. Stack-path patterns (/lib/storage/, /lib/email/, documenso,
openai|claude, /queue/workers/)
4. Free-text message keywords (econnrefused, rate limit, timeout,
unauthorized|invalid api key)
- Returns { label, hint, subsystem } for the inspector badge.
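
The four-pass, first-match-wins order can be sketched as below. Only the pass order and the `{ label, hint, subsystem }` return shape come from this commit message; the `ErrorFacts` input type, the lookup tables, and every label, hint, and subsystem string are illustrative assumptions, not the contents of src/lib/error-classifier.ts.

```typescript
interface Culprit { label: string; hint: string; subsystem: string }

// Hypothetical input: the facts a classifier could read off a stored error row.
interface ErrorFacts { sqlState?: string; name?: string; stack?: string; message?: string }

// Pass 1 table: Postgres SQLSTATE codes.
const BY_SQLSTATE = new Map<string, Culprit>([
  ['23503', { label: 'Foreign-key violation', hint: 'A referenced row is missing.', subsystem: 'database' }],
  ['23505', { label: 'Unique violation', hint: 'A duplicate value was written.', subsystem: 'database' }],
]);

// Pass 2 table: error class names.
const BY_CLASS_NAME = new Map<string, Culprit>([
  ['ZodError', { label: 'Validation failure', hint: 'Input did not match the schema.', subsystem: 'validation' }],
  ['TimeoutError', { label: 'Timeout', hint: 'A dependency was too slow.', subsystem: 'network' }],
]);

// Pass 3 table: paths that appear in the stack trace.
const BY_STACK_PATH: Array<[RegExp, Culprit]> = [
  [/\/lib\/storage\//, { label: 'Storage failure', hint: 'Check object storage.', subsystem: 'storage' }],
  [/\/lib\/email\//, { label: 'Email failure', hint: 'Check the mail provider.', subsystem: 'email' }],
];

// Pass 4 table: free-text keywords in the message.
const BY_MESSAGE: Array<[RegExp, Culprit]> = [
  [/econnrefused/i, { label: 'Connection refused', hint: 'A service is unreachable.', subsystem: 'network' }],
  [/rate limit/i, { label: 'Rate limited', hint: 'An upstream API is throttling.', subsystem: 'upstream' }],
];

function classifyError(facts: ErrorFacts): Culprit | null {
  // The most specific signal is checked first; later passes only run on a miss.
  const bySqlState = facts.sqlState ? BY_SQLSTATE.get(facts.sqlState) : undefined;
  if (bySqlState) return bySqlState;

  const byName = facts.name ? BY_CLASS_NAME.get(facts.name) : undefined;
  if (byName) return byName;

  for (const [pattern, culprit] of BY_STACK_PATH) {
    if (facts.stack && pattern.test(facts.stack)) return culprit;
  }
  for (const [pattern, culprit] of BY_MESSAGE) {
    if (facts.message && pattern.test(facts.message)) return culprit;
  }
  return null; // nothing recognised: no badge
}
```

Ordering the passes from structured signals (SQLSTATE) down to free text keeps a vague keyword such as "timeout" from overriding a precise database code on the same error.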
CLIENT SIDE
- apiFetch throws structured ApiError with message + code + requestId
+ details + retryAfter.
- toastError() helper renders the standard 3-line toast:
plain message / Error code: X / Reference ID: Y [Copy ID].
ADMIN INSPECTOR
- /<port>/admin/errors lists captured 5xx with status badge + path +
likely-culprit badge + truncated message + reference id. Filter by
status code; auto-refresh via TanStack Query.
- /<port>/admin/errors/<requestId> deep-dive: request shape, full
error name+message+stack, sanitized body excerpt, raw metadata,
registered-code lookup (so admin can compare to what user saw),
likely-culprit hint with subsystem tag.
- /<port>/admin/errors/codes is the in-app code reference page —
every registered code grouped by domain prefix, searchable, with
HTTP status + user message inline. Linked from inspector header
so admins can flip to it while triaging.
- Permission: admin.view_audit_log. Super admins see all ports;
regular admins port-scoped.
- system-monitoring dashboard now surfaces error_events alongside
permission_denied audit + queue failed jobs (RecentError gains
source: 'request' variant).
DOCS
- docs/error-handling.md walks through coded errors, plain-text
message guidelines, client toasting, admin inspector usage,
persistence rules, classifier internals, pruning, and the
legacy → CodedError migration path.
MIGRATION SAFETY
- Audit confirmed all 41 migrations (0000-0040) apply cleanly in
journal order against an empty DB. 0040 references ports(id)
which exists from 0000. 0035/0038 don't deadlock under sequential
psql -f. Removed redundant idx_ds_sent_by from 0038 (created in
0037).
Tests: 1168/1168 vitest passing. tsc clean.
- security-error-responses tests updated for plain-text messages
+ new optional response keys (code/requestId/message).
- berth-pdf-versions tests assert stable error codes via
toMatchObject({ code }) rather than message regex.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:12:59 +02:00
import { auditLogs, errorEvents } from '@/lib/db/schema';
import { redis } from '@/lib/redis';
import { minioClient } from '@/lib/minio/index';
import { getQueue, QUEUE_CONFIGS, type QueueName } from '@/lib/queue';
import { createAuditLog } from '@/lib/audit';
import { env } from '@/lib/env';
import { sql, desc, eq } from 'drizzle-orm';
fix(audit-tier-2): error-surface hygiene — toastError + CodedError sweep
Two mechanical sweeps closing the audit's HIGH §16 + MED §11 findings:
* 38 client components / 56 toast.error sites converted to
toastError(err) so the new admin error inspector becomes usable from
user-reported issues — every failed inline-edit, save, send, archive,
upload, etc. now carries the request-id + error-code (Copy ID action).
* 26 service files / 62 bare-Error throws converted to CodedError or
the existing AppError subclasses. Adds new error codes:
DOCUMENSO_UPSTREAM_ERROR (502), DOCUMENSO_AUTH_FAILURE (502),
DOCUMENSO_TIMEOUT (504), OCR_UPSTREAM_ERROR (502),
IMAP_UPSTREAM_ERROR (502), UMAMI_UPSTREAM_ERROR (502),
UMAMI_NOT_CONFIGURED (409), and INSERT_RETURNING_EMPTY (500) for
post-insert returning-empty guards.
* Five vitest assertions updated to match the new user-facing wording
(client-merge "already been merged", expense/interest "couldn't find
that …", documenso "signing service didn't respond").
Test status: 1168/1168 vitest, tsc clean.
Refs: docs/audit-comprehensive-2026-05-05.md HIGH §16 (auditor-H Issue 1)
+ MED §11 (auditor-G Issue 1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 20:18:05 +02:00
import { NotFoundError } from '@/lib/errors';
import { logger } from '@/lib/logger';

// ─── Types ────────────────────────────────────────────────────────────────────

export interface ServiceStatus {
  name: string;
  status: 'healthy' | 'degraded' | 'down';
  responseTimeMs: number;
  details?: string;
}

export interface HealthStatus {
  overall: 'healthy' | 'degraded' | 'down';
  services: ServiceStatus[];
  checkedAt: Date;
}

export interface QueueStatus {
  name: string;
  waiting: number;
  active: number;
  completed: number;
  failed: number;
  delayed: number;
}

export interface QueueJobSummary {
  id: string;
  name: string;
  data: unknown;
  status: string;
  timestamp: number | undefined;
  processedOn: number | undefined;
  finishedOn: number | undefined;
  failedReason: string | undefined;
}

export interface PaginatedQueueJobs {
  jobs: QueueJobSummary[];
  total: number;
  page: number;
  limit: number;
}

export interface ConnectionStatus {
  totalConnections: number;
}

export interface RecentError {
  id: string;
  source: 'audit' | 'queue' | 'request';
  message: string;
  timestamp: Date;
  metadata?: Record<string, unknown>;
  /** Set for `source: 'request'` rows so the UI can deep-link to
   *  /admin/errors/<requestId>. */
  requestId?: string;
  /** Set for `source: 'request'` rows. */
  statusCode?: number;
  /** Set for `source: 'request'` rows. */
  errorCode?: string | null;
}

// ─── Timeout helper ───────────────────────────────────────────────────────────

function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Clear the timer once either side settles so a fast check does not leave
  // a pending timeout behind.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// ─── healthCheck ──────────────────────────────────────────────────────────────

export async function healthCheck(): Promise<HealthStatus> {
  const checks = await Promise.allSettled([
    checkPostgres(),
    checkRedis(),
    checkMinio(),
    checkDocumenso(),
  ]);

  const services: ServiceStatus[] = checks.map((result) => {
    if (result.status === 'fulfilled') return result.value;
    // Should not happen since each checker catches internally
    return {
      name: 'unknown',
      status: 'down' as const,
      responseTimeMs: 0,
      details: String(result.reason),
    };
  });

  const hasDown = services.some((s) => s.status === 'down');
  const hasDegraded = services.some((s) => s.status === 'degraded');
  const overall = hasDown ? 'down' : hasDegraded ? 'degraded' : 'healthy';

  return { overall, services, checkedAt: new Date() };
}

async function checkPostgres(): Promise<ServiceStatus> {
  const start = Date.now();
  try {
    await withTimeout(db.execute(sql`SELECT 1`), 5000);
    return { name: 'PostgreSQL', status: 'healthy', responseTimeMs: Date.now() - start };
  } catch (err) {
    return {
      name: 'PostgreSQL',
      status: 'down',
      responseTimeMs: Date.now() - start,
      details: err instanceof Error ? err.message : 'Unknown error',
    };
  }
}

async function checkRedis(): Promise<ServiceStatus> {
  const start = Date.now();
  try {
    const result = await withTimeout(redis.ping(), 5000);
    const status = result === 'PONG' ? 'healthy' : 'degraded';
    return { name: 'Redis', status, responseTimeMs: Date.now() - start };
  } catch (err) {
    return {
      name: 'Redis',
      status: 'down',
      responseTimeMs: Date.now() - start,
      details: err instanceof Error ? err.message : 'Unknown error',
    };
  }
}

async function checkMinio(): Promise<ServiceStatus> {
  const start = Date.now();
  try {
    await withTimeout(minioClient.bucketExists(env.MINIO_BUCKET), 5000);
    return { name: 'MinIO', status: 'healthy', responseTimeMs: Date.now() - start };
  } catch (err) {
    return {
      name: 'MinIO',
      status: 'down',
      responseTimeMs: Date.now() - start,
      details: err instanceof Error ? err.message : 'Unknown error',
    };
  }
}

async function checkDocumenso(): Promise<ServiceStatus> {
  const start = Date.now();
  try {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), 5000);
    try {
      const res = await fetch(`${env.DOCUMENSO_API_URL}/api/v1/health`, {
        signal: controller.signal,
        method: 'GET',
      });
      const status = res.ok ? 'healthy' : 'degraded';
      return { name: 'Documenso', status, responseTimeMs: Date.now() - start };
    } finally {
      // Runs on success, failure, and abort, so one clearTimeout is enough.
      clearTimeout(timer);
    }
  } catch (err) {
    return {
      name: 'Documenso',
      status: 'down',
      responseTimeMs: Date.now() - start,
      details: err instanceof Error ? err.message : 'Unreachable',
    };
  }
}

// ─── getQueueDashboard ────────────────────────────────────────────────────────

export async function getQueueDashboard(): Promise<QueueStatus[]> {
  const queueNames = Object.keys(QUEUE_CONFIGS) as QueueName[];
  const results = await Promise.allSettled(
    queueNames.map(async (name) => {
      const queue = getQueue(name);
      const counts = await queue.getJobCounts(
        'waiting',
        'active',
        'completed',
        'failed',
        'delayed',
      );
      return {
        name,
        waiting: counts.waiting ?? 0,
        active: counts.active ?? 0,
        completed: counts.completed ?? 0,
        failed: counts.failed ?? 0,
        delayed: counts.delayed ?? 0,
      } satisfies QueueStatus;
    }),
  );

  return results.map((r, i) => {
    if (r.status === 'fulfilled') return r.value;
    const name = queueNames[i] ?? 'unknown';
    logger.warn({ queue: name, err: r.reason }, 'Failed to get queue counts');
    return {
      name,
      waiting: 0,
      active: 0,
      completed: 0,
      failed: 0,
      delayed: 0,
    };
  });
}

// ─── getQueueJobs ─────────────────────────────────────────────────────────────

type JobStatus = 'waiting' | 'active' | 'completed' | 'failed' | 'delayed';

export async function getQueueJobs(
  queueName: QueueName,
  status: JobStatus = 'failed',
  page = 1,
  limit = 20,
): Promise<PaginatedQueueJobs> {
  const queue = getQueue(queueName);
  const start = (page - 1) * limit;
  const end = start + limit - 1;

  const jobs = await queue.getJobs([status], start, end);
  const counts = await queue.getJobCounts(status);
  const total = counts[status] ?? 0;

  const summaries: QueueJobSummary[] = jobs.map((job) => {
    // Truncate job data to prevent huge payloads. A sliced JSON string is no
    // longer valid JSON, so the truncated form is returned as a plain string
    // rather than parsed back (parsing it would always throw).
    let truncatedData: unknown;
    try {
      const dataStr = JSON.stringify(job.data);
      truncatedData =
        dataStr.length > 500 ? `${dataStr.slice(0, 500)}...(truncated)` : job.data;
    } catch {
      truncatedData = '[unparseable]';
    }

    return {
      id: job.id ?? '',
      name: job.name,
      data: truncatedData,
      status,
      timestamp: job.timestamp,
      processedOn: job.processedOn ?? undefined,
      finishedOn: job.finishedOn ?? undefined,
      failedReason: job.failedReason ?? undefined,
    };
  });

  return { jobs: summaries, total, page, limit };
}

// ─── retryJob ─────────────────────────────────────────────────────────────────

export async function retryJob(queueName: QueueName, jobId: string, userId: string): Promise<void> {
  const queue = getQueue(queueName);
  const job = await queue.getJob(jobId);
  if (!job) throw new NotFoundError('queue job');

  await job.retry();

  void createAuditLog({
    userId,
    portId: null,
    action: 'update',
    entityType: 'queue_job',
    entityId: jobId,
    metadata: { queueName, jobName: job.name, action: 'retry' },
    ipAddress: 'system',
    userAgent: 'system',
  });
}

// ─── deleteJob ────────────────────────────────────────────────────────────────
|
|
|
|
|
|
|
|
|
|
export async function deleteJob(
|
|
|
|
|
queueName: QueueName,
|
|
|
|
|
jobId: string,
|
|
|
|
|
userId: string,
|
|
|
|
|
): Promise<void> {
|
|
|
|
|
const queue = getQueue(queueName);
|
|
|
|
|
const job = await queue.getJob(jobId);
|
  if (!job) throw new NotFoundError('queue job');

  await job.remove();

  void createAuditLog({
    userId,
    portId: null,
    action: 'delete',
    entityType: 'queue_job',
    entityId: jobId,
    metadata: { queueName, jobName: job.name, action: 'delete' },
    ipAddress: 'system',
    userAgent: 'system',
  });
}

// ─── getActiveConnections ─────────────────────────────────────────────────────

export async function getActiveConnections(): Promise<ConnectionStatus> {
  try {
    const { getIO } = await import('@/lib/socket/server');
    const io = getIO();
    const sockets = await io.fetchSockets();
    return { totalConnections: sockets.length };
  } catch {
    return { totalConnections: 0 };
  }
}

// ─── getRecentErrors ──────────────────────────────────────────────────────────

export async function getRecentErrors(limit = 20): Promise<RecentError[]> {
  // Fetch permission-denied audit log entries
  const auditErrors = await db
    .select({
      id: auditLogs.id,
      action: auditLogs.action,
      entityType: auditLogs.entityType,
      entityId: auditLogs.entityId,
      metadata: auditLogs.metadata,
      createdAt: auditLogs.createdAt,
    })
    .from(auditLogs)
    .where(eq(auditLogs.action, 'permission_denied'))
    .orderBy(desc(auditLogs.createdAt))
    .limit(limit);

  const auditResults: RecentError[] = auditErrors.map((row) => ({
    id: row.id,
    source: 'audit' as const,
    message: `Permission denied on ${row.entityType}`,
    timestamp: row.createdAt,
    metadata: (row.metadata as Record<string, unknown>) ?? {},
  }));

  // Fetch failed jobs from all queues (sample - top 5 per queue)
  const queueNames = Object.keys(QUEUE_CONFIGS) as QueueName[];
  const failedJobResults = await Promise.allSettled(
    queueNames.map(async (name) => {
      const queue = getQueue(name);
      const jobs = await queue.getJobs(['failed'], 0, 4);
      return jobs.map(
        (job): RecentError => ({
          id: `${name}:${job.id ?? ''}`,
          source: 'queue',
          message: `Queue job failed: ${job.name} in ${name}`,
          timestamp: job.finishedOn ? new Date(job.finishedOn) : new Date(job.timestamp),
          metadata: { queueName: name, failedReason: job.failedReason },
        }),
      );
    }),
  );

  const queueErrors: RecentError[] = failedJobResults
    .filter((r): r is PromiseFulfilledResult<RecentError[]> => r.status === 'fulfilled')
    .flatMap((r) => r.value);

  // Captured 5xx requests from the per-request error_events table —
  // this is the deepest source: full stack head + body excerpt + path.
  // The dedicated /admin/errors page paginates this; here we surface
  // the most recent for the dashboard.
  const requestErrorRows = await db
    .select({
      requestId: errorEvents.requestId,
      statusCode: errorEvents.statusCode,
      method: errorEvents.method,
      path: errorEvents.path,
      errorName: errorEvents.errorName,
      errorMessage: errorEvents.errorMessage,
      metadata: errorEvents.metadata,
      createdAt: errorEvents.createdAt,
    })
    .from(errorEvents)
    .orderBy(desc(errorEvents.createdAt))
    .limit(limit);

  const requestErrors: RecentError[] = requestErrorRows.map((row) => {
    const meta = (row.metadata as Record<string, unknown>) ?? {};
    return {
      id: row.requestId,
      source: 'request' as const,
      message:
        `${row.method} ${row.path} → ${row.statusCode} ${row.errorMessage ?? row.errorName ?? ''}`.trim(),
      timestamp: row.createdAt,
      metadata: meta,
      requestId: row.requestId,
      statusCode: row.statusCode,
      errorCode: typeof meta.code === 'string' ? meta.code : null,
    };
  });

  // Merge and sort combined list by timestamp descending
  const combined = [...auditResults, ...queueErrors, ...requestErrors].sort(
    (a, b) => b.timestamp.getTime() - a.timestamp.getTime(),
  );

  return combined.slice(0, limit);
}