Six audit documents capture the 2026-05-06 review pass (comprehensive, frontend, missing-features, permissions, reliability) along with the Documenso integration audit + locked build plan that drove the bulk of subsequent feature work. Adds `docs/admin-ux-backlog.md` as a living tracker for the autonomous push — every item marked DONE or REMAINING with file pointers and scope estimates so future sessions can pick up where this one stopped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Reliability audit — 2026-05-06 (focused, post-batch deltas)

Scope: NEW services from the recent archive/restore/hard-delete/external-EOI batches.

Out of scope (already covered in `docs/audit-comprehensive-2026-05-06.md`): worker imports, rate limits, hard-delete error-message UX, smart-restore dead reversal applier, bulk hard-delete redis loop, audit log spam.
## Critical

### C1. Bulk archive enqueues zero post-commit side effects

- File: `src/app/api/v1/clients/bulk/route.ts:68-134`
- Scenario: When the bulk wizard archives 100 clients with high-stakes reasons, `archiveClientWithDecisions` returns `externalCleanups` and `releasedBerths` arrays per client, but `runBulk` discards the return value. Documenso envelopes that the wizard marked `void_documenso` never get queued, and "next-in-line" notifications never fire. The database is left in `documents.status = 'cancelled'` with the live Documenso envelope still out for signature — the signer can complete a legally binding envelope that the CRM thinks is voided.
- Fix: Make the per-row callback return the result, then loop over `results` after `runBulk` to enqueue Documenso voids and fire next-in-line notifications (mirroring the single-client route). Defaulting `documentDecisions` to `'leave'` (lines 113-116) hides the symptom for the bulk wizard but isn't enough — the single-client service can still surface this if the bulk path is ever generalized.
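A minimal sketch of the fix shape. The `ArchiveResult` interface and field names below are assumptions inferred from this finding, not the real module types:

```typescript
// Hypothetical shapes, inferred from the audit text (not the real service types).
interface ArchiveResult {
  clientId: string;
  externalCleanups: { kind: "void_documenso"; envelopeId: string }[];
  releasedBerths: { berthId: string; mooringNumber: string }[];
}

// Pure aggregation: collect every side effect the per-row callback returned,
// so they can be enqueued once, after the bulk loop has committed.
function collectSideEffects(results: (ArchiveResult | null)[]) {
  const voids: string[] = [];
  const nextInLine: string[] = [];
  for (const r of results) {
    if (!r) continue; // per-row failure: nothing to enqueue for this client
    for (const c of r.externalCleanups) {
      if (c.kind === "void_documenso") voids.push(c.envelopeId);
    }
    for (const b of r.releasedBerths) nextInLine.push(b.berthId);
  }
  return { voids, nextInLine };
}
```

After `runBulk`, the caller would map `voids` onto the Documenso void queue and `nextInLine` onto the notification fan-out, mirroring the single-client route.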
## High

### H1. Restore wizard silently drops every released berth

- File: `src/lib/services/client-restore.service.ts:359-372`
- Scenario: `applyReversal` for `berth_released` is a no-op with a comment saying "v1 leaves the berth available". But the dossier (lines 122-129) classifies these as `autoReversible`, and the UI tells the operator "still available — re-attaching to the restored client". The wizard increments `autoReversed` and the audit log records a successful auto-reverse — but nothing actually happens. The operator thinks restore re-linked their berth; it didn't.
- Fix: Either (a) actually re-link by persisting the original `interestId` in the `berth_released` decision detail (it's already there, line 211) and re-inserting an `interestBerths` row + flipping the berth status back to `under_offer`, or (b) reclassify these as `reversibleWithPrompt` with copy that says "berth left available — re-add via the interest detail page".
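If option (b) is chosen, the change amounts to one branch in the dossier classifier. A hedged sketch — `ReversalClass`, the decision shape, and the other decision kinds are invented for illustration:

```typescript
type ReversalClass = "autoReversible" | "reversibleWithPrompt";

// Decision kinds other than berth_released are placeholders for illustration.
interface RestoreDecision {
  kind: "berth_released" | "yacht_unlinked";
}

// berth_released is honest about being a no-op in v1: prompt the operator
// instead of claiming an auto-reverse that never actually runs.
function classifyReversal(d: RestoreDecision): { cls: ReversalClass; copy?: string } {
  if (d.kind === "berth_released") {
    return {
      cls: "reversibleWithPrompt",
      copy: "Berth left available — re-add via the interest detail page.",
    };
  }
  return { cls: "autoReversible" };
}
```

This keeps the `autoReversed` counter and the audit log truthful until option (a) lands.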
### H2. Smart-archive berth status update has a TOCTOU race

- File: `src/lib/services/client-archive.service.ts:191-207`
- Scenario: The berth row is read via `dossier.berths` (outside the tx) and modified inside the tx without a `for update` lock on `berths`. Two concurrent flows — e.g. operator A archives client X while operator B sells berth A1 to client Y — can race: A reads `berth.status === 'sold'` → false, B's tx commits `sold`, A's tx then flips it back to `available`. The "still under offer" subselect doesn't catch this because `berth.status` is the source of truth, not `interest_berths`.
- Fix: Add `tx.select(...).from(berths).where(eq(berths.id, d.berthId)).for('update')` before the status flip and re-check `status !== 'sold'` against the locked row.
### H3. Bulk archive can pick the wrong interest for berth release

- File: `src/app/api/v1/clients/bulk/route.ts:95-103`
- Scenario: When a client has multiple interests linked to the same berth, the bulk wizard picks `dossier.interests.find((i) => i.primaryBerthMooring === b.mooringNumber)` and falls back to `dossier.interests[0]?.interestId ?? ''`. The first-interest-or-empty-string fallback can hand `archiveClientWithDecisions` an `interestId` that was never linked to that berth — so the `delete from interest_berths where berthId = … and interestId = …` matches zero rows and the link is silently retained. Worse: an empty string `''` reaches the delete, which still matches zero rows but leaves the berth status check believing the link was removed.
- Fix: Build the berth→interest map from `interestBerthRows` (the authoritative join) rather than guessing by `primaryBerthMooring`, and skip berths with no resolvable interest rather than emitting an empty-string `interestId`.
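The authoritative mapping can be built directly from the join rows. A sketch, with the row shape assumed from this finding's description:

```typescript
// Assumed shape of a row from the interest_berths join table.
interface InterestBerthRow {
  berthId: string;
  interestId: string;
}

// Map each berth to the interest actually linked to it. Berths with no row
// are simply absent from the map, so callers skip them instead of passing ''.
function buildBerthInterestMap(rows: InterestBerthRow[]): Map<string, string> {
  const map = new Map<string, string>();
  for (const row of rows) {
    if (!map.has(row.berthId)) map.set(row.berthId, row.interestId);
  }
  return map;
}
```

The bulk wizard would then do `const interestId = map.get(b.berthId); if (!interestId) continue;` rather than falling back to `dossier.interests[0]`.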
### H4. External EOI runs four writes outside a transaction

- File: `src/lib/services/external-eoi.service.ts:67-155`
- Scenario: `getStorageBackend().put()`, the `files` insert, the `documents` insert, the `documentEvents` insert, and the interests update happen as five independent operations. If any one fails after the storage upload, you're left with an orphan PDF in S3/MinIO and partial DB state. If the `documents` insert fails after the `files` insert, the file row points to a storage key with no document referencing it — and the interest never advances.
- Fix: Wrap the files/documents/documentEvents/interests writes in a single `db.transaction`. The storage upload stays outside (S3 isn't transactional), but on tx failure either schedule a cleanup job that deletes the orphan storage object, or accept the orphan and add a janitor.
### H5. Bulk wizard double-submit re-archives the same client with racy errors

- File: `src/app/api/v1/clients/bulk/route.ts:68-120` + `src/lib/services/client-archive.service.ts:165-173`
- Scenario: The single-client `archiveClientWithDecisions` locks the row and throws `ConflictError('Client is already archived')` on re-entry — good. But `runBulk` swallows the error string and returns it as `{ ok: false, error: "Client is already archived" }` for that client. If the bulk wizard double-submits (network retry, double click), partial successes from the first request now look like per-client failures in the response, confusing the operator. There's no idempotency key on the bulk submit.
- Fix: Treat `ConflictError('already archived')` as success in the bulk per-row handler (the desired end state is reached), or add an idempotency-key header on the bulk endpoint that short-circuits a duplicate request with the cached response.
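The per-row handler change can be sketched as follows. `ConflictError` here is a local stand-in class, since the real one lives in the service layer:

```typescript
// Stand-in for the service-layer error class.
class ConflictError extends Error {}

type RowResult = { ok: true } | { ok: false; error: string };

// Re-archiving an already-archived client reaches the desired end state,
// so a double-submit reports success instead of a spurious failure.
async function archiveRow(run: () => Promise<void>): Promise<RowResult> {
  try {
    await run();
    return { ok: true };
  } catch (err) {
    if (err instanceof ConflictError && err.message.includes("already archived")) {
      return { ok: true }; // idempotent: the end state was already reached
    }
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

An idempotency-key header remains the stronger fix, since it also dedupes the side-effect enqueue, not just the per-row status.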
## Medium

### M1. Hard-delete of `clientMergeLog.surviving_client_id` rows deletes audit history

- File: `src/lib/services/client-hard-delete.service.ts:209`
- Scenario: The comment says "merged records remain in the log because mergedClientId has no FK", but the delete is wider than needed: it removes every merge-log row where this client was the survivor. If client X (being deleted) previously absorbed clients A/B/C, the audit trail of those merges is lost on X's deletion. The rows that do remain (`mergedClientId = X`) are now inconsistent — they reference a survivor that no longer exists.
- Fix: Either preserve the survivor rows by setting `surviving_client_id = NULL` (requires making the column nullable), or keep the current behavior but document it more visibly. At minimum, log the deleted merge-log row count so operators can investigate gaps.
### M2. Documenso void worker has no max-retry guard for non-404 errors

- File: `src/lib/queue/workers/documents.ts:19-37`
- Scenario: `voidDocument` throws `CodedError` on non-404 failures (auth error, network blip, Documenso 500). BullMQ retries with backoff, but there's no per-job idempotency check — the second retry hits the same envelope; `voidDocument`'s 404 short-circuit only kicks in if Documenso actually voided it on the first retry before the API call returned an error. A persistent 401/403 will retry until BullMQ exhausts attempts, and the `documents` row stays `cancelled` in the CRM with the envelope still live in Documenso. The DLQ is mentioned in the comment but the worker doesn't surface a DLQ alert hook.
- Fix: On exhaustion, write back to `documents` (e.g. `cancellation_failed = true`) and emit an admin notification so the envelope can be voided manually.
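The exhaustion check itself is pure and can be sketched independently of the queue. BullMQ jobs expose `attemptsMade` and the configured attempt count; the write-back shape (`cancellationFailed`) is an assumption from this finding:

```typescript
// Minimal view of the failed-job info the worker's failure hook receives.
interface FailedJobInfo {
  attemptsMade: number;
  maxAttempts: number;
  documentId: string;
}

// Decide whether this failure is terminal. If so, flag the documents row and
// alert admins instead of letting the envelope drift out of sync silently.
function onVoidFailure(
  job: FailedJobInfo,
): { action: "retry" } | { action: "flag"; patch: { cancellationFailed: true }; notifyAdmins: true } {
  if (job.attemptsMade < job.maxAttempts) return { action: "retry" };
  return { action: "flag", patch: { cancellationFailed: true }, notifyAdmins: true };
}
```

The worker's failed-event handler would call this and perform the `documents` update plus the admin notification on `"flag"`.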
### M3. Next-in-line notification fan-out has unhandled rejections

- File: `src/lib/services/next-in-line-notify.service.ts:75-87`
- Scenario: Each `void createNotification(...)` is a fire-and-forget promise with no `.catch` handler. If `notifications.service` dispatches to a DB that's transiently down, the unhandled rejection surfaces in the Node process with no recipient context (the closure-captured `userId` is in the stack, but pino won't include it unless explicitly logged). Process-level handlers will log it, but individual recipients silently lose their notification.
- Fix: `.catch((err) => logger.warn({ err, userId, berthId: input.berthId }, 'next-in-line notification failed'))`.
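A sketch of the fan-out with the rejection handled per recipient. `createNotification` and the logger are injected stubs for illustration, not the real service signatures:

```typescript
type Logger = { warn: (ctx: object, msg: string) => void };

// Fan out one notification per recipient; a single failed dispatch is logged
// with its recipient context instead of surfacing as an unhandled rejection.
async function notifyNextInLine(
  userIds: string[],
  berthId: string,
  createNotification: (userId: string) => Promise<void>,
  logger: Logger,
): Promise<number> {
  let failed = 0;
  await Promise.all(
    userIds.map((userId) =>
      createNotification(userId).catch((err) => {
        failed += 1;
        logger.warn({ err, userId, berthId }, "next-in-line notification failed");
      }),
    ),
  );
  return failed; // callers can decide whether partial failure matters
}
```

Awaiting the `Promise.all` also means the route can count failures, rather than firing and forgetting.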
### M4. Restore service uses `any` for the transaction type

- File: `src/lib/services/client-restore.service.ts:354-355`
- Scenario: `applyReversal(tx: any, ...)` defeats Drizzle's type safety. A future schema rename (e.g. a `yachts.status` enum change) won't fail at compile time inside this function. Combined with the documented v1 no-op for `berth_released`, the function looks innocuous but carries the most risk.
- Fix: Use the proper Drizzle tx type — `Parameters<Parameters<typeof db.transaction>[0]>[0]` or a named type alias from `@/lib/db/types.ts` if one exists.
### M5. `interests.changeInterestStage` milestone writes happen outside the tx

- File: `src/lib/services/interests.service.ts:630-648`
- Scenario: The override path (and the normal path) writes `pipelineStage` in one update and the milestone fields (`dateEoiSent`, `dateContractSigned`, etc.) in a second update. If the process crashes between the two, the stage advances but the milestone is never recorded, so funnel/conversion math under-counts that interest. Over the wire this is rare, but the audit log fires before the milestone update succeeds, so the audit trail claims a complete transition that's actually half-applied.
- Fix: Combine both into a single update statement, computing the milestone fields in JS and merging them into the `set({...})` clause.
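The single-statement fix amounts to computing the milestone patch in JS and spreading it into one `set` object. A sketch with stage names and the stage→column mapping assumed from this finding:

```typescript
// Stage names and milestone columns are assumptions based on the audit text.
type Stage = "eoi_sent" | "contract_signed" | "completed";

// Milestone column implied by each stage transition (not every stage has one).
const MILESTONE_FIELD: Partial<Record<Stage, string>> = {
  eoi_sent: "dateEoiSent",
  contract_signed: "dateContractSigned",
};

// One object for one UPDATE: the stage and its milestone move atomically.
function buildStagePatch(stage: Stage, now: Date): Record<string, unknown> {
  const patch: Record<string, unknown> = { pipelineStage: stage };
  const field = MILESTONE_FIELD[stage];
  if (field) patch[field] = now;
  return patch;
}
```

Something like `tx.update(interests).set(buildStagePatch(stage, new Date()))` then leaves no window where the stage is ahead of its milestone.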
## Low

### L1. Smart-archive coalesces invoice notes via SQL string concat

- File: `src/lib/services/client-archive.service.ts:288-291`
- Scenario: ``notes: sql`coalesce(${invoices.notes}, '') || ${...}` `` embeds `new Date().toISOString()` and the action label inside a parameterized string. The values are bound, so it's not an injection risk, but the `\n[archive ...]` marker is appended unconditionally — re-running the archive on a not-yet-committed client would double the marker. Combined with H5 (no idempotency on bulk), a retry could bloat invoice notes with duplicate markers.
- Fix: Append only when the marker isn't already present, or rely on the `clients.archivedAt is null` precheck (which already guards re-entry) and accept the duplicate as theoretically impossible.
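If the append-once route is taken, the guard can live in JS before the SQL is built. A sketch; the `[archive ...]` marker format copies the shape quoted in the finding:

```typescript
// Append the archive marker only if a marker for this action isn't already
// present, so a retried archive can't bloat the invoice notes.
function appendArchiveMarker(notes: string | null, action: string, at: Date): string {
  const existing = notes ?? "";
  const markerPrefix = `[archive ${action}`;
  if (existing.includes(markerPrefix)) return existing; // already stamped
  return `${existing}\n${markerPrefix} ${at.toISOString()}]`;
}
```

Computing the final notes value in JS also removes the need for the `coalesce(...) ||` concat in SQL entirely.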
### L2. Hard-delete `requestHardDeleteCode` reveals client existence pre-archive

- File: `src/lib/services/client-hard-delete.service.ts:77-85`
- Scenario: A user without `admin.permanently_delete_clients` shouldn't reach this service, so this is theoretical, but the `ConflictError` "Client must be archived" leaks the existence of an unarchived client to anyone who can reach the route. The audit doc flagged hard-delete error messages already (out of scope), but this specific error path isn't covered there.
- Fix: Same as the audit-doc finding for the symmetric path — return a generic `NotFoundError` instead of distinguishing "not found" from "not archived" externally; log the distinction internally only.
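The external/internal split can be sketched as a small guard. `NotFoundError` and the state enum here are stand-ins for the service's real types:

```typescript
// Stand-in for the service-layer error class.
class NotFoundError extends Error {}

type ClientState = "missing" | "active" | "archived";

// Externally, "missing" and "not archived" are indistinguishable; the
// distinction is preserved only in the internal log line.
function guardHardDelete(state: ClientState, logInternal: (reason: string) => void): void {
  if (state === "archived") return; // ok to proceed with the delete code
  logInternal(state === "missing" ? "client not found" : "client not archived");
  throw new NotFoundError("Client not found");
}
```

The route handler would call this before issuing the confirmation code, so both failure modes produce an identical 404 body.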