# Reliability audit — 2026-05-06 (focused, post-batch deltas)

Scope: NEW services from the recent archive/restore/hard-delete/external-EOI batches. Out of scope (already covered in `docs/audit-comprehensive-2026-05-06.md`): worker imports, rate limits, hard-delete error message UX, smart-restore dead reversal applier, bulk hard-delete redis loop, audit log spam.

---

## Critical

### C1. Bulk archive enqueues zero post-commit side effects

- **File:** `src/app/api/v1/clients/bulk/route.ts:68-134`
- **Scenario:** When the bulk wizard archives 100 clients with high-stakes reasons, `archiveClientWithDecisions` returns `externalCleanups` and `releasedBerths` arrays per-client, but `runBulk` discards the return value. Documenso envelopes that the wizard marked `void_documenso` never get queued, and "next-in-line" notifications never fire. The database is left in `documents.status='cancelled'` with the live Documenso envelope still out for signature — the signer can complete a legally binding envelope that the CRM thinks is voided.
- **Fix:** Make the per-row callback return the result, then loop over `results` after `runBulk` to enqueue Documenso voids and fire next-in-line notifications, mirroring the single-client route (sketch after H2 below). Defaulting `documentDecisions` to `'leave'` (lines 113-116) hides the symptom for the bulk wizard but isn't enough — the single-client service can still surface this if the bulk path is ever generalized.

---

## High

### H1. Restore wizard silently drops every released berth

- **File:** `src/lib/services/client-restore.service.ts:359-372`
- **Scenario:** `applyReversal` for `berth_released` is a no-op with a comment saying "v1 leaves the berth available". But the dossier (lines 122-129) classifies these as `autoReversible`, and the UI tells the operator "still available — re-attaching to the restored client". The wizard increments `autoReversed` and the audit log records a successful auto-reverse — but nothing actually happens. The operator believes the restore re-linked the berth; it didn't.
- **Fix:** Either (a) actually re-link by persisting the original `interestId` in the `berth_released` decision detail (it's already there, line 211) and re-inserting an `interestBerths` row + flipping the berth status back to `under_offer`, or (b) reclassify these as `reversibleWithPrompt` with copy that says "berth left available — re-add via the interest detail page".

### H2. Smart-archive berth status update has a TOCTOU race

- **File:** `src/lib/services/client-archive.service.ts:191-207`
- **Scenario:** The berth row is read via `dossier.berths` (outside the tx) and modified inside the tx without a `for update` lock on `berths`. Two concurrent flows — e.g. operator A archives client X while operator B sells berth A1 to client Y — can race: A reads `berth.status === 'sold' → false`, B's tx commits the sale, A's tx then flips the berth back to `available`. The "still under offer" subselect doesn't catch this because `berth.status` is the source of truth, not `interest_berths`.
- **Fix:** Add `tx.select(...).from(berths).where(eq(berths.id, d.berthId)).for('update')` before the status flip and re-check `status !== 'sold'` against the locked row (sketch below).
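A minimal sketch of the C1 fix, assuming `runBulk` returns the per-row callback results and the `{ok, value|error}` row shape implied by H5; `enqueueDocumensoVoid`, `notifyNextInLine`, and `decisionsFor` are hypothetical stand-ins for whatever the single-client route actually calls:

```ts
// C1 sketch — lives in the bulk route; helper names are hypothetical.
const results = await runBulk(clientIds, async (clientId) => {
  // Return the service result instead of discarding it.
  return archiveClientWithDecisions(clientId, decisionsFor(clientId));
});

// Post-commit side effects, mirroring the single-client route.
for (const r of results) {
  if (!r.ok) continue;
  for (const cleanup of r.value.externalCleanups) {
    if (cleanup.kind === 'void_documenso') {
      await enqueueDocumensoVoid(cleanup.documentId); // BullMQ job, see M2
    }
  }
  for (const released of r.value.releasedBerths) {
    await notifyNextInLine(released.berthId);
  }
}
```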
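For H2, a sketch of the locked re-check, sitting inside the existing `db.transaction` callback in the archive service (so schema imports already apply); Drizzle's `.for('update')` select modifier does the row lock:

```ts
// H2 sketch — lock the berth row before flipping status.
import { eq } from 'drizzle-orm';

const [locked] = await tx
  .select({ status: berths.status })
  .from(berths)
  .where(eq(berths.id, d.berthId))
  .for('update'); // blocks a concurrent sale until this tx commits

// Re-check against the locked row, not the stale dossier snapshot.
if (locked && locked.status !== 'sold') {
  await tx
    .update(berths)
    .set({ status: 'available' })
    .where(eq(berths.id, d.berthId));
}
```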
### H3. Bulk archive can pick the wrong interest for berth release

- **File:** `src/app/api/v1/clients/bulk/route.ts:95-103`
- **Scenario:** When a client has multiple interests linked to the same berth, the bulk wizard picks `dossier.interests.find((i) => i.primaryBerthMooring === b.mooringNumber)` and falls back to `dossier.interests[0]?.interestId ?? ''`. The first-interest-or-empty-string fallback can hand `archiveClientWithDecisions` an `interestId` that was never linked to that berth — so the `delete from interest_berths where berthId=… and interestId=…` matches zero rows and the link is silently retained. Worse: an empty string `''` reaches the delete, which still matches zero rows but leaves the berth status check believing the link was removed.
- **Fix:** Build the berth→interest map from `interestBerthRows` (the authoritative join) rather than guessing by `primaryBerthMooring`, and skip berths with no resolvable interest rather than emitting an empty-string interestId (sketch below).

### H4. External EOI runs four writes outside a transaction

- **File:** `src/lib/services/external-eoi.service.ts:67-155`
- **Scenario:** `getStorageBackend().put()`, `files.insert`, `documents.insert`, `documentEvents.insert`, and the interests update happen as five independent operations. If any one fails after the storage upload, you're left with an orphan PDF in S3/MinIO and partial DB state. If the documents insert fails after the file insert, the file row points to a storage key with no document referencing it — and the interest never advances.
- **Fix:** Wrap files/documents/documentEvents/interests in a single `db.transaction` (sketch below). The storage upload stays outside (S3 isn't transactional), but on tx failure, schedule a cleanup job that deletes the orphan storage object, or accept the orphan and add a janitor.

### H5. Bulk wizard double-submit reports already-archived clients as failures

- **File:** `src/app/api/v1/clients/bulk/route.ts:68-120` + `src/lib/services/client-archive.service.ts:165-173`
- **Scenario:** The single-client `archiveClientWithDecisions` locks the row and throws `ConflictError('Client is already archived')` on re-entry — good. But `runBulk` swallows the error string and returns it as `{ok:false, error:"Client is already archived"}` for that client. If the bulk wizard double-submits (network retry, double click), partial successes from the first request now look like per-client failures in the response, confusing the operator. There's no idempotency key on the bulk submit.
- **Fix:** Treat `ConflictError('already archived')` as success in the bulk per-row handler, since the desired end state has been reached (sketch below), or add an idempotency-key header on the bulk endpoint that short-circuits a duplicate request with the cached response.
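For H3, a sketch of the authoritative mapping inside the bulk route; the `interestBerthRows` row shape (`{ berthId, interestId }`) and the decision object are assumed from the audit text:

```ts
// H3 sketch — resolve interests from interest_berths, never by mooring number.
const interestByBerth = new Map<string, string>();
for (const row of interestBerthRows) {
  interestByBerth.set(row.berthId, row.interestId);
}

const berthDecisions = dossier.berths.flatMap((b) => {
  const interestId = interestByBerth.get(b.berthId);
  if (!interestId) return []; // no resolvable link: skip, never emit ''
  return [{ berthId: b.berthId, interestId }];
});
```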
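For H4, a sketch under stated assumptions: the storage backend's `put`/`delete` signatures, the event `type`, and the stage literal are illustrative, not the service's actual names, and the snippet sits inside the existing service (imports elided):

```ts
// H4 sketch — storage put stays outside the tx; the four DB writes move inside.
const storage = getStorageBackend();
const storageKey = await storage.put(pdfBuffer, { contentType: 'application/pdf' });

try {
  await db.transaction(async (tx) => {
    const [file] = await tx.insert(files).values({ storageKey /* … */ }).returning();
    const [doc] = await tx.insert(documents).values({ fileId: file.id /* … */ }).returning();
    await tx.insert(documentEvents).values({ documentId: doc.id, type: 'external_eoi_received' });
    await tx.update(interests).set({ pipelineStage: 'eoi_received' }).where(eq(interests.id, interestId));
  });
} catch (err) {
  // Best-effort orphan cleanup; a periodic janitor would catch anything missed.
  await storage.delete(storageKey).catch(() => {});
  throw err;
}
```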
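For H5, a sketch of conflict-as-success in the per-row callback; the `{ok, value|error}` result shape follows the audit text, the `alreadyArchived` flag is an assumption:

```ts
// H5 sketch — an already-archived conflict is the desired end state, not a failure.
try {
  const value = await archiveClientWithDecisions(clientId, decisions);
  return { ok: true as const, value };
} catch (err) {
  if (err instanceof ConflictError && /already archived/i.test(err.message)) {
    return { ok: true as const, value: null, alreadyArchived: true };
  }
  return { ok: false as const, error: err instanceof Error ? err.message : String(err) };
}
```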
---

## Medium

### M1. Hard-delete `clientMergeLog.surviving_client_id` deletes audit history

- **File:** `src/lib/services/client-hard-delete.service.ts:209`
- **Scenario:** The comment says "merged records remain in the log because mergedClientId has no FK", but the delete is wider than needed: it removes every merge-log row where this client was the survivor. If client X (being deleted) previously absorbed clients A/B/C, the audit trail of those merges is lost on X's deletion. The surviving rows that remain (`mergedClientId = X`) are now inconsistent — they reference a survivor that no longer exists.
- **Fix:** Either preserve the survivor rows by setting `surviving_client_id = NULL` (requires the column to be nullable) or keep the current behavior but document it more visibly. At minimum, log the deleted merge-log row count so operators can investigate gaps.

### M2. Documenso void worker has no max-retry guard for non-404 errors

- **File:** `src/lib/queue/workers/documents.ts:19-37`
- **Scenario:** `voidDocument` throws `CodedError` on non-404 failures (auth error, network blip, Documenso 500). BullMQ retries with backoff, but there's no per-job idempotency check — a retry hits the same envelope, and `voidDocument`'s 404 short-circuit only helps if Documenso actually voided the envelope on a previous attempt before the API call returned an error. A persistent 401/403 keeps retrying until BullMQ exhausts its attempts, and the documents row stays `cancelled` in the CRM with the envelope still live in Documenso. The DLQ is mentioned in the comment but the worker doesn't surface a DLQ alert hook.
- **Fix:** On exhaustion, write back to `documents` (e.g. `cancellation_failed=true`) and emit an admin notification so the envelope can be voided manually (sketch after M5 below).

### M3. Next-in-line notification fan-out has unhandled rejections

- **File:** `src/lib/services/next-in-line-notify.service.ts:75-87`
- **Scenario:** Each `void createNotification(...)` is a fire-and-forget promise with no `.catch` handler. If `notifications.service` dispatches to a DB that's transiently down, the unhandled rejection surfaces in the Node process with no recipient context (the closure-captured `userId` is in scope, but pino won't include it unless explicitly logged). Process-level handlers will log it, but individual recipients silently lose their notification.
- **Fix:** `.catch((err) => logger.warn({err, userId, berthId: input.berthId}, 'next-in-line notification failed'))`.

### M4. Restore service uses `any` for the transaction type

- **File:** `src/lib/services/client-restore.service.ts:354-355`
- **Scenario:** `applyReversal(tx: any, ...)` defeats Drizzle's type safety. A future schema rename (e.g. a `yachts.status` enum change) won't fail at compile time inside this function. Combined with the documented v1 no-op for `berth_released` (H1), the function looks innocuous but carries the most risk.
- **Fix:** Use the proper Drizzle tx type — `Parameters<Parameters<typeof db.transaction>[0]>[0]` — or a named type alias from `@/lib/db/types.ts` if one exists.

### M5. interests.changeInterestStage milestone writes happen outside the tx

- **File:** `src/lib/services/interests.service.ts:630-648`
- **Scenario:** The override path (and the normal path) writes `pipelineStage` in one update and the milestone fields (`dateEoiSent`, `dateContractSigned`, etc.) in a second update. If the process crashes between the two, the stage advances but the milestone is never recorded, and funnel/conversion math then under-counts that interest. In practice this is rare, but the audit log fires before the milestone update succeeds, so the audit trail claims a complete transition that's actually half-applied.
- **Fix:** Combine both into a single update statement, computing the milestone fields in JS and merging them into the `set({...})` clause (sketch below).
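For M2, a sketch of an exhaustion hook, assuming `worker` is the existing BullMQ `Worker` in `documents.ts`; BullMQ's `failed` event and `attemptsMade` counter are real, while the `cancellationFailed` column and `notifyAdmins` helper are hypothetical:

```ts
// M2 sketch — surface exhausted void jobs instead of letting them die silently.
worker.on('failed', async (job, err) => {
  if (!job) return;
  // attemptsMade already includes the attempt that just failed.
  if (job.attemptsMade >= (job.opts.attempts ?? 1)) {
    await db
      .update(documents)
      .set({ cancellationFailed: true }) // hypothetical column, per the fix
      .where(eq(documents.id, job.data.documentId));
    await notifyAdmins(
      `Documenso void exhausted retries for document ${job.data.documentId}`,
      err,
    );
  }
});
```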
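For M5, a minimal sketch of the merged update inside the existing service; the stage→milestone map and the stage literals are illustrative, and the milestone column names come from the audit text:

```ts
// M5 sketch — one update carries both the stage and its milestone column.
const MILESTONE_COLUMN: Partial<Record<string, 'dateEoiSent' | 'dateContractSigned'>> = {
  eoi_sent: 'dateEoiSent', // stage names assumed for illustration
  contract_signed: 'dateContractSigned',
};

const milestoneField = MILESTONE_COLUMN[nextStage];

await tx
  .update(interests)
  .set({
    pipelineStage: nextStage,
    ...(milestoneField ? { [milestoneField]: new Date() } : {}),
  })
  .where(eq(interests.id, interestId));
```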
---

## Low

### L1. Smart-archive coalesces invoice notes via SQL string concat

- **File:** `src/lib/services/client-archive.service.ts:288-291`
- **Scenario:** ``notes: sql`coalesce(${invoices.notes}, '') || ${...}` `` embeds `new Date().toISOString()` and the action label inside a parameterized string. The values are bound, so it's not an injection risk, but the `\n[archive ...]` marker is appended unconditionally — re-running the archive on a not-yet-committed client would double the marker. Combined with H5 (no idempotency on bulk), a retry could bloat invoice notes with duplicate markers.
- **Fix:** Append only when the marker isn't already present (sketch below), or rely on the `clients.archivedAt is null` precheck (which already guards re-entry) and accept the duplicate as theoretically impossible.

### L2. Hard-delete `requestHardDeleteCode` reveals client existence pre-archive

- **File:** `src/lib/services/client-hard-delete.service.ts:77-85`
- **Scenario:** A user without `admin.permanently_delete_clients` shouldn't reach this service, so this is theoretical, but the ConflictError "Client must be archived" leaks the existence of an unarchived client to anyone who can reach the route. The audit doc flagged hard-delete error messages already (out of scope), but this specific error path isn't covered there.
- **Fix:** Same as the audit-doc finding for the symmetric path — return a generic `NotFoundError` instead of distinguishing "not found" from "not archived" externally; log the distinction internally only (sketch below).
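For L1, a sketch of an idempotent append. For the presence check to survive a retry, the marker has to be stable across attempts, so this assumes it's keyed on a hypothetical `archiveId` rather than wall-clock time:

```ts
// L1 sketch — append the marker only when it isn't already in the notes.
const marker = `\n[archive ${archiveId} ${actionLabel}]`;

await tx
  .update(invoices)
  .set({
    notes: sql`case
      when coalesce(${invoices.notes}, '') like ${'%' + marker + '%'}
        then ${invoices.notes}
      else coalesce(${invoices.notes}, '') || ${marker}
    end`,
  })
  .where(eq(invoices.clientId, clientId));
```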
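For L2, a sketch of the collapsed error path; `NotFoundError` is assumed to exist alongside `ConflictError` in the codebase's error module, and the logger call follows the pino convention used in M3:

```ts
// L2 sketch — same external answer for "missing" and "not archived".
const client = await db.query.clients.findFirst({
  where: eq(clients.id, clientId),
});

if (!client || client.archivedAt === null) {
  // Keep the real distinction in internal logs only.
  logger.info(
    { clientId, reason: !client ? 'missing' : 'not_archived' },
    'hard-delete code request rejected',
  );
  throw new NotFoundError('Client not found');
}
```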