feat(expenses): streaming expense-PDF export + receipt-less expense flag + audit-3 fixes
Replaces the legacy text-only expense PDF (was just dumping rows into a
single pdfme text field — no images, no pagination) with a proper
streaming export modelled on the legacy Nuxt client-portal but
re-architected for memory safety. The legacy implementation OOM'd on
hundreds of receipts because it:
- buffered every receipt image into memory simultaneously
- accumulated PDF chunks into an array, concat'd at end
- base64-encoded the whole PDF into a JSON response (3x peak memory)
- had no image downscaling
The new design:
- `streamExpensePdf()` (src/lib/services/expense-pdf.service.ts):
pdfkit pipes bytes directly to the HTTP response (no Buffer
accumulation). Receipts are processed serially so peak heap is one
image at a time. Sharp re-encodes any receipt > 500 KB or > 1500 px
(or in a format pdfkit can't embed) to JPEG q80, capped at 1500 px on
the long edge; a typical 8 MB phone photo collapses to ~250 KB. For a
500-receipt export, peak RSS stays under ~100 MB; legacy needed >2
GB for the same input.
- Pages: cover summary box (count, totals, currency equiv, optional
processing fee), grouped expense table (groupBy=none|payer|category|
date), one-page-per-receipt with header (establishment, amount,
date, payer, category, file name) and full-bleed image.
- Storage backend abstraction — receipts stream from
`getStorageBackend().get(storageKey)`, works on MinIO/S3/filesystem.
- Route: POST /api/v1/expenses/export/pdf streams binary
application/pdf with cache-control:no-store. Validator caps
expenseIds at 1000 to prevent runaway loops.
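A minimal sketch of the 1000-id cap, written as a plain TypeScript guard. The real validator is not shown in this commit; the `.refine()` mention suggests it is zod-based. `parseExportBody` and `MAX_EXPORT_IDS` are hypothetical names.

```typescript
// Hypothetical names; mirrors the "expenseIds capped at 1000" rule above.
const MAX_EXPORT_IDS = 1000;

type ExportBodyResult =
  | { ok: true; expenseIds: string[] | undefined }
  | { ok: false; error: string };

export function parseExportBody(body: unknown): ExportBodyResult {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, error: 'body must be a JSON object' };
  }
  const ids: unknown = (body as { expenseIds?: unknown }).expenseIds;
  // Absent list = "export everything matching the filter".
  if (ids === undefined) return { ok: true, expenseIds: undefined };
  if (!Array.isArray(ids) || !ids.every((id) => typeof id === 'string')) {
    return { ok: false, error: 'expenseIds must be an array of strings' };
  }
  if (ids.length > MAX_EXPORT_IDS) {
    // Reject up front instead of starting an export loop that could run for minutes.
    return { ok: false, error: `expenseIds is capped at ${MAX_EXPORT_IDS}` };
  }
  return { ok: true, expenseIds: ids as string[] };
}
```

Rejecting at the edge keeps the worst-case export size bounded before any database or storage work starts.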
Receipt-less expense flow (per user request):
- Schema: 0033 migration adds `expenses.no_receipt_acknowledged`
boolean (default false).
- Validator: createExpenseSchema requires either receiptFileIds OR
noReceiptAcknowledged=true; the .refine() error message tells the
rep exactly what to do. updateExpenseSchema is partial and skips
the rule (existing rows can be edited without re-acknowledging).
- PDF: receiptless expenses get an inline red "(no receipt)" tag in
the establishment cell + a red footer warning in the summary box
showing the count and at-risk amount.
- The legacy parent-company reimbursement queue may refuse to pay
receiptless expenses, so the warning is load-bearing for ops.
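The create-time rule can be sketched as a plain predicate. Two parts of this are assumptions:
- `checkReceiptRule` and the message text are hypothetical (the real `.refine()` message is not in this commit).
- Treating an empty `receiptFileIds` array as "no receipt" is assumed, not confirmed.

```typescript
// Hypothetical shape: only the two fields the rule inspects.
interface CreateExpenseReceiptFields {
  receiptFileIds?: string[] | null;
  noReceiptAcknowledged?: boolean;
}

// Placeholder wording; the real .refine() message is not shown in this commit.
export const NO_RECEIPT_MESSAGE =
  'Attach at least one receipt, or confirm "no receipt" to acknowledge this expense may not be reimbursed.';

/** Returns null when the create payload satisfies the receipt rule, else the error message. */
export function checkReceiptRule(input: CreateExpenseReceiptFields): string | null {
  const hasReceipt = (input.receiptFileIds?.length ?? 0) > 0;
  const acknowledged = input.noReceiptAcknowledged === true;
  return hasReceipt || acknowledged ? null : NO_RECEIPT_MESSAGE;
}
```

In zod, a predicate like this would typically be attached via `.refine((v) => checkReceiptRule(v) === null, { message, path: ['receiptFileIds'] })` on the create schema only. The partial update schema leaves it out.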
Audit-3 fixes piggy-backed:
- 🔴 Tesseract OCR runtime now races a 30s timeout (CPU-bomb DoS
protection — a crafted PDF rasterizing to high-res noise could
pin the worker indefinitely).
- 🟠 brochures.service.ts:listBrochures dropped a wasted query (the
legacy single-brochure fast-path was discarding its result on the
multi-brochure branch).
- 🟠 berth-pdf.service.ts:listBerthPdfVersions now Promise.all's the
presignDownload calls instead of awaiting each in a for-loop; a
20-version berth now waits for one concurrent batch instead of 20
sequential round-trips.
- 🟡 public berths route no longer logs the full `row` object on
enum drift (was dumping price + amenity columns into ops logs).
- 🟡 dropped the dead `void sql` import from public berths route.
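The 30 s race can be sketched with `Promise.race` plus a timer that is always cleared. `withTimeout` and `TimeoutError` are hypothetical names, not the project's helpers.

Racing only stops the caller from waiting; the OCR work keeps running unless the worker is also killed. Tesseract.js workers expose `terminate()` for that, so the real fix presumably pairs the race with a teardown.

```typescript
/** Hypothetical error type so callers can tell a timeout from an OCR failure. */
export class TimeoutError extends Error {
  constructor(label: string, ms: number) {
    super(`${label} timed out after ${ms} ms`);
    this.name = 'TimeoutError';
  }
}

/**
 * Settle with `work`, or reject with TimeoutError after `ms`.
 * The timer is cleared either way so it cannot keep the event loop alive.
 * This stops the *wait*, not the underlying work.
 */
export async function withTimeout<T>(work: Promise<T>, ms: number, label = 'operation'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(label, ms)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

Usage, assuming a Tesseract.js worker: `await withTimeout(worker.recognize(image), 30_000, 'ocr')`, calling `worker.terminate()` when a `TimeoutError` is caught.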
Tests still 1163/1163. tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:38:32 +02:00
/**
 * Memory-efficient expense PDF export.
 *
 * Replaces the legacy `client-portal/server/api/expenses/generate-pdf.ts`
 * (1009 lines, pdfkit + buffer-everything + base64-wrapped JSON response;
 * it would OOM on hundreds of receipts).
 *
 * Design constraints (per user requirement: "could be hundreds of
 * expenses and images, also compress files if they're stupidly large"):
 *
 * 1. **Stream the PDF output**: pdfkit's output stream feeds the response
 *    directly instead of accumulating chunks. Bytes leave the process as
 *    they're written.
 * 2. **Serial receipt processing**: fetch one receipt at a time, embed,
 *    release. Peak heap ≈ one image + the in-flight pdfkit page.
 * 3. **Sharp resize before embedding**: receipts above the size/dimension
 *    thresholds get downscaled to ≤ 1500 px on the long edge at JPEG q80.
 *    A typical 8 MB phone photo collapses to ~250 KB; the resulting PDF
 *    ends up ~5-10x smaller than the legacy output.
 * 4. **Storage backend abstraction**: receipts come from
 *    `getStorageBackend().get(storageKey)`; works against MinIO/S3 in
 *    production and the local filesystem in dev.
 * 5. **Heap budget**: for a 500-receipt export (avg 8 MB raw → 250 KB
 *    resized + a few MB pdfkit working set), peak RSS stays under 100 MB.
 *    The legacy implementation needed > 2 GB for the same input.
 *
 * Caller flow:
 *
 *   const pdfStream = await streamExpensePdf({ portId, expenseIds, options });
 *   return new Response(pdfStream, { headers: { 'content-type': 'application/pdf' } });
 *
 * `pdfStream` is a `ReadableStream<Uint8Array>` ready to hand straight to
 * the Web `Response` constructor. pdfkit's Node-stream output is converted
 * via `Readable.toWeb`, so the route handler only deals with Web-standard
 * stream types.
 */
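A minimal sketch of that caller flow, assuming Node 18+ (global `Response`, `Readable.toWeb`). A `PassThrough` stands in for the pdfkit document, since `PDFDocument` is itself a Node readable stream. `pdfResponse` and `fakePdfSource` are hypothetical names. The `cache-control: no-store` header matches what the commit message says the route sends.

```typescript
import { PassThrough, Readable } from 'node:stream';

// Wrap any Node Readable (a pdfkit PDFDocument is one) in a Web Response.
export function pdfResponse(source: Readable): Response {
  // Readable.toWeb adapts the Node stream lazily; nothing is buffered up front.
  const body = Readable.toWeb(source) as unknown as ReadableStream<Uint8Array>;
  return new Response(body, {
    headers: {
      'content-type': 'application/pdf',
      'cache-control': 'no-store',
    },
  });
}

// Stand-in for pdfkit: write a few chunks into a PassThrough, then end it.
export function fakePdfSource(chunks: string[]): Readable {
  const pt = new PassThrough();
  for (const chunk of chunks) pt.write(chunk);
  pt.end();
  return pt;
}
```

Because the body is pulled on demand, a slow client applies backpressure to the generator instead of forcing the whole PDF into memory.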
import { Readable } from 'node:stream';
2026-05-05 05:11:26 +02:00
import { eq, inArray, and, gte, lte, isNull, desc } from 'drizzle-orm';
import PDFDocument from 'pdfkit';
import sharp from 'sharp';
import { db } from '@/lib/db';
import { expenses } from '@/lib/db/schema/financial';
import { files } from '@/lib/db/schema/documents';
import { getRate } from '@/lib/services/currency';
import { getStorageBackend } from '@/lib/storage';
import { logger } from '@/lib/logger';
// ─── Public options + result types ──────────────────────────────────────────
export type GroupBy = 'none' | 'payer' | 'category' | 'date';
export type PageFormat = 'A4' | 'Letter' | 'Legal';
export type TargetCurrency = 'USD' | 'EUR';
export interface ExpensePdfOptions {
  /** Title at the top of the document, e.g. "March 2026 Expense Report". */
  documentName: string;
  /** Subtitle below the title (defaults to "Generated on <today>"). */
  subheader?: string;
  /** Group expenses in the table by payer/category/date. Default: none. */
  groupBy?: GroupBy;
  /** Append one page per receipt image at the end. */
  includeReceipts?: boolean;
  /** Include the OCR-extracted "Contents" string in the table row. */
  includeReceiptContents?: boolean;
  /** Show the summary box (count + totals + grouping label). */
  includeSummary?: boolean;
  /** Show the per-row expense table. */
  includeDetails?: boolean;
  /** Add a 5% management fee line (parent-company export). */
  includeProcessingFee?: boolean;
  /** Currency to convert all amounts into for the totals + line items. */
  targetCurrency?: TargetCurrency;
  pageFormat?: PageFormat;
}
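One plausible way to bucket rows for the grouped table, sketched independently of the real renderer. `groupRows`, the group labels, and the UTC calendar-day key for `date` are assumptions. A `Map` keeps groups in first-seen order, so rows already sorted by `expenseDate desc` yield newest-first date groups.

```typescript
type GroupBy = 'none' | 'payer' | 'category' | 'date';

// Hypothetical row shape: only the fields grouping needs.
interface GroupableRow {
  payer: string | null;
  category: string | null;
  expenseDate: Date;
}

function groupLabel(row: GroupableRow, by: GroupBy): string {
  if (by === 'payer') return row.payer ?? 'No payer';
  if (by === 'category') return row.category ?? 'Uncategorised';
  if (by === 'date') return row.expenseDate.toISOString().slice(0, 10); // UTC calendar day
  return 'All expenses'; // groupBy = 'none'
}

/** Buckets rows by label, preserving the order in which labels first appear. */
export function groupRows<R extends GroupableRow>(rows: R[], by: GroupBy): Map<string, R[]> {
  const groups = new Map<string, R[]>();
  for (const row of rows) {
    const key = groupLabel(row, by);
    const bucket = groups.get(key);
    if (bucket) bucket.push(row);
    else groups.set(key, [row]);
  }
  return groups;
}
```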
export interface ExpensePdfArgs {
  portId: string;
  /** When set, only these expenses are exported (ordered by expenseDate desc). */
  expenseIds?: string[];
  /** Otherwise, all matching expenses for the port get exported. */
  filter?: {
    dateFrom?: Date | string | null;
    dateTo?: Date | string | null;
    category?: string | null;
    paymentStatus?: string | null;
    payer?: string | null;
    includeArchived?: boolean;
  };
  options: ExpensePdfOptions;
  /**
   * Caller's abort signal. When the client disconnects mid-stream we stop
   * pulling receipts off the storage backend rather than burning CPU/IO on
   * an export nobody's reading. Without this, a 1000-receipt export aborted
   * at byte 0 keeps the process busy for minutes.
   */
  signal?: AbortSignal;
}
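The abort behaviour boils down to checking the signal between serial steps. This sketch uses a hypothetical `forEachSerial` helper. In a route handler, the signal would typically be `request.signal` from the incoming Web `Request`. The check runs between items, so a receipt already in flight still finishes before the loop stops.

```typescript
/**
 * Runs `handle` over `items` one at a time (never concurrently) and stops
 * before the next item once `signal` is aborted. Returns how many items ran.
 */
export async function forEachSerial<T>(
  items: readonly T[],
  handle: (item: T, index: number) => Promise<void>,
  signal?: AbortSignal,
): Promise<number> {
  let done = 0;
  for (let i = 0; i < items.length; i++) {
    if (signal?.aborted) break; // client went away: stop pulling receipts
    await handle(items[i], i); // serial: at most one receipt in memory
    done++;
  }
  return done;
}
```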
// ─── Image resize gate ──────────────────────────────────────────────────────
/** Receipts above this raw-byte size are forced through sharp resize. */
const RESIZE_BYTE_THRESHOLD = 500 * 1024; // 500 KB
/** Max long-edge pixel size after resize. Keeps text legible while
 * collapsing typical phone-camera receipts (4032 × 3024 → 1500 × 1125). */
const MAX_DIMENSION = 1500;
/** JPEG quality for resized output. */
const JPEG_QUALITY = 80;
/**
 * Resize a receipt image to a memory-friendly size. Returns the input
 * buffer untouched when:
 *   - it's JPEG or PNG (the only formats pdfkit embeds) AND
 *   - it's already at or below the byte threshold AND
 *   - sharp can read its metadata AND
 *   - both dimensions are ≤ MAX_DIMENSION
 *
 * Returns a JPEG buffer in every other case. Sharp (libvips) processes the
 * image in a streaming pipeline outside the V8 heap, so the only Node-heap
 * cost during resize is the input + output buffers.
 */
async function maybeResizeImage(
  raw: Buffer,
  contentType: string | null | undefined,
): Promise<{ buffer: Buffer; contentType: 'image/jpeg' | 'image/png'; resized: boolean }> {
  // pdfkit only supports JPEG + PNG. Anything else gets transcoded to JPEG.
  const isJpeg = contentType === 'image/jpeg' || contentType === 'image/jpg';
  const isPng = contentType === 'image/png';
  const passthroughCt: 'image/jpeg' | 'image/png' = isPng ? 'image/png' : 'image/jpeg';
  if (raw.byteLength <= RESIZE_BYTE_THRESHOLD && (isJpeg || isPng)) {
    try {
      const meta = await sharp(raw).metadata();
      const w = meta.width ?? 0;
      const h = meta.height ?? 0;
      if (w > 0 && h > 0 && w <= MAX_DIMENSION && h <= MAX_DIMENSION) {
        return { buffer: raw, contentType: passthroughCt, resized: false };
      }
    } catch {
      // Fall through to resize+transcode on any sharp metadata failure.
    }
  }
  const resized = await sharp(raw)
    .rotate() // honour EXIF orientation so phone photos aren't sideways
    .resize({
      width: MAX_DIMENSION,
      height: MAX_DIMENSION,
      fit: 'inside',
      withoutEnlargement: true,
    })
    .jpeg({ quality: JPEG_QUALITY, mozjpeg: true })
    .toBuffer();
  return { buffer: resized, contentType: 'image/jpeg', resized: true };
}
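The resize gate's decision and the `fit: 'inside'` + `withoutEnlargement` geometry can be modelled without sharp, which makes the thresholds easy to unit-test. `shouldPassthrough` and `fitInside` are hypothetical helpers. Sharp's own rounding may differ from `Math.round` by a pixel.

```typescript
// Same thresholds as the service constants; duplicated so the sketch is self-contained.
const RESIZE_BYTE_THRESHOLD = 500 * 1024;
const MAX_DIMENSION = 1500;

/** True when a receipt can be embedded as-is (no sharp resize/transcode). */
export function shouldPassthrough(
  byteLength: number,
  contentType: string | null | undefined,
  width: number,
  height: number,
): boolean {
  const embeddable =
    contentType === 'image/jpeg' || contentType === 'image/jpg' || contentType === 'image/png';
  return (
    embeddable &&
    byteLength <= RESIZE_BYTE_THRESHOLD &&
    width > 0 &&
    height > 0 &&
    width <= MAX_DIMENSION &&
    height <= MAX_DIMENSION
  );
}

/** Output size for fit 'inside' + withoutEnlargement: shrink to fit the box, never grow. */
export function fitInside(
  width: number,
  height: number,
  max: number = MAX_DIMENSION,
): { width: number; height: number } {
  const scale = Math.min(1, max / width, max / height);
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}
```

`fitInside(4032, 3024)` reproduces the 4032 × 3024 → 1500 × 1125 example from the `MAX_DIMENSION` comment.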
// ─── Currency conversion ────────────────────────────────────────────────────
interface ExpenseRow {
  id: string;
  establishmentName: string | null;
  amount: string;
  currency: string;
  amountUsd: string | null;
  paymentMethod: string | null;
  category: string | null;
  payer: string | null;
  expenseDate: Date;
  description: string | null;
  receiptFileIds: string[] | null;
  /** True when the rep created the expense without a receipt (and
   * acknowledged it may not be reimbursed). Surfaces as a banner row in
   * the table + a footnote at the bottom of the summary box. */
  noReceiptAcknowledged: boolean;
  paymentStatus: string | null;
}
interface ProcessedExpense extends ExpenseRow {
  amountTarget: number;
  amountUsdNumeric: number;
  amountEurNumeric: number;
  /** True when ANY rate lookup for this row fell back to 1:1 (e.g. the
   * exchange-rate cache was cold and the upstream API returned null).
   * Surfaced via an asterisk in the table + a footnote in the summary. */
  rateUnavailable: boolean;
}
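A sketch of how the asterisk marker and its footnote might be rendered. `formatCell` and `rateFootnote` are hypothetical, the footnote wording is a placeholder, and using `Intl.NumberFormat` with the `en-US` locale is an assumption about the PDF's number format.

```typescript
type TargetCurrency = 'USD' | 'EUR';

/** Formats a line-item amount, appending "*" when its rate fell back to 1:1. */
export function formatCell(amount: number, currency: TargetCurrency, rateUnavailable: boolean): string {
  const text = new Intl.NumberFormat('en-US', { style: 'currency', currency }).format(amount);
  return rateUnavailable ? `${text}*` : text;
}

/** Footnote for the summary box; null when every rate lookup succeeded. */
export function rateFootnote(rateUnavailableCount: number): string | null {
  if (rateUnavailableCount === 0) return null;
  const noun = rateUnavailableCount === 1 ? 'amount' : 'amounts';
  return `* ${rateUnavailableCount} ${noun} converted at 1:1 (exchange rate unavailable); totals are approximate.`;
}
```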
interface Totals {
  count: number;
  targetTotal: number;
  usdTotal: number;
  eurTotal: number;
  processingFee: number;
  finalTotal: number;
  targetCurrency: TargetCurrency;
  /** Number of expenses with `noReceiptAcknowledged = true`; surfaces as a
   * warning footer in the summary box. Reps need to know this count
   * before forwarding the export to a parent-company reimbursement queue. */
  noReceiptCount: number;
  /** Sum of `amountTarget` over the no-receipt expenses: the amount at risk
   * of being denied reimbursement. */
  noReceiptAmount: number;
  /** Number of rows whose conversion fell back to 1:1; surfaces as an
   * amber footer so reps know the totals are approximate. An audit caught
   * the silent 1:1 fallback: users were getting EUR-labelled USD totals. */
  rateUnavailableCount: number;
}
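A sketch of how these counters could be accumulated in one pass over the processed rows. Assumptions:
- `computeTotals` and `TotalsSketch` are hypothetical.
- The 5% fee is applied to the converted target total.
- Totals are rounded to cents.
- The USD/EUR side totals are omitted.

```typescript
// Hypothetical per-row shape: only the fields the totals need.
interface TotalsLine {
  amountTarget: number;
  noReceiptAcknowledged: boolean;
  rateUnavailable: boolean;
}

interface TotalsSketch {
  count: number;
  targetTotal: number;
  processingFee: number;
  finalTotal: number;
  noReceiptCount: number;
  noReceiptAmount: number;
  rateUnavailableCount: number;
}

// Round to cents (assumption: totals are displayed with two decimals).
const round2 = (n: number): number => Math.round(n * 100) / 100;

export function computeTotals(lines: TotalsLine[], includeProcessingFee: boolean, feeRate = 0.05): TotalsSketch {
  let targetTotal = 0;
  let noReceiptCount = 0;
  let noReceiptAmount = 0;
  let rateUnavailableCount = 0;
  for (const line of lines) {
    targetTotal += line.amountTarget;
    if (line.noReceiptAcknowledged) {
      noReceiptCount += 1;
      noReceiptAmount += line.amountTarget; // amount at risk of non-reimbursement
    }
    if (line.rateUnavailable) rateUnavailableCount += 1;
  }
  const processingFee = includeProcessingFee ? round2(targetTotal * feeRate) : 0;
  return {
    count: lines.length,
    targetTotal: round2(targetTotal),
    processingFee,
    finalTotal: round2(targetTotal + processingFee),
    noReceiptCount,
    noReceiptAmount: round2(noReceiptAmount),
    rateUnavailableCount,
  };
}
```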
async function processExpenses(
  rows: ExpenseRow[],
  target: TargetCurrency,
): Promise<ProcessedExpense[]> {
  // Resolve rate ONCE per source currency (cached by getRate). Avoids the
  // legacy code's per-row API call. We also track *which* lookups failed
  // (returned null upstream) so the PDF can surface a warning rather than
  // silently treating EUR as USD.
  const rateCache = new Map<string, { rate: number; ok: boolean }>();
  const ensureRate = async (from: string, to: string): Promise<{ rate: number; ok: boolean }> => {
    if (from === to) return { rate: 1, ok: true };
    const key = `${from}->${to}`;
    if (rateCache.has(key)) return rateCache.get(key)!;
    const fetched = await getRate(from, to);
    const entry = fetched != null ? { rate: fetched, ok: true } : { rate: 1, ok: false };
    rateCache.set(key, entry);
    if (!entry.ok) {
      logger.warn({ from, to }, 'Expense PDF: exchange rate unavailable, falling back to 1:1');
    }
    return entry;
  };
  const out: ProcessedExpense[] = [];
  for (const row of rows) {
    const raw = parseFloat(row.amount);
    let rateUnavailable = false;
    let usd: number;
    if (row.amountUsd != null) {
      usd = parseFloat(row.amountUsd);
    } else if (row.currency.toUpperCase() === 'USD') {
      usd = raw;
    } else {
      const { rate, ok } = await ensureRate(row.currency.toUpperCase(), 'USD');
      usd = raw * rate;
      if (!ok) rateUnavailable = true;
    }
    // Skip the USD->EUR chain when the source is already EUR: every
    // redundant rate lookup adds rounding noise on top of the network
    // round-trip. An EUR-source amount should land back exactly at the
    // input, not at raw * (EUR->USD rate) * (USD->EUR rate).
    let eur: number;
    if (row.currency.toUpperCase() === 'EUR') {
      eur = raw;
    } else {
      const { rate, ok } = await ensureRate('USD', 'EUR');
      eur = usd * rate;
      if (!ok) rateUnavailable = true;
    }
    const targetVal = target === 'USD' ? usd : eur;
    out.push({
      ...row,
      amountUsdNumeric: usd,
      amountEurNumeric: eur,
      amountTarget: targetVal,
      rateUnavailable,
    });
  }
  return out;
}
function computeTotals(
  rows: ProcessedExpense[],
  target: TargetCurrency,
  includeProcessingFee: boolean,
): Totals {
  const targetTotal = rows.reduce((s, r) => s + r.amountTarget, 0);
  const usdTotal = rows.reduce((s, r) => s + r.amountUsdNumeric, 0);
  const eurTotal = rows.reduce((s, r) => s + r.amountEurNumeric, 0);
  const processingFee = includeProcessingFee ? targetTotal * 0.05 : 0;
  const receiptlessRows = rows.filter((r) => r.noReceiptAcknowledged);
  const rateUnavailableCount = rows.reduce((n, r) => n + (r.rateUnavailable ? 1 : 0), 0);
  return {
    count: rows.length,
    targetTotal,
    usdTotal,
    eurTotal,
    processingFee,
    finalTotal: targetTotal + processingFee,
    targetCurrency: target,
    noReceiptCount: receiptlessRows.length,
    noReceiptAmount: receiptlessRows.reduce((s, r) => s + r.amountTarget, 0),
    rateUnavailableCount,
  };
}
// ─── Page dimensions ────────────────────────────────────────────────────────
function pageDims(format: PageFormat): { width: number; height: number } {
  switch (format) {
    case 'Letter':
      return { width: 612, height: 792 };
    case 'Legal':
      return { width: 612, height: 1008 };
    case 'A4':
    default:
      return { width: 595, height: 842 };
  }
}
// ─── Symbol helper ──────────────────────────────────────────────────────────
function currencySymbol(c: string): string {
  switch (c.toUpperCase()) {
    case 'USD':
      return '$';
    case 'EUR':
      return '€';
    case 'GBP':
      return '£';
    default:
      return c.toUpperCase() + ' ';
  }
}
// ─── Grouping ───────────────────────────────────────────────────────────────
function groupKey(row: ProcessedExpense, by: GroupBy): string {
  switch (by) {
    case 'payer':
      return row.payer ?? 'Unknown payer';
    case 'category':
      return row.category ?? 'Uncategorized';
    case 'date':
      return row.expenseDate.toISOString().slice(0, 10);
    default:
      return 'all';
  }
}
function groupRows(
  rows: ProcessedExpense[],
  by: GroupBy,
): Array<{ key: string; rows: ProcessedExpense[] }> {
  if (by === 'none') return [{ key: 'all', rows }];
  const map = new Map<string, ProcessedExpense[]>();
  for (const r of rows) {
    const k = groupKey(r, by);
    if (!map.has(k)) map.set(k, []);
    map.get(k)!.push(r);
  }
  return [...map.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([key, rs]) => ({ key, rows: rs }));
}
// ─── Fetching ───────────────────────────────────────────────────────────────
async function fetchExpenseRows(args: ExpensePdfArgs): Promise<ExpenseRow[]> {
  const conditions = [eq(expenses.portId, args.portId)];
  // Soft-delete filter applies regardless of which path produced the
  // expense list. The audit caught a regression where an `expenseIds`
  // selection would happily export archived rows because the
  // `isNull(archivedAt)` predicate sat inside the `else` branch — that
  // violates the soft-delete contract used everywhere else.
  if (!args.filter?.includeArchived) {
    conditions.push(isNull(expenses.archivedAt));
  }
  if (args.expenseIds?.length) {
    conditions.push(inArray(expenses.id, args.expenseIds));
  } else {
    if (args.filter?.dateFrom) {
      conditions.push(
        gte(
          expenses.expenseDate,
          args.filter.dateFrom instanceof Date
            ? args.filter.dateFrom
            : new Date(args.filter.dateFrom),
        ),
      );
    }
    if (args.filter?.dateTo) {
      conditions.push(
        lte(
          expenses.expenseDate,
          args.filter.dateTo instanceof Date ? args.filter.dateTo : new Date(args.filter.dateTo),
        ),
      );
    }
    if (args.filter?.category) conditions.push(eq(expenses.category, args.filter.category));
    if (args.filter?.payer) conditions.push(eq(expenses.payer, args.filter.payer));
    if (args.filter?.paymentStatus)
      conditions.push(eq(expenses.paymentStatus, args.filter.paymentStatus));
  }
  const rows = await db
    .select({
      id: expenses.id,
      establishmentName: expenses.establishmentName,
      amount: expenses.amount,
      currency: expenses.currency,
      amountUsd: expenses.amountUsd,
      paymentMethod: expenses.paymentMethod,
      category: expenses.category,
      payer: expenses.payer,
      expenseDate: expenses.expenseDate,
      description: expenses.description,
      receiptFileIds: expenses.receiptFileIds,
      noReceiptAcknowledged: expenses.noReceiptAcknowledged,
      paymentStatus: expenses.paymentStatus,
    })
    .from(expenses)
    .where(and(...conditions))
    .orderBy(desc(expenses.expenseDate));
  return rows as ExpenseRow[];
}
interface ResolvedFile {
  fileId: string;
  storagePath: string;
  storageBucket: string;
  mimeType: string | null;
  filename: string;
}
/** Bulk-resolve file metadata so the receipt loop can do a single round-trip. */
async function resolveReceiptFiles(fileIds: string[]): Promise<Map<string, ResolvedFile>> {
  if (fileIds.length === 0) return new Map();
  const rows = await db
    .select({
      id: files.id,
      storagePath: files.storagePath,
      storageBucket: files.storageBucket,
      mimeType: files.mimeType,
      filename: files.filename,
    })
    .from(files)
    .where(inArray(files.id, fileIds));
  const map = new Map<string, ResolvedFile>();
  for (const r of rows) {
    map.set(r.id, {
      fileId: r.id,
      storagePath: r.storagePath,
      storageBucket: r.storageBucket,
      mimeType: r.mimeType,
      filename: r.filename,
    });
  }
  return map;
}
// ─── Streaming buffer helper ────────────────────────────────────────────────
/** Drain a Node ReadableStream into a Buffer. Caller is responsible for
 * not holding multiple in memory simultaneously. */
async function streamToBuffer(stream: NodeJS.ReadableStream): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream as AsyncIterable<Buffer | string>) {
    chunks.push(typeof chunk === 'string' ? Buffer.from(chunk) : chunk);
  }
  return Buffer.concat(chunks);
}
// ─── PDF builder ────────────────────────────────────────────────────────────
/**
 * Build the expense PDF and return a Web ReadableStream of bytes. The
 * caller (route handler) streams this directly to the client; we never
 * materialize the whole PDF in memory.
 */
export async function streamExpensePdf(
  args: ExpensePdfArgs,
): Promise<{ stream: ReadableStream<Uint8Array>; suggestedFilename: string }> {
  const opts: Required<
    Omit<ExpensePdfOptions, 'subheader' | 'documentName' | 'pageFormat' | 'targetCurrency'>
  > & {
    subheader?: string;
    documentName: string;
    pageFormat: PageFormat;
    targetCurrency: TargetCurrency;
  } = {
    documentName: args.options.documentName,
    subheader: args.options.subheader,
    groupBy: args.options.groupBy ?? 'none',
    includeReceipts: args.options.includeReceipts ?? false,
    includeReceiptContents: args.options.includeReceiptContents ?? false,
    includeSummary: args.options.includeSummary ?? true,
    includeDetails: args.options.includeDetails ?? true,
    includeProcessingFee: args.options.includeProcessingFee ?? false,
    targetCurrency: args.options.targetCurrency ?? 'EUR',
    pageFormat: args.options.pageFormat ?? 'A4',
  };
  const rawRows = await fetchExpenseRows(args);
  const processed = await processExpenses(rawRows, opts.targetCurrency);
  const totals = computeTotals(processed, opts.targetCurrency, opts.includeProcessingFee);
  // Bulk-resolve receipt file metadata (one DB round-trip vs N).
  const allFileIds = processed
    .flatMap((r) => r.receiptFileIds ?? [])
    .filter((s): s is string => typeof s === 'string' && s.length > 0);
  const filesById = opts.includeReceipts
    ? await resolveReceiptFiles(allFileIds)
    : new Map<string, ResolvedFile>();
  const dims = pageDims(opts.pageFormat);
  const doc = new PDFDocument({
    size: [dims.width, dims.height],
    margins: { top: 60, bottom: 60, left: 60, right: 60 },
  });
  // Pull bytes off pdfkit's Node Readable as soon as they're available so
  // the client sees the response start streaming before we even begin
  // fetching receipts. Node Readable → Web ReadableStream conversion.
  const nodeStream = doc as unknown as NodeJS.ReadableStream;
  const webStream = Readable.toWeb(
    nodeStream as unknown as Readable,
  ) as unknown as ReadableStream<Uint8Array>;
  // Kick off the page-builder asynchronously. On success doc.end() finishes
  // the stream; on failure the catch block emits 'error' on doc, which
  // surfaces to the consumer of the stream.
  void (async () => {
    try {
      addHeader(doc, opts);
      if (opts.includeSummary) addSummaryBox(doc, totals, opts);
      if (opts.includeDetails) addExpenseTable(doc, processed, opts);
      if (opts.includeReceipts) {
        await addReceiptPages(doc, processed, filesById, {
          targetCurrency: opts.targetCurrency,
          signal: args.signal,
        });
      }
      addFooter(doc);
      doc.end();
    } catch (err) {
      logger.error({ err }, 'Expense PDF stream failed mid-build');
      doc.emit('error', err);
    }
  })();
  // `\s` includes CR/LF; using it lets a malicious documentName forge
  // additional response headers via Content-Disposition. Restrict to
  // word/dot/dash/space (single-line space only — \s would let \n through).
  const safeName = opts.documentName.replace(/[^\w. \-]+/g, '_').trim() || 'expenses';
  return {
    stream: webStream,
    suggestedFilename: `${safeName}.pdf`,
  };
}
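
// Standalone sketch of the documentName sanitization in streamExpensePdf()
// above, with a few probes. `sanitizeFilenameSketch` is a hypothetical
// extraction of that one-liner. Because the character class lists a literal
// space rather than \s, a "\r\n" run is collapsed to "_" like any other
// disallowed run. Without the `u` flag, \w is ASCII-only, so accented
// letters are replaced too.
function sanitizeFilenameSketch(documentName: string): string {
  return documentName.replace(/[^\w. \-]+/g, '_').trim() || 'expenses';
}

// sanitizeFilenameSketch('Q1 report.pdf')          → 'Q1 report.pdf'
// sanitizeFilenameSketch('x\r\nSet-Cookie: a=b')   → 'x_Set-Cookie_ a_b'
// sanitizeFilenameSketch('   ')                    → 'expenses'
// sanitizeFilenameSketch('Résumé')                 → 'R_sum_'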
// ─── Page sections ──────────────────────────────────────────────────────────

function addHeader(doc: PDFKit.PDFDocument, opts: { documentName: string; subheader?: string }) {
  doc
    .fontSize(24)
    .font('Helvetica-Bold')
    .fillColor('#000000')
    .text(opts.documentName, { align: 'center' });
  const subheader = opts.subheader ?? `Generated on ${new Date().toLocaleDateString()}`;
  doc.fontSize(12).font('Helvetica').fillColor('#666666').text(subheader, { align: 'center' });
  doc.fillColor('#000000').moveDown(1);
}

function addSummaryBox(
  doc: PDFKit.PDFDocument,
  totals: Totals,
  opts: { includeProcessingFee: boolean; groupBy: GroupBy },
) {
  const sym = currencySymbol(totals.targetCurrency);
  const otherSym = totals.targetCurrency === 'USD' ? '€' : '$';
  const otherTotal = totals.targetCurrency === 'USD' ? totals.eurTotal : totals.usdTotal;

  doc.fontSize(14).font('Helvetica-Bold').text('Summary');
  doc.moveDown(0.4);

  const lineY = doc.y;
  const lines = [
    `Total expenses: ${totals.count}`,
    `Subtotal (${totals.targetCurrency}): ${sym}${totals.targetTotal.toFixed(2)}`,
    `${totals.targetCurrency === 'USD' ? 'EUR' : 'USD'} equivalent: ${otherSym}${otherTotal.toFixed(2)}`,
  ];
  if (opts.includeProcessingFee) {
    lines.push(`Processing fee (5%): ${sym}${totals.processingFee.toFixed(2)}`);
    lines.push(`Final total: ${sym}${totals.finalTotal.toFixed(2)}`);
  }
  if (opts.groupBy !== 'none') lines.push(`Grouping: by ${opts.groupBy}`);

  // Warning footer when the export contains acknowledged-no-receipt rows.
  // Reps need to see the at-risk count + amount BEFORE they forward the
  // PDF to a reimbursement queue.
  const showNoReceiptWarning = totals.noReceiptCount > 0;
  const warningLines = showNoReceiptWarning
    ? [
        `WARNING: ${totals.noReceiptCount} expense${totals.noReceiptCount === 1 ? '' : 's'} on this report ${totals.noReceiptCount === 1 ? 'has' : 'have'} no receipt attached`,
        `(${sym}${totals.noReceiptAmount.toFixed(2)} at risk of being denied reimbursement).`,
      ]
    : [];

  // Second warning band: rows whose currency conversion fell back to 1:1
  // because the upstream rate was unavailable. Without this, a row in
  // another currency would be summed at face value as if it were already
  // in the target currency, and reps would never know the totals are wrong.
  const showRateWarning = totals.rateUnavailableCount > 0;
  const rateWarningLines = showRateWarning
    ? [
        `Note: ${totals.rateUnavailableCount} expense${totals.rateUnavailableCount === 1 ? '' : 's'} could not be converted (rate unavailable);`,
        `the displayed amount${totals.rateUnavailableCount === 1 ? ' is' : 's are'} approximate (1:1 fallback).`,
      ]
    : [];

  const boxHeight = (lines.length + warningLines.length + rateWarningLines.length) * 16 + 20;
  // fill() ends the current path, so a chained .stroke() afterwards has
  // nothing to draw; fillAndStroke() paints the background and the border.
  doc.rect(60, lineY, doc.page.width - 120, boxHeight).fillAndStroke('#f5f5f5', '#dddddd');

  doc.fillColor('#000000').fontSize(11).font('Helvetica');
  let y = lineY + 12;
  for (const line of lines) {
    doc.text(line, 75, y);
    y += 16;
  }
  if (showNoReceiptWarning) {
    doc.fillColor('#dc3545').font('Helvetica-Bold');
    for (const line of warningLines) {
      doc.text(line, 75, y);
      y += 16;
    }
    doc.fillColor('#000000').font('Helvetica');
  }
  if (showRateWarning) {
    doc.fillColor('#92400e').font('Helvetica-Oblique');
    for (const line of rateWarningLines) {
      doc.text(line, 75, y);
      y += 16;
    }
    doc.fillColor('#000000').font('Helvetica');
  }

  doc.y = lineY + boxHeight + 12;
}
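
// Sketch of how a subset of the counters read by addSummaryBox() could be
// accumulated. Everything here is hypothetical: the real Totals builder and
// rate lookup are not shown in this file. Assumes `rates` maps a source
// currency to a multiplier into the target currency; it mirrors the 1:1
// fallback and the 5% processing fee described above.
interface RowSketch {
  amount: number;
  currency: string;
  noReceiptAcknowledged: boolean;
}

interface TotalsSketch {
  count: number;
  targetTotal: number;
  noReceiptCount: number;
  noReceiptAmount: number;
  rateUnavailableCount: number;
  processingFee: number;
  finalTotal: number;
}

function accumulateTotalsSketch(
  rows: RowSketch[],
  target: string,
  rates: Record<string, number>,
  feeRate = 0.05,
): TotalsSketch {
  const t: TotalsSketch = {
    count: 0,
    targetTotal: 0,
    noReceiptCount: 0,
    noReceiptAmount: 0,
    rateUnavailableCount: 0,
    processingFee: 0,
    finalTotal: 0,
  };
  for (const row of rows) {
    let rate: number | undefined = row.currency === target ? 1 : rates[row.currency];
    if (rate === undefined) {
      // Rate missing: fall back to 1:1 and count it so the summary can warn.
      rate = 1;
      t.rateUnavailableCount += 1;
    }
    const converted = row.amount * rate;
    t.count += 1;
    t.targetTotal += converted;
    if (row.noReceiptAcknowledged) {
      t.noReceiptCount += 1;
      t.noReceiptAmount += converted;
    }
  }
  t.processingFee = t.targetTotal * feeRate;
  t.finalTotal = t.targetTotal + t.processingFee;
  return t;
}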

interface Column {
  header: string;
  width: number;
  x: number;
  align?: 'left' | 'right';
}
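
// The hard-coded x offsets in addExpenseTable() are each the previous
// column's x + width, starting at the 60 pt left margin. This hypothetical
// helper derives them from widths alone. With the optional Description
// column the table reaches x = 575, past the right edge of the 60 pt-inset
// background band on both A4 (595.28 - 60) and Letter (612 - 60), which a
// right-edge check makes visible.
interface ColumnSpecSketch {
  header: string;
  width: number;
  align?: 'left' | 'right';
}

function layoutColumnsSketch(specs: ColumnSpecSketch[], left = 60): (ColumnSpecSketch & { x: number })[] {
  let x = left;
  return specs.map((spec) => {
    const col = { ...spec, x };
    x += spec.width;
    return col;
  });
}

function rightEdgeSketch(cols: { x: number; width: number }[]): number {
  const last = cols[cols.length - 1];
  return last ? last.x + last.width : 0;
}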

function addExpenseTable(
  doc: PDFKit.PDFDocument,
  rows: ProcessedExpense[],
  opts: { groupBy: GroupBy; includeReceiptContents: boolean; targetCurrency: TargetCurrency },
) {
  doc.fontSize(14).font('Helvetica-Bold').text('Expense details');
  doc.moveDown(0.4);

  const sym = currencySymbol(opts.targetCurrency);
  const baseColumns: Column[] = [
    { header: 'Date', width: 60, x: 60 },
    { header: 'Establishment', width: 110, x: 120 },
    { header: 'Category', width: 65, x: 230 },
    { header: 'Payer', width: 55, x: 295 },
    { header: 'Amount', width: 75, x: 350, align: 'right' },
    { header: 'Status', width: 50, x: 425 },
  ];
  if (opts.includeReceiptContents) {
    baseColumns.push({ header: 'Description', width: 100, x: 475 });
  }

  const drawHeader = () => {
    const headerTop = doc.y;
    doc.rect(60, headerTop, doc.page.width - 120, 22).fillAndStroke('#f2f2f2', '#dddddd');
    doc.fillColor('#000000').fontSize(9).font('Helvetica-Bold');
    for (const col of baseColumns) {
      doc.text(col.header, col.x, headerTop + 6, { width: col.width, align: col.align ?? 'left' });
    }
    // Each text() call moves doc.y, so reset from the captured top rather
    // than incrementing the post-text cursor.
    doc.y = headerTop + 22;
  };

  const drawRow = (row: ProcessedExpense, alt: boolean) => {
    if (doc.y > doc.page.height - 80) {
      doc.addPage();
      drawHeader();
    }
    const rowTop = doc.y;
    if (alt) {
      doc.rect(60, rowTop, doc.page.width - 120, 20).fillColor('#fafafa').fill();
    }
    doc.fillColor('#000000').fontSize(8).font('Helvetica');

    const date = row.expenseDate.toISOString().slice(0, 10);
    const amount = `${sym}${row.amountTarget.toFixed(2)}`;
    // Annotate the establishment cell with a red "(no receipt)" marker
    // when the rep created the expense without proof. This keeps the
    // warning glanceable per row without adding a new column.
    const establishment =
      (row.establishmentName ?? '-') + (row.noReceiptAcknowledged ? ' (no receipt)' : '');
    const data: string[] = [
      date,
      establishment,
      row.category ?? '-',
      row.payer ?? '-',
      amount,
      row.paymentStatus ?? '-',
    ];
    if (opts.includeReceiptContents) {
      data.push((row.description || '-').slice(0, 80));
    }
    data.forEach((value, i) => {
      const col = baseColumns[i]!;
      // Draw the establishment cell in red when no-receipt; reset to
      // black for everything else so warning visibility doesn't bleed.
      const isWarningCell = i === 1 && row.noReceiptAcknowledged;
      if (isWarningCell) doc.fillColor('#dc3545');
      // pdfkit only applies `ellipsis` when a `height` is set; without it a
      // long value wraps onto a second line and overlaps the next row. The
      // whole cell stays red even if the "(no receipt)" tag is truncated.
      doc.text(value, col.x, rowTop + 6, {
        width: col.width - 4,
        height: 10,
        align: col.align ?? 'left',
        ellipsis: true,
      });
      if (isWarningCell) doc.fillColor('#000000');
    });
    doc.y = rowTop + 20;
  };

  drawHeader();
  let altIndex = 0;
  for (const group of groupRows(rows, opts.groupBy)) {
    if (opts.groupBy !== 'none') {
      if (doc.y > doc.page.height - 80) {
        doc.addPage();
        drawHeader();
      }
      const groupTop = doc.y;
      const groupTotal = group.rows.reduce((s, r) => s + r.amountTarget, 0);
      doc.rect(60, groupTop, doc.page.width - 120, 20).fillAndStroke('#e7f3ff', '#dddddd');
      doc
        .fillColor('#000000')
        .fontSize(9)
        .font('Helvetica-Bold')
        .text(
          `${group.key} (${group.rows.length} expense${group.rows.length === 1 ? '' : 's'} — ${sym}${groupTotal.toFixed(2)})`,
          65,
          groupTop + 5,
          { width: doc.page.width - 130 },
        );
      // text() has already moved doc.y; position from the band's top.
      doc.y = groupTop + 20;
    }
    for (const row of group.rows) {
      drawRow(row, altIndex % 2 === 1);
      altIndex += 1;
    }
  }
  doc.moveDown(0.5);
}
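
// Hypothetical sketch of a groupRows() compatible with how addExpenseTable()
// consumes it: an ordered list of { key, rows } buckets, with groupBy='none'
// yielding a single bucket. The real helper is defined elsewhere; the
// 'Unspecified' fallback label, the day-level date key and the sorted key
// order are assumptions made for this sketch.
type GroupBySketch = 'none' | 'payer' | 'category' | 'date';

interface GroupableRowSketch {
  expenseDate: Date;
  payer: string | null;
  category: string | null;
}

function groupRowsSketch<R extends GroupableRowSketch>(
  rows: R[],
  groupBy: GroupBySketch,
): { key: string; rows: R[] }[] {
  if (groupBy === 'none') return [{ key: 'All expenses', rows }];
  const buckets = new Map<string, R[]>();
  for (const row of rows) {
    const key =
      groupBy === 'date'
        ? row.expenseDate.toISOString().slice(0, 10)
        : (groupBy === 'payer' ? row.payer : row.category) ?? 'Unspecified';
    const bucket = buckets.get(key);
    if (bucket) bucket.push(row);
    else buckets.set(key, [row]);
  }
  // Sort keys with plain string comparison so output is deterministic.
  return [...buckets.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, bucketRows]) => ({ key, rows: bucketRows }));
}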

async function addReceiptPages(
  doc: PDFKit.PDFDocument,
  rows: ProcessedExpense[],
  filesById: Map<string, ResolvedFile>,
  opts: { targetCurrency: TargetCurrency; signal?: AbortSignal },
) {
  const expensesWithReceipts = rows.filter(
    (r) => Array.isArray(r.receiptFileIds) && r.receiptFileIds.length > 0,
  );
  if (expensesWithReceipts.length === 0) return;

  const totalReceipts = expensesWithReceipts.reduce(
    (s, r) => s + (r.receiptFileIds?.length ?? 0),
    0,
  );
  const backend = await getStorageBackend();
  const sym = currencySymbol(opts.targetCurrency);
  let receiptCounter = 0;
  let resizedCount = 0;
  const startedAt = Date.now();

  for (const expense of expensesWithReceipts) {
    for (const fileId of expense.receiptFileIds ?? []) {
      // Bail out the moment the client disconnects. Without this, an
      // export aborted on the wire would keep grinding through the
      // remaining receipts and only stop when the doc.end() write
      // failed (minutes later for a 1000-row export).
      if (opts.signal?.aborted) {
        logger.info(
          { receiptCounter, totalReceipts },
          'Expense PDF stream aborted by client; halting receipt loop',
        );
        return;
      }

      receiptCounter += 1;
      const file = filesById.get(fileId);
      if (!file) {
        addReceiptErrorPage(
          doc,
          expense,
          receiptCounter,
          totalReceipts,
          sym,
          'Receipt file metadata missing',
        );
        continue;
      }

      let imageBuffer: Buffer | null = null;
      try {
        // Storage stream → Buffer. pdfkit's image() takes a Buffer (or a
        // path/data URI), not a stream, and the size check needs the full
        // byte count, so we hold one receipt's bytes per iteration; they
        // become unreachable after the embed.
        const stream = await backend.get(file.storagePath);
        const raw = await streamToBuffer(stream);
        const resized = await maybeResizeImage(raw, file.mimeType);
        if (resized.resized) resizedCount += 1;
        imageBuffer = resized.buffer;

        // Page header
        doc.addPage();
        renderReceiptHeader(doc, expense, file, receiptCounter, totalReceipts, sym);

        // Fit the image into the space left below the header, inside the margins.
        const margin = 60;
        const headerBlockHeight = 110;
        const imgX = margin;
        const imgY = doc.y;
        const imgW = doc.page.width - margin * 2;
        const imgH = doc.page.height - imgY - margin;
        try {
          doc.image(imageBuffer, imgX, imgY, {
            fit: [imgW, imgH],
            align: 'center',
            valign: 'center',
          });
        } catch (err) {
          logger.warn(
            { err, fileId, mimeType: file.mimeType },
            'pdfkit refused to embed receipt; falling back to error page',
          );
          // pdfkit can't remove elements already drawn on the page, so keep
          // the receipt header and append the error message in red below it.
          doc
            .fontSize(11)
            .fillColor('#dc3545')
            .text(
              `Receipt could not be embedded: ${(err as Error).message}`,
              imgX,
              imgY + headerBlockHeight,
              { width: imgW, align: 'center' },
            );
          doc.fillColor('#000000');
        }
      } catch (err) {
        logger.warn(
          { err, fileId, expenseId: expense.id, storagePath: file.storagePath },
          'Receipt fetch failed; rendering placeholder page',
        );
        addReceiptErrorPage(
          doc,
          expense,
          receiptCounter,
          totalReceipts,
          sym,
          // `||` rather than `??`: an empty message string should fall back too.
          (err as Error).message || 'Receipt could not be loaded from storage',
        );
      } finally {
        // Drop the reference explicitly so at most one receipt's bytes stay
        // reachable per iteration; the block-scoped `raw`/`resized` locals
        // also go out of scope here.
        imageBuffer = null;
      }
    }
  }

  logger.info(
    {
      totalReceipts,
      resized: resizedCount,
      elapsedMs: Date.now() - startedAt,
    },
    'Expense PDF receipt pages built',
  );
}
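
// streamToBuffer() is defined elsewhere; this is a hedged, dependency-free
// sketch of the same idea. It assumes the storage backend returns something
// async-iterable (Node Readable streams are) and joins the chunks into one
// contiguous Uint8Array. Peak memory is briefly about twice one file (the
// chunk list plus the combined copy). In Node, Buffer.concat(chunks) does
// the joining step in a single call.
async function collectChunksSketch(source: AsyncIterable<Uint8Array | string>): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of source) {
    // Streams with an encoding set yield strings; normalize to bytes.
    const bytes = typeof chunk === 'string' ? new TextEncoder().encode(chunk) : chunk;
    chunks.push(bytes);
    total += bytes.byteLength;
  }
  const out = new Uint8Array(total);
  let offset = 0;
  for (const bytes of chunks) {
    out.set(bytes, offset);
    offset += bytes.byteLength;
  }
  return out;
}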

function renderReceiptHeader(
  doc: PDFKit.PDFDocument,
  expense: ProcessedExpense,
  file: ResolvedFile,
  index: number,
  total: number,
  sym: string,
) {
  const margin = 60;
  const headerH = 90;
  // Capture the box top before drawing: rect() does not move doc.y, but
  // every text() call below does. Positioning from `doc.y - headerH` would
  // put the first line above the box (off-page right after addPage()).
  const headerTop = doc.y;
  doc
    .rect(margin, headerTop, doc.page.width - margin * 2, headerH)
    .fillAndStroke('#f8f9fa', '#dee2e6');
  doc.fillColor('#000000');
  doc
    .fontSize(14)
    .font('Helvetica-Bold')
    .text(`Receipt ${index} of ${total}`, margin + 10, headerTop + 10);
  doc
    .fontSize(11)
    .font('Helvetica-Bold')
    .text(
      `${expense.establishmentName ?? '—'} ${sym}${expense.amountTarget.toFixed(2)}`,
      margin + 10,
      doc.y + 4,
    );
  doc
    .fontSize(9)
    .font('Helvetica')
    .fillColor('#666666')
    .text(
      `Date: ${expense.expenseDate.toISOString().slice(0, 10)} · Payer: ${expense.payer ?? '—'} · Category: ${expense.category ?? '—'} · File: ${file.filename}`,
      margin + 10,
      doc.y + 4,
      { width: doc.page.width - margin * 2 - 20 },
    );
  doc.fillColor('#000000');
  // Reset the cursor to just below the header block.
  doc.y = headerTop + headerH + 10;
}

function addReceiptErrorPage(
  doc: PDFKit.PDFDocument,
  expense: ProcessedExpense,
  index: number,
  total: number,
  sym: string,
  message: string,
) {
  doc.addPage();
  doc.fontSize(14).font('Helvetica-Bold').text(`Receipt ${index} of ${total}`, { align: 'center' });
  doc
    .fontSize(11)
    .font('Helvetica')
    .text(`${expense.establishmentName ?? '—'} ${sym}${expense.amountTarget.toFixed(2)}`, {
      align: 'center',
    });
  doc.moveDown(2);
  doc.fontSize(11).fillColor('#dc3545').text(message, { align: 'center' });
  doc.fillColor('#000000');
}
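
// Assumes the document was created with `bufferPages: true`:
// switchToPage() can only revisit pages that are still buffered.
function addFooter(doc: PDFKit.PDFDocument) {
  const range = doc.bufferedPageRange();
  const generated = `Generated ${new Date().toISOString().slice(0, 19).replace('T', ' ')} UTC`;
  doc.fontSize(9).font('Helvetica').fillColor('#666666');
  for (let i = range.start; i < range.start + range.count; i += 1) {
    doc.switchToPage(i);
    // Writing text below the bottom margin makes pdfkit insert a new page,
    // so drop the margin while drawing the footer and restore it after.
    const bottomMargin = doc.page.margins.bottom;
    doc.page.margins.bottom = 0;
    doc.text(`Page ${i + 1} of ${range.count}`, 60, doc.page.height - 30, {
      align: 'right',
      width: doc.page.width - 120,
    });
    doc.text(generated, 60, doc.page.height - 30, {
      align: 'left',
      width: doc.page.width - 120,
    });
    doc.page.margins.bottom = bottomMargin;
  }
  doc.fillColor('#000000');
}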