diff --git a/docs/superpowers/specs/2026-04-29-gws-inbox-triage-design.md b/docs/superpowers/specs/2026-04-29-gws-inbox-triage-design.md new file mode 100644 index 0000000..f14450b --- /dev/null +++ b/docs/superpowers/specs/2026-04-29-gws-inbox-triage-design.md @@ -0,0 +1,376 @@ +# Google Workspace inbox-triage integration (exploratory) + +**Status:** Exploratory — not approved for build +**Date:** 2026-04-29 +**Tracks:** AI inbox-triage, Google Workspace email connection + +## What this spec is for + +The user has flagged inbox-triage as the most valuable AI surface left to +build, but conditioned email integration on it being via Google Workspace +specifically (not generic IMAP), with a per-port toggle so clients who +don't use GWS aren't billed for capability they can't reach. + +This document captures what that build actually costs — especially on +the Google side, which is where most teams underestimate the work — so +we can decide whether to commit before writing any code. **Nothing in +this spec is approved for implementation.** The deliverable is a go / +no-go decision and, if go, a scope choice between three deployment +models that cost wildly different amounts of calendar time. + +## What inbox-triage actually does for the user + +Concretely, on the staff member's desktop: + +1. **Linked-inbox panel on the client detail page.** When you open + `/[port]/clients/` you see the last N email threads with that + client, pulled from the staff member's own Gmail. Each thread has + the latest message preview, an "open in Gmail" deep-link, and a + "draft reply" button (Phase 2+). +2. **Inbox triage queue.** A new top-level page `/[port]/inbox` that + lists unread/unanswered threads ranked by AI-assessed importance + (high-value client, contractual urgency, chase-overdue). Each row + has one-click actions: "log this as a note on the client", + "create a follow-up reminder", "draft reply". +3. **Email-driven alerts.** When a high-value client emails and no one + responds within X hours, the existing alerts engine fires a + `inbox.unanswered_high_value` rule (slots into the alert framework + from Phase B without schema change). +4. **Reply drafts (Phase 3).** AI generates a reply draft grounded in + the client's CRM record (open interests, pending reservations, + recent invoices). Staff edit and send through Gmail. + +The value is selective: a port with three staff members fielding 50 +client emails a day saves maybe an hour a day collectively if the +ranking is right. Below that volume the build doesn't pay back. + +## What already exists in the codebase + +The CRM is roughly halfway scaffolded for this: + +| Surface | Status | Notes | +| ----------------------------------------------- | ----------------------- | ------------------------------------------------------------------------------------------------------------------------ | +| `email_accounts` table | ✅ Exists | Has `provider: 'google' \| 'outlook' \| 'custom'` discriminator and `imap_*` / `smtp_*` cols. Built for IMAP, not OAuth. | +| `email_threads` / `email_messages` tables | ✅ Exists | Already linked to `clientId`. Schema is good as-is for Gmail. | +| `email-threads.service.ts` `syncInbox()` | ⚠ Stub-ish | IMAP-flow only. Won't reach Gmail without OAuth + Gmail API rewrite. | +| `email` BullMQ queue + `inbox-sync` job name | ✅ Exists | Worker dispatches on the job name; new sync impl drops in. | +| `google_calendar_tokens` table | ✅ Exists | OAuth token storage shape we can mirror for Gmail. | +| Per-port email override (port `email_settings`) | ✅ Exists | Used for outbound only today; Gmail integration is per-staff-user, not per-port. | +| `ai_usage_ledger` + per-port `aiEnabled` flag | ✅ Exists (Phase 3a/3b) | Triage AI calls book against the same ledger. | +| `withRateLimit('ai', ...)` wrapper | ✅ Exists (Phase 3c) | Caps triage AI traffic at 60/min/user out of the box. | + +Net: schemas are mostly right. The OAuth flow, Gmail API client, push +notification receiver, and triage classifier are the new builds. + +## Why Google Workspace specifically + +The user's stated constraint: "I don't think we need email integration +unless we connect it to Google Workspace." Reasons that hold up: + +- **No password storage.** OAuth tokens are revocable, scoped, and + rotate. IMAP requires app passwords, which Google has been actively + deprecating since 2024 — they'll be gone for the workspace plans + this product targets. +- **Push notifications, not polling.** Gmail's `users.watch` API plus + Google Pub/Sub means we get an HTTP callback within seconds of a new + message landing. IMAP requires polling on a 30-60 second cadence, + which costs more and lags worse. +- **Search and labels.** The Gmail API exposes label management and + full-text search natively; IMAP search is much weaker. +- **Threading.** Gmail's `threadId` is canonical. Reconstructing + threads over IMAP from `In-Reply-To` / `References` headers is + reliable in theory, painful in practice. + +Microsoft 365 is the obvious peer integration but is out of scope here. +The Graph API model is similar enough that a future M365 path can reuse +most of the storage shape. + +## Three deployment models — pick one before building + +This is the most important decision in the spec. Each model has +different OAuth-verification consequences, which dominate everything +else. + +### Model A — Marketplace-published OAuth app + +A single OAuth client owned by Port Nimara, listed in the Google +Workspace Marketplace, that any GWS customer can install. Each staff +member clicks "Connect Gmail," consents to the scopes, and the CRM +stores their refresh token. + +**Google-side work:** + +1. Build the OAuth flow in CRM (~1 week). +2. Submit for OAuth verification. Gmail's `gmail.readonly` / + `gmail.modify` scopes are **restricted scopes** — they require: + - Domain-verified production URLs + - A homepage with a privacy policy that explicitly enumerates which + scopes are used and why + - A demo video (literally a screen recording) showing the consent + screen and what happens next + - **A third-party security assessment from a Google-approved + vendor** ($15k–$75k, 6–12 weeks) + - A Cloud Application Security Assessment (CASA) report +3. Marketplace listing review (~2 weeks after CASA passes). + +**Calendar time:** 4–6 months. +**Money:** $15k–$75k for the security assessment alone. +**Recurring:** Re-verification every 12 months. + +Right answer if Port Nimara wants to be the marina-CRM that ships GWS +out of the box for _any_ customer. Wrong answer if there are <5 +customers who'd use it. + +### Model B — Per-customer "Internal" OAuth app + +Each customer's GWS admin creates an OAuth client _inside their own +workspace_ and gives Port Nimara the client ID + secret. Because the +app is "Internal," Google skips verification entirely — the consent +screen is unverified-but-permitted. Tokens never cross workspace +boundaries. + +**Google-side work per customer:** + +1. Customer's GWS admin enables the Gmail API in their Cloud project. +2. Creates an OAuth 2.0 client ID with type "Internal" + your CRM's + redirect URI. +3. Hands the client ID + secret to Port Nimara out-of-band. +4. Staff connect their Gmail through that client. + +**Calendar time per customer:** ~1 hour of admin work. +**Money:** $0. +**Limit:** Doesn't span across GWS workspaces. A user with two GWS +accounts (e.g. the marina + a personal workspace) can only connect the +one matching the OAuth client. + +This is the **clear winner for the current customer base**: small +number of customers, each with their own GWS workspace, and each +buying the integration as part of an onboarding conversation. + +### Model C — Forward-to-CRM mailbox + +The CRM exposes a per-port email alias (e.g. +`port-nimara-NN@inbox.portnimara.com`). Customers configure a Gmail +filter or mailing rule that BCCs that alias on relevant threads. The +CRM ingests via SMTP and runs the same triage pipeline. + +**Google-side work:** None. Customer does it as a Gmail filter. +**Calendar time:** ~1 week of CRM-side build. +**Limit:** Receive-only — no reply drafts, no thread state changes, +no labels. The "draft reply" feature in Phase 3 above is impossible +under this model. + +Model C is the right answer if the user wants to ship inbox-triage +_now_ and decide on bidirectional Gmail integration later. The schema +is designed so the model can be upgraded to A or B without data +migration. + +### Recommendation + +**Build Model B first.** It costs nothing on the Google side, takes +~3 weeks of CRM work, and matches the actual customer profile. +**Promote to Model A only after 3+ paying customers ask for it +unprompted.** Until then, the security-assessment cost can't justify +itself. + +Model C as a fallback for customers who refuse to set up an Internal +OAuth app. Build it last, lazily — the schema accommodates it. + +## End-to-end flow (Model B) + +### 1. Per-port OAuth-app config + +New admin page `/[port]/admin/google-workspace`: + +- Field: "OAuth client ID" (their internal client ID) +- Field: "OAuth client secret" (encrypted at rest using `ENCRYPTION_KEY`) +- Field: "Authorized redirect URI" (read-only; we display the value + they need to paste into their Google Cloud Console) +- Toggle: "Enable Gmail integration for this port" + +Stored in `system_settings` under key `gws.config`, port-scoped. +Resolution mirrors the existing OCR config service. + +### 2. Per-staff connect flow + +Staff member visits `/[port]/me/integrations`, clicks "Connect Gmail." + +``` +GET /api/v1/auth/gws/start + → looks up port's gws.config + → builds Google authorize URL with port's client_id + state token + → 302 to Google +[ user consents ] + → 302 back to /api/v1/auth/gws/callback?code=…&state=… + → exchanges code for tokens via port's client_secret + → stores in new `gws_user_tokens` table (encrypted) + → schedules an `inbox-watch` job +``` + +### 3. Push notification subscription + +After tokens are stored, the worker calls +`gmail.users.watch({ topicName: , labelIds: ['INBOX'] })`. +Gmail then posts to a Pub/Sub topic on every inbox change. The CRM +exposes a Pub/Sub push subscription endpoint at +`/api/webhooks/gmail-push` which fetches the changed messages via the +delta `historyId` and writes them into `email_messages`. + +Watch subscriptions expire every 7 days. A maintenance job +re-establishes them daily. + +### 4. Triage pipeline + +For each new inbound message: + +1. Match against `clients` and `companies` by `from_address` against + `client_contacts` (email channel). Persist a thread→client link if + found. +2. If port has `aiEnabled` AND `gws.triageEnabled`, queue an `ai` + job that classifies the thread: + - `urgency`: low / medium / high + - `category`: invoice-question / availability / contract / other + - `requires_response`: boolean +3. AI call records into `ai_usage_ledger` with `feature='inbox_triage'`. + The existing per-port budget gates apply automatically. +4. Triage output written to a new `email_triage` table keyed on + `email_messages.id`. + +### 5. UI surfaces + +- `/[port]/inbox` — sorted by triage rank, port-wide view. +- Linked-inbox panel on `client-tabs.tsx` — adds a new "Email" tab + pulling from `email_threads` filtered to that client. +- Alert rule `inbox.unanswered_high_value` slots into Phase B's + alert engine; no schema change. + +## Schema additions + +Three new tables, all port-scoped where it matters: + +```ts +// Per-staff Gmail tokens. Mirror of google_calendar_tokens. +gws_user_tokens { + id, userId (UNIQUE), portId, emailAddress, + accessTokenEnc, refreshTokenEnc, tokenExpiry, + scope, watchExpiresAt, watchHistoryId, + connectedAt, lastSyncAt, syncEnabled, createdAt, updatedAt +} + +// Triage classifications keyed to messages. +email_triage { + messageId (PK, FK → email_messages.id ON DELETE CASCADE), + urgency, category, requiresResponse, + modelVersion, tokensUsed, classifiedAt +} + +// Pub/Sub idempotency log. Gmail re-delivers; we dedupe. +gws_push_log { + messageId (Pub/Sub message id, PK), + historyId, receivedAt +} +``` + +Plus extensions to `email_messages`: + +- `googleMessageId` (text, indexed) — Gmail's own ID for thread ops. +- `googleThreadId` (text, indexed). +- `gmailLabels` (text[]) — for "is unread" checks without hitting Gmail. + +The existing `emailAccounts.provider='google'` column repurposes +unchanged; the IMAP fields go nullable since OAuth-flow accounts won't +populate them. + +## AI cost interaction + +Triage AI is opt-in **twice**: the port admin must turn on +`aiEnabled` (Phase 3a flag, default off) **and** `gws.triageEnabled` +(this spec, default off). Either toggle off and the inbox sync still +runs but skips classification, so staff can manually scan threads +without burning tokens. + +Per-message token cost on a current Haiku-class model is roughly +1500–2500 tokens including the system prompt. A port doing 200 inbound +emails a day at the upper bound is ~500k tokens/day. The default +hard-cap is 500k/month, so triage will trip it inside a day. Two +mitigations baked in: + +- The system prompt is short (<500 tokens) and prompt-cached on the + Anthropic side, so most tokens are output. +- Triage runs only on threads not already classified — re-syncs from + the watch loop don't re-bill. + +The admin UI shows triage as its own line in the per-feature breakdown +so customers can see how much their inbox is costing them and tune +caps accordingly. + +## Phased build (assuming Model B) + +| Phase | Scope | Effort | Ships when | +| ---------------------------- | ------------------------------------------------------------------------------------------------------- | ------ | ----------------------------------------------------------- | +| **G1** Connect | OAuth flow + per-port config + per-user token storage. No sync yet. Staff can connect; nothing happens. | 1 week | Standalone | +| **G2** Read-only sync | Pub/Sub push receiver + delta sync into `email_messages`. Linked-inbox tab on client detail. No AI. | 1 week | After G1 | +| **G3** Triage classification | AI classifier, `email_triage` writes, `/inbox` page sorting. Per-port toggle. | 1 week | After G2; depends on Phase 3b budgets being live (they are) | +| **G4** Reply drafts | Gmail API send + draft creation. "Draft reply" button on the client detail Email tab. | 1 week | After G3 | +| **G5** Alerts | New `inbox.unanswered_high_value` rule. Hooks into Phase B alert engine. | 2 days | After G3 | + +Total: ~5 weeks for a single engineer, assuming the user provides one +real GWS workspace to test against during G1. + +## Open decisions for the user + +These are the questions to resolve before scheduling the build, in +priority order: + +1. **Deployment model — A, B, or C?** Default recommendation B. +2. **Single user or domain-wide delegation?** Per-staff connect (one + token per user) is simpler. Domain-wide delegation lets the port + admin connect once on behalf of every staff member but requires + the customer to grant a service account broader access. Default + recommendation: per-staff. +3. **Scope set.** Minimal viable scope is `gmail.readonly`. To send + replies (G4) we need `gmail.send`. To manage labels (e.g. mark + "triaged-by-CRM") we need `gmail.modify`. Each scope expansion + widens the consent screen scariness but doesn't add new + verification steps under Model B. +4. **Pub/Sub topic ownership.** Pub/Sub topics live in _some_ GCP + project. Under Model B the customer's project owns the topic — + they pay for Pub/Sub (cents/month) and grant our service account + subscriber access. Alternative: Port Nimara owns the topic and + the customer's Gmail publishes cross-project (allowed, slightly + more setup). Default: customer-owned topic, fewer moving parts. +5. **Triage model.** Haiku 4.5 is right for cost; Sonnet 4.6 is + right if the ranking quality on Haiku turns out to be poor. + Defer this until G3 has real-world tuning data. + +## Things that are NOT in this spec + +- **Microsoft 365 / Outlook integration.** Same shape, different API. + Once Model B is proven on GWS, Graph API takes another ~3 weeks. +- **Reply drafts grounded in CRM context.** That's G4 and depends on + the work in this spec, but the prompt engineering for "good replies + citing this client's open interests + reservations + invoices" + deserves its own design pass before building. +- **Cross-staff triage queue (i.e. "show me all unanswered emails + across the team").** That requires either domain-wide delegation + (decision #2 above) or per-staff opt-in to a shared view. Punt + until staff actually ask for it. +- **Sentiment / urgency tone analysis.** Tempting; almost always + wrong; skip in v1. +- **"Smart drafts" using the recipient's past replies as context.** + Every customer asks for this and almost no one uses it once + built. Skip. + +## Cost summary at a glance + +| Item | Model A | Model B | Model C | +| ------------------------------- | ------------------------------- | -------------------------------------- | ------------------------------------ | +| Build effort | 3–4 weeks | ~5 weeks (over G1–G5) | ~1 week (receive-only) | +| Calendar time to first customer | 4–6 months | 1 hour of customer admin work | 1 hour of customer Gmail-filter work | +| Up-front cash | $15k–$75k (CASA) | $0 | $0 | +| Recurring | Re-verification annually | None | None | +| Best for | 50+ customers, Marketplace play | 1–10 customers, white-glove onboarding | Customers who refuse OAuth setup | + +The recommendation stands: build Model B for G1 + G2 + G3, ship that, +and let real customer demand decide whether G4/G5 and Model A +promotion are worth the calendar time.