# Kalei System Architecture Plan
Version: 1.0
Date: 2026-02-10
Status: Proposed canonical architecture for implementation
## 1. Purpose and Scope
This document consolidates the existing Kalei docs into one implementation-ready system architecture plan.
In scope:
- Core features: Mirror, Kaleidoscope (Turn), Lens, Gallery, Spectrum analytics, subscriptions (all ship in v1).
- Mobile-first architecture (iOS/Android via Expo) with optional web support.
- Production operations for safety, privacy, reliability, and cost control.
Out of scope:
- Pixel-level UI specs and brand copy details.
- Provider contract/legal details.
- Full threat model artifacts (to be produced separately).
## 2. Inputs Reviewed
- `docs/app-blueprint.md`
- `docs/kalei-infrastructure-plan.md`
- `docs/kalei-ai-model-comparison.md`
- `docs/kalei-mirror-feature.md`
- `docs/kalei-spectrum-phase2.md`
- `docs/kalei-complete-design.md`
- `docs/kalei-brand-guidelines.md`
## 3. Architecture Drivers
### 3.1 Product drivers
- Core loop quality: Mirror fragment detection and Turn reframes must feel high quality and emotionally calibrated.
- Daily habit loop: low friction, fast response, strong retention mechanics.
- Over time: longitudinal Spectrum insights from accumulated usage data.
### 3.2 Non-functional drivers
- Safety first: crisis language must bypass reframing and trigger support flow.
- Privacy first: personal reflective writing is highly sensitive.
- Cost discipline: launch target under ~EUR 30/month fixed infrastructure.
- Operability: architecture must be maintainable by a small team.
- Gradual scale: support ~50 DAU at launch and scale to ~10k DAU without full rewrite.
## 4. Canonical Decisions
This plan resolves conflicting guidance across current docs.
| Topic | Decision | Rationale |
|---|---|---|
| Backend platform | Self-hosted API-first modular monolith on Node.js (Fastify preferred) | Matches budget constraints and keeps full control of safety, rate limits, and AI routing. |
| Data layer | PostgreSQL 16 + Redis | Postgres for source-of-truth relational + analytics tables; Redis for counters, rate limits, caching, idempotency. |
| Auth | JWT auth service in API + refresh token rotation + social login (Apple/Google) | Aligns with self-hosted stack while preserving mobile auth UX. |
| Mobile | React Native + Expo (local/native builds) | Fastest path for iOS/Android while keeping build pipeline under direct control. |
| AI integration | AI Gateway abstraction via OpenRouter with provider pinning | Single API, automatic failover, no vendor lock-in, and deterministic routing to non-Chinese providers for data privacy. |
| AI default | DeepSeek V3.2 via OpenRouter, hosted on DeepInfra/Fireworks (US/EU infrastructure) | 85-90% cheaper than Claude Haiku, with comparable emotional intelligence benchmarks. Provider pinning ensures no data flows through Chinese servers. |
| AI fallback | Claude Haiku 4.5 via OpenRouter (automatic failover on provider outage) | Highest-quality safety net activated transparently when primary provider is unavailable. |
| Billing | Self-hosted entitlement authority (direct App Store + Google Play server APIs) | Keeps billing logic in-house and avoids closed SaaS dependency in core authorization path. |
| Analytics/monitoring | PostHog self-hosted + GlitchTip + centralized app logs + cost telemetry | Open-source-first observability stack with lower vendor lock-in. |
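
The AI default and fallback rows can be read as an ordered list of request bodies that the AI Gateway tries in turn. The sketch below is illustrative, not the gateway's actual code: it assumes OpenRouter's `provider.order` and `provider.allow_fallbacks` routing preferences, and the model slugs and provider names are placeholders that should be checked against OpenRouter's current model and provider lists. It also handles failover explicitly in the gateway rather than relying on OpenRouter's own model fallback.

```typescript
// Sketch only. Model slugs and provider names are assumptions, not verified values.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ProviderPreferences {
  order: string[]; // providers to try, in order
  allow_fallbacks: boolean; // false = never route outside `order`
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  provider: ProviderPreferences;
}

const PRIMARY_MODEL = "deepseek/deepseek-v3.2"; // assumed slug
const FALLBACK_MODEL = "anthropic/claude-haiku-4.5"; // assumed slug

// Returns the attempts in the order the gateway would try them:
// the pinned primary first, then the fallback model on its own provider.
function buildAttempts(messages: ChatMessage[]): ChatRequest[] {
  return [
    {
      model: PRIMARY_MODEL,
      messages,
      provider: { order: ["DeepInfra", "Fireworks"], allow_fallbacks: false },
    },
    {
      model: FALLBACK_MODEL,
      messages,
      provider: { order: ["Anthropic"], allow_fallbacks: false },
    },
  ];
}
```

Keeping `allow_fallbacks` false on the primary attempt is what makes the routing deterministic: a request either lands on a pinned provider or fails over to the fallback model.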
## 5. System Context
```mermaid
flowchart LR
user[User] --> app[Expo App]
app --> edge[Edge Proxy]
edge --> api[Kalei API]
api --> db[(PostgreSQL)]
api --> redis[(Redis)]
api --> ai[AI Providers]
api --> billing[Store Entitlements]
api --> push[Push Gateway]
api --> obs[Observability]
app --> analytics[Product Analytics]
```
## 6. Container Architecture
```mermaid
flowchart TB
subgraph Client
turn[Turn Screen]
mirror[Mirror Screen]
lens[Lens Screen]
spectrum_ui[Spectrum Dashboard]
profile_ui[Gallery and Profile]
end
subgraph Platform
gateway[API Gateway and Auth]
turn_service[Turn Service]
mirror_service[Mirror Service]
lens_service[Lens Service]
spectrum_service[Spectrum Service]
safety_service[Safety Service]
entitlement_service[Entitlement Service]
jobs[Job Scheduler and Workers]
ai_gateway[AI Gateway]
cost_guard[Usage Meter and Cost Guard]
end
subgraph Data
postgres[(PostgreSQL)]
redis[(Redis)]
object_storage[(Object Storage)]
end
subgraph External
ai_provider[DeepSeek V3.2 via OpenRouter + DeepInfra/Fireworks + Claude Haiku fallback]
store_billing[App Store and Play Billing APIs]
push_provider[APNs and FCM]
glitchtip[GlitchTip]
posthog[PostHog self-hosted]
end
turn --> gateway
mirror --> gateway
lens --> gateway
spectrum_ui --> gateway
profile_ui --> gateway
gateway --> turn_service
gateway --> mirror_service
gateway --> lens_service
gateway --> spectrum_service
gateway --> entitlement_service
mirror_service --> safety_service
turn_service --> safety_service
lens_service --> safety_service
spectrum_service --> safety_service
turn_service --> ai_gateway
mirror_service --> ai_gateway
lens_service --> ai_gateway
spectrum_service --> ai_gateway
ai_gateway --> ai_provider
turn_service --> cost_guard
mirror_service --> cost_guard
lens_service --> cost_guard
spectrum_service --> cost_guard
turn_service --> postgres
mirror_service --> postgres
lens_service --> postgres
spectrum_service --> postgres
entitlement_service --> postgres
jobs --> postgres
turn_service --> redis
mirror_service --> redis
lens_service --> redis
spectrum_service --> redis
cost_guard --> redis
jobs --> redis
entitlement_service --> store_billing
jobs --> push_provider
gateway --> glitchtip
gateway --> posthog
spectrum_service --> object_storage
```
## 7. Domain and Service Boundaries
### 7.1 Runtime modules
- `auth`: sign-up/sign-in, token issuance/rotation, device session management.
- `entitlements`: direct App Store + Google Play sync, plan gating (`free`, `prism`, `prism_plus`).
- `mirror`: session lifecycle, message ingestion, fragment detection, inline reframe, reflection.
- `turn`: structured reframing workflow and saved patterns.
- `lens`: goals, actions, daily focus generation, check-ins.
- `spectrum`: analytics feature store, weekly/monthly aggregation, insight generation.
- `safety`: crisis detection, escalation, crisis response policy.
- `ai_gateway`: prompt templates, OpenRouter API integration with provider pinning (DeepInfra/Fireworks primary, Claude Haiku fallback), retries/timeouts, structured output validation.
- `usage_cost`: token telemetry, per-user budgets, global spend controls.
- `notifications`: push scheduling, reminders, weekly summaries.
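
As an illustration of the structured output validation owned by `ai_gateway`, the following minimal sketch parses a model response and rejects anything that does not match the expected fragment shape. The `Fragment` fields (`text`, `distortion`, `confidence`) and the 0.5 confidence cutoff are assumptions made for the example, not the agreed schema.

```typescript
// Hypothetical fragment shape; field names are assumptions for illustration.
interface Fragment {
  text: string;
  distortion: string;
  confidence: number; // expected range 0..1
}

type ValidationResult =
  | { ok: true; fragments: Fragment[] }
  | { ok: false; error: string };

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Parses raw model output, rejects malformed payloads, and drops
// fragments below the confidence cutoff.
function validateFragments(raw: string, minConfidence = 0.5): ValidationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (!isRecord(parsed)) return { ok: false, error: "not an object" };
  const list = parsed["fragments"];
  if (!Array.isArray(list)) return { ok: false, error: "missing fragments array" };

  const fragments: Fragment[] = [];
  for (const item of list as unknown[]) {
    if (!isRecord(item)) return { ok: false, error: "malformed fragment" };
    const { text, distortion, confidence } = item;
    if (
      typeof text !== "string" ||
      typeof distortion !== "string" ||
      typeof confidence !== "number" ||
      !Number.isFinite(confidence) ||
      confidence < 0 ||
      confidence > 1
    ) {
      return { ok: false, error: "malformed fragment" };
    }
    if (confidence >= minConfidence) fragments.push({ text, distortion, confidence });
  }
  return { ok: true, fragments };
}
```

Returning a tagged result instead of throwing lets the caller decide whether a validation failure should trigger a retry or a degraded response.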
### 7.2 Why modular monolith first
- Lowest operational overhead at launch.
- Strong transaction boundaries in one codebase.
- Easy extraction path later for `spectrum` workers or `ai_gateway` if load increases.
## 8. Core Data Architecture
### 8.1 Data domains
- Identity: users, profiles, auth_sessions, refresh_tokens.
- Product interactions: turns, mirror_sessions, mirror_messages, mirror_fragments, lens_goals, lens_actions.
- Analytics: spectrum_session_analysis, spectrum_turn_analysis, spectrum_weekly, spectrum_monthly.
- Commerce: subscriptions, entitlement_snapshots, billing_events.
- Safety and operations: safety_events, ai_usage_events, request_logs, audit_events.
### 8.2 Entity relationship view
```mermaid
flowchart LR
users[USERS] --> profiles[PROFILES]
users --> auth_sessions[AUTH_SESSIONS]
users --> refresh_tokens[REFRESH_TOKENS]
users --> turns[TURNS]
users --> mirror_sessions[MIRROR_SESSIONS]
mirror_sessions --> mirror_messages[MIRROR_MESSAGES]
mirror_messages --> mirror_fragments[MIRROR_FRAGMENTS]
users --> lens_goals[LENS_GOALS]
lens_goals --> lens_actions[LENS_ACTIONS]
users --> spectrum_session[SPECTRUM_SESSION_ANALYSIS]
users --> spectrum_turn[SPECTRUM_TURN_ANALYSIS]
users --> spectrum_weekly[SPECTRUM_WEEKLY]
users --> spectrum_monthly[SPECTRUM_MONTHLY]
users --> subscriptions[SUBSCRIPTIONS]
users --> entitlement[ENTITLEMENT_SNAPSHOTS]
users --> safety_events[SAFETY_EVENTS]
users --> ai_usage[AI_USAGE_EVENTS]
```
### 8.3 Storage policy
- Raw reflective content remains in transactional tables, encrypted at rest.
- Spectrum dashboard reads aggregated tables only by default.
- Per-session exclusion flags allow users to opt out entries from analytics.
- Hard delete workflow removes raw + derived analytics for requested windows.
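
The exclusion rule can be sketched as a filter applied before any session reaches the Spectrum pipeline. The `SessionRef` shape and the `excludedFromAnalytics` flag name are hypothetical; the real column names are defined by the schema, not by this example.

```typescript
// Hypothetical projection of a mirror session; field names are assumptions.
interface SessionRef {
  id: string;
  userId: string;
  closedAt: Date;
  excludedFromAnalytics: boolean;
}

// Keeps only sessions that closed inside [from, to) and were not opted out.
function eligibleForSpectrum(sessions: SessionRef[], from: Date, to: Date): SessionRef[] {
  const start = from.getTime();
  const end = to.getTime();
  return sessions.filter((s) => {
    const closed = s.closedAt.getTime();
    return !s.excludedFromAnalytics && closed >= start && closed < end;
  });
}
```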
## 9. Key Runtime Sequences
### 9.1 Mirror message processing with safety gate
```mermaid
sequenceDiagram
participant App as Mobile App
participant API as Kalei API
participant Safety as Safety Service
participant Ent as Entitlement Service
participant AI as AI Gateway
participant Model as AI Provider
participant DB as PostgreSQL
participant Redis as Redis
App->>API: POST /mirror/messages
API->>Ent: Check plan/quota
Ent->>Redis: Read counters
Ent-->>API: Allowed
API->>Safety: Crisis precheck
alt Crisis detected
Safety->>DB: Insert safety_event
API-->>App: Crisis resources response
else Not crisis
API->>AI: Detect fragments prompt
AI->>Model: Inference request
Model-->>AI: Fragments with confidence
AI-->>API: Validated structured result
API->>DB: Save message + fragments
API->>Redis: Increment usage counters
API-->>App: Highlight payload
end
```
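
The ordering in this sequence (quota, then safety gate, then AI, then persistence) can be sketched as a single handler over injected ports. This is a simplified illustration: the port names are hypothetical, the ports are synchronous for brevity where a real handler would be async, and the `quota_exceeded` outcome is an assumption since the diagram only shows the allowed path.

```typescript
type MirrorResult =
  | { kind: "quota_exceeded" }
  | { kind: "crisis"; resources: string[] }
  | { kind: "highlights"; fragments: string[] };

// Hypothetical ports standing in for the services in the diagram.
interface MirrorPorts {
  checkQuota(userId: string): boolean;
  isCrisis(text: string): boolean;
  recordSafetyEvent(userId: string): void;
  detectFragments(text: string): string[];
  saveMessage(userId: string, text: string, fragments: string[]): void;
  incrementUsage(userId: string): void;
}

function handleMirrorMessage(ports: MirrorPorts, userId: string, text: string): MirrorResult {
  if (!ports.checkQuota(userId)) return { kind: "quota_exceeded" };

  // Safety gate: crisis content never reaches the AI Gateway.
  if (ports.isCrisis(text)) {
    ports.recordSafetyEvent(userId);
    return { kind: "crisis", resources: ["<crisis resources template>"] };
  }

  const fragments = ports.detectFragments(text);
  ports.saveMessage(userId, text, fragments);
  ports.incrementUsage(userId); // counted only after a successful save
  return { kind: "highlights", fragments };
}
```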
### 9.2 Turn (Kaleidoscope) request
```mermaid
sequenceDiagram
participant App as Mobile App
participant API as Kalei API
participant Ent as Entitlement Service
participant Safety as Safety Service
participant AI as AI Gateway
participant Model as AI Provider
participant DB as PostgreSQL
participant Cost as Cost Guard
App->>API: POST /turns
API->>Ent: Validate tier + daily cap
API->>Safety: Crisis precheck
alt Crisis detected
API-->>App: Crisis resources response
else Safe
API->>AI: Generate 3 reframes + micro-action
AI->>Model: Inference stream
Model-->>AI: Structured reframes
AI-->>API: Response + token usage
API->>Cost: Record token usage + budget check
API->>DB: Save turn + metadata
API-->>App: Stream final turn result
end
```
### 9.3 Weekly Spectrum aggregation (background)
```mermaid
sequenceDiagram
participant Cron as Scheduler
participant Worker as Spectrum Worker
participant DB as PostgreSQL
participant AI as AI Gateway
participant Model as Batch Provider
participant Push as Notification Service
Cron->>Worker: Trigger weekly job
Worker->>DB: Load eligible users + raw events
Worker->>DB: Compute vectors and weekly aggregates
Worker->>AI: Generate insight narratives from aggregates
AI->>Model: Batch request
Model-->>AI: Insight text
AI-->>Worker: Validated summaries
Worker->>DB: Upsert spectrum_weekly and monthly deltas
Worker->>Push: Enqueue spectrum updated notifications
```
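
The "compute weekly aggregates" step can be illustrated with a small pure function that buckets per-session analysis rows by user and week. The row shape and the `fragmentCount` metric are hypothetical stand-ins for the real Spectrum vectors, and weeks are assumed to start on Monday in UTC.

```typescript
// Hypothetical per-session analysis row.
interface SessionAnalysis {
  userId: string;
  day: string; // ISO date, e.g. "2026-02-10"
  fragmentCount: number;
}

interface WeeklyAggregate {
  userId: string;
  weekStart: string; // ISO date of the Monday (UTC) that starts the week
  sessions: number;
  fragments: number;
}

// Maps any ISO date to the Monday of its week, in UTC.
function weekStartUtc(day: string): string {
  const d = new Date(`${day}T00:00:00Z`);
  const offset = (d.getUTCDay() + 6) % 7; // Monday = 0 ... Sunday = 6
  d.setUTCDate(d.getUTCDate() - offset);
  return d.toISOString().slice(0, 10);
}

function aggregateWeekly(rows: SessionAnalysis[]): WeeklyAggregate[] {
  const byKey = new Map<string, WeeklyAggregate>();
  for (const row of rows) {
    const weekStart = weekStartUtc(row.day);
    const key = `${row.userId}|${weekStart}`;
    const agg = byKey.get(key) ?? { userId: row.userId, weekStart, sessions: 0, fragments: 0 };
    agg.sessions += 1;
    agg.fragments += row.fragmentCount;
    byKey.set(key, agg);
  }
  return Array.from(byKey.values());
}
```

Because the output is keyed by user and week start, the worker can upsert each aggregate idempotently if a job is retried.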
## 10. API Surface (v1)
### 10.1 Auth and profile
- `POST /auth/register`
- `POST /auth/login`
- `POST /auth/refresh`
- `POST /auth/logout`
- `GET /me`
- `PATCH /me/profile`
### 10.2 Mirror
- `POST /mirror/sessions`
- `POST /mirror/messages`
- `POST /mirror/fragments/{id}/reframe`
- `POST /mirror/sessions/{id}/close`
- `GET /mirror/sessions`
- `DELETE /mirror/sessions/{id}`
### 10.3 Turn
- `POST /turns`
- `GET /turns`
- `GET /turns/{id}`
- `POST /turns/{id}/save`
### 10.4 Lens
- `POST /lens/goals`
- `GET /lens/goals`
- `POST /lens/goals/{id}/actions`
- `POST /lens/actions/{id}/complete`
- `GET /lens/affirmation/today`
### 10.5 Spectrum
- `GET /spectrum/weekly`
- `GET /spectrum/monthly`
- `POST /spectrum/reset`
- `POST /spectrum/exclusions`
### 10.6 Billing and entitlements
- `POST /billing/webhooks/apple`
- `POST /billing/webhooks/google`
- `GET /billing/entitlements`
## 11. Security, Safety, and Compliance Architecture
### 11.1 Security controls
- TLS everywhere (edge proxy to API origin and service egress).
- JWT access tokens (short TTL) + rotating refresh tokens.
- Password hashing with Argon2id (preferred) or bcrypt with strong cost factor.
- Row ownership checks enforced in API and optionally DB RLS for defense in depth.
- Secrets in environment vault; never in client bundle.
- Audit logging for auth events, entitlement changes, deletes, and safety events.
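
Refresh token rotation is commonly paired with reuse detection: each refresh token is single-use, and presenting an already-used token revokes the whole session. The in-memory sketch below illustrates that pattern under stated assumptions; the class and method names are hypothetical, the token generator is injected, and a real implementation would store hashed tokens in Postgres with expiry.

```typescript
interface RefreshRecord {
  sessionId: string;
  used: boolean;
}

// In-memory illustration only; not a production token store.
class RefreshTokenStore {
  private readonly tokens = new Map<string, RefreshRecord>();
  private readonly revokedSessions = new Set<string>();
  private readonly newToken: () => string;

  constructor(newToken: () => string) {
    this.newToken = newToken;
  }

  issue(sessionId: string): string {
    const token = this.newToken();
    this.tokens.set(token, { sessionId, used: false });
    return token;
  }

  // Returns a fresh token, or null if the presented token is unknown,
  // belongs to a revoked session, or has already been used (replay).
  rotate(presented: string): string | null {
    const rec = this.tokens.get(presented);
    if (!rec || this.revokedSessions.has(rec.sessionId)) return null;
    if (rec.used) {
      this.revokedSessions.add(rec.sessionId); // replay detected: revoke session
      return null;
    }
    rec.used = true;
    return this.issue(rec.sessionId);
  }
}
```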
### 11.2 Data protection
- Encryption at rest for disk volumes and database backups.
- Column-level encryption for highly sensitive text fields (Mirror message content).
- Data minimization for analytics: Spectrum reads vectors and aggregates by default.
- User rights flows: export, per-item delete, account delete, Spectrum reset.
### 11.3 Safety architecture
- Multi-stage crisis filter:
  1. Deterministic keyword and pattern pass.
  2. Low-latency model confirmation where needed.
  3. Hardcoded crisis response templates and hotline resources.
- Crisis-level content is never reframed.
- Safety events are logged and monitored for false-positive/false-negative tuning.
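
The three stages can be sketched as follows. This is a structural illustration only: the patterns are placeholders that would need clinical review, the model confirmation is an injected hypothetical callback, and the template text stands in for the real hardcoded response and regional hotline resources.

```typescript
type CrisisVerdict = "crisis" | "safe";

// Placeholder patterns for illustration; not a reviewed crisis lexicon.
const HIGH_CONFIDENCE_PATTERNS = [/\bkill myself\b/i, /\bend my life\b/i];
const AMBIGUOUS_PATTERNS = [/\bcan'?t go on\b/i, /\bno way out\b/i];

// Stage 1 is deterministic; stage 2 calls the model only for ambiguous matches.
function classifyCrisis(text: string, confirmWithModel: (text: string) => boolean): CrisisVerdict {
  if (HIGH_CONFIDENCE_PATTERNS.some((p) => p.test(text))) return "crisis";
  if (AMBIGUOUS_PATTERNS.some((p) => p.test(text)) && confirmWithModel(text)) return "crisis";
  return "safe";
}

// Stage 3: a hardcoded template (placeholder text), never a generated reply.
const CRISIS_TEMPLATE = "<hardcoded crisis response and hotline resources>";

function respond(
  text: string,
  confirmWithModel: (text: string) => boolean,
  reframe: (text: string) => string,
): string {
  return classifyCrisis(text, confirmWithModel) === "crisis" ? CRISIS_TEMPLATE : reframe(text);
}
```

Running the deterministic pass first means clear crisis language is handled without any model call, so the gate still works during a provider outage.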
## 12. Reliability and Performance
### 12.1 Initial SLO targets
- API availability: 99.5% monthly at launch, 99.9% target at scale.
- Turn and Mirror response latency:
  - p50 < 1.8s
  - p95 < 3.5s
- Weekly Spectrum jobs completed within 2 hours of scheduled run.
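
A minimal sketch of checking the latency targets against raw samples, using the nearest-rank percentile method. The function names are illustrative; in production these percentiles would more likely come from the metrics backend than from in-process arrays.

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error("no samples");
  return value;
}

// Targets from this section: p50 < 1.8s and p95 < 3.5s.
const SLO_MS = { p50: 1800, p95: 3500 };

function meetsLatencySlo(samplesMs: number[]): boolean {
  return percentile(samplesMs, 50) < SLO_MS.p50 && percentile(samplesMs, 95) < SLO_MS.p95;
}
```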
### 12.2 Resilience patterns
- Idempotency keys on write endpoints.
- AI provider timeout + retry policy with circuit breaker.
- Graceful degradation hierarchy when budget/latency pressure occurs:
  1. Degrade Lens generation first (template fallback).
  2. Keep Turn and Mirror available.
  3. Pause non-critical Spectrum generation if needed.
- Dead-letter queue for failed async jobs.
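
The degradation hierarchy can be expressed as a pure mapping from a pressure level to per-feature modes. The three-level `pressure` signal is an assumption made for this sketch; how budget and latency alerts are combined into that level is left open.

```typescript
type Feature = "turn" | "mirror" | "lens" | "spectrum";
type Mode = "full" | "template_fallback" | "paused";

// pressure: 0 = normal, 1 = elevated, 2 = severe (hypothetical signal).
function degradationPlan(pressure: 0 | 1 | 2): Record<Feature, Mode> {
  return {
    turn: "full", // Turn and Mirror stay available at every level
    mirror: "full",
    lens: pressure >= 1 ? "template_fallback" : "full",
    spectrum: pressure >= 2 ? "paused" : "full",
  };
}
```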
## 13. Observability and FinOps
### 13.1 Telemetry
- Structured logs with request ID, user ID hash, feature, model, token usage, cost.
- Metrics:
  - request rate/error rate/latency by endpoint
  - AI token usage and cost by feature
  - quota denials and safety escalations
- Tracing across API -> AI Gateway -> provider call.
### 13.2 Cost controls
- Global monthly AI spend cap and alert thresholds (50%, 80%, 95%).
- Per-user daily token budget in Redis.
- Feature-level cost envelope with OpenRouter provider routing:
  - All features: DeepSeek V3.2 via DeepInfra/Fireworks (US/EU, $0.26/$0.38 per MTok)
  - Automatic failover: Claude Haiku 4.5 on provider outage ($1.00/$5.00 per MTok)
  - Future: introduce tiered model routing at 5,000+ DAU when usage data justifies complexity
- Prompt caching for stable system prompts (DeepInfra ~20% cache hit discount).
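
To make the cost envelope concrete, the sketch below computes per-request cost from token counts using the per-MTok prices listed above, and reports which spend alert thresholds a request crosses. It reads each price pair as input/output price, which is an assumption; the lane names and function names are illustrative, and cache discounts are ignored.

```typescript
type Lane = "primary" | "fallback";

interface Price {
  inputPerMTok: number; // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

// Prices taken from the list above; assumed to be input/output pairs.
const PRICES: Record<Lane, Price> = {
  primary: { inputPerMTok: 0.26, outputPerMTok: 0.38 },
  fallback: { inputPerMTok: 1.0, outputPerMTok: 5.0 },
};

function costUsd(lane: Lane, inputTokens: number, outputTokens: number): number {
  const price = PRICES[lane];
  return (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1e6;
}

const ALERT_THRESHOLDS = [0.5, 0.8, 0.95];

// Returns the thresholds (as fractions of the cap) crossed by moving
// from spentBefore to spentAfter, so each alert fires only once.
function crossedThresholds(spentBefore: number, spentAfter: number, monthlyCap: number): number[] {
  return ALERT_THRESHOLDS.filter(
    (t) => spentBefore < t * monthlyCap && spentAfter >= t * monthlyCap,
  );
}
```

For example, a request with 1,500 input tokens and 500 output tokens costs roughly $0.00058 on the primary lane and $0.004 on the fallback lane, which is why sustained failover needs its own alerting.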
## 14. Deployment Topology and Scaling Path
### 14.1 Launch deployment (single-node)
```mermaid
flowchart LR
EDGE[Caddy or Nginx Edge] --> NX[Nginx]
NX --> API[API + Workers]
API --> PG[(PostgreSQL)]
API --> R[(Redis)]
API --> AIP[AI Providers]
```
### 14.2 Scaling evolution
```mermaid
flowchart LR
launch[Launch single VPS API DB Redis] --> traction[Traction split DB keep API monolith]
traction --> growth[Growth separate workers and scale API]
growth --> scale[Scale optional service extraction]
```
### 14.3 Trigger-based scaling
- Move DB off app node when p95 query latency > 120ms sustained or storage > 70%.
- Add API replica when CPU > 70% sustained at peak and p95 latency breaches SLO.
- Split workers when Spectrum jobs impact interactive endpoints.
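
These triggers can be encoded as a small rule function so that alerts map directly to actions. The metric field names are hypothetical, the inputs are assumed to already be sustained-window values, and the latency SLO used is the p95 target of 3.5s from section 12.1.

```typescript
// Hypothetical snapshot of sustained-window metrics.
interface SustainedMetrics {
  dbP95QueryMs: number;
  storageUsedRatio: number; // 0..1
  peakCpuRatio: number; // 0..1
  apiP95LatencyMs: number;
  spectrumJobsDegradeInteractive: boolean;
}

type ScalingAction = "move_db_off_app_node" | "add_api_replica" | "split_workers";

const API_P95_SLO_MS = 3500;

function scalingActions(m: SustainedMetrics): ScalingAction[] {
  const actions: ScalingAction[] = [];
  if (m.dbP95QueryMs > 120 || m.storageUsedRatio > 0.7) actions.push("move_db_off_app_node");
  if (m.peakCpuRatio > 0.7 && m.apiP95LatencyMs >= API_P95_SLO_MS) actions.push("add_api_replica");
  if (m.spectrumJobsDegradeInteractive) actions.push("split_workers");
  return actions;
}
```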
## 15. Delivery Plan
All features ship in a single unified v1 release. The build is a continuous 12-week effort:
### 15.1 Weeks 1-4: Platform Foundation
- API skeleton, auth, profile, entitlements integration.
- Postgres schema v1 and migrations.
- Mirror + Turn endpoints with safety pre-check.
- Usage metering and rate limiting.
### 15.2 Weeks 5-8: Core Experience
- Lens flows, Rehearsal, Ritual, Evidence Wall, and Gallery history.
- Push notifications and daily reminders.
- Full observability, alerting, and incident runbooks.
- Beta load testing and security hardening.
### 15.3 Weeks 9-12: Spectrum & Launch Readiness
- Spectrum: vector extraction pipeline, aggregated tables, weekly batch jobs, dashboard endpoints.
- Data exclusion controls and reset workflow.
- Cost optimization pass on AI routing.
- Final QA, store submission, beta launch.
## 16. Risks and Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| Reframe quality variance by provider/model | Core UX degradation | Keep AI Gateway abstraction + blind quality harness + model canary rollout. |
| Safety false negatives | High trust and user harm risk | Defense-in-depth crisis filter + explicit no-reframe crisis policy + monitoring and review loop. |
| AI cost spikes | Margin compression | Hard spend caps, per-feature budgets, degradation order, model fallback lanes. |
| Single-node bottlenecks | Latency and availability issues | Trigger-based scaling plan and early instrumentation. |
| Sensitive data handling errors | Compliance and trust risk | Encryption, strict retention controls, deletion workflows, audit logs. |
## 17. Decision Log and Open Items
### 17.1 Decided in this plan
- Self-hosted API + Postgres + Redis is the canonical launch architecture.
- AI provider routing is built in from day one.
- Safety is an explicit service and gate on all AI-facing paths.
- Spectrum runs asynchronously over aggregated data.
### 17.2 Resolved: AI Provider Strategy (February 2026)
- **Decided:** DeepSeek V3.2 via OpenRouter, pinned to non-Chinese providers (DeepInfra/Fireworks). Single model for all features at launch. Claude Haiku 4.5 as automatic fallback.
- **Rationale:** 85-90% cost reduction vs Claude Haiku. Nature 2025 study confirms comparable emotional intelligence scores. Non-Chinese hosting avoids data sovereignty concerns. Single-model approach minimizes complexity for a solo founder.
- **Revisit at:** 600+ DAU (evaluate self-hosting), 5,000+ DAU (evaluate tiered model routing).
### 17.3 Remaining open decisions
- Exact hosting target for DB scaling at traction stage (dedicated VPS vs managed Postgres).
- Regional crisis resource strategy (US-first or multi-region at launch).
---
If approved, this document should become the architecture source of truth and supersede conflicting details in older planning docs.