Kalei System Architecture Plan

Version: 1.0
Date: 2026-02-10
Status: Proposed canonical architecture for implementation

1. Purpose and Scope

This document consolidates the existing Kalei docs into one implementation-ready system architecture plan.

In scope:

Phase 1 features: Mirror, Kaleidoscope (Turn), Lens, Gallery, subscriptions.
Phase 2 features: Spectrum analytics (weekly/monthly insight pipeline).
Mobile-first architecture (iOS/Android via Expo) with optional web support.
Production operations for safety, privacy, reliability, and cost control.

Out of scope:

Pixel-level UI specs and brand copy details.
Provider contract/legal details.
Full threat model artifacts (to be produced separately).

2. Inputs Reviewed

docs/app-blueprint.md
docs/kalei-infrastructure-plan.md
docs/kalei-ai-model-comparison.md
docs/kalei-mirror-feature.md
docs/kalei-spectrum-phase2.md
docs/kalei-complete-design.md
docs/kalei-brand-guidelines.md

3. Architecture Drivers

3.1 Product drivers

Core loop quality: Mirror fragment detection and Turn reframes must feel high quality and emotionally calibrated.
Daily habit loop: low friction, fast response, strong retention mechanics.
Phase 2 depth: longitudinal Spectrum insights from accumulated usage data.

3.2 Non-functional drivers

Safety first: crisis language must bypass reframing and trigger support flow.
Privacy first: personal reflective writing is highly sensitive.
Cost discipline: launch target of under ~EUR 30/month in fixed infrastructure costs.
Operability: architecture must be maintainable by a small team.
Gradual scale: support ~50 DAU at launch and scale to ~10k DAU without a full rewrite.

4. Canonical Decisions

This plan resolves conflicting guidance across the current docs listed in section 2.

Topic	Decision	Rationale
Backend platform	Self-hosted API-first modular monolith on Node.js (Fastify preferred)	Matches budget constraints and keeps full control of safety, rate limits, and AI routing.
Data layer	PostgreSQL 16 + Redis	Postgres for source-of-truth relational + analytics tables; Redis for counters, rate limits, caching, idempotency.
Auth	JWT auth service in API + refresh token rotation + social login (Apple/Google)	Aligns with self-hosted stack while preserving mobile auth UX.
Mobile	React Native + Expo (local/native builds)	Fastest path for iOS/Android while keeping build pipeline under direct control.
AI integration	AI Gateway abstraction with provider routing	Prevents hard lock-in and supports quality/cost strategy across open-weight model backends and optional hosted fallbacks.
AI default	Open-weight Qwen/Llama family via vLLM (Ollama locally) for Mirror/Turn/Safety-sensitive paths at launch	Keeps model stack open-source-first while preserving routing flexibility.
AI cost fallback	Route low-risk generation (Lens/basic content) to lower-cost providers when budget thresholds hit	Preserves core quality while controlling spend.
Billing	Self-hosted entitlement authority (direct App Store + Google Play server APIs)	Keeps billing logic in-house and avoids closed SaaS dependency in core authorization path.
Analytics/monitoring	PostHog self-hosted + GlitchTip + centralized app logs + cost telemetry	Open-source-first observability stack with lower vendor lock-in.

5. System Context

flowchart LR
    user[User] --> app[Expo App]
    app --> edge[Edge Proxy]
    edge --> api[Kalei API]
    api --> db[(PostgreSQL)]
    api --> redis[(Redis)]
    api --> ai[AI Providers]
    api --> billing[Store Entitlements]
    api --> push[Push Gateway]
    api --> obs[Observability]
    app --> analytics[Product Analytics]

6. Container Architecture

flowchart TB
    subgraph Client
        turn[Turn Screen]
        mirror[Mirror Screen]
        lens[Lens Screen]
        spectrum_ui[Spectrum Dashboard]
        profile_ui[Gallery and Profile]
    end

    subgraph Platform
        gateway[API Gateway and Auth]
        turn_service[Turn Service]
        mirror_service[Mirror Service]
        lens_service[Lens Service]
        spectrum_service[Spectrum Service]
        safety_service[Safety Service]
        entitlement_service[Entitlement Service]
        jobs[Job Scheduler and Workers]
        ai_gateway[AI Gateway]
        cost_guard[Usage Meter and Cost Guard]
    end

    subgraph Data
        postgres[(PostgreSQL)]
        redis[(Redis)]
        object_storage[(Object Storage)]
    end

    subgraph External
        ai_provider[Open-weight models via vLLM or Ollama]
        store_billing[App Store and Play Billing APIs]
        push_provider[APNs and FCM]
        glitchtip[GlitchTip]
        posthog[PostHog self-hosted]
    end

    turn --> gateway
    mirror --> gateway
    lens --> gateway
    spectrum_ui --> gateway
    profile_ui --> gateway

    gateway --> turn_service
    gateway --> mirror_service
    gateway --> lens_service
    gateway --> spectrum_service
    gateway --> entitlement_service

    mirror_service --> safety_service
    turn_service --> safety_service
    lens_service --> safety_service
    spectrum_service --> safety_service

    turn_service --> ai_gateway
    mirror_service --> ai_gateway
    lens_service --> ai_gateway
    spectrum_service --> ai_gateway
    ai_gateway --> ai_provider

    turn_service --> cost_guard
    mirror_service --> cost_guard
    lens_service --> cost_guard
    spectrum_service --> cost_guard

    turn_service --> postgres
    mirror_service --> postgres
    lens_service --> postgres
    spectrum_service --> postgres
    entitlement_service --> postgres
    jobs --> postgres

    turn_service --> redis
    mirror_service --> redis
    lens_service --> redis
    spectrum_service --> redis
    cost_guard --> redis
    jobs --> redis

    entitlement_service --> store_billing
    jobs --> push_provider
    gateway --> glitchtip
    gateway --> posthog
    spectrum_service --> object_storage

7. Domain and Service Boundaries

7.1 Runtime modules

auth: sign-up/sign-in, token issuance/rotation, device session management.
entitlements: direct App Store + Google Play sync, plan gating (free, prism, prism_plus).
mirror: session lifecycle, message ingestion, fragment detection, inline reframe, reflection.
turn: structured reframing workflow and saved patterns.
lens: goals, actions, daily focus generation, check-ins.
spectrum: analytics feature store, weekly/monthly aggregation, insight generation.
safety: crisis detection, escalation, crisis response policy.
ai_gateway: prompt templates, model routing, retries/timeouts, structured output validation.
usage_cost: token telemetry, per-user budgets, global spend controls.
notifications: push scheduling, reminders, weekly summaries.

7.2 Why modular monolith first

Lowest operational overhead at launch.
Strong transaction boundaries in one codebase.
Easy extraction path later for spectrum workers or ai_gateway if load increases.

8. Core Data Architecture

8.1 Data domains

Identity: users, profiles, auth_sessions, refresh_tokens.
Product interactions: turns, mirror_sessions, mirror_messages, mirror_fragments, lens_goals, lens_actions.
Analytics: spectrum_session_analysis, spectrum_turn_analysis, spectrum_weekly, spectrum_monthly.
Commerce: subscriptions, entitlement_snapshots, billing_events.
Safety and operations: safety_events, ai_usage_events, request_logs, audit_events.

8.2 Entity relationship view

flowchart LR
    users[USERS] --> profiles[PROFILES]
    users --> auth_sessions[AUTH_SESSIONS]
    users --> refresh_tokens[REFRESH_TOKENS]
    users --> turns[TURNS]
    users --> mirror_sessions[MIRROR_SESSIONS]
    mirror_sessions --> mirror_messages[MIRROR_MESSAGES]
    mirror_messages --> mirror_fragments[MIRROR_FRAGMENTS]
    users --> lens_goals[LENS_GOALS]
    lens_goals --> lens_actions[LENS_ACTIONS]
    users --> spectrum_session[SPECTRUM_SESSION_ANALYSIS]
    users --> spectrum_turn[SPECTRUM_TURN_ANALYSIS]
    users --> spectrum_weekly[SPECTRUM_WEEKLY]
    users --> spectrum_monthly[SPECTRUM_MONTHLY]
    users --> subscriptions[SUBSCRIPTIONS]
    users --> entitlement[ENTITLEMENT_SNAPSHOTS]
    users --> safety_events[SAFETY_EVENTS]
    users --> ai_usage[AI_USAGE_EVENTS]

8.3 Storage policy

Raw reflective content remains in transactional tables, encrypted at rest.
Spectrum dashboard reads aggregated tables only by default.
Per-session exclusion flags let users exclude individual entries from analytics.
The hard-delete workflow removes both raw content and derived analytics for the requested time windows.
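
The exclusion and hard-delete rules above can be illustrated with a minimal in-memory sketch. The record shapes, the `excludedFromAnalytics` field, and the helper names are assumptions for illustration; the real tables are the ones listed in 8.1.

```typescript
// Hypothetical in-memory records standing in for mirror_sessions and spectrum_* rows.
interface RawSession {
  id: string;
  userId: string;
  createdAt: string; // ISO-8601 UTC timestamp; same format everywhere so string comparison is valid
  excludedFromAnalytics: boolean;
}
interface DerivedRow {
  sessionId: string;
  userId: string;
}
interface DataStore {
  raw: RawSession[];
  derived: DerivedRow[];
}

// Only sessions the user has not excluded may feed the analytics pipeline.
function analyticsEligible(sessions: RawSession[]): RawSession[] {
  return sessions.filter((s) => !s.excludedFromAnalytics);
}

// Remove one user's raw sessions in [from, to) plus every derived row built from them.
// Returns the number of raw sessions deleted.
function hardDelete(store: DataStore, userId: string, from: string, to: string): number {
  const doomed = new Set(
    store.raw
      .filter((s) => s.userId === userId && s.createdAt >= from && s.createdAt < to)
      .map((s) => s.id),
  );
  store.raw = store.raw.filter((s) => !doomed.has(s.id));
  store.derived = store.derived.filter((d) => !doomed.has(d.sessionId));
  return doomed.size;
}
```

In PostgreSQL the same effect would more likely come from a single transaction deleting raw and derived rows together, so that a partial delete can never be observed.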

9. Key Runtime Sequences

9.1 Mirror message processing with safety gate

sequenceDiagram
    participant App as Mobile App
    participant API as Kalei API
    participant Safety as Safety Service
    participant Ent as Entitlement Service
    participant AI as AI Gateway
    participant Model as AI Provider
    participant DB as PostgreSQL
    participant Redis as Redis

    App->>API: POST /mirror/messages
    API->>Ent: Check plan/quota
    Ent->>Redis: Read counters
    Ent-->>API: Allowed
    API->>Safety: Crisis precheck
    alt Crisis detected
        Safety->>DB: Insert safety_event
        API-->>App: Crisis resources response
    else Not crisis
        API->>AI: Detect fragments prompt
        AI->>Model: Inference request
        Model-->>AI: Fragments with confidence
        AI-->>API: Validated structured result
        API->>DB: Save message + fragments
        API->>Redis: Increment usage counters
        API-->>App: Highlight payload
    end
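
The "Validated structured result" step in the sequence above could look roughly like the following sketch. The fragment fields (`start`, `end`, `label`, `confidence`) and the 0.6 confidence floor are assumptions, not a documented schema.

```typescript
// Hypothetical shape of one detected fragment; field names are assumptions.
interface Fragment {
  start: number; // character offset where the fragment begins
  end: number; // character offset where the fragment ends (exclusive)
  label: string; // model-assigned category
  confidence: number; // expected in [0, 1]
}

// Parse raw model output and keep only well-formed fragments that fit inside the message.
function validateFragments(raw: string, message: string, minConfidence = 0.6): Fragment[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return []; // malformed JSON: treat as "no fragments" rather than failing the request
  }
  if (!Array.isArray(parsed)) return [];

  const valid: Fragment[] = [];
  for (const item of parsed) {
    if (typeof item !== "object" || item === null) continue;
    const { start, end, label, confidence } = item as Record<string, unknown>;
    if (
      typeof start === "number" && Number.isInteger(start) &&
      typeof end === "number" && Number.isInteger(end) &&
      typeof label === "string" && label.length > 0 &&
      typeof confidence === "number" &&
      start >= 0 && end > start && end <= message.length &&
      confidence >= minConfidence && confidence <= 1
    ) {
      valid.push({ start, end, label, confidence });
    }
  }
  return valid;
}
```

Dropping invalid items instead of rejecting the whole response keeps the highlight payload usable when the model returns one bad offset among several good ones.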

9.2 Turn (Kaleidoscope) request

sequenceDiagram
    participant App as Mobile App
    participant API as Kalei API
    participant Ent as Entitlement Service
    participant Safety as Safety Service
    participant AI as AI Gateway
    participant Model as AI Provider
    participant DB as PostgreSQL
    participant Cost as Cost Guard

    App->>API: POST /turns
    API->>Ent: Validate tier + daily cap
    API->>Safety: Crisis precheck
    alt Crisis detected
        API-->>App: Crisis resources response
    else Safe
        API->>AI: Generate 3 reframes + micro-action
        AI->>Model: Inference stream
        Model-->>AI: Structured reframes
        AI-->>API: Response + token usage
        API->>Cost: Record token usage + budget check
        API->>DB: Save turn + metadata
        API-->>App: Stream final turn result
    end
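
As a sketch of the "Validate tier + daily cap" step, the check below uses an in-memory map in place of Redis. The tier names come from section 7.1; the cap values are placeholders, since this plan does not specify them.

```typescript
type Tier = "free" | "prism" | "prism_plus";

// Hypothetical daily Turn caps per tier; the real numbers are not defined in this plan.
const DAILY_TURN_CAP: Record<Tier, number> = { free: 3, prism: 20, prism_plus: Infinity };

// In-memory stand-in for the Redis counter; key is "<userId>:<YYYY-MM-DD>".
const turnCounters = new Map<string, number>();

// Returns true and consumes one unit if the user is still under today's cap.
function tryConsumeTurn(userId: string, tier: Tier, now: Date): boolean {
  const day = now.toISOString().slice(0, 10); // UTC calendar day
  const key = `${userId}:${day}`;
  const used = turnCounters.get(key) ?? 0;
  if (used >= DAILY_TURN_CAP[tier]) return false;
  turnCounters.set(key, used + 1);
  return true;
}
```

With Redis, the same check would typically be an atomic increment on a per-day key with an expiry, so concurrent requests cannot both slip under the cap.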

9.3 Weekly Spectrum aggregation (background)

sequenceDiagram
    participant Cron as Scheduler
    participant Worker as Spectrum Worker
    participant DB as PostgreSQL
    participant AI as AI Gateway
    participant Model as Batch Provider
    participant Push as Notification Service

    Cron->>Worker: Trigger weekly job
    Worker->>DB: Load eligible users + raw events
    Worker->>DB: Compute vectors and weekly aggregates
    Worker->>AI: Generate insight narratives from aggregates
    AI->>Model: Batch request
    Model-->>AI: Insight text
    AI-->>Worker: Validated summaries
    Worker->>DB: Upsert spectrum_weekly and monthly deltas
    Worker->>Push: Enqueue spectrum updated notifications
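
One property worth preserving in the weekly job is idempotency: rerunning it for the same week should produce the same rows. A minimal sketch, with hypothetical row shapes and an in-memory map standing in for `spectrum_weekly`:

```typescript
// Hypothetical per-session analysis row; field names are assumptions.
interface SessionAnalysis {
  userId: string;
  weekStart: string; // ISO date of the week start, e.g. "2026-02-09"
  fragmentCount: number;
}

interface WeeklyRow {
  userId: string;
  weekStart: string;
  sessions: number;
  fragments: number;
}

// Recompute weekly rows from scratch so that a rerun yields identical output.
function aggregateWeekly(rows: SessionAnalysis[]): Map<string, WeeklyRow> {
  const weekly = new Map<string, WeeklyRow>();
  for (const r of rows) {
    const key = `${r.userId}|${r.weekStart}`;
    const row = weekly.get(key) ?? { userId: r.userId, weekStart: r.weekStart, sessions: 0, fragments: 0 };
    row.sessions += 1;
    row.fragments += r.fragmentCount;
    weekly.set(key, row);
  }
  return weekly;
}

// Upsert: replace any existing row for the same (userId, weekStart) key.
function upsertWeekly(store: Map<string, WeeklyRow>, fresh: Map<string, WeeklyRow>): void {
  fresh.forEach((row, key) => store.set(key, row));
}
```

Because each run replaces rows instead of adding to them, a retried or duplicated job does not double-count sessions.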

10. API Surface (v1)

10.1 Auth and profile

POST /auth/register
POST /auth/login
POST /auth/refresh
POST /auth/logout
GET /me
PATCH /me/profile

10.2 Mirror

POST /mirror/sessions
POST /mirror/messages
POST /mirror/fragments/{id}/reframe
POST /mirror/sessions/{id}/close
GET /mirror/sessions
DELETE /mirror/sessions/{id}

10.3 Turn

POST /turns
GET /turns
GET /turns/{id}
POST /turns/{id}/save

10.4 Lens

POST /lens/goals
GET /lens/goals
POST /lens/goals/{id}/actions
POST /lens/actions/{id}/complete
GET /lens/affirmation/today

10.5 Spectrum

GET /spectrum/weekly
GET /spectrum/monthly
POST /spectrum/reset
POST /spectrum/exclusions

10.6 Billing and entitlements

POST /billing/webhooks/apple
POST /billing/webhooks/google
GET /billing/entitlements

11. Security, Safety, and Compliance Architecture

11.1 Security controls

TLS everywhere (edge proxy to API origin and service egress).
JWT access tokens (short TTL) + rotating refresh tokens.
Password hashing with Argon2id (preferred) or bcrypt with strong cost factor.
Row ownership checks are enforced in the API, optionally backed by PostgreSQL row-level security (RLS) for defense in depth.
Secrets in environment vault; never in client bundle.
Audit logging for auth events, entitlement changes, deletes, and safety events.
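
Refresh token rotation is often paired with reuse detection: if an already-used token is presented again, the whole token family is revoked. The sketch below assumes that policy (this plan only states "rotating refresh tokens") and uses in-memory maps in place of the `refresh_tokens` table; all names are illustrative.

```typescript
import { createHash, randomBytes } from "node:crypto";

interface StoredToken {
  userId: string;
  familyId: string; // all tokens descended from one login share a family
  used: boolean;
}

// In-memory stand-ins for the refresh_tokens table; only token hashes are stored.
const storedTokens = new Map<string, StoredToken>();
const revokedFamilies = new Set<string>();

const sha256 = (t: string): string => createHash("sha256").update(t).digest("hex");

// Issue a new opaque refresh token, starting a new family unless one is given.
function issueRefreshToken(userId: string, familyId: string = randomBytes(16).toString("hex")): string {
  const token = randomBytes(32).toString("hex");
  storedTokens.set(sha256(token), { userId, familyId, used: false });
  return token;
}

// Exchange a refresh token for a new one; presenting a used token revokes the family.
function rotateRefreshToken(token: string): string | null {
  const rec = storedTokens.get(sha256(token));
  if (!rec || revokedFamilies.has(rec.familyId)) return null;
  if (rec.used) {
    revokedFamilies.add(rec.familyId); // possible theft: invalidate every descendant
    return null;
  }
  rec.used = true;
  return issueRefreshToken(rec.userId, rec.familyId);
}
```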

11.2 Data protection

Encryption at rest for disk volumes and database backups.
Column-level encryption for highly sensitive text fields (Mirror message content).
Data minimization for analytics: Spectrum reads vectors and aggregates by default.
User rights flows: export, per-item delete, account delete, Spectrum reset.
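
Column-level encryption of Mirror message content could be done in the application layer with authenticated encryption. The following is a sketch using AES-256-GCM from Node's built-in `crypto` module; the `iv | tag | ciphertext` storage layout and the function names are assumptions, and key management (rotation, per-user keys) is out of scope here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypt one text field. Output layout (an assumption): iv | tag | ciphertext, base64-encoded.
function encryptField(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, fresh for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte authentication tag
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

// Decrypt a field produced by encryptField; throws if the key is wrong or the data was altered.
function decryptField(encoded: string, key: Buffer): string {
  const data = Buffer.from(encoded, "base64");
  const iv = data.subarray(0, 12);
  const tag = data.subarray(12, 28);
  const ciphertext = data.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

The key must be exactly 32 bytes. GCM also authenticates the ciphertext, so tampering with a stored value surfaces as a decryption error.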

11.3 Safety architecture

Multi-stage crisis filter:
1. Deterministic keyword and pattern pass.
2. Low-latency model confirmation where needed.
3. Hardcoded crisis response templates and hotline resources.
Crisis-level content is never reframed.
Safety events are logged and monitored for false-positive/false-negative tuning.
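
A minimal sketch of how the first two stages might compose. The patterns are illustrative only and would need clinical review; the model check is injected as a function; failing closed on model errors is an assumption this plan does not state explicitly.

```typescript
type Stage1Result = "crisis" | "ambiguous" | "clear";
type SafetyVerdict = "crisis" | "safe";

// Illustrative patterns only; a production list would be reviewed and much broader.
const CRISIS_PATTERNS: RegExp[] = [/\bkill myself\b/i, /\bend my life\b/i, /\bsuicid/i];
// Phrases that are not conclusive alone and warrant the stage 2 model check.
const AMBIGUOUS_PATTERNS: RegExp[] = [/\bcan't go on\b/i, /\bno point\b/i];

// Stage 1: deterministic keyword and pattern pass.
function deterministicPass(text: string): Stage1Result {
  if (CRISIS_PATTERNS.some((p) => p.test(text))) return "crisis";
  if (AMBIGUOUS_PATTERNS.some((p) => p.test(text))) return "ambiguous";
  return "clear";
}

// Stage 2 is injected so any low-latency classifier can be plugged in.
type ModelCheck = (text: string) => Promise<boolean>;

async function crisisPrecheck(text: string, modelCheck: ModelCheck): Promise<SafetyVerdict> {
  const stage1 = deterministicPass(text);
  if (stage1 === "crisis") return "crisis"; // a hard match never needs confirmation
  if (stage1 === "clear") return "safe";
  try {
    return (await modelCheck(text)) ? "crisis" : "safe";
  } catch {
    return "crisis"; // fail closed: if the model is unavailable, do not reframe
  }
}
```

A `"crisis"` verdict would then select a stage 3 hardcoded template, never model-generated text.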

12. Reliability and Performance

12.1 Initial SLO targets

API availability: 99.5% monthly (Phase 1), 99.9% target by Phase 2.
Turn and Mirror response latency:
- p50 < 1.8s
- p95 < 3.5s
Weekly Spectrum jobs completed within 2 hours of scheduled run.
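
To make these targets concrete: over a 30-day month, 99.5% availability allows roughly 216 minutes of downtime and 99.9% allows roughly 43 minutes. The sketch below computes that budget and checks latency samples against the p50/p95 targets. The nearest-rank percentile method and the function names are choices made for illustration.

```typescript
// Allowed downtime in minutes for an availability target over a period of `days`.
function errorBudgetMinutes(availability: number, days = 30): number {
  return (1 - availability) * days * 24 * 60;
}

// Nearest-rank percentile over latency samples in milliseconds (assumes a non-empty array).
function percentile(samples: number[], p: number): number {
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Targets from 12.1: p50 < 1.8 s and p95 < 3.5 s for Turn and Mirror.
function meetsLatencySlo(samplesMs: number[]): boolean {
  return percentile(samplesMs, 50) < 1800 && percentile(samplesMs, 95) < 3500;
}
```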

12.2 Resilience patterns

Idempotency keys on write endpoints.
AI provider timeout + retry policy with circuit breaker.
Graceful degradation hierarchy when budget/latency pressure occurs:
1. Degrade Lens generation first (template fallback).
2. Keep Turn and Mirror available.
3. Pause non-critical Spectrum generation if needed.
Dead-letter queue for failed async jobs.
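
The degradation order above can be expressed as a small pure function. Only the order comes from this plan; the trigger thresholds here (80% and 95% of budget, the 3.5 s p95 target) are borrowed from sections 12.1 and 13.2 as an assumption about how the stages would be triggered.

```typescript
interface LoadPressure {
  budgetUsed: number; // fraction of the monthly AI spend cap already consumed
  p95LatencyMs: number; // current p95 latency of interactive endpoints
}

interface FeatureModes {
  turn: "model";
  mirror: "model";
  lens: "model" | "template";
  spectrum: "running" | "paused";
}

// Apply the degradation hierarchy; Turn and Mirror are never degraded here.
function degrade(p: LoadPressure): FeatureModes {
  const modes: FeatureModes = { turn: "model", mirror: "model", lens: "model", spectrum: "running" };
  const stressed = p.budgetUsed >= 0.8 || p.p95LatencyMs >= 3500;
  const severe = p.budgetUsed >= 0.95;
  if (stressed) modes.lens = "template"; // step 1: Lens falls back to templates first
  if (severe) modes.spectrum = "paused"; // step 3: pause non-critical Spectrum generation
  return modes; // step 2: Turn and Mirror stay on the model path throughout
}
```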

13. Observability and FinOps

13.1 Telemetry

Structured logs with request ID, user ID hash, feature, model, token usage, cost.
Metrics:
- request rate/error rate/latency by endpoint
- AI token usage and cost by feature
- quota denials and safety escalations
Tracing across API -> AI Gateway -> provider call.
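
A structured log entry with the fields listed above might be built as follows. Using a keyed HMAC for the user ID hash, truncating it to 16 hex characters, and the flat per-1,000-token price are all assumptions for illustration.

```typescript
import { createHmac } from "node:crypto";

// Keyed hash so raw user IDs never appear in logs; `secret` is a hypothetical config value.
function hashUserId(userId: string, secret: string): string {
  return createHmac("sha256", secret).update(userId).digest("hex").slice(0, 16);
}

interface AiCallLog {
  requestId: string;
  userIdHash: string;
  feature: "turn" | "mirror" | "lens" | "spectrum";
  model: string;
  inputTokens: number;
  outputTokens: number;
  costEur: number;
}

// Hypothetical flat price per 1,000 tokens; real pricing depends on the provider or hosting cost.
function estimateCostEur(totalTokens: number, eurPer1kTokens: number): number {
  return (totalTokens / 1000) * eurPer1kTokens;
}

// Serialize one entry as a single JSON line with a timestamp.
function formatAiCallLog(entry: AiCallLog): string {
  return JSON.stringify({ ts: new Date().toISOString(), ...entry });
}
```

A keyed hash, unlike a plain SHA-256 of the ID, cannot be reversed by hashing a list of known user IDs unless the secret also leaks.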

13.2 Cost controls

Global monthly AI spend cap and alert thresholds (50%, 80%, 95%).
Per-user daily token budget in Redis.
Feature-level cost envelope with model routing:
- Turn/Mirror: quality-first lane
- Lens/Spectrum narrative generation: cost-optimized lane as needed
Prompt caching for stable system prompts.
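
The alert thresholds and lane routing could be sketched as below. The 50/80/95% thresholds are from this section; switching Lens and Spectrum to the cost-optimized lane at 80% of the cap is an assumed reading of "as needed".

```typescript
// From the plan: alert at 50%, 80%, and 95% of the monthly AI spend cap.
const ALERT_THRESHOLDS = [0.5, 0.8, 0.95];

// Thresholds newly crossed when spend moves from `before` to `after` (same currency as `cap`).
function crossedThresholds(before: number, after: number, cap: number): number[] {
  return ALERT_THRESHOLDS.filter((t) => before < t * cap && after >= t * cap);
}

type Feature = "turn" | "mirror" | "lens" | "spectrum";
type Lane = "quality" | "cost_optimized";

// Turn and Mirror always use the quality-first lane; the others switch under budget pressure.
function chooseLane(feature: Feature, spend: number, cap: number): Lane {
  if (feature === "turn" || feature === "mirror") return "quality";
  return spend >= 0.8 * cap ? "cost_optimized" : "quality"; // 80% switch point is an assumption
}
```

Comparing spend before and after each recorded usage event means every threshold fires once per month, not on every request past it.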

14. Deployment Topology and Scaling Path

14.1 Phase 1 deployment (single-node)

flowchart LR
    EDGE[Caddy or Nginx Edge] --> NX[Nginx]
    NX --> API[API + Workers]
    API --> PG[(PostgreSQL)]
    API --> R[(Redis)]
    API --> AIP[AI Providers]

14.2 Phase evolution

flowchart LR
    p1[Phase 1 single VPS API DB Redis] --> p2[Phase 2 split DB keep API monolith]
    p2 --> p3[Phase 3 separate workers and scale API]
    p3 --> p4[Phase 4 optional service extraction]

14.3 Trigger-based scaling

Move DB off app node when p95 query latency stays above 120 ms or storage utilization exceeds 70%.
Add API replica when CPU > 70% sustained at peak and p95 latency breaches SLO.
Split workers when Spectrum jobs impact interactive endpoints.
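
These triggers can be evaluated mechanically from monitoring data. In the sketch below the metric field names are hypothetical and each value is assumed to already represent a sustained reading, not a momentary spike.

```typescript
// Hypothetical snapshot of sustained metrics for the single-node deployment.
interface NodeMetrics {
  dbP95QueryMs: number;
  storageUsedFraction: number; // 0..1
  peakCpuFraction: number; // 0..1, sustained CPU at peak traffic
  apiP95LatencyMs: number;
  spectrumJobsSlowInteractive: boolean; // true when batch jobs degrade interactive endpoints
}

type ScalingAction = "move_db_off_app_node" | "add_api_replica" | "split_workers";

const API_P95_SLO_MS = 3500; // p95 target from section 12.1

// Map the triggers in 14.3 to recommended actions.
function scalingActions(m: NodeMetrics): ScalingAction[] {
  const actions: ScalingAction[] = [];
  if (m.dbP95QueryMs > 120 || m.storageUsedFraction > 0.7) actions.push("move_db_off_app_node");
  if (m.peakCpuFraction > 0.7 && m.apiP95LatencyMs >= API_P95_SLO_MS) actions.push("add_api_replica");
  if (m.spectrumJobsSlowInteractive) actions.push("split_workers");
  return actions;
}
```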

15. Delivery Plan

15.1 Phase A (Weeks 1-4): Platform foundation

API skeleton, auth, profile, entitlements integration.
Postgres schema v1 and migrations.
Mirror + Turn endpoints with safety pre-check.
Usage metering and rate limiting.

15.2 Phase B (Weeks 5-8): Product completion for launch

Lens flows and Gallery history.
Push notifications and daily reminders.
Full observability, alerting, and incident runbooks.
Beta load testing and security hardening.

15.3 Phase C (Weeks 9-12): Spectrum baseline (Phase 2 readiness)

Vector extraction pipeline and aggregated tables.
Weekly batch jobs and dashboard endpoints.
Data exclusion controls and reset workflow.
Cost optimization pass on AI routing.

16. Risks and Mitigations

Risk	Impact	Mitigation
Reframe quality variance by provider/model	Core UX degradation	Keep AI Gateway abstraction + blind quality harness + model canary rollout.
Safety false negatives	User harm and loss of trust	Defense-in-depth crisis filter + explicit no-reframe crisis policy + monitoring and review loop.
AI cost spikes	Margin compression	Hard spend caps, per-feature budgets, degradation order, model fallback lanes.
Single-node bottlenecks	Latency and availability issues	Trigger-based scaling plan and early instrumentation.
Sensitive data handling errors	Compliance and trust risk	Encryption, strict retention controls, deletion workflows, audit logs.

17. Decision Log and Open Items

17.1 Decided in this plan

Self-hosted API + Postgres + Redis is the canonical launch architecture.
AI provider routing is built in from day one.
Safety is an explicit service and gate on all AI-facing paths.
Spectrum runs asynchronously over aggregated data.

17.2 Open decisions to finalize before build start

Final provider mix at launch:
- Option A: Qwen-first on all features via vLLM.
- Option B: Qwen for Turn/Mirror and smaller open-weight model for Lens/Spectrum narratives.
Exact hosting target for Phase 2 DB scaling (dedicated VPS vs managed Postgres).
Regional crisis resource strategy (US-first or multi-region at launch).

If approved, this document should become the architecture source of truth and supersede conflicting details in older planning docs.
