From 0835cadfd9b9f742011255957fa2921551563c45 Mon Sep 17 00:00:00 2001 From: Matt Date: Thu, 12 Feb 2026 00:07:12 +0100 Subject: [PATCH] Add comprehensive project documentation and phase plans Includes architecture, infrastructure, AI model comparison, getting started guide, and detailed phase-by-phase development roadmap. Co-Authored-By: Claude Sonnet 4.5 --- docs/codex phase documents/README.md | 51 ++ .../phase-0-groundwork-and-dev-environment.md | 185 +++++++ .../phase-1-platform-foundation.md | 235 ++++++++ .../phase-2-core-experience-build.md | 210 ++++++++ .../phase-3-launch-readiness-and-hardening.md | 167 ++++++ .../phase-4-spectrum-and-scale.md | 158 ++++++ docs/kalei-ai-model-comparison.md | 172 ++++++ docs/kalei-getting-started.md | 326 +++++++++++ docs/kalei-infrastructure-plan.md | 500 +++++++++++++++++ docs/kalei-system-architecture-plan.md | 509 ++++++++++++++++++ 10 files changed, 2513 insertions(+) create mode 100644 docs/codex phase documents/README.md create mode 100644 docs/codex phase documents/phase-0-groundwork-and-dev-environment.md create mode 100644 docs/codex phase documents/phase-1-platform-foundation.md create mode 100644 docs/codex phase documents/phase-2-core-experience-build.md create mode 100644 docs/codex phase documents/phase-3-launch-readiness-and-hardening.md create mode 100644 docs/codex phase documents/phase-4-spectrum-and-scale.md create mode 100644 docs/kalei-ai-model-comparison.md create mode 100644 docs/kalei-getting-started.md create mode 100644 docs/kalei-infrastructure-plan.md create mode 100644 docs/kalei-system-architecture-plan.md diff --git a/docs/codex phase documents/README.md b/docs/codex phase documents/README.md new file mode 100644 index 0000000..47bbbcd --- /dev/null +++ b/docs/codex phase documents/README.md @@ -0,0 +1,51 @@ +# Codex Phase Documents + +Last updated: 2026-02-10 + +This folder contains the deep execution plan for building Kalei from zero to launch and then Phase 2 growth. 
+ +Read in order: + +1. `phase-0-groundwork-and-dev-environment.md` +2. `phase-1-platform-foundation.md` +3. `phase-2-core-experience-build.md` +4. `phase-3-launch-readiness-and-hardening.md` +5. `phase-4-spectrum-and-scale.md` + +## Phase map + +- Phase 0: Groundwork and Dev Environment + - Goal: stable tooling, accounts, repo standards, and local infrastructure. +- Phase 1: Platform Foundation + - Goal: production-quality backend skeleton, auth, entitlements, core data model. +- Phase 2: Core Experience Build + - Goal: ship Mirror, Turn, Lens end-to-end in mobile + API. +- Phase 3: Launch Readiness and Hardening + - Goal: safety, billing, reliability, compliance, app store readiness. +- Phase 4: Spectrum and Scale + - Goal: analytics pipeline, weekly and monthly insights, scaling controls. + +## Definition of progression + +Do not start the next phase until the current phase exit checklist is complete. + +If a phase slips, reduce scope but do not skip quality gates for: + +- security +- safety +- observability +- data integrity + +## Tooling policy + +These phase docs assume an open-source-first stack: + +- Gitea for source control and CI +- GlitchTip for error tracking +- PostHog self-hosted for product analytics +- Ollama (local) and vLLM (staging/prod) for open-weight model serving + +Platform exceptions remain for mobile distribution and push: + +- Apple App Store and Google Play billing/distribution APIs +- APNs and FCM delivery infrastructure diff --git a/docs/codex phase documents/phase-0-groundwork-and-dev-environment.md b/docs/codex phase documents/phase-0-groundwork-and-dev-environment.md new file mode 100644 index 0000000..4ef9aa6 --- /dev/null +++ b/docs/codex phase documents/phase-0-groundwork-and-dev-environment.md @@ -0,0 +1,185 @@ +# Phase 0 - Groundwork and Dev Environment + +Duration: 1-2 weeks +Primary owner: Founder + coding assistant + +## 1. 
Objective + +Build a stable base so feature work can move fast without breaking: + +- all required accounts are created +- local stack boots reliably +- repo structure and standards are in place +- CI checks run on every pull request + +## 2. Prerequisites + +- Read `docs/kalei-getting-started.md` +- Read `docs/kalei-system-architecture-plan.md` + +## 3. Outcomes + +By the end of Phase 0 you will have: + +- local Postgres and Redis running via Docker +- mobile app bootstrapped with Expo +- API service bootstrapped with Fastify +- initial DB migration system in place +- lint, format, and test commands working +- CI pipeline validating every PR + +## 4. Deep Work Breakdown + +## 4.1 Access and Account Setup + +Task list: + +1. Set up Gitea (self-hosted or managed) for source control and CI. +2. Set up open-weight model serving accounts and endpoints (Ollama local, vLLM target host). +3. Create GlitchTip project for API and mobile error tracking. +4. Create PostHog self-hosted project for product analytics. +5. Set up DNS (PowerDNS self-hosted or managed DNS provider) and add domain. +6. Confirm Apple Developer and Google Play Console access (required for app distribution). + +Deliverables: + +- shared credential inventory (local secure password manager) +- documented secret naming convention + +## 4.2 Repository and Branching Standards + +Task list: + +1. Define branch policy: `main`, short-lived feature branches. +2. Define PR checklist template. +3. Add CODEOWNERS or at least reviewer policy. +4. Add issue templates for bug and feature requests. + +Deliverables: + +- `CONTRIBUTING.md` +- `.gitea/pull_request_template.md` (or repository PR template equivalent) +- `.gitea/ISSUE_TEMPLATE/*` (or repository issue template equivalent) + +## 4.3 Local Development Environment + +Task list: + +1. Install and verify Git, Node, npm, Docker, Expo CLI. +2. Add Docker compose for Postgres + Redis. +3. Create `.env.example` for API and mobile. +4. 
Add one-command local start script. + +Deliverables: + +- `infra/docker/docker-compose.yml` +- `services/api/.env.example` +- `apps/mobile/.env.example` +- root `Makefile` or npm scripts for local startup + +## 4.4 API and Mobile Skeletons + +Task list: + +1. Create Fastify app with health endpoint. +2. Create Expo app with tabs template. +3. Add API client module to mobile app. +4. Show backend health status in app. + +Deliverables: + +- API running on local port +- mobile app able to read API response + +## 4.5 Data and Migration Baseline + +Task list: + +1. Choose migration tool (for example, `node-pg-migrate`, `drizzle`, or `knex`). +2. Create first migration set for identity tables. +3. Add migration run and rollback commands. +4. Add seed command for local dev data. + +Minimum initial tables: + +- users +- profiles +- auth_sessions +- refresh_tokens + +## 4.6 Quality and Automation Baseline + +Task list: + +1. Add ESLint + Prettier for API and mobile. +2. Add API unit test framework and one integration test. +3. Configure Gitea Actions (or Woodpecker CI) for lint + test. +4. Add commit hooks (optional but recommended) using `husky`. + +Deliverables: + +- passing CI on every push/PR +- at least one passing API integration test + +## 5. Suggested Day-by-Day Plan + +Day 1: + +- account setup +- tooling install +- repo folder scaffold + +Day 2: + +- docker compose and env files +- API skeleton with `/health` + +Day 3: + +- Expo app setup +- mobile to API health call + +Day 4: + +- migration tooling and first migrations +- baseline seed script + +Day 5: + +- linting and tests +- CI setup +- first stable baseline commit + +## 6. Validation Checklist + +All items must be true: + +- `docker compose` starts Postgres and Redis with no errors. +- API starts and `GET /health` returns 200. +- Mobile app loads and displays backend health. +- Migrations can run on clean DB and rollback at least one step. +- CI runs lint and tests successfully. + +## 7. 
Exit Criteria + +You can exit Phase 0 when: + +- no manual setup surprises remain for a fresh machine +- all team members can run the stack locally in under 30 minutes +- baseline quality checks are automated + +## 8. Platform exceptions + +These are not open source, but required for shipping mobile apps: + +- Apple App Store tooling and APIs +- Google Play tooling and APIs + +## 9. Typical Pitfalls and Fixes + +- Pitfall: unclear `.env` expectations. + - Fix: complete `.env.example` files with comments. +- Pitfall: mobile app cannot reach local API on real device. + - Fix: use machine LAN IP, not localhost, for device testing. +- Pitfall: migration drift. + - Fix: never edit applied migration files; create a new migration. diff --git a/docs/codex phase documents/phase-1-platform-foundation.md b/docs/codex phase documents/phase-1-platform-foundation.md new file mode 100644 index 0000000..f734aa7 --- /dev/null +++ b/docs/codex phase documents/phase-1-platform-foundation.md @@ -0,0 +1,235 @@ +# Phase 1 - Platform Foundation + +Duration: 2-3 weeks +Primary owner: Backend-first with mobile stub integration + +## 1. Objective + +Build a production-grade platform foundation: + +- robust auth and session model +- entitlement checks for free vs paid plans +- core domain schema for Mirror, Turn, and Lens +- AI gateway scaffold with usage metering +- observability and error handling baseline + +## 2. Entry Criteria + +Phase 0 exit checklist must be complete. + +## 3. 
Core Scope + +## 3.1 API Module Setup + +Implement service modules: + +- auth +- profiles +- entitlements +- mirror (session/message skeleton) +- turn (request skeleton) +- lens (goal/action skeleton) +- ai_gateway +- usage_cost +- safety (precheck skeleton) + +Each module needs: + +- route handlers +- input/output schema validation +- service layer +- repository/data access layer +- unit tests + +## 3.2 Identity and Access + +Implement: + +- email/password registration and login +- JWT access token (short TTL) +- refresh token rotation and revocation +- logout all sessions +- role model (at least `user`, `admin`) + +Security details: + +- hash passwords with Argon2id or bcrypt +- store refresh tokens hashed +- include device metadata per session + +## 3.3 Entitlement Model + +Implement plan model now, even before paywall UI is complete. + +Suggested plan keys: + +- `free` +- `prism` +- `prism_plus` + +Implement gates for: + +- turns per day +- mirror sessions per week +- spectrum access + +Integration approach: + +- no RevenueCat dependency +- ingest App Store Server Notifications directly +- ingest Google Play RTDN notifications directly +- maintain local entitlement snapshots as source of truth for authorization + +## 3.4 Data Model (Phase 1 schema) + +Create migrations for: + +- users +- profiles +- subscriptions +- entitlement_snapshots +- turns +- mirror_sessions +- mirror_messages +- mirror_fragments +- lens_goals +- lens_actions +- ai_usage_events +- safety_events + +Design requirements: + +- every row has `created_at`, `updated_at` where relevant +- index by `user_id` and key query timestamp +- soft delete where legal retention requires it + +## 3.5 AI Gateway Scaffold + +Implement a strict abstraction now: + +- provider adapter interface +- request envelope (feature, model, temperature, timeout) +- response normalization +- token usage extraction +- retry + timeout + circuit breaker policy + +Do not expose provider SDK directly in feature modules. 
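The adapter boundary above can be sketched in TypeScript. This is a minimal illustration, not the prescribed implementation — the names (`AiRequest`, `ProviderAdapter`, `callWithRetry`) and field choices are hypothetical, and a production gateway would add the circuit breaker and token-usage extraction listed above:

```typescript
// Hypothetical sketch of the AI gateway adapter boundary.
// Feature modules depend only on these types, never on a provider SDK.

interface AiRequest {
  feature: "mirror" | "turn" | "lens" | "spectrum";
  model: string;
  prompt: string;
  temperature: number;
  timeoutMs: number;
}

interface AiResponse {
  text: string;
  tokensIn: number;
  tokensOut: number;
  provider: string;
}

interface ProviderAdapter {
  name: string;
  generate(req: AiRequest): Promise<AiResponse>;
}

// Retry-with-timeout policy; a real gateway would also track failures
// for a circuit breaker and emit an ai_usage_event per call.
async function callWithRetry(
  adapter: ProviderAdapter,
  req: AiRequest,
  maxAttempts = 2
): Promise<AiResponse> {
  let lastErr: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    try {
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(
          () => reject(new Error("ai_gateway_timeout")),
          req.timeoutMs
        );
      });
      return await Promise.race([adapter.generate(req), timeout]);
    } catch (err) {
      lastErr = err;
    } finally {
      if (timer) clearTimeout(timer);
    }
  }
  throw lastErr;
}
```

Because callers only see `ProviderAdapter`, swapping Ollama for vLLM (or any hosted provider) is confined to one adapter file, which is what the code-review rule above enforces.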
+ +## 3.6 Safety Precheck Skeleton + +Implement now even if rule set is basic: + +- deterministic keyword precheck +- safety event logging +- return safety status to caller + +Mirror and Turn endpoints must call this precheck before generation. + +## 3.7 Usage Metering and Cost Guardrails + +Implement: + +- per-user usage counters in Redis +- endpoint-level rate limit middleware +- AI usage event write on every provider call +- per-feature daily budget checks + +## 3.8 Observability Baseline + +Implement: + +- structured logging with request IDs +- error tracking to GlitchTip +- latency and error metrics per endpoint +- AI cost metrics by feature + +## 4. Detailed Build Sequence + +Week 1: + +1. Finalize schema and migration files. +2. Implement auth and profile endpoints. +3. Add integration tests for auth flows. + +Week 2: + +1. Implement entitlements and plan gating middleware. +2. Implement AI gateway interface and one real provider adapter. +3. Implement Redis rate limits and usage counters. + +Week 3: + +1. Implement Mirror and Turn endpoint skeletons with safety precheck. +2. Implement Lens goal and action skeleton endpoints. +3. Add complete observability hooks and dashboards. + +## 5. API Contract Minimum For End Of Phase + +Auth: + +- `POST /auth/register` +- `POST /auth/login` +- `POST /auth/refresh` +- `POST /auth/logout` +- `GET /me` + +Entitlements: + +- `GET /billing/entitlements` +- webhook endpoints for App Store and Google Play billing event ingestion + +Feature skeleton: + +- `POST /mirror/sessions` +- `POST /mirror/messages` +- `POST /turns` +- `POST /lens/goals` + +## 6. Testing Requirements + +Minimum automated coverage: + +- auth happy path and invalid credential path +- token refresh rotation path +- entitlement denial for free limits +- safety precheck path for crisis keyword match +- AI gateway timeout and fallback behavior + +Recommended: + +- basic load test for auth + turn skeleton endpoints + +## 7. 
Phase Deliverables + +Code deliverables: + +- migration files for core schema +- API modules with tests +- Redis-backed rate limit and usage tracking +- AI gateway abstraction with one provider +- safety precheck middleware + +Operational deliverables: + +- GlitchTip configured +- endpoint metrics visible +- API runbook for local and staging + +## 8. Exit Criteria + +You can exit Phase 1 when: + +- core auth model is stable and tested +- plan gating is enforced server-side +- Mirror/Turn/Lens endpoint skeletons are live +- AI calls only happen through AI gateway +- logs, metrics, and error tracking are active + +## 9. Risks To Watch + +- Risk: auth complexity balloons early. + - Mitigation: keep v1 auth strict but minimal; defer advanced IAM. +- Risk: schema churn from feature uncertainty. + - Mitigation: maintain a schema decision log and avoid premature optimization. +- Risk: provider coupling in feature code. + - Mitigation: enforce gateway adapter pattern in code review. diff --git a/docs/codex phase documents/phase-2-core-experience-build.md b/docs/codex phase documents/phase-2-core-experience-build.md new file mode 100644 index 0000000..27e2be9 --- /dev/null +++ b/docs/codex phase documents/phase-2-core-experience-build.md @@ -0,0 +1,210 @@ +# Phase 2 - Core Experience Build + +Duration: 3-5 weeks +Primary owner: Mobile + backend in parallel + +## 1. Objective + +Ship Kalei's core user experience end-to-end: + +- Mirror with fragment highlighting and inline reframe +- Turn generation with 3 perspectives and micro-action +- Lens goals, daily actions, and daily focus/affirmation flow +- Gallery/history views for user continuity + +## 2. Entry Criteria + +Phase 1 exit checklist complete. + +## 3. 
Product Scope In This Phase + +## 3.1 Mirror (Awareness) + +Required behavior: + +- user starts mirror session +- user submits messages +- backend runs safety precheck first +- backend runs fragment detection on safe content +- app highlights detected fragments above confidence threshold +- user taps fragment for inline reframe +- user closes session and receives reflection summary + +Backend work: + +- finalize `mirror_sessions`, `mirror_messages`, `mirror_fragments` +- add close-session reflection endpoint +- add mirror session list/detail endpoints + +Mobile work: + +- mirror compose UI +- highlight rendering for detected fragment ranges +- tap-to-reframe interaction card +- session close and reflection display + +## 3.2 Turn (Kaleidoscope) + +Required behavior: + +- user submits a fragment or thought +- backend runs safety precheck +- backend generates 3 reframed perspectives +- backend returns micro-action (if-then) +- user can save turn to gallery + +Backend work: + +- finalize `turns` table and categories +- add save/unsave state +- add history list endpoint + +Mobile work: + +- turn input and loading animation +- display 3 patterns + micro-action +- save to gallery and view history + +## 3.3 Lens (Direction) + +Required behavior: + +- user creates one or more goals +- app generates or stores daily action suggestions +- user can mark actions complete +- optional daily affirmation/focus shown + +Backend work: + +- finalize `lens_goals`, `lens_actions` +- daily action generation endpoint +- daily affirmation endpoint through AI gateway + +Mobile work: + +- goal creation UI +- daily action checklist UI +- completion updates and streak indicator + +## 4. 
Deep Technical Workstreams + +## 4.1 Prompt and Output Contracts + +Create strict prompt templates and JSON output contracts per feature: + +- Mirror fragment detection +- Mirror inline reframe +- Turn multi-pattern output +- Lens daily focus output + +Require server-side validation of AI output shape before returning to clients. + +## 4.2 Safety Integration + +At this phase safety must be complete for user-facing flows: + +- all Mirror and Turn requests pass safety gate +- crisis response path returns resource payload, not reframe payload +- safety events are queryable for audit + +## 4.3 Entitlement Enforcement + +Enforce in API middleware: + +- free turn daily limits +- free mirror weekly limits +- spectrum endpoint lock for non-entitled users + +Add clear response codes and client UI handling for plan limits. + +## 4.4 Performance Targets + +Set targets now and test against them: + +- Mirror fragment detection p95 under 3.5s +- Turn generation p95 under 3.5s +- client screen transitions under 300ms for cached navigation + +## 5. Detailed Build Plan + +Week 1: + +- finish Mirror backend and basic mobile UI +- complete fragment highlight rendering + +Week 2: + +- finish inline reframe flow and session reflections +- add Mirror history and session detail view + +Week 3: + +- finish Turn backend and mobile flow +- add save/history integration + +Week 4: + +- finish Lens goals and daily actions +- add daily focus/affirmation flow + +Week 5 (optional hardening week): + +- optimize latency +- improve retry and offline handling +- run end-to-end QA pass + +## 6. Test Plan + +Unit tests: + +- prompt builder functions +- AI output validators +- entitlement middleware +- safety decision functions + +Integration tests: + +- full Mirror message lifecycle +- full Turn generation lifecycle +- Lens action completion lifecycle + +Manual QA matrix: + +- normal usage +- plan-limit blocked usage +- low-connectivity behavior +- crisis-language safety behavior + +## 7. 
Deliverables + +Functional deliverables: + +- Mirror v1 complete +- Turn v1 complete +- Lens v1 complete +- Gallery/history v1 complete + +Engineering deliverables: + +- stable endpoint contracts +- documented prompt versions +- meaningful test coverage for critical flows +- feature-level latency and error metrics + +## 8. Exit Criteria + +You can exit Phase 2 when: + +- users can complete Mirror -> Turn -> Lens flow end-to-end +- plan limits and safety behavior are consistent and test-backed +- no critical P0 bugs in core user paths +- telemetry confirms baseline latency and reliability targets + +## 9. Risks To Watch + +- Risk: output variability from model causes UI breakage. + - Mitigation: strict response schema validation and fallback copy. +- Risk: too much feature scope in one pass. + - Mitigation: ship v1 flows first, defer advanced UX polish. +- Risk: latency drift from complex prompts. + - Mitigation: simplify prompts and use cached static context. diff --git a/docs/codex phase documents/phase-3-launch-readiness-and-hardening.md b/docs/codex phase documents/phase-3-launch-readiness-and-hardening.md new file mode 100644 index 0000000..e27ee90 --- /dev/null +++ b/docs/codex phase documents/phase-3-launch-readiness-and-hardening.md @@ -0,0 +1,167 @@ +# Phase 3 - Launch Readiness and Hardening + +Duration: 2-4 weeks +Primary owner: Full stack + operations focus + +## 1. Objective + +Prepare Kalei for real users with production safeguards: + +- safety policy completion and crisis flow readiness +- subscription and entitlement reliability +- app and API operational stability +- privacy and compliance basics for app store approval + +## 2. Entry Criteria + +Phase 2 exit checklist complete. + +## 3. Scope + +## 3.1 Safety and Trust Hardening + +Tasks: + +1. finalize crisis keyword and pattern sets +2. validate crisis response templates and regional resources +3. add safety dashboards and alerting +4. 
add audit trail for safety decisions + +Validation goals: + +- crisis path returns under 1 second in most cases +- no crisis path returns reframing output + +## 3.2 Billing and Entitlements + +Tasks: + +1. complete App Store Server Notifications ingestion +2. complete Google Play RTDN ingestion +3. build reconciliation jobs for both stores (entitlements sync) +4. test expired, canceled, trial, billing retry, and restore scenarios +5. add paywall gating in all required clients + +Validation goals: + +- entitlement state converges within minutes after billing changes +- no premium endpoint access for expired plans + +## 3.3 Reliability Engineering + +Tasks: + +1. finalize health checks and readiness probes +2. add backup and restore procedures for Postgres +3. add Redis persistence strategy for critical counters if required +4. define incident severity levels and on-call workflow + +Validation goals: + +- verified DB restore from backup in staging +- runbook exists for API outage, DB outage, AI provider outage + +## 3.4 Security and Compliance Baseline + +Tasks: + +1. secrets rotation policy and documented process +2. verify transport security and secure headers +3. verify account deletion and data export flows +4. prepare privacy policy and terms for submission + +Validation goals: + +- basic security checklist signed off +- app store privacy disclosures map to real data flows + +## 3.5 Observability and Cost Control + +Tasks: + +1. define alerts for latency, error rate, and AI spend thresholds +2. implement monthly spend cap and automatic degradation rules +3. monitor feature-level token cost dashboards + +Validation goals: + +- alert thresholds tested in staging +- degradation path verified (Lens fallback first) + +## 3.6 Beta and Release Pipeline + +Tasks: + +1. set up TestFlight internal/external testing +2. set up Android internal testing track +3. run beta cycle with scripted feedback collection +4. 
triage and fix launch-blocking defects + +Validation goals: + +- no unresolved launch-blocking defects +- release checklist complete for both stores + +## 4. Suggested Execution Plan + +Week 1: + +- safety hardening and billing reconciliation +- initial reliability runbooks + +Week 2: + +- security/compliance checks +- backup and restore drills +- full observability alert tuning + +Week 3: + +- TestFlight and Play internal beta +- defect triage and fixes + +Week 4 (if needed): + +- final store submission materials +- go/no-go readiness review + +## 5. Release Checklists + +## 5.1 API release checklist + +- migration plan reviewed +- rollback plan documented +- dashboards green +- error budget acceptable + +## 5.2 Mobile release checklist + +- build reproducibility verified +- crash-free session baseline from beta acceptable +- paywall and entitlement states correct +- copy and metadata final + +## 5.3 Business and policy checklist + +- privacy policy URL live +- terms URL live +- support contact available +- crisis resources configured for launch regions + +## 6. Exit Criteria + +You can exit Phase 3 when: + +- app is store-ready with stable entitlement behavior +- safety flow is verified and monitored +- operations runbooks and alerts are live +- backup and restore are proven in practice + +## 7. Risks To Watch + +- Risk: entitlement mismatch from webhook delays. + - Mitigation: scheduled reconciliation and idempotent webhook handling. +- Risk: launch-day AI latency spikes. + - Mitigation: timeout limits and graceful fallback behavior. +- Risk: compliance gaps discovered late. + - Mitigation: complete privacy mapping before store submission. 
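The idempotent webhook handling named in the mitigation above can be sketched as a dedupe-by-notification-ID step before any entitlement write. This is an illustrative sketch under assumed names (`BillingEvent`, `handleBillingEvent`); the in-memory `Set` stands in for a DB table with a unique key on the notification ID:

```typescript
// Illustrative sketch: idempotent ingestion of store billing webhooks.
// Both App Store Server Notifications and Play RTDN can redeliver events,
// so replays of the same notification must be a no-op.

interface BillingEvent {
  notificationId: string; // unique ID carried in the store payload
  userId: string;
  newPlan: "free" | "prism" | "prism_plus";
}

// Stand-ins for a uniquely-keyed processed_events table and the
// entitlement_snapshots table.
const processedIds = new Set<string>();
const entitlements = new Map<string, string>();

function handleBillingEvent(event: BillingEvent): "applied" | "duplicate" {
  if (processedIds.has(event.notificationId)) return "duplicate";
  processedIds.add(event.notificationId);
  entitlements.set(event.userId, event.newPlan);
  return "applied";
}
```

With this shape, the scheduled reconciliation job can safely re-emit events it fetches from the stores: duplicates are absorbed instead of double-applying plan changes.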
diff --git a/docs/codex phase documents/phase-4-spectrum-and-scale.md b/docs/codex phase documents/phase-4-spectrum-and-scale.md new file mode 100644 index 0000000..945572a --- /dev/null +++ b/docs/codex phase documents/phase-4-spectrum-and-scale.md @@ -0,0 +1,158 @@ +# Phase 4 - Spectrum and Scale + +Duration: 3-6 weeks +Primary owner: Data + backend + product analytics + +## 1. Objective + +Deliver Phase 2 intelligence features and scaling maturity: + +- Spectrum weekly and monthly insights +- aggregated analytics model over user activity +- asynchronous jobs and batch processing +- cost, reliability, and scaling controls for growth + +## 2. Entry Criteria + +Phase 3 exit checklist complete. + +## 3. Scope + +## 3.1 Spectrum Data Foundation + +Implement tables and data flow for: + +- session-level emotional vectors +- turn-level impact analysis +- weekly aggregates +- monthly aggregates + +Data design requirements: + +- user-level partition/index strategy for query speed +- clear retention and deletion behavior +- exclusion flags so users can omit sessions from analysis + +## 3.2 Aggregation Pipeline + +Build asynchronous jobs: + +1. post-session analysis job +2. weekly aggregation job +3. monthly narrative job + +Job engineering requirements: + +- idempotency keys +- retry with backoff +- dead-letter queue for failures +- metrics for queue depth and job duration + +## 3.3 Spectrum Insight Generation + +Implement AI-assisted summary generation using aggregated data only. 
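The aggregates-only constraint above can be made structural: build the insight prompt from derived fields so raw journal text is never available to leak. A minimal sketch, assuming a hypothetical `WeeklyAggregate` shape and `buildSpectrumPrompt` helper (neither is prescribed by the schema above):

```typescript
// Hypothetical sketch: Spectrum prompts are built from aggregate fields only.
// The type contains no raw user text, so the prompt cannot include any.

interface WeeklyAggregate {
  weekStart: string;
  dominantEmotions: string[]; // derived labels, not quotes from sessions
  turnCount: number;
  completionRate: number; // 0..1 over lens actions
}

function buildSpectrumPrompt(agg: WeeklyAggregate): string {
  return [
    `Week of ${agg.weekStart}:`,
    `dominant emotions: ${agg.dominantEmotions.join(", ")}`,
    `turns taken: ${agg.turnCount}`,
    `action completion rate: ${(agg.completionRate * 100).toFixed(0)}%`,
    "Write a short, supportive weekly insight from these aggregates only.",
  ].join("\n");
}
```

Keeping raw text out of the input type enforces the default at compile time; the output-validation and prompt-versioning rules below then cover the generation side.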
+ +Rules: + +- do not include raw user text in generated insights by default +- validate output tone and safety constraints +- version prompts and track prompt revisions + +## 3.4 Spectrum API and Client + +Backend endpoints: + +- weekly insight feed +- monthly deep dive +- spectrum reset +- exclusions management + +Mobile screens: + +- emotional landscape view +- pattern distribution view +- insight feed cards +- monthly summary panel + +## 3.5 Growth-Ready Scale Controls + +Implement scale milestones: + +- worker isolation from interactive API if needed +- database optimization and index tuning +- caching strategy for read-heavy insight endpoints +- cost-aware model routing for non-critical generation + +## 4. Detailed Execution Plan + +Week 1: + +- schema rollout for spectrum tables +- event ingestion hooks from Mirror/Turn/Lens + +Week 2: + +- implement post-session analysis and weekly aggregation jobs +- add metrics and retries + +Week 3: + +- implement monthly aggregation and narrative generation +- implement spectrum API endpoints + +Week 4: + +- mobile spectrum dashboard v1 +- push notification hooks for weekly summaries + +Week 5-6 (as needed): + +- performance tuning +- scale and cost optimization +- UX polish for insight comprehension + +## 5. Quality and Analytics Requirements + +Quality gates: + +- no raw-content leakage in Spectrum UI +- weekly job completion SLA met +- dashboard load times within agreed target + +Analytics requirements: + +- track spectrum engagement events +- track conversion impact from spectrum teaser to upgrade +- track retention lift for spectrum users vs non-spectrum users + +## 6. Deliverables + +Functional deliverables: + +- Spectrum dashboard v1 +- weekly and monthly insight generation +- user controls for exclusions and reset + +Engineering deliverables: + +- robust worker pipeline with retries and DLQ +- aggregated analytics tables with indexing strategy +- end-to-end observability for job health and costs + +## 7. 
Exit Criteria + +You can exit Phase 4 when: + +- weekly and monthly insights run on schedule reliably +- users can view, reset, and control analysis scope +- spectrum cost and performance stay inside defined envelopes +- data deletion behavior is verified for raw and derived records + +## 8. Risks To Watch + +- Risk: analytics pipeline complexity causes reliability issues. + - Mitigation: isolate workers and enforce idempotent jobs. +- Risk: insight quality is too generic. + - Mitigation: prompt iteration with rubric scoring and blinded review. +- Risk: costs drift with growing history windows. + - Mitigation: aggregate-first processing and strict feature budget controls. diff --git a/docs/kalei-ai-model-comparison.md b/docs/kalei-ai-model-comparison.md new file mode 100644 index 0000000..fa76037 --- /dev/null +++ b/docs/kalei-ai-model-comparison.md @@ -0,0 +1,172 @@ +# Kalei — AI Model Selection: Unbiased Analysis + +## The Question + +Which AI model should power a mental wellness app that needs to detect emotional fragments, generate empathetic perspective reframes, produce personalized affirmations, detect crisis signals, and analyze behavioral patterns over time? 
+ +--- + +## What Kalei Actually Needs From Its AI + +| Task | Quality Bar | Frequency | Latency Tolerance | +|------|------------|-----------|-------------------| +| **Mirror** — detect emotional fragments in freeform writing | High empathy + precision | 2-7x/week per user | 2-3s acceptable | +| **Kaleidoscope** — generate 3 perspective reframes | Highest — this IS the product | 3-10x/day per user | 2-3s acceptable | +| **Lens** — daily affirmation generation | Medium — structured output | 1x/day per user | 5s acceptable | +| **Crisis Detection** — flag self-harm/distress signals | Critical safety — zero false negatives | Every interaction | <1s preferred | +| **Spectrum** — weekly/monthly pattern analysis | High analytical depth | 1x/week batch | Minutes acceptable | + +The Kaleidoscope reframes are the core product experience. If they feel generic, robotic, or tone-deaf, users churn. This is the task where model quality matters most. + +--- + +## Venice.ai API — What You Get + +Since you already have Venice Pro ($10 one-time API credit), here are the relevant models and their pricing: + +### Best Venice Models for Kalei + +| Model | Input/MTok | Output/MTok | Cache Read | Context | Privacy | Notes | +|-------|-----------|------------|------------|---------|---------|-------| +| **DeepSeek V3.2** | $0.40 | $1.00 | $0.20 | 164K | Private | Strongest general model on Venice | +| **Qwen3 235B A22B** | $0.15 | $0.75 | — | 131K | Private | Best price-to-quality ratio | +| **Llama 3.3 70B** | $0.70 | $2.80 | — | 131K | Private | Meta's flagship open model | +| **Gemma 3 27B** | $0.12 | $0.20 | — | 203K | Private | Ultra-cheap, Google's open model | +| **Venice Small (Qwen3 4B)** | $0.05 | $0.15 | — | 33K | Private | Affirmation-tier only | + +### Venice Advantages +- **Privacy-first architecture** — no data retention, critical for mental health +- **OpenAI-compatible API** — trivial to swap in/out, same SDK +- **Prompt caching** on select models (DeepSeek V3.2 
confirmed) +- **You already pay for Pro** — $10 free API credit to test +- **No minimum commitment** — pure pay-per-use + +### Venice Limitations +- **No batch API** — can't get 50% off for Spectrum overnight processing +- **"Uncensored" default posture** — Venice optimizes for no guardrails, which is the OPPOSITE of what a mental health app needs. We must disable Venice system prompts and provide our own safety layer +- **No equivalent to Anthropic's constitutional AI** — crisis detection safety net is entirely on us +- **Smaller infrastructure** — less battle-tested at scale than Anthropic/OpenAI +- **Rate limits not publicly documented** — could be a problem at scale + +--- + +## Head-to-Head: Venice Models vs Claude Haiku 4.5 + +### Cost Per User Per Month + +Calculated using our established usage model: Free user = 3 Turns/day, 2 Mirror/week, daily Lens. + +| Model (via) | Free User/mo | Prism User/mo | vs Claude Haiku | +|-------------|-------------|--------------|-----------------| +| **Claude Haiku 4.5** (Anthropic) | $0.31 | $0.63 | baseline | +| **DeepSeek V3.2** (Venice) | ~$0.07 | ~$0.15 | **78% cheaper** | +| **Qwen3 235B** (Venice) | ~$0.05 | ~$0.10 | **84% cheaper** | +| **Llama 3.3 70B** (Venice) | ~$0.16 | ~$0.33 | **48% cheaper** | +| **Gemma 3 27B** (Venice) | ~$0.02 | ~$0.04 | **94% cheaper** | + +The cost difference is massive. At 200 DAU (Phase 2), monthly AI cost drops from ~$50 to ~$10-15. + +### Quality Comparison for Emotional Tasks + +This is the critical question. Here's what the research and benchmarks tell us: + +**Emotional Intelligence (EI) Benchmarks:** +- A 2025 Nature study tested LLMs on 5 standard EI tests. 
GPT-4, Claude 3.5 Haiku, and DeepSeek V3 all outperformed humans (81% avg vs 56% human avg) +- GPT-4 scored highest with a Z-score of 4.26 on the LEAS emotional awareness scale +- Claude models are specifically noted for "endless empathy" — excellent for therapeutic contexts but with dependency risk +- A blinded study found AI-generated psychological advice was rated MORE empathetic than human expert advice + +**Model-Specific Emotional Qualities:** + +| Model | Empathy Quality | Tone Consistency | Creative Reframing | Safety/Guardrails | +|-------|----------------|-----------------|-------------------|-------------------| +| Claude Haiku 4.5 | ★★★★☆ | ★★★★★ | ★★★★☆ | ★★★★★ | +| DeepSeek V3.2 | ★★★☆☆ | ★★★★☆ | ★★★☆☆ | ★★☆☆☆ | +| Qwen3 235B | ★★★★☆ | ★★★★☆ | ★★★☆☆ | ★★☆☆☆ | +| Llama 3.3 70B | ★★★☆☆ | ★★★☆☆ | ★★★☆☆ | ★★★☆☆ | +| Gemma 3 27B | ★★☆☆☆ | ★★★☆☆ | ★★☆☆☆ | ★★★☆☆ | + +**Key findings:** +- DeepSeek V3.2 is described as "slightly more mechanical in tone" with "repetition in phrasing" — problematic for daily therapeutic interactions +- Qwen3 is praised for "coherent extended conversations" and "tone consistency over long interactions" — actually quite good for our use case +- Llama 3.3 is solid but unremarkable for emotional tasks +- Gemma 3 27B is too small for the nuance we need in Mirror and Kaleidoscope +- Claude's constitutional AI training makes crisis detection significantly more reliable out-of-the-box + +--- + +## The Honest Recommendation + +### Option A: Venice-First (Lowest Cost) +**Primary: Qwen3 235B via Venice** for all features + +- Monthly cost at 200 DAU: ~$10-15 +- Pros: 84% cheaper, privacy-first, you already have the account +- Cons: No batch API (Spectrum costs more), no built-in safety net, requires extensive prompt engineering for emotional quality, crisis detection entirely self-built +- Risk: If reframe quality feels "off" or generic, the core product fails + +### Option B: Claude-First (Current Plan) +**Primary: Claude Haiku 4.5 via 
Anthropic**
+
+- Monthly cost at 200 DAU: ~$50
+- Pros: Best-in-class empathy and safety, prompt caching, batch API (50% off Spectrum), constitutional AI for crisis detection
+- Cons: 4-6x more expensive, Anthropic lock-in
+- Risk: Higher burn rate, but product quality is higher
+
+### Option C: Hybrid (Recommended) ★
+**Split by task criticality:**
+
+| Task | Model | Via | Why |
+|------|-------|-----|-----|
+| **Kaleidoscope reframes** | Qwen3 235B | Venice | Core product, needs quality BUT Qwen3 handles tone consistency well. Test extensively. |
+| **Mirror fragments** | Qwen3 235B | Venice | Structured detection task, Qwen3 is precise enough |
+| **Lens affirmations** | Venice Small (Qwen3 4B) | Venice | Simple generation, doesn't need a big model |
+| **Crisis detection** | Application-layer keywords + Qwen3 235B | Venice + custom code | Keyword matching first, LLM confirmation second |
+| **Spectrum batch** | DeepSeek V3.2 | Venice | Analytical task, DeepSeek excels at structured analysis |
+
+**Estimated monthly cost at 200 DAU: ~$12-18** (vs $50 with Claude, vs $10 all-Qwen3)
+
+### Why Hybrid via Venice Wins
+
+1. **You already pay for Pro** — the $10 credit lets you prototype immediately
+2. **OpenAI-compatible API** — if Venice quality disappoints, swapping to Anthropic/Groq/OpenRouter is a 1-line base URL change
+3. **Privacy alignment** — Venice's no-data-retention policy is actually perfect for mental health data
+4. **Cost headroom** — at $12-18/mo AI cost, 3-4 Prism subscribers cover the AI bill at 200 DAU, and roughly 8 cover total infrastructure (see the revised cost model)
+5. 
**Qwen3 235B is genuinely good** — it's not a compromise model, it scores competitively on emotional tasks + +### The Critical Caveat: Safety Layer + +Venice's "uncensored" philosophy means we MUST build our own safety layer: + +``` +User input → Keyword crisis detector (local, instant) + → If flagged: hardcoded crisis response (no LLM needed) + → If clear: send to Venice API with our safety-focused system prompt + → Post-process: scan output for harmful patterns before showing to user +``` + +This adds development time but gives us MORE control than relying on any provider's built-in guardrails. + +--- + +## Revised Cost Model with Venice + +| Phase | DAU | AI Cost/mo | Total Infra/mo | Break-even Subscribers | +|-------|-----|-----------|----------------|----------------------| +| Phase 1 (0-500 users) | ~50 | ~$4 | ~$17 | **4 Prism @ $4.99** | +| Phase 2 (500-2K) | ~200 | ~$15 | ~$40 | **8 Prism** | +| Phase 3 (2K-10K) | ~1K | ~$60 | ~$110 | **22 Prism** | + +Compare to Claude-first: Phase 1 was $26/mo, now $17. Phase 2 was $90-100, now $40. That's significant runway extension. + +--- + +## Action Plan + +1. **Immediately**: Use your Venice Pro $10 credit to test Qwen3 235B with Kalei's actual system prompts +2. **Build a test harness**: Send 50 real emotional writing samples through both Qwen3 (Venice) and Claude Haiku, blind-rate the outputs +3. **If Qwen3 passes**: Go Venice-first, save 60-80% on AI costs +4. **If Qwen3 disappoints on reframes specifically**: Use Claude Haiku for Kaleidoscope only, Venice for everything else +5. **Build the safety layer regardless** — don't rely on any provider's guardrails for a mental health app + +The API is OpenAI-compatible, so the switching cost is near zero. Start cheap, validate quality, upgrade only where needed. 
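The "1-line base URL change" can be sketched concretely. This is an illustrative helper, not a confirmed integration: the Venice base URL and the model IDs are assumptions to verify against each provider's docs before use.

```typescript
// Provider-agnostic config for any OpenAI-compatible chat endpoint.
// URLs and model IDs below are illustrative assumptions, not verified values.
type Provider = "ollama" | "venice" | "gateway";

interface ProviderConfig {
  baseURL: string;
  apiKey: string;
  model: string;
}

function providerConfig(provider: Provider, apiKey: string): ProviderConfig {
  switch (provider) {
    case "ollama": // local dev via Ollama's built-in OpenAI-compatible endpoint
      return { baseURL: "http://localhost:11434/v1", apiKey, model: "qwen2.5:14b" };
    case "venice": // hosted open-weight models (verify URL and model ID)
      return { baseURL: "https://api.venice.ai/api/v1", apiKey, model: "qwen3-235b" };
    case "gateway": // any other OpenAI-compatible endpoint (Groq, OpenRouter, vLLM)
      return { baseURL: "https://gateway.example.com/v1", apiKey, model: "placeholder-model" };
  }
}
```

With the official `openai` npm package, switching providers is then `new OpenAI(providerConfig("venice", key))`; the chat-completion calls themselves never change.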
diff --git a/docs/kalei-getting-started.md b/docs/kalei-getting-started.md new file mode 100644 index 0000000..a441a8e --- /dev/null +++ b/docs/kalei-getting-started.md @@ -0,0 +1,326 @@ +# Kalei Getting Started Guide (Beginner Friendly) + +Last updated: 2026-02-10 +Audience: First-time app builders + +This guide explains the groundwork you need before coding, then gives you the exact first steps to start building Kalei. + +Reference architecture: `docs/kalei-system-architecture-plan.md` + +## 1. What You Are Building + +Kalei is a mobile-first mental wellness product with four product pillars: + +- Mirror: freeform writing with passive AI fragment detection. +- Turn (Kaleidoscope): structured AI reframing. +- Lens: goals, daily actions, and affirmations. +- Spectrum (Phase 2): weekly and monthly insight analytics. + +At launch, your implementation target is: + +- Mobile app: React Native + Expo. +- Backend API: Node.js + Fastify. +- Data: PostgreSQL + Redis. +- Source control and CI: Gitea + Gitea Actions (or Woodpecker CI). +- AI access: provider-agnostic AI Gateway using open-weight models (Ollama for local dev, vLLM for staging/prod). +- Billing and entitlements: self-hosted entitlement service (direct Apple App Store + Google Play verification, no RevenueCat dependency). + +## 2. How To Use This Document Set + +Read in this order: + +1. `docs/kalei-getting-started.md` (this file) +2. `docs/codex phase documents/README.md` +3. `docs/codex phase documents/phase-0-groundwork-and-dev-environment.md` +4. `docs/codex phase documents/phase-1-platform-foundation.md` +5. `docs/codex phase documents/phase-2-core-experience-build.md` +6. `docs/codex phase documents/phase-3-launch-readiness-and-hardening.md` +7. `docs/codex phase documents/phase-4-spectrum-and-scale.md` + +## 3. Groundwork Checklist (Before You Write Feature Code) + +## 3.1 Accounts You Need + +Create these accounts first so you do not block yourself later. 
+ +Must have for early development: + +- Gitea (source control and CI) +- Apple Developer Program (iOS distribution, required) +- Google Play Console (Android distribution, required) +- DNS provider account (or self-hosted DNS using PowerDNS) + +Strongly recommended now (not later): + +- GlitchTip (open-source error tracking) +- PostHog self-hosted (open-source product analytics) +- Domain registrar account (for `kalei.ai`) + +## 3.2 Local Tools You Need + +Install this baseline stack: + +- Git +- Node.js LTS (via `nvm`, recommended) +- npm (bundled with Node) or pnpm +- Docker Desktop (for PostgreSQL + Redis locally) +- VS Code (or equivalent IDE) +- Expo Go app on your phone (iOS/Android) +- Ollama (local open-weight model serving) + +Install and verify: + +```bash +# Git +git --version + +# nvm + Node LTS +nvm install --lts +nvm use --lts +node -v +npm -v + +# Docker +docker --version +docker compose version + +# Expo CLI (optional global; npx is also fine) +npx expo --version + +# Ollama +ollama --version +``` + +## 3.3 Decide Your Working Model + +Set these rules now: + +- Work in short feature slices with demoable outcomes. +- Every backend endpoint gets at least one automated test. +- No direct AI calls from client apps. +- No secrets in the repo, ever. +- Crisis-level text is never reframed. + +## 4. Recommended Monorepo Structure + +If you are starting from scratch, use this layout: + +```text +Kalei/ + apps/ + mobile/ + services/ + api/ + workers/ + packages/ + shared/ + infra/ + docker/ + scripts/ + docs/ +``` + +Why this structure: + +- Keeps mobile and backend isolated but coordinated. +- Lets you share schemas/types in `packages/shared`. +- Keeps infra scripts in one predictable place. + +## 5. Step-By-Step Initial Setup + +These are the first practical steps for week 1. 
+ +## Step 1: Initialize folders + +```bash +mkdir -p apps/mobile services/api services/workers packages/shared infra/docker infra/scripts +``` + +## Step 2: Bootstrap the mobile app + +```bash +npx create-expo-app@latest apps/mobile --template tabs +cd apps/mobile +npm install +cd ../.. +``` + +## Step 3: Bootstrap the API service + +```bash +mkdir -p services/api && cd services/api +npm init -y +npm install fastify @fastify/cors @fastify/helmet @fastify/sensible zod dotenv pino pino-pretty pg ioredis +npm install -D typescript tsx @types/node vitest supertest @types/supertest eslint prettier +npx tsc --init +cd ../.. +``` + +## Step 4: Bring up local PostgreSQL and Redis + +Create `infra/docker/docker-compose.yml`: + +```yaml +services: + postgres: + image: postgres:16 + environment: + POSTGRES_USER: kalei + POSTGRES_PASSWORD: kalei + POSTGRES_DB: kalei + ports: + - "5432:5432" + volumes: + - pg_data:/var/lib/postgresql/data + + redis: + image: redis:7 + ports: + - "6379:6379" + volumes: + - redis_data:/data + +volumes: + pg_data: + redis_data: +``` + +Start services: + +```bash +docker compose -f infra/docker/docker-compose.yml up -d +``` + +## Step 5: Create environment files + +Create: + +- `services/api/.env` +- `apps/mobile/.env` + +API `.env` minimum: + +```env +NODE_ENV=development +PORT=8080 +DATABASE_URL=postgres://kalei:kalei@localhost:5432/kalei +REDIS_URL=redis://localhost:6379 +JWT_ACCESS_SECRET=replace_me +JWT_REFRESH_SECRET=replace_me +AI_PROVIDER=openai_compatible +AI_BASE_URL=http://localhost:11434/v1 +AI_MODEL=qwen2.5:14b +AI_API_KEY=local-dev +GLITCHTIP_DSN=replace_me +POSTHOG_API_KEY=replace_me +POSTHOG_HOST=http://localhost:8000 +APPLE_SHARED_SECRET=replace_me +GOOGLE_PLAY_PACKAGE_NAME=com.kalei.app +``` + +Mobile `.env` minimum: + +```env +EXPO_PUBLIC_API_BASE_URL=http://localhost:8080 +EXPO_PUBLIC_ERROR_TRACKING_DSN=replace_me +``` + +## Step 6: Create your first backend health endpoint + +Create `/health` returning status, uptime, and 
version. This is your first proof that the API is running. + +## Step 7: Connect mobile app to backend + +Add a tiny service function in mobile that calls `/health` and shows the result on screen. + +## Step 8: Add migrations baseline + +Create migration folders and your first migration for: + +- users +- profiles +- auth_sessions +- refresh_tokens + +Add a migration script and run it on local Postgres. + +## Step 9: Set up linting, formatting, and tests + +At minimum: + +- API: `npm run lint`, `npm run test` +- Mobile: `npm run lint` + +## Step 10: Push a clean baseline commit + +Your first stable commit should include: + +- mobile app runs +- API runs +- db and redis run in Docker +- health endpoint tested +- env files templated + +## 6. Non-Negotiable Ground Rules + +These reduce rework and production risk. + +- API-first contracts: Define backend request/response schema first. +- Version prompts: Keep prompt templates in source control with version tags. +- Idempotency: write endpoints should support idempotency keys. +- Structured logs: every request gets a request ID. +- Safety-first branching: crisis path is explicit and tested. + +## 7. Open-Source-First Policy + +Default to open-source tools unless there is a hard platform requirement. + +Open-source defaults for Kalei: + +- Git forge and CI: Gitea + Gitea Actions (or Woodpecker CI) +- Error tracking: GlitchTip +- Product analytics: PostHog self-hosted +- AI serving: Ollama (local), vLLM (staging/prod) +- Runtime and data: Fastify, PostgreSQL, Redis + +Unavoidable non-open-source dependencies: + +- Apple App Store distribution and StoreKit APIs +- Google Play distribution and Billing APIs +- APNs and FCM for push delivery + +## 8. What "Done" Looks Like For Groundwork + +Before Phase 1 starts, you should be able to demonstrate: + +- Local stack boot with one command (`docker compose ... up -d`). +- API starts with no errors and serves `/health`. +- Mobile app opens and calls API successfully. 
+- Baseline DB migrations run and rollback cleanly. +- One CI pipeline runs lint + tests on pull requests. + +## 9. Common Beginner Mistakes (Avoid These) + +- Building UI screens before backend contracts exist. +- Calling AI provider directly from the app. +- Waiting too long to add tests and logs. +- Keeping architecture only in your head (not in docs). +- Delaying safety and privacy work until late phases. + +## 10. Recommended Weekly Rhythm + +Use this cycle every week: + +1. Plan: define exact outcomes for the week. +2. Build: complete one vertical slice (API + mobile + data). +3. Verify: run tests, manual QA, and failure-path checks. +4. Demo: produce a short demo video for your own review. +5. Retrospective: capture blockers and adjust next week. + +## 11. Next Step + +Start with: + +- `docs/codex phase documents/phase-0-groundwork-and-dev-environment.md` + +Then execute each phase in order. diff --git a/docs/kalei-infrastructure-plan.md b/docs/kalei-infrastructure-plan.md new file mode 100644 index 0000000..8ed06cd --- /dev/null +++ b/docs/kalei-infrastructure-plan.md @@ -0,0 +1,500 @@ +# Kalei — Infrastructure & Financial Plan + +## The Constraint + +**Starting capital:** €0 – €2,000 max +**Monthly burn target:** Under €30/month at launch, scaling only when revenue justifies it +**Goal:** Ship a production-quality AI mental wellness app that can serve its first 1,000 users without going broke + +--- + +## 1. The AI Decision (This Is Everything) + +AI is 70–90% of Kalei's variable cost. Every other infrastructure decision is rounding error compared to this one. Here's the full landscape: + +### Option A: Claude API (Anthropic Direct) + +| Model | Input / MTok | Output / MTok | Quality | Speed | +|---|---|---|---|---| +| Haiku 4.5 | $1.00 | $5.00 | Near-frontier | Fast | +| Haiku 3 | $0.25 | $1.25 | Good | Very fast | +| Sonnet 4.5 | $3.00 | $15.00 | Frontier | Medium | + +**Haiku 4.5** is the sweet spot for Kalei. 
It's fast, cheap, and its emotional intelligence and nuanced language understanding are strong enough for cognitive distortion detection and reframing — the core product. Sonnet is overkill for most interactions; Opus is completely unnecessary.
+
+**Cost optimizations available:**
+- Prompt caching: 90% reduction on cached system prompt tokens (cache reads cost 0.1× base input)
+- Batch API: 50% discount for non-real-time processing (Spectrum analysis, weekly insights)
+- Smart prompt design: Keep system prompts tight, reuse cached context
+
+### Option B: Open-Source Models via API Providers
+
+| Provider | Model | Input / MTok | Output / MTok |
+|---|---|---|---|
+| OpenRouter / Fireworks | Qwen3-235B-A22B | $0.45 | $1.80 |
+| Together AI | Llama 3.3 70B | $0.20 | $0.20 |
+| Groq | Qwen3-32B | $0.29 | ~$0.39 |
+| DeepInfra | Various 7–70B | $0.05–$0.50 | $0.10–$1.50 |
+
+Cheaper on paper, but critical trade-off: **quality of emotional understanding.** Kalei's entire value proposition is that the AI "gets" your thinking patterns and offers genuinely insightful reframes. A bad reframe isn't just unhelpful — in a mental wellness context, it can feel dismissive or harmful. Claude Haiku 4.5 was specifically trained for nuance, safety, and emotional calibration.
+
+### Option C: Self-Hosted GPU (Eliminated)
+
+- Netcup vGPU (H200, 7GB VRAM): **€137/month** — more than four times our monthly burn target
+- Even cheap GPU providers (vast.ai, Lambda): $50–150/month for anything capable of running a 30B+ model
+- Requires DevOps expertise to maintain inference servers
+- **Verdict: Not viable at our budget.** Not even close. Revisit only at 5,000+ paying users. 
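Before settling the decision, it helps to quantify the prompt-caching discount from Option A with a rough estimator (a sketch only; prices in USD per million tokens, cache reads billed at the 0.1× rate stated above):

```typescript
// Effective cost of one call when part of the input hits the prompt cache.
// Cache reads bill at 0.1x the base input rate; output is always full price.
function callCostUSD(
  cachedInput: number, // tokens served from cache (e.g. the system prompt)
  freshInput: number, // uncached input tokens
  outputTokens: number,
  inputPerMTok: number,
  outputPerMTok: number,
): number {
  const billableInput = cachedInput * 0.1 + freshInput;
  return (billableInput * inputPerMTok + outputTokens * outputPerMTok) / 1_000_000;
}

// One Mirror-sized session on Haiku 4.5 ($1 in / $5 out):
// 800 cached + 900 fresh input tokens, ~1,050 output tokens.
const mirrorSession = callCostUSD(800, 900, 1050, 1.0, 5.0); // ~$0.0062
```

Per-call costs in this range are what make the per-user model below work out to cents per month.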
+
+### The Decision: Hybrid API Strategy
+
+**Primary engine:** Claude Haiku 4.5 via Anthropic API
+**Batch processing:** Claude Haiku 4.5 Batch API (50% off) for Spectrum analysis, weekly insights
+**Fallback / cost ceiling:** If costs spike, route simple tasks (affirmation generation, basic Lens content) to Qwen3-32B via Groq ($0.29/$0.39 per MTok) — 70–85% cheaper for tasks that don't require Claude's emotional depth
+
+This gives us Claude-quality where it matters (Mirror, Kaleidoscope, crisis detection) and an escape valve for cost control.
+
+---
+
+## 2. Per-User AI Cost Model
+
+Here's what a real user session looks like in tokens:
+
+### The Mirror (Freeform Writing + AI Highlights)
+
+| Component | Input Tokens | Output Tokens |
+|---|---|---|
+| System prompt (cached after first call) | ~800 | — |
+| User's writing (per session, ~300 words) | ~400 | — |
+| Fragment detection (5 highlights avg) | — | ~500 |
+| Inline reframe (per tap; totals assume 1 tap) | ~200 | ~150 |
+| Session Reflection | ~300 | ~400 |
+| **Total per Mirror session** | **~1,700** | **~1,050** |
+
+With prompt caching (system prompt cached): 800 cached at 0.1× + 900 fresh ≈ **980 billable input tokens**
+
+### The Kaleidoscope (One Turn)
+
+| Component | Input Tokens | Output Tokens |
+|---|---|---|
+| System prompt (cached) | ~600 | — |
+| User's fragment + context | ~300 | — |
+| 3 reframe perspectives | — | ~450 |
+| **Total per Turn** | **~900** | **~450** |
+
+With caching: ~360 billable input tokens
+
+### The Lens (Daily Affirmation)
+
+| Component | Input Tokens | Output Tokens |
+|---|---|---|
+| System prompt (cached) | ~400 | — |
+| User context + goals | ~200 | — |
+| Generated affirmation | — | ~100 |
+| **Total per daily affirmation** | **~600** | **~100** |
+
+With caching: ~240 billable input tokens
+
+### Monthly Usage Per Active User Profile
+
+**Free user** (3 Turns/day, 2 Mirror sessions/week, daily Lens):
+
+| Feature | Sessions/Month 
| Billable Input Tokens | Output Tokens | +|---|---|---|---| +| Kaleidoscope | 90 Turns | 32,400 | 40,500 | +| Mirror | 8 sessions | 7,840 | 8,400 | +| Lens | 30 affirmations | 7,200 | 3,000 | +| **Total** | | **47,440** | **51,900** | + +**Cost with Haiku 4.5:** (47,440 × $1.00 + 51,900 × $5.00) / 1,000,000 = **$0.047 + $0.260 = $0.31/month** + +**Prism subscriber** (unlimited usage, assume 2× free user + Spectrum): + +| Feature | Sessions/Month | Billable Input Tokens | Output Tokens | +|---|---|---|---| +| Kaleidoscope | 180 Turns | 64,800 | 81,000 | +| Mirror | 16 sessions | 15,680 | 16,800 | +| Lens | 30 affirmations | 7,200 | 3,000 | +| Spectrum (batch) | 4 analyses | 8,000 | 12,000 | +| **Total** | | **95,680** | **112,800** | + +**Cost with Haiku 4.5:** $0.096 + $0.564 = **$0.66/month** +Spectrum uses Batch API (50% off): saves ~$0.03, so effective = **~$0.63/month** + +**Reality check:** Most users won't hit max usage. Expect average active user cost of **$0.15–$0.40/month.** + +--- + +## 3. Infrastructure Stack + +### Server: Netcup VPS 1000 G12 + +| Spec | Value | +|---|---| +| CPU | 4 vCores (AMD EPYC) | +| RAM | 8 GB DDR5 ECC | +| Storage | 256 GB NVMe | +| Bandwidth | Unlimited, 2.5 Gbps | +| Location | Nuremberg, Germany | +| **Price** | **€8.45/month** (~$9.20) | + +This runs everything: API server, database, Redis cache, reverse proxy. Comfortably handles hundreds of concurrent users. Can upgrade to VPS 2000 (€15.59/mo) when we outgrow it. + +**What runs on this box:** +- Node.js / Express API server (or Fastify for speed) +- PostgreSQL 16 (direct install, not Supabase overhead) +- Redis (session cache, rate limiting, prompt cache keys) +- Nginx (reverse proxy, SSL termination, rate limiting) +- Certbot (free SSL via Let's Encrypt) + +### Why NOT Supabase Cloud + +Supabase Cloud Pro is $25/month — that's 3× our VPS cost and we'd still need a separate server for the API layer. 
Self-hosting Supabase via Docker is possible but adds ~2GB RAM overhead for all the services (GoTrue, PostgREST, Realtime, Storage, Kong). On an 8GB VPS, that leaves very little room. + +**Instead:** Run PostgreSQL directly. We get all the database functionality we need (Row Level Security, triggers, functions, JSON support) without the Supabase services overhead. We build our own auth layer (JWT-based, simple) and our own API. This is leaner, cheaper, and gives us full control. + +If we later want Supabase features (real-time subscriptions, storage), we can self-host just the components we need. + +### Domain & DNS + +| Item | Cost | +|---|---| +| kalei.ai domain | ~$50–70/year (~$5/month) | +| Cloudflare DNS (free tier) | $0 | +| Cloudflare CDN/DDoS (free tier) | $0 | + +### App Deployment & Distribution + +| Item | Cost | +|---|---| +| Expo / EAS Build (free tier) | $0 (limited builds, queue wait) | +| Apple Developer Program | $99/year (~$8.25/month) | +| Google Play Developer | $25 one-time | +| Push Notifications (Firebase Cloud Messaging) | $0 | + +**Build strategy:** Use Expo free tier for development. For production releases, use EAS free tier (low priority queue, ~30 min wait) or build locally. 2–4 builds per month is fine for the free tier. + +### Email & Transactional + +| Item | Cost | +|---|---| +| Resend (transactional email, free tier) | $0 (up to 100 emails/day) | +| Or Brevo free tier | $0 (300 emails/day) | + +### Monitoring & Error Tracking + +| Item | Cost | +|---|---| +| Sentry (free tier) | $0 (5K errors/month) | +| UptimeRobot (free tier) | $0 (50 monitors) | +| Custom logging to PostgreSQL | $0 | + +--- + +## 4. 
Total Monthly Cost Breakdown + +### Phase 0: Pre-Launch / Development (Months 1–3) + +| Item | Monthly Cost | +|---|---| +| Netcup VPS 1000 G12 | €8.45 | +| Domain (kalei.ai) | ~€5.00 | +| Claude API (dev/testing, ~$5 credit free) | €0 | +| Expo Free Tier | €0 | +| Cloudflare, Sentry, email | €0 | +| **Total** | **~€13.50/month** | + +**Upfront costs:** Apple Developer ($99) + Google Play ($25) + Domain (~$55/year) = **~€180 one-time** + +### Phase 1: Launch (0–500 users, ~50 DAU) + +Assuming 50 daily active users, ~200 registered: + +| Item | Monthly Cost | +|---|---| +| Netcup VPS 1000 G12 | €8.45 | +| Domain | ~€5.00 | +| Claude API (~50 active × $0.25 avg) | ~€12.50 | +| Expo Free Tier | €0 | +| Infrastructure (Cloudflare, etc.) | €0 | +| **Total** | **~€26/month** | + +### Phase 2: Traction (500–2,000 users, ~200 DAU) + +| Item | Monthly Cost | +|---|---| +| Netcup VPS 2000 G12 (upgrade) | €15.59 | +| Domain | ~€5.00 | +| Claude API (~200 active × $0.25 avg) | ~€50.00 | +| Expo Starter (if needed for OTA updates) | €19.00 | +| Email (may need paid tier) | €0–10 | +| **Total** | **~€90–100/month** | + +### Phase 3: Growth (2,000–10,000 users, ~1,000 DAU) + +| Item | Monthly Cost | +|---|---| +| Netcup VPS 4000 G12 | €26.18 | +| Domain | ~€5.00 | +| Claude API (~1,000 active × $0.25 avg) | ~€250.00 | +| Expo Production plan | €99.00 | +| Email paid tier | ~€20 | +| Sentry paid (if needed) | ~€26 | +| **Total** | **~€425/month** | + +At this point, AI cost is 60% of total spend. This is where the Groq/open-source fallback for simple tasks starts saving real money. + +--- + +## 5. 
Pricing Reevaluation + +### The Old Price: $7.99/month (Prism) + +Based on the cost model above, let's check if this works: + +**At Phase 1 (50 DAU, ~10 paying subscribers):** +- Revenue: 10 × $7.99 = $79.90 +- Costs: ~$28 +- **Margin: +$52 (65%)** + +**At Phase 2 (200 DAU, ~40 paying subscribers @ 20% conversion):** +- Revenue: 40 × $7.99 = $319.60 +- Costs: ~$100 +- **Margin: +$220 (69%)** + +**At Phase 3 (1,000 DAU, ~150 paying subscribers @ 15% conversion):** +- Revenue: 150 × $7.99 = $1,198.50 +- Costs: ~$425 +- **Margin: +$773 (65%)** + +The margins are healthy. But $7.99 feels like a lot for a brand-new app from an unknown brand in a competitive wellness space. Users compare against Headspace ($12.99), Calm ($14.99), but those have massive brand recognition and content libraries. + +### The New Price: $4.99/month (Prism) + +**Why $4.99:** +- Psychological barrier is much lower — impulse-buy territory +- Significantly undercuts major competitors while offering AI personalization they don't have +- At $0.63/month cost per Prism subscriber, the margin is still **87%** +- Annual option: $39.99/year ($3.33/month) — strong incentive to commit +- Free tier remains generous enough to demonstrate value (3 Turns/day, 2 Mirror/week) + +**Revised projections at $4.99:** + +| Phase | Paying Users | Monthly Revenue | Monthly Cost | Margin | +|---|---|---|---|---| +| Phase 1 | 15 (higher conversion at lower price) | $74.85 | ~$28 | +$47 (62%) | +| Phase 2 | 60 | $299.40 | ~$100 | +$200 (67%) | +| Phase 3 | 250 | $1,247.50 | ~$450 | +$798 (64%) | + +The lower price likely drives higher conversion, so net revenue is similar or better. And the margin stays well above 60% at every stage. 
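The margin rows above are simple arithmetic; a tiny helper keeps them reproducible as assumptions shift (a sketch, using the same inputs as the $7.99 scenario):

```typescript
// Monthly revenue, profit, and percentage margin for a flat subscription price.
function subscriptionMargin(payingUsers: number, priceUSD: number, monthlyCostUSD: number) {
  const revenue = payingUsers * priceUSD;
  const profit = revenue - monthlyCostUSD;
  return { revenue, profit, marginPct: Math.round((profit / revenue) * 100) };
}

// Phase 1 at the old $7.99 price: 10 subscribers, ~$28 in costs.
const phase1 = subscriptionMargin(10, 7.99, 28); // marginPct: 65, matching the table
```

Re-running it with the $4.99 inputs reproduces the revised projections, which makes it easy to stress-test conversion and cost assumptions before committing to a price.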
+ +### Alternative: Tiered Pricing + +| Tier | Price | What You Get | +|---|---|---| +| **Free** | $0 | 3 Turns/day, 2 Mirror/week, basic Lens, 30-day Gallery | +| **Prism** | $4.99/mo | Unlimited Turns + Mirror, advanced reframe styles, full Gallery, fragment tracking | +| **Prism+** | $9.99/mo | Everything in Prism + full Spectrum dashboard, weekly/monthly AI insights, export, priority processing | + +This is smart because Spectrum is the most expensive feature (batch AI analysis of historical data) and the most valuable retention tool. Gating it behind a higher tier means only your most engaged (and willing-to-pay) users generate that cost, and they're paying for it. + +--- + +## 6. Revenue Milestones & Sustainability + +### Break-Even Analysis + +**Monthly fixed costs (Phase 1):** ~€16 (VPS + domain) +**Variable cost per active user:** ~€0.25 + +Break-even on fixed costs alone: **4 Prism subscribers at $4.99** cover the infrastructure. + +To cover Apple's annual fee ($99) and Google ($25 amortized): add ~$10/month → total of **6 subscribers** to fully break even. + +### Path to Sustainability + +| Milestone | Users | Paying | MRR | Costs | Profit | +|---|---|---|---|---|---| +| Month 3 | 100 | 5 | $25 | $28 | -$3 | +| Month 6 | 500 | 30 | $150 | $45 | +$105 | +| Month 9 | 1,500 | 80 | $400 | $85 | +$315 | +| Month 12 | 3,000 | 200 | $1,000 | $200 | +$800 | +| Month 18 | 8,000 | 600 | $3,000 | $500 | +$2,500 | + +The model becomes self-sustaining around **month 4–5** with ~15 paying subscribers. + +--- + +## 7. 
Technical Architecture Summary + +``` +┌─────────────────────────────────────────────────────┐ +│ CLIENTS │ +│ React Native (iOS + Android) │ +│ via Expo / EAS │ +└──────────────────┬──────────────────────────────────┘ + │ HTTPS + ▼ +┌─────────────────────────────────────────────────────┐ +│ CLOUDFLARE (Free Tier) │ +│ DNS · CDN · DDoS Protection · SSL │ +└──────────────────┬──────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────┐ +│ NETCUP VPS 1000 G12 (€8.45/mo) │ +│ │ +│ ┌──────────┐ ┌───────────┐ ┌──────────────────┐ │ +│ │ Nginx │→ │ Node.js │→ │ PostgreSQL 16 │ │ +│ │ (proxy) │ │ API │ │ (all app data) │ │ +│ └──────────┘ └─────┬─────┘ └──────────────────┘ │ +│ │ ┌──────────────────┐ │ +│ │ │ Redis │ │ +│ │ │ (cache/sessions)│ │ +│ │ └──────────────────┘ │ +└──────────────────────┼──────────────────────────────┘ + │ API Calls + ▼ + ┌──────────────────────────────┐ + │ ANTHROPIC API │ + │ │ + │ Haiku 4.5 (primary) │ + │ • Mirror fragment detection │ + │ • Kaleidoscope reframes │ + │ • Lens affirmations │ + │ • Crisis detection │ + │ │ + │ Haiku 4.5 Batch (50% off) │ + │ • Spectrum weekly analysis │ + │ • Monthly insights │ + │ • Growth trajectory calc │ + └──────────────────────────────┘ + │ + (Future fallback if costs spike) + │ + ┌──────────────────────────────┐ + │ GROQ / OPENROUTER │ + │ Qwen3-32B ($0.29/MTok) │ + │ • Simple affirmations │ + │ • Basic Lens content │ + │ • Non-critical generation │ + └──────────────────────────────┘ +``` + +### Key Technical Decisions + +**Auth:** Custom JWT-based auth built into our Node.js API. Uses bcrypt for password hashing, short-lived access tokens (15 min) + long-lived refresh tokens stored in PostgreSQL. Social login (Apple Sign-In, Google) via their SDKs — free. + +**Database schema:** PostgreSQL with Row Level Security policies. Tables for users, mirror_sessions, mirror_fragments, turns, lens_goals, spectrum_analyses. 
All user content encrypted at rest (PostgreSQL `pgcrypto` extension). + +**AI request pipeline:** +1. Client sends user text to our API +2. API constructs prompt with cached system prompt + user context +3. API calls Claude Haiku 4.5, streams response back to client +4. API logs token usage for cost tracking +5. Response stored in PostgreSQL for Spectrum analysis + +**Rate limiting:** Redis-based. Free tier: 3 Turns/day, 2 Mirror/week enforced server-side. Prism: unlimited but soft-capped at 50 Turns/day to prevent abuse (99.9% of users will never hit this). + +**Prompt caching strategy:** System prompts for each feature (Mirror, Kaleidoscope, Lens) are designed to be identical across users. Only the user's specific content changes. With Anthropic's prompt caching, the system prompt (600–800 tokens) is cached and subsequent calls within a 5-minute window cost only 10% of the base input rate. This cuts effective input costs by ~40–50%. + +--- + +## 8. Cost Control Safeguards + +These prevent a surprise API bill from killing the project: + +1. **Hard spending cap** on Anthropic API dashboard (start at $50/month, increase as revenue grows) +2. **Per-user daily token budget** tracked in Redis. If a user somehow generates excessive requests, they get a "take a break" message (fits the wellness brand perfectly) +3. **Graceful degradation:** If API budget is 80% consumed, route Lens affirmations to local template system (pre-written affirmations, no AI needed). Mirror and Kaleidoscope get priority for remaining budget. +4. **Batch everything possible:** Spectrum analysis runs overnight via Batch API (50% off). Weekly insights generated in a single batch job Sunday night. +5. **Monitor daily:** Simple Telegram bot alerts if daily API spend exceeds threshold + +--- + +## 9. 
Startup Budget Allocation + +With a maximum €2,000 to spend wisely: + +| Category | Amount | What It Covers | +|---|---|---| +| Apple Developer Account | €99 | Annual fee, required for App Store | +| Google Play Developer | €25 | One-time fee | +| Domain (kalei.ai, 1 year) | ~€55 | Annual registration | +| Netcup VPS (6 months prepaid) | ~€51 | Runway for half a year of hosting | +| Claude API credits (initial deposit) | €100 | Covers dev + testing + first ~300 active user-months | +| Design assets (fonts, if not free) | €0–50 | Inter + custom weight = free. Icon set if needed. | +| Contingency | ~€120 | Unexpected costs | +| **Total startup spend** | **~€450–500** | | +| **Remaining reserve** | **~€1,500** | 10+ months of Phase 1 operating costs | + +This means the €2,000 budget gives us roughly **12–14 months of runway** before we need a single paying customer. That's extremely comfortable for finding product-market fit. + +--- + +## 10. When to Scale (And What Changes) + +| Trigger | Action | Cost Impact | +|---|---|---| +| >200 concurrent connections | Upgrade to VPS 2000 (€15.59) | +€7/month | +| >500 DAU | Add Redis Cluster or separate DB VPS | +€5–8/month | +| >$200/month API spend | Implement Groq fallback for Lens | Saves ~20% on AI costs | +| >2,000 DAU | Upgrade to VPS 4000 (€26.18) | +€10/month | +| >$500/month API spend | Evaluate self-hosted Qwen3-32B on netcup vGPU | Only if cost-effective at that volume | +| >10,000 DAU | Consider second VPS for API/DB separation | Architecture change | +| >$2,000/month revenue | Consider dedicated server or managed Postgres | Comfort/reliability upgrade | + +The beauty of this architecture is that **nothing changes architecturally as we scale** — we just give the same VPS more resources, and the API costs scale linearly and predictably with users. + +--- + +## 11. Competitive Cost Comparison + +To put this in perspective — what would this cost on "standard" startup infrastructure? 
| Our Stack | "Normal" Startup Stack | Their Monthly Cost |
|---|---|---|
| Netcup VPS (€8.45) | AWS EC2 t3.medium | $35–50 |
| PostgreSQL on VPS ($0) | Supabase Pro or RDS | $25–50 |
| Redis on VPS ($0) | Redis Cloud or ElastiCache | $15–30 |
| Cloudflare free ($0) | AWS CloudFront + ALB | $20–40 |
| Claude Haiku 4.5 (same) | Claude Haiku 4.5 (same) | Same |
| **Our total: ~$28/mo** | **Their total: ~$120–200/mo** | |

We're running at **roughly 15–25%** of what a "typical" startup would spend by self-hosting on a European VPS instead of defaulting to AWS/GCP.

---

## 12. Final Pricing Recommendation

| | Free | Prism | Prism+ (Phase 2 launch) |
|---|---|---|---|
| **Price** | $0 | **$4.99/month** | **$9.99/month** |
| | | $39.99/year | $79.99/year |
| Turns/day | 3 | Unlimited | Unlimited |
| Mirror/week | 2 | Unlimited | Unlimited |
| Lens | Basic | Full | Full |
| Reframe styles | 1 (Compassionate) | All 4 | All 4 |
| Gallery | 30 days | Full history | Full history |
| Fragment tracking | No | Yes | Yes |
| Spectrum | No | No | **Full dashboard** |
| Weekly AI insights | No | No | **Yes** |
| Growth trajectory | No | No | **Yes** |
| Export | No | Basic | Full |
| **Our cost per user** | ~$0.15 | ~$0.45 | ~$0.70 |
| **Margin (before store fees)** | N/A (acquisition) | **91%** | **93%** |

### Why This Works

At **$4.99**, Kalei is:
- Cheaper than Headspace ($12.99) and Calm ($14.99), and more capable than Woebot (which is free but far more limited)
- More personalized than any of them (AI-powered, not pre-recorded content)
- Profitable from subscriber #6
- Self-sustaining from month ~5
- Fully funded for 12+ months on a €2,000 budget even with zero revenue

The model scales cleanly because **AI costs are the only meaningful variable cost**, and they scale linearly with usage at a rate that our pricing covers with 90%+ margins.

---

*Last updated: February 2026*
*All prices include VAT where applicable. 
USD/EUR conversions at approximate current rates.* diff --git a/docs/kalei-system-architecture-plan.md b/docs/kalei-system-architecture-plan.md new file mode 100644 index 0000000..772613a --- /dev/null +++ b/docs/kalei-system-architecture-plan.md @@ -0,0 +1,509 @@ +# Kalei System Architecture Plan + +Version: 1.0 +Date: 2026-02-10 +Status: Proposed canonical architecture for implementation + +## 1. Purpose and Scope + +This document consolidates the existing Kalei docs into one implementation-ready system architecture plan. + +In scope: +- Phase 1 features: Mirror, Kaleidoscope (Turn), Lens, Gallery, subscriptions. +- Phase 2 features: Spectrum analytics (weekly/monthly insight pipeline). +- Mobile-first architecture (iOS/Android via Expo) with optional web support. +- Production operations for safety, privacy, reliability, and cost control. + +Out of scope: +- Pixel-level UI specs and brand copy details. +- Provider contract/legal details. +- Full threat model artifacts (to be produced separately). + +## 2. Inputs Reviewed + +- `docs/app-blueprint.md` +- `docs/kalei-infrastructure-plan.md` +- `docs/kalei-ai-model-comparison.md` +- `docs/kalei-mirror-feature.md` +- `docs/kalei-spectrum-phase2.md` +- `docs/kalei-complete-design.md` +- `docs/kalei-brand-guidelines.md` + +## 3. Architecture Drivers + +### 3.1 Product drivers + +- Core loop quality: Mirror fragment detection and Turn reframes must feel high quality and emotionally calibrated. +- Daily habit loop: low friction, fast response, strong retention mechanics. +- Phase 2 depth: longitudinal Spectrum insights from accumulated usage data. + +### 3.2 Non-functional drivers + +- Safety first: crisis language must bypass reframing and trigger support flow. +- Privacy first: personal reflective writing is highly sensitive. +- Cost discipline: launch target under ~EUR 30/month fixed infrastructure. +- Operability: architecture must be maintainable by a small team. 
+- Gradual scale: support ~50 DAU at launch and scale to ~10k DAU without full rewrite. + +## 4. Canonical Decisions + +This plan resolves conflicting guidance across current docs. + +| Topic | Decision | Rationale | +|---|---|---| +| Backend platform | Self-hosted API-first modular monolith on Node.js (Fastify preferred) | Matches budget constraints and keeps full control of safety, rate limits, and AI routing. | +| Data layer | PostgreSQL 16 + Redis | Postgres for source-of-truth relational + analytics tables; Redis for counters, rate limits, caching, idempotency. | +| Auth | JWT auth service in API + refresh token rotation + social login (Apple/Google) | Aligns with self-hosted stack while preserving mobile auth UX. | +| Mobile | React Native + Expo (local/native builds) | Fastest path for iOS/Android while keeping build pipeline under direct control. | +| AI integration | AI Gateway abstraction with provider routing | Prevents hard lock-in and supports quality/cost strategy across open-weight model backends and optional hosted fallbacks. | +| AI default | Open-weight Qwen/Llama family via vLLM (Ollama locally) for Mirror/Turn/Safety-sensitive paths at launch | Keeps model stack open-source-first while preserving routing flexibility. | +| AI cost fallback | Route low-risk generation (Lens/basic content) to lower-cost providers when budget thresholds hit | Preserves core quality while controlling spend. | +| Billing | Self-hosted entitlement authority (direct App Store + Google Play server APIs) | Keeps billing logic in-house and avoids closed SaaS dependency in core authorization path. | +| Analytics/monitoring | PostHog self-hosted + GlitchTip + centralized app logs + cost telemetry | Open-source-first observability stack with lower vendor lock-in. | + +## 5. 
System Context + +```mermaid +flowchart LR + user[User] --> app[Expo App] + app --> edge[Edge Proxy] + edge --> api[Kalei API] + api --> db[(PostgreSQL)] + api --> redis[(Redis)] + api --> ai[AI Providers] + api --> billing[Store Entitlements] + api --> push[Push Gateway] + api --> obs[Observability] + app --> analytics[Product Analytics] +``` + +## 6. Container Architecture + +```mermaid +flowchart TB + subgraph Client + turn[Turn Screen] + mirror[Mirror Screen] + lens[Lens Screen] + spectrum_ui[Spectrum Dashboard] + profile_ui[Gallery and Profile] + end + + subgraph Platform + gateway[API Gateway and Auth] + turn_service[Turn Service] + mirror_service[Mirror Service] + lens_service[Lens Service] + spectrum_service[Spectrum Service] + safety_service[Safety Service] + entitlement_service[Entitlement Service] + jobs[Job Scheduler and Workers] + ai_gateway[AI Gateway] + cost_guard[Usage Meter and Cost Guard] + end + + subgraph Data + postgres[(PostgreSQL)] + redis[(Redis)] + object_storage[(Object Storage)] + end + + subgraph External + ai_provider[Open-weight models via vLLM or Ollama] + store_billing[App Store and Play Billing APIs] + push_provider[APNs and FCM] + glitchtip[GlitchTip] + posthog[PostHog self-hosted] + end + + turn --> gateway + mirror --> gateway + lens --> gateway + spectrum_ui --> gateway + profile_ui --> gateway + + gateway --> turn_service + gateway --> mirror_service + gateway --> lens_service + gateway --> spectrum_service + gateway --> entitlement_service + + mirror_service --> safety_service + turn_service --> safety_service + lens_service --> safety_service + spectrum_service --> safety_service + + turn_service --> ai_gateway + mirror_service --> ai_gateway + lens_service --> ai_gateway + spectrum_service --> ai_gateway + ai_gateway --> ai_provider + + turn_service --> cost_guard + mirror_service --> cost_guard + lens_service --> cost_guard + spectrum_service --> cost_guard + + turn_service --> postgres + mirror_service --> postgres + 
lens_service --> postgres + spectrum_service --> postgres + entitlement_service --> postgres + jobs --> postgres + + turn_service --> redis + mirror_service --> redis + lens_service --> redis + spectrum_service --> redis + cost_guard --> redis + jobs --> redis + + entitlement_service --> store_billing + jobs --> push_provider + gateway --> glitchtip + gateway --> posthog + spectrum_service --> object_storage +``` + +## 7. Domain and Service Boundaries + +### 7.1 Runtime modules + +- `auth`: sign-up/sign-in, token issuance/rotation, device session management. +- `entitlements`: direct App Store + Google Play sync, plan gating (`free`, `prism`, `prism_plus`). +- `mirror`: session lifecycle, message ingestion, fragment detection, inline reframe, reflection. +- `turn`: structured reframing workflow and saved patterns. +- `lens`: goals, actions, daily focus generation, check-ins. +- `spectrum`: analytics feature store, weekly/monthly aggregation, insight generation. +- `safety`: crisis detection, escalation, crisis response policy. +- `ai_gateway`: prompt templates, model routing, retries/timeouts, structured output validation. +- `usage_cost`: token telemetry, per-user budgets, global spend controls. +- `notifications`: push scheduling, reminders, weekly summaries. + +### 7.2 Why modular monolith first + +- Lowest operational overhead at launch. +- Strong transaction boundaries in one codebase. +- Easy extraction path later for `spectrum` workers or `ai_gateway` if load increases. + +## 8. Core Data Architecture + +### 8.1 Data domains + +- Identity: users, profiles, auth_sessions, refresh_tokens. +- Product interactions: turns, mirror_sessions, mirror_messages, mirror_fragments, lens_goals, lens_actions. +- Analytics: spectrum_session_analysis, spectrum_turn_analysis, spectrum_weekly, spectrum_monthly. +- Commerce: subscriptions, entitlement_snapshots, billing_events. +- Safety and operations: safety_events, ai_usage_events, request_logs, audit_events. 
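To make these boundaries concrete, here is a minimal TypeScript sketch of a few of the domain types above, together with the per-plan Turn cap the `entitlements` module would enforce. All type names, fields, and helper functions are illustrative assumptions, not a committed schema or API contract; the caps mirror the quotas in the infrastructure plan (3 Turns/day free, 50/day soft cap on Prism tiers).

```typescript
// Illustrative sketch only: names, fields, and caps are assumptions
// drawn from the planning docs, not a committed schema or API contract.

type Plan = "free" | "prism" | "prism_plus";

// Identity and product-interaction domains (subset of section 8.1)
interface User {
  id: string;
  plan: Plan;
  createdAt: Date;
}

interface Turn {
  id: string;
  userId: string;
  createdAt: Date;
}

// Per-plan daily Turn caps enforced by the entitlements module.
// Prism tiers read as "unlimited" to the user but are soft-capped
// server-side to deter abuse.
const DAILY_TURN_CAP: Record<Plan, number> = {
  free: 3,
  prism: 50,
  prism_plus: 50,
};

function canStartTurn(plan: Plan, turnsUsedToday: number): boolean {
  return turnsUsedToday < DAILY_TURN_CAP[plan];
}

console.log(canStartTurn("free", 2));   // true: third Turn of the day
console.log(canStartTurn("free", 3));   // false: free cap reached
console.log(canStartTurn("prism", 10)); // true: well under the soft cap
```

In production the `turnsUsedToday` counter would live in Redis (per the data-layer decision in section 4), with the pure gating logic kept separate so it is trivially unit-testable.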
### 8.2 Entity relationship view

```mermaid
flowchart LR
    users[USERS] --> profiles[PROFILES]
    users --> auth_sessions[AUTH_SESSIONS]
    users --> refresh_tokens[REFRESH_TOKENS]
    users --> turns[TURNS]
    users --> mirror_sessions[MIRROR_SESSIONS]
    mirror_sessions --> mirror_messages[MIRROR_MESSAGES]
    mirror_messages --> mirror_fragments[MIRROR_FRAGMENTS]
    users --> lens_goals[LENS_GOALS]
    lens_goals --> lens_actions[LENS_ACTIONS]
    users --> spectrum_session[SPECTRUM_SESSION_ANALYSIS]
    users --> spectrum_turn[SPECTRUM_TURN_ANALYSIS]
    users --> spectrum_weekly[SPECTRUM_WEEKLY]
    users --> spectrum_monthly[SPECTRUM_MONTHLY]
    users --> subscriptions[SUBSCRIPTIONS]
    users --> entitlement[ENTITLEMENT_SNAPSHOTS]
    users --> safety_events[SAFETY_EVENTS]
    users --> ai_usage[AI_USAGE_EVENTS]
```

### 8.3 Storage policy

- Raw reflective content remains in transactional tables, encrypted at rest.
- Spectrum dashboard reads aggregated tables only by default.
- Per-session exclusion flags allow users to exclude individual entries from analytics.
- Hard delete workflow removes raw + derived analytics for requested windows.

## 9. 
Key Runtime Sequences + +### 9.1 Mirror message processing with safety gate + +```mermaid +sequenceDiagram + participant App as Mobile App + participant API as Kalei API + participant Safety as Safety Service + participant Ent as Entitlement Service + participant AI as AI Gateway + participant Model as AI Provider + participant DB as PostgreSQL + participant Redis as Redis + + App->>API: POST /mirror/messages + API->>Ent: Check plan/quota + Ent->>Redis: Read counters + Ent-->>API: Allowed + API->>Safety: Crisis precheck + alt Crisis detected + Safety->>DB: Insert safety_event + API-->>App: Crisis resources response + else Not crisis + API->>AI: Detect fragments prompt + AI->>Model: Inference request + Model-->>AI: Fragments with confidence + AI-->>API: Validated structured result + API->>DB: Save message + fragments + API->>Redis: Increment usage counters + API-->>App: Highlight payload + end +``` + +### 9.2 Turn (Kaleidoscope) request + +```mermaid +sequenceDiagram + participant App as Mobile App + participant API as Kalei API + participant Ent as Entitlement Service + participant Safety as Safety Service + participant AI as AI Gateway + participant Model as AI Provider + participant DB as PostgreSQL + participant Cost as Cost Guard + + App->>API: POST /turns + API->>Ent: Validate tier + daily cap + API->>Safety: Crisis precheck + alt Crisis detected + API-->>App: Crisis resources response + else Safe + API->>AI: Generate 3 reframes + micro-action + AI->>Model: Inference stream + Model-->>AI: Structured reframes + AI-->>API: Response + token usage + API->>Cost: Record token usage + budget check + API->>DB: Save turn + metadata + API-->>App: Stream final turn result + end +``` + +### 9.3 Weekly Spectrum aggregation (background) + +```mermaid +sequenceDiagram + participant Cron as Scheduler + participant Worker as Spectrum Worker + participant DB as PostgreSQL + participant AI as AI Gateway + participant Model as Batch Provider + participant Push as Notification 
Service + + Cron->>Worker: Trigger weekly job + Worker->>DB: Load eligible users + raw events + Worker->>DB: Compute vectors and weekly aggregates + Worker->>AI: Generate insight narratives from aggregates + AI->>Model: Batch request + Model-->>AI: Insight text + AI-->>Worker: Validated summaries + Worker->>DB: Upsert spectrum_weekly and monthly deltas + Worker->>Push: Enqueue spectrum updated notifications +``` + +## 10. API Surface (v1) + +### 10.1 Auth and profile + +- `POST /auth/register` +- `POST /auth/login` +- `POST /auth/refresh` +- `POST /auth/logout` +- `GET /me` +- `PATCH /me/profile` + +### 10.2 Mirror + +- `POST /mirror/sessions` +- `POST /mirror/messages` +- `POST /mirror/fragments/{id}/reframe` +- `POST /mirror/sessions/{id}/close` +- `GET /mirror/sessions` +- `DELETE /mirror/sessions/{id}` + +### 10.3 Turn + +- `POST /turns` +- `GET /turns` +- `GET /turns/{id}` +- `POST /turns/{id}/save` + +### 10.4 Lens + +- `POST /lens/goals` +- `GET /lens/goals` +- `POST /lens/goals/{id}/actions` +- `POST /lens/actions/{id}/complete` +- `GET /lens/affirmation/today` + +### 10.5 Spectrum + +- `GET /spectrum/weekly` +- `GET /spectrum/monthly` +- `POST /spectrum/reset` +- `POST /spectrum/exclusions` + +### 10.6 Billing and entitlements + +- `POST /billing/webhooks/apple` +- `POST /billing/webhooks/google` +- `GET /billing/entitlements` + +## 11. Security, Safety, and Compliance Architecture + +### 11.1 Security controls + +- TLS everywhere (edge proxy to API origin and service egress). +- JWT access tokens (short TTL) + rotating refresh tokens. +- Password hashing with Argon2id (preferred) or bcrypt with strong cost factor. +- Row ownership checks enforced in API and optionally DB RLS for defense in depth. +- Secrets in environment vault; never in client bundle. +- Audit logging for auth events, entitlement changes, deletes, and safety events. + +### 11.2 Data protection + +- Encryption at rest for disk volumes and database backups. 
+- Column-level encryption for highly sensitive text fields (Mirror message content). +- Data minimization for analytics: Spectrum reads vectors and aggregates by default. +- User rights flows: export, per-item delete, account delete, Spectrum reset. + +### 11.3 Safety architecture + +- Multi-stage crisis filter: + 1. Deterministic keyword and pattern pass. + 2. Low-latency model confirmation where needed. + 3. Hardcoded crisis response templates and hotline resources. +- Crisis-level content is never reframed. +- Safety events are logged and monitored for false-positive/false-negative tuning. + +## 12. Reliability and Performance + +### 12.1 Initial SLO targets + +- API availability: 99.5% monthly (Phase 1), 99.9% target by Phase 2. +- Turn and Mirror response latency: + - p50 < 1.8s + - p95 < 3.5s +- Weekly Spectrum jobs completed within 2 hours of scheduled run. + +### 12.2 Resilience patterns + +- Idempotency keys on write endpoints. +- AI provider timeout + retry policy with circuit breaker. +- Graceful degradation hierarchy when budget/latency pressure occurs: + 1. Degrade Lens generation first (template fallback). + 2. Keep Turn and Mirror available. + 3. Pause non-critical Spectrum generation if needed. +- Dead-letter queue for failed async jobs. + +## 13. Observability and FinOps + +### 13.1 Telemetry + +- Structured logs with request ID, user ID hash, feature, model, token usage, cost. +- Metrics: + - request rate/error rate/latency by endpoint + - AI token usage and cost by feature + - quota denials and safety escalations +- Tracing across API -> AI Gateway -> provider call. + +### 13.2 Cost controls + +- Global monthly AI spend cap and alert thresholds (50%, 80%, 95%). +- Per-user daily token budget in Redis. +- Feature-level cost envelope with model routing: + - Turn/Mirror: quality-first lane + - Lens/Spectrum narrative generation: cost-optimized lane as needed +- Prompt caching for stable system prompts. + +## 14. 
Deployment Topology and Scaling Path

### 14.1 Phase 1 deployment (single-node)

```mermaid
flowchart LR
    EDGE[Caddy or Nginx Edge Proxy] --> API[API + Workers]
    API --> PG[(PostgreSQL)]
    API --> R[(Redis)]
    API --> AIP[AI Providers]
```

### 14.2 Phase evolution

```mermaid
flowchart LR
    p1[Phase 1 single VPS API DB Redis] --> p2[Phase 2 split DB keep API monolith]
    p2 --> p3[Phase 3 separate workers and scale API]
    p3 --> p4[Phase 4 optional service extraction]
```

### 14.3 Trigger-based scaling

- Move DB off app node when p95 query latency > 120ms sustained or storage > 70%.
- Add API replica when CPU > 70% sustained at peak and p95 latency breaches SLO.
- Split workers when Spectrum jobs impact interactive endpoints.

## 15. Delivery Plan

### 15.1 Phase A (Weeks 1-4): Platform foundation

- API skeleton, auth, profile, entitlements integration.
- Postgres schema v1 and migrations.
- Mirror + Turn endpoints with safety pre-check.
- Usage metering and rate limiting.

### 15.2 Phase B (Weeks 5-8): Product completion for launch

- Lens flows and Gallery history.
- Push notifications and daily reminders.
- Full observability, alerting, and incident runbooks.
- Beta load testing and security hardening.

### 15.3 Phase C (Weeks 9-12): Spectrum baseline (Phase 2 readiness)

- Vector extraction pipeline and aggregated tables.
- Weekly batch jobs and dashboard endpoints.
- Data exclusion controls and reset workflow.
- Cost optimization pass on AI routing.

## 16. Risks and Mitigations

| Risk | Impact | Mitigation |
|---|---|---|
| Reframe quality variance by provider/model | Core UX degradation | Keep AI Gateway abstraction + blind quality harness + model canary rollout. |
| Safety false negatives | High trust and user harm risk | Defense-in-depth crisis filter + explicit no-reframe crisis policy + monitoring and review loop. 
| +| AI cost spikes | Margin compression | Hard spend caps, per-feature budgets, degradation order, model fallback lanes. | +| Single-node bottlenecks | Latency and availability issues | Trigger-based scaling plan and early instrumentation. | +| Sensitive data handling errors | Compliance and trust risk | Encryption, strict retention controls, deletion workflows, audit logs. | + +## 17. Decision Log and Open Items + +### 17.1 Decided in this plan + +- Self-hosted API + Postgres + Redis is the canonical launch architecture. +- AI provider routing is built in from day one. +- Safety is an explicit service and gate on all AI-facing paths. +- Spectrum runs asynchronously over aggregated data. + +### 17.2 Open decisions to finalize before build start + +- Final provider mix at launch: + - Option A: Qwen-first on all features via vLLM. + - Option B: Qwen for Turn/Mirror and smaller open-weight model for Lens/Spectrum narratives. +- Exact hosting target for Phase 2 DB scaling (dedicated VPS vs managed Postgres). +- Regional crisis resource strategy (US-first or multi-region at launch). + +--- + +If approved, this document should become the architecture source of truth and supersede conflicting details in older planning docs.