# 01. Architecture And Data Flows
## 1. Scope And Non-Negotiables
This proposal is designed around the fixed constraints from the Architecture Brief:
- The 4-layer security model is mandatory.
- Secrets must never leave the tenant server.
- The 3-tier autonomy model plus the external communications gate is mandatory.
- OpenClaw is an upstream dependency (no fork by default).
- One customer = one VPS is mandatory.
- `n8n` removal is a prerequisite.
## 2. Proposed Target Architecture
### 2.1 Core Decisions
| Decision | Proposal | Why |
|---|---|---|
| Hub stack | Keep Next.js + Prisma + PostgreSQL | The existing app already has major workflows and 80+ APIs; a rewrite would put the 3-month launch timeline at risk. |
| OpenClaw integration | Use pinned upstream release, no fork | Maximizes upgrade velocity and avoids merge debt. |
| Safety Wrapper shape | Hybrid: OpenClaw plugin + local egress proxy + local execution adapters | Gives direct hook interception plus transport-level redaction guarantee. |
| Mobile | React Native + Expo | Fastest path to iOS/Android with TypeScript contract reuse. |
| Website | Separate public web app (same monorepo) + Hub public APIs | Security isolation between public onboarding and admin/customer portal. |
| Repo strategy | Monorepo for first-party services; OpenClaw kept separate upstream repo | Strong contract sharing + CI simplicity without violating upstream dependency model. |
### 2.2 System Context Diagram
```mermaid
flowchart LR
subgraph Client[Client Layer]
M[Mobile App\nReact Native + Expo]
W[Website\nOnboarding + Checkout]
C[Customer Portal Web]
A[Admin Portal Web]
end
subgraph Control[Central Platform]
H[Hub API + UI\nNext.js + Prisma]
DB[(PostgreSQL)]
Q[Background Workers\nAutomation + Metering]
N[Notification Service\nPush/Email]
ST[Stripe]
NC[Netcup/Hetzner]
end
subgraph Tenant[Per-Customer VPS]
OC[OpenClaw Gateway\nUpstream]
SW[Safety Wrapper Plugin\nHooks + Classification]
SP[LLM Egress Proxy\nSecrets Firewall]
SV[(Secrets Vault SQLite\nEncrypted)]
TA[Tool Adapters + Exec Guards]
TS[(Tool Stacks 25+)]
AP[(Approval Cache SQLite)]
TU[(Token Usage Buckets)]
end
M --> H
W --> H
C --> H
A --> H
H --> DB
H --> Q
H --> N
H <--> ST
H <--> NC
H <--> OC
OC --> SW
SW --> SP
SP --> LLM[(LLM Providers)]
SW <--> SV
SW <--> TA
TA <--> TS
SW <--> AP
SW --> TU
TU --> H
```
## 3. Tenant Runtime Architecture
### 3.1 4-Layer Security Enforcement
| Layer | Enforcement Point | Implementation |
|---|---|---|
| 1. Sandbox | OpenClaw runtime/tool sandbox settings | OpenClaw native sandbox + process/container isolation. |
| 2. Tool Policy | OpenClaw agent tool allow/deny | Per-agent tool manifest; tools not listed are unreachable. |
| 3. Command Gating | Safety Wrapper `before_tool_call` | Green/Yellow/Yellow+External/Red/Critical Red classification + approval flow. |
| 4. Secrets Redaction | Local egress proxy + transcript hooks | Outbound prompt redaction before network egress, plus log/transcript redaction hooks. |
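
The Layer 3 classification step can be sketched as an ordered, first-match rule table. The tier names come from the table above; the `Rule` shape, the `external` flag, the example tool names, and the fail-closed `"Red"` fallback are assumptions made for illustration, not the proposal's actual policy format.

```typescript
// Tier names come from the proposal; everything else here is hypothetical.
type Tier = "Green" | "Yellow" | "Yellow+External" | "Red" | "Critical Red";

interface ToolCall {
  tool: string;
  action: string;
  external: boolean; // assumption: true when the action reaches people outside the tenant
}

interface Rule {
  tool: string;       // exact tool name, or "*" for any tool
  action: string;     // exact action name, or "*" for any action
  external?: boolean; // when set, the call's `external` flag must match
  tier: Tier;
}

// Rules are evaluated in order and the first match wins, so the result is
// deterministic for a given policy bundle.
function classify(call: ToolCall, rules: readonly Rule[], fallback: Tier = "Red"): Tier {
  for (const rule of rules) {
    const toolOk = rule.tool === "*" || rule.tool === call.tool;
    const actionOk = rule.action === "*" || rule.action === call.action;
    const externalOk = rule.external === undefined || rule.external === call.external;
    if (toolOk && actionOk && externalOk) return rule.tier;
  }
  return fallback; // assumption: calls that match no rule fail closed
}

// Illustrative policy only; real rules would arrive in the signed bundle from Hub.
const examplePolicy: Rule[] = [
  { tool: "newsletter", action: "publish", external: true, tier: "Yellow+External" },
  { tool: "newsletter", action: "draft", tier: "Green" },
  { tool: "shell", action: "rm", tier: "Critical Red" },
  { tool: "newsletter", action: "*", tier: "Yellow" },
];
```

Putting the more specific rules first keeps the catch-all `"*"` entries from shadowing them.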
### 3.2 Safety Wrapper Components
- `classification-engine`: deterministic rules engine with signed policy bundle from Hub.
- `approval-gateway`: sync/async approval requests to Hub, with 24h expiry.
- `secret-ref-resolver`: resolves `SECRET_REF(...)` at execution time only.
- `adapter-runtime`: executes tool API adapters and guarded shell/docker/file actions.
- `metering-collector`: captures per-agent/per-model token usage and aggregates hourly.
- `hub-sync-client`: registration, heartbeat, config pull, backup status, command results.
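
As a rough illustration of the `secret-ref-resolver` idea, the sketch below substitutes `SECRET_REF(name)` placeholders only at the moment a request is built for execution. The single-name argument syntax, the `Vault` type, and the `resolveSecretRefs` function are assumptions; the proposal does not specify what goes inside `SECRET_REF(...)`.

```typescript
// Hypothetical in-memory view of the vault; the real store is the encrypted SQLite vault.
type Vault = ReadonlyMap<string, string>;

// Assumption: a reference carries a single name made of letters, digits, "_", "." or "-".
const SECRET_REF_PATTERN = /SECRET_REF\(([A-Za-z0-9_.-]+)\)/g;

// Replaces every placeholder with its secret value. Unknown names throw, so a
// typo never results in a literal "SECRET_REF(...)" string being sent to a tool.
function resolveSecretRefs(template: string, vault: Vault): string {
  return template.replace(SECRET_REF_PATTERN, (_match: string, name: string) => {
    const value = vault.get(name);
    if (value === undefined) {
      throw new Error(`unknown secret reference: ${name}`);
    }
    return value;
  });
}
```

Because resolution happens inside the adapter call path, the resolved string should never be written back into the transcript or prompt; only the placeholder form should be persisted.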
### 3.3 OpenClaw Hook Usage (No Fork)
The Safety Wrapper plugin uses upstream hook points for enforcement and observability:
- `before_tool_call`: classify/gate/block/require approval.
- `after_tool_call`: audit capture + normalization.
- `message_sending`: outbound content redaction.
- `before_message_write`, `tool_result_persist`: local persistence redaction.
- `llm_output`: token accounting and per-model usage capture.
- `before_prompt_build`: inject cacheable SOUL/TOOLS prefix metadata.
- `subagent_spawning`: enforce max depth/budget.
- `gateway_start`: health checks + Hub session bootstrap.
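
To make one of these hooks concrete, here is a minimal sketch of the check a `subagent_spawning` handler might run. It is written as a pure function because the proposal does not show OpenClaw's plugin interface; the `SpawnRequest` and `SpawnLimits` shapes and the `guardSubagentSpawn` name are hypothetical.

```typescript
// Hypothetical payload and limit shapes; OpenClaw's real hook payloads may differ.
interface SpawnRequest {
  parentDepth: number; // 0 for the top-level agent
  tokensSpent: number; // tokens already consumed by this agent tree
}

interface SpawnLimits {
  maxDepth: number;    // deepest subagent level allowed
  tokenBudget: number; // total tokens the agent tree may consume
}

type SpawnVerdict = { allow: true } | { allow: false; reason: string };

// Denies the spawn if the child would sit below the allowed depth or if the
// tree has already used up its token budget.
function guardSubagentSpawn(req: SpawnRequest, limits: SpawnLimits): SpawnVerdict {
  if (req.parentDepth + 1 > limits.maxDepth) {
    return { allow: false, reason: "max subagent depth exceeded" };
  }
  if (req.tokensSpent >= limits.tokenBudget) {
    return { allow: false, reason: "token budget exhausted" };
  }
  return { allow: true };
}
```

Keeping the decision logic free of plugin-host types would let it be unit tested without a running gateway, whatever the real hook signature turns out to be.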
## 4. Primary Data Flows
### 4.1 Signup To Provisioning Flow
```mermaid
sequenceDiagram
participant User
participant Site as Website
participant Hub
participant Stripe
participant Worker as Automation Worker
participant Provider as Netcup/Hetzner
participant Prov as Provisioner
participant VPS as Tenant VPS
User->>Site: Describe business + pick tools
Site->>Hub: Create onboarding draft
Site->>Stripe: Checkout session
Stripe-->>Hub: checkout.session.completed
Hub->>Worker: Create order (PAYMENT_CONFIRMED)
Worker->>Provider: Allocate VPS
Provider-->>Worker: VPS ready (IP + creds)
Worker->>Hub: DNS_PENDING -> DNS_READY
Worker->>Prov: Start provisioning job
Prov->>VPS: Install stacks + OpenClaw + Safety
Prov->>VPS: Seed secrets vault + tool registry
Prov->>VPS: Register tenant with Hub
VPS-->>Hub: register + first heartbeat
Hub-->>User: Provisioning complete + app links
```
### 4.2 Agent Tool Call With Gating
```mermaid
sequenceDiagram
participant U as User
participant OC as OpenClaw
participant SW as Safety Wrapper
participant H as Hub
participant T as Tool/API
U->>OC: "Publish this newsletter"
OC->>SW: tool call proposal
SW->>SW: classify = Yellow+External
SW->>H: approval request
H-->>U: push approval request
U->>H: approve
H-->>SW: approval grant
SW->>T: execute with SECRET_REF injection
T-->>SW: result
SW-->>OC: redacted result
OC-->>U: completion summary
```
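
The approval step in this flow, together with the 24h expiry from section 3.2, can be sketched as a small state transition. The `ApprovalRequest` shape and function names are assumptions, as is the rule that the first recorded decision wins.

```typescript
const APPROVAL_TTL_MS = 24 * 60 * 60 * 1000; // 24h expiry, as stated in the proposal

type ApprovalStatus = "pending" | "approved" | "denied" | "expired";

// Hypothetical record shape for one approval request.
interface ApprovalRequest {
  id: string;
  toolCallId: string;
  createdAtMs: number;
  status: ApprovalStatus;
}

function createApproval(id: string, toolCallId: string, nowMs: number): ApprovalRequest {
  return { id, toolCallId, createdAtMs: nowMs, status: "pending" };
}

// Applies a user decision. A decision that arrives after the TTL marks the
// request expired instead; a request that is no longer pending is left as-is.
function decide(
  req: ApprovalRequest,
  decision: "approved" | "denied",
  nowMs: number,
): ApprovalRequest {
  if (req.status !== "pending") return req; // assumption: first decision wins
  if (nowMs - req.createdAtMs >= APPROVAL_TTL_MS) {
    return { ...req, status: "expired" };
  }
  return { ...req, status: decision };
}

// The Safety Wrapper would execute the tool call only for an approved request.
function mayExecute(req: ApprovalRequest): boolean {
  return req.status === "approved";
}
```

Passing `nowMs` explicitly rather than calling `Date.now()` inside keeps the expiry logic deterministic and easy to test.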
### 4.3 Secrets Redaction Outbound Flow
```mermaid
flowchart LR
A[OpenClaw Prompt Payload] --> B[Safety Wrapper Pre-Redaction]
B --> C[Secrets Registry Match]
C --> D[Pattern Safety Net]
D --> E[Function-Call SecretRef Rebinding]
E --> F[Local Egress Proxy]
F --> G[Provider API]
C --> C1[(Vault SQLite)]
D --> D1[(Regex + Entropy Rules)]
F --> F1[Transport-Level Block if bypass attempt]
```
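
The first two redaction stages in this flow (registry match, then the regex and entropy safety net) might look roughly like the sketch below. The `SecretEntry` shape, the 24-character minimum, the 4.0 bits-per-character threshold, and the choice to substitute a `SECRET_REF(name)` placeholder are all assumptions; real thresholds would need tuning against false positives.

```typescript
interface SecretEntry {
  name: string;  // vault key, e.g. "db_password" (hypothetical)
  value: string; // plaintext secret, held only on the tenant server
}

// Shannon entropy in bits per character; random tokens score higher than words.
function shannonEntropy(s: string): number {
  const counts = new Map<string, number>();
  for (const ch of s.split("")) {
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  let bits = 0;
  counts.forEach((n) => {
    const p = n / s.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}

const MIN_TOKEN_LENGTH = 24;   // assumption
const ENTROPY_THRESHOLD = 4.0; // assumption, bits per character

function redact(text: string, registry: readonly SecretEntry[]): string {
  let out = text;
  // Stage 1: exact match against known secrets. Longest values go first so a
  // secret that contains a shorter one is replaced as a whole.
  const sorted = [...registry].sort((a, b) => b.value.length - a.value.length);
  for (const secret of sorted) {
    if (secret.value.length === 0) continue;
    out = out.split(secret.value).join(`SECRET_REF(${secret.name})`);
  }
  // Stage 2: safety net for unregistered secrets: long, high-entropy tokens.
  const candidate = new RegExp(`[A-Za-z0-9+\\/_=-]{${MIN_TOKEN_LENGTH},}`, "g");
  out = out.replace(candidate, (token) =>
    shannonEntropy(token) >= ENTROPY_THRESHOLD ? "[REDACTED]" : token,
  );
  return out;
}
```

The egress proxy would then act as the last line of defence, blocking any request that still contains a registered secret value.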
### 4.4 Token Metering And Billing
```mermaid
flowchart LR
O[OpenClaw llm_output hook] --> M[Metering Collector]
M --> B[(Hourly Buckets SQLite)]
B --> H[Hub Usage Ingest API]
H --> P[(Billing Period + Usage Tables)]
P --> S[Stripe Usage/Billing]
H --> UI[Usage Dashboard + Alerts]
```
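
The Metering Collector's hourly aggregation could be sketched as follows. The `UsageEvent` and `UsageBucket` shapes are assumptions; the four token counters mirror the `input/output/cache_read/cache_write` split described in section 5.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical shape of one llm_output observation.
interface UsageEvent {
  agentId: string;
  model: string;
  timestampMs: number;
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

// One row per (hour, agent, model), ready to send to the Hub usage ingest API.
interface UsageBucket {
  hourStartMs: number;
  agentId: string;
  model: string;
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

function aggregateHourly(events: readonly UsageEvent[]): UsageBucket[] {
  const buckets = new Map<string, UsageBucket>();
  for (const e of events) {
    // Truncate the timestamp to the start of its UTC hour.
    const hourStartMs = Math.floor(e.timestampMs / HOUR_MS) * HOUR_MS;
    const key = `${hourStartMs}|${e.agentId}|${e.model}`;
    let bucket = buckets.get(key);
    if (bucket === undefined) {
      bucket = {
        hourStartMs,
        agentId: e.agentId,
        model: e.model,
        input: 0,
        output: 0,
        cacheRead: 0,
        cacheWrite: 0,
      };
      buckets.set(key, bucket);
    }
    bucket.input += e.input;
    bucket.output += e.output;
    bucket.cacheRead += e.cacheRead;
    bucket.cacheWrite += e.cacheWrite;
  }
  return Array.from(buckets.values());
}
```

Keying buckets by hour, agent, and model means a resent bucket can be upserted on the Hub side instead of double counted, provided the ingest API treats that triple as a unique key (an assumption about the Hub, not something the proposal states).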
## 5. Prompt Caching Architecture
- SOUL.md and TOOLS.md are split into stable cacheable prefix blocks and dynamic suffix blocks.
- Stable prefix hash is generated per agent version.
- Prefix changes only when agent config changes; day-to-day conversations hit cache-read pricing.
- Metering persists `input/output/cache_read/cache_write` separately to preserve margin analytics.
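
A minimal sketch of the prefix/suffix split and the per-version prefix hash, assuming Node.js and SHA-256. The `AgentPromptConfig` shape, the `buildPrompt` name, and the way SOUL and TOOLS content are joined are illustrative assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical view of the stable parts of an agent's configuration.
interface AgentPromptConfig {
  agentVersion: string;
  soul: string;  // stable content from SOUL.md
  tools: string; // stable content from TOOLS.md
}

interface BuiltPrompt {
  prefix: string;     // cacheable block, identical across conversations
  prefixHash: string; // changes only when the agent config changes
  prompt: string;     // prefix followed by the dynamic suffix
}

function buildPrompt(config: AgentPromptConfig, dynamicSuffix: string): BuiltPrompt {
  const prefix = `${config.soul}\n\n${config.tools}`;
  const prefixHash = createHash("sha256")
    .update(config.agentVersion)
    .update("\n")
    .update(prefix)
    .digest("hex");
  // The dynamic part always goes last so the cached prefix stays byte-identical.
  return { prefix, prefixHash, prompt: `${prefix}\n\n${dynamicSuffix}` };
}
```

Logging `prefixHash` alongside the cache-read and cache-write counters would make it easy to spot an unexpected prefix change that is silently defeating the cache.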
## 6. Mobile, Website, And Channel Architecture
### 6.1 Mobile App
- React Native + Expo app as primary interface.
- Real-time chat via Hub websocket gateway.
- Approvals as push notifications (approve/deny quick actions).
- Fallback channel switchboard in Hub for WhatsApp/Telegram relay adapters.
### 6.2 Website + Onboarding
- Dedicated public frontend app (`apps/website`) with strict network boundary to Hub public APIs.
- Onboarding classifier service (cheap model profile) performs 1-2 message business classification.
- Tool bundle recommendation engine returns editable stack + resource calculator.
- Checkout remains Stripe-hosted.
## 7. First-Hour Workflow Templates (Architecture Proof)
| Template | Cross-Tool Actions | Gating Profile |
|---|---|---|
| Freelancer First Hour | Connect mail + calendar, create folders, configure intake form, first daily brief | Mostly Green/Yellow |
| Agency First Hour | Chat inbox setup, project board scaffolding, proposal template generation, shared KB setup | Yellow + Yellow+External approval |
| E-commerce First Hour | Inventory import, support inbox routing, analytics dashboard baseline, recovery email draft | Mixed Yellow/Yellow+External |
| Consulting First Hour | Scheduling links, client doc signature template, CRM stages, weekly report automation | Mostly Yellow + one external gate |
These templates are codified as audited workflow blueprints executed through the same command classification path as ad-hoc agent actions.
## 8. Interactive Demo Architecture (Pre-Purchase)
Proposal: shared but isolated "Demo Tenant Pool" instead of a single static demo VPS.
- Each prospect gets a short-lived demo tenant snapshot (TTL 2 hours).
- Each demo runs on synthetic data and uses fake outbound integrations only.
- Demos use the same Safety Wrapper and approvals UI as production to demonstrate the trust model.
- Demo tenants are recycled automatically after session expiry.
This is safer and more realistic than one long-lived shared "Bella's Bakery" host.
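
The recycling rule for the demo pool can be sketched as a periodic sweep that separates expired tenants from active ones. The `DemoTenant` shape and `sweepDemoPool` name are hypothetical; only the 2-hour TTL comes from the proposal.

```typescript
const DEMO_TTL_MS = 2 * 60 * 60 * 1000; // TTL 2 hours, as proposed

// Hypothetical record for one demo tenant in the pool.
interface DemoTenant {
  id: string;
  startedAtMs: number;
}

interface SweepResult {
  active: DemoTenant[];
  toRecycle: DemoTenant[];
}

// Splits the pool into tenants that are still within their TTL and tenants
// whose session has expired and should be reset to the clean snapshot.
function sweepDemoPool(tenants: readonly DemoTenant[], nowMs: number): SweepResult {
  const result: SweepResult = { active: [], toRecycle: [] };
  for (const tenant of tenants) {
    const expired = nowMs - tenant.startedAtMs >= DEMO_TTL_MS;
    (expired ? result.toRecycle : result.active).push(tenant);
  }
  return result;
}
```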
## 9. Required Pre-Launch Cleanup Baseline
Before the core build starts, run a repository cleanup gate:
- Remove all `n8n` references from Hub, Provisioner, stacks, scripts, tests, and docs used for production behavior.
- Remove deployment references to the deprecated `orchestrator` and `sysadmin-agent` from active provisioning paths.
- Close the plaintext credential leak path (`jobs/*/config.json` root password exposure) by moving to one-time secret files with immediate secure deletion.
No feature work should proceed until this baseline passes CI policy checks.
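
One way such a CI policy check could work is a scan of file contents for forbidden references, as sketched below. The pattern list, the `Violation` shape, and the `findViolations` name are illustrative assumptions; a real check would also need path scoping so that only production-relevant files are scanned.

```typescript
interface Violation {
  path: string;
  line: number; // 1-based line number
  rule: string;
}

// Illustrative rules only; the real list would be derived from the cleanup gate.
const FORBIDDEN: { rule: string; pattern: RegExp }[] = [
  { rule: "n8n reference", pattern: /\bn8n\b/i },
  { rule: "deprecated sysadmin-agent reference", pattern: /\bsysadmin-agent\b/ },
];

// `files` maps a repository path to that file's text content.
function findViolations(files: Record<string, string>): Violation[] {
  const violations: Violation[] = [];
  for (const path of Object.keys(files)) {
    const lines = (files[path] ?? "").split("\n");
    lines.forEach((text, index) => {
      for (const { rule, pattern } of FORBIDDEN) {
        if (pattern.test(text)) {
          violations.push({ path, line: index + 1, rule });
        }
      }
    });
  }
  return violations;
}
```

A CI job could fail the build whenever `findViolations` returns a non-empty list, printing each path and line so the offending reference is easy to locate.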