# 01. Architecture And Data Flows

## 1. Scope And Non-Negotiables

This proposal is designed around the fixed constraints from the Architecture Brief:

- The 4-layer security model is mandatory.
- Secrets must never leave the tenant server.
- The 3-tier autonomy model plus the external communications gate is mandatory.
- OpenClaw is an upstream dependency (no fork by default).
- One customer per VPS is mandatory.
- `n8n` removal is a prerequisite.

## 2. Proposed Target Architecture

### 2.1 Core Decisions

| Decision | Proposal | Why |
|---|---|---|
| Hub stack | Keep Next.js + Prisma + PostgreSQL | The existing app already has major workflows and 80+ APIs; a rewrite would put the 3-month launch timeline at risk. |
| OpenClaw integration | Use pinned upstream release, no fork | Maximizes upgrade velocity and avoids merge debt. |
| Safety Wrapper shape | Hybrid: OpenClaw plugin + local egress proxy + local execution adapters | Gives direct hook interception plus transport-level redaction guarantee. |
| Mobile | React Native + Expo | Fastest path to iOS/Android with TypeScript contract reuse. |
| Website | Separate public web app (same monorepo) + Hub public APIs | Security isolation between public onboarding and admin/customer portal. |
| Repo strategy | Monorepo for first-party services; OpenClaw stays in its separate upstream repo | Strong contract sharing and simpler CI without violating the upstream dependency model. |

### 2.2 System Context Diagram

```mermaid
flowchart LR
    subgraph Client[Client Layer]
      M[Mobile App\nReact Native + Expo]
      W[Website\nOnboarding + Checkout]
      C[Customer Portal Web]
      A[Admin Portal Web]
    end

    subgraph Control[Central Platform]
      H[Hub API + UI\nNext.js + Prisma]
      DB[(PostgreSQL)]
      Q[Background Workers\nAutomation + Metering]
      N[Notification Service\nPush/Email]
      ST[Stripe]
      NC[Netcup/Hetzner]
    end

    subgraph Tenant[Per-Customer VPS]
      OC[OpenClaw Gateway\nUpstream]
      SW[Safety Wrapper Plugin\nHooks + Classification]
      SP[LLM Egress Proxy\nSecrets Firewall]
      SV[(Secrets Vault SQLite\nEncrypted)]
      TA[Tool Adapters + Exec Guards]
      TS[(Tool Stacks 25+)]
      AP[(Approval Cache SQLite)]
      TU[(Token Usage Buckets)]
    end

    M --> H
    W --> H
    C --> H
    A --> H

    H --> DB
    H --> Q
    H --> N
    H <--> ST
    H <--> NC

    H <--> OC
    OC --> SW
    SW --> SP
    SP --> LLM[(LLM Providers)]

    SW <--> SV
    SW <--> TA
    TA <--> TS
    SW <--> AP
    SW --> TU
    TU --> H
```

## 3. Tenant Runtime Architecture

### 3.1 4-Layer Security Enforcement

| Layer | Enforcement Point | Implementation |
|---|---|---|
| 1. Sandbox | OpenClaw runtime/tool sandbox settings | OpenClaw native sandbox + process/container isolation. |
| 2. Tool Policy | OpenClaw agent tool allow/deny | Per-agent tool manifest; tools not listed are unreachable. |
| 3. Command Gating | Safety Wrapper `before_tool_call` | Green/Yellow/Yellow+External/Red/Critical Red classification + approval flow. |
| 4. Secrets Redaction | Local egress proxy + transcript hooks | Outbound prompt redaction before network egress, plus log/transcript redaction hooks. |
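
As an illustration of layer 3, the following is a minimal sketch of a deterministic classifier. Only the five tier labels come from this document; the `ToolCall` and `Rule` shapes, the "most restrictive match wins" policy, and the fail-closed default of Red for unmatched calls are assumptions made for the example, not the actual policy bundle format.

```typescript
// Hypothetical types: only the five tier names are taken from the proposal.
type Tier = "green" | "yellow" | "yellow_external" | "red" | "critical_red";

interface ToolCall {
  tool: string;
  action: string;
  externalRecipient?: boolean; // true when the action reaches someone outside the tenant
}

interface Rule {
  tool: string;   // exact tool name, or "*" for any tool
  action: RegExp; // matched against the action name (do not use the "g" flag)
  tier: Tier;
}

// Ordered from least to most restrictive.
const TIER_ORDER: Tier[] = ["green", "yellow", "yellow_external", "red", "critical_red"];

function classify(call: ToolCall, rules: Rule[]): Tier {
  let result: Tier | undefined;
  for (const rule of rules) {
    const toolMatches = rule.tool === "*" || rule.tool === call.tool;
    if (toolMatches && rule.action.test(call.action)) {
      // Assumption: when several rules match, the most restrictive tier wins.
      if (result === undefined || TIER_ORDER.indexOf(rule.tier) > TIER_ORDER.indexOf(result)) {
        result = rule.tier;
      }
    }
  }
  // Assumption: unmatched calls fail closed.
  if (result === undefined) return "red";
  // A Yellow action that leaves the tenant is escalated to the external gate.
  if (result === "yellow" && call.externalRecipient) return "yellow_external";
  return result;
}

const exampleRules: Rule[] = [
  { tool: "*", action: /^read/, tier: "green" },
  { tool: "mail", action: /^send$/, tier: "yellow" },
];

console.log(classify({ tool: "mail", action: "send", externalRecipient: true }, exampleRules));
```

Because the function is pure and the rules are plain data, the same input always produces the same tier, which is what makes the decision auditable.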

### 3.2 Safety Wrapper Components

- `classification-engine`: deterministic rules engine with signed policy bundle from Hub.
- `approval-gateway`: sync/async approval requests to Hub, with 24h expiry.
- `secret-ref-resolver`: resolves `SECRET_REF(...)` at execution time only.
- `adapter-runtime`: executes tool API adapters and guarded shell/docker/file actions.
- `metering-collector`: captures per-agent/per-model token usage and aggregates hourly.
- `hub-sync-client`: registration, heartbeat, config pull, backup status, command results.
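
To make the `secret-ref-resolver` idea concrete, here is a minimal sketch of execution-time substitution. The reference syntax is an assumption: this document only shows `SECRET_REF(...)`, so the example assumes a single identifier between the parentheses. `VaultLookup` and `resolveSecretRefs` are hypothetical names, and the in-memory object stands in for the encrypted vault.

```typescript
// Assumption: a reference has the form SECRET_REF(name) with a simple identifier inside.
const SECRET_REF_PATTERN = /SECRET_REF\(([A-Za-z0-9_.-]+)\)/g;

// Hypothetical lookup signature; a real resolver would read from the local vault.
type VaultLookup = (name: string) => string | undefined;

function resolveSecretRefs(template: string, lookup: VaultLookup): string {
  return template.replace(SECRET_REF_PATTERN, (_match: string, name: string) => {
    const value = lookup(name);
    if (value === undefined) {
      // Fail loudly instead of sending a literal SECRET_REF(...) to the tool.
      throw new Error("Unknown secret reference: " + name);
    }
    return value;
  });
}

// Stand-in vault for the example only.
const demoVault: { [name: string]: string } = { mail_api_key: "abc123" };
const demoLookup: VaultLookup = (name) => demoVault[name];

// The resolved header exists only at execution time; the model only ever sees the reference.
const header = resolveSecretRefs("Bearer SECRET_REF(mail_api_key)", demoLookup);
console.log(header.length);
```

The point of resolving this late is that prompts, transcripts, and tool-call arguments can carry the reference freely, while the plaintext value exists only inside the adapter call.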

### 3.3 OpenClaw Hook Usage (No Fork)

The Safety Wrapper plugin uses upstream hook points for enforcement and observability:

- `before_tool_call`: classify/gate/block/require approval.
- `after_tool_call`: audit capture + normalization.
- `message_sending`: outbound content redaction.
- `before_message_write`, `tool_result_persist`: local persistence redaction.
- `llm_output`: token accounting and per-model usage capture.
- `before_prompt_build`: inject cacheable SOUL/TOOLS prefix metadata.
- `subagent_spawning`: enforce max depth/budget.
- `gateway_start`: health checks + Hub session bootstrap.
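
As one example of what a hook handler might delegate to, the sketch below shows a pure decision function for the depth and budget check behind `subagent_spawning`. It deliberately does not show OpenClaw's plugin API, whose signatures are not given in this document; `SpawnRequest`, `SpawnLimits`, and `checkSubagentSpawn` are hypothetical names.

```typescript
// Hypothetical shapes; the real hook payload is defined by upstream OpenClaw.
interface SpawnRequest {
  parentDepth: number;     // nesting depth of the agent asking to spawn
  estimatedTokens: number; // tokens the child is expected to consume
}

interface SpawnLimits {
  maxDepth: number;
  remainingTokens: number;
}

type SpawnDecision = { allowed: true } | { allowed: false; reason: string };

function checkSubagentSpawn(req: SpawnRequest, limits: SpawnLimits): SpawnDecision {
  const childDepth = req.parentDepth + 1;
  if (childDepth > limits.maxDepth) {
    return { allowed: false, reason: "depth " + childDepth + " exceeds max " + limits.maxDepth };
  }
  if (req.estimatedTokens > limits.remainingTokens) {
    return { allowed: false, reason: "token budget exceeded" };
  }
  return { allowed: true };
}

console.log(checkSubagentSpawn({ parentDepth: 0, estimatedTokens: 500 }, { maxDepth: 2, remainingTokens: 1000 }));
```

Keeping the decision logic separate from the hook registration means it can be unit-tested without a running gateway and is unaffected by upstream signature changes.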

## 4. Primary Data Flows

### 4.1 Signup To Provisioning Flow

```mermaid
sequenceDiagram
    participant User
    participant Site as Website
    participant Hub
    participant Stripe
    participant Worker as Automation Worker
    participant Provider as Netcup/Hetzner
    participant Prov as Provisioner
    participant VPS as Tenant VPS

    User->>Site: Describe business + pick tools
    Site->>Hub: Create onboarding draft
    Site->>Stripe: Checkout session
    Stripe-->>Hub: checkout.session.completed
    Hub->>Worker: Create order (PAYMENT_CONFIRMED)
    Worker->>Provider: Allocate VPS
    Provider-->>Worker: VPS ready (IP + creds)
    Worker->>Hub: DNS_PENDING -> DNS_READY
    Worker->>Prov: Start provisioning job
    Prov->>VPS: Install stacks + OpenClaw + Safety
    Prov->>VPS: Seed secrets vault + tool registry
    Prov->>VPS: Register tenant with Hub
    VPS-->>Hub: register + first heartbeat
    Hub-->>User: Provisioning complete + app links
```
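
The order states in the diagram can be guarded by an explicit transition table. This is a sketch under assumptions: `PAYMENT_CONFIRMED`, `DNS_PENDING`, and `DNS_READY` appear in the flow above, while `PROVISIONING` and `COMPLETE` are hypothetical placeholders for the later steps, and `advanceOrder` is an illustrative name rather than an existing Hub function.

```typescript
// The first three states come from the sequence diagram; the last two are placeholders.
type OrderStatus =
  | "PAYMENT_CONFIRMED"
  | "DNS_PENDING"
  | "DNS_READY"
  | "PROVISIONING" // hypothetical
  | "COMPLETE";    // hypothetical

// Allowed forward transitions; anything not listed is rejected.
const ALLOWED_NEXT: Record<OrderStatus, OrderStatus[]> = {
  PAYMENT_CONFIRMED: ["DNS_PENDING"],
  DNS_PENDING: ["DNS_READY"],
  DNS_READY: ["PROVISIONING"],
  PROVISIONING: ["COMPLETE"],
  COMPLETE: [],
};

function advanceOrder(current: OrderStatus, next: OrderStatus): OrderStatus {
  if (ALLOWED_NEXT[current].indexOf(next) === -1) {
    throw new Error("Illegal order transition: " + current + " -> " + next);
  }
  return next;
}

let status: OrderStatus = "PAYMENT_CONFIRMED";
status = advanceOrder(status, "DNS_PENDING");
status = advanceOrder(status, "DNS_READY");
console.log(status);
```

Rejecting out-of-order transitions keeps a retried or duplicated worker job from starting provisioning before DNS is ready.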

### 4.2 Agent Tool Call With Gating

```mermaid
sequenceDiagram
    participant U as User
    participant OC as OpenClaw
    participant SW as Safety Wrapper
    participant H as Hub
    participant T as Tool/API

    U->>OC: "Publish this newsletter"
    OC->>SW: tool call proposal
    SW->>SW: classify = Yellow+External
    SW->>H: approval request
    H-->>U: push approval request
    U->>H: approve
    H-->>SW: approval grant
    SW->>T: execute with SECRET_REF injection
    T-->>SW: result
    SW-->>OC: redacted result
    OC-->>U: completion summary
```
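
The approval step in this flow, combined with the 24h expiry from section 3.2, can be sketched as a small status function. The `ApprovalRequest` shape and the function names are assumptions for illustration. The sketch also assumes that a decision recorded after the expiry time does not count.

```typescript
const APPROVAL_TTL_MS = 24 * 60 * 60 * 1000; // 24h expiry from the approval-gateway description

type ApprovalStatus = "pending" | "approved" | "denied" | "expired";

// Hypothetical record shape for one approval request.
interface ApprovalRequest {
  id: string;
  createdAtMs: number;
  decision?: { verdict: "approved" | "denied"; decidedAtMs: number };
}

function approvalStatus(req: ApprovalRequest, nowMs: number): ApprovalStatus {
  const expiresAtMs = req.createdAtMs + APPROVAL_TTL_MS;
  if (req.decision !== undefined) {
    // Assumption: a decision that arrives after expiry is ignored.
    return req.decision.decidedAtMs < expiresAtMs ? req.decision.verdict : "expired";
  }
  return nowMs < expiresAtMs ? "pending" : "expired";
}

// The wrapper executes the tool call only on an explicit, in-time approval.
function mayExecute(req: ApprovalRequest, nowMs: number): boolean {
  return approvalStatus(req, nowMs) === "approved";
}

const request: ApprovalRequest = { id: "req-1", createdAtMs: 0 };
console.log(approvalStatus(request, 60 * 60 * 1000));
```

Passing `nowMs` in as a parameter, rather than reading the clock inside the function, keeps the expiry rule deterministic and easy to test.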

### 4.3 Secrets Redaction Outbound Flow

```mermaid
flowchart LR
    A[OpenClaw Prompt Payload] --> B[Safety Wrapper Pre-Redaction]
    B --> C[Secrets Registry Match]
    C --> D[Pattern Safety Net]
    D --> E[Function-Call SecretRef Rebinding]
    E --> F[Local Egress Proxy]
    F --> G[Provider API]

    C --> C1[(Vault SQLite)]
    D --> D1[(Regex + Entropy Rules)]
    F --> F1[Transport-Level Block if bypass attempt]
```
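
The first two stages of this pipeline, registry match followed by a pattern safety net, can be sketched as below. The token pattern, the minimum length of 20, the entropy threshold of 3.5 bits per character, and the replacement markers are all assumptions chosen for the example; the SecretRef rebinding and proxy stages are not covered.

```typescript
// Shannon entropy in bits per character.
function shannonEntropy(s: string): number {
  const counts: { [ch: string]: number } = {};
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let entropy = 0;
  for (const ch of Object.keys(counts)) {
    const p = counts[ch] / s.length;
    entropy -= p * (Math.log(p) / Math.LN2);
  }
  return entropy;
}

// Assumed heuristics for the safety net; tune against real traffic.
const CANDIDATE_TOKEN = /[A-Za-z0-9+\/_=-]{20,}/g;
const ENTROPY_THRESHOLD = 3.5;

function redactOutbound(payload: string, knownSecrets: string[]): string {
  let out = payload;
  // Stage 1: registry match. Longest first, so a secret that contains another is replaced whole.
  const sorted = knownSecrets
    .filter((s) => s.length > 0)
    .sort((a, b) => b.length - a.length);
  for (const secret of sorted) {
    out = out.split(secret).join("[REDACTED]");
  }
  // Stage 2: pattern safety net for secrets the registry does not know about.
  return out.replace(CANDIDATE_TOKEN, (token: string) =>
    shannonEntropy(token) >= ENTROPY_THRESHOLD ? "[REDACTED:pattern]" : token
  );
}

console.log(redactOutbound("db password is hunter2-db-password", ["hunter2-db-password"]));
```

The registry match is exact and has no false negatives for known values, so it runs first; the entropy check is a heuristic that trades some false positives for coverage of values that were never registered.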

### 4.4 Token Metering And Billing

```mermaid
flowchart LR
    O[OpenClaw llm_output hook] --> M[Metering Collector]
    M --> B[(Hourly Buckets SQLite)]
    B --> H[Hub Usage Ingest API]
    H --> P[(Billing Period + Usage Tables)]
    P --> S[Stripe Usage/Billing]
    H --> UI[Usage Dashboard + Alerts]
```
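
The Metering Collector step can be sketched as a pure aggregation from usage events into hourly buckets keyed by agent and model. The `UsageEvent` and `UsageBucket` shapes are assumptions; the four token counters mirror the `input/output/cache_read/cache_write` split described in section 5.

```typescript
// Hypothetical event shape, one per LLM response.
interface UsageEvent {
  agentId: string;
  model: string;
  timestampMs: number;
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

interface UsageBucket {
  hourStartMs: number;
  agentId: string;
  model: string;
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

const HOUR_MS = 60 * 60 * 1000;

function aggregateHourly(events: UsageEvent[]): UsageBucket[] {
  const buckets: { [key: string]: UsageBucket } = {};
  for (const e of events) {
    // Truncate the timestamp to the start of its hour.
    const hourStartMs = Math.floor(e.timestampMs / HOUR_MS) * HOUR_MS;
    const key = hourStartMs + "|" + e.agentId + "|" + e.model;
    let bucket = buckets[key];
    if (bucket === undefined) {
      bucket = { hourStartMs, agentId: e.agentId, model: e.model, input: 0, output: 0, cacheRead: 0, cacheWrite: 0 };
      buckets[key] = bucket;
    }
    bucket.input += e.input;
    bucket.output += e.output;
    bucket.cacheRead += e.cacheRead;
    bucket.cacheWrite += e.cacheWrite;
  }
  return Object.keys(buckets).map((k) => buckets[k]);
}

console.log(aggregateHourly([
  { agentId: "a1", model: "m1", timestampMs: 1000, input: 10, output: 5, cacheRead: 100, cacheWrite: 0 },
]).length);
```

Shipping aggregated buckets rather than raw events keeps the Hub ingest payload small and avoids sending any prompt content off the tenant VPS.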

## 5. Prompt Caching Architecture

- SOUL.md and TOOLS.md are split into stable cacheable prefix blocks and dynamic suffix blocks.
- Stable prefix hash is generated per agent version.
- Prefix changes only when agent config changes; day-to-day conversations hit cache-read pricing.
- Metering persists `input/output/cache_read/cache_write` separately to preserve margin analytics.
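
A minimal sketch of the stable prefix hash follows, assuming a Node.js runtime and SHA-256 as the hash (the document does not name an algorithm). `prefixHash` and its parameters are hypothetical. The key property is that only the agent version and the stable blocks feed the hash, so dynamic suffix content can never change it.

```typescript
import { createHash } from "node:crypto";

// Hash covers the agent version and the stable blocks only.
// Assumption: SHA-256; any collision-resistant hash would serve.
function prefixHash(agentVersion: string, stableBlocks: string[]): string {
  const hash = createHash("sha256");
  hash.update(agentVersion);
  for (const block of stableBlocks) {
    // A NUL separator keeps ["ab", "c"] and ["a", "bc"] from hashing identically.
    hash.update("\0");
    hash.update(block);
  }
  return hash.digest("hex");
}

const stableSoul = "You are the assistant for this business.";
const stableTools = "Available tools: mail, calendar.";

// Same stable blocks produce the same hash regardless of the conversation that follows.
console.log(prefixHash("v1", [stableSoul, stableTools]) === prefixHash("v1", [stableSoul, stableTools]));
```

Comparing this hash between deployments gives a cheap signal for when a config change will invalidate the provider-side cache and trigger cache-write pricing again.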

## 6. Mobile, Website, And Channel Architecture

### 6.1 Mobile App

- React Native + Expo app as primary interface.
- Real-time chat via Hub websocket gateway.
- Approvals as push notifications (approve/deny quick actions).
- Fallback channel switchboard in Hub for WhatsApp/Telegram relay adapters.

### 6.2 Website + Onboarding

- Dedicated public frontend app (`apps/website`) with strict network boundary to Hub public APIs.
- Onboarding classifier service (cheap model profile) performs 1-2 message business classification.
- Tool bundle recommendation engine returns editable stack + resource calculator.
- Checkout remains Stripe-hosted.

## 7. First-Hour Workflow Templates (Architecture Proof)

| Template | Cross-Tool Actions | Gating Profile |
|---|---|---|
| Freelancer First Hour | Connect mail + calendar, create folders, configure intake form, first daily brief | Mostly Green/Yellow |
| Agency First Hour | Chat inbox setup, project board scaffolding, proposal template generation, shared KB setup | Yellow + Yellow+External approval |
| E-commerce First Hour | Inventory import, support inbox routing, analytics dashboard baseline, recovery email draft | Mixed Yellow/Yellow+External |
| Consulting First Hour | Scheduling links, client doc signature template, CRM stages, weekly report automation | Mostly Yellow + one external gate |

These templates are codified as audited workflow blueprints executed through the same command classification path as ad-hoc agent actions.

## 8. Interactive Demo Architecture (Pre-Purchase)

Proposal: use a shared but isolated "Demo Tenant Pool" instead of a single static demo VPS.

- Each prospect gets a short-lived demo tenant snapshot (TTL 2 hours).
- Each demo runs on synthetic data with fake outbound integrations only.
- The demo uses the same Safety Wrapper and approvals UI as production to demonstrate the trust model.
- Demo tenants are recycled automatically after session expiry.

This is safer and more realistic than one long-lived shared "Bella's Bakery" host.

## 9. Required Pre-Launch Cleanup Baseline

Before the core build starts, run a repository cleanup gate:

- Remove all `n8n` references from Hub, Provisioner, stacks, scripts, tests, and docs used for production behavior.
- Remove deployment references to the deprecated `orchestrator` and `sysadmin-agent` from active provisioning paths.
- Close the plaintext credential leak path (root password exposure in `jobs/*/config.json`) by moving to one-time secret files with immediate secure deletion.

No feature work should proceed until this baseline passes CI policy checks.
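
One way such a CI policy check could work is a scan for forbidden terms across file contents. The sketch below is an assumption about the mechanism, not an existing script: `findForbiddenReferences` and the `Violation` shape are hypothetical, and files are passed in as an in-memory map so the example stays self-contained. A real check would also need to scope terms such as `orchestrator` to active provisioning paths.

```typescript
// Hypothetical report shape for one forbidden reference.
interface Violation {
  path: string;
  line: number; // 1-based line number
  term: string;
}

function findForbiddenReferences(
  files: { [path: string]: string },
  terms: string[]
): Violation[] {
  const violations: Violation[] = [];
  for (const path of Object.keys(files)) {
    const lines = files[path].split("\n");
    for (let i = 0; i < lines.length; i++) {
      const lower = lines[i].toLowerCase();
      for (const term of terms) {
        // Case-insensitive substring match.
        if (lower.indexOf(term.toLowerCase()) !== -1) {
          violations.push({ path, line: i + 1, term });
        }
      }
    }
  }
  return violations;
}

const report = findForbiddenReferences(
  { "stacks/compose.yml": "services:\n  n8n:\n    image: example" },
  ["n8n"]
);
// A CI job would fail the build when the report is non-empty.
console.log(report.length);
```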