9.3 KiB

Raw Permalink Blame History

01. Architecture And Data Flows

1. Scope And Non-Negotiables

This proposal is explicitly designed around the fixed constraints from the Architecture Brief:

4-layer security model is mandatory.
Secrets never leave tenant server is mandatory.
3-tier autonomy + external communications gate is mandatory.
OpenClaw is upstream dependency (no fork by default).
One customer = one VPS is mandatory.
n8n removal is prerequisite.

2. Proposed Target Architecture

2.1 Core Decisions

Decision	Proposal	Why
Hub stack	Keep Next.js + Prisma + PostgreSQL	Existing app already has major workflows and 80+ APIs; rewrite is timeline-risky for 3-month launch.
OpenClaw integration	Use pinned upstream release, no fork	Maximizes upgrade velocity and avoids merge debt.
Safety Wrapper shape	Hybrid: OpenClaw plugin + local egress proxy + local execution adapters	Gives direct hook interception plus transport-level redaction guarantee.
Mobile	React Native + Expo	Fastest path to iOS/Android with TypeScript contract reuse.
Website	Separate public web app (same monorepo) + Hub public APIs	Security isolation between public onboarding and admin/customer portal.
Repo strategy	Monorepo for first-party services; OpenClaw kept separate upstream repo	Strong contract sharing + CI simplicity without violating upstream dependency model.

2.2 System Context Diagram

flowchart LR
    subgraph Client[Client Layer]
      M[Mobile App\nReact Native + Expo]
      W[Website\nOnboarding + Checkout]
      C[Customer Portal Web]
      A[Admin Portal Web]
    end

    subgraph Control[Central Platform]
      H[Hub API + UI\nNext.js + Prisma]
      DB[(PostgreSQL)]
      Q[Background Workers\nAutomation + Metering]
      N[Notification Service\nPush/Email]
      ST[Stripe]
      NC[Netcup/Hetzner]
    end

    subgraph Tenant[Per-Customer VPS]
      OC[OpenClaw Gateway\nUpstream]
      SW[Safety Wrapper Plugin\nHooks + Classification]
      SP[LLM Egress Proxy\nSecrets Firewall]
      SV[(Secrets Vault SQLite\nEncrypted)]
      TA[Tool Adapters + Exec Guards]
      TS[(Tool Stacks 25+)]
      AP[(Approval Cache SQLite)]
      TU[(Token Usage Buckets)]
    end

    M --> H
    W --> H
    C --> H
    A --> H

    H --> DB
    H --> Q
    H --> N
    H <--> ST
    H <--> NC

    H <--> OC
    OC --> SW
    SW --> SP
    SP --> LLM[(LLM Providers)]

    SW <--> SV
    SW <--> TA
    TA <--> TS
    SW <--> AP
    SW --> TU
    TU --> H

3. Tenant Runtime Architecture

3.1 4-Layer Security Enforcement

Layer	Enforcement Point	Implementation
1. Sandbox	OpenClaw runtime/tool sandbox settings	OpenClaw native sandbox + process/container isolation.
2. Tool Policy	OpenClaw agent tool allow/deny	Per-agent tool manifest; tools not listed are unreachable.
3. Command Gating	Safety Wrapper `before_tool_call`	Green/Yellow/Yellow+External/Red/Critical Red classification + approval flow.
4. Secrets Redaction	Local egress proxy + transcript hooks	Outbound prompt redaction before network egress, plus log/transcript redaction hooks.

3.2 Safety Wrapper Components

classification-engine: deterministic rules engine with signed policy bundle from Hub.
approval-gateway: sync/async approval requests to Hub, with 24h expiry.
secret-ref-resolver: resolves SECRET_REF(...) at execution time only.
adapter-runtime: executes tool API adapters and guarded shell/docker/file actions.
metering-collector: captures per-agent/per-model token usage and aggregates hourly.
hub-sync-client: registration, heartbeat, config pull, backup status, command results.

3.3 OpenClaw Hook Usage (No Fork)

Safety Wrapper plugin uses upstream hook points for enforcement and observability:

before_tool_call: classify/gate/block/require approval.
after_tool_call: audit capture + normalization.
message_sending: outbound content redaction.
before_message_write, tool_result_persist: local persistence redaction.
llm_output: token accounting and per-model usage capture.
before_prompt_build: inject cacheable SOUL/TOOLS prefix metadata.
subagent_spawning: enforce max depth/budget.
gateway_start: health checks + Hub session bootstrap.

4. Primary Data Flows

sequenceDiagram
    participant User
    participant Site as Website
    participant Hub
    participant Stripe
    participant Worker as Automation Worker
    participant Provider as Netcup/Hetzner
    participant Prov as Provisioner
    participant VPS as Tenant VPS

    User->>Site: Describe business + pick tools
    Site->>Hub: Create onboarding draft
    Site->>Stripe: Checkout session
    Stripe-->>Hub: checkout.session.completed
    Hub->>Worker: Create order (PAYMENT_CONFIRMED)
    Worker->>Provider: Allocate VPS
    Provider-->>Worker: VPS ready (IP + creds)
    Worker->>Hub: DNS_PENDING -> DNS_READY
    Worker->>Prov: Start provisioning job
    Prov->>VPS: Install stacks + OpenClaw + Safety
    Prov->>VPS: Seed secrets vault + tool registry
    Prov->>VPS: Register tenant with Hub
    VPS-->>Hub: register + first heartbeat
    Hub-->>User: Provisioning complete + app links

4.2 Agent Tool Call With Gating

sequenceDiagram
    participant U as User
    participant OC as OpenClaw
    participant SW as Safety Wrapper
    participant H as Hub
    participant T as Tool/API

    U->>OC: "Publish this newsletter"
    OC->>SW: tool call proposal
    SW->>SW: classify = Yellow+External
    SW->>H: approval request
    H-->>U: push approval request
    U->>H: approve
    H-->>SW: approval grant
    SW->>T: execute with SECRET_REF injection
    T-->>SW: result
    SW-->>OC: redacted result
    OC-->>U: completion summary

4.3 Secrets Redaction Outbound Flow

flowchart LR
    A[OpenClaw Prompt Payload] --> B[Safety Wrapper Pre-Redaction]
    B --> C[Secrets Registry Match]
    C --> D[Pattern Safety Net]
    D --> E[Function-Call SecretRef Rebinding]
    E --> F[Local Egress Proxy]
    F --> G[Provider API]

    C --> C1[(Vault SQLite)]
    D --> D1[(Regex + Entropy Rules)]
    F --> F1[Transport-Level Block if bypass attempt]

4.4 Token Metering And Billing

flowchart LR
    O[OpenClaw llm_output hook] --> M[Metering Collector]
    M --> B[(Hourly Buckets SQLite)]
    B --> H[Hub Usage Ingest API]
    H --> P[(Billing Period + Usage Tables)]
    P --> S[Stripe Usage/Billing]
    H --> UI[Usage Dashboard + Alerts]

5. Prompt Caching Architecture

SOUL.md and TOOLS.md are split into stable cacheable prefix blocks and dynamic suffix blocks.
Stable prefix hash is generated per agent version.
Prefix changes only when agent config changes; day-to-day conversations hit cache-read pricing.
Metering persists input/output/cache_read/cache_write separately to preserve margin analytics.

6. Mobile, Website, And Channel Architecture

6.1 Mobile App

React Native + Expo app as primary interface.
Real-time chat via Hub websocket gateway.
Approvals as push notifications (approve/deny quick actions).
Fallback channel switchboard in Hub for WhatsApp/Telegram relay adapters.

6.2 Website + Onboarding

Dedicated public frontend app (apps/website) with strict network boundary to Hub public APIs.
Onboarding classifier service (cheap model profile) performs 1-2 message business classification.
Tool bundle recommendation engine returns editable stack + resource calculator.
Checkout remains Stripe-hosted.

7. First-Hour Workflow Templates (Architecture Proof)

Template	Cross-Tool Actions	Gating Profile
Freelancer First Hour	Connect mail + calendar, create folders, configure intake form, first daily brief	Mostly Green/Yellow
Agency First Hour	Chat inbox setup, project board scaffolding, proposal template generation, shared KB setup	Yellow + Yellow+External approval
E-commerce First Hour	Inventory import, support inbox routing, analytics dashboard baseline, recovery email draft	Mixed Yellow/Yellow+External
Consulting First Hour	Scheduling links, client doc signature template, CRM stages, weekly report automation	Mostly Yellow + one external gate

These templates are codified as audited workflow blueprints executed through the same command classification path as ad-hoc agent actions.

8. Interactive Demo Architecture (Pre-Purchase)

Proposal: shared but isolated "Demo Tenant Pool" instead of a single static demo VPS.

Each prospect gets a short-lived demo tenant snapshot (TTL 2 hours).
Demo runs synthetic data and fake outbound integrations only.
Same Safety Wrapper + approvals UI as production to demonstrate trust model.
Recycled automatically after session expiry.

This is safer and more realistic than one long-lived shared "Bella's Bakery" host.

9. Required Pre-Launch Cleanup Baseline

Before core build starts, execute repository cleanup gate:

Remove all n8n references from Hub, Provisioner, stacks, scripts, tests, and docs used for production behavior.
Remove deployment references to deprecated orchestrator and sysadmin-agent from active provisioning paths.
Close plaintext credential leak path (jobs/*/config.json root password exposure) by moving to one-time secret files + immediate secure deletion.

No feature work should proceed until this baseline passes CI policy checks.

9.3 KiB Raw Permalink Blame History