Initial commit: LetsBe Biz project with openclaw source

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 16:24:23 +01:00
commit 14ff8fd54c
93 changed files with 31651 additions and 0 deletions

@@ -0,0 +1,37 @@
# 00. Executive Summary
## Recommended Direction
- Retain and extend `letsbe-hub` instead of rewriting the backend.
- Build the Safety Wrapper as an OpenClaw plugin with a separate local egress redaction proxy.
- Treat OpenClaw as a pinned upstream dependency (no fork).
- Make removal of `n8n` and the deprecated stack, plus the plaintext credential leak fixes, the first gate.
- Launch mobile with React Native + Expo, and web onboarding as a separate frontend app.
- Move first-party code to a monorepo for shared contracts and coordinated CI.
## Delivery Window
- Start: March 2, 2026
- Founding member launch target: May 24, 2026
- Buffer: May 25-31, 2026
## Hard Requirements Preserved
- 4-layer security model
- secrets-never-leave-server invariant
- 3-tier autonomy with independent external-comms gate
- one customer per VPS
## Most Critical Risks
- security bypass in redaction/gating
- provisioner migration instability
- billing metering accuracy drift
## First Build Gate
Do not start feature tracks until:
1. all `n8n` production references removed
2. deprecated deploy paths disabled
3. plaintext provisioning secret storage eliminated

@@ -0,0 +1,250 @@
# 01. Architecture And Data Flows
## 1. Scope And Non-Negotiables
This proposal is explicitly designed around the fixed constraints from the Architecture Brief:
- The 4-layer security model is mandatory.
- The secrets-never-leave-server invariant is mandatory.
- The 3-tier autonomy model plus the external communications gate is mandatory.
- OpenClaw is an upstream dependency (no fork by default).
- One customer per VPS is mandatory.
- `n8n` removal is a prerequisite.
## 2. Proposed Target Architecture
### 2.1 Core Decisions
| Decision | Proposal | Why |
|---|---|---|
| Hub stack | Keep Next.js + Prisma + PostgreSQL | The existing app already has major workflows and 80+ APIs; a rewrite is timeline-risky for a 3-month launch. |
| OpenClaw integration | Use pinned upstream release, no fork | Maximizes upgrade velocity and avoids merge debt. |
| Safety Wrapper shape | Hybrid: OpenClaw plugin + local egress proxy + local execution adapters | Gives direct hook interception plus transport-level redaction guarantee. |
| Mobile | React Native + Expo | Fastest path to iOS/Android with TypeScript contract reuse. |
| Website | Separate public web app (same monorepo) + Hub public APIs | Security isolation between public onboarding and admin/customer portal. |
| Repo strategy | Monorepo for first-party services; OpenClaw kept as a separate upstream repo | Strong contract sharing + CI simplicity without violating the upstream dependency model. |
### 2.2 System Context Diagram
```mermaid
flowchart LR
subgraph Client[Client Layer]
M[Mobile App\nReact Native + Expo]
W[Website\nOnboarding + Checkout]
C[Customer Portal Web]
A[Admin Portal Web]
end
subgraph Control[Central Platform]
H[Hub API + UI\nNext.js + Prisma]
DB[(PostgreSQL)]
Q[Background Workers\nAutomation + Metering]
N[Notification Service\nPush/Email]
ST[Stripe]
NC[Netcup/Hetzner]
end
subgraph Tenant[Per-Customer VPS]
OC[OpenClaw Gateway\nUpstream]
SW[Safety Wrapper Plugin\nHooks + Classification]
SP[LLM Egress Proxy\nSecrets Firewall]
SV[(Secrets Vault SQLite\nEncrypted)]
TA[Tool Adapters + Exec Guards]
TS[(Tool Stacks 25+)]
AP[(Approval Cache SQLite)]
TU[(Token Usage Buckets)]
end
M --> H
W --> H
C --> H
A --> H
H --> DB
H --> Q
H --> N
H <--> ST
H <--> NC
H <--> OC
OC --> SW
SW --> SP
SP --> LLM[(LLM Providers)]
SW <--> SV
SW <--> TA
TA <--> TS
SW <--> AP
SW --> TU
TU --> H
```
## 3. Tenant Runtime Architecture
### 3.1 4-Layer Security Enforcement
| Layer | Enforcement Point | Implementation |
|---|---|---|
| 1. Sandbox | OpenClaw runtime/tool sandbox settings | OpenClaw native sandbox + process/container isolation. |
| 2. Tool Policy | OpenClaw agent tool allow/deny | Per-agent tool manifest; tools not listed are unreachable. |
| 3. Command Gating | Safety Wrapper `before_tool_call` | Green/Yellow/Yellow+External/Red/Critical Red classification + approval flow. |
| 4. Secrets Redaction | Local egress proxy + transcript hooks | Outbound prompt redaction before network egress, plus log/transcript redaction hooks. |
### 3.2 Safety Wrapper Components
- `classification-engine`: deterministic rules engine with signed policy bundle from Hub.
- `approval-gateway`: sync/async approval requests to Hub, with 24h expiry.
- `secret-ref-resolver`: resolves `SECRET_REF(...)` at execution time only.
- `adapter-runtime`: executes tool API adapters and guarded shell/docker/file actions.
- `metering-collector`: captures per-agent/per-model token usage and aggregates hourly.
- `hub-sync-client`: registration, heartbeat, config pull, backup status, command results.
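A minimal sketch of what `secret-ref-resolver` could do, assuming placeholders are written literally as `SECRET_REF(name)`. The function name `resolveSecretRefs`, the error class, and the in-memory `Map` standing in for the encrypted vault are illustrative assumptions, not part of the existing codebase.

```typescript
// Hypothetical sketch: resolve SECRET_REF(name) placeholders at execution time.
// The Map stands in for the encrypted vault; all names here are illustrative.
class SecretRefUnresolvedError extends Error {
  readonly code = "SECRET_REF_UNRESOLVED";
  constructor(public readonly ref: string) {
    super(`Secret reference could not be resolved: ${ref}`);
  }
}

const SECRET_REF_PATTERN = /SECRET_REF\(([A-Za-z0-9_.-]+)\)/g;

function resolveSecretRefs(template: string, vault: Map<string, string>): string {
  // A function replacer inserts the value literally (no "$&" expansion).
  return template.replace(SECRET_REF_PATTERN, (_match: string, ref: string) => {
    const value = vault.get(ref);
    if (value === undefined) {
      throw new SecretRefUnresolvedError(ref);
    }
    return value;
  });
}
```

Because resolution happens only inside the execution layer, anything logged or sent to the model before this call still contains the placeholder rather than the value.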
### 3.3 OpenClaw Hook Usage (No Fork)
The Safety Wrapper plugin uses upstream hook points for enforcement and observability:
- `before_tool_call`: classify/gate/block/require approval.
- `after_tool_call`: audit capture + normalization.
- `message_sending`: outbound content redaction.
- `before_message_write`, `tool_result_persist`: local persistence redaction.
- `llm_output`: token accounting and per-model usage capture.
- `before_prompt_build`: inject cacheable SOUL/TOOLS prefix metadata.
- `subagent_spawning`: enforce max depth/budget.
- `gateway_start`: health checks + Hub session bootstrap.
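The classification step performed inside `before_tool_call` might look like the sketch below. It is deliberately independent of the OpenClaw hook signature, which is not reproduced here. The rule shape, the wildcard convention, the class identifiers other than `yellow_external`, and the fail-closed default are all assumptions for illustration.

```typescript
// Hypothetical sketch of a deterministic, order-sensitive classification pass.
type CommandClass = "green" | "yellow" | "yellow_external" | "red" | "critical_red";

interface ClassificationRule {
  toolPattern: string; // exact tool id, or a prefix ending in "*"
  commandClass: CommandClass;
}

function matchesPattern(tool: string, pattern: string): boolean {
  if (pattern.endsWith("*")) {
    return tool.startsWith(pattern.slice(0, -1));
  }
  return tool === pattern;
}

// First matching rule wins, so rule order is part of the policy.
// Unknown tools fail closed to the most restrictive class (assumption).
function classifyToolCall(tool: string, rules: ClassificationRule[]): CommandClass {
  for (const rule of rules) {
    if (matchesPattern(tool, rule.toolPattern)) {
      return rule.commandClass;
    }
  }
  return "critical_red";
}
```

Keeping the engine a pure function of `(tool, rules)` is what makes the "deterministic policy tests" in the implementation plan straightforward to write.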
## 4. Primary Data Flows
### 4.1 Signup To Provisioning Flow
```mermaid
sequenceDiagram
participant User
participant Site as Website
participant Hub
participant Stripe
participant Worker as Automation Worker
participant Provider as Netcup/Hetzner
participant Prov as Provisioner
participant VPS as Tenant VPS
User->>Site: Describe business + pick tools
Site->>Hub: Create onboarding draft
Site->>Stripe: Checkout session
Stripe-->>Hub: checkout.session.completed
Hub->>Worker: Create order (PAYMENT_CONFIRMED)
Worker->>Provider: Allocate VPS
Provider-->>Worker: VPS ready (IP + creds)
Worker->>Hub: DNS_PENDING -> DNS_READY
Worker->>Prov: Start provisioning job
Prov->>VPS: Install stacks + OpenClaw + Safety
Prov->>VPS: Seed secrets vault + tool registry
Prov->>VPS: Register tenant with Hub
VPS-->>Hub: register + first heartbeat
Hub-->>User: Provisioning complete + app links
```
### 4.2 Agent Tool Call With Gating
```mermaid
sequenceDiagram
participant U as User
participant OC as OpenClaw
participant SW as Safety Wrapper
participant H as Hub
participant T as Tool/API
U->>OC: "Publish this newsletter"
OC->>SW: tool call proposal
SW->>SW: classify = Yellow+External
SW->>H: approval request
H-->>U: push approval request
U->>H: approve
H-->>SW: approval grant
SW->>T: execute with SECRET_REF injection
T-->>SW: result
SW-->>OC: redacted result
OC-->>U: completion summary
```
### 4.3 Secrets Redaction Outbound Flow
```mermaid
flowchart LR
A[OpenClaw Prompt Payload] --> B[Safety Wrapper Pre-Redaction]
B --> C[Secrets Registry Match]
C --> D[Pattern Safety Net]
D --> E[Function-Call SecretRef Rebinding]
E --> F[Local Egress Proxy]
F --> G[Provider API]
C --> C1[(Vault SQLite)]
D --> D1[(Regex + Entropy Rules)]
F --> F1[Transport-Level Block if bypass attempt]
```
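The first two redaction layers in the flow above can be sketched roughly as follows. This is an illustration under assumptions: the placeholder format, the token regex, and the entropy threshold of 4.0 bits per character are made up for the example and would need tuning against real traffic.

```typescript
// Hypothetical sketch of registry matching followed by a pattern/entropy safety net.
function shannonEntropy(token: string): number {
  const counts = new Map<string, number>();
  for (let i = 0; i < token.length; i++) {
    const ch = token.charAt(i);
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  let entropy = 0;
  counts.forEach((count) => {
    const p = count / token.length;
    entropy -= p * Math.log2(p);
  });
  return entropy; // bits per character
}

function redactOutbound(text: string, registry: Map<string, string>): string {
  let redacted = text;
  // Layer 1: exact values known to the secrets registry (name -> value).
  registry.forEach((value, name) => {
    if (value.length > 0) {
      redacted = redacted.split(value).join(`[REDACTED:${name}]`);
    }
  });
  // Layer 2: long, high-entropy tokens that look like credentials.
  return redacted.replace(/[A-Za-z0-9+\/_=-]{24,}/g, (token) =>
    shannonEntropy(token) >= 4.0 ? "[REDACTED:high_entropy]" : token
  );
}
```

The egress proxy would then act as the transport-level backstop for anything these in-process layers miss.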
### 4.4 Token Metering And Billing
```mermaid
flowchart LR
O[OpenClaw llm_output hook] --> M[Metering Collector]
M --> B[(Hourly Buckets SQLite)]
B --> H[Hub Usage Ingest API]
H --> P[(Billing Period + Usage Tables)]
P --> S[Stripe Usage/Billing]
H --> UI[Usage Dashboard + Alerts]
```
## 5. Prompt Caching Architecture
- SOUL.md and TOOLS.md are split into stable cacheable prefix blocks and dynamic suffix blocks.
- Stable prefix hash is generated per agent version.
- Prefix changes only when agent config changes; day-to-day conversations hit cache-read pricing.
- Metering persists `input/output/cache_read/cache_write` separately to preserve margin analytics.
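One way to derive a per-agent-version cache key from the stable prefix is sketched below. The `PromptParts` shape and function names are hypothetical, and FNV-1a over UTF-16 code units is used only as a dependency-free stand-in for whatever digest is eventually chosen.

```typescript
// Hypothetical sketch: the cache key depends only on the stable prefix.
interface PromptParts {
  stablePrefix: string;  // cacheable SOUL.md + TOOLS.md blocks
  dynamicSuffix: string; // per-conversation content, excluded from the key
}

// FNV-1a (32-bit) over UTF-16 code units; a stand-in, not a cryptographic digest.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

function prefixCacheKey(agentId: string, agentVersion: string, parts: PromptParts): string {
  return `${agentId}@${agentVersion}:${fnv1a32(parts.stablePrefix)}`;
}
```

Two prompts that differ only in `dynamicSuffix` produce the same key, which is the property that lets day-to-day conversations hit cache-read pricing.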
## 6. Mobile, Website, And Channel Architecture
### 6.1 Mobile App
- React Native + Expo app as primary interface.
- Real-time chat via Hub websocket gateway.
- Approvals as push notifications (approve/deny quick actions).
- Fallback channel switchboard in Hub for WhatsApp/Telegram relay adapters.
### 6.2 Website + Onboarding
- Dedicated public frontend app (`apps/website`) with strict network boundary to Hub public APIs.
- Onboarding classifier service (cheap model profile) performs 1-2 message business classification.
- Tool bundle recommendation engine returns editable stack + resource calculator.
- Checkout remains Stripe-hosted.
## 7. First-Hour Workflow Templates (Architecture Proof)
| Template | Cross-Tool Actions | Gating Profile |
|---|---|---|
| Freelancer First Hour | Connect mail + calendar, create folders, configure intake form, first daily brief | Mostly Green/Yellow |
| Agency First Hour | Chat inbox setup, project board scaffolding, proposal template generation, shared KB setup | Yellow + Yellow+External approval |
| E-commerce First Hour | Inventory import, support inbox routing, analytics dashboard baseline, recovery email draft | Mixed Yellow/Yellow+External |
| Consulting First Hour | Scheduling links, client doc signature template, CRM stages, weekly report automation | Mostly Yellow + one external gate |
These templates are codified as audited workflow blueprints executed through the same command classification path as ad-hoc agent actions.
## 8. Interactive Demo Architecture (Pre-Purchase)
Proposal: shared but isolated "Demo Tenant Pool" instead of a single static demo VPS.
- Each prospect gets a short-lived demo tenant snapshot (TTL 2 hours).
- Demo runs synthetic data and fake outbound integrations only.
- Same Safety Wrapper + approvals UI as production to demonstrate trust model.
- Recycled automatically after session expiry.
This is safer and more realistic than one long-lived shared "Bella's Bakery" host.
## 9. Required Pre-Launch Cleanup Baseline
Before the core build starts, execute the repository cleanup gate:
- Remove all `n8n` references from Hub, Provisioner, stacks, scripts, tests, and docs used for production behavior.
- Remove deployment references to deprecated `orchestrator` and `sysadmin-agent` from active provisioning paths.
- Close plaintext credential leak path (`jobs/*/config.json` root password exposure) by moving to one-time secret files + immediate secure deletion.
No feature work should proceed until this baseline passes CI policy checks.
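A CI policy check for this baseline could be as small as the sketch below. It scans in-memory file contents so that it stays self-contained; a real check would walk the working tree. The function name and the `Violation` shape are assumptions.

```typescript
// Hypothetical sketch of a forbidden-reference scan for the cleanup gate.
interface Violation {
  file: string;
  line: number; // 1-based
  text: string;
}

function findForbiddenReferences(files: Record<string, string>, needle: string): Violation[] {
  const violations: Violation[] = [];
  const lowered = needle.toLowerCase();
  Object.keys(files).forEach((file) => {
    const content = files[file] ?? "";
    content.split("\n").forEach((text, index) => {
      // Case-insensitive substring match, so "N8N_WEBHOOK_URL" is caught too.
      if (text.toLowerCase().includes(lowered)) {
        violations.push({ file, line: index + 1, text: text.trim() });
      }
    });
  });
  return violations;
}
```

The gate passes when the returned list is empty for every production code path; docs that only describe the removal would need an explicit allowlist.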

@@ -0,0 +1,334 @@
# 02. Component Breakdown And API Contracts
## 1. Component Breakdown
### 1.1 Control Plane Components
| Component | Runtime | Responsibility | Notes |
|---|---|---|---|
| Hub Web/API | Next.js 16 + Node | Admin UI, customer portal, public APIs, tenant APIs | Keep existing app, add route groups and API contracts below. |
| Billing Engine | Node worker + Prisma | Usage aggregation, pool accounting, overage invoicing | Hourly usage compaction + end-of-period invoice sync. |
| Provisioning Orchestrator | Existing automation worker | Order state machine and provisioning job dispatch | Keep and harden existing job pipeline. |
| Notification Gateway | Node service | Push notifications, email alerts, approval prompts | Expo push + email provider adapters. |
| Onboarding Classifier | Lightweight service | Business-type classification + starter bundle recommendation | Cheap fast model profile; capped context. |
### 1.2 Tenant Components (Per VPS)
| Component | Runtime | Responsibility | State Store |
|---|---|---|---|
| OpenClaw Gateway | Node 22+ upstream | Agent runtime, sessions, tool orchestration | OpenClaw JSON/JSONL storage |
| Safety Wrapper Plugin | TypeScript package | Classification, gating, hooks, metering, Hub sync | SQLite (`safety.db`) |
| Egress Proxy | Node/Rust sidecar | Outbound redaction + transport enforcement | In-memory + policy cache |
| Execution Adapters | Local modules | Shell/Docker/file/env and tool REST adapters | Audit log in SQLite |
| Secrets Vault | SQLite + encryption | Secret values, rotation history, fingerprints | `vault.db` |
### 1.3 Deprecated Components (Explicitly Out)
- `letsbe-orchestrator`: behavior studied for migration inputs only.
- `letsbe-sysadmin-agent`: executor patterns ported, service itself not retained.
- `letsbe-mcp-browser`: replaced by OpenClaw native browser tooling.
## 2. API Design Rules (Applies To All Contracts)
- Base path versioning: `/api/v1/...`
- JSON request/response with strict schema validation.
- Idempotency required on mutating tenant commands (`Idempotency-Key` header).
- Authn/authz split by channel:
  - Tenant channel: `Bearer <tenant_api_key>` (hash stored server-side)
  - Mobile/customer channel: session JWT + RBAC
  - Public website onboarding: scoped API key + anti-abuse limits
- All mutating endpoints emit audit event rows.
- All time fields are ISO 8601 UTC.
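The idempotency rule can be illustrated with the sketch below. It is an in-memory, synchronous simplification: the class name, the response shape, and scoping keys by tenant are assumptions, and a real Hub implementation would persist entries and handle concurrent requests.

```typescript
// Hypothetical sketch: replay the stored response for a repeated Idempotency-Key.
interface StoredResponse {
  status: number;
  body: unknown;
}

class IdempotencyStore {
  private readonly responses = new Map<string, StoredResponse>();

  // Runs `handler` at most once per (tenantId, key) and replays its result afterwards.
  execute(tenantId: string, key: string, handler: () => StoredResponse): StoredResponse {
    const cacheKey = `${tenantId}:${key}`;
    const existing = this.responses.get(cacheKey);
    if (existing !== undefined) {
      return existing;
    }
    const response = handler();
    this.responses.set(cacheKey, response);
    return response;
  }
}
```

Scoping the cache key by tenant keeps one tenant's retries from colliding with another tenant's keys.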
## 3. Hub ↔ Tenant API Contracts
### 3.1 Register Tenant Node
`POST /api/v1/tenant/register`
Purpose: first boot registration from Safety Wrapper.
Request:
```json
{
  "registrationToken": "rt_...",
  "orderId": "ord_...",
  "agentVersion": "safety-wrapper@0.1.0",
  "openclawVersion": "2026.2.26",
  "hostname": "cust-vps-001",
  "capabilities": ["browser", "exec", "docker", "approval_queue"]
}
```
Response `201`:
```json
{
  "tenantApiKey": "tk_live_...",
  "tenantId": "ten_...",
  "heartbeatIntervalSec": 30,
  "configEtag": "cfg_9f1a...",
  "time": "2026-02-26T20:15:00Z"
}
```
### 3.2 Heartbeat + Pull Deltas
`POST /api/v1/tenant/heartbeat`
Purpose: status signal plus lightweight config/update pull.
Request:
```json
{
  "tenantId": "ten_...",
  "server": {
    "uptimeSec": 86400,
    "diskPct": 61.2,
    "memPct": 57.8,
    "openclawHealthy": true
  },
  "agents": [
    {"agentId": "marketing", "status": "online", "autonomyLevel": 2}
  ],
  "pendingApprovals": 1,
  "lastAppliedConfigEtag": "cfg_9f1a..."
}
```
Response `200`:
```json
{
  "configChanged": true,
  "nextConfigEtag": "cfg_9f1b...",
  "commands": [],
  "clock": "2026-02-26T20:15:30Z"
}
```
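On the tenant side, handling the heartbeat response might reduce to a small decision function like the one below. The `HeartbeatAction` type and the function name are hypothetical; only the response fields come from the contract above.

```typescript
// Hypothetical sketch: decide whether a config pull is needed after a heartbeat.
interface HeartbeatResponse {
  configChanged: boolean;
  nextConfigEtag: string;
  commands: unknown[];
  clock: string; // ISO 8601 UTC
}

type HeartbeatAction =
  | { kind: "pull_config"; expectedEtag: string }
  | { kind: "none" };

function nextActionAfterHeartbeat(lastAppliedEtag: string, res: HeartbeatResponse): HeartbeatAction {
  // Pull only when the Hub reports a change and the etag really differs.
  if (res.configChanged && res.nextConfigEtag !== lastAppliedEtag) {
    return { kind: "pull_config", expectedEtag: res.nextConfigEtag };
  }
  return { kind: "none" };
}
```

Comparing etags as well as `configChanged` keeps a duplicated or delayed heartbeat response from triggering a redundant pull.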
### 3.3 Pull Full Tenant Config
`GET /api/v1/tenant/config?etag=cfg_9f1a...`
Response `200` includes:
- agent definitions (SOUL/TOOLS refs, model profile)
- autonomy policy
- external comms gate unlock map
- command classification ruleset checksum
- tool registry template version
### 3.4 Approval Request / Resolve
`POST /api/v1/tenant/approval-requests`
```json
{
  "tenantId": "ten_...",
  "requestId": "apr_...",
  "agentId": "marketing",
  "class": "yellow_external",
  "tool": "listmonk.send_campaign",
  "humanSummary": "Send campaign 'March Offer' to 1,204 recipients",
  "expiresAt": "2026-02-27T20:15:30Z",
  "context": {"recipientCount": 1204}
}
```
`GET /api/v1/tenant/approval-requests/{requestId}` returns `PENDING|APPROVED|DENIED|EXPIRED`.
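The four statuses could be derived from a stored record roughly as follows. The `ApprovalRecord` shape is hypothetical, and treating a decision that arrives after `expiresAt` as `EXPIRED` is an assumption rather than stated policy.

```typescript
// Hypothetical sketch: compute the externally visible approval status.
type ApprovalStatus = "PENDING" | "APPROVED" | "DENIED" | "EXPIRED";

interface ApprovalRecord {
  expiresAt: string; // ISO 8601 UTC
  decision?: "approve" | "deny";
  decidedAt?: string; // ISO 8601 UTC, set when a decision is recorded
}

function approvalStatus(record: ApprovalRecord, now: Date): ApprovalStatus {
  const expiry = Date.parse(record.expiresAt);
  if (record.decision !== undefined) {
    const decidedAt = record.decidedAt !== undefined ? Date.parse(record.decidedAt) : now.getTime();
    // Assumption: a decision recorded after expiry does not count.
    if (decidedAt <= expiry) {
      return record.decision === "approve" ? "APPROVED" : "DENIED";
    }
    return "EXPIRED";
  }
  return now.getTime() >= expiry ? "EXPIRED" : "PENDING";
}
```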
### 3.5 Usage Ingestion
`POST /api/v1/tenant/usage-buckets`
```json
{
  "tenantId": "ten_...",
  "buckets": [
    {
      "hour": "2026-02-26T20:00:00Z",
      "agentId": "marketing",
      "model": "openrouter/deepseek-v3.2",
      "inputTokens": 12000,
      "outputTokens": 3800,
      "cacheReadTokens": 6400,
      "cacheWriteTokens": 0,
      "webSearchCalls": 3,
      "webFetchCalls": 1
    }
  ]
}
```
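How the tenant might compact individual LLM calls into the hourly buckets shown above is sketched here. The `UsageEvent` input shape and function names are assumptions, and the web search/fetch counters are omitted for brevity.

```typescript
// Hypothetical sketch: aggregate per-call usage into hourly (agent, model) buckets.
interface UsageEvent {
  at: string; // ISO 8601 UTC timestamp of the LLM call
  agentId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

interface UsageBucket {
  hour: string;
  agentId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

// Truncate a timestamp to the start of its UTC hour, e.g. 20:37:12Z -> 20:00:00Z.
function hourOf(isoTimestamp: string): string {
  const d = new Date(isoTimestamp);
  d.setUTCMinutes(0, 0, 0);
  return d.toISOString().replace(".000Z", "Z");
}

function aggregateHourly(events: UsageEvent[]): UsageBucket[] {
  const buckets = new Map<string, UsageBucket>();
  events.forEach((e) => {
    const hour = hourOf(e.at);
    const key = `${hour}|${e.agentId}|${e.model}`;
    let bucket = buckets.get(key);
    if (bucket === undefined) {
      bucket = {
        hour, agentId: e.agentId, model: e.model,
        inputTokens: 0, outputTokens: 0, cacheReadTokens: 0, cacheWriteTokens: 0,
      };
      buckets.set(key, bucket);
    }
    bucket.inputTokens += e.inputTokens;
    bucket.outputTokens += e.outputTokens;
    bucket.cacheReadTokens += e.cacheReadTokens;
    bucket.cacheWriteTokens += e.cacheWriteTokens;
  });
  const result: UsageBucket[] = [];
  buckets.forEach((b) => result.push(b));
  return result;
}
```

Keeping the four token counters separate in each bucket preserves the cache-read versus cache-write split that the margin analytics depend on.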
### 3.6 Backup Status
`POST /api/v1/tenant/backup-status`
Tracks last run, duration, snapshot ID, integrity verification state.
## 4. Customer/Mobile API Contracts
### 4.1 Agent And Autonomy Management
- `GET /api/v1/customer/agents`
- `PATCH /api/v1/customer/agents/{agentId}`
- `PATCH /api/v1/customer/agents/{agentId}/autonomy`
- `PATCH /api/v1/customer/agents/{agentId}/external-comms-gate`
Autonomy update request:
```json
{
  "autonomyLevel": 2,
  "externalComms": {
    "defaultLocked": true,
    "toolUnlocks": [
      {"tool": "chatwoot.reply_external", "enabled": true, "expiresAt": null}
    ]
  }
}
```
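Evaluating the `externalComms` block at gating time could look like the sketch below. The field names follow the request above; the function name is hypothetical, and treating an expired or disabled unlock as absent is an assumption.

```typescript
// Hypothetical sketch: is external communication currently unlocked for a tool?
interface ToolUnlock {
  tool: string;
  enabled: boolean;
  expiresAt: string | null; // ISO 8601 UTC, or null for no expiry
}

interface ExternalCommsPolicy {
  defaultLocked: boolean;
  toolUnlocks: ToolUnlock[];
}

function isExternalCommsUnlocked(policy: ExternalCommsPolicy, tool: string, now: Date): boolean {
  // An unlock counts only while enabled and not yet expired.
  const hasActiveUnlock = policy.toolUnlocks.some(
    (u) =>
      u.tool === tool &&
      u.enabled &&
      (u.expiresAt === null || Date.parse(u.expiresAt) > now.getTime())
  );
  return hasActiveUnlock || !policy.defaultLocked;
}
```

Note that `autonomyLevel` is intentionally not an input: the external comms gate is independent of the autonomy tier.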
### 4.2 Approval Queue
- `GET /api/v1/customer/approvals?status=pending`
- `POST /api/v1/customer/approvals/{id}` with `{ "decision": "approve" | "deny" }`
### 4.3 Usage And Billing
- `GET /api/v1/customer/usage/summary`
- `GET /api/v1/customer/usage/by-agent`
- `GET /api/v1/customer/billing/current-period`
- `POST /api/v1/customer/billing/payment-method`
### 4.4 Realtime Channels
- `GET /api/v1/customer/events/stream` (SSE fallback)
- `WS /api/v1/customer/ws` (chat updates, approvals, status)
## 5. Public Website/Onboarding API Contracts
### 5.1 Business Classification
`POST /api/v1/public/onboarding/classify`
```json
{
  "sessionId": "onb_...",
  "messages": [
    {"role": "user", "content": "I run a 5-person digital agency"}
  ]
}
```
Response:
```json
{
  "businessType": "agency",
  "confidence": 0.91,
  "recommendedBundle": "agency_core_v1",
  "followUpQuestion": "Do you need ticketing or only chat?"
}
```
### 5.2 Bundle Quote
`POST /api/v1/public/onboarding/quote`
Returns min tier, projected token pool, monthly estimate, and Stripe checkout seed payload.
### 5.3 Order Creation
`POST /api/v1/public/orders` with strict schema + anti-fraud controls.
## 6. Safety Wrapper Internal Contract (Local Only)
Local Unix socket JSON-RPC interface between plugin orchestration and execution layer.
Method examples:
- `exec.run`
- `docker.compose`
- `file.read`
- `file.write`
- `env.update`
- `tool.http.call`
Example request:
```json
{
  "id": "rpc_1",
  "method": "tool.http.call",
  "params": {
    "tool": "ghost",
    "operation": "posts.create",
    "secretRefs": ["ghost_admin_key"],
    "payload": {"title": "..."}
  }
}
```
Guarantees:
- Secrets passed only as references, never raw values in request logs.
- Execution engine resolves references inside isolated process boundary.
- Full request/result hashes persisted for audit traceability.
## 7. Tool Registry Contract
`tool-registry.json` shape (tenant-local):
```json
{
  "version": "2026-02-26",
  "tools": [
    {
      "id": "chatwoot",
      "baseUrl": "https://chat.customer-domain.tld",
      "auth": {"type": "bearer_secret_ref", "ref": "chatwoot_api_token"},
      "adapters": ["contacts.list", "conversation.reply"],
      "externalCommsOperations": ["conversation.reply_external"],
      "cheatsheet": "/opt/letsbe/cheatsheets/chatwoot.md"
    }
  ]
}
```
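Typed access to the registry might be sketched as below. The interfaces mirror the JSON shape above; the helper name and the decision to throw on an unknown tool are assumptions.

```typescript
// Hypothetical sketch: typed registry lookup for the external comms gate.
interface ToolEntry {
  id: string;
  baseUrl: string;
  auth: { type: string; ref: string };
  adapters: string[];
  externalCommsOperations: string[];
  cheatsheet: string;
}

interface ToolRegistry {
  version: string;
  tools: ToolEntry[];
}

function requiresExternalCommsGate(registry: ToolRegistry, toolId: string, operation: string): boolean {
  const tool = registry.tools.find((t) => t.id === toolId);
  if (tool === undefined) {
    // Tools missing from the registry are treated as unreachable rather than ungated.
    throw new Error(`Unknown tool: ${toolId}`);
  }
  return tool.externalCommsOperations.indexOf(operation) !== -1;
}
```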
## 8. Error Contract And Retries
Standard error envelope:
```json
{
"error": {
"code": "APPROVAL_REQUIRED",
"message": "Operation requires approval",
"requestId": "req_...",
"retryable": true,
"details": {"approvalRequestId": "apr_..."}
}
}
```
Common error codes:
- `AUTH_INVALID`
- `TENANT_UNKNOWN`
- `APPROVAL_REQUIRED`
- `APPROVAL_EXPIRED`
- `CLASSIFICATION_BLOCKED`
- `SECRET_REF_UNRESOLVED`
- `POLICY_VERSION_MISMATCH`
- `RATE_LIMITED`
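A client-side view of the envelope and its `retryable` flag could be sketched like this. The backoff base of 500 ms and the 30 s cap are illustrative assumptions; for `APPROVAL_REQUIRED`, a client would more plausibly poll the approval request than retry blindly.

```typescript
// Hypothetical sketch: typed error envelope plus a capped exponential backoff.
interface ApiErrorEnvelope {
  error: {
    code: string;
    message: string;
    requestId: string;
    retryable: boolean;
    details?: Record<string, unknown>;
  };
}

// Returns the delay before the next attempt, or null when the error must not be retried.
function retryDelayMs(envelope: ApiErrorEnvelope, attempt: number): number | null {
  if (!envelope.error.retryable) {
    return null;
  }
  const baseMs = 500;  // illustrative assumption
  const capMs = 30000; // illustrative assumption
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}
```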
## 9. API Compatibility And Change Policy
- Backward-compatible additions: allowed in-place.
- Breaking changes: new version path (`/api/v2`).
- Deprecation window: minimum 60 days for tenant APIs.
- Contract tests run in CI for Hub, Safety Wrapper, Mobile, and Website clients.

@@ -0,0 +1,145 @@
# 03. Deployment Strategy
## 1. Goals
- Ship to founding members in ~12 weeks without compromising security invariants.
- Maintain one-VPS-per-customer isolation.
- Keep OpenClaw upstream-pinned and independently upgradeable.
- Make tenant rollout reversible with fast rollback paths.
## 2. Environment Topology
### 2.1 Control Plane Environments
| Environment | Purpose | Data |
|---|---|---|
| `dev` | Rapid feature iteration | Synthetic/local data |
| `staging` | Release-candidate validation, e2e, load, security checks | Sanitized fixtures |
| `prod-eu` | EU customers (default EU routing) | Real customer data |
| `prod-us` | NA customers (default NA routing) | Real customer data |
Control plane services (Hub + worker + notifications) are region-deployed with independent DBs and clear region affinity.
### 2.2 Tenant Environments
- `sandbox tenants`: internal QA and interactive demo pool.
- `canary tenants`: first real-production update recipients.
- `general tenants`: full customer fleet.
## 3. Deployment Units
### 3.1 Control Plane Units
- `hub-web-api` container (Next.js standalone runtime)
- `hub-worker` container (automation + billing jobs)
- `notifications` container (push/email delivery)
- `postgres` (managed or self-hosted HA)
### 3.2 Tenant Units (Per Customer VPS)
- `openclaw` container (upstream image/tag pinned)
- `safety-wrapper` plugin package mounted into OpenClaw extension dir
- `egress-proxy` service (localhost-only)
- tool containers and nginx from provisioner
- local SQLite data stores for secrets/approvals/metering
## 4. Provisioning Deployment Plan
### 4.1 Provisioner Mode
Continue with existing one-shot SSH provisioner flow, retooled to:
- deploy OpenClaw + Safety components
- remove legacy orchestrator/sysadmin deployment
- strip deprecated stacks and n8n references
- write secrets into encrypted vault only (no plaintext long-lived config)
### 4.2 Immutable Artifact Inputs
Provisioning uses pinned artifacts only:
- OpenClaw release tag (`stable` channel pin)
- Safety Wrapper image/package digest
- Tool stack compose templates with hash
- policy bundle version + checksum
## 5. Secrets And Credential Deployment
- Registration token is one-time and short-lived.
- Tenant API key returned at registration; only hash stored in Hub DB.
- Provisioner writes bootstrap secrets to tmpfs file, consumed once, then shredded.
- Existing plaintext job config path (`jobs/<id>/config.json`) replaced by encrypted payload + ephemeral decrypt-on-run.
## 6. Release Strategy
### 6.1 Control Plane
- Trunk-based merges behind feature flags.
- Deploy via Gitea Actions with staged promotions (`dev -> staging -> prod`).
- DB migrations run in expand/contract pattern.
### 6.2 Tenant Plane
Tenant updates split into independent channels:
- `policy-only`: classification/autonomy/tool policy updates (no binary change)
- `wrapper patch`: Safety Wrapper version bump
- `openclaw bump`: upstream release bump (separate tracked campaign)
Rollout:
1. Internal sandbox tenants
2. 5% canary customer tenants
3. 25%
4. 100%
Auto-stop criteria:
- redaction test failure
- approval-routing failure >1%
- tenant heartbeat drop >3%
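The auto-stop criteria translate directly into a check that a rollout controller could run between stages. The metric names and the representation of rates as fractions are assumptions; the thresholds are the ones listed above.

```typescript
// Hypothetical sketch: evaluate the auto-stop criteria for a rollout stage.
interface RolloutMetrics {
  redactionTestsPassed: boolean;
  approvalRoutingFailureRate: number; // fraction, 0.01 = 1%
  heartbeatDropRate: number;          // fraction, 0.03 = 3%
}

// Returns the list of tripped criteria; an empty list means the rollout may proceed.
function autoStopReasons(m: RolloutMetrics): string[] {
  const reasons: string[] = [];
  if (!m.redactionTestsPassed) {
    reasons.push("redaction test failure");
  }
  if (m.approvalRoutingFailureRate > 0.01) {
    reasons.push("approval-routing failure >1%");
  }
  if (m.heartbeatDropRate > 0.03) {
    reasons.push("tenant heartbeat drop >3%");
  }
  return reasons;
}
```

Returning reasons rather than a boolean gives the rollback runbook something concrete to log when a stage is halted.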
## 7. Rollback Strategy
### 7.1 Control Plane Rollback
- Keep last two container digests deployable.
- Migration rollback policy: only for reversible migrations; otherwise hotfix-forward.
### 7.2 Tenant Rollback
- Policy rollback via previous signed policy bundle.
- Wrapper rollback to previous plugin package.
- OpenClaw rollback to previous pinned stable tag after compatibility check.
## 8. Observability And SLOs
### 8.1 Required Telemetry
- tenant heartbeat latency and freshness
- approval queue latency (request -> decision)
- redaction pipeline counters (matches by layer)
- token usage ingest lag
- provisioning success/failure per step
### 8.2 Launch SLO Targets
- Hub API availability: 99.9%
- Tenant heartbeat freshness: 99% under 2 minutes
- Approval propagation: p95 < 5 seconds (Hub to mobile push)
- Provisioning success first-attempt: >= 90%
## 9. Dual-Provider Strategy (Netcup + Hetzner)
- Primary capacity pool on Netcup (EU/US).
- Overflow path on Hetzner with same provisioner scripts and hardened baseline.
- Provider adapter abstraction lives in Hub `server-provisioning` module; provisioner remains Debian-focused and provider-agnostic.
## 10. Cutover Plan From Current State
1. Freeze legacy orchestrator/sysadmin deployment paths.
2. Land prerequisite cleanup release (n8n/deprecated removal + credential leak fix).
3. Enable new tenant register/heartbeat APIs in Hub.
4. Provision first new-architecture internal tenant.
5. Execute a parallel-run window (old and new provisioning flows side by side, internal tenants only).
6. Flip default provisioning to new flow for production orders.

@@ -0,0 +1,176 @@
# 04. Detailed Implementation Plan And Dependency Graph
## 1. Planning Assumptions
- Target launch window: 12 weeks.
- Team model assumed for schedule below:
  - 2 backend/platform engineers
  - 1 mobile/fullstack engineer
  - 1 DevOps/SRE engineer
  - 1 QA/security engineer (shared)
- Existing Hub codebase is retained and extended.
## 2. Work Breakdown Structure (WBS)
### Phase 0: Prerequisite Cleanup And Hardening (Week 1)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P0-1 | Remove all `n8n` code references (Hub, provisioner, stacks, scripts, tests) | 3d | - | `rg -n n8n` clean in production code paths; CI policy check added |
| P0-2 | Remove deprecated deploy targets (`orchestrator`, `sysadmin`) from active provisioning | 2d | P0-1 | No new orders can deploy deprecated services |
| P0-3 | Fix plaintext provisioning secret leak (`jobs/*/config.json`) | 2d | P0-1 | No root/server password persisted in plaintext job files |
| P0-4 | Baseline security regression tests for cleanup changes | 1d | P0-2,P0-3 | Green CI + sign-off |
### Phase 1: Safety Substrate (Weeks 2-3)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P1-1 | Build encrypted secrets vault SQLite schema + key management | 3d | P0-4 | CRUD, rotation, audit log implemented |
| P1-2 | Implement egress redaction proxy (registry + regex + entropy layers) | 4d | P1-1 | Redaction test suite pass with seeded secrets |
| P1-3 | Implement command classification engine (5-tier + external gate) | 3d | P1-1 | Deterministic policy tests pass |
| P1-4 | Implement approval state cache + retry logic (tenant-local) | 2d | P1-3 | Approval resilience tests pass |
| P1-5 | OpenClaw plugin skeleton with hooks + telemetry envelope | 3d | P1-2,P1-3 | Hook smoke tests green against pinned OpenClaw tag |
### Phase 2: Hub Tenant APIs + Data Model (Weeks 3-4)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P2-1 | Add Prisma models: approval queue, usage buckets, agent policy, comms unlocks | 2d | P0-4 | Migration applied in staging |
| P2-2 | Implement tenant register/heartbeat/config APIs | 3d | P2-1 | Contract tests pass |
| P2-3 | Implement tenant approval-request APIs + customer approval endpoints | 3d | P2-1 | End-to-end approval cycle works |
| P2-4 | Implement usage ingest + billing period updates | 3d | P2-1 | Usage events visible in dashboard |
| P2-5 | Add push notification pipeline for approvals | 2d | P2-3 | Mobile push test path validated |
### Phase 3: Safety Wrapper Execution Layer (Weeks 4-6)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P3-1 | Port shell/docker/file/env guarded executors from sysadmin patterns | 5d | P1-5 | Security unit tests pass |
| P3-2 | Implement tool registry loader + SECRET_REF resolver | 3d | P1-1,P3-1 | Tool calls run without raw secret exposure |
| P3-3 | Implement core adapters (Chatwoot, Ghost, Nextcloud, Cal.com, Odoo, Listmonk) | 6d | P3-2 | Adapter contract tests pass |
| P3-4 | Implement metering capture and hourly bucket compaction | 2d | P1-5,P2-4 | Buckets reliably posted to Hub |
| P3-5 | Add subagent budget/depth limits and policy enforcement | 2d | P1-5 | Policy tests and abuse tests pass |
### Phase 4: Provisioner Retool (Weeks 5-7)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P4-1 | Add OpenClaw + Safety deployment steps to provisioner | 4d | P3-2 | Fresh VPS comes online with heartbeat |
| P4-2 | Remove legacy stack templates and nginx configs from default deployment path | 2d | P0-2 | Deprecated stacks excluded from installs |
| P4-3 | Generate and deploy tenant configs/policies during provisioning | 3d | P2-2,P4-1 | Config sync succeeds on first boot |
| P4-4 | Migrate initial browser setup scenarios to OpenClaw browser tool | 4d | P4-1 | 8 scenarios replaced or retired |
| P4-5 | Add idempotent recovery checkpoints per provisioning step | 2d | P4-1 | Retry from failed step validated |
### Phase 5: Customer Interfaces (Weeks 6-9)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P5-1 | Customer web portal for approvals, agent settings, usage | 5d | P2-3,P2-4 | Beta usable on staging |
| P5-2 | Mobile app MVP (chat, approvals, health, usage) | 8d | P2-5,P5-1 | TestFlight/internal distribution ready |
| P5-3 | Public onboarding website + classifier + bundle calculator | 6d | P2-1 | Stripe flow works end-to-end |
| P5-4 | WhatsApp/Telegram fallback relay (minimal) | 3d | P2-3 | Approval fallback path works |
### Phase 6: Workflow Templates + Demo Experience (Weeks 9-10)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P6-1 | Implement 4 first-hour workflow templates as auditable blueprints | 5d | P3-3,P5-1 | Templates executable end-to-end |
| P6-2 | Build interactive demo tenant pool manager (TTL snapshots) | 4d | P4-1,P5-3 | Demo session provisioning <5 min |
| P6-3 | Add product telemetry for template completion and demo conversion | 2d | P6-1,P6-2 | Metrics dashboards live |
### Phase 7: Quality, Hardening, Launch (Weeks 10-12)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P7-1 | Full security test suite (redaction, gating, injection, auth) | 4d | P3-5,P4-5 | Critical findings resolved |
| P7-2 | Load, soak, and chaos tests on staging fleet | 3d | P6-1 | SLO gates met |
| P7-3 | Canary launch (5% -> 25% -> 100%) with rollback drills | 4d | P7-1,P7-2 | Canary metrics stable |
| P7-4 | Launch readiness review + runbook finalization | 2d | P7-3 | Founding member launch sign-off |
## 3. Dependency Graph
```mermaid
graph TD
P0_1[P0-1 n8n cleanup] --> P0_2[P0-2 deprecated deploy removal]
P0_1 --> P0_3[P0-3 plaintext secret fix]
P0_2 --> P0_4[P0-4 baseline security tests]
P0_3 --> P0_4
P0_4 --> P1_1[P1-1 vault]
P1_1 --> P1_2[P1-2 egress proxy]
P1_1 --> P1_3[P1-3 classification]
P1_3 --> P1_4[P1-4 approval cache]
P1_2 --> P1_5[P1-5 openclaw plugin skeleton]
P1_3 --> P1_5
P0_4 --> P2_1[P2-1 hub prisma models]
P2_1 --> P2_2[P2-2 tenant register/heartbeat/config]
P2_1 --> P2_3[P2-3 approval APIs]
P2_1 --> P2_4[P2-4 usage ingest]
P2_3 --> P2_5[P2-5 push notifications]
P1_5 --> P3_1[P3-1 guarded executors]
P1_1 --> P3_2[P3-2 tool registry + secret ref]
P3_1 --> P3_2
P3_2 --> P3_3[P3-3 tool adapters]
P1_5 --> P3_4[P3-4 metering]
P2_4 --> P3_4
P1_5 --> P3_5[P3-5 subagent controls]
P3_2 --> P4_1[P4-1 provisioner openclaw+safety]
P0_2 --> P4_2[P4-2 legacy stack template removal]
P2_2 --> P4_3[P4-3 config generation]
P4_1 --> P4_3
P4_1 --> P4_4[P4-4 browser scenario migration]
P4_1 --> P4_5[P4-5 idempotent checkpoints]
P2_3 --> P5_1[P5-1 customer portal]
P2_4 --> P5_1
P2_5 --> P5_2[P5-2 mobile app MVP]
P5_1 --> P5_2
P2_1 --> P5_3[P5-3 onboarding website]
P2_3 --> P5_4[P5-4 whatsapp/telegram fallback]
P3_3 --> P6_1[P6-1 first-hour templates]
P5_1 --> P6_1
P4_1 --> P6_2[P6-2 interactive demo pool]
P5_3 --> P6_2
P6_1 --> P6_3[P6-3 template/demo telemetry]
P6_2 --> P6_3
P3_5 --> P7_1[P7-1 full security suite]
P4_5 --> P7_1
P6_1 --> P7_2[P7-2 load/soak/chaos]
P7_1 --> P7_3[P7-3 canary launch]
P7_2 --> P7_3
P7_3 --> P7_4[P7-4 launch readiness]
```
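The dependency graph can be checked mechanically. The sketch below computes the earliest finish day for each task, assuming unlimited parallelism and whole working days, using the Phase 0 and Phase 1 durations and dependencies from the tables above. The `Task` shape and function name are assumptions.

```typescript
// Hypothetical sketch: earliest-finish (longest-path) calculation over the WBS graph.
interface Task {
  id: string;
  days: number;
  dependsOn: string[];
}

function earliestFinish(tasks: Task[]): Map<string, number> {
  const byId = new Map<string, Task>();
  tasks.forEach((t) => byId.set(t.id, t));
  const finish = new Map<string, number>();
  const visiting = new Set<string>();

  function visit(id: string): number {
    const cached = finish.get(id);
    if (cached !== undefined) return cached;
    const task = byId.get(id);
    if (task === undefined) throw new Error(`Unknown task: ${id}`);
    if (visiting.has(id)) throw new Error(`Dependency cycle at ${id}`);
    visiting.add(id);
    // A task starts when its slowest dependency has finished.
    let start = 0;
    task.dependsOn.forEach((dep) => {
      start = Math.max(start, visit(dep));
    });
    visiting.delete(id);
    const end = start + task.days;
    finish.set(id, end);
    return end;
  }

  tasks.forEach((t) => visit(t.id));
  return finish;
}

// Durations and dependencies copied from the Phase 0 and Phase 1 tables.
const phase0And1: Task[] = [
  { id: "P0-1", days: 3, dependsOn: [] },
  { id: "P0-2", days: 2, dependsOn: ["P0-1"] },
  { id: "P0-3", days: 2, dependsOn: ["P0-1"] },
  { id: "P0-4", days: 1, dependsOn: ["P0-2", "P0-3"] },
  { id: "P1-1", days: 3, dependsOn: ["P0-4"] },
  { id: "P1-2", days: 4, dependsOn: ["P1-1"] },
  { id: "P1-3", days: 3, dependsOn: ["P1-1"] },
  { id: "P1-4", days: 2, dependsOn: ["P1-3"] },
  { id: "P1-5", days: 3, dependsOn: ["P1-2", "P1-3"] },
];
```

With these inputs, `earliestFinish(phase0And1).get("P1-5")` is 16 working days, slightly more than the 15 working days in Weeks 1-3 if weeks are taken as five days with no partial-day overlap, so the Phase 1 confidence rating of Medium looks appropriate.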
## 4. Critical Path
Primary critical chain:
`P0 cleanup -> P1 safety substrate -> P3 execution layer -> P4 provisioner retool -> P7 hardening/canary`
Secondary critical chain:
`P2 Hub APIs -> P5 mobile approvals -> P7 canary`
## 5. Parallelization Strategy
To meet 12 weeks, run these in parallel after Week 3:
- Track A: Safety Wrapper + adapters (P3)
- Track B: Provisioner retool (P4)
- Track C: Customer interfaces (P5)
## 6. Definition Of Done (Program-Level)
Launch gate passes only when all are true:
- secrets-never-leave-server invariant passes automated red-team test suite
- gating matrix behaves as specified for every combination of the 5 command classes and 3 autonomy levels
- external comms gate enforces lock-by-default at all autonomy levels
- provisioning succeeds >=90% first attempt and >=99% with retries
- approval path works across web + mobile push with audit completeness
- usage metering reconciles with provider usage within 1% variance
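The metering reconciliation criterion can be made testable with a check like the one below. Measuring variance relative to the provider-reported total is an assumption; the 1% tolerance comes from the criterion above.

```typescript
// Hypothetical sketch: does metered usage reconcile with provider-reported usage?
function withinVariance(
  meteredTokens: number,
  providerTokens: number,
  tolerance: number = 0.01
): boolean {
  if (providerTokens === 0) {
    // With nothing reported by the provider, only an exact zero reconciles.
    return meteredTokens === 0;
  }
  return Math.abs(meteredTokens - providerTokens) / providerTokens <= tolerance;
}
```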

@@ -0,0 +1,75 @@
# 05. Estimated Timelines
## 1. Date Anchors
- Planning baseline date: **Thursday, February 26, 2026**
- Proposed execution start: **Monday, March 2, 2026**
- 12-week target launch window end: **Sunday, May 24, 2026**
- Recommended contingency buffer: **May 25-31, 2026**
## 2. Timeline Summary
| Phase | Dates | Duration | Confidence |
|---|---|---:|---|
| Phase 0 prerequisites | Mar 2 - Mar 8 | 1 week | High |
| Phase 1 safety substrate | Mar 9 - Mar 22 | 2 weeks | Medium |
| Phase 2 Hub APIs/models | Mar 16 - Mar 29 | 2 weeks (overlap) | High |
| Phase 3 wrapper execution layer | Mar 23 - Apr 12 | 3 weeks | Medium |
| Phase 4 provisioner retool | Mar 30 - Apr 19 | 3 weeks (overlap) | Medium |
| Phase 5 mobile + website + portal | Apr 6 - May 3 | 4 weeks | Medium |
| Phase 6 templates + demo | Apr 27 - May 10 | 2 weeks | Medium |
| Phase 7 hardening + canary + launch | May 4 - May 24 | 3 weeks | Medium-Low |
## 3. Milestones
| Milestone | Target Date | Exit Condition |
|---|---|---|
| M1: Cleanup gate passed | Mar 8, 2026 | n8n and deprecated deploy paths removed; plaintext secret leak fixed |
| M2: Security substrate alpha | Mar 22, 2026 | redaction proxy + classifier + plugin skeleton integrated |
| M3: Hub tenant APIs beta | Mar 29, 2026 | register/heartbeat/approval/usage contracts stable |
| M4: First full tenant provision | Apr 12, 2026 | new VPS boots with OpenClaw + Safety + heartbeat |
| M5: Customer interface beta | May 3, 2026 | web portal + mobile approvals + onboarding flow functional |
| M6: Launch candidate | May 17, 2026 | full security/perf test pass; canary starts |
| M7: Founding member launch | May 24, 2026 | canary complete; runbooks and rollback drills signed off |
## 4. Weekly View (Condensed)
```text
Week 1 (Mar 2) : Phase 0 prerequisite cleanup
Week 2-3 : Phase 1 safety substrate begins
Week 3-4 : Phase 2 Hub API/data model work (parallel)
Week 4-6 : Phase 3 wrapper execution + adapters
Week 5-7 : Phase 4 provisioner retool and browser migration
Week 6-9 : Phase 5 customer portal/mobile/website
Week 9-10 : Phase 6 templates + interactive demo
Week 10-12 : Phase 7 hardening, canary rollout, launch
Buffer Week : May 25-31 contingency
```
## 5. Critical Timeline Risks
| Risk | Schedule Impact If Realized |
|---|---|
| OpenClaw hook behavior drift or undocumented edge cases | +1 to +2 weeks |
| Provisioner migration instability on fresh VPS images | +1 week |
| Mobile push approval reliability issues (iOS/Android differences) | +0.5 to +1 week |
| Token billing reconciliation defects with Stripe meter events | +1 week |
| Security findings in redaction/gating late in cycle | +1 to +3 weeks |
## 6. Confidence Ranges
| Scenario | Launch Window |
|---|---|
| Optimistic | May 17-24, 2026 |
| Most likely | May 24-31, 2026 |
| Conservative | June 7-14, 2026 |
## 7. Scope Compression Options (If Needed)
To preserve security and launch by May 24-31, de-scope in this order:
1. Delay WhatsApp/Telegram fallback to post-launch.
2. Limit the initial tool adapter set to the 8 most-used tools; keep the others on browser fallback.
3. Ship 3 first-hour templates at launch, add the 4th in first patch.
Do **not** cut redaction, gating, approval, or metering correctness work.

@@ -0,0 +1,73 @@
# 06. Risk Assessment
## 1. Risk Scoring Method
- Probability: 1 (low) to 5 (high)
- Impact: 1 (low) to 5 (high)
- Risk score = Probability x Impact
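As a small worked example of the scoring method, the sketch below computes and ranks scores. The `Risk` type and function names are illustrative assumptions; the sample values are taken from rows R1, R4, and R6 of the Top Risks table.

```typescript
// Risk score = probability x impact, both on a 1-5 scale.
type Risk = { id: string; probability: number; impact: number };

function riskScore(r: Risk): number {
  for (const v of [r.probability, r.impact]) {
    if (!Number.isInteger(v) || v < 1 || v > 5) {
      throw new RangeError(`${r.id}: probability and impact must be integers in 1..5`);
    }
  }
  return r.probability * r.impact;
}

// Highest score first; ties keep input order (Array.prototype.sort is stable).
function rankRisks(risks: Risk[]): Risk[] {
  return [...risks].sort((a, b) => riskScore(b) - riskScore(a));
}

const sampleRisks: Risk[] = [
  { id: "R1", probability: 3, impact: 5 },
  { id: "R4", probability: 4, impact: 4 },
  { id: "R6", probability: 3, impact: 3 },
];

console.log(rankRisks(sampleRisks).map((r) => `${r.id}=${riskScore(r)}`).join(", "));
```

Validating the 1-5 range at scoring time keeps the weekly re-scoring from silently accepting out-of-scale values.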
## 2. Top Risks
| ID | Risk | Prob | Impact | Score | Mitigation | Contingency Trigger |
|---|---|---:|---:|---:|---|---|
| R1 | Secret exfiltration via unredacted outbound payload | 3 | 5 | 15 | Multi-layer redaction tests, egress deny-by-default policy, seeded canary secrets | Any unredacted canary secret seen outside tenant |
| R2 | Command gating bypass due to misclassification | 3 | 5 | 15 | Deterministic policy engine, contract tests per class, human-readable reason logging | Red/Critical executes without approval in tests |
| R3 | OpenClaw upstream changes break plugin behavior | 3 | 4 | 12 | Pin stable tags, adapter compatibility suite, staged upgrade canaries | Hook contract test fails against new tag |
| R4 | Provisioner regressions reduce provisioning success | 4 | 4 | 16 | Idempotent checkpoints, replay tests, synthetic VPS CI | First-attempt success < 90% |
| R5 | Billing usage mismatch vs provider costs | 3 | 4 | 12 | Dual-entry usage checks, nightly reconciliation jobs, alert thresholds | >1% sustained variance for 24h |
| R6 | Mobile approval notification delays/drop | 3 | 3 | 9 | Push retries + in-app queue fallback + email fallback | p95 approval notify > 30s |
| R7 | Performance overhead exceeds Lite-tier budget | 2 | 4 | 8 | Memory profiling budget gates, disable non-essential plugins, tune browser lifecycle | LetsBe overhead > 800MB sustained |
| R8 | Tool API churn breaks adapters | 4 | 3 | 12 | Adapter integration tests against pinned versions, fallback to browser playbook | Adapter failure rate > 5% |
| R9 | Security debt from AI-generated code quality | 4 | 4 | 16 | Mandatory senior review on security modules, lint rules, banned patterns checks | Critical static-analysis finding unresolved >48h |
| R10 | Legal/compliance drift (license/source disclosure pages) | 2 | 4 | 8 | Automated license manifest publishing, pre-release legal checklist | Missing OSS disclosure page at RC freeze |
## 3. Risk Register By Domain
### 3.1 Security Risks
- Redaction misses non-standard secret formats.
- External comms gate incorrectly tied to autonomy level.
- Local logs/transcripts persist raw secret material.
- Local execution adapters allow shell metacharacter bypass.
### 3.2 Delivery Risks
- Too much simultaneous change across Hub + provisioner + tenant runtime.
- Underestimated migration effort from deprecated orchestrator/sysadmin behaviors.
- Browser automation migration complexity for setup scripts.
### 3.3 Operational Risks
- Dual-region Hub operations increase DB and deploy complexity.
- Insufficient on-call runbooks for approval outages and provisioning failures.
- Canary rollout without automated rollback criteria.
## 4. Mitigation Program
### 4.1 Pre-Launch Controls
- Security invariants are encoded as executable tests (not checklist-only).
- Every release candidate must pass redaction canary probes.
- Dry-run provisioning must pass on both Netcup and Hetzner targets.
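A redaction canary probe could look roughly like the sketch below: seed known canary values, capture an outbound payload, and fail if any canary appears. This is a minimal illustration under stated assumptions; `findLeakedCanaries`, the canary values, and the `[SECRET_REF:...]` placeholder format are hypothetical and not taken from the Safety Wrapper code.

```typescript
// Returns the seeded canary secrets that appear in a payload, either
// verbatim or URL-encoded. Other encodings (base64, hex) are out of
// scope for this sketch and would need their own variants.
function findLeakedCanaries(payload: string, canaries: readonly string[]): string[] {
  return canaries.filter((secret) => {
    if (secret.length === 0) return false;
    const variants = [secret, encodeURIComponent(secret)];
    return variants.some((v) => payload.includes(v));
  });
}

// Hypothetical canary values seeded into the vault for the probe.
const seededCanaries = ["canary/db+pass-7f3a", "canary-api-key-91c2"];

// Hypothetical outbound payloads captured at the egress proxy.
const redactedPayload = '{"prompt":"connect with [SECRET_REF:db_password]"}';
const leakyPayload = '{"prompt":"connect with canary%2Fdb%2Bpass-7f3a"}';

console.log(findLeakedCanaries(redactedPayload, seededCanaries).length);
console.log(findLeakedCanaries(leakyPayload, seededCanaries));
```

A release-candidate gate would treat any non-empty result as a failure, which matches the contingency trigger for R1.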
### 4.2 Runtime Controls
- Alert on heartbeat freshness degradation.
- Alert on approval queue lag and expiration spikes.
- Alert on sudden drop in cache-read ratio (cost anomaly indicator).
### 4.3 Governance Controls
- Security design review required for changes in Safety Wrapper, redaction, or secrets flows.
- Migration freeze on deprecated paths after Phase 0.
- Weekly risk review with updated probability/impact re-scoring.
## 5. Launch Go/No-Go Risk Gates
No launch if any condition is true:
- unresolved severity-1 security defect
- redaction tests fail for any supported secret class
- command gating matrix not fully passing
- usage reconciliation error >1% over 72h canary
- provisioning first-attempt success below 90% in final week

@@ -0,0 +1,111 @@
# 07. Testing Strategy Proposal
## 1. Testing Principles
- Security-critical behavior is verified with invariant tests, not only unit coverage.
- Contract-first testing between Hub, Safety Wrapper, Mobile, Website, and Provisioner.
- Fast feedback in CI, deep verification in staging and nightly runs.
- AI-generated code receives stricter review and mutation testing on critical paths.
## 2. Test Pyramid By Component
| Layer | Hub | Safety Wrapper | Provisioner | Mobile/Website |
|---|---|---|---|---|
| Unit | services, validators, policy logic | classifier, redactor, secret resolver, adapters | parser/utils/template render | UI logic, state stores, hooks |
| Integration | Prisma + API handlers + auth | plugin hooks vs OpenClaw test harness | SSH runner against disposable VM | API integration against mock Hub |
| End-to-end | full order/provision/approval/billing flow | tenant command execution path | full 10-step provisioning with checkpoints | chat/approval/onboarding user journeys |
| Security | authz, rate-limit, session hardening | secret exfil tests, gating bypass tests | credential leakage scans | token storage, deep link auth |
| Performance | API p95 and DB load | per-turn latency overhead, memory usage | provisioning duration and retry cost | startup latency, push receipt latency |
## 3. Mandatory Security Invariant Suite
The following automated tests are required before each release:
1. **Secrets Never Leave Server Test**
   - Seed known secrets in vault and files.
   - Trigger prompts/tool outputs containing these values.
   - Assert outbound payloads and persisted logs contain only placeholders.
2. **Command Classification Matrix Test**
   - Execute fixtures for each command class (Green/Yellow/Yellow+External/Red/Critical).
   - Validate behavior across autonomy levels 1-3.
3. **External Comms Independence Test**
   - At autonomy level 3, an external action remains blocked while the comms gate is locked.
   - Unlock only the targeted tool; validate that the others remain blocked.
4. **Approval Expiry Test**
   - An approval request expires after 24h.
   - A late approval cannot be replayed.
5. **SECRET_REF Boundary Test**
   - Secrets cannot be requested directly by raw name/value.
   - Only valid references in allowlisted tool operations resolve.
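The matrix and independence tests above can be driven from an exhaustive enumeration of class, autonomy level, and comms-gate state. The sketch below is an assumption-laden illustration: the `AUTO_EXECUTE_FROM` thresholds are placeholders, not the proposal's actual policy, and `"hold"` simply means "not executed without human action". Only two properties are taken from this document: the external comms gate is evaluated independently of autonomy level, and Red/Critical never execute without approval.

```typescript
type CommandClass = "green" | "yellow" | "yellow_external" | "red" | "critical";
type AutonomyLevel = 1 | 2 | 3;
type Decision = "execute" | "hold";

// PLACEHOLDER thresholds (assumption): lowest autonomy level at which a
// class runs without approval. null means it never auto-executes.
const AUTO_EXECUTE_FROM: Record<CommandClass, AutonomyLevel | null> = {
  green: 1,
  yellow: 2,
  yellow_external: 2,
  red: null,
  critical: null,
};

function decide(cls: CommandClass, level: AutonomyLevel, commsUnlockedForTool: boolean): Decision {
  // The comms gate is checked first and does not depend on autonomy level.
  if (cls === "yellow_external" && !commsUnlockedForTool) return "hold";
  const threshold = AUTO_EXECUTE_FROM[cls];
  return threshold !== null && level >= threshold ? "execute" : "hold";
}

const CLASSES: CommandClass[] = ["green", "yellow", "yellow_external", "red", "critical"];
const LEVELS: AutonomyLevel[] = [1, 2, 3];

type Cell = { cls: CommandClass; level: AutonomyLevel; commsUnlocked: boolean; decision: Decision };

// Enumerate all 5 x 3 x 2 = 30 combinations so no cell is left untested.
function buildMatrix(): Cell[] {
  const cells: Cell[] = [];
  for (const cls of CLASSES) {
    for (const level of LEVELS) {
      for (const commsUnlocked of [false, true]) {
        cells.push({ cls, level, commsUnlocked, decision: decide(cls, level, commsUnlocked) });
      }
    }
  }
  return cells;
}

console.log(buildMatrix().filter((c) => c.decision === "execute").length);
```

Generating the cells from the type-level lists, rather than hand-writing fixtures, means a newly added class or level fails the suite until its expected behavior is declared.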
## 4. Provisioning Test Strategy
### 4.1 Fast Checks
- Shellcheck + static checks for bash scripts.
- Template substitution tests (all placeholders resolved, none leaked).
- Stack inventory policy tests (no banned tools like n8n).
### 4.2 Disposable VPS E2E
Nightly automated runs:
- create disposable Debian VPS
- run full provisioning
- run smoke checks on selected tool endpoints
- verify tenant registration + heartbeat + approvals
- tear down VPS and collect artifacts
## 5. Contract Testing
- OpenAPI specs for Hub APIs and tenant APIs.
- Consumer-driven contract tests for:
  - Safety Wrapper against Hub tenant endpoints
  - Mobile app against customer endpoints
  - Website onboarding against public endpoints
- Contract break blocks merge.
## 6. Data And Billing Validation
- Synthetic token event generator with known totals.
- Reconcile tenant usage buckets against Hub aggregated totals.
- Reconcile Hub totals against Stripe meter/invoice preview.
- Fail build if variance exceeds threshold.
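The variance check in the last step can be sketched as a relative-difference comparison. This is a minimal illustration; the function names are assumptions, and the default tolerance of 1% is the threshold used elsewhere in this proposal.

```typescript
// Relative variance of the Hub total against the provider total.
// A zero provider total is only consistent with a zero Hub total.
function relativeVariance(hubTotal: number, providerTotal: number): number {
  if (providerTotal === 0) return hubTotal === 0 ? 0 : Infinity;
  return Math.abs(hubTotal - providerTotal) / providerTotal;
}

function withinTolerance(hubTotal: number, providerTotal: number, tolerance = 0.01): boolean {
  return relativeVariance(hubTotal, providerTotal) <= tolerance;
}

// Synthetic totals (tokens) with a known 0.5% and 2% drift.
console.log(withinTolerance(1_005_000, 1_000_000));
console.log(withinTolerance(1_020_000, 1_000_000));
```

A CI job would run this over the synthetic event generator's known totals and exit non-zero when `withinTolerance` returns `false`.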
## 7. Quality Gates (CI)
- Unit + integration tests must pass.
- Security invariants must pass.
- Critical package diff review for Safety Wrapper and Provisioner.
- Minimum thresholds:
  - security-critical modules: >=90% branch coverage
  - overall backend: >=75% branch coverage
- Mutation testing on classifier and redactor modules.
## 8. Human Review Workflow (Anti-AI-Slop)
Required for security-critical PRs:
- one reviewer validates threat model assumptions
- one reviewer validates test completeness and failure cases
- checklist includes: error paths, rollback behavior, idempotency, logging hygiene
No direct auto-merge for changes in:
- redaction engine
- command classifier
- secret storage/resolution
- provisioning credential handling
## 9. Launch Validation Checklist
Before founding-member launch:
- 7-day staging soak with no sev-1/2 defects
- two successful rollback drills (control plane and tenant plane)
- production canary with live approval + billing reconciliation
- first-hour templates executed successfully on staging tenants

@@ -0,0 +1,128 @@
# 08. CI/CD Strategy (Gitea-Based)
## 1. Objectives
- Keep release cadence high without bypassing security checks.
- Provide deterministic, reproducible artifacts for Hub, Safety components, and Provisioner.
- Enforce policy gates (security invariants, banned tools, contract compatibility) in CI.
## 2. Platform Baseline
- CI engine: **Gitea Actions** with self-hosted **act_runner**.
- Artifact registry: private container registry (`code.letsbe.solutions/...`).
- Deployment target:
  - Control plane: Docker hosts (EU + US)
  - Tenant plane: provisioner-managed customer VPS rollout jobs
## 3. Branch And Release Model
- `main`: releasable at all times.
- short-lived feature branches.
- release tags: `hub/vX.Y.Z`, `safety/vX.Y.Z`, `provisioner/vX.Y.Z`.
- hotfix branch only for production incidents, merged back to `main` immediately.
## 4. Pipeline Stages
### 4.1 Pull Request Pipeline
1. `lint-typecheck`
2. `unit-tests`
3. `integration-tests`
4. `contract-tests`
5. `security-scan` (SAST, dependency vulnerabilities, secret scan)
6. `policy-checks`:
   - banned stack/reference detector (`n8n`, deprecated deploy targets)
   - no plaintext credentials in artifacts/config
7. `build-preview-images`
### 4.2 Main Branch Pipeline
1. re-run all PR checks
2. build immutable release images
3. generate SBOMs
4. image signing (cosign/sigstore-compatible)
5. push to registry with digest pins
6. deploy to `dev` automatically
### 4.3 Promotion Pipelines
- `promote-staging`: manual approval gate + smoke tests
- `promote-prod-eu`: manual approval + canary checks
- `promote-prod-us`: separate manual gate after EU health confirmation
## 5. Tenant Rollout Pipeline
Separate workflow for tenant-plane updates:
- policy-only rollout job
- wrapper package rollout job
- OpenClaw version rollout campaign
Rollout controller enforces:
- canary percentages
- halt thresholds
- automated rollback trigger execution
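The controller's per-wave decision could be sketched as below. Everything here is a hypothetical illustration: the wave percentages, the 2% halt threshold, the minimum sample size, and all type and function names are assumptions, not values defined by this proposal.

```typescript
type WaveStats = { updated: number; failed: number };

type RolloutPlan = {
  wavesPercent: number[];   // canary percentages, in rollout order
  haltFailureRate: number;  // halt threshold as a fraction of updated tenants
  minSample: number;        // tenants that must report before a wave is judged
};

type Action =
  | { kind: "wait" }
  | { kind: "rollback"; reason: string }
  | { kind: "advance"; nextPercent: number }
  | { kind: "complete" };

function evaluateWave(plan: RolloutPlan, waveIndex: number, stats: WaveStats): Action {
  // Not enough tenants have reported yet; this sketch does not short-circuit early failures.
  if (stats.updated === 0 || stats.updated < plan.minSample) return { kind: "wait" };

  const failureRate = stats.failed / stats.updated;
  if (failureRate > plan.haltFailureRate) {
    return { kind: "rollback", reason: `failure rate ${failureRate.toFixed(3)} exceeds threshold` };
  }

  const next = plan.wavesPercent[waveIndex + 1];
  return next === undefined ? { kind: "complete" } : { kind: "advance", nextPercent: next };
}

// Hypothetical plan: 5% canary, then 25%, then everyone.
const examplePlan: RolloutPlan = { wavesPercent: [5, 25, 100], haltFailureRate: 0.02, minSample: 10 };

console.log(evaluateWave(examplePlan, 0, { updated: 20, failed: 1 }).kind);
```

Keeping the decision a pure function of plan and stats makes the halt and rollback criteria testable without touching real tenants.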
## 6. Required Checks Per Package
| Package | Required Jobs |
|---|---|
| Hub | lint, unit, integration, Prisma migration check, API contract tests |
| Safety Wrapper | unit, hook integration (OpenClaw pinned tag), redaction/gating invariants |
| Egress Proxy | redaction corpus tests, outbound policy tests, perf checks |
| Provisioner | shellcheck, template checks, disposable VPS smoke run |
| Mobile | typecheck, unit/UI tests, API contract tests, build verification |
| Website | lint/typecheck, onboarding flow tests, pricing/quote tests |
## 7. Example Gitea Workflow Skeleton
```yaml
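name: pr-checks
on: [pull_request]
jobs:
  # Fast feedback: lint, typecheck, and unit tests.
  lint-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pnpm install --frozen-lockfile
      - run: pnpm lint && pnpm typecheck
      - run: pnpm test:unit
  # Policy gates: security invariants plus banned-reference and secret checks.
  security-policy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Each job starts from a clean workspace, so dependencies are installed again.
      - run: pnpm install --frozen-lockfile
      - run: pnpm test:security-invariants
      - run: ./scripts/ci/check-banned-references.sh
      - run: ./scripts/ci/check-no-plaintext-secrets.sh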
```
## 8. Secrets And Runner Security
- Gitea secrets scoped by environment (`dev/staging/prod`).
- Runner hosts are isolated and ephemeral where possible.
- No production credentials in PR jobs.
- OIDC-based short-lived cloud/provider credentials preferred over long-lived static tokens.
## 9. Change Management Gates
Changes to security-critical paths (files under `safety-wrapper/`, `egress-proxy/`, or `provisioner/scripts/credentials*`) require an extra gate:
- mandatory 2 reviewers
- security test suite pass required
- no force-merge override
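The path rule for this gate can be sketched as a small matcher over a PR's changed files. This is an illustrative assumption of how the check might be written: the function names are hypothetical, the patterns match the listed directories at any depth, and the default of one reviewer for everything else is an assumption rather than a stated rule.

```typescript
// Patterns for the security-critical paths listed above, matched at any
// depth so both `safety-wrapper/...` and `services/safety-wrapper/...` hit.
const SECURITY_CRITICAL_PATTERNS: RegExp[] = [
  /(^|\/)safety-wrapper\//,
  /(^|\/)egress-proxy\//,
  /(^|\/)provisioner\/scripts\/credentials/,
];

function isSecurityCritical(path: string): boolean {
  return SECURITY_CRITICAL_PATTERNS.some((pattern) => pattern.test(path));
}

// 2 reviewers if any changed file is security-critical; otherwise 1 (assumed default).
function requiredReviewers(changedPaths: string[]): number {
  return changedPaths.some(isSecurityCritical) ? 2 : 1;
}

console.log(requiredReviewers(["apps/hub/page.tsx", "services/egress-proxy/src/redact.ts"]));
```

In practice the same patterns would also be mirrored in CODEOWNERS so the reviewer requirement is enforced by the forge, not only by CI.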
## 10. Metrics For CI/CD Quality
Track weekly:
- median PR cycle time
- flaky test rate
- change failure rate
- mean time to rollback
- canary abort count
Use these metrics in the weekly engineering ops review to keep the speed/quality balance aligned with the launch target.

@@ -0,0 +1,105 @@
# 09. Repository Structure Proposal
## 1. Decision
**Choose: Monorepo for LetsBe first-party code, with OpenClaw kept as separate pinned upstream dependency.**
This is the best speed/quality tradeoff for a 3-month launch while preserving the non-fork requirement.
## 2. Why This Over Multi-Repo
### 2.1 Benefits
- Shared TypeScript contracts across Hub, Mobile, Website, and Safety services.
- One CI graph with selective test execution and consistent policy checks.
- Easier cross-cutting refactors (API shape changes, auth, telemetry schema updates).
- Better fit for AI-assisted coding workflows where context continuity matters.
### 2.2 Risks
- Larger repo and CI complexity.
- Migration effort from existing repo layout.
### 2.3 Mitigations
- Use path-based CI execution and build caching.
- Keep OpenClaw external to avoid massive vendor code in monorepo.
- Execute migration in controlled steps with history-preserving imports.
## 3. Proposed Structure
```text
letsbe-platform/
  apps/
    hub/                    # Next.js admin + customer portal + APIs
    website/                # public onboarding and marketing app
    mobile/                 # React Native + Expo
  services/
    safety-wrapper/         # OpenClaw plugin package
    egress-proxy/           # LLM redaction proxy
    provisioner/            # provisioning controller + scripts/templates
  packages/
    api-contracts/          # OpenAPI specs + TS SDKs
    policy-engine/          # shared classification and gate logic
    tooling-sdk/            # adapter framework + SECRET_REF utilities
    ui-kit/                 # shared design components (web/mobile where possible)
    config/                 # eslint/tsconfig/jest/shared tooling
  infra/
    gitea-workflows/
    docker/
    scripts/
  docs/
    architecture-proposal/
    runbooks/
```
## 4. OpenClaw Upstream Strategy (No Fork)
OpenClaw remains outside the monorepo as an independent upstream source:
- Track pinned release tag in `services/safety-wrapper/openclaw-version.lock`.
- CI job pulls pinned OpenClaw version for compatibility tests.
- Upgrade workflow:
  1. open compatibility PR bumping lock file
  2. run hook-contract test suite
  3. run staging canary tenants
  4. promote if green
If a temporary patch is unavoidable, maintain it as an isolated overlay with an upstream contribution plan; do not maintain a long-lived fork branch.
## 5. Migration Plan From Current Repos
### 5.1 Current Inputs
- `letsbe-hub`
- `letsbe-ansible-runner`
- `letsbe-orchestrator` (reference only, not migrated as active runtime)
- `letsbe-sysadmin-agent` (reference only, patterns ported into Safety)
- `openclaw` (kept external)
### 5.2 Migration Steps
1. Create monorepo skeleton and shared package manager workspace.
2. Import `letsbe-hub` into `apps/hub` with history.
3. Import `letsbe-ansible-runner` into `services/provisioner`.
4. Create new `services/safety-wrapper` and `services/egress-proxy`.
5. Scaffold `apps/mobile` and `apps/website`.
6. Extract shared contracts from hub into `packages/api-contracts`.
7. Add compatibility adapters so existing deployments continue during transition.
8. Archive deprecated repos as read-only references after cutover.
## 6. Governance Model
- CODEOWNERS by area (`hub`, `safety`, `provisioner`, `mobile`, `website`).
- Required reviewer policy:
  - 2 reviewers for `safety-wrapper`, `egress-proxy`, `provisioner` secrets paths.
  - 1 reviewer for non-security UI changes.
- Architectural Decision Records (ADR) stored under `docs/adr`.
## 7. Alternative Considered: Keep Multi-Repo
Rejected for v1 because cross-repo contract drift is already visible in current state (legacy APIs, deprecated stacks, stale references). Under a 12-week launch window, contract drift risk is higher than monorepo migration overhead.
## 8. Post-Launch Option
After launch, if team scaling or compliance requirements demand stricter isolation, split out mobile and website into separate repos while preserving shared contract package publication.

@@ -0,0 +1,102 @@
# 10. Technology Validation Sources
Validation date: **2026-02-26**
This proposal uses current official documentation (and release notes where relevant) for each major recommended technology.
## 1. OpenClaw
- Docs home: https://docs.openclaw.ai/
- Plugin development/hooks: https://docs.openclaw.ai/guide/developers/plugins/overview/
- Browser tool docs: https://docs.openclaw.ai/guide/tools/browser/
- OpenClaw GitHub releases/readme: https://github.com/openclawai/openclaw
Used for:
- hook names and plugin lifecycle
- browser capabilities and profile modes
- upstream release/update model
## 2. Next.js
- Official docs: https://nextjs.org/docs
- Release notes: https://nextjs.org/blog
Used for:
- app router patterns
- deployment/runtime guidance
- version-aware migration planning
## 3. Prisma
- Official docs: https://www.prisma.io/docs
- ORM release notes: https://github.com/prisma/prisma/releases
Used for:
- schema/migration guidance
- Prisma Client behavior and deployment practices
## 4. React Native + Expo
- React Native docs: https://reactnative.dev/docs/getting-started
- React Native releases: https://github.com/facebook/react-native/releases
- Expo docs: https://docs.expo.dev/
- Expo SDK changelog: https://expo.dev/changelog
Used for:
- mobile stack decision
- push notification and build pipeline planning
## 5. Flutter (evaluated alternative)
- Flutter docs: https://docs.flutter.dev/
- Flutter releases: https://github.com/flutter/flutter/releases
Used for:
- alternative comparison for mobile stack decision
## 6. Playwright
- Official docs: https://playwright.dev/docs/intro
- Release notes: https://playwright.dev/docs/release-notes
Used for:
- browser automation fallback strategy
- testing and scenario migration approach
## 7. SQLite
- SQLite docs: https://www.sqlite.org/docs.html
- SQLite file format/security references: https://www.sqlite.org/fileformat.html
Used for:
- tenant-local vault, approval cache, and usage bucket storage design
## 8. Stripe
- Stripe API docs: https://docs.stripe.com/api
- Usage-based billing/meter events: https://docs.stripe.com/billing/subscriptions/usage-based
Used for:
- overage billing architecture
- usage ingestion and invoice flow design
## 9. Gitea Actions / Act Runner
- Gitea Actions docs: https://docs.gitea.com/usage/actions/overview
- Act runner docs: https://docs.gitea.com/usage/actions/act-runner
Used for:
- CI/CD workflow strategy
- runner security and deployment pipeline design
## 10. Additional Provider References
- Netcup API context (existing integration baseline): https://www.netcup.com/en
- Hetzner Cloud docs (overflow strategy): https://docs.hetzner.cloud/
Used for:
- provider-agnostic provisioning strategy
## 11. Note On Source Priority
For technical decisions, this proposal prioritizes primary official documentation and release notes over secondary summaries.

@@ -0,0 +1,41 @@
# LetsBe Biz Architecture Proposal (GPT Team)
Date: 2026-02-26
Author: GPT Architecture Team
This folder contains the complete architecture development plan requested in `docs/technical/LetsBe_Biz_Architecture_Brief.md` Section 1.
## Deliverables Index
0. [00-executive-summary.md](./00-executive-summary.md)
   Executive direction and launch gating summary.
1. [01-architecture-and-dataflows.md](./01-architecture-and-dataflows.md)
   Architecture document with system diagrams and data flow diagrams.
2. [02-components-and-api-contracts.md](./02-components-and-api-contracts.md)
   Component breakdown and API contracts.
3. [03-deployment-strategy.md](./03-deployment-strategy.md)
   Deployment strategy for control plane and tenant plane.
4. [04-implementation-plan-and-dependency-graph.md](./04-implementation-plan-and-dependency-graph.md)
   Detailed implementation plan, task breakdown, and dependency graph.
5. [05-estimated-timelines.md](./05-estimated-timelines.md)
   Estimated timelines and milestone schedule.
6. [06-risk-assessment.md](./06-risk-assessment.md)
   Risk assessment and mitigation plan.
7. [07-testing-strategy.md](./07-testing-strategy.md)
   Testing strategy proposal.
8. [08-cicd-strategy-gitea.md](./08-cicd-strategy-gitea.md)
   Gitea-based CI/CD strategy.
9. [09-repository-structure-proposal.md](./09-repository-structure-proposal.md)
   Repository structure proposal and migration plan.
10. [10-technology-validation-sources.md](./10-technology-validation-sources.md)
    Current official documentation references used to validate technology choices.
## Executive Direction (One-Page Summary)
- Keep `letsbe-hub` (Next.js + Prisma) and retool it; do not rewrite core backend in v1 launch window.
- Build Safety Wrapper as OpenClaw plugin + local egress secrets proxy; keep OpenClaw upstream and un-forked.
- Remove all `n8n` and deprecated-stack references as a hard prerequisite (Week 1).
- Replace orchestrator/sysadmin responsibilities with explicit Hub↔Safety APIs and local execution adapters.
- Build mobile app with React Native + Expo for speed, push approvals, and shared TypeScript contracts.
- Use monorepo for first-party LetsBe code (Hub, Mobile, Safety services, Provisioner), while consuming OpenClaw as pinned upstream dependency.
- Target 12-week founding-member launch with strict security quality gates, canary rollout, and staged feature hardening.