# LetsBe Biz — Risk Assessment

**Date:** February 27, 2026
**Team:** Claude Opus 4.6 Architecture Team
**Document:** 06 of 09
**Status:** Proposal — Competing with independent team

---

## Table of Contents

1. [Risk Matrix Overview](#1-risk-matrix-overview)
2. [HIGH Risks](#2-high-risks)
3. [MEDIUM Risks](#3-medium-risks)
4. [LOW Risks](#4-low-risks)
5. [Known Unknowns](#5-known-unknowns)
6. [Security-Specific Risks](#6-security-specific-risks)
7. [Business & Operational Risks](#7-business--operational-risks)
8. [Dependency Risks](#8-dependency-risks)
9. [Risk Monitoring Plan](#9-risk-monitoring-plan)

---

## 1. Risk Matrix Overview

### Scoring

- **Impact:** How bad is it if this happens? (1-5, where 5 = catastrophic)
- **Likelihood:** How likely is it? (1-5, where 5 = almost certain)
- **Risk Score:** Impact × Likelihood
- **Severity:** HIGH (≥15), MEDIUM (8-14), LOW (≤7)
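
The scoring rules above reduce to a small function. The following is a minimal sketch, assuming only the thresholds listed in this section; the name `classify_risk` is illustrative, and the manual overrides applied later in this document (for example in H2 or M7) are deliberately left out.

```python
def classify_risk(impact: int, likelihood: int) -> tuple[int, str]:
    """Return (score, severity) using the thresholds from the Scoring list."""
    for name, value in (("impact", impact), ("likelihood", likelihood)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    score = impact * likelihood
    if score >= 15:
        severity = "HIGH"
    elif score >= 8:
        severity = "MEDIUM"
    else:
        severity = "LOW"
    return score, severity


print(classify_risk(5, 3))  # H1: (15, 'HIGH')
print(classify_risk(3, 3))  # M2: (9, 'MEDIUM')
print(classify_risk(2, 3))  # L1: (6, 'LOW')
```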

### Summary

| Severity | Count | Action Required |
|----------|-------|-----------------|
| HIGH | 6 | Active mitigation required; block launch if unresolved |
| MEDIUM | 9 | Mitigation planned; monitor weekly |
| LOW | 7 | Accepted; monitor monthly |

---

## 2. HIGH Risks

### H1 — Secrets Redaction Bypass

| Attribute | Value |
|-----------|-------|
| **Impact** | 5 (Catastrophic — customer secrets sent to LLM provider) |
| **Likelihood** | 3 (Possible — novel encoding/nesting could evade patterns) |
| **Risk Score** | 15 |
| **Category** | Security |

**Description:** The 4-layer redaction pipeline (Aho-Corasick → regex → entropy → JSON keys) may fail to catch secrets in edge cases: base64-encoded values, URL-encoded strings, secrets split across multiple JSON fields, secrets embedded in error messages from tools, or secrets in non-UTF-8 encodings.

**Mitigation:**
1. TDD approach — write adversarial tests BEFORE implementation (Phase 1, week 3)
2. Adversarial testing matrix from Technical Architecture §19.2: Unicode edge cases, base64, URL-encoded, nested JSON, YAML, log output
3. Shannon entropy filter (Layer 3) as catch-all for unknown patterns (≥4.5 bits/char, ≥32 chars)
4. Dedicated security audit in Phase 4 (week 13) with crafted bypass payloads
5. Post-launch: bug bounty program for redaction bypass (internal at first, public later)
6. Monitoring: log all redaction events; alert on suspiciously high entropy in outbound LLM calls

**Residual risk:** MEDIUM after mitigation. The entropy filter is the safety net, but it has false-positive trade-offs.
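
To make the Layer 3 thresholds concrete, here is a rough sketch of an entropy check using the values from mitigation item 3 (≥4.5 bits/char, ≥32 chars). The function names and the choice to measure entropy over single characters are assumptions for illustration, not the actual Secrets Proxy implementation.

```python
import math
from collections import Counter

MIN_ENTROPY_BITS = 4.5  # bits per character, from mitigation item 3
MIN_LENGTH = 32         # characters, from mitigation item 3


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def looks_like_secret(token: str) -> bool:
    """Flag tokens that are both long enough and high-entropy."""
    return len(token) >= MIN_LENGTH and shannon_entropy(token) >= MIN_ENTROPY_BITS
```

One consequence worth noting: a string drawn from k distinct characters can reach at most log2(k) bits per character, so a token needs at least 23 distinct characters to cross 4.5. Hex-encoded values top out at 4.0 bits per character and would therefore have to be caught by the earlier layers rather than by this filter.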

### H2 — OpenClaw Hook Gap (before_tool_call not bridged to external plugins)

| Attribute | Value |
|-----------|-------|
| **Impact** | 5 (Catastrophic — Safety Wrapper cannot intercept tool calls) |
| **Likelihood** | 2 (Unlikely — we've already planned for this via separate process) |
| **Risk Score** | 10 → Elevated to HIGH due to impact severity |
| **Category** | Technical / Dependency |

**Description:** The Technical Architecture v1.2 proposes the Safety Wrapper as an in-process OpenClaw extension using `before_tool_call` / `after_tool_call` hooks. Our analysis (GitHub Discussion #20575) found these hooks are NOT bridged to external plugins — they only work for bundled/internal hooks. This means the in-process extension model proposed in the Technical Architecture does not work as documented.

**Mitigation:**
1. **Already addressed:** Our architecture uses the Safety Wrapper as a SEPARATE PROCESS (localhost:8200). OpenClaw's tool calls are configured to route through the Safety Wrapper's HTTP API, not through in-process hooks.
2. OpenClaw's `exec` tool is configured to call the Safety Wrapper's execute endpoint instead of running commands directly.
3. OpenClaw's model provider is configured to proxy through the Secrets Proxy (localhost:8100) for LLM calls.
4. This approach is hook-independent — it works regardless of OpenClaw's internal hook architecture.

**Residual risk:** LOW after mitigation. The separate-process architecture was specifically designed to avoid this risk.

### H3 — OpenClaw Upstream Breaking Changes

| Attribute | Value |
|-----------|-------|
| **Impact** | 4 (Major — could break tool routing, sessions, or agent management) |
| **Likelihood** | 4 (Likely — OpenClaw is actively developed with calendar-versioned releases) |
| **Risk Score** | 16 |
| **Category** | Dependency |

**Description:** OpenClaw uses calendar versioning (2026.2.6-3) and is under active development. Breaking changes to the config format, tool system, session management, or API could break our integration. Our review of the v1.2 architecture has already surfaced one incompatibility of this kind (the hook bridging gap described in H2).

**Mitigation:**
1. Pin to a specific release tag (e.g., `v2026.2.6-3`). Never float to `latest`.
2. Monthly review of OpenClaw releases during development; quarterly post-launch.
3. Staging-first rollout: test new releases on staging VPS before any production deployment.
4. Canary deployment: staging → 5% → 25% → 100% (see 03-DEPLOYMENT-STRATEGY).
5. Maintain a compatibility test suite: 20-30 tests verifying our integration points (tool routing, LLM proxy, session management, config loading).
6. Document all integration points in a single "OpenClaw Integration Surface" document.

**Residual risk:** MEDIUM. We control the pin, but upstream changes may require adaptation work that delays feature development.

### H4 — Provisioner Reliability (Zero Tests)

| Attribute | Value |
|-----------|-------|
| **Impact** | 5 (Catastrophic — new customers can't be onboarded) |
| **Likelihood** | 3 (Possible — 4,477 LOC Bash with zero tests, complex SSH-based provisioning) |
| **Risk Score** | 15 |
| **Category** | Technical |

**Description:** The provisioner (`letsbe-provisioner`) is ~4,477 LOC of Bash scripts with zero automated tests. It performs 10-step SSH-based provisioning including Docker deployment, secret generation, nginx configuration, and SSL certificate setup. Any failure in this pipeline blocks new customer onboarding. The step 10 rewrite (replacing orchestrator/sysadmin with OpenClaw/Safety Wrapper) adds significant risk.

**Mitigation:**
1. Containerized integration test: run provisioner inside Docker against a test VPS (or mock SSH target). Phase 4, week 14.
2. Incremental testing during development: test each provisioner step independently.
3. Keep the existing provisioner working alongside the new step 10 until verified.
4. Pre-provisioned server pool: have 3-5 servers ready so provisioner failures don't block immediate customer needs.
5. Rollback procedure: if new provisioner fails, manually deploy the existing stack and convert later.
6. Manual verification checklist for the first 5 provisioning runs.

**Residual risk:** MEDIUM. The lack of automated tests is a persistent concern, but manual verification and the pre-provisioned pool mitigate the immediate impact.

### H5 — CVE-2026-25253 (Cross-Site WebSocket Hijacking in OpenClaw)

| Attribute | Value |
|-----------|-------|
| **Impact** | 4 (Major — potential unauthorized session access) |
| **Likelihood** | 2 (Unlikely — patched in v2026.1.29, but must verify pin includes fix) |
| **Risk Score** | 8 → Elevated to HIGH due to security nature |
| **Category** | Security / Dependency |

**Description:** CVE-2026-25253 (CVSS 8.8) is a cross-site WebSocket hijacking vulnerability in OpenClaw. It was patched in v2026.1.29 on 2026-01-29. Our pinned version (v2026.2.6-3) includes the fix, but any downgrade or use of an older version would reintroduce it.

**Mitigation:**
1. Verify pinned version ≥ v2026.1.29 during CI build (automated check).
2. OpenClaw bound to loopback (127.0.0.1) — not exposed to external network, reducing attack surface.
3. `openclaw security audit --deep` run during provisioning (catches known CVEs).
4. Include CVE check in monthly OpenClaw review process.

**Residual risk:** LOW after mitigation. Loopback binding means external exploitation requires prior VPS access.
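
The automated CI check from mitigation item 1 could look roughly like the sketch below. It assumes release tags follow the `vYYYY.M.D` pattern with an optional `-N` suffix, as in `v2026.2.6-3`; treating the suffix as a same-day revision number is our assumption, and the function names are hypothetical.

```python
import re

MIN_PATCHED = "v2026.1.29"  # first release containing the CVE-2026-25253 fix


def parse_calver(tag: str) -> tuple[int, int, int, int]:
    """Parse tags like 'v2026.2.6-3' into (year, month, day, revision)."""
    match = re.fullmatch(r"v?(\d{4})\.(\d{1,2})\.(\d{1,2})(?:-(\d+))?", tag)
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    year, month, day, revision = match.groups()
    return int(year), int(month), int(day), int(revision or 0)


def includes_cve_fix(pinned: str, minimum: str = MIN_PATCHED) -> bool:
    """True when the pinned tag is at or after the patched release."""
    return parse_calver(pinned) >= parse_calver(minimum)
```

Comparing parsed integer tuples rather than raw strings matters here: a plain string comparison would rank `v2026.10.1` before `v2026.2.6`.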

### H6 — Single Point of Failure: Safety Wrapper Lead

| Attribute | Value |
|-----------|-------|
| **Impact** | 4 (Major — critical path stalls; no one else understands security layer) |
| **Likelihood** | 3 (Possible — single senior engineer on core IP) |
| **Risk Score** | 12 → Elevated to HIGH due to critical path impact |
| **Category** | Organizational |

**Description:** The Safety Wrapper is the core IP and critical path item. It requires a senior engineer with security expertise. If this person is unavailable (illness, departure, burnout), the entire project stalls.

**Mitigation:**
1. Pair programming on all safety-critical code (classification, redaction, injection).
2. Weekly architecture reviews where the second engineer (Hub or DevOps) reviews Safety Wrapper changes.
3. Comprehensive documentation: every design decision, every edge case, every test rationale.
4. Cross-training: Hub Backend engineer should be able to make minor Safety Wrapper changes by week 8.
5. Code review culture: no Safety Wrapper PR merges without review from at least one other engineer.

**Residual risk:** MEDIUM. Documentation and cross-training reduce bus factor from 1 to ~1.5 by week 8.

---

## 3. MEDIUM Risks

### M1 — Mobile App Platform Inconsistencies

| Attribute | Value |
|-----------|-------|
| **Impact** | 3 (Moderate — degraded experience on one platform) |
| **Likelihood** | 4 (Likely — iOS/Android differences are common with Expo) |
| **Risk Score** | 12 |
| **Category** | Technical |

**Description:** Expo Bare Workflow mitigates many platform differences, but push notification behavior, background app refresh, secure storage, and SSE streaming can differ between iOS and Android.

**Mitigation:**
1. Test on both platforms from week 9 (not just week 14).
2. Focus on Android first (more forgiving platform for initial testing), polish iOS separately.
3. Use Expo's managed push notification service (Expo Push) which abstracts APNs/FCM differences.
4. Secure storage: use `expo-secure-store`, which wraps Keychain services (iOS) and Keystore-encrypted SharedPreferences (Android).
5. Keep mobile app simple for v1 — chat, approvals, basic dashboard. Advanced features post-launch.

### M2 — Stripe Billing Meters Complexity

| Attribute | Value |
|-----------|-------|
| **Impact** | 3 (Moderate — billing inaccurate or overage not triggered) |
| **Likelihood** | 3 (Possible — Stripe Billing Meters API is relatively new) |
| **Risk Score** | 9 |
| **Category** | Technical |

**Description:** Token overage billing requires Stripe Billing Meters to track usage and generate invoices. This API is newer and has less community documentation than standard Stripe subscriptions.

**Mitigation:**
1. Prototype Stripe Billing Meters in week 1-2 (during Prisma model planning) — verify the API works as expected.
2. Fallback: if Billing Meters are too complex, use Stripe usage records on subscription items (older, well-documented API).
3. Overage billing is in the scope cut table — can be deferred (hard stop at pool limit instead).

### M3 — Tool API Stability

| Attribute | Value |
|-----------|-------|
| **Impact** | 3 (Moderate — specific tool becomes unusable until cheat sheet updated) |
| **Likelihood** | 3 (Possible — open-source tools update APIs between major versions) |
| **Risk Score** | 9 |
| **Category** | Technical |

**Description:** Cheat sheets document specific API endpoints for tools like Portainer, Nextcloud, Chatwoot, etc. If a tool updates its API (breaking changes), the agent's cheat sheet becomes inaccurate, causing failed API calls.

**Mitigation:**
1. Pin Docker image versions for all tools (already done in provisioner Compose files).
2. Cheat sheets include tool version they were tested against.
3. Agent behavior: if API call fails, retry with browser fallback automatically.
4. Post-launch: automated cheat sheet validation tests (curl against running tools, verify endpoints return expected shapes).

### M4 — Hub Performance Under Tenant Load

| Attribute | Value |
|-----------|-------|
| **Impact** | 3 (Moderate — slow approvals, delayed heartbeats) |
| **Likelihood** | 3 (Possible — Hub was designed for admin use, not 100+ tenant heartbeats) |
| **Risk Score** | 9 |
| **Category** | Technical |

**Description:** The Hub currently handles admin dashboard requests. With 100+ tenants sending heartbeats every 60 seconds, token usage every hour, approval requests, and customer portal requests, the load profile changes significantly.

**Mitigation:**
1. Heartbeat endpoint must be lightweight: accept payload, queue for async processing, return 200 immediately.
2. Database: add indexes on `ServerConnection.status`, `TokenUsageBucket.periodId`, `CommandApproval.status`.
3. Connection pooling: Prisma's default connection pool (10 connections) may need to increase.
4. Load test with simulated tenants before launch (week 14-15).
5. Horizontal scaling: Hub runs behind nginx — add second instance if needed (session storage is JWT, no sticky sessions required).
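
The "accept, queue, return 200" pattern from mitigation item 1 can be sketched as follows. This is a language-neutral illustration written in Python, not Hub code: the in-process queue, its size, the batch size, and the 503 response on overflow are all assumptions made for the example.

```python
import queue

# Bounded so a stalled worker cannot grow memory without limit (size is an assumption).
heartbeat_queue = queue.Queue(maxsize=10_000)


def handle_heartbeat(payload: dict) -> int:
    """Request handler body: enqueue and return an HTTP status immediately."""
    try:
        heartbeat_queue.put_nowait(payload)
    except queue.Full:
        return 503  # shed load instead of blocking the request thread
    return 200


def drain(batch_size: int = 500) -> list[dict]:
    """Worker side: pull up to batch_size heartbeats for one bulk database write."""
    batch: list[dict] = []
    while len(batch) < batch_size:
        try:
            batch.append(heartbeat_queue.get_nowait())
        except queue.Empty:
            break
    return batch
```

The request path never touches the database; a background worker would call `drain()` on a timer and write each batch in a single transaction.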

### M5 — Secrets Proxy Latency Impact

| Attribute | Value |
|-----------|-------|
| **Impact** | 3 (Moderate — noticeable delay in agent responses) |
| **Likelihood** | 3 (Possible — 4-layer pipeline on every LLM call) |
| **Risk Score** | 9 |
| **Category** | Performance |

**Description:** Every LLM call routes through the Secrets Proxy, which runs 4 layers of redaction. With 50+ secrets in the registry, the Aho-Corasick pattern matching, regex scanning, entropy analysis, and JSON key scanning must complete within the 10ms latency budget.

**Mitigation:**
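1. Aho-Corasick matching is O(n + m), where n is the input length and m the number of matches; once the automaton is built, scan time does not grow with the number of patterns. This is inherently fast.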
2. Pre-compile regex patterns at startup, not per-request.
3. Entropy filter only runs on strings ≥32 chars that weren't caught by earlier layers.
4. Benchmark at startup: if latency exceeds 10ms with the current secret count, log a warning.
5. Cache the Aho-Corasick automaton and rebuild it only when the secret set changes, never per-request.
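
The startup benchmark in mitigation item 4 might be structured like this sketch. The `redact` callable stands in for the real four-layer pipeline, and measuring the 95th percentile over 200 runs is our assumption; the source only specifies the 10ms budget and the warning.

```python
import time
from typing import Callable

LATENCY_BUDGET_MS = 10.0  # budget stated in the M5 description


def p95_latency_ms(redact: Callable[[str], str], sample: str, runs: int = 200) -> float:
    """Run redact() repeatedly and return the 95th-percentile latency in ms."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        redact(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    return timings[min(len(timings) - 1, int(0.95 * len(timings)))]


def check_budget(redact: Callable[[str], str], sample: str,
                 budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """Return True if p95 latency is within budget; print a warning otherwise."""
    p95 = p95_latency_ms(redact, sample)
    if p95 > budget_ms:
        print(f"WARNING: redaction p95 {p95:.2f} ms exceeds {budget_ms} ms budget")
        return False
    return True
```

The sample passed in should resemble a realistic LLM request body in size, and the benchmark should be rerun whenever the secret registry grows.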

### M6 — LLM Provider Reliability

| Attribute | Value |
|-----------|-------|
| **Impact** | 3 (Moderate — agents unable to respond during outage) |
| **Likelihood** | 4 (Likely — OpenRouter/Anthropic/Google have periodic outages) |
| **Risk Score** | 12 |
| **Category** | External Dependency |

**Description:** If the LLM provider (OpenRouter or direct provider) goes down, agents cannot respond. This directly impacts user experience.

**Mitigation:**
1. OpenClaw's native model failover chains: primary → fallback1 → fallback2.
2. Auth profile rotation before model fallback (OpenClaw native feature).
3. Graceful degradation: agent reports "I'm having trouble reaching my AI backend right now. I'll try again in a few minutes."
4. Heartbeat keep-warm (`heartbeat.every: "55m"`) prevents cold starts after brief outages.
5. Multiple OpenRouter API keys for rate limit distribution.
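
Conceptually, a failover chain of the kind listed in item 1 behaves like the sketch below. This is not OpenClaw's implementation, which we rely on as a native feature; the function, the exception class, and the provider callables are hypothetical and only illustrate the ordering and the point at which graceful degradation (item 3) would kick in.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def call_with_failover(providers: Sequence[tuple[str, Callable[[str], str]]],
                       prompt: str) -> tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

A caller would catch `AllProvidersFailed` and return the user-facing degradation message instead of surfacing the raw errors.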

### M7 — Config.json Plaintext Password (Existing Critical Bug)

| Attribute | Value |
|-----------|-------|
| **Impact** | 4 (Major — root password exposed on provisioned servers) |
| **Likelihood** | 5 (Almost certain — it's a known issue documented in the repo analysis) |
| **Risk Score** | 20 (HIGH by score) → Tracked as MEDIUM because the fix is already scheduled (Task 11.3) |
| **Category** | Security |

**Description:** The provisioner's config.json contains the root password in plaintext after provisioning. This is a known issue from the repo analysis.

**Mitigation:**
1. **Already in scope:** Task 11.3 in implementation plan — 0.5 day effort in week 11.
2. Fix: delete config.json after provisioning completes (or redact sensitive fields).
3. Additional: ensure config.json is not committed to any git repository.
4. Verify fix during provisioner integration testing (week 14).

### M8 — Token Metering Accuracy

| Attribute | Value |
|-----------|-------|
| **Impact** | 3 (Moderate — billing disputes, lost revenue, or overcharges) |
| **Likelihood** | 3 (Possible — token counting varies by provider, model, and caching) |
| **Risk Score** | 9 |
| **Category** | Business |

**Description:** Token metering captures counts from OpenRouter response headers. But different providers count tokens differently (e.g., cache-read vs. cache-write, system prompt tokens, tool use tokens). Inaccurate metering leads to billing disputes or revenue leakage.

**Mitigation:**
1. Trust OpenRouter's `x-openrouter-usage` headers as source of truth (they normalize across providers).
2. Track input/output/cache-read/cache-write separately (OpenClaw native).
3. Reconciliation: compare Safety Wrapper's local aggregation with OpenRouter's billing dashboard monthly.
4. Buffer: include a 5% tolerance in pool tracking to handle rounding differences.
5. Alert on anomalies: if hourly usage spikes >3× average, flag for investigation.
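
Items 4 and 5 translate into two small checks, sketched here. The interpretation of the 5% tolerance as headroom above the nominal pool size is our assumption, as are the function names and the choice of a simple mean as the baseline.

```python
from statistics import mean

SPIKE_FACTOR = 3.0     # alert threshold from mitigation item 5
POOL_TOLERANCE = 0.05  # 5% buffer from mitigation item 4


def is_usage_spike(previous_hours: list[int], current_hour: int,
                   factor: float = SPIKE_FACTOR) -> bool:
    """True when the current hour exceeds factor x the average of earlier hours."""
    if not previous_hours:
        return False  # no baseline yet, so nothing to compare against
    return current_hour > factor * mean(previous_hours)


def pool_exhausted(used_tokens: int, pool_tokens: int,
                   tolerance: float = POOL_TOLERANCE) -> bool:
    """True once usage exceeds the pool plus the rounding tolerance."""
    return used_tokens > pool_tokens * (1 + tolerance)
```

For example, with earlier hours of 1,000, 1,200, and 800 tokens the baseline is 1,000, so an hour of 3,500 tokens is flagged while 2,900 is not.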

### M9 — n8n Cleanup Completeness

| Attribute | Value |
|-----------|-------|
| **Impact** | 2 (Minor — leftover references cause confusion, not functional failure) |
| **Likelihood** | 4 (Likely — n8n references are scattered across provisioner, compose, scripts) |
| **Risk Score** | 8 |
| **Category** | Technical Debt |

**Description:** n8n was removed from the tool stack (Sustainable Use License issue), but references remain in Playwright scripts, Docker Compose stacks, adapter code, and config files. Incomplete cleanup leads to provisioning errors or wasted container resources.

**Mitigation:**
1. Comprehensive grep: `grep -rn "n8n" letsbe-provisioner/` — enumerate all references.
2. Remove systematically: Compose services, nginx configs, Playwright scripts, environment templates, tool registry entries.
3. Verify: run provisioner on staging after cleanup — confirm no n8n containers start.
4. Replace in tool inventory: n8n's P1 cheat sheet slot → Activepieces.

---

## 4. LOW Risks

### L1 — Expo SDK Upgrade During Development

| Attribute | Value |
|-----------|-------|
| **Impact** | 2 (Minor — time spent on SDK migration instead of features) |
| **Likelihood** | 3 (Possible — Expo releases new SDK every ~3 months) |
| **Risk Score** | 6 |
| **Category** | Technical |

**Mitigation:** Pin to Expo SDK 52 for development. Upgrade post-launch.

### L2 — Gitea Actions Limitations

| Attribute | Value |
|-----------|-------|
| **Impact** | 2 (Minor — workarounds needed for CI/CD edge cases) |
| **Likelihood** | 3 (Possible — Gitea Actions is younger than GitHub Actions) |
| **Risk Score** | 6 |
| **Category** | Tooling |

**Mitigation:** Use simple, well-tested workflow patterns. Avoid advanced GitHub Actions features that may not have Gitea equivalents.

### L3 — Domain/DNS Automation Failure

| Attribute | Value |
|-----------|-------|
| **Impact** | 2 (Minor — manual DNS record creation as fallback) |
| **Likelihood** | 3 (Possible — Cloudflare/Entri API integration complexity) |
| **Risk Score** | 6 |
| **Category** | Technical |

**Mitigation:** DNS automation is in the scope cut table. Manual DNS creation is the existing, proven flow.

### L4 — Chromium Memory Usage on Lite Tier

| Attribute | Value |
|-----------|-------|
| **Impact** | 3 (Moderate — Lite tier too constrained for browser tool) |
| **Likelihood** | 2 (Unlikely — Chromium headless is ~128MB, within budget) |
| **Risk Score** | 6 |
| **Category** | Performance |

**Mitigation:** Monitor Chromium memory on Lite tier. If excessive, limit browser tool to single tab. Chromium is only active during browser automation — it doesn't run permanently.

### L5 — Founding Member Churn

| Attribute | Value |
|-----------|-------|
| **Impact** | 2 (Minor — reduced early feedback, not technical failure) |
| **Likelihood** | 3 (Possible — early product may not meet all expectations) |
| **Risk Score** | 6 |
| **Category** | Business |

**Mitigation:** Hands-on onboarding for first 10 customers. Weekly check-ins. Fast iteration on feedback. Founding member 2× token bonus incentivizes retention.

### L6 — Time Zone Coordination (Distributed Team)

| Attribute | Value |
|-----------|-------|
| **Impact** | 2 (Minor — slower iteration cycles) |
| **Likelihood** | 2 (Unlikely — team likely EU-based) |
| **Risk Score** | 4 |
| **Category** | Organizational |

**Mitigation:** Async communication culture. Overlap hours for critical decisions. Written architecture documents (this proposal) reduce synchronous dependency.

### L7 — Image Registry Availability

| Attribute | Value |
|-----------|-------|
| **Impact** | 3 (Moderate — can't deploy or provision if registry down) |
| **Likelihood** | 1 (Rare — self-hosted Gitea registry) |
| **Risk Score** | 3 |
| **Category** | Infrastructure |

**Mitigation:** Cache images on all provisioned servers. Provisioner pre-pulls during off-peak. Registry backup via Gitea's built-in backup.

---

## 5. Known Unknowns

Things we know we don't know — areas requiring investigation during Phase 1-2.

### U1 — Exact OpenClaw Tool Routing Configuration

**Unknown:** How exactly do we configure OpenClaw to route tool calls to our Safety Wrapper HTTP API instead of executing them directly?

**Options under investigation:**
- A) Configure `exec` tool to call Safety Wrapper endpoint via curl
- B) Use OpenClaw's custom tool definition to register Safety Wrapper as a tool provider
- C) Override the exec tool's handler via plugin registration

**Investigation timeline:** Week 1-2 (during Safety Wrapper skeleton work)
**Impact if unresolved:** HIGH — blocks all tool integration

### U2 — OpenClaw LLM Proxy Configuration

**Unknown:** How do we tell OpenClaw to route LLM calls through our Secrets Proxy (localhost:8100) instead of directly to OpenRouter?

**Expected approach:** Configure the model provider's `apiBaseUrl` to point to `http://127.0.0.1:8100` instead of the actual provider URL. The Secrets Proxy forwards to the real provider after redaction.

**Investigation timeline:** Week 1 (during Secrets Proxy skeleton)
**Impact if unresolved:** HIGH — secrets redaction won't work

### U3 — Expo Push Notification Reliability for Time-Sensitive Approvals

**Unknown:** How reliable are Expo Push notifications for time-sensitive approval requests? What's the delivery latency? What happens if the notification is delayed by 30+ seconds?

**Investigation timeline:** Week 9-10 (during mobile app development)
**Fallback:** If push notifications are unreliable, add polling fallback in the mobile app (check for pending approvals every 30 seconds when app is foregrounded).

### U4 — Stripe Billing Meters Invoice Timing

**Unknown:** When do Stripe Billing Meters generate invoices? At the end of the billing period? Can we trigger mid-period for real-time usage updates?

**Investigation timeline:** Week 5-6 (during billing pipeline development)
**Fallback:** If Billing Meters don't support real-time, use webhook events from usage threshold alerts instead.

### U5 — Secrets in Tool Output (Post-Execution Redaction)

**Unknown:** When a tool returns output that contains secrets (e.g., `docker inspect` returns environment variables with passwords), are those redacted before reaching the LLM?

**Expected approach:** The Safety Wrapper redacts tool output before returning it to OpenClaw. This requires the Safety Wrapper to see the output, which it does because it is the execution layer.

**Verification needed:** Confirm that tool output flows through Safety Wrapper → redacted → returned to OpenClaw, not bypassed.

**Investigation timeline:** Week 4 (during OpenClaw integration)
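
As a starting point for the verification test, the exact-match part of post-execution redaction can be sketched like this. The `[SECRET_REF:name]` placeholder format and the function name are assumptions for illustration; the real pipeline would also apply the regex, entropy, and JSON-key layers.

```python
def redact_tool_output(output: str, secrets: dict[str, str]) -> str:
    """Replace every known secret value with a placeholder naming its registry key."""
    # Longest values first, so a secret that contains another is replaced whole.
    for name, value in sorted(secrets.items(), key=lambda kv: len(kv[1]), reverse=True):
        if value:  # skip empty values: replacing "" would corrupt the output
            output = output.replace(value, f"[SECRET_REF:{name}]")
    return output
```

A verification test would run a command such as `docker inspect` through the Safety Wrapper and assert that no registered secret value appears in what OpenClaw receives.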

### U6 — OpenClaw Session Persistence Across Restarts

**Unknown:** When OpenClaw restarts (e.g., after a Docker container restart), do agent sessions resume cleanly? Do in-flight tool calls get replayed or lost?

**Investigation timeline:** Week 4 (integration testing)
**Impact:** If sessions don't survive restarts, users may lose conversation context after Safety Wrapper or OpenClaw crashes.

---

## 6. Security-Specific Risks

### Attack Surface Analysis

| Attack Vector | Component | Severity | Mitigation |
|--------------|-----------|----------|------------|
| **Prompt injection via tool output** | Safety Wrapper → OpenClaw | HIGH | Redact secrets from tool output; validate tool responses; OpenClaw's native context safety |
| **Shell command injection** | Safety Wrapper shell executor | HIGH | Allowlist-based execution; no shell metacharacters; execFile (not exec); path validation |
| **Path traversal in file operations** | Safety Wrapper file executor | HIGH | Jail to allowed directories; reject `..`, symlinks outside jail; canonical path resolution |
| **SSRF via browser tool** | OpenClaw browser → internal network | MEDIUM | SSRF protection lists (OpenClaw native); restrict to localhost ports |
| **Credential exfiltration via encoding** | Secrets Proxy | HIGH | 4-layer pipeline including entropy filter; base64/URL-decode before scanning |
| **Approval bypass via race condition** | Safety Wrapper approval queue | MEDIUM | Atomic approval state transitions; database locking on approval check |
| **Hub API key theft** | Tenant server → Hub | MEDIUM | API keys stored encrypted; transmitted via TLS; rotatable |
| **Cross-tenant data leakage** | Hub database | LOW | One customer = one VPS; Hub enforces tenant isolation via API key scoping |
| **DoS via LLM token exhaustion** | Safety Wrapper token metering | MEDIUM | Per-hour rate limits; automatic pause at pool exhaustion; alert at 80/90/100% |
| **WebSocket hijacking** | OpenClaw WebSocket | LOW | CVE-2026-25253 patched; OpenClaw bound to loopback |
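
To illustrate the path traversal row, here is a minimal sketch of a jail check based on canonical path resolution. The function name is hypothetical, and a production version would also need to handle the race between checking a path and opening it.

```python
from pathlib import Path


def resolve_inside_jail(jail: str, requested: str) -> Path:
    """Return the canonical path for requested, or raise if it escapes jail."""
    jail_root = Path(jail).resolve()
    # resolve() collapses ".." and follows symlinks, so the check below sees
    # the real target rather than the spelling the caller supplied.
    candidate = (jail_root / requested).resolve()
    if candidate != jail_root and jail_root not in candidate.parents:
        raise PermissionError(f"path escapes jail: {requested!r}")
    return candidate
```

Because the comparison happens after resolution, relative escapes (`../x`), absolute paths (`/etc/passwd`), and symlinks pointing outside the jail are all rejected by the same check.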

### Security Invariants (Must Hold Under All Conditions)

| Invariant | Enforcement | Verification |
|-----------|------------|-------------|
| Secrets never reach LLM providers | Secrets Proxy transport-layer redaction | P0 test suite + adversarial audit |
| AI never sees raw credential values | SECRET_REF placeholders; injection at execution time | Integration tests |
| Destructive operations require human approval (at levels 1-2) | Safety Wrapper autonomy engine | P0 test suite |
| External comms always gated by default | External Comms Gate (independent of autonomy) | Configuration verification |
| Audit trail captures every tool call | Append-only SQLite audit log | Log completeness check |
| Container runs as non-root | Docker security configuration | Provisioner verification |
| OpenClaw not accessible from external network | Loopback binding | Network scan |
| Elevated Mode permanently disabled | OpenClaw configuration | Config verification |
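
One way to enforce the append-only audit log invariant at the database level is with SQLite triggers, sketched below. The table and column names are hypothetical; the SQL is the relevant part and should carry over unchanged to whichever SQLite driver the Safety Wrapper uses. Triggers guard against accidental or application-level modification only, not against someone with direct file access.

```python
import sqlite3


def open_audit_log(path: str = ":memory:") -> sqlite3.Connection:
    """Create the audit table plus triggers that reject UPDATE and DELETE."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS audit_log (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            ts        TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
            tool      TEXT NOT NULL,
            decision  TEXT NOT NULL
        );
        CREATE TRIGGER IF NOT EXISTS audit_no_update BEFORE UPDATE ON audit_log
        BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
        CREATE TRIGGER IF NOT EXISTS audit_no_delete BEFORE DELETE ON audit_log
        BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
    """)
    return conn


def record_tool_call(conn: sqlite3.Connection, tool: str, decision: str) -> None:
    """Append one audit row; the connection context manager commits it."""
    with conn:
        conn.execute("INSERT INTO audit_log (tool, decision) VALUES (?, ?)", (tool, decision))
```

The log completeness check can then compare the row count against the number of tool calls the Safety Wrapper reports having executed.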

---

## 7. Business & Operational Risks

### B1 — Market Timing

| Attribute | Value |
|-----------|-------|
| **Risk** | AI agent platforms are proliferating rapidly. Delay risks competitor capturing the SMB privacy-first niche. |
| **Impact** | 3 (Moderate) |
| **Likelihood** | 3 (Possible) |
| **Mitigation** | Focus on the privacy moat — competitors would need to redesign their architecture to match the secrets-never-leave guarantee. Ship fast on the core differentiator. |

### B2 — Unit Economics at Scale

| Attribute | Value |
|-----------|-------|
| **Risk** | Token costs, LLM API prices, and VPS costs may shift. The current pricing model (€29-109/mo) assumes specific cost structures. |
| **Impact** | 3 (Moderate) |
| **Likelihood** | 3 (Possible — LLM prices are dropping, but usage patterns are unpredictable) |
| **Mitigation** | Token pool sizes are configurable in Hub settings. Markup thresholds are configurable. Pricing tiers can be adjusted without code changes. Monitor unit economics from founding member data. |

### B3 — Customer Support at Scale

| Attribute | Value |
|-----------|-------|
| **Risk** | Each customer has their own VPS with unique configuration. Debugging customer issues is more complex than multi-tenant SaaS. |
| **Impact** | 3 (Moderate) |
| **Likelihood** | 4 (Likely — one-VPS-per-customer means one-off issues) |
| **Mitigation** | Hub monitoring dashboard. Tenant health heartbeats. Centralized logging via Hub. Remote diagnostic commands via Hub API. Consider adding remote shell access for LetsBe staff (gated by customer approval). |

### B4 — Regulatory Risk (EU AI Act)

| Attribute | Value |
|-----------|-------|
| **Risk** | EU AI Act may impose requirements on AI agents acting autonomously on behalf of businesses. |
| **Impact** | 2 (Minor — likely "limited risk" category for business tools) |
| **Likelihood** | 2 (Unlikely to affect v1 launch) |
| **Mitigation** | Audit trail captures every AI decision. Human-in-the-loop via approval system. Transparency via agent activity feed. Monitor EU AI Act implementation timeline. |

---

## 8. Dependency Risks

### External Dependencies

| Dependency | Version | Risk | Mitigation |
|-----------|---------|------|------------|
| **OpenClaw** | v2026.2.6-3 | Breaking changes; hook gaps | Pin release; compatibility tests; separate-process architecture |
| **OpenRouter** | API v1 | Rate limits; outages; pricing changes | Failover chains; multiple API keys; direct provider fallback |
| **Stripe** | v17.7.0 | API deprecations; Billing Meters maturity | Use stable APIs; test mode validation; fallback to usage records |
| **Expo SDK** | 52 | Breaking changes in SDK upgrades | Pin SDK; upgrade post-launch |
| **Netcup SCP API** | OAuth2 | API changes; rate limits | Existing integration proven; Hetzner as overflow provider |
| **PostgreSQL** | 16 | Minimal risk — mature and stable | Standard backup strategy |
| **Node.js** | 22 | LTS until April 2027 | Aligned with OpenClaw's runtime requirement |
| **better-sqlite3** | Latest | Native compilation on different platforms | Pin version; test in CI Docker |
| **Prisma** | 7.0.0 | Migration compatibility; query performance | Well-established ORM; large community |

### Internal Dependencies

| Dependency | Owner | Risk | Mitigation |
|-----------|-------|------|------------|
| **Hub (existing codebase)** | Hub Backend Engineer | 80+ endpoints to maintain alongside new development | Additive-only changes; no breaking existing endpoints |
| **Provisioner (Bash scripts)** | DevOps Engineer | Zero tests; complex SSH operations | Integration tests; manual verification; incremental changes |
| **Gitea (self-hosted)** | DevOps Engineer | Single point of failure for source control and CI | Regular backups; consider mirror to external Git provider |

---

## 9. Risk Monitoring Plan

### Weekly Risk Review (Every Friday)

| Activity | Owner | Output |
|----------|-------|--------|
| Review risk register | Project Lead | Updated risk scores; new risks added |
| Check milestone progress vs. plan | Project Lead | Buffer consumption tracked |
| Security invariant spot-check | Safety Wrapper Lead | Random adversarial test run |
| Dependency version check | DevOps | Alert on new OpenClaw releases or CVEs |

### Automated Monitoring (Post-Deployment)

| Monitor | Frequency | Alert Threshold |
|---------|-----------|----------------|
| Secrets redaction miss rate | Per-request | Any non-zero rate |
| Safety Wrapper uptime | Every 60s | Downtime > 30s |
| Hub ↔ Safety Wrapper heartbeat | Every 60s | 2 missed heartbeats |
| Token usage anomaly | Hourly | >3× average hourly usage |
| Provisioner success rate | Per-provisioning | Any failure |
| LLM provider latency | Per-request | p95 > 30s |
| Memory usage per component | Every 5min | >90% of budget |
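
As an example of how one of these thresholds could be evaluated, the heartbeat row might be checked as in this sketch. Counting whole elapsed intervals with no grace period for network jitter is a simplifying assumption, and the function names are illustrative.

```python
HEARTBEAT_INTERVAL_S = 60  # "Every 60s" in the table above
MISSED_THRESHOLD = 2       # "2 missed heartbeats" in the table above


def missed_heartbeats(last_seen_s: float, now_s: float,
                      interval_s: int = HEARTBEAT_INTERVAL_S) -> int:
    """Count the full heartbeat intervals that have elapsed since last_seen_s."""
    elapsed = max(0.0, now_s - last_seen_s)
    return int(elapsed // interval_s)


def should_alert(last_seen_s: float, now_s: float) -> bool:
    """True once at least MISSED_THRESHOLD heartbeats are overdue."""
    return missed_heartbeats(last_seen_s, now_s) >= MISSED_THRESHOLD
```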

### Risk Escalation Matrix

| Risk Score Change | Action |
|-------------------|--------|
| Score increases by ≥5 | Escalate to project lead; discuss in weekly review |
| New HIGH risk identified | Immediate team notification; mitigation plan within 24h |
| Milestone at risk (>3 days behind) | Scope cut discussion; buffer reallocation |
| Security invariant violation | STOP DEPLOYMENT. All hands on fix. No exceptions. |

---

*End of Document — 06 Risk Assessment*