LetsBeBiz-Redesign/docs/architecture-proposal/claude/06-RISK-ASSESSMENT.md


LetsBe Biz — Risk Assessment

Date: February 27, 2026
Team: Claude Opus 4.6 Architecture Team
Document: 06 of 09
Status: Proposal (competing with independent team)


Table of Contents

  1. Risk Matrix Overview
  2. HIGH Risks
  3. MEDIUM Risks
  4. LOW Risks
  5. Known Unknowns
  6. Security-Specific Risks
  7. Business & Operational Risks
  8. Dependency Risks
  9. Risk Monitoring Plan

1. Risk Matrix Overview

Scoring

  • Impact: How bad is it if this happens? (1-5, where 5 = catastrophic)
  • Likelihood: How likely is it? (1-5, where 5 = almost certain)
  • Risk Score: Impact × Likelihood
  • Severity: HIGH (≥15), MEDIUM (8-14), LOW (≤7)
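The scoring rule above can be sketched as a small function. This is a minimal illustration only; the names `rateRisk`, `RiskRating`, and `Severity` are hypothetical and do not come from any project code.

```typescript
type Severity = "HIGH" | "MEDIUM" | "LOW";

interface RiskRating {
  score: number;
  severity: Severity;
}

// Score = impact × likelihood; the bands follow the Scoring list above.
function rateRisk(impact: number, likelihood: number): RiskRating {
  if (impact < 1 || impact > 5 || likelihood < 1 || likelihood > 5) {
    throw new Error("impact and likelihood must be between 1 and 5");
  }
  const score = impact * likelihood;
  let severity: Severity = "LOW";
  if (score >= 15) {
    severity = "HIGH";
  } else if (score >= 8) {
    severity = "MEDIUM";
  }
  return { score: score, severity: severity };
}
```

Note that this register overrides the computed band in a few places with a stated rationale (H2, H5, and H6 are elevated; M7 is lowered). The function returns only the computed band.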

Summary

| Severity | Count | Action Required |
|---|---|---|
| HIGH | 6 | Active mitigation required; block launch if unresolved |
| MEDIUM | 9 | Mitigation planned; monitor weekly |
| LOW | 7 | Accepted; monitor monthly |

2. HIGH Risks

H1 — Secrets Redaction Bypass

Attribute Value
Impact 5 (Catastrophic — customer secrets sent to LLM provider)
Likelihood 3 (Possible — novel encoding/nesting could evade patterns)
Risk Score 15
Category Security

Description: The 4-layer redaction pipeline (Aho-Corasick → regex → entropy → JSON keys) may fail to catch secrets in edge cases: base64-encoded values, URL-encoded strings, secrets split across multiple JSON fields, secrets embedded in error messages from tools, or secrets in non-UTF-8 encodings.

Mitigation:

  1. TDD approach — write adversarial tests BEFORE implementation (Phase 1, week 3)
  2. Adversarial testing matrix from Technical Architecture §19.2: Unicode edge cases, base64, URL-encoded, nested JSON, YAML, log output
  3. Shannon entropy filter (Layer 3) as catch-all for unknown patterns (≥4.5 bits/char, ≥32 chars)
  4. Dedicated security audit in Phase 4 (week 13) with crafted bypass payloads
  5. Post-launch: bug bounty program for redaction bypass (internal at first, public later)
  6. Monitoring: log all redaction events; alert on suspiciously high entropy in outbound LLM calls

Residual risk: MEDIUM after mitigation. The entropy filter is the safety net, but it has false-positive trade-offs.
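To make the Layer 3 thresholds concrete, here is a minimal sketch of a Shannon entropy check using the values from mitigation 3 (≥4.5 bits/char, ≥32 chars). The function names are hypothetical, and how the real pipeline tokenizes input before this check is not specified here.

```typescript
// Shannon entropy in bits per character, computed over UTF-16 code units.
function shannonEntropy(text: string): number {
  if (text.length === 0) {
    return 0;
  }
  const counts: Record<string, number> = {};
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let entropy = 0;
  Object.keys(counts).forEach(function (ch) {
    const p = (counts[ch] || 0) / text.length;
    entropy -= p * (Math.log(p) / Math.LN2);
  });
  return entropy;
}

// Assumed defaults mirror the thresholds quoted in the mitigation list.
function looksLikeSecret(token: string, minLength: number = 32, minBits: number = 4.5): boolean {
  return token.length >= minLength && shannonEntropy(token) >= minBits;
}
```

One consequence worth testing adversarially: a string drawn from only 16 distinct symbols, such as hex, can never exceed 4 bits/char, so a 4.5 threshold alone will not flag hex-encoded secrets. Those would need to be caught by the earlier layers.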

H2 — OpenClaw Hook Gap (before_tool_call not bridged to external plugins)

Attribute Value
Impact 5 (Catastrophic — Safety Wrapper cannot intercept tool calls)
Likelihood 2 (Unlikely — we've already planned for this via separate process)
Risk Score 10 → Elevated to HIGH due to impact severity
Category Technical / Dependency

Description: The Technical Architecture v1.2 proposes the Safety Wrapper as an in-process OpenClaw extension using before_tool_call / after_tool_call hooks. Our analysis (GitHub Discussion #20575) found these hooks are NOT bridged to external plugins — they only work for bundled/internal hooks. This means the in-process extension model proposed in the Technical Architecture does not work as documented.

Mitigation:

  1. Already addressed: Our architecture uses the Safety Wrapper as a SEPARATE PROCESS (localhost:8200). OpenClaw's tool calls are configured to route through the Safety Wrapper's HTTP API, not through in-process hooks.
  2. OpenClaw's exec tool is configured to call the Safety Wrapper's execute endpoint instead of running commands directly.
  3. OpenClaw's model provider is configured to proxy through the Secrets Proxy (localhost:8100) for LLM calls.
  4. This approach is hook-independent — it works regardless of OpenClaw's internal hook architecture.

Residual risk: LOW after mitigation. The separate-process architecture was specifically designed to avoid this risk.

H3 — OpenClaw Upstream Breaking Changes

Attribute Value
Impact 4 (Major — could break tool routing, sessions, or agent management)
Likelihood 4 (Likely — OpenClaw is actively developed with calendar-versioned releases)
Risk Score 16
Category Dependency

Description: OpenClaw uses calendar versioning (2026.2.6-3) and is under active development. Breaking changes to the config format, tool system, session management, or API could break our integration. Our review of the v1.2 architecture has already surfaced one incompatibility (the hook bridging gap, see H2).

Mitigation:

  1. Pin to a specific release tag (e.g., v2026.2.6-3). Never float to latest.
  2. Monthly review of OpenClaw releases during development; quarterly post-launch.
  3. Staging-first rollout: test new releases on staging VPS before any production deployment.
  4. Canary deployment: staging → 5% → 25% → 100% (see 03-DEPLOYMENT-STRATEGY).
  5. Maintain a compatibility test suite: 20-30 tests verifying our integration points (tool routing, LLM proxy, session management, config loading).
  6. Document all integration points in a single "OpenClaw Integration Surface" document.

Residual risk: MEDIUM. We control the pin, but upstream changes may require adaptation work that delays feature development.
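One possible way to implement the staged percentages in mitigation 4 is deterministic bucketing by tenant ID, so a tenant that is included at 5% stays included at 25% and 100%. This is a sketch under the assumption that tenants have stable string IDs; the actual mechanism is defined in 03-DEPLOYMENT-STRATEGY, and the names below are hypothetical.

```typescript
// Rollout stages after staging, as percentages of production tenants.
const ROLLOUT_STAGES: number[] = [5, 25, 100];

// Stable bucket in [0, 100) derived from the tenant ID (simple polynomial hash).
function rolloutBucket(tenantId: string): number {
  let h = 0;
  for (let i = 0; i < tenantId.length; i++) {
    // ">>> 0" keeps the running hash within unsigned 32-bit range.
    h = (h * 31 + tenantId.charCodeAt(i)) >>> 0;
  }
  return h % 100;
}

// A tenant receives the new release once its bucket falls below the stage percentage.
function inRollout(tenantId: string, percent: number): boolean {
  return rolloutBucket(tenantId) < percent;
}
```

Because the bucket depends only on the ID, widening a stage never removes a tenant that already received the release. The hash is not uniform enough to guarantee exact percentages on a small fleet.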

H4 — Provisioner Reliability (Zero Tests)

Attribute Value
Impact 5 (Catastrophic — new customers can't be onboarded)
Likelihood 3 (Possible — 4,477 LOC Bash with zero tests, complex SSH-based provisioning)
Risk Score 15
Category Technical

Description: The provisioner (letsbe-provisioner) is ~4,477 LOC of Bash scripts with zero automated tests. It performs 10-step SSH-based provisioning including Docker deployment, secret generation, nginx configuration, and SSL certificate setup. Any failure in this pipeline blocks new customer onboarding. The step 10 rewrite (replacing orchestrator/sysadmin with OpenClaw/Safety Wrapper) adds significant risk.

Mitigation:

  1. Containerized integration test: run provisioner inside Docker against a test VPS (or mock SSH target). Phase 4, week 14.
  2. Incremental testing during development: test each provisioner step independently.
  3. Keep the existing provisioner working alongside the new step 10 until verified.
  4. Pre-provisioned server pool: have 3-5 servers ready so provisioner failures don't block immediate customer needs.
  5. Rollback procedure: if new provisioner fails, manually deploy the existing stack and convert later.
  6. Manual verification checklist for the first 5 provisioning runs.

Residual risk: MEDIUM. The lack of automated tests is a persistent concern, but manual verification and the pre-provisioned pool mitigate the immediate impact.

H5 — CVE-2026-25253 (Cross-Site WebSocket Hijacking in OpenClaw)

Attribute Value
Impact 4 (Major — potential unauthorized session access)
Likelihood 2 (Unlikely — patched in v2026.1.29, but must verify pin includes fix)
Risk Score 8 → Elevated to HIGH due to security nature
Category Security / Dependency

Description: CVE-2026-25253 (CVSS 8.8) is a cross-site WebSocket hijacking vulnerability in OpenClaw. Patched 2026-01-29. Our pinned version (v2026.2.6-3) includes the fix, but any downgrade or use of an older version would reintroduce it.

Mitigation:

  1. Verify pinned version ≥ v2026.1.29 during CI build (automated check).
  2. OpenClaw bound to loopback (127.0.0.1) — not exposed to external network, reducing attack surface.
  3. Run openclaw security audit --deep during provisioning (catches known CVEs).
  4. Include CVE check in monthly OpenClaw review process.

Residual risk: LOW after mitigation. Loopback binding means external exploitation requires prior VPS access.
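The automated CI check in mitigation 1 could look like the following sketch. It assumes release tags always have the shape vYYYY.M.D with an optional -N suffix, which matches the two tags quoted in this document but has not been verified against OpenClaw's full release history. The function names are hypothetical.

```typescript
// Parse "v2026.2.6-3" into [2026, 2, 6, 3]; a missing suffix counts as 0.
function parseCalVer(tag: string): number[] {
  const m = /^v?(\d+)\.(\d+)\.(\d+)(?:-(\d+))?$/.exec(tag);
  if (m === null) {
    throw new Error("unrecognized version tag: " + tag);
  }
  const parts: number[] = [];
  for (let i = 1; i <= 4; i++) {
    parts.push(parseInt(m[i] || "0", 10));
  }
  return parts;
}

// True when the pinned tag is the same as or newer than the minimum tag.
function isAtLeast(pinned: string, minimum: string): boolean {
  const a = parseCalVer(pinned);
  const b = parseCalVer(minimum);
  for (let i = 0; i < 4; i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) {
      return x > y;
    }
  }
  return true;
}
```

A CI step would call `isAtLeast(pinnedTag, "v2026.1.29")` and fail the build on `false`. Rejecting unparseable tags such as `latest` is deliberate, since H3 requires a fixed pin.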

H6 — Single Point of Failure: Safety Wrapper Lead

Attribute Value
Impact 4 (Major — critical path stalls; no one else understands security layer)
Likelihood 3 (Possible — single senior engineer on core IP)
Risk Score 12 → Elevated to HIGH due to critical path impact
Category Organizational

Description: The Safety Wrapper is the core IP and critical path item. It requires a senior engineer with security expertise. If this person is unavailable (illness, departure, burnout), the entire project stalls.

Mitigation:

  1. Pair programming on all safety-critical code (classification, redaction, injection).
  2. Weekly architecture reviews where the second engineer (Hub or DevOps) reviews Safety Wrapper changes.
  3. Comprehensive documentation: every design decision, every edge case, every test rationale.
  4. Cross-training: Hub Backend engineer should be able to make minor Safety Wrapper changes by week 8.
  5. Code review culture: no Safety Wrapper PR merges without review from at least one other engineer.

Residual risk: MEDIUM. Documentation and cross-training reduce bus factor from 1 to ~1.5 by week 8.


3. MEDIUM Risks

M1 — Mobile App Platform Inconsistencies

Attribute Value
Impact 3 (Moderate — degraded experience on one platform)
Likelihood 4 (Likely — iOS/Android differences are common with Expo)
Risk Score 12
Category Technical

Description: Expo Bare Workflow mitigates many platform differences, but push notification behavior, background app refresh, secure storage, and SSE streaming can differ between iOS and Android.

Mitigation:

  1. Test on both platforms from week 9 (not just week 14).
  2. Focus on Android first (more forgiving platform for initial testing), polish iOS separately.
  3. Use Expo's managed push notification service (Expo Push) which abstracts APNs/FCM differences.
  4. Secure storage: use expo-secure-store which wraps Keychain (iOS) and EncryptedSharedPreferences (Android).
  5. Keep mobile app simple for v1 — chat, approvals, basic dashboard. Advanced features post-launch.

M2 — Stripe Billing Meters Complexity

Attribute Value
Impact 3 (Moderate — billing inaccurate or overage not triggered)
Likelihood 3 (Possible — Stripe Billing Meters API is relatively new)
Risk Score 9
Category Technical

Description: Token overage billing requires Stripe Billing Meters to track usage and generate invoices. This API is newer and has less community documentation than standard Stripe subscriptions.

Mitigation:

  1. Prototype Stripe Billing Meters in week 1-2 (during Prisma model planning) — verify the API works as expected.
  2. Fallback: if Billing Meters are too complex, use Stripe usage records on subscription items (older, well-documented API).
  3. Overage billing is in the scope cut table — can be deferred (hard stop at pool limit instead).

M3 — Tool API Stability

Attribute Value
Impact 3 (Moderate — specific tool becomes unusable until cheat sheet updated)
Likelihood 3 (Possible — open-source tools update APIs between major versions)
Risk Score 9
Category Technical

Description: Cheat sheets document specific API endpoints for tools like Portainer, Nextcloud, Chatwoot, etc. If a tool updates its API (breaking changes), the agent's cheat sheet becomes inaccurate, causing failed API calls.

Mitigation:

  1. Pin Docker image versions for all tools (already done in provisioner Compose files).
  2. Cheat sheets include tool version they were tested against.
  3. Agent behavior: if API call fails, retry with browser fallback automatically.
  4. Post-launch: automated cheat sheet validation tests (curl against running tools, verify endpoints return expected shapes).

M4 — Hub Performance Under Tenant Load

Attribute Value
Impact 3 (Moderate — slow approvals, delayed heartbeats)
Likelihood 3 (Possible — Hub was designed for admin use, not 100+ tenant heartbeats)
Risk Score 9
Category Technical

Description: The Hub currently handles admin dashboard requests. With 100+ tenants sending heartbeats every 60 seconds, token usage every hour, approval requests, and customer portal requests, the load profile changes significantly.

Mitigation:

  1. Heartbeat endpoint must be lightweight: accept payload, queue for async processing, return 200 immediately.
  2. Database: add indexes on ServerConnection.status, TokenUsageBucket.periodId, CommandApproval.status.
  3. Connection pooling: Prisma's default connection pool (10 connections) may need to increase.
  4. Load test with simulated tenants before launch (week 14-15).
  5. Horizontal scaling: Hub runs behind nginx — add second instance if needed (session storage is JWT, no sticky sessions required).
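The "accept, queue, return 200" pattern from mitigation 1 can be sketched with an in-memory inbox. This is illustrative only: the class name, the bounded queue, and the 503 response when the queue is full are assumptions, and a real Hub handler might use a durable queue instead.

```typescript
interface Heartbeat {
  tenantId: string;
  receivedAtMs: number;
}

class HeartbeatInbox {
  private pending: Heartbeat[] = [];
  private maxPending: number;

  constructor(maxPending: number) {
    this.maxPending = maxPending;
  }

  // Called from the HTTP handler: no database work, only enqueue and answer.
  accept(tenantId: string, nowMs: number): number {
    if (this.pending.length >= this.maxPending) {
      return 503; // assumption: shed load rather than grow without bound
    }
    this.pending.push({ tenantId: tenantId, receivedAtMs: nowMs });
    return 200;
  }

  // Called from a background worker: remove up to batchSize items to persist.
  drain(batchSize: number): Heartbeat[] {
    return this.pending.splice(0, batchSize);
  }

  size(): number {
    return this.pending.length;
  }
}
```

The request path touches only an array, so its cost does not depend on database latency. The worker can then write drained batches in a single transaction.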

M5 — Secrets Proxy Latency Impact

Attribute Value
Impact 3 (Moderate — noticeable delay in agent responses)
Likelihood 3 (Possible — 4-layer pipeline on every LLM call)
Risk Score 9
Category Performance

Description: Every LLM call routes through the Secrets Proxy, which runs 4 layers of redaction. With 50+ secrets in the registry, the Aho-Corasick pattern matching, regex scanning, entropy analysis, and JSON key scanning must complete within the 10ms latency budget.

Mitigation:

  1. Aho-Corasick is O(n) where n = input length (not number of patterns). This is inherently fast.
  2. Pre-compile regex patterns at startup, not per-request.
  3. Entropy filter only runs on strings ≥32 chars that weren't caught by earlier layers.
  4. Benchmark at startup: if latency exceeds 10ms with the current secret count, log a warning.
  5. Cache the Aho-Corasick automaton and rebuild it only when secrets change, not per-request.

M6 — LLM Provider Reliability

Attribute Value
Impact 3 (Moderate — agents unable to respond during outage)
Likelihood 4 (Likely — OpenRouter/Anthropic/Google have periodic outages)
Risk Score 12
Category External Dependency

Description: If the LLM provider (OpenRouter or direct provider) goes down, agents cannot respond. This directly impacts user experience.

Mitigation:

  1. OpenClaw's native model failover chains: primary → fallback1 → fallback2.
  2. Auth profile rotation before model fallback (OpenClaw native feature).
  3. Graceful degradation: agent reports "I'm having trouble reaching my AI backend right now. I'll try again in a few minutes."
  4. Heartbeat keep-warm (heartbeat.every: "55m") prevents cold starts after brief outages.
  5. Multiple OpenRouter API keys for rate limit distribution.

M7 — Config.json Plaintext Password (Existing Critical Bug)

Attribute Value
Impact 4 (Major — root password exposed on provisioned servers)
Likelihood 5 (Almost certain — it's a known issue documented in the repo analysis)
Risk Score 20 → Classified as MEDIUM because fix is already planned
Category Security

Description: The provisioner's config.json contains the root password in plaintext after provisioning. This is a known issue from the repo analysis.

Mitigation:

  1. Already in scope: Task 11.3 in implementation plan — 0.5 day effort in week 11.
  2. Fix: delete config.json after provisioning completes (or redact sensitive fields).
  3. Additional: ensure config.json is not committed to any git repository.
  4. Verify fix during provisioner integration testing (week 14).

M8 — Token Metering Accuracy

Attribute Value
Impact 3 (Moderate — billing disputes, lost revenue, or overcharges)
Likelihood 3 (Possible — token counting varies by provider, model, and caching)
Risk Score 9
Category Business

Description: Token metering captures counts from OpenRouter response headers. But different providers count tokens differently (e.g., cache-read vs. cache-write, system prompt tokens, tool use tokens). Inaccurate metering leads to billing disputes or revenue leakage.

Mitigation:

  1. Trust OpenRouter's x-openrouter-usage headers as source of truth (they normalize across providers).
  2. Track input/output/cache-read/cache-write separately (OpenClaw native).
  3. Reconciliation: compare Safety Wrapper's local aggregation with OpenRouter's billing dashboard monthly.
  4. Buffer: include a 5% tolerance in pool tracking to handle rounding differences.
  5. Alert on anomalies: if hourly usage spikes >3× average, flag for investigation.
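The anomaly rule in mitigation 5 reduces to a comparison against a trailing average. The sketch below assumes the caller supplies the previous hourly totals; the window length and the function name are not specified in this document and are hypothetical.

```typescript
// Flags the current hour when it exceeds factor × the mean of previous hours.
function isUsageAnomalous(previousHours: number[], currentHour: number, factor: number = 3): boolean {
  if (previousHours.length === 0) {
    return false; // no baseline yet, so nothing to compare against
  }
  let total = 0;
  for (let i = 0; i < previousHours.length; i++) {
    total += previousHours[i] || 0;
  }
  const average = total / previousHours.length;
  return currentHour > factor * average;
}
```

With a baseline of zero usage, any non-zero hour is flagged. That may be noisy for newly provisioned tenants, so a minimum baseline could be added.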

M9 — n8n Cleanup Completeness

Attribute Value
Impact 2 (Minor — leftover references cause confusion, not functional failure)
Likelihood 4 (Likely — n8n references are scattered across provisioner, compose, scripts)
Risk Score 8
Category Technical Debt

Description: n8n was removed from the tool stack (Sustainable Use License issue), but references remain in Playwright scripts, Docker Compose stacks, adapter code, and config files. Incomplete cleanup leads to provisioning errors or wasted container resources.

Mitigation:

  1. Comprehensive grep: grep -rn "n8n" letsbe-provisioner/ — enumerate all references.
  2. Remove systematically: Compose services, nginx configs, Playwright scripts, environment templates, tool registry entries.
  3. Verify: run provisioner on staging after cleanup — confirm no n8n containers start.
  4. Replace in tool inventory: n8n's P1 cheat sheet slot → Activepieces.

4. LOW Risks

L1 — Expo SDK Upgrade During Development

Attribute Value
Impact 2 (Minor — time spent on SDK migration instead of features)
Likelihood 3 (Possible — Expo releases new SDK every ~3 months)
Risk Score 6
Category Technical

Mitigation: Pin to Expo SDK 52 for development. Upgrade post-launch.

L2 — Gitea Actions Limitations

Attribute Value
Impact 2 (Minor — workarounds needed for CI/CD edge cases)
Likelihood 3 (Possible — Gitea Actions is younger than GitHub Actions)
Risk Score 6
Category Tooling

Mitigation: Use simple, well-tested workflow patterns. Avoid advanced GitHub Actions features that may not have Gitea equivalents.

L3 — Domain/DNS Automation Failure

Attribute Value
Impact 2 (Minor — manual DNS record creation as fallback)
Likelihood 3 (Possible — Cloudflare/Entri API integration complexity)
Risk Score 6
Category Technical

Mitigation: DNS automation is in the scope cut table. Manual DNS creation is the existing, proven flow.

L4 — Chromium Memory Usage on Lite Tier

Attribute Value
Impact 3 (Moderate — Lite tier too constrained for browser tool)
Likelihood 2 (Unlikely — Chromium headless is ~128MB, within budget)
Risk Score 6
Category Performance

Mitigation: Monitor Chromium memory on Lite tier. If excessive, limit browser tool to single tab. Chromium is only active during browser automation — it doesn't run permanently.

L5 — Founding Member Churn

Attribute Value
Impact 2 (Minor — reduced early feedback, not technical failure)
Likelihood 3 (Possible — early product may not meet all expectations)
Risk Score 6
Category Business

Mitigation: Hands-on onboarding for first 10 customers. Weekly check-ins. Fast iteration on feedback. Founding member 2× token bonus incentivizes retention.

L6 — Time Zone Coordination (Distributed Team)

Attribute Value
Impact 2 (Minor — slower iteration cycles)
Likelihood 2 (Unlikely — team likely EU-based)
Risk Score 4
Category Organizational

Mitigation: Async communication culture. Overlap hours for critical decisions. Written architecture documents (this proposal) reduce synchronous dependency.

L7 — Image Registry Availability

Attribute Value
Impact 3 (Moderate — can't deploy or provision if registry down)
Likelihood 1 (Rare — self-hosted Gitea registry)
Risk Score 3
Category Infrastructure

Mitigation: Cache images on all provisioned servers. Provisioner pre-pulls during off-peak. Registry backup via Gitea's built-in backup.


5. Known Unknowns

Things we know we don't know — areas requiring investigation during Phase 1-2.

U1 — Exact OpenClaw Tool Routing Configuration

Unknown: How exactly do we configure OpenClaw to route tool calls to our Safety Wrapper HTTP API instead of executing them directly?

Options under investigation:

  • A) Configure exec tool to call Safety Wrapper endpoint via curl
  • B) Use OpenClaw's custom tool definition to register Safety Wrapper as a tool provider
  • C) Override the exec tool's handler via plugin registration

Investigation timeline: Week 1-2 (during Safety Wrapper skeleton work)

Impact if unresolved: HIGH (blocks all tool integration)

U2 — OpenClaw LLM Proxy Configuration

Unknown: How do we tell OpenClaw to route LLM calls through our Secrets Proxy (localhost:8100) instead of directly to OpenRouter?

Expected approach: Configure the model provider's apiBaseUrl to point to http://127.0.0.1:8100 instead of the actual provider URL. The Secrets Proxy forwards to the real provider after redaction.

Investigation timeline: Week 1 (during Secrets Proxy skeleton)

Impact if unresolved: HIGH (secrets redaction won't work)

U3 — Expo Push Notification Reliability for Time-Sensitive Approvals

Unknown: How reliable are Expo Push notifications for time-sensitive approval requests? What's the delivery latency? What happens if the notification is delayed by 30+ seconds?

Investigation timeline: Week 9-10 (during mobile app development)

Fallback: If push notifications are unreliable, add polling fallback in the mobile app (check for pending approvals every 30 seconds when app is foregrounded).

U4 — Stripe Billing Meters Invoice Timing

Unknown: When do Stripe Billing Meters generate invoices? At the end of the billing period? Can we trigger mid-period for real-time usage updates?

Investigation timeline: Week 5-6 (during billing pipeline development)

Fallback: If Billing Meters don't support real-time, use webhook events from usage threshold alerts instead.

U5 — Secrets in Tool Output (Post-Execution Redaction)

Unknown: When a tool returns output that contains secrets (e.g., docker inspect returns environment variables with passwords), are those redacted before reaching the LLM?

Expected approach: The Safety Wrapper redacts tool output before returning it to OpenClaw. This requires the Safety Wrapper to see the output, which it does because it is the execution layer.

Verification needed: Confirm that tool output flows through Safety Wrapper → redacted → returned to OpenClaw, not bypassed.

Investigation timeline: Week 4 (during OpenClaw integration)
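A minimal sketch of post-execution redaction for known secrets follows. It uses naive literal replacement as a stand-in for the Aho-Corasick layer, and the placeholder format `[SECRET_REF:name]` and the registry shape are assumptions made for illustration, not the project's actual format.

```typescript
// Replaces every occurrence of each registered secret value with a placeholder.
// registry maps a secret's name to its raw value.
function redactKnownSecrets(output: string, registry: Record<string, string>): string {
  // Longest values first, so a secret that contains another secret is replaced whole.
  const names = Object.keys(registry).sort(function (a, b) {
    return (registry[b] || "").length - (registry[a] || "").length;
  });
  let redacted = output;
  names.forEach(function (name) {
    const value = registry[name] || "";
    if (value.length === 0) {
      return; // an empty value would otherwise match between every character
    }
    // split/join performs a literal replace-all with no regex escaping concerns.
    redacted = redacted.split(value).join("[SECRET_REF:" + name + "]");
  });
  return redacted;
}
```

This only catches exact matches. Encoded or split secrets in tool output still depend on the later pipeline layers described in H1.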

U6 — OpenClaw Session Persistence Across Restarts

Unknown: When OpenClaw restarts (e.g., after a Docker container restart), do agent sessions resume cleanly? Do in-flight tool calls get replayed or lost?

Investigation timeline: Week 4 (integration testing)

Impact: If sessions don't survive restarts, users may lose conversation context after Safety Wrapper or OpenClaw crashes.


6. Security-Specific Risks

Attack Surface Analysis

| Attack Vector | Component | Severity | Mitigation |
|---|---|---|---|
| Prompt injection via tool output | Safety Wrapper → OpenClaw | HIGH | Redact secrets from tool output; validate tool responses; OpenClaw's native context safety |
| Shell command injection | Safety Wrapper shell executor | HIGH | Allowlist-based execution; no shell metacharacters; execFile (not exec); path validation |
| Path traversal in file operations | Safety Wrapper file executor | HIGH | Jail to allowed directories; reject .., symlinks outside jail; canonical path resolution |
| SSRF via browser tool | OpenClaw browser → internal network | MEDIUM | SSRF protection lists (OpenClaw native); restrict to localhost ports |
| Credential exfiltration via encoding | Secrets Proxy | HIGH | 4-layer pipeline including entropy filter; base64/URL-decode before scanning |
| Approval bypass via race condition | Safety Wrapper approval queue | MEDIUM | Atomic approval state transitions; database locking on approval check |
| Hub API key theft | Tenant server → Hub | MEDIUM | API keys stored encrypted; transmitted via TLS; rotatable |
| Cross-tenant data leakage | Hub database | LOW | One customer = one VPS; Hub enforces tenant isolation via API key scoping |
| DoS via LLM token exhaustion | Safety Wrapper token metering | MEDIUM | Per-hour rate limits; automatic pause at pool exhaustion; alert at 80/90/100% |
| WebSocket hijacking | OpenClaw WebSocket | LOW | CVE-2026-25253 patched; OpenClaw bound to loopback |
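The path traversal mitigation (jail plus canonical path resolution) can be illustrated with a lexical check on POSIX paths. This sketch resolves `..` segments and then tests containment rather than rejecting `..` outright, and it does not follow symlinks. A real file executor would also need to resolve symlinks on disk (for example with Node's `fs.realpath`) before the comparison. The function names are hypothetical.

```typescript
// Lexically canonicalize a POSIX path: drop "." and empty segments, apply "..".
function normalizePosix(p: string): string {
  const out: string[] = [];
  const parts = p.split("/");
  for (let i = 0; i < parts.length; i++) {
    const part = parts[i];
    if (part === undefined || part === "" || part === ".") {
      continue;
    }
    if (part === "..") {
      out.pop(); // popping an empty stack keeps the path pinned at "/"
      continue;
    }
    out.push(part);
  }
  return "/" + out.join("/");
}

// True when the requested path stays inside jailRoot after canonicalization.
function isInsideJail(jailRoot: string, requested: string): boolean {
  const root = normalizePosix(jailRoot);
  // Relative requests resolve against the jail root; absolute ones stand alone.
  const target = requested.charAt(0) === "/"
    ? normalizePosix(requested)
    : normalizePosix(root + "/" + requested);
  if (root === "/") {
    return true;
  }
  // Compare on a segment boundary so "/srv/database" is not inside "/srv/data".
  return target === root || target.indexOf(root + "/") === 0;
}
```

The trailing-slash comparison matters: a plain prefix test would accept sibling directories whose names merely start with the jail's name.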

Security Invariants (Must Hold Under All Conditions)

| Invariant | Enforcement | Verification |
|---|---|---|
| Secrets never reach LLM providers | Secrets Proxy transport-layer redaction | P0 test suite + adversarial audit |
| AI never sees raw credential values | SECRET_REF placeholders; injection at execution time | Integration tests |
| Destructive operations require human approval (at levels 1-2) | Safety Wrapper autonomy engine | P0 test suite |
| External comms always gated by default | External Comms Gate (independent of autonomy) | Configuration verification |
| Audit trail captures every tool call | Append-only SQLite audit log | Log completeness check |
| Container runs as non-root | Docker security configuration | Provisioner verification |
| OpenClaw not accessible from external network | Loopback binding | Network scan |
| Elevated Mode permanently disabled | OpenClaw configuration | Config verification |

7. Business & Operational Risks

B1 — Market Timing

Attribute Value
Risk AI agent platforms are proliferating rapidly. Delay risks competitor capturing the SMB privacy-first niche.
Impact 3 (Moderate)
Likelihood 3 (Possible)
Mitigation Focus on the privacy moat — competitors would need to redesign their architecture to match the secrets-never-leave guarantee. Ship fast on the core differentiator.

B2 — Unit Economics at Scale

Attribute Value
Risk Token costs, LLM API prices, and VPS costs may shift. The current pricing model (€29-109/mo) assumes specific cost structures.
Impact 3 (Moderate)
Likelihood 3 (Possible — LLM prices are dropping, but usage patterns are unpredictable)
Mitigation Token pool sizes are configurable in Hub settings. Markup thresholds are configurable. Pricing tiers can be adjusted without code changes. Monitor unit economics from founding member data.

B3 — Customer Support at Scale

Attribute Value
Risk Each customer has their own VPS with unique configuration. Debugging customer issues is more complex than multi-tenant SaaS.
Impact 3 (Moderate)
Likelihood 4 (Likely — one-VPS-per-customer means one-off issues)
Mitigation Hub monitoring dashboard. Tenant health heartbeats. Centralized logging via Hub. Remote diagnostic commands via Hub API. Consider adding remote shell access for LetsBe staff (gated by customer approval).

B4 — Regulatory Risk (EU AI Act)

Attribute Value
Risk EU AI Act may impose requirements on AI agents acting autonomously on behalf of businesses.
Impact 2 (Minor — likely "limited risk" category for business tools)
Likelihood 2 (Unlikely to affect v1 launch)
Mitigation Audit trail captures every AI decision. Human-in-the-loop via approval system. Transparency via agent activity feed. Monitor EU AI Act implementation timeline.

8. Dependency Risks

External Dependencies

| Dependency | Version | Risk | Mitigation |
|---|---|---|---|
| OpenClaw | v2026.2.6-3 | Breaking changes; hook gaps | Pin release; compatibility tests; separate-process architecture |
| OpenRouter | API v1 | Rate limits; outages; pricing changes | Failover chains; multiple API keys; direct provider fallback |
| Stripe | v17.7.0 | API deprecations; Billing Meters maturity | Use stable APIs; test mode validation; fallback to usage records |
| Expo | SDK 52 | Breaking changes in SDK upgrades | Pin SDK; upgrade post-launch |
| Netcup | SCP API | OAuth2 API changes; rate limits | Existing integration proven; Hetzner as overflow provider |
| PostgreSQL | 16 | Minimal risk (mature and stable) | Standard backup strategy |
| Node.js | 22 | LTS until April 2027 | Aligned with OpenClaw's runtime requirement |
| better-sqlite3 | Latest | Native compilation on different platforms | Pin version; test in CI Docker |
| Prisma | 7.0.0 | Migration compatibility; query performance | Well-established ORM; large community |

Internal Dependencies

| Dependency | Owner | Risk | Mitigation |
|---|---|---|---|
| Hub (existing codebase) | Hub Backend Engineer | 80+ endpoints to maintain alongside new development | Additive-only changes; no breaking changes to existing endpoints |
| Provisioner (Bash scripts) | DevOps Engineer | Zero tests; complex SSH operations | Integration tests; manual verification; incremental changes |
| Gitea (self-hosted) | DevOps Engineer | Single point of failure for source control and CI | Regular backups; consider mirror to external Git provider |

9. Risk Monitoring Plan

Weekly Risk Review (Every Friday)

| Activity | Owner | Output |
|---|---|---|
| Review risk register | Project Lead | Updated risk scores; new risks added |
| Check milestone progress vs. plan | Project Lead | Buffer consumption tracked |
| Security invariant spot-check | Safety Wrapper Lead | Random adversarial test run |
| Dependency version check | DevOps | Alert on new OpenClaw releases or CVEs |

Automated Monitoring (Post-Deployment)

| Monitor | Frequency | Alert Threshold |
|---|---|---|
| Secrets redaction miss rate | Per-request | Any non-zero rate |
| Safety Wrapper uptime | Every 60s | Downtime > 30s |
| Hub ↔ SW heartbeat | Every 60s | 2 missed heartbeats |
| Token usage anomaly | Hourly | >3× average hourly usage |
| Provisioner success rate | Per-provisioning | Any failure |
| LLM provider latency | Per-request | p95 > 30s |
| Memory usage per component | Every 5min | >90% of budget |
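As an illustration of the heartbeat threshold, the sketch below counts full 60-second intervals that have elapsed since the last heartbeat and alerts at two. Treating "missed" as "full intervals elapsed" with no jitter allowance is an assumption, and the function names are hypothetical.

```typescript
const HEARTBEAT_INTERVAL_MS = 60000;
const MISSED_HEARTBEAT_LIMIT = 2;

// Number of full heartbeat intervals that have passed since the last one arrived.
function missedHeartbeats(lastSeenMs: number, nowMs: number, intervalMs: number): number {
  if (nowMs <= lastSeenMs) {
    return 0;
  }
  return Math.floor((nowMs - lastSeenMs) / intervalMs);
}

// Alert once the configured number of intervals has elapsed without a heartbeat.
function shouldAlertOnHeartbeat(lastSeenMs: number, nowMs: number): boolean {
  return missedHeartbeats(lastSeenMs, nowMs, HEARTBEAT_INTERVAL_MS) >= MISSED_HEARTBEAT_LIMIT;
}
```

In practice a small grace period on top of the interval would reduce false alerts caused by network jitter.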

Risk Escalation Matrix

| Risk Score Change | Action |
|---|---|
| Score increases by ≥5 | Escalate to project lead; discuss in weekly review |
| New HIGH risk identified | Immediate team notification; mitigation plan within 24h |
| Milestone at risk (>3 days behind) | Scope cut discussion; buffer reallocation |
| Security invariant violation | STOP DEPLOYMENT. All hands on fix. No exceptions. |

End of Document — 06 Risk Assessment