LetsBeBiz-Redesign/docs/architecture-proposal/gpt/06-risk-assessment.md

# 06. Risk Assessment

## 1. Risk Scoring Method

- Probability: 1 (low) to 5 (high)
- Impact: 1 (low) to 5 (high)
- Risk score = Probability × Impact (range: 1 to 25)
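
The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the proposal: the `Risk` class and `rank` helper are hypothetical names, and the range check simply enforces the 1 to 5 scale stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """Hypothetical record for one row of the risk table."""
    risk_id: str
    probability: int  # 1 (low) to 5 (high)
    impact: int       # 1 (low) to 5 (high)

    def __post_init__(self):
        # Reject values outside the 1-5 scale defined in the scoring method.
        for name in ("probability", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        # Risk score = Probability x Impact
        return self.probability * self.impact


def rank(risks):
    """Return risks ordered from highest to lowest score (stable for ties)."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


risks = [Risk("R1", 3, 5), Risk("R4", 4, 4), Risk("R6", 3, 3)]
ranked_ids = [r.risk_id for r in rank(risks)]
```

Sorting by score gives a quick view of which risks to discuss first in a review; ties keep their original table order.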

## 2. Top Risks

| ID | Risk | Prob | Impact | Score | Mitigation | Contingency Trigger |
|---|---|---:|---:|---:|---|---|
| R1 | Secret exfiltration via unredacted outbound payload | 3 | 5 | 15 | Multi-layer redaction tests, egress deny-by-default policy, seeded canary secrets | Any unredacted canary secret seen outside tenant |
| R2 | Command gating bypass due to misclassification | 3 | 5 | 15 | Deterministic policy engine, contract tests per class, human-readable reason logging | Red/Critical executes without approval in tests |
| R3 | OpenClaw upstream changes break plugin behavior | 3 | 4 | 12 | Pin stable tags, adapter compatibility suite, staged upgrade canaries | Hook contract test fails against new tag |
| R4 | Provisioner regressions reduce provisioning success | 4 | 4 | 16 | Idempotent checkpoints, replay tests, synthetic VPS CI | First-attempt success < 90% |
| R5 | Billing usage mismatch vs provider costs | 3 | 4 | 12 | Dual-entry usage checks, nightly reconciliation jobs, alert thresholds | >1% sustained variance for 24h |
| R6 | Mobile approval notifications delayed or dropped | 3 | 3 | 9 | Push retries + in-app queue fallback + email fallback | p95 approval notify > 30s |
| R7 | Performance overhead exceeds Lite-tier budget | 2 | 4 | 8 | Memory profiling budget gates, disable non-essential plugins, tune browser lifecycle | LetsBe overhead > 800MB sustained |
| R8 | Tool API churn breaks adapters | 4 | 3 | 12 | Adapter integration tests against pinned versions, fallback to browser playbook | Adapter failure rate > 5% |
| R9 | Security debt from AI-generated code quality | 4 | 4 | 16 | Mandatory senior review on security modules, lint rules, banned patterns checks | Critical static-analysis finding unresolved >48h |
| R10 | Legal/compliance drift (license/source disclosure pages) | 2 | 4 | 8 | Automated license manifest publishing, pre-release legal checklist | Missing OSS disclosure page at RC freeze |
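
As an illustration of how a contingency trigger could be made machine-checkable, the sketch below evaluates the R5 trigger (">1% sustained variance for 24h"). It assumes hourly samples of billed usage versus provider cost and reads "sustained" as "every sample in the window exceeds the threshold"; both the sampling interval and that interpretation are assumptions, and the function names are hypothetical.

```python
def relative_variance(billed: float, provider: float) -> float:
    """Absolute difference between billed usage and provider cost, relative to provider cost."""
    if provider == 0:
        return 0.0 if billed == 0 else float("inf")
    return abs(billed - provider) / provider


def r5_trigger_fired(samples, threshold=0.01, window=24):
    """Return True if every one of the last `window` samples exceeds `threshold`.

    samples: list of (billed, provider) pairs, oldest first, one per hour (assumed).
    """
    if len(samples) < window:
        # Not enough history to call the variance "sustained".
        return False
    recent = samples[-window:]
    return all(relative_variance(b, p) > threshold for b, p in recent)


sustained = [(102.0, 100.0)] * 24                        # 2% off for 24 hours
recovered = [(102.0, 100.0)] * 23 + [(100.0, 100.0)]     # last hour back in line
```

Requiring every sample to exceed the threshold avoids firing on a single noisy hour; a stricter or looser rule (for example, a rolling average) may suit the real reconciliation job better.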

## 3. Risk Register By Domain

## 3.1 Security Risks

- Redaction misses non-standard secret formats.
- External comms gate incorrectly tied to autonomy level.
- Local logs/transcripts persist raw secret material.
- Local execution adapters allow shell metacharacter bypass.
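
The last item, shell metacharacter bypass, is commonly addressed by never routing adapter commands through a shell. The sketch below shows that general technique with Python's standard `subprocess` module; the allowlist contents and the function names `build_argv` and `run_adapter_command` are hypothetical and do not describe the actual adapter implementation.

```python
import subprocess

# Hypothetical allowlist of binaries an adapter may invoke.
ALLOWED_BINARIES = {"echo", "ls"}


def build_argv(binary: str, args: list[str]) -> list[str]:
    """Validate the binary against the allowlist and build an argument vector."""
    if binary not in ALLOWED_BINARIES:
        raise PermissionError(f"binary not allowed: {binary!r}")
    for arg in args:
        if "\x00" in arg:
            raise ValueError("NUL byte in argument")
    return [binary, *args]


def run_adapter_command(binary: str, args: list[str], timeout: float = 10.0):
    """Run the command without a shell, so ';', '|' and '$()' stay literal text."""
    argv = build_argv(binary, args)
    return subprocess.run(
        argv,
        shell=False,          # no shell parsing of the arguments
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )


argv = build_argv("echo", ["hello; rm -rf /"])
```

Because the arguments are passed as a list, `"hello; rm -rf /"` reaches `echo` as one literal argument. This does not cover option injection (arguments beginning with `-`), which would need separate validation per tool.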

## 3.2 Delivery Risks

- Too much simultaneous change across Hub + provisioner + tenant runtime.
- Underestimated migration effort from deprecated orchestrator/sysadmin behaviors.
- Browser automation migration complexity for setup scripts.

## 3.3 Operational Risks

- Dual-region Hub operations increase DB and deploy complexity.
- Insufficient on-call runbooks for approval outages and provisioning failures.
- Canary rollout without automated rollback criteria.

## 4. Mitigation Program

## 4.1 Pre-Launch Controls

- Security invariants are encoded as executable tests (not checklist-only).
- Every release candidate must pass redaction canary probes.
- Dry-run provisioning must pass on both Netcup and Hetzner targets.
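
A redaction canary probe of the kind mentioned above might look like the following sketch. The canary values, the single regex standing in for the real redaction rules, and the function names are all hypothetical; the point is only to show the shape of the check: seed known secrets, run the payload through redaction, and fail if any canary survives.

```python
import re

# Hypothetical seeded canary secrets; real values would be generated per tenant.
CANARY_SECRETS = ["sk-canary1f9a7c", "pw:canary-88xz"]

# Stand-in for the real redaction rules: only matches "sk-..." style keys.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]+")


def redact(payload: str) -> str:
    """Replace anything matching the stand-in pattern with a placeholder."""
    return SECRET_PATTERN.sub("[REDACTED]", payload)


def leaked_canaries(outbound: str) -> list[str]:
    """Return every canary secret still present in the outbound payload."""
    return [c for c in CANARY_SECRETS if c in outbound]


def probe_passes(raw_payload: str) -> bool:
    """A probe passes only if no canary survives redaction."""
    return not leaked_canaries(redact(raw_payload))


standard = "Authorization: sk-canary1f9a7c"
non_standard = "login with pw:canary-88xz"
```

With this deliberately narrow pattern, the probe passes for the `sk-` canary and fails for the non-standard one, which is the kind of gap the probes are meant to surface before release.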

## 4.2 Runtime Controls

- Alert on heartbeat freshness degradation.
- Alert on approval queue lag and expiration spikes.
- Alert on sudden drop in cache-read ratio (cost anomaly indicator).
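
Two of these alerts can be expressed as simple threshold checks. The sketch below is illustrative only: the 120-second heartbeat limit and the 50% drop threshold are placeholder assumptions, not values from this proposal, and the function names are hypothetical.

```python
def heartbeat_stale(last_seen_s: float, now_s: float, max_age_s: float = 120.0) -> bool:
    """Alert when the most recent heartbeat is older than max_age_s seconds."""
    return now_s - last_seen_s > max_age_s


def cache_ratio_dropped(history, current: float, max_drop: float = 0.5) -> bool:
    """Alert when the current cache-read ratio falls well below the recent baseline.

    history: recent cache-read ratios in the range 0..1 (the baseline is their mean).
    max_drop: fraction of the baseline that may be lost before alerting.
    """
    if not history:
        # No baseline yet, so there is nothing to compare against.
        return False
    baseline = sum(history) / len(history)
    return current < baseline * (1 - max_drop)
```

For example, with a baseline of 0.8 and `max_drop=0.5`, a current ratio of 0.3 raises the alert while 0.5 does not. Comparing against a rolling baseline rather than a fixed value keeps the alert meaningful for tenants with different normal cache behavior.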

## 4.3 Governance Controls

- Security design review required for changes in Safety Wrapper, redaction, or secrets flows.
- Migration freeze on deprecated paths after Phase 0.
- Weekly risk review with updated probability/impact re-scoring.

## 5. Launch Go/No-Go Risk Gates

Do not launch if any of the following conditions is true:

- unresolved severity-1 security defect
- redaction tests fail for any supported secret class
- command gating matrix not fully passing
- usage reconciliation error >1% over 72h canary
- provisioning first-attempt success below 85% in final week
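
The gates above lend themselves to an automated check that returns the list of blocking conditions. The following is a minimal sketch under stated assumptions: the `LaunchMetrics` fields are hypothetical names for inputs that would come from the test and monitoring systems, and rates are expressed as fractions (0.01 for 1%, 0.85 for 85%).

```python
from dataclasses import dataclass


@dataclass
class LaunchMetrics:
    """Hypothetical snapshot of the inputs to the go/no-go decision."""
    open_sev1_security_defects: int
    redaction_test_failures: int           # failures across all supported secret classes
    gating_matrix_failures: int            # failing cells in the command gating matrix
    reconciliation_error_72h: float        # fraction, over the 72h canary
    first_attempt_success_final_week: float  # fraction, over the final week


def blocking_conditions(m: LaunchMetrics) -> list[str]:
    """Return a human-readable reason for every gate that blocks launch."""
    blockers = []
    if m.open_sev1_security_defects > 0:
        blockers.append("unresolved severity-1 security defect")
    if m.redaction_test_failures > 0:
        blockers.append("redaction tests failing")
    if m.gating_matrix_failures > 0:
        blockers.append("command gating matrix not fully passing")
    if m.reconciliation_error_72h > 0.01:
        blockers.append("usage reconciliation error above 1%")
    if m.first_attempt_success_final_week < 0.85:
        blockers.append("provisioning first-attempt success below 85%")
    return blockers


healthy = LaunchMetrics(0, 0, 0, 0.004, 0.93)
risky = LaunchMetrics(0, 2, 0, 0.02, 0.80)
```

An empty list means "go"; any entry means "no-go". Returning the reasons rather than a single boolean makes the decision auditable in a release log.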