06. Risk Assessment

1. Risk Scoring Method

  • Probability: 1 (low) to 5 (high)
  • Impact: 1 (low) to 5 (high)
  • Risk score = Probability x Impact
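The scoring rule above can be sketched in a few lines. This is an illustrative model only: the `Risk` class, its field names, and the `rank` helper are assumptions made for this example and are not part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """Hypothetical record for one register entry."""
    risk_id: str
    probability: int  # 1 (low) to 5 (high)
    impact: int       # 1 (low) to 5 (high)

    def __post_init__(self) -> None:
        # Reject values outside the 1..5 scale defined by the scoring method.
        for name in ("probability", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        # Risk score = Probability x Impact
        return self.probability * self.impact


def rank(risks):
    """Return risks sorted from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)
```

For example, `Risk("R1", 3, 5).score` evaluates to 15, which matches the R1 row in the Top Risks table.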

2. Top Risks

| ID | Risk | Prob | Impact | Score | Mitigation | Contingency Trigger |
|---|---|---|---|---|---|---|
| R1 | Secret exfiltration via unredacted outbound payload | 3 | 5 | 15 | Multi-layer redaction tests, egress deny-by-default policy, seeded canary secrets | Any unredacted canary secret seen outside tenant |
| R2 | Command gating bypass due to misclassification | 3 | 5 | 15 | Deterministic policy engine, contract tests per class, human-readable reason logging | Red/Critical executes without approval in tests |
| R3 | OpenClaw upstream changes break plugin behavior | 3 | 4 | 12 | Pin stable tags, adapter compatibility suite, staged upgrade canaries | Hook contract test fails against new tag |
| R4 | Provisioner regressions reduce provisioning success | 4 | 4 | 16 | Idempotent checkpoints, replay tests, synthetic VPS CI | First-attempt success < 90% |
| R5 | Billing usage mismatch vs provider costs | 3 | 4 | 12 | Dual-entry usage checks, nightly reconciliation jobs, alert thresholds | >1% sustained variance for 24h |
| R6 | Mobile approval notifications delayed or dropped | 3 | 3 | 9 | Push retries, in-app queue fallback, email fallback | p95 approval notify > 30s |
| R7 | Performance overhead exceeds Lite-tier budget | 2 | 4 | 8 | Memory profiling budget gates, disable non-essential plugins, tune browser lifecycle | LetsBe overhead > 800MB sustained |
| R8 | Tool API churn breaks adapters | 4 | 3 | 12 | Adapter integration tests against pinned versions, fallback to browser playbook | Adapter failure rate > 5% |
| R9 | Security debt from AI-generated code quality | 4 | 4 | 16 | Mandatory senior review on security modules, lint rules, banned-pattern checks | Critical static-analysis finding unresolved >48h |
| R10 | Legal/compliance drift (license/source disclosure pages) | 2 | 4 | 8 | Automated license manifest publishing, pre-release legal checklist | Missing OSS disclosure page at RC freeze |
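Several contingency triggers in the table are numeric thresholds that could be evaluated automatically. As one hedged illustration, the R6 trigger (p95 approval notify > 30s) might be checked as below. The function names and the nearest-rank percentile method are assumptions for this sketch; the proposal does not specify how p95 is computed.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty sequence of numbers."""
    if not samples:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def r6_triggered(latencies_s, threshold_s=30.0):
    """True when p95 approval-notification latency exceeds the threshold."""
    return p95(latencies_s) > threshold_s
```

With 20 samples, the nearest-rank p95 is the 19th smallest value, so a single slow notification out of 20 does not fire the trigger, but two do.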

3. Risk Register By Domain

3.1 Security Risks

  • Redaction misses non-standard secret formats.
  • External comms gate incorrectly tied to autonomy level.
  • Local logs/transcripts persist raw secret material.
  • Local execution adapters allow shell metacharacter bypass.

3.2 Delivery Risks

  • Too much simultaneous change across Hub + provisioner + tenant runtime.
  • Underestimated migration effort from deprecated orchestrator/sysadmin behaviors.
  • Browser automation migration complexity for setup scripts.

3.3 Operational Risks

  • Dual-region Hub operations increase DB and deploy complexity.
  • Insufficient on-call runbooks for approval outages and provisioning failures.
  • Canary rollout without automated rollback criteria.

4. Mitigation Program

4.1 Pre-Launch Controls

  • Security invariants are encoded as executable tests (not checklist-only).
  • Every release candidate must pass redaction canary probes.
  • Dry-run provisioning must pass on both Netcup and Hetzner targets.
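A redaction canary probe written as an executable test might look like the following minimal sketch. Everything named here is hypothetical: the canary values, the `CANARY-` format, and the stand-in `redact` function are placeholders for whatever the real redaction layer provides.

```python
import re

# Hypothetical canary values seeded into a test tenant. Real canaries would be
# generated per environment; these literals exist only for the example.
CANARY_SECRETS = ["CANARY-7f3a9c2e", "CANARY-b41d08aa"]

# Illustrative pattern only: it matches the made-up canary format used above.
_SECRET_PATTERN = re.compile(r"CANARY-[0-9a-f]{8}")


def redact(payload: str) -> str:
    """Stand-in redactor: replace anything matching the secret pattern."""
    return _SECRET_PATTERN.sub("[REDACTED]", payload)


def leaked_canaries(outbound: str, canaries=CANARY_SECRETS) -> list[str]:
    """Return every canary that still appears verbatim in an outbound payload."""
    return [c for c in canaries if c in outbound]


def test_outbound_payload_contains_no_canary() -> None:
    raw = "token=CANARY-7f3a9c2e user=alice backup=CANARY-b41d08aa"
    # The invariant: after redaction, no seeded canary may survive.
    assert leaked_canaries(redact(raw)) == []
```

The point of the design is that the invariant fails the build when broken, rather than relying on a reviewer ticking a checklist item.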

4.2 Runtime Controls

  • Alert on heartbeat freshness degradation.
  • Alert on approval queue lag and expiration spikes.
  • Alert on sudden drop in cache-read ratio (cost anomaly indicator).
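Two of these alerts can be expressed as simple predicates, sketched below. The five-minute staleness window and the 30% relative-drop threshold are illustrative assumptions, not values from the proposal, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def heartbeat_is_stale(last_seen: datetime, now: datetime,
                       max_age: timedelta = timedelta(minutes=5)) -> bool:
    """True when the most recent heartbeat is older than max_age (assumed 5 min)."""
    return now - last_seen > max_age


def cache_ratio_dropped(baseline: float, current: float,
                        max_relative_drop: float = 0.3) -> bool:
    """True when the cache-read ratio fell by more than max_relative_drop
    relative to its baseline (assumed 30%)."""
    if baseline <= 0:
        # No meaningful baseline yet, so there is nothing to compare against.
        return False
    return (baseline - current) / baseline > max_relative_drop
```

A relative drop is used here rather than an absolute one so that the same threshold works for tenants with different baseline cache-read ratios; that choice is also an assumption of this sketch.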

4.3 Governance Controls

  • Security design review required for changes in Safety Wrapper, redaction, or secrets flows.
  • Migration freeze on deprecated paths after Phase 0.
  • Weekly risk review with updated probability/impact re-scoring.

5. Launch Go/No-Go Risk Gates

Do not launch if any of the following conditions is true:

  • unresolved severity-1 security defect
  • redaction tests fail for any supported secret class
  • command gating matrix not fully passing
  • usage reconciliation error >1% over 72h canary
  • provisioning first-attempt success below 85% in final week
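The gates above could be evaluated mechanically before a launch decision. The sketch below is one possible encoding; `LaunchMetrics` and its field names are hypothetical, and only the 1% and 85% thresholds come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchMetrics:
    """Hypothetical snapshot of the inputs to the go/no-go decision."""
    open_sev1_security_defects: int
    failing_redaction_classes: int           # supported secret classes with failing tests
    failing_gating_cases: int                # failing cells in the command gating matrix
    usage_reconciliation_error: float        # fraction, over the 72h canary
    first_attempt_provision_success: float   # fraction, over the final week


def blocking_reasons(m: LaunchMetrics) -> list[str]:
    """Return one message per failed gate; an empty list means go."""
    reasons = []
    if m.open_sev1_security_defects > 0:
        reasons.append("unresolved severity-1 security defect")
    if m.failing_redaction_classes > 0:
        reasons.append("redaction tests failing for a supported secret class")
    if m.failing_gating_cases > 0:
        reasons.append("command gating matrix not fully passing")
    if m.usage_reconciliation_error > 0.01:
        reasons.append("usage reconciliation error above 1%")
    if m.first_attempt_provision_success < 0.85:
        reasons.append("provisioning first-attempt success below 85%")
    return reasons


def can_launch(m: LaunchMetrics) -> bool:
    return not blocking_reasons(m)
```

Returning the list of reasons, rather than a bare boolean, keeps the no-go decision explainable in the weekly risk review.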