# 07. Testing Strategy Proposal

## 1. Testing Principles

- Security-critical behavior is verified with invariant tests, not only unit coverage.
- Contract-first testing between Hub, Safety Wrapper, Mobile, Website, and Provisioner.
- Fast feedback in CI, deep verification in staging and nightly runs.
- AI-generated code receives stricter review and mutation testing on critical paths.

## 2. Test Pyramid By Component

| Layer | Hub | Safety Wrapper | Provisioner | Mobile/Website |
|---|---|---|---|---|
| Unit | services, validators, policy logic | classifier, redactor, secret resolver, adapters | parser/utils/template render | UI logic, state stores, hooks |
| Integration | Prisma + API handlers + auth | plugin hooks vs OpenClaw test harness | SSH runner against disposable VM | API integration against mock Hub |
| End-to-end | full order/provision/approval/billing flow | tenant command execution path | full 10-step provisioning with checkpoints | chat/approval/onboarding user journeys |
| Security | authz, rate-limit, session hardening | secret exfil tests, gating bypass tests | credential leakage scans | token storage, deep link auth |
| Performance | API p95 and DB load | per-turn latency overhead, memory usage | provisioning duration and retry cost | startup latency, push receipt latency |

## 3. Mandatory Security Invariant Suite

The following automated tests are required before each release:

1. **Secrets Never Leave Server Test**
   - Seed known secrets in vault and files.
   - Trigger prompts/tool outputs containing these values.
   - Assert outbound payloads and persisted logs contain only placeholders.
2. **Command Classification Matrix Test**
   - Execute fixtures for each command class (Green/Yellow/Yellow+External/Red/Critical).
   - Validate behavior across autonomy levels 1-3.
3. **External Comms Independence Test**
   - At autonomy level 3, external action remains blocked when comms gate locked.
   - Unlock only targeted tool; validate others remain blocked.
4. **Approval Expiry Test**
   - Approval request expires at 24h.
   - Late approval cannot be replayed.
5. **SECRET_REF Boundary Test**
   - Secrets cannot be requested directly by raw name/value.
   - Only valid references in allowlisted tool operations resolve.

## 4. Provisioning Test Strategy

### 4.1 Fast Checks

- Shellcheck + static checks for bash scripts.
- Template substitution tests (all placeholders resolved, none leaked).
- Stack inventory policy tests (no banned tools like n8n).

### 4.2 Disposable VPS E2E

Nightly automated runs:

- create disposable Debian VPS
- run full provisioning
- run smoke checks on selected tool endpoints
- verify tenant registration + heartbeat + approvals
- tear down VPS and collect artifacts

## 5. Contract Testing

- OpenAPI specs for Hub APIs and tenant APIs.
- Consumer-driven contract tests for:
  - Safety Wrapper against Hub tenant endpoints
  - Mobile app against customer endpoints
  - Website onboarding against public endpoints
- Contract break blocks merge.

## 6. Data And Billing Validation

- Synthetic token event generator with known totals.
- Reconcile tenant usage buckets against Hub aggregated totals.
- Reconcile Hub totals against Stripe meter/invoice preview.
- Fail build if variance exceeds threshold.

## 7. Quality Gates (CI)

- Unit + integration tests must pass.
- Security invariants must pass.
- Critical package diff review for Safety Wrapper and Provisioner.
- Minimum thresholds:
  - security-critical modules: >=90% branch coverage
  - overall backend: >=75% branch coverage
- Mutation testing on classifier and redactor modules.

## 8. Human Review Workflow (Anti-AI-Slop)

Required for security-critical PRs:

- one reviewer validates threat model assumptions
- one reviewer validates test completeness and failure cases
- checklist includes: error paths, rollback behavior, idempotency, logging hygiene

No direct auto-merge for changes in:

- redaction engine
- command classifier
- secret storage/resolution
- provisioning credential handling

## 9. Launch Validation Checklist

Before founding-member launch:

- 7-day staging soak with no sev-1/2 defects
- two successful rollback drills (control plane and tenant plane)
- production canary with live approval + billing reconciliation
- first-hour templates executed successfully on staging tenants
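
As an illustration of the "Secrets Never Leave Server" invariant from Section 3, the seed / trigger / assert sequence could be sketched roughly as follows. This is a minimal sketch, not the project's redactor: `SEEDED_SECRETS`, `redact`, `find_leaks`, `run_invariant`, and the `[SECRET:NAME]` placeholder format are all hypothetical names chosen for the example, and a real suite would capture actual outbound payloads and persisted logs rather than in-memory strings.

```python
import json

# Hypothetical seeded secrets (name -> value). In a real run these would be
# written to the vault and to files on the tenant before the test starts.
SEEDED_SECRETS = {
    "DB_PASSWORD": "s3cr3t-db-pass-91f2",
    "STRIPE_KEY": "sk_test_seeded_0a1b2c3d",
}


def redact(text: str, secrets: dict[str, str]) -> str:
    """Stand-in redactor (assumption): swap each seeded value for a placeholder."""
    # Replace longer values first so a secret containing another is fully masked.
    for name, value in sorted(secrets.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(value, f"[SECRET:{name}]")
    return text


def find_leaks(captured: list[str], secrets: dict[str, str]) -> list[tuple[int, str]]:
    """Return (index, secret name) for every captured string holding a raw value."""
    return [
        (i, name)
        for i, blob in enumerate(captured)
        for name, value in secrets.items()
        if value in blob
    ]


def run_invariant() -> list[tuple[int, str]]:
    """Trigger a tool output containing seeded values, then scan what was captured."""
    tool_output = (
        "connect with password s3cr3t-db-pass-91f2 "
        "and key sk_test_seeded_0a1b2c3d"
    )
    # Both the outbound payload and the persisted log line pass through the redactor.
    outbound_payload = json.dumps({"content": redact(tool_output, SEEDED_SECRETS)})
    log_line = redact(f"tool returned: {tool_output}", SEEDED_SECRETS)
    return find_leaks([outbound_payload, log_line], SEEDED_SECRETS)


if __name__ == "__main__":
    leaks = run_invariant()
    assert leaks == [], f"raw secret values escaped: {leaks}"
```

Scanning for the exact seeded values, rather than for secret-looking patterns, keeps the assertion deterministic: the test fails only when a value known to be secret appears in captured output.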
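
The reconciliation chain in Section 6 (generator totals, tenant usage buckets, Hub aggregates, Stripe preview) can be expressed as a series of pairwise variance checks. The sketch below is one possible shape under stated assumptions: the function names are hypothetical, the 0.1% default threshold is a placeholder because the proposal does not fix a value, and the inputs are plain integers standing in for totals that would be fetched from each system.

```python
from decimal import Decimal


def relative_variance(expected: int, actual: int) -> Decimal:
    """Absolute difference as a fraction of the expected total."""
    if expected == 0:
        # No expected usage: any actual usage counts as unbounded variance.
        return Decimal(0) if actual == 0 else Decimal("Infinity")
    return abs(Decimal(actual) - Decimal(expected)) / Decimal(expected)


def reconcile(
    known_total: int,
    tenant_buckets: list[int],
    hub_total: int,
    stripe_total: int,
    threshold: Decimal = Decimal("0.001"),  # assumed value, not from the proposal
) -> list[str]:
    """Compare each stage with the next; return one message per failing stage."""
    bucket_total = sum(tenant_buckets)
    stages = [
        ("generator -> tenant buckets", known_total, bucket_total),
        ("tenant buckets -> Hub", bucket_total, hub_total),
        ("Hub -> Stripe", hub_total, stripe_total),
    ]
    failures = []
    for label, expected, actual in stages:
        if relative_variance(expected, actual) > threshold:
            failures.append(f"{label}: expected {expected}, got {actual}")
    return failures


if __name__ == "__main__":
    # A CI step could fail the build whenever the list is non-empty.
    problems = reconcile(1000, [400, 600], 1000, 1000)
    assert problems == [], problems
```

Checking each hop separately, instead of only comparing the generator total with Stripe, shows which boundary introduced the drift when the build fails.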