07. Testing Strategy Proposal
1. Testing Principles
- Security-critical behavior is verified with invariant tests, not only unit coverage.
- Contract-first testing between Hub, Safety Wrapper, Mobile, Website, and Provisioner.
- Fast feedback in CI, deep verification in staging and nightly runs.
- AI-generated code receives stricter review and mutation testing on critical paths.
2. Test Pyramid By Component
| Layer | Hub | Safety Wrapper | Provisioner | Mobile/Website |
|---|---|---|---|---|
| Unit | services, validators, policy logic | classifier, redactor, secret resolver, adapters | parser/utils/template render | UI logic, state stores, hooks |
| Integration | Prisma + API handlers + auth | plugin hooks vs OpenClaw test harness | SSH runner against disposable VM | API integration against mock Hub |
| End-to-end | full order/provision/approval/billing flow | tenant command execution path | full 10-step provisioning with checkpoints | chat/approval/onboarding user journeys |
| Security | authz, rate limiting, session hardening | secret exfiltration tests, gating bypass tests | credential leakage scans | token storage, deep-link auth |
| Performance | API p95 latency and DB load | per-turn latency overhead, memory usage | provisioning duration and retry cost | startup latency, push receipt latency |
3. Mandatory Security Invariant Suite
The following automated tests are required before each release:
- Secrets Never Leave Server Test
  - Seed known secrets in the vault and in files.
  - Trigger prompts and tool outputs that contain these values.
  - Assert that outbound payloads and persisted logs contain only placeholders.
- Command Classification Matrix Test
  - Execute fixtures for each command class (Green/Yellow/Yellow+External/Red/Critical).
  - Validate behavior at each autonomy level (1-3).
- External Comms Independence Test
  - At autonomy level 3, external actions remain blocked while the comms gate is locked.
  - Unlock only the targeted tool; verify that all other tools remain blocked.
- Approval Expiry Test
  - An approval request expires after 24 hours.
  - A late approval cannot be replayed.
- SECRET_REF Boundary Test
  - Secrets cannot be requested directly by raw name or value.
  - Only valid references in allowlisted tool operations resolve.
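The assertion at the heart of the Secrets Never Leave Server Test can be sketched as a scan of every captured sink for raw seeded values. This is a minimal TypeScript sketch, assuming the test harness already captures outbound payloads and persisted log lines as strings; `SeededSecret`, `findSecretLeaks`, and the `[SECRET_REF:name]` placeholder format are hypothetical and should be replaced with the redactor's actual placeholder syntax.

```typescript
// Hypothetical shape of a secret that the test seeds into the vault and files.
interface SeededSecret {
  name: string;
  value: string;
}

interface Leak {
  secretName: string;
  sinkIndex: number; // index of the captured payload or log line that leaked
}

// Hypothetical placeholder format; substitute the redactor's real syntax.
function placeholderFor(secretName: string): string {
  return `[SECRET_REF:${secretName}]`;
}

// Any raw seeded value found in any captured sink is a leak, wherever it appears.
function findSecretLeaks(sinks: string[], secrets: SeededSecret[]): Leak[] {
  const leaks: Leak[] = [];
  sinks.forEach((sink, sinkIndex) => {
    for (const secret of secrets) {
      if (secret.value.length > 0 && sink.includes(secret.value)) {
        leaks.push({ secretName: secret.name, sinkIndex });
      }
    }
  });
  return leaks;
}

// Fake secrets and captured sinks; the second line simulates a redaction failure.
const seeded: SeededSecret[] = [
  { name: "stripe_key", value: "sk_test_FAKE123" },
  { name: "db_password", value: "hunter2-fake" },
];
const captured: string[] = [
  `tool output: key=${placeholderFor("stripe_key")}`,
  "log: connecting with password hunter2-fake",
];

const leaks = findSecretLeaks(captured, seeded);
// leaks contains one entry: db_password leaked in captured[1]
```

In the release suite the assertion would simply be that `findSecretLeaks` returns an empty array for every scenario; the simulated failure above only shows what a detected leak looks like. Matching verbatim values is a deliberate simplification: a stricter version would also scan for common encodings (base64, URL encoding) of each seeded value.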
4. Provisioning Test Strategy
4.1 Fast Checks
- ShellCheck and other static checks for Bash scripts.
- Template substitution tests (all placeholders resolved, none leaked).
- Stack inventory policy tests (no banned tools, e.g. n8n).
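The template substitution and stack inventory checks are small enough to run on every commit. A minimal sketch, assuming templates use `{{NAME}}` placeholders and the stack inventory is a flat list of tool names; the placeholder syntax and the `renderTemplate`, `findUnresolved`, and `findBannedTools` helpers are hypothetical.

```typescript
// Hypothetical placeholder syntax: {{NAME}} with optional inner whitespace.
const PLACEHOLDER = /\{\{\s*([A-Z0-9_]+)\s*\}\}/g;

// Substitute known values. Unknown placeholders are left in place so that the
// check below catches them instead of silently rendering an empty string.
function renderTemplate(template: string, values: Record<string, string>): string {
  return template.replace(PLACEHOLDER, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match,
  );
}

// Every placeholder still present after rendering is a test failure.
function findUnresolved(rendered: string): string[] {
  const found: string[] = [];
  const re = new RegExp(PLACEHOLDER.source, "g"); // fresh regex: no shared lastIndex
  let m: RegExpExecArray | null;
  while ((m = re.exec(rendered)) !== null) {
    found.push(m[1]);
  }
  return found;
}

// Stack inventory policy: case-insensitive match against the banned list.
function findBannedTools(inventory: string[], banned: string[]): string[] {
  const bannedSet = new Set(banned.map((tool) => tool.toLowerCase()));
  return inventory.filter((tool) => bannedSet.has(tool.toLowerCase()));
}

const rendered = renderTemplate("host={{HOST}} port={{ PORT }} token={{API_TOKEN}}", {
  HOST: "10.0.0.5",
  PORT: "443",
});
const unresolved = findUnresolved(rendered); // ["API_TOKEN"]
const bannedFound = findBannedTools(["nginx", "N8N", "postgres"], ["n8n"]); // ["N8N"]
```

Leaving unknown placeholders untouched, rather than replacing them with an empty string, is the key design choice: an empty substitution renders a syntactically valid but broken config, whereas a leftover `{{API_TOKEN}}` is trivially detectable.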
4.2 Disposable VPS E2E
Nightly automated runs:
- create a disposable Debian VPS
- run full provisioning
- run smoke checks on selected tool endpoints
- verify tenant registration, heartbeat, and approvals
- collect artifacts, then tear down the VPS
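The nightly run is a pipeline whose cleanup must happen no matter which step fails. A minimal sketch, assuming a hypothetical synchronous `VpsProvider` interface (real hosting API calls would be asynchronous) and modelling each check as a `Step`; `runNightlyE2E`, the step names, and the `debian` image identifier are illustrative only.

```typescript
// Hypothetical provider interface; kept synchronous here for brevity.
interface VpsProvider {
  create(image: string): string; // returns a VM id
  collectArtifacts(vmId: string): string[]; // e.g. log file paths
  destroy(vmId: string): void;
}

type Step = { name: string; run: (vmId: string) => void };

interface RunReport {
  passed: string[];
  failedStep: string | null;
  error: string | null;
  artifacts: string[];
  destroyed: boolean;
}

function runNightlyE2E(provider: VpsProvider, steps: Step[]): RunReport {
  const vmId = provider.create("debian");
  const report: RunReport = { passed: [], failedStep: null, error: null, artifacts: [], destroyed: false };
  try {
    for (const step of steps) {
      try {
        step.run(vmId);
        report.passed.push(step.name);
      } catch (err) {
        report.failedStep = step.name;
        report.error = err instanceof Error ? err.message : String(err);
        break; // later steps depend on earlier ones, so stop at the first failure
      }
    }
  } finally {
    try {
      // Collect first: once the VM is destroyed, its logs are gone.
      report.artifacts = provider.collectArtifacts(vmId);
    } finally {
      // Always tear down, even if artifact collection throws.
      provider.destroy(vmId);
      report.destroyed = true;
    }
  }
  return report;
}

// Fake provider that records the order of lifecycle calls.
const events: string[] = [];
const fakeProvider: VpsProvider = {
  create: (image) => { events.push(`create:${image}`); return "vm-1"; },
  collectArtifacts: (vmId) => { events.push(`collect:${vmId}`); return [`${vmId}/provision.log`]; },
  destroy: (vmId) => { events.push(`destroy:${vmId}`); },
};

const report = runNightlyE2E(fakeProvider, [
  { name: "provision", run: () => {} },
  { name: "smoke-checks", run: () => { throw new Error("endpoint /health returned 502"); } },
  { name: "tenant-registration", run: () => {} },
]);
// report.failedStep === "smoke-checks"; events: create, collect, destroy
```

Collecting artifacts before destroying the VM, and destroying it in a nested `finally`, means a failed run still produces logs for triage and never leaves a paid VPS running.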
5. Contract Testing
- OpenAPI specs for Hub APIs and tenant APIs.
- Consumer-driven contract tests for:
  - Safety Wrapper against Hub tenant endpoints
  - Mobile app against customer endpoints
  - Website onboarding against public endpoints
- Contract break blocks merge.
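In consumer-driven contracts, each consumer publishes the fields and types it actually relies on, and the provider's CI verifies real responses against every published contract. Dedicated tools such as Pact also handle publishing and versioning contracts; the sketch below shows only the core check, and the `Contract` shape, `verifyContract` helper, and heartbeat endpoint fields are hypothetical.

```typescript
// JSON types a consumer can pin a field to.
type JsonType = "string" | "number" | "boolean" | "object" | "array";

// A contract lists only the fields the consumer actually reads.
interface Contract {
  consumer: string;
  endpoint: string;
  requiredFields: Record<string, JsonType>;
}

function jsonTypeOf(value: unknown): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "string", "number", "boolean", "object", "undefined", ...
}

// Provider-side verification: an empty result means the contract holds.
function verifyContract(contract: Contract, response: Record<string, unknown>): string[] {
  const violations: string[] = [];
  for (const field of Object.keys(contract.requiredFields)) {
    const expected = contract.requiredFields[field];
    const actual = jsonTypeOf(response[field]);
    if (actual !== expected) {
      violations.push(`${contract.consumer} ${contract.endpoint}: "${field}" expected ${expected}, got ${actual}`);
    }
  }
  return violations;
}

// Hypothetical contract published by the Safety Wrapper for a heartbeat endpoint.
const heartbeatContract: Contract = {
  consumer: "safety-wrapper",
  endpoint: "POST /tenants/{id}/heartbeat",
  requiredFields: { tenantId: "string", acceptedAt: "string", nextIntervalSec: "number" },
};

// The provider renamed nextIntervalSec to intervalSec, so verification fails.
const violations = verifyContract(heartbeatContract, {
  tenantId: "t-42",
  acceptedAt: "2025-01-01T00:00:00Z",
  intervalSec: 60,
});
// violations has one entry, for "nextIntervalSec" (expected number, got undefined)
```

Extra provider fields are ignored on purpose: consumers pin only what they read, so the Hub can add fields freely, while removing or retyping a pinned field fails the check and blocks the merge.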
6. Data And Billing Validation
- Synthetic token event generator with known totals.
- Reconcile tenant usage buckets against Hub aggregated totals.
- Reconcile Hub totals against Stripe meter/invoice preview.
- Fail the build if the variance at either reconciliation step exceeds the agreed threshold.
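The reconciliation steps compare the same quantity at successive points in the billing pipeline. A minimal sketch of the variance check, assuming token counts are plain numbers and the threshold is a relative fraction; the `reconcile` helper, the stage names, and the 0.1% threshold in the example are hypothetical, since the proposal does not fix a value.

```typescript
interface StageTotal {
  stage: string;
  tokens: number;
}

interface Mismatch {
  from: string;
  to: string;
  variance: number; // relative to the previous stage's total
}

// Walk the pipeline in order (generator -> tenant buckets -> Hub -> Stripe) and
// compare each stage with the one before it, so a failure names the hop that drifted.
function reconcile(knownTotal: number, stages: StageTotal[], threshold: number): Mismatch[] {
  const mismatches: Mismatch[] = [];
  let prev: StageTotal = { stage: "generator", tokens: knownTotal };
  for (const cur of stages) {
    const base = Math.max(prev.tokens, 1); // guard against dividing by an empty bucket
    const variance = Math.abs(cur.tokens - prev.tokens) / base;
    if (variance > threshold) {
      mismatches.push({ from: prev.stage, to: cur.stage, variance });
    }
    prev = cur;
  }
  return mismatches;
}

// Hypothetical 0.1% threshold; the proposal leaves the exact value open.
const mismatches = reconcile(
  1000000,
  [
    { stage: "tenant buckets", tokens: 1000000 },
    { stage: "hub aggregate", tokens: 1000000 },
    { stage: "stripe preview", tokens: 997000 }, // e.g. a dropped meter event batch
  ],
  0.001,
);
// mismatches has one entry: hub aggregate -> stripe preview, variance 0.003
```

Comparing adjacent stages mirrors the two reconciliations listed above and localizes the fault: here the tenant and Hub totals agree with the generator, so the drift is on the Hub-to-Stripe hop. CI would fail the build whenever `mismatches` is non-empty.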
7. Quality Gates (CI)
- Unit + integration tests must pass.
- Security invariants must pass.
- Critical package diff review for Safety Wrapper and Provisioner.
- Minimum thresholds:
  - security-critical modules: >=90% branch coverage
  - overall backend: >=75% branch coverage
- Mutation testing on classifier and redactor modules.
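The branch-coverage thresholds can be enforced as a CI step over the coverage report. A minimal sketch, assuming per-file branch counts have already been read from the report (Istanbul's `json-summary` reporter, for instance, records `branches.total` and `branches.covered` per file); the `SECURITY_CRITICAL` path prefixes and the `checkCoverageGates` helper are hypothetical.

```typescript
interface FileCoverage {
  path: string;
  branchesTotal: number;
  branchesCovered: number;
}

// Hypothetical path prefixes for the security-critical modules.
const SECURITY_CRITICAL = ["src/classifier/", "src/redactor/", "src/secrets/", "src/provisioner/credentials/"];

// Sum raw branch counts instead of averaging per-file percentages, so a small
// fully covered file cannot mask a large under-tested one.
function branchPct(files: FileCoverage[]): number {
  let total = 0;
  let covered = 0;
  for (const f of files) {
    total += f.branchesTotal;
    covered += f.branchesCovered;
  }
  return total === 0 ? 100 : (100 * covered) / total;
}

function checkCoverageGates(files: FileCoverage[]): string[] {
  const failures: string[] = [];
  for (const prefix of SECURITY_CRITICAL) {
    const moduleFiles = files.filter((f) => f.path.startsWith(prefix));
    if (moduleFiles.length === 0) {
      // A missing report must fail loudly rather than pass as "100% of nothing".
      failures.push(`${prefix} has no coverage data`);
      continue;
    }
    const pct = branchPct(moduleFiles);
    if (pct < 90) failures.push(`${prefix} branch coverage ${pct.toFixed(1)}% < 90%`);
  }
  const overall = branchPct(files);
  if (overall < 75) failures.push(`overall backend branch coverage ${overall.toFixed(1)}% < 75%`);
  return failures;
}

const failures = checkCoverageGates([
  { path: "src/classifier/rules.ts", branchesTotal: 200, branchesCovered: 190 },
  { path: "src/redactor/scan.ts", branchesTotal: 120, branchesCovered: 100 },
  { path: "src/secrets/vault.ts", branchesTotal: 80, branchesCovered: 80 },
  { path: "src/provisioner/credentials/ssh.ts", branchesTotal: 50, branchesCovered: 47 },
  { path: "src/billing/invoice.ts", branchesTotal: 400, branchesCovered: 280 },
]);
// failures: ["src/redactor/ branch coverage 83.3% < 90%"]
```

Each security-critical module is checked on its own rather than pooled, so a well-covered classifier cannot compensate for an under-tested redactor; the overall figure (697 of 850 branches, about 82%) passes independently.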
8. Human Review Workflow (Anti-AI-Slop)
Required for security-critical PRs:
- one reviewer validates threat model assumptions
- one reviewer validates test completeness and failure cases
- checklist includes: error paths, rollback behavior, idempotency, logging hygiene
No direct auto-merge for changes in:
- redaction engine
- command classifier
- secret storage/resolution
- provisioning credential handling
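The no-auto-merge rule can be enforced mechanically by a CI check that inspects a PR's changed files. A minimal sketch, assuming any PR touching a protected path counts as security-critical; the directory prefixes, the `PullRequest` shape, the role names, and `mergePolicy` are hypothetical. On GitHub, a CODEOWNERS file combined with a branch protection rule requiring code-owner review achieves a similar effect declaratively.

```typescript
type ReviewRole = "threat-model" | "test-completeness" | "general";

interface PullRequest {
  changedFiles: string[];
  approvals: { reviewer: string; role: ReviewRole }[];
}

interface MergeDecision {
  autoMergeAllowed: boolean;
  protectedAreas: string[];
  missingRoles: ReviewRole[];
  readyForManualMerge: boolean;
}

// Hypothetical repository layout mapped to the protected areas listed above.
const PROTECTED_PREFIXES: Record<string, string> = {
  "safety-wrapper/src/redaction/": "redaction engine",
  "safety-wrapper/src/classifier/": "command classifier",
  "safety-wrapper/src/secrets/": "secret storage/resolution",
  "provisioner/src/credentials/": "provisioning credential handling",
};

// One role per reviewer duty in the security-critical workflow.
const REQUIRED_ROLES: ReviewRole[] = ["threat-model", "test-completeness"];

function mergePolicy(pr: PullRequest): MergeDecision {
  const areas = new Set<string>();
  for (const file of pr.changedFiles) {
    for (const prefix of Object.keys(PROTECTED_PREFIXES)) {
      if (file.startsWith(prefix)) areas.add(PROTECTED_PREFIXES[prefix]);
    }
  }
  const protectedAreas = Array.from(areas);
  if (protectedAreas.length === 0) {
    return { autoMergeAllowed: true, protectedAreas, missingRoles: [], readyForManualMerge: true };
  }
  const presentRoles = new Set<ReviewRole>(pr.approvals.map((a) => a.role));
  const missingRoles = REQUIRED_ROLES.filter((role) => !presentRoles.has(role));
  // Protected areas never auto-merge, even once every required review is in.
  return { autoMergeAllowed: false, protectedAreas, missingRoles, readyForManualMerge: missingRoles.length === 0 };
}

const decision = mergePolicy({
  changedFiles: ["safety-wrapper/src/classifier/rules.ts", "docs/README.md"],
  approvals: [{ reviewer: "alice", role: "threat-model" }],
});
// decision: auto-merge blocked, area "command classifier", missing "test-completeness"
```

Separating `autoMergeAllowed` from `readyForManualMerge` keeps the two rules distinct: complete reviews unlock a human merge, but never an automatic one.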
9. Launch Validation Checklist
Before founding-member launch:
- 7-day staging soak with no sev-1/2 defects
- two successful rollback drills (control plane and tenant plane)
- production canary with live approval + billing reconciliation
- first-hour templates executed successfully on staging tenants