# 07. Testing Strategy Proposal

## 1. Testing Principles

- Security-critical behavior is verified with invariant tests, not only unit coverage.
- Interfaces between Hub, Safety Wrapper, Mobile, Website, and Provisioner are tested contract-first.
- CI provides fast feedback; staging and nightly runs provide deep verification.
- AI-generated code receives stricter review and mutation testing on critical paths.

## 2. Test Pyramid By Component

| Layer | Hub | Safety Wrapper | Provisioner | Mobile/Website |
|---|---|---|---|---|
| Unit | services, validators, policy logic | classifier, redactor, secret resolver, adapters | parser/utils/template render | UI logic, state stores, hooks |
| Integration | Prisma + API handlers + auth | plugin hooks vs OpenClaw test harness | SSH runner against disposable VM | API integration against mock Hub |
| End-to-end | full order/provision/approval/billing flow | tenant command execution path | full 10-step provisioning with checkpoints | chat/approval/onboarding user journeys |
| Security | authz, rate limiting, session hardening | secret exfiltration tests, gating bypass tests | credential leakage scans | token storage, deep-link auth |
| Performance | API p95 latency, DB load | per-turn latency overhead, memory usage | provisioning duration, retry cost | startup latency, push receipt latency |

## 3. Mandatory Security Invariant Suite

The following automated tests must pass before each release:

1. **Secrets Never Leave Server Test**
   - Seed known secrets in the vault and in files.
   - Trigger prompts and tool outputs containing these values.
   - Assert that outbound payloads and persisted logs contain only placeholders.

2. **Command Classification Matrix Test**
   - Execute fixtures for each command class (Green, Yellow, Yellow+External, Red, Critical).
   - Validate behavior at each autonomy level (1-3).

3. **External Comms Independence Test**
   - At autonomy level 3, external actions remain blocked while the comms gate is locked.
   - Unlock only the targeted tool; validate that all others remain blocked.

4. **Approval Expiry Test**
   - An approval request expires after 24 hours.
   - A late approval is rejected and cannot be replayed.

5. **SECRET_REF Boundary Test**
   - Secrets cannot be requested directly by raw name or value.
   - Only valid references in allowlisted tool operations resolve.
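
A minimal sketch of the core assertion in the Secrets Never Leave Server test, assuming captured outbound payloads and persisted log lines are available as strings. `SeededSecret`, `redact`, `findLeaks`, and the `[REDACTED:<name>]` placeholder format are hypothetical; `redact` is a naive exact-match stand-in for the real redactor, included only so the check runs end to end.

```typescript
// Hypothetical shape of a secret seeded by the test fixture.
interface SeededSecret {
  name: string;
  value: string;
}

// Hypothetical placeholder format; the real redactor's format may differ.
function placeholder(secretName: string): string {
  return `[REDACTED:${secretName}]`;
}

// Naive stand-in for the redactor under test: replaces every exact
// occurrence of each seeded value with its placeholder. Longer values go
// first so a secret containing another secret is not partially redacted.
function redact(text: string, secrets: SeededSecret[]): string {
  const ordered = secrets.slice().sort((a, b) => b.value.length - a.value.length);
  let out = text;
  for (const s of ordered) {
    out = out.split(s.value).join(placeholder(s.name));
  }
  return out;
}

// The invariant itself: report every captured payload that still contains
// a raw seeded value. An empty result means no leak was detected.
function findLeaks(
  secrets: SeededSecret[],
  captured: string[],
): { index: number; secret: string }[] {
  const leaks: { index: number; secret: string }[] = [];
  captured.forEach((payload, index) => {
    for (const s of secrets) {
      if (payload.indexOf(s.value) !== -1) leaks.push({ index, secret: s.name });
    }
  });
  return leaks;
}

// Usage: the second payload simulates a redactor miss in a log line.
const seeded: SeededSecret[] = [
  { name: "API_KEY", value: "seeded-api-key-123" },
  { name: "DB_PASSWORD", value: "seeded-db-pass" },
];
const outbound = [
  redact("POST /charge key=seeded-api-key-123", seeded),
  "log: connected with password seeded-db-pass",
];
const leaks = findLeaks(seeded, outbound);
// leaks → [{ index: 1, secret: "DB_PASSWORD" }]
```

Exact substring matching is a floor: a fuller leak scan would also search for common encodings of each seeded value (for example base64 or URL-encoded forms), which this sketch does not do.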
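
The External Comms Independence test checks a property of the gate rather than of any single tool: a locked gate blocks external actions at every autonomy level, and unlocking is scoped to one tool. A minimal sketch, assuming a per-tool unlock set; `CommsGate`, `decideExternal`, `checkCommsIndependence`, and the tool names are hypothetical, and the level-dependent branch in `decideExternal` is a placeholder rather than the product's policy.

```typescript
type AutonomyLevel = 1 | 2 | 3;
type Decision = "blocked" | "needs_approval" | "allowed";

// Hypothetical per-tool comms gate: locked by default, and unlocking one
// tool never unlocks another.
class CommsGate {
  private readonly unlocked = new Set<string>();

  unlock(tool: string): void {
    this.unlocked.add(tool);
  }

  isUnlocked(tool: string): boolean {
    return this.unlocked.has(tool);
  }
}

// The gate check runs first and ignores the autonomy level, so no level can
// bypass a locked gate. The level-based branch is a placeholder policy.
function decideExternal(gate: CommsGate, tool: string, level: AutonomyLevel): Decision {
  if (!gate.isUnlocked(tool)) return "blocked";
  return level === 3 ? "allowed" : "needs_approval";
}

const LEVELS: AutonomyLevel[] = [1, 2, 3];

// True if every tool in `tools` is blocked at every autonomy level.
function allBlocked(gate: CommsGate, tools: string[]): boolean {
  return tools.every((tool) => LEVELS.every((level) => decideExternal(gate, tool, level) === "blocked"));
}

// The invariant: with the gate locked, everything is blocked; after unlocking
// only `target`, the target is no longer blocked and every other tool still is.
function checkCommsIndependence(tools: string[], target: string): boolean {
  const gate = new CommsGate();
  if (!allBlocked(gate, tools)) return false;
  gate.unlock(target);
  const others = tools.filter((tool) => tool !== target);
  return decideExternal(gate, target, 3) !== "blocked" && allBlocked(gate, others);
}

// Usage (tool names are illustrative):
const independent = checkCommsIndependence(["email.send", "sms.send", "social.post"], "email.send");
// independent → true
```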
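
The Approval Expiry assertions become deterministic once the clock is injected. A minimal sketch, assuming approvals are single-use and checked against a caller-supplied timestamp; `ApprovalStore`, its methods, and the result strings are hypothetical, and only the 24-hour window comes from the test description.

```typescript
// 24-hour approval window from the test description.
const APPROVAL_TTL_MS = 24 * 60 * 60 * 1000;

type ApprovalResult = "applied" | "expired" | "already_used" | "unknown";

// Hypothetical store: a request can be approved at most once, and only
// before it expires. Timestamps are passed in so tests control the clock.
class ApprovalStore {
  private readonly createdAt = new Map<string, number>();
  private readonly used = new Set<string>();

  create(id: string, now: number): void {
    this.createdAt.set(id, now);
  }

  approve(id: string, now: number): ApprovalResult {
    const created = this.createdAt.get(id);
    if (created === undefined) return "unknown";
    if (this.used.has(id)) return "already_used"; // replay of a consumed approval
    if (now - created >= APPROVAL_TTL_MS) return "expired"; // at or past 24h
    this.used.add(id);
    return "applied";
  }
}

// Usage with a fixed clock:
const store = new ApprovalStore();
const t0 = Date.UTC(2025, 0, 1);
store.create("req-1", t0);
const late = store.approve("req-1", t0 + APPROVAL_TTL_MS); // → "expired"
const lateRetry = store.approve("req-1", t0 + APPROVAL_TTL_MS + 1); // → "expired"
store.create("req-2", t0);
const first = store.approve("req-2", t0 + 1000); // → "applied"
const replay = store.approve("req-2", t0 + 2000); // → "already_used"
```

Treating the exact 24-hour boundary as expired (`>=`) is a choice; the real test should pin the boundary down explicitly.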

## 4. Provisioning Test Strategy

### 4.1 Fast Checks

- ShellCheck and other static checks for Bash scripts.
- Template substitution tests (all placeholders resolved, none leaked into rendered output).
- Stack inventory policy tests (no banned tools such as n8n).
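
The template and inventory checks are cheap enough to run on every commit. A minimal sketch, assuming `{{UPPER_SNAKE_CASE}}` placeholders and a flat list of tool names per stack; `renderTemplate`, `leftoverPlaceholders`, and `bannedToolsIn` are hypothetical, and n8n is the only banned tool taken from this document.

```typescript
// Hypothetical placeholder syntax: {{UPPER_SNAKE_CASE}}.
const PLACEHOLDER = /\{\{\s*([A-Z0-9_]+)\s*\}\}/g;

// Render a template, failing loudly on any placeholder that has no value
// instead of silently substituting an empty string.
function renderTemplate(template: string, values: Map<string, string>): string {
  return template.replace(PLACEHOLDER, (_match: string, key: string) => {
    const value = values.get(key);
    if (value === undefined) throw new Error(`unresolved placeholder: ${key}`);
    return value;
  });
}

// Post-render check: any remaining {{...}} token, including malformed ones
// the renderer's pattern did not match, counts as a leaked placeholder.
function leftoverPlaceholders(rendered: string): string[] {
  return rendered.match(/\{\{[^}]*\}\}/g) ?? [];
}

// Stack inventory policy; extend the set as the policy grows.
const BANNED_TOOLS = new Set<string>(["n8n"]);

function bannedToolsIn(inventory: string[]): string[] {
  return inventory.filter((tool) => BANNED_TOOLS.has(tool.toLowerCase()));
}

// Usage:
const rendered = renderTemplate("server_name {{DOMAIN}};", new Map([["DOMAIN", "tenant.example.com"]]));
// rendered → "server_name tenant.example.com;"
const leftovers = leftoverPlaceholders("listen {{ port }};"); // → ["{{ port }}"]
const banned = bannedToolsIn(["allowed-tool", "N8N"]); // → ["N8N"]
```

The second check exists because the renderer only recognizes well-formed names: a lowercase or misspelled placeholder passes through rendering untouched and would otherwise reach the VPS.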

### 4.2 Disposable VPS E2E

Nightly automated runs:

- create a disposable Debian VPS
- run full provisioning
- run smoke checks on selected tool endpoints
- verify tenant registration, heartbeat, and approvals
- collect artifacts, then tear down the VPS
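
The nightly run is a setup/verify/teardown lifecycle in which teardown must happen even when a step fails; otherwise failed runs leave paid VPS instances behind. A minimal sketch, assuming a hypothetical `VpsDriver` interface that wraps the hosting provider and the provisioner; none of its method names come from an existing API.

```typescript
// Hypothetical driver; a real one would wrap the hosting provider's API and
// the provisioning scripts.
interface VpsDriver {
  create(image: string): Promise<string>; // resolves to a server id
  provision(serverId: string): Promise<void>;
  smokeCheck(serverId: string): Promise<void>;
  verifyHubIntegration(serverId: string): Promise<void>; // registration, heartbeat, approvals
  collectArtifacts(serverId: string): Promise<void>;
  destroy(serverId: string): Promise<void>;
}

interface RunResult {
  ok: boolean;
  failedStep?: string;
  error?: string;
}

async function nightlyE2E(driver: VpsDriver): Promise<RunResult> {
  const serverId = await driver.create("debian");
  let step = "provision";
  try {
    await driver.provision(serverId);
    step = "smoke";
    await driver.smokeCheck(serverId);
    step = "hub-integration";
    await driver.verifyHubIntegration(serverId);
    return { ok: true };
  } catch (err) {
    return { ok: false, failedStep: step, error: String(err) };
  } finally {
    // Collect artifacts while the server still exists; a collection failure
    // is swallowed so it can never prevent teardown.
    await driver.collectArtifacts(serverId).catch(() => undefined);
    // Always destroy the disposable VPS, whichever step failed.
    await driver.destroy(serverId);
  }
}
```

The ordering inside `finally` is the point of the sketch: artifacts are captured before the server disappears, and `destroy` runs regardless of which step failed.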

## 5. Contract Testing

- OpenAPI specs for Hub APIs and tenant APIs.
- Consumer-driven contract tests for:
  - Safety Wrapper against Hub tenant endpoints
  - Mobile app against customer endpoints
  - Website onboarding against public endpoints
- A broken contract blocks the merge.
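
In consumer-driven contract testing, each consumer publishes the subset of a provider response it actually depends on, and the provider's CI verifies real responses against every published contract. A deliberately simplified sketch of that verification step, assuming contracts are flat maps of required field to JSON type; `ConsumerContract`, `verifyContract`, and the example endpoint and fields are hypothetical. A production setup would more likely use a dedicated tool such as Pact, or validate against the OpenAPI schemas.

```typescript
type JsonType = "string" | "number" | "boolean" | "object" | "array" | "null";

// A consumer's expectations for one endpoint: only the fields it reads.
interface ConsumerContract {
  consumer: string;
  endpoint: string;
  requiredFields: Record<string, JsonType>;
}

function jsonTypeOf(value: unknown): JsonType {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  const t = typeof value;
  if (t === "string" || t === "number" || t === "boolean" || t === "object") return t;
  throw new Error(`not a JSON value: ${t}`);
}

// Returns violations; an empty list means the provider response satisfies the
// consumer's contract. Extra fields are ignored, so adding a field never
// breaks a consumer, while removing or retyping a required one does.
function verifyContract(contract: ConsumerContract, body: Record<string, unknown>): string[] {
  const violations: string[] = [];
  for (const field of Object.keys(contract.requiredFields)) {
    const expected = contract.requiredFields[field];
    // Own-property check, so prototype keys like "toString" are not "present".
    if (!Object.prototype.hasOwnProperty.call(body, field)) {
      violations.push(`${contract.consumer}: missing ${field}`);
      continue;
    }
    const actual = jsonTypeOf(body[field]);
    if (actual !== expected) {
      violations.push(`${contract.consumer}: ${field} is ${actual}, expected ${expected}`);
    }
  }
  return violations;
}

// Usage with an illustrative mobile contract:
const mobileContract: ConsumerContract = {
  consumer: "mobile",
  endpoint: "GET /approvals/:id",
  requiredFields: { id: "string", status: "string", expiresAt: "string" },
};
const mobileViolations = verifyContract(mobileContract, { id: "a1", status: "pending", expiresAt: 1700000000 });
// mobileViolations → ["mobile: expiresAt is number, expected string"]
```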

## 6. Data And Billing Validation

- Synthetic token event generator with known totals.
- Reconcile tenant usage buckets against Hub aggregated totals.
- Reconcile Hub totals against Stripe meter/invoice preview.
- Fail the build if the variance exceeds a configured threshold.
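
The three reconciliation steps share one check: sum the finer-grained records, compare them with a reference total, and fail when the relative difference exceeds a threshold. A minimal sketch, assuming non-negative integer token counts; `reconcile`, `relativeVariance`, and the 0.1% `MAX_RELATIVE_VARIANCE` are hypothetical, since the document does not fix the threshold.

```typescript
// Hypothetical threshold: the document leaves the value open.
const MAX_RELATIVE_VARIANCE = 0.001; // 0.1%

interface ReconcileResult {
  expected: number;
  observed: number;
  relativeVariance: number;
  ok: boolean;
}

// |observed - expected| / expected, with an explicit rule for expected = 0.
function relativeVariance(expected: number, observed: number): number {
  if (expected === 0) return observed === 0 ? 0 : Infinity;
  return Math.abs(observed - expected) / expected;
}

// Compare summed usage buckets against a known total (for example, the
// synthetic generator's total, or Hub per-tenant totals against a meter).
function reconcile(expectedTotal: number, buckets: number[], threshold = MAX_RELATIVE_VARIANCE): ReconcileResult {
  const observed = buckets.reduce((sum, b) => sum + b, 0);
  const variance = relativeVariance(expectedTotal, observed);
  return { expected: expectedTotal, observed, relativeVariance: variance, ok: variance <= threshold };
}

// Usage: the generator emitted exactly 1,000,000 tokens, but the tenant
// buckets only account for 997,500 of them.
const run = reconcile(1_000_000, [400_000, 350_000, 247_500]);
// run.observed → 997500, run.relativeVariance → 0.0025, run.ok → false
```

A relative threshold keeps one value usable across tenants of very different size; a zero expected total with non-zero observed usage is treated as an unconditional failure.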

## 7. Quality Gates (CI)

- Unit + integration tests must pass.
- Security invariants must pass.
- Critical package diff review for Safety Wrapper and Provisioner.
- Minimum thresholds:
  - security-critical modules: >=90% branch coverage
  - overall backend: >=75% branch coverage
- Mutation testing on classifier and redactor modules.
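
A minimal sketch of enforcing the two coverage floors in a CI step, assuming per-module branch counts have already been parsed from the coverage report; the `ModuleCoverage` shape and the security-critical path prefixes are hypothetical. Many coverage tools can enforce global or per-path thresholds natively, which may be simpler in practice.

```typescript
interface ModuleCoverage {
  path: string;
  branchesHit: number;
  branchesTotal: number;
}

// Floors from the quality gates above.
const CRITICAL_FLOOR = 0.9;
const OVERALL_FLOOR = 0.75;

// Hypothetical path prefixes for security-critical modules.
const CRITICAL_PREFIXES = [
  "safety-wrapper/src/classifier/",
  "safety-wrapper/src/redactor/",
  "safety-wrapper/src/secrets/",
];

// A module with no branches counts as fully covered.
function branchRatio(hit: number, total: number): number {
  return total === 0 ? 1 : hit / total;
}

// Returns failures; an empty array means the gate passes. Critical modules
// are checked individually; the backend floor applies to the aggregate.
function coverageGate(modules: ModuleCoverage[]): string[] {
  const failures: string[] = [];
  let hit = 0;
  let total = 0;
  for (const m of modules) {
    hit += m.branchesHit;
    total += m.branchesTotal;
    const ratio = branchRatio(m.branchesHit, m.branchesTotal);
    const critical = CRITICAL_PREFIXES.some((p) => m.path.startsWith(p));
    if (critical && ratio < CRITICAL_FLOOR) {
      failures.push(`${m.path}: ${(ratio * 100).toFixed(1)}% < 90% (security-critical)`);
    }
  }
  const overall = branchRatio(hit, total);
  if (overall < OVERALL_FLOOR) {
    failures.push(`overall backend: ${(overall * 100).toFixed(1)}% < 75%`);
  }
  return failures;
}

// Usage:
const gateFailures = coverageGate([
  { path: "safety-wrapper/src/redactor/index.ts", branchesHit: 85, branchesTotal: 100 },
  { path: "hub/src/billing/usage.ts", branchesHit: 80, branchesTotal: 100 },
]);
// gateFailures → ["safety-wrapper/src/redactor/index.ts: 85.0% < 90% (security-critical)"]
```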

## 8. Human Review Workflow (Anti-AI-Slop)

Required for security-critical PRs:

- one reviewer validates threat model assumptions
- one reviewer validates test completeness and failure cases
- checklist includes: error paths, rollback behavior, idempotency, logging hygiene

No direct auto-merge for changes in:

- redaction engine
- command classifier
- secret storage/resolution
- provisioning credential handling
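
The no-auto-merge rule can be enforced mechanically by a CI check that flags a pull request for human merge whenever it touches a protected area. A minimal sketch, assuming the four areas map to the hypothetical directory prefixes below; the real repository layout, and whether the rule is expressed instead as branch protection or CODEOWNERS entries, are open choices.

```typescript
// Hypothetical mapping from each protected area to repository path prefixes.
const PROTECTED_AREAS: { area: string; prefixes: string[] }[] = [
  { area: "redaction engine", prefixes: ["safety-wrapper/src/redactor/"] },
  { area: "command classifier", prefixes: ["safety-wrapper/src/classifier/"] },
  { area: "secret storage/resolution", prefixes: ["safety-wrapper/src/secrets/", "hub/src/secrets/"] },
  { area: "provisioning credential handling", prefixes: ["provisioner/credentials/"] },
];

// Returns the protected areas a change set touches. A non-empty result means
// auto-merge is disabled and the two-reviewer workflow applies.
function protectedAreasTouched(changedPaths: string[]): string[] {
  // Normalize "./path" to "path" so prefix matching is consistent.
  const paths = changedPaths.map((p) => p.replace(/^\.\//, ""));
  return PROTECTED_AREAS.filter(({ prefixes }) =>
    paths.some((p) => prefixes.some((prefix) => p.startsWith(prefix))),
  ).map(({ area }) => area);
}

// Usage:
const prTouched = protectedAreasTouched(["docs/README.md", "./safety-wrapper/src/redactor/rules.ts"]);
// prTouched → ["redaction engine"]
```

The trailing slash on each prefix keeps a sibling directory such as `redactor-v2/` from matching `redactor/`.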

## 9. Launch Validation Checklist

Before founding-member launch:

- 7-day staging soak with no sev-1/2 defects
- two successful rollback drills (control plane and tenant plane)
- production canary with live approval + billing reconciliation
- first-hour templates executed successfully on staging tenants