# 04. Detailed Implementation Plan And Dependency Graph
## 1. Planning Assumptions
- Target launch window: 12 weeks.
- Team model assumed for the schedule below:
  - 2 backend/platform engineers
  - 1 mobile/fullstack engineer
  - 1 DevOps/SRE engineer
  - 1 QA/security engineer (shared)
- Existing Hub codebase is retained and extended.
## 2. Work Breakdown Structure (WBS)
## Phase 0: Prerequisite Cleanup And Hardening (Week 1)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P0-1 | Remove all `n8n` code references (Hub, provisioner, stacks, scripts, tests) | 3d | - | `rg -n n8n` returns no matches in production code paths; CI policy check added |
| P0-2 | Remove deprecated deploy targets (`orchestrator`, `sysadmin`) from active provisioning | 2d | P0-1 | No new orders can deploy deprecated services |
| P0-3 | Fix plaintext provisioning secret leak (`jobs/*/config.json`) | 2d | P0-1 | No root/server password persisted in plaintext job files |
| P0-4 | Baseline security regression tests for cleanup changes | 1d | P0-2,P0-3 | Green CI + sign-off |
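The CI policy check named in P0-1's exit criteria could be as small as a script that fails the build when the banned token reappears. The following is a minimal sketch, not the project's actual check: `find_banned_references`, the `SKIP_DIRS` set, and the demo file names are all hypothetical, and which directories count as "production code paths" is an assumption.

```python
import os
import re
import tempfile
from pathlib import Path

# Assumption: these directories are outside "production code paths".
SKIP_DIRS = {".git", "node_modules", "docs"}
BANNED = re.compile(r"n8n", re.IGNORECASE)


def find_banned_references(root: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for each banned hit."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file: skip it
            for lineno, line in enumerate(text.splitlines(), start=1):
                if BANNED.search(line):
                    rel = path.relative_to(root).as_posix()
                    hits.append((rel, lineno, line.strip()))
    return hits


# Demo on a throwaway tree: one offending script, one ignored docs file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "docs").mkdir()
    (root / "src" / "deploy.sh").write_text(
        "echo start\nstart_n8n_worker --daemon\n", encoding="utf-8"
    )
    (root / "docs" / "history.md").write_text("We used n8n.\n", encoding="utf-8")
    hits = find_banned_references(root)

exit_code = 1 if hits else 0  # a CI job would pass this to sys.exit()
```

Returning the hits rather than printing them keeps the check testable; the CI wrapper only needs to print each hit and exit non-zero when the list is non-empty.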
## Phase 1: Safety Substrate (Weeks 2-3)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P1-1 | Build encrypted secrets vault (SQLite schema + key management) | 3d | P0-4 | CRUD, rotation, audit log implemented |
| P1-2 | Implement egress redaction proxy (registry + regex + entropy layers) | 4d | P1-1 | Redaction test suite pass with seeded secrets |
| P1-3 | Implement command classification engine (5-tier + external gate) | 3d | P1-1 | Deterministic policy tests pass |
| P1-4 | Implement approval state cache + retry logic (tenant-local) | 2d | P1-3 | Approval resilience tests pass |
| P1-5 | OpenClaw plugin skeleton with hooks + telemetry envelope | 3d | P1-2,P1-3 | Hook smoke tests green against pinned OpenClaw tag |
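To make the three redaction layers in P1-2 concrete, here is a rough sketch of how registry, regex, and entropy passes might be chained. It is illustrative only: the single pattern, the 24-character token minimum, the 4.0 bits-per-character threshold, and the `[REDACTED:...]` placeholder format are assumptions, not values from this plan.

```python
import math
import re
from collections import Counter

# Layer 2 patterns. Assumption: a single well-known key shape for illustration.
PATTERNS = [re.compile(r"AKIA[0-9A-Z]{16}")]  # AWS access key ID shape
# Layer 3 candidates: long runs of token-like characters.
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{24,}")
ENTROPY_THRESHOLD = 4.0  # bits per character; assumed cut-off


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def redact(text: str, registry: dict[str, str]) -> str:
    # Layer 1: exact match against known secret values, longest first so a
    # secret that contains a shorter one is replaced whole.
    for name, value in sorted(registry.items(), key=lambda kv: -len(kv[1])):
        if value:
            text = text.replace(value, f"[REDACTED:{name}]")
    # Layer 2: known credential shapes.
    for pattern in PATTERNS:
        text = pattern.sub("[REDACTED:pattern]", text)

    # Layer 3: mask long tokens whose characters look random.
    def mask_if_random(match: re.Match) -> str:
        token = match.group(0)
        if shannon_entropy(token) >= ENTROPY_THRESHOLD:
            return "[REDACTED:entropy]"
        return token

    return TOKEN.sub(mask_if_random, text)


seeded = {"DB_PASSWORD": "hunter2-prod"}
print(redact("connect with hunter2-prod now", seeded))
# prints: connect with [REDACTED:DB_PASSWORD] now
```

The layers run from most to least precise. The entropy pass will also flag long benign identifiers, so the threshold would need tuning against the seeded-secret test suite.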
## Phase 2: Hub Tenant APIs + Data Model (Weeks 3-4)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P2-1 | Add Prisma models: approval queue, usage buckets, agent policy, comms unlocks | 2d | P0-4 | Migration applied in staging |
| P2-2 | Implement tenant register/heartbeat/config APIs | 3d | P2-1 | Contract tests pass |
| P2-3 | Implement tenant approval-request APIs + customer approval endpoints | 3d | P2-1 | End-to-end approval cycle works |
| P2-4 | Implement usage ingest + billing period updates | 3d | P2-1 | Usage events visible in dashboard |
| P2-5 | Add push notification pipeline for approvals | 2d | P2-3 | Mobile push test path validated |
## Phase 3: Safety Wrapper Execution Layer (Weeks 4-6)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P3-1 | Port shell/docker/file/env guarded executors from sysadmin patterns | 5d | P1-5 | Security unit tests pass |
| P3-2 | Implement tool registry loader + SECRET_REF resolver | 3d | P1-1,P3-1 | Tool calls run without raw secret exposure |
| P3-3 | Implement core adapters (Chatwoot, Ghost, Nextcloud, Cal.com, Odoo, Listmonk) | 6d | P3-2 | Adapter contract tests pass |
| P3-4 | Implement metering capture and hourly bucket compaction | 2d | P1-5,P2-4 | Buckets reliably posted to Hub |
| P3-5 | Add subagent budget/depth limits and policy enforcement | 2d | P1-5 | Policy tests and abuse tests pass |
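The hourly bucket compaction in P3-4 can be pictured as a group-by on the truncated timestamp. The sketch below assumes usage events arrive as `(timestamp, metric, amount)` tuples with timezone-aware timestamps; the metric names and the output record shape are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def compact_hourly(events):
    """Sum (timestamp, metric, amount) events into one total per UTC hour and metric."""
    buckets = defaultdict(int)
    for ts, metric, amount in events:
        # Normalize to UTC, then truncate to the start of the hour.
        hour = ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
        buckets[(hour, metric)] += amount
    return [
        {"hour": hour.isoformat(), "metric": metric, "total": total}
        for (hour, metric), total in sorted(buckets.items())
    ]


events = [
    (datetime(2025, 1, 1, 10, 5, tzinfo=timezone.utc), "tokens", 120),
    (datetime(2025, 1, 1, 10, 59, 59, tzinfo=timezone.utc), "tokens", 30),
    (datetime(2025, 1, 1, 11, 0, tzinfo=timezone.utc), "tokens", 5),
    (datetime(2025, 1, 1, 10, 30, tzinfo=timezone.utc), "tool_calls", 1),
]
buckets = compact_hourly(events)
# buckets[0] == {"hour": "2025-01-01T10:00:00+00:00", "metric": "tokens", "total": 150}
```

Keying buckets on `(hour, metric)` means re-sending a bucket can be handled as an upsert on the Hub side, which fits the "reliably posted" exit criterion.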
## Phase 4: Provisioner Retool (Weeks 5-7)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P4-1 | Add OpenClaw + Safety deployment steps to provisioner | 4d | P3-2 | Fresh VPS comes online with heartbeat |
| P4-2 | Remove legacy stack templates and nginx configs from default deployment path | 2d | P0-2 | Deprecated stacks excluded from installs |
| P4-3 | Generate and deploy tenant configs/policies during provisioning | 3d | P2-2,P4-1 | Config sync succeeds on first boot |
| P4-4 | Migrate initial browser setup scenarios to OpenClaw browser tool | 4d | P4-1 | 8 scenarios replaced or retired |
| P4-5 | Add idempotent recovery checkpoints per provisioning step | 2d | P4-1 | Retry from failed step validated |
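One possible shape for the recovery checkpoints in P4-5 is a small runner that records each completed step and skips it on retry. This is a sketch under assumptions: the JSON checkpoint file, `run_with_checkpoints`, and the step names are hypothetical and not taken from the existing provisioner.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def run_with_checkpoints(
    steps: list[tuple[str, Callable[[], None]]], checkpoint_file: Path
) -> list[str]:
    """Run steps in order, skipping any already recorded as completed."""
    done: list[str] = []
    if checkpoint_file.exists():
        done = json.loads(checkpoint_file.read_text(encoding="utf-8"))
    for name, action in steps:
        if name in done:
            continue  # finished in an earlier run
        action()  # an exception aborts the run; earlier steps stay recorded
        done.append(name)
        # Write to a temp file, then rename, so a crash never leaves a torn file.
        tmp = checkpoint_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(done), encoding="utf-8")
        os.replace(tmp, checkpoint_file)
    return done


# Demo: the second step fails once, then succeeds on retry.
calls: list[str] = []
attempts = {"deploy_stack": 0}


def install_docker() -> None:
    calls.append("install_docker")


def deploy_stack() -> None:
    attempts["deploy_stack"] += 1
    if attempts["deploy_stack"] == 1:
        raise RuntimeError("transient failure")
    calls.append("deploy_stack")


steps = [("install_docker", install_docker), ("deploy_stack", deploy_stack)]
with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt = Path(tmp_dir) / "checkpoints.json"
    try:
        run_with_checkpoints(steps, ckpt)
    except RuntimeError:
        pass  # first run stops at deploy_stack
    completed = run_with_checkpoints(steps, ckpt)  # resumes at deploy_stack
```

A crash between `action()` and the checkpoint write re-runs that step on retry, so each step still has to be safe to repeat; the checkpoint only avoids redoing work that is known to have finished.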
## Phase 5: Customer Interfaces (Weeks 6-9)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P5-1 | Customer web portal for approvals, agent settings, usage | 5d | P2-3,P2-4 | Beta usable on staging |
| P5-2 | Mobile app MVP (chat, approvals, health, usage) | 8d | P2-5,P5-1 | TestFlight/internal distribution ready |
| P5-3 | Public onboarding website + classifier + bundle calculator | 6d | P2-1 | Stripe flow works end-to-end |
| P5-4 | WhatsApp/Telegram fallback relay (minimal) | 3d | P2-3 | Approval fallback path works |
## Phase 6: Workflow Templates + Demo Experience (Weeks 8-10)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P6-1 | Implement 4 first-hour workflow templates as auditable blueprints | 5d | P3-3,P5-1 | Templates executable end-to-end |
| P6-2 | Build interactive demo tenant pool manager (TTL snapshots) | 4d | P4-1,P5-3 | Demo session provisioning <5 min |
| P6-3 | Add product telemetry for template completion and demo conversion | 2d | P6-1,P6-2 | Metrics dashboards live |
## Phase 7: Quality, Hardening, Launch (Weeks 10-12)
| ID | Task | Duration | Depends On | Exit Criteria |
|---|---|---:|---|---|
| P7-1 | Full security test suite (redaction, gating, injection, auth) | 4d | P3-5,P4-5 | Critical findings resolved |
| P7-2 | Load, soak, and chaos tests on staging fleet | 3d | P6-1 | SLO gates met |
| P7-3 | Canary launch (5% -> 25% -> 100%) with rollback drills | 4d | P7-1,P7-2 | Canary metrics stable |
| P7-4 | Launch readiness review + runbook finalization | 2d | P7-3 | Founding member launch sign-off |
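The staged canary in P7-3 needs a stable way to decide which tenants belong to each stage. One common approach, shown here as an assumption rather than the plan's chosen mechanism, is to hash the tenant ID into a fixed bucket so that every tenant in the 5% stage is also in the 25% stage. `rollout_bucket` and `in_canary` are hypothetical names, and applying the percentages to tenants (rather than traffic) is also an assumption.

```python
import hashlib

STAGES = (5, 25, 100)  # rollout percentages from P7-3


def rollout_bucket(tenant_id: str) -> int:
    """Map a tenant ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_canary(tenant_id: str, percent: int) -> bool:
    """True if the tenant is included at the given rollout percentage."""
    return rollout_bucket(tenant_id) < percent


tenants = [f"tenant-{i}" for i in range(1000)]
for percent in STAGES:
    included = sum(in_canary(t, percent) for t in tenants)
    print(f"{percent:>3}% stage: {included} of {len(tenants)} tenants")
```

Because the bucket depends only on the tenant ID, a rollback drill can lower the percentage and the same tenants drop out in reverse order, with no stored assignment table.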
## 3. Dependency Graph
```mermaid
graph TD
P0_1[P0-1 n8n cleanup] --> P0_2[P0-2 deprecated deploy removal]
P0_1 --> P0_3[P0-3 plaintext secret fix]
P0_2 --> P0_4[P0-4 baseline security tests]
P0_3 --> P0_4
P0_4 --> P1_1[P1-1 vault]
P1_1 --> P1_2[P1-2 egress proxy]
P1_1 --> P1_3[P1-3 classification]
P1_3 --> P1_4[P1-4 approval cache]
P1_2 --> P1_5[P1-5 openclaw plugin skeleton]
P1_3 --> P1_5
P0_4 --> P2_1[P2-1 hub prisma models]
P2_1 --> P2_2[P2-2 tenant register/heartbeat/config]
P2_1 --> P2_3[P2-3 approval APIs]
P2_1 --> P2_4[P2-4 usage ingest]
P2_3 --> P2_5[P2-5 push notifications]
P1_5 --> P3_1[P3-1 guarded executors]
P1_1 --> P3_2[P3-2 tool registry + secret ref]
P3_1 --> P3_2
P3_2 --> P3_3[P3-3 tool adapters]
P1_5 --> P3_4[P3-4 metering]
P2_4 --> P3_4
P1_5 --> P3_5[P3-5 subagent controls]
P3_2 --> P4_1[P4-1 provisioner openclaw+safety]
P0_2 --> P4_2[P4-2 legacy stack template removal]
P2_2 --> P4_3[P4-3 config generation]
P4_1 --> P4_3
P4_1 --> P4_4[P4-4 browser scenario migration]
P4_1 --> P4_5[P4-5 idempotent checkpoints]
P2_3 --> P5_1[P5-1 customer portal]
P2_4 --> P5_1
P2_5 --> P5_2[P5-2 mobile app MVP]
P5_1 --> P5_2
P2_1 --> P5_3[P5-3 onboarding website]
P2_3 --> P5_4[P5-4 whatsapp/telegram fallback]
P3_3 --> P6_1[P6-1 first-hour templates]
P5_1 --> P6_1
P4_1 --> P6_2[P6-2 interactive demo pool]
P5_3 --> P6_2
P6_1 --> P6_3[P6-3 template/demo telemetry]
P6_2 --> P6_3
P3_5 --> P7_1[P7-1 full security suite]
P4_5 --> P7_1
P6_1 --> P7_2[P7-2 load/soak/chaos]
P7_1 --> P7_3[P7-3 canary launch]
P7_2 --> P7_3
P7_3 --> P7_4[P7-4 launch readiness]
```
## 4. Critical Path
- Primary critical chain: `P0 cleanup -> P1 safety substrate -> P3 execution layer -> P4 provisioner retool -> P7 hardening/canary`
- Secondary critical chain: `P2 Hub APIs -> P5 mobile approvals -> P7 canary`
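The chains above are stated at phase level. As a cross-check, the task-level longest path can be computed from the WBS durations and dependencies. The sketch below copies both from the tables in section 2 and assumes durations are working days with no staffing limits, so it gives a lower bound on the schedule, not the schedule itself. `critical_path` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# task: (duration in working days, dependencies), copied from the WBS tables.
TASKS = {
    "P0-1": (3, []),
    "P0-2": (2, ["P0-1"]),
    "P0-3": (2, ["P0-1"]),
    "P0-4": (1, ["P0-2", "P0-3"]),
    "P1-1": (3, ["P0-4"]),
    "P1-2": (4, ["P1-1"]),
    "P1-3": (3, ["P1-1"]),
    "P1-4": (2, ["P1-3"]),
    "P1-5": (3, ["P1-2", "P1-3"]),
    "P2-1": (2, ["P0-4"]),
    "P2-2": (3, ["P2-1"]),
    "P2-3": (3, ["P2-1"]),
    "P2-4": (3, ["P2-1"]),
    "P2-5": (2, ["P2-3"]),
    "P3-1": (5, ["P1-5"]),
    "P3-2": (3, ["P1-1", "P3-1"]),
    "P3-3": (6, ["P3-2"]),
    "P3-4": (2, ["P1-5", "P2-4"]),
    "P3-5": (2, ["P1-5"]),
    "P4-1": (4, ["P3-2"]),
    "P4-2": (2, ["P0-2"]),
    "P4-3": (3, ["P2-2", "P4-1"]),
    "P4-4": (4, ["P4-1"]),
    "P4-5": (2, ["P4-1"]),
    "P5-1": (5, ["P2-3", "P2-4"]),
    "P5-2": (8, ["P2-5", "P5-1"]),
    "P5-3": (6, ["P2-1"]),
    "P5-4": (3, ["P2-3"]),
    "P6-1": (5, ["P3-3", "P5-1"]),
    "P6-2": (4, ["P4-1", "P5-3"]),
    "P6-3": (2, ["P6-1", "P6-2"]),
    "P7-1": (4, ["P3-5", "P4-5"]),
    "P7-2": (3, ["P6-1"]),
    "P7-3": (4, ["P7-1", "P7-2"]),
    "P7-4": (2, ["P7-3"]),
}


def critical_path(tasks):
    """Return (total days, task list) for the longest dependency chain."""
    graph = {task: deps for task, (_, deps) in tasks.items()}
    finish, prev = {}, {}
    for task in TopologicalSorter(graph).static_order():
        duration, deps = tasks[task]
        # The latest-finishing dependency decides when this task can start.
        prev[task] = max(deps, key=lambda d: finish[d], default=None)
        finish[task] = duration + (finish[prev[task]] if prev[task] else 0)
    node = max(finish, key=finish.get)
    total, path = finish[node], []
    while node is not None:
        path.append(node)
        node = prev[node]
    return total, path[::-1]


total_days, path = critical_path(TASKS)
print(total_days, " -> ".join(path))
```

With the durations as listed, this yields 44 working days along `P0-1 -> P0-2 -> P0-4 -> P1-1 -> P1-2 -> P1-5 -> P3-1 -> P3-2 -> P3-3 -> P6-1 -> P7-2 -> P7-3 -> P7-4` (P0-3 ties with P0-2). On dependencies alone, the P4 branch into P7-1 finishes 4 days before P7-2, so the P3-3 -> P6-1 -> P7-2 leg may deserve the same scrutiny as the provisioner retool once staffing is factored in.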
## 5. Parallelization Strategy
To meet the 12-week target, run these tracks in parallel after Week 3:
- Track A: Safety Wrapper + adapters (P3)
- Track B: Provisioner retool (P4)
- Track C: Customer interfaces (P5)
## 6. Definition Of Done (Program-Level)
The launch gate passes only when all of the following are true:
- secrets-never-leave-server invariant passes the automated red-team test suite
- gating matrix behaves as specified for every combination of the 5 command classes and 3 autonomy levels
- external comms gate enforces lock-by-default at all autonomy levels
- provisioning succeeds on >=90% of first attempts and >=99% with retries
- approval path works across web + mobile push with a complete audit trail
- usage metering reconciles with provider usage to within 1%
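The two numeric gates (provisioning success rates and metering reconciliation) are simple enough to automate. The following sketch shows one way to evaluate them; the function names are hypothetical, and it assumes "variance" means the relative difference against the provider's figure. Integer arithmetic is used so the boundary cases (exactly 90%, 99%, or 1%) are not affected by floating-point rounding.

```python
def provisioning_gate(first_attempt_ok: int, retry_ok: int, total: int) -> bool:
    """Check >=90% success on first attempt and >=99% including retries.

    first_attempt_ok: orders that succeeded on the first attempt
    retry_ok: orders that succeeded only after one or more retries
    total: all provisioning orders attempted
    """
    if total <= 0:
        return False  # no data means the gate cannot pass
    first_pass = first_attempt_ok * 100 >= 90 * total
    overall_pass = (first_attempt_ok + retry_ok) * 100 >= 99 * total
    return first_pass and overall_pass


def metering_gate(metered_units: int, provider_units: int) -> bool:
    """Check that metered usage is within 1% of provider-reported usage."""
    if provider_units == 0:
        return metered_units == 0
    return abs(metered_units - provider_units) * 100 <= provider_units


print(provisioning_gate(first_attempt_ok=90, retry_ok=9, total=100))  # True
print(provisioning_gate(first_attempt_ok=95, retry_ok=3, total=100))  # False
print(metering_gate(metered_units=1010, provider_units=1000))         # True
print(metering_gate(metered_units=1011, provider_units=1000))         # False
```

The second provisioning case fails on the overall rate (98%) even though the first-attempt rate is well above 90%, which is why the two thresholds are checked separately.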