
# LetsBe Platform — Comprehensive Architecture Analysis

Generated: 2026-02-25 · Scope: all 5 repositories in the LetsBe workspace


## Repository 1: letsbe-hub

### 1. Overview

  • Language/Framework: TypeScript (strict), Next.js 16.1.1 (App Router, standalone output, Turbopack), React 19.2.3
  • Runtime: Node.js 20 (Alpine in Docker)
  • Approximate LOC: ~12,000–15,000 across 244 source files (156 .ts + 88 .tsx)
  • Key Dependencies:
    • ORM: Prisma 7.0.0 with @prisma/adapter-pg (PostgreSQL native driver)
    • Auth: NextAuth.js 5.0.0-beta.30 (Credentials provider, JWT sessions)
    • Payments: Stripe 17.7.0
    • SSH: ssh2 1.17.0
    • State: TanStack Query v5, Zustand 5.0.3
    • UI: Tailwind CSS 3.4.17, shadcn/ui (Radix primitives), Recharts 3.6.0
    • Validation: Zod 3.24.1
    • HTTP: undici 7.18.2 (Portainer HTTPS with self-signed certs)
    • Email: nodemailer 7.0.12
    • 2FA: otplib 13.1.0, qrcode 1.5.4
    • Storage: @aws-sdk/client-s3 3.968.0 (S3/MinIO)
    • Testing: Vitest 4.0.16, @testing-library/react 16.3.1
  • Deployment:
    • Dockerfile: Multi-stage build (deps → builder → runner), Node 20 Alpine, installs docker-cli, runs startup.sh (Prisma migrate deploy → node server.js)
    • deploy/docker-compose.yml: db (postgres:16-alpine) + hub (image: code.letsbe.solutions/letsbe/hub), port 127.0.0.1:3847:3000, mounts Docker socket + jobs/logs dirs
    • .gitea/workflows/build.yml: Gitea Actions — lint+typecheck job → Docker build+push to code.letsbe.solutions/letsbe/hub
    • deploy/nginx/hub.conf: Reverse proxy to localhost:3847

### 2. Architecture

#### Entry Points

  • Dev: npm run dev → next dev → http://localhost:3000
  • Production: startup.sh → prisma migrate deploy, then node server.js
  • DB Seed: npm run db:seed → tsx prisma/seed.ts
  • Root page: src/app/page.tsx — redirects to /admin or /login

#### Core Modules

| Module | Path | Description |
| --- | --- | --- |
| automationWorker | src/lib/services/automation-worker.ts | State machine for order provisioning flow (PAYMENT_CONFIRMED → AWAITING_SERVER → SERVER_READY → DNS_PENDING → DNS_READY → PROVISIONING → FULFILLED). Handles AUTO/MANUAL/PAUSED modes. |
| configGenerator | src/lib/services/config-generator.ts | Generates JobConfig JSON (server IP, tools, domain, licenseKey, registry creds) for provisioning containers. Decrypts stored passwords. |
| dockerSpawner | src/lib/services/docker-spawner.ts | Spawns docker run for letsbe/ansible-runner containers. Mounts config.json + logs dir. Limits concurrency via DOCKER_MAX_CONCURRENT. |
| jobService | src/lib/services/job-service.ts | Manages ProvisioningJob lifecycle: create, claim (SELECT FOR UPDATE SKIP LOCKED), complete, fail (retry backoff: 1 min/5 min/15 min). |
| dnsService | src/lib/services/dns-service.ts | DNS A-record verification for all required subdomains per tool before provisioning. Wildcard detection. |
| credentialService | src/lib/services/credential-service.ts | AES-256-CBC encrypt/decrypt using CREDENTIAL_ENCRYPTION_KEY with scrypt key derivation. Legacy key migration support. |
| netcupService | src/lib/services/netcup-service.ts (~1150 lines) | Full Netcup SCP API integration via OAuth2 Device Flow. Server list/detail, power actions, metrics, snapshots, rescue mode, hostname, reinstall. |
| portainerClient | src/lib/services/portainer-client.ts (~707 lines) | Portainer API client (JWT auth, self-signed cert via undici). Container list/inspect/logs/stats/start/stop/restart/remove. |
| stripeService | src/lib/services/stripe-service.ts | Stripe webhook verification, checkout session creation, price-to-plan mapping (STARTER/PRO). |
| emailService | src/lib/services/email-service.ts | Nodemailer SMTP wrapper. Sends welcome/test/notification emails. Lazy-loads transporter. |
| settingsService | src/lib/services/settings-service.ts (~663 lines) | 50+ system settings in SystemSetting table. Encrypted with SETTINGS_ENCRYPTION_KEY. Categories: docker, dockerhub, gitea, hub, provisioning, netcup, email, notifications, storage. |
| permissionService | src/lib/services/permission-service.ts | Role-based permission matrix (OWNER/ADMIN/MANAGER/SUPPORT) with 20 permission types. |
| apiKeyService | src/lib/services/api-key-service.ts | Hub API key generation (hk_* prefix) and SHA-256 hash validation. |
| containerHealthService | src/lib/services/container-health-service.ts | Polls Portainer for container state, detects crashes/OOM/restarts, records ContainerEvent. |
| errorDetectionService | src/lib/services/error-detection-service.ts | Regex pattern matching against container log lines using ErrorDetectionRule rules. |
| statsCollectionService | src/lib/services/stats-collection-service.ts | Collects CPU/memory/disk/network from Netcup + container counts from Portainer. Stores as ServerStatsSnapshot (90-day retention). |
| storageService | src/lib/services/storage-service.ts | S3-compatible (MinIO) storage for staff profile photos. |
| totpService | src/lib/services/totp-service.ts | TOTP/HOTP via otplib. QR code generation, secret encryption, backup codes. |
| securityVerificationService | src/lib/services/security-verification-service.ts | 8-digit codes for destructive actions (WIPE/REINSTALL) on enterprise servers, emailed to contact. |
| SSHClient | src/lib/ssh/client.ts | SSH2 wrapper: connect, runCommand, uploadContent (SFTP), streamCommand, testConnection. |
| ansible/runner | src/lib/ansible/runner.ts | SSH-based inline provisioning: uploads bootstrap script, installs Docker, deploys orchestrator + sysadmin-agent containers. 30-minute timeout. |
| hooks/ | src/hooks/ (14 files) | React Query hooks for all admin features: analytics, automation, customers, dns, enterprise-clients, netcup, orders, portainer, profile, provisioning-logs, servers, settings, staff, stats, two-factor. |

#### Data Models (Prisma schema — prisma/schema.prisma)

Core Business Models:

| Model | Table | Key Fields |
| --- | --- | --- |
| User | users | id, email (unique), passwordHash, name, company, status (PENDING_VERIFICATION/ACTIVE/SUSPENDED), twoFactorEnabled, twoFactorSecretEnc, backupCodesEnc |
| Staff | staff | id, email (unique), passwordHash, name, role (OWNER/ADMIN/MANAGER/SUPPORT), status (ACTIVE/SUSPENDED), profilePhotoKey, 2FA fields |
| StaffInvitation | staff_invitations | id, email, role, token (unique), expiresAt, invitedBy |
| Subscription | subscriptions | id, userId, plan (TRIAL/STARTER/PRO/ENTERPRISE), tier (HUB_DASHBOARD/ADVANCED), tokenLimit, tokensUsed, trialEndsAt, stripeCustomerId, stripeSubscriptionId, status |
| Order | orders | id, userId, status (8 states: PAYMENT_CONFIRMED→FULFILLED/FAILED), tier, domain, tools[], configJson, automationMode (AUTO/MANUAL/PAUSED), customer, companyName, licenseKey, serverIp, serverPasswordEncrypted, sshPort, netcupServerId, portainerUrl, dashboardUrl, portainerUsername, portainerPasswordEnc, failureReason, timestamps |
| DnsVerification | dns_verifications | id, orderId (unique), wildcardPassed, manualOverride, allPassed, totalSubdomains, passedCount, lastCheckedAt, verifiedAt |
| DnsRecord | dns_records | id, dnsVerificationId, subdomain, fullDomain, expectedIp, resolvedIp, status (PENDING/VERIFIED/MISMATCH/NOT_FOUND/ERROR/SKIPPED) |
| ProvisioningJob | provisioning_jobs | id, orderId, jobType, status (PENDING/CLAIMED/RUNNING/COMPLETED/FAILED/DEAD), priority, claimedAt, claimedBy, containerName, attempt, maxAttempts (3), nextRetryAt, configSnapshot (JSON), runnerTokenHash, result, error |
| JobLog | job_logs | id, jobId, level, message, step, progress, timestamp |
| ProvisioningLog | provisioning_logs | id, orderId, level, message, step, timestamp |
| TokenUsage | token_usage | id, userId, instanceId, operation, tokensInput, tokensOutput, model, createdAt |
| RunnerToken | runner_tokens | id, tokenHash (unique), name, isActive, lastUsed |
| ServerConnection | server_connections | id, orderId (unique), registrationToken (unique), hubApiKey, hubApiKeyHash, orchestratorUrl, agentVersion, status (PENDING/REGISTERED/ONLINE/OFFLINE), lastHeartbeat |
| RemoteCommand | remote_commands | id, serverConnectionId, type (SHELL/RESTART_SERVICE/UPDATE/ECHO), payload, status (PENDING/SENT/EXECUTING/COMPLETED/FAILED), result, errorMessage, timestamps |
| SystemSetting | system_settings | id, key (unique), value, encrypted, category |

Enterprise/Monitoring Models:

Model Table Key Fields
EnterpriseClient enterprise_clients id, name, companyName, contactEmail, contactPhone, notes, isActive
EnterpriseServer enterprise_servers id, clientId, netcupServerId, nickname, purpose, isActive, portainerUrl, portainerUsername, portainerPasswordEnc
ServerStatsSnapshot server_stats_snapshots id, serverId, clientId, timestamp, cpuPercent, memoryUsedMb, memoryTotalMb, diskReadMbps, diskWriteMbps, networkInMbps, networkOutMbps, containersRunning, containersStopped
ErrorDetectionRule error_detection_rules id, clientId, name, pattern (regex), severity, isActive, description
DetectedError detected_errors id, serverId, ruleId, containerId, containerName, logLine, context, timestamp, acknowledgedAt, acknowledgedBy
SecurityVerificationCode security_verification_codes id, clientId, code, action (WIPE/REINSTALL), targetServerId, expiresAt, usedAt, attempts
LogScanPosition log_scan_positions id, serverId, containerId, lastLineCount, lastLogHash, lastScannedAt
ContainerStateSnapshot container_state_snapshots id, serverId, containerId, containerName, state, exitCode, capturedAt
ContainerEvent container_events id, serverId, containerId, containerName, eventType (CRASH/OOM_KILLED/RESTART/STOPPED), exitCode, details, acknowledgedAt
NotificationSetting notification_settings id, clientId (unique), enabled, criticalErrorsOnly, containerCrashes, recipients[], cooldownMinutes
NotificationCooldown notification_cooldowns id, type (unique), lastSentAt
Pending2FASession pending_2fa_sessions id, token (unique), userId, userType, email, name, role, company, subscription, expiresAt
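SecurityVerificationCode gates destructive actions (WIPE/REINSTALL) with an 8-digit emailed code tracked by expiresAt, usedAt, and attempts. A sketch of that check, with field names mirroring the model; the TTL and attempt cap are assumptions, not values from the Hub source:

```python
import secrets
from datetime import datetime, timedelta

# Assumed policy values -- the real service's TTL and attempt cap are unknown.
CODE_TTL = timedelta(minutes=15)
MAX_CODE_ATTEMPTS = 5

def generate_code() -> str:
    """Uniform, zero-padded 8-digit numeric code."""
    return f"{secrets.randbelow(10**8):08d}"

def is_valid(code_row: dict, submitted: str, now: datetime) -> bool:
    """Accept a submitted code only if unused, unexpired, under the attempt
    cap, and matching via a constant-time comparison."""
    return (
        code_row["usedAt"] is None
        and code_row["attempts"] < MAX_CODE_ATTEMPTS
        and now < code_row["expiresAt"]
        and secrets.compare_digest(code_row["code"], submitted)
    )
```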

#### API Endpoints

Authentication:

| Method | Path | Description |
| --- | --- | --- |
| GET/POST | /api/auth/[...nextauth] | NextAuth handlers (Credentials provider) |
| GET | /api/v1/auth/invite/[token] | Look up staff invitation by token |
| POST | /api/v1/auth/accept-invite | Accept staff invitation, set password |
| GET | /api/v1/auth/2fa/status | Get 2FA enabled status |
| POST | /api/v1/auth/2fa/setup | Generate TOTP secret + QR code |
| POST | /api/v1/auth/2fa/verify | Verify TOTP code to enable 2FA |
| POST | /api/v1/auth/2fa/disable | Disable 2FA |
| GET | /api/v1/auth/2fa/backup-codes | Get/regenerate backup codes |
| POST | /api/v1/setup | One-time initial owner account creation |

Profile:

| Method | Path | Description |
| --- | --- | --- |
| GET/PATCH | /api/v1/profile | Get/update current user profile |
| POST | /api/v1/profile/photo | Upload profile photo to S3/MinIO |
| POST | /api/v1/profile/password | Change password |

Admin — Customers:

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/admin/customers | List customers (pagination, search) |
| POST | /api/v1/admin/customers | Create customer |
| GET | /api/v1/admin/customers/[id] | Get customer detail with orders |
| PATCH | /api/v1/admin/customers/[id] | Update customer |

Admin — Orders:

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/admin/orders | List orders (filter by status, tier, search) |
| POST | /api/v1/admin/orders | Create order (staff-initiated, MANUAL mode) |
| GET | /api/v1/admin/orders/[id] | Get order detail |
| PATCH | /api/v1/admin/orders/[id] | Update order (credentials, status) |
| POST | /api/v1/admin/orders/[id]/provision | Spawn Docker provisioning container |
| GET | /api/v1/admin/orders/[id]/logs/stream | SSE stream provisioning logs |
| GET | /api/v1/admin/orders/[id]/dns | Get DNS verification status |
| POST | /api/v1/admin/orders/[id]/dns/verify | Trigger DNS verification |
| POST | /api/v1/admin/orders/[id]/dns/skip | Manual DNS override |
| GET/PATCH | /api/v1/admin/orders/[id]/automation | Get/change automation mode |
| GET | /api/v1/admin/orders/[id]/containers | List containers via Portainer |
| GET | /api/v1/admin/orders/[id]/containers/stats | All container stats |
| GET | /api/v1/admin/orders/[id]/containers/[cId] | Container detail |
| POST | /api/v1/admin/orders/[id]/containers/[cId]/[action] | start/stop/restart container |
| GET | /api/v1/admin/orders/[id]/containers/[cId]/logs | Container logs |
| GET | /api/v1/admin/orders/[id]/containers/[cId]/stats | Container CPU/memory stats |
| GET | /api/v1/admin/orders/[id]/portainer | Get Portainer credentials |
| POST | /api/v1/admin/orders/[id]/portainer/init | Initialize Portainer endpoint |
| POST | /api/v1/admin/orders/[id]/test-ssh | Test SSH connection |

Admin — Servers:

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/admin/servers | List servers (from fulfilled orders) |
| GET | /api/v1/admin/servers/[id]/health | Server health / connection status |
| POST | /api/v1/admin/servers/[id]/command | Queue remote command for orchestrator |
| POST | /api/v1/admin/portainer/ping | Test Portainer connectivity |

Admin — Netcup:

| Method | Path | Description |
| --- | --- | --- |
| GET/POST/DELETE | /api/v1/admin/netcup/auth | OAuth2 Device Flow (initiate/poll/disconnect) |
| GET | /api/v1/admin/netcup/servers | List Netcup servers |
| GET | /api/v1/admin/netcup/servers/[id] | Server detail |
| PATCH | /api/v1/admin/netcup/servers/[id] | Power action / hostname / nickname |
| GET | /api/v1/admin/netcup/servers/[id]/metrics | CPU, disk, network metrics |
| GET/POST | /api/v1/admin/netcup/servers/[id]/snapshots | List/create snapshots |
| DELETE/POST | /api/v1/admin/netcup/servers/[id]/snapshots | Delete/revert snapshot |

Admin — Enterprise Clients:

| Method | Path | Description |
| --- | --- | --- |
| GET/POST | /api/v1/admin/enterprise-clients | List/create enterprise clients |
| GET | /api/v1/admin/enterprise-clients/error-summary | Aggregated error summary |
| GET/PATCH/DELETE | /api/v1/admin/enterprise-clients/[id] | CRUD |
| GET | /api/v1/admin/enterprise-clients/[id]/stats | Client stats history |
| GET | /api/v1/admin/enterprise-clients/[id]/errors | Detected errors |
| POST | /api/v1/admin/enterprise-clients/[id]/errors/[eId]/acknowledge | Acknowledge error |
| GET/POST | /api/v1/admin/enterprise-clients/[id]/error-rules | Error detection rules CRUD |
| GET/PATCH/DELETE | /api/v1/admin/enterprise-clients/[id]/error-rules/[rId] | Manage individual rule |
| GET | /api/v1/admin/enterprise-clients/[id]/error-dashboard | Error dashboard |
| GET | /api/v1/admin/enterprise-clients/[id]/container-events | Container crash/OOM events |
| GET/PATCH | /api/v1/admin/enterprise-clients/[id]/notifications | Notification settings |
| GET/POST | /api/v1/admin/enterprise-clients/[id]/servers | Servers for client |
| GET/PATCH/DELETE | .../servers/[sId] | Server CRUD |
| GET | .../servers/[sId]/stats | Server stats |
| GET | .../servers/[sId]/containers | Container list |
| GET | .../servers/[sId]/containers/[cId] | Container detail |
| GET | .../servers/[sId]/containers/[cId]/logs | Container logs |
| POST | .../servers/[sId]/actions | WIPE/REINSTALL (with security code) |
| POST | .../servers/[sId]/verify | Verify security code |
| POST | .../servers/[sId]/test-portainer | Test Portainer connection |

Admin — Staff:

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/admin/staff | List staff members |
| POST | /api/v1/admin/staff/invite | Send invitation email |
| GET | /api/v1/admin/staff/invites | List pending invitations |
| DELETE | /api/v1/admin/staff/invites/[id] | Cancel invitation |
| PATCH/DELETE | /api/v1/admin/staff/[id] | Update/delete staff |

Admin — Settings & Analytics:

| Method | Path | Description |
| --- | --- | --- |
| GET/POST | /api/v1/admin/settings | List all / batch update settings |
| GET/PATCH | /api/v1/admin/settings/[key] | Get/set single setting |
| POST | /api/v1/admin/settings/email/test | Send test email |
| POST | /api/v1/admin/settings/storage/test | Test S3/MinIO |
| GET | /api/v1/admin/stats | Dashboard statistics |
| GET | /api/v1/admin/analytics | Analytics data |

Orchestrator Phone-Home (called BY orchestrator instances):

| Method | Path | Auth | Description |
| --- | --- | --- | --- |
| POST | /api/v1/orchestrator/register | registrationToken in body | Orchestrator registers post-deploy, receives hubApiKey |
| POST | /api/v1/orchestrator/heartbeat | Bearer {hubApiKey} | Status heartbeat, credential sync, receive queued commands |
| GET | /api/v1/orchestrator/commands | Bearer {hubApiKey} | Poll for pending commands |
| POST | /api/v1/orchestrator/commands | Bearer {hubApiKey} | Report command execution result |
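From the orchestrator side, the heartbeat call above is a plain authenticated POST. A minimal request-construction sketch (stdlib only, no request sent); the endpoint and Bearer header come from the table, while the payload fields are assumptions:

```python
import json
import urllib.request

def build_heartbeat_request(hub_url: str, hub_api_key: str,
                            status: dict) -> urllib.request.Request:
    """Build (but do not send) the phone-home heartbeat request.

    The payload shape here is illustrative -- the real orchestrator's
    heartbeat body is not documented in this analysis.
    """
    return urllib.request.Request(
        url=f"{hub_url}/api/v1/orchestrator/heartbeat",
        data=json.dumps(status).encode(),
        headers={
            "Authorization": f"Bearer {hub_api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The same pattern (swap the path and method) covers the command-poll and result-report endpoints.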

Public API:

| Method | Path | Auth | Description |
| --- | --- | --- | --- |
| POST | /api/v1/public/orders | X-API-Key | Create order from external source (website), AUTO mode |
| GET | /api/v1/public/orders | X-API-Key | Get order status |

Webhooks:

| Method | Path | Auth | Description |
| --- | --- | --- | --- |
| POST | /api/v1/webhooks/stripe | Stripe signature | Handles checkout.session.completed → creates User, Subscription, Order |

Cron:

| Method | Path | Auth | Description |
| --- | --- | --- | --- |
| GET/POST | /api/cron/collect-stats | Bearer $CRON_SECRET | Stats + log scanning + container health |
| GET | /api/cron/cleanup-stats | Bearer $CRON_SECRET | Purge old stats (90-day retention) |

#### Authentication / Authorization

  • Staff login: NextAuth.js v5 Credentials provider → bcrypt compare → optional TOTP 2FA (via Pending2FASession table, 5-min TTL) → JWT (7-day maxAge, 24h refresh)
  • Customer login: Same flow, userType: 'customer'
  • JWT payload: { id, userType, role, email, name, company, subscription }
  • Route protection: NextAuth auth() middleware — /admin/* requires userType === 'staff'
  • Permission matrix: permissionService — 20 permissions across OWNER/ADMIN/MANAGER/SUPPORT roles
  • Orchestrator API keys: hk_* prefix, SHA-256 hash storage, Bearer header
  • Public API: X-API-Key: $PUBLIC_API_KEY env var comparison
  • Cron: Authorization: Bearer $CRON_SECRET
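The hk_* API-key scheme above (generate a prefixed key, persist only its SHA-256 hash, compare on presentation) can be sketched as follows. The key length is an assumption; the prefix and hashing come from the source:

```python
import hashlib
import secrets

def generate_api_key() -> tuple[str, str]:
    """Return (plaintext_key, stored_hash). Only the hash is persisted;
    the plaintext is shown to the caller once. 24 random bytes is an
    assumed length, not taken from apiKeyService."""
    key = "hk_" + secrets.token_hex(24)
    return key, hashlib.sha256(key.encode()).hexdigest()

def validate_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare in constant time."""
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(digest, stored_hash)
```

Storing only the hash means a database leak does not expose usable keys, at the cost of the plaintext being irrecoverable after creation.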

#### Configuration

| Variable | Description |
| --- | --- |
| DATABASE_URL | PostgreSQL connection string |
| NEXTAUTH_URL, NEXTAUTH_SECRET, AUTH_TRUST_HOST | NextAuth config |
| CREDENTIAL_ENCRYPTION_KEY | AES-256-CBC for server passwords |
| SETTINGS_ENCRYPTION_KEY | AES-256-CBC for system settings |
| ENCRYPTION_KEY | Legacy fallback |
| HUB_URL | Public URL of this Hub instance |
| ADMIN_EMAIL, ADMIN_PASSWORD | Initial owner setup |
| PUBLIC_API_KEY | External API auth |
| CRON_SECRET | Cron job auth |
| STRIPE_SECRET_KEY, STRIPE_WEBHOOK_SECRET | Stripe integration |
| STRIPE_STARTER_PRICE_ID, STRIPE_PRO_PRICE_ID | Stripe price IDs |
| DOCKER_REGISTRY_URL, DOCKER_IMAGE_NAME, DOCKER_IMAGE_TAG | Runner container config |
| DOCKER_NETWORK_MODE, DOCKER_MAX_CONCURRENT | Docker spawning limits |
| JOBS_DIR, JOBS_HOST_DIR, LOGS_DIR, LOGS_HOST_DIR | Job file paths |
| TELEMETRY_RETENTION_DAYS | Stats retention (default 90) |
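CREDENTIAL_ENCRYPTION_KEY is not used as an AES key directly: per the credentialService description, a key is derived via scrypt. A derivation-only sketch (the cost parameters and salt handling are assumptions; the actual AES-256-CBC step would use a crypto library rather than the stdlib):

```python
import hashlib

def derive_key(master_secret: str, salt: bytes) -> bytes:
    """Derive a 32-byte key (AES-256 size) from the master secret.

    n/r/p are assumed cost parameters, not values read from the Hub
    source; 2**14 iterations with r=8 stays under OpenSSL's default
    scrypt memory limit.
    """
    return hashlib.scrypt(
        master_secret.encode(),
        salt=salt,
        n=2**14, r=8, p=1,
        dklen=32,
    )
```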

### 3. Inter-Repo Dependencies

| Target Repo | How | Details |
| --- | --- | --- |
| letsbe-ansible-runner | Docker image spawn | docker-spawner.ts spawns code.letsbe.solutions/letsbe/ansible-runner:latest containers for provisioning jobs. Mounts config.json + logs dir. |
| letsbe-orchestrator | Deployed on tenant servers via ansible/runner.ts | Image code.letsbe.solutions/letsbe/letsbe-orchestrator:latest deployed as container letsbe-orchestrator-api on port 8100. Also deploys companion postgres:16 as letsbe-orchestrator-db. |
| letsbe-sysadmin-agent | Deployed on tenant servers via ansible/runner.ts | Image code.letsbe.solutions/letsbe/letsbe-sysadmin:latest deployed as letsbe-sysadmin-agent container. Connected to local orchestrator at http://letsbe-orchestrator-api:8100. |
| letsbe-mcp-browser | No references found | Not deployed or referenced anywhere in Hub codebase. |

APIs exposed that other repos consume:

  • POST /api/v1/orchestrator/register — called by deployed orchestrators
  • POST /api/v1/orchestrator/heartbeat — called by deployed orchestrators (and by sysadmin agent's hub_client.py)
  • GET/POST /api/v1/orchestrator/commands — command queue for remote orchestrators
  • POST /api/v1/jobs/{jobId}/logs — called by ansible-runner to stream logs
  • PATCH /api/v1/jobs/{jobId} — called by ansible-runner to update job status
  • POST /api/v1/instances/activate — called by ansible-runner's local_bootstrap.sh for license validation

### 4. Current State Assessment

Fully implemented and working:

  • Complete admin dashboard with staff management, RBAC, 2FA
  • Customer management (CRUD, subscriptions)
  • Order lifecycle with 8-state automation state machine
  • Netcup server management (full SCP API integration via OAuth2 Device Flow)
  • Portainer integration for container management on provisioned servers
  • DNS verification workflow before provisioning
  • Docker-based provisioning job spawning with retry logic
  • SSE log streaming during provisioning
  • Stripe webhook integration for checkout → order creation
  • Enterprise client management with per-server monitoring
  • Error detection with regex-based rules
  • Container health monitoring (crash/OOM/restart detection)
  • Email notifications with cooldown
  • Server stats collection and analytics with 90-day retention
  • AES-256-CBC credential encryption at rest
  • Security verification codes for destructive operations
  • Staff invitation flow with email
  • S3/MinIO storage for profile photos
  • System settings management with encryption
  • SSH-based inline provisioning as alternative path

Partially implemented:

  • automationMode: AUTO — the state machine exists but transition triggers between some states appear to rely on manual admin actions via the dashboard
  • Token usage tracking (TokenUsage model) — schema exists but no clear ingestion path from orchestrators

Not started / planned:

  • Customer-facing self-service portal (current UI is staff-only admin)
  • No README.md in the repository

Tests:

  • 10 unit test files under src/__tests__/unit/lib/ (services: api-key, automation-worker, config-generator, credential, dns, job, permission, security-verification; plus api/client and auth-helpers)
  • No integration tests, no E2E tests
  • No coverage configuration or thresholds

### 5. External Integrations

| Integration | Library | Usage |
| --- | --- | --- |
| Netcup SCP API | raw fetch | Server management, OAuth2 Device Flow auth |
| Stripe | stripe npm | Checkout sessions, webhooks, plan mapping |
| Portainer API | undici | Container CRUD on provisioned servers (self-signed cert support) |
| S3 / MinIO | @aws-sdk/client-s3 | Object storage for profile photos |
| SMTP | nodemailer | Email dispatch |
| Docker Engine | child_process.exec('docker run ...') | Runner container spawning |
| SSH | ssh2 | Direct server access for provisioning |

## Repository 2: letsbe-orchestrator

### 1. Overview

  • Language/Framework: Python 3.11, FastAPI >=0.109.0, Uvicorn
  • Approximate LOC: ~7,500 across ~50 Python source files
  • Key Dependencies:
    • ORM: SQLAlchemy 2.0 async (asyncpg for PostgreSQL, aiosqlite for tests)
    • Migrations: Alembic 1.13.0
    • Validation: Pydantic v2, pydantic-settings
    • HTTP: httpx 0.26.0
    • Rate Limiting: slowapi 0.1.9
    • Testing: pytest 8.0.0, pytest-asyncio
  • Deployment:
    • Dockerfile: python:3.11-slim, installs gcc + libpq-dev, exposes port 8000
    • docker-compose.yml: db (postgres:16-alpine, port 5434) + api (image: code.letsbe.solutions/letsbe/orchestrator:latest, port 127.0.0.1:8100:8000)
    • docker-compose-dev.yml: Local build, port 5433, DEBUG=true, ADMIN_API_KEY=dev-admin-key-12345
    • docker-compose.local.yml: LOCAL_MODE overlay (adds LOCAL_MODE=true, INSTANCE_ID, LOCAL_AGENT_KEY, Hub telemetry vars)
    • .gitea/workflows/build.yml: test job (pytest) → build job (Docker push to code.letsbe.solutions/letsbe/orchestrator)
    • deploy.sh: Registry login, pull, compose up, Alembic migration after 5s delay
    • nginx.conf: Reverse proxy to 127.0.0.1:8100

### 2. Architecture

#### Entry Points

  • Server: uvicorn app.main:app --host 0.0.0.0 --port 8000
  • Migrations: alembic upgrade head (run inside container via deploy.sh)
  • Tests: pytest -v --tb=short

#### Core Modules

| Module | Role | Description |
| --- | --- | --- |
| app/main.py | App factory | Registers all routers under /api/v1. CORS middleware, RequestIDMiddleware (X-Request-ID), trailing slash normalization, slowapi rate limiting, IntegrityError→409 handler. Lifespan: LocalBootstrapService.run() + HubTelemetryService.start() on startup. |
| app/config.py | Settings | Pydantic BaseSettings with @lru_cache singleton. DATABASE_URL, ADMIN_API_KEY, LOCAL_MODE, HUB_URL, etc. |
| app/db.py | Database | SQLAlchemy async engine factory, session maker, get_db() dependency with rollback-on-error. |
| app/models/ | ORM models | 6 models: Tenant, Agent, Task, Server, Event, RegistrationToken. All use UUID PK + TimestampMixin. |
| app/routes/agents.py (~530 lines) | Agent management | Registration (new token flow + legacy), local registration, heartbeat, list/get agents. Rate limited 5/min on registration. |
| app/routes/tasks.py (~284 lines) | Task CRUD | Create task, list (filter by tenant/status), atomic claim (oldest pending), update status/result. |
| app/routes/tenants.py | Tenant CRUD | Create, list, get, dashboard token set/revoke. |
| app/routes/events.py | Event logging | List, get, create events. |
| app/routes/env.py | Env management | Create ENV_INSPECT/ENV_UPDATE tasks for agent. |
| app/routes/files.py | File management | Create FILE_INSPECT tasks for agent. |
| app/routes/health.py | Health check | Returns {"status":"ok","version":"..."} |
| app/routes/meta.py | Instance metadata | Returns instance_id, local_mode, version, tenant_id, bootstrap_status. |
| app/routes/playbooks.py (~1531 lines) | Playbook orchestration | 10 tool playbooks generating COMPOSITE/PLAYWRIGHT tasks. Tools: Chatwoot, Nextcloud, Poste, Keycloak, n8n, Cal.com, Umami, Uptime Kuma, Vaultwarden, Portainer. |
| app/routes/registration_tokens.py | Token management | Admin-only CRUD for registration tokens with SHA-256 hash storage. |
| app/playbooks/ (10 files) | Step builders | Each file defines `build_*_steps()` → list of CompositeStep, and `create_*_task()` → persists COMPOSITE/PLAYWRIGHT task. |
| app/services/hub_telemetry.py (~271 lines) | Hub telemetry | Background asyncio service sending periodic JSON to {HUB_URL}/api/v1/instances/{INSTANCE_ID}/telemetry. Windowed SQL aggregates, exponential backoff, ±15% jitter. |
| app/services/local_bootstrap.py (~168 lines) | Local mode bootstrap | When LOCAL_MODE=true: waits for DB (30 retries × 2 s), creates/retrieves "local" tenant. Exposes get_local_tenant_id() and get_bootstrap_status(). |
| app/dependencies/auth.py | Agent auth | get_current_agent() — validates X-Agent-Id + X-Agent-Secret via SHA-256 hash comparison. |
| app/dependencies/admin_auth.py | Admin auth | verify_admin_api_key() — validates X-Admin-Api-Key header. |
| app/dependencies/dashboard_auth.py | Dashboard auth | verify_dashboard_token() — validates X-Dashboard-Token per-tenant (defined but not wired to any route). |
| app/dependencies/local_agent_auth.py | Local agent auth | verify_local_agent_key() — validates X-Local-Agent-Key, returns 404 if not LOCAL_MODE. |

#### Data Models (SQLAlchemy — app/models/)

| Table | Key Columns |
| --- | --- |
| tenants | id (UUID), name (unique), domain (unique, nullable), dashboard_token_hash (SHA-256, nullable), timestamps |
| agents | id (UUID), tenant_id (FK→tenants, nullable), name (hostname), version, status (online/offline/invalid), last_heartbeat, token (legacy), secret_hash (SHA-256), registration_token_id (FK), timestamps |
| tasks | id (UUID), tenant_id (FK→tenants), agent_id (FK→agents, nullable), type (VARCHAR(100), indexed), payload (JSONB), status (pending/running/completed/failed, indexed), result (JSONB, nullable), timestamps |
| servers | id (UUID), tenant_id (FK→tenants), hostname, ip_address (IPv4/v6), status (provisioning/ready/error/terminated), timestamps |
| events | id (UUID), tenant_id (FK→tenants, indexed), task_id (FK→tasks, nullable, indexed), event_type (indexed), payload (JSONB), created_at (indexed, immutable) |
| registration_tokens | id (UUID), tenant_id (FK→tenants, indexed), token_hash (SHA-256, indexed), description, max_uses (default 1, 0=unlimited), use_count, expires_at, revoked, created_by, timestamps |

Alembic Migrations: 4 migration files — initial_schema, add_agent_fields (secret_hash, registration_token_id), add_dashboard_token_hash, add_registration_tokens.
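The registration_tokens semantics above (revocable, optionally expiring, max_uses with 0 meaning unlimited) reduce to a small predicate. This is an illustrative sketch over a dict row, not the orchestrator's actual code:

```python
from datetime import datetime

def token_is_usable(row: dict, now: datetime) -> bool:
    """Acceptance rule for a registration token, per the table above:
    not revoked, not expired, and under its use budget (0 = unlimited)."""
    if row["revoked"]:
        return False
    if row["expires_at"] is not None and now >= row["expires_at"]:
        return False
    if row["max_uses"] != 0 and row["use_count"] >= row["max_uses"]:
        return False
    return True
```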

#### API Endpoints

| Method | Path | Auth | Description |
| --- | --- | --- | --- |
| GET | / | None | Welcome message |
| GET | /health | None | Health check |
| GET | /api/v1/meta/instance | None | Instance metadata (LOCAL_MODE, version, bootstrap) |
| POST | /api/v1/tenants | None | Create tenant |
| GET | /api/v1/tenants | None | List tenants |
| GET | /api/v1/tenants/{id} | None | Get tenant |
| POST | /api/v1/tenants/{id}/dashboard-token | AdminAuth | Set dashboard token |
| DELETE | /api/v1/tenants/{id}/dashboard-token | AdminAuth | Revoke dashboard token |
| POST | /api/v1/tasks | None | Create task |
| GET | /api/v1/tasks | None | List tasks (filter by tenant, status) |
| GET | /api/v1/tasks/next | AgentAuth | Atomically claim oldest pending task |
| GET | /api/v1/tasks/{id} | None | Get task |
| PATCH | /api/v1/tasks/{id} | AgentAuth | Update task status/result |
| GET | /api/v1/agents | None | List agents |
| GET | /api/v1/agents/{id} | None | Get agent |
| POST | /api/v1/agents/register | None (rate limited) | Register agent (token or legacy) |
| POST | /api/v1/agents/register-local | LocalAgentKey | Local agent registration (idempotent) |
| POST | /api/v1/agents/{id}/heartbeat | AgentAuth | Heartbeat |
| POST | /api/v1/agents/{id}/env/inspect | None | Create ENV_INSPECT task |
| POST | /api/v1/agents/{id}/env/update | None | Create ENV_UPDATE task |
| POST | /api/v1/agents/{id}/files/inspect | None | Create FILE_INSPECT task |
| POST | /api/v1/tenants/{id}/registration-tokens | AdminAuth | Create registration token |
| GET | /api/v1/tenants/{id}/registration-tokens | AdminAuth | List tokens |
| GET | /api/v1/tenants/{id}/registration-tokens/{tId} | AdminAuth | Get token |
| DELETE | /api/v1/tenants/{id}/registration-tokens/{tId} | AdminAuth | Revoke token |
| GET | /api/v1/events | None | List events |
| GET | /api/v1/events/{id} | None | Get event |
| POST | /api/v1/events | AdminAuth | Create event |
| POST | /api/v1/tenants/{id}/{tool}/setup | None | ENV_UPDATE + DOCKER_RELOAD task (10 tools) |
| POST | /api/v1/tenants/{id}/{tool}/initial-setup | None | PLAYWRIGHT task (8 tools) |

#### Authentication

  • Agent Auth (new): X-Agent-Id + X-Agent-Secret headers → SHA-256 hash comparison on agent.secret_hash
  • Agent Auth (legacy): Authorization: Bearer <token> → plaintext comparison on agent.token
  • Admin Auth: X-Admin-Api-Key → comparison against settings.ADMIN_API_KEY
  • Dashboard Token (defined, unused): X-Dashboard-Token → SHA-256 comparison against tenant.dashboard_token_hash
  • Local Agent Key: X-Local-Agent-Key → comparison against settings.LOCAL_AGENT_KEY (returns 404 if not LOCAL_MODE)

#### Configuration

| Variable | Default | Description |
| --- | --- | --- |
| DATABASE_URL | postgresql+asyncpg://orchestrator:orchestrator@localhost:5434/orchestrator | DB connection |
| ADMIN_API_KEY | (required) | Admin endpoint auth |
| DEBUG | false | SQL echo + Swagger |
| DB_POOL_SIZE/DB_MAX_OVERFLOW/DB_POOL_TIMEOUT/DB_POOL_RECYCLE | 5/10/30/1800 | Connection pool |
| LOCAL_MODE | false | Single-tenant mode |
| INSTANCE_ID | None | Required if LOCAL_MODE |
| HUB_URL | None | Hub telemetry target |
| HUB_API_KEY | None | Hub auth |
| HUB_TELEMETRY_ENABLED | false | Enable telemetry |
| HUB_TELEMETRY_INTERVAL_SECONDS | 60 | Telemetry interval |
| LOCAL_TENANT_DOMAIN | local.letsbe.cloud | Auto-created tenant domain |
| CORS_ALLOWED_ORIGINS | "" | CORS |
| LOCAL_AGENT_KEY | None | Required if LOCAL_MODE |

### 3. Inter-Repo Dependencies

| Target Repo | How | Details |
| --- | --- | --- |
| letsbe-hub | HTTP telemetry | HubTelemetryService sends POST {HUB_URL}/api/v1/instances/{INSTANCE_ID}/telemetry with X-Hub-Api-Key header |
| letsbe-sysadmin-agent | Task consumer | Agent polls GET /api/v1/tasks/next, reports via PATCH /api/v1/tasks/{id}, heartbeats via POST /api/v1/agents/{id}/heartbeat |

APIs exposed:

  • Full REST API consumed by sysadmin-agent (registration, heartbeat, task polling, task updates, events)
  • Playbook endpoints consumed by Hub dashboard (via remote commands that proxy through orchestrator)
  • /api/v1/meta/instance consumed by ansible-runner's local_bootstrap.sh

### 4. Current State Assessment

Fully implemented:

  • Multi-tenant support with UUID-based isolation
  • Agent registration with both secure (SHA-256 secret hash) and legacy (plaintext token) flows
  • Registration token system with max_uses, expiration, revocation
  • Task queue with atomic claim (SELECT FOR UPDATE semantics)
  • 10 tool playbooks generating COMPOSITE and PLAYWRIGHT task chains
  • Hub telemetry service with windowed SQL aggregates and backoff
  • LOCAL_MODE bootstrap for single-tenant deployments
  • Rate limiting on agent registration (5/min)
  • Request ID middleware for tracing

Partially implemented:

  • DashboardAuthDep defined in app/dependencies/dashboard_auth.py but not applied to any route
  • servers table exists but no routes to manage servers directly
  • Playbook routes have no authentication (security concern — see Gap Analysis)

Not started (from ROADMAP.md):

  • 8+ additional tool playbooks (NocoDB, Directus, Ghost, MinIO, Activepieces, Listmonk, Odoo, Mixpost)
  • Server introspection APIs (/servers/{id}/scan, /diagnose, /health)
  • New task types: NGINX_RELOAD, HEALTHCHECK, STACK_HEALTH
  • LLM integration for natural language commands
  • Task chaining based on results
  • Automatic remediation workflows
  • Dashboard & UI

Tests: 13 test files covering agent auth, env routes, file routes, task auth, events, hub telemetry, local mode, and 3 playbook tests. Uses in-memory SQLite (not PostgreSQL). No coverage thresholds.

5. External Integrations

| Integration | Target | Details |
|---|---|---|
| letsbe Hub | {HUB_URL}/api/v1/instances/{INSTANCE_ID}/telemetry | Periodic telemetry with agent/task/server metrics |
| Entri DNS | Referenced in CLAUDE.md only | Planned but not implemented |
| HashiCorp Vault | Referenced in CLAUDE.md only | Planned but not implemented |

Repository 3: letsbe-sysadmin-agent

1. Overview

  • Language/Framework: Python 3.11, fully async (asyncio + httpx). No web framework — this is a worker/agent process with no HTTP listener.
  • Approximate LOC: ~7,600 across 30 source files + ~1,500 lines of tests
  • Key Dependencies:
    • HTTP: httpx >=0.27.0
    • Logging: structlog >=24.0.0
    • Validation: Pydantic >=2.0.0, pydantic-settings >=2.0.0
    • Browser: Playwright 1.49.1 (pinned)
    • Env: python-dotenv >=1.0.0
    • Testing: pytest >=8.0.0, pytest-asyncio >=0.23.0
  • Deployment:
    • Dockerfile: python:3.11-slim, installs docker-cli + Chromium + Docker Compose v2.32.1, playwright install chromium, creates non-root agent user, CMD ["python", "-m", "app.main"]
    • docker-compose.yml (dev): Mounts Docker socket + /opt/letsbe/{env,stacks,nginx}, runs as root, seccomp=chromium-seccomp.json, 1.5 CPU / 1GB RAM
    • docker-compose.local.yml: LOCAL_MODE overlay
    • docker-compose.prod.yml: Pre-built image + MCP browser sidecar service
    • .gitea/workflows/build.yml: Docker build+push to code.letsbe.solutions/letsbe/sysadmin-agent

2. Architecture

Entry Points

  • Primary: python -m app.mainapp.main.run()asyncio.run(main())
  • No CLI arguments — all config via environment variables

Core Modules

| Module | Path | Description |
|---|---|---|
| app/main.py | Entry point | Initializes settings, configures logging, validates mounted dirs, creates OrchestratorClient + Agent + TaskManager, starts heartbeat + poll loops, awaits SIGTERM/SIGINT. |
| app/config.py | Settings | Pydantic BaseSettings (frozen). All runtime settings: URLs, credentials, timeouts, security paths, Playwright config. get_settings() is @lru_cache. |
| app/agent.py (~383 lines) | Agent lifecycle | Registration (priority: persisted creds → LOCAL_MODE → registration token → legacy), heartbeat loop with exponential backoff + jitter, graceful shutdown. Triggers Hub heartbeats after orchestrator heartbeats. |
| app/task_manager.py (~262 lines) | Task dispatch | Polls orchestrator for the next task every poll_interval seconds, dispatches to executors via asyncio.create_task(), concurrency via Semaphore(max_concurrent_tasks), circuit breaker, status updates (RUNNING → COMPLETED/FAILED). |
| app/clients/orchestrator_client.py (~923 lines) | Orchestrator HTTP client | Circuit breaker (5 failures, 30s cooldown), exponential backoff + jitter, credential persistence to ~/.letsbe-agent/credentials.json (atomic write, mode 0600), pending result buffering to ~/.letsbe-agent/pending_results.json. Dual auth: X-Agent-Id + X-Agent-Secret (new) or Authorization: Bearer (legacy). |
| app/clients/hub_client.py (~161 lines) | Hub HTTP client | Optional. Sends heartbeats with tool credentials to {HUB_URL}/api/v1/orchestrator/heartbeat. Change detection via SHA-256 hash of credentials.env. |
| app/executors/__init__.py | Executor registry | EXECUTOR_REGISTRY dict mapping task type strings → executor classes. get_executor(task_type) factory. |
| app/executors/base.py | Base class | BaseExecutor (ABC) with task_type property + execute(). ExecutionResult dataclass: success, data, error, duration_ms. |
| app/executors/echo_executor.py | Echo | Returns {"echoed": payload}. For testing connectivity. |
| app/executors/shell_executor.py (~164 lines) | Shell commands | Validates against ALLOWED_COMMANDS allowlist (absolute paths + arg regex). Blocks shell metacharacters. Timeout enforcement. |
| app/executors/file_executor.py (~224 lines) | File write/append | Path traversal prevention, size limits, mode selection (write/append). |
| app/executors/file_inspect_executor.py (~154 lines) | File read | Byte-limited reads with truncation, path security. |
| app/executors/env_update_executor.py (~286 lines) | Env file update | Atomic writes (temp → chmod → rename), key merge/removal, KEY format validation (^[A-Z][A-Z0-9_]*$). |
| app/executors/env_inspect_executor.py (~162 lines) | Env file read | Reads .env files, optional key filtering, path security. |
| app/executors/docker_executor.py (~291 lines) | Docker Compose | Finds compose file (yml/yaml priority), runs docker compose up -d with --pull or --no-pull, path validation, timeout enforcement. |
| app/executors/composite_executor.py (~208 lines) | Task chains | Executes a sequence of sub-tasks. Failure stops the chain. Collects per-step results. |
| app/executors/nextcloud_executor.py (~359 lines) | Nextcloud domain | Runs docker exec → Nextcloud occ commands for trusted domain configuration. |
| app/executors/playwright_executor.py (~330 lines) | Browser automation | Domain validation, scenario lookup + execution, artifact collection (screenshots, traces). |
| app/playwright_scenarios/ (8 tools) | Browser scenarios | Initial setup automation for Cal.com, Chatwoot, Keycloak, n8n, Nextcloud, Poste, Umami, Uptime Kuma. Each ~230–290 lines. |
| app/utils/validation.py (~426 lines) | Security validation | Shell command allowlist, path traversal prevention, ENV key format validation, domain allowlist checking, forbidden metacharacter blocking. |
| app/utils/credential_reader.py (~157 lines) | Credential sync | Reads /opt/letsbe/env/credentials.env, extracts structured credentials by tool (Portainer, Nextcloud, Keycloak, MinIO, Poste) for the Hub heartbeat. SHA-256 change detection. |
| app/utils/logger.py | Logging | structlog config: JSON (production) or colored console (dev). |
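The circuit-breaker-plus-backoff behavior described for orchestrator_client.py can be sketched as follows. This is a minimal illustration using the documented defaults (5 failures, 30 s cooldown); the class and function names are assumptions, not the client's actual API:

```python
import random
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; after `cooldown` seconds,
    let one attempt through (half-open). A success fully resets the breaker."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock          # injectable for testing
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            self.opened_at = None   # half-open: permit a probe attempt
            return True
        return False

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Full jitter (rather than a fixed exponential delay) spreads out retries so that many agents recovering from the same orchestrator outage do not hammer it in lockstep.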

Executor Registry (Task Types)

| Task Type | Executor Class | Description |
|---|---|---|
| ECHO | EchoExecutor | Connectivity test |
| SHELL | ShellExecutor | Allowlisted shell commands |
| FILE_WRITE | FileExecutor | Write/append files |
| FILE_INSPECT | FileInspectExecutor | Read files |
| ENV_UPDATE | EnvUpdateExecutor | Atomic .env file updates |
| ENV_INSPECT | EnvInspectExecutor | Read .env files |
| DOCKER_RELOAD | DockerExecutor | Docker Compose up -d |
| COMPOSITE | CompositeExecutor | Sequential sub-task chains |
| NEXTCLOUD_SET_DOMAIN | NextcloudSetDomainExecutor | Nextcloud trusted domain |
| PLAYWRIGHT | PlaywrightExecutor | Browser automation scenarios |

Playwright Scenarios

| Scenario | Module | What It Automates |
|---|---|---|
| echo | app/playwright_scenarios/echo.py | Test scenario — navigates + screenshots |
| calcom_initial_setup | app/playwright_scenarios/calcom/initial_setup.py | Admin account signup + onboarding |
| chatwoot_initial_setup | app/playwright_scenarios/chatwoot/initial_setup.py | Super admin account creation |
| keycloak_initial_setup | app/playwright_scenarios/keycloak/initial_setup.py | Admin login + realm creation |
| n8n_initial_setup | app/playwright_scenarios/n8n/initial_setup.py | Owner account setup |
| nextcloud_initial_setup | app/playwright_scenarios/nextcloud/initial_setup.py | Admin account creation |
| poste_initial_setup | app/playwright_scenarios/poste/initial_setup.py | Hostname + admin configuration |
| umami_initial_setup | app/playwright_scenarios/umami/initial_setup.py | Default password change + website tracking |
| uptime_kuma_initial_setup | app/playwright_scenarios/uptime_kuma/initial_setup.py | Admin account creation |

Data Models

| Model | Fields | Notes |
|---|---|---|
| Task (dataclass) | id, type, payload, tenant_id, created_at | Received from orchestrator |
| ExecutionResult (dataclass) | success, data, error, duration_ms | Returned by executors |
| HeartbeatResult (dataclass) | status (enum: SUCCESS/AUTH_FAILED/SERVER_ERROR/NETWORK_ERROR/NOT_REGISTERED), message | |
| ScenarioOptions (dataclass) | timeout_ms, screenshot_on_failure/success, save_trace, allowed_domains, artifacts_dir | |
| ScenarioResult (dataclass) | success, data, screenshots, error, trace_path | |
| Settings (BaseSettings) | 35+ fields (see Configuration below) | Frozen after init |

Authentication (Outbound)

  • Orchestrator (new): X-Agent-Id + X-Agent-Secret headers. Also X-Agent-Version + X-Agent-Hostname.
  • Orchestrator (legacy): Authorization: Bearer <token>
  • Hub: Authorization: Bearer {HUB_API_KEY}
  • Credential persistence: ~/.letsbe-agent/credentials.json (mode 0600, atomic write)
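The credential persistence bullet (mode 0600, atomic write) is worth a concrete sketch, since getting it wrong leaks secrets or corrupts the file on crash. A minimal version of the documented pattern (function name and layout are illustrative, not the agent's actual code):

```python
import json
import os
import tempfile
from pathlib import Path

def persist_credentials(path: Path, creds: dict) -> None:
    """Write credentials via a temp file + atomic rename so readers never see a
    partial file, with permissions restricted to 0600 before any secret lands
    at the final path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".creds-")
    try:
        with os.fdopen(fd, "w") as f:
            os.fchmod(f.fileno(), 0o600)  # owner read/write only
            json.dump(creds, f)
            f.flush()
            os.fsync(f.fileno())          # durable before the rename
        os.replace(tmp, path)             # atomic swap on POSIX
    except BaseException:
        os.unlink(tmp)                    # never leave a stray temp file
        raise
```

Creating the temp file in the same directory as the target matters: `os.replace()` is only atomic within one filesystem.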

Configuration

| Variable | Default | Description |
|---|---|---|
| ORCHESTRATOR_URL | http://host.docker.internal:8000 | Orchestrator base URL |
| LOCAL_MODE | false | Single-tenant mode |
| LOCAL_AGENT_KEY | None | Key for local registration |
| REGISTRATION_TOKEN | None | Multi-tenant registration |
| AGENT_ID, AGENT_SECRET, TENANT_ID | None | Set post-registration |
| HUB_URL, HUB_API_KEY | None | Hub credential sync |
| HUB_TELEMETRY_ENABLED | true | Enable Hub heartbeats |
| HEARTBEAT_INTERVAL | 30s | Heartbeat frequency |
| POLL_INTERVAL | 5s | Task poll frequency |
| MAX_CONCURRENT_TASKS | 3 | Semaphore limit |
| CIRCUIT_BREAKER_THRESHOLD/COOLDOWN | 5 failures / 30s | Circuit breaker |
| ALLOWED_FILE_ROOT | /opt/letsbe | File ops root |
| ALLOWED_ENV_ROOT | /opt/letsbe/env | Env file root |
| ALLOWED_STACKS_ROOT | /opt/letsbe/stacks | Docker compose root |
| MAX_FILE_SIZE | 10MB | File write limit |
| SHELL_TIMEOUT | 60s | Command timeout |
| PLAYWRIGHT_DEFAULT_TIMEOUT_MS | 60000 | Action timeout |
| PLAYWRIGHT_NAVIGATION_TIMEOUT_MS | 120000 | Navigation timeout |
| MCP_SERVICE_URL | None | MCP browser sidecar URL (declared but unused) |

3. Inter-Repo Dependencies

| Target Repo | How | Details |
|---|---|---|
| letsbe-orchestrator | Primary dependency — all API calls | Registration, heartbeat, task polling, result submission, event dispatch. All via /api/v1/* paths. |
| letsbe-hub | Optional — credential sync | Sends heartbeats with Portainer/Nextcloud/Keycloak/MinIO/Poste credentials to POST /api/v1/orchestrator/heartbeat. |
| letsbe-mcp-browser | Sidecar in prod compose | docker-compose.prod.yml runs mcp-browser at http://mcp-browser:8931. MCP_SERVICE_URL setting declared but not wired to any executor code yet. |

4. Current State Assessment

Fully implemented:

  • Agent registration (3 flows: persisted credentials, LOCAL_MODE, registration token)
  • Heartbeat with exponential backoff + jitter
  • Task polling with circuit breaker
  • 10 executor types (ECHO, SHELL, FILE_WRITE, FILE_INSPECT, ENV_UPDATE, ENV_INSPECT, DOCKER_RELOAD, COMPOSITE, NEXTCLOUD_SET_DOMAIN, PLAYWRIGHT)
  • 8 Playwright initial-setup scenarios
  • Security: shell command allowlist, path traversal prevention, env key validation, metacharacter blocking
  • Credential persistence with atomic writes
  • Pending result buffering for offline recovery
  • Hub credential sync with SHA-256 change detection
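The shell-command allowlist in the security bullet deserves a sketch, since it is the main guardrail between the task queue and arbitrary remote code execution. The allowlist entries and regexes below are illustrative assumptions, not the agent's actual ALLOWED_COMMANDS table:

```python
import re
import shlex

# Illustrative allowlist: absolute binary path -> regex each argument must match.
ALLOWED_COMMANDS: dict[str, re.Pattern] = {
    "/usr/bin/docker": re.compile(r"^[\w./:=-]+$"),
    "/usr/bin/systemctl": re.compile(r"^[\w.@-]+$"),
}

# Reject anything that could chain, redirect, or substitute commands.
FORBIDDEN_CHARS = set(";|&`$><\n")

def validate_command(command: str) -> list[str]:
    """Return a safe argv list or raise ValueError. The caller passes the result
    to subprocess without shell=True, so the shell never reinterprets it."""
    if FORBIDDEN_CHARS & set(command):
        raise ValueError("shell metacharacters are not allowed")
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"binary not in allowlist: {argv[0] if argv else ''!r}")
    arg_re = ALLOWED_COMMANDS[argv[0]]
    for arg in argv[1:]:
        if not arg_re.match(arg):
            raise ValueError(f"argument rejected: {arg!r}")
    return argv
```

Requiring absolute paths defeats PATH-based substitution, and validating each argument separately (after `shlex.split`) avoids the classic mistake of regex-checking the raw command string.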

Partially implemented:

  • MCP_SERVICE_URL / mcp_service_url setting exists in config but no executor code uses it — the MCP browser integration is declared but not wired

Not started (from ROADMAP.md):

  • Phase 2: SERVICE_DISCOVER, CONFIG_SCAN, NGINX_INSPECT executors
  • Phase 3: NGINX_RELOAD, HEALTHCHECK, STACK_HEALTH, PACKAGE_UPGRADE executors
  • Phase 4: BACKUP, RESTORE, LOG_TAIL, CERT_CHECK executors
  • MCP sidecar integration for exploratory browser control

Tests: 8 test files covering all executors (composite, docker, env_inspect, env_update, file, file_inspect, nextcloud, playwright). ROADMAP claims 140+ tests. Uses mocks for subprocess/playwright — no real Docker or browser. No integration tests.

5. External Integrations

| Integration | Details |
|---|---|
| Orchestrator API | All /api/v1/* endpoints |
| Hub API | POST /api/v1/orchestrator/heartbeat (optional) |
| Docker Engine | Docker socket mounted for docker-compose operations |
| Playwright/Chromium | Local headless browser for initial setup scenarios |
| MCP Browser | Sidecar at http://mcp-browser:8931 (prod compose, not yet integrated in code) |
| Target tools (via Playwright) | Nextcloud, Keycloak, Chatwoot, n8n, Poste, Cal.com, Umami, Uptime Kuma |

Repository 4: letsbe-ansible-runner

1. Overview

  • Language/Framework: Pure Bash shell scripts (set -euo pipefail). No application framework.
  • Container Base: debian:bookworm-slim
  • Approximate LOC: ~4,477 (scripts ~3,077, stacks ~800, nginx configs ~600)
  • Key Dependencies (apt): openssh-client, sshpass, jq, curl, ca-certificates, openssl, zip, unzip
  • Target server installs: Docker CE, nginx, certbot, fail2ban, ufw, unattended-upgrades, and ~20 more system packages
  • Deployment:
    • Dockerfile: debian:bookworm-slim, copies scripts/stacks/nginx, ENTRYPOINT entrypoint.sh
    • docker-compose.yml: Used by Hub to spawn short-lived provisioning containers. Image code.letsbe.solutions/letsbe/ansible-runner, mounts config.json + logs dir, limits 1.0 CPU / 512M, restart: "no"
    • .gitea/workflows/build.yml: Docker build+push to code.letsbe.solutions/letsbe/ansible-runner

2. Architecture

Entry Points

  • Primary: entrypoint.sh (Docker ENTRYPOINT) — the container's entire lifecycle
  • Remote scripts (executed on target server via SSH/SCP):
    • scripts/setup.sh — Full 10-step server provisioning
    • scripts/env_setup.sh — Template rendering + secret generation
    • scripts/local_bootstrap.sh — License validation + orchestrator initialization
    • scripts/backups.sh — Daily backup cron job
    • scripts/restore.sh — Manual backup restoration

Core Modules

| Module | Path | Description |
|---|---|---|
| entrypoint.sh (~323 lines) | Container entry | SSH connects to target server (sshpass + root password), uploads scripts/stacks/nginx via SCP, executes env_setup.sh + setup.sh remotely, streams structured logs to the Hub API, PATCHes job status on completion/failure. |
| scripts/setup.sh (~832 lines) | Server provisioning | 10-step setup: system packages, Docker CE, disable conflicting services, nginx + fallback config, UFW firewall, optional admin user + SSH key, SSH hardening (port 22022, key-only auth), unattended security updates, deploy tool stacks via docker-compose, deploy sysadmin agent, local_bootstrap.sh, Certbot SSL certs. |
| scripts/env_setup.sh (~678 lines) | Secret generation + template rendering | Accepts customer/domain/company via CLI args or JSON. Generates 50+ cryptographic secrets (passwords, tokens, keys). Replaces {{ variable }} placeholders in all docker-compose files, nginx configs, and the backup script. Writes master credentials.env + portainer_admin_password.txt. |
| scripts/local_bootstrap.sh (~259 lines) | Post-deploy bootstrap | Calls POST {HUB_URL}/api/v1/instances/activate with the license key. Waits for orchestrator health. Gets the tenant ID from the orchestrator's /api/v1/meta/instance. Writes sysadmin-credentials.env + admin-credentials.env. |
| scripts/backups.sh (~473 lines) | Automated backups | Daily 2am cron. Backs up: 18 PostgreSQL databases (pg_dump), 2 MySQL (WordPress, Ghost), 1 MongoDB (LibreChat), env files, nginx configs, rclone config, crontab. Uploads to an rclone remote. Rotates: 7 daily local + 4 weekly remote. Writes backup-status.json. |
| scripts/restore.sh (~512 lines) | Backup restoration | Subcommands: list, list-remote, download, postgres/mysql/mongo restore per-tool, env restore, nginx restore, full restore. |
| nginx/ (33 files) | Nginx configs | Reverse proxy templates for all tools. {{ variable }} placeholders. Certbot ACME challenge support. |
| stacks/ (28+ dirs) | Docker Compose stacks | Pre-configured stacks for: Activepieces, Cal.com, Chatwoot, Diun+Watchtower, Documenso, Ghost, Gitea, Gitea+Drone, GlitchTip, HTML, Keycloak, LibreChat, Listmonk, MinIO, n8n, Nextcloud, NocoDB, Odoo, Orchestrator, Penpot, Portainer, Redash, Squidex, Sysadmin (agent + MCP browser), Typebot, Umami, Uptime Kuma, Vaultwarden, Windmill, WordPress. |
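The template-rendering mechanism in env_setup.sh is actually implemented with Bash and sed; the Python sketch below just illustrates the semantics (secret generation plus {{ variable }} substitution, with unknown placeholders left untouched). All names and the sample compose fragment are illustrative:

```python
import re
import secrets

def generate_secret(nbytes: int = 32) -> str:
    # URL-safe token, comparable to openssl-rand-style generation in env_setup.sh
    return secrets.token_urlsafe(nbytes)

def render_template(text: str, variables: dict[str, str]) -> str:
    """Replace {{ name }} placeholders. A placeholder with no matching variable is
    left verbatim — the same behavior that lets an un-generated placeholder
    survive to runtime in a rendered file."""
    def sub(match: re.Match) -> str:
        return variables.get(match.group(1), match.group(0))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, text)

variables = {"domain": "example.com", "postgres_password": generate_secret()}
fragment = "VIRTUAL_HOST=cloud.{{ domain }}\nAPI_KEY={{ mcp_browser_api_key }}"
rendered = render_template(fragment, variables)
```

In this sketch `rendered` contains `cloud.example.com` but still carries the literal `{{ mcp_browser_api_key }}` placeholder, mirroring the bug noted under the Current State Assessment below.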

Data Schemas (JSON)

Job Config (/job/config.json — mounted by Hub):

  • server: { ip, port, rootPassword }
  • customer, domain, companyName, licenseKey, dashboardTier
  • tools: string[]
  • dockerHub: { username, token, registry? }
  • gitea: { registry, username, token }

Job Result (written to /tmp/job_result.json, sent to Hub):

  • dashboard_url, portainer_url, portainer_username?, portainer_password?

APIs Called (Outbound)

| Method | Path | Auth | Called From |
|---|---|---|---|
| POST | {HUB_API_URL}/api/v1/jobs/{JOB_ID}/logs | X-Runner-Token | entrypoint.sh:log_to_hub() |
| PATCH | {HUB_API_URL}/api/v1/jobs/{JOB_ID} | X-Runner-Token | entrypoint.sh:update_job_status() |
| POST | {HUB_URL}/api/v1/instances/activate | License key in body | local_bootstrap.sh:validate_license() |
| GET | {ORCHESTRATOR_URL}/health | None | local_bootstrap.sh:wait_for_orchestrator() |
| GET | {ORCHESTRATOR_URL}/api/v1/meta/instance | None | local_bootstrap.sh:get_local_tenant_id() |

Authentication

  • Runner → Hub: X-Runner-Token: ${RUNNER_TOKEN} header on all job log/status endpoints
  • Runner → Target Server: sshpass with root password from config.json. SSH options: StrictHostKeyChecking=no
  • Bootstrap → Hub: License key in JSON body
  • Post-setup SSH: Hardened to port 22022, key-only auth (PermitRootLogin prohibit-password)

Configuration

| Variable | Default | Description |
|---|---|---|
| HUB_API_URL | https://hub.letsbe.solutions | Hub API base URL |
| JOB_ID | (empty) | Job identifier |
| RUNNER_TOKEN | (empty) | Hub auth token |
| JOB_CONFIG_PATH | /job/config.json | Config file path |

3. Inter-Repo Dependencies

| Target Repo | How | Details |
|---|---|---|
| letsbe-hub | HTTP API calls | Streams logs, updates job status, activates license. |
| letsbe-orchestrator | Deploys + bootstraps | stacks/orchestrator/docker-compose.yml pulls code.letsbe.solutions/letsbe/orchestrator:latest. local_bootstrap.sh calls the orchestrator's /health and /api/v1/meta/instance. |
| letsbe-sysadmin-agent | Deploys as stack | stacks/sysadmin/docker-compose.yml pulls code.letsbe.solutions/letsbe/sysadmin-agent:latest. Always deployed in step 9.5 of setup.sh. |
| letsbe-mcp-browser | Deploys as sidecar | stacks/sysadmin/docker-compose.yml also runs code.letsbe.solutions/letsbe/mcp-browser:latest as the mcp-browser service on port 8931. |

4. Current State Assessment

Fully implemented:

  • Complete server provisioning pipeline (10 steps)
  • Template rendering for 33 nginx configs and 28+ docker-compose stacks
  • 50+ cryptographic secret generation
  • SSH hardening (port change, key-only auth, fail2ban)
  • UFW firewall configuration
  • Docker CE installation
  • Certbot SSL certificate issuance
  • Automated backup system (PostgreSQL × 18, MySQL × 2, MongoDB × 1)
  • Backup restoration for all supported databases
  • Backup rotation (7 daily + 4 weekly)
  • Hub integration (log streaming, status updates, license activation)
  • Orchestrator bootstrap (health check, tenant ID retrieval)
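The 7-daily local rotation is the part of the backup system most worth sketching; the real logic lives in backups.sh (Bash), so this Python version is illustrative only, and the `backup-*.tar.gz` naming is an assumption:

```python
from pathlib import Path

def rotate_backups(backup_dir: Path, keep: int = 7) -> list[Path]:
    """Delete all but the newest `keep` daily archives, newest-first by mtime.
    Returns the files that were removed."""
    archives = sorted(
        backup_dir.glob("backup-*.tar.gz"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,                  # newest first
    )
    removed = archives[keep:]          # everything past the keep window
    for old in removed:
        old.unlink()
    return removed
```

Sorting by mtime rather than filename keeps the rotation correct even if an archive is re-uploaded or renamed out of date order.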

Bugs/Gaps identified:

  • stacks/sysadmin/docker-compose.yml line 66: the MCP_BROWSER_API_KEY={{ mcp_browser_api_key }} template variable is referenced but NOT generated by env_setup.sh's sed substitutions, so the placeholder survives to runtime unchanged

Not started:

  • No tests whatsoever
  • No README

5. External Integrations

| Integration | Details |
|---|---|
| ifconfig.co / icanhazip.com | Public IP detection during env_setup |
| Let's Encrypt / Certbot | SSL certificate issuance |
| Docker Hub | Public images (WordPress, Ghost, Redis, Postgres, etc.) |
| ghcr.io | LibreChat, Windmill, Nextcloud images |
| quay.io | Keycloak images |
| code.letsbe.solutions | Private Gitea registry for LetsBe images |
| rclone remote | Cloud backup storage |
| LibreChat AI providers | Groq, Mistral, OpenRouter, Portkey, Anthropic, OpenAI, Google Gemini (configured in librechat.yaml) |

Repository 5: letsbe-mcp-browser

1. Overview

  • Language/Framework: Python 3.11, FastAPI >=0.109.0, Uvicorn
  • Approximate LOC: ~1,246 (927 source + 319 tests)
  • Key Dependencies: FastAPI, Uvicorn, Playwright 1.56.0 (pinned), Pydantic v2, pydantic-settings, pytest + pytest-asyncio + httpx (test)
  • Deployment:
    • Dockerfile: mcr.microsoft.com/playwright/python:v1.56.0-jammy base (Chromium pre-installed), exposes port 8931, healthcheck via /health
    • docker-compose.yml: Dev stack, port 8931:8931, seccomp=chromium-seccomp.json, 1.5 CPU / 1G RAM, 3 max sessions
    • .gitea/workflows/build.yml: test job (pytest) → build job (Docker push to code.letsbe.solutions/letsbe/mcp-browser)

2. Architecture

Entry Points

  • Server: uvicorn app.server:app --host 0.0.0.0 --port 8931
  • Lifespan: On startup → launch Chromium, init SessionManager, start cleanup task. On shutdown → close all sessions, stop browser.

Core Modules

| Module | Path | Description |
|---|---|---|
| app/server.py (~451 lines) | FastAPI app | All HTTP endpoints, Pydantic request/response models, API key auth dependency, lifespan management. |
| app/config.py (~38 lines) | Settings | Pydantic BaseSettings: MAX_SESSIONS (3), IDLE_TIMEOUT (300s), MAX_SESSION_LIFETIME (1800s), MAX_ACTIONS_PER_SESSION (50), API_KEY, SCREENSHOTS_DIR. |
| app/session_manager.py (~260 lines) | Session management | BrowserSession (isolated session with domain filter, page, timestamps, action counter). SessionManager (dict of sessions, asyncio.Lock, background cleanup loop). |
| app/playwright_client.py (~88 lines) | Browser singleton | Single Chromium instance shared across sessions. Docker-compatible launch flags (--no-sandbox, --disable-gpu, --single-process). |
| app/domain_filter.py (~83 lines) | URL allowlist | Exact domain, wildcard subdomain (*.example.com), domain with port. Case-insensitive regex compilation. |
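The domain filter's behavior (exact domains, wildcard subdomains, host:port, case-insensitive) can be sketched as follows. Function names are illustrative, not the service's actual DomainFilter API:

```python
import re

def compile_allowlist(allowed: list[str]) -> re.Pattern:
    """Compile one case-insensitive pattern covering exact domains,
    wildcard subdomains (*.example.com), and host:port entries."""
    parts = []
    for entry in allowed:
        if entry.startswith("*."):
            # one or more subdomain labels, but not the apex domain itself
            parts.append(r"(?:[^.:/]+\.)+" + re.escape(entry[2:]))
        else:
            parts.append(re.escape(entry))
    return re.compile(r"^(?:" + "|".join(parts) + r")$", re.IGNORECASE)

def is_allowed(host: str, allowlist: re.Pattern) -> bool:
    return allowlist.match(host) is not None
```

Anchoring with `^...$` and escaping each entry matters: an unanchored or unescaped check would let `evil-example.com` or `example.com.evil.net` slip past the filter.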

API Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /health | None | Health check: status, active_sessions, max_sessions |
| POST | /sessions | API Key | Create session with allowed_domains list. Returns session_id, timestamps, action limits. |
| GET | /sessions/{id}/status | API Key | Session status |
| DELETE | /sessions/{id} | API Key | Close session (idempotent) |
| POST | /sessions/{id}/navigate | API Key | Navigate to URL (domain allowlist checked). Returns title, url, or blocked reason. |
| POST | /sessions/{id}/click | API Key | Click element by CSS selector |
| POST | /sessions/{id}/type | API Key | Fill text into selector, optional Enter press |
| POST | /sessions/{id}/wait | API Key | Wait for selector/text/timeout |
| POST | /sessions/{id}/screenshot | API Key | Take PNG screenshot, save to disk |
| POST | /sessions/{id}/snapshot | API Key | Accessibility tree as flat node list (ref, role, name, text) |
| GET | /metrics | API Key | Basic metrics: active sessions, max sessions |

Authentication

  • Optional API key: X-API-Key header. If API_KEY env var is empty, auth is disabled. /health is always public.

Configuration

| Variable | Default | Description |
|---|---|---|
| MAX_SESSIONS | 3 | Max concurrent browser sessions |
| IDLE_TIMEOUT_SECONDS | 300 (5 min) | Session idle timeout |
| MAX_SESSION_LIFETIME_SECONDS | 1800 (30 min) | Absolute session lifetime |
| MAX_ACTIONS_PER_SESSION | 50 | Action limit per session |
| BROWSER_HEADLESS | True | Headless mode |
| DEFAULT_TIMEOUT_MS | 30000 | Element timeout |
| NAVIGATION_TIMEOUT_MS | 60000 | Navigation timeout |
| API_KEY | "" (disabled) | API key for auth |
| SCREENSHOTS_DIR | /screenshots | Screenshot storage |

3. Inter-Repo Dependencies

| Target Repo | How | Details |
|---|---|---|
| None | This is a standalone sidecar | No outbound calls to other repos |

Consumed by:

  • letsbe-sysadmin-agent (docker-compose.prod.yml): Runs as sidecar at http://mcp-browser:8931 with shared MCP_BROWSER_API_KEY
  • letsbe-ansible-runner (stacks/sysadmin/docker-compose.yml): Deployed alongside sysadmin agent

4. Current State Assessment

Fully implemented:

  • Session-based browser management with domain allowlisting
  • All core browser actions (navigate, click, type, wait, screenshot, snapshot)
  • Automatic session cleanup (idle timeout, max lifetime, action limits)
  • Security via mandatory domain restrictions
  • Docker deployment with Chromium seccomp profile

Tests: 10 unit tests for DomainFilter, 14 for SessionManager (sync + async mocks). No integration tests, no FastAPI TestClient tests. httpx test dependency unused.

Zero TODOs or stubs in the codebase.

5. External Integrations

None. This service is entirely self-contained. It controls only a local Chromium browser instance. No external API calls.


6. System Architecture Map

                                    ┌─────────────────────────────┐
                                    │     EXTERNAL SERVICES       │
                                    │                             │
                                    │  Stripe ←── Webhooks        │
                                    │  Netcup SCP API (OAuth2)    │
                                    │  SMTP Server                │
                                    │  S3 / MinIO Storage         │
                                    │  Let's Encrypt (Certbot)    │
                                    │  Docker Hub / ghcr.io       │
                                    └────────────┬────────────────┘
                                                 │
┌────────────────────────────────────────────────┼────────────────────────────────────────────┐
│                                    CENTRAL HUB SERVER                                       │
│                                                │                                            │
│  ┌─────────────────────────────────────────────┴──────────────────────────────────────────┐ │
│  │                        letsbe-hub (Next.js 16.1 / TypeScript)                          │ │
│  │                                                                                        │ │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────────┐  │ │
│  │  │ Admin UI     │  │ REST API     │  │ Orchestrator │  │ Docker Spawner            │  │ │
│  │  │ (React/Next) │  │ /api/v1/*    │  │ Phone-Home   │  │ (spawns runner containers)│  │ │
│  │  └──────────────┘  └──────┬───────┘  │ /register    │  └────────────┬──────────────┘  │ │
│  │                           │          │ /heartbeat   │               │                  │ │
│  │                           │          │ /commands    │               │                  │ │
│  │                           │          └──────┬───────┘               │                  │ │
│  │                     ┌─────┴─────┐           │                      │                  │ │
│  │                     │ PostgreSQL │           │                      │                  │ │
│  │                     │ (Prisma)   │           │                      │                  │ │
│  │                     └───────────┘           │                      │                  │ │
│  └─────────────────────────────────────────────┼──────────────────────┼──────────────────┘ │
│                                                │                      │                    │
│  ┌─────────────────────────────────────────────┼──────────────────────┼────────────────┐   │
│  │                    letsbe-ansible-runner (Bash / Docker)           │                │   │
│  │                                             │                      │                │   │
│  │   Short-lived containers spawned by Hub ◄───┘                      │                │   │
│  │   ┌──────────────────────────────────────┐  ┌──────────────────┐  │                │   │
│  │   │ entrypoint.sh                        │  │ 28+ tool stacks  │  │                │   │
│  │   │   → SSH to target server             │  │ 33 nginx configs │  │                │   │
│  │   │   → SCP upload scripts/stacks        │──│ env_setup.sh     │  │                │   │
│  │   │   → Execute setup.sh remotely        │  │ setup.sh         │  │                │   │
│  │   │   → Stream logs back to Hub          │  │ local_bootstrap  │  │                │   │
│  │   │   → PATCH job status to Hub          │  │ backups/restore  │  │                │   │
│  │   └──────────────────────────────────────┘  └──────────────────┘  │                │   │
│  └───────────────────────────────────────────────────────────────────┘                │   │
│                                                                                        │   │
└────────────────────────────────────────────────────────────────────────────────────────────┘
                                                 │
                                   SSH (port 22 → 22022)
                                   + Docker image pulls
                                                 │
                                                 ▼
┌────────────────────────────────────────────────────────────────────────────────────────────┐
│                              TENANT SERVER (per-customer VPS)                               │
│                                                                                            │
│  ┌──────────────────────────────────────────────────────────────────────────────────────┐  │
│  │                    letsbe-orchestrator (FastAPI / Python 3.11)                        │  │
│  │                                                                                      │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────┐   │  │
│  │  │ REST API     │  │ Task Queue   │  │ Hub Telemetry│  │ Local Bootstrap      │   │  │
│  │  │ :8100        │  │ (PostgreSQL) │  │ (periodic)   │  │ (LOCAL_MODE setup)   │   │  │
│  │  │ /agents/*    │  │ PENDING →    │  │ POST to Hub  │  └───────────────────────┘   │  │
│  │  │ /tasks/*     │  │ RUNNING →    │  │ /telemetry   │                              │  │
│  │  │ /tenants/*   │  │ COMPLETED    │  └──────────────┘                              │  │
│  │  │ /playbooks/* │  └──────────────┘                                                │  │
│  │  └──────┬───────┘        │                                                          │  │
│  │         │          ┌─────┴─────┐                                                    │  │
│  │         │          │ PostgreSQL │                                                    │  │
│  │         │          │ :5432      │                                                    │  │
│  │         │          └───────────┘                                                    │  │
│  └─────────┼────────────────────────────────────────────────────────────────────────────┘  │
│            │ HTTP (poll)                                                                    │
│            ▼                                                                               │
│  ┌──────────────────────────────────────────────────────────────────────────────────────┐  │
│  │                 letsbe-sysadmin-agent (Python 3.11 / asyncio worker)                  │  │
│  │                                                                                      │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────┐   │  │
│  │  │ Agent        │  │ Task Manager │  │ Executors    │  │ Playwright Scenarios │   │  │
│  │  │ (register,   │  │ (poll, claim,│  │ SHELL        │  │ Chatwoot, Nextcloud  │   │  │
│  │  │  heartbeat)  │  │  dispatch)   │  │ FILE_WRITE   │  │ Keycloak, n8n        │   │  │
│  │  │              │  │              │  │ ENV_UPDATE   │  │ Cal.com, Poste       │   │  │
│  │  │              │  │              │  │ DOCKER_RELOAD│  │ Umami, Uptime Kuma   │   │  │
│  │  │              │  │              │  │ COMPOSITE    │  │                       │   │  │
│  │  │              │  │              │  │ PLAYWRIGHT   │  │                       │   │  │
│  │  └──────────────┘  └──────────────┘  └──────┬───────┘  └───────────┬───────────┘   │  │
│  └─────────────────────────────────────────────┼──────────────────────┼────────────────┘  │
│                                                │                      │                    │
│  ┌─────────────────────────────────────────────┼──────────────────────┼────────────────┐  │
│  │                letsbe-mcp-browser (FastAPI / Playwright sidecar)   │                │  │
│  │                              :8931                                 │                │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐             │                │  │
│  │  │ Session Mgmt │  │ Domain Filter│  │ Chromium     │◄────────────┘                │  │
│  │  │ (create,     │  │ (allowlist)  │  │ (headless)   │  HTTP API                    │  │
│  │  │  cleanup)    │  │              │  │ navigate,    │  (not yet wired from agent)  │  │
│  │  │              │  │              │  │ click, type  │                              │  │
│  │  └──────────────┘  └──────────────┘  │ screenshot   │                              │  │
│  │                                      │ snapshot     │                              │  │
│  │                                      └──────────────┘                              │  │
│  └────────────────────────────────────────────────────────────────────────────────────┘  │
│                                                                                          │
│  ┌──────────────────────────────────────────────────────────────────────────────────┐    │
│  │                        DEPLOYED TOOL STACKS (Docker Compose)                      │    │
│  │                                                                                    │    │
│  │  Portainer  │  Nextcloud  │  Keycloak  │  n8n  │  Cal.com  │  WordPress  │  ...   │    │
│  │  Chatwoot   │  Poste      │  MinIO     │  Ghost│  Umami    │  LibreChat  │        │    │
│  │  Typebot    │  NocoDB     │  Vaultwarden│ Penpot│  Uptime K │  Redash     │        │    │
│  └──────────────────────────────────────────────────────────────────────────────────┘    │
│                                                                                          │
│  nginx (reverse proxy for all tools) + Certbot SSL                                      │
│  UFW firewall + fail2ban + SSH hardening (port 22022)                                   │
└──────────────────────────────────────────────────────────────────────────────────────────┘

PROTOCOLS:
  Hub ──[HTTP REST]──► Ansible Runner (spawned Docker container)
  Ansible Runner ──[SSH/SCP]──► Tenant Server (provisioning)
  Ansible Runner ──[HTTP REST]──► Hub (log streaming, job status)
  Orchestrator ──[HTTP REST]──► Hub (telemetry, registration)
  Sysadmin Agent ──[HTTP REST]──► Orchestrator (register, heartbeat, tasks)
  Sysadmin Agent ──[HTTP REST]──► Hub (credential sync heartbeat)
  Sysadmin Agent ──[HTTP REST]──► MCP Browser (not yet wired)
  Hub Admin UI ──[HTTP REST]──► Hub API (same process, /api/v1/*)
  Hub ──[Portainer API/HTTPS]──► Tenant Portainer (container management)
  Hub ──[SSH]──► Tenant Server (direct SSH, test-connection)

7. Data Flow

7.1 User Signs Up and Logs into the Hub

  1. Stripe Checkout: Customer completes payment on external website. Stripe sends checkout.session.completed webhook to POST /api/v1/webhooks/stripe (src/app/api/v1/webhooks/stripe/route.ts).
  2. Webhook Handler (stripeService.ts): Verifies Stripe signature, extracts customer_email, customer_name, line_items, metadata (domain, tools). Maps price ID to plan (STARTER→ADVANCED, PRO→HUB_DASHBOARD).
  3. Create User + Subscription + Order: Creates User record (bcrypt-hashed password from metadata or generated), Subscription (plan, tier, limits), and Order (status: PAYMENT_CONFIRMED, domain, tools, tier, automationMode: AUTO).
  4. Staff Login (Admin): Staff navigates to /login. login-form.tsx posts credentials to NextAuth /api/auth/callback/credentials. NextAuth authorize() in src/lib/auth.ts looks up Staff by email, compares bcrypt hash. If 2FA enabled: creates Pending2FASession (5-min TTL), returns 2FA_REQUIRED:{token}. Frontend shows TOTP input. Second authorize() call verifies TOTP via totpService, issues JWT (7-day maxAge). Redirect to /admin.
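
The plan-mapping and record-creation logic in steps 2 and 3 can be sketched as below. This is a hedged illustration in Python (the Hub itself is TypeScript); `PRICE_TO_PLAN`, `handle_checkout_completed`, and the event/record shapes are hypothetical stand-ins, not the actual `stripeService.ts` code.

```python
import hashlib
import secrets

# Hypothetical price-ID -> (plan, tier) table; the source describes
# STARTER -> ADVANCED and PRO -> HUB_DASHBOARD.
PRICE_TO_PLAN = {
    "price_starter": ("STARTER", "ADVANCED"),
    "price_pro": ("PRO", "HUB_DASHBOARD"),
}

def handle_checkout_completed(event: dict, db: dict) -> dict:
    """Map a checkout.session.completed event to User + Subscription + Order."""
    session = event["data"]["object"]
    email = session["customer_email"]
    price_id = session["line_items"][0]["price"]
    plan, tier = PRICE_TO_PLAN[price_id]
    # setdefault keeps a re-delivered webhook from creating a duplicate user.
    user = db.setdefault(email, {
        "email": email,
        # The real handler bcrypt-hashes a password from metadata (or a
        # generated one); a plain digest stands in here.
        "password_hash": hashlib.sha256(secrets.token_bytes(16)).hexdigest(),
    })
    user["subscription"] = {"plan": plan, "tier": tier}
    user.setdefault("orders", []).append({
        "status": "PAYMENT_CONFIRMED",
        "domain": session["metadata"].get("domain"),
        "tools": session["metadata"].get("tools", []),
        "automation_mode": "AUTO",
    })
    return user
```

In the real flow, the resulting Order then enters the automationWorker state machine described in 7.2.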

7.2 User Creates and Configures an AI Agent

The platform doesn't have a direct "create agent" user flow. Agents are automatically deployed during server provisioning. The flow is:

  1. Admin creates Order (or AUTO mode from Stripe): Order enters the automationWorker state machine (automation-worker.ts).
  2. State: AWAITING_SERVER: Admin assigns a Netcup server (via dashboard, PATCH /api/v1/admin/orders/[id] with serverIp). Or selects from Netcup server list.
  3. State: SERVER_READY → DNS_PENDING → DNS_READY: dnsService verifies A records for all tool subdomains. Admin can skip via POST /dns/skip.
  4. State: PROVISIONING: POST /api/v1/admin/orders/[id]/provision triggers dockerSpawner.ts which runs docker run letsbe/ansible-runner with the job config.
  5. Runner provisions server (see 7.4 below). Among other things, deploys the orchestrator and sysadmin-agent containers.
  6. Agent auto-registers: On first boot, letsbe-sysadmin-agent calls POST /api/v1/agents/register-local on the local orchestrator, receives agent_id + agent_secret, persists to ~/.letsbe-agent/credentials.json.
  7. Agent starts polling: task_manager.py polls GET /api/v1/tasks/next every 5 seconds. Orchestrator can now dispatch tasks.
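
The register-once bootstrap in step 6 amounts to "use persisted credentials if present, otherwise register and persist." A minimal sketch, assuming a `register_local` callable standing in for the `POST /api/v1/agents/register-local` HTTP call (the real agent's code may differ):

```python
import json
from pathlib import Path

CRED_FILE = Path.home() / ".letsbe-agent" / "credentials.json"

def load_or_register(register_local, cred_file: Path = CRED_FILE) -> dict:
    """Return persisted agent credentials, registering with the local
    orchestrator only on first boot."""
    if cred_file.exists():
        return json.loads(cred_file.read_text())
    # register_local() is assumed to return {"agent_id": ..., "agent_secret": ...}
    creds = register_local()
    cred_file.parent.mkdir(parents=True, exist_ok=True)
    cred_file.write_text(json.dumps(creds))
    cred_file.chmod(0o600)  # the secret should not be world-readable
    return creds
```

Every subsequent boot short-circuits on the existing file, so the agent keeps a stable identity across restarts.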

7.3 Agent Executes a Task (Trigger to Completion)

  1. Task Creation: Hub admin (or playbook endpoint) calls POST /api/v1/tasks on the orchestrator with {tenant_id, type, payload}. Example: type: "COMPOSITE", payload: { sequence: [{task: "ENV_UPDATE", payload: {...}}, {task: "DOCKER_RELOAD", payload: {...}}] }.
  2. Task Queued: Orchestrator persists to tasks table with status: PENDING.
  3. Agent Claims: sysadmin-agent's task_manager.py calls GET /api/v1/tasks/next. Orchestrator atomically claims oldest pending task for the agent's tenant (or any tenant if agent has no tenant_id). Returns Task(id, type, payload).
  4. Agent Dispatches: task_manager.py looks up executor in EXECUTOR_REGISTRY[task.type], wraps in asyncio.create_task() (bounded by Semaphore(3)).
  5. Agent Reports RUNNING: PATCH /api/v1/tasks/{id} with {status: "running"}.
  6. Executor Runs:
    • COMPOSITE: Iterates through payload.sequence, executing each sub-task's executor in order. Stops on first failure.
    • ENV_UPDATE: Reads .env file, merges new key-value pairs, atomic write (temp→chmod→rename).
    • DOCKER_RELOAD: Finds docker-compose.yml in specified dir, runs docker compose up -d --pull.
    • PLAYWRIGHT: Looks up scenario by name, launches Chromium page with domain allowlist, executes scenario steps (click, fill, wait, screenshot), collects artifacts.
  7. Agent Reports Result: PATCH /api/v1/tasks/{id} with {status: "completed", result: {...}} or {status: "failed", error: "..."}.
  8. Offline Recovery: If orchestrator is unreachable when reporting, result is buffered to ~/.letsbe-agent/pending_results.json and retried on next successful connection.
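
The dispatch pattern in steps 4 and 6 can be sketched as an executor registry, an atomic ENV_UPDATE merge (temp write, chmod, rename), and a COMPOSITE executor that stops on the first failure. This is an illustrative Python sketch mirroring the description, not the agent's actual code:

```python
import os
import tempfile

def env_update(payload: dict) -> dict:
    """Merge key=value pairs into a .env file atomically (temp -> chmod -> rename)."""
    path = payload["path"]
    env = {}
    if os.path.exists(path):
        with open(path) as f:
            env = dict(line.strip().split("=", 1) for line in f if "=" in line)
    env.update(payload["values"])
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write("".join(f"{k}={v}\n" for k, v in env.items()))
    os.chmod(tmp, 0o600)
    os.rename(tmp, path)  # rename is atomic on the same filesystem
    return {"keys": sorted(env)}

EXECUTOR_REGISTRY = {"ENV_UPDATE": env_update}

def composite(payload: dict) -> dict:
    """Run payload["sequence"] in order; stop on the first failing sub-task."""
    results = []
    for step in payload["sequence"]:
        try:
            executor = EXECUTOR_REGISTRY[step["task"]]
            results.append({"task": step["task"],
                            "result": executor(step["payload"])})
        except Exception as exc:
            return {"status": "failed", "error": str(exc), "completed": results}
    return {"status": "completed", "completed": results}

EXECUTOR_REGISTRY["COMPOSITE"] = composite
```

The returned dict maps directly onto the `PATCH /api/v1/tasks/{id}` report in step 7, and the `completed` list shows how far a sequence got before failing.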

7.4 Ansible Playbook Gets Triggered and Runs

  1. Trigger: Hub admin clicks "Provision" → POST /api/v1/admin/orders/[id]/provision (in Hub).
  2. Config Generation: configGenerator.ts builds JobConfig JSON: server IP, root password (decrypted), domain, tools list, customer name, license key, Docker/Gitea registry credentials.
  3. Job Creation: jobService.ts creates ProvisioningJob (status: PENDING). Generates RunnerToken for auth.
  4. Docker Spawn: dockerSpawner.ts runs: docker run --name letsbe-runner-{jobId} -v {configPath}:/job/config.json:ro -v {logsPath}:/logs --memory=512m --cpus=1.0 code.letsbe.solutions/letsbe/ansible-runner:latest
  5. Container Starts → entrypoint.sh:
    • load_config(): Parses /job/config.json, extracts server IP, credentials, tools list.
    • step_prepare(): SSH to target, creates /opt/letsbe/{env,stacks,nginx,config,scripts} directories.
    • step_docker_login(): SSH, runs docker login for Gitea registry and Docker Hub.
    • step_upload(): SCP uploads all scripts/, stacks/, nginx/ directories to /opt/letsbe/.
    • step_env(): SSH, runs env_setup.sh --customer X --domain Y --company Z. Generates 50+ secrets, replaces all {{ variable }} templates.
    • read_portainer_credentials(): SSH, reads credentials.env for Portainer admin user/password.
    • step_setup(): SSH, runs setup.sh --tools "tool1,tool2,..." --domain "example.com". This is the big 10-step provisioning (packages, Docker, nginx, firewall, SSH hardening, tool stacks, sysadmin agent, bootstrap, SSL).
    • step_finalize(): Writes /tmp/job_result.json, PATCHes job status to Hub as completed (or failed).
  6. Log Streaming: Throughout, log_to_hub() sends POST {HUB_API_URL}/api/v1/jobs/{JOB_ID}/logs with structured JSON entries. Hub SSE endpoint (GET /orders/[id]/logs/stream) streams these to the admin UI.
  7. Container Exits: restart: "no" — container stops when entrypoint.sh finishes.
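
The templating in step 5's `step_env()` boils down to generating secrets and substituting `{{ variable }}` placeholders. `env_setup.sh` itself is Bash (sed-based); the Python sketch below only illustrates the mechanism, with hypothetical function names:

```python
import re
import secrets

def generate_secrets(names: list[str]) -> dict:
    """One random secret per variable name (the real script generates 50+)."""
    return {name: secrets.token_urlsafe(32) for name in names}

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render(template: str, values: dict) -> str:
    """Replace every {{ name }} placeholder; fail loudly on unknown names so
    half-rendered configs never reach the server."""
    def sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value generated for template variable {name!r}")
        return values[name]
    return PLACEHOLDER.sub(sub, template)
```

A fail-loud renderer like this would also have surfaced the unsubstituted `{{ mcp_browser_api_key }}` placeholder noted in the security concerns of section 8.2.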

7.5 MCP Browser Tool Gets Invoked by an Agent

Current state: NOT YET FULLY WIRED.

The intended flow (based on code analysis):

  1. Task Creation: Orchestrator creates a task (likely future BROWSER_EXPLORE type or enhanced PLAYWRIGHT type) with a payload specifying URL and actions.
  2. Agent Receives Task: PlaywrightExecutor (or a future MCP executor) gets the task.
  3. Agent Calls MCP Browser: HTTP calls to http://mcp-browser:8931:
    • POST /sessions — creates session with allowed_domains
    • POST /sessions/{id}/navigate — navigate to URL
    • POST /sessions/{id}/snapshot — get accessibility tree
    • POST /sessions/{id}/click, /type, /wait — interact with page
    • POST /sessions/{id}/screenshot — capture visual state
    • DELETE /sessions/{id} — cleanup
  4. Agent Returns Result: Compiles browser interaction results and sends back to orchestrator.
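
The session-scoped domain filter in step 3 is the safety rail for this whole flow: a session created with `allowed_domains` may only navigate to matching hosts. A minimal sketch of one plausible matching rule (exact host or subdomain); the actual service's rules may differ:

```python
from urllib.parse import urlparse

def is_allowed(url: str, allowed_domains: list[str]) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    for domain in allowed_domains:
        domain = domain.lower().lstrip(".")
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

Matching on the parsed hostname (not the raw URL string) matters: `https://evil.com/example.com` must not pass an `example.com` allowlist.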

What's actually implemented today:

  • MCP browser service is fully functional with all HTTP endpoints
  • MCP browser is deployed as a sidecar alongside the sysadmin agent (in stacks/sysadmin/docker-compose.yml and docker-compose.prod.yml)
  • MCP_SERVICE_URL setting exists in sysadmin-agent config but is not connected to any executor
  • The agent currently uses its own embedded Playwright (via PlaywrightExecutor + playwright_scenarios/) rather than the MCP browser sidecar
  • The MCP browser appears designed for future LLM-driven exploratory browsing (as opposed to the scripted scenarios the agent runs today)

8. Gap Analysis

8.1 Missing Pieces for Production SaaS Launch

  1. Customer-facing portal: The Hub UI is staff-only admin. No self-service portal for customers to manage their servers, view status, or configure tools. The User model and Subscription model exist but are only used in the Stripe webhook flow.

  2. Automated end-to-end provisioning: The automationWorker state machine exists but transitions between states (AWAITING_SERVER → SERVER_READY → DNS_PENDING → DNS_READY) appear to require manual admin actions. True AUTO mode needs: automatic Netcup server allocation, automatic DNS configuration (Entri integration referenced but not built), and automatic progression through all states.

  3. DNS automation: dnsService only verifies DNS records — it doesn't create them. An external DNS provider integration (Entri, Cloudflare, etc.) is needed to automatically configure A records for customer domains.

  4. Monitoring and alerting gaps: Container health monitoring exists for enterprise clients but is not integrated into the standard order/server management flow. Self-hosted customers have no monitoring dashboard.

  5. Customer notification system: Email welcome + security codes exist, but no customer-facing status notifications (provisioning progress, downtime alerts, cert renewal warnings).

  6. Backup management UI: Backups are configured via cron on the target server (backups.sh) but there's no admin UI to monitor backup status, trigger restores, or configure backup settings.

  7. Multi-server support per customer: The current model is 1 order = 1 server. No support for customers needing multiple servers or high-availability deployments.

  8. Rate limiting / abuse prevention: Only agent registration is rate-limited (5/min). Public API, webhook endpoints, and admin routes have no rate limiting.

  9. Audit logging: Events table exists in orchestrator but no comprehensive audit trail in the Hub for admin actions (who changed what, when).

  10. API documentation: No OpenAPI/Swagger docs served in production (Swagger only enabled when DEBUG=true in orchestrator).

8.2 Security Concerns

  1. Hardcoded dev credentials: docker-compose-dev.yml in orchestrator has ADMIN_API_KEY=dev-admin-key-12345. Risk: if accidentally used in production.

  2. Unauthenticated orchestrator endpoints: Multiple endpoints lack auth:

    • POST /api/v1/tenants — anyone can create tenants
    • POST /api/v1/tasks — anyone can create tasks
    • GET /api/v1/tasks, GET /api/v1/agents — anyone can list tasks/agents
    • All playbook endpoints (POST /api/v1/tenants/{id}/{tool}/setup) — no auth
    • This is partially mitigated by the orchestrator only listening on 127.0.0.1:8100, but any process on the tenant server can access it.
  3. Root password in transit: The ansible-runner receives root passwords via config.json mounted from the Hub server. The password travels: Hub DB (encrypted) → configGenerator (decrypted) → config.json file on Hub disk → Docker mount → entrypoint.sh → sshpass over SSH. The config.json file sits unencrypted on disk.

  4. SSH StrictHostKeyChecking=no: entrypoint.sh disables host key checking, making it vulnerable to MITM attacks during initial provisioning.

  5. MCP browser API key gap: stacks/sysadmin/docker-compose.yml references {{ mcp_browser_api_key }} but env_setup.sh never generates or substitutes this variable. The MCP browser would run with no API key in production.

  6. Docker socket access: Both the sysadmin agent and Hub production containers mount /var/run/docker.sock and run as root, giving full Docker engine access. The sysadmin agent has a TODO about using Docker group membership instead.

  7. Secrets in .env files: All credentials are stored in plaintext .env files on the target server at /opt/letsbe/env/. No at-rest encryption. The ROADMAP mentions Vault integration but it's not built.

  8. No CSRF protection: NextAuth provides some CSRF via JWT, but custom API routes don't have explicit CSRF tokens.

  9. NextAuth beta: Using next-auth@5.0.0-beta.30 — a pre-release version in production.

8.3 Scalability Bottlenecks

  1. Single Hub instance: No horizontal scaling story for the Hub. PostgreSQL connection pooling exists but no multi-instance deployment config.

  2. Docker spawner concurrency: DOCKER_MAX_CONCURRENT=3 limits provisioning parallelism. Spawning Docker containers on the Hub server itself limits vertical scale.

  3. Ansible runner is synchronous SSH: Each provisioning job holds an SSH connection for 10-30 minutes. No work distribution or queue beyond Docker container isolation.

  4. Orchestrator per-tenant: Every tenant server runs its own PostgreSQL + orchestrator + sysadmin agent. This is clean isolation but expensive per-customer (3 containers overhead before any tools).

  5. Stats collection cron: collect-stats cron job polls every Netcup server and Portainer instance serially. As enterprise client count grows, this becomes a bottleneck.

  6. Portainer API as single point: All container management goes through Portainer's HTTP API. If Portainer is down on a server, no container visibility.

8.4 Overlap / Duplication

  1. Dual provisioning paths: The Hub has both ansible/runner.ts (SSH-based inline provisioning) AND docker-spawner.ts (ansible-runner container). The SSH path appears to be an older alternative. Both deploy the same infrastructure.

  2. Playwright in two places: The sysadmin agent has its own embedded Playwright (with playwright_scenarios/), AND the MCP browser sidecar provides an HTTP API for Playwright. The agent doesn't use the MCP browser yet — it runs Chromium directly.

  3. Credential sync duplication: Credentials flow Hub → ansible-runner → credentials.env on server → sysadmin-agent reads and sends back to Hub via heartbeat. This round-trip is redundant since the Hub generated the credentials in the first place.

  4. Hub telemetry vs credential heartbeat: The orchestrator sends telemetry to Hub (hub_telemetry.py), AND the sysadmin agent independently sends heartbeats with credentials to Hub (hub_client.py). Two separate phone-home mechanisms to the same Hub.

8.5 Dead Code / Potentially Unnecessary Components

  1. DashboardAuthDep in orchestrator: Defined but not applied to any route. Dashboard token can be set/revoked via admin endpoints but never checked.

  2. servers table in orchestrator: Model exists, relationships defined, but no routes to manage servers. Status values defined (provisioning/ready/error/terminated) but never set by any code path.

  3. LibreChat MCP server configs: stacks/librechat/librechat.yaml has commented-out MCP server configurations that are non-functional.

  4. Legacy agent token auth: Both orchestrator and sysadmin-agent maintain backward-compatible legacy token auth alongside the new secret-hash auth. This adds complexity.

  5. ansible/runner.ts in Hub: The SSH-based inline provisioning path appears to be superseded by the Docker-based ansible-runner approach, but the code remains.


9. Tech Debt & Quick Wins

Top 10 Tech Debt Items

| # | Item | Repo | Impact | Effort |
|---|------|------|--------|--------|
| 1 | Unauthenticated orchestrator endpoints — tenant/task CRUD and all playbook routes have no auth. Any process on the tenant server can create tasks or trigger playbooks. | orchestrator | HIGH (security) | Medium — add AdminAuth or DashboardAuth dependency to routes |
| 2 | No integration/E2E tests across any repo — all tests are unit tests with mocks. No tests that verify actual HTTP flows, database interactions (Hub uses mocked Prisma, orchestrator uses SQLite not PostgreSQL), or provisioning pipelines. | all | HIGH (reliability) | High — need test infrastructure |
| 3 | MCP browser API key not generated — the {{ mcp_browser_api_key }} template variable in the sysadmin docker-compose is never substituted by env_setup.sh. MCP browser runs unauthenticated. | ansible-runner | HIGH (security) | Low — add key generation to env_setup.sh |
| 4 | Root password unencrypted on disk — the provisioning job config.json sits in plaintext on the Hub server's filesystem (JOBS_DIR) containing the root password for the target server. | hub | HIGH (security) | Medium — encrypt at rest, delete after job |
| 5 | NextAuth beta dependency — next-auth@5.0.0-beta.30 in production. Breaking changes possible on upgrade. | hub | Medium (stability) | High — monitor for stable release, pin carefully |
| 6 | Dual provisioning paths — both ansible/runner.ts (SSH inline) and docker-spawner.ts (container-based) exist, adding confusion about which path is canonical. | hub | Medium (maintainability) | Low — deprecate/remove one path |
| 7 | Legacy agent auth — both token-based and secret-hash-based auth maintained in parallel. Adds code complexity and potential security confusion. | orchestrator, agent | Medium (complexity) | Medium — migration plan + removal timeline |
| 8 | No backup monitoring — backups.sh writes backup-status.json but nothing reads it. Hub has no visibility into backup health. | ansible-runner, hub | Medium (operations) | Medium — add status endpoint + Hub polling |
| 9 | Credential round-trip — Hub generates creds → ansible-runner writes to server → agent reads and heartbeats back to Hub. Hub already has the creds. | hub, agent | Low (waste) | Low — eliminate re-sync, use Hub as source of truth |
| 10 | No README in any repo — all 5 repos lack README.md. Project documentation lives only in CLAUDE.md files (intended for AI assistants, not human developers). | all | Low (onboarding) | Low — create READMEs |

5 Quick Wins

| # | Win | Repo | Impact | Effort |
|---|-----|------|--------|--------|
| 1 | Add mcp_browser_api_key to env_setup.sh — generate a random key and add the sed substitution. Fixes a real security hole. | ansible-runner | HIGH | ~30 minutes |
| 2 | Add auth to orchestrator playbook/tenant/task routes — apply the existing AdminAuthDep or DashboardAuthDep (already coded) to unprotected routes. | orchestrator | HIGH | ~2 hours |
| 3 | Clean up job config files after provisioning — add a post-completion step in jobService to delete the config.json file containing the root password. | hub | HIGH | ~1 hour |
| 4 | Wire DashboardAuthDep to playbook routes — the dependency is already built and tested, just needs to be added to route decorators. | orchestrator | Medium | ~1 hour |
| 5 | Remove or deprecate ansible/runner.ts — if docker-spawner is the canonical path, remove the SSH inline path to reduce confusion and attack surface. | hub | Medium | ~2 hours |

10. Recommended Architecture

10.1 Repository Disposition

| Repo | Recommendation | Rationale |
|------|----------------|-----------|
| letsbe-hub | KEEP — this is the central control plane | Largest, most feature-complete repo. Admin UI, customer management, order lifecycle, payment integration, server monitoring. |
| letsbe-orchestrator | KEEP — essential per-tenant control plane | Runs on each tenant server. Task queue, agent management, playbook dispatch. Clean separation of concerns. |
| letsbe-sysadmin-agent | KEEP — essential per-tenant worker | Executes tasks from the orchestrator. Well-designed executor pattern. Playwright scenarios working. |
| letsbe-ansible-runner | KEEP but RENAME to letsbe-provisioner | The name is misleading — it doesn't use Ansible at all. It's a Bash-based Docker provisioning container, and the name should reflect what it actually does. |
| letsbe-mcp-browser | EVALUATE for MERGE into sysadmin-agent | The sysadmin agent already has embedded Playwright; the MCP browser is a separate HTTP API for the same capability, currently unused. Consider either (a) merging MCP browser functionality into the agent as an internal module, or (b) completing the integration and migrating all Playwright scenarios to use the MCP browser as the browser backend. Option (b) is better long-term for LLM-driven exploratory browsing. |

10.2 Missing Services / Components

| Component | Purpose | Priority |
|-----------|---------|----------|
| Customer Portal | Self-service dashboard for customers to view server status, manage tools, and see credentials. Built as pages under the existing Hub (Next.js app, /customer route group). | HIGH — needed for SaaS launch |
| DNS Automation Service | Integration with a DNS provider (Cloudflare, Entri, or Route53) to automatically create/manage A records. Could be a service module within the Hub. | HIGH — blocks fully automated provisioning |
| Notification Service | Transactional email service for provisioning status updates, downtime alerts, and cert renewal warnings. Extend the existing emailService with event-driven triggers. | Medium |
| Backup Monitor | Agent-side executor that reads backup-status.json and reports to the orchestrator, plus a Hub-side dashboard for backup health. | Medium |
| Centralized Logging | Aggregated log collection across tenant servers. Consider Loki or similar lightweight log aggregation; the current approach of per-server logs doesn't scale. | Low (for initial launch) |

10.3 Target Architecture for Production Launch

                    ┌──────────────────────────────────────────┐
                    │              INTERNET                      │
                    │                                            │
                    │  Stripe    Cloudflare/Entri    Netcup SCP │
                    └──────┬──────────┬──────────────┬──────────┘
                           │          │              │
                    ┌──────┴──────────┴──────────────┴──────────┐
                    │          letsbe-hub (Next.js)              │
                    │                                            │
                    │  Admin Portal (/admin)                     │
                    │  Customer Portal (/customer)  ← NEW       │
                    │  Public API (/api/v1/public)               │
                    │  Orchestrator Phone-Home (/api/v1/orch)    │
                    │  Stripe Webhooks                           │
                    │                                            │
                    │  Services:                                 │
                    │    automationWorker (state machine)        │
                    │    dnsService + DNS Provider ← NEW         │
                    │    dockerSpawner (runner containers)       │
                    │    netcupService                           │
                    │    jobService                              │
                    │    notificationService (enhanced) ← NEW   │
                    │                                            │
                    │  PostgreSQL (Prisma)                       │
                    └────────────────┬──────────────────────────┘
                                     │
                    Docker spawn      │    Phone-home (HTTPS)
                    ┌────────────┐   │   ┌──────────────────┐
                    │ letsbe-    │   │   │  Tenant Servers   │
                    │ provisioner│───┼──►│  (N instances)    │
                    │ (Bash)     │   │   │                   │
                    │            │   │   │  orchestrator     │◄── Tasks/Playbooks
                    │ Short-lived│   │   │  sysadmin-agent   │◄── Executors
                    └────────────┘   │   │  mcp-browser      │◄── LLM browsing
                                     │   │  (or merged)      │
                                     │   │                   │
                                     │   │  Tool Stacks:     │
                                     │   │  28+ services     │
                                     │   │  nginx + SSL      │
                                     │   └──────────────────┘
                                     │
                                     │ Heartbeat/Telemetry
                                     └──────────────────────

Key changes from current state:

  1. Add customer-facing portal within existing Hub
  2. Add DNS provider integration for automated record management
  3. Rename ansible-runnerprovisioner to reduce confusion
  4. Fix security gaps (auth on orchestrator routes, MCP browser API key, config cleanup)
  5. Either merge MCP browser into agent or complete the integration
  6. Remove legacy auth paths after migration period
  7. Remove duplicate SSH provisioning path from Hub
  8. Add integration tests for critical paths (provisioning, task execution, phone-home)
  9. Enhance backup monitoring with agent-to-Hub reporting

End of Analysis