SysAdmin Agent Roadmap

This document tracks Agent-specific work for the AI SysAdmin system.

Completed Work

Core Infrastructure

  • Secure startup
  • Automatic registration with orchestrator
  • Polling loop (configurable interval)
  • Heartbeat loop
  • Executor registry system
  • BaseExecutor + ExecutionResult model
  • Logging with structlog
  • Sandboxing and path validation
  • Task timing, error propagation
  • Circuit breaker for resilience
  • Full test suite (140+ tests)
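
The roadmap lists a circuit breaker but does not say how it is built. As a rough illustration only, one common shape is sketched below: after a number of consecutive failures the breaker opens and calls are refused until a cool-down has passed. The class name, method names, and thresholds are all hypothetical and are not taken from the agent's code.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after max_failures consecutive failures; allows a trial call after reset_after seconds."""

    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a call may be attempted right now."""
        if self._opened_at is None:
            return True
        # Half-open: permit a trial call once the cool-down has elapsed.
        return self._clock() - self._opened_at >= self.reset_after

    def record_success(self) -> None:
        """Close the breaker and clear the failure count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count a failure and open the breaker once the threshold is reached."""
        self._failures += 1
        if self._failures >= self.max_failures:
            self._opened_at = self._clock()
```

A polling loop could call `allow()` before contacting the orchestrator and report the outcome with `record_success()` or `record_failure()`.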

Executors

| Executor | Purpose | Tests | Status |
|---|---|---|---|
| ECHO | Test connectivity | | Done |
| SHELL | Run allowed shell commands | | Done |
| ENV_UPDATE | Atomic env file edits | | Done |
| ENV_INSPECT | Read and parse env files | | Done |
| FILE_WRITE | Write files safely | | Done |
| FILE_INSPECT | Read files with size limits | 24 | Done |
| DOCKER_RELOAD | Pull + up -d compose stacks | 26 | Done |
| COMPOSITE | Chain multiple executors | | Done |
| NEXTCLOUD | Nextcloud-specific tasks | | Done |
| PLAYWRIGHT | Browser automation | | Done |

Security

  • Path sandboxing to /opt/letsbe/
  • Allowed file root validation
  • Max file size limits
  • Shell command timeout
  • Non-root execution (configurable)
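
To make the path-sandboxing item concrete, here is a minimal sketch of one way such a check can be written. Only the `/opt/letsbe/` root comes from this roadmap; the function name and the choice to raise `ValueError` are assumptions for illustration, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path

ALLOWED_ROOT = Path("/opt/letsbe")  # sandbox root named in the roadmap


def resolve_in_sandbox(candidate: str, root: Path = ALLOWED_ROOT) -> Path:
    """Return the resolved path, or raise ValueError if it escapes root.

    Hypothetical helper: resolves ".." segments and symlinks before comparing,
    so "../../etc/passwd" and absolute paths outside root are both rejected.
    """
    resolved_root = root.resolve()
    resolved = (resolved_root / candidate).resolve()
    if not resolved.is_relative_to(resolved_root):
        raise ValueError(f"path escapes sandbox: {candidate}")
    return resolved
```

Resolving before comparing matters: a plain string-prefix check on the unresolved path would accept `/opt/letsbe/../../etc/passwd`.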

Remaining Work

Phase 1: Support for New Playbooks

No new executors are needed: the existing executors support all Phase 1 tool playbooks via COMPOSITE tasks.


Phase 2: Introspection Executors

| Executor | Purpose | Status |
|---|---|---|
| SERVICE_DISCOVER | List all running services/containers | ⬚ Todo |
| CONFIG_SCAN | Find misconfigurations across services | ⬚ Todo |
| NGINX_INSPECT | Parse nginx configs for domain info | ⬚ Todo |

Phase 3: Server-Level Executors

| Executor | Purpose | Status |
|---|---|---|
| NGINX_RELOAD | Validate and reload nginx | ⬚ Todo |
| HEALTHCHECK | Check docker status, ports, logs | ⬚ Todo |
| STACK_HEALTH | Verify docker compose stack integrity | ⬚ Todo |
| PACKAGE_UPGRADE | System package updates | ⬚ Todo |

NGINX_RELOAD requirements:

  • Validate config with nginx -t
  • Reload with nginx -s reload
  • Rollback on failure
  • Path sandboxing for config files
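
The validate, reload, and rollback requirements above could be combined roughly as follows. This is a sketch under stated assumptions, not the planned implementation: the function and parameter names are hypothetical, `live_conf` is assumed to be a file already included by the main nginx configuration that `nginx -t` checks, and both paths are assumed to have passed sandbox validation before this is called.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]  # takes a command, returns its exit code


def run_command(cmd: Sequence[str]) -> int:
    """Default runner: execute cmd and return its exit code."""
    return subprocess.run(cmd, capture_output=True, timeout=30).returncode


def apply_nginx_config(new_conf: Path, live_conf: Path, run: Runner = run_command) -> bool:
    """Install new_conf over live_conf; restore the old file if validation or reload fails."""
    backup = live_conf.with_name(live_conf.name + ".bak")
    shutil.copy2(live_conf, backup)  # keep the last known-good config
    shutil.copy2(new_conf, live_conf)
    ok = False
    try:
        # Reload only runs if the config test passes.
        ok = run(["nginx", "-t"]) == 0 and run(["nginx", "-s", "reload"]) == 0
    finally:
        if not ok:
            shutil.copy2(backup, live_conf)  # rollback, also on exceptions
    return ok
```

Passing the runner as a parameter keeps the rollback logic testable on machines where nginx is not installed.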

HEALTHCHECK requirements:

  • Check container status via Docker API
  • Verify expected ports are listening
  • Scan logs for error patterns
  • Return structured health report
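
As an illustration of the port check, log scan, and structured report, the sketch below uses only the standard library. The report fields, the error regex, and the function names are assumptions made for this example; the container status check via the Docker API is deliberately left out.

```python
import re
import socket
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

# Assumed error markers; the real patterns are not specified in the roadmap.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL|CRITICAL)\b")


@dataclass
class HealthReport:
    """Hypothetical report shape: what is down and which log lines look bad."""
    closed_ports: List[int] = field(default_factory=list)
    error_lines: List[str] = field(default_factory=list)

    @property
    def healthy(self) -> bool:
        return not self.closed_ports and not self.error_lines


def port_is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def build_report(
    ports: Iterable[int],
    log_lines: Iterable[str],
    probe: Callable[[int], bool] = port_is_listening,
) -> HealthReport:
    """Collect closed ports and suspicious log lines into one report."""
    return HealthReport(
        closed_ports=[p for p in ports if not probe(p)],
        error_lines=[line for line in log_lines if ERROR_PATTERN.search(line)],
    )
```

An executor built on this could place the report fields in the `data` of its `ExecutionResult` and set `success` from `healthy`.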

Phase 4: Advanced Executors

| Executor | Purpose | Status |
|---|---|---|
| BACKUP | Create and upload backups | ⬚ Todo |
| RESTORE | Restore from backup | ⬚ Todo |
| LOG_TAIL | Stream logs from containers | ⬚ Todo |
| CERT_CHECK | Verify SSL certificate status | ⬚ Todo |

Phase 5: Playwright Browser Automation

Completed:

  • Playwright installation in container
  • Scenario-based executor architecture
  • Domain allowlist security (mandatory)
  • Screenshot capture for success/failure
  • Artifact storage with per-task isolation
  • Route interception for domain blocking
  • Unit tests for validation logic

Available Scenarios:

| Scenario | Purpose | Status |
|---|---|---|
| echo | Test connectivity and page load | Done |
| nextcloud_initial_setup | Automate Nextcloud admin setup wizard | Done |

Usage Example:

{
  "type": "PLAYWRIGHT",
  "payload": {
    "scenario": "nextcloud_initial_setup",
    "inputs": {
      "base_url": "https://cloud.example.com",
      "admin_username": "admin",
      "admin_password": "secret123"
    },
    "options": {
      "allowed_domains": ["cloud.example.com"],
      "screenshot_on_success": true
    }
  }
}
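
The `allowed_domains` option implies a per-request host check. A minimal sketch of such a check is shown here; the function name is hypothetical, and exact host matching is an assumption, since the roadmap does not say whether subdomains of an allowed domain are accepted. In Playwright a check like this would typically be called from a `page.route` handler that aborts or continues each request.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_allowed(url: str, allowed_domains: Iterable[str]) -> bool:
    """Return True only if the URL's host exactly matches an allowed domain."""
    host = (urlsplit(url).hostname or "").lower()
    allowed = {domain.lower() for domain in allowed_domains}
    # An unparsable URL yields an empty host, which is never in the set.
    return host in allowed
```

Comparing the parsed hostname rather than searching the URL string avoids accepting look-alike hosts such as `cloud.example.com.evil.net`.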

Remaining Work:

  • MCP sidecar service for exploratory browser control
  • Additional tool setup scenarios (Keycloak, Poste, etc.)

Executor Implementation Pattern

All executors follow the same pattern:

from app.executors.base import BaseExecutor, ExecutionResult

class NewExecutor(BaseExecutor):
    """Description of what this executor does."""

    async def execute(self, payload: dict) -> ExecutionResult:
        # 1. Validate payload
        # 2. Validate paths (if file operations)
        # 3. Perform operation
        # 4. Return ExecutionResult(success=True/False, data={...}, error=...)
        raise NotImplementedError  # placeholder so the template is valid Python

Register in app/executors/__init__.py:

from .new_executor import NewExecutor
EXECUTOR_REGISTRY["NEW_TYPE"] = NewExecutor
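
To show how the pattern and the registry fit together, here is a self-contained sketch. `ExecutionResult` and `BaseExecutor` below are simplified stand-ins that mirror only the fields named above, not the real classes in `app.executors.base`; `EchoExecutor`'s behaviour and the `dispatch` helper are hypothetical.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Type


@dataclass
class ExecutionResult:  # stand-in mirroring success / data / error
    success: bool
    data: Dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None


class BaseExecutor:  # stand-in for app.executors.base.BaseExecutor
    async def execute(self, payload: dict) -> ExecutionResult:
        raise NotImplementedError


class EchoExecutor(BaseExecutor):
    """Returns the payload's message unchanged (assumed behaviour)."""

    async def execute(self, payload: dict) -> ExecutionResult:
        # 1. Validate payload
        if "message" not in payload:
            return ExecutionResult(success=False, error="missing 'message'")
        # 3./4. Perform the operation and return the result
        return ExecutionResult(success=True, data={"echo": payload["message"]})


EXECUTOR_REGISTRY: Dict[str, Type[BaseExecutor]] = {"ECHO": EchoExecutor}


async def dispatch(task_type: str, payload: dict) -> ExecutionResult:
    """Look up the executor class for task_type and run it."""
    executor_cls = EXECUTOR_REGISTRY.get(task_type)
    if executor_cls is None:
        return ExecutionResult(success=False, error=f"unknown task type: {task_type}")
    return await executor_cls().execute(payload)
```

For example, `asyncio.run(dispatch("ECHO", {"message": "hi"}))` returns a successful result whose `data` is `{"echo": "hi"}`.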

Testing

All executors must have comprehensive tests:

# Run all tests
pytest

# Run specific executor tests
pytest tests/test_executors/test_new_executor.py -v

# Run with coverage
pytest --cov=app/executors
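
An executor test file might look roughly like the sketch below. The executor and result classes are stand-ins defined inline so the example runs on its own; a real test would import them from `app.executors`. The tests call `asyncio.run` directly, which assumes no async pytest plugin is configured.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ExecutionResult:  # stand-in for the real result model
    success: bool
    data: Dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None


class NewExecutor:  # hypothetical executor under test
    async def execute(self, payload: dict) -> ExecutionResult:
        path = payload.get("path")
        if not path:
            return ExecutionResult(success=False, error="path is required")
        return ExecutionResult(success=True, data={"path": path})


def test_rejects_missing_path():
    """An invalid payload should fail with a clear error, not raise."""
    result = asyncio.run(NewExecutor().execute({}))
    assert result.success is False
    assert result.error == "path is required"


def test_returns_data_on_success():
    """A valid payload should succeed and echo the path in data."""
    result = asyncio.run(NewExecutor().execute({"path": "env/app.env"}))
    assert result.success is True
    assert result.data == {"path": "env/app.env"}
```

Covering both the rejection path and the success path for each executor keeps payload validation from regressing unnoticed.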

Next Steps

  1. Phase 1 needs no changes; the existing executors already support it
  2. When Phase 2 starts, implement SERVICE_DISCOVER executor
  3. When Phase 3 starts, implement NGINX_RELOAD and HEALTHCHECK