
LetsBe Biz — Dispatcher Routing Logic

Version: 1.0
Date: February 26, 2026
Authors: Matt (Founder), Claude (Architecture)
Status: Engineering Spec — Ready for Implementation
Companion docs: Technical Architecture v1.2, Pricing Model v2.2, SOUL.md Content Spec v1.0
Decision refs: Foundation Document Decisions #33, #35, #41


1. Purpose

This document specifies two routing systems that are central to LetsBe Biz:

  1. Agent Routing (Dispatcher) — How user messages are routed to the correct AI agent
  2. Model Routing — How AI requests are routed to the optimal LLM model based on task complexity, user settings, and cost constraints

Both routing systems live in the Safety Wrapper extension and operate transparently — users interact with "their AI team," not with routing logic.


2. Agent Routing (Dispatcher Logic)

2.1 Architecture

The Dispatcher is an OpenClaw agent configured with agentToAgent communication enabled. It uses the messaging tool profile and serves as the default entry point for all user messages.

User Message
    │
    ▼
Dispatcher Agent (SOUL.md: routing rules)
    │
    ├── Simple / cross-domain → Handle directly
    │
    ├── Infrastructure → delegate to IT Admin
    ├── Content / analytics → delegate to Marketing
    ├── Scheduling / comms → delegate to Secretary
    ├── CRM / pipeline → delegate to Sales
    │
    └── Multi-domain → coordinate across agents

2.2 Routing Decision Matrix

The Dispatcher routes based on intent classification. OpenClaw's native agent routing handles this through the Dispatcher's SOUL.md instructions — no separate classification model is needed.

| Signal | Routes To | Examples |
|---|---|---|
| Infrastructure keywords | IT Admin | "restart", "container", "backup", "disk", "server", "install", "update", "nginx", "Docker", "SSL", "certificate", "Keycloak", "Portainer" |
| Content/analytics keywords | Marketing | "blog", "post", "newsletter", "campaign", "analytics", "traffic", "subscribers", "Ghost", "Listmonk", "Umami", "SEO" |
| Scheduling/comms keywords | Secretary | "calendar", "meeting", "schedule", "email", "respond", "follow up", "Chatwoot", "Cal.com", "appointment", "reminder" |
| CRM/sales keywords | Sales | "lead", "opportunity", "pipeline", "CRM", "deal", "prospect", "follow-up", "Odoo", "quote", "proposal" |
| System questions | Dispatcher (self) | "what can you do", "how does this work", "what tools do I have", "help", "status", "summary" |
| Multi-domain | Dispatcher coordinates | "morning briefing", "give me a weekly summary", "how's business", "prepare for my meeting with [client]" |

2.3 Delegation Protocol

When the Dispatcher delegates to a specialist agent, it uses OpenClaw's native agent-to-agent messaging:

1. Dispatcher receives user message
2. Dispatcher identifies the target agent
3. Dispatcher sends structured delegation message:
   {
     "to": "it-admin",
     "context": "User requests: 'Why is Nextcloud slow?'",
     "expectation": "Diagnose and report. If action needed, get user approval."
   }
4. Target agent receives message, executes task
5. Target agent returns result to Dispatcher
6. Dispatcher formats and presents result to user
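The round-trip above can be sketched as a single helper. This is a minimal illustration, not OpenClaw's actual API: `sendToAgent` is a hypothetical wrapper around the native agent-to-agent messaging, and the `[agent]` prefix stands in for whatever formatting the Dispatcher applies in step 6.

```javascript
// Sketch of the delegation round-trip (steps 3-6), assuming a hypothetical
// sendToAgent() wrapper around OpenClaw's agent-to-agent messaging.
async function delegate(targetAgent, userMessage, expectation, sendToAgent) {
  // Step 3: build the structured delegation message
  const delegation = {
    to: targetAgent,
    context: `User requests: '${userMessage}'`,
    expectation,
  };
  // Steps 4-5: the specialist executes and returns its result
  const result = await sendToAgent(delegation);
  // Step 6: the Dispatcher formats the result before presenting it
  return `[${targetAgent}] ${result}`;
}
```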

2.4 Multi-Agent Coordination

For tasks spanning multiple agents, the Dispatcher acts as coordinator:

Example: "Prepare for my call with Acme Corp tomorrow"

  1. Dispatcher identifies subtasks:
    • Secretary: Pull calendar details, recent email threads with Acme
    • Sales: Pull CRM record, pipeline status, last interaction
    • Marketing: Check if Acme visited the website recently (Umami)
  2. Dispatcher delegates each subtask in parallel (or sequential if dependencies exist)
  3. Dispatcher compiles results into a unified briefing
  4. Dispatcher presents the briefing to the user
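When subtasks have no dependencies, the fan-out in step 2 is a straightforward parallel dispatch. The sketch below assumes a hypothetical `delegateTo(agent, task)` messaging wrapper; the briefing format is illustrative.

```javascript
// Sketch of parallel delegation (step 2) and compilation (step 3).
// delegateTo() is a hypothetical wrapper around agent-to-agent messaging.
async function compileBriefing(subtasks, delegateTo) {
  // subtasks: [{ agent: "secretary", task: "Pull calendar details" }, ...]
  const results = await Promise.all(
    subtasks.map(({ agent, task }) => delegateTo(agent, task))
  );
  // One briefing section per agent, in subtask order
  return subtasks
    .map(({ agent }, i) => `## ${agent}\n${results[i]}`)
    .join("\n\n");
}
```

Sequential execution (for dependent subtasks) would replace `Promise.all` with an ordered `for...of` loop, feeding earlier results into later delegations.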

2.5 Fallback Behavior

| Scenario | Behavior |
|---|---|
| Target agent unavailable (crashed/restarting) | Dispatcher notifies user, suggests IT Admin investigate |
| Ambiguous request | Dispatcher makes best judgment, routes, tells user who's handling it |
| User explicitly names an agent | Route directly ("Tell the IT Admin to restart Ghost") |
| Request is outside all agent capabilities | Dispatcher explains honestly what's possible and what isn't |
| Agent returns an error | Dispatcher reports the error to the user and suggests next steps |

3. Model Routing

3.1 Architecture

Model routing determines which LLM processes each agent turn. The Safety Wrapper's before_prompt_build hook (or the outbound secrets proxy) controls which model endpoint the request is sent to.

Agent Turn
    │
    ▼
Safety Wrapper: Model Router
    │
    ├── Check user's model setting (Basic / Balanced / Complex / Specific Model)
    ├── Check if premium model → verify credit card on file
    ├── Check token pool → enough tokens remaining?
    │
    ▼
Route to OpenRouter endpoint
    │
    ├── Primary model → attempt
    ├── If rate limited → try auth profile rotation (same model, different key)
    ├── If still failing → fallback to next model in chain
    │
    ▼
Response → Token metering → Return to agent

3.2 Model Presets (Basic Settings)

Users who don't want to think about models pick a preset. Each preset maps to a prioritized model chain.

| Preset | Primary Model | Fallback 1 | Fallback 2 | Blended Cost/1M | Use Case |
|---|---|---|---|---|---|
| Basic Tasks | GPT 5 Nano | Gemini 3 Flash Preview | DeepSeek V3.2 | $0.20–1.58 | Quick lookups, formatting, simple drafts |
| Balanced (default) | DeepSeek V3.2 | MiniMax M2.5 | GPT 5 Nano | $0.20–0.70 | Daily operations, routine agent work |
| Complex Tasks | GLM 5 | MiniMax M2.5 | DeepSeek V3.2 | $0.33–1.68 | Analysis, multi-step reasoning, reports |

Preset assignment logic:

function resolveModel(agentId, taskContext) {
  // 1. Check for an agent-specific model override (optional chaining guards
  //    against agents that have no config entry at all)
  const agentModel = agentConfig[agentId]?.model;
  if (agentModel) return agentModel;

  // 2. Check the user's global preset setting
  const preset = tenantConfig.modelPreset; // "basic" | "balanced" | "complex"

  // 3. Return the primary model for that preset
  return PRESETS[preset].primary;
}

3.3 Advanced Model Selection

Users with a credit card on file can select specific models per agent or per task:

| Configuration Level | Scope | Example |
|---|---|---|
| Global preset | All agents, all tasks | "Use Balanced for everything" |
| Per-agent override | All tasks for one agent | "IT Admin uses Complex, everything else uses Balanced" |
| Per-task override (future) | Single task/conversation | "Use Claude Sonnet for this analysis" |

Schema (Safety Wrapper config):

{
  "model_routing": {
    "default_preset": "balanced",
    "agent_overrides": {
      "it-admin": { "preset": "complex" },
      "marketing": { "model": "claude-sonnet-4.6" }
    },
    "premium_enabled": true,
    "credit_card_on_file": true
  }
}
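Resolving a model against this schema follows a simple precedence: an explicit per-agent `model` pin wins, then a per-agent `preset`, then the global `default_preset`. The sketch below encodes that order; `PRESET_PRIMARY` reflects the primary model of each preset chain from Section 3.2, and the exact precedence is an assumption consistent with that section.

```javascript
// Primary model per preset (first entry of each chain in Section 3.2)
const PRESET_PRIMARY = {
  basic: "openai/gpt-5-nano",
  balanced: "deepseek/deepseek-v3.2",
  complex: "zhipu/glm-5",
};

// Sketch of override resolution against the model_routing schema above
function resolveAgentModel(routing, agentId) {
  const override = routing.agent_overrides?.[agentId];
  if (override?.model) return override.model;                   // exact model pin
  if (override?.preset) return PRESET_PRIMARY[override.preset]; // per-agent preset
  return PRESET_PRIMARY[routing.default_preset];                // global default
}
```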

3.4 Included vs. Premium Model Routing

| Model Category | Token Pool | Billing | Credit Card Required |
|---|---|---|---|
| Included (DeepSeek V3.2, GPT 5 Nano, GPT 5.2 Mini, MiniMax M2.5, Gemini Flash, GLM 5) | Draws from monthly allocation | Subscription covers it | No |
| Premium (GPT 5.2, Claude Sonnet 4.6, Claude Opus 4.6, Gemini 3.1 Pro) | Separate — does NOT draw from pool | Per-token metered to credit card | Yes |

Routing decision tree:

Is the selected model Premium?
├── No → Check token pool
│   ├── Tokens remaining → Route to model
│   └── Pool exhausted → Apply overage markup, notify user, route to model
│
└── Yes → Check credit card
    ├── Card on file → Route to model, meter tokens
    └── No card → Reject, prompt user to add card in settings
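The decision tree above reduces to a small pure function. This is a simplified sketch: the return shapes, `state` fields, and notification handling are illustrative, not the Safety Wrapper's actual interface.

```javascript
// Sketch of the premium/pool routing decision (Section 3.4)
function routeBilling(model, state) {
  const isPremium = state.premiumModels.includes(model);
  if (isPremium) {
    if (!state.creditCardOnFile) {
      return { allow: false, reason: "add-card" }; // prompt user in settings
    }
    return { allow: true, billing: "metered" };    // per-token to card
  }
  if (state.tokensRemaining > 0) {
    return { allow: true, billing: "pool" };       // draw from allocation
  }
  return { allow: true, billing: "overage" };      // markup applies, user notified
}
```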

3.5 Token Pool Management

Each subscription tier includes a monthly token allocation:

| Tier | Monthly Tokens | Founding Member Tokens |
|---|---|---|
| Lite (€29) | ~8M | ~16M |
| Build (€45) | ~15M | ~30M |
| Scale (€75) | ~25M | ~50M |
| Enterprise (€109) | ~40M | ~80M |

Pool tracking implementation:

On every LLM response:
  1. Safety Wrapper captures token counts (input, output, cache_read, cache_write)
  2. Calculates cost: tokens × model_rate × (1 + openrouter_fee)
  3. Converts to "standard tokens" (normalized to DeepSeek V3.2 equivalent)
  4. Decrements from monthly pool
  5. Reports to Hub via usage endpoint

When pool is exhausted:
  1. Safety Wrapper detects pool < 0
  2. Switches to overage billing mode
  3. Applies overage markup (35% for cheap models, 25% mid, 20% top included)
  4. Notifies user: "Your included tokens are used up. Continuing at overage rates."
  5. User can top up, upgrade tier, or wait for next billing cycle
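Steps 2–4 of pool tracking can be sketched as follows. The 5.5% OpenRouter fee is an illustrative assumption (it is a parameter here), and the normalization baseline is DeepSeek V3.2's standard input rate from Section 4.2: a model twice as expensive decrements twice as many "standard tokens" per raw token.

```javascript
// DeepSeek V3.2 standard input rate, $/1M tokens (Section 4.2 baseline)
const BASELINE_RATE = 0.274;

// Sketch of steps 2-4: cost calculation, normalization, pool decrement.
// openrouterFee of 5.5% is an assumed placeholder value.
function meterTurn(pool, tokens, modelRate, openrouterFee = 0.055) {
  const cost = (tokens / 1e6) * modelRate * (1 + openrouterFee); // step 2: USD
  const standardTokens = tokens * (modelRate / BASELINE_RATE);   // step 3: normalize
  return { cost, remaining: pool - standardTokens };             // step 4: decrement
}
```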

3.6 Fallback Chain Logic

When the primary model fails, the router attempts fallbacks before giving up.

Failure types and responses:

| Failure | First Response | Second Response | Third Response |
|---|---|---|---|
| Rate limited (429) | Rotate auth profile (different OpenRouter key) | Wait 5s, retry same model | Fall to next model in chain |
| Model unavailable (503) | Fall to next model in chain immediately | Continue down chain | Return error to agent |
| Context too long | Truncate and retry | Fall to model with larger context (Gemini Flash: 1M) | Return error suggesting context compaction |
| Timeout (>60s) | Retry once | Fall to faster model | Return timeout error |
| Auth error (401/403) | Rotate auth profile | Retry with Hub-synced key | Return auth error, notify admin |

Auth profile rotation: OpenClaw natively supports multiple auth profiles per model provider. Before falling back to a different model, the router first tries rotating to a different API key for the same model. This handles per-key rate limits.

{
  "providers": {
    "openrouter": {
      "auth_profiles": [
        { "id": "primary", "key": "SECRET_REF(openrouter_key_1)" },
        { "id": "secondary", "key": "SECRET_REF(openrouter_key_2)" },
        { "id": "tertiary", "key": "SECRET_REF(openrouter_key_3)" }
      ],
      "rotation_strategy": "round-robin-on-failure"
    }
  }
}
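The retry ladder (rotate keys on the same model first, then descend the fallback chain) can be sketched as two nested loops. This is a simplified illustration: `callModel` is a hypothetical request function that throws with an `err.code` status, and only 429/401 trigger key rotation here; other errors skip straight to the next model, per the failure table above.

```javascript
// Sketch: try every auth profile on the current model before falling back.
// callModel(model, profile) is a hypothetical request function.
async function attemptWithRotation(model, chain, profiles, callModel) {
  for (const candidate of [model, ...chain]) {
    for (const profile of profiles) {
      try {
        return await callModel(candidate, profile); // first success wins
      } catch (err) {
        // 429/401: per-key problem, rotate to the next key on this model;
        // anything else (e.g. 503): abandon this model, descend the chain
        if (err.code !== 429 && err.code !== 401) break;
      }
    }
  }
  throw new Error("All models and auth profiles exhausted");
}
```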

3.7 Fallback Chain Definitions

| Starting Model | Fallback 1 | Fallback 2 | Fallback 3 (emergency) |
|---|---|---|---|
| DeepSeek V3.2 | MiniMax M2.5 | GPT 5 Nano | — (return error) |
| GPT 5 Nano | DeepSeek V3.2 | — | — |
| GLM 5 | MiniMax M2.5 | DeepSeek V3.2 | GPT 5 Nano |
| MiniMax M2.5 | DeepSeek V3.2 | GPT 5 Nano | — |
| Gemini Flash | DeepSeek V3.2 | GPT 5 Nano | — |
| GPT 5.2 (premium) | GLM 5 (included) | DeepSeek V3.2 | — |
| Claude Sonnet 4.6 (premium) | GPT 5.2 (premium) | GLM 5 | DeepSeek V3.2 |
| Claude Opus 4.6 (premium) | Claude Sonnet 4.6 | GPT 5.2 | GLM 5 |

Cross-category fallback rule: Premium models can fall back to included models, but included models never fall "up" to premium models (that would charge the user unexpectedly).
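This rule is easy to enforce as a config-time check over the `fallback_chains` map (Section 7.1). A minimal sketch:

```javascript
// Sketch of a config-time validator for the cross-category rule:
// no chain starting at an included model may contain a premium model.
function validateChains(chains, premiumModels) {
  const violations = [];
  for (const [start, chain] of Object.entries(chains)) {
    if (premiumModels.includes(start)) continue; // premium may fall anywhere
    for (const m of chain) {
      if (premiumModels.includes(m)) violations.push(`${start} -> ${m}`);
    }
  }
  return violations;
}
```

Running this at Safety Wrapper startup (and in CI against `model-routing.json`) would catch a misconfigured chain before it can charge a user unexpectedly.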


4. Prompt Caching Strategy

4.1 Cache Architecture

OpenClaw's prompt caching with cacheRetention: "long" (1-hour TTL) is the primary cost optimization. The SOUL.md is the cacheable prefix.

┌─────────────────────────────────────────────────┐
│ Cached Prefix (1-hour TTL)                       │
│ ┌──────────────────┐ ┌────────────────────────┐ │
│ │ SOUL.md (~3K tok) │ │ Tool Registry (~3K tok) │ │
│ └──────────────────┘ └────────────────────────┘ │
└─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│ Dynamic Content (not cached)                     │
│ ┌────────────────┐ ┌──────────────────────────┐ │
│ │ Conversation    │ │ Tool results / context   │ │
│ └────────────────┘ └──────────────────────────┘ │
└─────────────────────────────────────────────────┘

4.2 Cache Savings by Model

| Model | Standard Input/1M | Cache Read/1M | Savings % | Monthly Savings (20M tokens, 60% cache hit) |
|---|---|---|---|---|
| DeepSeek V3.2 | $0.274 | $0.211 | 23% | $0.76 |
| GPT 5 Nano | $0.053 | $0.005 | 91% | $0.58 |
| Gemini Flash | $0.528 | $0.053 | 90% | $5.70 |
| GLM 5 | $1.002 | $0.211 | 79% | $9.49 |
| Claude Sonnet 4.6 | $3.165 | $0.317 | 90% | $34.18 |

4.3 Heartbeat Keep-Warm

To maximize cache hit rates, the Safety Wrapper sends a heartbeat every 55 minutes (just under the 1-hour TTL). This keeps the SOUL.md prefix in cache without a real user interaction.

Config:

{
  "heartbeat": {
    "every": "55m"
  }
}

Cost of keep-warm: One minimal prompt per agent every 55 minutes = ~5K tokens × 24 turns/day × 5 agents = ~600K tokens/day. At DeepSeek V3.2 cache read rates: ~$0.13/day ($3.80/month). This is offset by the cache savings on real interactions.

Decision: Only keep-warm agents that were active in the last 24 hours. No point warming cache for agents the user hasn't talked to. The Safety Wrapper tracks lastActiveAt per agent and only sends heartbeats for recently active agents.
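The activity filter is a one-liner over the tracked timestamps. In this sketch, `lastActiveAt` is assumed to be an epoch-milliseconds value per agent; the field and list shapes are illustrative.

```javascript
// Sketch of the keep-warm filter: only agents active within the threshold
// window (default 24h) receive heartbeats.
function agentsToWarm(agents, now, thresholdHours = 24) {
  const cutoff = now - thresholdHours * 3600 * 1000;
  return agents
    .filter((a) => a.lastActiveAt >= cutoff) // active recently enough
    .map((a) => a.id);
}
```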

4.4 Cache Invalidation

Cache is invalidated when:

| Event | Action | Impact |
|---|---|---|
| SOUL.md content changes | Cache miss on next turn, re-cache | One-time cost (~6K tokens at full input rate) |
| Tool registry changes (new tool installed) | Cache miss on next turn, re-cache | One-time cost |
| Model changed (user switches preset) | New model has fresh cache | Cache builds from scratch for new model |
| 1-hour TTL expires without heartbeat | Cache expires naturally | Re-cache on next interaction |

5. Load Balancing & Rate Limiting

5.1 Per-Tenant Rate Limits

Each tenant VPS has built-in rate limiting to prevent runaway token consumption:

| Limit | Default | Configurable | Purpose |
|---|---|---|---|
| Max concurrent agent turns | 1 | No (OpenClaw default) | Prevent race conditions |
| Max tool calls per turn | 50 | Yes (Safety Wrapper) | Prevent infinite loops |
| Max tokens per single turn | 100K | Yes (Safety Wrapper) | Prevent context explosion |
| Max turns per hour per agent | 60 | Yes (Safety Wrapper) | Prevent runaway automation |
| Max API calls to Hub per minute | 10 | No | Prevent Hub overload |
| OpenRouter requests per minute | Per OpenRouter's limits | No | External rate limit |

5.2 Loop Detection

OpenClaw's native loop detection (tools.loopDetection.enabled: true) prevents agents from calling the same tool repeatedly without making progress. The Safety Wrapper adds a secondary check:

If agent calls the same tool with the same parameters 3 times in 60 seconds:
  → Block the call
  → Log warning
  → Notify user: "The AI appears to be stuck in a loop. I've paused it."
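A minimal sketch of that sliding-window check, assuming the Safety Wrapper can intercept tool calls before execution (the guard interface is illustrative):

```javascript
// Sketch of the secondary loop check: track (tool, params) call signatures
// in a sliding 60s window; the third identical call is blocked.
function makeLoopGuard(threshold = 3, windowMs = 60_000) {
  const calls = [];
  return function allow(tool, params, now) {
    const sig = tool + ":" + JSON.stringify(params);
    // Evict entries older than the window
    while (calls.length && now - calls[0].at > windowMs) calls.shift();
    const repeats = calls.filter((c) => c.sig === sig).length;
    calls.push({ sig, at: now });
    return repeats + 1 < threshold; // false => block, log, notify user
  };
}
```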

5.3 Token Budget Alerts

The Safety Wrapper monitors token consumption and alerts proactively:

| Pool Usage | Action |
|---|---|
| 50% consumed | No action (tracked internally) |
| 75% consumed | Notify user: "You've used 75% of your monthly tokens" |
| 90% consumed | Notify user with upgrade suggestion |
| 100% consumed | Switch to overage mode, notify user with cost estimate |
| 150% of pool (overage) | Strong warning, suggest reviewing which agents are most active |
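The alert tiers reduce to a threshold ladder over the consumed fraction of the allocation. The tier names below are illustrative labels, and the sketch returns only the highest tier crossed; the real implementation would also need to fire each tier once per billing cycle.

```javascript
// Sketch mapping pool consumption (fraction of monthly allocation)
// to the alert tiers in the table above; highest threshold wins.
function budgetAlert(consumedFraction) {
  if (consumedFraction >= 1.5) return "strong-warning";   // 150%: review agents
  if (consumedFraction >= 1.0) return "overage-mode";     // 100%: overage + estimate
  if (consumedFraction >= 0.9) return "suggest-upgrade";  // 90%
  if (consumedFraction >= 0.75) return "notify-75";       // 75%
  return null; // at or below 50%: tracked internally, no user-facing action
}
```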

6. Monitoring & Observability

6.1 Metrics Collected

The Safety Wrapper collects and reports these metrics to the Hub via heartbeat:

| Metric | Granularity | Used For |
|---|---|---|
| Tokens per agent per model per hour | Hourly buckets | Billing, usage dashboards |
| Model selection frequency | Per request | Optimize default presets |
| Fallback trigger count | Per hour | Monitor model reliability |
| Cache hit rate | Per agent per hour | Cost optimization tracking |
| Agent routing decisions | Per request | Dispatcher accuracy tracking |
| Tool call count per agent | Per hour | Identify heavy automation |
| Approval queue latency | Per request | UX optimization |
| Error rate per model | Per hour | Model health monitoring |

6.2 Dashboard Views (Hub)

Admin dashboard (staff):

  • Global token usage heatmap (all tenants)
  • Model usage distribution pie chart
  • Fallback frequency by model (alerts if >5% in any hour)
  • Revenue per model (included vs. premium vs. overage)

Customer dashboard:

  • "Your AI Team This Month" — token usage by agent, visualized as a bar chart
  • Model usage breakdown (which models are being used)
  • Pool status gauge (% remaining)
  • Cost breakdown (included vs. overage vs. premium)
  • "Most Active Agent" — who's doing the most work

7. Configuration Reference

7.1 Model Routing Config (model-routing.json)

{
  "presets": {
    "basic": {
      "name": "Basic Tasks",
      "models": ["openai/gpt-5-nano", "google/gemini-3-flash-preview", "deepseek/deepseek-v3.2"],
      "description": "Quick lookups, simple drafts, data entry"
    },
    "balanced": {
      "name": "Balanced",
      "models": ["deepseek/deepseek-v3.2", "minimax/minimax-m2.5", "openai/gpt-5-nano"],
      "description": "Day-to-day operations, routine tasks"
    },
    "complex": {
      "name": "Complex Tasks",
      "models": ["zhipu/glm-5", "minimax/minimax-m2.5", "deepseek/deepseek-v3.2"],
      "description": "Analysis, multi-step reasoning, reports"
    }
  },
  "premium_models": [
    "openai/gpt-5.2",
    "google/gemini-3.1-pro",
    "anthropic/claude-sonnet-4.6",
    "anthropic/claude-opus-4.6"
  ],
  "included_models": [
    "deepseek/deepseek-v3.2",
    "openai/gpt-5-nano",
    "openai/gpt-5.2-mini",
    "minimax/minimax-m2.5",
    "google/gemini-3-flash-preview",
    "zhipu/glm-5"
  ],
  "fallback_chains": {
    "deepseek/deepseek-v3.2": ["minimax/minimax-m2.5", "openai/gpt-5-nano"],
    "openai/gpt-5-nano": ["deepseek/deepseek-v3.2"],
    "zhipu/glm-5": ["minimax/minimax-m2.5", "deepseek/deepseek-v3.2", "openai/gpt-5-nano"],
    "minimax/minimax-m2.5": ["deepseek/deepseek-v3.2", "openai/gpt-5-nano"],
    "google/gemini-3-flash-preview": ["deepseek/deepseek-v3.2", "openai/gpt-5-nano"],
    "openai/gpt-5.2": ["zhipu/glm-5", "deepseek/deepseek-v3.2"],
    "anthropic/claude-sonnet-4.6": ["openai/gpt-5.2", "zhipu/glm-5", "deepseek/deepseek-v3.2"],
    "anthropic/claude-opus-4.6": ["anthropic/claude-sonnet-4.6", "openai/gpt-5.2", "zhipu/glm-5"]
  },
  "overage_markup": {
    "cheap": { "threshold_max": 0.50, "markup": 0.35 },
    "mid": { "threshold_max": 1.20, "markup": 0.25 },
    "top": { "threshold_max": 999, "markup": 0.20 }
  },
  "premium_markup": {
    "default": 0.10,
    "overrides": {
      "anthropic/claude-opus-4.6": 0.08
    }
  },
  "rate_limits": {
    "max_tool_calls_per_turn": 50,
    "max_tokens_per_turn": 100000,
    "max_turns_per_hour_per_agent": 60,
    "loop_detection_threshold": 3,
    "loop_detection_window_seconds": 60
  },
  "caching": {
    "retention": "long",
    "heartbeat_interval": "55m",
    "warmup_only_active_agents": true,
    "active_agent_threshold_hours": 24
  }
}

7.2 OpenRouter Provider Config (openclaw.json excerpt)

{
  "providers": {
    "openrouter": {
      "base_url": "http://127.0.0.1:8100/v1",
      "auth_profiles": [
        { "id": "primary", "key": "SECRET_REF(openrouter_key_1)" },
        { "id": "secondary", "key": "SECRET_REF(openrouter_key_2)" }
      ],
      "rotation_strategy": "round-robin-on-failure",
      "timeout_ms": 60000,
      "retry": {
        "max_attempts": 3,
        "backoff_ms": [1000, 5000, 15000]
      }
    }
  },
  "model": {
    "primary": "deepseek/deepseek-v3.2",
    "fallback": ["minimax/minimax-m2.5", "openai/gpt-5-nano"]
  }
}

Note: base_url points to the local secrets proxy (127.0.0.1:8100) which handles credential injection and outbound redaction before forwarding to OpenRouter's actual API.


8. Implementation Priorities

| Priority | Component | Effort | Dependencies |
|---|---|---|---|
| P0 | Basic preset routing (3 presets → model selection) | 1 week | Safety Wrapper skeleton |
| P0 | Fallback chain with auth rotation | 1 week | OpenRouter integration |
| P0 | Token metering and pool tracking | 2 weeks | Hub billing endpoints |
| P1 | Agent routing (Dispatcher SOUL.md) | 1 week | SOUL.md templates |
| P1 | Prompt caching with heartbeat keep-warm | 3 days | OpenClaw caching config |
| P1 | Loop detection (Safety Wrapper layer) | 3 days | Safety Wrapper hooks |
| P2 | Per-agent model overrides | 3 days | Hub agent config UI |
| P2 | Premium model gating (credit card check) | 1 week | Hub billing + Stripe |
| P2 | Token budget alerts | 3 days | Hub notification system |
| P3 | Multi-agent coordination (parallel delegation) | 2 weeks | Agent-to-agent messaging |
| P3 | Per-task model override (future) | 1 week | Conversation context detection |
| P3 | Customer usage dashboard | 1 week | Hub frontend |

9. Changelog

| Version | Date | Changes |
|---|---|---|
| 1.0 | 2026-02-26 | Initial spec. Agent routing via Dispatcher. Model presets (Basic/Balanced/Complex). Fallback chains with auth rotation. Token pool management. Prompt caching strategy. Rate limiting. Configuration reference. Implementation priorities. |