# LetsBe Biz — Dispatcher Routing Logic

**Version:** 1.0
**Date:** February 26, 2026
**Authors:** Matt (Founder), Claude (Architecture)
**Status:** Engineering Spec — Ready for Implementation
**Companion docs:** Technical Architecture v1.2, Pricing Model v2.2, SOUL.md Content Spec v1.0
**Decision refs:** Foundation Document Decisions #33, #35, #41

---
## 1. Purpose

This document specifies two routing systems that are central to LetsBe Biz:

1. **Agent Routing (Dispatcher)** — How user messages are routed to the correct AI agent
2. **Model Routing** — How AI requests are routed to the optimal LLM model based on task complexity, user settings, and cost constraints

Both routing systems live in the Safety Wrapper extension and operate transparently — users interact with "their AI team," not with routing logic.

---
## 2. Agent Routing (Dispatcher Logic)

### 2.1 Architecture

The Dispatcher is an OpenClaw agent configured with `agentToAgent` communication enabled. It uses the `messaging` tool profile and serves as the default entry point for all user messages.

```
User Message
     │
     ▼
Dispatcher Agent (SOUL.md: routing rules)
     │
     ├── Simple / cross-domain → handle directly
     │
     ├── Infrastructure       → delegate to IT Admin
     ├── Content / analytics  → delegate to Marketing
     ├── Scheduling / comms   → delegate to Secretary
     ├── CRM / pipeline       → delegate to Sales
     │
     └── Multi-domain         → coordinate across agents
```
### 2.2 Routing Decision Matrix

The Dispatcher routes based on **intent classification**. OpenClaw's native agent routing handles this through the Dispatcher's SOUL.md instructions — no separate classification model is needed.

| Signal | Routes To | Examples |
|--------|-----------|----------|
| Infrastructure keywords | IT Admin | "restart", "container", "backup", "disk", "server", "install", "update", "nginx", "Docker", "SSL", "certificate", "Keycloak", "Portainer" |
| Content/analytics keywords | Marketing | "blog", "post", "newsletter", "campaign", "analytics", "traffic", "subscribers", "Ghost", "Listmonk", "Umami", "SEO" |
| Scheduling/comms keywords | Secretary | "calendar", "meeting", "schedule", "email", "respond", "follow up", "Chatwoot", "Cal.com", "appointment", "reminder" |
| CRM/sales keywords | Sales | "lead", "opportunity", "pipeline", "CRM", "deal", "prospect", "follow-up", "Odoo", "quote", "proposal" |
| System questions | Dispatcher (self) | "what can you do", "how does this work", "what tools do I have", "help", "status", "summary" |
| Multi-domain | Dispatcher coordinates | "morning briefing", "give me a weekly summary", "how's business", "prepare for my meeting with [client]" |
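In production this classification is performed by the Dispatcher's LLM against its SOUL.md instructions, not by code. For routing-accuracy tests, though, the signal table can be approximated as a plain keyword lookup. A minimal sketch; the agent IDs and keyword lists are an illustrative subset, not the full matrix:

```javascript
// Illustrative only: the real Dispatcher classifies intent via SOUL.md,
// not via keyword matching. Keyword subsets taken from the table above.
const ROUTES = [
  { agent: "it-admin",  keywords: ["restart", "container", "backup", "disk", "server", "nginx", "docker", "ssl"] },
  { agent: "marketing", keywords: ["blog", "newsletter", "campaign", "analytics", "traffic", "seo"] },
  { agent: "secretary", keywords: ["calendar", "meeting", "schedule", "email", "appointment", "reminder"] },
  { agent: "sales",     keywords: ["lead", "pipeline", "crm", "deal", "prospect", "quote"] },
];

// Returns the first matching specialist, or "dispatcher" to handle directly.
function classifyIntent(message) {
  const text = message.toLowerCase();
  for (const route of ROUTES) {
    if (route.keywords.some((kw) => text.includes(kw))) return route.agent;
  }
  return "dispatcher";
}
```

A lookup like this is useful as a regression baseline: if the LLM's routing decisions drift far from the keyword baseline on a labeled test set, the SOUL.md rules probably need attention.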
### 2.3 Delegation Protocol

When the Dispatcher delegates to a specialist agent, it uses OpenClaw's native agent-to-agent messaging:

```
1. Dispatcher receives user message
2. Dispatcher identifies the target agent
3. Dispatcher sends structured delegation message:
   {
     "to": "it-admin",
     "context": "User requests: 'Why is Nextcloud slow?'",
     "expectation": "Diagnose and report. If action needed, get user approval."
   }
4. Target agent receives message, executes task
5. Target agent returns result to Dispatcher
6. Dispatcher formats and presents result to user
```
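As a sketch, step 3's message could be assembled by a small helper. The function name and the quoting of the user request are illustrative; the field names follow the example above:

```javascript
// Hypothetical helper: builds the structured delegation message from step 3.
function buildDelegation(targetAgent, userMessage, expectation) {
  return {
    to: targetAgent,
    context: `User requests: '${userMessage}'`,
    expectation,
  };
}
```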
### 2.4 Multi-Agent Coordination

For tasks spanning multiple agents, the Dispatcher acts as coordinator:

**Example: "Prepare for my call with Acme Corp tomorrow"**

1. Dispatcher identifies subtasks:
   - Secretary: Pull calendar details, recent email threads with Acme
   - Sales: Pull CRM record, pipeline status, last interaction
   - Marketing: Check if Acme visited the website recently (Umami)
2. Dispatcher delegates each subtask in parallel (or sequentially if dependencies exist)
3. Dispatcher compiles results into a unified briefing
4. Dispatcher presents the briefing to the user
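Step 1 is essentially a request-to-subtask plan. A minimal sketch of the Acme example, assuming a hypothetical `planSubtasks` helper with hard-coded task templates (real planning would come from the Dispatcher's LLM):

```javascript
// Illustrative step 1: map a known multi-domain request to subtasks.
// The task list mirrors the Acme Corp example above.
function planSubtasks(request) {
  const match = request.match(/prepare for my call with (.+) tomorrow/i);
  if (match) {
    const client = match[1];
    return [
      { agent: "secretary", task: `Pull calendar details and recent email threads with ${client}` },
      { agent: "sales",     task: `Pull CRM record, pipeline status, last interaction for ${client}` },
      { agent: "marketing", task: `Check recent website visits from ${client} (Umami)` },
    ];
  }
  return []; // unknown multi-domain request: Dispatcher handles ad hoc
}
```

Each planned subtask would then be delegated via the §2.3 protocol, in parallel where no subtask depends on another's output.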
### 2.5 Fallback Behavior

| Scenario | Behavior |
|----------|----------|
| Target agent unavailable (crashed/restarting) | Dispatcher notifies user, suggests IT Admin investigate |
| Ambiguous request | Dispatcher makes best judgment, routes, tells user who's handling it |
| User explicitly names an agent | Route directly ("Tell the IT Admin to restart Ghost") |
| Request is outside all agent capabilities | Dispatcher explains honestly what's possible and what isn't |
| Agent returns an error | Dispatcher reports the error to the user and suggests next steps |

---
## 3. Model Routing

### 3.1 Architecture

Model routing determines which LLM processes each agent turn. The Safety Wrapper's `before_prompt_build` hook (or the outbound secrets proxy) controls which model endpoint the request is sent to.

```
Agent Turn
     │
     ▼
Safety Wrapper: Model Router
     │
     ├── Check user's model setting (Basic / Balanced / Complex / Specific Model)
     ├── Check if premium model → verify credit card on file
     ├── Check token pool → enough tokens remaining?
     │
     ▼
Route to OpenRouter endpoint
     │
     ├── Primary model → attempt
     ├── If rate limited → try auth profile rotation (same model, different key)
     ├── If still failing → fall back to next model in chain
     │
     ▼
Response → Token metering → Return to agent
```
### 3.2 Model Presets (Basic Settings)

Users who don't want to think about models pick a preset. Each preset maps to a prioritized model chain.

| Preset | Primary Model | Fallback 1 | Fallback 2 | Blended Cost/1M | Use Case |
|--------|---------------|------------|------------|-----------------|----------|
| **Basic Tasks** | GPT 5 Nano | Gemini 3 Flash Preview | DeepSeek V3.2 | $0.20–1.58 | Quick lookups, formatting, simple drafts |
| **Balanced** (default) | DeepSeek V3.2 | MiniMax M2.5 | GPT 5 Nano | $0.20–0.70 | Daily operations, routine agent work |
| **Complex Tasks** | GLM 5 | MiniMax M2.5 | DeepSeek V3.2 | $0.33–1.68 | Analysis, multi-step reasoning, reports |

**Preset assignment logic:**

```js
function resolveModel(agentId, taskContext) {
  // taskContext is reserved for the per-task overrides planned in §3.3.

  // 1. Check for an agent-specific model override
  if (agentConfig[agentId]?.model) return agentConfig[agentId].model;

  // 2. Check the user's global preset setting
  const preset = tenantConfig.modelPreset; // "basic" | "balanced" | "complex"

  // 3. Return the primary model for that preset
  return PRESETS[preset].primary;
}
```
### 3.3 Advanced Model Selection

Users with a credit card on file can select specific models per agent or per task:

| Configuration Level | Scope | Example |
|--------------------|-------|---------|
| Global preset | All agents, all tasks | "Use Balanced for everything" |
| Per-agent override | All tasks for one agent | "IT Admin uses Complex, everything else uses Balanced" |
| Per-task override (future) | Single task/conversation | "Use Claude Sonnet for this analysis" |

**Schema (Safety Wrapper config):**

```json
{
  "model_routing": {
    "default_preset": "balanced",
    "agent_overrides": {
      "it-admin": { "preset": "complex" },
      "marketing": { "model": "claude-sonnet-4.6" }
    },
    "premium_enabled": true,
    "credit_card_on_file": true
  }
}
```
### 3.4 Included vs. Premium Model Routing

| Model Category | Token Pool | Billing | Credit Card Required |
|----------------|------------|---------|----------------------|
| **Included** (DeepSeek V3.2, GPT 5 Nano, GPT 5.2 Mini, MiniMax M2.5, Gemini Flash, GLM 5) | Draws from monthly allocation | Subscription covers it | No |
| **Premium** (GPT 5.2, Claude Sonnet 4.6, Claude Opus 4.6, Gemini 3.1 Pro) | Separate — does NOT draw from pool | Per-token metered to credit card | **Yes** |

**Routing decision tree:**

```
Is the selected model Premium?
│
├── No → Check token pool
│        ├── Tokens remaining → Route to model
│        └── Pool exhausted → Apply overage markup, notify user, route to model
│
└── Yes → Check credit card
         ├── Card on file → Route to model, meter tokens
         └── No card → Reject, prompt user to add card in settings
```
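The decision tree can be sketched as a pure function. The `credit_card_on_file` field mirrors the §3.3 schema; `pool_tokens_remaining` and the return shapes (`route`, `billing`, `reason`) are illustrative assumptions:

```javascript
// Sketch of the routing decision tree above; shapes are illustrative.
function routeBilling(model, tenant, premiumModels) {
  if (premiumModels.includes(model)) {
    if (!tenant.credit_card_on_file) {
      return { route: false, reason: "add-credit-card" }; // reject, prompt for card
    }
    return { route: true, billing: "metered" };           // premium: per-token to card
  }
  if (tenant.pool_tokens_remaining > 0) {
    return { route: true, billing: "pool" };              // draws from monthly allocation
  }
  return { route: true, billing: "overage", notify: true }; // pool exhausted: markup applies
}
```

Note the asymmetry: an exhausted pool still routes (with notification), while a missing credit card blocks the premium request outright.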
### 3.5 Token Pool Management

Each subscription tier includes a monthly token allocation:

| Tier | Monthly Tokens | Founding Member Tokens |
|------|----------------|------------------------|
| Lite (€29) | ~8M | ~16M |
| Build (€45) | ~15M | ~30M |
| Scale (€75) | ~25M | ~50M |
| Enterprise (€109) | ~40M | ~80M |

**Pool tracking implementation:**

```
On every LLM response:
1. Safety Wrapper captures token counts (input, output, cache_read, cache_write)
2. Calculates cost: tokens × model_rate × (1 + openrouter_fee)
3. Converts to "standard tokens" (normalized to DeepSeek V3.2 equivalent)
4. Decrements from monthly pool
5. Reports to Hub via usage endpoint

When pool is exhausted:
1. Safety Wrapper detects pool < 0
2. Switches to overage billing mode
3. Applies overage markup (35% for cheap models, 25% mid, 20% top included)
4. Notifies user: "Your included tokens are used up. Continuing at overage rates."
5. User can top up, upgrade tier, or wait for next billing cycle
```
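Steps 1–4 of the tracking loop reduce to a couple of arithmetic helpers. The DeepSeek V3.2 input rate ($0.274/1M, from the §4.2 table) serves as the "standard token" unit; the 5% OpenRouter fee default is an assumption, not a confirmed number:

```javascript
// "Standard token" unit: DeepSeek V3.2 input rate per 1M tokens (§4.2).
const BASE_RATE = 0.274;

// Steps 2–3: cost = tokens × model_rate × (1 + openrouter_fee), then
// normalize cost back into DeepSeek-equivalent tokens.
// The 0.05 fee default is an illustrative assumption.
function meterUsage(tokens, modelRatePer1M, openrouterFee = 0.05) {
  const cost = (tokens / 1e6) * modelRatePer1M * (1 + openrouterFee);
  const standardTokens = (cost / BASE_RATE) * 1e6;
  return { cost, standardTokens };
}

// Step 4: decrement; a negative result means overage mode.
function decrementPool(pool, usage) {
  return pool - usage.standardTokens;
}
```

By construction, 1M DeepSeek V3.2 tokens (fee aside) meter as exactly 1M standard tokens, so the tier allocations in the table read naturally as "DeepSeek-equivalent" budgets.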
### 3.6 Fallback Chain Logic

When the primary model fails, the router attempts fallbacks before giving up.

**Failure types and responses:**

| Failure | First Response | Second Response | Third Response |
|---------|----------------|-----------------|----------------|
| Rate limited (429) | Rotate auth profile (different OpenRouter key) | Wait 5s, retry same model | Fall to next model in chain |
| Model unavailable (503) | Fall to next model in chain immediately | Continue down chain | Return error to agent |
| Context too long | Truncate and retry | Fall to model with larger context (Gemini Flash: 1M) | Return error suggesting context compaction |
| Timeout (>60s) | Retry once | Fall to faster model | Return timeout error |
| Auth error (401/403) | Rotate auth profile | Retry with Hub-synced key | Return auth error, notify admin |

**Auth profile rotation:** OpenClaw natively supports multiple auth profiles per model provider. Before falling back to a different model, the router first tries rotating to a different API key for the same model. This handles per-key rate limits.

```json
{
  "providers": {
    "openrouter": {
      "auth_profiles": [
        { "id": "primary", "key": "SECRET_REF(openrouter_key_1)" },
        { "id": "secondary", "key": "SECRET_REF(openrouter_key_2)" },
        { "id": "tertiary", "key": "SECRET_REF(openrouter_key_3)" }
      ],
      "rotation_strategy": "round-robin-on-failure"
    }
  }
}
```
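The 429 path (rotate keys for the same model before falling down the chain) can be sketched with an injected `attempt` stub standing in for the real OpenRouter call. The wait-5s-and-retry step and per-failure-type branching are omitted for brevity; names are illustrative:

```javascript
// Sketch: try every auth profile on the current model before falling
// to the next model in the chain. `attempt(model, profile)` returns a
// response or throws an error carrying `status`.
function callWithFallback(model, chains, profiles, attempt) {
  const candidates = [model, ...(chains[model] || [])];
  for (const candidate of candidates) {
    for (const profile of profiles) {
      try {
        return { model: candidate, profile, response: attempt(candidate, profile) };
      } catch (err) {
        // 429 → rotate to next profile (same model);
        // any other failure → move straight to the next model.
        if (err.status !== 429) break;
      }
    }
  }
  throw new Error("all models in chain failed");
}
```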
### 3.7 Fallback Chain Definitions

| Starting Model | Fallback 1 | Fallback 2 | Fallback 3 (emergency) |
|----------------|------------|------------|------------------------|
| DeepSeek V3.2 | MiniMax M2.5 | GPT 5 Nano | — (return error) |
| GPT 5 Nano | DeepSeek V3.2 | — | — |
| GLM 5 | MiniMax M2.5 | DeepSeek V3.2 | GPT 5 Nano |
| MiniMax M2.5 | DeepSeek V3.2 | GPT 5 Nano | — |
| Gemini Flash | DeepSeek V3.2 | GPT 5 Nano | — |
| GPT 5.2 (premium) | GLM 5 (included) | DeepSeek V3.2 | — |
| Claude Sonnet 4.6 (premium) | GPT 5.2 (premium) | GLM 5 | DeepSeek V3.2 |
| Claude Opus 4.6 (premium) | Claude Sonnet 4.6 | GPT 5.2 | GLM 5 |

**Cross-category fallback rule:** Premium models can fall back to included models, but included models never fall "up" to premium models (that would charge the user unexpectedly).
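The cross-category rule is a mechanical invariant, so it can be checked against the chain table in CI. A sketch; model IDs follow the §7.1 config and the function name is illustrative:

```javascript
// Returns a list of "start -> fallback" pairs that violate the rule:
// an included model's chain must never contain a premium model.
function violations(chains, premiumModels) {
  const bad = [];
  for (const [start, fallbacks] of Object.entries(chains)) {
    if (premiumModels.includes(start)) continue; // premium may fall down freely
    for (const fb of fallbacks) {
      if (premiumModels.includes(fb)) bad.push(`${start} -> ${fb}`);
    }
  }
  return bad;
}
```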
---
## 4. Prompt Caching Strategy

### 4.1 Cache Architecture

OpenClaw's prompt caching with `cacheRetention: "long"` (1-hour TTL) is the primary cost optimization. The SOUL.md is the cacheable prefix.

```
┌─────────────────────────────────────────────────────┐
│ Cached Prefix (1-hour TTL)                          │
│ ┌───────────────────┐  ┌─────────────────────────┐  │
│ │ SOUL.md (~3K tok) │  │ Tool Registry (~3K tok) │  │
│ └───────────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ Dynamic Content (not cached)                        │
│ ┌───────────────────┐  ┌─────────────────────────┐  │
│ │ Conversation      │  │ Tool results / context  │  │
│ └───────────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────┘
```
### 4.2 Cache Savings by Model

| Model | Standard Input/1M | Cache Read/1M | Savings % | Monthly Savings (20M tokens, 60% cache hit) |
|-------|-------------------|---------------|-----------|---------------------------------------------|
| DeepSeek V3.2 | $0.274 | $0.211 | 23% | $0.76 |
| GPT 5 Nano | $0.053 | $0.005 | 91% | $0.58 |
| Gemini Flash | $0.528 | $0.053 | 90% | $5.70 |
| GLM 5 | $1.002 | $0.211 | 79% | $9.49 |
| Claude Sonnet 4.6 | $3.165 | $0.317 | 90% | $34.18 |
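The "Monthly Savings" column follows from a single formula: at 20M input tokens/month with a 60% cache hit rate, 12M tokens are billed at the cache-read rate instead of the standard rate. The table's figures can be reproduced directly:

```javascript
// savings = hit tokens × (standard − cache-read), per 1M.
function monthlySavings(standardPer1M, cacheReadPer1M, tokens = 20e6, hitRate = 0.6) {
  const cachedTokens = tokens * hitRate; // tokens billed at the cache-read rate
  return (cachedTokens / 1e6) * (standardPer1M - cacheReadPer1M);
}
```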
### 4.3 Heartbeat Keep-Warm

To maximize cache hit rates, the Safety Wrapper sends a heartbeat every 55 minutes (just under the 1-hour TTL). This keeps the SOUL.md prefix in cache without a real user interaction.

**Config:**

```json
{
  "heartbeat": {
    "every": "55m"
  }
}
```

**Cost of keep-warm:** One minimal prompt per agent every 55 minutes = ~5K tokens × 24 turns/day × 5 agents = ~600K tokens/day. At DeepSeek V3.2 cache read rates: ~$0.13/day ($3.80/month). This is offset by the cache savings on real interactions.

**Decision: Only keep-warm agents that were active in the last 24 hours.** No point warming cache for agents the user hasn't talked to. The Safety Wrapper tracks `lastActiveAt` per agent and only sends heartbeats for recently active agents.
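The keep-warm estimate above reduces to one multiplication. Defaults mirror the numbers in the paragraph; the cache-read rate is DeepSeek V3.2's from the §4.2 table:

```javascript
// ~5K tokens per heartbeat × ~24 heartbeats/day × 5 agents,
// billed at the DeepSeek V3.2 cache-read rate per 1M tokens.
function keepWarmCostPerDay(tokensPerBeat = 5_000, beatsPerDay = 24, agents = 5, cacheReadPer1M = 0.211) {
  const dailyTokens = tokensPerBeat * beatsPerDay * agents; // ~600K tokens/day
  return (dailyTokens / 1e6) * cacheReadPer1M;              // ~$0.13/day
}
```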
### 4.4 Cache Invalidation

Cache is invalidated when:

| Event | Action | Impact |
|-------|--------|--------|
| SOUL.md content changes | Cache miss on next turn, re-cache | One-time cost (~6K tokens at full input rate) |
| Tool registry changes (new tool installed) | Cache miss on next turn, re-cache | One-time cost |
| Model changed (user switches preset) | New model has fresh cache | Cache builds from scratch for new model |
| 1-hour TTL expires without heartbeat | Cache expires naturally | Re-cache on next interaction |

---
## 5. Load Balancing & Rate Limiting

### 5.1 Per-Tenant Rate Limits

Each tenant VPS has built-in rate limiting to prevent runaway token consumption:

| Limit | Default | Configurable | Purpose |
|-------|---------|--------------|---------|
| Max concurrent agent turns | 1 | No (OpenClaw default) | Prevent race conditions |
| Max tool calls per turn | 50 | Yes (Safety Wrapper) | Prevent infinite loops |
| Max tokens per single turn | 100K | Yes (Safety Wrapper) | Prevent context explosion |
| Max turns per hour per agent | 60 | Yes (Safety Wrapper) | Prevent runaway automation |
| Max API calls to Hub per minute | 10 | No | Prevent Hub overload |
| OpenRouter requests per minute | Per OpenRouter's limits | No | External rate limit |
### 5.2 Loop Detection

OpenClaw's native loop detection (`tools.loopDetection.enabled: true`) prevents agents from calling the same tool repeatedly without making progress. The Safety Wrapper adds a secondary check:

```
If agent calls the same tool with the same parameters 3 times in 60 seconds:
→ Block the call
→ Log warning
→ Notify user: "The AI appears to be stuck in a loop. I've paused it."
```
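The secondary check can be sketched as a sliding-window counter keyed on the tool name plus its serialized parameters. The class name and exact blocking semantics are illustrative (here, the third identical call inside the window is the one blocked):

```javascript
// Sketch of the Safety Wrapper's secondary loop check.
class LoopDetector {
  constructor(threshold = 3, windowMs = 60_000) {
    this.threshold = threshold;
    this.windowMs = windowMs;
    this.seen = new Map(); // call signature -> recent timestamps
  }

  // Returns true if this call should be blocked.
  check(tool, params, now = Date.now()) {
    const key = `${tool}:${JSON.stringify(params)}`;
    const recent = (this.seen.get(key) || []).filter((t) => now - t < this.windowMs);
    recent.push(now);
    this.seen.set(key, recent);
    return recent.length >= this.threshold;
  }
}
```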
### 5.3 Token Budget Alerts

The Safety Wrapper monitors token consumption and alerts proactively:

| Pool Usage | Action |
|------------|--------|
| 50% consumed | No action (tracked internally) |
| 75% consumed | Notify user: "You've used 75% of your monthly tokens" |
| 90% consumed | Notify user with upgrade suggestion |
| 100% consumed | Switch to overage mode, notify user with cost estimate |
| 150% of pool (overage) | Strong warning, suggest reviewing which agents are most active |
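A sketch of the alert ladder: return the highest newly crossed threshold so each alert fires once. The 50% level is deliberately absent from the threshold list, matching the "no action" row:

```javascript
// Alert thresholds as fractions of the monthly pool, highest first.
const THRESHOLDS = [1.5, 1.0, 0.9, 0.75];

// Returns the highest threshold crossed since the last alert, or null.
function budgetAlert(usedFraction, lastAlerted = 0) {
  for (const t of THRESHOLDS) {
    if (usedFraction >= t && lastAlerted < t) return t;
  }
  return null;
}
```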
---
## 6. Monitoring & Observability

### 6.1 Metrics Collected

The Safety Wrapper collects and reports these metrics to the Hub via heartbeat:

| Metric | Granularity | Used For |
|--------|-------------|----------|
| Tokens per agent per model per hour | Hourly buckets | Billing, usage dashboards |
| Model selection frequency | Per request | Optimize default presets |
| Fallback trigger count | Per hour | Monitor model reliability |
| Cache hit rate | Per agent per hour | Cost optimization tracking |
| Agent routing decisions | Per request | Dispatcher accuracy tracking |
| Tool call count per agent | Per hour | Identify heavy automation |
| Approval queue latency | Per request | UX optimization |
| Error rate per model | Per hour | Model health monitoring |
### 6.2 Dashboard Views (Hub)

**Admin dashboard (staff):**

- Global token usage heatmap (all tenants)
- Model usage distribution pie chart
- Fallback frequency by model (alerts if >5% in any hour)
- Revenue per model (included vs. premium vs. overage)

**Customer dashboard:**

- "Your AI Team This Month" — token usage by agent, visualized as a bar chart
- Model usage breakdown (which models are being used)
- Pool status gauge (% remaining)
- Cost breakdown (included vs. overage vs. premium)
- "Most Active Agent" — who's doing the most work

---
## 7. Configuration Reference

### 7.1 Model Routing Config (`model-routing.json`)

```json
{
  "presets": {
    "basic": {
      "name": "Basic Tasks",
      "models": ["openai/gpt-5-nano", "google/gemini-3-flash-preview", "deepseek/deepseek-v3.2"],
      "description": "Quick lookups, simple drafts, data entry"
    },
    "balanced": {
      "name": "Balanced",
      "models": ["deepseek/deepseek-v3.2", "minimax/minimax-m2.5", "openai/gpt-5-nano"],
      "description": "Day-to-day operations, routine tasks"
    },
    "complex": {
      "name": "Complex Tasks",
      "models": ["zhipu/glm-5", "minimax/minimax-m2.5", "deepseek/deepseek-v3.2"],
      "description": "Analysis, multi-step reasoning, reports"
    }
  },
  "premium_models": [
    "openai/gpt-5.2",
    "google/gemini-3.1-pro",
    "anthropic/claude-sonnet-4.6",
    "anthropic/claude-opus-4.6"
  ],
  "included_models": [
    "deepseek/deepseek-v3.2",
    "openai/gpt-5-nano",
    "openai/gpt-5.2-mini",
    "minimax/minimax-m2.5",
    "google/gemini-3-flash-preview",
    "zhipu/glm-5"
  ],
  "fallback_chains": {
    "deepseek/deepseek-v3.2": ["minimax/minimax-m2.5", "openai/gpt-5-nano"],
    "openai/gpt-5-nano": ["deepseek/deepseek-v3.2"],
    "zhipu/glm-5": ["minimax/minimax-m2.5", "deepseek/deepseek-v3.2", "openai/gpt-5-nano"],
    "minimax/minimax-m2.5": ["deepseek/deepseek-v3.2", "openai/gpt-5-nano"],
    "google/gemini-3-flash-preview": ["deepseek/deepseek-v3.2", "openai/gpt-5-nano"],
    "openai/gpt-5.2": ["zhipu/glm-5", "deepseek/deepseek-v3.2"],
    "anthropic/claude-sonnet-4.6": ["openai/gpt-5.2", "zhipu/glm-5", "deepseek/deepseek-v3.2"],
    "anthropic/claude-opus-4.6": ["anthropic/claude-sonnet-4.6", "openai/gpt-5.2", "zhipu/glm-5"]
  },
  "overage_markup": {
    "cheap": { "threshold_max": 0.50, "markup": 0.35 },
    "mid": { "threshold_max": 1.20, "markup": 0.25 },
    "top": { "threshold_max": 999, "markup": 0.20 }
  },
  "premium_markup": {
    "default": 0.10,
    "overrides": {
      "anthropic/claude-opus-4.6": 0.08
    }
  },
  "rate_limits": {
    "max_tool_calls_per_turn": 50,
    "max_tokens_per_turn": 100000,
    "max_turns_per_hour_per_agent": 60,
    "loop_detection_threshold": 3,
    "loop_detection_window_seconds": 60
  },
  "caching": {
    "retention": "long",
    "heartbeat_interval": "55m",
    "warmup_only_active_agents": true,
    "active_agent_threshold_hours": 24
  }
}
```
### 7.2 OpenRouter Provider Config (`openclaw.json` excerpt)

```json
{
  "providers": {
    "openrouter": {
      "base_url": "http://127.0.0.1:8100/v1",
      "auth_profiles": [
        { "id": "primary", "key": "SECRET_REF(openrouter_key_1)" },
        { "id": "secondary", "key": "SECRET_REF(openrouter_key_2)" }
      ],
      "rotation_strategy": "round-robin-on-failure",
      "timeout_ms": 60000,
      "retry": {
        "max_attempts": 3,
        "backoff_ms": [1000, 5000, 15000]
      }
    }
  },
  "model": {
    "primary": "deepseek/deepseek-v3.2",
    "fallback": ["minimax/minimax-m2.5", "openai/gpt-5-nano"]
  }
}
```

Note: `base_url` points to the local secrets proxy (`127.0.0.1:8100`), which handles credential injection and outbound redaction before forwarding to OpenRouter's actual API.

---
## 8. Implementation Priorities

| Priority | Component | Effort | Dependencies |
|----------|-----------|--------|--------------|
| P0 | Basic preset routing (3 presets → model selection) | 1 week | Safety Wrapper skeleton |
| P0 | Fallback chain with auth rotation | 1 week | OpenRouter integration |
| P0 | Token metering and pool tracking | 2 weeks | Hub billing endpoints |
| P1 | Agent routing (Dispatcher SOUL.md) | 1 week | SOUL.md templates |
| P1 | Prompt caching with heartbeat keep-warm | 3 days | OpenClaw caching config |
| P1 | Loop detection (Safety Wrapper layer) | 3 days | Safety Wrapper hooks |
| P2 | Per-agent model overrides | 3 days | Hub agent config UI |
| P2 | Premium model gating (credit card check) | 1 week | Hub billing + Stripe |
| P2 | Token budget alerts | 3 days | Hub notification system |
| P3 | Multi-agent coordination (parallel delegation) | 2 weeks | Agent-to-agent messaging |
| P3 | Per-task model override (future) | 1 week | Conversation context detection |
| P3 | Customer usage dashboard | 1 week | Hub frontend |

---
## 9. Changelog

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-02-26 | Initial spec. Agent routing via Dispatcher. Model presets (Basic/Balanced/Complex). Fallback chains with auth rotation. Token pool management. Prompt caching strategy. Rate limiting. Configuration reference. Implementation priorities. |