# LetsBe Biz — Dispatcher Routing Logic

**Version:** 1.0
**Date:** February 26, 2026
**Authors:** Matt (Founder), Claude (Architecture)
**Status:** Engineering Spec — Ready for Implementation
**Companion docs:** Technical Architecture v1.2, Pricing Model v2.2, SOUL.md Content Spec v1.0
**Decision refs:** Foundation Document Decisions #33, #35, #41

---
## 1. Purpose

This document specifies two routing systems that are central to LetsBe Biz:

1. **Agent Routing (Dispatcher)** — how user messages are routed to the correct AI agent
2. **Model Routing** — how AI requests are routed to the optimal LLM model based on task complexity, user settings, and cost constraints

Both routing systems live in the Safety Wrapper extension and operate transparently — users interact with "their AI team," not with routing logic.

---
## 2. Agent Routing (Dispatcher Logic)

### 2.1 Architecture

The Dispatcher is an OpenClaw agent configured with `agentToAgent` communication enabled. It uses the `messaging` tool profile and serves as the default entry point for all user messages.

```
User Message
     │
     ▼
Dispatcher Agent (SOUL.md: routing rules)
     │
     ├── Simple / cross-domain → handle directly
     │
     ├── Infrastructure       → delegate to IT Admin
     ├── Content / analytics  → delegate to Marketing
     ├── Scheduling / comms   → delegate to Secretary
     ├── CRM / pipeline       → delegate to Sales
     │
     └── Multi-domain         → coordinate across agents
```
### 2.2 Routing Decision Matrix

The Dispatcher routes based on **intent classification**. OpenClaw's native agent routing handles this through the Dispatcher's SOUL.md instructions — no separate classification model is needed.

| Signal | Routes To | Examples |
|--------|-----------|----------|
| Infrastructure keywords | IT Admin | "restart", "container", "backup", "disk", "server", "install", "update", "nginx", "Docker", "SSL", "certificate", "Keycloak", "Portainer" |
| Content/analytics keywords | Marketing | "blog", "post", "newsletter", "campaign", "analytics", "traffic", "subscribers", "Ghost", "Listmonk", "Umami", "SEO" |
| Scheduling/comms keywords | Secretary | "calendar", "meeting", "schedule", "email", "respond", "follow up", "Chatwoot", "Cal.com", "appointment", "reminder" |
| CRM/sales keywords | Sales | "lead", "opportunity", "pipeline", "CRM", "deal", "prospect", "follow-up", "Odoo", "quote", "proposal" |
| System questions | Dispatcher (self) | "what can you do", "how does this work", "what tools do I have", "help", "status", "summary" |
| Multi-domain | Dispatcher coordinates | "morning briefing", "give me a weekly summary", "how's business", "prepare for my meeting with [client]" |
### 2.3 Delegation Protocol

When the Dispatcher delegates to a specialist agent, it uses OpenClaw's native agent-to-agent messaging:

```
1. Dispatcher receives user message
2. Dispatcher identifies the target agent
3. Dispatcher sends structured delegation message:
   {
     "to": "it-admin",
     "context": "User requests: 'Why is Nextcloud slow?'",
     "expectation": "Diagnose and report. If action needed, get user approval."
   }
4. Target agent receives message, executes task
5. Target agent returns result to Dispatcher
6. Dispatcher formats and presents result to user
```
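
The delegation message in step 3 can be modeled as a small typed envelope. A minimal TypeScript sketch, assuming a hypothetical helper (`delegate` is not a real OpenClaw API, just the envelope constructor):

```typescript
// Shape of the structured delegation message from step 3.
interface DelegationMessage {
  to: string;          // target agent id, e.g. "it-admin"
  context: string;     // the user's request, quoted for the specialist
  expectation: string; // what the specialist should do with it
}

// Hypothetical stand-in: a real build would hand this envelope to
// OpenClaw's agent-to-agent messaging; here it just constructs it.
function delegate(to: string, userMessage: string, expectation: string): DelegationMessage {
  return { to, context: `User requests: '${userMessage}'`, expectation };
}

const msg = delegate(
  "it-admin",
  "Why is Nextcloud slow?",
  "Diagnose and report. If action needed, get user approval.",
);
console.log(msg.to); // "it-admin"
```

The typed envelope keeps steps 4–6 symmetric: the specialist answers the `context`, and the Dispatcher judges the reply against the `expectation`.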

### 2.4 Multi-Agent Coordination

For tasks spanning multiple agents, the Dispatcher acts as coordinator.

**Example: "Prepare for my call with Acme Corp tomorrow"**

1. Dispatcher identifies subtasks:
   - Secretary: pull calendar details, recent email threads with Acme
   - Sales: pull CRM record, pipeline status, last interaction
   - Marketing: check whether Acme visited the website recently (Umami)
2. Dispatcher delegates each subtask in parallel (or sequentially if dependencies exist)
3. Dispatcher compiles results into a unified briefing
4. Dispatcher presents the briefing to the user
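
The fan-out in step 2 can be sketched with standard promise combinators. A hypothetical example (agent names from the scenario above; `askAgent` is an assumed stand-in for the real delegation call):

```typescript
// Assumed stand-in for a real agent-to-agent call: returns the
// specialist's answer for one subtask.
async function askAgent(agent: string, task: string): Promise<string> {
  return `${agent}: ${task} — done`;
}

// Fan out the independent subtasks in parallel, then compile a briefing.
async function prepareBriefing(client: string): Promise<string> {
  const results = await Promise.all([
    askAgent("secretary", `calendar + email threads with ${client}`),
    askAgent("sales", `CRM record and pipeline status for ${client}`),
    askAgent("marketing", `recent website visits by ${client}`),
  ]);
  return results.join("\n");
}

prepareBriefing("Acme Corp").then((briefing) => console.log(briefing));
```

With dependencies between subtasks, the `Promise.all` would become sequential awaits, matching the "or sequentially" branch of step 2.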

### 2.5 Fallback Behavior

| Scenario | Behavior |
|----------|----------|
| Target agent unavailable (crashed/restarting) | Dispatcher notifies user, suggests IT Admin investigate |
| Ambiguous request | Dispatcher makes best judgment, routes, tells user who's handling it |
| User explicitly names an agent | Route directly ("Tell the IT Admin to restart Ghost") |
| Request is outside all agent capabilities | Dispatcher explains honestly what's possible and what isn't |
| Agent returns an error | Dispatcher reports the error to the user and suggests next steps |

---
## 3. Model Routing

### 3.1 Architecture

Model routing determines which LLM processes each agent turn. The Safety Wrapper's `before_prompt_build` hook (or the outbound secrets proxy) controls which model endpoint the request is sent to.

```
Agent Turn
     │
     ▼
Safety Wrapper: Model Router
     │
     ├── Check user's model setting (Basic / Balanced / Complex / Specific Model)
     ├── Check if premium model → verify credit card on file
     ├── Check token pool → enough tokens remaining?
     │
     ▼
Route to OpenRouter endpoint
     │
     ├── Primary model → attempt
     ├── If rate limited → try auth profile rotation (same model, different key)
     ├── If still failing → fall back to next model in chain
     │
     ▼
Response → Token metering → Return to agent
```
### 3.2 Model Presets (Basic Settings)

Users who don't want to think about models pick a preset. Each preset maps to a prioritized model chain.

| Preset | Primary Model | Fallback 1 | Fallback 2 | Blended Cost/1M | Use Case |
|--------|---------------|------------|------------|-----------------|----------|
| **Basic Tasks** | GPT 5 Nano | Gemini 3 Flash Preview | DeepSeek V3.2 | $0.20–1.58 | Quick lookups, formatting, simple drafts |
| **Balanced** (default) | DeepSeek V3.2 | MiniMax M2.5 | GPT 5 Nano | $0.20–0.70 | Daily operations, routine agent work |
| **Complex Tasks** | GLM 5 | MiniMax M2.5 | DeepSeek V3.2 | $0.33–1.68 | Analysis, multi-step reasoning, reports |

**Preset assignment logic:**

```
function resolveModel(agentId, taskContext) {
  // 1. Check for an agent-specific model override
  if (agentConfig[agentId]?.model) return agentConfig[agentId].model;

  // 2. Check the user's global preset setting
  const preset = tenantConfig.modelPreset; // "basic" | "balanced" | "complex"

  // 3. Return the primary model for that preset
  return PRESETS[preset].primary;
}
```

### 3.3 Advanced Model Selection

Users with a credit card on file can select specific models per agent or per task:

| Configuration Level | Scope | Example |
|---------------------|-------|---------|
| Global preset | All agents, all tasks | "Use Balanced for everything" |
| Per-agent override | All tasks for one agent | "IT Admin uses Complex, everything else uses Balanced" |
| Per-task override (future) | Single task/conversation | "Use Claude Sonnet for this analysis" |

**Schema (Safety Wrapper config):**

```json
{
  "model_routing": {
    "default_preset": "balanced",
    "agent_overrides": {
      "it-admin": { "preset": "complex" },
      "marketing": { "model": "claude-sonnet-4.6" }
    },
    "premium_enabled": true,
    "credit_card_on_file": true
  }
}
```
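
The implied resolution order for this schema is: explicit per-agent model pin, then per-agent preset, then the tenant default. A minimal sketch under that assumption (the types and the `PRESET_PRIMARY` table are illustrative, not the real wrapper code):

```typescript
// Illustrative primary-model table; the real chains live in model-routing.json.
const PRESET_PRIMARY: Record<string, string> = {
  basic: "openai/gpt-5-nano",
  balanced: "deepseek/deepseek-v3.2",
  complex: "zhipu/glm-5",
};

interface AgentOverride { preset?: string; model?: string }
interface ModelRouting {
  default_preset: string;
  agent_overrides: Record<string, AgentOverride>;
}

function resolveForAgent(cfg: ModelRouting, agentId: string): string {
  const override = cfg.agent_overrides[agentId];
  if (override?.model) return override.model;            // exact model pin
  const preset = override?.preset ?? cfg.default_preset; // preset, then default
  return PRESET_PRIMARY[preset];
}

// Mirrors the schema example above.
const cfg: ModelRouting = {
  default_preset: "balanced",
  agent_overrides: {
    "it-admin": { preset: "complex" },
    "marketing": { model: "claude-sonnet-4.6" },
  },
};
console.log(resolveForAgent(cfg, "marketing")); // claude-sonnet-4.6
console.log(resolveForAgent(cfg, "it-admin"));  // zhipu/glm-5
console.log(resolveForAgent(cfg, "sales"));     // deepseek/deepseek-v3.2
```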

### 3.4 Included vs. Premium Model Routing

| Model Category | Token Pool | Billing | Credit Card Required |
|----------------|------------|---------|----------------------|
| **Included** (DeepSeek V3.2, GPT 5 Nano, GPT 5.2 Mini, MiniMax M2.5, Gemini Flash, GLM 5) | Draws from monthly allocation | Subscription covers it | No |
| **Premium** (GPT 5.2, Claude Sonnet 4.6, Claude Opus 4.6, Gemini 3.1 Pro) | Separate — does NOT draw from pool | Per-token metered to credit card | **Yes** |

**Routing decision tree:**

```
Is the selected model Premium?
├── No → Check token pool
│   ├── Tokens remaining → Route to model
│   └── Pool exhausted → Apply overage markup, notify user, route to model
│
└── Yes → Check credit card
    ├── Card on file → Route to model, meter tokens
    └── No card → Reject, prompt user to add card in settings
```
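
The tree maps directly onto a small pure function. A sketch for illustration (the `RoutingState` and `Decision` shapes are invented here, not wrapper types):

```typescript
interface RoutingState {
  premium: boolean;        // selected model is on the premium list
  cardOnFile: boolean;
  tokensRemaining: number; // standard tokens left in the monthly pool
}

interface Decision {
  route: boolean;
  overage: boolean; // true: included model billed at overage markup
  metered: boolean; // true: premium tokens metered to the credit card
  reason?: string;  // set when the request is rejected
}

// Direct transcription of the decision tree above.
function decide(s: RoutingState): Decision {
  if (!s.premium) {
    // Included models always route; an exhausted pool just flips to overage.
    return { route: true, overage: s.tokensRemaining <= 0, metered: false };
  }
  if (s.cardOnFile) {
    // Premium never touches the pool; tokens are metered to the card.
    return { route: true, overage: false, metered: true };
  }
  return {
    route: false, overage: false, metered: false,
    reason: "Add a credit card in settings to use premium models.",
  };
}

console.log(decide({ premium: true, cardOnFile: false, tokensRemaining: 5_000_000 }).route); // false
```

Note that the only rejecting branch is premium-without-card; included models degrade to overage billing rather than refusing service.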

### 3.5 Token Pool Management

Each subscription tier includes a monthly token allocation:

| Tier | Monthly Tokens | Founding Member Tokens |
|------|----------------|------------------------|
| Lite (€29) | ~8M | ~16M |
| Build (€45) | ~15M | ~30M |
| Scale (€75) | ~25M | ~50M |
| Enterprise (€109) | ~40M | ~80M |

**Pool tracking implementation:**

```
On every LLM response:
1. Safety Wrapper captures token counts (input, output, cache_read, cache_write)
2. Calculates cost: tokens × model_rate × (1 + openrouter_fee)
3. Converts to "standard tokens" (normalized to DeepSeek V3.2 equivalent)
4. Decrements from monthly pool
5. Reports to Hub via usage endpoint

When pool is exhausted:
1. Safety Wrapper detects pool < 0
2. Switches to overage billing mode
3. Applies overage markup (35% for cheap models, 25% mid, 20% top included)
4. Notifies user: "Your included tokens are used up. Continuing at overage rates."
5. User can top up, upgrade tier, or wait for next billing cycle
```
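
Steps 2–3 are simple arithmetic. A sketch of the normalization, assuming illustrative per-million rates and a placeholder OpenRouter fee (real values come from config):

```typescript
// Illustrative per-1M-token rates in USD; real values come from config.
const RATE_PER_M: Record<string, number> = {
  "deepseek/deepseek-v3.2": 0.274,
  "zhipu/glm-5": 1.002,
};
const OPENROUTER_FEE = 0.05; // assumed 5% pass-through fee for illustration
const BASELINE = "deepseek/deepseek-v3.2"; // "standard token" reference model

// Step 2: raw cost = tokens × rate × (1 + fee)
function costUsd(model: string, tokens: number): number {
  return (tokens / 1_000_000) * RATE_PER_M[model] * (1 + OPENROUTER_FEE);
}

// Step 3: normalize to DeepSeek V3.2-equivalent "standard tokens",
// so a pricier model drains the pool proportionally faster.
function toStandardTokens(model: string, tokens: number): number {
  return tokens * (RATE_PER_M[model] / RATE_PER_M[BASELINE]);
}

// 1M GLM 5 tokens drain the pool like ~3.66M DeepSeek tokens.
console.log(toStandardTokens("zhipu/glm-5", 1_000_000));
```

Normalizing at metering time (step 3) rather than at billing time keeps the pool a single number regardless of which models the agents used.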

### 3.6 Fallback Chain Logic

When the primary model fails, the router attempts fallbacks before giving up.

**Failure types and responses:**

| Failure | First Response | Second Response | Third Response |
|---------|----------------|-----------------|----------------|
| Rate limited (429) | Rotate auth profile (different OpenRouter key) | Wait 5s, retry same model | Fall to next model in chain |
| Model unavailable (503) | Fall to next model in chain immediately | Continue down chain | Return error to agent |
| Context too long | Truncate and retry | Fall to model with larger context (Gemini Flash: 1M) | Return error suggesting context compaction |
| Timeout (>60s) | Retry once | Fall to faster model | Return timeout error |
| Auth error (401/403) | Rotate auth profile | Retry with Hub-synced key | Return auth error, notify admin |

**Auth profile rotation:** OpenClaw natively supports multiple auth profiles per model provider. Before falling back to a different model, the router first tries rotating to a different API key for the same model. This handles per-key rate limits.

```json
{
  "providers": {
    "openrouter": {
      "auth_profiles": [
        { "id": "primary", "key": "SECRET_REF(openrouter_key_1)" },
        { "id": "secondary", "key": "SECRET_REF(openrouter_key_2)" },
        { "id": "tertiary", "key": "SECRET_REF(openrouter_key_3)" }
      ],
      "rotation_strategy": "round-robin-on-failure"
    }
  }
}
```
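
The keys-before-models ordering can be sketched as a nested loop: for each model in the chain, try every auth profile before moving on. A simplified synchronous sketch (`attempt` is a hypothetical stand-in for the actual request; the real router also handles waits and retries per the failure table):

```typescript
type Attempt = (model: string, profileId: string) => boolean; // true = success

// Try every auth profile for the current model before falling back to the
// next model in the chain; return the (model, profile) pair that worked.
function routeWithFallback(
  chain: string[],
  profiles: string[],
  attempt: Attempt,
): { model: string; profile: string } | null {
  for (const model of chain) {
    for (const profile of profiles) {
      if (attempt(model, profile)) return { model, profile };
    }
  }
  return null; // chain exhausted: surface the error to the agent
}

// Simulated: the primary model is rate-limited on key 1 but works on key 2,
// so rotation keeps the request on the same model.
const result = routeWithFallback(
  ["deepseek/deepseek-v3.2", "minimax/minimax-m2.5"],
  ["primary", "secondary"],
  (model, profile) => !(model === "deepseek/deepseek-v3.2" && profile === "primary"),
);
console.log(result); // { model: "deepseek/deepseek-v3.2", profile: "secondary" }
```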

### 3.7 Fallback Chain Definitions

| Starting Model | Fallback 1 | Fallback 2 | Fallback 3 (emergency) |
|----------------|------------|------------|------------------------|
| DeepSeek V3.2 | MiniMax M2.5 | GPT 5 Nano | — (return error) |
| GPT 5 Nano | DeepSeek V3.2 | — | — |
| GLM 5 | MiniMax M2.5 | DeepSeek V3.2 | GPT 5 Nano |
| MiniMax M2.5 | DeepSeek V3.2 | GPT 5 Nano | — |
| Gemini Flash | DeepSeek V3.2 | GPT 5 Nano | — |
| GPT 5.2 (premium) | GLM 5 (included) | DeepSeek V3.2 | — |
| Claude Sonnet 4.6 (premium) | GPT 5.2 (premium) | GLM 5 | DeepSeek V3.2 |
| Claude Opus 4.6 (premium) | Claude Sonnet 4.6 | GPT 5.2 | GLM 5 |

**Cross-category fallback rule:** Premium models can fall back to included models, but included models never fall "up" to premium models (that would charge the user unexpectedly).
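
This invariant is easy to enforce mechanically at config load time. A sketch that checks every chain against the premium list (data inlined from §7.1 for illustration; the validator itself is hypothetical):

```typescript
const PREMIUM = new Set([
  "openai/gpt-5.2",
  "google/gemini-3.1-pro",
  "anthropic/claude-sonnet-4.6",
  "anthropic/claude-opus-4.6",
]);

// An included starting model must never have a premium model in its chain.
function violatesCrossCategoryRule(chains: Record<string, string[]>): string[] {
  const bad: string[] = [];
  for (const [start, chain] of Object.entries(chains)) {
    if (PREMIUM.has(start)) continue; // premium → included is allowed
    for (const fallback of chain) {
      if (PREMIUM.has(fallback)) bad.push(`${start} → ${fallback}`);
    }
  }
  return bad;
}

const chains = {
  "deepseek/deepseek-v3.2": ["minimax/minimax-m2.5", "openai/gpt-5-nano"],
  "anthropic/claude-opus-4.6": ["anthropic/claude-sonnet-4.6", "openai/gpt-5.2", "zhipu/glm-5"],
};
console.log(violatesCrossCategoryRule(chains)); // []
```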

---
## 4. Prompt Caching Strategy

### 4.1 Cache Architecture

OpenClaw's prompt caching with `cacheRetention: "long"` (1-hour TTL) is the primary cost optimization. The SOUL.md is the cacheable prefix.

```
┌───────────────────────────────────────────────────┐
│ Cached Prefix (1-hour TTL)                        │
│ ┌───────────────────┐ ┌─────────────────────────┐ │
│ │ SOUL.md (~3K tok) │ │ Tool Registry (~3K tok) │ │
│ └───────────────────┘ └─────────────────────────┘ │
└───────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────┐
│ Dynamic Content (not cached)                      │
│ ┌────────────────┐ ┌──────────────────────────┐   │
│ │ Conversation   │ │ Tool results / context   │   │
│ └────────────────┘ └──────────────────────────┘   │
└───────────────────────────────────────────────────┘
```
### 4.2 Cache Savings by Model

| Model | Standard Input/1M | Cache Read/1M | Savings % | Monthly Savings (20M tokens, 60% cache hit) |
|-------|-------------------|---------------|-----------|---------------------------------------------|
| DeepSeek V3.2 | $0.274 | $0.211 | 23% | $0.76 |
| GPT 5 Nano | $0.053 | $0.005 | 91% | $0.58 |
| Gemini Flash | $0.528 | $0.053 | 90% | $5.70 |
| GLM 5 | $1.002 | $0.211 | 79% | $9.49 |
| Claude Sonnet 4.6 | $3.165 | $0.317 | 90% | $34.18 |
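
The savings column is derived as (standard rate − cache-read rate) × cached volume, where cached volume is 20M × 60% = 12M tokens. A short sketch reproducing the DeepSeek V3.2 row:

```typescript
// Monthly savings = (standard − cacheRead) $/1M × cached millions of tokens.
function monthlySavings(
  stdPerM: number,
  cachePerM: number,
  monthlyTokens: number,
  hitRate: number,
): number {
  const cachedMillions = (monthlyTokens * hitRate) / 1_000_000;
  return (stdPerM - cachePerM) * cachedMillions;
}

// DeepSeek V3.2 row: (0.274 − 0.211) × 12M tokens ≈ $0.76.
console.log(monthlySavings(0.274, 0.211, 20_000_000, 0.6).toFixed(2)); // "0.76"
```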

### 4.3 Heartbeat Keep-Warm

To maximize cache hit rates, the Safety Wrapper sends a heartbeat every 55 minutes (just under the 1-hour TTL). This keeps the SOUL.md prefix in cache without a real user interaction.

**Config:**

```json
{
  "heartbeat": {
    "every": "55m"
  }
}
```

**Cost of keep-warm:** One minimal prompt per agent every 55 minutes = ~5K tokens × 24 turns/day × 5 agents = ~600K tokens/day. At DeepSeek V3.2 cache read rates: ~$0.13/day ($3.80/month). This is offset by the cache savings on real interactions.

**Decision: Only keep-warm agents that were active in the last 24 hours.** There is no point warming the cache for agents the user hasn't talked to. The Safety Wrapper tracks `lastActiveAt` per agent and only sends heartbeats for recently active agents.
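
The active-agent filter reduces to a pure predicate over `lastActiveAt` timestamps. A sketch (the `Agent` shape is illustrative, not the wrapper's actual state model):

```typescript
interface Agent {
  id: string;
  lastActiveAt: number; // epoch millis of last real user interaction
}

const THRESHOLD_MS = 24 * 60 * 60 * 1000; // 24-hour activity window

// Only agents active inside the window receive keep-warm heartbeats.
function agentsToKeepWarm(agents: Agent[], now: number): string[] {
  return agents
    .filter((a) => now - a.lastActiveAt <= THRESHOLD_MS)
    .map((a) => a.id);
}

const now = Date.now();
const team: Agent[] = [
  { id: "it-admin", lastActiveAt: now - 2 * 60 * 60 * 1000 },   // 2h ago → warm
  { id: "marketing", lastActiveAt: now - 72 * 60 * 60 * 1000 }, // 3d ago → skip
];
console.log(agentsToKeepWarm(team, now)); // [ "it-admin" ]
```

With all five agents idle, this drops keep-warm cost to zero; with the full team active it tops out at the ~$3.80/month figure above.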

### 4.4 Cache Invalidation

Cache is invalidated when:

| Event | Action | Impact |
|-------|--------|--------|
| SOUL.md content changes | Cache miss on next turn, re-cache | One-time cost (~6K tokens at full input rate) |
| Tool registry changes (new tool installed) | Cache miss on next turn, re-cache | One-time cost |
| Model changed (user switches preset) | New model has fresh cache | Cache builds from scratch for new model |
| 1-hour TTL expires without heartbeat | Cache expires naturally | Re-cache on next interaction |

---
## 5. Load Balancing & Rate Limiting

### 5.1 Per-Tenant Rate Limits

Each tenant VPS has built-in rate limiting to prevent runaway token consumption:

| Limit | Default | Configurable | Purpose |
|-------|---------|--------------|---------|
| Max concurrent agent turns | 1 | No (OpenClaw default) | Prevent race conditions |
| Max tool calls per turn | 50 | Yes (Safety Wrapper) | Prevent infinite loops |
| Max tokens per single turn | 100K | Yes (Safety Wrapper) | Prevent context explosion |
| Max turns per hour per agent | 60 | Yes (Safety Wrapper) | Prevent runaway automation |
| Max API calls to Hub per minute | 10 | No | Prevent Hub overload |
| OpenRouter requests per minute | Per OpenRouter's limits | No | External rate limit |
### 5.2 Loop Detection

OpenClaw's native loop detection (`tools.loopDetection.enabled: true`) prevents agents from calling the same tool repeatedly without making progress. The Safety Wrapper adds a secondary check:

```
If agent calls the same tool with the same parameters 3 times in 60 seconds:
→ Block the call
→ Log warning
→ Notify user: "The AI appears to be stuck in a loop. I've paused it."
```
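
The secondary check is a sliding-window counter keyed on tool name plus serialized parameters. A sketch of the wrapper-level rule (the class name and API are invented for illustration):

```typescript
// Blocks a tool call once the same (tool, params) pair has been seen
// `threshold` times inside `windowMs`, i.e. the rule described above.
class LoopDetector {
  private calls = new Map<string, number[]>(); // key → recent call timestamps

  constructor(private threshold = 3, private windowMs = 60_000) {}

  /** Record a call; returns true if it should be blocked. */
  shouldBlock(tool: string, params: unknown, now: number): boolean {
    const key = `${tool}:${JSON.stringify(params)}`;
    const recent = (this.calls.get(key) ?? []).filter((t) => now - t < this.windowMs);
    recent.push(now);
    this.calls.set(key, recent);
    return recent.length >= this.threshold;
  }
}

const detector = new LoopDetector();
console.log(detector.shouldBlock("restart", { svc: "ghost" }, 0));    // false
console.log(detector.shouldBlock("restart", { svc: "ghost" }, 1000)); // false
console.log(detector.shouldBlock("restart", { svc: "ghost" }, 2000)); // true (3rd identical call)
```

Keying on serialized parameters is what separates this from OpenClaw's native check: retries with *different* parameters still count as progress.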

### 5.3 Token Budget Alerts

The Safety Wrapper monitors token consumption and alerts proactively:

| Pool Usage | Action |
|------------|--------|
| 50% consumed | No action (tracked internally) |
| 75% consumed | Notify user: "You've used 75% of your monthly tokens" |
| 90% consumed | Notify user with upgrade suggestion |
| 100% consumed | Switch to overage mode, notify user with cost estimate |
| 150% of pool (overage) | Strong warning, suggest reviewing which agents are most active |
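
A practical detail for implementers: each alert should fire once, when its threshold is crossed, not on every turn past it. A sketch under that assumption (function name invented for illustration):

```typescript
const THRESHOLDS = [0.5, 0.75, 0.9, 1.0, 1.5]; // fractions of the monthly pool

// Returns the thresholds newly crossed between the previous and current
// usage fraction, so each alert fires exactly once.
function crossedThresholds(prevFraction: number, currFraction: number): number[] {
  return THRESHOLDS.filter((t) => prevFraction < t && currFraction >= t);
}

console.log(crossedThresholds(0.74, 0.76)); // [ 0.75 ]
console.log(crossedThresholds(0.76, 0.8));  // []
console.log(crossedThresholds(0.95, 1.6));  // [ 1, 1.5 ]
```

Comparing against the *previous* fraction also handles large turns that jump several thresholds at once, as in the last example.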

---
## 6. Monitoring & Observability

### 6.1 Metrics Collected

The Safety Wrapper collects and reports these metrics to the Hub via heartbeat:

| Metric | Granularity | Used For |
|--------|-------------|----------|
| Tokens per agent per model per hour | Hourly buckets | Billing, usage dashboards |
| Model selection frequency | Per request | Optimize default presets |
| Fallback trigger count | Per hour | Monitor model reliability |
| Cache hit rate | Per agent per hour | Cost optimization tracking |
| Agent routing decisions | Per request | Dispatcher accuracy tracking |
| Tool call count per agent | Per hour | Identify heavy automation |
| Approval queue latency | Per request | UX optimization |
| Error rate per model | Per hour | Model health monitoring |
### 6.2 Dashboard Views (Hub)

**Admin dashboard (staff):**
- Global token usage heatmap (all tenants)
- Model usage distribution pie chart
- Fallback frequency by model (alerts if >5% in any hour)
- Revenue per model (included vs. premium vs. overage)

**Customer dashboard:**
- "Your AI Team This Month" — token usage by agent, visualized as a bar chart
- Model usage breakdown (which models are being used)
- Pool status gauge (% remaining)
- Cost breakdown (included vs. overage vs. premium)
- "Most Active Agent" — who's doing the most work

---
## 7. Configuration Reference

### 7.1 Model Routing Config (`model-routing.json`)

```json
{
  "presets": {
    "basic": {
      "name": "Basic Tasks",
      "models": ["openai/gpt-5-nano", "google/gemini-3-flash-preview", "deepseek/deepseek-v3.2"],
      "description": "Quick lookups, simple drafts, data entry"
    },
    "balanced": {
      "name": "Balanced",
      "models": ["deepseek/deepseek-v3.2", "minimax/minimax-m2.5", "openai/gpt-5-nano"],
      "description": "Day-to-day operations, routine tasks"
    },
    "complex": {
      "name": "Complex Tasks",
      "models": ["zhipu/glm-5", "minimax/minimax-m2.5", "deepseek/deepseek-v3.2"],
      "description": "Analysis, multi-step reasoning, reports"
    }
  },
  "premium_models": [
    "openai/gpt-5.2",
    "google/gemini-3.1-pro",
    "anthropic/claude-sonnet-4.6",
    "anthropic/claude-opus-4.6"
  ],
  "included_models": [
    "deepseek/deepseek-v3.2",
    "openai/gpt-5-nano",
    "openai/gpt-5.2-mini",
    "minimax/minimax-m2.5",
    "google/gemini-3-flash-preview",
    "zhipu/glm-5"
  ],
  "fallback_chains": {
    "deepseek/deepseek-v3.2": ["minimax/minimax-m2.5", "openai/gpt-5-nano"],
    "openai/gpt-5-nano": ["deepseek/deepseek-v3.2"],
    "zhipu/glm-5": ["minimax/minimax-m2.5", "deepseek/deepseek-v3.2", "openai/gpt-5-nano"],
    "minimax/minimax-m2.5": ["deepseek/deepseek-v3.2", "openai/gpt-5-nano"],
    "google/gemini-3-flash-preview": ["deepseek/deepseek-v3.2", "openai/gpt-5-nano"],
    "openai/gpt-5.2": ["zhipu/glm-5", "deepseek/deepseek-v3.2"],
    "anthropic/claude-sonnet-4.6": ["openai/gpt-5.2", "zhipu/glm-5", "deepseek/deepseek-v3.2"],
    "anthropic/claude-opus-4.6": ["anthropic/claude-sonnet-4.6", "openai/gpt-5.2", "zhipu/glm-5"]
  },
  "overage_markup": {
    "cheap": { "threshold_max": 0.50, "markup": 0.35 },
    "mid": { "threshold_max": 1.20, "markup": 0.25 },
    "top": { "threshold_max": 999, "markup": 0.20 }
  },
  "premium_markup": {
    "default": 0.10,
    "overrides": {
      "anthropic/claude-opus-4.6": 0.08
    }
  },
  "rate_limits": {
    "max_tool_calls_per_turn": 50,
    "max_tokens_per_turn": 100000,
    "max_turns_per_hour_per_agent": 60,
    "loop_detection_threshold": 3,
    "loop_detection_window_seconds": 60
  },
  "caching": {
    "retention": "long",
    "heartbeat_interval": "55m",
    "warmup_only_active_agents": true,
    "active_agent_threshold_hours": 24
  }
}
```
### 7.2 OpenRouter Provider Config (`openclaw.json` excerpt)

```json
{
  "providers": {
    "openrouter": {
      "base_url": "http://127.0.0.1:8100/v1",
      "auth_profiles": [
        { "id": "primary", "key": "SECRET_REF(openrouter_key_1)" },
        { "id": "secondary", "key": "SECRET_REF(openrouter_key_2)" }
      ],
      "rotation_strategy": "round-robin-on-failure",
      "timeout_ms": 60000,
      "retry": {
        "max_attempts": 3,
        "backoff_ms": [1000, 5000, 15000]
      }
    }
  },
  "model": {
    "primary": "deepseek/deepseek-v3.2",
    "fallback": ["minimax/minimax-m2.5", "openai/gpt-5-nano"]
  }
}
```

Note: `base_url` points to the local secrets proxy (`127.0.0.1:8100`), which handles credential injection and outbound redaction before forwarding to OpenRouter's actual API.

---
## 8. Implementation Priorities

| Priority | Component | Effort | Dependencies |
|----------|-----------|--------|--------------|
| P0 | Basic preset routing (3 presets → model selection) | 1 week | Safety Wrapper skeleton |
| P0 | Fallback chain with auth rotation | 1 week | OpenRouter integration |
| P0 | Token metering and pool tracking | 2 weeks | Hub billing endpoints |
| P1 | Agent routing (Dispatcher SOUL.md) | 1 week | SOUL.md templates |
| P1 | Prompt caching with heartbeat keep-warm | 3 days | OpenClaw caching config |
| P1 | Loop detection (Safety Wrapper layer) | 3 days | Safety Wrapper hooks |
| P2 | Per-agent model overrides | 3 days | Hub agent config UI |
| P2 | Premium model gating (credit card check) | 1 week | Hub billing + Stripe |
| P2 | Token budget alerts | 3 days | Hub notification system |
| P3 | Multi-agent coordination (parallel delegation) | 2 weeks | Agent-to-agent messaging |
| P3 | Per-task model override (future) | 1 week | Conversation context detection |
| P3 | Customer usage dashboard | 1 week | Hub frontend |

---
## 9. Changelog

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-02-26 | Initial spec. Agent routing via Dispatcher. Model presets (Basic/Balanced/Complex). Fallback chains with auth rotation. Token pool management. Prompt caching strategy. Rate limiting. Configuration reference. Implementation priorities. |