Include full contents of all nested repositories

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 16:25:02 +01:00
parent 14ff8fd54c
commit 2401ed446f
7271 changed files with 1310112 additions and 6 deletions
--- a/openclaw/docs/concepts/agent-loop.md
+++ b/openclaw/docs/concepts/agent-loop.md
@@ -0,0 +1,148 @@
+---
+summary: "Agent loop lifecycle, streams, and wait semantics"
+read_when:
+  - You need an exact walkthrough of the agent loop or lifecycle events
+title: "Agent Loop"
+---
+
+# Agent Loop (OpenClaw)
+
+An agentic loop is the full “real” run of an agent: intake → context assembly → model inference →
+tool execution → streaming replies → persistence. It’s the authoritative path that turns a message
+into actions and a final reply, while keeping session state consistent.
+
+In OpenClaw, a loop is a single, serialized run per session that emits lifecycle and stream events
+as the model thinks, calls tools, and streams output. This doc explains how that authentic loop is
+wired end-to-end.
+
+## Entry points
+
+- Gateway RPC: `agent` and `agent.wait`.
+- CLI: `agent` command.
+
+## How it works (high-level)
+
+1. `agent` RPC validates params, resolves session (sessionKey/sessionId), persists session metadata, returns `{ runId, acceptedAt }` immediately.
+2. `agentCommand` runs the agent:
+   - resolves model + thinking/verbose defaults
+   - loads skills snapshot
+   - calls `runEmbeddedPiAgent` (pi-agent-core runtime)
+   - emits **lifecycle end/error** if the embedded loop does not emit one
+3. `runEmbeddedPiAgent`:
+   - serializes runs via per-session + global queues
+   - resolves model + auth profile and builds the pi session
+   - subscribes to pi events and streams assistant/tool deltas
+   - enforces timeout -> aborts run if exceeded
+   - returns payloads + usage metadata
+4. `subscribeEmbeddedPiSession` bridges pi-agent-core events to OpenClaw `agent` stream:
+   - tool events => `stream: "tool"`
+   - assistant deltas => `stream: "assistant"`
+   - lifecycle events => `stream: "lifecycle"` (`phase: "start" | "end" | "error"`)
+5. `agent.wait` uses `waitForAgentJob`:
+   - waits for **lifecycle end/error** for `runId`
+   - returns `{ status: ok|error|timeout, startedAt, endedAt, error? }`
+
+## Queueing + concurrency
+
+- Runs are serialized per session key (session lane) and optionally through a global lane.
+- This prevents tool/session races and keeps session history consistent.
+- Messaging channels can choose queue modes (collect/steer/followup) that feed this lane system.
+  See [Command Queue](/concepts/queue).
+
+## Session + workspace preparation
+
+- Workspace is resolved and created; sandboxed runs may redirect to a sandbox workspace root.
+- Skills are loaded (or reused from a snapshot) and injected into env and prompt.
+- Bootstrap/context files are resolved and injected into the system prompt report.
+- A session write lock is acquired; `SessionManager` is opened and prepared before streaming.
+
+## Prompt assembly + system prompt
+
+- System prompt is built from OpenClaw’s base prompt, skills prompt, bootstrap context, and per-run overrides.
+- Model-specific limits and compaction reserve tokens are enforced.
+- See [System prompt](/concepts/system-prompt) for what the model sees.
+
+## Hook points (where you can intercept)
+
+OpenClaw has two hook systems:
+
+- **Internal hooks** (Gateway hooks): event-driven scripts for commands and lifecycle events.
+- **Plugin hooks**: extension points inside the agent/tool lifecycle and gateway pipeline.
+
+### Internal hooks (Gateway hooks)
+
+- **`agent:bootstrap`**: runs while building bootstrap files before the system prompt is finalized.
+  Use this to add/remove bootstrap context files.
+- **Command hooks**: `/new`, `/reset`, `/stop`, and other command events (see Hooks doc).
+
+See [Hooks](/automation/hooks) for setup and examples.
+
+### Plugin hooks (agent + gateway lifecycle)
+
+These run inside the agent loop or gateway pipeline:
+
+- **`before_model_resolve`**: runs pre-session (no `messages`) to deterministically override provider/model before model resolution.
+- **`before_prompt_build`**: runs after session load (with `messages`) to inject `prependContext`/`systemPrompt` before prompt submission.
+- **`before_agent_start`**: legacy compatibility hook that may run in either phase; prefer the explicit hooks above.
+- **`agent_end`**: inspect the final message list and run metadata after completion.
+- **`before_compaction` / `after_compaction`**: observe or annotate compaction cycles.
+- **`before_tool_call` / `after_tool_call`**: intercept tool params/results.
+- **`tool_result_persist`**: synchronously transform tool results before they are written to the session transcript.
+- **`message_received` / `message_sending` / `message_sent`**: inbound + outbound message hooks.
+- **`session_start` / `session_end`**: session lifecycle boundaries.
+- **`gateway_start` / `gateway_stop`**: gateway lifecycle events.
+
+See [Plugins](/tools/plugin#plugin-hooks) for the hook API and registration details.
+
+## Streaming + partial replies
+
+- Assistant deltas are streamed from pi-agent-core and emitted as `assistant` events.
+- Block streaming can emit partial replies either on `text_end` or `message_end`.
+- Reasoning streaming can be emitted as a separate stream or as block replies.
+- See [Streaming](/concepts/streaming) for chunking and block reply behavior.
+
+## Tool execution + messaging tools
+
+- Tool start/update/end events are emitted on the `tool` stream.
+- Tool results are sanitized for size and image payloads before logging/emitting.
+- Messaging tool sends are tracked to suppress duplicate assistant confirmations.
+
+## Reply shaping + suppression
+
+- Final payloads are assembled from:
+  - assistant text (and optional reasoning)
+  - inline tool summaries (when verbose + allowed)
+  - assistant error text when the model errors
+- `NO_REPLY` is treated as a silent token and filtered from outgoing payloads.
+- Messaging tool duplicates are removed from the final payload list.
+- If no renderable payloads remain and a tool errored, a fallback tool error reply is emitted
+  (unless a messaging tool already sent a user-visible reply).
+
+## Compaction + retries
+
+- Auto-compaction emits `compaction` stream events and can trigger a retry.
+- On retry, in-memory buffers and tool summaries are reset to avoid duplicate output.
+- See [Compaction](/concepts/compaction) for the compaction pipeline.
+
+## Event streams (today)
+
+- `lifecycle`: emitted by `subscribeEmbeddedPiSession` (and as a fallback by `agentCommand`)
+- `assistant`: streamed deltas from pi-agent-core
+- `tool`: streamed tool events from pi-agent-core
+
+## Chat channel handling
+
+- Assistant deltas are buffered into chat `delta` messages.
+- A chat `final` is emitted on **lifecycle end/error**.
+
+## Timeouts
+
+- `agent.wait` default: 30s (just the wait). `timeoutMs` param overrides.
+- Agent runtime: `agents.defaults.timeoutSeconds` default 600s; enforced in `runEmbeddedPiAgent` abort timer.
+
+## Where things can end early
+
+- Agent timeout (abort)
+- AbortSignal (cancel)
+- Gateway disconnect or RPC timeout
+- `agent.wait` timeout (wait-only, does not stop agent)
--- a/openclaw/docs/concepts/agent-workspace.md
+++ b/openclaw/docs/concepts/agent-workspace.md
@@ -0,0 +1,234 @@
+---
+summary: "Agent workspace: location, layout, and backup strategy"
+read_when:
+  - You need to explain the agent workspace or its file layout
+  - You want to back up or migrate an agent workspace
+title: "Agent Workspace"
+---
+
+# Agent workspace
+
+The workspace is the agent's home. It is the only working directory used for
+file tools and for workspace context. Keep it private and treat it as memory.
+
+This is separate from `~/.openclaw/`, which stores config, credentials, and
+sessions.
+
+**Important:** the workspace is the **default cwd**, not a hard sandbox. Tools
+resolve relative paths against the workspace, but absolute paths can still reach
+elsewhere on the host unless sandboxing is enabled. If you need isolation, use
+[`agents.defaults.sandbox`](/gateway/sandboxing) (and/or per‑agent sandbox config).
+When sandboxing is enabled and `workspaceAccess` is not `"rw"`, tools operate
+inside a sandbox workspace under `~/.openclaw/sandboxes`, not your host workspace.
+
+## Default location
+
+- Default: `~/.openclaw/workspace`
+- If `OPENCLAW_PROFILE` is set and not `"default"`, the default becomes
+  `~/.openclaw/workspace-<profile>`.
+- Override in `~/.openclaw/openclaw.json`:
+
+```json5
+{
+  agent: {
+    workspace: "~/.openclaw/workspace",
+  },
+}
+```
+
+`openclaw onboard`, `openclaw configure`, or `openclaw setup` will create the
+workspace and seed the bootstrap files if they are missing.
+
+If you already manage the workspace files yourself, you can disable bootstrap
+file creation:
+
+```json5
+{ agent: { skipBootstrap: true } }
+```
+
+## Extra workspace folders
+
+Older installs may have created `~/openclaw`. Keeping multiple workspace
+directories around can cause confusing auth or state drift, because only one
+workspace is active at a time.
+
+**Recommendation:** keep a single active workspace. If you no longer use the
+extra folders, archive or move them to Trash (for example `trash ~/openclaw`).
+If you intentionally keep multiple workspaces, make sure
+`agents.defaults.workspace` points to the active one.
+
+`openclaw doctor` warns when it detects extra workspace directories.
+
+## Workspace file map (what each file means)
+
+These are the standard files OpenClaw expects inside the workspace:
+
+- `AGENTS.md`
+  - Operating instructions for the agent and how it should use memory.
+  - Loaded at the start of every session.
+  - Good place for rules, priorities, and "how to behave" details.
+
+- `SOUL.md`
+  - Persona, tone, and boundaries.
+  - Loaded every session.
+
+- `USER.md`
+  - Who the user is and how to address them.
+  - Loaded every session.
+
+- `IDENTITY.md`
+  - The agent's name, vibe, and emoji.
+  - Created/updated during the bootstrap ritual.
+
+- `TOOLS.md`
+  - Notes about your local tools and conventions.
+  - Does not control tool availability; it is only guidance.
+
+- `HEARTBEAT.md`
+  - Optional tiny checklist for heartbeat runs.
+  - Keep it short to avoid token burn.
+
+- `BOOT.md`
+  - Optional startup checklist executed on gateway restart when internal hooks are enabled.
+  - Keep it short; use the message tool for outbound sends.
+
+- `BOOTSTRAP.md`
+  - One-time first-run ritual.
+  - Only created for a brand-new workspace.
+  - Delete it after the ritual is complete.
+
+- `memory/YYYY-MM-DD.md`
+  - Daily memory log (one file per day).
+  - Recommended to read today + yesterday on session start.
+
+- `MEMORY.md` (optional)
+  - Curated long-term memory.
+  - Only load in the main, private session (not shared/group contexts).
+
+See [Memory](/concepts/memory) for the workflow and automatic memory flush.
+
+- `skills/` (optional)
+  - Workspace-specific skills.
+  - Overrides managed/bundled skills when names collide.
+
+- `canvas/` (optional)
+  - Canvas UI files for node displays (for example `canvas/index.html`).
+
+If any bootstrap file is missing, OpenClaw injects a "missing file" marker into
+the session and continues. Large bootstrap files are truncated when injected;
+adjust limits with `agents.defaults.bootstrapMaxChars` (default: 20000) and
+`agents.defaults.bootstrapTotalMaxChars` (default: 150000).
+`openclaw setup` can recreate missing defaults without overwriting existing
+files.
+
+## What is NOT in the workspace
+
+These live under `~/.openclaw/` and should NOT be committed to the workspace repo:
+
+- `~/.openclaw/openclaw.json` (config)
+- `~/.openclaw/credentials/` (OAuth tokens, API keys)
+- `~/.openclaw/agents/<agentId>/sessions/` (session transcripts + metadata)
+- `~/.openclaw/skills/` (managed skills)
+
+If you need to migrate sessions or config, copy them separately and keep them
+out of version control.
+
+## Git backup (recommended, private)
+
+Treat the workspace as private memory. Put it in a **private** git repo so it is
+backed up and recoverable.
+
+Run these steps on the machine where the Gateway runs (that is where the
+workspace lives).
+
+### 1) Initialize the repo
+
+If git is installed, brand-new workspaces are initialized automatically. If this
+workspace is not already a repo, run:
+
+```bash
+cd ~/.openclaw/workspace
+git init
+git add AGENTS.md SOUL.md TOOLS.md IDENTITY.md USER.md HEARTBEAT.md memory/
+git commit -m "Add agent workspace"
+```
+
+### 2) Add a private remote (beginner-friendly options)
+
+Option A: GitHub web UI
+
+1. Create a new **private** repository on GitHub.
+2. Do not initialize with a README (avoids merge conflicts).
+3. Copy the HTTPS remote URL.
+4. Add the remote and push:
+
+```bash
+git branch -M main
+git remote add origin <https-url>
+git push -u origin main
+```
+
+Option B: GitHub CLI (`gh`)
+
+```bash
+gh auth login
+gh repo create openclaw-workspace --private --source . --remote origin --push
+```
+
+Option C: GitLab web UI
+
+1. Create a new **private** repository on GitLab.
+2. Do not initialize with a README (avoids merge conflicts).
+3. Copy the HTTPS remote URL.
+4. Add the remote and push:
+
+```bash
+git branch -M main
+git remote add origin <https-url>
+git push -u origin main
+```
+
+### 3) Ongoing updates
+
+```bash
+git status
+git add .
+git commit -m "Update memory"
+git push
+```
+
+## Do not commit secrets
+
+Even in a private repo, avoid storing secrets in the workspace:
+
+- API keys, OAuth tokens, passwords, or private credentials.
+- Anything under `~/.openclaw/`.
+- Raw dumps of chats or sensitive attachments.
+
+If you must store sensitive references, use placeholders and keep the real
+secret elsewhere (password manager, environment variables, or `~/.openclaw/`).
+
+Suggested `.gitignore` starter:
+
+```gitignore
+.DS_Store
+.env
+**/*.key
+**/*.pem
+**/secrets*
+```
+
+## Moving the workspace to a new machine
+
+1. Clone the repo to the desired path (default `~/.openclaw/workspace`).
+2. Set `agents.defaults.workspace` to that path in `~/.openclaw/openclaw.json`.
+3. Run `openclaw setup --workspace <path>` to seed any missing files.
+4. If you need sessions, copy `~/.openclaw/agents/<agentId>/sessions/` from the
+   old machine separately.
+
+## Advanced notes
+
+- Multi-agent routing can use different workspaces per agent. See
+  [Channel routing](/channels/channel-routing) for routing configuration.
+- If `agents.defaults.sandbox` is enabled, non-main sessions can use per-session sandbox
+  workspaces under `agents.defaults.sandbox.workspaceRoot`.
--- a/openclaw/docs/concepts/agent.md
+++ b/openclaw/docs/concepts/agent.md
@@ -0,0 +1,123 @@
+---
+summary: "Agent runtime (embedded pi-mono), workspace contract, and session bootstrap"
+read_when:
+  - Changing agent runtime, workspace bootstrap, or session behavior
+title: "Agent Runtime"
+---
+
+# Agent Runtime 🤖
+
+OpenClaw runs a single embedded agent runtime derived from **pi-mono**.
+
+## Workspace (required)
+
+OpenClaw uses a single agent workspace directory (`agents.defaults.workspace`) as the agent’s **only** working directory (`cwd`) for tools and context.
+
+Recommended: use `openclaw setup` to create `~/.openclaw/openclaw.json` if missing and initialize the workspace files.
+
+Full workspace layout + backup guide: [Agent workspace](/concepts/agent-workspace)
+
+If `agents.defaults.sandbox` is enabled, non-main sessions can override this with
+per-session workspaces under `agents.defaults.sandbox.workspaceRoot` (see
+[Gateway configuration](/gateway/configuration)).
+
+## Bootstrap files (injected)
+
+Inside `agents.defaults.workspace`, OpenClaw expects these user-editable files:
+
+- `AGENTS.md` — operating instructions + “memory”
+- `SOUL.md` — persona, boundaries, tone
+- `TOOLS.md` — user-maintained tool notes (e.g. `imsg`, `sag`, conventions)
+- `BOOTSTRAP.md` — one-time first-run ritual (deleted after completion)
+- `IDENTITY.md` — agent name/vibe/emoji
+- `USER.md` — user profile + preferred address
+
+On the first turn of a new session, OpenClaw injects the contents of these files directly into the agent context.
+
+Blank files are skipped. Large files are trimmed and truncated with a marker so prompts stay lean (read the file for full content).
+
+If a file is missing, OpenClaw injects a single “missing file” marker line (and `openclaw setup` will create a safe default template).
+
+`BOOTSTRAP.md` is only created for a **brand new workspace** (no other bootstrap files present). If you delete it after completing the ritual, it should not be recreated on later restarts.
+
+To disable bootstrap file creation entirely (for pre-seeded workspaces), set:
+
+```json5
+{ agent: { skipBootstrap: true } }
+```
+
+## Built-in tools
+
+Core tools (read/exec/edit/write and related system tools) are always available,
+subject to tool policy. `apply_patch` is optional and gated by
+`tools.exec.applyPatch`. `TOOLS.md` does **not** control which tools exist; it’s
+guidance for how _you_ want them used.
+
+## Skills
+
+OpenClaw loads skills from three locations (workspace wins on name conflict):
+
+- Bundled (shipped with the install)
+- Managed/local: `~/.openclaw/skills`
+- Workspace: `<workspace>/skills`
+
+Skills can be gated by config/env (see `skills` in [Gateway configuration](/gateway/configuration)).
+
+## pi-mono integration
+
+OpenClaw reuses pieces of the pi-mono codebase (models/tools), but **session management, discovery, and tool wiring are OpenClaw-owned**.
+
+- No pi-coding agent runtime.
+- No `~/.pi/agent` or `<workspace>/.pi` settings are consulted.
+
+## Sessions
+
+Session transcripts are stored as JSONL at:
+
+- `~/.openclaw/agents/<agentId>/sessions/<SessionId>.jsonl`
+
+The session ID is stable and chosen by OpenClaw.
+Legacy Pi/Tau session folders are **not** read.
+
+## Steering while streaming
+
+When queue mode is `steer`, inbound messages are injected into the current run.
+The queue is checked **after each tool call**; if a queued message is present,
+remaining tool calls from the current assistant message are skipped (error tool
+results with "Skipped due to queued user message."), then the queued user
+message is injected before the next assistant response.
+
+When queue mode is `followup` or `collect`, inbound messages are held until the
+current turn ends, then a new agent turn starts with the queued payloads. See
+[Queue](/concepts/queue) for mode + debounce/cap behavior.
+
+Block streaming sends completed assistant blocks as soon as they finish; it is
+**off by default** (`agents.defaults.blockStreamingDefault: "off"`).
+Tune the boundary via `agents.defaults.blockStreamingBreak` (`text_end` vs `message_end`; defaults to text_end).
+Control soft block chunking with `agents.defaults.blockStreamingChunk` (defaults to
+800–1200 chars; prefers paragraph breaks, then newlines; sentences last).
+Coalesce streamed chunks with `agents.defaults.blockStreamingCoalesce` to reduce
+single-line spam (idle-based merging before send). Non-Telegram channels require
+explicit `*.blockStreaming: true` to enable block replies.
+Verbose tool summaries are emitted at tool start (no debounce); Control UI
+streams tool output via agent events when available.
+More details: [Streaming + chunking](/concepts/streaming).
+
+## Model refs
+
+Model refs in config (for example `agents.defaults.model` and `agents.defaults.models`) are parsed by splitting on the **first** `/`.
+
+- Use `provider/model` when configuring models.
+- If the model ID itself contains `/` (OpenRouter-style), include the provider prefix (example: `openrouter/moonshotai/kimi-k2`).
+- If you omit the provider, OpenClaw treats the input as an alias or a model for the **default provider** (only works when there is no `/` in the model ID).
+
+## Configuration (minimal)
+
+At minimum, set:
+
+- `agents.defaults.workspace`
+- `channels.whatsapp.allowFrom` (strongly recommended)
+
+---
+
+_Next: [Group Chats](/channels/group-messages)_ 🦞
--- a/openclaw/docs/concepts/architecture.md
+++ b/openclaw/docs/concepts/architecture.md
@@ -0,0 +1,139 @@
+---
+summary: "WebSocket gateway architecture, components, and client flows"
+read_when:
+  - Working on gateway protocol, clients, or transports
+title: "Gateway Architecture"
+---
+
+# Gateway architecture
+
+Last updated: 2026-01-22
+
+## Overview
+
+- A single long‑lived **Gateway** owns all messaging surfaces (WhatsApp via
+  Baileys, Telegram via grammY, Slack, Discord, Signal, iMessage, WebChat).
+- Control-plane clients (macOS app, CLI, web UI, automations) connect to the
+  Gateway over **WebSocket** on the configured bind host (default
+  `127.0.0.1:18789`).
+- **Nodes** (macOS/iOS/Android/headless) also connect over **WebSocket**, but
+  declare `role: node` with explicit caps/commands.
+- One Gateway per host; it is the only place that opens a WhatsApp session.
+- The **canvas host** is served by the Gateway HTTP server under:
+  - `/__openclaw__/canvas/` (agent-editable HTML/CSS/JS)
+  - `/__openclaw__/a2ui/` (A2UI host)
+    It uses the same port as the Gateway (default `18789`).
+
+## Components and flows
+
+### Gateway (daemon)
+
+- Maintains provider connections.
+- Exposes a typed WS API (requests, responses, server‑push events).
+- Validates inbound frames against JSON Schema.
+- Emits events like `agent`, `chat`, `presence`, `health`, `heartbeat`, `cron`.
+
+### Clients (mac app / CLI / web admin)
+
+- One WS connection per client.
+- Send requests (`health`, `status`, `send`, `agent`, `system-presence`).
+- Subscribe to events (`tick`, `agent`, `presence`, `shutdown`).
+
+### Nodes (macOS / iOS / Android / headless)
+
+- Connect to the **same WS server** with `role: node`.
+- Provide a device identity in `connect`; pairing is **device‑based** (role `node`) and
+  approval lives in the device pairing store.
+- Expose commands like `canvas.*`, `camera.*`, `screen.record`, `location.get`.
+
+Protocol details:
+
+- [Gateway protocol](/gateway/protocol)
+
+### WebChat
+
+- Static UI that uses the Gateway WS API for chat history and sends.
+- In remote setups, connects through the same SSH/Tailscale tunnel as other
+  clients.
+
+## Connection lifecycle (single client)
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant Gateway
+
+    Client->>Gateway: req:connect
+    Gateway-->>Client: res (ok)
+    Note right of Gateway: or res error + close
+    Note left of Client: payload=hello-ok<br>snapshot: presence + health
+
+    Gateway-->>Client: event:presence
+    Gateway-->>Client: event:tick
+
+    Client->>Gateway: req:agent
+    Gateway-->>Client: res:agent<br>ack {runId, status:"accepted"}
+    Gateway-->>Client: event:agent<br>(streaming)
+    Gateway-->>Client: res:agent<br>final {runId, status, summary}
+```
+
+## Wire protocol (summary)
+
+- Transport: WebSocket, text frames with JSON payloads.
+- First frame **must** be `connect`.
+- After handshake:
+  - Requests: `{type:"req", id, method, params}` → `{type:"res", id, ok, payload|error}`
+  - Events: `{type:"event", event, payload, seq?, stateVersion?}`
+- If `OPENCLAW_GATEWAY_TOKEN` (or `--token`) is set, `connect.params.auth.token`
+  must match or the socket closes.
+- Idempotency keys are required for side‑effecting methods (`send`, `agent`) to
+  safely retry; the server keeps a short‑lived dedupe cache.
+- Nodes must include `role: "node"` plus caps/commands/permissions in `connect`.
+
+## Pairing + local trust
+
+- All WS clients (operators + nodes) include a **device identity** on `connect`.
+- New device IDs require pairing approval; the Gateway issues a **device token**
+  for subsequent connects.
+- **Local** connects (loopback or the gateway host’s own tailnet address) can be
+  auto‑approved to keep same‑host UX smooth.
+- All connects must sign the `connect.challenge` nonce.
+- Signature payload `v3` also binds `platform` + `deviceFamily`; the gateway
+  pins paired metadata on reconnect and requires repair pairing for metadata
+  changes.
+- **Non‑local** connects still require explicit approval.
+- Gateway auth (`gateway.auth.*`) still applies to **all** connections, local or
+  remote.
+
+Details: [Gateway protocol](/gateway/protocol), [Pairing](/channels/pairing),
+[Security](/gateway/security).
+
+## Protocol typing and codegen
+
+- TypeBox schemas define the protocol.
+- JSON Schema is generated from those schemas.
+- Swift models are generated from the JSON Schema.
+
+## Remote access
+
+- Preferred: Tailscale or VPN.
+- Alternative: SSH tunnel
+
+  ```bash
+  ssh -N -L 18789:127.0.0.1:18789 user@host
+  ```
+
+- The same handshake + auth token apply over the tunnel.
+- TLS + optional pinning can be enabled for WS in remote setups.
+
+## Operations snapshot
+
+- Start: `openclaw gateway` (foreground, logs to stdout).
+- Health: `health` over WS (also included in `hello-ok`).
+- Supervision: launchd/systemd for auto‑restart.
+
+## Invariants
+
+- Exactly one Gateway controls a single Baileys session per host.
+- Handshake is mandatory; any non‑JSON or non‑connect first frame is a hard close.
+- Events are not replayed; clients must refresh on gaps.
--- a/openclaw/docs/concepts/compaction.md
+++ b/openclaw/docs/concepts/compaction.md
@@ -0,0 +1,62 @@
+---
+summary: "Context window + compaction: how OpenClaw keeps sessions under model limits"
+read_when:
+  - You want to understand auto-compaction and /compact
+  - You are debugging long sessions hitting context limits
+title: "Compaction"
+---
+
+# Context Window & Compaction
+
+Every model has a **context window** (max tokens it can see). Long-running chats accumulate messages and tool results; once the window is tight, OpenClaw **compacts** older history to stay within limits.
+
+## What compaction is
+
+Compaction **summarizes older conversation** into a compact summary entry and keeps recent messages intact. The summary is stored in the session history, so future requests use:
+
+- The compaction summary
+- Recent messages after the compaction point
+
+Compaction **persists** in the session’s JSONL history.
+
+## Configuration
+
+Use the `agents.defaults.compaction` setting in your `openclaw.json` to configure compaction behavior (mode, target tokens, etc.).
+Compaction summarization preserves opaque identifiers by default (`identifierPolicy: "strict"`). You can override this with `identifierPolicy: "off"` or provide custom text with `identifierPolicy: "custom"` and `identifierInstructions`.
+
+## Auto-compaction (default on)
+
+When a session nears or exceeds the model’s context window, OpenClaw triggers auto-compaction and may retry the original request using the compacted context.
+
+You’ll see:
+
+- `🧹 Auto-compaction complete` in verbose mode
+- `/status` showing `🧹 Compactions: <count>`
+
+Before compaction, OpenClaw can run a **silent memory flush** turn to store
+durable notes to disk. See [Memory](/concepts/memory) for details and config.
+
+## Manual compaction
+
+Use `/compact` (optionally with instructions) to force a compaction pass:
+
+```
+/compact Focus on decisions and open questions
+```
+
+## Context window source
+
+Context window is model-specific. OpenClaw uses the model definition from the configured provider catalog to determine limits.
+
+## Compaction vs pruning
+
+- **Compaction**: summarises and **persists** in JSONL.
+- **Session pruning**: trims old **tool results** only, **in-memory**, per request.
+
+See [/concepts/session-pruning](/concepts/session-pruning) for pruning details.
+
+## Tips
+
+- Use `/compact` when sessions feel stale or context is bloated.
+- Large tool outputs are already truncated; pruning can further reduce tool-result buildup.
+- If you need a fresh slate, `/new` or `/reset` starts a new session id.
--- a/openclaw/docs/concepts/context.md
+++ b/openclaw/docs/concepts/context.md
@@ -0,0 +1,161 @@
+---
+summary: "Context: what the model sees, how it is built, and how to inspect it"
+read_when:
+  - You want to understand what “context” means in OpenClaw
+  - You are debugging why the model “knows” something (or forgot it)
+  - You want to reduce context overhead (/context, /status, /compact)
+title: "Context"
+---
+
+# Context
+
+“Context” is **everything OpenClaw sends to the model for a run**. It is bounded by the model’s **context window** (token limit).
+
+Beginner mental model:
+
+- **System prompt** (OpenClaw-built): rules, tools, skills list, time/runtime, and injected workspace files.
+- **Conversation history**: your messages + the assistant’s messages for this session.
+- **Tool calls/results + attachments**: command output, file reads, images/audio, etc.
+
+Context is _not the same thing_ as “memory”: memory can be stored on disk and reloaded later; context is what’s inside the model’s current window.
+
+## Quick start (inspect context)
+
+- `/status` → quick “how full is my window?” view + session settings.
+- `/context list` → what’s injected + rough sizes (per file + totals).
+- `/context detail` → deeper breakdown: per-file, per-tool schema sizes, per-skill entry sizes, and system prompt size.
+- `/usage tokens` → append per-reply usage footer to normal replies.
+- `/compact` → summarize older history into a compact entry to free window space.
+
+See also: [Slash commands](/tools/slash-commands), [Token use & costs](/reference/token-use), [Compaction](/concepts/compaction).
+
+## Example output
+
+Values vary by model, provider, tool policy, and what’s in your workspace.
+
+### `/context list`
+
+```
+🧠 Context breakdown
+Workspace: <workspaceDir>
+Bootstrap max/file: 20,000 chars
+Sandbox: mode=non-main sandboxed=false
+System prompt (run): 38,412 chars (~9,603 tok) (Project Context 23,901 chars (~5,976 tok))
+
+Injected workspace files:
+- AGENTS.md: OK | raw 1,742 chars (~436 tok) | injected 1,742 chars (~436 tok)
+- SOUL.md: OK | raw 912 chars (~228 tok) | injected 912 chars (~228 tok)
+- TOOLS.md: TRUNCATED | raw 54,210 chars (~13,553 tok) | injected 20,962 chars (~5,241 tok)
+- IDENTITY.md: OK | raw 211 chars (~53 tok) | injected 211 chars (~53 tok)
+- USER.md: OK | raw 388 chars (~97 tok) | injected 388 chars (~97 tok)
+- HEARTBEAT.md: MISSING | raw 0 | injected 0
+- BOOTSTRAP.md: OK | raw 0 chars (~0 tok) | injected 0 chars (~0 tok)
+
+Skills list (system prompt text): 2,184 chars (~546 tok) (12 skills)
+Tools: read, edit, write, exec, process, browser, message, sessions_send, …
+Tool list (system prompt text): 1,032 chars (~258 tok)
+Tool schemas (JSON): 31,988 chars (~7,997 tok) (counts toward context; not shown as text)
+Tools: (same as above)
+
+Session tokens (cached): 14,250 total / ctx=32,000
+```
+
+### `/context detail`
+
+```
+🧠 Context breakdown (detailed)
+…
+Top skills (prompt entry size):
+- frontend-design: 412 chars (~103 tok)
+- oracle: 401 chars (~101 tok)
+… (+10 more skills)
+
+Top tools (schema size):
+- browser: 9,812 chars (~2,453 tok)
+- exec: 6,240 chars (~1,560 tok)
+… (+N more tools)
+```
+
+## What counts toward the context window
+
+Everything the model receives counts, including:
+
+- System prompt (all sections).
+- Conversation history.
+- Tool calls + tool results.
+- Attachments/transcripts (images/audio/files).
+- Compaction summaries and pruning artifacts.
+- Provider “wrappers” or hidden headers (not visible, still counted).
+
+## How OpenClaw builds the system prompt
+
+The system prompt is **OpenClaw-owned** and rebuilt each run. It includes:
+
+- Tool list + short descriptions.
+- Skills list (metadata only; see below).
+- Workspace location.
+- Time (UTC + converted user time if configured).
+- Runtime metadata (host/OS/model/thinking).
+- Injected workspace bootstrap files under **Project Context**.
+
+Full breakdown: [System Prompt](/concepts/system-prompt).
+
+## Injected workspace files (Project Context)
+
+By default, OpenClaw injects a fixed set of workspace files (if present):
+
+- `AGENTS.md`
+- `SOUL.md`
+- `TOOLS.md`
+- `IDENTITY.md`
+- `USER.md`
+- `HEARTBEAT.md`
+- `BOOTSTRAP.md` (first-run only)
+
+Large files are truncated per-file using `agents.defaults.bootstrapMaxChars` (default `20000` chars). OpenClaw also enforces a total bootstrap injection cap across files with `agents.defaults.bootstrapTotalMaxChars` (default `150000` chars). `/context` shows **raw vs injected** sizes and whether truncation happened.
+
+## Skills: what’s injected vs loaded on-demand
+
+The system prompt includes a compact **skills list** (name + description + location). This list has real overhead.
+
+Skill instructions are _not_ included by default. The model is expected to `read` the skill’s `SKILL.md` **only when needed**.
+
+## Tools: there are two costs
+
+Tools affect context in two ways:
+
+1. **Tool list text** in the system prompt (what you see as “Tooling”).
+2. **Tool schemas** (JSON). These are sent to the model so it can call tools. They count toward context even though you don’t see them as plain text.
+
+`/context detail` breaks down the biggest tool schemas so you can see what dominates.
+
+## Commands, directives, and “inline shortcuts”
+
+Slash commands are handled by the Gateway. There are a few different behaviors:
+
+- **Standalone commands**: a message that is only `/...` runs as a command.
+- **Directives**: `/think`, `/verbose`, `/reasoning`, `/elevated`, `/model`, `/queue` are stripped before the model sees the message.
+  - Directive-only messages persist session settings.
+  - Inline directives in a normal message act as per-message hints.
+- **Inline shortcuts** (allowlisted senders only): certain `/...` tokens inside a normal message can run immediately (example: “hey /status”), and are stripped before the model sees the remaining text.
+
+Details: [Slash commands](/tools/slash-commands).
+
+## Sessions, compaction, and pruning (what persists)
+
+What persists across messages depends on the mechanism:
+
+- **Normal history** persists in the session transcript until compacted/pruned by policy.
+- **Compaction** persists a summary into the transcript and keeps recent messages intact.
+- **Pruning** removes old tool results from the _in-memory_ prompt for a run, but does not rewrite the transcript.
+
+Docs: [Session](/concepts/session), [Compaction](/concepts/compaction), [Session pruning](/concepts/session-pruning).
+
+## What `/context` actually reports
+
+`/context` prefers the latest **run-built** system prompt report when available:
+
+- `System prompt (run)` = captured from the last embedded (tool-capable) run and persisted in the session store.
+- `System prompt (estimate)` = computed on the fly when no run report exists (or when running via a CLI backend that doesn’t generate the report).
+
+Either way, it reports sizes and top contributors; it does **not** dump the full system prompt or tool schemas.
--- a/openclaw/docs/concepts/features.md
+++ b/openclaw/docs/concepts/features.md
@@ -0,0 +1,53 @@
+---
+summary: "OpenClaw capabilities across channels, routing, media, and UX."
+read_when:
+  - You want a full list of what OpenClaw supports
+title: "Features"
+---
+
+## Highlights
+
+<Columns>
+  <Card title="Channels" icon="message-square">
+    WhatsApp, Telegram, Discord, and iMessage with a single Gateway.
+  </Card>
+  <Card title="Plugins" icon="plug">
+    Add Mattermost and more with extensions.
+  </Card>
+  <Card title="Routing" icon="route">
+    Multi-agent routing with isolated sessions.
+  </Card>
+  <Card title="Media" icon="image">
+    Images, audio, and documents in and out.
+  </Card>
+  <Card title="Apps and UI" icon="monitor">
+    Web Control UI and macOS companion app.
+  </Card>
+  <Card title="Mobile nodes" icon="smartphone">
+    iOS and Android nodes with Canvas support.
+  </Card>
+</Columns>
+
+## Full list
+
+- WhatsApp integration via WhatsApp Web (Baileys)
+- Telegram bot support (grammY)
+- Discord bot support (channels.discord.js)
+- Mattermost bot support (plugin)
+- iMessage integration via local imsg CLI (macOS)
+- Agent bridge for Pi in RPC mode with tool streaming
+- Streaming and chunking for long responses
+- Multi-agent routing for isolated sessions per workspace or sender
+- Subscription auth for Anthropic and OpenAI via OAuth
+- Sessions: direct chats collapse into shared `main`; groups are isolated
+- Group chat support with mention based activation
+- Media support for images, audio, and documents
+- Optional voice note transcription hook
+- WebChat and macOS menu bar app
+- iOS node with pairing and Canvas surface
+- Android node with pairing, Canvas, chat, and camera
+
+<Note>
+Legacy Claude, Codex, Gemini, and Opencode paths have been removed. Pi is the only
+coding agent path.
+</Note>
--- a/openclaw/docs/concepts/markdown-formatting.md
+++ b/openclaw/docs/concepts/markdown-formatting.md
@@ -0,0 +1,130 @@
+---
+summary: "Markdown formatting pipeline for outbound channels"
+read_when:
+  - You are changing markdown formatting or chunking for outbound channels
+  - You are adding a new channel formatter or style mapping
+  - You are debugging formatting regressions across channels
+title: "Markdown Formatting"
+---
+
+# Markdown formatting
+
+OpenClaw formats outbound Markdown by converting it into a shared intermediate
+representation (IR) before rendering channel-specific output. The IR keeps the
+source text intact while carrying style/link spans so chunking and rendering can
+stay consistent across channels.
+
+## Goals
+
+- **Consistency:** one parse step, multiple renderers.
+- **Safe chunking:** split text before rendering so inline formatting never
+  breaks across chunks.
+- **Channel fit:** map the same IR to Slack mrkdwn, Telegram HTML, and Signal
+  style ranges without re-parsing Markdown.
+
+## Pipeline
+
+1. **Parse Markdown -> IR**
+   - IR is plain text plus style spans (bold/italic/strike/code/spoiler) and link spans.
+   - Offsets are UTF-16 code units so Signal style ranges align with its API.
+   - Tables are parsed only when a channel opts into table conversion.
+2. **Chunk IR (format-first)**
+   - Chunking happens on the IR text before rendering.
+   - Inline formatting does not split across chunks; spans are sliced per chunk.
+3. **Render per channel**
+   - **Slack:** mrkdwn tokens (bold/italic/strike/code), links as `<url|label>`.
+   - **Telegram:** HTML tags (`<b>`, `<i>`, `<s>`, `<code>`, `<pre><code>`, `<a href>`).
+   - **Signal:** plain text + `text-style` ranges; links become `label (url)` when label differs.
+
+## IR example
+
+Input Markdown:
+
+```markdown
+Hello **world** — see [docs](https://docs.openclaw.ai).
+```
+
+IR (schematic):
+
+```json
+{
+  "text": "Hello world — see docs.",
+  "styles": [{ "start": 6, "end": 11, "style": "bold" }],
+  "links": [{ "start": 19, "end": 23, "href": "https://docs.openclaw.ai" }]
+}
+```
+
+## Where it is used
+
+- Slack, Telegram, and Signal outbound adapters render from the IR.
+- Other channels (WhatsApp, iMessage, MS Teams, Discord) still use plain text or
+  their own formatting rules, with Markdown table conversion applied before
+  chunking when enabled.
+
+## Table handling
+
+Markdown tables are not consistently supported across chat clients. Use
+`markdown.tables` to control conversion per channel (and per account).
+
+- `code`: render tables as code blocks (default for most channels).
+- `bullets`: convert each row into bullet points (default for Signal + WhatsApp).
+- `off`: disable table parsing and conversion; raw table text passes through.
+
+Config keys:
+
+```yaml
+channels:
+  discord:
+    markdown:
+      tables: code
+    accounts:
+      work:
+        markdown:
+          tables: off
+```
+
+## Chunking rules
+
+- Chunk limits come from channel adapters/config and are applied to the IR text.
+- Code fences are preserved as a single block with a trailing newline so channels
+  render them correctly.
+- List prefixes and blockquote prefixes are part of the IR text, so chunking
+  does not split mid-prefix.
+- Inline styles (bold/italic/strike/inline-code/spoiler) are never split across
+  chunks; the renderer reopens styles inside each chunk.
+
+If you need more on chunking behavior across channels, see
+[Streaming + chunking](/concepts/streaming).
+
+## Link policy
+
+- **Slack:** `[label](url)` -> `<url|label>`; bare URLs remain bare. Autolink
+  is disabled during parse to avoid double-linking.
+- **Telegram:** `[label](url)` -> `<a href="url">label</a>` (HTML parse mode).
+- **Signal:** `[label](url)` -> `label (url)` unless label matches the URL.
+
+## Spoilers
+
+Spoiler markers (`||spoiler||`) are parsed only for Signal, where they map to
+SPOILER style ranges. Other channels treat them as plain text.
+
+## How to add or update a channel formatter
+
+1. **Parse once:** use the shared `markdownToIR(...)` helper with channel-appropriate
+   options (autolink, heading style, blockquote prefix).
+2. **Render:** implement a renderer with `renderMarkdownWithMarkers(...)` and a
+   style marker map (or Signal style ranges).
+3. **Chunk:** call `chunkMarkdownIR(...)` before rendering; render each chunk.
+4. **Wire adapter:** update the channel outbound adapter to use the new chunker
+   and renderer.
+5. **Test:** add or update format tests and an outbound delivery test if the
+   channel uses chunking.
+
+## Common gotchas
+
+- Slack angle-bracket tokens (`<@U123>`, `<#C123>`, `<https://...>`) must be
+  preserved; escape raw HTML safely.
+- Telegram HTML requires escaping text outside tags to avoid broken markup.
+- Signal style ranges depend on UTF-16 offsets; do not use code point offsets.
+- Preserve trailing newlines for fenced code blocks so closing markers land on
+  their own line.
--- a/openclaw/docs/concepts/memory.md
+++ b/openclaw/docs/concepts/memory.md
@@ -0,0 +1,736 @@
+---
+title: "Memory"
+summary: "How OpenClaw memory works (workspace files + automatic memory flush)"
+read_when:
+  - You want the memory file layout and workflow
+  - You want to tune the automatic pre-compaction memory flush
+---
+
+# Memory
+
+OpenClaw memory is **plain Markdown in the agent workspace**. The files are the
+source of truth; the model only "remembers" what gets written to disk.
+
+Memory search tools are provided by the active memory plugin (default:
+`memory-core`). Disable memory plugins with `plugins.slots.memory = "none"`.
+
+## Memory files (Markdown)
+
+The default workspace layout uses two memory layers:
+
+- `memory/YYYY-MM-DD.md`
+  - Daily log (append-only).
+  - Read today + yesterday at session start.
+- `MEMORY.md` (optional)
+  - Curated long-term memory.
+  - **Only load in the main, private session** (never in group contexts).
+
+These files live under the workspace (`agents.defaults.workspace`, default
+`~/.openclaw/workspace`). See [Agent workspace](/concepts/agent-workspace) for the full layout.
+
+## Memory tools
+
+OpenClaw exposes two agent-facing tools for these Markdown files:
+
+- `memory_search` — semantic recall over indexed snippets.
+- `memory_get` — targeted read of a specific Markdown file/line range.
+
+`memory_get` now **degrades gracefully when a file doesn't exist** (for example,
+today's daily log before the first write). Both the builtin manager and the QMD
+backend return `{ text: "", path }` instead of throwing `ENOENT`, so agents can
+handle "nothing recorded yet" and continue their workflow without wrapping the
+tool call in try/catch logic.
+
+## When to write memory
+
+- Decisions, preferences, and durable facts go to `MEMORY.md`.
+- Day-to-day notes and running context go to `memory/YYYY-MM-DD.md`.
+- If someone says "remember this," write it down (do not keep it in RAM).
+- This area is still evolving. It helps to remind the model to store memories; it will know what to do.
+- If you want something to stick, **ask the bot to write it** into memory.
+
+## Automatic memory flush (pre-compaction ping)
+
+When a session is **close to auto-compaction**, OpenClaw triggers a **silent,
+agentic turn** that reminds the model to write durable memory **before** the
+context is compacted. The default prompts explicitly say the model _may reply_,
+but usually `NO_REPLY` is the correct response so the user never sees this turn.
+
+This is controlled by `agents.defaults.compaction.memoryFlush`:
+
+```json5
+{
+  agents: {
+    defaults: {
+      compaction: {
+        reserveTokensFloor: 20000,
+        memoryFlush: {
+          enabled: true,
+          softThresholdTokens: 4000,
+          systemPrompt: "Session nearing compaction. Store durable memories now.",
+          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store.",
+        },
+      },
+    },
+  },
+}
+```
+
+Details:
+
+- **Soft threshold**: flush triggers when the session token estimate crosses
+  `contextWindow - reserveTokensFloor - softThresholdTokens`.
+- **Silent** by default: prompts include `NO_REPLY` so nothing is delivered.
+- **Two prompts**: a user prompt plus a system prompt append the reminder.
+- **One flush per compaction cycle** (tracked in `sessions.json`).
+- **Workspace must be writable**: if the session runs sandboxed with
+  `workspaceAccess: "ro"` or `"none"`, the flush is skipped.
+
+For the full compaction lifecycle, see
+[Session management + compaction](/reference/session-management-compaction).
+
+## Vector memory search
+
+OpenClaw can build a small vector index over `MEMORY.md` and `memory/*.md` so
+semantic queries can find related notes even when wording differs.
+
+Defaults:
+
+- Enabled by default.
+- Watches memory files for changes (debounced).
+- Configure memory search under `agents.defaults.memorySearch` (not top-level
+  `memorySearch`).
+- Uses remote embeddings by default. If `memorySearch.provider` is not set, OpenClaw auto-selects:
+  1. `local` if a `memorySearch.local.modelPath` is configured and the file exists.
+  2. `openai` if an OpenAI key can be resolved.
+  3. `gemini` if a Gemini key can be resolved.
+  4. `voyage` if a Voyage key can be resolved.
+  5. `mistral` if a Mistral key can be resolved.
+  6. Otherwise memory search stays disabled until configured.
+- Local mode uses node-llama-cpp and may require `pnpm approve-builds`.
+- Uses sqlite-vec (when available) to accelerate vector search inside SQLite.
+
+Remote embeddings **require** an API key for the embedding provider. OpenClaw
+resolves keys from auth profiles, `models.providers.*.apiKey`, or environment
+variables. Codex OAuth only covers chat/completions and does **not** satisfy
+embeddings for memory search. For Gemini, use `GEMINI_API_KEY` or
+`models.providers.google.apiKey`. For Voyage, use `VOYAGE_API_KEY` or
+`models.providers.voyage.apiKey`. For Mistral, use `MISTRAL_API_KEY` or
+`models.providers.mistral.apiKey`.
+When using a custom OpenAI-compatible endpoint,
+set `memorySearch.remote.apiKey` (and optional `memorySearch.remote.headers`).
+
+### QMD backend (experimental)
+
+Set `memory.backend = "qmd"` to swap the built-in SQLite indexer for
+[QMD](https://github.com/tobi/qmd): a local-first search sidecar that combines
+BM25 + vectors + reranking. Markdown stays the source of truth; OpenClaw shells
+out to QMD for retrieval. Key points:
+
+**Prereqs**
+
+- Disabled by default. Opt in per-config (`memory.backend = "qmd"`).
+- Install the QMD CLI separately (`bun install -g https://github.com/tobi/qmd` or grab
+  a release) and make sure the `qmd` binary is on the gateway’s `PATH`.
+- QMD needs an SQLite build that allows extensions (`brew install sqlite` on
+  macOS).
+- QMD runs fully locally via Bun + `node-llama-cpp` and auto-downloads GGUF
+  models from HuggingFace on first use (no separate Ollama daemon required).
+- The gateway runs QMD in a self-contained XDG home under
+  `~/.openclaw/agents/<agentId>/qmd/` by setting `XDG_CONFIG_HOME` and
+  `XDG_CACHE_HOME`.
+- OS support: macOS and Linux work out of the box once Bun + SQLite are
+  installed. Windows is best supported via WSL2.
+
+**How the sidecar runs**
+
+- The gateway writes a self-contained QMD home under
+  `~/.openclaw/agents/<agentId>/qmd/` (config + cache + sqlite DB).
+- Collections are created via `qmd collection add` from `memory.qmd.paths`
+  (plus default workspace memory files), then `qmd update` + `qmd embed` run
+  on boot and on a configurable interval (`memory.qmd.update.interval`,
+  default 5 m).
+- The gateway now initializes the QMD manager on startup, so periodic update
+  timers are armed even before the first `memory_search` call.
+- Boot refresh now runs in the background by default so chat startup is not
+  blocked; set `memory.qmd.update.waitForBootSync = true` to keep the previous
+  blocking behavior.
+- Searches run via `memory.qmd.searchMode` (default `qmd search --json`; also
+  supports `vsearch` and `query`). If the selected mode rejects flags on your
+  QMD build, OpenClaw retries with `qmd query`. If QMD fails or the binary is
+  missing, OpenClaw automatically falls back to the builtin SQLite manager so
+  memory tools keep working.
+- OpenClaw does not expose QMD embed batch-size tuning today; batch behavior is
+  controlled by QMD itself.
+- **First search may be slow**: QMD may download local GGUF models (reranker/query
+  expansion) on the first `qmd query` run.
+  - OpenClaw sets `XDG_CONFIG_HOME`/`XDG_CACHE_HOME` automatically when it runs QMD.
+  - If you want to pre-download models manually (and warm the same index OpenClaw
+    uses), run a one-off query with the agent’s XDG dirs.
+
+    OpenClaw’s QMD state lives under your **state dir** (defaults to `~/.openclaw`).
+    You can point `qmd` at the exact same index by exporting the same XDG vars
+    OpenClaw uses:
+
+    ```bash
+    # Pick the same state dir OpenClaw uses
+    STATE_DIR="${OPENCLAW_STATE_DIR:-$HOME/.openclaw}"
+
+    export XDG_CONFIG_HOME="$STATE_DIR/agents/main/qmd/xdg-config"
+    export XDG_CACHE_HOME="$STATE_DIR/agents/main/qmd/xdg-cache"
+
+    # (Optional) force an index refresh + embeddings
+    qmd update
+    qmd embed
+
+    # Warm up / trigger first-time model downloads
+    qmd query "test" -c memory-root --json >/dev/null 2>&1
+    ```
+
+**Config surface (`memory.qmd.*`)**
+
+- `command` (default `qmd`): override the executable path.
+- `searchMode` (default `search`): pick which QMD command backs
+  `memory_search` (`search`, `vsearch`, `query`).
+- `includeDefaultMemory` (default `true`): auto-index `MEMORY.md` + `memory/**/*.md`.
+- `paths[]`: add extra directories/files (`path`, optional `pattern`, optional
+  stable `name`).
+- `sessions`: opt into session JSONL indexing (`enabled`, `retentionDays`,
+  `exportDir`).
+- `update`: controls refresh cadence and maintenance execution:
+  (`interval`, `debounceMs`, `onBoot`, `waitForBootSync`, `embedInterval`,
+  `commandTimeoutMs`, `updateTimeoutMs`, `embedTimeoutMs`).
+- `limits`: clamp recall payload (`maxResults`, `maxSnippetChars`,
+  `maxInjectedChars`, `timeoutMs`).
+- `scope`: same schema as [`session.sendPolicy`](/gateway/configuration#session).
+  Default is DM-only (`deny` all, `allow` direct chats); loosen it to surface QMD
+  hits in groups/channels.
+  - `match.keyPrefix` matches the **normalized** session key (lowercased, with any
+    leading `agent:<id>:` stripped). Example: `discord:channel:`.
+  - `match.rawKeyPrefix` matches the **raw** session key (lowercased), including
+    `agent:<id>:`. Example: `agent:main:discord:`.
+  - Legacy: `match.keyPrefix: "agent:..."` is still treated as a raw-key prefix,
+    but prefer `rawKeyPrefix` for clarity.
+- When `scope` denies a search, OpenClaw logs a warning with the derived
+  `channel`/`chatType` so empty results are easier to debug.
+- Snippets sourced outside the workspace show up as
+  `qmd/<collection>/<relative-path>` in `memory_search` results; `memory_get`
+  understands that prefix and reads from the configured QMD collection root.
+- When `memory.qmd.sessions.enabled = true`, OpenClaw exports sanitized session
+  transcripts (User/Assistant turns) into a dedicated QMD collection under
+  `~/.openclaw/agents/<id>/qmd/sessions/`, so `memory_search` can recall recent
+  conversations without touching the builtin SQLite index.
+- `memory_search` snippets now include a `Source: <path#line>` footer when
+  `memory.citations` is `auto`/`on`; set `memory.citations = "off"` to keep
+  the path metadata internal (the agent still receives the path for
+  `memory_get`, but the snippet text omits the footer and the system prompt
+  warns the agent not to cite it).
+
+**Example**
+
+```json5
+memory: {
+  backend: "qmd",
+  citations: "auto",
+  qmd: {
+    includeDefaultMemory: true,
+    update: { interval: "5m", debounceMs: 15000 },
+    limits: { maxResults: 6, timeoutMs: 4000 },
+    scope: {
+      default: "deny",
+      rules: [
+        { action: "allow", match: { chatType: "direct" } },
+        // Normalized session-key prefix (strips `agent:<id>:`).
+        { action: "deny", match: { keyPrefix: "discord:channel:" } },
+        // Raw session-key prefix (includes `agent:<id>:`).
+        { action: "deny", match: { rawKeyPrefix: "agent:main:discord:" } },
+      ]
+    },
+    paths: [
+      { name: "docs", path: "~/notes", pattern: "**/*.md" }
+    ]
+  }
+}
+```
+
+**Citations & fallback**
+
+- `memory.citations` applies regardless of backend (`auto`/`on`/`off`).
+- When `qmd` runs, we tag `status().backend = "qmd"` so diagnostics show which
+  engine served the results. If the QMD subprocess exits or JSON output can’t be
+  parsed, the search manager logs a warning and returns the builtin provider
+  (existing Markdown embeddings) until QMD recovers.
+
+### Additional memory paths
+
+If you want to index Markdown files outside the default workspace layout, add
+explicit paths:
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      extraPaths: ["../team-docs", "/srv/shared-notes/overview.md"]
+    }
+  }
+}
+```
+
+Notes:
+
+- Paths can be absolute or workspace-relative.
+- Directories are scanned recursively for `.md` files.
+- Only Markdown files are indexed.
+- Symlinks are ignored (files or directories).
+
+### Gemini embeddings (native)
+
+Set the provider to `gemini` to use the Gemini embeddings API directly:
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      provider: "gemini",
+      model: "gemini-embedding-001",
+      remote: {
+        apiKey: "YOUR_GEMINI_API_KEY"
+      }
+    }
+  }
+}
+```
+
+Notes:
+
+- `remote.baseUrl` is optional (defaults to the Gemini API base URL).
+- `remote.headers` lets you add extra headers if needed.
+- Default model: `gemini-embedding-001`.
+
+If you want to use a **custom OpenAI-compatible endpoint** (OpenRouter, vLLM, or a proxy),
+you can use the `remote` configuration with the OpenAI provider:
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      provider: "openai",
+      model: "text-embedding-3-small",
+      remote: {
+        baseUrl: "https://api.example.com/v1/",
+        apiKey: "YOUR_OPENAI_COMPAT_API_KEY",
+        headers: { "X-Custom-Header": "value" }
+      }
+    }
+  }
+}
+```
+
+If you don't want to set an API key, use `memorySearch.provider = "local"` or set
+`memorySearch.fallback = "none"`.
+
+Fallbacks:
+
+- `memorySearch.fallback` can be `openai`, `gemini`, `voyage`, `mistral`, `local`, or `none`.
+- The fallback provider is only used when the primary embedding provider fails.
+
+Batch indexing (OpenAI + Gemini + Voyage):
+
+- Disabled by default. Set `agents.defaults.memorySearch.remote.batch.enabled = true` to enable for large-corpus indexing (OpenAI, Gemini, and Voyage).
+- Default behavior waits for batch completion; tune `remote.batch.wait`, `remote.batch.pollIntervalMs`, and `remote.batch.timeoutMinutes` if needed.
+- Set `remote.batch.concurrency` to control how many batch jobs we submit in parallel (default: 2).
+- Batch mode applies when `memorySearch.provider = "openai"` or `"gemini"` and uses the corresponding API key.
+- Gemini batch jobs use the async embeddings batch endpoint and require Gemini Batch API availability.
+
+Why OpenAI batch is fast + cheap:
+
+- For large backfills, OpenAI is typically the fastest option we support because we can submit many embedding requests in a single batch job and let OpenAI process them asynchronously.
+- OpenAI offers discounted pricing for Batch API workloads, so large indexing runs are usually cheaper than sending the same requests synchronously.
+- See the OpenAI Batch API docs and pricing for details:
+  - [https://platform.openai.com/docs/api-reference/batch](https://platform.openai.com/docs/api-reference/batch)
+  - [https://platform.openai.com/pricing](https://platform.openai.com/pricing)
+
+Config example:
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      provider: "openai",
+      model: "text-embedding-3-small",
+      fallback: "openai",
+      remote: {
+        batch: { enabled: true, concurrency: 2 }
+      },
+      sync: { watch: true }
+    }
+  }
+}
+```
+
+Tools:
+
+- `memory_search` — returns snippets with file + line ranges.
+- `memory_get` — read memory file content by path.
+
+Local mode:
+
+- Set `agents.defaults.memorySearch.provider = "local"`.
+- Provide `agents.defaults.memorySearch.local.modelPath` (GGUF or `hf:` URI).
+- Optional: set `agents.defaults.memorySearch.fallback = "none"` to avoid remote fallback.
+
+### How the memory tools work
+
+- `memory_search` semantically searches Markdown chunks (~400 token target, 80-token overlap) from `MEMORY.md` + `memory/**/*.md`. It returns snippet text (capped ~700 chars), file path, line range, score, provider/model, and whether we fell back from local → remote embeddings. No full file payload is returned.
+- `memory_get` reads a specific memory Markdown file (workspace-relative), optionally from a starting line and for N lines. Paths outside `MEMORY.md` / `memory/` are rejected.
+- Both tools are enabled only when `memorySearch.enabled` resolves true for the agent.
+
+### What gets indexed (and when)
+
+- File type: Markdown only (`MEMORY.md`, `memory/**/*.md`).
+- Index storage: per-agent SQLite at `~/.openclaw/memory/<agentId>.sqlite` (configurable via `agents.defaults.memorySearch.store.path`, supports `{agentId}` token).
+- Freshness: watcher on `MEMORY.md` + `memory/` marks the index dirty (debounce 1.5s). Sync is scheduled on session start, on search, or on an interval and runs asynchronously. Session transcripts use delta thresholds to trigger background sync.
+- Reindex triggers: the index stores the embedding **provider/model + endpoint fingerprint + chunking params**. If any of those change, OpenClaw automatically resets and reindexes the entire store.
+
+### Hybrid search (BM25 + vector)
+
+When enabled, OpenClaw combines:
+
+- **Vector similarity** (semantic match, wording can differ)
+- **BM25 keyword relevance** (exact tokens like IDs, env vars, code symbols)
+
+If full-text search is unavailable on your platform, OpenClaw falls back to vector-only search.
+
+#### Why hybrid?
+
+Vector search is great at “this means the same thing”:
+
+- “Mac Studio gateway host” vs “the machine running the gateway”
+- “debounce file updates” vs “avoid indexing on every write”
+
+But it can be weak at exact, high-signal tokens:
+
+- IDs (`a828e60`, `b3b9895a…`)
+- code symbols (`memorySearch.query.hybrid`)
+- error strings ("sqlite-vec unavailable")
+
+BM25 (full-text) is the opposite: strong at exact tokens, weaker at paraphrases.
+Hybrid search is the pragmatic middle ground: **use both retrieval signals** so you get
+good results for both "natural language" queries and "needle in a haystack" queries.
+
+#### How we merge results (the current design)
+
+Implementation sketch:
+
+1. Retrieve a candidate pool from both sides:
+
+- **Vector**: top `maxResults * candidateMultiplier` by cosine similarity.
+- **BM25**: top `maxResults * candidateMultiplier` by FTS5 BM25 rank (lower is better).
+
+2. Convert BM25 rank into a 0..1-ish score:
+
+- `textScore = 1 / (1 + max(0, bm25Rank))`
+
+3. Union candidates by chunk id and compute a weighted score:
+
+- `finalScore = vectorWeight * vectorScore + textWeight * textScore`
+
+Notes:
+
+- `vectorWeight` + `textWeight` is normalized to 1.0 in config resolution, so weights behave as percentages.
+- If embeddings are unavailable (or the provider returns a zero-vector), we still run BM25 and return keyword matches.
+- If FTS5 can't be created, we keep vector-only search (no hard failure).
+
+This isn't "IR-theory perfect", but it's simple, fast, and tends to improve recall/precision on real notes.
+If we want to get fancier later, common next steps are Reciprocal Rank Fusion (RRF) or score normalization
+(min/max or z-score) before mixing.
+
+#### Post-processing pipeline
+
+After merging vector and keyword scores, two optional post-processing stages
+refine the result list before it reaches the agent:
+
+```
+Vector + Keyword → Weighted Merge → Temporal Decay → Sort → MMR → Top-K Results
+```
+
+Both stages are **off by default** and can be enabled independently.
+
+#### MMR re-ranking (diversity)
+
+When hybrid search returns results, multiple chunks may contain similar or overlapping content.
+For example, searching for "home network setup" might return five nearly identical snippets
+from different daily notes that all mention the same router configuration.
+
+**MMR (Maximal Marginal Relevance)** re-ranks the results to balance relevance with diversity,
+ensuring the top results cover different aspects of the query instead of repeating the same information.
+
+How it works:
+
+1. Results are scored by their original relevance (vector + BM25 weighted score).
+2. MMR iteratively selects results that maximize: `λ × relevance − (1−λ) × max_similarity_to_selected`.
+3. Similarity between results is measured using Jaccard text similarity on tokenized content.
+
+The `lambda` parameter controls the trade-off:
+
+- `lambda = 1.0` → pure relevance (no diversity penalty)
+- `lambda = 0.0` → maximum diversity (ignores relevance)
+- Default: `0.7` (balanced, slight relevance bias)
+
+**Example — query: "home network setup"**
+
+Given these memory files:
+
+```
+memory/2026-02-10.md  → "Configured Omada router, set VLAN 10 for IoT devices"
+memory/2026-02-08.md  → "Configured Omada router, moved IoT to VLAN 10"
+memory/2026-02-05.md  → "Set up AdGuard DNS on 192.168.10.2"
+memory/network.md     → "Router: Omada ER605, AdGuard: 192.168.10.2, VLAN 10: IoT"
+```
+
+Without MMR — top 3 results:
+
+```
+1. memory/2026-02-10.md  (score: 0.92)  ← router + VLAN
+2. memory/2026-02-08.md  (score: 0.89)  ← router + VLAN (near-duplicate!)
+3. memory/network.md     (score: 0.85)  ← reference doc
+```
+
+With MMR (λ=0.7) — top 3 results:
+
+```
+1. memory/2026-02-10.md  (score: 0.92)  ← router + VLAN
+2. memory/network.md     (score: 0.85)  ← reference doc (diverse!)
+3. memory/2026-02-05.md  (score: 0.78)  ← AdGuard DNS (diverse!)
+```
+
+The near-duplicate from Feb 8 drops out, and the agent gets three distinct pieces of information.
+
+**When to enable:** If you notice `memory_search` returning redundant or near-duplicate snippets,
+especially with daily notes that often repeat similar information across days.
+
+#### Temporal decay (recency boost)
+
+Agents with daily notes accumulate hundreds of dated files over time. Without decay,
+a well-worded note from six months ago can outrank yesterday's update on the same topic.
+
+**Temporal decay** applies an exponential multiplier to scores based on the age of each result,
+so recent memories naturally rank higher while old ones fade:
+
+```
+decayedScore = score × e^(-λ × ageInDays)
+```
+
+where `λ = ln(2) / halfLifeDays`.
+
+With the default half-life of 30 days:
+
+- Today's notes: **100%** of original score
+- 7 days ago: **~84%**
+- 30 days ago: **50%**
+- 90 days ago: **12.5%**
+- 180 days ago: **~1.6%**
+
+**Evergreen files are never decayed:**
+
+- `MEMORY.md` (root memory file)
+- Non-dated files in `memory/` (e.g., `memory/projects.md`, `memory/network.md`)
+- These contain durable reference information that should always rank normally.
+
+**Dated daily files** (`memory/YYYY-MM-DD.md`) use the date extracted from the filename.
+Other sources (e.g., session transcripts) fall back to file modification time (`mtime`).
+
+**Example — query: "what's Rod's work schedule?"**
+
+Given these memory files (today is Feb 10):
+
+```
+memory/2025-09-15.md  → "Rod works Mon-Fri, standup at 10am, pairing at 2pm"  (148 days old)
+memory/2026-02-10.md  → "Rod has standup at 14:15, 1:1 with Zeb at 14:45"    (today)
+memory/2026-02-03.md  → "Rod started new team, standup moved to 14:15"        (7 days old)
+```
+
+Without decay:
+
+```
+1. memory/2025-09-15.md  (score: 0.91)  ← best semantic match, but stale!
+2. memory/2026-02-10.md  (score: 0.82)
+3. memory/2026-02-03.md  (score: 0.80)
+```
+
+With decay (halfLife=30):
+
+```
+1. memory/2026-02-10.md  (score: 0.82 × 1.00 = 0.82)  ← today, no decay
+2. memory/2026-02-03.md  (score: 0.80 × 0.85 = 0.68)  ← 7 days, mild decay
+3. memory/2025-09-15.md  (score: 0.91 × 0.03 = 0.03)  ← 148 days, nearly gone
+```
+
+The stale September note drops to the bottom despite having the best raw semantic match.
+
+**When to enable:** If your agent has months of daily notes and you find that old,
+stale information outranks recent context. A half-life of 30 days works well for
+daily-note-heavy workflows; increase it (e.g., 90 days) if you reference older notes frequently.
+
+#### Configuration
+
+Both features are configured under `memorySearch.query.hybrid`:
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      query: {
+        hybrid: {
+          enabled: true,
+          vectorWeight: 0.7,
+          textWeight: 0.3,
+          candidateMultiplier: 4,
+          // Diversity: reduce redundant results
+          mmr: {
+            enabled: true,    // default: false
+            lambda: 0.7       // 0 = max diversity, 1 = max relevance
+          },
+          // Recency: boost newer memories
+          temporalDecay: {
+            enabled: true,    // default: false
+            halfLifeDays: 30  // score halves every 30 days
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+You can enable either feature independently:
+
+- **MMR only** — useful when you have many similar notes but age doesn't matter.
+- **Temporal decay only** — useful when recency matters but your results are already diverse.
+- **Both** — recommended for agents with large, long-running daily note histories.
+
+### Embedding cache
+
+OpenClaw can cache **chunk embeddings** in SQLite so reindexing and frequent updates (especially session transcripts) don't re-embed unchanged text.
+
+Config:
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      cache: {
+        enabled: true,
+        maxEntries: 50000
+      }
+    }
+  }
+}
+```
+
+### Session memory search (experimental)
+
+You can optionally index **session transcripts** and surface them via `memory_search`.
+This is gated behind an experimental flag.
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      experimental: { sessionMemory: true },
+      sources: ["memory", "sessions"]
+    }
+  }
+}
+```
+
+Notes:
+
+- Session indexing is **opt-in** (off by default).
+- Session updates are debounced and **indexed asynchronously** once they cross delta thresholds (best-effort).
+- `memory_search` never blocks on indexing; results can be slightly stale until background sync finishes.
+- Results still include snippets only; `memory_get` remains limited to memory files.
+- Session indexing is isolated per agent (only that agent’s session logs are indexed).
+- Session logs live on disk (`~/.openclaw/agents/<agentId>/sessions/*.jsonl`). Any process/user with filesystem access can read them, so treat disk access as the trust boundary. For stricter isolation, run agents under separate OS users or hosts.
+
+Delta thresholds (defaults shown):
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      sync: {
+        sessions: {
+          deltaBytes: 100000,   // ~100 KB
+          deltaMessages: 50     // JSONL lines
+        }
+      }
+    }
+  }
+}
+```
+
+### SQLite vector acceleration (sqlite-vec)
+
+When the sqlite-vec extension is available, OpenClaw stores embeddings in a
+SQLite virtual table (`vec0`) and performs vector distance queries in the
+database. This keeps search fast without loading every embedding into JS.
+
+Configuration (optional):
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      store: {
+        vector: {
+          enabled: true,
+          extensionPath: "/path/to/sqlite-vec"
+        }
+      }
+    }
+  }
+}
+```
+
+Notes:
+
+- `enabled` defaults to true; when disabled, search falls back to in-process
+  cosine similarity over stored embeddings.
+- If the sqlite-vec extension is missing or fails to load, OpenClaw logs the
+  error and continues with the JS fallback (no vector table).
+- `extensionPath` overrides the bundled sqlite-vec path (useful for custom builds
+  or non-standard install locations).
+
+### Local embedding auto-download
+
+- Default local embedding model: `hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf` (~0.6 GB).
+- When `memorySearch.provider = "local"`, `node-llama-cpp` resolves `modelPath`; if the GGUF is missing it **auto-downloads** to the cache (or `local.modelCacheDir` if set), then loads it. Downloads resume on retry.
+- Native build requirement: run `pnpm approve-builds`, pick `node-llama-cpp`, then `pnpm rebuild node-llama-cpp`.
+- Fallback: if local setup fails and `memorySearch.fallback = "openai"`, we automatically switch to remote embeddings (`openai/text-embedding-3-small` unless overridden) and record the reason.
+
+### Custom OpenAI-compatible endpoint example
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      provider: "openai",
+      model: "text-embedding-3-small",
+      remote: {
+        baseUrl: "https://api.example.com/v1/",
+        apiKey: "YOUR_REMOTE_API_KEY",
+        headers: {
+          "X-Organization": "org-id",
+          "X-Project": "project-id"
+        }
+      }
+    }
+  }
+}
+```
+
+Notes:
+
+- `remote.*` takes precedence over `models.providers.openai.*`.
+- `remote.headers` merge with OpenAI headers; remote wins on key conflicts. Omit `remote.headers` to use the OpenAI defaults.
--- a/openclaw/docs/concepts/messages.md
+++ b/openclaw/docs/concepts/messages.md
@@ -0,0 +1,154 @@
+---
+summary: "Message flow, sessions, queueing, and reasoning visibility"
+read_when:
+  - Explaining how inbound messages become replies
+  - Clarifying sessions, queueing modes, or streaming behavior
+  - Documenting reasoning visibility and usage implications
+title: "Messages"
+---
+
+# Messages
+
+This page ties together how OpenClaw handles inbound messages, sessions, queueing,
+streaming, and reasoning visibility.
+
+## Message flow (high level)
+
+```
+Inbound message
+  -> routing/bindings -> session key
+  -> queue (if a run is active)
+  -> agent run (streaming + tools)
+  -> outbound replies (channel limits + chunking)
+```
+
+Key knobs live in configuration:
+
+- `messages.*` for prefixes, queueing, and group behavior.
+- `agents.defaults.*` for block streaming and chunking defaults.
+- Channel overrides (`channels.whatsapp.*`, `channels.telegram.*`, etc.) for caps and streaming toggles.
+
+See [Configuration](/gateway/configuration) for full schema.
+
+## Inbound dedupe
+
+Channels can redeliver the same message after reconnects. OpenClaw keeps a
+short-lived cache keyed by channel/account/peer/session/message id so duplicate
+deliveries do not trigger another agent run.
+
+## Inbound debouncing
+
+Rapid consecutive messages from the **same sender** can be batched into a single
+agent turn via `messages.inbound`. Debouncing is scoped per channel + conversation
+and uses the most recent message for reply threading/IDs.
+
+Config (global default + per-channel overrides):
+
+```json5
+{
+  messages: {
+    inbound: {
+      debounceMs: 2000,
+      byChannel: {
+        whatsapp: 5000,
+        slack: 1500,
+        discord: 1500,
+      },
+    },
+  },
+}
+```
+
+Notes:
+
+- Debounce applies to **text-only** messages; media/attachments flush immediately.
+- Control commands bypass debouncing so they remain standalone.
+
+## Sessions and devices
+
+Sessions are owned by the gateway, not by clients.
+
+- Direct chats collapse into the agent main session key.
+- Groups/channels get their own session keys.
+- The session store and transcripts live on the gateway host.
+
+Multiple devices/channels can map to the same session, but history is not fully
+synced back to every client. Recommendation: use one primary device for long
+conversations to avoid divergent context. The Control UI and TUI always show the
+gateway-backed session transcript, so they are the source of truth.
+
+Details: [Session management](/concepts/session).
+
+## Inbound bodies and history context
+
+OpenClaw separates the **prompt body** from the **command body**:
+
+- `Body`: prompt text sent to the agent. This may include channel envelopes and
+  optional history wrappers.
+- `CommandBody`: raw user text for directive/command parsing.
+- `RawBody`: legacy alias for `CommandBody` (kept for compatibility).
+
+When a channel supplies history, it uses a shared wrapper:
+
+- `[Chat messages since your last reply - for context]`
+- `[Current message - respond to this]`
+
+For **non-direct chats** (groups/channels/rooms), the **current message body** is prefixed with the
+sender label (same style used for history entries). This keeps real-time and queued/history
+messages consistent in the agent prompt.
+
+History buffers are **pending-only**: they include group messages that did _not_
+trigger a run (for example, mention-gated messages) and **exclude** messages
+already in the session transcript.
+
+Directive stripping only applies to the **current message** section so history
+remains intact. Channels that wrap history should set `CommandBody` (or
+`RawBody`) to the original message text and keep `Body` as the combined prompt.
+History buffers are configurable via `messages.groupChat.historyLimit` (global
+default) and per-channel overrides like `channels.slack.historyLimit` or
+`channels.telegram.accounts.<id>.historyLimit` (set `0` to disable).
+
+## Queueing and followups
+
+If a run is already active, inbound messages can be queued, steered into the
+current run, or collected for a followup turn.
+
+- Configure via `messages.queue` (and `messages.queue.byChannel`).
+- Modes: `interrupt`, `steer`, `followup`, `collect`, plus backlog variants.
+
+Details: [Queueing](/concepts/queue).
+
+## Streaming, chunking, and batching
+
+Block streaming sends partial replies as the model produces text blocks.
+Chunking respects channel text limits and avoids splitting fenced code.
+
+Key settings:
+
+- `agents.defaults.blockStreamingDefault` (`on|off`, default off)
+- `agents.defaults.blockStreamingBreak` (`text_end|message_end`)
+- `agents.defaults.blockStreamingChunk` (`minChars|maxChars|breakPreference`)
+- `agents.defaults.blockStreamingCoalesce` (idle-based batching)
+- `agents.defaults.humanDelay` (human-like pause between block replies)
+- Channel overrides: `*.blockStreaming` and `*.blockStreamingCoalesce` (non-Telegram channels require explicit `*.blockStreaming: true`)
+
+Details: [Streaming + chunking](/concepts/streaming).
+
+## Reasoning visibility and tokens
+
+OpenClaw can expose or hide model reasoning:
+
+- `/reasoning on|off|stream` controls visibility.
+- Reasoning content still counts toward token usage when produced by the model.
+- Telegram supports reasoning stream into the draft bubble.
+
+Details: [Thinking + reasoning directives](/tools/thinking) and [Token use](/reference/token-use).
+
+## Prefixes, threading, and replies
+
+Outbound message formatting is centralized in `messages`:
+
+- `messages.responsePrefix`, `channels.<channel>.responsePrefix`, and `channels.<channel>.accounts.<id>.responsePrefix` (outbound prefix cascade), plus `channels.whatsapp.messagePrefix` (WhatsApp inbound prefix)
+- Reply threading via `replyToMode` and per-channel defaults
+
+Details: [Configuration](/gateway/configuration#messages) and channel docs.
--- a/openclaw/docs/concepts/model-failover.md
+++ b/openclaw/docs/concepts/model-failover.md
@@ -0,0 +1,149 @@
+---
+summary: "How OpenClaw rotates auth profiles and falls back across models"
+read_when:
+  - Diagnosing auth profile rotation, cooldowns, or model fallback behavior
+  - Updating failover rules for auth profiles or models
+title: "Model Failover"
+---
+
+# Model failover
+
+OpenClaw handles failures in two stages:
+
+1. **Auth profile rotation** within the current provider.
+2. **Model fallback** to the next model in `agents.defaults.model.fallbacks`.
+
+This doc explains the runtime rules and the data that backs them.
+
+## Auth storage (keys + OAuth)
+
+OpenClaw uses **auth profiles** for both API keys and OAuth tokens.
+
+- Secrets live in `~/.openclaw/agents/<agentId>/agent/auth-profiles.json` (legacy: `~/.openclaw/agent/auth-profiles.json`).
+- Config `auth.profiles` / `auth.order` are **metadata + routing only** (no secrets).
+- Legacy import-only OAuth file: `~/.openclaw/credentials/oauth.json` (imported into `auth-profiles.json` on first use).
+
+More detail: [/concepts/oauth](/concepts/oauth)
+
+Credential types:
+
+- `type: "api_key"` → `{ provider, key }`
+- `type: "oauth"` → `{ provider, access, refresh, expires, email? }` (+ `projectId`/`enterpriseUrl` for some providers)
+
+## Profile IDs
+
+OAuth logins create distinct profiles so multiple accounts can coexist.
+
+- Default: `provider:default` when no email is available.
+- OAuth with email: `provider:<email>` (for example `google-antigravity:user@gmail.com`).
+
+Profiles live in `~/.openclaw/agents/<agentId>/agent/auth-profiles.json` under `profiles`.
+
+## Rotation order
+
+When a provider has multiple profiles, OpenClaw chooses an order like this:
+
+1. **Explicit config**: `auth.order[provider]` (if set).
+2. **Configured profiles**: `auth.profiles` filtered by provider.
+3. **Stored profiles**: entries in `auth-profiles.json` for the provider.
+
+If no explicit order is configured, OpenClaw uses a round‑robin order:
+
+- **Primary key:** profile type (**OAuth before API keys**).
+- **Secondary key:** `usageStats.lastUsed` (oldest first, within each type).
+- **Cooldown/disabled profiles** are moved to the end, ordered by soonest expiry.
+
+### Session stickiness (cache-friendly)
+
+OpenClaw **pins the chosen auth profile per session** to keep provider caches warm.
+It does **not** rotate on every request. The pinned profile is reused until:
+
+- the session is reset (`/new` / `/reset`)
+- a compaction completes (compaction count increments)
+- the profile is in cooldown/disabled
+
+Manual selection via `/model …@<profileId>` sets a **user override** for that session
+and is not auto‑rotated until a new session starts.
+
+Auto‑pinned profiles (selected by the session router) are treated as a **preference**:
+they are tried first, but OpenClaw may rotate to another profile on rate limits/timeouts.
+User‑pinned profiles stay locked to that profile; if it fails and model fallbacks
+are configured, OpenClaw moves to the next model instead of switching profiles.
+
+### Why OAuth can “look lost”
+
+If you have both an OAuth profile and an API key profile for the same provider, round‑robin can switch between them across messages unless pinned. To force a single profile:
+
+- Pin with `auth.order[provider] = ["provider:profileId"]`, or
+- Use a per-session override via `/model …` with a profile override (when supported by your UI/chat surface).
+
+## Cooldowns
+
+When a profile fails due to auth/rate‑limit errors (or a timeout that looks
+like rate limiting), OpenClaw marks it in cooldown and moves to the next profile.
+Format/invalid‑request errors (for example Cloud Code Assist tool call ID
+validation failures) are treated as failover‑worthy and use the same cooldowns.
+
+Cooldowns use exponential backoff:
+
+- 1 minute
+- 5 minutes
+- 25 minutes
+- 1 hour (cap)
+
+State is stored in `auth-profiles.json` under `usageStats`:
+
+```json
+{
+  "usageStats": {
+    "provider:profile": {
+      "lastUsed": 1736160000000,
+      "cooldownUntil": 1736160600000,
+      "errorCount": 2
+    }
+  }
+}
+```
+
+## Billing disables
+
+Billing/credit failures (for example “insufficient credits” / “credit balance too low”) are treated as failover‑worthy, but they’re usually not transient. Instead of a short cooldown, OpenClaw marks the profile as **disabled** (with a longer backoff) and rotates to the next profile/provider.
+
+State is stored in `auth-profiles.json`:
+
+```json
+{
+  "usageStats": {
+    "provider:profile": {
+      "disabledUntil": 1736178000000,
+      "disabledReason": "billing"
+    }
+  }
+}
+```
+
+Defaults:
+
+- Billing backoff starts at **5 hours**, doubles per billing failure, and caps at **24 hours**.
+- Backoff counters reset if the profile hasn’t failed for **24 hours** (configurable).
+
+## Model fallback
+
+If all profiles for a provider fail, OpenClaw moves to the next model in
+`agents.defaults.model.fallbacks`. This applies to auth failures, rate limits, and
+timeouts that exhausted profile rotation (other errors do not advance fallback).
+
+When a run starts with a model override (hooks or CLI), fallbacks still end at
+`agents.defaults.model.primary` after trying any configured fallbacks.
+
+## Related config
+
+See [Gateway configuration](/gateway/configuration) for:
+
+- `auth.profiles` / `auth.order`
+- `auth.cooldowns.billingBackoffHours` / `auth.cooldowns.billingBackoffHoursByProvider`
+- `auth.cooldowns.billingMaxHours` / `auth.cooldowns.failureWindowHours`
+- `agents.defaults.model.primary` / `agents.defaults.model.fallbacks`
+- `agents.defaults.imageModel` routing
+
+See [Models](/concepts/models) for the broader model selection and fallback overview.
--- a/openclaw/docs/concepts/model-providers.md
+++ b/openclaw/docs/concepts/model-providers.md
@@ -0,0 +1,442 @@
+---
+summary: "Model provider overview with example configs + CLI flows"
+read_when:
+  - You need a provider-by-provider model setup reference
+  - You want example configs or CLI onboarding commands for model providers
+title: "Model Providers"
+---
+
+# Model providers
+
+This page covers **LLM/model providers** (not chat channels like WhatsApp/Telegram).
+For model selection rules, see [/concepts/models](/concepts/models).
+
+## Quick rules
+
+- Model refs use `provider/model` (example: `opencode/claude-opus-4-6`).
+- If you set `agents.defaults.models`, it becomes the allowlist.
+- CLI helpers: `openclaw onboard`, `openclaw models list`, `openclaw models set <provider/model>`.
+
+## API key rotation
+
+- Supports generic provider rotation for selected providers.
+- Configure multiple keys via:
+  - `OPENCLAW_LIVE_<PROVIDER>_KEY` (single live override, highest priority)
+  - `<PROVIDER>_API_KEYS` (comma or semicolon list)
+  - `<PROVIDER>_API_KEY` (primary key)
+  - `<PROVIDER>_API_KEY_*` (numbered list, e.g. `<PROVIDER>_API_KEY_1`)
+- For Google providers, `GOOGLE_API_KEY` is also included as fallback.
+- Key selection order preserves priority and deduplicates values.
+- Requests are retried with the next key only on rate-limit responses (for example `429`, `rate_limit`, `quota`, `resource exhausted`).
+- Non-rate-limit failures fail immediately; no key rotation is attempted.
+- When all candidate keys fail, the final error is returned from the last attempt.
+
+## Built-in providers (pi-ai catalog)
+
+OpenClaw ships with the pi‑ai catalog. These providers require **no**
+`models.providers` config; just set auth + pick a model.
+
+### OpenAI
+
+- Provider: `openai`
+- Auth: `OPENAI_API_KEY`
+- Optional rotation: `OPENAI_API_KEYS`, `OPENAI_API_KEY_1`, `OPENAI_API_KEY_2`, plus `OPENCLAW_LIVE_OPENAI_KEY` (single override)
+- Example model: `openai/gpt-5.1-codex`
+- CLI: `openclaw onboard --auth-choice openai-api-key`
+
+```json5
+{
+  agents: { defaults: { model: { primary: "openai/gpt-5.1-codex" } } },
+}
+```
+
+### Anthropic
+
+- Provider: `anthropic`
+- Auth: `ANTHROPIC_API_KEY` or `claude setup-token`
+- Optional rotation: `ANTHROPIC_API_KEYS`, `ANTHROPIC_API_KEY_1`, `ANTHROPIC_API_KEY_2`, plus `OPENCLAW_LIVE_ANTHROPIC_KEY` (single override)
+- Example model: `anthropic/claude-opus-4-6`
+- CLI: `openclaw onboard --auth-choice token` (paste setup-token) or `openclaw models auth paste-token --provider anthropic`
+
+```json5
+{
+  agents: { defaults: { model: { primary: "anthropic/claude-opus-4-6" } } },
+}
+```
+
+### OpenAI Code (Codex)
+
+- Provider: `openai-codex`
+- Auth: OAuth (ChatGPT)
+- Example model: `openai-codex/gpt-5.3-codex`
+- CLI: `openclaw onboard --auth-choice openai-codex` or `openclaw models auth login --provider openai-codex`
+- Default transport is `auto` (WebSocket-first, SSE fallback)
+- Override per model via `agents.defaults.models["openai-codex/<model>"].params.transport` (`"sse"`, `"websocket"`, or `"auto"`)
+
+```json5
+{
+  agents: { defaults: { model: { primary: "openai-codex/gpt-5.3-codex" } } },
+}
+```
+
+### OpenCode Zen
+
+- Provider: `opencode`
+- Auth: `OPENCODE_API_KEY` (or `OPENCODE_ZEN_API_KEY`)
+- Example model: `opencode/claude-opus-4-6`
+- CLI: `openclaw onboard --auth-choice opencode-zen`
+
+```json5
+{
+  agents: { defaults: { model: { primary: "opencode/claude-opus-4-6" } } },
+}
+```
+
+### Google Gemini (API key)
+
+- Provider: `google`
+- Auth: `GEMINI_API_KEY`
+- Optional rotation: `GEMINI_API_KEYS`, `GEMINI_API_KEY_1`, `GEMINI_API_KEY_2`, `GOOGLE_API_KEY` fallback, and `OPENCLAW_LIVE_GEMINI_KEY` (single override)
+- Example model: `google/gemini-3-pro-preview`
+- CLI: `openclaw onboard --auth-choice gemini-api-key`
+
+### Google Vertex, Antigravity, and Gemini CLI
+
+- Providers: `google-vertex`, `google-antigravity`, `google-gemini-cli`
+- Auth: Vertex uses gcloud ADC; Antigravity/Gemini CLI use their respective auth flows
+- Caution: Antigravity and Gemini CLI OAuth in OpenClaw are unofficial integrations. Some users have reported Google account restrictions after using third-party clients. Review Google terms and use a non-critical account if you choose to proceed.
+- Antigravity OAuth is shipped as a bundled plugin (`google-antigravity-auth`, disabled by default).
+  - Enable: `openclaw plugins enable google-antigravity-auth`
+  - Login: `openclaw models auth login --provider google-antigravity --set-default`
+- Gemini CLI OAuth is shipped as a bundled plugin (`google-gemini-cli-auth`, disabled by default).
+  - Enable: `openclaw plugins enable google-gemini-cli-auth`
+  - Login: `openclaw models auth login --provider google-gemini-cli --set-default`
+  - Note: you do **not** paste a client id or secret into `openclaw.json`. The CLI login flow stores
+    tokens in auth profiles on the gateway host.
+
+### Z.AI (GLM)
+
+- Provider: `zai`
+- Auth: `ZAI_API_KEY`
+- Example model: `zai/glm-4.7`
+- CLI: `openclaw onboard --auth-choice zai-api-key`
+  - Aliases: `z.ai/*` and `z-ai/*` normalize to `zai/*`
+
+### Vercel AI Gateway
+
+- Provider: `vercel-ai-gateway`
+- Auth: `AI_GATEWAY_API_KEY`
+- Example model: `vercel-ai-gateway/anthropic/claude-opus-4.6`
+- CLI: `openclaw onboard --auth-choice ai-gateway-api-key`
+
+### Kilo Gateway
+
+- Provider: `kilocode`
+- Auth: `KILOCODE_API_KEY`
+- Example model: `kilocode/anthropic/claude-opus-4.6`
+- CLI: `openclaw onboard --kilocode-api-key <key>`
+- Base URL: `https://api.kilo.ai/api/gateway/`
+- Expanded built-in catalog includes GLM-5 Free, MiniMax M2.5 Free, GPT-5.2, Gemini 3 Pro Preview, Gemini 3 Flash Preview, Grok Code Fast 1, and Kimi K2.5.
+
+See [/providers/kilocode](/providers/kilocode) for setup details.
+
+### Other built-in providers
+
+- OpenRouter: `openrouter` (`OPENROUTER_API_KEY`)
+- Example model: `openrouter/anthropic/claude-sonnet-4-5`
+- Kilo Gateway: `kilocode` (`KILOCODE_API_KEY`)
+- Example model: `kilocode/anthropic/claude-opus-4.6`
+- xAI: `xai` (`XAI_API_KEY`)
+- Mistral: `mistral` (`MISTRAL_API_KEY`)
+- Example model: `mistral/mistral-large-latest`
+- CLI: `openclaw onboard --auth-choice mistral-api-key`
+- Groq: `groq` (`GROQ_API_KEY`)
+- Cerebras: `cerebras` (`CEREBRAS_API_KEY`)
+  - GLM models on Cerebras use ids `zai-glm-4.7` and `zai-glm-4.6`.
+  - OpenAI-compatible base URL: `https://api.cerebras.ai/v1`.
+- GitHub Copilot: `github-copilot` (`COPILOT_GITHUB_TOKEN` / `GH_TOKEN` / `GITHUB_TOKEN`)
+- Hugging Face Inference: `huggingface` (`HUGGINGFACE_HUB_TOKEN` or `HF_TOKEN`) — OpenAI-compatible router; example model: `huggingface/deepseek-ai/DeepSeek-R1`; CLI: `openclaw onboard --auth-choice huggingface-api-key`. See [Hugging Face (Inference)](/providers/huggingface).
+
+## Providers via `models.providers` (custom/base URL)
+
+Use `models.providers` (or `models.json`) to add **custom** providers or
+OpenAI/Anthropic‑compatible proxies.
+
+### Moonshot AI (Kimi)
+
+Moonshot uses OpenAI-compatible endpoints, so configure it as a custom provider:
+
+- Provider: `moonshot`
+- Auth: `MOONSHOT_API_KEY`
+- Example model: `moonshot/kimi-k2.5`
+
+Kimi K2 model IDs:
+
+{/_moonshot-kimi-k2-model-refs:start_/ && null}
+
+- `moonshot/kimi-k2.5`
+- `moonshot/kimi-k2-0905-preview`
+- `moonshot/kimi-k2-turbo-preview`
+- `moonshot/kimi-k2-thinking`
+- `moonshot/kimi-k2-thinking-turbo`
+  {/_moonshot-kimi-k2-model-refs:end_/ && null}
+
+```json5
+{
+  agents: {
+    defaults: { model: { primary: "moonshot/kimi-k2.5" } },
+  },
+  models: {
+    mode: "merge",
+    providers: {
+      moonshot: {
+        baseUrl: "https://api.moonshot.ai/v1",
+        apiKey: "${MOONSHOT_API_KEY}",
+        api: "openai-completions",
+        models: [{ id: "kimi-k2.5", name: "Kimi K2.5" }],
+      },
+    },
+  },
+}
+```
+
+### Kimi Coding
+
+Kimi Coding uses Moonshot AI's Anthropic-compatible endpoint:
+
+- Provider: `kimi-coding`
+- Auth: `KIMI_API_KEY`
+- Example model: `kimi-coding/k2p5`
+
+```json5
+{
+  env: { KIMI_API_KEY: "sk-..." },
+  agents: {
+    defaults: { model: { primary: "kimi-coding/k2p5" } },
+  },
+}
+```
+
+### Qwen OAuth (free tier)
+
+Qwen provides OAuth access to Qwen Coder + Vision via a device-code flow.
+Enable the bundled plugin, then log in:
+
+```bash
+openclaw plugins enable qwen-portal-auth
+openclaw models auth login --provider qwen-portal --set-default
+```
+
+Model refs:
+
+- `qwen-portal/coder-model`
+- `qwen-portal/vision-model`
+
+See [/providers/qwen](/providers/qwen) for setup details and notes.
+
+### Volcano Engine (Doubao)
+
+Volcano Engine (火山引擎) provides access to Doubao and other models in China.
+
+- Provider: `volcengine` (coding: `volcengine-plan`)
+- Auth: `VOLCANO_ENGINE_API_KEY`
+- Example model: `volcengine/doubao-seed-1-8-251228`
+- CLI: `openclaw onboard --auth-choice volcengine-api-key`
+
+```json5
+{
+  agents: {
+    defaults: { model: { primary: "volcengine/doubao-seed-1-8-251228" } },
+  },
+}
+```
+
+Available models:
+
+- `volcengine/doubao-seed-1-8-251228` (Doubao Seed 1.8)
+- `volcengine/doubao-seed-code-preview-251028`
+- `volcengine/kimi-k2-5-260127` (Kimi K2.5)
+- `volcengine/glm-4-7-251222` (GLM 4.7)
+- `volcengine/deepseek-v3-2-251201` (DeepSeek V3.2 128K)
+
+Coding models (`volcengine-plan`):
+
+- `volcengine-plan/ark-code-latest`
+- `volcengine-plan/doubao-seed-code`
+- `volcengine-plan/kimi-k2.5`
+- `volcengine-plan/kimi-k2-thinking`
+- `volcengine-plan/glm-4.7`
+
+### BytePlus (International)
+
+BytePlus ARK provides access to the same models as Volcano Engine for international users.
+
+- Provider: `byteplus` (coding: `byteplus-plan`)
+- Auth: `BYTEPLUS_API_KEY`
+- Example model: `byteplus/seed-1-8-251228`
+- CLI: `openclaw onboard --auth-choice byteplus-api-key`
+
+```json5
+{
+  agents: {
+    defaults: { model: { primary: "byteplus/seed-1-8-251228" } },
+  },
+}
+```
+
+Available models:
+
+- `byteplus/seed-1-8-251228` (Seed 1.8)
+- `byteplus/kimi-k2-5-260127` (Kimi K2.5)
+- `byteplus/glm-4-7-251222` (GLM 4.7)
+
+Coding models (`byteplus-plan`):
+
+- `byteplus-plan/ark-code-latest`
+- `byteplus-plan/doubao-seed-code`
+- `byteplus-plan/kimi-k2.5`
+- `byteplus-plan/kimi-k2-thinking`
+- `byteplus-plan/glm-4.7`
+
+### Synthetic
+
+Synthetic provides Anthropic-compatible models behind the `synthetic` provider:
+
+- Provider: `synthetic`
+- Auth: `SYNTHETIC_API_KEY`
+- Example model: `synthetic/hf:MiniMaxAI/MiniMax-M2.1`
+- CLI: `openclaw onboard --auth-choice synthetic-api-key`
+
+```json5
+{
+  agents: {
+    defaults: { model: { primary: "synthetic/hf:MiniMaxAI/MiniMax-M2.1" } },
+  },
+  models: {
+    mode: "merge",
+    providers: {
+      synthetic: {
+        baseUrl: "https://api.synthetic.new/anthropic",
+        apiKey: "${SYNTHETIC_API_KEY}",
+        api: "anthropic-messages",
+        models: [{ id: "hf:MiniMaxAI/MiniMax-M2.1", name: "MiniMax M2.1" }],
+      },
+    },
+  },
+}
+```
+
+### MiniMax
+
+MiniMax is configured via `models.providers` because it uses custom endpoints:
+
+- MiniMax (Anthropic‑compatible): `--auth-choice minimax-api`
+- Auth: `MINIMAX_API_KEY`
+
+See [/providers/minimax](/providers/minimax) for setup details, model options, and config snippets.
+
+### Ollama
+
+Ollama is a local LLM runtime that provides an OpenAI-compatible API:
+
+- Provider: `ollama`
+- Auth: None required (local server)
+- Example model: `ollama/llama3.3`
+- Installation: [https://ollama.ai](https://ollama.ai)
+
+```bash
+# Install Ollama, then pull a model:
+ollama pull llama3.3
+```
+
+```json5
+{
+  agents: {
+    defaults: { model: { primary: "ollama/llama3.3" } },
+  },
+}
+```
+
+Ollama is automatically detected when running locally at `http://127.0.0.1:11434/v1`. See [/providers/ollama](/providers/ollama) for model recommendations and custom configuration.
+
+### vLLM
+
+vLLM is a local (or self-hosted) OpenAI-compatible server:
+
+- Provider: `vllm`
+- Auth: Optional (depends on your server)
+- Default base URL: `http://127.0.0.1:8000/v1`
+
+To opt in to auto-discovery locally (any value works if your server doesn’t enforce auth):
+
+```bash
+export VLLM_API_KEY="vllm-local"
+```
+
+Then set a model (replace with one of the IDs returned by `/v1/models`):
+
+```json5
+{
+  agents: {
+    defaults: { model: { primary: "vllm/your-model-id" } },
+  },
+}
+```
+
+See [/providers/vllm](/providers/vllm) for details.
+
+### Local proxies (LM Studio, vLLM, LiteLLM, etc.)
+
+Example (OpenAI‑compatible):
+
+```json5
+{
+  agents: {
+    defaults: {
+      model: { primary: "lmstudio/minimax-m2.1-gs32" },
+      models: { "lmstudio/minimax-m2.1-gs32": { alias: "Minimax" } },
+    },
+  },
+  models: {
+    providers: {
+      lmstudio: {
+        baseUrl: "http://localhost:1234/v1",
+        apiKey: "LMSTUDIO_KEY",
+        api: "openai-completions",
+        models: [
+          {
+            id: "minimax-m2.1-gs32",
+            name: "MiniMax M2.1",
+            reasoning: false,
+            input: ["text"],
+            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+            contextWindow: 200000,
+            maxTokens: 8192,
+          },
+        ],
+      },
+    },
+  },
+}
+```
+
+Notes:
+
+- For custom providers, `reasoning`, `input`, `cost`, `contextWindow`, and `maxTokens` are optional.
+  When omitted, OpenClaw defaults to:
+  - `reasoning: false`
+  - `input: ["text"]`
+  - `cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }`
+  - `contextWindow: 200000`
+  - `maxTokens: 8192`
+- Recommended: set explicit values that match your proxy/model limits.
+
+## CLI examples
+
+```bash
+openclaw onboard --auth-choice opencode-zen
+openclaw models set opencode/claude-opus-4-6
+openclaw models list
+```
+
+See also: [/gateway/configuration](/gateway/configuration) for full configuration examples.
--- a/openclaw/docs/concepts/models.md
+++ b/openclaw/docs/concepts/models.md
@@ -0,0 +1,215 @@
+---
+summary: "Models CLI: list, set, aliases, fallbacks, scan, status"
+read_when:
+  - Adding or modifying models CLI (models list/set/scan/aliases/fallbacks)
+  - Changing model fallback behavior or selection UX
+  - Updating model scan probes (tools/images)
+title: "Models CLI"
+---
+
+# Models CLI
+
+See [/concepts/model-failover](/concepts/model-failover) for auth profile
+rotation, cooldowns, and how that interacts with fallbacks.
+Quick provider overview + examples: [/concepts/model-providers](/concepts/model-providers).
+
+## How model selection works
+
+OpenClaw selects models in this order:
+
+1. **Primary** model (`agents.defaults.model.primary` or `agents.defaults.model`).
+2. **Fallbacks** in `agents.defaults.model.fallbacks` (in order).
+3. **Provider auth failover** happens inside a provider before moving to the
+   next model.
+
+Related:
+
+- `agents.defaults.models` is the allowlist/catalog of models OpenClaw can use (plus aliases).
+- `agents.defaults.imageModel` is used **only when** the primary model can’t accept images.
+- Per-agent defaults can override `agents.defaults.model` via `agents.list[].model` plus bindings (see [/concepts/multi-agent](/concepts/multi-agent)).
+
+## Quick model picks (anecdotal)
+
+- **GLM**: a bit better for coding/tool calling.
+- **MiniMax**: better for writing and vibes.
+
+## Setup wizard (recommended)
+
+If you don’t want to hand-edit config, run the onboarding wizard:
+
+```bash
+openclaw onboard
+```
+
+It can set up model + auth for common providers, including **OpenAI Code (Codex)
+subscription** (OAuth) and **Anthropic** (API key recommended; `claude
+setup-token` also supported).
+
+## Config keys (overview)
+
+- `agents.defaults.model.primary` and `agents.defaults.model.fallbacks`
+- `agents.defaults.imageModel.primary` and `agents.defaults.imageModel.fallbacks`
+- `agents.defaults.models` (allowlist + aliases + provider params)
+- `models.providers` (custom providers written into `models.json`)
+
+Model refs are normalized to lowercase. Provider aliases like `z.ai/*` normalize
+to `zai/*`.
+
+Provider configuration examples (including OpenCode Zen) live in
+[/gateway/configuration](/gateway/configuration#opencode-zen-multi-model-proxy).
+
+## “Model is not allowed” (and why replies stop)
+
+If `agents.defaults.models` is set, it becomes the **allowlist** for `/model` and for
+session overrides. When a user selects a model that isn’t in that allowlist,
+OpenClaw returns:
+
+```
+Model "provider/model" is not allowed. Use /model to list available models.
+```
+
+This happens **before** a normal reply is generated, so the message can feel
+like it “didn’t respond.” The fix is to either:
+
+- Add the model to `agents.defaults.models`, or
+- Clear the allowlist (remove `agents.defaults.models`), or
+- Pick a model from `/model list`.
+
+Example allowlist config:
+
+```json5
+{
+  agent: {
+    model: { primary: "anthropic/claude-sonnet-4-5" },
+    models: {
+      "anthropic/claude-sonnet-4-5": { alias: "Sonnet" },
+      "anthropic/claude-opus-4-6": { alias: "Opus" },
+    },
+  },
+}
+```
+
+## Switching models in chat (`/model`)
+
+You can switch models for the current session without restarting:
+
+```
+/model
+/model list
+/model 3
+/model openai/gpt-5.2
+/model status
+```
+
+Notes:
+
+- `/model` (and `/model list`) is a compact, numbered picker (model family + available providers).
+- On Discord, `/model` and `/models` open an interactive picker with provider and model dropdowns plus a Submit step.
+- `/model <#>` selects from that picker.
+- `/model status` is the detailed view (auth candidates and, when configured, provider endpoint `baseUrl` + `api` mode).
+- Model refs are parsed by splitting on the **first** `/`. Use `provider/model` when typing `/model <ref>`.
+- If the model ID itself contains `/` (OpenRouter-style), you must include the provider prefix (example: `/model openrouter/moonshotai/kimi-k2`).
+- If you omit the provider, OpenClaw treats the input as an alias or a model for the **default provider** (only works when there is no `/` in the model ID).
+
+Full command behavior/config: [Slash commands](/tools/slash-commands).
+
+## CLI commands
+
+```bash
+openclaw models list
+openclaw models status
+openclaw models set <provider/model>
+openclaw models set-image <provider/model>
+
+openclaw models aliases list
+openclaw models aliases add <alias> <provider/model>
+openclaw models aliases remove <alias>
+
+openclaw models fallbacks list
+openclaw models fallbacks add <provider/model>
+openclaw models fallbacks remove <provider/model>
+openclaw models fallbacks clear
+
+openclaw models image-fallbacks list
+openclaw models image-fallbacks add <provider/model>
+openclaw models image-fallbacks remove <provider/model>
+openclaw models image-fallbacks clear
+```
+
+`openclaw models` (no subcommand) is a shortcut for `models status`.
+
+### `models list`
+
+Shows configured models by default. Useful flags:
+
+- `--all`: full catalog
+- `--local`: local providers only
+- `--provider <name>`: filter by provider
+- `--plain`: one model per line
+- `--json`: machine‑readable output
+
+### `models status`
+
+Shows the resolved primary model, fallbacks, image model, and an auth overview
+of configured providers. It also surfaces OAuth expiry status for profiles found
+in the auth store (warns within 24h by default). `--plain` prints only the
+resolved primary model.
+OAuth status is always shown (and included in `--json` output). If a configured
+provider has no credentials, `models status` prints a **Missing auth** section.
+JSON includes `auth.oauth` (warn window + profiles) and `auth.providers`
+(effective auth per provider).
+Use `--check` for automation (exit `1` when missing/expired, `2` when expiring).
+
+Preferred Anthropic auth is the Claude Code CLI setup-token (run anywhere; paste on the gateway host if needed):
+
+```bash
+claude setup-token
+openclaw models status
+```
+
+## Scanning (OpenRouter free models)
+
+`openclaw models scan` inspects OpenRouter’s **free model catalog** and can
+optionally probe models for tool and image support.
+
+Key flags:
+
+- `--no-probe`: skip live probes (metadata only)
+- `--min-params <b>`: minimum parameter size (billions)
+- `--max-age-days <days>`: skip older models
+- `--provider <name>`: provider prefix filter
+- `--max-candidates <n>`: fallback list size
+- `--set-default`: set `agents.defaults.model.primary` to the first selection
+- `--set-image`: set `agents.defaults.imageModel.primary` to the first image selection
+
+Probing requires an OpenRouter API key (from auth profiles or
+`OPENROUTER_API_KEY`). Without a key, use `--no-probe` to list candidates only.
+
+Scan results are ranked by:
+
+1. Image support
+2. Tool latency
+3. Context size
+4. Parameter count
+
+Input
+
+- OpenRouter `/models` list (filter `:free`)
+- Requires OpenRouter API key from auth profiles or `OPENROUTER_API_KEY` (see [/environment](/help/environment))
+- Optional filters: `--max-age-days`, `--min-params`, `--provider`, `--max-candidates`
+- Probe controls: `--timeout`, `--concurrency`
+
+When run in a TTY, you can select fallbacks interactively. In non‑interactive
+mode, pass `--yes` to accept defaults.
+
+## Models registry (`models.json`)
+
+Custom providers in `models.providers` are written into `models.json` under the
+agent directory (default `~/.openclaw/agents/<agentId>/models.json`). This file
+is merged by default unless `models.mode` is set to `replace`.
+
+Merge mode precedence for matching provider IDs:
+
+- Non-empty `apiKey`/`baseUrl` already present in the agent `models.json` win.
+- Empty or missing agent `apiKey`/`baseUrl` fall back to config `models.providers`.
+- Other provider fields are refreshed from config and normalized catalog data.
--- a/openclaw/docs/concepts/multi-agent.md
+++ b/openclaw/docs/concepts/multi-agent.md
@@ -0,0 +1,542 @@
+---
+summary: "Multi-agent routing: isolated agents, channel accounts, and bindings"
+title: Multi-Agent Routing
+read_when: "You want multiple isolated agents (workspaces + auth) in one gateway process."
+status: active
+---
+
+# Multi-Agent Routing
+
+Goal: multiple _isolated_ agents (separate workspace + `agentDir` + sessions), plus multiple channel accounts (e.g. two WhatsApps) in one running Gateway. Inbound is routed to an agent via bindings.
+
+## What is “one agent”?
+
+An **agent** is a fully scoped brain with its own:
+
+- **Workspace** (files, AGENTS.md/SOUL.md/USER.md, local notes, persona rules).
+- **State directory** (`agentDir`) for auth profiles, model registry, and per-agent config.
+- **Session store** (chat history + routing state) under `~/.openclaw/agents/<agentId>/sessions`.
+
+Auth profiles are **per-agent**. Each agent reads from its own:
+
+```text
+~/.openclaw/agents/<agentId>/agent/auth-profiles.json
+```
+
+Main agent credentials are **not** shared automatically. Never reuse `agentDir`
+across agents (it causes auth/session collisions). If you want to share creds,
+copy `auth-profiles.json` into the other agent's `agentDir`.
+
+Skills are per-agent via each workspace’s `skills/` folder, with shared skills
+available from `~/.openclaw/skills`. See [Skills: per-agent vs shared](/tools/skills#per-agent-vs-shared-skills).
+
+The Gateway can host **one agent** (default) or **many agents** side-by-side.
+
+**Workspace note:** each agent’s workspace is the **default cwd**, not a hard
+sandbox. Relative paths resolve inside the workspace, but absolute paths can
+reach other host locations unless sandboxing is enabled. See
+[Sandboxing](/gateway/sandboxing).
+
+## Paths (quick map)
+
+- Config: `~/.openclaw/openclaw.json` (or `OPENCLAW_CONFIG_PATH`)
+- State dir: `~/.openclaw` (or `OPENCLAW_STATE_DIR`)
+- Workspace: `~/.openclaw/workspace` (or `~/.openclaw/workspace-<agentId>`)
+- Agent dir: `~/.openclaw/agents/<agentId>/agent` (or `agents.list[].agentDir`)
+- Sessions: `~/.openclaw/agents/<agentId>/sessions`
+
+### Single-agent mode (default)
+
+If you do nothing, OpenClaw runs a single agent:
+
+- `agentId` defaults to **`main`**.
+- Sessions are keyed as `agent:main:<mainKey>`.
+- Workspace defaults to `~/.openclaw/workspace` (or `~/.openclaw/workspace-<profile>` when `OPENCLAW_PROFILE` is set).
+- State defaults to `~/.openclaw/agents/main/agent`.
+
+## Agent helper
+
+Use the agent wizard to add a new isolated agent:
+
+```bash
+openclaw agents add work
+```
+
+Then add `bindings` (or let the wizard do it) to route inbound messages.
+
+Verify with:
+
+```bash
+openclaw agents list --bindings
+```
+
+## Quick start
+
+<Steps>
+  <Step title="Create each agent workspace">
+
+Use the wizard or create workspaces manually:
+
+```bash
+openclaw agents add coding
+openclaw agents add social
+```
+
+Each agent gets its own workspace with `SOUL.md`, `AGENTS.md`, and optional `USER.md`, plus a dedicated `agentDir` and session store under `~/.openclaw/agents/<agentId>`.
+
+  </Step>
+
+  <Step title="Create channel accounts">
+
+Create one account per agent on your preferred channels:
+
+- Discord: one bot per agent, enable Message Content Intent, copy each token.
+- Telegram: one bot per agent via BotFather, copy each token.
+- WhatsApp: link each phone number per account.
+
+```bash
+openclaw channels login --channel whatsapp --account work
+```
+
+See channel guides: [Discord](/channels/discord), [Telegram](/channels/telegram), [WhatsApp](/channels/whatsapp).
+
+  </Step>
+
+  <Step title="Add agents, accounts, and bindings">
+
+Add agents under `agents.list`, channel accounts under `channels.<channel>.accounts`, and connect them with `bindings` (examples below).
+
+  </Step>
+
+  <Step title="Restart and verify">
+
+```bash
+openclaw gateway restart
+openclaw agents list --bindings
+openclaw channels status --probe
+```
+
+  </Step>
+</Steps>
+
+## Multiple agents = multiple people, multiple personalities
+
+With **multiple agents**, each `agentId` becomes a **fully isolated persona**:
+
+- **Different phone numbers/accounts** (per channel `accountId`).
+- **Different personalities** (per-agent workspace files like `AGENTS.md` and `SOUL.md`).
+- **Separate auth + sessions** (no cross-talk unless explicitly enabled).
+
+This lets **multiple people** share one Gateway server while keeping their AI “brains” and data isolated.
+
+## One WhatsApp number, multiple people (DM split)
+
+You can route **different WhatsApp DMs** to different agents while staying on **one WhatsApp account**. Match on sender E.164 (like `+15551234567`) with `peer.kind: "direct"`. Replies still come from the same WhatsApp number (no per‑agent sender identity).
+
+Important detail: direct chats collapse to the agent’s **main session key**, so true isolation requires **one agent per person**.
+
+Example:
+
+```json5
+{
+  agents: {
+    list: [
+      { id: "alex", workspace: "~/.openclaw/workspace-alex" },
+      { id: "mia", workspace: "~/.openclaw/workspace-mia" },
+    ],
+  },
+  bindings: [
+    {
+      agentId: "alex",
+      match: { channel: "whatsapp", peer: { kind: "direct", id: "+15551230001" } },
+    },
+    {
+      agentId: "mia",
+      match: { channel: "whatsapp", peer: { kind: "direct", id: "+15551230002" } },
+    },
+  ],
+  channels: {
+    whatsapp: {
+      dmPolicy: "allowlist",
+      allowFrom: ["+15551230001", "+15551230002"],
+    },
+  },
+}
+```
+
+Notes:
+
+- DM access control is **global per WhatsApp account** (pairing/allowlist), not per agent.
+- For shared groups, bind the group to one agent or use [Broadcast groups](/channels/broadcast-groups).
+
+## Routing rules (how messages pick an agent)
+
+Bindings are **deterministic** and **most-specific wins**:
+
+1. `peer` match (exact DM/group/channel id)
+2. `parentPeer` match (thread inheritance)
+3. `guildId + roles` (Discord role routing)
+4. `guildId` (Discord)
+5. `teamId` (Slack)
+6. `accountId` match for a channel
+7. channel-level match (`accountId: "*"`)
+8. fallback to default agent (`agents.list[].default`, else first list entry, default: `main`)
+
+If multiple bindings match in the same tier, the first one in config order wins.
+If a binding sets multiple match fields (for example `peer` + `guildId`), all specified fields are required (`AND` semantics).
+
+Important account-scope detail:
+
+- A binding that omits `accountId` matches the default account only.
+- Use `accountId: "*"` for a channel-wide fallback across all accounts.
+- If you later add the same binding for the same agent with an explicit account id, OpenClaw upgrades the existing channel-only binding to account-scoped instead of duplicating it.
+
+## Multiple accounts / phone numbers
+
+Channels that support **multiple accounts** (e.g. WhatsApp) use `accountId` to identify
+each login. Each `accountId` can be routed to a different agent, so one server can host
+multiple phone numbers without mixing sessions.
+
+## Concepts
+
+- `agentId`: one “brain” (workspace, per-agent auth, per-agent session store).
+- `accountId`: one channel account instance (e.g. WhatsApp account `"personal"` vs `"biz"`).
+- `binding`: routes inbound messages to an `agentId` by `(channel, accountId, peer)` and optionally guild/team ids.
+- Direct chats collapse to `agent:<agentId>:<mainKey>` (per-agent “main”; `session.mainKey`).
+
+## Platform examples
+
+### Discord bots per agent
+
+Each Discord bot account maps to a unique `accountId`. Bind each account to an agent and keep allowlists per bot.
+
+```json5
+{
+  agents: {
+    list: [
+      { id: "main", workspace: "~/.openclaw/workspace-main" },
+      { id: "coding", workspace: "~/.openclaw/workspace-coding" },
+    ],
+  },
+  bindings: [
+    { agentId: "main", match: { channel: "discord", accountId: "default" } },
+    { agentId: "coding", match: { channel: "discord", accountId: "coding" } },
+  ],
+  channels: {
+    discord: {
+      groupPolicy: "allowlist",
+      accounts: {
+        default: {
+          token: "DISCORD_BOT_TOKEN_MAIN",
+          guilds: {
+            "123456789012345678": {
+              channels: {
+                "222222222222222222": { allow: true, requireMention: false },
+              },
+            },
+          },
+        },
+        coding: {
+          token: "DISCORD_BOT_TOKEN_CODING",
+          guilds: {
+            "123456789012345678": {
+              channels: {
+                "333333333333333333": { allow: true, requireMention: false },
+              },
+            },
+          },
+        },
+      },
+    },
+  },
+}
+```
+
+Notes:
+
+- Invite each bot to the guild and enable Message Content Intent.
+- Tokens live in `channels.discord.accounts.<id>.token` (default account can use `DISCORD_BOT_TOKEN`).
+
+### Telegram bots per agent
+
+```json5
+{
+  agents: {
+    list: [
+      { id: "main", workspace: "~/.openclaw/workspace-main" },
+      { id: "alerts", workspace: "~/.openclaw/workspace-alerts" },
+    ],
+  },
+  bindings: [
+    { agentId: "main", match: { channel: "telegram", accountId: "default" } },
+    { agentId: "alerts", match: { channel: "telegram", accountId: "alerts" } },
+  ],
+  channels: {
+    telegram: {
+      accounts: {
+        default: {
+          botToken: "123456:ABC...",
+          dmPolicy: "pairing",
+        },
+        alerts: {
+          botToken: "987654:XYZ...",
+          dmPolicy: "allowlist",
+          allowFrom: ["tg:123456789"],
+        },
+      },
+    },
+  },
+}
+```
+
+Notes:
+
+- Create one bot per agent with BotFather and copy each token.
+- Tokens live in `channels.telegram.accounts.<id>.botToken` (default account can use `TELEGRAM_BOT_TOKEN`).
+
+### WhatsApp numbers per agent
+
+Link each account before starting the gateway:
+
+```bash
+openclaw channels login --channel whatsapp --account personal
+openclaw channels login --channel whatsapp --account biz
+```
+
+`~/.openclaw/openclaw.json` (JSON5):
+
+```js
+{
+  agents: {
+    list: [
+      {
+        id: "home",
+        default: true,
+        name: "Home",
+        workspace: "~/.openclaw/workspace-home",
+        agentDir: "~/.openclaw/agents/home/agent",
+      },
+      {
+        id: "work",
+        name: "Work",
+        workspace: "~/.openclaw/workspace-work",
+        agentDir: "~/.openclaw/agents/work/agent",
+      },
+    ],
+  },
+
+  // Deterministic routing: first match wins (most-specific first).
+  bindings: [
+    { agentId: "home", match: { channel: "whatsapp", accountId: "personal" } },
+    { agentId: "work", match: { channel: "whatsapp", accountId: "biz" } },
+
+    // Optional per-peer override (example: send a specific group to work agent).
+    {
+      agentId: "work",
+      match: {
+        channel: "whatsapp",
+        accountId: "personal",
+        peer: { kind: "group", id: "1203630...@g.us" },
+      },
+    },
+  ],
+
+  // Off by default: agent-to-agent messaging must be explicitly enabled + allowlisted.
+  tools: {
+    agentToAgent: {
+      enabled: false,
+      allow: ["home", "work"],
+    },
+  },
+
+  channels: {
+    whatsapp: {
+      accounts: {
+        personal: {
+          // Optional override. Default: ~/.openclaw/credentials/whatsapp/personal
+          // authDir: "~/.openclaw/credentials/whatsapp/personal",
+        },
+        biz: {
+          // Optional override. Default: ~/.openclaw/credentials/whatsapp/biz
+          // authDir: "~/.openclaw/credentials/whatsapp/biz",
+        },
+      },
+    },
+  },
+}
+```
+
+## Example: WhatsApp daily chat + Telegram deep work
+
+Split by channel: route WhatsApp to a fast everyday agent and Telegram to an Opus agent.
+
+```json5
+{
+  agents: {
+    list: [
+      {
+        id: "chat",
+        name: "Everyday",
+        workspace: "~/.openclaw/workspace-chat",
+        model: "anthropic/claude-sonnet-4-5",
+      },
+      {
+        id: "opus",
+        name: "Deep Work",
+        workspace: "~/.openclaw/workspace-opus",
+        model: "anthropic/claude-opus-4-6",
+      },
+    ],
+  },
+  bindings: [
+    { agentId: "chat", match: { channel: "whatsapp" } },
+    { agentId: "opus", match: { channel: "telegram" } },
+  ],
+}
+```
+
+Notes:
+
+- If you have multiple accounts for a channel, add `accountId` to the binding (for example `{ channel: "whatsapp", accountId: "personal" }`).
+- To route a single DM/group to Opus while keeping the rest on chat, add a `match.peer` binding for that peer; peer matches always win over channel-wide rules.
+
+## Example: same channel, one peer to Opus
+
+Keep WhatsApp on the fast agent, but route one DM to Opus:
+
+```json5
+{
+  agents: {
+    list: [
+      {
+        id: "chat",
+        name: "Everyday",
+        workspace: "~/.openclaw/workspace-chat",
+        model: "anthropic/claude-sonnet-4-5",
+      },
+      {
+        id: "opus",
+        name: "Deep Work",
+        workspace: "~/.openclaw/workspace-opus",
+        model: "anthropic/claude-opus-4-6",
+      },
+    ],
+  },
+  bindings: [
+    {
+      agentId: "opus",
+      match: { channel: "whatsapp", peer: { kind: "direct", id: "+15551234567" } },
+    },
+    { agentId: "chat", match: { channel: "whatsapp" } },
+  ],
+}
+```
+
+Peer bindings always win, so keep them above the channel-wide rule.
+
+## Family agent bound to a WhatsApp group
+
+Bind a dedicated family agent to a single WhatsApp group, with mention gating
+and a tighter tool policy:
+
+```json5
+{
+  agents: {
+    list: [
+      {
+        id: "family",
+        name: "Family",
+        workspace: "~/.openclaw/workspace-family",
+        identity: { name: "Family Bot" },
+        groupChat: {
+          mentionPatterns: ["@family", "@familybot", "@Family Bot"],
+        },
+        sandbox: {
+          mode: "all",
+          scope: "agent",
+        },
+        tools: {
+          allow: [
+            "exec",
+            "read",
+            "sessions_list",
+            "sessions_history",
+            "sessions_send",
+            "sessions_spawn",
+            "session_status",
+          ],
+          deny: ["write", "edit", "apply_patch", "browser", "canvas", "nodes", "cron"],
+        },
+      },
+    ],
+  },
+  bindings: [
+    {
+      agentId: "family",
+      match: {
+        channel: "whatsapp",
+        peer: { kind: "group", id: "120363999999999999@g.us" },
+      },
+    },
+  ],
+}
+```
+
+Notes:
+
+- Tool allow/deny lists are **tools**, not skills. If a skill needs to run a
+  binary, ensure `exec` is allowed and the binary exists in the sandbox.
+- For stricter gating, set `agents.list[].groupChat.mentionPatterns` and keep
+  group allowlists enabled for the channel.
+
+## Per-Agent Sandbox and Tool Configuration
+
+Starting with v2026.1.6, each agent can have its own sandbox and tool restrictions:
+
+```js
+{
+  agents: {
+    list: [
+      {
+        id: "personal",
+        workspace: "~/.openclaw/workspace-personal",
+        sandbox: {
+          mode: "off",  // No sandbox for personal agent
+        },
+        // No tool restrictions - all tools available
+      },
+      {
+        id: "family",
+        workspace: "~/.openclaw/workspace-family",
+        sandbox: {
+          mode: "all",     // Always sandboxed
+          scope: "agent",  // One container per agent
+          docker: {
+            // Optional one-time setup after container creation
+            setupCommand: "apt-get update && apt-get install -y git curl",
+          },
+        },
+        tools: {
+          allow: ["read"],                    // Only read tool
+          deny: ["exec", "write", "edit", "apply_patch"],    // Deny others
+        },
+      },
+    ],
+  },
+}
+```
+
+Note: `setupCommand` lives under `sandbox.docker` and runs once on container creation.
+Per-agent `sandbox.docker.*` overrides are ignored when the resolved scope is `"shared"`.
+
+**Benefits:**
+
+- **Security isolation**: Restrict tools for untrusted agents
+- **Resource control**: Sandbox specific agents while keeping others on host
+- **Flexible policies**: Different permissions per agent
+
+Note: `tools.elevated` is **global** and sender-based; it is not configurable per agent.
+If you need per-agent boundaries, use `agents.list[].tools` to deny `exec`.
+For group targeting, use `agents.list[].groupChat.mentionPatterns` so @mentions map cleanly to the intended agent.
+
+See [Multi-Agent Sandbox & Tools](/tools/multi-agent-sandbox-tools) for detailed examples.
--- a/openclaw/docs/concepts/oauth.md
+++ b/openclaw/docs/concepts/oauth.md
@@ -0,0 +1,148 @@
+---
+summary: "OAuth in OpenClaw: token exchange, storage, and multi-account patterns"
+read_when:
+  - You want to understand OpenClaw OAuth end-to-end
+  - You hit token invalidation / logout issues
+  - You want setup-token or OAuth auth flows
+  - You want multiple accounts or profile routing
+title: "OAuth"
+---
+
+# OAuth
+
+OpenClaw supports “subscription auth” via OAuth for providers that offer it (notably **OpenAI Codex (ChatGPT OAuth)**). For Anthropic subscriptions, use the **setup-token** flow. This page explains:
+
+- how the OAuth **token exchange** works (PKCE)
+- where tokens are **stored** (and why)
+- how to handle **multiple accounts** (profiles + per-session overrides)
+
+OpenClaw also supports **provider plugins** that ship their own OAuth or API‑key
+flows. Run them via:
+
+```bash
+openclaw models auth login --provider <id>
+```
+
+## The token sink (why it exists)
+
+OAuth providers commonly mint a **new refresh token** during login/refresh flows. Some providers (or OAuth clients) can invalidate older refresh tokens when a new one is issued for the same user/app.
+
+Practical symptom:
+
+- you log in via OpenClaw _and_ via Claude Code / Codex CLI → one of them randomly gets “logged out” later
+
+To reduce that, OpenClaw treats `auth-profiles.json` as a **token sink**:
+
+- the runtime reads credentials from **one place**
+- we can keep multiple profiles and route them deterministically
+
+## Storage (where tokens live)
+
+Secrets are stored **per-agent**:
+
+- Auth profiles (OAuth + API keys + optional value-level refs): `~/.openclaw/agents/<agentId>/agent/auth-profiles.json`
+- Legacy compatibility file: `~/.openclaw/agents/<agentId>/agent/auth.json`
+  (static `api_key` entries are scrubbed when discovered)
+
+Legacy import-only file (still supported, but not the main store):
+
+- `~/.openclaw/credentials/oauth.json` (imported into `auth-profiles.json` on first use)
+
+All of the above also respect `$OPENCLAW_STATE_DIR` (state dir override). Full reference: [/gateway/configuration](/gateway/configuration#auth-storage-oauth--api-keys)
+
+For static secret refs and runtime snapshot activation behavior, see [Secrets Management](/gateway/secrets).
+
+## Anthropic setup-token (subscription auth)
+
+Run `claude setup-token` on any machine, then paste it into OpenClaw:
+
+```bash
+openclaw models auth setup-token --provider anthropic
+```
+
+If you generated the token elsewhere, paste it manually:
+
+```bash
+openclaw models auth paste-token --provider anthropic
+```
+
+Verify:
+
+```bash
+openclaw models status
+```
+
+## OAuth exchange (how login works)
+
+OpenClaw’s interactive login flows are implemented in `@mariozechner/pi-ai` and wired into the wizards/commands.
+
+### Anthropic (Claude Pro/Max) setup-token
+
+Flow shape:
+
+1. run `claude setup-token`
+2. paste the token into OpenClaw
+3. store as a token auth profile (no refresh)
+
+The wizard path is `openclaw onboard` → auth choice `setup-token` (Anthropic).
+
+### OpenAI Codex (ChatGPT OAuth)
+
+Flow shape (PKCE):
+
+1. generate PKCE verifier/challenge + random `state`
+2. open `https://auth.openai.com/oauth/authorize?...`
+3. try to capture callback on `http://127.0.0.1:1455/auth/callback`
+4. if callback can’t bind (or you’re remote/headless), paste the redirect URL/code
+5. exchange at `https://auth.openai.com/oauth/token`
+6. extract `accountId` from the access token and store `{ access, refresh, expires, accountId }`
+
+Wizard path is `openclaw onboard` → auth choice `openai-codex`.
+
+## Refresh + expiry
+
+Profiles store an `expires` timestamp.
+
+At runtime:
+
+- if `expires` is in the future → use the stored access token
+- if expired → refresh (under a file lock) and overwrite the stored credentials
+
+The refresh flow is automatic; you generally don't need to manage tokens manually.
+
+## Multiple accounts (profiles) + routing
+
+Two patterns:
+
+### 1) Preferred: separate agents
+
+If you want “personal” and “work” to never interact, use isolated agents (separate sessions + credentials + workspace):
+
+```bash
+openclaw agents add work
+openclaw agents add personal
+```
+
+Then configure auth per-agent (wizard) and route chats to the right agent.
+
+### 2) Advanced: multiple profiles in one agent
+
+`auth-profiles.json` supports multiple profile IDs for the same provider.
+
+Pick which profile is used:
+
+- globally via config ordering (`auth.order`)
+- per-session via `/model ...@<profileId>`
+
+Example (session override):
+
+- `/model Opus@anthropic:work`
+
+How to see what profile IDs exist:
+
+- `openclaw channels list --json` (shows `auth[]`)
+
+Related docs:
+
+- [/concepts/model-failover](/concepts/model-failover) (rotation + cooldown rules)
+- [/tools/slash-commands](/tools/slash-commands) (command surface)
--- a/openclaw/docs/concepts/presence.md
+++ b/openclaw/docs/concepts/presence.md
@@ -0,0 +1,102 @@
+---
+summary: "How OpenClaw presence entries are produced, merged, and displayed"
+read_when:
+  - Debugging the Instances tab
+  - Investigating duplicate or stale instance rows
+  - Changing gateway WS connect or system-event beacons
+title: "Presence"
+---
+
+# Presence
+
+OpenClaw “presence” is a lightweight, best‑effort view of:
+
+- the **Gateway** itself, and
+- **clients connected to the Gateway** (mac app, WebChat, CLI, etc.)
+
+Presence is used primarily to render the macOS app’s **Instances** tab and to
+provide quick operator visibility.
+
+## Presence fields (what shows up)
+
+Presence entries are structured objects with fields like:
+
+- `instanceId` (optional but strongly recommended): stable client identity (usually `connect.client.instanceId`)
+- `host`: human‑friendly host name
+- `ip`: best‑effort IP address
+- `version`: client version string
+- `deviceFamily` / `modelIdentifier`: hardware hints
+- `mode`: `ui`, `webchat`, `cli`, `backend`, `probe`, `test`, `node`, ...
+- `lastInputSeconds`: “seconds since last user input” (if known)
+- `reason`: `self`, `connect`, `node-connected`, `periodic`, ...
+- `ts`: last update timestamp (ms since epoch)
+
+## Producers (where presence comes from)
+
+Presence entries are produced by multiple sources and **merged**.
+
+### 1) Gateway self entry
+
+The Gateway always seeds a “self” entry at startup so UIs show the gateway host
+even before any clients connect.
+
+### 2) WebSocket connect
+
+Every WS client begins with a `connect` request. On successful handshake the
+Gateway upserts a presence entry for that connection.
+
+#### Why one‑off CLI commands don’t show up
+
+The CLI often connects for short, one‑off commands. To avoid spamming the
+Instances list, `client.mode === "cli"` is **not** turned into a presence entry.
+
+### 3) `system-event` beacons
+
+Clients can send richer periodic beacons via the `system-event` method. The mac
+app uses this to report host name, IP, and `lastInputSeconds`.
+
+### 4) Node connects (role: node)
+
+When a node connects over the Gateway WebSocket with `role: node`, the Gateway
+upserts a presence entry for that node (same flow as other WS clients).
+
+## Merge + dedupe rules (why `instanceId` matters)
+
+Presence entries are stored in a single in‑memory map:
+
+- Entries are keyed by a **presence key**.
+- The best key is a stable `instanceId` (from `connect.client.instanceId`) that survives restarts.
+- Keys are case‑insensitive.
+
+If a client reconnects without a stable `instanceId`, it may show up as a
+**duplicate** row.
+
+## TTL and bounded size
+
+Presence is intentionally ephemeral:
+
+- **TTL:** entries older than 5 minutes are pruned
+- **Max entries:** 200 (oldest dropped first)
+
+This keeps the list fresh and avoids unbounded memory growth.
+
+## Remote/tunnel caveat (loopback IPs)
+
+When a client connects over an SSH tunnel / local port forward, the Gateway may
+see the remote address as `127.0.0.1`. To avoid overwriting a good client‑reported
+IP, loopback remote addresses are ignored.
+
+## Consumers
+
+### macOS Instances tab
+
+The macOS app renders the output of `system-presence` and applies a small status
+indicator (Active/Idle/Stale) based on the age of the last update.
+
+## Debugging tips
+
+- To see the raw list, call `system-presence` against the Gateway.
+- If you see duplicates:
+  - confirm clients send a stable `client.instanceId` in the handshake
+  - confirm periodic beacons use the same `instanceId`
+  - check whether the connection‑derived entry is missing `instanceId` (duplicates are expected)
--- a/openclaw/docs/concepts/queue.md
+++ b/openclaw/docs/concepts/queue.md
@@ -0,0 +1,89 @@
+---
+summary: "Command queue design that serializes inbound auto-reply runs"
+read_when:
+  - Changing auto-reply execution or concurrency
+title: "Command Queue"
+---
+
+# Command Queue (2026-01-16)
+
+We serialize inbound auto-reply runs (all channels) through a tiny in-process queue to prevent multiple agent runs from colliding, while still allowing safe parallelism across sessions.
+
+## Why
+
+- Auto-reply runs can be expensive (LLM calls) and can collide when multiple inbound messages arrive close together.
+- Serializing avoids competing for shared resources (session files, logs, CLI stdin) and reduces the chance of upstream rate limits.
+
+## How it works
+
+- A lane-aware FIFO queue drains each lane with a configurable concurrency cap (default 1 for unconfigured lanes; main defaults to 4, subagent to 8).
+- `runEmbeddedPiAgent` enqueues by **session key** (lane `session:<key>`) to guarantee only one active run per session.
+- Each session run is then queued into a **global lane** (`main` by default) so overall parallelism is capped by `agents.defaults.maxConcurrent`.
+- When verbose logging is enabled, queued runs emit a short notice if they waited more than ~2s before starting.
+- Typing indicators still fire immediately on enqueue (when supported by the channel) so user experience is unchanged while we wait our turn.
+
+## Queue modes (per channel)
+
+Inbound messages can steer the current run, wait for a followup turn, or do both:
+
+- `steer`: inject immediately into the current run (cancels pending tool calls after the next tool boundary). If not streaming, falls back to followup.
+- `followup`: enqueue for the next agent turn after the current run ends.
+- `collect`: coalesce all queued messages into a **single** followup turn (default). If messages target different channels/threads, they drain individually to preserve routing.
+- `steer-backlog` (aka `steer+backlog`): steer now **and** preserve the message for a followup turn.
+- `interrupt` (legacy): abort the active run for that session, then run the newest message.
+- `queue` (legacy alias): same as `steer`.
+
+Steer-backlog means you can get a followup response after the steered run, so
+streaming surfaces can look like duplicates. Prefer `collect`/`steer` if you want
+one response per inbound message.
+Send `/queue collect` as a standalone command (per-session) or set `messages.queue.byChannel.discord: "collect"`.
+
+Defaults (when unset in config):
+
+- All surfaces → `collect`
+
+Configure globally or per channel via `messages.queue`:
+
+```json5
+{
+  messages: {
+    queue: {
+      mode: "collect",
+      debounceMs: 1000,
+      cap: 20,
+      drop: "summarize",
+      byChannel: { discord: "collect" },
+    },
+  },
+}
+```
+
+## Queue options
+
+Options apply to `followup`, `collect`, and `steer-backlog` (and to `steer` when it falls back to followup):
+
+- `debounceMs`: wait for quiet before starting a followup turn (prevents “continue, continue”).
+- `cap`: max queued messages per session.
+- `drop`: overflow policy (`old`, `new`, `summarize`).
+
+Summarize keeps a short bullet list of dropped messages and injects it as a synthetic followup prompt.
+Defaults: `debounceMs: 1000`, `cap: 20`, `drop: summarize`.
+
+## Per-session overrides
+
+- Send `/queue <mode>` as a standalone command to store the mode for the current session.
+- Options can be combined: `/queue collect debounce:2s cap:25 drop:summarize`
+- `/queue default` or `/queue reset` clears the session override.
+
+## Scope and guarantees
+
+- Applies to auto-reply agent runs across all inbound channels that use the gateway reply pipeline (WhatsApp web, Telegram, Slack, Discord, Signal, iMessage, webchat, etc.).
+- Default lane (`main`) is process-wide for inbound + main heartbeats; set `agents.defaults.maxConcurrent` to allow multiple sessions in parallel.
+- Additional lanes may exist (e.g. `cron`, `subagent`) so background jobs can run in parallel without blocking inbound replies.
+- Per-session lanes guarantee that only one agent run touches a given session at a time.
+- No external dependencies or background worker threads; pure TypeScript + promises.
+
+## Troubleshooting
+
+- If commands seem stuck, enable verbose logs and look for “queued for …ms” lines to confirm the queue is draining.
+- If you need queue depth, enable verbose logs and watch for queue timing lines.
--- a/openclaw/docs/concepts/retry.md
+++ b/openclaw/docs/concepts/retry.md
@@ -0,0 +1,69 @@
+---
+summary: "Retry policy for outbound provider calls"
+read_when:
+  - Updating provider retry behavior or defaults
+  - Debugging provider send errors or rate limits
+title: "Retry Policy"
+---
+
+# Retry policy
+
+## Goals
+
+- Retry per HTTP request, not per multi-step flow.
+- Preserve ordering by retrying only the current step.
+- Avoid duplicating non-idempotent operations.
+
+## Defaults
+
+- Attempts: 3
+- Max delay cap: 30000 ms
+- Jitter: 0.1 (10 percent)
+- Provider defaults:
+  - Telegram min delay: 400 ms
+  - Discord min delay: 500 ms
+
+## Behavior
+
+### Discord
+
+- Retries only on rate-limit errors (HTTP 429).
+- Uses Discord `retry_after` when available, otherwise exponential backoff.
+
+### Telegram
+
+- Retries on transient errors (429, timeout, connect/reset/closed, temporarily unavailable).
+- Uses `retry_after` when available, otherwise exponential backoff.
+- Markdown parse errors are not retried; they fall back to plain text.
+
+## Configuration
+
+Set retry policy per provider in `~/.openclaw/openclaw.json`:
+
+```json5
+{
+  channels: {
+    telegram: {
+      retry: {
+        attempts: 3,
+        minDelayMs: 400,
+        maxDelayMs: 30000,
+        jitter: 0.1,
+      },
+    },
+    discord: {
+      retry: {
+        attempts: 3,
+        minDelayMs: 500,
+        maxDelayMs: 30000,
+        jitter: 0.1,
+      },
+    },
+  },
+}
+```
+
+## Notes
+
+- Retries apply per request (message send, media upload, reaction, poll, sticker).
+- Composite flows do not retry completed steps.
--- a/openclaw/docs/concepts/session-pruning.md
+++ b/openclaw/docs/concepts/session-pruning.md
@@ -0,0 +1,121 @@
+---
+title: "Session Pruning"
+summary: "Session pruning: tool-result trimming to reduce context bloat"
+read_when:
+  - You want to reduce LLM context growth from tool outputs
+  - You are tuning agents.defaults.contextPruning
+---
+
+# Session Pruning
+
+Session pruning trims **old tool results** from the in-memory context right before each LLM call. It does **not** rewrite the on-disk session history (`*.jsonl`).
+
+## When it runs
+
+- When `mode: "cache-ttl"` is enabled and the last Anthropic call for the session is older than `ttl`.
+- Only affects the messages sent to the model for that request.
+- Only active for Anthropic API calls (and OpenRouter Anthropic models).
+- For best results, match `ttl` to your model `cacheRetention` policy (`short` = 5m, `long` = 1h).
+- After a prune, the TTL window resets so subsequent requests keep cache until `ttl` expires again.
+
+## Smart defaults (Anthropic)
+
+- **OAuth or setup-token** profiles: enable `cache-ttl` pruning and set heartbeat to `1h`.
+- **API key** profiles: enable `cache-ttl` pruning, set heartbeat to `30m`, and default `cacheRetention: "short"` on Anthropic models.
+- If you set any of these values explicitly, OpenClaw does **not** override them.
+
+## What this improves (cost + cache behavior)
+
+- **Why prune:** Anthropic prompt caching only applies within the TTL. If a session goes idle past the TTL, the next request re-caches the full prompt unless you trim it first.
+- **What gets cheaper:** pruning reduces the **cacheWrite** size for that first request after the TTL expires.
+- **Why the TTL reset matters:** once pruning runs, the cache window resets, so follow‑up requests can reuse the freshly cached prompt instead of re-caching the full history again.
+- **What it does not do:** pruning doesn’t add tokens or “double” costs; it only changes what gets cached on that first post‑TTL request.
+
+## What can be pruned
+
+- Only `toolResult` messages.
+- User + assistant messages are **never** modified.
+- The last `keepLastAssistants` assistant messages are protected; tool results after that cutoff are not pruned.
+- If there aren’t enough assistant messages to establish the cutoff, pruning is skipped.
+- Tool results containing **image blocks** are skipped (never trimmed/cleared).
+
+## Context window estimation
+
+Pruning uses an estimated context window (chars ≈ tokens × 4). The base window is resolved in this order:
+
+1. `models.providers.*.models[].contextWindow` override.
+2. Model definition `contextWindow` (from the model registry).
+3. Default `200000` tokens.
+
+If `agents.defaults.contextTokens` is set, it is treated as a cap (min) on the resolved window.
+
+## Mode
+
+### cache-ttl
+
+- Pruning only runs if the last Anthropic call is older than `ttl` (default `5m`).
+- When it runs: same soft-trim + hard-clear behavior as before.
+
+## Soft vs hard pruning
+
+- **Soft-trim**: only for oversized tool results.
+  - Keeps head + tail, inserts `...`, and appends a note with the original size.
+  - Skips results with image blocks.
+- **Hard-clear**: replaces the entire tool result with `hardClear.placeholder`.
+
+## Tool selection
+
+- `tools.allow` / `tools.deny` support `*` wildcards.
+- Deny wins.
+- Matching is case-insensitive.
+- Empty allow list => all tools allowed.
+
+## Interaction with other limits
+
+- Built-in tools already truncate their own output; session pruning is an extra layer that prevents long-running chats from accumulating too much tool output in the model context.
+- Compaction is separate: compaction summarizes and persists, pruning is transient per request. See [/concepts/compaction](/concepts/compaction).
+
+## Defaults (when enabled)
+
+- `ttl`: `"5m"`
+- `keepLastAssistants`: `3`
+- `softTrimRatio`: `0.3`
+- `hardClearRatio`: `0.5`
+- `minPrunableToolChars`: `50000`
+- `softTrim`: `{ maxChars: 4000, headChars: 1500, tailChars: 1500 }`
+- `hardClear`: `{ enabled: true, placeholder: "[Old tool result content cleared]" }`
+
+## Examples
+
+Default (off):
+
+```json5
+{
+  agents: { defaults: { contextPruning: { mode: "off" } } },
+}
+```
+
+Enable TTL-aware pruning:
+
+```json5
+{
+  agents: { defaults: { contextPruning: { mode: "cache-ttl", ttl: "5m" } } },
+}
+```
+
+Restrict pruning to specific tools:
+
+```json5
+{
+  agents: {
+    defaults: {
+      contextPruning: {
+        mode: "cache-ttl",
+        tools: { allow: ["exec", "read"], deny: ["*image*"] },
+      },
+    },
+  },
+}
+```
+
+See config reference: [Gateway Configuration](/gateway/configuration)
--- a/openclaw/docs/concepts/session-tool.md
+++ b/openclaw/docs/concepts/session-tool.md
@@ -0,0 +1,219 @@
+---
+summary: "Agent session tools for listing sessions, fetching history, and sending cross-session messages"
+read_when:
+  - Adding or modifying session tools
+title: "Session Tools"
+---
+
+# Session Tools
+
+Goal: small, hard-to-misuse tool set so agents can list sessions, fetch history, and send to another session.
+
+## Tool Names
+
+- `sessions_list`
+- `sessions_history`
+- `sessions_send`
+- `sessions_spawn`
+
+## Key Model
+
+- Main direct chat bucket is always the literal key `"main"` (resolved to the current agent’s main key).
+- Group chats use `agent:<agentId>:<channel>:group:<id>` or `agent:<agentId>:<channel>:channel:<id>` (pass the full key).
+- Cron jobs use `cron:<job.id>`.
+- Hooks use `hook:<uuid>` unless explicitly set.
+- Node sessions use `node-<nodeId>` unless explicitly set.
+
+`global` and `unknown` are reserved values and are never listed. If `session.scope = "global"`, we alias it to `main` for all tools so callers never see `global`.
+
+## sessions_list
+
+List sessions as an array of rows.
+
+Parameters:
+
+- `kinds?: string[]` filter: any of `"main" | "group" | "cron" | "hook" | "node" | "other"`
+- `limit?: number` max rows (default: server default, clamp e.g. 200)
+- `activeMinutes?: number` only sessions updated within N minutes
+- `messageLimit?: number` 0 = no messages (default 0); >0 = include last N messages
+
+Behavior:
+
+- `messageLimit > 0` fetches `chat.history` per session and includes the last N messages.
+- Tool results are filtered out in list output; use `sessions_history` for tool messages.
+- When running in a **sandboxed** agent session, session tools default to **spawned-only visibility** (see below).
+
+Row shape (JSON):
+
+- `key`: session key (string)
+- `kind`: `main | group | cron | hook | node | other`
+- `channel`: `whatsapp | telegram | discord | signal | imessage | webchat | internal | unknown`
+- `displayName` (group display label if available)
+- `updatedAt` (ms)
+- `sessionId`
+- `model`, `contextTokens`, `totalTokens`
+- `thinkingLevel`, `verboseLevel`, `systemSent`, `abortedLastRun`
+- `sendPolicy` (session override if set)
+- `lastChannel`, `lastTo`
+- `deliveryContext` (normalized `{ channel, to, accountId }` when available)
+- `transcriptPath` (best-effort path derived from store dir + sessionId)
+- `messages?` (only when `messageLimit > 0`)
+
+## sessions_history
+
+Fetch transcript for one session.
+
+Parameters:
+
+- `sessionKey` (required; accepts session key or `sessionId` from `sessions_list`)
+- `limit?: number` max messages (server clamps)
+- `includeTools?: boolean` (default false)
+
+Behavior:
+
+- `includeTools=false` filters `role: "toolResult"` messages.
+- Returns messages array in the raw transcript format.
+- When given a `sessionId`, OpenClaw resolves it to the corresponding session key (missing ids error).
+
+## sessions_send
+
+Send a message into another session.
+
+Parameters:
+
+- `sessionKey` (required; accepts session key or `sessionId` from `sessions_list`)
+- `message` (required)
+- `timeoutSeconds?: number` (default >0; 0 = fire-and-forget)
+
+Behavior:
+
+- `timeoutSeconds = 0`: enqueue and return `{ runId, status: "accepted" }`.
+- `timeoutSeconds > 0`: wait up to N seconds for completion, then return `{ runId, status: "ok", reply }`.
+- If wait times out: `{ runId, status: "timeout", error }`. Run continues; call `sessions_history` later.
+- If the run fails: `{ runId, status: "error", error }`.
+- Announce delivery runs after the primary run completes and is best-effort; `status: "ok"` does not guarantee the announce was delivered.
+- Waits via gateway `agent.wait` (server-side) so reconnects don't drop the wait.
+- Agent-to-agent message context is injected for the primary run.
+- Inter-session messages are persisted with `message.provenance.kind = "inter_session"` so transcript readers can distinguish routed agent instructions from external user input.
+- After the primary run completes, OpenClaw runs a **reply-back loop**:
+  - Round 2+ alternates between requester and target agents.
+  - Reply exactly `REPLY_SKIP` to stop the ping‑pong.
+  - Max turns is `session.agentToAgent.maxPingPongTurns` (0–5, default 5).
+- Once the loop ends, OpenClaw runs the **agent‑to‑agent announce step** (target agent only):
+  - Reply exactly `ANNOUNCE_SKIP` to stay silent.
+  - Any other reply is sent to the target channel.
+  - Announce step includes the original request + round‑1 reply + latest ping‑pong reply.
+
+## Channel Field
+
+- For groups, `channel` is the channel recorded on the session entry.
+- For direct chats, `channel` maps from `lastChannel`.
+- For cron/hook/node, `channel` is `internal`.
+- If missing, `channel` is `unknown`.
+
+## Security / Send Policy
+
+Policy-based blocking by channel/chat type (not per session id).
+
+```json
+{
+  "session": {
+    "sendPolicy": {
+      "rules": [
+        {
+          "match": { "channel": "discord", "chatType": "group" },
+          "action": "deny"
+        }
+      ],
+      "default": "allow"
+    }
+  }
+}
+```
+
+Runtime override (per session entry):
+
+- `sendPolicy: "allow" | "deny"` (unset = inherit config)
+- Settable via `sessions.patch` or owner-only `/send on|off|inherit` (standalone message).
+
+Enforcement points:
+
+- `chat.send` / `agent` (gateway)
+- auto-reply delivery logic
+
+## sessions_spawn
+
+Spawn a sub-agent run in an isolated session and announce the result back to the requester chat channel.
+
+Parameters:
+
+- `task` (required)
+- `label?` (optional; used for logs/UI)
+- `agentId?` (optional; spawn under another agent id if allowed)
+- `model?` (optional; overrides the sub-agent model; invalid values error)
+- `thinking?` (optional; overrides thinking level for the sub-agent run)
+- `runTimeoutSeconds?` (defaults to `agents.defaults.subagents.runTimeoutSeconds` when set, otherwise `0`; when set, aborts the sub-agent run after N seconds)
+- `thread?` (default false; request thread-bound routing for this spawn when supported by the channel/plugin)
+- `mode?` (`run|session`; defaults to `run`, but defaults to `session` when `thread=true`; `mode="session"` requires `thread=true`)
+- `cleanup?` (`delete|keep`, default `keep`)
+
+Allowlist:
+
+- `agents.list[].subagents.allowAgents`: list of agent ids allowed via `agentId` (`["*"]` to allow any). Default: only the requester agent.
+
+Discovery:
+
+- Use `agents_list` to discover which agent ids are allowed for `sessions_spawn`.
+
+Behavior:
+
+- Starts a new `agent:<agentId>:subagent:<uuid>` session with `deliver: false`.
+- Sub-agents default to the full tool set **minus session tools** (configurable via `tools.subagents.tools`).
+- Sub-agents are not allowed to call `sessions_spawn` (no sub-agent → sub-agent spawning).
+- Always non-blocking: returns `{ status: "accepted", runId, childSessionKey }` immediately.
+- With `thread=true`, channel plugins can bind delivery/routing to a thread target (Discord support is controlled by `session.threadBindings.*` and `channels.discord.threadBindings.*`).
+- After completion, OpenClaw runs a sub-agent **announce step** and posts the result to the requester chat channel.
+  - If the assistant final reply is empty, the latest `toolResult` from sub-agent history is included as `Result`.
+- Reply exactly `ANNOUNCE_SKIP` during the announce step to stay silent.
+- Announce replies are normalized to `Status`/`Result`/`Notes`; `Status` comes from runtime outcome (not model text).
+- Sub-agent sessions are auto-archived after `agents.defaults.subagents.archiveAfterMinutes` (default: 60).
+- Announce replies include a stats line (runtime, tokens, sessionKey/sessionId, transcript path, and optional cost).
+
+## Sandbox Session Visibility
+
+Session tools can be scoped to reduce cross-session access.
+
+Default behavior:
+
+- `tools.sessions.visibility` defaults to `tree` (current session + spawned subagent sessions).
+- For sandboxed sessions, `agents.defaults.sandbox.sessionToolsVisibility` can hard-clamp visibility.
+
+Config:
+
+```json5
+{
+  tools: {
+    sessions: {
+      // "self" | "tree" | "agent" | "all"
+      // default: "tree"
+      visibility: "tree",
+    },
+  },
+  agents: {
+    defaults: {
+      sandbox: {
+        // default: "spawned"
+        sessionToolsVisibility: "spawned", // or "all"
+      },
+    },
+  },
+}
+```
+
+Notes:
+
+- `self`: only the current session key.
+- `tree`: current session + sessions spawned by the current session.
+- `agent`: any session belonging to the current agent id.
+- `all`: any session (cross-agent access still requires `tools.agentToAgent`).
+- When a session is sandboxed and `sessionToolsVisibility="spawned"`, OpenClaw clamps visibility to `tree` even if you set `tools.sessions.visibility="all"`.
--- a/openclaw/docs/concepts/session.md
+++ b/openclaw/docs/concepts/session.md
@@ -0,0 +1,310 @@
+---
+summary: "Session management rules, keys, and persistence for chats"
+read_when:
+  - Modifying session handling or storage
+title: "Session Management"
+---
+
+# Session Management
+
+OpenClaw treats **one direct-chat session per agent** as primary. Direct chats collapse to `agent:<agentId>:<mainKey>` (default `main`), while group/channel chats get their own keys. `session.mainKey` is honored.
+
+Use `session.dmScope` to control how **direct messages** are grouped:
+
+- `main` (default): all DMs share the main session for continuity.
+- `per-peer`: isolate by sender id across channels.
+- `per-channel-peer`: isolate by channel + sender (recommended for multi-user inboxes).
+- `per-account-channel-peer`: isolate by account + channel + sender (recommended for multi-account inboxes).
+  Use `session.identityLinks` to map provider-prefixed peer ids to a canonical identity so the same person shares a DM session across channels when using `per-peer`, `per-channel-peer`, or `per-account-channel-peer`.
+
+## Secure DM mode (recommended for multi-user setups)
+
+> **Security Warning:** If your agent can receive DMs from **multiple people**, you should strongly consider enabling secure DM mode. Without it, all users share the same conversation context, which can leak private information between users.
+
+**Example of the problem with default settings:**
+
+- Alice (`<SENDER_A>`) messages your agent about a private topic (for example, a medical appointment)
+- Bob (`<SENDER_B>`) messages your agent asking "What were we talking about?"
+- Because both DMs share the same session, the model may answer Bob using Alice's prior context.
+
+**The fix:** Set `dmScope` to isolate sessions per user:
+
+```json5
+// ~/.openclaw/openclaw.json
+{
+  session: {
+    // Secure DM mode: isolate DM context per channel + sender.
+    dmScope: "per-channel-peer",
+  },
+}
+```
+
+**When to enable this:**
+
+- You have pairing approvals for more than one sender
+- You use a DM allowlist with multiple entries
+- You set `dmPolicy: "open"`
+- Multiple phone numbers or accounts can message your agent
+
+Notes:
+
+- Default is `dmScope: "main"` for continuity (all DMs share the main session). This is fine for single-user setups.
+- Local CLI onboarding writes `session.dmScope: "per-channel-peer"` by default when unset (existing explicit values are preserved).
+- For multi-account inboxes on the same channel, prefer `per-account-channel-peer`.
+- If the same person contacts you on multiple channels, use `session.identityLinks` to collapse their DM sessions into one canonical identity.
+- You can verify your DM settings with `openclaw security audit` (see [security](/cli/security)).
+
+## Gateway is the source of truth
+
+All session state is **owned by the gateway** (the “master” OpenClaw). UI clients (macOS app, WebChat, etc.) must query the gateway for session lists and token counts instead of reading local files.
+
+- In **remote mode**, the session store you care about lives on the remote gateway host, not your Mac.
+- Token counts shown in UIs come from the gateway’s store fields (`inputTokens`, `outputTokens`, `totalTokens`, `contextTokens`). Clients do not parse JSONL transcripts to “fix up” totals.
+
+## Where state lives
+
+- On the **gateway host**:
+  - Store file: `~/.openclaw/agents/<agentId>/sessions/sessions.json` (per agent).
+- Transcripts: `~/.openclaw/agents/<agentId>/sessions/<SessionId>.jsonl` (Telegram topic sessions use `.../<SessionId>-topic-<threadId>.jsonl`).
+- The store is a map `sessionKey -> { sessionId, updatedAt, ... }`. Deleting entries is safe; they are recreated on demand.
+- Group entries may include `displayName`, `channel`, `subject`, `room`, and `space` to label sessions in UIs.
+- Session entries include `origin` metadata (label + routing hints) so UIs can explain where a session came from.
+- OpenClaw does **not** read legacy Pi/Tau session folders.
+
+## Maintenance
+
+OpenClaw applies session-store maintenance to keep `sessions.json` and transcript artifacts bounded over time.
+
+### Defaults
+
+- `session.maintenance.mode`: `warn`
+- `session.maintenance.pruneAfter`: `30d`
+- `session.maintenance.maxEntries`: `500`
+- `session.maintenance.rotateBytes`: `10mb`
+- `session.maintenance.resetArchiveRetention`: defaults to `pruneAfter` (`30d`)
+- `session.maintenance.maxDiskBytes`: unset (disabled)
+- `session.maintenance.highWaterBytes`: defaults to `80%` of `maxDiskBytes` when budgeting is enabled
+
+### How it works
+
+Maintenance runs during session-store writes, and you can trigger it on demand with `openclaw sessions cleanup`.
+
+- `mode: "warn"`: reports what would be evicted but does not mutate entries/transcripts.
+- `mode: "enforce"`: applies cleanup in this order:
+  1. prune stale entries older than `pruneAfter`
+  2. cap entry count to `maxEntries` (oldest first)
+  3. archive transcript files for removed entries that are no longer referenced
+  4. purge old `*.deleted.<timestamp>` and `*.reset.<timestamp>` archives by retention policy
+  5. rotate `sessions.json` when it exceeds `rotateBytes`
+  6. if `maxDiskBytes` is set, enforce disk budget toward `highWaterBytes` (oldest artifacts first, then oldest sessions)
+
+### Performance caveat for large stores
+
+Large session stores are common in high-volume setups. Maintenance work is write-path work, so very large stores can increase write latency.
+
+What increases cost most:
+
+- very high `session.maintenance.maxEntries` values
+- long `pruneAfter` windows that keep stale entries around
+- many transcript/archive artifacts in `~/.openclaw/agents/<agentId>/sessions/`
+- enabling disk budgets (`maxDiskBytes`) without reasonable pruning/cap limits
+
+What to do:
+
+- use `mode: "enforce"` in production so growth is bounded automatically
+- set both time and count limits (`pruneAfter` + `maxEntries`), not just one
+- set `maxDiskBytes` + `highWaterBytes` for hard upper bounds in large deployments
+- keep `highWaterBytes` meaningfully below `maxDiskBytes` (default is 80%)
+- run `openclaw sessions cleanup --dry-run --json` after config changes to verify projected impact before enforcing
+- for frequent active sessions, pass `--active-key` when running manual cleanup
+
+### Customize examples
+
+Use a conservative enforce policy:
+
+```json5
+{
+  session: {
+    maintenance: {
+      mode: "enforce",
+      pruneAfter: "45d",
+      maxEntries: 800,
+      rotateBytes: "20mb",
+      resetArchiveRetention: "14d",
+    },
+  },
+}
+```
+
+Enable a hard disk budget for the sessions directory:
+
+```json5
+{
+  session: {
+    maintenance: {
+      mode: "enforce",
+      maxDiskBytes: "1gb",
+      highWaterBytes: "800mb",
+    },
+  },
+}
+```
+
+Tune for larger installs (example):
+
+```json5
+{
+  session: {
+    maintenance: {
+      mode: "enforce",
+      pruneAfter: "14d",
+      maxEntries: 2000,
+      rotateBytes: "25mb",
+      maxDiskBytes: "2gb",
+      highWaterBytes: "1.6gb",
+    },
+  },
+}
+```
+
+Preview or force maintenance from CLI:
+
+```bash
+openclaw sessions cleanup --dry-run
+openclaw sessions cleanup --enforce
+```
+
+## Session pruning
+
+OpenClaw trims **old tool results** from the in-memory context right before LLM calls by default.
+This does **not** rewrite JSONL history. See [/concepts/session-pruning](/concepts/session-pruning).
+
+## Pre-compaction memory flush
+
+When a session nears auto-compaction, OpenClaw can run a **silent memory flush**
+turn that reminds the model to write durable notes to disk. This only runs when
+the workspace is writable. See [Memory](/concepts/memory) and
+[Compaction](/concepts/compaction).
+
+## Mapping transports → session keys
+
+- Direct chats follow `session.dmScope` (default `main`).
+  - `main`: `agent:<agentId>:<mainKey>` (continuity across devices/channels).
+    - Multiple phone numbers and channels can map to the same agent main key; they act as transports into one conversation.
+  - `per-peer`: `agent:<agentId>:dm:<peerId>`.
+  - `per-channel-peer`: `agent:<agentId>:<channel>:dm:<peerId>`.
+  - `per-account-channel-peer`: `agent:<agentId>:<channel>:<accountId>:dm:<peerId>` (accountId defaults to `default`).
+  - If `session.identityLinks` matches a provider-prefixed peer id (for example `telegram:123`), the canonical key replaces `<peerId>` so the same person shares a session across channels.
+- Group chats isolate state: `agent:<agentId>:<channel>:group:<id>` (rooms/channels use `agent:<agentId>:<channel>:channel:<id>`).
+  - Telegram forum topics append `:topic:<threadId>` to the group id for isolation.
+  - Legacy `group:<id>` keys are still recognized for migration.
+- Inbound contexts may still use `group:<id>`; the channel is inferred from `Provider` and normalized to the canonical `agent:<agentId>:<channel>:group:<id>` form.
+- Other sources:
+  - Cron jobs: `cron:<job.id>`
+  - Webhooks: `hook:<uuid>` (unless explicitly set by the hook)
+  - Node runs: `node-<nodeId>`
+
+## Lifecycle
+
+- Reset policy: sessions are reused until they expire, and expiry is evaluated on the next inbound message.
+- Daily reset: defaults to **4:00 AM local time on the gateway host**. A session is stale once its last update is earlier than the most recent daily reset time.
+- Idle reset (optional): `idleMinutes` adds a sliding idle window. When both daily and idle resets are configured, **whichever expires first** forces a new session.
+- Legacy idle-only: if you set `session.idleMinutes` without any `session.reset`/`resetByType` config, OpenClaw stays in idle-only mode for backward compatibility.
+- Per-type overrides (optional): `resetByType` lets you override the policy for `direct`, `group`, and `thread` sessions (thread = Slack/Discord threads, Telegram topics, Matrix threads when provided by the connector).
+- Per-channel overrides (optional): `resetByChannel` overrides the reset policy for a channel (applies to all session types for that channel and takes precedence over `reset`/`resetByType`).
+- Reset triggers: exact `/new` or `/reset` (plus any extras in `resetTriggers`) start a fresh session id and pass the remainder of the message through. `/new <model>` accepts a model alias, `provider/model`, or provider name (fuzzy match) to set the new session model. If `/new` or `/reset` is sent alone, OpenClaw runs a short “hello” greeting turn to confirm the reset.
+- Manual reset: delete specific keys from the store or remove the JSONL transcript; the next message recreates them.
+- Isolated cron jobs always mint a fresh `sessionId` per run (no idle reuse).
+
+## Send policy (optional)
+
+Block delivery for specific session types without listing individual ids.
+
+```json5
+{
+  session: {
+    sendPolicy: {
+      rules: [
+        { action: "deny", match: { channel: "discord", chatType: "group" } },
+        { action: "deny", match: { keyPrefix: "cron:" } },
+        // Match the raw session key (including the `agent:<id>:` prefix).
+        { action: "deny", match: { rawKeyPrefix: "agent:main:discord:" } },
+      ],
+      default: "allow",
+    },
+  },
+}
+```
+
+Runtime override (owner only):
+
+- `/send on` → allow for this session
+- `/send off` → deny for this session
+- `/send inherit` → clear override and use config rules
+  Send these as standalone messages so they register.
+
+## Configuration (optional rename example)
+
+```json5
+// ~/.openclaw/openclaw.json
+{
+  session: {
+    scope: "per-sender", // keep group keys separate
+    dmScope: "main", // DM continuity (set per-channel-peer/per-account-channel-peer for shared inboxes)
+    identityLinks: {
+      alice: ["telegram:123456789", "discord:987654321012345678"],
+    },
+    reset: {
+      // Defaults: mode=daily, atHour=4 (gateway host local time).
+      // If you also set idleMinutes, whichever expires first wins.
+      mode: "daily",
+      atHour: 4,
+      idleMinutes: 120,
+    },
+    resetByType: {
+      thread: { mode: "daily", atHour: 4 },
+      direct: { mode: "idle", idleMinutes: 240 },
+      group: { mode: "idle", idleMinutes: 120 },
+    },
+    resetByChannel: {
+      discord: { mode: "idle", idleMinutes: 10080 },
+    },
+    resetTriggers: ["/new", "/reset"],
+    store: "~/.openclaw/agents/{agentId}/sessions/sessions.json",
+    mainKey: "main",
+  },
+}
+```
+
+## Inspecting
+
+- `openclaw status` — shows store path and recent sessions.
+- `openclaw sessions --json` — dumps every entry (filter with `--active <minutes>`).
+- `openclaw gateway call sessions.list --params '{}'` — fetch sessions from the running gateway (use `--url`/`--token` for remote gateway access).
+- Send `/status` as a standalone message in chat to see whether the agent is reachable, how much of the session context is used, current thinking/verbose toggles, and when your WhatsApp web creds were last refreshed (helps spot relink needs).
+- Send `/context list` or `/context detail` to see what’s in the system prompt and injected workspace files (and the biggest context contributors).
+- Send `/stop` (or standalone abort phrases like `stop`, `stop action`, `stop run`, `stop openclaw`) to abort the current run, clear queued followups for that session, and stop any sub-agent runs spawned from it (the reply includes the stopped count).
+- Send `/compact` (optional instructions) as a standalone message to summarize older context and free up window space. See [/concepts/compaction](/concepts/compaction).
+- JSONL transcripts can be opened directly to review full turns.
+
+## Tips
+
+- Keep the primary key dedicated to 1:1 traffic; let groups keep their own keys.
+- When automating cleanup, delete individual keys instead of the whole store to preserve context elsewhere.
+
+## Session origin metadata
+
+Each session entry records where it came from (best-effort) in `origin`:
+
+- `label`: human label (resolved from conversation label + group subject/channel)
+- `provider`: normalized channel id (including extensions)
+- `from`/`to`: raw routing ids from the inbound envelope
+- `accountId`: provider account id (when multi-account)
+- `threadId`: thread/topic id when the channel supports it
+  The origin fields are populated for direct messages, channels, and groups. If a
+  connector only updates delivery routing (for example, to keep a DM main session
+  fresh), it should still provide inbound context so the session keeps its
+  explainer metadata. Extensions can do this by sending `ConversationLabel`,
+  `GroupSubject`, `GroupChannel`, `GroupSpace`, and `SenderName` in the inbound
+  context and calling `recordSessionMetaFromInbound` (or passing the same context
+  to `updateLastRoute`).
--- a/openclaw/docs/concepts/sessions.md
+++ b/openclaw/docs/concepts/sessions.md
@@ -0,0 +1,10 @@
+---
+summary: "Alias for session management docs"
+read_when:
+  - You looked for docs/concepts/sessions.md; canonical doc lives in docs/concepts/session.md
+title: "Sessions"
+---
+
+# Sessions
+
+Canonical session management docs live in [Session management](/concepts/session).
--- a/openclaw/docs/concepts/streaming.md
+++ b/openclaw/docs/concepts/streaming.md
@@ -0,0 +1,155 @@
+---
+summary: "Streaming + chunking behavior (block replies, channel preview streaming, mode mapping)"
+read_when:
+  - Explaining how streaming or chunking works on channels
+  - Changing block streaming or channel chunking behavior
+  - Debugging duplicate/early block replies or channel preview streaming
+title: "Streaming and Chunking"
+---
+
+# Streaming + chunking
+
+OpenClaw has two separate streaming layers:
+
+- **Block streaming (channels):** emit completed **blocks** as the assistant writes. These are normal channel messages (not token deltas).
+- **Preview streaming (Telegram/Discord/Slack):** update a temporary **preview message** while generating.
+
+There is **no true token-delta streaming** to channel messages today. Preview streaming is message-based (send + edits/appends).
+
+## Block streaming (channel messages)
+
+Block streaming sends assistant output in coarse chunks as it becomes available.
+
+```
+Model output
+  └─ text_delta/events
+       ├─ (blockStreamingBreak=text_end)
+       │    └─ chunker emits blocks as buffer grows
+       └─ (blockStreamingBreak=message_end)
+            └─ chunker flushes at message_end
+                   └─ channel send (block replies)
+```
+
+Legend:
+
+- `text_delta/events`: model stream events (may be sparse for non-streaming models).
+- `chunker`: `EmbeddedBlockChunker` applying min/max bounds + break preference.
+- `channel send`: actual outbound messages (block replies).
+
+**Controls:**
+
+- `agents.defaults.blockStreamingDefault`: `"on"`/`"off"` (default off).
+- Channel overrides: `*.blockStreaming` (and per-account variants) to force `"on"`/`"off"` per channel.
+- `agents.defaults.blockStreamingBreak`: `"text_end"` or `"message_end"`.
+- `agents.defaults.blockStreamingChunk`: `{ minChars, maxChars, breakPreference? }`.
+- `agents.defaults.blockStreamingCoalesce`: `{ minChars?, maxChars?, idleMs? }` (merge streamed blocks before send).
+- Channel hard cap: `*.textChunkLimit` (e.g., `channels.whatsapp.textChunkLimit`).
+- Channel chunk mode: `*.chunkMode` (`length` default, `newline` splits on blank lines (paragraph boundaries) before length chunking).
+- Discord soft cap: `channels.discord.maxLinesPerMessage` (default 17) splits tall replies to avoid UI clipping.
+
+**Boundary semantics:**
+
+- `text_end`: stream blocks as soon as chunker emits; flush on each `text_end`.
+- `message_end`: wait until assistant message finishes, then flush buffered output.
+
+`message_end` still uses the chunker if the buffered text exceeds `maxChars`, so it can emit multiple chunks at the end.
+
+## Chunking algorithm (low/high bounds)
+
+Block chunking is implemented by `EmbeddedBlockChunker`:
+
+- **Low bound:** don’t emit until buffer >= `minChars` (unless forced).
+- **High bound:** prefer splits before `maxChars`; if forced, split at `maxChars`.
+- **Break preference:** `paragraph` → `newline` → `sentence` → `whitespace` → hard break.
+- **Code fences:** never split inside fences; when forced at `maxChars`, close + reopen the fence to keep Markdown valid.
+
+`maxChars` is clamped to the channel `textChunkLimit`, so you can’t exceed per-channel caps.
+
+## Coalescing (merge streamed blocks)
+
+When block streaming is enabled, OpenClaw can **merge consecutive block chunks**
+before sending them out. This reduces “single-line spam” while still providing
+progressive output.
+
+- Coalescing waits for **idle gaps** (`idleMs`) before flushing.
+- Buffers are capped by `maxChars` and will flush if they exceed it.
+- `minChars` prevents tiny fragments from sending until enough text accumulates
+  (final flush always sends remaining text).
+- Joiner is derived from `blockStreamingChunk.breakPreference`
+  (`paragraph` → `\n\n`, `newline` → `\n`, `sentence` → space).
+- Channel overrides are available via `*.blockStreamingCoalesce` (including per-account configs).
+- Default coalesce `minChars` is bumped to 1500 for Signal/Slack/Discord unless overridden.
+
+## Human-like pacing between blocks
+
+When block streaming is enabled, you can add a **randomized pause** between
+block replies (after the first block). This makes multi-bubble responses feel
+more natural.
+
+- Config: `agents.defaults.humanDelay` (override per agent via `agents.list[].humanDelay`).
+- Modes: `off` (default), `natural` (800–2500ms), `custom` (`minMs`/`maxMs`).
+- Applies only to **block replies**, not final replies or tool summaries.
+
+## “Stream chunks or everything”
+
+This maps to:
+
+- **Stream chunks:** `blockStreamingDefault: "on"` + `blockStreamingBreak: "text_end"` (emit as you go). Non-Telegram channels also need `*.blockStreaming: true`.
+- **Stream everything at end:** `blockStreamingBreak: "message_end"` (flush once, possibly multiple chunks if very long).
+- **No block streaming:** `blockStreamingDefault: "off"` (only final reply).
+
+**Channel note:** Block streaming is **off unless**
+`*.blockStreaming` is explicitly set to `true`. Channels can stream a live preview
+(`channels.<channel>.streaming`) without block replies.
+
+Config location reminder: the `blockStreaming*` defaults live under
+`agents.defaults`, not the root config.
+
+## Preview streaming modes
+
+Canonical key: `channels.<channel>.streaming`
+
+Modes:
+
+- `off`: disable preview streaming.
+- `partial`: single preview that is replaced with latest text.
+- `block`: preview updates in chunked/appended steps.
+- `progress`: progress/status preview during generation, final answer at completion.
+
+### Channel mapping
+
+| Channel  | `off` | `partial` | `block` | `progress`        |
+| -------- | ----- | --------- | ------- | ----------------- |
+| Telegram | ✅    | ✅        | ✅      | maps to `partial` |
+| Discord  | ✅    | ✅        | ✅      | maps to `partial` |
+| Slack    | ✅    | ✅        | ✅      | ✅                |
+
+Slack-only:
+
+- `channels.slack.nativeStreaming` toggles Slack native streaming API calls when `streaming=partial` (default: `true`).
+
+Legacy key migration:
+
+- Telegram: `streamMode` + boolean `streaming` auto-migrate to `streaming` enum.
+- Discord: `streamMode` + boolean `streaming` auto-migrate to `streaming` enum.
+- Slack: `streamMode` auto-migrates to `streaming` enum; boolean `streaming` auto-migrates to `nativeStreaming`.
+
+### Runtime behavior
+
+Telegram:
+
+- Uses Bot API `sendMessage` + `editMessageText`.
+- Preview streaming is skipped when Telegram block streaming is explicitly enabled (to avoid double-streaming).
+- `/reasoning stream` can write reasoning to preview.
+
+Discord:
+
+- Uses send + edit preview messages.
+- `block` mode uses draft chunking (`draftChunk`).
+- Preview streaming is skipped when Discord block streaming is explicitly enabled.
+
+Slack:
+
+- `partial` can use Slack native streaming (`chat.startStream`/`append`/`stop`) when available.
+- `block` uses append-style draft previews.
+- `progress` uses status preview text, then final answer.
--- a/openclaw/docs/concepts/system-prompt.md
+++ b/openclaw/docs/concepts/system-prompt.md
@@ -0,0 +1,129 @@
+---
+summary: "What the OpenClaw system prompt contains and how it is assembled"
+read_when:
+  - Editing system prompt text, tools list, or time/heartbeat sections
+  - Changing workspace bootstrap or skills injection behavior
+title: "System Prompt"
+---
+
+# System Prompt
+
+OpenClaw builds a custom system prompt for every agent run. The prompt is **OpenClaw-owned** and does not use the pi-coding-agent default prompt.
+
+The prompt is assembled by OpenClaw and injected into each agent run.
+
+## Structure
+
+The prompt is intentionally compact and uses fixed sections:
+
+- **Tooling**: current tool list + short descriptions.
+- **Safety**: short guardrail reminder to avoid power-seeking behavior or bypassing oversight.
+- **Skills** (when available): tells the model how to load skill instructions on demand.
+- **OpenClaw Self-Update**: how to run `config.apply` and `update.run`.
+- **Workspace**: working directory (`agents.defaults.workspace`).
+- **Documentation**: local path to OpenClaw docs (repo or npm package) and when to read them.
+- **Workspace Files (injected)**: indicates bootstrap files are included below.
+- **Sandbox** (when enabled): indicates sandboxed runtime, sandbox paths, and whether elevated exec is available.
+- **Current Date & Time**: user-local time, timezone, and time format.
+- **Reply Tags**: optional reply tag syntax for supported providers.
+- **Heartbeats**: heartbeat prompt and ack behavior.
+- **Runtime**: host, OS, node, model, repo root (when detected), thinking level (one line).
+- **Reasoning**: current visibility level + /reasoning toggle hint.
+
+Safety guardrails in the system prompt are advisory. They guide model behavior but do not enforce policy. Use tool policy, exec approvals, sandboxing, and channel allowlists for hard enforcement; operators can disable these by design.
+
+## Prompt modes
+
+OpenClaw can render smaller system prompts for sub-agents. The runtime sets a
+`promptMode` for each run (not a user-facing config):
+
+- `full` (default): includes all sections above.
+- `minimal`: used for sub-agents; omits **Skills**, **Memory Recall**, **OpenClaw
+  Self-Update**, **Model Aliases**, **User Identity**, **Reply Tags**,
+  **Messaging**, **Silent Replies**, and **Heartbeats**. Tooling, **Safety**,
+  Workspace, Sandbox, Current Date & Time (when known), Runtime, and injected
+  context stay available.
+- `none`: returns only the base identity line.
+
+When `promptMode=minimal`, extra injected prompts are labeled **Subagent
+Context** instead of **Group Chat Context**.
+
+## Workspace bootstrap injection
+
+Bootstrap files are trimmed and appended under **Project Context** so the model sees identity and profile context without needing explicit reads:
+
+- `AGENTS.md`
+- `SOUL.md`
+- `TOOLS.md`
+- `IDENTITY.md`
+- `USER.md`
+- `HEARTBEAT.md`
+- `BOOTSTRAP.md` (only on brand-new workspaces)
+- `MEMORY.md` and/or `memory.md` (when present in the workspace; either or both may be injected)
+
+All of these files are **injected into the context window** on every turn, which
+means they consume tokens. Keep them concise — especially `MEMORY.md`, which can
+grow over time and lead to unexpectedly high context usage and more frequent
+compaction.
+
+> **Note:** `memory/*.md` daily files are **not** injected automatically. They
+> are accessed on demand via the `memory_search` and `memory_get` tools, so they
+> do not count against the context window unless the model explicitly reads them.
+
+Large files are truncated with a marker. The max per-file size is controlled by
+`agents.defaults.bootstrapMaxChars` (default: 20000). Total injected bootstrap
+content across files is capped by `agents.defaults.bootstrapTotalMaxChars`
+(default: 150000). Missing files inject a short missing-file marker.
+
+Sub-agent sessions only inject `AGENTS.md` and `TOOLS.md` (other bootstrap files
+are filtered out to keep the sub-agent context small).
+
+Internal hooks can intercept this step via `agent:bootstrap` to mutate or replace
+the injected bootstrap files (for example swapping `SOUL.md` for an alternate persona).
+
+To inspect how much each injected file contributes (raw vs injected, truncation, plus tool schema overhead), use `/context list` or `/context detail`. See [Context](/concepts/context).
+
+## Time handling
+
+The system prompt includes a dedicated **Current Date & Time** section when the
+user timezone is known. To keep the prompt cache-stable, it now only includes
+the **time zone** (no dynamic clock or time format).
+
+Use `session_status` when the agent needs the current time; the status card
+includes a timestamp line.
+
+Configure with:
+
+- `agents.defaults.userTimezone`
+- `agents.defaults.timeFormat` (`auto` | `12` | `24`)
+
+See [Date & Time](/date-time) for full behavior details.
+
+## Skills
+
+When eligible skills exist, OpenClaw injects a compact **available skills list**
+(`formatSkillsForPrompt`) that includes the **file path** for each skill. The
+prompt instructs the model to use `read` to load the SKILL.md at the listed
+location (workspace, managed, or bundled). If no skills are eligible, the
+Skills section is omitted.
+
+```
+<available_skills>
+  <skill>
+    <name>...</name>
+    <description>...</description>
+    <location>...</location>
+  </skill>
+</available_skills>
+```
+
+This keeps the base prompt small while still enabling targeted skill usage.
+
+## Documentation
+
+When available, the system prompt includes a **Documentation** section that points to the
+local OpenClaw docs directory (either `docs/` in the repo workspace or the bundled npm
+package docs) and also notes the public mirror, source repo, community Discord, and
+ClawHub ([https://clawhub.com](https://clawhub.com)) for skills discovery. The prompt instructs the model to consult local docs first
+for OpenClaw behavior, commands, configuration, or architecture, and to run
+`openclaw status` itself when possible (asking the user only when it lacks access).
--- a/openclaw/docs/concepts/timezone.md
+++ b/openclaw/docs/concepts/timezone.md
@@ -0,0 +1,91 @@
+---
+summary: "Timezone handling for agents, envelopes, and prompts"
+read_when:
+  - You need to understand how timestamps are normalized for the model
+  - Configuring the user timezone for system prompts
+title: "Timezones"
+---
+
+# Timezones
+
+OpenClaw standardizes timestamps so the model sees a **single reference time**.
+
+## Message envelopes (local by default)
+
+Inbound messages are wrapped in an envelope like:
+
+```
+[Provider ... 2026-01-05 16:26 PST] message text
+```
+
+The timestamp in the envelope is **host-local by default**, with minutes precision.
+
+You can override this with:
+
+```json5
+{
+  agents: {
+    defaults: {
+      envelopeTimezone: "local", // "utc" | "local" | "user" | IANA timezone
+      envelopeTimestamp: "on", // "on" | "off"
+      envelopeElapsed: "on", // "on" | "off"
+    },
+  },
+}
+```
+
+- `envelopeTimezone: "utc"` uses UTC.
+- `envelopeTimezone: "user"` uses `agents.defaults.userTimezone` (falls back to host timezone).
+- Use an explicit IANA timezone (e.g., `"Europe/Vienna"`) for a fixed offset.
+- `envelopeTimestamp: "off"` removes absolute timestamps from envelope headers.
+- `envelopeElapsed: "off"` removes elapsed time suffixes (the `+2m` style).
+
+### Examples
+
+**Local (default):**
+
+```
+[Signal Alice +1555 2026-01-18 00:19 PST] hello
+```
+
+**Fixed timezone:**
+
+```
+[Signal Alice +1555 2026-01-18 06:19 GMT+1] hello
+```
+
+**Elapsed time:**
+
+```
+[Signal Alice +1555 +2m 2026-01-18T05:19Z] follow-up
+```
+
+## Tool payloads (raw provider data + normalized fields)
+
+Tool calls (`channels.discord.readMessages`, `channels.slack.readMessages`, etc.) return **raw provider timestamps**.
+We also attach normalized fields for consistency:
+
+- `timestampMs` (UTC epoch milliseconds)
+- `timestampUtc` (ISO 8601 UTC string)
+
+Raw provider fields are preserved.
+
+## User timezone for the system prompt
+
+Set `agents.defaults.userTimezone` to tell the model the user's local time zone. If it is
+unset, OpenClaw resolves the **host timezone at runtime** (no config write).
+
+```json5
+{
+  agents: { defaults: { userTimezone: "America/Chicago" } },
+}
+```
+
+The system prompt includes:
+
+- `Current Date & Time` section with local time and timezone
+- `Time format: 12-hour` or `24-hour`
+
+You can control the prompt format with `agents.defaults.timeFormat` (`auto` | `12` | `24`).
+
+See [Date & Time](/date-time) for the full behavior and examples.
--- a/openclaw/docs/concepts/typebox.md
+++ b/openclaw/docs/concepts/typebox.md
@@ -0,0 +1,289 @@
+---
+summary: "TypeBox schemas as the single source of truth for the gateway protocol"
+read_when:
+  - Updating protocol schemas or codegen
+title: "TypeBox"
+---
+
+# TypeBox as protocol source of truth
+
+Last updated: 2026-01-10
+
+TypeBox is a TypeScript-first schema library. We use it to define the **Gateway
+WebSocket protocol** (handshake, request/response, server events). Those schemas
+drive **runtime validation**, **JSON Schema export**, and **Swift codegen** for
+the macOS app. One source of truth; everything else is generated.
+
+If you want the higher-level protocol context, start with
+[Gateway architecture](/concepts/architecture).
+
+## Mental model (30 seconds)
+
+Every Gateway WS message is one of three frames:
+
+- **Request**: `{ type: "req", id, method, params }`
+- **Response**: `{ type: "res", id, ok, payload | error }`
+- **Event**: `{ type: "event", event, payload, seq?, stateVersion? }`
+
+The first frame **must** be a `connect` request. After that, clients can call
+methods (e.g. `health`, `send`, `chat.send`) and subscribe to events (e.g.
+`presence`, `tick`, `agent`).
+
+Connection flow (minimal):
+
+```
+Client                    Gateway
+  |---- req:connect -------->|
+  |<---- res:hello-ok --------|
+  |<---- event:tick ----------|
+  |---- req:health ---------->|
+  |<---- res:health ----------|
+```
+
+Common methods + events:
+
+| Category  | Examples                                                  | Notes                              |
+| --------- | --------------------------------------------------------- | ---------------------------------- |
+| Core      | `connect`, `health`, `status`                             | `connect` must be first            |
+| Messaging | `send`, `poll`, `agent`, `agent.wait`                     | side-effects need `idempotencyKey` |
+| Chat      | `chat.history`, `chat.send`, `chat.abort`, `chat.inject`  | WebChat uses these                 |
+| Sessions  | `sessions.list`, `sessions.patch`, `sessions.delete`      | session admin                      |
+| Nodes     | `node.list`, `node.invoke`, `node.pair.*`                 | Gateway WS + node actions          |
+| Events    | `tick`, `presence`, `agent`, `chat`, `health`, `shutdown` | server push                        |
+
+Authoritative list lives in `src/gateway/server.ts` (`METHODS`, `EVENTS`).
+
+## Where the schemas live
+
+- Source: `src/gateway/protocol/schema.ts`
+- Runtime validators (AJV): `src/gateway/protocol/index.ts`
+- Server handshake + method dispatch: `src/gateway/server.ts`
+- Node client: `src/gateway/client.ts`
+- Generated JSON Schema: `dist/protocol.schema.json`
+- Generated Swift models: `apps/macos/Sources/OpenClawProtocol/GatewayModels.swift`
+
+## Current pipeline
+
+- `pnpm protocol:gen`
+  - writes JSON Schema (draft‑07) to `dist/protocol.schema.json`
+- `pnpm protocol:gen:swift`
+  - generates Swift gateway models
+- `pnpm protocol:check`
+  - runs both generators and verifies the output is committed
+
+## How the schemas are used at runtime
+
+- **Server side**: every inbound frame is validated with AJV. The handshake only
+  accepts a `connect` request whose params match `ConnectParams`.
+- **Client side**: the JS client validates event and response frames before
+  using them.
+- **Method surface**: the Gateway advertises the supported `methods` and
+  `events` in `hello-ok`.
+
+## Example frames
+
+Connect (first message):
+
+```json
+{
+  "type": "req",
+  "id": "c1",
+  "method": "connect",
+  "params": {
+    "minProtocol": 2,
+    "maxProtocol": 2,
+    "client": {
+      "id": "openclaw-macos",
+      "displayName": "macos",
+      "version": "1.0.0",
+      "platform": "macos 15.1",
+      "mode": "ui",
+      "instanceId": "A1B2"
+    }
+  }
+}
+```
+
+Hello-ok response:
+
+```json
+{
+  "type": "res",
+  "id": "c1",
+  "ok": true,
+  "payload": {
+    "type": "hello-ok",
+    "protocol": 2,
+    "server": { "version": "dev", "connId": "ws-1" },
+    "features": { "methods": ["health"], "events": ["tick"] },
+    "snapshot": {
+      "presence": [],
+      "health": {},
+      "stateVersion": { "presence": 0, "health": 0 },
+      "uptimeMs": 0
+    },
+    "policy": { "maxPayload": 1048576, "maxBufferedBytes": 1048576, "tickIntervalMs": 30000 }
+  }
+}
+```
+
+Request + response:
+
+```json
+{ "type": "req", "id": "r1", "method": "health" }
+```
+
+```json
+{ "type": "res", "id": "r1", "ok": true, "payload": { "ok": true } }
+```
+
+Event:
+
+```json
+{ "type": "event", "event": "tick", "payload": { "ts": 1730000000 }, "seq": 12 }
+```
+
+## Minimal client (Node.js)
+
+Smallest useful flow: connect + health.
+
+```ts
+import { WebSocket } from "ws";
+
+const ws = new WebSocket("ws://127.0.0.1:18789");
+
+ws.on("open", () => {
+  ws.send(
+    JSON.stringify({
+      type: "req",
+      id: "c1",
+      method: "connect",
+      params: {
+        minProtocol: 3,
+        maxProtocol: 3,
+        client: {
+          id: "cli",
+          displayName: "example",
+          version: "dev",
+          platform: "node",
+          mode: "cli",
+        },
+      },
+    }),
+  );
+});
+
+ws.on("message", (data) => {
+  const msg = JSON.parse(String(data));
+  if (msg.type === "res" && msg.id === "c1" && msg.ok) {
+    ws.send(JSON.stringify({ type: "req", id: "h1", method: "health" }));
+  }
+  if (msg.type === "res" && msg.id === "h1") {
+    console.log("health:", msg.payload);
+    ws.close();
+  }
+});
+```
+
+## Worked example: add a method end‑to‑end
+
+Example: add a new `system.echo` request that returns `{ ok: true, text }`.
+
+1. **Schema (source of truth)**
+
+Add to `src/gateway/protocol/schema.ts`:
+
+```ts
+export const SystemEchoParamsSchema = Type.Object(
+  { text: NonEmptyString },
+  { additionalProperties: false },
+);
+
+export const SystemEchoResultSchema = Type.Object(
+  { ok: Type.Boolean(), text: NonEmptyString },
+  { additionalProperties: false },
+);
+```
+
+Add both to `ProtocolSchemas` and export types:
+
+```ts
+  SystemEchoParams: SystemEchoParamsSchema,
+  SystemEchoResult: SystemEchoResultSchema,
+```
+
+```ts
+export type SystemEchoParams = Static<typeof SystemEchoParamsSchema>;
+export type SystemEchoResult = Static<typeof SystemEchoResultSchema>;
+```
+
+2. **Validation**
+
+In `src/gateway/protocol/index.ts`, export an AJV validator:
+
+```ts
+export const validateSystemEchoParams = ajv.compile<SystemEchoParams>(SystemEchoParamsSchema);
+```
+
+3. **Server behavior**
+
+Add a handler in `src/gateway/server-methods/system.ts`:
+
+```ts
+export const systemHandlers: GatewayRequestHandlers = {
+  "system.echo": ({ params, respond }) => {
+    const text = String(params.text ?? "");
+    respond(true, { ok: true, text });
+  },
+};
+```
+
+Register it in `src/gateway/server-methods.ts` (already merges `systemHandlers`),
+then add `"system.echo"` to `METHODS` in `src/gateway/server.ts`.
+
+4. **Regenerate**
+
+```bash
+pnpm protocol:check
+```
+
+5. **Tests + docs**
+
+Add a server test in `src/gateway/server.*.test.ts` and note the method in docs.
+
+## Swift codegen behavior
+
+The Swift generator emits:
+
+- `GatewayFrame` enum with `req`, `res`, `event`, and `unknown` cases
+- Strongly typed payload structs/enums
+- `ErrorCode` values and `GATEWAY_PROTOCOL_VERSION`
+
+Unknown frame types are preserved as raw payloads for forward compatibility.
+
+## Versioning + compatibility
+
+- `PROTOCOL_VERSION` lives in `src/gateway/protocol/schema.ts`.
+- Clients send `minProtocol` + `maxProtocol`; the server rejects mismatches.
+- The Swift models keep unknown frame types to avoid breaking older clients.
+
+## Schema patterns and conventions
+
+- Most objects use `additionalProperties: false` for strict payloads.
+- `NonEmptyString` is the default for IDs and method/event names.
+- The top-level `GatewayFrame` uses a **discriminator** on `type`.
+- Methods with side effects usually require an `idempotencyKey` in params
+  (example: `send`, `poll`, `agent`, `chat.send`).
+
+## Live schema JSON
+
+Generated JSON Schema is in the repo at `dist/protocol.schema.json`. The
+published raw file is typically available at:
+
+- [https://raw.githubusercontent.com/openclaw/openclaw/main/dist/protocol.schema.json](https://raw.githubusercontent.com/openclaw/openclaw/main/dist/protocol.schema.json)
+
+## When you change schemas
+
+1. Update the TypeBox schemas.
+2. Run `pnpm protocol:check`.
+3. Commit the regenerated schema + Swift models.
--- a/openclaw/docs/concepts/typing-indicators.md
+++ b/openclaw/docs/concepts/typing-indicators.md
@@ -0,0 +1,68 @@
+---
+summary: "When OpenClaw shows typing indicators and how to tune them"
+read_when:
+  - Changing typing indicator behavior or defaults
+title: "Typing Indicators"
+---
+
+# Typing indicators
+
+Typing indicators are sent to the chat channel while a run is active. Use
+`agents.defaults.typingMode` to control **when** typing starts and `typingIntervalSeconds`
+to control **how often** it refreshes.
+
+## Defaults
+
+When `agents.defaults.typingMode` is **unset**, OpenClaw keeps the legacy behavior:
+
+- **Direct chats**: typing starts immediately once the model loop begins.
+- **Group chats with a mention**: typing starts immediately.
+- **Group chats without a mention**: typing starts only when message text begins streaming.
+- **Heartbeat runs**: typing is disabled.
+
+## Modes
+
+Set `agents.defaults.typingMode` to one of:
+
+- `never` — no typing indicator, ever.
+- `instant` — start typing **as soon as the model loop begins**, even if the run
+  later returns only the silent reply token.
+- `thinking` — start typing on the **first reasoning delta** (requires
+  `reasoningLevel: "stream"` for the run).
+- `message` — start typing on the **first non-silent text delta** (ignores
+  the `NO_REPLY` silent token).
+
+Order of “how early it fires”:
+`never` → `message` → `thinking` → `instant`
+
+## Configuration
+
+```json5
+{
+  agent: {
+    typingMode: "thinking",
+    typingIntervalSeconds: 6,
+  },
+}
+```
+
+You can override mode or cadence per session:
+
+```json5
+{
+  session: {
+    typingMode: "message",
+    typingIntervalSeconds: 4,
+  },
+}
+```
+
+## Notes
+
+- `message` mode won’t show typing for silent-only replies (e.g. the `NO_REPLY`
+  token used to suppress output).
+- `thinking` only fires if the run streams reasoning (`reasoningLevel: "stream"`).
+  If the model doesn’t emit reasoning deltas, typing won’t start.
+- Heartbeats never show typing, regardless of mode.
+- `typingIntervalSeconds` controls the **refresh cadence**, not the start time.
+  The default is 6 seconds.
--- a/openclaw/docs/concepts/usage-tracking.md
+++ b/openclaw/docs/concepts/usage-tracking.md
@@ -0,0 +1,35 @@
+---
+summary: "Usage tracking surfaces and credential requirements"
+read_when:
+  - You are wiring provider usage/quota surfaces
+  - You need to explain usage tracking behavior or auth requirements
+title: "Usage Tracking"
+---
+
+# Usage tracking
+
+## What it is
+
+- Pulls provider usage/quota directly from their usage endpoints.
+- No estimated costs; only the provider-reported windows.
+
+## Where it shows up
+
+- `/status` in chats: emoji‑rich status card with session tokens + estimated cost (API key only). Provider usage shows for the **current model provider** when available.
+- `/usage off|tokens|full` in chats: per-response usage footer (OAuth shows tokens only).
+- `/usage cost` in chats: local cost summary aggregated from OpenClaw session logs.
+- CLI: `openclaw status --usage` prints a full per-provider breakdown.
+- CLI: `openclaw channels list` prints the same usage snapshot alongside provider config (use `--no-usage` to skip).
+- macOS menu bar: “Usage” section under Context (only if available).
+
+## Providers + credentials
+
+- **Anthropic (Claude)**: OAuth tokens in auth profiles.
+- **GitHub Copilot**: OAuth tokens in auth profiles.
+- **Gemini CLI**: OAuth tokens in auth profiles.
+- **Antigravity**: OAuth tokens in auth profiles.
+- **OpenAI Codex**: OAuth tokens in auth profiles (accountId used when present).
+- **MiniMax**: API key (coding plan key; `MINIMAX_CODE_PLAN_KEY` or `MINIMAX_API_KEY`); uses the 5‑hour coding plan window.
+- **z.ai**: API key via env/config/auth store.
+
+Usage is hidden if no matching OAuth/API credentials exist.