LetsBeBiz-Redesign/openclaw/docs/concepts/streaming.md

---
summary: "Streaming + chunking behavior (block replies, channel preview streaming, mode mapping)"
read_when:
  - Explaining how streaming or chunking works on channels
  - Changing block streaming or channel chunking behavior
  - Debugging duplicate/early block replies or channel preview streaming
title: "Streaming and Chunking"
---

# Streaming + chunking

OpenClaw has two separate streaming layers:

- **Block streaming (channels):** emit completed **blocks** as the assistant writes. These are normal channel messages (not token deltas).
- **Preview streaming (Telegram/Discord/Slack):** update a temporary **preview message** while generating.

There is **no true token-delta streaming** to channel messages today. Preview streaming is message-based (send + edits/appends).

## Block streaming (channel messages)

Block streaming sends assistant output in coarse chunks as it becomes available.

```
Model output
  └─ text_delta/events
       ├─ (blockStreamingBreak=text_end)
       │    └─ chunker emits blocks as buffer grows
       └─ (blockStreamingBreak=message_end)
            └─ chunker flushes at message_end
                   └─ channel send (block replies)
```

Legend:

- `text_delta/events`: model stream events (may be sparse for non-streaming models).
- `chunker`: `EmbeddedBlockChunker` applying min/max bounds + break preference.
- `channel send`: actual outbound messages (block replies).

**Controls:**

- `agents.defaults.blockStreamingDefault`: `"on"`/`"off"` (default off).
- Channel overrides: `*.blockStreaming` (and per-account variants) to force `"on"`/`"off"` per channel.
- `agents.defaults.blockStreamingBreak`: `"text_end"` or `"message_end"`.
- `agents.defaults.blockStreamingChunk`: `{ minChars, maxChars, breakPreference? }`.
- `agents.defaults.blockStreamingCoalesce`: `{ minChars?, maxChars?, idleMs? }` (merge streamed blocks before send).
- Channel hard cap: `*.textChunkLimit` (e.g., `channels.whatsapp.textChunkLimit`).
- Channel chunk mode: `*.chunkMode` (`length` default, `newline` splits on blank lines (paragraph boundaries) before length chunking).
- Discord soft cap: `channels.discord.maxLinesPerMessage` (default 17) splits tall replies to avoid UI clipping.

**Boundary semantics:**

- `text_end`: stream blocks as soon as chunker emits; flush on each `text_end`.
- `message_end`: wait until assistant message finishes, then flush buffered output.

`message_end` still uses the chunker if the buffered text exceeds `maxChars`, so it can emit multiple chunks at the end.

## Chunking algorithm (low/high bounds)

Block chunking is implemented by `EmbeddedBlockChunker`:

- **Low bound:** don’t emit until buffer >= `minChars` (unless forced).
- **High bound:** prefer splits before `maxChars`; if forced, split at `maxChars`.
- **Break preference:** `paragraph` → `newline` → `sentence` → `whitespace` → hard break.
- **Code fences:** never split inside fences; when forced at `maxChars`, close + reopen the fence to keep Markdown valid.

`maxChars` is clamped to the channel `textChunkLimit`, so you can’t exceed per-channel caps.

## Coalescing (merge streamed blocks)

When block streaming is enabled, OpenClaw can **merge consecutive block chunks**
before sending them out. This reduces “single-line spam” while still providing
progressive output.

- Coalescing waits for **idle gaps** (`idleMs`) before flushing.
- Buffers are capped by `maxChars` and will flush if they exceed it.
- `minChars` prevents tiny fragments from sending until enough text accumulates
  (final flush always sends remaining text).
- Joiner is derived from `blockStreamingChunk.breakPreference`
  (`paragraph` → `\n\n`, `newline` → `\n`, `sentence` → space).
- Channel overrides are available via `*.blockStreamingCoalesce` (including per-account configs).
- Default coalesce `minChars` is bumped to 1500 for Signal/Slack/Discord unless overridden.

## Human-like pacing between blocks

When block streaming is enabled, you can add a **randomized pause** between
block replies (after the first block). This makes multi-bubble responses feel
more natural.

- Config: `agents.defaults.humanDelay` (override per agent via `agents.list[].humanDelay`).
- Modes: `off` (default), `natural` (800–2500ms), `custom` (`minMs`/`maxMs`).
- Applies only to **block replies**, not final replies or tool summaries.

## “Stream chunks or everything”

This maps to:

- **Stream chunks:** `blockStreamingDefault: "on"` + `blockStreamingBreak: "text_end"` (emit as you go). Non-Telegram channels also need `*.blockStreaming: true`.
- **Stream everything at end:** `blockStreamingBreak: "message_end"` (flush once, possibly multiple chunks if very long).
- **No block streaming:** `blockStreamingDefault: "off"` (only final reply).

**Channel note:** Block streaming is **off unless**
`*.blockStreaming` is explicitly set to `true`. Channels can stream a live preview
(`channels.<channel>.streaming`) without block replies.

Config location reminder: the `blockStreaming*` defaults live under
`agents.defaults`, not the root config.

## Preview streaming modes

Canonical key: `channels.<channel>.streaming`

Modes:

- `off`: disable preview streaming.
- `partial`: single preview that is replaced with latest text.
- `block`: preview updates in chunked/appended steps.
- `progress`: progress/status preview during generation, final answer at completion.

### Channel mapping

| Channel  | `off` | `partial` | `block` | `progress`        |
| -------- | ----- | --------- | ------- | ----------------- |
| Telegram | ✅    | ✅        | ✅      | maps to `partial` |
| Discord  | ✅    | ✅        | ✅      | maps to `partial` |
| Slack    | ✅    | ✅        | ✅      | ✅                |

Slack-only:

- `channels.slack.nativeStreaming` toggles Slack native streaming API calls when `streaming=partial` (default: `true`).

Legacy key migration:

- Telegram: `streamMode` + boolean `streaming` auto-migrate to `streaming` enum.
- Discord: `streamMode` + boolean `streaming` auto-migrate to `streaming` enum.
- Slack: `streamMode` auto-migrates to `streaming` enum; boolean `streaming` auto-migrates to `nativeStreaming`.

### Runtime behavior

Telegram:

- Uses Bot API `sendMessage` + `editMessageText`.
- Preview streaming is skipped when Telegram block streaming is explicitly enabled (to avoid double-streaming).
- `/reasoning stream` can write reasoning to preview.

Discord:

- Uses send + edit preview messages.
- `block` mode uses draft chunking (`draftChunk`).
- Preview streaming is skipped when Discord block streaming is explicitly enabled.

Slack:

- `partial` can use Slack native streaming (`chat.startStream`/`append`/`stop`) when available.
- `block` uses append-style draft previews.
- `progress` uses status preview text, then final answer.
-												Include full contents of all nested repositories

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

											
										
										
											2026-02-27 16:25:02 +01:00
+								---
 								summary: "Streaming + chunking behavior (block replies, channel preview streaming, mode mapping)"
 								read_when:
 								  - Explaining how streaming or chunking works on channels
 								  - Changing block streaming or channel chunking behavior
 								  - Debugging duplicate/early block replies or channel preview streaming
 								title: "Streaming and Chunking"
 								---
 								# Streaming + chunking
 								OpenClaw has two separate streaming layers:
 								- **Block streaming (channels):** emit completed **blocks** as the assistant writes. These are normal channel messages (not token deltas).
 								- **Preview streaming (Telegram/Discord/Slack):** update a temporary **preview message** while generating.
 								There is **no true token-delta streaming** to channel messages today. Preview streaming is message-based (send + edits/appends).
 								## Block streaming (channel messages)
 								Block streaming sends assistant output in coarse chunks as it becomes available.
 								```
 								Model output
 								  └─ text_delta/events
 								       ├─ (blockStreamingBreak=text_end)
 								       │    └─ chunker emits blocks as buffer grows
 								       └─ (blockStreamingBreak=message_end)
 								            └─ chunker flushes at message_end
 								                   └─ channel send (block replies)
 								```
 								Legend:
 								- `text_delta/events`: model stream events (may be sparse for non-streaming models).
 								- `chunker`: `EmbeddedBlockChunker` applying min/max bounds + break preference.
 								- `channel send`: actual outbound messages (block replies).
 								**Controls:**
 								- `agents.defaults.blockStreamingDefault`: `"on"`/`"off"` (default off).
 								- Channel overrides: `*.blockStreaming` (and per-account variants) to force `"on"`/`"off"` per channel.
 								- `agents.defaults.blockStreamingBreak`: `"text_end"` or `"message_end"`.
 								- `agents.defaults.blockStreamingChunk`: `{ minChars, maxChars, breakPreference? }`.
 								- `agents.defaults.blockStreamingCoalesce`: `{ minChars?, maxChars?, idleMs? }` (merge streamed blocks before send).
 								- Channel hard cap: `*.textChunkLimit` (e.g., `channels.whatsapp.textChunkLimit`).
 								- Channel chunk mode: `*.chunkMode` (`length` default, `newline` splits on blank lines (paragraph boundaries) before length chunking).
 								- Discord soft cap: `channels.discord.maxLinesPerMessage` (default 17) splits tall replies to avoid UI clipping.
 								**Boundary semantics:**
 								- `text_end`: stream blocks as soon as chunker emits; flush on each `text_end`.
 								- `message_end`: wait until assistant message finishes, then flush buffered output.
 								`message_end` still uses the chunker if the buffered text exceeds `maxChars`, so it can emit multiple chunks at the end.
 								## Chunking algorithm (low/high bounds)
 								Block chunking is implemented by `EmbeddedBlockChunker`:
 								- **Low bound:** don’t emit until buffer >= `minChars` (unless forced).
 								- **High bound:** prefer splits before `maxChars`; if forced, split at `maxChars`.
 								- **Break preference:** `paragraph` → `newline` → `sentence` → `whitespace` → hard break.
 								- **Code fences:** never split inside fences; when forced at `maxChars`, close + reopen the fence to keep Markdown valid.
 								`maxChars` is clamped to the channel `textChunkLimit`, so you can’t exceed per-channel caps.
 								## Coalescing (merge streamed blocks)
 								When block streaming is enabled, OpenClaw can **merge consecutive block chunks**
 								before sending them out. This reduces “single-line spam” while still providing
 								progressive output.
 								- Coalescing waits for **idle gaps** (`idleMs`) before flushing.
 								- Buffers are capped by `maxChars` and will flush if they exceed it.
 								- `minChars` prevents tiny fragments from sending until enough text accumulates
 								  (final flush always sends remaining text).
 								- Joiner is derived from `blockStreamingChunk.breakPreference`
 								  (`paragraph` → `\n\n`, `newline` → `\n`, `sentence` → space).
 								- Channel overrides are available via `*.blockStreamingCoalesce` (including per-account configs).
 								- Default coalesce `minChars` is bumped to 1500 for Signal/Slack/Discord unless overridden.
 								## Human-like pacing between blocks
 								When block streaming is enabled, you can add a **randomized pause** between
 								block replies (after the first block). This makes multi-bubble responses feel
 								more natural.
 								- Config: `agents.defaults.humanDelay` (override per agent via `agents.list[].humanDelay`).
 								- Modes: `off` (default), `natural` (800–2500ms), `custom` (`minMs`/`maxMs`).
 								- Applies only to **block replies**, not final replies or tool summaries.
 								## “Stream chunks or everything”
 								This maps to:
 								- **Stream chunks:** `blockStreamingDefault: "on"` + `blockStreamingBreak: "text_end"` (emit as you go). Non-Telegram channels also need `*.blockStreaming: true`.
 								- **Stream everything at end:** `blockStreamingBreak: "message_end"` (flush once, possibly multiple chunks if very long).
 								- **No block streaming:** `blockStreamingDefault: "off"` (only final reply).
 								**Channel note:** Block streaming is **off unless**
 								`*.blockStreaming` is explicitly set to `true`. Channels can stream a live preview
 								(`channels.<channel>.streaming`) without block replies.
 								Config location reminder: the `blockStreaming*` defaults live under
 								`agents.defaults`, not the root config.
 								## Preview streaming modes
 								Canonical key: `channels.<channel>.streaming`
 								Modes:
 								- `off`: disable preview streaming.
 								- `partial`: single preview that is replaced with latest text.
 								- `block`: preview updates in chunked/appended steps.
 								- `progress`: progress/status preview during generation, final answer at completion.
 								### Channel mapping
 								| Channel  | `off` | `partial` | `block` | `progress`        |
 								| -------- | ----- | --------- | ------- | ----------------- |
 								| Telegram | ✅    | ✅        | ✅      | maps to `partial` |
 								| Discord  | ✅    | ✅        | ✅      | maps to `partial` |
 								| Slack    | ✅    | ✅        | ✅      | ✅                |
 								Slack-only:
 								- `channels.slack.nativeStreaming` toggles Slack native streaming API calls when `streaming=partial` (default: `true`).
 								Legacy key migration:
 								- Telegram: `streamMode` + boolean `streaming` auto-migrate to `streaming` enum.
 								- Discord: `streamMode` + boolean `streaming` auto-migrate to `streaming` enum.
 								- Slack: `streamMode` auto-migrates to `streaming` enum; boolean `streaming` auto-migrates to `nativeStreaming`.
 								### Runtime behavior
 								Telegram:
 								- Uses Bot API `sendMessage` + `editMessageText`.
 								- Preview streaming is skipped when Telegram block streaming is explicitly enabled (to avoid double-streaming).
 								- `/reasoning stream` can write reasoning to preview.
 								Discord:
 								- Uses send + edit preview messages.
 								- `block` mode uses draft chunking (`draftChunk`).
 								- Preview streaming is skipped when Discord block streaming is explicitly enabled.
 								Slack:
 								- `partial` can use Slack native streaming (`chat.startStream`/`append`/`stop`) when available.
 								- `block` uses append-style draft previews.
 								- `progress` uses status preview text, then final answer.