docs/superpowers/specs/2026-04-01-voice-discovery-pivot-design.md

# Voice Discovery Mode — Design Spec

## Overview

Pivot the voice mode from a "faster way to fill out the configurator" into a standalone consultative discovery experience. Exploratory users — people who don't yet know exactly what they need — get a warm, conversational entry point separate from the typed configurator. The conversation is free-flowing and consultant-like, structured data is captured silently, and the user receives a personalized brief at the end.

## Entry Point & Framing

### Placement

A new standalone section on the landing page, positioned after the services or process section — wherever the natural "I'm interested but not sure" moment occurs. Completely decoupled from the configurator.

### Copy Direction

- **Headline:** Warm and inviting — "Not sure where to start?" or "Still figuring out what you need?"
- **Subtext:** "Tell us what you're thinking and we'll figure it out together. You'll get a personalized brief at the end."
- **CTA button:** "Let's talk" — styled distinctly from the configurator's CTA.
- Both EN and FR translations required.

### Behavior

Clicking the CTA scrolls to / reveals the voice conversation panel inline on the page. No route change, no modal. The panel expands in place, keeping the user grounded in the site context.

### Configurator Changes

- Remove the `ModeToggle` component and `mode` state from `WizardContainer.tsx`.
- The configurator becomes typed-form-only — the "I know what I want" path.
- No other changes to the configurator itself.

## Voice Conversation UI

### Layout

A dedicated panel, roughly the same width as the configurator card but taller. Three zones stacked vertically:

1. **Agent header** — LetsBe branding mark, agent name, connection status dot. Similar to current but slightly more prominent.

2. **Orb + transcript area** — Orb is larger (24-28 units instead of 20). Live transcript below it with significantly more vertical space (`max-h-72` or similar instead of current `max-h-40`). Proper autoscroll using `scrollIntoView` on the bottom ref. **Selection chips are removed** — no visible evidence of structured data capture.

3. **Controls** — Mic toggle and end call button. Same as current, cleaner without chips.
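The autoscroll fix mentioned in zone 2 can be sketched as a small helper. This is a minimal sketch, not the project's implementation: `ScrollAnchor` and `scrollTranscriptToBottom` are hypothetical names, and the choice of `block: "nearest"` is an assumption meant to limit how far the surrounding page moves when the transcript scrolls.

```typescript
// Structural stand-in for the DOM element held by the transcript's bottom ref,
// so the helper can be exercised without a browser. A real HTMLElement fits it.
interface ScrollAnchor {
  scrollIntoView(options?: {
    behavior?: "auto" | "smooth";
    block?: "start" | "center" | "end" | "nearest";
  }): void;
}

// Hypothetical helper: call it from an effect that runs whenever the transcript grows.
// Returns false when the ref is not attached yet, true once a scroll was requested.
function scrollTranscriptToBottom(anchor: ScrollAnchor | null): boolean {
  if (anchor === null) return false;
  anchor.scrollIntoView({ behavior: "smooth", block: "nearest" });
  return true;
}
```

In a React component this would typically be called with the `current` value of the bottom ref after each transcript update.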

### Mobile

Panel goes nearly full-width on small screens. Transcript takes most of the viewport height. Orb may scale down slightly. Controls stay fixed at the bottom for thumb reach.

### Contact Confirmation Card

When the agent captures name and email, a small inline card appears (above controls or below transcript) showing the captured values with inline edit affordance. The agent says "I've got your details on screen — look right?" User can tap to edit, then confirm. **This replaces the verbal spell-back entirely.**

Requires a new tool (e.g., `request_contact`) that the agent calls to surface the card, rather than spelling the captured details back verbally.
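One possible shape for the tool and the card state is sketched below. The declaration follows the function-declaration style used by the Gemini API (uppercase schema types), but the parameter names, descriptions, and all helper names (`openContactCard`, `editContactCard`, `confirmContactCard`, `looksLikeEmail`) are assumptions for illustration, not the project's actual code.

```typescript
// Hypothetical tool declaration; parameter names and descriptions are assumptions.
const requestContactTool = {
  name: "request_contact",
  description: "Show the on-screen contact card so the user can check and edit their name and email.",
  parameters: {
    type: "OBJECT",
    properties: {
      name: { type: "STRING", description: "The user's name as the agent heard it." },
      email: { type: "STRING", description: "The user's email as the agent heard it." },
    },
    required: ["name", "email"],
  },
};

type ContactCard =
  | { status: "hidden" }
  | { status: "pending"; name: string; email: string }
  | { status: "confirmed"; name: string; email: string };

// Deliberately loose check: catches obvious mishearings, not every invalid address.
function looksLikeEmail(value: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value);
}

// Called when the agent invokes request_contact; tool arguments arrive untyped.
function openContactCard(args: { name?: unknown; email?: unknown }): ContactCard {
  const name = typeof args.name === "string" ? args.name.trim() : "";
  const email = typeof args.email === "string" ? args.email.trim() : "";
  return { status: "pending", name, email };
}

// Inline edit: only a pending card can change.
function editContactCard(card: ContactCard, patch: { name?: string; email?: string }): ContactCard {
  if (card.status !== "pending") return card;
  return { status: "pending", name: patch.name ?? card.name, email: patch.email ?? card.email };
}

// Confirm only when both fields are usable; otherwise the card stays pending.
function confirmContactCard(card: ContactCard): ContactCard {
  if (card.status !== "pending") return card;
  if (card.name === "" || !looksLikeEmail(card.email)) return card;
  return { status: "confirmed", name: card.name, email: card.email };
}
```

Keeping the card as a tagged union makes the "pending user confirm" state explicit, so the provider can hold off on `complete_brief` until the status is `confirmed`.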

### During Brief Generation

After contact confirmation and the `complete_brief` trigger:
- Connection is closed (already fixed).
- Panel transitions to a generating state — orb morphs to loader or StepGenerating-style progress indicators.
- Transcript remains visible so the conversation doesn't vanish.

### On Completion

Transitions to the same `StepComplete` view (brief preview + book-a-call CTA). The brief content will be richer because the conversation goes deeper, but the presentation is the same.
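The generating and completion behaviour described above can be read as a small phase machine. The sketch below is one way to model it. The phase and event names, `nextPhase`, and `panelView` are hypothetical and do not come from the existing components.

```typescript
type PanelPhase = "conversing" | "confirming_contact" | "generating" | "complete";
type PanelEvent = "contact_requested" | "brief_requested" | "brief_ready";

// Allowed transitions; any event not listed for a phase is ignored.
const TRANSITIONS: Record<PanelPhase, Partial<Record<PanelEvent, PanelPhase>>> = {
  conversing: { contact_requested: "confirming_contact" },
  confirming_contact: { brief_requested: "generating" }, // complete_brief after confirmation
  generating: { brief_ready: "complete" },
  complete: {},
};

function nextPhase(phase: PanelPhase, event: PanelEvent): PanelPhase {
  return TRANSITIONS[phase][event] ?? phase;
}

// What the panel shows in each phase, derived from the spec's bullets.
function panelView(phase: PanelPhase) {
  const live = phase === "conversing" || phase === "confirming_contact";
  return {
    connectionOpen: live,                     // closed once generation starts
    showControls: live,                       // mic toggle and end call
    showLoader: phase === "generating",       // orb morphs to a loader
    showTranscript: phase !== "complete",     // stays visible while generating
    showStepComplete: phase === "complete",   // hand off to the StepComplete view
  };
}
```

Deriving the view flags from a single phase value avoids separate booleans drifting out of sync, for example controls staying enabled after the connection has closed.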

## System Prompt & Agent Behavior

### Tone

The agent is a conversational consultant, not an interviewer with a checklist. No numbered topic list to work through. The prompt gives the agent a goal: "understand what this person needs deeply enough to write a compelling brief."

### Behavioral Guidelines

- **Follow the user's thread.** If they talk about a frustration, dig into it. Don't redirect to the next "topic."
- **One question at a time.** This stays — it works.
- **Offer perspective, not just questions.** "That sounds like it might be more of a systems problem than a website problem." The agent has opinions, not just a clipboard.
- **Reference LetsBe naturally.** "We've done something similar for a hospitality client" — not a feature list.
- **2-3 sentences per response.** Prevents monologuing.
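As a rough illustration of how the goal and guidelines above might be assembled into an English prompt, consider the sketch below. The wording of each line is an assumption, and `buildDiscoveryPromptEn` is a hypothetical name. The real prompt lives in `buildSystemPrompt()` and also needs a French counterpart, which is not attempted here.

```typescript
const GOAL = "Understand what this person needs deeply enough to write a compelling brief.";

// One line per behavioral guideline; phrasing is illustrative only.
const GUIDELINES: string[] = [
  "Follow the user's thread. If they raise a frustration, dig into it before moving on.",
  "Ask one question at a time.",
  "Offer perspective, not just questions.",
  "Reference LetsBe's past work naturally; never recite a feature list.",
  "Keep each response to 2-3 sentences.",
];

function buildDiscoveryPromptEn(): string {
  return [
    "You are a conversational consultant for LetsBe, not an interviewer with a checklist.",
    `Goal: ${GOAL}`,
    "Guidelines:",
    ...GUIDELINES.map((rule) => `- ${rule}`),
  ].join("\n");
}
```

Note that the prompt states a goal and behaviors but contains no list of topics to cover, which matches the intent of the Tone section.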

### Structured Data Capture

`update_selections` tool stays. The agent is never instructed to "cover these topics." It maps what it hears to predefined values silently. If the conversation never touches timeline, that field stays empty — that's fine.
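The silent mapping could look roughly like the following. This is a sketch under assumptions: the allowed values in `ALLOWED` are invented placeholders, and `mergeSelections` is a hypothetical helper rather than the existing `handleToolCall` logic.

```typescript
// Hypothetical predefined values; the real lists belong to the configurator.
const ALLOWED = {
  services: ["website", "automation", "branding"],
  industry: ["hospitality", "retail", "professional-services"],
  timeline: ["asap", "1-3-months", "flexible"],
};

interface Selections {
  services: string[];
  industry?: string;
  timeline?: string;
}

function isAllowed(list: string[], value: unknown): value is string {
  return typeof value === "string" && list.indexOf(value) !== -1;
}

// Merge one update_selections call into state. Unknown values are dropped and
// fields the agent did not mention are left exactly as they were.
function mergeSelections(current: Selections, args: Record<string, unknown>): Selections {
  const next: Selections = { ...current, services: current.services.slice() };
  const services = args.services;
  if (Array.isArray(services)) {
    for (const candidate of services) {
      if (isAllowed(ALLOWED.services, candidate) && next.services.indexOf(candidate) === -1) {
        next.services.push(candidate);
      }
    }
  }
  const industry = args.industry;
  if (isAllowed(ALLOWED.industry, industry)) next.industry = industry;
  const timeline = args.timeline;
  if (isAllowed(ALLOWED.timeline, timeline)) next.timeline = timeline;
  return next;
}
```

Because nothing is required, a conversation that never touches timeline simply leaves `timeline` undefined.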

### Brief Generation

`conversationSummary` is the **primary payload**. The prompt instructs the agent to include everything discussed: pain points, current tools, what they want to keep vs change, business context, decision-makers, what success looks like. Structured fields (`services`, `industry`, `timeline`) are metadata that helps organize the brief, not the substance.
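To make the "summary first, fields as metadata" relationship concrete, here is a minimal sketch of how the brief input might be assembled. The `BriefInput` shape and `buildBriefInput` are assumptions. The actual payload accepted by `/api/configure` is not described in this spec.

```typescript
interface BriefMetadata {
  services?: string[];
  industry?: string;
  timeline?: string;
}

interface BriefInput {
  conversationSummary: string; // the substance of the brief
  metadata: BriefMetadata;     // optional organizing hints
}

// Hypothetical builder: the summary is mandatory, every structured field is optional.
function buildBriefInput(summary: string, selections: BriefMetadata): BriefInput {
  const conversationSummary = summary.trim();
  if (conversationSummary === "") {
    throw new Error("conversationSummary is required: it is the substance of the brief");
  }
  const metadata: BriefMetadata = {};
  if (selections.services && selections.services.length > 0) {
    metadata.services = selections.services.slice();
  }
  if (selections.industry) metadata.industry = selections.industry;
  if (selections.timeline) metadata.timeline = selections.timeline;
  return { conversationSummary, metadata };
}
```

Empty structured fields are omitted rather than sent as blanks, so the brief generator never has to distinguish "not discussed" from "discussed but empty".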

### Brief Content Philosophy

The brief should be **diagnostic, not prescriptive:**
- **Deep on their world** — pain points, current tools, what's broken, customers, what success looks like.
- **Deep on what matters** — priorities and trade-offs surfaced in conversation.
- **LetsBe's perspective** — a few sentences of informed opinion on what the real problem is.
- **High-level on implementation** — no stack recommendations, no architecture, no specific deliverables.
- **No timeline/cost** — "that's what the call is for."

The brief should make the user feel understood and make the follow-up call feel like a warm continuation, not a cold intro.

### Contact Collection

The agent asks for name and email when the conversation reaches a natural conclusion — "I think I've got a great picture of what you need. Let me put a brief together — what's your name and email?" No forced timing. The `request_contact` tool surfaces the on-screen card for verification.

### Language

Both EN and FR system prompts, same as now.

## Reconnection Handling

Exploratory conversations run longer than form-filling. If the WebSocket drops mid-conversation:

- Preserve the transcript on disconnect.
- Show a "reconnect" option instead of just an error.
- On reconnect, seed the new Gemini session with the transcript so far (as context in the system prompt or initial message) so the agent can pick up where it left off.
- The structured selections captured so far are preserved in state.
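The re-seeding step might be sketched as below. `TranscriptEntry`, `buildResumeContext`, the instruction wording, and the `maxChars` budget are all assumptions. In particular, the spec does not say whether the transcript should be truncated, so treat the budget as an optional safeguard.

```typescript
interface TranscriptEntry {
  role: "user" | "agent";
  text: string;
}

// Build the context block to prepend to the system prompt (or send as the first
// message) when opening a new session after a drop. Keeps the most recent turns
// that fit in maxChars, always including at least the latest one.
function buildResumeContext(transcript: TranscriptEntry[], maxChars: number = 6000): string {
  const lines: string[] = [];
  let used = 0;
  for (let i = transcript.length - 1; i >= 0; i--) {
    const entry = transcript[i];
    const line = `${entry.role === "user" ? "User" : "Agent"}: ${entry.text.trim()}`;
    if (used + line.length > maxChars && lines.length > 0) break;
    lines.unshift(line); // walk backwards, but keep chronological order
    used += line.length + 1;
  }
  if (lines.length === 0) return "";
  return [
    "The connection dropped and this session resumes an earlier conversation.",
    "Transcript so far:",
    ...lines,
    "Pick up where the conversation left off. Do not re-introduce yourself or repeat questions already answered.",
  ].join("\n");
}
```

An empty transcript yields an empty string, so a first-time connection and a reconnect can share the same session setup code.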

## Technical Changes

### Files to Modify

- **`VoiceAgentProvider.tsx`** — Refactor `handleToolCall` so `conversationSummary` is the primary brief input. Add state for contact confirmation card (name + email captured, pending user confirm). Add reconnection logic (preserve transcript, re-seed on reconnect). Connection teardown on brief completion already fixed.

- **`VoiceAgent.tsx`** — New layout: larger orb, bigger transcript area, no selection chips. Add contact confirmation card component (inline editable name + email). Fix autoscroll with `scrollIntoView`. Guard controls for brief-complete state (already done). Mobile-responsive layout.

- **`gemini-live.ts`** — Rewrite `buildSystemPrompt()` for both locales with consultative tone. Adjust `complete_brief` tool description to emphasize `conversationSummary`. Add `request_contact` tool declaration that surfaces the on-screen card.

- **`WizardContainer.tsx`** — Remove `ModeToggle` component import, `mode` state, and the voice mode rendering branch. Remove `handleVoiceComplete` and `VoiceAgentProvider` wrapper (these move to the new section).

- **`ModeToggle.tsx`** — Delete entirely.

- **New: Discovery section component** — New section component for the landing page with warm copy, CTA, and expandable voice panel. This is where `VoiceAgentProvider` and `VoiceAgent` now live.

- **Landing page** — Add the new discovery section at the appropriate position.

- **i18n message files** (`en.json`, `fr.json`) — Add translations for discovery section copy. Update voice-related strings as needed.

- **Email template** — Verify the brief email template handles longer, more narrative content gracefully. Adjust if needed.

### What Stays the Same

- WebSocket connection to Gemini Live API
- Audio worklet recording + playback pipeline
- `update_selections` tool (used silently now)
- `/api/configure` route and brief generation logic
- `/api/gemini-token` route
- `StepComplete` component
- `analyze_website` tool (still useful when someone mentions their current site)
- The typed configurator (minus the mode toggle)
docs: add voice discovery mode design spec

Captures the pivot from form-filling voice mode to a standalone consultative discovery experience with separate entry point, rewritten system prompt, on-screen contact verification, and reconnection handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-01 13:46:47 -04:00