Captures the pivot from form-filling voice mode to a standalone consultative discovery experience with separate entry point, rewritten system prompt, on-screen contact verification, and reconnection handling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Voice Discovery Mode — Design Spec
Overview
Pivot the voice mode from a "faster way to fill out the configurator" into a standalone consultative discovery experience. Exploratory users — people who don't yet know exactly what they need — get a warm, conversational entry point separate from the typed configurator. The conversation is free-flowing and consultant-like, structured data is captured silently, and the user receives a personalized brief at the end.
Entry Point & Framing
Placement
A new standalone section on the landing page, positioned after the services or process section — wherever the natural "I'm interested but not sure" moment occurs. Completely decoupled from the configurator.
Copy Direction
- Headline: Warm and inviting — "Not sure where to start?" or "Still figuring out what you need?"
- Subtext: "Tell us what you're thinking and we'll figure it out together. You'll get a personalized brief at the end."
- CTA button: "Let's talk" — styled distinctly from the configurator's CTA.
- Both EN and FR translations required.
Behavior
Clicking the CTA scrolls to / reveals the voice conversation panel inline on the page. No route change, no modal. The panel expands in place, keeping the user grounded in the site context.
Configurator Changes
- Remove the ModeToggle component and mode state from WizardContainer.tsx.
- The configurator becomes typed-form-only: the "I know what I want" path.
- No other changes to the configurator itself.
Voice Conversation UI
Layout
A dedicated panel, roughly the same width as the configurator card but taller. Three zones stacked vertically:
- Agent header: LetsBe branding mark, agent name, connection status dot. Similar to current but slightly more prominent.
- Orb + transcript area: Orb is larger (24-28 units instead of 20). Live transcript below it with significantly more vertical space (max-h-72 or similar instead of current max-h-40). Proper autoscroll using scrollIntoView on the bottom ref. Selection chips are removed, so there is no visible evidence of structured data capture.
- Controls: Mic toggle and end call button. Same as current, cleaner without chips.
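The autoscroll behavior for the transcript could look roughly like the sketch below. It is illustrative only: `ScrollTarget` and `scrollTranscriptToBottom` are hypothetical names, and the helper is assumed to be called from an effect that runs whenever the transcript grows, with the bottom ref's current element passed in.

```typescript
// Minimal structural type so the helper works with any DOM element
// (or a test double) that exposes scrollIntoView.
interface ScrollTarget {
  scrollIntoView(options?: {
    behavior?: "auto" | "smooth";
    block?: "start" | "center" | "end" | "nearest";
  }): void;
}

// Hypothetical helper: scroll the sentinel element at the end of the
// transcript into view. Returns false when the ref is not mounted yet.
function scrollTranscriptToBottom(bottom: ScrollTarget | null): boolean {
  if (bottom === null) return false;
  // "nearest" scrolls only as far as needed to reveal the sentinel.
  bottom.scrollIntoView({ behavior: "smooth", block: "nearest" });
  return true;
}
```

Taking the element as an argument, rather than reading a ref inside the helper, keeps it testable without a DOM.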
Mobile
Panel goes nearly full-width on small screens. Transcript takes most of the viewport height. Orb may scale down slightly. Controls stay fixed at the bottom for thumb reach.
Contact Confirmation Card
When the agent captures name and email, a small inline card appears (above controls or below transcript) showing the captured values with inline edit affordance. The agent says "I've got your details on screen — look right?" User can tap to edit, then confirm. This replaces the verbal spell-back entirely.
Requires a new tool (e.g., request_contact) that the agent calls to surface the card, rather than collecting contact info verbally.
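The confirmation flow can be sketched as a small state reducer. This is a minimal sketch, not the actual provider code: the `ContactCardState` shape, the action names, and the loose email check are all assumptions made for illustration; only the `request_contact` tool name comes from this spec.

```typescript
// Hypothetical state for the on-screen contact confirmation card.
type ContactCardState =
  | { status: "hidden" }
  | { status: "pending"; name: string; email: string }
  | { status: "confirmed"; name: string; email: string };

// Hypothetical actions: the agent's request_contact call, a user edit, a user confirm.
type ContactCardAction =
  | { type: "request_contact"; name: string; email: string }
  | { type: "edit"; field: "name" | "email"; value: string }
  | { type: "confirm" };

// Deliberately loose check (assumption): something@something.tld, no whitespace.
function looksLikeEmail(value: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value.trim());
}

function contactCardReducer(
  state: ContactCardState,
  action: ContactCardAction
): ContactCardState {
  switch (action.type) {
    case "request_contact":
      // The agent surfaces the card with what it heard; the user verifies on screen.
      return { status: "pending", name: action.name.trim(), email: action.email.trim() };
    case "edit":
      // Edits only apply while the card is awaiting confirmation.
      if (state.status !== "pending") return state;
      return action.field === "name"
        ? { ...state, name: action.value }
        : { ...state, email: action.value };
    case "confirm":
      // Refuse to confirm an empty name or a malformed email.
      if (state.status !== "pending") return state;
      if (state.name.trim() === "" || !looksLikeEmail(state.email)) return state;
      return { status: "confirmed", name: state.name.trim(), email: state.email.trim() };
  }
}
```

Keeping `confirmed` as a separate status gives the brief-generation step an unambiguous signal that the user, not the agent, approved the details.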
During Brief Generation
After the user confirms their contact details and the agent triggers complete_brief:
- Connection is closed (already fixed).
- Panel transitions to a generating state — orb morphs to loader or StepGenerating-style progress indicators.
- Transcript remains visible so the conversation doesn't vanish.
On Completion
Transitions to the same StepComplete view (brief preview + book a call CTA). The brief content will be richer due to deeper conversation, but presentation is the same.
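One way to keep these panel transitions explicit is a small phase table. The sketch below is an assumption about how the states might be modeled; the phase and event names are hypothetical and do not come from the existing code.

```typescript
// Hypothetical phase and event names for the voice panel.
type PanelPhase = "idle" | "connecting" | "conversation" | "generating" | "complete";
type PanelEvent = "cta_clicked" | "connected" | "brief_requested" | "brief_ready";

// Allowed transitions; any event not listed for a phase is ignored.
const TRANSITIONS: Record<PanelPhase, Partial<Record<PanelEvent, PanelPhase>>> = {
  idle: { cta_clicked: "connecting" },
  connecting: { connected: "conversation" },
  conversation: { brief_requested: "generating" },
  generating: { brief_ready: "complete" },
  complete: {},
};

function nextPhase(phase: PanelPhase, event: PanelEvent): PanelPhase {
  return TRANSITIONS[phase][event] ?? phase;
}

// The transcript is shown during the conversation and while the brief is generating.
function showsTranscript(phase: PanelPhase): boolean {
  return phase === "conversation" || phase === "generating";
}
```

Ignoring unlisted events means a late `connected` message arriving during generation cannot pull the panel back into the conversation view.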
System Prompt & Agent Behavior
Tone
The agent is a conversational consultant, not an interviewer with a checklist. No numbered topic list to work through. The prompt gives the agent a goal: "understand what this person needs deeply enough to write a compelling brief."
Behavioral Guidelines
- Follow the user's thread. If they talk about a frustration, dig into it. Don't redirect to the next "topic."
- One question at a time. This stays — it works.
- Offer perspective, not just questions. "That sounds like it might be more of a systems problem than a website problem." The agent has opinions, not just a clipboard.
- Reference LetsBe naturally. "We've done something similar for a hospitality client" — not a feature list.
- 2-3 sentences per response. Prevents monologuing.
Structured Data Capture
update_selections tool stays. The agent is never instructed to "cover these topics." It maps what it hears to predefined values silently. If the conversation never touches timeline, that field stays empty — that's fine.
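Silent capture implies that each tool call is a partial update and that missing fields are never treated as errors. A minimal sketch of that merge, assuming a hypothetical `Selections` shape (the real field names and allowed values live in the configurator):

```typescript
// Hypothetical shape; every field is optional because the conversation
// may never touch it.
interface Selections {
  services?: string[];
  industry?: string;
  timeline?: string;
}

// Merge one update_selections tool call into the selections captured so far.
// Fields the agent did not mention are left untouched, and empty values
// never overwrite something captured earlier.
function mergeSelections(current: Selections, update: Selections): Selections {
  const next: Selections = { ...current };
  if (update.services && update.services.length > 0) {
    // Union the services, keeping first-seen order and dropping duplicates.
    next.services = Array.from(new Set([...(current.services ?? []), ...update.services]));
  }
  if (update.industry) next.industry = update.industry;
  if (update.timeline) next.timeline = update.timeline;
  return next;
}
```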
Brief Generation
conversationSummary is the primary payload. The prompt instructs the agent to include everything discussed: pain points, current tools, what they want to keep vs change, business context, decision-makers, what success looks like. Structured fields (services, industry, timeline) are metadata that helps organize the brief, not the substance.
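The split between substance and metadata could be enforced where the complete_brief tool call is handled. The sketch below is an assumption about that handler: `CompleteBriefArgs`, `BriefRequest`, and `toBriefRequest` are hypothetical names, and the actual payload expected by the brief generation route may differ.

```typescript
// Hypothetical arguments of the complete_brief tool call.
interface CompleteBriefArgs {
  conversationSummary?: string;
  services?: string[];
  industry?: string;
  timeline?: string;
}

// Hypothetical request shape: the summary is required, metadata is optional.
interface BriefRequest {
  conversationSummary: string;
  metadata: { services?: string[]; industry?: string; timeline?: string };
}

function toBriefRequest(args: CompleteBriefArgs): BriefRequest {
  const summary = (args.conversationSummary ?? "").trim();
  if (summary === "") {
    // Without a summary there is nothing of substance to write a brief from.
    throw new Error("complete_brief called without a conversationSummary");
  }
  // Only include structured fields that were actually captured.
  const metadata: BriefRequest["metadata"] = {};
  if (args.services && args.services.length > 0) metadata.services = args.services;
  if (args.industry) metadata.industry = args.industry;
  if (args.timeline) metadata.timeline = args.timeline;
  return { conversationSummary: summary, metadata };
}
```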
Brief Content Philosophy
The brief should be diagnostic, not prescriptive:
- Deep on their world — pain points, current tools, what's broken, customers, what success looks like.
- Deep on what matters — priorities and trade-offs surfaced in conversation.
- LetsBe's perspective — a few sentences of informed opinion on what the real problem is.
- High-level on implementation — no stack recommendations, no architecture, no specific deliverables.
- No timeline/cost — "that's what the call is for."
The brief should make the user feel understood and make the follow-up call feel like a warm continuation, not a cold intro.
Contact Collection
The agent asks for name and email when the conversation reaches a natural conclusion — "I think I've got a great picture of what you need. Let me put a brief together — what's your name and email?" No forced timing. The request_contact tool surfaces the on-screen card for verification.
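A declaration for the new tool might look like the sketch below. It is illustrative only: `ToolDeclaration` is a simplified local type, the description wording is made up, and the exact declaration format should be checked against the existing tools in gemini-live.ts.

```typescript
// Local, simplified type for a tool declaration (assumption: the real
// declarations in gemini-live.ts may use SDK-provided types instead).
interface ToolDeclaration {
  name: string;
  description: string;
  parameters: {
    type: "OBJECT";
    properties: Record<string, { type: "STRING"; description: string }>;
    required: string[];
  };
}

// Hypothetical declaration for the request_contact tool.
const requestContactTool: ToolDeclaration = {
  name: "request_contact",
  description:
    "Show the on-screen contact card so the user can verify their name and email. " +
    "Call this once the conversation has reached a natural conclusion, " +
    "instead of spelling the details back verbally.",
  parameters: {
    type: "OBJECT",
    properties: {
      name: { type: "STRING", description: "The user's name as heard in conversation." },
      email: { type: "STRING", description: "The user's email address as heard in conversation." },
    },
    required: ["name", "email"],
  },
};
```

Putting the timing guidance in the tool description, not only in the system prompt, keeps the "no forced timing" rule next to the tool it governs.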
Language
Both EN and FR system prompts, same as now.
Reconnection Handling
Exploratory conversations run longer than form-filling. If the WebSocket drops mid-conversation:
- Preserve the transcript on disconnect.
- Show a "reconnect" option instead of just an error.
- On reconnect, seed the new Gemini session with the transcript so far (as context in the system prompt or initial message) so the agent can pick up where it left off.
- The structured selections captured so far are preserved in state.
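Re-seeding on reconnect can be sketched as a pure function over the preserved transcript. The names here (`TranscriptEntry`, `buildReconnectContext`) and the character budget are assumptions for illustration; how the resulting text is passed to the new session (system prompt or initial message) is left open, as above.

```typescript
// Hypothetical transcript entry; the real provider may store more fields.
interface TranscriptEntry {
  speaker: "user" | "agent";
  text: string;
}

// Build a context block for a session opened after a dropped connection.
// maxChars is an assumed budget, not a documented limit.
function buildReconnectContext(transcript: TranscriptEntry[], maxChars = 8000): string {
  const lines = transcript
    .filter((entry) => entry.text.trim() !== "")
    .map((entry) => `${entry.speaker === "user" ? "User" : "Agent"}: ${entry.text.trim()}`);

  // Walk backwards so the most recent turns survive if the budget is exceeded.
  const kept: string[] = [];
  let used = 0;
  for (const line of lines.slice().reverse()) {
    const cost = line.length + 1; // +1 for the newline
    if (used + cost > maxChars) break;
    kept.unshift(line);
    used += cost;
  }
  if (kept.length === 0) return "";

  return [
    "The connection dropped and has been restored. Conversation so far:",
    ...kept,
    "Continue from where the conversation left off; do not re-introduce yourself.",
  ].join("\n");
}
```

Trimming from the oldest turns first is a design choice: the latest exchanges matter most for picking up the thread, while the structured selections preserved in state already cover earlier facts.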
Technical Changes
Files to Modify
- VoiceAgentProvider.tsx: Refactor handleToolCall so conversationSummary is the primary brief input. Add state for contact confirmation card (name + email captured, pending user confirm). Add reconnection logic (preserve transcript, re-seed on reconnect). Connection teardown on brief completion already fixed.
- VoiceAgent.tsx: New layout with larger orb, bigger transcript area, no selection chips. Add contact confirmation card component (inline editable name + email). Fix autoscroll with scrollIntoView. Guard controls for brief-complete state (already done). Mobile-responsive layout.
- gemini-live.ts: Rewrite buildSystemPrompt() for both locales with consultative tone. Adjust complete_brief tool description to emphasize conversationSummary. Add request_contact tool declaration that surfaces the on-screen card.
- WizardContainer.tsx: Remove ModeToggle component import, mode state, and the voice mode rendering branch. Remove handleVoiceComplete and VoiceAgentProvider wrapper (these move to the new section).
- ModeToggle.tsx: Delete entirely.
- New: Discovery section component. New section component for the landing page with warm copy, CTA, and expandable voice panel. This is where VoiceAgentProvider and VoiceAgent now live.
- Landing page: Add the new discovery section at the appropriate position.
- i18n message files (en.json, fr.json): Add translations for discovery section copy. Update voice-related strings as needed.
- Email template: Verify the brief email template handles longer, more narrative content gracefully. Adjust if needed.
What Stays the Same
- WebSocket connection to Gemini Live API
- Audio worklet recording + playback pipeline
- update_selections tool (used silently now)
- /api/configure route and brief generation logic
- /api/gemini-token route
- StepComplete component
- analyze_website tool (still useful when someone mentions their current site)
- The typed configurator (minus the mode toggle)