LetsBeBiz-Site/docs/superpowers/specs/2026-04-01-voice-discovery-pivot-design.md
Matt 81675335ad docs: add voice discovery mode design spec
Captures the pivot from form-filling voice mode to a standalone
consultative discovery experience with separate entry point, rewritten
system prompt, on-screen contact verification, and reconnection handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 13:46:47 -04:00

Voice Discovery Mode — Design Spec

Overview

Pivot the voice mode from a "faster way to fill out the configurator" into a standalone consultative discovery experience. Exploratory users — people who don't yet know exactly what they need — get a warm, conversational entry point separate from the typed configurator. The conversation is free-flowing and consultant-like, structured data is captured silently, and the user receives a personalized brief at the end.

Entry Point & Framing

Placement

A new standalone section on the landing page, positioned after the services or process section — wherever the natural "I'm interested but not sure" moment occurs. Completely decoupled from the configurator.

Copy Direction

  • Headline: Warm and inviting — "Not sure where to start?" or "Still figuring out what you need?"
  • Subtext: "Tell us what you're thinking and we'll figure it out together. You'll get a personalized brief at the end."
  • CTA button: "Let's talk" — styled distinctly from the configurator's CTA.
  • Both EN and FR translations required.

Behavior

Clicking the CTA scrolls to / reveals the voice conversation panel inline on the page. No route change, no modal. The panel expands in place, keeping the user grounded in the site context.

Configurator Changes

  • Remove the ModeToggle component and mode state from WizardContainer.tsx.
  • The configurator becomes typed-form-only — the "I know what I want" path.
  • No other changes to the configurator itself.

Voice Conversation UI

Layout

A dedicated panel, roughly the same width as the configurator card but taller. Three zones stacked vertically:

  1. Agent header — LetsBe branding mark, agent name, connection status dot. Similar to current but slightly more prominent.

  2. Orb + transcript area — Orb is larger (24-28 units instead of 20). Live transcript below it with significantly more vertical space (max-h-72 or similar instead of current max-h-40). Proper autoscroll using scrollIntoView on the bottom ref. Selection chips are removed — no visible evidence of structured data capture.

  3. Controls — Mic toggle and end call button. Same as current, cleaner without chips.
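The autoscroll mentioned in zone 2 could look roughly like the sketch below. It assumes a sentinel element rendered at the bottom of the transcript; `ScrollTarget` and `scrollTranscriptToBottom` are hypothetical names, not existing code in the repo.

```typescript
// Minimal structural type so the helper can be exercised without a DOM.
// In the browser, the element behind the bottom ref would be passed in.
interface ScrollTarget {
  scrollIntoView(options?: {
    behavior?: "auto" | "smooth";
    block?: "start" | "center" | "end" | "nearest";
  }): void;
}

// Scrolls the sentinel at the end of the transcript into view.
// Returns false when the ref is not mounted yet, true otherwise.
function scrollTranscriptToBottom(
  bottom: ScrollTarget | null,
  smooth: boolean = true
): boolean {
  if (!bottom) return false;
  bottom.scrollIntoView({ behavior: smooth ? "smooth" : "auto", block: "end" });
  return true;
}
```

In a React component this would typically be called from an effect that re-runs whenever the transcript grows, passing the current value of the bottom ref.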

Mobile

Panel goes nearly full-width on small screens. Transcript takes most of the viewport height. Orb may scale down slightly. Controls stay fixed at the bottom for thumb reach.

Contact Confirmation Card

When the agent captures name and email, a small inline card appears (above controls or below transcript) showing the captured values with inline edit affordance. The agent says "I've got your details on screen — look right?" User can tap to edit, then confirm. This replaces the verbal spell-back entirely.

Requires a new tool (e.g., request_contact) that the agent calls to surface the card, so that contact details are verified on screen rather than spelled back verbally.
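One possible shape for the tool and the card state is sketched below. Everything here is an assumption for illustration: the `ToolDeclaration` type only mirrors the general name/description/parameters pattern of function declarations and should be checked against the Gemini SDK version in use, and `ContactCard`, `ContactAction`, `contactReducer`, and `isPlausibleEmail` are hypothetical names.

```typescript
// Hypothetical declaration shape; verify field names against the SDK in use.
interface ToolDeclaration {
  name: string;
  description: string;
  parameters: {
    type: "OBJECT";
    properties: Record<string, { type: "STRING"; description: string }>;
    required: string[];
  };
}

const requestContactTool: ToolDeclaration = {
  name: "request_contact",
  description:
    "Show the on-screen contact card with the name and email the user gave, so they can correct and confirm them visually.",
  parameters: {
    type: "OBJECT",
    properties: {
      name: { type: "STRING", description: "Name as heard by the agent" },
      email: { type: "STRING", description: "Email as heard by the agent" },
    },
    required: ["name", "email"],
  },
};

// Card state: hidden until the tool is called, pending until the user confirms.
type ContactCard =
  | { status: "hidden" }
  | { status: "pending"; name: string; email: string }
  | { status: "confirmed"; name: string; email: string };

type ContactAction =
  | { type: "request"; name: string; email: string }
  | { type: "edit"; field: "name" | "email"; value: string }
  | { type: "confirm" };

// Deliberately loose check; real validation rules are a product decision.
function isPlausibleEmail(value: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value.trim());
}

function contactReducer(state: ContactCard, action: ContactAction): ContactCard {
  switch (action.type) {
    case "request":
      return { status: "pending", name: action.name.trim(), email: action.email.trim() };
    case "edit":
      if (state.status !== "pending") return state;
      return action.field === "name"
        ? { ...state, name: action.value }
        : { ...state, email: action.value };
    case "confirm":
      // Refuse to confirm an empty name or an implausible email.
      if (state.status !== "pending") return state;
      if (state.name.trim() === "" || !isPlausibleEmail(state.email)) return state;
      return { status: "confirmed", name: state.name.trim(), email: state.email.trim() };
  }
}
```

Keeping the card as a small reducer makes the "pending user confirm" state explicit, which is the state VoiceAgentProvider would need to track before the brief can be triggered.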

During Brief Generation

After the contact details are confirmed and complete_brief is triggered:

  • Connection is closed (teardown on brief completion is already fixed).
  • Panel transitions to a generating state — orb morphs to loader or StepGenerating-style progress indicators.
  • Transcript remains visible so the conversation doesn't vanish.

On Completion

Transitions to the same StepComplete view (brief preview + book a call CTA). The brief content will be richer due to deeper conversation, but presentation is the same.
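The panel transitions described in the two subsections above can be sketched as a small state machine. This is only an illustration under stated assumptions: the phase and event names are hypothetical, and hiding the transcript once the StepComplete view takes over is an assumption rather than something the spec states.

```typescript
type PanelPhase = "conversation" | "generating" | "complete";
type PanelEvent = "completeBriefCalled" | "briefReady";

interface PanelState {
  phase: PanelPhase;
  connectionOpen: boolean;
  transcriptVisible: boolean;
}

const initialPanelState: PanelState = {
  phase: "conversation",
  connectionOpen: true,
  transcriptVisible: true,
};

// Pure transition function: unknown phase/event combinations are ignored.
function nextPanelState(state: PanelState, event: PanelEvent): PanelState {
  if (state.phase === "conversation" && event === "completeBriefCalled") {
    // Close the connection, show the loader, keep the transcript on screen.
    return { phase: "generating", connectionOpen: false, transcriptVisible: true };
  }
  if (state.phase === "generating" && event === "briefReady") {
    // Assumption: the StepComplete view replaces the transcript here.
    return { phase: "complete", connectionOpen: false, transcriptVisible: false };
  }
  return state;
}
```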

System Prompt & Agent Behavior

Tone

The agent is a conversational consultant, not an interviewer with a checklist. No numbered topic list to work through. The prompt gives the agent a goal: "understand what this person needs deeply enough to write a compelling brief."

Behavioral Guidelines

  • Follow the user's thread. If they talk about a frustration, dig into it. Don't redirect to the next "topic."
  • One question at a time. This stays — it works.
  • Offer perspective, not just questions. "That sounds like it might be more of a systems problem than a website problem." The agent has opinions, not just a clipboard.
  • Reference LetsBe naturally. "We've done something similar for a hospitality client" — not a feature list.
  • 2-3 sentences per response. Prevents monologuing.

Structured Data Capture

The update_selections tool stays, but the agent is never instructed to "cover these topics." It silently maps what it hears to predefined values. If the conversation never touches timeline, that field stays empty, and that's fine.
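A minimal sketch of silent capture, assuming the tool arguments arrive as a loosely typed object. The `ALLOWED` values are placeholders invented for this example (the real predefined values live in the configurator), and `applySelectionUpdate` is a hypothetical helper name.

```typescript
// Placeholder vocabularies; the real predefined values are not listed in this spec.
const ALLOWED = {
  services: ["website", "automation", "branding"],
  industry: ["hospitality", "retail", "professional-services"],
  timeline: ["asap", "1-3-months", "flexible"],
} as const;

interface Selections {
  services: string[];
  industry?: string;
  timeline?: string;
}

function isAllowed(field: keyof typeof ALLOWED, value: string): boolean {
  return (ALLOWED[field] as readonly string[]).includes(value);
}

// Merges a tool call into the current selections.
// Unknown values are dropped; fields the agent never mentions stay empty.
function applySelectionUpdate(
  prev: Selections,
  args: Record<string, unknown>
): Selections {
  const next: Selections = { ...prev, services: [...prev.services] };
  if (Array.isArray(args.services)) {
    for (const s of args.services) {
      if (typeof s === "string" && isAllowed("services", s) && !next.services.includes(s)) {
        next.services.push(s);
      }
    }
  }
  if (typeof args.industry === "string" && isAllowed("industry", args.industry)) {
    next.industry = args.industry;
  }
  if (typeof args.timeline === "string" && isAllowed("timeline", args.timeline)) {
    next.timeline = args.timeline;
  }
  return next;
}
```

Because the merge never fills in defaults, an untouched field such as timeline simply remains undefined through to brief generation.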

Brief Generation

conversationSummary is the primary payload. The prompt instructs the agent to include everything discussed: pain points, current tools, what they want to keep vs change, business context, decision-makers, what success looks like. Structured fields (services, industry, timeline) are metadata that helps organize the brief, not the substance.
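To make the "summary first, structured fields as metadata" idea concrete, here is a sketch of how the payload might be assembled. The payload shape sent to /api/configure is not given in this spec, so `BriefInput`, `BriefPayload`, and `buildBriefPayload` are assumptions for illustration only.

```typescript
interface BriefInput {
  conversationSummary: string;
  contact: { name: string; email: string };
  locale: "en" | "fr";
  selections: { services?: string[]; industry?: string; timeline?: string };
}

interface BriefPayload {
  conversationSummary: string;
  contact: { name: string; email: string };
  locale: "en" | "fr";
  // Optional organizing metadata; never required for a brief to be generated.
  metadata: { services?: string[]; industry?: string; timeline?: string };
}

function buildBriefPayload(input: BriefInput): BriefPayload {
  const summary = input.conversationSummary.trim();
  // The summary is the substance of the brief, so it is the only hard requirement.
  if (summary.length === 0) {
    throw new Error("conversationSummary is required");
  }
  const metadata: BriefPayload["metadata"] = {};
  const { services, industry, timeline } = input.selections;
  if (services && services.length > 0) metadata.services = [...services];
  if (industry) metadata.industry = industry;
  if (timeline) metadata.timeline = timeline;
  return {
    conversationSummary: summary,
    contact: input.contact,
    locale: input.locale,
    metadata,
  };
}
```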

Brief Content Philosophy

The brief should be diagnostic, not prescriptive:

  • Deep on their world — pain points, current tools, what's broken, customers, what success looks like.
  • Deep on what matters — priorities and trade-offs surfaced in conversation.
  • LetsBe's perspective — a few sentences of informed opinion on what the real problem is.
  • High-level on implementation — no stack recommendations, no architecture, no specific deliverables.
  • No timeline/cost — "that's what the call is for."

The brief should make the user feel understood and make the follow-up call feel like a warm continuation, not a cold intro.

Contact Collection

The agent asks for name and email when the conversation reaches a natural conclusion — "I think I've got a great picture of what you need. Let me put a brief together — what's your name and email?" No forced timing. The request_contact tool surfaces the on-screen card for verification.

Language

Both EN and FR system prompts, same as now.

Reconnection Handling

Exploratory conversations run longer than form-filling. If the WebSocket drops mid-conversation:

  • Preserve the transcript on disconnect.
  • Show a "reconnect" option instead of just an error.
  • On reconnect, seed the new Gemini session with the transcript so far (as context in the system prompt or initial message) so the agent can pick up where it left off.
  • The structured selections captured so far are preserved in state.
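The re-seeding step could be sketched as below. This assumes the transcript is kept as an array of turns and that the seed is passed as plain text; `TranscriptTurn`, `buildReconnectSeed`, the instruction wording, and the `maxChars` budget (which keeps only the most recent turns) are all hypothetical choices, not part of the spec.

```typescript
interface TranscriptTurn {
  role: "user" | "agent";
  text: string;
}

// Builds a text block to hand to the new session so the agent can continue.
// Keeps the most recent turns that fit within maxChars (always at least one).
function buildReconnectSeed(
  turns: TranscriptTurn[],
  selections: Record<string, unknown>,
  maxChars: number = 6000
): string {
  const kept: string[] = [];
  let used = 0;
  // Walk backwards from the newest turn so older context is dropped first.
  for (const turn of [...turns].reverse()) {
    const line = `${turn.role === "user" ? "User" : "Agent"}: ${turn.text.trim()}`;
    if (used + line.length > maxChars && kept.length > 0) break;
    kept.push(line);
    used += line.length;
  }
  kept.reverse(); // restore chronological order
  return [
    "The connection dropped and was re-established. Continue the conversation; do not restart or re-introduce yourself.",
    `Selections captured so far: ${JSON.stringify(selections)}`,
    "Transcript so far:",
    ...kept,
  ].join("\n");
}
```

Whether the seed belongs in the system prompt or in an initial message is left open above; the same string works for either.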

Technical Changes

Files to Modify

  • VoiceAgentProvider.tsx — Refactor handleToolCall so conversationSummary is the primary brief input. Add state for contact confirmation card (name + email captured, pending user confirm). Add reconnection logic (preserve transcript, re-seed on reconnect). Connection teardown on brief completion already fixed.

  • VoiceAgent.tsx — New layout: larger orb, bigger transcript area, no selection chips. Add contact confirmation card component (inline editable name + email). Fix autoscroll with scrollIntoView. Guard controls for brief-complete state (already done). Mobile-responsive layout.

  • gemini-live.ts — Rewrite buildSystemPrompt() for both locales with consultative tone. Adjust complete_brief tool description to emphasize conversationSummary. Add request_contact tool declaration that surfaces the on-screen card.

  • WizardContainer.tsx — Remove ModeToggle component import, mode state, and the voice mode rendering branch. Remove handleVoiceComplete and VoiceAgentProvider wrapper (these move to the new section).

  • ModeToggle.tsx — Delete entirely.

  • New: Discovery section component — New section component for the landing page with warm copy, CTA, and expandable voice panel. This is where VoiceAgentProvider and VoiceAgent now live.

  • Landing page — Add the new discovery section at the appropriate position.

  • i18n message files (en.json, fr.json) — Add translations for discovery section copy. Update voice-related strings as needed.

  • Email template — Verify the brief email template handles longer, more narrative content gracefully. Adjust if needed.

What Stays the Same

  • WebSocket connection to Gemini Live API
  • Audio worklet recording + playback pipeline
  • update_selections tool (used silently now)
  • /api/configure route and brief generation logic
  • /api/gemini-token route
  • StepComplete component
  • analyze_website tool (still useful when someone mentions their current site)
  • The typed configurator (minus the mode toggle)