From 81675335ad81d49e480d8bec7bd7356c5a1b7ee1 Mon Sep 17 00:00:00 2001
From: Matt
Date: Wed, 1 Apr 2026 13:46:47 -0400
Subject: [PATCH] docs: add voice discovery mode design spec

Captures the pivot from form-filling voice mode to a standalone
consultative discovery experience with separate entry point,
rewritten system prompt, on-screen contact verification, and
reconnection handling.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 ...2026-04-01-voice-discovery-pivot-design.md | 144 ++++++++++++++++++
 1 file changed, 144 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-04-01-voice-discovery-pivot-design.md

diff --git a/docs/superpowers/specs/2026-04-01-voice-discovery-pivot-design.md b/docs/superpowers/specs/2026-04-01-voice-discovery-pivot-design.md
new file mode 100644
index 0000000..d4d8980
--- /dev/null
+++ b/docs/superpowers/specs/2026-04-01-voice-discovery-pivot-design.md
@@ -0,0 +1,144 @@
+# Voice Discovery Mode: Design Spec
+
+## Overview
+
+Pivot the voice mode from a "faster way to fill out the configurator" into a standalone consultative discovery experience. Exploratory users (people who don't yet know exactly what they need) get a warm, conversational entry point separate from the typed configurator. The conversation is free-flowing and consultant-like, structured data is captured silently, and the user receives a personalized brief at the end.
+
+## Entry Point & Framing
+
+### Placement
+
+A new standalone section on the landing page, positioned after the services or process section, wherever the natural "I'm interested but not sure" moment occurs. Completely decoupled from the configurator.
+
+### Copy Direction
+
+- **Headline:** Warm and inviting: "Not sure where to start?" or "Still figuring out what you need?"
+- **Subtext:** "Tell us what you're thinking and we'll figure it out together. You'll get a personalized brief at the end."
+- **CTA button:** "Let's talk", styled distinctly from the configurator's CTA.
+- Both EN and FR translations required.
+
+### Behavior
+
+Clicking the CTA scrolls to and reveals the voice conversation panel inline on the page. No route change, no modal. The panel expands in place, keeping the user grounded in the site context.
+
+### Configurator Changes
+
+- Remove the `ModeToggle` component and `mode` state from `WizardContainer.tsx`.
+- The configurator becomes typed-form-only: the "I know what I want" path.
+- No other changes to the configurator itself.
+
+## Voice Conversation UI
+
+### Layout
+
+A dedicated panel, roughly the same width as the configurator card but taller. Three zones stacked vertically:
+
+1. **Agent header:** LetsBe branding mark, agent name, connection status dot. Similar to current but slightly more prominent.
+
+2. **Orb + transcript area:** The orb is larger (24-28 units instead of 20). The live transcript below it gets significantly more vertical space (`max-h-72` or similar instead of the current `max-h-40`). Fix autoscroll by calling `scrollIntoView` on a ref placed at the bottom of the transcript. **Selection chips are removed**, so there is no visible evidence of structured data capture.
+
+3. **Controls:** Mic toggle and end call button. Same as current, but cleaner without the chips.
+
+### Mobile
+
+The panel goes nearly full-width on small screens. The transcript takes most of the viewport height. The orb may scale down slightly. Controls stay fixed at the bottom for thumb reach.
+
+### Contact Confirmation Card
+
+When the agent captures name and email, a small inline card appears (above the controls or below the transcript) showing the captured values with an inline edit affordance. The agent says "I've got your details on screen. Look right?" The user can tap to edit, then confirm. **This replaces the verbal spell-back entirely.**
+
+Requires a new tool (e.g., `request_contact`) that the agent calls to surface the card, rather than verifying contact info verbally.
+
+### During Brief Generation
+
+After contact confirmation and the `complete_brief` trigger:
+- The connection is closed (already fixed).
+- The panel transitions to a generating state: the orb morphs into a loader, or into StepGenerating-style progress indicators.
+- The transcript remains visible so the conversation doesn't vanish.
+
+### On Completion
+
+Transitions to the same `StepComplete` view (brief preview + book a call CTA). The brief content will be richer thanks to the deeper conversation, but the presentation is the same.
+
+## System Prompt & Agent Behavior
+
+### Tone
+
+The agent is a conversational consultant, not an interviewer with a checklist. There is no numbered topic list to work through. The prompt gives the agent a goal: "understand what this person needs deeply enough to write a compelling brief."
+
+### Behavioral Guidelines
+
+- **Follow the user's thread.** If they talk about a frustration, dig into it. Don't redirect to the next "topic."
+- **One question at a time.** This stays; it works.
+- **Offer perspective, not just questions.** "That sounds like it might be more of a systems problem than a website problem." The agent has opinions, not just a clipboard.
+- **Reference LetsBe naturally.** "We've done something similar for a hospitality client," not a feature list.
+- **2-3 sentences per response.** Prevents monologuing.
+
+### Structured Data Capture
+
+The `update_selections` tool stays. The agent is never instructed to "cover these topics." It silently maps what it hears to predefined values. If the conversation never touches timeline, that field stays empty, and that's fine.
+
+### Brief Generation
+
+`conversationSummary` is the **primary payload**. The prompt instructs the agent to include everything discussed: pain points, current tools, what they want to keep vs. change, business context, decision-makers, and what success looks like. Structured fields (`services`, `industry`, `timeline`) are metadata that help organize the brief, not the substance.
+
+### Brief Content Philosophy
+
+The brief should be **diagnostic, not prescriptive:**
+- **Deep on their world:** pain points, current tools, what's broken, customers, what success looks like.
+- **Deep on what matters:** priorities and trade-offs surfaced in conversation.
+- **LetsBe's perspective:** a few sentences of informed opinion on what the real problem is.
+- **High-level on implementation:** no stack recommendations, no architecture, no specific deliverables.
+- **No timeline/cost:** "that's what the call is for."
+
+The brief should make the user feel understood and make the follow-up call feel like a warm continuation, not a cold intro.
+
+### Contact Collection
+
+The agent asks for name and email when the conversation reaches a natural conclusion: "I think I've got a great picture of what you need. Let me put a brief together. What's your name and email?" No forced timing. The `request_contact` tool then surfaces the on-screen card for verification.
+
+### Language
+
+Both EN and FR system prompts, same as now.
+
+## Reconnection Handling
+
+Exploratory conversations run longer than form-filling ones. If the WebSocket drops mid-conversation:
+
+- Preserve the transcript on disconnect.
+- Show a "reconnect" option instead of just an error.
+- On reconnect, seed the new Gemini session with the transcript so far (as context in the system prompt or initial message) so the agent can pick up where it left off.
+- Preserve the structured selections captured so far in state.
+
+## Technical Changes
+
+### Files to Modify
+
+- **`VoiceAgentProvider.tsx`:** Refactor `handleToolCall` so `conversationSummary` is the primary brief input. Add state for the contact confirmation card (name + email captured, pending user confirmation). Add reconnection logic (preserve the transcript, re-seed on reconnect). Connection teardown on brief completion is already fixed.
+
+- **`VoiceAgent.tsx`:** Adopt the new layout (larger orb, bigger transcript area, no selection chips). Add a contact confirmation card component (inline editable name + email). Fix autoscroll with `scrollIntoView`. Guard controls for the brief-complete state (already done). Make the layout mobile-responsive.
+
+- **`gemini-live.ts`:** Rewrite `buildSystemPrompt()` for both locales with a consultative tone. Adjust the `complete_brief` tool description to emphasize `conversationSummary`. Add a `request_contact` tool declaration that surfaces the on-screen card.
+
+- **`WizardContainer.tsx`:** Remove the `ModeToggle` import, the `mode` state, and the voice mode rendering branch. Remove `handleVoiceComplete` and the `VoiceAgentProvider` wrapper (these move to the new section).
+
+- **`ModeToggle.tsx`:** Delete entirely.
+
+- **New discovery section component:** A new landing-page section with warm copy, the CTA, and the expandable voice panel. This is where `VoiceAgentProvider` and `VoiceAgent` now live.
+
+- **Landing page:** Add the new discovery section at the appropriate position.
+
+- **i18n message files** (`en.json`, `fr.json`): Add translations for the discovery section copy. Update voice-related strings as needed.
+
+- **Email template:** Verify the brief email template handles longer, more narrative content gracefully. Adjust if needed.
+
+### What Stays the Same
+
+- WebSocket connection to Gemini Live API
+- Audio worklet recording + playback pipeline
+- `update_selections` tool (used silently now)
+- `/api/configure` route and brief generation logic
+- `/api/gemini-token` route
+- `StepComplete` component
+- `analyze_website` tool (still useful when someone mentions their current site)
+- The typed configurator (minus the mode toggle)
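
This is a minimal TypeScript sketch of the `VoiceAgentProvider.tsx` and `gemini-live.ts` changes listed under Files to Modify. It covers four pieces: a `request_contact` declaration, on-screen contact confirmation, `complete_brief` handling with `conversationSummary` as the primary payload, and a transcript re-seed message for reconnection. It assumes tool declarations and calls follow the Gemini Live API's function-declaration and function-call shapes (`name`, `description`, `parameters`; `id`, `name`, `args`). The real field names in `gemini-live.ts` may differ. `VoiceState`, `confirmContact`, `buildReconnectSeed`, this `handleToolCall` signature and the 40-turn cap are all hypothetical choices for illustration, not the project's implementation.

```typescript
// Sketch only. Assumption: declarations and calls mirror the Gemini Live API's
// functionDeclarations / functionCalls shape; real field names may differ.
// VoiceState, confirmContact and buildReconnectSeed are hypothetical names.

interface FunctionDeclaration {
  name: string;
  description: string;
  parameters: {
    type: string;
    properties: Record<string, { type: string; description: string }>;
    required?: string[];
  };
}

interface FunctionCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

// Hypothetical declaration: surfaces the on-screen card instead of a verbal spell-back.
export const requestContactDeclaration: FunctionDeclaration = {
  name: "request_contact",
  description:
    "Show the user an on-screen card with the name and email you heard so they can edit and confirm them. Do not spell the details back aloud.",
  parameters: {
    type: "OBJECT",
    properties: {
      name: { type: "STRING", description: "The user's name as heard" },
      email: { type: "STRING", description: "The user's email as heard" },
    },
    required: ["name", "email"],
  },
};

export interface VoiceState {
  transcript: { role: "user" | "agent"; text: string }[];
  selections: Record<string, unknown>;
  contact: { name: string; email: string; confirmed: boolean } | null;
  brief: { conversationSummary: string; metadata: Record<string, unknown> } | null;
}

export function handleToolCall(
  state: VoiceState,
  call: FunctionCall,
): { state: VoiceState; response: Record<string, unknown> } {
  switch (call.name) {
    case "update_selections":
      // Captured silently: merged into state, never rendered as chips.
      return { state: { ...state, selections: { ...state.selections, ...call.args } }, response: { ok: true } };
    case "request_contact": {
      // Show the card with the heard values; the user still has to confirm.
      const contact = { name: String(call.args.name ?? ""), email: String(call.args.email ?? ""), confirmed: false };
      return { state: { ...state, contact }, response: { ok: true, cardShown: true } };
    }
    case "complete_brief": {
      // conversationSummary is the substance; every other arg is treated as metadata.
      const summary = String(call.args.conversationSummary ?? "").trim();
      if (summary === "") return { state, response: { error: "conversationSummary is required" } };
      if (!state.contact || !state.contact.confirmed) {
        return { state, response: { error: "contact details not confirmed yet" } };
      }
      const metadata: Record<string, unknown> = {};
      for (const key of Object.keys(call.args)) {
        if (key !== "conversationSummary") metadata[key] = call.args[key];
      }
      return { state: { ...state, brief: { conversationSummary: summary, metadata } }, response: { ok: true } };
    }
    default:
      return { state, response: { error: `unknown tool: ${call.name}` } };
  }
}

// Called when the user taps confirm on the card, possibly after editing.
export function confirmContact(state: VoiceState, edited: { name: string; email: string }): VoiceState {
  const email = edited.email.trim();
  // Loose sanity check only; the spec does not define validation rules.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) return state;
  return { ...state, contact: { name: edited.name.trim(), email, confirmed: true } };
}

// Builds an initial context message for a fresh session after a dropped socket.
export function buildReconnectSeed(state: VoiceState, maxTurns = 40): string {
  const turns = state.transcript
    .slice(-maxTurns) // hypothetical cap to bound the seed's size
    .map((turn) => `${turn.role === "user" ? "User" : "You"}: ${turn.text}`);
  return [
    "The connection dropped and has been restored. Continue from where the conversation left off without greeting the user again.",
    "Conversation so far:",
    ...turns,
  ].join("\n");
}
```

In this sketch, the caller would send each returned `response` back to the session as a function response, paired with the originating call's `id` and `name`. Rejecting `complete_brief` until the contact is confirmed enforces the ordering described under During Brief Generation. Returning an error, rather than throwing, gives the agent a chance to prompt the user to confirm the card.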