Start a conversation on the Voice tab — this page shows the running transcript.
Our company is a mid-sized Dutch property-and-casualty insurer. Our customer contact centre handles around five hundred thousand inbound calls per year. Roughly a quarter of those are simple status enquiries: where is my claim, when will my pay-out arrive, has my premium gone through. Another fifteen percent are policy FAQ — coverage questions, deductibles, what's included. These calls are repetitive, low-complexity, and reasonably scriptable, yet they take human agents three to five minutes each and cost us between three and five euros per call. Meanwhile the truly complex, emotional, or high-value conversations — a customer who just crashed their car, a difficult fraud signal, a vulnerable claimant — wait in the queue behind people asking what their excess is.
A voicebot that handles two scopes in phase one: claims status enquiries and policy FAQ. Customers call the existing service number, the bot answers, identifies them, handles the simple ask, and either resolves the call or warm-transfers to a human with full context. Crucially, the bot never tries to handle a first-notice-of-loss or anything emotional — those go straight to a human from the first detected signal.
Three reasons. First, our customer base skews older than the digital average for the Netherlands; voice is their preferred channel and pushing them to chat reduces NPS. Second, voice handles emotional and time-pressured contact better than chat — a frustrated customer types worse than they talk. Third, we already operate a phone number; adding voice AI in front of it is a non-disruptive UX change rather than launching a new channel.
In phase one we target a thirty-percent deflection rate on the two in-scope call types, which is at the low end of what comparable Dutch insurers have reported publicly. That's roughly sixty thousand calls per year handled end-to-end by the bot, saving on the order of two hundred thousand euros annually in agent time, against a build-and-run cost we estimate at sixty to eighty thousand euros in year one. We are not promising NPS gains — the goal is NPS-neutral while freeing human agents for the calls that actually need them. We will measure deflection rate, NPS for bot-handled calls, NPS for transferred calls, and average wait time for human-handled calls as the four leading indicators.
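The arithmetic behind those targets can be checked back-of-the-envelope. The volumes and per-call costs are the figures quoted above; taking the midpoint wherever the text gives a range is our own simplification:

```python
# Back-of-the-envelope check of the phase-one business case.
# Shares and volumes come from the brief; midpoints are assumptions.

inbound_calls = 500_000   # inbound calls per year
status_share = 0.25       # claims-status enquiries
faq_share = 0.15          # policy FAQ
deflection_rate = 0.30    # phase-one target on in-scope calls

# In-scope volume and the slice the bot handles end-to-end.
in_scope = inbound_calls * status_share + inbound_calls * faq_share
deflected = in_scope * deflection_rate

cost_per_call = 4.0       # euro midpoint of the 3-5 range quoted above
gross_saving = deflected * cost_per_call

build_and_run = 70_000    # euro midpoint of the 60-80k estimate
net_year_one = gross_saving - build_and_run

print(f"in-scope calls: {in_scope:,.0f}")   # 200,000
print(f"deflected:      {deflected:,.0f}")  # 60,000
print(f"gross saving:   EUR {gross_saving:,.0f}")
print(f"net year one:   EUR {net_year_one:,.0f}")
```

The gross saving lands around 240,000 euros at the midpoint, which is why the brief hedges to "on the order of two hundred thousand" once you discount for partial handles and transfers.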
The biggest risk is misclassification — a vulnerable or emotional caller getting kept on the bot when they need a human. We mitigate with conservative routing: any detected emotional cue, any sentence the bot is not confident it understood, any complex claim type, and we transfer. Second risk is regulatory — the Dutch financial regulator AFM requires clear duty-of-care for vulnerable consumers, and we will not roll out without sign-off from compliance and a documented escalation path. Third risk is brand — a bad voicebot interaction reflects badly on us. Mitigation is a hard scope, a phased rollout, and clear opt-out ("press zero for a person, any time"). What we are explicitly not doing in phase one: claims intake, sales, complaints handling, anything with material financial decisions.
Phase one is a three-to-six-month pilot on claims status only, routed to ten percent of inbound. Phase two adds policy FAQ and ramps to fifty percent. Phase three, contingent on phase two metrics, considers broader scopes. Each phase has explicit go and no-go criteria on deflection rate, NPS deltas, and complaint volume. We start with claims status because it is the highest-volume, lowest-complexity, lowest-emotional-risk slice — the cleanest learning ground.
Rather hear this out loud? The voicebot can walk you through any section in about ninety seconds.
A tight realtime voice loop on LiveKit Agents, with off-the-shelf STT, LLM, and TTS providers wrapped behind swappable plugins.
LiveKit Agents orchestrates the realtime loop. Deepgram does streaming ASR, Claude Haiku 4.5 generates responses, ElevenLabs Flash returns sub-300ms TTS. The full round-trip targets under 800ms end-to-end.
Turn-taking is voice-activity driven (Silero VAD). The agent worker runs as a Python process; the browser holds a LiveKit WebRTC session over a self-hosted livekit-server.
Deepgram Nova-3 handles streaming ASR with low first-token latency. ElevenLabs Flash provides a consistent cloned voice.
Both providers are wrapped behind LiveKit's plugin interface so we can swap if vendor pricing or SLAs change.
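The swap-protection described above is a matter of depending on interfaces rather than vendors. A minimal illustration of that design principle, using a Python Protocol; the class and method names here are hypothetical stand-ins, not LiveKit's actual plugin API:

```python
from typing import Protocol

# Illustration of the "swappable provider" principle: the pipeline
# depends only on an interface, so vendors can be exchanged without
# touching pipeline code. All names below are hypothetical.

class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class DeepgramSTT:
    def transcribe(self, audio: bytes) -> str:
        return f"<transcript of {len(audio)} bytes via Deepgram>"

class OtherVendorSTT:
    def transcribe(self, audio: bytes) -> str:
        return f"<transcript of {len(audio)} bytes via other vendor>"

class VoicePipeline:
    def __init__(self, stt: SpeechToText) -> None:
        self.stt = stt

    def handle_turn(self, audio: bytes) -> str:
        return self.stt.transcribe(audio)

pipeline = VoicePipeline(stt=DeepgramSTT())
print(pipeline.handle_turn(b"\x00" * 320))

# Swapping vendors is a one-line change at construction time:
pipeline = VoicePipeline(stt=OtherVendorSTT())
```

If vendor pricing or SLAs change, only the construction line changes; the realtime loop itself is untouched.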
Persona, course summary, FAQ, team bios, and the use-case brief are inlined as Markdown into the system prompt — small enough to fit fully cached, large enough to answer detailed questions.
Anthropic prompt caching keeps the system prompt on the cache hit path between turns.
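The cache hit path works by marking the large, stable system prompt with a `cache_control` block in the Messages API request. A sketch of the request shape, assuming the inlined-Markdown setup described above; the payload is only built here, not sent, and the file names, loader, and model string are illustrative:

```python
# Sketch of the request shape that keeps the inlined Markdown context
# on Anthropic's prompt-cache hit path. Built but not sent; the
# loader, file names, and model string are illustrative.

def load_markdown(*names: str) -> str:
    # Hypothetical loader for the inlined persona/FAQ/brief documents.
    return "\n\n".join(f"# {n}\n(contents of {n})" for n in names)

system_prompt = load_markdown(
    "persona.md", "course-summary.md", "faq.md", "team.md", "use-case.md"
)

request = {
    "model": "claude-haiku-4-5",
    "max_tokens": 512,
    # Marking the final system block with cache_control asks the API
    # to cache everything up to and including it, so subsequent turns
    # reuse the cached prefix instead of reprocessing the full prompt.
    "system": [
        {
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Where is my claim?"}
    ],
}
```

Because the system prompt is identical between turns, every turn after the first should hit the cached prefix, which is what keeps per-turn latency and token cost flat.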
Bot transcript appears in the browser word-by-word, paced to TTS audio playback. Driven by LiveKit Agents' TranscriptSynchronizer.
User transcript appears as Deepgram finalizes segments. Both are rendered as participants emit segments on RoomEvent.TranscriptionReceived.
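Rendering both transcripts comes down to folding a stream of interim and final segments into a running view. A sketch of that merge, with the segment shape (id, text, final) modeled loosely on LiveKit's transcription segments; the view class itself is hypothetical:

```python
from dataclasses import dataclass

# Sketch of folding streamed transcription segments into a running
# transcript. Segment shape (id, text, final) is modeled loosely on
# LiveKit's transcription events; the view class is hypothetical.

@dataclass
class Segment:
    id: str
    text: str
    final: bool

class TranscriptView:
    def __init__(self) -> None:
        # dict preserves insertion order, so segments render in the
        # order they first arrived.
        self._segments: dict[str, Segment] = {}

    def on_transcription_received(self, seg: Segment) -> None:
        # Interim updates overwrite the segment in place; a final
        # update freezes its text. Keyed by id, so order is stable.
        self._segments[seg.id] = seg

    def render(self) -> str:
        return " ".join(s.text for s in self._segments.values())

view = TranscriptView()
view.on_transcription_received(Segment("s1", "where is", final=False))
view.on_transcription_received(Segment("s1", "where is my claim", final=True))
view.on_transcription_received(Segment("s2", "please", final=True))
print(view.render())  # where is my claim please
```

The same merge logic serves both sides: the user transcript firms up as Deepgram finalizes segments, and the bot transcript grows word-by-word as paced segments arrive.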
Sessions cap at five minutes and auto-end after sixty seconds of silence, with a spoken goodbye.
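Those two lifetime rules, a hard session cap and a silence timeout, reduce to a small timer check per turn. A minimal sketch of that policy, with an injectable clock so it can be exercised without waiting; the class and names are illustrative, not the demo's actual code:

```python
import time

# Sketch of the session-lifetime rules stated above: a five-minute
# hard cap and a sixty-second silence timeout. The clock is injected
# for testability; names are illustrative.

SESSION_CAP_S = 5 * 60
SILENCE_TIMEOUT_S = 60

class SessionTimer:
    def __init__(self, now=time.monotonic) -> None:
        self._now = now
        self._started = now()
        self._last_activity = self._started

    def on_speech(self) -> None:
        # Any detected user speech resets the silence window.
        self._last_activity = self._now()

    def should_end(self) -> bool:
        t = self._now()
        hit_cap = t - self._started >= SESSION_CAP_S
        silent = t - self._last_activity >= SILENCE_TIMEOUT_S
        return hit_cap or silent

# Simulated clock: speech at t=30s, then 65s of silence ends the call.
fake_t = [0.0]
timer = SessionTimer(now=lambda: fake_t[0])
fake_t[0] = 30.0
timer.on_speech()
fake_t[0] = 95.0
print(timer.should_end())  # True
```

In the demo, a True here triggers the spoken goodbye before the session is torn down.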
Default LiveKit dev keys are used for the local stack; production deployment uses generated keys per the deployment plan.
The course is INSEAD's Transforming Your Business with AI — a five-week executive programme on how to identify, prioritise, and deliver AI initiatives that drive real business outcomes. It moves through stock-taking the broader AI landscape and setting an AI strategy, then building an AI-driven organisation with attention to execution, risks and governance, then the generative-AI revolution specifically, then design trade-offs, and finally organising in the age of algorithms.
The course is built around five evaluation dimensions that every use-case should hit: business value, technical feasibility, risk and governance, organisational capability, and change management. Participants do action-learning projects — guided, real-world assignments where they apply the frameworks directly to their own organisations.
For the final assignment, each group writes a proposed AI use-case for their own company, defends it against the framework, and presents it. Roel's team's use-case is a voicebot for customer contact at their insurer, and instead of just slides, they built me — the working demo of that use-case. So when you ask me about ROI, governance, or rollout, I am answering through the lens this course taught the team to use.
We are a group of five colleagues from a Dutch property-and-casualty insurer who took the INSEAD course together. Between us we cover claims operations, product, IT architecture, customer experience, and one engineer-by-training currently in a transformation role. We're a mix of people who have lived through previous waves of customer-contact technology — IVR, web self-service, chatbots — which is partly why we have opinions about what a voicebot should and shouldn't try to do.
A mid-sized Dutch P&C insurer with several hundred thousand customers and a contact centre measured in low millions of inbound contacts per year across channels. The exact name isn't important for this demo; what matters is the customer base skews older than the digital-native average, the regulator is the AFM, and the product mix is dominated by motor, home, and liability.
We could have written a deck. We chose to actually build the thing. Partly because we figured it would be the most honest way to show that voice AI is buildable today — you can hear it land or fall flat in real time, not in a screenshot. Partly because it forced us to sit with the trade-offs the INSEAD frameworks talk about, in code, not in slides. And partly, yes, for the laugh — a voicebot pitching a voicebot is a slightly absurd object, and we leaned into it.
Want to hear it instead of reading it? The voicebot can walk you through the course context and the team in about a minute.
These are short anchor answers for the most likely audience questions. The model should treat them as guidance for tone and content, not as scripts to read verbatim.
Q: What's your use-case, in one sentence? A voicebot that handles claims-status and policy-FAQ calls at a Dutch insurer, so human agents are free for the calls that actually need a human.
Q: Why voice and not a chatbot? Three things: our customer base is older and prefers voice, voice handles emotion and urgency better than chat, and we already run a phone number — voice AI is a non-disruptive UX change rather than a new channel.
Q: What's the ROI? We target thirty-percent deflection on the in-scope call types in phase one — about sixty thousand calls a year, roughly two hundred thousand euros saved against a sixty-to-eighty-thousand-euro build-and-run cost. Modest, not magical, and we expect scrutiny on those numbers.
Q: What does the bot actually handle? Phase one: claims status and policy FAQ. Phase two: broader FAQ. Anything emotional, complex, or financial — claims intake, complaints, sales — goes to a human immediately.
Q: What about regulation and compliance? The Dutch AFM requires demonstrated duty-of-care for vulnerable consumers, so the bot routes anything that looks emotional, urgent, or complex straight to a human. We won't roll out without compliance sign-off and a documented escalation path.
Q: What if the customer is angry or upset? The bot detects emotional cues and transfers. The escalation is one-touch ("press zero or just ask for a person") and always available. Conservative routing is the design principle — when in doubt, transfer.
Q: How do you hand off to a human? The bot summarises the conversation, the detected intent, and the customer's verified identity, and that summary lands with the human agent before they pick up. No re-explaining.
Q: Are you the actual voicebot we're proposing? No. I'm the demo built to show the use-case. The production version would run on the insurer's telephony stack, not in a browser, and would be tuned to specific call flows. Same idea, different plumbing.
Q: How were you built? What's under the hood? Browser microphone, LiveKit for real-time audio, Deepgram Nova-3 for speech-to-text, Claude Haiku 4.5 for the language model, ElevenLabs Flash for text-to-speech. Self-hosted on a small Hetzner server. About one evening of plumbing, on top of off-the-shelf parts.
Q: What did you learn from the INSEAD course? That the technology is the easy part. The hard parts are scoping conservatively, governance, change management for the contact-centre staff who'd work alongside the bot, and being honest about which use-cases actually have ROI versus the ones that just sound exciting.
Q: Aren't you just a chatbot with extra steps? Voice changes the UX more than people expect. Older customers will use voice who won't touch a chat. Tone-of-voice carries information that text drops. And the failure modes are different — a slow chatbot is annoying; a slow voicebot is intolerable. Different product, not a re-skin.
Got a question that's not here? Tap through to the voice tab and just ask — the bot is built on this same set of answers.