The problem
Your team has already built the hard part. The chatbot works. The LangChain agent answers questions in production. The brain exists - prompted, tested, shipped.

Voice is where it stalls. Speech is a different engineering discipline, not more AI work. Telephony, SIP, carrier accounts, STT/TTS providers, voice activity detection, barge-in, endpointing, audio transport, scaling under load - none of it exists in a software team’s stack, and none of it makes your agent any smarter. Teams routinely spend three or four months wiring this up before their agent takes a single real call. That time buys no product differentiation. Every voice startup rebuilds the same plumbing.

Unpod closes the gap: the agent you already have maps straight to voice. Text in, text out, a phone number on the front. You keep building the brain; Unpod does the speaking.

What Unpod handles vs what you write
The line is sharp: the wire between Unpod and your code carries text, not audio. Your code never sees a packet, a codec, or a SIP message. It receives transcribed text and returns text. Audio never leaves Unpod’s infrastructure.

| Unpod handles | You write |
|---|---|
| Telephony, SIP, carriers | The agent’s brain - one entrypoint |
| Phone numbers (provision or bring your own) | LLM choice and prompts |
| STT, TTS, and provider failover | Conversation flows and logic |
| VAD, barge-in, endpointing | Tools and business logic |
| Audio transport (phone, WebSocket, WebRTC) | Per-call memory and state |
| Scaling, dispatch, and orchestration | |
| Recordings and transcripts | |
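
To make the text-in, text-out boundary concrete, here is a minimal sketch of what "one entrypoint" on your side could look like. It is not Unpod's SDK: `handle_turn` and `CallState` are hypothetical names chosen for illustration, and the keyword check stands in for whatever LLM or LangChain agent you already run. The only point it shows is that your code takes a transcribed string plus per-call state and returns a string.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CallState:
    """Per-call memory owned by your code (hypothetical shape, not an Unpod type)."""
    history: List[Tuple[str, str]] = field(default_factory=list)


def handle_turn(state: CallState, user_text: str) -> str:
    """Hypothetical entrypoint: transcribed caller text in, reply text out.

    No audio, codecs, or SIP appear here; the speech layer is assumed
    to sit entirely on the other side of this function.
    """
    state.history.append(("user", user_text))

    # Stand-in for your real agent (LLM call, LangChain chain, tools).
    if "hours" in user_text.lower():
        reply = "We are open 9 to 5, Monday through Friday."
    else:
        reply = "Sorry, could you rephrase that?"

    state.history.append(("agent", reply))
    return reply


if __name__ == "__main__":
    state = CallState()
    print(handle_turn(state, "What are your hours?"))
```

Keeping the state in a plain object that is passed in on every turn means the same function can be exercised from a unit test or a chat UI, with no telephony involved.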
Vs the alternatives
The honest comparison is about what you own and how long it takes to get to a real call.

Rolling your own (or using a pipeline framework like Pipecat or an audio platform like LiveKit): you assemble and run every component - STT provider, TTS provider, VAD, transport, endpointing, scaling. Maximum control, maximum setup. Phone numbers mean a SIP trunk and a carrier account. Expect months before the first production call, and ongoing operational ownership of the speech layer forever.

Unpod: you bring the agent; the speech layer is managed. Numbers provision directly, voice profiles configure STT/TTS with automatic failover, and dispatch and scaling are handled for you. Expect hours to your first call.

Choose to roll your own when you need fine-grained control over the audio pipeline or already run that infrastructure. Choose Unpod when your value is the agent logic and you do not want to become a telephony team.

For the full feature-by-feature comparison, see Unpod vs Pipecat vs LiveKit.

Unpod’s core components are open source - you can self-host the full stack or use the managed cloud. See Self-Hosting.

Start building
Quickstart
Talk to your own voice agent in the browser in about five minutes. No phone number needed.
SuperDialog
The dialog framework. Turn a prompt or flow graph into an executable conversation - pure text in, pure text out.
Platform (no-code)
Build, configure, and run agents from the hosted UI. No SDK required.