
The problem

Your team has already built the hard part. The chatbot works. The LangChain agent answers questions in production. The brain exists - prompted, tested, shipped.

Voice is where it stalls. Speech is a different engineering discipline, not more AI work. Telephony, SIP, carrier accounts, STT/TTS providers, voice activity detection, barge-in, endpointing, audio transport, scaling under load - none of it exists in a software team’s stack, and none of it makes your agent any smarter. Teams routinely spend three or four months wiring this up before their agent takes a single real call. That time buys no product differentiation. Every voice startup rebuilds the same plumbing.

Unpod closes the gap: the agent you already have maps straight to voice. Text in, text out, a phone number on the front. You keep building the brain; Unpod does the speaking.

What Unpod handles vs what you write

The line is sharp: the wire between Unpod and your code carries text, not audio. Your code never sees a packet, a codec, or a SIP message. It receives transcribed text and returns text. Audio never leaves Unpod’s infrastructure.
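To make the text-only contract concrete, here is a minimal sketch of what an agent "brain" looks like from your side: a plain function that takes one transcribed utterance and returns the text to speak. The names `CallState` and `reply` are hypothetical and are not part of the Unpod SDK; the keyword rule stands in for whatever LLM or chain you already run.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallState:
    """Hypothetical per-call memory. Not an Unpod type."""
    turns: List[str] = field(default_factory=list)


def reply(transcript: str, state: CallState) -> str:
    """Take one transcribed caller utterance and return the text to speak."""
    state.turns.append(transcript)  # remember what the caller said on this call
    text = transcript.strip().lower()
    if not text:
        return "Sorry, I didn't catch that. Could you repeat it?"
    if "order" in text:
        return "Sure, what is your order number?"
    # Fallback: echo the utterance. A real agent would call its LLM here.
    return f"You said: {transcript.strip()}"


state = CallState()
print(reply("Where is my order?", state))
print(reply("", state))
```

Nothing in this function touches audio, codecs, or telephony, so it can be unit-tested like any other text handler.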
| Unpod handles | You write |
| --- | --- |
| Telephony, SIP, carriers | The agent’s brain - one entrypoint |
| Phone numbers (provision or bring your own) | LLM choice and prompts |
| STT, TTS, and provider failover | Conversation flows and logic |
| VAD, barge-in, endpointing | Tools and business logic |
| Audio transport (phone, WebSocket, WebRTC) | Per-call memory and state |
| Scaling, dispatch, and orchestration | |
| Recordings and transcripts | |

In practice your entire integration is a function that receives a call and runs a dialog:
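from superdialog import DialogMachine, Flow
from unpod import AgentRunner, CallContext

# Load the conversation flow from a JSON file.
flow = Flow.load("support.json")

async def handle_call(ctx: CallContext) -> None:
    # Attach a dialog machine (flow + LLM) to this call's session, then run it.
    ctx.session.dialog_machine = DialogMachine(flow=flow, llm="anthropic/claude-haiku-4-5")
    await ctx.session.run()

# Register handle_call as the entrypoint for this agent and start the runner.
AgentRunner(entrypoint=handle_call, agent_id="agt_...").start()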
This snippet is illustrative; the Quickstart walks through a runnable version that needs no flow file.

No pipeline configuration. No WebRTC. No SIP. No carrier account. No audio handling. If you already have an HTTP endpoint or a LangChain agent, you can point Unpod at it with no SDK code at all. For where this sits in the larger architecture, see Core Concepts.
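For the HTTP-endpoint route mentioned above, the following is a rough sketch of a text-in, text-out endpoint built only on the Python standard library. This page does not specify the request format Unpod sends, so the JSON shape (`{"text": ...}` in both directions), the `/turn` path, and the helper names `answer` and `post_turn` are all assumptions for illustration; check the actual contract before relying on any of it.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def answer(text: str) -> str:
    """Stand-in for your existing agent (chatbot, LangChain chain, etc.)."""
    return f"You asked: {text}"


class TurnHandler(BaseHTTPRequestHandler):
    """Handles one conversational turn per POST. The JSON schema is hypothetical."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"text": answer(payload.get("text", ""))}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # silence the default per-request logging


def post_turn(url: str, text: str) -> str:
    """Send one utterance as JSON and return the reply text."""
    data = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    # Bypass any configured proxy so the local request stays local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=5) as resp:
        return json.loads(resp.read())["text"]


# Local smoke test: serve on a free port, send one turn, then shut down.
server = HTTPServer(("127.0.0.1", 0), TurnHandler)  # port 0 lets the OS pick
threading.Thread(target=server.serve_forever, daemon=True).start()
result = post_turn(f"http://127.0.0.1:{server.server_port}/turn", "What are your opening hours?")
print(result)
server.shutdown()
server.server_close()
```

The point of the sketch is the shape of the integration: the endpoint exchanges text only, so an agent that already answers over HTTP needs no audio-specific code.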

Vs the alternatives

The honest comparison is about what you own and how long it takes to get to a real call.

Rolling your own (or using a pipeline framework like Pipecat or an audio platform like LiveKit): you assemble and run every component - STT provider, TTS provider, VAD, transport, endpointing, scaling. Maximum control, maximum setup. Phone numbers mean a SIP trunk and a carrier account. Expect months before the first production call, and ongoing operational ownership of the speech layer forever.

Unpod: you bring the agent; the speech layer is managed. Numbers provision directly, voice profiles configure STT/TTS with automatic failover, and dispatch and scaling are handled for you. Expect hours to your first call.

Choose to roll your own when you need fine-grained control over the audio pipeline or already run that infrastructure. Choose Unpod when your value is the agent logic and you do not want to become a telephony team. For the full feature-by-feature comparison, see Unpod vs Pipecat vs LiveKit.

Unpod’s core components are open source - you can self-host the full stack or use the managed cloud. See Self-Hosting.

Start building

Quickstart

Talk to your own voice agent in the browser in about five minutes. No phone number needed.

SuperDialog

The dialog framework. Turn a prompt or flow graph into an executable conversation - pure text in, pure text out.

Platform (no-code)

Build, configure, and run agents from the hosted UI. No SDK required.