Architecture at a Glance
A single inbound call, end to end, reads as follows. A caller dials a Number, which routes over a Trunk into Unpod. Unpod transcribes the audio (STT) using the call’s voice profile and sends plain text across the Bridge, over a WebSocket, to your AgentRunner. Your runner hands each turn to your Agent (your brain). Your reply text crosses back over the Bridge, Unpod synthesises it (TTS), and the caller hears it. SuperDialog is one option for the brain: powerful, but optional. You own everything from the AgentRunner down; everything above it is managed.
Glossary
One canonical definition each. Headings are anchorable, so other pages can deep-link to them (for example, /get-started/core-concepts#pipe).
Number
A phone number callers dial. Inbound calls arrive on a number and route to the Speech Pipe it is attached to. Numbers come from Unpod directly, or you bring your own (BYON). You never configure a carrier. See Numbers.
Trunk
The SIP connection that carries calls between a telephony provider and Unpod. A trunk is the source of phone numbers: you register a trunk, then sync its numbers into Unpod. You configure a trunk once; after that, you work with numbers, not SIP. See Trunks.
Voice Profile
A bundle of STT and TTS provider configuration: which providers recognise and synthesise speech, which language and voice, latency, and failover order. Profiles are a read-only catalog you pick from; you reference one by name or profile_id when creating a Speech Pipe. See Voice Profiles.
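The idea of "reference a profile by name or profile_id, with a failover order" can be modelled in a few lines. What follows is a minimal, self-contained sketch. The VoiceProfile class, its fields, and the helpers resolve_profile and next_provider are hypothetical illustrations, not the Unpod catalog schema or SDK. The provider names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceProfile:
    """Hypothetical catalog entry; field names are illustrative, not Unpod's schema."""

    profile_id: str
    name: str
    language: str
    stt_providers: tuple[str, ...]  # failover order: first entry is tried first
    tts_providers: tuple[str, ...]


def resolve_profile(catalog: list[VoiceProfile], ref: str) -> VoiceProfile:
    """Return the profile whose profile_id or name equals ref."""
    for profile in catalog:
        if ref in (profile.profile_id, profile.name):
            return profile
    raise LookupError(f"no voice profile matches {ref!r}")


def next_provider(providers: tuple[str, ...], failed: set[str]) -> str:
    """Return the first provider in failover order that has not failed yet."""
    for provider in providers:
        if provider not in failed:
            return provider
    raise RuntimeError("every provider in the failover chain has failed")


# Usage with made-up IDs and provider names.
catalog = [
    VoiceProfile("vp_001", "en-us-fast", "en-US", ("stt_a", "stt_b"), ("tts_a",)),
]
profile = resolve_profile(catalog, "en-us-fast")
print(next_provider(profile.stt_providers, failed={"stt_a"}))  # stt_b
```

Because the catalog is read-only, the sketch only looks profiles up and never mutates them; the frozen dataclass reflects that.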
Pipe
A Speech Pipe is the configuration entity that binds a call together: a name, a voice profile, recording and duration settings, and the agent_id that points at your runner. Numbers attach to a pipe; outbound calls run through a pipe. The pipe is the anchor that connects a number, a voice profile, and your agent. See Pipes.
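To make the "pipe as anchor" idea concrete, the sketch below resolves a dialled number to its pipe, and through it to a voice profile and an agent_id. Everything here is hypothetical: Pipe, PipeDirectory, the field names, and the phone number are illustrative stand-ins. This is not the Unpod data model or the client.pipes / client.numbers API. In reality Unpod assigns pipe_id; here it is generated locally with uuid4.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pipe:
    """Hypothetical model of a Speech Pipe's bindings; not the SDK's type."""

    pipe_id: str        # assigned by Unpod in reality; generated locally here
    name: str
    voice_profile: str  # a voice profile name or profile_id
    agent_id: str       # must equal the agent_id your AgentRunner registers with
    record: bool = False
    max_duration_s: int = 600  # illustrative duration limit


class PipeDirectory:
    """Toy lookup table: numbers attach to pipes, and pipes point at agents."""

    def __init__(self) -> None:
        self._pipes: dict[str, Pipe] = {}
        self._number_to_pipe: dict[str, str] = {}

    def attach(self, number: str, pipe: Pipe) -> None:
        self._pipes[pipe.pipe_id] = pipe
        self._number_to_pipe[number] = pipe.pipe_id

    def route_inbound(self, number: str) -> Optional[Pipe]:
        """Resolve a dialled number to its pipe (and so its voice profile and agent)."""
        pipe_id = self._number_to_pipe.get(number)
        return self._pipes.get(pipe_id) if pipe_id is not None else None


directory = PipeDirectory()
support = Pipe(str(uuid.uuid4()), "support", voice_profile="en-us-fast", agent_id="support-bot")
directory.attach("+15555550100", support)

routed = directory.route_inbound("+15555550100")
print(routed.agent_id, routed.voice_profile)  # support-bot en-us-fast
```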
Bridge
The text-routing seam inside Unpod between the speech pipeline and your code. Transcribed caller text crosses the Bridge to your AgentRunner; your reply text crosses back to be synthesised. The Bridge is why your code never touches audio. It is Unpod-internal infrastructure: you do not configure it, but you’ll see the term in WebSocket frame names and in the legacy Bridges API.
Agent (Brain)
Your conversation logic: whatever decides what to say next. It can be a SuperDialog DialogMachine, a LangChain chain, a plain HTTP endpoint, or custom Python. Unpod is brain-agnostic: it routes text in and text out. “Agent” and “brain” mean the same thing here. See Bring Your Agent.
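"Brain-agnostic" essentially means the routing layer only needs a text-in, text-out callable. The sketch below expresses that contract as a type alias and adapts two very different brains to it. The Brain alias, from_sync, handle_turn, and both example brains are hypothetical illustrations, not Unpod or SuperDialog types. A LangChain chain or an HTTP endpoint would be wrapped in the same way; those adapters are not shown.

```python
import asyncio
from typing import Awaitable, Callable

# Illustrative alias, not an Unpod type: a brain maps caller text to reply text.
Brain = Callable[[str], Awaitable[str]]


def from_sync(fn: Callable[[str], str]) -> Brain:
    """Adapt a plain synchronous function (custom Python rules) into a brain."""

    async def brain(text: str) -> str:
        return fn(text)

    return brain


async def echo_brain(text: str) -> str:
    return f"You said: {text}"


def keyword_rules(text: str) -> str:
    return "Connecting you to billing." if "bill" in text.lower() else "How can I help?"


async def handle_turn(brain: Brain, caller_text: str) -> str:
    # Text in, text out: the caller of the brain never inspects how it decides.
    return await brain(caller_text)


async def main() -> None:
    for brain in (echo_brain, from_sync(keyword_rules)):
        print(await handle_turn(brain, "I have a billing question"))


asyncio.run(main())
# You said: I have a billing question
# Connecting you to billing.
```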
AgentRunner
A long-lived Python process you run. It registers with the Unpod orchestrator over WebSocket, advertises capacity, and serves a per-call bridge that Unpod dials into. For each call, it builds a CallContext and invokes your entrypoint. You identify it with an agent_id (see IDs You’ll Meet). See SDK Setup.
Session
Your control interface for one live call, reached as ctx.session. It exposes controls (say(), transfer_to_human(), end(), recording controls), hooks, metrics, and the dialog_machine slot where you plug in your brain. Calling session.run() keeps the call alive and routes each transcribed turn to your brain. See Session Controls.
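The division of labour between the session and the brain can be illustrated with an in-memory toy. FakeSession below is hypothetical. It only imitates the shape described above: say(), end(), a dialog_machine slot, and a run() that routes each transcribed turn to the brain and speaks the reply. The real Session's signatures, hooks, and lifecycle may differ. Transcribed turns are supplied as a list instead of coming from STT.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class FakeSession:
    """Hypothetical in-memory stand-in for ctx.session; not the unpod Session."""

    def __init__(self, transcribed_turns: list[str]) -> None:
        self._turns = transcribed_turns  # what STT would have produced
        self.spoken: list[str] = []      # what TTS would have synthesised
        self.ended = False
        self.dialog_machine: Optional[Callable[[str], Awaitable[str]]] = None

    async def say(self, text: str) -> None:
        self.spoken.append(text)

    async def end(self) -> None:
        self.ended = True

    async def run(self) -> None:
        # Keep the call alive and route each transcribed turn to the brain.
        if self.dialog_machine is None:
            raise RuntimeError("plug a brain into dialog_machine before run()")
        for turn in self._turns:
            if self.ended:
                break
            reply = await self.dialog_machine(turn)
            await self.say(reply)
        await self.end()


async def brain(text: str) -> str:
    return "Goodbye!" if "bye" in text else f"Noted: {text}"


async def demo() -> list[str]:
    session = FakeSession(["hello", "bye"])
    session.dialog_machine = brain
    await session.say("Hi, thanks for calling.")  # an opening line before any turn
    await session.run()
    return session.spoken


print(asyncio.run(demo()))
# ['Hi, thanks for calling.', 'Noted: hello', 'Goodbye!']
```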
CallContext
The per-call metadata envelope your entrypoint receives: async def entrypoint(ctx: CallContext). It carries call_id, session_id, agent_id, direction ("inbound" or "outbound"), user_number, any instructions and data from dispatch, and the live session you control the call through.
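A common first use of CallContext is to branch on call metadata before the brain takes over. The sketch below mirrors the listed fields in a hypothetical CallContextSketch dataclass; it is not the SDK class. So that it runs standalone, the entrypoint returns its opening line instead of calling ctx.session.say(). The "customer_name" key in data is an assumed dispatch payload, used purely for illustration.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Literal, Optional


@dataclass
class CallContextSketch:
    """Hypothetical mirror of the CallContext fields listed above; not the SDK class."""

    call_id: str
    session_id: str
    agent_id: str
    direction: Literal["inbound", "outbound"]
    user_number: str
    instructions: Optional[str] = None
    data: dict[str, Any] = field(default_factory=dict)
    session: Any = None  # the live Session in the real SDK; unused here


async def entrypoint(ctx: CallContextSketch) -> str:
    # Branch on call metadata before handing control to your brain.
    if ctx.direction == "outbound":
        name = ctx.data.get("customer_name", "there")  # assumed dispatch key
        return f"Hi {name}, this is a courtesy call."
    return "Thanks for calling. How can I help?"


ctx = CallContextSketch(
    call_id="c1",
    session_id="s1",
    agent_id="support-bot",
    direction="outbound",
    user_number="+15555550100",
    data={"customer_name": "Ada"},
)
print(asyncio.run(entrypoint(ctx)))  # Hi Ada, this is a courtesy call.
```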
Space
A Platform concept, not an SDK one. A Space is a workspace container in the Unpod Platform that organises agents, tasks, runs, and data. The Platform’s REST API addresses a space by its space token. You only meet spaces when you use the hosted Platform or its REST API; the voice SDK does not require one.
Two APIs
The unpod SDK package contains two distinct halves. Know which one you are using.
| | Management API | Connectivity API |
|---|---|---|
| Protocol | REST (HTTPS) | WebSocket (WSS) |
| Entry point | Client / AsyncClient | AgentRunner / Session |
| Purpose | Provision resources | Handle live calls |
| You call it | Before calls | During calls |
| Examples | client.numbers, client.voice_profiles, client.pipes, client.trunks, client.calls | AgentRunner(...).start(), ctx.session.say(...) |
Management API: use Client (sync) or AsyncClient (async); it reads UNPOD_API_KEY from the environment.
Connectivity API: AgentRunner holds a persistent WSS connection to the orchestrator, and each call gives you a live Session to act on.
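Because the Management API clients read UNPOD_API_KEY from the environment, a missing key is an easy first-run failure. Below is a minimal preflight sketch you might run before any REST call. require_api_key is a hypothetical helper, not part of the unpod package. The SDK's own behaviour when the key is missing may differ.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return UNPOD_API_KEY or fail fast with an actionable message.

    Hypothetical helper: Client and AsyncClient read this variable themselves;
    this only surfaces a missing or blank key before any REST call is made.
    """
    source = os.environ if env is None else env
    key = source.get("UNPOD_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "UNPOD_API_KEY is not set; export it before creating Client or AsyncClient"
        )
    return key
```

Passing an explicit mapping keeps the check testable without touching the real process environment.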
IDs You’ll Meet
Four identifiers cause most first-run failures. They are not interchangeable, although the fourth is simply another name for the first.
| ID | What it identifies | Where it comes from |
|---|---|---|
| agent_id | Your runner pool: which AgentRunner should handle a call | A string you choose. Passed to both AgentRunner(agent_id=...) and client.pipes.create(agent_id=...). They must match exactly. |
| pipe_id | One Speech Pipe in the Speech Stack | A UUID Unpod assigns when you create the pipe. Used in REST calls (numbers.attach, calls.create). |
| Space token | A Platform workspace | A token from the Platform’s Spaces API. Used only in the hosted Platform and its REST API, not in the voice SDK. |
| Runner agent ID | Same as agent_id | Just another name for agent_id, as seen from the runner side. Internally, the runner derives a worker_id (<agent_id>#<random>) per process, but you never set that. |
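The agent_id rules above lend themselves to two small checks: the runner-side and pipe-side values must be identical, and a worker_id has the shape <agent_id>#<random>. The sketch below illustrates both. All three functions are hypothetical. In particular, derive_worker_id only imitates the documented shape; the suffix length and alphabet are assumptions, because the runner derives its own worker_id and you never set it.

```python
import secrets


def derive_worker_id(agent_id: str) -> str:
    """Imitate the documented <agent_id>#<random> shape (suffix format assumed)."""
    return f"{agent_id}#{secrets.token_hex(4)}"


def agent_id_of(worker_id: str) -> str:
    # Everything before the last '#' is the agent_id the runner pool is keyed on.
    return worker_id.rsplit("#", 1)[0]


def check_ids(runner_agent_id: str, pipe_agent_id: str) -> None:
    """Fail loudly on a common first-run mistake: mismatched agent_ids.

    The comparison is exact, so case and whitespace differences count.
    """
    if runner_agent_id != pipe_agent_id:
        raise ValueError(
            f"AgentRunner agent_id {runner_agent_id!r} != pipe agent_id "
            f"{pipe_agent_id!r}; calls on this pipe will not be matched to this runner"
        )


worker = derive_worker_id("support-bot")
print(agent_id_of(worker))               # support-bot
check_ids("support-bot", "support-bot")  # passes silently
```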
Naming: The Four Product Terms
Unpod is one company with one platform, described at four altitudes. Use these terms precisely.
| Term | What it means |
|---|---|
| Unpod | The company and the platform as a whole: everything below combined. |
| Speech Stack | The voice infrastructure plus the unpod SDK: numbers, trunks, voice profiles, pipes, STT/TTS, and the AgentRunner runtime. This is what a Python dev builds against. |
| SuperDialog | The optional dialog framework (superdialog package): flow graphs, tools, and state for structured conversations. One choice of brain; not required. |
| Platform | The hosted UI and self-hostable stack: dashboard, agent studio, spaces, analytics, and telephony management on top of the Speech Stack. |
Next Steps
Make Your First Call
Wire a number, a pipe, and a runner end to end.
Speech Stack
Numbers, voice profiles, pipes, and the AgentRunner SDK.
SuperDialog
The optional framework for structured conversation flows.
Session Controls
Act on a live call - say, transfer, end, record.