Already have a working agent? This is your page. You wrap your brain in a DialogAdapter, assign it to ctx.session.dialog_machine, and Unpod handles the rest of the call: telephony, speech-to-text, text-to-speech, and audio transport. Your agent stays text-in/text-out. If you don’t have an agent yet, start with the Quickstart instead - it builds the minimal brain first.

Install

uv add unpod

Write your entrypoint

The entrypoint is an async function called once per call. Set your agent on ctx.session.dialog_machine and call await ctx.session.run().
from unpod import AgentRunner, CallContext
from unpod.adapters.langchain import LangChainAdapter

# Your existing LangChain chain
from your_app import chain

async def entrypoint(ctx: CallContext) -> None:
    ctx.session.dialog_machine = LangChainAdapter(chain)
    await ctx.session.run()
AgentRunner requires UNPOD_API_KEY in your environment, and the agent_id you pass must match your Speech Pipe’s agent_id - see IDs You’ll Meet. The Setup Checklist lists every variable.
AgentRunner(
    entrypoint=entrypoint,
    agent_id="my-agent",  # must match agent_id in your Speech Pipe config
).start()
Incoming calls to your Unpod number are now routed to your agent.

Streaming is the hot path

This is the single most important thing on this page. During a live call, session.run() calls your adapter’s stream() on every user turn and pipes tokens straight to the voice bridge for synthesis. turn() is never called by the framework during live calls.
If your adapter implements turn() properly but fakes stream() with a single-chunk fallback (await the full response, yield it once), the caller hears silence until the entire response is generated, then a long monologue - choppy, high-latency audio with no error message anywhere. Implement real token streaming in stream().
The bundled adapters differ here - check the table below before choosing: OpenAIAdapter, AnthropicAdapter, and LangChainAdapter stream real tokens from their providers; HTTPAdapter cannot stream (one HTTP round-trip, one chunk), so expect a latency penalty proportional to your response length.
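To make the difference concrete, here is a small sketch that compares a single-chunk fallback with real token streaming. It uses no Unpod code: generate() is a hypothetical token source standing in for a model, and the 20 ms per-token delay is an arbitrary assumption. The quantity to watch is the time until the first chunk, since that is roughly when synthesis can begin.

```python
import asyncio
import time
from collections.abc import AsyncIterator

TOKENS = ["Sure, ", "I ", "can ", "help ", "with ", "that."]
TOKEN_DELAY_S = 0.02  # assumed per-token model latency, for illustration only


async def generate() -> AsyncIterator[str]:
    """Hypothetical token source standing in for a model."""
    for token in TOKENS:
        await asyncio.sleep(TOKEN_DELAY_S)
        yield token


async def fake_stream() -> AsyncIterator[str]:
    """Anti-pattern: buffer the whole reply, then yield it as one chunk."""
    reply = "".join([token async for token in generate()])
    yield reply


async def real_stream() -> AsyncIterator[str]:
    """Forward each token as soon as it is produced."""
    async for token in generate():
        yield token


async def first_chunk_latency(stream: AsyncIterator[str]) -> tuple[float, int]:
    """Return (seconds until the first chunk, total number of chunks)."""
    start = time.perf_counter()
    first_chunk_s = 0.0
    chunks = 0
    async for _ in stream:
        if chunks == 0:
            first_chunk_s = time.perf_counter() - start
        chunks += 1
    return first_chunk_s, chunks


async def compare() -> dict[str, tuple[float, int]]:
    """Measure both variants against the same token source."""
    return {
        "fake": await first_chunk_latency(fake_stream()),
        "real": await first_chunk_latency(real_stream()),
    }
```

Running asyncio.run(compare()) should show the fake variant delivering one chunk only after every token has been generated, while the real variant delivers its first chunk after a single token delay.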

The DialogAdapter protocol

Any object with these three methods is a valid brain - no base class required (the protocol is runtime-checkable):
async def turn(self, text: str, context: dict | None = None) -> str
    # Return a complete response. Not called during live calls.

async def stream(self, text: str, context: dict | None = None) -> AsyncIterator[str]
    # Yield response tokens. THE hot path - session.run() calls this.

def assist(self, text: str) -> None
    # Inject a system instruction before the next turn.
Auto-wrapping: assigning a superdialog DialogMachine or LLMAgent directly to ctx.session.dialog_machine wraps it in a SuperDialogAdapter for you. Anything else must satisfy the protocol above, or the setter raises TypeError.

Supported adapters

Adapter | Import | Real streaming | Use when
--- | --- | --- | ---
OpenAIAdapter | unpod.adapters.openai | Yes | OpenAI AsyncOpenAI client - gpt-4o, gpt-4o-mini, etc.
AnthropicAdapter | unpod.adapters.anthropic | Yes | Anthropic AsyncAnthropic client - Claude models
LangChainAdapter | unpod.adapters.langchain | Yes | LangChain chain with .ainvoke() / .astream()
SuperDialogAdapter | unpod.adapters.superdialog | Yes | superdialog DialogMachine / LLMAgent - or assign directly, auto-wrapped
HTTPAdapter | unpod.adapters.http | No - single chunk | Remote agent API (any language); accepts the latency trade-off
MCPAdapter | unpod.adapters.mcp | Preview | Interface defined; full MCP orchestration is not implemented yet (unpod[mcp])
Custom | Implement the protocol | Up to you | Any Python object with turn, stream, assist

OpenAI

from openai import AsyncOpenAI
from unpod import AgentRunner, CallContext
from unpod.adapters.openai import OpenAIAdapter

client = AsyncOpenAI()

async def entrypoint(ctx: CallContext) -> None:
    ctx.session.dialog_machine = OpenAIAdapter(
        client=client,
        model="gpt-4o-mini",
        system_prompt="You are a helpful support agent. Be concise.",
    )
    await ctx.session.run()

AgentRunner(entrypoint=entrypoint, agent_id="my-agent").start()

Anthropic (Claude)

import anthropic
from unpod import AgentRunner, CallContext
from unpod.adapters.anthropic import AnthropicAdapter

client = anthropic.AsyncAnthropic()

async def entrypoint(ctx: CallContext) -> None:
    ctx.session.dialog_machine = AnthropicAdapter(
        client=client,
        model="claude-haiku-4-5-20251001",
        system_prompt="You are a helpful support agent. Be concise.",
    )
    await ctx.session.run()

AgentRunner(entrypoint=entrypoint, agent_id="my-agent").start()
Model name formats differ by layer: direct adapters take the provider’s raw model name (claude-haiku-4-5-20251001, gpt-4o-mini); superdialog brains take a provider/model URI (anthropic/claude-haiku-4-5). Both are correct in their context.

LangChain

LangChainAdapter expects your chain to accept {"messages": [...]} as input by default. This works with ChatPromptTemplate | ChatModel chains. The adapter keeps conversation history across turns and streams via .astream(). If your chain uses a different input key, pass input_key:
# Chain expects {"input": "..."}
LangChainAdapter(chain, input_key="input")

HTTP endpoint

HTTPAdapter lets you keep your agent in any language behind an HTTP API. Each user turn becomes one POST:
from unpod.adapters.http import HTTPAdapter

adapter = HTTPAdapter(
    url="https://agent.example.com/respond",
    headers={"Authorization": "Bearer ..."},  # optional
    timeout_s=10.0,
)
Request body your endpoint receives, and the response it must return:
// request - session_id and system_instructions are included when present
{"text": "what the caller said", "context": {}, "session_id": "...", "system_instructions": ["..."]}
// response
{"text": "your agent's reply"}
HTTPAdapter does not stream - the whole reply arrives as one chunk, so the caller waits for your full HTTP round-trip before hearing anything. Fine for short replies; for long-form answers prefer an in-process adapter.
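On the receiving side, the contract above boils down to parsing one JSON body and returning one JSON body. The sketch below shows that handling as a plain function, independent of any web framework; respond() is hypothetical placeholder logic, and treating context, session_id, and system_instructions as optional follows the note that they are included only when present.

```python
import json


def respond(text: str, system_instructions: list[str]) -> str:
    """Hypothetical agent logic; replace with your own."""
    prefix = "(instructed) " if system_instructions else ""
    return f"{prefix}You said: {text}"


def handle_request(raw_body: bytes) -> bytes:
    """Turn one request body into one response body."""
    payload = json.loads(raw_body)
    text = payload["text"]  # always present: what the caller said
    context = payload.get("context") or {}  # available to your own logic
    session_id = payload.get("session_id")  # only present when set
    instructions = payload.get("system_instructions") or []
    reply = respond(text, instructions)
    return json.dumps({"text": reply}).encode("utf-8")
```

Wiring handle_request into a POST route of whichever web framework you already use is enough to satisfy the request and response shapes shown above.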

Using session controls

Inside your entrypoint you can react to call events and control the call:
async def entrypoint(ctx: CallContext) -> None:
    @ctx.session.on("user_turn")
    async def _(text: str) -> None:
        print(f"User said: {text}")

    @ctx.session.on("call_end")
    async def _(reason: str) -> None:
        print(f"Call ended: {reason}")

    ctx.session.dialog_machine = LangChainAdapter(chain)
    await ctx.session.run()
Common hooks: call_start, user_turn, agent_turn, user_partial, interruption, call_end, error (the registry also accepts tool_call, tool_result, silence, and metric). See Session for the full control surface.
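The example above names both handlers _, which works because a registration decorator stores the function object at decoration time, so later rebinding the name does not unregister it. The toy registry below illustrates that pattern only; ToySession is a hypothetical stand-in and says nothing about how Unpod's session is implemented.

```python
import asyncio
from collections import defaultdict
from collections.abc import Awaitable, Callable

Handler = Callable[..., Awaitable[None]]


class ToySession:
    """Hypothetical stand-in for a session's hook registry."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Handler], Handler]:
        """Return a decorator that registers a handler for the event."""
        def register(handler: Handler) -> Handler:
            self._handlers[event].append(handler)
            return handler
        return register

    async def emit(self, event: str, *args: object) -> None:
        """Await every handler registered for the event, in order."""
        for handler in self._handlers[event]:
            await handler(*args)


async def demo() -> list[str]:
    session = ToySession()
    seen: list[str] = []

    @session.on("user_turn")
    async def _(text: str) -> None:
        seen.append(f"user_turn:{text}")

    @session.on("call_end")
    async def _(reason: str) -> None:  # rebinding _ keeps the first handler registered
        seen.append(f"call_end:{reason}")

    await session.emit("user_turn", "hello")
    await session.emit("call_end", "hangup")
    return seen
```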

Writing a custom adapter

Implement the three protocol methods - no base class required:
# my_agent stands in for your existing agent object
class MyAdapter:
    async def turn(self, text: str, context: dict | None = None) -> str:
        """Return complete response. Called for non-streaming use."""
        return my_agent.respond(text)

    async def stream(self, text: str, context: dict | None = None):
        """Yield response tokens. THIS is the hot path used by session.run()."""
        async for token in my_agent.stream(text):
            yield token

    def assist(self, text: str) -> None:
        """Inject a system instruction before the next turn."""
        my_agent.set_instruction(text)

Next steps

Setup Checklist - One-time resource provisioning: agent, number, voice profile

Session Controls - say(), transfer(), recording controls during live calls

SuperDialog - State machine framework for structured conversation flows

SuperDialog + Voice - Plug a DialogMachine directly into your AgentRunner session