AgentRunner & Sessions

The Connectivity half of the SDK. An AgentRunner is the long-lived process that receives calls; a Session is your control surface for one live call. This page is the reference for both.

Install

uv add unpod
uv add "unpod[dialog]"      # optional: superdialog for structured flows
uv add "unpod[langchain]"   # optional: LangChain adapter

Source: unpod-ai/unpod-python-sdk.

export UNPOD_API_KEY="sk_..."
export UNPOD_BASE_URL="api.unpod.ai"   # one URL; the runner derives wss://<host>

The Quickstart documents every environment variable.
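
The comment on UNPOD_BASE_URL says the runner derives a wss:// endpoint from a single host value. How the SDK does this internally is not shown on this page, so the following is only a sketch of that kind of normalization. derive_ws_url is a hypothetical helper and is not part of unpod.

```python
from urllib.parse import urlsplit


def derive_ws_url(base_url: str) -> str:
    """Hypothetical helper: turn a bare host or http(s) URL into a wss:// URL."""
    # urlsplit only recognizes the host when a scheme separator is present,
    # so prepend "//" to bare values such as "api.unpod.ai".
    parts = urlsplit(base_url if "://" in base_url else f"//{base_url}")
    if not parts.netloc:
        raise ValueError(f"no host found in {base_url!r}")
    return f"wss://{parts.netloc}"
```

Under these assumptions, both a bare host and a full https URL reduce to the same WebSocket origin; any path on the input is dropped.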

The runner

AgentRunner holds a WebSocket connection to the Unpod orchestrator. When a call is dispatched to your agent, the runner invokes your entrypoint with a CallContext.

[Figure: AgentRunner dispatch diagram. The Unpod orchestrator connects to AgentRunner over WSS with heartbeats and capacity reporting; calls are dispatched into the entrypoint's CallContext and then into the Session run loop.]

from unpod import AgentRunner, CallContext

async def handle_call(ctx: CallContext) -> None:
    await ctx.session.say("Hello, thanks for calling!")
    await ctx.session.run()  # blocks until the call ends

runner = AgentRunner(
    entrypoint=handle_call,
    agent_id="my-agent",  # must match agent_id in your Speech Pipe
)
runner.start()  # blocking

agent_id is the runner agent ID: a short string you choose that must match the agent_id in your Speech Pipe config. It is not the pipe’s UUID. See IDs You’ll Meet.

Constructor

AgentRunner(
    entrypoint: Callable[[CallContext], Awaitable[None]],
    agent_id: str,
    api_key: str | None = None,            # falls back to UNPOD_API_KEY
    max_sessions: int = 50,                 # max concurrent sessions
    max_concurrent_calls: int | None = None, # alias for max_sessions
    permits_per_minute: int = 120,          # rate of new call acceptance
    drain_timeout_s: int = 60,              # graceful shutdown window
    dev_mode: bool = False,                 # local orchestrator, dev pool
    base_url: str | None = None,            # override orchestrator URL
    serving_url: str | None = None,         # serve transport only; falls back to UNPOD_RUNNER_URL
    agent_secret: str | None = None,        # serve transport only; falls back to UNPOD_AGENT_SECRET
    transport: str = "dial_out",            # "dial_out" (v2 default) | "serve" (legacy)
)

Parameter	Description
`max_sessions`	Max simultaneous sessions this runner accepts; the orchestrator will not dispatch beyond it.
`permits_per_minute`	Rate of new call acceptance. Lower it to protect downstream systems.
`drain_timeout_s`	On shutdown, wait this long for active calls to finish before force-exiting.
`dev_mode`	Register in a dev pool against a local orchestrator.
`transport`	`"dial_out"` (v2, default): the runner never listens — it dials out per call to the `bridge_url` delivered in the `job.assign` frame. `"serve"` (legacy, deprecated): the runner hosts a bridge server that media agents dial into.
`serving_url`	Public `wss://` URL of the runner’s bridge server. Only used with `transport="serve"`; ignored (warns) under the default `dial_out` transport, which never listens.
`agent_secret`	Only used with `transport="serve"`: when set, inbound bridge connections are HMAC-verified; without it the runner accepts unsigned connections (dev default). Ignored (warns) under the default `dial_out` transport.
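
To make permits_per_minute more concrete, here is a minimal token-bucket sketch of a "new calls per minute" limit. The page does not say which algorithm the runner actually uses, so treat this as one plausible model. PermitBucket and its clock parameter are hypothetical names.

```python
import time


class PermitBucket:
    """Illustrative token bucket: roughly `permits_per_minute` acceptances per minute."""

    def __init__(self, permits_per_minute: int, clock=time.monotonic) -> None:
        self.capacity = float(permits_per_minute)
        self.refill_per_s = permits_per_minute / 60.0
        self.tokens = self.capacity      # start full
        self.clock = clock               # injectable so the sketch can be tested
        self.last = clock()

    def try_acquire(self) -> bool:
        """Return True if a new call may be accepted right now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In this model, lowering permits_per_minute shrinks both the burst size and the steady acceptance rate, which is the behavior you want when protecting a slow downstream system.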

Runner lifecycle hooks

React to runner-level events (distinct from per-session hooks):

@runner.on("call_start")
async def on_call_start(ctx: CallContext) -> None:
    print(f"New call: {ctx.call_id}")

@runner.on("call_end")
async def on_call_end(ctx: CallContext, final_state: str) -> None:
    print(f"Call {ctx.call_id} ended: {final_state}")

This is the runner-level call_end, with signature (ctx, final_state); it fires once per call on the runner. The session-level call_end, registered with @ctx.session.on("call_end") and taking (final_state), fires inside a single call. Use the session-level hook to read the telephony end_reason via ctx.session.data.get("end_reason"). See Call Lifecycle.
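
The two call_end hooks differ only in who registers them and which arguments they receive. The sketch below imitates the on(event) decorator pattern with a stand-in HookRegistry class (hypothetical, not the SDK's implementation) so the two signatures can be compared side by side. The order in which demo() emits the two events is arbitrary and says nothing about the SDK's real ordering.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[..., Awaitable[None]]


class HookRegistry:
    """Illustrative stand-in for an object exposing .on(event) decorators."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Handler], Handler]:
        def register(fn: Handler) -> Handler:
            self._handlers[event].append(fn)
            return fn  # return the function unchanged so it stays callable
        return register

    async def emit(self, event: str, *args: Any) -> None:
        for fn in self._handlers[event]:
            await fn(*args)


runner_hooks = HookRegistry()   # plays the role of `runner`
session_hooks = HookRegistry()  # plays the role of `ctx.session`
seen: list[tuple] = []


@runner_hooks.on("call_end")
async def runner_call_end(ctx: dict, final_state: str) -> None:
    seen.append(("runner", ctx["call_id"], final_state))


@session_hooks.on("call_end")
async def session_call_end(final_state: str) -> None:
    seen.append(("session", final_state))


async def demo() -> None:
    await session_hooks.emit("call_end", "completed")
    await runner_hooks.emit("call_end", {"call_id": "c-1"}, "completed")
```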

CallContext

Every call to your entrypoint receives a CallContext:

async def handle_call(ctx: CallContext) -> None:
    ctx.call_id       # str: unique call ID
    ctx.session_id    # str: unique session ID
    ctx.agent_id      # str: the CALL's agent (from the dispatch / call.started)
    ctx.runner_id     # str: this runner's OWN configured agent_id
    ctx.direction     # str: "inbound" or "outbound"
    ctx.user_number   # str: caller's E.164 number
    ctx.instructions  # str | None: per-call override instructions
    ctx.data          # dict: metadata from dispatch (e.g. CRM data)
    ctx.room          # dict: LiveKit room metadata (informational; brain is text-only)
    ctx.session       # Session: call control object

ctx.agent_id is the agent the call was dispatched for (carried on call.started); ctx.runner_id is the agent_id this runner process was constructed with. They match for a single-agent runner and differ for a multi-tenant one — route on ctx.agent_id, not ctx.runner_id.
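
For a multi-tenant runner, routing on ctx.agent_id can be as simple as a dictionary lookup. The sketch below uses a stand-in FakeContext dataclass instead of the real CallContext so it runs on its own; the agent IDs and handler names are hypothetical, and the handlers return strings only to make the routing observable (a real entrypoint returns None).

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class FakeContext:
    """Stand-in for CallContext with only the two fields used here."""
    agent_id: str    # the agent the call was dispatched for
    runner_id: str   # the agent_id this runner was constructed with


async def handle_billing(ctx: FakeContext) -> str:
    return "billing"


async def handle_support(ctx: FakeContext) -> str:
    return "support"


# Hypothetical agent IDs; use whatever your Speech Pipes are configured with.
ROUTES: dict[str, Callable[[FakeContext], Awaitable[str]]] = {
    "billing-agent": handle_billing,
    "support-agent": handle_support,
}


async def handle_call(ctx: FakeContext) -> str:
    handler = ROUTES.get(ctx.agent_id)  # route on the call's agent, not the runner's
    if handler is None:
        raise LookupError(f"no handler registered for agent {ctx.agent_id!r}")
    return await handler(ctx)
```

Failing loudly on an unknown agent_id is a design choice in this sketch; falling back to a default handler would be equally reasonable.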

The Session

ctx.session is your interface to the live call: speak, interrupt, transfer, record, end - all from inside your entrypoint.

Speaking

await ctx.session.say("Thank you for your patience.")   # speak via TTS, returns immediately
await ctx.session.set_filler("One moment please...")    # played during processing silences

Interrupting

@ctx.session.on("user_turn")
async def on_user_turn(text: str) -> None:
    if "stop" in text.lower():
        await ctx.session.interrupt()   # stop the current utterance

Transferring

await ctx.session.transfer_to_human(queue="tier-2-support")   # cold transfer to a human queue
await ctx.session.transfer_to_agent(agent_id="billing-agent") # cold transfer to another agent

A cold transfer drops your session the moment it is initiated. For a warm handoff, use the out-of-band client.sessions.transfer(..., mode="warm"), described under Out-of-band session control.

Ending

await ctx.session.end(reason="completed")  # reason defaults to "completed"

Common reasons: "completed", "no_response", "error", "transferred", "max_duration".

Recording control

await ctx.session.recording.pause(reason="PII")  # e.g. before card numbers
await ctx.session.recording.resume()

Pause/resume requires recording to be enabled on the Speech Pipe (recording=True); otherwise these calls are ignored.
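
A pause with no matching resume leaves the rest of the call unrecorded, so it can help to pair the two calls structurally. The sketch below wraps the documented pause(reason=...) and resume() calls in an async context manager. recording_paused is a hypothetical helper, and FakeRecording is a stand-in for ctx.session.recording so the example runs without a live call.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeRecording:
    """Stand-in exposing the documented pause(reason=...) / resume() calls."""

    def __init__(self) -> None:
        self.events: list[str] = []

    async def pause(self, reason: str = "") -> None:
        self.events.append(f"pause:{reason}")

    async def resume(self) -> None:
        self.events.append("resume")


@asynccontextmanager
async def recording_paused(recording, reason: str = ""):
    """Hypothetical helper: pause on entry, always resume on exit."""
    await recording.pause(reason=reason)
    try:
        yield
    finally:
        await recording.resume()  # runs even if the body raises


async def demo() -> list[str]:
    rec = FakeRecording()
    try:
        async with recording_paused(rec, reason="PII"):
            raise RuntimeError("card capture failed")
    except RuntimeError:
        pass
    return rec.events
```

Inside an entrypoint this would read: async with recording_paused(ctx.session.recording, reason="PII"): followed by the card-capture steps.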

Per-call data

session.data is a plain dict scoped to the current call:

ctx.session.data["customer"] = await crm.lookup(ctx.user_number)

The main loop - `run()`

session.run() keeps the call alive. It reads bridge events, fires your hooks, routes each transcribed user turn to your dialog adapter’s stream(), and pipes the reply tokens to TTS.

async def handle_call(ctx: CallContext) -> None:
    ctx.session.dialog_machine = my_brain          # see Bring Your Agent
    await ctx.session.say("Hi, I'm Alex. How can I help?")
    await ctx.session.run()                        # blocks until call ends
    # anything here runs as post-call cleanup
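
As a mental model of what run() does with each user turn, the sketch below routes text to a dialog adapter and forwards the reply tokens as they arrive. It is heavily simplified and not the SDK's code: the stream(text) signature, the queue of turns, and the None sentinel for "call ended" are all assumptions made for illustration.

```python
import asyncio
from typing import AsyncIterator


class EchoBrain:
    """Toy dialog adapter; the stream(text) signature is an assumption."""

    async def stream(self, text: str) -> AsyncIterator[str]:
        for token in ("You", " said: ", text):
            yield token


async def run_loop(turns: asyncio.Queue, brain: EchoBrain, spoken: list[str]) -> None:
    """Simplified loop: one user turn in, reply tokens out, until the call ends."""
    while True:
        text = await turns.get()
        if text is None:          # sentinel standing in for "call ended"
            return
        async for token in brain.stream(text):
            spoken.append(token)  # a real loop would hand each token to TTS


async def demo() -> list[str]:
    turns: asyncio.Queue = asyncio.Queue()
    spoken: list[str] = []
    for item in ("hello", None):
        turns.put_nowait(item)
    await run_loop(turns, EchoBrain(), spoken)
    return spoken
```

Because tokens are forwarded one at a time, speech can begin before the adapter has finished generating the full reply.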

Live metrics

m = ctx.session.metrics.live()   # CallMetrics snapshot, during or after run()
m.turns           # int: dialog turns so far
m.duration_s      # float: call duration
m.stt_p95_ms      # int: P95 STT latency
m.llm_p95_ms      # int: P95 LLM latency
m.tts_p95_ms      # int: P95 TTS latency
m.cost.voice      # float: voice cost; also m.cost.llm, m.cost.total
m.tokens.input    # int: input tokens; also m.tokens.output
m.active_llm      # str: model used on the last turn

Latency/cost/token fields populate only if you feed the tracker via metrics.record_turn(...) (e.g. from the turn_complete hook). Out of the box only duration_s is meaningful — for real per-turn numbers use the llm_call / turn_complete hooks. See Metrics, Cost & Observability.
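
For readers unfamiliar with the *_p95_ms fields, a P95 latency is the value that 95% of recorded samples fall at or below. The sketch below computes it with the nearest-rank method. The page does not say which percentile method the tracker uses, so this is an illustration of the concept, and p95 is a hypothetical function name.

```python
def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a list of latency samples."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # 1-based rank = ceil(0.95 * n), computed in integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

With only a handful of turns the P95 is simply the slowest turn, so these fields become informative only on longer calls.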

Session API reference

Method	Signature	Description
`say`	`async (text: str) → None`	Speak text via TTS
`interrupt`	`async () → None`	Stop current utterance
`set_filler`	`async (text: str) → None`	Set filler phrase
`transfer_to_human`	`async (queue: str) → None`	Cold transfer to human queue
`transfer_to_agent`	`async (agent_id: str) → None`	Cold transfer to another agent
`end`	`async (reason: str = "completed") → None`	End the call
`run`	`async () → None`	Main event loop
`on`	`(event: str) → decorator`	Register hook
`recording.pause`	`async (reason: str = "") → None`	Pause recording
`recording.resume`	`async () → None`	Resume recording
`metrics`	`property → MetricsTracker`	Per-call metrics (`.live()`)
`dialog_machine`	`property (get/set)`	Dialog adapter - auto-wraps superdialog types
`data`	`dict[str, Any]`	Per-call scratch space

Out-of-band session control

Act on a live session from outside the call - your backend, an ops tool - via the Management SDK, targeting it by session ID:

from unpod import AsyncClient

async with AsyncClient(api_key="sk_...") as client:  # or set UNPOD_PLATFORM_TOKEN + UNPOD_ORG_HANDLE (preferred) and pass no args
    await client.sessions.end(session_id)

    await client.sessions.transfer(          # warm handoff supported here
        session_id,
        to_type="sip",
        to_config={"number": "+15551230000"},
        mode="warm",
        warm_handoff_ms=4000,
    )

    await client.sessions.merge(             # e.g. conference a supervisor in
        primary_session_id,
        secondary_session_ids=[other_session_id],
    )

Running in production

Monitoring

s = runner.stats()        # RunnerStats snapshot
s.in_flight               # current active calls
s.queued                  # dispatches waiting for capacity
s.capacity                # your max_sessions setting
s.completed_last_hour     # completed calls
s.failed_last_hour        # failed calls
s.mean_call_duration_s    # average call length

Graceful shutdown

Send SIGTERM (standard for containers and systemd). The runner stops accepting dispatches, waits up to drain_timeout_s for active calls, then exits. Or call await runner.shutdown() yourself.
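
The drain behavior described above (stop accepting, wait up to a timeout, then exit) can be sketched with plain asyncio. This is not the runner's implementation; drain is a hypothetical function, and each active call is modeled as an asyncio.Task.

```python
import asyncio


async def drain(active: set[asyncio.Task], drain_timeout_s: float) -> tuple[int, int]:
    """Wait up to drain_timeout_s for active calls, then cancel the rest.

    Returns (finished, cancelled) counts.
    """
    if not active:
        return 0, 0
    done, pending = await asyncio.wait(active, timeout=drain_timeout_s)
    for task in pending:
        task.cancel()
    # Let cancelled tasks unwind; return_exceptions keeps CancelledError from propagating.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)


async def demo() -> tuple[int, int]:
    short = asyncio.create_task(asyncio.sleep(0.01))  # finishes within the window
    long = asyncio.create_task(asyncio.sleep(10))     # still running at the deadline
    return await drain({short, long}, drain_timeout_s=0.1)
```

When choosing drain_timeout_s, a value slightly below your container's termination grace period leaves room for the process to exit cleanly before it is killed.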

Multiple runners

Run multiple AgentRunner processes with the same agent_id across machines. The orchestrator load-balances on reported capacity - no shared state needed.
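
One way to picture load-balancing on reported capacity is "send the next call to the runner with the most free slots". The orchestrator's actual policy is not documented on this page, so the sketch below is purely illustrative. RunnerReport, its fields, and pick_runner are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunnerReport:
    """Hypothetical capacity report; field names are illustrative."""
    runner: str
    in_flight: int
    capacity: int

    @property
    def free(self) -> int:
        return self.capacity - self.in_flight


def pick_runner(reports: list[RunnerReport]) -> Optional[str]:
    """Choose the runner with the most free slots, or None if all are full."""
    candidates = [r for r in reports if r.free > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.free).runner
```

Because each runner only reports its own numbers, a policy of this shape needs no coordination between runner processes.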

Next steps

Bring Your Agent

Hooks & Events