Deploy - Unpod AI

An AgentRunner is a long-lived process: it registers with the Unpod orchestrator over WSS and sends heartbeats at an interval the orchestrator assigns on registration. Under the default dial_out transport it then dials out to a per-call bridge each time a call is assigned; under the legacy serve transport it instead hosts a bridge connection per call. Provisioning is covered by the Provisioning checklist; this page covers what changes when you leave your laptop.

Since v2 the runner defaults to the dial_out transport: both the control connection and the per-call bridge are outbound from the runner, so a runner behind NAT needs no public reachability, serving_url, or agent_secret. The reachability and authentication steps below apply only when you opt into the legacy transport="serve" model. See AgentRunner constructor.

What production needs

A production deployment should have:

A stable UNPOD_API_KEY - required by the AgentRunner for the orchestrator connection (full variable table)
A clear agent_id shared by every replica of the same agent
Capacity limits (max_sessions, permits_per_minute) that match your traffic
Shutdown handling so active calls can drain before the process exits
Observability for active calls, failures, and latency
(legacy serve transport only) a public serving_url so Unpod can reach the runner bridge, and an agent_secret so inbound bridge connections are signed and verified

If you are writing docs for app developers, this is the minimum contract:

Set the environment variables.
Run multiple replicas with the same agent_id when you need scale.
Restart the process under a supervisor.
(legacy serve transport only) expose the runner publicly and lock down the bridge with UNPOD_AGENT_SECRET.

Production checklist

1. Reachability - `serving_url` (legacy `serve` transport only)

This step applies only to transport="serve". Under the default dial_out transport the runner never listens - it dials out to the bridge URL delivered in each job.assign frame - so no public reachability is needed.

In serve mode the runner hosts a per-call bridge that Unpod dials into. It binds to 0.0.0.0:8765 by default; in production, tell the orchestrator where that bridge is publicly reachable:

export UNPOD_RUNNER_URL="wss://agents.example.com:8765"

(or pass serving_url=... to AgentRunner). The host/port must be reachable from Unpod.
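
A common mistake is to publish a URL that only works from inside the host. The sketch below is a pre-flight check you could run before starting the runner; check_serving_url is a hypothetical helper, not part of the Unpod SDK, and it assumes you want TLS (wss), as in the example above. It inspects only the URL text, so it cannot prove the address is actually reachable from Unpod.

```python
import ipaddress
from urllib.parse import urlsplit


def check_serving_url(url: str) -> list[str]:
    """Return a list of problems; an empty list means the URL looks usable."""
    problems = []
    parts = urlsplit(url)
    # Assumption: production bridges should use TLS, as in the wss:// example.
    if parts.scheme != "wss":
        problems.append(f"scheme is {parts.scheme!r}, expected 'wss'")
    host = parts.hostname
    if not host:
        problems.append("no hostname")
        return problems
    if host == "localhost":
        problems.append("localhost is not reachable from outside")
        return problems
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return problems  # a DNS name: cannot be judged without resolving it
    if ip.is_loopback or ip.is_unspecified or ip.is_private:
        problems.append(f"{host} is not a public address")
    return problems


if __name__ == "__main__":
    for candidate in ("wss://agents.example.com:8765", "ws://0.0.0.0:8765"):
        print(candidate, check_serving_url(candidate))
```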

2. Authentication - `agent_secret` (legacy `serve` transport only)

Also serve-only. Under the default dial_out transport the per-call bridge is authenticated by the per-call token embedded in the job.assign bridge_url, so agent_secret is ignored (the constructor warns if you pass it).

In serve mode, with an agent secret set, every inbound bridge connection is HMAC-verified (signed URLs, replay-protected). Without one, the runner accepts unsigned connections - acceptable only for local dev.

export UNPOD_AGENT_SECRET="a-long-random-secret"
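
To make "HMAC-verified, signed, replay-protected" concrete, here is a generic sketch of the technique. It is not Unpod's wire format: the query parameter names (call_id, expires, sig), the signed payload layout, and the in-memory replay set are all assumptions chosen for illustration. The runner performs the real verification for you once the secret is set.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def _signature(secret: bytes, call_id: str, expires: int) -> str:
    payload = f"{call_id}:{expires}".encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign_url(base: str, secret: bytes, call_id: str,
             ttl_s: int = 30, now: Optional[float] = None) -> str:
    """Append an expiry and an HMAC-SHA256 signature to a bridge URL."""
    current = time.time() if now is None else now
    expires = int(current + ttl_s)
    query = urlencode({"call_id": call_id, "expires": expires,
                       "sig": _signature(secret, call_id, expires)})
    return f"{base}?{query}"


def verify_url(url: str, secret: bytes, seen: set,
               now: Optional[float] = None) -> bool:
    """Accept a URL only if it is correctly signed, unexpired, and unused."""
    query = parse_qs(urlsplit(url).query)
    try:
        call_id = query["call_id"][0]
        expires = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(secret, call_id, expires)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    if sig in seen:  # replay: this exact signed URL was already used
        return False
    seen.add(sig)
    return True
```

A long random secret matters because anyone who can guess it can mint valid URLs; something like `python -c "import secrets; print(secrets.token_urlsafe(32))"` is one way to generate it.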

3. Capacity

AgentRunner(
    entrypoint=handle_call,   # your per-call handler, defined elsewhere
    agent_id="my-agent",
    max_sessions=50,          # orchestrator never dispatches beyond this
    permits_per_minute=120,   # rate of new call acceptance
    drain_timeout_s=60,       # seconds to let active calls finish on SIGTERM
)
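
To pick numbers that match your traffic, Little's law gives a starting point: expected concurrent calls equal the arrival rate multiplied by the average call duration. The sketch below applies it; size_pool and the headroom factor are hypothetical, and it assumes max_sessions is enforced per replica.

```python
import math


def size_pool(calls_per_minute: float, avg_call_seconds: float,
              max_sessions: int, headroom: float = 1.3) -> dict:
    """Estimate concurrency and replica count from traffic figures."""
    # Little's law: L = lambda * W (arrival rate times time in system).
    concurrent = calls_per_minute / 60.0 * avg_call_seconds
    # Headroom absorbs bursts; 1.3 is an arbitrary illustrative default.
    target = concurrent * headroom
    replicas = max(1, math.ceil(target / max_sessions))
    return {"expected_concurrent": concurrent, "replicas": replicas}


if __name__ == "__main__":
    # 120 new calls per minute, 3-minute average call, 50 sessions per replica
    print(size_pool(120, 180, max_sessions=50))
```

With these inputs the estimate is 360 concurrent calls and 10 replicas. Treat it as a first guess and adjust from measured peaks.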

4. Scaling - add replicas

Run more processes with the same agent_id - on one machine or many. They form a pool; the orchestrator load-balances dispatches across replicas by reported capacity. No shared state, no coordination needed.

# replica 1, replica 2, ... identical:
# (UNPOD_AGENT_SECRET is read only under the legacy serve transport; omit it under dial_out)
UNPOD_API_KEY=sk_... UNPOD_AGENT_SECRET=... python agent.py

5. Shutdown and supervision

On SIGTERM the runner stops accepting dispatches and drains active calls for up to drain_timeout_s before exiting - container- and systemd-friendly.
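
The runner handles the signal itself; the sketch below only illustrates what "drain for up to drain_timeout_s" means for your call handlers. It is a generic asyncio pattern, not the SDK's implementation, and the drain function and demo are hypothetical.

```python
import asyncio


async def drain(active: set, drain_timeout_s: float) -> int:
    """Wait for active call tasks to finish, then cancel the stragglers.

    Returns the number of tasks that had to be cancelled.
    """
    if not active:
        return 0
    _done, pending = await asyncio.wait(active, timeout=drain_timeout_s)
    for task in pending:
        task.cancel()
    # Collect the cancellations so nothing is left running at exit.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(pending)


async def demo() -> int:
    async def call(seconds: float) -> None:
        await asyncio.sleep(seconds)  # stands in for a live call

    # One short call that finishes in time, one that outlives the timeout.
    active = {asyncio.create_task(call(0.01)), asyncio.create_task(call(10))}
    return await drain(active, drain_timeout_s=0.1)


if __name__ == "__main__":
    print("cancelled:", asyncio.run(demo()))
```

The practical consequence: set drain_timeout_s no longer than your supervisor's kill grace period (for example Kubernetes terminationGracePeriodSeconds), otherwise the process is killed before the drain completes.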

Under the default dial_out transport the runner auto-reconnects its orchestrator control socket with jittered exponential backoff (1s → 30s cap) and re-registers under the same worker_id; in-flight calls ride their own bridge sockets and survive a control-socket drop. Still run it under a supervisor with restart-on-exit (systemd Restart=always, Kubernetes restart policy, Docker --restart) to recover from process crashes and from a non-retriable transport rejection (an orchestrator too old to acknowledge dial_out). The legacy transport="serve" model does not auto-reconnect.
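
For intuition about the reconnect schedule, here is a minimal sketch of jittered exponential backoff with a 1s base and a 30s cap. The "full jitter" variant (a uniform draw between zero and the current ceiling) is an assumption; the source does not say which jitter scheme the runner uses, and backoff_delays is a hypothetical name.

```python
import random
from itertools import islice


def backoff_delays(base_s: float = 1.0, cap_s: float = 30.0, rng=None):
    """Yield an endless series of reconnect delays, in seconds."""
    rng = rng or random.Random()
    ceiling = base_s
    while True:
        # Full jitter: spread reconnects so replicas do not retry in lockstep.
        yield rng.uniform(0, ceiling)
        # Double the ceiling each attempt until it reaches the cap.
        ceiling = min(cap_s, ceiling * 2)


if __name__ == "__main__":
    for attempt, delay in enumerate(islice(backoff_delays(), 8), start=1):
        print(f"attempt {attempt}: sleep {delay:.2f}s")
```

The jitter matters most after an orchestrator restart, when every replica in a pool loses its control socket at the same moment.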

Environment summary

The minimum production .env - every variable is defined in the Provisioning checklist:

UNPOD_API_KEY="sk_..."              # required by the AgentRunner
UNPOD_BASE_URL="api.unpod.ai"
UNPOD_PLATFORM_TOKEN="..."          # management client; falls back to UNPOD_API_KEY
UNPOD_ORG_HANDLE="acme"             # org-scoped / telephony calls

# Legacy transport="serve" only (ignored under the default dial_out transport):
UNPOD_AGENT_SECRET="long-random-secret"
UNPOD_RUNNER_URL="wss://agents.example.com:8765"
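
It can help to fail fast at startup when a variable is missing, rather than discovering it on the first call. The sketch below uses the variable names from this page; missing_env is a hypothetical helper, and treating both serve-only variables as mandatory in serve mode is an assumption based on the production guidance above.

```python
import os
import sys

REQUIRED = ("UNPOD_API_KEY",)
# Only read under the legacy transport="serve" model.
SERVE_ONLY = ("UNPOD_AGENT_SECRET", "UNPOD_RUNNER_URL")


def missing_env(env, transport: str = "dial_out") -> list:
    """Return the names of variables that are unset or empty."""
    names = REQUIRED + (SERVE_ONLY if transport == "serve" else ())
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env(os.environ)
    if missing:
        sys.exit(f"missing environment variables: {', '.join(missing)}")
    print("environment looks complete")
```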

Constructor arguments beat the environment:

AgentRunner(base_url=...) overrides UNPOD_ORCHESTRATOR_URL
AgentRunner(serving_url=...) overrides UNPOD_RUNNER_URL
AsyncClient(base_url=...) overrides UNPOD_SERVICE_BASE_URL
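
The precedence rule amounts to "explicit argument first, then environment, then default". A minimal sketch of that lookup order follows; resolve is a hypothetical helper for illustration, not an SDK function.

```python
import os
from typing import Mapping, Optional


def resolve(explicit: Optional[str], env_name: str,
            default: Optional[str] = None,
            env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick a setting: constructor argument, else environment, else default."""
    if explicit is not None:
        return explicit
    return env.get(env_name, default)


if __name__ == "__main__":
    fake_env = {"UNPOD_RUNNER_URL": "wss://from-env.example.com:8765"}
    # The explicit argument wins over the environment.
    print(resolve("wss://explicit.example.com:8765", "UNPOD_RUNNER_URL", env=fake_env))
    # With no argument, the environment value is used.
    print(resolve(None, "UNPOD_RUNNER_URL", env=fake_env))
```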

Watch it run

Wire up metrics and runner stats before you need them.
