Two engines, one contract
One Python package. No services, no daemons. Everything in-process. SuperDialog ships two conversation engines behind the sameAgent protocol
(turn / assist / chat_ctx / load_chat_ctx). Hosts, sessions, and
adapters do not know which engine they are driving.
- Engine B - Playbook (default). The checkpoint-compound runtime for fluid conversations. Checkpoints gate outcomes, not utterances; a fast Talker streams every spoken turn while an async Director extracts, judges, and steers over an event-sourced log. This is where new investment goes.
- Engine A - DialogMachine (legacy). The graph-railed state machine. A flow graph decides every transition; the LLM speaks within the rails. Fully supported; existing flows keep working - by default, compiled onto Engine B.
DialogMachine(source, llm, *, engine=...) is the recommended way in and drives
either engine - the Playbook engine by default, the legacy graph runtime with
engine="flow".
Library shape
Engine B - the Playbook runtime
The default engine runs declarative checkpoint journeys. Two LLM roles share one append-only event log:- A fast Talker streams every spoken turn with one LLM call.
- An async Director makes one structured call per user utterance to extract typed slots, judge advance rules, run tools, and write a steering note.
One turn, in order
- User text arrives. The agent snapshots state (version N) for the Talker.
- Director starts concurrently in a cancellation-shielded task: appends the utterance, then makes one structured call that extracts slots, judges the advance rules, and writes a 1-3 sentence steering note.
- Talker streams concurrently from snapshot N - persona, guidance, steering note, slots, and recent transcript packed into one streaming call; tokens go straight to the host. At a hard gate it barriers first.
- Quiescence. After the verdict is applied, the runtime hops until nothing
moves: the entered checkpoint’s pipeline runs,
judge: exprrules evaluate LLM-free,autocheckpoints speak and advance, and a terminal checkpoint ends the session with its outcome. - Join and repair. The Talker’s speech is logged once;
check_repairscompares it against later slot writes and nudges a self-correction if the Talker re-asked something already answered.
The event-sourced log
Every mutation is an event; state is a pure fold over the log; the log is the audit artifact.replay,
run_session, and run_eval.
Gates and degradation
Soft gates never block - provisional values satisfyrequires, the Talker
streams immediately, correctness converges via the Director. Hard gates (
payments, identity) require confirmed slots and barrier the Talker until the
verdict lands - on timeout it speaks a filler, then a hold line, never hangs.
Every degradation rung is an event in the log, so degraded mode is auditable,
not silent.
Engine A - DialogMachine (legacy)
AFlow is a directed graph: nodes (states), edges (transitions with
natural-language conditions), and declarative actions. The graph decides what is
possible; the LLM picks among the outgoing edges. Every transition is
authored and every reachable path is enumerable - strong where determinism is
the point.
compile_flow); you only
opt into the original runtime with engine="flow". See
Flows for graph authoring and migration.
Model URI resolver
LiveKit/litellm-style URIs route to any provider:| URI | Routes to |
|---|---|
openai/gpt-4.1-mini | OpenAI |
anthropic/claude-haiku-4-5 | Anthropic |
google/gemini-2.5-pro | |
groq/llama-3.3-70b | Groq |
bedrock/<model> | AWS Bedrock |
vllm/<model>@<host> | Self-hosted vLLM |
ollama/<model>@<host> | Self-hosted Ollama |
openrouter/<vendor>/<model> | OpenRouter |
custom/<name>/<model> | Developer-registered via register_llm_provider |
llm drives both the Talker and the Director unless you
split them with director_llm= (a strong model to judge, a fast model to speak).
Adapter pattern
Adapters live insuperdialog.adapters and are thin shims. The same agent -
PlaybookAgent or legacy DialogMachine - passes through all of them.
| Adapter | Use case |
|---|---|
DialogMachineLLM (LiveKit) | Plug into Agent(llm=...) (accepts any Agent) |
make_processor (PipeCat) | Factory for FrameProcessor in a pipeline |
FastAPIRouter | Mountable router with /turn, /stream, /reset |
WebSocketRunner | Standalone WSS server for Unpod Voice Infra |
What lives outside this library
SuperDialog ends at text in, text out - on both engines. The following are out of scope:- Audio processing
- STT, TTS
- Telephony, SIP, RTP
- Media servers and WebRTC Rooms
- Phone numbers, voice profiles
- Billing