Skip to main content

Two engines, one contract

One Python package. No services, no daemons. Everything in-process. SuperDialog ships two conversation engines behind the same Agent protocol (turn / assist / chat_ctx / load_chat_ctx). Hosts, sessions, and adapters do not know which engine they are driving.
Animated SuperDialog engine contract diagram showing host platforms flowing through adapters, SessionWorker, the Agent protocol, and then branching to PlaybookAgent and DialogMachine engines.
  • Engine B - Playbook (default). The checkpoint-compound runtime for fluid conversations. Checkpoints gate outcomes, not utterances; a fast Talker streams every spoken turn while an async Director extracts, judges, and steers over an event-sourced log. This is where new investment goes.
  • Engine A - DialogMachine (legacy). The graph-railed state machine. A flow graph decides every transition; the LLM speaks within the rails. Fully supported; existing flows keep working - by default, compiled onto Engine B.
DialogMachine(source, llm, *, engine=...) is the recommended way in and drives either engine - the Playbook engine by default, the legacy graph runtime with engine="flow".

Library shape

superdialog/
  ├─ flow/                # Flow graph: nodes, edges, serialization
  ├─ machine/             # DialogStateMachine engine (Engine A internals)
  ├─ dialog_machine.py    # Public DialogMachine facade (unified entry point)
  ├─ playbook/            # Playbook engine (Engine B): models, events,
  │                       #   runtime, talker, director, compiler, replay
  ├─ agent.py             # Agent Protocol + TurnResult
  ├─ agents/              # LLMAgent, LangChainAgent (non-DM brains)
  ├─ session/             # Session, SessionHandle, SessionWorker, stores, locks
  ├─ chat_context.py      # ChatContext, ChatMessage (LiveKit-aligned)
  ├─ llm/                 # Model URI resolver and provider adapters
  ├─ tools/               # Python / HTTP / MCP tool wrappers
  ├─ cli/                 # superdialog generate / chat / optimize / playbook / flow / eval
  └─ adapters/            # LiveKit, PipeCat, FastAPI, WebSocket

Engine B - the Playbook runtime

The default engine runs declarative checkpoint journeys. Two LLM roles share one append-only event log:
  • A fast Talker streams every spoken turn with one LLM call.
  • An async Director makes one structured call per user utterance to extract typed slots, judge advance rules, run tools, and write a steering note.

One turn, in order

Animated Playbook turn runtime diagram showing user text entering PlaybookAgent.turn, splitting into a shielded Director task and Talker stream, joining, and returning checkpoint and outcome data.
  1. User text arrives. The agent snapshots state (version N) for the Talker.
  2. Director starts concurrently in a cancellation-shielded task: appends the utterance, then makes one structured call that extracts slots, judges the advance rules, and writes a 1-3 sentence steering note.
  3. Talker streams concurrently from snapshot N - persona, guidance, steering note, slots, and recent transcript packed into one streaming call; tokens go straight to the host. At a hard gate it barriers first.
  4. Quiescence. After the verdict is applied, the runtime hops until nothing moves: the entered checkpoint’s pipeline runs, judge: expr rules evaluate LLM-free, auto checkpoints speak and advance, and a terminal checkpoint ends the session with its outcome.
  5. Join and repair. The Talker’s speech is logged once; check_repairs compares it against later slot writes and nudges a self-correction if the Talker re-asked something already answered.
Barge-in is safe by construction: aborting the stream cancels speech, not the state machine - the Director runs to completion in a shielded scope.

The event-sourced log

Every mutation is an event; state is a pure fold over the log; the log is the audit artifact.
from superdialog.playbook import ConversationState, EventLog

text = agent.event_log.to_jsonl()                 # persist (JSONL, one event/line)
agent.load_event_log(EventLog.from_jsonl(text))   # lossless restore
state = ConversationState.fold(agent.event_log, playbook)
Because the log is the artifact, replay and eval are free: re-run the Director over recorded utterances to catch regressions, or score persona self-play sessions. See the API Reference for replay, run_session, and run_eval.

Gates and degradation

Soft gates never block - provisional values satisfy requires, the Talker streams immediately, correctness converges via the Director. Hard gates ( payments, identity) require confirmed slots and barrier the Talker until the verdict lands - on timeout it speaks a filler, then a hold line, never hangs. Every degradation rung is an event in the log, so degraded mode is auditable, not silent.

Engine A - DialogMachine (legacy)

A Flow is a directed graph: nodes (states), edges (transitions with natural-language conditions), and declarative actions. The graph decides what is possible; the LLM picks among the outgoing edges. Every transition is authored and every reachable path is enumerable - strong where determinism is the point.
Animated SuperDialog runtime diagram showing user text entering DialogMachine.turn, loading a node, building a prompt, calling the LLM, running tools, updating state, advancing an edge, and returning a turn result to CLI, FastAPI, LiveKit, or Unpod hosts.
from superdialog import DialogMachine, Flow

# engine="flow" selects the legacy graph runtime; the default is Playbook.
dm = DialogMachine(Flow.load("kyc.json"), llm="anthropic/claude-haiku-4-5", engine="flow")
reply = await dm.turn("hello")
Flexibility on this engine is rail-shaped, and each turn costs a route decision plus a speak call - exactly the friction the Playbook engine removes. By default, flow JSON runs compiled onto Engine B (compile_flow); you only opt into the original runtime with engine="flow". See Flows for graph authoring and migration.

Model URI resolver

LiveKit/litellm-style URIs route to any provider:
URIRoutes to
openai/gpt-4.1-miniOpenAI
anthropic/claude-haiku-4-5Anthropic
google/gemini-2.5-proGoogle
groq/llama-3.3-70bGroq
bedrock/<model>AWS Bedrock
vllm/<model>@<host>Self-hosted vLLM
ollama/<model>@<host>Self-hosted Ollama
openrouter/<vendor>/<model>OpenRouter
custom/<name>/<model>Developer-registered via register_llm_provider
On the Playbook engine, llm drives both the Talker and the Director unless you split them with director_llm= (a strong model to judge, a fast model to speak).

Adapter pattern

Adapters live in superdialog.adapters and are thin shims. The same agent - PlaybookAgent or legacy DialogMachine - passes through all of them.
AdapterUse case
DialogMachineLLM (LiveKit)Plug into Agent(llm=...) (accepts any Agent)
make_processor (PipeCat)Factory for FrameProcessor in a pipeline
FastAPIRouterMountable router with /turn, /stream, /reset
WebSocketRunnerStandalone WSS server for Unpod Voice Infra

What lives outside this library

SuperDialog ends at text in, text out - on both engines. The following are out of scope:
  • Audio processing
  • STT, TTS
  • Telephony, SIP, RTP
  • Media servers and WebRTC Rooms
  • Phone numbers, voice profiles
  • Billing