
Thinking in Playbooks

If you have built LLM chatbots before, SuperDialog’s default engine asks for one shift in thinking:
| Traditional bot | A rigid graph | SuperDialog playbook |
| --- | --- | --- |
| Write one long prompt | Wire every transition by hand | Author checkpoints that gate outcomes |
| LLM decides everything | LLM can only traverse defined edges | LLM owns the phrasing; the framework owns the outcomes |
| No structure | Users don’t follow your graph | Conversation is free inside a checkpoint; progress is gated |
The key move: checkpoints gate outcomes, not utterances. You don’t script what the agent says next - you declare what “done” means for each step, and the model speaks freely to get there.

The core model

A playbook is one or more journeys, each a list of checkpoints. A checkpoint is a call-center-script unit with four parts:
  • goal - what “done” means for this step (“Have the city, date, and party size”)
  • slots - typed data to extract while here (city: str, date: date, players: int)
  • guidance - prose the agent speaks from (it owns the wording)
  • advance_when - an ordered list of rules that move the conversation forward
[Figure: the Playbook checkpoint model. A checkpoint holds a goal, slots, guidance, and advance rules; conversation is free within it, and the flow moves to the next checkpoint when an outcome is met.]
Inside a checkpoint the conversation is free - the caller can answer in any order, give everything in one breath, or change their mind. The framework’s job is only to decide when the goal is actually met and where to go next.
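To make the four parts concrete, here is a minimal sketch of the checkpoint model as plain Python data. It is an illustration only, not SuperDialog's actual classes: `Checkpoint`, `AdvanceRule`, and `next_checkpoint` are hypothetical names, and the sketch assumes a rule fires as soon as every slot it requires has been captured (the real engine also judges the `when` condition).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class AdvanceRule:
    when: str                     # human-readable outcome, e.g. "all details captured"
    to: str                       # id of the checkpoint to move to
    requires: list = field(default_factory=list)  # slot names that must be present


@dataclass
class Checkpoint:
    id: str
    goal: str                     # what "done" means for this step
    slots: dict                   # slot name -> expected Python type
    guidance: str                 # prose the agent speaks from
    advance_when: list            # ordered list of AdvanceRule


def next_checkpoint(cp: Checkpoint, captured: dict) -> Optional[str]:
    """Return the target of the first rule whose required slots are all captured."""
    for rule in cp.advance_when:  # rules are ordered, so the first match wins
        if all(name in captured for name in rule.requires):
            return rule.to
    return None                   # goal not met yet: stay in this checkpoint


details = Checkpoint(
    id="details",
    goal="Have the city, date, and party size",
    slots={"city": str, "date": date, "players": int},
    guidance="Ask for whatever is still missing, one question at a time.",
    advance_when=[
        AdvanceRule(when="all details captured", to="confirm",
                    requires=["city", "date", "players"]),
    ],
)

# The caller can supply slots in any order; only the outcome is checked.
partial = {"players": 4}
complete = {"players": 4, "city": "Pune", "date": date(2025, 1, 1)}
print(next_checkpoint(details, partial))
print(next_checkpoint(details, complete))
```

Nothing in the sketch inspects what was said or in which order, which is the point of gating outcomes rather than utterances.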

Start with the topology, then write the steps

Before writing any prose, map your conversation:
  1. What are the distinct steps (checkpoints) in this conversation?
  2. What data (slots) must each step capture?
  3. What outcomes move the conversation forward (advance rules)?
  4. What can go wrong at each step? (caller refuses, asks a side question)
Then write it in the simple format - prose steps and a persona, the same thing superdialog generate produces:
goal: "Book a haircut and confirm it."
persona:
  name: Mira
  voice_style: "Warm and brief. One question at a time."
  identity: "You are Mira, a booking assistant for Glow Studio."
playbook:
  - id: greet
    purpose: "Open the call."
    say: "Greet the caller and ask how you can help."
    done_when: "Caller is ready to book."
  - id: collect
    purpose: "Get the booking details."
    say: "Ask for their name and preferred service."
    collect: [name, service]
    done_when: "Name and service are captured."
  - id: confirm
    purpose: "Confirm and close."
    say: "Read back the booking and confirm."
    done_when: "Caller has confirmed."
See Playbooks for the full section reference and when to graduate to the full format (typed slots, gates, pipelines, multiple journeys).
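A quick structural check can catch slips in the topology before you run a file. The following is a minimal sketch, not part of SuperDialog: `lint_steps` is a hypothetical helper, the rules it enforces are assumptions about what a well-formed step needs, and it expects the `playbook` list to have been parsed into plain Python dictionaries already.

```python
def lint_steps(steps):
    """Return a list of human-readable problems found in a list of step dicts."""
    problems = []
    seen_ids = set()
    for index, step in enumerate(steps):
        step_id = step.get("id")
        label = step_id or f"step #{index + 1}"
        if not step_id:
            problems.append(f"{label}: missing id")
        elif step_id in seen_ids:
            problems.append(f"{label}: duplicate id")
        else:
            seen_ids.add(step_id)
        # Every step should declare what "done" means.
        if not step.get("done_when"):
            problems.append(f"{label}: missing done_when")
        # Slot names collected within one step should not repeat.
        collected = step.get("collect", [])
        if len(collected) != len(set(collected)):
            problems.append(f"{label}: repeated slot in collect")
    return problems


steps = [
    {"id": "greet", "say": "Greet the caller and ask how you can help.",
     "done_when": "Caller is ready to book."},
    {"id": "collect", "say": "Ask for their name and preferred service.",
     "collect": ["name", "service"], "done_when": "Name and service are captured."},
    {"id": "confirm", "say": "Read back the booking and confirm."},  # done_when omitted on purpose
]

for problem in lint_steps(steps):
    print(problem)
```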

Test it - no infrastructure needed

superdialog generate "Book a haircut and confirm it." --output salon.yaml
superdialog chat salon.yaml
This starts a full interactive REPL against your playbook. No Unpod account, no phone number, no voice setup required.
> I'd like to book a haircut
Hi! I'd be happy to help. May I have your name?
> Mira, and I'd like a colour
[checkpoint=collect ended=False]
The status line names the live checkpoint, so you can watch the conversation advance as outcomes are met. Iterate on salon.yaml, re-run chat - the loop takes seconds.

Soft gates vs hard gates

Every checkpoint has a gate. This is where fluidity meets reliability:
  • Soft gate (default) - provisional values are enough; the agent never blocks. The model keeps the conversation moving and the extracted data settles in the background. Use it for everything that isn’t irreversible.
  • Hard gate - for payments, identity, anything you can’t undo. Required slots must be confirmed (not just provisionally extracted), and the agent briefly waits for that confirmation before it speaks the gated line. A single model guess can never push past a hard gate on its own.
- id: take_payment
  goal: "Charge the deposit."
  gate: hard                 # waits for confirmed slots before proceeding
  advance_when:
    - when: "deposit charged"
      to: booking.confirmed
      requires: [card_token, amount]
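
The difference between the two gates can be sketched as a small predicate over slot states. This is an illustration under stated assumptions, not the engine's implementation: `SlotState` and `gate_open` are hypothetical names, and the sketch assumes a soft gate still needs at least a provisional value for each required slot before it advances.

```python
from enum import Enum


class SlotState(Enum):
    MISSING = "missing"          # nothing extracted yet
    PROVISIONAL = "provisional"  # extracted by the model, not yet confirmed
    CONFIRMED = "confirmed"      # explicitly confirmed


def gate_open(gate: str, requires: list, states: dict) -> bool:
    """Decide whether a checkpoint's gate lets an advance rule fire."""
    if gate not in ("soft", "hard"):
        raise ValueError(f"unknown gate: {gate!r}")
    for name in requires:
        state = states.get(name, SlotState.MISSING)
        if state is SlotState.MISSING:
            return False         # neither gate advances on a missing slot
        if gate == "hard" and state is not SlotState.CONFIRMED:
            return False         # a provisional guess cannot pass a hard gate
    return True


requires = ["card_token", "amount"]
guessed = {"card_token": SlotState.CONFIRMED, "amount": SlotState.PROVISIONAL}
confirmed = {"card_token": SlotState.CONFIRMED, "amount": SlotState.CONFIRMED}

print(gate_open("soft", requires, guessed))
print(gate_open("hard", requires, guessed))
print(gate_open("hard", requires, confirmed))
```

With the same provisional `amount`, the soft gate lets the conversation continue while the hard gate holds until the value is confirmed.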

Why one streaming call, not two

A rigid graph has to make two LLM calls per turn: one to decide which edge fires, then one to speak. For voice, that adds latency before the caller hears anything. The Playbook engine splits the work across two roles that share one event-sourced log:
  • A fast Talker streams every spoken turn with one LLM call - tokens go straight to the host (and to TTS for voice).
  • An async Director does the judging - extract slots, evaluate advance rules, run tools - off the speech path.
The caller hears the agent immediately; correctness converges a beat behind. Barge-in is safe by construction: aborting the stream cancels speech, never the state. (The full runtime is in Architecture.)
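The split can be sketched with two asyncio tasks. This is a toy model rather than the SuperDialog runtime: `talker`, `director`, and `run_turn` are hypothetical names, the sleeps stand in for LLM latency, and a plain list stands in for the event-sourced log.

```python
import asyncio


async def talker(tokens, spoken):
    """Stream reply tokens to the host (stand-in for one streaming LLM call)."""
    for token in tokens:
        spoken.append(token)
        await asyncio.sleep(0.02)   # simulated pacing between tokens


async def director(utterance, log):
    """Judge the turn off the speech path and record the result in the shared log."""
    await asyncio.sleep(0.01)       # simulated extraction latency
    log.append({"event": "slot_extracted", "value": utterance})


async def run_turn(utterance, reply_tokens, log, barge_in_after=None):
    spoken = []
    speak = asyncio.create_task(talker(reply_tokens, spoken))
    judge = asyncio.create_task(director(utterance, log))
    if barge_in_after is not None:
        await asyncio.sleep(barge_in_after)
        speak.cancel()              # barge-in aborts the speech stream only
    try:
        await speak
    except asyncio.CancelledError:
        pass                        # speech was interrupted; nothing to undo
    await judge                     # the Director still finishes and logs state
    return spoken


log = []
tokens = ["Sure,", "what", "day", "works", "for", "you", "this", "week", "or", "next?"]
interrupted = asyncio.run(run_turn("a colour", tokens, log, barge_in_after=0.03))
print(len(interrupted) < len(tokens), len(log))
```

Cancelling the `speak` task leaves the `judge` task untouched, so the log gains its event even though the caller cut the agent off mid-sentence.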

When a graph still fits

The checkpoint model is the default and the right choice for most conversations. A hand-authored flow graph still earns its place when:
  • Compliance / auditability - you need every reachable path enumerable and lintable as a spec.
  • Strict determinism - the conversation truly is a fixed decision tree with no room for the model to improvise.
You don’t lose anything by authoring a graph: by default it runs compiled onto the Playbook engine (Playbook.load detects flow JSON and converts it), and you can still run the original graph runtime with engine="flow" / superdialog chat --mode flow. See Flows for graph authoring and the migration path.

Next steps

  • Playbooks - The simple and full authoring formats
  • Quickstart - Generate and run your first playbook
  • Architecture - The Talker/Director runtime and event log
  • Flows (legacy) - Graph authoring and the migration path