Quickstart

This guide walks you through creating your first understudy test.

Creating a Scene

A Scene defines a test scenario with:

  • The starting prompt

  • A conversation plan for the simulated user

  • World state (context)

  • Expectations for what should/shouldn’t happen

from understudy import Scene, Persona, Expectations

scene = Scene(
    id="return_nonreturnable_item",
    starting_prompt="I want to return something I bought.",
    conversation_plan="""
        Goal: Return earbuds from order ORD-10027.
        - If asked for order ID: provide ORD-10027
        Return reason: defective.
        If denied: accept escalation.
    """,
    persona=Persona.FRUSTRATED_BUT_COOPERATIVE,
    max_turns=20,
    context={
        "orders": {
            "ORD-10027": {
                "items": [{"name": "Earbuds", "category": "personal_audio"}],
                "status": "delivered",
            }
        },
        "policy": {
            "non_returnable_categories": ["personal_audio"],
        },
    },
    expectations=Expectations(
        required_tools=["lookup_order", "get_return_policy"],
        forbidden_tools=["create_return"],
        allowed_terminal_states=["return_denied_policy", "escalated_to_human"],
    ),
)

Or load from YAML:

scene = Scene.from_file("scenes/return_nonreturnable_item.yaml")

Running a Rehearsal

from understudy import run
from understudy.adk import ADKApp

app = ADKApp(agent=your_agent)
trace = run(app, scene)

Making Assertions

Assert against the trace (what happened), not the prose:

def test_policy_enforcement():
    trace = run(app, scene)

    assert trace.called("lookup_order")
    assert trace.called("get_return_policy")
    assert not trace.called("create_return")
    assert trace.terminal_state in {"return_denied_policy", "escalated_to_human"}

Using LLM Judges

For subjective qualities like tone and clarity:

from understudy import Judge, TONE_EMPATHY

judge = Judge(rubric=TONE_EMPATHY, samples=5)
result = judge.evaluate(trace)

assert result.score == 1
assert result.agreement_rate >= 0.6

Running a Suite

Run all scenes in a directory:

from understudy import Suite

suite = Suite.from_directory("scenes/")
results = suite.run(app, parallel=4)
results.to_junit_xml("test-results/understudy.xml")
assert results.all_passed