Quickstart¶
This guide walks you through creating your first understudy test.
Creating a Scene¶
A Scene defines a test scenario with:
The starting prompt
A conversation plan for the simulated user
World state (context)
Expectations for what should/shouldn’t happen
from understudy import Scene, Persona, Expectations
scene = Scene(
id="return_nonreturnable_item",
starting_prompt="I want to return something I bought.",
conversation_plan="""
Goal: Return earbuds from order ORD-10027.
- If asked for order ID: provide ORD-10027
Return reason: defective.
If denied: accept escalation.
""",
persona=Persona.FRUSTRATED_BUT_COOPERATIVE,
max_turns=20,
context={
"orders": {
"ORD-10027": {
"items": [{"name": "Earbuds", "category": "personal_audio"}],
"status": "delivered",
}
},
"policy": {
"non_returnable_categories": ["personal_audio"],
},
},
expectations=Expectations(
required_tools=["lookup_order", "get_return_policy"],
forbidden_tools=["create_return"],
allowed_terminal_states=["return_denied_policy", "escalated_to_human"],
),
)
Or load from YAML:
scene = Scene.from_file("scenes/return_nonreturnable_item.yaml")
Running a Rehearsal¶
from understudy import run
from understudy.adk import ADKApp
app = ADKApp(agent=your_agent)
trace = run(app, scene)
Making Assertions¶
Assert against the trace (what happened), not the prose:
def test_policy_enforcement():
trace = run(app, scene)
assert trace.called("lookup_order")
assert trace.called("get_return_policy")
assert not trace.called("create_return")
assert trace.terminal_state in {"return_denied_policy", "escalated_to_human"}
Using LLM Judges¶
For subjective qualities like tone and clarity:
from understudy import Judge, TONE_EMPATHY
judge = Judge(rubric=TONE_EMPATHY, samples=5)
result = judge.evaluate(trace)
assert result.score == 1
assert result.agreement_rate >= 0.6
Running a Suite¶
Run all scenes in a directory:
from understudy import Suite
suite = Suite.from_directory("scenes/")
results = suite.run(app, parallel=4)
results.to_junit_xml("test-results/understudy.xml")
assert results.all_passed