Python SDK Reference
Technical reference for the Autoblocks Python SDK testing functionality.
run_test_suite
The main entrypoint into the testing framework.
| name | required | type | description |
| --- | --- | --- | --- |
| `id` | true | `str` | A unique ID for the test suite. This is displayed in the Autoblocks platform and should remain the same for the lifetime of the test suite. |
| `test_cases` | true | `list[BaseTestCase]` | A list of instances that subclass `BaseTestCase`. These are typically dataclasses and can have any schema that facilitates testing your application. They are passed directly to `fn` and are also made available to your evaluators. `BaseTestCase` is an abstract base class that requires you to implement the `hash` method. See Test case hashing for more information. |
| `evaluators` | true | `list[BaseTestEvaluator]` | A list of instances that subclass `BaseTestEvaluator`. |
| `fn` | true | `Callable[[BaseTestCase], Any]` | The function under test. Its only argument is an instance of a test case. It can be synchronous or asynchronous and can return any type. |
| `max_test_case_concurrency` | false | `int` | The maximum number of test cases that may run through `fn` concurrently. Useful for avoiding rate limits from external services, such as an LLM provider. |
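To make the shape of these arguments concrete, here is a minimal, pure-Python sketch of what `run_test_suite` does conceptually: run `fn` over each test case, then apply each evaluator to the (test case, output) pair. All classes and the `sketch_run_test_suite` function below are local stand-ins for illustration, not the SDK's actual implementation; the real SDK additionally reports results to the Autoblocks platform and bounds concurrency via `max_test_case_concurrency`.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SketchEvaluation:
    """Stand-in for the SDK's Evaluation class (score only)."""

    score: float


@dataclass
class EchoTestCase:
    """Stand-in test case: hashes to its own input text."""

    text: str

    def hash(self) -> str:
        return self.text


class ExactMatch:
    """Stand-in evaluator: scores 1 when the output equals the input text."""

    id = "exact-match"

    def evaluate_test_case(self, test_case: EchoTestCase, output: Any) -> SketchEvaluation:
        return SketchEvaluation(score=1.0 if output == test_case.text else 0.0)


def sketch_run_test_suite(
    id: str,
    test_cases: list,
    evaluators: list,
    fn: Callable[[Any], Any],
) -> dict:
    """Run fn over every test case, then apply each evaluator to the
    (test case, output) pair.

    Returns {test case hash: {evaluator id: score}} for illustration.
    """
    results: dict = {}
    for test_case in test_cases:
        output = fn(test_case)  # the function under test
        results[test_case.hash()] = {
            evaluator.id: evaluator.evaluate_test_case(test_case, output).score
            for evaluator in evaluators
        }
    return results


results = sketch_run_test_suite(
    id="my-test-suite",
    test_cases=[EchoTestCase(text="hello")],
    evaluators=[ExactMatch()],
    fn=lambda tc: tc.text,
)
```

In real usage you would import `run_test_suite` and the base classes from the Autoblocks SDK package instead of defining stand-ins.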
BaseTestCase
An abstract base class that you can subclass to create your own test cases.
| name | required | type | description |
| --- | --- | --- | --- |
| `hash` | true | `Callable[[], str]` | A method that returns a string uniquely identifying the test case for its lifetime. See Test case hashing for more information. |
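A test case might implement `hash` by digesting only the fields that identify it, so the hash stays stable for the test case's lifetime. The dataclass below is an illustrative sketch; in real code it would subclass the SDK's `BaseTestCase`.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PromptTestCase:
    """Illustrative test case (a real one would subclass BaseTestCase)."""

    question: str
    expected_substring: str

    def hash(self) -> str:
        # Digest only the identifying field, so changing the expected
        # output later does not change the test case's identity.
        return hashlib.md5(self.question.encode("utf-8")).hexdigest()


a = PromptTestCase(question="What is 2 + 2?", expected_substring="4")
b = PromptTestCase(question="What is 2 + 2?", expected_substring="four")
```

Here `a` and `b` hash identically because only `question` feeds the digest.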
BaseTestEvaluator
An abstract base class that you can subclass to create your own evaluators.
| name | required | type | description |
| --- | --- | --- | --- |
| `id` | true | `str` | A unique identifier for the evaluator. |
| `max_concurrency` | false | `int` | The maximum number of concurrent calls to `evaluate_test_case` allowed for the evaluator. Useful for avoiding rate limits from external services, such as an LLM provider. |
| `evaluate_test_case` | true | `Callable[[BaseTestCase, Any], Optional[Evaluation]]` | Creates an evaluation from a test case and the output produced for it. This method can be synchronous or asynchronous. |
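An evaluator with this shape might check the output for an expected substring and attach metadata on failure. The `SketchEvaluation` dataclass and the `HasSubstring` class are local stand-ins for illustration, not part of the SDK.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional


@dataclass
class SketchEvaluation:
    """Stand-in for the SDK's Evaluation class."""

    score: float
    metadata: Optional[dict] = None


class HasSubstring:
    """Passes when the expected substring appears in the output."""

    id = "has-substring"
    max_concurrency = 5  # cap concurrent evaluate_test_case calls

    def evaluate_test_case(self, test_case: Any, output: Any) -> SketchEvaluation:
        passed = test_case.expected_substring in str(output)
        return SketchEvaluation(
            score=1.0 if passed else 0.0,
            metadata=None if passed else {"reason": "expected substring not found"},
        )


evaluator = HasSubstring()
test_case = SimpleNamespace(expected_substring="4")
evaluation = evaluator.evaluate_test_case(test_case, "2 + 2 = 4")
```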
Evaluation
A class that represents the result of an evaluation.
| name | required | type | description |
| --- | --- | --- | --- |
| `score` | true | `float` | A number between 0 and 1 representing the score of the evaluation. |
| `threshold` | false | `Threshold` | An optional `Threshold` describing the range the score must fall within to be considered passing. If no threshold is attached, the score is reported and the pass/fail status is undefined. |
| `metadata` | false | `dict` | Key-value pairs that provide additional context about the evaluation, typically used to explain why an evaluation failed. Attached metadata is surfaced in the test run comparison UI. |
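Constructing a failing evaluation with explanatory metadata might look like the sketch below; `SketchEvaluation` is a local stand-in for the SDK's `Evaluation` class, and the metadata content is illustrative.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchEvaluation:
    """Stand-in for the SDK's Evaluation class."""

    score: float
    threshold: Optional[Any] = None
    metadata: Optional[dict] = None


# A failing evaluation typically attaches metadata explaining why it
# failed; that context is surfaced in the test run comparison UI.
evaluation = SketchEvaluation(
    score=0.4,
    metadata={"reason": "output did not mention the required disclaimer"},
)
```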
Threshold
A class that defines the passing criteria for an evaluation.
| name | required | type | description |
| --- | --- | --- | --- |
| `lt` | false | `float` | The score must be less than this number to be considered passing. |
| `lte` | false | `float` | The score must be less than or equal to this number to be considered passing. |
| `gt` | false | `float` | The score must be greater than this number to be considered passing. |
| `gte` | false | `float` | The score must be greater than or equal to this number to be considered passing. |
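The four bounds combine as shown in the sketch below: every bound that is set must hold, and unset bounds are ignored. `SketchThreshold` and its `passes()` helper are illustrative only; `passes()` is not part of the SDK's `Threshold` API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchThreshold:
    """Stand-in for the SDK's Threshold class."""

    lt: Optional[float] = None
    lte: Optional[float] = None
    gt: Optional[float] = None
    gte: Optional[float] = None

    def passes(self, score: float) -> bool:
        # Every bound that is set must hold; unset bounds are ignored.
        return all(
            [
                self.lt is None or score < self.lt,
                self.lte is None or score <= self.lte,
                self.gt is None or score > self.gt,
                self.gte is None or score >= self.gte,
            ]
        )
```

For example, `SketchThreshold(gte=0.5)` passes a score of 0.5 but not 0.4, and `SketchThreshold(lt=1.0)` rejects a score of exactly 1.0.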