Python SDK Reference
Technical reference for the Autoblocks Python SDK testing functionality.
run_test_suite
The main entrypoint into the testing framework.
| name | required | type | description |
| --- | --- | --- | --- |
| `id` | true | `str` | A unique ID for the test suite. This is displayed in the Autoblocks platform and should remain the same for the lifetime of the test suite. |
| `test_cases` | true | `list[BaseTestCase]` | A list of instances that subclass `BaseTestCase`. These are typically dataclasses and can have any schema that facilitates testing your application. They are passed directly to `fn` and are also made available to your evaluators. `BaseTestCase` is an abstract base class that requires you to implement the `hash` method. See Test case hashing for more information. |
| `evaluators` | true | `list[BaseTestEvaluator]` | A list of instances that subclass `BaseTestEvaluator`. |
| `fn` | true | `Callable[[BaseTestCase], Any]` | The function under test. Its only argument is an instance of a test case. It can be synchronous or asynchronous and can return any type. |
| `max_test_case_concurrency` | false | `int` | The maximum number of test cases that may run through `fn` concurrently. Useful for avoiding rate limits from external services, such as an LLM provider. |
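To make the shape of these arguments concrete, here is a minimal, pure-Python sketch of what `run_test_suite` does conceptually: run `fn` over each test case, then apply each evaluator to the (test case, output) pair. All classes and the `sketch_run_test_suite` function below are local stand-ins for illustration, not the SDK's actual implementation; the real SDK additionally reports results to the Autoblocks platform and bounds concurrency via `max_test_case_concurrency`.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SketchEvaluation:
    """Stand-in for the SDK's Evaluation class (score only)."""

    score: float


@dataclass
class EchoTestCase:
    """Stand-in test case: hashes to its own input text."""

    text: str

    def hash(self) -> str:
        return self.text


class ExactMatch:
    """Stand-in evaluator: scores 1 when the output equals the input text."""

    id = "exact-match"

    def evaluate_test_case(self, test_case: EchoTestCase, output: Any) -> SketchEvaluation:
        return SketchEvaluation(score=1.0 if output == test_case.text else 0.0)


def sketch_run_test_suite(
    id: str,
    test_cases: list,
    evaluators: list,
    fn: Callable[[Any], Any],
) -> dict:
    """Run fn over every test case, then apply each evaluator to the
    (test case, output) pair.

    Returns {test case hash: {evaluator id: score}} for illustration.
    """
    results: dict = {}
    for test_case in test_cases:
        output = fn(test_case)  # the function under test
        results[test_case.hash()] = {
            evaluator.id: evaluator.evaluate_test_case(test_case, output).score
            for evaluator in evaluators
        }
    return results


results = sketch_run_test_suite(
    id="my-test-suite",
    test_cases=[EchoTestCase(text="hello")],
    evaluators=[ExactMatch()],
    fn=lambda tc: tc.text,
)
```

In real usage you would import `run_test_suite` and the base classes from the Autoblocks SDK package instead of defining stand-ins.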
BaseTestCase
An abstract base class that you can subclass to create your own test cases.
| name | required | type | description |
| --- | --- | --- | --- |
| `hash` | true | `Callable[[], str]` | A method that returns a string uniquely identifying the test case for its lifetime. See Test case hashing for more information. |
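A test case might implement `hash` by digesting only the fields that identify it, so the hash stays stable for the test case's lifetime. The dataclass below is an illustrative sketch; in real code it would subclass the SDK's `BaseTestCase`.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PromptTestCase:
    """Illustrative test case (a real one would subclass BaseTestCase)."""

    question: str
    expected_substring: str

    def hash(self) -> str:
        # Digest only the identifying field, so changing the expected
        # output later does not change the test case's identity.
        return hashlib.md5(self.question.encode("utf-8")).hexdigest()


a = PromptTestCase(question="What is 2 + 2?", expected_substring="4")
b = PromptTestCase(question="What is 2 + 2?", expected_substring="four")
```

Here `a` and `b` hash identically because only `question` feeds the digest.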
BaseTestEvaluator
An abstract base class that you can subclass to create your own evaluators.
| name | required | type | description |
| --- | --- | --- | --- |
| `id` | true | `str` | A unique identifier for the evaluator. |
| `max_concurrency` | false | `int` | The maximum number of concurrent calls to `evaluate_test_case` allowed for the evaluator. Useful for avoiding rate limits from external services, such as an LLM provider. |
| `evaluate_test_case` | true | `Callable[[BaseTestCase, Any], Optional[Evaluation]]` | Creates an evaluation from a test case and the output produced for it. This method can be synchronous or asynchronous. |
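An evaluator with this shape might check the output for an expected substring and attach metadata on failure. The `SketchEvaluation` dataclass and the `HasSubstring` class are local stand-ins for illustration, not part of the SDK.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional


@dataclass
class SketchEvaluation:
    """Stand-in for the SDK's Evaluation class."""

    score: float
    metadata: Optional[dict] = None


class HasSubstring:
    """Passes when the expected substring appears in the output."""

    id = "has-substring"
    max_concurrency = 5  # cap concurrent evaluate_test_case calls

    def evaluate_test_case(self, test_case: Any, output: Any) -> SketchEvaluation:
        passed = test_case.expected_substring in str(output)
        return SketchEvaluation(
            score=1.0 if passed else 0.0,
            metadata=None if passed else {"reason": "expected substring not found"},
        )


evaluator = HasSubstring()
test_case = SimpleNamespace(expected_substring="4")
evaluation = evaluator.evaluate_test_case(test_case, "2 + 2 = 4")
```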
Evaluation
A class that represents the result of an evaluation.
| name | required | type | description |
| --- | --- | --- | --- |
| `score` | true | `float` | A number between 0 and 1 representing the score of the evaluation. |
| `threshold` | false | `Threshold` | An optional `Threshold` describing the range the score must fall within to be considered passing. If no threshold is attached, the score is reported and the pass/fail status is undefined. |
| `metadata` | false | `dict` | Key-value pairs that provide additional context about the evaluation, typically used to explain why an evaluation failed. Attached metadata is surfaced in the test run comparison UI. |
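Constructing a failing evaluation with explanatory metadata might look like the sketch below; `SketchEvaluation` is a local stand-in for the SDK's `Evaluation` class, and the metadata content is illustrative.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchEvaluation:
    """Stand-in for the SDK's Evaluation class."""

    score: float
    threshold: Optional[Any] = None
    metadata: Optional[dict] = None


# A failing evaluation typically attaches metadata explaining why it
# failed; that context is surfaced in the test run comparison UI.
evaluation = SketchEvaluation(
    score=0.4,
    metadata={"reason": "output did not mention the required disclaimer"},
)
```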
Threshold
A class that defines the passing criteria for an evaluation.
| name | required | type | description |
| --- | --- | --- | --- |
| `lt` | false | `float` | The score must be less than this number to be considered passing. |
| `lte` | false | `float` | The score must be less than or equal to this number to be considered passing. |
| `gt` | false | `float` | The score must be greater than this number to be considered passing. |
| `gte` | false | `float` | The score must be greater than or equal to this number to be considered passing. |
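The four bounds combine as shown in the sketch below: every bound that is set must hold, and unset bounds are ignored. `SketchThreshold` and its `passes()` helper are illustrative only; `passes()` is not part of the SDK's `Threshold` API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchThreshold:
    """Stand-in for the SDK's Threshold class."""

    lt: Optional[float] = None
    lte: Optional[float] = None
    gt: Optional[float] = None
    gte: Optional[float] = None

    def passes(self, score: float) -> bool:
        # Every bound that is set must hold; unset bounds are ignored.
        return all(
            [
                self.lt is None or score < self.lt,
                self.lte is None or score <= self.lte,
                self.gt is None or score > self.gt,
                self.gte is None or score >= self.gte,
            ]
        )
```

For example, `SketchThreshold(gte=0.5)` passes a score of 0.5 but not 0.4, and `SketchThreshold(lt=1.0)` rejects a score of exactly 1.0.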