Testing SDK Reference
Run a test suite
run_test_suite / runTestSuite is the main entrypoint into the testing framework.
Below are the arguments you can pass to this function:
- id (string, required): A unique ID for the test suite. This will be displayed in the Autoblocks platform and should remain the same for the lifetime of a test suite.
- test_cases (list[BaseTestCase], required): A list of instances that subclass BaseTestCase. These are typically dataclasses and can be any schema that facilitates testing your application. They will be passed directly to fn and will also be made available to your evaluators. BaseTestCase is an abstract base class that requires you to implement the hash method. See Test case hashing for more information.
- evaluators (list[BaseTestEvaluator], required): A list of instances that subclass BaseTestEvaluator. See BaseTestEvaluator below for more information.
- fn (Callable[[BaseTestCase], Any], required): The function you are testing. Its only argument is an instance of a test case. This function can be synchronous or asynchronous and can return any type.
- max_test_case_concurrency (int, optional): The maximum number of test cases that can be running concurrently through fn. Useful to avoid rate limiting from external services, such as an LLM provider.
- grid_search_params (dict[str, Sequence[Any]], optional): Grid search enables you to test multiple combinations of parameters in your application. See grid search for more information, and the sketch after the example below.
run_test_suite(
    id="my-test-suite",
    test_cases=gen_test_cases(),
    evaluators=[HasAllSubstrings(), IsFriendly()],
    fn=test_fn,
)
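To illustrate the optional arguments alongside an asynchronous fn, here is a minimal sketch. The call_my_application helper and the grid parameter names and values are hypothetical, and the run_test_suite import path is an assumption rather than something confirmed by this reference:

from autoblocks.testing.run import run_test_suite  # import path assumed


async def test_fn(test_case: MyTestCase) -> str:
    # fn can be synchronous or asynchronous; call_my_application is a
    # hypothetical stand-in for the application under test.
    return await call_my_application(test_case)


run_test_suite(
    id="my-test-suite",
    test_cases=gen_test_cases(),
    evaluators=[HasAllSubstrings(), IsFriendly()],
    fn=test_fn,
    # At most 5 test cases run through fn at once.
    max_test_case_concurrency=5,
    # fn runs once per combination of these (hypothetical) parameters:
    # 2 models x 2 temperatures = 4 combinations per test case.
    grid_search_params={
        "model": ["model-a", "model-b"],
        "temperature": [0.3, 0.7],
    },
)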
Test case hashing
Test cases need to be uniquely identified by a hash while still allowing for the test case to evolve over time.
In general, your hash should be composed of the properties you consider "inputs" to your test function.
In the example below, the test cases are identified by a combination of their x and y properties.
This allows you to change or add properties related to expectations without losing the identity, and thus the history, of the test case:
All test cases must subclass BaseTestCase and implement the hash method.
The hash method should return a string that uniquely identifies the test case for its lifetime.
import dataclasses

from autoblocks.testing.models import BaseTestCase
from autoblocks.testing.util import md5


@dataclasses.dataclass
class MyTestCase(BaseTestCase):
    # Input properties
    x: int
    y: int

    # Expectation properties
    expected_sum: int
    expected_product: int

    # I can add more properties here as my test case evolves
    # without losing the identity + history of the test case
    # expected_difference: int

    def hash(self) -> str:
        """My hash is only composed of my input properties."""
        return md5(f"{self.x}-{self.y}")
Hashes only need to be unique within a single test suite.
Hashes should be no more than 100 characters.
BaseTestEvaluator
An abstract base class that you can subclass to create your own evaluators.
- id (string, required): A unique identifier for the evaluator.
- max_concurrency (number, optional): The maximum number of concurrent calls to evaluate_test_case allowed for the evaluator. Useful to avoid rate limiting from external services, such as an LLM provider.
- evaluate_test_case (Callable[[BaseTestCase, Any], Optional[Evaluation]], required): Creates an evaluation on a test case and its output. This method can be synchronous or asynchronous.
from autoblocks.testing.models import BaseTestEvaluator
from autoblocks.testing.models import Evaluation


class MyEvaluator(BaseTestEvaluator):
    id = "my-evaluator"
    max_concurrency = 5

    def evaluate_test_case(self, test_case: SomeTestCase, output: str) -> Evaluation:
        return Evaluation(score=0.5)
Not all evaluators need to reference the test case. Some are "stateless" and evaluate the output in isolation.
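For example, here is a minimal sketch of a stateless evaluator; the class name and scoring logic are illustrative, not part of the SDK:

from autoblocks.testing.models import BaseTestCase
from autoblocks.testing.models import BaseTestEvaluator
from autoblocks.testing.models import Evaluation


class IsNonEmpty(BaseTestEvaluator):
    id = "is-non-empty"

    def evaluate_test_case(self, test_case: BaseTestCase, output: str) -> Evaluation:
        # The test case is ignored entirely; only the output is inspected.
        return Evaluation(score=1 if output.strip() else 0)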
Evaluation
- score (number, required): A number between 0 and 1 that represents the score of the evaluation.
- threshold (Threshold, optional): An optional Threshold that describes the range the score must fall within to be considered passing. If no threshold is attached, the score is reported and the pass/fail status is undefined.
- metadata (object, optional): Key-value pairs that provide additional context about the evaluation. This is typically used to explain why an evaluation failed. Attached metadata is surfaced in the test run comparison UI.
from autoblocks.testing.models import Evaluation
from autoblocks.testing.models import Threshold

# Evaluation with score and threshold
Evaluation(
    score=0.5,
    threshold=Threshold(lt=0.6),
)

# Evaluation with score, threshold, and metadata
Evaluation(
    score=0,
    threshold=Threshold(gte=1),
    metadata={
        "reason": "An explanation of why the evaluation failed",
    },
)

# Evaluation with score only
Evaluation(score=0.5)
Threshold
You can use any combination of these properties to define a range for the score.
- lt (number, optional): The score must be less than this number in order to be considered passing.
- lte (number, optional): The score must be less than or equal to this number in order to be considered passing.
- gt (number, optional): The score must be greater than this number in order to be considered passing.
- gte (number, optional): The score must be greater than or equal to this number in order to be considered passing.
from autoblocks.testing.models import Evaluation
from autoblocks.testing.models import Threshold

Evaluation(
    score=0.5,
    # Score must be greater than or equal to 1
    threshold=Threshold(gte=1),
)

Evaluation(
    score=0.5,
    # Score must be:
    # - greater than or equal to 0.4 AND
    # - less than 0.6
    threshold=Threshold(
        gte=0.4,
        lt=0.6,
    ),
)