Testing SDK Reference
Run a test suite
run_test_suite / runTestSuite is the main entrypoint into the testing framework.
Below are the arguments you can pass to this function:
- id (string, required): A unique ID for the test suite. This will be displayed in the Autoblocks platform and should remain the same for the lifetime of a test suite.
- test_cases (list[BaseTestCase], required): A list of instances that subclass BaseTestCase. These are typically dataclasses and can be any schema that facilitates testing your application. They will be passed directly to fn and will also be made available to your evaluators. BaseTestCase is an abstract base class that requires you to implement the hash method. See Test case hashing for more information.
- evaluators (list[BaseTestEvaluator], required): A list of instances that subclass BaseTestEvaluator. See BaseTestEvaluator below for more information.
- fn (Callable[[BaseTestCase], Any], required): The function you are testing. Its only argument is an instance of a test case. This function can be synchronous or asynchronous and can return any type.
- max_test_case_concurrency (int, optional): The maximum number of test cases that can be running concurrently through fn. Useful to avoid rate limiting from external services, such as an LLM provider.
- grid_search_params (dict[str, Sequence[Any]], optional): Grid search enables you to test multiple combinations of parameters in your application. See grid search for more information, and the sketch after the example below.
run_test_suite(
    id="my-test-suite",
    test_cases=gen_test_cases(),
    evaluators=[HasAllSubstrings(), IsFriendly()],
    fn=test_fn,
)
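To illustrate the optional arguments alongside an asynchronous fn, here is a minimal sketch. The call_my_application helper and the grid parameter names and values are hypothetical, and the run_test_suite import path is an assumption rather than something confirmed by this reference:

from autoblocks.testing.run import run_test_suite  # import path assumed


async def test_fn(test_case: MyTestCase) -> str:
    # fn can be synchronous or asynchronous; call_my_application is a
    # hypothetical stand-in for the application under test.
    return await call_my_application(test_case)


run_test_suite(
    id="my-test-suite",
    test_cases=gen_test_cases(),
    evaluators=[HasAllSubstrings(), IsFriendly()],
    fn=test_fn,
    # At most 5 test cases run through fn at once.
    max_test_case_concurrency=5,
    # fn runs once per combination of these (hypothetical) parameters:
    # 2 models x 2 temperatures = 4 combinations per test case.
    grid_search_params={
        "model": ["model-a", "model-b"],
        "temperature": [0.3, 0.7],
    },
)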
Test case hashing
Test cases need to be uniquely identified by a hash while still allowing for the test case to evolve over time.
In general, your hash should be composed of the properties you consider "inputs" to your test function.
In the example below, the test cases are identified by a combination of their x and y properties.
This allows you to change or add properties related to expectations without losing the identity, and thus the history, of the test case:
All test cases must subclass BaseTestCase and implement the hash method.
The hash method should return a string that uniquely identifies the test case for its lifetime.
import dataclasses

from autoblocks.testing.models import BaseTestCase
from autoblocks.testing.util import md5


@dataclasses.dataclass
class MyTestCase(BaseTestCase):
    # Input properties
    x: int
    y: int

    # Expectation properties
    expected_sum: int
    expected_product: int

    # I can add more properties here as my test case evolves
    # without losing the identity + history of the test case
    # expected_difference: int

    def hash(self) -> str:
        """My hash is only composed of my input properties."""
        return md5(f"{self.x}-{self.y}")
Hashes only need to be unique within a single test suite.
Hashes should be no more than 100 characters.
BaseTestEvaluator
An abstract base class that you can subclass to create your own evaluators.
- id (string, required): A unique identifier for the evaluator.
- max_concurrency (number, optional): The maximum number of concurrent calls to evaluate_test_case allowed for the evaluator. Useful to avoid rate limiting from external services, such as an LLM provider.
- evaluate_test_case (Callable[[BaseTestCase, Any], Optional[Evaluation]], required): Creates an evaluation on a test case and its output. This method can be synchronous or asynchronous.
from autoblocks.testing.models import BaseTestEvaluator
from autoblocks.testing.models import Evaluation


class MyEvaluator(BaseTestEvaluator):
    id = "my-evaluator"
    max_concurrency = 5

    def evaluate_test_case(self, test_case: SomeTestCase, output: str) -> Evaluation:
        return Evaluation(score=0.5)
Not all evaluators need to reference the test case. Some are "stateless" and evaluate the output in isolation.
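For example, here is a minimal sketch of a stateless evaluator; the class name and scoring logic are illustrative, not part of the SDK:

from autoblocks.testing.models import BaseTestCase
from autoblocks.testing.models import BaseTestEvaluator
from autoblocks.testing.models import Evaluation


class IsNonEmpty(BaseTestEvaluator):
    id = "is-non-empty"

    def evaluate_test_case(self, test_case: BaseTestCase, output: str) -> Evaluation:
        # The test case is ignored entirely; only the output is inspected.
        return Evaluation(score=1 if output.strip() else 0)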
Evaluation
- score (number, required): A number between 0 and 1 that represents the score of the evaluation.
- threshold (Threshold, optional): An optional Threshold that describes the range the score must fall within to be considered passing. If no threshold is attached, the score is reported and the pass/fail status is undefined.
- metadata (object, optional): Key-value pairs that provide additional context about the evaluation. This is typically used to explain why an evaluation failed. Attached metadata is surfaced in the test run comparison UI.
from autoblocks.testing.models import Evaluation
from autoblocks.testing.models import Threshold

# Evaluation with score and threshold
Evaluation(
    score=0.5,
    threshold=Threshold(lt=0.6),
)

# Evaluation with score, threshold, and metadata
Evaluation(
    score=0,
    threshold=Threshold(gte=1),
    metadata={
        "reason": "An explanation of why the evaluation failed",
    },
)

# Evaluation with score only
Evaluation(score=0.5)
Threshold
You can use any combination of these properties to define a range for the score.
- lt (number, optional): The score must be less than this number in order to be considered passing.
- lte (number, optional): The score must be less than or equal to this number in order to be considered passing.
- gt (number, optional): The score must be greater than this number in order to be considered passing.
- gte (number, optional): The score must be greater than or equal to this number in order to be considered passing.
from autoblocks.testing.models import Evaluation
from autoblocks.testing.models import Threshold

Evaluation(
    score=0.5,
    # Score must be greater than or equal to 1
    threshold=Threshold(gte=1),
)

Evaluation(
    score=0.5,
    # Score must be:
    # - greater than or equal to 0.4 AND
    # - less than 0.6
    threshold=Threshold(
        gte=0.4,
        lt=0.6,
    ),
)