Testing SDK Reference

Run a test suite

run_test_suite / runTestSuite is the main entry point into the testing framework. Below are the arguments you can pass to this function (types are shown as in the Python SDK):

  • id (str, required)

    A unique ID for the test suite. This will be displayed in the Autoblocks platform and should remain the same for the lifetime of a test suite.

  • test_cases (list[BaseTestCase], required)

    A list of instances that subclass BaseTestCase. These are typically dataclasses and can be any schema that facilitates testing your application. They will be passed directly to fn and will also be made available to your evaluators.

    BaseTestCase is an abstract base class that requires you to implement the hash method. See Test case hashing for more information.

  • evaluators (list[BaseTestEvaluator], required)

    A list of instances that subclass BaseTestEvaluator.

  • fn (Callable[[BaseTestCase], Any], required)

    The function you are testing. Its only argument is an instance of a test case. This function can be synchronous or asynchronous and can return any type.

  • max_test_case_concurrency (int, optional)

    The maximum number of test cases that can run concurrently through fn. Useful to avoid rate limiting from external services, such as an LLM provider. See the sketch after the example below.

  • grid_search_params (dict[str, Sequence[Any]], optional)

    Grid search enables you to test multiple combinations of parameters in your application; the sketch after the example below illustrates it. See grid search for more information.

from autoblocks.testing.run import run_test_suite

run_test_suite(
  id="my-test-suite",
  test_cases=gen_test_cases(),
  evaluators=[HasAllSubstrings(), IsFriendly()],
  fn=test_fn,
)
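
As a minimal sketch, the same call with the two optional arguments layered in. The parameter names and values ("model", "temperature", the concurrency of 10) are hypothetical and only illustrate the shapes these arguments take:

run_test_suite(
  id="my-test-suite",
  test_cases=gen_test_cases(),
  evaluators=[HasAllSubstrings(), IsFriendly()],
  fn=test_fn,
  # At most 10 test cases run through fn at a time.
  max_test_case_concurrency=10,
  # The suite runs once per combination (2 x 2 = 4 combinations here).
  grid_search_params={
    "model": ["gpt-3.5-turbo", "gpt-4"],
    "temperature": [0.3, 0.7],
  },
)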

Test case hashing

Test cases need to be uniquely identified by a hash while still allowing the test case to evolve over time.

All test cases must subclass BaseTestCase and implement the hash method. The hash method should return a string that uniquely identifies the test case for its lifetime.

In general, your hash should be composed of the properties you consider "inputs" to your test function. In the example below, the test cases are identified by a combination of their x and y properties. This allows you to change or add properties related to expectations without losing the identity, and thus the history, of the test case:

import dataclasses

from autoblocks.testing.models import BaseTestCase
from autoblocks.testing.util import md5

@dataclasses.dataclass
class MyTestCase(BaseTestCase):
    # Input properties
    x: int
    y: int

    # Expectation properties
    expected_sum: int
    expected_product: int

    # I can add more properties here as my test case evolves
    # without losing the identity + history of the test case
    # expected_difference: int

    def hash(self) -> str:
        """ My hash is only comprised of my input properties. """
        return md5(f"{self.x}-{self.y}")
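
As a quick check of the intent (the values are arbitrary), two instances with the same inputs share a hash even when their expectation properties differ:

a = MyTestCase(x=1, y=2, expected_sum=3, expected_product=2)
b = MyTestCase(x=1, y=2, expected_sum=99, expected_product=2)

# Same inputs -> same hash, so the test case keeps its identity
# and history even though an expectation changed.
assert a.hash() == b.hash()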

BaseTestEvaluator

An abstract base class that you can subclass to create your own evaluators.

  • id (str, required)

    A unique identifier for the evaluator.

  • max_concurrency (int, optional)

    The maximum number of concurrent calls to evaluate_test_case allowed for the evaluator. Useful to avoid rate limiting from external services, such as an LLM provider.

  • evaluate_test_case (Callable[[BaseTestCase, Any], Optional[Evaluation]], required)

    Creates an evaluation for a test case and its output. This method can be synchronous or asynchronous; an async sketch follows the example below.

from autoblocks.testing.models import BaseTestEvaluator
from autoblocks.testing.models import Evaluation

class MyEvaluator(BaseTestEvaluator):
    id = "my-evaluator"

    max_concurrency = 5

    def evaluate_test_case(self, test_case: MyTestCase, output: str) -> Evaluation:
        return Evaluation(score=0.5)
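
Because evaluate_test_case may also be asynchronous, here is a minimal async sketch; the call_llm_judge helper is hypothetical and stands in for any awaitable work, such as an LLM-as-judge call:

class MyAsyncEvaluator(BaseTestEvaluator):
    id = "my-async-evaluator"

    # At most 2 concurrent calls to evaluate_test_case.
    max_concurrency = 2

    async def evaluate_test_case(self, test_case: MyTestCase, output: str) -> Evaluation:
        is_acceptable = await call_llm_judge(output)  # hypothetical helper
        return Evaluation(score=1 if is_acceptable else 0)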

Evaluation

  • score (float, required)

    A number between 0 and 1 that represents the score of the evaluation.

  • threshold (Threshold, optional)

    An optional Threshold that describes the range the score must fall within to be considered passing. If no threshold is attached, the score is reported and the pass/fail status is undefined.

  • metadata (dict, optional)

    Key-value pairs that provide additional context about the evaluation. This is typically used to explain why an evaluation failed.

    Attached metadata is surfaced in the test run comparison UI.

from autoblocks.testing.models import Evaluation
from autoblocks.testing.models import Threshold

# Evaluation with score and threshold
Evaluation(
  score=0.5,
  threshold=Threshold(lt=0.6),
)

# Evaluation with score, threshold, and metadata
Evaluation(
  score=0,
  threshold=Threshold(gte=1),
  metadata={
    "reason": "An explanation of why the evaluation failed",
  },
)

# Evaluation with score only
Evaluation(score=0.5)

Threshold

  • lt (float, optional)

    The score must be less than this number in order to be considered passing.

  • lte (float, optional)

    The score must be less than or equal to this number in order to be considered passing.

  • gt (float, optional)

    The score must be greater than this number in order to be considered passing.

  • gte (float, optional)

    The score must be greater than or equal to this number in order to be considered passing.

from autoblocks.testing.models import Evaluation
from autoblocks.testing.models import Threshold

Evaluation(
  score=0.5,
  # Score must be greater than or equal to 1
  threshold=Threshold(gte=1),
)

Evaluation(
  score=0.5,
  # Score must be:
  # - greater than or equal to 0.4 AND
  # - less than 0.6
  threshold=Threshold(
    gte=0.4,
    lt=0.6,
  ),
)
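
Concretely, the first evaluation above fails its threshold (0.5 < 1), while the second passes (0.4 <= 0.5 < 0.6).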