Python Quick Start

This guide will help you get started with creating and using evaluators in Python.

Installation

First, install the Autoblocks client:

pip install autoblocksai

Creating an Evaluator

Let’s create a simple evaluator that checks if a response contains a specific substring:

from typing import Dict, Any
from autoblocks.testing import BaseTestEvaluator, Evaluation

class HasSubstring(BaseTestEvaluator):
    def __init__(self):
        super().__init__("has-substring")

    def evaluate_test_case(self, test_case: Dict[str, Any], output: str) -> Evaluation:
        # Score 1 when the expected substring appears in the output, 0 otherwise.
        score = 1 if test_case["expected_substring"] in output else 0
        # The threshold requires a score of at least 1 for the evaluation to pass.
        return Evaluation(score=score, threshold={"gte": 1})
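
To sanity check the evaluator outside of a test suite, you can call it directly. The test case dict below is purely illustrative, and we assume the Evaluation object exposes the score you passed in:

evaluator = HasSubstring()
evaluation = evaluator.evaluate_test_case(
    {"input": "greet the user", "expected_substring": "hello"},
    "hello there!",
)
print(evaluation.score)  # 1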

Using an LLM Judge

For more complex evaluations, you can use an LLM as a judge:

from typing import Dict, Any
from autoblocks.testing import BaseLLMJudge, Evaluation

class IsProfessionalTone(BaseLLMJudge):
    def __init__(self):
        super().__init__("is-professional-tone")
        self.max_concurrency = 2  # limit the number of concurrent evaluations
        self.prompt = """Please evaluate the provided text for its professionalism in the context of formal communication.
Consider the following criteria in your assessment:

- Tone and Style: Respectful, objective, and appropriately formal tone without bias or excessive emotionality.
- Grammar and Punctuation: Correct grammar, punctuation, and capitalization.

Based on these criteria, provide a binary response where:

- 0 indicates the text does not maintain a professional tone.
- 1 indicates the text maintains a professional tone.

No further explanation or summary is required; just provide the number that represents your assessment."""

    async def score_content(self, content: str) -> float:
        # Call your LLM here with self.prompt and the content, then map its reply to 0 or 1.
        return 1.0  # placeholder

    async def evaluate_test_case(self, test_case: Dict[str, Any], output: str) -> Evaluation:
        score = await self.score_content(output)
        return Evaluation(score=score)
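
The score_content stub is where your actual LLM call goes. Below is a minimal sketch assuming the openai package and a chat model; any LLM client works, and the model name is only an example:

from openai import AsyncOpenAI

class IsProfessionalTone(BaseLLMJudge):
    # ... __init__ as above ...

    async def score_content(self, content: str) -> float:
        client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
        response = await client.chat.completions.create(
            model="gpt-4o-mini",  # example model; swap in your own
            messages=[
                {"role": "system", "content": self.prompt},
                {"role": "user", "content": content},
            ],
        )
        reply = (response.choices[0].message.content or "").strip()
        # The prompt asks for a bare 0 or 1.
        return 1.0 if reply.startswith("1") else 0.0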

Using Out-of-Box Evaluators

Autoblocks provides several out-of-box evaluators that you can use directly:

from typing import Dict, Any
from autoblocks.testing import BaseAccuracy

class Accuracy(BaseAccuracy):
    def __init__(self):
        super().__init__("accuracy")

    def output_mapper(self, output: str) -> str:
        # The actual output produced by the function under test.
        return output

    def expected_output_mapper(self, test_case: Dict[str, Any]) -> str:
        # The expected output stored on the test case.
        return test_case["expected_output"]
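
The mappers exist so the evaluator can adapt to whatever shape your function's output takes. As a purely hypothetical example, if your function returned a dict with a "message" field, the mappers might look like this:

from typing import Dict, Any
from autoblocks.testing import BaseAccuracy

class ResponseAccuracy(BaseAccuracy):
    def __init__(self):
        super().__init__("response-accuracy")

    def output_mapper(self, output: Dict[str, Any]) -> str:
        # Pull the text to grade out of the structured output.
        return output["message"]

    def expected_output_mapper(self, test_case: Dict[str, Any]) -> str:
        return test_case["expected_output"]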

Running Evaluations

You can run your evaluators against a set of test cases with run_test_suite:

from typing import Dict, Any
from autoblocks.testing import run_test_suite

async def main():
    await run_test_suite(
        id="my-test-suite",
        test_cases=[
            {
                "input": "hello world",
                "expected_output": "hello world",
            }
        ],
        # Fields used to uniquely identify each test case.
        test_case_hash=["input"],
        # The function under test; here it simply echoes its input.
        fn=lambda test_case: test_case["input"],
        evaluators=[Accuracy()],
    )

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
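
You can pass any combination of the evaluators defined in this guide; just make sure each test case carries the fields those evaluators read (here, expected_output for Accuracy and expected_substring for HasSubstring). A suite that exercises all three might look like this:

async def main():
    await run_test_suite(
        id="my-test-suite",
        test_cases=[
            {
                "input": "hello world",
                "expected_output": "hello world",
                "expected_substring": "hello",
            }
        ],
        test_case_hash=["input"],
        fn=lambda test_case: test_case["input"],
        evaluators=[Accuracy(), HasSubstring(), IsProfessionalTone()],
    )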

Next Steps