Python Quick Start

This guide will help you get started with creating and using evaluators in Python.

Installation

First, install the Autoblocks client:

pip install autoblocksai

Creating an Evaluator

Let’s create a simple evaluator that checks if a response contains a specific substring:

from typing import Dict, Any
from autoblocks.testing import BaseTestEvaluator, Evaluation

class HasSubstring(BaseTestEvaluator):
    def __init__(self):
        super().__init__("has-substring")

    def evaluate_test_case(self, test_case: Dict[str, Any], output: str) -> Evaluation:
        # Score 1 when the expected substring appears in the output, 0 otherwise.
        score = 1 if test_case["expected_substring"] in output else 0
        # The threshold requires a score of at least 1 for the evaluation to pass.
        return Evaluation(score=score, threshold={"gte": 1})
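
To sanity check the evaluator outside of a test suite, you can call it directly. The test case dict below is purely illustrative, and we assume the Evaluation object exposes the score you passed in:

evaluator = HasSubstring()
evaluation = evaluator.evaluate_test_case(
    {"input": "greet the user", "expected_substring": "hello"},
    "hello there!",
)
print(evaluation.score)  # 1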

Using an LLM Judge

For more complex evaluations, you can use an LLM as a judge:

from typing import Dict, Any
from autoblocks.testing import BaseLLMJudge, Evaluation

class IsProfessionalTone(BaseLLMJudge):
    def __init__(self):
        super().__init__("is-professional-tone")
        self.max_concurrency = 2  # limit the number of concurrent evaluations
        self.prompt = """Please evaluate the provided text for its professionalism in the context of formal communication.
Consider the following criteria in your assessment:

- Tone and Style: Respectful, objective, and appropriately formal tone without bias or excessive emotionality.
- Grammar and Punctuation: Correct grammar, punctuation, and capitalization.

Based on these criteria, provide a binary response where:

- 0 indicates the text does not maintain a professional tone.
- 1 indicates the text maintains a professional tone.

No further explanation or summary is required; just provide the number that represents your assessment."""

    async def score_content(self, content: str) -> float:
        # Call your LLM here with self.prompt and the content, then map its reply to 0 or 1.
        return 1.0  # placeholder

    async def evaluate_test_case(self, test_case: Dict[str, Any], output: str) -> Evaluation:
        score = await self.score_content(output)
        return Evaluation(score=score)
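
The score_content stub is where your actual LLM call goes. Below is a minimal sketch assuming the openai package and a chat model; any LLM client works, and the model name is only an example:

from openai import AsyncOpenAI

class IsProfessionalTone(BaseLLMJudge):
    # ... __init__ as above ...

    async def score_content(self, content: str) -> float:
        client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
        response = await client.chat.completions.create(
            model="gpt-4o-mini",  # example model; swap in your own
            messages=[
                {"role": "system", "content": self.prompt},
                {"role": "user", "content": content},
            ],
        )
        reply = (response.choices[0].message.content or "").strip()
        # The prompt asks for a bare 0 or 1.
        return 1.0 if reply.startswith("1") else 0.0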

Using Out-of-Box Evaluators

Autoblocks provides several out-of-box evaluators that you can use directly:

from typing import Dict, Any
from autoblocks.testing import BaseAccuracy

class Accuracy(BaseAccuracy):
    def __init__(self):
        super().__init__("accuracy")

    def output_mapper(self, output: str) -> str:
        # The actual output produced by the function under test.
        return output

    def expected_output_mapper(self, test_case: Dict[str, Any]) -> str:
        # The expected output stored on the test case.
        return test_case["expected_output"]
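
The mappers exist so the evaluator can adapt to whatever shape your function's output takes. As a purely hypothetical example, if your function returned a dict with a "message" field, the mappers might look like this:

from typing import Dict, Any
from autoblocks.testing import BaseAccuracy

class ResponseAccuracy(BaseAccuracy):
    def __init__(self):
        super().__init__("response-accuracy")

    def output_mapper(self, output: Dict[str, Any]) -> str:
        # Pull the text to grade out of the structured output.
        return output["message"]

    def expected_output_mapper(self, test_case: Dict[str, Any]) -> str:
        return test_case["expected_output"]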

Running Evaluations

You can run your evaluators against a set of test cases with run_test_suite:

from typing import Dict, Any
from autoblocks.testing import run_test_suite

async def main():
    await run_test_suite(
        id="my-test-suite",
        test_cases=[
            {
                "input": "hello world",
                "expected_output": "hello world",
            }
        ],
        # Fields used to uniquely identify each test case.
        test_case_hash=["input"],
        # The function under test; here it simply echoes its input.
        fn=lambda test_case: test_case["input"],
        evaluators=[Accuracy()],
    )

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
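
You can pass any combination of the evaluators defined in this guide; just make sure each test case carries the fields those evaluators read (here, expected_output for Accuracy and expected_substring for HasSubstring). A suite that exercises all three might look like this:

async def main():
    await run_test_suite(
        id="my-test-suite",
        test_cases=[
            {
                "input": "hello world",
                "expected_output": "hello world",
                "expected_substring": "hello",
            }
        ],
        test_case_hash=["input"],
        fn=lambda test_case: test_case["input"],
        evaluators=[Accuracy(), HasSubstring(), IsProfessionalTone()],
    )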

Next Steps