Python Quick Start

This guide shows how to fetch Autoblocks datasets in Python and use them in your code and test suites.

Installation

First, install the Autoblocks client:

pip install autoblocks-client

Setup

Copy your Autoblocks API key from the settings page and set it as an environment variable:

export AUTOBLOCKS_API_KEY=...
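
If the variable is missing, the examples in this guide fail with a bare `KeyError` when they read `os.environ`. A minimal sketch of an up-front check, using only the standard library; `require_api_key` is a hypothetical helper name and is not part of the Autoblocks SDK:

```python
import os


def require_api_key(env_var: str = "AUTOBLOCKS_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    # Hypothetical helper: not part of the Autoblocks SDK.
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"{env_var} is not set; export it before running this script."
        )
    return api_key
```

The returned value can then be passed as the `api_key` argument when constructing the client.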

Basic Usage

Here’s how to fetch a dataset and use it in your code:

import os

from autoblocks.client import AutoblocksAPIClient

client = AutoblocksAPIClient(api_key=os.environ["AUTOBLOCKS_API_KEY"])

# Get the latest revision of a dataset
dataset = client.get_dataset(
    name="My Dataset",
    schema_version="1",
)

# Get a specific revision
pinned_dataset = client.get_dataset(
    name="My Dataset",
    schema_version="1",
    revision_id="123",
)

# Get a subset of the dataset using splits
split_dataset = client.get_dataset(
    name="My Dataset",
    schema_version="1",
    splits=["split-1"],
)

print(dataset)
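
Once a dataset is fetched, you will usually want to walk its items. The sketch below relies only on `dataset.items`, `item.revision_id`, and `item.data`, the attributes used in the test suite example later in this guide. `summarize_items` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the snippet runs without an API call:

```python
from types import SimpleNamespace


def summarize_items(dataset):
    """Return (revision_id, data) pairs for every item in a dataset."""
    # Hypothetical helper: assumes each item exposes `revision_id` and `data`.
    return [(item.revision_id, item.data) for item in dataset.items]


# Stand-in dataset (hypothetical) in place of a real client.get_dataset() result.
fake_dataset = SimpleNamespace(
    items=[SimpleNamespace(revision_id="123", data={"input": "hello"})]
)

for revision_id, data in summarize_items(fake_dataset):
    print(revision_id, data)
```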

Working with Dataset Splits

Dataset splits divide a dataset into smaller named subsets. This is useful when different testing scenarios should run against different parts of the same dataset.

# Get a specific split
training_split = client.get_dataset(
    name="My Dataset",
    schema_version="1",
    splits=["training"],
)

# Get multiple splits
test_splits = client.get_dataset(
    name="My Dataset",
    schema_version="1",
    splits=["test-1", "test-2"],
)
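
If you would rather keep the results of each split separate, one option is to fetch the splits one at a time and key them by name. This is a minimal sketch: `fetch_splits` is a hypothetical helper, and it uses only the `get_dataset(name=..., schema_version=..., splits=[...])` call shown above.

```python
from typing import Any, Dict, Iterable


def fetch_splits(
    client: Any,
    name: str,
    schema_version: str,
    split_names: Iterable[str],
) -> Dict[str, Any]:
    """Fetch each split separately and return the datasets keyed by split name."""
    # Hypothetical helper: makes one get_dataset call per split.
    return {
        split: client.get_dataset(
            name=name,
            schema_version=schema_version,
            splits=[split],
        )
        for split in split_names
    }
```

For example, `fetch_splits(client, "My Dataset", "1", ["test-1", "test-2"])` would return a dictionary with one entry per split, at the cost of one request per split.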

Using Datasets with Test Suites

You can use datasets with test suites to associate test results with dataset items:

import os
from dataclasses import dataclass

from autoblocks.client import AutoblocksAPIClient
from autoblocks.testing import run_test_suite

client = AutoblocksAPIClient(api_key=os.environ["AUTOBLOCKS_API_KEY"])

@dataclass
class TestCase:
    input: str
    dataset_item_id: str

async def main():
    dataset = client.get_dataset(
        name="My Dataset",
        schema_version="1",
    )

    await run_test_suite(
        id="my-test-suite",
        test_cases=[
            TestCase(
                dataset_item_id=item.revision_id,
                **item.data,
            )
            for item in dataset.items
        ],
        test_case_hash=["input"],
        fn=lambda test_case: test_case.input,  # Replace with your LLM call
        evaluators=[],  # Replace with your evaluators
    )

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
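
The `**item.data` unpacking in the example above only works when the keys in each item's data match the fields of `TestCase`; an extra or missing key raises a `TypeError`. The sketch below shows one way to tolerate extra keys. `FakeItem` is a hypothetical stand-in for a dataset item (so the snippet runs without an API call), and `to_test_case` is a hypothetical helper, not part of the Autoblocks SDK:

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class TestCase:
    input: str
    dataset_item_id: str


@dataclass
class FakeItem:
    # Hypothetical stand-in mirroring the item attributes used above.
    revision_id: str
    data: Dict[str, Any]


def to_test_case(item: FakeItem) -> TestCase:
    """Build a TestCase, ignoring data keys that TestCase does not declare."""
    # dataset_item_id is set explicitly, so it is never taken from item.data.
    allowed = {f.name for f in fields(TestCase)} - {"dataset_item_id"}
    kwargs = {key: value for key, value in item.data.items() if key in allowed}
    return TestCase(dataset_item_id=item.revision_id, **kwargs)


item = FakeItem(revision_id="123", data={"input": "hello", "notes": "ignored"})
print(to_test_case(item))
```

Filtering silently drops unknown keys, which is convenient while a schema evolves; if you would rather catch schema drift early, keep the plain `**item.data` form and let the `TypeError` surface.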

Next Steps