Human Review

Human review mode is designed for humans to review, grade, and discuss test results. However, the schemas of your test case and output as they exist in your codebase often contain implementation details that are not relevant to a human reviewer. Each SDK provides methods that allow you to transform your test cases and outputs into human-readable formats.

from dataclasses import dataclass
from uuid import UUID
from autoblocks.testing.models import BaseTestCase
from autoblocks.testing.models import HumanReviewField
from autoblocks.testing.models import HumanReviewFieldContentType
from autoblocks.testing.util import md5

@dataclass
class Document:
    uuid: UUID  # Not relevant for human review, so we don't include it below
    title: str
    content: str

@dataclass
class MyCustomTestCase(BaseTestCase):
    user_question: str
    documents: list[Document]

    def hash(self) -> str:
        return md5(self.user_question)

    def serialize_for_human_review(self) -> list[HumanReviewField]:
        return [
            HumanReviewField(
                name="Question",
                value=self.user_question,
                content_type=HumanReviewFieldContentType.TEXT,
            ),
        ] + [
            HumanReviewField(
                name=f"Document {i + 1}: {doc.title}",
                value=doc.content,
                content_type=HumanReviewFieldContentType.TEXT,
            )
            for i, doc in enumerate(self.documents)
        ]

@dataclass
class MyCustomOutput:
    answer: str
    reason: str

    # These fields are implementation details not needed
    # for human review, so they will be omitted below
    x: int
    y: int
    z: int

    def serialize_for_human_review(self) -> list[HumanReviewField]:
        return [
            HumanReviewField(
                name="Answer",
                value=self.answer,
                content_type=HumanReviewFieldContentType.TEXT
            ),
            HumanReviewField(
                name="Reason",
                value=self.reason,
                content_type=HumanReviewFieldContentType.TEXT,
            ),
        ]

Creating a human review job on the free plan

Whether you are on the free plan or a paid plan, you can create human review jobs directly without using runTestSuite.

from dataclasses import dataclass

from autoblocks.testing.models import BaseTestCase
from autoblocks.testing.models import HumanReviewField
from autoblocks.testing.models import HumanReviewFieldContentType
from autoblocks.testing.run import RunManager
from autoblocks.testing.util import md5


# Update with your test case type
@dataclass
class TestCase(BaseTestCase):
    input: str

    def serialize_for_human_review(self) -> list[HumanReviewField]:
        return [
            HumanReviewField(
                name="Input",
                value=self.input,
                content_type=HumanReviewFieldContentType.TEXT,
            ),
        ]

    def hash(self) -> str:
        return md5(self.input)


# Update with your output type
@dataclass
class Output:
    output: str

    def serialize_for_human_review(self) -> list[HumanReviewField]:
        return [
            HumanReviewField(
                name="Output",
                value=self.output,
                content_type=HumanReviewFieldContentType.TEXT,
            ),
        ]


run = RunManager[TestCase, Output](
    test_id="test-id",
)

run.start()
# Add results from your test suite here
run.add_result(
    test_case=TestCase(input="Hello, world!"),
    output=Output(output="Hi, world!"),
)
run.end()

run.create_human_review_job(
    assignee_email_address="${emailAddress}",
    name="Review for accuracy",
)

Using the results

You can use the results of a human review job for a variety of purposes, such as:

  • Fine tuning an evaluation model
  • Few shot examples in your LLM judges
  • Improving your core product based on expert feedback
  • and more!

Use the human review API to get the results of a human review job.