Offline Testing & Evaluation

Autoblocks provides an advanced testing suite designed to help you ship AI product changes with confidence.

Local testing

Our Testing SDKs let you declaratively define tests for your LLM application and execute them locally or in a CI/CD pipeline. Your tests can live in a standalone script or run as part of a larger test framework.
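
For example, a small standalone suite written with the Python Testing SDK might look roughly like the sketch below. The test case, evaluator, and `generate_answer` function are illustrative placeholders, not a copy-paste recipe; swap in your own application logic and evaluation criteria.

```python
import dataclasses

from autoblocks.testing.models import BaseTestCase, BaseTestEvaluator, Evaluation, Threshold
from autoblocks.testing.run import run_test_suite
from autoblocks.testing.util import md5


@dataclasses.dataclass
class QuestionTestCase(BaseTestCase):
    question: str
    expected_substring: str

    def hash(self) -> str:
        # Stable hash used to identify this test case across runs.
        return md5(self.question)


class ContainsExpectedSubstring(BaseTestEvaluator):
    id = "contains-expected-substring"

    def evaluate_test_case(self, test_case: QuestionTestCase, output: str) -> Evaluation:
        # Simple deterministic check; real suites often mix rule-based and LLM-based evaluators.
        score = 1 if test_case.expected_substring.lower() in output.lower() else 0
        return Evaluation(score=score, threshold=Threshold(gte=1))


def generate_answer(test_case: QuestionTestCase) -> str:
    # Placeholder for your LLM application; call your prompt/model/chain here.
    return f"You asked: {test_case.question}"


run_test_suite(
    id="my-test-suite",
    test_cases=[
        QuestionTestCase(question="What is Autoblocks?", expected_substring="Autoblocks"),
    ],
    evaluators=[ContainsExpectedSubstring()],
    fn=generate_answer,
)
```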

Autoblocks also ships a CLI for interacting with our APIs from the command line. Product engineers can write tests using the Autoblocks Testing SDKs, then run them and store the results with a single command.
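
In a typical setup, that single command wraps however you normally run your tests. With the Python suite sketched above, it might look something like `npx autoblocks testing exec -m "my first run" -- python3 run_tests.py`, where the `-m` message labels the run; the exact flags may vary with your CLI version.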

Test cases

Test cases are a representative sample of user inputs you run through your product for testing purposes. Many product teams start out by storing their test cases in a spreadsheet, but quickly realize this isn't a scalable solution.

Autoblocks cloud-based test case management allows you to improve the relevance of your test cases, as well as your overall test case coverage.

Test case management

Teams can use the Autoblocks UI to manage test cases for their AI product. This unlocks powerful workflows that let developers and other team members continually experiment: developers manage baseline test cases in code, while other team members use the UI to add more over time.
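
Continuing the sketch above, one common split is to keep a small baseline set of test cases in code and merge in the cases your team curates in the UI at run time. The `fetch_ui_managed_cases` helper below is hypothetical; in practice you would load those cases from Autoblocks rather than returning a hard-coded list.

```python
# Continuing the earlier sketch: QuestionTestCase, ContainsExpectedSubstring,
# and generate_answer are the same definitions as above.

def fetch_ui_managed_cases() -> list[QuestionTestCase]:
    # Hypothetical helper: in a real setup, load the test cases your team
    # maintains in the Autoblocks UI instead of returning an empty list.
    return []


baseline_cases = [
    # Baseline cases that developers keep in version control.
    QuestionTestCase(question="What is Autoblocks?", expected_substring="Autoblocks"),
    QuestionTestCase(question="How do I run tests in CI?", expected_substring="CI"),
]

run_test_suite(
    id="my-test-suite",
    test_cases=baseline_cases + fetch_ui_managed_cases(),
    evaluators=[ContainsExpectedSubstring()],
    fn=generate_answer,
)
```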

Test case generation

To speed up test case creation, you can configure mappings that generate test cases from production data. This saves you from manual data entry and produces test cases that are representative of what is actually happening in production.

Learn more about test cases

Regression testing in CI

Autoblocks enables you to easily run your test suites in a CI environment. You can choose to run tests when relevant code changes, on a schedule, or on demand. The Autoblocks CLI will automatically capture relevant git metadata to surface in the Autoblocks UI.

Test results are published on each CI build. If a test fails, you can navigate to the Autoblocks UI to investigate further.

Learn more about CI testing

Test run comparison

Autoblocks test run comparisons simplify the process of testing different prompts, models, context retrieval mechanisms, and supporting code in your LLM applications.

This feature helps you investigate how product changes—such as modifying a prompt or changing the model—impact output quality.

Learn more about test run comparison UI

Human review

In the test run comparison UI, Autoblocks offers an intuitive workflow for adding human review to a test case's output where automated evaluation falls short. You can attach a review to a single test case result, or start a session to review multiple test cases and runs at once.

Learn more about human review

Tracking test performance over time

Autoblocks tracks the performance of your tests over time. This allows you to detect subtle trends, such as model and prompt drift, that may go unnoticed in individual test runs.

By keeping test cases relevant and tracking performance over time, Autoblocks provides the tools you need to be confident your product is always performing its best.