TypeScript SDK Reference

`runTestSuite`

The main entrypoint into the testing framework.

name	required	type	description
`id`	true	`string`	A unique ID for the test suite. This will be displayed in the Autoblocks platform and should remain the same for the lifetime of a test suite.
`testCases`	true	`BaseTestCase[]`	A list of instances that subclass `BaseTestCase`. These can be any schema that facilitates testing your application. They will be passed directly to `fn` and will also be made available to your evaluators. `BaseTestCase` is an abstract base class that requires you to implement the `hash` function. See Test case hashing for more information.
`testCaseHash`	false	`(testCase: BaseTestCase) => string`	An optional function that returns a string that uniquely identifies a test case for its lifetime. If not provided, the test case’s `hash` method will be used.
`evaluators`	true	`BaseTestEvaluator[]`	A list of instances that subclass `BaseTestEvaluator`.
`fn`	true	`(testCase: BaseTestCase) => Promise<any> or any`	The function you are testing. Its only argument is an instance of a test case. This function can be synchronous or asynchronous and can return any type.
`maxTestCaseConcurrency`	false	`number`	The maximum number of test cases that can be running concurrently through `fn`. Useful to avoid rate limiting from external services, such as an LLM provider.
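
The `testCaseHash` and `maxTestCaseConcurrency` options can be illustrated with a small, SDK-free sketch. The `MyTestCase` shape, the `testCaseHash` body, and the `mapWithConcurrency` helper below are all hypothetical names introduced for illustration. The helper only shows what a concurrency cap means in practice; it is not how the SDK itself schedules test cases.

```typescript
// Hypothetical test case shape, used only for this illustration.
interface MyTestCase {
  input: string;
  expectedSubstrings: string[];
}

// Candidate `testCaseHash`: derives the identity from `input` alone, so the
// hash stays the same when `expectedSubstrings` is edited later.
function testCaseHash(testCase: MyTestCase): string {
  return `input:${testCase.input}`;
}

// Illustration of a concurrency cap (an assumption about the behavior, not
// the SDK's code): at most `limit` calls to `fn` are in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R> | R,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index]);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

A function like `testCaseHash` would be passed to `runTestSuite` through the `testCaseHash` option, in which case the test cases' own `hash` methods are not used.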

`BaseTestEvaluator`

An abstract base class that you can subclass to create your own evaluators.

name	required	type	description
`id`	true	`string`	A unique identifier for the evaluator.
`maxConcurrency`	false	`number`	The maximum number of concurrent calls to `evaluateTestCase` allowed for the evaluator. Useful to avoid rate limiting from external services, such as an LLM provider.
`evaluateTestCase`	true	`(testCase: BaseTestCase, output: any) => Promise<Evaluation or undefined> or Evaluation or undefined`	Creates an evaluation on a test case and its output. This method can be synchronous or asynchronous.
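
As a rough sketch of the fields above, here is a synchronous evaluator that sets `maxConcurrency` and may return `undefined`. The local `Evaluation` and `BaseTestEvaluator` declarations are stand-ins that mirror the tables in this reference so the snippet runs without the SDK installed; real code would import them from `@autoblocks/testing`. The `QuestionTestCase` shape and the `AnswerLength` evaluator are hypothetical, and treating `undefined` as "nothing to evaluate for this test case" is an assumption.

```typescript
// Stand-ins mirroring this reference; import the real ones from
// '@autoblocks/testing' in actual code.
interface Evaluation {
  score: number;
  threshold?: { lt?: number; lte?: number; gt?: number; gte?: number };
  metadata?: Record<string, any>;
}

abstract class BaseTestEvaluator {
  abstract id: string;
  maxConcurrency?: number;
  abstract evaluateTestCase(
    testCase: any,
    output: any,
  ): Promise<Evaluation | undefined> | Evaluation | undefined;
}

// Hypothetical test case shape for this illustration.
interface QuestionTestCase {
  question: string;
  maxAnswerLength?: number;
}

class AnswerLength extends BaseTestEvaluator {
  id = 'answer-length';
  // Allow at most five concurrent calls to `evaluateTestCase`.
  maxConcurrency = 5;

  // Synchronous variant: returns the evaluation directly, not a Promise.
  evaluateTestCase(testCase: QuestionTestCase, output: string): Evaluation | undefined {
    // Assumption: with no limit configured there is nothing to evaluate.
    if (testCase.maxAnswerLength === undefined) {
      return undefined;
    }
    const withinLimit = output.length <= testCase.maxAnswerLength;
    return {
      score: withinLimit ? 1 : 0,
      threshold: { gte: 1 },
      metadata: { length: output.length, limit: testCase.maxAnswerLength },
    };
  }
}
```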

`Evaluation`

An interface that represents the result of an evaluation.

name	required	type	description
`score`	true	`number`	A number between 0 and 1 that represents the score of the evaluation.
`threshold`	false	`Threshold`	An optional `Threshold` that describes the range the score must fall within to be considered passing. If no threshold is attached, the score is reported and the pass / fail status is undefined.
`metadata`	false	`Record<string, any>`	Key-value pairs that provide additional context about the evaluation. This is typically used to explain why an evaluation failed. Attached metadata is surfaced in the test run comparison UI.

`Threshold`

An interface that defines the passing criteria for an evaluation.

name	required	type	description
`lt`	false	`number`	The score must be less than this number in order to be considered passing.
`lte`	false	`number`	The score must be less than or equal to this number in order to be considered passing.
`gt`	false	`number`	The score must be greater than this number in order to be considered passing.
`gte`	false	`number`	The score must be greater than or equal to this number in order to be considered passing.
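
The threshold rules above can be read as a small predicate. The sketch below uses local stand-in interfaces that mirror this reference, and `isPassing` is a hypothetical helper, not an SDK function. It assumes that when several bounds are set, all of them must hold, which this reference does not state explicitly.

```typescript
// Stand-ins mirroring the `Threshold` and `Evaluation` tables.
interface Threshold {
  lt?: number;
  lte?: number;
  gt?: number;
  gte?: number;
}

interface Evaluation {
  score: number;
  threshold?: Threshold;
  metadata?: Record<string, any>;
}

// Returns true or false when a threshold is attached, and undefined when
// there is none (the pass / fail status is then undefined).
// Assumption: every bound that is set must be satisfied.
function isPassing(evaluation: Evaluation): boolean | undefined {
  const { score, threshold } = evaluation;
  if (threshold === undefined) {
    return undefined;
  }
  if (threshold.lt !== undefined && !(score < threshold.lt)) return false;
  if (threshold.lte !== undefined && !(score <= threshold.lte)) return false;
  if (threshold.gt !== undefined && !(score > threshold.gt)) return false;
  if (threshold.gte !== undefined && !(score >= threshold.gte)) return false;
  return true;
}
```

For instance, under these assumptions a score of 0.8 with `{ gte: 0.5, lt: 1 }` passes, while a score of 1 with `{ lt: 1 }` fails because the bound is strict.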

Example

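```typescript
import { BaseTestCase, BaseTestEvaluator, Evaluation, Threshold, runTestSuite } from '@autoblocks/testing';

// Test case schema: the input passed to `fn` plus the data the evaluator checks.
interface MyTestCase extends BaseTestCase {
  input: string;
  expectedSubstrings: string[];
}

// Scores 1 when the output contains every expected substring, otherwise 0.
class HasAllSubstrings extends BaseTestEvaluator {
  id = 'has-all-substrings';

  async evaluateTestCase(testCase: MyTestCase, output: string): Promise<Evaluation> {
    let score = 1.0;
    for (const substring of testCase.expectedSubstrings) {
      if (!output.includes(substring)) {
        // One missing substring is enough to fail the evaluation.
        score = 0.0;
        break;
      }
    }

    return {
      score,
      // Passing requires a score of at least 1, i.e. no missing substrings.
      threshold: { gte: 1.0 },
      metadata: {
        reason: 'Output must contain all expected substrings',
      },
    };
  }
}

// Top-level `await` requires an ES module context.
await runTestSuite({
  id: 'my-test-suite',
  testCases: [
    {
      input: 'hello world',
      expectedSubstrings: ['hello', 'world'],
      // Stable identifier for this test case.
      hash: () => 'test-case-1',
    },
  ],
  // The function under test; here it simply echoes the input.
  fn: (testCase: MyTestCase) => testCase.input,
  evaluators: [new HasAllSubstrings()],
});
```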
