TypeScript SDK Reference

runTestSuite

The main entrypoint into the testing framework.

| name | required | type | description |
| --- | --- | --- | --- |
| id | true | string | A unique ID for the test suite. This will be displayed in the Autoblocks platform and should remain the same for the lifetime of a test suite. |
| testCases | true | BaseTestCase[] | A list of instances that subclass BaseTestCase. These can be any schema that facilitates testing your application. They will be passed directly to fn and will also be made available to your evaluators. BaseTestCase is an abstract base class that requires you to implement the hash function. See Test case hashing for more information. |
| testCaseHash | false | (testCase: BaseTestCase) => string | An optional function that returns a string that uniquely identifies a test case for its lifetime. If not provided, the test case’s hash method will be used. |
| evaluators | true | BaseTestEvaluator[] | A list of instances that subclass BaseTestEvaluator. |
| fn | true | (testCase: BaseTestCase) => Promise<any> or any | The function you are testing. Its only argument is an instance of a test case. This function can be synchronous or asynchronous and can return any type. |
| maxTestCaseConcurrency | false | number | The maximum number of test cases that can be running concurrently through fn. Useful to avoid rate limiting from external services, such as an LLM provider. |
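
Because a hash must identify a test case for its whole lifetime, it helps to derive it only from the fields that define the case's identity. The following is a minimal sketch under that assumption. The MyTestCase shape is a local stand-in and hashByInput is a hypothetical helper, not part of the SDK.

```typescript
// Local stand-in for a test case shape; in real code this would extend BaseTestCase.
interface MyTestCase {
  input: string;
  expectedSubstrings: string[];
}

// Hypothetical helper: builds the hash from the identifying field only,
// so editing the expectations does not change the test case's identity.
function hashByInput(testCase: MyTestCase): string {
  return JSON.stringify({ input: testCase.input });
}

const original: MyTestCase = { input: 'hello world', expectedSubstrings: ['hello'] };
const edited: MyTestCase = { ...original, expectedSubstrings: ['hello', 'world'] };

// Same input, different expectations: the hash is unchanged.
console.log(hashByInput(original) === hashByInput(edited)); // true
```

A function with this signature could be supplied as testCaseHash, or the same logic could live in the test case's own hash method.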

BaseTestEvaluator

An abstract base class that you can subclass to create your own evaluators.

| name | required | type | description |
| --- | --- | --- | --- |
| id | true | string | A unique identifier for the evaluator. |
| maxConcurrency | false | number | The maximum number of concurrent calls to evaluateTestCase allowed for the evaluator. Useful to avoid rate limiting from external services, such as an LLM provider. |
| evaluateTestCase | true | (testCase: BaseTestCase, output: any) => Promise<Evaluation or undefined> or Evaluation or undefined | Creates an evaluation on a test case and its output. This method can be synchronous or asynchronous. |
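
To illustrate what a concurrency limit such as maxConcurrency (or maxTestCaseConcurrency on runTestSuite) means in practice, here is a minimal sketch of a bounded-concurrency map. It is not the SDK's implementation; mapWithConcurrency is a hypothetical helper that keeps at most `limit` calls in flight at any time.

```typescript
// Hypothetical helper, not part of the SDK: runs fn over items with at most
// `limit` calls in flight at once and returns results in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R> | R,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item to claim

  // Each worker repeatedly claims the next unprocessed item until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index]);
    }
  }

  // Start at most `limit` workers (and at least one).
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With a limit of 2 and five items, two calls start immediately and each remaining call starts only when an earlier one finishes, which is the behavior that helps avoid rate limits from an external service.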

Evaluation

An interface that represents the result of an evaluation.

| name | required | type | description |
| --- | --- | --- | --- |
| score | true | number | A number between 0 and 1 that represents the score of the evaluation. |
| threshold | false | Threshold | An optional Threshold that describes the range the score must fall within to be considered passing. If no threshold is attached, the score is reported and the pass/fail status is undefined. |
| metadata | false | Record<string, any> | Key-value pairs that provide additional context about the evaluation. This is typically used to explain why an evaluation failed. Attached metadata is surfaced in the test run comparison UI. |

Threshold

An interface that defines the passing criteria for an evaluation.

| name | required | type | description |
| --- | --- | --- | --- |
| lt | false | number | The score must be less than this number in order to be considered passing. |
| lte | false | number | The score must be less than or equal to this number in order to be considered passing. |
| gt | false | number | The score must be greater than this number in order to be considered passing. |
| gte | false | number | The score must be greater than or equal to this number in order to be considered passing. |
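
The passing criteria above can be sketched as a small pure function. This is an illustration only, not the SDK's code: the Threshold interface here is a local stand-in mirroring the table, passesThreshold is a hypothetical helper, and the sketch assumes that when several bounds are set, all of them must hold.

```typescript
// Local stand-in mirroring the Threshold fields described above.
interface Threshold {
  lt?: number;
  lte?: number;
  gt?: number;
  gte?: number;
}

// Hypothetical helper: returns undefined when no threshold is attached,
// otherwise true only if the score satisfies every bound that is set
// (combining bounds with AND is an assumption of this sketch).
function passesThreshold(score: number, threshold?: Threshold): boolean | undefined {
  if (threshold === undefined) {
    return undefined;
  }
  const { lt, lte, gt, gte } = threshold;
  return (
    (lt === undefined || score < lt) &&
    (lte === undefined || score <= lte) &&
    (gt === undefined || score > gt) &&
    (gte === undefined || score >= gte)
  );
}

console.log(passesThreshold(1.0, { gte: 1.0 })); // true
console.log(passesThreshold(0.5, { gte: 1.0 })); // false
console.log(passesThreshold(0.5)); // undefined
```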

Example

import { BaseTestCase, BaseTestEvaluator, Evaluation, Threshold, runTestSuite } from '@autoblocks/testing';

interface MyTestCase extends BaseTestCase {
  input: string;
  expectedSubstrings: string[];
}

class HasAllSubstrings extends BaseTestEvaluator {
  id = 'has-all-substrings';

  // Scores 1 if the output contains every expected substring, otherwise 0.
  async evaluateTestCase(testCase: MyTestCase, output: string): Promise<Evaluation> {
    const hasAll = testCase.expectedSubstrings.every((substring) => output.includes(substring));
    const score = hasAll ? 1.0 : 0.0;

    return {
      score,
      // Passing requires a score of at least 1, so a single missing substring fails.
      threshold: { gte: 1.0 },
      metadata: {
        reason: 'Output must contain all expected substrings',
      },
    };
  }
}

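await runTestSuite({
  id: 'my-test-suite',
  testCases: [
    {
      input: 'hello world',
      expectedSubstrings: ['hello', 'world'],
      // Stable identifier for this test case across runs.
      hash: () => 'test-case-1',
    },
  ],
  // The function under test; here it simply returns the input unchanged.
  fn: (testCase: MyTestCase) => testCase.input,
  evaluators: [new HasAllSubstrings()],
});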