Prerequisites
Before using evaluators, you must configure your OpenAI API key in the settings. This key is required for the LLM-based evaluation functionality.
LLM as a Judge
What is an LLM Evaluator?
An LLM evaluator uses a large language model to assess your agent’s performance by analyzing conversation transcripts. The evaluator:
- Reviews the entire conversation
- Evaluates against specified criteria
- Provides a pass/fail result
- Explains the reasoning behind its decision
Creating an Evaluator
Define Success Criteria
Clearly specify what constitutes a successful interaction. For example:
- “The agent should confirm the appointment date and time”
- “The agent must verify the caller’s name”
- “The agent should handle interruptions politely”
- “The agent must not share sensitive information”
Example Evaluation Criteria
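As a rough illustration of the general LLM-as-a-judge pattern, the criteria listed above could be kept as plain strings and folded into a judge prompt. This is a sketch only: `buildJudgePrompt` and the prompt wording are hypothetical and are not how the product constructs its prompts.

```typescript
// Criteria taken from the examples above, kept as plain strings.
const criteria: string[] = [
  "The agent should confirm the appointment date and time",
  "The agent must verify the caller's name",
  "The agent should handle interruptions politely",
  "The agent must not share sensitive information",
];

// Hypothetical helper: builds a judge prompt from criteria and a transcript.
// The wording is illustrative only.
function buildJudgePrompt(items: string[], transcript: string): string {
  const numbered = items.map((c, i) => `${i + 1}. ${c}`).join("\n");
  return [
    "You are evaluating a conversation transcript.",
    "Criteria:",
    numbered,
    "Transcript:",
    transcript,
    'Answer with "PASS" or "FAIL" followed by a one-sentence reason.',
  ].join("\n");
}
```

Numbering the criteria makes it easier for the judge's explanation to say which one was missed.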
Best Practices
Writing Effective Criteria
- Be Specific
  - Use clear, measurable objectives
  - Avoid ambiguous language
  - Include specific requirements
- Focus on Key Behaviors
  - Identify critical success factors
  - Prioritize important interactions
  - Define must-have elements
- Consider Edge Cases
  - Include criteria for handling interruptions
  - Address potential misunderstandings
  - Cover error scenarios
Example Scenarios
Basic Appointment Confirmation
Understanding Results
Evaluation Output
The LLM evaluator provides:
- A pass/fail status
- A reason explaining the decision
Example Output
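As an illustration only, a result carrying a pass/fail status and a reason might be modeled as follows. The names `JudgeResult`, `passed`, and `reason` are hypothetical and are not the product's actual field names.

```typescript
// Hypothetical result shape: a pass/fail status plus the judge's explanation.
interface JudgeResult {
  passed: boolean;
  reason: string;
}

// Renders a result as a single human-readable line.
function formatResult(result: JudgeResult): string {
  const status = result.passed ? "PASS" : "FAIL";
  return `${status}: ${result.reason}`;
}

const example: JudgeResult = {
  passed: false,
  reason: "The agent never verified the caller's name.",
};

console.log(formatResult(example));
// prints "FAIL: The agent never verified the caller's name."
```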
Tips for Success
- Iterate on Criteria
  - Start with basic requirements
  - Test with different scenarios
  - Refine based on results
- Balance Strictness
  - Set reasonable expectations
  - Account for natural conversation flow
  - Consider multiple valid approaches
- Review and Adjust
  - Monitor evaluation results
  - Identify patterns in failures
  - Update criteria as needed
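One simple way to spot patterns in failures is to tally failed results per criterion. The sketch below assumes you have exported results into a hypothetical `CriterionResult` shape; the field names are assumptions, not a documented export format.

```typescript
// Hypothetical record of one evaluation outcome for one criterion.
interface CriterionResult {
  criterion: string;
  passed: boolean;
  reason: string;
}

// Counts how many times each criterion failed; passing results are ignored.
function countFailuresByCriterion(results: CriterionResult[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of results) {
    if (r.passed) continue;
    counts.set(r.criterion, (counts.get(r.criterion) ?? 0) + 1);
  }
  return counts;
}
```

A criterion that fails far more often than the others may be worded too strictly or too ambiguously, which makes it a good candidate for the next revision.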
Webhook Evaluators
What is a Webhook Evaluator?
A webhook evaluator allows you to implement custom evaluation logic by hosting your own evaluation endpoint. This gives you complete control over the evaluation process and allows for complex, domain-specific evaluation criteria.
Webhook Payload Structure
The webhook will receive a JSON payload containing:
- Input details about the scenario, persona, and data fields
- Output containing the conversation transcript
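As a rough sketch of the two parts listed above, the payload could be modeled in TypeScript as follows. Every field name here (`input`, `scenario`, `persona`, `dataFields`, `output`, `transcript`) is an assumption for illustration; inspect a real request body before relying on these names.

```typescript
// Hypothetical shape of the webhook payload. Field names are assumptions,
// not the documented schema.
interface WebhookPayload {
  input: {
    scenario: string;
    persona: string;
    dataFields: Record<string, string>;
  };
  output: {
    transcript: string;
  };
}

// Minimal runtime check: confirms both top-level parts are objects and that
// the transcript is a string. It does not validate every nested field.
function isWebhookPayload(value: unknown): value is WebhookPayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { input?: unknown; output?: unknown };
  if (typeof v.input !== "object" || v.input === null) return false;
  if (typeof v.output !== "object" || v.output === null) return false;
  const output = v.output as { transcript?: unknown };
  return typeof output.transcript === "string";
}
```

Checking the shape up front lets the evaluator reject malformed requests with a clear error instead of failing partway through its logic.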
Implementing a Webhook Evaluator
You can host your webhook evaluator using services like Val Town, which provides a simple way to deploy and run JavaScript functions as webhooks.
Using SDKs for Types
You can use our official SDKs to ensure correct types for the return value:
- JavaScript SDK - Use the `Evaluation` class in the `@autoblocks/client/testing` package
- Python SDK - Use the `Evaluation` dataclass in the `autoblocks.testing.models` package
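Putting these pieces together, a webhook evaluator hosted on a service like Val Town might look like the minimal sketch below. The handler uses the Web-standard `Request` and `Response` types. The payload field names and the returned `score`, `threshold`, and `metadata` fields are assumptions for illustration; confirm the expected return shape against the `Evaluation` type in the SDK.

```typescript
// Hypothetical payload and result shapes; the field names are assumptions.
interface WebhookPayload {
  output: { transcript: string };
}

interface EvaluationResult {
  score: number;
  threshold: { gte: number };
  metadata: { reason: string };
}

// Pure evaluation logic: passes when the transcript contains
// "confirm", "confirmed", or "confirming".
function evaluateTranscript(transcript: string): EvaluationResult {
  const confirmed = /\bconfirm(ed|ing)?\b/i.test(transcript);
  return {
    score: confirmed ? 1 : 0,
    threshold: { gte: 1 },
    metadata: {
      reason: confirmed
        ? "The agent confirmed the appointment."
        : "No confirmation found in the transcript.",
    },
  };
}

// HTTP entry point: parses the JSON body and returns the result as JSON.
async function handler(req: Request): Promise<Response> {
  const payload = (await req.json()) as WebhookPayload;
  const result = evaluateTranscript(payload.output.transcript);
  return new Response(JSON.stringify(result), {
    headers: { "Content-Type": "application/json" },
  });
}

// On Val Town, an HTTP val exposes its handler as the default export:
// export default handler;
```

Keeping the scoring logic in a pure function separate from the HTTP handler makes it straightforward to unit test without sending requests.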
Best Practices
- Error Handling
  - Implement proper error handling
  - Return meaningful error messages
  - Log evaluation failures
- Performance
  - Keep evaluation logic efficient
  - Handle timeouts appropriately
  - Cache expensive computations
- Testing
  - Test with various scenarios
  - Verify edge cases
  - Monitor evaluation consistency
- Security
  - Secure your endpoint with authentication headers
  - Validate incoming requests
  - Use environment variables for sensitive data
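The error-handling and security points above can be sketched with two small helpers. The header name `x-evaluator-secret` and both function names are hypothetical; how the shared secret is configured and read from the environment depends on your hosting platform.

```typescript
// Hypothetical header name carrying the shared secret.
const AUTH_HEADER = "x-evaluator-secret";

// Compares the incoming header value with the configured secret.
// Returns false when no secret is configured, so a missing environment
// variable never leaves the endpoint open. A constant-time comparison
// is preferable in production.
function isAuthorized(headerValue: string | null, expected: string | undefined): boolean {
  if (!expected || headerValue === null) return false;
  return headerValue === expected;
}

type SafeResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Runs evaluation logic, logging any failure and returning a clear message
// instead of letting the exception escape.
function safeEvaluate<T>(evaluate: () => T): SafeResult<T> {
  try {
    return { ok: true, value: evaluate() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    console.error("Evaluation failed:", message);
    return { ok: false, error: message };
  }
}
```

In a request handler, you would typically read the header with `req.headers.get(AUTH_HEADER)`, respond with status 401 when `isAuthorized` returns false, and respond with status 500 and the error message when `safeEvaluate` reports a failure.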

