Prerequisites
Before using evaluators, you must configure your OpenAI API key in the settings. This key is required for the LLM-based evaluation functionality.
LLM as a Judge
What is an LLM Evaluator?
An LLM evaluator uses a large language model to assess your agent’s performance by analyzing conversation transcripts. The evaluator:
- Reviews the entire conversation
- Evaluates against specified criteria
- Provides a pass/fail result
- Explains the reasoning behind its decision
Creating an Evaluator
Define Success Criteria
Clearly specify what constitutes a successful interaction. For example:
- “The agent should confirm the appointment date and time”
- “The agent must verify the caller’s name”
- “The agent should handle interruptions politely”
- “The agent must not share sensitive information”
Example Evaluation Criteria
Best Practices
Writing Effective Criteria
- Be Specific
  - Use clear, measurable objectives
  - Avoid ambiguous language
  - Include specific requirements
- Focus on Key Behaviors
  - Identify critical success factors
  - Prioritize important interactions
  - Define must-have elements
- Consider Edge Cases
  - Include criteria for handling interruptions
  - Address potential misunderstandings
  - Cover error scenarios
Example Scenarios
Basic Appointment Confirmation
Understanding Results
Evaluation Output
The LLM evaluator provides:
- A pass/fail status
- A reason explaining the decision
Example Output
Tips for Success
- Iterate on Criteria
  - Start with basic requirements
  - Test with different scenarios
  - Refine based on results
- Balance Strictness
  - Set reasonable expectations
  - Account for natural conversation flow
  - Consider multiple valid approaches
- Review and Adjust
  - Monitor evaluation results
  - Identify patterns in failures
  - Update criteria as needed
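To make "identify patterns in failures" more concrete, here is a minimal sketch that tallies failed evaluations by criterion. The record shape (`criterion`, `passed`, `reason`), the sample records, and the function name `summarizeFailures` are assumptions for illustration; this page only states that the evaluator returns a pass/fail status and a reason.

```javascript
// Hypothetical result records: the field names `criterion`, `passed`,
// and `reason` are assumptions, not the evaluator's documented format.
const results = [
  { criterion: "confirm appointment", passed: true, reason: "Agent restated the date and time." },
  { criterion: "verify caller name", passed: false, reason: "Agent never asked for the name." },
  { criterion: "verify caller name", passed: false, reason: "Agent assumed the name from context." },
];

// Count failures per criterion and keep the reasons for later review.
function summarizeFailures(records) {
  const summary = new Map();
  for (const { criterion, passed, reason } of records) {
    if (passed) continue; // only failures are of interest here
    const entry = summary.get(criterion) ?? { count: 0, reasons: [] };
    entry.count += 1;
    entry.reasons.push(reason);
    summary.set(criterion, entry);
  }
  return summary;
}

const failureSummary = summarizeFailures(results);
for (const [criterion, { count }] of failureSummary) {
  console.log(`${criterion}: ${count} failure(s)`);
}
```

A criterion that fails repeatedly for different reasons may be worded too strictly or too vaguely, which is a cue to refine it.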
Webhook Evaluators
What is a Webhook Evaluator?
A webhook evaluator allows you to implement custom evaluation logic by hosting your own evaluation endpoint. This gives you complete control over the evaluation process and allows for complex, domain-specific evaluation criteria.
Webhook Payload Structure
The webhook will receive a JSON payload containing:
- Input details about the scenario, persona, and data fields
- Output containing the conversation transcript
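As a rough illustration of the payload described above, the sketch below checks that an incoming body has an input section and an output section before it is evaluated. The property names (`input.scenario`, `input.persona`, `input.dataFields`, `output.transcript`) and the helper `validatePayload` are assumptions; this page names the contents but not the exact keys, so inspect a real payload before relying on them.

```javascript
// Hypothetical payload: key names and values are assumptions for illustration only.
const examplePayload = {
  input: {
    scenario: "Basic Appointment Confirmation",
    persona: "Caller",
    dataFields: {},
  },
  output: {
    transcript: "",
  },
};

// Return a list of problems; an empty list means the payload looks usable.
function validatePayload(payload) {
  if (payload === null || typeof payload !== "object") {
    return ["payload is not an object"];
  }
  const problems = [];
  if (payload.input === null || typeof payload.input !== "object") {
    problems.push("missing input");
  }
  if (payload.output === null || typeof payload.output !== "object") {
    problems.push("missing output");
  }
  return problems;
}
```

Returning a list of problems rather than throwing makes it easy to include every issue in a single error response.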
Implementing a Webhook Evaluator
You can host your webhook evaluator using services like Val Town, which provides a simple way to deploy and run JavaScript functions as webhooks. Example implementation using Val Town:
Using SDKs for Types
You can use our official SDKs to ensure correct types for the return value:
- JavaScript SDK - Use the Evaluation class in the @autoblocks/client/testing package
- Python SDK - Use the Evaluation dataclass in the autoblocks.testing.models package
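A webhook evaluator along the lines described above might look like the following sketch. It is not the official example: the payload key `output.transcript`, the phrase list, and the returned fields (`score`, `threshold`, `metadata`) are assumptions that should be checked against the SDK's Evaluation type. The handler relies only on the standard `Request` and `Response` objects; on Val Town, a function like `handler` would typically be the default export.

```javascript
// Pure evaluation logic: passes when the transcript mentions every required phrase.
// `requiredPhrases` and the payload key `output.transcript` are hypothetical.
function evaluateTranscript(payload, requiredPhrases) {
  const raw = payload?.output?.transcript;
  // Accept either a plain string or a structured transcript (serialized to text).
  const text = typeof raw === "string" ? raw : JSON.stringify(raw ?? "");
  const transcript = text.toLowerCase();
  const missing = requiredPhrases.filter(
    (phrase) => !transcript.includes(phrase.toLowerCase())
  );
  // Assumed return shape; align it with the SDK's Evaluation type.
  return {
    score: missing.length === 0 ? 1 : 0,
    threshold: { gte: 1 },
    metadata: {
      reason:
        missing.length === 0
          ? "All required phrases found."
          : `Missing: ${missing.join(", ")}`,
    },
  };
}

// HTTP handler built on the standard Request/Response objects.
async function handler(req) {
  const headers = { "content-type": "application/json" };
  try {
    const payload = await req.json();
    const evaluation = evaluateTranscript(payload, ["appointment"]);
    return new Response(JSON.stringify(evaluation), { headers });
  } catch (err) {
    // Malformed JSON or unexpected input: report it instead of crashing.
    return new Response(JSON.stringify({ error: String(err) }), {
      status: 400,
      headers,
    });
  }
}
```

Keeping the evaluation logic in a pure function separate from the HTTP handler makes it straightforward to unit test without sending real requests.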
Best Practices
- Error Handling
  - Implement proper error handling
  - Return meaningful error messages
  - Log evaluation failures
- Performance
  - Keep evaluation logic efficient
  - Handle timeouts appropriately
  - Cache expensive computations
- Testing
  - Test with various scenarios
  - Verify edge cases
  - Monitor evaluation consistency
- Security
  - Secure your endpoint with authentication headers
  - Validate incoming requests
  - Use environment variables for sensitive data
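The security points above can be sketched as a small authorization check. The `Authorization: Bearer ...` scheme, the environment variable name `WEBHOOK_SECRET`, and the helper names are assumptions for illustration; use whatever header your webhook configuration actually sends.

```javascript
// Compare two strings without exiting early on the first mismatch,
// so response time does not reveal how many characters matched.
function safeEqual(a, b) {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// `headers` is a standard Headers object. The header name and the
// "Bearer" scheme are assumptions, not a documented requirement.
function isAuthorized(headers, secret) {
  if (!secret) return false; // refuse everything if no secret is configured
  const header = headers.get("authorization") ?? "";
  return safeEqual(header, `Bearer ${secret}`);
}

// Read the secret from the environment instead of hard-coding it.
// WEBHOOK_SECRET is a hypothetical name; this form works in Node-style runtimes.
const configuredSecret =
  typeof process !== "undefined" ? process.env.WEBHOOK_SECRET : undefined;
```

A handler would call `isAuthorized(req.headers, configuredSecret)` first and return a 401 response when it is false, before parsing or evaluating the request body.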