Analyze Results - Autoblocks Documentation

Understanding Results Structure

Executions vs Test Runs

Executions: Individual simulation runs, each with its own results
Test Runs: Groups of executions bundled together so they can be compared over time. Use test runs to:
- Compare performance across different scenarios
- Track improvements between iterations
- Analyze patterns across multiple runs
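Here is a minimal Python sketch of the distinction. It assumes each execution records only an ID and a pass/fail flag. The `Execution` and `TestRun` classes and the `compare_runs` helper are hypothetical illustrations, not part of any Autoblocks SDK. They show how grouping executions into test runs lets you compare pass rates between iterations.

```python
from dataclasses import dataclass, field


@dataclass
class Execution:
    # Hypothetical record for one simulation run; real executions carry more fields.
    execution_id: str
    passed: bool


@dataclass
class TestRun:
    # Hypothetical grouping of executions, e.g. one iteration of an agent.
    name: str
    executions: list[Execution] = field(default_factory=list)

    def pass_rate(self) -> float:
        # Fraction of executions that passed; 0.0 for an empty run.
        if not self.executions:
            return 0.0
        return sum(e.passed for e in self.executions) / len(self.executions)


def compare_runs(baseline: TestRun, candidate: TestRun) -> float:
    # Positive means the candidate improved on the baseline; negative means a regression.
    return candidate.pass_rate() - baseline.pass_rate()


v1 = TestRun("v1", [Execution("a", True), Execution("b", False),
                    Execution("c", True), Execution("d", False)])
v2 = TestRun("v2", [Execution("a", True), Execution("b", True),
                    Execution("c", True), Execution("d", False)])
print(compare_runs(v1, v2))  # 0.25
```

Keeping the pass rate on the test run, rather than on individual executions, mirrors the idea that comparisons happen between groups of runs.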

Reviewing Executions

Execution Details

Each execution provides detailed information about:

Timestamp of the run
Duration of the conversation
Tokens used
Input/Output pairs
Pass/Fail status
Evaluation results

Transcript Review

Review conversations in detail with:

Complete conversation transcript
Audio playback of the interaction
Turn-by-turn message analysis

Performance Metrics

Track important metrics including:

Response times
Token usage
Success rates
Evaluation results
Overall pass rates
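If you export execution data for offline analysis, these metrics reduce to simple aggregations. The sketch below assumes each execution is a dictionary with hypothetical `duration_ms`, `tokens`, and `passed` keys. The actual export shape may differ.

```python
from statistics import mean

# Hypothetical exported rows; the key names are assumptions for illustration.
executions = [
    {"duration_ms": 1200, "tokens": 350, "passed": True},
    {"duration_ms": 800, "tokens": 410, "passed": False},
    {"duration_ms": 1000, "tokens": 240, "passed": True},
]


def summarize(rows: list[dict]) -> dict:
    # Aggregate response time, token usage, and pass rate across executions.
    return {
        "avg_duration_ms": mean(r["duration_ms"] for r in rows),
        "total_tokens": sum(r["tokens"] for r in rows),
        "pass_rate": sum(r["passed"] for r in rows) / len(rows),
    }


# Average duration 1000 ms, 1000 tokens in total, 2 of 3 executions passed.
print(summarize(executions))
```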

Advanced Search

Search Syntax

Use the following search operators to find specific executions:

String Search:

field:value - Contains search (e.g. source:aws, input:hello)
field!:value - Not contains search (e.g. source!:aws)
field=value - Exact match (e.g. source=aws)
field!=value - Not equals (e.g. source!=aws)
field is:empty - Matches executions where the field is empty (e.g. output is:empty)
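The following small Python matcher evaluates a single term and shows how the string operators differ. It is a sketch of the semantics described above, not Autoblocks' implementation. It assumes that comparisons are case-sensitive and that a missing field counts as empty. `match_string_term` is a hypothetical helper.

```python
import re

# A field name, then one of the string operators, then the value.
STRING_TERM = re.compile(r"^(\w+)(!:|!=|:|=)(.*)$")
# The special "field is:empty" form.
EMPTY_TERM = re.compile(r"^(\w+) is:empty$")


def match_string_term(term: str, record: dict) -> bool:
    # Assumption: a missing field is treated as an empty string.
    empty = EMPTY_TERM.match(term)
    if empty:
        return not record.get(empty.group(1))
    m = STRING_TERM.match(term)
    if m is None:
        raise ValueError(f"unrecognized term: {term!r}")
    field_name, op, value = m.groups()
    actual = str(record.get(field_name, ""))
    if op == ":":  # contains
        return value in actual
    if op == "!:":  # does not contain
        return value not in actual
    if op == "=":  # exact match
        return actual == value
    return actual != value  # "!=": not equal


record = {"source": "aws-lambda", "environment": "prod", "output": ""}
print(match_string_term("source:aws", record))       # True
print(match_string_term("source=aws", record))       # False
print(match_string_term("output is:empty", record))  # True
```

The difference between `:` and `=` matters here. `source:aws` matches `aws-lambda` because it contains `aws`, but `source=aws` does not, because the whole value must match.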

Numeric Search:

duration>100 - Greater than
duration<500 - Less than
duration>=100 - Greater than or equal
duration<=500 - Less than or equal
duration=100 - Exact match
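The numeric operators map directly onto ordinary comparisons. This Python sketch evaluates one numeric term against a record. It assumes the field value is a number (milliseconds, for `duration`). `match_numeric_term` is a hypothetical helper, not an Autoblocks API.

```python
import operator
import re

# A field name, a comparison operator, then an integer or decimal number.
NUMERIC_TERM = re.compile(r"^(\w+)(>=|<=|>|<|=)(-?\d+(?:\.\d+)?)$")
COMPARISONS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "=": operator.eq,
}


def match_numeric_term(term: str, record: dict) -> bool:
    m = NUMERIC_TERM.match(term)
    if m is None:
        raise ValueError(f"unrecognized numeric term: {term!r}")
    field_name, op, value = m.groups()
    return COMPARISONS[op](float(record[field_name]), float(value))


record = {"duration": 250}  # milliseconds
print(match_numeric_term("duration>100", record))   # True
print(match_numeric_term("duration>=300", record))  # False
```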

Free Text Search:

Simple text search (e.g. hello) - Searches across all text fields
Use AND to combine terms (e.g. hello AND world)
Use OR for alternatives (e.g. hello OR world)
Use parentheses for grouping (e.g. (hello OR world) AND test)
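These rules can be read as a small boolean grammar. The following recursive-descent sketch makes two assumptions: that `AND` binds more tightly than `OR` (the usual convention, though the rules above do not state the precedence), and that matching is a case-insensitive substring check over all string fields. Adding parentheses avoids depending on precedence either way. `free_text_matches` is a hypothetical helper.

```python
import re

# Parentheses are their own tokens; everything else splits on whitespace.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def free_text_matches(query: str, record: dict) -> bool:
    tokens = TOKEN.findall(query)
    # Assumption: case-insensitive substring search over every string field.
    haystack = " ".join(v for v in record.values() if isinstance(v, str)).lower()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        # expr := and_expr ("OR" and_expr)*
        result = parse_and()
        while peek() == "OR":
            advance()
            rhs = parse_and()  # always consume the right side, even if result is True
            result = result or rhs
        return result

    def parse_and():
        # and_expr := atom ("AND" atom)*
        result = parse_atom()
        while peek() == "AND":
            advance()
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom():
        # atom := "(" expr ")" | word
        tok = advance()
        if tok == "(":
            value = parse_or()
            if advance() != ")":
                raise ValueError("expected ')'")
            return value
        return tok.lower() in haystack

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return result


record = {"input": "Hello there", "output": "Test passed"}
print(free_text_matches("(hello OR world) AND test", record))  # True
print(free_text_matches("hello AND world", record))            # False
```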

Combining Searches:

Mix and match different operators (e.g. source:aws AND environment=prod)
Use parentheses for complex queries (e.g. (duration>100 OR duration<=50) AND environment=prod)
Combine free text with specific field searches (e.g. hello AND source:aws)

Quoted Strings:

Use quotes for multi-word values (e.g. source:"aws lambda")
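Quoting changes only where one term ends and the next begins. The following is a minimal Python tokenizer that follows that rule. It assumes double quotes only, no escaped quotes inside a value, and no unterminated quotes. `tokenize` is a hypothetical helper.

```python
import re

# A token is a parenthesis, or a run of non-space characters in which
# double-quoted sections may contain spaces.
TOKEN = re.compile(r'\(|\)|(?:[^\s()"]+|"[^"]*")+')


def tokenize(query: str) -> list[str]:
    # Once token boundaries are known, drop the quote characters themselves.
    return [tok.replace('"', "") for tok in TOKEN.findall(query)]


print(tokenize('source:"aws lambda" AND environment=prod'))
# ['source:aws lambda', 'AND', 'environment=prod']
```

Without the quotes, `source:aws lambda` would split into two terms: `source:aws` and a free-text search for `lambda`.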

Available Fields:

source - Source of the execution
environment - Environment name
input - Input text
output - Output text
message - Run message
duration - Duration in milliseconds
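The field list can also serve as a schema for checking queries before you run them. In this sketch, the field types come from the list above. The rule that numeric comparisons apply only to `duration` and string operators to the other fields is an assumption. `check_term` is a hypothetical helper.

```python
# Field types follow the "Available Fields" list; duration is in milliseconds.
FIELD_TYPES = {
    "source": "string",
    "environment": "string",
    "input": "string",
    "output": "string",
    "message": "string",
    "duration": "number",
}
# Assumption: string operators apply to text fields, comparisons to numeric ones.
STRING_OPS = {":", "!:", "=", "!="}
NUMERIC_OPS = {">", "<", ">=", "<=", "="}


def check_term(field_name: str, op: str) -> None:
    # Raise ValueError for unknown fields or operators that do not fit the field type.
    kind = FIELD_TYPES.get(field_name)
    if kind is None:
        raise ValueError(f"unknown field: {field_name!r}")
    allowed = NUMERIC_OPS if kind == "number" else STRING_OPS
    if op not in allowed:
        raise ValueError(f"operator {op!r} is not valid for {kind} field {field_name!r}")


check_term("duration", ">=")  # OK: numeric comparison on a numeric field
check_term("source", "!:")    # OK: "not contains" on a text field
```

Because `duration` is measured in milliseconds, you would use `duration>2000` to find executions that took longer than two seconds.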

Visualization Tools

Timeline View

Visual representation of execution timing
Identify patterns in response times
Spot anomalies or performance issues
Track conversation flow
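The timeline view lets you spot slow executions by eye. To run the same check in a script, one simple heuristic is to flag any execution that is slower than a multiple of the median duration. The 2x factor and the `flag_slow` helper are illustrative choices, not an Autoblocks feature.

```python
from statistics import median


def flag_slow(durations_ms: list[float], factor: float = 2.0) -> list[int]:
    # Indexes of executions slower than `factor` times the median duration.
    # The median is used because it is less affected than the mean by the
    # very outliers being searched for.
    typical = median(durations_ms)
    return [i for i, d in enumerate(durations_ms) if d > factor * typical]


print(flag_slow([900, 1100, 1000, 4200, 950]))  # [3]
```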

Performance Graphs

Success rate trends
Duration distribution
Token usage patterns
Data capture accuracy over time
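A duration distribution is essentially a histogram. This sketch groups durations (in milliseconds) into fixed-width bins. The 500 ms bin width is an arbitrary assumption that you would tune to your data, and `duration_histogram` is a hypothetical helper.

```python
from collections import Counter


def duration_histogram(durations_ms: list[float], bucket_ms: int = 500) -> dict[int, int]:
    # Map each duration to the start of its fixed-width bucket, then count per bucket.
    counts = Counter(int(d // bucket_ms) * bucket_ms for d in durations_ms)
    return dict(sorted(counts.items()))


print(duration_histogram([120, 480, 510, 990, 1500]))  # {0: 2, 500: 2, 1500: 1}
```

Empty buckets (1000 ms in this example) are omitted, so a plot of the result should fill them in with zero.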

Comparison Tools

Compare executions across:

Different personas
Time periods
Edge cases
Data field variations
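Each of these comparisons amounts to grouping executions by an attribute and then comparing a metric across the groups. The sketch below computes the pass rate for each group. The `persona` key and the persona names are hypothetical. The same helper works for any key, such as a date string for time periods or an edge-case label.

```python
from collections import defaultdict


def pass_rate_by(executions: list[dict], key: str) -> dict:
    # Group executions by the value of `key` and compute each group's pass rate.
    totals = defaultdict(lambda: [0, 0])  # group -> [passed, total]
    for ex in executions:
        group = totals[ex[key]]
        group[0] += ex["passed"]
        group[1] += 1
    return {name: passed / total for name, (passed, total) in totals.items()}


# Hypothetical executions; the persona names are made up for illustration.
executions = [
    {"persona": "impatient caller", "passed": False},
    {"persona": "impatient caller", "passed": True},
    {"persona": "detailed caller", "passed": True},
    {"persona": "detailed caller", "passed": True},
]
print(pass_rate_by(executions, "persona"))
# {'impatient caller': 0.5, 'detailed caller': 1.0}
```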

Best Practices

Analysis Workflow

Review Overall Metrics
- Check success rates
- Analyze duration patterns
- Review token usage
Deep Dive into Failures
- Examine failed executions
- Review error patterns
- Identify common issues
Compare Across Runs
- Track improvements
- Identify regressions
- Analyze pattern changes
Document Findings
- Note successful strategies
- Document areas for improvement
- Track action items
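For the "Deep Dive into Failures" step, counting which evaluators fail most often is a quick way to surface common issues. This sketch assumes that each exported execution lists the evaluators it failed under a hypothetical `failed_evaluators` key. The evaluator names are invented for the example.

```python
from collections import Counter


def common_failures(executions: list[dict], top: int = 3) -> list[tuple[str, int]]:
    # Count evaluator failures across failed executions, most frequent first.
    counts = Counter()
    for ex in executions:
        if not ex["passed"]:
            counts.update(ex["failed_evaluators"])
    return counts.most_common(top)


# Hypothetical data; evaluator names are invented for the example.
executions = [
    {"passed": False, "failed_evaluators": ["captures_email", "polite_tone"]},
    {"passed": False, "failed_evaluators": ["captures_email"]},
    {"passed": True, "failed_evaluators": []},
]
print(common_failures(executions))  # [('captures_email', 2), ('polite_tone', 1)]
```

Running this on each test run and comparing the counts is one way to check whether a fix actually reduced a recurring failure.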

Tips for Effective Analysis

Start with high-level metrics
Use search to find specific patterns
Compare similar scenarios
Track improvements over time
Document unusual cases
Share insights with team

