Understanding Results Structure
Executions vs Test Runs
- Executions: Individual simulation runs with their specific results
- Test Runs: Groups of executions bundled together so you can compare results over time. Test runs let you:
  - Compare performance across different scenarios
  - Track improvements between iterations
  - Analyze patterns across multiple runs
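To make "track improvements between iterations" concrete, the sketch below compares pass rates between two test runs. It is a hypothetical local model, not the platform's data format: `TestRun`, `pass_rate`, and `compare_runs` are illustrative names, and each run is reduced to a list of pass/fail outcomes.

```python
from dataclasses import dataclass


@dataclass
class TestRun:
    """Hypothetical test run: a name plus the pass/fail outcome of each execution."""
    name: str
    outcomes: list[bool]

    def pass_rate(self) -> float:
        # Fraction of executions that passed; 0.0 for an empty run.
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0


def compare_runs(baseline: TestRun, candidate: TestRun) -> float:
    """Return the change in pass rate (candidate minus baseline)."""
    return candidate.pass_rate() - baseline.pass_rate()


v1 = TestRun("iteration-1", [True, False, True, False])
v2 = TestRun("iteration-2", [True, True, True, False])
print(f"{compare_runs(v1, v2):+.0%}")  # +25%
```

A positive delta suggests an improvement and a negative one a possible regression; with small runs, a single execution can swing the rate noticeably.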
Reviewing Executions
Execution Details
Each execution provides detailed information about:
- Timestamp of the run
- Duration of the conversation
- Tokens used
- Input/Output pairs
- Pass/Fail status
- Evaluation results
Transcript Review
Review conversations in detail with:
- Complete conversation transcript
- Audio playback of the interaction
- Turn-by-turn message analysis
Track important metrics including:
- Response times
- Token usage
- Success rates
- Evaluation results
- Overall pass rates
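If you want to recompute some of these metrics yourself, a minimal sketch might look like the following. The `Execution` record and its field names (`response_times_ms`, `tokens`, `passed`) are assumptions made for illustration, not the platform's schema.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Execution:
    """Hypothetical execution record; field names are illustrative only."""
    response_times_ms: list[float]  # one entry per agent turn
    tokens: int
    passed: bool


def summarize(executions: list[Execution]) -> dict:
    """Aggregate response time, token usage, and pass rate across executions."""
    if not executions:
        raise ValueError("no executions to summarize")
    all_times = [t for e in executions for t in e.response_times_ms]
    return {
        "avg_response_ms": mean(all_times),
        "total_tokens": sum(e.tokens for e in executions),
        "pass_rate": sum(e.passed for e in executions) / len(executions),
    }


runs = [
    Execution([800, 1200], tokens=1500, passed=True),
    Execution([950, 1050], tokens=1700, passed=False),
]
summary = summarize(runs)
print(summary)
```

Averaging per turn rather than per execution keeps long conversations from being weighted the same as short ones; choose whichever matches the question you are asking.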
Advanced Search
Search Syntax
Use search operators to find specific executions:
String Search:
field:value - Contains search (e.g. source:aws, input:hello)
field!:value - Not contains search (e.g. source!:aws)
field=value - Exact match (e.g. source=aws)
field!=value - Not equals (e.g. source!=aws)
field is:empty - Check for empty values
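Searches run inside the platform, so the sketch below is only a hypothetical local re-implementation that spells out how the five string operators above differ. It assumes case-sensitive matching and treats a missing field as empty; the real search may behave differently on both points. `match_string` is an illustrative name.

```python
def match_string(record: dict, field: str, op: str, value: str = "") -> bool:
    """Evaluate one string condition against a record (local sketch).

    op is one of ':', '!:', '=', '!=', 'is:empty'.
    Matching is case-sensitive here, which is an assumption.
    """
    actual = record.get(field) or ""  # missing or None is treated as empty (assumption)
    if op == ":":         # contains
        return value in actual
    if op == "!:":        # does not contain
        return value not in actual
    if op == "=":         # exact match
        return actual == value
    if op == "!=":        # not equal
        return actual != value
    if op == "is:empty":  # empty value
        return actual == ""
    raise ValueError(f"unknown operator: {op}")


execution = {"source": "aws lambda", "environment": "prod", "input": ""}
print(match_string(execution, "source", ":", "aws"))  # True
print(match_string(execution, "source", "=", "aws"))  # False
print(match_string(execution, "input", "is:empty"))   # True
```

The contrast between `source:aws` (matches "aws lambda") and `source=aws` (does not) is the main thing to remember when a search returns more or fewer results than expected.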
Numeric Search:
duration>100 - Greater than
duration<500 - Less than
duration>=100 - Greater than or equal
duration<=500 - Less than or equal
duration=100 - Exact match
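As a hypothetical illustration of the numeric operators, the sketch below parses a term such as `duration>=100` into a field, an operator, and a number, then compares it against a record. `match_numeric`, `OPS`, and `TERM` are illustrative names; durations are in milliseconds, as listed under Available Fields.

```python
import operator
import re

# Map each numeric operator from the table above to a comparison function.
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt, "=": operator.eq}
# Capture field name, operator, and an integer or decimal number.
TERM = re.compile(r"^(\w+)(>=|<=|>|<|=)(\d+(?:\.\d+)?)$")


def match_numeric(term: str, record: dict) -> bool:
    """Evaluate a numeric term like 'duration>100' against a record (local sketch)."""
    m = TERM.match(term)
    if not m:
        raise ValueError(f"not a numeric term: {term!r}")
    field, op, number = m.groups()
    return OPS[op](record[field], float(number))


execution = {"duration": 250}
print([match_numeric(t, execution) for t in ["duration>100", "duration<500", "duration=100"]])
# [True, True, False]
```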
Free Text Search:
- Simple text search (e.g. hello): searches across all text fields
- Use AND to combine terms (e.g. hello AND world)
- Use OR for alternatives (e.g. hello OR world)
- Use parentheses for grouping (e.g. (hello OR world) AND test)
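To show how AND, OR, and parentheses combine, here is a small recursive-descent evaluator for free-text queries. It is a sketch under stated assumptions, not the platform's parser: it assumes AND binds more tightly than OR (the usual convention), that terms match case-insensitively as substrings, and that adjacent words need an explicit AND or OR. `evaluate` and `tokenize` are illustrative names.

```python
import re


def tokenize(query: str) -> list[str]:
    # Split into parentheses and whitespace-separated words.
    return re.findall(r"\(|\)|[^\s()]+", query)


def evaluate(query: str, text: str) -> bool:
    """Evaluate a free-text query against concatenated text fields.

    Assumes AND binds tighter than OR and terms match case-insensitively.
    """
    tokens = tokenize(query)
    pos = 0
    haystack = text.lower()

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        result = parse_and()
        while peek() == "OR":
            advance()
            rhs = parse_and()  # always parse, so tokens are consumed
            result = result or rhs
        return result

    def parse_and():
        result = parse_term()
        while peek() == "AND":
            advance()
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term():
        token = advance()
        if token == "(":
            result = parse_or()
            if advance() != ")":
                raise ValueError("missing closing parenthesis")
            return result
        return token.lower() in haystack

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()}")
    return result


text = "hello, this is a test call"
print(evaluate("(hello OR world) AND test", text))  # True
print(evaluate("hello AND world", text))            # False
```

If the platform gives AND and OR equal precedence instead, `a OR b AND c` would evaluate differently; adding parentheses avoids relying on either rule.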
Combining Searches:
- Mix and match different operators (e.g. source:aws AND environment=prod)
- Use parentheses for complex queries (e.g. (duration>100 OR duration<=50) AND environment=prod)
- Combine free text with specific field searches (e.g. hello AND source:aws)
Quoted Strings:
- Use quotes for multi-word values (e.g. source:"aws lambda")
Available Fields:
source - Source of the execution
environment - Environment name
input - Input text
output - Output text
message - Run message
duration - Duration in milliseconds
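When queries are assembled programmatically, a small builder can check field names against the list above and apply the quoting rule for multi-word values. This is a hypothetical helper, not part of any platform API; `field_term` and `all_of` are illustrative names, and because escaping of embedded double quotes is not documented, the sketch refuses such values rather than guessing.

```python
STRING_FIELDS = {"source", "environment", "input", "output", "message"}
NUMERIC_FIELDS = {"duration"}  # milliseconds
STRING_OPS = {":", "!:", "=", "!="}
NUMERIC_OPS = {">", "<", ">=", "<=", "="}


def field_term(field: str, op: str, value) -> str:
    """Build one search term, quoting string values that contain spaces."""
    if field in NUMERIC_FIELDS:
        if op not in NUMERIC_OPS:
            raise ValueError(f"{op!r} is not a numeric operator")
        return f"{field}{op}{value}"
    if field not in STRING_FIELDS:
        raise ValueError(f"unknown field: {field}")
    if op not in STRING_OPS:
        raise ValueError(f"{op!r} is not a string operator")
    text = str(value)
    if '"' in text:
        # Escaping rules for embedded quotes are not documented; refuse rather than guess.
        raise ValueError("values containing double quotes are not supported here")
    if " " in text:
        text = f'"{text}"'  # multi-word values must be quoted
    return f"{field}{op}{text}"


def all_of(*terms: str) -> str:
    """Combine terms with AND."""
    return " AND ".join(terms)


query = all_of(field_term("source", ":", "aws lambda"), field_term("duration", ">=", 100))
print(query)  # source:"aws lambda" AND duration>=100
```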
Timeline View
- Visual representation of execution timing
- Identify patterns in response times
- Spot anomalies or performance issues
- Track conversation flow
- Success rate trends
- Duration distribution
- Token usage patterns
- Data capture accuracy over time
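The duration distribution and anomaly spotting described above can be approximated offline with the standard library. In the sketch below, `duration_summary` is an illustrative name and `spike_factor` is an arbitrary, hypothetical threshold (flag anything slower than twice the median); it needs at least two durations because `statistics.quantiles` requires them.

```python
from statistics import median, quantiles


def duration_summary(durations_ms: list[float], spike_factor: float = 2.0) -> dict:
    """Summarize a duration distribution and flag unusually slow executions.

    spike_factor is a hypothetical threshold: anything slower than
    spike_factor x the median is flagged for review.
    """
    mid = median(durations_ms)
    p95 = quantiles(durations_ms, n=20)[-1]  # last of 19 cut points = 95th percentile
    spikes = [d for d in durations_ms if d > spike_factor * mid]
    return {"median_ms": mid, "p95_ms": p95, "spikes_ms": spikes}


stats = duration_summary([900, 1100, 950, 1000, 4200, 1050, 980, 1020])
print(stats)
```

A median-based threshold is less sensitive to the outliers it is trying to find than a mean-based one, which is why it is used here.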
Compare executions across:
- Different personas
- Time periods
- Edge cases
- Data field variations
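Comparing executions across personas or other dimensions usually comes down to grouping and computing a rate per group. In this sketch, `pass_rate_by` is an illustrative name, and the `persona` and `passed` keys are assumed fields on exported execution records, not a documented format.

```python
from collections import defaultdict


def pass_rate_by(executions: list[dict], key: str) -> dict[str, float]:
    """Group executions by a field (e.g. a hypothetical 'persona') and compute each group's pass rate."""
    totals = defaultdict(lambda: [0, 0])  # group -> [passed, total]
    for e in executions:
        bucket = totals[e[key]]
        bucket[0] += e["passed"]
        bucket[1] += 1
    return {group: passed / total for group, (passed, total) in totals.items()}


executions = [
    {"persona": "impatient caller", "passed": True},
    {"persona": "impatient caller", "passed": False},
    {"persona": "non-native speaker", "passed": True},
]
print(pass_rate_by(executions, "persona"))
# {'impatient caller': 0.5, 'non-native speaker': 1.0}
```

The same function works for any other grouping key, such as a time period label or an edge-case tag.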
Best Practices
Analysis Workflow
- Review Overall Metrics
  - Check success rates
  - Analyze duration patterns
  - Review token usage
- Deep Dive into Failures
  - Examine failed executions
  - Review error patterns
  - Identify common issues
- Compare Across Runs
  - Track improvements
  - Identify regressions
  - Analyze pattern changes
- Document Findings
  - Note successful strategies
  - Document areas for improvement
  - Track action items
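For the "Compare Across Runs" step, a regression check is easy to script if you have each run's outcomes keyed by scenario. The sketch assumes exactly that: two mappings from a hypothetical scenario identifier to a pass/fail result. `find_regressions` is an illustrative name.

```python
def find_regressions(previous: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Return scenarios that passed in the previous run but fail in the current one.

    Both arguments map a hypothetical scenario identifier to its pass/fail outcome.
    Scenarios missing from the previous run are not counted as regressions.
    """
    return sorted(
        scenario
        for scenario, passed in current.items()
        if not passed and previous.get(scenario) is True
    )


prev = {"refund request": True, "address change": True, "billing dispute": False}
curr = {"refund request": False, "address change": True, "billing dispute": False}
print(find_regressions(prev, curr))  # ['refund request']
```

Scenarios that failed in both runs ("billing dispute" here) are persistent failures rather than regressions and belong in the "Deep Dive into Failures" step instead.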
Tips for Effective Analysis
- Start with high-level metrics
- Use search to find specific patterns
- Compare similar scenarios
- Track improvements over time
- Document unusual cases
- Share insights with your team