Understanding Results Structure
Executions vs Test Runs
- Executions: Individual simulation runs with their specific results
- Test Runs: Groups of executions bundled together for comparison over time
Grouping executions into test runs lets you:
- Compare performance across different scenarios
- Track improvements between iterations
- Analyze patterns across multiple runs
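The relationship between executions and test runs is a simple grouping. The minimal Python sketch below illustrates it, assuming hypothetical `Execution` records that carry a `test_run` label and a boolean `passed` flag. These names are illustrative only and are not the product's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Execution:
    # Hypothetical fields for illustration; not the product's real schema.
    execution_id: str
    test_run: str
    passed: bool


def pass_rate_by_run(executions: list[Execution]) -> dict[str, float]:
    """Group executions by test run and return each run's pass rate (0.0 to 1.0)."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for ex in executions:
        totals[ex.test_run] += 1
        if ex.passed:
            passes[ex.test_run] += 1
    return {run: passes[run] / totals[run] for run in totals}


executions = [
    Execution("e1", "iteration-1", True),
    Execution("e2", "iteration-1", False),
    Execution("e3", "iteration-2", True),
    Execution("e4", "iteration-2", True),
]
rates = pass_rate_by_run(executions)
print(rates)  # {'iteration-1': 0.5, 'iteration-2': 1.0}
```

Comparing the per-run rates side by side is the simplest way to see whether a later iteration improved on an earlier one.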
Reviewing Executions
Execution Details
Each execution provides detailed information about:
- Timestamp of the run
- Duration of the conversation
- Tokens used
- Input/Output pairs
- Pass/Fail status
- Evaluation results
Transcript Review
Review conversations in detail with:
- Complete conversation transcript
- Audio playback of the interaction
- Turn-by-turn message analysis
Performance Metrics
Track important metrics including:
- Response times
- Token usage
- Success rates
- Evaluation results
- Overall pass rates
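The metrics above are aggregates over executions. As a rough sketch, assuming you have exported each execution's response time, token count, and pass/fail status (the `ExecutionMetrics` record is hypothetical), they could be computed as follows. The p95 uses the nearest-rank method, which is one choice among several; the dashboard may compute percentiles differently.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class ExecutionMetrics:
    # Hypothetical per-execution fields; an actual export may name these differently.
    response_time_ms: float
    tokens: int
    passed: bool


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(executions: list[ExecutionMetrics]) -> dict[str, float]:
    """Aggregate pass rate, response times, and token usage over a set of executions."""
    times = [ex.response_time_ms for ex in executions]
    return {
        "pass_rate": sum(ex.passed for ex in executions) / len(executions),
        "mean_response_ms": statistics.mean(times),
        "p95_response_ms": percentile(times, 95),
        "total_tokens": sum(ex.tokens for ex in executions),
    }


summary = summarize([
    ExecutionMetrics(120.0, 300, True),
    ExecutionMetrics(250.0, 410, True),
    ExecutionMetrics(900.0, 520, False),
    ExecutionMetrics(180.0, 350, True),
])
print(summary)
```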
Advanced Search
Search Syntax
Use powerful search operators to find specific executions.
String Search:
- field:value - Contains search (e.g. source:aws, input:hello)
- field!:value - Not contains search (e.g. source!:aws)
- field=value - Exact match (e.g. source=aws)
- field!=value - Not equals (e.g. source!=aws)
- field is:empty - Check for empty values
Numeric Search:
- duration>100 - Greater than
- duration<500 - Less than
- duration>=100 - Greater than or equal
- duration<=500 - Less than or equal
- duration=100 - Exact match
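To make the operator semantics concrete, here is a minimal Python sketch of how a single search term could be evaluated against one execution, represented as a plain dict keyed by field names such as `source` and `duration`. It is an assumption-based illustration, not the product's search implementation: string operators are assumed to be case-insensitive, `duration` is assumed to be stored as a number, and `match_term` is a hypothetical helper.

```python
import operator
import re

# Numeric comparisons, keyed by operator text.
NUMERIC_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
               "<": operator.lt, "=": operator.eq}
# Two-character operators are listed first so ">=" is not read as ">".
TERM_RE = re.compile(r"^(\w+)(!:|!=|>=|<=|:|=|>|<)(.*)$")


def match_term(execution: dict, term: str) -> bool:
    """Evaluate one term such as 'source:aws', 'source!=aws' or 'duration>=100'."""
    if term.endswith(" is:empty"):
        field = term[: -len(" is:empty")].strip()
        return execution.get(field) in (None, "")
    m = TERM_RE.match(term)
    if m is None:
        raise ValueError(f"unsupported term: {term!r}")
    field, op, raw = m.groups()
    value = execution.get(field)
    if isinstance(value, (int, float)) and op in NUMERIC_OPS:
        return NUMERIC_OPS[op](value, float(raw))
    text = "" if value is None else str(value).lower()
    needle = raw.strip('"').lower()  # quotes allow multi-word values
    if op == ":":
        return needle in text
    if op == "!:":
        return needle not in text
    if op == "=":
        return text == needle
    if op == "!=":
        return text != needle
    raise ValueError(f"operator {op!r} requires a numeric field")


execution = {"source": "aws lambda", "environment": "prod", "duration": 250, "message": ""}
print(match_term(execution, "source:aws"), match_term(execution, "duration<=200"))  # True False
```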
Text Search:
- Simple text search (e.g. hello) searches across all text fields
- Use AND to combine terms (e.g. hello AND world)
- Use OR for alternatives (e.g. hello OR world)
- Use parentheses for grouping (e.g. (hello OR world) AND test)
Combined Search:
- Mix and match different operators (e.g. source:aws AND environment=prod)
- Use parentheses for complex queries (e.g. (duration>100 OR duration<=50) AND environment=prod)
- Combine free text with specific field searches (e.g. hello AND source:aws)
- Use quotes for multi-word values (e.g. source:"aws lambda")
Available Fields:
- source - Source of the execution
- environment - Environment name
- input - Input text
- output - Output text
- message - Run message
- duration - Duration in milliseconds
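The individual operators compose into full queries. The Python sketch below shows one way such a query language could be tokenized and evaluated with a small recursive-descent parser. It illustrates the idea under stated assumptions and is not how the product implements search: AND is assumed to bind more tightly than OR, matching is assumed case-insensitive, executions are plain dicts keyed by the field names listed above, and the negated operators and is:empty are omitted to keep the example short. `QueryParser`, `match_simple_term`, and `search` are hypothetical names.

```python
import operator
import re

# Parentheses, terms with a quoted part (source:"aws lambda"), or bare words.
TOKEN_RE = re.compile(r'\(|\)|[^\s()"]*"[^"]*"|[^\s()]+')
# Two-character operators are listed first so ">=" is not read as ">".
TERM_RE = re.compile(r"^(\w+)(>=|<=|>|<|:|=)(.+)$")
NUMERIC = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
           "<=": operator.le, ":": operator.eq, "=": operator.eq}
TEXT_FIELDS = ("source", "environment", "input", "output", "message")


def match_simple_term(ex: dict, term: str) -> bool:
    """Simplified matcher: numeric comparisons, field:value, field=value, or free text."""
    m = TERM_RE.match(term)
    if m is None:
        # Free text: look for the (unquoted) term in every text field.
        needle = term.strip('"').lower()
        return any(needle in str(ex.get(f) or "").lower() for f in TEXT_FIELDS)
    field, op, raw = m.groups()
    raw = raw.strip('"').lower()
    value = ex.get(field)
    if isinstance(value, (int, float)):
        return NUMERIC[op](value, float(raw))
    if op not in (":", "="):
        return False  # ordering comparisons only apply to numeric fields
    text = str(value or "").lower()
    return raw in text if op == ":" else text == raw


class QueryParser:
    """Recursive descent parser; AND is assumed to bind more tightly than OR."""

    def __init__(self, query: str):
        self.tokens = TOKEN_RE.findall(query)
        self.pos = 0

    def _peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _next(self):
        tok = self._peek()
        self.pos += 1
        return tok

    def parse(self):
        predicate = self._or()
        if self._peek() is not None:
            raise ValueError(f"unexpected token {self._peek()!r}")
        return predicate

    def _or(self):
        node = self._and()
        while self._peek() == "OR":
            self._next()
            left, right = node, self._and()
            node = lambda ex, lhs=left, rhs=right: lhs(ex) or rhs(ex)
        return node

    def _and(self):
        node = self._atom()
        while self._peek() == "AND":
            self._next()
            left, right = node, self._atom()
            node = lambda ex, lhs=left, rhs=right: lhs(ex) and rhs(ex)
        return node

    def _atom(self):
        tok = self._next()
        if tok is None or tok in (")", "AND", "OR"):
            raise ValueError(f"unexpected token {tok!r}")
        if tok == "(":
            node = self._or()
            if self._next() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        return lambda ex, t=tok: match_simple_term(ex, t)


def search(executions: list[dict], query: str) -> list[dict]:
    """Return the executions matching the query string."""
    predicate = QueryParser(query).parse()
    return [ex for ex in executions if predicate(ex)]


executions = [
    {"id": 1, "source": "aws lambda", "environment": "prod", "input": "hello world", "duration": 120},
    {"id": 2, "source": "gcp", "environment": "prod", "input": "hello", "duration": 40},
    {"id": 3, "source": "aws", "environment": "dev", "input": "test run", "duration": 300},
]
matches = search(executions, "(duration>100 OR duration<=50) AND environment=prod")
print([ex["id"] for ex in matches])  # [1, 2]
```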
Visualization Tools
Timeline View
- Visual representation of execution timing
- Identify patterns in response times
- Spot anomalies or performance issues
- Track conversation flow
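Spotting anomalies in response times usually comes down to an outlier rule. The sketch below uses Tukey's fences (flag values above Q3 + 1.5 * IQR), a common heuristic chosen here for illustration; it is not necessarily what the timeline view uses. It assumes a list of per-turn response times in milliseconds, and `flag_slow_turns` is a hypothetical helper.

```python
import statistics


def flag_slow_turns(response_times_ms: list[float], k: float = 1.5) -> list[int]:
    """Return indices of response times above Q3 + k * IQR (Tukey's upper fence)."""
    if len(response_times_ms) < 4:
        return []  # too few points for meaningful quartiles
    q1, _, q3 = statistics.quantiles(response_times_ms, n=4)
    upper_fence = q3 + k * (q3 - q1)
    return [i for i, t in enumerate(response_times_ms) if t > upper_fence]


print(flag_slow_turns([210, 190, 205, 220, 1800, 200, 215, 195]))  # [4]
```

Only the upper fence is checked because unusually slow responses are the performance issue of interest; unusually fast ones could be flagged the same way with Q1 - k * IQR.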
Performance Graphs
- Success rate trends
- Duration distribution
- Token usage patterns
- Data capture accuracy over time
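A duration distribution graph is essentially a histogram. This minimal sketch buckets execution durations (in milliseconds, as the duration field is documented) into fixed-width bins. The 500 ms bucket width and the `duration_histogram` and `render_histogram` helpers are arbitrary choices for illustration, and empty buckets are omitted.

```python
from collections import Counter


def duration_histogram(durations_ms: list[int], bucket_ms: int = 500) -> dict[int, int]:
    """Count durations per fixed-width bucket, keyed by each bucket's lower bound."""
    counts = Counter((d // bucket_ms) * bucket_ms for d in durations_ms)
    return dict(sorted(counts.items()))


def render_histogram(hist: dict[int, int], bucket_ms: int = 500) -> str:
    """Render one text bar per non-empty bucket, e.g. '  500-999   ###'."""
    return "\n".join(
        f"{start:>5}-{start + bucket_ms - 1:<5} {'#' * count}"
        for start, count in hist.items()
    )


hist = duration_histogram([120, 480, 510, 730, 990, 1020, 1499, 2600])
print(render_histogram(hist))
```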
Comparison Tools
Compare executions across:
- Different personas
- Time periods
- Edge cases
- Data field variations
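Comparing across personas or other dimensions amounts to grouping executions by an attribute and computing a pass rate per group. In this sketch, executions are assumed to be dicts with hypothetical `persona` and `passed` keys; any other attribute name can be passed as `key`, and executions missing it are grouped under "unknown".

```python
from collections import defaultdict


def pass_rate_by(executions: list[dict], key: str) -> dict[str, float]:
    """Pass rate for each distinct value of `key` (e.g. a persona name)."""
    groups: dict[str, list[bool]] = defaultdict(list)
    for ex in executions:
        groups[ex.get(key, "unknown")].append(ex["passed"])
    return {value: sum(flags) / len(flags) for value, flags in groups.items()}


executions = [
    {"persona": "impatient caller", "passed": False},
    {"persona": "impatient caller", "passed": True},
    {"persona": "first-time user", "passed": True},
    {"passed": True},
]
print(pass_rate_by(executions, "persona"))
# {'impatient caller': 0.5, 'first-time user': 1.0, 'unknown': 1.0}
```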
Best Practices
Analysis Workflow
1. Review Overall Metrics
   - Check success rates
   - Analyze duration patterns
   - Review token usage
2. Deep Dive into Failures
   - Examine failed executions
   - Review error patterns
   - Identify common issues
3. Compare Across Runs
   - Track improvements
   - Identify regressions
   - Analyze pattern changes
4. Document Findings
   - Note successful strategies
   - Document areas for improvement
   - Track action items
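The "Compare Across Runs" step can be made concrete by diffing pass/fail outcomes per scenario between two test runs. This sketch assumes each run has been reduced to a hypothetical mapping from scenario name to a pass/fail boolean; `compare_runs` is an illustrative helper, and scenarios that only appear in the newer run are reported separately.

```python
def compare_runs(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict[str, list[str]]:
    """Classify scenarios as regressions, fixes, unchanged, or only in the candidate run."""
    shared = sorted(baseline.keys() & candidate.keys())
    return {
        "regressions": [s for s in shared if baseline[s] and not candidate[s]],
        "fixes": [s for s in shared if not baseline[s] and candidate[s]],
        "unchanged": [s for s in shared if baseline[s] == candidate[s]],
        "only_in_candidate": sorted(candidate.keys() - baseline.keys()),
    }


baseline = {"greeting": True, "refund": True, "address_capture": False}
candidate = {"greeting": True, "refund": False, "address_capture": True, "cancellation": True}
report = compare_runs(baseline, candidate)
print(report["regressions"])  # ['refund']
```

Keeping regressions and fixes in separate lists makes it easy to record both in the findings from the last step.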
Tips for Effective Analysis
- Start with high-level metrics
- Use search to find specific patterns
- Compare similar scenarios
- Track improvements over time
- Document unusual cases
- Share insights with team

