Analyze Results
Learn how to analyze and understand your simulation results
Understanding Results Structure
Executions vs Test Runs
- Executions: Individual simulation runs, each with its own results
- Test Runs: Groups of executions bundled together so you can compare them over time
Grouping executions into test runs lets you:
- Compare performance across different scenarios
- Track improvements between iterations
- Analyze patterns across multiple runs
Reviewing Executions
Execution Details
Each execution provides detailed information about:
- Timestamp of the run
- Duration of the conversation
- Tokens used
- Input/Output pairs
- Pass/Fail status
- Evaluation results
Transcript Review
Review conversations in detail with:
- Complete conversation transcript
- Audio playback of the interaction
- Turn-by-turn message analysis
Performance Metrics
Track important metrics including:
- Response times
- Token usage
- Success rates
- Evaluation results
- Overall pass rates
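If you export execution data for your own reporting, the metrics above can be derived with a few lines of Python. This is a minimal sketch: the `Execution` record and its field names (`duration_ms`, `tokens`, `passed`) are hypothetical and do not reflect an actual export schema.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Execution:
    """Hypothetical execution record; field names are illustrative only."""
    duration_ms: float
    tokens: int
    passed: bool


def summarize(executions: list[Execution]) -> dict[str, float]:
    """Return pass rate, average duration, and total tokens for a set of executions."""
    if not executions:
        raise ValueError("no executions to summarize")
    return {
        "pass_rate": sum(e.passed for e in executions) / len(executions),
        "avg_duration_ms": mean(e.duration_ms for e in executions),
        "total_tokens": sum(e.tokens for e in executions),
    }


# Made-up sample data for illustration.
runs = [
    Execution(1200, 350, True),
    Execution(800, 290, True),
    Execution(2500, 610, False),
    Execution(1500, 400, True),
]
summary = summarize(runs)
print(summary)
```

Computing the same summary for each test run gives you comparable numbers to track between iterations.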
Advanced Search
Search Syntax
Use powerful search operators to find specific executions:
String Search:
- field:value - Contains search (e.g. source:aws, input:hello)
- field!:value - Not contains search (e.g. source!:aws)
- field=value - Exact match (e.g. source=aws)
- field!=value - Not equals (e.g. source!=aws)
- field is:empty - Check for empty values
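To make the difference between the string operators concrete, the sketch below re-creates them in Python against a plain dictionary. It is not the product's implementation: the function name `match_string`, case-sensitive matching, and treating a missing field as empty are all assumptions made for illustration.

```python
def match_string(record: dict, field: str, op: str, value: str = "") -> bool:
    """Apply one string operator to a record (illustrative semantics only)."""
    # Assumption: a missing or None field behaves like an empty string.
    actual = record.get(field) or ""
    if op == ":":          # contains
        return value in actual
    if op == "!:":         # does not contain
        return value not in actual
    if op == "=":          # exact match
        return actual == value
    if op == "!=":         # not equal
        return actual != value
    if op == "is:empty":   # empty check, ignores `value`
        return actual == ""
    raise ValueError(f"unsupported operator: {op}")


record = {"source": "aws lambda", "input": "hello there", "message": ""}

print(match_string(record, "source", ":", "aws"))    # contains
print(match_string(record, "source", "=", "aws"))    # exact match
print(match_string(record, "message", "is:empty"))   # empty check
```

Note that `source:aws` matches this record while `source=aws` does not, because the exact-match operator compares the whole value.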
Numeric Search:
- duration>100 - Greater than
- duration<500 - Less than
- duration>=100 - Greater than or equal
- duration<=500 - Less than or equal
- duration=100 - Exact match
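The numeric operators behave like ordinary comparisons. As a rough sketch, the Python below parses a term such as `duration>=100` and evaluates it against a record. The helper name `match_numeric` and the choice to treat a missing field as a non-match are assumptions, not documented behavior.

```python
import operator
import re

# Map each operator symbol to its comparison function.
_OPS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "=": operator.eq,
}

# Two-character operators must come first so ">=" is not read as ">".
_TERM = re.compile(r"^(\w+)\s*(>=|<=|>|<|=)\s*(-?\d+(?:\.\d+)?)$")


def match_numeric(term: str, record: dict) -> bool:
    """Evaluate a numeric search term like 'duration>100' against a record."""
    parsed = _TERM.match(term.strip())
    if parsed is None:
        raise ValueError(f"not a numeric search term: {term!r}")
    field, op, raw = parsed.groups()
    actual = record.get(field)
    if actual is None:
        return False  # assumption: a missing field never matches
    return _OPS[op](float(actual), float(raw))


record = {"duration": 250}  # duration in milliseconds
print(match_numeric("duration>100", record))
print(match_numeric("duration<=500", record))
print(match_numeric("duration=100", record))
```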
Free Text Search:
- Simple text search (e.g. hello)
- Searches across all text fields
- Use AND to combine terms (e.g. hello AND world)
- Use OR for alternatives (e.g. hello OR world)
- Use parentheses for grouping (e.g. (hello OR world) AND test)
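To show how AND, OR, and parentheses interact, here is a small Python evaluator for free text queries. It is a sketch under stated assumptions rather than the actual search engine: it assumes AND binds more tightly than OR when no parentheses are given, that matching is case-insensitive, and that a term matches when it appears anywhere in the text.

```python
import re


def tokenize(query: str) -> list[str]:
    """Split a query into parentheses, quoted phrases, and bare words."""
    return re.findall(r'\(|\)|"[^"]*"|[^\s()]+', query)


def matches(query: str, text: str) -> bool:
    """Evaluate a free text query against a piece of text."""
    tokens = tokenize(query)
    haystack = text.lower()
    pos = 0

    def parse_or() -> bool:
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "OR":
            pos += 1
            rhs = parse_and()  # always parse, so tokens are consumed
            result = result or rhs
        return result

    def parse_and() -> bool:
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "AND":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in (")", "AND", "OR"):
            raise ValueError(f"unexpected token: {tok}")
        return tok.strip('"').lower() in haystack

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return result


print(matches("(hello OR world) AND test", "hello, this is a test"))
print(matches("(hello OR world) AND test", "hello there"))
```

The second call does not match because the grouped OR is satisfied but the required term `test` is missing.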
Combining Searches:
- Mix and match different operators (e.g. source:aws AND environment=prod)
- Use parentheses for complex queries (e.g. (duration>100 OR duration<=50) AND environment=prod)
- Combine free text with specific field searches (e.g. hello AND source:aws)
Quoted Strings:
- Use quotes for multi-word values (e.g. source:"aws lambda")
Available Fields:
- source - Source of the execution
- environment - Environment name
- input - Input text
- output - Output text
- message - Run message
- duration - Duration in milliseconds
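If you generate search strings programmatically, for example to paste the same query into several test runs, a small builder helps keep quoting and field names consistent. The helpers below (`term`, `all_of`, `any_of`) are hypothetical names for this sketch. They only produce query text following the syntax described above and do not call any API.

```python
# Field names taken from the list above.
KNOWN_FIELDS = {"source", "environment", "input", "output", "message", "duration"}
KNOWN_OPS = {":", "!:", "=", "!=", ">", "<", ">=", "<="}


def term(field: str, op: str, value: object) -> str:
    """Build one search term, quoting the value if it contains whitespace."""
    if field not in KNOWN_FIELDS:
        raise ValueError(f"unknown field: {field}")
    if op not in KNOWN_OPS:
        raise ValueError(f"unknown operator: {op}")
    text = str(value)
    if any(ch.isspace() for ch in text):
        # Values that already contain quotes are not escaped here.
        text = f'"{text}"'
    return f"{field}{op}{text}"


def all_of(*terms: str) -> str:
    """Join terms so that every one must match."""
    return " AND ".join(terms)


def any_of(*terms: str) -> str:
    """Join terms so that at least one must match, wrapped in parentheses."""
    return "(" + " OR ".join(terms) + ")"


query = all_of(
    any_of(term("duration", ">", 100), term("duration", "<=", 50)),
    term("environment", "=", "prod"),
)
print(query)
print(term("source", ":", "aws lambda"))
```

Wrapping every OR group in parentheses avoids relying on operator precedence when groups are combined with AND.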
Visualization Tools
Timeline View
- Visual representation of execution timing
- Identify patterns in response times
- Spot anomalies or performance issues
- Track conversation flow
Performance Graphs
- Success rate trends
- Duration distribution
- Token usage patterns
- Data capture accuracy over time
Comparison Tools
Compare executions across:
- Different personas
- Time periods
- Edge cases
- Data field variations
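The same kind of comparison can be scripted on exported results. The sketch below computes a pass rate per persona for two test runs and reports personas whose pass rate dropped. The input shape (pairs of persona name and pass flag), the function names, and the 5-point tolerance are assumptions chosen for illustration.

```python
from collections import defaultdict


def pass_rates(executions: list[tuple[str, bool]]) -> dict[str, float]:
    """Return the pass rate per persona from (persona, passed) pairs."""
    totals = defaultdict(lambda: [0, 0])  # persona -> [passed, total]
    for persona, passed in executions:
        totals[persona][0] += int(passed)
        totals[persona][1] += 1
    return {persona: ok / n for persona, (ok, n) in totals.items()}


def regressions(
    baseline: list[tuple[str, bool]],
    candidate: list[tuple[str, bool]],
    tolerance: float = 0.05,
) -> dict[str, tuple[float, float]]:
    """Return personas whose pass rate fell by more than `tolerance`."""
    before, after = pass_rates(baseline), pass_rates(candidate)
    return {
        persona: (before[persona], after[persona])
        for persona in before.keys() & after.keys()
        if before[persona] - after[persona] > tolerance
    }


# Made-up sample data: an earlier test run and a later one.
baseline = [("impatient", True), ("impatient", True),
            ("confused", True), ("confused", False)]
candidate = [("impatient", True), ("impatient", False),
             ("confused", True), ("confused", True)]

print(regressions(baseline, candidate))
```

Only personas present in both runs are compared, so a newly added persona is not reported as a regression.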
Best Practices
Analysis Workflow
1. Review Overall Metrics
   - Check success rates
   - Analyze duration patterns
   - Review token usage
2. Deep Dive into Failures
   - Examine failed executions
   - Review error patterns
   - Identify common issues
3. Compare Across Runs
   - Track improvements
   - Identify regressions
   - Analyze pattern changes
4. Document Findings
   - Note successful strategies
   - Document areas for improvement
   - Track action items
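For the failure deep dive, counting how often each failure reason occurs is a quick way to surface common issues. This is a minimal sketch that assumes each exported execution is a dictionary with a `passed` flag and an optional `failure_reason` string. Both keys are hypothetical.

```python
from collections import Counter


def common_failures(executions: list[dict], top: int = 3) -> list[tuple[str, int]]:
    """Return the most frequent failure reasons among failed executions."""
    reasons = Counter(
        e.get("failure_reason", "unknown")
        for e in executions
        if not e.get("passed", False)
    )
    return reasons.most_common(top)


# Made-up sample data for illustration.
sample = [
    {"passed": False, "failure_reason": "timeout"},
    {"passed": False, "failure_reason": "wrong_data"},
    {"passed": False, "failure_reason": "timeout"},
    {"passed": True},
    {"passed": False},  # no reason recorded
]

print(common_failures(sample, top=1))
```

Failures without a recorded reason are grouped under `unknown`, which makes gaps in the data visible instead of hiding them.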
Tips for Effective Analysis
- Start with high-level metrics
- Use search to find specific patterns
- Compare similar scenarios
- Track improvements over time
- Document unusual cases
- Share insights with team