Understanding Results Structure

Executions vs Test Runs

  • Executions: Individual simulation runs with their specific results
  • Test Runs: Groups of executions bundled together so results can be compared over time. Use test runs to:
    • Compare performance across different scenarios
    • Track improvements between iterations
    • Analyze patterns across multiple runs

Reviewing Executions

Execution Details

Each execution provides detailed information about:

  • Timestamp of the run
  • Duration of the conversation
  • Tokens used
  • Input/Output pairs
  • Pass/Fail status
  • Evaluation results
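
One way to picture the relationship between executions and test runs is as a small data model. The sketch below is only an illustration. The class and field names (`Execution`, `TestRun`, `duration_ms`, `tokens_used`, `turns`, `passed`, `evaluations`) are hypothetical and are not the platform's actual schema or API.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Execution:
    """One simulated conversation and its results (hypothetical schema)."""
    execution_id: str
    timestamp: datetime                # when the run took place
    duration_ms: int                   # duration of the conversation
    tokens_used: int
    turns: list[tuple[str, str]]       # (input, output) pairs, one per turn
    passed: bool                       # overall pass/fail status
    evaluations: dict[str, bool] = field(default_factory=dict)  # evaluation name -> result


@dataclass
class TestRun:
    """A group of executions bundled together for comparison over time."""
    run_id: str
    executions: list[Execution] = field(default_factory=list)


# Example values are made up purely for illustration.
run = TestRun("run-1", [
    Execution("exec-1", datetime(2024, 5, 1, 9, 30), 1200, 350,
              [("Hi, I need a refund", "Sure, can I have your order number?")],
              passed=True, evaluations={"collected_order_number": True}),
])
```

Keeping the pass/fail status on each execution and the grouping on the test run means any run-level metric can be derived later from the individual executions.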

Transcript Review

Review conversations in detail with:

  • Complete conversation transcript
  • Audio playback of the interaction
  • Turn-by-turn message analysis

Performance Metrics

Track important metrics including:

  • Response times
  • Token usage
  • Success rates
  • Evaluation results
  • Overall pass rates
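
The documentation does not spell out how these metrics are calculated. The sketch below assumes the most common definitions: pass rate is passed executions divided by total executions, response time is averaged over executions, and token usage is summed. The record keys (`duration_ms`, `tokens`, `passed`) are hypothetical.

```python
from statistics import mean


def summarize(executions: list[dict]) -> dict:
    """Aggregate basic metrics over execution records.

    Each record is assumed (hypothetically) to carry:
      duration_ms: int, tokens: int, passed: bool
    """
    if not executions:
        return {"count": 0, "pass_rate": 0.0, "mean_duration_ms": 0.0, "total_tokens": 0}
    return {
        "count": len(executions),
        # True counts as 1 and False as 0, so the sum is the number of passes.
        "pass_rate": sum(e["passed"] for e in executions) / len(executions),
        "mean_duration_ms": mean(e["duration_ms"] for e in executions),
        "total_tokens": sum(e["tokens"] for e in executions),
    }


records = [
    {"duration_ms": 1200, "tokens": 350, "passed": True},
    {"duration_ms": 800, "tokens": 300, "passed": True},
    {"duration_ms": 1000, "tokens": 250, "passed": False},
]
summary = summarize(records)
```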

Search Syntax

Use the following search operators to find specific executions:

String Search:

  • field:value - Contains search (e.g. source:aws, input:hello)
  • field!:value - Not contains search (e.g. source!:aws)
  • field=value - Exact match (e.g. source=aws)
  • field!=value - Not equals (e.g. source!=aws)
  • field is:empty - Check for empty values
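
To make the string operators concrete, here is a minimal sketch of how a single term could be interpreted against one execution's fields. It is not the product's search engine. The function name `match_string_term` is hypothetical, and the sketch assumes comparisons are case-insensitive, which the documentation does not state.

```python
import re

# Two-character operators come first so "!:" is not read as a plain ":".
TERM = re.compile(r'^(?P<field>\w+)(?P<op>!:|!=|:|=)(?P<value>.*)$')
EMPTY = re.compile(r'^(?P<field>\w+) is:empty$')


def match_string_term(term: str, record: dict[str, str]) -> bool:
    """Evaluate one string search term against a record (assumed case-insensitive)."""
    m = EMPTY.match(term)
    if m:
        # A missing field is treated as empty.
        return not record.get(m["field"], "")
    m = TERM.match(term)
    if not m:
        raise ValueError(f"unrecognised term: {term!r}")
    actual = record.get(m["field"], "").lower()
    value = m["value"].strip('"').lower()  # allow quoted multi-word values
    op = m["op"]
    if op == ":":
        return value in actual          # contains
    if op == "!:":
        return value not in actual      # does not contain
    if op == "=":
        return actual == value          # exact match
    return actual != value              # "!=": not equal


record = {"source": "AWS Lambda", "environment": "prod", "output": ""}
```

With this record, `source:aws` matches while `source=aws` does not, because `=` requires the whole value to be equal.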

Numeric Search:

  • duration>100 - Greater than
  • duration<500 - Less than
  • duration>=100 - Greater than or equal
  • duration<=500 - Less than or equal
  • duration=100 - Exact match
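
The numeric operators map directly onto ordinary comparisons. The sketch below shows that mapping for a single term using Python's `operator` module. The function name `match_numeric_term` is hypothetical, and treating a missing field as a non-match is an assumption.

```python
import operator
import re

# Two-character operators are listed first so ">=" is not read as ">".
NUMERIC = re.compile(r'^(?P<field>\w+)(?P<op>>=|<=|>|<|=)(?P<value>-?\d+(?:\.\d+)?)$')
OPS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "=": operator.eq,
}


def match_numeric_term(term: str, record: dict[str, float]) -> bool:
    """Evaluate a term such as 'duration>=100' against numeric fields."""
    m = NUMERIC.match(term)
    if not m:
        raise ValueError(f"unrecognised numeric term: {term!r}")
    if m["field"] not in record:
        return False  # assumption: a missing field never matches
    return OPS[m["op"]](record[m["field"]], float(m["value"]))


record = {"duration": 250}  # duration in milliseconds
```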

Free Text Search:

  • Simple text search (e.g. hello) - Searches across all text fields
  • Use AND to combine terms (e.g. hello AND world)
  • Use OR for alternatives (e.g. hello OR world)
  • Use parentheses for grouping (e.g. (hello OR world) AND test)
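
Combining terms with AND, OR, and parentheses is a small boolean grammar. The sketch below parses free-text queries with a recursive-descent parser and evaluates them against every text field of a record. It is an illustration only. The names `parse` and `evaluate` are hypothetical, and it assumes that AND binds more tightly than OR, that the keywords are uppercase, and that matching is case-insensitive; the documentation does not confirm any of these.

```python
import re

# Tokens: parentheses, quoted phrases, or bare words.
TOKEN_RE = re.compile(r'\(|\)|"[^"]*"|[^\s()]+')


def parse(query: str):
    """Parse a free-text query into a nested-tuple AST.

    Assumed grammar (AND binds tighter than OR):
      expr     := and_expr ("OR" and_expr)*
      and_expr := atom ("AND" atom)*
      atom     := "(" expr ")" | term
    """
    tokens = TOKEN_RE.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        node = and_expr()
        while peek() == "OR":
            take()
            node = ("or", node, and_expr())
        return node

    def and_expr():
        node = atom()
        while peek() == "AND":
            take()
            node = ("and", node, atom())
        return node

    def atom():
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        if tok == "(":
            take()
            node = expr()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return node
        if tok in (")", "AND", "OR"):
            raise ValueError(f"unexpected token {tok!r}")
        return ("term", take().strip('"').lower())

    tree = expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


def evaluate(node, record: dict[str, str]) -> bool:
    """A term matches if any text field contains it (case-insensitive, assumed)."""
    kind = node[0]
    if kind == "term":
        return any(node[1] in str(v).lower() for v in record.values())
    if kind == "and":
        return evaluate(node[1], record) and evaluate(node[2], record)
    return evaluate(node[1], record) or evaluate(node[2], record)


record = {"input": "Hello there", "output": "Goodbye, world"}
```

This sketch rejects two adjacent terms with no operator between them (for example `hello world`), rather than guessing whether they should be combined with AND.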

Combining Searches:

  • Mix and match different operators (e.g. source:aws AND environment=prod)
  • Use parentheses for complex queries (e.g. (duration>100 OR duration<=50) AND environment=prod)
  • Combine free text with specific field searches (e.g. hello AND source:aws)

Quoted Strings:

  • Use quotes for multi-word values (e.g. source:"aws lambda")

Available Fields:

  • source - Source of the execution
  • environment - Environment name
  • input - Input text
  • output - Output text
  • message - Run message
  • duration - Duration in milliseconds

Visualization Tools

Timeline View

  • Visual representation of execution timing
  • Identify patterns in response times
  • Spot anomalies or performance issues
  • Track conversation flow

Performance Graphs

  • Success rate trends
  • Duration distribution
  • Token usage patterns
  • Data capture accuracy over time
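
Two of these graphs can be reproduced from raw results if you export them. The sketch below buckets durations into fixed-width bins for a distribution chart and computes a pass rate per test run for a trend line. The function names and the 500 ms default bucket width are hypothetical choices, and the runs are assumed to be passed in chronological order.

```python
from collections import Counter


def duration_histogram(durations_ms: list[int], bucket_ms: int = 500) -> dict[int, int]:
    """Count executions per fixed-width duration bucket, keyed by bucket start (ms)."""
    counts = Counter((d // bucket_ms) * bucket_ms for d in durations_ms)
    return dict(sorted(counts.items()))


def success_rate_trend(runs: list[tuple[str, list[bool]]]) -> list[tuple[str, float]]:
    """Pass rate per test run, in the order given (assumed chronological)."""
    return [
        (name, sum(results) / len(results) if results else 0.0)
        for name, results in runs
    ]


histogram = duration_histogram([120, 480, 510, 1300])
trend = success_rate_trend([("run-1", [True, False]), ("run-2", [True, True])])
```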

Comparison Tools

Compare executions across:

  • Different personas
  • Time periods
  • Edge cases
  • Data field variations
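
Comparing across personas, time periods, or edge cases amounts to grouping executions by some attribute and comparing each group's results. The sketch below does this for pass rate. The function name `pass_rate_by` and the `persona` and `passed` keys are hypothetical; substitute whatever attribute your exported results actually carry.

```python
from collections import defaultdict


def pass_rate_by(executions: list[dict], key: str) -> dict[str, float]:
    """Group executions by one field and compute each group's pass rate."""
    groups: dict[str, list[bool]] = defaultdict(list)
    for e in executions:
        # Executions without the grouping field are collected under "(none)".
        groups[e.get(key, "(none)")].append(e["passed"])
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


executions = [
    {"persona": "impatient caller", "passed": False},
    {"persona": "impatient caller", "passed": True},
    {"persona": "polite caller", "passed": True},
    {"passed": True},
]
rates = pass_rate_by(executions, "persona")
```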

Best Practices

Analysis Workflow

  1. Review Overall Metrics

    • Check success rates
    • Analyze duration patterns
    • Review token usage
  2. Deep Dive into Failures

    • Examine failed executions
    • Review error patterns
    • Identify common issues
  3. Compare Across Runs

    • Track improvements
    • Identify regressions
    • Analyze pattern changes
  4. Document Findings

    • Note successful strategies
    • Document areas for improvement
    • Track action items
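
Step 3 of the workflow, tracking improvements and identifying regressions, can be made mechanical by lining up two test runs scenario by scenario. The sketch below assumes each run has been reduced to a mapping from a scenario name to its pass/fail result; the function name `diff_runs` and that mapping are hypothetical. Scenarios present in only one run are skipped rather than counted.

```python
def diff_runs(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict[str, list[str]]:
    """Compare pass/fail per scenario between two test runs."""
    shared = sorted(baseline.keys() & candidate.keys())
    return {
        # Passed before, fails now.
        "regressions": [s for s in shared if baseline[s] and not candidate[s]],
        # Failed before, passes now.
        "improvements": [s for s in shared if not baseline[s] and candidate[s]],
    }


baseline = {"refund": True, "greeting": True, "escalation": False}
candidate = {"refund": False, "greeting": True, "escalation": True, "new scenario": False}
changes = diff_runs(baseline, candidate)
```

The resulting lists are a natural starting point for step 4, since each regression or improvement is a concrete item to document.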

Tips for Effective Analysis

  • Start with high-level metrics
  • Use search to find specific patterns
  • Compare similar scenarios
  • Track improvements over time
  • Document unusual cases
  • Share insights with your team