Analyze AI Products

Autoblocks enables you to run A/B experiments to see which product decisions are leading to better success metrics, and provides tools for collecting feedback on how users interact with your AI product.

Running A/B tests and online experiments in production

Sometimes, offline testing alone does not give you enough evidence to choose confidently between one product configuration and another.

In this case, you might want to let the data (and your real users) decide. Autoblocks enables you to run A/B experiments to see which changes to your AI product are leading to better success metrics.

There are three main types of discretionary components in your AI system:

  • Prompts
  • Models and model parameters
  • Retrieval mechanisms

Your decisions regarding any of these are fair game to run experiments against.
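As a rough illustration of what an experiment over these components can look like, here is a minimal sketch of deterministic variant assignment. It is not Autoblocks's own mechanism: the `VARIANTS` table, the model and retriever names, and the `assign_variant` helper are all hypothetical, and hashing the user ID is just one common way to keep each user in the same variant across requests.

```python
import hashlib

# Hypothetical variants: each one fixes a prompt version, a model with its
# parameters, and a retrieval mechanism. Only the prompt differs here.
VARIANTS = {
    "control": {"prompt_version": "v1", "model": "model-a", "temperature": 0.7, "retriever": "keyword"},
    "treatment": {"prompt_version": "v2", "model": "model-a", "temperature": 0.7, "retriever": "keyword"},
}


def assign_variant(user_id: str, experiment: str) -> str:
    """Map a user to a variant deterministically, so repeat requests match."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    names = sorted(VARIANTS)  # stable ordering of variant names
    return names[int(digest, 16) % len(names)]


variant = assign_variant("user-123", "prompt-v2-rollout")
config = VARIANTS[variant]
```

Recording the assigned variant (for example `prompt_version`) as a property on every event you send is what later lets you compare success metrics between variants.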

What are success metrics for an AI system?

User actions

User actions are traditional analytics events, such as clicks or sign-ups. Autoblocks can serve as a single platform for all your product analysis, so you do not need a separate analytics tool. It also lets you correlate user actions with the variables of your AI product.

For example, if you are generating sales emails and want to know which prompt versions are getting the most clicks, you can track those clicks in Autoblocks and visualize them on the Explore page.
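To make the sales email example concrete, the sketch below computes a click rate per prompt version from a list of events. The event names (`email.generated`, `email.clicked`), the `prompt_version` property, and the sample data are assumptions made for illustration; the Explore page would do this kind of aggregation for you.

```python
from collections import Counter

# Hypothetical events, each tagged with the prompt version that produced the email.
events = [
    {"message": "email.generated", "prompt_version": "v1"},
    {"message": "email.generated", "prompt_version": "v1"},
    {"message": "email.clicked", "prompt_version": "v1"},
    {"message": "email.generated", "prompt_version": "v2"},
    {"message": "email.generated", "prompt_version": "v2"},
    {"message": "email.clicked", "prompt_version": "v2"},
    {"message": "email.clicked", "prompt_version": "v2"},
]


def click_rate_by_version(events):
    """Return clicks divided by generated emails for each prompt version."""
    generated = Counter(e["prompt_version"] for e in events if e["message"] == "email.generated")
    clicked = Counter(e["prompt_version"] for e in events if e["message"] == "email.clicked")
    return {version: clicked[version] / count for version, count in generated.items()}


rates = click_rate_by_version(events)
```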

User feedback

With Autoblocks, it is easy to monitor how users react to the changes you make. Feedback clustering and real-time dashboards help you see which changes are driving better user outcomes.

Learn more about dashboards

For example, on any message sent to Autoblocks you can include a rating property to track explicit positive, neutral, or negative signals. You could also send a feedback property to track raw input entered into a text box from users.

    feedback="I really enjoyed the personalized response.",
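The snippet above shows only the `feedback` property. As a hedged sketch of how the full set of properties might be assembled before sending, the helper below builds a plain dictionary containing `rating` and `feedback`. The `build_feedback_properties` function and the allowed rating values are assumptions based on the description above; the actual call that sends the event depends on the SDK you use and is deliberately not shown.

```python
from typing import Optional

# Assumed rating values, taken from the positive/neutral/negative signals described above.
ALLOWED_RATINGS = {"positive", "neutral", "negative"}


def build_feedback_properties(rating: str, feedback: Optional[str] = None) -> dict:
    """Build the properties for a feedback event (hypothetical helper)."""
    if rating not in ALLOWED_RATINGS:
        raise ValueError(f"rating must be one of {sorted(ALLOWED_RATINGS)}")
    properties = {"rating": rating}
    if feedback and feedback.strip():
        # Raw text the user typed into a feedback box.
        properties["feedback"] = feedback.strip()
    return properties


properties = build_feedback_properties(
    "positive",
    "I really enjoyed the personalized response.",
)
```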

Once you have sent in user feedback, you can visualize it on the Explore page.

  1. Filter for the event type.
  2. Open Chart Options.
  3. Break down by the feedback property.
  4. Select the Stacked Bar or Line chart type. A stacked bar chart is good for seeing what proportion of events are negative, while a line chart is better for seeing trends over time.
  5. Adjust the granularity if needed. It is set to hourly by default.
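If it helps to see what a stacked bar chart with hourly granularity represents, the sketch below reproduces the calculation offline: it groups events into hourly buckets and computes each value's share within the bucket. It assumes the breakdown is on a categorical property such as `rating`; the event shape and the sample timestamps are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical events with ISO 8601 timestamps and a categorical rating.
events = [
    {"timestamp": "2024-01-01T10:05:00", "rating": "positive"},
    {"timestamp": "2024-01-01T10:40:00", "rating": "negative"},
    {"timestamp": "2024-01-01T11:15:00", "rating": "positive"},
    {"timestamp": "2024-01-01T11:20:00", "rating": "positive"},
]


def rating_share_by_hour(events):
    """Return, for each hour, the fraction of events carrying each rating."""
    buckets = defaultdict(Counter)
    for event in events:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(event["timestamp"]).replace(minute=0, second=0, microsecond=0)
        buckets[hour.isoformat()][event["rating"]] += 1
    return {
        hour: {rating: count / sum(counts.values()) for rating, count in counts.items()}
        for hour, counts in sorted(buckets.items())
    }


shares = rating_share_by_hour(events)
```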

User feedback is a great property to filter by when curating a fine-tuning dataset. For example, you can use the traces API to export input-output pairs corresponding to positive user feedback.
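A minimal sketch of that curation step follows. It assumes you have already exported traces into a list of records with `input`, `output`, and `rating` fields; those field names, the sample records, and the prompt/completion JSONL layout are illustrative assumptions, not the format returned by the traces API.

```python
import json

# Hypothetical exported records; real exports will have a different shape.
traces = [
    {"input": "Write a follow-up email.", "output": "Hi Sam, just checking in...", "rating": "positive"},
    {"input": "Summarize this call.", "output": "The call covered pricing.", "rating": "negative"},
    {"input": "Draft an intro email.", "output": "Hello Priya, I wanted to...", "rating": "positive"},
]


def to_fine_tuning_jsonl(traces):
    """Keep input-output pairs with positive feedback, one JSON object per line."""
    lines = [
        json.dumps({"prompt": trace["input"], "completion": trace["output"]})
        for trace in traces
        if trace.get("rating") == "positive"
    ]
    return "\n".join(lines)


dataset = to_fine_tuning_jsonl(traces)
```

Each line of `dataset` is one training example, so the result can be written straight to a `.jsonl` file.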