Analyze AI Products
Autoblocks enables you to run A/B experiments to see which product decisions are leading to better success metrics, and provides tools to get feedback on how users are interacting with your AI product.
Running A/B tests and online experiments in production
Sometimes, offline testing alone does not allow you to build strong convictions around one product configuration versus another.
In this case, you might want to let the data (and your real users) decide. Autoblocks enables you to run A/B experiments to see which changes to your AI product are leading to better success metrics.
There are three main types of configurable components in your AI system:
- Prompts
- Models and model parameters
- Retrieval mechanisms
Your decisions regarding any of these are fair game to run experiments against.
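A common way to run such an experiment is to deterministically assign each user to a configuration variant, so a given user always sees the same prompt, model, or retrieval setup. Here is a minimal sketch of that assignment step; the experiment and variant names are illustrative, and the hashing scheme is a standard technique rather than anything specific to Autoblocks:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants: list[str]) -> str:
    """Deterministically map a user to one variant of an experiment.

    Hashing (experiment, user_id) together keeps each user's assignment
    stable while letting different experiments split users independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# The same user always lands in the same bucket for a given experiment:
variant = assign_variant("user-123", "sales-email-prompt", ["prompt-v1", "prompt-v2"])
```

You would then attach the chosen variant as a property on the events you send, so success metrics can be broken down by variant.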
What are success metrics for an AI system?
User actions
User actions are traditional analytics events. Autoblocks can serve as a single platform for all of your product analysis, eliminating the need for a separate tool, and it lets you correlate user actions with the variables of your AI product.
For example, if you are generating sales emails and want to track which prompt versions are getting the most clicks, you can use Autoblocks to track and visualize them on the Explore page.
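To make that concrete, the sketch below builds the payload for such a click event. The event name and property keys (`email.link_clicked`, `promptVersion`, `userId`) are assumptions for illustration, not a documented Autoblocks schema; the point is that attaching the prompt version to the event is what lets you break clicks down by prompt later:

```python
def build_click_event(prompt_version: str, user_id: str) -> tuple[str, dict]:
    """Construct the (message, properties) pair for a link-click event.

    Property names here are illustrative; use whatever naming convention
    fits your product, as long as the prompt version is included.
    """
    return (
        "email.link_clicked",  # hypothetical event name
        {"promptVersion": prompt_version, "userId": user_id},
    )

message, properties = build_click_event("sales-email-v2", "user-123")
# Then send it with your tracer, e.g.:
# tracer.send_event(message, properties=properties)
```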
User feedback
Using Autoblocks, it is easy to monitor how users are reacting to the changes you are making. Autoblocks helps you make sense of which changes are driving better user outcomes through feedback clustering and real-time dashboards.
For example, on any message sent to Autoblocks you can include a `rating` property to track explicit positive, neutral, or negative signals. You could also send a `feedback` property to track raw input entered by users into a text box.
It's recommended to include properties like `sessionId`, `userId`, or other relevant identifiers to associate an event with related user actions.
View full tracer documentation here.
```python
# Assumes `tracer` is an initialized Autoblocks tracer instance.
tracer.send_event(
    "user.feedback",
    properties=dict(
        rating="positive",
        feedback="I really enjoyed the personalized response.",
    ),
)
```
Once you have sent in user feedback, you can visualize it on the Explore page.
- Filter for the `user.feedback` event type.
- Open Chart Options.
- Break down by the `feedback` property.
- Select the Stacked Bar or Line chart type. The stacked bar chart is great for visualizing what proportion of events are negative, while a line chart is useful for visualizing trends over time.
- Granularity is set to hourly by default, but you can tailor this to your specific needs.
You can overlay filters to focus your visualization on user feedback for a particular subset of LLM interactions.
User feedback is a great property to filter by when curating a fine-tuning dataset. For example, you can use the traces API to export input-output pairs corresponding to positive user feedback.
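As a sketch of that curation step, suppose you have exported events via the traces API into a list of dicts. The field names below (`message`, `properties`, `sessionId`, and the `llm.completion` event name) are assumptions about your event schema for illustration, not a documented export format:

```python
def curate_finetuning_pairs(events: list[dict]) -> list[dict]:
    """Keep input-output pairs whose session received positive user feedback."""
    positive_sessions = {
        e["properties"]["sessionId"]
        for e in events
        if e["message"] == "user.feedback"
        and e["properties"].get("rating") == "positive"
    }
    return [
        {"input": e["properties"]["input"], "output": e["properties"]["output"]}
        for e in events
        if e["message"] == "llm.completion"  # hypothetical event name
        and e["properties"]["sessionId"] in positive_sessions
    ]

events = [
    {"message": "llm.completion",
     "properties": {"sessionId": "s1", "input": "Write a sales email",
                    "output": "Hi Sam, ..."}},
    {"message": "user.feedback",
     "properties": {"sessionId": "s1", "rating": "positive"}},
    {"message": "llm.completion",
     "properties": {"sessionId": "s2", "input": "Write a follow-up",
                    "output": "Hello again, ..."}},
]
pairs = curate_finetuning_pairs(events)  # only the s1 pair survives
```

Joining feedback events to completions by `sessionId` is why it helps to include session identifiers on every event you send.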