Evaluating the performance of your AI Assistants is crucial to ensuring that they provide accurate and helpful responses. The Evaluation feature streamlines and automates this process, letting you focus on improving your AI Assistant's capabilities.
With the Evaluation feature, you can:
- Define a set of test questions: Create a list of questions specifically designed to evaluate your AI Assistant's performance.
- Execute evaluation runs: Run these test questions against different versions of your AI Assistant, automatically collecting responses and feedback.
- Analyze evaluation results: Dashboards and visualizations let you track the evolution of your AI Assistant's performance across runs, monitor metrics such as the percentage of correct answers, and identify areas for improvement.
- Automate feedback collection: Configure the Evaluation feature to automatically compare the AI Assistant's responses to expected answers and provide feedback without manual intervention, as sketched in the example after this list.
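
The workflow above can be driven end to end from code. The sketch below shows one possible shape of such a script, assuming a simple REST-style API: the base URL, endpoint paths, payload fields, and response keys are illustrative placeholders rather than the actual Evaluation API contract, which is documented in the dedicated Jupyter notebooks referenced below.

```python
# Minimal sketch of an automated evaluation workflow.
# All endpoint paths, payload fields, and response keys below are
# hypothetical placeholders; refer to the Evaluation API notebooks
# for the real request and response formats.
import requests

BASE_URL = "https://example.com/api/evaluation"    # hypothetical base URL
HEADERS = {"Authorization": "Bearer <API_TOKEN>"}  # hypothetical auth scheme

# 1. Define a set of test questions with expected answers.
test_set = {
    "name": "billing-faq-v1",
    "questions": [
        {
            "question": "How do I update my payment method?",
            "expected_answer": "Open Settings > Billing and choose 'Update payment method'.",
        },
        {
            "question": "What is the refund window?",
            "expected_answer": "Refunds are available within 30 days of purchase.",
        },
    ],
}
test_set_id = requests.post(
    f"{BASE_URL}/test-sets", json=test_set, headers=HEADERS
).json()["id"]

# 2. Execute an evaluation run against a specific AI Assistant version.
run = requests.post(
    f"{BASE_URL}/runs",
    json={"test_set_id": test_set_id, "assistant_version": "v2"},
    headers=HEADERS,
).json()

# 3. Collect the results: responses are compared to the expected answers
#    and aggregate metrics (e.g. percentage of correct answers) are reported.
results = requests.get(f"{BASE_URL}/runs/{run['id']}/results", headers=HEADERS).json()
print(f"Correct answers: {results['percent_correct']}%")
```

In practice you would rerun step 2 against each new version of your AI Assistant and compare the resulting metrics across runs, which is what the evaluation dashboards visualize for you.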
Programmatic access to the Evaluation feature is provided through the following three Evaluation APIs. Each API is accompanied by a dedicated Jupyter notebook with practical examples and code snippets.