Evaluating the performance of your AI Assistants is crucial to ensuring that they provide accurate and helpful responses. The Evaluation feature streamlines and automates this process, letting you focus on improving your AI Assistant's capabilities.
With the Evaluation feature, you can:
- Define a set of test questions: Create a list of questions specifically designed to evaluate your AI Assistant's performance.
- Execute evaluation runs: Run these test questions against different versions of your AI Assistant, automatically collecting responses and feedback.
- Analyze evaluation results: Dashboards and visualizations let you track the evolution of your AI Assistant's performance across runs, monitor metrics such as the percentage of correct answers, and identify areas for improvement.
- Automate feedback collection: Configure the Evaluation feature to automatically compare the AI Assistant's responses to expected answers and provide feedback without manual intervention, as sketched in the example after this list.
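
The workflow above can be driven end to end from code. The sketch below shows one possible shape of such a script, assuming a simple REST-style API: the base URL, endpoint paths, payload fields, and response keys are illustrative placeholders rather than the actual Evaluation API contract, which is documented in the dedicated Jupyter notebooks referenced below.

```python
# Minimal sketch of an automated evaluation workflow.
# All endpoint paths, payload fields, and response keys below are
# hypothetical placeholders; refer to the Evaluation API notebooks
# for the real request and response formats.
import requests

BASE_URL = "https://example.com/api/evaluation"    # hypothetical base URL
HEADERS = {"Authorization": "Bearer <API_TOKEN>"}  # hypothetical auth scheme

# 1. Define a set of test questions with expected answers.
test_set = {
    "name": "billing-faq-v1",
    "questions": [
        {
            "question": "How do I update my payment method?",
            "expected_answer": "Open Settings > Billing and choose 'Update payment method'.",
        },
        {
            "question": "What is the refund window?",
            "expected_answer": "Refunds are available within 30 days of purchase.",
        },
    ],
}
test_set_id = requests.post(
    f"{BASE_URL}/test-sets", json=test_set, headers=HEADERS
).json()["id"]

# 2. Execute an evaluation run against a specific AI Assistant version.
run = requests.post(
    f"{BASE_URL}/runs",
    json={"test_set_id": test_set_id, "assistant_version": "v2"},
    headers=HEADERS,
).json()

# 3. Collect the results: responses are compared to the expected answers
#    and aggregate metrics (e.g. percentage of correct answers) are reported.
results = requests.get(f"{BASE_URL}/runs/{run['id']}/results", headers=HEADERS).json()
print(f"Correct answers: {results['percent_correct']}%")
```

In practice you would rerun step 2 against each new version of your AI Assistant and compare the resulting metrics across runs, which is what the evaluation dashboards visualize for you.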
Programmatic access to the Evaluation feature is provided through the following three Evaluation APIs. Each API is accompanied by a dedicated Jupyter notebook with practical examples and code snippets.