This guide explains the resources available in the Globant Enterprise AI Evaluation Notebooks and how to use them effectively to evaluate your AI Assistants.

Step 1: Create the Dataset

Prepare a dataset that contains the test cases for your AI Assistant. Each row in the dataset represents a test case, defining:

  • The input question for the assistant.
  • The expected output based on predefined criteria.
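
For illustration, a single test case row might look like the sketch below; the field names (input, expectedOutput) are assumptions for this example rather than the documented schema, which the DataSetAPI.ipynb notebook describes.

    # A hypothetical test case row; the field names are assumptions,
    # not the documented DataSet API schema.
    test_case = {
        "input": "What is the refund policy for annual plans?",
        "expectedOutput": "Annual plans can be refunded within 30 days of purchase.",
    }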

You can create a dataset using the DataSet API in two ways:

  • Create a dataset from scratch: Work through the "Working with Datasets" section of the DataSetAPI.ipynb notebook. This section provides examples and code snippets for creating a new dataset using API endpoints and then populating it with rows.
  • Upload a complete dataset from a JSON file: If your dataset already exists as a JSON file, you can upload it directly. Work through the "Uploading Data via Files" section of the DataSetAPI.ipynb notebook, which explains how to use the POST /dataSetApi/dataSet/FileUpload endpoint to create a dataset from a JSON file. If your dataset is in CSV format, you'll first need to convert it to JSON; the CSVtoJSONConversion.ipynb notebook provides a practical example and code to guide you through the conversion, and a sketch of the conversion-and-upload flow follows this list.
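
As a minimal sketch of this flow, the snippet below converts a CSV file to JSON with Python's standard library and then uploads the JSON file through the POST /dataSetApi/dataSet/FileUpload endpoint. The base URL, authentication header, and multipart field name are assumptions; the notebooks show the exact values for your environment.

    import csv
    import json

    import requests

    # Assumed values; replace with your instance URL and API token.
    BASE_URL = "https://your-instance.example.com"        # hypothetical
    HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}  # assumed auth scheme

    # Convert a CSV dataset to JSON: one object per test case row.
    with open("dataset.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    with open("dataset.json", "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)

    # Upload the JSON file to create the dataset. The multipart field
    # name "file" is an assumption; see DataSetAPI.ipynb for the exact form.
    with open("dataset.json", "rb") as f:
        response = requests.post(
            f"{BASE_URL}/dataSetApi/dataSet/FileUpload",
            headers=HEADERS,
            files={"file": f},
        )
    response.raise_for_status()
    print(response.json())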

Once you have created your dataset, the DataSetAPI.ipynb notebook also provides examples for managing your dataset, including:

  • Retrieving, updating, and deleting datasets.
  • Adding, modifying, and removing rows within a dataset.
  • Managing expected sources and filter variables associated with dataset rows.
  • Uploading dataset rows via file uploads (an illustrative row-management sketch follows this list).
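
As an illustration of row-level management, the sketch below adds a row to an existing dataset. Note that the /dataSetApi/dataSet/{dataSetId}/row path and the request body are hypothetical placeholders, not documented routes; follow the DataSetAPI.ipynb notebook for the real endpoints and payloads.

    import requests

    BASE_URL = "https://your-instance.example.com"        # hypothetical
    HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}  # assumed auth scheme

    data_set_id = "YOUR_DATASET_ID"
    # Hypothetical endpoint and payload for adding a single row;
    # consult DataSetAPI.ipynb for the documented route and fields.
    response = requests.post(
        f"{BASE_URL}/dataSetApi/dataSet/{data_set_id}/row",
        headers=HEADERS,
        json={
            "input": "What payment methods are accepted?",
            "expectedOutput": "Credit card and bank transfer.",
        },
    )
    response.raise_for_status()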

Step 2: Define the Evaluation Plan

Create an evaluation plan to evaluate your AI Assistant using the Evaluation Plan API, specifying the following:

  • The AI Assistant to be tested.
  • The dataset that will be used for testing.
  • The metrics that will be applied to assess the assistant's performance.

You can achieve this by working through the EvaluationPlanAPI.ipynb notebook, which provides examples and code snippets to:

  • Create, retrieve, update, and delete evaluation plans.
  • Associate system metrics with your evaluation plans and manage their weights.
  • Retrieve available system metrics and their details.
  • Execute a defined evaluation plan.
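
A minimal sketch of creating a plan appears below. The POST /evaluationPlanApi/evaluationPlan route and the payload fields (assistantName, dataSetId, systemMetrics) are assumptions made for illustration; the EvaluationPlanAPI.ipynb notebook documents the actual request shape and the available system metrics.

    import requests

    BASE_URL = "https://your-instance.example.com"        # hypothetical
    HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}  # assumed auth scheme

    # Hypothetical payload: the assistant under test, the dataset to run
    # it against, and weighted system metrics. Field names are assumptions;
    # see EvaluationPlanAPI.ipynb for the real schema.
    plan = {
        "assistantName": "my-assistant",
        "dataSetId": "YOUR_DATASET_ID",
        "systemMetrics": [
            {"name": "answer-relevance", "weight": 0.5},
            {"name": "answer-correctness", "weight": 0.5},
        ],
    }
    response = requests.post(
        f"{BASE_URL}/evaluationPlanApi/evaluationPlan",
        headers=HEADERS,
        json=plan,
    )
    response.raise_for_status()
    evaluation_plan_id = response.json().get("id")  # assumed response field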

Step 3: Execute the Evaluation Plan

Run the evaluation plan to initiate the testing process. The evaluation engine will:

  • Run the assistant on the input question from each row in the dataset.
  • Capture the assistant's response.
  • Apply the defined metrics to compare the actual results with the expected outputs.

You can execute an evaluation plan using the POST /evaluationPlanApi/evaluationPlan/{evaluationPlanId} endpoint of the Evaluation Plan API. Refer to the EvaluationPlanAPI.ipynb notebook for a practical example.
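
For example, a minimal call with Python's requests library might look like the sketch below; the route is the one named above, while the base URL and authentication header are assumptions for your environment.

    import requests

    BASE_URL = "https://your-instance.example.com"        # hypothetical
    HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}  # assumed auth scheme

    evaluation_plan_id = "YOUR_EVALUATION_PLAN_ID"
    # Execute the evaluation plan: the engine runs every dataset row
    # through the assistant and scores the responses.
    response = requests.post(
        f"{BASE_URL}/evaluationPlanApi/evaluationPlan/{evaluation_plan_id}",
        headers=HEADERS,
    )
    response.raise_for_status()
    print(response.json())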

Step 4: Retrieve and Analyze the Evaluation Results

Use the GET /evaluationResultApi/evaluationResult/{evaluationResultId} endpoint of the Evaluation Result API to retrieve the results of the executed evaluation plan. The results will include:

  • The assistant's responses for each test case.
  • The computed metric scores based on the expected vs. actual outputs.

The EvaluationResultAPI.ipynb notebook provides examples and code snippets for retrieving evaluation results.
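
As a sketch, the snippet below fetches a result and computes an average metric score. The response field names (rows, metricScores, score) are assumptions about the payload shape, not the documented schema; the EvaluationResultAPI.ipynb notebook shows the actual structure.

    import requests

    BASE_URL = "https://your-instance.example.com"        # hypothetical
    HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}  # assumed auth scheme

    evaluation_result_id = "YOUR_EVALUATION_RESULT_ID"
    response = requests.get(
        f"{BASE_URL}/evaluationResultApi/evaluationResult/{evaluation_result_id}",
        headers=HEADERS,
    )
    response.raise_for_status()
    result = response.json()

    # Assumed payload shape: one entry per test case, each carrying its
    # computed metric scores. Adjust to the real response schema.
    scores = [
        metric["score"]
        for row in result.get("rows", [])
        for metric in row.get("metricScores", [])
    ]
    if scores:
        print(f"Average metric score: {sum(scores) / len(scores):.3f}")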

Basic Example

For a basic walkthrough of using the DataSet, Evaluation Plan, and Evaluation Result APIs together, refer to the EvaluationAPITutorial.ipynb notebook.

Last update: March 2025 | © GeneXus. All rights reserved.