Here is a step-by-step guide on how to use the Data Analyst Assistant, upload a document, and start chatting with the assistant from the playground.
First, enter the Globant Enterprise AI Backoffice. On the left side of the screen, you can find the backoffice menu. In this menu, click on Assistants. Then click on the CREATE DATA ANALYST ASSISTANT button.
Next, in the Project Dynamic Combo Box, select the project you want to work with (in this case, the Default one is used).
Metadata is descriptive information about the data contained in a dataset. It is used to provide context and better understand the available data.
Metadata usually includes details such as the name of the dataset, a description of its contents, and the structure of its columns (including data types and possible values).
This metadata is used to define how the dataset should be processed and to clarify the dataset to be analyzed.
To complete the Metadata (JSON) correctly, you must define a JSON with the following structure:
{
  "dataset_name": {
    "dataframe name": "dataset_name",
    "description": "contains data regarding different types of revenues, their types, dates, and associated details.",
    "column explanations": {
      "column1": "description of column 1. dtype: dtype1",
      "column2": "description of column 2. dtype: dtype2",
      ...
      "columnN": "description of column N. dtype: dtypeN"
    },
    "considerations": {
      "coder": [
        "Describe your consideration here",
        ...
        "Example of consideration: use column1 to determine the type"
      ],
      "interpreter": [
        "Example 1",
        "Example N"
      ]
    }
  }
}
Where:
- "dataframe name" must contain the name of the dataset and correspond to the name of one of the .csv files to be uploaded.
- "description" must provide a description of the dataset, including the data types of its columns and possible values.
- "column explanations" must include a description of each dataset column, specifying its purpose and the type of data it contains.
- "colum1", ..., "columN": Exact names (without blanks or special characters) of the columns of the .csv files to be loaded.
The number of columns in the .csv files must match the number of columns provided in the metadata. In addition, it is important to avoid duplication of column names.
- dtype: must be one of the following types:
- string
- float
- integer
- datetime: must be one of the following formats:
- '%Y-%m-%d'
- '%d-%m-%Y'
- '%Y/%m/%d'
- '%d/%m/%Y'
- '%m/%d/%Y'
- '%Y/%d/%m'
- date
- bool
- biginteger
- "considerations" must detail specific considerations for the coder and the interpreter.
- "coder": It generates the necessary code to process the data. In addition, it uses the Metadata information to guide the process of extracting relevant information from the data.
The coder may include considerations that address consistency in data formats, accuracy and completeness of records, validation of data with external sources, and assessment of the importance of each piece of data during analysis.
- "interpreter": Its function is to produce the final answer with the information obtained from the data processing performed by the coder. It analyzes the processing results to better understand the data and provide relevant answers.
It may include considerations indicating the understanding of technical terms and abbreviations used in the data, as well as specifying that responses should be returned in the language preferred by the end user or according to the context of use.
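Putting the rules above together, the following sketch shows how a metadata definition can be checked against a CSV header before uploading. It is a plain-Python, stdlib-only illustration; the dataset name, columns, and the `validate_metadata` helper are this guide's own examples, not part of the product.

```python
import csv
import io

# dtypes accepted by the metadata spec described above
ALLOWED_DTYPES = {"string", "float", "integer", "datetime", "date", "bool", "biginteger"}

def validate_metadata(metadata: dict, csv_text: str, dataset_name: str) -> list[str]:
    """Return a list of problems found; an empty list means metadata and CSV agree."""
    entry = metadata.get(dataset_name)
    if entry is None:
        return [f"metadata has no entry for '{dataset_name}'"]
    problems = []
    if entry.get("dataframe name") != dataset_name:
        problems.append("'dataframe name' does not match the dataset key")
    columns = entry.get("column explanations", {})
    header = next(csv.reader(io.StringIO(csv_text)))
    if set(header) != set(columns):
        problems.append(f"CSV columns {header} differ from metadata columns {sorted(columns)}")
    if len(header) != len(set(header)):
        problems.append("CSV contains duplicated column names")
    for name, explanation in columns.items():
        # The guide requires each explanation to end with 'dtype: <type>'.
        dtype = explanation.rsplit("dtype:", 1)[-1].strip() if "dtype:" in explanation else None
        if dtype not in ALLOWED_DTYPES:
            problems.append(f"column '{name}' has a missing or invalid dtype: {dtype!r}")
    return problems

# Illustrative metadata for a hypothetical 'revenues' dataset
metadata = {
    "revenues": {
        "dataframe name": "revenues",
        "description": "Monthly revenue records by type.",
        "column explanations": {
            "revenue_type": "Category of the revenue. dtype: string",
            "amount": "Revenue amount in USD. dtype: float",
            "revenue_date": "Date of the revenue, format '%Y-%m-%d'. dtype: datetime",
        },
        "considerations": {"coder": [], "interpreter": []},
    }
}

print(validate_metadata(metadata, "revenue_type,amount,revenue_date\n", "revenues"))  # []
```

Running a check like this locally catches column mismatches before the assistant's dataset upload fails during processing.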
The Glossary is a JSON object that contains a list of terms used in the end user's company or domain, together with their respective definitions. These terms and definitions help the LLM understand the questions and provide more accurate answers.
For example, you can include abbreviations that are often used in the company and their meanings for better understanding in a format such as the following:
{
"glossary": {
"term1": "Definition of term 1",
"term2": "Definition of term 2",
"term3": "Definition of term 3",
"abbr1": "Abbreviation 1 - Meaning of abbreviation 1",
"abbr2": "Abbreviation 2 - Meaning of abbreviation 2",
"abbr3": "Abbreviation 3 - Meaning of abbreviation 3"
}
}
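As an illustration of why the glossary matters, the hypothetical helper below shows how term definitions can supply the context an LLM needs to resolve a company abbreviation in a question. The function, terms, and matching logic are this guide's own sketch, not the product's internal mechanism.

```python
import json

# Illustrative glossary in the format described above
glossary_json = """
{
  "glossary": {
    "ARR": "Annual Recurring Revenue - subscription revenue normalized to a one-year period",
    "churn": "The rate at which customers stop doing business with the company"
  }
}
"""

glossary = json.loads(glossary_json)["glossary"]

def expand_terms(question: str) -> str:
    """Append glossary definitions for any known term mentioned in the question."""
    hits = [f"{term}: {definition}" for term, definition in glossary.items()
            if term.lower() in question.lower()]
    return question if not hits else question + "\n(Glossary: " + "; ".join(hits) + ")"

print(expand_terms("What was ARR last quarter?"))
```

With the definition attached, a question like "What was ARR last quarter?" becomes unambiguous even if "ARR" never appears in the dataset's column names.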
Upload a dataset file by clicking on the +Add files... button.
Make sure that the CSV file follows the descriptions and metadata previously added.
Do not use uppercase letters in the file name; use this_notation.csv instead of ThisNotation.csv.
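A small helper along these lines (hypothetical, not part of the product) can normalize file names to the required lowercase underscore notation before uploading:

```python
import re

def to_snake_case_filename(name: str) -> str:
    """Normalize a dataset file name: CamelCase, spaces, hyphens -> lowercase_with_underscores."""
    stem, dot, ext = name.rpartition(".")
    stem = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", stem)  # split CamelCase boundaries
    stem = re.sub(r"[\s\-]+", "_", stem)                 # spaces and hyphens -> underscores
    return stem.lower() + dot + ext.lower()

print(to_snake_case_filename("ThisNotation.csv"))  # this_notation.csv
```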
Then click on SAVE to create the assistant.
After you click on the SAVE button, a window will appear to enter a Name, Description, and Icon:
Next, click on the CONFIRM button.
Once you have created the Data Analyst Assistant, you return to the Assistants page where you can see the current assistant creation status.
This status is related to the process of loading the datasets with the Metadata and Glossary information.
The possible statuses are FAILED, COMPLETED, and PROCESSING (with a progress percentage).
In this case, it is being processed with a progress of 5.4%.
Once the upload status is complete, you can click on UPDATE and view the version identifier under which it was saved. In addition, you can change the name and description, enable or disable the assistant, or add an icon if you wish.
To edit the Data Analyst Assistants, go to the Assistants home page and click on EDIT.
A window similar to the one below will open, where you will have the option to make one of the following edits in the General section:
- Extend datasets
- Replace datasets
- Update prompts
- Clean cache
- Clean data
Selecting the Extend datasets option allows you to add records to existing datasets in any Data Analyst Assistant.
Simply click on + Add files... and then click on the SAVE IN CURRENT REVIEW button.
This expands the information available without affecting the original structure of the dataset.
When using this function, the files selected for the extension must meet the same conditions required when the assistant was first defined, such as having the same columns and the same file names.
Make sure the extension files do not contain rows that are already loaded in the assistant, as duplicates could generate duplicate keys and cause problems when executing queries.
Note that in Extend datasets mode it is not possible to change the metadata or the glossary. This function focuses exclusively on extending existing datasets without altering other aspects of the assistant.
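Before extending a dataset, the conditions above can be verified locally. This sketch (an illustrative helper using the stdlib csv module, not a product feature) compares an extension file against the currently loaded data, checking for a matching header and already-loaded rows:

```python
import csv
import io

def check_extension(existing_csv: str, extension_csv: str) -> list[str]:
    """Verify an extension file has the same header and no rows already loaded."""
    existing_rows = list(csv.reader(io.StringIO(existing_csv)))
    new_rows = list(csv.reader(io.StringIO(extension_csv)))
    problems = []
    if existing_rows[0] != new_rows[0]:
        problems.append("header of the extension file does not match the loaded dataset")
    # Rows already present would create duplicate keys after the extension.
    seen = {tuple(row) for row in existing_rows[1:]}
    dupes = [row for row in new_rows[1:] if tuple(row) in seen]
    if dupes:
        problems.append(f"{len(dupes)} row(s) already exist in the loaded dataset")
    return problems

existing = "revenue_type,amount\nlicense,100\nsupport,40\n"
extension = "revenue_type,amount\nsupport,40\nconsulting,75\n"
print(check_extension(existing, extension))  # ['1 row(s) already exist in the loaded dataset']
```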
Selecting the Replace datasets option will display a window like the one below:
This option allows you to completely replace the data in an existing dataset, as well as modify the associated metadata and glossary.
This is useful when you need to completely update the information or make significant changes to the structure and description of the dataset.
Selecting this option overwrites all previous data with the new data provided and updates the metadata and glossary as necessary.
Note that it is mandatory to select one or more dataset files to load. In addition, the previously loaded dataset information will be deleted so that the new data can be processed and saved.
When you select Update prompts, you will see a window similar to the one below, where you will find a list of the 28 default prompts.
This function allows you to query and update each of the prompts individually by clicking on UPDATE.
This action requires that you have the Administrator, ProjectRole, or OrganizationRole role.
When you click on UPDATE, the current contents of the prompt will be displayed so that it can be modified.
Once the changes are confirmed by clicking on the CONFIRM button, you will be returned to the prompts list. If the editing is cancelled by clicking on the CANCEL button, you will also return to the prompts list without making any changes.
It is important to note that the Update prompts option has no SAVE IN CURRENT REVIEW button; only a RETURN button is offered. This is because prompt updates are applied individually from the prompts list, not at the general level of the assistant.
When you generate a new version of the assistant by making changes with the Extend datasets or Replace datasets options, the default prompts will be used. This means that changes made to prompts in previous versions of the assistant will be ignored.
Finally, you can test your Data Analyst Assistant by clicking on Playground on the left side menu of the Backoffice window:
It is also possible to interact with the assistant through the Chat API.
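As a rough sketch of what a Chat API call could look like: the endpoint URL, header names, and payload fields below are placeholders assumed for illustration only; consult the Globant Enterprise AI Chat API documentation for the actual contract.

```python
import json
import urllib.request

# NOTE: URL, token, and payload shape are hypothetical placeholders, not the real API.
API_URL = "https://your-instance.example.com/chat"   # hypothetical endpoint
API_TOKEN = "your-project-api-token"                 # hypothetical credential

def build_chat_request(assistant: str, question: str) -> dict:
    """Assemble a chat payload addressed to a specific assistant (illustrative shape)."""
    return {"assistant": assistant, "messages": [{"role": "user", "content": question}]}

def ask(assistant: str, question: str) -> str:
    """Send the question and return the assistant's answer (illustrative response shape)."""
    payload = json.dumps(build_chat_request(assistant, question)).encode()
    req = urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("answer", "")
```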