Profile Metadata is a parameter you can set in the Retrieval tab of the RAG Assistant. It enables the use of advanced configurations for RAG Assistants.
The default value is an empty object {}.
You can use the filter and chat options in Profile Metadata as follows:
{
    "filter": [...], // list of fixed filters to apply
    "chat": {...}    // complete list of chat elements to override
}
The filter section enables the use of fixed filters on every question. Each filter is an item with key, value, and operator elements. You can find more details in Filter Operators.
For example, to use only the SampleFile.txt file, you can apply the following filter:
{"filter":
[
{"key": "name", "value":"SampleFile", "operator":"$eq"},
{"key": "extension", "value": "txt", "operator": "$eq"}
]
}
The chat section allows you to tune each call to the Large Language Model (LLM) to achieve the desired results.
chat.search.llm configures the LLM settings used to summarize the user query and the conversation history into a new query that takes the history into account. By default, the assistant's LLM settings are used; however, you can customize them for this specific call.
For example, a valid configuration is as follows:
{
    "chat": {
        "search": {
            "llm": {
                "cache": true,
                "maxTokens": 1002,
                "modelName": "gpt-4o-mini",
                "provider": "openai"
            }
        }
    }
}
chat.retriever.llm configures the LLM settings for the retriever prompt. You can use this to specifically configure the LLM for the retriever.
For example, a valid configuration is as follows:
{
    "chat": {
        "retriever": {
            "llm": {
                "cache": true,
                "maxTokens": 1003,
                "modelName": "gpt-4o-mini",
                "provider": "openai"
            },
            "queryCount": 3
        }
    }
}
Note that, in this case, the queryCount value is also set to 3, which is specific to Multi Query.
The queryCount parameter allows you to adjust the number of queries generated from different perspectives to obtain a wider set of potentially relevant documents.
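For instance, if you only want to adjust the number of generated queries without overriding the retriever LLM, a minimal sketch could look like this (the value 5 is purely illustrative, not a recommendation):
{
    "chat": {
        "retriever": {
            "queryCount": 5 // illustrative value
        }
    }
}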
Advanced options:
By default, document chunks are returned with the answer to the question. If you want to only return the answer (excluding the sources), disable the returnSourceDocuments parameter as follows:
{
    "chat": {
        "search": {
            "returnSourceDocuments": false
        }
    }
}
documentAggregation allows you to control how document chunks are grouped and displayed in the results. Instead of receiving a long list of individual fragments (associated with document chunks), you can group them by document, or by document and page.
The documentAggregation parameter accepts the following values:
- empty (default): When the value is empty or the element is not present, all individual document chunks are returned separately.
- mergeByDocument: Merges all chunks belonging to the same document (same document identifier) into a single entry. This aggregates the pageContent (document chunk) for each document.
- mergeByDocumentPage: Merges chunks with the same document identifier and page number into a single entry. This option is only applicable when a pageNumber is available.
With an empty value, all elements are returned without any processing:
{
    "chat": {
        "search": {
            "documentAggregation": ""
        }
    }
}
If several chunks are from the same document, only one element per document is returned:
{
    "chat": {
        "search": {
            "documentAggregation": "mergeByDocument"
        }
    }
}
If several chunks are from the same document and page, only one element per document and page is returned:
{
    "chat": {
        "search": {
            "documentAggregation": "mergeByDocumentPage"
        }
    }
}
Note: When using mergeByDocument or mergeByDocumentPage, the returned score will be the highest score among the merged chunks.
The retriever section supports the following advanced configuration options:
- searchType
- useOriginalQuery¹
- metadata¹
- step
¹ Used in the context of the Self Query Use Case (see the sketch below).
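As a rough sketch only, and not a confirmed schema: assuming useOriginalQuery is a boolean flag and metadata describes the document attributes available to the Self Query retriever, a configuration could look like this (the attribute names year and category, and the item fields name, description, and type, are purely illustrative):
{
    "chat": {
        "retriever": {
            "useOriginalQuery": true, // assumed boolean semantics
            "metadata": [
                {"name": "year", "description": "Publication year of the document", "type": "integer"},
                {"name": "category", "description": "Business category of the document", "type": "string"}
            ]
        }
    }
}
Refer to the Self Query Use Case documentation for the exact fields these parameters expect.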
searchType is used in conjunction with specific Embeddings models to perform the retrieval; some models have different modes for ingestion and for query (check the available option values here). Configuration examples for querying (if the model supports it) are:
{
    "chat": {
        "retriever": {
            "searchType": "RETRIEVAL_QUERY" // for example using text-embedding-004 or 005
        }
    }
}
{
    "chat": {
        "retriever": {
            "searchType": "query" // for example using nv-embedqa-e5-v5
        }
    }
}
If you are using Azure AI Search, the semantics are different and a separate set of search type values is available.
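As an illustrative sketch only (the value below is an assumption based on common Azure AI Search integrations, not a confirmed list of supported values), the configuration follows the same chat.retriever.searchType structure:
{
    "chat": {
        "retriever": {
            "searchType": "semantic_hybrid" // assumed value; check your Azure AI Search configuration for the supported search types
        }
    }
}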
The step parameter allows you to control the RAG process execution. It's useful when you only need to retrieve documents related to a question from the Vector Store, skipping Augmentation and Generation.
The step parameter accepts two values:
- "all" (default): Executes all steps of the RAG process
- "documents": Retrieves only the associated documents from the Vector Store
When you set the "documents" value, the assistant will return the relevant documents according to its configuration. Note that in this case, the text element (which is typically filled during the Generation step) will be empty.
To configure the step parameter, use the following structure:
{
    "chat": {
        "retriever": {
            "step": "documents"
        }
    }
}
By using this configuration, you can efficiently retrieve relevant documents without executing the full RAG process, which can be beneficial for certain use cases or performance optimization.
When using the cohere.embed-english-v3 multimodal embeddings model, you need to configure the multimodalBatchSize parameter to 1. This is because images can only be ingested one at a time. This extra configuration ensures that the RAG assistant can correctly ingest the information when processing files.
To configure this, you need to add the following configuration:
{
    "chat": {
        "ingestion": {
            "multimodalBatchSize": 1
        }
    }
}
If this parameter is not set correctly, the error "total number of images must be at most 1" may appear.
The selections for Supported Chat Models and Embeddings providers are those catalogued by the system. If you need to use a Beta or unregistered model (which you know is available), you can override the configuration using the chat.embeddings or chat.llm properties as follows:
{
    "chat": {
        "embeddings": {
            "provider": "pinecone",
            "modelName": "llama-text-embed-v2"
        }
    }
}
{
    "chat": {
        "llm": {
            "provider": "vertex_ai",
            "modelName": "some_fancy_new_model"
        }
    }
}
Please review Guardrails applied to RAG Assistants.
You can combine several of the previous configurations. For example, to combine filter and chat configurations, use the following:
{
    "chat": {
        "retriever": {
            "llm": {
                "cache": true,
                "maxTokens": 1003,
                "modelName": "gpt-4o",
                "provider": "openai"
            },
            "queryCount": 3
        },
        "search": {
            "llm": {
                "cache": true,
                "maxTokens": 1002,
                "modelName": "gpt-4o-mini",
                "provider": "openai"
            }
        }
    },
    "filter": [
        {"key": "name", "value": "SampleFile", "operator": "$eq"},
        {"key": "extension", "value": "txt", "operator": "$eq"}
    ]
}
Note that the values to be configured in the llm are the same as those described in the LLM Settings section.
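As a hedged illustration (temperature below is assumed to be one of the settings described in LLM Settings; check that section for the authoritative list of keys), a more complete llm object could look like this:
{
    "chat": {
        "search": {
            "llm": {
                "provider": "openai",
                "modelName": "gpt-4o-mini",
                "maxTokens": 1002,
                "cache": true,
                "temperature": 0 // assumed key; see LLM Settings for the supported values
            }
        }
    }
}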