Self Query is a type of retriever used by the RAG Assistant to obtain information. To configure it, go to the Retrieval subsection and set the Retriever Type parameter to Self Query.

The Self Query retriever type analyzes the user's query, inferring metadata to narrow the search space.

For this to work effectively, make sure the following settings are in place:

  1. You must enter documents with metadata in one of the following ways:
     
  2. In addition, the RAG Assistant must know every metadata element available for filtering. Declare them in the Profile Metadata parameter as follows:
    {
        "chat": {
            "retriever": {
                "metadata": [
                    {
                      "description": "a brief description of the metadata element, will be used by the LLM to discover and generate a filter",
                      "name": "key element",
                      "type": "string|number"
                    }
                ]
            }
        }
    }
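
Before saving a Profile Metadata document, it can help to check it locally. The following is a minimal sketch, not part of the product: `validate_profile_metadata` is a hypothetical helper, and it assumes that `string` and `number` are the only accepted types, as the template above suggests.

```python
import json

# Assumption: the template above lists "string|number" as the only valid types.
ALLOWED_TYPES = {"string", "number"}


def validate_profile_metadata(raw_json: str) -> list[str]:
    """Return a list of problems found in a Profile Metadata document."""
    # json.loads raises ValueError on invalid JSON, for example a trailing comma.
    profile = json.loads(raw_json)
    elements = profile.get("chat", {}).get("retriever", {}).get("metadata", [])

    problems = []
    if not elements:
        problems.append("no metadata elements under chat.retriever.metadata")
    for index, element in enumerate(elements):
        # Every element needs a non-empty description, name, and type.
        for key in ("description", "name", "type"):
            if not element.get(key):
                problems.append(f"element {index}: missing '{key}'")
        declared_type = element.get("type")
        if declared_type and declared_type not in ALLOWED_TYPES:
            problems.append(f"element {index}: unsupported type '{declared_type}'")
    return problems
```

An empty list means that no structural problem was detected; it does not guarantee that the descriptions are good enough for the LLM to infer filters.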
    

Execution

When querying a RAG Assistant configured with the Self Query retriever, the following happens:

  • User: asks a query.
  • System: executes an internal LLM call that maps the query to a structured request and detects whether any filters apply.
  • System: retrieves relevant information from the vector store based on the query and, if any were detected, the filters.
  • System: executes the final LLM call with the context obtained from the previous steps using the associated configuration (such as prompt, temperature, score threshold, chunk count, and so on).
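
The steps above can be sketched as a toy pipeline. This is only an illustration of the flow, not the platform's implementation: every name is hypothetical, keyword matching stands in for the internal LLM call, word overlap stands in for vector similarity, and the final LLM call is reduced to building its input.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def detect_filters(query: str, known_countries: list[str]) -> dict:
    """Stand-in for the internal LLM call: plain keyword matching."""
    lowered = query.lower()
    for country in known_countries:
        if country.lower() in lowered:
            return {"country": country}
    return {}  # no filter detected


def retrieve(query: str, chunks: list[Chunk], filters: dict, chunk_count: int = 2) -> list[Chunk]:
    """Stand-in for the vector store: metadata filter first, then word-overlap ranking."""
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in filters.items())
    ]
    query_words = set(query.lower().split())
    candidates.sort(
        key=lambda c: len(query_words & set(c.text.lower().split())),
        reverse=True,
    )
    return candidates[:chunk_count]


def build_final_prompt(query: str, context: list[Chunk]) -> str:
    """Assemble the input of the final LLM call (the call itself is omitted)."""
    context_text = "\n".join(c.text for c in context)
    return f"Context:\n{context_text}\n\nQuestion: {query}"


def answer(query: str, chunks: list[Chunk], known_countries: list[str]):
    filters = detect_filters(query, known_countries)
    context = retrieve(query, chunks, filters)
    return filters, build_final_prompt(query, context)
```

When no filter is detected, `retrieve` falls back to ranking every chunk, which mirrors the "filters if available" wording in the third step.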

Sample

Suppose some of your documents can be ingested with the following metadata:

  • country
  • documentid
  • publishdate
  • year

Then, in the Profile Metadata of the RAG Assistant, you must set up something similar to the following:

{
    "chat": {
        "retriever": {
            "metadata": [
                {
                 "description": "The country associated to the document",
                 "name": "country",
                 "type": "string"
                },
                {
                 "description": "The document identifier (formatted as 'XX-XX')",
                 "name": "documentid",
                 "type": "string"
                },
                {
                 "description": "The document publish date (formatted as 'YYYYMMDD')",
                 "name": "publishdate",
                 "type": "string"
                },
                {
                 "description": "The document publish year",
                 "name": "year",
                 "type": "number"
                }
            ]
        }
    }
}

Important: If you know the possible values of a metadata element, list them in the description attribute; you can also add a couple of samples. In this sample, if you have a known list of countries associated with the documents, you can specify them in the description as follows:

{
    "description": "The country associated to the document ('Country1', 'Country2', ... 'CountryN')",
    "name": "country",
    "type": "string"
}
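
If the list of known values is maintained elsewhere, the description can be generated instead of typed by hand. A minimal sketch, assuming a hypothetical helper named `describe_with_values` and illustrative country names:

```python
import json


def describe_with_values(base: str, values: list[str]) -> str:
    """Append the known values, single-quoted, to a base description."""
    quoted = ", ".join(f"'{value}'" for value in values)
    return f"{base} ({quoted})"


# Illustrative values; replace them with the countries in your own documents.
element = {
    "description": describe_with_values(
        "The country associated to the document", ["Spain", "Uruguay"]
    ),
    "name": "country",
    "type": "string",
}

print(json.dumps(element, indent=4))
```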

The following questions narrow the search to only those documents with the associated metadata by executing a semantic + metadata search on the vector store.

| Question | Filters |
|---|---|
| What are the considerations for documentid "AB-12"? | documentid "AB-12" |
| What is the adopted strategy for the country Spain? | country "Spain" |
| What are the priority areas of the 2022 to 2025 year for country Spain? | year between 2022 and 2025; country "Spain" |
| Highlights for country "Spain" publish date between November and December 2023 | publishdate between 20231101 and 20231130; country "Spain" |
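
The publishdate range filter works on a string-typed element because `YYYYMMDD` strings sort in chronological order when compared as plain text. The sketch below illustrates this with in-memory data; the documents and the `in_range` helper are hypothetical and do not reflect how the vector store evaluates filters internally.

```python
# Hypothetical documents carrying the metadata elements from the sample.
documents = [
    {"documentid": "AB-12", "country": "Spain", "publishdate": "20231115"},
    {"documentid": "CD-34", "country": "Spain", "publishdate": "20231205"},
    {"documentid": "EF-56", "country": "France", "publishdate": "20231120"},
]


def in_range(doc: dict, country: str, start: str, end: str) -> bool:
    """Equality on country plus an inclusive range on publishdate."""
    # String comparison is enough: zero-padded YYYYMMDD values order by date.
    return doc["country"] == country and start <= doc["publishdate"] <= end


matching = [
    doc["documentid"]
    for doc in documents
    if in_range(doc, "Spain", "20231101", "20231130")
]
print(matching)  # ['AB-12']
```

A format such as `DD/MM/YYYY` would not have this property, so it is worth stating the expected format in the description attribute.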

 

Read the Backoffice Requests section to follow up on how this process is done and what filters were discovered (if any) for the questions.

Advanced

When you click the Set Default Prompt button, the Retriever prompt is automatically set to the following:

Your goal is to structure the user's query to match the request schema provided below.

<< Structured Request Schema >>
When responding use a markdown code snippet with a JSON object formatted in the following schema:

```json
{{
    \"query\": string \\ text string to compare to document contents
    \"filter\": string \\ logical condition statement for filtering documents
}}
```

The query string should contain only text that is expected to match the contents of documents. Any conditions in the filter should not be mentioned in the query as well.

A logical condition statement is composed of one or more comparison and logical operation statements.

A comparison statement takes the form: `comp(attr, val)`:
- `comp` (eq | ne | gt | gte | lt | lte): comparator
- `attr` (string):  name of attribute to apply the comparison to
- `val` (string): is the comparison value

A logical operation statement takes the form `op(statement1, statement2, ...)`:
- `op` (and | or): logical operator
- `statement1`, `statement2`, ... (comparison statements or logical operation statements): one or more statements to apply the operation to

Make sure that you only use the comparators and logical operators listed above and no others.
Make sure that filters only refer to attributes that exist in the data source.
Make sure that filters only use the attributed names with its function names if there are functions applied on them.
Make sure that filters only use format `YYYYMMDD` when handling timestamp data typed values.
Make sure that filters take into account the descriptions of attributes and only make comparisons that are feasible given the type of data being stored.
Make sure that filters are only used as needed. If there are no filters that should be applied return \"NO_FILTER\" for the filter value.
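
To make the filter grammar in this prompt concrete, the following sketch evaluates a filter against one document's metadata. It is an illustration only: the filter is written as nested Python tuples rather than parsed from the `comp(attr, val)` text form, and `evaluate` is a hypothetical name, not a platform function.

```python
import operator

# The comparators and logical operators listed in the prompt.
COMPARATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}
LOGICAL = {"and": all, "or": any}


def evaluate(statement: tuple, metadata: dict) -> bool:
    """Evaluate a filter statement (nested tuples) against a metadata dict."""
    name, *args = statement
    if name in COMPARATORS:
        attr, val = args
        if attr not in metadata:
            return False  # a missing attribute never matches
        return COMPARATORS[name](metadata[attr], val)
    if name in LOGICAL:
        return LOGICAL[name](evaluate(sub, metadata) for sub in args)
    raise ValueError(f"unsupported comparator or operator: {name}")


# Equivalent of: and(eq("country", "Spain"), gte("year", 2022), lte("year", 2025))
spain_2022_2025 = (
    "and",
    ("eq", "country", "Spain"),
    ("gte", "year", 2022),
    ("lte", "year", 2025),
)
```

For example, `evaluate(spain_2022_2025, {"country": "Spain", "year": 2023})` returns `True`, while the same call with `"year": 2021` returns `False`.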

You can customize the Retriever Prompt shown below by modifying everything up to the samples section. To add custom samples, place them at the bottom of the prompt.

Your goal is to structure the user's query to match the request schema provided below.

<< Structured Request Schema >>

When responding use a markdown code snippet with a JSON object formatted in the following schema:

```json
{{{{
    "query": string \\ text string to compare to document contents
    "filter": string \\ logical condition statement for filtering documents
}}}}
```

The query string should contain only text that is expected to match the contents of documents. Any conditions in the filter should not be mentioned in the query as well.

A logical condition statement is composed of one or more comparison and logical operation statements.

A comparison statement takes the form: `comp(attr, val)`:

- `comp` ({allowed_comparators}): comparator
- `attr` (string):  name of attribute to apply the comparison to
- `val` (string): is the comparison value

A logical operation statement takes the form `op(statement1, statement2, ...)`:
- `op` ({allowed_operators}): logical operator
- `statement1`, `statement2`, ... (comparison statements or logical operation statements): one or more statements to apply the operation to

Make sure that you only use the comparators and logical operators listed above and no others.
Make sure that filters only refer to attributes that exist in the data source.
Make sure that filters only use the attributed names with its function names if there are functions applied on them.
Make sure that filters only use format `YYYY-MM-DD` when handling timestamp data typed values.
Make sure that filters take into account the descriptions of attributes and only make comparisons that are feasible given the type of data being stored.
Make sure that filters are only used as needed. If there are no filters that should be applied return "NO_FILTER" for the filter value.

{samples} // will be injected with FewShot samples and the metadata detailed above

User Query:
{query}

Structured Request:

Note: Keep in mind that not all of the prompt is modifiable.
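
The doubled braces and the `{allowed_comparators}`, `{allowed_operators}` placeholders in the customizable prompt look like Python-style format fields. Assuming that is how the template is processed (an assumption, not something this page confirms), each formatting pass halves the escaped braces, which would explain why the customizable prompt uses `{{{{` where the default prompt shows `{{`. The sketch uses a shortened excerpt of the prompt.

```python
# Shortened excerpt of the customizable prompt, for illustration only.
template = (
    "{{{{\n"
    '    "query": string\n'
    '    "filter": string\n'
    "}}}}\n"
    "- `comp` ({allowed_comparators}): comparator\n"
    "- `op` ({allowed_operators}): logical operator\n"
)

# First pass: the named placeholders are filled and "{{{{" becomes "{{".
first_pass = template.format(
    allowed_comparators="eq | ne | gt | gte | lt | lte",
    allowed_operators="and | or",
)

# Second pass: "{{" becomes a literal "{".
second_pass = first_pass.format()

print(second_pass)
```

If you edit the prompt, keep the braces around the JSON schema doubled exactly as they appear; a single unescaped brace would be read as a placeholder under this assumption.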

 

Last update: September 2024 | © GeneXus. All rights reserved. GeneXus Powered by Globant