In this section, you can set up instructions that guide the assistant on how to approach and answer questions. These instructions establish clear guidelines for the assistant to provide relevant and useful answers based on the context provided. The default value is:
You are a helpful AI assistant. Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say you don't know. DO NOT try to make up an answer.
If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.
Use all this context to answer:
<context>
{context}
</context>
Question: {question}
Helpful answer in markdown gives an extensive response:
Keep the context and question variables in place, because they will be replaced with the associated information before the interaction.
The context variable can be customized and will reflect the chunks obtained from the VectorStore that are closest to the question.
The question variable will reflect what the customer asked.
The context and question variables are mandatory in a RAG prompt; otherwise, the following error is displayed:
Missing {context}|{question} in prompt definition
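As an illustration of how this replacement works, the following Python sketch mimics the substitution of {context} and {question} before the prompt reaches the model; the helper function and chunk values are hypothetical and not part of the product API.

```python
# Illustrative sketch only: it mimics how {context} and {question} could be
# substituted before the prompt is sent to the model. The helper and the
# chunk values are hypothetical, not part of the product API.
PROMPT_TEMPLATE = (
    "You are a helpful AI assistant. Use the following pieces of context "
    "to answer the question at the end.\n"
    # ...remaining default instructions omitted for brevity...
    "<context>\n{context}\n</context>\n"
    "Question: {question}\n"
    "Helpful answer in markdown gives an extensive response:"
)

def build_prompt(chunks: list[str], question: str) -> str:
    # The chunks retrieved from the VectorStore fill {context};
    # the end user's text fills {question}.
    return PROMPT_TEMPLATE.format(context="\n\n".join(chunks), question=question)

print(build_prompt(["First retrieved chunk...", "Second retrieved chunk..."], "What is covered?"))
```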
It defines how many chunks are retrieved from the VectorStore to augment the context. It defaults to 5 for assistants created since 6/08/2024, and to 2 otherwise.
Check the Context Prompt Template to customize how each chunk is handled.
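As a rough sketch of how the chunk count and the per-chunk Context Prompt Template interact, the snippet below retrieves the closest chunks and renders each one before joining them into {context}; the retriever object and the per-chunk template shown here are assumptions for illustration, not the product API.

```python
# Hypothetical sketch: retrieve the top-k chunks closest to the question and
# render each one with a per-chunk template before joining them into {context}.
# The retriever object and the template text are assumptions, not the real API.
CONTEXT_PROMPT_TEMPLATE = "Source: {source}\nContent: {text}"  # illustrative per-chunk template

def build_context(retriever, question: str, chunk_count: int = 5) -> str:
    # each chunk is assumed to be a dict with 'source' and 'text' keys
    chunks = retriever.search(question, top_k=chunk_count)     # chunks closest to the question
    rendered = [CONTEXT_PROMPT_TEMPLATE.format(**chunk) for chunk in chunks]
    return "\n\n".join(rendered)
```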
It sets the number of historical messages that are taken into account in the conversation.
This is useful for tracking the interaction history and understanding the context gathered in the conversation. Note that this value counts both the end user's questions and the associated answers. That is, if the "History Message Count" value is set to 4, the last 4 messages are considered:
- Previous Question
- Previous Answer
- Last Question
- Last Answer
The minimum value it can take is 0, which indicates that the conversation history is not taken into account.
When the value is greater than 0, it is used together with the "History Prompt" parameter.
It applies when the History Message Count is higher than 0. It compacts the previous interactions so that valuable context data is not lost. Check here for more detail and for how to manage the history in a RAG assistant.
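As a hedged sketch of how such a history window might be selected, the snippet below keeps the last N messages of a conversation; the structures and names are illustrative, not the product's internal API.

```python
# Hypothetical sketch: keep only the last `history_message_count` messages so
# the model can use the recent conversational context. A value of 0 means the
# history is not taken into account.
def select_history(conversation: list[dict], history_message_count: int) -> list[dict]:
    if history_message_count <= 0:
        return []
    return conversation[-history_message_count:]   # most recent messages, in chronological order

history = select_history(
    [
        {"role": "user", "content": "Previous Question"},
        {"role": "assistant", "content": "Previous Answer"},
        {"role": "user", "content": "Last Question"},
        {"role": "assistant", "content": "Last Answer"},
    ],
    history_message_count=4,
)
```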
Configuration of the model used by the assistant to generate the answer, including the service provider, model name, temperature, maximum token limit, and other important parameters that affect how answers are generated. Check available options here.
The LLM configuration parameters are listed below:
It indicates the language model service provider used. The possible values it can take are: "openai", "azureopenai", "cohere", "google", "googlevertexai", "anthropic", "nvidia.nemo", "bedrock".
API authentication key provided to access the language model service. For example, if the Provider parameter is "azureopenai", you must specify "apiKey" with the authentication key.
It specifies the name of the language model used. For example, if the Provider parameter has the value "openai", modelName takes the value 'gpt-3.5-turbo' ('gpt-4o-mini' for assistants created since 6/08/2024). Check available models here.
It controls the randomness of response generation, with values represented by decimal numbers. A value of 0.00 indicates a more deterministic generation, while higher values introduce more variability and surprise in the generated answers.
It sets the maximum number of tokens in an answer. The setting is expressed as an integer; for example, 1000. Check the maximum value supported by the selected model.
It controls the diversity of the generated answers by determining the extent of the options considered during the text generation process. It is expressed as a decimal value, where a value of 1 indicates that all possible options are considered.
This value is deprecated. Check Chat API for streaming usage.
It enables or disables detailed output from the model during the generation process. It is configured with Boolean values, where 'true' enables the detailed output and 'false' disables it.
It controls whether to search the cache if the query has already been made. It is configured with Boolean values, where 'true' enables caching and 'false' disables it. Note that all assistant configuration values are taken into account in the cache search.
It tunes the text generation process to favor diversity and reduce the repetition of frequent terms, thus optimizing the quality and variety of the generated answers. It is configured with decimal numeric values from -2.0 to 2.0.
More information: https://platform.openai.com/docs/api-reference/chat/create#chat-create-frequency_penalty
It tunes the text generation process by regulating the appearance of certain terms in the answers. It is configured with decimal numeric values from -2.0 to 2.0.
More information: https://platform.openai.com/docs/api-reference/chat/create#chat-create-presence_penalty
It configures the URL pointing to the specific server or service where the language models are hosted.
The endpoint configuration varies depending on the service provider used.
For example, for 'openai', the URL must be precise and contain the specific API version:
https://api.openai.com/v1/chat/completions
In the case of 'azureopenai', the URL must include the resource name, implementation (deployment), and API version:
https://{name}.openai.azure.com/openai/deployments/{implementation}/chat/completions?api-version={version}
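Putting the parameters above together, a configuration could look like the sketch below; apart from provider, apiKey, and modelName, which the text names explicitly, the key names and values are assumptions for illustration and should be checked against the product reference.

```python
# Illustrative LLM configuration only. Except for provider, apiKey and
# modelName, the key names below are assumptions; check the product
# reference for the exact schema and the values supported by each model.
llm_config = {
    "provider": "azureopenai",          # one of the providers listed above
    "apiKey": "<your-api-key>",
    "modelName": "gpt-4o-mini",
    "temperature": 0.0,                 # 0.00 = more deterministic generation
    "maxTokens": 1000,                  # integer; check the model's supported maximum
    "topP": 1.0,                        # 1 = all possible options are considered
    "verbose": False,                   # detailed output disabled
    "cache": True,                      # reuse answers for queries already made
    "frequencyPenalty": 0.0,            # decimal from -2.0 to 2.0
    "presencePenalty": 0.0,             # decimal from -2.0 to 2.0
    "baseURL": "https://{name}.openai.azure.com/openai/deployments/"
               "{implementation}/chat/completions?api-version={version}",
}
```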
It configures the response format. Valid options are "" (empty string) or the literal "json_object". Notice that for the latter, the prompt must explicitly state that you want JSON output; more detail here.
A suggestion for using this value is to add the following to the end of the prompt:
Question: {question}
Helpful answer in JSON only following the format:
{{
"reply": "answer here"
}}
A valid response from the model can be:
{
"reply": "This is the reply from the model"
}
This option is only valid for OpenAI 3.5 or higher models.
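When "json_object" is used with a prompt like the one above, the reply can be parsed directly; the short sketch below shows the idea (the variable names are hypothetical).

```python
import json

# Illustrative sketch: with the "json_object" response format and a prompt that
# asks for the {"reply": ...} shape shown above, the answer is valid JSON and
# can be parsed directly. The variable names are hypothetical.
raw_answer = '{ "reply": "This is the reply from the model" }'
parsed = json.loads(raw_answer)
print(parsed["reply"])   # -> This is the reply from the model
```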
VectorStore Search Options