In this section, you can set up instructions that guide the assistant on how to approach and answer questions. These instructions establish clear guidelines for the assistant to provide relevant and useful answers based on the context provided. The default value is:
You are a helpful AI assistant. Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say you don't know. DO NOT try to make up an answer.
If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.
Use all this context to answer:
<context>
{context}
</context>
Question: {question}
Helpful answer in markdown gives an extensive response:
Keep the context and question variables in place, because they will be replaced with the associated information before the interaction.
The context variable can be customized and will reflect the chunks obtained from the VectorStore that are closest to the question.
The question variable will reflect what the customer asked.
The context and question variables are mandatory in a RAG prompt; otherwise, the following error is displayed:
Missing {context}|{question} in prompt definition
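As an illustration of how this replacement works, the following Python sketch mimics the substitution of {context} and {question} before the prompt reaches the model; the helper function and chunk values are hypothetical and not part of the product API.

```python
# Illustrative sketch only: it mimics how {context} and {question} could be
# substituted before the prompt is sent to the model. The helper and the
# chunk values are hypothetical, not part of the product API.
PROMPT_TEMPLATE = (
    "You are a helpful AI assistant. Use the following pieces of context "
    "to answer the question at the end.\n"
    # ...remaining default instructions omitted for brevity...
    "<context>\n{context}\n</context>\n"
    "Question: {question}\n"
    "Helpful answer in markdown gives an extensive response:"
)

def build_prompt(chunks: list[str], question: str) -> str:
    # The chunks retrieved from the VectorStore fill {context};
    # the end user's text fills {question}.
    return PROMPT_TEMPLATE.format(context="\n\n".join(chunks), question=question)

print(build_prompt(["First retrieved chunk...", "Second retrieved chunk..."], "What is covered?"))
```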
It defines how many chunks are retrieved from the VectorStore to augment the context. It defaults to 5 for assistants created since 6/08/2024, and to 2 otherwise.
Check the Context Prompt Template to customize how each chunk is handled.
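As a rough sketch of how the chunk count and the per-chunk Context Prompt Template interact, the snippet below retrieves the closest chunks and renders each one before joining them into {context}; the retriever object and the per-chunk template shown here are assumptions for illustration, not the product API.

```python
# Hypothetical sketch: retrieve the top-k chunks closest to the question and
# render each one with a per-chunk template before joining them into {context}.
# The retriever object and the template text are assumptions, not the real API.
CONTEXT_PROMPT_TEMPLATE = "Source: {source}\nContent: {text}"  # illustrative per-chunk template

def build_context(retriever, question: str, chunk_count: int = 5) -> str:
    # each chunk is assumed to be a dict with 'source' and 'text' keys
    chunks = retriever.search(question, top_k=chunk_count)     # chunks closest to the question
    rendered = [CONTEXT_PROMPT_TEMPLATE.format(**chunk) for chunk in chunks]
    return "\n\n".join(rendered)
```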
It sets the number of historical messages that are taken into account in the conversation.
This is useful for tracking the interaction history and understanding the context gathered in the conversation. Note that this value counts both the end user's questions and the associated answers. That is, if the "History Message Count" value is set to 4, the last 4 messages are considered:
- Previous Question
- Previous Answer
- Last Question
- Last Answer
The minimum value it can take is 0, which indicates that the conversation history is not taken into account.
When the value is greater than 0, it is used together with the "History Prompt" parameter.
It applies when the History Message Count is higher than 0. It compacts the previous interactions so that valuable context data is not lost. Check here for more detail and for how to manage the history in a RAG assistant.
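As a hedged sketch of how such a history window might be selected, the snippet below keeps the last N messages of a conversation; the structures and names are illustrative, not the product's internal API.

```python
# Hypothetical sketch: keep only the last `history_message_count` messages so
# the model can use the recent conversational context. A value of 0 means the
# history is not taken into account.
def select_history(conversation: list[dict], history_message_count: int) -> list[dict]:
    if history_message_count <= 0:
        return []
    return conversation[-history_message_count:]   # most recent messages, in chronological order

history = select_history(
    [
        {"role": "user", "content": "Previous Question"},
        {"role": "assistant", "content": "Previous Answer"},
        {"role": "user", "content": "Last Question"},
        {"role": "assistant", "content": "Last Answer"},
    ],
    history_message_count=4,
)
```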
Configuration of the model used by the assistant to generate the answer, including the service provider, model name, temperature, maximum token limit, and other important parameters that affect how answers are generated. Check available options here.
The LLM configuration parameters are listed below:
It indicates the language model service provider used. The possible values it can take are: "openai", "azureopenai", "cohere", "google", "googlevertexai", "anthropic", "nvidia.nemo", "bedrock".
API authentication key provided to access the language model service. For example, if the Provider parameter is "azureopenai", you must specify "apiKey" with the authentication key.
It specifies the name of the language model used. For example, if the Provider parameter has the value "openai", modelName takes the value 'gpt-3.5-turbo' ('gpt-4o-mini' for assistants created since 6/08/2024). Check available models here.
It controls the randomness of response generation, with values represented by decimal numbers. A value of 0.00 indicates a more deterministic generation, while higher values introduce more variability and surprise in the generated answers.
It sets the maximum number of tokens in an answer. The setting is expressed as an integer; for example, 1000. Check the maximum value supported by the selected model.
It controls the diversity of the generated answers by determining the extent of the options considered during the text generation process. It is expressed as a decimal value, where a value of 1 indicates that all possible options are considered.
This value is deprecated. Check Chat API for streaming usage.
It enables or disables detailed output from the model during the generation process. It is configured with Boolean values, where 'true' enables the detailed output and 'false' disables it.
It controls whether to search the cache if the query has already been made. It is configured with Boolean values, where 'true' enables caching and 'false' disables it. Note that all assistant configuration values are taken into account in the cache search.
It tunes the text generation process to favor diversity and reduce the repetition of frequent terms, thus optimizing the quality and variety of the generated answers. It is configured with decimal numeric values from -2.0 to 2.0.
More information: https://platform.openai.com/docs/api-reference/chat/create#chat-create-frequency_penalty
It tunes the text generation process by regulating the appearance of certain terms in the answers. It is configured with decimal numeric values from -2.0 to 2.0.
More information: https://platform.openai.com/docs/api-reference/chat/create#chat-create-presence_penalty
It configures the URL pointing to the specific server or service where the language models are hosted.
The endpoint configuration varies depending on the service provider used.
For example, for 'openai', the URL must be precise and contain the specific API version:
https://api.openai.com/v1/chat/completions
In the case of 'azureopenai', the URL must include the resource name, implementation (deployment), and API version:
https://{name}.openai.azure.com/openai/deployments/{implementation}/chat/completions?api-version={version}
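Putting the parameters above together, a configuration could look like the sketch below; apart from provider, apiKey, and modelName, which the text names explicitly, the key names and values are assumptions for illustration and should be checked against the product reference.

```python
# Illustrative LLM configuration only. Except for provider, apiKey and
# modelName, the key names below are assumptions; check the product
# reference for the exact schema and the values supported by each model.
llm_config = {
    "provider": "azureopenai",          # one of the providers listed above
    "apiKey": "<your-api-key>",
    "modelName": "gpt-4o-mini",
    "temperature": 0.0,                 # 0.00 = more deterministic generation
    "maxTokens": 1000,                  # integer; check the model's supported maximum
    "topP": 1.0,                        # 1 = all possible options are considered
    "verbose": False,                   # detailed output disabled
    "cache": True,                      # reuse answers for queries already made
    "frequencyPenalty": 0.0,            # decimal from -2.0 to 2.0
    "presencePenalty": 0.0,             # decimal from -2.0 to 2.0
    "baseURL": "https://{name}.openai.azure.com/openai/deployments/"
               "{implementation}/chat/completions?api-version={version}",
}
```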
It configures the response format. Valid options are "" (empty string) or the literal "json_object". Notice that for the latter, the prompt must explicitly state that you want JSON output; more detail here.
A suggestion for using this value is to add the following to the end of the prompt:
Question: {question}
Helpful answer in JSON only following the format:
{{
"reply": "answer here"
}}
A valid response from the model can be:
{
"reply": "This is the reply from the model"
}
This option is only valid for OpenAI 3.5 or higher models.
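When "json_object" is used with a prompt like the one above, the reply can be parsed directly; the short sketch below shows the idea (the variable names are hypothetical).

```python
import json

# Illustrative sketch: with the "json_object" response format and a prompt that
# asks for the {"reply": ...} shape shown above, the answer is valid JSON and
# can be parsed directly. The variable names are hypothetical.
raw_answer = '{ "reply": "This is the reply from the model" }'
parsed = json.loads(raw_answer)
print(parsed["reply"])   # -> This is the reply from the model
```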
VectorStore Search Options