Official Content

The History Prompt defines how historical interactions are handled in the conversation. It is configured in the Prompt section of the RAG Assistant.

The default History Prompt value is as follows:

Given the following conversation and a follow-up question, rephrase the follow-up question to be a standalone question.

Chat History:
{chat_history}
Follow-Up Input: {question}
Standalone question:
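At runtime, the {chat_history} and {question} placeholders are replaced with the accumulated conversation and the new question. The following Python sketch shows what that substitution might look like; the render_history_prompt helper and the Human/Assistant transcript format are illustrative assumptions, not the RAG Assistant's actual implementation.

HISTORY_PROMPT = (
    "Given the following conversation and a follow-up question, "
    "rephrase the follow-up question to be a standalone question.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow-Up Input: {question}\n"
    "Standalone question:"
)

def render_history_prompt(history, question):
    # Flatten (question, answer) pairs into a plain-text transcript.
    chat_history = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in history)
    return HISTORY_PROMPT.format(chat_history=chat_history, question=question)

print(render_history_prompt(
    [("What is a RAG Assistant?", "An assistant that retrieves context before answering.")],
    "How do I set its History Prompt?",
))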

Depending on the use case, you may need to try different alternatives to compact the history and rephrase the query while keeping the relevant information. The following is an alternative prompt:

--- functionality ---
receive a chat history along with a new question, and add a very brief and concise summary of the conversation history before the question

--- input structure ---

history: [conversation history]
new question: [question2]

--- output structure ---

[very concise summary of the previous conversation]

[question2]

--- input ---
history: {chat_history}
new question: {question}

--- question with summary of the previous conversation ---

In any case, you can adapt the prompt to your use case.

Compacting the Conversation History with LLMs

If you set the History Message Count parameter to a value greater than 0, the RAG Assistant will use an LLM to compact the conversation history before querying the vector store. This is especially important when using models with small context windows (e.g., 4k, 16k).

The RAG Assistant uses the History Prompt to summarize the conversation history, keeping the essential parts of the conversation and the context relevant to the new question.

You can control this behavior using the compactHistory option:

  • compact (default): The RAG Assistant uses the History Prompt to summarize the conversation history.
  • none: The RAG Assistant bypasses the History Prompt and uses only the last n interactions (as defined by the History Message Count parameter); those messages (questions and answers) are included verbatim. Use this value for more control.
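As a rough sketch of this branching (in Python), with llm_complete and vector_search as hypothetical stand-ins for the LLM call and the vector store query; only the branching logic mirrors the behavior described above:

def llm_complete(prompt):
    # Hypothetical stand-in for the LLM call made with the History Prompt.
    return "<standalone question produced by the LLM>"

def vector_search(query, top_k):
    # Hypothetical stand-in for the vector store query.
    return [f"chunk {i}" for i in range(1, top_k + 1)]

def build_retrieval_query(history, question, compact_history="compact",
                          history_message_count=6):
    recent = history[-history_message_count:]
    if compact_history == "compact" and recent:
        # compact: fold the history into a standalone query via the History Prompt.
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in recent)
        query = llm_complete(
            "Given the following conversation and a follow-up question, "
            "rephrase the follow-up question to be a standalone question.\n\n"
            f"Chat History:\n{transcript}\n"
            f"Follow-Up Input: {question}\nStandalone question:"
        )
        return query, []   # the history now travels inside the query itself
    # none: skip the extra LLM call; carry the last n messages to the final Prompt.
    return question, recent

query, carried_messages = build_retrieval_query(
    [{"role": "user", "content": "question 1"},
     {"role": "assistant", "content": "answer 1"}],
    "question 2",
    compact_history="none",
)
chunks = vector_search(query, top_k=2)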

The compactHistory option is configured in the Profile Metadata, in the Retrieval section under chat/search. Here's an example:

{
    "chat": {
        "search": {
            "compactHistory": 'none'
        }
    }
}

If you don't specify this value, it defaults to compact.

Note: You may need to adjust the Chunk Count parameter to balance the total number of input tokens sent to the prompt.
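For example (illustrative figures only): with a 4k-token context window and chunks of roughly 500 tokens each, a Chunk Count of 4 would consume about 2,000 tokens on retrieved context alone, leaving only about 2,000 tokens for the prompt, the (compacted) history, the question, and the answer; lowering Chunk Count to 2 frees roughly 1,000 of those tokens.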

 

Sample

Consider a RAG Assistant whose Prompt section has History Message Count set to 6 and Chunk Count set to 2.

In a conversation where the following questions and answers are exchanged:

question 1
answer 1
question 2
answer 2
question 3
answer 3
question 4

To answer question 4, the RAG Assistant will perform the following steps:

compact

  1. Call an LLM using the History Prompt to compact the history; in this case, it considers from "question 1" to "answer 3" (because History Message Count is 6).
  2. Use the result of the previous call to get the best 2 chunks from the vector store.
  3. Call an LLM using the Prompt with "question 4" and the context variable assigned.

none

  1. Get the best 2 chunks from the vector store using "question 4".
  2. Call an LLM using the Prompt with "question 4" and the context variable assigned, plus the latest messages exchanged (the complete messages from "question 1" to "answer 3").
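As a sketch of how the final payload is assembled in each mode (in Python), where the message shapes are assumptions inferred from the sample payloads shown below and the real field names may differ:

def final_messages(prompt, history, question, compact_history):
    if compact_history == "compact":
        # compact: the history was already folded into the retrieval query,
        # so a single message carrying the Prompt (context + question) is sent.
        return [{"role": "user", "content": f"{prompt} question: {question}"}]
    # none: replay the last n messages verbatim before the new question.
    return ([{"role": "system", "content": prompt}]
            + history
            + [{"role": "user", "content": question}])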

The last LLM call will differ depending on the compactHistory setting:

compact:

{"model":"...","messages":[{"role":"user","content":"You are an assistant...question: 4"}]}

none:

{"model":"...","messages":[{"role":"system","content":"You are an assistant....question: question 4"},{"role":"user","content":"question 1"},{"role":"assistant","content":"answer 1"},{"role":"user","content":"question 2"},{"role":"assistant","content":"answer 2"},{"role":"user","content":"question 3"},{"role":"assistant","content":"answer 3"},{"role":"user","content":"question 4"}]}

As you can see, the compact setting removes unnecessary context, while the none setting includes all the recent messages. Choose the setting that best suits your needs and the context window of your LLM.

See Also

Manual History Management

Last update: November 2024 | © GeneXus. All rights reserved.