Memory
Today we will step into a new module—Memory. This is an extremely important topic in LLM application development that we haven't discussed before.
Most LLM applications involve multiple rounds of interaction with users. If the LLM cannot remember the earlier conversation history, it cannot provide coherent and meaningful responses, as if it had "amnesia." This significantly diminishes the user experience, as shown in the figure below:
The Memory module is designed to address this issue; it functions like the human hippocampus, responsible for storing and recalling key information to ensure smooth and accurate conversations.
In today’s session, we will first briefly discuss the basic concepts and principles of the Memory module. Then, through a simple LLM example, you will experience the power of the Memory module firsthand. Finally, we will dive into the source code to learn how traditional chains and LCEL chains in LangChain use and implement Memory.
Basic Principles of the Memory Module
How can we enable the LLM to respond based on the conversation history? From our earlier experience building RAG applications, a natural idea is to manipulate how the prompt is constructed: we can send the past conversation along with the current question to the LLM, so that it can reference the historical content when answering.
For example, the following prompt template:
python
promptTemplate = """
The <history> tag below contains the historical conversation records between AI and Human. Reference the historical context to answer the user's question.
Historical conversation:
<history>
{chat_history}
</history>
Human: {question}
AI:"""
In this prompt template, we have left a placeholder for the historical conversation content. Before sending it to the LLM for execution, we query the relevant historical dialogue and fill it into the prompt. After the LLM responds, we find a way to store this question and answer for future reference.
Yes, the basic principle of Memory is that before LLM execution, we query historical records, and after LLM execution, we store the current dialogue records.
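To make this concrete, here is a minimal, framework-free sketch of that loop built on the prompt_template string above; call_llm is a stand-in stub for whatever model client you actually use, not a real API.

python
def call_llm(prompt: str) -> str:
    """Stand-in for a real model call (e.g. an OpenAI client); returns a canned reply here."""
    return "(model reply)"

history = []  # in-memory store of past turns

def chat(question: str) -> str:
    chat_history = "\n".join(history)   # 1. query the stored records
    prompt = prompt_template.format(    # 2. fill the placeholders
        chat_history=chat_history, question=question
    )
    answer = call_llm(prompt)           # 3. let the model answer with context
    history.append(f"Human: {question}")  # 4. store the current dialogue
    history.append(f"AI: {answer}")
    return answer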
Although the integration of Memory into LLMs is not complex, designing a functional and accurate Memory system is not simple.
The First Question: How to Query?
Current large language models have token limits, so if we merge the entire conversation into the prompt every time, the prompt will eventually exceed the model's context length as the number of interactions grows.
Even if the number of tokens supported by the model is sufficient, including all conversation content can lead to skyrocketing conversation costs. Moreover, the LLM may struggle to extract the most relevant historical content related to the current dialogue, resulting in a decline in answer quality.
Thus, how to query the context related to the current question from historical conversations—the query algorithm—will directly impact the quality of the model's responses.
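For example, one simple query strategy is to walk backwards through the stored turns and keep only the most recent ones that fit within a token budget. The sketch below is purely illustrative; the 4-characters-per-token estimate and the budget value are assumptions, not anything LangChain prescribes.

python
def select_history(turns, max_tokens=1000):
    """Return the most recent turns that fit within a rough token budget."""
    selected, used = [], 0
    for turn in reversed(turns):          # walk from newest to oldest
        cost = len(turn) // 4 + 1         # crude estimate: ~4 characters per token
        if used + cost > max_tokens:
            break
        selected.append(turn)
        used += cost
    return "\n".join(reversed(selected))  # restore chronological order

More sophisticated strategies, such as vector-similarity retrieval or summarization, follow the same pattern: they only change how the relevant slice of history is selected.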
The Second Question: How to Store?
Storing conversation records is relatively straightforward. LangChain integrates with a variety of databases, so it is easy to use the classes and methods it provides to persist memory to different backends; LangChain's documentation lists the specific integrations.
Of course, which storage method to use depends largely on the data structures and the query algorithm. For example, if we query by vector similarity or via a knowledge graph, we need correspondingly different databases for storage.
Whether to store the entire conversation content also depends on the design of the actual system. For instance, to avoid wasting storage space, we may choose to retain only the data from the past month.
Sometimes we don't even store the raw conversation records: to improve query accuracy and save storage space, related turns can be merged or summarized before being written to storage.
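As a small taste of the storage side, the sketch below persists the conversation in Redis via langchain_community's RedisChatMessageHistory; the connection URL, session ID, and the roughly-one-month TTL are placeholder values you would adapt to your own setup.

python
from langchain_community.chat_message_histories import RedisChatMessageHistory

# Each turn is written to Redis under this session's key; the TTL (in seconds)
# makes records expire after about 30 days, matching the "keep only recent data" idea.
history = RedisChatMessageHistory(
    session_id="user-42",
    url="redis://localhost:6379/0",
    ttl=60 * 60 * 24 * 30,
)
history.add_user_message("Hello, I am Jack.")
history.add_ai_message("Hello, Jack. How can I assist you?")
print(history.messages)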
LangChain provides different storage methods and Memory components with query algorithms, allowing us to use them out of the box. We will explain the implementation principles and usage of these components in the next section.
Now, let’s experience the transformation of LLM after adding Memory through a simple example.
An Example to Experience the Power of Memory
The first step in “enhancing” the LLM with Memory is to instantiate a Memory object. Here, we choose ConversationBufferMemory, which is a relatively simple Memory implementation in LangChain.
python
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history")
The memory_key is assigned the name of the placeholder variable representing the historical conversation in the prompt template, as follows:
python
from langchain.prompts import PromptTemplate
template_str = """
You are a chatbot having a conversation with a human.
Previous conversation:
{chat_history}
Human: {question}
AI:"""
prompt_template = PromptTemplate.from_template(template_str)
Next, we create a model instance and build the chain using LLMChain, passing the ConversationBufferMemory object created above into the memory parameter of the LLMChain constructor.
python
from langchain_openai import ChatOpenAI
from langchain.chains import LLMChain
llm = ChatOpenAI(temperature=0)
memory_chain = LLMChain(
    llm=llm,
    prompt=prompt_template,
    verbose=True,  # verbose prints the detailed execution process, including the final prompt
    memory=memory,
)
Now we have an LLM application chain with memory functionality. Let’s try asking a question:
python
memory_chain.predict(question="Hello, I am Jack.")
> Entering new LLMChain chain...
Prompt after formatting:
You are a chatbot having a conversation with a human.
Previous conversation:
Human: Hello, I am Jack.
AI:
> Finished chain.
Hello, Jack. I am a chatbot. How can I assist you?
Now let's ask, “Do you remember what my name is?”:
python
memory_chain.predict(question="Do you remember what my name is?")
> Entering new LLMChain chain...
Prompt after formatting:
You are a chatbot having a conversation with a human.
Previous conversation:
Human: Hello, I am Jack.
AI: Hello, Jack. I am a chatbot. How can I assist you?
Human: Do you remember what my name is?
AI:
> Finished chain.
Of course, you just told me your name is Jack. Do you have any other questions?
We can see that in the subsequent dialogue, the prompts sent to the LLM included the historical conversation content, and the LLM indeed provided satisfactory responses.
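If you are curious about what the chain has accumulated, you can query the memory object directly; the output below is indicative, since the exact wording depends on the model.

python
print(memory.load_memory_variables({}))
# {'chat_history': 'Human: Hello, I am Jack.\nAI: Hello, Jack. I am a chatbot. How can I assist you?\nHuman: Do you remember what my name is?\nAI: Of course, you just told me your name is Jack. Do you have any other questions?'}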
Implementation of the Memory Module in LangChain
The two most fundamental and important actions in the Memory module are querying and storing. In this section, we will analyze the source code to see how LangChain integrates these two actions into the construction of chains.
As we learned earlier, there are two types of chains in LangChain: traditional chains and LCEL. The way these two types link to the Memory module differs.
Memory Implementation in Traditional Chains
In traditional chains, all implementation chain classes inherit from Chain, and the execution path generally follows: XXXChain.predict/XXXChain(...) -> Chain.__call__ -> Chain.invoke.
python
class LLMChain(Chain):
    ...
    def predict(self, callbacks: Callbacks = None, **kwargs: Any) -> str:
        return self(kwargs, callbacks=callbacks)[self.output_key]

class Chain(RunnableSerializable[Dict[str, Any], Dict[str, Any]], ABC):
    ...
    def __call__(...) -> Dict[str, Any]:
        ...
        return self.invoke(...)
This means that ultimately, it will call the invoke method of Chain.
python
class Chain(RunnableSerializable[Dict[str, Any], Dict[str, Any]], ABC):
    def invoke(...):
        ...
        inputs = self.prep_inputs(input)
        ...
        # Call the specific Chain's _call interface to perform the actual LLM call
        outputs = self._call(inputs)
        ...
        final_outputs: Dict[str, Any] = self.prep_outputs(...)
        ...
        return final_outputs
From the implementation of invoke, we can see that before calling the model it first calls the prep_inputs method to do some preprocessing, and after the model responds it calls prep_outputs to post-process the output before returning it to the user.
Memory's querying and storing are handled in exactly these two places. Let's first look at prep_inputs:
python
class Chain(RunnableSerializable[Dict[str, Any], Dict[str, Any]], ABC):
    ...
    memory: Optional[BaseMemory] = None
    ...
    def prep_inputs(self, inputs: Union[Dict[str, Any], Any]) -> Dict[str, str]:
        ...
        if self.memory is not None:
            external_context = self.memory.load_memory_variables(inputs)
            inputs = dict(inputs, **external_context)
        ...
        return inputs
Do you remember the ConversationBufferMemory object that was passed to LLMChain in the previous example? That object is assigned to the Chain.memory variable.
When Chain.memory is set, prep_inputs calls the Memory object's load_memory_variables method to retrieve the historical records, which are then merged into the inputs and end up in the final prompt.
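For ConversationBufferMemory this query step is trivial: it simply returns the accumulated conversation buffer under the configured memory_key. The snippet below is a simplified paraphrase of the 0.1.x source, not a verbatim copy.

python
class ConversationBufferMemory(BaseChatMemory):
    memory_key: str = "history"
    ...
    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Expose the whole buffer under the placeholder name configured via memory_key
        return {self.memory_key: self.buffer}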
Having understood the path for fetching historical records, let's see how prep_outputs stores the current dialogue.
python
class Chain(RunnableSerializable[Dict[str, Any], Dict[str, Any]], ABC):
    ...
    memory: Optional[BaseMemory] = None
    ...
    def prep_outputs(
        self,
        inputs: Dict[str, str],
        outputs: Dict[str, str],
        return_only_outputs: bool = False,
    ) -> Dict[str, str]:
        ...
        if self.memory is not None:
            self.memory.save_context(inputs, outputs)
        ...
In prep_outputs, the current user input and model response are passed to the save_context method of the Chain.memory variable for storage.
Now, let's focus on the Chain.memory variable, which is an instance of BaseMemory. BaseMemory is an interface designed by LangChain for Memory, containing the necessary abstract methods such as load_memory_variables and save_context:
python
class BaseMemory(Serializable, ABC):
    @abstractmethod
    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Return key-value pairs given the text input to the chain."""

    @abstractmethod
    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        """Save the context of this chain run to memory."""
    ...
Any specific Memory class can seamlessly integrate into LangChain's chains as long as it implements these methods.
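As a sketch of what such a custom component could look like, the toy class below only remembers the last few exchanges. It is an illustration of the interface (which, besides the two methods shown above, also declares memory_variables and clear), not an existing LangChain component.

python
from typing import Any, Dict, List
from langchain_core.memory import BaseMemory

class LastNTurnsMemory(BaseMemory):
    """Toy memory that keeps only the last n question/answer pairs."""
    turns: List[str] = []
    n: int = 3
    memory_key: str = "chat_history"

    @property
    def memory_variables(self) -> List[str]:
        return [self.memory_key]

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Inject only the most recent n exchanges into the prompt
        return {self.memory_key: "\n".join(self.turns[-2 * self.n:])}

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        # Drop our own memory key in case the chain merged it back into the inputs
        user_inputs = {k: v for k, v in inputs.items() if k != self.memory_key}
        self.turns.append(f"Human: {list(user_inputs.values())[0]}")
        self.turns.append(f"AI: {list(outputs.values())[0]}")

    def clear(self) -> None:
        self.turns = []

Passing memory=LastNTurnsMemory() to the LLMChain constructor from the earlier example would then cap the history injected into each prompt at the three most recent exchanges.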
Implementation of Memory in LCEL Chains
LCEL is a chain-building method recommended by LangChain. Although many Memory components are still developed using the traditional chain integration method (implementing the BaseMemory interface) as of LangChain version 0.1.13, the framework code for integrating Memory with LCEL has already been implemented.
Let's rewrite the previous Memory example using the LCEL approach to quickly understand how to add Memory functionality to an LCEL chain.
The prompt template and model instantiation are the same as before, so I'll directly copy the code for clarity:
python
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
template_str = """
You are a chatbot having a conversation with a human.
Previous conversation:
{chat_history}
Human: {question}
AI:"""
prompt_template = PromptTemplate.from_template(template_str)
llm = ChatOpenAI(temperature=0)
Next, we build a chain using LCEL syntax:
python
memory_chain = prompt_template | llm
Now, we need to implement a function to retrieve the conversation history:
python
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
store = {}
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]
This function takes a string variable session_id (a requirement by LangChain) and returns an instance of the BaseChatMessageHistory class, representing the historical records retrieved based on the session ID.
Finally, we instantiate a RunnableWithMessageHistory object using the memory_chain and get_session_history:
python
from langchain_core.runnables.history import RunnableWithMessageHistory
with_message_history = RunnableWithMessageHistory(
    memory_chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="chat_history",
)
Here, input_messages_key specifies the variable name for the user input in the prompt template, while history_messages_key specifies the variable name for the historical records.
To execute the chain, specify the session ID:
python
with_message_history.invoke(
    {"question": "Hello, I am Jack."},
    config={"configurable": {"session_id": "abc123"}},
)

with_message_history.invoke(
    {"question": "What is my name?"},
    config={"configurable": {"session_id": "abc123"}},
)
At first glance, it may seem like there's no memory storage code present. However, the key lies in the BaseChatMessageHistory returned by get_session_history.
BaseChatMessageHistory is designed by LangChain to store historical dialogues, functioning similarly to BaseMemory in traditional chains. It provides an abstract method add_message for adding historical records.
python
class BaseChatMessageHistory(ABC):
    ...
    messages: List[BaseMessage]
    ...
    def add_message(self, message: BaseMessage) -> None:
        ...
For instance, the ChatMessageHistory class inherits from BaseChatMessageHistory, and its add_message method simply appends the current message to the messages list.
python
class ChatMessageHistory(BaseChatMessageHistory, BaseModel):
    ...
    def add_message(self, message: BaseMessage) -> None:
        """Add a self-created message to the store"""
        self.messages.append(message)
    ...
Thus, after the model responds, the BaseChatMessageHistory returned by get_session_history can call add_message to store the current dialogue.
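To see this in action, after the two invoke calls above you can inspect the per-session history kept in the store dict; the AI message text shown in the comments is only indicative.

python
for message in store["abc123"].messages:
    print(type(message).__name__, ":", message.content)
# HumanMessage : Hello, I am Jack.
# AIMessage : Hello, Jack! How can I assist you today?
# HumanMessage : What is my name?
# AIMessage : Your name is Jack.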
Next, let's focus on RunnableWithMessageHistory, which extends RunnableBindingBase and serves as a decorator that manages chat message history.
We can trace its execution process starting from the invoke call. The invoke method of with_message_history calls:
python
class RunnableBindingBase(RunnableSerializable[Input, Output]):
    def invoke(...) -> Output:
        return self.bound.invoke(
            input,
            self._merge_configs(config),
            **{**self.kwargs, **kwargs},
        )
Here, the first point of interest is self._merge_configs(config), where config is the {"configurable": {"session_id": "abc123"}} dict we passed in. The _merge_configs method of RunnableWithMessageHistory is then called:
python
class RunnableWithMessageHistory(RunnableBindingBase):
    def _merge_configs(self, *configs: Optional[RunnableConfig]) -> RunnableConfig:
        ...
        config = super()._merge_configs(*configs)
        configurable = config.get("configurable", {})
        ...
        # By default, history_factory_config declares a single "session_id" field,
        # which is why invoke expects config={"configurable": {"session_id": ...}}
        self.history_factory_config = [
            ConfigurableFieldSpec(
                id="session_id",
                annotation=str,
                name="Session ID",
                description="Unique identifier for a session.",
                default="",
                is_shared=True,
            ),
        ]
        expected_keys = [field_spec.id for field_spec in self.history_factory_config]
        ...
        message_history = self.get_session_history(
            **{key: configurable[key] for key in expected_keys}
        )
        config["configurable"]["message_history"] = message_history
        return config
In _merge_configs, the get_session_history function retrieves the corresponding BaseChatMessageHistory object, which is then placed into the config and passed down.
Looking back at RunnableBindingBase's invoke method, it ultimately calls the invoke method of the bound variable. The assignment of bound in RunnableWithMessageHistory is as follows:
python
class RunnableWithMessageHistory(RunnableBindingBase):
    def __init__(
        self,
        runnable,
        get_session_history,
        ...
        input_messages_key: Optional[str] = None,
        ...
        history_messages_key: Optional[str] = None,
        ...
    ) -> None:
        history_chain: Runnable = RunnableLambda(
            self._enter_history, self._aenter_history
        ).with_config(run_name="load_history")
        messages_key = history_messages_key or input_messages_key
        if messages_key:
            history_chain = RunnablePassthrough.assign(
                **{messages_key: history_chain}
            ).with_config(run_name="insert_history")
        bound = (
            history_chain | runnable.with_listeners(on_end=self._exit_history)
        ).with_config(run_name="RunnableWithMessageHistory")
Here, a new LCEL chain is created by adding pre-processing (_enter_history) and post-processing (_exit_history) to the incoming LCEL chain, and it is assigned to the bound variable.
Now, let's see what _enter_history and _exit_history do:
python
class RunnableWithMessageHistory(RunnableBindingBase):
    def _enter_history(self, input: Any, config: RunnableConfig) -> List[BaseMessage]:
        hist = config["configurable"]["message_history"]
        ...
        return hist.messages.copy()

    def _exit_history(self, run: Run, config: RunnableConfig) -> None:
        hist = config["configurable"]["message_history"]
        ...
        for m in input_messages + output_messages:
            hist.add_message(m)
In _enter_history, it simply copies the contents of BaseChatMessageHistory.messages. In _exit_history, it calls BaseChatMessageHistory.add_message to store both the user messages and the model's response messages.
At this point, you might wonder how to add more sophisticated query logic for the historical records, for example a custom Memory component that stores embeddings and retrieves history by vector similarity. The trick is that hist.messages is read as an attribute, so you can use Python's @property decorator: define a method named messages and mark it with @property, and every attribute access will invoke your query logic.
python
class VectorMessageHistory(BaseChatMessageHistory, BaseModel):
    @property
    def messages(self) -> List[BaseMessage]:
        """Use vector similarity to query related dialogue records"""
        ...
In my opinion, it would be more intuitive if BaseChatMessageHistory provided a get_messages abstract method for developers to inherit and implement.
Summary
The Memory module is a crucial topic in LLM application development. The principle behind integrating Memory is straightforward: query historical records before executing the LLM to merge them into the prompt, and store the current dialogue records after execution.
The key to designing a Memory system lies in the data structures and algorithms used for querying and storing historical records. LangChain offers a variety of Memory components suited for different use cases. As of version 0.1.7, most Memory components are designed for traditional chains, but it's expected that future versions will introduce components for LCEL chains.
The integration methods for Memory in traditional and LCEL chains in LangChain differ. This article has provided a detailed source code analysis, and following this walkthrough should deepen your understanding and help you adapt more quickly to future iterations of the framework.