Choosing the Memory Component
LangChain provides a variety of built-in Memory components that cover most application scenarios. The data structures and algorithms these components use have been validated in practice, so learning how they are implemented, and knowing which component to use in which scenario, is crucial for improving our LLM application development skills.
Today, we will detail and study the implementation principles and pros and cons of seven commonly used built-in Memory components: ConversationBufferMemory, ConversationBufferWindowMemory, ConversationSummaryMemory, ConversationSummaryBufferMemory, ConversationEntityMemory, VectorStoreRetrieverMemory, and ConversationKGMemory.
As of now (version 0.1.13), the Memory components mentioned above are only available in the traditional chain versions. In the previous lesson, we mentioned that Memory components for traditional chains revolve around two crucial methods: load_memory_variables for retrieving conversation history and save_context for storing the current round of conversation. These two methods will also serve as the entry points for studying each component today.
ConversationBufferMemory
ConversationBufferMemory is the simplest memory component in LangChain, storing all raw conversation records in memory and returning the full conversation record upon query.
ConversationBufferMemory does not even implement its own save_context; it directly reuses BaseChatMemory's. Below is the source code for BaseChatMemory.save_context:
python
class BaseChatMemory(BaseMemory, ABC):
    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        """Save context from this conversation to buffer."""
        # Get the input and model response from this conversation
        input_str, output_str = self._get_input_output(inputs, outputs)
        # Encapsulate them into message objects and store them in the chat_memory.messages array
        self.chat_memory.add_messages(
            [HumanMessage(content=input_str), AIMessage(content=output_str)]
        )
Source code for load_memory_variables:
python
class ConversationBufferMemory(BaseChatMemory):
    ...
    # Placeholder variable name for historical records in the user prompt template
    memory_key: str = "history"

    # Get historical records (buffer internally calls other methods to obtain self.chat_memory.messages; simplified here)
    @property
    def buffer(self) -> Any:
        """Return all content of self.chat_memory.messages"""
        ...

    # Pair the key with the full conversation history directly
    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Return history buffer."""
        return {self.memory_key: self.buffer}
As we can see, ConversationBufferMemory is indeed quite simple: it uses a memory array to store all conversation records, appending the current conversation to the array each time it stores; when querying, it returns the entire array.
This component is very useful in scenarios with fewer interactions between the user and AI, such as customer service inquiries. Of course, the downside is evident: with too many interactions, it may exceed the model's token limit.
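To make this concrete, here is a minimal usage sketch (the conversation content is made up for illustration):
python
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({"input": "Hi, I'm Alice"}, {"output": "Hello Alice, how can I help?"})
memory.save_context({"input": "I'd like to check my order status"}, {"output": "Sure, please share your order number."})

# Every stored round comes back verbatim under the "history" key
print(memory.load_memory_variables({}))
# {'history': "Human: Hi, I'm Alice\nAI: Hello Alice, how can I help?\nHuman: I'd like to check ..."}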
ConversationBufferWindowMemory
ConversationBufferWindowMemory is a simple optimization of the ConversationBufferMemory component: it lets us set a maximum number of conversation rounds k, so that queries return at most the latest k rounds of conversation, thereby controlling the length of the prompt.
The implementation code for ConversationBufferWindowMemory is largely the same as that for ConversationBufferMemory, except for some differences in returning conversation records.
python
class ConversationBufferWindowMemory(BaseChatMemory):
    ...
    # Maximum number of rounds to return
    k: int = 5

    # Get historical records (buffer internally calls other methods to obtain self.chat_memory.messages, so this function remains unchanged in the actual code; simplified here)
    @property
    def buffer(self) -> Any:
        ...
        # Each round of dialogue has two messages (Human and AI), so return k*2 messages
        messages = self.chat_memory.messages[-self.k * 2 :] if self.k > 0 else []
        ...
ConversationBufferWindowMemory utilizes the coherence of conversations, where new dialogues often build on recent exchanges. Therefore, probabilistically speaking, the latest few rounds of conversation content are usually more relevant to the new conversation than older historical dialogues.
However, this is not absolute, especially for long-term conversations where older dialogue history may contain key information directly affecting new dialogues. Thus, ConversationBufferWindowMemory is more commonly used in short-term conversations.
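A quick sketch of the windowing behavior (illustrative; k=1 so only the latest round survives):
python
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)
memory.save_context({"input": "First question"}, {"output": "First answer"})
memory.save_context({"input": "Second question"}, {"output": "Second answer"})

# Only the most recent k rounds are returned; the first round is no longer visible
print(memory.load_memory_variables({}))
# {'history': 'Human: Second question\nAI: Second answer'}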
ConversationSummaryMemory
To address the loss of older messages in ConversationBufferWindowMemory, we can consider the ConversationSummaryMemory component. Instead of returning raw conversation content, it maintains a summary of the entire conversation: each time a round is saved, it calls the LLM to fold that round into a new overall summary.
Let's first look at the implementation of save_context:
python
class ConversationSummaryMemory(BaseChatMemory, SummarizerMixin):
    ...
    # Summary of the entire conversation
    buffer: str = ""
    ...

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        """Save context from this conversation to buffer."""
        super().save_context(inputs, outputs)
        # self.chat_memory.messages[-2:] are the Human and AI messages from this round of dialogue
        # self.buffer is the previous conversation summary
        self.buffer = self.predict_new_summary(
            self.chat_memory.messages[-2:], self.buffer
        )
In predict_new_summary, the LLM is called to merge the current round of conversation with the previous summary and produce a new summary. The summarization prompt template used is as follows (translated into English):
plaintext
_DEFAULT_SUMMARIZER_TEMPLATE = """
Summarize the provided conversation content step by step, adding to the previous summary and returning a new summary.
Example
Current Summary:
The human asks the AI about its views on AI. The AI believes that AI is a positive force.
New Conversation Content:
Human: Why do you think AI is a positive force?
AI: Because AI will help humans reach their full potential.
New Summary:
The human asks the AI about its views on AI. The AI believes that AI is a positive force because it will help humans reach their full potential.
Example End
Current Summary:
{summary}
New Conversation Content:
{new_lines}
New Summary:
"""
load_memory_variables is straightforward, returning the conversation summary self.buffer, so we won't post the code here.
The advantages of ConversationSummaryMemory are clear: by continuously synthesizing summaries, it retains the dialogue content of the entire conversation. However, the downsides are also evident: each time, it requires an additional call to the LLM for summary merging, which undoubtedly increases the time spent in user interactions and the cost of dialogues. Additionally, the summary may suffer from “distortion,” occasionally omitting key dialogue content.
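As a usage sketch (the summary text itself depends on the LLM, so the output shown is only indicative):
python
from langchain.memory import ConversationSummaryMemory
from langchain_openai import OpenAI

llm = OpenAI(temperature=0)
memory = ConversationSummaryMemory(llm=llm)
# Each save_context call invokes the LLM to fold this round into the running summary
memory.save_context(
    {"input": "Are you familiar with LangChain?"},
    {"output": "Yes, it's a framework for building LLM applications."},
)

print(memory.load_memory_variables({}))
# e.g. {'history': 'The human asks about LangChain; the AI explains it is a framework for building LLM apps.'}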
ConversationSummaryBufferMemory
LangChain offers another summary memory component—ConversationSummaryBufferMemory—that combines the summarization features of ConversationSummaryMemory with the storage of original conversation content.
ConversationSummaryBufferMemory retains recent raw interaction content in its buffer and uses a max_token_limit to cap the number of tokens in that buffer. When the limit is exceeded, it calls the LLM to summarize the older conversation content. During queries, it returns both the original conversation records in the buffer and the summary of the older conversation.
This approach has two main benefits:
- By retaining the original records of newer historical conversations, it reduces the potential information loss caused by summarization “distortion.”
- It only calls the LLM to summarize when the buffer exceeds max_token_limit, avoiding frequent LLM calls and thus improving performance and user experience.
Let's first examine the storage logic of the component:
python
class ConversationSummaryBufferMemory(BaseChatMemory, SummarizerMixin):
    # Maximum token count allowed in the raw conversation buffer, default 2000
    max_token_limit: int = 2000
    # Summary of the older conversation content
    moving_summary_buffer: str = ""

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        """Save context from this conversation to buffer."""
        super().save_context(inputs, outputs)
        self.prune()

    def prune(self) -> None:
        """Prune buffer if it exceeds max token limit"""
        # Get all conversation content in the current buffer
        buffer = self.chat_memory.messages
        # Calculate the current token count in the buffer
        curr_buffer_length = self.llm.get_num_tokens_from_messages(buffer)
        if curr_buffer_length > self.max_token_limit:
            pruned_memory = []
            while curr_buffer_length > self.max_token_limit:
                # Remove the oldest message and recalculate the token count
                pruned_memory.append(buffer.pop(0))
                curr_buffer_length = self.llm.get_num_tokens_from_messages(buffer)
            # Once within the limit, exit the loop and summarize the removed messages together with the old summary
            self.moving_summary_buffer = self.predict_new_summary(
                pruned_memory, self.moving_summary_buffer
            )
When saving the current conversation, if the buffer (chat_memory.messages) is still within the token limit, the new messages are simply appended to the end. If the buffer exceeds max_token_limit, the oldest messages are removed one by one and the token count recalculated until the buffer is back within the limit, at which point the LLM is called to merge the removed messages with the old summary into a new summary.
The prompt template used for generating summaries in ConversationSummaryBufferMemory is the same as the one used in ConversationSummaryMemory.
During queries, the logic is straightforward: it returns the summary of the older conversation (moving_summary_buffer) together with the original conversation records in the buffer (chat_memory.messages).
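Here is a small usage sketch; max_token_limit is set deliberately low so that older rounds get summarized quickly (the exact summary text depends on the LLM):
python
from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import OpenAI

llm = OpenAI(temperature=0)
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=40)
memory.save_context({"input": "Hi, I'm planning a trip to Japan"}, {"output": "Great! When are you going?"})
memory.save_context({"input": "Next April, during cherry blossom season"}, {"output": "April is a lovely time to visit."})

# Returns the rolling summary of pruned older rounds plus the raw recent messages
print(memory.load_memory_variables({}))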
ConversationEntityMemory
ConversationEntityMemory addresses the limitations of ConversationSummaryBufferMemory by generating separate summaries for each topic or entity discussed in a conversation. This approach helps retain crucial information that might otherwise be lost when only one summary is created.
Implementation Overview
The load_memory_variables method is central to this component:
python
class ConversationEntityMemory(BaseChatMemory):
    ...
    llm: BaseLanguageModel
    # List of entities involved in the current round of dialogue
    entity_cache: List[str] = []
    # Number of recent dialogue rounds to consider when extracting entities
    k: int = 3
    entity_extraction_prompt: BasePromptTemplate = ENTITY_EXTRACTION_PROMPT
    # Store of per-entity summaries, kept in memory by default
    entity_store: BaseEntityStore = Field(default_factory=InMemoryEntityStore)

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # LLM chain for extracting entities from the recent dialogue
        chain = LLMChain(llm=self.llm, prompt=self.entity_extraction_prompt)
        # The most recent k rounds of dialogue, as a string
        buffer_string = get_buffer_string(self.buffer[-self.k * 2 :], ...)
        output = chain.predict(
            history=buffer_string,
            input=inputs[prompt_input_key],
        )
        # The LLM returns a comma-separated list of entities
        entities = [w.strip() for w in output.split(",")]
        # Look up the existing summary for each entity
        entity_summaries = {}
        for entity in entities:
            entity_summaries[entity] = self.entity_store.get(entity, "")
        self.entity_cache = entities
        return {
            self.chat_history_key: buffer_string,
            "entities": entity_summaries,
        }
In this method, the component uses an LLM to extract the list of entities from recent dialogues, which helps ensure that references in the conversation (like pronouns) are correctly identified.
Purpose of the Variable k
The variable k determines how many recent dialogue rounds are considered when extracting entities. This is crucial for accurately resolving references to previously mentioned topics and keeps the memory coherent.
By generating separate summaries for each entity discussed, ConversationEntityMemory significantly reduces the risk of losing important information, ensuring that all relevant topics are maintained throughout the conversation.
Why Extracting Entity Lists Requires Recent k Rounds of Dialogue
The reason for using the content of the most recent k rounds of dialogue to extract the entity list is primarily to resolve pronoun references in the dialogue. Consider the following conversation:
Human: Are you familiar with the LangChain framework?
AI: Of course.
Human: Can you use it for LLM application development?
If we want to extract the entity list from the last round of dialogue, the pronoun "it" makes it impossible to accurately identify the entity "LangChain" based solely on the last round's content.
Finally, let's look at the prompt template for extracting the entity list (translated):
python
_DEFAULT_ENTITY_EXTRACTION_TEMPLATE = """
You are an AI assistant reading a record of a conversation between AI and humans. Extract all proper nouns from the last line of the dialogue. As a guideline, proper nouns are usually capitalized. You should explicitly extract all names and locations.
The dialogue history is provided solely for resolving coreferences (for example, "How much do you know about him?" where "him" was defined in a previous line); ignore items mentioned there that do not appear in the last line.
Return the output as a single comma-separated list, or NONE if there is nothing noteworthy to return (for example, if the user is just greeting or having a simple conversation).
Example:
Dialogue History:
Person #1: How is your day going?
AI: "Very well! And you?"
Person #1: Good! Busy working on LangChain. A lot to do.
AI: "Sounds like a heavy workload! What are you doing to improve LangChain?"
Last line:
Person #1: I’m trying to improve LangChain's interface, user experience, and its integration with various products users might want... a lot to do.
Output: LangChain
End of example
Example:
Dialogue History:
Person #1: How is your day going?
AI: "Very well! And you?"
Person #1: Good! Busy working on LangChain. A lot to do.
AI: "Sounds like a heavy workload! What are you doing to improve LangChain?"
Last line:
Person #1: I’m trying to improve LangChain's interface, user experience, and its integration with various products users might want... a lot to do. I’m working with Person #2.
Output: LangChain, Person #2
End of example
Dialogue History (for reference):
{history}
Last line of the dialogue (for extraction):
Human: {input}
Output:
"""
Now, let's continue with how ConversationEntityMemory stores dialogue content:
python
class ConversationEntityMemory(BaseChatMemory):
    # List of entities involved in the current round of dialogue
    entity_cache: List[str] = []
    ...

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        super().save_context(inputs, outputs)
        # Retrieve the most recent k rounds of dialogue from chat_memory.messages
        buffer_string = get_buffer_string(self.buffer[-self.k * 2 :], ...)
        # LLM chain for generating entity summaries
        chain = LLMChain(llm=self.llm, prompt=self.entity_summarization_prompt)
        for entity in self.entity_cache:
            # Existing summary of the entity
            existing_summary = self.entity_store.get(entity, "")
            # Merge the current dialogue content with the existing summary to generate a new summary
            output = chain.predict(
                summary=existing_summary,
                entity=entity,
                history=buffer_string,
                input=input_data,
            )
            # Update the entity's summary in the entity store
            self.entity_store.set(entity, output.strip())
In load_memory_variables, after calling the LLM to get the entity list for the current dialogue, these entities are stored in the entity_cache variable. In save_context, the component iterates through all entities in entity_cache and calls the LLM again to combine the current dialogue with each entity's old summary and generate a new one.
The prompt template for updating entity summaries is as follows (translated):
python
_DEFAULT_ENTITY_SUMMARIZATION_TEMPLATE = """
You are an AI assistant helping humans track facts about relevant people, places, and concepts in their lives. Based on your last dialogue with humans, update the summary of the provided entity in the "Entity" section. If this is your first time writing a summary, return a single sentence.
Updates should include only the facts conveyed about the provided entity in the last line of dialogue and should only contain facts about that entity.
If there is no new information about the provided entity, or if the information is not noteworthy (not important or relevant facts for long-term memory), return the existing summary unchanged.
Complete dialogue history (for reference):
{history}
Entity to summarize:
{entity}
Existing summary of {entity}:
{summary}
Last line of dialogue:
Human: {input}
Updated summary:
"""
Compared to ConversationSummaryBufferMemory, ConversationEntityMemory has clear advantages but also a drawback: in save_context, it updates the summary for each entity, so the more entities involved in a dialogue round, the more LLM calls are needed when storing memories.
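A usage sketch of the overall flow (in a real chain, load_memory_variables runs before the model call and save_context after it; the extracted entities and summaries depend on the LLM):
python
from langchain.memory import ConversationEntityMemory
from langchain_openai import OpenAI

llm = OpenAI(temperature=0)
memory = ConversationEntityMemory(llm=llm)

# Extract the entities for this round (e.g. "Sam", "LangChain") and cache them in entity_cache
print(memory.load_memory_variables({"input": "Sam is building a chatbot with LangChain"}))
memory.save_context(
    {"input": "Sam is building a chatbot with LangChain"},
    {"output": "Sounds interesting, what will the chatbot do?"},
)

# Per-entity summaries accumulate in the in-memory entity store
print(memory.entity_store.store)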
VectorStoreRetrieverMemory (Vector Store Retrieval Memory Component)
While ConversationEntityMemory tries to minimize information loss through entity-based summaries, the risk of loss still exists; moreover, the multiple LLM calls it makes increase conversation time and affect the user experience.
Analogous to RAG applications, the growing dialogue content can be viewed as the external data in RAG: a vector store is used to vectorize and store the original dialogue content, and when querying, a vector search algorithm retrieves the relevant memories. This is the fundamental principle of VectorStoreRetrieverMemory in LangChain.
Unlike other memory components, VectorStoreRetrieverMemory does not implement memory storage or query algorithms itself. Instead, it manages memory data via a VectorStoreRetriever passed in during initialization.
python
from langchain.memory import VectorStoreRetrieverMemory
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
# Use Chroma to store memory data and specify the vectorization method
vectorstore = Chroma(embedding_function=OpenAIEmbeddings())
# Define the vector query algorithm and number of returns
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 4})
# Instantiate VectorStoreRetrieverMemory object
memory = VectorStoreRetrieverMemory(retriever=retriever)
Thus, the logic in load_memory_variables and save_context is lightweight:
python
class VectorStoreRetrieverMemory(BaseMemory):
    retriever: VectorStoreRetriever = Field(exclude=True)
    ...

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Union[List[Document], str]]:
        """Return history buffer."""
        input_key = self._get_prompt_input_key(inputs)
        query = inputs[input_key]
        # Use the retriever to find the memory documents most relevant to the user input
        docs = self.retriever.get_relevant_documents(query)
        result: Union[List[Document], str]
        if not self.return_docs:
            result = "\n".join([doc.page_content for doc in docs])
        else:
            result = docs
        return {self.memory_key: result}

    def _form_documents(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> List[Document]:
        """
        Format the inputs and outputs of this round as document objects,
        roughly: return [Document(page_content="input:output")]
        """
        ...

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        """Save context from this conversation to buffer."""
        documents = self._form_documents(inputs, outputs)
        self.retriever.add_documents(documents)
When querying memories, the retriever is used to find and return the documents most relevant to the user input; when storing, since the messages in a single dialogue round are generally not large, they are turned into documents directly without chunking and stored in vectorized form via the retriever.
With VectorStoreRetrieverMemory, we can also persist memory data, allowing users to return later and continue previous conversations. Additionally, since the original dialogue content is stored, there is no information loss from summarization. This component is highly suitable for long-term dialogue scenarios.
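Continuing the snippet above, a usage sketch (requires an OpenAI API key and the chromadb package; retrieval results depend on the embeddings):
python
# Store a few unrelated rounds, then query with a new input
memory.save_context({"input": "My favorite sport is soccer"}, {"output": "Nice, soccer is a lot of fun"})
memory.save_context({"input": "I work as a data engineer"}, {"output": "That sounds like an interesting job"})

# The soccer round should rank highest by vector similarity and be returned as the relevant memory
print(memory.load_memory_variables({"input": "What sports do I like?"}))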
ConversationKGMemory (Knowledge Graph Memory Component)
While VectorStoreRetrieverMemory seems nearly perfect, consider the following dialogue:
Human: Can you recommend any good movies?
AI: You might consider "Interstellar" and "Inception," both sci-fi films directed by Christopher Nolan.
Human: What other films has he directed?
AI: "Memento," "The Dark Knight," and "The Dark Knight Rises," etc.
If we later ask, "What films has Nolan directed?", the vector query will most likely retrieve only the first round of dialogue: the second round refers to Nolan only as "he", so it is a poor semantic match for the query.
This limitation arises because vector queries cannot recognize referential connections between dialogue contents, leading to incomplete information retrieval.
Additionally, during conversations, we might correct the AI's responses, such as in the following dialogue:
Human: Name three of the most handsome men in the world.
AI: Daniel Wu, Andy Lau, and you!
Human: You're wrong, be humble; I'm not.
AI: Sorry, I misspoke; you are indeed not.
If we store this dialogue and later ask, "Name three of the most handsome men," the correction won't be retrieved, leading to an incorrect answer. This illustrates that vector storage cannot update or correct information.
Using a Knowledge Graph (KG) as a data structure to manage memory data can effectively resolve these issues.
What is a Knowledge Graph?
A knowledge graph is a structured semantic data structure that organizes information graphically. Think of it as a vast network diagram where each node represents an entity (like a person, place, or object), and the edges between nodes represent relationships. This allows us to understand various connections in the world.
Core components of a knowledge graph include:
- Entities: Nodes representing objects in the real world (e.g., people, locations).
- Relations: Edges connecting entities, indicating a relationship (e.g., "lives in," "founded").
- Attributes: Information associated with entities, describing their characteristics (e.g., a person’s "age" or "occupation").
LangChain designed ConversationKGMemory to reconstruct and store dialogue information using a knowledge graph.
To illustrate how ConversationKGMemory works, let's create an instance and save some dialogue context:
python
from langchain.memory import ConversationKGMemory
from langchain_openai import OpenAI
llm = OpenAI(temperature=0)
memory = ConversationKGMemory(llm=llm)
memory.save_context({"input": "say hi to sam"}, {"output": "who is sam"})
memory.save_context({"input": "sam is a friend"}, {"output": "okay"})
Now, let's look at the implementation of save_context:
python
class ConversationKGMemory(BaseChatMemory):
    ...
    # Implementation of the knowledge graph, based on networkx
    kg: NetworkxEntityGraph = Field(default_factory=NetworkxEntityGraph)
    # Prompt template for extracting knowledge triplets
    knowledge_extraction_prompt: BasePromptTemplate = KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT
    llm: BaseLanguageModel
    # Number of recent dialogue rounds to use as context
    k: int = 2
    ...

    def get_knowledge_triplets(self, input_string: str) -> List[KnowledgeTriple]:
        chain = LLMChain(llm=self.llm, prompt=self.knowledge_extraction_prompt)
        # Retrieve the most recent k rounds of dialogue
        buffer_string = get_buffer_string(self.buffer[-self.k * 2 :], ...)
        # Call the LLM to extract triplet information
        output = chain.predict(
            history=buffer_string,
            input=input_string,
            verbose=True,
        )
        # Parse the output into List[KnowledgeTriple]
        knowledge = parse_triples(output)
        return knowledge

    def _get_and_update_kg(self, inputs: Dict[str, Any]) -> None:
        """Get and update knowledge graph from the conversation history."""
        prompt_input_key = self._get_prompt_input_key(inputs)
        knowledge = self.get_knowledge_triplets(inputs[prompt_input_key])
        # Add each extracted triple to the knowledge graph
        for triple in knowledge:
            self.kg.add_triple(triple)

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        """Save context from this conversation to buffer."""
        super().save_context(inputs, outputs)
        self._get_and_update_kg(inputs)
The kg variable represents the underlying knowledge graph that maintains the conversation's knowledge structure. The core logic is to call get_knowledge_triplets to extract entity triplets from the dialogue, format them as KnowledgeTriple objects, and add them to the knowledge graph.
It’s important to note that the prompt for extracting triplet entities also requires the most recent k rounds of dialogue to resolve pronoun references.
For example, passing a new message into get_knowledge_triplets could yield:
python
memory.get_knowledge_triplets("her favorite color is red")
# [KnowledgeTriple(subject='sam', predicate='has a favorite color', object_='red')]
Here, the LLM correctly identifies "her" as "sam" based on previous dialogue, generating relevant entity relationships.
Next, let's examine load_memory_variables:
python
class ConversationKGMemory(BaseChatMemory):
    ...
    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Return history buffer."""
        # Get all entities involved in the current dialogue
        entities = self._get_current_entities(inputs)
        summary_strings = []
        for entity in entities:
            # Retrieve the stored knowledge for the given entity
            knowledge = self.kg.get_entity_knowledge(entity)
            if knowledge:
                summary = f"On {entity}: {'. '.join(knowledge)}."
                summary_strings.append(summary)
        ...
        context = "\n".join(summary_strings)
        return {self.memory_key: context}
Querying memories in ConversationKGMemory first uses the LLM to analyze the entities involved in the current dialogue, similar to ConversationEntityMemory. Then the knowledge for each entity is retrieved from the knowledge graph, providing richer context and more accurate responses.
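Continuing the earlier example, a query sketch (the exact wording of the returned knowledge depends on the LLM's triplet extraction):
python
print(memory.load_memory_variables({"input": "who is sam"}))
# e.g. {'history': 'On sam: sam is a friend.'}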
Summary
ConversationBufferMemory vs. ConversationBufferWindowMemory
- ConversationBufferMemory: Suitable for short dialogues; returns all conversation content on each query, risking exceeding the model's token limit.
- ConversationBufferWindowMemory: Limits the size of the returned dialogue window, reducing the risk of exceeding token limits but may lead to message loss.
ConversationSummaryMemory vs. ConversationSummaryBufferMemory
- ConversationSummaryMemory: Generates dialogue summaries, effectively mitigating the token limit risk.
- ConversationSummaryBufferMemory: Utilizes dialogue coherence to additionally return recent original dialogue content, helping to minimize the loss of key information.
ConversationEntityMemory
- Extracts dialogue entities using LLMs, generating individual summaries for each entity to prevent interference. However, it requires multiple LLM calls, increasing response time.
VectorStoreRetrieverMemory
- Stores and queries dialogue data in vectorized form, fundamentally addressing information loss from summary implementations. Yet, due to vector algorithms' inability to capture contextual associations, it may return inaccurate memory data.
ConversationKGMemory
- Uses a knowledge graph to store entity information in a graph data structure, allowing real-time updates of relationships between dialogue entities for more accurate memory retrieval.
Conclusion
The choice of memory component should be based on the specific application context. Understanding the characteristics of each component and the problems they address allows for effective modifications and combinations, enabling a more powerful utilization of LLMs to create an optimal solution.